This section describes how to add variant calls to the database with the add command (see Add). Persephone currently supports single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), and can read VCF files with tetraploid variants. The steps below use the add command with the control file (see Control Files) "add_GRCh37.p13_1000genomes.ini" to load 1000 Genomes Variant Call Format (VCF) files.

  1. Review the "add_GRCh37.p13_1000genomes.ini" control file, which is included in the "Samples/Variant" folder of the PersephoneShell file archive and is shown below.

[ProcessRun]
; RunDescription: if specified, a custom description will be used. Will be ignored if a RunId is specified.
;                 otherwise, "Added variants for {MapSet Accession No.} from {Sources}." will be used.
RunDescription="Added 1000 genomes VCFs from http://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502."
 
[MapSet]
; Either MapSetId or MapSetPath is required.
; MapSetId: id of a target map set.
MapSetId=240044500
; MapSetPath: path of a target map set.
; MapSetPath="/Homo sapiens/GRCh37.p13"
 
[Variant]
; Sources (required): Comma-delimited VCF/Text files located locally or remotely accessible via URL.
Sources=ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,...,
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
; FileType: {Vcf|Text (delimited text file);Excel}
FileType=Vcf
; Coordinate system: 1 (one-based) / 0 (zero-based). Default value is 1.
; CoordinateSystem=1
; IncludedSamples: Comma-delimited sample names in the VCF source to be included.
;                  if not specified, all the samples in the file will be included.
;                  To list sample names in VCF, run 'bcftools query -l VCF_FILE'
;                  SampleNamesToFilter is obsolete.
IncludedSamples= HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,HG00104,HG00105,HG00106,HG00108
; ExcludedSamples: Comma-delimited sample names in the VCF source to be excluded.
;                  if not specified, no sample in the file will be excluded.
;ExcludedSamples=HG00096

[MapMapping]
; If no mapping is found in this section, it assumes that each MAP_NAME in file exactly matches a MAP_NAME in DB.
; If map names in file are different from those in DB, map each MAP_NAME in file to its MAP_ID or ACCESSION_NO in DB.
; Otherwise, marker will be created without mapping.
;MAP_NAME in file=MAP_ID or ACCESSION_NO in DB
1=240044684
2=240044685
3=240044686
4=240044687
5=240044688
6=240044689
7=240044690
8=240044691
9=240044692
10=240044693
11=240044694
12=240044695
13=240044696
14=240044697
15=240044698
16=240044699
17=240044700
18=240044701
19=240044702
20=240044703
21=240044704
22=240044705
X=240044706
Y=240044707
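
Before starting a long load, it can help to confirm that the sample names and map names in the control file actually occur in the VCF. The following is a minimal Python sketch of such a pre-check and is not part of PersephoneShell. The helper names (read_vcf_names, check_control_file), the sample name NA99999, and the toy VCF records are made up for illustration. It assumes the control file can be read by Python's configparser, which may not hold for files with unindented multi-line values.

```python
import configparser


def read_vcf_names(lines):
    """Return (sample_names, contig_names) found in an iterable of VCF text lines."""
    samples, contigs = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM..FORMAT); the rest are samples.
            samples = fields[9:]
        elif fields[0] not in contigs:
            contigs.append(fields[0])
    return samples, contigs


def check_control_file(ini_text, vcf_lines):
    """Return (missing_samples, unmapped_contigs) for a control file and a VCF."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep map names such as "X" and "Y" case-sensitive
    cfg.read_string(ini_text)
    samples, contigs = read_vcf_names(vcf_lines)

    raw = cfg.get("Variant", "IncludedSamples", fallback="")
    wanted = [s.strip() for s in raw.split(",") if s.strip()]
    missing_samples = [s for s in wanted if s not in samples]

    mapping = dict(cfg["MapMapping"]) if cfg.has_section("MapMapping") else {}
    # Contigs without an entry must match a map name in the database exactly.
    unmapped_contigs = [c for c in contigs if c not in mapping]
    return missing_samples, unmapped_contigs


# Illustrative inputs only: NA99999 and the two records are hypothetical.
INI = """
[Variant]
IncludedSamples= HG00096,HG00097,NA99999
[MapMapping]
1=240044684
X=240044706
"""

VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096\tHG00097",
    "1\t100\t.\tA\tC\t.\tPASS\t.\tGT\t1|0\t0|1",
    "MT\t200\t.\tG\tA\t.\tPASS\t.\tGT\t1|1\t1|1",
]

missing, unmapped = check_control_file(INI, VCF)
print("Samples not found in VCF:", missing)
print("Contigs without a [MapMapping] entry:", unmapped)
```

A contig reported as unmapped is not necessarily an error: according to the comments in the [MapMapping] section, a map name without an entry is expected to match a map name in the database exactly.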

  2. Assume that the control file "add_GRCh37.p13_1000genomes.ini" is located in the directory where you installed PersephoneShell (e.g., C:\PersephoneShell).
  3. You can add the variants in either interactive or command-line mode (see Running PersephoneShell). In interactive mode, enter:

PS> add variants -c add_GRCh37.p13_1000genomes.ini

In command-line mode, the command is as follows (use the proper connection name after -s):

C:\PersephoneShell> psh -s ********** add variants -c add_GRCh37.p13_1000genomes.ini

A verification message will be displayed.

Note

Adding variants from a VCF file is usually a lengthy procedure. It requires the test mode (-t) to be executed first; during the test, the data from the VCF file is analyzed and compressed on a local disk. The subsequent load command then uses the binary blocks assembled during the test phase. Running the add command without a prior test run triggers the test automatically.