Sample information for genotyping studies can be loaded from either VCF or delimited text (e.g., CSV) files. A delimited text file should contain one sample per row, as shown below. Alternatively, you can transpose the data so that the rows become columns.

10101 1 1 1 MZ00023554 TC248295 TIGR:TC248295 TM00023554 GA Heat shock protein 82. {Oryza sativa;} ^|^PIR|S25541|S25541 heat shock protein 82 - rice (strain Taichung Native One) {Oryza sativa;} ^|^GB|BAD08897.1|42407751|AP003892 heat shock protein 82 {Oryza sativa (j-TRUNCATED- SP|P33126|HS82_ORYSA AZM4_39948 NA LOC_Os08g39140;LOC_Os09g30430;LOC_Os09g30450 3.6.4.9|2.7.1.- NA NA 1
10102 1 1 2 MZ00023408 TC248006 TIGR:TC248006 TM00023408 GA putative histone H2A {Oryza sativa (japonica cultivar-group);} GB|AAS75248.1|45680447|AC093921 AZM4_135834 NA LOC_Os12g34510;LOC_Os03g51200;LOC_Os01g31800 NA NA NA N/A
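If your table stores one sample per column instead, it can be transposed before loading. The following Python sketch uses only the standard library; the function name transpose_delimited and the small two-sample input are illustrative assumptions, not part of the loader.

```python
import csv
import io


def transpose_delimited(text: str, delimiter: str = "\t") -> str:
    """Turn a sample-per-column table into a sample-per-row table.

    Assumes every row has the same number of fields; zip() would
    silently drop extra fields from longer rows.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    # zip(*rows) yields the columns of the input, which become output rows
    writer.writerows(zip(*rows))
    return out.getvalue()


# Two samples stored as columns: names in the first row, descriptions in the second
wide = "S1\tS2\nheat shock protein 82\tputative histone H2A\n"
print(transpose_delimited(wide), end="")
```

The transposed text can then be saved and loaded with the one-sample-per-row settings described in the Sample section.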

ProcessRun Section

The ProcessRun section describes the data-loading run.

[ProcessRun]
; RunDescription: if specified, this custom description is used; otherwise, "Added sample information for {MapSet Accession No.}." is used.
RunDescription="Added sample information for GPL6438 from http://ncbi.nlm.nih.gov/geo/query/acc.cgi?view=data&acc=GPL6438&id=45515&db=GeoDb_blob21."
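The sketch below shows one way a script could resolve the run description with Python's configparser, assuming the configuration file is valid INI. The function name run_description is hypothetical; the fallback text mirrors the default stated in the RunDescription comment.

```python
import configparser


def run_description(config: configparser.ConfigParser, mapset_accession: str) -> str:
    """Return the configured RunDescription, or the documented default."""
    # configparser keeps the surrounding double quotes, so strip them off
    value = config.get("ProcessRun", "RunDescription", fallback="").strip('"')
    if value:
        return value
    return f"Added sample information for {mapset_accession}."


# interpolation=None stops '%' characters in URLs from being treated specially
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string('[ProcessRun]\n;RunDescription="Custom text"\n')
print(run_description(cfg, "GPL6438"))  # Added sample information for GPL6438.
```

Because the RunDescription line is commented out here, the default text is returned.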

MapSet Section

The MapSet section describes the map set your genotyping data belongs to. The following shows an example MapSet section.

[MapSet]
; Either MapSetId or MapSetPath is required.
; MapSetId: ID of the target map set.
;MapSetId=232287170
; MapSetPath: path of the target map set.
MapSetPath="/Zea mays/Corn annotation"
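A minimal Python sketch of the "either MapSetId or MapSetPath" rule, reading the section with configparser. The function target_mapset is hypothetical, and preferring MapSetId when both keys are present is an assumption; the loader's actual precedence is not documented here.

```python
import configparser


def target_mapset(config: configparser.ConfigParser) -> tuple:
    """Return ("id", value) or ("path", value) for the [MapSet] section."""
    section = config["MapSet"]
    # Assumption: MapSetId wins if both keys are present
    if "MapSetId" in section:
        return ("id", section["MapSetId"].strip('"'))
    if "MapSetPath" in section:
        return ("path", section["MapSetPath"].strip('"'))
    raise ValueError("[MapSet] requires either MapSetId or MapSetPath")


cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(
    "[MapSet]\n"
    ";MapSetId=232287170\n"
    'MapSetPath="/Zea mays/Corn annotation"\n'
)
print(target_mapset(cfg))  # ('path', '/Zea mays/Corn annotation')
```

Lines starting with ";" are comments for configparser by default, so the commented-out MapSetId is ignored.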


Sample Section

The Sample section defines the file type, the data source, and how the source is parsed.

[Sample]
; Source (required): a text file stored locally or accessible remotely via a URL.
Source="Samples\Sample\GPL6438-tbl-1.txt"
; NumberFormatCulture: the culture name used to parse numbers in the data. Default: en (English).
;  e.g. de (German), es (Spanish), fr (French). For other cultures, see https://msdn.microsoft.com/en-us/goglobal/bb896001.aspx
;NumberFormatCulture="fr"
; AddModes: how sample names, descriptions, and qualifiers are added. Choose one of:
;           1. AddAnyway: add regardless of duplicates. Faster, because no duplicate check is made.
;           2. AddOrDie: add if not present; otherwise, fail by throwing an exception.
;           3. AddOrUpdate: add if not present; otherwise, update the existing entry.
;           4. AddOrSkip: add if not present; otherwise, skip.
SampleNameAddMode=AddOrUpdate
SampleQualifierAddMode=AddOrUpdate
; FileType: {Text (delimited text file), Vcf}
FileType=Text
;------------------------------------------------------------------------------------
; Parsing Information
;------------------------------------------------------------------------------------
; SkipHeaderLines: the number of leading lines to skip before parsing
;TextSkipHeaderLines=0
; CommentPrefix: lines starting with this prefix are not parsed
;TextCommentPrefix="#"
; Delimiter: one of Colon(:), Comma(,), Period(.), Hyphen(-), SemiColon(;), Slash(/), Tab(\t), VerticalBar(|)
TextDelimiter=Tab
;-------------------------
; SampleNameIndex (required): column index (0-based) of sample names.
TextSampleNameIndex=5
; SampleDescriptionIndex: column index (0-based) of sample descriptions.
SampleDescriptionIndex=10
; SampleQualifierIndex: column index (0-based) of additional information stored as qualifiers.
; TextSampleQualifierIndex.INDEX(0-based)=NAME(,TYPE(,FORMAT))
;TextSampleQualifierIndex.3="Description"
;TextSampleQualifierIndex.4="Condition1","int"
;TextSampleQualifierIndex.5="Condition2","double","0.0000"
TextSampleQualifierIndex.4="Vendor ID"
TextSampleQualifierIndex.7="GenBank Accession"
TextSampleQualifierIndex.8="TIGR ID"
TextSampleQualifierIndex.15="EC"
TextSampleQualifierIndex.16="GO"
TextSampleQualifierIndex.18="Chromosome"
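To illustrate how the parsing parameters interact, here is a simplified Python sketch of reading a delimited sample file. The function parse_text_samples, the SampleRecord type, and the choices to skip blank lines and to ignore qualifier columns missing from a short row are assumptions for illustration; the loader itself may behave differently. The sketch also splits on the delimiter without handling quoted fields. The delimiter names come from the Delimiter comment above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Delimiter names accepted by TextDelimiter, mapped to their characters
DELIMITERS = {
    "Colon": ":", "Comma": ",", "Period": ".", "Hyphen": "-",
    "SemiColon": ";", "Slash": "/", "Tab": "\t", "VerticalBar": "|",
}


@dataclass
class SampleRecord:
    name: str
    description: Optional[str] = None
    qualifiers: Dict[str, str] = field(default_factory=dict)


def parse_text_samples(
    lines: Iterable[str],
    delimiter: str = "Tab",
    skip_header_lines: int = 0,
    comment_prefix: Optional[str] = None,
    name_index: int = 0,
    description_index: Optional[int] = None,
    qualifier_indices: Optional[Dict[int, str]] = None,
) -> List[SampleRecord]:
    sep = DELIMITERS[delimiter]
    qualifier_indices = qualifier_indices or {}
    records = []
    for line_no, line in enumerate(lines):
        if line_no < skip_header_lines:
            continue  # TextSkipHeaderLines: leading lines are not parsed
        line = line.rstrip("\r\n")
        if not line or (comment_prefix and line.startswith(comment_prefix)):
            continue  # blank lines and TextCommentPrefix lines are skipped
        cols = line.split(sep)
        record = SampleRecord(name=cols[name_index])
        if description_index is not None and description_index < len(cols):
            record.description = cols[description_index]
        # TextSampleQualifierIndex.N="Name": copy column N into qualifier "Name"
        for index, qualifier in qualifier_indices.items():
            if index < len(cols):
                record.qualifiers[qualifier] = cols[index]
        records.append(record)
    return records


rows = [
    "ID\tName\tDescription\tVendor",  # header line
    "# comment line",
    "1\tS1\tfirst sample\tV1",
    "2\tS2\tsecond sample",  # short row: no vendor column
]
for r in parse_text_samples(rows, skip_header_lines=1, comment_prefix="#",
                            name_index=1, description_index=2,
                            qualifier_indices={3: "Vendor ID"}):
    print(r)
```

With the configuration above, the call would use name_index=5, description_index=10, and the six TextSampleQualifierIndex entries as qualifier_indices.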

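The four add modes can be summarized with a small Python sketch in which an in-memory list stands in for the database. The AddMode enum and the add_sample function are hypothetical; only the mode names and their documented behavior come from the AddModes comment in the configuration.

```python
from enum import Enum
from typing import List


class AddMode(Enum):
    ADD_ANYWAY = "AddAnyway"
    ADD_OR_DIE = "AddOrDie"
    ADD_OR_UPDATE = "AddOrUpdate"
    ADD_OR_SKIP = "AddOrSkip"


def add_sample(store: List[list], name: str, value: str, mode: AddMode) -> None:
    """Add [name, value] to store according to mode; store stands in for the database."""
    if mode is AddMode.ADD_ANYWAY:
        store.append([name, value])  # no lookup, so duplicates are possible
        return
    existing = next((row for row in store if row[0] == name), None)
    if existing is None:
        store.append([name, value])
    elif mode is AddMode.ADD_OR_DIE:
        raise ValueError(f"{name!r} already exists")
    elif mode is AddMode.ADD_OR_UPDATE:
        existing[1] = value
    # ADD_OR_SKIP: the existing row is left unchanged


store: List[list] = []
mode = AddMode("AddOrUpdate")  # value as written in SampleNameAddMode
add_sample(store, "S1", "old description", mode)
add_sample(store, "S1", "new description", mode)
print(store)  # [['S1', 'new description']]
```

AddAnyway skips the lookup entirely, which is why the configuration comment describes it as the fastest mode.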
Sample information can also be parsed from VCF files.

The VCF example below contains ##SAMPLE meta-information lines in its header; each one corresponds to a sample name listed in the #CHROM line.

Note

The ID and Description parameters are defined in the VCF 4.2 specification.

##fileformat=VCFv4.2
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Allele Count">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##SAMPLE=<ID=HG00096,Gender=Male,Description="Donor, parents and grandparents were all born in the United Kingdom">
##SAMPLE=<ID=HG00097,Gender=Female,Description="Donor, parents and grandparents were all born in the United Kingdom">
##SAMPLE=<ID=HG00099,Gender=Female,Description="Donor, parents and grandparents were all born in the United Kingdom">
##SAMPLE=<ID=HG00100,Gender=Female,Description="Donor, parents and grandparents were all born in the United Kingdom">
##reference=GRCh37
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00099 HG00100
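A Python sketch of extracting the ##SAMPLE attributes and the sample names on the #CHROM line. It assumes the #CHROM line is tab-separated, as the VCF specification requires, and that quoted values contain no embedded double quotes; parse_vcf_samples is a hypothetical name.

```python
import re

# KEY=VALUE pairs inside <...>; a value is either "quoted" or runs to the next comma
ATTR = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_vcf_samples(header_lines):
    """Return ({sample ID: attributes}, [sample names from #CHROM])."""
    samples = {}
    chrom_samples = []
    for line in header_lines:
        line = line.rstrip("\r\n")
        if line.startswith("##SAMPLE=<") and line.endswith(">"):
            body = line[len("##SAMPLE=<"):-1]
            attrs = {key: value.strip('"') for key, value in ATTR.findall(body)}
            samples[attrs["ID"]] = attrs  # ID is required on every ##SAMPLE line
        elif line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM ... FORMAT); sample names follow
            chrom_samples = line.split("\t")[9:]
    return samples, chrom_samples


header = [
    "##fileformat=VCFv4.2",
    '##SAMPLE=<ID=HG00096,Gender=Male,Description="Donor, parents and grandparents '
    'were all born in the United Kingdom">',
    '##SAMPLE=<ID=HG00097,Gender=Female,Description="Donor">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096\tHG00097",
]
samples, names = parse_vcf_samples(header)
print(names)  # ['HG00096', 'HG00097']
print(sorted(set(names) - set(samples)))  # samples without a ##SAMPLE line: []
```

The quoted alternative in the pattern is tried first, so commas inside a Description value do not split it.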

Specify qualifiers other than ID or Description in the VcfSampleQualifierKey parameter as shown below.

[Sample]
Source="Samples/Sample/1000genomes.vcf"
;NumberFormatCulture="fr"
FileType=Vcf
;##SAMPLE=<ID=HG00096,Gender=Male,Description="Donor, parents and grandparents were all born in the United Kingdom">
VcfSampleNameKey="ID"
VcfSampleDescriptionKey="Description"
VcfSampleQualifierKey.Gender="gender"

The entry VcfSampleQualifierKey.Gender="gender" specifies which key on the ##SAMPLE line to parse out (Gender, as in 'Gender=Female') and the name of the qualifier created in the database to store that value (lowercase 'gender').
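The mapping can be sketched in Python as follows. The function map_vcf_sample is hypothetical and expects one sample's ##SAMPLE attributes already split into a dictionary. Because configparser lowercases option names by default, optionxform is set to str so that the 'Gender' part of the key keeps its case.

```python
import configparser
import csv

QUALIFIER_PREFIX = "VcfSampleQualifierKey."


def map_vcf_sample(attrs, section):
    """Map one sample's ##SAMPLE attributes to name, description, and qualifiers."""
    name_key = section.get("VcfSampleNameKey", "ID").strip('"')
    desc_key = section.get("VcfSampleDescriptionKey", "Description").strip('"')
    qualifiers = {}
    for option, value in section.items():
        if option.startswith(QUALIFIER_PREFIX):
            vcf_key = option[len(QUALIFIER_PREFIX):]  # e.g. "Gender"
            # csv handles the quoting in values like "total seq count",int
            db_name = next(csv.reader([value]))[0]  # e.g. "gender"
            if vcf_key in attrs:
                qualifiers[db_name] = attrs[vcf_key]
    return {"name": attrs.get(name_key),
            "description": attrs.get(desc_key),
            "qualifiers": qualifiers}


cfg = configparser.ConfigParser(interpolation=None)
cfg.optionxform = str  # keep "Gender" from being lowercased
cfg.read_string(
    "[Sample]\n"
    'VcfSampleNameKey="ID"\n'
    'VcfSampleDescriptionKey="Description"\n'
    'VcfSampleQualifierKey.Gender="gender"\n'
)
attrs = {"ID": "HG00096", "Gender": "Male", "Description": "Donor"}
print(map_vcf_sample(attrs, cfg["Sample"]))
# {'name': 'HG00096', 'description': 'Donor', 'qualifiers': {'gender': 'Male'}}
```

Samples whose ##SAMPLE line lacks a configured key simply get no value for that qualifier in this sketch.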

Sample qualifiers are shown in Persephone's "Select samples" form. Users can select qualifiers from the list on the right to add them as columns to the grid, as shown below:

Qualifiers store strings by default, but the 'int' and 'double' types are also supported. For example, to store an integer qualifier 'total seq count' read from the 'tsc' key on the ##SAMPLE line, use this construct:

VcfSampleQualifierKey.tsc="total seq count",int
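A Python sketch of splitting a NAME(,TYPE(,FORMAT)) qualifier specification and converting a raw value to the declared type. The "string" name for the default type, the decimal_sep parameter as a crude stand-in for NumberFormatCulture (it does not handle group separators), and ignoring FORMAT are all assumptions; the function names are hypothetical.

```python
import csv

# Assumption: "string" is the default when TYPE is omitted
CONVERTERS = {"string": str, "int": int, "double": float}


def parse_qualifier_spec(spec):
    """Split a NAME(,TYPE(,FORMAT)) specification into its three parts."""
    parts = next(csv.reader([spec], skipinitialspace=True))
    name = parts[0]
    qtype = parts[1] if len(parts) > 1 else "string"
    fmt = parts[2] if len(parts) > 2 else None  # FORMAT is parsed but not applied here
    return name, qtype, fmt


def convert_qualifier(raw, qtype, decimal_sep="."):
    """Convert a raw string value to the qualifier's declared type."""
    if qtype == "double" and decimal_sep != ".":
        # Crude stand-in for NumberFormatCulture, e.g. "3,5" under "fr"
        raw = raw.replace(decimal_sep, ".")
    return CONVERTERS[qtype](raw)


name, qtype, fmt = parse_qualifier_spec('"total seq count",int')
print(name, qtype, convert_qualifier("1520", qtype))  # total seq count int 1520
```

The same parser accepts both the unquoted form above and the fully quoted TextSampleQualifierIndex form such as "Condition2","double","0.0000".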