Adding QTLs
Once you have loaded markers and trait terms with PersephoneShell, you may need to add QTLs related to certain traits and markers. To add QTLs, use the add command with a control (INI) file that describes the QTL data file. This section shows how to create a QTL study and load its QTL data into your Persephone database.
Important
Please make sure the trait ontology is already loaded into your database. See Adding Ontology Terms for more information.
- Download the QTL data file you need from Gramene, for example "Rice_QTL.dat" at ftp://ftp.gramene.org/pub/gramene/archives/PAST_RELEASES/release44/data/qtl/Rice_QTL.dat. The first lines of the file are shown below.
qtl_accession_id qtl_name published_symbol to_accession trait_category trait_name trait_symbol chromosome start end
AQGJ017 BNL6.32 TO:0000543 Biochemical leaf nitrogen content LFNICN Chr. 1 10121499 10122731
AQO030 BNL6.32 TO:0000078 Vigor root dry weight RTDWT Chr. 1 37713253 37713775
..
- Let's assume that the QTL data file (e.g., "Rice_QTL.dat") is located in the directory where you installed PersephoneShell (e.g., C:\PersephoneShell).
- Load the QTL data by running the add command with the control file (here Rice_QTL.ini) in either interactive or command-line mode (see Running PersephoneShell). In interactive mode, enter the following:
PS> add qtl -c Rice_QTL.ini
In command-line mode, the command must include the connection name (specified after -s):
C:\PersephoneShell> psh -s ********** add qtls -c Rice_QTL.ini
A verification message will be displayed.
Depending on the StudyFromSourceHeader setting, the study information is read either from the [Study] section of the INI file or from the data file itself.
Reading Study from the INI file
The INI file:
[ProcessRun]
; RunDescription: if specified, a custom description will be used,
; otherwise, "Added QTLs for {MapSet Accession No.} from {Sources}." will be used.
RunDescription="Added QTL for IRGSP-1.0.25 from ftp://ftp.gramene.org/pub/gramene/archives/PAST_RELEASES/release44/data/qtl/Rice_QTL.dat."
[MapSet]
; Either MapSetId or MapSetPath is required.
; MapSetId: id of a target map set.
MapSetId=2
; MapSetPath: path of a target map set.
;MapSetPath="Oryza sativa (rice)/Rice annotation"
[Study]
;-------------------------
; To load more QTLs into an existing study, specify a StudyId.
; StudyId: id of a study.
; if specified, all the study information in this section will be ignored.
;StudyId=123456
;-------------------------
; Otherwise, create a study with
; StudyName (required): name of a study.
StudyName="Gramene Rice QTL"
; StudyDescription: description of a study.
StudyDescription=""
;-------------------------
; StudyFromSourceHeader: true/false(default), whether Study should be read from source file
StudyFromSourceHeader=false
; TreatmentName: name of treatment.
;TreatmentName=""
; TreatmentDescription: description of treatment.
;TreatmentDescription=""
[Qtl]
; Source (required): a TXT file located locally or remotely accessible via URL.
Source="o:\Qtl\Rice_QTL.dat.txt"
; Number format culture: specifies a culture name used to parse numbers in data. Default value is en - English.
; e.g. de - German, es - Spanish, fr - French. For more cultures, https://msdn.microsoft.com/en-us/goglobal/bb896001.aspx
;NumberFormatCulture="fr"
; FileType: {Text (delimited text file)|Excel}
FileType=Text
; Origin (required): database or institution that the QTL originates from
Origin="Gramene"
; SearchAliases: indicates if marker aliases will be looked up or not.
SearchAliases=true
; MappingMethod: method to map qtls in the file. e.g. MapQTL
; if not specified, 'Unknown' is used.
;MappingMethod="MapQTL"
; Coordinate system: 1 (one-based) / 0 (zero-based). Default value is 1.
CoordinateSystem=1
; Commit frequency: indicates how often the process commits qtls. Every N qtls.
CommitFrequency=1000
; RebuildIndex: indicates if Qtl indices are rebuilt or not. Default value is true.
;RebuildIndex=false
;-------------------------
; TrackName: track name to be displayed on the plate.
TrackName="Rice QTL"
; TrackDescription: track description shared across maps in the MapSet.
TrackDescription="Rice QTL"
;-------------------------
; AddModes: choose a mode to add mappings and qualifiers among
; 1. AddAnyway: Add regardless of duplication. Faster as it does not check.
; 2. AddOrDie: add if not exists; die (throw exception) otherwise.
; 3. AddOrUpdate: add if not exists; update otherwise.
; 4. AddOrSkip: add if not exists; skip otherwise.
QtlQualifierAddMode=AddOrUpdate
MappingAddMode=AddAnyway
MappingQualifierAddMode=AddOrUpdate
;-----------------------------------------------------------------------------
; Parsing Information
;-------------------------
; SkipHeaderLines: the number of lines to skip parsing
TextSkipHeaderLines=1
; CommentPrefix: comment prefix to skip parsing
;TextCommentPrefix="#"
; Delimiter: specify one among Colon(:), Comma(,), Period(.), Hyphen(-), SemiColon(;), Slash(/), Tab(\t), VerticalBar(|)
TextDelimiter=Tab
;-------------------------
; 1) QTL
; QtlNameIndex (required): column index(0-based) for Qtl name.
TextQtlNameIndex=0
; QtlAccessionIndex: column index(0-based) for Qtl accession.
TextQtlAccessionIndex=0
; QtlDescriptionIndex: column index(0-based) for Qtl description.
;TextQtlDescriptionIndex=4
;TextQtlQualifierIndex.COL_INDEX(0-based)=qualifierName((:displayText),dataType,dataFormat)
TextQtlQualifierIndex.5="Symbol:Trait Symbol"
TextQtlQualifierIndex.2="Alias"
;-------------------------
; 2) Trait
; TraitTermAccessionIndex (required): column index(0-based) for trait term accession.
TextTraitTermAccessionIndex=3
;-------------------------
; 3) Mapping
; MapNameIndex (required): column index(0-based) for map names.
TextMapNameIndex=7
; ExtStartIndex: column index(0-based) for extended start position for confidence interval.
;TextExtStartIndex=13
; StartIndex (required): column index(0-based) for start positions.
TextStartIndex=8
; PeakSkewnessIndex: column index(0-based) for peak skewness, [0,1].
;TextPeakSkewnessIndex=14
; EndIndex (required): column index(0-based) for end positions.
TextEndIndex=9
; ExtEndIndex: column index(0-based) for extended end position for confidence interval.
;TextExtEndIndex=13
; LODIndex: column index(0-based) for LOD score.
;TextLODIndex=7
;TextMappingQualifierIndex.COL_INDEX(0-based)=QUAL_NAME,DISPLAY_TEXT,DESCRIPTION
;TextMappingQualifierIndex.3="","",""
;-------------------------
; 4) Marker
; StartMarkerNameIndex: column index(0-based) for start marker names.
TextStartMarkerNameIndex=10
; EndMarkerNameIndex: column index(0-based) for end marker names.
TextEndMarkerNameIndex=11
[MapMapping]
; If no mapping is found in this section, it assumes that each MAP_NAME in file exactly matches a MAP_NAME in DB.
; If map names in file are different from those in DB, map each MAP_NAME in file to its corresponding MAP_NAME in DB.
; Otherwise, marker will be created without mapping.
;MAP_NAME in file=MAP_NAME in DB
"Chr. 1"=Chr1
"Chr. 2"=Chr2
"Chr. 3"=Chr3
"Chr. 4"=Chr4
"Chr. 5"=Chr5
"Chr. 6"=Chr6
"Chr. 7"=Chr7
"Chr. 8"=Chr8
"Chr. 9"=Chr9
"Chr. 10"=Chr10
"Chr. 11"=Chr11
"Chr. 12"=Chr12
"Chr.Sy"=ChrSy
"Chr.Un"=ChrUn
[DbSequences]
; The ID columns below are used in loading qtls.
; Oracle: If there is no sequence/trigger assigned to these columns, you must specify a sequence for them.
;PROCESS_RUN.RUN_ID=ID_SEQ
;STUDY.STUDY_ID=ID_SEQ
;STUDY_QUALIFIER.QUALIFIER_ID=ID_SEQ
;DESCRIPTION.DESCR_ID=ID_SEQ
;TRACK.TRACK_ID=ID_SEQ
;TRACK_STYLE.TRACK_STYLE_ID=ID_SEQ
;QTL.QTL_ID=ID_SEQ
;QTL_NAME.QTL_NAME_ID=ID_SEQ
;QTL_QUALIFIER.QUALIFIER_ID=ID_SEQ
;QTL_QUALIFIER_NAME.QUALIFIER_NAME_ID=ID_SEQ
;QUALIFIER_DISPLAY.QUAL_ID=ID_SEQ
;MK_MAPPING.MAPPING_ID=ID_SEQ
;MAPPING_QUALIFIER.MAPPING_QUALIFIER_ID=ID_SEQ
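The Text...Index settings in the INI file above are 0-based column positions in the data file, and the [MapMapping] entries translate map names in the file to map names in the database. The following Python sketch may help when filling in those values. It is not part of PersephoneShell: the helper names column_indices and map_mapping_entry are hypothetical, the header is assumed to be tab-delimited (matching TextDelimiter=Tab), and the rule for deriving the database map name (dropping dots and spaces) is only inferred from the sample [MapMapping] entries, so check it against the map names in your own database.

```python
def column_indices(header_line, delimiter="\t"):
    """Return {column name: 0-based index} for a delimited header line."""
    return {name.strip(): i for i, name in enumerate(header_line.split(delimiter))}


def map_mapping_entry(file_map_name):
    """Build one [MapMapping] line.

    Assumption: the DB map name is the file map name without dots and
    spaces, as in the sample entries ("Chr. 1" -> Chr1).
    """
    db_map_name = file_map_name.replace(".", "").replace(" ", "")
    return f'"{file_map_name}"={db_map_name}'


# Header of Rice_QTL.dat as shown earlier, assumed to be tab-separated.
HEADER = "\t".join([
    "qtl_accession_id", "qtl_name", "published_symbol", "to_accession",
    "trait_category", "trait_name", "trait_symbol", "chromosome",
    "start", "end",
])

cols = column_indices(HEADER)
for name, index in cols.items():
    print(index, name)

for file_map_name in ("Chr. 1", "Chr. 12", "Chr.Sy"):
    print(map_mapping_entry(file_map_name))
```

With this header, to_accession is column 3, chromosome is column 7, and start and end are columns 8 and 9, which agrees with TextTraitTermAccessionIndex, TextMapNameIndex, TextStartIndex and TextEndIndex in the sample INI file.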
Reading Study from the data file
If StudyFromSourceHeader is set to true, the main study information is read from the CSV data file. The file starts with a Study section whose header line is identified by the variable StudySectionSubstr under the [QtlHeader] section of the INI file. The Study section ends at the line that contains the header text of the next section, which holds the QTLs. Section headers commonly look like '-----Study-------' and '------QTL----'.
A sample data file:
----------------------------------Study-----------------------,,,,,,,,,,,,,,,,,,,,,,,,
Map Set UID,sampleMapSet,,,,,,,,,,,,,,,,,,,,,,,
Study UID,Test study 8,,,,,,,,,,,,,,,,,,,,,,,
Study Description,describe,,,,,,,,,,,,,,,,,,,,,,,
Locus Name / Alias,aa,,,,,,,,,,,,,,,,,,,,,,,
Locus Type,bb,,,,,,,,,,,,,,,,,,,,,,,
Project Code,cc,,,,,,,,,,,,,,,,,,,,,,,
Contact Unit,Institute-123,,,,,,,,,,,,,,,,,,,,,,,
Taxon,Test arabidopsis genotype,,,,,,,,,,,,,,,,,,,,,,,
Location,dd,,,,,,,,,,,,,,,,,,,,,,,
Study Date(s),ee,,,,,,,,,,,,,,,,,,,,,,,
Analysis Method,ff,,,,,,,,,,,,,,,,,,,,,,,
Population Type / Generation,gg,,,,,,,,,,,,,,,,,,,,,,,
Population size,222,,,,,,,,,,,,,,,,,,,,,,,
Parent A Tissue UID / Parent A Material,xx,,,,,,,,,,,,,,,,,,,,,,,
Parent B Tissue UID / Parent B Material,yy,,,,,,,,,,,,,,,,,,,,,,,
Study Condition,zz,,,,,,,,,,,,,,,,,,,,,,,
Reference,rr,,,,,,,,,,,,,,,,,,,,,,,
Empty qualifier,,,,,,,,,,,,,,,,,,,,,,,,
--------------------------------QTL------------------------------------------------,,,,,,,,,,,,,,,,,,,,,,,,
Map Name,QTL Name,QTL Significance,QTL Explanation of Variance,QTL Description of dominance,Trait Trait Name,Trait Ontology ID,"Trait Measurement criteria",Trait Heritibility,Marker Marker start,Marker Marker peak,Marker Marker end,Marker Position start,Marker position peak,Marker position end,Genotypes Allele Parent A,Genotypes Allele parent B,Phenotype mu Parent A,Phenotype mu Heteroz,Phenotype mu Parent B,Phenotype Intercept,Phenotype Additive Effect,Phenotype Dominance effect,Repository URL,Skewness
1,qtlA,33.01,17.5,Forked trusses is dominant,forked trusses,TO:0000707,# splitting clusters of the first 6 to 8 clusters,0,markerA,markerNew,markerNew2,0,11,22,G,A,1.86413,2.61951,4.73846,,-1.43717,-0.681784,0.6,
1,qtlB,3.01,17.5,TraitDescription,forked trusses,TO:0000707,# splitting clusters of the first 6 to 8 clusters,0,markerA,markerC,markerNew2,10,13,22,G,A,1.86413,2.61951,4.73846,,-1.43717,-0.681784,0.6,
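To preview how such a file divides into study information and QTL records before loading it, a small Python sketch along these lines can be used. It only illustrates the layout described above and is not how PersephoneShell itself parses the file. The function name split_study_and_qtl and the marker strings "-Study-" and "-QTL-" are assumptions chosen to match the sample section headers; the sample text in the sketch is a shortened version of the file above.

```python
import csv
import io


def split_study_and_qtl(text, study_marker="-Study-", qtl_marker="-QTL-"):
    """Split a combined CSV file into study key/value pairs and QTL rows.

    Assumption: a section header is any line whose first cell contains
    the given marker substring.
    """
    study, qtls = {}, []
    section, header = None, None
    for row in csv.reader(io.StringIO(text)):
        if not row or not any(cell.strip() for cell in row):
            continue  # skip blank lines
        first = row[0]
        if study_marker in first:
            section = "study"
        elif qtl_marker in first:
            section, header = "qtl", None
        elif section == "study":
            # First cell is the qualifier name, second cell its value.
            study[first.strip()] = row[1].strip() if len(row) > 1 else ""
        elif section == "qtl":
            if header is None:
                header = [cell.strip() for cell in row]  # column names
            else:
                qtls.append(dict(zip(header, (cell.strip() for cell in row))))
    return study, qtls


SAMPLE = """\
-----Study-----,,,
Study UID,Test study 8,,
Study Description,describe,,
Empty qualifier,,,
-----QTL-----,,,
Map Name,QTL Name,Trait Ontology ID,Marker Position start
1,qtlA,TO:0000707,0
1,qtlB,TO:0000707,10
"""

study, qtls = split_study_and_qtl(SAMPLE)
print(study)
for qtl in qtls:
    print(qtl["QTL Name"], qtl["Trait Ontology ID"], qtl["Marker Position start"])
```

Requiring the surrounding dashes in the marker keeps ordinary rows such as "Study UID" from being mistaken for a section header.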