This section describes how to add markers from delimited text files to your Persephone database with the add command (see Add). The steps below use the add command with the control file (see Control Files) "add_Bdistachyon_Bd3-1xBd21.ini" to load markers for Brachypodium distachyon (see the table below).

In general Persephone terms, a marker is a named location on a map. It can be a single position or an interval. A marker must have at least one name, which is considered its primary name; additional aliases are optional. Each name has a name type, which helps to distinguish different naming systems.

A marker can also have associated sequences: one or more probe sequences, primers, sequences of related proteins, etc.

The Persephone database has separate tables for names and sequences. Other marker properties are stored as key-value pairs called qualifiers. Both the key and the value are stored as strings, but the value can be given a different type, for example, a floating point number. This helps in sorting and filtering qualifiers by their values.
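The effect of typing a qualifier value can be sketched in a few lines of Python. This is only an illustration of why a declared type matters for sorting; the tuple layout, the marker names m1 to m3, and the CASTS table are assumptions made for the example and do not reflect the actual Persephone schema.

```python
# Illustrative only: this layout is NOT the Persephone database schema.
# Each row: (marker, qualifier key, value stored as a string, declared type).
QUALIFIERS = [
    ("m1", "OTV", "10", "int"),
    ("m2", "OTV", "9", "int"),
    ("m3", "OTV", "100", "int"),
]

# Data types named in the control file comments: string, int, double.
CASTS = {"string": str, "int": int, "double": float}


def typed_value(row):
    """Convert the stored string to its declared type."""
    _, _, value, data_type = row
    return CASTS[data_type](value)


# Sorting the raw strings orders them alphabetically.
by_string = [row[0] for row in sorted(QUALIFIERS, key=lambda row: row[2])]
# Sorting the typed values orders them numerically.
by_number = [row[0] for row in sorted(QUALIFIERS, key=typed_value)]
```

Without the declared type, "100" sorts before "9"; with it, the values sort as numbers.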

Note

This section describes how to load markers with Delimited TXT files. See Add Markers in the Use Case section for steps to load markers from GFF files.

Species | Marker Name | Marker Type | Population | Linkage Group | Position (cM)
Brachypodium distachyon | BdSSR544 | SSR | Bd3-1 x Bd21 | a | 0
Brachypodium distachyon | CD725461 | COS | Bd3-1 x Bd21 | a | 15.9
Brachypodium distachyon | Wheat61 | COS | Bd3-1 x Bd21 | a | 23.4
Brachypodium distachyon | INTR2-8 | COS | Bd3-1 x Bd21 | a | 35.8

Loading markers with mapping

Review a copy of the "add_Bdistachyon_Bd3-1xBd21.ini" control file, which is included in "Samples\Marker" folder and is shown below. Please note that the data in the table above corresponds to the data in the INI file below. The column index values specify which column contains which category. For example,

TextMapNameIndex=4

tells PersephoneShell that the fifth column (index 4) contains map names.

Another version of column index instruction can be written as:

TextMarkerNameIndex.1=FULL_NAME

This means that the column with index 1 contains marker names of type "FULL_NAME". A marker can have multiple names, and each name has its own type, depending on the naming system used.
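To make the 0-based column indices concrete, the following Python sketch reads two rows of the table above as tab-delimited text and picks out the columns named by the index keys in the control file. It is a minimal illustration, not how PersephoneShell parses the file; the function read_markers and the COLUMNS dictionary are hypothetical names introduced for this example.

```python
import csv
import io

# Two rows from the table above, written as tab-delimited text with a header.
SAMPLE = (
    "Species\tMarker Name\tMarker Type\tPopulation\tLinkage Group\tPosition (cM)\n"
    "Brachypodium distachyon\tBdSSR544\tSSR\tBd3-1 x Bd21\ta\t0\n"
    "Brachypodium distachyon\tCD725461\tCOS\tBd3-1 x Bd21\ta\t15.9\n"
)

# Index values as used in the control file:
# TextMarkerNameIndex.1, TextMarkerTypeIndex=2, TextMapNameIndex=4, TextStartIndex=5
COLUMNS = {"marker_name": 1, "marker_type": 2, "map_name": 4, "start": 5}


def read_markers(text, columns, skip_header_lines=1):
    """Return one dict per data line, keyed by the field names in `columns`."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    data_rows = rows[skip_header_lines:]  # mirrors TextSkipHeaderLines=1
    return [
        {field: row[index] for field, index in columns.items()}
        for row in data_rows
    ]


markers = read_markers(SAMPLE, COLUMNS)
```

Index 0 (Species) and index 3 (Population) are simply not referenced, so those columns are ignored in this sketch.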

The markers and their positions should be listed in a file (tab-delimited text or GFF). The positions reference the maps loaded in the previous step. It is also possible to derive the list of maps from the file with marker positions. In this case, the start and end coordinates of each map are based on the lowest and the highest marker position listed for it in that file. The example below assumes that the maps have already been created in the database under a map set called "Bd3-1 x Bd21 - 7".
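The rule for deriving map coordinates from marker positions can be sketched as follows. This is an assumption-laden illustration of the lowest/highest rule described above, not PersephoneShell code; derive_map_extents is a hypothetical helper name.

```python
def derive_map_extents(mappings):
    """Return {map_name: (lowest, highest)} from (map_name, position) pairs."""
    extents = {}
    for map_name, position in mappings:
        # Start from the current extent, or from the position itself
        # the first time a map name is seen.
        low, high = extents.get(map_name, (position, position))
        extents[map_name] = (min(low, position), max(high, position))
    return extents


# Linkage group "a" with the four positions (cM) from the table above.
mappings = [("a", 0.0), ("a", 15.9), ("a", 23.4), ("a", 35.8)]
extents = derive_map_extents(mappings)
```

For the sample data, linkage group "a" would span from 0 to 35.8 cM.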

[ProcessRun]
; Run description: if specified, a custom description will be used. Will be ignored if a RunId is specified.
;                  otherwise, "Added markers for {MapSet Accession No.} from {Sources}." will be used.
RunDescription="Added genetic markers for Brachypodium distachyon - population:Bd3-1 x Bd21 from http://pgdbj.jp/kazusa/jsp/mapSelect.do?change=Population&crop_id=9&population_id=10&gene_or_physical=1."

[MapSet]
; Either MapSetId or MapSetPath is required.
; MapSetId: id of a target map set.
;MapSetId=270702459
; MapSetPath: path of a target map set.
MapSetPath="/Brachypodium/Bd3-1 x Bd21 - 7"

[MarkerType]
; To add or update a marker type, specify a type name and description.
;TypeName=Description
SSR="Simple Sequence Repeat."
COS="Conserved Ortholog Set."
IFLP="Intron Fragment Length Polymorphism."
Indel="Insertion/Deletion."

[MarkerNameType]

[MarkerSequenceType]

[Marker]
; Source (required): a TXT file or GFF file located locally or remotely accessible via URL.
Source="Samples/Marker/Bdistachyon_Bd3-1xBd21.txt"
; FileType: {Text (delimited text file)|Gff}
FileType=Text
; Origin (required): database or institution that the file originates from
Origin="PGDB"
; CoordinateSystem (required): 1 (one-based) / 0 (zero-based). Default value is 1.
CoordinateSystem=0
; MappingMethod: method to map markers in the file. e.g. BLAST, RepeatMasker
;                if not specified, 'Unknown' is used.
;MappingMethod=""
; Tracks (required): comma delimited. Corresponding section(s) named the same must exist
;                    At least one track needs to be specified.
Tracks="Markers"
; SourceOrganismId: markers are suggested to be unique in a source organism.
;                   Specify a source organism if you want to lookup markers belonging to the organism.
;                   Otherwise, inferred by target MapSet.
;SourceOrganismId=1534
; BypassLookup: A marker name-id dictionary is built to check for duplicate markers.
;               For each marker, the program checks if a marker with an identical name and source organism is already present in the db.
;               By default, a name=>id dictionary is built internally to speed up the lookup.
;               This may result in increased memory usage.
;               To bypass this step, set BypassLookup to true. Default value is false.
;BypassLookup=false

; SearchAliases: markers are suggested to be unique in a source organism.
;                To check duplication, a marker name-id dictionary will be built.
;                Indicates if other names besides primary name given marker ID are searched or not.  Default value is false.
;SearchAliases=false
; RebuildIndex: indicates if marker indices are rebuilt or not. Default value is false.
;RebuildIndex=true
; Commit frequency: indicates how often the process commits markers. Every N markers.
CommitFrequency=1000

[Markers]
; TrackName: track name to be displayed on the plate.
TrackName="Markers"
; TrackDescription: track description shared across maps in the MapSet.
TrackDescription="Bd3-1xBd21 markers."
; TrackType: choose one among
;            1. GENERIC_BP_TRACK: markers for physical maps.
;            2. MARKER_TRACK: markers for genetic maps.
;            3. HEAT_TRACK: markers for physical maps. Heatmap of physical distance.
;            4. DENSE_BP_MARKER_TRACK: dense marker track for physical maps. Heatmap will not work.
;            5. CYTOBAND_TRACK: cytoband markers. Each marker has to have 'GieStain' qualifier.    
TrackType=MARKER_TRACK
; TrackColor: {NamedColor|HTML hex code|R,G,B} - color of marker glyphs
TrackColor=Cyan
; PrimaryMarkerNameType: markers can have multiple names, each name has its type, one of the types should store the primary name;
; The primary name is shown in the marker's label
PrimaryMarkerNameType=FULL_NAME
; AddModes: choose a mode to add name, mapping, qualifiers or sequence among
;           1. AddAnyway: Add regardless of duplication. Faster as it does not check.
;           2. AddOrDie: add if not exists; die (throw exception) otherwise.
;           3. AddOrUpdate: add if not exists; update otherwise.
;           4. AddOrSkip: add if not exists; skip otherwise.
MarkerNameAddMode=AddOrSkip
MarkerSequenceAddMode=AddOrSkip
MarkerQualifierAddMode=AddOrUpdate
MappingAddMode=AddAnyway
MappingQualifierAddMode=AddOrUpdate

; Commands for text-delimited format are below
; TextSkipHeaderLines: the number of lines to skip parsing
TextSkipHeaderLines=1
; CommentPrefix: comment prefix to skip parsing
;TextCommentPrefix="##"
; Delimiter: specify one among Colon(:), Comma(,), Period(.), Hyphen(-), SemiColon(;), Slash(/), Tab(\t), VerticalBar(|)
TextDelimiter=Tab
; Either marker type or marker type index should be provided.
; MarkerType: specify a marker type. (single). All markers will have the same type specified here.
;MarkerType="SSR"
; MarkerTypeIndex: column index(0-based) for marker types. (multiple). Each marker can have its own type.
; Each type should exist in the database or described in this control file in section [MarkerType]
TextMarkerTypeIndex=2
; MapNameIndex (required for mapping): column index(0-based) for map names.
TextMapNameIndex=4
; StartIndex (required for mapping): column index(0-based) for start positions.
TextStartIndex=5
; EndIndex: column index(0-based) for end positions. Nullable for point markers.
;TextEndIndex=5

; TextMappingScoreIndex: 0-based column index for mapping score
;TextMappingScoreIndex=6

; TextMarkerNameIndex (required): column index(0-based) for marker names.
; The following means that column 1 (second from the left) contains marker name of type "FULL_NAME"
TextMarkerNameIndex.1=FULL_NAME
;TextMarkerNameIndex.2=ALIAS

; TextMarkerSequenceIndex: column index(0-based) for a marker sequence.
; The following means that the column 10 contains sequence of type "ASSAY_SEQ"
;TextMarkerSequenceIndex.10=ASSAY_SEQ

; TextFilterIndex: column index(0-based) for filters delimited comma.
;              if not specified, all the items will be included.
;TextFilterIndex=0
; TextFilterValues: only lines containing one of these values (separated by comma) in the column specified above will be considered
;TextFilterValues="Brachypodium distachyon"

; Qualifiers: used to add additional information.
; TextMarkerQualifierIndex.COL_INDEX(0-based)=qualifierName((:displayText),dataType,dataFormat)
; The following means that text in column 3 will be stored as qualifier "alleles"
;TextMarkerQualifierIndex.3=alleles
; The following means that column 14 will be stored as qualifier "OTV", shown to the end users as "Off-Target Variants"
; with the type of 'int' (recognized data types: string, int, double)
;TextMarkerQualifierIndex.14=OTV:Off-Target Variants,int

;TextMappingQualifierIndex.COL_INDEX(0-based)=qualifierName((:displayText),dataType,dataFormat)
; Qualifiers for mapping positions (as opposed to marker qualifiers - a marker can have several mappings)
;TextMappingQualifierIndex.12=uniqueMapping

[MapMapping]
; LoadListedMapsOnly: if true, only maps listed below will be loaded, otherwise psh will try to match map_names
; not listed here to MAP_NAME, MAP_ID or ACCESSION_NO. The map will be automatically skipped if no match is found.
;LoadListedMapsOnly=true
; If no mapping is found in this section, it assumes that each MAP_NAME in file exactly matches a MAP_NAME in DB.
; If map names in file are different from those in DB, map each MAP_NAME in file to its MAP_NAME, MAP_ID or ACCESSION_NO in DB.
; Otherwise, marker will be created without mapping.
;MAP_NAME in file=MAP_NAME, MAP_ID or ACCESSION_NO in DB
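
The four add modes listed in the control file comments (AddAnyway, AddOrDie, AddOrUpdate, AddOrSkip) can be pictured with the small Python sketch below. It models a store as a dictionary of lists and is only an interpretation of the comments above; apply_add_mode is a hypothetical function, and the real database behavior may differ in detail.

```python
def apply_add_mode(store, key, value, mode):
    """Apply one add mode to `store`, a dict mapping a key to a list of values."""
    exists = key in store
    if mode == "AddAnyway":
        # No duplicate check: the value is always appended.
        store.setdefault(key, []).append(value)
    elif mode == "AddOrDie":
        if exists:
            raise KeyError(f"{key!r} already exists")
        store[key] = [value]
    elif mode == "AddOrUpdate":
        # Add if missing; otherwise replace what is there.
        store[key] = [value]
    elif mode == "AddOrSkip":
        if not exists:
            store[key] = [value]
    else:
        raise ValueError(f"unknown add mode: {mode}")
    return store


store = {}
apply_add_mode(store, "alleles", "A/G", "AddOrSkip")    # added
apply_add_mode(store, "alleles", "C/T", "AddOrSkip")    # skipped, key exists
apply_add_mode(store, "alleles", "C/T", "AddOrUpdate")  # replaced
apply_add_mode(store, "alleles", "C/T", "AddAnyway")    # duplicate appended
```

The last call shows why AddAnyway is described as faster but can create duplicates: it never checks what is already stored.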

Loading markers without mapping

Normally, the 'add markers' command loads markers together with their positions on existing maps, either genetic or physical. In this case, the [MapSet] section is required and must reference a map set already present in the database. Some tasks, however, involve updating existing markers with extra qualifiers or loading new unmapped markers. For this mode, an alternative form of the INI file is used, one that does not reference a target map set and carries no other map- or track-related information. The LoadUpdateMarkersOnly key switches between loading markers with and without mapping. When its value is true, PersephoneShell does not look for a target map set and reads the parsing instructions from the [Marker] section.

[ProcessRun]
; Run description: if specified, a custom description will be used. Will be ignored if a RunId is specified.
;                  otherwise, "Added markers for {MapSet Accession No.} from {Sources}." will be used.
RunDescription="Updated wheat 280K markers from http://journals.plos.org/plosone/article/file?type=supplementary&id=info:doi/10.1371/journal.pone.0186329.s001"

[MapSet]
; LoadUpdateMarkersOnly: if true, no mapping is expected. You can use this file to upload additional qualifiers to existing markers
LoadUpdateMarkersOnly=true

[MarkerType]
; To add or update a marker type, specify a type name and description.
;TypeName=Description
SNP="Single Nucleotide Polymorphism"

[MarkerNameType]
FULL_NAME="Full name"

[MarkerSequenceType]
;ASSAY_SEQ="Assay sequence"

[Marker]
; Source (required): a TXT file or GFF file located locally or remotely accessible via URL.
Sources="r:\wheat\Supplemental Table S2.txt"
; Number format culture: specifies a culture name used to parse numbers in data. Default value is en - English.
;                        e.g. de - German, es - Spanish, fr - French. For more cultures, https://msdn.microsoft.com/en-us/goglobal/bb896001.aspx
;NumberFormatCulture="fr"
; FileType: {Text (delimited text file)|Gff}
FileType=Text

; Origin (required): database or institution that the file originates from
Origin="IWGSC"
; SourceOrganismId: markers are suggested to be unique in a source organism.
;                   Specify a source organism if you want to lookup markers belonging to the organism.
;                   Otherwise, inferred by target MapSet.
SourceOrganismId=4565
; BypassLookup: A marker name-id dictionary is built to check for duplicate markers.
;               For each marker, the program checks if a marker with an identical name and source organism is already present in the db.
;               By default, a name=>id dictionary is built internally to speed up the lookup.
;               This may result in increased memory usage.
;               To bypass this step, set BypassLookup to true. Default value is false.
BypassLookup=false
; SearchAliases: Indicates if other names besides primary name is searched or not.  Default value is false.
;SearchAliases=false
; RebuildIndex: indicates if marker indices are rebuilt or not. Default value is false.
;RebuildIndex=true
; Commit frequency: indicates how often the process commits markers. Every N markers.
CommitFrequency=1000

; PrimaryMarkerNameType: markers can have multiple names, each name has its type, one of the types should store the primary name;
; The primary name is shown in the marker's label
PrimaryMarkerNameType=FULL_NAME
; AddModes: choose a mode to add name, mapping, qualifiers or sequence among
;           1. AddAnyway: Add regardless of duplication. Faster as it does not check.
;           2. AddOrDie: add if not exists; die (throw exception) otherwise.
;           3. AddOrUpdate: add if not exists; update otherwise.
;           4. AddOrSkip: add if not exists; skip otherwise.
MarkerNameAddMode=AddOrSkip
MarkerSequenceAddMode=AddOrSkip
MarkerQualifierAddMode=AddOrSkip
; SkipHeaderLines: the number of lines to skip parsing
TextSkipHeaderLines=1
; TextCommentPrefix: comment prefix to skip parsing
;TextCommentPrefix="##"
; Delimiter: specify one among Colon(:), Comma(,), Period(.), Hyphen(-), SemiColon(;), Slash(/), Tab(\t), VerticalBar(|)
TextDelimiter=Tab
; Either marker type or marker type index should be provided.
; MarkerType: specify a marker type. (single). All markers will have the same type specified here.
;MarkerType="SSR"
; TextMarkerTypeIndex: column index(0-based) for marker types. (multiple). Each marker can have its own type.
; Each type should exist in the database or described in this control file in section [MarkerType]
TextMarkerTypeIndex=2

; TextMarkerNameIndex (required): column index(0-based) for marker names.
; The following means that column 1 (second from the left) contains marker name of type "FULL_NAME"
TextMarkerNameIndex.1=FULL_NAME
;TextMarkerNameIndex.2=ALIAS

; TextMarkerSequenceIndex: column index(0-based) for a marker sequence.
; The following means that the column 10 contains sequence of type "ASSAY_SEQ"
;TextMarkerSequenceIndex.10=ASSAY_SEQ

; TextFilterIndex: column index(0-based) for filters delimited comma.
;              if not specified, all the items will be included.
;TextFilterIndex=0
; TextFilterValues: only lines containing one of these values (separated by comma) in the column specified above will be considered
;TextFilterValues="Brachypodium distachyon"

; Qualifiers: used to add additional information.
; TextMarkerQualifierIndex.COL_INDEX(0-based)=qualifierName((:displayText),dataType,dataFormat)
; The following means that text in column 3 will be stored as qualifier "alleles"
;TextMarkerQualifierIndex.3=alleles
; The following means that column 14 will be stored as qualifier "OTV", shown to the end users as "Off-Target Variants"
; with the type of 'int' (recognized data types: string, int, double)
;TextMarkerQualifierIndex.14=OTV:Off-Target Variants,int
TextMarkerQualifierIndex.0=ProbeSetId
TextMarkerQualifierIndex.8=Category
TextMarkerQualifierIndex.9=HET_#,int
TextMarkerQualifierIndex.10=AA_#,int
TextMarkerQualifierIndex.11=BB_#,int
TextMarkerQualifierIndex.12=NA_#,int
TextMarkerQualifierIndex.13=OTV:Off-Target Variants,int
TextMarkerQualifierIndex.14=PIC:Polymorphism Information Content,double

[MapMapping]
; If no mapping is found in this section, it assumes that each MAP_NAME in file exactly matches a MAP_NAME in DB.
; If map names in file are different from those in DB, map each MAP_NAME in file to its MAP_ID or ACCESSION_NO in DB.
; Otherwise, marker will be created without mapping.
;MAP_NAME in file=MAP_ID or ACCESSION_NO in DB
;LoadListedMapsOnly=true


[DbSequences]
; The ID columns below are used in loading markers.
; If there is no sequence/trigger assigned to these columns, you must specify a sequence for them.
;PROCESS_RUN.RUN_ID=ID_SEQ
;ANALYSIS.ANALYSIS_ID=ID_SEQ
;MARKER_TYPE.MARKER_TYPE_ID=ID_SEQ
;DESCRIPTION.DESCR_ID=ID_SEQ
;TRACK.TRACK_ID=ID_SEQ
;TRACK_STYLE.TRACK_STYLE_ID=ID_SEQ
;MARKER.MARKER_ID=ID_SEQ
;MARKER_NAME.MARKER_NAME_ID=ID_SEQ
;MARKER_NAME_TYPE.MARKER_NAME_TYPE_ID=ID_SEQ
;MARKER_SEQUENCE.MARKER_SEQUENCE_ID=ID_SEQ
;MARKER_SEQUENCE_TYPE.MARKER_SEQUENCE_TYPE_ID=ID_SEQ
;MARKER_QUALIFIER.QUALIFIER_ID=ID_SEQ
;MARKER_QUALIFIER_NAME.QUALIFIER_NAME_ID=ID_SEQ
;QUALIFIER_DISPLAY.QUAL_ID=ID_SEQ
;MK_MAPPING.MAPPING_ID=ID_SEQ
;MAPPING_QUALIFIER.MAPPING_QUALIFIER_ID=ID_SEQ
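
The qualifier instructions in the control file above follow the pattern qualifierName((:displayText),dataType,dataFormat). The Python sketch below shows one way to read such a value. It is a hedged illustration: parse_qualifier_spec is a hypothetical helper, and the choices to default the type to "string" and to assume that the display text contains no comma are assumptions, not documented PersephoneShell behavior.

```python
def parse_qualifier_spec(spec):
    """Split 'name(:displayText)(,dataType)(,dataFormat)' into its parts."""
    parts = [part.strip() for part in spec.split(",")]
    # The first part holds the name and, optionally, the display text.
    name, _, display = parts[0].partition(":")
    # Assumption: a missing type means the value stays a plain string.
    data_type = parts[1] if len(parts) > 1 and parts[1] else "string"
    data_format = parts[2] if len(parts) > 2 and parts[2] else None
    return {
        "name": name.strip(),
        "display": display.strip() or None,
        "type": data_type,
        "format": data_format,
    }


otv = parse_qualifier_spec("OTV:Off-Target Variants,int")
het = parse_qualifier_spec("HET_#,int")
category = parse_qualifier_spec("Category")
```

Read this way, "OTV:Off-Target Variants,int" yields the qualifier name OTV, the display text "Off-Target Variants", and the type int, which matches the explanation given in the control file comments.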

Copy the control file "add_Bdistachyon_Bd3-1xBd21.ini" to the directory where you installed PersephoneShell (e.g., C:\PersephoneShell).

Add the markers in interactive or command-line mode (see Running PersephoneShell). In interactive mode, enter the following:

PS> add markers -c add_Bdistachyon_Bd3-1xBd21.ini

In command-line mode, enter the following (use the proper connection name after -s):

C:\PersephoneShell> psh -s ********** add markers -c add_Bdistachyon_Bd3-1xBd21.ini

A verification message will be displayed.

As usual, run the command with the '-t' switch first to test the files before loading.