Genetic maps (linkage groups) contain tracks with markers whose positions are based on genetic linkage and are measured in centimorgans (cM).

A one-step loading procedure (recommended)

The simplest way to load genetic maps is from a file with marker positions. Each line should provide the map name, the marker name and the marker's mapping coordinate. The size of each map is calculated from the lowest and highest marker coordinates on that map. The command for loading the genetic maps is:

add genetic_map -c <controlfile> [-t|-v]

The INI control file should contain the information about the new map set and the instructions for parsing the marker mapping information. As with the INI files for adding sequences, the control files for genetic maps should contain the sections [MapSet] and [MapSetTree]. The data needed to create a new map set is provided in the section [MapSet]. This includes OrganismId, DisplayName, AccessionNo, etc.


[MapSet]
;------------------------------------------------------------------------------------------------
; Adding new MapSet
;------------------------------------------------------------------------------------------------
; Organism ID (required): organism ID should exist.
OrganismId=4565
; Display name (required): a name shown in MapSetTree
DisplayName="ARE_050606"
; Description: by default, organism name + display name.
Description="Genetic map from https://urgi.versailles.inra.fr/GnpMap/mapping/id.do?action=MAP&id=62"
; AccessionNo: accession of the genome build. See http://ncbi.nlm.nih.gov/genome
AccessionNo="ARE_050606"
; Source ID: database or institution that the MapSet/sequence originates from
SourceId="URGI_INRA"

; MapOrder: Order maps by different methods. 
; Natural: using 'natural compare' method. The digits are grouped and ordered as numbers, so 10 goes after 2 (Default)
; Alphabetical: a regular sort in English
; Original: do not sort, preserve the original order of maps from file
;MapOrder=Alphabetical


Note that by default the maps are ordered using the 'natural compare' method: digits are grouped and treated as numbers, so Chr1, Chr2 and Chr10 are sorted properly. This automatic sorting can be changed by setting the MapOrder instruction. This ordering is specific to genetic maps. In Persephone, sequence-based (physical) maps are listed according to whether a map is a chromosome or not: chromosomes are always shown first, in their original order in the file, and the other sequences (non-chromosomes) are sorted by size in descending order.
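
The 'natural compare' ordering described above can be illustrated with a minimal Python sketch. This is not PersephoneShell's implementation; natural_key is a hypothetical helper, and the map names are taken from the example in the text.

```python
import re

def natural_key(name):
    """Split a name into text and digit runs so that digit runs compare as numbers.

    Hypothetical helper for illustration; not part of PersephoneShell.
    """
    parts = re.split(r"(\d+)", name)
    # Tag each part so that text and numbers are never compared directly.
    return [(1, int(p)) if p.isdigit() else (0, p.lower()) for p in parts if p != ""]

maps = ["Chr10", "Chr2", "Chr1"]

# Plain alphabetical sort puts Chr10 before Chr2.
print(sorted(maps))
# Natural sort treats 10 as a number, so it goes after 2.
print(sorted(maps, key=natural_key))
```

With this key, Chr10 sorts after Chr2, whereas a plain alphabetical sort places it between Chr1 and Chr2.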

To place the map set in the map set tree we need the section [MapSetTree]:


[MapSetTree]
;------------------------------------------------------------------------------------------------
; 2. Adding new MapSetTree node to a parent node
;------------------------------------------------------------------------------------------------
; Parent node ID: if specified, the MapSet with the new sequences will be placed under this parent node as a child.
;ParentNodeId=42
;------------------------------------------------------------------------------------------------
; 3. Adding new MapSetTree node under a new root node
;------------------------------------------------------------------------------------------------
; Root node name: usually an organism name. Ignored if the root name already exists.
RootNodeName="/Triticum aestivum"
; Root node order number: order of the root node in the MapSetTree. By default, 0.
;RootNodeOrderNo=0

Each newly added marker should have a marker type, a marker name type and, optionally, a marker sequence type. These are defined in the following sections:


[MarkerType]
; To add or update a marker type, specify a type name and description.
;TypeName=Description
;DArT=Diversity Arrays Technology
PCR-SSR=PCR-SSR
SSR=Simple Sequence Repeat
1D=unknown

[MarkerNameType]
; Markers can have multiple names. Add new name types in this section by providing the type and its description
;BristolAffyCode=Bristol Affy Code
;ALIAS="Alias"
;PROBESET_ID=ID for Axiom array
;Generated=Generated by psh
FULL_NAME=Full name

[MarkerSequenceType]
;ASSAY_SEQ="Assay sequence"

If the new markers use types that do not exist yet, introduce them in the corresponding sections shown above.

The information needed to correctly parse the marker mapping data, such as which column contains map or marker names, is provided in the section [Marker]. Please read the comments for each line.


[Marker]
; Source (required): a TXT file on local disk or remotely accessible via URL.
Source=$DATA/wheat/genetic/ARE_050606.gen  

; CoordinateSystem: 1 (one-based) / 0 (zero-based). Default value is 0.
;CoordinateSystem=1
; MappingMethod: method to map markers in the file. e.g. BLAST, RepeatMasker
;                if not specified, 'Unknown' is used.
;MappingMethod="bionano"
; SourceOrganismId: marker names are expected to be unique within a source organism.
;                   Specify a source organism if you want to look up markers belonging to that organism.
;                   Otherwise, it is inferred from the target MapSet.
;SourceOrganismId=1534
; BypassLookup: By default, a marker name-to-ID dictionary is built to check for duplicates.
;               To bypass this step, set BypassLookup to true. Default value is false.
;BypassLookup=
; SearchAliases: Indicates whether names other than the primary name are searched. Default value is false.
;SearchAliases=false
; NamePrefixesForLookup: a lookup table is built in memory from the marker names stored in the database to speed up finding the existing markers. 
; If a marker with identical name and OrganismId is found, its MarkerId will be reused. Building the lookup can be sometimes tricky:
; the size of the lookup can be prohibitively large. 
; To reduce the lookup table size, PersephoneShell will try to find the common name prefix in the list of the new markers
; and use it to filter the lookup table. If the common prefix of the markers to be loaded is known in advance, it can be defined here.
;NamePrefixesForLookup=XB,X

; Commit frequency: indicates how often the process commits markers. Every N markers.
CommitFrequency=1000

; TrackName: track name to be displayed on the plate.
TrackName="Markers"
; TrackDescription: track description shared across maps in the MapSet.
TrackDescription="Genetic map from https://urgi.versailles.inra.fr/GnpMap/mapping/id.do?action=MAP&id=62"

; TrackColor: {NamedColor|HTML hex code|R,G,B}
;TrackColor=255,0,0
PrimaryMarkerNameType=FULL_NAME

; AddModes: choose a mode to add name, mapping, qualifiers or sequence among
;           1. AddAnyway: Add regardless of duplication. Faster as it does not check.
;           2. AddOrDie: add if not exists; die (throw exception) otherwise.
;           3. AddOrUpdate: add if not exists; update otherwise.
;           4. AddOrSkip: add if not exists; skip otherwise.
MarkerNameAddMode=AddOrSkip
MarkerSequenceAddMode=AddOrSkip
MarkerQualifierAddMode=AddAnyway
MappingAddMode=AddAnyway
MappingQualifierAddMode=AddAnyway
; SkipHeaderLines: the number of header lines to skip before parsing
;SkipHeaderLines=0
; CommentPrefix: lines starting with this prefix are not parsed
CommentPrefix="#"
; Delimiter: specify one among Colon(:), Comma(,), Period(.), Hyphen(-), SemiColon(;), Slash(/), Tab(\t), VerticalBar(|)
Delimiter=Tab
; Either marker type or marker type index should be provided.
; MarkerType: specify a default marker type. (single)
MarkerType=SSR

; MarkerTypeIndex: column index(0-based) for marker types. (multiple)
MarkerTypeIndex=3
; MapNameIndex (required for mapping): column index(0-based) for map names.
MapNameIndex=0
; StartIndex (required for mapping): column index(0-based) for start positions.
StartIndex=2
; EndIndex: column index(0-based) for end positions. Nullable for point markers.
;EndIndex=3
; MarkerNameIndex (required): column index(0-based) for marker names.
MarkerNameIndex.1=FULL_NAME
;MarkerNameIndex.2=ALIAS
; MarkerSequenceIndex: column index(0-based) for a marker sequence.
;MarkerSequenceIndex.6=ASSAY_SEQ
; FilterIndex: column index(0-based) for comma-delimited filter values.
;              if not specified, all the items will be included.
;FilterIndex=0
;FilterValue="Brachypodium distachyon"
; Qualifiers: used to add additional information.

Test the control file using the -t switch and, if everything is correct, load the data in verbose mode by replacing -t with -v.
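
To see how the 0-based column indices in [Marker] apply to a line of the marker file, here is a minimal Python sketch. It only mirrors the settings from the sample above (Delimiter=Tab, CommentPrefix="#", MapNameIndex=0, marker name in column 1, StartIndex=2, MarkerTypeIndex=3, MarkerType=SSR). The function parse_marker_line and the sample line are hypothetical, and the fallback to the default marker type when the type column is empty is an assumption, not documented PersephoneShell behavior.

```python
# Hypothetical constants mirroring the [Marker] section of the sample control file.
DELIMITER = "\t"
COMMENT_PREFIX = "#"
MAP_NAME_INDEX = 0
MARKER_NAME_INDEX = 1
START_INDEX = 2
MARKER_TYPE_INDEX = 3
DEFAULT_MARKER_TYPE = "SSR"

def parse_marker_line(line):
    """Return a dict for one data line, or None for blank and comment lines."""
    stripped = line.rstrip("\r\n")
    if not stripped.strip() or stripped.startswith(COMMENT_PREFIX):
        return None
    cols = stripped.split(DELIMITER)
    # Assumption: use the default type when the type column is missing or empty.
    marker_type = DEFAULT_MARKER_TYPE
    if len(cols) > MARKER_TYPE_INDEX and cols[MARKER_TYPE_INDEX].strip():
        marker_type = cols[MARKER_TYPE_INDEX].strip()
    return {
        "map": cols[MAP_NAME_INDEX],
        "name": cols[MARKER_NAME_INDEX],
        "start_cM": float(cols[START_INDEX]),
        "type": marker_type,
    }

# Made-up line, for illustration only: map name, marker name, position, type.
print(parse_marker_line("1A\tmarker_A\t12.5\tPCR-SSR"))
```

Running a few lines of the real file through a check like this before the -t test can reveal a wrong delimiter or a shifted column early.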

A two-step loading procedure

Creating genetic maps with markers can also be done in two steps (see below for the "shortcut" trick using DeriveMapListFrom):

  • create a map set with empty maps using the 'add maps' command, and
  • add markers using the 'add markers' command.

This section describes the procedure for adding maps.

The control INI file should contain the information about the new map set, its location in the map set tree, the OrganismId, the list of map names with their accessions and sizes, as well as other housekeeping information such as ProcessRun or DbSequences.

[ProcessRun]
; Run description: if specified, a custom description will be used,
;                  otherwise, "Added maps for {MapSet Accession No.}." will be used.
RunDescription="Added maps for Sorghum bicolor BTx623-IS320C"

[MapSet]
; Organism ID (required): organism ID should exist.
OrganismId=4558
; Display name (required): a name shown in MapSetTree.
DisplayName="BTx623-IS320C"
; Description: please try to provide detailed description with proper credits.
Description="A set of ~10k markers, derived from 437-line recombinant inbred population of BTx623 and IS320C.
Original publication: http://www.g3journal.org/content/4/10/1963 Resolution of Genetic Map Expansion Caused by Excess Heterozygosity in Plant Recombinant Inbred Populations 
Sandra K. Truong, Ryan F. McCormick, Daryl T. Morishige and John E. Mullet"

; AccessionNo (required): accession of the genome build. See http://ncbi.nlm.nih.gov/genome
AccessionNo="BTx623-IS320C"
; Source ID: database or institution that the MapSet/sequence originates from
SourceId="Publication"
; DistanceUnit: choose 'cM' for genetic maps.
;               We suggest adding physical maps along with sequences, so use 'add sequences' instead of 'add maps' for them.
DistanceUnit="cM"

[MapSetTree]
;------------------------------------------------------------------------------------------------
; 1. Using existing MapSetTree node
;------------------------------------------------------------------------------------------------
; Node ID: if specified, the MapSet with the new sequences will be placed on this node.
;NodeId=12345
;------------------------------------------------------------------------------------------------
; 2. Adding new MapSetTree node to a parent node
;------------------------------------------------------------------------------------------------
; Parent node ID: if specified, the MapSet with the new sequences will be placed under this parent node as a child.
ParentNodeId=25
;------------------------------------------------------------------------------------------------
; 3. Adding new MapSetTree node under a new root node
;------------------------------------------------------------------------------------------------
; Root node name: usually an organism name. Ignored if the root name already exists.
;RootNodeName="Oryza sativa test"
; Root node order number: order of the root node in the MapSetTree. By default, 0.
;RootNodeOrderNo=0

[Maps]
; Lists all the chromosomes, scaffolds or linkage groups.
;MapName=Accession,Start,End
1=1,0,177.79
2=2,0,190.52
3=3,0,170.08
4=4,0,153.02
5=5,0,118.58
6=6,0,128.59
7=7,0,127.91
8=8,0,106.65
9=9,0,119.19
10=10,0,126.86
; DeriveMapListFrom: if present, the map names and sizes will be derived from the marker mapping file
; referenced in the given marker control file. The map start and end positions will
; correspond to the lowest and highest coordinates of the markers on each map.
;DeriveMapListFrom=o:\Marker\smallweb\wheat_genetic_CS_Renan.ini 
; AddModes: choose a mode to add name, mapping, qualifiers or sequence among
;           1. AddOrDie (default): add if not exists; die (throw exception) otherwise.
;           2. AddOrUpdate: add if not exists; update otherwise.
;           3. AddOrSkip: add if not exists; skip otherwise.
MapAddMode=AddOrUpdate

[DbSequences]
; The ID columns below are used in loading maps.
; Oracle: If there is no sequence/trigger assigned to these columns, you must specify a sequence for them.
;PROCESS_RUN.RUN_ID=ID_SEQ
;MAP_SET.MAP_SET_ID=ID_SEQ
;MAP_SET_TREE.ID=ID_SEQ
;MK_MAP.MAP_ID=ID_SEQ
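
The three MapAddMode values listed in the [Maps] section above differ only in what happens when a map already exists. The following Python sketch illustrates that difference using a plain dictionary in place of the database. The function add_map and its return values are hypothetical; the behavior follows the comments in the sample file and is not taken from the PersephoneShell source.

```python
def add_map(store, name, extent, mode="AddOrDie"):
    """Add a map to a dict standing in for the database, following the given mode."""
    if name not in store:
        store[name] = extent
        return "added"
    if mode == "AddOrDie":
        # The sample file describes this as "die (throw exception)".
        raise KeyError(f"map {name!r} already exists")
    if mode == "AddOrUpdate":
        store[name] = extent
        return "updated"
    if mode == "AddOrSkip":
        return "skipped"
    raise ValueError(f"unknown mode: {mode}")

maps = {}
print(add_map(maps, "1", (0, 177.79)))                 # new map
print(add_map(maps, "1", (0, 180.0), "AddOrUpdate"))   # existing map, replaced
print(add_map(maps, "1", (0, 200.0), "AddOrSkip"))     # existing map, left as is
```

AddOrUpdate, as used in the sample, is convenient when the same control file is run repeatedly while the map sizes are still being adjusted.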

Derive map list from the marker file

You can instruct PersephoneShell to derive the list of maps from the file with marker mappings. The map names and sizes will be extracted from the marker file referenced in another control file, the one used for loading the markers. The map start and end positions will correspond to the lowest and highest coordinates of the markers on each map. To enable this, replace the explicit list of maps in the [Maps] section with a DeriveMapListFrom record:


[Maps]
DeriveMapListFrom=o:\Marker\smallweb\wheat_genetic_CS_Renan.ini
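
The derivation of map start and end positions from marker coordinates can be sketched in Python as follows. This is an illustration of the rule stated above, not PersephoneShell's code; derive_map_extents is a hypothetical function and the rows are made-up values. The output imitates the MapName=Accession,Start,End layout of the [Maps] section, assuming the accession equals the map name.

```python
def derive_map_extents(rows):
    """Return {map_name: (lowest, highest)} from (map_name, position_cM) pairs."""
    extents = {}
    for map_name, pos in rows:
        lo, hi = extents.get(map_name, (pos, pos))
        extents[map_name] = (min(lo, pos), max(hi, pos))
    return extents

# Made-up marker positions, for illustration only.
rows = [("1", 0.0), ("1", 177.79), ("2", 5.2), ("2", 190.52), ("1", 42.0)]

for name, (start, end) in derive_map_extents(rows).items():
    # Assumption: the accession is the same as the map name.
    print(f"{name}={name},{start},{end}")
```

Note that with this rule a map does not necessarily start at 0: it starts at the position of its first marker.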

Test the files:

PS> add map -c maps.ini -t

and then run the loading process for real:

PS> add map -c maps.ini -v

Please see the sample files in the Samples\Map folder.