If you plan to load multiple map sets whose data sets share a common format, the command build ini can generate the required INI files for you. It accepts a template file with placeholders, which are filled with values from a tab-delimited text file. For example, if your pangenome data set consists of 20 genomes, the data file should contain 20 lines, one line per map set. The same approach works for adding different organisms, sequences, or gene annotations, each of which needs its own set of INI files.

Provide a template file in which the placeholders have the form {number}, such as {0}, {1}, etc. The values in each row of the data file are embedded into the corresponding placeholders: the value from column 0 fills {0}, the value from column 1 fills {1}, and so forth.
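The substitution rule can be illustrated with a minimal Python sketch. This is not the implementation of build ini; the function name fill_template and the use of a regular expression are assumptions made for illustration only. The template fragment and the data row are taken from the organism example in this section.

```python
import re

# Matches placeholders such as {0}, {1}, {12}
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def fill_template(template: str, row: list[str]) -> str:
    """Replace every {n} with row[n]; the same index may occur several times."""
    return PLACEHOLDER.sub(lambda m: row[int(m.group(1))], template)


# A fragment of the organism template (hypothetical shortened version)
template = "OrganismId={6}\nTaxonomyId={6}\nScientificName={2} {3}\n{7}\n"

# One tab-delimited line of the data file, split into columns 0..7
row = "Solins1\tv1.1\tSolanum\tinsanum\tSins1\tSinsanum.fasta.gz\t2056095\t;".split("\t")

result = fill_template(template, row)
print(result)
```

A regular expression is used here instead of str.format so that any other braces in a template would be left untouched; whether the real command behaves the same way is not stated in this section.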

For instance, here is a template file for loading organism information:


[Organism]
; Organism ID (optional, if not specified, it will be autogenerated)
OrganismId={6}
; Look up taxonomy information in http://www.ncbi.nlm.nih.gov/taxonomy
; Taxonomy ID (required)
TaxonomyId={6}
; Alternative ID: user defined ID
;AlternativeId="x"
; Scientific name (required)
ScientificName={2} {3}
; Common name (optional)
{7}
;If plant, specify if the organism is monocot(0) or eudicot(1)
PlantClassification=1

The tab-delimited text data file (organisms.txt) is shown here as a table:

Solins1  v1.1  Solanum  insanum        Sins1  Sinsanum.fasta.gz     2056095  ;
Solgig1  v1.2  Solanum  giganteum      Sgig1  Sgiganteum.fasta.gz   374017   ;
Solvio1  v1.1  Solanum  violaceum      Svio1  Sviolaceum.fasta.gz   329803   ;
Sollin1  v1.2  Solanum  linearifolium  Slin1  Slinnaeanum.fasta.gz  329777   CommonName=sodom-apple
Solang8  v1.1  Solanum  anguivi        Sang8  Sanguivi.fasta.gz     329760   ;
Solrob1  v1.2  Solanum  robustum       Srob1  Srobustum.fasta.gz    238982   ;
Solqui2  v1.3  Solanum  quitoense      Squi2  Squitoense.fasta.gz   227725   CommonName=lulo

The column with index 6 contains the TaxonomyId, which is also used as the OrganismId. Its values replace the placeholder {6}, which appears more than once in the template.

Because CommonName is optional and is not available for every organism, the placeholder {7} stands for a whole line: it is filled either with the complete instruction (for example, CommonName=lulo) or with the comment symbol (;).
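Since every placeholder needs a value in every row, it can help to check the data file before building. The following Python sketch is a hypothetical pre-check, not part of the tool: the function name missing_columns is an assumption, and how build ini itself reacts to a short row is not described here. The third row is deliberately faulty (column 7 omitted) to show what the check reports.

```python
import re


def missing_columns(template: str, rows: list[list[str]]) -> list[int]:
    """Return the 0-based indexes of rows that are too short for the template."""
    indexes = [int(n) for n in re.findall(r"\{(\d+)\}", template)]
    needed = max(indexes) + 1 if indexes else 0
    return [i for i, row in enumerate(rows) if len(row) < needed]


# Shortened template (hypothetical): the highest placeholder is {7}
template = "TaxonomyId={6}\n{7}\n"

rows = [
    ["Solqui2", "v1.3", "Solanum", "quitoense", "Squi2", "Squitoense.fasta.gz", "227725", "CommonName=lulo"],
    ["Solrob1", "v1.2", "Solanum", "robustum", "Srob1", "Srobustum.fasta.gz", "238982", ";"],
    # Faulty row for illustration only: the ';' in column 7 was left out
    ["Solvio1", "v1.1", "Solanum", "violaceum", "Svio1", "Sviolaceum.fasta.gz", "329803"],
]

bad = missing_columns(template, rows)
print(bad)
```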

The command that builds the INI files needs to know the template file, the data file, the way to name the output INI files, the output directory, etc. Place these parameters into another INI file (buildOrganisms.ini):


[Build]
;FileNameIndex (0-based): defines which column contains the base file name for the generated INI files. For example, if the column contains 'Solins1',
; the file 'Solins1.ini' will be created in the folder specified by OutputDir
FileNameIndex=0

;OutputDir: specify the output folder for the generated INI files
OutputDir=$DATA/Samples/Organism/pangenome

;DataFile: the tab-delimited text file with lines containing the data for each genome.
DataFile=$DATA/tomato/organisms.txt

;TemplateFile: the template INI file with placeholders in the form of {0},{1},... referencing the corresponding columns of the tab-delimited data file.
TemplateFile=$DATA/tomato/templateOrganisms.ini

;TextDelimiters=Tab


The first instruction, FileNameIndex=0, specifies the column that contains the base file name for the newly created INI files. As a result, files such as Solins1.ini will be created in the output directory $DATA/Samples/Organism/pangenome.

To generate the INI files, run the command:

PS> build ini -c buildOrganisms.ini
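To make the roles of the four settings concrete, here is a rough Python emulation of the overall process: read the tab-delimited data file, fill the template once per row, and write each result to a file named after the FileNameIndex column. It is only a sketch under stated assumptions; the function build_ini_files and its parameters are hypothetical and do not reflect how the build ini command is actually implemented. The demo uses a temporary folder and a shortened template instead of the $DATA paths.

```python
import csv
import re
import tempfile
from pathlib import Path

PLACEHOLDER = re.compile(r"\{(\d+)\}")


def build_ini_files(template_file, data_file, output_dir, file_name_index=0):
    """Write one INI file per data row and return the created paths."""
    template = Path(template_file).read_text(encoding="utf-8")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    with open(data_file, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row:  # skip empty lines
                continue
            text = PLACEHOLDER.sub(lambda m: row[int(m.group(1))], template)
            target = out / f"{row[file_name_index]}.ini"
            target.write_text(text, encoding="utf-8")
            created.append(target)
    return created


# Demo in a temporary folder with two rows from the organism example
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "templateOrganisms.ini").write_text(
        "[Organism]\nTaxonomyId={6}\n{7}\n", encoding="utf-8"
    )
    (tmp / "organisms.txt").write_text(
        "Solins1\tv1.1\tSolanum\tinsanum\tSins1\tSinsanum.fasta.gz\t2056095\t;\n"
        "Solqui2\tv1.3\tSolanum\tquitoense\tSqui2\tSquitoense.fasta.gz\t227725\tCommonName=lulo\n",
        encoding="utf-8",
    )
    paths = build_ini_files(
        tmp / "templateOrganisms.ini", tmp / "organisms.txt", tmp / "pangenome"
    )
    names = [p.name for p in paths]
    lulo = (tmp / "pangenome" / "Solqui2.ini").read_text(encoding="utf-8")

print(names)
print(lulo)
```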

A similar approach can be used to generate INI files for loading FASTA sequences and gene annotation.