Adding Organisms
Step 1: Adding Organisms
Build INI files for adding organisms
The metadata for all entries of the pangenome can be downloaded from the supplemental materials published in the paper https://www.nature.com/articles/s41588-026-02506-0:
| ID | NAME | TAXONOMY | CATEGORY | COUNTRY | REGION |
| AM001 | WI2757 | Cucumis sativus var. sativus | Cultivar | USA | America |
| AM002 | WI7012 | Cucumis sativus var. sativus | Cultivar | USA | America |
| AM003 | WI7037 | Cucumis sativus var. sativus | Cultivar | USA | America |
| AM006 | True Lemon | Cucumis sativus var. sativus | Cultivar | USA | America |
| AM011 | WI7150 | Cucumis sativus var. sativus | Cultivar | USA | America |
| AM014 | WI7167 | Cucumis sativus var. xishuangbannanesis | Xishuangbanna | China | East Asia |
| AM015 | WI7204 | Cucumis sativus var. sativus | Cultivar | Israel | Central/West Asia |
| AM016 | Poinsett 76 | Cucumis sativus var. sativus | Cultivar | USA | America |
...
The assemblies in the cucumber pangenome belong to three different varieties: Cucumis sativus var. hardwickii, Cucumis sativus var. xishuangbannanesis, and Cucumis sativus var. sativus.
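To confirm which varieties occur in the metadata, the distinct values of the TAXONOMY column can be collected programmatically. The following Python sketch assumes that the supplementary table has been exported to a tab-delimited file with the header row shown above; the embedded excerpt and the function name distinct_taxa are illustrative only and are not part of the published data or of the loading tool.

```python
import csv
import io

# Hypothetical excerpt of the supplementary metadata, assumed to be exported
# as tab-delimited text with a header row (the real file layout may differ).
METADATA = (
    "ID\tNAME\tTAXONOMY\tCATEGORY\tCOUNTRY\tREGION\n"
    "AM001\tWI2757\tCucumis sativus var. sativus\tCultivar\tUSA\tAmerica\n"
    "AM014\tWI7167\tCucumis sativus var. xishuangbannanesis\tXishuangbanna\tChina\tEast Asia\n"
    "AM016\tPoinsett 76\tCucumis sativus var. sativus\tCultivar\tUSA\tAmerica\n"
)


def distinct_taxa(handle):
    """Return the distinct TAXONOMY values in order of first appearance."""
    seen = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # dicts preserve insertion order, so the first occurrence decides the position
        seen.setdefault(row["TAXONOMY"].strip(), None)
    return list(seen)


if __name__ == "__main__":
    for taxon in distinct_taxa(io.StringIO(METADATA)):
        print(taxon)
```

Run against the full export, the list should contain the three varieties named above.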
The taxonomy IDs of these three organisms can be looked up on the NCBI Taxonomy pages and placed into a tab-delimited text file with two columns, the scientific name and the taxonomy ID:
organisms.txt:
Cucumis sativus var. sativus	869827
Cucumis sativus var. xishuangbannanesis	2219226
Cucumis sativus var. hardwickii	319220
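Writing organisms.txt by hand is easy for three entries; for reference, here is a minimal Python sketch that produces the same two-column, tab-delimited file. It assumes the file has no header row, as in the listing above. The helper names write_organisms and read_organisms are hypothetical, and the demo writes to a temporary folder rather than to $DATA/cucumber.

```python
import csv
import os
import tempfile

# Scientific names and NCBI taxonomy IDs from the organisms.txt listing above
ORGANISMS = [
    ("Cucumis sativus var. sativus", 869827),
    ("Cucumis sativus var. xishuangbannanesis", 2219226),
    ("Cucumis sativus var. hardwickii", 319220),
]


def write_organisms(path, organisms):
    """Write one '<scientific name><TAB><taxonomy id>' line per organism, no header."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for name, tax_id in organisms:
            writer.writerow([name, tax_id])


def read_organisms(path):
    """Read the file back as (scientific name, taxonomy id) tuples."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [(name, int(tax_id)) for name, tax_id in csv.reader(fh, delimiter="\t")]


if __name__ == "__main__":
    # Hypothetical location; in the tutorial the file lives at $DATA/cucumber/organisms.txt
    target = os.path.join(tempfile.mkdtemp(), "organisms.txt")
    write_organisms(target, ORGANISMS)
    with open(target, encoding="utf-8") as fh:
        print(fh.read(), end="")
```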
To build the INI files for the add organism command, we need a template INI file, which we will place in the $DATA/cucumber data folder under the name templateOrganisms.ini:
templateOrganisms.ini:
[Organism]
; Organism ID (optional, normally the same as TaxonomyId. If not specified, it will be auto-generated)
OrganismId={1}
; Look up taxonomy information in http://www.ncbi.nlm.nih.gov/taxonomy
; Taxonomy ID (optional)
TaxonomyId={1}
; Alternative ID: user defined ID
;AlternativeId=""
; Scientific name (required)
ScientificName={0}
; Common name (optional)
CommonName="cucumber"
;PlantClassification: (optional). If plant, specify if the organism is monocot(0) or eudicot(1).
PlantClassification=1
Remember that columns are addressed by a 0-based index, so the two columns in the file have indices 0 and 1. The placeholder {0} is replaced with the value from the first column of the text file (the scientific name), and {1} with the value from the second column (the taxonomy ID). For the first organism entry, the line
ScientificName={0}
will become:
ScientificName=Cucumis sativus var. sativus
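The substitution rule can be illustrated with a small Python sketch. It is a hypothetical re-implementation of the placeholder logic described here, not the code behind build ini: each {n} is replaced by the n-th tab-separated column of a data line.

```python
import re

# Matches placeholders such as {0} or {12}; hypothetical, not the build ini source
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def fill_template(template, columns):
    """Replace every {n} in template with columns[n] (0-based)."""
    def substitute(match):
        index = int(match.group(1))
        if index >= len(columns):
            raise IndexError(f"placeholder {{{index}}} has no matching column")
        return columns[index]
    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    columns = "Cucumis sativus var. sativus\t869827".split("\t")
    print(fill_template("ScientificName={0}\nTaxonomyId={1}", columns))
    # ScientificName=Cucumis sativus var. sativus
    # TaxonomyId=869827
```

A regular expression is used instead of str.format so that only numeric placeholders are touched; any other braces in a template are left as they are.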
The build ini command has its own INI file, which specifies the output directory, the data file, and the template file:
build-organisms.ini:
[Build]
;FileNameIndex (0-based): defines which column contains the base file name for the generated INI files. For example, if the column contains 'Solins1',
; the file 'Solins1.ini' will be created in the folder pointed to by OutputDir
FileNameIndex=0
;OutputDir: specify the output folder for the generated INI files
OutputDir=$DATA/cucumber/organism-ini
;DataFile: the tab-delimited text file with lines containing the data for each genome.
DataFile=$DATA/cucumber/organisms.txt
;TemplateFile: the template INI file with placeholders in the form of {0},{1},... referencing the corresponding columns of the tab-delimited data file.
TemplateFile=$DATA/cucumber/templateOrganisms.ini
FileNameIndex defines which column of the data file is used to name the generated INI files. With FileNameIndex=0, each file is named after the scientific name in the first column.
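Putting these settings together, the overall behavior of build ini can be approximated as follows. This is a hypothetical Python sketch, assuming one output file per non-blank data line and that $DATA is an environment variable; the function names build_ini and demo are illustrative, and the real command may differ in details such as error handling.

```python
import configparser
import os
import re
import tempfile

PLACEHOLDER = re.compile(r"\{(\d+)\}")


def build_ini(build_config_path):
    """Hypothetical approximation of build ini; returns the generated file paths."""
    parser = configparser.ConfigParser(interpolation=None)  # ';' lines are comments
    parser.optionxform = str  # keep key case (FileNameIndex, OutputDir, ...)
    parser.read(build_config_path, encoding="utf-8")
    build = parser["Build"]

    # Expand $DATA (and any other environment variables) in the paths
    output_dir = os.path.expandvars(build["OutputDir"])
    data_file = os.path.expandvars(build["DataFile"])
    template_file = os.path.expandvars(build["TemplateFile"])
    name_index = int(build["FileNameIndex"])

    with open(template_file, encoding="utf-8") as fh:
        template = fh.read()

    os.makedirs(output_dir, exist_ok=True)
    built = []
    with open(data_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            columns = line.split("\t")
            text = PLACEHOLDER.sub(lambda m: columns[int(m.group(1))], template)
            target = os.path.join(output_dir, columns[name_index] + ".ini")
            with open(target, "w", encoding="utf-8") as out:
                out.write(text)
            built.append(target)
            print(f"Built file {target}")
    return built


def demo():
    """Self-contained demo: a temporary folder stands in for $DATA."""
    os.environ["DATA"] = tempfile.mkdtemp()
    base = os.path.join(os.environ["DATA"], "cucumber")
    os.makedirs(base)
    with open(os.path.join(base, "organisms.txt"), "w", encoding="utf-8") as fh:
        fh.write("Cucumis sativus var. sativus\t869827\n")
    with open(os.path.join(base, "templateOrganisms.ini"), "w", encoding="utf-8") as fh:
        fh.write("[Organism]\nOrganismId={1}\nTaxonomyId={1}\nScientificName={0}\n")
    config = os.path.join(base, "build-organisms.ini")
    with open(config, "w", encoding="utf-8") as fh:
        fh.write(
            "[Build]\nFileNameIndex=0\n"
            "OutputDir=$DATA/cucumber/organism-ini\n"
            "DataFile=$DATA/cucumber/organisms.txt\n"
            "TemplateFile=$DATA/cucumber/templateOrganisms.ini\n"
        )
    return build_ini(config)


if __name__ == "__main__":
    demo()
```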
Now that all the necessary files are ready, we can run the command to generate the control files for adding organisms:
PS> build ini -c $DATA/cucumber/build-organisms.ini
Output INI files will be placed in /data/Samples/Organism/cucumber-pangenome
Built file /data/Data/cucumber/organism-ini/Cucumis sativus var. sativus.ini
Built file /data/Data/cucumber/organism-ini/Cucumis sativus var. xishuangbannanesis.ini
Built file /data/Data/cucumber/organism-ini/Cucumis sativus var. hardwickii.ini
Adding organisms in a batch
To load multiple organisms, we use the generated INI files and run the add organism command for all of them at once. Instead of a single INI file, the add commands also accept a file name mask, such as $DATA/cucumber/organism-ini/*.ini; the add organism command is then executed for each INI file that matches the wildcard mask:
PS> add organism -c /data/Data/cucumber/organism-ini/*.ini -v
1/3 /data/Data/cucumber/organism-ini/Cucumis sativus var. hardwickii.ini
- Control file has been successfully parsed.
Testing ...
- Checking for OrganismId duplication: passed
- Validating Name length (1..100): passed
- Checking Name duplication: passed
- Validating CommonName length (1..1000): passed
Result:
- all 4 test(s) passed
Organism (Cucumis sativus var. hardwickii: 319220) has been successfully inserted.
2/3 /data/Data/cucumber/organism-ini/Cucumis sativus var. sativus.ini
- Control file has been successfully parsed.
Testing ...
- Checking for OrganismId duplication: passed
- Validating Name length (1..100): passed
- Checking Name duplication: passed
- Validating CommonName length (1..1000): passed
Result:
- all 4 test(s) passed
Organism (Cucumis sativus var. sativus: 869827) has been successfully inserted.
3/3 /data/Data/cucumber/organism-ini/Cucumis sativus var. xishuangbannanesis.ini
- Control file has been successfully parsed.
Testing ...
- Checking for OrganismId duplication: passed
- Validating Name length (1..100): passed
- Checking Name duplication: passed
- Validating CommonName length (1..1000): passed
Result:
- all 4 test(s) passed
Organism (Cucumis sativus var. xishuangbannanesis: 2219226) has been successfully inserted.
3 INI files were successful
/data/Data/cucumber/organism-ini/Cucumis sativus var. hardwickii.ini
/data/Data/cucumber/organism-ini/Cucumis sativus var. sativus.ini
/data/Data/cucumber/organism-ini/Cucumis sativus var. xishuangbannanesis.ini
Note the last four lines of the output. When a batch command completes, it prints the list of INI files that succeeded, as well as those that failed. This summary is especially useful when running the command in test mode (with the -t flag): if there is an error, it lists the names of the problematic INI files.
After confirming in test mode that all tests have passed, remove the -t flag and run the command again to load the data.
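Before running the real load, a similar pre-check can also be scripted locally. The Python sketch below is hypothetical: it expands a wildcard mask with glob, mirrors the four tests printed in the log above (assuming that "Name" refers to ScientificName), and prints a success/failure summary. Unlike add organism, it only detects duplicates within the batch, not against entries already in the database, and it inserts nothing. The functions check_organism_ini, run_batch and demo, and the demo files a.ini, b.ini and c.ini, are illustrative names.

```python
import configparser
import glob
import os
import tempfile


def check_organism_ini(path):
    """Per-file length checks; returns (organism_id, name, errors)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written
    parser.read(path, encoding="utf-8")
    if not parser.has_section("Organism"):
        return "", "", ["missing [Organism] section"]
    org = parser["Organism"]
    organism_id = org.get("OrganismId", "").strip()
    name = org.get("ScientificName", "").strip()
    errors = []
    if not 1 <= len(name) <= 100:
        errors.append("ScientificName length outside 1..100")
    if "CommonName" in org:
        common = org["CommonName"].strip().strip('"')  # drop surrounding quotes
        if not 1 <= len(common) <= 1000:
            errors.append("CommonName length outside 1..1000")
    return organism_id, name, errors


def run_batch(mask):
    """Expand a wildcard mask, check each INI file, and print a summary."""
    paths = sorted(glob.glob(mask))
    seen_ids, seen_names = set(), set()
    succeeded, failed = [], []
    for number, path in enumerate(paths, 1):
        print(f"{number}/{len(paths)} {path}")
        organism_id, name, errors = check_organism_ini(path)
        # Duplicates are only detected within this batch, not against a database
        if organism_id and organism_id in seen_ids:
            errors.append(f"duplicate OrganismId {organism_id}")
        if name and name in seen_names:
            errors.append(f"duplicate name {name}")
        for error in errors:
            print(f" - {error}")
        if errors:
            failed.append(path)
        else:
            succeeded.append(path)
            seen_ids.add(organism_id)
            seen_names.add(name)
    print(f"{len(succeeded)} INI files passed")
    for path in succeeded:
        print(path)
    if failed:
        print(f"{len(failed)} INI files failed")
        for path in failed:
            print(path)
    return succeeded, failed


def demo():
    """Hypothetical files: c.ini reuses the OrganismId of a.ini and should fail."""
    folder = tempfile.mkdtemp()
    entries = {
        "a.ini": "[Organism]\nOrganismId=869827\nScientificName=Cucumis sativus var. sativus\n",
        "b.ini": '[Organism]\nOrganismId=319220\nScientificName=Cucumis sativus var. hardwickii\nCommonName="cucumber"\n',
        "c.ini": "[Organism]\nOrganismId=869827\nScientificName=Duplicate\n",
    }
    for file_name, text in entries.items():
        with open(os.path.join(folder, file_name), "w", encoding="utf-8") as fh:
            fh.write(text)
    return run_batch(os.path.join(folder, "*.ini"))


if __name__ == "__main__":
    demo()
```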