Add
The add command adds objects to the Persephone database. The syntax is as follows:
add {target} -c controlFile [(-v | -t) -d]
where "controlFile" is a file with an ".ini" extension that contains the data needed to add the target. (See Control Files for more information.)
The table below lists the definitions for the add command parameters.
Add Command Parameters
Parameter |
Required or Optional? |
Definition |
{target} |
Required |
A target is the object type you want to add, which can be organism, sequence(s), sequencedatabase, map(s), annotation(s), annotation_qualifier(s), annotation_search, qualifier_link, bed, marker(s), alignment(s), ribbon(s), mapsettreenode, expression, variant(s), quantitative, tracktreenode, qtl(s), ortholog(s), sam, or ontology. |
-c controlFile |
Required most of the time |
Loads data from the specified control file. As described in Control Files, control files are files with an ".ini" extension. Note that the data format supported in the control file depends on the target type you select. |
-v |
Optional |
Executes the add command in verbose mode with extra information printed on screen. |
-t |
Optional |
Executes the add command in test mode. |
-d |
Optional |
Executes the add command in debug mode. You can send the debug output to Persephone Software, LLC. at http://persephonesoft.com/contact. |
-f |
Optional |
Executes the add command in force mode. Normally, when some objects are created from the command line, the program asks for confirmation; force mode skips this confirmation. The flag is typically used in batch mode, when data is added via a script. |
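When add commands are generated from a script, it can help to assemble the arguments in one place. The following is a minimal Python sketch, not part of Persephone: the function name `build_add_args` and its keyword arguments are hypothetical, and it returns only the `add ...` arguments, leaving the name of the executable to the reader. It reads `(-v | -t)` in the syntax line as "one or the other", which is an assumption.

```python
import shlex


def build_add_args(target, control_file=None, name=None,
                   verbose=False, test=False, debug=False, force=False):
    """Return the argument list for one `add` command (executable not included)."""
    if verbose and test:
        # Assumption: "(-v | -t)" in the syntax line means the flags are alternatives.
        raise ValueError("choose either verbose (-v) or test (-t), not both")
    if control_file is not None and not control_file.lower().endswith(".ini"):
        raise ValueError("control files are expected to have an .ini extension")

    args = ["add", target]
    if name is not None:
        # Interactive targets take a name such as "Oryza sativa/IRGSP1.0".
        args.append(name)
    if control_file is not None:
        args += ["-c", control_file]
    if verbose:
        args.append("-v")
    if test:
        args.append("-t")
    if debug:
        args.append("-d")
    if force:
        args.append("-f")
    return args


# shlex.join quotes arguments that contain spaces, so the line can be pasted into a shell.
print(shlex.join(build_add_args("organism", "indica.ini", verbose=True)))
print(shlex.join(build_add_args("annotation_search", name="Oryza sativa/IRGSP1.0")))
```

Keeping the arguments as a list until the last moment avoids quoting mistakes with names that contain spaces.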
Adding Data with the Add Command
See the following use cases for examples of using the add command to add data.
Purpose |
Example command |
Notes |
add organism -c indica.ini -v |
Before entering any data, create the parent organism. |
|
add sequence -c indica.ini -v |
Map sets can be of different kinds. If the maps are based on genomic sequences, use this command to add the map set itself and the maps with their sequences. |
|
add annotation -c indica_bgi.ini -v |
Add gene model tracks using this command. Each map can contain several tracks with gene models predicted by different annotation methods. |
|
add annotation_qualifier -c indica_pfam.ini -v |
Add extra qualifiers to the gene models already loaded into the database. This information may include functional annotation or hyperlinks to external resources. |
|
Adding new qualifiers by extracting values from the existing qualifiers |
add annotation_qualifier "Oryza sativa/IRGSP1.0" |
Add new qualifiers interactively. Name the map set and a track with gene models, then provide text modification rules (regular expressions) that extract the new values from existing qualifiers and store them under new qualifiers. |
add qualifier_link |
Interactive command that nominates a qualifier as a hyperlink. Normally, this allows opening external web pages with extra information about the gene. |
|
add annotation_search "Oryza sativa/IRGSP1.0" |
Interactively mark some qualifiers as gene name or function. This information is used to narrow down searches. |
|
add marker -c clinvar.ini -v |
Create a marker track with markers positioned on a map. The mapping coordinates can be in bp for maps based on sequence or in cM for genetic maps. To add marker tracks to genetic maps, first create the map set using the add map command. |
|
add marker -c clinvar.ini -v |
Same as above, but the marker coordinates are provided in the form of a tab-delimited file. |
|
add sequencedatabase -c arabidopsis.ini -v |
Add a map set with sequences and gene annotation in one step by providing data in GenBank format. |
|
add map -c linkage.ini -v |
Adding genetic maps is done in two steps: first, add the empty maps, then add marker mapping. The information about the maps can be provided in a separate file listing the sizes of maps or can be derived from the file with marker positions. |
|
add expression -c tissues.ini -v |
This command adds gene expression data at the level of one value per gene per experiment. A single job can load multiple values for each gene. |
|
add variant -c 1000genomes.ini -v |
Load variants that contain SNPs or indels. To save space, the data is highly compressed, so that each position in each sample carries the alleles and coverage values only. The position names and other properties can be stored as an additional marker track. |
|
add sample -c extra_info.ini -v |
Genotyping samples can have additional qualifiers and descriptions. |
|
add ontology -c gramene.ini -v |
Each QTL is linked to a trait that is placed in a trait ontology. Before loading any QTL, provide the trait ontology in OBO format. |
|
add qtl -c heat.ini -v |
QTLs must have a trait listed in the trait ontology. A QTL should be assigned to a study that groups multiple QTLs. The QTL data can be read from files in text or Excel format, which may also include the study information. |
|
add ribbons -c irgsp_ir64.ini -v |
Synteny ribbon-like connectors link related intervals between sequences. Note that the web version of Persephone can find such regions at run time. |
|
add tracktreenode "Oryza sativa/IRGSP1.0" |
For better organization, tracks can be grouped to form a tree-like structure. This interactive command asks you to name the new group node and to list the tracks to be grouped. |
|
add bed -c regions.ini -v |
Data in a BED file with additional annotation information can be displayed as colored elements on the maps. |
|
add alignment -c swissprot.ini -v |
A special track with protein or cDNA alignments can help in annotating genomic regions. A typical example is a set of pre-calculated tblastn hits for well-characterized proteins, such as those in SwissProt. |
|
Adding map set tree nodes |
add mapsettreenode "Oryza sativa/Genetic" -v |
Map sets can be organized by introducing more nodes in the map set tree. One command can introduce more than one node if a child node references a parent that does not exist yet. |
add quantitative -c rna-seq-coverage.ini -v |
Quantitative tracks can contain values displayed in the form of a chart along the sequence, such as RNA-seq coverage. |
|
add sam -c ESTs.ini -v |
If you want to load a "countable" number of spliced alignments (not millions), use this command. Note: the web version of Persephone allows users to visualize large bam files. |
|
Adding sequence storage |
add storage |
If the database is not Oracle, sequences cannot be stored in the database itself. Instead, the compressed sequence data is stored in the file system and only the metadata is loaded into the database. If the default storage space is not enough, use this command to add another storage location interactively. |
add ortholog -c rice-corn.ini -v |
Orthologous gene pairs can be calculated by different methods. Use this command to load them from a tab-delimited file. |
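Several notes in the table imply a loading order: the organism comes before any other data, the trait ontology before QTLs, gene models before their extra qualifiers, and the map set before marker tracks. The Python sketch below shows one way a batch script might sort its jobs accordingly. It is illustrative only: `PREREQUISITES` and `order_jobs` are hypothetical names, and the entries marked as assumptions are not stated in the table.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Target -> targets that should be loaded before it.
PREREQUISITES = {
    "annotation_qualifier": {"annotation"},  # qualifiers extend gene models already loaded
    "qtl": {"ontology"},                     # the trait ontology must exist before any QTL
    "marker": {"map", "sequence"},           # "map" is stated for genetic maps; "sequence" is an assumption
    "annotation": {"sequence"},              # assumption: gene model tracks sit on sequence-based maps
}


def order_jobs(jobs):
    """Sort (target, control_file) pairs so that prerequisites come first."""
    targets = {target for target, _ in jobs}
    graph = {}
    for target in targets:
        deps = set(PREREQUISITES.get(target, set()))
        if target != "organism":
            deps.add("organism")  # the parent organism precedes any other data
        # Only enforce prerequisites that are part of this batch.
        graph[target] = deps & targets
    rank = {t: i for i, t in enumerate(TopologicalSorter(graph).static_order())}
    # sorted() is stable, so jobs with the same target keep their original order.
    return sorted(jobs, key=lambda job: rank[job[0]])


jobs = [
    ("qtl", "heat.ini"),
    ("annotation_qualifier", "indica_pfam.ini"),
    ("ontology", "gramene.ini"),
    ("annotation", "indica_bgi.ini"),
    ("sequence", "indica.ini"),
    ("organism", "indica.ini"),
]

for target, control_file in order_jobs(jobs):
    print(f"add {target} -c {control_file} -v")
```

Targets without a stated prerequisite are constrained only by the organism rule, so their relative order in the output may vary.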