The add command is used to add objects to the Persephone database. The syntax is as follows:

add {target} -c controlFile [(-v | -t) -d]

where "controlFile" is a file with an ".ini" extension that contains the data required to add the target. (See Control Files for more information.)

The table below lists the definitions for the add command parameters.

Add Command Parameters

Parameter

Required or Optional?

Definition

{target}

Required

A target is the object type you want to add, which can be organism, sequence(s), sequencedatabase, map(s), annotation(s), annotation_qualifier(s), annotation_search, qualifier_link, bed, marker(s), alignment(s), ribbon(s), mapsettreenode, expression, variant(s), quantitative, tracktreenode, qtl(s), ortholog(s), sam, or ontology.

-c controlFile

Required for most targets; interactive targets such as qualifier_link are run without a control file

Loads data from the specified control file. As described in Control Files, control files are files with an ".ini" extension. Note that the data format the control file must use depends on the target type you select.

-v

Optional

Executes the add command in verbose mode with extra information printed on screen.

-t

Optional

Executes the add command in test mode.

-d

Optional

Executes the add command in debug mode. You can send the debug output to Persephone Software, LLC. at http://persephonesoft.com/contact.

-f

Optional

Normally, when creating some objects from the command line, the program asks for confirmation. Force mode skips this confirmation. The flag is typically used in batch mode, when data is added by a script.
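As a rough illustration of how the parameters in the table above combine, the Python sketch below assembles an add command string from them. The function name build_add_command is hypothetical and not part of Persephone; the sketch only builds text and does not run anything. It assumes that -v and -t are alternatives, as the grouping (-v | -t) in the syntax line suggests, and that -f can be appended like the other flags.

```python
from typing import Optional


def build_add_command(
    target: str,
    control_file: Optional[str] = None,
    verbose: bool = False,
    test: bool = False,
    debug: bool = False,
    force: bool = False,
) -> str:
    """Return an `add` command line assembled from the documented flags.

    Hypothetical helper for illustration only; it does not contact a database.
    """
    if verbose and test:
        # Assumption: the syntax line groups -v and -t as alternatives.
        raise ValueError("choose either verbose (-v) or test (-t), not both")
    if control_file is not None and not control_file.lower().endswith(".ini"):
        # Control files are described as files with an ".ini" extension.
        raise ValueError("control files are expected to end in .ini")

    parts = ["add", target]
    if control_file is not None:
        parts += ["-c", control_file]
    # Append each enabled flag in a fixed order.
    for enabled, flag in ((verbose, "-v"), (test, "-t"), (debug, "-d"), (force, "-f")):
        if enabled:
            parts.append(flag)
    return " ".join(parts)


print(build_add_command("organism", "indica.ini", verbose=True))
# add organism -c indica.ini -v
print(build_add_command("qtl", "heat.ini", test=True, debug=True))
# add qtl -c heat.ini -t -d
```

Interactive targets that take no control file can be built by omitting control_file, for example build_add_command("qualifier_link").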

Adding Data with the Add Command

See the following use cases for examples of using the add command to add data.

Purpose

Example command

Notes

Add an Organism

add organism -c indica.ini -v

Before entering any data, create a parent organism.

Add Map Set, Maps, and Sequences

add sequence -c indica.ini -v

Map sets can be of different kinds. If the maps are based on genomic sequences, use this command to add the map set itself and the maps with sequences.

Add Gene Annotations

add annotation -c indica_bgi.ini -v

Add gene model tracks using this command. Each map can contain several tracks with gene models predicted by different annotation methods.

Add qualifiers to existing gene models

add annotation_qualifier -c indica_pfam.ini -v

Add extra qualifiers to gene models already loaded into the database. This information may include functional annotation or hyperlinks to external resources.

Adding new qualifiers by extracting values from the existing qualifiers

add annotation_qualifier "Oryza sativa/IRGSP1.0"

Add new qualifiers interactively. Name the map set and a track with gene models, provide text modification rules (regular expressions) on how to extract the new values from existing qualifiers and store them under new qualifiers.

Adding qualifier links

add qualifier_link

Interactive command to nominate a qualifier as a hyperlink. Normally, this lets users open external web pages with extra information about the gene.

Adding annotation search terms

add annotation_search "Oryza sativa/IRGSP1.0"

Interactively mark some qualifiers as gene name or function. This info will be used to narrow down the search.

Add Markers (GFF files)

add marker -c clinvar.ini -v

Create a marker track with markers positioned on a map. The mapping coordinates can be in bp for maps based on sequence or in cM for genetic maps. To add marker tracks to genetic maps, first create the map set using the command add map.

Add Markers (Delimited Text Files)

add marker -c clinvar.ini -v

Same as above, but the marker coordinates are provided in the form of a tab-delimited file.

Adding SequenceDatabase

add sequencedatabase -c arabidopsis.ini -v

Add a map set with sequences and gene annotation in one step by providing data in GenBank format.

Add genetic maps

add map -c linkage.ini -v

Adding genetic maps is done in two steps: first, add the empty maps, then add marker mapping. The information about the maps can be provided in a separate file listing the sizes of maps or can be derived from the file with marker positions.

Adding Expression Data

add expression -c tissues.ini -v

This command adds gene expression data at the level of one value per gene per experiment. A single job can load multiple values for each gene.

Adding Variants

add variant -c 1000genomes.ini -v

Load variants that contain SNPs or indels. To save space, the data is highly compressed, so that each position in each sample carries the alleles and coverage values only. The position names and other properties can be stored as an additional marker track.

Adding Info for Genotyping Samples

add sample -c extra_info.ini -v

Genotyping samples can have additional qualifiers and a description.

Adding Ontology Terms

add ontology -c gramene.ini -v

Each QTL is linked to a trait that is placed in a trait ontology. Before loading any QTL, provide the trait ontology in OBO format.

Adding QTLs

add qtl -c heat.ini -v

QTLs must have a trait listed in the trait ontology. A QTL should be assigned to a study that groups multiple QTLs. The QTL data can be read from files in text or Excel format, which may also include the study information.

Adding Synteny Ribbons

add ribbons -c irgsp_ir64.ini -v

Synteny ribbon-like connectors link related intervals between sequences. Note that the web version of Persephone can find such regions at run time.

Adding a Track Tree Node

add tracktreenode "Oryza sativa/IRGSP1.0"

For better organization, tracks can be grouped to form a tree-like structure. This interactive command asks you to name the new group node and to list the tracks to be grouped.

Adding BED file

add bed -c regions.ini -v

Data in a BED file with additional annotation information can be displayed as colored elements on the maps.

Adding protein or nucleotide alignments

add alignment -c swissprot.ini -v

A special track with protein or cDNA alignments can help annotate genomic regions. A typical example is a set of pre-calculated tblastn hits for well-characterized proteins, such as those in SwissProt.

Adding map set tree nodes

add mapsettreenode "Oryza sativa/Genetic" -v

Map sets can be organized by introducing more nodes in the map set tree. One command can introduce more than one node if a child node references a parent that does not exist yet.

Adding quantitative tracks

add quantitative -c rna-seq-coverage.ini -v

Quantitative tracks can contain values displayed in the form of a chart along the sequence, such as RNA-seq coverage.

Adding alignments in SAM format

add sam -c ESTs.ini -v

If you want to load a "countable" number of spliced alignments (not millions), use this command. Note: the web version of Persephone allows users to visualize large BAM files.

Adding sequence storage

add storage

If the database is not Oracle, sequences cannot be stored in the database itself. Instead, the compressed sequence data is stored in the file system and the metadata is loaded into the database. If the default storage space is not enough, use this command to add another storage location interactively.

Adding orthologous gene pairs

add ortholog -c rice-corn.ini -v

The orthologous gene pairs can be calculated by different methods. Use this command to load this information supplied in a tab-delimited file.
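Several of the use cases above depend on earlier ones: an organism comes before any other data, qualifiers are added to gene models that are already loaded, and the trait ontology is loaded before any QTL. The Python sketch below shows one way a batch script could order such jobs. The job names, the PREREQUISITES mapping, and the ordered_commands function are hypothetical; the commands are copied from the table, and the sketch only prints them. Treating gene annotations as dependent on the sequence job is an assumption, not something the table states outright.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Example commands copied from the use-case table.
JOBS = {
    "organism": "add organism -c indica.ini -v",
    "sequence": "add sequence -c indica.ini -v",
    "annotation": "add annotation -c indica_bgi.ini -v",
    "annotation_qualifier": "add annotation_qualifier -c indica_pfam.ini -v",
    "ontology": "add ontology -c gramene.ini -v",
    "qtl": "add qtl -c heat.ini -v",
}

# Maps each job to the jobs that must finish before it.
PREREQUISITES = {
    "organism": set(),
    "sequence": {"organism"},
    "annotation": {"sequence"},  # assumption: gene tracks need existing maps
    "annotation_qualifier": {"annotation"},
    "ontology": {"organism"},
    "qtl": {"ontology"},
}


def ordered_commands(jobs, prerequisites):
    """Return the commands in an order that respects the prerequisites."""
    order = TopologicalSorter(prerequisites).static_order()
    return [jobs[name] for name in order]


for command in ordered_commands(JOBS, PREREQUISITES):
    print(command)
```

In an unattended script, the -f flag described in the parameter table could be appended to each command so that confirmation prompts do not stop the run.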