This section walks through a use case: adding Oryza sativa japonica (Japanese rice) to your Persephone database with PersephoneShell. To add the rice genomic data, perform the following tasks:

  1. Add an Organism
  2. Download the genomic annotation files for Oryza sativa. You will need these files for Steps 3, 5, and 6. Save them in a location that PersephoneShell can access.
  3. Add Map Set, Maps, and Sequences
  4. (Optional) Verify Addition of Maps, Map Sets, and Map Set Trees
  5. Add Gene Annotations
  6. Add Markers

Please also familiarize yourself with Persephone's data hierarchy, which is described below.

Tip

If you need to delete any of the data you have loaded, perform the steps outlined in Deleting Loaded Data or use the 'init' command.

Persephone Data Hierarchy

The figure below shows some of the Persephone objects and their relationships.

[Figure: Data Hierarchy]

The highest-level model object is Organism, which contains taxonomy information such as the scientific name and common name. An organism's genomic data can be organized into multiple map sets, which usually represent different assembly builds. A map set consists of chromosomes, scaffolds, or genetic maps, collectively called maps. A physical map, such as a chromosome or a scaffold, represents a genomic sequence whose features are located in base pairs (bp; 1-based coordinates), while a genetic map represents distances between genetic markers and gene loci in centimorgans (cM; floating-point values).
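The hierarchy can be pictured as nested records: an organism holds map sets, and each map set holds maps that are either physical (bp) or genetic (cM). The Python sketch below is purely illustrative. The class names, fields, and the `check_position` helper are hypothetical and are not Persephone's actual data model or API; the rule that cM positions are non-negative is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class MapKind(Enum):
    PHYSICAL = "bp"  # chromosome or scaffold: 1-based base-pair positions
    GENETIC = "cM"   # genetic map: floating-point centimorgan positions


@dataclass
class Map:
    name: str
    kind: MapKind

    def check_position(self, pos: Union[int, float]) -> bool:
        """Hypothetical helper: is pos a plausible coordinate for this map kind?"""
        if self.kind is MapKind.PHYSICAL:
            # Physical maps count whole base pairs starting from 1.
            return isinstance(pos, int) and pos >= 1
        # Genetic maps use cM distances, which may be fractional
        # (assumed non-negative in this sketch).
        return isinstance(pos, (int, float)) and pos >= 0.0


@dataclass
class MapSet:
    name: str  # usually corresponds to one assembly build
    maps: List[Map] = field(default_factory=list)


@dataclass
class Organism:
    scientific_name: str
    common_name: str
    map_sets: List[MapSet] = field(default_factory=list)


# Example: rice with one hypothetical assembly build containing one chromosome.
chr1 = Map("chr1", MapKind.PHYSICAL)
rice = Organism("Oryza sativa japonica", "Japanese rice",
                [MapSet("build_1", [chr1])])
print(chr1.check_position(1), chr1.check_position(0))  # True False
```

Keeping the coordinate unit on the map, rather than on each feature, mirrors the description above: whether a position is read as bp or cM depends on what kind of map the feature belongs to.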

Mapped features are displayed as tracks. A map can contain multiple tracks of different kinds, including:

  • annotation tracks with gene models,
  • marker tracks with SNP markers or repeats,
  • quantitative tracks with RNA-seq coverage or methylation profiles,
  • variation tracks with multiple genotypes,
  • QTL tracks,
  • tracks with expression data, etc.
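The relationship between a map and its tracks can be sketched as a map holding a list of typed tracks, where several tracks may share the same kind. The Python below is a hypothetical illustration only: `TrackKind`, `Track`, `MapWithTracks`, and `by_kind` are invented names, not part of Persephone or PersephoneShell, and the enum lists only the kinds named above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TrackKind(Enum):
    ANNOTATION = "annotation"      # gene models
    MARKER = "marker"              # SNP markers or repeats
    QUANTITATIVE = "quantitative"  # RNA-seq coverage, methylation profiles
    VARIATION = "variation"        # multiple genotypes
    QTL = "qtl"                    # QTL regions
    EXPRESSION = "expression"      # expression data


@dataclass
class Track:
    name: str
    kind: TrackKind


@dataclass
class MapWithTracks:
    map_name: str
    tracks: List[Track] = field(default_factory=list)

    def add_track(self, track: Track) -> None:
        self.tracks.append(track)

    def by_kind(self) -> Dict[TrackKind, List[str]]:
        """Group track names by kind, preserving insertion order."""
        grouped: Dict[TrackKind, List[str]] = {}
        for track in self.tracks:
            grouped.setdefault(track.kind, []).append(track.name)
        return grouped


# Example: one chromosome carrying an annotation track and two marker tracks.
chr1 = MapWithTracks("chr1")
chr1.add_track(Track("gene models", TrackKind.ANNOTATION))
chr1.add_track(Track("SNPs", TrackKind.MARKER))
chr1.add_track(Track("repeats", TrackKind.MARKER))
print(chr1.by_kind()[TrackKind.MARKER])  # ['SNPs', 'repeats']
```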