Loading Data into Persephone
Persephone stores its data in two places: a relational database and the file system. The relational database holds most of the objects that require consistency. The bulk of compressed binary blocks, such as genomic sequences, BAM/CRAM records, or variant data, is stored outside the database in the file system, which makes it easier to maintain and back up.
Data is loaded with PersephoneShell (psh). This tool manages the database objects that the Persephone client program visualizes. The shell allows users to list, add, and modify objects such as organisms, genomic sequences, gene annotations, and so forth.
Please note that PersephoneShell is an administrative tool that should be run by a person who has received proper training. The following documentation should provide enough knowledge to perform all data management tasks.
The data in the database is shared between all users of the Persephone application. In addition, users can add their own individual data sets, which are visible only to them. This user data is stored in the file system and is not accessible to PersephoneShell. All maintenance of the individual data sets is done via the main Persephone client application. External files with genomic sequences, gene annotations, NGS read alignments, etc. are uploaded by drag and drop or from a URL.
Tip
It is quite common to use the drag-and-drop feature to preview data files before loading them into the database.
The Persephone software stack is usually supplied as a Docker image, with all components pre-installed. Sections such as Setting Up PersephoneShell and Initializing the Schema describe advanced customization and can be skipped at first. A separate page describes the typical steps for using PersephoneShell under Docker.
See the following sections for more information.
- Setting Up PersephoneShell. This section describes how to install PersephoneShell.
- Running PersephoneShell. How to run PersephoneShell from a command prompt. Learn some common tricks.
- Initializing the Schema. How to use the init command to initialize the Persephone schema and add configuration values.
- Use Case. A use case where PersephoneShell is used to add an organism and corresponding map sets, maps, sequences, annotations, markers and other tracks. This section can be used as a basic tutorial.
- Commands. The detailed reference of PersephoneShell's commands.
- Control files. The structure of the INI file format and common rules for editing the files.