Initializing the Schema
PersephoneShell provides an init command that initializes a Persephone schema and adds configuration values. If the schema is empty, PersephoneShell suggests running init right after it starts. Otherwise, the init command displays a warning that the schema is not empty and, with your permission, clears the existing objects.
PS> init -v
Schema is not empty (2 Organism(s), 2 MapSet(s) found). If you proceed, all the data will be lost.
Do you want to initialize the schema? (Y/N) Y
Checking DB settings:
- max_connections != 500
Please review the warnings, they may help with the database performance issues.
The recommended parameters are listed at https://help.persephonesoft.com/SettingupthePersephoneSystem.html
Creating directory /home/ec2-user/bin/blast
Do you want to download the BLAST binary files from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.0/ncbi-blast-2.10.0+-x64-linux.tar.gz? (Y/N) Y
Downloading BLAST binary files...
Extracting BLAST binary files into /home/ec2-user/bin/blast
Please specify a directory where BLAST data files will be stored.
BLAST data directory? /home/ec2-user/blastdata
Creating the Persephone schema...
Deleting objects... Completed
Creating objects... Completed
Inserting values... Completed
Schema have been successfully created.
Before uploading any data, please add at least one sequence storage.
NOTE: You can do it right now, or anytime later using 'add storage' command.
Do you want to add the sequence storage now? (Y/N) Y
Storage Id (positive integer)? 1
Configuration (path to the folder for sequences)? /home/ec2-user/seq
The folder already exists. All data in this folder will be erased. Do you want to proceed? (Y/N) Y
Storage priority (leave blank for highest priority)?
Following storage could be added:
- Id: 1
- Path: /home/ec2-user/seq
Do you want to add the storage? (Y/N) Y
'sequence.options' file is missing in storage folder '/home/ec2-user/seq'.
File created: '/home/ec2-user/seq/sequence.options'.
Storage added
Would you like to add a sample dataset (Arabidopsis thaliana)?
NOTE: You can do it anytime later by issuing the following two commands:
add organism -v -c <PersephoneShellFolder>\Samples\Organism\add_Arabidopsis_thaliana.ini
add sequencedatabase -v -c <PersephoneShellFolder>\Samples\SequenceDatabase\add_TAIR10.ini
Do you want to add the sample dataset? (Y/N) Y
During initialization, PersephoneShell checks a few database configuration values and warns if they differ from the recommended values. These warnings matter most when you are setting up a production system. Some values are critical: if they differ from the required ones, the installation halts.
One of the installation steps is installing the BLAST binaries. They are installed into the folder specified in the PersephoneShell configuration file under BlastBinDir. You can also point BlastBinDir at an existing BLAST installation directory. In that case, PersephoneShell asks you to confirm that you want to keep using the existing directory.
Another important parameter is the location of the BLAST index files. Enter the corresponding folder at the prompt for the BLAST data directory, and make sure that PersephoneShell has write permission for it. For each newly loaded genome and gene annotation, PersephoneShell exports the sequences from the database and stores them in that folder.
Note:
The list of the BLAST data folders for each connection is stored in the file blast.ini. If needed, it can be edited manually.
For a MySQL-compatible database, you need to provide the location of the storage for files with compressed genomic sequences. (With Oracle, you have the option of storing the sequences in the database, but we recommend using external file storage, where the sequences are efficiently compressed.)
Persephone can use several storage locations for the genomic sequences, each with its own ID. For example, if you run out of space, you can add a new storage and start using it for new sequences. During initialization, you can give the first storage the ID 1. As with other writable folders, make sure that PersephoneShell has write permission for the storage folder.
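The write-permission requirement for the BLAST data directory and the sequence storage folder can be checked before running init. The following is a minimal Python sketch and is not part of PersephoneShell. The helper name is_writable_dir is hypothetical, and the two paths are the ones entered in the example session above. Run it as the same OS user that runs PersephoneShell.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if `path` is an existing directory where files can be created."""
    if not os.path.isdir(path):
        return False
    try:
        # Create and immediately discard a temporary file. This tests real
        # write access rather than relying on permission bits alone.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Paths taken from the example init session; replace them with your own.
    for folder in ("/home/ec2-user/blastdata", "/home/ec2-user/seq"):
        status = "writable" if is_writable_dir(folder) else "missing or not writable"
        print(f"{folder}: {status}")
```

A folder reported as missing or not writable should be created, or have its ownership and permissions corrected, before it is entered at the corresponding init prompt.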
We recommend adding the sample dataset of the Arabidopsis thaliana genome. It creates a map set with a couple of tracks and the corresponding BLAST data files, which will help you test the entire system.
Note:
The database can also be initialized from a backup archive created by the backup command. In this case, please use the restore command.