Running PersephoneShell
Patterns of using PersephoneShell
Important
Currently, PersephoneShell does not support multiple sessions writing to the database. Each data-modifying command claims a write lock on the database; if another session tries to modify the data while the lock is held, a warning is displayed and that session's operation is blocked. The multi-tasking mode is coming soon.
PersephoneShell is used to load, inspect, and manipulate the objects in the database by issuing commands on the command line. Some commands require multiple parameters; in such cases, the parameters should be specified in a separate "control" file (in standard INI format) referenced on the command line as:
-c <control_file>
As a typical pattern, each loading operation should be preceded by a test run. First, run the command with the -t switch (test mode); then, if all tests succeed, remove -t and run the command again with the -v parameter (verbose output). Note that the up and down arrow keys browse through the command history, and the TAB key tries to auto-complete commands or file paths.
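The test-then-load pattern can also be scripted. The following is a minimal sketch, assuming that a failed test run exits with a non-zero status and that -t and -v may be appended at the end of the argument list. The PSH variable and the load_with_test function are hypothetical names introduced for illustration, not part of PersephoneShell.

```shell
# PSH names the wrapper script to call (hypothetical variable; defaults to "psh").
PSH="${PSH:-psh}"

# load_with_test: run a loading command first with -t (test run) and,
# only if the test run exits with status 0, repeat it with -v (verbose).
# Example call: load_with_test add sequence -c mapset1.ini
load_with_test() {
    if "$PSH" "$@" -t; then
        "$PSH" "$@" -v
    else
        echo "Test run failed; the real load was skipped." >&2
        return 1
    fi
}
```

With this helper, the real load is attempted only after the test run has finished without errors.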
The help pages for each command have numerous examples. Please check them out.
When adding a new data set, a common approach is to find an existing sample INI file in the directory corresponding to the data type, make a copy of it, and edit its values. The sample INI files that come with the PersephoneShell package may include versions suited to different sources, such as NCBI or JGI. Each sample reflects the format conventions of its data origin. For example, the GFF3 files from NCBI use AccessionNo as the name of maps, so the section [MapMapping] includes the instruction
MapIdentifiedBy=AccessionNo
The format of the FASTA headers, and hence the parsing logic, also differs between the organizations producing the files. The origin of the data is usually appended to the sample INI file name, for example: beetEL10-ncbi.ini, IRGSP-1.0-gramene.ini, Wm82.a4-phytozome.ini. Using the right template file minimizes editing and saves time.
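The copy-and-edit workflow can be sketched in a few lines of shell. This is only an illustration, assuming the sample files follow the <name>-<origin>.ini naming shown above. The functions template_origin and new_control_file are hypothetical helpers, not part of the PersephoneShell package.

```shell
# template_origin: print the origin suffix of a sample INI file name,
# i.e. the text between the last "-" and the ".ini" extension.
template_origin() {
    name=$(basename "$1" .ini)
    echo "${name##*-}"
}

# new_control_file: copy a template to a new control file,
# refusing to overwrite a file that already exists.
new_control_file() {
    template=$1
    target=$2
    if [ ! -f "$template" ]; then
        echo "No such template: $template" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        echo "Refusing to overwrite: $target" >&2
        return 1
    fi
    cp "$template" "$target"
}
```

For example, template_origin applied to IRGSP-1.0-gramene.ini prints gramene, which makes it easy to pick the template matching the source of a new data set before copying it.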
Some commands (edit, create) require specifying the map set on the command line. This information can be provided as a numeric MapSetId (e.g., 16) or as the full path to the map set in the map set tree (e.g., "Arabidopsis thaliana/TAIR10"). For example:
edit mapset 16
or
edit mapset "Arabidopsis thaliana/TAIR10"
The auto-complete feature greatly improves productivity when using the map set path. In the example above, type (note, the path is case-sensitive!):
edit mapset A
then press the TAB key to input Arabidopsis thaliana:
edit mapset "Arabidopsis thaliana/"
At this stage you can start pressing the TAB key to list all map sets in the current node or, to narrow down the list, type the first letters of the corresponding name.
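When a map set reference is handled in a shell script, it can help to know which of the two forms was supplied. The sketch below classifies a value as a numeric MapSetId or a tree path. The function name mapset_ref_kind is hypothetical, and the check is purely textual: it does not consult the database.

```shell
# mapset_ref_kind: print "id" for an all-digit value (e.g. 16),
# "path" for anything else (e.g. "Arabidopsis thaliana/TAIR10"),
# and "invalid" for an empty value.
mapset_ref_kind() {
    case ${1:-} in
        '')        echo invalid ;;
        *[!0-9]*)  echo path ;;
        *)         echo id ;;
    esac
}
```

A path that contains spaces must stay inside double quotes so that the shell passes it as a single argument.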
Rules for the objects in the database
- Map set's AccessionNo should be unique across the entire database.
- Map name's AccessionNo should be unique within the map set.
- All track names should be unique within the map set. When adding a track using the add command, the tracks added to each map will have the same name.
- The annotation tracks reference the method of the annotation. Two tracks in a map set cannot use the same method.
- Tracks can be grouped under a parent node. The command to list the tree structure of tracks is list tracktree.
- Deleting a track means deleting multiple tracks with the given name from each map. The corresponding command is delete tracktreenode.
- A marker is an object that can be mapped onto multiple maps. It is important to distinguish between a marker and a marker mapping.
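The uniqueness rules above can be checked locally before loading. The following sketch assumes a hypothetical plain-text file with one name per line (for example, the map AccessionNo values or track names intended for one map set). It is a pre-check on the input only and not a PersephoneShell feature.

```shell
# find_duplicates: print every line that occurs more than once in the file.
find_duplicates() {
    sort "$1" | uniq -d
}

# check_unique: return 0 if all lines in the file are unique;
# otherwise list the repeated names on stderr and return 1.
check_unique() {
    dups=$(find_duplicates "$1")
    if [ -n "$dups" ]; then
        echo "Duplicate names found:" >&2
        echo "$dups" >&2
        return 1
    fi
    return 0
}
```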
Interactive versus Command Line Mode
PersephoneShell is executed either in a command line mode (also known as "batch mode") or in an interactive mode. The command line mode executes a single command at a time and returns to the OS prompt. It is useful when the commands are called from a shell script. The command returns 0 for normal termination and 1 for any error.
psh <command> <target> [OPTIONS]
From here on, we assume that psh is a script with the connection name specified inside, so the examples below omit the pair of connection arguments (-s <connection_name>). A typical psh script calls mono and looks like this:
mono $PSH_ROOT/psh.exe -s prod "$@"
The following session shows an example.
ubuntu@P1:~/bin/psh$ ./psh list mapsettree
Arabidopsis thaliana
  TAIR10
Fragaria
  F.iinumae v1.0
  F.vesca v4.0.a2
  Fragaria x ananassa v1.0
Malus
  Malus domestica HFTH1_v1.0.a1
  Malus x domestica GDDH13 v1.1
Prunus
  Prunus persica v2.0
  Prunus armeniaca v1.0
  Prunus yedoensis v1.0
  Prunus dulcis v1.0
Pyrus
  P.bretschneideri DangshanSuli v1.1
  P.communis Bartlett DH v2.0
  Pyrus betulifolia v1.0
  P.ussuriensis_x_communis v1.0
Conifers
  Picea abies v1.0
Rosa
  Rosa chinensis v2.0
ubuntu@P1:~/bin/psh$
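Because the batch mode returns 0 for normal termination and 1 for any error, a shell script can stop as soon as one command fails. The sketch below test-runs several control files in turn. The PSH variable and the test_all function are hypothetical names, and the use of add sequence with -c and -t simply mirrors the examples in this section.

```shell
# PSH names the wrapper script to call (hypothetical variable; defaults to "psh").
PSH="${PSH:-psh}"

# test_all: run "add sequence -c <file> -t" for every control file given
# and stop at the first one whose test run exits with a non-zero status.
test_all() {
    for ini in "$@"; do
        if ! "$PSH" add sequence -c "$ini" -t; then
            echo "Test run failed for $ini" >&2
            return 1
        fi
    done
    return 0
}
```

A call such as test_all first.ini second.ini would then report the first control file that needs attention (both file names are placeholders).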
When forming the command line for the batch mode, keep in mind that the list of parameters passed to psh should start with a verb, such as 'add' or 'create', e.g.:
psh add sequence -c mapset1.ini -t
or, when running the program on Linux without using the script:
mono psh.exe -s prod add sequence -c mapset1.ini -t
As you can see, after removing -s prod, the list of arguments starts with the command verb (add).
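A script that assembles batch command lines can guard against a misplaced argument by checking that the first parameter is a command verb. This is a rough sketch: the verb list is copied from the output of the help command, whether each verb is meaningful in batch mode is an assumption, and starts_with_verb is a hypothetical helper.

```shell
# Verbs copied from the help output; their suitability for batch mode
# is an assumption made for this illustration.
PSH_VERBS="add analyze backup cleanup create delete edit export init install list printmapping restore searchindex update version help"

# starts_with_verb: return 0 if the first argument is a known verb, else 1.
starts_with_verb() {
    for verb in $PSH_VERBS; do
        if [ "${1:-}" = "$verb" ]; then
            return 0
        fi
    done
    return 1
}
```

For instance, starts_with_verb add sequence -c mapset1.ini -t succeeds, while an argument list that still begins with -s prod is rejected.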
The interactive mode allows you not only to test out the commands but also to get a summary of the data, for example by listing the annotation qualifiers found in the GFF files or extracting map names from a sequence file.
The default shell prompt PS> indicates that you are in the interactive mode. To start PersephoneShell in the interactive mode, call the script psh without extra parameters.
ubuntu2@P5:~/bin/psh$ ./psh
PersephoneShell. Version Built on Apr 09, 2023 18:04
Copyright (C) 2014-2023 Persephone Software, LLC.
$DATA variable is set to '/home/ubuntu2/bin/psh/data/'
PS>
Tip
In the interactive mode, use the up and down arrows on your keyboard to scroll through your command history.
Displaying the Help Menu
To list supported commands, use the help command:
PS> help
PersephoneShell 1.0.8826.19279
Copyright (C) 2014-2024 Persephone Software, LLC.
add           Add source(s) to the Persephone database.
analyze       Analyze Persephone database parameters or a FASTA file.
backup        Backup database/sequences/file storage.
cleanup       Cleanup temporary or orphan vcf/sequence/blast folders.
clear         Clear the screen
color         Set the interface colors
create        Run analysis and create entries in the DB
delete        Delete object(s) from the Persephone database.
edit          Edit selected object.
export        Export DNA or protein sequences and create BLAST index files.
history       Display the command history list.
init          Initialize the Persephone database.
install       Install a third-party program like BLAST, minimap2, etc.
list          List objects in the database.
printmapping  Print mapping table with two columns to use for MapMapping
              section
quit          Quit PersephoneShell.
restore       Restore database/sequences/file storage.
reset         Reset the interface colors to the default values
searchindex   Synchronize or rebuild Solr search indexes. Some tracks can be
              masked (skipped) from indexing.
update        Update data in DB or Storage. Updating GC content will
              create/update all the GC tracks (precalculated GC and N
              statistics for each sequence)
cd            Change current directory.
ls            List directory contents.
version       Display version information.
help          Display more information on a specific command.
Check our online help at https://help.persephonesoft.com/LoadingDataintoPersephone.html
Typing 'help' followed by a command displays detailed help for that command. For example, 'help export' and 'help export dna' display different levels of detail about the (sub-)command.
Running PersephoneShell on Mac OS and Linux Computers
You can run PersephoneShell on Linux or Mac OS X computers by installing the Mono framework. Click the appropriate link below to learn more about the Mono framework and how to install it on your machine.
- Linux: http://www.mono-project.com/docs/about-mono/supported-platforms/linux/
- Mac OS X: http://www.mono-project.com/docs/about-mono/supported-platforms/osx/
The Mono framework runs any executable file (*.exe) compiled for the .NET/Mono framework, as shown below.
mono psh.exe -s <CONNECTION_STRING_NAME>
A sample script psh is included in the package. When using PersephoneShell on Linux systems, the command 'mono psh.exe', needed to start the application, can be placed into the script. In this case, the same syntax can be used to launch PersephoneShell under Windows or Linux:
psh -s <CONNECTION_STRING_NAME>
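As an alternative to a script file, the same wrapping can be done with a shell function. The following is a minimal sketch and not the sample script shipped with the package. PSH_ROOT appears in the example script earlier in this section, while PSH_CONNECTION and MONO are hypothetical variables introduced here so that the connection name and the Mono executable can be changed without editing the function.

```shell
# Directory containing psh.exe (defaults to the current directory).
PSH_ROOT="${PSH_ROOT:-.}"
# Connection name passed to -s (hypothetical variable; "prod" as in the examples).
PSH_CONNECTION="${PSH_CONNECTION:-prod}"
# Mono executable to use (hypothetical variable).
MONO="${MONO:-mono}"

# psh: start PersephoneShell through Mono, passing all arguments along.
psh() {
    "$MONO" "$PSH_ROOT/psh.exe" -s "$PSH_CONNECTION" "$@"
}
```

This variant fixes the connection name inside the wrapper, so commands are written as psh list mapsettree rather than with an explicit -s argument.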
Note
All the needed software packages are preinstalled in the Persephone Docker image. The shell script persephone.sh supplied with the Docker image launches PersephoneShell as: ./persephone.sh psh. Note that the script uses the bash command-line interpreter, which must be installed on the host operating system.