Running PersephoneShell
Patterns of using PersephoneShell
Important
Currently, some PersephoneShell commands do not support writing into the database from multiple sessions. Each time such a data-modifying command is issued, it claims a write lock on the database; if another session issues a conflicting command, a warning is displayed and that operation is blocked. Some commands are automatically queued.
Using control files
PersephoneShell is used to load, inspect, and manipulate the objects in the database by issuing commands on the command line. Some commands require multiple parameters; in such cases, the parameters should be specified in a separate "control" file (in standard INI format) referenced on the command line as:
-c <control_file>
First test then run
As a typical pattern, each loading operation should be preceded by a test run. First, run the command with the -t switch (test). Then, if all tests have been successful, remove -t and use the -v parameter (verbose output). Note that the up and down arrow keys browse through the command history, and the TAB key tries to auto-complete commands or file paths.
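The test-then-run pattern can also be scripted in batch mode. The following is a minimal sketch, assuming that psh is the wrapper script described later on this page and that a failed test run returns a non-zero exit code (this page states that a command returns 1 for any error; whether that also covers failed -t tests is an assumption). The function name load_with_test is hypothetical and is not part of PersephoneShell.

```bash
#!/usr/bin/env bash
# Sketch: run a loading command with -t first, then for real with -v.
# Assumption: a failed test run makes psh exit with a non-zero status.
# load_with_test is a hypothetical helper name, not part of PersephoneShell.
load_with_test() {
    # "$@" holds the command without -t/-v, e.g.: add sequence -c mapset1.ini
    if psh "$@" -t; then
        # The test passed: repeat the same command with verbose output.
        psh "$@" -v
    else
        echo "Test run failed; nothing was loaded." >&2
        return 1
    fi
}
```

With this helper, a call such as load_with_test add sequence -c mapset1.ini would only perform the real load if the test run succeeded.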
Use help commands
A quick command reference is available with the command 'help', which lists all available commands. If you type a particular command after the word help, the instructions will be more specific:
PS> help edit mapset
EDIT MAPSET
Change properties of a Map Set, such as its name, AccessionNo or description.
SYNTAX: EDIT MAPSET <mapSetId|mapSetPath> [-c controlFile]
https://help.persephonesoft.com/Editmapset.html
The help pages for each command have numerous examples. Please check them out.
Start with the existing INI files
When adding a new data set, it is common to find an existing sample INI file in the directory corresponding to the data type, make a copy of it, and edit its values. The sample INI files that come with the PersephoneShell package (see the subfolder Samples under the location of psh.exe; the Docker container has it at /data/psh/Samples) may include versions suitable for different sources, such as NCBI or JGI. Each sample reflects the format specifics common to its data origin. For example, the GFF3 files from NCBI use AccessionNo as the name of maps, so the section [MapMapping] includes the instruction
MapIdentifiedBy=AccessionNo
The format of the FASTA headers, and hence the parsing logic, also differs between the organizations producing the files. The origin of the data is usually appended to the sample INI file name, for example: beetEL10-ncbi.ini, IRGSP-1.0-gramene.ini, Wm82.a4-phytozome.ini. Using the right template file will minimize the editing and save you time.
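Copying a sample and adjusting a value can be done in any editor, but it is also easy to script. The sketch below is only an illustration: the helper names new_control_file and set_ini_value are hypothetical, and the sed-based edit assumes that the key already exists in the sample, occurs only once, and that the new value contains no | or & characters.

```bash
#!/usr/bin/env bash
# Sketch: start a new control file from a sample and change one key.
# new_control_file and set_ini_value are hypothetical helper names.
new_control_file() {
    local sample="$1" target="$2"
    # Refuse to overwrite an existing control file.
    if [ -e "$target" ]; then
        echo "$target already exists" >&2
        return 1
    fi
    cp -- "$sample" "$target"
}

set_ini_value() {
    local file="$1" key="$2" value="$3"
    # Replace the value on the line that starts with "key=".
    # Assumption: the key occurs once and is already present in the file.
    # A backup of the previous version is kept as "$file.bak".
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
}
```

For example, after copying beetEL10-ncbi.ini from the Samples folder to a new file with new_control_file, the call set_ini_value mygenome.ini MapIdentifiedBy AccessionNo would set that instruction in the copy (mygenome.ini is a placeholder name).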
Address map sets via MapSetId or path
Some commands (edit, create) require specifying the map set on the command line. This information can be provided as a numeric MapSetId (e.g., 16) or as the full path to the map set in the map set tree (e.g., "Arabidopsis thaliana/TAIR10"). For example:
edit mapset 16
or
edit mapset "Arabidopsis thaliana/TAIR10"
The auto-complete feature greatly improves productivity when using the map set path. In the example above, type (note, the path is case-sensitive!):
edit mapset A
then press the TAB key to input Arabidopsis thaliana:
edit mapset "Arabidopsis thaliana/"
At this stage you can start pressing the TAB key to list all map sets in the current node or, to narrow down the list, type the first letters of the corresponding name.
Rules for the objects in the database
- Map set's AccessionNo should be unique across the entire database.
- Map name's AccessionNo should be unique within the map set.
- All track names should be unique within the map set. When adding a track using the add command, the tracks added to each map will have the same name.
- The annotation tracks reference the method of the annotation. Two tracks in a map set cannot use the same method.
- Tracks can be grouped under a parent node. The command to list the tree structure of tracks is list tracktree.
- Deleting a track means deleting the tracks with the given name from each map. The corresponding command is delete tracktreenode.
- A marker is an object that can be mapped onto multiple maps, so it is important to distinguish between a marker and a marker mapping.
Interactive versus Command Line Mode
PersephoneShell is executed either in a command line mode (also known as "batch mode") or in an interactive mode. The command line mode executes a single command at a time and returns to the OS prompt. It is useful when the commands are called from a shell script. The command returns 0 for normal termination and 1 for any error.
psh <command> <target> [OPTIONS]
Here and in the rest of this page, we assume that psh is a script that has the connection name specified inside, so the connection arguments (-s <connection_name>) are omitted from the examples. A typical psh script invokes mono and looks like this:
mono $PSH_ROOT/psh.exe -s prod "$@"
The following session shows an example.
ubuntu@P1:~/bin/psh$ ./psh list mapsettree
Arabidopsis thaliana
  TAIR10
Fragaria
  F.iinumae v1.0
  F.vesca v4.0.a2
  Fragaria x ananassa v1.0
Malus
  Malus domestica HFTH1_v1.0.a1
  Malus x domestica GDDH13 v1.1
Prunus
  Prunus persica v2.0
  Prunus armeniaca v1.0
  Prunus yedoensis v1.0
  Prunus dulcis v1.0
Pyrus
  P.bretschneideri DangshanSuli v1.1
  P.communis Bartlett DH v2.0
  Pyrus betulifolia v1.0
  P.ussuriensis_x_communis v1.0
Conifers
  Picea abies v1.0
Rosa
  Rosa chinensis v2.0
ubuntu@P1:~/bin/psh$
When forming the command line for the batch mode, keep in mind that the list of parameters passed to psh should start with a verb, such as 'add' or 'create', e.g.:
psh add sequence -c mapset1.ini -t
or, when running the program on Linux without using the script:
mono psh.exe -s prod add sequence -c mapset1.ini -t
As you can see, after removing -s prod, the list of arguments starts with the command verb (add).
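Because the batch mode returns 0 on normal termination and 1 on any error, a shell script can chain several commands and stop at the first failure. The following is a minimal sketch under that assumption; run_all is a hypothetical helper name, and the control file names passed to it are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: load several control files in batch mode, stopping at the first error.
# Relies on the exit codes described above: 0 for success, 1 for any error.
# run_all is a hypothetical helper name, not part of PersephoneShell.
run_all() {
    local ini
    for ini in "$@"; do
        echo "Loading $ini"
        if ! psh add sequence -c "$ini" -v; then
            echo "psh failed on $ini; stopping." >&2
            return 1
        fi
    done
}
```

A call such as run_all mapset1.ini mapset2.ini would process the files in order and skip the remaining ones after a failure.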
The interactive mode allows you not only to test out commands but also to get a summary of the data, such as listing the annotation qualifiers found in the GFF files or extracting map names from a sequence file.
The default shell prompt PS> indicates that you are in the interactive mode. To start PersephoneShell in the interactive mode, call the script psh without extra parameters.
ubuntu2@P5:~/bin/psh$ ./psh
PersephoneShell. Version Built on Apr 09, 2023 18:04
Copyright (C) 2014-2023 Persephone Software, LLC.
$DATA variable is set to '/home/ubuntu2/bin/psh/data/'
PS>
Tip
In the interactive mode, use the up and down arrows on your keyboard to scroll through your command history.
Running PersephoneShell on Mac OS and Linux Computers
You can run PersephoneShell on Linux or Mac OS X computers by installing the Mono framework (note, the Docker image has it preinstalled). Click the appropriate link below to learn more about the Mono framework and how to install it on your machine.
- Linux: http://www.mono-project.com/docs/about-mono/supported-platforms/linux/
- Mac OS X: http://www.mono-project.com/docs/about-mono/supported-platforms/osx/
The Mono framework runs any executable file (*.exe) compiled for the .NET/Mono framework, as shown below.
mono psh.exe -s <CONNECTION_STRING_NAME>
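A launching script can check that Mono is installed before trying to start the application, so that a missing installation produces a clear message. This is only a sketch; require_cmd is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: fail early with a clear message if a required program is missing.
# require_cmd is a hypothetical helper name, not part of PersephoneShell.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Required program '$1' was not found in PATH." >&2
        return 1
    fi
}
```

In a launching script, this could be used as require_cmd mono || exit 1 on the line before the mono call.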
A sample script psh is included in the package. When using PersephoneShell on Linux systems, the command 'mono psh.exe', needed to start the application, and the connection name can be placed into the script. In this case, the same syntax can be used to launch PersephoneShell under Windows or Linux:
./psh
The script for starting PersephoneShell typically includes the connection name and has this form:
mono psh.exe -s prod "$@"
The pair of parameters '-s prod' specifies the connection name (prod) and the last parameter "$@" will pass all other arguments, such as commands for immediate execution, from the command line to the application.
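The quotes around "$@" matter when an argument contains spaces, as map set paths often do. The sketch below illustrates the difference with plain shell functions; count_args is a hypothetical stand-in for psh.exe that only reports how many arguments it received.

```bash
#!/usr/bin/env bash
# Sketch: why the wrapper uses "$@" rather than an unquoted $*.
# count_args stands in for psh.exe and reports how many arguments it got.
count_args() { echo "$#"; }

forward_quoted()   { count_args "$@"; }   # keeps each argument intact
forward_unquoted() { count_args $*; }     # re-splits arguments on spaces

forward_quoted   edit mapset "Arabidopsis thaliana/TAIR10"   # prints 3
forward_unquoted edit mapset "Arabidopsis thaliana/TAIR10"   # prints 4
```

With "$@", the quoted map set path reaches the application as a single argument, which is what a command such as edit mapset expects.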
Note
All the needed software packages are preinstalled in the Persephone Docker image. The shell script persephone.sh supplied with the Docker image will launch PersephoneShell as: ./persephone.sh psh. Note that the script uses the bash command-line interpreter, which must be installed on the host operating system. You can create a launching script psh that looks like this:
./persephone.sh psh "$@"
With this script, launching the application from outside Docker can be done with one word: psh. This will allow you to reuse the commands for running PersephoneShell examples from this documentation.