Patterns of using PersephoneShell

Important

Currently, PersephoneShell does not support multiple sessions writing to the database at the same time. Every data-modifying command acquires a write lock on the database; if another session tries to modify data while the lock is held, a warning is displayed and that session's operation is blocked. A mode supporting concurrent sessions is coming soon.

PersephoneShell is used to load, inspect, and manipulate objects in the database by issuing commands on the command line. When a command requires multiple parameters, they should be specified in a separate "control" file (in standard INI format), referenced on the command line as:

-c <control_file>

As a typical pattern, each loading operation should be preceded by a test run. First, run the command with the -t switch (test mode); then, if all tests succeed, remove -t and add the -v parameter (verbose output). Note that the up and down arrow keys browse through the command history, and the TAB key auto-completes commands and file paths.
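The test-then-load pattern can also be scripted in batch mode. The sketch below assumes that psh is the wrapper script that supplies the connection arguments and that a failed test run ends with a non-zero exit status (in batch mode a command returns 1 on any error). The function name test_then_load and the PSH variable are hypothetical; PSH only exists so the function can be pointed at a different executable.

```shell
#!/usr/bin/env bash
# Sketch: run a PersephoneShell command with -t (test) first and repeat it
# with -v (verbose) only if the test run exits with status 0.
# Assumption: "psh" is the wrapper script that supplies -s <connection_name>.
# PSH is a hypothetical override used only by this sketch.

test_then_load() {
    local psh="${PSH:-psh}"
    # "$@" is the command without -t/-v, e.g. add sequence -c mapset1.ini
    echo "Test run: $psh $* -t"
    if ! "$psh" "$@" -t; then
        echo "Test run failed; nothing was loaded." >&2
        return 1
    fi
    echo "Loading: $psh $* -v"
    "$psh" "$@" -v
}
```

Usage: test_then_load add sequence -c mapset1.ini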

The help pages for each command have numerous examples. Please check them out.

When adding a new data set, the usual approach is to find an existing sample INI file in the directory corresponding to the data type, make a copy of it, and edit its values. The sample INI files shipped with the PersephoneShell package may include versions for different sources, such as NCBI or JGI, and each sample follows the format conventions typical of its data origin. For example, GFF3 files from NCBI use AccessionNo as the map name, so the [MapMapping] section includes the instruction

MapIdentifiedBy=AccessionNo

The format of FASTA headers, and hence the parsing logic, also differs between the organizations producing the files. The name of the data origin is usually appended to the sample INI file name, for example: beetEL10-ncbi.ini, IRGSP-1.0-gramene.ini, Wm82.a4-phytozome.ini. Using the right template file minimizes editing and saves time.
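Because the origin is encoded as a file name suffix, the matching templates can be located and copied with standard tools before editing. This is a minimal sketch assuming the -<origin>.ini naming convention shown above; the helper names find_templates and copy_template, the samples directory, and the new file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: locate sample INI files for one data origin and copy one of them
# under a new name. Assumes templates are named <assembly>-<origin>.ini.

find_templates() {
    # $1: directory to search; $2: origin suffix, e.g. ncbi, gramene, phytozome
    local dir="$1" origin="$2"
    find "$dir" -type f -name "*-${origin}.ini" | sort
}

copy_template() {
    # $1: template to copy; $2: new control file (never overwritten)
    local template="$1" target="$2"
    if [ -e "$target" ]; then
        echo "$target already exists; not overwriting it." >&2
        return 1
    fi
    cp "$template" "$target"
}
```

Usage: find_templates ./samples ncbi lists the NCBI-style templates under a hypothetical ./samples directory; copy_template ./samples/beetEL10-ncbi.ini mybeet.ini then creates a copy to edit.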

Some commands (edit, create) require specifying the map set on the command line. This information can be provided as a numeric MapSetId (e.g., 16) or as the full path to the map set in the map set tree (e.g., "Arabidopsis thaliana/TAIR10"). For example: 

edit mapset 16 

or 

edit mapset "Arabidopsis thaliana/TAIR10"

The auto-complete feature greatly improves productivity when using map set paths. In the example above, type the following (note that the path is case-sensitive):

edit mapset A

then press the TAB key to complete Arabidopsis thaliana:

edit mapset "Arabidopsis thaliana/"

At this point, you can press the TAB key to list all map sets in the current node or, to narrow down the list, type the first few letters of the map set name.

Rules for the objects in the database

  • A map set's AccessionNo should be unique across the entire database.
  • A map's AccessionNo should be unique within its map set.
  • All track names should be unique within a map set. When a track is added with the add command, the tracks added to each map have the same name.
  • Annotation tracks reference the annotation method. Two tracks in a map set cannot use the same method.
  • Tracks can be grouped under a parent node. To list the tree structure of tracks, use the list tracktree command.
  • Deleting a track means deleting the tracks with the given name from every map. The corresponding command is delete tracktreenode.
  • A marker is an object that can be mapped onto multiple maps, so it is important to distinguish a marker from a marker mapping (the placement of that marker on a particular map).
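Since map AccessionNos must be unique within a map set, it can help to check a sequence file for repeated IDs before loading it. This is a rough pre-check under a stated assumption: it treats the first word after ">" in each FASTA header as the map ID, whereas the actual parsing is controlled by the INI file and differs between data sources. The function name duplicate_fasta_ids is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print every FASTA sequence ID that occurs more than once.
# Assumption: the ID is the first whitespace-delimited word after ">".

duplicate_fasta_ids() {
    # $1: FASTA file; prints duplicated IDs, one per line (nothing if unique)
    awk '/^>/ { print substr($1, 2) }' "$1" | sort | uniq -d
}
```

Usage: duplicate_fasta_ids genome.fa (an empty result means no repeated IDs were found under this assumption).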

Interactive versus Command Line Mode

PersephoneShell runs either in command-line mode (also known as "batch mode") or in interactive mode. In command-line mode, it executes a single command and returns to the OS prompt, which is useful when commands are called from a shell script. The command returns 0 on normal termination and 1 on any error.

psh <command> <target> [OPTIONS] 

From here on, we assume that psh is a script with the connection name specified inside it, so the connection arguments (-s <connection_name>) are omitted from the examples below. On Linux, a typical psh script calls mono and looks like this:

#!/bin/bash
mono "$PSH_ROOT/psh.exe" -s prod "$@"

The following terminal session shows an example.

ubuntu@P1:~/bin/psh$ ./psh list mapsettree
Arabidopsis thaliana
  TAIR10
Fragaria
  F.iinumae v1.0
  F.vesca v4.0.a2
  Fragaria x ananassa v1.0
Malus
  Malus domestica HFTH1_v1.0.a1
  Malus x domestica GDDH13 v1.1
Prunus
  Prunus persica v2.0
  Prunus armeniaca v1.0
  Prunus yedoensis v1.0
  Prunus dulcis v1.0
Pyrus
  P.bretschneideri DangshanSuli v1.1
  P.communis Bartlett DH v2.0
  Pyrus betulifolia v1.0
  P.ussuriensis_x_communis v1.0
Conifers
  Picea abies v1.0
Rosa
  Rosa chinensis v2.0
ubuntu@P1:~/bin/psh$
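The tree printed by list mapsettree can be turned into the full paths accepted by commands such as edit mapset. The sketch below assumes that each tree level is indented by two more spaces, as in the output above, and that psh prints nothing but the tree; mapset_paths is a hypothetical helper, not a PersephoneShell command.

```shell
#!/usr/bin/env bash
# Sketch: convert "list mapsettree" output (on stdin) into full map set
# paths such as "Arabidopsis thaliana/TAIR10", one per leaf node.
# Assumption: each tree level is indented by two more spaces.

mapset_paths() {
    awk '
    function emit(    i, p) {
        # Join the names on the current branch with "/".
        p = node[0]
        for (i = 1; i <= prev; i++) p = p "/" node[i]
        print p
    }
    NF == 0 { next }
    {
        match($0, /^ */)
        depth = int(RLENGTH / 2)
        # A line at the same or a shallower level closes the previous leaf.
        if (seen && depth <= prev) emit()
        node[depth] = substr($0, RLENGTH + 1)
        prev = depth
        seen = 1
    }
    END { if (seen) emit() }
    '
}
```

Usage: ./psh list mapsettree | mapset_paths, then pass one of the printed paths in quotes, for example edit mapset "Arabidopsis thaliana/TAIR10".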

When forming a command line for batch mode, keep in mind that the list of parameters passed to psh should start with a command verb, such as add or create, for example:

psh add sequence -c mapset1.ini -t

or, when running the program on Linux without the script:

mono psh.exe -s prod add sequence -c mapset1.ini -t

As you can see, after removing -s prod, the list of arguments starts with the command verb (add).
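Because batch mode returns 0 on normal termination and 1 on any error, a script can test several control files before loading any of them. This is a sketch, not a supplied tool: it assumes each control file is loaded with add sequence (as in the example above), and load_sequences and the PSH override variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: two-pass batch loading of several control files.
# Pass 1 runs "add sequence" with -t for every file; pass 2 loads with -v
# only if all test runs returned 0. Stops at the first non-zero exit status.
# PSH is a hypothetical override; by default the "psh" wrapper is used.

load_sequences() {
    local psh="${PSH:-psh}" ini
    for ini in "$@"; do
        if ! "$psh" add sequence -c "$ini" -t; then
            echo "Test run failed for $ini; nothing was loaded." >&2
            return 1
        fi
    done
    for ini in "$@"; do
        if ! "$psh" add sequence -c "$ini" -v; then
            echo "Loading failed for $ini; stopping." >&2
            return 1
        fi
    done
}
```

Usage: load_sequences mapset1.ini mapset2.ini. Testing everything first means a mistake in the last file is caught before the first file is loaded.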

Interactive mode lets you not only test commands but also get summaries of the data, such as listing the annotation qualifiers found in GFF files or extracting map names from a sequence file.

The default shell prompt PS> indicates that you are in the interactive mode. To start PersephoneShell in the interactive mode, call the script psh without extra parameters.


ubuntu2@P5:~/bin/psh$ ./psh
PersephoneShell. Version  Built on Apr 09, 2023 18:04
Copyright (C) 2014-2023 Persephone Software, LLC.
$DATA variable is set to '/home/ubuntu2/bin/psh/data/'

PS>

Tip

In the interactive mode, use the up and down arrows on your keyboard to scroll through your command history.

Displaying the Help Menu

To list supported commands, use the help command:

PS> help
PersephoneShell 1.0.8826.19279
Copyright (C) 2014-2024 Persephone Software, LLC.

  add             Add source(s) to the Persephone database.

  analyze         Analyze Persephone database parameters or a FASTA file.

  backup          Backup database/sequences/file storage.

  cleanup         Cleanup temporary or orphan vcf/sequence/blast folders.

  clear           Clear the screen

  color           Set the interface colors

  create          Run analysis and create entries in the DB

  delete          Delete object(s) from the Persephone database.

  edit            Edit selected object.

  export          Export DNA or protein sequences and create BLAST index files.

  history         Display the command history list.

  init            Initialize the Persephone database.

  install         Install a third-party program like BLAST, minimap2, etc.

  list            List objects in the database.

  printmapping    Print mapping table with two columns to use for MapMapping
                  section

  quit            Quit PersephoneShell.

  restore         Restore database/sequences/file storage.

  reset           Reset the interface colors to the default values

  searchindex     Synchronize or rebuild Solr search indexes. Some tracks can be
                  masked (skipped) from indexing.

  update          Update data in DB or Storage. Updating GC content will
                  create/update all the GC tracks (precalculated GC and N
                  statistics for each sequence)

  cd              Change current directory.

  ls              List directory contents.

  version         Display version information.

  help            Display more information on a specific command.

Check our online help at https://help.persephonesoft.com/LoadingDataintoPersephone.html

Typing 'help' followed by a command name displays detailed help for that command. For example, 'help export' and 'help export dna' display different levels of detail about the command and its subcommand.

Running PersephoneShell on Mac OS and Linux Computers

You can run PersephoneShell on Linux or Mac OS X computers by installing the Mono framework. See the Mono project documentation to learn more about the Mono framework and how to install it on your machine.

The Mono framework runs executable files (*.exe) built for the .NET/Mono framework, as shown below.

mono psh.exe -s <CONNECTION_STRING_NAME>

A sample psh script is included in the package. On Linux, the 'mono psh.exe' command needed to start the application can be placed in this script, so the same syntax launches PersephoneShell on both Windows and Linux:

psh -s <CONNECTION_STRING_NAME>
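Since the Linux wrapper depends on the Mono runtime, a script may want to confirm that mono is available before starting psh.exe. This is a minimal sketch; require_mono is a hypothetical helper, and the check only confirms that a mono executable can be found on the PATH, not which version is installed.

```shell
#!/usr/bin/env bash
# Sketch: fail early with a clear message if the Mono runtime is missing.

require_mono() {
    # "command -v" succeeds only if an executable named mono is on PATH.
    if command -v mono >/dev/null 2>&1; then
        return 0
    fi
    echo "mono was not found on PATH; install the Mono framework first." >&2
    return 1
}
```

Usage: require_mono && mono "$PSH_ROOT/psh.exe" -s prod, where PSH_ROOT and the connection name prod are the same assumptions as in the sample wrapper script.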

Note

All required software packages are preinstalled in the Persephone Docker image. The shell script persephone.sh supplied with the Docker image launches PersephoneShell as: ./persephone.sh psh. Note that the script uses the bash command-line interpreter, which must be installed on the host operating system.