Note

You can skip this section if you are using the supplied Docker image with the entire Persephone software stack. However, you may still find the section on preserving the configuration between software updates useful. The configuration file psh.exe.config and, optionally, custom.config are located in the directory /data/psh inside the Docker container.

If you are using your own installation of PersephoneShell (as opposed to the pre-configured Docker container), configure a few variables by editing the file psh.exe.config, which resides in the same directory as PersephoneShell. The settings to configure include:

  • database connection string
  • directories for BLAST, sequences, source data root and temporary files
  • the choice of DIAMOND or BLASTP for finding orthologs
  • Solr index settings that include the server URL, core name and an optional qualifier filter

The file psh.exe.config is an XML file with predefined and custom configuration sections. The excerpt below shows the settings that most commonly need to be provided for each installation. The same PersephoneShell binary can control several databases, which are referred to by different connection names (here, DevMaria, ORCL1 and ORCL2). Specify which database to use by giving the proper connection name on the command line (see below).

...
<!-- Connection strings for PersephoneShell-->
  <connectionStrings>
    <clear />
    <add name="ORCL1" providerName="Oracle.ManagedDataAccess.Client"  connectionString="***" />
    <add name="ORCL2" providerName="Oracle.ManagedDataAccess.Client" connectionString="???" />
    <add name="DevMaria" providerName="MySql.Data.MySqlClient" connectionString="---" />
  </connectionStrings>

  <!-- Advanced configuration for PersephoneShell 
        TempDir:          use an alternative temp dir here. If empty, it will use OS temp
        DataDir:          root location for data, this value is referenced by $DATA in the file path.
        BlastBinDir:      location of binaries for BLAST and DIAMOND. If omitted, the BLAST functionality will be disabled.
        BlastParams:      extra parameters to run BLASTP for finding orthologs.
        DiamondParams:    extra parameters to run DIAMOND for finding orthologs.
        OrthologFinder:   "BLASTP" (default) or "DIAMOND".
        PromptFormat:     custom format for the command line prompt. See online documentation.
   -->
  <PersephoneShell TempDir="/tmp" 
                   DataDir="/data/Data" 
                   BlastBinDir="~/bin/blast"
                   BlastParams="-evalue 1e-5 -max_target_seqs 5 -max_hsps 1 -word_size 4 -threshold 100 -num_threads 8"
                   DiamondParams="-e 1e-5 --max-target-seqs 5 --max-hsps 2"
                   OrthologFinder="DIAMOND"
                   PromptFormat="$g"
                   > 
    <!-- Sets the colors for Error, Warning, StackTrace, Input and Prompt messages. Allowed values:
      Black, Blue, Cyan, DarkBlue, DarkCyan, DarkGray, DarkMagenta, DarkRed, DarkYellow, Gray, Green, Magenta, Red, White and Yellow.
      Defaults: Error=Red, Warning=Yellow, StackTrace=Cyan. -->
    <!--<ConsoleColor Error="DarkCyan" Warning="DarkGray" StackTrace="DarkYellow" Prompt="White" Input="Gray"/>-->
    <ConsoleColor Prompt="White" Input="Gray"/>
  </PersephoneShell>
...

PersephoneShell looks up named connection strings in the <connectionStrings> section. On the command line, specify the connection name after the parameter -s:

psh -s DevMaria

The line above instructs PersephoneShell to find the connection string named DevMaria in the configuration file and use other settings, such as folders for sequences or data files, under that name.

If you are using the Persephone Docker image, the connection name is 'persephone'.
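To illustrate how one connection name selects both a connection string and the per-connection settings, here is a minimal Python sketch that performs the same kind of lookup on a trimmed-down psh.exe.config. It is not PersephoneShell's own code: it assumes the root element is <configuration>, that connection strings are stored unencrypted, and that per-connection folders live under <ConnectionSettings>; the function name settings_for and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down psh.exe.config used only for illustration.
SAMPLE_CONFIG = """
<configuration>
  <connectionStrings>
    <clear />
    <add name="ORCL1" providerName="Oracle.ManagedDataAccess.Client" connectionString="***" />
    <add name="DevMaria" providerName="MySql.Data.MySqlClient" connectionString="---" />
  </connectionStrings>
  <ConnectionSettings>
    <Connection Name="DevMaria">
      <FileStorage Path="/data/prod/FileStorage" />
      <BlastDbStorage Path="/data/prod/BlastDB" />
    </Connection>
  </ConnectionSettings>
</configuration>
"""


def settings_for(config_xml: str, name: str) -> dict:
    """Collect the connection string and per-connection folders for `name`."""
    root = ET.fromstring(config_xml.strip())
    conn = root.find(f"connectionStrings/add[@name='{name}']")
    if conn is None:
        raise KeyError(f"no connection string named {name!r}")
    result = {
        "provider": conn.get("providerName"),
        "connectionString": conn.get("connectionString"),
    }
    # Optional per-connection settings, e.g. FileStorage and BlastDbStorage paths.
    extra = root.find(f"ConnectionSettings/Connection[@Name='{name}']")
    if extra is not None:
        for child in extra:
            result[child.tag] = child.get("Path")
    return result


if __name__ == "__main__":
    # Roughly what "psh -s DevMaria" has to resolve before connecting.
    print(settings_for(SAMPLE_CONFIG, "DevMaria"))
```

Keeping both lookups keyed on the same name is what lets a single -s argument switch the database and its storage folders together.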

For security reasons, administrators should store the connection strings in encrypted form. To encrypt a connection string, use the cipher program included in the package: the option -e (shown below) encrypts, the option -d decrypts, and the option -c copies the result to the clipboard.

cipher -e "scott/tiger@localhost:1521/orcl1" -c

<PersephoneShell> section

The <PersephoneShell> section is designed to customize advanced options in the program. 

  • TempDir - overrides the default temporary directory defined by the operating system. If TempDir is empty, the default OS temporary location is used, which can sometimes be too restrictive.
  • DataDir - points to the root of the data folders, which can be referenced by the variable $DATA in the INI files with loading instructions. This is especially useful if you have several database instances on different machines, possibly running different operating systems; for example, production can be hosted on Linux while testing is done on Windows. When $DATA is included in the data path in the INI files (Source=), the same file can be specified by the same path value on both Windows and Linux, e.g.:
    $DATA/genes.gff

    Note that even on Windows, paths to data files in the INI files can use forward slashes.

Important: 

$DATA is recognized only at the start of the string value. A common mistake is to write the file path as /$DATA/mydata/genes.gff: the leading slash prevents $DATA from being recognized as a variable, so it is treated literally as "/$DATA".
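The $DATA rule above can be illustrated with a minimal Python sketch. It mirrors only the documented behavior (expansion happens at the very start of the value and nowhere else) and is not PersephoneShell's implementation; the function name expand_data_path and the trimming of a trailing slash from DataDir are assumptions.

```python
def expand_data_path(source: str, data_dir: str) -> str:
    """Expand $DATA only when it is the very first thing in the Source= value."""
    prefix = "$DATA"
    if source.startswith(prefix):
        # Assumption: drop a trailing slash from DataDir so "$DATA/x" never becomes "dir//x".
        return data_dir.rstrip("/\\") + source[len(prefix):]
    # Anything else, including "/$DATA/...", is used literally.
    return source


print(expand_data_path("$DATA/genes.gff", "/data/Data"))          # /data/Data/genes.gff
print(expand_data_path("/$DATA/mydata/genes.gff", "/data/Data"))  # /$DATA/mydata/genes.gff
```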

  • BlastBinDir - PersephoneShell prepares BLAST files (genomic and protein sequences) by exporting sequences from the database using internal IDs. Specify the location of the BLAST package you would like to use. If PersephoneShell does not find BLAST at that location, it will, with your permission, download the binaries from NCBI (version 2.10+) and install them into this folder (command 'install blast').
    Note that the location of the BLAST data files depends on the connection name. It is specified in the section <ConnectionSettings> under the node BlastDbStorage:


 <ConnectionSettings>
    <Connection Name="DevMaria">
      <FileStorage Path="/data/prod/FileStorage" />
      <BlastDbStorage Path="/data/prod/BlastDB" />
    </Connection>
 </ConnectionSettings>

  • BlastParams - BLASTP arguments used when PersephoneShell runs the ortholog search using BLASTP.
  • DiamondParams - DIAMOND arguments used when PersephoneShell runs the ortholog search using DIAMOND.
  • OrthologFinder - one of "BLASTP" (default) or "DIAMOND". The protein sequence aligner DIAMOND (http://diamondsearch.org) runs much faster than BLASTP while providing similar sensitivity.
  • PromptFormat - customizes the shell prompt in interactive mode. Several predefined variables are listed in the table below.

Prompt Format Variables

Variable    Meaning
$p          PS
$g          >
$n          new line
$d          current date
$t          current time

If the PromptFormat property is not specified, the default prompt format '$p$o$g ' is used, which displays as 'PS> '. Characters other than the predefined variables in the table above are printed as is. For example, the prompt format 'Ceres$g ' displays 'Ceres> '.
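The expansion rule for PromptFormat can be sketched in Python as a left-to-right scan that replaces the two-character variables from the table and copies every other character unchanged. This is an illustration, not PersephoneShell's code: the function name render_prompt is hypothetical, only the five variables in the table are handled, and the date and time formats are assumptions.

```python
from datetime import datetime
from typing import Optional


def render_prompt(fmt: str, now: Optional[datetime] = None) -> str:
    """Expand the two-character prompt variables; copy everything else as is."""
    now = now or datetime.now()
    variables = {
        "$p": "PS",
        "$g": ">",
        "$n": "\n",
        "$d": now.strftime("%Y-%m-%d"),  # assumed date format
        "$t": now.strftime("%H:%M:%S"),  # assumed time format
    }
    out = []
    i = 0
    while i < len(fmt):
        pair = fmt[i:i + 2]
        if pair in variables:
            out.append(variables[pair])
            i += 2
        else:
            out.append(fmt[i])  # ordinary character, printed as is
            i += 1
    return "".join(out)


print(repr(render_prompt("Ceres$g ")))  # 'Ceres> '
```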

The <ConsoleColor> section customizes the colors of different types of messages: Error, Warning, Prompt, Input and StackTrace. Supported colors are Black, Blue, Cyan, DarkBlue, DarkCyan, DarkGray, DarkMagenta, DarkRed, DarkYellow, Gray, Green, Magenta, Red, White and Yellow.

The default colors are Red for Error, Yellow for Warning and Cyan for StackTrace. The PersephoneShell command 'color' lets you customize the colors at run time, so each user can create their own color scheme.

<SearchIndex> section

Specify the settings for the Solr index in the <SearchIndex> section. Each entry should include the connection name, the corresponding server URL and the core name:

<SearchIndex>
    <Connection Name="test" Url="http://localhost:8983/solr/" CoreName="test" />
    <Connection Name="DevMaria" Url="http://localhost:8983/solr/" CoreName="dev" />
    <Connection Name="prod" Url="http://localhost:8983/solr/" CoreName="prod" />
</SearchIndex>

More details on configuring this node can be found in the online documentation.

Oracle DB sequences

At the end of each INI file, you can find a section [DbSequences]. It maps database column names to the Oracle sequences used to generate new IDs. These relationships can also be defined once for the entire database, making it unnecessary to specify the DB sequences in each INI file. This also helps commands that do not require an INI file but still write to the database.

The table linking column names to the corresponding Oracle sequences can be provided in a file called dbsequences.ini, located in the same directory as psh.exe. Here is a sample of such a file, listing columns that are populated with new IDs for the database connection named 'connection1':


[connection1]
DESCRIPTION.DESCR_ID=ID_SEQ
TRACK.TRACK_ID=ID_SEQ
TRACK_STYLE.TRACK_STYLE_ID=ID_SEQ
BLAST_ALIGNED_SEQ.BLAST_ALIGNED_SEQ_ID=ID_SEQ
BLAST_ALIGNED_SEQ_QUALIFIER.QUALIFIER_ID=ID_SEQ
...

The full listing of all possible database columns can be provided upon request.

Note that the mappings in the [DbSequences] section of an INI file take priority over the entries in dbsequences.ini.
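The priority rule can be expressed as a simple merge in which the INI file's [DbSequences] entries are applied last. The Python sketch below uses the standard configparser module and is only an illustration: it assumes both files are plain INI that configparser can read, the function name sequence_map is hypothetical, and TRACK_SEQ in the example is a made-up sequence name.

```python
import configparser


def _parser() -> configparser.ConfigParser:
    # interpolation=None so that '%' in values is not treated specially
    p = configparser.ConfigParser(interpolation=None)
    p.optionxform = str  # keep TABLE.COLUMN names exactly as written
    return p


def sequence_map(dbsequences_ini: str, connection: str, loader_ini: str) -> dict:
    """Column -> sequence map; [DbSequences] in the loader INI overrides dbsequences.ini."""
    merged = {}
    shared = _parser()
    shared.read_string(dbsequences_ini)
    if shared.has_section(connection):
        merged.update(shared[connection])
    loader = _parser()
    loader.read_string(loader_ini)
    if loader.has_section("DbSequences"):
        merged.update(loader["DbSequences"])  # applied last, so the INI file wins
    return merged


if __name__ == "__main__":
    shared_text = "[connection1]\nDESCRIPTION.DESCR_ID=ID_SEQ\nTRACK.TRACK_ID=ID_SEQ\n"
    loader_text = "[DbSequences]\nTRACK.TRACK_ID=TRACK_SEQ\n"  # hypothetical sequence name
    print(sequence_map(shared_text, "connection1", loader_text))
```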

Preserving the configuration

With a Docker installation, every new version of the software comes with its own copy of psh.exe.config, which contains the default configuration parameters. Note that the new file overwrites the previous one. To preserve your changes, use one of the following two methods.

1. Compare the new psh.exe.config to the backup copy (not recommended)

The old configuration file is saved as psh.exe.config.bak. If you have customized values in psh.exe.config, compare the old and new file versions (e.g., with the diff command) and restore the customized values that the new file overwrote. Do not simply restore your old copy of psh.exe.config: the new file sometimes contains new instructions required by the new version.

2. Use the file custom.config

To avoid editing the configuration file after each upgrade, extract the custom values and save them in the file custom.config, which should reside in the same directory as the PersephoneShell binaries (for Docker, /data/psh inside the container). The values from this file override the corresponding values in the new copy of psh.exe.config. For example, if your $DATA variable (specified in the node <PersephoneShell> as DataDir) should be set to /mnt/shared/data, the instruction in custom.config is:

PersephoneShell.DataDir="/mnt/shared/data"

In general, to override the value of a variable in a node, specify the path to the variable as NodeName.Variable. A node can be nested in another node, so its address must use the full path from the root. For example, the node <PersephoneShell> can have a child node <ConsoleColor>:


 <PersephoneShell TempDir="/var/tmp" DataDir="/data/Data" ...>
    <ConsoleColor Prompt="White" Input="Green" Highlight="Cyan" />
  </PersephoneShell>


To override the Prompt color, address it as:

PersephoneShell.ConsoleColor.Prompt="Red"

When several nodes share the same path, the value of one of their variables can be used to select a particular node, written as Variable(value). For example, the node <ConnectionSettings> defines configurations for connections with different names:



 <ConnectionSettings>
    <Connection Name="prod">
      <FileStorage Path="/data/prod/FileStorage" />
      <BlastDbStorage Path="/data/prod/BlastDB" />
    </Connection>
    <Connection Name="test">
      <FileStorage Path="/data/test/FileStorage" />
      <BlastDbStorage Path="/data/test/BlastDB" />
      <SequenceStorage Path="/data/test/sequences" />
    </Connection>
 </ConnectionSettings>


To change the BlastDB path for the connection "test", address it as:

ConnectionSettings.Connection.Name(test).BlastDbStorage.Path="/ebs/test/BlastDB"
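The addressing scheme described above (dot-separated node names, an optional Variable(value) selector, and a final variable name) can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration, not PersephoneShell's parser: it assumes the root element is <configuration>, that only the first matching node is changed, and that values are plain strings; the function name apply_override is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A path segment is a node or variable name, optionally followed by "(value)".
SEGMENT = re.compile(r"[^.()]+(?:\([^)]*\))?")
SELECTOR = re.compile(r"^([^()]+)\(([^)]*)\)$")


def apply_override(root: ET.Element, line: str) -> None:
    """Apply one 'Node.Child.Variable="value"' line from custom.config."""
    key, _, raw = line.partition("=")
    value = raw.strip().strip('"')
    *path, attribute = SEGMENT.findall(key.strip())
    nodes = [root]
    for segment in path:
        selector = SELECTOR.match(segment)
        if selector:
            # Name(test): keep only the nodes whose Name variable equals "test"
            name, wanted = selector.groups()
            nodes = [n for n in nodes if n.get(name) == wanted]
        else:
            # plain segment: descend to the child elements with this tag
            nodes = [child for n in nodes for child in n.findall(segment)]
    if not nodes:
        raise KeyError(f"no node matches {key.strip()!r}")
    nodes[0].set(attribute, value)  # assumption: only the first match is changed


if __name__ == "__main__":
    root = ET.fromstring(
        "<configuration><ConnectionSettings>"
        '<Connection Name="prod"><BlastDbStorage Path="/data/prod/BlastDB"/></Connection>'
        '<Connection Name="test"><BlastDbStorage Path="/data/test/BlastDB"/></Connection>'
        "</ConnectionSettings></configuration>"
    )
    apply_override(root, 'ConnectionSettings.Connection.Name(test).BlastDbStorage.Path="/ebs/test/BlastDB"')
    print(ET.tostring(root, encoding="unicode"))
```

Splitting the key only at the first "=" keeps values such as connection strings, which often contain "=", intact.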

Here are other examples of using custom.config. Each example lists the task, the relevant fragment of the original configuration file, and the corresponding instruction in custom.config.

Task: Change the connection string for the connection named "prod".

Original configuration file:

<connectionStrings>
    <clear />
    <add name="prod" providerName="MySql.Data.MySqlClient" connectionString="prod-connection-string..." />
    <add name="test" providerName="MySql.Data.MySqlClient" connectionString="test-connection-string..." />
</connectionStrings>

custom.config:

connectionStrings.add.name(prod).connectionString="new connection string..."

Task: Change the directory for user data in Persephone.

Original configuration file:

<UserSettings AnonymousLoginOnly="false" AllowAnonymousLogin="true" UsersDataDirectory="{PWD}/Users/" ... QuotaMb="200" />

custom.config:

UserSettings.UsersDataDirectory="/bigvolume/Users"

Task: Set the default temporary directory from an environment variable.

Original configuration file:

<appSettings>
    <add key="MultimapEnabled" value="true" />
    <!-- Prevents CSHTML files from being directly accessed by the browser -->
    <add key="webPages:Enabled" value="false" />
    <add key="DefaultTempDirectory" value="/var/tmp" />
</appSettings>

custom.config:

appSettings.add.key(DefaultTempDirectory).value=$TEMP

Task: Add the text of a node verbatim. Use the _ADD instruction.

Node to add:

<UserSettingsOverride Users="max*@persephonesoft.com" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>

custom.config:

UserSettings._ADD=<UserSettingsOverride Users="max*@persephonesoft.com" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>


Task: Add multiple nodes as a multi-line block of text. Enclose the block in backticks (`).

Nodes to add:

<UserSettingsOverride Users="*@persephonesoft.com" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>
<UserSettingsOverride Users="*max*" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>

custom.config:

UserSettings._ADD=`
    <UserSettingsOverride Users="*@persephonesoft.com" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>
    <UserSettingsOverride Users="*lab*" QuotaMb="20480" UsersDataDirectory="{PWD}/Users/"/>
`


Task: Remove the first record that meets the criteria. Use the _REMOVE instruction. For example, remove the MultiThreads flag from the connection named persephone.

Original configuration file:

<Connection Name="persephone" Url="http://localhost:8983/solr/" CoreName="persephone" MultiThreads="true">

custom.config:

SearchIndex.Connection.Name(persephone).MultiThreads._REMOVE

Task: Remove all matching child nodes. Use _REMOVEALL, here to delete all QualifierFilter records.

Original configuration file:

<Connection Name="persephone" Url="http://localhost:8983/solr/" CoreName="persephone" MultiThreads="true">
    <QualifierFilter Type="MARKER" Indexing="false"/>
    <QualifierFilter Type="MARKER" Name="CLNDN" Indexing="true"/>
    <QualifierFilter Type="MARKER" Name="rsId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Indexing="false"/>
    <QualifierFilter Type="ANNOT" Name="ID" Indexing="true" />
</Connection>

custom.config:

SearchIndex.Connection.Name(persephone).QualifierFilter._REMOVEALL

Task: Replace all QualifierFilter records for the connection named persephone. First, remove all existing filters with _REMOVEALL, then add the new filters as a multi-line block of text enclosed in backticks (`).

Original configuration file:

<Connection Name="persephone" Url="http://localhost:8983/solr/" CoreName="persephone" MultiThreads="true">
    <QualifierFilter Type="MARKER" Indexing="false"/>
    <QualifierFilter Type="MARKER" Name="rsId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Indexing="false"/>
    <QualifierFilter Type="ANNOT" Name="ID" Indexing="true" />
    <QualifierFilter Type="ANNOT" Name="Note" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Info" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Function description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Alias" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="SwissProt match" Indexing="true"/>
</Connection>

custom.config:

SearchIndex.Connection.Name(persephone).QualifierFilter._REMOVEALL
SearchIndex.Connection.Name(persephone)._ADD=`
    <QualifierFilter Type="MARKER" Indexing="false"/>
    <QualifierFilter Type="MARKER" Name="rsId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Indexing="false"/>
    <QualifierFilter Type="ANNOT" Name="ID" Indexing="true" />
    <QualifierFilter Type="ANNOT" Name="transcript_id" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="transcriptName" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="transcriptId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="transcriptID" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="product" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="old_locus_tag" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="note" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="locus_tag" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="iwgsc_id" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="gene_id" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="gene_synonym" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="geneName" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="geneId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="gene" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="definition" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="alias" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Synonym" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Parent" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Name" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Note" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Info" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Function description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Alias" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="SwissProt match" Indexing="true"/>
`
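The _ADD, _REMOVE and _REMOVEALL instructions in the examples above can be modeled in a short Python sketch built on xml.etree.ElementTree. It is an interpretation of the examples, not PersephoneShell's implementation: it assumes the root element is <configuration>, that _REMOVE deletes a variable (attribute) if one with that name exists and otherwise the first child node with that tag, that _ADD appends to the first matching node, and that a backtick block runs until a line containing only a backtick. The function names parse_custom_config, resolve and apply_instruction are hypothetical, and plain Variable="value" lines are deliberately not handled here.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

SEGMENT = re.compile(r"[^.()]+(?:\([^)]*\))?")
SELECTOR = re.compile(r"^([^()]+)\(([^)]*)\)$")


def resolve(root: ET.Element, segments: List[str]) -> List[ET.Element]:
    """Follow node names and Variable(value) selectors from the root element."""
    nodes = [root]
    for segment in segments:
        selector = SELECTOR.match(segment)
        if selector:
            name, wanted = selector.groups()
            nodes = [n for n in nodes if n.get(name) == wanted]
        else:
            nodes = [c for n in nodes for c in n.findall(segment)]
    if not nodes:
        raise KeyError(".".join(segments))
    return nodes


def parse_custom_config(text: str) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (key, value); value is None for bare instructions such as ..._REMOVE."""
    lines = iter(text.splitlines())
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            yield key.strip(), None
        elif value == "`":
            block = []
            for inner in lines:  # collect everything up to the closing backtick line
                if inner.strip() == "`":
                    break
                block.append(inner)
            yield key.strip(), "\n".join(block)
        else:
            yield key.strip(), value.strip('"')


def apply_instruction(root: ET.Element, key: str, value: Optional[str]) -> None:
    *path, action = SEGMENT.findall(key)
    if action == "_ADD":
        if value is None:
            raise ValueError("_ADD needs a node text")
        # Wrap the fragment so several sibling nodes parse as one XML document.
        fragment = ET.fromstring(f"<wrapper>{value}</wrapper>")
        resolve(root, path)[0].extend(list(fragment))
    elif action == "_REMOVEALL":
        *parent_path, tag = path
        for parent in resolve(root, parent_path):
            for child in parent.findall(tag):
                parent.remove(child)
    elif action == "_REMOVE":
        *parent_path, name = path
        parent = resolve(root, parent_path)[0]
        if name in parent.attrib:
            del parent.attrib[name]  # e.g. ...Name(persephone).MultiThreads._REMOVE
        elif parent.find(name) is not None:
            parent.remove(parent.find(name))  # first child node with that tag
    else:
        raise ValueError(f"this sketch only handles _ADD/_REMOVE/_REMOVEALL, not {action!r}")
```

Processing the lines in file order matters: in the replacement example, _REMOVEALL must run before _ADD, otherwise the freshly added filters would be deleted as well.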