If you are using your own installation of PersephoneShell (as opposed to the pre-configured Docker container), a few settings need to be configured. Edit the file psh.exe.config located in the same directory as the PersephoneShell executable. The settings to configure include:

  • database connection string
  • directories for BLAST, data and temporary files
  • selecting whether DIAMOND or BLASTP should be used to find orthologs
  • Solr index settings that include the server URL, core name and optional qualifier filter

The file psh.exe.config is an XML file with predefined and custom configuration sections. The excerpt below shows the settings that most commonly need to be provided for each installation.

...
<!-- Connection strings for PersephoneShell-->
  <connectionStrings>
    <clear />
    <add name="ORCL1" providerName="Oracle.ManagedDataAccess.Client"  connectionString="***" />
    <add name="ORCL2" providerName="Oracle.ManagedDataAccess.Client" connectionString="???" />
    <add name="MYSQL" providerName="MySql.Data.MySqlClient" connectionString="---" />
  </connectionStrings>

  <!-- Advanced configuration for PersephoneShell 
        TempDir:          use an alternative temp dir here. If empty, it will use OS temp
        DeleteOrphanData: if true, objects without parent will be deleted, such as markers without any mapping.
        DataDir:          root location for data, this value is referenced by $DATA in the file path.
        BlastBinDir:      location of binaries for BLAST and DIAMOND. If omitted, the BLAST functionality will be disabled.
        BlastParams:      extra parameters to run BLASTP for finding orthologs.
        DiamondParams:    extra parameters to run DIAMOND for finding orthologs.
        OrthologFinder:   "BLASTP" (default) or "DIAMOND".
        PromptFormat:     custom format for the command line prompt. See online documentation.
   -->
  <PersephoneShell DeleteOrphanData="true" 
                   TempDir="/tmp" 
                   DataDir="~/bin/psh/data" 
                   BlastBinDir="~/bin/blast"
                   BlastParams="-evalue 1e-5 -max_target_seqs 5 -max_hsps 1 -word_size 4 -threshold 100 -num_threads 8"
                   DiamondParams="-e 1e-5 --max-target-seqs 5 --max-hsps 2"
                   OrthologFinder="DIAMOND"
                   PromptFormat="$g"
                   > 
    <!-- Sets colors for error, warning, stacktrace, input or prompt messages. Choose among
      Black, Blue, Cyan, DarkBlue, DarkCyan, DarkGray, DarkMagenta, DarkRed, DarkYellow, Gray, Green, Magenta, Red, White and Yellow.
      The defaults are Error:Red, Warning:Yellow and StackTrace:Cyan. -->
    <!--<ConsoleColor Error="DarkCyan" Warning="DarkGray" StackTrace="DarkYellow" Prompt="White" Input="Gray"/>-->
    <ConsoleColor Prompt="White" Input="Gray"/>
  </PersephoneShell>
...

PersephoneShell looks up connection strings by name in the <connectionStrings> section. On the command line, specify the connection name after the -s parameter:

psh -s ORCL1

The line above instructs PersephoneShell to find the connection string named 'ORCL1' in the configuration file. 

It is also important to specify the correct database provider (providerName). Because MariaDB is MySQL-compatible, use MySql.Data.MySqlClient as the provider name for MariaDB.
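To illustrate the lookup, here is a minimal Python sketch that finds a named entry in <connectionStrings> with the standard-library ElementTree module, the way psh -s ORCL1 selects the entry named 'ORCL1'. It is a hypothetical illustration, not PersephoneShell's code: the function find_connection and the sample configuration are made up, and decryption of encrypted connection strings is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of psh.exe.config; the connection strings are placeholders.
SAMPLE_CONFIG = """
<configuration>
  <connectionStrings>
    <clear />
    <add name="ORCL1" providerName="Oracle.ManagedDataAccess.Client" connectionString="***" />
    <add name="MYSQL" providerName="MySql.Data.MySqlClient" connectionString="---" />
  </connectionStrings>
</configuration>
"""


def find_connection(config_xml: str, name: str) -> tuple:
    """Return (providerName, connectionString) for the entry called `name`."""
    root = ET.fromstring(config_xml.strip())
    section = root.find("connectionStrings")
    if section is None:
        raise KeyError("no <connectionStrings> section")
    entries = {}
    for child in section:
        if child.tag == "clear":
            entries.clear()  # <clear /> drops any entries defined before it
        elif child.tag == "add":
            entries[child.get("name")] = (child.get("providerName"), child.get("connectionString"))
    if name not in entries:
        raise KeyError(f"connection '{name}' not found")
    return entries[name]


provider, connection_string = find_connection(SAMPLE_CONFIG, "ORCL1")
print(provider)  # Oracle.ManagedDataAccess.Client
```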

For security, administrators should provide encrypted connection strings, similar to those used by the main Persephone application. To encrypt a connection string, use the cipher program included in the package: option -e (shown below) encrypts, option -d decrypts, and option -c copies the result to the clipboard.

cipher -e "scott/tiger@localhost:1521/orcl1" -c

<PersephoneShell> section

The <PersephoneShell> section is designed to customize advanced options in the program. 

  • DeleteOrphanData - when tracks are deleted, some objects, such as markers, can be left orphaned, that is, no longer tied to any location on a map. If DeleteOrphanData is true, such orphan objects are deleted.
  • TempDir - overrides the default temporary location defined by the operating system. If TempDir is empty, the default OS temporary partition is used, which can sometimes be too restrictive.
  • DataDir - points to the root of the data folders, which can be referenced by the variable $DATA. This variable can be used in the INI files with loading instructions. It is especially useful if you have several database instances located on different machines, possibly running different operating systems. For example, production can be hosted on Linux while testing is done on Windows. If the data path in an INI file (Source=) starts with $DATA, the same file can be referenced by the same path value on both Windows and Linux, e.g.:
    $DATA/genes.gff

    Note that even on Windows, the paths to the data files in the INI files can use forward slashes.

Important: 

$DATA is recognized only at the start of the string value. A common mistake is to write the file path as /$DATA/mydata/genes.gff: the leading slash prevents $DATA from being recognized as a variable, so it is treated literally as the text "$DATA".

  • BlastBinDir - the location of the BLAST (and DIAMOND) binaries you would like to use. PersephoneShell prepares BLAST files (genomic and protein sequences) by exporting sequences from the database using internal IDs. If PersephoneShell does not find BLAST at that location, it downloads the binaries from NCBI (version 2.10+) and installs them into this folder (command 'install blast'). If BlastBinDir is omitted, the BLAST functionality is disabled.
    Note that the location of the BLAST data files depends on the database schema.
  • BlastParams - BLASTP arguments used when PersephoneShell runs the ortholog search.
  • DiamondParams - DIAMOND arguments used when PersephoneShell runs the ortholog search.
  • OrthologFinder - one of "BLASTP" (default) or "DIAMOND". The protein sequence aligner DIAMOND (http://diamondsearch.org) runs much faster than BLASTP while providing similar sensitivity.
  • PromptFormat - customizes the shell prompt in the interactive mode. Several predefined variables are listed in the table below.
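The defaults described in this list can be summarized in a short Python sketch. It is hypothetical (the ShellSettings class and the load_settings and expand_data functions are not part of PersephoneShell) and covers only the documented rules: an empty TempDir falls back to the OS temporary directory, a missing BlastBinDir disables BLAST, OrthologFinder defaults to BLASTP, and $DATA is expanded only at the start of a path. Treating a missing DeleteOrphanData as false is an assumption.

```python
import tempfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShellSettings:
    delete_orphan_data: bool
    temp_dir: str
    data_dir: str
    blast_bin_dir: Optional[str]  # None: BLAST functionality is disabled
    ortholog_finder: str          # "BLASTP" or "DIAMOND"


def load_settings(node: ET.Element) -> ShellSettings:
    """Interpret <PersephoneShell> attributes using the documented defaults."""
    finder = node.get("OrthologFinder") or "BLASTP"  # BLASTP is the default
    if finder not in ("BLASTP", "DIAMOND"):
        raise ValueError(f"unsupported OrthologFinder: {finder!r}")
    return ShellSettings(
        # Assumption: a missing DeleteOrphanData is treated as false.
        delete_orphan_data=node.get("DeleteOrphanData", "false").lower() == "true",
        # An empty or missing TempDir falls back to the OS temporary directory.
        temp_dir=node.get("TempDir") or tempfile.gettempdir(),
        data_dir=node.get("DataDir", ""),
        blast_bin_dir=node.get("BlastBinDir") or None,
        ortholog_finder=finder,
    )


def expand_data(path: str, data_dir: str) -> str:
    """Replace $DATA with DataDir, but only when it starts the value."""
    prefix = "$DATA"
    if path == prefix or path.startswith((prefix + "/", prefix + "\\")):
        return data_dir.rstrip("/\\") + path[len(prefix):]
    return path  # e.g. "/$DATA/genes.gff" is kept literally


settings = load_settings(
    ET.fromstring('<PersephoneShell TempDir="" DataDir="/mnt/shared/data" OrthologFinder="DIAMOND" />')
)
print(settings.blast_bin_dir)                              # None
print(expand_data("$DATA/genes.gff", settings.data_dir))   # /mnt/shared/data/genes.gff
print(expand_data("/$DATA/genes.gff", settings.data_dir))  # /$DATA/genes.gff
```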

Prompt Format Variables

Variable    Meaning
$p          PS
$g          >
$n          new line
$d          current date
$t          current time

If the PromptFormat property is not specified, a default prompt format '$p$o$g ', which corresponds to 'PS> ', will be used. Other characters, besides the predefined variables in the table above, will be printed as is. For example, the prompt format 'Ceres$g ' will display 'Ceres> '.
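To make the expansion rules concrete, here is a hypothetical Python sketch that renders a prompt format using only the variables from the table; all other characters are copied as is. The function render_prompt is made up, and the date and time formats are assumptions, since the table only says "current date" and "current time".

```python
from datetime import datetime
from typing import Optional


def render_prompt(fmt: str, now: Optional[datetime] = None) -> str:
    """Expand the prompt variables listed in the table (sketch)."""
    now = now or datetime.now()
    values = {
        "p": "PS",
        "g": ">",
        "n": "\n",
        "d": now.strftime("%Y-%m-%d"),  # assumed date format
        "t": now.strftime("%H:%M:%S"),  # assumed time format
    }
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "$" and i + 1 < len(fmt) and fmt[i + 1] in values:
            out.append(values[fmt[i + 1]])
            i += 2
        else:
            out.append(fmt[i])  # any other character is printed as is
            i += 1
    return "".join(out)


print(render_prompt("Ceres$g "))  # prints "Ceres> "
```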

The <ConsoleColor> section is used to customize the colors of different types of messages, such as Error, Warning, Prompt, Input, and StackTrace. Supported colors include Black, Blue, Cyan, DarkBlue, DarkCyan, DarkGray, DarkMagenta, DarkRed, DarkYellow, Gray, Green, Magenta, Red, White, and Yellow.

The default colors are Red for Error, Yellow for Warning, and Cyan for StackTrace.
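How the <ConsoleColor> attributes combine with these defaults can be sketched in Python as follows. The function resolve_colors is hypothetical, and rejecting names outside the list above (with case-sensitive matching) is an assumption about how invalid values are handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Supported color names, as listed above.
ALLOWED_COLORS = {
    "Black", "Blue", "Cyan", "DarkBlue", "DarkCyan", "DarkGray", "DarkMagenta",
    "DarkRed", "DarkYellow", "Gray", "Green", "Magenta", "Red", "White", "Yellow",
}
# Documented defaults; other message types have no default in this sketch.
DEFAULT_COLORS = {"Error": "Red", "Warning": "Yellow", "StackTrace": "Cyan"}


def resolve_colors(attributes: Dict[str, str]) -> Dict[str, str]:
    """Overlay <ConsoleColor> attributes on the default colors."""
    colors = dict(DEFAULT_COLORS)
    for message_type, color in attributes.items():
        if color not in ALLOWED_COLORS:
            raise ValueError(f"{message_type}: unsupported color {color!r}")
        colors[message_type] = color
    return colors


node = ET.fromstring('<ConsoleColor Prompt="White" Input="Gray"/>')
print(resolve_colors(node.attrib))
# {'Error': 'Red', 'Warning': 'Yellow', 'StackTrace': 'Cyan', 'Prompt': 'White', 'Input': 'Gray'}
```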

<SearchIndex> section

Specify the Solr index settings in the <SearchIndex> section. Each connection entry should include the connection name, the corresponding server URL, and the core name:

<SearchIndex>
    <Connection Name="test" Url="http://localhost:8983/solr/" CoreName="test" />
    <Connection Name="dev" Url="http://localhost:8983/solr/" CoreName="dev" />
    <Connection Name="prod" Url="http://localhost:8983/solr/" CoreName="prod" />
</SearchIndex>

More details on configuring this node can be found here.
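For illustration, the Python sketch below looks up a <Connection> entry in <SearchIndex> by name and joins its Url and CoreName into a core address, following Solr's usual http://host:port/solr/corename layout. The function solr_core_url is hypothetical, and whether PersephoneShell builds the address exactly this way is an assumption.

```python
import xml.etree.ElementTree as ET

SEARCH_INDEX = """
<SearchIndex>
    <Connection Name="test" Url="http://localhost:8983/solr/" CoreName="test" />
    <Connection Name="prod" Url="http://localhost:8983/solr/" CoreName="prod" />
</SearchIndex>
"""


def solr_core_url(search_index_xml: str, name: str) -> str:
    """Return the Solr core URL for the connection called `name`."""
    root = ET.fromstring(search_index_xml.strip())
    for connection in root.findall("Connection"):
        if connection.get("Name") == name:
            base = connection.get("Url", "").rstrip("/")
            return f"{base}/{connection.get('CoreName', '')}"
    raise KeyError(f"no <Connection> named '{name}' in <SearchIndex>")


print(solr_core_url(SEARCH_INDEX, "prod"))  # http://localhost:8983/solr/prod
```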

Oracle DB sequences

At the end of each INI file, you can find a section [DbSequences]. It maps database column names to the Oracle sequences used to generate new IDs. These relationships can also be defined once for the entire database, making it unnecessary to specify the DB sequences in each INI file. This is helpful for commands that do not require an INI file but still write to the database.

The table linking column names to Oracle sequences can be provided in a file called dbsequences.ini, located in the same directory as psh.exe. Here is a sample of such a file, listing columns that get populated with new IDs for the database connection called 'connection1':


[connection1]
DESCRIPTION.DESCR_ID=ID_SEQ
TRACK.TRACK_ID=ID_SEQ
TRACK_STYLE.TRACK_STYLE_ID=ID_SEQ
BLAST_ALIGNED_SEQ.BLAST_ALIGNED_SEQ_ID=ID_SEQ
BLAST_ALIGNED_SEQ_QUALIFIER.QUALIFIER_ID=ID_SEQ
...

The full listing of all possible database columns can be provided upon request.

Note that the mappings listed in the [DbSequences] section of an INI file take priority over the records in the file dbsequences.ini.
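The lookup order can be sketched with Python's standard configparser module: the entries for the connection in dbsequences.ini are loaded first, and the [DbSequences] section of an INI file is applied on top. The function load_sequences and the sequence name TRACK_SEQ are hypothetical.

```python
import configparser
from typing import Dict


def _parser(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep TABLE.COLUMN keys in their original case
    parser.read_string(text)
    return parser


def load_sequences(dbsequences_text: str, connection: str, ini_text: str = "") -> Dict[str, str]:
    """Map TABLE.COLUMN to an Oracle sequence; INI entries take priority."""
    mapping: Dict[str, str] = {}
    global_file = _parser(dbsequences_text)
    if global_file.has_section(connection):
        mapping.update(global_file[connection])
    ini_file = _parser(ini_text)
    if ini_file.has_section("DbSequences"):
        mapping.update(ini_file["DbSequences"])  # overrides dbsequences.ini
    return mapping


DBSEQUENCES_INI = """
[connection1]
DESCRIPTION.DESCR_ID=ID_SEQ
TRACK.TRACK_ID=ID_SEQ
"""
LOADING_INI = """
[DbSequences]
TRACK.TRACK_ID=TRACK_SEQ
"""
print(load_sequences(DBSEQUENCES_INI, "connection1", LOADING_INI))
# {'DESCRIPTION.DESCR_ID': 'ID_SEQ', 'TRACK.TRACK_ID': 'TRACK_SEQ'}
```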

Preserving the configuration

With a Docker installation, every new version of the software comes with its own copy of psh.exe.config that contains the default configuration parameters, and the new file overwrites the previous one. To preserve your changes, use one of the following two methods.

1. Compare the new psh.exe.config to the backup copy (not recommended)

The old configuration file is saved as psh.exe.config.bak. If you have used custom values in psh.exe.config, compare the old and new versions of the file (e.g., with the diff command) and restore the values overwritten by the new file. (The new file sometimes contains new instructions required by the new version, so it is unsafe to simply restore your old copy of psh.exe.config.)

2. Use the file custom.config

To avoid editing the configuration file after each upgrade, extract your custom values and save them in the file custom.config. The values from this file overwrite the corresponding values in the new copy of psh.exe.config. For example, if your $DATA variable (specified in the <PersephoneShell> node as DataDir) should be set to /mnt/shared/data, the instruction in custom.config is:

PersephoneShell.DataDir="/mnt/shared/data"

In general, to overwrite the value of a variable in a node, specify the path to the variable as NodeName.Variable. A node can be nested in another node, so its address should use the full path from the root. For example, the node <PersephoneShell> can have a child node <ConsoleColor>:


  <PersephoneShell TempDir="/var/tmp" DataDir="/data/Data" ...>
    <ConsoleColor Prompt="White" Input="Green" Highlight="Cyan" />
  </PersephoneShell>


To overwrite the Prompt color, address it as

PersephoneShell.ConsoleColor.Prompt="Red"

If several nodes share the same path, the value of one of their variables can be used to identify a particular node. For example, the node <ConnectionSettings> defines configuration for connections with different names:



 <ConnectionSettings>
    <Connection Name="prod">
      <FileStorage Path="/data/prod/FileStorage" />
      <BlastDbStorage Path="/data/prod/BlastDB" />
    </Connection>
    <Connection Name="test">
      <FileStorage Path="/data/test/FileStorage" />
      <BlastDbStorage Path="/data/test/BlastDB" />
      <SequenceStorage Path="/data/test/sequences" />
    </Connection>
 </ConnectionSettings>


To change the BlastDB path for the connection "test", address it as

ConnectionSettings.Connection.Name(test).BlastDbStorage.Path="/ebs/test/BlastDB"
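The addressing scheme can be illustrated with a hypothetical Python sketch that applies one custom.config assignment to a parsed configuration with ElementTree. Dotted names select child nodes starting from the root, a token such as Name(test) keeps only the nodes whose Name attribute equals "test", and the last token is the attribute to set. The function apply_setting is made up, and several behaviors are assumptions: the root element is the standard .NET <configuration> node, only the first matching node is changed, and a value written as $NAME is read from the environment variable NAME.

```python
import os
import re
import xml.etree.ElementTree as ET

# Matches "Node" or "Attribute(value)" tokens between the dots of a path.
TOKEN = re.compile(r"([^.()]+)(?:\(([^)]*)\))?")


def apply_setting(root: ET.Element, instruction: str) -> None:
    """Apply one 'Path.To.Attribute=value' line from custom.config (sketch)."""
    path, _, raw_value = instruction.partition("=")
    value = raw_value.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]  # strip surrounding quotes
    elif value.startswith("$"):
        value = os.environ.get(value[1:], value)  # assumed: $NAME is an env variable
    *node_tokens, (attribute, _) = TOKEN.findall(path.strip())
    candidates = [root]
    for name, selector in node_tokens:
        if selector:
            # Name(test): keep only nodes whose Name attribute is "test".
            candidates = [n for n in candidates if n.get(name) == selector]
        else:
            candidates = [child for n in candidates for child in n.findall(name)]
    if not candidates:
        raise KeyError(f"no node matches {path!r}")
    candidates[0].set(attribute, value)


CONFIG = """
<configuration>
  <PersephoneShell TempDir="/var/tmp" DataDir="/data/Data">
    <ConsoleColor Prompt="White" Input="Green" Highlight="Cyan" />
  </PersephoneShell>
  <ConnectionSettings>
    <Connection Name="prod"><BlastDbStorage Path="/data/prod/BlastDB" /></Connection>
    <Connection Name="test"><BlastDbStorage Path="/data/test/BlastDB" /></Connection>
  </ConnectionSettings>
</configuration>
"""
root = ET.fromstring(CONFIG.strip())
apply_setting(root, 'PersephoneShell.ConsoleColor.Prompt="Red"')
apply_setting(root, 'ConnectionSettings.Connection.Name(test).BlastDbStorage.Path="/ebs/test/BlastDB"')
print(root.find("ConnectionSettings/Connection[@Name='test']/BlastDbStorage").get("Path"))
# /ebs/test/BlastDB
```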

Here are other examples of using custom.config.

Task

Syntax of the original configuration file

Syntax of custom.config

Change the connection string for the connection named "prod"

<connectionStrings>
    <clear />
    <add name="prod" providerName="MySql.Data.MySqlClient" connectionString="prod-connection-string..." />
    <add name="test" providerName="MySql.Data.MySqlClient" connectionString="test-connection-string..." />
</connectionStrings>

connectionStrings.add.name(prod).connectionString="new connection string..."

Change the directory for user data in Persephone

 <UserSettings AnonymousLoginOnly="false" AllowAnonymousLogin="true" UsersDataDirectory="{PWD}/Users/" ... QuotaMb="200" />

UserSettings.UsersDataDirectory="/bigvolume/Users"

Set the default temporary directory from an environment variable

<appSettings>
    <add key="MultimapEnabled" value="true" />
    <!-- Prevents CSHTML files from being directly accessed by the browser -->
    <add key="webPages:Enabled" value="false" />
    <add key="DefaultTempDirectory" value="/var/tmp" />
</appSettings>

appSettings.add.key(DefaultTempDirectory).value=$TEMP

Add the text of a node verbatim. Use the _ADD instruction.

Add a node:
<UserSettingsOverride Users="max*@persephonesoft.com" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>

UserSettings._ADD=<UserSettingsOverride Users="max*@persephonesoft.com" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>


Add multiple nodes as a multi-line block of text. Enclose the block in backticks (`).

Add multiple nodes:

    <UserSettingsOverride Users="*@persephonesoft.com" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>
    <UserSettingsOverride Users="*max*" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>

UserSettings._ADD=`
    <UserSettingsOverride Users="*@persephonesoft.com" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>
    <UserSettingsOverride Users="*max*" QuotaMb="10240" UsersDataDirectory="{PWD}/Users/"/>
`


Remove the first record that meets the criteria. Use the _REMOVE instruction. For example, remove the MultiThreads flag from the connection named persephone.

<Connection Name="persephone" Url="http://localhost:8983/solr/" CoreName="persephone"  MultiThreads="true">

SearchIndex.Connection.Name(persephone).MultiThreads._REMOVE

Remove all matching child nodes. For example, use _REMOVEALL to delete all QualifierFilter records.

<Connection Name="persephone" Url="http://localhost:8983/solr/" CoreName="persephone" MultiThreads="true">
    <QualifierFilter Type="MARKER" Indexing="false"/>
    <QualifierFilter Type="MARKER" Name="CLNDN" Indexing="true"/>
    <QualifierFilter Type="MARKER" Name="rsId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Indexing="false"/>
    <QualifierFilter Type="ANNOT" Name="ID" Indexing="true" />
</Connection>

SearchIndex.Connection.Name(persephone).QualifierFilter._REMOVEALL

Replace all QualifierFilter records for the connection called persephone. First, remove all existing filters by using _REMOVEALL, then add the new filters as a multi-line text block enclosed in backticks (`).

<Connection Name="persephone" Url="http://localhost:8983/solr/" CoreName="persephone" MultiThreads="true">
    <QualifierFilter Type="MARKER" Indexing="false"/>
    <QualifierFilter Type="MARKER" Name="rsId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Indexing="false"/>
    <QualifierFilter Type="ANNOT" Name="ID" Indexing="true" />
    <QualifierFilter Type="ANNOT" Name="Note" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Info" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Function description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Alias" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="SwissProt match" Indexing="true"/>
</Connection>

SearchIndex.Connection.Name(persephone).QualifierFilter._REMOVEALL

SearchIndex.Connection.Name(persephone)._ADD=`
    <QualifierFilter Type="MARKER" Indexing="false"/>
    <QualifierFilter Type="MARKER" Name="rsId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Indexing="false"/>
    <QualifierFilter Type="ANNOT" Name="ID" Indexing="true" />
    <QualifierFilter Type="ANNOT" Name="transcript_id" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="transcriptName" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="transcriptId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="transcriptID" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="product" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="old_locus_tag" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="note" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="locus_tag" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="iwgsc_id" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="gene_id" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="gene_synonym" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="geneName" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="geneId" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="gene" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="definition" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="alias" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Synonym" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Parent" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Name" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Note" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Info" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Function description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Description" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="Alias" Indexing="true"/>
    <QualifierFilter Type="ANNOT" Name="SwissProt match" Indexing="true"/>
`
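The _ADD, _REMOVE and _REMOVEALL instructions can be sketched in Python in the same hypothetical style, using ElementTree. The function apply_instruction is not part of PersephoneShell, and several behaviors are assumptions: _REMOVE deletes the named attribute from the first matching node (or, if there is no such attribute, its first child with that name), _REMOVEALL deletes every child with the given name, and an _ADD block is parsed by wrapping it in a temporary element. The caller is assumed to have already joined a backtick-delimited block into a single string.

```python
import re
import xml.etree.ElementTree as ET

# Matches "Node" or "Attribute(value)" tokens between the dots of a path.
TOKEN = re.compile(r"([^.()]+)(?:\(([^)]*)\))?")


def resolve(root: ET.Element, tokens: list) -> list:
    """Follow (name, selector) tokens from the root and return matching nodes."""
    nodes = [root]
    for name, selector in tokens:
        if selector:
            nodes = [n for n in nodes if n.get(name) == selector]
        else:
            nodes = [child for n in nodes for child in n.findall(name)]
    if not nodes:
        raise KeyError(f"no node matches {tokens!r}")
    return nodes


def apply_instruction(root: ET.Element, instruction: str) -> None:
    """Apply an _ADD, _REMOVE or _REMOVEALL line from custom.config (sketch)."""
    path, _, text = instruction.partition("=")
    *target, (op, _) = TOKEN.findall(path.strip())
    if op == "_ADD":
        parent = resolve(root, target)[0]
        body = text.strip()
        if body.startswith("`") and body.endswith("`"):
            body = body[1:-1]  # backticks enclose a multi-line block
        # A wrapper element lets several sibling nodes be parsed at once.
        for node in ET.fromstring(f"<wrapper>{body}</wrapper>"):
            parent.append(node)
    elif op in ("_REMOVE", "_REMOVEALL"):
        *owner_path, (name, _) = target
        owners = resolve(root, owner_path)
        if op == "_REMOVE":
            owner = owners[0]
            if name in owner.attrib:
                del owner.attrib[name]  # e.g. the MultiThreads flag
            elif owner.find(name) is not None:
                owner.remove(owner.find(name))  # first child with that name
        else:
            for owner in owners:
                for child in owner.findall(name):
                    owner.remove(child)
    else:
        raise ValueError(f"unsupported instruction: {op}")


root = ET.fromstring(
    '<configuration><SearchIndex>'
    '<Connection Name="persephone" CoreName="persephone" MultiThreads="true">'
    '<QualifierFilter Type="MARKER" Indexing="false"/>'
    '</Connection></SearchIndex></configuration>'
)
apply_instruction(root, "SearchIndex.Connection.Name(persephone).MultiThreads._REMOVE")
apply_instruction(root, "SearchIndex.Connection.Name(persephone).QualifierFilter._REMOVEALL")
apply_instruction(root, 'SearchIndex.Connection.Name(persephone)._ADD=`\n'
                        '<QualifierFilter Type="ANNOT" Name="gene" Indexing="true"/>\n'
                        '<QualifierFilter Type="ANNOT" Name="note" Indexing="true"/>\n`')
print([q.get("Name") for q in root.iter("QualifierFilter")])  # ['gene', 'note']
```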