Setting Up SelfHostingCerberus Server
Note
The Persephone system is now supplied in the form of a single Docker image. Once the Docker image is installed, you can start using PersephoneShell to populate the data and run the Persephone application. Please refer to this page.
The text below and the following pages are provided here for those who want to install (and configure) the Persephone components separately, without using Docker.
The stand-alone server SelfHostingCerberus (or SelfHostingWebCerberus for the web version) does not require any pre-installed web server like IIS or Apache.
Unpack the binaries of the stand-alone API server to some directory. The configuration file SelfHostingCerberus.exe.config should contain the encrypted database connection strings.
In case of Oracle:
<!-- Database connection strings; the "Default" key is required.
These connection strings must be encrypted using the same mechanism as in Persephone. -->
<connectionStrings>
<clear />
<!-- Oracle database -->
<add name="Default" connectionString="xxxxxxxxxx" />
<add name="System" connectionString="xxxxxxxxxxx" />
</connectionStrings>
In case of MariaDB, add the providerName attribute to each entry, because the provider defaults to Oracle:
<!-- Database connection strings; the "Default" key is required.
These connection strings must be encrypted using the same mechanism as in Persephone. -->
<connectionStrings>
<clear />
<!-- MariaDb database -->
<add name="Default" providerName="MySql.Data.MySqlClient" connectionString="xxxxxxxxxx" />
<add name="System" providerName="MySql.Data.MySqlClient" connectionString="xxxxxxxxxxx" />
</connectionStrings>
Important: the <clear /> line must be the first entry in the connectionStrings list.
In most cases, the Default and System connection strings are the same. They can differ when database read replicas are used: consider the Default connection read-only, while the System connection is used for writing.
Enter the database connection string as shown in the example below, replacing the placeholder xxxxxxxxxx with an actual encrypted connection string. This string tells Cerberus how to connect to your database and fetch the data.
<add name="Default" connectionString="xxxxxxxxxx" />
It is good practice to encrypt the connection string with our cipher utility.
If you plan to run the server on Linux, please install the Mono framework (version 6.0 or later) first. The server can be started from the OS command line as follows:
Windows:
SelfHostingCerberus http://localhost:80/
Linux:
mono SelfHostingCerberus.exe http://localhost:80/
or
mono SelfHostingCerberus.exe "http://*:80/"
Important
Note the closing slash at the end of the URL parameter.
This starts a server that listens on port 80, which is normally open by default for regular web users. Navigate to http://localhost to see the list of API functions. If necessary, change port 80 to some other number. (When run without a parameter, the self-hosting Cerberus listens on port 1337.) Please make sure that the port is also open for your clients.
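Because the trailing slash is easy to forget, the URL parameter can be built by a small helper. The function below is a minimal sketch: cerberus_prefix is a hypothetical name, not part of Cerberus, and it only formats the string that is passed to the server.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cerberus): builds the URL parameter
# and always appends the trailing slash that the server expects.
cerberus_prefix() {
    host="${1:-localhost}"   # pass '*' to listen on all interfaces
    port="${2:-1337}"        # 1337 is the documented default port
    printf 'http://%s:%s/\n' "$host" "$port"
}

# Example invocation (commented out so the snippet can be sourced safely):
# mono SelfHostingCerberus.exe "$(cerberus_prefix '*' 8080)"
```

Keep the double quotes around the command substitution; otherwise the shell may try to expand the * character.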
Note
It is possible to run several Cerberus instances on different ports simultaneously. For example, the production instance can run on port 80 while the instance for testing can use port 8080 or 443. Please ensure that each instance has its own folder for cached data.
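One way to give each instance its own cache folder is to unpack a separate copy of the server per instance and point the CacheDirectory attribute (described with the cache settings further down this page) at a different location in each copy's SelfHostingCerberus.exe.config. The helper below is a hypothetical sketch, not part of Cerberus. It assumes that each settings element sits on a single line, as in the sample configuration, and that the new path contains no | or & characters.

```shell
#!/bin/sh
# Hypothetical helper: set_cache_dir CONFIG_FILE ELEMENT NEW_DIR
# Rewrites the CacheDirectory attribute of one settings element
# (for example DiskCacheSettings or TrackCacheSettings) in place.
# A backup of the original file is kept as CONFIG_FILE.bak.
set_cache_dir() {
    config="$1"
    element="$2"
    dir="$3"
    # Only the line that opens the given element is touched.
    sed -i.bak -E \
        "/<${element} /s|CacheDirectory=\"[^\"]*\"|CacheDirectory=\"${dir}\"|" \
        "$config"
}

# Example (paths are assumptions, commented out so the snippet can be sourced):
# set_cache_dir /opt/cerberus-test/SelfHostingCerberus.exe.config \
#     DiskCacheSettings "{TEMP}/cerberus-test/dna_cache"
```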
The rest of the configuration file deals with caching and the Solr search index.
<!--
Disk-based cache for genome DNA.
All parameters except for CacheDirectory are optional.
Enabled: If "false", disk cache will be disabled, and all of the other parameters
will be ignored. Default is "true".
CacheDirectory: The directory that will contain cached DNA sequences.
The token {TEMP} will be replaced by the current Windows temp folder.
MaxSizeOnDiskMb: Maximum size of the cache files on disk, in megabytes. When the cache
exceeds this size, oldest entries will be removed.
ChunkFileSizeKb: Maximum size of a single sequence chunk, in kilobytes. DNA sequences
will be split up into chunks of roughly this size.
-->
<DiskCacheSettings Enabled="true" CacheDirectory="{TEMP}/cerberus/dna_cache" MaxSizeOnDiskMb="4096" ChunkFileSizeKb="2048" />
<!--
Search engine configuration.
All parameters except for IndexDirectory are optional.
Enabled: If "false", search engine will be disabled, and all of the other parameters
will be ignored. Default is "false".
IndexDirectory: The directory that will contain search index data.
The token {TEMP} will be replaced by the current Windows temp folder.
UpdateMode: Update mode can be APPEND or OVERWRITE
APPEND:
Appends any new items (markers, genes, QTLs, etc.) to the index.
Existing items will be left unchanged.
OVERWRITE:
Deletes existing indexes, and re-creates them using the new items.
-->
<SolrSearchSettings Enabled="true" ServerUrl="http://localhost:8983/solr" CoreName="prod1" />
<!--
Disk-backed cache for track data.
Enabled: If "false", disk cache will be disabled, and all of the other parameters
will be ignored. Default is "true".
CacheDirectory: The directory that will contain cached tracks.
The token {TEMP} will be replaced by the current Windows temp folder.
MaxSizeOnDiskMb: Maximum size of the cache files on disk, in megabytes. When the cache
exceeds this size, oldest entries will be removed.
-->
<TrackCacheSettings Enabled="false" CacheDirectory="{TEMP}/cerberus/track_cache" MaxSizeOnDiskMb="1024" />
For now, we recommend enabling the disk cache (for DNA sequences) and disabling the track cache (it requires coordination with the database table DATA_VERSION, which is an advanced topic not yet documented here).
As for the search engine, please read the separate page on configuring Solr.
Setting up SelfHostingCerberus as a daemon on Linux
Normally, the server should run under some sort of terminal multiplexer, such as tmux (https://en.wikipedia.org/wiki/Tmux). This ensures that the program does not terminate when the terminal session is closed.
If you use tmux, first start a terminal session by typing
tmux
at the command prompt. Then start SelfHostingCerberus (this example uses the default settings, with no additional command-line arguments):
mono SelfHostingCerberus.exe
The server will start on the default port 1337. To leave it running and close the session, press Ctrl-B and then press 'D' (detach). The server will keep running until the host machine is rebooted. To re-enter the tmux session, first list the running sessions:
tmux ls
For example:
tmux ls
Cerberus: 1 windows (created Wed Dec 19 13:03:41 2018) [80x24]
Note the name of the session, which is printed before the colon ('Cerberus' in this example; the number after the colon is the window count). If there is only one background session, it is enough to type just:
tmux attach
If more than one tmux session is running, reference the session by name:
tmux attach -t Cerberus
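Giving the session an explicit name when it is created makes it easy to find later. The function below is a minimal sketch that uses the standard tmux subcommands has-session and new-session; the session name cerberus and the function name are assumptions chosen for illustration, and the function is meant to be run from the directory that contains the server binaries.

```shell
#!/bin/sh
# Assumed session name; any name without spaces works.
SESSION="cerberus"

# Hypothetical helper: starts the server in a detached, named tmux session
# unless a session with that name already exists.
start_cerberus_session() {
    # has-session exits with a non-zero status when the session does not exist
    if tmux has-session -t "$SESSION" 2>/dev/null; then
        echo "tmux session '$SESSION' is already running"
    else
        # -d: do not attach to the new session; -s: set the session name
        tmux new-session -d -s "$SESSION" "mono SelfHostingCerberus.exe"
        echo "started tmux session '$SESSION'"
    fi
}
```

After calling start_cerberus_session, the session can be re-entered with tmux attach -t cerberus.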
To start Cerberus automatically after a system restart, please consult this sample script, which should be placed under /etc/init.d. The following two commands schedule Cerberus to start after a reboot:
sudo update-rc.d cerberus defaults
sudo update-rc.d cerberus enable
To start Cerberus manually, run this command:
sudo /etc/init.d/cerberus start