Set up BLAST functionality of Cerberus

Configuration

BLAST search is implemented as part of SelfHostingCerberus. To configure it, edit the SelfHostingCerberus.exe.config file. Find the <BlastSettings> section and modify two parameters:

Please make sure that the file paths reference folders accessible to Cerberus. For best performance, the files are usually stored on a local disk of the Cerberus server.

The Persephone configuration (Persephone.exe.config) should be adjusted as well:

When Cerberus receives a request from Persephone to perform a BLAST search, it passes the task to the local copy of BLAST and relays the results back to Persephone.
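The hand-off to the local BLAST installation can be pictured with a short Python sketch. It is only an illustration and not Cerberus's actual code: the function names are hypothetical, and the choice of program, E-value and tabular output format is an assumption. The options used (-query, -db, -evalue, -outfmt) are standard NCBI BLAST+ search options.

```python
import subprocess
from typing import List


def blast_search_command(executable: str, query_fasta: str, db_path: str,
                         evalue: str = "1e-5") -> List[str]:
    """Build the argument list for a BLAST+ search program such as blastn.

    Hypothetical helper: Cerberus may call BLAST with different options.
    """
    return [
        executable,
        "-query", query_fasta,   # FASTA file holding the query sequence
        "-db", db_path,          # database name produced by makeblastdb
        "-evalue", evalue,       # expectation value threshold
        "-outfmt", "6",          # tabular output, easy to parse and relay
    ]


def run_blast(command: List[str]) -> str:
    """Run the search and return its standard output as text."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

Passing the arguments as a list rather than a single string avoids quoting problems with Windows paths that contain spaces.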

Preparing BLAST files

The subject sequences for BLAST are downloaded from the database by running BlastDbExporter. It reads the genomic or protein sequences and their IDs from the database, saves them into local FASTA files and runs NCBI-BLAST's makeblastdb command to compile the BLAST library files.
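As a rough illustration of these two steps (writing FASTA, then compiling it with makeblastdb), here is a minimal Python sketch. It is not BlastDbExporter's implementation: the function names, the 60-character line width and the file names are assumptions. The makeblastdb options shown (-in, -dbtype, -out) are standard NCBI BLAST+ options.

```python
from typing import Iterable, List, Tuple


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Render (id, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines: List[str] = []
    for seq_id, sequence in records:
        lines.append(f">{seq_id}")
        # Split the sequence into fixed-width chunks.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


def makeblastdb_command(executable: str, fasta_path: str, db_type: str,
                        out_name: str) -> List[str]:
    """Build the makeblastdb argument list for a nucleotide or protein library."""
    if db_type not in ("nucl", "prot"):
        raise ValueError("db_type must be 'nucl' or 'prot'")
    return [executable, "-in", fasta_path, "-dbtype", db_type, "-out", out_name]
```

The resulting list can be passed to subprocess.run once the FASTA text has been written to fasta_path.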

Example. Dump all sequences currently stored in the database to a BLAST data directory.

1. Configure BlastDbExporter to recognize the connection string to the database. Find the <connectionStrings> section in BlastDbExporter.exe.config and add the named connection string (optionally, use cipher to encrypt it):

Note that the <clear/> line should remain the first line in this section. For more information on formatting database connection strings, please consult the corresponding page.

2. Run BlastDbExporter with the following parameters (assuming the values from the configuration above):

BlastDbExporter -s MariaDbAmazon -b d:\blast\bin\ -d d:\blast\data\ -A

For reference, here is the full list of BlastDbExporter parameters:

-s, --connectionString=VALUE   Connection string to database server (Required)
-b, --blastBinDirectory=VALUE  NCBI BLAST bin directory (Required)
-d, --destination=DIR          DIR directory to store blast data (Required)
-o, --organismId=VALUE         Dump sequences for Organism_ID
-n, --nucs                     Dump nucleotide sequences
-p, --prot                     Dump protein sequences
-m, --method=VALUE             Dump proteins predicted by given method
-M, --mapSet=VALUE             Dump sequences for MAP_SET_ID
-A, --allMapSets               Dump all sequences from existing map sets (large!)
-S, --skipExisting             Skip export if file exists
--kf, --keepFasta              Keep fasta files
--n0, --namesOrganismRunId     Export file name in old format: OrganismId_RunId
-h, --help                     Show this message and exit
-v, --verbose                  Verbose mode.
-D, --debug                    Show debug info.

Example. Prepare genomic DNA files for MapSetId=2:

BlastDbExporter -s MariaDbAmazon -b d:\blast\bin\ -d d:\blast\data\ -n -M 2
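When exports are scripted, the command line can be assembled programmatically. The Python sketch below is a hypothetical wrapper, not part of the BlastDbExporter distribution. It uses only flags from the parameter list above, and the function and argument names are assumptions chosen for illustration.

```python
from typing import List, Optional


def exporter_command(connection: str, blast_bin: str, destination: str, *,
                     nucs: bool = False, prot: bool = False,
                     map_set: Optional[int] = None,
                     all_map_sets: bool = False,
                     skip_existing: bool = False) -> List[str]:
    """Build a BlastDbExporter argument list from the documented flags."""
    command = ["BlastDbExporter",
               "-s", connection,    # named connection string
               "-b", blast_bin,     # NCBI BLAST bin directory
               "-d", destination]   # directory for the BLAST data
    if nucs:
        command.append("-n")        # nucleotide sequences
    if prot:
        command.append("-p")        # protein sequences
    if map_set is not None:
        command += ["-M", str(map_set)]
    if all_map_sets:
        command.append("-A")        # every existing map set (large!)
    if skip_existing:
        command.append("-S")        # do not re-export existing files
    return command
```

For example, exporter_command("MariaDbAmazon", "d:\\blast\\bin\\", "d:\\blast\\data\\", nucs=True, map_set=2) yields the same arguments as the MapSetId=2 command above, and the list can be handed to subprocess.run.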