Configuration

BLAST search is implemented as part of SelfHostingCerberus. To configure it, edit the SelfHostingCerberus.exe.config file: find the <BlastSettings> section and set two parameters, BlastDbFolder and ProgramDirectory:

<BlastSettings>
 <NcbiBlast BlastDbFolder="/var/blast/data" ProgramDirectory="/bin/blast/bin" />
</BlastSettings>

Please make sure that both paths point to folders accessible by Cerberus. For the best performance, the files are usually stored on a local disk of the Cerberus server.
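The path check described above can be sketched in a few lines of Python. This is only an illustration, not part of Cerberus: the helper names read_blast_paths and missing_folders are hypothetical, and the sample fragment simply repeats the values shown in this section.

```python
import os
import xml.etree.ElementTree as ET


def read_blast_paths(config_xml):
    """Return the two folder attributes of the first <NcbiBlast> element."""
    root = ET.fromstring(config_xml)
    # iter() also yields the root itself, so a bare fragment works too.
    node = next(root.iter("NcbiBlast"), None)
    if node is None:
        raise ValueError("no <NcbiBlast> element found")
    return {
        "BlastDbFolder": node.get("BlastDbFolder"),
        "ProgramDirectory": node.get("ProgramDirectory"),
    }


def missing_folders(paths):
    """List the attribute names whose value is not an existing directory."""
    return [name for name, path in paths.items()
            if not path or not os.path.isdir(path)]


if __name__ == "__main__":
    # Sample fragment copied from this section; adjust to your own config.
    sample = """<BlastSettings>
 <NcbiBlast BlastDbFolder="/var/blast/data" ProgramDirectory="/bin/blast/bin" />
</BlastSettings>"""
    paths = read_blast_paths(sample)
    for name in missing_folders(paths):
        print(f"{name} is not an accessible folder: {paths[name]}")
```

Run the check under the same account that runs Cerberus, since folder access depends on that account's permissions.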

In addition, adjust the Persephone configuration (Persephone.exe.config):

<BlastSettings>
  <UseNcbiBlast value="true" />
  <UseNewBlastForm value="true" />
  <NcbiBlast 
    WebApiUrl="put URL to Cerberus here"
    MultipleSubjects="true">
      <BlastP>...</BlastP>
      <BlastX>...</BlastX>
      <BlastN>...</BlastN>
      <TBlastN>...</TBlastN>
  </NcbiBlast>
</BlastSettings>

Once Cerberus receives a request from Persephone to perform a BLAST search, it passes the task to the local copy of BLAST and relays the results back to Persephone.
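To make the hand-off to the local BLAST copy more concrete, here is a minimal Python sketch of how a server process might build and run an NCBI BLAST+ command line. How Cerberus actually invokes BLAST is not described here, so everything below is an assumption: the function names, the database name example_db, the query file name and the choice of tabular output (-outfmt 6) are all hypothetical. Only the executable names and the -query, -db and -outfmt options come from NCBI BLAST+ itself.

```python
import subprocess
from pathlib import Path

# The four search programs that appear in the Persephone configuration.
ALLOWED_PROGRAMS = {"blastp", "blastx", "blastn", "tblastn"}


def build_blast_command(program, program_dir, db_path, query_file):
    """Build the argument list for a local NCBI BLAST+ search."""
    if program not in ALLOWED_PROGRAMS:
        raise ValueError(f"unsupported BLAST program: {program}")
    return [
        str(Path(program_dir) / program),
        "-query", str(query_file),
        "-db", str(db_path),
        "-outfmt", "6",  # tabular output; an assumption for this sketch
    ]


def run_blast(command):
    """Run BLAST and return its standard output so it can be relayed."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical database and query names, for illustration only.
    command = build_blast_command(
        "blastn", "/bin/blast/bin", "/var/blast/data/example_db", "query.fasta"
    )
    print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.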

Preparing BLAST files

The subject sequences for BLAST are downloaded from the database by running BlastDbExporter. It reads the genomic or protein sequences and their IDs from the database, saves them into local FASTA files and runs NCBI-BLAST's makeblastdb command to compile the BLAST library files.
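The two stages described above, writing FASTA and compiling it with makeblastdb, can be illustrated with a short Python sketch. It is not the BlastDbExporter implementation: the function names, the sample record and the file names are hypothetical, and the exact makeblastdb options the exporter passes are not documented here. The -in, -dbtype and -out options used below are standard makeblastdb options.

```python
from pathlib import Path


def format_fasta(records, width=60):
    """Format (id, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for seq_id, sequence in records:
        lines.append(f">{seq_id}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


def build_makeblastdb_command(blast_bin_dir, fasta_path, db_type, out_path):
    """Build the makeblastdb argument list for a nucleotide or protein file."""
    if db_type not in ("nucl", "prot"):
        raise ValueError("db_type must be 'nucl' or 'prot'")
    return [
        str(Path(blast_bin_dir) / "makeblastdb"),
        "-in", str(fasta_path),
        "-dbtype", db_type,
        "-out", str(out_path),
    ]


if __name__ == "__main__":
    # Hypothetical record standing in for sequences read from the database.
    print(format_fasta([("seq1", "ACGTACGTAC")], width=4), end="")
    print(" ".join(build_makeblastdb_command(
        "/bin/blast/bin", "example.fasta", "nucl", "example")))
```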

Example. Dump all sequences currently stored in the database to a BLAST data directory.

1. Configure BlastDbExporter with the database connection string. Find the <connectionStrings> section in BlastDbExporter.exe.config and add the named connection string (optionally, use cipher to encrypt it):

<connectionStrings>
  <clear/>
  <add name="MariaDbAmazon" providerName="MySql.Data.MySqlClient" connectionString="scott/tiger@amazonUrl:3306/PERSDB" />
</connectionStrings>

Please note that the <clear/> line should remain the first line in this section. For more information on formatting database connection strings, please consult the corresponding documentation page.

2. Run BlastDbExporter with the following parameters (assuming the values from the configuration above):

BlastDbExporter -s MariaDbAmazon -b d:\blast\bin\ -d d:\blast\data\ -A

For reference, here is the list of BlastDbExporter parameters:

-s, --connectionString=VALUE   Connection string to database server (Required)
-b, --blastBinDirectory=VALUE  NCBI BLAST bin directory (Required)
-d, --destination=DIR          DIR directory to store blast data (Required)
-o, --organismId=VALUE         Dump sequences for Organism_ID
-n, --nucs                     Dump nucleotide sequences
-p, --prot                     Dump protein sequences
-m, --method=VALUE             Dump proteins predicted by given method
-M, --mapSet=VALUE             Dump sequences for MAP_SET_ID
-A, --allMapSets               Dump all sequences from existing map sets (large!)
-S, --skipExisting             Skip export if file exists
--kf, --keepFasta              Keep fasta files
--n0, --namesOrganismRunId     Export file name in old format: OrganismId_RunId
-h, --help                     Show this message and exit
-v, --verbose                  Verbose mode.
-D, --debug                    Show debug info.

Example. Prepare genomic DNA files for MapSetId=2:

BlastDbExporter -s MariaDbAmazon -b d:\blast\bin\ -d d:\blast\data\ -n -M 2
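When several map sets need to be exported, the command above can be generated in a loop. The Python sketch below only builds the argument lists from the options listed in this section. The helper name exporter_command and the map set IDs 2 and 3 are hypothetical, and combining -S with -n and -M in one call is an assumption that should be verified against your version of the tool.

```python
def exporter_command(connection_name, blast_bin, destination, map_set_id,
                     skip_existing=False):
    """Build a BlastDbExporter call that dumps genomic DNA for one map set."""
    command = [
        "BlastDbExporter",
        "-s", connection_name,
        "-b", blast_bin,
        "-d", destination,
        "-n",                    # nucleotide sequences
        "-M", str(map_set_id),   # restrict the dump to one MAP_SET_ID
    ]
    if skip_existing:
        command.append("-S")     # assumption: -S combines with -n and -M
    return command


if __name__ == "__main__":
    # Hypothetical map set IDs; replace with IDs from your own database.
    for map_set_id in (2, 3):
        print(" ".join(exporter_command(
            "MariaDbAmazon", "d:\\blast\\bin\\", "d:\\blast\\data\\",
            map_set_id, skip_existing=True)))
```

Each list can be passed to subprocess.run on the machine where BlastDbExporter is installed.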