Export

Occasionally, a set of BLAST index files needs to be created or recreated. Normally, when the BLAST functionality of PersephoneShell is enabled and a gene annotation is added with the command add annotation, the corresponding proteins are exported and the BLAST index files are built automatically. The BLAST files for genomic sequences are likewise exported as part of the command add sequence or add sequencedatabase. However, if the BLAST functionality was disabled and is turned on only later, all the BLAST files need to be produced explicitly.

Exporting protein sequences

The command export protein extracts the protein sequences of all gene models in all available annotation tracks for a given map set and passes the exported FASTA files to makeblastdb.

The command needs to know which map set to process. As usual, the map set can be specified by its MapSetId or by its map set path (use auto-complete to make your life easier).

PS> export protein "/Arabidopsis thaliana/TAIR10"
Exporting proteins for map set /Arabidopsis thaliana/TAIR10 (1)
Start processing mapset "TAIR10" (id:1), organism "Arabidopsis thaliana"
Method: MRNA
Load gene models from mapset "TAIR10" (id:1)... Done, 48113 gene models loaded
Export proteins to fasta file /data/blastdb/toy/1_MRNA_P
Export proteins: 48091/48113 22 proteins were skipped: they are too short (size <20) or have low complexity
size:20832344
Run makeblastdb ...
Run /blastbin/makeblastdb -dbtype prot -in "1_MRNA_P" -title "TAIR10_MRNA"

Building a new DB, current time: 05/13/2021 20:59:02
New DB name: 1_MRNA_P
New DB title: TAIR10_MRNA
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 48091 sequences in 1.28659 seconds.

Prepared protein index for map set 1, method MRNA

To generate the BLAST data files for all map sets (skipping any files that already exist), use the word 'all' in place of a specific MapSetId.
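
Following that description, the call for all map sets would presumably look like the line below. This is a sketch based on the syntax described here; the output is omitted because it depends on the map sets present in the database.

PS> export protein all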

Exporting genomic sequences

The missing BLAST index files for genomic sequences can be created by the command export dna.

PS> export dna 129
Exporting genomic sequences for map set /SARS-CoV-2/MERS-CoV (129)
Start processing mapset "MERS-CoV" (id:129), organism "Bat SARS-like coronavirus"
Export genomes to fasta file /data/blastdb/toy/129_N
Map count: 1
1/1 Export sequence "NC_019843" (id:37821), length 30,119
Run makeblastdb ...
Run /blastbin/makeblastdb -dbtype nucl -in "129_N" -parse_seqids -title "MERS-CoV"

Building a new DB, current time: 06/07/2021 15:34:09
New DB name: 129_N
New DB title: MERS-CoV
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.0297697 seconds.

Prepared nucleotide index for map set 129

Two optional parameters are available: -k keeps the output FASTA file with the sequences, and -o overwrites the existing index files.

PS> export dna 129 -k
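
By the same pattern, rebuilding index files that already exist for this map set would presumably be requested with -o, as sketched below. Whether -k and -o can be combined in a single call is not stated here, so the two are shown separately.

PS> export dna 129 -o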

As with export protein, the genomic BLAST data files for all map sets can be generated (skipping any files that already exist) by using the word 'all' in place of a specific MapSetId.
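
For genomic sequences, that call would presumably be the following (a sketch based on the syntax described here; output not shown):

PS> export dna all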