Web Persephone: The Run BLAST dialog
BLAST (Basic Local Alignment Search Tool) is a set of widely used sequence comparison tools. Persephone can run these tools and display the results graphically. To open the Run BLAST dialog, select BLAST from the main toolbar:
Alternatively, you can open one of the sequence tabs in the Annotation Details dialog and click the BLAST button. This opens the Run BLAST dialog with the current sequence pre-filled as the query, along with the appropriate program type and speed/sensitivity preset.
The Run BLAST dialog contains the following controls:
1: Sequence entry box: Paste the input sequence here, in FASTA format. When pasting multiple sequences, make sure to provide a unique header for each sequence; for example:
>primer1
atgtcggcggcggcgaaggc
>primer2
tcagtcatcatcttgaccag
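Before pasting several sequences, it can help to confirm that every header is unique. The following is a minimal Python sketch and is not part of Persephone: the function name `duplicate_fasta_headers` is hypothetical, and it assumes that the sequence ID is the first whitespace-delimited token after the `>` character.

```python
from collections import Counter


def duplicate_fasta_headers(fasta_text):
    """Return the sequence IDs that occur more than once in FASTA text.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            # A bare '>' has no ID; record it as an empty string.
            ids.append(parts[0] if parts else "")
    counts = Counter(ids)
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


query = """>primer1
atgtcggcggcggcgaaggc
>primer2
tcagtcatcatcttgaccag
"""

# An empty list means that all headers are unique.
print(duplicate_fasta_headers(query))
```

If the returned list is not empty, rename the listed headers before pasting the sequences into the entry box.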
2: Program type: Select the kind of comparison you wish to perform (these options are also described in more detail below):
- BLASTN: Aligns the nucleotide query sequence against the nucleotide sequences of maps in the selected subject genome.
- TBLASTN: Aligns the protein query sequence against the translated nucleotide sequences (in all six frames) of maps in the selected subject genome.
- BLASTP: Aligns the protein query sequence against the proteins given by gene annotations on the selected Annotation track in the selected map set.
- BLASTX: Aligns the translated nucleotide query sequence (in all six frames) against the proteins given by gene annotations on the selected Annotation track in the selected map set.
When you change the program type, the map set selection tree and quick preset options may change as well.
3: Map Set selection: Use this table to choose the map set (and/or Annotation track name) to use as the subject for comparison (the table supports all standard selection and filtering controls). Click a checkbox next to a tree node to mark all map sets and maps inside it. Alternatively, you can click the triangle icon next to the node to expand it, and then mark individual map sets inside the folder, or individual maps inside the map set (assuming you're performing BLASTN or TBLASTN). Click the checkbox in the column header (at the top of the table) to mark or clear all available maps.
All of the chosen map sets and folders are added to the selection shelf. For example, if you select the Solanum lycopersicum folder (which contains multiple map sets) as well as the DM_v4.03 map set (under Solanum tuberosum), both items will be displayed on the shelf:
Move the mouse over an item on the shelf to display its contents in detail:
Doing so also reveals a remove button; click it to remove the item from the shelf (this also clears the corresponding checkboxes in the map set selection table). If there are too many items to fit on the screen, you can use the scroll bar at the bottom of the shelf to scroll through them:
As you select or deselect maps, the Subject statistics field will update, showing the total number of maps that are currently selected, as well as their total size.
4: Quick presets: Contains commonly used presets for BLAST command-line arguments (these depend on the currently selected program type):
- Fast: Trades away sensitivity in favor of speed (this is the default).
- Normal: More sensitive than the "Fast" preset, but may take significantly longer to run (especially if the query happens to match a repeat).
- Primer: This preset is only available for BLASTN; it is suitable for aligning very short nucleotide sequences (e.g., the kind commonly used as primers) against the target genome. BLASTN would typically filter out such short sequences completely, but the "Primer" preset prevents this from happening.
- Custom: Custom user-editable arguments.
5: Arguments entry box: Displays the detailed command-line arguments for the currently selected preset. Alternatively, you can check the Override arguments checkbox to enter the arguments manually:
When you check the box, the preset selection will be set to Custom, and the values for the current preset will become editable. You can click anywhere in the textbox to view a list of previous parameter sets (a separate list is kept for each program type, e.g. BLASTN, TBLASTN, and so on):
You can click any entry to select it, or manually type in any other parameters. In addition, you can click the quick edit button to set some of the most common options:
Here are a few tips on using the BLAST parameters, including some of the ones that are not listed in this quick edit dialog.
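To give a sense of what such argument strings look like, here is a small Python sketch. The preset values are illustrative assumptions only and are not Persephone's actual presets (those are shown in the Arguments entry box); the option names themselves (`-task`, `-evalue`, `-word_size`, `-dust`) are standard NCBI BLAST+ `blastn` options. The helper `parse_arguments` is hypothetical and assumes that every option takes exactly one value, which does not hold for flag-style options.

```python
import shlex

# Illustrative values only: these are NOT Persephone's actual presets.
# The option names are standard NCBI BLAST+ blastn options.
EXAMPLE_BLASTN_PRESETS = {
    "Fast": "-task megablast -evalue 1e-5",
    "Normal": "-task blastn -evalue 1e-5",
    "Primer": "-task blastn-short -word_size 7 -evalue 1000 -dust no",
}


def parse_arguments(arg_string):
    """Split an argument string into an {option: value} dictionary.

    Assumes every option starts with '-' and takes exactly one value.
    """
    tokens = shlex.split(arg_string)
    if len(tokens) % 2 != 0:
        raise ValueError("expected option/value pairs")
    options = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if not name.startswith("-"):
            raise ValueError(f"not an option name: {name!r}")
        options[name] = value
    return options


print(parse_arguments(EXAMPLE_BLASTN_PRESETS["Primer"]))
```

Parsing the string this way makes it easy to spot a missing value or a misspelled option before entering custom arguments.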
6: Subject statistics: Displays the number of currently selected map sets and maps, as well as their total size (e.g., in nucleotides for BLASTN). If the total size of the selected databases is too large, it will be highlighted in red:
In this case, you would have to deselect some maps (or even entire map sets) until the total size falls below the acceptable threshold.
Note
If you are running a local installation of Persephone, you can edit the config file to adjust the maximum allowed size of selected BLAST databases.
When ready, click the Run BLAST button to run your query. This process can take anywhere from a few seconds to several minutes, depending on your query and map set selections (as well as speed/sensitivity options). When the search is complete, the Run BLAST dialog will be replaced by the BLAST Results dialog.
Note that you might see a warning similar to this one:
If so, double-check to make sure your input sequence matches the currently selected program type (DNA for BLASTN/BLASTX, protein for BLASTP/TBLASTN).
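This check can also be approximated before submitting a query. The sketch below is a simple heuristic and not Persephone's own validation: the function names and the 0.9 threshold are assumptions, and the input is expected to be the raw sequence without its FASTA header line.

```python
DNA_LETTERS = set("ACGTUN")

# Query type expected by each program type, as described on this page.
EXPECTED_QUERY = {
    "BLASTN": "dna",
    "BLASTX": "dna",
    "BLASTP": "protein",
    "TBLASTN": "protein",
}


def guess_sequence_type(sequence, threshold=0.9):
    """Guess 'dna' or 'protein' from the fraction of nucleotide letters.

    Heuristic assumption: a sequence is nucleotide if at least `threshold`
    of its letters are A, C, G, T, U, or N.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    dna_fraction = sum(c in DNA_LETTERS for c in letters) / len(letters)
    return "dna" if dna_fraction >= threshold else "protein"


def query_matches_program(sequence, program):
    """Return True if the guessed sequence type suits the program type."""
    return guess_sequence_type(sequence) == EXPECTED_QUERY[program.upper()]


print(query_matches_program("atgtcggcggcggcgaaggc", "BLASTN"))
print(query_matches_program("MKTAYIAKQRQISFVKSHFSRQ", "BLASTN"))
```

A letter-frequency heuristic like this can misclassify very short or unusual sequences, so treat a mismatch as a prompt to look at the query again rather than as a definitive answer.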
Program type: BLASTN/TBLASTN
As described above, these programs will align the query (a nucleotide sequence in case of BLASTN, or a protein sequence in case of TBLASTN) against the nucleotide sequence of the subject genome. In this case, the map set selection tree will contain all physical map sets (i.e., map sets whose maps are based on genomic sequences), and the node for each map set will contain all of its maps.
Program type: BLASTP/BLASTX
These programs align the query (a nucleotide sequence in case of BLASTX, or a protein sequence in case of BLASTP) against the proteins derived from gene models on Annotation tracks in the selected map set (or multiple map sets, if more than one map set is selected). In this case, the map set selection tree will contain all map sets with annotation tracks, and the node for each map set will contain a list of all of its annotation track methods:
You can select a single annotation method by checking its checkbox, or select all methods at once by clicking the checkbox next to their parent node (as described above).