Web Persephone: Protein Multiple Alignment

Overview

The Protein Multiple Alignment view aligns multiple protein sequences side by side, using a variation of the BioMSA algorithm. For example, you could generate a multiple alignment of all the orthologs of a single gene annotation. To open this view, click a gene annotation to open the Annotation Details dialog, navigate to the Orthologs tab, and click the link in the bottom-right corner of the table:

Doing so opens a new dialog:

You can also reach this view by selecting Tools | Protein multiple alignment from the main toolbar; doing so will open a blank dialog.

The top part of the dialog contains alignment options and the list of orthologs; the bottom part contains the actual alignment. You can click the rollout button to expand or collapse each section.

Alignment options

Use these controls to fine-tune the alignment.

Method: Selects the multiple alignment algorithm.

Auto: Automatically selects the best algorithm. The currently selected algorithm is always listed at the top of the Detailed Alignment section:
Complete: Performs a more complete alignment (using the Needleman-Wunsch algorithm) at the expense of slower performance.
Diag: Performs a faster yet less accurate alignment by finding common segments between sequences (called "diagonals"), then running the Needleman-Wunsch algorithm for the missing sequence fragments.
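
For readers unfamiliar with Needleman-Wunsch, the following minimal sketch shows the pairwise global alignment at the core of the algorithm. It is illustrative only: the function name and the scoring values (match, mismatch, and a single linear gap penalty) are assumptions, not Persephone's implementation, which aligns multiple sequences and uses separate gap open and gap extend penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these assumed scores, needleman_wunsch("MESGK", "MEGK") returns (2, "MESGK", "ME-GK"). The full table costs time and memory proportional to the product of the two sequence lengths, which is why a method that restricts Needleman-Wunsch to the fragments between shared segments can run faster.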

Gap penalties: Adjusts the gap open penalty and the gap extend penalty for the alignment algorithm. Higher values penalize gaps more severely; lower values allow for more gaps and/or longer gaps.
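
As a rough illustration of how the two penalties interact, the sketch below scores gaps under a common affine convention: the first position of a gap incurs the open penalty, and each additional position incurs the extend penalty. This convention, the function names, and the use of negative scores (the defaults mirror the values shown in the export example in this document) are assumptions; Persephone's exact scoring is not described here.

```python
import re


def affine_gap_cost(length, gap_open=-300, gap_extend=-11):
    """Score of a single gap of the given length (assumed affine convention)."""
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend


def total_gap_cost(aligned_row, gap_open=-300, gap_extend=-11):
    """Sum the affine score of every run of '-' in one aligned row."""
    return sum(
        affine_gap_cost(len(run), gap_open, gap_extend)
        for run in re.findall(r"-+", aligned_row)
    )
```

Under these assumptions, one gap of three positions scores -322, whereas three separate one-position gaps score -900, so a large open penalty relative to the extend penalty favors fewer, longer gaps.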

Selecting orthologs

The top portion of the dialog lists all the orthologs that are available for the currently selected gene annotation.

By default, only the most representative transcript for each ortholog is included in the comparison; however, you can click the expand tree button to expand the list of available transcripts. Note that the first item on the list is the currently selected gene, which may also possess multiple transcripts:

Check the checkbox next to a transcript to add it to the comparison. You can also click the checkbox in the table's header to quickly select all available transcripts, although doing so may cause the alignment calculation to run significantly slower (depending on the number of selected transcripts). To return to the default selection, click the reset to default button in the upper-right corner of the table.

Click the name of a transcript to open its own Annotation Details dialog:

In addition, when you mouse over the name of a transcript, a label settings button will appear:

Click the button to choose which label should be shown for all transcripts in the current map set (in this example, Zea mays AGPv4). By default, the best label is selected automatically; however, you can also manually select the value of any available qualifier:

Qualifiers that are marked with "f" contain the gene function, and those marked with "n" contain the gene name; note that these could be different for every map set. The updated label will be reflected in the table as well as the alignment section:

The table of orthologs supports all of the standard search and filtering controls; for example, you could filter the orthologs by organism, hiding all except those that occur in Sorghum:

Sorting the table will also change the sort order of the transcripts in the alignment sections.

In addition, you can show or hide extra columns by clicking the select columns button in the upper-right corner. For example, you could display the value of the "SwissProt match" qualifier and/or hide the P-value column:

Note that not all qualifiers may be available for all map sets or tracks (in this case, the "SwissProt match" qualifier is not available for the Zea mays map sets). If you have a data file with additional annotation qualifiers, you can load it in PersephoneShell.

Query

The Query section lists all the protein sequences to be aligned, in multi-FASTA format. The query is read-only when you are aligning gene orthologs; however, if you opened a blank Protein Multiple Alignment dialog (by selecting Tools | Protein multiple alignment from the toolbar), the query will be editable. You can paste sequences into the box and edit them in place, and the changes will be reflected in the alignment view:

Note that when viewing a custom query, the names of the transcripts are taken from their FASTA headers.
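
To make the expected input concrete, here is a minimal sketch of reading multi-FASTA text into (name, sequence) pairs. The function name is hypothetical, and taking the first whitespace-delimited token of the header as the transcript name is an assumption; Persephone may use the whole header line.

```python
def parse_multi_fasta(text):
    """Return a list of (name, sequence) pairs from multi-FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].strip()
            # Assumption: the name is the first token of the header.
            name = header.split()[0] if header else ""
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

Each record starts with a ">" header line followed by one or more sequence lines, so a query with two transcripts contains two header lines.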

Alignment display

The middle portion of the dialog displays the detailed sequence alignment; the bottom portion displays the "bird's eye" overview. The overview compresses the entire alignment to fit on the screen, making it easier to find gaps or blocks of mismatches:

Drag the mouse over the overview to quickly scroll the main alignment view to the desired position. The two leftmost columns in the main alignment view display the name of each aligned transcript and its match score (higher scores indicate a better match to the consensus alignment). You can resize these columns by dragging their borders:

The numbers above the amino acids display their position relative to the start of the consensus alignment.

You can click a transcript name to display its Annotation Details, or mouse over it to change its label, just as you would in the main table:

The thick blue lines represent exon boundaries of the gene that produced the transcript; you can turn this display on or off by toggling the Show exon boundaries checkbox:

You can also change the color-coding scheme for the amino acids by opening the Coloring schema panel:

Color by:

Amino acid: This is the default option; each kind of amino acid is highlighted with its own unique color.
R-Group classification: Highlights amino acids by their side chain types.
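
As an illustration of what a side-chain grouping looks like, the sketch below uses a common textbook division of the 20 standard amino acids into five R-group classes. The exact classes (and colors) that Persephone uses are not stated here, so the mapping and names are assumptions.

```python
# One-letter codes grouped by a common textbook R-group classification.
R_GROUPS = {
    "nonpolar aliphatic": "GAPVLIM",
    "aromatic": "FYW",
    "polar uncharged": "STCNQ",
    "positively charged": "KRH",
    "negatively charged": "DE",
}

# Reverse lookup: residue letter -> class name.
CLASS_OF = {aa: group for group, letters in R_GROUPS.items() for aa in letters}


def classify(sequence):
    """Return the R-group class of each residue; gaps and unknowns are flagged."""
    return [CLASS_OF.get(aa.upper(), "gap/unknown") for aa in sequence]
```

Under such a scheme, residues with chemically similar side chains (for example, leucine and isoleucine) share a color, so conservative substitutions stand out less than changes between classes.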

Highlight:

All cells: By default, all amino acids are highlighted (as per the Color by selection).
Consensus cells: Highlights only those amino acids that match the consensus alignment; for example, this leucine is not highlighted because most of the transcripts contain isoleucine at its position:
Differences from consensus: The opposite of the above; highlights only those amino acids that differ from the consensus alignment.
Differences from query: You can nominate one transcript to act as the reference, and highlight only those amino acids that differ from it. For example, you could highlight the differences from SbRio.01G260800.1 among all of the other Sorghum transcripts:
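
The highlight modes above can be sketched as follows. This is a simplified model, not Persephone's code: it assumes the consensus is the most common non-gap residue in each column, that gaps are never highlighted, and the function and mode names are hypothetical.

```python
from collections import Counter


def consensus(rows):
    """Most common non-gap residue per column ('-' if a column is all gaps)."""
    result = []
    for column in zip(*rows):
        counts = Counter(res for res in column if res != "-")
        result.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(result)


def highlight_mask(rows, mode="consensus", query_index=0):
    """Return one list of booleans per row; True means the cell is highlighted.

    mode: "consensus"      -> cells that match the consensus
          "diff_consensus" -> cells that differ from the consensus
          "query"          -> cells that differ from rows[query_index]
    """
    reference = rows[query_index] if mode == "query" else consensus(rows)
    masks = []
    for row in rows:
        if mode == "consensus":
            masks.append([r != "-" and r == c for r, c in zip(row, reference)])
        else:
            masks.append([r != "-" and r != c for r, c in zip(row, reference)])
    return masks
```

For the rows "MIK", "MIK", and "MLK", the consensus is "MIK", so in the "consensus" mode the L in the third row is the only residue left unhighlighted.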

Exporting alignments

Click the export alignment button in the upper-right corner of the detailed alignment view to export it, either as an SVG image (suitable for publication), or as plain text. You can choose whether to split the alignment lines by width, and also whether to include relative position labels:

Method: complete, Gap open penalty: -300, Gap extend penalty: -11, Queries: 9, Size: 369 aa
1
SbRio.01G260800.1 MRGVTSAAKRAGEMAFNAGGGAVNWFPGHMAAASRAIRDRLKLADLVIEVRDARIPLSSA
SbRio.01G260800.2 ------------------------------------------------------------
SbRio.01G260800.3 ------------------------------------------------------------
Sobic.001G246500.1 MRGVTSAAKRAGEMAFNAGGGAVNWFPGHMAAASRAIRDRLKLADLVIEVRDARIPLSSA
Sobic.001G246500.2 ------------------------------------------------------------
Sobic.001G246500.3 ------------------------------------------------------------
Sobic.001G246500.4 ------------------------------------------------------------
Sobic.001G246500.5 ------------------------------------------------------------
Sobic.001G246500.6 ------------------------------------------------------------

61
SbRio.01G260800.1 NEDLQPVLSAKRRILALNKKDLANPNIMNRWLNHFESCKQDCISVNAHSSNSVNQLLGFA
SbRio.01G260800.2 ---------------------------MNRWLNHFESCKQDCISVNAHSSNSVNQLLGFA
SbRio.01G260800.3 ----------------------------------MESGKMSSHPAQQKKLSSLT------
Sobic.001G246500.1 NEDLQPVLSAKRRILALNKKDLANPNIMNRWLNHFESCKQDCISVNAHSSNSVNQLLGFA
Sobic.001G246500.2 ---------------------------MNRWLNHFESCKQDCISVNAHSSNSVNQLLGFA
Sobic.001G246500.3 ----------------------------------MESGKMSSHPAQQKKLSSLT------
Sobic.001G246500.4 ---------------------------MNRWLNHFESCKQDCISVNAHSSNSVNQLLGFA
Sobic.001G246500.5 ------------------------------------------------------------
Sobic.001G246500.6 ----------------------------------MESGKMSSHPAQQKKLSSLT------
...
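
The effect of the two export options can be approximated with a short sketch that wraps an alignment into fixed-width blocks, each optionally preceded by the 1-based position of its first column. The function name, the default width of 60, and the padding of names to a common width are assumptions made for illustration; the exported file's exact layout may differ.

```python
def format_alignment(rows, width=60, show_positions=True):
    """Wrap aligned rows into blocks of `width` columns.

    rows: list of (name, aligned_sequence) pairs with equal-length sequences.
    """
    length = len(rows[0][1])
    pad = max(len(name) for name, _ in rows)
    blocks = []
    for start in range(0, length, width):
        lines = []
        if show_positions:
            # Position of the block's first column, relative to the alignment start.
            lines.append(str(start + 1))
        for name, seq in rows:
            lines.append(f"{name.ljust(pad)} {seq[start:start + width]}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

With a width of 60, consecutive blocks are labeled 1, 61, 121, and so on, matching the labels in the sample.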

You can also export the bird's-eye overview of the entire alignment, in PNG bitmap format.