Update
Note
This is an advanced topic for administrators. The update command is used in special situations to convert some old data to newer formats. Most likely, you do not need it.
Sometimes, with new, more efficient ways of storing and retrieving information, updates are necessary to the data already existing in the database. After the introduction of such improved methods, PersephoneShell is updated and the new versions start loading the data using the new formats. The Persephone client will keep working with the data in the older formats, but we highly recommend converting the stored data to the new form. In most cases, this will result in considerable space and transfer time savings. Please review each sub-command.
Usage:
update <target> <path> [-v] [-t | -f] [-o]
Targets:
annot_connector, annotation_group, bed, data_version, gc_content, mapsettree, marker_connector, ortholog, sample_info, track_qualifier, variant
Some of the targets are discussed below. Please use PersephoneShell's command 'help update <target>' to see the description of each update command, e.g.:
help update annot_connector
update data_version
Every data change in the database should trigger the cache reset in the client applications. This is done by updating the values in the table DATA_VERSION. PersephoneShell does it automatically for each data manipulation, such as adding a track or editing the map set description. Still, if you modify the data using direct SQL queries outside PersephoneShell, you need to update the records in DATA_VERSION. Note, maintaining the correct data version Is mostly important for the Web version of Persephone that actively uses browser's and the server's cache. Updating the data version will force the cache to be reset. The Windows version will engage the cache only if explicitly instructed via the configuration settings (TrackCacheSettings), so if using cache is set to false, the command update data_version will have no effect on the Windows version.
update gc_content
The version of Persephone introduced in the end of 2020 shows statistics of the sequences. The map set properties form has a new tab that, besides other values, shows the number of undefined nucleotides (Ns) in the entire genome. Calculating this number is a time-consuming task that, without the gc_content update, will be performed in the run time. In case of thousands of sequences in the selected map set, the calculation can take quite a while. The command update gc_content re-analyzes each sequence, generates a new GC histogram and stores the total number of Ns into the database for quick retrieval.
Test the data first using the switch -t:
PS> update gc_content -t
Updating GC histograms for all 68 map sets listed in the tree, in Test Mode
Updating 1/68 map set: '/Brachypodium/Brachypodium distachyon Bd21' (id:200081226)
Creating GC histogram for 5 sequences, in Test Mode
Done, processed 5 sequences
Map set does not have the count of Ns saved in db, requires updating
Updating 2/68 map set: '/Glycine max/Glycine max Wm82.a1.v1 (id:200207306)
Creating GC histogram for 20 sequences, in Test Mode
Done, processed 20 sequences
Map set does not have the count of Ns saved in db, requires updating...
The actual gc_content conversion can be done for the entire database or for a selected map set:
PS> update gc_content "Medicago sativa/CADL" -v
Do you want to update GC histograms for map set '/Medicago sativa/CADL' (id:262880824)? (Y/N) Y
Updating 1/1 map set: '/Medicago sativa/CADL' (id:262880824)
Creating GC histogram for 6593 sequences
Done, processed 6593 sequences
update track_qualifier
This command creates a new table TRACK_QUALIFIER needed for Persephone's Edit tracks interface.
PS> update track_qualifier
update variant
Switch to using the new format for the variant data. This command will re-analyze the data stored in the database and extract them outside the database. The corresponding data in the database will be deleted.
update ortholog
A major change in the way the ortholog pairs are stored requires running this command to convert the data to the new format. All orthologs for a pair of maps will be saved in a binary blocks. Small blocks are aggregated into larger records. As a result, loading of the synteny matrix or fetching all orthologs for a pair of maps is much improved. Switching to the new format enables the new function in Persephone called Multimap.
PS> update ortholog [-o][--deleteOld][-t]
The total number of ortholog pairs is significantly reduced by recording the links between groups of genes instead of individual gene models. To group the genes, run the command update annotation_group.
Normally, it is expected that the command update ortholog should be run once and include the parameter --deleteOld, which will delete the original ortholog records between individual gene models and replace them with the ortholog links between primary gene models from each group.
The command updates the database schema, so it is required to restart the WebCerberus server.
update annotation_group
The gene models can be grouped based on their location or common qualifiers. By default, the overlapping gene models will be grouped together. When the parameter --groupNameQualifier is used, the models are placed into the same group if they have identical values of the selected qualifier, such as geneName.
PS> update annotation_group [-t][-o][--groupNameQualifier=<qualifier>] [<mapset>]
You can run bulk update for all genes in the database by using the command
PS> update annotation_group
which will group genes by location. See if it produces reasonable results. Sometimes, it might be better to group the models by a qualifier that reflects the gene name common for splice variants, but doing so requires knowing the name of such qualifier. It could be different for different map sets. The latest versions of PersephoneShell store the qualifier parent_gene_id to preserve the identity of the parent gene record for mRNAs in the GFF file. This qualifier is a good candidate to be used for grouping the gene models:
PS> update annotation_group --groupNameQualifer=parent_gene_id -o 222
In the example above, the parameter -o is required if the grouping for a given track should overwrite the previously created groups. The map set is specified here by its MapSetId=222.