Searchindex
The PersephoneShell configuration file can contain a node specifying the location of the Solr server and the name of the core:
  
  <SearchIndex>
    <Connection Name="test" Url="http://localhost:8983/solr/" CoreName="test" />
    <Connection Name="prod" Url="http://localhost:8983/solr/" CoreName="prod" />
  </SearchIndex>
If the node corresponding to the connection name is found, the search index will be automatically updated every time the data in the database is changed using PersephoneShell. The presence of this node in the configuration file enables this functionality. For example, after the command add sequence successfully completes, the search index for the newly-inserted map names will be automatically updated.
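As a rough illustration of how the node above can be read, the following Python sketch parses the SearchIndex fragment and derives a per-core URL for a named connection. Joining Url and CoreName follows the usual Solr URL layout; how PersephoneShell itself resolves the connection is not described here, and the function name core_url is hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment shown above, copied as-is.
CONFIG_FRAGMENT = """
<SearchIndex>
  <Connection Name="test" Url="http://localhost:8983/solr/" CoreName="test" />
  <Connection Name="prod" Url="http://localhost:8983/solr/" CoreName="prod" />
</SearchIndex>
"""

def core_url(xml_text, connection_name):
    """Return the Solr core URL for a connection, or None if it is absent."""
    root = ET.fromstring(xml_text.strip())
    for conn in root.findall("Connection"):
        if conn.get("Name") == connection_name:
            # Assumption: the core is addressed as <Url>/<CoreName>.
            return conn.get("Url").rstrip("/") + "/" + conn.get("CoreName")
    return None  # no node for this connection name

print(core_url(CONFIG_FRAGMENT, "prod"))
```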
Here is the summary of searchindex verbs:
| set_auto | enable automatic indexing after each data modification, such as add, edit or delete. | 
| set_manual | disable the automatic indexing. This might be useful if you plan to load several tracks and add extra qualifiers. You can first add the data and then, at the end of the loading job, enable the indexing using set_auto or by manually running the indexing (sync or rebuild). | 
| sync | compare lists of tracks in the database and in the existing Solr index and synchronize them if needed. | 
| deepsync | compare lists of tracks (with feature counts) in the database and in the existing Solr index and synchronize them if needed. | 
| rebuild | delete the existing index for the provided map set and rebuild it from scratch. This can take time if the tracks contain a large number of features. | 
| skip | mark a track to be skipped or included during indexing. | 
The automatic behavior is controlled by two commands:
searchindex set_auto
and
searchindex set_manual
When PersephoneShell starts, the flag that triggers automatic index updates is always set to true, so in most cases you do not need to run these commands. If, for some reason, you want to suppress index updates during loading and run the indexing once at the very end, you can switch to manual updating by running searchindex set_manual.
Once under manual control, the index can be updated in two ways: sync (or deepsync) and rebuild.
searchindex sync [<idOrPath>] [--tracks <trackList>]
This command (normally called automatically by the add, edit or delete commands) analyzes the set of tracks present in the database and compares it to the set of tracks in the search index. If the sets are different, they are synchronized by adding or deleting the corresponding index entries.
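The comparison described above amounts to a set difference. A minimal Python sketch, assuming tracks are identified by name only; the function plan_sync and the sample data are illustrative and are not PersephoneShell's implementation:

```python
def plan_sync(db_tracks, indexed_tracks):
    """Compare two collections of track names and return (to_add, to_delete)."""
    db, idx = set(db_tracks), set(indexed_tracks)
    # Tracks only in the database need index entries;
    # tracks only in the index are leftovers to remove.
    return sorted(db - idx), sorted(idx - db)

to_add, to_delete = plan_sync(
    ["MSU gene models", "Gnomon gene models"],  # tracks in the database
    ["MSU gene models", "GENSCAN"],             # tracks in the search index
)
print(to_add)
print(to_delete)
```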
Note that this analysis works at the level of whole tracks: sync checks only whether a track is present in or absent from the index. It will not recognize any internal changes to the data in the track. For example, when more qualifiers are added to the gene models, the number of tracks does not change, so the sync command will not notice the difference. After any track data modification using PersephoneShell, the items are re-indexed automatically if the search indexing is set to auto. If, for some reason, the re-indexing was missed, you can force it manually with the command:
searchindex rebuild [<idOrPath>] [--tracks <trackList>]
This command will rebuild the entire search index, deleting and recreating the index for all track items. This is especially useful if you modify the internal track data by issuing direct SQL queries to the database outside PersephoneShell.
The command will affect all map sets or a specific map set. If the map set is provided, the job can be reduced further by listing track names to be re-indexed:
searchindex rebuild 2 --tracks "MSU gene models;Gnomon gene models"
Note how multiple track names are separated by a semicolon.
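If such commands are generated by a script, the track list can be assembled programmatically. The Python helper below is a hypothetical convenience, not part of PersephoneShell; it assumes that a semicolon inside a track name would be treated as a separator and therefore rejects such names.

```python
def rebuild_command(map_set_id, track_names):
    """Build a 'searchindex rebuild' command line for the given tracks."""
    for name in track_names:
        if ";" in name:
            # Assumption: a semicolon in a name would be read as a separator.
            raise ValueError(f"track name contains ';': {name!r}")
    joined = ";".join(track_names)
    return f'searchindex rebuild {map_set_id} --tracks "{joined}"'

print(rebuild_command(2, ["MSU gene models", "Gnomon gene models"]))
```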
The parameter idOrPath identifies a map set (by MapSetId or path). Rebuilding the index for one map set usually takes minutes.
The map set path can contain a wildcard (*), which can be used to specify multiple map sets. The search pattern is applied to the paths of all map sets. For instance, the pattern "*sativa*" will find map sets with the following paths:
/Oryza sativa indica/ASM465v1
/Medicago (genus)/Medicago sativa (alfalfa)/Zhongmu No.1
/Medicago (genus)/Medicago sativa (alfalfa)/medsa.CADL_HM342
/Oryza sativa spontanea/O.sativa spontanea PI653432
/Cannabis/C.sativa Purple kush
If you want to process all map sets under some branch, you can use the pattern like this:
/Homo sapiens/pangenome/*
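To preview which map sets a pattern would select, the matching can be approximated with Python's fnmatch module. This is only a sketch: it assumes case-sensitive matching where * stands for any run of characters, which may differ from PersephoneShell's actual rules (fnmatch also gives special meaning to ? and [ ]). The last path in the list is hypothetical and serves as a non-matching case.

```python
from fnmatch import fnmatchcase

MAP_SET_PATHS = [
    "/Oryza sativa indica/ASM465v1",
    "/Medicago (genus)/Medicago sativa (alfalfa)/Zhongmu No.1",
    "/Cannabis/C.sativa Purple kush",
    "/Zea mays/B73",  # hypothetical path, added as a non-matching case
]

def select_map_sets(pattern, paths):
    """Return the paths matching a '*' wildcard pattern (case-sensitive)."""
    return [p for p in paths if fnmatchcase(p, pattern)]

# Substring-style pattern: anything containing "sativa".
print(select_map_sets("*sativa*", MAP_SET_PATHS))
# Branch-style pattern: everything under one top-level node.
print(select_map_sets("/Medicago (genus)/*", MAP_SET_PATHS))
```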
If idOrPath is not provided, the full index will be rebuilt, which may take a few hours. To make sure that this process is not started accidentally, PersephoneShell will ask you to confirm by entering the total number of map sets:
PS> searchindex rebuild
Do you want to rebuild index for 148 map sets? (Y/N) Y
To confirm, please enter the number of map sets that will be affected: 148
Deep synchronization (deepsync)
The searchindex operations do not block other data-modifying commands such as add or delete. This means that two PersephoneShell sessions can index and modify the data simultaneously, so indexing may start and finish while a track is only partially loaded. The command searchindex sync checks only for the presence or absence of a track in Solr, so it may not detect the incomplete index.
To catch situations like this, use the command searchindex deepsync. This command compares the data sets taking into account the feature counts in each track as well. It is usually executed by a cron job that cleans up incomplete indexes on a regular basis.
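The difference from a plain sync can be sketched in Python as a comparison of per-track feature counts. The function find_stale_tracks and the counts below are invented for illustration; how PersephoneShell obtains the counts from the database and from Solr is not described here.

```python
def find_stale_tracks(db_counts, index_counts):
    """Return names of tracks whose index is missing or has a different count."""
    return sorted(
        name
        for name, count in db_counts.items()
        if index_counts.get(name) != count
    )

# Hypothetical feature counts per track.
db_counts = {"MSU gene models": 55986, "Gnomon gene models": 28000}
index_counts = {"MSU gene models": 55986, "Gnomon gene models": 1200}

# Both tracks are present in the index, so a presence-only check finds nothing,
# but the count comparison flags the partially indexed track.
print(find_stale_tracks(db_counts, index_counts))
```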
Skip indexing specific tracks
If you know in advance that the features stored in a track will never be searched for, such a track can be "protected" from indexing, meaning that the indexing procedure is skipped for it. This can be done during track loading by setting the flag IsSearchable in the INI file to false or, if the track is already present in the database, by setting the "skip" flag with the command
searchindex skip [<idOrPath>]
When you run this command, you will have the option to drop the existing Solr index for the selected track. This will save you time and disk space (and may eventually improve search performance).
Skip indexing all tracks with given name
You can designate some tracks to be always skipped in any map set during indexing. For example, if you want all tracks named 'Repeats' not to be indexed, run this command without any map set identifier (MapSetId or MapSetPath):
searchindex skip
This command will present the list of all distinct track names present in the database. Just select the track name by its number and change the status of the "skip" flag. Selecting the track will flip the flag from True to False and vice versa.
PS> searchindex skip
Select a track for which the search indexing will be skipped or enabled. Masking a track from the indexing can save you time and disk space.
No map set is provided. The change will affect the tracks with selected track_name on ALL map sets.
Select from list of existing tracks:
[Number] Track Name (Skip Indexing)
[0] MSU gene models (False)
[1] BGI gene models (False)
[2] Gnomon gene models (False)
[3] GENSCAN (True)
[4] Ensembl (True)
[5] SV-deletions (True)
[6] SV-insertions (True)
[7] HDRA_3k_validated (True)
[8] FGENESH gene models (True)
[9] Predicted TSS (True)
Enter [NodeNumber] to edit? 8
The parameter 'skip indexing' will be set to False for track FGENESH gene models
Do you want to index the track? (Y/N) Y
Track option was changed. Solr is out of sync. You should run searchindex sync
Do you want to run this command now? (Y/N) Y
Selecting "Y" will start the index synchronization and, if the track was skipped before the change (skipped=True), it will now be included in the search index.
Skip indexing via INI file
When loading a track with searchable features (gene annotation, markers, QTLs), the flag to skip indexing can be set in the control INI file (please see the corresponding sample files):
; IsSearchable: If true (default), the track data will be indexed for search. If false, the indexing will be skipped
IsSearchable=false
Setting IsSearchable to false causes the indexing to be skipped while the data is loaded. You can change this later by calling the command searchindex skip [<idOrPath>].
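The default described in the comment (true when the flag is absent) can be illustrated with Python's configparser. This is a sketch only: the section name [Track] is an assumption, since the layout of the actual control INI files is given in the sample files rather than here, and is_searchable is a hypothetical helper.

```python
import configparser

# Assumption: the flag lives in a section called [Track].
INI_TEXT = """
[Track]
; IsSearchable: If true (default), the track data will be indexed for search.
IsSearchable=false
"""

def is_searchable(ini_text, section="Track"):
    """Read IsSearchable from INI text, defaulting to True when absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback=True mirrors the documented default.
    return parser.getboolean(section, "IsSearchable", fallback=True)

print(is_searchable(INI_TEXT))
```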
 
   Copyright © 2009-2025 by