Set Up the Lucene Search Engine

Persephone supports Lucene as an alternative to the database search index. If your database is not Oracle, you must use the Lucene search index to replace Oracle's context search, which tokenizes text strings and supports keyword queries such as 'kinase' or 'transpos*'. In addition, the Lucene search engine is highly optimized and returns results faster than a simple database query.
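To show what tokenized keyword search with a trailing wildcard means in practice, here is a toy Python illustration. It is only a sketch of the idea and is not Persephone or Lucene code: Lucene's real analyzers and query parser also handle stemming, stop words, scoring and much more. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query, text):
    """Return True if the query matches a whole token; a trailing '*' means prefix match."""
    tokens = tokenize(text)
    q = query.lower()
    if q.endswith("*"):
        prefix = q[:-1]
        return any(t.startswith(prefix) for t in tokens)
    return q in tokens


if __name__ == "__main__":
    desc = "Putative transposase; similar to protein kinase family"
    print(matches("kinase", desc))     # True: 'kinase' is a token
    print(matches("transpos*", desc))  # True: 'transposase' starts with 'transpos'
    print(matches("transpos", desc))   # False: no token is exactly 'transpos'
```

Unlike a substring query, 'kinase' does not match 'kinases' here, while 'transpos*' matches any token beginning with 'transpos'.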

Setting up Lucene is straightforward:

1. Allocate disk space on the same file system that Cerberus runs on. The space must be large enough for the index; plan for roughly the same order of magnitude as the database tablespace.
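Both requirements of this step can be checked before building the index. The Python sketch below compares the device IDs of the planned index directory and the Cerberus directory, and compares free space against the tablespace size, which you must look up in your database yourself. The function name, its parameters and the "free space at least equal to the tablespace size" threshold are assumptions made for illustration.

```python
import os
import shutil


def check_index_location(index_dir, cerberus_dir, tablespace_bytes):
    """Return a list of problems with the planned index location (empty if none)."""
    problems = []
    # st_dev identifies the device, i.e. the file system, that holds a path.
    if os.stat(index_dir).st_dev != os.stat(cerberus_dir).st_dev:
        problems.append("index directory is not on the same file system as Cerberus")
    free = shutil.disk_usage(index_dir).free
    # Rule of thumb (assumption): keep at least as much free space as the tablespace uses.
    if free < tablespace_bytes:
        problems.append(f"only {free} bytes free, tablespace uses {tablespace_bytes} bytes")
    return problems
```

For example, check_index_location("/data/persephone/index", "/opt/cerberus", 20 * 1024**3) would report any problems for a hypothetical 20 GiB tablespace; both paths must already exist, because os.stat raises an error otherwise.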

2. Run SearchIndexUpdater, which ships with PersephoneShell. It creates the index files in the allocated disk space. Its parameters are:

s:\SearchIndexUpdater\>SearchIndexUpdater
Usage: SearchIndexUpdater [OPTIONS]+
Update search index to specific directory.

Options:
  -t, --type=IndexType       (required) the IndexType, must be one of {MARKER, ANNOTATION, QTL, ALL}
  -d, --destination=DIR      DIR directory to store index data (default=uses directory specified in config)
  -u, --updateMode=UpdateMode
                             the UpdateMode, must one of {APPEND, OVERWRITE} (default=uses mode specified in config)
  -s, --connectionString=VALUE
                             name of connection string stored in config file (default=connection string in config under name 'Default')
  -m, --mapSets=VALUE        list of map set's ACCESSION_NOs or MAP_SET_IDs to be indexed, separated by ',', (default=all)
  -h, --help                 show this message and exit

Note

On Linux or Unix, use the Mono runtime to run the SearchIndexUpdater tool:
mono SearchIndexUpdater.exe ...
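If you call the updater from a script, a small Python helper can assemble the argument list from the options in the usage text and add the mono prefix on Linux or Unix. This is a hypothetical sketch, not part of PersephoneShell: the function name, the default executable name SearchIndexUpdater.exe and the space-separated short-option form (taken from the command examples in this guide) are assumptions.

```python
import sys

# Allowed values, copied from the SearchIndexUpdater usage text.
INDEX_TYPES = {"MARKER", "ANNOTATION", "QTL", "ALL"}
UPDATE_MODES = {"APPEND", "OVERWRITE"}


def build_command(index_type, destination=None, update_mode=None, connection=None,
                  map_sets=None, exe="SearchIndexUpdater.exe", platform=sys.platform):
    """Build an argv list for SearchIndexUpdater (hypothetical helper)."""
    if index_type not in INDEX_TYPES:
        raise ValueError(f"type must be one of {sorted(INDEX_TYPES)}")
    # On Windows the .exe runs directly; on Linux/Unix it is launched through Mono.
    argv = [exe] if platform.startswith("win") else ["mono", exe]
    argv += ["-t", index_type]
    if destination:
        argv += ["-d", destination]
    if update_mode:
        if update_mode not in UPDATE_MODES:
            raise ValueError(f"updateMode must be one of {sorted(UPDATE_MODES)}")
        argv += ["-u", update_mode]
    if connection:
        argv += ["-s", connection]
    if map_sets:
        # -m takes a single comma-separated list of accession numbers or IDs.
        argv += ["-m", ",".join(map_sets)]
    return argv
```

For example, subprocess.run(build_command("ALL", connection="MyDb"), check=True) runs the equivalent of SearchIndexUpdater -s MyDb -t ALL, with the mono prefix added automatically when not on Windows.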

Add the database connection string to the SearchIndexUpdater.exe.config file. The entry needs a name (referenced on the command line with -s) and a provider (Oracle.ManagedDataAccess.Client or MySql.Data.MySqlClient). For example, for MariaDB, the connection string could be:

Optionally, use cipher to encrypt the connection string.
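The name and provider attributes described here suggest that the file uses the standard .NET connectionStrings section. Assuming that layout, the Python sketch below adds or replaces a named entry using the standard library's ElementTree. The function name, host, database and credentials are hypothetical, the connection string keywords are those accepted by MySQL Connector/NET, and encryption with cipher is not covered.

```python
import xml.etree.ElementTree as ET


def add_connection_string(config_xml: str, name: str, conn: str, provider: str) -> str:
    """Return config_xml with a <connectionStrings>/<add> entry for `name`.

    Assumes the standard .NET layout:
    <configuration><connectionStrings><add name=".." connectionString=".." providerName=".."/>
    """
    root = ET.fromstring(config_xml)  # the <configuration> element
    section = root.find("connectionStrings")
    if section is None:
        section = ET.SubElement(root, "connectionStrings")
    # Drop any existing entry with the same name so exactly one remains.
    for old in section.findall("add"):
        if old.get("name") == name:
            section.remove(old)
    ET.SubElement(section, "add", name=name, connectionString=conn, providerName=provider)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical MariaDB host, database and credentials, for illustration only.
    print(add_connection_string(
        "<configuration></configuration>", "MyDb",
        "Server=dbhost;Port=3306;Database=persephone;Uid=persephone;Pwd=secret;",
        "MySql.Data.MySqlClient"))
```

To edit the real file, read its text, pass it in and write the result back. ElementTree's default parser discards XML comments, so keep a backup of the original file.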

If you name the connection string "Default", the program uses it automatically whenever no connection name is given on the command line.

Other parameters, such as the output directory or the update mode, can be specified on the command line or, if the index will be updated regularly, in the configuration file:
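The fallback rules for these options (an omitted -s means the connection string named "Default"; an omitted -d or -u means the value from the configuration file) amount to a simple precedence: command line first, configuration second. The Python sketch below illustrates that precedence only. The configuration keys IndexDirectory and UpdateMode are assumptions (IndexDirectory is the name this guide uses for the index location), and the real tool may report missing values differently.

```python
def resolve_options(cli, config):
    """Merge command-line options over configuration defaults (illustrative only)."""
    resolved = {
        # -s falls back to the connection string named "Default".
        "connection": cli.get("connection") or "Default",
        # -d and -u fall back to values from the configuration file (hypothetical keys).
        "destination": cli.get("destination") or config.get("IndexDirectory"),
        "update_mode": cli.get("update_mode") or config.get("UpdateMode"),
    }
    missing = [key for key, value in resolved.items() if value is None]
    if missing:
        raise ValueError(f"no value given for: {', '.join(missing)}")
    return resolved
```

Keeping the destination and update mode in the configuration file means a scheduled job only has to pass -t (and optionally -m) on each run.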

For the initial build, run the updater with arguments similar to the following:

SearchIndexUpdater -s MyDb -t ALL

This creates the necessary subdirectories in the IndexDirectory and fills them with index files compiled from the database entries. The program indexes markers, gene models (annotation), QTLs, and all their qualifiers. Later, new entries can be appended to the indexes, restricted to selected object types (MARKER, ANNOTATION, QTL) and selected map sets:

SearchIndexUpdater -t MARKER -u APPEND -m TAIR10
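If the index is refreshed on a schedule, a job script might perform the full first-time build while the index directory is still empty and append afterwards. The Python sketch below assumes that a missing or empty directory means no index exists yet, and it runs one APPEND pass per object type because -t accepts a single value. The function names and the index path are hypothetical; on Linux or Unix, pass command=("mono", "SearchIndexUpdater.exe").

```python
import os
import subprocess


def plan_runs(index_dir, types, map_sets, connection="Default",
              command=("SearchIndexUpdater",)):
    """Return the argument lists for one update cycle (hypothetical helper).

    A missing or empty index_dir triggers a first-time build of all types;
    otherwise each requested type is appended for the given map sets.
    """
    base = [*command, "-s", connection, "-d", index_dir]
    if not os.path.isdir(index_dir) or not os.listdir(index_dir):
        return [base + ["-t", "ALL"]]
    return [base + ["-t", t, "-u", "APPEND", "-m", ",".join(map_sets)]
            for t in types]


def run_all(runs):
    """Execute the planned runs in order, stopping at the first failure."""
    for argv in runs:
        subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Hypothetical index location; TAIR10 is the map set from the example above.
    for argv in plan_runs("/data/persephone/index", ["MARKER", "QTL"], ["TAIR10"]):
        print(" ".join(argv))
```

Separating planning from execution makes it easy to print or log the commands before run_all actually executes them.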

3. Update the Cerberus configuration file: find the SearchEngineSettings section and modify it according to the example:

These steps are sufficient to enable Persephone's Lucene search engine.