Backup/Restore
Data can be migrated between database instances with a pair of commands: backup and restore.
Backup
The backup command dumps the data in the database into one large SQL script (backup.sql). Alongside the SQL script, it creates archives with genomic sequences (sequence.tar), files with binary data (file_storage.tar), and user data added through the Persephone client (user_storage.tar).
PS> backup
Absolute path to the storage for user data loaded via Persephone (drag&drop)? /data/WebPersephone/Users
Absolute path to the folder for backups (writable)? /data/shared/backup
Database can be backed up
The following data will be backed up:
Backup type         Input path                  Output path                                                  Approximate size
-----------------------------------------------------------------------------------------------------------------------------
Backup folder path                              /data/backup/Persephone_Backup_2024-03-11
Database                                        /data/backup/Persephone_Backup_2024-03-11/backup.sql         51.47 GB
Sequence storage    /data/sequences             /data/backup/Persephone_Backup_2024-03-11/sequence.tar       42.66 GB
File storage        /data/FileStorage           /data/backup/Persephone_Backup_2024-03-11/file_storage.tar   12.16 GB
User storage        /data/WebPersephone/Users   /data/backup/Persephone_Backup_2024-03-11/user_storage.tar   2.02 KB
Proceed? (Y/N)
The backup files are stored in a sub-folder of the directory specified at the prompt. When migrating to another database, transfer the root backup folder (e.g., /data/shared/backup) together with its sub-folders to the new location. With Docker, it is common to use a mounted folder as the backup destination, so that the files can be accessed from outside the Docker container and copied to the new location.
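Before moving a backup sub-folder to another machine, it can help to confirm that the expected files are actually there. The following is a minimal sketch, not part of PersephoneShell: the function name check_backup_folder is made up for illustration, and the four file names are taken from the sample output above. A file may be legitimately absent if that part of the backup was switched off.

```shell
#!/usr/bin/env bash
# Sketch: report which backup artifacts are present in one backup sub-folder.
# The four file names come from the sample output above; a file can be
# legitimately absent if that part was switched off in the control file.
check_backup_folder() {
    local dir="$1" name missing=0
    for name in backup.sql sequence.tar file_storage.tar user_storage.tar; do
        if [ -f "$dir/$name" ]; then
            echo "found   $name"
        else
            echo "missing $name"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing files (0 means all four exist).
    return "$missing"
}

# Example call (path taken from the sample session above):
# check_backup_folder /data/backup/Persephone_Backup_2024-03-11
```

Because the exit status equals the number of missing files, the function can guard a copy step, for example check_backup_folder "$dir" && cp -a "$dir" /mnt/target/ (the target path here is hypothetical).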
The directory for user files added through the Persephone client is not known to PersephoneShell because it is specified in the configuration of the Persephone application. Its location must therefore be entered manually at the prompt that asks for the absolute path to the user storage. For our standard Docker setup, the user files are stored at /data/WebPersephone/Users.
To automate the backup process, supply the variables in a control file and pass it with the parameter -c <control.ini>:
PS> backup -c backup.ini
To avoid any interaction, use the force mode (-f).
The sample file is provided in the sub-folder Samples/Backup.
[Paths]
; Absolute path to the folder with backup (Optional)
BackupFolderPath=/data/Data/backups
; Absolute path to source Persephone's user data storage directory (Optional)
UserStoragePath=/data/WebPersephone/Users
[Backup]
; Save DB into sql dump file (Optional) (true by default)
;SaveDb=false
; Save sequence storage into tar file (Optional) (true by default)
;SaveSequenceStorage=false
; Save file storage into tar file (Optional) (true by default)
;SaveFileStorage=false
; Save user storage into tar file (Optional) (true by default)
SaveUserStorage=false
[Restore]
; Restore DB from sql dump file (Optional) (true by default)
;RestoreDb=false
; Restore sequence storage from tar file (Optional) (true by default)
;RestoreSequenceStorage=false
; Restore file storage from tar file (Optional) (true by default)
;RestoreFileStorage=false
; Restore user storage from tar file (Optional) (true by default)
;RestoreUserStorage=false
; Rebuild SOLR after restoring backup (Optional) (true by default)
RebuildSolrAfterRestore=false
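For scheduled backups (for example from cron), the control file and the force mode can be combined in a small wrapper. This is only a sketch under one important assumption: that backup can be started from the Linux command line as psh backup -c <file> -f, by analogy with the psh restore call documented for the restore command. The function name run_backup and the PSH_BIN variable are illustrative, not part of PersephoneShell.

```shell
#!/usr/bin/env bash
# Sketch of an unattended backup wrapper (for cron or a similar scheduler).
# ASSUMPTION: "psh backup -c <file> -f" works from the Linux command line
# the same way "psh restore" does; verify this on your installation.
PSH_BIN="${PSH_BIN:-psh}"   # override if psh is not on PATH

run_backup() {
    local control_file="$1"
    if [ ! -r "$control_file" ]; then
        echo "Control file not readable: $control_file" >&2
        return 2
    fi
    # -c supplies the answers to the prompts, -f switches off any interaction.
    "$PSH_BIN" backup -c "$control_file" -f
}

# Example: run_backup /data/psh/backup.ini
```

Checking the control file first keeps a mistyped path from turning an unattended run into one that waits for input or fails in a less obvious way.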
Restore
SYNTAX: restore [-t] [-c controlFile] [-f] [--confirm]
The simplest way to restore a data set in a new location, or to recover corrupted data, is to run the restore command in interactive mode:
PS> restore
Create a new empty database or use the one that comes with the Persephone Docker image. Start PersephoneShell on the new machine and execute the command restore. Note that the Docker image contains one sample map set (Arabidopsis thaliana/TAIR10); the restore command will erase it. When asked for the directory for storing the genomic sequences, you may specify the same folder that the previous copy of the data used.
PS> restore
Schema is not empty (1 map set(s) found). If you proceed, all the data will be lost.
Do you want to delete the data? (Y/N) Y
Absolute path to the folder which contains the backup? /shared/backup
Looking for backups in the given folder...
[##]  FOLDER_NAME                       BACKUP_SPENT_TIME  BACKUP_DATE  TOTAL_FILE_SIZE_MB  STATUS
----------------------------------------------------------------------------------------------------
[0]   Persephone_Backup_2024-03-11 (1)  00:00:25           2024-03-11   1.17 GB             OK
Select [lineNo] corresponding to the backup folder: 0
Absolute path to the user storage? /persephone/WebPersephone/Users
Restored sizes:
TYPE              FILE_SIZE_MB
--------------------------------
DB                110.38 MB
Sequence storage  983.76 MB
File storage      10.00 KB
User storage      106.18 MB
Sequence folder (/persephone/sequences) is not empty. If you proceed, all the data will be lost.
Do you want to proceed? (Y/N) Y
User storage folder (/persephone/WebPersephone/Users) is not empty. If you proceed, all the data will be lost.
Do you want to proceed? (Y/N) Y
Do you want to rebuild SOLR after restoring backup? (Otherwise, run it manually) (Y/N) Y
Database can be restored
Restore process     Result                                        Approximate size
-----------------------------------------------------------------------------------------
All data will be deleted
Backup folder path  /shared/backup/Persephone_Backup_2024-03-11
Database                                                          110.38 MB
Sequence storage    /persephone/sequences                         983.76 MB
File storage        /persephone/FileStorage                       10.00 KB
User storage        /persephone/WebPersephone/Users               106.18 MB
Do you want to restore this backup? (Y/N) Y
If the database to be erased contains more than one map set, the program asks you to enter the number of map sets as a safety check against accidental destruction of data.
PS> restore -c /data/psh/backup.ini
Schema is not empty (12 map set(s) found). If you proceed, all the data will be lost.
Do you want to delete these 12 map sets? (Y/N) Y
To confirm, please enter the number of map sets that will be affected: 12
Some of the values required for the restore process, such as the location of file storage, are predefined in the PersephoneShell configuration file in the section <ConnectionSettings>:
<ConnectionSettings>
  <Connection Name="small">
    <FileStorage Path="/persephone/FileStorage" />
    <BlastDbStorage Path="/persephone/BlastDB" />
    <SequenceStorage Path="/persephone/sequences" />
  </Connection>
</ConnectionSettings>
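Since restore overwrites the folders named in this section, it may be worth listing them before starting. The sketch below extracts the three storage paths with a plain-text match. It assumes each storage element sits on its own line, as in the fragment above, and it is not a real XML parser. The function name print_storage_paths and the example file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the storage paths declared in a PersephoneShell configuration
# file. Assumes each storage element sits on its own line, as in the
# fragment above; this is a plain-text match, not a real XML parser.
print_storage_paths() {
    local config_file="$1"
    sed -nE 's/.*<(FileStorage|BlastDbStorage|SequenceStorage) Path="([^"]*)".*/\1 \2/p' "$config_file"
}

# Example (the config file name is hypothetical):
# print_storage_paths /opt/psh/psh.config
```

If the file defines several connections, the paths of all of them are printed, so check that you are reading the lines for the connection you intend to restore into.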
The restore command can also be called from the Linux command line and, with the proper parameters, automated.
A control file can be used to predefine variables, allowing the program to bypass interactive prompts. To skip all prompts and start the restore non-interactively, use the -f and --confirm parameters together. While -f enables force mode, --confirm acts as a safeguard against accidental execution, because a restore wipes out the existing target database.
$ psh restore -c backup.ini -f --confirm
The control file used in the previous step for the command backup can be reused here.
[Paths]
; Absolute path to the backup folder (optional).
; For restore operations, this can be either:
; A full path to a specific backup archive (e.g., ), or
; The parent backup directory used by the 'backup' command. In this case, the most recent backup version will be restored.
BackupFolderPath=/data/Data/backups
BackupFolderPath can point either to the root folder for backups or to a sub-folder that holds one particular backup. If it points to the root folder, the program automatically picks the latest backup. To force a specific version of the backup files, specify the path to that sub-folder:
BackupFolderPath=/data/Data/backups/Persephone_Backup_2025-04-04
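To pin a control file to the newest backup currently on disk, the sub-folder can be looked up with a short helper. This sketch assumes that sub-folders are named Persephone_Backup_YYYY-MM-DD, as in the examples above, so that sorting the names alphabetically also sorts them by date. It imitates the idea of "latest backup" but is not taken from the program's own selection logic, and the function name latest_backup_dir is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print the newest backup sub-folder under a backup root.
# ASSUMPTION: sub-folders are named Persephone_Backup_YYYY-MM-DD, so the
# alphabetical order of the names is also their chronological order.
# This mimics, but is not taken from, the program's own selection logic.
latest_backup_dir() {
    local root="$1" dir latest=""
    # The glob expands in sorted order; the last directory seen is the newest.
    for dir in "$root"/Persephone_Backup_*/; do
        [ -d "$dir" ] && latest="${dir%/}"
    done
    [ -n "$latest" ] && echo "$latest"
}

# Example: pin the control file to the newest backup currently on disk
# echo "BackupFolderPath=$(latest_backup_dir /data/Data/backups)"
```

The function prints nothing and returns a non-zero status when no matching sub-folder exists, which lets a calling script stop before it starts a restore.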
As a common use case, the combination of commands backup and restore can be used to create a test database by restoring the data backed up from the production instance. As usual, the -t switch will run the commands in test mode.
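The refresh of a test database can be scripted by combining the test mode with the non-interactive flags from the SYNTAX line. The sketch below runs restore with -t first and repeats it for real only if that run succeeds. Two things are assumptions here and should be verified before use: that test mode leaves the target data untouched, and that psh signals problems through a non-zero exit status. The function name refresh_test_db and the PSH_BIN variable are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: refresh a test database from a production backup.
# The restore is first run with -t (test mode) and repeated for real only
# if that run succeeds. ASSUMPTION: psh reports problems through a non-zero
# exit status; the document does not state this, so check it first.
PSH_BIN="${PSH_BIN:-psh}"   # override if psh is not on PATH

refresh_test_db() {
    local control_file="$1"
    # Dry run in test mode.
    "$PSH_BIN" restore -t -c "$control_file" -f --confirm || return 1
    # Real run: erases the target database and restores the backup.
    "$PSH_BIN" restore -c "$control_file" -f --confirm
}

# Example: refresh_test_db /data/psh/backup.ini
```

Run this only against the test instance: the second call erases whatever database the PersephoneShell configuration on that machine points to.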