Process Run
The ProcessRun control file section lets you keep track of all loading processes. A Process Run can be thought of as a batch job with an ID and a description. The objects inserted during one run can later be deleted together using the 'delete run' command. The Process Runs stored in the PROCESS_RUN table can be of several types:
| Process Type | Description |
|---|---|
| LOAD ORGANISM | Adding an organism being studied. |
| LOAD GDNA | Adding a physical map set with maps based on sequence. |
| LOAD MAPS | Adding a genetic map set and maps. |
| LOAD ANNOT | Adding annotations to a physical map set. |
| LOAD MARKER | Adding markers to a physical/genetic map set. |
| LOAD SYNTENY | Adding syntenic regions between two physical map sets. |
| LOAD ONTOLOGY | Adding ontology terms in OBO format. |
| LOAD QTL | Adding QTLs to a physical/genetic map set. |
| LOAD EXPRESSION | Adding expression data to a physical map set. |
| LOAD ALIGNMENT | Adding protein or DNA alignments to a physical map set. |
| LOAD VARIANT | Adding sequence variants to a physical map set. |
PersephoneShell creates a ProcessRun record with the appropriate process type for the command being run.
It is good practice to record the source URL of the data being loaded in the Process Run description. If the RunDescription= line is commented out, PersephoneShell generates a run description itself, typically from the process type, the map set, and the source.
[ProcessRun]
; RunDescription: if specified, a custom description will be used,
; otherwise, "Added annotations for {MapSet Accession No.} from {Sources}." will be used.
RunDescription="Load Gnomon annotations for SL3.2 from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/168/275/GCF_002168275.1_ASM216827v2/GCF_002168275.1_ASM216827v2_genomic.gff.gz)"
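The two options described above (a custom description, or a commented-out RunDescription line) can be sketched in a few lines of Python. This is only an illustration: the helper names `default_run_description` and `process_run_section` are hypothetical and not part of PersephoneShell. The fallback text copies the template quoted in the control file comment, while joining several sources with a comma and rejecting double quotes in a description are assumptions of this sketch.

```python
from typing import Optional, Sequence

# Template quoted in the control file comment. How PersephoneShell joins
# multiple sources is not documented here; ", " is an assumption.
DEFAULT_TEMPLATE = "Added annotations for {accession} from {sources}."


def default_run_description(accession: str, sources: Sequence[str]) -> str:
    """Fill the documented fallback template (hypothetical helper)."""
    return DEFAULT_TEMPLATE.format(accession=accession, sources=", ".join(sources))


def process_run_section(description: Optional[str] = None) -> str:
    """Render a [ProcessRun] section as text (hypothetical helper).

    With no description, the RunDescription line is emitted commented out,
    so that PersephoneShell generates the description itself.
    """
    if description is None:
        line = ";RunDescription="
    else:
        # Quote escaping rules are unknown, so this sketch simply refuses
        # descriptions that contain a double quote.
        if '"' in description:
            raise ValueError("description must not contain double quotes")
        line = f'RunDescription="{description}"'
    return "[ProcessRun]\n" + line + "\n"


if __name__ == "__main__":
    # Custom description that records where the data came from.
    print(process_run_section("Load Gnomon annotations for SL3.2 from NCBI"))
    # No description: the line stays commented out.
    print(process_run_section())
```

Keeping the source URL inside the description string, as in the example above, means it is stored with the run and shows up later in the run listing.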
The process type can also be used to filter the list of runs with the -T parameter:
list run -T synteny
The process type used for filtering is the same as the one given in the add command. Use the '-l' switch to show the process type in the listing:
list run -l
RUN_ID  DESCRIPTION                                                               PROCESS_TYPE  DATE_CREATED         CREATED_BY
------------------------------------------------------------------------------------------------------------------------------------
1230    Added sequences for GWHBDNS00000000 from GWHBDNS00000000.genome.fasta.gz  sequence      8/4/2023 2:48:29 AM  ubuntu
1231    Added annotation for GWHBDNS00000000 from GWHBDNS00000000.gff.gz          annotation    8/4/2023 2:50:31 AM  ubuntu
1232    Created paralogs within /Echinochloa (genus)/Echinochloa oryzicola v2     ortholog      8/4/2023 3:19:21 AM  ubuntu
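When the listing is captured to a text file, the RUN_IDs for a given process type can be pulled out with a short script. The following Python sketch parses the sample listing shown above. It assumes that the process type and the user name are single words and that the date always has the form seen in the sample; the names `RunRecord`, `parse_run_listing`, and `run_ids_by_type` are hypothetical and not part of PersephoneShell.

```python
import re
from dataclasses import dataclass
from typing import List

# Sample copied from the listing above (separator line shortened).
SAMPLE_LISTING = """\
RUN_ID DESCRIPTION PROCESS_TYPE DATE_CREATED CREATED_BY
----------------------------------------
1230 Added sequences for GWHBDNS00000000 from GWHBDNS00000000.genome.fasta.gz sequence 8/4/2023 2:48:29 AM ubuntu
1231 Added annotation for GWHBDNS00000000 from GWHBDNS00000000.gff.gz annotation 8/4/2023 2:50:31 AM ubuntu
1232 Created paralogs within /Echinochloa (genus)/Echinochloa oryzicola v2 ortholog 8/4/2023 3:19:21 AM ubuntu
"""

# The description may contain spaces, so the row is anchored on the date:
# the single word before the date is the process type, the word after it
# is the user. The date layout is an assumption based on the sample.
ROW = re.compile(
    r"^(?P<run_id>\d+)\s+"
    r"(?P<description>.*?)\s+"
    r"(?P<process_type>\S+)\s+"
    r"(?P<date_created>\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M)\s+"
    r"(?P<created_by>\S+)$"
)


@dataclass
class RunRecord:
    run_id: int
    description: str
    process_type: str
    date_created: str  # kept as text; day/month order is not assumed
    created_by: str


def parse_run_listing(text: str) -> List[RunRecord]:
    """Return one record per data row; header and separator lines are skipped."""
    records = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            records.append(
                RunRecord(
                    run_id=int(match["run_id"]),
                    description=match["description"],
                    process_type=match["process_type"],
                    date_created=match["date_created"],
                    created_by=match["created_by"],
                )
            )
    return records


def run_ids_by_type(records: List[RunRecord], process_type: str) -> List[int]:
    """Collect the RUN_IDs whose PROCESS_TYPE column equals process_type."""
    return [r.run_id for r in records if r.process_type == process_type]


if __name__ == "__main__":
    runs = parse_run_listing(SAMPLE_LISTING)
    print(run_ids_by_type(runs, "ortholog"))
```

For the sample listing, the only run of type ortholog is 1232. The pattern uses `\s+` between columns, so it works whether the columns are padded for alignment or separated by single spaces.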
For some types of data, deleting by RUN_ID is the only option. For example, ortholog records can only be deleted through the RUN_ID of the run that created them:
delete run 1232