List

The list command lists one or more objects (targets) in your Persephone database. The common syntax, with a few exceptions, is as follows:

list <target> [{mapSetId | path}] [-p pattern] [-l] [-i N] [-r] [-t N]

The table below lists the definitions for the list command parameters.

List Command Parameters

Parameter	Required or Optional?	Definition
<target>	Required	The type of object you want to list, which can be annotation, annotation_method, annotation_qualifier, annotation_search, chromosome, config, map, map_type, mapping_method, mapset, mapsettree, marker, marker_qualifier, marker_type, ontology, organism, ortholog, path, qtl, qualifier_filter, qualifier_link, run, sample, sequence, source, stats, storage, study, synteny, task, track, track_qualifier, track_type, tracktree, variant, xref_db. Note that the plural form of a target is also accepted.
-p pattern	Optional	Uses a pattern to filter your results. Wildcards (*) are supported.
-l	Optional	Displays the list in the "long listing" format. See Using the Long-Listing Format for more information.
-i N	Optional	Sorts the list by column index (0-based) number N. (The default is 0.) For example, entering "-i 3" would sort the list by index number 3.
-r	Optional	Sorts the list in reverse order.
-t N	Optional	Lists the top N items. (The default is 0.) For example, entering "-t 5" would list only the top 5 items.
-d	Optional	Executes the list command in debug mode. You can send the debug output to Persephone Software, LLC. at http://persephonesoft.com/contact.
-T type	Optional, works with run only	Filters the runs by process type. Works with the list run command only.

For example, to list all the organisms in your Persephone database, enter list organisms as shown below.

PS> list organisms

The following is a typical example of the output.

-1:Unknown 0:Unknown
3555:Beta vulgaris subsp. vulgaris 3769:Arabidopsis thaliana
3847:Glycine max 4081:Solanum lycopersicum
4498:Avena sativa 4555:Setaria italica
4558:Sorghum bicolor 4565:Triticum aestivum
9606:Homo sapiens 10090:Mus musculus
15368:Brachypodium distachyon 29760:Vitis vinifera
38727:Panicum virgatum 39946:Oryza sativa subsp. indica
112509:Hordeum vulgare subsp. vulgare 138011:Brassica napus var. napus
218851:Aquilegia coerulea 311987:Zea mays
311988:Oryza sativa 339834:Miscanthus spp
22 organisms
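If you need the IDs from a short listing in a script, the output can be split into (ID, name) pairs. The following Python sketch is only an illustration: the function name parse_short_listing is hypothetical, and it assumes that each entry has the form "ID:name" and that a name never contains a space followed by digits and a colon.

```python
import re

# Assumed entry layout: "<id>:<name>", with several entries on one line.
# A name ends where the next " <id>:" begins or at the end of the line.
ENTRY = re.compile(r"(-?\d+):\s*(.*?)(?=\s+-?\d+:|\s*$)")


def parse_short_listing(text):
    """Return a list of (id, name) pairs found in a short listing."""
    pairs = []
    for line in text.splitlines():
        for match in ENTRY.finditer(line):
            pairs.append((int(match.group(1)), match.group(2)))
    return pairs


sample = """\
-1:Unknown 0:Unknown
3555:Beta vulgaris subsp. vulgaris 3769:Arabidopsis thaliana
22 organisms
"""

for organism_id, name in parse_short_listing(sample):
    print(organism_id, name)
```

The summary line ("22 organisms") contains no colon, so it is ignored.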

Tip

The List command is commonly used in conjunction with the Delete command to delete objects from the Persephone database. See Deleting Loaded Data for an example of using the List and Delete commands together.

Using the Long-Listing Format

If you want to display more information about an object, use the "-l" option to display the list in the "long-listing" format. For example, to list all organisms whose names begin with the letter "s" in the long-listing format, enter the following:

PS> list organisms -p s* -l
ID NAME TAXONOMY_NUM COMMON_NAME DICOT
--------------------------------------------------
4081 Solanum lycopersicum 4081 tomato 1
4113 Solanum tuberosum 4113 potato 1
4558 Sorghum bicolor 4558 sorghum 0
2500762 Shigella phage vB_SdyM_006 2500762 Shigella (null)
2697049 SARS coronavirus 2697049 SARS (null)
6 organisms
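How the options -p, -i, -r and -t combine can be pictured with a small Python model. This is a sketch, not PersephoneShell's implementation: the function list_rows is hypothetical, and it assumes that the pattern is matched case-insensitively against the name column, that filtering happens before sorting, and that the default "-t 0" means "no limit".

```python
from fnmatch import fnmatchcase


def list_rows(rows, pattern=None, sort_index=0, reverse=False, top=0):
    """Model -p (pattern), -i (sort_index), -r (reverse) and -t (top)."""
    if pattern is not None:
        # Assumption: case-insensitive wildcard match on the name (column 1).
        rows = [r for r in rows
                if fnmatchcase(str(r[1]).lower(), pattern.lower())]
    # -i N sorts by the 0-based column N; -r reverses the order.
    rows = sorted(rows, key=lambda r: r[sort_index], reverse=reverse)
    # Assumption: -t 0 (the default) returns every row.
    return rows[:top] if top > 0 else rows


# (ID, NAME) pairs taken from the organism listing.
organisms = [
    (4558, "Sorghum bicolor"),
    (4081, "Solanum lycopersicum"),
    (3847, "Glycine max"),
    (4555, "Setaria italica"),
]

# Roughly "list organisms -p s* -i 1 -r -t 2" under the assumptions above.
for row in list_rows(organisms, pattern="s*", sort_index=1, reverse=True, top=2):
    print(row)
```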

Listing Process Runs

Each batch of data is loaded in a process that is recorded as a "process_run". You can list all loading jobs with the command list run. To filter the output, use the switch -T, which displays only the jobs of one type. The type name is the same as the one used with the add command. For example, to list the jobs that loaded gene annotations, use 'annotation':

PS> list run -T annotation
3: Load annotations for B73 RefGen_v5
1 run

The process run types are sequence, annotation, ribbon, ortholog, marker, map, variant, quantitative, qtl, and sam.

The process type of each run will be shown when the long format (-l) is used:

RUN_ID DESCRIPTION PROCESS_TYPE DATE_CREATED CREATED_BY
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
98 Loaded sequences for PGSC_DM_v4.03 from MSU sequence 2/21/2019 2:21:53 AM ubuntu
99 Loaded annotations for PGSC_DM_v4.03 from MSU annotation 2/21/2019 2:27:57 AM ubuntu
100 DArT markers from MSU marker 2/21/2019 2:31:56 AM ubuntu
101 Dundee opa markers from MSU marker 2/21/2019 2:32:37 AM ubuntu

Listing Track Tree Entries

The tracks of each map set have a common hierarchical structure repeated on each map. To display this tree-like organization of tracks, identify the map set by using either its MapSetId or the full path (use TAB to auto-complete):

PS> list tracktree "Arabidopsis thaliana/TAIR10"
[0] Ensembl (Track, Order: 0, Type: Annotation)
[1] TREP repeats (Track, Order: 1, Type: GenericBp)
[2] SV-deletions (Track, Order: 2, Type: GenericBp)
[3] SV-insertions (Track, Order: 3, Type: GenericBp)
[4] RNA-seq (Group, Order: 4)
[5] RNA-seq cold treatment (Track, Order: 4, Type: Quantitative)
[6] RNA-seq cold-stress control (Track, Order: 5, Type: Quantitative)
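For post-processing, each line of the track tree output can be turned into a small record. The Python sketch below assumes the line layout shown in the example above ("[index] name (Track or Group, Order: N, Type: T)"); the function name parse_tracktree and the dictionary keys are hypothetical.

```python
import re

# Assumed line layout, taken from the example output:
#   [5] RNA-seq cold treatment (Track, Order: 4, Type: Quantitative)
#   [4] RNA-seq (Group, Order: 4)
NODE = re.compile(
    r"^\[(?P<index>\d+)\]\s+(?P<name>.+?)\s+"
    r"\((?P<kind>Track|Group), Order: (?P<order>\d+)"
    r"(?:, Type: (?P<type>\w+))?\)$"
)


def parse_tracktree(text):
    """Return one dict per recognized track tree line."""
    nodes = []
    for line in text.splitlines():
        match = NODE.match(line.strip())
        if match:
            node = match.groupdict()  # "type" is None for groups
            node["index"] = int(node["index"])
            node["order"] = int(node["order"])
            nodes.append(node)
    return nodes


sample = """\
[0] Ensembl (Track, Order: 0, Type: Annotation)
[4] RNA-seq (Group, Order: 4)
[5] RNA-seq cold treatment (Track, Order: 4, Type: Quantitative)
"""

for node in parse_tracktree(sample):
    if node["type"] == "Quantitative":
        print(node["index"], node["name"])
```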

Listing Map Sets

The command 'list mapset' works without extra parameters. In that case, it lists all available map sets with their MapSetIds:

PS> list mapset
1: TAIR10 5: Scaffold maps
30: DM_v4.03 31: EL10_1.0
44: Optical maps 50: ARE_050606
51: Criollo v2 52: Tcacao_v2.1
53: Os GJ-subtrp: CHAO MEO 54: Tomato ITAG5.0
55: TraitGenetics EXPEN2012
11 mapsets

You can identify a particular map set by specifying its MapSetId or path on the command line. This command is typically used with the long-listing flag -l:

PS> list mapset 51 -l
MAP_SET_ID DISPLAY_NAME ACCESSION_NO SOURCE_ID DISTANCE_UNIT ORGANISM_ID
----------------------------------------------------------------------------------
51 Criollo v2 GCF_000208745.1 NCBI bp 3641

/Theobroma cacao/Criollo v2
1 mapset

When the listing shows many map sets, you can use a pattern (-p) to reduce the number of map sets shown:

PS> list mapset -p solanum
20: Tomato SLT1.0 21: SL4.0
27: DM 1-3 516 R44 30: DM_v4.03
54: Tomato ITAG5.0 55: TraitGenetics EXPEN2012

Note that 'solanum' is not found in the map set names, but it is present in the full path to the map sets that includes the parent nodes, such as 'Solanum lycopersicum' and 'Solanum tuberosum'.

To separate physical and genetic maps, sort the long listing by column 4 (0-based), which holds the distance unit:

PS> list mapset -l -i 4
83 2019-nCoV WHU02 MN988669.1 NCBI bp 2697049
82 2019-nCoV WHU01 MN988668.1 NCBI bp 2697049
81 2019-nCoV/USA-WA1/2020 MN985325.1 NCBI bp 2697049
80 2019-nCoV_HKU-SZ-005b_2020 MN975262.1 NCBI bp 2697049
71 Zm-Tx303 Zm-Tx303 MazeGDB bp 4577
209 ARE_050606 ARE_050606 URGI_INRA cM 4565
18 BTx623-IS320C BTx623-IS320C Publication cM 4558
24 ChineseSpring x Renan CS_Renan_Genetic IWGSC cM 4565
25 RH_MAPS RH_MAPS IWGSC cM 4565
214 TraitGenetics EXPEN2012 EXPEN2012 solgenomics cM 4081

Listing the Map Set Tree

An organism can have multiple map sets, which are collections of physical or genetic maps. The map set tree is a hierarchical structure that organizes the different types of map sets across multiple organisms.

The elements of the tree are called "nodes". A node's "parent" is the node one step higher in the hierarchy on the same branch. A node without a parent is called a "root" node; a node without children is called a "leaf" node. "Sibling" nodes share the same parent node, and the order of sibling nodes is assignable. There are two types of nodes: a node without an assigned map set (red text in the example below) and a node with a map set (blue text below). Note that a map set cannot be assigned to more than one node.

To list the map set tree, use the command "list mapsettree". The option -l displays the node ID along with the map set ID and type, if a map set is assigned.

PS> list mapsettree -l
Brachypodium (NodeId:200081224)
Physical:Brachypodium annotation (NodeId:200081227, MapSetId:200081226)
Genetic:Bd3-1 x Bd21 (NodeId:270702460, MapSetId:270702459)
...
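The node terminology described above (root, leaf, sibling, and at most one node per map set) can be summarized in a small data structure. This Python sketch is a conceptual model only; the classes Node and MapSetTree are hypothetical and say nothing about how Persephone stores the tree internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    name: str
    map_set_id: Optional[int] = None  # None: no map set assigned
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def is_root(self):
        return self.parent is None

    def is_leaf(self):
        return not self.children

    def siblings(self):
        if self.parent is None:
            return []
        return [n for n in self.parent.children if n is not self]


class MapSetTree:
    def __init__(self):
        self.roots = []
        self._assigned = {}  # map_set_id -> Node

    def add(self, name, parent=None, map_set_id=None):
        # A map set cannot be assigned to more than one node.
        if map_set_id is not None and map_set_id in self._assigned:
            raise ValueError(f"map set {map_set_id} is already assigned")
        node = Node(name, map_set_id, parent)
        (self.roots if parent is None else parent.children).append(node)
        if map_set_id is not None:
            self._assigned[map_set_id] = node
        return node


tree = MapSetTree()
root = tree.add("Brachypodium")
physical = tree.add("Brachypodium annotation", parent=root, map_set_id=200081226)
genetic = tree.add("Bd3-1 x Bd21", parent=root, map_set_id=270702459)

print(root.is_root(), physical.is_leaf(), [n.name for n in physical.siblings()])
```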

If the map set tree has many entries, you can limit the output by specifying the name of the nodes:

PS> list mapsettree "Glycine (genus)"
Glycine (genus)
Glycine max
Wm82.a2.v1
Wm82.a4.v1
Zhonghuang 13
Zhonghuang 13 NCBI
G.max Lee v1.1
Glycine soja
G.soja W05
G.soja PI483463

Listing the Map Set Path

The command list path is a convenience command that displays the full path of each map set.

When authoring the INI files, you can reference a map set by its ID or by its path. If you have more than one database and plan to share an INI file between the instances, use the map set path: the IDs can differ between databases, but the path is usually more stable and can therefore be reused. It is typical to narrow down the output by providing the name of a map set tree node:

PS> list path "Glycine (genus)"
/Glycine (genus)/Glycine max/G.max Wm82 v1.01
/Glycine (genus)/Glycine max/Wm82.a2.v1
/Glycine (genus)/Glycine max/Wm82.a4.v1
/Glycine (genus)/Glycine soja/G.soja W05
/Glycine (genus)/Glycine max/Zhonghuang 13
/Glycine (genus)/Glycine soja/G.soja PI483463
/Glycine (genus)/Glycine max/G.max Lee v1.1

Now you can copy the map set path and transfer it to the INI file.
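The relationship between the map set tree and the map set paths can be sketched in Python. The example is a simplification under stated assumptions: the tree is modeled as nested dictionaries, only leaf nodes are treated as map sets, and the function name map_set_paths is hypothetical.

```python
def map_set_paths(tree, prefix=""):
    """Yield the full path of every leaf in a nested-dict tree."""
    for name, children in tree.items():
        path = f"{prefix}/{name}"
        if children:
            # Inner node: descend and extend the path.
            yield from map_set_paths(children, path)
        else:
            # Leaf: assumed here to be a node that carries a map set.
            yield path


# A subset of the "Glycine (genus)" branch from the listing above.
tree = {
    "Glycine (genus)": {
        "Glycine max": {"Wm82.a2.v1": {}, "Wm82.a4.v1": {}},
        "Glycine soja": {"G.soja W05": {}},
    }
}

for path in map_set_paths(tree):
    print(path)
```

Because each path is built from node names rather than database IDs, the same string can identify the map set in any database that has the same tree layout.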

Listing the Orthologs

The command 'list orthologs' displays all pairs of map sets that have orthologous genes:

PS> list ortholog

Record ID From mapset To mapset Count
---------------------------------------------------------------------------------------------------------------
1 Os GJ-temp: IRGSP-1.0 (Nipponbare) [2] Zea mays AGPv4 [20] 18,101
2 Os GJ-temp: IRGSP-1.0 (Nipponbare) [2] O.sativa spontanea PI653432 [234] 24,360
3 Os GJ-temp: IRGSP-1.0 (Nipponbare) [2] ASM465v1 [4] 28,859
4 GRCh38 [10] GRCh37.p13 [37] 18,878
5 TAIR10 [16] Wm82.a2.v1 [19] 14,897
6 TAIR10 [16] Wm82.a4.v1 [34] 14,883
7 Os GJ-temp: IRGSP-1.0 (Nipponbare) [2] Sorghum v.3.1 [17] 19,081
8 Sorghum v.3.1 [17] Zea mays AGPv4 [20] 20,513
9 Sorghum v.3.1 [17] Pearl Millet Aw [246] 21,012
...

By default, the entry for each map set shows the map set Name, and [MapSetId]. The long listing adds accession and track name.
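The default row format (Record ID, map set name with [MapSetId] for each side, and a count) lends itself to simple parsing, for example to find the Record IDs that involve a given map set. The Python sketch below assumes the row layout shown in the example output; the names parse_ortholog_rows and records_for_map_set are hypothetical.

```python
import re

# Assumed row layout, taken from the example output:
#   <record id> <from name> [<from id>] <to name> [<to id>] <count>
ROW = re.compile(
    r"^(?P<record_id>\d+)\s+(?P<from_name>.+?)\s+\[(?P<from_id>\d+)\]\s+"
    r"(?P<to_name>.+?)\s+\[(?P<to_id>\d+)\]\s+(?P<count>[\d,]+)$"
)


def parse_ortholog_rows(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            for key in ("record_id", "from_id", "to_id"):
                row[key] = int(row[key])
            row["count"] = int(row["count"].replace(",", ""))
            rows.append(row)
    return rows


def records_for_map_set(rows, map_set_id):
    """Record IDs of all ortholog sets that involve the given MapSetId."""
    return [r["record_id"] for r in rows
            if map_set_id in (r["from_id"], r["to_id"])]


sample = """\
Record ID From mapset To mapset Count
1 Os GJ-temp: IRGSP-1.0 (Nipponbare) [2] Zea mays AGPv4 [20] 18,101
4 GRCh38 [10] GRCh37.p13 [37] 18,878
8 Sorghum v.3.1 [17] Zea mays AGPv4 [20] 20,513
"""

rows = parse_ortholog_rows(sample)
print(records_for_map_set(rows, 20))
```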

To see the ortholog sets for a given map set, specify its MapSetId or MapSetPath:

PS> list ortholog 2 -l

Os GJ-temp: IRGSP-1.0 (Nipponbare) ([2] MSU_osa1r7)) vs. the rest:

Record ID From track To mapset Track Count
---------------------------------------------------------------------------------------------------------------------------
1 Gnomon gene models Zea mays AGPv4 ([20] AGPv4) Gramene gene models 18,101
2 Gnomon gene models O.sativa spontanea PI653432 ([234] GWHDONS00000000) Gene models 24,360
3 MSU gene models ASM465v1 ([4] ASM465v11) BGI gene models 28,859
7 Gnomon gene models Sorghum v.3.1 ([17] Sorghum v.3.1) PHYTOZOME12 gene models 19,081
19 Gnomon gene models Wheat IWGSCv1.0 ([21] IWGSCv1.0) IWGSC gene models 19,506
41 genes from CSHL Zm-B73 ([47] Zm-B73) Gene models July 2020 17,127
150 Gnomon gene models Chloris virgata, draft ([236] C.virgata) Gene models 18,314

The Record ID can be used to specify which ortholog set to delete with the command delete ortholog:

PS> delete ortholog 3

Record ID From mapset To mapset Count
-------------------------------------------------------------------------------
3 Os GJ-temp: IRGSP-1.0 (Nipponbare) [2] ASM465v1 [4] 28,859

Do you want to delete the ortholog record? (Y/N)

Listing the Stats

To peek inside the database and see how many objects of each type are stored, use the command list stats:

PS> list stats
[0] - Number of annotations
[1] - Number of annotation qualifiers
[2] - Number of markers
[3] - Number of marker qualifiers
[4] - Number of tracks
[5] - Number of maps
[6] - All of the above
Select [lineNo] of the statistic to print: 6
Database contains:
9,912,131 annotation record(s)
40,563,519 annotation qualifier(s)
9,581,539 marker(s)
26,269,416 marker qualifier(s)
225,958 track(s)
193,162 map(s)

Listing Qualifier Filters

The Solr search index can be reduced by using QualifierFilters. To list the filters for the active connection use the command:

PS> list qualifier_filter
Qualifier filters for search indexing listed in psh.exe.config:
┌────────────────────────────────────────────┐
│ QUALIFIER TYPE INDEXING │
│ ------------------------------------------ │
│ <All> Floats False │
│ <All> Integers False │
│ <All> MARKER False │
│ CLNDN MARKER True │
│ rsId MARKER True │
│ <All> ANNOT False │
│ transcript_id ANNOT True │
│ transcriptName ANNOT True │
│ transcriptId ANNOT True │
│ transcriptID ANNOT True │
│ product ANNOT True │
│ old_locus_tag ANNOT True │
│ note ANNOT True │
│ locus_tag ANNOT True │
│ iwgsc_id ANNOT True │
│ gene_id ANNOT True │
│ gene_synonym ANNOT True │
│ geneName ANNOT True │
│ geneId ANNOT True │
│ gene ANNOT True │
│ description ANNOT True │
│ definition ANNOT True │
│ alias ANNOT True │
│ Synonym ANNOT True │
│ Parent ANNOT True │
│ Name ANNOT True │
│ Note ANNOT True │
│ Info ANNOT True │
│ Function description ANNOT True │
│ Description ANNOT True │
│ Alias ANNOT True │
│ SwissProt match ANNOT True │
│ │
└────────────────────────────────────────────┘

Note that in this particular case, the output contains a frame around the table. To enable frames in the printouts of some commands, use the parameter -F when starting PersephoneShell.

Listing Variants (VCFs)

The variants (SNPs/indels) are loaded from VCF files, so the data retain the original VCF file names. Listing the VCF files can be extended to listing the samples from the corresponding files. The command 'list variant' shows the list of VCF files and asks you to select one of them to display its samples.

PS> list variant
[##] VcfId MapSetId RunId Version Original path
----------------------------------------------------------------------------------------------------------------------------------
[0] 13 2 1049 2 /HDD4Gb/bio/data/rice/ordered_rice3K.vcf.gz
[1] 14 17 1050 2 /HDD4Gb/bio/data/sorghum/Sbicolor_Patterson_454.vcf.gz
[2] 16 10 1062 2 /HDD4Gb/bio/data/human/vcf/ALL.chr6.shapeit2_integrated_v1a.GRCh38.20181129.phased.vcf.gz
[3] 18 195 1366 10 /data/e/data/wheat/SNPs_lifted_final2_sorted_v1.vcf.gz/SNPs_lifted_final2_sorted_v1.vcf

Select [lineNo] corresponding to the VCF to list:

You can override this behavior with the forceMode flag (-f), which skips listing the samples and returns to the command line.

When listing variants, pay attention to the column with VcfIds. You can use VcfId in the command for deleting the variants, for example:

PS> delete variant 18 -f

The command with the forceMode flag (-f) skips interactive steps, such as selecting a VCF from a list or confirming the deletion, and returns to the prompt after deleting the corresponding data. This allows such commands to be included in automated scripts.
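As an illustration of scripting, the following Python sketch reads rows in the layout of the 'list variant' output and builds non-interactive delete commands for all VCFs of one map set. The function names are hypothetical, the row layout is assumed from the example above, and how the resulting lines are passed to PersephoneShell is not covered here.

```python
import re

# Assumed row layout, taken from the example output:
#   [##] VcfId MapSetId RunId Version Original path
ROW = re.compile(r"^\[(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S.*)$")


def parse_variant_rows(text):
    """Return a list of dicts with vcf_id, map_set_id, run_id and path."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            _, vcf_id, map_set_id, run_id, _, path = match.groups()
            rows.append({
                "vcf_id": int(vcf_id),
                "map_set_id": int(map_set_id),
                "run_id": int(run_id),
                "path": path,
            })
    return rows


def delete_commands(rows, map_set_id):
    """Build 'delete variant <VcfId> -f' lines for one map set."""
    return [f"delete variant {r['vcf_id']} -f"
            for r in rows if r["map_set_id"] == map_set_id]


sample = """\
[##] VcfId MapSetId RunId Version Original path
[0] 13 2 1049 2 /HDD4Gb/bio/data/rice/ordered_rice3K.vcf.gz
[1] 14 17 1050 2 /HDD4Gb/bio/data/sorghum/Sbicolor_Patterson_454.vcf.gz
"""

for command in delete_commands(parse_variant_rows(sample), map_set_id=17):
    print(command)
```

Review the generated commands before running them, because -f suppresses the confirmation prompt.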

Listing Synteny Ribbons

The synteny ribbons connect related regions of a pair of maps. The set of ribbons between two map sets has a common RunId, shown in the first column:

PS> list synteny
RUN_ID FROM MAP SET TO MAP SET #RIBBONS
--------------------------------------------------------------------------------------------------
638 [169] O. glaberrima [170] O. barthii 73
639 [171] O. punctata [170] O. barthii 204
640 [165] O. rufipogon [170] O. barthii 186
641 [152] Os cB: ARC 10497 [170] O. barthii 184

Each map set is listed with its MapSetId printed in the square brackets.

There is a command version that accepts an optional parameter to identify one of the map sets that has synteny records. The map set can be specified by MapSetId (26) or its MapSetPath ("/Solanum tuberosum/DM_v4.03"):

PS> list synteny 26
Synteny for /Solanum tuberosum/DM_v4.03
RUN_ID FROM MAP SET TO MAP SET #RIBBONS
----------------------------------------------------------------
2237 [26] DM_v4.03 [212] DM 1-3 516 R44 2,462
2238 [26] DM_v4.03 [397] Solanum etuberosum v1.2 7,925

The RunId printed in the first column can be used to delete the corresponding synteny data set with the command delete synteny:

PS> delete synteny 641