List
The list command is used to list one or more objects (targets) in your Persephone database. The common syntax is as follows (with a few exceptions):
list <target> [{mapSetId | path}] [-p pattern] [-l] [-i N] [-r] [-t N] [-d]
The table below lists the definitions for the list command parameters.
List Command Parameters
Parameter | Required or Optional? | Definition
<target> | Required | A target is the type of object you want to list: organism, mapset, chromosome, map, sequence, annotation, sam, tracktree, track, source, study, qtl, marker, mapsettree, annotation_method, mapping_method, alignment_run, run, marker_type, track_type, map_type, ontology, xref_db, sample, storage, annotation_qualifier, path, qualifier_link, annotation_search, ortholog, variant, marker_qualifier, stats, or qualifier_filter. The plural form of a target is also accepted (for example, list organisms).
mapSetId or path | Optional | Identifies a map set by its MapSetId or its full path (for example, list mapset 51 or list tracktree "Arabidopsis thaliana/TAIR10"). With list mapsettree and list path, the name of a map set tree node can be given instead.
-p pattern | Optional | Filters the results by a pattern. Wildcards (*) are supported.
-l | Optional | Displays the list in the "long-listing" format. See Using the Long-Listing Format for more information.
-i N | Optional | Sorts the list by the column with index N, counting from 0 (the default is 0). For example, -i 3 sorts the list by the fourth column.
-r | Optional | Sorts the list in reverse order.
-t N | Optional | Lists only the top N items (the default is 0, which lists all items). For example, -t 5 lists only the top 5 items.
-d | Optional | Runs the list command in debug mode. You can send the debug output to Persephone Software, LLC at http://persephonesoft.com/contact.
-T type | Optional; list run only | Filters the runs by process type (for example, list run -T annotation).
-f | Optional | Force mode: skips interactive steps. For example, list variant -f lists the VCF files without asking you to select one.
For example, to list all the organisms in your Persephone database, enter list organisms as shown below.
PS> list organisms
The following is a typical example of the output.
-1:Unknown 0:Unknown
3555:Beta vulgaris subsp. vulgaris 3769:Arabidopsis thaliana
3847:Glycine max 4081:Solanum lycopersicum
4498:Avena sativa 4555:Setaria italica
4558:Sorghum bicolor 4565:Triticum aestivum
9606:Homo sapiens 10090:Mus musculus
15368:Brachypodium distachyon 29760:Vitis vinifera
38727:Panicum virgatum 39946:Oryza sativa subsp. indica
112509:Hordeum vulgare subsp. vulgare 138011:Brassica napus var. napus
218851:Aquilegia coerulea 311987:Zea mays
311988:Oryza sativa 339834:Miscanthus spp
22 organisms
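If you capture this output for use in a script, the ID:name pairs can be parsed with a regular expression. The Python sketch below is an illustration, not part of PersephoneShell: it assumes that every entry starts with an integer ID followed by a colon, and that a name never contains whitespace followed by another "ID:" token. The helper name parse_id_name_pairs is hypothetical. The same line shape also appears in the 'list mapset' output.

```python
import re
from typing import Dict

# An entry is an integer ID (possibly negative), a colon, then a name that
# runs until the next "<ID>:" token or the end of the line.
ENTRY = re.compile(r"(-?\d+):\s*(.+?)(?=\s+-?\d+:|\s*$)")


def parse_id_name_pairs(output: str) -> Dict[int, str]:
    """Parse two-column 'ID:name' listings into a dict (hypothetical helper)."""
    pairs: Dict[int, str] = {}
    for line in output.splitlines():
        for match in ENTRY.finditer(line):
            pairs[int(match.group(1))] = match.group(2)
    return pairs


sample = """-1:Unknown 0:Unknown
3555:Beta vulgaris subsp. vulgaris 3769:Arabidopsis thaliana
311988:Oryza sativa 339834:Miscanthus spp
22 organisms"""

organisms = parse_id_name_pairs(sample)
print(organisms[3555])  # Beta vulgaris subsp. vulgaris
```

The summary line ("22 organisms") contains no "ID:" token, so it is ignored.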
Tip
The list command is commonly used in conjunction with the delete command to delete objects from the Persephone database. See Deleting Loaded Data for an example of using the list and delete commands together.
Using the Long-Listing Format
To display more information about each object, use the -l option, which displays the list in the "long-listing" format. For example, to list all organisms whose names begin with the letter "s" (the pattern is not case-sensitive), enter the following:
PS> list organisms -p s* -l
ID NAME TAXONOMY_NUM COMMON_NAME DICOT
--------------------------------------------------
4081 Solanum lycopersicum 4081 tomato 1
4113 Solanum tuberosum 4113 potato 1
4558 Sorghum bicolor 4558 sorghum 0
2500762 Shigella phage vB_SdyM_006 2500762 Shigella (null)
2697049 SARS coronavirus 2697049 SARS (null)
6 organisms
Listing Process Runs
Each batch of data is loaded by a process that is recorded as a "process_run". To list all loading jobs, use the command list run. To filter this multi-line output, use the -T switch, which displays the jobs of a single type. The type names are the same as those used with the add command. For example, to list the jobs that loaded gene annotations, use 'annotation':
PS> list run -T annotation
3: Load annotations for B73 RefGen_v5
1 run
The process run types include: sequence, annotation, ribbon, ortholog, marker, map, variant, quantitative, qtl, sam.
The long format (-l) also shows the process type of each run:
PS> list run -l
RUN_ID DESCRIPTION PROCESS_TYPE DATE_CREATED CREATED_BY
------------------------------------------------------------------------------------------
98 Loaded sequences for PGSC_DM_v4.03 from MSU sequence 2/21/2019 2:21:53 AM ubuntu
99 Loaded annotations for PGSC_DM_v4.03 from MSU annotation 2/21/2019 2:27:57 AM ubuntu
100 DArT markers from MSU marker 2/21/2019 2:31:56 AM ubuntu
101 Dundee opa markers from MSU marker 2/21/2019 2:32:37 AM ubuntu
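The long format is convenient to post-process in a script, for example to count runs by type or to collect the RunIds of one type. The Python sketch below parses rows shaped like the example above. It assumes the date format shown (month/day/year with AM/PM), a single-token CREATED_BY value, and the process types listed in this section; the names parse_runs and Run are hypothetical.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# Process run types as listed in this section (assumed to be complete here)
PROCESS_TYPES = ["sequence", "annotation", "ribbon", "ortholog", "marker",
                 "map", "variant", "quantitative", "qtl", "sam"]

# RUN_ID, DESCRIPTION (may contain spaces), PROCESS_TYPE, DATE_CREATED, CREATED_BY
RUN_ROW = re.compile(
    r"^(\d+)\s+(.+?)\s+(" + "|".join(PROCESS_TYPES) + r")\s+"
    r"(\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M)\s+(\S+)\s*$"
)


class Run(NamedTuple):
    run_id: int
    description: str
    process_type: str
    date_created: str
    created_by: str


def parse_runs(output: str) -> List[Run]:
    runs: List[Run] = []
    for line in output.splitlines():
        m = RUN_ROW.match(line.strip())
        if m:  # header, separator and summary lines do not match
            runs.append(Run(int(m.group(1)), m.group(2), m.group(3),
                            m.group(4), m.group(5)))
    return runs


sample = """RUN_ID DESCRIPTION PROCESS_TYPE DATE_CREATED CREATED_BY
98 Loaded sequences for PGSC_DM_v4.03 from MSU sequence 2/21/2019 2:21:53 AM ubuntu
99 Loaded annotations for PGSC_DM_v4.03 from MSU annotation 2/21/2019 2:27:57 AM ubuntu
100 DArT markers from MSU marker 2/21/2019 2:31:56 AM ubuntu
101 Dundee opa markers from MSU marker 2/21/2019 2:32:37 AM ubuntu"""

runs = parse_runs(sample)
print(Counter(run.process_type for run in runs))
print([run.run_id for run in runs if run.process_type == "marker"])  # [100, 101]
```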
Listing Track Tree Entries
The tracks of a map set share a common hierarchical structure that is repeated on each map. To display this tree-like organization of tracks, identify the map set by its MapSetId or its full path (press TAB to auto-complete the path):
PS> list tracktree "Arabidopsis thaliana/TAIR10"
[0] Ensembl (Track, Order: 0, Type: Annotation)
[1] TREP repeats (Track, Order: 1, Type: GenericBp)
[2] SV-deletions (Track, Order: 2, Type: GenericBp)
[3] SV-insertions (Track, Order: 3, Type: GenericBp)
[4] RNA-seq (Group, Order: 4)
[5] RNA-seq cold treatment (Track, Order: 4, Type: Quantitative)
[6] RNA-seq cold-stress control (Track, Order: 5, Type: Quantitative)
Listing Map Sets
The command 'list mapset' can be used without extra parameters. In that case, it lists all available map sets with their MapSetIds:
PS> list mapset
1: TAIR10 5: Scaffold maps
30: DM_v4.03 31: EL10_1.0
44: Optical maps 50: ARE_050606
51: Criollo v2 52: Tcacao_v2.1
53: Os GJ-subtrp: CHAO MEO 54: Tomato ITAG5.0
55: TraitGenetics EXPEN2012
11 mapsets
To show a particular map set, specify its MapSetId or path on the command line. This form is typically combined with the long-listing flag -l:
PS> list mapset 51 -l
MAP_SET_ID DISPLAY_NAME ACCESSION_NO SOURCE_ID DISTANCE_UNIT ORGANISM_ID
----------------------------------------------------------------------------------
51 Criollo v2 GCF_000208745.1 NCBI bp 3641
/Theobroma cacao/Criollo v2
1 mapset
When there are many map sets, use a pattern (-p) to narrow down the listing:
PS> list mapset -p solanum
20: Tomato SLT1.0 21: SL4.0
27: DM 1-3 516 R44 30: DM_v4.03
54: Tomato ITAG5.0 55: TraitGenetics EXPEN2012
Note that 'solanum' does not appear in the map set names. The pattern is matched against the full path of each map set, which includes parent nodes such as 'Solanum lycopersicum' and 'Solanum tuberosum', and the match is not case-sensitive.
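The exact matching rules of -p are not spelled out here, but the two examples (s* on organism names, solanum on map set paths) are consistent with a simple model. The Python sketch below is a hypothetical model only, not PersephoneShell's code: it assumes that a pattern containing * must match the whole text, that a pattern without * may match anywhere in the text, and that case is ignored. The function name pattern_matches is illustrative.

```python
import re


def pattern_matches(pattern: str, text: str) -> bool:
    """Hypothetical model of -p matching (an assumption, not the real rules).

    A pattern containing '*' must match the whole text, with '*' standing
    for any run of characters; a pattern without '*' may match anywhere.
    Case is ignored in both cases.
    """
    # Escape everything except '*', which becomes the regex '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if "*" in pattern:
        return re.fullmatch(regex, text, re.IGNORECASE) is not None
    return re.search(regex, text, re.IGNORECASE) is not None


print(pattern_matches("s*", "Solanum lycopersicum"))           # True
print(pattern_matches("s*", "Arabidopsis thaliana"))           # False
print(pattern_matches("cacao", "/Theobroma cacao/Criollo v2"))  # True
```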
To separate physical maps (bp) from genetic maps (cM), sort by column 4, DISTANCE_UNIT (column indexes start at 0):
PS> list mapset -l -i 4
83 2019-nCoV WHU02 MN988669.1 NCBI bp 2697049
82 2019-nCoV WHU01 MN988668.1 NCBI bp 2697049
81 2019-nCoV/USA-WA1/2020 MN985325.1 NCBI bp 2697049
80 2019-nCoV_HKU-SZ-005b_2020 MN975262.1 NCBI bp 2697049
71 Zm-Tx303 Zm-Tx303 MazeGDB bp 4577
209 ARE_050606 ARE_050606 URGI_INRA cM 4565
18 BTx623-IS320C BTx623-IS320C Publication cM 4558
24 ChineseSpring x Renan CS_Renan_Genetic IWGSC cM 4565
25 RH_MAPS RH_MAPS IWGSC cM 4565
214 TraitGenetics EXPEN2012 EXPEN2012 solgenomics cM 4081
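The options -i, -r and -t can be thought of as a sort followed by a cut. The Python sketch below models them on rows captured from 'list mapset -l'. It is a hypothetical model, not PersephoneShell's implementation: it assumes that columns are counted from 0, that values are compared as text (so numeric IDs may be ordered differently than by the real command), that -t 0 means no limit, and that only DISPLAY_NAME can contain spaces. The function names are illustrative.

```python
from typing import List, Sequence


def parse_mapset_rows(output: str) -> List[List[str]]:
    """Split 'list mapset -l' rows into their six columns.

    Assumes only DISPLAY_NAME (column 1) can contain spaces.
    """
    rows: List[List[str]] = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 6 or not tokens[0].isdigit():
            continue  # header, separator, path or summary line
        rows.append([tokens[0], " ".join(tokens[1:-4])] + tokens[-4:])
    return rows


def apply_list_options(rows: List[Sequence[str]], index: int = 0,
                       reverse: bool = False, top: int = 0) -> List[Sequence[str]]:
    """Hypothetical model of -i N, -r and -t N (values compared as text)."""
    # -i N and -r: a stable sort on column N, optionally reversed
    ordered = sorted(rows, key=lambda row: row[index], reverse=reverse)
    # -t N: keep the first N rows; 0 keeps all of them
    return ordered[:top] if top > 0 else ordered


sample = """214 TraitGenetics EXPEN2012 EXPEN2012 solgenomics cM 4081
83 2019-nCoV WHU02 MN988669.1 NCBI bp 2697049
25 RH_MAPS RH_MAPS IWGSC cM 4565
71 Zm-Tx303 Zm-Tx303 MazeGDB bp 4577"""

# Sorting by column 4 (DISTANCE_UNIT) puts the bp rows before the cM rows
for row in apply_list_options(parse_mapset_rows(sample), index=4):
    print(row[4], row[1])
```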
Listing the Map Set Tree
An organism can have multiple map sets, which are collections of physical or genetic maps. The map set tree is a hierarchical structure that organizes the different types of map sets across multiple organisms. The elements of the tree are called "nodes". A node's "parent" is the node one level higher on the same branch. A node without a parent is a "root" node; a node without children is a "leaf" node. "Sibling" nodes share the same parent, and the order of sibling nodes can be assigned. There are two types of nodes: nodes without an assigned map set (such as "Brachypodium" in the example below) and nodes with a map set (such as "Physical:Brachypodium annotation"). A map set cannot be assigned to more than one node. To list the map set tree, use the command "list mapsettree". The option -l displays the node ID and, if the node has a map set, the map set type and MapSetId.
PS> list mapsettree -l
Brachypodium (NodeId:200081224)
Physical:Brachypodium annotation (NodeId:200081227, MapSetId:200081226)
Scaffold:Brachypodium scaffolds (NodeId:200590610, MapSetId:200590609)
Genetic:Bd3-1 x Bd21 (NodeId:270702460, MapSetId:270702459)
...
If the map set tree has many entries, limit the output by specifying the name of a node:
PS> list mapsettree "Glycine (genus)"
Glycine (genus)
  Glycine max
    G.max Wm82 v1.01
      Soybean scaffolds
    Wm82.a2.v1
    Wm82.a4.v1
    Zhonghuang 13
    G.max Lee v1.1
  Glycine soja
    G.soja W05
    G.soja PI483463
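The tree rules described in this section (one parent per node, ordered siblings, and at most one node per map set) can be captured in a small data structure. The Python sketch below is an illustrative model only; the classes Node and MapSetTree are hypothetical and say nothing about how Persephone actually stores the tree. Its path() method builds the same slash-separated form that 'list path' prints, and the MapSetIds 34 and 147 are taken from the 'list orthologs' example in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by (recursive) value
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    map_set_id: Optional[int] = None  # None: a node without a map set

    def is_root(self) -> bool:
        return self.parent is None

    def is_leaf(self) -> bool:
        return not self.children

    def path(self) -> str:
        # Collect names from this node up to the root, then join root-first
        names: List[str] = []
        node: Optional[Node] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(names))


class MapSetTree:
    """Hypothetical model of the map set tree rules, not Persephone's schema."""

    def __init__(self) -> None:
        self.roots: List[Node] = []
        self._node_of: Dict[int, Node] = {}  # MapSetId -> the one node holding it

    def add(self, name: str, parent: Optional[Node] = None,
            map_set_id: Optional[int] = None) -> Node:
        # A map set cannot be assigned to more than one node
        if map_set_id is not None and map_set_id in self._node_of:
            raise ValueError(f"map set {map_set_id} is already assigned to a node")
        node = Node(name, parent, map_set_id=map_set_id)
        siblings = self.roots if parent is None else parent.children
        siblings.append(node)  # in this model, sibling order is insertion order
        if map_set_id is not None:
            self._node_of[map_set_id] = node
        return node


tree = MapSetTree()
genus = tree.add("Glycine (genus)")
gmax = tree.add("Glycine max", genus)
a4 = tree.add("Wm82.a4.v1", gmax, map_set_id=34)
soja = tree.add("Glycine soja", genus)
w05 = tree.add("G.soja W05", soja, map_set_id=147)
print(a4.path())   # /Glycine (genus)/Glycine max/Wm82.a4.v1
print(w05.path())  # /Glycine (genus)/Glycine soja/G.soja W05
```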
Listing the Map Set Path
The command 'list path' is a convenience method that displays the full paths of map sets.
When authoring INI files, you can reference a map set by its ID or by its path. If you have more than one database and plan to share an INI file between the instances, use the map set path: the same map set can have different IDs in different databases, whereas its path is usually more stable and can be reused. The output is typically narrowed down by providing the name of a map set tree node:
PS> list path "Glycine (genus)"
/Glycine (genus)/Glycine max/G.max Wm82 v1.01
/Glycine (genus)/Glycine max/G.max Wm82 v1.01/Soybean scaffolds
/Glycine (genus)/Glycine max/Wm82.a2.v1
/Glycine (genus)/Glycine max/Wm82.a4.v1
/Glycine (genus)/Glycine soja/G.soja W05
/Glycine (genus)/Glycine max/Zhonghuang 13
/Glycine (genus)/Glycine soja/G.soja PI483463
/Glycine (genus)/Glycine max/G.max Lee v1.1
You can then copy the map set path into the INI file.
Listing the Orthologs
The command 'list orthologs' displays all pairs of map sets that have orthologous genes:
PS> list orthologs
RunId:1027 Zea mays AGPv4 ([20] AGPv4) 27,153 Zea mays AGPv4 ([20] AGPv4)
RunId:129 Zea mays Mo17 ([30] GCA_003185045.1) 30,058 Zea mays AGPv4 ([20] AGPv4)
RunId:1342 Pearl Millet Aw ([246] Pg_Aw) 27,994 Zea mays AGPv4 ([20] AGPv4)
RunId:1110 Zm-B73 ([47] Zm-B73) 39,065 Zea mays AGPv4 ([20] AGPv4)
RunId:21 Os GJ-temp: IRGSP-1.0 (Nipponbare) ([2] MSU_osa1r7) 31,584 ASM465v1 ([4] ASM465v11)
...
The central column shows the number of calculated orthologs for each map set pair.
Each map set entry shows the map set name, followed by its [MapSetId] and accession in parentheses.
To see the map sets linked to a given map set by orthologs, specify that map set:
PS> list orthologs "Glycine (genus)/Glycine max/Wm82.a4.v1"
Glycine (genus)/Glycine max/Wm82.a4.v1 vs.:
RunId:1159 Os XI-1B1: IR 64 ([151] GCA_009914875.1) 13,662
RunId:1087 M.truncatula A17_4.0 ([179] MedtrA17_4.0) 21,478
RunId:1003 Vigna unguiculata v1.2 ([192] Vunguiculata_540_v1.2) 30,350
RunId:1248 Wm82.a4.v1 ([34] Wm82.a4.v1) 45,798
RunId:830 Zhonghuang 13 ([173] Zhonghuang13) 44,037
RunId:385 G.soja W05 ([147] GCF_004193775.1) 52,429
RunId:840 G.max Lee v1.1 ([176] GmaxLee_510_v1.1) 57,659
RunId:845 G.soja PI483463 ([178] Gsoja_509_v1.0) 54,326
RunId:382 TAIR10 ([16] TAIR10) 18,675
RunId:147 Wm82.a2.v1 ([19] Wm82.a2.v1) 59,322
RunId:837 Zhongmu No.1 ([175] ZhongmuNo1) 20,445
Note that the first column shows the RunId of the process that generated the ortholog set. You need the RunId to delete orthologs, because the only way to delete ortholog pairs is to delete the corresponding run:
PS> delete run 837
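When the ortholog listing is long, a script can pick out the RunId to pass to delete run. The Python sketch below assumes the line layout shown above (RunId, map set name, then [MapSetId] and accession in parentheses, then the comma-grouped count); parse_ortholog_runs and OrthologRun are hypothetical names. It only builds the delete command text and does not run it.

```python
import re
from typing import List, NamedTuple

# "RunId:<id> <map set name> ([<MapSetId>] <accession>) <count> ..."
ORTHO_ROW = re.compile(r"^RunId:(\d+)\s+(.+?)\s+\(\[(\d+)\]\s+(\S+)\)\s+(\d[\d,]*)")


class OrthologRun(NamedTuple):
    run_id: int
    map_set_name: str
    map_set_id: int
    accession: str
    count: int


def parse_ortholog_runs(output: str) -> List[OrthologRun]:
    rows: List[OrthologRun] = []
    for line in output.splitlines():
        m = ORTHO_ROW.match(line.strip())
        if m:  # the "... vs.:" heading line does not match
            rows.append(OrthologRun(int(m.group(1)), m.group(2), int(m.group(3)),
                                    m.group(4), int(m.group(5).replace(",", ""))))
    return rows


sample = """Glycine (genus)/Glycine max/Wm82.a4.v1 vs.:
RunId:1248 Wm82.a4.v1 ([34] Wm82.a4.v1) 45,798
RunId:385 G.soja W05 ([147] GCF_004193775.1) 52,429
RunId:837 Zhongmu No.1 ([175] ZhongmuNo1) 20,445"""

for run in parse_ortholog_runs(sample):
    if run.map_set_name == "Zhongmu No.1":
        print(f"delete run {run.run_id}")  # delete run 837
```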
Listing the Stats
To peek inside the database and see how many objects of each type are stored, use the following command:
PS> list stats
[0] - Number of annotations
[1] - Number of annotation qualifiers
[2] - Number of markers
[3] - Number of marker qualifiers
[4] - Number of tracks
[5] - Number of maps
[6] - All of the above
Select [lineNo] of the statistic to print: 6
Database contains:
9,912,131 annotation record(s)
40,563,519 annotation qualifier(s)
9,581,539 marker(s)
26,269,416 marker qualifier(s)
225,958 track(s)
193,162 map(s)
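If you track these figures over time, for example after each load, the summary can be parsed into a dictionary. The Python sketch below assumes that each summary line holds a comma-grouped count followed by the object type, as in the output above; parse_stats is a hypothetical helper.

```python
import re
from typing import Dict

# "<count with thousands separators> <object type>", e.g. "225,958 track(s)"
STAT_LINE = re.compile(r"^\s*(\d[\d,]*)\s+(.+?)\s*$")


def parse_stats(output: str) -> Dict[str, int]:
    stats: Dict[str, int] = {}
    for line in output.splitlines():
        match = STAT_LINE.match(line)
        if match:  # "Database contains:" and the menu lines do not match
            stats[match.group(2)] = int(match.group(1).replace(",", ""))
    return stats


sample = """Database contains:
9,912,131 annotation record(s)
40,563,519 annotation qualifier(s)
9,581,539 marker(s)
26,269,416 marker qualifier(s)
225,958 track(s)
193,162 map(s)"""

stats = parse_stats(sample)
print(stats["track(s)"])  # 225958
ratio = stats["annotation qualifier(s)"] / stats["annotation record(s)"]
print(f"{ratio:.2f} qualifiers per annotation")  # 4.09 qualifiers per annotation
```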
Listing Qualifier Filters
The size of the Solr search index can be reduced with QualifierFilters. To list the filters in effect for the active connection, use the command:
PS> list qualifier_filter
Qualifier filters for search indexing listed in psh.exe.config:
┌────────────────────────────────────────────┐
│ QUALIFIER TYPE INDEXING │
│ ------------------------------------------ │
│ <All> Floats False │
│ <All> Integers False │
│ <All> MARKER False │
│ CLNDN MARKER True │
│ rsId MARKER True │
│ <All> ANNOT False │
│ transcript_id ANNOT True │
│ transcriptName ANNOT True │
│ transcriptId ANNOT True │
│ transcriptID ANNOT True │
│ product ANNOT True │
│ old_locus_tag ANNOT True │
│ note ANNOT True │
│ locus_tag ANNOT True │
│ iwgsc_id ANNOT True │
│ gene_id ANNOT True │
│ gene_synonym ANNOT True │
│ geneName ANNOT True │
│ geneId ANNOT True │
│ gene ANNOT True │
│ description ANNOT True │
│ definition ANNOT True │
│ alias ANNOT True │
│ Synonym ANNOT True │
│ Parent ANNOT True │
│ Name ANNOT True │
│ Note ANNOT True │
│ Info ANNOT True │
│ Function description ANNOT True │
│ Description ANNOT True │
│ Alias ANNOT True │
│ SwissProt match ANNOT True │
│ │
└────────────────────────────────────────────┘
Note that in this case the output has a frame around the table. To enable frames in the output of some commands, use PersephoneShell's parameter -F.
Listing Variants (VCFs)
Variants (SNPs/indels) are loaded from VCF files, and the data retain the original VCF file names. The listing of VCF files can be extended to list the samples of a selected file: the command 'list variant' shows the VCF files and asks you to select one of them to display its samples.
PS> list variant
[##] VcfId MapSetId RunId Version Original path
----------------------------------------------------------------------------------------------------------------------------------
[0] 13 2 1049 2 /HDD4Gb/bio/data/rice/ordered_rice3K.vcf.gz
[1] 14 17 1050 2 /HDD4Gb/bio/data/sorghum/Sbicolor_Patterson_454.vcf.gz
[2] 16 10 1062 2 /HDD4Gb/bio/data/human/vcf/ALL.chr6.shapeit2_integrated_v1a.GRCh38.20181129.phased.vcf.gz
[3] 18 195 1366 10 /data/e/data/wheat/SNPs_lifted_final2_sorted_v1.vcf.gz/SNPs_lifted_final2_sorted_v1.vcf
Select [lineNo] corresponding to the VCF to list:
To skip this step, use the forceMode flag (-f), which lists the VCF files without prompting you to select one and returns to the command line.
When listing variants, note the VcfId column. You can pass a VcfId to the command that deletes variants, for example:
PS> delete variant 18 -f
With the forceMode flag (-f), the command skips interactive steps, such as selecting a VCF from a list or confirming the deletion, and returns to the prompt after deleting the corresponding data. This makes such commands suitable for automated scripts.
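Because -f removes the prompts, delete commands can be generated from a captured listing and collected into a script. The Python sketch below parses rows shaped like the 'list variant' table above and emits one delete variant command per VCF of a given map set. The helper name delete_commands_for_map_set is hypothetical, and how you feed the generated lines to PersephoneShell is not covered here; since force mode deletes without confirmation, review the generated lines first.

```python
import re
from typing import List

# "[##] VcfId MapSetId RunId Version Original path"
VCF_ROW = re.compile(r"^\[(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def delete_commands_for_map_set(output: str, map_set_id: int) -> List[str]:
    """Build one 'delete variant <VcfId> -f' line per VCF of a map set."""
    commands: List[str] = []
    for line in output.splitlines():
        m = VCF_ROW.match(line.strip())
        if m and int(m.group(3)) == map_set_id:  # group 3 is MapSetId
            commands.append(f"delete variant {m.group(2)} -f")  # group 2 is VcfId
    return commands


sample = """[##] VcfId MapSetId RunId Version Original path
[0] 13 2 1049 2 /HDD4Gb/bio/data/rice/ordered_rice3K.vcf.gz
[1] 14 17 1050 2 /HDD4Gb/bio/data/sorghum/Sbicolor_Patterson_454.vcf.gz
[3] 18 195 1366 10 /data/e/data/wheat/SNPs_lifted_final2_sorted_v1.vcf.gz/SNPs_lifted_final2_sorted_v1.vcf"""

for command in delete_commands_for_map_set(sample, 195):
    print(command)  # delete variant 18 -f
```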