The list command is used to list one or more objects (targets) in your Persephone database. The common syntax is as follows (with a few exceptions):

list <target> [{mapSetId | path}] [-p pattern] [-l] [-i N] [-r] [-t N] [-d]

The table below lists the definitions for the list command parameters.

List Command Parameters

Parameter     Required or Optional?   Definition
------------------------------------------------------------
<target>      Required                The object type you want to list: organism, mapset, chromosome, map, sequence, annotation, sam, tracktree, track, source, study, qtl, marker, mapsettree, annotation_method, mapping_method, alignment_run, run, marker_type, track_type, map_type, ontology, xref_db, sample, storage, annotation_qualifier, path, qualifier_link, annotation_search, ortholog, variant, marker_qualifier, stats, or qualifier_filter. The plural form of a target (for example, organisms) is also accepted.
-p pattern    Optional                Filters the results by a pattern. Wildcards (*) are supported.
-l            Optional                Displays the list in the "long-listing" format. See Using the Long-Listing Format for more information.
-i N          Optional                Sorts the list by the column with index N. Column indices start at 0, which is the default. For example, -i 3 sorts the list by the fourth column.
-r            Optional                Sorts the list in reverse order.
-t N          Optional                Lists only the top N items. (The default is 0.) For example, -t 5 lists only the top 5 items.
-d            Optional                Executes the list command in debug mode. You can send the debug output to Persephone Software, LLC at http://persephonesoft.com/contact.
-T type       Optional (run only)     Filters the runs by process type. Works with the list run command only.
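The interplay of -i, -r and -t can be pictured with the Python sketch below. It is an illustration, not Persephone's code, and it rests on assumptions: rows are sorted on a zero-based column index (the -i 4 example later in this section sorts by the fifth column), -r reverses that order, the top N rows are taken after sorting, and -t 0 means "no limit". The helper name list_rows is hypothetical.

```python
def list_rows(rows, index=0, reverse=False, top=0):
    """Approximate -i N, -r and -t N on rows of column values (hypothetical helper)."""
    # -i N: sort by the column with zero-based index N; -r: reverse the order
    ordered = sorted(rows, key=lambda row: row[index], reverse=reverse)
    # -t N: keep only the first N rows; 0 is assumed to mean "no limit"
    return ordered[:top] if top > 0 else ordered


organisms = [
    ("4081", "Solanum lycopersicum", "tomato"),
    ("4113", "Solanum tuberosum", "potato"),
    ("4558", "Sorghum bicolor", "sorghum"),
]

# Roughly "list organisms -l -i 2 -r -t 2": sort by common name, reverse, keep two
for row in list_rows(organisms, index=2, reverse=True, top=2):
    print(row)
```

Values are compared as they appear in each row; the real command may compare numeric columns differently.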

For example, to list all the organisms in your Persephone database, enter list organisms as shown below.

PS> list organisms

The following is a typical example of the output.

-1:Unknown                              0:Unknown
3555:Beta vulgaris subsp. vulgaris      3769:Arabidopsis thaliana
3847:Glycine max                        4081:Solanum lycopersicum
4498:Avena sativa                       4555:Setaria italica
4558:Sorghum bicolor                    4565:Triticum aestivum
9606:Homo sapiens                       10090:Mus musculus
15368:Brachypodium distachyon           29760:Vitis vinifera
38727:Panicum virgatum                  39946:Oryza sativa subsp. indica
112509:Hordeum vulgare subsp. vulgare   138011:Brassica napus var. napus
218851:Aquilegia coerulea               311987:Zea mays
311988:Oryza sativa                     339834:Miscanthus spp
        22 organisms
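If you post-process a listing outside PersephoneShell, for example to look up organism IDs in a script, the short two-column format can be parsed as sketched below. This Python helper is not part of Persephone; it assumes each entry has the form ID:Name and that entries on one line are separated by at least two spaces, as in the sample output above. The name parse_short_listing is hypothetical.

```python
import re

# One entry is "<ID>:<name>"; entries on a line are separated by two or more spaces
ENTRY = re.compile(r"(-?\d+):(.*?)(?:\s{2,}|$)")


def parse_short_listing(text):
    """Map each numeric ID to its name in a two-column 'list' printout."""
    entries = {}
    for line in text.splitlines():
        for ident, name in ENTRY.findall(line):
            entries[int(ident)] = name.strip()
    return entries


sample = "\n".join([
    "-1:Unknown                              0:Unknown",
    "3555:Beta vulgaris subsp. vulgaris      3769:Arabidopsis thaliana",
    "38727:Panicum virgatum                  39946:Oryza sativa subsp. indica",
    "        22 organisms",  # the summary line has no ID:name entry and is ignored
])

entries = parse_short_listing(sample)
print(entries[3769])  # Arabidopsis thaliana
```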


Tip

The list command is commonly used in conjunction with the delete command to remove objects from the Persephone database. See Deleting Loaded Data for an example of using the list and delete commands together.

Using the Long-Listing Format

To display more information about each object, use the -l option, which displays the list in the "long-listing" format. For example, to list all organisms whose names begin with the letter "s" in the long-listing format, enter the following:

PS> list organisms -p s* -l
ID    NAME    TAXONOMY_NUM    COMMON_NAME    DICOT
--------------------------------------------------
4081    Solanum lycopersicum    4081    tomato  1
4113    Solanum tuberosum       4113    potato  1
4558    Sorghum bicolor         4558    sorghum 0
2500762 Shigella phage vB_SdyM_006      2500762 Shigella   (null)
2697049 SARS coronavirus        2697049 SARS    (null)
6 organisms

Listing Process Runs

Each batch of data is loaded by a process that is recorded as a "process_run". To list all loading jobs, use the command list run. To filter this multi-line output, use the -T switch, which displays the jobs of one type only. The type name is the same as the one used with the add command. For example, to list the jobs that loaded gene annotations, use 'annotation':

PS> list run -T annotation
3: Load annotations for B73 RefGen_v5
1 run

The process run types are: sequence, annotation, ribbon, ortholog, marker, map, variant, quantitative, qtl, sam.

The process type of each run is shown when the long-listing format (-l) is used:

PS> list run -l
RUN_ID  DESCRIPTION                                   PROCESS_TYPE      DATE_CREATED     CREATED_BY
----------------------------------------------------------------------------------------------------
98      Loaded sequences for PGSC_DM_v4.03 from MSU   sequence   2/21/2019 2:21:53 AM    ubuntu
99      Loaded annotations for PGSC_DM_v4.03 from MSU annotation 2/21/2019 2:27:57 AM    ubuntu
100     DArT markers from MSU                         marker     2/21/2019 2:31:56 AM    ubuntu
101     Dundee opa markers from MSU                   marker     2/21/2019 2:32:37 AM    ubuntu
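The -T switch shows one process type at a time. If you would rather group every run by type from a single long listing, the Python sketch below shows one way to do it. It is not part of Persephone and assumes the column order shown above, a description that may contain spaces, and dates in the M/D/YYYY h:mm:ss AM/PM form of this sample (the date format may depend on your system locale). The name runs_by_type is hypothetical.

```python
import re
from collections import defaultdict

# RUN_ID, DESCRIPTION (may contain spaces), PROCESS_TYPE, DATE_CREATED, CREATED_BY.
# The date pattern assumes the M/D/YYYY h:mm:ss AM/PM format shown above.
RUN_ROW = re.compile(
    r"^(\d+)\s+(.+?)\s+(\S+)\s+"
    r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(\S+)\s*$"
)


def runs_by_type(lines):
    """Group (run_id, description) pairs by process type."""
    groups = defaultdict(list)
    for line in lines:
        match = RUN_ROW.match(line)
        if match:  # header and separator lines do not match
            run_id, description, process_type, _date, _user = match.groups()
            groups[process_type].append((int(run_id), description))
    return dict(groups)


listing = [
    "RUN_ID  DESCRIPTION   PROCESS_TYPE  DATE_CREATED  CREATED_BY",
    "98      Loaded sequences for PGSC_DM_v4.03 from MSU   sequence   2/21/2019 2:21:53 AM    ubuntu",
    "99      Loaded annotations for PGSC_DM_v4.03 from MSU annotation 2/21/2019 2:27:57 AM    ubuntu",
    "100     DArT markers from MSU                         marker     2/21/2019 2:31:56 AM    ubuntu",
    "101     Dundee opa markers from MSU                   marker     2/21/2019 2:32:37 AM    ubuntu",
]

# Client-side counterpart of filtering with -T marker
print(runs_by_type(listing)["marker"])
```

The description is matched lazily and the process type is taken as the last token before the date, so a single space between the two columns (as in run 99) is still handled.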

Listing Track Tree Entries

The tracks of a map set share a common hierarchical structure that is repeated on each map. To display this tree-like organization of tracks, identify the map set by its MapSetId or its full path (press TAB to auto-complete):

PS> list tracktree "Arabidopsis thaliana/TAIR10"
[0] Ensembl (Track, Order: 0, Type: Annotation)
[1] TREP repeats (Track, Order: 1, Type: GenericBp)
[2] SV-deletions (Track, Order: 2, Type: GenericBp)
[3] SV-insertions (Track, Order: 3, Type: GenericBp)
[4] RNA-seq (Group, Order: 4)
    [5] RNA-seq cold treatment (Track, Order: 4, Type: Quantitative)
    [6] RNA-seq cold-stress control (Track, Order: 5, Type: Quantitative)
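The printout above encodes the hierarchy through indentation. The Python sketch below rebuilds it as nested dictionaries, which can help when comparing track layouts between map sets in a script. It is not part of Persephone; it assumes four spaces of indentation per level and the line format shown above. The name parse_track_tree is hypothetical.

```python
import re

# "[index] name (Track|Group, Order: n[, Type: t])", nested by indentation
NODE = re.compile(r"^( *)\[(\d+)\] (.+) \((Track|Group), Order: (\d+)(?:, Type: (\w+))?\)$")


def parse_track_tree(text, indent=4):
    """Turn a 'list tracktree' printout into nested dicts (hypothetical helper)."""
    roots, stack = [], []  # stack holds (depth, node) pairs for the current branch
    for line in text.splitlines():
        match = NODE.match(line.rstrip())
        if not match:
            continue
        spaces, index, name, kind, order, track_type = match.groups()
        node = {"index": int(index), "name": name, "kind": kind,
                "order": int(order), "type": track_type, "children": []}
        depth = len(spaces) // indent
        # Climb back up to this node's parent before attaching it
        while stack and stack[-1][0] >= depth:
            stack.pop()
        (stack[-1][1]["children"] if stack else roots).append(node)
        stack.append((depth, node))
    return roots


printout = "\n".join([
    "[0] Ensembl (Track, Order: 0, Type: Annotation)",
    "[4] RNA-seq (Group, Order: 4)",
    "    [5] RNA-seq cold treatment (Track, Order: 4, Type: Quantitative)",
    "    [6] RNA-seq cold-stress control (Track, Order: 5, Type: Quantitative)",
])

for node in parse_track_tree(printout):
    print(node["name"], [child["name"] for child in node["children"]])
```

Groups carry no track type, so their "type" entry is None.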

Listing Map Sets

The command 'list mapset' can be used without extra parameters. In that case, it lists all available map sets along with their MapSetIds:

PS> list mapset
1: TAIR10                                                   5: Scaffold maps
30: DM_v4.03                                                31: EL10_1.0
44: Optical maps                                            50: ARE_050606
51: Criollo v2                                              52: Tcacao_v2.1
53: Os GJ-subtrp: CHAO MEO                                  54: Tomato ITAG5.0
55: TraitGenetics EXPEN2012
11 mapsets

You can select a particular map set by specifying its MapSetId or path on the command line. This command is typically used with the long-listing flag -l:

PS> list mapset 51 -l
MAP_SET_ID  DISPLAY_NAME  ACCESSION_NO     SOURCE_ID  DISTANCE_UNIT  ORGANISM_ID
----------------------------------------------------------------------------------
51          Criollo v2    GCF_000208745.1  NCBI       bp             3641

/Theobroma cacao/Criollo v2
1 mapset

When the listing shows many map sets, you can use a pattern (-p) to reduce the number of map sets shown:

PS> list mapset -p solanum
20: Tomato SLT1.0                                           21: SL4.0
27: DM 1-3 516 R44                                          30: DM_v4.03
54: Tomato ITAG5.0                                          55: TraitGenetics EXPEN2012

Note that 'solanum' is not found in the map set names, but it is present in the full path to the map sets that includes the parent nodes, such as 'Solanum lycopersicum' and 'Solanum tuberosum'.
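The two -p examples in this section suggest how patterns behave: "s*" selected organisms whose names start with "S" regardless of case, while "solanum" without a wildcard matched anywhere in the map set path. The Python function below approximates that observed behavior using the standard fnmatch module. It is an inference from these examples, not Persephone's matching code, and the function name matches is hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern, text):
    """Approximate the observed -p behavior (inferred from the examples, not Persephone code)."""
    # Matching appears to ignore case: "s*" selected "Solanum ..." and "Sorghum ..."
    pattern, text = pattern.lower(), text.lower()
    # A pattern without a wildcard appears to match anywhere in the text
    if "*" not in pattern:
        pattern = f"*{pattern}*"
    # Note: fnmatch also treats "?" and "[...]" as wildcards, which Persephone may not
    return fnmatchcase(text, pattern)


# Hypothetical full path; the real parent nodes of Tomato ITAG5.0 may differ
print(matches("solanum", "/Solanum lycopersicum/Tomato ITAG5.0"))     # True
print(matches("solanum", "/Theobroma cacao/Criollo v2"))              # False
print(matches("s*", "Sorghum bicolor"), matches("s*", "Homo sapiens"))  # True False
```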

To separate physical maps (measured in bp) from genetic maps (measured in cM), sort by column 4, DISTANCE_UNIT:

PS> list mapset -l -i 4
83          2019-nCoV WHU02                                   MN988669.1                    NCBI                bp             2697049
82          2019-nCoV WHU01                                   MN988668.1                    NCBI                bp             2697049
81          2019-nCoV/USA-WA1/2020                            MN985325.1                    NCBI                bp             2697049
80          2019-nCoV_HKU-SZ-005b_2020                        MN975262.1                    NCBI                bp             2697049
71          Zm-Tx303                                          Zm-Tx303                      MazeGDB             bp             4577
209         ARE_050606                                        ARE_050606                    URGI_INRA           cM             4565
18          BTx623-IS320C                                     BTx623-IS320C                 Publication         cM             4558
24          ChineseSpring x Renan                             CS_Renan_Genetic              IWGSC               cM             4565
25          RH_MAPS                                           RH_MAPS                       IWGSC               cM             4565
214         TraitGenetics EXPEN2012                           EXPEN2012                     solgenomics         cM             4081
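Sorting by column 4 places all bp map sets before all cM map sets. If you need the two groups separately in a script, the Python sketch below splits the long-listing rows on the same column. It is not part of Persephone; it assumes that columns are separated by two or more spaces, as in the printout above, and that column 4 (zero-based) is DISTANCE_UNIT. The name names_by_unit is hypothetical.

```python
import re
from collections import defaultdict

DISTANCE_UNIT = 4  # zero-based index of the DISTANCE_UNIT column, as in "-i 4"


def names_by_unit(lines):
    """Group map set display names by distance unit (bp = physical, cM = genetic)."""
    groups = defaultdict(list)
    for line in lines:
        # Assumes columns are separated by two or more spaces, as in the printout above
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) > DISTANCE_UNIT and columns[0].isdigit():
            groups[columns[DISTANCE_UNIT]].append(columns[1])
    return dict(groups)


rows = [
    "83          2019-nCoV WHU02                                   MN988669.1                    NCBI                bp             2697049",
    "71          Zm-Tx303                                          Zm-Tx303                      MazeGDB             bp             4577",
    "209         ARE_050606                                        ARE_050606                    URGI_INRA           cM             4565",
    "214         TraitGenetics EXPEN2012                           EXPEN2012                     solgenomics         cM             4081",
]

for unit, names in names_by_unit(rows).items():
    print(unit, names)
```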

Listing the Map Set Tree

An organism can have multiple map sets, which are collections of physical or genetic maps. The map set tree is a hierarchical structure that organizes map sets of different types across multiple organisms. The elements of the tree are called "nodes". A node's "parent" is the node one step higher in the hierarchy on the same branch. A node without a parent is called a "root" node; a node without children is called a "leaf" node. "Sibling" nodes share the same parent node, and the order of sibling nodes can be assigned.

There are two types of nodes: nodes without an assigned map set and nodes with a map set. A map set cannot be assigned to more than one node. To list the map set tree, use the command "list mapsettree". The -l option displays the NodeId of each node and, for nodes with a map set, the map set type and MapSetId. In the example below, Brachypodium is a node without a map set, while the other nodes carry map sets.

PS> list mapsettree -l
Brachypodium (NodeId:200081224) 
  Physical:Brachypodium annotation (NodeId:200081227, MapSetId:200081226) 
    Scaffold:Brachypodium scaffolds (NodeId:200590610, MapSetId:200590609)
  Genetic:Bd3-1 x Bd21 (NodeId:270702460, MapSetId:270702459)
...

If the map set tree has many entries, you can limit the output by specifying the name of a node:

PS> list mapsettree "Glycine (genus)"
Glycine (genus)
  Glycine max
    G.max Wm82 v1.01
      Soybean scaffolds
    Wm82.a2.v1
    Wm82.a4.v1
    Zhonghuang 13
    G.max Lee v1.1
  Glycine soja
    G.soja W05
    G.soja PI483463

Listing the Map Set Path

The command 'list path' is a convenience command that displays the full paths of map sets.

When authoring INI files, you can reference a map set by its MapSetId or by its path. If you have more than one database and plan to share an INI file between them, use the map set path: the same map set can have different IDs in different databases, whereas its path is usually stable and can be reused. To narrow down the output, provide the name of a map set tree node:

PS> list path "Glycine (genus)"
/Glycine (genus)/Glycine max/G.max Wm82 v1.01
/Glycine (genus)/Glycine max/G.max Wm82 v1.01/Soybean scaffolds
/Glycine (genus)/Glycine max/Wm82.a2.v1
/Glycine (genus)/Glycine max/Wm82.a4.v1
/Glycine (genus)/Glycine soja/G.soja W05
/Glycine (genus)/Glycine max/Zhonghuang 13
/Glycine (genus)/Glycine soja/G.soja PI483463
/Glycine (genus)/Glycine max/G.max Lee v1.1

Now you can copy the map set path and transfer it to the INI file.
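The long listing of the map set tree and the output of list path carry the same information in different shapes. The Python sketch below, which is not part of Persephone, rebuilds paths from the list mapsettree -l sample shown earlier. It assumes two spaces of indentation per level, that only nodes with a map set carry a "Type:" prefix and a MapSetId, and that list path prints paths only for nodes with a map set (in the list path output above, "Glycine max" itself has no entry). The name map_set_paths is hypothetical.

```python
import re

# "<Type>:<name> (NodeId:n, MapSetId:m)" for nodes with a map set,
# "<name> (NodeId:n)" for nodes without one
NODE = re.compile(r"^( *)(.+?) \(NodeId:(\d+)(?:, MapSetId:(\d+))?\)\s*$")


def map_set_paths(text, indent=2):
    """Return {MapSetId: path} for every node that has a map set assigned."""
    branch = []  # names of the nodes on the current branch, root first
    paths = {}
    for line in text.splitlines():
        match = NODE.match(line)
        if not match:
            continue
        spaces, label, _node_id, map_set_id = match.groups()
        depth = len(spaces) // indent
        if map_set_id is not None:
            # Drop the map set type prefix such as "Physical:" or "Genetic:"
            label = label.split(":", 1)[1]
        del branch[depth:]  # forget deeper nodes from the previous branch
        branch.append(label)
        if map_set_id is not None:
            paths[int(map_set_id)] = "/" + "/".join(branch)
    return paths


printout = "\n".join([
    "Brachypodium (NodeId:200081224) ",
    "  Physical:Brachypodium annotation (NodeId:200081227, MapSetId:200081226) ",
    "    Scaffold:Brachypodium scaffolds (NodeId:200590610, MapSetId:200590609)",
    "  Genetic:Bd3-1 x Bd21 (NodeId:270702460, MapSetId:270702459)",
])

for map_set_id, path in map_set_paths(printout).items():
    print(map_set_id, path)
```

The prefix is split at the first colon only, so map set names that contain a colon themselves (such as "Os GJ-subtrp: CHAO MEO") keep the rest of their name.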

Listing the Orthologs

The command 'list orthologs' displays all pairs of map sets that have orthologous genes:


PS> list orthologs
RunId:1027                                            Zea mays AGPv4 ([20] AGPv4)   27,153       Zea mays AGPv4 ([20] AGPv4)
RunId:129                                    Zea mays Mo17 ([30] GCA_003185045.1)   30,058       Zea mays AGPv4 ([20] AGPv4)
RunId:1342                                          Pearl Millet Aw ([246] Pg_Aw)   27,994       Zea mays AGPv4 ([20] AGPv4)
RunId:1110                                                   Zm-B73 ([47] Zm-B73)   39,065       Zea mays AGPv4 ([20] AGPv4)
RunId:21                      Os GJ-temp: IRGSP-1.0 (Nipponbare) ([2] MSU_osa1r7)   31,584       ASM465v1 ([4] ASM465v11)
...

The central column shows the number of calculated orthologs for each map set pair.

Each map set entry shows the map set name, followed by its [MapSetId] and accession in parentheses.

To see the map sets that are "linked" to a given map set by orthologs, specify that map set:

PS> list orthologs "Glycine (genus)/Glycine max/Wm82.a4.v1"
Glycine (genus)/Glycine max/Wm82.a4.v1 vs.:

RunId:1159  Os XI-1B1: IR 64 ([151] GCA_009914875.1)              13,662
RunId:1087  M.truncatula A17_4.0 ([179] MedtrA17_4.0)             21,478
RunId:1003  Vigna unguiculata v1.2 ([192] Vunguiculata_540_v1.2)  30,350
RunId:1248  Wm82.a4.v1 ([34] Wm82.a4.v1)                          45,798
RunId:830   Zhonghuang 13 ([173] Zhonghuang13)                    44,037
RunId:385   G.soja W05 ([147] GCF_004193775.1)                    52,429
RunId:840   G.max Lee v1.1 ([176] GmaxLee_510_v1.1)               57,659
RunId:845   G.soja PI483463 ([178] Gsoja_509_v1.0)                54,326
RunId:382   TAIR10 ([16] TAIR10)                                  18,675
RunId:147   Wm82.a2.v1 ([19] Wm82.a2.v1)                          59,322
RunId:837   Zhongmu No.1 ([175] ZhongmuNo1)                       20,445

Note that the first column shows the RunId of the process that generated the ortholog set. You need the RunId if you decide to delete the orthologs, because the only way to delete ortholog pairs is to delete the corresponding run:

PS> delete run 837
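When a map set has many ortholog partners, finding the right RunId by eye can be error-prone. The Python sketch below parses the single-map-set listing shown above and builds the matching delete command for a chosen partner MapSetId. It is not part of Persephone, only prints the command string (it does not call PersephoneShell), and assumes the row format "RunId:<run> <name> ([<MapSetId>] <accession>) <count>" seen in the sample. The name ortholog_runs is hypothetical.

```python
import re

# "RunId:<run>  <name> ([<MapSetId>] <accession>)  <ortholog count>"
ROW = re.compile(r"^RunId:(\d+)\s+(.+?) \(\[(\d+)\] (.+?)\)\s+([\d,]+)\s*$")


def ortholog_runs(lines):
    """Return one record per ortholog run in a 'list orthologs <map set>' printout."""
    records = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            run_id, name, map_set_id, accession, count = match.groups()
            records.append({"run_id": int(run_id), "name": name,
                            "map_set_id": int(map_set_id), "accession": accession,
                            "orthologs": int(count.replace(",", ""))})
    return records


listing = [
    "RunId:1159  Os XI-1B1: IR 64 ([151] GCA_009914875.1)              13,662",
    "RunId:382   TAIR10 ([16] TAIR10)                                  18,675",
    "RunId:837   Zhongmu No.1 ([175] ZhongmuNo1)                       20,445",
]

# Find the run that links the selected map set to MapSetId 175
run = next(r for r in ortholog_runs(listing) if r["map_set_id"] == 175)
print(f"delete run {run['run_id']}")  # delete run 837
```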


Listing the Stats

To peek inside the database and see how many objects of each type are stored, use the command list stats:

PS> list stats
[0] - Number of annotations
[1] - Number of annotation qualifiers
[2] - Number of markers
[3] - Number of marker qualifiers
[4] - Number of tracks
[5] - Number of maps
[6] - All of the above
Select [lineNo] of the statistic to print: 6
Database contains:
        9,912,131       annotation record(s)
        40,563,519      annotation qualifier(s)
        9,581,539       marker(s)
        26,269,416      marker qualifier(s)
        225,958 track(s)
        193,162 map(s)

Listing Qualifier Filters

The Solr search index can be reduced by using QualifierFilters. To list the filters in effect for the active connection, use the command:

PS> list qualifier_filter
Qualifier filters for search indexing listed in psh.exe.config:
┌────────────────────────────────────────────┐
│ QUALIFIER             TYPE      INDEXING   │
│ ------------------------------------------ │
│ <All>                 Floats    False      │
│ <All>                 Integers  False      │
│ <All>                 MARKER    False      │
│ CLNDN                 MARKER    True       │
│ rsId                  MARKER    True       │
│ <All>                 ANNOT     False      │
│ transcript_id         ANNOT     True       │
│ transcriptName        ANNOT     True       │
│ transcriptId          ANNOT     True       │
│ transcriptID          ANNOT     True       │
│ product               ANNOT     True       │
│ old_locus_tag         ANNOT     True       │
│ note                  ANNOT     True       │
│ locus_tag             ANNOT     True       │
│ iwgsc_id              ANNOT     True       │
│ gene_id               ANNOT     True       │
│ gene_synonym          ANNOT     True       │
│ geneName              ANNOT     True       │
│ geneId                ANNOT     True       │
│ gene                  ANNOT     True       │
│ description           ANNOT     True       │
│ definition            ANNOT     True       │
│ alias                 ANNOT     True       │
│ Synonym               ANNOT     True       │
│ Parent                ANNOT     True       │
│ Name                  ANNOT     True       │
│ Note                  ANNOT     True       │
│ Info                  ANNOT     True       │
│ Function description  ANNOT     True       │
│ Description           ANNOT     True       │
│ Alias                 ANNOT     True       │
│ SwissProt match       ANNOT     True       │
│                                            │
└────────────────────────────────────────────┘

Note that in this case the output is drawn inside a frame. To enable frames in the output of some commands, use PersephoneShell's -F parameter.

Listing Variants (VCFs)

Variants (SNPs and indels) are loaded from VCF files, and the data retain the original VCF file names. Listing the VCF files can be extended to listing the samples in each file: the command 'list variant' shows the list of VCF files and then asks you to select one of them to display its samples.

PS> list variant
[##]  VcfId  MapSetId  RunId  Version  Original path
----------------------------------------------------------------------------------------------------------------------------------
[0]   13     2         1049   2        /HDD4Gb/bio/data/rice/ordered_rice3K.vcf.gz
[1]   14     17        1050   2        /HDD4Gb/bio/data/sorghum/Sbicolor_Patterson_454.vcf.gz
[2]   16     10        1062   2        /HDD4Gb/bio/data/human/vcf/ALL.chr6.shapeit2_integrated_v1a.GRCh38.20181129.phased.vcf.gz
[3]   18     195       1366   10       /data/e/data/wheat/SNPs_lifted_final2_sorted_v1.vcf.gz/SNPs_lifted_final2_sorted_v1.vcf

Select [lineNo] corresponding to the VCF to list:

You can skip this extension by using the forceMode flag (-f): the command then does not list the samples and returns to the command line.

When listing variants, note the VcfId column. You can use a VcfId in the command for deleting variants, for example:

PS> delete variant 18 -f

The forceMode flag (-f) skips interactive steps, such as selecting a VCF from a list or confirming the deletion, and returns to the prompt after the corresponding data are deleted. This allows such commands to be included in automated scripts.
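Because -f removes the interactive steps, delete commands can be prepared from a list variant -f printout. The Python sketch below selects the VCFs loaded into one map set and builds a "delete variant <VcfId> -f" line for each. It is not part of Persephone and assumes whitespace-separated columns in the order shown above, with the original path kept as the last field; it only builds the command strings, and how you pass them to PersephoneShell depends on your setup. The name delete_variant_commands is hypothetical.

```python
def delete_variant_commands(lines, map_set_id):
    """Build 'delete variant <VcfId> -f' commands for the VCFs of one map set."""
    commands = []
    for line in lines:
        # Columns: [##], VcfId, MapSetId, RunId, Version, Original path (kept whole)
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[0].startswith("[") and fields[1].isdigit():
            if int(fields[2]) == map_set_id:
                commands.append(f"delete variant {fields[1]} -f")
    return commands


listing = [
    "[##]  VcfId  MapSetId  RunId  Version  Original path",
    "[0]   13     2         1049   2        /HDD4Gb/bio/data/rice/ordered_rice3K.vcf.gz",
    "[1]   14     17        1050   2        /HDD4Gb/bio/data/sorghum/Sbicolor_Patterson_454.vcf.gz",
    "[3]   18     195       1366   10       /data/e/data/wheat/SNPs_lifted_final2_sorted_v1.vcf.gz/SNPs_lifted_final2_sorted_v1.vcf",
]

for command in delete_variant_commands(listing, 195):
    print(command)  # delete variant 18 -f
```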