A helper command that analyzes data without modifying the database.

Analyze db

Use this command to quickly check the database parameters and find those that might need tuning.

PS> analyze db
Checking DB settings:
time_zone != +00:00
tx_isolation (REPEATABLE-READ) != read-committed
innodb_log_file_size (100663296) < 536870912
tmp_table_size (16777216) < 33554432
max_heap_table_size (16777216) < 33554432
join_buffer_size (262144) < 2097152
query_cache_size (1048576) != 0
max_allowed_packet (16777216) < 67108864
max_connections (151) < 500
Please review the warnings, they may help with the database performance issues.
The recommended parameters are listed at https://help.persephonesoft.com/SettingupthePersephoneSystem.html
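
Each warning above compares a server setting with a recommended value. As a rough illustration of that comparison (not the tool's actual implementation), the Python sketch below checks a dictionary of current settings against a rule table copied from the sample output. The names RULES and check_settings are hypothetical, and the sketch assumes the current values were already collected, for example with SHOW VARIABLES on a MySQL-compatible server.

```python
# Hypothetical rule table: (setting, comparison, recommended value).
# The entries are copied from the sample warnings; the real rule set may differ.
RULES = [
    ("time_zone", "!=", "+00:00"),
    ("tx_isolation", "!=", "read-committed"),
    ("innodb_log_file_size", "<", 536870912),
    ("tmp_table_size", "<", 33554432),
    ("max_heap_table_size", "<", 33554432),
    ("join_buffer_size", "<", 2097152),
    ("query_cache_size", "!=", 0),
    ("max_allowed_packet", "<", 67108864),
    ("max_connections", "<", 500),
]


def check_settings(current):
    """Return one warning line per setting that misses its recommended value."""
    warnings = []
    for name, op, recommended in RULES:
        if name not in current:
            continue  # this setting was not reported; skip it
        value = current[name]
        if op == "<":
            failed = int(value) < recommended
        else:
            # "!=": compare as case-insensitive text
            failed = str(value).lower() != str(recommended).lower()
        if failed:
            warnings.append(f"{name} ({value}) {op} {recommended}")
    return warnings


# Example with made-up current values (server variables usually arrive as strings).
sample = {
    "tx_isolation": "REPEATABLE-READ",
    "max_connections": "151",
    "query_cache_size": "0",
}
for line in check_settings(sample):
    print(line)
```

Keeping the rules in a table rather than in code makes it easy to adjust the thresholds when the recommended parameters change.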

Analyze fasta

Before adding sequences, it helps to inspect the contents of the FASTA file. Listing the headers and the size of each sequence can help you decide which sequences to include and how to parse the headers. The command

analyze fasta <fastaFile>

will read the FASTA file and show the list of records:

PS> analyze fasta d:\tomato\LA2093_genome_v1.5.fa.gz
10,011,079      SPIMPch00
95,499,177      SPIMPch01
55,625,203      SPIMPch02
65,797,135      SPIMPch03
67,155,387      SPIMPch04
66,794,423      SPIMPch05
49,593,552      SPIMPch06
67,501,632      SPIMPch07
66,707,511      SPIMPch08
71,524,677      SPIMPch09
67,903,499      SPIMPch10
56,116,167      SPIMPch11

Total 13 records. 810,507,733 nt

Sort by [N]ame; by [L]ength; [T]op records only; ESC-cancel:

Typing T lets you limit the number of printed lines.

Sort by [N]ame; by [L]ength; [T]op records only; ESC-cancel: T
Number of top rows to show (0=All)? 3
10,011,079      SPIMPch00
95,499,177      SPIMPch01
55,625,203      SPIMPch02

First 3 records are shown
Total 13 records. 810,507,733 nt
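
For readers who want a similar listing outside the shell, the Python sketch below reads a plain or gzip-compressed FASTA file and reports each header with its sequence length, with an optional limit on the number of rows. It is only an approximation of the output format shown above, not the command's own code. The function names fasta_records and summary_lines are hypothetical.

```python
import gzip


def fasta_records(path):
    """Yield (header, length) for each record in a plain or .gz FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, length = None, 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, length  # finish the previous record
                header, length = line[1:], 0
            elif header is not None:
                length += len(line)  # sequence line: count its residues
    if header is not None:
        yield header, length  # last record in the file


def summary_lines(records, top=0):
    """Format records as 'length<TAB>header' lines; top=0 shows all rows."""
    records = list(records)
    shown = records[:top] if top > 0 else records
    lines = [f"{length:,}\t{header}" for header, length in shown]
    if 0 < top < len(records):
        lines.append(f"First {top} records are shown")
    total = sum(length for _, length in records)
    lines.append(f"Total {len(records)} records. {total:,} nt")
    return lines
```

Calling summary_lines(fasta_records(path), top=3) would list the first three records followed by the totals for the whole file.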

Sorting the records by name often groups the entries into name classes, such as chromosomes, scaffolds, and organelles. This can help you decide which rules (regular expressions) to use when parsing the headers for map names, and which sequences to exclude from loading:

Sort by [N]ame; by [L]ength; [T]op records only; ESC-cancel: N
43,270,923      1 dna:chromosome chromosome:IRGSP-1.0:1:1:43270923:1
23,207,287      10 dna:chromosome chromosome:IRGSP-1.0:10:1:23207287:1
29,021,106      11 dna:chromosome chromosome:IRGSP-1.0:11:1:29021106:1
27,531,856      12 dna:chromosome chromosome:IRGSP-1.0:12:1:27531856:1
35,937,250      2 dna:chromosome chromosome:IRGSP-1.0:2:1:35937250:1
36,413,819      3 dna:chromosome chromosome:IRGSP-1.0:3:1:36413819:1
35,502,694      4 dna:chromosome chromosome:IRGSP-1.0:4:1:35502694:1
29,958,434      5 dna:chromosome chromosome:IRGSP-1.0:5:1:29958434:1
31,248,787      6 dna:chromosome chromosome:IRGSP-1.0:6:1:31248787:1
29,697,621      7 dna:chromosome chromosome:IRGSP-1.0:7:1:29697621:1
28,443,022      8 dna:chromosome chromosome:IRGSP-1.0:8:1:28443022:1
23,012,720      9 dna:chromosome chromosome:IRGSP-1.0:9:1:23012720:1
32,941  AC155918 dna:scaffold scaffold:IRGSP-1.0:AC155918:1:32941:1
88,500  AC156495 dna:scaffold scaffold:IRGSP-1.0:AC156495:1:88500:1
128,256 AC160949 dna:scaffold scaffold:IRGSP-1.0:AC160949:1:128256:1
15,426  AC174930 dna:scaffold scaffold:IRGSP-1.0:AC174930:1:15426:1
206,004 AP008246 dna:scaffold scaffold:IRGSP-1.0:AP008246:1:206004:1
157,458 AP008247 dna:scaffold scaffold:IRGSP-1.0:AP008247:1:157458:1
14,476  Syng_TIGR_002 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_002:1:14476:1
19,457  Syng_TIGR_004 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_004:1:19457:1
21,787  Syng_TIGR_005 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_005:1:21787:1
7,820   Syng_TIGR_007 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_007:1:7820:1
16,676  Syng_TIGR_008 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_008:1:16676:1
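
The listing above suggests a simple rule for these headers: the map name is the first token, and the token after "dna:" gives the sequence class. The Python sketch below shows one way to test such a rule before loading. The regular expression and the names HEADER_RE, parse_header, and names_of_kind are hypothetical, and the rule only fits headers shaped like the ones in this example. The plain text sort reproduces the order shown above (1, 10, 11, 12, 2, ...). Sorting by length in descending order is an assumption about what a length sort would do.

```python
import re

# Hypothetical rule for headers such as
# "1 dna:chromosome chromosome:IRGSP-1.0:1:1:43270923:1"
HEADER_RE = re.compile(r"^(?P<name>\S+) dna:(?P<kind>\w+) ")


def parse_header(header):
    """Return (map_name, kind) or None when the header does not match the rule."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return match.group("name"), match.group("kind")


def names_of_kind(records, kind):
    """Collect map names of records whose class equals kind, e.g. 'chromosome'."""
    names = []
    for _, header in records:
        parsed = parse_header(header)
        if parsed is not None and parsed[1] == kind:
            names.append(parsed[0])
    return names


# A few (length, header) pairs taken from the listing.
records = [
    (32941, "AC155918 dna:scaffold scaffold:IRGSP-1.0:AC155918:1:32941:1"),
    (35937250, "2 dna:chromosome chromosome:IRGSP-1.0:2:1:35937250:1"),
    (23207287, "10 dna:chromosome chromosome:IRGSP-1.0:10:1:23207287:1"),
    (43270923, "1 dna:chromosome chromosome:IRGSP-1.0:1:1:43270923:1"),
]

by_name = sorted(records, key=lambda r: r[1])                  # plain text order
by_length = sorted(records, key=lambda r: r[0], reverse=True)  # longest first

chromosomes = names_of_kind(by_name, "chromosome")
scaffolds = names_of_kind(by_name, "scaffold")
```

Headers that do not match the rule return None from parse_header, which makes it easy to spot entries that need a different expression or should be excluded.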