Analyze
A helper command that analyzes the data without modifying the database.
Analyze db
Use this command to quickly check the database parameters and find those that might need tuning.
PS> analyze db
Checking DB settings:
time_zone != +00:00
tx_isolation (REPEATABLE-READ) != read-committed
innodb_log_file_size (100663296) < 536870912
tmp_table_size (16777216) < 33554432
max_heap_table_size (16777216) < 33554432
join_buffer_size (262144) < 2097152
query_cache_size (1048576) != 0
max_allowed_packet (16777216) < 67108864
max_connections (151) < 500
Please review the warnings, they may help with the database performance issues.
The recommended parameters are listed at https://help.persephonesoft.com/SettingupthePersephoneSystem.html
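The comparison behind these warnings can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the thresholds are copied from the sample output above, while the function name check_settings and the idea of passing the current values as a dictionary (for example, collected from SHOW VARIABLES on a MySQL-compatible server) are assumptions.

```python
# Thresholds copied from the sample output above.
# The checking logic itself is an assumption, not the tool's actual code.
EXACT = {
    "time_zone": "+00:00",
    "tx_isolation": "read-committed",
    "query_cache_size": "0",
}
MINIMUMS = {
    "innodb_log_file_size": 536870912,
    "tmp_table_size": 33554432,
    "max_heap_table_size": 33554432,
    "join_buffer_size": 2097152,
    "max_allowed_packet": 67108864,
    "max_connections": 500,
}


def check_settings(current):
    """Return warning strings for settings that differ from the recommendation.

    `current` maps variable names to their current values (str or int).
    Variables missing from `current` are skipped.
    """
    warnings = []
    for name, expected in EXACT.items():
        value = current.get(name)
        # Compare case-insensitively: REPEATABLE-READ vs read-committed.
        if value is not None and str(value).lower() != expected:
            warnings.append(f"{name} ({value}) != {expected}")
    for name, minimum in MINIMUMS.items():
        value = current.get(name)
        if value is not None and int(value) < minimum:
            warnings.append(f"{name} ({value}) < {minimum}")
    return warnings


if __name__ == "__main__":
    sample = {"tx_isolation": "REPEATABLE-READ", "max_connections": "151"}
    for line in check_settings(sample):
        print(line)
```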
Analyze fasta
When adding sequences, it helps to inspect the contents of the FASTA file first. Listing the headers and the size of each sequence can help you decide which sequences to include and how to parse the headers. The command
analyze fasta <fastaFile>
reads the FASTA file and lists its records:
PS> analyze fasta d:\tomato\LA2093_genome_v1.5.fa.gz
10,011,079 SPIMPch00
95,499,177 SPIMPch01
55,625,203 SPIMPch02
65,797,135 SPIMPch03
67,155,387 SPIMPch04
66,794,423 SPIMPch05
49,593,552 SPIMPch06
67,501,632 SPIMPch07
66,707,511 SPIMPch08
71,524,677 SPIMPch09
67,903,499 SPIMPch10
56,116,167 SPIMPch11
Total 13 records. 810,507,733 nt
Sort by [N]ame; by [L]ength; [T]op records only; ESC-cancel:
Typing T lets you limit the number of printed lines:
Sort by [N]ame; by [L]ength; [T]op records only; ESC-cancel: T
Number of top rows to show (0=All)? 3
10,011,079 SPIMPch00
95,499,177 SPIMPch01
55,625,203 SPIMPch02
First 3 records are shown
Total 13 records. 810,507,733 nt
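For readers who want a similar listing outside the shell, the following Python sketch collects the header and length of every record in a plain or gzip-compressed FASTA file and formats a report like the one above. It is not the tool's implementation: the function names fasta_lengths and format_report are hypothetical, and sorting by length in descending order is an assumption.

```python
import gzip


def fasta_lengths(path):
    """Return a list of (header, sequence_length) pairs in file order."""
    opener = gzip.open if path.endswith(".gz") else open
    records = []
    header, length = None, 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # A new record starts; store the previous one first.
                if header is not None:
                    records.append((header, length))
                header, length = line[1:], 0
            elif header is not None:
                length += len(line.strip())
    if header is not None:
        records.append((header, length))
    return records


def format_report(records, sort_by=None, top=0):
    """Format records as text lines; top=0 shows all records."""
    if sort_by == "name":
        records = sorted(records, key=lambda r: r[0])
    elif sort_by == "length":
        # Descending order is an assumption made for this sketch.
        records = sorted(records, key=lambda r: r[1], reverse=True)
    shown = records[:top] if top > 0 else records
    lines = [f"{length:,} {header}" for header, length in shown]
    if len(shown) < len(records):
        lines.append(f"First {len(shown)} records are shown")
    total = sum(length for _, length in records)
    lines.append(f"Total {len(records)} records. {total:,} nt")
    return lines
```

Calling format_report(fasta_lengths(path), top=3) would print only the first three records, followed by the totals for the whole file.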
Sorting the records by name often groups the entries into classes, such as chromosomes, scaffolds, and organelles. This can help you decide which rules (regular expressions) to use when parsing the headers for map names, or which sequences to exclude from loading:
Sort by [N]ame; by [L]ength; [T]op records only; ESC-cancel: N
43,270,923 1 dna:chromosome chromosome:IRGSP-1.0:1:1:43270923:1
23,207,287 10 dna:chromosome chromosome:IRGSP-1.0:10:1:23207287:1
29,021,106 11 dna:chromosome chromosome:IRGSP-1.0:11:1:29021106:1
27,531,856 12 dna:chromosome chromosome:IRGSP-1.0:12:1:27531856:1
35,937,250 2 dna:chromosome chromosome:IRGSP-1.0:2:1:35937250:1
36,413,819 3 dna:chromosome chromosome:IRGSP-1.0:3:1:36413819:1
35,502,694 4 dna:chromosome chromosome:IRGSP-1.0:4:1:35502694:1
29,958,434 5 dna:chromosome chromosome:IRGSP-1.0:5:1:29958434:1
31,248,787 6 dna:chromosome chromosome:IRGSP-1.0:6:1:31248787:1
29,697,621 7 dna:chromosome chromosome:IRGSP-1.0:7:1:29697621:1
28,443,022 8 dna:chromosome chromosome:IRGSP-1.0:8:1:28443022:1
23,012,720 9 dna:chromosome chromosome:IRGSP-1.0:9:1:23012720:1
32,941 AC155918 dna:scaffold scaffold:IRGSP-1.0:AC155918:1:32941:1
88,500 AC156495 dna:scaffold scaffold:IRGSP-1.0:AC156495:1:88500:1
128,256 AC160949 dna:scaffold scaffold:IRGSP-1.0:AC160949:1:128256:1
15,426 AC174930 dna:scaffold scaffold:IRGSP-1.0:AC174930:1:15426:1
206,004 AP008246 dna:scaffold scaffold:IRGSP-1.0:AP008246:1:206004:1
157,458 AP008247 dna:scaffold scaffold:IRGSP-1.0:AP008247:1:157458:1
14,476 Syng_TIGR_002 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_002:1:14476:1
19,457 Syng_TIGR_004 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_004:1:19457:1
21,787 Syng_TIGR_005 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_005:1:21787:1
7,820 Syng_TIGR_007 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_007:1:7820:1
16,676 Syng_TIGR_008 dna:scaffold scaffold:IRGSP-1.0:Syng_TIGR_008:1:16676:1
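As an illustration of such a rule, the Python sketch below extracts the map name and the record class from headers shaped like the ones listed above. The pattern and the function names classify and chromosomes_only are hypothetical examples; the regular expression syntax accepted by the loading commands may differ.

```python
import re

# Hypothetical pattern for headers such as
#   "1 dna:chromosome chromosome:IRGSP-1.0:1:1:43270923:1"
# name = first whitespace-delimited token, kind = the word after "dna:".
HEADER_RE = re.compile(r"^(?P<name>\S+)\s+dna:(?P<kind>\w+)")


def classify(header):
    """Return (name, kind) for a matching header, or None otherwise."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return match.group("name"), match.group("kind")


def chromosomes_only(headers):
    """Keep only the headers whose class is 'chromosome'."""
    kept = []
    for header in headers:
        parsed = classify(header)
        if parsed is not None and parsed[1] == "chromosome":
            kept.append(header)
    return kept


if __name__ == "__main__":
    headers = [
        "1 dna:chromosome chromosome:IRGSP-1.0:1:1:43270923:1",
        "AC155918 dna:scaffold scaffold:IRGSP-1.0:AC155918:1:32941:1",
    ]
    for h in headers:
        print(classify(h))
```

Filtering on the class in this way would keep the twelve chromosomes from the listing and drop the scaffolds.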