Persephone supports transcript-level gene expression: expression is stored as a set with one value per transcript. Expression values can come from different sources, with RNA-seq being one of the most popular methods. To allow values from different experiments to be compared, the data should be normalized before loading, and every experiment should use the same normalization method. It is up to the researchers who load the data to decide which normalization technique to use.

Note

For backward compatibility, the IsNormalized variable is still accepted in the control file. By default (if IsNormalized is not specified) the value is true, which means that PersephoneShell assumes the data was normalized before loading. Older INI files may contain IsNormalized set to false. If you try to use them, newer versions of PersephoneShell (after February 20, 2017) will refuse to load the data and will produce a warning that the data should be pre-normalized. In older versions, PersephoneShell normalized the data itself (when IsNormalized=false) by calculating the Z-score based on all values in the experiment. We found that offering only one normalization method was too restrictive, so scientists now choose their own method beforehand.
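For data sets that relied on the old behavior, a Z-score can be computed before loading. The sketch below is only one reading of "based on all values in the experiment": it assumes a single mean and a single standard deviation taken over every value of every sample. The function name zscore_all_values and the use of the population standard deviation are assumptions, not a description of the old PersephoneShell code.

```python
import statistics

def zscore_all_values(matrix):
    """Normalize rows of expression values with one mean and one standard
    deviation computed over every value in the experiment (assumed reading)."""
    values = [v for row in matrix for v in row]
    mean = statistics.fmean(values)
    # Population standard deviation; the original choice is not documented.
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("all values are identical; the Z-score is undefined")
    return [[(v - mean) / std for v in row] for row in matrix]

# Two transcripts (rows) measured in two samples (columns).
normalized = zscore_all_values([[1.0, 2.0], [3.0, 4.0]])
```

Any other normalization method can be substituted, as long as the same method is applied to every experiment that will be compared.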

Like other control files, the control INI file for expression has common sections such as ProcessRun, MapSet, etc. Please see more details here.

Currently, only delimited text files are supported. They may contain sample name and additional information in the first several rows. The first column usually contains transcript names:

 

                      BV_A                           BV_B                     BV_D                                BV_H              BV_K              BV_L       ...
(tissue)              leaves                         leaves                   leaves                              leaves            leaves            leaves     ...
(stage)               unknown                        unknown                  unknown                             unknown           unknown           unknown
(treatment)           Control for salt and mannitol  Salt (150mM NaCl, 24hr)  Control for IAA, GA3, BAP, and ABA  BAP (10uM, 24hr)  Heat (24hr, 35C)             ...
PGSC0003DMG400000001  0.2011                         0.6912                   0.3464                              0.46329           0.52498           0.7274     ...
PGSC0003DMG400000002  0.48955                        0.10511                  0.17085                             0.80867           8.9386            0.24704    ...
PGSC0003DMG400000003  0.499                          0.4404                   0.0588                              0.42734           0.8567            0.00842    ...
PGSC0003DMG400000004  0.5521                         0.6303                   0.1432                              0.72328           0.0803            0.0372     ...
PGSC0003DMG400000005  0.34473                        0.21822                  0.72429                             0.03762           0.55906           0.97425    ...
PGSC0003DMG401000006  0.40259                        0.41331                  0.28136                             0.10187           0.05898           0.60103    ...
PGSC0003DMG402000006  0.161589                       0.381256                 0.445893                            0.135183          0.366729          0.0810092  ...

Expression section

The Expression section describes where the source file is located and how it should be parsed. Choose at least one delimiter for a delimited text file. CommitFrequency indicates how often the loading process commits data to the database.

[Expression]
; Source (required): a TXT file or Excel file located locally or remotely accessible via URL.
Source="G:\Genome\Plants\Solanum tuberosum PGSC_DM_v4.03\DM_RH_RNA_Seq_FPKM.txt"
; The floating point numbers should use decimal point in the US format, like 12345.99
; FileType: {Text (delimited text file)|Excel}
FileType=Text
; CommitFrequency: how often the process commits expression data (every N transcripts).
CommitFrequency=10000
; Delimiters: specify delimiters among Colon(:), Comma(,), Period(.), Hyphen(-), SemiColon(;), Slash(/), Tab(\t), VerticalBar(|)
Delimiters=Tab

; TranscriptNameIndex (required): column index(0-based) for transcript names.
TranscriptNameIndex=0
; AnnotationQualifierName (required). Which qualifier contains the transcript name
AnnotationQualifierName="Id"
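To illustrate what the Delimiters setting implies for parsing, here is a minimal sketch that splits one line on any of the named delimiters. The delimiter names come from the control file comment above; the function split_line and the idea of passing the names as a Python list are hypothetical and say nothing about how PersephoneShell parses the file internally.

```python
import re

# Delimiter names as listed in the control file comment.
DELIMITER_CHARS = {
    "Colon": ":", "Comma": ",", "Period": ".", "Hyphen": "-",
    "SemiColon": ";", "Slash": "/", "Tab": "\t", "VerticalBar": "|",
}

def split_line(line, delimiter_names):
    """Split one text line on any of the named delimiters."""
    chars = "".join(DELIMITER_CHARS[name] for name in delimiter_names)
    # A character class matches any single delimiter character.
    pattern = "[" + re.escape(chars) + "]"
    return re.split(pattern, line.rstrip("\r\n"))

fields = split_line("PGSC0003DMG400000001\t0.2011\t0.6912\n", ["Tab"])
```

Note that choosing Period or Hyphen as a delimiter would also split values such as 0.2011 or negative numbers, so Tab is the natural choice for a file like the example.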

Please note that floating point numbers should use a decimal point, as in the US format (for example, 12345.99).

The transcripts should be identified by a unique transcript name. AnnotationQualifierName specifies which qualifier is used to store the transcript name for the given map set.

With the settings above, if the text file contains a transcript name, for example, "Gene001", PersephoneShell will search the given map set for a qualifier "Id" with the value "Gene001" and will identify its internal transcript ID. Please make sure that you are using a qualifier that contains unique transcript names.
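Because the lookup relies on each name matching exactly one transcript, it may be worth checking the name column of the text file for duplicates before loading. The helper below, find_duplicate_names, is a hypothetical pre-check and not part of PersephoneShell; it covers only the names in the file, so uniqueness of the qualifier values in the map set still has to be confirmed separately.

```python
from collections import Counter

def find_duplicate_names(names):
    """Return transcript names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]

duplicates = find_duplicate_names(["Gene001", "Gene002", "Gene001", "Gene003"])
```

An empty result means every name in the column is unique.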

Sample name, tissue, stage, and additional information (qualifiers) can be captured by assigning the index of the corresponding row in the delimited text file. Each additional qualifier is declared as "SampleQualifierIndices.ROW_INDEX(0-based)=QUAL_NAME", which gives the row index and the name of the qualifier to be created. For example, to add a sample qualifier "Treatment" with the values given on row 3 (0-based, that is, the fourth line of the text file), add SampleQualifierIndices.3="Treatment":

;-----------------------------------------------------------------------------
; Sample info (row indices): sample name and other information
;-------------------------
; SampleNameIndex (required): row index(0-based) for sample names.
SampleNameIndex=0
; SampleTissueIndex (required): row index(0-based) for sample tissue information.
SampleTissueIndex=1
; SampleStageIndex (required): row index(0-based) for sample stage information.
SampleStageIndex=2
; SampleQualifiers: row indices(0-based) for additional information.
;SampleQualifierIndices.ROW_INDEX(0-based)=QUAL_NAME
;SampleQualifierIndices.3="Treatment"
;-----------------------------------------------------------------------------
; Data (row/column index)
;-------------------------
; DataStartRowIndex (required): row index (0-based) where expression data begin.
DataStartRowIndex=4
; DataStartColumnIndex (required): column index (0-based) where expression data begin.
DataStartColumnIndex=1
; IsNormalized: indicates whether the expression levels are already normalized (default: true).
; PersephoneShell versions after February 20, 2017 refuse to load data when this is set to false.
;IsNormalized=false

DataStartRowIndex and DataStartColumnIndex give the position of the very first expression value. In the example file above, that is row 4, column 1 (both 0-based).
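To show how the row and column indices map onto a file, the following sketch reads a small tab-delimited table laid out like the example above. It is an illustration only: read_expression and its parameter names are hypothetical stand-ins for SampleNameIndex, SampleTissueIndex, SampleStageIndex, SampleQualifierIndices, DataStartRowIndex, DataStartColumnIndex and TranscriptNameIndex, and the code does not reproduce PersephoneShell's actual loader.

```python
import csv
import io

def read_expression(text, name_row=0, tissue_row=1, stage_row=2,
                    data_row=4, data_col=1, name_col=0, qualifier_rows=None):
    """Split a tab-delimited expression table into sample info and values."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    samples = []
    # Every column from data_col onward describes one sample.
    for col in range(data_col, len(rows[name_row])):
        sample = {
            "name": rows[name_row][col],
            "tissue": rows[tissue_row][col],
            "stage": rows[stage_row][col],
        }
        # Extra qualifiers, e.g. {3: "Treatment"} for row 3.
        for row_index, qual_name in (qualifier_rows or {}).items():
            sample[qual_name] = rows[row_index][col]
        samples.append(sample)
    # One list of floats per transcript; float() expects a decimal point.
    expression = {
        row[name_col]: [float(v) for v in row[data_col:]]
        for row in rows[data_row:]
    }
    return samples, expression

# First two samples and first two transcripts of the example file.
text = (
    "\tBV_A\tBV_B\n"
    "(tissue)\tleaves\tleaves\n"
    "(stage)\tunknown\tunknown\n"
    "(treatment)\tControl for salt and mannitol\tSalt (150mM NaCl, 24hr)\n"
    "PGSC0003DMG400000001\t0.2011\t0.6912\n"
    "PGSC0003DMG400000002\t0.48955\t0.10511\n"
)
samples, expression = read_expression(text, qualifier_rows={3: "Treatment"})
```

Running such a check on a file before loading is a quick way to confirm that the indices in the control file point at the intended rows and columns.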