
RNASeq is a protocol to quantify the transcriptome and to detect novel genes, transcripts, splice sites, exons, and ncRNAs. The focus here is on estimating the relative number of mRNA transcripts in order to compare gene expression between sample phenotypes.

RNASeq is reported to be better than microarrays, with a larger dynamic range, lower background, and greater accuracy and reproducibility (Mortazavi et al., 2008; Nagalakshmi et al., 2008).

The protocol, in brief, is as follows. cDNA obtained by reverse transcription of poly-A captured transcripts is sequenced, and the resulting reads are aligned to a reference genome or transcriptome. The relative number of reads for each gene (after normalization) is the measure used to compare expression between samples.

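This normalized measure, Reads Per Kilobase per Million mapped reads (RPKM), is the read count for a gene divided by the gene length in kilobases and by the total number of mapped reads in millions. A worked example follows.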

Say gene G has a length of 1000 bp.

Sample A has 100 reads mapping to this gene out of a total of 10 million mappable reads for the entire sample, which gives RPKM = 100/(1*10) = 10.

Sample B has 25 reads mapping to this gene out of a total of 5 million mappable reads for the entire sample, which gives RPKM = 25/(1*5) = 5.
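The arithmetic above can be written as a short Python sketch. The function name `rpkm` and its parameter names are chosen here for illustration and do not come from any particular tool.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one gene."""
    length_kb = gene_length_bp / 1_000                 # gene length in kilobases
    mapped_millions = total_mapped_reads / 1_000_000   # library size in millions
    return read_count / (length_kb * mapped_millions)


# Gene G (1000 bp) in the two samples from the text
sample_a = rpkm(100, 1000, 10_000_000)
sample_b = rpkm(25, 1000, 5_000_000)
print(sample_a, sample_b, sample_a / sample_b)  # 10.0 5.0 2.0
```

Dividing by the library size is what makes the two samples comparable even though sample A was sequenced twice as deeply.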

However, this 2-fold change in expression might be misleading. For example, gene G in sample A might be expressed as a long 1000 bp isoform while sample B expresses a shorter isoform of length, say, 500 bp.

Using the 500 bp length, sample B has RPKM = 25/(0.5*5) = 10, so there is no change in expression between the two samples.

A better way to estimate gene abundance is to add up the individual isoform abundances. However, this raises the problem of assigning short reads to specific isoforms of a gene.
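As a minimal sketch of summing isoform-level values, the Python below assumes the reads have already been assigned to isoforms, which is the hard part in practice. The helper names and the `(reads, length_bp)` tuple layout are hypothetical, and the numbers reuse the gene G example.

```python
def isoform_rpkm(reads, length_bp, total_mapped_reads):
    """RPKM computed against the length of one specific isoform."""
    return reads / ((length_bp / 1_000) * (total_mapped_reads / 1_000_000))


def gene_abundance(isoforms, total_mapped_reads):
    """Sum per-isoform RPKM values.

    isoforms: list of (reads_assigned_to_isoform, isoform_length_bp) tuples.
    """
    return sum(isoform_rpkm(r, l, total_mapped_reads) for r, l in isoforms)


# Sample A expresses only the 1000 bp isoform, sample B only the 500 bp one
gene_a = gene_abundance([(100, 1000)], 10_000_000)
gene_b = gene_abundance([(25, 500)], 5_000_000)
print(gene_a, gene_b)  # 10.0 10.0
```

With each isoform normalized by its own length, the apparent 2-fold difference between the samples disappears.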

In addition, there is the problem of short reads that map to several genes in the same family or to several isoforms. Some algorithms discard these reads while others assign them in proportion to uniquely mapped reads. The discards are reportedly significant: for example, 17% of reads for the mouse.
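The proportional assignment idea can be illustrated with a simplified Python sketch. It splits each multi-mapped read across its candidate genes by their raw unique read counts. This is an assumption made for illustration; published methods may weight by other quantities, and the function name, gene names, and counts here are hypothetical.

```python
from collections import defaultdict


def allocate_multireads(unique_counts, multireads):
    """Distribute multi-mapped reads in proportion to unique read counts.

    unique_counts: dict mapping gene -> number of uniquely mapped reads.
    multireads: iterable of tuples, each listing the genes one read maps to.
    Returns a dict of gene -> fractional read count.
    """
    counts = defaultdict(float, {g: float(c) for g, c in unique_counts.items()})
    for candidates in multireads:
        weights = [unique_counts.get(g, 0) for g in candidates]
        total = sum(weights)
        for gene, weight in zip(candidates, weights):
            # Fall back to an equal split when no candidate has unique reads
            share = weight / total if total else 1 / len(candidates)
            counts[gene] += share
    return dict(counts)


unique = {"G1": 30, "G2": 10}
multi = [("G1", "G2")] * 8  # eight reads that map to both genes
result = allocate_multireads(unique, multi)
print(result)  # {'G1': 36.0, 'G2': 12.0}
```

Discarding the eight ambiguous reads instead would leave the counts at 30 and 10, an undercount for both genes.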

RSEM (http://bioinformatics.oxfordjournals.org/content/26/4/493.full) improves on earlier methods (http://www.ncbi.nlm.nih.gov/pubmed/18516045) for estimating gene abundance by assigning reads to specific isoforms using a rigorous statistical approach. Gene quantities are then estimated by summing up the isoform values.

The TCGA portal provides RNASeq gene relative abundance data for various cancer samples, processed through both methods. Older data is available under the data type RNASeq, while output from RSEM is available under the data type RNASeqV2. The same data is available in a consolidated format on the Broad website.


 

Tumor type Download
ACC http://chilp.it/ba2da0
BLCA http://chilp.it/74937d
BRCA http://chilp.it/d60b60
CESC http://chilp.it/485bb5
COAD http://chilp.it/c35e79
DLBC http://chilp.it/ebf922
GBM http://chilp.it/0618cd
HNSC http://chilp.it/674236
KICH http://chilp.it/e73f2a
KIRC http://chilp.it/8f0b0c
KIRP http://chilp.it/2dc8b9
LAML http://chilp.it/dc9a33
LGG http://chilp.it/cc3acb
LIHC http://chilp.it/0134e1
LUAD http://chilp.it/604e32
LUSC http://chilp.it/ad8ca8
OVCA http://chilp.it/05db59
PAAD http://chilp.it/6e73ed
PRAD http://chilp.it/df572f
READ http://chilp.it/390d8d
SARC http://chilp.it/110603
SKCM http://chilp.it/0c6ea8
THCA http://chilp.it/d9d09d
UCEC http://chilp.it/2fa5fc
UCS http://chilp.it/d61b15