【佳学基因检测 (Jiaxue Genetic Testing)】Gene-decoding data sources: the tximport method for transcript abundances

Transcript abundances and the tximport pipeline

Before we demonstrate how to align and then count RNA-seq fragments, we mention a newer and faster alternative pipeline. It uses a transcript abundance quantification method such as Sailfish [8], Salmon [9], kallisto [10] or RSEM [11] to estimate abundances without aligning reads, followed by the tximport package, which assembles count and offset matrices for use with Bioconductor differential gene expression packages. We added this alternative during the revision of this workflow; the material that follows, however, covers generation of count matrices through alignment and read/fragment counting.

Using a transcript abundance quantifier together with tximport to produce gene-level count matrices and normalizing offsets has three advantages: it corrects for potential changes in gene length across samples (e.g., from differential isoform usage) [12]; some of these methods are substantially faster and require less memory and disk space than alignment-based methods; and it avoids discarding fragments that can align to multiple genes with homologous sequence [13]. Note that transcript abundance quantifiers skip generating the large files that store read alignments, and instead produce smaller files that store the estimated abundance, count, and effective length of each transcript. For the methodology, see the manuscript describing the tximport approach [14]; for software details, see the tximport package vignette. After running a transcript quantifier and tximport, the entry point back into this workflow is the section on the data object below.
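To make the gene-length correction concrete, here is a minimal Python sketch of the transcript-to-gene summarization step. It assumes a Salmon-style `quant.sf` table (tab-separated columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`) and a transcript-to-gene mapping, and it roughly mirrors tximport's default summarization: counts and TPM are summed per gene, and the gene length is the TPM-weighted average of the transcripts' effective lengths. The functions `read_quant_sf` and `summarize_to_gene`, the toy gene `G1` with transcripts `T1`/`T2`, and the fallback for unexpressed genes are hypothetical illustrations, not tximport's API or exact behavior.

```python
import csv
import io
from collections import defaultdict


def read_quant_sf(text):
    """Parse Salmon-style quant.sf text (tab-separated, with a header row
    Name, Length, EffectiveLength, TPM, NumReads) into a dict mapping
    transcript id -> (effective_length, tpm, num_reads)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["Name"]: (
            float(row["EffectiveLength"]),
            float(row["TPM"]),
            float(row["NumReads"]),
        )
        for row in reader
    }


def summarize_to_gene(quant, tx2gene):
    """Collapse one sample's transcript estimates to gene level.

    Counts and TPM are summed over each gene's transcripts; the gene length
    is the TPM-weighted mean of the transcripts' effective lengths.
    """
    counts = defaultdict(float)
    tpm = defaultdict(float)
    weighted_len = defaultdict(float)
    tx_lens = defaultdict(list)
    for tx, (eff_len, tx_tpm, reads) in quant.items():
        gene = tx2gene[tx]
        counts[gene] += reads
        tpm[gene] += tx_tpm
        weighted_len[gene] += tx_tpm * eff_len
        tx_lens[gene].append(eff_len)
    gene_len = {}
    for gene in counts:
        if tpm[gene] > 0:
            gene_len[gene] = weighted_len[gene] / tpm[gene]
        else:
            # Simplifying assumption: with no expression, fall back to the
            # unweighted mean effective length of the gene's transcripts.
            gene_len[gene] = sum(tx_lens[gene]) / len(tx_lens[gene])
    return dict(counts), dict(tpm), gene_len


# Hypothetical toy data: gene G1 has a short (T1) and a long (T2) isoform,
# and the dominant isoform switches between samples A and B.
tx2gene = {"T1": "G1", "T2": "G1"}
header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
sample_a = read_quant_sf(header + "T1\t1150\t1000\t90\t450\nT2\t2150\t2000\t10\t100\n")
sample_b = read_quant_sf(header + "T1\t1150\t1000\t10\t50\nT2\t2150\t2000\t90\t900\n")

for name, quant in [("A", sample_a), ("B", sample_b)]:
    counts, tpm, length = summarize_to_gene(quant, tx2gene)
    print(name, counts, length)
# A {'G1': 550.0} {'G1': 1100.0}
# B {'G1': 950.0} {'G1': 1900.0}
```

Because the short isoform dominates in sample A and the long one in sample B, the same gene gets an average length of 1100 in one sample and 1900 in the other. Passing such per-sample gene lengths to downstream packages as an offset is what keeps a shift in isoform usage from being read as a change in gene-level expression. The toy TPM values are not normalized to one million as they would be in a real `quant.sf` file.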

(Editor in charge: 佳学基因 / Jiaxue Gene)