[佳学基因 Gene Testing] This tumour gene-testing technique is impressive, but hard to follow unless you are a specialist
High-throughput sequencing technologies have revolutionized genetic and biomedical research by uncovering alterations responsible for the development of disease. Although considerable progress has been made in germline and somatic variant detection, identification of variants at lower allele frequencies remains hindered by sequencing errors and technical artefacts. This has numerous implications in oncology, particularly in liquid biopsy applications, where tumour DNA fragments may be present at frequencies <0.01%. Sensitive detection is difficult in these scenarios because sequencer error rates average ∼0.1–1%.
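To make the scale of the problem concrete, the figures above can be turned into expected read counts at a single position. This is a back-of-the-envelope sketch only: the depth of 10 000× and the even split of errors across the three alternate bases are illustrative assumptions, not values taken from the study's data.

```python
# Illustrative inputs (assumptions, chosen from the ranges quoted in the text)
depth = 10_000            # reads covering one position
allele_fraction = 0.0001  # tumour allele present at 0.01%
error_rate = 0.001        # per-base sequencer error rate of 0.1%

# Expected number of reads that truly carry the variant
expected_variant_reads = depth * allele_fraction

# Expected number of reads showing the same alternate base by error alone,
# assuming errors are spread evenly over the three possible wrong bases
expected_error_reads = depth * error_rate / 3

print(f"variant reads: {expected_variant_reads:.2f}")
print(f"error reads:   {expected_error_reads:.2f}")
```

Under these assumptions, reads carrying the same base change by error are expected to outnumber reads carrying the true variant, which is why raw read counts alone cannot support calls at this frequency.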
A promising strategy to suppress errors uses unique molecular identifiers (UMIs) to compare multiple reads derived from the same DNA fragment. Errors found only in individual reads are removed, and only variants present across all redundant reads are retained to form a single-strand consensus sequence (SSCS). In addition, strand-aware duplex correction is needed to eliminate artefacts from oxidative damage: by comparing complementary SSCSs, duplex consensus sequences (DCSs) retain only true variants found on both strands of a fragment. While duplex methods allow for greater error suppression, the efficiency of DCS recovery from SSCSs is poor (15–47%) and depends on sequencing coverage.
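The two consensus steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any published tool: the function names, the `min_reads` threshold, the choice to mask disagreeing positions with `N`, and the assumption that reads in a family are equal-length and already in the same orientation are all hypothetical simplifications.

```python
def consensus(reads, min_reads=2):
    """Collapse one UMI family into a single-strand consensus (SSCS).

    A base is kept only if every read in the family agrees on it;
    otherwise the position is masked as 'N' (an assumption of this sketch).
    Returns None when the family has too few reads to form a consensus.
    """
    if len(reads) < min_reads:
        return None
    return "".join(
        column[0] if len(set(column)) == 1 else "N"
        for column in zip(*reads)
    )


def duplex(sscs_plus, sscs_minus):
    """Combine the SSCSs of the two strands into a duplex consensus (DCS).

    Only bases that agree between the complementary strands are kept.
    """
    return "".join(
        a if a == b and a != "N" else "N"
        for a, b in zip(sscs_plus, sscs_minus)
    )


# A sequencing error seen in one read of three is masked at the SSCS step.
sscs_with_read_error = consensus(["ACGTA", "ACGTA", "ACGTC"])

# A damage artefact present in every read of one strand survives the SSCS
# step but is masked once the two strands are compared.
plus = consensus(["ACTTA", "ACTTA"])
minus = consensus(["ACGTA", "ACGTA"])
dcs = duplex(plus, minus)
```

The second example shows why the duplex step matters: a change copied into every read of one strand looks unanimous within that strand's family, so only the comparison with the complementary strand can flag it.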
A major limitation of current UMI-based error correction methods is their dependence on redundant sequencing. This results in poor efficiency, with a low yield of unique sequences despite high sequencing costs. These inefficiencies are further magnified in duplex UMI methods, where both strands of a molecule must be redundantly sequenced. This is problematic, as uneven sequencing often arises from amplification biases, stochastic sampling, and inadequate coverage. These factors limit the applicability of duplex correction to only 0.5–2.5% of sequenced reads (Figure ). Furthermore, current UMI-based strategies do not apply error suppression to single reads (singletons) that have not been redundantly sequenced. This is detrimental, as singletons may account for over half of all reads in a moderately deep sequenced sample (defined as ∼1000×–10 000× coverage in this study).
To address these limitations, we developed a ‘Singleton Correction’ methodology that enables error suppression in singletons (Figure ). By utilizing the large number of singletons present in hybrid capture deep sequencing data, Singleton Correction allows dramatically more sequences to be corrected. Unlike traditional UMI methods that are restricted to redundant reads, our method also eliminates errors in singletons using reads from the complementary strand. Here, we analyzed a combination of cell line and clinical samples and found that Singleton Correction consistently improved the efficiency of traditional duplex correction methods and increased sensitivity while maintaining high specificity for calling low-allele-frequency variants.
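The idea of correcting a singleton against its complementary strand can be illustrated with a short Python sketch. It is an assumption-laden simplification rather than the authors' implementation: the `families` layout, the `collapse` and `correct_singletons` helpers, and the use of `N` to mask disagreeing bases are hypothetical, and reads are assumed to be equal-length and already in the same orientation.

```python
def collapse(reads):
    """Unanimous per-position consensus; disagreeing positions become 'N'."""
    return "".join(
        column[0] if len(set(column)) == 1 else "N"
        for column in zip(*reads)
    )


def correct_singletons(families):
    """Correct each singleton using reads from the complementary strand.

    `families` maps (molecule_id, strand) to the list of reads observed for
    that strand. A singleton is kept only where it agrees with its partner
    strand; singletons without a partner are left out of the result.
    """
    corrected = {}
    for (molecule, strand), reads in families.items():
        if len(reads) != 1:
            continue  # redundantly sequenced families are handled elsewhere
        partner_reads = families.get((molecule, "-" if strand == "+" else "+"))
        if not partner_reads:
            continue  # no complementary-strand read to compare against
        partner = collapse(partner_reads)
        corrected[(molecule, strand)] = "".join(
            a if a == b and a != "N" else "N"
            for a, b in zip(reads[0], partner)
        )
    return corrected


families = {
    ("m1", "+"): ["ACGTA"],           # singleton
    ("m1", "-"): ["ACGTA", "ACGTA"],  # redundantly sequenced partner strand
    ("m2", "+"): ["ACTTA"],           # singleton with a one-strand artefact
    ("m2", "-"): ["ACGTA"],           # partner strand is also a singleton
    ("m3", "+"): ["GGGTA"],           # singleton with no partner strand
}
result = correct_singletons(families)
```

In this toy input, the singleton for `m1` is confirmed by its partner's family, the two `m2` singletons correct each other and mask the position where they disagree, and the `m3` singleton stays uncorrected because no complementary read exists.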
(Editor in charge: 佳学基因)