Karyotype AI for Precision Oncology
Blood (2024)
Abstract
Chromosome analysis is vital for diagnosing genetic disorders and cancer. For hematologic malignancies, identification of somatic clonal aberrations by karyotyping remains the first-line test and drives therapeutic decisions. Clinically, karyotyping plays a unique role in detecting global genomic aberrations on a single-cell basis. However, it is time-consuming because the process is largely manual and requires special expertise. Efforts to automate karyotype analysis have so far fallen short in chromosome aberration detection. A key challenge in applying machine learning to karyotype analysis is the relative scarcity of annotated data, which can limit model performance. To address this, we developed a novel pretraining strategy that leverages the abundance of normal karyotype images. Specifically, we first trained our model on a chromosome classification task using a vast dataset of normal human chromosomes, enabling it to learn fundamental chromosome features. This pretrained model was then fine-tuned on our curated dataset of karyotypes annotated for a variety of chromosomal abnormalities. This transfer learning approach proved remarkably effective, yielding high accuracy despite the limited aberration data. Using a training set of ~10,000 patient specimens and ~50,000 karyograms spanning 5 years (2016-2020) of clinical data, we created a labeled set of images representing individual chromosomes. These individual chromosome images were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations. Among the multiple machine learning models evaluated, the most accurate models for both the chromosome identification and aberration detection tasks used the recently introduced Topological Vision Transformers (TopViTs) with 2-level block-Toeplitz masking to incorporate structural inductive bias.
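To make the phrase "2-level block-Toeplitz masking" concrete, the following is a minimal sketch of such a mask over a grid of image patches. It is an illustration only, not the authors' implementation: the function names `block_toeplitz_mask` and `exp_decay`, the grid size, and the exponential decay are all assumptions introduced here. The one property taken from the definition of a 2-level block-Toeplitz matrix is that each entry depends only on the row offset and the column offset between two patches.

```python
import math


def block_toeplitz_mask(height, width, weight):
    """Build an (H*W) x (H*W) mask over the patches of an H x W grid.

    Entry [p][q] depends only on the signed row and column offsets between
    patches p and q, which makes the matrix 2-level block-Toeplitz.
    """
    n = height * width
    mask = [[0.0] * n for _ in range(n)]
    for p in range(n):
        r1, c1 = divmod(p, width)  # grid position of patch p
        for q in range(n):
            r2, c2 = divmod(q, width)  # grid position of patch q
            mask[p][q] = weight(r1 - r2, c1 - c2)
    return mask


def exp_decay(dr, dc, rate=0.5):
    # Hypothetical weighting (an assumption): damp with Manhattan distance.
    return math.exp(-rate * (abs(dr) + abs(dc)))


# A 3 x 4 patch grid gives a 12 x 12 mask.
mask = block_toeplitz_mask(3, 4, exp_decay)
```

In a masked attention layer, a matrix like this would typically modulate the attention weights element-wise, so that nearby patches influence each other more than distant ones. How the weighting is parameterized and learned in TopViTs is not described in this abstract.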
On the baseline task of chromosome identification, our transformer-based models outperformed CNN (Inception) models, with >99.3% accuracy. When applied to disease aberration detection, these high-performing architectures exhibited accuracies >99% for most aberrations. We tested the model on a diverse set of chromosome aberrations commonly seen in acute myeloid leukemia (AML), chronic myeloid leukemia, and myelodysplastic syndromes (MDS): an intra-chromosomal unbalanced abnormality, del(5q); intra-chromosomal balanced rearrangements, inv(3) and inv(16); and inter-chromosomal translocations, t(9;22), t(9;11), and t(11;19). Notably, we were able to show high-quality performance even in "few-shot" learning scenarios, with limited examples of true aberrations. Incorporating the definition of clonality substantially improved both precision and recall (sensitivity). Furthermore, our attempt to identify aberrant chromosomes de novo showed precision-recall performance comparable to fine-tuning across all aberrations. In particular, del(5q) and t(9;22) returned perfect accuracy, while inv(3) and t(11;19) showed 100% precision with >90% recall. To evaluate the generalizability of our aberration detection models, we used an entirely independent validation set derived from patient samples clinically tested between 2021 and 2022. Across all models and aberrations, precision and recall were high (100% precision and recall in most instances when considering specimen-level detection). The de novo performance on the 2021-2022 dataset matched that on the 2016-2020 dataset across all aberrations. This is reinforced by the clear separation between normal and aberrant chromosomes seen in UMAP projections for our most frequent aberrations, t(9;22) and del(5q).
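The effect of a clonality rule on specimen-level calls can be sketched as follows. This is a toy illustration under stated assumptions, not the study's pipeline: it assumes the common cytogenetic convention that a structural abnormality is called clonal when it is seen in at least two metaphase cells (the abstract does not state the exact rule used), and the specimen names, per-cell calls, and function names are made up for the example.

```python
def specimen_call(cell_calls, min_cells=2):
    """Return True when at least `min_cells` metaphases carry the aberration."""
    return sum(1 for call in cell_calls if call) >= min_cells


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (invented for illustration): per-metaphase model calls per specimen.
specimens = {
    "A": [True, True, False, True],    # truly aberrant
    "B": [False, True, False, False],  # normal, one spurious per-cell call
    "C": [False, False, False],        # normal
    "D": [True, False, True],          # truly aberrant
}
truth = {"A": True, "B": False, "C": False, "D": True}

names = sorted(specimens)
actual = [truth[n] for n in names]

# Flag a specimen on any single positive cell versus requiring clonality.
any_hit = [specimen_call(specimens[n], min_cells=1) for n in names]
clonal = [specimen_call(specimens[n], min_cells=2) for n in names]

print("any-hit:", precision_recall(any_hit, actual))
print("clonal: ", precision_recall(clonal, actual))
```

In this toy data, requiring two concordant cells removes the single spurious call in specimen B, so precision rises while recall is unchanged. The example shows only the precision side of the reported effect and says nothing about how recall improved in the study.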
This is the first study demonstrating the ability of a karyotype AI model to accurately detect chromosome aberrations at a level approaching expert performance. Our assembled dataset, spanning seven years of clinical data and encompassing 6,319 unique patients, represents one of the largest resources for karyotype machine learning. These results open up exciting opportunities for precision oncology, not only expediting patient results but also providing a scalable technology for early detection of minimal disease or subclonal lesions. The ability to analyze hundreds of metaphases per specimen would increase the sensitivity of the assay and reveal further details about the clonal architecture of these diseases.
Keywords
Bioimage Analysis, Phenotypic Profiling, Medical Image Analysis, Cancer Genomics