Using Artificial Intelligence to Document the Hidden RNA Virosphere
Cell (2024)
Abstract
RNA viruses are diverse and abundant components of global ecosystems. The metagenomic identification of RNA viruses is currently limited to those that exhibit sequence similarity to known viruses. Consequently, the detection of highly divergent viruses with poor sequence similarity to known viruses remains a challenging task. We developed a deep learning algorithm, termed LucaProt, to identify highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes from diverse global ecosystems. LucaProt integrates both sequence and structural information to accurately and efficiently detect RdRP sequences. With this approach, we identified 161,979 putative RNA virus species and 180 RNA virus supergroups, of which only 21 contained members of phyla or classes currently defined by the International Committee on Taxonomy of Viruses; many of the groups were either undescribed or poorly characterized in previous studies. The newly identified RNA viruses were present in diverse ecological settings, including the air, hot springs, and hydrothermal vents, and both virus diversity and abundance varied substantially among ecosystems. We also identified the longest RNA virus genome (a nido-like virus) documented to date, at 47,250 nucleotides. This study marks the beginning of a new era of virus discovery, providing computational tools that will help expand our understanding of the global RNA virosphere and of virus evolution.

Competing Interest Statement: The authors have declared no competing interest.