1
Sullivan DK, Pachter L. Flexible parsing, interpretation, and editing of technical sequences with splitcode. Bioinformatics 2024; 40:btae331. [PMID: 38876979 PMCID: PMC11193061 DOI: 10.1093/bioinformatics/btae331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 03/14/2024] [Accepted: 06/12/2024] [Indexed: 06/16/2024] Open
Abstract
MOTIVATION: Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. RESULTS: We present a tool called splitcode that enables flexible and efficient parsing, interpreting, and editing of sequencing reads. This versatile tool facilitates simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays. AVAILABILITY AND IMPLEMENTATION: The splitcode program is available at http://github.com/pachterlab/splitcode.
Affiliation(s)
- Delaney K Sullivan
  - UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, United States
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, United States
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, United States
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, United States
2
Jiang S, Evans-Yamamoto D, Bersenev D, Palaniappan SK, Yachie-Kinoshita A. ProtoCode: Leveraging large language models (LLMs) for automated generation of machine-readable PCR protocols from scientific publications. SLAS Technol 2024; 29:100134. [PMID: 38670311 DOI: 10.1016/j.slast.2024.100134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 04/03/2024] [Accepted: 04/22/2024] [Indexed: 04/28/2024]
Abstract
Protocol standardization and sharing are crucial for reproducibility in life sciences. In spite of numerous efforts for standardized protocol description, adherence to these standards in literature remains largely inconsistent. Curation of protocols is especially challenging due to the labor-intensive process, requiring expert domain knowledge of each experimental procedure. Recent advancements in Large Language Models (LLMs) offer a promising solution to interpret and curate knowledge from complex scientific literature. In this work, we develop ProtoCode, a tool leveraging fine-tuned LLMs to curate protocols into intermediate representation formats that can be interpreted by both human and machine interfaces. Our proof-of-concept, focused on polymerase chain reaction (PCR) protocols, retrieves information from PCR protocols with an accuracy ranging from 69% to 100%, depending on the information content. In all tested protocols, we demonstrate that ProtoCode successfully converts literature-based protocols into correct operational files for multiple thermal cycler systems. In conclusion, ProtoCode can alleviate labor-intensive curation and standardization of life science protocols to enhance research reproducibility by providing a reliable, automated means to process and standardize protocols. ProtoCode is freely available as a web server at https://curation.taxila.io/ProtoCode/.
Affiliation(s)
- Shuo Jiang
  - SBX BioSciences, Inc. 1600 - 925 West Georgia Street, Vancouver, BC, V6C 3L2, Canada
- Daniel Evans-Yamamoto
  - The Systems Biology Institute, Saisei Ikedayama Bldg., 5-10-25, Higashi Gotanda, Shinagawa-ku, Tokyo, 141-0022, Japan
- Dennis Bersenev
  - SBX BioSciences, Inc. 1600 - 925 West Georgia Street, Vancouver, BC, V6C 3L2, Canada
- Sucheendra K Palaniappan
  - The Systems Biology Institute, Saisei Ikedayama Bldg., 5-10-25, Higashi Gotanda, Shinagawa-ku, Tokyo, 141-0022, Japan
- Ayako Yachie-Kinoshita
  - SBX BioSciences, Inc. 1600 - 925 West Georgia Street, Vancouver, BC, V6C 3L2, Canada
  - The Systems Biology Institute, Saisei Ikedayama Bldg., 5-10-25, Higashi Gotanda, Shinagawa-ku, Tokyo, 141-0022, Japan
3
Ren L, Huang D, Liu H, Ning L, Cai P, Yu X, Zhang Y, Luo N, Lin H, Su J, Zhang Y. Applications of single-cell omics and spatial transcriptomics technologies in gastric cancer (Review). Oncol Lett 2024; 27:152. [PMID: 38406595 PMCID: PMC10885005 DOI: 10.3892/ol.2024.14285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 01/19/2024] [Indexed: 02/27/2024] Open
Abstract
Gastric cancer (GC) is a prominent contributor to global cancer-related mortality, and a deeper understanding of its molecular characteristics and tumor heterogeneity is required. Single-cell omics and spatial transcriptomics (ST) technologies have revolutionized cancer research by enabling the exploration of cellular heterogeneity and molecular landscapes at the single-cell level. In the present review, an overview of the advancements in single-cell omics and ST technologies and their applications in GC research is provided. Firstly, multiple single-cell omics and ST methods are discussed, highlighting their ability to offer unique insights into gene expression, genetic alterations, epigenomic modifications, protein expression patterns and cellular location in tissues. Furthermore, a summary is provided of key findings from previous research on single-cell omics and ST methods used in GC, which have provided valuable insights into genetic alterations, tumor diagnosis and prognosis, tumor microenvironment analysis, and treatment response. In summary, the application of single-cell omics and ST technologies has revealed the levels of cellular heterogeneity and the molecular characteristics of GC, and holds promise for improving diagnostics, personalized treatments and patient outcomes in GC.
Affiliation(s)
- Liping Ren
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, Sichuan 611844, P.R. China
- Danni Huang
  - Department of Radiology, Central South University Xiangya School of Medicine Affiliated Haikou People's Hospital, Haikou, Hainan 570208, P.R. China
- Hongjiang Liu
  - School of Computer Science and Technology, Aba Teachers College, Aba, Sichuan 624099, P.R. China
- Lin Ning
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, Sichuan 611844, P.R. China
- Peiling Cai
  - School of Basic Medical Sciences, Chengdu University, Chengdu, Sichuan 610106, P.R. China
- Xiaolong Yu
  - Hainan Yazhou Bay Seed Laboratory, Sanya Nanfan Research Institute, Material Science and Engineering Institute of Hainan University, Sanya, Hainan 572025, P.R. China
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan 611137, P.R. China
- Nanchao Luo
  - School of Computer Science and Technology, Aba Teachers College, Aba, Sichuan 624099, P.R. China
- Hao Lin
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, P.R. China
- Jinsong Su
  - Research Institute of Integrated Traditional Chinese Medicine and Western Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan 611137, P.R. China
- Yinghui Zhang
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, Sichuan 611844, P.R. China
4
Sullivan DK, Pachter L. Flexible parsing, interpretation, and editing of technical sequences with splitcode. bioRxiv 2023:2023.03.20.533521. [PMID: 36993532 PMCID: PMC10055216 DOI: 10.1101/2023.03.20.533521] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/31/2023]
Abstract
Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. We present a tool called splitcode that enables flexible and efficient parsing, interpreting, and editing of sequencing reads. This versatile tool facilitates simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays.
Affiliation(s)
- Delaney K. Sullivan
  - UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA
5
Greenstreet L, Afanassiev A, Kijima Y, Heitz M, Ishiguro S, King S, Yachie N, Schiebinger G. DNA-GPS: A theoretical framework for optics-free spatial genomics and synthesis of current methods. Cell Syst 2023; 14:844-859.e4. [PMID: 37751737 DOI: 10.1016/j.cels.2023.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 04/19/2023] [Accepted: 08/25/2023] [Indexed: 09/28/2023]
Abstract
While single-cell sequencing technologies provide unprecedented insights into genomic profiles at the cellular level, they lose the spatial context of cells. Over the past decade, diverse spatial transcriptomics and multi-omics technologies have been developed to analyze molecular profiles of tissues. In this article, we categorize current spatial genomics technologies into three classes: optical imaging, positional indexing, and mathematical cartography. We discuss trade-offs in resolution and scale, identify limitations, and highlight synergies between existing single-cell and spatial genomics methods. Further, we propose DNA-GPS (global positioning system), a theoretical framework for large-scale optics-free spatial genomics that combines ideas from mathematical cartography and positional indexing. DNA-GPS has the potential to achieve scalable spatial genomics for multiple measurement modalities, and by eliminating the need for optical measurement, it has the potential to position cells in three dimensions (3D).
Affiliation(s)
- Laura Greenstreet
  - Department of Mathematics, The University of British Columbia, Vancouver, BC, Canada
- Anton Afanassiev
  - Department of Mathematics, The University of British Columbia, Vancouver, BC, Canada
- Yusuke Kijima
  - School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
  - Department of Aquatic Bioscience, The University of Tokyo, Tokyo, Japan
- Matthieu Heitz
  - Department of Mathematics, The University of British Columbia, Vancouver, BC, Canada
- Soh Ishiguro
  - School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
- Samuel King
  - School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
- Nozomu Yachie
  - School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
  - Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
  - Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, Osaka, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Geoffrey Schiebinger
  - Department of Mathematics, The University of British Columbia, Vancouver, BC, Canada
  - School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada