1
Wei F, Kouro T, Nakamura Y, Ueda H, Iiizumi S, Hasegawa K, Asahina Y, Kishida T, Morinaga S, Himuro H, Horaguchi S, Tsuji K, Mano Y, Nakamura N, Kawamura T, Sasada T. Enhancing Mass spectrometry-based tumor immunopeptide identification: machine learning filter leveraging HLA binding affinity, aliphatic index and retention time deviation. Comput Struct Biotechnol J 2024; 23:859-869. [PMID: 38356658 PMCID: PMC10864759 DOI: 10.1016/j.csbj.2024.01.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 02/16/2024] Open
Abstract
Accurately identifying neoantigens is crucial for developing effective cancer vaccines and improving tumor immunotherapy. Mass spectrometry-based immunopeptidomics has emerged as a promising approach to identifying human leukocyte antigen (HLA) peptides presented on the surface of cancer cells, but false-positive identifications remain a significant challenge. In this study, liquid chromatography-tandem mass spectrometry-based proteomics and next-generation sequencing were used to identify HLA-presented neoantigenic peptides resulting from non-synonymous single nucleotide variations in tumor tissues from 18 patients with renal cell carcinoma or pancreatic cancer. Machine learning was used to evaluate Mascot identifications by predicting MS/MS spectral consistency. Four descriptors for each candidate sequence (the maximum Mascot ion score, predicted HLA binding affinity, aliphatic index, and retention time deviation) were selected as important features for filtering out identifications with inadequate fragmentation consistency. This suggests that incorporating rescoring filters based on peptide physicochemical characteristics could enhance the identification rate of MS-based immunopeptidomics compared to the traditional Mascot approach predominantly used for proteomics, indicating the potential for optimizing neoantigen identification pipelines as well as clinical applications.
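One of the descriptors named in the abstract, the aliphatic index, has a simple closed form. The sketch below uses the standard definition (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu). The abstract does not say which implementation the authors used, so the function name `aliphatic_index` and the input handling are assumptions made for illustration.

```python
def aliphatic_index(peptide: str) -> float:
    """Aliphatic index of a peptide, standard definition.

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue.
    Hypothetical helper; not taken from the cited study.
    """
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")

    def mole_percent(residue: str) -> float:
        # Share of this residue among all residues, in percent
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )
```

For a typical 9-mer HLA class I ligand, the value reflects how much of the sequence is made up of aliphatic side chains; a peptide with no A, V, I or L scores zero.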
Affiliation(s)
- Feifei Wei
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Cancer Vaccine and Immunotherapy Center, Kanagawa Cancer Center, Yokohama, Japan
- Taku Kouro
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Cancer Vaccine and Immunotherapy Center, Kanagawa Cancer Center, Yokohama, Japan
- Yuko Nakamura
- Isotope Science Center, The University of Tokyo, Tokyo, Japan
- Hiroki Ueda
- Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
- Susumu Iiizumi
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Research & Early Development Division, BrightPath Biotherapeutics Co., Ltd., Kawasaki, Japan
- Kyoko Hasegawa
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Research & Early Development Division, BrightPath Biotherapeutics Co., Ltd., Kawasaki, Japan
- Yuki Asahina
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Takeshi Kishida
- Department of Urology, Kanagawa Cancer Center, Yokohama, Japan
- Soichiro Morinaga
- Department of Hepato-Biliary and Pancreatic Surgery, Kanagawa Cancer Center, Yokohama, Japan
- Hidetomo Himuro
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Cancer Vaccine and Immunotherapy Center, Kanagawa Cancer Center, Yokohama, Japan
- Shun Horaguchi
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Cancer Vaccine and Immunotherapy Center, Kanagawa Cancer Center, Yokohama, Japan
- Department of Pediatric Surgery, Nihon University School of Medicine, Tokyo, Japan
- Kayoko Tsuji
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Cancer Vaccine and Immunotherapy Center, Kanagawa Cancer Center, Yokohama, Japan
- Yasunobu Mano
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Cancer Vaccine and Immunotherapy Center, Kanagawa Cancer Center, Yokohama, Japan
- Norihiro Nakamura
- Research & Early Development Division, BrightPath Biotherapeutics Co., Ltd., Kawasaki, Japan
- Tetsuro Sasada
- Division of Cancer Immunotherapy, Kanagawa Cancer Center Research Institute, Yokohama, Japan
- Cancer Vaccine and Immunotherapy Center, Kanagawa Cancer Center, Yokohama, Japan
2
Humphries EM, Xavier D, Ashman K, Hains PG, Robinson PJ. High-Throughput Proteomics and Phosphoproteomics of Rat Tissues Using Microflow Zeno SWATH. J Proteome Res 2024; 23:2355-2366. [PMID: 38819404 DOI: 10.1021/acs.jproteome.4c00010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
High-throughput tissue proteomics has great potential in the advancement of precision medicine. Here, we investigated the combined sensitivity of trap-elute microflow liquid chromatography with a ZenoTOF for DIA proteomics and phosphoproteomics. Method optimization was conducted on HEK293T cell lines to determine the optimal variable window size, MS2 accumulation time and gradient length. The ZenoTOF 7600 was then compared to the previous generation TripleTOF 6600 using eight rat organs, finding up to 23% more proteins using a fifth of the sample load and a third of the instrument time. Spectral reference libraries generated from Zeno SWATH data in FragPipe (MSFragger-DIA/DIA-NN) contained 4 times more fragment ions than the DIA-NN only library and quantified more proteins. Replicate single-shot phosphopeptide enrichments of 50-100 μg of rat tryptic peptide were analyzed by microflow HPLC using Zeno SWATH without fractionation. Using Spectronaut we quantified a shallow phosphoproteome containing 1000-3000 phosphoprecursors per organ. Promisingly, clear hierarchical clustering of organs was observed with high Pearson correlation coefficients >0.95 between replicate enrichments and median CV of 20%. The combined sensitivity of microflow HPLC with Zeno SWATH allows for the high-throughput quantitation of an extensive proteome and shallow phosphoproteome from small tissue samples.
Affiliation(s)
- Erin M Humphries
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales 2145, Australia
- Dylan Xavier
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales 2145, Australia
- Keith Ashman
- Sciex, 96 Ricketts Road, Mount Waverley, Victoria 3149, Australia
- Peter G Hains
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales 2145, Australia
- Phillip J Robinson
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales 2145, Australia
3
Cao X, Sun S, Xing J. A Massive Proteogenomic Screen Identifies Thousands of Novel Peptides From the Human "Dark" Proteome. Mol Cell Proteomics 2024; 23:100719. [PMID: 38242438 PMCID: PMC10867589 DOI: 10.1016/j.mcpro.2024.100719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 01/01/2024] [Accepted: 01/16/2024] [Indexed: 01/21/2024] Open
Abstract
Although the human gene annotation has been continuously improved over the past 2 decades, numerous studies demonstrated the existence of a "dark proteome", consisting of proteins that were critical for biological processes but not included in widely used gene catalogs. The Genotype-Tissue Expression project generated more than 15,000 RNA-seq datasets from multiple tissues, which modeled 30 million transcripts in the human genome. To provide a resource of high-confidence novel proteins from the dark proteome, we screened 50,000 mass spectrometry runs from over 900 projects to identify proteins translated from the Genotype-Tissue Expression transcript model with proteomic support. We also integrated 3.8 million common genetic variants from the gnomAD database to improve peptide identification. As a result, we identified 170,529 novel peptides with proteomic evidence, of which 6048 passed the strictest standard we defined and were supported by PepQuery. We provided a user-friendly website (https://ncorf.genes.fun/) for researchers to check the evidence of novel peptides from their studies. The findings will improve our understanding of coding genes and facilitate genomic data interpretation in biomedical research.
Affiliation(s)
- Xiaolong Cao
- Department of Anesthesiology, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, China; Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Siqi Sun
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Jinchuan Xing
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA.
4
Dimonaco NJ, Clare A, Kenobi K, Aubrey W, Creevey CJ. StORF-Reporter: finding genes between genes. Nucleic Acids Res 2023; 51:11504-11517. [PMID: 37897345 PMCID: PMC10682499 DOI: 10.1093/nar/gkad814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 09/04/2023] [Accepted: 09/27/2023] [Indexed: 10/30/2023] Open
Abstract
Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns regions that may contain missing CDS genes from unannotated regions. StORF-Reporter consists of two parts. The first begins with the extraction of unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show this methodology recovers genes missing from canonical genome annotations. We inspect the results of the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
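The stop-to-stop idea described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not StORF-Reporter's implementation: the function name `find_storfs`, the `min_len` default, the forward-strand-only scan, and the choice to report coordinates that include both flanking stop codons are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_storfs(region: str, min_len: int = 30):
    """Find stop-to-stop ORFs on the forward strand of an unannotated region.

    Returns a list of (start, end, frame) tuples, where `start` is the index
    of the first base of the opening stop codon and `end` is one past the
    last base of the closing stop codon. Illustrative sketch only.
    """
    region = region.upper()
    hits = []
    for frame in range(3):
        prev_stop = None
        # Walk the region codon by codon in this reading frame
        for i in range(frame, len(region) - 2, 3):
            if region[i:i + 3] in STOP_CODONS:
                # Two consecutive in-frame stops delimit one candidate
                if prev_stop is not None and (i + 3 - prev_stop) >= min_len:
                    hits.append((prev_stop, i + 3, frame))
                prev_stop = i
    return hits
```

Because no start codon is required, a candidate found this way can still contain a gene that begins with an alternative start codon, which is the case the abstract highlights. A fuller version would also scan the reverse complement.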
Affiliation(s)
- Nicholas J Dimonaco
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3PD, Wales, UK
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Department of Medicine, McMaster University, Hamilton, ON, Canada
- Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, ON, Canada
- School of Biological Sciences, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, UK
- Amanda Clare
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Kim Kenobi
- Department of Mathematics, Aberystwyth University, Aberystwyth SY23 3BZ, Wales, UK
- Wayne Aubrey
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Christopher J Creevey
- School of Biological Sciences, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, UK
5
Edelbo BL, Andreassen SN, Steffensen AB, MacAulay N. Day-night fluctuations in choroid plexus transcriptomics and cerebrospinal fluid metabolomics. PNAS Nexus 2023; 2:pgad262. [PMID: 37614671 PMCID: PMC10443925 DOI: 10.1093/pnasnexus/pgad262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 07/06/2023] [Accepted: 07/31/2023] [Indexed: 08/25/2023]
Abstract
The cerebrospinal fluid (CSF) provides mechanical protection for the brain and serves as a brain dispersion route for nutrients, hormones, and metabolic waste. The CSF secretion rate is elevated in the dark phase in both humans and rats, which could support the CSF flow along the paravascular spaces that may be implicated in waste clearance. The similar diurnal CSF dynamics pattern observed in the day-active human and the nocturnal rat suggests a circadian regulation of this physiological variable, rather than sleep itself. To obtain a catalog of potential molecular drivers that could provide the day-night-associated modulation of the CSF secretion rate, we determined the diurnal fluctuation in the rat choroid plexus transcriptomic profile with RNA-seq and in the CSF metabolomics with ultraperformance liquid chromatography combined with mass spectrometry. We detected significant fluctuation of 19 CSF metabolites and differential expression of 2,778 choroid plexus genes between the light and the dark phase, the latter of which encompassed circadian rhythm-related genes and several choroid plexus transport mechanisms. The fluctuating components were organized with joint pathway analysis, of which several pathways demonstrated diurnal regulation. Our results illustrate substantial transcriptional and metabolic light-dark phase-mediated changes taking place in the rat choroid plexus and its encircling CSF. The combined data provide directions toward future identification of the molecular pathways governing the fluctuation of this physiological process and could potentially be harnessed to modulate the CSF dynamics in pathology.
Affiliation(s)
- Nanna MacAulay
- Department of Neuroscience, University of Copenhagen, 2200 Copenhagen, Denmark
6
Scott AM, Karlsson C, Mohanty T, Hartman E, Vaara ST, Linder A, Malmström J, Malmström L. Generalized precursor prediction boosts identification rates and accuracy in mass spectrometry based proteomics. Commun Biol 2023; 6:628. [PMID: 37301900 PMCID: PMC10257694 DOI: 10.1038/s42003-023-04977-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 05/24/2023] [Indexed: 06/12/2023] Open
Abstract
Data independent acquisition mass spectrometry (DIA-MS) has recently emerged as an important method for the identification of blood-based biomarkers. However, the large search space required to identify novel biomarkers from the plasma proteome can introduce a high rate of false positives that compromise the accuracy of false discovery rates (FDR) using existing validation methods. We developed a generalized precursor scoring (GPS) method trained on 2.75 million precursors that can confidently control FDR while increasing the number of identified proteins in DIA-MS independent of the search space. We demonstrate how GPS can generalize to new data, increase protein identification rates, and increase the overall quantitative accuracy. Finally, we apply GPS to the identification of blood-based biomarkers and identify a panel of proteins that are highly accurate in discriminating between subphenotypes of septic acute kidney injury from undepleted plasma to showcase the utility of GPS in discovery DIA-MS proteomics.
Affiliation(s)
- Aaron M Scott
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden.
- Christofer Karlsson
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden
- Tirthankar Mohanty
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden
- Erik Hartman
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden
- Suvi T Vaara
- Division of Anaesthesia and Intensive Care Medicine, Department of Surgery, Intensive Care Units, Helsinki University Central Hospital, Box 340, 00029 HUS, Helsinki, Finland
- Adam Linder
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden
- Johan Malmström
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden
- Lars Malmström
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden.
7
Wang S, Feng S, Pan C, Guo X. FineFDR: Fine-grained Taxonomy-specific False Discovery Rates Control in Metaproteomics. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2022; 2022:287-292. [PMID: 36910011 PMCID: PMC9998077 DOI: 10.1109/bibm55620.2022.9995401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
Microbial community proteomics, also termed metaproteomics, investigates all proteins expressed by a microbiota. Tandem mass spectrometry (MS/MS) is the typical method for identifying proteins in metaproteomics, which involves searching the mass spectra against a protein sequence database. A major post-analysis step is controlling the false discovery rate (FDR), i.e., the ratio of false positives to the total number of annotations. The current popular target-decoy FDR estimation method treats all the peptides and proteins equally and overlooks that they could have varied probabilities of being identified. In this study, we report FineFDR, a framework for FDR assessment at fine-grained levels with taxonomy information considered. FineFDR groups the identified peptide-spectrum matches, peptides, and proteins from different taxonomic units and estimates the FDR in each group separately. Empirical experiments on the simulated and real-world data sets demonstrate that our FineFDR achieved higher precision and more peptide and protein identifications when compared to the state-of-the-art methods, such as Comet, Percolator, TIDD, and Tailor. FineFDR is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/FDR.
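The grouping idea in this abstract (estimate the FDR separately for each taxonomic unit instead of once over all identifications) can be illustrated with the basic target-decoy estimator, decoys divided by targets above a score threshold. This is a hedged sketch, not FineFDR's code: the function name `fdr_by_group`, the tuple layout of the input, and the use of a single fixed threshold are assumptions.

```python
from collections import defaultdict


def fdr_by_group(psms, threshold):
    """Estimate target-decoy FDR separately for each group.

    `psms` is an iterable of (group, score, is_decoy) tuples, where `group`
    is a hypothetical taxonomic label. Only matches with score >= threshold
    are counted. Returns {group: decoys / targets}, or None for a group
    that has no accepted target matches.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [targets, decoys]
    for group, score, is_decoy in psms:
        if score >= threshold:
            counts[group][1 if is_decoy else 0] += 1
    return {
        group: (decoys / targets if targets else None)
        for group, (targets, decoys) in counts.items()
    }
```

A short usage example:

```python
psms = [
    ("taxon_A", 10.0, False),
    ("taxon_A", 9.0, False),
    ("taxon_A", 8.0, True),
    ("taxon_B", 10.0, False),
    ("taxon_B", 5.0, True),
]
print(fdr_by_group(psms, threshold=8.0))
```

With these made-up scores the two groups get different estimates at the same threshold, which is the situation a single pooled FDR would hide.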
Affiliation(s)
- Shengze Wang
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
- Shichao Feng
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
- Chongle Pan
- School of Computer Science; Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019, United States
- Xuan Guo
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States
8
Jiang S, Shi J, Li Y, Zhang Z, Chang L, Wang G, Wu W, Yu L, Dai E, Zhang L, Lyu Z, Xu P, Zhang Y. Mirror proteases of Ac-Trypsin and Ac-LysargiNase precisely improve novel event identifications in Mycolicibacterium smegmatis MC2 155 by proteogenomic analysis. Front Microbiol 2022; 13:1015140. [PMID: 36312923 PMCID: PMC9597629 DOI: 10.3389/fmicb.2022.1015140] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/09/2022] [Accepted: 09/12/2022] [Indexed: 11/22/2022] Open
Abstract
Accurate identification of novel peptides remains challenging because of the lack of evaluation criteria in large-scale proteogenomic studies. Mirror proteases of trypsin and lysargiNase can generate complementary b/y ion series, providing the opportunity to assess authentic novel peptides experimentally rather than filtering potential targets by ranking them at different false discovery rates (FDRs). In this study, a pair of in-house developed acetylated mirror proteases, Ac-Trypsin and Ac-LysargiNase, were used in Mycolicibacterium smegmatis MC2 155 for proteogenomic analysis. The mirror proteases accurately identified 368 novel peptides, exhibiting 75–80% b and y ion coverages against 65–68% y or b ion coverages of Ac-Trypsin (38.9% b and 68.3% y) or Ac-LysargiNase (65.5% b and 39.6% y) as annotated peptides from M. smegmatis MC2 155. The complementary b and y ion series largely increased the reliability of overlapped sequences derived from novel peptides. Among these novel peptides, 311 peptides were annotated in other public M. smegmatis strains, and 57 novel peptides with more continuous b and y pairs were obtained for further analysis after spectral quality assessment. This enabled mirror proteases to successfully correct six annotated proteins' N-termini and detect 17 new coding open reading frames (ORFs). We believe that mirror proteases will be an effective strategy for novel peptide detection in both prokaryotic and eukaryotic proteogenomics.
Affiliation(s)
- Songhao Jiang
- Key Laboratory of Microbial Diversity Research and Application of Hebei, School of Life Sciences, Hebei University, Baoding, China
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
- Jiahui Shi
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
- Yanchang Li
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
- Zhenpeng Zhang
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
- Lei Chang
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
- Guibin Wang
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
- Wenhui Wu
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
- Guangzhou University of Chinese Medicine, Second Clinical Medicine College, Guangzhou Higher Education Mega Center, Guangzhou, China
- Liyan Yu
- Research Unit of Proteomics and Research and Development of New Drug, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Erhei Dai
- The Fifth Hospital of Shijiazhuang, School of Public Health, Shijiazhuang, China
- Lixia Zhang
- Key Research Laboratory for Infectious Disease Prevention for State Administration of Traditional Chinese Medicine, Tianjin Institute of Respiratory Diseases, Haihe Hospital, Tianjin University, Tianjin, China
- Zhitang Lyu
- Key Laboratory of Microbial Diversity Research and Application of Hebei, School of Life Sciences, Hebei University, Baoding, China
- Ping Xu
- Key Laboratory of Microbial Diversity Research and Application of Hebei, School of Life Sciences, Hebei University, Baoding, China
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
- Guangzhou University of Chinese Medicine, Second Clinical Medicine College, Guangzhou Higher Education Mega Center, Guangzhou, China
- Research Unit of Proteomics and Research and Development of New Drug, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- *Correspondence: Ping Xu
- Yao Zhang
- Beijing Proteome Research Center, National Center for Protein Sciences Beijing, State Key Laboratory of Proteomics, Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Institute of Lifeomics, Beijing, China
9
Data Incompleteness May form a Hard-to-Overcome Barrier to Decoding Life’s Mechanism. Biology 2022; 11:biology11081208. [PMID: 36009835 PMCID: PMC9404739 DOI: 10.3390/biology11081208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Revised: 08/03/2022] [Accepted: 08/10/2022] [Indexed: 11/23/2022]
Simple Summary
The influence of data incompleteness on the correctness of conclusions about the structure and functions of the objects under study is widely discussed in the literature. It was noted that even a small percentage of missing data can lead to incorrect conclusions and imperfect knowledge. In particular, incompleteness can lead to critical errors in the qualitative and quantitative assessments of interactions in biological systems and a distorted understanding of the functioning mechanisms of living systems. In this brief review, we attempt to demonstrate the extent of this incompleteness in functional information about living systems using the best-studied examples. We suggest that this incompleteness may form seemingly insurmountable barriers in deciphering the mechanisms of the functioning of complex systems with unpredictable properties arising from the interaction of the system components.
Abstract
In this brief review, we attempt to demonstrate that the incompleteness of data, as well as the intrinsic heterogeneity of biological systems, may form very strong and possibly insurmountable barriers for researchers trying to decipher the mechanisms of the functioning of live systems. We illustrate this challenge using the two most studied organisms: E. coli, with 34.6% genes lacking experimental evidence of function, and C. elegans, with identified proteins for approximately 50% of its genes. Another striking example is an artificial unicellular entity named JCVI-syn3.0, with a minimal set of genes. A total of 31.5% of the genes of JCVI-syn3.0 cannot be ascribed a specific biological function. The human interactome mapping project identified only 5–10% of all protein interactions in humans. In addition, most of the available data are static snapshots, and it is barely possible to generate realistic models of the dynamic processes within cells. Moreover, the existing interactomes reflect the de facto interaction but not its functional result, which is an unpredictable emerging property. Perhaps the completeness of molecular data on any living organism is beyond our reach and represents an unsolvable problem in biology.