1
Jiang S, Qian Q, Zhu T, Zong W, Shang Y, Jin T, Zhang Y, Chen M, Wu Z, Chu Y, Zhang R, Luo S, Jing W, Zou D, Bao Y, Xiao J, Zhang Z. Cell Taxonomy: a curated repository of cell types with multifaceted characterization. Nucleic Acids Res 2022; 51:D853-D860. [PMID: 36161321 PMCID: PMC9825571 DOI: 10.1093/nar/gkac816] [Received: 08/15/2022] [Accepted: 09/24/2022] [Indexed: 01/12/2023]
Abstract
Single-cell studies have delineated cellular diversity and uncovered increasing numbers of previously uncharacterized cell types in complex tissues. Thus, synthesizing growing knowledge of cellular characteristics is critical for dissecting cellular heterogeneity, developmental processes and tumorigenesis at single-cell resolution. Here, we present Cell Taxonomy (https://ngdc.cncb.ac.cn/celltaxonomy), a comprehensive and curated repository of cell types and associated cell markers encompassing a wide range of species, tissues and conditions. Through literature curation and data integration, the current version of Cell Taxonomy establishes a well-structured taxonomy for 3,143 cell types and houses a comprehensive collection of 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. Based on 4,299 publications and single-cell transcriptomic profiles of ∼3.5 million cells, Cell Taxonomy features multifaceted characterization of cell types and cell markers, including quality assessment of cell markers and cell clusters, cross-species comparison, cell composition of tissues and cellular similarity based on markers. Taken together, Cell Taxonomy represents a fundamentally useful reference for systematically and accurately characterizing cell types and thus lays an important foundation for understanding and exploring cellular biology in diverse species.
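Cell Taxonomy reports cellular similarity based on markers, but the abstract does not say which measure it uses. One simple option, assumed here purely for illustration, is the Jaccard index of two cell types' marker sets (markers shared by both types divided by all markers of either type). The function names and the toy marker sets are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared markers divided by the union of both marker sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def marker_similarity(markers: Dict[str, Set[str]]) -> List[Tuple[str, str, float]]:
    """All pairwise cell-type similarities, most similar pair first."""
    pairs = [
        (x, y, jaccard(markers[x], markers[y]))
        for x, y in combinations(sorted(markers), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy marker sets (illustrative only, not taken from Cell Taxonomy)
toy_markers = {
    "CD4+ T cell": {"CD3E", "CD3D", "CD4", "IL7R"},
    "CD8+ T cell": {"CD3E", "CD3D", "CD8A", "CD8B"},
    "B cell": {"CD19", "MS4A1", "CD79A"},
}
# The two T-cell types share 2 of 6 distinct markers, giving 1/3
print(marker_similarity(toy_markers)[0])
```

Jaccard ignores marker specificity and weights every marker equally; a curated resource could reasonably prefer a weighted measure, so this is only a starting point.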
Affiliation(s)
- Wenting Zong
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yunfei Shang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Tong Jin
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yuansheng Zhang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Ming Chen
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zishan Wu
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yuan Chu
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Rongqin Zhang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Sicheng Luo
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Jing
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Dong Zou
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China
- Yiming Bao
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jingfa Xiao
- Correspondence may also be addressed to Jingfa Xiao.
- Zhang Zhang
- To whom correspondence should be addressed. Tel: +86 10 84097261; Fax: +86 10 84097720
2
Dall'Alba G, Casa PL, Abreu FPD, Notari DL, de Avila E Silva S. A Survey of Biological Data in a Big Data Perspective. Big Data 2022; 10:279-297. [PMID: 35394342 DOI: 10.1089/big.2020.0383] [Indexed: 06/14/2023]
Abstract
The amount of available data is continuously growing. This phenomenon has given rise to a new concept, named big data. The key technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). In addition, for data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques show promising results. In a biological context, big data has many applications due to the large number of biological databases available. Some limitations of biological big data are related to the inherent features of these data, such as high degrees of complexity and heterogeneity, since biological systems provide information ranging from the atomic level to interactions between organisms or with their environment. Such characteristics make most bioinformatics applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. To this end, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.
Affiliation(s)
- Gabriel Dall'Alba
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Genome Science and Technology Program, Faculty of Science, The University of British Columbia, Vancouver, Canada
- Pedro Lenz Casa
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Fernanda Pessi de Abreu
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Daniel Luis Notari
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Scheila de Avila E Silva
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
3
Cellular Aquaculture: Prospects and Challenges. Micromachines 2022; 13:mi13060828. [PMID: 35744442 PMCID: PMC9228929 DOI: 10.3390/mi13060828] [Received: 04/02/2022] [Revised: 04/27/2022] [Accepted: 04/28/2022] [Indexed: 02/06/2023]
Abstract
Aquaculture, one of the fastest-growing food-producing sectors, plays an important role in global food and nutritional security. Demand for animal protein in the form of fish has been increasing tremendously, and aquaculture faces many challenges in producing quality fish for the burgeoning world population. Cellular aquaculture can provide an alternative, climate-resilient food production system to produce quality fish. Potential applications of fish muscle cell lines in cellular aquaculture have raised the importance of developing and characterizing these cell lines. In vitro models, such as the mouse C2C12 cell line, have been extremely useful for expanding knowledge about the molecular mechanisms of muscle growth and differentiation in mammals. Such studies are still in their infancy in teleosts because equivalent permanent muscle cell lines are unavailable, apart from a few fish muscle cell lines that have not yet been used for cellular aquaculture. The prospect of cell-based aquaculture relies on the development of appropriate muscle cells, optimization of cell culture conditions, and mass production of cells in bioreactors. Hence, overcoming current cellular aquaculture challenges requires developing and characterizing fish muscle cell lines, cryopreserving them in cell line repositories, and mass-producing cells in suitably designed bioreactors.
4
Mao S, Zhang Y, Seelig G, Kannan S. CellMeSH: probabilistic cell-type identification using indexed literature. Bioinformatics 2022; 38:1393-1402. [PMID: 34893819 PMCID: PMC8826164 DOI: 10.1093/bioinformatics/btab834] [Received: 08/09/2020] [Revised: 11/21/2021] [Accepted: 12/06/2021] [Indexed: 02/04/2023]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity, and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge. RESULTS Here, we introduce CellMeSH, a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene-cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared with manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene-cell-type associations extracted from the literature are noisy. CellMeSH can also optionally use prior knowledge about tissues or cells to further improve annotation. On a number of mouse and human datasets, CellMeSH achieves top-one and top-three accuracies that are consistently better than those of existing approaches. AVAILABILITY AND IMPLEMENTATION Web server at https://uncurl.cs.washington.edu/db_query and API at https://github.com/shunfumao/cellmesh. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
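The abstract does not spell out CellMeSH's probabilistic query model, so the sketch below uses a different, simpler technique as a stand-in to show what querying a gene-cell-type database with a cluster's genes can look like: candidate cell types are ranked by a hypergeometric overlap test between the cluster's top genes and each cell type's marker set. The marker dictionary, the function names and the 20,000-gene background are illustrative assumptions, not CellMeSH's database, model or API.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    # math.comb returns 0 when the lower index exceeds the upper one
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def rank_cell_types(
    cluster_genes: List[str], marker_db: Dict[str, Set[str]], n_genes_total: int
) -> List[Tuple[str, int, float]]:
    """Rank cell types by the overlap p-value of their markers with the cluster genes."""
    query = set(cluster_genes)
    results = []
    for cell_type, markers in marker_db.items():
        overlap = len(query & markers)
        p = hypergeom_sf(overlap, n_genes_total, len(markers), len(query))
        results.append((cell_type, overlap, p))
    # Smallest p-value first; ties broken by larger overlap
    results.sort(key=lambda r: (r[2], -r[1]))
    return results


# Toy database and cluster (hypothetical, for illustration only)
toy_db = {
    "T cell": {"CD3E", "CD3D", "CD2"},
    "B cell": {"CD19", "MS4A1", "CD79A"},
    "NK cell": {"NKG7", "GNLY", "KLRD1"},
}
toy_cluster = ["CD3E", "CD3D", "IL7R", "CD2"]
ranking = rank_cell_types(toy_cluster, toy_db, n_genes_total=20000)
print(ranking[0])  # the T cell entry, with all 3 of its markers in the cluster
```

Unlike this fixed-overlap test, the CellMeSH abstract emphasizes robustness to noisy literature-derived associations, which a plain set-overlap test does not model.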
Affiliation(s)
- Shunfu Mao
- Electrical and Computer Engineering Department, University of Washington, Seattle, WA 98195, USA
- Yue Zhang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Georg Seelig
- Electrical and Computer Engineering Department, University of Washington, Seattle, WA 98195, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Sreeram Kannan
- Electrical and Computer Engineering Department, University of Washington, Seattle, WA 98195, USA
5
He X, Liu L, Chen B, Wu C. Using Cell Type-Specific Genes to Identify Cell-Type Transitions Between Different in vitro Culture Conditions. Front Cell Dev Biol 2021; 9:644261. [PMID: 34249906 PMCID: PMC8267371 DOI: 10.3389/fcell.2021.644261] [Received: 12/20/2020] [Accepted: 04/09/2021] [Indexed: 11/13/2022]
Abstract
In vitro differentiation or expansion of stem and progenitor cells under chemical stimulation or genetic manipulation is used for understanding the molecular mechanisms of cell differentiation and self-renewal. However, concerns about the cell identity of in vitro-cultured cells exist. Bioinformatics methods, which rely heavily on signatures of cell types, have been developed to estimate cell types in bulk samples. The Tabula Muris Senis project provides an important basis for the comprehensive identification of signatures for different cell types. Here, we identified 46 cell type-specific (CTS) gene clusters for 83 mouse cell types. We conducted Gene Ontology term enrichment analysis on the gene clusters and revealed the specific functions of the relevant cell types. Next, we proposed a simple method, named CTSFinder, to identify different cell types between bulk RNA-Seq samples using the 46 CTS gene clusters. We applied CTSFinder to bulk RNA-Seq data from 17 organs and from developing mouse liver at different stages. We successfully identified the specific cell types between organs and captured the dynamics of different cell types during liver development. We applied CTSFinder to bulk RNA-Seq data from a growth factor-induced neural progenitor cell culture system and identified the dynamics of brain immune and nonimmune cells during long-term culture. We also applied CTSFinder to bulk RNA-Seq data from cells being reprogrammed into induced pluripotent stem cells and identified the stage at which those cells were massively induced. Finally, we applied CTSFinder to bulk RNA-Seq data from mouse retina developing in vivo and in vitro and captured the dynamics of different cell types in the two developmental systems. The CTS gene clusters and the CTSFinder method could thus serve as promising toolkits for assessing the cell identity of in vitro culture systems.
Affiliation(s)
- Xuelin He
- Department of Nephrology, Beilun People's Hospital, Ningbo, China; Kidney Disease Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Kidney Disease Immunology Laboratory, The Third Grade Laboratory, State Administration of Traditional Chinese Medicine of China, Hangzhou, China
- Li Liu
- Department of Library, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Baode Chen
- Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Chao Wu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
6
Kim J, Park J. Single-cell transcriptomics: a novel precision medicine technique in nephrology. Korean J Intern Med 2021; 36:479-490. [PMID: 33076636 PMCID: PMC8137400 DOI: 10.3904/kjim.2020.415] [Received: 08/11/2020] [Accepted: 10/21/2020] [Indexed: 02/06/2023]
Abstract
Because of the complex structure and function of the kidneys, the mechanisms of kidney disease remain unclear. In particular, bulk-level transcriptomics approaches are unable to distinguish primary cell-autonomous responses, which lead to disease development, from secondary cell non-autonomous responses. Single-cell analysis techniques can overcome the limitations inherent in measuring heterogeneous cell populations and clarify central issues in kidney biology and disease pathogenesis. Single-cell sequencing helps identify disease-related biomarkers and pathways, stratify patients, and choose appropriate treatments. Here we review a variety of single-cell analysis techniques and single-cell transcriptomics studies performed in the field of nephrology. Moreover, we discuss the future prospects of single-cell analysis-based precision medicine in nephrology.
Affiliation(s)
- Jisoo Kim
- School of Life Sciences, Gwangju Institute of Science and Technology, Gwangju, Korea
- Jihwan Park
- School of Life Sciences, Gwangju Institute of Science and Technology, Gwangju, Korea
7
Abstract
Advances in next-generation sequencing (NGS) technologies have resulted in a broad array of large-scale gene expression studies and an unprecedented volume of whole messenger RNA (mRNA) sequencing data, or transcriptome data (also known as RNA sequencing, or RNA-seq). These include the Genotype-Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA), among others. Here we cover some of the commonly used datasets, provide an overview of how to begin the analysis pipeline, and describe how to explore and interpret the data provided by these publicly available resources.
Affiliation(s)
- Yazeed Zoabi
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Noam Shomron
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
8
Panina Y, Karagiannis P, Kurtz A, Stacey GN, Fujibuchi W. Human Cell Atlas and cell-type authentication for regenerative medicine. Exp Mol Med 2020; 52:1443-1451. [PMID: 32929224 PMCID: PMC8080834 DOI: 10.1038/s12276-020-0421-1] [Received: 11/15/2019] [Revised: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 12/22/2022]
Abstract
In modern biology, the correct identification of cell types is required for the developmental study of tissues and organs and the production of functional cells for cell therapies and disease modeling. For decades, cell types have been defined on the basis of morphological and physiological markers and, more recently, immunological markers and molecular properties. Recent advances in single-cell RNA sequencing have opened new doors for the characterization of cells at the individual and spatiotemporal levels on the basis of their RNA profiles, vastly transforming our understanding of cell types. The objective of this review is to survey the current progress in the field of cell-type identification, starting with the Human Cell Atlas project, which aims to sequence every cell in the human body, to molecular marker databases for individual cell types and other sources that address cell-type identification for regenerative medicine based on cell data guidelines.
Affiliation(s)
- Yulia Panina
- Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto, 606-8507, Japan
- Peter Karagiannis
- Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto, 606-8507, Japan
- Andreas Kurtz
- BIH Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
- Glyn N Stacey
- International Stem Cell Banking Initiative, 2 High Street, Barley, Herts, SG8 8HZ, UK
- National Stem Cell Resource Centre, Institute of Zoology, Chinese Academy of Sciences, 100190, Beijing, China
- Innovation Academy for Stem Cell and Regeneration, Chinese Academy of Sciences, 100101, Beijing, China
- Wataru Fujibuchi
- Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto, 606-8507, Japan
9
El Amrani K, Alanis-Lobato G, Mah N, Kurtz A, Andrade-Navarro MA. Detection of condition-specific marker genes from RNA-seq data with MGFR. PeerJ 2019; 7:e6970. [PMID: 31179178 PMCID: PMC6542349 DOI: 10.7717/peerj.6970] [Received: 11/28/2018] [Accepted: 04/07/2019] [Indexed: 12/19/2022]
Abstract
The identification of condition-specific genes is key to advancing our understanding of cell fate decisions and disease development. Differential gene expression analysis (DGEA) has been the standard tool for this task. However, the number of samples that modern transcriptomic technologies allow us to study makes DGEA a daunting task. On the other hand, experiments with low numbers of replicates lack the statistical power to detect differentially expressed genes. We have previously developed MGFM, a tool for marker gene detection from microarrays, that is particularly useful in the latter case. Here, we have adapted the algorithm behind MGFM to detect markers in RNA-seq data. MGFR groups samples with similar gene expression levels and flags a gene as a potential marker of a sample type if its highest expression values come from all replicates of that type. We benchmarked MGFR against other methods and found that its proposed markers accurately characterize the functional identity of different tissues and cell types in standard and single-cell RNA-seq datasets. We then performed a more detailed analysis of three of these datasets, which profile the transcriptomes of different human tissues, immune cell types and human blastocyst cell types, respectively. MGFR's predicted markers were compared with gold-standard lists for these datasets and outperformed those of the other marker detectors. Finally, we suggest novel candidate marker genes for the examined tissues and cell types. MGFR is implemented as a freely available Bioconductor package (https://doi.org/doi:10.18129/B9.bioc.MGFR), which facilitates its use and integration with bioinformatics pipelines.
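As a rough illustration of the flagging rule described in this abstract (a gene is flagged for a sample type when that type's replicates occupy the top of the gene's expression ranking), here is a simplified Python interpretation. It is not the MGFR Bioconductor package's implementation: the `min_ratio` gap between the flagged group and the next-highest sample is a hypothetical parameter standing in for MGFR's own grouping of similar expression values, and expression values are assumed to be non-negative (for example, normalized counts).

```python
from typing import Dict, List


def flag_markers(
    expr: Dict[str, List[float]], sample_types: List[str], min_ratio: float = 2.0
) -> Dict[str, List[str]]:
    """Flag genes whose top-ranked samples are exactly all replicates of one type."""
    markers: Dict[str, List[str]] = {t: [] for t in set(sample_types)}
    for gene, values in expr.items():
        # Sample indices ordered from highest to lowest expression
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        top_type = sample_types[order[0]]
        n_rep = sample_types.count(top_type)
        if n_rep == len(values):
            continue  # only one sample type: nothing to compare against
        top_block = order[:n_rep]
        # Every one of the n_rep highest values must come from top_type's replicates
        if any(sample_types[i] != top_type for i in top_block):
            continue
        lowest_in = values[top_block[-1]]
        highest_out = values[order[n_rep]]
        if lowest_in <= 0:
            continue  # gene is not expressed in the candidate group
        # Hypothetical gap criterion separating the group from all other samples
        if highest_out == 0 or lowest_in / highest_out >= min_ratio:
            markers[top_type].append(gene)
    return markers


# Toy data: two replicates per tissue (values are made up for illustration)
types = ["liver", "liver", "brain", "brain", "heart", "heart"]
expr = {
    "ALB": [900, 850, 3, 2, 5, 4],
    "GFAP": [1, 2, 400, 380, 3, 2],
    "ACTB": [500, 520, 480, 510, 495, 505],
    "MYH7": [2, 3, 1, 1, 300, 10],
}
# ALB -> liver, GFAP -> brain, MYH7 -> heart; ACTB is not flagged
print(flag_markers(expr, types))
```

The rule needs no variance estimate, which fits the abstract's point that the approach is useful when replicates are too few for DGEA to have statistical power.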
Affiliation(s)
- Khadija El Amrani
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Nancy Mah
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Andreas Kurtz
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
10
Kurtz A, Elsallab M, Sanzenbacher R, Abou-El-Enein M. Linking Scattered Stem Cell-Based Data to Advance Therapeutic Development. Trends Mol Med 2019; 25:8-19. [DOI: 10.1016/j.molmed.2018.10.008] [Received: 09/06/2018] [Revised: 10/20/2018] [Accepted: 10/22/2018] [Indexed: 02/07/2023]
11
Roy AL, Conroy RS. Toward mapping the human body at a cellular resolution. Mol Biol Cell 2018; 29:1779-1785. [PMID: 30058989 PMCID: PMC6085824 DOI: 10.1091/mbc.e18-04-0260] [Received: 04/27/2018] [Revised: 06/01/2018] [Accepted: 06/07/2018] [Indexed: 12/21/2022]
Abstract
The adult human body is composed of nearly 37 trillion cells, each with potentially unique molecular characteristics. This Perspective describes some of the challenges and opportunities faced in mapping the molecular characteristics of these cells in specific regions of the body and highlights areas for international collaboration toward the broader goal of comprehensively mapping the human body with cellular resolution.
Affiliation(s)
- Ananda L. Roy
- Office of Strategic Coordination, Division of Program Coordination, Planning, and Strategic Initiatives, Office of the Director, National Institutes of Health, Bethesda, MD 20892
- Richard S. Conroy
- Office of Strategic Coordination, Division of Program Coordination, Planning, and Strategic Initiatives, Office of the Director, National Institutes of Health, Bethesda, MD 20892
12
Abstract
The Cellosaurus is a knowledge resource on cell lines. It aims to describe all cell lines used in biomedical research. Its scope encompasses both vertebrates and invertebrates. Currently, information for >100,000 cell lines is provided. For each cell line, it provides a wealth of information, cross-references, and literature citations. The Cellosaurus is available on the ExPASy server (https://web.expasy.org/cellosaurus/) and can be downloaded in a variety of formats. Among its many uses, the Cellosaurus is a key resource to help researchers identify potentially contaminated/misidentified cell lines, thus contributing to improving the quality of research in the life sciences.
Affiliation(s)
- Amos Bairoch
- Computer and Laboratory Investigation of Proteins of Human Origin Group, Faculty of Medicine, Swiss Institute of Bioinformatics, University of Geneva, Geneva 4, Switzerland
13
Wang Q, Armenia J, Zhang C, Penson AV, Reznik E, Zhang L, Minet T, Ochoa A, Gross BE, Iacobuzio-Donahue CA, Betel D, Taylor BS, Gao J, Schultz N. Unifying cancer and normal RNA sequencing data from different sources. Sci Data 2018; 5:180061. [PMID: 29664468 PMCID: PMC5903355 DOI: 10.1038/sdata.2018.61] [Received: 08/22/2017] [Accepted: 02/12/2018] [Indexed: 01/21/2023]
Abstract
Driven by recent advances in next-generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of large-scale studies have recently been conducted, resulting in an unprecedented volume of whole transcriptome sequencing (RNA-seq) data, such as the Genotype-Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA). While these data offer new opportunities to identify the mechanisms underlying disease, comparing data from different sources remains challenging because of differences in sample and data processing. Here, we developed a pipeline that processes and unifies RNA-seq data from different studies, which includes uniform realignment, gene expression quantification, and batch effect removal. We find that uniform alignment and quantification are not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from GTEx and TCGA and successfully corrected for study-specific biases, enabling comparative analysis between TCGA and GTEx. The normalized datasets are available for download on figshare.
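The abstract lists batch effect removal as a pipeline step without detailing the method. The sketch below shows the general idea with a deliberately simple per-gene location/scale adjustment: each batch is standardized to its own mean and spread, then mapped back onto the grand mean and the pooled within-batch standard deviation. This is an assumption for illustration, simpler than empirical Bayes methods such as ComBat and not the authors' pipeline; the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def location_scale_adjust(values: List[float], batches: List[str]) -> List[float]:
    """Per-gene batch adjustment: standardize within each batch, then rescale
    to the grand mean and the pooled within-batch standard deviation."""
    grand_mean = mean(values)
    batch_ids = sorted(set(batches))
    idx = {b: [i for i, x in enumerate(batches) if x == b] for b in batch_ids}
    b_mean = {b: mean(values[i] for i in idx[b]) for b in batch_ids}
    b_sd = {b: pstdev([values[i] for i in idx[b]]) for b in batch_ids}
    # Residuals after removing batch means estimate the within-batch spread
    residuals = [values[i] - b_mean[batches[i]] for i in range(len(values))]
    pooled_sd = pstdev(residuals)
    adjusted = []
    for i, v in enumerate(values):
        sd = b_sd[batches[i]]
        # A batch with zero spread (e.g. a single sample) collapses to the grand mean
        z = (v - b_mean[batches[i]]) / sd if sd > 0 else 0.0
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted


def adjust_matrix(expr: Dict[str, List[float]], batches: List[str]) -> Dict[str, List[float]]:
    """Apply the adjustment gene by gene (keys = genes, list positions = samples)."""
    return {gene: location_scale_adjust(v, batches) for gene, v in expr.items()}


# Toy gene measured in two studies with a large study-specific offset
toy = {"GENE1": [1.0, 3.0, 11.0, 13.0]}
print(adjust_matrix(toy, ["studyA", "studyA", "studyB", "studyB"]))
# -> {'GENE1': [6.0, 8.0, 6.0, 8.0]}
```

A design caveat: when a biological condition is completely confounded with batch (for example, if every tumor sample came from one study and every normal sample from another), this naive adjustment removes the biological difference along with the batch effect, which is one reason dedicated batch-correction tools let users supply biological covariates to preserve.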
Affiliation(s)
- Qingguo Wang
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.,Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.,College of Computing & Technology, Lipscomb University, Nashville, Tennessee 37204, USA
| | - Joshua Armenia
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.,Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
| | - Chao Zhang
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, 10021, USA
| | - Alexander V Penson
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Ed Reznik
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Liguo Zhang
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Thais Minet
- College of Computing & Technology, Lipscomb University, Nashville, Tennessee 37204, USA
- Angelica Ochoa
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Benjamin E Gross
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Doron Betel
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10021, USA
- Barry S Taylor
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Jianjiong Gao
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Nikolaus Schultz
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
14
Abstract
BACKGROUND Cell lines and cell types are extensively studied in biomedical research, yielding a significant number of publications each year. Identifying cell lines and cell types precisely in publications is crucial for scientific reproducibility and knowledge integration. There are ongoing efforts to standardise cell nomenclature through ontology development, supporting the FAIR principles for cell knowledge. However, it is important to analyse the usage of cell nomenclature in publications at a large scale to understand how far scientists have taken it up in the literature. In this study, we analyse the usage of cell nomenclature, both in vivo and in vitro, in biomedical literature using text mining methods and present our results. RESULTS We identified 59% of the cell type classes in the Cell Ontology and 13% of the cell line classes in the Cell Line Ontology in the literature. Our analysis showed that cell line nomenclature is much more ambiguous than cell type nomenclature. However, trends indicate that scientists are increasingly using standardised nomenclature for cell lines and cell types in publications. CONCLUSIONS Our findings provide insight into how experimental cells are described in publications; they may support improved standardisation of cell type and cell line nomenclature and can be used to develop efficient text mining applications for cell types and cell lines. All data generated in this study are available at https://github.com/shenay/CellNomenclatureStudy.
Affiliation(s)
- Şenay Kafkas
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
- Sirarat Sarntivijai
- The European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
15
Fu X, He F, Li Y, Shahveranov A, Hutchins AP. Genomic and molecular control of cell type and cell type conversions. Cell Regeneration 2017; 6:1-7. [PMID: 29348912 PMCID: PMC5769489 DOI: 10.1016/j.cr.2017.09.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 08/07/2017] [Revised: 09/06/2017] [Accepted: 09/18/2017] [Indexed: 12/17/2022]
Abstract
Organisms are made of a limited number of cell types that combine to form higher-order tissues and organs. Cell types have traditionally been defined by their morphologies or biological activity, yet the underlying molecular controls of cell type remain unclear. The advent of single-cell technologies, and more recently genomics (particularly single-cell genomics), has substantially advanced our understanding of the concept of cell type, but has also made that understanding more complex. These new technologies have added a genome-wide molecular dimension to the description of cell type, with genome-wide expression and epigenetic data acting as a cell type ‘fingerprint’ that describes the cell state. Using these genomic fingerprints, cell types are increasingly being defined by specific genomic and molecular criteria, without necessarily having a distinct biological function. In this review, we discuss the molecular definitions of cell types and cell type control, and particularly how endogenous and exogenous transcription factors can control cell types and cell type conversions.
16
Novikov P, Kozlovskaya N, Moiseev S, Shilov E, Bobkova I, Schreiber A, Tsvetkov D, Gollasch M, Mah N, El Amrani K, Kurtz A. Therapeutic Complement Targeting in ANCA-Associated Vasculitides and Thrombotic Microangiopathy. Biomed Hub 2016; 1:1-11. [PMID: 31988889 PMCID: PMC6945915 DOI: 10.1159/000453106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2016] [Accepted: 10/26/2016] [Indexed: 11/19/2022]
Abstract
Anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitides (AAVs) are a group of systemic autoimmune disorders characterized by necrotizing inflammation of medium-to-small vessels, a relative paucity of immune deposits, and an association with detectable circulating ANCAs. AAVs include granulomatosis with polyangiitis (formerly Wegener's granulomatosis), microscopic polyangiitis, and eosinophilic granulomatosis with polyangiitis (Churg-Strauss syndrome). Until recently, AAVs were not viewed as complement-mediated disorders. However, recent findings, predominantly from animal studies, have demonstrated a crucial role of the complement system in the pathogenesis of AAVs. Complement activation or defects in its regulation have been described in an increasing number of acquired or genetically driven forms of thrombotic microangiopathy. As the spectrum of complement-mediated diseases expands, the question arises as to which AAV patients might benefit from complement-targeted therapy. Therapies directed against the complement system point to the need for a genetic workup of complement component and regulator genes in patients with AAV. Genetic testing, together with pluripotent stem cells and bioinformatics tools, may broaden our approach to treating patients with aggressive forms of AAV.
Affiliation(s)
- Pavel Novikov
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Sergey Moiseev
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Eugene Shilov
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Irina Bobkova
- I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Adrian Schreiber
- Experimental and Clinical Research Center, a Joint Cooperation between the Charité Medical Faculty and the Max Delbrück Center for Molecular Medicine at the Charité and the Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Nephrology and Intensive Care Medicine, Campus Virchow, Berlin, Germany
- Dmitry Tsvetkov
- Experimental and Clinical Research Center, a Joint Cooperation between the Charité Medical Faculty and the Max Delbrück Center for Molecular Medicine at the Charité and the Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Nephrology and Intensive Care Medicine, Campus Virchow, Berlin, Germany
- Maik Gollasch
- Experimental and Clinical Research Center, a Joint Cooperation between the Charité Medical Faculty and the Max Delbrück Center for Molecular Medicine at the Charité and the Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Nephrology and Intensive Care Medicine, Campus Virchow, Berlin, Germany
- Koch Metchnikoff Forum, Section Nephrology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Nancy Mah
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Khadija El Amrani
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Andreas Kurtz
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité - Universitätsmedizin Berlin, Berlin, Germany
17
Seltmann S, Lekschas F, Müller R, Stachelscheid H, Bittner MS, Zhang W, Kidane L, Seriola A, Veiga A, Stacey G, Kurtz A. hPSCreg--the human pluripotent stem cell registry. Nucleic Acids Res 2015; 44:D757-63. [PMID: 26400179 PMCID: PMC4702942 DOI: 10.1093/nar/gkv963] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 08/14/2015] [Accepted: 09/11/2015] [Indexed: 12/22/2022]
Abstract
The human pluripotent stem cell registry (hPSCreg), accessible at http://hpscreg.eu, is a public registry and data portal for human embryonic and induced pluripotent stem cell lines (hESC and hiPSC). Since their first isolation, the number of hESC lines has steadily increased to over 3000, and new hiPSC lines are being generated in a rapidly growing number of laboratories because of their potentially broad applicability in biomedicine and drug testing. Many of these lines are deposited in stem cell banks, which have been established worldwide to store tens of thousands of lines from healthy and diseased donors. The Registry provides comprehensive and standardized biological and legal information, as well as tools to search and compare information from multiple hPSC sources, and hence addresses a translational research need. To facilitate unambiguous identification across different resources, hPSCreg automatically creates a unique standardized name for each registered cell line. In addition to biological information, hPSCreg stores extensive data on ethical standards for cell sourcing, conditions for application, and privacy protection. hPSCreg is the first global registry that holds both manually validated scientific and ethical information on hPSC lines and provides access to it through a user-friendly, mobile-ready web application.
Affiliation(s)
- Stefanie Seltmann
- Berlin-Brandenburg Center for Regenerative Therapies, Charité University Medicine Berlin, Berlin, 13353, Germany
- Fritz Lekschas
- Berlin-Brandenburg Center for Regenerative Therapies, Charité University Medicine Berlin, Berlin, 13353, Germany
- Robert Müller
- Berlin-Brandenburg Center for Regenerative Therapies, Charité University Medicine Berlin, Berlin, 13353, Germany
- Harald Stachelscheid
- Berlin-Brandenburg Center for Regenerative Therapies, Charité University Medicine Berlin, Berlin, 13353, Germany; Berlin Institute of Health-Stem Cell Core Facility, 13353 Berlin, Germany
- Marie-Sophie Bittner
- Berlin-Brandenburg Center for Regenerative Therapies, Charité University Medicine Berlin, Berlin, 13353, Germany
- Weiping Zhang
- Berlin-Brandenburg Center for Regenerative Therapies, Charité University Medicine Berlin, Berlin, 13353, Germany
- Luam Kidane
- National Institute for Biological Standards and Control, South Mimms EN6 3QG, UK
- Anna Seriola
- Center of Regenerative Medicine in Barcelona, Barcelona Stem Cell Bank, Barcelona 08003, Spain
- Anna Veiga
- Center of Regenerative Medicine in Barcelona, Barcelona Stem Cell Bank, Barcelona 08003, Spain
- Glyn Stacey
- National Institute for Biological Standards and Control, South Mimms EN6 3QG, UK
- Andreas Kurtz
- Berlin-Brandenburg Center for Regenerative Therapies, Charité University Medicine Berlin, Berlin, 13353, Germany; Seoul National University, College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul 151-742, Republic of Korea
18
El Amrani K, Stachelscheid H, Lekschas F, Kurtz A, Andrade-Navarro MA. MGFM: a novel tool for detection of tissue and cell specific marker genes from microarray gene expression data. BMC Genomics 2015; 16:645. [PMID: 26314578 PMCID: PMC4552366 DOI: 10.1186/s12864-015-1785-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/27/2015] [Accepted: 07/18/2015] [Indexed: 11/10/2022]
Abstract
Background Identification of marker genes associated with a specific tissue/cell type is a fundamental challenge in genetic and cell research. Marker genes are of great importance for determining cell identity and for understanding tissue-specific gene function and the molecular mechanisms underlying complex diseases. Results We have developed a new bioinformatics tool called MGFM (Marker Gene Finder in Microarray data) to predict marker genes from microarray gene expression data. Marker genes are identified by grouping samples of the same type that have similar marker gene expression levels. We verified our approach using two microarray data sets from NCBI’s Gene Expression Omnibus public repository, encompassing samples for similar sets of five human tissues (brain, heart, kidney, liver, and lung). Comparison with another tool for tissue-specific gene identification, and validation against established literature-derived tissue markers, demonstrated the functionality, accuracy and simplicity of our tool. Furthermore, top-ranked marker genes were experimentally validated by reverse transcriptase-polymerase chain reaction (RT-PCR). The sets of predicted marker genes associated with the five selected tissues comprised well-known genes of particular importance in these tissues. The tool is freely available from the Bioconductor web site, and it is also provided as an online application integrated into the CellFinder platform (http://cellfinder.org/analysis/marker). Conclusions MGFM is a useful tool to predict tissue/cell type marker genes using microarray gene expression data. Its implementation both as an R package and as an application within CellFinder facilitates its use.
Affiliation(s)
- Khadija El Amrani
- Charité - Universitätsmedizin Berlin, Berlin Brandenburg Center for Regenerative Therapies (BCRT), Berlin, 13353, Germany.
- Harald Stachelscheid
- Charité - Universitätsmedizin Berlin, Berlin Brandenburg Center for Regenerative Therapies (BCRT), Berlin, 13353, Germany; Berlin Institute of Health, Berlin, 10117, Germany.
- Fritz Lekschas
- Charité - Universitätsmedizin Berlin, Berlin Brandenburg Center for Regenerative Therapies (BCRT), Berlin, 13353, Germany.
- Andreas Kurtz
- Charité - Universitätsmedizin Berlin, Berlin Brandenburg Center for Regenerative Therapies (BCRT), Berlin, 13353, Germany; Seoul National University, College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul, 151-742, Republic of Korea.
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University of Mainz, Mainz, Germany; Institute of Molecular Biology, Mainz, Germany.
19
Kurtz A, Stacey G, Kidane L, Seriola A, Stachelscheid H, Veiga A. Regulatory insight into the European human pluripotent stem cell registry. Stem Cells Dev 2015; 23 Suppl 1:51-5. [PMID: 25457963 DOI: 10.1089/scd.2014.0319] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
The European pluripotent stem cell registry aims to list qualified pluripotent stem cell (PSC) lines that are available globally, together with relevant information for each cell line. Particular emphasis is placed on documenting ethical procurement of the cells and providing evidence of pluripotency. The report discusses the tasks and challenges of a global PSC registry as an instrument to develop collaboration, to access cells from diverse resources and banks, and to implement standards, and as a means to follow up the usage of cells and to support adherence to regulatory and scientific standards and transparency for stakeholders.
Affiliation(s)
- Andreas Kurtz
- Berlin Brandenburg Center for Regenerative Medicine, Charité-Universitätsmedizin Berlin, Berlin, Germany
20
Lekschas F, Stachelscheid H, Seltmann S, Kurtz A. Semantic Body Browser: graphical exploration of an organism and spatially resolved expression data visualization. Bioinformatics 2014; 31:794-6. [PMID: 25344497 DOI: 10.1093/bioinformatics/btu707] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/18/2022]
Abstract
Advancing technologies generate large amounts of molecular and phenotypic data on cells, tissues and organisms, leading to ever-growing detail and complexity, while information retrieval and analysis become increasingly time-consuming. The Semantic Body Browser is a web application for intuitively exploring the body of an organism from the organ level down to the subcellular level and for visualising expression profiles by means of semantically annotated anatomical illustrations. It helps users comprehend biological and medical data related to the different body structures by relying on the strong pattern recognition capabilities of human users. AVAILABILITY AND IMPLEMENTATION The Semantic Body Browser is a JavaScript web application that is freely available at http://sbb.cellfinder.org. The source code is provided at https://github.com/flekschas/sbb.
Affiliation(s)
- Fritz Lekschas
- Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany and Seoul National University, College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul 151-742, Republic of Korea
- Harald Stachelscheid
- Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany and Seoul National University, College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul 151-742, Republic of Korea
- Stefanie Seltmann
- Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany and Seoul National University, College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul 151-742, Republic of Korea
- Andreas Kurtz
- Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany and Seoul National University, College of Veterinary Medicine and Research Institute for Veterinary Science, Seoul 151-742, Republic of Korea