1
|
Ponomarenko E, Poverennaya E, Pyatnitskiy M, Lisitsa A, Moshkovskii S, Ilgisonis E, Chernobrovkin A, Archakov A. Comparative ranking of human chromosomes based on post-genomic data. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2012; 16:604-11. [PMID: 22966780 DOI: 10.1089/omi.2012.0034] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022]
Abstract
The goal of the Human Proteome Project (HPP) is to fully characterize the 21,000 human protein-coding genes with respect to the estimated two million proteins they encode. As such, the HPP aims to create a comprehensive, detailed resource to help elucidate protein functions and to advance medical treatment. Similarly to the Human Genome Project (HGP), the HPP chose a chromosome-centric approach, assigning different chromosomes to different countries. Here we introduce a scoring method for chromosome ranking based on several characteristics, including relevance to health problems, existing published knowledge, and current transcriptome and proteome coverage. The score of each chromosome was computed as a weighted combination of indexes reflecting the aforementioned characteristics. The approach is tailored to the chromosome-centric HPP (C-HPP), and is advantageous in that it takes into account currently available information. We ranked the human chromosomes using the proposed score, and observed that Chr Y, Chr 13, and Chr 18 were top-ranked, whereas the scores of Chr 19, Chr 11, and Chr 17 were comparatively low. For Chr 18, selected for the Russian part of C-HPP, about 25% of the encoded genes were associated with diseases, including cancers and neurodegenerative and psychiatric diseases, as well as type 1 diabetes and essential hypertension. This ranking approach could easily be adapted to prioritize research for other sets of genes, such as metabolic pathways and functional categories.
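The weighted combination described in the abstract can be sketched as follows. This is only an illustration: the index names, the weights, the placeholder chromosomes `chrA` and `chrB`, and the assumption that a higher score means a higher rank are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

def chromosome_score(indexes: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-chromosome indexes (weights are hypothetical)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * indexes[name] for name in weights) / total_weight

def rank_chromosomes(table: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> List[str]:
    """Return chromosome labels sorted from highest to lowest score."""
    scores = {chrom: chromosome_score(idx, weights) for chrom, idx in table.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up weights and index values in [0, 1]; not data from the paper.
WEIGHTS = {
    "disease_relevance": 0.4,
    "published_knowledge": 0.2,
    "transcriptome_coverage": 0.2,
    "proteome_coverage": 0.2,
}
TABLE = {
    "chrA": {"disease_relevance": 0.9, "published_knowledge": 0.5,
             "transcriptome_coverage": 0.5, "proteome_coverage": 0.5},
    "chrB": {"disease_relevance": 0.2, "published_knowledge": 0.5,
             "transcriptome_coverage": 0.5, "proteome_coverage": 0.5},
}

if __name__ == "__main__":
    print(rank_chromosomes(TABLE, WEIGHTS))
```

Because the same function accepts any table of labelled indexes, the rows could equally be pathways or functional categories, which is the adaptation the abstract suggests.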
|
Research Support, Non-U.S. Gov't |
13 |
11 |
2
|
Ilgisonis E, Lisitsa A, Kudryavtseva V, Ponomarenko E. Creation of Individual Scientific Concept-Centered Semantic Maps Based on Automated Text-Mining Analysis of PubMed. Adv Bioinformatics 2018; 2018:4625394. [PMID: 30147721 PMCID: PMC6083525 DOI: 10.1155/2018/4625394] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/04/2018] [Accepted: 07/05/2018] [Indexed: 01/22/2023] Open
Abstract
Concept-centered semantic maps were created based on a text-mining analysis of PubMed using the BiblioEngine_v2018 software. The objects ("concepts") of a semantic map can be MeSH-terms or other terms (names of proteins, diseases, chemical compounds, etc.) structured in the form of controlled vocabularies. The edges between the two objects were automatically calculated based on the index of semantic similarity, which is proportional to the number of publications related to both objects simultaneously. On the one hand, an individual semantic map created based on the already published papers allows us to trace scientific inquiry. On the other hand, a prospective analysis based on the study of PubMed search history enables us to determine the possible directions for future research.
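A minimal sketch of how such edges might be computed, assuming each concept is already mapped to the set of PubMed IDs that mention it. The function name, the `scale` constant, and the placeholder concepts are hypothetical; the abstract states only that the index is proportional to the number of shared publications, so this is not the BiblioEngine implementation.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

def semantic_edges(concept_to_pmids: Dict[str, Set[int]],
                   scale: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Edge weight = scale * number of publications shared by two concepts."""
    edges = {}
    for a, b in combinations(sorted(concept_to_pmids), 2):
        shared = len(concept_to_pmids[a] & concept_to_pmids[b])
        if shared:  # concepts with no common publication get no edge
            edges[(a, b)] = scale * shared
    return edges

# Placeholder concepts and made-up publication IDs.
EXAMPLE = {
    "concept_x": {1, 2, 3},
    "concept_y": {2, 3, 4},
    "concept_z": {5},
}

if __name__ == "__main__":
    print(semantic_edges(EXAMPLE))
```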
|
research-article |
7 |
9 |
3
|
Ilgisonis E, Vavilov N, Ponomarenko E, Lisitsa A, Poverennaya E, Zgoda V, Radko S, Archakov A. Genome of the Single Human Chromosome 18 as a "Gold Standard" for Its Transcriptome. Front Genet 2021; 12:674534. [PMID: 34194472 PMCID: PMC8238407 DOI: 10.3389/fgene.2021.674534] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/01/2021] [Accepted: 05/17/2021] [Indexed: 01/29/2023] Open
Abstract
The cutoff level applied in sequencing analysis varies according to the sequencing technology, sample type, and study purpose, which can largely affect the coverage and reliability of the data obtained. In this study, we aimed to determine the optimal combination of parameters for reliable RNA transcriptome data analysis. Toward this end, we compared the results obtained from different transcriptome analysis platforms (quantitative polymerase chain reaction, Illumina RNASeq, and Oxford Nanopore Technologies MinION) for the transcriptome encoded by human chromosome 18 (Chr 18) using the same sample types (HepG2 cells and liver tissue). A total of 275 protein-coding genes encoded by Chr 18 was taken as the gene set for evaluation. The combination of Illumina RNASeq and MinION nanopore technologies enabled the detection of at least one transcript for each protein-coding gene encoded by Chr 18. This combination also reduced the probability of false-positive detection of low-copy transcripts due to the simultaneous confirmation of the presence of a transcript by the two fundamentally different technologies: short reads essential for reliable detection (Illumina RNASeq) and long-read sequencing data (MinION). The combination of these technologies achieved complete coverage of all 275 protein-coding genes on Chr 18, identifying transcripts with non-zero expression levels. This approach can improve distinguishing the biological and technical reasons for the absence of mRNA detection for a given gene in transcriptomics.
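The idea of combining two platforms can be sketched with plain set operations: the union gives coverage, and the intersection gives transcripts confirmed by both technologies. The function name, the `cutoff` parameter, and the toy expression values are hypothetical and do not reproduce the study's actual thresholds or pipeline.

```python
from typing import Dict, Iterable, Set

def combine_detection(short_read: Dict[str, float],
                      long_read: Dict[str, float],
                      gene_set: Iterable[str],
                      cutoff: float = 0.0) -> Dict[str, Set[str]]:
    """Classify genes by whether each platform detects them above a cutoff."""
    genes = set(gene_set)
    short = {g for g in genes if short_read.get(g, 0.0) > cutoff}
    long_ = {g for g in genes if long_read.get(g, 0.0) > cutoff}
    return {
        "covered": short | long_,      # detected by at least one platform
        "confirmed": short & long_,    # detected by both platforms
        "missing": genes - (short | long_),
    }

# Toy expression values for placeholder genes; not data from the study.
GENES = ["g1", "g2", "g3"]
SHORT = {"g1": 3.0, "g2": 0.0}
LONG = {"g1": 1.0, "g2": 2.0}

if __name__ == "__main__":
    print(combine_detection(SHORT, LONG, GENES))
```

Raising `cutoff` shrinks the covered set, which is one way to explore the trade-off between coverage and reliability that the abstract describes.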
|
Journal Article |
4 |
6 |
4
|
Tarbeeva S, Lyamtseva E, Lisitsa A, Kozlova A, Ponomarenko E, Ilgisonis E. ScanBious: Survey for Obesity Genes Using PubMed Abstracts and DisGeNET. J Pers Med 2021; 11:246. [PMID: 33805313 PMCID: PMC8065449 DOI: 10.3390/jpm11040246] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/21/2021] [Revised: 03/23/2021] [Accepted: 03/24/2021] [Indexed: 12/29/2022] Open
Abstract
We used automatic text-mining of PubMed abstracts of papers related to obesity, with the aim of revealing that the information used in abstracts reflects the current understanding and key concepts of this widely explored problem. We compared expert data from DisGeNET to the results of an automated MeSH (Medical Subject Heading) search, which was performed by the ScanBious web tool. The analysis provided an overview of the obesity field, highlighting major trends such as physiological conditions, age, and diet, as well as key well-studied genes, such as adiponectin and its receptor. By intersecting the DisGeNET knowledge with the ScanBious results, we deciphered four clusters of obesity-related genes. An initial set of 100+ thousand abstracts and 622 genes was reduced to 19 genes, distributed among just a few groups: heredity, inflammation, intercellular signaling, and cancer. Rapid profiling of articles could drive personalized medicine: if the disease signs of a particular person were superimposed on a general network, then it would be possible to understand which are non-specific (observed in cohorts and, therefore, most likely have known treatment solutions) and which are less investigated, and probably represent a personalized case.
|
research-article |
4 |
2 |
5
|
Evdokimov P, Kudryavtsev A, Ilgisonis E, Ponomarenko E, Lisitsa A. Use of scientific social networking to improve the research strategies of PubMed readers. BMC Res Notes 2016; 9:113. [PMID: 26892337 PMCID: PMC4758102 DOI: 10.1186/s13104-016-1920-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/27/2015] [Accepted: 02/08/2016] [Indexed: 11/13/2022] Open
Abstract
Background: Keeping up with journal articles on a daily basis is an important activity of scientists engaged in biomedical research. Usually, journal articles and papers in the field of biomedicine are accessed through the Medline/PubMed electronic library. In the process of navigating PubMed, researchers unknowingly generate user-specific reading profiles that can be shared within a social networking environment. This paper examines the structure of the social networking environment generated by PubMed users. Methods: A web browser plugin was developed to map the reading patterns of individual PubMed users to Medical Subject Headings (MeSH) terms. Results: We developed a scientific social network based on the personal research profiles of readers of biomedical articles. A browser plugin is used to record the digital object identifier or PubMed ID of web pages. Recorded items are posted on the activity feed and automatically mapped to PubMed abstracts. Within the activity feed, a user can trace back previously browsed articles and insert comments. By calculating the frequency with which specific MeSH terms occur, the research interests of PubMed users can be visually represented with a tag cloud. Finally, research profiles can be searched for matches between network users. Conclusions: A social networking environment was created using MeSH terms to map articles accessed through the Medline/PubMed online library system. In-network social communication is supported by the recommendation of articles and by matching users with similar scientific interests. The system is available at http://bioknol.org/en/. Electronic supplementary material: The online version of this article (doi:10.1186/s13104-016-1920-y) contains supplementary material, which is available to authorized users.
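The frequency calculation behind the tag cloud can be sketched in a few lines, assuming each browsed article has already been mapped to its list of MeSH terms. The function name and the toy reading history are hypothetical; this does not reproduce the plugin or the bioknol.org backend.

```python
from collections import Counter
from typing import Dict, List

def mesh_profile(articles: List[List[str]]) -> Dict[str, float]:
    """Relative frequency of each MeSH term across a user's browsed articles."""
    counts = Counter(term for terms in articles for term in terms)
    total = sum(counts.values())
    # most_common() orders terms from most to least frequent,
    # which is the order a tag cloud would typically emphasize.
    return {term: n / total for term, n in counts.most_common()}

# Toy reading history: one inner list of MeSH terms per browsed article.
HISTORY = [
    ["Proteomics", "Humans"],
    ["Proteomics", "Liver"],
    ["Proteomics"],
]

if __name__ == "__main__":
    for term, weight in mesh_profile(HISTORY).items():
        print(f"{term}: {weight:.2f}")
```

Two users' profiles built this way could then be compared term by term to find readers with similar interests.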
|
|
9 |
1 |
6
|
Sarygina E, Kozlova A, Deinichenko K, Radko S, Ptitsyn K, Khmeleva S, Kurbatov LK, Spirin P, Prassolov VS, Ilgisonis E, Lisitsa A, Ponomarenko E. Principal Component Analysis of Alternative Splicing Profiles Revealed by Long-Read ONT Sequencing in Human Liver Tissue and Hepatocyte-Derived HepG2 and Huh7 Cell Lines. Int J Mol Sci 2023; 24:15502. [PMID: 37958484 PMCID: PMC10648607 DOI: 10.3390/ijms242115502] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/05/2023] [Revised: 10/12/2023] [Accepted: 10/14/2023] [Indexed: 11/15/2023] Open
Abstract
The long-read RNA sequencing developed by Oxford Nanopore Technologies provides a direct quantification of transcript isoforms. That makes the number of transcript isoforms per gene an intrinsically suitable metric for alternative splicing (AS) profiling with this particular type of RNA sequencing. By using this simple metric and applying principal component analysis (PCA) as a tool to visualize the high-dimensional transcriptomic data, we were able to group biospecimens of normal human liver tissue and hepatocyte-derived malignant HepG2 and Huh7 cells into clear clusters in a 2D space. For the transcriptome-wide analysis, the clustering was observed regardless of whether all genes were included in the analysis or only those expressed in all biospecimens tested. However, for a particular set of genes known as pharmacogenes, which are involved in drug metabolism, the clustering worsened dramatically in the latter case. Based on the PCA data, the subsets of genes contributing most to the grouping of biospecimens into clusters were selected and subjected to gene ontology analysis, which allowed us to determine the top 20 biological processes, among which translation and processes related to its regulation dominate. The suggested metric can be a useful addition to the existing metrics for describing AS profiles, especially in transcriptome studies with long-read sequencing.
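The isoforms-per-gene metric can be sketched as a small counting step that produces the sample-by-gene matrix a PCA routine would take as input. The record layout `(gene_id, transcript_id, abundance)`, the function names, and the toy values are hypothetical; the PCA step itself is omitted here and would be done with a numerical library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float]  # (gene_id, transcript_id, abundance)

def isoforms_per_gene(transcripts: Iterable[Record]) -> Dict[str, int]:
    """Count distinct transcript isoforms with non-zero abundance per gene."""
    seen = defaultdict(set)
    for gene, transcript, abundance in transcripts:
        if abundance > 0:
            seen[gene].add(transcript)
    return {gene: len(ids) for gene, ids in seen.items()}

def feature_matrix(samples: Dict[str, List[Record]]):
    """Build one row of isoform counts per sample over the union of genes."""
    profiles = {name: isoforms_per_gene(rows) for name, rows in samples.items()}
    genes = sorted(set().union(*profiles.values()))
    rows = {name: [p.get(g, 0) for g in genes] for name, p in profiles.items()}
    return genes, rows

# Toy records for placeholder samples and genes; not data from the study.
SAMPLES = {
    "s1": [("g1", "t1", 5.0), ("g1", "t2", 1.0), ("g2", "t3", 0.0)],
    "s2": [("g1", "t1", 2.0), ("g2", "t3", 4.0)],
}

if __name__ == "__main__":
    print(feature_matrix(SAMPLES))
```

Restricting `genes` to those with a non-zero count in every row would give the "expressed in all biospecimens" variant mentioned in the abstract.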
|
research-article |
2 |
1 |
7
|
Krasnov G, Shkrigunov T, Radko S, Ptitsyn K, Shapovalova V, Timoshenko O, Khmeleva S, Kurbatov L, Kiseleva Y, Ilgisonis E, Kiseleva O, Vakhrushev I, Tsvetkova A, Buromski I, Markin S, Archakov A, Lisitsa A, Ponomarenko E. Human Chr18 transcriptome dataset combined from the Illumina HiSeq, ONT MinION, and qPCR data. Data Brief 2021; 36:107130. [PMID: 34095379 PMCID: PMC8166769 DOI: 10.1016/j.dib.2021.107130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 04/26/2021] [Accepted: 05/03/2021] [Indexed: 11/01/2022] Open
Abstract
The chromosome-centric dataset was created by applying several transcriptome profiling technologies. The described dataset is available at the NCBI repository (BioProject ID PRJNA635536). The dataset covers the same types of tissue and cell lines and the same transcriptome sequencing technologies, and was collected over a period of 8 years (the first data were obtained in 2013 and the last in 2020). High-throughput sequencing technologies were employed along with the quantitative PCR (qPCR) approach to generate data on gene expression levels. qPCR was performed for a limited group of genes encoded on human chromosome 18, for the Russian part of the Chromosome-Centric Human Proteome Project. The high-throughput sequencing data are provided as Excel spreadsheets, in which FPKM and TPM values were evaluated for the whole transcriptome with both Illumina HiSeq and Oxford Nanopore Technologies MinION sequencing.
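For readers unfamiliar with the two expression units in the spreadsheets, the textbook definitions of fragments per kilobase per million (FPKM) and transcripts per million can be sketched as below. The function names and toy counts are hypothetical, and the abstract does not say which software or exact normalization the authors used, so this is not their pipeline.

```python
from typing import Dict

def fpkm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}

def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}
    denom = sum(rate.values())
    return {g: r / denom * 1e6 for g, r in rate.items()}

# Toy read counts and transcript lengths for placeholder genes.
COUNTS = {"g1": 100, "g2": 300}
LENGTHS = {"g1": 1000, "g2": 3000}

if __name__ == "__main__":
    print(fpkm(COUNTS, LENGTHS))
    print(tpm(COUNTS, LENGTHS))
```

In the toy example both genes have the same per-base read density, so they receive equal values in either unit even though their raw counts differ threefold.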
|
Journal Article |
4 |
0 |
8
|
Ilgisonis E, Kiseleva O, Kuznetsova K. Math, science, history, unraveling the mystery-That all started with de novo! EUPA OPEN PROTEOMICS 2020; 22-23:25-27. [PMID: 31890551 PMCID: PMC6924286 DOI: 10.1016/j.euprot.2019.07.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2019] [Accepted: 07/17/2019] [Indexed: 11/25/2022]
Abstract
This work on solving the mystery of words encoded by amino acids in peptides arose from the YPIC-EuPA Challenge. We received a dry synthetic peptide sample and performed a mass spectrometric analysis followed by de novo peptide sequencing. As a result, a part of “Rays of positive electricity and their application to chemical analyses” by J.J. Thomson was found to be encoded in the peptides of the sample. The words were first revealed from the peptides and then matched by a Google search to find the answer. After that, the answer was validated using a standard proteomic search against a database constructed from the quotation found.
|
|
5 |
|
9
|
Kozlova A, Sarygina E, Ilgisonis E, Tarbeeva S, Ponomarenko E. The Translatome Map: RNC-Seq vs. Ribo-Seq for Profiling of HBE, A549, and MCF-7 Cell Lines. Int J Mol Sci 2024; 25:10970. [PMID: 39456753 PMCID: PMC11507076 DOI: 10.3390/ijms252010970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 10/07/2024] [Accepted: 10/08/2024] [Indexed: 10/28/2024] Open
Abstract
Gene expression is a tightly regulated process that involves multiple layers of control, including transcriptional, post-transcriptional, and translational regulation. To gain a comprehensive understanding of gene expression dynamics and its functional implications, it is crucial to compare translatomic, transcriptomic, and proteomic data. The two most common analysis methods, Ribo-seq and RNC-Seq, were used to analyze the translatome of the same sample, whose datasets were downloaded from the TranslatomeDB database. The resulting translatome maps obtained for three cell lines (HBE, A549, and MCF-7) using these two methods were comparatively analyzed. The two methods of translatome analysis were shown to provide comparable results and can be used interchangeably. The obtained mRNA translation patterns were annotated in the transcriptome and proteome context for the same sample, which may become the basis for the reconstruction of the molecular mechanisms of pathological process development in the future.
|
research-article |
1 |
|
10
|
Archakov A, Vavilov N, Ilgisonis E, Lisitsa A, Ponomarenko E, Farafonova T, Tikhonova O, Zgoda V. Number of Detected Proteins as the Function of the Sensitivity of Proteomic Technology in Human Liver Cells. Curr Protein Pept Sci 2022; 23:290-298. [DOI: 10.2174/1389203723666220526092941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 03/14/2022] [Accepted: 03/25/2022] [Indexed: 11/22/2022]
Abstract
Aims: The main goal of the Russian part of C-HPP is to detect and functionally annotate missing proteins (PE2-PE4) encoded by human chromosome 18. To achieve this goal, it is necessary to use the most sensitive methods of analysis.
Background: However, identifying such proteins in a complex biological mixture using mass spectrometry (MS)-based methods is difficult due to the insufficient sensitivity of proteomic analysis methods. A possible solution to the problem is the pre-fractionation of a complex biological sample at the sample preparation stage.
Objective: This study aims to measure the detection limit of SRM SIS analysis using a standard set of UPS1 proteins, to find a way to enhance the sensitivity of the analysis, to detect proteins encoded by human chromosome 18 in liver tissue samples, and to compare the data with transcriptomic analysis of the same samples.
Methods: Mass spectrometry, data-dependent acquisition, selected reaction monitoring, high-performance liquid chromatography, data-dependent acquisition in combination with pre-fractionation by alkaline reversed-phase chromatography, and selected reaction monitoring in combination with pre-fractionation by alkaline reversed-phase chromatography were used in this study.
Results: The results revealed that 100% of UPS1 proteins in a mixture could only be identified at a concentration of at least 10⁻⁹ M. The decrease in concentration leads to protein losses associated with technology sensitivity, and no UPS1 protein is detected at a concentration of 10⁻¹³ M. Therefore, the two-dimensional fractionation of samples was applied to improve sensitivity. The human liver tissue was examined by selected reaction monitoring and shotgun methods of MS analysis using one-dimensional and two-dimensional fractionation to identify the proteins encoded by human chromosome 18. A total of 134 proteins were identified. The overlap between proteomic and transcriptomic data in human liver tissue was ~50%.
Conclusion: The sample concentration technique is well suited for a standard UPS1 system that is not contaminated with a complex biological sample. However, it is not suitable for use with a complex biological protein mixture. Thus, it is necessary to develop more sophisticated fractionation systems for the detection of all low-copy proteins. The weak convergence between proteomic and transcriptomic data is due to the low sensitivity of proteomic technology compared to transcriptomic approaches. Also, total mRNA was used to perform RNA-seq analysis, but not all detected mRNA molecules could be translated into proteins. This introduces additional uncertainty in the data; in the future, we plan to study only translated mRNA molecules, that is, the translatome.
Data are available via ProteomeXchange with identifier PXD026997.
|
|
3 |
|
11
|
Sarygina E, Kliuchnikova A, Tarbeeva S, Ilgisonis E, Ponomarenko E. Model Organisms in Aging Research: Evolution of Database Annotation and Ortholog Discovery. Genes (Basel) 2024; 16:8. [PMID: 39858555 PMCID: PMC11765380 DOI: 10.3390/genes16010008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2024] [Revised: 12/14/2024] [Accepted: 12/16/2024] [Indexed: 01/27/2025] Open
Abstract
BACKGROUND: This study aims to analyze how thoroughly popular model organisms have been explored, using annotations from the UniProtKB (Swiss-Prot) knowledge base. The research focuses on understanding the genomic and post-genomic data of various organisms, particularly in relation to aging as an integral model for studying the molecular mechanisms underlying pathological processes and physiological states. METHODS: Having characterized the organisms by selected parameters (numbers of gene splice variants, post-translational modifications, etc.) using previously developed information models, we calculated proteome sizes, that is, the number of possible proteoforms for each species. Our analysis also involved searching for orthologs of human aging genes within these model species. RESULTS: Our findings indicate that genomic and post-genomic data for more primitive species, such as bacteria and fungi, are more comprehensively characterized compared to other organisms. This is attributed to their experimental accessibility and simplicity. Additionally, we discovered that the genomes of the most studied model organisms allow for a detailed analysis of the aging process, revealing a greater number of orthologous genes related to aging. CONCLUSIONS: The results highlight the importance of annotating the genomes of less-studied species to identify orthologs of marker genes associated with complex physiological processes, including aging. Species that potentially possess unique traits associated with longevity and resilience to age-related changes require comprehensive genomic studies.
|
research-article |
1 |
|
12
|
Kozlova A, Sarygina E, Deinichenko K, Radko S, Ptitsyn K, Khmeleva S, Kurbatov L, Spirin P, Prassolov V, Ilgisonis E, Lisitsa A, Ponomarenko E. Comparison of Alternative Splicing Landscapes Revealed by Long-Read Sequencing in Hepatocyte-Derived HepG2 and Huh7 Cultured Cells and Human Liver Tissue. BIOLOGY 2023; 12:1494. [PMID: 38132320 PMCID: PMC10740679 DOI: 10.3390/biology12121494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 11/17/2023] [Accepted: 11/25/2023] [Indexed: 12/23/2023]
Abstract
The long-read RNA sequencing developed by Oxford Nanopore Technologies provides a direct quantification of transcript isoforms, thereby making it possible to present alternative splicing (AS) profiles as arrays of single splice variants with different abundances. Additionally, AS profiles can be presented as arrays of genes characterized by the degree of alternative splicing (DAS), that is, the number of detected splice variants per gene. Here, we successfully utilized the DAS to reveal biological pathways influenced by the alterations in AS in human liver tissue and the hepatocyte-derived malignant cell lines HepG2 and Huh7, employing the mathematical algorithm of gene set enrichment analysis. Furthermore, analysis of the AS profiles as abundances of single splice variants by using the graded tissue specificity index τ enabled the selection of groups of genes expressing particular splice variants specifically in liver tissue, HepG2 cells, and Huh7 cells. The majority of these splice variants were translated into protein products and appear to merit attention in further studies of the mechanisms underlying cell malignization. The metrics used are intrinsically suitable for transcriptome-wide AS profiling using long-read sequencing.
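The abstract names the graded tissue specificity index τ without giving its formula. A minimal sketch, assuming the commonly used definition τ = Σ(1 − x_i / x_max) / (n − 1) over n biospecimen types, is shown below. The function name and the toy abundances are hypothetical, and any preprocessing the authors applied (for example log transformation or thresholds) is not known from the abstract.

```python
from typing import Sequence

def tau(expression: Sequence[float]) -> float:
    """Tissue specificity index: 0 = uniform expression, 1 = single-tissue."""
    n = len(expression)
    peak = max(expression) if n else 0.0
    if n < 2 or peak <= 0:
        raise ValueError("need at least two tissues and a positive maximum")
    # Each tissue contributes how far it falls below the most abundant one.
    return sum(1 - x / peak for x in expression) / (n - 1)

if __name__ == "__main__":
    # Toy abundances of one splice variant in three biospecimen types.
    print(tau([10.0, 0.0, 0.0]))   # variant seen in one biospecimen type only
    print(tau([5.0, 5.0, 5.0]))    # variant equally abundant everywhere
    print(tau([10.0, 5.0, 0.0]))   # intermediate case
```

Splice variants with τ close to 1 would be the candidates for specific expression in liver tissue, HepG2 cells, or Huh7 cells.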
|
research-article |
2 |
|