Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Talukder A, Barham C, Li X, Hu H. Interpretation of deep learning in genomics and epigenomics. Brief Bioinform 2021;22:bbaa177. [PMID: 34020542 PMCID: PMC8138893 DOI: 10.1093/bib/bbaa177] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 05/25/2020] [Revised: 06/26/2020] [Accepted: 07/10/2020] [Indexed: 12/17/2022] Open

Cited by Other Article(s)

Xu L, Liu Y. Identification, Design, and Application of Noncoding Cis-Regulatory Elements. Biomolecules 2024;14:945. [PMID: 39199333 PMCID: PMC11352686 DOI: 10.3390/biom14080945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 09/01/2024] Open

Chen V, Yang M, Cui W, Kim JS, Talwalkar A, Ma J. Applying interpretable machine learning in computational biology-pitfalls, recommendations and opportunities for new developments. Nat Methods 2024;21:1454-1461. [PMID: 39122941 PMCID: PMC11348280 DOI: 10.1038/s41592-024-02359-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 06/24/2024] [Indexed: 08/12/2024]

Viet CT, Zhang M, Dharmaraj N, Li GY, Pearson AT, Manon VA, Grandhi A, Xu K, Aouizerat BE, Young S. Artificial Intelligence Applications in Oral Cancer and Oral Dysplasia. Tissue Eng Part A 2024. [PMID: 39041628 DOI: 10.1089/ten.tea.2024.0096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/24/2024] Open

Abstract

Oral squamous cell carcinoma (OSCC) is a highly unpredictable disease with devastating mortality rates that have not changed over the past decades, in the face of advancements in treatments and biomarkers, which have improved survival for other cancers. Delays in diagnosis are frequent, leading to more disfiguring treatments and poor outcomes in patients. The clinical challenge lies in identifying those patients at highest risk for developing OSCC. Oral epithelial dysplasia (OED) is a precursor of OSCC with highly variable behavior across patients. There is no reliable clinical, pathologic, histologic or molecular biomarker to determine individual risk in OED patients. Similarly, there are no robust biomarkers to predict treatment outcomes or mortality of OSCC patients. This review aims to highlight advancements in artificial intelligence (AI)-based methods to develop predictive biomarkers of OED transformation to OSCC or predictive biomarkers of OSCC mortality and treatment response. Machine-learning based biomarkers, such as S100A7, demonstrate promising appraisal for the risk of malignant transformation of OED. Machine learning-enhanced multiplex immunohistochemistry (mIHC) workflows examine immune cell patterns and organization within the tumor immune microenvironment to generate outcome predictions in immunotherapy. Deep learning (DL) is an AI-based method using an extended neural network or related architecture with multiple "hidden" layers of simulated neurons to combine simple visual features into complex patterns. DL-based digital pathology is currently being developed to assess OED and OSCC outcomes. The integration of machine learning in epigenomics aims to examine the epigenetic modification of diseases and improve our ability to detect, classify, and predict outcomes associated with epigenetic marks. 
Collectively, these tools showcase promising advancements in discovery and technology, which may provide a potential solution to addressing the current limitations in predicting OED transformation and OSCC behavior, both of which are clinical challenges that must be addressed in order to improve OSCC survival.

Weston M, Hu H, Li X. PSPI: A deep learning approach for prokaryotic small protein identification. Front Genet 2024;15:1439423. [PMID: 39050248 PMCID: PMC11266045 DOI: 10.3389/fgene.2024.1439423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024] Open

Wang S, Wang W. Interpretable prediction of mRNA abundance from promoter sequence using contextual regression models. NAR Genom Bioinform 2024;6:lqae055. [PMID: 38807713 PMCID: PMC11131020 DOI: 10.1093/nargab/lqae055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 04/08/2024] [Accepted: 05/12/2024] [Indexed: 05/30/2024] Open

Dotan E, Jaschek G, Pupko T, Belinkov Y. Effect of tokenization on transformers for biological sequences. Bioinformatics 2024;40:btae196. [PMID: 38608190 PMCID: PMC11055402 DOI: 10.1093/bioinformatics/btae196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 02/20/2024] [Accepted: 04/11/2024] [Indexed: 04/14/2024] Open

Abstract

MOTIVATION

Deep-learning models are transforming biological research, including many bioinformatics and comparative genomics algorithms, such as sequence alignments, phylogenetic tree inference, and automatic classification of protein functions. Among these deep-learning algorithms, models for processing natural languages, developed in the natural language processing (NLP) community, were recently applied to biological sequences. However, biological sequences are different from natural languages, such as English, and French, in which segmentation of the text to separate words is relatively straightforward. Moreover, biological sequences are characterized by extremely long sentences, which hamper their processing by current machine-learning models, notably the transformer architecture. In NLP, one of the first processing steps is to transform the raw text to a list of tokens. Deep-learning applications to biological sequence data mostly segment proteins and DNA to single characters. In this work, we study the effect of alternative tokenization algorithms on eight different tasks in biology, from predicting the function of proteins and their stability, through nucleotide sequence alignment, to classifying proteins to specific families.

RESULTS

We demonstrate that applying alternative tokenization algorithms can increase accuracy and at the same time, substantially reduce the input length compared to the trivial tokenizer in which each character is a token. Furthermore, applying these tokenization algorithms allows interpreting trained models, taking into account dependencies among positions. Finally, we trained these tokenizers on a large dataset of protein sequences containing more than 400 billion amino acids, which resulted in over a 3-fold decrease in the number of tokens. We then tested these tokenizers trained on large-scale data on the above specific tasks and showed that for some tasks it is highly beneficial to train database-specific tokenizers. Our study suggests that tokenizers are likely to be a critical component in future deep-network analysis of biological sequence data.

AVAILABILITY AND IMPLEMENTATION

Code, data, and trained tokenizers are available on https://github.com/technion-cs-nlp/BiologicalTokenizers.

Sharma A, Lysenko A, Jia S, Boroevich KA, Tsunoda T. Advances in AI and machine learning for predictive medicine. J Hum Genet 2024:10.1038/s10038-024-01231-y. [PMID: 38424184 DOI: 10.1038/s10038-024-01231-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 02/04/2024] [Accepted: 02/12/2024] [Indexed: 03/02/2024]

Yan Z, Ge F, Liu Y, Zhang Y, Li F, Song J, Yu DJ. TransEFVP: A Two-Stage Approach for the Prediction of Human Pathogenic Variants Based on Protein Sequence Embedding Fusion. J Chem Inf Model 2024;64:1407-1418. [PMID: 38334115 DOI: 10.1021/acs.jcim.3c02019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]

Mikhailova AA, Rinke S, Harrison MC. Genomic signatures of eusocial evolution in insects. Curr Opin Insect Sci 2024;61:101136. [PMID: 37922983 DOI: 10.1016/j.cois.2023.101136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/27/2023] [Accepted: 10/28/2023] [Indexed: 11/07/2023]

Zheng H, Wang S, Li X, Hu H. A computational modeling of pri-miRNA expression. PLoS One 2024;19:e0290768. [PMID: 38165860 PMCID: PMC10760784 DOI: 10.1371/journal.pone.0290768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Accepted: 08/15/2023] [Indexed: 01/04/2024] Open

van Oosterhout C. AI-informed conservation genomics. Heredity (Edinb) 2024;132:1-4. [PMID: 38151537 PMCID: PMC10798949 DOI: 10.1038/s41437-023-00666-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/09/2023] [Accepted: 12/11/2023] [Indexed: 12/29/2023] Open

Qiao C, Gao B, Liu Y, Hu X, Hu W, Calhoun VD, Wang YP. Deep learning with explainability for characterizing age-related intrinsic differences in dynamic brain functional connectivity. Med Image Anal 2023;90:102941. [PMID: 37683445 DOI: 10.1016/j.media.2023.102941] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/12/2021] [Revised: 08/19/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023]

Toussaint PA, Leiser F, Thiebes S, Schlesner M, Brors B, Sunyaev A. Explainable artificial intelligence for omics data: a systematic mapping study. Brief Bioinform 2023;25:bbad453. [PMID: 38113073 PMCID: PMC10729786 DOI: 10.1093/bib/bbad453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 07/28/2023] [Accepted: 11/08/2023] [Indexed: 12/21/2023] Open

Klie A, Laub D, Talwar JV, Stites H, Jores T, Solvason JJ, Farley EK, Carter H. Predictive analyses of regulatory sequences with EUGENe. Nat Comput Sci 2023;3:946-956. [PMID: 38177592 PMCID: PMC10768637 DOI: 10.1038/s43588-023-00544-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 09/27/2023] [Indexed: 01/06/2024]

Tognon M, Giugno R, Pinello L. A survey on algorithms to characterize transcription factor binding sites. Brief Bioinform 2023;24:bbad156. [PMID: 37099664 PMCID: PMC10422928 DOI: 10.1093/bib/bbad156] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/13/2023] [Revised: 03/27/2023] [Accepted: 04/01/2023] [Indexed: 04/28/2023] Open

Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023;24:198. [PMID: 37189058 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open

Abstract

BACKGROUND

There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings.

METHODS

This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods.

RESULTS

We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge to support better generalisation (e.g. pathways or Protein-Protein-Interaction networks) and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models.

CONCLUSIONS

The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.

Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023;24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open

Xu F, Qiao C, Zhou H, Calhoun VD, Stephen JM, Wilson TW, Wang Y. An explainable autoencoder with multi-paradigm fMRI fusion for identifying differences in dynamic functional connectivity during brain development. Neural Netw 2023;159:185-197. [PMID: 36580711 DOI: 10.1016/j.neunet.2022.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/19/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]

Deep learning in regulatory genomics: from identification to design. Curr Opin Biotechnol 2023;79:102887. [PMID: 36640453 DOI: 10.1016/j.copbio.2022.102887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 11/12/2022] [Accepted: 12/14/2022] [Indexed: 01/14/2023]

Hu Q, Li K, Yang C, Wang Y, Huang R, Gu M, Xiao Y, Huang Y, Chen L. The role of artificial intelligence based on PET/CT radiomics in NSCLC: Disease management, opportunities, and challenges. Front Oncol 2023;13:1133164. [PMID: 36959810 PMCID: PMC10028142 DOI: 10.3389/fonc.2023.1133164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 02/20/2023] [Indexed: 03/09/2023] Open

Affiliation(s)

Qiuyuan Hu, Department of positron emission tomography/computed tomography (PET/CT) Center, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China
Ke Li, Department of Cancer Biotherapy Center, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China
Conghui Yang, Department of positron emission tomography/computed tomography (PET/CT) Center, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China
Yue Wang, Department of positron emission tomography/computed tomography (PET/CT) Center, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China
Rong Huang, Department of positron emission tomography/computed tomography (PET/CT) Center, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China
Mingqiu Gu, Department of positron emission tomography/computed tomography (PET/CT) Center, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China
Yuqiang Xiao, Department of positron emission tomography/computed tomography (PET/CT) Center, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China
Yunchao Huang, Department of Thoracic Surgery I, Key Laboratory of Lung Cancer of Yunnan Province, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China (*Correspondence: Long Chen; Yunchao Huang)
Long Chen, Department of positron emission tomography/computed tomography (PET/CT) Center, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Cancer Center of Yunnan Province, Kunming, Yunnan, China (*Correspondence: Long Chen; Yunchao Huang)

Ott J, Park T. Overview of frequent pattern mining. Genomics Inform 2022;20:e39. [PMID: 36617647 PMCID: PMC9847378 DOI: 10.5808/gi.22074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 12/22/2022] [Indexed: 12/31/2022] Open

Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022;21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] Open

Lee M, Kim PJ, Joe H, Kim HG. Gene-centric multi-omics integration with convolutional encoders for cancer drug response prediction. Comput Biol Med 2022;151:106192. [PMID: 36327883 DOI: 10.1016/j.compbiomed.2022.106192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/26/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]

Lan AY, Corces MR. Deep learning approaches for noncoding variant prioritization in neurodegenerative diseases. Front Aging Neurosci 2022;14:1027224. [PMID: 36466610 PMCID: PMC9716280 DOI: 10.3389/fnagi.2022.1027224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 10/24/2022] [Indexed: 11/19/2022] Open

Linder J, Koplik SE, Kundaje A, Seelig G. Deciphering the impact of genetic variation on human polyadenylation using APARENT2. Genome Biol 2022;23:232. [PMID: 36335397 PMCID: PMC9636789 DOI: 10.1186/s13059-022-02799-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/17/2022] [Accepted: 10/19/2022] [Indexed: 11/08/2022] Open

Abstract

BACKGROUND

3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging.

RESULTS

We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells.

CONCLUSIONS

A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.

Wang Y, Huang X, Xian B, Jiang H, Zhou T, Chen S, Wen F, Pei J. Machine learning and bioinformatics-based insights into the potential targets of saponins in Paris polyphylla smith against non-small cell lung cancer. Front Genet 2022;13:1005896. [PMID: 36386821 PMCID: PMC9649596 DOI: 10.3389/fgene.2022.1005896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 10/17/2022] [Indexed: 12/24/2022] Open

Abstract

Background: Lung cancer has the highest mortality rate among cancers worldwide, and non-small cell lung cancer (NSCLC) is the major lethal factor. Saponins in Paris polyphylla Smith exhibit antitumor activity against non-small cell lung cancer, but their targets are not fully understood.

Methods: In this study, we used differential gene analysis, lasso regression analysis and support vector machine recursive feature elimination (SVM-RFE) to screen potential key genes for NSCLC by using relevant datasets from the GEO database. The accuracy of the signature genes was verified by using ROC curves and gene expression values. Potential active ingredients for the treatment of NSCLC were screened by molecular docking of the reported active ingredients of saponins in Paris polyphylla Smith with the screened signature genes. The activity of the screened components and their effects on key gene expression were further validated by CCK-8, flow cytometry (apoptosis and cycling) and qPCR.

Results: 204 differential genes and two key genes (RHEBL1, RNPC3) stood out in the bioinformatics analysis. Overall survival (OS), first-progression survival (FP) and post-progression survival (PPS) analysis revealed that low expression of RHEBL1 and high expression of RNPC3 indicated good prognosis. In addition, Polyphyllin VI (PPVI) and Protodioscin (Prot) effectively inhibited the proliferation of a non-small cell lung cancer cell line, with IC50 values of 4.46 ± 0.69 μM and 8.09 ± 0.67 μM, respectively. The number of apoptotic cells increased significantly with increasing concentrations of PPVI and Prot. Prot induced G1/G0 phase cell cycle arrest and PPVI induced G2/M phase cell cycle arrest. After PPVI and Prot acted on this cell line for 48 h, the expression of RHEBL1 and RNPC3 was found to be consistent with the results of bioinformatics analysis.

Conclusion: This study identified two potential key genes (RHEBL1 and RNPC3) in NSCLC. Additionally, PPVI and Prot may act on RHEBL1 and RNPC3 to affect NSCLC. Our findings provide a reference for clinical treatment of NSCLC.

Towards a better understanding of TF-DNA binding prediction from genomic features. Comput Biol Med 2022;149:105993. [DOI: 10.1016/j.compbiomed.2022.105993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 07/12/2022] [Accepted: 08/14/2022] [Indexed: 11/17/2022]

Watson DS. Interpretable machine learning for genomics. Hum Genet 2022;141:1499-1513. [PMID: 34669035 PMCID: PMC8527313 DOI: 10.1007/s00439-021-02387-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/20/2021] [Accepted: 10/08/2021] [Indexed: 12/19/2022]

Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum Genomics 2022;16:26. [PMID: 35879805 PMCID: PMC9317091 DOI: 10.1186/s40246-022-00396-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/24/2021] [Accepted: 07/12/2022] [Indexed: 12/02/2022] Open

Talukder A, Zhang W, Li X, Hu H. A deep learning method for miRNA/isomiR target detection. Sci Rep 2022;12:10618. [PMID: 35739186 PMCID: PMC9226005 DOI: 10.1038/s41598-022-14890-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/29/2022] [Accepted: 06/14/2022] [Indexed: 11/30/2022] Open

Zhang Y, Hua S, Jiang Q, Xie Z, Wu L, Wang X, Shi F, Dong S, Jiang J. Identification of Feature Genes of a Novel Neural Network Model for Bladder Cancer. Front Genet 2022;13:912171. [PMID: 35719407 PMCID: PMC9198295 DOI: 10.3389/fgene.2022.912171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022] Open

Abstract

Background: The combination of deep learning methods and oncogenomics can provide an effective diagnostic method for malignant tumors; thus, we attempted to construct a reliable artificial neural network model as a novel diagnostic tool for Bladder cancer (BLCA).

Methods: Three expression profiling datasets (GSE61615, GSE65635, and GSE100926) were downloaded from the Gene Expression Omnibus (GEO) database. GSE61615 and GSE65635 were taken as the train group, while GSE100926 was set as the test group. Differentially expressed genes (DEGs) were filtered out based on the logFC and FDR values. We also performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses to explore the biological functions of the DEGs. Consequently, we utilized a random forest algorithm to identify feature genes and further constructed a neural network model. The test group was given the same procedures to validate the reliability of the model. We also explored immune cells' infiltration degree and correlation coefficients through the CiberSort algorithm and corrplot R package. The qRT-PCR assay was implemented to examine the expression level of the feature genes in vitro.

Results: A total of 265 DEGs were filtered out and significantly enriched in muscle system processes, collagen-containing and focal adhesion signaling pathways. Based on the random forest algorithm, we selected 14 feature genes to construct the neural network model. The area under the curve (AUC) of the training group was 0.950 (95% CI: 0.850-1.000), and the AUC of the test group was 0.667 (95% CI: 0.333-1.000). Besides, we observed significant differences in the content of immune infiltrating cells and the expression levels of the feature genes.

Conclusion: After repeated verification, our neural network model had clinical feasibility to identify bladder cancer patients and provided a potential target to improve the management of BLCA.

Mo H, Breitling R, Francavilla C, Schwartz JM. Data integration and mechanistic modelling for breast cancer biology: Current state and future directions. Curr Opin Endocr Metab Res 2022;24. [PMID: 36034741 PMCID: PMC9402443 DOI: 10.1016/j.coemr.2022.100350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]

Prybutok AN, Cain JY, Leonard JN, Bagheri N. Fighting fire with fire: deploying complexity in computational modeling to effectively characterize complex biological systems. Curr Opin Biotechnol 2022;75:102704. [DOI: 10.1016/j.copbio.2022.102704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2021] [Revised: 01/27/2022] [Accepted: 02/06/2022] [Indexed: 11/03/2022]

Chalupová E, Vaculík O, Poláček J, Jozefov F, Majtner T, Alexiou P. ENNGene: an Easy Neural Network model building tool for Genomics. BMC Genomics 2022;23:248. [PMID: 35361122 PMCID: PMC8973509 DOI: 10.1186/s12864-022-08414-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/05/2021] [Accepted: 02/23/2022] [Indexed: 11/17/2022] Open

Abstract

Background

The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field.

Results

Here we present ENNGene—Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein.

Conclusions

As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-022-08414-x.

Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022;23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022] Open

Ventolero MF, Wang S, Hu H, Li X. Computational analyses of bacterial strains from shotgun reads. Brief Bioinform 2022;23:6524011. [PMID: 35136954 DOI: 10.1093/bib/bbac013] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/21/2022] Open

Interpreting neural networks for biological sequences by learning stochastic masks. Nat Mach Intell 2022;4:41-54. [DOI: 10.1038/s42256-021-00428-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]

AIM in Medical Informatics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. Entropy (Basel) 2021;24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]

Instance Segmentation for Governmental Inspection of Small Touristic Infrastructure in Beach Zones Using Multispectral High-Resolution WorldView-3 Imagery. ISPRS International Journal of Geo-Information 2021. [DOI: 10.3390/ijgi10120813] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Abstract

Misappropriation of public lands is an ongoing government concern. In Brazil, the beach zone is public property, but many private establishments use it for economic purposes, requiring constant inspection. Among the undue targets, the individual mapping of straw beach umbrellas (SBUs) attached to the sand is a great challenge due to their small size, high presence, and agglutinated appearance. This study aims to automatically detect and count SBUs on public beaches using high-resolution images and instance segmentation, obtaining pixel-wise semantic information and individual object detection. This study is the first instance segmentation application on coastal areas and the first using WorldView-3 (WV-3) images. We used the Mask-RCNN with some modifications: (a) multispectral input for the WorldView-3 imagery (eight channels), (b) improved the sliding window algorithm for large image classification, and (c) comparison of different image resizing ratios to improve small object detection since the SBUs are small objects (<32² pixels) even using high-resolution images (31 cm). The accuracy analysis used standard COCO metrics considering the original image and three scale ratios (2×, 4×, and 8× resolution increase). The average precision (AP) results increased proportionally to the image resolution: 30.49% (original image), 48.24% (2×), 53.45% (4×), and 58.11% (8×). The 8× model presented 94% AP50, classifying nearly all SBUs correctly. Moreover, the improved sliding window approach enables the classification of large areas providing automatic counting and estimating the size of the objects, proving to be effective for inspecting large coastal areas and providing insightful information for public managers. This remote sensing application impacts the inspection cost, tribute, and environmental conditions.

Deep Learning for Human Disease Detection, Subtype Classification, and Treatment Response Prediction Using Epigenomic Data. Biomedicines 2021;9:biomedicines9111733. [PMID: 34829962 PMCID: PMC8615388 DOI: 10.3390/biomedicines9111733] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Revised: 10/26/2021] [Accepted: 11/17/2021] [Indexed: 12/25/2022] Open

Westerlund AM, Hawe JS, Heinig M, Schunkert H. Risk Prediction of Cardiovascular Events by Exploration of Molecular Data with Explainable Artificial Intelligence. Int J Mol Sci 2021;22:10291. [PMID: 34638627 PMCID: PMC8508897 DOI: 10.3390/ijms221910291] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Revised: 09/17/2021] [Accepted: 09/18/2021] [Indexed: 12/11/2022] Open

Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021;19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 166] [Impact Index Per Article: 55.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open

Cha M, Zheng H, Talukder A, Barham C, Li X, Hu H. A two-stream convolutional neural network for microRNA transcription start site feature integration and identification. Sci Rep 2021;11:5625. [PMID: 33707582 PMCID: PMC7952457 DOI: 10.1038/s41598-021-85173-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2020] [Accepted: 02/24/2021] [Indexed: 12/31/2022] Open

Oh VKS, Li RW. Temporal Dynamic Methods for Bulk RNA-Seq Time Series Data. Genes (Basel) 2021;12:352. [PMID: 33673721 PMCID: PMC7997275 DOI: 10.3390/genes12030352] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 02/06/2023] Open

Bruno P, Calimeri F, Greco G. AIM in Medical Informatics. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_32-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]