Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Stephan J, Stegle O, Beyer A. A random forest approach to capture genetic effects in the presence of population structure. Nat Commun 2015;6:7432. [DOI: 10.1038/ncomms8432] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Received: 07/27/2014] [Accepted: 05/08/2015] [Indexed: 01/07/2023]

Cited by Other Article(s)

Wang Q, Tang TM, Youlton N, Weldy CS, Kenney AM, Ronen O, Weston Hughes J, Chin ET, Sutton SC, Agarwal A, Li X, Behr M, Kumbier K, Moravec CS, Wilson Tang WH, Margulies KB, Cappola TP, Butte AJ, Arnaout R, Brown JB, Priest JR, Parikh VN, Yu B, Ashley EA. Epistasis regulates genetic control of cardiac hypertrophy. medRxiv 2024:2023.11.06.23297858. [PMID: 37987017 PMCID: PMC10659487 DOI: 10.1101/2023.11.06.23297858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]

Mouronte-López ML, Gómez Sánchez-Seco J, Benito RM. Patterns of human and bots behaviour on Twitter conversations about sustainability. Sci Rep 2024;14:3223. [PMID: 38331929 PMCID: PMC10853507 DOI: 10.1038/s41598-024-52471-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 01/18/2024] [Indexed: 02/10/2024]

Rhodes JS, Aumon A, Morin S, Girard M, Larochelle C, Brunet-Ratnasingham E, Pagliuzza A, Marchitto L, Zhang W, Cutler A, Grand'Maison F, Zhou A, Finzi A, Chomont N, Kaufmann DE, Zandee S, Prat A, Wolf G, Moon KR. Gaining Biological Insights through Supervised Data Visualization. bioRxiv 2024:2023.11.22.568384. [PMID: 38293135 PMCID: PMC10827133 DOI: 10.1101/2023.11.22.568384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]

Bonham KS, Fahur Bottino G, McCann SH, Beauchemin J, Weisse E, Barry F, Cano Lorente R, Huttenhower C, Bruchhage M, D’Sa V, Deoni S, Klepac-Ceraj V. Gut-resident microorganisms and their genes are associated with cognition and neuroanatomy in children. Sci Adv 2023;9:eadi0497. [PMID: 38134274 PMCID: PMC10745691 DOI: 10.1126/sciadv.adi0497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 11/21/2023] [Indexed: 12/24/2023]

Li X, Hu H, Ren Q, Wang M, Du Y, He Y, Wang Q. Comparative analysis of endophyte diversity of Dendrobium officinale lived on rock and tree. Plant Biotechnol (Tokyo) 2023;40:145-155. [PMID: 38264473 PMCID: PMC10804140 DOI: 10.5511/plantbiotechnology.23.0208a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Accepted: 02/08/2023] [Indexed: 01/25/2024]

Mo X, Wang N, He Z, Kang W, Wang L, Han X, Yang L. The sub-molecular characterization identification for cervical cancer. Heliyon 2023;9:e16873. [PMID: 37484385 PMCID: PMC10360967 DOI: 10.1016/j.heliyon.2023.e16873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/28/2023] [Accepted: 05/31/2023] [Indexed: 07/25/2023]

Abstract

Background

The efficacy of therapy in cervical cancer (CESC) is limited by high molecular heterogeneity. Sub-molecular characterization therefore remains to be explored as a basis for personalizing the treatment of CESC patients.

Methods

Datasets with 741 CESC patients were obtained from TCGA and GEO databases. The NMF algorithm, random forest algorithm, and multivariate Cox analysis were utilized to construct a classifier for defining the sub-molecular characterization. Then, the biological characteristics, genomic variations, prognosis, and immune landscape in molecular subtypes were explored. The significance of classifier genes was validated by quantitative Real-Time PCR, cell transfection, cell colony formation assay, wound healing assay, cell proliferation assay, and Western blot.

Results

The CESC patients were classified into two subtypes. Patients with a high classifier score differed significantly in ECM-receptor interaction, the PI3K-Akt signaling pathway, and the MAPK signaling pathway, and showed a poorer prognosis in OS (p < 0.001), DFI (p = 0.016), PFI (p < 0.001), and DSS (p < 0.001), together with high infiltration of M0 macrophages and resting mast cells and low HLA family gene expression. Moreover, the constructed classifier showed high discriminative accuracy for the tumor/normal groups (AUC: 0.993), the tumor/CIN1-CIN3 groups (AUC: 0.963), and the normal/CIN1-CIN3 groups (AUC: 0.962), and its overall prediction performance was better than that of currently published signatures in CESC (C-index: 0.763). The combined prediction performance further indicated that the nomogram (AUC = 0.837) is superior to the classifier (AUC = 0.835) and stage (AUC = 0.568), and the C-index of the calibration curves is 0.784. Functional analysis of the classifier genes indicated that silencing GALNT2 inhibited cancer cell proliferation, migration, and colony formation; conversely, proliferation, migration, and colony formation increased after upregulation of GALNT2. The epithelial-mesenchymal transition experiment showed that GALNT2 knockdown might reduce the levels of Snail and Vimentin proteins and increase E-cadherin; conversely, GALNT2 upregulation increased Snail and Vimentin and reduced E-cadherin.

Conclusion

The classifier we constructed may help improve our understanding of subtype characteristics and provide a new strategy for developing CESC therapeutics. Remarkably, GALNT2 may be an option to directly target drivers in CESC cancer therapy.
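
The abstract above reports a concordance index (C-index) without defining it. As a rough illustration only, the sketch below computes Harrell's C-index for right-censored survival data in plain Python. It is not the authors' code; the function name, the toy data, and the simplification of skipping pairs with tied times are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell's C-index (illustrative sketch, not the cited paper's code).

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the subject was censored
    scores -- predicted risk scores (higher = expected to fail earlier)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped; a pair is also not
        # comparable when the earlier subject was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif scores[a] == scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_scores)
```

On this toy input, five pairs are comparable and four of them are ordered correctly, so `toy_c` is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.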

Takefuji Y. Why the power of diversity does not always produce better groups and societies. Biosystems 2023;229:104918. [PMID: 37196894 DOI: 10.1016/j.biosystems.2023.104918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 05/06/2023] [Accepted: 05/07/2023] [Indexed: 05/19/2023]

Baiocchi GC, Vojdani A, Rosenberg AZ, Vojdani E, Halpert G, Ostrinski Y, Zyskind I, Filgueiras IS, Schimke LF, Marques AHC, Giil LM, Lavi YB, Silverberg JI, Zimmerman J, Hill DA, Thornton A, Kim M, De Vito R, Fonseca DLM, Plaça DR, Freire PP, Camara NOS, Calich VLG, Scheibenbogen C, Heidecke H, Lattin MT, Ochs HD, Riemekasten G, Amital H, Shoenfeld Y, Cabral-Marques O. Cross-sectional analysis reveals autoantibody signatures associated with COVID-19 severity. J Med Virol 2023;95:e28538. [PMID: 36722456 DOI: 10.1002/jmv.28538] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/28/2022] [Revised: 01/20/2023] [Accepted: 01/24/2023] [Indexed: 02/02/2023]

Affiliation(s)

Gabriela C Baiocchi: Department of Immunology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
Aristo Vojdani: Immunosciences Laboratory, Inc., Department of Immunology, Los Angeles, California, USA; Cyrex Laboratories, Phoenix, Arizona, USA
Avi Z Rosenberg: Department of Pathology, Johns Hopkins University, Baltimore, Maryland, USA
Elroy Vojdani: Regenera Medical, Los Angeles, California, USA
Gilad Halpert: Ariel University, Ariel, Israel; Zabludowicz Center for Autoimmune Diseases, Sheba Medical Center, Tel-Hashomer, Israel; Saint Petersburg State University Russia, St Petersburg, Russia
Yuri Ostrinski: Ariel University, Ariel, Israel; Zabludowicz Center for Autoimmune Diseases, Sheba Medical Center, Tel-Hashomer, Israel; Saint Petersburg State University Russia, St Petersburg, Russia
Israel Zyskind: Department of Pediatrics, NYU Langone Medical Center, New York, New York, USA; Maimonides Medical Center, Brooklyn, New York, USA
Igor S Filgueiras: Department of Immunology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
Lena F Schimke: Department of Immunology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
Alexandre H C Marques: Department of Immunology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
Lasse M Giil: Department of Internal Medicine, Haraldsplass Deaconess Hospital, Bergen, Norway
Yael B Lavi: Department of Chemistry, Ben Gurion University, Beer-Sheva, Israel
Jonathan I Silverberg: Department of Dermatology, George Washington University School of Medicine and Health Sciences, Washington, USA
Jason Zimmerman: Maimonides Medical Center, Brooklyn, New York, USA
Dana A Hill: ResourcePath, Sterling, Virginia, USA
Amanda Thornton: ResourcePath, Sterling, Virginia, USA
Myungjin Kim: Data Science Initiative at Brown University, Providence, Rhode Island, USA
Roberta De Vito: Department of Biostatistics and the Data Science Initiative at Brown University, Providence, Rhode Island, USA
Dennyson L M Fonseca: Interunit Postgraduate Program on Bioinformatics, Institute of Mathematics and Statistics (IME), University of Sao Paulo (USP), Sao Paulo, Brazil
Desireé R Plaça: Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, São Paulo, Brazil
Paula P Freire: Department of Immunology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
Niels O S Camara: Department of Immunology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
Vera L G Calich: Department of Immunology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
Carmen Scheibenbogen: Institute for Medical Immunology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
Harald Heidecke: CellTrend Gesellschaft mit beschränkter Haftung (GmbH), Luckenwalde, Germany
Miriam T Lattin: Department of Biology, Yeshiva University, Manhattan, New York, USA
Hans D Ochs: Department of Pediatrics, University of Washington School of Medicine, and Seattle Children's Research Institute, Seattle, Washington, USA
Gabriela Riemekasten: Department of Rheumatology, University Medical Center Schleswig-Holstein Campus Lübeck, Lübeck, Germany
Howard Amital: Ariel University, Ariel, Israel; Zabludowicz Center for Autoimmune Diseases, Sheba Medical Center, Tel-Hashomer, Israel; Department of Medicine B, Sheba Medical Center, Tel Hashomer, Israel; Sackler Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
Yehuda Shoenfeld: Zabludowicz Center for Autoimmune Diseases, Sheba Medical Center, Tel-Hashomer, Israel; Saint Petersburg State University Russia, St Petersburg, Russia
Otavio Cabral-Marques: Department of Immunology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil; Interunit Postgraduate Program on Bioinformatics, Institute of Mathematics and Statistics (IME), University of Sao Paulo (USP), Sao Paulo, Brazil; Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, São Paulo, Brazil; Department of Pharmacy and Postgraduate Program of Health and Science, Federal University of Rio Grande do Norte, Natal, Brazil; Department of Medicine, Division of Molecular Medicine, University of São Paulo School of Medicine, Baltimore, USA; Laboratory of Medical Investigation 29, University of São Paulo School of Medicine, São Paulo, Brazil

Rolczynski BS, Díaz SA, Kim YC, Mathur D, Klein WP, Medintz IL, Melinger JS. Determining interchromophore effects for energy transport in molecular networks using machine-learning algorithms. Phys Chem Chem Phys 2023;25:3651-3665. [PMID: 36648290 DOI: 10.1039/d2cp04960k] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/05/2023]

Bavykina M, Kostina N, Lee CR, Schafleitner R, Bishop-von Wettberg E, Nuzhdin SV, Samsonova M, Gursky V, Kozlov K. Modeling of Flowering Time in Vigna radiata with Artificial Image Objects, Convolutional Neural Network and Random Forest. Plants (Basel) 2022;11:3327. [PMID: 36501364 PMCID: PMC9738219 DOI: 10.3390/plants11233327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 11/22/2022] [Accepted: 11/28/2022] [Indexed: 06/17/2023]

Accelerating imputation of missing genotypes using parallel computing. J Genet 2022. [DOI: 10.1007/s12041-022-01396-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]

Li L, Wu X, Chen J, Wang S, Wan Y, Ji H, Wen Y, Zhang J. Genetic Dissection of Epistatic Interactions Contributing Yield-Related Agronomic Traits in Rice Using the Compressed Mixed Model. Plants (Basel) 2022;11:2504. [PMID: 36235370 PMCID: PMC9571936 DOI: 10.3390/plants11192504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Revised: 09/09/2022] [Accepted: 09/19/2022] [Indexed: 11/26/2022]

Abstract

Rice (Oryza sativa) is one of the most important cereal crops in the world, and yield-related agronomic traits, including plant height (PH), panicle length (PL), and protein content (PC), are prerequisites for attaining the desired yield and quality in breeding programs. Meanwhile, the main effects and epistatic effects of quantitative trait nucleotides (QTNs) are all important genetic components of yield-related quantitative traits. In this study, we conducted genome-wide association studies (GWAS) for 413 rice germplasm resources, with 36,901 single nucleotide polymorphisms (SNPs), to identify QTNs, QTN-by-QTN interactions (QQIs), and their candidate genes, using a multi-locus compressed variance component mixed model, 3VmrMLM. As a result, two significant QTNs and 56 paired QQIs were detected; among the 5219 genes of these QTNs, 26 were identified as confirmed yield-related genes, such as LCRN1, OsSPL3, and OsVOZ1 for PH, and LOG and OsBZR1 for PL. To reveal the substantial contributions related to the variation of yield-related agronomic traits in rice, we further implemented an enrichment analysis and an expression analysis. The results showed that 114 genes, covering nearly all significant QQIs, were involved in 37 GO terms; for example, the macromolecule metabolic process (GO:0043170), intracellular part (GO:0044424), and binding (GO:0005488). Most of the QQIs and the candidate genes were significantly involved in the biological process, molecular function, and cellular component categories for the target traits. The demonstrated genetic interactions play a critical role in yield-related agronomic traits of rice, and such epistatic interactions contributed to large portions of the missing heritability in GWAS. These results help us to understand the genetic basis underlying the inheritance of the three yield-related agronomic traits and provide implications for rice improvement.
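
The distinction between main effects and epistatic (QTN-by-QTN) effects in the abstract above can be made concrete with a two-locus toy example. The sketch below is not 3VmrMLM and is not taken from the cited paper; it assumes binary (0/1) genotype coding at two hypothetical loci and measures epistasis as the departure of the four genotype-class means from additivity. All names and numbers are illustrative.

```python
from statistics import mean


def interaction_contrast(records):
    """Two-locus interaction contrast (illustrative sketch only).

    records -- iterable of (g1, g2, phenotype), with g1 and g2 coded 0 or 1.
    Returns m11 - m10 - m01 + m00, where m_ab is the mean phenotype of
    genotype class (a, b). The value is 0 when the loci act additively.
    """
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for g1, g2, y in records:
        cells[(g1, g2)].append(y)
    m = {k: mean(v) for k, v in cells.items()}
    return m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]


# Hypothetical phenotypes: baseline 10, main effects 2 and 3.
additive = [(a, b, 10 + 2 * a + 3 * b) for a in (0, 1) for b in (0, 1)]
# Same main effects plus an extra 4 units only when both alleles are present.
epistatic = [(a, b, 10 + 2 * a + 3 * b + 4 * a * b) for a in (0, 1) for b in (0, 1)]

additive_contrast = interaction_contrast(additive)
epistatic_contrast = interaction_contrast(epistatic)
```

Here `additive_contrast` is 0 and `epistatic_contrast` is 4, the size of the interaction term. Real analyses additionally model population structure, many loci at once, and statistical significance, none of which this sketch attempts.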

Saha S, Perrin L, Röder L, Brun C, Spinelli L. Epi-MEIF: detecting higher order epistatic interactions for complex traits using mixed effect conditional inference forests. Nucleic Acids Res 2022;50:e114. [PMID: 36107776 PMCID: PMC9639209 DOI: 10.1093/nar/gkac715] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/06/2022] [Revised: 07/29/2022] [Accepted: 09/12/2022] [Indexed: 12/04/2022]

Wang H, Yang W, Qin Q, Yang X, Yang Y, Liu H, Lu W, Gu S, Cao X, Feng D, Zhang Z, He J. E3 ubiquitin ligase MAGI3 degrades c-Myc and acts as a predictor for chemotherapy response in colorectal cancer. Mol Cancer 2022;21:151. [PMID: 35864508 PMCID: PMC9306183 DOI: 10.1186/s12943-022-01622-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/21/2021] [Accepted: 05/27/2021] [Indexed: 12/24/2022]

Abstract

Background

Recurrence and chemoresistance constitute the leading cause of death in colorectal cancer (CRC). Thus, it is of great significance to clarify the underlying mechanisms and identify predictors for tailoring adjuvant chemotherapy to improve the outcome of CRC.

Methods

By screening differentially expressed genes (DEGs), constructing a random forest classifier, and ranking the importance of the DEGs, we identified membrane associated guanylate kinase, WW and PDZ domain containing 3 (MAGI3) as an important gene in CRC recurrence. Immunohistochemical and western blot assays were employed to further detect MAGI3 expression in CRC tissues and cell lines. Cell counting kit-8, plate colony formation, flow cytometry, subcutaneous injection, and azoxymethane plus dextran sulfate sodium-induced mouse CRC assays were employed to explore the effects of MAGI3 on proliferation, growth, cell cycle, apoptosis, xenograft formation, and chemotherapy resistance of CRC. The underlying molecular mechanisms were further investigated through gene set enrichment analysis, quantitative real-time PCR, western blot, co-immunoprecipitation, ubiquitination, GST fusion protein pull-down, and immunohistochemical staining assays.

Results

Our results showed that a dysregulated low level of MAGI3 was correlated with recurrence and poor prognosis of CRC. MAGI3 was identified as a novel substrate-binding subunit of the SKP1-Cullin E3 ligase that recognizes c-Myc and mediates c-Myc ubiquitination and degradation. Expression of MAGI3 in CRC cells inhibited cell growth and promoted apoptosis and chemosensitivity to fluoropyrimidine-based chemotherapy by suppressing activation of c-Myc in vitro and in vivo. Clinically, stage II/III CRC patients with high MAGI3 expression had good recurrence-free survival (~80% at 5 years) and did not require further adjuvant chemotherapy. Patients with medium MAGI3 expression had a good response rate or recurrence-free survival with fluoropyrimidine-based chemotherapy and were recommended to undergo fluoropyrimidine-based adjuvant chemotherapy.

Conclusions

MAGI3 is a novel E3 ubiquitin ligase that regulates CRC development by degrading c-Myc, and it may act as a potential predictor of adjuvant chemotherapy response for CRC patients.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12943-022-01622-9.

Affiliation(s)

Haibo Wang: Beijing Key Laboratory for Tumor Invasion and Metastasis, Department of Biochemistry and Molecular Biology, Capital Medical University, No.10 Xitoutiao, You An Men, Beijing, 100069, People's Republic of China
Wenjing Yang: Department of Oncology, Beijing Hospital of Traditional Chinese Medicine, Capital Medical University, Beijing, People's Republic of China
Qiong Qin: Beijing Key Laboratory for Tumor Invasion and Metastasis, Department of Biochemistry and Molecular Biology, Capital Medical University, No.10 Xitoutiao, You An Men, Beijing, 100069, People's Republic of China
Xiaomei Yang: Beijing Key Laboratory for Tumor Invasion and Metastasis, Department of Biochemistry and Molecular Biology, Capital Medical University, No.10 Xitoutiao, You An Men, Beijing, 100069, People's Republic of China
Ying Yang: Core Facilities Center, Capital Medical University, Beijing, People's Republic of China
Hua Liu: Beijing Key Laboratory for Tumor Invasion and Metastasis, Department of Biochemistry and Molecular Biology, Capital Medical University, No.10 Xitoutiao, You An Men, Beijing, 100069, People's Republic of China
Wenxiu Lu: Beijing Key Laboratory for Tumor Invasion and Metastasis, Department of Biochemistry and Molecular Biology, Capital Medical University, No.10 Xitoutiao, You An Men, Beijing, 100069, People's Republic of China
Siyu Gu: Beijing Key Laboratory for Tumor Invasion and Metastasis, Department of Biochemistry and Molecular Biology, Capital Medical University, No.10 Xitoutiao, You An Men, Beijing, 100069, People's Republic of China
Xuedi Cao: Beijing Key Laboratory for Tumor Invasion and Metastasis, Department of Biochemistry and Molecular Biology, Capital Medical University, No.10 Xitoutiao, You An Men, Beijing, 100069, People's Republic of China
Duiping Feng: Department of Interventional Radiology, First Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
Zhongtao Zhang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University & National Clinical Research Center for Digestive Diseases, No.95 Yong-an Road, Xi-Cheng District, Beijing, 100050, People's Republic of China
Junqi He: Beijing Key Laboratory for Tumor Invasion and Metastasis, Department of Biochemistry and Molecular Biology, Capital Medical University, No.10 Xitoutiao, You An Men, Beijing, 100069, People's Republic of China

Ray A. Machine learning in postgenomic biology and personalized medicine. Wiley Interdiscip Rev Data Min Knowl Discov 2022;12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]

Chinese Comma Disambiguation in Math Word Problems Using SMOTE and Random Forests. AI 2021. [DOI: 10.3390/ai2040044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

Bektaş AB, Gönen M. PrognosiT: Pathway/gene set-based tumour volume prediction using multiple kernel learning. BMC Bioinformatics 2021;22:537. [PMID: 34727887 PMCID: PMC8561914 DOI: 10.1186/s12859-021-04460-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Accepted: 10/26/2021] [Indexed: 11/10/2022]

Abstract

Background

Identification of molecular mechanisms that determine tumour progression in cancer patients is a prerequisite for developing new disease treatment guidelines. Even though the predictive performance of current machine learning models is promising, extracting significant and meaningful knowledge from the data during the learning process is a difficult task, considering the high-dimensional and highly correlated nature of genomic datasets. Thus, there is a need for models that not only predict tumour volume from gene expression data of patients but also use prior information coming from pathways/gene sets during the learning process, to distinguish molecular mechanisms which play a crucial role in tumour progression and, therefore, disease prognosis.

Results

In this study, instead of initially choosing several pathways/gene sets from an available set and training a model on this previously chosen subset of genomic features, we built a novel machine learning algorithm, PrognosiT, that accomplishes both tasks together. We tested our algorithm on thyroid carcinoma patients using gene expression profiles and cancer-specific pathways/gene sets. The predictive performance of our novel multiple kernel learning algorithm (PrognosiT) was comparable to or even better than that of random forest (RF) and support vector regression (SVR). It is also notable that, to predict tumour volume, PrognosiT used fewer than one-tenth of the gene expression features that the RF and SVR algorithms used.

Conclusions

PrognosiT was able to obtain comparable or even better predictive performance than SVR and RF. Moreover, we demonstrated that during the learning process, our algorithm managed to extract relevant and meaningful pathway/gene set information related to the studied cancer type, which provides insights about its progression and aggressiveness. We also compared the expression of the genes selected by our algorithm in tumour and normal tissues, and we then discussed the up- and down-regulated genes selected by our algorithm while learning, which could be beneficial for determining new biomarkers.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04460-6.

Wang D, Li J, Sun Y, Ding X, Zhang X, Liu S, Han B, Wang H, Duan X, Sun T. A Machine Learning Model for Accurate Prediction of Sepsis in ICU Patients. Front Public Health 2021;9:754348. [PMID: 34722452 PMCID: PMC8553999 DOI: 10.3389/fpubh.2021.754348] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/06/2021] [Accepted: 09/20/2021] [Indexed: 12/23/2022]

Affiliation(s)

Dong Wang: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China
Jinbo Li: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada
Yali Sun: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China
Xianfei Ding: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China
Xiaojuan Zhang: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China
Shaohua Liu: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China
Bing Han: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China
Haixu Wang: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China
Xiaoguang Duan: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China
Tongwen Sun: General Intensive Care Unit, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China; Key Laboratory for Critical Care Medicine of Henan Province, Zhengzhou, China; Key Laboratory for Sepsis of Zhengzhou, Zhengzhou, China

Tanaka H, Kreisberg JF, Ideker T. Genetic dissection of complex traits using hierarchical biological knowledge. PLoS Comput Biol 2021;17:e1009373. [PMID: 34534210 PMCID: PMC8480841 DOI: 10.1371/journal.pcbi.1009373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 09/29/2021] [Accepted: 08/23/2021] [Indexed: 11/18/2022]

Abstract

Despite the growing constellation of genetic loci linked to common traits, these loci have yet to account for most heritable variation, and most act through poorly understood mechanisms. Recent machine learning (ML) systems have used hierarchical biological knowledge to associate genetic mutations with phenotypic outcomes, yielding substantial predictive power and mechanistic insight. Here, we use an ontology-guided ML system to map single nucleotide variants (SNVs) focusing on 6 classic phenotypic traits in natural yeast populations. The 29 identified loci are largely novel and account for ~17% of the phenotypic variance, versus <3% for standard genetic analysis. Representative results show that sensitivity to hydroxyurea is linked to SNVs in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism. This work demonstrates a knowledge-based approach to amplifying and interpreting signals in population genetic studies.

Genome-wide association studies (GWAS) have identified many important loci for common diseases and other traits. However, the loci identified by these studies are almost always many steps away from an understanding of underlying biological mechanisms. Here we develop an approach using hierarchical biological knowledge to identify genes and pathways responsible for phenotypic traits. Variants identified by the new method could explain a substantially greater fraction of heritability than previously reported. Moreover, we identified mechanistic pathways by which each causal variant affects cellular function. For example, we find that sensitivity to hydroxyurea is tied to genetic variants in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism. The new approach is a potentially transformative concept for understanding the genetic drivers of phenotypic variance, with potential applications in understanding traits in biomedicine and agriculture.

Montesinos-López OA, Montesinos-López A, Mosqueda-Gonzalez BA, Montesinos-López JC, Crossa J, Ramirez NL, Singh P, Valladares-Anguiano FA. A zero altered Poisson random forest model for genomic-enabled prediction. G3 (Bethesda) 2021;11:6042695. [PMID: 33693599 PMCID: PMC8022945 DOI: 10.1093/g3journal/jkaa057] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/03/2020] [Accepted: 12/10/2020] [Indexed: 12/23/2022]

Towards fine-scale population stratification modeling based on kernel principal component analysis and random forest. Genes Genomics 2021;43:1143-1155. [PMID: 34097252 DOI: 10.1007/s13258-021-01057-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2020] [Accepted: 01/26/2021] [Indexed: 10/21/2022]

High-Resolution Genomic Comparisons within Salmonella enterica Serotypes Derived from Beef Feedlot Cattle: Parsing the Roles of Cattle Source, Pen, Animal, Sample Type, and Production Period. Appl Environ Microbiol 2021;87:e0048521. [PMID: 33863705 DOI: 10.1128/aem.00485-21] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Abstract

Salmonella enterica is a major foodborne pathogen, and contaminated beef products have been identified as one of the primary sources of Salmonella-related outbreaks. Pathogenicity and antibiotic resistance of Salmonella are highly serotype and subpopulation specific, which makes it essential to understand high-resolution Salmonella population dynamics in cattle. Time of year, source of cattle, pen, and sample type (i.e., feces, hide, or lymph nodes) have previously been identified as important factors influencing the serotype distribution of Salmonella (e.g., Anatum, Lubbock, Cerro, Montevideo, Kentucky, Newport, and Norwich) that were isolated from a longitudinal sampling design in a research feedlot. In this study, we performed high-resolution genomic comparisons of Salmonella isolates within each serotype using both single-nucleotide polymorphism-based maximum-likelihood phylogeny and hierarchical clustering of core-genome multilocus sequence typing. The importance of the aforementioned features in clonal Salmonella expansion was further explored using a supervised machine learning algorithm. In addition, we identified and compared the resistance genes, plasmids, and pathogenicity island profiles of the isolates within each subpopulation. Our findings indicate that clonal expansion of Salmonella strains in cattle was mainly influenced by the randomization of block and pen, as well as the origin/source of the cattle, regardless of sampling time and sample type (i.e., feces, lymph node, or hide). Further research is needed concerning the role of the feedlot pen environment prior to cattle placement to better understand carryover contributions of existing strains of Salmonella and their bacteriophages.

IMPORTANCE

Salmonella serotypes isolated from outbreaks in humans can also be found in beef cattle and feedlots. Virulence factors and antibiotic resistance are among the primary defense mechanisms of Salmonella, and are often associated with clonal expansion. This makes understanding the subpopulation dynamics of Salmonella in cattle critical for effective mitigation. There remains a gap in the literature concerning subpopulation dynamics within Salmonella serotypes in feedlot cattle from the beginning of feeding up until slaughter. Here, we explore Salmonella population dynamics within each serotype using core-genome phylogeny and hierarchical classifications. We used machine learning to quantitatively parse the relative importance of both hierarchical and longitudinal clustering among cattle host samples. Our results reveal that Salmonella populations in cattle are highly clonal over a 6-month study period and that clonal dissemination of Salmonella in cattle is mainly influenced spatially by experimental block and pen, as well as by the geographical origin of the cattle.


Chen K, Xu H, Lei Y, Lio P, Li Y, Guo H, Ali Moni M. Integration and interplay of machine learning and bioinformatics approach to identify genetic interaction related to ovarian cancer chemoresistance. Brief Bioinform 2021;22:6272796. [PMID: 33971668 DOI: 10.1093/bib/bbab100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 11/15/2022] Open

Wang B, Liua F, Deveaux L, Ash A, Gosh S, Li X, Rundensteiner E, Cottrell L, Adderley R, Stanton B. Adolescent HIV-related behavioural prediction using machine learning: a foundation for precision HIV prevention. AIDS 2021;35:S75-S84. [PMID: 33867490 PMCID: PMC8133351 DOI: 10.1097/qad.0000000000002867] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]

Ashbrook DG, Arends D, Prins P, Mulligan MK, Roy S, Williams EG, Lutz CM, Valenzuela A, Bohl CJ, Ingels JF, McCarty MS, Centeno AG, Hager R, Auwerx J, Lu L, Williams RW. A platform for experimental precision medicine: The extended BXD mouse family. Cell Syst 2021;12:235-247.e9. [PMID: 33472028 PMCID: PMC7979527 DOI: 10.1016/j.cels.2020.12.002] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2020] [Revised: 08/29/2020] [Accepted: 12/21/2020] [Indexed: 12/17/2022]

Affiliation(s)

David G Ashbrook Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
Danny Arends Lebenswissenschaftliche Fakultät, Albrecht Daniel Thaer-Institut, Humboldt-Universität zu Berlin, Invalidenstraße 42, 10115 Berlin, Germany
Pjotr Prins Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Megan K Mulligan Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Suheeta Roy Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Evan G Williams Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, L-4365 Esch-sur-Alzette, Luxembourg
Cathleen M Lutz Mouse Repository and the Rare and Orphan Disease Center, the Jackson Laboratory, Bar Harbor, ME 04609, USA
Alicia Valenzuela Mouse Repository and the Rare and Orphan Disease Center, the Jackson Laboratory, Bar Harbor, ME 04609, USA
Casey J Bohl Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Jesse F Ingels Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Melinda S McCarty Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Arthur G Centeno Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Reinmar Hager Division of Evolution & Genomic Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
Johan Auwerx Laboratory of Integrative Systems Physiology, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
Lu Lu Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
Robert W Williams Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.


Peng Q, Shen Y, Fu K, Dai Z, Jin L, Yang D, Zhu J. Artificial intelligence prediction model for overall survival of clear cell renal cell carcinoma based on a 21-gene molecular prognostic score system. Aging (Albany NY) 2021;13:7361-7381. [PMID: 33686949 PMCID: PMC7993746 DOI: 10.18632/aging.202594] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2020] [Accepted: 01/14/2021] [Indexed: 01/03/2023]

Orlenko A, Moore JH. A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions. BioData Min 2021;14:9. [PMID: 33514397 PMCID: PMC7847145 DOI: 10.1186/s13040-021-00243-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2020] [Accepted: 01/13/2021] [Indexed: 01/19/2023] Open

Abstract

BACKGROUND

Non-additive interactions among genes are frequently associated with a number of phenotypes, including known complex diseases such as Alzheimer's, diabetes, and cardiovascular disease. Detecting interactions requires careful selection of analytical methods, and some machine learning algorithms are unable or underpowered to detect or model feature interactions that exhibit non-additivity. The Random Forest method is often employed in these efforts due to its ability to detect and model non-additive interactions. In addition, Random Forest has the built-in ability to estimate feature importance scores, a characteristic that allows the model to be interpreted with the order and effect size of the feature association with the outcome. This characteristic is very important for epidemiological and clinical studies where results of predictive modeling could be used to define the future direction of the research efforts. An alternative way to interpret the model is with a permutation feature importance metric, which employs a permutation approach to calculate a feature contribution coefficient in units of the decrease in the model's performance, or with Shapley additive explanations, which employ a cooperative game theory approach. Currently, it is unclear which Random Forest feature importance metric provides a superior estimation of the true informative contribution of features in genetic association analysis.

RESULTS

To address this issue, and to improve interpretability of Random Forest predictions, we compared different methods for feature importance estimation in real and simulated datasets with non-additive interactions. As a result, we detected a discrepancy between the metrics for the real-world datasets and further established that the permutation feature importance metric provides more precise feature importance rank estimation for the simulated datasets with non-additive interactions.

CONCLUSIONS

By analyzing both real and simulated data, we established that the permutation feature importance metric provides more precise feature importance rank estimation in the presence of non-additive interactions.
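
The permutation feature importance metric named in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the toy XOR data, the `xor_model` predictor (a hypothetical stand-in for a fitted Random Forest), and all function names are assumptions made for the example. The importance of a feature is taken as the mean drop in accuracy after that feature's column is shuffled.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the outcome.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (assumption): the label is the XOR of features 0 and 1,
# a purely non-additive interaction; feature 2 is noise.
data_rng = random.Random(42)
rows = [[data_rng.randint(0, 1) for _ in range(3)] for _ in range(200)]
labels = [row[0] ^ row[1] for row in rows]


def xor_model(row):
    # Hypothetical stand-in for a fitted model that has learned the interaction.
    return row[0] ^ row[1]


scores = permutation_importance(xor_model, rows, labels)
print(scores)
```

In this toy setting the two interacting features receive a large importance while the noise feature receives exactly zero, because shuffling it never changes a prediction. Any fitted classifier exposing a per-row predict function could be substituted for `xor_model`.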


Bracher-Smith M, Crawford K, Escott-Price V. Machine learning for genetic prediction of psychiatric disorders: a systematic review. Mol Psychiatry 2021;26:70-79. [PMID: 32591634 PMCID: PMC7610853 DOI: 10.1038/s41380-020-0825-2] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/15/2020] [Revised: 06/09/2020] [Accepted: 06/16/2020] [Indexed: 12/25/2022]

Abstract

Machine learning methods have been employed to make predictions in psychiatry from genotypes, with the potential to bring improved prediction of outcomes in psychiatric genetics; however, their current performance is unclear. We aim to systematically review machine learning methods for predicting psychiatric disorders from genetics alone and evaluate their discrimination, bias and implementation. Medline, PsycInfo, Web of Science and Scopus were searched for terms relating to genetics, psychiatric disorders and machine learning, including neural networks, random forests, support vector machines and boosting, on 10 September 2019. Following PRISMA guidelines, articles were screened for inclusion independently by two authors, extracted, and assessed for risk of bias. Overall, 63 full texts were assessed from a pool of 652 abstracts. Data were extracted for 77 models of schizophrenia, bipolar, autism or anorexia across 13 studies. Performance of machine learning methods was highly varied (0.48-0.95 AUC) and differed between schizophrenia (0.54-0.95 AUC), bipolar (0.48-0.65 AUC), autism (0.52-0.81 AUC) and anorexia (0.62-0.69 AUC). This is likely due to the high risk of bias identified in the study designs and analysis for reported results. Choices for predictor selection, hyperparameter search and validation methodology, and viewing of the test set during training were common causes of high risk of bias in analysis. Key steps in model development and validation were frequently not performed or unreported. Comparison of discrimination across studies was constrained by heterogeneity of predictors, outcome and measurement, in addition to sample overlap within and across studies. Given widespread high risk of bias and the small number of studies identified, it is important to ensure established analysis methods are adopted. We emphasise best practices in methodology and reporting for improving future studies.
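
The discrimination values quoted above are areas under the ROC curve (AUC). As a hedged sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control, counting ties as one half. The function name and the example scores are illustrative assumptions, not data from the review.

```python
def auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases, 0 for controls; scores: predicted risk per sample.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0  # case ranked above control
            elif c == k:
                wins += 0.5  # tie counts as half
    return wins / (len(cases) * len(controls))


# Illustrative values only: three of the four case/control pairs are ordered correctly.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values near 0.48 in the reviewed models indicate no usable discrimination. For the estimate to be unbiased, the scores must come from samples that were not used for training or tuning.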


Yan KK, Zhao H, Wu JT, Pang H. An enhanced machine learning tool for cis-eQTL mapping with regularization and confounder adjustments. Genet Epidemiol 2020;44:798-810. [PMID: 32700329 PMCID: PMC7875251 DOI: 10.1002/gepi.22341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2020] [Revised: 07/07/2020] [Accepted: 07/07/2020] [Indexed: 11/07/2022]

Translational biomarkers in the era of precision medicine. Adv Clin Chem 2020;102:191-232. [PMID: 34044910 DOI: 10.1016/bs.acc.2020.08.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Predicting the geographic origin of Spanish Cedar (Cedrela odorata L.) based on DNA variation. CONSERV GENET 2020. [DOI: 10.1007/s10592-020-01282-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Gim JA, Kwon Y, Lee HA, Lee KR, Kim S, Choi Y, Kim YK, Lee H. A Machine Learning-Based Identification of Genes Affecting the Pharmacokinetics of Tacrolimus Using the DMET™ Plus Platform. Int J Mol Sci 2020;21:E2517. [PMID: 32260456 PMCID: PMC7178269 DOI: 10.3390/ijms21072517] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2020] [Revised: 03/29/2020] [Accepted: 04/02/2020] [Indexed: 12/15/2022] Open

Affiliation(s)

Jeong-An Gim Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 16229, Korea; Medical Science Research Center, College of Medicine, Korea University, Seoul 02841, Korea
Yonghan Kwon Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 16229, Korea; Department of Biostatistics and Computing, Yonsei University Graduate School, Seoul 03722, Korea
Hyun A Lee Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 16229, Korea; Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul 03080, Korea
Kyeong-Ryoon Lee Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 16229, Korea; Laboratory Animal Resource Center, Korea Research Institute of Bioscience and Biotechnology, Ochang, Chungbuk 28116, Korea
Soohyun Kim Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 16229, Korea
Yoonjung Choi GC Pharma, Yongin 16924, Korea
Yu Kyong Kim Daewoong Pharmaceutical Co., Ltd., Seoul 06170, Korea
Howard Lee Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 16229, Korea; Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul 03080, Korea; Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 03080, Korea


Kess T, Bentzen P, Lehnert SJ, Sylvester EVA, Lien S, Kent MP, Sinclair‐Waters M, Morris C, Wringe B, Fairweather R, Bradbury IR. Modular chromosome rearrangements reveal parallel and nonparallel adaptation in a marine fish. Ecol Evol 2020;10:638-653. [PMID: 32015832 PMCID: PMC6988541 DOI: 10.1002/ece3.5828] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2019] [Revised: 10/05/2019] [Accepted: 10/10/2019] [Indexed: 01/01/2023] Open

Breitbach ME, Greenspan S, Resnick NM, Perera S, Gurkar AU, Absher D, Levine AS. Exonic Variants in Aging-Related Genes Are Predictive of Phenotypic Aging Status. Front Genet 2019;10:1277. [PMID: 31921313 PMCID: PMC6931058 DOI: 10.3389/fgene.2019.01277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2019] [Accepted: 11/19/2019] [Indexed: 01/31/2023] Open

Abstract

Background: Recent studies investigating longevity have revealed very few convincing genetic associations with increased lifespan. This is, in part, due to the complexity of biological aging, as well as the limited power of genome-wide association studies, which assay common single nucleotide polymorphisms (SNPs) and require several thousand subjects to achieve statistical significance. To overcome such barriers, we performed comprehensive DNA sequencing of a panel of 20 genes previously associated with phenotypic aging in a cohort of 200 individuals, half of whom were clinically defined by an "early aging" phenotype, and half of whom were clinically defined by a "late aging" phenotype based on age (65-75 years) and the ability to walk up a flight of stairs or walk for 15 min without resting. A validation cohort of 511 late agers was used to verify our results. Results: We found early agers were not enriched for more total variants in these 20 aging-related genes than late agers. Using machine learning methods, we identified the most predictive model of aging status, both in our discovery and validation cohorts, to be a random forest model incorporating damaging exon variants [Combined Annotation-Dependent Depletion (CADD) > 15]. The most heavily weighted variants in the model were within poly(ADP-ribose) polymerase 1 (PARP1) and excision repair cross complementation group 5 (ERCC5), both of which are involved in a canonical aging pathway, DNA damage repair. Conclusion: Overall, this study implemented a framework to apply machine learning to identify sequencing variants associated with complex phenotypes such as aging. While the small sample size making up our cohort inhibits our ability to make definitive conclusions about the ability of these genes to accurately predict aging, this study offers a unique method for exploring polygenic associations with complex phenotypes.


TSLRF: Two-Stage Algorithm Based on Least Angle Regression and Random Forest in genome-wide association studies. Sci Rep 2019;9:18034. [PMID: 31792302 PMCID: PMC6889171 DOI: 10.1038/s41598-019-54519-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2019] [Accepted: 11/15/2019] [Indexed: 11/24/2022] Open

Abstract

One of the most important tasks in genome-wide association studies (GWAS) is the detection of single-nucleotide polymorphisms (SNPs) that are related to target traits. With the development of sequencing technology, traditional statistical methods struggle to analyze the resulting massive, high-dimensional SNP data. Recently, machine learning methods have become more popular in high-dimensional genetic data analysis because of their fast computation speed. However, most machine learning methods have several drawbacks, such as poor generalization ability, over-fitting, unsatisfactory classification and low detection accuracy. This study proposed a two-stage algorithm based on least angle regression and random forest (TSLRF), which first controlled for population structure and polygenic effects, then selected the SNPs potentially related to target traits using least angle regression (LARS), and finally analyzed this variable subset using random forest (RF) to detect quantitative trait nucleotides (QTNs) associated with target traits. The new method showed greater detection power in simulation experiments and real data analyses. The results of simulation experiments showed that, compared with existing approaches, the new method effectively improved QTN detection and model fit, and required less computation time. In addition, the new method clearly distinguished QTNs from other SNPs. Subsequently, the new method was applied to analyze five flowering-related traits in Arabidopsis. The results showed that the distinction between QTNs and unrelated SNPs was more pronounced than with the other methods. The new method detected 60 genes confirmed to be related to the target trait, significantly more than the other methods, and simultaneously detected multiple gene clusters associated with the target trait.


Meijsen JJ, Rammos A, Campbell A, Hayward C, Porteous DJ, Deary IJ, Marioni RE, Nicodemus KK. Using tree-based methods for detection of gene-gene interactions in the presence of a polygenic signal: simulation study with application to educational attainment in the Generation Scotland Cohort Study. Bioinformatics 2019;35:181-188. [PMID: 29931044 PMCID: PMC6330004 DOI: 10.1093/bioinformatics/bty462] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Accepted: 06/14/2018] [Indexed: 11/13/2022] Open

Fine-Resolution Population Mapping from International Space Station Nighttime Photography and Multisource Social Sensing Data Based on Similarity Matching. REMOTE SENSING 2019. [DOI: 10.3390/rs11161900] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Abstract

Previous studies have attempted to disaggregate census data into fine resolution with multisource remote sensing data considering the importance of fine-resolution population distribution in urban planning, environmental protection, resource allocation, and social economy. However, the lack of direct human activity information invariably restricts the accuracy of population mapping and reduces the credibility of the mapping process even when external facility distribution information is adopted. To address these problems, the present study proposed a novel population mapping method by combining International Space Station (ISS) photography nighttime light data, point of interest (POI) data, and location-based social media data. A similarity matching model, consisting of semantic and distance matching models, was established to integrate POI and social media data. Effective information was extracted from the integrated data through principal component analysis and then used along with road density information to train the random forest (RF) model. A comparison with WorldPop data proved that our method can generate fine-resolution population distribution with higher accuracy (R² = 0.91) than those of previous studies (R² = 0.55). To illustrate the advantages of our method, we highlighted the limitations of previous methods that ignore social media data in handling residential regions with similar light intensity. We also discussed the performance of our method in adopting social media data, considering their characteristics, with different volumes and acquisition times. Results showed that social media data acquired between 19:00 and 8:00 with a volume of approximately 300,000 will help our method realize high accuracy with low computation burden. This study showed the great potential of combining social sensing data for disaggregating fine-resolution population.

Machine learning technology in the application of genome analysis: A systematic review. Gene 2019;705:149-156. [PMID: 31026571 DOI: 10.1016/j.gene.2019.04.062] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 04/17/2019] [Accepted: 04/22/2019] [Indexed: 01/17/2023]

Wu M, Ma S. Robust genetic interaction analysis. Brief Bioinform 2019;20:624-637. [PMID: 29897421 PMCID: PMC6556899 DOI: 10.1093/bib/bby033] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2018] [Revised: 03/22/2018] [Indexed: 01/17/2023] Open

Arabnejad M, Dawkins BA, Bush WS, White BC, Harkness AR, McKinney BA. Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS. BioData Min 2018;11:23. [PMID: 30410580 PMCID: PMC6215626 DOI: 10.1186/s13040-018-0186-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2018] [Accepted: 10/22/2018] [Indexed: 11/29/2022] Open

Li B, Zhang N, Wang YG, George AW, Reverter A, Li Y. Genomic Prediction of Breeding Values Using a Subset of SNPs Identified by Three Machine Learning Methods. Front Genet 2018;9:237. [PMID: 30023001 PMCID: PMC6039760 DOI: 10.3389/fgene.2018.00237] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2018] [Accepted: 06/14/2018] [Indexed: 12/22/2022] Open

Abstract

The analysis of large genomic data is hampered by issues such as a small number of observations and a large number of predictive variables (commonly known as “large P small N”), high dimensionality or highly correlated data structures. Machine learning methods are renowned for dealing with these problems. To date machine learning methods have been applied in Genome-Wide Association Studies for identification of candidate genes, epistasis detection, gene network pathway analyses and genomic prediction of phenotypic values. However, the utility of two machine learning methods, Gradient Boosting Machine (GBM) and Extreme Gradient Boosting Method (XgBoost), in identifying a subset of SNP markers for genomic prediction of breeding values has never been explored before. In this study, using 38,082 SNP markers and body weight phenotypes from 2,093 Brahman cattle (1,097 bulls as a discovery population and 996 cows as a validation population), we examined the efficiency of three machine learning methods, namely Random Forests (RF), GBM and XgBoost, in (a) the identification of top 400, 1,000, and 3,000 ranked SNPs; (b) using the subsets of SNPs to construct genomic relationship matrices (GRMs) for the estimation of genomic breeding values (GEBVs). For comparison purposes, we also calculated the GEBVs from (1) 400, 1,000, and 3,000 SNPs that were randomly selected and evenly spaced across the genome, and (2) from all the SNPs. We found that RF and especially GBM are efficient methods in identifying a subset of SNPs with direct links to candidate genes affecting the growth trait. In comparison to the estimate of prediction accuracy of GEBVs from using all SNPs (0.43), the 3,000 top SNPs identified by RF (0.42) and GBM (0.46) had similar values to those of the whole SNP panel. The performance of the subsets of SNPs from RF and GBM was substantially better than that of evenly spaced subsets across the genome (0.18–0.29). Of the three methods, RF and GBM consistently outperformed the XgBoost in genomic prediction accuracy.
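
Step (b) of this abstract, building a genomic relationship matrix from a SNP subset, might look like the sketch below. The abstract does not say which GRM formula was used; this example assumes the widely used VanRaden form, G = ZZ'/(2 Σ p(1 − p)), where Z holds allele counts centered by twice the allele frequency. The genotype values and function name are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """VanRaden-style GRM (an assumed formula, not confirmed by the abstract).

    genotypes: list of individuals, each a list of 0/1/2 allele counts
    for the selected SNPs.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each SNP, estimated from the sample.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all selected markers are monomorphic")
    # Center each allele count by its expected value 2p.
    z = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Hypothetical genotypes: 4 animals x 3 top-ranked SNPs.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
grm = genomic_relationship_matrix(genotypes)
for row in grm:
    print([round(v, 3) for v in row])
```

To use a ranked subset, the columns of the full genotype matrix would first be filtered to the indices of the top SNPs before calling the function. Because frequencies are estimated from the same sample, every row of the resulting matrix sums to zero.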


Waters CD, Hard JJ, Brieuc MSO, Fast DE, Warheit KI, Knudsen CM, Bosch WJ, Naish KA. Genomewide association analyses of fitness traits in captive-reared Chinook salmon: Applications in evaluating conservation strategies. Evol Appl 2018;11:853-868. [PMID: 29928295 PMCID: PMC5999212 DOI: 10.1111/eva.12599] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2017] [Accepted: 01/09/2018] [Indexed: 12/20/2022] Open

Abstract

A novel application of genomewide association analyses is to use trait-associated loci to monitor the effects of conservation strategies on potentially adaptive genetic variation. Comparisons of fitness between captive- and wild-origin individuals, for example, do not reveal how captive rearing affects genetic variation underlying fitness traits or which traits are most susceptible to domestication selection. Here, we used data collected across four generations to identify loci associated with six traits in adult Chinook salmon (Oncorhynchus tshawytscha) and then determined how two alternative management approaches for captive rearing affected variation at these loci. Loci associated with date of return to freshwater spawning grounds (return timing), length and weight at return, age at maturity, spawn timing, and daily growth coefficient were identified using 9108 restriction site-associated markers and random forest, an approach suitable for polygenic traits. Mapping of trait-associated loci, gene annotations, and integration of results across multiple studies revealed candidate regions involved in several fitness-related traits. Genotypes at trait-associated loci were then compared between two hatchery populations that were derived from the same source but are now managed as separate lines, one integrated with and one segregated from the wild population. While no broad-scale change was detected across four generations, there were numerous regions where trait-associated loci overlapped with signatures of adaptive divergence previously identified in the two lines. Many regions, primarily with loci linked to return and spawn timing, were either unique to or more divergent in the segregated line, suggesting that these traits may be responding to domestication selection. This study is one of the first to utilize genomic approaches to demonstrate the effectiveness of a conservation strategy, managed gene flow, on trait-associated (and potentially adaptive) loci. The results will promote the development of trait-specific tools to better monitor genetic change in captive and wild populations.

Kang J, Rancati T, Lee S, Oh JH, Kerns SL, Scott JG, Schwartz R, Kim S, Rosenstein BS. Machine Learning and Radiogenomics: Lessons Learned and Future Directions. Front Oncol 2018;8:228. [PMID: 29977864 PMCID: PMC6021505 DOI: 10.3389/fonc.2018.00228] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2018] [Accepted: 06/04/2018] [Indexed: 12/25/2022] Open

Abstract

Due to the rapid increase in the availability of patient data, there is significant interest in precision medicine that could facilitate the development of a personalized treatment plan for each patient on an individual basis. Radiation oncology is particularly suited for predictive machine learning (ML) models due to the enormous amount of diagnostic data used as input and therapeutic data generated as output. An emerging field in precision radiation oncology that can take advantage of ML approaches is radiogenomics, which is the study of the impact of genomic variations on the sensitivity of normal and tumor tissue to radiation. Currently, patients undergoing radiotherapy are treated using uniform dose constraints specific to the tumor and surrounding normal tissues. This is suboptimal in many ways. First, the dose that can be delivered to the target volume may be insufficient for control but is constrained by the surrounding normal tissue, as dose escalation can lead to significant morbidity and rare. Second, two patients with nearly identical dose distributions can have substantially different acute and late toxicities, resulting in lengthy treatment breaks and suboptimal control, or chronic morbidities leading to poor quality of life. Despite significant advances in radiogenomics, the magnitude of the genetic contribution to radiation response far exceeds our current understanding of individual risk variants. In the field of genomics, ML methods are being used to extract harder-to-detect knowledge, but these methods have yet to fully penetrate radiogenomics. Hence, the goal of this publication is to provide an overview of ML as it applies to radiogenomics. We begin with a brief history of radiogenomics and its relationship to precision medicine. We then introduce ML and compare it to statistical hypothesis testing to reflect on shared lessons and to avoid common pitfalls. Current ML approaches to genome-wide association studies are examined. The application of ML specifically to radiogenomics is next presented. We end with important lessons for the proper integration of ML into radiogenomics.

Wheeler NE, Gardner PP, Barquist L. Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica. PLoS Genet 2018;14:e1007333. [PMID: 29738521 PMCID: PMC5940178 DOI: 10.1371/journal.pgen.1007333] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2017] [Accepted: 03/24/2018] [Indexed: 11/18/2022] Open

Brieuc MSO, Waters CD, Drinan DP, Naish KA. A practical introduction to Random Forest for genetic association studies in ecology and evolution. Mol Ecol Resour 2018;18:755-766. [PMID: 29504715 DOI: 10.1111/1755-0998.12773] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2017] [Revised: 02/08/2018] [Accepted: 02/17/2018] [Indexed: 12/25/2022]

Lee S, Kerns S, Ostrer H, Rosenstein B, Deasy JO, Oh JH. Machine Learning on a Genome-wide Association Study to Predict Late Genitourinary Toxicity After Prostate Radiation Therapy. Int J Radiat Oncol Biol Phys 2018;101:128-135. [PMID: 29502932 DOI: 10.1016/j.ijrobp.2018.01.054] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2017] [Revised: 01/02/2018] [Accepted: 01/16/2018] [Indexed: 01/23/2023]

Burghardt LT, Young ND, Tiffin P. A Guide to Genome-Wide Association Mapping in Plants. Curr Protoc Plant Biol 2017;2:22-38. [PMID: 31725973 DOI: 10.1002/cppb.20041] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Uncovering the genetic signature of quantitative trait evolution with replicated time series data. Heredity (Edinb) 2016;118:42-51. [PMID: 27848948 DOI: 10.1038/hdy.2016.98] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2016] [Revised: 08/18/2016] [Accepted: 08/24/2016] [Indexed: 01/04/2023] Open

Exploiting Single-Cell Quantitative Data to Map Genetic Variants Having Probabilistic Effects. PLoS Genet 2016;12:e1006213. [PMID: 27479122 PMCID: PMC4968810 DOI: 10.1371/journal.pgen.1006213] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2016] [Accepted: 07/02/2016] [Indexed: 01/11/2023] Open

Märtens K, Hallin J, Warringer J, Liti G, Parts L. Predicting quantitative traits from genome and phenome with near perfect accuracy. Nat Commun 2016;7:11512. [PMID: 27160605 PMCID: PMC4866306 DOI: 10.1038/ncomms11512] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2015] [Accepted: 04/01/2016] [Indexed: 12/20/2022] Open