1
Guo H, Lu Y, Lei Z, Bao H, Zhang M, Wang Z, Guan C, Tang B, Liu Z, Wang L. Machine learning-guided realization of full-color high-quantum-yield carbon quantum dots. Nat Commun 2024; 15:4843. [PMID: 38844440 PMCID: PMC11156924 DOI: 10.1038/s41467-024-49172-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 05/24/2024] [Indexed: 06/09/2024]
Abstract
Carbon quantum dots (CQDs) have versatile applications in luminescence, whereas identifying optimal synthesis conditions has been challenging due to numerous synthesis parameters and multiple desired outcomes, creating an enormous search space. In this study, we present a novel multi-objective optimization strategy utilizing a machine learning (ML) algorithm to intelligently guide the hydrothermal synthesis of CQDs. Our closed-loop approach learns from limited and sparse data, greatly reducing the research cycle and surpassing traditional trial-and-error methods. Moreover, it also reveals the intricate links between synthesis parameters and target properties and unifies the objective function to optimize multiple desired properties like full-color photoluminescence (PL) wavelength and high PL quantum yields (PLQY). With only 63 experiments, we achieve the synthesis of full-color fluorescent CQDs with high PLQY exceeding 60% across all colors. Our study represents a significant advancement in ML-guided CQDs synthesis, setting the stage for developing new materials with multiple desired properties.
Affiliation(s)
- Huazhang Guo
- Institute of Nanochemistry and Nanobiology, School of Environmental and Chemical Engineering, Shanghai University, 99 Shangda Road, BaoShan District, Shanghai, 200444, China
- Yuhao Lu
- College of Computing and Data Science, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
- Zhendong Lei
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
- Hong Bao
- Institute of Nanochemistry and Nanobiology, School of Environmental and Chemical Engineering, Shanghai University, 99 Shangda Road, BaoShan District, Shanghai, 200444, China
- Mingwan Zhang
- Institute of Nanochemistry and Nanobiology, School of Environmental and Chemical Engineering, Shanghai University, 99 Shangda Road, BaoShan District, Shanghai, 200444, China
- Zeming Wang
- Institute of Nanochemistry and Nanobiology, School of Environmental and Chemical Engineering, Shanghai University, 99 Shangda Road, BaoShan District, Shanghai, 200444, China
- Cuntai Guan
- College of Computing and Data Science, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.
- Bijun Tang
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.
- Zheng Liu
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.
- CINTRA CNRS/NTU/THALES, UMI 3288, Research Techno Plaza, 50 Nanyang Drive, Border X Block, Level 6, Singapore, 637553, Singapore.
- Institute for Functional Intelligent Materials, National University of Singapore, Singapore, Singapore.
- Liang Wang
- Institute of Nanochemistry and Nanobiology, School of Environmental and Chemical Engineering, Shanghai University, 99 Shangda Road, BaoShan District, Shanghai, 200444, China.
- School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.
2
Koido M, Tomizuka K, Terao C. Fundamentals for predicting transcriptional regulations from DNA sequence patterns. J Hum Genet 2024:10.1038/s10038-024-01256-3. [PMID: 38730006 DOI: 10.1038/s10038-024-01256-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 04/10/2024] [Accepted: 04/25/2024] [Indexed: 05/12/2024]
Abstract
Cell-type-specific regulatory elements, cataloged through extensive experiments and bioinformatics in large-scale consortiums, have enabled enrichment analyses of genetic associations that primarily utilize positional information of the regulatory elements. These analyses have identified cell types and pathways genetically associated with human complex traits. However, our understanding of detailed allelic effects on these elements' activities and on-off states remains incomplete, hampering the interpretation of human genetic study results. This review introduces machine learning methods to learn sequence-dependent transcriptional regulation mechanisms from DNA sequences for predicting such allelic effects (not associations). We provide a concise history of machine-learning-based approaches, the requirements, and the key computational processes, focusing on primers in machine learning. Convolution and self-attention, pivotal in modern deep-learning models, are explained through geometrical interpretations using dot products. This facilitates understanding of the concept and why these have been used for machine learning for DNA sequences. These will inspire further research in this genetics and genomics field.
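The dot-product view of convolution and self-attention mentioned in this abstract can be sketched in a few lines of plain Python. This is an illustrative sketch only, not code from the review: the function names, the exact-match "TATA" kernel, and the use of unprojected one-hot vectors as both queries and keys are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def conv1d_scores(seq, kernel):
    """Slide a kernel (one weight vector per position) along the sequence.

    Each output is the dot product between the flattened window and the
    flattened kernel, i.e. a similarity between the window and the pattern.
    """
    x = one_hot(seq)
    k = len(kernel)
    flat_kernel = [w for row in kernel for w in row]
    scores = []
    for start in range(len(x) - k + 1):
        window = [v for row in x[start:start + k] for v in row]
        scores.append(dot(window, flat_kernel))
    return scores

def self_attention_weights(vectors):
    """Scaled dot-product attention weights with queries = keys = inputs.

    Assumption: no learned projections are applied, which real models
    would add; this keeps the geometric picture visible.
    """
    d = len(vectors[0])
    weights = []
    for q in vectors:
        logits = [dot(q, k) / math.sqrt(d) for k in vectors]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical kernel that matches the motif "TATA" exactly.
tata_kernel = one_hot("TATA")
scores = conv1d_scores("GGTATAGG", tata_kernel)
attn = self_attention_weights(one_hot("AAT"))
```

In this toy setting the convolution score simply counts how many positions of a window agree with the motif, so the highest score marks the window that is best aligned with the kernel. The attention weights are larger between positions carrying the same base, because their one-hot vectors have a larger dot product.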
Affiliation(s)
- Masaru Koido
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan.
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan.
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan.
3
Liu X, Koyama S, Tomizuka K, Takata S, Ishikawa Y, Ito S, Kosugi S, Suzuki K, Hikino K, Koido M, Koike Y, Horikoshi M, Gakuhari T, Ikegawa S, Matsuda K, Momozawa Y, Ito K, Kamatani Y, Terao C. Decoding triancestral origins, archaic introgression, and natural selection in the Japanese population by whole-genome sequencing. SCIENCE ADVANCES 2024; 10:eadi8419. [PMID: 38630824 PMCID: PMC11023554 DOI: 10.1126/sciadv.adi8419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Accepted: 03/07/2024] [Indexed: 04/19/2024]
Abstract
We generated Japanese Encyclopedia of Whole-Genome/Exome Sequencing Library (JEWEL), a high-depth whole-genome sequencing dataset comprising 3256 individuals from across Japan. Analysis of JEWEL revealed genetic characteristics of the Japanese population that were not discernible using microarray data. First, rare variant-based analysis revealed an unprecedented fine-scale genetic structure. Together with population genetics analysis, the present-day Japanese can be decomposed into three ancestral components. Second, we identified unreported loss-of-function (LoF) variants and observed that for specific genes, LoF variants appeared to be restricted to a more limited set of transcripts than would be expected by chance, with PTPRD as a notable example. Third, we identified 44 archaic segments linked to complex traits, including a Denisovan-derived segment at NKX6-1 associated with type 2 diabetes. Most of these segments are specific to East Asians. Fourth, we identified candidate genetic loci under recent natural selection. Overall, our work provided insights into genetic characteristics of the Japanese population.
Affiliation(s)
- Xiaoxi Liu
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Satoshi Koyama
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Sadaaki Takata
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yuki Ishikawa
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Shuji Ito
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan
- Department of Orthopedic Surgery, Faculty of Medicine, Shimane University, Izumo, Japan
- Shunichi Kosugi
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Kunihiko Suzuki
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Keiko Hikino
- Laboratory for Pharmacogenomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Masaru Koido
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Yoshinao Koike
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan
- Department of Orthopedic Surgery, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Momoko Horikoshi
- Laboratory for Genomics of Diabetes and Metabolism, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Takashi Gakuhari
- Institute for the Study of Ancient Civilizations and Cultural Resources, College of Human and Social Sciences, Kanazawa University, Kanazawa, Japan
- Shiro Ikegawa
- Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan
- Koichi Matsuda
- Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Kaoru Ito
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yoichiro Kamatani
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
4
Zhang Y, Zhang P, Wu H. Enhancer-MDLF: a novel deep learning framework for identifying cell-specific enhancers. Brief Bioinform 2024; 25:bbae083. [PMID: 38485768 PMCID: PMC10938904 DOI: 10.1093/bib/bbae083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 01/27/2024] [Accepted: 02/07/2024] [Indexed: 03/18/2024]
Abstract
Enhancers, noncoding DNA fragments, play a pivotal role in gene regulation, facilitating gene transcription. Identifying enhancers is crucial for understanding genomic regulatory mechanisms, pinpointing key elements and investigating networks governing gene expression and disease-related mechanisms. Existing enhancer identification methods exhibit limitations, prompting the development of our novel multi-input deep learning framework, termed Enhancer-MDLF. Experimental results illustrate that Enhancer-MDLF outperforms the previous method, Enhancer-IF, across eight distinct human cell lines and exhibits superior performance on generic enhancer datasets and enhancer-promoter datasets, affirming the robustness of Enhancer-MDLF. Additionally, we introduce transfer learning to provide an effective and potential solution to address the prediction challenges posed by enhancer specificity. Furthermore, we utilize model interpretation to identify transcription factor binding site motifs that may be associated with enhancer regions, with important implications for facilitating the study of enhancer regulatory mechanisms. The source code is openly accessible at https://github.com/HaoWuLab-Bioinformatics/Enhancer-MDLF.
Affiliation(s)
- Yao Zhang
- School of Software, Shandong University, Jinan, 250100, Shandong, China
- Pengyu Zhang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Hao Wu
- School of Software, Shandong University, Jinan, 250100, Shandong, China
5
Lee D, Han SK, Yaacov O, Berk-Rauch H, Mathiyalagan P, Ganesh SK, Chakravarti A. Tissue-specific and tissue-agnostic effects of genome sequence variation modulating blood pressure. Cell Rep 2023; 42:113351. [PMID: 37910504 PMCID: PMC10726310 DOI: 10.1016/j.celrep.2023.113351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 09/21/2023] [Accepted: 10/11/2023] [Indexed: 11/03/2023]
Abstract
Genome-wide association studies (GWASs) have identified numerous variants associated with polygenic traits and diseases. However, with few exceptions, a mechanistic understanding of which variants affect which genes in which tissues to modulate trait variation is lacking. Here, we present genomic analyses to explain trait heritability of blood pressure (BP) through the genetics of transcriptional regulation using GWASs, multiomics data from different tissues, and machine learning approaches. Approximately 500,000 predicted regulatory variants across four tissues explain 33.4% of variant heritability: 2.5%, 5.3%, 7.7%, and 11.8% for kidney-, adrenal-, heart-, and artery-specific variants, respectively. Variation in the enhancers involved shows greater tissue specificity than in the genes they regulate, suggesting that gene regulatory networks perturbed by enhancer variants in a tissue relevant to a phenotype are the major source of interindividual variation in BP. Thus, our study provides an approach to scan human tissue and cell types for their physiological contribution to any trait.
Affiliation(s)
- Dongwon Lee
- Department of Pediatrics, Division of Nephrology, Boston Children's Hospital, Boston & Harvard Medical School, Boston, MA, USA.
- Seong Kyu Han
- Department of Pediatrics, Division of Nephrology, Boston Children's Hospital, Boston & Harvard Medical School, Boston, MA, USA
- Or Yaacov
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
- Hanna Berk-Rauch
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
- Prabhu Mathiyalagan
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
- Santhi K Ganesh
- Department of Internal Medicine & Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Aravinda Chakravarti
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA.
6
Sokolova K, Theesfeld CL, Wong AK, Zhang Z, Dolinski K, Troyanskaya OG. Atlas of primary cell-type-specific sequence models of gene expression and variant effects. CELL REPORTS METHODS 2023; 3:100580. [PMID: 37703883 PMCID: PMC10545936 DOI: 10.1016/j.crmeth.2023.100580] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Revised: 05/05/2023] [Accepted: 08/18/2023] [Indexed: 09/15/2023]
Abstract
Human biology is rooted in highly specialized cell types programmed by a common genome, 98% of which is outside of genes. Genetic variation in the enormous noncoding space is linked to the majority of disease risk. To address the problem of linking these variants to expression changes in primary human cells, we introduce ExPectoSC, an atlas of modular deep-learning-based models for predicting cell-type-specific gene expression directly from sequence. We provide models for 105 primary human cell types covering 7 organ systems, demonstrate their accuracy, and then apply them to prioritize relevant cell types for complex human diseases. The resulting atlas of sequence-based gene expression and variant effects is publicly available in a user-friendly interface and readily extensible to any primary cell types. We demonstrate the accuracy of our approach through systematic evaluations and apply the models to prioritize ClinVar clinical variants of uncertain significance, verifying our top predictions experimentally.
Affiliation(s)
- Ksenia Sokolova
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Chandra L Theesfeld
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
- Aaron K Wong
- Flatiron Institute, Simons Foundation, New York City, NY 10001, USA
- Zijun Zhang
- Flatiron Institute, Simons Foundation, New York City, NY 10001, USA; Division of Artificial Intelligence in Medicine, Cedars-Sinai Medical Center, 116 N. Robertson Boulevard, Los Angeles, CA 90048, USA
- Kara Dolinski
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Olga G Troyanskaya
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Flatiron Institute, Simons Foundation, New York City, NY 10001, USA.
7
Fan K, Pfister E, Weng Z. Toward a comprehensive catalog of regulatory elements. Hum Genet 2023; 142:1091-1111. [PMID: 36935423 DOI: 10.1007/s00439-023-02519-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 01/03/2023] [Indexed: 03/21/2023]
Abstract
Regulatory elements are the genomic regions that interact with transcription factors to control cell-type-specific gene expression in different cellular environments. A precise and complete catalog of functional elements encoded by the human genome is key to understanding mammalian gene regulation. Here, we review the current state of regulatory element annotation. We first provide an overview of assays for characterizing functional elements, including genome, epigenome, transcriptome, three-dimensional chromatin interaction, and functional validation assays. We then discuss computational methods for defining regulatory elements, including peak-calling and other statistical modeling methods. Finally, we introduce several high-quality lists of regulatory element annotations and suggest potential future directions.
Affiliation(s)
- Kaili Fan
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138, USA
- Edith Pfister
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA.
8
Campbell C, Francis A, Gaunt TR. Predicting pathogenicity from non-coding mutations. Nat Biomed Eng 2022:10.1038/s41551-022-00996-x. [DOI: 10.1038/s41551-022-00996-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]