1
|
Haghayegh F, Norouziazad A, Haghani E, Feygin AA, Rahimi RH, Ghavamabadi HA, Sadighbayan D, Madhoun F, Papagelis M, Felfeli T, Salahandish R. Revolutionary Point-of-Care Wearable Diagnostics for Early Disease Detection and Biomarker Discovery through Intelligent Technologies. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024:e2400595. [PMID: 38958517 DOI: 10.1002/advs.202400595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 06/19/2024] [Indexed: 07/04/2024]
Abstract
Early-stage disease detection, particularly in Point-Of-Care (POC) wearable formats, assumes a pivotal role in advancing healthcare services and precision medicine. Public benefits of early detection extend beyond cost-effectively improving healthcare outcomes to include reducing the risk of comorbid diseases. Technological advancements enabling POC biomarker recognition empower discovery of new markers for various health conditions. Integration of POC wearables for biomarker detection with intelligent frameworks represents a ground-breaking innovation, enabling automation of operations, advanced large-scale data analysis, generation of predictive models, and remote and guided clinical decision-making. These advancements substantially alleviate socioeconomic burdens, creating a paradigm shift in diagnostics and revolutionizing medical assessments and technology development. This review explores critical topics and recent progress in 1) the development of POC systems and wearable solutions for early disease detection and physiological monitoring, 2) current trends in the adoption of smart technologies within clinical settings and in developing biological assays, and 3) the utility of POC systems and smart platforms for biomarker discovery. Additionally, the review explores technology translation from research labs to broader applications. It also addresses associated risks, biases, and challenges of widespread Artificial Intelligence (AI) integration in diagnostic systems, while systematically outlining potential prospects, current challenges, and opportunities.
Affiliation(s)
- Fatemeh Haghayegh
- Laboratory of Advanced Biotechnologies for Health Assessments (Lab-HA), Biomedical Engineering Program, Lassonde School of Engineering, York University, Toronto, M3J 1P3, Canada
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| | - Alireza Norouziazad
- Laboratory of Advanced Biotechnologies for Health Assessments (Lab-HA), Biomedical Engineering Program, Lassonde School of Engineering, York University, Toronto, M3J 1P3, Canada
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| | - Elnaz Haghani
- Laboratory of Advanced Biotechnologies for Health Assessments (Lab-HA), Biomedical Engineering Program, Lassonde School of Engineering, York University, Toronto, M3J 1P3, Canada
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| | - Ariel Avraham Feygin
- Laboratory of Advanced Biotechnologies for Health Assessments (Lab-HA), Biomedical Engineering Program, Lassonde School of Engineering, York University, Toronto, M3J 1P3, Canada
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| | - Reza Hamed Rahimi
- Laboratory of Advanced Biotechnologies for Health Assessments (Lab-HA), Biomedical Engineering Program, Lassonde School of Engineering, York University, Toronto, M3J 1P3, Canada
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| | - Hamidreza Akbari Ghavamabadi
- Laboratory of Advanced Biotechnologies for Health Assessments (Lab-HA), Biomedical Engineering Program, Lassonde School of Engineering, York University, Toronto, M3J 1P3, Canada
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| | - Deniz Sadighbayan
- Department of Biology, Faculty of Science, York University, Toronto, ON, M3J 1P3, Canada
| | - Faress Madhoun
- Laboratory of Advanced Biotechnologies for Health Assessments (Lab-HA), Biomedical Engineering Program, Lassonde School of Engineering, York University, Toronto, M3J 1P3, Canada
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| | - Manos Papagelis
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| | - Tina Felfeli
- Department of Ophthalmology and Vision Sciences, University of Toronto, Ontario, M5T 3A9, Canada
- Institute of Health Policy, Management and Evaluation, University of Toronto, Ontario, M5T 3M6, Canada
| | - Razieh Salahandish
- Laboratory of Advanced Biotechnologies for Health Assessments (Lab-HA), Biomedical Engineering Program, Lassonde School of Engineering, York University, Toronto, M3J 1P3, Canada
- Department of Electrical Engineering and Computer Science (EECS), Lassonde School of Engineering, York University, Toronto, ON, M3J 1P3, Canada
| |
|
2
|
Tsukamoto M, Hishida A, Tamura T, Nagayoshi M, Okada R, Kubo Y, Kato Y, Hamajima N, Nishida Y, Shimanoe C, Ibusuki R, Shibuya K, Takashima N, Nakamura Y, Kusakabe M, Nakamura Y, Koyanagi YN, Oze I, Nishiyama T, Suzuki S, Watanabe I, Matsui D, Otonari J, Ikezaki H, Katsuura-Kamano S, Arisawa K, Kuriki K, Nakatochi M, Momozawa Y, Takeuchi K, Wakai K, Matsuo K. GWAS of Folate Metabolism With Gene-environment Interaction Analysis Revealed the Possible Role of Lifestyles in the Control of Blood Folate Metabolites in Japanese: The J-MICC Study. J Epidemiol 2024; 34:228-237. [PMID: 37517992 PMCID: PMC10999522 DOI: 10.2188/jea.je20220341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 06/30/2023] [Indexed: 08/01/2023] Open
Abstract
BACKGROUND The present genome-wide association study (GWAS) aimed to reveal the genetic loci associated with folate metabolites, as well as to detect related gene-environment interactions in Japanese. METHODS We conducted the GWAS of plasma homocysteine (Hcy), folic acid (FA), and vitamin B12 (VB12) levels in the Japan Multi-Institutional Collaborative Cohort (J-MICC) Study participants who joined from 2005 to 2012, and also estimated gene-environment interactions. In the replication phase, we used data from the Yakumo Study conducted in 2009. In the discovery phase, data of 2,263 participants from four independent study sites of the J-MICC Study were analyzed. In the replication phase, data of 573 participants from the Yakumo Study were analyzed. RESULTS For Hcy, MTHFR locus on chr 1, NOX4 on chr 11, CHMP1A on chr 16, and DPEP1 on chr 16 reached genome-wide significance (P < 5 × 10⁻⁸). MTHFR also associated with FA, and FUT2 on chr 19 associated with VB12. We investigated gene-environment interactions in both studies and found significant interactions between MTHFR C677T and ever drinking, current drinking, and physical activity >33% on Hcy (β = 0.039, 0.038 and -0.054, P = 0.018, 0.021 and <0.001, respectively) and the interaction of MTHFR C677T with ever drinking on FA (β = 0.033, P = 0.048). CONCLUSION The present GWAS revealed the folate metabolism-associated genetic loci and gene-environment interactions with drinking and physical activity in Japanese, suggesting the possibility of future personalized cardiovascular disease prevention.
Affiliation(s)
- Mineko Tsukamoto
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Asahi Hishida
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Takashi Tamura
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Mako Nagayoshi
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Rieko Okada
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Yoko Kubo
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Yasufumi Kato
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Nobuyuki Hamajima
- Department of Healthcare Administration, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Yuichiro Nishida
- Department of Preventive Medicine, Faculty of Medicine, Saga University, Saga, Japan
| | | | - Rie Ibusuki
- Department of International Island and Community Medicine, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Kenichi Shibuya
- Department of International Island and Community Medicine, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Naoyuki Takashima
- Department of Public Health, Shiga University of Medical Science, Otsu, Japan
| | - Yasuyuki Nakamura
- Department of Public Health, Shiga University of Medical Science, Otsu, Japan
| | - Miho Kusakabe
- Cancer Prevention Center, Chiba Cancer Center Research Institute, Chiba, Japan
| | - Yohko Nakamura
- Cancer Prevention Center, Chiba Cancer Center Research Institute, Chiba, Japan
| | - Yuriko N. Koyanagi
- Division of Cancer Information and Control, Department of Preventive Medicine, Aichi Cancer Center Research Institute, Nagoya, Japan
| | - Isao Oze
- Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan
| | - Takeshi Nishiyama
- Department of Public Health, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
| | - Sadao Suzuki
- Department of Public Health, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
| | - Isao Watanabe
- Department of Epidemiology for Community Health and Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan
| | - Daisuke Matsui
- Department of Epidemiology for Community Health and Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan
| | - Jun Otonari
- Department of Psychosomatic Medicine, Kyushu University Graduate School of Medical Sciences, Faculty of Medical Sciences, Fukuoka, Japan
| | - Hiroaki Ikezaki
- Department of Comprehensive General Internal Medicine, Kyushu University Graduate School of Medical Sciences, Faculty of Medical Sciences, Fukuoka, Japan
| | - Sakurako Katsuura-Kamano
- Department of Preventive Medicine, Tokushima University Graduate School of Biomedical Sciences, Tokushima, Japan
| | - Kokichi Arisawa
- Laboratory of Public Health, Division of Nutritional Sciences, School of Food and Nutritional Sciences, University of Shizuoka, Shizuoka, Japan
| | - Kiyonori Kuriki
- Laboratory of Public Health, Division of Nutritional Sciences, School of Food and Nutritional Sciences, University of Shizuoka, Shizuoka, Japan
| | - Masahiro Nakatochi
- Public Health Informatics Unit, Department of Integrated Health Sciences, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Kenji Takeuchi
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Department of International and Community Oral Health, Tohoku University Graduate School of Dentistry, Sendai, Japan
| | - Kenji Wakai
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Keitaro Matsuo
- Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan
- Department of Epidemiology, Nagoya University Graduate School of Medicine, Nagoya, Japan
| |
|
3
|
Hozumi Y, Tanemura KA, Wei GW. Preprocessing of Single Cell RNA Sequencing Data Using Correlated Clustering and Projection. J Chem Inf Model 2024; 64:2829-2838. [PMID: 37402705 PMCID: PMC11009150 DOI: 10.1021/acs.jcim.3c00674] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/06/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing the downstream analysis. We present Correlated Clustering and Projection (CCP), a new data-domain dimensionality reduction method, for the first time. CCP projects each cluster of similar genes into a supergene defined as the accumulated pairwise nonlinear gene-gene correlations among all cells. Using 14 benchmark data sets, we demonstrate that CCP has significant advantages over classical principal component analysis (PCA) for clustering and/or classification problems with intrinsically high dimensionality. In addition, we introduce the Residue-Similarity index (RSI) as a novel metric for clustering and classification and the R-S plot as a new visualization tool. We show that the RSI correlates with accuracy without requiring the knowledge of the true labels. The R-S plot provides a unique alternative to the uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) for data with a large number of cell types.
Affiliation(s)
- Yuta Hozumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Kiyoto Aramis Tanemura
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
4
|
Li T, Yao J. Unveiling the hub genes in the SIGLECs family in colon adenocarcinoma with machine learning. Front Genet 2024; 15:1375100. [PMID: 38650859 PMCID: PMC11033367 DOI: 10.3389/fgene.2024.1375100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 03/25/2024] [Indexed: 04/25/2024] Open
Abstract
Background Despite the recognized roles of Sialic acid-binding Ig-like lectins (SIGLECs) in endocytosis and immune regulation across cancers, their molecular intricacies in colon adenocarcinoma (COAD) are underexplored. Meanwhile, the complicated interactions between different SIGLECs are also crucial but open questions. Methods We investigate the correlation between SIGLECs and various properties, including cancer status, prognosis, clinical features, functional enrichment, immune cell abundances, immune checkpoints, pathways, etc. To fully understand the behavior of multiple SIGLECs' co-evolution and subtract its leading effect, we additionally apply three unsupervised machine learning algorithms, namely, Principal Component Analysis (PCA), Self-Organizing Maps (SOM), K-means, and two supervised learning algorithms, Least Absolute Shrinkage and Selection Operator (LASSO) and neural network (NN). Results We find significantly lower expression levels in COAD samples, together with a systematic enhancement in the correlations between distinct SIGLECs. We demonstrate SIGLEC14 significantly affects the Overall Survival (OS) according to the hazard ratio, while using PCA further enhances the sensitivity to both OS and Disease Free Interval (DFI). We find any single SIGLEC is uncorrelated to the cancer stages, which can be significantly improved by using PCA. We further identify SIGLEC-1,15 and CD22 as hub genes in COAD through Differentially Expressed Genes (DEGs), which is consistent with our PCA-identified key components PC-1,2,5 considering both the correlation with cancer status and immune cell abundance. As an extension, we use SOM for the visualization of the SIGLECs and show the similarities and differences between COAD patients. SOM can also help us define subsamples according to the SIGLECs status, with corresponding changes in both immune cells and cancer T-stage, for instance.
Conclusion We conclude SIGLEC-1,15 and CD22 as the most promising hub genes in the SIGLECs family in treating COAD. PCA offers significant enhancement in the prognosis and clinical analyses, while using SOM further unveils the transition phases or potential subtypes of COAD.
Affiliation(s)
- Tiantian Li
- Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
| | - Ji Yao
- Department of Astronomy, School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Astronomical Observatory, Shanghai, China
| |
|
5
|
He X, Yang Z, Wang L, Sun Y, Cao H, Liang Y. NeuTox: A weighted ensemble model for screening potential neuronal cytotoxicity of chemicals based on various types of molecular representations. Journal of Hazardous Materials 2024; 465:133443. [PMID: 38198870 DOI: 10.1016/j.jhazmat.2024.133443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 01/12/2024]
Abstract
Chemical-induced neurotoxicity has been widely brought into focus in the risk assessment of chemical safety. However, the traditional in vivo animal models used to evaluate neurotoxicity are time-consuming and expensive, and they cannot completely represent the pathophysiology of neurotoxicity in humans. Cytotoxicity to human neuroblastoma cell line (SH-SY5Y) is commonly used as an alternative to animal testing for the assessment of neurotoxicity, yet it is still not appropriate for high throughput screening of potential neuronal cytotoxicity of chemicals. In this study, we constructed an ensemble prediction model, termed NeuTox, by combining multiple machine learning algorithms with molecular representations based on the weighted score of Particle Swarm Optimization. For the test set, NeuTox shows excellent performance with an accuracy of 0.9064, which is superior to the top-performing individual models. The subsequent experimental verifications reveal that 5,5'-isopropylidenedi-2-biphenylol and 4,4'-cyclo-hexylidenebisphenol exhibited stronger SH-SY5Y-based cytotoxicity compared to bisphenol A, suggesting that NeuTox has good generalization ability in the first-tier assessment of neuronal cytotoxicity of BPA analogs. For ease of use, NeuTox is presented as an online web server that can be freely accessed via http://www.iehneutox-predictor.cn/NeuToxPredict/Predict.
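The general idea of a weighted ensemble mentioned in this abstract can be illustrated with a minimal sketch. It is not the NeuTox implementation: the function name `weighted_ensemble`, the three base models, their predicted probabilities, and the weights are all hypothetical, and the weights are fixed by hand here rather than derived from a Particle Swarm Optimization score.

```python
def weighted_ensemble(probabilities, weights):
    """Combine per-model class probabilities into one weighted average.

    probabilities: one list per base model, each holding that model's
                   predicted probability of cytotoxicity for every sample.
    weights:       one non-negative weight per base model (hypothetical
                   values; the abstract does not report the actual weights).
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probabilities[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n_samples)
    ]


# Illustrative predictions from three hypothetical base models on three samples.
model_probs = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.3],
    [0.8, 0.1, 0.5],
]
weights = [0.5, 0.2, 0.3]

ensemble = weighted_ensemble(model_probs, weights)
# Threshold the averaged probability at 0.5 to obtain a binary label.
labels = [int(p >= 0.5) for p in ensemble]
print(ensemble, labels)
```

Dividing by the sum of the weights keeps the output a valid probability even when the weights are not normalized, which is convenient if they come from an optimizer.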
Affiliation(s)
- Xuejun He
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China.
| | - Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| |
|
6
|
Nagayoshi M, Hishida A, Shimizu T, Kato Y, Kubo Y, Okada R, Tamura T, Otonari J, Ikezaki H, Hara M, Nishida Y, Oze I, Koyanagi YN, Nakamura Y, Kusakabe M, Ibusuki R, Shibuya K, Suzuki S, Nishiyama T, Koyama T, Ozaki E, Kuriki K, Takashima N, Nakamura Y, Katsuura-Kamano S, Arisawa K, Nakatochi M, Momozawa Y, Takeuchi K, Wakai K. BMI and Cardiometabolic Traits in Japanese: A Mendelian Randomization Study. J Epidemiol 2024; 34:51-62. [PMID: 36709979 PMCID: PMC10751192 DOI: 10.2188/jea.je20220154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 12/28/2022] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Although many observational studies have demonstrated significant relationships between obesity and cardiometabolic traits, the causality of these relationships in East Asians remains to be elucidated. METHODS We conducted individual-level Mendelian randomization (MR) analyses targeting 14,083 participants in the Japan Multi-Institutional Collaborative Cohort Study and two-sample MR analyses using summary statistics based on genome-wide association study data from 173,430 Japanese. Using 83 body mass index (BMI)-related loci, genetic risk scores (GRS) for BMI were calculated, and the effects of BMI on cardiometabolic traits were examined for individual-level MR analyses using the two-stage least squares estimator method. The β-coefficients and standard errors for the per-allele association of each single-nucleotide polymorphism as well as all outcomes, or odds ratios with 95% confidence intervals were calculated in the two-sample MR analyses. RESULTS In individual-level MR analyses, the GRS of BMI was not significantly associated with any cardiometabolic traits. In two-sample MR analyses, higher BMI was associated with increased risks of higher blood pressure, triglycerides, and uric acid, as well as lower high-density-lipoprotein cholesterol and eGFR. The associations of BMI with type 2 diabetes in two-sample MR analyses were inconsistent using different methods, including the directions. CONCLUSION The results of this study suggest that, even among the Japanese, an East Asian population with low levels of obesity, higher BMI could be causally associated with the development of a variety of cardiometabolic traits. Causality in those associations should be clarified in future studies with larger populations, especially those of BMI with type 2 diabetes.
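The two-stage least squares estimator named in the METHODS can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the study's analysis: the variable names (`grs`, `bmi`, `trait`), the sample size, the effect sizes, and the single unmeasured confounder are all invented for the example.

```python
import random


def ols(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_stage_least_squares(instrument, exposure, outcome):
    """Estimate the effect of exposure on outcome using one instrument."""
    # Stage 1: predict the exposure from the instrument alone.
    slope1, intercept1 = ols(instrument, exposure)
    fitted = [intercept1 + slope1 * z for z in instrument]
    # Stage 2: regress the outcome on the instrument-predicted exposure.
    slope2, _ = ols(fitted, outcome)
    return slope2


def simulate(n=20000, true_effect=0.5, seed=0):
    """Simulate a genetic score, a confounded exposure, and an outcome."""
    rng = random.Random(seed)
    grs, bmi, trait = [], [], []
    for _ in range(n):
        g = rng.gauss(0, 1)            # genetic risk score (instrument)
        u = rng.gauss(0, 1)            # unmeasured confounder
        b = 0.3 * g + u + rng.gauss(0, 1)
        t = true_effect * b + u + rng.gauss(0, 1)
        grs.append(g)
        bmi.append(b)
        trait.append(t)
    return grs, bmi, trait


grs, bmi, trait = simulate()
iv_estimate = two_stage_least_squares(grs, bmi, trait)
naive_estimate, _ = ols(bmi, trait)
print(f"2SLS: {iv_estimate:.3f}  naive OLS: {naive_estimate:.3f}")
```

In this simulation the naive regression is biased upward by the confounder, while the instrument-based estimate should land near the simulated effect of 0.5, because the genetic score influences the trait only through the exposure.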
Affiliation(s)
- Mako Nagayoshi
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Asahi Hishida
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Tomonori Shimizu
- Undergraduate Course, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Yasufumi Kato
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Yoko Kubo
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Rieko Okada
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Takashi Tamura
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Jun Otonari
- Department of Psychosomatic Medicine, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Department of Psychosomatic Medicine, International University of Health and Welfare Narita Hospital, Chiba, Japan
| | - Hiroaki Ikezaki
- Department of General Internal Medicine, Kyushu University Hospital, Fukuoka, Japan
- Department of Comprehensive General Internal Medicine, Kyushu University Faculty of Medical Sciences, Fukuoka, Japan
| | - Megumi Hara
- Department of Preventive Medicine, Faculty of Medicine, Saga University, Saga, Japan
| | - Yuichiro Nishida
- Department of Preventive Medicine, Faculty of Medicine, Saga University, Saga, Japan
| | - Isao Oze
- Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan
| | - Yuriko N. Koyanagi
- Division of Cancer Information and Control, Aichi Cancer Center Research Institute, Nagoya, Japan
| | - Yohko Nakamura
- Cancer Prevention Center, Chiba Cancer Center Research Institute, Chiba, Japan
| | - Miho Kusakabe
- Cancer Prevention Center, Chiba Cancer Center Research Institute, Chiba, Japan
| | - Rie Ibusuki
- Department of International Island and Community Medicine, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Keiichi Shibuya
- Department of International Island and Community Medicine, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Department of Emergency, Kagoshima Prefectural Oshima Hospital, Kagoshima, Japan
| | - Sadao Suzuki
- Department of Public Health, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
| | - Takeshi Nishiyama
- Department of Public Health, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
| | - Teruhide Koyama
- Department of Epidemiology for Community Health and Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan
| | - Etsuko Ozaki
- Department of Epidemiology for Community Health and Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan
| | - Kiyonori Kuriki
- Laboratory of Public Health, Division of Nutritional Sciences, School of Food and Nutritional Sciences, University of Shizuoka, Shizuoka, Japan
| | - Naoyuki Takashima
- Department of Public Health, Faculty of Medicine, Kindai University, Osaka, Japan
- Department of Public Health, Shiga University of Medical Science, Otsu, Japan
| | - Yasuyuki Nakamura
- Department of Public Health, Shiga University of Medical Science, Otsu, Japan
- Yamashina Racto Clinic and Medical Examination Center, Kyoto, Japan
| | - Sakurako Katsuura-Kamano
- Department of Preventive Medicine, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Kokichi Arisawa
- Department of Preventive Medicine, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Masahiro Nakatochi
- Public Health Informatics Unit, Department of Integrated Health Sciences, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, Center for Integrative Medical Sciences, Kanagawa, Japan
| | - Kenji Takeuchi
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Department of International and Community Oral Health, Tohoku University Graduate School of Dentistry, Sendai, Japan
| | - Kenji Wakai
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
| |
|
7
|
Palarea-Albaladejo J, McNeilly TN, Nisbet AJ. A curated multivariate approach to study efficacy and optimisation of a prototype vaccine against teladorsagiasis in sheep. Vet Res Commun 2024; 48:367-379. [PMID: 37707655 PMCID: PMC10810991 DOI: 10.1007/s11259-023-10208-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 08/25/2023] [Indexed: 09/15/2023]
Abstract
This work discusses and demonstrates the novel use of multivariate analysis and data dimensionality reduction techniques to handle the variety and complexity of data generated in efficacy trials for the development of a prototype vaccine to protect sheep against the Teladorsagia circumcincta nematode. A curated collection of data dimension reduction and visualisation techniques, in conjunction with sensible statistical modelling and testing which explicitly model key features of the data, offers a synthetic view of the relationships between the multiple biological parameters measured. New biological insight is gained into the patterns and associations involving antigen-specific antibody levels, antibody avidity and parasitological parameters of efficacy that is not achievable by standard statistical practice in the field. This approach can therefore be used to guide vaccine refinement and simplification through identifying the most immunologically relevant antigens, and it can be analogously implemented for similar studies in other areas. To facilitate this, the associated data and computer codes written for the R open system for statistical computing are made freely available.
Affiliation(s)
- Javier Palarea-Albaladejo
- Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain.
- Biomathematics and Statistics Scotland, JCMB, The King's Buildings, Peter Guthrie Tait Road, Edinburgh, Scotland, UK.
| | - Tom N McNeilly
- Moredun Research Institute, Pentlands Science Park, Bush Loan, Penicuik, Scotland, UK
| | - Alasdair J Nisbet
- Moredun Research Institute, Pentlands Science Park, Bush Loan, Penicuik, Scotland, UK
| |
|
8
|
Wang Y, Yu X, Gu Y, Li W, Zhu K, Chen L, Tang Y, Liu G. XGraphCDS: An explainable deep learning model for predicting drug sensitivity from gene pathways and chemical structures. Comput Biol Med 2024; 168:107746. [PMID: 38039896 DOI: 10.1016/j.compbiomed.2023.107746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]
Abstract
Cancer is a highly complex disease characterized by genetic and phenotypic heterogeneity among individuals. In the era of precision medicine, understanding the genetic basis of these individual differences is crucial for developing new drugs and achieving personalized treatment. Despite the increasing abundance of cancer genomics data, predicting the relationship between cancer samples and drug sensitivity remains challenging. In this study, we developed an explainable graph neural network framework for predicting cancer drug sensitivity (XGraphCDS) based on comparative learning by integrating cancer gene expression information and drug chemical structure knowledge. Specifically, XGraphCDS consists of a unified heterogeneous network and multiple sub-networks, with molecular graphs representing drugs and gene enrichment scores representing cell lines. Experimental results showed that XGraphCDS consistently outperformed most state-of-the-art baselines (R² = 0.863, AUC = 0.858). We also constructed a separate in vivo prediction model by using transfer learning strategies with in vitro experimental data and achieved good predictive power (AUC = 0.808). Simultaneously, our framework is interpretable, providing insights into resistance mechanisms alongside accurate predictions. The excellent performance of XGraphCDS highlights its immense potential in aiding the development of selective anti-tumor drugs and personalized dosing strategies in the field of precision medicine.
Affiliation(s)
- Yimeng Wang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Xinxin Yu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yaxin Gu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Keyun Zhu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Long Chen
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
|
9
|
Carhuaricra-Huaman D, Setubal JC. Step-by-Step Bacterial Genome Comparison. Methods Mol Biol 2024; 2802:107-134. [PMID: 38819558 DOI: 10.1007/978-1-0716-3838-5_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Thanks to advancements in genome sequencing and bioinformatics, thousands of bacterial genome sequences are available in public databases. This presents an opportunity to study bacterial diversity in unprecedented detail. This chapter describes a complete bioinformatics workflow for comparative genomics of bacterial genomes, including genome annotation, pangenome reconstruction and visualization, phylogenetic analysis, and identification of sequences of interest such as antimicrobial-resistance genes, virulence factors, and phage sequences. The workflow uses state-of-the-art, open-source tools. The workflow is presented by means of a comparative analysis of Salmonella enterica serovar Typhimurium genomes. The workflow is based on Linux commands and scripts, and result visualization relies on the R environment. The chapter provides a step-by-step protocol that researchers with basic expertise in bioinformatics can easily follow to conduct investigations on their own genome datasets.
Affiliation(s)
- Dennis Carhuaricra-Huaman
  - Programa de Pós-Graduação Interunidades em Bioinformática, Instituto de Matemática e Estatística, Universidade de São Paulo, Sao Paulo, SP, Brazil
  - Research Group in Biotechnology Applied to Animal Health, Production and Conservation (SANIGEN), Laboratory of Biology and Molecular Genetics, Faculty of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, San Borja, Lima, Peru
- João Carlos Setubal
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil.
|
10
|
Swarup N, Cheng J, Choi I, Heo YJ, Kordi M, Aziz M, Arora A, Li F, Chia D, Wei F, Elashoff D, Zhang L, Kim S, Kim Y, Wong DTW. Multi-faceted attributes of salivary cell-free DNA as liquid biopsy biomarkers for gastric cancer detection. Biomark Res 2023; 11:90. [PMID: 37817261 PMCID: PMC10566128 DOI: 10.1186/s40364-023-00524-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/09/2023] [Accepted: 09/12/2023] [Indexed: 10/12/2023] Open
Abstract
BACKGROUND Recent advances in circulating cell-free DNA (cfDNA) analysis from biofluids have opened new avenues for liquid biopsy (LB). However, current cfDNA LB assays are limited by the availability of existing information on established genotypes associated with tumor tissues. Certain cancers present with a limited list of established mutated cfDNA biomarkers, and thus, nonmutated cfDNA characteristics along with alternative biofluids are needed to broaden the available cfDNA targets for cancer detection. Saliva is an intriguing and accessible biofluid that has yet to be fully explored for its clinical utility for cancer detection. METHODS In this report, we employed a low-coverage single stranded (ss) library NGS pipeline "Broad-Range cell-free DNA-Seq" (BRcfDNA-Seq) using saliva to comprehensively investigate the characteristics of salivary cfDNA (ScfDNA). The identification of cfDNA features has been made possible by applying novel cfDNA processing techniques that permit the incorporation of ultrashort, ss, and jagged DNA fragments. As a proof of concept using 10 gastric cancer (GC) and 10 noncancer samples, we examined whether ScfDNA characteristics, including fragmentomics, end motif profiles, microbial contribution, and human chromosomal mapping, could differentiate between these two groups. RESULTS Individual and integrative analysis of these ScfDNA features demonstrated significant differences between the two cohorts, suggesting that disease state may affect the ScfDNA population by altering nuclear cleavage or the profile of contributory organism cfDNA to total ScfDNA. We report that principal component analysis integration of several aspects of salivary cell-free DNA fragmentomic profiles, genomic element profiles, end-motif sequence patterns, and distinct oral microbiome populations can differentiate the two populations with a p value of < 0.0001 (PC1). 
CONCLUSION These novel features of ScfDNA characteristics could be clinically useful for improving saliva-based LB detection and the eventual monitoring of local or systemic diseases.
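The abstract names end motif profiles as one of the ScfDNA features but does not describe how they are computed. The following is a minimal sketch, assuming an end motif is the first 4 bases at the 5' end of each fragment; the motif length, the function name `end_motif_profile`, and the toy fragments are illustrative assumptions and are not taken from the BRcfDNA-Seq pipeline.

```python
from collections import Counter

def end_motif_profile(fragments, k=4):
    """Relative frequency of the k-mer found at the 5' end of each fragment.

    Assumption: fragments are plain DNA strings; fragments shorter than k
    or with ambiguous bases in the motif are skipped.
    """
    counts = Counter(
        frag[:k].upper()
        for frag in fragments
        if len(frag) >= k and set(frag[:k].upper()) <= set("ACGT")
    )
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}

# Toy input (hypothetical): two fragments start with CCCA, one with ACGT,
# one contains an ambiguous base, and one is too short.
fragments = ["CCCAGGT", "CCCATTA", "ACGTACGT", "ccNA", "TT"]
profile = end_motif_profile(fragments)
```

Profiles of this kind, computed per sample, could then be stacked into a matrix and passed to a principal component analysis, which is the integration step the abstract reports.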
Affiliation(s)
- Neeti Swarup
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Jordan Cheng
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Irene Choi
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- You Jeong Heo
  - The Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06355, Republic of Korea
- Misagh Kordi
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Mohammad Aziz
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Akanksha Arora
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Indraprastha Institute of Information Technology (IIIT), Delhi, India
- Feng Li
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- David Chia
  - Indraprastha Institute of Information Technology (IIIT), Delhi, India
- Fang Wei
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- David Elashoff
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Liying Zhang
  - Indraprastha Institute of Information Technology (IIIT), Delhi, India
- Sung Kim
  - Department of Medicine, Biostatistics and Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06355, South Korea
- Yong Kim
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
- David T W Wong
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
|
11
|
Boigenzahn H, González LD, Thompson JC, Zavala VM, Yin J. Kinetic Modeling and Parameter Estimation of a Prebiotic Peptide Reaction Network. J Mol Evol 2023; 91:730-744. [PMID: 37796316 DOI: 10.1007/s00239-023-10132-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 08/23/2023] [Indexed: 10/06/2023]
Abstract
Although our understanding of how life emerged on Earth from simple organic precursors is speculative, early precursors likely included amino acids. The polymerization of amino acids into peptides and interactions between peptides are of interest because peptides and proteins participate in complex interaction networks in extant biology. However, peptide reaction networks can be challenging to study because of the potential for multiple species and systems-level interactions between species. We developed and employed a computational network model to describe reactions between amino acids to form di-, tri-, and tetra-peptides. Our experiments were initiated with two of the simplest amino acids, glycine and alanine, mediated by trimetaphosphate-activation and drying to promote peptide bond formation. The parameter estimates for bond formation and hydrolysis reactions in the system were found to be poorly constrained due to a network property known as sloppiness. In a sloppy model, the behavior mostly depends on only a subset of parameter combinations, but there is no straightforward way to determine which parameters should be included or excluded. Despite our inability to determine the exact values of specific kinetic parameters, we could make reasonably accurate predictions of model behavior. In short, our modeling has highlighted challenges and opportunities toward understanding the behaviors of complex prebiotic chemical experiments.
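The idea that behavior can depend on parameter combinations rather than on individual rate constants can be illustrated with a deliberately tiny toy model. The sketch below is not the authors' network: it assumes a single reversible dimerization, 2 G ⇌ GG, with a hypothetical formation constant `k_f` and hydrolysis constant `k_h`, integrated with a fixed-step Euler scheme. At long times only the ratio `k_f / k_h` determines the concentrations, so late measurements alone cannot pin down either constant.

```python
def simulate_dimerization(k_f, k_h, g0=1.0, t_end=50.0, dt=0.001):
    """Euler integration of 2 G <-> GG (toy model, illustrative only).

    d[GG]/dt = k_f * [G]^2 - k_h * [GG]
    d[G]/dt  = -2 * d[GG]/dt
    Returns the monomer and dimer concentrations at t_end.
    """
    g, gg = g0, 0.0
    for _ in range(int(t_end / dt)):
        net_rate = k_f * g * g - k_h * gg  # net dimer formation rate
        g -= 2.0 * net_rate * dt
        gg += net_rate * dt
    return g, gg

# Two parameter sets that differ tenfold but share the ratio k_f / k_h = 2.
g_slow, gg_slow = simulate_dimerization(k_f=1.0, k_h=0.5)
g_fast, gg_fast = simulate_dimerization(k_f=10.0, k_h=5.0)
```

Both runs end at the same dimer concentration, so a fit to late-time data would leave the individual constants poorly constrained while still predicting the observable well. This is the same qualitative situation the abstract describes, in a much simpler setting.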
Affiliation(s)
- Hayley Boigenzahn
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI, 53706, USA
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, 330 N. Orchard Street, Madison, WI, 53715, USA
- Leonardo D González
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI, 53706, USA
- Jaron C Thompson
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI, 53706, USA
- Victor M Zavala
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI, 53706, USA
- John Yin
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI, 53706, USA.
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, 330 N. Orchard Street, Madison, WI, 53715, USA.
|
12
|
Le NQK, Li W, Cao Y. Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection. Brief Bioinform 2023; 24:bbad319. [PMID: 37649385 DOI: 10.1093/bib/bbad319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 07/09/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023] Open
Abstract
Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of external factors and internal structure. To save experimental cost and time, the tendency of proteins to crystallize can first be screened by modeling. This study therefore created a new pipeline that uses protein sequence to predict crystallization propensity at the protein material production stage, the purification stage, and the crystal production stage. The pipeline proposes a new feature selection method, which combines Chi-square ($\chi^2$) and recursive feature elimination together with the 12 selected features, followed by a linear discriminant analysis for dimensionality reduction. Finally, a support vector machine algorithm with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline has been tested on three different datasets, and its accuracy rates are higher than those of existing pipelines. In conclusion, our model provides a new solution for predicting multistage protein crystallization propensity, which is a big challenge in computational biology.
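The first level of the feature selection described above is a chi-square filter. The sketch below shows only that step, in the simplest possible setting: binary features scored against a binary label with the 2x2 contingency-table statistic. The feature names, the toy data, and the helpers `chi_square_score` and `top_k_features` are hypothetical; the recursive feature elimination, linear discriminant analysis, and support vector machine stages of the published pipeline are not reproduced here.

```python
def chi_square_score(feature, labels):
    """Chi-square statistic of a 2x2 table (binary feature vs binary label)."""
    a = sum(1 for f, y in zip(feature, labels) if f and y)          # present, positive
    b = sum(1 for f, y in zip(feature, labels) if f and not y)      # present, negative
    c = sum(1 for f, y in zip(feature, labels) if not f and y)      # absent, positive
    d = sum(1 for f, y in zip(feature, labels) if not f and not y)  # absent, negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def top_k_features(columns, labels, k):
    """Names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (hypothetical): 1 = crystallizable, 0 = not crystallizable.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "motif_A": [1, 1, 1, 0, 0, 0, 0, 1],  # partly associated with the label
    "motif_B": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    "motif_C": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated with the label
}
selected = top_k_features(columns, labels, k=2)
```

A filter like this is cheap because each feature is scored independently, which is one reason to run it before a more expensive wrapper method such as recursive feature elimination.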
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
  - AIBioMed Research Group, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing Street, 110, Taipei, Taiwan
- Wanru Li
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Yanshuang Cao
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
|
13
|
Swarup N, Cheng J, Choi I, Heo YJ, Kordi M, Li F, Aziz M, Chia D, Wei F, Elashoff D, Zhang L, Kim S, Kim Y, Wong DT. Multi-Faceted Attributes of Salivary Cell-free DNA as Liquid Biopsy Biomarkers for Gastric Cancer Detection. RESEARCH SQUARE 2023:rs.3.rs-3154388. [PMID: 37503289 PMCID: PMC10371094 DOI: 10.21203/rs.3.rs-3154388/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Background Recent advances in circulating cell-free DNA (cfDNA) analysis from biofluids have opened new avenues for liquid biopsy (LB). However, current cfDNA LB assays are limited by the availability of existing information on established genotypes associated with tumor tissues. Certain cancers present with a limited list of established mutated cfDNA biomarkers, and thus, nonmutated cfDNA characteristics along with alternative biofluids are needed to broaden the available cfDNA targets for cancer detection. Saliva is an intriguing and accessible biofluid that has yet to be fully explored for its clinical utility for cancer detection. Methods In this report, we employed a low-coverage single stranded (ss) library NGS pipeline "Broad-Range cell-free DNA-Seq" (BRcfDNA-Seq) using saliva to comprehensively investigate the characteristics of salivary cfDNA (ScfDNA). The identification of cfDNA features has been made possible by applying novel cfDNA processing techniques that permit the incorporation of ultrashort, ss, and jagged DNA fragments. As a proof of concept using 10 gastric cancer (GC) and 10 noncancer samples, we examined whether ScfDNA characteristics, including fragmentomics, end motif profiles, microbial contribution, and human chromosomal mapping, could differentiate between these two groups. Results Individual and integrative analysis of these ScfDNA features demonstrated significant differences between the two cohorts, suggesting that disease state may affect the ScfDNA population by altering nuclear cleavage or the profile of contributory organism cfDNA to total ScfDNA. We report that principal component analysis integration of several aspects of salivary cell-free DNA fragmentomic profiles, genomic element profiles, end-motif sequence patterns, and distinct oral microbiome populations can differentiate the two populations with a p value of < 0.0001 (PC1). 
Conclusion These novel features of ScfDNA characteristics could be clinically useful for improving saliva-based LB detection and the eventual monitoring of local or systemic diseases.
Affiliation(s)
- Neeti Swarup
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Jordan Cheng
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Irene Choi
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- You Jeong Heo
  - The Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06355, Republic of Korea
- Misagh Kordi
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Feng Li
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Mohammad Aziz
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- David Chia
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Fang Wei
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- David Elashoff
  - Department of Medicine, Biostatistics and Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Liying Zhang
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Sung Kim
  - Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06355, South Korea
- Yong Kim
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- David T.W. Wong
  - School of Dentistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
|
14
|
Watts J, Allen E, Mitoubsi A, Khojandi A, Eales J, Papamarkou T. Towards Faster Gene Expression Prediction via Dimensionality Reduction and Feature Selection. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083578 DOI: 10.1109/embc40787.2023.10340962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
The majority of genes have a genetic component to their expression. Elastic nets have been shown effective at predicting tissue-specific, individual-level gene expression from genotype data. We apply principal component analysis (PCA), linkage disequilibrium pruning, or the combination of the two to reduce, or generate, a lower-dimensional representation of the genetic variants used as inputs to the elastic net models for the prediction of gene expression. Our results show that, in general, elastic nets attain their best performance when all genetic variants are included as inputs; however, a relatively low number of principal components can effectively summarize the majority of genetic variation while reducing the overall computation time. Specifically, 100 principal components reduce the computational time of the models by over 80% with only an 8% loss in R². Finally, linkage disequilibrium pruning does not effectively reduce the genetic variants for predicting gene expression. As predictive models are commonly made for over 27,000 genes for more than 50 tissues, PCA may provide an effective method for reducing the computational burden of gene expression analysis.
|
15
|
Qian X, Dai X, Luo L, Lin M, Xu Y, Zhao Y, Huang D, Qiu H, Liang L, Liu H, Liu Y, Gu L, Lu T, Chen Y, Zhang Y. An Interpretable Multitask Framework BiLAT Enables Accurate Prediction of Cyclin-Dependent Protein Kinase Inhibitors. J Chem Inf Model 2023. [PMID: 37171216 DOI: 10.1021/acs.jcim.3c00473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
The cyclin-dependent protein kinases (CDKs) are protein-serine/threonine kinases with crucial effects on the regulation of cell cycle and transcription. CDKs can be a hallmark of cancer since their excessive expression could lead to impaired cell proliferation. However, the selectivity of most developed CDK inhibitors is insufficient, which has hindered their therapeutic use. In this study, we propose a multitask deep learning framework called BiLAT, based on SMILES representation, for predicting the inhibitory activity of molecules on eight CDK subtypes (CDK1, 2, 4-9). The framework is mainly composed of an improved bidirectional long short-term memory (BiLSTM) module and the encoder layer of the Transformer framework. Additionally, SMILES enumeration is applied as a data augmentation method to improve the performance of the model. Compared with baseline predictive models based on three conventional machine learning methods and two multitask deep learning algorithms, BiLAT achieves the best performance, with the highest average AUC, ACC, F1-score, and MCC values of 0.938, 0.894, 0.911, and 0.715 on the test set. Moreover, we constructed a targeted external data set, CDK-Dec, for the CDK family, which mainly contains bait values screened by 3D similarity with active compounds. This data set was used in the subsequent evaluation of our model. It is worth mentioning that the BiLAT model is interpretable and can be used by chemists to design and synthesize compounds with improved activity. To further verify the generalization ability of the multitask BiLAT model, we also conducted another evaluation on three public data sets (Tox21, ClinTox, and SIDER). Compared with several currently popular models, BiLAT shows the best performance on two data sets. These results indicate that BiLAT is an effective tool for accelerating drug discovery.
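The abstract reports ACC, F1-score, and MCC side by side, and the MCC value is noticeably lower than the other two. The sketch below computes the three metrics from a confusion matrix using their standard definitions, to show how that gap can arise when the classes are imbalanced. The counts are invented for illustration and are not BiLAT results; `binary_metrics` is a hypothetical helper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F1-score, and Matthews correlation coefficient from counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F1": f1, "MCC": mcc}

# Invented, imbalanced example: 85 actives and 15 inactives.
metrics = binary_metrics(tp=80, fp=10, tn=5, fn=5)
```

Here accuracy is 0.85 and F1 is about 0.91, while MCC is only about 0.33, because MCC also penalizes the poor performance on the small negative class. Reporting MCC alongside accuracy is a common way to guard against that kind of optimism.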
Affiliation(s)
- Xu Qian
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xiaowen Dai
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Lin Luo
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Mingde Lin
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yuan Xu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yang Zhao
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Dingfang Huang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haodi Qiu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Li Liang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yingbo Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Lingxi Gu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Tao Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
|
16
|
Wang Y, Huang M, Deng H, Li W, Wu Z, Tang Y, Liu G. Identification of vital chemical information via visualization of graph neural networks. Brief Bioinform 2023; 24:6936421. [PMID: 36537081 DOI: 10.1093/bib/bbac577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 11/02/2022] [Accepted: 11/25/2022] [Indexed: 12/24/2022] Open
Abstract
Qualitative or quantitative prediction models of structure-activity relationships based on graph neural networks (GNNs) are prevalent in drug discovery applications and commonly have excellent predictive power. However, the network information flows of GNNs are highly complex and accompanied by poor interpretability. Unfortunately, there are relatively few studies on GNN attributions, and their development in drug research is still at an early stage. In this work, we adopted several advanced attribution techniques for different GNN frameworks and applied them to explain multiple drug molecule property prediction tasks, enabling the identification and visualization of vital chemical information in the networks. Additionally, we evaluated them quantitatively with attribution metrics such as accuracy, sparsity, fidelity and infidelity, stability and sensitivity; discussed their applicability and limitations; and provided an open-source benchmark platform for researchers. The results showed that all attribution techniques were effective, while those directly related to the predicted labels, such as integrated gradients, tended to have better attribution performance. The attribution techniques we have implemented can be used directly for the vast majority of chemical GNN interpretation tasks.
Affiliation(s)
- Yimeng Wang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Mengting Huang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Hua Deng
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zengrui Wu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
|
17
|
Revealing genetic links of Type 2 diabetes that lead to the development of Alzheimer's disease. Heliyon 2022; 9:e12202. [PMID: 36711310 PMCID: PMC9876837 DOI: 10.1016/j.heliyon.2022.e12202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 11/01/2022] [Accepted: 11/30/2022] [Indexed: 12/23/2022] Open
Abstract
Background Type 2 diabetes mellitus (T2D), characterized by peripheral insulin resistance, is a factor leading to Alzheimer's disease (AD). The elevated risk that T2D cases progress to AD has severe social consequences. Several genes have been detected via gene expression profiling or other techniques; however, the utility of many of these genes remains insufficiently understood. Methods This project is designed to uncover the shared genomic motifs between AD and T2D via non-negative matrix factorization (NMF) of differentially expressed genes (DEGs) from T2D gene expression data of human cortical neurons of the neurovascular unit. The factorization rank is calculated by combining the NMF model with the unit invariant knee (UIK) point method. The metagenes are further characterized using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and gene ontology (GO) enrichment tools. The most highly expressed genes of the metagenes are then subjected to protein-protein interaction (PPI) network analysis to discover the most significant biomarkers of T2D in the ageing brain. Results We screened the most important shared genes (CDKN1A, COL22A1, EIF4A, GFAP, SLC1A1, and VIM) and the essential human molecular pathways that drive these diseases. The study aimed to validate the most significant hub genes using network-based methods, which detected the corresponding relationship between AD and T2D. Conclusions Using in silico tools, the computational pipeline has broadly examined altered pathways and discovered promising biomarkers and drug targets. We validated the most significant hub genes using network-based methods, which detected the corresponding relationship between AD and T2D. These effects on brain cells potentially relate to diabetic Alzheimer's disease, so-called type 3 diabetes (T3D), and may offer promising approaches for therapeutic intervention.
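The abstract relies on NMF to extract metagenes but does not state which algorithm was used. The sketch below uses the classic multiplicative update rules for the squared reconstruction error as an assumed stand-in, written in plain Python for a tiny matrix. The rank is fixed by hand rather than chosen with the UIK method, and the toy matrix and all function names are hypothetical. Under the common convention for a genes x samples matrix V ≈ WH, the columns of W play the role of metagenes.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def reconstruction_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5

def nmf(v, rank, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x samples) as W @ H."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy expression matrix (hypothetical): 4 genes x 4 samples, two gene groups.
expression = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 2.0, 2.0],
]
w, h = nmf(expression, rank=2)
```

The multiplicative form keeps every entry of W and H non-negative without an explicit projection step, which is why it is a popular starting point; production analyses would normally use an established library implementation instead.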
|
18
|
Shu Z, Long Q, Zhang L, Yu Z, Wu XJ. Robust Graph Regularized NMF with Dissimilarity and Similarity Constraints for ScRNA-seq Data Clustering. J Chem Inf Model 2022; 62:6271-6286. [PMID: 36459053 DOI: 10.1021/acs.jcim.2c01305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
The notable progress in single-cell RNA sequencing (ScRNA-seq) technology helps to accurately discover the heterogeneity and diversity of cells. Clustering is an extremely important step in ScRNA-seq data analysis. However, directly clustering ScRNA-seq data cannot achieve satisfactory performance because of its high dimensionality and noise. To address these issues, we propose a novel ScRNA-seq data representation model, termed Robust Graph regularized Non-Negative Matrix Factorization with Dissimilarity and Similarity constraints (RGNMF-DS), for ScRNA-seq data clustering. To accurately characterize the structure information of the labeled samples and the unlabeled samples, respectively, the proposed RGNMF-DS model adopts a pair of complementary regularizers (i.e., similarity and dissimilarity regularizers) to guide matrix decomposition. In addition, we construct a graph regularizer to discover the local geometric structure hidden in ScRNA-seq data. Moreover, we adopt the $\ell_{2,1}$-norm to measure the reconstruction error and thereby effectively improve the robustness of the proposed RGNMF-DS model to noise. Experimental results on several ScRNA-seq datasets have demonstrated that our proposed RGNMF-DS model outperforms other state-of-the-art competitors in clustering.
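The robustness claim for the l2,1-norm can be made concrete with a small calculation. The sketch below assumes the convention in which the norm is the sum of the Euclidean norms of the columns of the residual matrix, with one column per cell; some papers sum over rows instead, and the abstract does not say which convention RGNMF-DS uses. The residual values and helper names are hypothetical.

```python
import math

def l21_norm(matrix):
    """Sum of the Euclidean norms of the columns (assumed: one column per cell)."""
    return sum(math.sqrt(sum(x * x for x in col)) for col in zip(*matrix))

def squared_frobenius_norm(matrix):
    """Sum of squared entries, the usual least-squares reconstruction error."""
    return sum(x * x for row in matrix for x in row)

# Toy residual matrix (hypothetical): columns (1, 0), (0, 1), and an outlier (6, 8).
residual = [
    [1.0, 0.0, 6.0],
    [0.0, 1.0, 8.0],
]
l21 = l21_norm(residual)                      # 1 + 1 + 10
frobenius_sq = squared_frobenius_norm(residual)  # 1 + 1 + 100
outlier_share_l21 = 10.0 / l21
outlier_share_frobenius = 100.0 / frobenius_sq
```

The outlier column accounts for about 83% of the l2,1 error but about 98% of the squared Frobenius error. Because column norms enter the l2,1-norm without being squared, a few noisy cells dominate the objective less, which is the usual motivation for using it as a robust loss.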
Affiliation(s)
- Zhenqiu Shu: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Qinghan Long: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Luping Zhang: Library of Kunming Medical University, Kunming 650031, China
- Zhengtao Yu: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China
- Xiao-Jun Wu: Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
|
19
|
Chen T, Pubu D, Zhang W, Meng S, Yu C, Yin X, Liu J, Zhang Y. Optimization of the extraction process and metabonomics analysis of uric acid-reducing active substances from Gymnadenia R.Br. and its protective effect on hyperuricemia zebrafish. Front Nutr 2022; 9:1054294. [PMID: 36545468 PMCID: PMC9760756 DOI: 10.3389/fnut.2022.1054294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 11/15/2022] [Indexed: 12/12/2022] [Open access]
Abstract
Background Gymnadenia R.Br. (Gym) has an obvious uric acid-lowering effect, but its specific bioactive substances and mechanism are still unclear. The key metabolites and pathways used by Gym to reduce uric acid (UA) were identified. Methods An optimized extraction process for urate-lowering active substances from Gym was first carried out based on the xanthine oxidase (XOD) inhibition model in vitro; then, non-targeted metabolomics analysis of Traditional Chinese Medicine based on ultra-high-performance liquid chromatography and Q-Exactive mass spectrometry (UHPLC-QE-MS) was performed to compare Gym extracts obtained with ethanol concentrations of 95% (low extraction rate but high XOD inhibition rate) and 75% (high extraction rate but low XOD inhibition rate), respectively; finally, the protective effect of the ethanolic extract of Gym on zebrafish with hyperuricemia (referred to as HUA zebrafish) was explored. Results We found that the inhibition rate of the Gym extract with 95% ethanol concentration on XOD was 84.02%, and the extraction rate was 4.32%. Interestingly, when the other conditions were the same, the XOD inhibition rate of the Gym extract with 75% ethanol concentration was 76.84%, and the extraction rate was 14.68%. A total of 539 metabolites were identified; among them, 162 differential metabolites were screened, of which 123 were up-regulated and 39 were down-regulated. In addition, Gym significantly reduced the contents of UA, BUN, CRE, ROS, and MDA and the activity of XOD in HUA zebrafish, and acutely reduced the activity of SOD. Conclusion The UA-lowering effect of the ethanolic extract of Gym may be related to its flavonoids, polyphenols, alkaloids, terpenoids, and phenylpropanoids.
|
20
|
Li P, Luo H, Ji B, Nielsen J. Machine learning for data integration in human gut microbiome. Microb Cell Fact 2022; 21:241. [PMID: 36419034 PMCID: PMC9685977 DOI: 10.1186/s12934-022-01973-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/05/2022] [Accepted: 11/15/2022] [Indexed: 11/25/2022] [Open access]
Abstract
Recent studies have demonstrated that gut microbiota plays critical roles in various human diseases. High-throughput technology has been widely applied to characterize the microbial ecosystems, which has led to an explosion of different types of molecular profiling data, such as metagenomics, metatranscriptomics and metabolomics. For analysis of such data, machine learning algorithms have been shown to be useful for identifying key molecular signatures, discovering potential patient stratifications, and particularly for generating models that can accurately predict phenotypes. In this review, we first discuss how dysbiosis of the intestinal microbiota is linked to human disease development and how potential modulation strategies of the gut microbial ecosystem can be used for disease treatment. In addition, we introduce categories and workflows of different machine learning approaches, and how they can be used to perform integrative analysis of multi-omics data. Finally, we review advances of machine learning in gut microbiome applications and discuss related challenges. Based on this, we conclude that machine learning is very well suited for analysis of the gut microbiome and that these approaches can be useful for development of gut microbe-targeted therapies, which ultimately can help in achieving personalized and precision medicine.
Affiliation(s)
- Peishun Li: Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Hao Luo: Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Boyang Ji: Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden; BioInnovation Institute, Ole Maaløes Vej 3, DK2200 Copenhagen, Denmark
- Jens Nielsen: Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden; BioInnovation Institute, Ole Maaløes Vej 3, DK2200 Copenhagen, Denmark
|
21
|
Single and Combined Associations of Plasma and Urine Essential Trace Elements (Zn, Cu, Se, and Mn) with Cardiovascular Risk Factors in a Mediterranean Population. Antioxidants (Basel) 2022; 11:1991. [PMID: 36290714 PMCID: PMC9598127 DOI: 10.3390/antiox11101991] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/05/2022] [Revised: 10/01/2022] [Accepted: 10/04/2022] [Indexed: 11/17/2022] [Open access]
Abstract
Trace elements are micronutrients that are required in very small quantities through diet but are crucial for the prevention of acute and chronic diseases. Despite the fact that initial studies demonstrated inverse associations between some of the most important essential trace elements (Zn, Cu, Se, and Mn) and cardiovascular disease, several recent studies have reported a direct association with cardiovascular risk factors due to the fact that these elements can act as both antioxidants and pro-oxidants, depending on several factors. This study aims to investigate the association between plasma and urine concentrations of trace elements and cardiovascular risk factors in a general population from the Mediterranean region, including 484 men and women aged 18−80 years and considering trace elements individually and as joint exposure. Zn, Cu, Se, and Mn were determined in plasma and urine using an inductively coupled plasma mass spectrometer (ICP-MS). Single and combined analysis of trace elements with plasma lipid, blood pressure, diabetes, and anthropometric variables was undertaken. Principal component analysis, quantile-based g-computation, and calculation of trace element risk scores (TERS) were used for the combined analyses. Models were adjusted for covariates. In single trace element models, we found statistically significant associations between plasma Se and increased total cholesterol and systolic blood pressure; plasma Cu and increased triglycerides and body mass index; and urine Zn and increased glucose. Moreover, in the joint exposure analysis using quantile g-computation and TERS, the combined plasma levels of Zn, Cu, Se (directly), and Mn (inversely) were strongly associated with hypercholesterolemia (OR: 2.03; 95%CI: 1.37−2.99; p < 0.001 per quartile increase in the g-computation approach). The analysis of urine mixtures revealed a significant relationship with both fasting glucose and diabetes (OR: 1.91; 95%CI: 1.01−3.04; p = 0.046). 
In conclusion, in this Mediterranean population, the combined effect of higher plasma trace element levels (primarily Se, Cu, and Zn) was directly associated with elevated plasma lipids, whereas the mixture effect in urine was primarily associated with plasma glucose. Both parameters are relevant cardiovascular risk factors, and increased trace element exposures should be considered with caution.
|
22
|
Gamage HN, Chetty M, Shatte A, Hallinan J. Filter feature selection based Boolean Modelling for Genetic Network Inference. Biosystems 2022; 221:104757. [PMID: 36007675 DOI: 10.1016/j.biosystems.2022.104757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 08/04/2022] [Accepted: 08/04/2022] [Indexed: 11/02/2022]
Abstract
The reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant for the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most approaches have low computational efficiency and are not able to cope with high-dimensional, low sample-number, gene expression data. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is here applied for the first time to GRN inference. Then, further gene selection from the filtered-gene list is done using a mutual information-based min-redundancy max-relevance criterion by eliminating irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building upon our previous research, a Pearson correlation coefficient-based Boolean modelling approach is utilized for the efficient identification of the optimal regulatory rules associated with selected regulatory genes. The proposed approach was evaluated using gene expression datasets from small-scale and medium-scale real gene networks, and was observed to be more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and obtained improved Structural Accuracy with a higher number of true positives than other state-of-the-art methods, while outperforming these methods with respect to Dynamic Accuracy and efficiency.
Affiliation(s)
- Madhu Chetty: Health Innovation and Transformation Centre, Federation University, Victoria, Australia
- Adrian Shatte: Health Innovation and Transformation Centre, Federation University, Victoria, Australia
|
23
|
Human Papillomavirus 16 E6 and E7 Oncoproteins Alter the Abundance of Proteins Associated with DNA Damage Response, Immune Signaling and Epidermal Differentiation. Viruses 2022; 14:1764. [PMID: 36016386 PMCID: PMC9415472 DOI: 10.3390/v14081764] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/19/2022] [Revised: 08/08/2022] [Accepted: 08/10/2022] [Indexed: 11/16/2022] [Open access]
Abstract
The high-risk human papillomaviruses are oncogenic viruses associated with almost all cases of cervical carcinoma and with increasing numbers of anal and oral cancers. Two oncogenic HPV proteins, E6 and E7, are capable of immortalizing keratinocytes and are required for HPV-associated cell transformation. Currently, the influence of these oncoproteins on the global regulation of the host proteome is not well defined. Liquid chromatography coupled with quantitative tandem mass spectrometry using isobaric-tagged peptides was used to investigate the effects of the HPV16 oncoproteins E6 and E7 on protein levels in human neonatal keratinocytes (HEKn). Pathway and gene ontology enrichment analyses revealed that the cells expressing the HPV oncoproteins have elevated levels of proteins related to interferon response, inflammation and DNA damage response, while the proteins related to cell organization and epithelial development are downregulated. This study identifies dysregulated pathways and potential biomarkers associated with HPV oncoproteins in primary keratinocytes, which may have therapeutic implications. Most notably, DNA damage response pathways, DNA replication, and interferon signaling pathways were affected in cells transduced with HPV16 E6 and E7 lentiviruses. Moreover, proteins associated with cell organization and differentiation were significantly downregulated in keratinocytes expressing HPV16 E6 + E7. High-risk HPV E6 and E7 oncoproteins are necessary for the HPV-associated transformation of keratinocytes. However, their influence on the global dysregulation of the keratinocyte proteome is not well documented. Here, shotgun proteomics using TMT-labeling detected over 2500 significantly dysregulated proteins associated with E6 and E7 expression. Networks of proteins related to interferon response, inflammation and DNA damage repair pathways were altered.
|
24
|
Min W, Wan X, Chang TH, Zhang S. A Novel Sparse Graph-Regularized Singular Value Decomposition Model and Its Application to Genomic Data Analysis. IEEE Trans Neural Netw Learn Syst 2022; 33:3842-3856. [PMID: 33556027 DOI: 10.1109/tnnls.2021.3054635] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Learning the gene coexpression pattern is a central challenge for high-dimensional gene expression analysis. Recently, sparse singular value decomposition (SVD) has been used to achieve this goal. However, this model ignores the structural information between variables (e.g., a gene network). The typical graph-regularized penalty can be used to incorporate such prior graph information to achieve more accurate discovery and better interpretability. However, the existing approach fails to consider the opposite effect of variables with negative correlations. In this article, we propose a novel sparse graph-regularized SVD model with absolute operator (AGSVD) for high-dimensional gene expression pattern discovery. The key to AGSVD is to impose a novel graph-regularized penalty (|u|^T L |u|). However, such a penalty is a nonconvex and nonsmooth function, so it brings new challenges to model solving. We show that the nonconvex problem can be efficiently handled in a convex fashion by adopting an alternating optimization strategy. The simulation results on synthetic data show that our method is more effective than the existing SVD-based ones. In addition, the results on several real gene expression data sets show that the proposed methods can discover more biologically interpretable expression patterns by incorporating the prior gene network.
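The effect of taking absolute values inside the graph penalty can be checked on a toy example. The sketch below assumes L is the unnormalized Laplacian (degree matrix minus weighted adjacency) of an undirected gene network, in which case the quadratic form equals the sum over edges of w_ij (x_i - x_j)^2; the three-gene network, the loading vector and the function name are hypothetical.

```python
def graph_penalty(u, edges, use_abs):
    """Quadratic form x^T L x with L = D - A, where x = |u| if use_abs else u.

    `edges` is a list of (i, j, weight) tuples for an undirected graph.
    """
    n = len(u)
    x = [abs(value) for value in u] if use_abs else list(u)
    laplacian = [[0.0] * n for _ in range(n)]
    for i, j, weight in edges:
        laplacian[i][i] += weight
        laplacian[j][j] += weight
        laplacian[i][j] -= weight
        laplacian[j][i] -= weight
    return sum(x[i] * laplacian[i][j] * x[j]
               for i in range(n) for j in range(n))


# Genes 0 and 1 are linked in the prior network but load with opposite signs
loadings = [0.5, -0.5, 0.0]
network = [(0, 1, 1.0)]

typical = graph_penalty(loadings, network, use_abs=False)   # (0.5 - (-0.5))^2
absolute = graph_penalty(loadings, network, use_abs=True)   # (0.5 - 0.5)^2
```

With the typical penalty, two connected but negatively correlated genes are penalized for having opposite-signed loadings; with the absolute operator the same pair incurs no penalty, which matches the motivation stated in the abstract.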
|
25
|
Virtual reality for the observation of oncology models (VROOM): immersive analytics for oncology patient cohorts. Sci Rep 2022; 12:11337. [PMID: 35790803 PMCID: PMC9256599 DOI: 10.1038/s41598-022-15548-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/16/2022] [Accepted: 06/24/2022] [Indexed: 11/08/2022] [Open access]
Abstract
The significant advancement of inexpensive and portable virtual reality (VR) and augmented reality devices has re-energised research in the immersive analytics field. The immersive environment is different from a traditional 2D display used to analyse 3D data, as it provides a unified environment that supports immersion in a 3D scene, gestural interaction, haptic feedback and spatial audio. Genomic data analysis has been used in oncology to better understand the relationship between genetic profile, cancer type, and treatment option. This paper proposes a novel immersive analytics tool for cancer patient cohorts in a virtual reality environment, virtual reality to observe oncology data models. We utilise immersive technologies to analyse the gene expression and clinical data of a cohort of cancer patients. Various machine learning algorithms and visualisation methods have also been deployed in VR to enhance the data interrogation process. This is supported by established 2D visual analytics and graphical methods from bioinformatics, such as scatter plots, descriptive statistical information, linear regression, box plots and heatmaps, which are integrated into our visualisation. Our approach allows clinicians to interrogate information that is familiar and meaningful to them while providing them with immersive analytics capabilities to make new discoveries toward personalised medicine.
|
26
|
Prediction of Lithium-Ion Battery Capacity by Functional Principal Component Analysis of Monitoring Data. Appl Sci (Basel) 2022. [DOI: 10.3390/app12094296] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Abstract
The lithium-ion (Li-ion) battery is a promising energy storage technology for electronics, automobiles, and smart grids. Extensive research has been conducted to improve the prediction of the remaining capacity of the Li-ion battery. A robust prediction model would improve battery performance and reliability in future use. In developing data-driven capacity prediction models for Li-ion batteries, most past studies employed capacity degradation data; very few tried using other performance monitoring variables, such as temperature, voltage, and current data, to estimate and predict the battery capacity. In this study, we aimed to develop a data-driven model for predicting the capacity of Li-ion batteries by applying functional principal component analysis (fPCA) to functional monitoring data of temperature, voltage, and current observations. The proposed method is demonstrated using the battery monitoring data available in the NASA Ames Prognostics Center of Excellence repository. The main contribution of the study is the development of an empirical data-driven model to diagnose the state-of-health (SOH) of Li-ion batteries based on health monitoring data, utilizing fPCA and LASSO regression. The study obtained encouraging battery capacity prediction performance by explaining overall variation through eigenfunctions of the available monitored discharge parameters of Li-ion batteries. The capacity prediction achieved a root mean square error (RMSE) of 0.009. The proposed data-driven approach performs well for predicting the capacity by employing functional performance measures over the life span of a Li-ion battery.
|
27
|
Exploratory Analysis of Associations Between Whole Blood Mitochondrial Gene Expression and Cancer-Related Fatigue Among Breast Cancer Survivors. Nurs Res 2022; 71:411-417. [PMID: 35416182 PMCID: PMC9420746 DOI: 10.1097/nnr.0000000000000598] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
BACKGROUND Cancer-related fatigue is a prevalent, debilitating, and persistent condition. Mitochondrial dysfunction is a putative contributor to cancer-related fatigue, but relationships between mitochondrial function and cancer-related fatigue are not well understood. OBJECTIVES We investigated the relationships between mitochondrial DNA (mtDNA) gene expression and cancer-related fatigue, as well as the effects of fish and soybean oil supplementation on these relationships. METHODS A secondary analysis was performed on data from a randomized controlled trial of breast cancer survivors 4-36 months posttreatment with moderate-severe cancer-related fatigue. Participants were randomized to take 6 g fish oil, 6 g soybean oil, or 3 g each daily for 6 weeks. At pre- and postintervention, participants completed the Functional Assessment of Chronic Illness Therapy-Fatigue questionnaire and provided whole blood for assessment of mtDNA gene expression. The expression of 12 protein-encoding genes was reduced to a single dimension using principal component analysis for use in regression analysis. Relationships between mtDNA expression and cancer-related fatigue were assessed using linear regression. RESULTS Among 68 participants, cancer-related fatigue improved and expression of all mtDNA genes decreased over 6 weeks with no effect of treatment group on either outcome. Participants with lower baseline mtDNA gene expression had greater improvements in cancer-related fatigue. No significant associations were observed between mtDNA gene expression and cancer-related fatigue at baseline or changes in mtDNA gene expression and changes in cancer-related fatigue. DISCUSSION Data from this exploratory study add to the growing literature that mitochondrial dysfunction may contribute to the etiology and pathophysiology of cancer-related fatigue.
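The dimension-reduction step in the Methods, collapsing several gene expression values into one score per participant, can be sketched as follows. This is only an illustration of first-principal-component scoring via power iteration on the sample covariance matrix; the function name, the two-gene toy data and the iteration count are assumptions, and the study itself used 12 genes with software that is not named here.

```python
import math


def first_pc_scores(rows, n_iter=100):
    """Return (direction, scores) for the first principal component.

    `rows` holds one list of expression values per participant. The sign of
    the direction (and hence of the scores) is arbitrary, as in any PCA.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration; assumes the start vector is not orthogonal to the
    # leading eigenvector of the covariance matrix.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical expression of two genes in three participants
expression = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_pc_scores(expression)
```

The resulting one-dimensional scores could then serve as the predictor in an ordinary linear regression against a fatigue measure, which is the kind of analysis the abstract describes.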
|
28
|
pH-degradable, bisphosphonate-loaded nanogels attenuate liver fibrosis by repolarization of M2-type macrophages. Proc Natl Acad Sci U S A 2022; 119:e2122310119. [PMID: 35290110 PMCID: PMC8944276 DOI: 10.1073/pnas.2122310119] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 12/18/2022] [Open access]
Abstract
Fibrosis is a consequence of most chronic liver diseases, but currently no approved antifibrotic treatment is available. M2-type macrophages drive fibrosis progression and prevent regression, even when effective causal therapies have been employed. M2-type macrophages activate a cascade of fibrogenic effector cells and can prevent removal of excess scar tissue. To switch these profibrogenic M2 to fibrolytic (regenerative) macrophages, we developed a pH-degradable, nanogel-based delivery system which can be covalently functionalized with the macrophage-repolarizing bisphosphonate alendronate. The nanogels efficiently deliver the clinically approved drug into hepatic nonparenchymal cells after intravenous administration. They do not eliminate macrophages but repolarize their phenotype and subsequently block fibrosis progression. This approach establishes a nanotherapeutic delivery platform to treat further M2-type macrophage-driven diseases, including cancer. Immune-suppressive (M2-type) macrophages can contribute to the progression of cancer and fibrosis. In chronic liver diseases, M2-type macrophages promote the replacement of functional parenchyma by collagen-rich scar tissue. Here, we aim to prevent liver fibrosis progression by repolarizing liver M2-type macrophages toward a nonfibrotic phenotype by applying a pH-degradable, squaric ester–based nanogel carrier system. This nanotechnology platform enables a selective conjugation of the highly water-soluble bisphosphonate alendronate, a macrophage-repolarizing agent that intrinsically targets bone tissue. The covalent delivery system, however, promotes the drug’s safe and efficient delivery to nonparenchymal cells of fibrotic livers after intravenous administration. 
The bisphosphonate payload does not eliminate but instead reprograms profibrotic M2- toward antifibrotic M1-type macrophages in vitro and potently prevents liver fibrosis progression in vivo, mainly via induction of a fibrolytic phenotype, as demonstrated by transcriptomic and proteomic analyses. Therefore, the alendronate-loaded squaric ester–based nanogels represent an attractive approach for nanotherapeutic interventions in fibrosis and other diseases driven by M2-type macrophages, including cancer.
|
29
|
Wang Y, Gu Y, Lou C, Gong Y, Wu Z, Li W, Tang Y, Liu G. A multitask GNN-based interpretable model for discovery of selective JAK inhibitors. J Cheminform 2022; 14:16. [PMID: 35292114 PMCID: PMC8922399 DOI: 10.1186/s13321-022-00593-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Accepted: 02/26/2022] [Indexed: 11/10/2022] [Open access]
Abstract
The Janus kinase (JAK) family plays a pivotal role in most cytokine-mediated inflammatory and autoimmune responses via JAK/STAT signaling, and administration of JAK inhibitors is a promising therapeutic strategy for several diseases including COVID-19. However, to screen and design selective JAK inhibitors is a daunting task due to the extremely high homology among four JAK isoforms. In this study, we aimed to simultaneously predict pIC50 values of compounds for all JAK subtypes by constructing an interpretable GNN multitask regression model. The final model performance was positive, with R2 values of 0.96, 0.79 and 0.78 on the training, validation and test sets, respectively. Meanwhile, we calculated and visualized atom weights, followed by the rank sum tests and local mean comparisons to obtain key atoms and substructures that could be fine-tuned to design selective JAK inhibitors. Several successful case studies have demonstrated that our approach is feasible and our model could learn the interactions between proteins and small molecules well, which could provide practitioners with a novel way to discover and design JAK inhibitors with selectivity.
Affiliation(s)
- Yimeng Wang: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yaxin Gu: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Chaofeng Lou: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yuning Gong: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Zengrui Wu: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Guixia Liu: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
|
30
|
Nguyen L, Nguyen Vo TH, Trinh QH, Nguyen BH, Nguyen-Hoang PU, Le L, Nguyen BP. iANP-EC: Identifying Anticancer Natural Products Using Ensemble Learning Incorporated with Evolutionary Computation. J Chem Inf Model 2022; 62:5080-5089. [PMID: 35157472 DOI: 10.1021/acs.jcim.1c00920] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/24/2022]
Abstract
Cancer is one of the deadliest diseases, killing millions of people worldwide every year. The search for anticancer medicines has never ceased to seek better and more adaptive agents with fewer side effects. Besides chemically synthesized anticancer compounds, natural products have been scientifically shown to be a highly promising alternative source for anticancer drug discovery. Along with experimental approaches used to find anticancer drug candidates, computational approaches have been developed to virtually screen for potential anticancer compounds. In this study, we construct an ensemble computational framework, called iANP-EC, using machine learning approaches incorporated with evolutionary computation. Four learning algorithms (k-NN, SVM, RF, and XGB) and four molecular representation schemes are used to build a set of classifiers, among which the four best-performing classifiers are selected to form an ensemble classifier. Particle swarm optimization (PSO) is used to optimise the weights used to combine the four top classifiers. The models are developed using a curated set of 997 compounds collected from the NPACT and CancerHSP databases. The results show that iANP-EC is a stable, robust, and effective framework that achieves an AUC-ROC value of 0.9193 and an AUC-PR value of 0.8366. The comparative analysis of molecular substructures between natural anticarcinogens and nonanticarcinogens partially unveils several key substructures that drive anticancer activity. We also deploy the proposed ensemble model as an online web server with a user-friendly interface to support the research community in identifying natural products with anticancer activities.
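How a set of weights combines four classifiers can be shown with a short sketch of weighted soft voting. The probabilities and weights below are made-up placeholders: in the study the weights are said to be found by PSO, which is not implemented here, and the exact combination rule used by iANP-EC is not given in the abstract, so weighted averaging of predicted probabilities is an assumption.

```python
def weighted_ensemble(probabilities, weights):
    """Weighted average of per-classifier predicted probabilities.

    `probabilities[c][i]` is classifier c's probability that compound i is
    anticancer; `weights[c]` is the (non-negative) weight of classifier c.
    """
    total = sum(weights)
    n_compounds = len(probabilities[0])
    return [sum(w * p[i] for w, p in zip(weights, probabilities)) / total
            for i in range(n_compounds)]


# Hypothetical outputs of four base classifiers for two compounds
base_probabilities = [[0.9, 0.2],
                      [0.7, 0.4],
                      [0.6, 0.1],
                      [0.8, 0.3]]
# Hypothetical weights, standing in for values an optimiser would return
ensemble_weights = [0.4, 0.3, 0.2, 0.1]

combined = weighted_ensemble(base_probabilities, ensemble_weights)
labels = [1 if p >= 0.5 else 0 for p in combined]
```

An optimiser such as PSO would repeatedly propose candidate weight vectors, score the resulting `combined` predictions on validation data, and keep the best-scoring vector.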
Affiliation(s)
- Loc Nguyen: Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Thanh-Hoang Nguyen Vo: School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
- Quang H Trinh: Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam; School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Bach Hoai Nguyen: School of Engineering and Computer Science, Victoria University of Wellington, Wellington 6140, New Zealand
- Phuong-Uyen Nguyen-Hoang: Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Ly Le: Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam; Vingroup Big Data Institute, Ha Noi 100000, Vietnam
- Binh P Nguyen: School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
|
31
|
Shahraki MF, Atanaki FF, Ariaeenejad S, Ghaffari MR, Norouzi‐Beirami MH, Maleki M, Salekdeh GH, Kavousi K. A computational learning paradigm to targeted discovery of biocatalysts from metagenomic data: a case study of lipase identification. Biotechnol Bioeng 2022; 119:1115-1128. [DOI: 10.1002/bit.28037] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/11/2021] [Revised: 08/18/2021] [Accepted: 12/01/2021] [Indexed: 11/09/2022]
Affiliation(s)
- Mehdi Foroozandeh Shahraki
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Fereshteh Fallah Atanaki
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Shohreh Ariaeenejad
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREEO), Karaj, Iran
| | - Mohammad Reza Ghaffari
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREEO), Karaj, Iran
| | - Mohammad Hossein Norouzi‐Beirami
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Department of Computer Engineering, Osku Branch, Islamic Azad University, Osku, Iran
| | - Morteza Maleki
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREEO), Karaj, Iran
| | - Ghasem Hosseini Salekdeh
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREEO), Karaj, Iran
- Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
| | - Kaveh Kavousi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
32
|
Ma C, Wu M, Ma S. Analysis of cancer omics data: a selective review of statistical techniques. Brief Bioinform 2022; 23:6510158. [PMID: 35039832 DOI: 10.1093/bib/bbab585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 12/19/2021] [Accepted: 12/20/2021] [Indexed: 11/13/2022]
Abstract
Cancer is an omics disease. The development in high-throughput profiling has fundamentally changed cancer research and clinical practice. Compared with clinical, demographic and environmental data, the analysis of omics data, which has higher dimensionality, weaker signals and more complex distributional properties, is much more challenging. Developments in the literature are often 'scattered', with individual studies focused on one or a few closely related methods. The goal of this review is to assist cancer researchers with limited statistical expertise in establishing the 'overall framework' of cancer omics data analysis. To facilitate understanding, we mainly focus on intuition, concepts and key steps, and refer readers to the original publications for mathematical details. This review broadly covers unsupervised and supervised analysis, as well as individual-gene-based, gene-set-based and gene-network-based analysis. We also briefly discuss 'special topics' including interaction analysis, multi-datasets analysis and multi-omics analysis.
Affiliation(s)
- Chenjin Ma
- College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
| | - Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
| | - Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
33
|
Xia Y, Zhang H, Wang H, Wang Q, Zhu P, Gu Y, Yang H, Geng D. Identification and validation of ferroptosis key genes in bone mesenchymal stromal cells of primary osteoporosis based on bioinformatics analysis. Front Endocrinol (Lausanne) 2022; 13:980867. [PMID: 36093072 PMCID: PMC9452779 DOI: 10.3389/fendo.2022.980867] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/29/2022] [Accepted: 08/12/2022] [Indexed: 11/30/2022]
Abstract
Primary osteoporosis has long been underdiagnosed and undertreated. Currently, ferroptosis may be a promising research direction in the prevention and treatment of primary osteoporosis. However, the specific mechanism of ferroptosis in primary osteoporosis remains a mystery. Differentially expressed genes (DEGs) were identified in bone mesenchymal stromal cells (BMSCs) of primary osteoporosis and healthy patients from the GEO databases with the help of bioinformatics analysis. Then, we intersected these DEGs with the ferroptosis dataset and obtained 80 Ferr-DEGs. Several bioinformatics algorithms (PCA, RLE, Limma, BC, MCC, etc.) were adopted to integrate the results. Additionally, we explored the potential functional roles of the Ferr-DEGs via GO and KEGG. Protein-protein interactions (PPI) were used to predict potential interactive networks. Finally, 80 Ferr-DEGs and 5 key Ferr-DEGs were calculated. The 5 key Ferr-DEGs were further verified in the OVX mouse model. In conclusion, through a variety of bioinformatics methods, our research successfully identified 5 key Ferr-DEGs associated with primary osteoporosis and ferroptosis, namely, sirtuin 1 (SIRT1), heat shock protein family A (Hsp70) member 5 (HSPA5), mechanistic target of rapamycin kinase (MTOR), hypoxia inducible factor 1 subunit alpha (HIF1A) and beclin 1 (BECN1), which were verified in an animal model.
Affiliation(s)
- Yu Xia
- Department of Orthopedics, The First Affiliated Hospital of Soochow University, Suzhou, China
| | - Haifeng Zhang
- Department of Orthopedics, The First Affiliated Hospital of Soochow University, Suzhou, China
| | - Heng Wang
- Department of Orthopedics, The First Affiliated Hospital of Soochow University, Suzhou, China
| | - Qiufei Wang
- Department of Orthopedics, Changshu Hospital Affiliated to Soochow University, First People’s Hospital of Changshu City, Changshu, China
| | - Pengfei Zhu
- Department of Orthopedics, The First Affiliated Hospital of Soochow University, Suzhou, China
| | - Ye Gu
- Department of Orthopedics, Changshu Hospital Affiliated to Soochow University, First People’s Hospital of Changshu City, Changshu, China
| | - Huilin Yang
- Department of Orthopedics, The First Affiliated Hospital of Soochow University, Suzhou, China
- *Correspondence: Huilin Yang; Dechun Geng
| | - Dechun Geng
- Department of Orthopedics, The First Affiliated Hospital of Soochow University, Suzhou, China
- *Correspondence: Huilin Yang; Dechun Geng
34
|
Liu Z, Chen Z, Song K. SpinSPJ: a novel NMR scripting system to implement artificial intelligence and advanced applications. BMC Bioinformatics 2021; 22:581. [PMID: 34875998 PMCID: PMC8650269 DOI: 10.1186/s12859-021-04492-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Accepted: 11/24/2021] [Indexed: 12/02/2022]
Abstract
Background: Software for nuclear magnetic resonance (NMR) spectrometers offers general functionality for instrument control and data processing; these applications are often developed with non-scripting languages. NMR users need to flexibly integrate rapidly developing NMR applications with emerging technologies. Scripting systems offer open environments for NMR users to write custom programs. However, existing scripting systems have limited capabilities for both extending the functionality of NMR software’s non-script main program and using advanced native script libraries to support specialized application domains (e.g., biomacromolecules and metabolomics). Therefore, it is essential to design a novel scripting system to address both of these needs. Result: Here, a novel NMR scripting system named SpinSPJ is proposed. It works as a plug-in in the Java based NMR spectrometer software SpinStudioJ. In the scripting system, both Java based NMR methods and original CPython based libraries are supported. A module has been developed as a bridge to integrate the runtime environments of Java and CPython. The module works as an extension in the CPython environment and interacts with Java via the Java Native Interface. Leveraging this bridge, Java based instrument control and data processing methods of SpinStudioJ can be called with the CPython style. Compared with traditional scripting systems, SpinSPJ better supports both extending the non-script main program and implementing advanced NMR applications with a rich variety of script libraries. NMR researchers can easily call functions of instrument control and data processing as well as develop complex functionality (such as multivariate statistical analysis, deep learning, etc.) with CPython native libraries. Conclusion: SpinSPJ offers a user-friendly environment to implement custom functionality leveraging its powerful basic NMR and rich CPython libraries. NMR applications with emerging technologies can be easily integrated. The scripting system is free of charge and can be downloaded by visiting http://www.spinstudioj.net/spinspj. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04492-y.
Affiliation(s)
- Zao Liu
- State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Wuhan Center for Magnetic Resonance, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, 430071, People's Republic of China.,Zhongke-Niujin MR Tech Co. Ltd, Wuhan, 430075, People's Republic of China
| | - Zhiwei Chen
- Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance Research, Xiamen University, Xiamen, 361005, People's Republic of China.
| | - Kan Song
- Zhongke-Niujin MR Tech Co. Ltd, Wuhan, 430075, People's Republic of China.
35
|
Nijs M, Smets T, Waelkens E, De Moor B. A mathematical comparison of non-negative matrix factorization related methods with practical implications for the analysis of mass spectrometry imaging data. Rapid Commun Mass Spectrom 2021; 35:e9181. [PMID: 34374141 PMCID: PMC9285509 DOI: 10.1002/rcm.9181] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/26/2021] [Revised: 08/06/2021] [Accepted: 08/07/2021] [Indexed: 05/25/2023]
Abstract
RATIONALE: Non-negative matrix factorization (NMF) has been used extensively for the analysis of mass spectrometry imaging (MSI) data, visualizing simultaneously the spatial and spectral distributions present in a slice of tissue. The statistical framework offers two related NMF methods: probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), which is a generative model. This work offers a mathematical comparison between NMF, PLSA, and LDA, and includes a detailed evaluation of Kullback-Leibler NMF (KL-NMF) for MSI for the first time. We will inspect the results for MSI data analysis as these different mathematical approaches impose different characteristics on the data and the resulting decomposition. METHODS: The four methods (NMF, KL-NMF, PLSA, and LDA) are compared on seven different samples: three originating from mouse pancreas and four from human lymph node tissues, all obtained using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). RESULTS: Although matrix factorization methods are often used for the analysis of MSI data, we find that each method has different implications for the exactness and interpretability of the results. We have obtained promising results using KL-NMF, which has only rarely been used for MSI so far, improving on both NMF and PLSA, and have shown that the KL-NMF and PLSA algorithms, hitherto stated to be equivalent, do differ in the case of MSI data analysis. LDA, assumed to be the better method in the field of text mining, is shown to be outperformed by PLSA in the setting of MALDI-MSI. Additionally, the molecular results of the human lymph node data have been thoroughly analyzed for better assessment of the methods under investigation. CONCLUSIONS: We present an in-depth comparison of multiple NMF-related factorization methods for MSI. We aim to provide fellow researchers in the field of MSI with a clear understanding of the mathematical implications of using each of these analytical techniques, which might affect the exactness and interpretation of the results.
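To make the KL-NMF objective mentioned in this abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard Lee-Seung multiplicative updates for the generalized Kullback-Leibler divergence between a non-negative matrix V and its factorization WH. The function names (`kl_divergence`, `kl_nmf`), the rank, the iteration count, and the synthetic toy matrix are all illustrative assumptions; no MSI data is involved.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) = sum(V*log(V/WH) - V + WH)."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def kl_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor V (n x m, non-negative) as W (n x rank) @ H (rank x m).

    Uses multiplicative updates (an assumed, textbook variant), which keep
    W and H non-negative as long as they start non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    history = []
    for _ in range(n_iter):
        # Update H: H <- H * (W^T (V / WH)) / (column sums of W)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W: W <- W * ((V / WH) H^T) / (row sums of H)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history


# Synthetic low-rank non-negative matrix standing in for a pixels x m/z table.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H, history = kl_nmf(V, rank=3)
print(f"KL divergence: first iteration {history[0]:.4f}, last {history[-1]:.6f}")
```

In an MSI setting the rows of V would typically be pixels and the columns m/z bins, so W holds spatial abundance maps and H holds pseudo-spectra; that mapping is a common convention and is stated here as an assumption rather than taken from the paper.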
Affiliation(s)
- Melanie Nijs
- STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium
| | - Tina Smets
- STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium
| | - Etienne Waelkens
- Department of Cellular and Molecular Medicine, KU Leuven, Campus Gasthuisberg O&N 2, Leuven, Belgium
| | - Bart De Moor
- STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium
36
|
Frost HR. Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA). J Comput Graph Stat 2021; 31:486-501. [PMID: 35693984 PMCID: PMC9187050 DOI: 10.1080/10618600.2021.1987254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/03/2020] [Revised: 07/13/2021] [Accepted: 09/22/2021] [Indexed: 01/03/2023]
Abstract
We present a novel technique for sparse principal component analysis. This method, named Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: a version that uses a fixed threshold for inducing sparsity and a version that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan & Zhang and Tan et al., the fixed threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by all evaluated approaches. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional data sets or application of statistical techniques like resampling that involve the repeated calculation of sparse PCs.
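The formula this abstract builds on can be checked numerically. For a symmetric (real Hermitian) n x n matrix A with distinct eigenvalues, and M_j the sub-matrix obtained by deleting row and column j, the squared j-th component of the i-th unit eigenvector satisfies |v_{i,j}|^2 * prod_{k != i}(lambda_i(A) - lambda_k(A)) = prod_{k}(lambda_i(A) - lambda_k(M_j)). The sketch below illustrates only this identity; it is not the EESPCA algorithm, which adds thresholding steps not shown here. The function name and the random test matrix are assumptions made for illustration.

```python
import numpy as np


def squared_loadings_from_eigenvalues(A, i):
    """Squared components of the i-th eigenvector of symmetric A.

    Uses only eigenvalues of A and of its principal sub-matrices.
    Eigenvalues are indexed in ascending order, as returned by eigvalsh.
    Assumes the eigenvalues of A are distinct (otherwise the denominator is 0).
    """
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)
    denom = np.prod([lam[i] - lam[k] for k in range(n) if k != i])
    out = np.empty(n)
    for j in range(n):
        # Principal sub-matrix: drop row j and column j.
        M = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(M)
        out[j] = np.prod(lam[i] - mu) / denom
    return out


# Sample covariance of random data as a stand-in symmetric matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
A = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(A)
top = A.shape[0] - 1                      # index of the largest eigenvalue
from_identity = squared_loadings_from_eigenvalues(A, top)
from_eigh = eigvecs[:, top] ** 2
print(np.allclose(from_identity, from_eigh))
```

Because the identity yields squared loadings directly, near-zero values can be identified without computing the eigenvector itself, which is the property a thresholding-based sparse PCA method can exploit.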
Affiliation(s)
- H Robert Frost
- Department of Biomedical Data Science, Dartmouth College
37
|
Chen L, Qing Y, Li R, Li C, Li H, Feng X, Li SC. Somatic variant analysis suite: copy number variation clonal visualization online platform for large-scale single-cell genomics. Brief Bioinform 2021; 23:6406714. [PMID: 34671807 DOI: 10.1093/bib/bbab452] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/25/2021] [Revised: 10/01/2021] [Accepted: 10/01/2021] [Indexed: 11/15/2022]
Abstract
The recent advance of single-cell copy number variation (CNV) analysis plays an essential role in addressing intratumor heterogeneity, identifying tumor subgroups and restoring tumor-evolving trajectories at single-cell scale. Informative visualization of copy number analysis results boosts productive scientific exploration, validation and sharing. Several single-cell analysis figures in published articles and software packages have proven effective for understanding single-cell genomics. However, most of them lack real-time interaction and are hard to reproduce. Moreover, existing tools are time-consuming and memory-intensive when they reach large-scale single-cell throughputs. We present an online visualization platform, single-cell Somatic Variant Analysis Suite (scSVAS), for real-time interactive single-cell genomics data visualization. scSVAS is specifically designed for large-scale single-cell genomic analysis and provides an arsenal of unique functionalities. After the specified input files are uploaded, scSVAS deploys the online interactive visualization automatically. Users may conduct scientific discoveries, share interactive visualizations and download high-quality publication-ready figures. scSVAS provides versatile utilities for managing, investigating, sharing and publishing single-cell CNV profiles. We envision this online platform will expedite the biological understanding of cancer clonal evolution at single-cell resolution. All visualizations are publicly hosted at https://sc.deepomics.org.
Affiliation(s)
- Lingxi Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong, China
| | - Yuhao Qing
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong, China
| | - Ruikang Li
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong, China
| | - Chaohui Li
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong, China
| | - Hechen Li
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong, China.,School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta GA 30332, USA
| | - Xikang Feng
- School of Software, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China
| | - Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong, China
38
|
Vilor-Tejedor N, Garrido-Martín D, Rodriguez-Fernandez B, Lamballais S, Guigó R, Gispert JD. Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO! Comput Struct Biotechnol J 2021; 19:5800-5810. [PMID: 34765095 PMCID: PMC8567328 DOI: 10.1016/j.csbj.2021.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2021] [Revised: 10/08/2021] [Accepted: 10/12/2021] [Indexed: 12/01/2022]
Abstract
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.
Affiliation(s)
- Natalia Vilor-Tejedor
- Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Diego Garrido-Martín
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
| | | | - Sander Lamballais
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Juan Domingo Gispert
- Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
- Centro de Investigación Biomédica en Red Bioingeniería, Biomateriales y Nanomedicina, Madrid, Spain
39
|
Dimension-reduction simplifies the analysis of signal crosstalk in a bacterial quorum sensing pathway. Sci Rep 2021; 11:19719. [PMID: 34611201 PMCID: PMC8492804 DOI: 10.1038/s41598-021-99169-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2021] [Accepted: 09/21/2021] [Indexed: 11/16/2022]
Abstract
Many pheromone sensing bacteria produce and detect more than one chemically distinct signal, or autoinducer. The pathways that detect these signals are typically noisy and interlocked through crosstalk and feedback. As a result, the sensing response of individual cells is described by statistical distributions that change under different combinations of signal inputs. Here we examine how signal crosstalk reshapes this response. We measure how combinations of two homoserine lactone (HSL) input signals alter the statistical distributions of individual cell responses in the AinS/R- and LuxI/R-controlled branches of the Vibrio fischeri bioluminescence pathway. We find that, while the distributions of pathway activation in individual cells vary in complex fashion with environmental conditions, these changes have a low-dimensional representation. For both the AinS/R and LuxI/R branches, the distribution of individual cell responses to mixtures of the two HSLs is effectively one-dimensional, so that a single tuning parameter can capture the full range of variability in the distributions. Combinations of crosstalking HSL signals extend the range of responses for each branch of the circuit, so that signals in combination allow population-wide distributions that are not available under a single HSL input. Dimension reduction also simplifies the problem of identifying the HSL conditions to which the pathways and their outputs are most sensitive. A comparison of the maximum sensitivity HSL conditions to actual HSL levels measured during culture growth indicates that the AinS/R and LuxI/R branches lack sensitivity to population density except during the very earliest and latest stages of growth respectively.
40
|
Wang M, Chen D, Zheng H, Zhao L, Xue X, Yu F, Zhang Y, Cheng C, Niu Q, Wang S, Zhang Y, Wu L. Sex-Specific Development in Haplodiploid Honeybee Is Controlled by the Female-Embryo-Specific Activation of Thousands of Intronic LncRNAs. Front Cell Dev Biol 2021; 9:690167. [PMID: 34422813 PMCID: PMC8377728 DOI: 10.3389/fcell.2021.690167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2021] [Accepted: 07/12/2021] [Indexed: 11/13/2022]
Abstract
Embryonic development depends on a highly coordinated shift in transcription programs known as the maternal-to-zygotic transition (MZT). It remains unclear how haploid and diploid embryos coordinate their genomic activation and embryonic development during MZT in haplodiploid animals. Here, we applied a single-embryo RNA-seq approach to characterize the embryonic transcriptome dynamics in haploid males vs. diploid females of the haplodiploid insect honeybee (Apis mellifera). We observed that typical zygotic genome activation (ZGA) occurred in three major waves specifically in female honeybee embryos; haploid genome activation was much weaker and occurred later. Strikingly, we also observed three waves of transcriptional activation for thousands of long non-coding transcripts (lncRNA), of which 73% are transcribed from intronic regions and 65% are specific to female honeybee embryos. These findings support a model in which introns encode thousands of lncRNAs that are expressed in a diploid-embryo-specific and ZGA-triggered manner and that may regulate gene expression during early embryonic development in the haplodiploid insect honeybee.
Affiliation(s)
- Miao Wang
- Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Dong Chen
- ABLife BioBigData Institute, Wuhan, China.,Laboratory for Genome Regulation and Human Health, ABLife Inc., Wuhan, China
| | - Huoqing Zheng
- College of Animal Science, Zhejiang University, Hangzhou, China
| | - Liuwei Zhao
- Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xiaofeng Xue
- Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Fengyun Yu
- Laboratory for Genome Regulation and Human Health, ABLife Inc., Wuhan, China
| | - Yu Zhang
- ABLife BioBigData Institute, Wuhan, China
| | - Chao Cheng
- ABLife BioBigData Institute, Wuhan, China
| | - Qingsheng Niu
- Department of Scientific Research, Jilin Province Institute of Apicultural Science, Jilin, China
| | - Shuai Wang
- College of Animal Science, Zhejiang University, Hangzhou, China
| | - Yi Zhang
- ABLife BioBigData Institute, Wuhan, China.,Laboratory for Genome Regulation and Human Health, ABLife Inc., Wuhan, China
| | - Liming Wu
- Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China
41
|
Schlieben LD, Prokisch H, Yépez VA. How Machine Learning and Statistical Models Advance Molecular Diagnostics of Rare Disorders Via Analysis of RNA Sequencing Data. Front Mol Biosci 2021; 8:647277. [PMID: 34141720 PMCID: PMC8204083 DOI: 10.3389/fmolb.2021.647277] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/29/2020] [Accepted: 05/10/2021] [Indexed: 12/11/2022]
Abstract
Rare diseases, although individually rare, collectively affect approximately 350 million people worldwide. Currently, nearly 6,000 distinct rare disorders with a known molecular basis have been described, yet establishing a specific diagnosis based on the clinical phenotype is challenging. Increasing integration of whole exome sequencing into routine diagnostics of rare diseases is improving diagnostic rates. Nevertheless, about half of the patients do not receive a genetic diagnosis due to the challenges of variant detection and interpretation. In recent years, RNA sequencing has been increasingly used as a complementary diagnostic tool providing functional data. Initially, arbitrary thresholds were applied to call aberrant expression, aberrant splicing, and mono-allelic expression. With the application of RNA sequencing to search for the molecular diagnosis, the implementation of robust statistical models on normalized read counts allowed for the detection of significant outliers corrected for multiple testing. More recently, machine learning methods have been developed to improve the normalization of RNA sequencing read count data by taking confounders into account. Together, these methods have increased the power and sensitivity of detection and interpretation of pathogenic variants, leading to diagnostic rates of 10-35% in rare diseases. In this review, we provide an overview of the methods used for RNA sequencing and illustrate how these can improve the diagnostic yield of rare diseases.
Affiliation(s)
- Lea D. Schlieben
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, Neuherberg, Germany
| | - Holger Prokisch
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, Neuherberg, Germany
| | - Vicente A. Yépez
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Department of Informatics, Technical University of Munich, Munich, Germany
| |
Collapse
|
42
|
Dong Z, Alterovitz G. netAE: semi-supervised dimensionality reduction of single-cell RNA sequencing to facilitate cell labeling. Bioinformatics 2021; 37:43-49. [PMID: 32726427 DOI: 10.1093/bioinformatics/btaa669] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/17/2019] [Revised: 06/07/2020] [Accepted: 07/17/2020] [Indexed: 01/03/2023]
Abstract
MOTIVATION: Single-cell RNA sequencing allows us to study cell heterogeneity at an unprecedented cell-level resolution and identify known and new cell populations. The current cell labeling pipeline uses unsupervised clustering and assigns labels to clusters by manual inspection. However, this pipeline does not utilize available gold-standard labels because there are usually too few of them to be useful to most computational methods. This article aims to facilitate cell labeling with a semi-supervised method in an alternative pipeline, in which a few gold-standard labels are first identified and then extended to the rest of the cells computationally. RESULTS: We built a semi-supervised dimensionality reduction method, a network-enhanced autoencoder (netAE). Tested on three public datasets, netAE outperforms various dimensionality reduction baselines and achieves satisfactory classification accuracy even when the labeled set is very small, without disrupting the similarity structure of the original space. AVAILABILITY AND IMPLEMENTATION: The code of netAE is available on GitHub: https://github.com/LeoZDong/netAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhengyang Dong
- Department of Computer Science, Stanford University, Stanford, CA 94305
| | - Gil Alterovitz
- Department of Medicine, Brigham and Women's Hospital/Harvard Medical School, Boston, MA 021153.,National Artificial Intelligence Institute, U.S Department of Veterans Affairs, Washington, DC 20571
| |
Collapse
|
43
|
Guo L, Hu Z, Zhao C, Xu X, Wang S, Xu J, Dong J, Cai Z. Data Filtering and Its Prioritization in Pipelines for Spatial Segmentation of Mass Spectrometry Imaging. Anal Chem 2021; 93:4788-4793. [PMID: 33683863 DOI: 10.1021/acs.analchem.0c05242] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/02/2023]
Abstract
Mass spectrometry imaging (MSI) could provide vast amounts of data at the temporal-spatial scale in heterogeneous biological specimens, which challenges us to segment accurately suborgans/microregions from complex MSI data. Several pipelines had been proposed for MSI spatial segmentation in the past decade. More importantly, data filtering was found to be an efficient procedure to improve the outcomes of MSI segmentation pipelines. It is not clear, however, how the filtering procedure affects the MSI segmentation. An improved pipeline was established by elaborating the filtering prioritization and filtering algorithm. Lipidomic-characteristic-based MSI data of a whole-body mouse fetus was used to evaluate the established pipeline on localization of the physiological position of suborgans by comparing with three commonly used pipelines and commercial SCiLS Lab software. Two structural measurements were used to quantify the performances of the pipelines including the percentage of abnormal edge pixel (PAEP) and CHAOS. Our results demonstrated that the established pipeline outperformed the other pipelines in visual inspection, spatial consistence, time-cost, and robustness analysis. For example, the dorsal pallium (isocortex) and hippocampal formation (Hpf) regions, midbrain, cerebellum, and brainstem on the mouse brain were annotated and located by the established pipeline. As a generic pipeline, the established pipeline could help with the accurate assessment and screening of drug/chemical-induced targeted organs and exploration of the progression and molecular mechanisms of diseases. The filter-based strategy is expected to become a critical component in the standard operating procedure of MSI data sets.
Affiliation(s)
- Lei Guo
- National Institute for Data Science in Health and Medicine, Department of Electronic Science, Xiamen University, Xiamen 361005, China
- Zhenxing Hu
- National Institute for Data Science in Health and Medicine, Department of Electronic Science, Xiamen University, Xiamen 361005, China
- Chao Zhao
- State Key Laboratory of Environmental and Biological Analysis, Department of Chemistry, Hong Kong Baptist University, Hong Kong SAR 999077, China.,Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xiangnan Xu
- School of Mathematics and Statistics, The University of Sydney, Camperdown Sydney, NSW 2006, Australia
- Shujuan Wang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing 102206, China
- Jingjing Xu
- National Institute for Data Science in Health and Medicine, Department of Electronic Science, Xiamen University, Xiamen 361005, China
- Jiyang Dong
- National Institute for Data Science in Health and Medicine, Department of Electronic Science, Xiamen University, Xiamen 361005, China
- Zongwei Cai
- State Key Laboratory of Environmental and Biological Analysis, Department of Chemistry, Hong Kong Baptist University, Hong Kong SAR 999077, China
44
VIRMOTIF: A User-Friendly Tool for Viral Sequence Analysis. Genes (Basel) 2021; 12:genes12020186. [PMID: 33514039 PMCID: PMC7911170 DOI: 10.3390/genes12020186] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/28/2020] [Revised: 01/10/2021] [Accepted: 01/19/2021] [Indexed: 12/16/2022] Open
Abstract
Bioinformatics and computational biology have significantly contributed to the generation of vast and important knowledge that can lead to great improvements and advancements in biology and its related fields. Over the past three decades, a wide range of tools and methods have been developed and proposed to enhance performance, diagnosis, and throughput while maintaining feasibility and convenience for users. Here, we propose a new user-friendly comprehensive tool called VIRMOTIF to analyze DNA sequences. VIRMOTIF brings different tools together as one package so that users can perform their analysis as a whole and in one place. VIRMOTIF is able to complete different tasks, including computing the number or probability of motifs appearing in DNA sequences, visualizing data using the matplotlib and heatmap libraries, and clustering data using four different methods, namely K-means, PCA, Mean Shift, and ClusterMap. VIRMOTIF is the only tool with the ability to analyze genomic motifs based on their frequency and representation (D-ratio) in a virus genome.
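The motif-counting task mentioned in this abstract can be illustrated with a minimal sketch. It is not VIRMOTIF's code: the function names `count_motifs` and `motif_frequency` are hypothetical, overlapping occurrences are assumed to count, and the D-ratio is not reproduced because the abstract does not define it.

```python
from collections import Counter


def count_motifs(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer (motif of length k) in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def motif_frequency(sequence: str, motif: str) -> float:
    """Fraction of all k-mer windows in the sequence that equal the motif."""
    counts = count_motifs(sequence, len(motif))
    total = sum(counts.values())
    return counts[motif.upper()] / total if total else 0.0


if __name__ == "__main__":
    toy = "ATGATGCATG"  # toy sequence, not real viral data
    print(count_motifs(toy, 3).most_common(2))
    print(motif_frequency(toy, "ATG"))
```

A per-sequence table of such frequencies is the kind of input that clustering methods like K-means would then operate on.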
45
Abstract
In recent biomedical studies, multidimensional profiling, which collects proteomics as well as other types of omics data on the same subjects, is getting increasingly popular. Proteomics, transcriptomics, genomics, epigenomics, and other types of data contain overlapping as well as independent information, which suggests the possibility of integrating multiple types of data to generate more reliable findings/models with better classification/prediction performance. In this chapter, a selective review is conducted on recent data integration techniques for both unsupervised and supervised analysis. The main objective is to provide the "big picture" of data integration that involves proteomics data and discuss the "intuition" beneath the recently developed approaches without invoking too many mathematical details. Potential pitfalls and possible directions for future developments are also discussed.
Affiliation(s)
- Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Yu Jiang
- School of Public Health, University of Memphis, Memphis, TN, USA
- Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, USA.
46
Wang X, Gao M, Ye J, Jiang Q, Yang Q, Zhang C, Wang S, Zhang J, Wang L, Wu J, Zhan H, Hou X, Han D, Zhao S. An Immune Gene-Related Five-lncRNA Signature for to Predict Glioma Prognosis. Front Genet 2020; 11:612037. [PMID: 33391355 PMCID: PMC7772413 DOI: 10.3389/fgene.2020.612037] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/30/2020] [Accepted: 11/09/2020] [Indexed: 12/19/2022] Open
Abstract
Background: The tumor immune microenvironment is closely related to the malignant progression and treatment resistance of glioma. Long non-coding RNA (lncRNA) plays a regulatory role in this process. We investigated the pathological mechanisms within the glioma microenvironment and potential immunotherapy resistance related to lncRNAs. Method: We downloaded datasets derived from glioma patients and analyzed them by hierarchical clustering. Next, we analyzed the immune microenvironment of glioma, related gene expression, and patient survival. Coexpressed lncRNAs were analyzed to generate a model of lncRNAs and immune-related genes. We analyzed the model using survival and Cox regression. Then, univariate, multivariate, receiver operating characteristic (ROC), and principal component analysis (PCA) methods were used to verify the accuracy of the model. Finally, GSEA was used to evaluate which functions and pathways were associated with the differential genes. Results: Normal brain tissue maintains a low-medium immune state, and gliomas are clearly divided into three groups (low to high immunity). The stromal, immune, and estimate scores increased along with immunity, while tumor purity decreased. Further, human leukocyte antigen (HLA), programmed death-ligand 1 (PDL1), T cell immunoglobulin and mucin domain 3 (TIM-3), B7-H3, and cytotoxic T lymphocyte-associated antigen-4 (CTLA4) expression increases concomitantly with immune state, and the patient prognosis worsens. Five immune gene-related lncRNAs (AP001007.1, LBX-AS1, MIR155HG, MAPT-AS1, and LINC00515) were screened to construct risk models. We found that risk scores are related to patient prognosis and clinical characteristics, and are positively correlated with PDL1, TIM-3, and B7-H3 expression. These lncRNAs may regulate the tumor immune microenvironment through cytokine-cytokine receptor interactions, complement, and coagulation cascades, and may promote CD8+ T cell, regulatory T cell, M1 macrophage, and infiltrating neutrophil activity in the high-immunity group. In vitro, the abnormal expression of immune-related lncRNAs and the relationship between risk scores and immune-related indicators (PDL1, CTLA4, CD3, CD8, iNOS) were verified by q-PCR and immunohistochemistry (IHC). Conclusion: For the first time, we constructed immune gene-related lncRNA risk models. The risk score may be a new biomarker for tumor immune subtypes and provide molecular targets for glioma immunotherapy.
Affiliation(s)
- Xinzhuang Wang
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Ming Gao
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Junyi Ye
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Qiuyi Jiang
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Quan Yang
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Cheng Zhang
- North Broward Preparatory School, Coconut Creek, FL, United States
- Shengtao Wang
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Jian Zhang
- Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Ligang Wang
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Jianing Wu
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Hua Zhan
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Xu Hou
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Dayong Han
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China
- Shiguang Zhao
- Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China.,Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, China.,Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, China.,Department of Neurosurgery, The Pinghu Hospital of Shenzhen University, Shenzhen, China
47
Li X, Liang B, Xu D, Wu C, Li J, Zheng Y. Antimicrobial Resistance Risk Assessment Models and Database System for Animal-Derived Pathogens. Antibiotics (Basel) 2020; 9:E829. [PMID: 33228076 PMCID: PMC7699434 DOI: 10.3390/antibiotics9110829] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/11/2020] [Revised: 11/01/2020] [Accepted: 11/17/2020] [Indexed: 01/06/2023] Open
Abstract
(1) Background: The high use of antibiotics has made the issue of antimicrobial resistance (AMR) increasingly serious, posing a substantial threat to the health of animals and humans. However, there remains a gap between China and the leading international level in AMR systems and risk assessment models. Therefore, this paper aims to provide advanced means for monitoring antibiotic use and AMR data, and takes piglets as an example to evaluate the risk and highlight the seriousness of AMR in China. (2) Methods: Based on the principal component analysis method, a drug resistance index model of anti-E. coli drugs was established to evaluate the antibiotic risk status in China. Additionally, based on second-order Monte Carlo methods, a disease risk assessment model for piglets was established to predict the probability of E. coli disease within 30 days of taking florfenicol. Finally, a browser/server architecture-based visualization database system for animal-derived pathogens was developed. (3) Results: The risk of E. coli in the main area was assessed, and Hohhot was the highest-risk area in China. Compared with the true disease risk probability of 4.1%, the result of the disease risk assessment model was 7.174%, an absolute error of 3.074%. (4) Conclusions: Taking E. coli as an example, this paper provides an innovative method for rapid and accurate risk assessment of drug resistance. Additionally, the established system and assessment models have potential value for monitoring and evaluating AMR, highlighting the seriousness of antimicrobial resistance, advocating the prudent use of antibiotics, and ensuring the safety of animal-derived foods and human health.
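The reported absolute error and the general shape of a second-order (two-dimensional) Monte Carlo simulation can be sketched as follows. This is only an illustration under stated assumptions: the Beta distribution for the uncertain disease probability, its parameters, and the function names are hypothetical and are not taken from the paper.

```python
import random


def absolute_error(predicted_pct: float, observed_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return abs(predicted_pct - observed_pct)


def second_order_monte_carlo(n_outer, n_inner, alpha, beta, seed=0):
    """Two nested loops: the outer loop samples parameter uncertainty,
    the inner loop samples animal-to-animal variability.

    alpha and beta are hypothetical Beta-distribution parameters that
    describe uncertainty about the true disease probability.
    """
    rng = random.Random(seed)
    simulated_rates = []
    for _ in range(n_outer):
        p = rng.betavariate(alpha, beta)  # uncertainty about the probability
        cases = sum(rng.random() < p for _ in range(n_inner))  # variability
        simulated_rates.append(cases / n_inner)
    simulated_rates.sort()
    mean = sum(simulated_rates) / n_outer
    lower = simulated_rates[int(0.025 * (n_outer - 1))]
    upper = simulated_rates[int(0.975 * (n_outer - 1))]
    return mean, lower, upper


if __name__ == "__main__":
    # Figures quoted in the abstract: model 7.174%, observed 4.1%.
    print(round(absolute_error(7.174, 4.1), 3))
    print(second_order_monte_carlo(200, 100, alpha=2, beta=30))
```

Separating the two loops lets the spread of the outer results be read as uncertainty about the risk estimate itself, instead of being mixed with variability between individual animals.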
Affiliation(s)
- Xinxing Li
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; (X.L.); (B.L.)
- Buwen Liang
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; (X.L.); (B.L.)
- Ding Xu
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Engineering, China Agricultural University, Beijing 100083, China; (D.X.); (J.L.)
- Congming Wu
- College of Veterinary Medicine, China Agricultural University, Beijing 100083, China;
- Jianping Li
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Engineering, China Agricultural University, Beijing 100083, China; (D.X.); (J.L.)
- Yongjun Zheng
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Engineering, China Agricultural University, Beijing 100083, China; (D.X.); (J.L.)
48
Cantini L, Kairov U, de Reyniès A, Barillot E, Radvanyi F, Zinovyev A. Assessing reproducibility of matrix factorization methods in independent transcriptomes. Bioinformatics 2020; 35:4307-4313. [PMID: 30938767 PMCID: PMC6821374 DOI: 10.1093/bioinformatics/btz225] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/14/2018] [Revised: 03/20/2019] [Accepted: 04/01/2019] [Indexed: 12/26/2022] Open
Abstract
Motivation: Matrix factorization (MF) methods are widely used to reduce the dimensionality of transcriptomic datasets to the action of a few hidden factors (metagenes). MF algorithms have never been compared based on the between-datasets reproducibility of their outputs in similar independent datasets. Lack of this knowledge might have a crucial impact when generalizing the predictions made in a study to others. Results: We systematically test widely used MF methods on several transcriptomic datasets collected from the same cancer type (14 colorectal, 8 breast and 4 ovarian cancer transcriptomic datasets). Inspired by concepts of evolutionary bioinformatics, we design a novel framework based on Reciprocally Best Hit (RBH) graphs in order to benchmark the MF methods for their ability to produce generalizable components. We show that a particular protocol of application of independent component analysis (ICA), accompanied by a stabilization procedure, leads to a significant increase in the between-datasets reproducibility. Moreover, we show that the signals detected through this method are systematically more interpretable than those of other standard methods. We developed a user-friendly tool for performing the Stabilized ICA-based RBH meta-analysis. We apply this methodology to the study of colorectal cancer (CRC), for which 14 independent transcriptomic datasets can be collected. The resulting RBH graph maps the landscape of interconnected factors associated with biological processes or with technological artifacts. These factors can be used as clinical biomarkers or as robust and tumor-type specific transcriptomic signatures of tumoral cells or the tumoral microenvironment. Their intensities in different samples shed light on the mechanistic basis of CRC molecular subtyping. Availability and implementation: The RBH construction tool is available from http://goo.gl/DzpwYp Supplementary information: Supplementary data are available at Bioinformatics online.
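The reciprocal-best-hit idea can be sketched for two sets of components (for example, metagenes from two datasets restricted to shared genes). This is a minimal illustration and not the authors' tool: the use of absolute Pearson correlation as the similarity measure and the function names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def reciprocal_best_hits(components_a, components_b):
    """Return index pairs (i, j) where component i of A and component j of B
    are each other's most similar component (assumed: absolute correlation)."""
    sim = [[abs(pearson(a, b)) for b in components_b] for a in components_a]
    n_a, n_b = len(components_a), len(components_b)
    best_for_a = [max(range(n_b), key=lambda j: sim[i][j]) for i in range(n_a)]
    best_for_b = [max(range(n_a), key=lambda i: sim[i][j]) for j in range(n_b)]
    return [(i, j) for i, j in enumerate(best_for_a) if best_for_b[j] == i]


if __name__ == "__main__":
    # Toy component weights over four shared genes (hypothetical values).
    set_a = [[1, 2, 3, 4], [1, -1, 1, -1]]
    set_b = [[2, -2, 2, -2], [4, 3, 2, 1], [1, 0, 0, 1]]
    print(reciprocal_best_hits(set_a, set_b))
```

Each returned pair would become an edge in an RBH graph; repeating the comparison over every pair of datasets yields the full graph.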
Affiliation(s)
- Laura Cantini
- Institut Curie, PSL Research University, F-75005 Paris, France.,INSERM U900, F-75005 Paris, France.,CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France.,Computational Systems Biology Team, Institut de Biologie de l'École Normale Supérieure, CNRS UMR8197, INSERM U1024, École Normale Supérieure, PSL Research University, Paris, France
- Ulykbek Kairov
- Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- Aurélien de Reyniès
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
- Emmanuel Barillot
- Institut Curie, PSL Research University, F-75005 Paris, France.,INSERM U900, F-75005 Paris, France.,CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France
- François Radvanyi
- Institut Curie, PSL Research University, CNRS, UMR144, Equipe Labellisée Ligue Contre le Cancer, Paris, France.,Sorbonne Universités, UPMC Université Paris 06, CNRS, UMR144, Paris
- Andrei Zinovyev
- Institut Curie, PSL Research University, F-75005 Paris, France.,INSERM U900, F-75005 Paris, France.,CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France.,Lobachevsky University, Nizhny Novgorod, Russia
49
Vinga S. Structured sparsity regularization for analyzing high-dimensional omics data. Brief Bioinform 2020; 22:77-87. [PMID: 32597465 DOI: 10.1093/bib/bbaa122] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/15/2019] [Revised: 05/15/2020] [Accepted: 05/18/2020] [Indexed: 12/18/2022] Open
Abstract
The development of new molecular and cell technologies is having a significant impact on the quantity of data generated nowadays. The growth of omics databases is creating a considerable potential for knowledge discovery and, concomitantly, is bringing new challenges to statistical learning and computational biology for health applications. Indeed, the high dimensionality of these data may hamper the use of traditional regression methods and parameter estimation algorithms due to the intrinsic non-identifiability of the inherent optimization problem. Regularized optimization has been rising as a promising and useful strategy to solve these ill-posed problems by imposing additional constraints in the solution parameter space. In particular, the field of statistical learning with sparsity has been significantly contributing to building accurate models that also bring interpretability to biological observations and phenomena. Beyond the now-classic elastic net, one of the best-known methods that combine lasso with ridge penalizations, we briefly overview recent literature on structured regularizers and penalty functions that have been applied in biomedical data to build parsimonious models in a variety of underlying contexts, from survival to generalized linear models. These methods include functions of $\ell_k$-norms and network-based penalties that take into account the inherent relationships between the features. The successful application to omics data illustrates the potential of sparse structured regularization for identifying disease's molecular signatures and for creating high-performance clinical decision support systems towards more personalized healthcare. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
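The elastic net mentioned here combines the lasso (ℓ1) and ridge (squared ℓ2) penalties. The sketch below assumes one common parametrization, lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2); the abstract does not state which form the review uses, and the function name is hypothetical.

```python
def elastic_net_penalty(coefficients, lam, alpha):
    """Elastic net penalty under an assumed parametrization.

    alpha = 1 gives the lasso penalty and alpha = 0 gives the ridge penalty;
    lam scales the overall strength of the regularization.
    """
    l1_norm = sum(abs(b) for b in coefficients)
    l2_norm_squared = sum(b * b for b in coefficients)
    return lam * (alpha * l1_norm + (1.0 - alpha) / 2.0 * l2_norm_squared)


if __name__ == "__main__":
    beta = [1.0, -2.0, 0.0]  # toy coefficient vector
    print(elastic_net_penalty(beta, lam=0.5, alpha=0.5))
    print(elastic_net_penalty(beta, lam=0.5, alpha=1.0))  # pure lasso
```

The ℓ1 term is what drives coefficients exactly to zero and so produces sparse models, while the ridge term stabilizes estimates when features are strongly correlated.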
Affiliation(s)
- Susana Vinga
- INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
50
Cieslak MC, Castelfranco AM, Roncalli V, Lenz PH, Hartline DK. t-Distributed Stochastic Neighbor Embedding (t-SNE): A tool for eco-physiological transcriptomic analysis. Mar Genomics 2020; 51:100723. [DOI: 10.1016/j.margen.2019.100723] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 09/05/2019] [Revised: 10/20/2019] [Accepted: 11/01/2019] [Indexed: 01/19/2023]