1
Kratz A, Ranganathan S. Christian Schönbach 1965-2023. Bioinformatics Advances 2023; 3:vbad147. [PMID: 37886713 PMCID: PMC10599964 DOI: 10.1093/bioadv/vbad147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 10/13/2023] [Indexed: 10/28/2023]
Affiliation(s)
- Anton Kratz
  - The Systems Biology Institute, Tokyo 141-0022, Japan
- Shoba Ranganathan
  - Applied Biosciences, Macquarie University, Sydney, NSW 2109, Australia
2
Del Casale A, Sarli G, Bargagna P, Polidori L, Alcibiade A, Zoppi T, Borro M, Gentile G, Zocchi C, Ferracuti S, Preissner R, Simmaco M, Pompili M. Machine Learning and Pharmacogenomics at the Time of Precision Psychiatry. Curr Neuropharmacol 2023; 21:2395-2408. [PMID: 37559539 PMCID: PMC10616924 DOI: 10.2174/1570159x21666230808170123] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/07/2022] [Revised: 12/01/2022] [Accepted: 12/06/2022] [Indexed: 08/11/2023] [Open Access]
Abstract
Traditional medicine and biomedical sciences are reaching a turning point because of the constantly growing impact and volume of Big Data. Machine Learning (ML) techniques and related algorithms play a central role as diagnostic, prognostic, and decision-making tools in this field. Another promising area becoming part of everyday clinical practice is personalized therapy and pharmacogenomics. Applying ML to pharmacogenomics opens new frontiers for tailored therapeutic strategies that help clinicians choose drugs with the best response and fewest side effects, working from genetic information combined with the clinical profile. This systematic review aims to describe the state of the art of ML applied to pharmacogenomics in psychiatry. Our search yielded fourteen papers; most were published in the last three years. The sample comprises 9,180 patients diagnosed with mood disorders, psychoses, or autism spectrum disorders. Prediction of drug response and prediction of side effects are the most frequently considered domains, addressed with supervised ML techniques, which require training followed by testing. The random forest is the most used algorithm; it comprises several decision trees, reduces overfitting to the training set, and makes precise predictions. ML proved effective and reliable, especially when genetic and biodemographic information were integrated into the algorithm. Even though ML and pharmacogenomics are not yet part of everyday clinical practice, they will gain a unique role in the near future in improving personalized treatments in psychiatry.
Affiliation(s)
- Antonio Del Casale
  - Department of Dynamic and Clinical Psychology and Health Studies, Faculty of Medicine and Psychology, Sapienza University; Unit of Psychiatry, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Giuseppe Sarli
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Psychiatry, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Paride Bargagna
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Psychiatry, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Lorenzo Polidori
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Psychiatry, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Alessandro Alcibiade
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Psychiatry, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Teodolinda Zoppi
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Psychiatry, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Marina Borro
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Laboratory and Advanced Molecular Diagnostics, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Giovanna Gentile
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Laboratory and Advanced Molecular Diagnostics, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Clarissa Zocchi
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Psychiatry, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Stefano Ferracuti
  - Department of Human Neuroscience, Faculty of Medicine and Dentistry, Sapienza University, Unit of Risk Management, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Robert Preissner
  - Institute of Physiology and Science-IT, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Philippstrasse 12, 10115, Berlin, Germany
- Maurizio Simmaco
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Laboratory and Advanced Molecular Diagnostics, ‘Sant’Andrea’ University Hospital, Rome, Italy
- Maurizio Pompili
  - Department of Neuroscience, Mental Health and Sensory Organs (NESMOS), Faculty of Medicine and Psychology, Sapienza University; Unit of Psychiatry, ‘Sant’Andrea’ University Hospital, Rome, Italy
3
Gan M, Ouyang Y. Study on Tourism Consumer Behavior Characteristics Based on Big Data Analysis. Front Psychol 2022; 13:876993. [PMID: 35586228 PMCID: PMC9108417 DOI: 10.3389/fpsyg.2022.876993] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/17/2022] [Accepted: 03/08/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
In scenic-area marketing, big data research plays an important role in the precise marketing of scenic spots. This paper takes big data related to scenic spots as its research object, explores the relationship between the various subdivisions of big data and the number of tourists in scenic spots, and investigates the differences in, and influences on, consumption behavior for secondary consumption items in the scenic area, in order to find the potential for the scenic area’s business growth and to promote continuous and stable growth of the scenic area’s sales and tourism economy. Drawing on relevant theories and analysis methods, such as consumer behavior, big data, and tourism consumer behavior, the content mainly covers the establishment of an analysis model of the number of tourists in the scenic spot, data collection, estimation of the model parameters, the various types of big data, calculation of the contribution rate of the data to the number of tourists in the scenic spot, and a difference analysis of the secondary consumption items of different types of tourists in the scenic spot.
The results are as follows. A multi-objective analysis model is established based on relevant econometric theories, and an optimization plan is proposed after a multicollinearity diagnosis of the model. A data envelopment analysis (DEA) model of the differences in, and influences on, different types of tourists’ consumption behavior in scenic spots is established to study the consumption behavior characteristics of different types of tourists when they purchase secondary consumption items. The econometric model is used to analyze the big data: the linear relationships of some variables are adjusted, variables are added step by step in line with consumer theory, and the final model takes the number of daily tourists as the explained variable and the number of internet protocol (IP) addresses, the Baidu index, a weekend dummy variable, other dummy variables, the bounce rate, and air pollution as explanatory variables.
4
D'Souza M, Sulakhe D, Wang S, Xie B, Hashemifar S, Taylor A, Dubchak I, Conrad Gilliam T, Maltsev N. Strategic Integration of Multiple Bioinformatics Resources for System Level Analysis of Biological Networks. Methods Mol Biol 2017; 1613:85-99. [PMID: 28849559 DOI: 10.1007/978-1-4939-7027-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Recent technological advances in genomics allow the production of biological data at unprecedented tera- and petabyte scales. Efficient mining of these vast and complex datasets for the needs of biomedical research critically depends on a seamless integration of the clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships. Such experimental data accumulated in publicly available databases should be accessible to a variety of algorithms and analytical pipelines that drive computational analysis and data mining. We present an integrated computational platform, Lynx (Sulakhe et al., Nucleic Acids Res 44:D882-D887, 2016) (http://lynx.cri.uchicago.edu), a web-based database and knowledge extraction engine. It provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization. It gives public access to the Lynx integrated knowledge base (LynxKB) and its analytical tools via user-friendly web services and interfaces. The Lynx service-oriented architecture supports annotation and analysis of high-throughput experimental data. Lynx tools assist the user in extracting meaningful knowledge from LynxKB and experimental data, and in the generation of weighted hypotheses regarding the genes and molecular mechanisms contributing to human phenotypes or conditions of interest. The goal of this integrated platform is to support the end-to-end analytical needs of various translational projects.
Affiliation(s)
- Mark D'Souza
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
  - Argonne National Laboratory, Building 221, Room: A142, 9700 South Cass Avenue, Argonne, IL, 60439, USA
- Dinanath Sulakhe
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
- Sheng Wang
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
  - Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
- Bing Xie
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
  - Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
- Somaye Hashemifar
  - Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
- Andrew Taylor
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Inna Dubchak
  - Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America, Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- T Conrad Gilliam
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
- Natalia Maltsev
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
5
Toward a Literature-Driven Definition of Big Data in Healthcare. BioMed Research International 2015; 2015:639021. [PMID: 26137488 PMCID: PMC4468280 DOI: 10.1155/2015/639021] [Citation(s) in RCA: 85] [Impact Index Per Article: 8.5] [Received: 11/13/2014] [Accepted: 02/04/2015] [Indexed: 11/17/2022]
Abstract
Objective. The aim of this study was to provide a definition of big data in healthcare. Methods. A systematic search of PubMed literature published until May 9, 2014, was conducted. We noted the number of statistical individuals (n) and the number of variables (p) for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed. Results. A total of 196 papers were included. Big data can be defined as datasets with Log(n∗p) ≥ 7. Properties of big data are its great variety and high velocity. Big data raises challenges on veracity, on all aspects of the workflow, on extracting meaningful information, and on sharing information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues. Conclusion. Big data is defined by volume. Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Inversely, data can be reused without being necessarily big, for example, secondary use of Electronic Medical Records (EMR) data.
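As a worked illustration of the volume criterion in this abstract, the following minimal sketch computes Log(n∗p) for a dataset and compares it with the threshold of 7. It assumes a base-10 logarithm, which the abstract does not state; the function names and the example figures are hypothetical and do not come from the paper.

```python
import math


def big_data_score(n: int, p: int) -> float:
    """Return log10(n * p) for n statistical individuals and p variables.

    Assumption: the logarithm is base 10; the abstract only writes Log(n*p).
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")
    # Sum of logs avoids forming a very large product.
    return math.log10(n) + math.log10(p)


def is_big_data(n: int, p: int, threshold: float = 7.0) -> bool:
    """Apply the volume criterion Log(n*p) >= threshold."""
    return big_data_score(n, p) >= threshold


# Hypothetical datasets, for illustration only.
print(is_big_data(500, 40))       # n*p = 2e4, below the threshold
print(is_big_data(200_000, 100))  # n*p = 2e7, above the threshold
```

Under this reading, a dataset reaches the threshold once the product of individuals and variables is at least ten million, whichever of the two factors is the larger one.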
6
7
Li SC, Tachiki LML, Kabeer MH, Dethlefs BA, Anthony MJ, Loudon WG. Cancer genomic research at the crossroads: realizing the changing genetic landscape as intratumoral spatial and temporal heterogeneity becomes a confounding factor. Cancer Cell Int 2014; 14:115. [PMID: 25411563 PMCID: PMC4236490 DOI: 10.1186/s12935-014-0115-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 07/04/2014] [Accepted: 10/24/2014] [Indexed: 02/06/2023] [Open Access]
Abstract
The US National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) created the Cancer Genome Atlas (TCGA) Project in 2006. The TCGA's goal was to sequence the genomes of 10,000 tumors to identify common genetic changes among different types of tumors for developing genetic-based treatments. TCGA offered great potential for cancer patients, but in reality has little impact on clinical applications. Recent reports place the past TCGA approach of testing a small tumor mass at a single time-point at a crossroads. This crossroads presents us with the conundrum of whether we should sequence more tumors or obtain multiple biopsies from each individual tumor at different time points. Sequencing more tumors with the past TCGA approach of single time-point sampling can neither capture the heterogeneity between different parts of the same tumor nor catch the heterogeneity that occurs as a function of time, error rates, and random drift. Obtaining multiple biopsies from each individual tumor presents multiple logistical and financial challenges. Here, we review current literature and rethink the utility and application of the TCGA approach. We discuss that the TCGA-led catalogue may provide insights into studying the functional significance of oncogenic genes in reference to non-cancer genetic background. Different methods to enhance identifying cancer targets, such as single cell technology, real time imaging of cancer cells with a biological global positioning system, and cross-referencing big data sets, are offered as ways to address sampling discrepancies in the face of tumor heterogeneity. We predict that TCGA landmarks may prove far more useful for cancer prevention than for cancer diagnosis and treatment when considering the effect of non-cancer genes and the normal genetic background on tumor microenvironment. 
Cancer prevention can be better realized once we understand how therapy affects the genetic makeup of cancer over time in a clinical setting. This may help create novel therapies for gene mutations that arise during a tumor's evolution from the selection pressure of treatment.
Affiliation(s)
- Shengwen Calvin Li
  - CHOC Children’s Hospital Research Institute, University of California Irvine, 1201 West La Veta Ave, Orange, CA 92868 USA
  - Department of Neurology, University of California Irvine School of Medicine, Irvine, CA 92697-4292 USA
  - Department of Biological Science, California State University, Fullerton, CA 92834 USA
- Lisa May Ling Tachiki
  - CHOC Children’s Hospital Research Institute, University of California Irvine, 1201 West La Veta Ave, Orange, CA 92868 USA
  - University of California Irvine School of Medicine, Irvine, CA 92697 USA
- Mustafa H Kabeer
  - CHOC Children’s Hospital Research Institute, University of California Irvine, 1201 West La Veta Ave, Orange, CA 92868 USA
  - Department of Pediatric Surgery, CHOC Children’s Hospital, 1201 West La Veta Ave, Orange, CA 92868 USA
  - Department of Surgery, University of California Irvine School of Medicine, 333 City Blvd. West, Suite 700, Orange, CA 92868 USA
- Brent A Dethlefs
  - CHOC Children’s Hospital Research Institute, University of California Irvine, 1201 West La Veta Ave, Orange, CA 92868 USA
- William G Loudon
  - CHOC Children’s Hospital Research Institute, University of California Irvine, 1201 West La Veta Ave, Orange, CA 92868 USA
  - Department of Neurological Surgery, Saint Joseph Hospital, Orange, CA 92868 USA
  - Department of Neurological Surgery, University of California Irvine School of Medicine, Orange, CA 92862 USA
  - Department of Biological Science, California State University, Fullerton, CA 92834 USA
8
Holzinger A, Dehmer M, Jurisica I. Knowledge Discovery and interactive Data Mining in Bioinformatics--State-of-the-Art, future challenges and research directions. BMC Bioinformatics 2014; 15 Suppl 6:I1. [PMID: 25078282 PMCID: PMC4140208 DOI: 10.1186/1471-2105-15-s6-i1] [Citation(s) in RCA: 134] [Impact Index Per Article: 12.2] [Indexed: 01/28/2023] [Open Access]
Affiliation(s)
- Andreas Holzinger
  - Research Unit Human-Computer Interaction, Austrian IBM Watson Think Group, Institute for Medical Informatics, Statistics & Documentation, Medical University Graz, Austria
  - Institute of Information Systems and Computer Media, Graz University of Technology, Austria
- Matthias Dehmer
  - Institute for Bioinformatics and Translational Research, UMIT Tyrol, Austria
- Igor Jurisica
  - Departments of Medical Biophysics and Computer Science, University of Toronto, Ontario, Canada
  - Princess Margaret Cancer Centre and Techna Institute for the Advancement of Technology for Health, University Health Network, IBM Life Sciences Discovery Centre, Ontario, Canada
9
Abstract
Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. This ever-increasing sheer volume has made it difficult for scientists to effectively and accurately access figures of their interest, the process of which is crucial for validating research facts and for formulating or testing novel research hypotheses. Current figure search applications can't fully meet this challenge as the “bag of figures” assumption doesn't take into account the relationship among figures. In our previous study, hundreds of biomedical researchers have annotated articles in which they serve as corresponding authors. They ranked each figure in their paper based on a figure's importance at their discretion, referred to as “figure ranking”. Using this collection of annotated data, we investigated computational approaches to automatically rank figures. We exploited and extended the state-of-the-art listwise learning-to-rank algorithms and developed a new supervised-learning model BioFigRank. The cross-validation results show that BioFigRank yielded the best performance compared with other state-of-the-art computational models, and the greedy feature selection can further boost the ranking performance significantly. Furthermore, we carry out the evaluation by comparing BioFigRank with three-level competitive domain-specific human experts: (1) First Author, (2) Non-Author-In-Domain-Expert who is not the author nor co-author of an article but who works in the same field of the corresponding author of the article, and (3) Non-Author-Out-Domain-Expert who is not the author nor co-author of an article and who may or may not work in the same field of the corresponding author of an article. Our results show that BioFigRank outperforms Non-Author-Out-Domain-Expert and performs as well as Non-Author-In-Domain-Expert. 
Although BioFigRank underperforms First Author, since most biomedical researchers are either in- or out-domain-experts for an article, we conclude that BioFigRank represents an artificial intelligence system that offers expert-level intelligence to help biomedical researchers to navigate increasingly proliferated big data efficiently.
Affiliation(s)
- Feifan Liu
  - Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America
- Hong Yu
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
  - VA Central Western Massachusetts, Northampton, Massachusetts, United States of America
10
High-Throughput Translational Medicine: Challenges and Solutions. Advances in Experimental Medicine and Biology 2014; 799:39-67. [DOI: 10.1007/978-1-4614-8778-4_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/04/2023]
11
Khan AM, Tan TW, Schönbach C, Ranganathan S. APBioNet-transforming bioinformatics in the Asia-Pacific region. PLoS Comput Biol 2013; 9:e1003317. [PMID: 24204244 PMCID: PMC3814852 DOI: 10.1371/journal.pcbi.1003317] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022] [Open Access]
Affiliation(s)
- Asif M. Khan
  - Perdana University Graduate School of Medicine, Selangor Darul Ehsan, Malaysia
  - Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Tin Wee Tan
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Christian Schönbach
  - School of Science and Technology, Department of Biology and Chemistry, Nazarbayev University, Astana, Republic of Kazakhstan
  - Department of Bioscience and Bioinformatics and Biomedical Informatics R&D Center (BMIRC), Kyushu Institute of Technology, Fukuoka, Japan
- Shoba Ranganathan
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney, Australia
12
Holzinger A, Zupan M. KNODWAT: a scientific framework application for testing knowledge discovery methods for the biomedical domain. BMC Bioinformatics 2013; 14:191. [PMID: 23763826 PMCID: PMC3691758 DOI: 10.1186/1471-2105-14-191] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 02/28/2013] [Accepted: 05/31/2013] [Indexed: 12/05/2022] [Open Access]
Abstract
Background: Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods to assist professional end users in the field of Knowledge Discovery to identify, extract, visualize and understand useful information from these huge amounts of data is a huge challenge. However, there are so many diverse methods and methodologies available that, for biomedical researchers who are inexperienced in the use of even relatively popular knowledge discovery methods, it can be very difficult to select the most appropriate method for their particular research problem. Results: A web application called KNODWAT (KNOwledge Discovery With Advanced Techniques) has been developed, using Java on Spring framework 3.1 and following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as the MySQL Server. For frontend functionality and styling, Twitter Bootstrap was used, as well as jQuery for interactive user interface operations. Conclusions: The framework presented is user-centric, highly extensible and flexible. Since it enables methods to be tested on existing data to assess suitability and performance, it is especially suitable for inexperienced biomedical researchers who are new to the field of knowledge discovery and data mining. For testing purposes, two algorithms, CART and C4.5, were implemented using the WEKA data mining framework.
Affiliation(s)
- Andreas Holzinger
  - Research Unit Human-Computer Interaction (HCI4MED), Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Auenbruggerplatz 2/V, Graz 8036, Austria
13
Ikeda S, Abe T, Nakamura Y, Kibinge N, Hirai Morita A, Nakatani A, Ono N, Ikemura T, Nakamura K, Altaf-Ul-Amin M, Kanaya S. Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK motorcycle database. Plant & Cell Physiology 2013; 54:711-727. [PMID: 23509110 DOI: 10.1093/pcp/pct041] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 06/01/2023]
Abstract
Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.
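The data matrix described in this abstract can be sketched as follows: each protein sequence segment becomes one row holding the relative frequency of every possible dipeptide (20 × 20 = 400 columns). This is a minimal illustration only, not the authors' pipeline. The use of overlapping pairs, the normalization to relative frequencies, the skipping of non-standard residues, and all names and example sequences are assumptions, and the BL-SOM training step itself is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed column order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_frequencies(segment: str) -> list[float]:
    """Return the 400 relative dipeptide frequencies of one sequence segment."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = segment.upper()
    # Slide over overlapping residue pairs (assumed counting scheme).
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard letters such as X
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def build_matrix(segments: list[str]) -> list[list[float]]:
    """Stack one 400-dimensional frequency vector per segment."""
    return [dipeptide_frequencies(s) for s in segments]


# Hypothetical segments, for illustration only.
matrix = build_matrix(["MKTAYIAKQR", "ACACAC"])
print(len(matrix), len(matrix[0]))
```

A matrix of this shape is what a self-organizing map implementation would take as input, with each row placed on the map according to its dipeptide composition.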
Affiliation(s)
- Shun Ikeda
  - Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0192 Japan
14
Afendi FM, Ono N, Nakamura Y, Nakamura K, Darusman LK, Kibinge N, Morita AH, Tanaka K, Horai H, Altaf-Ul-Amin M, Kanaya S. Data Mining Methods for Omics and Knowledge of Crude Medicinal Plants toward Big Data Biology. Comput Struct Biotechnol J 2013; 4:e201301010. [PMID: 24688691 PMCID: PMC3962233 DOI: 10.5936/csbj.201301010] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 12/08/2012] [Revised: 03/09/2013] [Accepted: 03/09/2013] [Indexed: 01/01/2023] [Open Access]
Abstract
Molecular biological data has rapidly increased with the recent progress of the Omics fields, e.g., genomics, transcriptomics, proteomics and metabolomics that necessitates the development of databases and methods for efficient storage, retrieval, integration and analysis of massive data. The present study reviews the usage of KNApSAcK Family DB in metabolomics and related area, discusses several statistical methods for handling multivariate data and shows their application on Indonesian blended herbal medicines (Jamu) as a case study. Exploration using Biplot reveals many plants are rarely utilized while some plants are highly utilized toward specific efficacy. Furthermore, the ingredients of Jamu formulas are modeled using Partial Least Squares Discriminant Analysis (PLS-DA) in order to predict their efficacy. The plants used in each Jamu medicine served as the predictors, whereas the efficacy of each Jamu provided the responses. This model produces 71.6% correct classification in predicting efficacy. Permutation test then is used to determine plants that serve as main ingredients in Jamu formula by evaluating the significance of the PLS-DA coefficients. Next, in order to explain the role of plants that serve as main ingredients in Jamu medicines, information of pharmacological activity of the plants is added to the predictor block. Then N-PLS-DA model, multiway version of PLS-DA, is utilized to handle the three-dimensional array of the predictor block. The resulting N-PLS-DA model reveals that the effects of some pharmacological activities are specific for certain efficacy and the other activities are diverse toward many efficacies. Mathematical modeling introduced in the present study can be utilized in global analysis of big data targeting to reveal the underlying biology.
Collapse
Affiliation(s)
- Farit M Afendi
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan ; Department of Statistics, Bogor Agricultural University, Jln. Meranti, Kampus IPB Darmaga, Bogor 16680, Indonesia
| | - Naoaki Ono
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan
| | - Yukiko Nakamura
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan
| | - Kensuke Nakamura
- Maebashi Institute of technology, 450-1 Kamisadori, Maebashi-shi, Gunma, 371-0816 Japan
| | - Latifah K Darusman
- Biopharmaca Research Center, Bogor Agricultural University, Kampas IPB Taman Kencana, Jln. Taman Kencana No. 3 Bogor 16151, Indonesia
| | - Nelson Kibinge
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan
| | - Aki Hirai Morita
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan
| | - Ken Tanaka
- Department of Medicinal Resources, Institute of Natural Medicine, University of Toyama, 2630 Toyama, 930-0194, Japan
| | - Hisayuki Horai
- Department of Electronic and Computer Engineering, Ibaraki National College of Technology, 866 Nakane, Hitachinaka, Ibaraki 312-8508, Japan
| | - Md Altaf-Ul-Amin
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan
| | - Shigehiko Kanaya
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan
| |
|
15
|
Tretyakov K, Goldberg T, Jin VX, Horton P. Summary of talks and papers at ISCB-Asia/SCCG 2012. BMC Genomics 2013. [PMCID: PMC3639071 DOI: 10.1186/1471-2164-14-s2-i1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
The second ISCB-Asia conference of the International Society for Computational Biology took place December 17-19, 2012, in Shenzhen, China. The conference was co-hosted by BGI as the first Shenzhen Conference on Computational Genomics (SCCG).
45 talks were presented at ISCB-Asia/SCCG 2012. The topics covered included software tools, reproducible computing, next-generation sequencing data analysis, transcription and mRNA regulation, protein structure and function, cancer genomics and personalized medicine. Nine of the proceedings track talks are included as full papers in this supplement.
In this report we first give a short overview of the conference by listing some statistics and visualizing the talk abstracts as word clouds. Then we group the talks by topic and briefly summarize each one, providing references to related publications whenever possible. Finally, we close with a few comments on the success of this conference.
|
16
|
Schönbach C, Tongsima S, Chan J, Brusic V, Tan TW, Ranganathan S. InCoB2012 Conference: from biological data to knowledge to technological breakthroughs. BMC Bioinformatics 2012; 13 Suppl 17:S1. [PMID: 23281929 PMCID: PMC3521245 DOI: 10.1186/1471-2105-13-s17-s1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022] Open
Abstract
Ten years ago, when the Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok, its theme was North-South Networking. At that time, InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region with a forum to meet, interact, and disseminate knowledge about the burgeoning field of bioinformatics. Since then, InCoB has evolved into a major regional bioinformatics conference that attracts talented and established scientists not only from the region but increasingly also from East Asia, North America and Europe. Since 2006, InCoB has yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. This editorial introduces a cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 in Suzhou, China.
Affiliation(s)
- Christian Schönbach
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka 820-8502, Japan
- Biomedical Informatics Research and Development Center, Kyushu Institute of Technology, Fukuoka 820-8502, Japan
| | - Sissades Tongsima
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Thailand Science Park, Pathumthani 12120, Thailand
| | - Jonathan Chan
- School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok 10140, Thailand
| | - Vladimir Brusic
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, MA 02115, USA
| | - Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Republic of Singapore
- Computational Resource Centre (A*CRC), A*STAR, Singapore 138632, Republic of Singapore
| | - Shoba Ranganathan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Republic of Singapore
- Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence, Macquarie University, Sydney, NSW 2109, Australia
| |
|
17
|
Schönbach C, Tan TW, Kelso J, Rost B, Nathan S, Ranganathan S. InCoB celebrates its tenth anniversary as first joint conference with ISCB-Asia. BMC Genomics 2011; 12 Suppl 3:S1. [PMID: 22369160 PMCID: PMC3333168 DOI: 10.1186/1471-2164-12-s3-s1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
In 2009, the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to the Asia-Pacific Bioinformatics Network (APBioNet), which has been running the International Conference on Bioinformatics (InCoB) in the Asia-Pacific region since 2002. InCoB/ISCB-Asia 2011 is held from November 30 to December 2, 2011 in Kuala Lumpur, Malaysia. Of 104 manuscripts submitted to the BMC Genomics and BMC Bioinformatics conference supplements, 49 (47.1%) were accepted. The strong showing of Asia among submissions (82.7%) and acceptances (81.6%) signals the success of this tenth InCoB anniversary meeting and bodes well for the future of ISCB-Asia.
Affiliation(s)
- Christian Schönbach
- Department of Bioscience and Bioinformatics, Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Fukuoka 820-8502, Japan.
|