1. Bibal A, Salem NM, Cardon R, White EK, Acuna DE, Burke R, Hunter LE. RecSOI: recommending research directions using statements of ignorance. J Biomed Semantics 2024; 15:2. [PMID: 38650032 PMCID: PMC11034121 DOI: 10.1186/s13326-024-00304-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 03/23/2024] [Indexed: 04/25/2024] [Open Access]
Abstract
The more science advances, the more questions are asked. This compounding growth can make it difficult to keep up with current research directions. Furthermore, this difficulty is exacerbated for junior researchers who enter fields with already large bases of potentially fruitful research avenues. In this paper, we propose a novel task and a recommender system for research directions, RecSOI, that draws from statements of ignorance (SOIs) found in the research literature. By building researchers' profiles based on textual elements, RecSOI generates personalized recommendations of potential research directions tailored to their interests. In addition, RecSOI provides context for the recommended SOIs, so that users can quickly evaluate how relevant the research direction is for them. In this paper, we provide an overview of RecSOI's functioning, implementation, and evaluation, demonstrating its effectiveness in guiding researchers through the vast landscape of potential research directions.
Affiliation(s)
- Adrien Bibal
  - University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA.
- Nourah M Salem
  - University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Rémi Cardon
  - University of Louvain, Louvain-la-Neuve, Belgium
- Elizabeth K White
  - University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Robin Burke
  - University of Colorado Boulder, Boulder, Colorado, USA
2. Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024; 11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024] [Open Access]
Abstract
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA.
- Ignacio J Tripodi
  - Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jonathan C Silverstein
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Charles Tapley Hoyt
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Scott A Malec
  - Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
  - Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Emanuele Cavalleri
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
- Marco Mesiti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Lucas A Gillenwater
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brook Santangelo
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tellen D Bennett
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael Bada
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- William A Baumgartner
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
3. Santangelo BE, Apgar M, Colorado ASB, Martin CG, Sterrett J, Wall E, Joachimiak MP, Hunter LE, Lozupone CA. Integrating biological knowledge for mechanistic inference in the host-associated microbiome. Front Microbiol 2024; 15:1351678. [PMID: 38638909 PMCID: PMC11024261 DOI: 10.3389/fmicb.2024.1351678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 04/20/2024] [Open Access]
Abstract
Advances in high-throughput technologies have enhanced our ability to describe microbial communities as they relate to human health and disease. Alongside the growth in sequencing data has come an influx of resources that synthesize knowledge surrounding microbial traits, functions, and metabolic potential with knowledge of how they may impact host pathways to influence disease phenotypes. These knowledge bases can enable the development of mechanistic explanations that may underlie correlations detected between microbial communities and disease. In this review, we survey existing resources and methodologies for the computational integration of broad classes of microbial and host knowledge. We evaluate these knowledge bases in their access methods, content, and source characteristics. We discuss challenges of the creation and utilization of knowledge bases including inconsistency of nomenclature assignment of taxa and metabolites across sources, whether the biological entities represented are rooted in ontologies or taxonomies, and how the structure and accessibility limit the diversity of applications and user types. We make this information available in a code and data repository at: https://github.com/lozuponelab/knowledge-source-mappings. Addressing these challenges will allow for the development of more effective tools for drawing from abundant knowledge to find new insights into microbial mechanisms in disease by fostering a systematic and unbiased exploration of existing information.
Affiliation(s)
- Brook E. Santangelo
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- Madison Apgar
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- Casey G. Martin
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- John Sterrett
  - Department of Integrative Physiology, University of Colorado, Boulder, CO, United States
- Elena Wall
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- Marcin P. Joachimiak
  - Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Biosystems Data Science Department, Berkeley, CA, United States
- Lawrence E. Hunter
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
- Catherine A. Lozupone
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, United States
4. Lee JS, Lowell JL, Whitewater K, Roane TM, Miller CS, Chan AP, Sylvester AW, Jackson D, Hunter LE. Monitoring environmental microbiomes: Alignment of microbiology and computational biology competencies within a culturally integrated curriculum and research framework. Mol Ecol Resour 2023. [PMID: 37702134 DOI: 10.1111/1755-0998.13867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 08/18/2023] [Accepted: 08/30/2023] [Indexed: 09/14/2023]
Abstract
We have developed a flexible undergraduate curriculum that leverages the place-based research of environmental microbiomes to increase the number of Indigenous researchers in microbiology, data science and scientific computing. Monitoring Environmental Microbiomes (MEM) provides a curriculum and research framework designed to integrate an Indigenous approach when conducting authentic scientific research and to build interest and confidence at the undergraduate level. MEM has been successfully implemented as a short summer workshop to introduce computing practices in microbiome analysis. Based on self-assessed student knowledge of topics and skills, increased scientific confidence and interest in genomics careers were observed. We propose MEM be incorporated in a scalable course-based research experience for undergraduate institutions, including tribal colleges and universities, community colleges and other minority serving institutions. This coupled curricular and research framework explicitly considers cultural perspectives, access and equity to train a diverse future workforce that is more informed to engage in microbiome research and to translate microbiome science to benefit community and environmental health.
Affiliation(s)
- J S Lee
  - Department of Chemistry and Biochemistry, Fort Lewis College, Durango, Colorado, USA
- J L Lowell
  - Department of Public Health, Fort Lewis College, Durango, Colorado, USA
- K Whitewater
  - Department of Chemistry and Biochemistry, Fort Lewis College, Durango, Colorado, USA
- T M Roane
  - Department of Integrative Biology, University of Colorado Denver, Denver, Colorado, USA
- C S Miller
  - Department of Integrative Biology, University of Colorado Denver, Denver, Colorado, USA
- A P Chan
  - J. Craig Venter Institute, Rockville, Maryland, USA
- A W Sylvester
  - Marine Biological Laboratory, Woods Hole, Massachusetts, USA
  - University of Wyoming, Laramie, Wyoming, USA
- D Jackson
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- L E Hunter
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
5. Gupta S, Westacott MJ, Ayers DG, Weiss SJ, Whitley P, Mueller C, Weaver DC, Schneider DJ, Karimpour-Fard A, Hunter LE, Drolet DW, Janjic N. Plasma proteome of growing tumors. Sci Rep 2023; 13:12195. [PMID: 37500700 PMCID: PMC10374562 DOI: 10.1038/s41598-023-38079-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 07/03/2023] [Indexed: 07/29/2023] [Open Access]
Abstract
Early detection of cancer is vital for the best chance of successful treatment, but half of all cancers are diagnosed at an advanced stage. A simple and reliable blood screening test applied routinely would therefore address a major unmet medical need. To gain insight into the value of protein biomarkers in early detection and stratification of cancer, we determined the time course of changes in the plasma proteome of mice carrying transplanted human lung, breast, colon, or ovarian tumors. For protein measurements we used an aptamer-based assay which simultaneously measures ~5000 proteins. Along with tumor lineage-specific biomarkers, we also found 15 markers shared among all cancer types that included the energy metabolism enzymes glyceraldehyde-3-phosphate dehydrogenase, glucose-6-phosphate isomerase and dihydrolipoyl dehydrogenase as well as several important biomarkers for maintaining protein, lipid, nucleotide, or carbohydrate balance such as tryptophanyl-tRNA synthetase and nucleoside diphosphate kinase. Using significantly altered proteins in the tumor-bearing mice, we developed models to stratify tumor types and to estimate the minimum detectable tumor volume. Finally, we identified significantly enriched common and unique biological pathways among the eight tumor cell lines tested.
Affiliation(s)
- Shashi Gupta
  - SomaLogic, Inc., 2945 Wilderness Place, Boulder, CO, 80301, USA
- Deborah G Ayers
  - SomaLogic, Inc., 2945 Wilderness Place, Boulder, CO, 80301, USA
- Sophie J Weiss
  - SomaLogic, Inc., 2945 Wilderness Place, Boulder, CO, 80301, USA
- Penn Whitley
  - Boulder BioConsulting, Inc., 325 S 68th St., Boulder, CO, 80303, USA
- Daniel C Weaver
  - Boulder BioConsulting, Inc., 325 S 68th St., Boulder, CO, 80303, USA
- Anis Karimpour-Fard
  - University of Colorado School of Medicine, Mailstop 8303, Aurora, CO, 80045, USA
- Lawrence E Hunter
  - University of Colorado School of Medicine, Mailstop 8303, Aurora, CO, 80045, USA
- Daniel W Drolet
  - SomaLogic, Inc., 2945 Wilderness Place, Boulder, CO, 80301, USA
- Nebojsa Janjic
  - SomaLogic, Inc., 2945 Wilderness Place, Boulder, CO, 80301, USA.
6. Boguslav MR, Salem NM, White EK, Sullivan KJ, Bada M, Hernandez TL, Leach SM, Hunter LE. Creating an ignorance-base: Exploring known unknowns in the scientific literature. J Biomed Inform 2023; 143:104405. [PMID: 37270143 PMCID: PMC10528083 DOI: 10.1016/j.jbi.2023.104405] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/08/2022] [Revised: 05/18/2023] [Accepted: 05/21/2023] [Indexed: 06/05/2023]
Abstract
BACKGROUND: Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition.
RESULTS: We present the first ignorance-base, a knowledge base created by combining classifiers to recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). The researchers could look to the field of neuroscience for potential answers to the ignorance statements.
CONCLUSION: Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of and focus on the known unknowns and their respective goals for scientific knowledge.
Affiliation(s)
- Mayla R Boguslav
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA.
- Nourah M Salem
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Elizabeth K White
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA; Center for Genes, Environment and Health, National Jewish Health, Jackson Street, Denver, 80206, CO, USA
- Katherine J Sullivan
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Michael Bada
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Teri L Hernandez
  - College of Nursing, Department of Medicine/Division of Endocrinology, Metabolism, & Diabetes, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
- Sonia M Leach
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA; Center for Genes, Environment and Health, National Jewish Health, Jackson Street, Denver, 80206, CO, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
7. Callahan TJ, Stefanski AL, Wyrwa JM, Zeng C, Ostropolets A, Banda JM, Baumgartner WA, Boyce RD, Casiraghi E, Coleman BD, Collins JH, Deakyne Davies SJ, Feinstein JA, Lin AY, Martin B, Matentzoglu NA, Meeker D, Reese J, Sinclair J, Taneja SB, Trinkley KE, Vasilevsky NA, Williams AE, Zhang XA, Denny JC, Ryan PB, Hripcsak G, Bennett TD, Haendel MA, Robinson PN, Hunter LE, Kahn MG. Ontologizing health systems data at scale: making translational discovery a reality. NPJ Digit Med 2023; 6:89. [PMID: 37208468 PMCID: PMC10196319 DOI: 10.1038/s41746-023-00830-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 04/28/2023] [Indexed: 05/21/2023] [Open Access]
Abstract
Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8,611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies, our algorithm presents new opportunities to advance EHR-based deep phenotyping.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA.
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Chenjie Zeng
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anna Ostropolets
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
- William A Baumgartner
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15260, USA
- Elena Casiraghi
  - Computer Science, Università degli Studi di Milano, Milan, Italy
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Ben D Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Janine H Collins
  - Department of Haematology, University of Cambridge, Cambridge, UK
- Sara J Deakyne Davies
  - Department of Research Informatics & Data Science, Analytics Resource Center, Children's Hospital Colorado, Aurora, CO, 80045, USA
- James A Feinstein
  - Adult and Child Center for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Asiyah Y Lin
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Blake Martin
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Katy E Trinkley
  - Department of Family Medicine, University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Translational and Integrative Sciences Lab, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Andrew E Williams
  - Tufts Institute for Clinical Research and Health Policy Studies, Tufts University, Boston, MA, 02155, USA
- Xingmin A Zhang
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Tellen D Bennett
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Melissa A Haendel
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
8. Callahan TJ, Stefanksi AL, Ostendorf DM, Wyrwa JM, Davies SJD, Hripcsak G, Hunter LE, Kahn MG. Characterizing Patient Representations for Computational Phenotyping. AMIA Annu Symp Proc 2023; 2022:319-328. [PMID: 37128436 PMCID: PMC10148332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Patient representation learning methods create rich representations of complex data and have the potential to further advance the development of computational phenotypes (CP). Currently, these methods are applied either to small predefined concept sets or to all available patient data, limiting the potential for novel discovery and reducing the explainability of the resulting representations. We report on an extensive, data-driven characterization of the utility of patient representation learning methods for the purpose of CP development or automatization. We conducted ablation studies to examine the impact on rare disease classification of patient representations built from different combinations of data types and sampling windows. We demonstrated that the data type and sampling window directly impact classification and clustering performance, and that these results differ by rare disease group. Our results, although preliminary, exemplify the importance of and need for data-driven characterization in patient representation-based CP development pipelines.
Affiliation(s)
- Tiffany J Callahan
  - Columbia University, New York, NY, 10032, USA
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jordan M Wyrwa
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Children's Hospital Colorado, Aurora, CO, 80045, USA
- Lawrence E Hunter
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Michael G Kahn
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
9. Callahan TJ, Stefanski AL, Kim JD, Baumgartner WA, Wyrwa JM, Hunter LE. Knowledge-Driven Mechanistic Enrichment of the Preeclampsia Ignorome. Pac Symp Biocomput 2023; 28:371-382. [PMID: 36540992 PMCID: PMC9782728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Preeclampsia is a leading cause of maternal and fetal morbidity and mortality. Currently, the only definitive treatment of preeclampsia is delivery of the placenta, which is central to the pathogenesis of the disease. Transcriptional profiling of human placenta from pregnancies complicated by preeclampsia has been extensively performed to identify differentially expressed genes (DEGs). The decisions to investigate DEGs experimentally are biased by many factors, causing many DEGs to remain uninvestigated. The set of DEGs that are associated with a disease experimentally but have no known association to the disease in the literature is known as the ignorome. Preeclampsia has an extensive body of scientific literature, a large pool of DEG data, and only one definitive treatment. Tools facilitating knowledge-based analyses, which are capable of combining disparate data from many sources in order to suggest underlying mechanisms of action, may be a valuable resource to support discovery and improve our understanding of this disease. In this work we demonstrate how a biomedical knowledge graph (KG) can be used to identify novel preeclampsia molecular mechanisms. Existing open source biomedical resources and publicly available high-throughput transcriptional profiling data were used to identify and annotate the function of currently uninvestigated preeclampsia-associated DEGs. Experimentally investigated genes associated with preeclampsia were identified from PubMed abstracts using text-mining methodologies. The relative complement of the text-mined and meta-analysis-derived lists was identified as the uninvestigated preeclampsia-associated DEGs (n=445), i.e., the preeclampsia ignorome. Using the KG to investigate relevant DEGs revealed 53 novel clinically relevant and biologically actionable mechanistic associations.
Affiliation(s)
- Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
|
10
|
Santangelo BE, Gillenwater LA, Salem NM, Hunter LE. Molecular cartooning with knowledge graphs. Front Bioinform 2022; 2:1054578. [PMID: 36568701 PMCID: PMC9772836 DOI: 10.3389/fbinf.2022.1054578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 11/23/2022] [Indexed: 12/13/2022] Open
Abstract
Molecular "cartoons," such as pathway diagrams, provide a visual summary of biomedical research results and hypotheses. Their ubiquitous appearance within the literature indicates their universal application in mechanistic communication. A recent survey of pathway diagrams identified 64,643 pathway figures published between 1995 and 2019 with 1,112,551 mentions of 13,464 unique human genes participating in a wide variety of biological processes. Researchers generally create these diagrams using generic diagram editing software that does not itself embody any biomedical knowledge. Biomedical knowledge graphs (KGs) integrate and represent knowledge in a semantically consistent way, systematically capturing biomedical knowledge similar to that in molecular cartoons. KGs have the potential to provide context and precise details useful in drawing such figures. However, KGs cannot generally be translated directly into figures. They include substantial material irrelevant to the scientific point of a given figure and are often more detailed than is appropriate. How could KGs be used to facilitate the creation of molecular diagrams? Here we present a new approach towards cartoon image creation that utilizes the semantic structure of knowledge graphs to aid the production of molecular diagrams. We introduce a set of "semantic graphical actions" that select and transform the relational information between heterogeneous entities (e.g., genes, proteins, pathways, diseases) in a KG to produce diagram schematics that meet the scientific communication needs of the user. These semantic actions search, select, filter, transform, group, arrange, connect and extract relevant subgraphs from KGs based on meaning in biological terms, e.g., a protein upstream of a target in a pathway. 
To demonstrate the utility of this approach, we show how semantic graphical actions on KGs could have been used to produce three existing pathway diagrams in diverse biomedical domains: Down Syndrome, COVID-19, and neuroinflammation. Our focus is on recapitulating the semantic content of the figures, not the layout, glyphs, or other aesthetic aspects. Our results suggest that the use of KGs and semantic graphical actions to produce biomedical diagrams will reduce the effort required and improve the quality of this visual form of scientific communication.
|
11
|
Nicholson DN, Rubinetti V, Hu D, Thielk M, Hunter LE, Greene CS. Examining linguistic shifts between preprints and publications. PLoS Biol 2022; 20:e3001470. [PMID: 35104289 PMCID: PMC8806061 DOI: 10.1371/journal.pbio.3001470] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/12/2021] [Accepted: 11/05/2021] [Indexed: 11/19/2022] Open
Abstract
Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare linguistic features of bioRxiv preprints with those of published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as observe where the preprint would be positioned within a published article landscape.
Affiliation(s)
- David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Dongbo Hu
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Marvin Thielk
- Elsevier, Philadelphia, Pennsylvania, United States of America
- Lawrence E. Hunter
- Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
|
12
|
Boguslav MR, Hailu ND, Bada M, Baumgartner WA, Hunter LE. Concept recognition as a machine translation problem. BMC Bioinformatics 2021; 22:598. [PMID: 34920707 PMCID: PMC8678974 DOI: 10.1186/s12859-021-04141-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Accepted: 04/19/2021] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data have impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models have the potential to outperform multi-class classification approaches.
METHODS We systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning through extensive studies of alternative methods and hyperparameter selections. We not only identify the best-performing systems and parameters across a wide variety of ontologies but also provide insights into the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggests promising avenues for future improvements as well as design choices that can increase computational efficiency with small costs in performance.
RESULTS Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) for span detection along with the open-source toolkit for neural machine translation (OpenNMT) for concept normalization achieve state-of-the-art performance for most ontologies annotated in the CRAFT Corpus. This approach uses substantially fewer computational resources, including hardware, memory, and time, than several alternative approaches.
CONCLUSIONS Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation.
Affiliation(s)
- Mayla R Boguslav
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, 12635 East Montview Blvd, Aurora, CO, 80045, USA
- Negacy D Hailu
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, 12635 East Montview Blvd, Aurora, CO, 80045, USA
- Michael Bada
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, 12635 East Montview Blvd, Aurora, CO, 80045, USA
- William A Baumgartner
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, 12635 East Montview Blvd, Aurora, CO, 80045, USA
- Lawrence E Hunter
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, 12635 East Montview Blvd, Aurora, CO, 80045, USA
|
13
|
Boguslav MR, Salem NM, White EK, Leach SM, Hunter LE. Identifying and classifying goals for scientific knowledge. Bioinform Adv 2021; 1:vbab012. [PMID: 34661112 PMCID: PMC8508177 DOI: 10.1093/bioadv/vbab012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 06/17/2021] [Indexed: 01/26/2023]
Abstract
MOTIVATION Science progresses by posing good questions, yet work in biomedical text mining has not focused on them much. We propose a novel idea for biomedical natural language processing: identifying and characterizing the questions stated in the biomedical literature. Formally, the task is to identify and characterize statements of ignorance, statements where scientific knowledge is missing or incomplete. The creation of such technology could have many significant impacts, from the training of PhD students to ranking publications and prioritizing funding based on particular questions of interest. The work presented here is intended as the first step towards these goals.
RESULTS We present a novel ignorance taxonomy driven by the role statements of ignorance play in research, identifying specific goals for future scientific knowledge. Using this taxonomy and reliable annotation guidelines (inter-annotator agreement above 80%), we created a gold standard ignorance corpus of 60 full-text documents from the prenatal nutrition literature with over 10 000 annotations and used it to train classifiers that achieved over 0.80 F1 scores.
AVAILABILITY AND IMPLEMENTATION Corpus and source code freely available for download at https://github.com/UCDenver-ccp/Ignorance-Question-Work. The source code is implemented in Python.
Affiliation(s)
- Mayla R Boguslav
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA. To whom correspondence should be addressed.
- Nourah M Salem
- Health Informatics Program, College of Health Solutions at Arizona State University, Phoenix, AZ 85004, USA
- Elizabeth K White
- Center for Genes, Environment and Health, National Jewish Health, Denver, CO 80206, USA
- Sonia M Leach
- Center for Genes, Environment and Health, National Jewish Health, Denver, CO 80206, USA
- Lawrence E Hunter
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
|
14
|
Sullivan KJ, Burden M, Keniston A, Banda JM, Hunter LE. Characterization of Anonymous Physician Perspectives on COVID-19 Using Social Media Data. Pac Symp Biocomput 2021; 26:95-106. [PMID: 33691008 PMCID: PMC7958992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Physicians' beliefs and attitudes about COVID-19 are important to ascertain because of their central role in providing care to patients during the pandemic. Identifying topics and sentiments discussed by physicians and other healthcare workers can lead to identification of gaps relating to the COVID-19 pandemic response within the healthcare system. To better understand physicians' perspectives on the COVID-19 response, we extracted Twitter data from a specific user group that allows physicians to stay anonymous while expressing their perspectives about the COVID-19 pandemic. All tweets were in English. We measured the most frequent bigrams and trigrams, compared sentiment analysis methods, and compared our findings to a larger Twitter dataset containing general COVID-19 related discourse. We found significant differences between the two datasets for specific topical phrases. No statistically significant difference was found in sentiments between the two datasets, and both trended slightly more positive than negative. Upon comparison to manual sentiment analysis, it was determined that these sentiment analysis methods should be improved to accurately capture sentiments of anonymous physician data. Anonymous physician social media data is a unique source of information that provides important insights into COVID-19 perspectives.
Affiliation(s)
- Katherine J Sullivan
- Data Science to Patient Value, University of Colorado School of Medicine, Aurora, CO 80045, USA. Corresponding author.
|
15
|
Abstract
Knowledge-based biomedical data science involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey recent progress in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as progress on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing to construct knowledge graphs, and the expansion of novel knowledge-based approaches to clinical and biological domains.
Affiliation(s)
- Tiffany J Callahan
- Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Ignacio J Tripodi
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
- Harrison Pielke-Lombardo
- Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Lawrence E Hunter
- Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
|
16
|
Tripodi IJ, Callahan TJ, Westfall JT, Meitzer NS, Dowell RD, Hunter LE. Applying knowledge-driven mechanistic inference to toxicogenomics. Toxicol In Vitro 2020; 66:104877. [PMID: 32387679 DOI: 10.1016/j.tiv.2020.104877] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/20/2020] [Revised: 04/13/2020] [Accepted: 04/23/2020] [Indexed: 02/07/2023]
Abstract
When considering toxic chemicals in the environment, a mechanistic, causal explanation of toxicity may be preferred over a statistical or machine learning-based prediction by itself. Elucidating a mechanism of toxicity is, however, a costly and time-consuming process that requires the participation of specialists from a variety of fields, often relying on animal models. We present an innovative mechanistic inference framework (MechSpy), which can be used as a hypothesis generation aid to narrow the scope of mechanistic toxicology analysis. MechSpy generates hypotheses of the most likely mechanisms of toxicity, by combining a semantically-interconnected knowledge representation of human biology, toxicology and biochemistry with gene expression time series on human tissue. Using vector representations of biological entities, MechSpy seeks enrichment in a manually curated list of high-level mechanisms of toxicity, represented as biochemically- and causally-linked ontology concepts. Besides predicting the canonical mechanism of toxicity for many well-studied compounds, we experimentally validated some of our predictions for other chemicals without an established mechanism of toxicity. This mechanistic inference framework is an advantageous tool for predictive toxicology, and the first of its kind to produce a mechanistic explanation for each prediction. MechSpy can be modified to include additional mechanisms of toxicity, and is generalizable to other types of mechanisms of human biology.
Affiliation(s)
- Ignacio J Tripodi
- University of Colorado, Computer Science / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA
- Tiffany J Callahan
- University of Colorado Anschutz Medical Campus, Computational Bioscience, Denver, CO 80045, USA
- Jessica T Westfall
- University of Colorado, Molecular, Cellular and Developmental Biology, Boulder, CO 80309, USA
- Robin D Dowell
- University of Colorado, Molecular, Cellular and Developmental Biology / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA
- Lawrence E Hunter
- University of Colorado Anschutz Medical Campus, Computational Bioscience / Interdisciplinary Quantitative Biology, Denver, CO 80045, USA
|
17
|
Abstract
"P-hacking" is the repeated analysis of data until a statistically significant result is achieved. We show that p-hacking can also occur during data generation, sometimes unintentionally. We use the type-token ratio to demonstrate that differences in the definitions of "type" and "token" can produce significantly different results. Since these terms are rarely defined in the biomedical literature, the result is an inability to meaningfully interpret the body of literature that makes use of this measure.
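The sensitivity described in this abstract can be illustrated with a small sketch that computes the type-token ratio under different, equally defensible definitions of "token" and "type". The function name, its two switches, and the sample sentence are assumptions made for this illustration; they are not the authors' experimental setup.

```python
import re


def type_token_ratio(text, lowercase, keep_punctuation):
    """Type-token ratio under one particular definition of 'token' and 'type'."""
    # Definition of "token": words only, or words plus punctuation marks.
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    tokens = re.findall(pattern, text)
    # Definition of "type": case-sensitive or case-insensitive word forms.
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "The cat saw the dog. The dog saw the cat."

for lowercase in (False, True):
    for keep_punctuation in (False, True):
        ratio = type_token_ratio(sample, lowercase, keep_punctuation)
        print(f"lowercase={lowercase}, keep_punctuation={keep_punctuation}: {ratio:.3f}")
```

On this one sentence the ratio ranges from 0.400 to 0.500 depending only on those two choices, which is why a reported ratio is hard to interpret without the definitions behind it.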
Affiliation(s)
- K. Bretonnel Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA
- Lawrence E. Hunter
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA
- Peter S. Pressman
- Department of Neurology, University of Colorado Hospitals, Anschutz Medical Campus, Aurora, Colorado, USA
|
18
|
Pressman PS, Ross ED, Cohen KB, Chen K, Miller BL, Hunter LE, Gorno‐Tempini ML, Levenson RW. Interpersonal prosodic correlation in frontotemporal dementia. Ann Clin Transl Neurol 2019; 6:1352-1357. [PMID: 31353851 PMCID: PMC6649473 DOI: 10.1002/acn3.50816] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/05/2019] [Revised: 05/08/2019] [Accepted: 05/23/2019] [Indexed: 11/06/2022] Open
Abstract
Communication accommodation describes how individuals adjust their communicative style to that of their conversational partner. We predicted that interpersonal prosodic correlation related to pitch and timing would be decreased in behavioral variant frontotemporal dementia (bvFTD). We predicted that the interpersonal correlation in a timing measure and a pitch measure would be increased in right temporal FTD (rtFTD) due to sparing of the neural substrate for speech timing and pitch modulation but loss of social semantics. We found no significant effects in bvFTD, but conversations including rtFTD demonstrated higher interpersonal correlations in speech rate than healthy controls.
Affiliation(s)
- Peter S. Pressman
- Department of Neurology, Section of Behavioral Neurology and Neuropsychiatry, University of Colorado Denver Anschutz Medical Campus, Academic Office Building 1, Mail Stop #B185, 12631 East 17th Avenue, Aurora, Colorado 80045
- Elliott D. Ross
- Department of Neurology, Section of Behavioral Neurology and Neuropsychiatry, University of Colorado Denver Anschutz Medical Campus, Academic Office Building 1, Mail Stop #B185, 12631 East 17th Avenue, Aurora, Colorado 80045
- Kevin B. Cohen
- Department of Neurology, Section of Behavioral Neurology and Neuropsychiatry, University of Colorado Denver Anschutz Medical Campus, Academic Office Building 1, Mail Stop #B185, 12631 East 17th Avenue, Aurora, Colorado 80045
- Kuan‐Hua Chen
- Berkeley Psychophysiology Lab, University of California, Berkeley, 4143 Tolman Hall, MC 5050, Berkeley, California 94720‐5050
- Bruce L. Miller
- Memory and Aging Center, University of California, 675 Nelson Rising Ln, San Francisco, California 94158
- Lawrence E. Hunter
- Department of Neurology, Section of Behavioral Neurology and Neuropsychiatry, University of Colorado Denver Anschutz Medical Campus, Academic Office Building 1, Mail Stop #B185, 12631 East 17th Avenue, Aurora, Colorado 80045
- Robert W. Levenson
- Berkeley Psychophysiology Lab, University of California, Berkeley, 4143 Tolman Hall, MC 5050, Berkeley, California 94720‐5050
|
19
|
Zhang XA, Yates A, Vasilevsky N, Gourdine JP, Callahan TJ, Carmody LC, Danis D, Joachimiak MP, Ravanmehr V, Pfaff ER, Champion J, Robasky K, Xu H, Fecho K, Walton NA, Zhu RL, Ramsdill J, Mungall CJ, Köhler S, Haendel MA, McDonald CJ, Vreeman DJ, Peden DB, Bennett TD, Feinstein JA, Martin B, Stefanski AL, Hunter LE, Chute CG, Robinson PN. Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery. NPJ Digit Med 2019; 2:32. [PMID: 31119199 PMCID: PMC6527418 DOI: 10.1038/s41746-019-0110-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/26/2018] [Accepted: 04/18/2019] [Indexed: 12/22/2022] Open
Abstract
Electronic Health Record (EHR) systems typically define laboratory test results using the Laboratory Observation Identifier Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resource (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHR to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
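The conversion step described in this abstract (a LOINC-coded result is interpreted, then mapped to an HPO term) can be sketched as a lookup keyed on the test code and the interpretation. This is a simplified illustration, not the authors' software: the `LabResult` class, the `ANNOTATIONS` table, the reference range, and the choice to return `None` for in-range results are all assumptions made here. The glucose entries are illustrative only and should be checked against current LOINC and HPO releases.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative annotation table: (LOINC code, interpretation) -> HPO term ID.
# Assumed entries for this sketch; not taken from the paper's annotation set.
ANNOTATIONS = {
    ("2345-7", "H"): "HP:0003074",  # serum/plasma glucose above range: hyperglycemia
    ("2345-7", "L"): "HP:0001943",  # serum/plasma glucose below range: hypoglycemia
}


@dataclass
class LabResult:
    loinc: str       # LOINC code of the test
    value: float     # measured value
    ref_low: float   # lower bound of the reference range
    ref_high: float  # upper bound of the reference range


def interpret(result: LabResult) -> str:
    """Classify a quantitative result as low (L), high (H), or normal (N)."""
    if result.value < result.ref_low:
        return "L"
    if result.value > result.ref_high:
        return "H"
    return "N"


def to_hpo(result: LabResult) -> Optional[str]:
    """Return the annotated HPO term for this result, or None if no entry exists."""
    return ANNOTATIONS.get((result.loinc, interpret(result)))


# Hypothetical reference range of 70 to 99, used only for this example.
print(to_hpo(LabResult("2345-7", 250.0, 70.0, 99.0)))
print(to_hpo(LabResult("2345-7", 85.0, 70.0, 99.0)))
```

Because results are reduced to ontology terms, tests with different LOINC codes but the same medical interpretation end up under the same HPO term, which is the integration the abstract describes.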
Affiliation(s)
- Amy Yates
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Nicole Vasilevsky
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
- J. P. Gourdine
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Library, Oregon Health and Science University, Portland, OR 97239 USA
- Tiffany J. Callahan
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Leigh C. Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
- Marcin P. Joachimiak
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Vida Ravanmehr
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
- Emily R. Pfaff
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- James Champion
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Kimberly Robasky
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Genetics Department, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Hao Xu
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Karamarie Fecho
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Nephi A. Walton
- Genomic Medicine Institute, Geisinger Health System, Danville, PA 17822 USA
- Richard L. Zhu
- Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
- Justin Ramsdill
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Christopher J. Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Sebastian Köhler
- Charité Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, 10117 Germany
- Einstein Center Digital Future, Berlin, 10117 Germany
- Melissa A. Haendel
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
- Linus Pauling Institute and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR 97331 USA
- Clement J. McDonald
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
- Daniel J. Vreeman
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202 USA
- Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, IN 46202 USA
- David B. Peden
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Division of Allergy, Immunology and Rheumatology, Department of Pediatrics, University of North Carolina, Chapel Hill, NC 27599 USA
- University of North Carolina Center for Environmental Medicine, Asthma and Lung Biology, University of North Carolina, Chapel Hill, NC 27599 USA
- Tellen D. Bennett
- Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
- James A. Feinstein
- Adult and Child Consortium for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine, Aurora, CO 80045 USA
- Blake Martin
- Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Adrianne L. Stefanski
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Lawrence E. Hunter
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Christopher G. Chute
- Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
- Peter N. Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032 USA
|
20
|
Cohen KB, Xia J, Zweigenbaum P, Callahan TJ, Hargraves O, Goss F, Ide N, Névéol A, Grouin C, Hunter LE. Three Dimensions of Reproducibility in Natural Language Processing. LREC Int Conf Lang Resour Eval 2018; 2018:156-165. [PMID: 29911205 PMCID: PMC5998676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Despite considerable recent attention to problems with reproducibility of scientific research, there is a striking lack of agreement about the definition of the term. That is a problem, because the lack of a consensus definition makes it difficult to compare studies of reproducibility, and thus to have even a broad overview of the state of the issue in natural language processing. This paper proposes an ontology of reproducibility in that field. Its goal is to enhance both future research and communication about the topic, and retrospective meta-analyses. We show that three dimensions of reproducibility, corresponding to three kinds of claims in natural language processing papers, can account for a variety of types of research reports. These dimensions are reproducibility of a conclusion, of a finding, and of a value. Three biomedical natural language processing papers by the authors of this paper are analyzed with respect to these dimensions.
Affiliation(s)
- K Bretonnel Cohen
- Computational Bioscience Program, University of Colorado School of Medicine
- LIMSI, CNRS, Université Paris-Saclay
| | | | | | - Tiffany J Callahan
- Computational Bioscience Program, University of Colorado School of Medicine
| | | | - Foster Goss
- Department of Emergency Medicine, University of Colorado
| | | | | | | | - Lawrence E Hunter
- Computational Bioscience Program, University of Colorado School of Medicine
| |
|
21
|
Callahan TJ, Baumgartner WA, Bada M, Stefanski AL, Tripodi I, White EK, Hunter LE. OWL-NETS: Transforming OWL Representations for Improved Network Inference. Pac Symp Biocomput 2018; 23:133-144. [PMID: 29218876 PMCID: PMC5737627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Our knowledge of the biological mechanisms underlying complex human disease is largely incomplete. While Semantic Web technologies, such as the Web Ontology Language (OWL), provide powerful techniques for representing existing knowledge, well-established OWL reasoners are unable to account for missing or uncertain knowledge. The application of inductive inference methods, such as machine learning and network inference, is vital for extending our current knowledge. Therefore, robust methods that facilitate inductive inference on rich OWL-encoded knowledge are needed. Here, we propose OWL-NETS (NEtwork Transformation for Statistical learning), a novel computational method that reversibly abstracts OWL-encoded biomedical knowledge into a network representation tailored for network inference. Using several examples built with the Open Biomedical Ontologies, we show that OWL-NETS can leverage existing ontology-based knowledge representations and network inference methods to generate novel, biologically relevant hypotheses. Further, the lossless transformation of OWL-NETS allows for seamless integration of inferred edges back into the original knowledge base, extending its coverage and completeness.
Affiliation(s)
- Tiffany J Callahan
- Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO 80045, USA,
|
22
|
Boguslav M, Cohen KB, Baumgartner WA, Hunter LE. Improving precision in concept normalization. Pac Symp Biocomput 2018; 23:566-577. [PMID: 29218915 PMCID: PMC5730334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases for natural language processing, there are reasons to prefer to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision, using a variety of both pre-processing and post-processing methods. They draw on both knowledge-based and frequentist approaches to modeling language. Based on an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely between the ontologies.
Affiliation(s)
- Mayla Boguslav
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA compbio.ucdenver.edu,
|
23
|
Bada M, Vasilevsky N, Baumgartner WA, Haendel M, Hunter LE. Gold-standard ontology-based anatomical annotation in the CRAFT Corpus. Database (Oxford) 2017; 2017:4780291. [PMID: 31725864 PMCID: PMC7243923 DOI: 10.1093/database/bax087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/28/2017] [Revised: 10/25/2017] [Accepted: 10/27/2017] [Indexed: 12/24/2022]
Abstract
Gold-standard annotated corpora have become important resources for the training and testing of natural-language-processing (NLP) systems designed to support biocuration efforts, and ontologies are increasingly used to facilitate curational consistency and semantic integration across disparate resources. Bringing together the respective power of these, the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of full-length, open-access biomedical journal articles with extensive manually created syntactic, formatting and semantic markup, was previously created and released. This initial public release has already been used in multiple projects to drive development of systems focused on a variety of biocuration, search, visualization, and semantic and syntactic NLP tasks. Building on its demonstrated utility, we have expanded the CRAFT Corpus with a large set of manually created semantic annotations relying on Uberon, an ontology representing anatomical entities and life-cycle stages of multicellular organisms across species as well as types of multicellular organisms defined in terms of life-cycle stage and sexual characteristics. This newly created set of annotations, which has been added for v2.1 of the corpus, is by far the largest publicly available collection of gold-standard anatomical markup and is the first large-scale effort at manual markup of biomedical text relying on the entirety of an anatomical terminology, as opposed to annotation with a small number of high-level anatomical categories, as performed in previous corpora. In addition to presenting and discussing this newly available resource, we apply it to provide a performance baseline for the automatic annotation of anatomical concepts in biomedical text using a prominent concept recognition system. The full corpus, released with a CC BY 3.0 license, may be downloaded from http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml. 
Database URL: http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml
Affiliation(s)
- Michael Bada
- School of Medicine, Department of Pharmacology, University of Colorado Anschutz Medical Campus, 12801 E. 17th Ave., P.O. Box 6511, MS 8303, Aurora, CO 80045-0511, USA
| | - Nicole Vasilevsky
- Ontology Development Group, Library, Oregon Health & Science University, 318 SW Sam Jackson, Park Road, Portland, OR 97239, USA
| | - William A Baumgartner
- School of Medicine, Department of Pharmacology, University of Colorado Anschutz Medical Campus, 12801 E. 17th Ave., P.O. Box 6511, MS 8303, Aurora, CO 80045-0511, USA
| | - Melissa Haendel
- Ontology Development Group, Library, Oregon Health & Science University, 318 SW Sam Jackson, Park Road, Portland, OR 97239, USA
| | - Lawrence E Hunter
- School of Medicine, Department of Pharmacology, University of Colorado Anschutz Medical Campus, 12801 E. 17th Ave., P.O. Box 6511, MS 8303, Aurora, CO 80045-0511, USA
| |
|
24
|
Abstract
Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled "Big Data to Knowledge (BD2K)." The main emphasis of the more than $200M allocated to that program has been on "Big Data;" the "Knowledge" component has largely been the implicit assumption that the work will lead to new biomedical knowledge. However, there is long-standing and highly productive work in computational knowledge representation and reasoning, and computational processing of knowledge has a role in the world of Data Science. Knowledge-based biomedical Data Science involves the design and implementation of computer systems that act as if they knew about biomedicine. There are many ways in which a computational approach might act as if it knew something: for example, it might be able to answer a natural language question about a biomedical topic, or pass an exam; it might be able to use existing biomedical knowledge to rank or evaluate hypotheses; it might explain or interpret data in light of prior knowledge, either in a Bayesian or other sort of framework. These are all examples of automated reasoning that act on computational representations of knowledge. After a brief survey of existing approaches to knowledge-based data science, this position paper argues that such research is ripe for expansion, and expanded application.
Affiliation(s)
- Lawrence E Hunter
- Computational Bioscience, University of Colorado School of Medicine, Aurora, CO 80045, USA ; ORCID: https://orcid.org/0000-0003-1455-3370
| |
|
25
|
Prabhu N, Osifodunrin N, Murphy D, Butler S, Hunter LE. Innovative Strategies for the Management of a Massive Neonatal Rhabdomyoma. J Pediatr Intensive Care 2017; 7:90-93. [PMID: 31073477 DOI: 10.1055/s-0037-1606574] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/10/2017] [Accepted: 08/09/2017] [Indexed: 09/30/2022]
Abstract
Rhabdomyomas are histologically benign tumors known to be associated with tuberous sclerosis. The natural history predicts the majority of tumors to be asymptomatic and regress within the first year of life. We describe a neonate presenting on day 1 of life with cardiovascular collapse secondary to a massive rhabdomyoma. Surgical resection was excluded due to the extensive nature of the lesion and oral sirolimus, a mammalian target of rapamycin inhibitor, was commenced to promote tumor regression. The patient developed intractable arrhythmias requiring extracorporeal life support during therapy.
Affiliation(s)
- N Prabhu
- Department of Paediatric Cardiology, Royal Hospital for Children, Glasgow, United Kingdom
| | - N Osifodunrin
- Department of Paediatric Oncology, Royal Hospital for Children, Glasgow, United Kingdom
| | - D Murphy
- Department of Paediatric Oncology, Royal Hospital for Children, Glasgow, United Kingdom
| | - S Butler
- Department of Paediatric Radiology, Royal Hospital for Children, Glasgow, United Kingdom
| | - L E Hunter
- Department of Paediatric Cardiology, Royal Hospital for Children, Glasgow, United Kingdom
| |
|
26
|
Pouille F, McTavish TS, Hunter LE, Restrepo D, Schoppa NE. Intraglomerular gap junctions enhance interglomerular synchrony in a sparsely connected olfactory bulb network. J Physiol 2017; 595:5965-5986. [PMID: 28640508 PMCID: PMC5577541 DOI: 10.1113/jp274408] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 03/27/2017] [Accepted: 06/14/2017] [Indexed: 01/12/2023]
Abstract
KEY POINTS: Despite sparse connectivity, population-level interactions between mitral cells (MCs) and granule cells (GCs) can generate synchronized oscillations in the rodent olfactory bulb. Intraglomerular gap junctions between MCs at the same glomerulus can greatly enhance synchronized activity of MCs at different glomeruli. The facilitating effect of intraglomerular gap junctions on interglomerular synchrony is through triggering of mutually synchronizing interactions between MCs and GCs. Divergent connections between MCs and GCs make minimal direct contribution to synchronous activity. ABSTRACT: A dominant feature of the olfactory bulb response to odour is fast synchronized oscillations at beta (15-40 Hz) or gamma (40-90 Hz) frequencies, thought to be involved in integration of olfactory signals. Mechanistically, the bulb presents an interesting case study for understanding how beta/gamma oscillations arise. Fast oscillatory synchrony in the activity of output mitral cells (MCs) appears to result from interactions with GABAergic granule cells (GCs), yet the incidence of MC-GC connections is very low, around 4%. Here, we combined computational and experimental approaches to examine how oscillatory synchrony can nevertheless arise, focusing mainly on activity between 'non-sister' MCs affiliated with different glomeruli (interglomerular synchrony). In a sparsely connected model of MCs and GCs, we found first that interglomerular synchrony was generally quite low, but could be increased by a factor of 4 by physiological levels of gap junctional coupling between sister MCs at the same glomerulus. This effect was due to enhanced mutually synchronizing interactions between MC and GC populations. The potent role of gap junctions was confirmed in patch-clamp recordings in bulb slices from wild-type and connexin 36-knockout (KO) mice. KO reduced both beta and gamma local field potential oscillations as well as synchrony of inhibitory signals in pairs of non-sister MCs. These effects were independent of potential KO actions on network excitation. Divergent synaptic connections did not contribute directly to the vast majority of synchronized signals. Thus, in a sparsely connected network, gap junctions between a small subset of cells can, through population effects, greatly amplify oscillatory synchrony amongst unconnected cells.
Affiliation(s)
- Frederic Pouille
- Department of Physiology and Biophysics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Thomas S. McTavish
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Lawrence E. Hunter
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Diego Restrepo
- Department of Cell and Developmental Biology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Nathan E. Schoppa
- Department of Physiology and Biophysics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| |
|
27
|
Cohen KB, Lanfranchi A, Choi MJY, Bada M, Baumgartner WA, Panteleyeva N, Verspoor K, Palmer M, Hunter LE. Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles. BMC Bioinformatics 2017; 18:372. [PMID: 28818042 PMCID: PMC5561560 DOI: 10.1186/s12859-017-1775-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/26/2016] [Accepted: 07/31/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. RESULTS: The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached F of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. CONCLUSIONS: The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. 
The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not generic in the biomedical domain due to their referents to specific classes in domain-specific ontologies. The comparison of the performance of a publicly available and well-understood coreference resolution system with a domain-adapted system produced results that are consistent with the notion that the requirements for successful coreference resolution in this genre are quite different from those of the general domain, and also suggest that the baseline performance difference is quite large.
Affiliation(s)
- K Bretonnel Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA.
| | - Arrick Lanfranchi
- Department of Linguistics, University of Colorado at Boulder, Boulder, Colorado, USA
| | - Miji Joo-Young Choi
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
| | - Michael Bada
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA
| | - William A Baumgartner
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA
| | - Natalya Panteleyeva
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA
| | - Karin Verspoor
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
| | - Martha Palmer
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA; Department of Linguistics, University of Colorado at Boulder, Boulder, Colorado, USA
| | - Lawrence E Hunter
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA
| |
|
28
|
Hooper JE, Feng W, Li H, Leach SM, Phang T, Siska C, Jones KL, Spritz RA, Hunter LE, Williams T. Systems biology of facial development: contributions of ectoderm and mesenchyme. Dev Biol 2017; 426:97-114. [PMID: 28363736 PMCID: PMC5530582 DOI: 10.1016/j.ydbio.2017.03.025] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/31/2016] [Revised: 03/23/2017] [Accepted: 03/23/2017] [Indexed: 12/17/2022]
Abstract
The rapid increase in gene-centric biological knowledge coupled with analytic approaches for genomewide data integration provides an opportunity to develop systems-level understanding of facial development. Experimental analyses have demonstrated the importance of signaling between the surface ectoderm and the underlying mesenchyme in coordinating facial patterning. However, current transcriptome data from the developing vertebrate face is dominated by the mesenchymal component, and the contributions of the ectoderm are not easily identified. We have generated transcriptome datasets from critical periods of mouse face formation that enable gene expression to be analyzed with respect to time, prominence, and tissue layer. Notably, by separating the ectoderm and mesenchyme we considerably improved the sensitivity compared to data obtained from whole prominences, with more genes detected over a wider dynamic range. From these data we generated a detailed description of ectoderm-specific developmental programs, including pan-ectodermal programs, prominence-specific programs and their temporal dynamics. The genes and pathways represented in these programs provide mechanistic insights into several aspects of ectodermal development. We also used these data to identify co-expression modules specific to facial development. We then used 14 co-expression modules enriched for genes involved in orofacial clefts to make specific mechanistic predictions about genes involved in tongue specification, in nasal process patterning and in jaw development. Our multidimensional gene expression dataset is a unique resource for systems analysis of the developing face; our co-expression modules are a resource for predicting functions of poorly annotated genes, or for predicting roles for genes that have yet to be studied in the context of facial development; and our analytic approaches provide a paradigm for analysis of other complex developmental programs.
Affiliation(s)
- Joan E Hooper
- Department of Cell and Developmental Biology, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Computational Bioscience Program, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
| | - Weiguo Feng
- Department of Cell and Developmental Biology, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Department of Craniofacial Biology, University of Colorado School of Dental Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
| | - Hong Li
- Department of Craniofacial Biology, University of Colorado School of Dental Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
| | - Sonia M Leach
- Department of Biomedical Research, National Jewish Health, 1400 Jackson Street, Denver, CO 80206, USA.
| | - Tzulip Phang
- Computational Bioscience Program, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Department of Medicine, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
| | - Charlotte Siska
- Computational Bioscience Program, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
| | - Kenneth L Jones
- Department of Pediatrics, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
| | - Richard A Spritz
- Human Medical Genetics and Genomics Program, University of Colorado School of Medicine, 12800 E 17th Avenue, Aurora, CO 80045, USA.
| | - Lawrence E Hunter
- Computational Bioscience Program, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Department of Pharmacology, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
| | - Trevor Williams
- Department of Cell and Developmental Biology, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Department of Craniofacial Biology, University of Colorado School of Dental Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
| |
|
29
|
Affiliation(s)
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Lana X Garmire
- Cancer Epidemiology Program, University of Hawaii Cancer Center, University of Hawaii, Honolulu, Hawaii, USA
| | - Jack A Gilbert
- Department of Surgery, University of Chicago School of Medicine, Chicago, Illinois, USA
| | - Marylyn D Ritchie
- Biomedical and Translational Informatics Program, Geisinger Health System, Danville, Pennsylvania, USA
| | - Lawrence E Hunter
- Department of Pharmacology, University of Colorado School of Medicine, Aurora, Colorado, USA
| |
|
30
|
Moore JH, Jennings SF, Greene CS, Hunter LE, Perkins AD, Williams-Devane C, Wunsch DC, Zhao Z, Huang X. NO-BOUNDARY THINKING IN BIOINFORMATICS. Pac Symp Biocomput 2017; 22:646-648. [PMID: 27897015 DOI: 10.1142/9789813207813_0060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
The following sections are included: Bioinformatics is a Mature Discipline; The Golden Era of Bioinformatics Has Begun; No-Boundary Thinking in Bioinformatics; References.
Affiliation(s)
- Jason H Moore
- Institute for Biomedical Informatics, University of Pennsylvania Philadelphia, PA 19104, USA,
|
31
|
Cohen KB, Goss FR, Zweigenbaum P, Hunter LE. Translational Morphosyntax: Distribution of Negation in Clinical Records and Biomedical Journal Articles. Stud Health Technol Inform 2017; 245:346-350. [PMID: 29295113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Prior knowledge of the distributional characteristics of linguistic phenomena can be useful for a variety of language processing tasks. This paper describes the distribution of negation in two types of biomedical texts: scientific journal articles and progress notes. Two types of negation are examined: explicit negation at the syntactic level and affixal negation at the sub-word level. The data show that the distribution of negation is significantly different in the two document types, with explicit negation more frequent in the clinical documents than in the scientific publications and affixal negation more frequent in the journal articles at both the type and token levels. All code is available on GitHub at https://github.com/KevinBretonnelCohen/NegationDistribution.
Affiliation(s)
- K Bretonnel Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA
| | - Foster R Goss
- University of Colorado School of Medicine, Department of Emergency Medicine, Aurora, CO, USA
| | | | - Lawrence E Hunter
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA
| |
|
32
|
Yadav P, Jezek E, Bouillon P, Callahan TJ, Bada M, Hunter LE, Cohen KB. Semantic Relations in Compound Nouns: Perspectives from Inter-Annotator Agreement. Stud Health Technol Inform 2017; 245:644-648. [PMID: 29295175 PMCID: PMC7781293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Semantic relations have been studied for decades without yet reaching consensus on the set of these relations. However, biomedical language processing and ontologies rely on these relations, so it is important to be able to evaluate their suitability. In this paper we examine the role of inter-annotator agreement in choosing between competing proposals regarding the set of such relations. The experiments consisted of labeling the semantic relations between two elements of noun-noun compounds (e.g. cell migration). Two judges annotated a dataset of terms from the biomedical domain using two competing sets of relations and analyzed the inter-annotator agreement. With no training and little documentation, agreement on this task was fairly high and disagreements were consistent. The results support the utility of the relation-based approach to semantic representation.
Affiliation(s)
- Prabha Yadav
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| | | | - Pierrette Bouillon
- Faculté de Traduction et d’Interprétation, Université de Genève, Switzerland
| | - Tiffany J. Callahan
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| | - Michael Bada
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| | - Lawrence E. Hunter
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| | - K. Bretonnel Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| |
|
33
|
Funk CS, Cohen KB, Hunter LE, Verspoor KM. Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition. J Biomed Semantics 2016; 7:52. [PMID: 27613112 PMCID: PMC5018193 DOI: 10.1186/s13326-016-0096-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/10/2015] [Accepted: 08/05/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gene Ontology (GO) terms represent the standard for annotation and representation of molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic, consisting of parts and pieces from simpler terms. RESULTS: We present two different types of manually generated rules to help capture the variation of how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes the terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to form the original term. By applying both types of rules, new synonyms are generated for two-thirds of all GO terms and an increase in F-measure performance for recognition of GO on the CRAFT corpus from 0.498 to 0.636 is observed. Additionally, we evaluated the combination of both types of rules over one million full text documents from Elsevier; manual validation and error analysis show we are able to recognize GO concepts with reasonable accuracy (88%) based on random sampling of annotations. CONCLUSIONS: In this work we present a set of simple synonym generation rules that utilize the highly compositional and formulaic nature of the Gene Ontology concepts. We illustrate how the generated synonyms aid in improving recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO ontology quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.
Affiliation(s)
- Christopher S. Funk
- Computational Bioscience, University of Colorado School of Medicine, Aurora, CO 80045 USA
| | - K. Bretonnel Cohen
- Computational Bioscience, University of Colorado School of Medicine, Aurora, CO 80045 USA
| | - Lawrence E. Hunter
- Computational Bioscience, University of Colorado School of Medicine, Aurora, CO 80045 USA
| | - Karin M. Verspoor
- Department of Computing and Information Systems, University of Melbourne, Parkville, Melbourne, 3010 Australia
- Health and Biomedical Informatics Centre, University of Melbourne, Parkville, Melbourne, 3010 Australia
| |
Collapse
|
34
|
Cohen KB, Xia J, Roeder C, Hunter LE. Reproducibility in Natural Language Processing: A Case Study of Two R Libraries for Mining PubMed/MEDLINE. LREC Int Conf Lang Resour Eval 2016; 2016:6-12. [PMID: 29568821 PMCID: PMC5860830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
There is currently a crisis in science related to highly publicized failures to reproduce large numbers of published studies. The current work proposes, by way of case studies, a methodology for moving the study of reproducibility in computational work to a full stage beyond that of earlier work. Specifically, it presents a case study in attempting to reproduce the reports of two R libraries for doing text mining of the PubMed/MEDLINE repository of scientific publications. The main findings are that a rational paradigm for reproduction of natural language processing papers can be established; the advertised functionality was difficult, but not impossible, to reproduce; and reproducibility studies can produce additional insights into the functioning of the published system. Additionally, the work on reproducibility led to the production of novel user-centered documentation that has been accessed 260 times since its publication, an average of once a day per library.
Affiliation(s)
- K Bretonnel Cohen
  - Biomedical Text Mining Group Computational Bioscience Program, University of Colorado School of Medicine
- Jingbo Xia
  - Department of Bio-statistics, College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University
- Christophe Roeder
  - Biomedical Text Mining Group Computational Bioscience Program, University of Colorado School of Medicine
  - Department of Bio-statistics, College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University
- Lawrence E Hunter
  - Biomedical Text Mining Group Computational Bioscience Program, University of Colorado School of Medicine
35
Hunter LE, Pushparajah K, Miller O, Anderson D, Simpson JM. Prenatal diagnosis of left ventricular diverticulum and coarctation of the aorta. Ultrasound Obstet Gynecol 2016; 47:236-238. [PMID: 26376444 DOI: 10.1002/uog.15746] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/21/2015] [Revised: 09/01/2015] [Accepted: 09/10/2015] [Indexed: 06/05/2023]
Abstract
Congenital left ventricular diverticulum (LVD) is a rare abnormality of the myocardium which has been detected previously in the fetus. Lesions have been reported from as early as 12 weeks' gestation but are more commonly detected in the mid-second trimester. Fetal presentation of LVD ranges from an abnormal four-chamber view of the heart, arrhythmia or isolated pericardial effusion to fetal hydrops with associated heart failure. Here, we describe the prenatal diagnosis of an infant with LVD originating from the left ventricular outflow tract associated with coarctation of the aorta. The diagnosis was confirmed postnatally by two-dimensional echocardiography and cardiac magnetic resonance imaging. We hypothesize that the lesion compromised antegrade flow into the transverse aortic arch, which may have contributed to underdevelopment of the aortic arch and subsequently the development of coarctation of the aorta. This is a unique case of LVD and coarctation of the aorta.
Affiliation(s)
- L E Hunter
  - Department of Congenital Heart Disease, Royal Hospital for Children, Glasgow, UK
- K Pushparajah
  - Department of Congenital Heart Disease, Evelina London Children's Hospital, London, UK
- O Miller
  - Department of Congenital Heart Disease, Evelina London Children's Hospital, London, UK
- D Anderson
  - Department of Congenital Heart Disease, Evelina London Children's Hospital, London, UK
- J M Simpson
  - Department of Congenital Heart Disease, Evelina London Children's Hospital, London, UK
36
Karimpour-Fard A, Epperson LE, Hunter LE. A survey of computational tools for downstream analysis of proteomic and other omic datasets. Hum Genomics 2015; 9:28. [PMID: 26510531 PMCID: PMC4624643 DOI: 10.1186/s40246-015-0050-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/28/2015] [Accepted: 10/06/2015] [Indexed: 12/19/2022] Open
Abstract
Proteomics is an expanding area of research into biological systems with significance for biomedical and therapeutic applications ranging from understanding the molecular basis of diseases to testing new treatments, studying the toxicity of drugs, or biotechnological improvements in agriculture. Progress in proteomic technologies and growing interest has resulted in rapid accumulation of proteomic data, and consequently, a great number of tools have become available. In this paper, we review the well-known and ready-to-use tools for classification, clustering and validation, interpretation, and generation of biological information from experimental data. We suggest some rules of thumb for the reader on choosing the best suitable learning method for a particular dataset and conclude with pathway and functional analysis and then provide information about submitting final results to a repository.
Affiliation(s)
- Anis Karimpour-Fard
  - Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
- L Elaine Epperson
  - Integrated Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, 80206, USA
- Lawrence E Hunter
  - Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO, 80045, USA
37
Vehlow C, Kao DP, Bristow MR, Hunter LE, Weiskopf D, Görg C. Visual analysis of biological data-knowledge networks. BMC Bioinformatics 2015; 16:135. [PMID: 25925016 PMCID: PMC4456720 DOI: 10.1186/s12859-015-0550-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 11/07/2014] [Accepted: 03/25/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The interpretation of the results from genome-scale experiments is a challenging and important problem in contemporary biomedical research. Biological networks that integrate experimental results with existing knowledge from biomedical databases and published literature can provide a rich resource and powerful basis for hypothesizing about mechanistic explanations for observed gene-phenotype relationships. However, the size and density of such networks often impede their efficient exploration and understanding. RESULTS We introduce a visual analytics approach that integrates interactive filtering of dense networks based on degree-of-interest functions with attribute-based layouts of the resulting subnetworks. The comparison of multiple subnetworks representing different analysis facets is facilitated through an interactive super-network that integrates brushing-and-linking techniques for highlighting components across networks. An implementation is freely available as a Cytoscape app. CONCLUSIONS We demonstrate the utility of our approach through two case studies using a dataset that combines clinical data with high-throughput data for studying the effect of β-blocker treatment on heart failure patients. Furthermore, we discuss our team-based iterative design and development process as well as the limitations and generalizability of our approach.
Affiliation(s)
- Corinna Vehlow
  - VISUS, University of Stuttgart, Allmandring 19, Stuttgart, Germany.
- David P Kao
  - School of Medicine, University of Colorado, E 17th Pl, Aurora, CO, USA.
- Michael R Bristow
  - School of Medicine, University of Colorado, E 17th Pl, Aurora, CO, USA.
- Lawrence E Hunter
  - School of Medicine, University of Colorado, E 17th Pl, Aurora, CO, USA.
- Daniel Weiskopf
  - VISUS, University of Stuttgart, Allmandring 19, Stuttgart, Germany.
- Carsten Görg
  - School of Medicine, University of Colorado, E 17th Pl, Aurora, CO, USA.
38
Livingston KM, Bada M, Baumgartner WA, Hunter LE. KaBOB: ontology-based semantic integration of biomedical databases. BMC Bioinformatics 2015; 16:126. [PMID: 25903923 PMCID: PMC4448321 DOI: 10.1186/s12859-015-0559-3] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 10/23/2014] [Accepted: 03/30/2015] [Indexed: 04/04/2023] Open
Abstract
Background The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. Results We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. Conclusions KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. 
KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
Affiliation(s)
- Kevin M Livingston
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Michael Bada
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- William A Baumgartner
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
39
Albrecht SV, Barreto AMS, Braziunas D, Buckeridge DL, Cuayáhuitl H, Dethlefs N, Endres M, Farahmand AM, Fox M, Frommberger L, Ganzfried S, Gil Y, Guillet S, Hunter LE, Jhala A, Kersting K, Konidaris G, Lecue F, McIlraith S, Natarajan S, Noorian Z, Poole D, Ronfard R, Saffiotti A, Shaban-Nejad A, Srivastava B, Tesauro G, Uceda-Sosa R, Van den Broeck G, Van Otterlo M, Wallace BC, Weng P, Wiens J, Zhang J. Reports of the AAAI 2014 Conference Workshops. AI MAG 2015. [DOI: 10.1609/aimag.v36i1.2575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
The AAAI-14 Workshop program was held Sunday and Monday, July 27–28, 2014, at the Québec City Convention Centre in Québec, Canada. The AAAI-14 workshop program included fifteen workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were AI and Robotics; Artificial Intelligence Applied to Assistive Technologies and Smart Environments; Cognitive Computing for Augmented Human Intelligence; Computer Poker and Imperfect Information; Discovery Informatics; Incentives and Trust in Electronic Communities; Intelligent Cinematography and Editing; Machine Learning for Interactive Systems: Bridging the Gap between Perception, Action and Communication; Modern Artificial Intelligence for Health Analytics; Multiagent Interaction without Prior Coordination; Multidisciplinary Workshop on Advances in Preference Handling; Semantic Cities: Beyond Open Data to Models, Standards and Reasoning; Sequential Decision Making with Big Data; Statistical Relational AI; and The World Wide Web and Public Health Intelligence. This article presents short summaries of those events.
40
Hewett D, Whirl-Carrillo M, Hunter LE, Altman RB, Klein TE. A twentieth anniversary tribute to PSB. Pac Symp Biocomput 2015:1-7. [PMID: 25592562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
PSB brings together top researchers from around the world to exchange research results and address open issues in all aspects of computational biology. PSB 2015 marks the twentieth anniversary of PSB. Reaching a milestone year is an accomplishment well worth celebrating. It is long enough to have seen big changes occur, but recent enough to be relevant for today. As PSB celebrates twenty years of service, we would like to take this opportunity to congratulate the PSB community for your success. We would also like the community to join us in a time of celebration and reflection on this accomplishment.
Affiliation(s)
- Darla Hewett
  - Stanford University, Shriram Center for Bioengineering and Chemical Engineering, 443 Via Ortega, Stanford, CA 94305, USA
41
Hinterberg MA, Kao DP, Bristow MR, Hunter LE, Port JD, Görg C. Peax: interactive visual analysis and exploration of complex clinical phenotype and gene expression association. Pac Symp Biocomput 2015:419-30. [PMID: 25592601 PMCID: PMC4344826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Increasing availability of high-dimensional clinical data, which improves the ability to define more specific phenotypes, as well as molecular data, which can elucidate disease mechanisms, is a driving force and at the same time a major challenge for translational and personalized medicine. Successful research in this field requires an approach that ties together specific disease and health expertise with understanding of molecular data through statistical methods. We present PEAX (Phenotype-Expression Association eXplorer), built upon open-source software, which integrates visual phenotype model definition with statistical testing of expression data presented concurrently in a web-browser. The integration of data and analysis tasks in a single tool allows clinical domain experts to obtain new insights directly through exploration of relationships between multivariate phenotype models and gene expression data, showing the effects of model definition and modification while also exploiting potential meaningful associations between phenotype and miRNA-mRNA regulatory relationships. We combine the web visualization capabilities of Shiny and D3 with the power and speed of R for backend statistical analysis, in order to abstract the scripting required for repetitive analysis of sub-phenotype association. We describe the motivation for PEAX, demonstrate its utility through a use case involving heart failure research, and discuss computational challenges and observations. We show that our visual web-based representations are well-suited for rapid exploration of phenotype and gene expression association, facilitating insight and discovery by domain experts.
Affiliation(s)
- David P. Kao
  - School of Medicine, University of Colorado, Aurora, CO 80045, USA
- J. David Port
  - School of Medicine, University of Colorado, Aurora, CO 80045, USA
- Carsten Görg
  - School of Medicine, University of Colorado, Aurora, CO 80045, USA
42
Pattin KA, Greene AC, Altman RB, Cohen KB, Wethington E, Görg C, Hunter LE, Muse SV, Radivojac P, Moore JH. Training the next generation of quantitative biologists in the era of big data. Pac Symp Biocomput 2015:488-492. [PMID: 25592609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
The following sections are included: Workshop Focus, Workshop Contributions and References.
Affiliation(s)
- Kristine A Pattin
  - Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, USA.
43
Hailu ND, Cohen KB, Hunter LE. Ontology translation: A case study on translating the Gene Ontology from English to German. Nat Lang Process Inf Syst 2014; 8455:33-38. [PMID: 29780975 PMCID: PMC5954410 DOI: 10.1007/978-3-319-07983-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
For many researchers, the purpose of ontologies is sharing data. This sharing is facilitated when ontologies are available in multiple languages, but inhibited when an ontology is only available in a single language. Ontologies should be accessible to people in multiple languages, since multilingualism is inevitable in any scientific work. Due to resource scarcity, most ontologies of the biomedical domain are available only in English at present. We present techniques to translate Gene Ontology terms from English to German using DBPedia, the Google Translate API for isolated terms, and the Google Translate API for terms in sentential context. Average fluency scores for the three methods were 4.0, 4.4, and 4.5, respectively. Average adequacy scores were 4.0, 4.9, and 4.9.
Affiliation(s)
- Negacy D Hailu
  - Computational Bioscience Program, University of Colorado School of Medicine, USA
- K Bretonnel Cohen
  - Computational Bioscience Program, University of Colorado School of Medicine, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado School of Medicine, USA
44
Funk C, Baumgartner W, Garcia B, Roeder C, Bada M, Cohen KB, Hunter LE, Verspoor K. Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters. BMC Bioinformatics 2014; 15:59. [PMID: 24571547 PMCID: PMC4015610 DOI: 10.1186/1471-2105-15-59] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 04/22/2013] [Accepted: 01/24/2014] [Indexed: 11/10/2022] Open
Abstract
Background Ontological concepts are useful for many different biomedical tasks. Concepts are difficult to recognize in text due to a disconnect between what is captured in an ontology and how the concepts are expressed in text. There are many recognizers for specific ontologies, but a general approach for concept recognition is an open problem. Results Three dictionary-based systems (MetaMap, NCBO Annotator, and ConceptMapper) are evaluated on eight biomedical ontologies in the Colorado Richly Annotated Full-Text (CRAFT) Corpus. Over 1,000 parameter combinations are examined, and best-performing parameters for each system-ontology pair are presented. Conclusions Baselines for concept recognition by three systems on eight biomedical ontologies are established (F-measures range from 0.14–0.83). Out of the three systems we tested, ConceptMapper is generally the best-performing system; it produces the highest F-measure of seven out of eight ontologies. Default parameters are not ideal for most systems on most ontologies; by changing parameters F-measure can be increased by up to 0.4. Not only are best performing parameters presented, but suggestions for choosing the best parameters based on ontology characteristics are presented.
Affiliation(s)
- Christopher Funk
  - Computational Bioscience Program, U. of Colorado School of Medicine, Aurora, CO 80045, USA.
45
Livingston KM, Bada M, Hunter LE, Verspoor K. Representing annotation compositionality and provenance for the Semantic Web. J Biomed Semantics 2013; 4:38. [PMID: 24268021 PMCID: PMC4129183 DOI: 10.1186/2041-1480-4-38] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/12/2013] [Accepted: 09/20/2013] [Indexed: 12/03/2022] Open
Abstract
Background Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations. Results We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease-associations in genome sequences. Conclusions With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.
Affiliation(s)
- Kevin M Livingston
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Michael Bada
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Lawrence E Hunter
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Karin Verspoor
  - National ICT Australia, Victoria Research Laboratory, Melbourne, VIC, 3010, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne 3010 VIC, Australia
46
Hunter LE. Rocky Mountain Conference on Bioinformatics Celebrates 10 Years. PLoS Comput Biol 2013; 9:e1003076. [PMID: 23737739 PMCID: PMC3667766 DOI: 10.1371/journal.pcbi.1003076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2013] [Accepted: 03/31/2013] [Indexed: 11/28/2022] Open
Affiliation(s)
- Lawrence E Hunter
  - Center for Computational Pharmacology & Computational Bioscience Program, University of Colorado Denver, Aurora, Colorado, USA.
47
Abstract
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research—translating basic science results into new interventions—and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Affiliation(s)
- K Bretonnel Cohen
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA.
48
Cohen KB, Hunter LE, Palmer M. Assessment of software testing and quality assurance in natural language processing applications and a linguistically inspired approach to improving it. Trust Eternal Syst Via Evol Softw Data Knowl (2012) 2013; 379:77-90. [PMID: 34308448 PMCID: PMC8300901 DOI: 10.1007/978-3-642-45260-4_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
Abstract
Significant progress has been made in addressing the scientific challenges of biomedical text mining. However, the transition from a demonstration of scientific progress to the production of tools on which a broader community can rely requires that fundamental software engineering requirements be addressed. In this paper we characterize the state of biomedical text mining software with respect to software testing and quality assurance. Biomedical natural language processing software was chosen because it frequently specifically claims to offer production-quality services, rather than just research prototypes. We examined twenty web sites offering a variety of text mining services. On each web site, we performed the most basic software test known to us and classified the results. Seven out of twenty web sites returned either bad results or the worst class of results in response to this simple test. We conclude that biomedical natural language processing tools require greater attention to software quality. We suggest a linguistically motivated approach to granular evaluation of natural language processing applications, and show how it can be used to detect performance errors of several systems and to predict overall performance on specific equivalence classes of inputs. We also assess the ability of linguistically-motivated test suites to provide good software testing, as compared to large corpora of naturally-occurring data. We measure code coverage and find that it is considerably higher when even small structured test suites are utilized than when large corpora are used.
Affiliation(s)
- K. Bretonnel Cohen
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA; Department of Linguistics, University of Colorado at Boulder, Boulder, Colorado, USA
- Lawrence E. Hunter
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA; Department of Linguistics, University of Colorado at Boulder, Boulder, Colorado, USA
- Martha Palmer
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA; Department of Linguistics, University of Colorado at Boulder, Boulder, Colorado, USA
49
Abstract
In this Commentary, we describe a cryptographic method for returning research results to individuals who participate in clinical studies. Controlled use of this method, which relaxes the typical anonymization guarantee, can ensure that clinically actionable results reach participants while also addressing most privacy concerns.
Affiliation(s)
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA
50
Frantz AM, Sarver AL, Ito D, Phang TL, Karimpour-Fard A, Scott MC, Valli VEO, Lindblad-Toh K, Burgess KE, Husbands BD, Henson MS, Borgatti A, Kisseberth WC, Hunter LE, Breen M, O'Brien TD, Modiano JF. Molecular profiling reveals prognostically significant subtypes of canine lymphoma. Vet Pathol 2012; 50:693-703. [PMID: 23125145 DOI: 10.1177/0300985812465325] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Indexed: 12/26/2022]
Abstract
We performed genomewide gene expression analysis of 35 samples representing 6 common histologic subtypes of canine lymphoma and bioinformatics analyses to define their molecular characteristics. Three major groups were defined on the basis of gene expression profiles: (1) low-grade T-cell lymphoma, composed entirely of T-zone lymphoma; (2) high-grade T-cell lymphoma, consisting of lymphoblastic T-cell lymphoma and peripheral T-cell lymphoma not otherwise specified; and (3) B-cell lymphoma, consisting of marginal B-cell lymphoma, diffuse large B-cell lymphoma, and Burkitt lymphoma. Interspecies comparative analyses of gene expression profiles also showed that marginal B-cell lymphoma and diffuse large B-cell lymphoma in dogs and humans might represent a continuum of disease with similar drivers. The classification of these diverse tumors into 3 subgroups was prognostically significant, as the groups were directly correlated with event-free survival. Finally, we developed a benchtop diagnostic test based on expression of 4 genes that can robustly classify canine lymphomas into one of these 3 subgroups, enabling a direct clinical application for our results.
Affiliation(s)
- A M Frantz
  - Department of Veterinary Clinical Sciences, College of Veterinary Medicine, University of Minnesota, St Paul, Minnesota, USA