Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

74
(from Reference Citation Analysis)

Article PDFs (29)

Cited by > 0 (47)

Searched Name

Lawrence E Hunter

Bibal A, Salem NM, Cardon R, White EK, Acuna DE, Burke R, Hunter LE. RecSOI: recommending research directions using statements of ignorance. J Biomed Semantics 2024;15:2. [PMID: 38650032 PMCID: PMC11034121 DOI: 10.1186/s13326-024-00304-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 03/23/2024] [Indexed: 04/25/2024] Open

Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024;11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024] Open

Affiliation(s)

Tiffany J Callahan Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA.
Ignacio J Tripodi Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
Adrianne L Stefanski Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Luca Cappelletti AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
Sanya B Taneja Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
Jordan M Wyrwa Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Elena Casiraghi AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy. Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Nicolas A Matentzoglu Semanticly, Athens, Greece
Justin Reese Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Jonathan C Silverstein Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
Charles Tapley Hoyt Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
Richard D Boyce Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
Scott A Malec Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
Deepak R Unni SIB Swiss Institute of Bioinformatics, Basel, Switzerland
Marcin P Joachimiak Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Peter N Robinson Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
Christopher J Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Emanuele Cavalleri AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
Tommaso Fontana AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
Giorgio Valentini AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy. ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
Marco Mesiti AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
Lucas A Gillenwater Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Brook Santangelo Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Nicole A Vasilevsky Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
Robert Hoehndorf Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
Tellen D Bennett Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA. Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Patrick B Ryan Janssen Research and Development, Raritan, NJ, 08869, USA
George Hripcsak Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
Michael G Kahn Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Michael Bada Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
William A Baumgartner Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
Lawrence E Hunter Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA.

Santangelo BE, Apgar M, Colorado ASB, Martin CG, Sterrett J, Wall E, Joachimiak MP, Hunter LE, Lozupone CA. Integrating biological knowledge for mechanistic inference in the host-associated microbiome. Front Microbiol 2024;15:1351678. [PMID: 38638909 PMCID: PMC11024261 DOI: 10.3389/fmicb.2024.1351678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 04/20/2024] Open

Lee JS, Lowell JL, Whitewater K, Roane TM, Miller CS, Chan AP, Sylvester AW, Jackson D, Hunter LE. Monitoring environmental microbiomes: Alignment of microbiology and computational biology competencies within a culturally integrated curriculum and research framework. Mol Ecol Resour 2023. [PMID: 37702134 DOI: 10.1111/1755-0998.13867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 08/18/2023] [Accepted: 08/30/2023] [Indexed: 09/14/2023]

Gupta S, Westacott MJ, Ayers DG, Weiss SJ, Whitley P, Mueller C, Weaver DC, Schneider DJ, Karimpour-Fard A, Hunter LE, Drolet DW, Janjic N. Plasma proteome of growing tumors. Sci Rep 2023;13:12195. [PMID: 37500700 PMCID: PMC10374562 DOI: 10.1038/s41598-023-38079-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 07/03/2023] [Indexed: 07/29/2023] Open

Boguslav MR, Salem NM, White EK, Sullivan KJ, Bada M, Hernandez TL, Leach SM, Hunter LE. Creating an ignorance-base: Exploring known unknowns in the scientific literature. J Biomed Inform 2023;143:104405. [PMID: 37270143 PMCID: PMC10528083 DOI: 10.1016/j.jbi.2023.104405] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/08/2022] [Revised: 05/18/2023] [Accepted: 05/21/2023] [Indexed: 06/05/2023]

Abstract

BACKGROUND

Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge-bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition.

RESULTS

We present the first ignorance-base, a knowledge-base created by combining classifiers to recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge-base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). The researchers could look to the field of neuroscience for potential answers to the ignorance statements.

CONCLUSION

Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of and focus on the known unknowns and their respective goals for scientific knowledge.

Callahan TJ, Stefanski AL, Wyrwa JM, Zeng C, Ostropolets A, Banda JM, Baumgartner WA, Boyce RD, Casiraghi E, Coleman BD, Collins JH, Deakyne Davies SJ, Feinstein JA, Lin AY, Martin B, Matentzoglu NA, Meeker D, Reese J, Sinclair J, Taneja SB, Trinkley KE, Vasilevsky NA, Williams AE, Zhang XA, Denny JC, Ryan PB, Hripcsak G, Bennett TD, Haendel MA, Robinson PN, Hunter LE, Kahn MG. Ontologizing health systems data at scale: making translational discovery a reality. NPJ Digit Med 2023;6:89. [PMID: 37208468 PMCID: PMC10196319 DOI: 10.1038/s41746-023-00830-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 04/28/2023] [Indexed: 05/21/2023] Open

Affiliation(s)

Tiffany J Callahan Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA.
Adrianne L Stefanski Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Jordan M Wyrwa Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Chenjie Zeng National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
Anna Ostropolets Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
Juan M Banda Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
William A Baumgartner Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Richard D Boyce Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15260, USA
Elena Casiraghi Computer Science, Università degli Studi di Milano, Milan, Italy. The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
Ben D Coleman The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
Janine H Collins Department of Haematology, University of Cambridge, Cambridge, UK
Sara J Deakyne Davies Department of Research Informatics & Data Science, Analytics Resource Center, Children's Hospital Colorado, Aurora, CO, 80045, USA
James A Feinstein Adult and Child Center for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
Asiyah Y Lin National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
Blake Martin Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Nicolas A Matentzoglu Semanticly, Athens, Greece
Daniella Meeker Yale School of Medicine, New Haven, CT, 06510, USA
Justin Reese Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Jessica Sinclair HealthLinc, Valparaiso, IN, 46383, USA
Sanya B Taneja Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
Katy E Trinkley Department of Family Medicine, University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
Nicole A Vasilevsky Translational and Integrative Sciences Lab, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Andrew E Williams Tufts Institute for Clinical Research and Health Policy Studies, Tufts University, Boston, MA, 02155, USA
Xingmin A Zhang The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
Joshua C Denny National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
Patrick B Ryan Janssen Research and Development, Raritan, NJ, 08869, USA
George Hripcsak Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
Tellen D Bennett Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Melissa A Haendel Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Peter N Robinson The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
Lawrence E Hunter Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Michael G Kahn Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA

Callahan TJ, Stefanski AL, Ostendorf DM, Wyrwa JM, Davies SJD, Hripcsak G, Hunter LE, Kahn MG. Characterizing Patient Representations for Computational Phenotyping. AMIA Annu Symp Proc 2023;2022:319-328. [PMID: 37128436 PMCID: PMC10148332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]

Callahan TJ, Stefanski AL, Kim JD, Baumgartner WA, Wyrwa JM, Hunter LE. Knowledge-Driven Mechanistic Enrichment of the Preeclampsia Ignorome. Pac Symp Biocomput 2023;28:371-382. [PMID: 36540992 PMCID: PMC9782728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]

Abstract

Preeclampsia is a leading cause of maternal and fetal morbidity and mortality. Currently, the only definitive treatment of preeclampsia is delivery of the placenta, which is central to the pathogenesis of the disease. Transcriptional profiling of human placenta from pregnancies complicated by preeclampsia has been extensively performed to identify differentially expressed genes (DEGs). The decisions to investigate DEGs experimentally are biased by many factors, causing many DEGs to remain uninvestigated. A set of DEGs which are associated with a disease experimentally, but which have no known association to the disease in the literature are known as the ignorome. Preeclampsia has an extensive body of scientific literature, a large pool of DEG data, and only one definitive treatment. Tools facilitating knowledge-based analyses, which are capable of combining disparate data from many sources in order to suggest underlying mechanisms of action, may be a valuable resource to support discovery and improve our understanding of this disease. In this work we demonstrate how a biomedical knowledge graph (KG) can be used to identify novel preeclampsia molecular mechanisms. Existing open source biomedical resources and publicly available high-throughput transcriptional profiling data were used to identify and annotate the function of currently uninvestigated preeclampsia-associated DEGs. Experimentally investigated genes associated with preeclampsia were identified from PubMed abstracts using text-mining methodologies. The relative complement of the text-mined- and meta-analysis-derived lists were identified as the uninvestigated preeclampsia-associated DEGs (n=445), i.e., the preeclampsia ignorome. Using the KG to investigate relevant DEGs revealed 53 novel clinically relevant and biologically actionable mechanistic associations.
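
The "relative complement" step described in this abstract can be illustrated as a plain set difference. The following is only a minimal sketch, not the authors' pipeline: the function name `ignorome` and the gene identifiers are hypothetical placeholders, and the real study derived its two lists from a meta-analysis of transcriptional data and from text mining of PubMed abstracts.

```python
def ignorome(experimental_degs, literature_genes):
    """Return genes with experimental support but no literature association.

    Both arguments are iterables of gene identifiers (hypothetical here).
    """
    return set(experimental_degs) - set(literature_genes)


# Placeholder identifiers for illustration only; these are not real results.
meta_analysis_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
text_mined_genes = {"GENE_B", "GENE_D", "GENE_E"}

uninvestigated = ignorome(meta_analysis_degs, text_mined_genes)
print(sorted(uninvestigated))  # ['GENE_A', 'GENE_C']
```

Genes that appear only in the literature list (here `GENE_E`) are ignored, because the relative complement keeps just the members of the first set that are absent from the second.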

Santangelo BE, Gillenwater LA, Salem NM, Hunter LE. Molecular cartooning with knowledge graphs. Front Bioinform 2022;2:1054578. [PMID: 36568701 PMCID: PMC9772836 DOI: 10.3389/fbinf.2022.1054578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 11/23/2022] [Indexed: 12/13/2022] Open

Abstract

Molecular "cartoons," such as pathway diagrams, provide a visual summary of biomedical research results and hypotheses. Their ubiquitous appearance within the literature indicates their universal application in mechanistic communication. A recent survey of pathway diagrams identified 64,643 pathway figures published between 1995 and 2019 with 1,112,551 mentions of 13,464 unique human genes participating in a wide variety of biological processes. Researchers generally create these diagrams using generic diagram editing software that does not itself embody any biomedical knowledge. Biomedical knowledge graphs (KGs) integrate and represent knowledge in a semantically consistent way, systematically capturing biomedical knowledge similar to that in molecular cartoons. KGs have the potential to provide context and precise details useful in drawing such figures. However, KGs cannot generally be translated directly into figures. They include substantial material irrelevant to the scientific point of a given figure and are often more detailed than is appropriate. How could KGs be used to facilitate the creation of molecular diagrams? Here we present a new approach towards cartoon image creation that utilizes the semantic structure of knowledge graphs to aid the production of molecular diagrams. We introduce a set of "semantic graphical actions" that select and transform the relational information between heterogeneous entities (e.g., genes, proteins, pathways, diseases) in a KG to produce diagram schematics that meet the scientific communication needs of the user. These semantic actions search, select, filter, transform, group, arrange, connect and extract relevant subgraphs from KGs based on meaning in biological terms, e.g., a protein upstream of a target in a pathway. 
To demonstrate the utility of this approach, we show how semantic graphical actions on KGs could have been used to produce three existing pathway diagrams in diverse biomedical domains: Down Syndrome, COVID-19, and neuroinflammation. Our focus is on recapitulating the semantic content of the figures, not the layout, glyphs, or other aesthetic aspects. Our results suggest that the use of KGs and semantic graphical actions to produce biomedical diagrams will reduce the effort required and improve the quality of this visual form of scientific communication.

Nicholson DN, Rubinetti V, Hu D, Thielk M, Hunter LE, Greene CS. Examining linguistic shifts between preprints and publications. PLoS Biol 2022;20:e3001470. [PMID: 35104289 PMCID: PMC8806061 DOI: 10.1371/journal.pbio.3001470] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/12/2021] [Accepted: 11/05/2021] [Indexed: 11/19/2022] Open

Boguslav MR, Hailu ND, Bada M, Baumgartner WA, Hunter LE. Concept recognition as a machine translation problem. BMC Bioinformatics 2021;22:598. [PMID: 34920707 PMCID: PMC8678974 DOI: 10.1186/s12859-021-04141-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Accepted: 04/19/2021] [Indexed: 12/02/2022] Open

Abstract

BACKGROUND

Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data has impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models have the potential to outperform multi-class classification approaches.

METHODS

We systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning through extensive studies of alternative methods and hyperparameter selections. We not only identify the best-performing systems and parameters across a wide variety of ontologies but also provide insights into the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggest promising avenues for future improvements as well as design choices that can increase computational efficiency with small costs in performance.

RESULTS

Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) for span detection along with the open-source toolkit for neural machine translation (OpenNMT) for concept normalization achieve state-of-the-art performance for most ontologies annotated in the CRAFT Corpus. This approach uses substantially fewer computational resources, including hardware, memory, and time than several alternative approaches.

CONCLUSIONS

Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation .

Boguslav MR, Salem NM, White EK, Leach SM, Hunter LE. Identifying and classifying goals for scientific knowledge. Bioinform Adv 2021;1:vbab012. [PMID: 34661112 PMCID: PMC8508177 DOI: 10.1093/bioadv/vbab012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 06/17/2021] [Indexed: 01/26/2023]

Sullivan KJ, Burden M, Keniston A, Banda JM, Hunter LE. Characterization of Anonymous Physician Perspectives on COVID-19 Using Social Media Data. Pac Symp Biocomput 2021;26:95-106. [PMID: 33691008 PMCID: PMC7958992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]

Callahan TJ, Tripodi IJ, Pielke-Lombardo H, Hunter LE. Knowledge-Based Biomedical Data Science. Annu Rev Biomed Data Sci 2020;3:23-41. [PMID: 33954284 PMCID: PMC8095730 DOI: 10.1146/annurev-biodatasci-010820-091627] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/23/2022]

Tripodi IJ, Callahan TJ, Westfall JT, Meitzer NS, Dowell RD, Hunter LE. Applying knowledge-driven mechanistic inference to toxicogenomics. Toxicol In Vitro 2020;66:104877. [PMID: 32387679 DOI: 10.1016/j.tiv.2020.104877] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/20/2020] [Revised: 04/13/2020] [Accepted: 04/23/2020] [Indexed: 02/07/2023]

Cohen KB, Hunter LE, Pressman PS. P-Hacking Lexical Richness Through Definitions of "Type" and "Token". Stud Health Technol Inform 2019;264:1433-1434. [PMID: 31438167 PMCID: PMC8956251 DOI: 10.3233/shti190470] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]

Pressman PS, Ross ED, Cohen KB, Chen K, Miller BL, Hunter LE, Gorno‐Tempini ML, Levenson RW. Interpersonal prosodic correlation in frontotemporal dementia. Ann Clin Transl Neurol 2019;6:1352-1357. [PMID: 31353851 PMCID: PMC6649473 DOI: 10.1002/acn3.50816] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/05/2019] [Revised: 05/08/2019] [Accepted: 05/23/2019] [Indexed: 11/06/2022] Open

Zhang XA, Yates A, Vasilevsky N, Gourdine JP, Callahan TJ, Carmody LC, Danis D, Joachimiak MP, Ravanmehr V, Pfaff ER, Champion J, Robasky K, Xu H, Fecho K, Walton NA, Zhu RL, Ramsdill J, Mungall CJ, Köhler S, Haendel MA, McDonald CJ, Vreeman DJ, Peden DB, Bennett TD, Feinstein JA, Martin B, Stefanski AL, Hunter LE, Chute CG, Robinson PN. Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery. NPJ Digit Med 2019;2:32. [PMID: 31119199 PMCID: PMC6527418 DOI: 10.1038/s41746-019-0110-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/26/2018] [Accepted: 04/18/2019] [Indexed: 12/22/2022] Open

Affiliation(s)

Xingmin Aaron Zhang The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
Amy Yates Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
Nicole Vasilevsky Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA. Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
J. P. Gourdine Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA. Library, Oregon Health and Science University, Portland, OR 97239 USA
Tiffany J. Callahan Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
Leigh C. Carmody The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
Daniel Danis The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
Marcin P. Joachimiak Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
Vida Ravanmehr The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA
Emily R. Pfaff North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
James Champion North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
Kimberly Robasky North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA. Genetics Department, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA. School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
Hao Xu Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
Karamarie Fecho Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
Nephi A. Walton Genomic Medicine Institute, Geisinger Health System, Danville, PA 17822 USA
Richard L. Zhu Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
Justin Ramsdill Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
Christopher J. Mungall Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
Sebastian Köhler Charité Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, 10117 Germany Einstein Center Digital Future, Berlin, 10117 Germany
Melissa A. Haendel Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA Linus Pauling Institute and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR 97331 USA
Clement J. McDonald Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
Daniel J. Vreeman Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202 USA Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, IN 46202 USA
David B. Peden North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA Division of Allergy, Immunology and Rheumatology, Department of Pediatrics, University of North Carolina, Chapel Hill, NC 27599 USA University of North Carolina Center for Environmental Medicine, Asthma and Lung Biology, University of North Carolina, Chapel Hill, NC 27599 USA
Tellen D. Bennett Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
James A. Feinstein Adult and Child Consortium for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine, Aurora, CO 80045 USA
Blake Martin Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
Adrianne L. Stefanski Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
Lawrence E. Hunter Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
Christopher G. Chute Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
Peter N. Robinson The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032 USA Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032 USA

Cohen KB, Xia J, Zweigenbaum P, Callahan TJ, Hargraves O, Goss F, Ide N, Névéol A, Grouin C, Hunter LE. Three Dimensions of Reproducibility in Natural Language Processing. LREC Int Conf Lang Resour Eval 2018;2018:156-165. [PMID: 29911205 PMCID: PMC5998676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Callahan TJ, Baumgartner WA, Bada M, Stefanski AL, Tripodi I, White EK, Hunter LE. OWL-NETS: Transforming OWL Representations for Improved Network Inference. Pac Symp Biocomput 2018;23:133-144. [PMID: 29218876 PMCID: PMC5737627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Boguslav M, Cohen KB, Baumgartner WA, Hunter LE. Improving precision in concept normalization. Pac Symp Biocomput 2018;23:566-577. [PMID: 29218915 PMCID: PMC5730334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Bada M, Vasilevsky N, Baumgartner WA, Haendel M, Hunter LE. Gold-standard ontology-based anatomical annotation in the CRAFT Corpus. Database (Oxford) 2017;2017:4780291. [PMID: 31725864 PMCID: PMC7243923 DOI: 10.1093/database/bax087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/28/2017] [Revised: 10/25/2017] [Accepted: 10/27/2017] [Indexed: 12/24/2022]

Abstract

Gold-standard annotated corpora have become important resources for the training and testing of natural-language-processing (NLP) systems designed to support biocuration efforts, and ontologies are increasingly used to facilitate curational consistency and semantic integration across disparate resources. Bringing together the respective power of these, the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of full-length, open-access biomedical journal articles with extensive manually created syntactic, formatting and semantic markup, was previously created and released. This initial public release has already been used in multiple projects to drive development of systems focused on a variety of biocuration, search, visualization, and semantic and syntactic NLP tasks. Building on its demonstrated utility, we have expanded the CRAFT Corpus with a large set of manually created semantic annotations relying on Uberon, an ontology representing anatomical entities and life-cycle stages of multicellular organisms across species as well as types of multicellular organisms defined in terms of life-cycle stage and sexual characteristics. This newly created set of annotations, which has been added for v2.1 of the corpus, is by far the largest publicly available collection of gold-standard anatomical markup and is the first large-scale effort at manual markup of biomedical text relying on the entirety of an anatomical terminology, as opposed to annotation with a small number of high-level anatomical categories, as performed in previous corpora. In addition to presenting and discussing this newly available resource, we apply it to provide a performance baseline for the automatic annotation of anatomical concepts in biomedical text using a prominent concept recognition system. The full corpus, released with a CC BY 3.0 license, may be downloaded from http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

Database URL: http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml

Hunter LE. Knowledge-based biomedical Data Science. EPJ Data Sci 2017;1:19-25. [PMID: 30294517 PMCID: PMC6171523 DOI: 10.3233/ds-170001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Prabhu N, Osifodunrin N, Murphy D, Butler S, Hunter LE. Innovative Strategies for the Management of a Massive Neonatal Rhabdomyoma. J Pediatr Intensive Care 2017;7:90-93. [PMID: 31073477 DOI: 10.1055/s-0037-1606574] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2017] [Accepted: 08/09/2017] [Indexed: 09/30/2022] Open

Pouille F, McTavish TS, Hunter LE, Restrepo D, Schoppa NE. Intraglomerular gap junctions enhance interglomerular synchrony in a sparsely connected olfactory bulb network. J Physiol 2017;595:5965-5986. [PMID: 28640508 PMCID: PMC5577541 DOI: 10.1113/jp274408] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Accepted: 06/14/2017] [Indexed: 01/12/2023] Open

Abstract

KEY POINTS

Despite sparse connectivity, population-level interactions between mitral cells (MCs) and granule cells (GCs) can generate synchronized oscillations in the rodent olfactory bulb. Intraglomerular gap junctions between MCs at the same glomerulus can greatly enhance synchronized activity of MCs at different glomeruli. The facilitating effect of intraglomerular gap junctions on interglomerular synchrony is through triggering of mutually synchronizing interactions between MCs and GCs. Divergent connections between MCs and GCs make minimal direct contribution to synchronous activity.

ABSTRACT

A dominant feature of the olfactory bulb response to odour is fast synchronized oscillations at beta (15-40 Hz) or gamma (40-90 Hz) frequencies, thought to be involved in integration of olfactory signals. Mechanistically, the bulb presents an interesting case study for understanding how beta/gamma oscillations arise. Fast oscillatory synchrony in the activity of output mitral cells (MCs) appears to result from interactions with GABAergic granule cells (GCs), yet the incidence of MC-GC connections is very low, around 4%. Here, we combined computational and experimental approaches to examine how oscillatory synchrony can nevertheless arise, focusing mainly on activity between 'non-sister' MCs affiliated with different glomeruli (interglomerular synchrony). In a sparsely connected model of MCs and GCs, we found first that interglomerular synchrony was generally quite low, but could be increased by a factor of 4 by physiological levels of gap junctional coupling between sister MCs at the same glomerulus. This effect was due to enhanced mutually synchronizing interactions between MC and GC populations. The potent role of gap junctions was confirmed in patch-clamp recordings in bulb slices from wild-type and connexin 36-knockout (KO) mice. KO reduced both beta and gamma local field potential oscillations as well as synchrony of inhibitory signals in pairs of non-sister MCs. These effects were independent of potential KO actions on network excitation. Divergent synaptic connections did not contribute directly to the vast majority of synchronized signals. Thus, in a sparsely connected network, gap junctions between a small subset of cells can, through population effects, greatly amplify oscillatory synchrony amongst unconnected cells.

Cohen KB, Lanfranchi A, Choi MJY, Bada M, Baumgartner WA, Panteleyeva N, Verspoor K, Palmer M, Hunter LE. Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles. BMC Bioinformatics 2017;18:372. [PMID: 28818042 PMCID: PMC5561560 DOI: 10.1186/s12859-017-1775-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2016] [Accepted: 07/31/2017] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations.

RESULTS

The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached an F-measure of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus.

CONCLUSIONS

The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not generic in the biomedical domain because they refer to specific classes in domain-specific ontologies. The comparison of the performance of a publicly available and well-understood coreference resolution system with a domain-adapted system produced results that are consistent with the notion that the requirements for successful coreference resolution in this genre are quite different from those of the general domain, and also suggest that the baseline performance difference is quite large.

Hooper JE, Feng W, Li H, Leach SM, Phang T, Siska C, Jones KL, Spritz RA, Hunter LE, Williams T. Systems biology of facial development: contributions of ectoderm and mesenchyme. Dev Biol 2017;426:97-114. [PMID: 28363736 PMCID: PMC5530582 DOI: 10.1016/j.ydbio.2017.03.025] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2016] [Revised: 03/23/2017] [Accepted: 03/23/2017] [Indexed: 12/17/2022]

Abstract

The rapid increase in gene-centric biological knowledge coupled with analytic approaches for genome-wide data integration provides an opportunity to develop systems-level understanding of facial development. Experimental analyses have demonstrated the importance of signaling between the surface ectoderm and the underlying mesenchyme in coordinating facial patterning. However, current transcriptome data from the developing vertebrate face is dominated by the mesenchymal component, and the contributions of the ectoderm are not easily identified. We have generated transcriptome datasets from critical periods of mouse face formation that enable gene expression to be analyzed with respect to time, prominence, and tissue layer. Notably, by separating the ectoderm and mesenchyme we considerably improved the sensitivity compared to data obtained from whole prominences, with more genes detected over a wider dynamic range. From these data we generated a detailed description of ectoderm-specific developmental programs, including pan-ectodermal programs, prominence-specific programs and their temporal dynamics. The genes and pathways represented in these programs provide mechanistic insights into several aspects of ectodermal development. We also used these data to identify co-expression modules specific to facial development. We then used 14 co-expression modules enriched for genes involved in orofacial clefts to make specific mechanistic predictions about genes involved in tongue specification, in nasal process patterning and in jaw development. Our multidimensional gene expression dataset is a unique resource for systems analysis of the developing face; our co-expression modules are a resource for predicting functions of poorly annotated genes, or for predicting roles for genes that have yet to be studied in the context of facial development; and our analytic approaches provide a paradigm for analysis of other complex developmental programs.

Affiliation(s)

Joan E Hooper Department of Cell and Developmental Biology, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Computational Bioscience Program, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
Weiguo Feng Department of Cell and Developmental Biology, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Department of Craniofacial Biology, University of Colorado School of Dental Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
Hong Li Department of Craniofacial Biology, University of Colorado School of Dental Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
Sonia M Leach Department of Biomedical Research, National Jewish Health, 1400 Jackson Street, Denver, CO 80206, USA.
Tzulip Phang Computational Bioscience Program, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Department of Medicine, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
Charlotte Siska Computational Bioscience Program, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
Kenneth L Jones Department of Pediatrics, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
Richard A Spritz Human Medical Genetics and Genomics Program, University of Colorado School of Medicine, 12800 E 17th Avenue, Aurora, CO 80045, USA.
Lawrence E Hunter Computational Bioscience Program, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Department of Pharmacology, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.
Trevor Williams Department of Cell and Developmental Biology, University of Colorado School of Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA; Department of Craniofacial Biology, University of Colorado School of Dental Medicine, 12801 E 17th Avenue, Aurora, CO 80045, USA.

Greene CS, Garmire LX, Gilbert JA, Ritchie MD, Hunter LE. Celebrating parasites. Nat Genet 2017;49:483-484. [PMID: 28358134 PMCID: PMC5710834 DOI: 10.1038/ng.3830] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]

Moore JH, Jennings SF, Greene CS, Hunter LE, Perkins AD, Williams-Devane C, Wunsch DC, Zhao Z, Huang X. No-boundary thinking in bioinformatics. Pac Symp Biocomput 2017;22:646-648. [PMID: 27897015 DOI: 10.1142/9789813207813_0060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Cohen KB, Goss FR, Zweigenbaum P, Hunter LE. Translational Morphosyntax: Distribution of Negation in Clinical Records and Biomedical Journal Articles. Stud Health Technol Inform 2017;245:346-350. [PMID: 29295113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Yadav P, Jezek E, Bouillon P, Callahan TJ, Bada M, Hunter LE, Cohen KB. Semantic Relations in Compound Nouns: Perspectives from Inter-Annotator Agreement. Stud Health Technol Inform 2017;245:644-648. [PMID: 29295175 PMCID: PMC7781293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Funk CS, Cohen KB, Hunter LE, Verspoor KM. Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition. J Biomed Semantics 2016;7:52. [PMID: 27613112 PMCID: PMC5018193 DOI: 10.1186/s13326-016-0096-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2015] [Accepted: 08/05/2016] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Gene Ontology (GO) terms represent the standard for annotation and representation of molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic, consisting of parts and pieces from simpler terms.

RESULTS

We present two different types of manually generated rules to help capture the variation of how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes the terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to form the original term. By applying both types of rules, new synonyms are generated for two-thirds of all GO terms and an increase in F-measure performance for recognition of GO on the CRAFT corpus from 0.498 to 0.636 is observed. Additionally, we evaluated the combination of both types of rules over one million full text documents from Elsevier; manual validation and error analysis show we are able to recognize GO concepts with reasonable accuracy (88%) based on random sampling of annotations.

CONCLUSIONS

In this work we present a set of simple synonym generation rules that utilize the highly compositional and formulaic nature of the Gene Ontology concepts. We illustrate how the generated synonyms aid in improving recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO ontology quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.
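
As a rough illustration of the two rule types described in the Results above (recursive decomposition, then recombination of derivational variants), the following minimal Python sketch may help. It is not the authors' rule set: the prefix list, the variant table, and the function names `decompose` and `synonyms` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical derivational variants for a few constituent words
# (illustrative only; not taken from the published rules).
VARIANTS = {
    "regulation": ["regulation", "regulating", "regulates"],
    "apoptosis": ["apoptosis", "apoptotic process"],
}

# Hypothetical compositional patterns, longest first so that
# "positive regulation of " is matched before "regulation of ".
PREFIXES = ["positive regulation of ", "negative regulation of ", "regulation of "]


def decompose(term):
    """Recursively split a term into its smallest known constituent parts."""
    for prefix in PREFIXES:
        if term.startswith(prefix):
            head = prefix.strip().split(" ")
            return head + decompose(term[len(prefix):])
    return [term]


def synonyms(term):
    """Combine every variant of every part back into candidate synonyms."""
    parts = decompose(term)
    options = [VARIANTS.get(part, [part]) for part in parts]
    return sorted({" ".join(combo) for combo in product(*options)})


candidates = synonyms("positive regulation of apoptosis")
```

Recombining variants exhaustively also yields awkward strings such as "positive regulates of apoptosis", a small instance of the overgeneration issue the Conclusions mention; a practical system would need to filter such candidates.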

Cohen KB, Xia J, Roeder C, Hunter LE. Reproducibility in Natural Language Processing: A Case Study of Two R Libraries for Mining PubMed/MEDLINE. LREC Int Conf Lang Resour Eval 2016;2016:6-12. [PMID: 29568821 PMCID: PMC5860830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Hunter LE, Pushparajah K, Miller O, Anderson D, Simpson JM. Prenatal diagnosis of left ventricular diverticulum and coarctation of the aorta. Ultrasound Obstet Gynecol 2016;47:236-238. [PMID: 26376444 DOI: 10.1002/uog.15746] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2015] [Revised: 09/01/2015] [Accepted: 09/10/2015] [Indexed: 06/05/2023]

Karimpour-Fard A, Epperson LE, Hunter LE. A survey of computational tools for downstream analysis of proteomic and other omic datasets. Hum Genomics 2015;9:28. [PMID: 26510531 PMCID: PMC4624643 DOI: 10.1186/s40246-015-0050-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2015] [Accepted: 10/06/2015] [Indexed: 12/19/2022] Open

Vehlow C, Kao DP, Bristow MR, Hunter LE, Weiskopf D, Görg C. Visual analysis of biological data-knowledge networks. BMC Bioinformatics 2015;16:135. [PMID: 25925016 PMCID: PMC4456720 DOI: 10.1186/s12859-015-0550-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2014] [Accepted: 03/25/2015] [Indexed: 11/10/2022] Open

Livingston KM, Bada M, Baumgartner WA, Hunter LE. KaBOB: ontology-based semantic integration of biomedical databases. BMC Bioinformatics 2015;16:126. [PMID: 25903923 PMCID: PMC4448321 DOI: 10.1186/s12859-015-0559-3] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2014] [Accepted: 03/30/2015] [Indexed: 04/04/2023] Open

Abstract

Background

The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration, which must establish shared identity and shared meaning across heterogeneous biomedical data sources.

Results

We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license.

Conclusions

KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0559-3) contains supplementary material, which is available to authorized users.
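
One of the processes named in the Results above, aggregating sets of identifiers that denote the same biomedical concept across data sources, can be sketched generically with a union-find structure. This is an assumption-laden illustration, not KaBOB's implementation (which the abstract describes as rule-based over RDF); the function name `aggregate_identifiers` and the placeholder identifiers are hypothetical.

```python
def aggregate_identifiers(equivalences):
    """Group identifiers into sets, given pairs asserted to denote the same concept.

    `equivalences` is an iterable of (id_a, id_b) pairs, e.g. cross-references
    harvested from source databases. Uses union-find with path compression.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we walk up
            x = parent[x]
        return x

    for a, b in equivalences:
        parent[find(a)] = find(b)  # merge the two identifier sets

    groups = {}
    for ident in list(parent):
        groups.setdefault(find(ident), set()).add(ident)
    return sorted(sorted(group) for group in groups.values())


# Placeholder identifiers from three hypothetical sources A, B and C.
pairs = [("A:1", "B:7"), ("B:7", "C:3"), ("A:2", "B:9")]
identifier_sets = aggregate_identifiers(pairs)
```

Here `A:1`, `B:7` and `C:3` end up in one set because equivalence is treated as transitive, which is the property that lets a single concept node stand in for records from several databases.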

Albrecht SV, Barreto AMS, Braziunas D, Buckeridge DL, Cuayáhuitl H, Dethlefs N, Endres M, Farahmand AM, Fox M, Frommberger L, Ganzfried S, Gil Y, Guillet S, Hunter LE, Jhala A, Kersting K, Konidaris G, Lecue F, McIlraith S, Natarajan S, Noorian Z, Poole D, Ronfard R, Saffiotti A, Shaban-Nejad A, Srivastava B, Tesauro G, Uceda-Sosa R, Van den Broeck G, Van Otterlo M, Wallace BC, Weng P, Wiens J, Zhang J. Reports of the AAAI 2014 Conference Workshops. AI MAG 2015. [DOI: 10.1609/aimag.v36i1.2575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]

Hewett D, Whirl-Carrillo M, Hunter LE, Altman RB, Klein TE. A twentieth anniversary tribute to PSB. Pac Symp Biocomput 2015:1-7. [PMID: 25592562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Hinterberg MA, Kao DP, Bristow MR, Hunter LE, Port JD, Görg C. Peax: interactive visual analysis and exploration of complex clinical phenotype and gene expression association. Pac Symp Biocomput 2015:419-30. [PMID: 25592601 PMCID: PMC4344826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Pattin KA, Greene AC, Altman RB, Cohen KB, Wethington E, Görg C, Hunter LE, Muse SV, Radivojac P, Moore JH. Training the next generation of quantitative biologists in the era of big data. Pac Symp Biocomput 2015:488-492. [PMID: 25592609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Hailu ND, Cohen KB, Hunter LE. Ontology translation: A case study on translating the Gene Ontology from English to German. Nat Lang Process Inf Syst 2014;8455:33-38. [PMID: 29780975 PMCID: PMC5954410 DOI: 10.1007/978-3-319-07983-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Funk C, Baumgartner W, Garcia B, Roeder C, Bada M, Cohen KB, Hunter LE, Verspoor K. Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters. BMC Bioinformatics 2014;15:59. [PMID: 24571547 PMCID: PMC4015610 DOI: 10.1186/1471-2105-15-59] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2013] [Accepted: 01/24/2014] [Indexed: 11/10/2022] Open

Livingston KM, Bada M, Hunter LE, Verspoor K. Representing annotation compositionality and provenance for the Semantic Web. J Biomed Semantics 2013;4:38. [PMID: 24268021 PMCID: PMC4129183 DOI: 10.1186/2041-1480-4-38] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2013] [Accepted: 09/20/2013] [Indexed: 12/03/2022] Open

Hunter LE. Rocky Mountain Conference on Bioinformatics Celebrates 10 Years. PLoS Comput Biol 2013;9:e1003076. [PMID: 23737739 PMCID: PMC3667766 DOI: 10.1371/journal.pcbi.1003076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2013] [Accepted: 03/31/2013] [Indexed: 11/28/2022] Open

Cohen KB, Hunter LE. Chapter 16: text mining for translational bioinformatics. PLoS Comput Biol 2013;9:e1003044. [PMID: 23633944 PMCID: PMC3635962 DOI: 10.1371/journal.pcbi.1003044] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/27/2022] Open

Cohen KB, Hunter LE, Palmer M. Assessment of software testing and quality assurance in natural language processing applications and a linguistically inspired approach to improving it. Trust Eternal Syst Via Evol Softw Data Knowl (2012) 2013;379:77-90. [PMID: 34308448 PMCID: PMC8300901 DOI: 10.1007/978-3-642-45260-4_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]

Hunter LE, Hopfer C, Terry SF, Coors ME. Reporting actionable research results: shared secrets can save lives. Sci Transl Med 2012;4:143cm8. [PMID: 22814848 DOI: 10.1126/scitranslmed.3003958] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]

Frantz AM, Sarver AL, Ito D, Phang TL, Karimpour-Fard A, Scott MC, Valli VEO, Lindblad-Toh K, Burgess KE, Husbands BD, Henson MS, Borgatti A, Kisseberth WC, Hunter LE, Breen M, O'Brien TD, Modiano JF. Molecular profiling reveals prognostically significant subtypes of canine lymphoma. Vet Pathol 2012;50:693-703. [PMID: 23125145 DOI: 10.1177/0300985812465325] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Indexed: 12/26/2022]