Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 10 (from Reference Citation Analysis)

Searched Name: Rezarta Islamaj Dogan

Citation Analysis

Islamaj Dogan R, Kim S, Chatr-Aryamontri A, Wei CH, Comeau DC, Antunes R, Matos S, Chen Q, Elangovan A, Panyam NC, Verspoor K, Liu H, Wang Y, Liu Z, Altinel B, Hüsünbeyi ZM, Özgür A, Fergadis A, Wang CK, Dai HJ, Tran T, Kavuluru R, Luo L, Steppi A, Zhang J, Qu J, Lu Z. Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine. Database (Oxford) 2019;2019:5303240. [PMID: 30689846 PMCID: PMC6348314 DOI: 10.1093/database/bay147] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/02/2018] [Accepted: 12/19/2018] [Indexed: 12/16/2022]

Abstract

The Precision Medicine Initiative is a multicenter effort that aims to formulate personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The BioCreative VI Precision Medicine Track aimed to extract this particular type of information and was organized in two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the precision medicine track to be successful in engaging the text-mining research community. The track also produced a manually annotated corpus of 5509 PubMed documents, developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
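
The precision, recall and F-score figures quoted above follow the standard definitions. As a minimal sketch, assuming the triage task is scored by comparing a set of predicted-relevant document IDs with a set of human-annotated relevant IDs, the three values could be computed as follows. The function name and the toy document IDs are hypothetical and are not taken from the track's evaluation scripts.

```python
def triage_scores(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold relevant document IDs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): four relevant documents, three predictions, two correct.
gold = {"d1", "d2", "d3", "d4"}
predicted = {"d1", "d2", "d5"}
p, r, f = triage_scores(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Note that the best precision, best recall and best F-score reported in the abstract may come from different submitted runs, so they need not satisfy the F-score formula jointly.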

Affiliation(s)

Rezarta Islamaj Dogan National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Sun Kim National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Andrew Chatr-Aryamontri Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada
Chih-Hsuan Wei National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Donald C Comeau National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Rui Antunes Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
Sérgio Matos Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
Qingyu Chen School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Aparna Elangovan School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Nagesh C Panyam School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Karin Verspoor School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Hongfang Liu Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
Yanshan Wang Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
Zhuang Liu School of Computer Science and Technology, Dalian University of Technology, Dalian, China
Berna Altinel Department of Computer Engineering, Marmara University, Istanbul, Turkey
Zehra Melce Hüsünbeyi Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
Arzucan Özgür
Aris Fergadis School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Athens, Greece
Chen-Kai Wang Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
Hong-Jie Dai Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
Tung Tran Department of Computer Science, University of Kentucky, Lexington, KY, USA
Ramakanth Kavuluru Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
Ling Luo College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Albert Steppi Department of Statistics, Florida State University, Florida, USA
Jinfeng Zhang Department of Statistics, Florida State University, Florida, USA
Jinchan Qu Department of Statistics, Florida State University, Florida, USA
Zhiyong Lu National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

Islamaj Dogan R, Kim S, Chatr-Aryamontri A, Chang CS, Oughtred R, Rust J, Wilbur WJ, Comeau DC, Dolinski K, Tyers M. The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions. Database (Oxford) 2017;2017:baw147. [PMID: 28077563 PMCID: PMC5225395 DOI: 10.1093/database/baw147] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/30/2016] [Revised: 10/14/2016] [Accepted: 10/18/2016] [Indexed: 11/13/2022]

Abstract

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To address this, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized, computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein interactions (PPIs) and genetic interactions (GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report.

Database URL: http://bioc.sourceforge.net/BioC-BioGRID.html

Leaman R, Islamaj Dogan R, Lu Z. DNorm: disease name normalization with pairwise learning to rank. Bioinformatics 2013;29:2909-17. [PMID: 23969135 PMCID: PMC3810844 DOI: 10.1093/bioinformatics/btt474] [Citation(s) in RCA: 232] [Impact Index Per Article: 21.1] [Indexed: 11/15/2022]

Dogan RI, Gil Y, Hirsh H, Krishnan NC, Lewis M, Mericli C, Rashidi P, Raskin V, Swarup S, Sun W, Taylor JM, Yeganova L. Reports on the 2012 AAAI Fall Symposium Series. AI MAG 2012. [DOI: 10.1609/aimag.v34i1.2457] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/01/2022]

Krallinger M, Vazquez M, Leitner F, Salgado D, Chatr-aryamontri A, Winter A, Perfetto L, Briganti L, Licata L, Iannuccelli M, Castagnoli L, Cesareni G, Tyers M, Schneider G, Rinaldi F, Leaman R, Gonzalez G, Matos S, Kim S, Wilbur WJ, Rocha L, Shatkay H, Tendulkar AV, Agarwal S, Liu F, Wang X, Rak R, Noto K, Elkan C, Lu Z, Dogan RI, Fontaine JF, Andrade-Navarro MA, Valencia A. The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinformatics 2011;12 Suppl 8:S3. [PMID: 22151929 PMCID: PMC3269938 DOI: 10.1186/1471-2105-12-s8-s3] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.2] [Indexed: 11/30/2022]

Abstract

BACKGROUND

Determining the usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this purpose, the BCIII-ACT corpus was provided; it includes training, development and test sets of over 12,000 PPI-relevant and non-relevant PubMed abstracts, labeled manually by domain experts, and also records the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them.

RESULTS

A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. Of the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89%, and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, and the best MCC score was 0.55. For competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-Score of 55%.
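
The results above pair an MCC with an accuracy figure. As a minimal sketch of why both are reported, the snippet below computes accuracy and the Matthews correlation coefficient from a binary confusion matrix using the standard textbook formulas. The counts are invented for illustration and are not the BioCreative III data; they only show that, on an unbalanced collection, a high accuracy can coexist with a modest MCC.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all documents that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical unbalanced test set: 100 relevant and 900 non-relevant abstracts.
tp, fp, tn, fn = 50, 50, 850, 50
print(f"accuracy={accuracy(tp, fp, tn, fn):.2f}")  # accuracy=0.90
print(f"mcc={mcc(tp, fp, tn, fn):.3f}")            # mcc=0.444
```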

CONCLUSIONS

The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.

Affiliation(s)

Martin Krallinger Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Miguel Vazquez Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Florian Leitner Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
David Salgado Australian Regenerative Medicine Institute, Monash University, Australia
Andrew Chatr-aryamontri School of Biological Sciences, University of Edinburgh, Edinburgh, UK
Andrew Winter School of Biological Sciences, University of Edinburgh, Edinburgh, UK
Livia Perfetto Department of Biology, University of Rome Tor Vergata, Rome, Italy
Leonardo Briganti Department of Biology, University of Rome Tor Vergata, Rome, Italy
Luana Licata Department of Biology, University of Rome Tor Vergata, Rome, Italy
Marta Iannuccelli Department of Biology, University of Rome Tor Vergata, Rome, Italy
Luisa Castagnoli Department of Biology, University of Rome Tor Vergata, Rome, Italy
Gianni Cesareni Department of Biology, University of Rome Tor Vergata, Rome, Italy; IRCCS, Fondazione Santa Lucia, Rome, Italy
Mike Tyers School of Biological Sciences, University of Edinburgh, Edinburgh, UK
Gerold Schneider Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
Fabio Rinaldi Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
Robert Leaman School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, Arizona, USA
Graciela Gonzalez Department of Biomedical Informatics, Arizona State University, Tempe, Arizona, USA
Sergio Matos Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro Campus Universitario de Santiago, 3810-193 Aveiro, Portugal
Sun Kim National Center for Biotechnology Information (NCBI), 8600 Rockville Pike, Bethesda, Maryland, 20894, USA
W John Wilbur National Center for Biotechnology Information (NCBI), 8600 Rockville Pike, Bethesda, Maryland, 20894, USA
Luis Rocha School of Informatics and Computing, Indiana University, 919 E. 10th St Bloomington IN, 47408, USA
Hagit Shatkay Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
Ashish V Tendulkar Department of Computer Science and Engineering, IIT Madras, Chennai-600 036, India
Shashank Agarwal Medical Informatics, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
Feifan Liu Medical Informatics, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
Xinglong Wang National Centre for Text Mining and School of Computer Science, University of Manchester, Manchester, UK
Rafal Rak National Centre for Text Mining and School of Computer Science, University of Manchester, Manchester, UK
Keith Noto Department of Computer Science, Tufts University, 161 College Ave, Medford, MA 02155, USA
Charles Elkan Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
Zhiyong Lu National Center for Biotechnology Information (NCBI), 8600 Rockville Pike, Bethesda, Maryland, 20894, USA
Rezarta Islamaj Dogan National Center for Biotechnology Information (NCBI), 8600 Rockville Pike, Bethesda, Maryland, 20894, USA
Jean-Fred Fontaine Computational Biology and Data Mining Group, Max-Delbrück-Centrum für Molekulare Medizin, Robert-Rössle-Str. 10, 13125 Berlin, Germany
Miguel A Andrade-Navarro Computational Biology and Data Mining Group, Max-Delbrück-Centrum für Molekulare Medizin, Robert-Rössle-Str. 10, 13125 Berlin, Germany
Alfonso Valencia Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain

Mork JG, Bodenreider O, Demner-Fushman D, Dogan RI, Lang FM, Lu Z, Névéol A, Peters L, Shooshan SE, Aronson AR. Extracting Rx information from clinical narrative. J Am Med Inform Assoc 2010;17:536-9. [PMID: 20819859 DOI: 10.1136/jamia.2010.003970] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/04/2022]

Islamaj Dogan R, Murray GC, Névéol A, Lu Z. Understanding PubMed user search behavior through log analysis. Database (Oxford) 2009;2009:bap018. [PMID: 20157491 PMCID: PMC2797455 DOI: 10.1093/database/bap018] [Citation(s) in RCA: 134] [Impact Index Per Article: 8.9] [Received: 04/20/2009] [Revised: 10/05/2009] [Accepted: 10/06/2009] [Indexed: 11/20/2022]

Dogan RI, Getoor L, Wilbur WJ, Mount SM. Features generated for computational splice-site prediction correspond to functional elements. BMC Bioinformatics 2007;8:410. [PMID: 17958908 PMCID: PMC2241647 DOI: 10.1186/1471-2105-8-410] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 03/19/2007] [Accepted: 10/24/2007] [Indexed: 11/16/2022]

Zotenko E, Islamaj Dogan R, Wilbur WJ, O'Leary DP, Przytycka TM. Structural footprinting in protein structure comparison: the impact of structural fragments. BMC Struct Biol 2007;7:53. [PMID: 17688700 PMCID: PMC2082327 DOI: 10.1186/1472-6807-7-53] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/26/2007] [Accepted: 08/09/2007] [Indexed: 11/23/2022]

Abstract

Background

One approach for speeding up protein structure comparison is the projection approach, where a protein structure is mapped to a high-dimensional vector and structural similarity is approximated by distance between the corresponding vectors. Structural footprinting methods are projection methods that employ the same general technique to produce the mapping: first select a representative set of structural fragments as models and then map a protein structure to a vector in which each dimension corresponds to a particular model and "counts" the number of times the model appears in the structure. The main difference between any two structural footprinting methods is in the set of models they use; in fact a large number of methods can be generated by varying the type of structural fragments used and the amount of detail in their representation. How do these choices affect the ability of the method to detect various types of structural similarity?
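
The projection idea described above can be sketched in a few lines. This is an illustration only, under strong simplifying assumptions: each structure is taken to be already decomposed into fragments, each fragment is already assigned to one model label (the hard part of any real footprinting method, not shown here), and similarity is measured with Euclidean distance, which is an assumption since the abstract only says "distance". The model names and example structures are hypothetical.

```python
import math
from collections import Counter


def footprint(fragment_labels, models):
    """Map a structure to a count vector with one dimension per model."""
    counts = Counter(fragment_labels)
    # Counter returns 0 for models that never occur in the structure.
    return [counts[model] for model in models]


def euclidean(u, v):
    """Euclidean distance between two equal-length footprint vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical model set and two toy structures given as fragment labels.
models = ["helix", "strand", "turn"]
structure_a = footprint(["helix", "helix", "turn", "strand"], models)
structure_b = footprint(["helix", "strand", "strand", "strand"], models)
distance = euclidean(structure_a, structure_b)
print(structure_a, structure_b, round(distance, 3))
```

Changing the contents of `models` corresponds to the choice the abstract highlights: different fragment types and levels of detail yield different footprinting methods.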

Results

To answer this question we benchmarked three structural footprinting methods that vary significantly in their selection of models against the CATH database. In the first set of experiments we compared the methods' ability to detect structural similarity characteristic of evolutionarily related structures, i.e., structures within the same CATH superfamily. In the second set of experiments we tested the methods' agreement with the boundaries imposed by classification groups at the Class, Architecture, and Fold levels of the CATH hierarchy.

Conclusion

In both experiments we found that the method which uses secondary structure information has the best performance on average, but no single method consistently performs best across all groups at a given classification level. We also found that combining the methods' outputs significantly improves the performance. Moreover, our new techniques to measure and visualize the methods' agreement with the CATH hierarchy, including the thresholded affinity graph, are useful beyond this work. In particular, they can be used to expose a similar composition of different classification groups in terms of structural fragments used by the method and thus provide an alternative demonstration of the continuous nature of the protein structure universe.

Dogan RI, Getoor L, Wilbur WJ, Mount SM. SplicePort--an interactive splice-site analysis tool. Nucleic Acids Res 2007;35:W285-91. [PMID: 17576680 PMCID: PMC1933122 DOI: 10.1093/nar/gkm407] [Citation(s) in RCA: 177] [Impact Index Per Article: 10.4] [Indexed: 11/17/2022]