Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Cited by Other Article(s)

Abdelmageed N, Löffler F, Feddoul L, Algergawy A, Samuel S, Gaikwad J, Kazem A, König-Ries B. BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain. Biodivers Data J 2022;10:e89481. [PMID: 36761617 PMCID: PMC9836593 DOI: 10.3897/bdj.10.e89481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 09/07/2022] [Indexed: 11/12/2022]

Affiliation(s)

Nora Abdelmageed Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany; Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
Felicitas Löffler Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
Leila Feddoul Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
Alsayed Algergawy Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
Sheeba Samuel Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany; Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
Jitendra Gaikwad Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
Anahita Kazem Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany; German Center for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
Birgitta König-Ries Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany; Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany; German Center for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany

Kruesi L, Burstein F, Tanner K. A knowledge management system framework for an open biomedical repository: communities, collaboration and corroboration. Journal of Knowledge Management 2020. [DOI: 10.1108/jkm-05-2020-0370] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]

Abstract

PURPOSE

The purpose of this study is to assess the opportunity for a distributed, networked open biomedical repository (OBR) using a knowledge management system (KMS) conceptual framework. An innovative KMS conceptual framework is proposed to guide the transition from a traditional, siloed approach to a sustainable OBR.

DESIGN/METHODOLOGY/APPROACH

This paper reports on a cycle of action research, involving literature review, interviews and focus group with leaders in biomedical research, open science and librarianship, and an audit of elements needed for an Australasian OBR; these, along with an Australian KM standard, informed the resultant KMS framework.

FINDINGS

The proposed KMS framework aligns the requirements for an OBR with the people, process, technology and content elements of the KM standard. It identifies and defines nine processes underpinning biomedical knowledge – discovery, creation, representation, classification, storage, retrieval, dissemination, transfer and translation. The results comprise an explanation of these processes and examples of the people, process, technology and content dimensions of each process. While the repository is an integral cog within the collaborative, distributed open science network, its effectiveness depends on understanding the relationships and linkages between system elements and achieving an appropriate balance between them.

RESEARCH LIMITATIONS/IMPLICATIONS

The current research has focused on biomedicine. This research builds on the worldwide effort to reduce barriers, in particular paywalls to health knowledge. The findings present an opportunity to rationalize and improve a KMS integral to biomedical knowledge.

PRACTICAL IMPLICATIONS

Adoption of the KMS framework for a distributed, networked OBR will facilitate open science through reducing duplication of effort, removing barriers to the flow of knowledge and ensuring effective management of biomedical knowledge.

SOCIAL IMPLICATIONS

Achieving quality, permanency and discoverability of a region’s digital assets is possible through ongoing usage of the framework for researchers, industry and consumers.

ORIGINALITY/VALUE

The framework demonstrates the dependencies and interplay of elements and processes to frame an OBR KMS.

Roberts K, Alam T, Bedrick S, Demner-Fushman D, Lo K, Soboroff I, Voorhees E, Wang LL, Hersh WR. TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19. J Am Med Inform Assoc 2020;27:1431-1436. [PMID: 32365190 PMCID: PMC7239098 DOI: 10.1093/jamia/ocaa091] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 04/28/2020] [Accepted: 05/01/2020] [Indexed: 11/17/2022]

Anjaria KA. Computational implementation and formalism of FAIR data stewardship principles. Data Technologies and Applications 2020. [DOI: 10.1108/dta-09-2019-0164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]

A content-based literature recommendation system for datasets to improve data reusability - A case study on Gene Expression Omnibus (GEO) datasets. J Biomed Inform 2020;104:103399. [PMID: 32151769 DOI: 10.1016/j.jbi.2020.103399] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/30/2019] [Revised: 02/26/2020] [Accepted: 03/01/2020] [Indexed: 02/02/2023]

Abstract

OBJECTIVE

The centrality of data to biomedical research is difficult to overstate, and the same is true for the importance of the biomedical literature in disseminating empirical findings to scientific questions made on such data. But the connections between the literature and related datasets are often weak, hampering the ability of scientists to easily move between existing datasets and existing findings to derive new scientific hypotheses. This work aims to recommend relevant literature articles for datasets with the ultimate goal of increasing the productivity of researchers. Our approach to literature recommendation for datasets is a part of the dataset reusability platform developed at the University of Texas Health Science Center at Houston for datasets related to gene expression. This platform incorporates datasets from Gene Expression Omnibus (GEO). An average of 34 datasets were added to GEO daily in the last five years (i.e. 2014 to 2018), demonstrating the need for automatic methods to connect these datasets with relevant literature. The relevant literature for a given dataset may describe that dataset, provide a scientific finding based on that dataset, or even describe prior and related work to the dataset's topic that is of interest to users of the dataset.

MATERIALS AND METHODS

We adopt an information retrieval paradigm for literature recommendation. In our experiments, distributional semantic features are created from the title and abstract of MEDLINE articles. Then, related articles are identified for datasets in GEO. We evaluate multiple distributional methods such as TF-IDF, BM25, Latent Semantic Analysis, Latent Dirichlet Allocation, word2vec, and doc2vec. Top similar papers are recommended for each dataset using cosine similarity between the dataset's vector representation and every paper's vector representation. We also propose several novel re-ranking and normalization methods over embeddings to improve the recommendations.
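The ranking step described above (vectorize the dataset description and each paper, then sort papers by cosine similarity) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses TF-IDF, one of the listed methods, with a smoothed IDF and a whitespace tokenizer that are assumptions made here, and the toy dataset description and paper titles are invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption of this sketch).
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(dataset_text, papers, top_k=3):
    """Rank papers by cosine similarity to the dataset description."""
    vectors = tfidf_vectors([dataset_text] + papers)
    query, paper_vectors = vectors[0], vectors[1:]
    scored = [(cosine(query, pv), i) for i, pv in enumerate(paper_vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical toy data, invented for illustration only.
dataset = "gene expression profiling of breast cancer tumor samples"
papers = [
    "breast cancer gene expression signatures predict outcome",
    "soil microbial diversity in alpine meadows",
    "lung tumor imaging with computed tomography",
]
ranking = recommend(dataset, papers, top_k=3)
for index, score in ranking:
    print(f"{score:.3f}  {papers[index]}")
```

The re-ranking and normalization steps mentioned in the abstract are not reproduced here, since the abstract does not specify them.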

RESULTS

The top-performing literature recommendation technique achieved a strict precision at 10 of 0.8333 and a partial precision at 10 of 0.9000 using BM25 based on a manual evaluation of 36 datasets. Evaluation on a larger, automatically-collected benchmark shows small but consistent gains by emphasizing the similarity of dataset and article titles.
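As a rough illustration of the two reported metrics, the sketch below computes precision at 10 under a strict and a partial criterion. The grading scheme is an assumption made for this example (2 = relevant, 1 = partially relevant, 0 = not relevant); the abstract does not define how the manual judgments were encoded, and the grades shown are invented.

```python
def precision_at_k(grades, k=10, min_grade=2):
    """Fraction of the top-k results whose grade is at least min_grade.

    Assumed grading (not from the source): 2 = relevant,
    1 = partially relevant, 0 = not relevant.
    """
    hits = sum(1 for g in grades[:k] if g >= min_grade)
    return hits / k


def mean_precision_at_k(all_grades, k=10, min_grade=2):
    """Average precision@k over several ranked lists (one per dataset)."""
    return sum(precision_at_k(g, k, min_grade) for g in all_grades) / len(all_grades)


# Hypothetical judgments for one dataset's top-10 recommendations.
grades = [2, 2, 1, 2, 0, 2, 2, 1, 2, 2]
strict = precision_at_k(grades, k=10, min_grade=2)   # counts only grade 2
partial = precision_at_k(grades, k=10, min_grade=1)  # counts grades 1 and 2
print(strict, partial)
```

Under this reading, the partial score can never be lower than the strict score for the same ranking, which is consistent with the 0.8333 and 0.9000 figures above.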

CONCLUSION

This work is the first step toward developing a literature recommendation tool by recommending relevant literature for datasets. This will hopefully lead to a better data reuse experience.

Patra BG, Roberts K, Wu H. A content-based dataset recommendation system for researchers-a case study on Gene Expression Omnibus (GEO) repository. Database: The Journal of Biological Databases and Curation 2020;2020:1. [PMID: 33002137 PMCID: PMC7659921 DOI: 10.1093/database/baaa064] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/24/2020] [Revised: 07/19/2020] [Accepted: 07/27/2020] [Indexed: 11/13/2022]

Abstract

It is a growing trend among researchers to make their data publicly available for experimental reproducibility and data reusability. Sharing data with fellow researchers helps in increasing the visibility of the work. On the other hand, there are researchers who are inhibited by the lack of data resources. To overcome this challenge, many repositories and knowledge bases have been established to date to ease data sharing. Further, in the past two decades, there has been an exponential increase in the number of datasets added to these dataset repositories. However, most of these repositories are domain-specific, and none of them can recommend datasets to researchers/users. Naturally, it is challenging for a researcher to keep track of all the relevant repositories for potential use. Thus, a dataset recommender system that recommends datasets to a researcher based on previous publications can enhance their productivity and expedite further research. This work adopts an information retrieval (IR) paradigm for dataset recommendation. We hypothesize that two fundamental differences exist between dataset recommendation and PubMed-style biomedical IR beyond the corpus. First, instead of keywords, the query is the researcher, embodied by his or her publications. Second, to filter the relevant datasets from non-relevant ones, researchers are better represented by a set of interests, as opposed to the entire body of their research. This second approach is implemented using a non-parametric clustering technique. These clusters are used to recommend datasets for each researcher using the cosine similarity between the vector representations of publication clusters and datasets. The maximum normalized discounted cumulative gain at 10 (NDCG@10), precision at 10 (p@10) partial and p@10 strict of 0.89, 0.78 and 0.61, respectively, were obtained using the proposed method after manual evaluation by five researchers. 
To the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, offset the researchers' workload in identifying the right dataset and increase the reusability of biomedical datasets. Database URL: http://genestudy.org/recommends/#/.
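For readers unfamiliar with NDCG@10, the following sketch shows one common way to compute it from graded relevance judgments. It is an illustration under stated assumptions and not the evaluation code used in the study: it uses the linear-gain form of DCG and derives the ideal ranking from the judged list itself, and the example grades are invented.

```python
import math


def dcg_at_k(grades, k=10):
    """Discounted cumulative gain: each grade is discounted by log2(rank + 1)."""
    return sum(
        grade / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(grades, k=10):
    """DCG normalized by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # no relevant item was judged
    return dcg_at_k(grades, k) / ideal


# Hypothetical graded judgments (2 = relevant, 1 = partial, 0 = not relevant).
perfect = [2, 2, 1, 0]
swapped = [1, 2]
print(ndcg_at_k(perfect), ndcg_at_k(swapped))
```

A ranking that already places the most relevant datasets first scores 1.0; any misordering pulls the score below 1.0, with mistakes near the top penalized most.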

Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 2018;25:300-308. [PMID: 29346583 PMCID: PMC7378878 DOI: 10.1093/jamia/ocx121] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 05/30/2017] [Revised: 09/20/2017] [Accepted: 09/28/2017] [Indexed: 12/17/2022]

Affiliation(s)

Xiaoling Chen School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Anupama E Gururaj School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Burak Ozyurt Center for Research in Biological Systems
Ruiling Liu School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Ergin Soysal School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Trevor Cohen School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Firat Tiryaki School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Yueling Li Center for Research in Biological Systems
Nansu Zong Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Min Jiang School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Deevakar Rogith School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Mandana Salimi School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Hyeon-Eui Kim Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Philippe Rocca-Serra e-Research Centre, University of Oxford, Oxford, UK
Alejandra Gonzalez-Beltran e-Research Centre, University of Oxford, Oxford, UK
Claudiu Farcas Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Todd Johnson School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Ron Margolis National Institutes of Health, Bethesda, MD, USA
George Alter University of Michigan, Ann Arbor, MI, USA
Susanna-Assunta Sansone e-Research Centre, University of Oxford, Oxford, UK
Ian M Fore National Institutes of Health, Bethesda, MD, USA
Lucila Ohno-Machado Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Jeffrey S Grethe Center for Research in Biological Systems
Hua Xu School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA

Karisani P, Qin ZS, Agichtein E. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval. Database (Oxford) 2018;2018:4956082. [PMID: 29688379 PMCID: PMC5887275 DOI: 10.1093/database/bax104] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/16/2017] [Revised: 11/12/2017] [Accepted: 12/20/2017] [Indexed: 11/17/2022]

Cieslewicz A, Dutkiewicz J, Jedrzejek C. Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016. Database (Oxford) 2018;2018:4930756. [PMID: 29688372 PMCID: PMC5846287 DOI: 10.1093/database/bax103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/23/2017] [Revised: 12/18/2017] [Accepted: 12/18/2017] [Indexed: 11/23/2022]

Wei W, Ji Z, He Y, Zhang K, Ha Y, Li Q, Ohno-Machado L. Finding relevant biomedical datasets: the UC San Diego solution for the bioCADDIE Retrieval Challenge. Database: The Journal of Biological Databases and Curation 2018;2018:4939515. [PMID: 29688374 PMCID: PMC5861401 DOI: 10.1093/database/bay017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/16/2017] [Accepted: 01/30/2018] [Indexed: 01/28/2023]

Scerri A, Kuriakose J, Deshmane AA, Stanger M, Cotroneo P, Moore R, Naik R, de Waard A. Elsevier's approach to the bioCADDIE 2016 Dataset Retrieval Challenge. Database: The Journal of Biological Databases and Curation 2017;2017:4090923. [PMID: 29220454 PMCID: PMC5737073 DOI: 10.1093/database/bax056] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/16/2017] [Accepted: 06/29/2017] [Indexed: 11/13/2022]

Bouadjenek MR, Verspoor K. Multi-field query expansion is effective for biomedical dataset retrieval. Database: The Journal of Biological Databases and Curation 2017;2017:4107606. [PMID: 29220457 PMCID: PMC5737205 DOI: 10.1093/database/bax062] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/20/2017] [Accepted: 07/31/2017] [Indexed: 01/01/2023]

Abstract

In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one.
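The Rocchio strategy mentioned above can be illustrated with a small pseudo-relevance-feedback sketch: the original query vector is combined with the centroid of the top-ranked documents from an initial search, and the highest-weighted new terms are appended to the query. This is a generic textbook form and not the authors' system: the multi-field handling is omitted, the negative-feedback term is dropped, raw term counts stand in for real term weights, and the `alpha`, `beta` and `n_terms` values and the sample texts are illustrative assumptions.

```python
from collections import Counter


def term_vector(text):
    """Raw term-count vector from a lowercase whitespace split."""
    return Counter(text.lower().split())


def rocchio_expand(query, feedback_docs, alpha=1.0, beta=0.75, n_terms=3):
    """Return (term weights, expansion terms) using positive feedback only.

    weight(t) = alpha * q(t) + beta * mean over feedback docs of d(t)
    """
    q_vec = term_vector(query)
    centroid = Counter()
    for doc in feedback_docs:
        centroid.update(term_vector(doc))
    n_docs = len(feedback_docs)

    weights = {term: alpha * count for term, count in q_vec.items()}
    for term, total in centroid.items():
        weights[term] = weights.get(term, 0.0) + beta * total / n_docs

    # Expansion terms: highest-weighted terms not already in the query,
    # with alphabetical order as a deterministic tie-break.
    new_terms = sorted(
        (t for t in weights if t not in q_vec),
        key=lambda t: (-weights[t], t),
    )[:n_terms]
    return weights, new_terms


# Hypothetical query and top-ranked results from a first retrieval pass.
query = "breast cancer"
feedback = [
    "breast cancer mrna expression dataset",
    "breast tumor expression microarray dataset",
]
weights, new_terms = rocchio_expand(query, feedback, n_terms=2)
print(query + " " + " ".join(new_terms))
```

The need for a first retrieval pass to obtain `feedback` is the cost the abstract refers to when it calls the lexicon-based alternative less resource intensive.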

Wang Y, Rastegar-Mojarad M, Komandur-Elayavilli R, Liu H. Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts. Database (Oxford) 2017;2017:bax091. [PMID: 31725862 PMCID: PMC7243926 DOI: 10.1093/database/bax091] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/17/2017] [Revised: 10/17/2017] [Accepted: 11/14/2017] [Indexed: 11/16/2022]