Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Thomas PE, Klinger R, Furlong LI, Hofmann-Apitius M, Friedrich CM. Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers. BMC Bioinformatics 2011;12 Suppl 4:S4. [PMID: 21992066 PMCID: PMC3194196 DOI: 10.1186/1471-2105-12-s4-s4] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023] Open

For:	Thomas PE, Klinger R, Furlong LI, Hofmann-Apitius M, Friedrich CM. Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers. BMC Bioinformatics 2011;12 Suppl 4:S4. [PMID: 21992066 PMCID: PMC3194196 DOI: 10.1186/1471-2105-12-s4-s4] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023] Open

Number

Cited by Other Article(s)

Garda S, Weber-Genzel L, Martin R, Leser U. BELB: a biomedical entity linking benchmark. Bioinformatics 2023;39:btad698. [PMID: 37975879 PMCID: PMC10681865 DOI: 10.1093/bioinformatics/btad698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Revised: 10/30/2023] [Accepted: 11/16/2023] [Indexed: 11/19/2023] Open

Mostafa T, Abdel-Hamid I, Taymour M, Ali O. Genetic variants in varicocele-related male infertility: a systematic review and future directions. HUM FERTIL 2023;26:632-648. [PMID: 34587863 DOI: 10.1080/14647273.2021.1983214] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2020] [Accepted: 05/12/2021] [Indexed: 02/08/2023]

Becker TE, Jakobsson E. ResidueFinder: extracting individual residue mentions from protein literature. J Biomed Semantics 2021;12:14. [PMID: 34289903 PMCID: PMC8293528 DOI: 10.1186/s13326-021-00243-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2019] [Accepted: 05/07/2021] [Indexed: 11/10/2022] Open

Abstract

Background

The revolution in molecular biology has shown how protein function and structure are based on specific sequences of amino acids. Thus, an important feature in many papers is the mention of the significance of individual amino acids in the context of the entire sequence of the protein. MutationFinder is a widely used program for finding mentions of specific mutations in texts. We report on augmenting the positive attributes of MutationFinder with a more inclusive regular expression list to create ResidueFinder, which finds mentions of native amino acids as well as mutations. We also consider parameter options for both ResidueFinder and MutationFinder to explore trade-offs between precision, recall, and computational efficiency. We test our methods and software in full text as well as abstracts.

Results

We find there is much more variety of formats for mentioning residues in the entire text of papers than in abstracts alone. Failure to take these multiple formats into account results in many false negatives in the program. Since MutationFinder, like several other programs, was primarily tested on abstracts, we found it necessary to build an expanded regular expression list to achieve acceptable recall in full text searches. We also discovered a number of artifacts arising from PDF to text conversion, which we wrote elements in the regular expression library to address. Taking into account those factors resulted in high recall on randomly selected primary research articles. We also developed a streamlined regular expression (called “cut”) which enables a several hundredfold speedup in both MutationFinder and ResidueFinder with only a modest compromise of recall. All regular expressions were tested using expanded F-measure statistics, i.e., we compute F_β for various values of where the larger the value of β the more recall is weighted, the smaller the value of β the more precision is weighted.

Conclusions

ResidueFinder is a simple, effective, and efficient program for finding individual residue mentions in primary literature starting with text files, implemented in Python, and available in SourceForge.net. The most computationally efficient versions of ResidueFinder could enable creation and maintenance of a database of residue mentions encompassing all articles in PubMed.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13326-021-00243-3.

Collapse

Lee K, Wei CH, Lu Z. Recent advances of automated methods for searching and extracting genomic variant information from biomedical literature. Brief Bioinform 2021;22:bbaa142. [PMID: 32770181 PMCID: PMC8138883 DOI: 10.1093/bib/bbaa142] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Revised: 06/07/2020] [Accepted: 06/25/2020] [Indexed: 12/28/2022] Open

Rehmat N, Farooq H, Kumar S, Ul Hussain S, Naveed H. Predicting the pathogenicity of protein coding mutations using Natural Language Processing. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2020;2020:5842-5846. [PMID: 33019302 DOI: 10.1109/embc44109.2020.9175781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Huang MS, Lai PT, Lin PY, You YT, Tsai RTH, Hsu WL. Biomedical named entity recognition and linking datasets: survey and our recent development. Brief Bioinform 2020;21:2219-2238. [DOI: 10.1093/bib/bbaa054] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Revised: 02/29/2020] [Accepted: 03/31/2020] [Indexed: 11/14/2022] Open

Abstract AbstractNatural language processing (NLP) is widely applied in biological domains to retrieve information from publications. Systems to address numerous applications exist, such as biomedical named entity recognition (BNER), named entity normalization (NEN) and protein–protein interaction extraction (PPIE). High-quality datasets can assist the development of robust and reliable systems; however, due to the endless applications and evolving techniques, the annotations of benchmark datasets may become outdated and inappropriate. In this study, we first review commonlyused BNER datasets and their potential annotation problems such as inconsistency and low portability. Then, we introduce a revised version of the JNLPBA dataset that solves potential problems in the original and use state-of-the-art named entity recognition systems to evaluate its portability to different kinds of biomedical literature, including protein–protein interaction and biology events. Lastly, we introduce an ensembled biomedical entity dataset (EBED) by extending the revised JNLPBA dataset with PubMed Central full-text paragraphs, figure captions and patent abstracts. This EBED is a multi-task dataset that covers annotations including gene, disease and chemical entities. In total, it contains 85000 entity mentions, 25000 entity mentions with database identifiers and 5000 attribute tags. To demonstrate the usage of the EBED, we review the BNER track from the AI CUP Biomedical Paper Analysis challenge. Availability: The revised JNLPBA dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/Re vised_JNLPBA.zip. The EBED dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/AICUP _EBED_dataset.rar. Contact: Email: thtsai@g.ncu.edu.tw, Tel. 886-3-4227151 ext. 35203, Fax: 886-3-422-2681 Email: hsu@iis.sinica.edu.tw, Tel. 886-2-2788-3799 ext. 2211, Fax: 886-2-2782-4814 Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. Collapse

Xu J, Kim S, Song M, Jeong M, Kim D, Kang J, Rousseau JF, Li X, Xu W, Torvik VI, Bu Y, Chen C, Ebeid IA, Li D, Ding Y. Building a PubMed knowledge graph. Sci Data 2020;7:205. [PMID: 32591513 PMCID: PMC7320186 DOI: 10.1038/s41597-020-0543-2] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Accepted: 05/26/2020] [Indexed: 01/08/2023] Open

Yi C, Huang C, Wang H, Wang C, Dong L, Gu X, Feng X, Chen B. Association study between CYP24A1 gene polymorphisms and cancer risk. Pathol Res Pract 2019;216:152735. [PMID: 31740231 DOI: 10.1016/j.prp.2019.152735] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/17/2019] [Revised: 10/23/2019] [Accepted: 11/10/2019] [Indexed: 02/06/2023]

Galea D, Laponogov I, Veselkov K. Exploiting and assessing multi-source data for supervised biomedical named entity recognition. Bioinformatics 2019. [PMID: 29538614 PMCID: PMC6041968 DOI: 10.1093/bioinformatics/bty152] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Abstract

Motivation

Recognition of biomedical entities from scientific text is a critical component of natural language processing and automated information extraction platforms. Modern named entity recognition approaches rely heavily on supervised machine learning techniques, which are critically dependent on annotated training corpora. These approaches have been shown to perform well when trained and tested on the same source. However, in such scenario, the performance and evaluation of these models may be optimistic, as such models may not necessarily generalize to independent corpora, resulting in potential non-optimal entity recognition for large-scale tagging of widely diverse articles in databases such as PubMed.

Results

Here we aggregated published corpora for the recognition of biomolecular entities (such as genes, RNA, proteins, variants, drugs and metabolites), identified entity class overlap and performed leave-corpus-out cross validation strategy to test the efficiency of existing models. We demonstrate that accuracies of models trained on individual corpora decrease substantially for recognition of the same biomolecular entity classes in independent corpora. This behavior is possibly due to limited generalizability of entity-class-related features captured by individual corpora (model 'overtraining') which we investigated further at the orthographic level, as well as potential annotation standard differences. We show that the combined use of multi-source training corpora results in overall more generalizable models for named entity recognition, while achieving comparable individual performance. By performing learning-curve-based power analysis we further identified that performance is often not limited by the quantity of the annotated data.

Availability and implementation

Compiled primary and secondary sources of the aggregated corpora are available on: https://github.com/dterg/biomedical_corpora/wiki and https://bitbucket.org/iAnalytica/bioner.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Tawfik NS, Spruit MR. The SNPcurator: literature mining of enriched SNP-disease associations. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018;2018:4925332. [PMID: 29688369 PMCID: PMC5844215 DOI: 10.1093/database/bay020] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/18/2017] [Accepted: 02/05/2018] [Indexed: 01/08/2023]

Wei CH, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z. tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine. Bioinformatics 2018;34:80-87. [PMID: 28968638 DOI: 10.1093/bioinformatics/btx541] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2017] [Accepted: 08/31/2017] [Indexed: 11/12/2022] Open

Abstract

Motivation

Despite significant efforts in expert curation, clinical relevance about most of the 154 million dbSNP reference variants (RS) remains unknown. However, a wealth of knowledge about the variant biological function/disease impact is buried in unstructured literature data. Previous studies have attempted to harvest and unlock such information with text-mining techniques but are of limited use because their mutation extraction results are not standardized or integrated with curated data.

Results

We propose an automatic method to extract and normalize variant mentions to unique identifiers (dbSNP RSIDs). Our method, in benchmarking results, demonstrates a high F-measure of ∼90% and compared favorably to the state of the art. Next, we applied our approach to the entire PubMed and validated the results by verifying that each extracted variant-gene pair matched the dbSNP annotation based on mapped genomic position, and by analyzing variants curated in ClinVar. We then determined which text-mined variants and genes constituted novel discoveries. Our analysis reveals 41 889 RS numbers (associated with 9151 genes) not found in ClinVar. Moreover, we obtained a rich set worth further review: 12 462 rare variants (MAF ≤ 0.01) in 3849 genes which are presumed to be deleterious and not frequently found in the general population. To our knowledge, this is the first large-scale study to analyze and integrate text-mined variant data with curated knowledge in existing databases. Our results suggest that databases can be significantly enriched by text mining and that the combined information can greatly assist human efforts in evaluating/prioritizing variants in genomic research.

Availability and implementation

The tmVar 2.0 source code and corpus are freely available at https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/.

Contact

zhiyong.lu@nih.gov.

Collapse

Xie T, Akbar S, Stathopoulou MG, Oster T, Masson C, Yen FT, Visvikis-Siest S. Epistatic interaction of apolipoprotein E and lipolysis-stimulated lipoprotein receptor genetic variants is associated with Alzheimer's disease. Neurobiol Aging 2018;69:292.e1-292.e5. [PMID: 29858039 DOI: 10.1016/j.neurobiolaging.2018.04.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2017] [Revised: 04/25/2018] [Accepted: 04/25/2018] [Indexed: 01/19/2023]

Kohailan M, Alanazi M, Rouabhia M, Al Amri A, Parine NR, Semlali A. Two SNPs in the promoter region of Toll-like receptor 4 gene are not associated with smoking in Saudi Arabia. Onco Targets Ther 2017;10:745-752. [PMID: 28223830 PMCID: PMC5308598 DOI: 10.2147/ott.s111971] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023] Open

Computational Analysis of Damaging Single-Nucleotide Polymorphisms and Their Structural and Functional Impact on the Insulin Receptor. BIOMED RESEARCH INTERNATIONAL 2016;2016:2023803. [PMID: 27840822 PMCID: PMC5093252 DOI: 10.1155/2016/2023803] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/17/2016] [Accepted: 09/14/2016] [Indexed: 12/31/2022]

Verspoor KM, Heo GE, Kang KY, Song M. Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts. BMC Med Inform Decis Mak 2016;16 Suppl 1:68. [PMID: 27454860 PMCID: PMC4959367 DOI: 10.1186/s12911-016-0294-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to the curation of the relationship between genetic variation and disease. Due to the richness of these annotations, the corpus provides a good testbed for evaluation of biomedical literature information extraction systems.

METHODS

In this paper, we focus on assessing performance on extracting the relations in the corpus, using gold standard entities as a starting point, to establish a baseline for extraction of relations important for extraction of genetic variant information from the literature. We test the application of the Public Knowledge Discovery Engine for Java (PKDE4J) system, a natural language processing system designed for information extraction of entities and relations in text, on the relation extraction task using this corpus.

RESULTS

For the relations which are attested at least 100 times in the Variome corpus, we realise a performance ranging from 0.78-0.84 Precision-weighted F-score, depending on the relation. We find that the PKDE4J system adapted straightforwardly to the range of relation types represented in the corpus; some extensions to the original methodology were required to adapt to the multi-relational classification context. The results are competitive with state-of-the-art relation extraction performance on more heavily studied corpora, although the analysis shows that the Recall of a co-occurrence baseline outweighs the benefit of improved Precision for many relations, indicating the value of simple semantic constraints on relations.

CONCLUSIONS

This work represents the first attempt to apply relation extraction methods to the Variome corpus. The results demonstrate that automated methods have good potential to structure the information expressed in the published literature related to genetic variants, connecting mutations to genes, diseases, and patient cohorts. Further development of such approaches will facilitate more efficient biocuration of genetic variant information into structured databases, leveraging the knowledge embedded in the vast publication literature.

Collapse

Yu W, Li Y, Wang Z, Liu L, Liu J, Ding F, Zhang X, Cheng Z, Chen P, Dou J. Transcriptomic changes in human renal proximal tubular cells revealed under hypoxic conditions by RNA sequencing. Int J Mol Med 2016;38:894-902. [PMID: 27432315 DOI: 10.3892/ijmm.2016.2677] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2015] [Accepted: 07/07/2016] [Indexed: 11/05/2022] Open

Thomas P, Rocktäschel T, Hakenberg J, Lichtblau Y, Leser U. SETH detects and normalizes genetic variants in text. Bioinformatics 2016;32:2883-5. [PMID: 27256315 DOI: 10.1093/bioinformatics/btw234] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2015] [Accepted: 04/18/2016] [Indexed: 11/14/2022] Open

Mahmood ASMA, Wu TJ, Mazumder R, Vijay-Shanker K. DiMeX: A Text Mining System for Mutation-Disease Association Extraction. PLoS One 2016;11:e0152725. [PMID: 27073839 PMCID: PMC4830514 DOI: 10.1371/journal.pone.0152725] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2015] [Accepted: 03/19/2016] [Indexed: 11/22/2022] Open

Su J, Su J, Shang X, Wan Q, Chen X, Rao Y. SNP detection of TLR8 gene, association study with susceptibility/resistance to GCRV and regulation on mRNA expression in grass carp, Ctenopharyngodon idella. FISH & SHELLFISH IMMUNOLOGY 2015;43:1-12. [PMID: 25514376 DOI: 10.1016/j.fsi.2014.12.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/12/2014] [Revised: 11/17/2014] [Accepted: 12/06/2014] [Indexed: 05/10/2023]

Abstract

Toll-like receptor 8 (TLR8), a prototypical intracellular member of TLR family, is generally linked closely to antiviral innate immune through recognizing viral nucleic acid. In this study, 5'-flanking region of Ctenopharyngodon idella TLR8 (CiTLR8), 671bp in length, was amplified and eight SNPs containing one SNP in the intron, three SNPs in the coding region (CDS) and four SNPs in the 3'-untranslated region (UTR) were identified and characterized. Of which 4062 A/T was significantly associated with the susceptibility/resistance to GCRV both in genotype and allele (P < 0.05), while 4168 C/T was extremely significantly associated with that (P < 0.01) according to the case (susceptibility)-control (resistance) analysis. Following the verification experiment, further analyses of mRNA expression, linkage disequilibrium (LD), haplotype and microRNA (miRNA) target site indicated that 4062 A/T and 4168 C/T in 3'-UTR might affect the miRNA regulation, while the exertion of antiviral effects of 4062 A/T might rely on its interaction with other SNPs. Additionally, the high-density of SNPs in 3'-UTR might reflect the specific biological functions of 3'-UTR. And also, the mutation of 747 A/G in intron changing the potential transcriptional factor-binding sites (TFBS) nearby might affect the expression of CiTLR8 transcriptionally or post-transcriptionally. Moreover, as predicted, the A/G transition of the only non-synonymous SNP (3846 A/G) in CDS causing threonine/alanine variation, could shorten the length of the α-helix and ultimately affect the integrity of the Toll-IL-1 receptor (TIR) domain. The functional mechanism of 3846 A/G might also involve a threonine phosphorylation signaling. This study may broaden the knowledge of TLR polymorphisms, lay the foundation for further functional research of CiTLR8 and provide potential markers as well as theoretical basis for resistance molecular breeding of grass carp against GCRV.

Collapse

Macintyre G, Jimeno Yepes A, Ong CS, Verspoor K. Associating disease-related genetic variants in intergenic regions to the genes they impact. PeerJ 2014;2:e639. [PMID: 25374782 PMCID: PMC4217187 DOI: 10.7717/peerj.639] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2014] [Accepted: 10/07/2014] [Indexed: 11/20/2022] Open

Beyan T, Aydın Son Y. Incorporation of personal single nucleotide polymorphism (SNP) data into a national level electronic health record for disease risk assessment, part 1: an overview of requirements. JMIR Med Inform 2014;2:e15. [PMID: 25599712 PMCID: PMC4288081 DOI: 10.2196/medinform.3169] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2013] [Revised: 05/25/2014] [Accepted: 07/02/2014] [Indexed: 01/29/2023] Open

Abstract

BACKGROUND

Personalized medicine approaches provide opportunities for predictive and preventive medicine. Using genomic, clinical, environmental, and behavioral data, tracking and management of individual wellness is possible. A prolific way to carry this personalized approach into routine practices can be accomplished by integrating clinical interpretations of genomic variations into electronic medical records (EMRs)/electronic health records (EHRs). Today, various central EHR infrastructures have been constituted in many countries of the world including Turkey.

OBJECTIVE

The objective of this study was to concentrate on incorporating the personal single nucleotide polymorphism (SNP) data into the National Health Information System of Turkey (NHIS-T) for disease risk assessment, and evaluate the performance of various predictive models for prostate cancer cases. We present our work as a miniseries containing three parts: (1) an overview of requirements, (2) the incorporation of SNP into the NHIS-T, and (3) an evaluation of SNP incorporated NHIS-T for prostate cancer.

METHODS

For the first article of this miniseries, the scientific literature is reviewed and the requirements of SNP data integration into EMRs/EHRs are extracted and presented.

RESULTS

In the literature, basic requirements of genomic-enabled EMRs/EHRs are listed as incorporating genotype data and its clinical interpretation into EMRs/EHRs, developing accurate and accessible clinicogenomic interpretation resources (knowledge bases), interpreting and reinterpreting of variant data, and immersing of clinicogenomic information into the medical decision processes. In this section, we have analyzed these requirements under the subtitles of terminology standards, interoperability standards, clinicogenomic knowledge bases, defining clinical significance, and clinicogenomic decision support.

CONCLUSIONS

In order to integrate structured genotype and phenotype data into any system, there is a need to determine data components, terminology standards, and identifiers of clinicogenomic information. Also, we need to determine interoperability standards to share information between different information systems of stakeholders, and develop decision support capability to interpret genomic variations based on the knowledge bases via different assessment approaches.

Collapse

Neves M. An analysis on the entity annotations in biological corpora. F1000Res 2014;3:96. [PMID: 25254099 PMCID: PMC4168744 DOI: 10.12688/f1000research.3216.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 04/17/2014] [Indexed: 11/20/2022] Open

The Relationship between ALA16VAL Single Gene Polymorphism and Renal Cell Carcinoma. Adv Urol 2014;2014:932481. [PMID: 24587799 PMCID: PMC3920972 DOI: 10.1155/2014/932481] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2013] [Accepted: 12/02/2013] [Indexed: 12/12/2022] Open

Jimeno Yepes A, Verspoor K. Mutation extraction tools can be combined for robust recognition of genetic variants in the literature. F1000Res 2014;3:18. [PMID: 25285203 PMCID: PMC4176422 DOI: 10.12688/f1000research.3-18.v2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 05/27/2014] [Indexed: 11/20/2022] Open

Abstract

As the cost of genomic sequencing continues to fall, the amount of data being collected and studied for the purpose of understanding the genetic basis of disease is increasing dramatically. Much of the source information relevant to such efforts is available only from unstructured sources such as the scientific literature, and significant resources are expended in manually curating and structuring the information in the literature. As such, there have been a number of systems developed to target automatic extraction of mutations and other genetic variation from the literature using text mining tools. We have performed a broad survey of the existing publicly available tools for extraction of genetic variants from the scientific literature. We consider not just one tool but a number of different tools, individually and in combination, and apply the tools in two scenarios. First, they are compared in an intrinsic evaluation context, where the tools are tested for their ability to identify specific mentions of genetic variants in a corpus of manually annotated papers, the Variome corpus. Second, they are compared in an extrinsic evaluation context based on our previous study of text mining support for curation of the COSMIC and InSiGHT databases. Our results demonstrate that no single tool covers the full range of genetic variants mentioned in the literature. Rather, several tools have complementary coverage and can be used together effectively. In the intrinsic evaluation on the Variome corpus, the combined performance is above 0.95 in F-measure, while in the extrinsic evaluation the combined recall performance is above 0.71 for COSMIC and above 0.62 for InSiGHT, a substantial improvement over the performance of any individual tool. Based on the analysis of these results, we suggest several directions for the improvement of text mining tools for genetic variant extraction from the literature.

Collapse

Vívenes M, Castro de Guerra D, Rodríguez-Larralde Á, Arocha-Piñango CL, Guerrero B. Activity and levels of factor XIII in a Venezuelan admixed population: association with rs5985 (Val35Leu) and STR F13A01 polymorphisms. Thromb Res 2012;130:729-34. [DOI: 10.1016/j.thromres.2012.07.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2012] [Revised: 07/19/2012] [Accepted: 07/31/2012] [Indexed: 11/16/2022]

Thomas P, Starlinger J, Vowinkel A, Arzt S, Leser U. GeneView: a comprehensive semantic search engine for PubMed. Nucleic Acids Res 2012;40:W585-91. [PMID: 22693219 PMCID: PMC3394277 DOI: 10.1093/nar/gks563] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023] Open