Reference Citation Analysis

For: Zhang Y, Shen F, Mojarad MR, Li D, Liu S, Tao C, Yu Y, Liu H. Systematic identification of latent disease-gene associations from PubMed articles. PLoS One 2018;13:e0191568. [PMID: 29373609 PMCID: PMC5786305 DOI: 10.1371/journal.pone.0191568] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/10/2017] [Accepted: 01/08/2018] [Indexed: 12/27/2022] Open

Cited by Other Article(s)

Jamaludeen N, Beyer C, Billing U, Vogel K, Brunner-Weinzierl M, Spiliopoulou M. Potential of Point-of-Care and At-Home Assessment of Immune Status via Rapid Cytokine Detection and Questionnaire-Based Anamnesis. SENSORS (BASEL, SWITZERLAND) 2021;21:4960. [PMID: 34372196 PMCID: PMC8348245 DOI: 10.3390/s21154960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2021] [Revised: 06/30/2021] [Accepted: 07/07/2021] [Indexed: 12/29/2022]

Abstract

Monitoring the immune system's status has emerged as an urgent demand in critical health conditions. Circulating cytokine levels in the blood give thorough insight into the immune system's status. Indeed, measuring a single cytokine may deliver information equivalent to detecting multiple diseases at a time. However, if the reported cytokine levels are interpreted in light of the individual's lifestyle and any comorbid health conditions, the assessment of the immune status becomes more precise. Therefore, this study addresses the most recent advanced assays that deliver rapid, accurate measurement of cytokine levels in human blood, focusing on add-on potential for point-of-care (PoC) or personal at-home usage, and investigates existing health questionnaires as supportive assessment tools that collect all necessary information for the concrete analysis of the measured cytokine levels. We introduce a ten-dimensional characterization of cytokine measurement assays. We found 15 rapid cytokine assays with an assay time of less than 1 h; some could operate on unprocessed blood samples, while others are mature commercial products available on the market. In addition, we retrieved several health questionnaires that addressed various health conditions such as chronic diseases and psychological issues. We then present a machine learning-based solution to determine what makes the immune system fit. To this end, we discuss how to employ topic modeling to derive the definition of immune fitness automatically from the literature. Finally, we propose a prototype model to assess the fitness of the immune system by leveraging the derived definition of immune fitness, the cytokine measurements delivered by a rapid PoC immunoassay, and the complementary information collected by the health questionnaire about other health factors.
In conclusion, we discovered various advanced rapid cytokine detection technologies that are promising candidates for point-of-care or at-home usage; if paired with a health status questionnaire, the assessment of the immune system status becomes more robust, and we demonstrated the potential to enhance the assessment tool with data mining techniques.

Kavvadias S, Drosatos G, Kaldoudi E. Supporting topic modeling and trends analysis in biomedical literature. J Biomed Inform 2020;110:103574. [DOI: 10.1016/j.jbi.2020.103574] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/07/2020] [Revised: 08/24/2020] [Accepted: 09/12/2020] [Indexed: 11/25/2022]

Jiang Y, Wu C, Zhang Y, Zhang S, Yu S, Lei P, Lu Q, Xi Y, Wang H, Song Z. GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining. BMC Med Genomics 2019;12:193. [PMID: 31856831 PMCID: PMC6923899 DOI: 10.1186/s12920-019-0637-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/19/2019] [Accepted: 11/26/2019] [Indexed: 02/07/2023] Open

Zhang G, Wang W, Huang W, Xie X, Liang Z, Cao H. Cross-disease analysis identified novel common genes for both lung adenocarcinoma and lung squamous cell carcinoma. Oncol Lett 2019;18:3463-3470. [PMID: 31516564 PMCID: PMC6732964 DOI: 10.3892/ol.2019.10678] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/06/2018] [Accepted: 05/25/2019] [Indexed: 12/25/2022] Open

Abstract

Lung squamous cell carcinoma (LSCC) exhibits a number of similarities with lung adenocarcinoma (LA) in terms of copy number alterations. However, compared with LA, the range of genetic alterations in LSCC is less understood. In the present study, a large-scale literature-based search of LA-associated genes and LSCC-associated genes was performed to identify the genetic basis common to these two diseases. For each of the LA-associated genes, a mega-analysis was performed to test its expression variations in LSCC using 11 RNA expression datasets, with significant genes identified using statistical analysis. Subsequently, a functional pathway analysis was performed to identify a possible association between any of the significant genes identified from the mega-analysis and LSCC, followed by a co-expression analysis. A multiple linear regression (MLR) model was employed to investigate the possible influence of sample size, country of origin and study date on gene expression in patients with LSCC. Disease-gene association data analysis identified 1,178 genes involved in LA and 334 in LSCC, with a significant overlap of 187 genes (P<1.02×10⁻¹⁶¹). Mega-analysis revealed that three LA-associated genes, namely solute carrier family 2 member 1 (SLC2A1), endothelial PAS domain protein 1 (EPAS1) and cyclin-dependent kinase 4 (CDK4), were significantly associated with LSCC (P<1.60×10⁻⁸), with multiple potential pathways identified by functional pathway analysis, which were further validated by co-expression analysis. The present MLR analysis suggested that the country of origin was a significant factor for the levels of expression of all three genes in patients with LSCC (P<4.0×10⁻³). Collectively, the present results suggested that genes associated with LA should be further investigated for their association with LSCC. In addition, SLC2A1, EPAS1 and CDK4 may be novel risk genes associated with LA and LSCC.

Shen F, Zhao Y, Wang L, Mojarad MR, Wang Y, Liu S, Liu H. Rare disease knowledge enrichment through a data-driven approach. BMC Med Inform Decis Mak 2019;19:32. [PMID: 30764825 PMCID: PMC6376651 DOI: 10.1186/s12911-019-0752-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/13/2018] [Accepted: 02/01/2019] [Indexed: 01/03/2023] Open

Wang Y, Sohn S, Liu S, Shen F, Wang L, Atkinson EJ, Amin S, Liu H. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inform Decis Mak 2019;19:1. [PMID: 30616584 PMCID: PMC6322223 DOI: 10.1186/s12911-018-0723-6] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Received: 05/23/2018] [Accepted: 12/10/2018] [Indexed: 01/02/2023] Open

Abstract

BACKGROUND

Automatic clinical text classification is a natural language processing (NLP) technology that unlocks information embedded in clinical narratives. Machine learning approaches have been shown to be effective for clinical text classification tasks. However, a successful machine learning model usually requires extensive human efforts to create labeled training data and conduct feature engineering. In this study, we propose a clinical text classification paradigm using weak supervision and deep representation to reduce these human efforts.

METHODS

We develop a rule-based NLP algorithm to automatically generate labels for the training data, and then use pre-trained word embeddings as deep representation features for training machine learning models. Since the machine learning models are trained on labels generated by the automatic NLP algorithm, this training process is called weak supervision. We evaluate the paradigm's effectiveness on two institutional case studies at Mayo Clinic: smoking status classification and proximal femur (hip) fracture classification, and one case study using a public dataset: the i2b2 2006 smoking status classification shared task. We test four widely used machine learning models, namely, Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron Neural Networks (MLPNN), and Convolutional Neural Networks (CNN), using this paradigm. Precision, recall, and F1 score are used as metrics to evaluate performance.

RESULTS

CNN achieves the best performance in both institutional tasks (F1 score: 0.92 for Mayo Clinic smoking status classification and 0.97 for fracture classification). We show that word embeddings significantly outperform tf-idf and topic modeling features in the paradigm, and that CNN captures additional patterns from the weak supervision compared to the rule-based NLP algorithms. We also observe two drawbacks of the proposed paradigm: CNN is more sensitive to the size of the training data, and the paradigm might not be effective for complex multiclass classification tasks.

CONCLUSION

The proposed clinical text classification paradigm could reduce the human effort of creating labeled training data and engineering features when applying machine learning to clinical text classification, by leveraging weak supervision and deep representation. The experiments validated the effectiveness of the paradigm on two institutional and one shared clinical text classification tasks.
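
The weak-supervision idea described in the Methods of this abstract (rules generate the training labels, then a statistical model is trained on them) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the keyword rules, the example notes, and the function names are hypothetical, and a small bag-of-words naive Bayes classifier stands in for the word-embedding features and the SVM, RF, MLPNN, and CNN models named in the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword rules standing in for a rule-based NLP labeler.
NEGATED = re.compile(r"\b(never smoked|non-?smoker|denies smoking|no tobacco)\b", re.I)
POSITIVE = re.compile(r"\b(smokes|smoker|smoking|tobacco|cigarettes?)\b", re.I)

# Made-up example notes (assumption: not taken from any real dataset).
NOTES = [
    "Patient smokes one pack of cigarettes daily.",
    "Current smoker with chronic cough.",
    "Patient never smoked and exercises regularly.",
    "Non-smoker, denies smoking or alcohol use.",
    "Follow-up visit for hip pain.",  # no rule fires, so it is left unlabeled
]


def rule_label(note):
    """Return a weak label from the rules, or None when no rule fires."""
    if NEGATED.search(note):  # check negations first: "non-smoker" contains "smoker"
        return "non-smoker"
    if POSITIVE.search(note):
        return "smoker"
    return None


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a simple stand-in model)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_weakly(notes):
    """Label notes with the rules, drop unlabeled ones, and fit the model."""
    labeled = [(n, rule_label(n)) for n in notes]
    labeled = [(n, y) for n, y in labeled if y is not None]
    texts, labels = zip(*labeled)
    return NaiveBayes().fit(texts, labels)


def precision_recall_f1(gold, pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    model = train_weakly(NOTES)
    unseen = "chronic cough, one pack daily"
    print(rule_label(unseen), model.predict(unseen))
```

In this toy setup the unseen note contains none of the rule keywords, so the rules return no label, while the trained model can still assign one from co-occurring words. That is the kind of effect the Results describe as the model capturing additional patterns beyond the rules, though a toy example like this says nothing about performance on real clinical text.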
