1
Cohen-Mekelburg S, Van T, Yu X, Costa DK, Manojlovich M, Saini S, Gilmartin H, Admon AJ, Resnicow K, Higgins PDR, Siwo G, Zhu J, Waljee AK. Understanding clinician connections to inform efforts to promote high-quality inflammatory bowel disease care. PLoS One 2022; 17:e0279441. [PMID: 36574370 PMCID: PMC9794045 DOI: 10.1371/journal.pone.0279441] [Received: 08/09/2022] [Accepted: 12/07/2022] [Indexed: 12/28/2022] [Open access]
Abstract
BACKGROUND Highly connected individuals disseminate information effectively within their social network. To apply this concept to inflammatory bowel disease (IBD) care and lay the foundation for network interventions to disseminate high-quality treatment, we assessed the need for improving the IBD practices of highly connected clinicians. We aimed to examine whether highly connected clinicians who treat IBD patients were more likely to provide high-quality treatment than less connected clinicians.
METHODS We used network analysis to examine connections among clinicians who shared patients with IBD in the Veterans Health Administration between 2015 and 2018. We created a network composed of clinicians connected by shared patients. We quantified clinician connections using degree centrality (number of clinicians with whom a clinician shares patients), closeness centrality (reach via shared contacts to other clinicians), and betweenness centrality (degree to which a clinician connects clinicians not otherwise connected). Using weighted linear regression, we examined associations between each measure of connection and two IBD quality indicators: low prolonged steroid use and high steroid-sparing therapy use.
RESULTS We identified 62,971 patients with IBD and linked them to 1,655 gastroenterologists and 7,852 primary care providers. Clinicians with more connections (degree) were more likely to exhibit high-quality treatment (less prolonged steroid use: beta -0.0268, 95% CI -0.0427 to -0.0110; more steroid-sparing therapy: beta 0.0967, 95% CI 0.0128 to 0.1805). Clinicians who connect otherwise unconnected clinicians (betweenness) displayed more prolonged steroid use (beta 0.0003, 95% CI 0.0001 to 0.0006). The presence of variation is more relevant than its magnitude.
CONCLUSIONS Clinicians with a high number of connections provided more high-quality IBD treatments than less connected clinicians, and may be well-positioned for interventions to disseminate high-quality IBD care. However, clinicians who connect clinicians who are otherwise unconnected are more likely to display low-quality IBD treatment. Efforts to improve their quality are needed prior to leveraging their position to disseminate high-quality care.
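The three connection measures named in the abstract can be illustrated on a toy shared-patient network. The following is a minimal sketch, not the authors' pipeline: the function names, the patient and clinician labels, and the unnormalized, unweighted definitions are all assumptions made for illustration, and closeness is computed only over clinicians reachable from each node.

```python
from collections import deque


def build_network(shared_patients):
    """Link every pair of clinicians who appear on the same (hypothetical) patient."""
    graph = {}
    for clinicians in shared_patients.values():
        for a in clinicians:
            graph.setdefault(a, set())
            for b in clinicians:
                if a != b:
                    graph[a].add(b)
    return graph


def degree(graph):
    """Degree centrality: number of clinicians a clinician shares patients with."""
    return {v: len(neighbours) for v, neighbours in graph.items()}


def closeness(graph):
    """Closeness: (reachable nodes) / (sum of shortest-path distances to them)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair is visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy data: three patients, four clinicians forming a chain A-B-C-D.
shared_patients = {"p1": ["A", "B"], "p2": ["B", "C"], "p3": ["C", "D"]}
network = build_network(shared_patients)
```

In this chain, the two middle clinicians have the highest degree and are the only ones with non-zero betweenness, which is the kind of structural difference the regression in the abstract relates to treatment quality.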
Affiliation(s)
- Shirley Cohen-Mekelburg
  - VA Center for Clinical Management Research, LTC Charles Kettles VA Medical Center, Ann Arbor, Michigan, United States of America
  - Division of Gastroenterology & Hepatology, University of Michigan Medicine, Ann Arbor, Michigan, United States of America
- Tony Van
  - VA Center for Clinical Management Research, LTC Charles Kettles VA Medical Center, Ann Arbor, Michigan, United States of America
- Xianshi Yu
  - Department of Statistics, University of Michigan Medicine, Ann Arbor, Michigan, United States of America
- Deena Kelly Costa
  - School of Nursing, Yale University, New Haven, Connecticut, United States of America
  - Section on Pulmonary, Critical Care & Sleep Medicine, Department of Internal Medicine, Yale University, New Haven, Connecticut, United States of America
- Milisa Manojlovich
  - School of Nursing, Yale University, New Haven, Connecticut, United States of America
  - Section on Pulmonary, Critical Care & Sleep Medicine, Department of Internal Medicine, Yale University, New Haven, Connecticut, United States of America
- Sameer Saini
  - VA Center for Clinical Management Research, LTC Charles Kettles VA Medical Center, Ann Arbor, Michigan, United States of America
  - Division of Gastroenterology & Hepatology, University of Michigan Medicine, Ann Arbor, Michigan, United States of America
- Heather Gilmartin
  - Denver/Seattle Center of Innovation, VA Eastern Colorado Healthcare System, Aurora, Colorado, United States of America
- Andrew J. Admon
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medicine, Ann Arbor, Michigan, United States of America
  - Pulmonary Service, LTC Charles Kettles VA Medical Center, Ann Arbor, Michigan, United States of America
- Ken Resnicow
  - Department of Health Education and Health Behavior, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Peter D. R. Higgins
  - Division of Gastroenterology & Hepatology, University of Michigan Medicine, Ann Arbor, Michigan, United States of America
- Geoffrey Siwo
  - Division of Gastroenterology & Hepatology, University of Michigan Medicine, Ann Arbor, Michigan, United States of America
- Ji Zhu
  - Department of Statistics, University of Michigan Medicine, Ann Arbor, Michigan, United States of America
- Akbar K. Waljee
  - VA Center for Clinical Management Research, LTC Charles Kettles VA Medical Center, Ann Arbor, Michigan, United States of America
  - Division of Gastroenterology & Hepatology, University of Michigan Medicine, Ann Arbor, Michigan, United States of America
2
Saib WB, Kumar P, Siwo G, Dlamini G, Singh E, Candy S, Klipin M. Abstract A11: A deep learning approach for extracting clinically relevant information from pathology reports. Cancer Res 2017. [DOI: 10.1158/1538-7445.newfront17-a11] [Indexed: 11/16/2022]
Abstract
Data-Driven Healthcare, IBM Research Africa, Johannesburg, South Africa
The National Cancer Registry (NCR) in South Africa plays a significant role in reporting nationwide cancer statistics and raising global awareness of the massive impact of cancer. The government requires confirmed cancer cases to be reported to the NCR. Because of manual processes and the increasing number of reports received annually, cancer statistics lag considerably, which means the extent of cancer cases is currently not understood. In addition, the unstructured free text needs to be processed in order to identify clinical information that could be important for public health planning. We present initial results from a deep learning approach to address this time lag. Deep learning is a powerful family of machine-learning methods that has made strides in medical image recognition and speech processing. The deep learning system takes as input 2000 de-identified breast cancer pathology reports provided by the NCR in collaboration with the University of the Witwatersrand Medical School. The pathology reports are first preprocessed using the tf-idf (term frequency-inverse document frequency) method, which indicates how important a word is to a document in a corpus by assigning a numerical statistic to each word, yielding a term-document matrix. The high-dimensional data matrix is then input into an unsupervised autoencoder, a data compression algorithm used to obtain rich features that best represent the specific breast cancer topography and morphology. Unlike other approaches, our approach relies on non-dictionary sources, such as clinical empirical knowledge extracted from the reports, and dictionary sources, such as the 12,000 medical diagnoses available in the International Statistical Classification of Diseases and Related Health Problems (ICD-10). The output from the deep learning system can be used to automate the classification of reports into their corresponding topography and morphology.
The system could also be used to create a visual analytics system to aid data exploration and trend analysis of the current state of cancer in South Africa.
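The tf-idf preprocessing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: it uses whitespace tokenization and one common tf-idf variant (relative term frequency times the natural log of N/df), the function name `tfidf_matrix` is hypothetical, and the three short strings are invented stand-ins, not NCR pathology reports. The abstract does not say which tf-idf variant or tokenizer the authors used.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] is the tf-idf weight
    of vocabulary[j] in documents[i]."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        matrix.append([counts[term] / length * idf[term] for term in vocab])
    return vocab, matrix


# Invented toy "reports" for illustration only.
reports = [
    "invasive ductal carcinoma breast",
    "lobular carcinoma breast",
    "benign breast tissue",
]
vocab, matrix = tfidf_matrix(reports)
```

A term that occurs in every document, such as "breast" here, receives weight zero under this variant, while terms that distinguish one report from the others are weighted up. The resulting rows are the kind of high-dimensional vectors the abstract describes feeding into an autoencoder.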
Citation Format: Waheeda Banu Saib, Pavan Kumar, Geoffrey Siwo, Gciniwe Dlamini, Elvira Singh, Sue Candy, Michael Klipin. A deep learning approach for extracting clinically relevant information from pathology reports [abstract]. In: Proceedings of the AACR International Conference: New Frontiers in Cancer Research; 2017 Jan 18-22; Cape Town, South Africa. Philadelphia (PA): AACR; Cancer Res 2017;77(22 Suppl):Abstract nr A11.
Affiliation(s)
- Pavan Kumar
  - IBM Research, Johannesburg, Gauteng, South Africa
- Elvira Singh
  - National Health Laboratory Service, Johannesburg, Gauteng, South Africa
- Sue Candy
  - National Health Laboratory Service, Johannesburg, Gauteng, South Africa
- Michael Klipin
  - University of the Witwatersrand, Johannesburg, Gauteng, South Africa
3
Abstract
The quantitative prediction of transcriptional activity of genes using promoter sequence is fundamental to the engineering of biological systems for industrial purposes and understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved the best performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method accurately predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) as compared to 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was entirely based on features extracted exclusively from the 100 bp region upstream from the translational start site demonstrating that this region encodes much of the overall promoter activity. 
The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring biological systems.
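The kinds of sequence features listed in the abstract (single-nucleotide and k-mer frequencies, lengths of T and TA tracts) and the Spearman correlation used for scoring can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the feature definitions are simple assumptions rather than the authors' exact ones, and it does not reproduce their iterative feature search or the DNA mechanical-property features.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Frequency of each overlapping k-mer among all windows of length k."""
    seq = seq.upper()
    windows = len(seq) - k + 1
    if windows <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(windows))
    return {kmer: count / windows for kmer, count in counts.items()}


def longest_tract(seq, unit):
    """Length in bases of the longest run of consecutive copies of `unit`."""
    seq = seq.upper()
    best = 0
    for start in range(len(seq)):
        length = 0
        while seq.startswith(unit, start + length):
            length += len(unit)
        best = max(best, length)
    return best


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.
    Assumes neither input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

For a promoter string `p`, a feature vector in this spirit might combine `kmer_frequencies(p, 1).get("G", 0.0)`, `longest_tract(p, "T")`, `longest_tract(p, "TA")`, and selected entries of `kmer_frequencies(p, 3)` and `kmer_frequencies(p, 4)`; predicted and measured activities would then be compared with `spearman`.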
Affiliation(s)
- Geoffrey Siwo
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
  - Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
  - IBM TJ Watson Research Center, NY, USA
  - IBM Research-Africa, Johannesburg, South Africa
- Andrew Rider
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
  - Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Asako Tan
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
  - Epicentre, Madison, WI, USA
- Richard Pinapati
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
  - Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Scott Emrich
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
  - Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Nitesh Chawla
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
  - Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Michael Ferdig
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA
  - Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
4
Rider AK, Siwo G, Emrich SJ, Ferdig MT, Chawla NV. A supervised learning approach to the ensemble clustering of genes. Int J Data Min Bioinform 2014; 9:199-219. [DOI: 10.1504/ijdmb.2014.059062] [Indexed: 11/21/2022]
5
Meyer P, Siwo G, Zeevi D, Sharon E, Norel R, Segal E, Stolovitzky G. Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach. Genome Res 2013; 23:1928-37. [PMID: 23950146 PMCID: PMC3814892 DOI: 10.1101/gr.157420.113] [Indexed: 01/15/2023]
Abstract
The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of the yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites.
Affiliation(s)
- Pablo Meyer
- IBM T.J. Watson Research Center, Yorktown Heights, New York 10598, USA
6
Huang Y, Siwo G, Wuchty S, Ferdig MT, Przytycka TM. Symmetric Epistasis Estimation (SEE) and its application to dissecting interaction map of Plasmodium falciparum. Mol Biosyst 2012; 8:1544-52. [PMID: 22419061 DOI: 10.1039/c2mb05333k] [Indexed: 01/17/2023]
Abstract
It is being increasingly recognized that many important phenotypic traits, including various diseases, are governed by a combination of weak genetic effects and their interactions. While the detection of epistatic interactions that involve a non-additive effect of two loci on a quantitative trait is particularly challenging, this interaction type is fundamental for the understanding of genome organization and gene regulation. However, current methods that detect epistatic interactions typically rely on the existence of a strong primary effect, considerably limiting the sensitivity of the search. To fill this gap, we developed a new method, SEE (Symmetric Epistasis Estimation), allowing the genome-wide detection of epistatic interactions without the need for a strong primary effect. We applied our approach to progeny crosses of the human malaria parasite P. falciparum and S. cerevisiae. We found an abundance of epistatic interactions in the parasite and a much smaller number of such interactions in yeast. The genome of P. falciparum also harboured several epistatic interaction hotspots that putatively play a role in drug resistance mechanisms. The abundance of observed epistatic interactions might suggest a mechanism of compensation for the extremely limited repertoire of transcription factors. Interestingly, epistatic interaction hotspots were associated with elevated levels of linkage disequilibrium, an observation that suggests selection pressure acting on P. falciparum, potentially reflecting host-pathogen interactions or drug-induced selection.
Affiliation(s)
- Yang Huang
- National Center for Biotechnology Information, NLM, NIH, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA
7
Rider AK, Siwo G, Chawla NV, Ferdig M, Emrich SJ. A statistical approach to finding overlooked genetic associations. BMC Bioinformatics 2010; 11:526. [PMID: 20964847 PMCID: PMC2974753 DOI: 10.1186/1471-2105-11-526] [Received: 01/29/2010] [Accepted: 10/21/2010] [Indexed: 11/10/2022] [Open access]
Abstract
Background
Complexity and noise in expression quantitative trait loci (eQTL) studies make it difficult to distinguish potential regulatory relationships among the many interactions. The predominant method of identifying eQTLs finds associations that are significant at a genome-wide level. The vast number of statistical tests carried out on these data makes false negatives very likely. Corrections for multiple testing error render genome-wide eQTL techniques unable to detect modest regulatory effects.
We propose an alternative method to identify eQTLs that builds on traditional approaches. In contrast to genome-wide techniques, our method determines the significance of an association between an expression trait and a locus with respect to the set of all associations to the expression trait. The use of this specific information facilitates identification of expression traits that have an expression profile that is characterized by a single exceptional association to a locus.
Our approach identifies expression traits that have exceptional associations regardless of the genome-wide significance of those associations. This property facilitates the identification of possible false negatives for genome-wide significance. Further, our approach has the property of prioritizing expression traits that are affected by few strong associations. Expression traits identified by this method may warrant additional study because their expression level may be affected by targeting genes near a single locus.
Results
We demonstrate our method by identifying eQTL hotspots in Plasmodium falciparum (malaria) and Saccharomyces cerevisiae (yeast). We demonstrate the prioritization of traits with few strong genetic effects through Gene Ontology (GO) analysis of yeast. Our results are strongly consistent with results gathered using genome-wide methods and identify additional hotspots and eQTLs.
Conclusions
New eQTLs and hotspots found with this method may represent regions of the genome or biological processes that are controlled through few relatively strong genetic interactions. These points of interest warrant experimental investigation.
8
Reilly Ayala HB, Wacker MA, Siwo G, Ferdig MT. Quantitative trait loci mapping reveals candidate pathways regulating cell cycle duration in Plasmodium falciparum. BMC Genomics 2010; 11:577. [PMID: 20955606 PMCID: PMC3091725 DOI: 10.1186/1471-2164-11-577] [Received: 05/02/2010] [Accepted: 10/18/2010] [Indexed: 11/24/2022] [Open access]
Abstract
Background
Elevated parasite biomass in human red blood cells can lead to increased malaria morbidity. The genes and mechanisms regulating growth and development of Plasmodium falciparum through its erythrocytic cycle are not well understood. We previously showed that strains HB3 and Dd2 diverge in their proliferation rates, and here use quantitative trait loci mapping in 34 progeny from a cross between these parent clones, along with integrative bioinformatics, to identify genetic loci and candidate genes that control divergences in cell cycle duration.
Results
Genetic mapping of cell cycle duration revealed a four-locus genetic model, including a major genetic effect on chromosome 12, which accounts for 75% of the inherited phenotype variation. These QTL span 165 genes, the majority of which have no predicted function based on homology. We present a method to systematically prioritize candidate genes using the extensive sequence and transcriptional information available for the parent lines. Putative functions were assigned to the prioritized genes based on protein interaction networks and eQTL from our earlier study. DNA metabolism or antigenic variation functional categories were enriched among our prioritized candidate genes. Genes were then analyzed to determine if they interact with cyclins or other proteins known to be involved in the regulation of the cell cycle.
Conclusions
We show that the divergent proliferation rate between a drug-resistant and a drug-sensitive parent clone is under genetic regulation and is segregating as a complex trait in 34 progeny. We map a major locus along with additional secondary effects, and use the wealth of genome data to identify key candidate genes. Of particular interest are a nucleosome assembly protein (PFL0185c) and a zinc finger transcription factor (PFL0465c), both on chromosome 12, and an L7Ae-related ribosomal protein on chromosome 4 (PFD0960c).
9
Abstract
Although maps of intracellular interactions are increasingly well characterized, little is known about large-scale maps of host-pathogen protein interactions. The investigation of host-pathogen interactions can reveal features of pathogenesis and provide a foundation for the development of drugs and disease prevention strategies. A compilation of experimentally verified interactions between HIV-1 and human proteins and a set of HIV-dependency factors (HDF) allowed insights into the topology and intricate interplay between viral and host proteins on a large scale. We found that targeted and HDF proteins appear predominantly in rich-clubs, groups of human proteins that are strongly intertwined among each other. These assemblies of proteins may serve as an infection gateway, allowing the virus to take control of the human host by reaching protein pathways and diversified cellular functions in a pronounced and focused way. Particular transcription factors and protein kinases facilitate indirect interactions between HDFs and viral proteins. Discerning the entanglement of directly targeted and indirectly interacting proteins may uncover molecular and functional sites that can provide novel perspectives on the progression of HIV infection and highlight new avenues to fight this virus.
Affiliation(s)
- Stefan Wuchty
- National Center of Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA.
10
Siwo G. Site Specific Codon Bias in HIV-1 pol Gene. Retrovirology 2005. [DOI: 10.1186/1742-4690-2-s1-p91] [Indexed: 11/10/2022] [Open access]