Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

20
(from Reference Citation Analysis)

Article PDFs (8)

Cited by > 0 (11)

Searched Name

Meghana Kshirsagar

Gholami S, Scheppke L, Kshirsagar M, Wu Y, Dodhia R, Bonelli R, Leung I, Sallo FB, Muldrew A, Jamison C, Peto T, Lavista Ferres J, Weeks WB, Friedlander M, Lee AY. Self-Supervised Learning for Improved Optical Coherence Tomography Detection of Macular Telangiectasia Type 2. JAMA Ophthalmol 2024;142:226-233. [PMID: 38329740 PMCID: PMC10853868 DOI: 10.1001/jamaophthalmol.2023.6454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 11/29/2023] [Indexed: 02/09/2024]

Abstract

Importance

Deep learning image analysis often depends on large, labeled datasets, which are difficult to obtain for rare diseases.

Objective

To develop a self-supervised approach for automated classification of macular telangiectasia type 2 (MacTel) on optical coherence tomography (OCT) with limited labeled data.

Design, Setting, and Participants

This was a retrospective comparative study. OCT images were collected by the Lowy Medical Research Institute, La Jolla, California, from May 2014 to May 2019 and by the University of Washington, Seattle, from January 2016 to October 2022. Clinical diagnoses of patients with and without MacTel were confirmed by retina specialists. Data were analyzed from January to September 2023.

Exposures

Two convolutional neural networks were pretrained using the Bootstrap Your Own Latent algorithm on unlabeled training data and fine-tuned with labeled training data to predict MacTel (self-supervised method). ResNet18 and ResNet50 models were also trained using all labeled data (supervised method).

Main Outcomes and Measures

The ground truth (MacTel vs no MacTel) was determined by retinal specialists based on spectral-domain OCT. The models' predictions were compared against human graders using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), area under the precision-recall curve (AUPRC), and area under the receiver operating characteristic curve (AUROC). Uniform manifold approximation and projection was performed for dimension reduction, and GradCAM visualizations were generated for the supervised and self-supervised methods.
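
As an illustration only, the threshold-based measures named above (accuracy, sensitivity, specificity, PPV, NPV) follow directly from the four cells of a binary confusion matrix. The sketch below is not the study's code; the function name `binary_metrics` and the example counts are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return standard binary classification metrics from confusion-matrix counts.

    tp/fn: diseased cases predicted positive/negative.
    tn/fp: non-diseased cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts, chosen only to exercise the function.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

AUPRC and AUROC are not covered by this sketch, since they summarize performance across all decision thresholds rather than at a single one.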

Results

A total of 2636 OCT scans from 780 patients with MacTel and 131 patients without MacTel were included from the MacTel Project (mean [SD] age, 60.8 [11.7] years; 63.8% female), and another 2564 from 1769 patients without MacTel from the University of Washington (mean [SD] age, 61.2 [18.1] years; 53.4% female). The self-supervised approach fine-tuned on 100% of the labeled training data with ResNet50 as the feature extractor performed the best, achieving an AUPRC of 0.971 (95% CI, 0.969-0.972), an AUROC of 0.970 (95% CI, 0.970-0.973), accuracy of 0.898, sensitivity of 0.898, specificity of 0.949, PPV of 0.935, and NPV of 0.919. With only 419 OCT volumes (185 MacTel patients in 10% of labeled training dataset), the ResNet18 self-supervised model achieved comparable performance, with an AUPRC of 0.958 (95% CI, 0.957-0.960), an AUROC of 0.966 (95% CI, 0.964-0.967), and accuracy, sensitivity, specificity, PPV, and NPV of 0.902, 0.884, 0.916, 0.896, and 0.906, respectively. The self-supervised models showed better agreement with the more experienced human expert graders.

Conclusions and Relevance

The findings suggest that self-supervised learning may improve the accuracy of automated MacTel vs non-MacTel binary classification on OCT with limited labeled training data, and these approaches may be applicable to other rare diseases, although further research is warranted.

Pereira M, Kshirsagar M, Mukherjee S, Dodhia R, Lavista Ferres J, de Sousa R. Assessment of differentially private synthetic data for utility and fairness in end-to-end machine learning pipelines for tabular data. PLoS One 2024;19:e0297271. [PMID: 38315667 PMCID: PMC10843030 DOI: 10.1371/journal.pone.0297271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Accepted: 01/02/2024] [Indexed: 02/07/2024] Open

Abstract

Differentially private (DP) synthetic datasets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We systematically investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic dataset generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generated using AIM and MWEM PGM algorithms can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.

Ciceri G, Baggiolini A, Cho HS, Kshirsagar M, Benito-Kwiecinski S, Walsh RM, Aromolaran KA, Gonzalez-Hernandez AJ, Munguba H, Koo SY, Xu N, Sevilla KJ, Goldstein PA, Levitz J, Leslie CS, Koche RP, Studer L. An epigenetic barrier sets the timing of human neuronal maturation. Nature 2024;626:881-890. [PMID: 38297124 PMCID: PMC10881400 DOI: 10.1038/s41586-023-06984-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 12/15/2023] [Indexed: 02/02/2024]

Affiliation(s)

Gabriele Ciceri: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Arianna Baggiolini: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Institute of Oncology Research (IOR), Bellinzona Institutes of Science (BIOS+), Bellinzona, Switzerland; Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland.
Hyein S Cho: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Meghana Kshirsagar: Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Microsoft AI for Good Research, Redmond, WA, USA.
Silvia Benito-Kwiecinski: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Ryan M Walsh: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Kelly A Aromolaran: Department of Anesthesiology, Weill Cornell Medicine, New York, NY, USA.
Alberto J Gonzalez-Hernandez: Department of Biochemistry, Weill Cornell Medicine, New York, NY, USA.
Hermany Munguba: Department of Biochemistry, Weill Cornell Medicine, New York, NY, USA.
So Yeon Koo: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Weill Cornell Neuroscience PhD Program, New York, NY, USA.
Nan Xu: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Louis V. Gerstner Jr Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Kaylin J Sevilla: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Peter A Goldstein: Department of Anesthesiology, Weill Cornell Medicine, New York, NY, USA.
Joshua Levitz: Department of Biochemistry, Weill Cornell Medicine, New York, NY, USA.
Christina S Leslie: Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Richard P Koche: Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Lorenz Studer: The Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.

Sledzieski S, Kshirsagar M, Baek M, Berger B, Dodhia R, Ferres JL. Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning. bioRxiv 2023:2023.11.09.566187. [PMID: 37986761 PMCID: PMC10659351 DOI: 10.1101/2023.11.09.566187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]

Abstract

Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in the size of models, these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we newly bring parameter-efficient fine-tuning methods to proteomics. Using the parameter-efficient method LoRA, we train new models for two important proteomic tasks: predicting protein-protein interactions (PPI) and predicting the symmetry of homooligomers. We show that for homooligomer symmetry prediction, these approaches achieve performance competitive with traditional fine-tuning while requiring reduced memory and using three orders of magnitude fewer parameters. On the PPI prediction task, we surprisingly find that PEFT models actually outperform traditional fine-tuning while using two orders of magnitude fewer parameters. Here, we go even further to show that freezing the parameters of the language model and training only a classification head also outperforms fine-tuning, using five orders of magnitude fewer parameters, and that both of these models outperform state-of-the-art PPI prediction methods with substantially reduced compute. We also demonstrate that PEFT is robust to variations in training hyper-parameters, and elucidate where best practices for PEFT in proteomics differ from in natural language processing. Thus, we provide a blueprint to democratize the power of protein language model tuning to groups which have limited computational resources.
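
To make the parameter savings concrete, here is a minimal pure-Python sketch of the low-rank adaptation (LoRA) idea as it is commonly described: a frozen weight matrix W is augmented with a trainable low-rank product B·A. This is an illustration under stated assumptions, not the authors' implementation; the helper names, layer sizes, rank, and scaling value are all hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the adapter rank.

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would be trained.
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_out, d_in, r):
    """Return (parameters in the full weight matrix, parameters in the LoRA adapter)."""
    return d_out * d_in, r * (d_in + d_out)


# Hypothetical toy dimensions.
random.seed(0)
d_in, d_out, r = 6, 4, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [1.0] * d_in

print(lora_forward(W, A, B, x, alpha=4) == matvec(W, x))
print(param_counts(1024, 1024, 8))
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only has to learn the low-rank correction. The savings reported in the abstract refer to whole models and depend on which layers are adapted, so the single-layer count here is only indicative.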

Ali MS, Kshirsagar M, Naredo E, Ryan C. Dynamic Grammar Pruning for Program Size Reduction in Symbolic Regression. SN Comput Sci 2023;4:402. [PMID: 37214587 PMCID: PMC10192180 DOI: 10.1007/s42979-023-01840-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 04/12/2023] [Indexed: 05/24/2023]

Meller A, Ward MD, Borowsky JH, Lotthammer JM, Kshirsagar M, Oviedo F, Lavista Ferres J, Bowman G. Predicting the locations of cryptic pockets from single protein structures using the PocketMiner graph neural network. Biophys J 2023;122:445a. [PMID: 36784287 DOI: 10.1016/j.bpj.2022.11.2400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023] Open

Mukherjee S, Kshirsagar M, Becker N, Xu Y, Weeks WB, Patel S, Ferres JL, Jackson ML. Identifying long-term effects of SARS-CoV-2 and their association with social determinants of health in a cohort of over one million COVID-19 survivors. BMC Public Health 2022;22:2394. [PMID: 36539760 PMCID: PMC9765366 DOI: 10.1186/s12889-022-14806-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 12/05/2022] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

Despite an abundance of information on the risk factors of SARS-CoV-2, there have been few US-wide studies of long-term effects. In this paper we analyzed a large medical claims database of US based individuals to identify common long-term effects as well as their associations with various social and medical risk factors.

METHODS

The medical claims database was obtained from a prominent US based claims data processing company, namely Change Healthcare. In addition to the claims data, the dataset also consisted of various social determinants of health such as race, income, education level and veteran status of the individuals. A self-controlled cohort design (SCCD) observational study was performed to identify ICD-10 codes whose proportion was significantly increased in the outcome period compared to the control period to identify significant long-term effects. A logistic regression-based association analysis was then performed between identified long-term effects and social determinants of health.

RESULTS

Among the over 1.37 million COVID patients in our datasets we found 36 out of 1724 3-digit ICD-10 codes to be statistically significantly increased in the post-COVID period (p-value < 0.05). We also found one combination of ICD-10 codes, corresponding to 'other anemias' and 'hypertension', that was statistically significantly increased in the post-COVID period (p-value < 0.05). Our logistic regression-based association analysis with social determinants of health variables, after adjusting for comorbidities and prior conditions, showed that age and gender were significantly associated with the multiple long-term effects. Race was only associated with 'other sepsis', income was only associated with 'Alopecia areata' (autoimmune disease causing hair loss), while education level was only associated with 'Maternal infectious and parasitic diseases' (p-value < 0.05).

CONCLUSION

We identified several long-term effects of SARS-CoV-2 through a self-controlled study on a cohort of over one million patients. Furthermore, we found that while age and gender are commonly associated with the long-term effects, other social determinants of health such as race, income and education levels have rare or no significant associations.

Kshirsagar M, Nasir M, Mukherjee S, Becker N, Dodhia R, Weeks WB, Ferres JL, Richardson B. The Risk of Hospitalization and Mortality After Breakthrough SARS-CoV-2 Infection by Vaccine Type: Observational Study of Medical Claims Data. JMIR Public Health Surveill 2022;8:e38898. [PMID: 36265135 PMCID: PMC9645422 DOI: 10.2196/38898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/20/2022] [Revised: 10/06/2022] [Accepted: 10/18/2022] [Indexed: 11/09/2022] Open

Abstract

BACKGROUND

Several risk factors have been identified for severe COVID-19 disease by the scientific community. In this paper, we focus on understanding the risks for severe COVID-19 infections after vaccination (ie, in breakthrough SARS-CoV-2 infections). Studying these risks by vaccine type, age, sex, comorbidities, and any prior SARS-CoV-2 infection is important to policy makers planning further vaccination efforts.

OBJECTIVE

We performed a comparative study of the risks of hospitalization (n=1140) and mortality (n=159) in a SARS-CoV-2 positive cohort of 19,815 patients who were all fully vaccinated with the Pfizer, Moderna, or Janssen vaccines.

METHODS

We performed Cox regression analysis to calculate the risk factors for developing a severe breakthrough SARS-CoV-2 infection in the study cohort by controlling for vaccine type, age, sex, comorbidities, and a prior SARS-CoV-2 infection.

RESULTS

We found lower hazard ratios for those receiving the Moderna vaccine (P<.001) and Pfizer vaccine (P<.001), with the lowest hazard rates being for Moderna, as compared to those who received the Janssen vaccine, independent of age, sex, comorbidities, vaccine type, and prior SARS-CoV-2 infection. Further, individuals who had a SARS-CoV-2 infection prior to vaccination had some increased protection over and above the protection already provided by the vaccines, from hospitalization (P=.001) and death (P=.04), independent of age, sex, comorbidities, and vaccine type. We found that the top statistically significant risk factors for severe breakthrough SARS-CoV-2 infections were age of >50, male gender, moderate and severe renal failure, severe liver disease, leukemia, chronic lung disease, coagulopathy, and alcohol abuse.

CONCLUSIONS

Among individuals who were fully vaccinated, the risk of severe breakthrough SARS-CoV-2 infection was lower for recipients of the Moderna or Pfizer vaccines and higher for recipients of the Janssen vaccine. These results from our analysis at a population level will be helpful to public health policy makers. Our result on the influence of a previous SARS-CoV-2 infection necessitates further research into the impact of multiple exposures on the risk of developing severe COVID-19.

Kshirsagar M, Yuan H, Ferres JL, Leslie C. BindVAE: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin. Genome Biol 2022;23:174. [PMID: 35971180 PMCID: PMC9380350 DOI: 10.1186/s13059-022-02723-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Accepted: 06/28/2022] [Indexed: 11/10/2022] Open

Law JN, Akers K, Tasnina N, Santina CMD, Deutsch S, Kshirsagar M, Klein-Seetharaman J, Crovella M, Rajagopalan P, Kasif S, Murali TM. Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2. Gigascience 2021;10:giab082. [PMID: 34966926 PMCID: PMC8716363 DOI: 10.1093/gigascience/giab082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Revised: 09/21/2021] [Accepted: 11/28/2021] [Indexed: 01/02/2023] Open

Yao Y, Kshirsagar M, Vaidya G, Ducrée J, Ryan C. Convergence of Blockchain, Autonomous Agents, and Knowledge Graph to Share Electronic Health Records. Front Blockchain 2021. [DOI: 10.3389/fbloc.2021.661238] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]

Kshirsagar M, Tasnina N, Ward MD, Law JN, Murali TM, Lavista Ferres JM, Bowman GR, Klein-Seetharaman J. Protein sequence models for prediction and comparative analysis of the SARS-CoV-2-human interactome. Pac Symp Biocomput 2021;26:154-165. [PMID: 33691013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]

Yuan H, Kshirsagar M, Zamparo L, Lu Y, Leslie CS. BindSpace decodes transcription factor binding signals by large-scale sequence embedding. Nat Methods 2019;16:858-861. [PMID: 31406384 PMCID: PMC6717532 DOI: 10.1038/s41592-019-0511-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/13/2019] [Accepted: 07/10/2019] [Indexed: 01/04/2023]

Kshirsagar M, Murugesan K, Carbonell JG, Klein-Seetharaman J. Multitask Matrix Completion for Learning Protein Interactions Across Diseases. J Comput Biol 2017;24:501-514. [PMID: 28128642 DOI: 10.1089/cmb.2016.0201] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022] Open

Kshirsagar M, Schleker S, Carbonell J, Klein-Seetharaman J. Techniques for transferring host-pathogen protein interactions knowledge to new tasks. Front Microbiol 2015;6:36. [PMID: 25699028 PMCID: PMC4313693 DOI: 10.3389/fmicb.2015.00036] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 09/09/2014] [Accepted: 01/12/2015] [Indexed: 11/17/2022] Open

Schleker S, Kshirsagar M, Klein-Seetharaman J. Comparing human-Salmonella with plant-Salmonella protein-protein interaction predictions. Front Microbiol 2015;6:45. [PMID: 25674082 PMCID: PMC4309195 DOI: 10.3389/fmicb.2015.00045] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 05/05/2014] [Accepted: 01/13/2015] [Indexed: 11/13/2022] Open

Kshirsagar M, Carbonell J, Klein-Seetharaman J. Multitask learning for host-pathogen protein interactions. Bioinformatics 2013;29:i217-26. [PMID: 23812987 PMCID: PMC3694681 DOI: 10.1093/bioinformatics/btt245] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Indexed: 11/29/2022] Open

Abstract

Motivation: An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems biology-based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge from experimental studies of host–pathogen interactions in several diseases to build stronger predictive models. Our approach is based on a formalism from machine learning called ‘multitask learning’, which considers the problem of building models across tasks that are related to each other. A ‘task’ in our scenario is the set of host–pathogen protein interactions involved in one disease. To integrate interactions from several tasks (i.e. diseases), our method exploits the similarity in the infection process across the diseases. In particular, we use the biological hypothesis that similar pathogens target the same critical biological processes in the host, in defining a common structure across the tasks.

Results: Our current work on host–pathogen protein interaction prediction focuses on human as the host, and four bacterial species as pathogens. The multitask learning technique we develop uses a task-based regularization approach. We find that the resulting optimization problem is a difference of convex (DC) functions. To optimize, we implement a Convex–Concave procedure-based algorithm. We compare our integrative approach to baseline methods that build models on a single host–pathogen protein interaction dataset. Our results show that our approach outperforms the baselines on the training data. We further analyze the protein interaction predictions generated by the models, and find some interesting insights.

Availability: The predictions and code are available at: http://www.cs.cmu.edu/~mkshirsa/ismb2013_paper320.html

Contact: j.klein-seetharaman@warwick.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Udupa A, Nahar P, Shah S, Kshirsagar M, Ghongane B. A comparative study of effects of omega-3 fatty acids, alpha lipoic acid and vitamin E in type 2 diabetes mellitus. Ann Med Health Sci Res 2013;3:442-6. [PMID: 24116330 PMCID: PMC3793456 DOI: 10.4103/2141-9248.117954] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/11/2023] Open

Kshirsagar M, Carbonell J, Klein-Seetharaman J. Techniques to cope with missing data in host-pathogen protein interaction prediction. Bioinformatics 2013;28:i466-i472. [PMID: 22962468 PMCID: PMC3436802 DOI: 10.1093/bioinformatics/bts375] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 01/01/2023]

Zhao Z, Xia J, Tastan O, Singh I, Kshirsagar M, Carbonell J, Klein-Seetharaman J. Virus interactions with human signal transduction pathways. Int J Comput Biol Drug Des 2011;4:83-105. [PMID: 21330695 DOI: 10.1504/ijcbdd.2011.038658] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/09/2023]