1
Facciuolo A, Scruten E, Lipsit S, Lang A, Parker Cates Z, Lew JM, Falzarano D, Gerdts V, Kusalik AJ, Napper S. High-resolution analysis of long-term serum antibodies in humans following convalescence of SARS-CoV-2 infection. Sci Rep 2022; 12:9045. [PMID: 35641545 PMCID: PMC9152668 DOI: 10.1038/s41598-022-12032-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 03/09/2022] [Indexed: 11/09/2022] Open
Abstract
Long-term antibody responses to SARS-CoV-2 have focused on responses to full-length spike protein, specific domains within spike, or nucleoprotein. In this study, we used high-density peptide microarrays representing the complete proteome of SARS-CoV-2 to identify binding sites (epitopes) targeted by antibodies present in the blood of COVID-19 resolved cases at 5 months post-diagnosis. Compared to previous studies that evaluated epitope-specific responses early post-diagnosis (< 60 days), we found that epitope-specific responses to nucleoprotein and spike protein have contracted, and that responses to membrane protein have expanded. Although antibody titers to full-length spike and nucleoprotein remain steady over months, taken together our data suggest that the population of epitope-specific antibodies that contribute to this reactivity is dynamic and evolves over time. Further, the spike epitopes bound by polyclonal antibodies in COVID-19 convalescent serum samples aligned with known target sites that can neutralize viral activity suggesting that the maintenance of these antibodies might provide rapid serological immunity. Finally, the most dominant epitopes for membrane protein and spike showed high diagnostic accuracy providing novel biomarkers to refine blood-based antibody tests. This study provides new insights into the specific regions of SARS-CoV-2 targeted by serum antibodies long after infection.
Affiliation(s)
- Antonio Facciuolo
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
- Erin Scruten
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
- Sean Lipsit
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Biochemistry, Microbiology, and Immunology, University of Saskatchewan, Saskatoon, SK, Canada
- Amanda Lang
  - Roy Romanow Provincial Laboratory, Saskatchewan Health Authority, Regina, SK, Canada
- Zoë Parker Cates
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Jocelyne M Lew
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
- Darryl Falzarano
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Veterinary Microbiology, University of Saskatchewan, Saskatoon, SK, Canada
- Volker Gerdts
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
- Anthony J Kusalik
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Scott Napper
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Biochemistry, Microbiology, and Immunology, University of Saskatchewan, Saskatoon, SK, Canada
2
Maleki F, Ovens K, McQuillan I, Kusalik AJ. Silver: Forging almost Gold Standard Datasets. Genes (Basel) 2021; 12:genes12101523. [PMID: 34680918 PMCID: PMC8535810 DOI: 10.3390/genes12101523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2021] [Revised: 09/19/2021] [Accepted: 09/22/2021] [Indexed: 11/16/2022] Open
Abstract
Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory recommendations of which methods are superior. Therefore, an unbiased quantitative framework for evaluations of gene set analysis methods will be valuable. Such a framework requires gene expression datasets where enrichment status of gene sets is known a priori. In the absence of such gold standard datasets, artificial datasets are commonly used for evaluations of gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluation of gene set analysis methods by synthesizing expression datasets using real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene-gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods to achieve higher specificity without sacrificing sensitivity.
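As a rough illustration of the kind of evaluation that known enrichment status makes possible, the Python sketch below scores one method's output for sensitivity and specificity. The gene set names, the `truth` labels, and the function name are hypothetical and chosen only for this example; this is not the Silver implementation.

```python
def sensitivity_specificity(truth, reported):
    """Score one method against known enrichment status.

    truth:    dict mapping gene set name -> True if truly enriched (assumed known a priori)
    reported: set of gene set names the method called significant
    """
    tp = sum(1 for gs, enriched in truth.items() if enriched and gs in reported)
    fn = sum(1 for gs, enriched in truth.items() if enriched and gs not in reported)
    tn = sum(1 for gs, enriched in truth.items() if not enriched and gs not in reported)
    fp = sum(1 for gs, enriched in truth.items() if not enriched and gs in reported)
    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels: two truly enriched gene sets, three not enriched.
truth = {"GS_A": True, "GS_B": True, "GS_C": False, "GS_D": False, "GS_E": False}
reported = {"GS_A", "GS_C"}  # what an imaginary method flagged as significant

sens, spec = sensitivity_specificity(truth, reported)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A method that reports many gene sets raises sensitivity at the cost of specificity, which is the trade-off the abstract refers to.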
Affiliation(s)
- Farhad Maleki
  - Augmented Intelligence & Precision Health Laboratory, Institute of the McGill University Health Centre, McGill University, Montreal, QC H4A 3S5, Canada
  - Correspondence:
- Katie Ovens
  - Augmented Intelligence & Precision Health Laboratory, Institute of the McGill University Health Centre, McGill University, Montreal, QC H4A 3S5, Canada
- Ian McQuillan
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
- Anthony J. Kusalik
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
3
Parker Cates Z, Facciuolo A, Hogan D, Griebel PJ, Napper S, Kusalik AJ. EPIphany—A Platform for Analysis and Visualization of Peptide Immunoarray Data. Front Bioinform 2021; 1:694324. [PMID: 36303765 PMCID: PMC9581008 DOI: 10.3389/fbinf.2021.694324] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/12/2021] [Accepted: 06/25/2021] [Indexed: 11/13/2022] Open
Abstract
Antibodies are critical effector molecules of the humoral immune system. Upon infection or vaccination, populations of antibodies are generated which bind to various regions of the invading pathogen or exogenous agent. Defining the reactivity and breadth of this antibody response provides an understanding of the antigenic determinants and enables the rational development and assessment of vaccine candidates. High-resolution analysis of these populations typically requires advanced techniques such as B cell receptor repertoire sequencing, mass spectrometry of isolated immunoglobulins, or phage display libraries that are dependent upon equipment and expertise which are prohibitive for many labs. High-density peptide microarrays representing diverse populations of putative linear epitopes (immunoarrays) are an effective alternative for high-throughput examination of antibody reactivity and diversity. While a promising technology, widespread adoption of immunoarrays has been limited by the need for, and relative absence of, user-friendly tools for consideration and visualization of the emerging data. To address this limitation, we developed EPIphany, a software platform with a simple web-based user interface, aimed at biological users, that provides access to important analysis parameters, data normalization options, and a variety of unique data visualization options. This platform provides researchers the greatest opportunity to extract biologically meaningful information from the immunoarray data, thereby facilitating the discovery and development of novel immuno-therapeutics.
Affiliation(s)
- Zoe Parker Cates
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Antonio Facciuolo
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
- Daniel Hogan
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Philip J. Griebel
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
  - School of Public Health, University of Saskatchewan, Saskatoon, SK, Canada
- Scott Napper
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Biochemistry, Microbiology and Immunology, University of Saskatchewan, Saskatoon, SK, Canada
  - *Correspondence: Scott Napper,
- Anthony J. Kusalik
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
4
Finch SL, Rosenberg AM, Kusalik AJ, Maleki F, Rezaei E, Baxter-Jones A, Benseler S, Boire G, Cabral D, Campillo S, Chédeville G, Chetaille AL, Dancey P, Duffy C, Duffy KW, Guzman J, Houghton K, Huber AM, Jurencak R, Lang B, Laxer RM, Morishita K, Oen KG, Petty RE, Ramsey SE, Roth J, Schneider R, Scuccimarri R, Stringer E, Tse SML, Tucker LB, Turvey SE, Szafron M, Whiting S, Yeung RS, Vatanparast H. Higher concentrations of vitamin D in Canadian children with juvenile idiopathic arthritis compared to healthy controls are associated with more frequent use of vitamin D supplements and season of birth. Nutr Res 2021; 92:139-149. [PMID: 34311227 DOI: 10.1016/j.nutres.2021.05.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/02/2020] [Revised: 05/05/2021] [Accepted: 05/23/2021] [Indexed: 10/21/2022]
Abstract
A number of studies have demonstrated that patients with autoimmune disease have lower levels of vitamin D, prompting speculation that vitamin D might suppress inflammation and immune responses in children with juvenile idiopathic arthritis (JIA). The objective of this study was to compare vitamin D levels in children with JIA at disease onset with those of healthy children. We hypothesized that children and adolescents with JIA have lower vitamin D levels than healthy children and adolescents. Data from a Canadian cohort of children with new-onset JIA (n = 164, data collection 2007-2012) were compared to Canadian Health Measures Survey (CHMS) data (n = 4027, data collection 2007-2011). We compared 25-hydroxy vitamin D (25(OH)D) concentrations with measures of inflammation, vitamin D supplement use, milk intake, and season of birth. Mean 25(OH)D level was significantly higher in patients with JIA (79 ± 3.1 nmol/L) than in healthy controls (68 ± 1.8 nmol/L; P < .05). Patients with JIA more often used vitamin D-containing supplements (50% vs. 7%; P < .05). The prevalence of 25(OH)D deficiency (<30 nmol/L) was 6% for both groups. Children with JIA with 25(OH)D deficiency or insufficiency (<50 nmol/L) had higher C-reactive protein levels. Children with JIA were more often born in the fall and winter compared to healthy children. In contrast to earlier studies, we found vitamin D levels in Canadian children with JIA were higher compared to healthy children and associated with more frequent use of vitamin D supplements. Among children with JIA, low vitamin D levels were associated with indicators of greater inflammation.
Affiliation(s)
- Sarah L Finch
  - University of Saskatchewan, Saskatoon, Canada; University of Prince Edward Island, Charlottetown, Canada
- Susanne Benseler
  - Alberta Children's Hospital, Cumming School of Medicine, University of Calgary, Calgary, Canada
- David Cabral
  - BC Children's Hospital and The University of British Columbia, Vancouver, Canada
- Paul Dancey
  - Janeway Children's Health and Rehabilitation Centre, St. John's, Canada
- Ciaran Duffy
  - Children's Hospital of Eastern Ontario, Ottawa, Canada
- Jaime Guzman
  - BC Children's Hospital and The University of British Columbia, Vancouver, Canada
- Kristin Houghton
  - BC Children's Hospital and The University of British Columbia, Vancouver, Canada
- Adam M Huber
  - IWK Health Centre and Dalhousie University, Halifax, Canada
- Bianca Lang
  - IWK Health Centre and Dalhousie University, Halifax, Canada
- Ron M Laxer
  - The University of Toronto and The Hospital for Sick Children, Toronto, Canada
- Kimberly Morishita
  - BC Children's Hospital and The University of British Columbia, Vancouver, Canada
- Kiem G Oen
  - University of Manitoba, Winnipeg, Canada
- Ross E Petty
  - BC Children's Hospital and The University of British Columbia, Vancouver, Canada
- Johannes Roth
  - Children's Hospital of Eastern Ontario, Ottawa, Canada
- Rayfel Schneider
  - The University of Toronto and The Hospital for Sick Children, Toronto, Canada
- Shirley M L Tse
  - The University of Toronto and The Hospital for Sick Children, Toronto, Canada
- Lori B Tucker
  - BC Children's Hospital and The University of British Columbia, Vancouver, Canada
- Stuart E Turvey
  - BC Children's Hospital and The University of British Columbia, Vancouver, Canada
- Rae S M Yeung
  - The University of Toronto and The Hospital for Sick Children, Toronto, Canada
5
Rezaei E, Hogan D, Trost B, Kusalik AJ, Boire G, Cabral DA, Campillo S, Chédeville G, Chetaille AL, Dancey P, Duffy C, Watanabe Duffy K, Gordon J, Guzman J, Houghton K, Huber AM, Jurencak R, Lang B, Morishita K, Oen KG, Petty RE, Ramsey SE, Scuccimarri R, Spiegel L, Stringer E, Taylor-Gjevre RM, Tse SML, Tucker LB, Turvey SE, Tupper S, Yeung RSM, Benseler S, Ellsworth J, Guillet C, Karananayake C, Muhajarine N, Roth J, Schneider R, Rosenberg AM. Clinical and associated inflammatory biomarker features predictive of short-term outcomes in non-systemic juvenile idiopathic arthritis. Rheumatology (Oxford) 2021; 59:2402-2411. [PMID: 31919503 DOI: 10.1093/rheumatology/kez615] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/26/2019] [Revised: 11/04/2019] [Indexed: 12/11/2022] Open
Abstract
OBJECTIVE To identify early predictors of disease activity at 18 months in JIA using clinical and biomarker profiling. METHODS Clinical and biomarker data were collected at JIA diagnosis in a prospective longitudinal inception cohort of 82 children with non-systemic JIA, and their ability to predict an active joint count of 0, a physician global assessment of disease activity of ≤1 cm, and inactive disease by Wallace 2004 criteria 18 months later was assessed. Correlation-based feature selection and ReliefF were used to shortlist predictors and random forest models were trained to predict outcomes. RESULTS From the original 112 features, 13 effectively predicted 18-month outcomes. They included age, number of active/effused joints, wrist, ankle and/or knee involvement, ESR, ANA positivity and plasma levels of five inflammatory biomarkers (IL-10, IL-17, IL-12p70, soluble low-density lipoprotein receptor-related protein 1 and vitamin D), at enrolment. The clinical plus biomarker panel predicted active joint count = 0, physician global assessment ≤ 1, and inactive disease after 18 months with 0.79, 0.80 and 0.83 accuracy and 0.84, 0.83, 0.88 area under the curve, respectively. Using clinical features alone resulted in 0.75, 0.72 and 0.80 accuracy, and area under the curve values of 0.81, 0.78 and 0.83, respectively. CONCLUSION A panel of five plasma biomarkers combined with clinical features at the time of diagnosis more accurately predicted short-term disease activity in JIA than clinical characteristics alone. If validated in external cohorts, such a panel may guide more rationally conceived, biologically based, personalized treatment strategies in early JIA.
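For readers unfamiliar with the two metrics reported above, the following Python sketch shows how accuracy and area under the ROC curve can be computed from a classifier's predicted probabilities. The labels and scores are invented toy values, the 0.5 threshold is an assumption, and the pairwise (Mann-Whitney) formulation of AUC is used for clarity; nothing here reproduces the study's random forest models or data.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome reached (e.g. inactive disease), 0 = not reached.
y_true = [1, 0, 1, 0, 1]
y_prob = [0.9, 0.4, 0.6, 0.7, 0.3]

acc = accuracy(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(f"accuracy={acc:.2f} auc={auc:.2f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why abstracts such as this one commonly report both.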
Affiliation(s)
- Elham Rezaei
  - Department of Pediatrics, University of Saskatchewan, Saskatoon, SK, Canada
- Daniel Hogan
  - Department of Computer Sciences, University of Saskatchewan, Saskatoon, SK, Canada
- Brett Trost
  - Department of Computer Sciences, University of Saskatchewan, Saskatoon, SK, Canada
- Anthony J Kusalik
  - Department of Computer Sciences, University of Saskatchewan, Saskatoon, SK, Canada
- Gilles Boire
  - Département de Médecine, Université de Sherbrooke, Sherbrooke, QC, Canada
- David A Cabral
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, BC, Canada
- Sarah Campillo
  - Department of Pediatrics, McGill University Health Center, Montreal, QC, Canada
- Gaëlle Chédeville
  - Department of Pediatrics, McGill University Health Center, Montreal, QC, Canada
- Anne-Laure Chetaille
  - Département de Médecine le, Centre Hospitalier Universitaire de Quebec, Quebec, QC, Canada
- Paul Dancey
  - Department of Pediatrics, Janeway Children's Health and Rehabilitation Centre, St John's, NL, Canada
- Ciaran Duffy
  - Department of Pediatrics, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada
- Karen Watanabe Duffy
  - Department of Pediatrics, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada
- John Gordon
  - Department of Medicine, University of Saskatchewan, Saskatoon, SK, Canada
- Jaime Guzman
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, BC, Canada
- Kristin Houghton
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, BC, Canada
- Adam M Huber
  - Department of Pediatrics, IWK Health Centre and Dalhousie University, Halifax, NS, Canada
- Roman Jurencak
  - Department of Pediatrics, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada
- Bianca Lang
  - Department of Pediatrics, IWK Health Centre and Dalhousie University, Halifax, NS, Canada
- Kimberly Morishita
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, BC, Canada
- Kiem G Oen
  - Department of Pediatrics, University of Manitoba, Winnipeg, MB, Canada
- Ross E Petty
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, BC, Canada
- Suzanne E Ramsey
  - Department of Pediatrics, IWK Health Centre and Dalhousie University, Halifax, NS, Canada
- Rosie Scuccimarri
  - Department of Pediatrics, McGill University Health Center, Montreal, QC, Canada
- Lynn Spiegel
  - Department of Paediatrics, University of Toronto and the Hospital for Sick Children, Toronto, ON, Canada
- Elizabeth Stringer
  - Department of Pediatrics, IWK Health Centre and Dalhousie University, Halifax, NS, Canada
- Shirley M L Tse
  - Department of Paediatrics, University of Toronto and the Hospital for Sick Children, Toronto, ON, Canada
- Lori B Tucker
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, BC, Canada
- Stuart E Turvey
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver, BC, Canada
- Susan Tupper
  - Department of Pediatrics, University of Saskatchewan, Saskatoon, SK, Canada
- Rae S M Yeung
  - Department of Paediatrics, University of Toronto and the Hospital for Sick Children, Toronto, ON, Canada
- Susanne Benseler
  - Department of Pediatrics, University of Calgary, Calgary, AB, Canada
- Janet Ellsworth
  - Department of Pediatrics, University of Alberta, Edmonton, AB, Canada
- Chantal Guillet
  - Department of Pediatrics, Hôpital Fleurimont (CHUS), Quebec, QC, Canada
- Nazeem Muhajarine
  - Department of Community Health and Epidemiology, University of Saskatchewan, Saskatoon, SK, Canada
- Johannes Roth
  - Department of Pediatrics, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada
- Rayfel Schneider
  - Department of Paediatrics, University of Toronto and the Hospital for Sick Children, Toronto, ON, Canada
- Alan M Rosenberg
  - Department of Pediatrics, University of Saskatchewan, Saskatoon, SK, Canada
6
Rezaei E, Hogan D, Trost B, Kusalik AJ, Boire G, Cabral DA, Campillo S, Chédeville G, Chetaille AL, Dancey P, Duffy C, Duffy KW, Eng SWM, Gordon J, Guzman J, Houghton K, Huber AM, Jurencak R, Lang B, Laxer RM, Morishita K, Oen KG, Petty RE, Ramsey SE, Scherer SW, Scuccimarri R, Spiegel L, Stringer E, Taylor-Gjevre RM, Tse SML, Tucker LB, Turvey SE, Tupper S, Wintle RF, Yeung RSM, Rosenberg AM. Associations of clinical and inflammatory biomarker clusters with juvenile idiopathic arthritis categories. Rheumatology (Oxford) 2020; 59:1066-1075. [PMID: 32321162 DOI: 10.1093/rheumatology/kez382] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/21/2019] [Revised: 07/30/2019] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE To identify discrete clusters comprising clinical features and inflammatory biomarkers in children with JIA and to determine cluster alignment with JIA categories. METHODS A Canadian prospective inception cohort comprising 150 children with JIA was evaluated at baseline (visit 1) and after six months (visit 2). Data included clinical manifestations and inflammation-related biomarkers. Probabilistic principal component analysis identified sets of composite variables, or principal components, from 191 original variables. To discern new clinical-biomarker clusters (clusters), Gaussian mixture models were fit to the data. Newly-defined clusters and JIA categories were compared. Agreement between the two was assessed using Kruskal-Wallis analyses and contingency plots. RESULTS Three principal components recovered 35% (three clusters) and 40% (five clusters) of the variance in patient profiles in visits 1 and 2, respectively. None of the clusters aligned precisely with any of the seven JIA categories but rather spanned multiple categories. Results demonstrated that the newly defined clinical-biomarker clusters are more homogeneous than JIA categories. CONCLUSION Applying unsupervised data mining to clinical and inflammatory biomarker data discerns discrete clusters that intersect multiple JIA categories. Results suggest that certain groups of patients within different JIA categories are more aligned pathobiologically than their separate clinical categorizations suggest. Applying data mining analyses to complex datasets can generate insights into JIA pathogenesis and could contribute to biologically based refinements in JIA classification.
Affiliation(s)
- Elham Rezaei
  - Department of Pediatrics, University of Saskatchewan, Saskatoon, Canada
- Daniel Hogan
  - Department of Computer Sciences, University of Saskatchewan
- Brett Trost
  - Department of Computer Sciences, University of Saskatchewan
- Gilles Boire
  - Département de Médecine, Université de Sherbrooke, Sherbrooke
- David A Cabral
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver
- Sarah Campillo
  - Department of Pediatrics, McGill University Health Center, Montreal
- Paul Dancey
  - Department of Pediatrics, Janeway Children's Health and Rehabilitation Centre, St. John's
- Ciaran Duffy
  - Department of Pediatrics, Children's Hospital of Eastern Ontario, Ottawa
- Simon W M Eng
  - Department of Pediatrics, University of Toronto and the Hospital for Sick Children, Toronto
- John Gordon
  - Department of Medicine, University of Saskatchewan
- Jaime Guzman
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver
- Kristin Houghton
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver
- Adam M Huber
  - Department of Pediatrics, IWK Health Centre and Dalhousie University, Halifax
- Roman Jurencak
  - Department of Pediatrics, Children's Hospital of Eastern Ontario, Ottawa
- Bianca Lang
  - Department of Pediatrics, IWK Health Centre and Dalhousie University, Halifax
- Ronald M Laxer
  - Department of Pediatrics, University of Toronto and the Hospital for Sick Children, Toronto
- Kimberly Morishita
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver
- Kiem G Oen
  - Department of Pediatrics, University of Manitoba, Winnipeg
- Ross E Petty
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver
- Suzanne E Ramsey
  - Department of Pediatrics, IWK Health Centre and Dalhousie University, Halifax
- Lynn Spiegel
  - Department of Pediatrics, University of Toronto and the Hospital for Sick Children, Toronto
- Elizabeth Stringer
  - Department of Pediatrics, IWK Health Centre and Dalhousie University, Halifax
- Shirley M L Tse
  - Department of Pediatrics, University of Toronto and the Hospital for Sick Children, Toronto
- Lori B Tucker
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver
- Stuart E Turvey
  - Department of Pediatrics, British Columbia Children's Hospital, Vancouver
- Susan Tupper
  - Department of Pediatrics, University of Saskatchewan, Saskatoon, Canada
- Rae S M Yeung
  - Department of Pediatrics, University of Toronto and the Hospital for Sick Children, Toronto
- Alan M Rosenberg
  - Department of Pediatrics, University of Saskatchewan, Saskatoon, Canada
7
Abstract
Gene set analysis methods are widely used to provide insight into high-throughput gene expression data. There are many gene set analysis methods available. These methods rely on various assumptions and have different requirements, strengths and weaknesses. In this paper, we classify gene set analysis methods based on their components, describe the underlying requirements and assumptions for each class, and provide directions for future research in developing and evaluating gene set analysis methods.
8
Pérez‐López E, Hossain MM, Tu J, Waldner M, Todd CD, Kusalik AJ, Wei Y, Bonham‐Smith PC. Transcriptome Analysis Identifies Plasmodiophora brassicae Secondary Infection Effector Candidates. J Eukaryot Microbiol 2020; 67:337-351. [PMID: 31925980 PMCID: PMC7317818 DOI: 10.1111/jeu.12784] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/14/2019] [Revised: 12/15/2019] [Accepted: 01/04/2020] [Indexed: 12/17/2022]
Abstract
Plasmodiophora brassicae (Wor.) is an obligate intracellular plant pathogen affecting Brassicas worldwide. Identification of effector proteins is key to understanding the interaction between P. brassicae and its susceptible host plants. To date, there is very little information available on putative effector proteins secreted by P. brassicae during a secondary infection of susceptible host plants, resulting in root gall production. A bioinformatics pipeline approach to RNA-Seq data from Arabidopsis thaliana (L.) Heynh. root tissues at 17, 20, and 24 d postinoculation (dpi) identified 32 small secreted P. brassicae proteins (SSPbPs) that were highly expressed over this secondary infection time frame. Functional signal peptides were confirmed for 31 of the SSPbPs, supporting the accuracy of the pipeline designed to identify secreted proteins. Expression profiles at 0, 2, 5, 7, 14, 21, and 28 dpi verified the involvement of some of the SSPbPs in secondary infection. For seven of the SSPbPs, a functional domain was identified using Blast2GO and 3D structure analysis and domain functionality was confirmed for SSPbP22, a kinase localized to the cytoplasm and nucleus.
Affiliation(s)
- Edel Pérez‐López
  - Department of Biology, University of Saskatchewan, Saskatoon, SK S7N 5E2, Canada
- Jiangying Tu
  - Agriculture and Agri‐food Canada, Saskatoon Research Centre, Saskatoon, SK S7N 0X2, Canada
- Matthew Waldner
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
- Anthony J. Kusalik
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
- Yangdou Wei
  - Department of Biology, University of Saskatchewan, Saskatoon, SK S7N 5E2, Canada
9
Maleki F, Ovens KL, Hogan DJ, Rezaei E, Rosenberg AM, Kusalik AJ. Measuring consistency among gene set analysis methods: A systematic study. J Bioinform Comput Biol 2019; 17:1940010. [DOI: 10.1142/s0219720019400109] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]
Abstract
Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
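One simple way to quantify the kind of disagreement described above is the Jaccard overlap between the top-k gene sets reported by two methods. The Python sketch below is an illustrative assumption: the abstract does not state which agreement measure the authors used, and the ranked lists and the function name are hypothetical.

```python
def top_k_jaccard(ranked_a, ranked_b, k):
    """Jaccard index between the k most significant gene sets of two ranked lists.

    Each list is assumed to be ordered from most to least significant.
    Returns 1.0 when both top-k sets are empty.
    """
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Hypothetical rankings from two imaginary methods on the same dataset.
method_x = ["gs1", "gs2", "gs3", "gs4"]
method_y = ["gs3", "gs5", "gs1", "gs6"]

overlap = top_k_jaccard(method_x, method_y, k=3)
print(f"top-3 Jaccard overlap: {overlap:.2f}")
```

Values near 0 indicate that the two methods highlight almost entirely different gene sets, while values near 1 indicate close agreement; the same function applies with k = 20 or k = 100.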
Affiliation(s)
- Farhad Maleki
  - Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon SK S7N 5C9, Canada
- Katie L. Ovens
  - Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon SK S7N 5C9, Canada
- Daniel J. Hogan
  - Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon SK S7N 5C9, Canada
- Elham Rezaei
  - Department of Pediatrics, Royal University Hospital, Saskatoon SK S7N 0W8, Canada
- Alan M. Rosenberg
  - Department of Pediatrics, Royal University Hospital, Saskatoon SK S7N 0W8, Canada
- Anthony J. Kusalik
  - Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon SK S7N 5C9, Canada
10
Abstract
BACKGROUND Gene set analysis is a well-established approach for interpretation of data from high-throughput gene expression studies. Achieving reproducible results is an essential requirement in such studies. One factor of a gene expression experiment that can affect reproducibility is the choice of sample size. However, choosing an appropriate sample size can be difficult, especially because the choice may be method-dependent. Further, sample size choice can have unexpected effects on specificity. RESULTS In this paper, we report on a systematic, quantitative approach to study the effect of sample size on the reproducibility of the results from 13 gene set analysis methods. We also investigate the impact of sample size on the specificity of these methods. Rather than relying on synthetic data, the proposed approach uses real expression datasets to offer an accurate and reliable evaluation. CONCLUSION Our findings show that, as a general pattern, the results of gene set analysis become more reproducible as sample size increases. However, the extent of reproducibility and the rate at which it increases vary from method to method. In addition, even in the absence of differential expression, some gene set analysis methods report a large number of false positives, and increasing sample size does not lead to reducing these false positives. The results of this research can be used when selecting a gene set analysis method from those available.
Affiliation(s)
- Farhad Maleki
- Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, Canada.
| | - Katie Ovens
- Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, Canada
| | - Ian McQuillan
- Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, Canada
| | - Anthony J Kusalik
- Department of Computer Science, University of Saskatchewan, 110 Science Place, Saskatoon, Canada
| |
|
11
|
Irani S, Trost B, Waldner M, Nayidu N, Tu J, Kusalik AJ, Todd CD, Wei Y, Bonham-Smith PC. Transcriptome analysis of response to Plasmodiophora brassicae infection in the Arabidopsis shoot and root. BMC Genomics 2018; 19:23. [PMID: 29304736 PMCID: PMC5756429 DOI: 10.1186/s12864-017-4426-7] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 06/13/2017] [Accepted: 12/29/2017] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Clubroot is an important disease caused by the obligate parasite Plasmodiophora brassicae that infects the Brassicaceae. As a soil-borne pathogen, P. brassicae induces the generation of abnormal tissue in the root, resulting in the formation of galls. Root infection negatively affects the uptake of water and nutrients in host plants, severely reducing their growth and productivity. Many studies have emphasized the molecular and physiological effects of the clubroot disease on root tissues. The aim of the present study is to better understand the effect of P. brassicae on the transcriptome of both shoot and root tissues of Arabidopsis thaliana. RESULTS Transcriptome profiling using RNA-seq was performed on both shoot and root tissues at 17, 20 and 24 days post inoculation (dpi) of A. thaliana, a model plant host for P. brassicae. The number of differentially expressed genes (DEGs) between infected and uninfected samples was larger in shoot than in root. In both shoot and root, more genes were differentially regulated at 24 dpi than the two earlier time points. Genes that were highly regulated in response to infection in both shoot and root primarily were involved in the metabolism of cell wall compounds, lipids, and shikimate pathway metabolites. Among hormone-related pathways, several jasmonic acid biosynthesis genes were upregulated in both shoot and root tissue. Genes encoding enzymes involved in cell wall modification, biosynthesis of sucrose and starch, and several classes of transcription factors were generally differently regulated in shoot and root. CONCLUSIONS These results highlight the similarities and differences in the transcriptomic response of above- and below-ground tissues of the model host Arabidopsis following P. brassicae infection. The main transcriptomic changes in root metabolism during clubroot disease progression were identified. 
An overview of DEGs in the shoot underlined the physiological changes in above-ground tissues following pathogen establishment and disease progression. This study provides insights into host tissue-specific molecular responses to clubroot development and may have applications in the development of clubroot markers for more effective breeding strategies.
Affiliation(s)
- Solmaz Irani
- Department of Biology, University of Saskatchewan, Saskatoon, S7N 5E2 Canada
| | - Brett Trost
- Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9 Canada
| | - Matthew Waldner
- Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9 Canada
| | - Naghabushana Nayidu
- Department of Biology, University of Saskatchewan, Saskatoon, S7N 5E2 Canada
| | - Jiangying Tu
- Department of Biology, University of Saskatchewan, Saskatoon, S7N 5E2 Canada
| | - Anthony J. Kusalik
- Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9 Canada
| | - Christopher D. Todd
- Department of Biology, University of Saskatchewan, Saskatoon, S7N 5E2 Canada
| | - Yangdou Wei
- Department of Biology, University of Saskatchewan, Saskatoon, S7N 5E2 Canada
| | - Peta C. Bonham-Smith
- Department of Biology, University of Saskatchewan, Saskatoon, S7N 5E2 Canada
| |
|
12
|
Pérez-López E, Waldner M, Hossain M, Kusalik AJ, Wei Y, Bonham-Smith PC, Todd CD. Identification of Plasmodiophora brassicae effectors - A challenging goal. Virulence 2018; 9:1344-1353. [PMID: 30146948 PMCID: PMC6177251 DOI: 10.1080/21505594.2018.1504560] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/01/2018] [Accepted: 07/18/2018] [Indexed: 11/06/2022] Open
Abstract
Clubroot is an economically important disease affecting Brassica plants worldwide. Plasmodiophora brassicae is the protist pathogen associated with the disease, and its soil-borne obligate parasitic nature has impeded studies related to its biology and the mechanisms involved in its infection of the plant host. The identification of effector proteins is key to understanding how the pathogen manipulates the plant's immune response and the genes involved in resistance. After more than 140 years of studying clubroot and P. brassicae, very little is known about the effectors playing key roles in the infection process and subsequent disease progression. Here we analyze the information available for identified effectors and suggest several features of effector genes that can be used in the search for others. Based on the information presented in this review, we propose a comprehensive bioinformatics pipeline for effector identification and provide a list of the bioinformatics tools available for this purpose.
Affiliation(s)
- Edel Pérez-López
- Department of Biology, University of Saskatchewan, Saskatoon, Canada
| | - Matthew Waldner
- Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
| | - Musharaf Hossain
- Department of Biology, University of Saskatchewan, Saskatoon, Canada
| | - Anthony J. Kusalik
- Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
| | - Yangdou Wei
- Department of Biology, University of Saskatchewan, Saskatoon, Canada
|
13
|
Abstract
De novo peptide sequencing using tandem mass spectrometry (MS/MS) data has become a major computational method for sequence identification in recent years. With the development of new instruments and technology, novel computational methods have emerged with enhanced performance. However, there are only a few methods focusing on ECD/ETD spectra, which mainly contain variants of c-ions and z-ions. Here, a de novo sequencing method for ECD/ETD spectra, NovoExD, is presented. NovoExD applies a new form of spectrum graph with multiple edge types (called a GMET), considers multiple peptide tags, and integrates amino acid combination (AAC) and fragment ion charge information. Its performance is compared with another successful de novo sequencing method, pNovo+, which has an option for ECD/ETD spectra. Experiments conducted on three different datasets show that the average full-length peptide identification accuracy of NovoExD is as high as 88.70 percent, and that NovoExD's average accuracy is more than 20 percent greater on all datasets than that of pNovo+.
|
14
|
Yan Y, Kusalik AJ, Wu FX. De novo peptide sequencing using CID and HCD spectra pairs. Proteomics 2016; 16:2615-2624. [DOI: 10.1002/pmic.201500251] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/25/2015] [Revised: 05/31/2016] [Accepted: 07/08/2016] [Indexed: 11/06/2022]
Affiliation(s)
- Yan Yan
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Anthony J. Kusalik
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| |
|
15
|
Yan Y, Kusalik AJ, Wu FX. Recent Developments in Computational Methods for De Novo Peptide Sequencing from Tandem Mass Spectrometry (MS/MS). Protein Pept Lett 2016; 22:983-91. [PMID: 26295161 DOI: 10.2174/0929866522666150821113127] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/11/2015] [Revised: 08/12/2015] [Accepted: 08/20/2015] [Indexed: 11/22/2022]
Abstract
Tandem mass spectrometry (MS/MS) has emerged as a major technology for peptide sequencing. Typically, there are three kinds of methods for the peptide sequencing: database searching, peptide tagging, and de novo sequencing. De novo sequencing has drawn increasing attention because of its independence from existing protein databases and potential for identifying new proteins, proteins resulting from mutations, proteins with unexpected modifications and so on. Recently, with the improvements in the accuracy of MS/MS and development of alternative fragmentation modes of MS/MS, many new de novo sequencing methods have been formulated. This paper reviews these recently developed sequencing methods including those for alternative MS/MS spectra. The paper first introduces background knowledge on peptide sequencing and mass spectrometry, and then reviews de novo peptide sequencing methods for traditional CID spectra. After that, it focuses on the recent development of de novo methods for alternative MS/MS spectra. In addition, methods using multiple spectra from the same peptide are surveyed. Finally, conclusions and some directions of future work are discussed.
Affiliation(s)
| | | | - Fang-Xiang Wu
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada.
| |
|
16
|
|
17
|
Abstract
In recent years, de novo peptide sequencing from mass spectrometry data has developed as one of the major peptide identification methods with the emergence of new instruments and advanced computational methods. However, there are still limitations to this method; for example, the typically used spectrum graph model cannot represent all the information and relationships inherent in tandem mass spectra (MS/MS spectra). Here, we present a new method named NovoHCD which applies a spectrum graph model with multiple types of edges (called a multi-edge graph), and integrates into it amino acid combination (AAC) information and peptide tags. In addition, information on immonium ions observed particularly in higher-energy collisional dissociation (HCD) spectra is incorporated. Comparisons between NovoHCD and another successful de novo peptide sequencing method for HCD spectra, pNovo, were performed. Experiments were conducted on five HCD spectral datasets. Results show that NovoHCD outperforms pNovo in terms of full length peptide identification accuracy; specifically, the accuracy increases 13%-21% over the five datasets.
|
18
|
Zhang C, Bickis MG, Wu FX, Kusalik AJ. Optimally-connected hidden markov models for predicting MHC-binding peptides. J Bioinform Comput Biol 2007; 4:959-80. [PMID: 17099936 DOI: 10.1142/s0219720006002314] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 03/14/2006] [Revised: 06/12/2006] [Accepted: 06/12/2006] [Indexed: 11/18/2022]
Abstract
Hidden Markov models (HMMs) are one of various methods that have been applied to the prediction of major histocompatibility complex (MHC) binding peptides. In terms of model topology, a fully-connected HMM (fcHMM) has the greatest potential to predict binders, at the cost of intensive computation. While a profile HMM (pHMM) performs dramatically fewer computations, it potentially merges overlapping patterns into one, which results in some patterns being missed. In a profile HMM a state corresponds to a position on a peptide, while in an fcHMM a state has no specific biological meaning. This work proposes optimally-connected HMMs (ocHMMs), which do not merge overlapping patterns and yet, through topological reductions, have far lower connectivity than an fcHMM. The parameters of ocHMMs are initialized using a novel amino acid grouping approach called "multiple property grouping." Each group represents a state in an ocHMM. The proposed ocHMMs are compared to a pHMM implementation using HMMER, based on performance tests on two MHC alleles, HLA (Human Leukocyte Antigen)-A*0201 and HLA-B*3501. The results show that the heuristic approaches can be adjusted to make an ocHMM achieve higher predictive accuracy than HMMER. Hence, ocHMMs obtained in this way are worthy of trial for predicting MHC-binding peptides.
Affiliation(s)
- Chenhong Zhang
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada.
|
19
|
Abstract
BACKGROUND One type of DNA microarray experiment is discovery of gene expression patterns for a cell line undergoing a biological process over a series of time points. Two important issues with such an experiment are the number of time points and the interval between them. In the absence of biological knowledge regarding appropriate values, it is natural to question whether the behaviour of progressively generated data may by itself determine a threshold beyond which further microarray experiments do not contribute to pattern discovery. Additionally, such a threshold implies a minimum number of microarray experiments, which is important given the cost of these experiments. RESULTS We have developed a method for determining the minimum number of microarray experiments (i.e. time points) for temporal gene expression, assuming that the span between time points is given and the hierarchical clustering technique is used for gene expression pattern discovery. The key idea is a similarity measure for two clusterings which is expressed as a function of the data for progressive time points. While the experiments are underway, this function is evaluated. When the function reaches its maximum, this indicates that the set of experiments has reached a saturated state. Therefore, further experiments do not contribute to the discrimination of patterns. CONCLUSION The method has been verified with two previously published gene expression datasets. For both experiments, the number of time points determined with our method is less than in the published experiments. It is noted that the overall approach is applicable to other clustering techniques.
Affiliation(s)
- Fang-Xiang Wu
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
| | - WJ Zhang
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
| | - Anthony J Kusalik
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, S7N 5C9, Canada
| |
|
20
|
Wu FX, Zhang WJ, Kusalik AJ. Dynamic model-based clustering for time-course gene expression data. J Bioinform Comput Biol 2005; 3:821-36. [PMID: 16078363 DOI: 10.1142/s0219720005001314] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 05/19/2004] [Revised: 10/06/2004] [Accepted: 10/11/2004] [Indexed: 11/18/2022]
Abstract
Microarray technology has produced a huge body of time-course gene expression data. Such gene expression data has proved useful in genomic disease diagnosis and genomic drug design. The challenge is how to uncover useful information in such data. Cluster analysis has played an important role in analyzing gene expression data. Many distance/correlation- and static model-based clustering techniques have been applied to time-course expression data. However, these techniques are unable to account for the dynamics of such data. It is the dynamics that characterize the data and that should be considered in cluster analysis so as to obtain high quality clustering. This paper proposes a dynamic model-based clustering method for time-course gene expression data. The proposed method regards a time-course gene expression dataset as a set of time series, generated by a number of stochastic processes. Each stochastic process defines a cluster and is described by an autoregressive model. A relocation-iteration algorithm is proposed to identify the model parameters, and posterior probabilities are employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. Computational experiments are performed on one synthetic and three real time-course gene expression datasets to investigate the proposed method. The results show that our method yields better quality clustering than other clustering methods (e.g. k-means) for time-course gene expression data, and thus it is a useful and powerful tool for analyzing time-course gene expression data.
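For readers unfamiliar with the adjusted Rand index mentioned in this abstract, the following is a minimal sketch of the standard (Hubert-Arabie) adjusted Rand index between two labelings, plus a simple mean over several comparisons. The function names `adjusted_rand_index` and `average_ari` are hypothetical, and the averaging step is only an assumption about how an "average" ARI might be formed; the abstract does not specify the paper's bootstrapping procedure.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Count item pairs that fall in the same cluster under both labelings,
    # and under each labeling separately.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    only_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    only_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = only_a * only_b / total_pairs  # agreement expected by chance
    maximum = (only_a + only_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (both - expected) / (maximum - expected)


def average_ari(reference, resampled_labelings):
    """Hypothetical helper: mean ARI of a reference labeling against
    several other labelings (an assumed form of averaging)."""
    scores = [adjusted_rand_index(reference, other) for other in resampled_labelings]
    return sum(scores) / len(scores)
```

The index is 1 for identical partitions (regardless of how clusters are named), close to 0 for chance-level agreement, and can be negative when agreement is worse than chance.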
Affiliation(s)
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5A9, Canada.
|
21
|
Wu FX, Zhang WJ, Kusalik AJ. Modeling gene expression from microarray expression data with state-space equations. Pac Symp Biocomput 2004:581-92. [PMID: 14992535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
We describe a new method to model gene expression from time-course gene expression data. The modelling is in terms of state-space descriptions of linear systems. A cell can be considered to be a system where the behaviours (responses) of the cell depend completely on the current internal state plus any external inputs. The gene expression levels in the cell provide information about the behaviours of the cell. In previously proposed methods, genes were viewed as internal state variables of a cellular system and their expression levels were the values of the internal state variables. This viewpoint has suffered from the underestimation of the model parameters. Instead, we view genes as the observation variables, whose expression values depend on the current internal state variables and any external input. Factor analysis is used to identify the internal state variables, and Bayesian Information Criterion (BIC) is used to determine the number of the internal state variables. By building dynamic equations of the internal state variables and the relationships between the internal state variables and the observation variables (gene expression profiles), we get state-space descriptions of the gene expression model. In the present method, model parameters may be unambiguously identified from time-course gene expression data. We apply the method to two time-course gene expression datasets to illustrate it.
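As an illustration only, a generic discrete-time linear state-space description of the kind this abstract refers to can be written as below. The symbols are assumptions chosen for this sketch (x for internal state variables, u for external inputs, z for observed gene expression levels, and A, B, C, D for constant matrices); the abstract does not give the paper's exact formulation or noise terms.

```latex
\begin{align}
  x(t+1) &= A\,x(t) + B\,u(t) \\
  z(t)   &= C\,x(t) + D\,u(t)
\end{align}
```

In this textbook form, the first equation describes the dynamics of the internal state and the second relates the observation variables to the current state and input, matching the abstract's view of genes as observation variables rather than state variables.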
Affiliation(s)
- F X Wu
- Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Dr., Saskatoon, SK, S7N 5A9, Canada.
|
22
|
Abstract
This paper describes an improved method for conducting global feature comparisons of protein molecules in three dimensions and for producing a new form of multiple structure alignment. Our automated MolCom method incorporates an octtree strategy to partition and examine molecular properties in three-dimensional space at multiple levels of analysis. The MolCom method's multiple alignment is in the form of an octtree which locates regions in three-dimensional space where correspondence between molecules is identified based on a dynamic set of molecular features. MolCom offers a practical solution to the inherent compromise between computational complexity and analytical detail. MolCom is currently the only method that can analyze and compare a series of defined physicochemical properties using multiple, simultaneous levels of resolution. It is also the only method that provides a consensus structure outlining precisely where the similarity exists in three-dimensional space. Using a modest-sized collection of structural properties, separate experiments were conducted to calibrate MolCom and to verify that the spatial analyses and resulting structure alignments accurately identified both similar and dissimilar structures. The accuracy of MolCom was found to be over 99% and the similarity scores correlated strongly with the z-scores of the Alignment by Incremental Combinatorial Extension of the Optimal Path method.
Affiliation(s)
- S D O'Hearn
- National Research Council of Canada, Plant Biotechnology Institute, Saskatoon, Saskatchewan, S7N 0W9, Canada
|