1
|
Weber LL, Reiman D, Roddur MS, Qi Y, El-Kebir M, Khan AA. TRIBAL: Tree Inference of B cell Clonal Lineages. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.27.568874. [PMID: 38076836 PMCID: PMC10705245 DOI: 10.1101/2023.11.27.568874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/12/2024]
Abstract
B cells are a critical component of the adaptive immune system, responsible for producing antibodies that help protect the body from infections and foreign substances. Single cell RNA-sequencing (scRNA-seq) has allowed for both profiling of B cell receptor (BCR) sequences and gene expression. However, understanding the adaptive and evolutionary mechanisms of B cells in response to specific stimuli remains a significant challenge in the field of immunology. We introduce a new method, TRIBAL, which aims to infer the evolutionary history of clonally related B cells from scRNA-seq data. The key insight of TRIBAL is that inclusion of isotype data into the B cell lineage inference problem is valuable for reducing phylogenetic uncertainty that arises when only considering the receptor sequences. Consequently, the TRIBAL inferred B cell lineage trees jointly capture the somatic mutations introduced to the B cell receptor during affinity maturation and isotype transitions during class switch recombination. In addition, TRIBAL infers isotype transition probabilities that are valuable for gaining insight into the dynamics of class switching. Via in silico experiments, we demonstrate that TRIBAL infers isotype transition probabilities with the ability to distinguish between direct versus sequential switching in a B cell population. This results in more accurate B cell lineage trees and corresponding ancestral sequence and class switch reconstruction compared to competing methods. Using real-world scRNA-seq datasets, we show that TRIBAL recapitulates expected biological trends in a model affinity maturation system. Furthermore, the B cell lineage trees inferred by TRIBAL were equally plausible for the BCR sequences as those inferred by competing methods but yielded lower entropic partitions for the isotypes of the sequenced B cell. Thus, our method holds the potential to further advance our understanding of vaccine responses, disease progression, and the identification of therapeutic antibodies.
Collapse
Affiliation(s)
- Leah L. Weber
- Department of Computer Science, University of Illinois at Urbana-Champaign, IL 61801, USA
| | - Derek Reiman
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Mrinmoy S. Roddur
- Department of Computer Science, University of Illinois at Urbana-Champaign, IL 61801, USA
| | - Yuanyuan Qi
- Department of Computer Science, University of Illinois at Urbana-Champaign, IL 61801, USA
| | - Mohammed El-Kebir
- Department of Computer Science, University of Illinois at Urbana-Champaign, IL 61801, USA
| | - Aly A. Khan
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Department of Pathology, University of Chicago, Chicago, IL 60637, USA
| |
Collapse
|
2
|
Shi F, Li H, Rong G, Zhang Z, Wang J. Improved Fixed-Parameter Algorithm for the Tree Containment Problem on Unrooted Phylogenetic Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3539-3552. [PMID: 34506290 DOI: 10.1109/tcbb.2021.3111660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Phylogenetic trees are unable to represent the evolutionary process for a collection of species if reticulation events happened, and a generalized model named phylogenetic network was introduced consequently. However, the representation of the evolutionary process for one gene is actually a phylogenetic tree that is "contained" in the phylogenetic network for the considered species containing the gene. Thus a fundamental computational problem named Tree Containment problem arises, which asks whether a phylogenetic tree is contained in a phylogenetic network. The previous research on the problem mainly focused on its rooted version of which the considered tree and network are rooted, and several algorithms were proposed when the considered network is binary or structure-restricted. There is almost no algorithm for its unrooted version except the recent fixed-parameter algorithm with runtime O(4kn2), where k and n are the reticulation number and size of the considered unrooted binary phylogenetic network N, respectively. As the runtime is a little expensive when considering big values of k, we aim to improve it and successfully propose a fixed-parameter algorithm with runtime O(2.594kn2) in the paper. Additionally, we experimentally show its effectiveness on biological data and simulated data.
Collapse
|
3
|
Foroughmand-Araabi MH, Goliaei S, McHardy AC. Scelestial: Fast and accurate single-cell lineage tree inference based on a Steiner tree approximation algorithm. PLoS Comput Biol 2022; 18:e1009100. [PMID: 35951662 PMCID: PMC9426887 DOI: 10.1371/journal.pcbi.1009100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Revised: 08/30/2022] [Accepted: 06/23/2022] [Indexed: 11/19/2022] Open
Abstract
Single-cell genome sequencing provides a highly granular view of biological systems but is affected by high error rates, allelic amplification bias, and uneven genome coverage. This creates a need for data-specific computational methods, for purposes such as for cell lineage tree inference. The objective of cell lineage tree reconstruction is to infer the evolutionary process that generated a set of observed cell genomes. Lineage trees may enable a better understanding of tumor formation and growth, as well as of organ development for healthy body cells. We describe a method, Scelestial, for lineage tree reconstruction from single-cell data, which is based on an approximation algorithm for the Steiner tree problem and is a generalization of the neighbor-joining method. We adapt the algorithm to efficiently select a limited subset of potential sequences as internal nodes, in the presence of missing values, and to minimize cost by lineage tree-based missing value imputation. In a comparison against seven state-of-the-art single-cell lineage tree reconstruction algorithms—BitPhylogeny, OncoNEM, SCITE, SiFit, SASC, SCIPhI, and SiCloneFit—on simulated and real single-cell tumor samples, Scelestial performed best at reconstructing trees in terms of accuracy and run time. Scelestial has been implemented in C++. It is also available as an R package named RScelestial.
Collapse
Affiliation(s)
- Mohammad-Hadi Foroughmand-Araabi
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
| | - Sama Goliaei
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
| | - Alice C. McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- * E-mail:
| |
Collapse
|
4
|
Yan J, Patterson N, Narasimhan VM. miqoGraph: fitting admixture graphs using mixed-integer quadratic optimization. Bioinformatics 2020; 37:2488-2490. [PMID: 33247708 DOI: 10.1093/bioinformatics/btaa988] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2020] [Revised: 10/25/2020] [Accepted: 11/16/2020] [Indexed: 11/14/2022] Open
Abstract
SUMMARY Admixture graphs represent the genetic relationship between a set of populations through splits, drift and admixture. In this article, we present the Julia package miqoGraph, which uses mixed-integer quadratic optimization to fit topology, drift lengths and admixture proportions simultaneously. Through applications of miqoGraph to both simulated and real data, we show that integer optimization can greatly speed up and automate what is usually an arduous manual process. AVAILABILITY AND IMPLEMENTATION https://github.com/juliayyan/PhylogeneticTrees.jl. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Julia Yan
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Nick Patterson
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
| | - Vagheesh M Narasimhan
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA.,Department of Integrative Biology, The University of Texas at Austin.,Department of Statistics and Data Science, The University of Texas at Austin
| |
Collapse
|
5
|
Diagnosis of Breast Hyperplasia and Evaluation of RuXian-I Based on Metabolomics Deep Belief Networks. Int J Mol Sci 2019; 20:ijms20112620. [PMID: 31141969 PMCID: PMC6600413 DOI: 10.3390/ijms20112620] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2019] [Revised: 05/24/2019] [Accepted: 05/26/2019] [Indexed: 01/09/2023] Open
Abstract
Breast cancer is estimated to be the leading cancer type among new cases in American women. Core biopsy data have shown a close association between breast hyperplasia and breast cancer. The early diagnosis and treatment of breast hyperplasia are extremely important to prevent breast cancer. The Mongolian medicine RuXian-I is a traditional drug that has achieved a high level of efficacy and a low incidence of side effects in its clinical use. However, for detecting the efficacy of RuXian-I, a rapid and accurate evaluation method based on metabolomic data is still lacking. Therefore, we proposed a framework, named the metabolomics deep belief network (MDBN), to analyze breast hyperplasia metabolomic data. We obtained 168 samples of metabolomic data from an animal model experiment of RuXian-I, which were averaged from control groups, treatment groups, and model groups. In the process of training, unlabelled data were used to pretrain the Deep Belief Networks models, and then labelled data were used to complete fine-tuning based on a limited-memory Broyden Fletcher Goldfarb Shanno (L-BFGS) algorithm. To prevent overfitting, a dropout method was added to the pretraining and fine-tuning procedures. The experimental results showed that the proposed model is superior to other classical classification methods that are based on positive and negative spectra data. Further, the proposed model can be used as an extension of the classification method for metabolomic data. For the high accuracy of classification of the three groups, the model indicates obvious differences and boundaries between the three groups. It can be inferred that the animal model of RuXian-I is well established, which can lay a foundation for subsequent related experiments. This also shows that metabolomic data can be used as a means to verify the effectiveness of RuXian-I in the treatment of breast hyperplasia.
Collapse
|
6
|
Catanzaro D, Ravi R, Schwartz R. A mixed integer linear programming model to reconstruct phylogenies from single nucleotide polymorphism haplotypes under the maximum parsimony criterion. Algorithms Mol Biol 2013; 8:3. [PMID: 23343437 PMCID: PMC3599976 DOI: 10.1186/1748-7188-8-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2012] [Accepted: 01/02/2013] [Indexed: 11/17/2022] Open
Abstract
Background Phylogeny estimation from aligned haplotype sequences has attracted more and more attention in the recent years due to its importance in analysis of many fine-scale genetic data. Its application fields range from medical research, to drug discovery, to epidemiology, to population dynamics. The literature on molecular phylogenetics proposes a number of criteria for selecting a phylogeny from among plausible alternatives. Usually, such criteria can be expressed by means of objective functions, and the phylogenies that optimize them are referred to as optimal. One of the most important estimation criteria is the parsimony which states that the optimal phylogeny T∗for a set
H of n haplotype sequences over a common set of variable loci is the one that satisfies the following requirements: (i) it has the shortest length and (ii) it is such that, for each pair of distinct haplotypes
hi,hj∈H, the sum of the edge weights belonging to the path from hi to hj in T∗ is not smaller than the observed number of changes between hi and hj. Finding the most parsimonious phylogeny for
H involves solving an optimization problem, called the Most Parsimonious Phylogeny Estimation Problem (MPPEP), which is
NP-hard in many of its versions. Results In this article we investigate a recent version of the MPPEP that arises when input data consist of single nucleotide polymorphism haplotypes extracted from a population of individuals on a common genomic region. Specifically, we explore the prospects for improving on the implicit enumeration strategy of implicit enumeration strategy used in previous work using a novel problem formulation and a series of strengthening valid inequalities and preliminary symmetry breaking constraints to more precisely bound the solution space and accelerate implicit enumeration of possible optimal phylogenies. We present the basic formulation and then introduce a series of provable valid constraints to reduce the solution space. We then prove that these constraints can often lead to significant reductions in the gap between the optimal solution and its non-integral linear programming bound relative to the prior art as well as often substantially faster processing of moderately hard problem instances. Conclusion We provide an indication of the conditions under which such an optimal enumeration approach is likely to be feasible, suggesting that these strategies are usable for relatively large numbers of taxa, although with stricter limits on numbers of variable sites. The work thus provides methodology suitable for provably optimal solution of some harder instances that resist all prior approaches.
Collapse
|
7
|
|
8
|
Dong J, Fernández-Baca D, McMorris FR. Constructing majority-rule supertrees. Algorithms Mol Biol 2010; 5:2. [PMID: 20047658 PMCID: PMC2826330 DOI: 10.1186/1748-7188-5-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2009] [Accepted: 01/04/2010] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Supertree methods combine the phylogenetic information from multiple partially-overlapping trees into a larger phylogenetic tree called a supertree. Several supertree construction methods have been proposed to date, but most of these are not designed with any specific properties in mind. Recently, Cotton and Wilkinson proposed extensions of the majority-rule consensus tree method to the supertree setting that inherit many of the appealing properties of the former. RESULTS We study a variant of one of Cotton and Wilkinson's methods, called majority-rule (+) supertrees. After proving that a key underlying problem for constructing majority-rule (+) supertrees is NP-hard, we develop a polynomial-size exact integer linear programming formulation of the problem. We then present a data reduction heuristic that identifies smaller subproblems that can be solved independently. While this technique is not guaranteed to produce optimal solutions, it can achieve substantial problem-size reduction. Finally, we report on a computational study of our approach on various real data sets, including the 121-taxon, 7-tree Seabirds data set of Kennedy and Page. CONCLUSIONS The results indicate that our exact method is computationally feasible for moderately large inputs. For larger inputs, our data reduction heuristic makes it feasible to tackle problems that are well beyond the range of the basic integer programming approach. Comparisons between the results obtained by our heuristic and exact solutions indicate that the heuristic produces good answers. Our results also suggest that the majority-rule (+) approach, in both its basic form and with data reduction, yields biologically meaningful phylogenies.
Collapse
Affiliation(s)
- Jianrong Dong
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
| | | | - FR McMorris
- Department of Applied Mathematics, Illinois Institute of Technology, Chicago, IL 60616, USA
| |
Collapse
|