1.
Muntoni AP, Pagnani A. DCAlign v1.0: aligning biological sequences using co-evolution models and informed priors. Bioinformatics 2023; 39:btad537. [PMID: 37647658 PMCID: PMC10491954 DOI: 10.1093/bioinformatics/btad537] [Received: 05/20/2022] [Revised: 06/14/2023] [Accepted: 08/29/2023] [Indexed: 09/01/2023] Open
Abstract
SUMMARY: DCAlign is a new alignment method able to cope with the conservation and the co-evolution signals that characterize the columns of multiple sequence alignments of homologous sequences. However, the pre-processing steps required to align a candidate sequence are computationally demanding. We show in v1.0 how to dramatically reduce the overall computing time by including an empirical prior over an informative set of variables mirroring the presence of insertions and deletions.
AVAILABILITY AND IMPLEMENTATION: DCAlign v1.0 is implemented in Julia and it is fully available at https://github.com/infernet-h2020/DCAlign.
Affiliation(s)
- Anna Paola Muntoni
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, I-10060 Candiolo (TO), Italy
  - Politecnico di Torino, I-10129 Torino, Italy
- Andrea Pagnani
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, I-10060 Candiolo (TO), Italy
  - Politecnico di Torino, I-10129 Torino, Italy
  - INFN, Sezione di Torino, Torino, Via Pietro Giuria 1, I-10125 Torino, Italy
2.
Budzynski L, Pagnani A. Small-coupling expansion for multiple sequence alignment. Phys Rev E 2023; 107:044125. [PMID: 37198812 DOI: 10.1103/physreve.107.044125] [Received: 10/21/2022] [Accepted: 03/27/2023] [Indexed: 05/19/2023]
Abstract
The alignment of biological sequences such as DNA, RNA, and proteins, is one of the basic tools that allow to detect evolutionary patterns, as well as functional or structural characterizations between homologous sequences in different organisms. Typically, state-of-the-art bioinformatics tools are based on profile models that assume the statistical independence of the different sites of the sequences. Over the last years, it has become increasingly clear that homologous sequences show complex patterns of long-range correlations over the primary sequence as a consequence of the natural evolution process that selects genetic variants under the constraint of preserving the functional or structural determinants of the sequence. Here, we present an alignment algorithm based on message passing techniques that overcomes the limitations of profile models. Our method is based on a perturbative small-coupling expansion of the free energy of the model that assumes a linear chain approximation as the zeroth-order of the expansion. We test the potentiality of the algorithm against standard competing strategies on several biological sequences.
Affiliation(s)
- Louise Budzynski
  - DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, 24, I-10129, Torino, Italy
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060, Candiolo, Italy
- Andrea Pagnani
  - DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, 24, I-10129, Torino, Italy
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060, Candiolo, Italy
  - INFN, Sezione di Torino, Torino, Via Pietro Giuria, 1, 10125 Torino, Italy
3.
Wojciechowski JW, Tekoglu E, Gąsior-Głogowska M, Coustou V, Szulc N, Szefczyk M, Kopaczyńska M, Saupe SJ, Dyrka W. Exploring a diverse world of effector domains and amyloid signaling motifs in fungal NLR proteins. PLoS Comput Biol 2022; 18:e1010787. [PMID: 36542665 PMCID: PMC9815663 DOI: 10.1371/journal.pcbi.1010787] [Received: 02/10/2022] [Revised: 01/05/2023] [Accepted: 12/02/2022] [Indexed: 12/24/2022] Open
Abstract
NLR proteins are intracellular receptors constituting a conserved component of the innate immune system of cellular organisms. In fungi, NLRs are characterized by high diversity of architectures and presence of amyloid signaling. Here, we explore the diverse world of effector and signaling domains of fungal NLRs using state-of-the-art bioinformatic methods including MMseqs2 for fast clustering, probabilistic context-free grammars for sequence analysis, and AlphaFold2 deep neural networks for structure prediction. In addition to substantially improving the overall annotation, especially in basidiomycetes, the study identifies novel domains and reveals the structural similarity of MLKL-related HeLo- and Goodbye-like domains forming the most abundant superfamily of fungal NLR effectors. Moreover, compared to previous studies, we found several times more amyloid motif instances, including novel families, and validated aggregating and prion-forming properties of the most abundant of them in vitro and in vivo. Also, through an extensive in silico search, the NLR-associated amyloid signaling was identified in basidiomycetes. The emerging picture highlights similarities and differences in the NLR architectures and amyloid signaling in ascomycetes, basidiomycetes and other branches of life.
Affiliation(s)
- Jakub W. Wojciechowski
  - Katedra Inżynierii Biomedycznej, Wydział Podstawowych Problemów Techniki, Politechnika Wrocławska, Wrocław, Poland
- Emirhan Tekoglu
  - Biyomühendislik Bölümü, Yıldız Teknik Üniversitesi, İstanbul, Turkey
  - Wydział Chemiczny, Politechnika Wrocławska, Poland
- Marlena Gąsior-Głogowska
  - Katedra Inżynierii Biomedycznej, Wydział Podstawowych Problemów Techniki, Politechnika Wrocławska, Wrocław, Poland
- Virginie Coustou
  - Institut de Biochimie et de Génétique Cellulaire, UMR 5095 CNRS, Université de Bordeaux, Bordeaux, France
- Natalia Szulc
  - Katedra Inżynierii Biomedycznej, Wydział Podstawowych Problemów Techniki, Politechnika Wrocławska, Wrocław, Poland
- Monika Szefczyk
  - Katedra Chemii Bioorganicznej, Wydział Chemiczny, Politechnika Wrocławska, Wrocław, Poland
- Marta Kopaczyńska
  - Katedra Inżynierii Biomedycznej, Wydział Podstawowych Problemów Techniki, Politechnika Wrocławska, Wrocław, Poland
- Sven J. Saupe
  - Institut de Biochimie et de Génétique Cellulaire, UMR 5095 CNRS, Université de Bordeaux, Bordeaux, France
  - * E-mail: (SJS); (WD)
- Witold Dyrka
  - Katedra Inżynierii Biomedycznej, Wydział Podstawowych Problemów Techniki, Politechnika Wrocławska, Wrocław, Poland
  - * E-mail: (SJS); (WD)