1
|
Menke M, Berger B, Cowen L. Matt: local flexibility aids protein multiple structure alignment. PLoS Comput Biol 2008; 4:e10. [PMID: 18193941 PMCID: PMC2186361 DOI: 10.1371/journal.pcbi.0040010] [Citation(s) in RCA: 137] [Impact Index Per Article: 8.1] [Received: 06/12/2007] [Accepted: 12/06/2007] [Indexed: 11/20/2022] Open
Abstract
Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these "bent" alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, and Matt outperforms the other programs on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of alpha-helices and beta-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered.
For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.
|
Research Support, N.I.H., Extramural |
17 |
137 |
2
|
Bradley P, Cowen L, Menke M, King J, Berger B. BETAWRAP: successful prediction of parallel beta-helices from primary sequence reveals an association with many microbial pathogens. Proc Natl Acad Sci U S A 2001; 98:14819-24. [PMID: 11752429 PMCID: PMC64942 DOI: 10.1073/pnas.251267298] [Citation(s) in RCA: 87] [Impact Index Per Article: 3.6] [Received: 05/29/2001] [Indexed: 11/18/2022] Open
Abstract
The amino acid sequence rules that specify beta-sheet structure in proteins remain obscure. A subclass of beta-sheet proteins, parallel beta-helices, represent a processive folding of the chain into an elongated topologically simpler fold than globular beta-sheets. In this paper, we present a computational approach that predicts the right-handed parallel beta-helix supersecondary structural motif in primary amino acid sequences by using beta-strand interactions learned from non-beta-helix structures. A program called BETAWRAP (http://theory.lcs.mit.edu/betawrap) implements this method and recognizes each of the seven known parallel beta-helix families, when trained on the known parallel beta-helices from outside that family. BETAWRAP identifies 2,448 sequences among 595,890 screened from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/) nonredundant protein database as likely parallel beta-helices. It identifies surprisingly many bacterial and fungal protein sequences that play a role in human infectious disease; these include toxins, virulence factors, adhesins, and surface proteins of Chlamydia, Helicobacteria, Bordetella, Leishmania, Borrelia, Rickettsia, Neisseria, and Bacillus anthracis. Also unexpected was the rarity of the parallel beta-helix fold and its predicted sequences among higher eukaryotes. The computational method introduced here can be called a three-dimensional dynamic profile method because it generates interstrand pairwise correlations from a processive sequence wrap. Such methods may be applicable to recognizing other beta structures for which strand topology and profiles of residue accessibility are well conserved.
|
research-article |
24 |
87 |
3
|
Sledzieski S, Singh R, Cowen L, Berger B. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Syst 2021; 12:969-982.e6. [PMID: 34536380 PMCID: PMC8586911 DOI: 10.1016/j.cels.2021.08.010] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 04/02/2021] [Revised: 08/01/2021] [Accepted: 08/19/2021] [Indexed: 11/29/2022]
Abstract
We combine advances in neural language modeling and structurally motivated design to develop D-SCRIPT, an interpretable and generalizable deep-learning model, which predicts interaction between two proteins using only their sequence and maintains high accuracy with limited training data and across species. We show that a D-SCRIPT model trained on 38,345 human PPIs enables significantly improved functional characterization of fly proteins compared with the state-of-the-art approach. Evaluating the same D-SCRIPT model on protein complexes with known 3D structure, we find that the inter-protein contact map output by D-SCRIPT has significant overlap with the ground truth. We apply D-SCRIPT to screen for PPIs in cow (Bos taurus) at a genome-wide scale and, focusing on rumen physiology, identify functional gene modules related to metabolism and immune response. The predicted interactions can then be leveraged for function prediction at scale, addressing the genome-to-phenome challenge, especially in species where little data are available.
|
Research Support, N.I.H., Extramural |
4 |
60 |
4
|
Singh R, Sledzieski S, Bryson B, Cowen L, Berger B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc Natl Acad Sci U S A 2023; 120:e2220778120. [PMID: 37289807 PMCID: PMC10268324 DOI: 10.1073/pnas.2220778120] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 12/06/2022] [Accepted: 04/10/2023] [Indexed: 06/10/2023] Open
Abstract
Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance of one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models ("PLex") and employing a protein-anchored contrastive coembedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with subnanomolar affinity, plus a strongly binding EPHB1 inhibitor (KD = 1.3 nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate that ConPLex will facilitate efficient drug discovery by making highly sensitive in silico drug screening feasible at the genome scale. ConPLex is available open source at https://ConPLex.csail.mit.edu.
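The abstract above states that predictions are made from the distance between learned representations. As a rough illustration only, the sketch below ranks compounds by cosine similarity to a target embedding. The function names, the toy two-dimensional vectors, and the choice of cosine similarity are assumptions made for this example; they are not taken from the ConPLex code, and the real model learns its embeddings with neural networks.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_compounds(target_embedding, compound_embeddings):
    """Return (name, score) pairs sorted from most to least similar.

    compound_embeddings maps a compound name to its embedding; both
    embeddings are assumed to live in the same shared space.
    """
    scored = [
        (name, cosine_similarity(target_embedding, emb))
        for name, emb in compound_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example with made-up 2-D embeddings.
ranking = rank_compounds([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
```

Because each compound and each target needs to be embedded only once, scoring a pair reduces to a single vector comparison, which is plausibly what makes screening at library scale cheap.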
|
research-article |
2 |
59 |
5
|
Cowen L, Corey M, Simmons R, Keenan N, Robertson J, Levison H. Growing older with cystic fibrosis: psychologic adjustment of patients more than 16 years old. Psychosom Med 1984; 46:363-76. [PMID: 6484102 DOI: 10.1097/00006842-198407000-00005] [Citation(s) in RCA: 35] [Impact Index Per Article: 0.9] [Indexed: 01/20/2023]
Abstract
The 79 female and 147 male patients constituting the population with cystic fibrosis (CF) aged 16 years and older attending The Hospital for Sick Children were asked to complete the Cornell Medical Index (CMI) and Tennessee Self-Concept Scale (TSCS); 64 female (81%) and 112 male (76%) subjects participated. Analysis of CMI results showed 43% of female subjects to have moderate to severe emotional disturbance compared to 19% of male subjects. This female : male ratio for severity of emotional disturbance is found in ostensibly healthy groups, but the percentages of disturbance approach values for medical patient populations. The frequency of emotional disability is greater in those more than 20 than in those 16-19 years old. The TSCS results portray a generally normal self-concept except for scores of positive physical self and psychosis for patients aged 20 years and older; these scores approach psychiatric values, suggesting that some reality distortion facilitates emotional adjustment to adult life with CF. The TSCS and CMI results correlate significantly, indicating a connection between self-concept and emotional status. However, TSCS and CMI scores do not correlate with measures of disease severity except for correlations between lung function and physical self-concept in older male patients. These results suggest that psychologic functioning is independent of the degree of physical impairment in older patients with CF, with the long-surviving male patients more realistically appraising the limitations their disease imposes and utilizing denial and minimization to a lesser degree. Demographic data on the clinic population reveal that most patients aged 16 years and older cope with their intellectual, developmental, and socioeconomic tasks commensurate with normal age expectations.
|
|
41 |
35 |
6
|
Baskaran JP, Weldy A, Guarin J, Munoz G, Shpilker PH, Kotlik M, Subbiah N, Wishart A, Peng Y, Miller MA, Cowen L, Oudin MJ. Cell shape, and not 2D migration, predicts extracellular matrix-driven 3D cell invasion in breast cancer. APL Bioeng 2020; 4:026105. [PMID: 32455252 PMCID: PMC7202897 DOI: 10.1063/1.5143779] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 01/06/2020] [Accepted: 04/14/2020] [Indexed: 12/13/2022] Open
Abstract
Metastasis, the leading cause of death in cancer patients, requires the invasion of tumor cells through the stroma in response to migratory cues, in part provided by the extracellular matrix (ECM). Recent advances in proteomics have led to the identification of hundreds of ECM proteins, which are more abundant in tumors relative to healthy tissue. Our goal was to develop a pipeline to easily predict which ECM proteins are more likely to have an effect on cancer invasion and metastasis. We evaluated the effect of four ECM proteins upregulated in breast tumor tissue in multiple human breast cancer cell lines in three assays. There was no linear relationship between cell adhesion to ECM proteins and ECM-driven 2D cell migration speed, persistence, or 3D invasion. We then used classifiers and partial-least squares regression analysis to identify which metrics best predicted ECM-driven 2D migration and 3D invasion responses. We find that ECM-driven 2D cell migration speed or persistence did not predict 3D invasion in response to the same cue. However, cell adhesion, and in particular cell elongation and shape irregularity, accurately predicted the magnitude of ECM-driven 2D migration and 3D invasion. Our models successfully predicted the effect of novel ECM proteins in a cell-line specific manner. Overall, our studies identify the cell morphological features that determine 3D invasion responses to individual ECM proteins. This platform will help provide insight into the functional role of ECM proteins abundant in tumor tissue and help prioritize strategies for targeting tumor-ECM interactions to treat metastasis.
|
research-article |
5 |
34 |
7
|
Cowen L, Bradley P, Menke M, King J, Berger B. Predicting the beta-helix fold from protein sequence data. J Comput Biol 2002; 9:261-76. [PMID: 12015881 DOI: 10.1089/10665270252935458] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
Abstract
A method is presented that uses beta-strand interactions to predict the parallel right-handed beta-helix super-secondary structural motif in protein sequences. A program called BetaWrap implements this method and is shown to score known beta-helices above non-beta-helices in the Protein Data Bank in cross-validation. It is demonstrated that BetaWrap learns each of the seven known SCOP beta-helix families, when trained primarily on beta-structures that are not beta-helices, together with structural features of known beta-helices from outside the family. BetaWrap also predicts many bacterial proteins of unknown structure to be beta-helices; in particular, these proteins serve as virulence factors, adhesins, and toxins in bacterial pathogenesis and include cell surface proteins from Chlamydia and the intestinal bacterium Helicobacter pylori. The computational method used here may generalize to other beta-structures for which strand topology and profiles of residue accessibility are well conserved.
|
|
23 |
32 |
8
|
Simmons RJ, Corey M, Cowen L, Keenan N, Robertson J, Levison H. Emotional adjustment of early adolescents with cystic fibrosis. Psychosom Med 1985; 47:111-22. [PMID: 4048358 DOI: 10.1097/00006842-198503000-00002] [Citation(s) in RCA: 31] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023]
Abstract
Eighty-five 12- to 15-year-old adolescents regularly attending the cystic fibrosis (CF) clinic of The Hospital for Sick Children were asked to complete the Children's Health Locus of Control and the Tennessee Self Concept Scale. Their parents were requested to complete the Child Behavior Checklist. Thirty-four males (72%) and 28 females (74%) participated in the study. This study found that adolescents with CF are able to maintain a good self concept, be socially competent, and perceive that they are in control of their health while showing an increase in behavior problems. Females rely heavily on denial and are more behaviorally compliant, whereas boys use less denial but show more behavior problems. Males appear to integrate having a physical disorder into their self concept, whereas females do not. The findings demonstrate a difference in mechanisms of coping with cystic fibrosis between male and female adolescents with CF, which may contribute to the decline in physical status in females and better survival of males.
|
|
40 |
31 |
9
|
Abstract
Summary: Computational methods to predict protein–protein interaction (PPI) typically segregate into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of the individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms. Availability and implementation: https://topsyturvy.csail.mit.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
|
|
3 |
28 |
10
|
Kumar A, Cowen L. Augmented training of hidden Markov models to recognize remote homologs via simulated evolution. Bioinformatics 2009; 25:1602-8. [PMID: 19389731 PMCID: PMC2732314 DOI: 10.1093/bioinformatics/btp265] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022] Open
Abstract
Motivation: While profile hidden Markov models (HMMs) are successful and powerful methods to recognize homologous proteins, they can break down when homology becomes too distant due to lack of sufficient training data. We show that we can improve the performance of HMMs in this domain by using a simple simulated model of evolution to create an augmented training set. Results: We show, in two different remote protein homolog tasks, that HMMs whose training is augmented with simulated evolution outperform HMMs trained only on real data. We find that a mutation rate between 15 and 20% performs best for recognizing G-protein coupled receptor proteins in different classes, and for recognizing SCOP super-family proteins from different families.
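To make the idea of an augmented training set concrete, here is a minimal sketch of generating mutated copies of training sequences at a fixed per-residue mutation rate. It assumes uniform random substitution among the 20 standard amino acids, which is a simplification chosen for this example; the abstract does not specify the substitution model, and the names `mutate` and `augment` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate(sequence, rate, rng):
    """Replace each residue independently with probability `rate`.

    Assumption: the replacement is drawn uniformly from the other 19
    residues. A realistic model would likely use a substitution matrix.
    """
    mutated = []
    for residue in sequence:
        if rng.random() < rate:
            choices = [aa for aa in AMINO_ACIDS if aa != residue]
            mutated.append(rng.choice(choices))
        else:
            mutated.append(residue)
    return "".join(mutated)


def augment(training_set, rate=0.15, copies=3, seed=0):
    """Return the original sequences followed by `copies` mutants of each."""
    rng = random.Random(seed)
    augmented = list(training_set)
    for sequence in training_set:
        for _ in range(copies):
            augmented.append(mutate(sequence, rate, rng))
    return augmented
```

The augmented list would then be used in place of the original sequences when training a profile HMM. The default rate of 0.15 is picked from the 15 to 20% range reported in the abstract.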
|
Research Support, N.I.H., Extramural |
16 |
26 |
11
|
Menke M, Berger B, Cowen L. Markov random fields reveal an N-terminal double beta-propeller motif as part of a bacterial hybrid two-component sensor system. Proc Natl Acad Sci U S A 2010; 107:4069-74. [PMID: 20147619 PMCID: PMC2819974 DOI: 10.1073/pnas.0909950107] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
The recent explosion in newly sequenced bacterial genomes is outpacing the capacity of researchers to assign functional annotation to all the new proteins. Hence, computational methods that can help predict structural motifs provide increasingly important clues in helping to determine how these proteins might function. We introduce a Markov Random Field approach tailored for recognizing proteins that fold into mainly beta-structural motifs, and apply it to build recognizers for the beta-propeller shapes. As an application, we identify a potential class of hybrid two-component sensor proteins that we predict contain a double-propeller domain.
|
Research Support, N.I.H., Extramural |
15 |
23 |
12
|
McDonnell AV, Menke M, Palmer N, King J, Cowen L, Berger B. Fold recognition and accurate sequence-structure alignment of sequences directing beta-sheet proteins. Proteins 2006; 63:976-85. [PMID: 16547930 DOI: 10.1002/prot.20942] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/11/2022]
Abstract
The ability to predict structure from sequence is particularly important for toxins, virulence factors, allergens, cytokines, and other proteins of public health importance. Many such functions are represented in the parallel beta-helix and beta-trefoil families. A method using pairwise beta-strand interaction probabilities coupled with evolutionary information represented by sequence profiles is developed to tackle these problems for the beta-helix and beta-trefoil folds. The algorithm BetaWrapPro employs a "wrapping" component that may capture folding processes with an initiation stage followed by processive interaction of the sequence with the already-formed motifs. BetaWrapPro outperforms all previous motif recognition programs for these folds, recognizing the beta-helix with 100% sensitivity and 99.7% specificity and the beta-trefoil with 100% sensitivity and 92.5% specificity, in crossvalidation on a database of all nonredundant known positive and negative examples of these fold classes in the PDB. It additionally aligns 88% of residues for the beta-helices and 86% for the beta-trefoils accurately (within four residues of the exact position) to the structural template, which is then used with the side-chain packing program SCWRL to produce 3D structure predictions. One striking result has been the prediction of an unexpected parallel beta-helix structure for a pollen allergen, and its recent confirmation through solution of its structure. A Web server running BetaWrapPro is available and outputs putative PDB-style coordinates for sequences predicted to form the target folds.
|
Research Support, U.S. Gov't, Non-P.H.S. |
19 |
22 |
13
|
Cowen L, Corey M, Keenan N, Simmons R, Arndt E, Levison H. Family adaptation and psychosocial adjustment to cystic fibrosis in the preschool child. Soc Sci Med 1985; 20:553-60. [PMID: 4001981 DOI: 10.1016/0277-9536(85)90393-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.5] [Indexed: 01/08/2023]
Abstract
The parents of 80% (41 of 51) of preschoolers with cystic fibrosis (CF) diagnosed at least 1 year prior to the study and attending the Hospital for Sick Children (HSC) CF Clinic completed the Problem Inventory (PINV), Preschool Behavior Questionnaire (PBQ) and Family Assessment Measure (FAM). The mean age of the CF children was 3.7 years. Parents of a control group of 31 healthy daycare children with a mean age of 3.6 years completed the same questionnaires. Parents of healthy preschoolers reported more child-related problems for 2-5 year olds than did parents of CF children (P < 0.001), suggesting that parents who have confronted the CF diagnosis go on to minimize the normal stresses of the developmental period. Considerable agreement was seen between PINV scores for mothers and fathers in each group, revealing that parents in a given family perceive similarly the impact their child has upon them. The mean PBQ for CF preschoolers was not significantly different from that of the control group, although there was some tendency toward hostile aggressive behavior in the CF group. Surprisingly, total FAM scores of all samples showed no significant differences with the exception of a better total FAM score for fathers of CF children when compared to control fathers, revealing that the CF family is not, during the early years of relative health stability, adversely affected. Two subscales were significantly elevated, social desirability (for CF mothers and fathers) and denial (for CF mothers only), describing an important response style which may enhance mastery of long-term stress. (ABSTRACT TRUNCATED AT 250 WORDS)
|
|
40 |
20 |
14
|
Simmons RJ, Corey M, Cowen L, Keenan N, Robertson J, Levison H. Behavioral adjustment of latency age children with cystic fibrosis. Psychosom Med 1987; 49:291-301. [PMID: 3602299 DOI: 10.1097/00006842-198705000-00008] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
One hundred and twenty-six 6- to 11-year-old latency age children with cystic fibrosis (CF) regularly attending the CF clinic of the Hospital for Sick Children were asked to complete the Piers-Harris Self-Concept Scale and the Children's Health Locus of Control. Their parents were requested to complete the Child Behavior Checklist and the Family Assessment Measure. One hundred and eight (86%) participated in the study. Twenty-three percent of the children were found to have sufficient behavior problems to indicate a significant degree of maladjustment. Latency-age CF children show an increase in problems compared to a pre-school group, suggesting that leaving the protection of the family is problematic for a child with a chronic physical disorder. Males show more behavior problems than females. Males' behavior is characterized by a somatic complaint profile. In spite of difficulties, CF latency children are able to maintain good social competence and self-concept, suggesting compensatory mechanisms. These mechanisms are different for males and females. Females' self-concept and social competence are supportive of each other, whereas for males, this is not the case. Similarly, female behavior is relevant to family functioning. Males and females adjust to difficulty as indicated by differences in behavior profiles.
|
|
38 |
19 |
15
|
Kumar A, Cowen L. Recognition of beta-structural motifs using hidden Markov models trained with simulated evolution. Bioinformatics 2010; 26:i287-93. [PMID: 20529918 PMCID: PMC2881384 DOI: 10.1093/bioinformatics/btq199] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022] Open
Abstract
Motivation: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models. However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in β-sheets. We thus explore methods for incorporating pairwise dependencies into these models. Results: We consider the remote homology detection problem for β-structural motifs. In particular, we ask if a statistical model trained on members of only one family in a SCOP β-structural superfamily can recognize members of other families in that superfamily. We show that HMMs trained with our pairwise model of simulated evolution achieve nearly a median 5% improvement in AUC for β-structural motif recognition as compared to ordinary HMMs. Availability: All datasets and HMMs are available at: http://bcb.cs.tufts.edu/pairwise/ Contact: anoop.kumar@tufts.edu; lenore.cowen@tufts.edu
|
Research Support, N.I.H., Extramural |
15 |
12 |
16
|
Menke M, King J, Berger B, Cowen L. Wrap-and-Pack: A New Paradigm for Beta Structural Motif Recognition with Application to Recognizing Beta Trefoils. J Comput Biol 2005; 12:777-95. [PMID: 16108716 DOI: 10.1089/cmb.2005.12.777] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022] Open
Abstract
A method is presented that uses beta-strand interactions at both the sequence and the atomic level to predict beta-structural motifs of protein sequences. A program called Wrap-and-Pack implements this method and is shown to recognize beta-trefoils, an important class of globular beta-structures, in the Protein Data Bank with 92% specificity and 92.3% sensitivity in cross-validation. It is demonstrated that Wrap-and-Pack learns each of the ten known SCOP beta-trefoil families, when trained primarily on beta-structures that are not beta-trefoils, together with three-dimensional structures of known beta-trefoils from outside the family. Wrap-and-Pack also predicts many proteins of unknown structure to be beta-trefoils. The computational method used here may generalize to other beta-structures for which strand topology and profiles of residue accessibility are well conserved.
|
|
20 |
6 |
17
|
Sledzieski S, Devkota K, Singh R, Cowen L, Berger B. TT3D: Leveraging precomputed protein 3D sequence models to predict protein-protein interactions. Bioinformatics 2023; 39:btad663. [PMID: 37897686 PMCID: PMC10640393 DOI: 10.1093/bioinformatics/btad663] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/26/2023] [Revised: 09/24/2023] [Accepted: 10/27/2023] [Indexed: 10/30/2023] Open
Abstract
Motivation: High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized structural alphabet (3Di). Results: We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves the performance of predicting protein-protein interactions cross-species. Thus TT3D (Topsy-Turvy 3D) presents a way to reuse all the computational effort going into producing high-quality structural models from sequence, while being sufficiently lightweight so that high-quality binary protein-protein interaction predictions across all protein pairs can be made genome-wide. Availability and implementation: TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.
|
Research Support, N.I.H., Extramural |
2 |
5 |
20
|
Knepp T, Pippin M, Crawford J, Chen G, Szykman J, Long R, Cowen L, Cede A, Abuhassan N, Herman J, Delgado R, Compton J, Berkoff T, Fishman J, Martins D, Stauffer R, Thompson AM, Weinheimer A, Knapp D, Montzka D, Lenschow D, Neil D. Estimating surface NO2 and SO2 mixing ratios from fast-response total column observations and potential application to geostationary missions. JOURNAL OF ATMOSPHERIC CHEMISTRY 2015; 72:261-286. [PMID: 26692593 PMCID: PMC4665805 DOI: 10.1007/s10874-013-9257-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/07/2012] [Accepted: 04/08/2013] [Indexed: 05/20/2023]
Abstract
Total-column nitrogen dioxide (NO2) data collected by a ground-based sun-tracking spectrometer system (Pandora) and a photolytic-converter-based in-situ instrument collocated at NASA's Langley Research Center in Hampton, Virginia were analyzed to study the relationship between total-column and surface NO2 measurements. The measurements span more than a year and cover all seasons. Surface mixing ratios are estimated via application of a planetary boundary-layer (PBL) height correction factor. This PBL correction factor effectively corrects for boundary-layer variability throughout the day, and accounts for up to ≈75 % of the variability between the NO2 data sets. Previous studies have made monthly and seasonal comparisons of column/surface data, which have shown generally good agreement over these long averaging times. In the current analysis, comparisons of column densities averaged over 90 s and 1 h are made. Applicability of this technique to sulfur dioxide (SO2) is briefly explored. The SO2 correlation is improved by excluding conditions where surface levels are considered background. The analysis is extended to data from the July 2011 DISCOVER-AQ mission over the greater Baltimore, MD area to examine the method's performance in more-polluted urban conditions where NO2 concentrations are typically much higher.
|
research-article |
10 |
3 |
21
|
Kumar L, Brenner N, Sledzieski S, Olaosebikan M, Roger LM, Lynn-Goin M, Klein-Seetharaman R, Berger B, Putnam H, Yang J, Lewinski NA, Singh R, Daniels NM, Cowen L, Klein-Seetharaman J. Transfer of knowledge from model organisms to evolutionarily distant non-model organisms: The coral Pocillopora damicornis membrane signaling receptome. PLoS One 2023; 18:e0270965. [PMID: 36735673 PMCID: PMC9897584 DOI: 10.1371/journal.pone.0270965] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/28/2021] [Accepted: 06/21/2022] [Indexed: 02/04/2023] Open
Abstract
With the ease of gene sequencing and the technology available to study and manipulate non-model organisms, extending the methodological toolbox required to translate our understanding of model organisms to non-model organisms has become an urgent problem. For example, mining the large volume of sequence data from corals and their symbionts is a challenge, but it also provides an opportunity for understanding the functionality and evolution of these and other non-model organisms. Much more information is available for humans than for any other eukaryotic species, especially related to signal transduction and diseases. However, the coral cnidarian host and humans diverged over 700 million years ago, and homologies between proteins in the two species are therefore often in the gray zone, or at least often undetectable with traditional BLAST searches. We introduce a two-stage approach to identifying putative coral homologues of human proteins. First, through remote homology detection using Hidden Markov Models, we identify candidate human homologues in the cnidarian genome. However, for many proteins, the human genome alone contains multiple family members with similar or even more divergence in sequence. In the second stage, therefore, we filter the remote homology results based on the functional and structural plausibility of each coral candidate, shortlisting the coral proteins likely to have conserved some of the functions of the human proteins. We demonstrate our approach with a pipeline for mapping membrane receptors in humans to membrane receptors in corals, with specific focus on the stony coral, P. damicornis. More than 1000 human membrane receptors mapped to 335 coral receptors, including 151 G protein coupled receptors (GPCRs).
To validate specific sub-families, we chose opsin proteins, representative GPCRs that confer light sensitivity, and Toll-like receptors, representative non-GPCRs that function in the immune response and in the ability to communicate with microorganisms. Through detailed structure-function analysis of their ligand-binding pockets and downstream signaling cascades, we selected those candidate remote homologues likely to carry out related functions in the corals. This pipeline may prove generally useful for other non-model organisms, for example in supporting the growing field of synthetic biology.
|
research-article |
2 |
3 |
22
|
Rabb N, Cowen L, de Ruiter JP, Scheutz M. Cognitive cascades: How to model (and potentially counter) the spread of fake news. PLoS One 2022; 17:e0261811. [PMID: 34995299 PMCID: PMC8740964 DOI: 10.1371/journal.pone.0261811] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/02/2021] [Accepted: 12/10/2021] [Indexed: 11/22/2022] Open
Abstract
Understanding the spread of false or dangerous beliefs (often called misinformation or disinformation) through a population has never seemed so urgent. Network science researchers have often taken a page from epidemiologists, and modeled the spread of false beliefs as similar to how a disease spreads through a social network. However, absent from those disease-inspired models is an internal model of an individual's set of current beliefs, where cognitive science has increasingly documented how the interaction between mental models and incoming messages seems to be crucially important for their adoption or rejection. Some computational social science modelers analyze agent-based models where individuals do have simulated cognition, but they often lack the strengths of network science, namely empirically driven network structures. We introduce a cognitive cascade model that combines a network science belief cascade approach with an internal cognitive model of the individual agents, as in opinion diffusion models, to form a public opinion diffusion (POD) model, adding media institutions as agents which begin opinion cascades. We show that the model, even with a very simplistic belief function to capture cognitive effects cited in disinformation study (dissonance and exposure), adds expressive power over existing cascade models. We conduct an analysis of the cognitive cascade model with our simple cognitive function across various graph topologies and institutional messaging patterns. We argue from our results that population-level aggregate outcomes of the model qualitatively match what has been reported in COVID-related public opinion polls, and that the model dynamics lend insights as to how to address the spread of problematic beliefs.
The overall model sets up a framework with which social science misinformation researchers and computational opinion diffusion modelers can join forces to understand, and hopefully learn how to best counter, the spread of disinformation and "alternative facts."
|
research-article |
3 |
1 |
23
|
Cowen L, Danset A, Fabre M. Etude comparée des réponses d'écoliers français et américains au test des histoires secrètes [Comparative study of French and American schoolchildren's responses to the secret stories test]. ENFANCE 1965. [DOI: 10.3406/enfan.1965.2379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/12/2022]
|
|
60 |
1 |
24
|
Ashey J, McKelvie H, Freeman J, Shpilker P, Zane LH, Becker DM, Cowen L, Richmond RH, Paul VJ, Seneca FO, Putnam HM. Characterizing transcriptomic responses to sediment stress across location and morphology in reef-building corals. PeerJ 2024; 12:e16654. [PMID: 38313033 PMCID: PMC10836209 DOI: 10.7717/peerj.16654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 11/20/2023] [Indexed: 02/06/2024] Open
Abstract
Anthropogenic activities increase sediment suspended in the water column, and deposition on reefs can be largely dependent on colony morphology. Massive and plating corals have a high capacity to trap sediments, and active removal mechanisms can be energetically costly. Branching corals trap less sediment but are more susceptible to light limitation caused by suspended sediment. Despite deleterious effects of sediments on corals, few studies have examined the molecular response of corals with different morphological characteristics to sediment stress. To address this knowledge gap, this study assessed the transcriptomic responses of branching and massive corals in Florida and Hawai'i to varying levels of sediment exposure. Gene expression analysis revealed a molecular responsiveness to sediments across species and sites. Differential gene expression analysis followed by Gene Ontology (GO) enrichment analysis identified that branching corals had the largest transcriptomic response to sediments, in developmental processes and metabolism, while significantly enriched GO terms were highly variable between massive corals, despite similar morphologies. Comparison of differentially expressed genes (DEGs) within orthogroups revealed that while all corals had DEGs in response to sediment, there was not a concerted gene set response by morphology or location. These findings illuminate the species specificity and genetic basis underlying coral susceptibility to sediments.
|
research-article |
1 |
|
25
|
Ognyanov V, Cowen L. A day hospital program for patients in crisis. HOSPITAL & COMMUNITY PSYCHIATRY 1974; 25:209-10. [PMID: 4592386 DOI: 10.1176/ps.25.4.209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
|
|
51 |
|