1
|
Saldova R, Asadi Shehni A, Haakensen VD, Steinfeld I, Hilliard M, Kifer I, Helland A, Yakhini Z, Børresen-Dale AL, Rudd PM. Association of N-glycosylation with breast carcinoma and systemic features using high-resolution quantitative UPLC. J Proteome Res 2014; 13:2314-27. [PMID: 24669823 DOI: 10.1021/pr401092y] [Citation(s) in RCA: 122] [Impact Index Per Article: 11.1] [Indexed: 12/19/2022]
Abstract
An improved separation of the human serum N-glycome using hydrophilic interaction chromatography technology with UPLC is described, where more than 140 N-glycans were assigned. Using this technique, serum samples from 107 healthy controls and 62 newly diagnosed breast cancer patients were profiled. The most statistically significant alterations were observed in cancer patients compared with healthy controls: an increase in sialylation, branching, and outer-arm fucosylation and a decrease in high-mannosylated and biantennary core-fucosylated glycans. In the controls and cases combined, systemic features were analyzed; serum estradiol was associated with an increase in digalactosylated glycans, and higher mammographic density was associated with an increase in biantennary digalactosylated glycans and with a decrease in trisialylated and in outer-arm fucosylated glycans. Furthermore, particular glycans were altered in some features of the breast carcinomas; bisected biantennary nonfucosylated glycans were decreased in patients with progesterone receptor positive tumors, and core-fucosylated biantennary bisected monogalactosylated glycans were decreased in patients with the TP53 mutation. Systemic features show more significant associations with the serum N-glycome than do the features of the breast carcinomas. In conclusion, the UPLC-based glycan analysis technique described here reveals highly significant differences between healthy women and breast cancer patients. Significant associations with breast carcinoma and systemic features are described.
|
Research Support, Non-U.S. Gov't |
11 |
122 |
2
|
Haakensen VD, Steinfeld I, Saldova R, Shehni AA, Kifer I, Naume B, Rudd PM, Børresen-Dale AL, Yakhini Z. Serum N-glycan analysis in breast cancer patients--Relation to tumour biology and clinical outcome. Mol Oncol 2015; 10:59-72. [PMID: 26321095 DOI: 10.1016/j.molonc.2015.08.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 12/19/2014] [Revised: 08/02/2015] [Accepted: 08/03/2015] [Indexed: 12/13/2022] Open
Abstract
Glycosylation and related processes play important roles in cancer development and progression, including metastasis. Several studies have shown that N-glycans have potential diagnostic value as cancer serum biomarkers. We have explored the significance of the abundance of particular serum N-glycan structures as important features of breast tumour biology by studying the serum glycome and tumour transcriptome (mRNA and miRNA) of 104 breast cancer patients. Integration of these types of molecular data allows us to study the relationship between serum glycans and transcripts representing functional pathways, such as metabolic pathways or DNA damage response. We identified triantennary trigalactosylated trisialylated glycans in serum as being associated with lower levels of tumour transcripts involved in focal adhesion and integrin-mediated cell adhesion. These glycan structures were also linked to poor prognosis in patients with ER negative tumours. High abundance of simple monoantennary glycan structures was associated with increased survival, particularly in the basal-like subgroup. The presence of circulating tumour cells was found to be significantly associated with several serum glycome structures, such as bi- and triantennary, di- and trigalactosylated, and di- and trisialylated glycans. The link between tumour miRNA expression levels and N-glycan production is also examined.
|
Research Support, Non-U.S. Gov't |
10 |
33 |
3
|
Portugaly E, Kifer I, Linial M. Selecting targets for structural determination by navigating in a graph of protein families. Bioinformatics 2002; 18:899-907. [PMID: 12117787 DOI: 10.1093/bioinformatics/18.7.899] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION A major goal in structural genomics is to enrich the catalogue of proteins whose 3D structures are known. In an attempt to address this problem we mapped over 10 000 proteins with solved structures onto a graph of all Swissprot protein sequences (release 36, approximately 73 000 proteins) provided by ProtoMap, with the goal of sorting proteins according to their likelihood of belonging to new superfamilies. We hypothesized that proteins within neighbouring clusters tend to share common structural superfamilies or folds. If true, the likelihood of finding new superfamilies increases in clusters that are distal from other solved structures within the graph. RESULTS We defined an order relation between unsolved proteins according to their 'distance' from solved structures in the graph, and sorted approximately 48 000 proteins. Our list can be partitioned into three groups: approximately 35 000 proteins sharing a cluster with at least one known structure; approximately 6500 proteins in clusters with no solved structure but with neighbouring clusters containing known structures; and a third group containing the rest of the proteins, approximately 6100 (in 1274 clusters). We tested the quality of the order relation using thousands of recently solved structures that were not included when the order was defined. The tests show that our order is significantly better (P-value approximately 10^-5) than a random order. More interestingly, the order within the union of the second and third groups, and the order within the third group alone, perform better than random (P-values: 0.0008 and 0.15, respectively) and are better than alternative orders created using PSI-BLAST. Herein, we present a method for selecting targets to be used in structural genomics projects. AVAILABILITY A list of proteins to be used for target selection, combined with a set of biological filters for narrowing down potential targets, is available at http://www.protarget.cs.huji.ac.il.
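The idea of ordering unsolved clusters by their distance from solved structures can be illustrated with a small sketch. The abstract does not specify how 'distance' is computed, so the version below assumes plain hop count in an unweighted cluster graph and uses a multi-source breadth-first search; the function name `rank_by_distance` and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import deque

def rank_by_distance(graph, solved):
    """Rank unsolved clusters, farthest from any solved cluster first.

    graph  : dict mapping cluster id -> iterable of neighbouring cluster ids
    solved : set of cluster ids that contain at least one known structure
    Assumption: distance is the hop count in an unweighted graph.
    """
    # Multi-source BFS: every solved cluster starts at distance 0.
    dist = {node: 0 for node in solved}
    queue = deque(solved)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)

    unsolved = [n for n in graph if n not in solved]
    # Clusters unreachable from any solved cluster get infinite distance,
    # so they sort to the front; ties are broken by cluster id.
    return sorted(unsolved, key=lambda n: (-dist.get(n, float("inf")), n))

# Toy graph (hypothetical): c4 is disconnected from the solved cluster c1.
toy_graph = {"c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2"], "c4": []}
ranking = rank_by_distance(toy_graph, {"c1"})
```

In this toy case the disconnected cluster is ranked first, followed by the cluster two hops away and then the direct neighbour, which mirrors the three-group partition described in the abstract only loosely.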
|
Comparative Study |
23 |
18 |
4
|
Drory Retwitzer M, Kifer I, Sengupta S, Yakhini Z, Barash D. An Efficient Minimum Free Energy Structure-Based Search Method for Riboswitch Identification Based on Inverse RNA Folding. PLoS One 2015; 10:e0134262. [PMID: 26230932 PMCID: PMC4521916 DOI: 10.1371/journal.pone.0134262] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/16/2015] [Accepted: 07/07/2015] [Indexed: 11/22/2022] Open
Abstract
Riboswitches are RNA genetic control elements that were originally discovered in bacteria and provide a unique mechanism of gene regulation. They work without the participation of proteins and are believed to represent ancient regulatory systems in the evolutionary timescale. One of the biggest challenges in riboswitch research is to find additional eukaryotic riboswitches since more than 20 riboswitch classes have been found in prokaryotes but only one class has been found in eukaryotes. Moreover, this single known class of eukaryotic riboswitch, namely the TPP riboswitch class, has been found in bacteria, archaea, fungi and plants but not in animals. The few examples of eukaryotic riboswitches were identified using sequence-based bioinformatics search methods such as a combination of BLAST and pattern matching techniques that incorporate base-pairing considerations. None of these approaches performs energy minimization structure predictions. There is a clear motivation to develop new bioinformatics methods, aside from the ongoing advances in covariance models, that will sample the sequence search space more flexibly using structural guidance while retaining the computational efficiency of sequence-based methods. We present a new energy minimization approach that transforms structure-based search into a sequence-based search, thereby enabling the utilization of well established sequence-based search utilities such as BLAST and FASTA. The transformation to sequence space is obtained by using an extended inverse RNA folding problem solver with sequence and structure constraints, available within RNAfbinv. Examples of applying the new method are presented for the purine and preQ1 riboswitches. The method is described in detail along with its findings in prokaryotes. Potential uses in finding novel eukaryotic riboswitches and optimizing pre-designed synthetic riboswitches based on ligand simulations are discussed. The method components are freely available for use.
|
Validation Study |
10 |
10 |
5
|
Drory Retwitzer M, Polishchuk M, Churkin E, Kifer I, Yakhini Z, Barash D. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps. Nucleic Acids Res 2015; 43:W507-12. [PMID: 25940619 PMCID: PMC4489251 DOI: 10.1093/nar/gkv435] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/13/2015] [Accepted: 04/23/2015] [Indexed: 11/13/2022] Open
Abstract
Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular, no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various backgrounds and proficiency. It also extends RNA Structator and allows a more flexible variable gaps representation, in addition to analysis of results using energy minimization methods. The RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site.
|
Research Support, Non-U.S. Gov't |
10 |
10 |
6
|
Kifer I, Sasson O, Linial M. Predicting fold novelty based on ProtoNet hierarchical classification. Bioinformatics 2004; 21:1020-7. [PMID: 15539447 DOI: 10.1093/bioinformatics/bti135] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Structural genomics projects aim to solve a large number of protein structures with the ultimate objective of representing the entire protein space. The computational challenge is to identify and prioritize a small set of proteins with new, currently unknown, superfamilies or folds. RESULTS We develop a method that assigns each protein a likelihood of it belonging to a new, yet undetermined, structural superfamily. The method relies on a variant of ProtoNet, an automatic hierarchical classification scheme of all protein sequences from SwissProt. Our results show that proteins that are remote from solved structures in the ProtoNet hierarchy are more likely to belong to new superfamilies. The results are validated against SCOP releases from recent years that account for about half of the solved structures known to date. We show that our new method and the representation of ProtoNet are superior in detecting new targets, compared to our previous method using ProtoMap classification. Furthermore, our method outperforms PSI-BLAST search in detecting potential new superfamilies.
|
Validation Study |
21 |
9 |
7
|
Kifer I, Nussinov R, Wolfson HJ. Constructing templates for protein structure prediction by simulation of protein folding pathways. Proteins 2009; 73:380-94. [PMID: 18433063 DOI: 10.1002/prot.22073] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
How a one-dimensional protein sequence folds into a specific 3D structure remains a difficult challenge in structural biology. Many computational methods have been developed in an attempt to predict the tertiary structure of the protein; most of these employ approaches that are based on the accumulated knowledge of solved protein structures. Here we introduce a novel and fully automated approach for predicting the 3D structure of a protein that is based on the well accepted notion that protein folding is a hierarchical process. Our algorithm follows the hierarchical model by employing two stages: the first aims to find a match between the sequences of short independently-folding structural entities and parts of the target sequence and assigns the respective structures. The second assembles these local structural parts into a complete 3D structure, allowing for long-range interactions between them. We present the results of applying our method to a subset of the targets from CASP6 and CASP7. Our results indicate that for targets with a significant sequence similarity to known structures we are often able to provide predictions that are better than those achieved by two leading servers, and that the most significant improvements in comparison with these methods occur in regions of a gapped structural alignment between the native structure and the closest available structural template. We conclude that in addition to performing well for targets with known homologous structures, our method shows great promise for addressing the more general category of comparative modeling targets, which is our next goal.
|
Research Support, Non-U.S. Gov't |
16 |
9 |
8
|
Kifer I, Nussinov R, Wolfson HJ. GOSSIP: a method for fast and accurate global alignment of protein structures. Bioinformatics 2011; 27:925-32. [PMID: 21296751 PMCID: PMC3065682 DOI: 10.1093/bioinformatics/btr044] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/20/2010] [Revised: 12/04/2010] [Accepted: 01/21/2011] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The database of known protein structures (PDB) is increasing rapidly. This results in a growing need for methods that can cope with the vast amount of structural data. To analyze the accumulating data, it is important to have a fast tool for identifying similar structures and clustering them by structural resemblance. Several excellent tools have been developed for the comparison of protein structures. These usually address the task of local structure alignment, an important yet computationally intensive problem due to its complexity. It is difficult to use such tools for comparing a large number of structures to each other in a reasonable time. RESULTS Here we present GOSSIP, a novel method for a global all-against-all alignment of any set of protein structures. The method detects similarities between structures down to a certain cutoff (a parameter of the program), hence allowing it to detect similar structures at a much higher speed than local structure alignment methods. GOSSIP compares many structures several orders of magnitude faster than well-known available structure alignment servers, and it is also faster than a database scanning method. We evaluate GOSSIP both on a dataset of short structural fragments and on two large sequence-diverse structural benchmarks. Our conclusions are that for a threshold of 0.6 and above, the speed of GOSSIP is obtained with no compromise of the accuracy of the alignments or of the number of detected global similarities. AVAILABILITY A server, as well as an executable for download, is available at http://bioinfo3d.cs.tau.ac.il/gossip/.
|
Evaluation Study |
14 |
8 |
9
|
Cohn-Alperovich D, Rabner A, Kifer I, Mandel-Gutfreund Y, Yakhini Z. Mutual enrichment in aggregated ranked lists with applications to gene expression regulation. Bioinformatics 2017; 32:i464-i472. [PMID: 27587663 DOI: 10.1093/bioinformatics/btw435] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION It is often the case in biological measurement data that results are given as a ranked list of quantities, for example, differential expression (DE) of genes as inferred from microarrays or RNA-seq. Recent years brought considerable progress in statistical tools for enrichment analysis in ranked lists. Several tools are now available that allow users to break the fixed set paradigm in assessing statistical enrichment of sets of genes. Continuing with the example, these tools identify factors that may be associated with measured differential expression. A drawback of existing tools is their focus on identifying single factors associated with the observed or measured ranks, failing to address relationships between these factors. One example is a scenario in which genes targeted by multiple miRNAs play a central role in the DE signal but the effect of each single miRNA is too subtle to be detected, as shown in our results. RESULTS We propose statistical and algorithmic approaches for selecting a sub-collection of factors that can be aggregated into one ranked list that is heuristically most associated with an input ranked list (pivot). We examine performance on simulated data and apply our approach to cancer datasets. We find small sub-collections of miRNA that are statistically associated with gene DE in several types of cancer, suggesting miRNA cooperativity in driving disease related processes. Many of our findings are consistent with known roles of miRNAs in cancer, while others suggest previously unknown roles for certain miRNAs. AVAILABILITY AND IMPLEMENTATION Code and instructions for our algorithmic framework, MULSEA, are in: https://github.com/YakhiniGroup/MULSEA CONTACT dalia.cohn@gmail.com SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
Journal Article |
8 |
5 |
10
|
Kifer I, Nussinov R, Wolfson HJ. Protein structure prediction using a docking-based hierarchical folding scheme. Proteins 2011; 79:1759-73. [PMID: 21445943 PMCID: PMC3092838 DOI: 10.1002/prot.22999] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/25/2010] [Revised: 01/02/2011] [Accepted: 01/18/2011] [Indexed: 12/13/2022]
Abstract
The pathways by which proteins fold into their specific native structure are still an unsolved mystery. Currently, many methods for protein structure prediction are available, and most of them tackle the problem by relying on the vast amounts of data collected from known protein structures. These methods are often not concerned with the route the protein follows to reach its final fold. This work is based on the premise that proteins fold in a hierarchical manner. We present FOBIA, an automated method for predicting a protein structure. FOBIA consists of two main stages: the first finds matches between parts of the target sequence and independently folding structural units using profile-profile comparison. The second assembles these units into a 3D structure by searching and ranking their possible orientations toward each other using a docking-based approach. We have previously reported an application of an initial version of this strategy to homology based targets. Since then we have considerably enhanced our method's abilities to allow it to address the more difficult template-based target category. This allows us to now apply FOBIA to the template-based targets of CASP8 and to show that it is both very efficient and promising. Our method can provide an alternative for template-based structure prediction, and in particular, the docking-based ranking technique presented here can be incorporated into any profile-profile comparison based method.
|
Research Support, N.I.H., Intramural |
14 |
3 |
11
|
Kifer I, Branca RM, Ben-Dor A, Zhai L, Xu P, Lehtio J, Yakhini Z. Optimizing Analytical Depth and Cost Efficiency of IEF-LC/MS Proteomics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:272-281. [PMID: 28368805 DOI: 10.1109/tcbb.2015.2452901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
IEF LC-MS/MS is an analytical method that incorporates a two-step sample separation prior to MS identification of proteins. When analyzing complex samples this preparatory separation allows for higher analytical depth and improved quantification accuracy of proteins. However, cost and analysis time are greatly increased as each analyzed IEF fraction is separately profiled using LC-MS/MS. We propose an approach that selects a subset of IEF fractions for LC-MS/MS analysis that is highly informative in the context of a group of proteins of interest. Specifically, our method allows a significant reduction in cost and instrument time as compared to the standard protocol of running all fractions, with little compromise to coverage. We develop algorithmics to optimize the selection of the IEF fractions on which to run LC-MS/MS. We translate the fraction optimization task to Minimum Set Cover, a well-studied NP-hard problem. We develop heuristic solutions and compare them in terms of effectiveness and running times. We provide examples to demonstrate advantages and limitations of each algorithmic approach. Finally, we test our methodology by applying it to experimental data obtained from IEF LC-MS/MS analysis of yeast and human samples. We demonstrate the benefit of this approach for analyzing complex samples with a focus on different protein sets of interest.
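The reduction of fraction selection to Minimum Set Cover can be made concrete with a short sketch. The abstract does not say which heuristics the authors developed, so the code below uses the standard greedy set-cover heuristic as a stand-in; the function name `greedy_fraction_cover`, the fraction labels, and the protein ids are all hypothetical.

```python
def greedy_fraction_cover(fractions, targets):
    """Pick IEF fractions that together cover the proteins of interest.

    fractions : dict mapping fraction id -> set of protein ids seen in it
    targets   : iterable of protein ids that should be covered
    Returns (chosen fraction ids in pick order, targets left uncovered).
    Assumption: the classic greedy heuristic, not the paper's own method.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Pick the fraction that covers the most still-uncovered targets.
        best = max(fractions, key=lambda f: len(fractions[f] & uncovered))
        gain = fractions[best] & uncovered
        if not gain:
            break  # the remaining targets appear in no fraction
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

# Toy data (hypothetical): four fractions, five proteins of interest.
toy_fractions = {
    "F1": {"A", "B"},
    "F2": {"B", "C", "D"},
    "F3": {"D"},
    "F4": {"A", "E"},
}
chosen, missed = greedy_fraction_cover(toy_fractions, {"A", "B", "C", "D", "E"})
```

Here two of the four fractions suffice to cover every target, which is the kind of saving in instrument time the abstract describes. The greedy rule gives no optimality guarantee beyond the usual logarithmic approximation bound for set cover.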
|
|
8 |
|
12
|
Cohn-Alperovich D, Rabner A, Kifer I, Mandel-Gutfreund Y, Yakhini Z. Mutual enrichment in aggregated ranked lists with applications to gene expression regulation. Bioinformatics 2017; 33:470. [PMID: 28011778 DOI: 10.1093/bioinformatics/btw727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
|
Comment |
8 |
|
13
|
Kifer I, Aizenman A, Fillipov-Levy N, Barbash Z, Vidne M, Tarcic G. Abstract 1938: Large-scale identification of mutation activity and drug sensitivity in BRAF via a novel multi-label multi-task CNN model. Cancer Res 2022. [DOI: 10.1158/1538-7445.am2022-1938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Precision medicine has allowed for many drugs to be developed for frequently occurring, well studied oncogenic mutations such as NTRK fusions or EGFR exon 19 mutations. However, large scale genomic sequencing of patient samples shows that tumors harbor many mutations classified as Variants of Uncertain Significance (VUS). Such mutations could have a significant role in tumor progression and can thus serve as potential drug targets. Therefore, understanding and characterizing the functional significance of VUSs and their response to targeted agents is essential. Here we present a novel machine learning (ML) model consisting of a multi-label, multi-task deep convolutional neural network followed by a decision tree-based regression model. Focusing on alterations in BRAF, we used data from a cell-based assay that measures signaling pathway activation. This is done using fluorescent imaging of cells expressing a mutated protein together with a fluorescently labeled signaling pathway reporter, providing the input to our model. We trained the model on 3 types of cell images: cells transfected with WT BRAF, BRAF V600E, and BRAF V600E treated with a high dose of Vemurafenib. We use two datasets to evaluate our performance: a set of 17 known active BRAF fusions, as well as a set of 16 known active non-V600E mutations. We also compare our performance to a previously published single-task ML model that aims to detect both activity and response. The two methodologies are compared via two criteria: ability to detect activity of the mutations in the dataset, as well as ability to predict response to Vemurafenib or to FORE8394, a drug previously unseen by the model. We show that while both single-task and multi-task models identify all 17 known active fusions as oncogenic, the multi-task model does slightly better on the non-V600E mutations, correctly identifying 15/16 of the active mutations vs 10/16 for the single-task model.
Comparing drug response, the multi-task model has higher sensitivity in detecting active mutations as responsive to Vemurafenib or FORE8394, including all V600 mutations, which are known responders to Vemurafenib and for which the single-task model does not capture response. Following the training and validation, given a dataset of >300 previously unseen BRAF mutations, the multi-task model is then able to predict both the mutation’s oncogenicity level and its expected response to the given drug. Interestingly, the multi-task model also suggests a different drug response profile for Vemurafenib compared to FORE8394. We conclude that our novel multi-task model provides an accurate and efficient method for uncovering the actionable and treatable mutational landscape of a drug for patients with mutations in BRAF. It can thus be viewed as a step forward in developing sensitive methodologies for determining which patients are more likely to benefit from a potential drug.
Citation Format: Ilona Kifer, Arie Aizenman, Natalie Fillipov-Levy, Zohar Barbash, Michael Vidne, Gabi Tarcic. Large-scale identification of mutation activity and drug sensitivity in BRAF via a novel multi-label multi-task CNN model [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1938.
|
|
3 |
|