26. Bruinsma SM, Roobol MJ, Carroll PR, Klotz L, Pickles T, Moore CM, Gnanapragasam VJ, Villers A, Rannikko A, Valdagni R, Frydenberg M, Kakehi Y, Filson CP, Bangma CH. Expert consensus document: Semantics in active surveillance for men with localized prostate cancer - results of a modified Delphi consensus procedure. Nat Rev Urol 2017; 14:312-322. [PMID: 28290462 DOI: 10.1038/nrurol.2017.26] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 02/08/2023]
Abstract
Active surveillance (AS) is broadly described as a management option for men with low-risk prostate cancer, but semantic heterogeneity exists in both the literature and in guidelines. To address this issue, a panel of leading prostate cancer specialists in the field of AS participated in a consensus-forming project using a modified Delphi method to reach international consensus on definitions of terms related to this management option. An iterative three-round sequence of online questionnaires designed to address 61 individual items was completed by each panel member. Consensus was considered to be reached if ≥70% of the experts agreed on a definition. To facilitate a common understanding among all experts involved and resolve potential ambiguities, a face-to-face consensus meeting was held between Delphi survey rounds two and three. Convenience sampling was used to construct the panel of experts. In total, 12 experts from Australia, France, Finland, Italy, the Netherlands, Japan, the UK, Canada and the USA participated. By the end of the Delphi process, formal consensus was achieved for 100% (n = 61) of the terms and a glossary was then developed. Agreement between international experts has been reached on relevant terms and subsequent definitions regarding AS for patients with localized prostate cancer. This standard terminology could support multidisciplinary communication, reduce the extent of variations in clinical practice and optimize clinical decision making.
27. Fleuren WWM, Toonen EJM, Verhoeven S, Frijters R, Hulsen T, Rullmann T, van Schaik R, de Vlieg J, Alkema W. Identification of new biomarker candidates for glucocorticoid induced insulin resistance using literature mining. BioData Min 2013; 6:2. [PMID: 23379763 PMCID: PMC3577498 DOI: 10.1186/1756-0381-6-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/08/2012] [Accepted: 01/02/2013] [Indexed: 02/08/2023]
Abstract
BACKGROUND: Glucocorticoids are potent anti-inflammatory agents used for the treatment of diseases such as rheumatoid arthritis, asthma, inflammatory bowel disease and psoriasis. Unfortunately, usage is limited because of metabolic side-effects, e.g. insulin resistance, glucose intolerance and diabetes. To gain more insight into the mechanisms behind glucocorticoid (GC) induced insulin resistance (IR), it is important to understand which genes play a role in the development of insulin resistance and which genes are affected by glucocorticoids. Medline abstracts contain many studies about insulin resistance and the molecular effects of glucocorticoids and thus are a good resource to study these effects.
RESULTS: We developed CoPubGene, a method to automatically identify gene-disease associations in Medline abstracts. We used this method to create a literature network of genes related to insulin resistance and to evaluate the importance of the genes in this network for glucocorticoid induced metabolic side effects and anti-inflammatory processes. With this approach we found several genes that are already considered markers of GC induced IR, such as phosphoenolpyruvate carboxykinase (PCK) and glucose-6-phosphatase, catalytic subunit (G6PC). In addition, we found genes involved in steroid synthesis that have not yet been recognized as mediators of GC induced IR.
CONCLUSIONS: With this approach we are able to construct a robust, informative literature network of insulin resistance related genes that gives new insights into the mechanisms behind GC induced IR. The method has been set up in a generic way so it can be applied to a wide variety of disease networks.
28. van Hooff SR, Koster J, Hulsen T, van Schaik BDC, Roos M, van Batenburg MF, Versteeg R, van Kampen AHC. The construction of genome-based transcriptional units. OMICS: A Journal of Integrative Biology 2009; 13:105-14. [PMID: 19320556 DOI: 10.1089/omi.2008.0036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Gene-oriented sequence clusters (transcriptional units) have found many applications in genomics research, including the construction of transcriptome maps and identification of splice variants. We developed a new method to construct transcriptional units that uses the genomic sequence as a template. We present and discuss our method in detail together with an evaluation of the transcriptional units for human. We constructed 33,007 and 27,792 transcriptional units for human and mouse, respectively. The sensitivity (81%) and specificity (90%) of our method compare favorably to other established methods. We evaluated the representation of experimentally validated and predicted intergenic spliced transcripts in humans and show that we correctly represent a large fraction of these cases by single transcriptional units. Our method performs well, but the evaluation of the final set of transcriptional units shows that improvements to the algorithm are still possible. However, because the precise number and types of errors are difficult to track, it is not obvious how to significantly improve the algorithm. We believe that ongoing research efforts are necessary to further improve current methods. This should include detailed documentation, comparison, and evaluation of current methods.
29. Hulsen T, Groenen PMA, de Vlieg J, Alkema W. PhyloPat: an updated version of the phylogenetic pattern database contains gene neighborhood. Nucleic Acids Res 2009; 37:D731-7. [PMID: 18832367 PMCID: PMC2686476 DOI: 10.1093/nar/gkn645] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 02/05/2023]
Abstract
Phylogenetic patterns show the presence or absence of certain genes in a set of full genomes derived from different species. They can also be used to determine sets of genes that occur only in certain evolutionary branches. Previously, we presented a database named PhyloPat which allows the complete Ensembl gene database to be queried using phylogenetic patterns. Here, we describe an updated version of PhyloPat which can be queried by an improved web server. We used a single linkage clustering algorithm to create 241,697 phylogenetic lineages, using all the orthologies provided by Ensembl v49. PhyloPat offers the possibility of querying with binary phylogenetic patterns or regular expressions, or through a phylogenetic tree of the 39 included species. Users can also input a list of Ensembl, EMBL, EntrezGene or HGNC IDs to check which phylogenetic lineage any gene belongs to. A link to the FatiGO web interface has been incorporated in the HTML output. For each gene, the surrounding genes on the chromosome, color coded according to their phylogenetic lineage, can be viewed, as well as FASTA files of the peptide sequences of each lineage. Furthermore, lists of omnipresent, polypresent, oligopresent and anticorrelating genes have been included. PhyloPat is freely available at http://www.cmbi.ru.nl/phylopat.
30. Hulsen T, de Vlieg J, Alkema W. BioVenn - a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics 2008; 9:488. [PMID: 18925949 PMCID: PMC2584113 DOI: 10.1186/1471-2164-9-488] [Citation(s) in RCA: 1096] [Impact Index Per Article: 68.5] [Received: 06/24/2008] [Accepted: 10/16/2008] [Indexed: 02/08/2023]
Abstract
BACKGROUND: In many genomics projects, numerous lists containing biological identifiers are produced. Often it is useful to see the overlap between different lists, enabling researchers to quickly observe similarities and differences between the data sets they are analyzing. One of the most popular methods to visualize the overlap and differences between data sets is the Venn diagram: a diagram consisting of two or more circles in which each circle corresponds to a data set, and the overlap between the circles corresponds to the overlap between the data sets. Venn diagrams are especially useful when they are 'area-proportional', i.e. the sizes of the circles and the overlaps correspond to the sizes of the data sets. Currently there are no programs available that can create area-proportional Venn diagrams connected to a wide range of biological databases.
RESULTS: We designed a web application named BioVenn to summarize the overlap between two or three lists of identifiers, using area-proportional Venn diagrams. The user only needs to input these lists of identifiers in the textboxes and push the submit button. Parameters like colors and text size can be adjusted easily through the web interface. The position of the text can be adjusted by drag-and-drop. The output Venn diagram can be shown as an SVG or PNG image embedded in the web application, or as a standalone SVG or PNG image. The latter option is useful for batch queries. Besides the Venn diagram, BioVenn outputs lists of identifiers for each of the resulting subsets. If an identifier is recognized as belonging to one of the supported biological databases, the output is linked to that database. Finally, BioVenn can map Affymetrix and EntrezGene identifiers to Ensembl genes.
CONCLUSION: BioVenn is an easy-to-use web application to generate area-proportional Venn diagrams from lists of biological identifiers. It supports a wide range of identifiers from the most used biological databases currently available. Its implementation on the World Wide Web makes it available for use on any computer with an internet connection, independent of operating system and without the need to install programs locally. BioVenn is freely accessible at http://www.cmbi.ru.nl/cdd/biovenn/.
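To make "area-proportional" concrete for the two-list case, here is a minimal sketch. It is not BioVenn's own code: the function names (`subsets`, `lens_area`, `venn_layout`) and the bisection approach are assumptions chosen for illustration. Each circle gets an area equal to its list size, and the distance between the centres is searched until the overlapping lens has an area equal to the number of shared identifiers.

```python
import math

def subsets(list_a, list_b):
    """Split two identifier lists into the three Venn regions."""
    a, b = set(list_a), set(list_b)
    return {"only_a": a - b, "only_b": b - a, "both": a & b}

def _clamp(x):
    # Guard acos against tiny floating-point overshoot outside [-1, 1].
    return max(-1.0, min(1.0, x))

def lens_area(r1, r2, d):
    """Area of the intersection of two circles whose centres are d apart."""
    if d >= r1 + r2:              # circles do not touch
        return 0.0
    if d <= abs(r1 - r2):         # smaller circle lies inside the larger one
        return math.pi * min(r1, r2) ** 2
    a1 = r1 * r1 * math.acos(_clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    a2 = r2 * r2 * math.acos(_clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    kite = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(max(0.0, kite))

def venn_layout(n_a, n_b, n_both, tol=1e-9):
    """Return (r1, r2, d) so that circle and overlap areas equal the counts."""
    r1 = math.sqrt(n_a / math.pi)
    r2 = math.sqrt(n_b / math.pi)
    lo, hi = abs(r1 - r2), r1 + r2
    # The overlap shrinks as the centres move apart, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > n_both:
            lo = mid
        else:
            hi = mid
    return r1, r2, (lo + hi) / 2

regions = subsets(["g1", "g2", "g3"], ["g2", "g3", "g4"])
r1, r2, d = venn_layout(100, 100, 50)
```

The three-list case described in the abstract generally cannot satisfy every pairwise and triple overlap exactly with circles, so a real tool has to settle for an approximation there; this sketch does not attempt it.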
31. Franck E, Hulsen T, Huynen MA, de Jong WW, Lubsen NH, Madsen O. Evolution of closely linked gene pairs in vertebrate genomes. Mol Biol Evol 2008; 25:1909-21. [PMID: 18566020 DOI: 10.1093/molbev/msn136] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]
Abstract
The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of intergenic DNA (gene duos). We show here that a lack of head-to-tail (h2t) gene duos is an even more distinctive characteristic of mammalian genomes, with the platypus genome as the only exception. In nonmammalian vertebrate and in nonvertebrate genomes, the frequency of h2h, h2t, and tail-to-tail (t2t) gene duos is close to random. In tetrapod genomes, the h2t and t2t gene duos are more likely to be part of a larger gene cluster of closely spaced genes than h2h gene duos; in fish and urochordate genomes, the reverse is seen. In human and mouse tissues, the expression profiles of gene duos were skewed toward positive coexpression, irrespective of orientation. The organization of orthologs of both members of about 40% of the human gene duos could be traced in other species, enabling a prediction of the organization at the branch points of gnathostomes, tetrapods, amniotes, and euarchontoglires. The accumulation of h2h gene duos started in tetrapods, whereas that of h2t and t2t gene duos only started in amniotes. The apparent lack of evolutionary conservation of h2t and t2t gene duos relative to that of h2h gene duos is thus a result of their relatively late origin in the lineage leading to mammals; we show that once they are formed h2t and t2t gene duos are as stable as h2h gene duos.
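The gene duo definition above (neighbouring genes separated by less than 600 bp of intergenic DNA, classified as h2h, h2t or t2t) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' pipeline: the tuple layout, the function name `classify_duos`, the 1-based inclusive coordinates and the decision to skip overlapping genes are all assumptions.

```python
def classify_duos(genes, max_gap=600):
    """Classify adjacent gene pairs by relative orientation.

    genes: iterable of (name, chromosome, start, end, strand) tuples with
    1-based inclusive coordinates and strand "+" or "-" (assumed layout).
    Returns a list of (left_name, right_name, orientation) tuples.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    duos = []
    for left, right in zip(ordered, ordered[1:]):
        if left[1] != right[1]:
            continue                      # different chromosomes
        gap = right[2] - left[3] - 1      # intergenic bases between the genes
        if gap < 0 or gap >= max_gap:
            continue                      # overlapping (skipped here) or too far apart
        if left[4] == "-" and right[4] == "+":
            kind = "h2h"                  # divergent: 5' ends face each other
        elif left[4] == "+" and right[4] == "-":
            kind = "t2t"                  # convergent: 3' ends face each other
        else:
            kind = "h2t"                  # same strand
        duos.append((left[0], right[0], kind))
    return duos

example = [
    ("A", "chr1", 100, 200, "-"),
    ("B", "chr1", 500, 900, "+"),
    ("C", "chr1", 1000, 1200, "+"),
    ("D", "chr1", 1300, 1500, "-"),
    ("E", "chr1", 5000, 6000, "+"),
]
duos = classify_duos(example)
```

With random orientation, h2t pairs would be expected about twice as often as h2h or t2t pairs, since two of the four strand combinations are head-to-tail.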
32. Denissov S, van Driel M, Voit R, Hekkelman M, Hulsen T, Hernandez N, Grummt I, Wehrens R, Stunnenberg H. Identification of novel functional TBP-binding sites and general factor repertoires. EMBO J 2007; 26:944-54. [PMID: 17268553 PMCID: PMC1852848 DOI: 10.1038/sj.emboj.7601550] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Received: 06/26/2006] [Accepted: 12/15/2006] [Indexed: 02/08/2023]
Abstract
Our current knowledge of the general factor requirement in transcription by the three mammalian RNA polymerases is based on a small number of model promoters. Here, we present a comprehensive chromatin immunoprecipitation (ChIP)-on-chip analysis for 28 transcription factors on a large set of known and novel TATA-binding protein (TBP)-binding sites experimentally identified via ChIP cloning. A large fraction of identified TBP-binding sites is located in introns or lacks a gene/mRNA annotation and is found to direct transcription. Integrated analysis of the ChIP-on-chip data and functional studies revealed that TAF12, hitherto regarded as RNA polymerase II (RNAP II)-specific, is also involved in RNAP I transcription. Distinct profiles for general transcription factors and TAF-containing complexes were uncovered for RNAP II promoters located in CpG and non-CpG islands, suggesting distinct transcription initiation pathways. Our study broadens the spectrum of general transcription factor function and uncovers a plethora of novel, functional TBP-binding sites in the human genome.
33. Hulsen T, de Vlieg J, Leunissen JAM, Groenen PMA. Testing statistical significance scores of sequence comparison methods with structure similarity. BMC Bioinformatics 2006; 7:444. [PMID: 17038163 PMCID: PMC1618413 DOI: 10.1186/1471-2105-7-444] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 07/10/2006] [Accepted: 10/12/2006] [Indexed: 02/08/2023]
Abstract
BACKGROUND: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences.
RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores.
CONCLUSION: The compute intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
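The Monte-Carlo Z-score mentioned above can be illustrated with a small sketch: score the real pair, score the same pair after repeatedly shuffling one sequence, and express the real score in standard deviations above the shuffled mean. Everything here is a simplifying assumption rather than the setup evaluated in the paper: the names `sw_score` and `z_score`, the flat match/mismatch scores in place of a substitution matrix, the linear gap penalty, and the number of shuffles.

```python
import random
import statistics

def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            sub = match if ca == cb else mismatch
            value = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(value)
            best = max(best, value)
        prev = cur
    return best

def z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the real score against scores for shuffled copies of b."""
    rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
    real = sw_score(a, b)
    letters = list(b)
    shuffled = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)           # preserves composition, destroys order
        shuffled.append(sw_score(a, "".join(letters)))
    mean = statistics.fmean(shuffled)
    sd = statistics.pstdev(shuffled)
    # With zero spread the Z-score is undefined; 0.0 is returned here by choice.
    return (real - mean) / sd if sd > 0 else 0.0

protein = "MKTAYIAKQRQISFVKSHFSRQ"
z = z_score(protein, protein, n_shuffles=50, seed=1)
```

The cost is visible in the loop: one Z-score needs `n_shuffles` extra alignments per sequence pair, which is why the abstract calls the approach compute intensive.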
34. Hulsen T, de Vlieg J, Groenen PMA. PhyloPat: phylogenetic pattern analysis of eukaryotic genes. BMC Bioinformatics 2006; 7:398. [PMID: 16948844 PMCID: PMC1570148 DOI: 10.1186/1471-2105-7-398] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 06/08/2006] [Accepted: 09/01/2006] [Indexed: 02/08/2023]
Abstract
BACKGROUND: Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic pattern analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not to gene databases. Here we present a tool named PhyloPat which allows the complete Ensembl gene database to be queried using phylogenetic patterns.
DESCRIPTION: PhyloPat is an easy-to-use webserver, which can be used to query the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even single species. We found in total 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using every one of the orthologies provided by Ensembl. PhyloPat provides the possibility of querying with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, creating easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included.
CONCLUSION: PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate pipeline of Ensembl, the obtained phylogenetic lineages are reliable. The completeness and reliability of these phylogenetic lineages will further increase with the addition of newly found orthologous relationships within each new Ensembl release.
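The two steps named in the description, single linkage clustering of orthologous pairs into lineages and querying binary patterns with regular expressions, can be sketched as below. This is a toy illustration, not PhyloPat's code: the gene IDs, the four-species order, and the helper names `single_linkage` and `pattern` are hypothetical. Single linkage here means that any two genes connected by a chain of orthology pairs end up in the same lineage, which a union-find structure computes directly.

```python
import re
from collections import defaultdict

def single_linkage(pairs):
    """Group genes into lineages: connected components of the orthology graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    return list(clusters.values())

def pattern(cluster, gene_species, species_order):
    """Binary presence/absence string, one character per species."""
    present = {gene_species[g] for g in cluster}
    return "".join("1" if s in present else "0" for s in species_order)

# Hypothetical toy data: gene ID -> species, plus pairwise orthologies.
species_order = ["human", "mouse", "chicken", "zebrafish"]
gene_species = {
    "hs1": "human", "mm1": "mouse", "gg1": "chicken",
    "hs2": "human", "mm2": "mouse",
    "dr3": "zebrafish", "gg3": "chicken",
}
orthologs = [("hs1", "mm1"), ("mm1", "gg1"), ("hs2", "mm2"), ("dr3", "gg3")]

lineages = single_linkage(orthologs)
patterns = sorted(pattern(c, gene_species, species_order) for c in lineages)

# Regular expression query: present in human and mouse, absent in zebrafish,
# chicken unconstrained.
hits = [p for p in patterns if re.fullmatch(r"11[01]0", p)]
```

A known trade-off of single linkage is chaining: one spurious orthology pair is enough to merge two otherwise unrelated lineages.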
35. Hulsen T, Huynen MA, de Vlieg J, Groenen PMA. Benchmarking ortholog identification methods using functional genomics data. Genome Biol 2006; 7:R31. [PMID: 16613613 PMCID: PMC1557999 DOI: 10.1186/gb-2006-7-4-r31] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Received: 07/21/2005] [Revised: 12/06/2005] [Accepted: 03/14/2006] [Indexed: 02/08/2023]
Abstract
BACKGROUND: The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to qualify ortholog identification methods.
RESULTS: To measure the similarity in function of proteins from different species we used functional genomics data, such as expression data and protein interaction data. We tested several of the most popular ortholog identification methods. In general, we observed a sensitivity/selectivity trade-off: the functional similarity scores per orthologous pair of sequences become higher when the number of proteins included in the ortholog groups decreases.
CONCLUSION: By combining the sensitivity and the selectivity into an overall score, we show that the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins.
36. Oliveira L, Hulsen T, Lutje Hulsik D, Paiva ACM, Vriend G. Heavier-than-air flying machines are impossible. FEBS Lett 2004; 564:269-73. [PMID: 15111108 DOI: 10.1016/s0014-5793(04)00320-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Received: 12/17/2003] [Accepted: 02/23/2004] [Indexed: 02/08/2023]
Abstract
Many G protein-coupled receptor (GPCR) models have been built over the years. The release of the structure of bovine rhodopsin in August 2000 enabled us to analyze models built before that period to learn more about the models we build today. We conclude that the GPCR modelling field is riddled with 'common knowledge' similar to Lord Kelvin's remark in 1895 that "heavier-than-air flying machines are impossible", and we summarize what we think are the (im)possibilities of modelling GPCRs using the coordinates of bovine rhodopsin as a template. Associated WWW pages: www.gpcr.org/articles/2003_mod