1
A novel thin plate spline methodology to model tissue surfaces and quantify tumor cell invasion in organ-on-chip models. SLAS DISCOVERY : ADVANCING LIFE SCIENCES R & D 2024:100163. [PMID: 38796111 DOI: 10.1016/j.slasd.2024.100163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 05/21/2024] [Accepted: 05/22/2024] [Indexed: 05/28/2024]
Abstract
Organ-on-chip (OOC) models can be useful tools for cancer drug discovery. Advances in OOC technology have led to the development of more complex assays, yet analysis of these systems does not always account for these advancements, resulting in technical challenges. A challenging task in the analysis of these two-channel microfluidic models is to define the boundary between the channels so objects moving within and between channels can be quantified. We propose a novel imaging-based application of a thin plate spline method - a generalized cubic spline that can be used to model coordinate transformations - to model a tissue boundary and define compartments for quantification of invaded objects, representing the early steps in cancer metastasis. To evaluate its performance, we applied our analytical approach to an adapted OOC developed by Emulate, Inc., utilizing a two-channel system with endothelial cells in the bottom channel and colorectal cancer (CRC) patient-derived organoids (PDOs) in the top channel. Initial application and visualization of this method revealed boundary variations due to microscope stage tilt and ridge and valley-like contours in the endothelial tissue surface. The method was functionalized into a reproducible analytical process and web tool - the Chip Invasion and Contour Analysis (ChICA) - to model the endothelial surface and quantify invading tumor cells across multiple chips. To illustrate applicability of the analytical method, we applied the tool to CRC organoid-chips seeded with two different endothelial cell types and measured distinct variations in endothelial surfaces and tumor cell invasion dynamics. Since ChICA utilizes only positional data output from imaging software, the method is applicable to and agnostic of the imaging tool and image analysis system used. 
The novel thin plate spline method developed in ChICA can account for variation introduced in OOC manufacturing or during the experimental workflow, can quickly and accurately measure tumor cell invasion, and can be used to explore biological mechanisms in drug discovery.
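The surface-fitting idea described in this abstract can be illustrated with a minimal sketch. It is not the ChICA implementation: it assumes the standard two-dimensional thin plate spline formulation (radial kernel r² log r plus an affine term), hypothetical control points standing in for endothelial object positions, a hypothetical `smoothing` parameter, and the convention that an object counts as invaded when its z coordinate lies below the fitted surface. All function names are illustrative.

```python
import numpy as np

def _tps_kernel(r):
    # Thin plate spline radial basis: phi(r) = r^2 * log(r), with phi(0) = 0.
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

def fit_tps(xy, z, smoothing=0.0):
    """Fit z = f(x, y) through control points; returns (controls, weights, affine)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(xy)
    r = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    K = _tps_kernel(r) + smoothing * np.eye(n)
    P = np.hstack([np.ones((n, 1)), xy])  # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.concatenate([z, np.zeros(3)])
    coef = np.linalg.solve(A, b)
    return xy, coef[:n], coef[n:]

def eval_tps(model, query_xy):
    """Evaluate the fitted surface height at each (x, y) query point."""
    ctrl, w, a = model
    q = np.asarray(query_xy, dtype=float)
    r = np.linalg.norm(q[:, None, :] - ctrl[None, :, :], axis=-1)
    return _tps_kernel(r) @ w + a[0] + q @ a[1:]

def count_invaded(model, objects_xyz):
    """Count objects whose z lies below the surface (assumed invasion direction)."""
    obj = np.asarray(objects_xyz, dtype=float)
    surface_z = eval_tps(model, obj[:, :2])
    return int(np.sum(obj[:, 2] < surface_z))

# Hypothetical control points on a tilted plane, mimicking stage tilt.
grid = np.array([(x, y) for x in range(4) for y in range(4)], dtype=float)
tilted = 10.0 + 0.5 * grid[:, 0] - 0.2 * grid[:, 1]
model = fit_tps(grid, tilted)

# Hypothetical tumor-cell positions (x, y, z).
objects = [(1.5, 1.5, 9.0), (1.5, 1.5, 12.0), (2.5, 0.5, 11.0)]
n_invaded = count_invaded(model, objects)
print(n_invaded)
```

Because the spline includes an affine term, a purely tilted boundary is reproduced exactly, so a fixed z threshold is not needed to separate the two compartments in this toy case.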
2
A novel thin plate spline methodology to model tissue surfaces and quantify tumor cell invasion in organ-on-chip models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.20.567272. [PMID: 38045424 PMCID: PMC10690199 DOI: 10.1101/2023.11.20.567272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Organ-on-chip (OOC) models can be useful tools for cancer drug discovery. Advances in OOC technology have led to the development of more complex assays, yet analysis of these systems does not always account for these advancements, resulting in technical challenges. A challenging task in the analysis of these two-channel microfluidic models is to define the boundary between the channels so objects moving within and between channels can be quantified. We propose a novel imaging-based application of a thin plate spline method - a generalized cubic spline that can be used to model coordinate transformations - to model a tissue boundary and define compartments for quantification of invaded objects, representing the early steps in cancer metastasis. To evaluate its performance, we applied our analytical approach to an adapted OOC developed by Emulate, Inc., utilizing a two-channel system with endothelial cells in the bottom channel and colorectal cancer (CRC) patient-derived organoids (PDOs) in the top channel. Initial application and visualization of this method revealed boundary variations due to microscope stage tilt and ridge and valley-like contours in the endothelial tissue surface. The method was functionalized into a reproducible analytical process and web tool - the Chip Invasion and Contour Analysis (ChICA) - to model the endothelial surface and quantify invading tumor cells across multiple chips. To illustrate applicability of the analytical method, we applied the tool to CRC organoid-chips seeded with two different endothelial cell types and measured distinct variations in endothelial surfaces and tumor cell invasion dynamics. Since ChICA utilizes only positional data output from imaging software, the method is applicable to and agnostic of the imaging tool and image analysis system used. 
The novel thin plate spline method developed in ChICA can account for variation introduced in OOC manufacturing or during the experimental workflow, can quickly and accurately measure tumor cell invasion, and can be used to explore biological mechanisms in drug discovery.
3
Immune biomarkers associated with COVID-19 disease severity in an urban, hospitalized population. Pract Lab Med 2023; 36:e00323. [PMID: 37649544 PMCID: PMC10462676 DOI: 10.1016/j.plabm.2023.e00323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 04/17/2023] [Accepted: 06/30/2023] [Indexed: 09/01/2023]
Abstract
Objectives: We sought to identify immune biomarkers associated with severe Coronavirus disease 2019 (COVID-19) in patients admitted to a large urban hospital during the early phase of the SARS-CoV-2 pandemic. Design: The study population consisted of SARS-CoV-2 positive subjects admitted for COVID-19 (n = 58) or controls (n = 14) at the Los Angeles County University of Southern California Medical Center between April 2020 and December 2020. Immunologic markers including chemokines/cytokines (IL-6, IL-8, IL-10, IP-10, MCP-1, TNF-α) and serologic markers against SARS-CoV-2 antigens (including spike subunits S1 and S2, receptor binding domain, and nucleocapsid) were assessed in serum collected on the day of admission using bead-based multiplex immunoassay panels. Results: We observed that body mass index (BMI) and SARS-CoV-2 antibodies were significantly elevated in patients with the highest COVID-19 disease severity. IP-10 was significantly elevated in COVID-19 patients and was associated with increased SARS-CoV-2 antibodies. Interactions among all available variables on COVID-19 disease severity were explored using a linear support vector machine model, which supported the importance of BMI and SARS-CoV-2 antibodies. Conclusions: Our results confirm the known adverse association of BMI with COVID-19 severity and suggest that IP-10 and SARS-CoV-2 antibodies could be useful to identify patients most likely to experience the most severe forms of the disease.
4
Adaptation of Imaging Mass Cytometry to Explore the Single Cell Alloimmune Landscape of Liver Transplant Rejection. Front Immunol 2022; 13:831103. [PMID: 35432320 PMCID: PMC9009043 DOI: 10.3389/fimmu.2022.831103] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/08/2021] [Accepted: 03/08/2022] [Indexed: 12/14/2022]
Abstract
Rejection continues to be an important cause of graft loss in solid organ transplantation, but deep exploration of intragraft alloimmunity has been limited by the scarcity of clinical biopsy specimens. Emerging single cell immunoprofiling technologies have shown promise in discerning mechanisms of autoimmunity and cancer immunobiology. Within these applications, Imaging Mass Cytometry (IMC) has been shown to enable highly multiplexed, single cell analysis of immune phenotypes within fixed tissue specimens. In this study, an IMC panel of 10 validated markers was developed to explore the feasibility of IMC in characterizing the immune landscape of chronic rejection (CR) in clinical tissue samples obtained from liver transplant recipients. IMC staining was highly specific and comparable to traditional immunohistochemistry. A single cell segmentation analysis pipeline was developed that enabled detailed visualization and quantification of 109,245 discrete cells, including 30,646 immune cells. Dimensionality reduction identified 11 unique immune subpopulations in CR specimens. Most immune subpopulations were increased and spatially related in CR, including two populations of CD45+/CD3+/CD8+ cytotoxic T-cells and a discrete CD68+ macrophage population, which were not observed in liver with no rejection (NR). Modeling via principal component analysis and logistic regression revealed that single cell data can be utilized to construct statistical models with high consistency (Wilcoxon Rank Sum test, p=0.000036). This study highlights the power of IMC to investigate the alloimmune microenvironment at a single cell resolution during clinical rejection episodes. Further validation of IMC has the potential to detect new biomarkers, identify therapeutic targets, and generate patient-specific predictive models of clinical outcomes in solid organ transplantation.
5
Imaging-Based Machine Learning Analysis of Patient-Derived Tumor Organoid Drug Response. Front Oncol 2022; 11:771173. [PMID: 34993134 PMCID: PMC8724556 DOI: 10.3389/fonc.2021.771173] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/07/2021] [Accepted: 12/02/2021] [Indexed: 12/12/2022]
Abstract
Three-quarters of compounds that enter clinical trials fail to make it to market due to safety or efficacy concerns. This statistic strongly suggests a need for better screening methods that result in improved translatability of compounds during the preclinical testing period. Patient-derived organoids have been touted as a promising 3D preclinical model system to impact the drug discovery pipeline, particularly in oncology. However, assessing drug efficacy in such models poses its own set of challenges, and traditional cell viability readouts fail to leverage some of the advantages that the organoid systems provide. Consequently, phenotypically evaluating complex 3D cell culture models remains difficult due to intra- and inter-patient organoid size differences, cellular heterogeneities, and temporal response dynamics. Here, we present an image-based high-content assay that provides object level information on 3D patient-derived tumor organoids without the need for vital dyes. Leveraging computer vision, we segment and define organoids as independent regions of interest and obtain morphometric and textural information per organoid. By acquiring brightfield images at different timepoints in a robust, non-destructive manner, we can track the dynamic response of individual organoids to various drugs. Furthermore, to simplify the analysis of the resulting large, complex data files, we developed a web-based data visualization tool, the Organoizer, that is available for public use. Our work demonstrates the feasibility and utility of using imaging, computer vision and machine learning to determine the vital status of individual patient-derived organoids without relying upon vital dyes, thus taking advantage of the characteristics offered by this preclinical model system.
6
Access to RNA-sequencing data from 1,173 plant species: The 1000 Plant transcriptomes initiative (1KP). Gigascience 2019; 8:giz126. [PMID: 31644802 PMCID: PMC6808545 DOI: 10.1093/gigascience/giz126] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 06/27/2019] [Revised: 08/08/2019] [Accepted: 09/28/2019] [Indexed: 11/12/2022]
Abstract
BACKGROUND: The 1000 Plant transcriptomes initiative (1KP) explored genetic diversity by sequencing RNA from 1,342 samples representing 1,173 species of green plants (Viridiplantae). FINDINGS: This data release accompanies the initiative's final/capstone publication on a set of 3 analyses inferring species trees, whole genome duplications, and gene family expansions. These and previous analyses are based on de novo transcriptome assemblies and related gene predictions. Here, we assess their data and assembly qualities and explain how we detected potential contaminations. CONCLUSIONS: These data will be useful to plant and/or evolutionary scientists with interests in particular gene families, either across the green plant tree of life or in more focused lineages.
7
Monitoring dynamic cytotoxic chemotherapy response in castration-resistant prostate cancer using plasma cell-free DNA (cfDNA). BMC Res Notes 2019; 12:275. [PMID: 31092276 PMCID: PMC6521434 DOI: 10.1186/s13104-019-4312-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/23/2018] [Accepted: 05/08/2019] [Indexed: 02/08/2023]
Abstract
Objective: Cell-free DNA (cfDNA) is an attractive cancer biomarker, as it is thought to reflect a component of the underlying genetic makeup of the tumor and is readily accessible in serial fashion. Because chemotherapy regimens are expected to act rapidly on cancer and cfDNA is cleared from the blood within minutes, we hypothesized that cfDNA would reflect immediate effects of treatment. Here, we developed a method for monitoring long cfDNA fragments, and report dynamic changes in response to cytotoxic chemotherapy. Results: Peripheral blood was obtained from 15 patients with metastatic castration-resistant prostate cancer (CRPC) immediately before and after cytotoxic chemotherapy infusion. cfDNA was extracted and quantified for long interspersed nuclear elements (LINE1; 297 bp) using qPCR. Targeted deep sequencing was performed to quantify the frequency of mutations in exon 8 of the androgen receptor (AR), a mutational hotspot region in CRPC. Single nucleotide mutations in AR exon 8 were found in 6 subjects (6/15 = 40%). Analytical variability was minimized by pooling independent PCR reactions for each library. In 5 patients, tumor-derived long cfDNA levels were found to change immediately after infusion. Detailed analysis of one subject suggests that cytotoxic chemotherapy can produce rapidly observable effects on cfDNA.
8
Abstract 3934: Bladder cancer tumor heterogeneity: development of a system-level mutation assay. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-3934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Survival rates for patients with muscle-invasive bladder cancer have not improved in the past 20 years, and new therapies are imperative. Intratumor heterogeneity can complicate molecular profiling attempts to optimize therapy for cancers harboring several actionable tumor subclones. To develop personalized treatment strategies, there is a need for assays to measure intratumor heterogeneity in bladder cancers.
We conducted a pilot study of two muscle invasive high-grade transitional cell carcinoma cases. We used a comprehensive cancer panel (Thermo Fisher) covering >400 cancer genes to analyze distinct tumor loci and matched normal tissues. Based on the identified somatic mutations, we designed a bladder-specific panel to (1) validate our results with increased coverage, and (2) analyze liquid biopsy samples.
Using the comprehensive cancer panel, we sequenced 6 tumor loci to an average sequencing depth of approximately 100x. We detected intratumor heterogeneity in both patients: By applying a combination of frequency-based (minor allele frequency >10%) and probabilistic (probability of difference between observed frequencies due to sampling) filters, we identified 44 credible somatic SNVs, including mutations that were not shared among all three loci. We used these SNVs to design a custom amplicon panel covering 42 SNVs across 38 genes that is suitable for highly fragmented DNA. The custom panel was used to validate the SNVs in the same tumor regions and in liquid biopsy samples from plasma and urine (approximate coverage 6,000x). In both cases, we identified private mutations reported in The Cancer Genome Atlas Urothelial Bladder Carcinoma (TCGA-BLCA) data collection, reflecting tumor evolution. Liquid biopsy samples from urine revealed all trunk mutations but only 1 out of 5 private mutations.
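The two-stage filtering described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the statistical test, so a one-sided Fisher exact (hypergeometric) test between a tumor locus and its matched normal stands in for the "probabilistic" filter, and the `alpha` threshold, field names, and read counts are all hypothetical.

```python
from math import comb

def sampling_p_value(alt_t, depth_t, alt_n, depth_n):
    """One-sided hypergeometric tail: probability of seeing at least alt_t
    variant reads in the tumor if tumor and normal share one allele frequency."""
    total_alt = alt_t + alt_n
    total = depth_t + depth_n
    upper = min(total_alt, depth_t)
    tail = sum(
        comb(total_alt, k) * comb(total - total_alt, depth_t - k)
        for k in range(alt_t, upper + 1)
    )
    return tail / comb(total, depth_t)

def credible_snvs(calls, min_af=0.10, alpha=0.01):
    """Keep calls passing the frequency filter (allele fraction > min_af) and
    the sampling filter (p < alpha; alpha is an assumed threshold)."""
    kept = []
    for c in calls:
        af = c["tumor_alt"] / c["tumor_depth"]
        if af <= min_af:
            continue
        p = sampling_p_value(
            c["tumor_alt"], c["tumor_depth"], c["normal_alt"], c["normal_depth"]
        )
        if p < alpha:
            kept.append(c["id"])
    return kept

# Hypothetical read counts at roughly 100x depth; not data from the study.
calls = [
    {"id": "A", "tumor_alt": 20, "tumor_depth": 100, "normal_alt": 0, "normal_depth": 100},
    {"id": "B", "tumor_alt": 5, "tumor_depth": 100, "normal_alt": 0, "normal_depth": 100},
    {"id": "C", "tumor_alt": 12, "tumor_depth": 100, "normal_alt": 10, "normal_depth": 100},
]
kept = credible_snvs(calls)
print(kept)
```

In this toy input, call B fails the frequency filter and call C fails the sampling filter because its tumor and normal frequencies are too close to distinguish at this depth.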
We conclude that tumor evolution can affect distinct loci within bladder tumors, which may not be fully represented in liquid biopsy samples. These results suggest the need for analyzing multiple tumor regions to identify all actionable driver mutations. In the future, we plan to apply our assay to additional foci and patients in order to identify optimal bladder tumor sampling strategies.
Citation Format: Katherin Patsch, Naim Matasci, Anjana Soundararajan, John Nicoll, Jonathan Katz, Antonio Sanchez, Erika Feierstein, Christina Van Loy, Zhao Xu, David B. Agus, Mitchell E. Gross, Daniel Ruderman. Bladder cancer tumor heterogeneity: development of a system-level mutation assay [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3934. doi:10.1158/1538-7445.AM2017-3934
9
Abstract 3731: A novel multidimensional cfDNA assay for real time analysis of chemotherapy response. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-3731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Due to its non-invasive nature and transient lifespan (~30 mins), circulating cell free DNA (cfDNA) may provide novel insights into cancer evolution and response to therapy. Here, we have developed a multidimensional cfDNA assay to measure patients’ responses to chemotherapy on a time scale that would be impossible using traditional biopsy methodologies.
We extracted cfDNA from plasma collected immediately before and after infusion of cytotoxic chemotherapy, and 24 hours after treatment. We developed assays to (1) quantify cfDNA via qPCR of a 288 base pair amplicon of Long Interspersed Nuclear Elements (LINE1), (2) determine the DNA Integrity Index based on additional analysis of 115bp and 516bp LINE1 amplicons, and (3) analyze androgen receptor exon 8 via targeted deep sequencing.
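As a rough illustration of assay (2), the DNA Integrity Index is commonly computed as the ratio of long-amplicon to short-amplicon quantity. The sketch below assumes that definition (516 bp over 115 bp LINE1 signal); the abstract does not state the exact formula, and the quantities shown are hypothetical.

```python
def dna_integrity_index(long_amplicon_qty, short_amplicon_qty):
    """Ratio of long-amplicon to short-amplicon cfDNA quantity (same units)."""
    if short_amplicon_qty <= 0:
        raise ValueError("short amplicon quantity must be positive")
    return long_amplicon_qty / short_amplicon_qty

# Hypothetical qPCR quantities (pg/mL) for one patient; not data from the study.
timepoints = {
    "pre_infusion": {"line1_516bp": 30.0, "line1_115bp": 120.0},
    "post_infusion": {"line1_516bp": 90.0, "line1_115bp": 150.0},
}

dii = {
    label: dna_integrity_index(q["line1_516bp"], q["line1_115bp"])
    for label, q in timepoints.items()
}
dii_change = dii["post_infusion"] - dii["pre_infusion"]
print(dii, dii_change)
```

Under this definition, a rising index after infusion would indicate a larger share of long fragments in circulation.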
We analyzed cfDNA extracted from 17 patients with metastatic castration-resistant prostate cancer through sequential application of these 3 assays. Median PSA levels were 69 (0-2223) ng/dl, Hgb 12 (8.2-15) g/L, Alk Phos 117 (35-879) IU/mL. Deep sequencing of AR exon 8 revealed that 6 of the 17 patients presented with 1-2 SNVs (H874Y, T877A, D890H or D879Y) at ≥1% frequency. Considering LINE1 quantification alone, the average change pre/post chemotherapy was 38.78 pg/mL (0.5-292.0 pg/mL). Responses varied: 8 patients showed elevated LINE1 levels post-treatment compared with baseline, and 5 showed depressed levels. Importantly, therapy did not affect tumor subclones equally, as demonstrated by sequencing and quantitation of mutant AR exon 8 species. Additional data on changes in DNA Integrity Index will be presented.
We conclude that cytotoxic chemotherapy produces immediately observable effects on cfDNA. The effect of docetaxel chemotherapy on distinct clones within the tumor can be measured within hours. Now that a short-time dynamics response has been demonstrated, follow up studies will be performed involving more patients to determine the prognostic value.
Citation Format: Pavan P. Shah, Katherin Patsch, Naim Matasci, Anjana Soudararajan, Patricia Diaz, David B. Agus, Daniel Ruderman, Mitchell E. Gross. A novel multidimensional cfDNA assay for real time analysis of chemotherapy response [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3731. doi:10.1158/1538-7445.AM2017-3731
10
Data mining with iPlant: a meeting report from the 2013 GARNet workshop, Data mining with iPlant. JOURNAL OF EXPERIMENTAL BOTANY 2015; 66:1-6. [PMID: 25326627 DOI: 10.1093/jxb/eru402] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
High-throughput sequencing technologies have rapidly moved from large international sequencing centres to individual laboratory benchtops. These changes have driven the 'data deluge' of modern biology. Submissions of nucleotide sequences to GenBank, for example, have doubled in size every year since 1982, and individual data sets now frequently reach terabytes in size. While 'big data' present exciting opportunities for scientific discovery, data analysis skills are not part of the typical wet bench biologist's experience. Knowing what to do with data, how to visualize and analyse them, make predictions, and test hypotheses are important barriers to success. Many researchers also lack adequate capacity to store and share these data, creating further bottlenecks to effective collaboration between groups and institutes. The US National Science Foundation-funded iPlant Collaborative was established in 2008 to form part of the data collection and analysis pipeline and help alleviate the bottlenecks associated with the big data challenge in plant science. Leveraging the power of high-performance computing facilities, iPlant provides free-to-use cyberinfrastructure to enable terabytes of data storage, improve analysis, and facilitate collaborations. To help train UK plant science researchers to use the iPlant platform and understand how it can be exploited to further research, GARNet organized a four-day Data mining with iPlant workshop at Warwick University in September 2013. This report provides an overview of the workshop, and highlights the power of the iPlant environment for lowering barriers to using complex bioinformatics resources, furthering discoveries in plant science research and providing a platform for education and outreach programmes.
11
Abstract
Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly resolved. Inferring deep phylogenies with bouts of rapid diversification can be problematic; however, genome-scale data should significantly increase the number of informative characters for analyses. Recent phylogenomic reconstructions focused on the major divergences of plants have resulted in promising but inconsistent results. One limitation is sparse taxon sampling, likely resulting from the difficulty and cost of data generation. To address this limitation, transcriptome data for 92 streptophyte taxa were generated and analyzed along with 11 published plant genome sequences. Phylogenetic reconstructions were conducted using up to 852 nuclear genes and 1,701,170 aligned sites. Sixty-nine analyses were performed to test the robustness of phylogenetic inferences to permutations of the data matrix or to phylogenetic method, including supermatrix, supertree, and coalescent-based approaches, maximum-likelihood and Bayesian methods, partitioned and unpartitioned analyses, and amino acid versus DNA alignments. Among other results, we find robust support for a sister-group relationship between land plants and one group of streptophyte green algae, the Zygnematophyceae. Strong and robust support for a clade comprising liverworts and mosses is inconsistent with a widely accepted view of early land plant evolution, and suggests that phylogenetic hypotheses used to understand the evolution of fundamental plant traits should be reevaluated.
12
Abstract
The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees. Users can develop computational pipelines to analyse these data, in conjunction with data of their own that they can upload. Computationally estimated protein-protein interactions and biochemical pathways can be visualized at another site. Finally, we comment on our future plans and how they fit within this scalable system for the dissemination, visualization, and analysis of large multi-species data sets.
13
Phylogenetic analysis with the iPlant discovery environment. CURRENT PROTOCOLS IN BIOINFORMATICS 2013; Chapter 6:6.13.1-6.13.13. [PMID: 23749754 DOI: 10.1002/0471250953.bi0613s42] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/05/2023]
Abstract
The iPlant Collaborative's Discovery Environment is a unified Web portal to many bioinformatics applications and analytical workflows, including various methods of phylogenetic analysis. This unit describes example protocols for phylogenetic analyses, starting at sequence retrieval from the GenBank sequence database, through to multiple sequence alignment inference and visualization of phylogenetic trees. Methods for extracting smaller sub-trees from very large phylogenies, and the comparative method of continuous ancestral character state reconstruction based on observed morphology of extant species related to their phylogenetic relationships, are also presented.
14
Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC Bioinformatics 2013; 14:158. [PMID: 23668630 PMCID: PMC3669619 DOI: 10.1186/1471-2105-14-158] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/18/2013] [Accepted: 04/30/2013] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. RESULTS: With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image.
CONCLUSIONS: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.
15
The taxonomic name resolution service: an online tool for automated standardization of plant names. BMC Bioinformatics 2013; 14:16. [PMID: 23324024 PMCID: PMC3554605 DOI: 10.1186/1471-2105-14-16] [Citation(s) in RCA: 214] [Impact Index Per Article: 19.5] [Received: 09/25/2012] [Accepted: 01/02/2013] [Indexed: 12/03/2022]
Abstract
BACKGROUND: The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this 'names problem' has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science. RESULTS: The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets. CONCLUSIONS: We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface.
Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
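The fuzzy-matching step mentioned in this abstract can be illustrated with a small sketch. It does not use the TNRS code or its web service: Python's standard `difflib` similarity ratio stands in for the service's matching algorithm, and the reference list, the `cutoff` value, and the `resolve_name` helper are hypothetical.

```python
from difflib import get_close_matches

def resolve_name(submitted, reference_names, cutoff=0.85):
    """Return the closest reference name above the similarity cutoff, else None."""
    matches = get_close_matches(submitted, reference_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical reference taxonomy; a real run would use a curated name list.
reference = ["Quercus alba", "Quercus rubra", "Acer rubrum"]

resolved = {
    name: resolve_name(name, reference)
    for name in ["Quercus albaa", "Acer rubrun", "Zzzz"]
}
print(resolved)
```

Names with no sufficiently similar candidate map to `None`, which leaves room for the user-supervised review the abstract describes.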
16
The iPlant Collaborative: Cyberinfrastructure for Plant Biology. FRONTIERS IN PLANT SCIENCE 2011; 2:34. [PMID: 22645531 PMCID: PMC3355756 DOI: 10.3389/fpls.2011.00034] [Citation(s) in RCA: 255] [Impact Index Per Article: 19.6] [Received: 05/21/2011] [Accepted: 07/11/2011] [Indexed: 05/17/2023]
Abstract
The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure enabling plant scientists to perform complex analyses on large datasets without the need to master the command-line or high-performance computational services.