1. Augusta: From RNA-Seq to gene regulatory networks and Boolean models. Comput Struct Biotechnol J 2024; 23:783-790. PMID: 38312198; PMCID: PMC10837063; DOI: 10.1016/j.csbj.2024.01.013.
Abstract
Computational models of gene regulation help to understand regulatory mechanisms and are extensively used in a wide range of areas, e.g., biotechnology or medicine, with significant benefits. Unfortunately, there are only a few computational gene regulatory models of whole genomes allowing static and dynamic analysis, due to the lack of sophisticated tools for their reconstruction. Here, we describe Augusta, an open-source Python package for Gene Regulatory Network (GRN) and Boolean Network (BN) inference from high-throughput gene expression data. Augusta can reconstruct genome-wide models suitable for static and dynamic analyses. Augusta uses a unique approach in which a first estimate of the GRN inferred from expression data is further refined by predicting transcription factor binding motifs in the promoters of regulated genes and by incorporating verified interactions obtained from databases. Moreover, the refined GRN is transformed into a draft BN by searching the curated model database and setting logical rules for the incoming edges of target genes; the draft can be edited manually, as the model is provided in the SBML file format. The approach is applicable even if information about the organism under study is not available in the databases, which is typically the case for non-model organisms, including most microbes. Augusta can be operated from the command line and is thus easy to use for automated prediction of models for various genomes. The Augusta package is freely available at github.com/JanaMus/Augusta. Documentation and tutorials are available at augusta.readthedocs.io.
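As a generic illustration of the first inference step described above (scoring candidate regulator-target edges from a gene-by-sample expression matrix), the following minimal Python sketch uses Spearman correlation with a fixed cutoff. It is not Augusta's algorithm, and the gene names, regulator set and 0.8 threshold are invented for illustration.

# Conceptual sketch only: score candidate regulator-target edges from an
# expression matrix with Spearman correlation. This is NOT Augusta's
# algorithm; gene names and the 0.8 threshold are illustrative assumptions.
import numpy as np
from scipy.stats import spearmanr

genes = ["tf_A", "gene_B", "gene_C", "gene_D"]        # hypothetical genes
expr = np.random.rand(len(genes), 20)                 # genes x samples (toy data)

rho, _ = spearmanr(expr, axis=1)                      # pairwise correlations among rows
regulators = {"tf_A"}                                 # assumed known transcription factor
edges = []
for i, src in enumerate(genes):
    if src not in regulators:
        continue
    for j, tgt in enumerate(genes):
        if i != j and abs(rho[i, j]) >= 0.8:          # keep strongly co-expressed pairs
            edges.append((src, tgt, round(float(rho[i, j]), 2)))

print(edges)  # draft edge list that motif and database evidence would then refine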
2. md_harmonize: A Python Package for Atom-Level Harmonization of Public Metabolic Databases. Metabolites 2023; 13:1199. PMID: 38132881; PMCID: PMC10744849; DOI: 10.3390/metabo13121199.
Abstract
A major challenge to integrating public metabolic resources is the use of different nomenclatures by individual databases. This paper presents md_harmonize, an open-source Python package for harmonizing compounds and metabolic reactions across various metabolic databases. The md_harmonize package uses a neighborhood-specific graph coloring method to generate a unique identifier for each compound from atom identifiers based on the compound's chemical structure. The resulting harmonized compounds and reactions can be used for various downstream analyses, including the construction of atom-resolved metabolic networks and models for metabolic flux analysis. Parts of the md_harmonize package have been optimized using a variety of computational techniques to make certain NP-complete problems handled by the software tractable for these specific use cases. The software is available on GitHub and through the Python Package Index, with end-user documentation hosted on GitHub Pages.
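The "neighborhood-specific graph coloring" mentioned above belongs to the same family as iterative neighborhood hashing (Morgan/Weisfeiler-Lehman style). The sketch below shows that general idea on a toy molecular graph; it is not md_harmonize's implementation, and the three-atom example graph is an assumption made for illustration.

# Conceptual sketch of iterative neighborhood hashing, the general idea behind
# neighborhood-specific atom identifiers. NOT md_harmonize's implementation;
# the toy molecule is an assumption.
import hashlib

# toy graph for ethanol: atom index -> (element, neighbor indices)
atoms = {0: ("C", [1]), 1: ("C", [0, 2]), 2: ("O", [1])}

def atom_identifiers(atoms, rounds=2):
    colors = {i: elem for i, (elem, _) in atoms.items()}   # initial colors = elements
    for _ in range(rounds):
        new_colors = {}
        for i, (_, nbrs) in atoms.items():
            # combine own color with sorted neighbor colors, then hash
            signature = colors[i] + "|" + ",".join(sorted(colors[n] for n in nbrs))
            new_colors[i] = hashlib.sha1(signature.encode()).hexdigest()[:8]
        colors = new_colors
    return colors

print(atom_identifiers(atoms))   # stable identifiers usable to compare atoms across databases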
3. Synconn_build: A Python-based synthetic dataset generator for testing and validating control-oriented neural networks for building dynamics prediction. MethodsX 2023; 11:102464. PMID: 38023310; PMCID: PMC10630644; DOI: 10.1016/j.mex.2023.102464.
Abstract
Applying model-based predictive control in buildings requires a control-oriented model capable of learning how various control actions influence building dynamics, such as indoor air temperature and energy use. However, there is currently a shortage of empirical or synthetic datasets with the appropriate features, variability, quality and volume to properly benchmark these control-oriented models. Addressing this need, a flexible, open-source, Python-based tool, synconn_build, capable of generating synthetic building operation data using EnergyPlus as the main building energy simulation engine is introduced. The uniqueness of synconn_build lies in its capability to automate multiple aspects of the simulation process, guided by user inputs drawn from a text-based configuration file. It generates various kinds of unique random signals for control inputs, performs co-simulation to create unique occupancy schedules, and acquires weather data. Additionally, it simplifies the typically tedious and complex task of configuring EnergyPlus files with all user inputs. Unlike other synthetic datasets for building operations, synconn_build offers a user-friendly generator that selectively creates data based on user inputs, preventing overwhelming data overproduction. Instead of emulating the operational schedules of real buildings, synconn_build generates test signals with more frequent variation to cover a broader range of operating conditions.
• Synconn_build is an open-source tool designed to address the lack of datasets for benchmarking control-oriented building dynamics prediction models.
• The tool automates simulations, data acquisition, and EnergyPlus configuration, guided by user inputs.
• Synconn_build prevents data overproduction by selectively creating data, offering a user-friendly approach to dataset generation.
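One of the signal types described above, a control test input with frequent variation, can be pictured with the following package-independent sketch of a piecewise-constant random setpoint schedule. The temperature bounds, hold length and timestep are assumed values; this is not synconn_build's configuration format or API.

# Conceptual sketch of a piecewise-constant random setpoint signal of the kind
# used as a control test input. Bounds, hold length and timestep are assumed
# values; this is not synconn_build's API or configuration format.
import numpy as np

rng = np.random.default_rng(seed=0)
n_steps = 24 * 4                     # one day at a 15-minute timestep
hold = 8                             # hold each random setpoint for 2 hours
levels = rng.uniform(20.0, 26.0, size=int(np.ceil(n_steps / hold)))  # degC bounds assumed
setpoints = np.repeat(levels, hold)[:n_steps]

print(setpoints[:12])                # first three hours of the excitation signal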
4. PyAGH: a Python package to fast construct kinship matrices based on different levels of omic data. BMC Bioinformatics 2023; 24:153. PMID: 37072709; PMCID: PMC10111838; DOI: 10.1186/s12859-023-05280-6.
Abstract
BACKGROUND Construction of kinship matrices among individuals is an important step for both association studies and prediction studies based on different levels of omic data. Methods for constructing kinship matrices are becoming diverse, and different methods suit different scenarios. However, software that can comprehensively calculate kinship matrices for a variety of scenarios is still in urgent demand. RESULTS In this study, we developed an efficient and user-friendly Python module, PyAGH, that can accomplish (1) conventional additive kinship matrix construction based on pedigree, genotypes, or abundance data from the transcriptome or microbiome; (2) genomic kinship matrix construction in combined populations; (3) construction of kinship matrices for dominant and epistatic effects; (4) pedigree selection, tracing, detection and visualization; (5) visualization of clustering, heatmap and PCA analyses based on kinship matrices. The output from PyAGH can be easily integrated into other mainstream software depending on users' purposes. PyAGH integrates multiple methods for calculating the kinship matrix and has advantages in speed and data size over other software. PyAGH is developed in Python and C++ and can be easily installed with pip. Installation instructions and a manual are freely available from https://github.com/zhaow-01/PyAGH. CONCLUSION PyAGH is a fast and user-friendly Python package for calculating kinship matrices using pedigree, genotype, microbiome and transcriptome data, as well as for processing, analyzing and visualizing data and results. This package makes it easier to perform prediction and association studies based on different levels of omic data.
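For background on what an additive genomic kinship matrix is, the following numpy sketch computes a VanRaden-style genomic relationship matrix from a 0/1/2 genotype matrix. It illustrates the quantity PyAGH constructs rather than PyAGH's API, and the three-individual genotype matrix is toy data.

# Conceptual sketch of a VanRaden-style additive genomic relationship matrix
# G = ZZ' / (2 * sum(p_k * (1 - p_k))) from 0/1/2 genotypes. Illustrates the
# quantity, not PyAGH's API; the genotypes are toy data.
import numpy as np

M = np.array([[0, 1, 2, 1],           # individuals x markers, allele counts
              [1, 1, 2, 0],
              [2, 0, 1, 1]], dtype=float)

p = M.mean(axis=0) / 2.0               # allele frequency per marker
Z = M - 2.0 * p                        # center by expected allele count
denom = 2.0 * np.sum(p * (1.0 - p))
G = Z @ Z.T / denom                    # additive kinship among the 3 individuals

print(np.round(G, 3))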
5. NiftyPAD - Novel Python Package for Quantitative Analysis of Dynamic PET Data. Neuroinformatics 2023; 21:457-468. PMID: 36622500; PMCID: PMC10085912; DOI: 10.1007/s12021-022-09616-0.
Abstract
Current PET datasets are becoming larger, thereby increasing the demand for fast and reproducible processing pipelines. This paper presents NiftyPAD, a freely available, open-source, Python-based software package for versatile analyses of static, full or dual-time window dynamic brain PET data. The key novelties of NiftyPAD are the analysis of dual-time window scans with reference input processing, pharmacokinetic modelling with shortened PET acquisitions through the incorporation of arterial spin labelling (ASL)-derived relative perfusion measures, as well as optional PET data-based motion correction. Results obtained with NiftyPAD were compared with the well-established software packages PPET and QModeling for a range of kinetic models. Clinical data from eight subjects scanned with four different amyloid tracers were used to validate the computational performance. NiftyPAD achieved [Formula: see text] correlation with PPET, with absolute difference [Formula: see text] for the linearised Logan and MRTM2 methods, and [Formula: see text] correlation with QModeling, with absolute difference [Formula: see text] for the basis-function-based SRTM and SRTM2 models. For the recently published SRTM ASL method, which is unavailable in existing software packages, high correlations with negligible bias were observed against the full-scan SRTM in terms of non-displaceable binding potential ([Formula: see text]), indicating reliable model implementation in NiftyPAD. Together, these findings illustrate that NiftyPAD is versatile, flexible, and produces results comparable with established software packages for the quantification of dynamic PET data. It is freely available (https://github.com/AMYPAD/NiftyPAD) and allows for multi-platform usage. The modular setup makes adding new functionalities easy, and the package is lightweight with minimal dependencies, making it easy to use and integrate into existing processing pipelines.
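As background to the linearised Logan analysis mentioned above, the following sketch shows the simplified (k2'-free) Logan reference plot, in which the slope of a late-time linear fit approximates the distribution volume ratio (DVR) and BPnd is approximately DVR - 1. This is a generic textbook formulation with synthetic time-activity curves, not NiftyPAD code, and the t* = 30 min linearity cutoff is an arbitrary assumption.

# Conceptual sketch of a simplified (k2'-free) Logan reference plot: regress
# cumulative-integral ratios over late frames; slope ~ DVR, BPnd ~ DVR - 1.
# Generic textbook formulation with synthetic curves, not NiftyPAD code.
import numpy as np

t = np.linspace(1, 90, 30)                         # frame mid-times (min), toy grid
c_ref = np.exp(-t / 60.0) * t                      # synthetic reference-region TAC
c_tgt = 1.5 * np.exp(-t / 80.0) * t                # synthetic target-region TAC

int_ref = np.cumsum(c_ref * np.gradient(t))        # crude running integrals
int_tgt = np.cumsum(c_tgt * np.gradient(t))

x = int_ref / c_tgt                                # Logan x-axis
y = int_tgt / c_tgt                                # Logan y-axis
late = t >= 30                                     # assume linearity after t* = 30 min
slope, intercept = np.polyfit(x[late], y[late], 1)

print("DVR ~", round(slope, 3), " BPnd ~", round(slope - 1.0, 3))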
6. Linking Expression of Cell-Surface Receptors with Transcription Factors by Computational Analysis of Paired Single-Cell Proteomes and Transcriptomes. Methods Mol Biol 2023; 2660:149-169. PMID: 37191796; DOI: 10.1007/978-1-0716-3163-8_11.
Abstract
Complex signaling and transcriptional programs control the development and physiology of specialized cell types. Genetic perturbations in these programs cause human cancers to arise from a diverse set of specialized cell types and developmental states. Understanding these complex systems and their potential to drive cancer is critical for the development of immunotherapies and druggable targets. Pioneering single-cell multi-omics technologies couple the analysis of transcriptional states with the expression of cell-surface receptors. This chapter describes SPaRTAN (Single-cell Proteomic and RNA-based Transcription factor Activity Network), a computational framework that links transcription factors with cell-surface protein expression. SPaRTAN uses CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) data and cis-regulatory sites to model the effect of interactions between transcription factors and cell-surface receptors on gene expression. We demonstrate the SPaRTAN pipeline using CITE-seq data from peripheral blood mononuclear cells.
7. PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package. SN Comput Sci 2023; 4:353. PMID: 37128512; PMCID: PMC10132428; DOI: 10.1007/s42979-023-01687-3.
Abstract
Biomedical article extraction is the preliminary step for every biomedical application. Such applications help in finding gene, disease, chemical, drug and protein entities, and in finding relations between entities, such as gene-gene interactions, drug-disease interactions and chemical-protein relations; PubExN can be helpful for these types of biomedical applications. In most cases, domain experts carry out this extraction process on their own. Human involvement makes the process time-consuming, and there is a high probability that documents are missed during extraction. To remove these complications, a Python package is introduced that automates bulk extraction from the PubMed database. The extraction process covers all the citation information together with the associated abstract, and a batch approach is used to perform the bulk extraction. The motivation for developing PubExN was to provide flexibility in extracting biomedical article text data from NCBI's PubMed database. A PubMed record contains the article identifier (PubMed ID, PMID), the title of the article, the abstract, author information, and so on. This package will benefit many biomedical text mining research tasks, including biomedical named entity recognition, biomedical relation extraction, literature discovery, knowledge base creation, and various biomedical Natural Language Processing (NLP) tasks. In addition, it could be used in author name disambiguation and new drug discovery. The package saves time and effort in the extraction and normalization of PubMed articles.
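PubExN's own interface is not documented in this listing; as a generic illustration of batched PubMed retrieval, the following sketch uses Biopython's Entrez module, a different, swapped-in tool. The e-mail address, query string and batch size are placeholder assumptions.

# Generic illustration of batched PubMed retrieval using Biopython's Entrez
# module (a swapped-in tool, NOT PubExN itself). The e-mail address, query
# and batch size are placeholder assumptions.
from Bio import Entrez

Entrez.email = "your.name@example.org"                      # required by NCBI; placeholder
query = "gene regulatory network[Title/Abstract]"           # hypothetical search term

search = Entrez.read(Entrez.esearch(db="pubmed", term=query, retmax=200))
pmids = search["IdList"]

batch = 100                                                 # fetch in batches to respect server limits
for start in range(0, len(pmids), batch):
    handle = Entrez.efetch(db="pubmed", id=pmids[start:start + batch],
                           rettype="medline", retmode="text")
    print(handle.read()[:200])                              # citation plus abstract text per record
    handle.close()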
8. MLcps: machine learning cumulative performance score for classification problems. Gigascience 2022; 12:giad108. PMID: 38091508; PMCID: PMC10716825; DOI: 10.1093/gigascience/giad108.
Abstract
BACKGROUND Assessing the performance of machine learning (ML) models requires careful consideration of the evaluation metrics used. It is often necessary to utilize multiple metrics to gain a comprehensive understanding of a trained model's performance, as each metric focuses on a specific aspect. However, comparing the scores of these individual metrics for each model to determine the best-performing model can be time-consuming and susceptible to subjective user preferences, potentially introducing bias. RESULTS We propose the Machine Learning Cumulative Performance Score (MLcps), a novel evaluation metric for classification problems. MLcps integrates several precomputed evaluation metrics into a unified score, enabling a comprehensive assessment of the trained model's strengths and weaknesses. We tested MLcps on 4 publicly available datasets, and the results demonstrate that MLcps provides a holistic evaluation of the model's robustness, ensuring a thorough understanding of its overall performance. CONCLUSIONS By utilizing MLcps, researchers and practitioners no longer need to individually examine and compare multiple metrics to identify the best-performing models. Instead, they can rely on a single MLcps value to assess the overall performance of their ML models. This streamlined evaluation process saves valuable time and effort, enhancing the efficiency of model evaluation. MLcps is available as a Python package at https://pypi.org/project/MLcps/.
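The underlying idea of collapsing several precomputed metrics into one score can be sketched with scikit-learn metrics and a plain unweighted mean, as below; MLcps' actual choice of metrics, scaling and aggregation may differ, and the labels and predictions are toy data.

# Conceptual sketch of collapsing several classification metrics into one
# cumulative score via a plain unweighted mean. MLcps' actual aggregation and
# weighting may differ; labels and predictions below are toy data.
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, matthews_corrcoef

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "mcc": (matthews_corrcoef(y_true, y_pred) + 1) / 2,   # rescale MCC from [-1, 1] to [0, 1]
}
cumulative_score = sum(metrics.values()) / len(metrics)
print(metrics, "->", round(cumulative_score, 3))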
9. twoaxistracking - a Python package for simulating self-shading of two-axis tracking solar collectors. MethodsX 2022; 9:101876. PMID: 36311267; PMCID: PMC9597106; DOI: 10.1016/j.mex.2022.101876.
Abstract
Self-shading in fields of two-axis tracking collectors typically ranges from 1% to 6% of the annual incident irradiation. It is thus essential to account for shading in order to obtain accurate yield estimates and financing for such solar projects. This study presents the free and open-source Python package twoaxistracking for simulating self-shading in fields of two-axis tracking collectors. The package is freely available at: https://github.com/pvlib/twoaxistracking. The main steps of the method and the mathematical formulation are described, and a demonstration of how to use the package is presented. The shading calculation method excels over previous methods found in the literature in that it can:
• Handle arbitrary aperture geometries and distinguish between the total and active areas.
• Account for sloped ground and collectors with different heights within the same field.
• Reduce computation time by skipping calculations at high solar elevation angles.
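The geometric core of such a shading calculation can be sketched independently of the package: project a neighbouring collector's outline onto the aperture plane and divide the overlap area by the active area. The example below uses shapely directly; it is not the twoaxistracking API, and the circular geometry and collector offset are assumptions.

# Conceptual sketch of the geometric core of self-shading: the shaded fraction
# is the overlap between a neighbour's projected outline and the active
# aperture, divided by the active area. Uses shapely directly; this is not the
# twoaxistracking API, and the circular geometry and offset are assumptions.
from shapely.geometry import Point

active = Point(0.0, 0.0).buffer(1.0)                   # active aperture: unit-radius disc
neighbour_projection = Point(1.5, 0.0).buffer(1.0)     # neighbour outline projected onto the plane

shaded_fraction = active.intersection(neighbour_projection).area / active.area
print(round(shaded_fraction, 3))                       # 0 = unshaded, 1 = fully shaded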
10. Evolution of the Automatic Rhodopsin Modeling (ARM) Protocol. Top Curr Chem (Cham) 2022; 380:21. PMID: 35291019; PMCID: PMC8924150; DOI: 10.1007/s41061-022-00374-w.
Abstract
In recent years, photoactive proteins such as rhodopsins have become a common target for cutting-edge research in the field of optogenetics. Alongside wet-lab research, computational methods are also developing rapidly to provide the tools needed to analyze and rationalize experimental results and, above all, to drive the design of novel systems. The Automatic Rhodopsin Modeling (ARM) protocol is focused on providing exactly these computational tools for studying rhodopsins, whether natural or resulting from mutations. The code has evolved over the years to provide results that are reproducible by any user, and accurate and reliable enough to replicate experimental trends. Furthermore, the code is efficient in terms of the computing resources and time required, and scalable in terms of both the number of concurrent calculations and its features. In this review, we show how the code underlying ARM achieved each of these properties.
11. Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation. Entropy (Basel) 2021; 23:1368. PMID: 34682092; PMCID: PMC8534554; DOI: 10.3390/e23101368.
Abstract
Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface to evaluate the global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing the code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation for real-life and synthetic data.
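A short usage sketch of the scikit-learn-style interface described above follows. The module path skdim.id, the TwoNN and lPCA estimator names and the dimension_ attribute are recalled from the package documentation and should be verified against the current release.

# Usage sketch of the scikit-learn-style interface described in the abstract.
# The skdim.id module path, TwoNN/lPCA estimator names and the dimension_
# attribute are assumed from the package docs; verify against the current
# scikit-dimension documentation.
import numpy as np
import skdim

X = np.random.default_rng(0).normal(size=(1000, 10))    # toy data with 10 ambient dimensions

global_id = skdim.id.TwoNN().fit(X).dimension_          # global intrinsic dimension estimate
pca_id = skdim.id.lPCA().fit(X).dimension_              # a second estimator for comparison

print(round(global_id, 2), round(pca_id, 2))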
12. pyam: Analysis and visualisation of integrated assessment and macro-energy scenarios. Open Res Eur 2021; 1:74. PMID: 37645194; PMCID: PMC10446008; DOI: 10.12688/openreseurope.13633.2.
Abstract
The open-source Python package pyam provides a suite of features and methods for the analysis, validation and visualization of reference data and scenario results generated by integrated assessment models, macro-energy tools and other frameworks in the domain of energy transition, climate change mitigation and sustainable development. It bridges the gap between scenario processing and visualisation solutions that are "hard-wired" to specific modelling frameworks and generic data analysis or plotting packages. The package aims to facilitate reproducibility and reliability of scenario processing, validation and analysis by providing well-tested and documented methods for working with timeseries data in the context of climate policy and energy systems. It supports various data formats, including sub-annual resolution using continuous time representation and "representative timeslices". The pyam package can be useful for modelers generating scenario results using their own tools as well as researchers and analysts working with existing scenario ensembles such as those supporting the IPCC reports or produced in research projects. It is structured in a way that it can be applied irrespective of a user's domain expertise or level of Python knowledge, supporting experts as well as novice users. The code base is implemented following best practices of collaborative scientific-software development. This manuscript describes the design principles of the package and the types of data which can be handled. The usefulness of pyam is illustrated by highlighting several recent applications.
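A minimal sketch of the core pyam workflow, wrapping IAMC-formatted timeseries data in an IamDataFrame and then filtering and inspecting it, is shown below. The model and scenario names and the values are invented, and the exact call signatures should be checked against the pyam documentation.

# Minimal sketch of the core pyam workflow: wrap IAMC-formatted data in an
# IamDataFrame, then filter and inspect it. Model/scenario names and values are
# invented; check call signatures against the pyam documentation.
import pandas as pd
import pyam

data = pd.DataFrame([
    ["model_a", "scen_1", "World", "Primary Energy", "EJ/yr", 2020, 500.0],
    ["model_a", "scen_1", "World", "Primary Energy", "EJ/yr", 2030, 560.0],
    ["model_a", "scen_2", "World", "Primary Energy", "EJ/yr", 2030, 520.0],
], columns=["model", "scenario", "region", "variable", "unit", "year", "value"])

df = pyam.IamDataFrame(data)
subset = df.filter(scenario="scen_1", variable="Primary Energy")
print(subset.timeseries())            # wide-format view with years as columns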
13. mfapy: An open-source Python package for 13C-based metabolic flux analysis. Metab Eng Commun 2021; 13:e00177. PMID: 34354925; PMCID: PMC8322459; DOI: 10.1016/j.mec.2021.e00177.
Abstract
13C-based metabolic flux analysis (13C-MFA) is an essential tool for estimating intracellular metabolic flux levels in metabolic engineering and biology. In 13C-MFA, a metabolic flux distribution that explains the observed isotope labeling data is computationally estimated using a non-linear optimization method. Herein, we report the development of mfapy, an open-source Python package developed for greater flexibility and extensibility in 13C-MFA. mfapy requires users to write customized Python code describing each step in the data analysis procedures of isotope labeling experiments. The flexibility and extensibility provided by mfapy can support trial-and-error work in the routine estimation of metabolic flux distributions, experimental design by computer simulation of 13C-MFA experiments, and the development of new data analysis techniques for stable isotope labeling experiments. mfapy is available to the public from the GitHub repository (https://github.com/fumiomatsuda/mfapy).
• An open-source Python package, mfapy, is developed for 13C-MFA.
• mfapy enables users to write Python code for the data analysis procedures of 13C-MFA.
• mfapy has the flexibility and extensibility to support various data analysis procedures.
• Computer simulation of 13C-MFA experiments is supported for experimental design.
14. PySmash: Python package and individual executable program for representative substructure generation and application. Brief Bioinform 2021; 22:6168498. PMID: 33709154; DOI: 10.1093/bib/bbab017.
Abstract
BACKGROUND Substructure screening is widely applied to evaluate the molecular potency and ADMET properties of compounds in drug discovery pipelines, and it can also be used to interpret QSAR models for the design of new compounds with desirable physicochemical and biological properties. With the continuous accumulation of experimental data, data-driven computational systems that can derive representative substructures from large chemical libraries are attracting more attention. Therefore, the development of an integrated and convenient tool to generate and apply representative substructures is urgently needed. RESULTS In this study, PySmash, a user-friendly and powerful tool to generate different types of representative substructures, was developed. The current version of PySmash provides both a Python package and a standalone executable program, which eases operation and pipeline integration. Three types of substructure generation algorithms, including circular, path-based and functional group-based algorithms, are provided. Users can conveniently customize their own requirements for substructure size, accuracy and coverage, statistical significance and parallel computation during execution. PySmash also provides functionality for external data screening. CONCLUSION PySmash, a user-friendly and integrated tool for the automatic generation and application of representative substructures, is presented. Three screening examples, covering toxicophore derivation, privileged motif detection and the integration of substructures with machine learning (ML) models, illustrate the utility of PySmash in safety profile evaluation, therapeutic activity exploration and molecular optimization, respectively. Its executable program and Python package are available at https://github.com/kotori-y/pySmash.
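The circular algorithm mentioned above is in the spirit of Morgan-type atom environments. The sketch below extracts the radius-1 environment around each atom of a molecule as SMILES using RDKit; it is not PySmash's API and omits PySmash's frequency and significance statistics, and the aspirin input is a toy example.

# Conceptual sketch of "circular" substructures: extract the radius-1 atom
# environment around each atom as SMILES using RDKit. In the spirit of the
# circular algorithm but NOT PySmash's API; it omits PySmash's frequency and
# significance statistics. Aspirin SMILES used as a toy input.
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")       # aspirin, toy example

substructures = set()
for atom in mol.GetAtoms():
    env = Chem.FindAtomEnvironmentOfRadiusN(mol, 1, atom.GetIdx())  # bond ids within radius 1
    submol = Chem.PathToSubmol(mol, env)
    smiles = Chem.MolToSmiles(submol)
    if smiles:
        substructures.add(smiles)

print(sorted(substructures))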
15.
Abstract
Psychological embeddings provide a powerful formalism for characterizing human-perceived similarity among members of a stimulus set. Obtaining high-quality embeddings can be costly due to algorithm design, software deployment, and participant compensation. This work aims to advance state-of-the-art embedding techniques and provide a comprehensive software package that makes obtaining high-quality psychological embeddings both easy and relatively efficient. Contributions are made on four fronts. First, the embedding procedure allows multiple trial configurations (e.g., triplets) to be used for collecting similarity judgments from participants. For example, trials can be configured to collect triplet comparisons or to sort items into groups. Second, a likelihood model is provided for three classes of similarity kernels allowing users to easily infer the parameters of their preferred model using gradient descent. Third, an active selection algorithm is provided that makes data collection more efficient by proposing comparisons that provide the strongest constraints on the embedding. Fourth, the likelihood model allows the specification of group-specific attention weight parameters. A series of experiments are included to highlight each of these contributions and their impact on converging to a high-quality embedding. Collectively, these incremental improvements provide a powerful and complete set of tools for inferring psychological embeddings. The relevant tools are available as the Python package PsiZ, which can be cloned from GitHub ( https://github.com/roads/psiz ).
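The basic idea behind embedding from triplet judgments ("anchor a is more similar to b than to c") can be pictured with the toy numpy sketch below, which nudges the anchor toward the item judged more similar and away from the other. This crude update scheme is not PsiZ's likelihood model, similarity kernels or active selection, and the triplets are invented.

# Conceptual sketch of embedding from triplet judgments: simple stochastic
# updates pulling the anchor toward the similar item and pushing it from the
# dissimilar one. NOT PsiZ's likelihood model, kernels or active selection;
# the triplets below are invented.
import numpy as np

rng = np.random.default_rng(1)
n_items, n_dims = 5, 2
Z = rng.normal(scale=0.1, size=(n_items, n_dims))        # random initial embedding

triplets = [(0, 1, 3), (0, 1, 4), (2, 3, 0), (1, 0, 4)]  # (anchor, similar, dissimilar)

lr = 0.05
for _ in range(200):
    for a, b, c in triplets:
        d_ab = np.linalg.norm(Z[a] - Z[b])
        d_ac = np.linalg.norm(Z[a] - Z[c])
        if d_ab + 0.1 > d_ac:                            # margin violated: adjust the anchor
            Z[a] += lr * (Z[b] - Z[a])                   # pull anchor toward the similar item
            Z[a] -= lr * (Z[c] - Z[a])                   # push anchor away from the dissimilar item

print(np.round(Z, 2))                                    # coordinates reflecting the judgments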
16. Semi-automatic extraction of liana stems from terrestrial LiDAR point clouds of tropical rainforests. ISPRS J Photogramm Remote Sens 2019; 154:114-126. PMID: 31417229; PMCID: PMC6686632; DOI: 10.1016/j.isprsjprs.2019.05.011.
Abstract
Lianas are key structural elements of tropical forests, having a large impact on the global carbon cycle by reducing tree growth and increasing tree mortality. Despite the reported increasing abundance of lianas across the neotropics, very few studies have attempted to quantify the impact of lianas on tree and forest structure. Recent advances in high-resolution terrestrial laser scanning (TLS) systems have enabled us to quantify forest structure in unprecedented detail. However, the uptake of TLS technology to study lianas has not kept pace with its use for trees, owing to the lack of methods for dealing with these complex growth forms. In this study, we present a semi-automatic method to extract liana woody components from plot-level TLS data of a tropical rainforest. We tested the method in eight plots from two different tropical rainforest sites (two in Gigante Peninsula, Panama and six in Nouragues, French Guiana) along an increasing gradient of liana infestation (from plots with low liana density to plots with very high liana density). Our method uses a machine learning model based on the Random Forest (RF) algorithm. The RF algorithm is trained on eigen features extracted from the 3D points at multiple spatial scales. The RF-based liana stem extraction method successfully extracts on average 58% of liana woody points in our dataset, with a high precision of 88%. We also present simple post-processing steps that increase the percentage of extracted liana stems from 54% to 90% in Nouragues and from 65% to 70% in Gigante Peninsula without compromising precision. We provide the entire processing pipeline as an open-source Python package. Our method will facilitate new research on lianas, as it enables the monitoring of liana abundance, growth and biomass in forest plots. In addition, the method facilitates easier processing of 3D data to study tree structure in liana-infested forests.
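The classification step described above, eigen features from local 3D neighbourhoods fed to a Random Forest, can be sketched compactly with scikit-learn on synthetic points. The point cloud and labels are placeholders, and the paper's multi-scale features and post-processing are omitted.

# Compact sketch of the described classification step: per-point eigenvalue
# ("eigen") features from local 3D neighbourhoods, fed to a Random Forest.
# Synthetic points and labels stand in for real TLS data; the paper's
# multi-scale features and post-processing are omitted.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
points = rng.uniform(0, 5, size=(500, 3))                 # toy point cloud (x, y, z)
labels = (points[:, 2] > 2.5).astype(int)                 # placeholder liana/non-liana labels

nn = NearestNeighbors(n_neighbors=15).fit(points)
_, idx = nn.kneighbors(points)

features = []
for neighbourhood in points[idx]:                         # (15, 3) block per point
    cov = np.cov(neighbourhood.T)
    e = np.sort(np.linalg.eigvalsh(cov))[::-1] + 1e-12    # eigenvalues, largest first
    linearity = (e[0] - e[1]) / e[0]
    planarity = (e[1] - e[2]) / e[0]
    sphericity = e[2] / e[0]
    features.append([linearity, planarity, sphericity])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, labels)
print(clf.score(features, labels))                        # training accuracy on the toy data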
17. Quantiprot - a Python package for quantitative analysis of protein sequences. BMC Bioinformatics 2017; 18:339. PMID: 28716000; PMCID: PMC5512976; DOI: 10.1186/s12859-017-1751-4.
Abstract
Background The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of this approach is that the quantitative properties define a multidimensional solution space in which sequences can be related to each other and differences can be meaningfully interpreted. Results Quantiprot is a software package in Python which provides a simple and consistent interface to multiple methods for the quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. Conclusions We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by a model can be compared to actually observed sequences.
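Two of the quantitative characterisations mentioned above, an n-gram distribution and a property-scale profile, can be sketched in plain Python as follows. The property values are a small excerpt of the Kyte-Doolittle hydropathy scale, the sequence is a toy example, and this is an illustration rather than Quantiprot's API.

# Plain-Python sketch of two quantitative characterisations: a bigram (n = 2)
# distribution and a mean physico-chemical property computed from a per-residue
# scale. The values are a small excerpt of the Kyte-Doolittle hydropathy scale;
# this is an illustration, not Quantiprot's API.
from collections import Counter

sequence = "MAGLKVVASLLE"          # toy protein sequence
# hydropathy excerpt (Kyte-Doolittle); extend to all 20 residues for real use
hydropathy = {"M": 1.9, "A": 1.8, "G": -0.4, "L": 3.8, "K": -3.9,
              "V": 4.2, "S": -0.8, "E": -3.5}

bigrams = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
mean_hydropathy = sum(hydropathy[aa] for aa in sequence) / len(sequence)

print(bigrams.most_common(3))
print(round(mean_hydropathy, 2))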