1
Frommlet F. A neutral comparison of algorithms to minimize L0 penalties for high-dimensional variable selection. Biom J 2024; 66:e2200207. [PMID: 37421205 DOI: 10.1002/bimj.202200207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 03/09/2023] [Accepted: 04/29/2023] [Indexed: 07/10/2023]
Abstract
Variable selection methods based on L0 penalties have excellent theoretical properties to select sparse models in a high-dimensional setting. There exist modifications of the Bayesian Information Criterion (BIC) which either control the familywise error rate (mBIC) or the false discovery rate (mBIC2) in terms of which regressors are selected to enter a model. However, the minimization of L0 penalties comprises a mixed-integer problem which is known to be NP-hard and therefore becomes computationally challenging with increasing numbers of regressor variables. This is one reason why alternatives like the LASSO have become so popular, which involve convex optimization problems that are easier to solve. The last few years have seen some real progress in developing new algorithms to minimize L0 penalties. The aim of this article is to compare the performance of these algorithms in terms of minimizing L0-based selection criteria. Simulation studies covering a wide range of scenarios that are inspired by genetic association studies are used to compare the values of selection criteria obtained with different algorithms. In addition, some statistical characteristics of the selected models and the runtime of algorithms are compared. Finally, the performance of the algorithms is illustrated in a real data example concerned with expression quantitative trait loci (eQTL) mapping.
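For reference, the two criteria named in the abstract can be sketched as follows. This is a minimal sketch, assuming a Gaussian linear model and the form of mBIC and mBIC2 commonly stated in the literature; the function names, the argument names, and the default constant c = 4 are assumptions for illustration and are not taken from the abstract.

```python
import math

def mbic(rss, n, k, p, c=4.0):
    """mBIC for a Gaussian linear model (assumed form).

    rss: residual sum of squares of the fitted model
    n:   number of observations
    k:   number of selected regressors
    p:   total number of candidate regressors
    c:   assumed constant (4 is a commonly used default)
    """
    return n * math.log(rss) + k * math.log(n) + 2 * k * math.log(p / c)

def mbic2(rss, n, k, p, c=4.0):
    """mBIC2: mBIC minus 2*log(k!), which relaxes the penalty for larger models."""
    # lgamma(k + 1) equals log(k!) and avoids overflow for large k
    return mbic(rss, n, k, p, c) - 2 * math.lgamma(k + 1)
```

The sketch only evaluates a criterion for one given model; searching over all subsets of regressors for the minimizer is the NP-hard step that the compared algorithms address.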
Affiliation(s)
- Florian Frommlet
- Institute of Medical Statistics, Center for Medical Data Science, Medical University of Vienna, Vienna, Austria
2
Zhou J, Hoen AG, Mcritchie S, Pathmasiri W, Viles WD, Nguyen QP, Madan JC, Dade E, Karagas MR, Gui J. Information enhanced model selection for Gaussian graphical model with application to metabolomic data. Biostatistics 2022; 23:926-948. [PMID: 33720330 PMCID: PMC9608647 DOI: 10.1093/biostatistics/kxab006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 11/12/2022] Open
Abstract
In light of the low signal-to-noise nature of many large biological data sets, we propose a novel method to learn the structure of association networks using Gaussian graphical models combined with prior knowledge. Our strategy includes two parts. In the first part, we propose a model selection criterion called structural Bayesian information criterion, in which the prior structure is modeled and incorporated into the Bayesian information criterion. It is shown that the popular extended Bayesian information criterion is a special case of the structural Bayesian information criterion. In the second part, we propose a two-step algorithm to construct the candidate model pool. The algorithm is data-driven and the prior structure is embedded into the candidate model automatically. Theoretical investigation shows that under some mild conditions the structural Bayesian information criterion is a consistent model selection criterion for high-dimensional Gaussian graphical models. Simulation studies validate the superiority of the proposed algorithm over the existing ones and show its robustness to model misspecification. Application to relative concentration data from infant feces collected from subjects enrolled in a large molecular epidemiological cohort study validates that metabolic pathway involvement is a statistically significant factor for the conditional dependence between metabolites. Furthermore, new relationships among metabolites are discovered which cannot be identified by the conventional methods of pathway analysis. Some of them have been widely recognized in the biological literature.
Affiliation(s)
- Jie Zhou
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Anne G Hoen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Susan Mcritchie
- Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Wimal Pathmasiri
- Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Weston D Viles
- Department of Mathematics and Statistics, University of Southern Maine, 96 Falmouth St, Portland, ME 04103, USA
- Quang P Nguyen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Juliette C Madan
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Erika Dade
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Margaret R Karagas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Jiang Gui
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
3
Frommlet F, Szulc P, König F, Bogdan M. Selecting predictive biomarkers from genomic data. PLoS One 2022; 17:e0269369. [PMID: 35709188 PMCID: PMC9202896 DOI: 10.1371/journal.pone.0269369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/29/2021] [Accepted: 05/13/2022] [Indexed: 11/18/2022] Open
Abstract
Recently there have been tremendous efforts to develop statistical procedures that make it possible to determine subgroups of patients for which certain treatments are effective. This article focuses on the selection of prognostic and predictive genetic biomarkers based on a relatively large number of candidate Single Nucleotide Polymorphisms (SNPs). We consider models which include prognostic markers as main effects and predictive markers as interaction effects with treatment. We compare different high-dimensional selection approaches including adaptive lasso, a Bayesian adaptive version of the Sorted L-One Penalized Estimator (SLOBE) and a modified version of the Bayesian Information Criterion (mBIC2). These are compared with classical multiple testing procedures for individual markers. Having identified predictive markers, we consider several different approaches to specifying subgroups susceptible to treatment. Our main conclusion is that selection based on mBIC2 and SLOBE has similar predictive performance to the adaptive lasso while including substantially fewer biomarkers.
Affiliation(s)
- Florian Frommlet
- Department of Medical Statistics, CEMSIIS, Medical University of Vienna, Vienna, Austria
- Piotr Szulc
- Institute of Mathematics, University of Wroclaw, Wroclaw, Poland
- Franz König
- Department of Medical Statistics, CEMSIIS, Medical University of Vienna, Vienna, Austria
- Malgorzata Bogdan
- Institute of Mathematics, University of Wroclaw, Wroclaw, Poland
- Department of Statistics, Lund University, Lund, Sweden
4
Yan C, Liu J, Wei Z, Chen J, Ji Y, Fan L. Algal/bacterial uptake kinetics of dissolved organic nitrogen in municipal wastewater treatment facilities effluents. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2022; 309:114719. [PMID: 35180440 DOI: 10.1016/j.jenvman.2022.114719] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/31/2021] [Revised: 02/06/2022] [Accepted: 02/10/2022] [Indexed: 06/14/2023]
Abstract
The simulation and analysis of the degradation process of organic nitrogen contaminants in wastewater treatment facility effluent is important for estimating their actual contribution to eutrophication and crucial for developing watershed protection plans. In this study, algal and algal/bacterial bioassays were conducted to study the bioavailability of dissolved organic nitrogen contaminants in wastewater treatment plant effluents, and four kinetic models were used to describe the mineralization process. The traditional 1-pool model that is commonly used in water quality models showed poor correlation (r2 = 0.613 ± 0.261), while the other three models performed much better (r2 > 0.950). Model fit and simplicity were assessed using the Akaike information criterion and the Bayesian information criterion, and the Gamma model was indicated to be the best model since it presented the most parsimonious fit to the data with the fewest terms. This study showed that the bioavailability and degradation rate of organic nitrogen in wastewater effluent varied greatly, and this variation should be considered in water quality models. In addition, the Gamma model could be used to modify the current Total Maximum Daily Load models to provide a scientific basis for making watershed protection plans and controlling eutrophication.
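The comparison by AIC and BIC described above can be illustrated with a short sketch. It assumes least-squares fits with Gaussian errors, so both criteria are computed from the residual sum of squares up to an additive constant; the function name, the model labels, and the example numbers are hypothetical and are not results from the study.

```python
import math

def aic_bic(rss, n, k):
    """Return (AIC, BIC) for a least-squares fit, up to an additive constant.

    rss: residual sum of squares
    n:   number of observations
    k:   number of fitted parameters
    """
    base = n * math.log(rss / n)  # -2 * maximized Gaussian log-likelihood, constants dropped
    return base + 2 * k, base + k * math.log(n)

# Hypothetical fits to the same n = 30 observations: (rss, number of parameters)
fits = {"one_pool": (12.0, 2), "gamma": (1.5, 3)}
scores = {name: aic_bic(rss, 30, k) for name, (rss, k) in fits.items()}

# The model with the lowest BIC is preferred
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Both criteria reward a smaller residual sum of squares and penalize extra parameters; BIC penalizes each parameter by log(n) rather than 2, so it favors simpler models more strongly once n exceeds about 7.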
Affiliation(s)
- Cihang Yan
- College of Chemistry and Materials Science, Sichuan Normal University, Chengdu, 610066, China
- Jiayin Liu
- College of Chemistry and Materials Science, Sichuan Normal University, Chengdu, 610066, China
- Zhiyu Wei
- College of Chemistry and Materials Science, Sichuan Normal University, Chengdu, 610066, China
- Jie Chen
- Sichuan Academy of Environmental Sciences, Chengdu, 610041, China
- Yutong Ji
- Sichuan Academy of Environmental Sciences, Chengdu, 610041, China
- Lu Fan
- College of Chemistry and Materials Science, Sichuan Normal University, Chengdu, 610066, China; Key Laboratory of Treatment for Special Wastewater of Sichuan Province Higher Education System, Sichuan, Chengdu, 610066, China
5
Guo Z, Chen M, Fan Y, Song Y. A general adaptive ridge regression method for generalized linear models: an iterative re-weighting approach. COMMUN STAT-THEOR M 2022. [DOI: 10.1080/03610926.2022.2028841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Zijun Guo
- College of Science, University of Shanghai for Science and Technology, Shanghai, China
- Mengxing Chen
- College of Science, University of Shanghai for Science and Technology, Shanghai, China
- Yali Fan
- College of Science, University of Shanghai for Science and Technology, Shanghai, China
- Yan Song
- Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
6
Wallin J, Bogdan M, Szulc PA, Doerge RW, Siegmund DO. Ghost QTL and hotspots in experimental crosses: novel approach for modeling polygenic effects. Genetics 2021; 217:6067404. [PMID: 33789342 DOI: 10.1093/genetics/iyaa041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2020] [Accepted: 12/10/2020] [Indexed: 11/14/2022] Open
Abstract
Ghost quantitative trait loci (QTL) are the false discoveries in QTL mapping, that arise due to the "accumulation" of the polygenic effects, uniformly distributed over the genome. The locations on the chromosome that are strongly correlated with the total of the polygenic effects depend on a specific sample correlation structure determined by the genotypes at all loci. The problem is particularly severe when the same genotypes are used to study multiple QTL, e.g. using recombinant inbred lines or studying the expression QTL. In this case, the ghost QTL phenomenon can lead to false hotspots, where multiple QTL show apparent linkage to the same locus. We illustrate the problem using the classic backcross design and suggest that it can be solved by the application of the extended mixed effect model, where the random effects are allowed to have a nonzero mean. We provide formulas for estimating the thresholds for the corresponding t-test statistics and use them in the stepwise selection strategy, which allows for a simultaneous detection of several QTL. Extensive simulation studies illustrate that our approach eliminates ghost QTL/false hotspots, while preserving a high power of true QTL detection.
Affiliation(s)
- Jonas Wallin
- Department of Statistics, Lund University, 220 07 Lund, Sweden
- Małgorzata Bogdan
- Department of Statistics, Lund University, 220 07 Lund, Sweden; Department of Mathematics, Institute of Mathematics, University of Wroclaw, 50-137 Wroclaw, Poland
- Piotr A Szulc
- Department of Mathematics, Institute of Mathematics, University of Wroclaw, 50-137 Wroclaw, Poland
- R W Doerge
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- David O Siegmund
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
7
Huang J, Jiao Y, Kang L, Liu J, Liu Y, Lu X. GSDAR: a fast Newton algorithm for $\ell_0$ regularized generalized linear models with statistical guarantee. Comput Stat 2021. [DOI: 10.1007/s00180-021-01098-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
8
Su L, Weaver JL, Groenenboom M, Nakamura N, Rus E, Anand P, Jha SK, Okasinski JS, Dura JA, Reeja-Jayan B. Tailoring Electrode-Electrolyte Interfaces in Lithium-Ion Batteries Using Molecularly Engineered Functional Polymers. ACS APPLIED MATERIALS & INTERFACES 2021; 13:9919-9931. [PMID: 33616383 DOI: 10.1021/acsami.0c20978] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
Electrode-electrolyte interfaces (EEIs) affect the rate capability, cycling stability, and thermal safety of lithium-ion batteries (LIBs). Designing stable EEIs with fast Li+ transport is crucial for developing advanced LIBs. Here, we study Li+ kinetics at EEIs tailored by three nanoscale polymer thin films via chemical vapor deposition (CVD) polymerization. Small binding energy with Li+ and the presence of sufficient binding sites for Li+ allow poly(3,4-ethylenedioxythiophene) (PEDOT) based artificial coatings to enable fast charging of LiCoO2. Operando synchrotron X-ray diffraction experiments suggest that the superior Li+ transport property in PEDOT further improves current homogeneity in the LiCoO2 electrode during cycling. PEDOT also forms chemical bonds with LiCoO2, which reduces Co dissolution and inhibits electrolyte decomposition. As a result, the LiCoO2 4.5 V cycle life tested at C/2 increases over 1700% after PEDOT coating. In comparison, the other two polymer coatings show undesirable effects on LiCoO2 performance. These insights provide us with rules for selecting/designing polymers to engineer EEIs in advanced LIBs.
Affiliation(s)
- Laisuo Su
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Jamie L Weaver
- National Institute of Standards and Technology, Material Measurement Laboratory Gaithersburg, Maryland 20899, United States
- Mitchell Groenenboom
- National Institute of Standards and Technology, Material Measurement Laboratory Gaithersburg, Maryland 20899, United States
- Nathan Nakamura
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Eric Rus
- National Institute of Standards and Technology, Center for Neutron Research, Gaithersburg, Maryland 20899, United States
- Priyanka Anand
- Department of Materials Science & Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Shikhar Krishn Jha
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- John S Okasinski
- Advanced Photon Source, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Joseph A Dura
- National Institute of Standards and Technology, Center for Neutron Research, Gaithersburg, Maryland 20899, United States
- B Reeja-Jayan
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
9
Correa Marrero M, Immink RGH, de Ridder D, van Dijk ADJ. Improved inference of intermolecular contacts through protein-protein interaction prediction using coevolutionary analysis. Bioinformatics 2020; 35:2036-2042. [PMID: 30398547 DOI: 10.1093/bioinformatics/bty924] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/04/2018] [Revised: 10/11/2018] [Accepted: 11/05/2018] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION Predicting residue-residue contacts between interacting proteins is an important problem in bioinformatics. The growing wealth of sequence data can be used to infer these contacts through correlated mutation analysis on multiple sequence alignments of interacting homologs of the proteins of interest. This requires correct identification of pairs of interacting proteins for many species, in order to avoid introducing noise (i.e. non-interacting sequences) in the analysis that will decrease predictive performance. RESULTS We have designed Ouroboros, a novel algorithm to reduce such noise in intermolecular contact prediction. Our method iterates between weighting proteins according to how likely they are to interact based on the correlated mutations signal, and predicting correlated mutations based on the weighted sequence alignment. We show that this approach accurately discriminates between protein interaction versus non-interaction and simultaneously improves the prediction of intermolecular contact residues compared to a naive application of correlated mutation analysis. This requires no training labels concerning interactions or contacts. Furthermore, the method relaxes the assumption of one-to-one interaction of previous approaches, allowing for the study of many-to-many interactions. AVAILABILITY AND IMPLEMENTATION Source code and test data are available at www.bif.wur.nl/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Richard G H Immink
- Laboratory of Molecular Biology, Department of Plant Sciences; Bioscience, Wageningen Plant Research
- Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences; Bioscience, Wageningen Plant Research; Biometris, Department of Plant Sciences, Wageningen University & Research, Wageningen PB, The Netherlands
10
Mendonça F, Mostafa SS, Morgado-Dias F, Ravelo-García AG. Matrix of Lags: A tool for analysis of multiple dependent time series applied for CAP scoring. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020; 189:105314. [PMID: 31978807 DOI: 10.1016/j.cmpb.2020.105314] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/17/2019] [Revised: 11/19/2019] [Accepted: 01/04/2020] [Indexed: 06/10/2023]
Abstract
BACKGROUND Multiple methods have been developed to assess what happens between and within time series. In a particular type of these series, the previous values of the currently observed series are contingent on the lagged values of another series. These cases can commonly be addressed by regression. However, a model selection criterion should be employed to evaluate the compromise between the amount of information provided and the model complexity. This is the basis for the development of the Matrix of Lags (MoL), a tool to study dependent time series. METHODS For each input, multiple regressions were applied to produce a model for each lag, and a model selection criterion identified the lags that populate an auxiliary matrix. Afterwards, the energy of the lags (that are in the auxiliary matrix) was used to define a row of the MoL. Therefore, each input corresponds to a row of the MoL. To test the proposed tool, the heart rate variability and the electrocardiogram derived respiration were employed to perform the indirect estimation of the electroencephalography cyclic alternating pattern (CAP) cycles. Therefore, a support vector machine was fed with the MoL to perform the CAP cycle classification for each input signal. Multiple tests were carried out to further examine the proposed tool, including the effect of balancing the datasets, application of other regression methods and employment of two feature selection models. The first was based on sequential backward selection while the second examined characteristics of a return map. RESULTS The best performance of the subject independent model was attained by feeding the lags, selected by sequential backward selection, to a support vector machine, achieving an average accuracy, sensitivity, specificity and area under the receiver operating characteristic curve of, respectively, 77%, 71%, 82% and 0.77.
CONCLUSIONS The developed model allows the measurement of a characteristic marker of sleep instability (the CAP cycle), and the results are in the upper bound of the specialist agreement range with visual analysis. Thus, the developed method could possibly be used for clinical diagnosis.
Affiliation(s)
- Fábio Mendonça
- Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Lisbon, Portugal; Madeira Interactive Technologies Institute (ITI/Larsys/M-ITI), 9020-105 Funchal, Madeira, Portugal.
- Sheikh Shanawaz Mostafa
- Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Lisbon, Portugal; Madeira Interactive Technologies Institute (ITI/Larsys/M-ITI), 9020-105 Funchal, Madeira, Portugal
- Fernando Morgado-Dias
- Madeira Interactive Technologies Institute (ITI/Larsys/M-ITI), 9020-105 Funchal, Madeira, Portugal; Faculdade de Ciências Exatas e da Engenharia, Universidade da Madeira, 9000-082 Funchal, Madeira, Portugal
- Antonio G Ravelo-García
- Institute for Technological Development and Innovation in Communications, Universidad de Las Palmas de Gran Canaria, 35001 Las Palmas de Gran Canaria, Canary Islands, Spain
11
Mendonça F, Mostafa SS, Morgado-Dias F, Ravelo-García AG, Penzel T. Sleep quality of subjects with and without sleep-disordered breathing based on the cyclic alternating pattern rate estimation from single-lead ECG. Physiol Meas 2019; 40:105009. [DOI: 10.1088/1361-6579/ab4f08] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/15/2022]
12
Model selection in sparse high-dimensional vine copula models with an application to portfolio risk. J MULTIVARIATE ANAL 2019. [DOI: 10.1016/j.jmva.2019.03.004] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/21/2022]
13
Chen AH, Ge W, Metcalf W, Jakobsson E, Mainzer LS, Lipka AE. An assessment of true and false positive detection rates of stepwise epistatic model selection as a function of sample size and number of markers. Heredity (Edinb) 2019; 122:660-671. [PMID: 30443009 PMCID: PMC6462028 DOI: 10.1038/s41437-018-0162-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/02/2018] [Revised: 10/19/2018] [Accepted: 10/28/2018] [Indexed: 12/21/2022] Open
Abstract
Association studies have been successful at identifying genomic regions associated with important traits, but routinely employ models that only consider the additive contribution of an individual marker. Because quantitative trait variability typically arises from multiple additive and non-additive sources, utilization of statistical approaches that include main and two-way interaction marker effects of several loci in one model could lead to unprecedented characterization of these sources. Here we examine the ability of one such approach, called the Stepwise Procedure for constructing an Additive and Epistatic Multi-Locus model (SPAEML), to detect additive and epistatic signals simulated using maize and human marker data. Our results revealed that SPAEML was capable of detecting quantitative trait nucleotides (QTNs) at sample sizes as low as n = 300 and consistently specifying signals as additive and epistatic for larger sizes. Sample size and minor allele frequency had a major influence on SPAEML's ability to distinguish between additive and epistatic signals, while the number of markers tested did not. We conclude that SPAEML is a useful approach for providing further elucidation of the additive and epistatic sources contributing to trait variability when applied to a small subset of genome-wide markers located within specific genomic regions identified using a priori analyses.
Affiliation(s)
- Angela H Chen
- Department of Statistics, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Weihao Ge
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- William Metcalf
- Department of Computer Sciences, Rose-Hulman Institute of Technology, Terre Haute, IN, 47803, USA
- Eric Jakobsson
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Department of Molecular and Integrative Physiology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Liudmila Sergeevna Mainzer
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Department of Molecular and Integrative Physiology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Alexander E Lipka
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
14
Lee CH, Dura JA, LeBar A, DeCaluwe SC. Direct, operando observation of the bilayer solid electrolyte interphase structure: Electrolyte reduction on a non-intercalating electrode. JOURNAL OF POWER SOURCES 2019; 412. [PMID: 32831460 PMCID: PMC7439254 DOI: 10.1016/j.jpowsour.2018.11.093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The solid electrolyte interphase (SEI) remains a central challenge to lithium-ion battery durability, in part due to poor understanding of the basic chemistry responsible for its formation and evolution. In this study, the SEI on a non-intercalating tungsten anode is measured by operando neutron reflectometry and quartz crystal microbalance. A dual-layer SEI is observed, with a 3.7 nm thick inner layer and a 15.4 nm thick outer layer. Such structures have been proposed in the literature, but have not been definitively observed via neutron reflectometry. The SEI mass per area was 1207.2 ng/cm2, and QCM provides insight into the SEI formation dynamics during a negative-going voltage sweep and its evolution over multiple cycles. Monte Carlo simulations identify SEI chemical compositions consistent with the combined measurements. The results are consistent with a primarily inorganic, dense inner layer and a primarily organic, porous outer layer, directly confirming structures proposed in the literature. Further refinement of techniques presented herein, coupled with additional complementary measurements and simulations, can give quantitative insight into SEI formation and evolution as a function of battery materials and cycling conditions. This, in turn, will enable scientifically-guided design of durable, conductive SEI layers for Li-ion batteries for a range of applications.
Affiliation(s)
- Christopher H. Lee
- Department of Mechanical Engineering, Colorado School of Mines, Golden, CO 80401, USA
- Joseph A. Dura
- NIST Center for Neutron Research, Gaithersburg, MD, 20899, USA
- Amy LeBar
- Department of Mechanical Engineering, Colorado School of Mines, Golden, CO 80401, USA
- Steven C. DeCaluwe
- Department of Mechanical Engineering, Colorado School of Mines, Golden, CO 80401, USA
15
Duan W, Zhang R, Zhao Y, Shen S, Wei Y, Chen F, Christiani DC. Bayesian variable selection for parametric survival model with applications to cancer omics data. Hum Genomics 2018; 12:49. [PMID: 30400837 PMCID: PMC6218990 DOI: 10.1186/s40246-018-0179-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/26/2018] [Accepted: 10/07/2018] [Indexed: 12/15/2022] Open
Abstract
Background Modeling thousands of markers simultaneously has been of great interest in testing association between genetic biomarkers and disease or disease-related quantitative traits. Recently, an expectation-maximization (EM) approach to Bayesian variable selection (EMVS) facilitating the Bayesian computation was developed for continuous or binary outcomes using a fast EM algorithm. However, it is not suitable for the analysis of time-to-event outcomes in many public databases such as The Cancer Genome Atlas (TCGA). Results We extended the EMVS to a high-dimensional parametric survival regression framework (SurvEMVS). A variant of the cyclic coordinate descent (CCD) algorithm was used for efficient iteration in the M-step, and the extended Bayesian information criterion (EBIC) was employed for hyperparameter tuning. We evaluated the performance of SurvEMVS using numerical simulations and illustrated its effectiveness on two real datasets. The results of the numerical simulations and the two real data analyses show that SurvEMVS performs well in terms of accuracy and computation. Some potential markers associated with survival of lung or stomach cancer were identified. Conclusions These results suggest that our model is effective and can cope with high-dimensional omics data. Electronic supplementary material The online version of this article (10.1186/s40246-018-0179-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Weiwei Duan, Ruyang Zhang, Yang Zhao, Sipeng Shen, Yongyue Wei, Feng Chen
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China; China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China; Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China; Key Laboratory of Biomedical Big Data of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- David C Christiani
- China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China; Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China; Department of Environmental Health, Harvard School of Public Health, Boston, MA, USA; Pulmonary and Critical Care Division, Department of Medicine, Massachusetts General Hospital/Harvard Medical School, Boston, MA, 02114, USA
|
16
|
Liu Y, Wang P. Selection by partitioning the solution paths. Electron J Stat 2018. [DOI: 10.1214/18-ejs1434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
17
|
Cheng L, Shan L, Kim I. Multilevel Gaussian graphical model for multilevel networks. J Stat Plan Inference 2017. [DOI: 10.1016/j.jspi.2017.05.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/23/2022]
|
18
|
Szulc P, Bogdan M, Frommlet F, Tang H. Joint genotype- and ancestry-based genome-wide association studies in admixed populations. Genet Epidemiol 2017; 41:555-566. [PMID: 28657151 DOI: 10.1002/gepi.22056] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/25/2016] [Revised: 04/01/2017] [Accepted: 04/25/2017] [Indexed: 12/21/2022]
Abstract
In genome-wide association studies (GWAS) genetic loci that influence complex traits are localized by inspecting associations between genotypes of genetic markers and the values of the trait of interest. On the other hand, admixture mapping, which is performed in case of populations consisting of a recent mix of two ancestral groups, relies on the ancestry information at each locus (locus-specific ancestry). Recently it has been proposed to jointly model genotype and locus-specific ancestry within the framework of single marker tests. Here, we extend this approach for population-based GWAS in the direction of multimarker models. A modified version of the Bayesian information criterion is developed for building a multilocus model that accounts for the differential correlation structure due to linkage disequilibrium (LD) and admixture LD. Simulation studies and a real data example illustrate the advantages of this new approach compared to single-marker analysis or modern model selection strategies based on separately analyzing genotype and ancestry data, as well as to single-marker analysis combining genotypic and ancestry information. Depending on the signal strength, our procedure automatically chooses whether genotypic or locus-specific ancestry markers are added to the model. This results in a good compromise between the power to detect causal mutations and the precision of their localization. The proposed method has been implemented in R and is available at http://www.math.uni.wroc.pl/~mbogdan/admixtures/.
Affiliation(s)
- Piotr Szulc
- Faculty of Mathematics, Wroclaw University of Technology, Wroclaw, Poland
- Malgorzata Bogdan
- Faculty of Mathematics and Computer Science, University of Wroclaw, Wroclaw, Poland
- Florian Frommlet
- Department of Medical Statistics, CEMSIIS, Medical University of Vienna, Vienna, Austria
- Hua Tang
- Departments of Genetics and Statistics, Stanford University, Stanford, California, United States of America
|
19
|
Shi M, Shen W, Wang HQ, Chong Y. Adaptive modelling of gene regulatory network using Bayesian information criterion-guided sparse regression approach. IET Syst Biol 2016; 10:252-259. [PMID: 27879480 PMCID: PMC8687338 DOI: 10.1049/iet-syb.2016.0005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/31/2016] [Revised: 06/13/2016] [Accepted: 06/14/2016] [Indexed: 11/19/2022]
Abstract
Inferring gene regulatory networks (GRNs) from microarray expression data is an important but challenging problem in systems biology. In this study, the authors propose a Bayesian information criterion (BIC)-guided sparse regression approach for GRN reconstruction. This approach can adaptively model GRNs by optimising the l1-norm regularisation of sparse regression based on a modified version of BIC. The regularisation strategy ensures that the inferred GRNs are as sparse as natural ones, while the modified BIC allows prior knowledge on expression regulation to be incorporated and thus avoids the usual overestimation of expression regulators. In particular, the proposed method provides a clear interpretation of combinatorial regulations of gene expression by optimally extracting regulation coordination for a given target gene. Experimental results on both simulation data and real-world microarray data demonstrate that the method performs competently in discovering regulatory relationships for GRN reconstruction.
Affiliation(s)
- Ming Shi
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
- Weiming Shen
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
- Hong-Qiang Wang
- Machine Intelligence and Computational Biology Lab, Institute of Intelligent Machines, Chinese Academy of Science, P.O. Box 1130, Hefei 230031, People's Republic of China
- Yanwen Chong
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
|
20
|
Li D, Sivaganesan S. On the role of the prior in multiplicity adjustment. Journal of Statistical Theory and Practice 2016. [DOI: 10.1080/15598608.2015.1128858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
21
|
|
22
|
Frommlet F, Nuel G. An Adaptive Ridge Procedure for L0 Regularization. PLoS One 2016; 11:e0148620. [PMID: 26849123 PMCID: PMC4743917 DOI: 10.1371/journal.pone.0148620] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 05/18/2015] [Accepted: 01/21/2016] [Indexed: 11/18/2022]
Abstract
Penalized selection criteria like AIC or BIC are among the most popular methods for variable selection. Their theoretical properties have been studied intensively and are well understood, but making use of them in the case of high-dimensional data is difficult due to the non-convex optimization problem induced by L0 penalties. In this paper we introduce an adaptive ridge procedure (AR), where iteratively weighted ridge problems are solved whose weights are updated in such a way that the procedure converges towards selection with L0 penalties. After introducing AR, its specific shrinkage properties are studied in the particular case of orthogonal linear regression. Based on extensive simulations for the non-orthogonal case as well as for Poisson regression, the performance of AR is studied and compared with SCAD and adaptive LASSO. Furthermore, an efficient implementation of AR in the context of least-squares segmentation is presented. The paper ends with an illustrative example of applying AR to analyze GWAS data.
Affiliation(s)
- Florian Frommlet
- Department of Medical Statistics (CEMSIIS), Medical University of Vienna, Spitalgasse 23, A-1090 Vienna, Austria
- Grégory Nuel
- National Institute for Mathematical Sciences (INSMI), CNRS, Stochastics and Biology Group (PSB), LPMA UMR CNRS 7599, Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France
|
23
|
Sovilj D, Björk KM, Lendasse A. Comparison of combining methods using Extreme Learning Machines under small sample scenario. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.03.109] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022]
|
24
|
Rawi R, El Anbari M, Bensmail H. Model selection emphasises the importance of non-chromosomal information in genetic studies. PLoS One 2015; 10:e0117014. [PMID: 25626013 PMCID: PMC4308103 DOI: 10.1371/journal.pone.0117014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2014] [Accepted: 12/17/2014] [Indexed: 12/05/2022]
Abstract
Ever since the case of the missing heritability was highlighted some years ago, scientists have been investigating various possible explanations for the issue. However, none of these explanations include non-chromosomal genetic information. Here we describe explicitly how chromosomal and non-chromosomal modifiers collectively influence the heritability of a trait, in this case, the growth rate of yeast. Our results show that the non-chromosomal contribution can be large, adding another dimension to the estimation of heritability. We also discovered, combining the strength of LASSO with model selection, that the interaction of chromosomal and non-chromosomal information is essential in describing phenotypes.
Affiliation(s)
- Reda Rawi
- Computational Science and Engineering Center, Qatar Computing Research Institute, Doha, Qatar
- Mohamed El Anbari
- Computational Science and Engineering Center, Qatar Computing Research Institute, Doha, Qatar
- Division of Biomedical Informatics, Sidra Medical and Research Center, Doha, Qatar
- Halima Bensmail
- Computational Science and Engineering Center, Qatar Computing Research Institute, Doha, Qatar
|
25
|
Barber RF, Drton M. High-dimensional Ising model selection with Bayesian information criteria. Electron J Stat 2015. [DOI: 10.1214/15-ejs1012] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Indexed: 11/19/2022]
|
26
|
Yi H, Breheny P, Imam N, Liu Y, Hoeschele I. Penalized multimarker vs. single-marker regression methods for genome-wide association studies of quantitative traits. Genetics 2015; 199:205-22. [PMID: 25354699 PMCID: PMC4286685 DOI: 10.1534/genetics.114.167817] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 06/28/2014] [Accepted: 10/21/2014] [Indexed: 11/18/2022]
Abstract
The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single-marker association methods. As an alternative to single-marker analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of penalized regression (PR) as the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to perform penalty parameter and SNP selection by false discovery rate (FDR) control, and assess their performance in comparison with SMA. PR methods were compared with SMA, using realistically simulated GWAS data with a continuous phenotype and real data. Based on these comparisons our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type-I error control but somewhat less power than SMA with Benjamini-Hochberg FDR control (SMA-BH). PR with FDR-based penalty parameter selection controlled the FDR somewhat conservatively while SMA-BH may not achieve FDR control in all situations. Differences among PR methods seem quite small when the focus is on SNP selection with FDR control. Incorporating linkage disequilibrium into the penalization by adapting penalties developed for covariates measured on graphs can improve power but also generate more false positives or wider regions for follow-up. We recommend the elastic net with a mixing weight for the Lasso penalty near 0.5 as the best method.
Affiliation(s)
- Hui Yi
- Virginia Bioinformatics Institute, Virginia Tech University, Blacksburg, Virginia 24061; Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech University, Blacksburg, Virginia 24061
- Patrick Breheny
- Department of Biostatistics, University of Iowa, Iowa City, Iowa 52240
- Netsanet Imam
- Virginia Bioinformatics Institute, Virginia Tech University, Blacksburg, Virginia 24061
- Yongmei Liu
- Departments of Epidemiology and Prevention and Internal Medicine, Division of Public Health Sciences, Translational Research Institute, Wake Forest School of Medicine, Winston-Salem, North Carolina 27157
- Ina Hoeschele
- Virginia Bioinformatics Institute, Virginia Tech University, Blacksburg, Virginia 24061; Department of Statistics, Virginia Tech University, Blacksburg, Virginia 24061
|
27
|
He Y, Chen Z. The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data. ANN I STAT MATH 2014. [DOI: 10.1007/s10463-014-0497-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
|
28
|
DeCaluwe SC, Kienzle PA, Bhargava P, Baker AM, Dura JA. Phase segregation of sulfonate groups in Nafion interface lamellae, quantified via neutron reflectometry fitting techniques for multi-layered structures. Soft Matter 2014; 10:5763-5776. [PMID: 24981163 DOI: 10.1039/c4sm00850b] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 06/03/2023]
Abstract
Neutron reflectometry analysis methods for under-determined, multi-layered structures are developed and used to determine the composition depth profile in cases where the structure is not known a priori. These methods, including statistical methods, sophisticated fitting routines, and coupling multiple data sets, are applied to hydrated and dehydrated Nafion nano-scaled films with thicknesses comparable to those found coating electrode particles in fuel cell catalyst layers. These results confirm the lamellar structure previously observed on hydrophilic substrates, and demonstrate that for hydrated films they can accurately be described as layers rich in both water and sulfonate groups alternating with water-poor layers containing an excess of fluorocarbon groups. The thickness of these layers increases slightly and the amplitude of the water volume fraction oscillation exponentially decreases away from the hydrophilic interface. For dehydrated films, the composition oscillations die out more rapidly. The Nafion-SiO2 substrate interface contains a partial monolayer of sulfonate groups bonded to the substrate and a large excess of water compared to that expected by the water-to-sulfonate ratio, λ, observed throughout the rest of the film. Films that were made thin enough to truncate this lamellar region showed a depth profile nearly identical to thicker films, indicating that there are no confinement or surface effects altering the structure. Comparing the SLD profile measured for films dried at 60 °C to modeled composition profiles derived by removing water from the hydrated lamellae suggests incomplete re-mixing of the polymer groups upon dehydration, indicating limited polymer mobility in these Nafion thin films.
Affiliation(s)
- Steven C DeCaluwe
- Department of Mechanical Engineering, Colorado School of Mines, Golden, CO 80401, USA
|
29
|
Dolejsi E, Bodenstorfer B, Frommlet F. Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian Information Criterion. PLoS One 2014; 9:e103322. [PMID: 25061809 PMCID: PMC4111553 DOI: 10.1371/journal.pone.0103322] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 04/22/2014] [Accepted: 07/01/2014] [Indexed: 01/24/2023]
Abstract
The prevailing method of analyzing GWAS data is still to test each marker individually, although from a statistical point of view it is quite obvious that in case of complex traits such single marker tests are not ideal. Recently several model selection approaches for GWAS have been suggested, most of them based on LASSO-type procedures. Here we will discuss an alternative model selection approach which is based on a modification of the Bayesian Information Criterion (mBIC2) which was previously shown to have certain asymptotic optimality properties in terms of minimizing the misclassification error. Heuristic search strategies are introduced which attempt to find the model which minimizes mBIC2, and which are efficient enough to allow the analysis of GWAS data. Our approach is implemented in a software package called MOSGWA. Its performance in case control GWAS is compared with the two algorithms HLASSO and d-GWASelect, as well as with single marker tests, where we performed a simulation study based on real SNP data from the POPRES sample. Our results show that MOSGWA performs slightly better than HLASSO, where specifically for more complex models MOSGWA is more powerful with only a slight increase in Type I error. On the other hand according to our simulations GWASelect does not at all control the type I error when used to automatically determine the number of important SNPs. We also reanalyze the GWAS data from the Wellcome Trust Case-Control Consortium and compare the findings of the different procedures, where MOSGWA detects for complex diseases a number of interesting SNPs which are not found by other methods.
Affiliation(s)
- Erich Dolejsi
- Center for Medical Statistics, Informatics, and Intelligent Systems/Section of Medical Statistics, Medical University Vienna, Vienna, Austria
- Florian Frommlet
- Center for Medical Statistics, Informatics, and Intelligent Systems/Section of Medical Statistics, Medical University Vienna, Vienna, Austria
|
30
|
Extended Bayesian information criterion in the Cox model with a high-dimensional feature space. ANN I STAT MATH 2014. [DOI: 10.1007/s10463-014-0448-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
|
31
|
Abstract
Model search strategies play an important role in finding simultaneous susceptibility genes that are associated with a trait. More particularly, model selection via information criteria, such as the BIC with modifications, has received considerable attention in quantitative trait loci (QTL) mapping. However, such modifications often depend upon several factors, such as sample size, prior distribution, and the type of experiment (e.g., backcross or intercross). These dependencies make it difficult to generalize the methods to all cases. The fence method avoids such limitations with a unified approach, and hence can be used more broadly. In this paper, this method is studied in the case of backcross experiments through a series of simulation studies. The results are compared with those of the modified BIC method as well as some of the most popular shrinkage methods for model selection.
Affiliation(s)
- Thuan Nguyen
- Department of Public Health and Preventive Medicine, Oregon Health and Science University, Portland, OR 97239, U.S.A
- Jie Peng
- Department of Statistics, University of California, Davis, CA 95616, U.S.A
- Jiming Jiang
- Department of Statistics, University of California, Davis, CA 95616, U.S.A
|
32
|
Development of QSPR Model Relating Solvent Structure to Crystal Morphology. ACTA ACUST UNITED AC 2014. [DOI: 10.1016/b978-0-444-63433-7.50038-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
33
|
Lv J, Liu JS. Model selection principles in misspecified models. J R Stat Soc Series B Stat Methodol 2013. [DOI: 10.1111/rssb.12023] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 11/30/2022]
Affiliation(s)
- Jinchi Lv
- University of Southern California; Los Angeles USA
|
34
|
ElBakry O, Ahmad MO, Swamy MNS. Inference of gene regulatory networks with variable time delay from time-series microarray data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:671-687. [PMID: 24091400 DOI: 10.1109/tcbb.2013.73] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 06/02/2023]
Abstract
Regulatory interactions among genes and gene products are dynamic processes and hence modeling these processes is of great interest. Since genes work in a cascade of networks, reconstruction of gene regulatory network (GRN) is a crucial process for a thorough understanding of the underlying biological interactions. We present here an approach based on pairwise correlations and lasso to infer the GRN, taking into account the variable time delays between various genes. The proposed method is applied to both synthetic and real data sets, and the results on synthetic data show that the proposed approach outperforms the current methods. Further, the results using real data are more consistent with the existing knowledge concerning the possible gene interactions.
|
35
|
Extended BIC for linear regression models with diverging number of relevant features and high or ultra-high feature spaces. J Stat Plan Inference 2013. [DOI: 10.1016/j.jspi.2012.08.015] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/17/2022]
|
36
|
Frommlet F, Bogdan M. Some optimality properties of FDR controlling rules under sparsity. Electron J Stat 2013. [DOI: 10.1214/13-ejs808] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
|
37
|
Abstract
In this chapter we describe a novel Bayesian approach to designing GWAS studies with the goal of ensuring robust detection of effects of genomic loci associated with trait variation. The goal of GWAS is to detect loci associated with variation in traits of interest. Finding which of 500,000-1,000,000 loci has a practically significant effect is a difficult statistical problem, like finding a needle in a haystack. We address this problem by designing experiments to detect effects with a given Bayes factor, where the Bayes factor is chosen sufficiently large to overcome the low prior odds for genomic associations. Methods are given for various possible data structures including random population samples, case-control designs, transmission disequilibrium tests, sib-based transmission disequilibrium tests, and other family-based designs including designs for plants with clonal replication. We also consider the problem of eliciting prior information from experts, which is necessary to quantify prior odds for loci. We advocate a "subjective" Bayesian approach, where the prior distribution is considered as a mathematical representation of our prior knowledge, while also giving generic formulae that allow conservative computations based on low prior information, e.g., equivalent to the information in a single sample point. Examples using R and the R package ldDesign are given throughout.
Affiliation(s)
- Roderick D Ball
- Scion (New Zealand Forest Research Institute Limited), Rotorua, New Zealand
|
38
|
Liu Y, Yang T, Li H, Yang R. Iteratively reweighted LASSO for mapping multiple quantitative trait loci. Brief Bioinform 2012; 15:20-9. [DOI: 10.1093/bib/bbs062] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
|
39
|
Miller MA, Feng XJ, Li G, Rabitz HA. Identifying biological network structure, predicting network behavior, and classifying network state with High Dimensional Model Representation (HDMR). PLoS One 2012; 7:e37664. [PMID: 22723838 PMCID: PMC3377689 DOI: 10.1371/journal.pone.0037664] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 11/21/2011] [Accepted: 04/26/2012] [Indexed: 11/26/2022]
Abstract
This work presents an adapted Random Sampling - High Dimensional Model Representation (RS-HDMR) algorithm for synergistically addressing three key problems in network biology: (1) identifying the structure of biological networks from multivariate data, (2) predicting network response under previously unsampled conditions, and (3) inferring experimental perturbations based on the observed network state. RS-HDMR is a multivariate regression method that decomposes network interactions into a hierarchy of non-linear component functions. Sensitivity analysis based on these functions provides a clear physical and statistical interpretation of the underlying network structure. The advantages of RS-HDMR include efficient extraction of nonlinear and cooperative network relationships without resorting to discretization, prediction of network behavior without mechanistic modeling, robustness to data noise, and favorable scalability of the sampling requirement with respect to network size. As a proof-of-principle study, RS-HDMR was applied to experimental data measuring the single-cell response of a protein-protein signaling network to various experimental perturbations. A comparison to network structure identified in the literature and through other inference methods, including Bayesian and mutual-information based algorithms, suggests that RS-HDMR can successfully reveal a network structure with a low false positive rate while still capturing non-linear and cooperative interactions. RS-HDMR identified several higher-order network interactions that correspond to known feedback regulations among multiple network species and that were unidentified by other network inference methods. Furthermore, RS-HDMR has a better ability to predict network response under unsampled conditions in this application than the best statistical inference algorithm presented in the recent DREAM3 signaling-prediction competition. 
RS-HDMR can discern and predict differences in network state that arise from sources ranging from intrinsic cell-cell variability to altered experimental conditions, such as when drug perturbations are introduced. This ability ultimately allows RS-HDMR to accurately classify the experimental conditions of a given sample based on its observed network state.
Affiliation(s)
- Miles A Miller
- Department of Chemistry, Princeton University, Princeton, New Jersey, USA
|
40
|
Frommlet F, Ruhaltinger F, Twaróg P, Bogdan M. Modified versions of Bayesian Information Criterion for genome-wide association studies. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2011.05.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 10/18/2022]
|
41
|
Mantalos P, Karagrigoriou A. Bootstrapping the augmented Dickey–Fuller test for unit root using the MDIC. J Stat Comput Simul 2012. [DOI: 10.1080/00949655.2010.539219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
42
|
|
43
|
Żak-Szatkowska M, Bogdan M. Modified versions of the Bayesian Information Criterion for sparse Generalized Linear Models. Comput Stat Data Anal 2011. [DOI: 10.1016/j.csda.2011.04.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 10/18/2022]
|
44
|
Chaikam V, Negeri A, Dhawan R, Puchaka B, Ji J, Chintamanani S, Gachomo EW, Zillmer A, Doran T, Weil C, Balint-Kurti P, Johal G. Use of Mutant-Assisted Gene Identification and Characterization (MAGIC) to identify novel genetic loci that modify the maize hypersensitive response. Theor Appl Genet 2011; 123:985-97. [PMID: 21792633 DOI: 10.1007/s00122-011-1641-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 04/14/2011] [Accepted: 06/13/2011] [Indexed: 05/22/2023]
Abstract
The partially dominant, autoactive maize disease resistance gene Rp1-D21 causes hypersensitive response (HR) lesions to form spontaneously on leaves and stems in the absence of pathogen recognition. The maize nested association mapping (NAM) population consists of 25 subpopulations of 200 lines each, every one derived from a cross between the maize line B73 and one of 25 diverse inbred lines. By crossing a line carrying the Rp1-D21 gene with lines from three of these subpopulations and assessing the F1 progeny, we were able to map several novel loci that modify the maize HR, using both single-population quantitative trait locus (QTL) analysis and joint analysis of all three populations. Joint analysis detected QTL in greater number and with greater confidence and precision than did single-population analysis. In particular, QTL were detected in bins 1.02, 4.04, 9.03, and 10.03. We have previously termed this technique, in which a mutant phenotype is used as a "reporter" for a trait of interest, Mutant-Assisted Gene Identification and Characterization (MAGIC).
Affiliation(s)
- Vijay Chaikam
- Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA
|
45
|
Abstract
Many common human diseases and complex traits are highly heritable and influenced by multiple genetic and environmental factors. Although genome-wide association studies (GWAS) have successfully identified many disease-associated variants, these genetic variants explain only a small proportion of the heritability of most complex diseases. Genetic interactions (gene-gene and gene-environment) substantially contribute to complex traits and diseases and could be one of the main sources of the missing heritability. This paper provides an overview of the available statistical methods and related computer software for identifying genetic interactions in animal and plant experimental crosses and human genetic association studies. The main discussion falls under the three broad issues in statistical analysis of genetic interactions: the definition, detection and interpretation of genetic interactions. Recently developed methods based on modern techniques for high-dimensional data are reviewed, including penalized likelihood approaches and hierarchical models; the relationships between these methods are also discussed. I conclude this review by highlighting some areas of future research.
|
46
|
Huang H, Chanda P, Alonso A, Bader JS, Arking DE. Gene-based tests of association. PLoS Genet 2011; 7:e1002177. [PMID: 21829371 PMCID: PMC3145613 DOI: 10.1371/journal.pgen.1002177] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.2] [Received: 05/26/2010] [Accepted: 05/25/2011] [Indexed: 11/19/2022]
Abstract
Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and the fraction of disease-associated genes housing multiple independent effects, observed at 35%-50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible for pathway-based meta-analysis.
Affiliation(s)
- Hailiang Huang
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Pritam Chanda
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Alvaro Alonso
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Joel S. Bader
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- High Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Dan E. Arking
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
|
47
|
Long N, Gianola D, Rosa GJM, Weigel KA. Marker-assisted prediction of non-additive genetic values. Genetica 2011; 139:843-54. [PMID: 21674154 DOI: 10.1007/s10709-011-9588-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 02/03/2011] [Accepted: 06/03/2011] [Indexed: 10/18/2022]
Abstract
It has become increasingly clear from systems biology arguments that interaction and non-linearity play an important role in the genetic regulation of phenotypic variation for complex traits. Marker-assisted prediction of genetic values assuming additive gene action has been widely investigated because of its relevance in artificial selection. Prediction under non-additive effects, on the other hand, has been less well studied. Here, we explored a nonparametric model, radial basis function (RBF) regression, for predicting quantitative traits under different gene action modes (additivity, dominance and epistasis). Simulations showed that, in the presence of non-additive effects, RBF predicted the merit of individuals in future generations better (higher predictive correlations and lower predictive mean square errors) than a linear additive model, the Bayesian Lasso. This was true for populations undergoing either directional or random selection over several generations. Under additive gene action, RBF was slightly worse than the Bayesian Lasso. While prediction of genetic values under additive gene action is well handled by a variety of parametric models, nonparametric RBF regression is a useful counterpart for dealing with situations where non-additive gene action is suspected, and it is robust irrespective of mode of gene action.
Affiliation(s)
- Nanye Long
- Department of Animal Sciences, University of Wisconsin-Madison, 1675 Observatory Dr. Animal Science Bldg, Madison, WI 53706, USA.
|
48
|
Wang Q, Megalooikonomou V. A performance evaluation framework for association mining in spatial data. J Intell Inf Syst 2010; 35:465-494. [PMID: 21170170 PMCID: PMC3002258 DOI: 10.1007/s10844-009-0115-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
Abstract
The evaluation of the process of mining associations is an important and challenging problem in database systems and especially those that store critical data and are used for making critical decisions. Within the context of spatial databases we present an evaluation framework in which we use probability distributions to model spatial regions, and Bayesian networks to model the joint probability distribution and the structural relationships among spatial and non-spatial predicates. We demonstrate the applicability of the proposed framework by evaluating representatives from two well-known approaches that are used for learning associations, i.e., dependency analysis (using statistical tests of independence) and Bayesian methods. By controlling the parameters of the framework we provide extensive comparative results of the performance of the two approaches. We obtain measures of recovery of known associations as a function of the number of samples used, the strength, number and type of associations in the model, the number of spatial predicates associated with a particular non-spatial predicate, the prior probabilities of spatial predicates, the conditional probabilities of the non-spatial predicates, the image registration error, and the parameters that control the sensitivity of the methods. In addition to performance we investigate the processing efficiency of the two approaches.
Affiliation(s)
- Qiang Wang
- Data Engineering Laboratory, Department of Computer and Information Sciences, Temple University, 415 Wachman Hall, 1805 N. Broad Str., Philadelphia, PA 19122, USA
- Vasileios Megalooikonomou
- Data Engineering Laboratory, Department of Computer and Information Sciences, Temple University, 415 Wachman Hall, 1805 N. Broad Str., Philadelphia, PA 19122, USA
|
49
|
Chen Z, Cui W. A two-phase procedure for QTL mapping with regression models. Theor Appl Genet 2010; 121:363-372. [PMID: 20336448 DOI: 10.1007/s00122-010-1315-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/24/2009] [Accepted: 02/27/2010] [Indexed: 05/29/2023]
Abstract
QTL mapping experiments typically investigate a large number of markers. This poses a challenge to commonly used regression models, since the number of feature variables is usually much larger than the sample size, especially when epistatic effects are to be considered. The greedy nature of the conventional stepwise procedures is well known and is even more conspicuous in such cases. In this article, we propose a two-phase procedure based on penalized likelihood techniques and the extended Bayes information criterion (EBIC) for QTL mapping. The procedure consists of a screening phase and a selection phase. In the screening phase, the main and interaction features are alternately screened by a penalized likelihood mechanism. In the selection phase, a low-dimensional approach using EBIC is applied to the features retained in the screening phase to identify QTL. The two-phase procedure has the asymptotic property that its positive detection rate (PDR) and false discovery rate (FDR) converge to 1 and 0, respectively, as the sample size goes to infinity. The two-phase procedure is compared with both traditional and recently developed approaches by simulation studies. A real data analysis is presented to demonstrate the application of the two-phase procedure.
Affiliation(s)
- Zehua Chen
- Department of Statistics and Applied Probability, National University of Singapore, 3 Science Drive 2, Singapore.
|
50
|
Park J, Ghosh JK. Guided random walk through some high dimensional problems. Sankhya A 2010. [DOI: 10.1007/s13171-010-0017-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
|