1
Estimation of multiple networks with common structures in heterogeneous subgroups. J MULTIVARIATE ANAL 2024; 202:105298. [PMID: 38433779 PMCID: PMC10907012 DOI: 10.1016/j.jmva.2024.105298] [Indexed: 03/05/2024]
Abstract
Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models (GGMs) have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are applicable only to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the GGM into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups. As a result, the proposed estimator advances significantly beyond existing heterogeneity network analyses that directly regularize the GGM likelihood, and it enjoys the properties of scale invariance, tuning insensitivity, and optimization convexity. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only, and it has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the strong performance of the proposed approach in both subgroup and network identification.
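The abstract relies on a composite minimax concave penalty (MCP) but does not spell out its composite form or the reparameterization. The sketch below therefore assumes only the standard scalar MCP, with regularization level `lam` and concavity parameter `gamma > 1`. It shows the penalty and its closed-form univariate minimizer for a standardized predictor. The function names are hypothetical, and this is not the authors' composite estimator.

```python
def mcp_penalty(t: float, lam: float, gamma: float) -> float:
    """Standard scalar MCP: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, then constant."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def mcp_threshold(z: float, lam: float, gamma: float) -> float:
    """Minimizer over b of 0.5*(z - b)**2 + mcp_penalty(b, lam, gamma).

    Small |z| is set to zero, mid-range |z| is soft-thresholded and rescaled,
    and large |z| is returned unshrunk ("firm thresholding").
    """
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1 for a unique minimizer")
    a = abs(z)
    if a <= lam:
        return 0.0
    if a <= gamma * lam:
        sign = 1.0 if z > 0 else -1.0
        return sign * (a - lam) / (1.0 - 1.0 / gamma)
    return z


for z in (0.5, 2.0, -2.0, 5.0):
    print(z, mcp_threshold(z, lam=1.0, gamma=3.0))
```

Unlike the lasso, this rule leaves large coefficients unshrunk. That reduces the estimation bias the lasso imposes on strong signals.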
2
Information-incorporated sparse hierarchical cancer heterogeneity analysis. Stat Med 2024; 43:2280-2297. [PMID: 38553996 DOI: 10.1002/sim.10071] [Received: 07/08/2023] [Revised: 01/11/2024] [Accepted: 03/19/2024] [Indexed: 05/18/2024]
Abstract
Cancer heterogeneity analysis is essential for precision medicine. Most existing heterogeneity analyses consider only a single type of data and ignore the possible sparsity of important features. In cancer clinical practice, it has been suggested that two commonly collected types of data, pathological imaging and omics data, can produce hierarchical heterogeneous structures, in which the refined sub-subgroup structure determined by omics features is nested in the rough subgroup structure determined by imaging features. Moreover, sparsity pursuit is especially important, and more challenging, in heterogeneity analysis, because the important features may differ across subgroups, a point ignored by existing heterogeneity analyses. Fortunately, rich information from the previous literature (for example, that deposited in PubMed) can be used to assist feature selection in the present study. Advancing beyond existing analyses, in this study we propose a novel sparse hierarchical heterogeneity analysis framework, which can integrate two types of features and incorporate prior knowledge to improve feature selection. The proposed approach has satisfactory statistical properties and competitive numerical performance. A TCGA real data analysis demonstrates the practical value of our approach in analyzing data heterogeneity and sparsity.
3
Endocrine disrupting chemical mixture exposure and risk of papillary thyroid cancer in U.S. military personnel: A nested case-control study. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 922:171342. [PMID: 38428594 PMCID: PMC11034764 DOI: 10.1016/j.scitotenv.2024.171342] [Received: 12/18/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/03/2024]
Abstract
Single-pollutant methods to evaluate associations between endocrine disrupting chemicals (EDCs) and thyroid cancer risk may not reflect realistic human exposures. Therefore, we evaluated associations between exposure to a mixture of 18 EDCs, including polychlorinated biphenyls (PCBs), brominated flame retardants, and organochlorine pesticides, and risk of papillary thyroid cancer (PTC), the most common histological subtype of thyroid cancer. We conducted a nested case-control study among U.S. military servicemembers, comprising 652 histologically confirmed PTC cases diagnosed between 2000 and 2013 and 652 controls matched on birth year, sex, race/ethnicity, military component (active duty/reserve), and serum sample timing. Using quantile g-computation, we estimated mixture odds ratios (ORs), 95% confidence intervals (95% CIs), and standard errors (SEs) for associations of pre-diagnostic serum EDC mixture concentrations with overall PTC risk and with risk of the histological subtypes of PTC (classical, follicular), adjusted for body mass index and military branch. Additionally, we identified the relative contributions of individual mixture components to PTC risk, represented by positive and negative weights (w). A one-quartile increase in the serum mixture concentration was associated with a non-statistically significant increase in overall PTC risk (OR = 1.19; 95% CI = 0.91, 1.56; SE = 0.14). Stratified by histological subtype and race (White, Black), a one-quartile increase in the mixture was associated with increased classical PTC risk among White participants (OR = 1.59; 95% CI = 1.06, 2.40; SE = 0.21), but not among Black participants (OR = 0.95; 95% CI = 0.34, 2.68; SE = 0.53). PCBs 180, 199, and 118 had the greatest positive weights driving this association among White participants (w = 0.312, 0.255, and 0.119, respectively). Findings suggest that exposure to an EDC mixture may be associated with increased classical PTC risk.
These findings warrant further investigation in other study populations to better understand PTC risk by histological subtype and race.
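The abstract reports results from quantile g-computation. The sketch below covers two of its bookkeeping steps. It assumes a simple empirical-quartile scoring of each exposure and a set of already-fitted per-exposure coefficients on the log-odds scale. The model fit itself is omitted. The coefficient values and exposure labels are hypothetical, not the study's estimates. The mixture effect psi is the sum of the coefficients, so exp(psi) is the OR for a simultaneous one-quartile increase in all exposures. Each positive (negative) weight is a coefficient's share of the total positive (negative) coefficient sum.

```python
import math
from bisect import bisect_right


def quantize(values, q=4):
    """Score each value 0..q-1 using simple empirical quantile cut points."""
    s = sorted(values)
    n = len(s)
    cuts = [s[(n * k) // q] for k in range(1, q)]
    return [bisect_right(cuts, v) for v in values]


def mixture_weights(coefs):
    """Return (psi, positive weights, negative weights) from per-exposure coefficients."""
    psi = sum(coefs.values())
    pos_total = sum(b for b in coefs.values() if b > 0)
    neg_total = sum(b for b in coefs.values() if b < 0)
    pos = {k: b / pos_total for k, b in coefs.items() if b > 0}
    neg = {k: b / neg_total for k, b in coefs.items() if b < 0}
    return psi, pos, neg


# Hypothetical log-odds coefficients per one-quartile increase, for illustration only.
coefs = {"PCB_180": 0.30, "PCB_199": 0.10, "PCB_118": 0.10, "HCB": -0.05}
psi, pos_w, neg_w = mixture_weights(coefs)
print(quantize([1, 2, 3, 4, 5, 6, 7, 8]))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(round(math.exp(psi), 3), pos_w, neg_w)
```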
4
Neurogenetic underpinnings of nicotine use severity: Integrating the brain transcriptomes and GWAS variants via network approaches. Psychiatry Res 2024; 334:115815. [PMID: 38422867 PMCID: PMC11017751 DOI: 10.1016/j.psychres.2024.115815] [Received: 01/02/2024] [Revised: 02/19/2024] [Accepted: 02/23/2024] [Indexed: 03/02/2024]
Abstract
Our study focused on human brain transcriptomes and the genetic risks of cigarettes per day (CPD) to investigate the neurogenetic mechanisms of individual variation in nicotine use severity. We constructed whole-brain and intramodular region-specific coexpression networks using BrainSpan's transcriptomes and the CPD risk variants identified by genome-wide association studies (GWAS). We then confirmed the associations between CPD and each gene set in the region-specific subnetworks using an independent dataset, and conducted bioinformatic analyses. Eight brain-region-specific coexpression subnetworks were identified in association with CPD: amygdala, hippocampus, medial prefrontal cortex (MPFC), orbitofrontal cortex (OPFC), dorsolateral prefrontal cortex, striatum, mediodorsal nucleus of the thalamus (MDTHAL), and primary motor cortex (M1C). Each gene set in the eight subnetworks was associated with CPD. We also identified three hub proteins: GRIN2A in the amygdala; PMCA2 in the hippocampus, MPFC, OPFC, striatum, and MDTHAL; and SV2B in M1C. Intriguingly, the pancreatic secretion pathway appeared in all the significant protein interaction subnetworks, suggesting pleiotropic effects between cigarette smoking and pancreatic diseases. The three hub proteins and genes are implicated in stress response, drug memory, calcium homeostasis, and inhibitory control. These findings provide novel evidence of the neurogenetic underpinnings of smoking severity.
5
Hierarchical False Discovery Rate Control for High-dimensional Survival Analysis with Interactions. Comput Stat Data Anal 2024; 192:107906. [PMID: 38098875 PMCID: PMC10718515 DOI: 10.1016/j.csda.2023.107906] [Indexed: 12/17/2023]
Abstract
With the development of data collection techniques, analysis with a survival response and high-dimensional covariates has become routine. Here we consider an interaction model, which includes a set of low-dimensional covariates, a set of high-dimensional covariates, and their interactions. This model is motivated by gene-environment (G-E) interaction analysis, where the E variables are low-dimensional and the G variables are high-dimensional. For such a model, there has been extensive research on estimation and variable selection. By comparison, inference studies with valid false discovery rate (FDR) control have been very limited. Existing high-dimensional inference tools cannot be directly applied to interaction models, as interactions and main effects are not "equal". In this article, for high-dimensional survival analysis with interactions, we model survival using the accelerated failure time (AFT) model and adopt a "weighted least squares + debiased Lasso" approach for estimation and selection. A hierarchical FDR control approach is developed for inference that respects the "main effects, interactions" hierarchy. The asymptotic distribution properties of the debiased Lasso estimators are rigorously established. Simulations demonstrate the satisfactory performance of the proposed approach, and the analysis of a breast cancer dataset further establishes its practical utility.
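The abstract does not detail its hierarchical FDR procedure. As a simplified illustration only, the sketch below applies the Benjamini-Hochberg (BH) step-up rule to main-effect p-values. It then tests an interaction only if its G main effect was selected, which enforces the "main effects, interactions" hierarchy. The p-values are hypothetical and the function names are invented. This naive two-stage scheme is not claimed to control the overall FDR at `q`, and it is not the authors' debiased-Lasso-based procedure.

```python
def bh_select(pvals: dict, q: float) -> set:
    """Benjamini-Hochberg step-up: return the keys rejected at level q."""
    m = len(pvals)
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    k_max = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * q / m:
            k_max = rank  # largest rank whose p-value passes its threshold
    return {name for name, _ in ranked[:k_max]}


def hierarchical_select(main_p: dict, inter_p: dict, q: float):
    """Stage 1: BH on G main effects. Stage 2: BH only on interactions whose G main effect was kept."""
    mains = bh_select(main_p, q)
    eligible = {g: p for g, p in inter_p.items() if g in mains}
    return mains, bh_select(eligible, q)


# Hypothetical p-values; inter_p is keyed by the G variable in each E x G term.
main_p = {"G1": 0.001, "G2": 0.02, "G3": 0.40, "G4": 0.03}
inter_p = {"G1": 0.004, "G2": 0.30, "G3": 0.0001, "G4": 0.04}
mains, inters = hierarchical_select(main_p, inter_p, q=0.05)
print(sorted(mains), sorted(inters))  # ['G1', 'G2', 'G4'] ['G1']
```

The E x G3 interaction is never tested, despite its tiny p-value, because G3's main effect was not selected. This is the hierarchy constraint at work.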
6
Prediction Consistency Regularization for Learning with Noise Labels Based on Contrastive Clustering. ENTROPY (BASEL, SWITZERLAND) 2024; 26:308. [PMID: 38667864 PMCID: PMC11049179 DOI: 10.3390/e26040308] [Received: 02/03/2024] [Revised: 03/28/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
In classification tasks, label noise significantly affects model performance, primarily by disrupting prediction consistency and thereby reducing classification accuracy. This work introduces a novel prediction consistency regularization that mitigates the impact of label noise on neural networks by constraining the prediction consistency of similar samples. However, determining which samples should be considered similar is a primary challenge. We formalize similar-sample identification as a clustering problem and employ twin contrastive clustering (TCC) to address it. To ensure similarity between samples within each cluster, we enhance TCC by adjusting the clustering prior distribution using label information. Based on the adjusted TCC's clustering results, we first construct a prototype for each cluster and then formulate a prototype-based regularization term that enhances prediction consistency with the prototype within each cluster and counteracts the adverse effects of label noise. We conducted comprehensive experiments on benchmark datasets to evaluate the effectiveness of our method under various scenarios with different noise rates. The results demonstrate improved classification accuracy. Subsequent analytical experiments confirm that the proposed regularization term effectively mitigates noise and that the adjusted TCC improves the quality of similar-sample recognition.
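To make the prototype-based regularization concrete, here is a minimal sketch under two assumptions that the abstract does not state. First, each cluster's prototype is taken to be the mean predicted class-probability vector of its members. Second, the penalty is taken to be the mean squared distance between each sample's prediction and its cluster prototype. The cluster labels would come from the adjusted TCC; here they are supplied by hand. The probabilities and function names are hypothetical.

```python
def cluster_prototypes(probs, clusters):
    """Mean predicted class-probability vector for each cluster label."""
    sums, counts = {}, {}
    for p, c in zip(probs, clusters):
        acc = sums.setdefault(c, [0.0] * len(p))
        for j, v in enumerate(p):
            acc[j] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def prototype_consistency_penalty(probs, clusters):
    """Mean squared distance between each prediction and its cluster prototype."""
    protos = cluster_prototypes(probs, clusters)
    total = 0.0
    for p, c in zip(probs, clusters):
        total += sum((a - b) ** 2 for a, b in zip(p, protos[c]))
    return total / len(probs)


# Hypothetical softmax outputs for 3 samples and 2 classes; samples 0 and 1 share a cluster.
probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
clusters = [0, 0, 1]
print(prototype_consistency_penalty(probs, clusters))
```

In training, a term like this would be weighted, added to the classification loss, and differentiated by the deep-learning framework's autograd. The pure-Python version only shows the arithmetic.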
7
Organochlorine pesticides and risk of papillary thyroid cancer in U.S. military personnel: a nested case-control study. Environ Health 2024; 23:28. [PMID: 38504322 PMCID: PMC10949709 DOI: 10.1186/s12940-024-01068-0] [Received: 12/04/2023] [Accepted: 03/01/2024] [Indexed: 03/21/2024]
Abstract
BACKGROUND The effects of organochlorine pesticide (OCP) exposure on the development of human papillary thyroid cancer (PTC) are not well understood. A nested case-control study was conducted with data from the U.S. Department of Defense Serum Repository (DoDSR) cohort between 2000 and 2013 to assess associations of serum concentrations of individual OCPs with PTC risk. METHODS This study included 742 histologically confirmed PTC cases (341 females, 401 males) and 742 individually matched controls with pre-diagnostic serum samples selected from the DoDSR. Associations between categories of lipid-corrected serum concentrations of seven OCPs and PTC risk were evaluated for classical PTC and follicular PTC using conditional logistic regression, adjusted for body mass index category and military branch, to compute odds ratios (ORs) and 95% confidence intervals (CIs). Effect modification by sex, birth cohort, and race was examined. RESULTS There was no evidence of associations between most of the OCPs and PTC, overall or stratified by histological subtype. Overall, there was no evidence of an association between hexachlorobenzene (HCB) and PTC, but when stratified by histological subtype, HCB was associated with a significantly increased risk of classical PTC (third tertile above the limit of detection (LOD) vs.
8
[The association between portal vein thrombosis and rebleeding after non-urgent endoscopic treatment of esophagogastric varices]. ZHONGHUA YI XUE ZA ZHI 2024; 104:682-689. [PMID: 38418167 DOI: 10.3760/cma.j.cn112137-20231110-01064] [Indexed: 03/01/2024]
Abstract
Objective: To investigate the association between portal vein thrombosis and rebleeding after non-urgent endoscopic treatment of esophagogastric varices. Methods: Cirrhotic patients with esophagogastric varices diagnosed at the People's Hospital of Zhengzhou University from January 2017 to March 2023 were retrospectively enrolled. The patients were divided into a thrombotic group and a non-thrombotic group according to the presence or absence of portal vein thrombosis. The failure rate of endoscopic treatment and the rebleeding rates in different periods were compared between the two groups. A receiver operating characteristic (ROC) curve was used to select the best cut-off value of gastric varix diameter for total rebleeding during follow-up in both groups. The factors influencing rebleeding within 12 and 36 months were analyzed in both groups, and the factors influencing rebleeding within 36 months in the thrombotic group were further analyzed. Results: A total of 106 patients were enrolled, including 53 patients in the thrombotic group [37 male, 16 female, aged 18-78 (54±13) years] and 53 patients in the non-thrombotic group [37 male, 16 female, aged 27-83 (55±12) years]. The follow-up times of the two groups were (20±15) and (25±15) months, respectively. The total rebleeding rate in the thrombotic group was higher than that in the non-thrombotic group [30.2% (16/53) vs 13.2% (7/53), P<0.05]. The rebleeding rates within 6, 12, 24 and 36 months in the thrombotic group were higher than those in the non-thrombotic group [18.9% (10/53) vs 5.7% (3/53), 18.9% (10/53) vs 5.7% (3/53), 28.3% (15/53) vs 9.4% (5/53), 30.2% (16/53) vs 11.3% (6/53), all P<0.05]. The best cut-off value of gastric varix (GV) diameter for total rebleeding in the two groups was 10.4 mm (10 mm was adopted as the cut-off value for convenience in clinical practice).
Hemoglobin < 85 g/L (HR=0.202, 95%CI: 0.043-0.953, P=0.043), 10 mm < GV diameter ≤ 15 mm (HR=5.321, 95%CI: 1.161-24.390, P=0.031), and endoscopic variceal ligation combined with endoscopic tissue adhesive injection (EVL+ETAI) (HR=7.172, 95%CI: 1.910-26.930, P=0.004) were the risk factors for the first gastroesophageal variceal rebleeding within 12 months after non-urgent endoscopic treatment. EVL+ETAI (HR=3.811, 95%CI: 1.441-10.084, P=0.007) and portal vein thrombosis (HR=4.026, 95%CI: 1.483-10.932, P=0.006) were the risk factors for the first gastroesophageal variceal rebleeding within 36 months after non-urgent endoscopic treatment. In the thrombotic group, 10 mm < GV diameter ≤ 15 mm (HR=7.503, 95%CI: 1.568-35.890, P=0.012) was a risk factor for rebleeding within 36 months. Conclusion: Portal vein thrombosis is a risk factor for rebleeding after non-urgent endoscopic treatment of esophagogastric varices.
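The abstract does not name the criterion used to pick the "best" ROC cut-off. A common choice is Youden's index, J = sensitivity + specificity - 1. The sketch below assumes that criterion and classifies diameters at or above the cut-off as high risk. It scans every observed value as a candidate cut-off and requires both outcome classes to be present. The diameters and outcomes are invented toy data, not the study's, and the function name is hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when score >= cutoff; labels are 1 (event) or 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical gastric varix diameters (mm) and rebleeding outcomes (1 = rebled).
diameters = [6.0, 7.0, 8.0, 9.0, 10.5, 11.0, 12.0, 14.0]
rebled = [0, 0, 0, 0, 1, 1, 1, 1]
print(youden_cutoff(diameters, rebled))  # (10.5, 1.0)
```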
9
Partial replacement of soybean meal with microalgae biomass on in vitro ruminal fermentation may reduce ruminal protein degradation. J Dairy Sci 2024; 107:1460-1471. [PMID: 37944802 DOI: 10.3168/jds.2023-24016] [Received: 07/27/2023] [Accepted: 09/20/2023] [Indexed: 11/12/2023]
Abstract
The objective of this study was to evaluate the effects of partially replacing soybean meal (SBM) with algal sources on in vitro ruminal fermentation. Using 6 fermenters in a 3 × 3 replicated Latin square with 3 periods of 10 d each, we tested 3 treatments: a control diet (CRT) with SBM at 17.8% of the diet dry matter (DM), and 50% replacement of SBM with biomass of either Chlorella pyrenoidosa (CHL) or Spirulina platensis (SPI). The basal diet was formulated to meet the requirements of a 680-kg Holstein dairy cow producing 45 kg/d of milk with 3.5% fat and 3% protein. All diets had a similar nutritional composition (16.0% CP; 34.9% NDF; 31.0% starch, DM basis), and fermenters were provided with 106 g DM/d split into 2 portions. After 7 d of adaptation, samples were collected on the remaining 3 d of each period at 0, 1, 2, 4, 6, and 8 h after the morning feeding to evaluate ruminal fermentation kinetics. To evaluate the daily production of total metabolites and nutrient degradability, samples were collected daily from the effluent containers. Statistical analysis was performed with the MIXED procedure of SAS, with treatment, time, and their interaction as fixed effects and day, square, and fermenter as random effects. Orthogonal contrasts (CRT vs. algae; CHL vs. SPI) were used to assess the treatment effects, and significance was declared at P ≤ 0.05. Fermenters that received algae-based diets had a greater propionate molar concentration and molar proportion than fermenters fed the CRT diet. In addition, the algae-fed fermenters had lower branched short-chain fatty acids (BSCFA) and isoacids (IA), which are biomarkers of ruminal protein degradation, along with lower ammonia (NH3-N) concentration and greater nonammonia nitrogen (NAN).
Compared with fermenters fed the SPI diet, fermenters fed the CHL diet had lower molar concentrations of BSCFA and IA, lower NH3-N concentration and flow, and greater NAN, bacterial nitrogen flow, and efficiency of nitrogen utilization. These results indicate that CHL protein may be more resistant to ruminal degradation, which would increase the efficiency of nitrogen utilization. In summary, partially replacing SBM with algae biomass, especially CHL, is a promising strategy to improve the efficiency of nitrogen utilization, because fermenters fed CHL-based diets had reduced BSCFA and IA, which are markers of protein degradation. However, further validation using in vivo models is required.
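The two orthogonal contrasts named in the abstract (CRT vs. algae; CHL vs. SPI) can be written as coefficient vectors over the treatment means, in the order (CRT, CHL, SPI). The sketch below uses hypothetical means to show three things: each vector sums to zero, the two vectors are orthogonal under equal replication, and each contrast estimate is a weighted sum of the means. It does not reproduce the mixed-model standard errors from SAS PROC MIXED. The names `CONTRASTS`, `contrast_estimate`, and `are_orthogonal` are invented for illustration.

```python
CONTRASTS = {
    "CRT vs algae": (1.0, -0.5, -0.5),  # control minus the average of the two algae diets
    "CHL vs SPI": (0.0, 1.0, -1.0),     # Chlorella minus Spirulina
}


def contrast_estimate(means, coefs):
    """Linear contrast of treatment means; coefficients must sum to zero."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(m * c for m, c in zip(means, coefs))


def are_orthogonal(a, b):
    """Two contrasts are orthogonal (equal replication) if their dot product is zero."""
    return abs(sum(x * y for x, y in zip(a, b))) < 1e-12


means = (10.0, 7.0, 8.0)  # hypothetical NH3-N means for (CRT, CHL, SPI)
for name, coefs in CONTRASTS.items():
    print(name, contrast_estimate(means, coefs))
print(are_orthogonal(*CONTRASTS.values()))  # True
```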
10
[Effects of canagliflozin on amino acid metabolism in atherosclerotic mice]. ZHONGHUA XIN XUE GUAN BING ZA ZHI 2024; 52:64-71. [PMID: 38220457 DOI: 10.3760/cma.j.cn112148-20231009-00275] [Indexed: 01/16/2024]
Abstract
Objective: To explore the possible anti-atherosclerotic mechanisms of the sodium-glucose co-transporter-2 (SGLT2) inhibitor canagliflozin. Methods: ApoE-/- mice fed a Western diet were randomly assigned to the model group (n=10) and the canagliflozin group (n=10). C57BL/6J mice fed a normal diet served as the control group (n=10). Mice in the canagliflozin group were gavaged with canagliflozin for 14 weeks. The presence and severity of atherosclerosis were evaluated with HE and oil red O staining of aortic root sections. PCR was performed to determine the mRNA expression levels of nitric oxide synthase. Hepatic transcriptome analysis and hepatic amino acid detection were conducted using RNA-seq and targeted LC-MS, respectively. Results: HE and oil red O staining of the aortic root showed that atherosclerosis (AS) models were successfully established in ApoE-/- mice fed a Western diet for 14 weeks. Canagliflozin alleviated the pathological severity of atherosclerosis. Hepatic transcriptome analysis indicated that canagliflozin affected amino acid metabolism, especially arginine synthesis, in ApoE-/- mice. Targeted metabolomics analysis of amino acids showed that canagliflozin reduced hepatic levels of L-serine, L-aspartic acid, tyrosine, L-hydroxyproline, and L-citrulline, but raised the hepatic level of L-arginine. Compared with the model group, the canagliflozin group exhibited higher serum arginine and nitric oxide levels as well as elevated nitric oxide synthase mRNA expression in aortic tissues (P<0.05). Conclusion: Canagliflozin regulated amino acid metabolism, reduced the levels of glucogenic amino acids, and promoted arginine synthesis in atherosclerotic mice.
11
Bi-level structured functional analysis for genome-wide association studies. Biometrics 2023; 79:3359-3373. [PMID: 37098961 DOI: 10.1111/biom.13871] [Received: 07/25/2022] [Accepted: 04/19/2023] [Indexed: 04/27/2023]
Abstract
Genome-wide association studies (GWAS) have led to great successes in identifying genotype-phenotype associations for complex human diseases. In such studies, the high dimensionality of single nucleotide polymorphisms (SNPs) often makes analysis difficult. Functional analysis, which interprets SNPs densely distributed in a chromosomal region as a continuous process rather than as discrete observations, has emerged as a promising avenue for overcoming the high-dimensionality challenge. However, most existing functional studies remain individual-SNP-based and cannot sufficiently account for the intricate underlying structures of SNP data. SNPs are often found in groups (e.g., genes or pathways) and have a natural group structure. Additionally, these SNP groups can be highly correlated, have coordinated biological functions, and interact in a network. Motivated by these unique characteristics of SNP data, we develop a novel bi-level structured functional analysis method and investigate disease-associated genetic variants at the SNP level and the SNP group level simultaneously. The penalization technique is adopted for bi-level selection and also to accommodate the group-level network structure. Both the estimation and selection consistency properties are rigorously established. The superiority of the proposed method over alternatives is shown through extensive simulation studies. An application to type 2 diabetes SNP data yields some biologically intriguing results.
12
Pathological imaging-assisted cancer gene-environment interaction analysis. Biometrics 2023; 79:3883-3894. [PMID: 37132273 PMCID: PMC10622332 DOI: 10.1111/biom.13873] [Received: 06/11/2022] [Accepted: 04/26/2023] [Indexed: 05/04/2023]
Abstract
Gene-environment (G-E) interactions have important implications for cancer outcomes and phenotypes beyond the main G and E effects. Compared with main-effect-only analysis, G-E interaction analysis suffers more seriously from a lack of information caused by higher dimensionality, weaker signals, and other factors. It is also uniquely challenged by the "main effects, interactions" variable selection hierarchy. Efforts have been made to bring in additional information to assist cancer G-E interaction analysis. In this study, we take a strategy different from the existing literature and borrow information from pathological imaging data. Such data are a "byproduct" of biopsy, enjoy broad availability and low cost, and have been shown in recent studies to be informative for modeling prognosis and other cancer outcomes/phenotypes. Building on penalization, we develop an assisted estimation and variable selection approach for G-E interaction analysis. The approach is intuitive, can be effectively realized, and has competitive performance in simulation. We further analyze The Cancer Genome Atlas (TCGA) data on lung adenocarcinoma (LUAD). The outcome of interest is overall survival, and for G variables, we analyze gene expressions. Assisted by pathological imaging data, our G-E interaction analysis leads to different findings with competitive prediction performance and stability.
13
FunctanSNP: an R package for functional analysis of dense SNP data (with interactions). Bioinformatics 2023; 39:btad741. [PMID: 38060266 PMCID: PMC10723032 DOI: 10.1093/bioinformatics/btad741] [Received: 10/09/2023] [Revised: 11/30/2023] [Accepted: 12/06/2023] [Indexed: 12/08/2023]
Abstract
SUMMARY Densely measured SNP data are routinely analyzed but face challenges due to their high dimensionality, especially when gene-environment interactions are incorporated. In recent literature, a functional analysis strategy has been developed that treats dense SNP measurements as a realization of a genetic function and can "bypass" the dimensionality challenge. However, there is a lack of portable and user-friendly software, which hinders practical use of these functional methods. We fill this gap by developing the R package FunctanSNP. This comprehensive package encompasses estimation, identification, and visualization tools and has undergone extensive testing with both simulated and real data, confirming its reliability. FunctanSNP can serve as a convenient and reliable tool for analyzing SNP and other densely measured data. AVAILABILITY AND IMPLEMENTATION The package is available at https://CRAN.R-project.org/package=FunctanSNP.
14
The Bayesian Regularized Quantile Varying Coefficient Model. Comput Stat Data Anal 2023; 187:107808. [PMID: 38746689 PMCID: PMC11090482 DOI: 10.1016/j.csda.2023.107808] [Indexed: 05/19/2024]
Abstract
The quantile varying coefficient (VC) model can flexibly capture dynamic patterns of regression coefficients. In addition, owing to the quantile check loss function, it is robust against outliers and heavy-tailed distributions of the response variable, and it can provide a more comprehensive picture of the data by exploring the conditional quantiles of the response variable. Although variable selection for high-dimensional quantile VC models has been studied extensively, Bayesian analysis has rarely been developed. The Bayesian regularized quantile varying coefficient model has been proposed to incorporate robustness against data heterogeneity while accommodating the non-linear interactions between the effect modifier and predictors. Important varying coefficients can be selected through Bayesian variable selection. Incorporating multivariate spike-and-slab priors further improves performance by inducing exact sparsity. A Gibbs sampler has been derived to conduct efficient posterior inference for the sparse Bayesian quantile VC model through Markov chain Monte Carlo (MCMC). The advantages of the proposed model in selection and estimation accuracy over alternatives have been systematically investigated in simulations under specific quantile levels and multiple heavy-tailed model errors. In the case study, the proposed model identifies biologically sensible markers in a non-linear gene-environment interaction study using the NHS data.
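The robustness claim rests on the quantile check loss, rho_tau(u) = u * (tau - 1{u < 0}). The pure-Python sketch below shows the loss itself. It also shows that minimizing the loss over a constant recovers an empirical tau-quantile, which a single outlier barely moves. It covers only the loss, not the varying coefficients, the spike-and-slab priors, or the Gibbs sampler. The function names and data are hypothetical.

```python
def check_loss(u: float, tau: float) -> float:
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(y, tau):
    """Constant c minimizing the sum of check_loss(y_i - c, tau).

    The objective is convex and piecewise linear with kinks at the data points,
    so searching over the observed values is enough to find a minimizer.
    """
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


y = [1.0, 2.0, 3.0, 4.0, 100.0]  # one large outlier
print(best_constant(y, 0.5))  # 3.0 (the median), while the mean is 22.0
print(best_constant(y, 0.9))  # 100.0 (upper tail)
```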
15
KatG catalase deficiency confers bedaquiline hyper-susceptibility to isoniazid resistant Mycobacterium tuberculosis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.17.562707. [PMID: 37905073 PMCID: PMC10614911 DOI: 10.1101/2023.10.17.562707] [Indexed: 11/02/2023]
Abstract
Multidrug-resistant tuberculosis (MDR-TB) is a growing source of global mortality and threatens global control of tuberculosis (TB) disease. The diarylquinoline bedaquiline (BDQ) recently emerged as a highly efficacious drug against MDR-TB, which is defined as resistance to the first-line drugs isoniazid (INH) and rifampin. INH resistance is primarily caused by loss-of-function mutations in the catalase KatG, but the mechanisms underlying BDQ's efficacy against MDR-TB remain unknown. Here we employ a systems biology approach to investigate BDQ hyper-susceptibility in INH-resistant Mycobacterium tuberculosis. We found that hyper-susceptibility to BDQ in INH-resistant cells is due to several physiological changes induced by KatG deficiency, including increased susceptibility to reactive oxygen species and DNA damage, remodeling of transcriptional programs, and metabolic repression of folate biosynthesis. We demonstrate that BDQ hyper-susceptibility is common in INH-resistant clinical isolates. Collectively, these results highlight how altered bacterial physiology can affect drug efficacy in drug-resistant bacteria.
16
The Role of Radiation Therapy for Metastatic Cervical Cancer. Int J Radiat Oncol Biol Phys 2023; 117:e555. [PMID: 37785704 DOI: 10.1016/j.ijrobp.2023.06.1865] [Indexed: 10/04/2023]
Abstract
PURPOSE/OBJECTIVE(S) Survival rates for women with metastatic cervical cancer (CC) are low, and management options are limited. Radiation therapy (RT) for metastatic disease has prolonged survival in other malignancies; however, data in CC are scarce. Herein, we evaluated the effect of RT for metastatic CC. MATERIALS/METHODS A total of 58 patients with metastatic CC treated between September 2019 and January 2023 were retrospectively analyzed. All patients received platinum-based chemotherapy combined with targeted therapy or immunotherapy, followed either by RT (RT group) or by no RT (NRT group). Short-term efficacy, survival status, and prognostic factors were analyzed statistically. RESULTS The objective response rate (ORR) was 63.6% (one complete and twenty partial responses) in the RT group (n = 33) and 40.0% (two complete and eight partial responses) in the NRT group (n = 25) (p = 0.074). The disease control rates (DCR) of the RT and NRT groups were 79.4% and 80.0%, respectively (p = 0.861). The median follow-up time was 17 months (range, 3-39 months). In the RT group, 11 (33.3%) patients experienced locoregional or distant failure and 9 (27.3%) patients died. In the NRT group, 15 (60%) patients had progression and 8 (32%) patients died. There was no significant difference between the two groups in overall survival (OS); however, the RT group displayed superior progression-free survival (PFS) (1-year OS: 72.7% vs. 68.0%, p = 0.460; 1-year PFS: 66.7% vs. 40.0%, p = 0.039). Multivariate analysis showed that RT, immunotherapy, and lymph-node-only metastasis were significant predictors of superior PFS but not of OS. In subgroup analysis, patients treated with RT appeared to have better PFS in specific cohorts, such as age > 45 years (72.0% vs. 36.4%, P = 0.015), squamous cell carcinoma histology (71.0% vs. 40.9%, P = 0.017), metastasis at diagnosis (75.0% vs. 47.6%, P = 0.012), and non-targeted therapy (72.4% vs. 43.8%, P = 0.040). No significant increase in treatment-related toxicity was observed in the RT group compared with the NRT group. CONCLUSION Compared with NRT, RT provided superior PFS in patients with metastatic CC and was well tolerated. Moreover, RT, immunotherapy, and lymph-node-only metastasis were independent significant prognostic factors for PFS. Subgroup analysis showed that combining RT with chemotherapy yielded favorable PFS in metastatic CC patients aged > 45 years, with squamous cell carcinoma histology, with metastasis at diagnosis, or receiving non-targeted therapy. Studies with larger sample sizes and longer follow-up are warranted.
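The reported response rates can be checked against the response counts, assuming the usual definition ORR = (CR + PR) / n, where CR and PR are the numbers of complete and partial responses. A small arithmetic sketch (the function name is illustrative):

```python
def objective_response_rate(complete: int, partial: int, n: int) -> float:
    """Objective response rate in percent: (CR + PR) / n * 100."""
    return 100.0 * (complete + partial) / n


rt = objective_response_rate(complete=1, partial=20, n=33)
nrt = objective_response_rate(complete=2, partial=8, n=25)
print(f"RT: {rt:.1f}%  NRT: {nrt:.1f}%")  # RT: 63.6%  NRT: 40.0%
```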
17
Hyperthermia Enhances the Radiosensitivity of Pancreatic Cancer Cells by Inhibiting Wnt2B Signaling. Int J Radiat Oncol Biol Phys 2023; 117:e277. [PMID: 37785041 DOI: 10.1016/j.ijrobp.2023.06.1254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2023]
Abstract
PURPOSE/OBJECTIVE(S) Pancreatic cancer (PC) is a highly lethal human malignancy. Because symptoms are inconspicuous at early stages, most patients with PC are diagnosed at late stages and lose the chance of surgical resection. Furthermore, PC is resistant to chemoradiotherapy, and patients therefore show dismal survival. Hyperthermia is commonly used as a sensitizer of chemotherapy or radiotherapy in the clinical treatment of human cancers. Our study aimed to investigate whether hyperthermia can improve the radiosensitivity of PC cells and to uncover the underlying mechanisms. MATERIALS/METHODS PC cells (BxPC3, CFPAC-1, and PANC1) were heated to 43 ℃ 1 h before exposure to ionizing irradiation (IR). The radiosensitivity of PC cells was assessed in vitro by colony formation assay, immunofluorescence analysis, and western blotting. Mechanistic studies were conducted using qRT-PCR analysis, cDNA/siRNA transfection, and comet assay. RESULTS Hyperthermia significantly enhanced the radiosensitivity of PC cells by decreasing their colony formation and increasing DNA damage following IR. By qRT-PCR analysis of Wnt gene expression, we found that Wnt2B was significantly down-regulated in PC-3 cells treated with the combination of hyperthermia and IR compared with hyperthermia or IR alone. Functional assays showed that the expression level of Wnt2B was inversely associated with the radiosensitivity of PC-3 cells. Furthermore, hyperthermia inhibited the expression of DNA repair proteins such as p-BRCA1 and p-MRE11 in PC cells following IR. CONCLUSION Hyperthermia can significantly enhance the radiosensitivity of PC cells in a Wnt2B signaling-dependent manner.
18
ZINBMM: a general mixture model for simultaneous clustering and gene selection using single-cell transcriptomic data. Genome Biol 2023; 24:208. [PMID: 37697330 PMCID: PMC10496184 DOI: 10.1186/s13059-023-03046-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 08/22/2023] [Indexed: 09/13/2023]
Abstract
Clustering is a critical component of single-cell RNA sequencing (scRNA-seq) data analysis and can help reveal cell types and infer cell lineages. Despite considerable successes, few methods are tailored to investigating the cluster-specific genes that contribute to cell heterogeneity, although identifying such genes can promote biological understanding of that heterogeneity. In this study, we propose a zero-inflated negative binomial mixture model (ZINBMM) that simultaneously achieves effective scRNA-seq data clustering and gene selection. ZINBMM conducts a systematic analysis of raw counts, accommodating both batch effects and dropout events. Simulations and the analysis of five scRNA-seq datasets demonstrate the practical applicability of ZINBMM.
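The zero-inflated negative binomial (ZINB) distribution that gives ZINBMM its name can be sketched as follows, assuming the common mean/size parameterization (mean `mu`, size or inverse dispersion `r`) and a dropout probability `pi`. Dropout events enter as a point mass at zero mixed with an ordinary NB count. Only the per-gene, per-cell probability is shown; the mixture over clusters, batch-effect terms, and gene-selection step are not.

```python
import math


def nb_pmf(y: int, mu: float, r: float) -> float:
    """Negative binomial pmf with mean mu and size (inverse dispersion) r."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y: int, mu: float, r: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is a dropout zero."""
    nb = nb_pmf(y, mu, r)
    return pi + (1.0 - pi) * nb if y == 0 else (1.0 - pi) * nb


# Dropout inflates the probability of observing a zero count.
print(nb_pmf(0, mu=5.0, r=2.0))            # plain NB probability of zero
print(zinb_pmf(0, mu=5.0, r=2.0, pi=0.3))  # larger, due to dropout
```

Working on the log scale with `math.lgamma` keeps the pmf stable for the large counts common in raw scRNA-seq data.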
19
Two-level Bayesian interaction analysis for survival data incorporating pathway information. Biometrics 2023; 79:1761-1774. [PMID: 36524727 PMCID: PMC10272285 DOI: 10.1111/biom.13811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 08/31/2022] [Accepted: 12/06/2022] [Indexed: 12/23/2022]
Abstract
Genetic interactions play an important role in the progression of complex diseases, explaining variation in disease phenotypes that is missed by main genetic effects. Comparatively, fewer studies have examined survival time, given its challenging characteristics such as censoring. In recent biomedical research, two-level analysis of both genes and their involved pathways has received much attention and has been demonstrated to be more effective than single-level analysis. However, such analysis is usually limited to main effects. Pathways are not isolated, and their interactions have also been suggested to contribute importantly to the prognosis of complex diseases. In this paper, we develop a novel two-level Bayesian interaction analysis approach for survival data. This approach is the first to analyze lower-level gene-gene interactions and higher-level pathway-pathway interactions simultaneously. Significantly advancing beyond existing Bayesian studies based on the Markov chain Monte Carlo (MCMC) technique, we propose a variational inference framework based on the accelerated failure time model with effective priors to accommodate two-level selection as well as censoring. Its computational efficiency is highly desirable for high-dimensional interaction analysis. We examine the performance of the proposed approach using extensive simulation. The application to TCGA melanoma and lung adenocarcinoma data leads to biologically sensible findings with satisfactory prediction accuracy and selection stability.
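As background on how an accelerated failure time (AFT) model handles censoring, here is a minimal sketch of its log-likelihood under an assumed Gaussian error, log T = x'beta + sigma * eps. Observed events contribute the log density of the standardized residual; right-censored subjects contribute the log probability of surviving past their censoring time. The error distribution, names, and numbers are illustrative assumptions; the paper's variational inference and two-level priors are not shown.

```python
import math


def std_normal_logpdf(z: float) -> float:
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def std_normal_logsf(z: float) -> float:
    # P(Z > z) = 0.5 * erfc(z / sqrt(2)); erfc avoids the cancellation
    # that computing 1 - Phi(z) directly would suffer for large z.
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def aft_loglik(log_times, events, linear_preds, sigma):
    """Gaussian AFT log-likelihood with right censoring.

    log_times: observed log(min(T, C)); events: 1 = event, 0 = censored;
    linear_preds: x_i' beta for each subject; sigma: error scale.
    """
    total = 0.0
    for y, d, eta in zip(log_times, events, linear_preds):
        z = (y - eta) / sigma
        if d == 1:
            total += std_normal_logpdf(z) - math.log(sigma)  # density of log T
        else:
            total += std_normal_logsf(z)  # survival beyond censoring time
    return total


print(aft_loglik([1.2, 0.4, 2.0], [1, 0, 1], [1.0, 0.5, 1.5], sigma=0.8))
```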
20
Comparative Effectiveness Analysis of Lumpectomy and Mastectomy for Elderly Female Breast Cancer Patients: A Deep Learning-based Big Data Analysis. THE YALE JOURNAL OF BIOLOGY AND MEDICINE 2023; 96:327-346. [PMID: 37781001 PMCID: PMC10524818 DOI: 10.59249/iaju7580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/03/2023]
Abstract
Objectives: To evaluate the comparative effectiveness of treatments, a randomized clinical trial remains the gold standard but can be challenged by high cost, limited sample size, an inability to fully reflect the real world, and feasibility concerns. The objective is to showcase a big data approach that takes advantage of large electronic medical record (EMR) data to emulate clinical trials. To overcome the limitations of regression analysis, a deep learning-based analysis pipeline was developed. Study Design and Setting: Lumpectomy (breast-conserving surgery) and mastectomy are the two most commonly used surgical procedures for early-stage female breast cancer patients. An emulation trial was designed using the Surveillance, Epidemiology, and End Results (SEER)-Medicare data to evaluate their relative effectiveness in overall survival. The analysis pipeline consisted of a propensity score step, a weighted survival analysis step, and a bootstrap inference step. Results: A total of 65,997 subjects were enrolled in the emulated trial, with 50,704 and 15,293 in the lumpectomy and mastectomy arms, respectively. The two surgical procedures had comparable effects on overall survival (survival year change = 0.08, 95% confidence interval (CI): -0.08, 0.25) for elderly SEER-Medicare early-stage female breast cancer patients. Conclusion: This study demonstrated the power of "mining large EMR data + deep learning-based analysis," and the proposed analysis strategy and technique can potentially be broadly applicable. It provided convincing evidence on the comparative effectiveness of lumpectomy and mastectomy.
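The propensity-score weighting and bootstrap steps of such a pipeline can be illustrated with a minimal inverse-probability-weighted (IPW) contrast and a percentile bootstrap. Two simplifications are assumed here: the propensity scores are taken as given (in the study they come from a deep learning model), and the outcome is a generic per-subject quantity such as survival years rather than a censored survival time. All names and the synthetic data are hypothetical.

```python
import numpy as np


def ipw_difference(y, t, e):
    """Normalized (Hajek-type) IPW estimate of E[Y(1)] - E[Y(0)].

    y: outcomes; t: 1 = treated (e.g. lumpectomy), 0 = comparator;
    e: estimated propensity scores P(T = 1 | covariates).
    """
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    w1 = t / e                  # weights for treated subjects
    w0 = (1.0 - t) / (1.0 - e)  # weights for comparator subjects
    return (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()


def bootstrap_ci(y, t, e, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling subjects with replacement.

    Propensity scores are held fixed here; a full pipeline would
    re-estimate them inside each bootstrap replicate.
    """
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = [
        ipw_difference(y[idx], t[idx], e[idx])
        for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))
    ]
    alpha = (1.0 - level) / 2.0
    return np.quantile(stats, [alpha, 1.0 - alpha])


# Synthetic example: x confounds treatment and outcome; no true effect.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
e = 1.0 / (1.0 + np.exp(-x))
t = rng.binomial(1, e)
y = 2.0 + 0.5 * x + rng.normal(size=1000)
est = ipw_difference(y, t, e)
lo, hi = bootstrap_ci(y, t, e)
print(f"estimate {est:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Normalizing the weights within each arm (rather than dividing by n) tends to make the estimate less sensitive to a few extreme propensity scores.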
21
First-principles property assessment of hybrid formate perovskites. J Chem Phys 2023; 159:074702. [PMID: 37589410 DOI: 10.1063/5.0159526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 07/19/2023] [Indexed: 08/18/2023]
Abstract
Hybrid organic-inorganic formate perovskites, AB(HCOO)3, are a large family of compounds that exhibit a variety of phase transitions and diverse properties, such as (anti)ferroelectricity, ferroelasticity, (anti)ferromagnetism, and multiferroism. While many properties of these materials have already been characterized, we are not aware of any study that focuses on a comprehensive property assessment of a large number of formate perovskites. Comparing the properties of materials within the family is challenging because of systematic errors attributable to different techniques or to a lack of data. For example, complete piezoelectric, dielectric, and elastic tensors are not available. In this work, we use first-principles simulations based on density functional theory to overcome these challenges and report the structural, mechanical, dielectric, piezoelectric, and ferroelectric properties of 29 formate perovskites. We find that these materials exhibit elastic stiffness in the range 0.5-127.0 GPa; highly anisotropic linear compressibility, including zero and even negative values; dielectric constants in the range 0.1-102.1; highly anisotropic piezoelectric response with longitudinal values in the range 1.18-21.12 pC/N; and spontaneous polarizations in the range 0.2-7.8 μC/cm². Furthermore, we propose and computationally characterize a few formate perovskites that have not yet been reported.
22
Aligned deep neural network for integrative analysis with high-dimensional input. J Biomed Inform 2023; 144:104434. [PMID: 37391115 PMCID: PMC10534141 DOI: 10.1016/j.jbi.2023.104434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/20/2023] [Accepted: 06/19/2023] [Indexed: 07/02/2023]
Abstract
OBJECTIVE Deep neural network (DNN) techniques have demonstrated significant advantages over regression and some other techniques. In recent studies, DNN-based analysis has been conducted on data with high-dimensional input such as omics measurements. In such analysis, regularization, in particular penalization, has been applied to regularize estimation and distinguish relevant input variables from irrelevant ones. A unique challenge arises from the "lack of information" attributable to the high dimensionality of the input and the limited size of the training data. For many datasets/studies, there exist other datasets/studies that may be relevant and can potentially provide additional information to boost performance. METHODS In this study, we conduct integrative analysis of multiple independent datasets/studies, with the goal of borrowing information across them and improving overall performance. In contrast to regression-based integrative analysis (where alignment can be easily achieved based on covariates), alignment across multiple DNNs can be nontrivial. We develop ANNI, an Aligned DNN technique for Integrative analysis with high-dimensional input. Penalization is applied for regularized estimation, selection of important input variables, and, equally importantly, information borrowing across multiple DNNs. An effective computational algorithm is developed. RESULTS Extensive simulations demonstrate competitive performance of the proposed technique. The analysis of cancer omics data further establishes its practical utility.
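One way to picture information borrowing through penalization is a group penalty that ties each input variable's first-layer weights together across the networks fitted to all datasets, so that a variable is kept or dropped jointly. The sketch below computes such a penalty and the resulting selected set. It is a hypothetical illustration assuming the networks share the same input variables; it is not ANNI's actual penalty or alignment scheme.

```python
import numpy as np


def cross_network_group_penalty(first_layers, lam):
    """Group-lasso penalty grouping input j's weights across all networks.

    first_layers: list of M arrays, each of shape (n_inputs, n_hidden),
    holding the first-layer weights of the DNN for dataset m.
    Returns (penalty value, per-input group norms).
    """
    stacked = np.concatenate(first_layers, axis=1)  # (n_inputs, sum of hidden)
    group_norms = np.linalg.norm(stacked, axis=1)   # one norm per input
    return lam * group_norms.sum(), group_norms


def selected_inputs(group_norms, tol=1e-8):
    """Inputs whose weights are nonzero in at least one network."""
    return np.flatnonzero(group_norms > tol)


W1 = np.array([[0.3, -0.4], [0.0, 0.0], [1.0, 0.0]])  # network for dataset 1
W2 = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 2.0]])   # network for dataset 2
pen, norms = cross_network_group_penalty([W1, W2], lam=0.1)
print(pen, selected_inputs(norms))  # input 1 is dropped from both networks
```

Because the norm is taken over all networks at once, a weak signal for a variable in one dataset can be supported by a stronger signal in another.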
23
Prior information-assisted integrative analysis of multiple datasets. Bioinformatics 2023; 39:btad452. [PMID: 37490475 PMCID: PMC10400378 DOI: 10.1093/bioinformatics/btad452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 05/13/2023] [Accepted: 07/24/2023] [Indexed: 07/27/2023]
Abstract
MOTIVATION Analyzing genetic data to identify markers and construct predictive models is of great interest in biomedical research. However, limited by cost and sample availability, genetic studies often suffer from the "small sample size, high dimensionality" problem. To tackle this problem, an integrative analysis that collectively analyzes multiple datasets with compatible designs is often conducted. For regularizing estimation and selecting relevant variables, penalization and other regularization techniques are routinely adopted. "Blindly" searching over a vast number of variables may not be efficient. RESULTS We propose incorporating prior information to assist integrative analysis of multiple genetic datasets. To obtain accurate prior information, we adopt a convolutional neural network with an active learning strategy to label textual information from previous studies. Then the extracted prior information is incorporated using a group LASSO-based technique. We conducted a series of simulation studies that demonstrated the satisfactory performance of the proposed method. Finally, data on skin cutaneous melanoma are analyzed to establish practical utility. AVAILABILITY AND IMPLEMENTATION Code is available at https://github.com/ldz7/PAIA. The data that support the findings in this article are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/.
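A common way to let prior information steer a group LASSO is to give prior-supported groups a smaller penalty weight. The proximal (weighted group soft-thresholding) step below shows the effect: with identical coefficients, a down-weighted group survives while an unsupported one is set to zero. Treating each gene's coefficients across datasets as one group and the specific weights are assumptions for illustration; this is not necessarily the technique implemented in the PAIA code cited in the abstract.

```python
import numpy as np


def weighted_group_soft_threshold(beta, lam, weights):
    """Proximal operator of lam * sum_g w_g * ||beta_g||_2.

    beta: array of shape (n_genes, n_datasets); row g holds gene g's
    coefficients across datasets (one group per gene).
    weights: per-gene penalty weights; smaller = stronger prior support.
    """
    beta = np.asarray(beta, dtype=float)
    norms = np.linalg.norm(beta, axis=1)
    thresh = lam * np.asarray(weights, dtype=float)
    # Shrink each row toward zero; rows with norm <= threshold become 0.
    scale = np.where(norms > thresh, 1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    return beta * scale[:, None]


beta = np.array([[0.6, 0.8], [0.6, 0.8]])  # identical evidence, norm 1.0
weights = np.array([0.5, 1.5])             # gene 0 supported by prior text
out = weighted_group_soft_threshold(beta, lam=1.0, weights=weights)
print(out)  # gene 0 shrunk by half, gene 1 set to zero
```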
24
Editorial. Brief Bioinform 2023; 24:bbad258. [PMID: 37406189 DOI: 10.1093/bib/bbad258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
25
The association of new-onset diabetes with subsequent diagnosis of pancreatic cancer-novel use of a large administrative database. J Public Health (Oxf) 2023; 45:e266-e274. [PMID: 36321614 PMCID: PMC10273390 DOI: 10.1093/pubmed/fdac118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 09/05/2022] [Accepted: 09/26/2022] [Indexed: 11/03/2023]
Abstract
BACKGROUND Screening options for pancreatic ductal adenocarcinoma (PDAC) are limited. New-onset type 2 diabetes (NoD) is associated with subsequent diagnosis of PDAC in observational studies and may afford an opportunity for PDAC screening. We evaluated this association using a large administrative database. METHODS Patients were identified using claims data from the OptumLabs® Data Warehouse. Adult patients with NoD diagnosis were matched 1:3 with patients without NoD using age, sex and chronic obstructive pulmonary disease (COPD) status. The event of PDAC diagnosis was compared between cohorts using the Kaplan-Meier method. Factors associated with PDAC diagnosis were evaluated with Cox's proportional hazards modeling. RESULTS We identified 640 421 patients with NoD and included 1 921 263 controls. At 3 years, significantly more PDAC events were identified in the NoD group vs control group (579 vs 505; P < 0.001). When controlling for patient factors, NoD was significantly associated with elevated risk of PDAC (HR 3.474, 95% CI 3.082-3.920, P < 0.001). Other factors significantly associated with PDAC diagnosis were increasing age, increasing age among Black patients, and COPD diagnosis (P ≤ 0.05). CONCLUSIONS NoD was independently associated with subsequent diagnosis of PDAC within 3 years. Future studies should evaluate the feasibility and benefit of PDAC screening in patients with NoD.
26
Robust Bayesian variable selection for gene-environment interactions. Biometrics 2023; 79:684-694. [PMID: 35394058 PMCID: PMC11086965 DOI: 10.1111/biom.13670] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/15/2020] [Revised: 03/23/2022] [Accepted: 03/28/2022] [Indexed: 11/30/2022]
Abstract
Gene-environment (G×E) interactions have important implications for elucidating the etiology of complex diseases beyond the main genetic and environmental effects. Outliers and data contamination in disease phenotypes are commonly encountered in G×E studies, which has led to the development of a broad spectrum of robust regularization methods. Nevertheless, within the Bayesian framework, this issue has not been addressed in existing studies. We develop a fully Bayesian robust variable selection method for G×E interaction studies. The proposed Bayesian method can effectively accommodate heavy-tailed errors and outliers in the response variable while conducting variable selection that accounts for structural sparsity. In particular, for robust sparse group selection, spike-and-slab priors are imposed at both the individual and group levels to identify important main and interaction effects robustly. An efficient Gibbs sampler has been developed to facilitate fast computation. Extensive simulation studies, the analysis of diabetes data with single-nucleotide polymorphism measurements from the Nurses' Health Study, and the analysis of The Cancer Genome Atlas melanoma data with gene expression measurements demonstrate the superior performance of the proposed method over multiple competing alternatives.
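How a spike-and-slab prior separates important from unimportant effects can be seen in the simplest normal-means case, assumed here for illustration: y ~ N(beta, 1) with beta ~ (1 - pi) * delta_0 + pi * N(0, tau2). The posterior inclusion probability below is analogous to the Bernoulli update a Gibbs sampler makes for each inclusion indicator; the paper's robust likelihood and sparse-group structure are not modeled, and the names are hypothetical.

```python
import math


def normal_pdf(x: float, var: float) -> float:
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def inclusion_probability(y: float, pi: float, tau2: float) -> float:
    """P(beta != 0 | y) for y ~ N(beta, 1), beta ~ (1-pi) delta_0 + pi N(0, tau2).

    Integrating beta out, y ~ N(0, 1 + tau2) under the slab and
    y ~ N(0, 1) under the spike (beta exactly zero).
    """
    slab = pi * normal_pdf(y, 1.0 + tau2)
    spike = (1.0 - pi) * normal_pdf(y, 1.0)
    return slab / (slab + spike)


for y in (0.0, 1.0, 3.0, 5.0):
    print(y, round(inclusion_probability(y, pi=0.5, tau2=10.0), 4))
```

Small observed effects are pulled toward the exact-zero spike, while large ones are assigned to the slab with probability near one.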
27
Bayesian finite mixture of regression analysis for cancer based on histopathological imaging-environment interactions. Biostatistics 2023; 24:425-442. [PMID: 37057611 PMCID: PMC10102889 DOI: 10.1093/biostatistics/kxab038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2021] [Revised: 07/28/2021] [Accepted: 10/01/2021] [Indexed: 11/12/2022]
Abstract
Cancer is a heterogeneous disease. Finite mixture of regression (FMR), an important heterogeneity analysis technique when an outcome variable is present, has been extensively employed in cancer research, revealing important differences in the associations between a cancer outcome/phenotype and covariates. Cancer FMR analysis has been based on clinical, demographic, and omics variables. A relatively recent alternative source of data is histopathological images, which have long been used for cancer diagnosis and staging. Recently, it has been shown that high-dimensional histopathological image features, extracted using automated digital image processing pipelines, are effective for modeling cancer outcomes/phenotypes. Histopathological imaging-environment interaction analysis has been further developed to expand the scope of cancer modeling and histopathological imaging-based analysis. Motivated by the significance of cancer FMR analysis and the continuing strong demand for more effective methods, in this article we take the natural next step and conduct cancer FMR analysis based on models that incorporate low-dimensional clinical/demographic/environmental variables, high-dimensional imaging features, and their interactions. Complementary to many existing studies, we develop a Bayesian approach for accommodating high dimensionality, screening out noise, identifying signals, and respecting the "main effects, interactions" variable selection hierarchy. An effective computational algorithm is developed, and simulation shows advantageous performance of the proposed approach. The analysis of The Cancer Genome Atlas data on lung squamous cell cancer leads to interesting findings that differ from those of alternative approaches.
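The models described above combine low-dimensional clinical/demographic/environmental variables E, high-dimensional imaging features Z, and their interactions. A minimal sketch of building such a design matrix (all pairwise E_k × Z_j products) shows why the dimension grows quickly, from q + p to q + p + q·p columns. The column ordering and names are assumptions for illustration.

```python
import numpy as np


def interaction_design(E, Z):
    """Columns: [E, Z, E_k * Z_j for every k, j] (k outer, j inner)."""
    E = np.asarray(E, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n = E.shape[0]
    # Broadcast to (n, q, p), then flatten the last two axes per subject.
    inter = (E[:, :, None] * Z[:, None, :]).reshape(n, -1)
    return np.hstack([E, Z, inter])


E = np.array([[1.0, 2.0], [0.0, 1.0]])            # n = 2 subjects, q = 2
Z = np.array([[3.0, 4.0, 5.0], [1.0, 1.0, 1.0]])  # p = 3 imaging features
X = interaction_design(E, Z)
print(X.shape)  # (2, 11): q + p + q * p
```

With hundreds of imaging features and a handful of environmental variables, the interaction block dominates, which is why selection that respects the main-effect/interaction hierarchy matters.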
28
Unified model-free interaction screening via CV-entropy filter. Comput Stat Data Anal 2023; 180:107684. [PMID: 36910335 PMCID: PMC9997997 DOI: 10.1016/j.csda.2022.107684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022]
Abstract
For many practical high-dimensional problems, interactions have been increasingly found to play important roles beyond main effects. A representative example is gene-gene interaction. Joint analysis, which analyzes all interactions and main effects in a single model, can be seriously challenged by high dimensionality. For high-dimensional data analysis in general, marginal screening has been established as effective for reducing computational cost, increasing stability, and improving estimation/selection performance. Most existing marginal screening methods are designed for the analysis of main effects only. The existing screening methods for interaction analysis are often limited by stringent model assumptions, a lack of robustness, and/or the requirement that predictors be continuous (and hence a lack of flexibility). A unified marginal screening approach tailored to interaction analysis is developed, which can be applied to regression, classification, and survival analysis. Predictors may be continuous or discrete. The proposed approach is built on Coefficient of Variation (CV) filters based on information entropy. Statistical properties are rigorously established. It is shown that the CV filters are almost insensitive to the distribution tails of predictors, the correlation structure among predictors, and the sparsity level of signals. An efficient two-stage algorithm is developed to make the proposed approach scalable to ultrahigh-dimensional data. Simulations and the analysis of TCGA LUAD data further establish the practical superiority of the proposed approach.
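The CV filter itself is not defined in this abstract, so the sketch below swaps in a simpler, generic entropy-based marginal screen for discrete predictors: each pair (X_j, X_k) is scored by the empirical mutual information between the pair's joint levels and the response, and the top-ranked pairs are retained. The XOR example shows why interaction screening can find signals that main-effect screening misses. This is not the paper's CV-entropy filter, and all names are hypothetical.

```python
import itertools
import math
from collections import Counter


def entropy(labels):
    """Empirical Shannon entropy (nats) of a sequence of hashable labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Empirical I(X; Y) = H(X) + H(Y) - H(X, Y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def screen_pairs(X, y, top=1):
    """Rank predictor pairs by MI between their joint level and y."""
    p = len(X[0])
    scores = {}
    for j, k in itertools.combinations(range(p), 2):
        joint = [(row[j], row[k]) for row in X]
        scores[(j, k)] = mutual_information(joint, y)
    return sorted(scores, key=scores.get, reverse=True)[:top]


# y depends on X0 and X1 only through their interaction (XOR).
X = [list(r) for r in itertools.product([0, 1], repeat=3)] * 5
y = [row[0] ^ row[1] for row in X]
print(screen_pairs(X, y))                        # [(0, 1)]
print(mutual_information([r[0] for r in X], y))  # ~0: no main effect
```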
29
Gene-environment interaction analysis via deep learning. Genet Epidemiol 2023; 47:261-286. [PMID: 36807383 PMCID: PMC10244912 DOI: 10.1002/gepi.22518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 12/17/2022] [Accepted: 02/01/2023] [Indexed: 02/21/2023]
Abstract
Gene-environment (G-E) interaction analysis plays an important role in studying complex diseases. Extensive methodological research has been conducted on G-E interaction analysis, and the existing methods are mostly based on regression techniques. In many fields including biomedicine and omics, it has been increasingly recognized that deep learning may outperform regression with its unique flexibility (e.g., in accommodating unspecified nonlinear effects) and superior prediction performance. However, there has been a lack of development in deep learning for G-E interaction analysis. In this article, we fill this important knowledge gap and develop a new analysis approach based on deep neural network in conjunction with penalization. The proposed approach can simultaneously conduct model estimation and selection (of important main G effects and G-E interactions), while uniquely respecting the "main effects, interactions" variable selection hierarchy. Simulation shows that it has superior prediction and feature selection performance. The analysis of data on lung adenocarcinoma and skin cutaneous melanoma overall survival further establishes its practical utility. Overall, this study can advance G-E interaction analysis by delivering a powerful new analysis approach based on modern deep learning.
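The "main effects, interactions" hierarchy can be stated as a simple constraint on the selected set: a G_j × E_k interaction may be kept only if the main effect of G_j is also kept. Since the abstract describes selecting main G effects and G-E interactions, E main effects are treated here as always included. A minimal checker with hypothetical names; it verifies the constraint and does not show how the penalized DNN enforces it.

```python
def hierarchy_violations(selected_main, selected_interactions):
    """Return G x E interactions selected without their G main effect.

    selected_main: set of selected gene indices j.
    selected_interactions: set of (j, k) pairs for G_j x E_k.
    An empty list means the hierarchy is respected.
    """
    return sorted((j, k) for j, k in selected_interactions if j not in selected_main)


print(hierarchy_violations({0, 3}, {(0, 1), (3, 0)}))  # []
print(hierarchy_violations({0}, {(0, 1), (2, 1)}))     # [(2, 1)]
```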
30
HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES. Stat Sin 2023; 33:729-758. [PMID: 38037567 PMCID: PMC10686523 DOI: 10.5705/ss.202021.0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
This study has been motivated by cancer research, in which heterogeneity analysis plays an important role and can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively; under it, the covariates affect the response differently in different subgroups. High-dimensional molecular and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. In simpler analyses, they have been shown to contain overlapping but also independent information. In this article, our goal is to conduct the first, and a more effective, FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.
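Under an FMR model, each subject's response follows one of K regressions, and fitting typically alternates between computing subgroup membership probabilities and refitting weighted regressions. Below is a minimal EM sketch assuming Gaussian errors with a common variance. It omits both the penalization and the multi-source integration that are central to the paper, and all names and data are hypothetical.

```python
import numpy as np


def fmr_em(X, y, beta_init, n_iter=100):
    """EM for a K-component finite mixture of linear regressions.

    X: (n, p) design; y: (n,) response; beta_init: (K, p) starting values.
    Returns mixing proportions (K,), coefficients (K, p), common sigma.
    """
    beta = np.array(beta_init, dtype=float)
    K = beta.shape[0]
    pis = np.full(K, 1.0 / K)
    sigma = np.std(y)
    for _ in range(n_iter):
        # E-step: membership probabilities (responsibilities), computed on
        # the log scale; the shared -log(sigma) term cancels across components.
        resid = y[:, None] - X @ beta.T                # (n, K)
        log_dens = -0.5 * (resid / sigma) ** 2 + np.log(pis)
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted least squares for each component.
        for k in range(K):
            Xw = X * resp[:, k][:, None]
            beta[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        pis = resp.mean(axis=0)
        resid = y[:, None] - X @ beta.T
        sigma = np.sqrt((resp * resid ** 2).sum() / len(y))
    return pis, beta, sigma


# Two synthetic subgroups in which x affects y in opposite directions.
rng = np.random.default_rng(0)
x = rng.normal(size=400)
group = rng.integers(0, 2, size=400)
y = np.where(group == 0, 2.0 * x, -2.0 * x) + rng.normal(scale=0.3, size=400)
pis, beta, sigma = fmr_em(x[:, None], y, beta_init=[[1.0], [-1.0]])
print(np.round(np.sort(beta.ravel()), 2), np.round(pis, 2))
```

A penalized version would replace the weighted least-squares step with a penalized weighted regression, which is where variable selection would enter.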
31
Bacterial antibiotic resistance among cancer inpatients in China: 2016-20. QJM 2023; 116:213-220. [PMID: 36269193 DOI: 10.1093/qjmed/hcac244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Revised: 09/16/2022] [Accepted: 10/10/2022] [Indexed: 11/09/2022]
Abstract
BACKGROUND The incidence of infections among cancer patients in China is as high as 23.2-33.2%. However, the lack of information and data on the number of antibiotics used by cancer patients is an obstacle to implementing antibiotic management plans. AIM This study aimed to investigate bacterial infections and antibiotic resistance in Chinese cancer patients to provide a reference for the rational use of antibiotics. DESIGN This was a 5-year retrospective study of antibiotic resistance in cancer patients. METHODS In this 5-year surveillance study, we collected bacterial and antibiotic resistance data from 20 provincial cancer diagnosis and treatment centers and three specialized cancer hospitals in China. We analyzed the resistance of common bacteria to antibiotics, compared common clinical drug-resistant bacteria, evaluated the evolution of critical drug-resistant bacteria, and conducted data analysis. FINDINGS Between 2016 and 2020, 216 219 bacterial strains were clinically isolated. Resistance of Escherichia coli and Klebsiella pneumoniae to amikacin, ciprofloxacin, cefotaxime, piperacillin/tazobactam, and imipenem was relatively stable and did not significantly increase over time. Resistance of Pseudomonas aeruginosa strains to all antibiotics tested, including imipenem and meropenem, decreased over time. In contrast, resistance of Acinetobacter baumannii strains to carbapenems increased from 4.7% to 14.7%. The proportion of methicillin-resistant Staphylococcus aureus (MRSA) significantly decreased from 65.2% in 2016 to 48.9% in 2020. CONCLUSIONS The bacterial prevalence and antibiotic resistance rates of E. coli, K. pneumoniae, P. aeruginosa, A. baumannii, S. aureus, and MRSA were significantly lower than the national average.
32
SARS-CoV-2 mRNA vaccines decouple anti-viral immunity from humoral autoimmunity. Nat Commun 2023; 14:1299. [PMID: 36894554 PMCID: PMC9996559 DOI: 10.1038/s41467-023-36686-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/01/2022] [Accepted: 02/09/2023] [Indexed: 03/11/2023]
Abstract
mRNA-based vaccines dramatically reduce the occurrence and severity of COVID-19 but are associated with rare vaccine-related adverse effects. These toxicities, coupled with observations that SARS-CoV-2 infection is associated with autoantibody development, raise the question of whether COVID-19 vaccines may also promote the development of autoantibodies, particularly in autoimmune patients. Here we used Rapid Extracellular Antigen Profiling to characterize self- and viral-directed humoral responses after SARS-CoV-2 mRNA vaccination in 145 healthy individuals, 38 patients with autoimmune diseases, and 8 patients with mRNA vaccine-associated myocarditis. We confirm that most individuals generated robust virus-specific antibody responses post vaccination, but that the quality of this response is impaired in autoimmune patients on certain modes of immunosuppression. Autoantibody dynamics are remarkably stable in all vaccinated patients compared with COVID-19 patients, who exhibit an increased prevalence of new autoantibody reactivities. Patients with vaccine-associated myocarditis do not have increased autoantibody reactivities relative to controls. In summary, our findings indicate that mRNA vaccines decouple SARS-CoV-2 immunity from the autoantibody responses observed during acute COVID-19.
33
Effects of testis testosterone deficiency on gene expression in the adrenal gland and skeletal muscle of ducks. Br Poult Sci 2023. [PMID: 36735924 DOI: 10.1080/00071668.2023.2176741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
1. Testosterone has an anabolic effect on skeletal muscle. The testes produce most of the testosterone in vivo, while the adrenal glands contribute smaller amounts. When testis testosterone is deficient, the adrenal gland increases steroid hormone synthesis, which is referred to as compensatory testicular adaptation (CTA).
2. To reveal the effects of testis testosterone deficiency on adrenal steroid hormone synthesis and skeletal muscle development, the expression of genes related to adrenal steroid hormone synthesis and skeletal muscle development was determined by RNA-seq.
3. Castration of male ducks significantly affected body weight but had no significant impact on the cross-sectional area (CSA) or density of pectoral muscle fibres. In skeletal muscle protein metabolism, expression levels of both the catabolic gene atrogin1/MAFbx and the anabolic gene eEF2 were significantly higher after castration. Adrenal expression of the steroidogenic enzyme 11β-hydroxylase (CYP11B1) was significantly lower following castration.
4. Expression pattern analysis showed that the adrenal glucocorticoid receptor (NR3C1/GR) had a potential regulatory relationship with the skeletal muscle-related genes Pax7, mTOR, FBXO32, FOXO3, and FOXO4.
5. The data showed that castration affected muscle protein metabolism as well as adrenal steroid and testosterone synthesis. In addition, it was speculated that, after castration, steroid hormones produced by the adrenal gland could have a compensatory effect, which might mediate the changes in skeletal muscle protein metabolism and development.
34
Human disease clinical treatment network for the elderly: analysis of the medicare inpatient length of stay and readmission data. Biometrics 2023; 79:404-416. [PMID: 34411297 DOI: 10.1111/biom.13549] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2021] [Revised: 06/24/2021] [Accepted: 08/11/2021] [Indexed: 11/30/2022]
Abstract
Clinical treatment outcomes are the quality and cost targets that health-care providers aim to improve. Most existing outcome analyses focus on a single disease or on all diseases combined. Motivated by the success of molecular and phenotypic human disease networks (HDNs), this article develops a clinical treatment network that describes the interconnections among diseases in terms of inpatient length of stay (LOS) and readmission. Here, one node represents one disease, and two nodes are linked with an edge if their LOS and number of readmissions are conditionally dependent. This is the first HDN that jointly analyzes multiple clinical treatment outcomes at the pan-disease level. To accommodate the unique data characteristics, we propose a modeling approach based on two-part generalized linear models and estimation based on penalized integrative analysis. Analysis is conducted on the Medicare inpatient data of 100,000 randomly selected subjects for the period of January 2010 to December 2018. The resulting network has 1008 edges for 106 nodes. We analyze key network properties, including connectivity, modules/hubs, and temporal variation. The findings are biomedically sensible. For example, high-connectivity and hub conditions, such as disorders of lipid metabolism and essential hypertension, are identified. There are also findings that have been little or not at all investigated in the literature. Overall, this study can provide additional insight into the properties of diseases and their interconnections and assist in more efficient disease management and allocation of health-care resources.
35
Impact of various cleaning procedures on p‐GaN surfaces. SURF INTERFACE ANAL 2023. [DOI: 10.1002/sia.7207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
36
A General Framework for Identifying Hierarchical Interactions and Its Application to Genomics Data. J Comput Graph Stat 2023; 32:873-883. [PMID: 38009111 PMCID: PMC10671243 DOI: 10.1080/10618600.2022.2152034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Accepted: 11/08/2022] [Indexed: 12/03/2022]
Abstract
The analysis of hierarchical interactions has long been a challenging problem due to the large number of candidate main effects and interaction effects and the need to accommodate the "main effects, interactions" hierarchy. Two-stage analysis methods enjoy simplicity and low computational cost but contradict the fact that the outcome of interest is attributable to the joint effects of multiple main factors and their interactions. Existing joint analysis methods can accurately describe the underlying data-generating process but suffer from prohibitively high computational cost, and it is not straightforward to extend their optimization algorithms to general loss functions. To address this need, we develop a new computational method that is much faster than the existing joint analysis methods and rivals the runtimes of two-stage analysis. The proposed method, HierFabs, adopts the framework of the forward and backward stagewise algorithm and enjoys computational efficiency and broad applicability. To accommodate the hierarchy without imposing additional constraints, it uses newly developed forward and backward steps. It naturally accommodates both strong and weak hierarchy and makes optimization much simpler and faster than in existing studies. The optimality of HierFabs sequences is investigated theoretically. Simulations show that it outperforms the existing methods. The analysis of TCGA data on melanoma demonstrates its competitive practical performance.
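For readers unfamiliar with stagewise fitting, the following minimal Python sketch illustrates the plain forward stagewise idea underlying the framework HierFabs adopts: at each step, the coefficient of the feature most correlated with the current residual is nudged by a small fixed amount. It is an illustration under stated assumptions only (squared-error loss, standardized features, a hypothetical step size `eps`); it omits the backward steps and the hierarchy-respecting moves described in the abstract and is not the authors' implementation.

```python
def forward_stagewise(X, y, eps=0.01, n_steps=1000, tol=1e-12):
    """Incremental forward stagewise fitting for squared-error loss (illustrative sketch).

    X: list of n rows, each a list of p (standardized) feature values.
    y: list of n (centered) responses.
    Returns the final coefficients and the path of coefficient vectors.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    path = []
    for _ in range(n_steps):
        # Inner product of each feature with the current residual
        # (proportional to the negative gradient of the squared-error loss).
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j_best = max(range(p), key=lambda j: abs(corr[j]))
        if abs(corr[j_best]) < tol:
            break  # residual is numerically orthogonal to every feature
        # Move the chosen coefficient by a small fixed step in the descent direction.
        step = eps if corr[j_best] > 0 else -eps
        beta[j_best] += step
        for i in range(n):
            residual[i] -= step * X[i][j_best]
        path.append(list(beta))
    return beta, path


# Toy data with orthogonal columns; least-squares coefficients are (2.0, 0.5).
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.5, -2.0, -0.5]
beta, path = forward_stagewise(X, y)
```

With a small step size the path creeps toward the least-squares fit and ends within about `eps` of it; the early part of the path, where only some coefficients are nonzero, is what makes stagewise methods useful for selection.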
37
Hierarchy‐assisted gene expression regulatory network analysis. Stat Anal Data Min 2023. [DOI: 10.1002/sam.11609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
38
Spatio-temporally smoothed deep survival neural network. J Biomed Inform 2023; 137:104255. [PMID: 36462600 PMCID: PMC9845179 DOI: 10.1016/j.jbi.2022.104255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 11/16/2022] [Accepted: 11/24/2022] [Indexed: 12/03/2022]
Abstract
The analysis of registry data has important implications for cancer monitoring, control, and treatment. In such analysis, (semi)parametric models, such as the Cox proportional hazards model, have been routinely adopted. In recent years, deep neural networks (DNNs) have been shown to excel in many fields owing to their flexibility and superior prediction performance, and they have been applied to the analysis of cancer survival data. Cancer registry data usually have broad spatial and temporal coverage, leading to significant heterogeneity. Published studies have suggested that it is not sensible to fit one model for all spatial and temporal locations combined. On the other hand, it is inefficient to fit a separate model for each spatial/temporal location. Motivated by such considerations, in this study, we develop a spatio-temporally smoothed DNN approach for the analysis of cancer registry data with a (censored) survival outcome. This approach can accommodate the significant differences across time and space while recognizing that spatial and temporal changes are smooth. It is effectively realized via cutting-edge optimization techniques. To draw more definitive conclusions, we also develop an approach for assessing the importance of each individual input variable. Data on head and neck cancer (HNC) and pancreatic cancer from the Surveillance, Epidemiology, and End Results (SEER) database are analyzed. Compared to direct competitors, the proposed approach leads to smoother network architectures. Evaluated using the time-dependent concordance index, it has better prediction performance. The important variables are also biomedically sensible. Overall, this study can deliver a new and effective tool for deciphering cancer survival at the population level.
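The time-dependent concordance index used for evaluation above is a generalization of Harrell's C-index. As a minimal sketch, the Python function below computes the simpler Harrell's C for right-censored data; it is a stand-in for illustration, not the time-dependent variant (which compares predicted survival curves at each event time), and the function name and inputs are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data (sketch).

    times:  observed follow-up times
    events: 1 if the event was observed at that time, 0 if censored
    risk:   predicted risk scores (higher means an earlier event is expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for j in range(n):
            # Pair (i, j) is usable when i's observed event strictly precedes j's time;
            # pairs with tied times are simply skipped in this sketch.
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable if comparable else float("nan")
```

A value of 1.0 means every usable pair is ordered correctly by the predicted risks, and 0.5 corresponds to random ordering.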
39
Hierarchical cancer heterogeneity analysis based on histopathological imaging features. Biometrics 2022; 78:1579-1591. [PMID: 34390584 PMCID: PMC8995088 DOI: 10.1111/biom.13544] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/11/2021] [Revised: 08/01/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022]
Abstract
In cancer research, supervised heterogeneity analysis has important implications. Such analysis has traditionally been based on clinical, demographic, and molecular variables. Recently, histopathological imaging features, which are generated as a byproduct of biopsy, have been shown to be effective for modeling cancer outcomes, and a handful of supervised heterogeneity analyses have been conducted based on such features. There are two types of histopathological imaging features: those extracted based on specific biological knowledge and those extracted using automated image processing software. Using both types of histopathological imaging features, our goal is to conduct the first supervised cancer heterogeneity analysis that satisfies a hierarchical structure. That is, the first type of imaging features defines a rough structure, and the second type defines a nested and more refined structure. A penalization approach is developed, which has been motivated by, but differs significantly from, penalized fusion and sparse group penalization. It has satisfactory statistical and numerical properties. In the analysis of lung adenocarcinoma data, it identifies a heterogeneity structure significantly different from the alternatives and has satisfactory prediction and stability performance.
40
Multidimensional molecular measurements-environment interaction analysis for disease outcomes. Biometrics 2022; 78:1542-1554. [PMID: 34213006 PMCID: PMC9366385 DOI: 10.1111/biom.13526] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/06/2020] [Revised: 02/27/2021] [Accepted: 06/28/2021] [Indexed: 12/30/2022]
Abstract
Multiple types of molecular (genetic, genomic, epigenetic, etc.) measurements, environmental risk factors, and their interactions have been found to contribute to the outcomes and phenotypes of complex diseases. Previous studies have each analyzed only the interactions between a single type of molecular measurement and environmental risk factors. In recent biomedical studies, multidimensional profiling, in which data from multiple types of molecular measurements are collected from the same subjects, is becoming popular. A myriad of recent studies have shown that collectively analyzing multiple types of molecular measurements is not only biologically sensible but also leads to improved estimation and prediction. In this study, we conduct an M-E interaction analysis, with M standing for multidimensional molecular measurements and E standing for environmental risk factors. This analysis can accommodate multiple types of molecular measurements and adequately account for both their overlapping and their independent information. Extensive simulation shows that it outperforms several closely related alternatives. In the analysis of TCGA (The Cancer Genome Atlas) data on lung adenocarcinoma and cutaneous melanoma, we make some stable biological findings and achieve stable prediction.
41
A high-efficiency differential expression method for cancer heterogeneity using large-scale single-cell RNA-sequencing data. Front Genet 2022; 13:1063130. [DOI: 10.3389/fgene.2022.1063130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 11/14/2022] [Indexed: 12/03/2022]
Abstract
Colorectal cancer is a highly heterogeneous disease, and tumor heterogeneity limits the efficacy of cancer treatment. Single-cell RNA-sequencing (scRNA-seq) technology is a powerful tool for studying cancer heterogeneity at cellular resolution. The sparsity, heterogeneous diversity, and fast-growing scale of scRNA-seq data pose challenges to the flexibility, accuracy, and computing efficiency of differential expression (DE) methods. We proposed HEART (high-efficiency and robust test), a statistical combination test that can detect DE genes with various sources of differences beyond mean expression changes. To validate the performance of HEART, we compared HEART with six other popular DE methods on various simulation datasets with different settings, produced by two simulation data generation mechanisms. HEART had high accuracy (F1 score >0.75) and excellent computational efficiency (less than 2 min) on multiple simulation datasets in various experimental settings. HEART performed well in DE gene detection for the PBMC68K dataset quantified by UMI counts and the human brain single-cell dataset quantified by read counts (F1 scores of 0.79 and 0.65, respectively). By applying HEART to the single-cell dataset of a colorectal cancer patient, we found several potential blood-based biomarkers (CTTN, S100A4, S100A6, UBA52, FAU, and VIM) associated with colorectal cancer metastasis and validated them on additional spatial transcriptomic data from other colorectal cancer patients.
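The F1 scores quoted above combine the precision and recall of a detected DE gene set against the true DE genes. A minimal Python sketch, assuming both sets are given as collections of gene identifiers (the identifiers in the usage line are hypothetical placeholders, not results from the article):

```python
def f1_score(predicted, truth):
    """F1 = harmonic mean of precision and recall for a detected gene set."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # genes correctly called differentially expressed
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 3 of 4 calls are correct (precision 0.75),
# and 3 of 5 true DE genes are found (recall 0.6).
score = f1_score({"G1", "G2", "G3", "G4"}, {"G1", "G2", "G3", "G5", "G6"})
```

Because F1 penalizes both false discoveries and missed genes, a threshold such as F1 > 0.75 requires a method to be reasonably good on both counts at once.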
42
The Feasibility of Quad-Modal PET/SPECT/Spectral-CT/CBCT On-Board Imaging in a Small-Animal Radiation Therapy Platform. Int J Radiat Oncol Biol Phys 2022. [DOI: 10.1016/j.ijrobp.2022.07.557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
43
A Linguistic Analysis of News Coverage of E-Healthcare in China with a Heterogeneous Graphical Model. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1557. [PMID: 36359647 PMCID: PMC9689216 DOI: 10.3390/e24111557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 10/23/2022] [Accepted: 10/26/2022] [Indexed: 06/16/2023]
Abstract
E-healthcare has been envisaged as a major component of the infrastructure of modern healthcare and has been developing rapidly in China. In healthcare, news media can play an important role in raising public interest in, and utilization of, a particular service and in complicating (and perhaps clouding) debate on public health policy issues. We conducted a linguistic analysis of news reports related to E-healthcare in mainland China from January 2015 to June 2021, using a heterogeneous graphical modeling approach. This approach can simultaneously cluster the datasets and estimate the conditional dependence relationships of keywords. Eight phases of media coverage were identified. The focuses and main topics of media coverage were extracted based on network hub and module detection. The temporal patterns of media reports were found to be mostly consistent with the policy trend. Specifically, in the policy embryonic period (2015-2016), two phases were obtained; industry management was the main topic, and policy and regulation were the focuses of media coverage. In the policy development period (2017-2019), four phases were discovered, and all four main topics, namely industry development, health care, financial market, and industry management, were present. From 2017 Q3 to 2017 Q4, the major focuses of media coverage included social security, healthcare and reform, and others. In 2018 Q1, industry regulation and finance became the focuses. In the policy outbreak period (2020 onward), two phases were discovered. Financial market and industry management were the main topics, and medical insurance and healthcare for the elderly became the focuses. This analysis can offer insights into how the media respond to public policy for E-healthcare, which can be valuable for the government, public health practitioners, health care industry investors, and others.
44
A tree-based gene-environment interaction analysis with rare features. Stat Anal Data Min 2022; 15:648-674. [PMID: 38046814 PMCID: PMC10691867 DOI: 10.1002/sam.11578] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/14/2021] [Accepted: 02/14/2022] [Indexed: 01/20/2023]
Abstract
Gene-environment (G-E) interaction analysis plays a critical role in understanding and modeling complex diseases. Compared to main-effect-only analysis, it is more seriously challenged by higher dimensionality, weaker signals, and the unique "main effects, interactions" variable selection hierarchy. In joint G-E interaction analysis, under which a large number of G factors are analysed in a single model, effort tailored to rare features (e.g., SNPs with low minor allele frequencies) has been limited. Existing investigations of rare features have mostly focused on marginal analysis, in which various data aggregation techniques have been developed and hypothesis tests have been conducted to identify significant aggregated features. However, such techniques cannot be extended to joint G-E interaction analysis. In this study, building on a very recent tree-based data aggregation technique developed for main-effect-only analysis, we develop a new G-E interaction analysis approach tailored to rare features. The adopted data aggregation technique allows for more efficient information borrowing from neighboring rare features. Similar to some existing state-of-the-art approaches, the proposed approach adopts penalization for variable selection, regularized estimation, and respect of the variable selection hierarchy. Simulation shows that it identifies important interactions and main effects more accurately than several competing alternatives. In the analysis of the NFBC1966 study, the proposed approach leads to findings different from those of the alternatives, with satisfactory prediction and stability performance.
45
Effects of betaine supplementation on reproductive performance of breeding geese. Br Poult Sci 2022; 64:283-288. [PMID: 36164766 DOI: 10.1080/00071668.2022.2128988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
1. A feeding experiment with three dietary concentrations of betaine was conducted in breeding geese to analyse reproductive performance, serum biochemical indexes, egg quality, and intestinal immunity.
2. A total of 450 female and 90 male Jiangnan White breeding geese were divided into three treatments, each with five pen replicates of 30 female and 6 male geese.
3. Dietary betaine level had no significant effect on reproductive performance, serum biochemical indexes, or jejunal villus goblet cells (P>0.05). Compared with the control group, adding 2.5 g/kg betaine to the diet showed a tendency to increase egg mass (P>0.05) and significantly increased the betaine content of the yolk (P<0.05). Feeding betaine significantly increased jejunal villus height and egg yolk total cholesterol (TCHOL) content in female geese (P<0.05).
4. In conclusion, adding betaine to the goose diet improved intestinal structure and increased egg production. Adding 2.5 g/kg betaine to feed significantly increased the TCHOL and betaine content of goose eggs.
46
Default risk prediction and feature extraction using a penalized deep neural network. STATISTICS AND COMPUTING 2022; 32:76. [PMID: 36124203 PMCID: PMC9476445 DOI: 10.1007/s11222-022-10140-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 08/26/2022] [Indexed: 06/15/2023]
Abstract
Online peer-to-peer lending platforms provide loans directly from lenders to borrowers without passing through traditional financial institutions. For lenders on these platforms to avoid loss, it is crucial that they accurately assess default risk so that they can make appropriate decisions. In this study, we develop a penalized deep learning model to predict default risk based on survival data. Rather than simply predicting whether default will occur, we focus on predicting the probability of default over time. Moreover, by adding a one-to-one layer to the neural network and incorporating an L1 penalty into the objective function, we achieve feature selection and estimation simultaneously. The minibatch gradient descent algorithm makes it possible to handle massive data. An analysis of real-world loan data and simulations demonstrate the model's competitive practical performance, which suggests favorable potential applications in peer-to-peer lending platforms.
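The feature-selection mechanism described above, a one-to-one layer whose weights carry an L1 penalty, can be illustrated with a proximal (soft-thresholding) update, which is one common way to make penalized weights reach exactly zero. The Python sketch below is an illustration under assumptions, not the authors' training procedure (the paper may optimize the penalty differently): it assumes the gradient of the unpenalized minibatch loss with respect to the one-to-one weights has already been computed, and all function names are hypothetical. An input feature whose one-to-one weight is shrunk to zero is effectively removed from the network.

```python
def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t, clipping at zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def one_to_one_layer(x, w):
    """Element-wise layer: input feature j is multiplied by its own weight w[j]."""
    return [xj * wj for xj, wj in zip(x, w)]


def proximal_l1_step(w, grad, lr, lam):
    """One proximal-gradient update of the one-to-one weights (hypothetical helper).

    grad: gradient of the unpenalized minibatch loss w.r.t. w, assumed precomputed.
    lr:   learning rate; lam: L1 penalty strength.
    """
    return [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]


def selected_features(w):
    """Indices of input features whose one-to-one weight is nonzero."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


# Zero loss gradient, so only the penalty acts: small weights are set to zero.
w_new = proximal_l1_step([0.5, 0.02, -0.3], [0.0, 0.0, 0.0], lr=0.1, lam=0.5)
```

Here the threshold is `lr * lam = 0.05`, so the second weight (0.02) is zeroed and its feature drops out, while the other two weights are shrunk by 0.05.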
47
Lq-based robust analytics on ultrahigh and high dimensional data. Stat Med 2022; 41:5220-5241. [PMID: 36098057 DOI: 10.1002/sim.9563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Revised: 06/02/2022] [Accepted: 08/02/2022] [Indexed: 11/10/2022]
Abstract
Ultrahigh and high dimensional data are common in regression analysis in various fields, such as omics, finance, and biological engineering. In addition to the problem of dimensionality, the data might also be contaminated. There are two main types of contamination: outliers and model misspecification. We develop a unique method that takes into account the ultrahigh or high dimensional issue and both types of contamination. In this article, we propose a framework for feature screening and selection based on minimum Lq-likelihood estimation (MLqE), which accounts for model misspecification contamination and has also been shown to be robust to outliers. In numerical analysis, we explore the robustness of this framework under different outlier and model misspecification scenarios. To examine the performance of this framework, we conduct real data analysis using skin cutaneous melanoma data. Compared with traditional screening and feature selection methods, the proposed method shows superiority in both variable identification effectiveness and parameter estimation accuracy.
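As background to the MLqE-based framework, the Lq-likelihood replaces the logarithm in the log-likelihood with L_q(u) = (u^(1-q) - 1)/(1-q) for q != 1, which recovers log(u) as q approaches 1. The Python sketch below only evaluates this objective for a linear model with Gaussian errors of known scale; the Gaussian working model, the known `sigma`, and the function names are illustrative assumptions, and the screening and selection steps of the article are not shown. For q < 1, each observation's contribution is bounded below by -1/(1-q), which caps the influence of a gross outlier.

```python
import math


def lq(u, q):
    """Lq transform (u**(1-q) - 1) / (1-q); equals log(u) when q == 1."""
    if q == 1.0:
        return math.log(u)
    # For q < 1, a density that underflows to 0.0 gives the finite floor -1/(1-q).
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def gaussian_pdf(r, sigma):
    """Normal density of residual r with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def lq_objective(beta, X, y, sigma, q):
    """Sum of Lq-transformed Gaussian densities of linear-model residuals.

    An MLqE-type estimator would maximize this over beta; here it is only evaluated.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        r = yi - sum(b * x for b, x in zip(beta, xi))
        total += lq(gaussian_pdf(r, sigma), q)
    return total


X = [[1.0], [2.0]]
clean = lq_objective([1.0], X, [1.0, 2.0], 1.0, 0.5)
# A gross outlier in the second response changes that term only down to the floor -2.0.
outlier = lq_objective([1.0], X, [1.0, 1000.0], 1.0, 0.5)
```

With the ordinary log-likelihood (q = 1) the same outlier would dominate the objective (its log-density is about -498,000), whereas with q = 0.5 its contribution cannot fall below -2.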
48
EP14.01-021 Anlotinib Plus Irinotecan or Docetaxel in Small-Cell Lung Cancer (SCLC) Relapsed within Six Months: a Single-Arm Phase II Study. J Thorac Oncol 2022. [DOI: 10.1016/j.jtho.2022.07.956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
49
A nested case-control study of serum polychlorinated biphenyls and papillary thyroid cancer risk among U.S. military service members. ENVIRONMENTAL RESEARCH 2022; 212:113367. [PMID: 35504340 PMCID: PMC9238631 DOI: 10.1016/j.envres.2022.113367] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/21/2021] [Revised: 04/19/2022] [Accepted: 04/21/2022] [Indexed: 05/09/2023]
Abstract
BACKGROUND AND OBJECTIVES Although polychlorinated biphenyls (PCBs) were banned decades ago, populations are continuously exposed to PCBs due to their persistence and bioaccumulation/biomagnification in the environment. Results from limited epidemiologic studies linking PCBs to thyroid cancer have been inconclusive. This study aimed to investigate the associations of individual PCBs and a PCB mixture with papillary thyroid cancer (PTC), the most common histologic subtype of thyroid cancer.
METHODS We carried out a nested case-control study including 742 histologically confirmed PTC cases diagnosed in 2000-2013 and 742 individually matched controls among U.S. military service members. Pre-diagnostic serum samples, collected on average nine years before PTC diagnosis, were used to measure PCB congeners by gas chromatography isotope dilution high-resolution mass spectrometry (GC/ID-HRMS). Conditional logistic regression, Bayesian kernel machine regression (BKMR), and weighted quantile sum (WQS) regression were employed to estimate the associations of single PCB congeners, as well as their mixture, with PTC.
RESULTS Four PCB congeners (PCB-74, PCB-99, PCB-105, and PCB-118) had significant associations and dose-response relationships with increased risk of PTC in single-congener models. When considering the effects of all measured PCBs and their potential interactions in the BKMR model, PCB-118 showed positive trends of association with PTC. Increased exposure to the PCB congeners as a mixture was also associated with an increased risk of PTC in the WQS model, with the mixture dominated by PCB-118, followed by PCB-74 and PCB-99. One PCB congener, PCB-187, showed an inverse trend of association with PTC in the mixture analysis.
DISCUSSION This study suggests that exposure to certain PCBs, as well as to a mixture of PCBs, was associated with an increased risk of PTC. The observed association was mainly driven by PCB-118 and, to a lesser extent, by PCB-74 and PCB-99. The findings warrant further investigation.
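The weighted quantile sum (WQS) index used in this study summarizes a mixture as a weighted sum of quantile-scored exposures, with non-negative weights that sum to one. The Python sketch below only constructs that index for given weights; in WQS regression the weights are estimated (typically with bootstrap resampling) and the index is then entered into a regression model, steps not shown here. Quartile scoring, the tie handling, the toy exposure values, and the function names are illustrative assumptions, not data or code from the study.

```python
def quantile_scores(values, n_q=4):
    """Score each observation 0..n_q-1 by its rank (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * n_q // n
    return scores


def wqs_index(exposures, weights, n_q=4):
    """Weighted sum of quantile scores across exposures (hypothetical helper).

    exposures: one list of measurements per congener, same subjects in the same order.
    weights:   non-negative weights summing to 1, one per congener.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    scored = [quantile_scores(col, n_q) for col in exposures]
    n = len(exposures[0])
    return [sum(w * s[i] for w, s in zip(weights, scored)) for i in range(n)]


# Toy data: two exposures measured on 8 subjects, with made-up weights.
exposures = [[10, 20, 30, 40, 50, 60, 70, 80], [80, 70, 60, 50, 40, 30, 20, 10]]
index = wqs_index(exposures, [0.75, 0.25])
```

Because each weight is the share of the index attributed to one exposure, a mixture "dominated by PCB-118" corresponds to PCB-118 receiving the largest estimated weight.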
50
EP05.01-031 Lysimachia Capillipes Capilliposide C Enhances the Radiosensitivity of Lung Cancer by Promoting ERRFI1 via Inhibiting Phosphorylation of STAT3. J Thorac Oncol 2022. [DOI: 10.1016/j.jtho.2022.07.478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]