1
Peng C, Wang J, Asante I, Louie S, Jin R, Chatzi L, Casey G, Thomas DC, Conti DV. A latent unknown clustering integrating multi-omics data (LUCID) with phenotypic traits. Bioinformatics 2019; 36:842-850. [PMID: 31504184 PMCID: PMC7986585 DOI: 10.1093/bioinformatics/btz667] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/14/2019] [Revised: 08/04/2019] [Accepted: 08/21/2019] [Indexed: 01/31/2023]
Abstract
MOTIVATION: Epidemiologic, clinical and translational studies are increasingly generating multiplatform omics data. Methods that can integrate across multiple high-dimensional data types while accounting for differential patterns are critical for uncovering novel associations and underlying relevant subgroups.
RESULTS: We propose an integrative model to estimate latent unknown clusters (LUCID) aiming to both distinguish unique genomic, exposure and informative biomarkers/omic effects while jointly estimating subgroups relevant to the outcome of interest. Simulation studies indicate that we can obtain consistent estimates reflective of the true simulated values, accurately estimate subgroups and recapitulate subgroup-specific effects. We also demonstrate the use of the integrated model for future prediction of risk subgroups and phenotypes. We apply this approach to two real data applications to highlight the integration of genomic, exposure and metabolomic data.
AVAILABILITY AND IMPLEMENTATION: The LUCID method is implemented through the LUCIDus R package available on CRAN (https://CRAN.R-project.org/package=LUCIDus).
SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Cheng Peng
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
- Jun Wang
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
- Isaac Asante
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Stan Louie
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Ran Jin
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
- Lida Chatzi
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
- Graham Casey
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Duncan C Thomas
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
2
Schmit SL, Figueiredo JC, Cortessis VK, Thomas DC. The Influence of Screening for Precancerous Lesions on Family-Based Genetic Association Tests: An Example of Colorectal Polyps and Cancer. Am J Epidemiol 2015; 182:714-22. [PMID: 26306664 DOI: 10.1093/aje/kwv128] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/15/2014] [Accepted: 05/05/2015] [Indexed: 11/12/2022]
Abstract
Unintended consequences of secondary prevention include potential introduction of bias into epidemiologic studies estimating genotype-disease associations. To better understand such bias, we simulated a family-based study of colorectal cancer (CRC), which can be prevented by resecting screen-detected polyps. We simulated genes related to CRC development through risk of polyps (G1), risk of CRC but not polyps (G2), and progression from polyp to CRC (G3). Then, we examined 4 analytical strategies for studying diseases subject to secondary prevention, comparing the following: 1) CRC cases with all controls, without adjusting for polyp history; 2) CRC cases with controls, adjusting for polyp history; 3) CRC cases with only polyp-free controls; and 4) cases with either CRC or polyps with controls having neither. Strategy 1 yielded estimates of association between CRC and each G that were not substantially biased. Strategies 2-4 yielded biased estimates varying in direction according to analysis strategy and gene type. Type I errors were correct, but strategy 1 provided greater power for estimating associations with G2 and G3. We also applied each strategy to case-control data from the Colon Cancer Family Registry (1997-2007). Generally, the best analytical option balancing bias and power is to compare all CRC cases with all controls, ignoring polyps.
3
Magzamen S, Van Sickle D, Rose LD, Cronk C. Environmental pediatrics. Pediatr Ann 2011; 40:144-51. [PMID: 21417205 DOI: 10.3928/00904481-20110217-08] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Affiliation(s)
- Sheryl Magzamen
- Department of Population Health Sciences, University of Wisconsin, Madison, WI, USA
4
Joshi AD, Corral R, Siegmund KD, Haile RW, Le Marchand L, Martínez ME, Ahnen DJ, Sandler RS, Lance P, Stern MC. Red meat and poultry intake, polymorphisms in the nucleotide excision repair and mismatch repair pathways and colorectal cancer risk. Carcinogenesis 2008; 30:472-9. [PMID: 19029193 DOI: 10.1093/carcin/bgn260] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 12/21/2022]
Abstract
Diets high in red meat have been consistently associated with colorectal cancer (CRC) risk and may result in exposure to carcinogens that cause DNA damage [i.e., polycyclic aromatic hydrocarbons, heterocyclic amines (HCAs) and N-nitroso compounds]. Using a family-based study, we investigated whether polymorphisms in the nucleotide excision repair (NER) (ERCC1 3' untranslated region (UTR) G/T, XPD Asp312Asn and Lys751Gln, XPC intron 11 C/A, XPA 5' UTR C/T, XPF Arg415Gln and XPG Asp1104His) and mismatch repair (MLH1 Ile219Val and MSH2 Gly322Asp) pathways modified the association with red meat and poultry intake. We tested for gene-environment interactions using case-only analyses (n = 577) and compared the results using case-unaffected sibling comparisons (n = 307 sibships). Increased risk of CRC was observed for intake of three or more servings per week of red meat [odds ratio (OR) = 1.8, 95% confidence interval (CI) = 1.3-2.5] or high-temperature cooked red meat (OR = 1.6, 95% CI = 1.1-2.2). Intake of red meat heavily brown on the outside or inside increased CRC risk only among subjects who carried the XPD codon 751 Lys/Lys genotype (case-only interaction P = 0.006 and P = 0.001, respectively, for doneness outside or inside) or the XPD codon 312 Asp/Asp genotype (case-only interaction P = 0.090 and P < 0.001, respectively). These interactions were stronger for rectal cancer cases (heterogeneity test P = 0.002 for XPD Asp312Asn and P = 0.03 for XPD Lys751Gln) and remained statistically significant after accounting for multiple testing. Case-unaffected sibling analyses were generally supportive of the case-only results. These findings highlight the possible contribution of diets high in red meat to the formation of lesions that elicit the NER pathway, such as carcinogen-induced bulky adducts.
Affiliation(s)
- Amit D Joshi
- Department of Preventive Medicine, Keck School of Medicine, Norris Comprehensive Cancer Center, University of Southern California, CA 90089, USA
5
Thomas DC. Multistage sampling for latent variable models. Lifetime Data Anal 2007; 13:565-581. [PMID: 17943440 DOI: 10.1007/s10985-007-9061-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 08/23/2007] [Accepted: 09/12/2007] [Indexed: 05/25/2023]
Abstract
I consider the design of multistage sampling schemes for epidemiologic studies involving latent variable models, with surrogate measurements of the latent variables on a subset of subjects. Such models arise in various situations: when detailed exposure measurements are combined with variables that can be used to assign exposures to unmeasured subjects; when biomarkers are obtained to assess an unobserved pathophysiologic process; or when additional information is to be obtained on confounding or modifying variables. In such situations, it may be possible to stratify the subsample on data available for all subjects in the main study, such as outcomes, exposure predictors, or geographic locations. Three circumstances where analytic calculations of the optimal design are possible are considered: (i) when all variables are binary; (ii) when all are normally distributed; and (iii) when the latent variable and its measurement are normally distributed, but the outcome is binary. In each of these cases, it is often possible to considerably improve the cost efficiency of the design by appropriate selection of the sampling fractions. More complex situations arise when the data are spatially distributed: the spatial correlation can be exploited to improve exposure assignment for unmeasured locations using available measurements on neighboring locations; some approaches for informative selection of the measurement sample using location and/or exposure predictor data are considered.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, 1540 Alcazar St., CHP-220, Los Angeles, CA 90089-9011, USA.
6
Abstract
Designs that involve families (the traditional strength of genetic epidemiology) and population-based sampling (the traditional strength of environmental epidemiology) allow investigation of both genes and environment, separately or together, and allow valid inference to the population. These case-control-family designs (including those involving twin pairs), can be regarded as retrospective cohort studies of relatives, and can be used for: determining familial risks and genetic models; estimating risk (penetrance) for measured genotypes; genetic association studies; stratifying risks by family history and known mutation status; and studying modifiers of risk in genetically susceptible individuals. Follow-up of families allows genetic and environmental risks to be studied prospectively. We discuss statistical methods, theoretical and practical strengths, limitations, and other issues. Given their versatility, population-based family studies could become a principal framework in epidemiology, and move genetics from its traditional focus on high-risk families to give it a wider clinical and population health relevance.
Affiliation(s)
- John L Hopper
- University of Melbourne, Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, 723 Swanston Street, Carlton, Victoria 3053, Australia.
7
Thomas DC, Haile RW, Duggan D. Recent developments in genomewide association scans: a workshop summary and review. Am J Hum Genet 2005; 77:337-45. [PMID: 16080110 PMCID: PMC1226200 DOI: 10.1086/432962] [Citation(s) in RCA: 162] [Impact Index Per Article: 8.5] [Received: 06/14/2005] [Accepted: 06/20/2005] [Indexed: 01/18/2023]
Abstract
With the imminent availability of ultra-high-volume genotyping platforms (on the order of 100,000-1,000,000 genotypes per sample) at a manageable cost, there is growing interest in the possibility of conducting genomewide association studies for a variety of diseases but, so far, little consensus on methods to design and analyze them. In April 2005, an international group of >100 investigators convened at the University of Southern California over the course of 2 days to compare notes on planned or ongoing studies and to debate alternative technologies, study designs, and statistical methods. This report summarizes these discussions in the context of the relevant literature. A broad consensus emerged that the time was now ripe for launching such studies, and several common themes were identified--most notably the considerable efficiency gains of multistage sampling design, specifically those made by testing only a portion of the subjects with a high-density genomewide technology, followed by testing additional subjects and/or additional SNPs at regions identified by this initial scan.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9011, USA.
8
Abstract
We consider two-stage case-control designs for testing associations between single nucleotide polymorphisms (SNPs) and disease, in which a subsample of subjects is used to select a panel of "tagging" SNPs that will be considered in the main study. We propose a pseudolikelihood [Pepe and Fleming, 1991: JASA 86:108-113] that combines the information from both the main study and the substudy to test the association with any polymorphism in the original set. SNP-tagging [Chapman et al., 2003: Hum Hered 56:18-31] and haplotype-tagging [Stram et al., 2003a: Hum Hered 55:27-36] approaches are compared. We show that the cost-efficiency of such a design for estimating the relative risk associated with the causal polymorphism can be considerably better than for a single-stage design, even if the causal polymorphism is not included in the tag-SNP set. We also consider the optimal selection of cases and controls in such designs and the relative efficiency for estimating the location of a causal variant in linkage disequilibrium mapping. Nevertheless, as the cost of high-volume genotyping plummets and haplotype tagging information from the International HapMap project [Gibbs et al., 2003: Nature 426:789-796] rapidly accumulates in public databases, such two-stage designs may soon become unnecessary.
Affiliation(s)
- Duncan Thomas
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089-9011, USA.
9
Chen K, Cai J, Liu XY, Ma XY, Yao KY, Zheng S. Nested case-control study on the risk factors of colorectal cancer. World J Gastroenterol 2003; 9:99-103. [PMID: 12508360 PMCID: PMC4728259 DOI: 10.3748/wjg.v9.i1.99] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Abstract
AIM: To investigate the risk factors of colon cancer and rectal cancer.
METHODS: A nested case-control study was conducted in a cohort of 64693 subjects who participated in a colorectal cancer screening program from 1989 to 1998 in Jiashan county, Zhejiang, China. The 196 cases of colorectal cancer detected from 1990 to 1998 formed the case group, and 980 subjects without colorectal cancer, matched on age, gender, and residential location, were randomly selected from the same cohort as controls. Using univariate analysis and multivariate conditional logistic regression analysis, the odds ratio (OR) and its 95% confidence interval (95%CI) were calculated for the associations between colorectal cancer and personal habits, dietary factors, and intestinal symptoms.
RESULTS: The multivariate analysis showed that, after matching on age, sex and residential location, a history of mucous blood stool and mixed sources of drinking water were closely associated with colon cancer and rectal cancer. The OR values for mucous blood stool history were 3.508 (95%CI: 1.370-8.985) and 2.139 (95%CI: 1.040-4.402), respectively; for mixed drinking water sources, they were 2.387 (95%CI: 1.243-4.587) and 1.951 (95%CI: 1.086-3.506), respectively. All were statistically significant, with P-values less than 0.05.
CONCLUSION: The study suggests that mucous blood stool history and mixed sources of drinking water are risk factors for colon cancer and rectal cancer. There was no significant association between dietary habits and the incidence of colorectal cancer.
Affiliation(s)
- Kun Chen
- Department of Epidemiology, Zhejiang University School of Public Health, Hangzhou, 310006 Zhejiang Province, China.
10
Easson AM, Cotterchio M, Crosby JA, Sutherland H, Dale D, Aronson M, Holowaty E, Gallinger S. A population-based study of the extent of surgical resection of potentially curable colon cancer. Ann Surg Oncol 2002; 9:380-7. [PMID: 11986190 DOI: 10.1007/bf02573873] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 02/07/2023]
Abstract
BACKGROUND: We attempted to determine factors contributing to the extent of initial curative resection for colon cancer in a population-based cohort. Total abdominal colectomy with ileorectal anastomosis (TAC-IR) may be considered for young patients or those with a colorectal cancer family history to prevent metachronous lesions and facilitate surveillance.
METHODS: All Ontario patients newly diagnosed with colon cancer over 12 months beginning in July 1997 were staged at the time of surgery. The extent of resection was compared with variables, including familial risk obtained from the Ontario Familial Colon Cancer Registry.
RESULTS: Complete staging was possible for 86% of patients. A total of 1223 patients had a potentially curative resection: 17%, 46%, and 36% were stage I, II, and III, respectively. Patients were more likely to receive a TAC-IR if they were ≤ 50 years old (odds ratio [OR], 3.5; 95% confidence interval [CI], 1.8-6.6), if they had a synchronous lesion (OR, 28.37; 95% CI, 12.2-61.2), or if they were at a teaching hospital (OR, 2.8; 95% CI, 1.6-4.7), but not if they had a family history (OR, 0.7; 95% CI, 0.3-1.5).
CONCLUSIONS: Young age, teaching hospital, and multiple cancers but not family history were important factors for performing a TAC-IR.
Affiliation(s)
- Alexandra M Easson
- Department of Surgical Oncology, Princess Margaret Hospital, Toronto, Ontario, Canada.
11
Ko CY, Rusin LC, Schoetz DJ, Moreau L, Coller JC, Murray JJ, Roberts PL, Marcello PW. Long-term outcomes of the ileal pouch anal anastomosis: the association of bowel function and quality of life 5 years after surgery. J Surg Res 2001; 98:102-7. [PMID: 11426437 DOI: 10.1006/jsre.2001.6171] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.5] [Indexed: 12/19/2022]
Abstract
INTRODUCTION: Previous studies have reported that mean health related quality of life (HRQL) levels generally attain normalcy following construction of an ileal pouch anal anastomosis (IPAA). It appears inconsistent, however, that these normal HRQL levels are achieved while bowel function (BF) scores generally remain statistically worse than "normal" (e.g., 4-8 stools/day, possible anal leakage, diaper usage). To investigate this inconsistency, the current study attempts to determine if any statistical associations are present between HRQL and BF, specifically in the long term. Multivariate regression analyses are performed using each of 8 individual HRQL domains against the full model of BF characteristics.
METHODS: All patients more than 5 years status post an ileal pouch anal anastomosis (IPAA) procedure for familial adenomatous polyposis (FAP) at a single institution were studied. FAP was chosen because patients are routinely asymptomatic preoperatively. BF (e.g., stool frequency, anal leakage) and HRQL (using the 8 health domains of the SF-36) were assessed by patient interview. Student's t tests and full model multivariate regression analyses were used to analyze associations between BF and HRQL.
RESULTS: The sample included 25 patients (14 male). Mean age was 39 years, mean follow-up time was 11 years. Although mean scores for the 8 individual HRQL domains were not statistically different from the general United States population, regression analyses of the different domains did demonstrate significant associations with varying levels of BF. While controlling for age and gender, the analyses show that the physical function domain is improved with the ability to pass flatus independent of stool, and physical role and mental health domains are improved with decreased stool frequency. The social function domain is improved with increased stool retention time, while the perception of general health is improved with less diaper usage and less sexual dysfunction.
CONCLUSIONS: This study shows that a statistically significant association between HRQL levels and BF is present. Of the numerous BF characteristics tested, five appear to be of greater importance with regard to certain HRQL domains. This finding may have clinical implications concerning pouch construction and surgical technique. Methodologically, this study demonstrates that merely using mean levels to describe HRQL may not elucidate meaningful relationships between important clinical outcomes, such as function and HRQL.
Affiliation(s)
- C Y Ko
- UCLA School of Medicine, Robert Wood Johnson Clinical Center, B-537 Factor Building, Los Angeles, CA 90095-1736.