Title: | Adaptive Sum of Powered Score Test |
---|---|
Description: | R codes for the (adaptive) Sum of Powered Score ('SPU' and 'aSPU') tests, inverse variance weighted Sum of Powered score ('SPUw' and 'aSPUw') tests and gene-based and some pathway based association tests (Pathway based Sum of Powered Score tests ('SPUpath'), adaptive 'SPUpath' ('aSPUpath') test, 'GEEaSPU' test for multiple traits - single 'SNP' (single nucleotide polymorphism) association in generalized estimation equations, 'MTaSPUs' test for multiple traits - single 'SNP' association with Genome Wide Association Studies ('GWAS') summary statistics, Gene-based Association Test that uses an extended 'Simes' procedure ('GATES'), Hybrid Set-based Test ('HYST') and extended version of 'GATES' test for pathway-based association testing ('GATES-Simes'). ). The tests can be used with genetic and other data sets with covariates. The response variable is binary or quantitative. Summary; (1) Single trait-'SNP' set association with individual-level data ('aSPU', 'aSPUw', 'aSPUr'), (2) Single trait-'SNP' set association with summary statistics ('aSPUs'), (3) Single trait-pathway association with individual-level data ('aSPUpath'), (4) Single trait-pathway association with summary statistics ('aSPUsPath'), (5) Multiple traits-single 'SNP' association with individual-level data ('GEEaSPU'), (6) Multiple traits- single 'SNP' association with summary statistics ('MTaSPUs'), (7) Multiple traits-'SNP' set association with summary statistics('MTaSPUsSet'), (8) Multiple traits-pathway association with summary statistics('MTaSPUsSetPath'). |
Authors: | Il-Youp Kwak and others (See Author(s) in each function manual) |
Maintainer: | Il-Youp Kwak <[email protected]> |
License: | GPL-3 |
Version: | 1.50 |
Built: | 2025-02-01 03:26:39 UTC |
Source: | https://github.com/ikwak2/aspu |
It gives p-values of the SPU tests and aSPU test.
aSPU( Y, X, cov = NULL, resample = c("perm", "boot", "sim"), model = c("gaussian", "binomial"), pow = c(1:8, Inf), n.perm = 1000, threshold = 10^5 )
aSPU( Y, X, cov = NULL, resample = c("perm", "boot", "sim"), model = c("gaussian", "binomial"), pow = c(1:8, Inf), n.perm = 1000, threshold = 10^5 )
Y |
Response or phenotype data. It can be a disease indicator; =0 for controls, =1 for cases. Or it can be a quantitative trait. A vector with length n (number of observations). |
X |
Genotype or other data; each row for a subject, and each column for an SNP (or a predictor). The value of each SNP is the # of the copies for an allele. A matrix with dimension n by k (n : number of observation, k : number of SNPs (or predictors) ). |
cov |
Covariates. A matrix with dimension n by p (n :number of observation, p : number of covariates). |
resample |
Use "perm" for residual permutations, "sim" for simulations from the null distribution, and "boot" for parametric bootstrap. |
model |
Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait. |
pow |
power used in SPU test. A vector of the powers. |
n.perm |
number of permutations or bootstraps. |
threshold |
When n.perm is less than the threshold, we use faster version of aSPU but use more memory. (default is 10^5) |
A list object, Ts : test statistics for the SPU tests (in the order of the specified pow) and finally for the aSPU test. pvs : p-values for the SPU and aSPU tests.
Il-Youp Kwak, Junghi Kim, Yiwei Zhang and Wei Pan
Wei Pan, Junghi Kim, Yiwei Zhang, Xiaotong Shen and Peng Wei (2014) A powerful and adaptive association test for rare variants, Genetics, 197(4), 1081-95
Junghi Kim, Jeffrey R Wozniak, Bryon A Mueller, Xiaotong Shen and Wei Pan (2014) Comparison of statistical tests for group differences in brain functional networks, NeuroImage, 1;101:681-694
data(exdat) ## example analysis using aSPU test on exdat data. # increase n.perm for better p.val ## Not run: out <- aSPU(exdat$Y, exdat$X, cov = NULL, resample = "boot", model = "binomial", pow = c(1:8, Inf), n.perm = 1000) ## End(Not run) out$Ts # This is a vector of Test Statistics for SPU and aSPU tests. # SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) # They are SPU test statistics. # The last element aSPU is minimum of them, aSPU statistic. out$pvs # This is a vector of p-values for SPU and aSPU tests. # SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) # They are p-values for corresponding SPU tests. # The last element is p-value of aSPU test.
data(exdat) ## example analysis using aSPU test on exdat data. # increase n.perm for better p.val ## Not run: out <- aSPU(exdat$Y, exdat$X, cov = NULL, resample = "boot", model = "binomial", pow = c(1:8, Inf), n.perm = 1000) ## End(Not run) out$Ts # This is a vector of Test Statistics for SPU and aSPU tests. # SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) # They are SPU test statistics. # The last element aSPU is minimum of them, aSPU statistic. out$pvs # This is a vector of p-values for SPU and aSPU tests. # SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) # They are p-values for corresponding SPU tests. # The last element is p-value of aSPU test.
It gives the p-values of the SPU(1), SPU(2) and minP tests and aSPUd test based on the asymptotic distribution.
aSPUd(Y, X, cov = NULL, model = c("gaussian", "binomial"))
aSPUd(Y, X, cov = NULL, model = c("gaussian", "binomial"))
Y |
phenotype data. It can be disease lables; =0 for controls, =1 for cases. or It can be any quantitative traits. Vector with length n (number of observations) |
X |
genotype data; each row for a subject, and each column for an SNP. The value of each element is the # of the copies for an allele. Matrix with dimension n by k (n : number of observation, k : number of genotype data) |
cov |
covariates. Matrix with dimension n by p (n :number of observation, p : number of covariates) |
model |
Use "gaussian" for quantitative trait (Default) , and Use "binomial" for binary trait. |
p-values for SPU(1), SPU(2), minP tests and aSPU test.
Gongjun Xu, Lifeng Lin, Peng Wei and Wei Pan (2016) An adaptive two-sample test for high-dimensional means, Biometrika (2016) 103 (3): 609-624.
data(exdat) out <- aSPUd(exdat$Y, exdat$X, cov = NULL, model = "binomial") out
data(exdat) out <- aSPUd(exdat$Y, exdat$X, cov = NULL, model = "binomial") out
It gives p-values of the SPUpath tests and aSPUpath test.
aSPUpath( Y, X, cov = NULL, model = c("binomial", "gaussian"), snp.info, gene.info, pow = c(1:8, Inf), pow2 = c(1, 2, 4, 8), n.perm = 200, usePCs = F, varprop = 0.95 )
aSPUpath( Y, X, cov = NULL, model = c("binomial", "gaussian"), snp.info, gene.info, pow = c(1:8, Inf), pow2 = c(1, 2, 4, 8), n.perm = 200, usePCs = F, varprop = 0.95 )
Y |
Response or phenotype data. It can be a disease indicator; =0 for controls, =1 for cases. Or it can be a quantitative trait. A vector with length n (number of observations). |
X |
Genotype or other data; each row for a subject, and each column for an SNP (or a predictor). The value of each SNP is the # of the copies for an allele. A matrix with dimension n by k (n : number of observation, k : number of SNPs (or predictors) ). |
cov |
Covariates. A matrix with dimension n by p (n :number of observation, p : number of covariates). |
model |
Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait. |
snp.info |
SNP information matrix, the 1st column is SNP id, 2nd column is chromosome #, 3rd column indicates SNP location. |
gene.info |
GENE information matrix, The 1st column is GENE id, 2nd column is chromosome #, 3rd and 4th column indicate start and end positions of the gene. |
pow |
SNP specific power(gamma values) used in SPUpath test. |
pow2 |
GENE specific power(gamma values) used in SPUpath test. |
n.perm |
number of permutations. |
usePCs |
indicating whether to extract PCs and then use PCs of X. |
varprop |
the proportion of the variations explained (cutoff) that determines how many top PCs to use. |
P-values for SPUpath tests and aSPUpath test.
Il-Youp Kwak and Wei Pan
Wei Pan, Il-Youp Kwak and Peng Wei (2015) A Powerful and Pathway-Based Adaptive Test for Genetic Association With Common or Rare Variants, The American Journal of Human Genetics, 97, 86-98
## Not run: dat1<-simPathAR1Snp(nGenes=20, nGenes1=5, nSNPlim=c(1, 20), nSNP0=1, LOR=.2, n=100, MAFlim=c(0.05, 0.4), p0=0.05 ) ## End(Not run) # p-values of SPUpath and aSPUpath tests. ## Not run: p.pathaspu<- aSPUpath(dat1$Y, dat1$X, snp.info = dat1$snp.info, gene.info = dat1$gene.info, model = "binomial", pow=1:8, pow2=c(1, 2, 4, 8), n.perm=1000) ## End(Not run) p.pathaspu ## pow = 1:8 and pow2 = 1,2,4,8 ## So, there are 8*4 = 32 SPUpath p-values. ## SPUpathi,j corresponds pow = i , pow2 = j ## The last element, aSPUpath gives aSPUpath p-value.
## Not run: dat1<-simPathAR1Snp(nGenes=20, nGenes1=5, nSNPlim=c(1, 20), nSNP0=1, LOR=.2, n=100, MAFlim=c(0.05, 0.4), p0=0.05 ) ## End(Not run) # p-values of SPUpath and aSPUpath tests. ## Not run: p.pathaspu<- aSPUpath(dat1$Y, dat1$X, snp.info = dat1$snp.info, gene.info = dat1$gene.info, model = "binomial", pow=1:8, pow2=c(1, 2, 4, 8), n.perm=1000) ## End(Not run) p.pathaspu ## pow = 1:8 and pow2 = 1,2,4,8 ## So, there are 8*4 = 32 SPUpath p-values. ## SPUpathi,j corresponds pow = i , pow2 = j ## The last element, aSPUpath gives aSPUpath p-value.
It gives p-values of the SPUpath tests and aSPUpath test. Faster than aSPUsPath function when n is large (N > 10^4).
aSPUpath2( Y, X, cov = NULL, model = c("binomial", "gaussian"), snp.info, gene.info, pow = c(1:8, Inf), pow2 = c(1, 2, 4, 8), n.perm = 200, usePCs = F, varprop = 0.95 )
aSPUpath2( Y, X, cov = NULL, model = c("binomial", "gaussian"), snp.info, gene.info, pow = c(1:8, Inf), pow2 = c(1, 2, 4, 8), n.perm = 200, usePCs = F, varprop = 0.95 )
Y |
Response or phenotype data. It can be a disease indicator; =0 for controls, =1 for cases. Or it can be a quantitative trait. A vector with length n (number of observations). |
X |
Genotype or other data; each row for a subject, and each column for an SNP (or a predictor). The value of each SNP is the # of the copies for an allele. A matrix with dimension n by k (n : number of observation, k : number of SNPs (or predictors) ). |
cov |
Covariates. A matrix with dimension n by p (n :number of observation, p : number of covariates). |
model |
Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait. |
snp.info |
SNP information matrix, the 1st column is SNP id, 2nd column is chromosome #, 3rd column indicates SNP location. |
gene.info |
GENE information matrix, The 1st column is GENE id, 2nd column is chromosome #, 3rd and 4th column indicate start and end positions of the gene. |
pow |
SNP specific power(gamma values) used in SPUpath test. |
pow2 |
GENE specific power(gamma values) used in SPUpath test. |
n.perm |
number of permutations. |
usePCs |
indicating whether to extract PCs and then use PCs of X. |
varprop |
the proportion of the variations explained (cutoff) that determines how many top PCs to use. |
P-values for SPUpath tests and aSPUpath test.
Il-Youp Kwak and Wei Pan
Wei Pan, Il-Youp Kwak and Peng Wei (2015) A Powerful and Pathway-Based Adaptive Test for Genetic Association With Common or Rare Variants, The American Journal of Human Genetics, 97, 86-98
## Not run: dat1<-simPathAR1Snp(nGenes=20, nGenes1=5, nSNPlim=c(1, 20), nSNP0=1, LOR=.2, n=100, MAFlim=c(0.05, 0.4), p0=0.05 ) ## End(Not run) # p-values of SPUpath and aSPUpath tests. ## Not run: p.pathaspu<- aSPUpath2(dat1$Y, dat1$X, snp.info = dat1$snp.info, gene.info = dat1$gene.info, model = "binomial", pow=1:8, pow2=c(1, 2, 4, 8), n.perm=1000) ## End(Not run) #p.pathaspu ## pow = 1:8 and pow2 = 1,2,4,8 ## So, there are 8*4 = 32 SPUpath p-values. ## SPUpathi,j corresponds pow = i , pow2 = j ## The last element, aSPUpath gives aSPUpath p-value.
## Not run: dat1<-simPathAR1Snp(nGenes=20, nGenes1=5, nSNPlim=c(1, 20), nSNP0=1, LOR=.2, n=100, MAFlim=c(0.05, 0.4), p0=0.05 ) ## End(Not run) # p-values of SPUpath and aSPUpath tests. ## Not run: p.pathaspu<- aSPUpath2(dat1$Y, dat1$X, snp.info = dat1$snp.info, gene.info = dat1$gene.info, model = "binomial", pow=1:8, pow2=c(1, 2, 4, 8), n.perm=1000) ## End(Not run) #p.pathaspu ## pow = 1:8 and pow2 = 1,2,4,8 ## So, there are 8*4 = 32 SPUpath p-values. ## SPUpathi,j corresponds pow = i , pow2 = j ## The last element, aSPUpath gives aSPUpath p-value.
The test is based on the Huber loss function and using the parametric bootstrap for inference (i.e. bootstrapping residuals).
aSPUr(Y, X, cov = NULL, pow = c(1:8, Inf), B = 1000, C = 1.345)
aSPUr(Y, X, cov = NULL, pow = c(1:8, Inf), B = 1000, C = 1.345)
Y |
a vector of quantitative traits (QTs). |
X |
Genotype or other data; each row for a subject, and each column for an SNP (or a predictor). The value of each SNP is the # of the copies for an allele. A matrix with dimension n by k (n : number of observation, k : number of SNPs (or predictors) ). |
cov |
Covariates. A matrix with dimension n by k2 (n :number of observation, k2 : number of covariates). |
pow |
power used in SPUr test. A vector of the powers. |
B |
number of bootstraps. |
C |
Constant in huber loss function. C = 1.345 is chosen to maintain a high efficiency for a Normal error. |
p-values of the SPUr tests in the order of supplied pow values; finally, the p-value of the aSPUr test (that combines the SPUs tests with pow by taking their min P-value and adjust for multiple testing).
Yiwei Zhang and Wei Pan
Peng Wei, Ying Cao, Yiwei Zhang, Zhiyuan Xu, Il-Youp Kwak, Eric Boerwinkle, Wei Pan (2016) On Robust Association Testing for Quantitative Traits and Rare Variants, G3, 6(12) 3941-3950.
data(exdat) ## example analysis using aSPU test on exdat data. QT <- jitter(exdat$Y) out <- aSPUr(Y = QT, X = exdat$X, cov = NULL, B = 100) out ## This is a vector of p-values for SPUr and aSPUr tests. ## SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) ## They are p-values for corresponding SPUr tests. ## The last element is p-value of aSPUr test.
data(exdat) ## example analysis using aSPU test on exdat data. QT <- jitter(exdat$Y) out <- aSPUr(Y = QT, X = exdat$X, cov = NULL, B = 100) out ## This is a vector of p-values for SPUr and aSPUr tests. ## SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) ## They are p-values for corresponding SPUr tests. ## The last element is p-value of aSPUr test.
It gives p-values of the SPUs tests and aSPUs test with GWAS summary statistics.
aSPUs(Zs, corSNP, pow = c(1:8, Inf), n.perm = 1000, Ps = FALSE, prune = TRUE)
aSPUs(Zs, corSNP, pow = c(1:8, Inf), n.perm = 1000, Ps = FALSE, prune = TRUE)
Zs |
Z-scores for each SNPs. It could be P-values if the Ps option is TRUE. |
corSNP |
Correlation matirx of the SNPs to be tested; estimated from a reference panel (based on the same set of the reference alleles as used in calculating Z-scores). |
pow |
power used in SPU test. A vector of the powers. |
n.perm |
number of permutations or bootstraps. |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
prune |
if it is TRUE, do pruing before the test using pruneSNP function. |
A list object, Ts : test statistics for the SPU tests (in the order of the specified pow) and finally for the aSPU test. pvs : p-values for the SPUs and aSPUs tests.
Il-Youp Kwak and Wei Pan
Il-Youp Kwak, Wei Pan (2015) Adaptive Gene- and Pathway-Trait Association Testing with GWAS Summary Statistics, Bioinformatics, 32(8), 1178-1184.
data(kegg9) ## example analysis using aSPUM test. g <- kegg9$gene.info[1,1] # SOAT1 ## Take snps mapped on gene "SOAT1" from the information of gene.info and snp.info. snps <- which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1] == g, 2] ) & (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3] ) & (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4] ) ) ## Take subsets newP <- kegg9$nP[snps] ; ldsub <- kegg9$ldmatrix[snps, snps]; ## Get p-value for gene SOAT1. Read vignette for details. out <- aSPUs(newP, corSNP=ldsub , pow=c(1:8, Inf), n.perm=100, Ps=TRUE) out$Ts # This is a vector of Test Statistics for SPUM and aSPUM tests. # SPUs1 to SPUsInf corresponds with the option pow=c(1:8, Inf) # They are SPUs test statistics. # The last element aSPUs is minimum of them, aSPUs statistic. out$pvs # This is a vector of p-values for SPUs and aSPUs tests. # SPUs1 to SPUsInf corresponds with the option pow=c(1:8, Inf) # They are p-values for corresponding SPUs tests. # The last element is p-value of aSPUs test.
data(kegg9) ## example analysis using aSPUM test. g <- kegg9$gene.info[1,1] # SOAT1 ## Take snps mapped on gene "SOAT1" from the information of gene.info and snp.info. snps <- which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1] == g, 2] ) & (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3] ) & (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4] ) ) ## Take subsets newP <- kegg9$nP[snps] ; ldsub <- kegg9$ldmatrix[snps, snps]; ## Get p-value for gene SOAT1. Read vignette for details. out <- aSPUs(newP, corSNP=ldsub , pow=c(1:8, Inf), n.perm=100, Ps=TRUE) out$Ts # This is a vector of Test Statistics for SPUM and aSPUM tests. # SPUs1 to SPUsInf corresponds with the option pow=c(1:8, Inf) # They are SPUs test statistics. # The last element aSPUs is minimum of them, aSPUs statistic. out$pvs # This is a vector of p-values for SPUs and aSPUs tests. # SPUs1 to SPUsInf corresponds with the option pow=c(1:8, Inf) # They are p-values for corresponding SPUs tests. # The last element is p-value of aSPUs test.
It gives p-values of the SPUs tests and aSPUs test with GWAS summary statistics.
aSPUsD(Zs, corrSNP, Ps = FALSE)
aSPUsD(Zs, corrSNP, Ps = FALSE)
Zs |
Z-scores for each SNPs. It could be P-values if the Ps option is TRUE. |
corrSNP |
Correlation matirx of the SNPs to be tested; estimated from a reference panel (based on the same set of the reference alleles as used in calculating Z-scores). |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
pvs : p-values for the SPUsD and aSPUsD tests.
Il-Youp Kwak and Wei Pan
Gongjun Xu, Lifeng Lin, Peng Wei and Wei Pan (2016) An adaptive two-sample test for high-dimensional means, Biometrika (2016) 103 (3): 609-624. Il-Youp Kwak, Wei Pan (2015) Adaptive Gene- and Pathway-Trait Association Testing with GWAS Summary Statistics, Bioinformatics, 32(8), 1178-1184
data(kegg9) ## example analysis using aSPUM test. g <- kegg9$gene.info[1,1] # SOAT1 ## Take snps mapped on gene "SOAT1" from the information of gene.info and snp.info. snps <- which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1] == g, 2] ) & (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3] ) & (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4] ) ) ## Take subsets newP <- kegg9$nP[snps] ; ldsub <- kegg9$ldmatrix[snps, snps]; ## Get p-value for gene SOAT1. Read vignette for details. out <- aSPUsD(newP, corrSNP=ldsub, Ps=TRUE) out
data(kegg9) ## example analysis using aSPUM test. g <- kegg9$gene.info[1,1] # SOAT1 ## Take snps mapped on gene "SOAT1" from the information of gene.info and snp.info. snps <- which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1] == g, 2] ) & (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3] ) & (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4] ) ) ## Take subsets newP <- kegg9$nP[snps] ; ldsub <- kegg9$ldmatrix[snps, snps]; ## Get p-value for gene SOAT1. Read vignette for details. out <- aSPUsD(newP, corrSNP=ldsub, Ps=TRUE) out
It gives p-values of the SPUsPath tests and aSPUsPath test with GWAS summary statistics.
aSPUsPath( Zs, corSNP, pow = c(1, 2, 4, 8, Inf), pow2 = c(1, 2, 4, 8), snp.info, gene.info, n.perm = 1000, Ps = FALSE, prune = TRUE )
aSPUsPath( Zs, corSNP, pow = c(1, 2, 4, 8, Inf), pow2 = c(1, 2, 4, 8), snp.info, gene.info, n.perm = 1000, Ps = FALSE, prune = TRUE )
Zs |
Z-scores for each SNPs. It could be P-values if the Ps option is TRUE. |
corSNP |
Correlation matirx of the SNPs to be tested; estimated from a reference panel (based on the same set of the reference alleles as used in calculating Z-scores). |
pow |
SNP specific power(gamma values) used in SPUpath test. |
pow2 |
GENE specific power(gamma values) used in SPUpath test. |
snp.info |
SNP information matrix, the 1st column is SNP id, 2nd column is chromosome #, 3rd column indicates SNP location. |
gene.info |
GENE information matrix, The 1st column is GENE id, 2nd column is chromosome #, 3rd and 4th column indicate start and end positions of the gene. |
n.perm |
number of permutations. |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
prune |
if it is TRUE, do pruing before the test using pruneSNP function. |
P-values for SPUMpath tests and aSPUMpath test.
Il-Youp Kwak and Wei Pan
Il-Youp Kwak, Wei Pan (2015) Adaptive Gene- and Pathway-Trait Association Testing with GWAS Summary Statistics, Bioinformatics, 32(8):1178-1184
data(kegg9) # p-values of SPUpath and aSPUpath tests. out.a <- aSPUsPath(kegg9$nP, corSNP = kegg9$ldmatrix, pow=c(1:8, Inf), pow2 = c(1,2,4,8), snp.info=kegg9$snp.info, gene.info = kegg9$gene.info, n.perm=5, Ps = TRUE) out.a
data(kegg9) # p-values of SPUpath and aSPUpath tests. out.a <- aSPUsPath(kegg9$nP, corSNP = kegg9$ldmatrix, pow=c(1:8, Inf), pow2 = c(1,2,4,8), snp.info=kegg9$snp.info, gene.info = kegg9$gene.info, n.perm=5, Ps = TRUE) out.a
It gives p-values of the SPUsPath tests and aSPUsPath test with GWAS summary statistics. Faster than aSPUsPath function when n is large (N > 10^4).
aSPUsPath2( Zs, corSNP, pow = c(1, 2, 4, 8, Inf), pow2 = c(1, 2, 4, 8), snp.info, gene.info, n.perm = 1000, Ps = FALSE, prune = TRUE )
aSPUsPath2( Zs, corSNP, pow = c(1, 2, 4, 8, Inf), pow2 = c(1, 2, 4, 8), snp.info, gene.info, n.perm = 1000, Ps = FALSE, prune = TRUE )
Zs |
Z-scores for each SNPs. It could be P-values if the Ps option is TRUE. |
corSNP |
Correlation matirx of the SNPs to be tested; estimated from a reference panel (based on the same set of the reference alleles as used in calculating Z-scores). |
pow |
SNP specific power(gamma values) used in SPUpath test. |
pow2 |
GENE specific power(gamma values) used in SPUpath test. |
snp.info |
SNP information matrix, the 1st column is SNP id, 2nd column is chromosome #, 3rd column indicates SNP location. |
gene.info |
GENE information matrix, The 1st column is GENE id, 2nd column is chromosome #, 3rd and 4th column indicate start and end positions of the gene. |
n.perm |
number of permutations. |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
prune |
if it is TRUE, do pruing before the test using pruneSNP function. |
P-values for SPUMpath tests and aSPUMpath test.
Il-Youp Kwak and Wei Pan
Il-Youp Kwak, Wei Pan (2015) Adaptive Gene- and Pathway-Trait Association Testing with GWAS Summary Statistics, Bioinformatics, 32(8):1178-1184
data(kegg9) # p-values of SPUpath and aSPUpath tests. ## Not run: out.a <- aSPUsPath2(kegg9$nP, corSNP = kegg9$ldmatrix, pow=c(1:8, Inf), pow2 = c(1,2,4,8), snp.info=kegg9$snp.info, gene.info = kegg9$gene.info, n.perm=1000, Ps = TRUE) ## End(Not run) #out.a
data(kegg9) # p-values of SPUpath and aSPUpath tests. ## Not run: out.a <- aSPUsPath2(kegg9$nP, corSNP = kegg9$ldmatrix, pow=c(1:8, Inf), pow2 = c(1,2,4,8), snp.info=kegg9$snp.info, gene.info = kegg9$gene.info, n.perm=1000, Ps = TRUE) ## End(Not run) #out.a
It gives the p-values of the SPUw tests and aSPUw test based on the permutations of the residuals or simulations from the null distripution.
aSPUw( Y, X, cov = NULL, resample = c("perm", "boot", "sim"), model = c("gaussian", "binomial"), pow = c(1:8, Inf), n.perm = 1000 )
aSPUw( Y, X, cov = NULL, resample = c("perm", "boot", "sim"), model = c("gaussian", "binomial"), pow = c(1:8, Inf), n.perm = 1000 )
Y |
Response or phenotype data. It can be a disease indicator; =0 for controls, =1 for cases. Or it can be a quantitative trait. A vector with length n (number of observations). |
X |
Genotype or other data; each row for a subject, and each column for an SNP (or a predictor). The value of each SNP is the # of the copies for an allele. A matrix with dimension n by k (n : number of observation, k : number of SNPs (or predictors) ). |
cov |
Covariates. A matrix with dimension n by p (n :number of observation, p : number of covariates). |
resample |
Use "perm" for residual permutations, "sim" for simulations from the null distribution, and "boot" for parametric bootstrap. |
model |
Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait. |
pow |
power used in SPU test. A vector of the powers. |
n.perm |
number of permutations or bootstraps. |
A list object, Ts : Test Statistics for the SPUw and aSPUw test. pvs : p-values for the SPUw and aSPUw test.
Il-Youp Kwak, Junghi Kim and Wei Pan
Junghi Kim, Jeffrey R Wozniak, Bryon A Mueller, Xiaotong Shen and Wei Pan (2014) Comparison of statistical tests for group differences in brain functional networks, Neuroimage, 1;101:681-694
data(exdat) # increase n.perm for accurate p-value ## Not run: out <- aSPUw(exdat$Y, exdat$X, pow = c(1:8, Inf), n.perm = 1000) out$Ts # This is a vector of Test Statistics for SPU and aSPU tests. # SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) # They are SPU test statistics. # The last element aSPU is minimum of them, aSPU statistic. out$pvs # This is a vector of p-values for SPU and aSPU tests. # SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) # They are p-values for corresponding SPU tests. # The last element is p-value of aSPU test.
data(exdat) # increase n.perm for accurate p-value ## Not run: out <- aSPUw(exdat$Y, exdat$X, pow = c(1:8, Inf), n.perm = 1000) out$Ts # This is a vector of Test Statistics for SPU and aSPU tests. # SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) # They are SPU test statistics. # The last element aSPU is minimum of them, aSPU statistic. out$pvs # This is a vector of p-values for SPU and aSPU tests. # SPU1 to SPUInf corresponds with the option pow=c(1:8, Inf) # They are p-values for corresponding SPU tests. # The last element is p-value of aSPU test.
Estimate the covariance matrix of multiple traits based on their (null) summary Z-scores.
estcov(allZ, Ps = FALSE)
estcov(allZ, Ps = FALSE)
allZ |
matrix of summary Z-scores for all SNP. each row for SNP; each column for single trait. |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
estimated correlation matrix.
Junghi Kim, Yun Bai and Wei Pan
Junghi Kim, Yun Bai and Wei Pan (2015) An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics, Genetic Epidemiology, 8:651-663
# -- n.snp: number of SNPs # -- n.trait: number of traits # -- n.subject: number of subjects n.snp <- 100 n.traits <- 10 n.subjects <- 1000 traits <- matrix(rnorm(n.subjects*n.traits), n.subjects, n.traits) v <- cov(traits) allZ <- rmvnorm(n.snp, Sigma=v) colnames(allZ) <- paste("trait", 1:n.traits, sep="") rownames(allZ) <- paste("snp", 1:n.snp, sep="") r <- estcov(allZ) MTaSPUs(Z = allZ, v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) MTaSPUs(Z = allZ[1,], v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) minP(Zi= allZ[1,], r = r)
# -- n.snp: number of SNPs # -- n.trait: number of traits # -- n.subject: number of subjects n.snp <- 100 n.traits <- 10 n.subjects <- 1000 traits <- matrix(rnorm(n.subjects*n.traits), n.subjects, n.traits) v <- cov(traits) allZ <- rmvnorm(n.snp, Sigma=v) colnames(allZ) <- paste("trait", 1:n.traits, sep="") rownames(allZ) <- paste("snp", 1:n.snp, sep="") r <- estcov(allZ) MTaSPUs(Z = allZ, v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) MTaSPUs(Z = allZ[1,], v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) minP(Zi= allZ[1,], r = r)
The exdat data set is list of three objects. exdat$Y is a vector of length 1000, 500 0s and 500 1s. exdat$X is a matrix of 1000 by 10. This simulated X matrix is assumed to be rare variants. All eliments are 0, 1 or 2. exdat$SNP0indx is a vector of length 10. The values are 1 or 0, 0 indicate corresponding column of X matrix have no association with Y vector.
data(exdat)
data(exdat)
data(exdat) exdat$X[c(1:10, 501:510),] exdat$Y[c(1:10, 501:510)] # 3rd and 10th column of X have no association with Y exdat$SNP0indx
data(exdat) exdat$X[c(1:10, 501:510),] exdat$Y[c(1:10, 501:510)] # 3rd and 10th column of X have no association with Y exdat$SNP0indx
Get the p-value of GATES. Usually it is used to get genomewise p-values. This function is taken from postgwas package. There is a little modification of the code GATES in postgwas package. 1) The approximated matrix may have negative eigen value, we modified it not to have negative values; 2) we added one more return (the key gene location) for Hyst method.
GATES2(ldmatrix, p)
GATES2(ldmatrix, p)
ldmatrix |
numeric. A correlation matrix of SNPs, dimensions matching the p and snps arguments. |
p |
p-value for each SNPs. |
A p-value of GATES and the key gene location (to be used by Hyst).
Milan Hiersche(taken from pastgwas package), Il-Youp Kwak(modified a little)
Miao-Xin Li, Hong-Sheng Gui, Johnny S.H. Kwan and Pak C. Sham (2011) GATES: A Rapid and Powerful Gene-Based Association Test Using Extended Simes Procedure The American Journal of Human Genetics 88, 283-293
#simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=30, MAFlim=c(0.05, 0.4), p0=0.05) #Ps <- getlogitp(simula$Y, simula$X) ## get correlation of SNPs using controls #ldmat <- cor(simula$X[ simula$Y == 0, ]) #o.pvec = order(Ps) # ldmat <- ldmat[o.pvec, o.pvec] #(gatesp <- GATES2(ldmat, sort(Ps))[1])
#simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=30, MAFlim=c(0.05, 0.4), p0=0.05) #Ps <- getlogitp(simula$Y, simula$X) ## get correlation of SNPs using controls #ldmat <- cor(simula$X[ simula$Y == 0, ]) #o.pvec = order(Ps) # ldmat <- ldmat[o.pvec, o.pvec] #(gatesp <- GATES2(ldmat, sort(Ps))[1])
Get the p-value of GATES-Simes. It uses an extended Simes procedure to combine GATES p-values across multiple genes in a pathway.
GatesSimes(pvec, ldmatrix, snp.info, gene.info)
GatesSimes(pvec, ldmatrix, snp.info, gene.info)
pvec |
p-values for each SNP. |
ldmatrix |
numeric. A correlation matrix of SNPs with dimensions matching the length of pvec (the number of SNPs). |
snp.info |
SNP information matrix, the 1st column is SNP id, 2nd column is chromosome #, 3rd column indicates SNP location. |
gene.info |
GENE information matrix, The 1st column is GENE id, 2nd column is chromosome #, 3rd and 4th column indicate start and end positions of the gene. |
A p-value.
Il-Youp Kwak and Wei Pan
Hongsheng Gui, Miaoxin Li, Pak C Sham and Stacey S Cherny (2011) Comparisons of seven algorithms for pathway analysis using the WTCCC Crohn's Disease BMC Research Notes, 4:386
#simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=30, MAFlim=c(0.05, 0.4), p0=0.05) #logitp <- getlogitp(simula$Y, simula$X) ## get correlation of SNPs using controls #ldmat <- cor(simula$X[ simula$Y == 0, ]) #out <- GatesSimes(pvec = logitp, ldmatrix = ldmat, snp.info = simula$snp.info, # gene.info = simula$gene.info) #out
#simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=30, MAFlim=c(0.05, 0.4), p0=0.05) #logitp <- getlogitp(simula$Y, simula$X) ## get correlation of SNPs using controls #ldmat <- cor(simula$X[ simula$Y == 0, ]) #out <- GatesSimes(pvec = logitp, ldmatrix = ldmat, snp.info = simula$snp.info, # gene.info = simula$gene.info) #out
It gives p-values of the GEESPU tests and GEEaSPU test.
GEEaSPU( traits, geno, Z = NULL, model = c("binomial", "gaussian"), gamma = c(1:8, Inf), n.sim = 1000, corstr = "independence" )
GEEaSPU( traits, geno, Z = NULL, model = c("binomial", "gaussian"), gamma = c(1:8, Inf), n.sim = 1000, corstr = "independence" )
traits |
trait matrix. The row for individuals and the column for traits. |
geno |
A matrix of genetic information. |
Z |
covariates. |
model |
Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait. |
gamma |
power used in GEEaSPU test. A vector of the powers. |
n.sim |
number of simulations. |
corstr |
a character string specifying the correlation structure. The following are permitted: "independence", "fixed", "stat_M_dep", "non_stat_M_dep", "exchangeable", "AR-M" and "unstructured" |
p-values for the GEE-SPU and GEE-aSPU test.
Junghi Kim, Wei Pan and Il-Youp Kwak
Yiwei Zhang, Zhiyuan Xu, Xiaotong Shen, Wei Pan (2014) Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage. 96:309-25
traits <- matrix(rnorm(100*5, 0,1), ncol=5) Z <- rnorm(100, 2, 0.5) geno <- rbinom(100, 2, 0.5) out <- GEEaSPU(traits, geno, Z = NULL, model = "gaussian", gamma = c(1:8,Inf), n.sim = 100)
traits <- matrix(rnorm(100*5, 0,1), ncol=5) Z <- rnorm(100, 2, 0.5) geno <- rbinom(100, 2, 0.5) out <- GEEaSPU(traits, geno, Z = NULL, model = "gaussian", gamma = c(1:8,Inf), n.sim = 100)
Get p-value using logistic regresion for each of the multiple SNPs
getlogitp(Y, X)
getlogitp(Y, X)
Y |
Response or phenotype data. It can be a disease indicator; =0 for controls, =1 for cases. |
X |
Genotype or other data; each row for a subject, and each column for an SNP (or a predictor). The value of each SNP is the # of the copies for an allele. A matrix with dimension n by p (n : number of observation, p : number of SNPs (or predictors) ). |
p-values for each SNPs.
#simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=10, MAFlim=c(0.05, 0.4), p0=0.05) #logitp <- getlogitp(simula$Y, simula$X)
#simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=10, MAFlim=c(0.05, 0.4), p0=0.05) #logitp <- getlogitp(simula$Y, simula$X)
Get a p-value using HYST.
Hyst(pvec, ldmatrix, snp.info, gene.info)
Hyst(pvec, ldmatrix, snp.info, gene.info)
pvec |
p-values for each SNP. |
ldmatrix |
numeric. A correlation matrix of SNPs with dimensions matching the length of pvec (the number of SNPs). |
snp.info |
SNP information matrix, the 1st column is SNP id, 2nd column is chromosome #, 3rd column indicates SNP location. |
gene.info |
GENE information matrix, The 1st column is GENE id, 2nd column is chromosome #, 3rd and 4th column indicate start and end positions of the gene. |
A p-value.
Il-Youp Kwak and Wei Pan
Miao-Xin Li, Johnny S.H. Kwan and Pak C. Sham (2012) HYST: A Hybrid Set-Based Test for Genome-wide Association Studies, with Application to Protein-Protein Interaction-Based Association Analysis The American Journal of Human Genetics, 91, 478-488.
#simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=30, MAFlim=c(0.05, 0.4), p0=0.05) #logitp <- getlogitp(simula$Y, simula$X) ## get correlation of SNPs using controls #ldmat <- cor(simula$X[ simula$Y == 0, ]) #out <- Hyst(pvec = logitp, ldmatrix = ldmat, snp.info = simula$snp.info, # gene.info = simula$gene.info)
#simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=30, MAFlim=c(0.05, 0.4), p0=0.05) #logitp <- getlogitp(simula$Y, simula$X) ## get correlation of SNPs using controls #ldmat <- cor(simula$X[ simula$Y == 0, ]) #out <- Hyst(pvec = logitp, ldmatrix = ldmat, snp.info = simula$snp.info, # gene.info = simula$gene.info)
Data on coronary artery disease, myocardial infarction have been contributed by CARDIoGRAMplusC4D investigators and have been downloaded from www.CARDIOGRAMPLUSC4D.ORG
The data set contains P value data for coronary artery disease (CAD). We mapped these SNPs to the 9th KEGG pathway.
gene.info is a contains 16 gene information of 9th Kegg pathway. The 1st column is the name of the gene, 2nd column is the chromosome number, 3rd column is where the gene starts and 4th column is where the gene ends.
snp.info contains 330 snp informations mapped on this Kegg pathway. The 1st column is the rsID of the SNP, 2nd column is the chromosome number and the 3rd column is the location of the SNP.
Ps is a vector containing p-value information for mapped SNPs.
The 'PPs' is a list object contains the SNP information for each genes. For example, PPs$CEL contains snp.info matrix mapped on gene 'CEL'. The 1st column is the rsID of the SNP, 2nd column is the chromosome number, the 3rd column is the location of the SNP and the 4th column is p-value of the SNP.
nP is a subvector of Ps. These are SNPs on reference population(Hapmap CEU phase 2 data, downloadable from plink, SNPs with MAF less than 5 percent omitted)
ldmatrix is a correaltion matrix of SNPs from the reference population.
data(kegg9)
data(kegg9)
Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, Holm H et al. (2011) Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet. 43: 333-338
data(kegg9) ## gene informations kegg9$gene.info ## SNPs mapped on 3rd and 4th gene in 9th Kegg pathway kegg9$PPs[3:4] ## snp information kegg9$snp.info ## The 1st 10 P-values of SNPs mapped on 9th Kegg pathway. kegg9$nP[1:10] ## correlation matrix among those SNPs kegg9$ldmatrix[1:10,1:10]
data(kegg9) ## gene informations kegg9$gene.info ## SNPs mapped on 3rd and 4th gene in 9th Kegg pathway kegg9$PPs[3:4] ## snp information kegg9$snp.info ## The 1st 10 P-values of SNPs mapped on 9th Kegg pathway. kegg9$nP[1:10] ## correlation matrix among those SNPs kegg9$ldmatrix[1:10,1:10]
Return exact minP test p-value for multiple traits - single SNP association.
minP(Zi, r)
minP(Zi, r)
Zi |
a vector of summary Z-scores for single SNP |
r |
estimated correlation matrix based on the summary Z-scores (output of estcov) |
return exact minP test
Junghi Kim, Yun Bai and Wei Pan
Junghi Kim, Yun Bai and Wei Pan (2015) An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics, Genetic Epidemiology, 8:651-663
# -- n.snp: number of SNPs # -- n.trait: number of traits # -- n.subject: number of subjects n.snp <- 100 n.traits <- 10 n.subjects <- 1000 traits <- matrix(rnorm(n.subjects*n.traits), n.subjects, n.traits) v <- cov(traits) allZ <- rmvnorm(n.snp, Sigma=v) colnames(allZ) <- paste("trait", 1:n.traits, sep="") rownames(allZ) <- paste("snp", 1:n.snp, sep="") r <- estcov(allZ) MTaSPUs(Z = allZ, v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) MTaSPUs(Z = allZ[1,], v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) minP(Zi= allZ[1,], r = r)
# -- n.snp: number of SNPs # -- n.trait: number of traits # -- n.subject: number of subjects n.snp <- 100 n.traits <- 10 n.subjects <- 1000 traits <- matrix(rnorm(n.subjects*n.traits), n.subjects, n.traits) v <- cov(traits) allZ <- rmvnorm(n.snp, Sigma=v) colnames(allZ) <- paste("trait", 1:n.traits, sep="") rownames(allZ) <- paste("snp", 1:n.snp, sep="") r <- estcov(allZ) MTaSPUs(Z = allZ, v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) MTaSPUs(Z = allZ[1,], v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) minP(Zi= allZ[1,], r = r)
SNP based adaptive association test for multiple phenotypes with GWAS summary statistics.
MTaSPUs(Z, v, B, pow, transform = FALSE, Ps = FALSE)
MTaSPUs(Z, v, B, pow, transform = FALSE, Ps = FALSE)
Z |
matrix of summary Z-scores, SNPs in rows and traits in columns. Or a vector of summary Z-scores for a single snp |
v |
estimated correlation matrix based on the summary Z-scores (output of estcov) |
B |
number of Monte Carlo samples simulated to compute p-values, the maximum number of MC simulations is 1e8 |
pow |
power used in SPU test. A vector of the powers. |
transform |
if TRUE, the inference is made on transformed Z |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
compute p-values for SPU(gamma) i.e. pow=1:8, and infinity aSPU, based on the minimum p-values over SPU(power) each row for single SNP
Junghi Kim, Yun Bai and Wei Pan
Junghi Kim, Yun Bai and Wei Pan (2015) An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics, Genetic Epidemiology, 8:651-663
# -- n.snp: number of SNPs # -- n.trait: number of traits # -- n.subject: number of subjects n.snp <- 100 n.traits <- 10 n.subjects <- 1000 traits <- matrix(rnorm(n.subjects*n.traits), n.subjects, n.traits) v <- cov(traits) allZ <- rmvnorm(n.snp, Sigma=v) colnames(allZ) <- paste("trait", 1:n.traits, sep="") rownames(allZ) <- paste("snp", 1:n.snp, sep="") r <- estcov(allZ) MTaSPUs(Z = allZ, v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) MTaSPUs(Z = allZ[1,], v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) minP(Zi= allZ[1,], r = r)
# -- n.snp: number of SNPs # -- n.trait: number of traits # -- n.subject: number of subjects n.snp <- 100 n.traits <- 10 n.subjects <- 1000 traits <- matrix(rnorm(n.subjects*n.traits), n.subjects, n.traits) v <- cov(traits) allZ <- rmvnorm(n.snp, Sigma=v) colnames(allZ) <- paste("trait", 1:n.traits, sep="") rownames(allZ) <- paste("snp", 1:n.snp, sep="") r <- estcov(allZ) MTaSPUs(Z = allZ, v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) MTaSPUs(Z = allZ[1,], v = r, B = 100, pow = c(1:4, Inf), transform = FALSE) minP(Zi= allZ[1,], r = r)
It gives p-values of the MTSPUsSet tests and MTaSPUsSet test with GWAS summary statistics.
MTaSPUsSet( Zs, corSNP, corPhe, pow = c(1, 2, 4, 8), pow2 = c(1, 2, 4, 8), n.perm = 5000, Ps = FALSE, prune = TRUE )
MTaSPUsSet( Zs, corSNP, corPhe, pow = c(1, 2, 4, 8), pow2 = c(1, 2, 4, 8), n.perm = 5000, Ps = FALSE, prune = TRUE )
Zs |
Z-score matrix. row represent SNPs and column represent traits. It could be P-values if the Ps option is TRUE. |
corSNP |
Correlation matirx of the SNPs to be tested; estimated from a reference panel (based on the same set of the reference alleles as used in calculating Z-scores). |
corPhe |
Correlation matirx of phenotypes to be tested; Estimated from Z-scores. |
pow |
SNP specific power(gamma values) used in MTSPUsSet test. |
pow2 |
GENE specific power(gamma values) used in MTSPUsSet test. |
n.perm |
number of permutations or bootstraps. |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
prune |
if it is TRUE, do pruing before the test using pruneSNP function. |
A vector object, MTSPUsSet test P values and MTaSPUsSet P value.
Il-Youp Kwak and Wei Pan
Il-Youp Kwak, Wei Pan (2017) Gene- and pathway-based association tests for multiple traits with GWAS summary statistics, Bioinformatics, 33(1), 64-71.
data(SAMD11) attach(SAMD11) ## example analysis using MTaSPUsSet test. (outFZ <- MTaSPUsSet(ZsF, corSNP=corSNPF, corPhe = corPheF, pow=c(1,2,4,8), pow2 = c(1,2,4,8), n.perm=10, Ps=FALSE))
data(SAMD11) attach(SAMD11) ## example analysis using MTaSPUsSet test. (outFZ <- MTaSPUsSet(ZsF, corSNP=corSNPF, corPhe = corPheF, pow=c(1,2,4,8), pow2 = c(1,2,4,8), n.perm=10, Ps=FALSE))
It gives p-values of the MTSPUsSetPath tests and MTaSPUsSetPath test with GWAS summary statistics.
MTaSPUsSetPath( Zs, corPhe, corSNP, pow1 = c(1, 2, 4, 8), pow2 = c(1, 2, 4, 8), pow3 = c(1, 2, 4, 8), snp.info, gene.info, n.perm = 1000, Ps = FALSE, prune = TRUE )
MTaSPUsSetPath( Zs, corPhe, corSNP, pow1 = c(1, 2, 4, 8), pow2 = c(1, 2, 4, 8), pow3 = c(1, 2, 4, 8), snp.info, gene.info, n.perm = 1000, Ps = FALSE, prune = TRUE )
Zs |
Z-score matrix. row represent SNPs and column represent traits. It could be P-values if the Ps option is TRUE. |
corPhe |
Correlation matirx of phenotypes to be tested; Estimated from Z-scores. |
corSNP |
Correlation matirx of the SNPs to be tested; estimated from a reference panel (based on the same set of the reference alleles as used in calculating Z-scores). |
pow1 |
SNP specific power(gamma values) used in MTSPUsSetPath test. |
pow2 |
GENE specific power(gamma values) used in MTSPUsSetPath test. |
pow3 |
Trait specific power(gamma values) used in MTSPUsSetPath test. |
snp.info |
SNP information matrix, the 1st column is SNP id, 2nd column is chromosome #, 3rd column indicates SNP location. |
gene.info |
GENE information matrix, The 1st column is GENE id, 2nd column is chromosome #, 3rd and 4th column indicate start and end positions of the gene. |
n.perm |
number of permutations. |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
prune |
if it is TRUE, do pruing before the test using pruneSNP function. |
P-values for MTSPUsSetpath tests and MTaSPUsSetpPath test.
Il-Youp Kwak and Wei Pan
Il-Youp Kwak, Wei Pan (2017) Gene- and pathway-based association tests for multiple traits with GWAS summary statistics, Bioinformatics, 33(1), 64-71
Zs <- cbind ( c( 0.3, 0.2, 0.1,0.5,1.2), c(-.1, .3,-.1,.1,1.2) ) varSNP = cbind( c( 1, .1,0, 0, .11), c(.1,1, 0, 0, 0), c(0,0, 1, 0, 0), c(0,0, 0, 1, 0), c(.11,0,0,0,1) ) varPhe = cbind( c( 1, -.1), c(-.1,1) ) gene.info = data.frame( Gnm = c( "G1", "G2"), chr = c(1,3), loc1 = c(0, 0), loc2 = c(10,10) ) snp.info = data.frame( rsid = c("rs1", "rs2", "rs3", "rs4", "rs5"), chr = c(1,1,3,3,3), loc = c(1,2,1,2,3) ) out <- MTaSPUsSetPath(Zs, corPhe = varPhe, corSNP=varSNP, n.perm = 100, snp.info = snp.info, gene.info = gene.info) out
Zs <- cbind ( c( 0.3, 0.2, 0.1,0.5,1.2), c(-.1, .3,-.1,.1,1.2) ) varSNP = cbind( c( 1, .1,0, 0, .11), c(.1,1, 0, 0, 0), c(0,0, 1, 0, 0), c(0,0, 0, 1, 0), c(.11,0,0,0,1) ) varPhe = cbind( c( 1, -.1), c(-.1,1) ) gene.info = data.frame( Gnm = c( "G1", "G2"), chr = c(1,3), loc1 = c(0, 0), loc2 = c(10,10) ) snp.info = data.frame( rsid = c("rs1", "rs2", "rs3", "rs4", "rs5"), chr = c(1,1,3,3,3), loc = c(1,2,1,2,3) ) out <- MTaSPUsSetPath(Zs, corPhe = varPhe, corSNP=varSNP, n.perm = 100, snp.info = snp.info, gene.info = gene.info) out
It gives p-values of the MTSPUsSet tests MTScore and MTaSPUsSet_Score test with GWAS summary statistics.
MTaSPUsSetScore( Zs, corSNP, corPhe, pow = c(1, 2, 4, 8), pow2 = c(1, 2, 4, 8), n.perm = 5000, Ps = FALSE )
MTaSPUsSetScore( Zs, corSNP, corPhe, pow = c(1, 2, 4, 8), pow2 = c(1, 2, 4, 8), n.perm = 5000, Ps = FALSE )
Zs |
Z-score matrix. row represent SNPs and column represent traits. It could be P-values if the Ps option is TRUE. |
corSNP |
Correlation matirx of the SNPs to be tested; estimated from a reference panel (based on the same set of the reference alleles as used in calculating Z-scores). |
corPhe |
Correlation matirx of phenotypes to be tested; Estimated from Z-scores. |
pow |
SNP specific power(gamma values) used in MTSPUsSetScore test. |
pow2 |
GENE specific power(gamma values) used in MTSPUsSetScore test. |
n.perm |
number of permutations or bootstraps. |
Ps |
TRUE if input is p-value, FALSE if input is Z-scores. The default is FALSE. |
A vector object, MTSPUsSet test, MTScore test P values and MTaSPUsSet_Score P value.
Il-Youp Kwak and Wei Pan
Il-Youp Kwak, Wei Pan (2017) Gene- and pathway-based association tests for multiple traits with GWAS summary statistics, Bioinformatics. 33(1), 64-71
data(SAMD11) attach(SAMD11) ## example analysis using aSPUM test. (outFZ <- MTaSPUsSetScore(ZsF, corSNP=corSNPF, corPhe = corPheF, pow=c(1,2,4,8), pow2 = c(1,2,4,8), n.perm=10, Ps=FALSE))
data(SAMD11) attach(SAMD11) ## example analysis using aSPUM test. (outFZ <- MTaSPUsSetScore(ZsF, corSNP=corSNPF, corPhe = corPheF, pow=c(1,2,4,8), pow2 = c(1,2,4,8), n.perm=10, Ps=FALSE))
It gives a P-value image for a given P-value matrix.
plotPmat( Ps, zlim = NULL, main = NULL, yt = NULL, xlab = "SNPs", thresh = -log(5e-08, 10), trait.names = NULL )
plotPmat( Ps, zlim = NULL, main = NULL, yt = NULL, xlab = "SNPs", thresh = -log(5e-08, 10), trait.names = NULL )
Ps |
P-value matrix. Row represent traits, column represent SNPs |
zlim |
-log 10 transformed p value range. |
main |
main title. |
yt |
it controls the location of trait names. |
xlab |
lable for x axis. Default is "SNPs". |
thresh |
Default is set to widely used genome wise threshold -log(5e-8,10). |
trait.names |
A vector of trait names. |
Image of P-values matrix.
Il-Youp Kwak and Wei Pan
Il-Youp Kwak, Wei Pan (2017) Gene- and pathway-based association tests for multiple traits with GWAS summary statistics, Bioinformatics. 33(1), 64-71
## Say we have 3 traits and their p-values at 5 SNPs. Ps <- rbind( c(0.001, 0.4, 0.5, 0.00000001, .1), c(0.03, 0.3, 0.3, 0.00001, .2), c(0.01, 0.2, 0.4, 0.001, .0001) ) ## We can visualize it using plotPmat function. plotPmat(Ps)
## Say we have 3 traits and their p-values at 5 SNPs. Ps <- rbind( c(0.001, 0.4, 0.5, 0.00000001, .1), c(0.03, 0.3, 0.3, 0.00001, .2), c(0.01, 0.2, 0.4, 0.001, .0001) ) ## We can visualize it using plotPmat function. plotPmat(Ps)
When correlation matirx have highly correlated SNPs, the performance of aSPUs, aSPUsPath and MTaSPUsSet are not very good. Do pruning using this function.
pruneSNP(corSNP, rup = 0.95, rdown = -0.95)
pruneSNP(corSNP, rup = 0.95, rdown = -0.95)
corSNP |
Correlation matirx of the SNPs to be tested; estimated from a reference panel (based on the same set of the reference alleles as used in calculating Z-scores). |
rup |
pruning criteria, erase one SNP when correlation between two SNPs are larger than this value. |
rdown |
pruning criteria, erase one SNP when correlation between two SNPs are smaller than this value. |
a list object pruned.corSNP and to.erase. pruned.corSNP is pruned correlation matrix. to.erase is SNP index to erase to get purned object. (i.e. corSNP[-to.erase, -to.erase] = pruned.corSNP )
Il Youp Kwak
data(kegg9) ## example analysis using aSPUM test. g <- kegg9$gene.info[1,1] # SOAT1 ## Take snps mapped on gene "SOAT1" from the information of gene.info and snp.info. snps <- which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1] == g, 2] ) & (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3] ) & (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4] ) ) ## Take subsets newP <- kegg9$nP[snps] ; ldsub <- kegg9$ldmatrix[snps, snps]; prSNP <- pruneSNP(ldsub) newP <- newP[-prSNP$to.erase] ldsub <- ldsub[-prSNP$to.erase, -prSNP$to.erase ] ## Get p-value for gene SOAT1. Read vignette for details. out <- aSPUs(newP, corSNP=ldsub , pow=c(1:8, Inf), n.perm=100, Ps=TRUE) out$Ts # This is a vector of Test Statistics for SPUM and aSPUM tests. # SPUs1 to SPUsInf corresponds with the option pow=c(1:8, Inf) # They are SPUs test statistics. # The last element aSPUs is minimum of them, aSPUs statistic. out$pvs # This is a vector of p-values for SPUs and aSPUs tests. # SPUs1 to SPUsInf corresponds with the option pow=c(1:8, Inf) # They are p-values for corresponding SPUs tests. # The last element is p-value of aSPUs test.
data(kegg9) ## example analysis using aSPUM test. g <- kegg9$gene.info[1,1] # SOAT1 ## Take snps mapped on gene "SOAT1" from the information of gene.info and snp.info. snps <- which( ( kegg9$snp.info[,2] == kegg9$gene.info[kegg9$gene.info[,1] == g, 2] ) & (kegg9$snp.info[,3] > kegg9$gene.info[kegg9$gene.info[,1] == g, 3] ) & (kegg9$snp.info[,3] < kegg9$gene.info[kegg9$gene.info[,1] == g, 4] ) ) ## Take subsets newP <- kegg9$nP[snps] ; ldsub <- kegg9$ldmatrix[snps, snps]; prSNP <- pruneSNP(ldsub) newP <- newP[-prSNP$to.erase] ldsub <- ldsub[-prSNP$to.erase, -prSNP$to.erase ] ## Get p-value for gene SOAT1. Read vignette for details. out <- aSPUs(newP, corSNP=ldsub , pow=c(1:8, Inf), n.perm=100, Ps=TRUE) out$Ts # This is a vector of Test Statistics for SPUM and aSPUM tests. # SPUs1 to SPUsInf corresponds with the option pow=c(1:8, Inf) # They are SPUs test statistics. # The last element aSPUs is minimum of them, aSPUs statistic. out$pvs # This is a vector of p-values for SPUs and aSPUs tests. # SPUs1 to SPUsInf corresponds with the option pow=c(1:8, Inf) # They are p-values for corresponding SPUs tests. # The last element is p-value of aSPUs test.
Genetic Investigation of ANthropometric Traits (GIANT) consortium data contain P-values of 2.7 million SNPs with six anthropometric traits that are well established to represent body size and shape: height, weight, BMI, waist circumference (WC), hip circumference(HIP).
We mapped SNPs on gene SAMD11. This subdata contains P values, Z scores of SNPs mapped on gene SAMD11. It also contains correlation among SNPs and correlation among phenotypes for demostrated example of MTaSPUsSet test.
data(SAMD11)
data(SAMD11)
Shungin D, Winkler TW, Croteau-Chonka DC, Ferreira T, Locke AE, Magi R, Strawbridge R, Pers TH, Fischer K, Justice AE, Workalemahu T, Wu JM, et al. (2015). New genetic loci link adipose and insulin biology to body fat distribution. Nature 518: 187-196.
data(SAMD11) ## Z-scores for man SAMD11$ZsM ## P-values for man SAMD11$PsM ## correlation among SNPs for man SAMD11$corSNPM ## correlation among phenotypes for man SAMD11$corPheM
data(SAMD11) ## Z-scores for man SAMD11$ZsM ## P-values for man SAMD11$PsM ## correlation among SNPs for man SAMD11$corSNPM ## correlation among phenotypes for man SAMD11$corPheM
It gives a simulated SNPs consisting of multiple genes in a pathway. Each SNPs from a latent multivariate Gaussian variable with an AR1 correlation structure.
simPathAR1Snp( nGenes = 10, nGenes1 = 5, nSNPs = NULL, ncSNPs = NULL, nSNPlim = c(1, 20), nSNP0 = 1:3, LOR = 0.3, n = 100, MAFlim = c(0.05, 0.4), rholim = c(0, 0), p0 = 0.05, noncausal = FALSE )
simPathAR1Snp( nGenes = 10, nGenes1 = 5, nSNPs = NULL, ncSNPs = NULL, nSNPlim = c(1, 20), nSNP0 = 1:3, LOR = 0.3, n = 100, MAFlim = c(0.05, 0.4), rholim = c(0, 0), p0 = 0.05, noncausal = FALSE )
nGenes |
The number of total genes. |
nGenes1 |
The number of causal genes. |
nSNPs |
A vector, length matched with total number of genes. Each elements of vector indicate the number of SNPs in the gene. Default is nSNPs = NULL, in this case the number of nSNPs randomly selected from nSNPlow to nSNPup. |
ncSNPs |
A vector, length matched with total number of genes. Each elements of vector indicate the number of causal SNPs in the gene. Default is ncSNPs = NULL, in this case the number of ncSNPs are randomly selected from nSNP0. |
nSNPlim |
If nSNPs = NULL, the number of SNPs in Gene randomly selected in nSNPlim. |
nSNP0 |
If ncSNPs = NULL, the number of causal SNPs in Gene randomly selected from nSNP0. Default is 1:3. |
LOR |
Association in log OR between a causal SNP and outcome. |
n |
# of cases (= # of controls). |
MAFlim |
MAF's of the SNPs are drawn from Unif(MAFlim[1], MAFlim[2]). |
rholim |
the SNPs in eahc gene are from a latent Normal variable with a AR(rho) corr structure, rho's are drawn from Unif(rholim[1], rholim[2]); the SNPs in diff genes are independant. |
p0 |
background disease prevalence;i.e. intercept=log(p0/(1-p0)). |
noncausal |
exclude causal SNPs if TRUE, it is the simulation set up d in the paper(Pan et al 2015). |
a list of the binary outcome Y (=0 or 1) and SNPs (=0, 1 or 2); Y is a vector of length 2n; X is a matrix of 2n by nSNP.
# Simulation set up A a) in the paper (Pan et al 2015) ## Not run: simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, LOR=.2, rholim=c(0,0), n=100, MAFlim=c(0.05, 0.4), p0=0.05) ## End(Not run) # Simulation set up A b) in the paper #simulb <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 100), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=100, MAFlim=c(0.05, 0.4), p0=0.05)
# Simulation set up A a) in the paper (Pan et al 2015) ## Not run: simula <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 20), nSNP0=1:3, LOR=.2, rholim=c(0,0), n=100, MAFlim=c(0.05, 0.4), p0=0.05) ## End(Not run) # Simulation set up A b) in the paper #simulb <- simPathAR1Snp(nGenes=20, nGenes1=1, nSNPlim=c(1, 100), nSNP0=1:3, # LOR=.2, rholim=c(0,0), # n=100, MAFlim=c(0.05, 0.4), p0=0.05)
Genetic Investigation of ANthropometric Traits (GIANT) consortium data contain P-values of 2.7 million SNPs with six anthropometric traits that are well established to represent body size and shape: height, weight, BMI, waist circumference (WC), hip circumference(HIP).
We mapped SNPs on gene LCORL, RASA2, STK33 and RPGRIP1L. This subdata contains P values, correlation matrices for demostrated example of MTaSPUsSet test.
data(someGs)
data(someGs)
Shungin D, Winkler TW, Croteau-Chonka DC, Ferreira T, Locke AE, Magi R, Strawbridge R, Pers TH, Fischer K, Justice AE, Workalemahu T, Wu JM, et al. (2015). New genetic loci link adipose and insulin biology to body fat distribution. Nature 518: 187-196.
data(someGs) ## P-values for LCORL someGs$LCORL[[1]] ## correlation matrix for LCORL someGs$LCORL[[2]] ## P-values for RASA2 someGs$RASA2[[1]] ## correlation matrix for LCORL someGs$RASA2[[2]] # See vignettes for more details.
data(someGs) ## P-values for LCORL someGs$LCORL[[1]] ## correlation matrix for LCORL someGs$LCORL[[2]] ## P-values for RASA2 someGs$RASA2[[1]] ## correlation matrix for LCORL someGs$RASA2[[2]] # See vignettes for more details.