Title: | Fused Sparse Structural Equation Models to Jointly Infer Gene Regulatory Network |
---|---|
Description: | An optimizer of Fused-Sparse Structural Equation Models, which is the state of the art jointly fused sparse maximum likelihood function for structural equation models proposed by Xin Zhou and Xiaodong Cai (2018 <doi:10.1101/466623>). |
Authors: | Xin Zhou, Xiaodong Cai |
Maintainer: | Xin Zhou <[email protected]> |
License: | GPL (>=3) |
Version: | 0.1.8 |
Built: | 2025-01-23 05:57:09 UTC |
Source: | https://github.com/ivis4ml/fssemr |
cv.multiFSSEMiPALM
cv.multiFSSEMiPALM( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, nfold = 5, p, q, wt = TRUE, plot = FALSE )
cv.multiFSSEMiPALM( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, nfold = 5, p, q, wt = TRUE, plot = FALSE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
nlambda |
number of hyper-parameter of lasso term in CV |
nrho |
number of hyper-parameter of fused-lasso term in CV |
nfold |
CVfold number. Default 5/10 |
p |
number of genes |
q |
number of eQTLs |
wt |
use adaptive lasso or not. Default TRUE. |
plot |
plot contour of cvmean or not. Default FALSE. |
list of cross-validation result
cv.multiFSSEMiPALM2
cv.multiFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, nfold = 5, p, q, wt = TRUE, plot = FALSE )
cv.multiFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, nfold = 5, p, q, wt = TRUE, plot = FALSE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
nlambda |
number of hyper-parameter of lasso term in CV |
nrho |
number of hyper-parameter of fused-lasso term in CV |
nfold |
CVfold number. Default 5/10 |
p |
number of genes |
q |
number of eQTLs |
wt |
use adaptive lasso or not. Default TRUE. |
plot |
plot contour of cvmean or not. Default FALSE. |
list of cross-validation result
cv.multiNFSSEMiPALM2
cv.multiNFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, nfold = 5, p, q, wt = TRUE, plot = FALSE )
cv.multiNFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, nfold = 5, p, q, wt = TRUE, plot = FALSE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
nlambda |
number of hyper-parameter of lasso term in CV |
nrho |
number of hyper-parameter of fused-lasso term in CV |
nfold |
CVfold number. Default 5/10 |
p |
number of genes |
q |
number of eQTLs |
wt |
use adaptive lasso or not. Default TRUE. |
plot |
plot contour of cvmean or not. Default FALSE. |
list of cross-validation result for NFSSEM
cv.multiRegression
cv.multiRegression(Xs, Ys, Sk, ngamma = 20, nfold = 5, n, p, k)
cv.multiRegression(Xs, Ys, Sk, ngamma = 20, nfold = 5, n, p, k)
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Sk |
eQTL index of genes |
ngamma |
number of hyper-parameter in CV |
nfold |
CVfold number. Default 5/10 |
n |
number of observations |
p |
number of genes |
k |
number of eQTLs |
gamma_min optimal gamma to minimize cross-validation error
function generator function
cwiseGradient4FSSEM(n, c, Y, R, Y2norm, sigma2)
cwiseGradient4FSSEM(n, c, Y, R, Y2norm, sigma2)
n |
number of observations |
c |
cofactor vector |
Y |
Matrix of gene expression |
R |
Residual matrix |
Y2norm |
Column of YtY |
sigma2 |
noise variance |
function whose argument is column vector bi
False discovery rate for network prediction
FDR(X, B, PREC = 0)
FDR(X, B, PREC = 0)
X |
list of predicted network matrices |
B |
list of true network matrices |
PREC |
precision threshold for FDR test. Default 0. |
inversed difference of two B matrices. For adaptive fused lasso penalty
flinvB(Bs)
flinvB(Bs)
Bs |
list of network matrices |
inversed difference matrices
if you do not want adaptive fused lasso penalty, floneB replace flinvB
floneB(Bs)
floneB(Bs)
Bs |
list of network matrices |
matrix whose entries are all 1
Solving Sparse Structural Equation Model
Xin Zhou <[email protected]>
seed = as.numeric(Sys.time()) N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, N, Ng, Nk) fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk ## cross-validation ## cvfitc <- cv.multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, ## sigma2 = fit$sigma2, nlambda = 10, nrho = 10, ## nfold = 5, p = Ng, q = Nk, wt = TRUE) fitm <- opt.multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 10, nrho = 10, p = Ng, q = Nk, wt = TRUE) fitc0 <- fitm$fit (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
seed = as.numeric(Sys.time()) N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, N, Ng, Nk) fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk ## cross-validation ## cvfitc <- cv.multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, ## sigma2 = fit$sigma2, nlambda = 10, nrho = 10, ## nfold = 5, p = Ng, q = Nk, wt = TRUE) fitm <- opt.multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 10, nrho = 10, p = Ng, q = Nk, wt = TRUE) fitc0 <- fitm$fit (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
implementor function of FSSEM solver
implFSSEM(data = NULL, method = c("CV", "BIC"))
implFSSEM(data = NULL, method = c("CV", "BIC"))
data |
Data archive of experiment measurements, includeing eQTL matrices, Gene expression matrices of different conditions, marker of eQTLs and data generation SEM model |
method |
Use cross-validation (CV) or bayesian-information-criterion(BIC) |
List of TPR and FDR
initLambdaiPALM
initLambdaiPALM(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, p, k)
initLambdaiPALM(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, p, k)
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for adaptive fused lasso term |
p |
number of genes |
k |
number of eQTL |
lambda_max
initLambdaiPALM2
initLambdaiPALM2(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, p, k)
initLambdaiPALM2(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, p, k)
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for adaptive fused lasso term |
p |
number of genes |
k |
number of eQTL |
lambda_max
initLambdaiPALM3
initLambdaiPALM3(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, p, k)
initLambdaiPALM3(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, p, k)
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for adaptive fused lasso term |
p |
number of genes |
k |
number of eQTL |
lambda_max
initRhoiPALM
initRhoiPALM(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, lambda, n, p)
initRhoiPALM(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, lambda, n, p)
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for adaptive fused lasso term |
lambda |
lambda w.r.t. rho_max |
n |
number of observations |
p |
number of genes |
rho_max
initRhoiPALM2
initRhoiPALM2(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, lambda, n, p)
initRhoiPALM2(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, lambda, n, p)
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for adaptive fused lasso term |
lambda |
lambda w.r.t. rho_max |
n |
number of observations |
p |
number of genes |
rho_max
initRhoiPALM3
initRhoiPALM3(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, lambda, n, p)
initRhoiPALM3(Xs, Ys, Bs, Fs, Sk, sigma2, Wl, Wf, lambda, n, p)
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for adaptive perturbation group lasso term |
lambda |
lambda w.r.t. rho_max |
n |
number of observations |
p |
number of genes |
rho_max
inverse matrices of B network for adaptive FSSEM
inverseB(Bs)
inverseB(Bs)
Bs |
list of network matrices |
list of inversed B matrices
if you do not want to get inversed B matrces, invoneB gives you a matrix with constant 1 instead in FSSEM
invoneB(Bs)
invoneB(Bs)
Bs |
list of network matrices |
list of invone B matrices
logLikFSSEM
logLikFSSEM(Bs, Wl, Wf, lambda, rho, sigma2, Dets, n, p)
logLikFSSEM(Bs, Wl, Wf, lambda, rho, sigma2, Dets, n, p)
Bs |
Network matrices |
Wl |
Weights for lasso term |
Wf |
Weights for fused term |
lambda |
Hyperparameter of lasso term |
rho |
Hyperparameter of fused lasso term |
sigma2 |
noise variance |
Dets |
determinants of I-B matrices |
n |
number of observations |
p |
number of genes |
objective value of FSSEM with specified hyper-paramters
logLikNFSSEM
logLikNFSSEM(Bs, Wl, Wf, lambda, rho, sigma2, Dets, n, p)
logLikNFSSEM(Bs, Wl, Wf, lambda, rho, sigma2, Dets, n, p)
Bs |
Network matrices |
Wl |
Weights for lasso term |
Wf |
Weights for group perturb lasso term |
lambda |
Hyperparameter of lasso term |
rho |
Hyperparameter of group fused lasso term |
sigma2 |
noise variance |
Dets |
determinants of I-B matrices |
n |
number of observations |
p |
number of genes |
objective value of NFSSEM with specified hyper-paramters
Implementing FSSELM algorithm for network inference. If Xs is identify for different conditions, multiFSSEMiPALM will be use, otherwise, please
use multiFSSEMiPALM2
for general cases
multiFSSEMiPALM( Xs, Ys, Bs, Fs, Sk, sigma2, lambda, rho, Wl, Wf, p, maxit = 100, inert = inert_opt("linear"), threshold = 1e-06, verbose = TRUE, sparse = TRUE, trans = FALSE, B2norm = NULL, strict = FALSE )
multiFSSEMiPALM( Xs, Ys, Bs, Fs, Sk, sigma2, lambda, rho, Wl, Wf, p, maxit = 100, inert = inert_opt("linear"), threshold = 1e-06, verbose = TRUE, sparse = TRUE, trans = FALSE, B2norm = NULL, strict = FALSE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance from ridge regression |
lambda |
Hyperparameter of lasso term in FSSEM |
rho |
Hyperparameter of fused-lasso term in FSSEM |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for adaptive fused lasso term |
p |
number of genes |
maxit |
maximum iteration number. Default 100 |
inert |
inertial function for iPALM. Default as k-1/k+2 |
threshold |
convergence threshold. Default 1e-6 |
verbose |
Default TRUE |
sparse |
Sparse Matrix or not |
trans |
Fs matrix is transposed to k x p or not. If Fs from ridge regression, trans = TRUE, else, trans = FALSE |
B2norm |
B2norm matrices generated from ridge regression. Default NULL. |
strict |
Converge strictly or not. Default False |
fit List of FSSEM model
coefficient matrices of gene regulatory networks
coefficient matrices of eQTL-gene effect
Bias vector
estimate of covariance in SEM
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) ## gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, ## N, Ng, Nk) gamma = 0.6784248 ## optimal gamma computed by cv.multiRegression fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk cvfitc <- cv.multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 5, nrho = 5, nfold = 5, p = Ng, q = Nk, wt = TRUE) fitc0 <- multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, lambda = cvfitc$lambda, rho = cvfitc$rho, Wl = inverseB(fit$Bs), Wf = flinvB(fit$Bs), p = Ng, maxit = 100, threshold = 1e-5, sparse = TRUE, verbose = TRUE, trans = TRUE, strict = TRUE) (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) ## gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, ## N, Ng, Nk) gamma = 0.6784248 ## optimal gamma computed by cv.multiRegression fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk cvfitc <- cv.multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 5, nrho = 5, nfold = 5, p = Ng, q = Nk, wt = TRUE) fitc0 <- multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, lambda = cvfitc$lambda, rho = cvfitc$rho, Wl = inverseB(fit$Bs), Wf = flinvB(fit$Bs), p = Ng, maxit = 100, threshold = 1e-5, sparse = TRUE, verbose = TRUE, trans = TRUE, strict = TRUE) (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
Implementing FSSELM algorithm for network inference. If Xs is identify for different conditions, multiFSSEMiPALM will be use, otherwise, please
use multiFSSEMiPALM2
for general cases
multiFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, lambda, rho, Wl, Wf, p, maxit = 100, inert = inert_opt("linear"), threshold = 1e-06, verbose = TRUE, sparse = TRUE, trans = FALSE, B2norm = NULL, strict = FALSE )
multiFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, lambda, rho, Wl, Wf, p, maxit = 100, inert = inert_opt("linear"), threshold = 1e-06, verbose = TRUE, sparse = TRUE, trans = FALSE, B2norm = NULL, strict = FALSE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance from ridge regression |
lambda |
Hyperparameter of lasso term in FSSEM |
rho |
Hyperparameter of fused-lasso term in FSSEM |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for adaptive fused lasso term |
p |
number of genes |
maxit |
maximum iteration number. Default 100 |
inert |
inertial function for iPALM. Default as k-1/k+2 |
threshold |
convergence threshold. Default 1e-6 |
verbose |
Default TRUE |
sparse |
Sparse Matrix or not |
trans |
Fs matrix is transposed to k x p or not. If Fs from ridge regression, trans = TRUE, else, trans = FALSE |
B2norm |
B2norm matrices generated from ridge regression. Default NULL. |
strict |
Converge strictly or not. Default False |
fit List of FSSEM model
coefficient matrices of gene regulatory networks
coefficient matrices of eQTL-gene effect
Bias vector
estimate of covariance in SEM
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) data$Data$X = list(data$Data$X, data$Data$X) ## gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, ## N, Ng, Nk) gamma = 0.6784248 ## optimal gamma computed by cv.multiRegression fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk cvfitc <- cv.multiFSSEMiPALM2(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 5, nrho = 5, nfold = 5, p = Ng, q = Nk, wt = TRUE) fitc0 <- multiFSSEMiPALM2(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, lambda = cvfitc$lambda, rho = cvfitc$rho, Wl = inverseB(fit$Bs), Wf = flinvB(fit$Bs), p = Ng, maxit = 100, threshold = 1e-5, sparse = TRUE, verbose = TRUE, trans = TRUE, strict = TRUE) (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) data$Data$X = list(data$Data$X, data$Data$X) ## gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, ## N, Ng, Nk) gamma = 0.6784248 ## optimal gamma computed by cv.multiRegression fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk cvfitc <- cv.multiFSSEMiPALM2(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 5, nrho = 5, nfold = 5, p = Ng, q = Nk, wt = TRUE) fitc0 <- multiFSSEMiPALM2(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, lambda = cvfitc$lambda, rho = cvfitc$rho, Wl = inverseB(fit$Bs), Wf = flinvB(fit$Bs), p = Ng, maxit = 100, threshold = 1e-5, sparse = TRUE, verbose = TRUE, trans = TRUE, strict = TRUE) (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
Implementing NFSSEM algorithm for network inference. If Xs is identify for different conditions, multiNFSSEMiPALM will be use, otherwise, please
use multiNFSSEMiPALM2
for general cases
multiNFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, lambda, rho, Wl, Wf, p, maxit = 100, inert = inert_opt("linear"), threshold = 1e-06, verbose = TRUE, sparse = TRUE, trans = FALSE, B2norm = NULL, strict = FALSE )
multiNFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, lambda, rho, Wl, Wf, p, maxit = 100, inert = inert_opt("linear"), threshold = 1e-06, verbose = TRUE, sparse = TRUE, trans = FALSE, B2norm = NULL, strict = FALSE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance from ridge regression |
lambda |
Hyperparameter of lasso term in NFSSEM |
rho |
Hyperparameter of fused-lasso term in NFSSEM |
Wl |
weight matrices for adaptive lasso terms |
Wf |
weight matrix for columnwise l2 norm adaptive group lasso |
p |
number of genes |
maxit |
maximum iteration number. Default 100 |
inert |
inertial function for iPALM. Default as k-1/k+2 |
threshold |
convergence threshold. Default 1e-6 |
verbose |
Default TRUE |
sparse |
Sparse Matrix or not |
trans |
Fs matrix is transposed to k x p or not. If Fs from ridge regression, trans = TRUE, else, trans = FALSE |
B2norm |
B2norm matrices generated from ridge regression. Default NULL. |
strict |
Converge strictly or not. Default False |
zstart |
TRUE for zero matrix start |
fit List of NFSSEM model
coefficient matrices of gene regulatory networks
coefficient matrices of eQTL-gene effect
Bias vector
estimate of covariance in SEM
Ridge regression on multiple conditions, initialization of FSSEM algorithm
multiRegression(Xs, Ys, Sk, gamma, n, p, k, trans = FALSE)
multiRegression(Xs, Ys, Sk, gamma, n, p, k, trans = FALSE)
Xs |
eQTL matrices. eQTL matrix can be matrix/list of multiple conditions |
Ys |
Gene expression matrices |
Sk |
eQTL index of genes |
gamma |
Hyperparameter for ridge regression |
n |
number of observations |
p |
number of genes |
k |
number of eQTLs |
trans |
if rows for sample, trans = TRUE, otherwise, trans = FALSE. Default FALSE |
fit List of SEM model
coefficient matrices of gene regulatory networks
eQTL's coefficients w.r.t each gene
coefficient matrices of eQTL-gene effect
Bias vector
estimate of covariance in SEM
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) ## data$Data$X = list(data$Data$X, data$Data$X) gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, N, Ng, Nk) fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE)
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) ## data$Data$X = list(data$Data$X, data$Data$X) gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, N, Ng, Nk) fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE)
obj.multiRegression
obj.multiRegression(Xs, Ys, fit, trans = F)
obj.multiRegression(Xs, Ys, fit, trans = F)
Xs |
eQTL matrices |
Ys |
gene expression matrices |
fit |
regression fit result object |
trans |
if rows for sample, trans = TRUE, otherwise, trans = FALSE. Default FALSE |
error squared norm of ||(I-B)Y - FX||_2^2
optimize multiFSSEMiPALM's parameters by minimize BIC, when feature size is large (> 300), BIC methods will be much faster than Cross-validation
opt.multiFSSEMiPALM( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, p, q, wt = TRUE )
opt.multiFSSEMiPALM( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, p, q, wt = TRUE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
nlambda |
number of hyper-parameter of lasso term in CV |
nrho |
number of hyper-parameter of fused-lasso term in CV |
p |
number of genes |
q |
number of eQTLs |
wt |
use adaptive lasso or not. Default TRUE. |
list of model selection result
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) ## data$Data$X = list(data$Data$X, data$Data$X) ## gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, ## N, Ng, Nk) gamma = 0.6784248 ## optimal gamma computed by cv.multiRegression fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk fitm <- opt.multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 10, nrho = 10, p = Ng, q = Nk, wt = TRUE) fitc0 <- fitm$fit (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) ## data$Data$X = list(data$Data$X, data$Data$X) ## gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, ## N, Ng, Nk) gamma = 0.6784248 ## optimal gamma computed by cv.multiRegression fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk fitm <- opt.multiFSSEMiPALM(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 10, nrho = 10, p = Ng, q = Nk, wt = TRUE) fitc0 <- fitm$fit (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
optimize multiFSSEMiPALM's parameters by minimize BIC, when feature size is large (> 300), BIC methods will be much faster than Cross-validation
opt.multiFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, p, q, wt = TRUE )
opt.multiFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, p, q, wt = TRUE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
nlambda |
number of hyper-parameter of lasso term in CV |
nrho |
number of hyper-parameter of fused-lasso term in CV |
p |
number of genes |
q |
number of eQTLs |
wt |
use adaptive lasso or not. Default TRUE. |
list of model selection result
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) data$Data$X = list(data$Data$X, data$Data$X) ## gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, ## N, Ng, Nk) gamma = 0.6784248 ## optimal gamma computed by cv.multiRegression fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk fitm <- opt.multiFSSEMiPALM2(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 10, nrho = 10, p = Ng, q = Nk, wt = TRUE) fitc0 <- fitm$fit (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
seed = 1234 N = 100 # sample size Ng = 5 # gene number Nk = 5 * 3 # eQTL number Ns = 1 # sparse ratio sigma2 = 0.01 # sigma2 set.seed(seed) library(fssemR) data = randomFSSEMdata(n = N, p = Ng, k = Nk, sparse = Ns, df = 0.3, sigma2 = sigma2, u = 5, type = "DG", nhub = 1, dag = TRUE) ## If we assume that different condition has different genetics perturbations (eQTLs) data$Data$X = list(data$Data$X, data$Data$X) ## gamma = cv.multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, ngamma = 20, nfold = 5, ## N, Ng, Nk) gamma = 0.6784248 ## optimal gamma computed by cv.multiRegression fit = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, N, Ng, Nk, trans = FALSE) Xs = data$Data$X Ys = data$Data$Y Sk = data$Data$Sk fitm <- opt.multiFSSEMiPALM2(Xs = Xs, Ys = Ys, Bs = fit$Bs, Fs = fit$Fs, Sk = Sk, sigma2 = fit$sigma2, nlambda = 10, nrho = 10, p = Ng, q = Nk, wt = TRUE) fitc0 <- fitm$fit (TPR(fitc0$Bs[[1]], data$Vars$B[[1]]) + TPR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 (FDR(fitc0$Bs[[1]], data$Vars$B[[1]]) + FDR(fitc0$Bs[[2]], data$Vars$B[[2]])) / 2 TPR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]) FDR(fitc0$Bs[[1]] - fitc0$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]])
optimize multiNFSSEMiPALM's parameters by minimize BIC, when feature size is large (> 300), BIC methods will be much faster than Cross-validation
opt.multiNFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, p, q, wt = TRUE )
opt.multiNFSSEMiPALM2( Xs, Ys, Bs, Fs, Sk, sigma2, nlambda = 20, nrho = 20, p, q, wt = TRUE )
Xs |
eQTL matrices |
Ys |
Gene expression matrices |
Bs |
initialized GRN-matrices |
Fs |
initialized eQTL effect matrices |
Sk |
eQTL index of genes |
sigma2 |
initialized noise variance |
nlambda |
number of hyper-parameter of lasso term in CV |
nrho |
number of hyper-parameter of fused-lasso term in CV |
p |
number of genes |
q |
number of eQTLs |
wt |
use adaptive lasso or not. Default TRUE. |
list of model selection result
inversed column l2 norm for perturbed group lasso penalty
pninvB(Bs)
pninvB(Bs)
Bs |
list of network matrices |
inversed l2 norm of B2 - B1
if you do not want adaptive group lasso penalty, pnoneB replace pninvB
pnoneB(Bs)
pnoneB(Bs)
Bs |
list of network matrices |
inversed l2 norm of B2 - B1 with all entries is 1
proc.centerFSSEM
proc.centerFSSEM(Xs, Ys)
proc.centerFSSEM(Xs, Ys)
Xs |
eQTL matrices |
Ys |
list of gene expression matrices |
centered Xs and Ys and mean vectors
proc.centerFSSEM2
proc.centerFSSEM2(Xs, Ys)
proc.centerFSSEM2(Xs, Ys)
Xs |
list of eQTL matrices |
Ys |
list of gene expression matrices |
centered Xs and Ys and mean vectors
randomFSSEMdata
randomFSSEMdata( n, p, k, sparse = 0.1, df = 0.2, sigma2 = 0.01, u = 5, type = c("DG", "ER"), dag = TRUE, coef = c(0.2, 0.4), nhub = 2 )
randomFSSEMdata( n, p, k, sparse = 0.1, df = 0.2, sigma2 = 0.01, u = 5, type = c("DG", "ER"), dag = TRUE, coef = c(0.2, 0.4), nhub = 2 )
n |
number of observations |
p |
number of genes |
k |
number of eQTLs |
sparse |
ratio of edges / gene_number |
df |
ratio of differential edges among two network |
sigma2 |
noise variance of error |
u |
variance of bias in SEM model. |
type |
type of generated network, can be selected as DG, ER, Scale-free network |
dag |
network is directed-acyclic or not. Default TRUE |
coef |
Range of absolute value of coefficients in simulated network matrices. Default (0.2, 0.4), or (0.5, 1) |
nhub |
If you select to generate ER network, nhub is the number of pre-defined hub node number. Default 2 |
list of generated data
List of observed, Xs, Ys, Sk
List of model, Bs, Fs, mu, n, p, k
randomFSSEMdata2
randomFSSEMdata2( n, p, k, sparse = 0.1, df = 0.2, sigma2 = 0.01, u = 5, type = c("DG", "ER"), dag = TRUE, coef = c(0.2, 0.4), nhub = 2 )
randomFSSEMdata2( n, p, k, sparse = 0.1, df = 0.2, sigma2 = 0.01, u = 5, type = c("DG", "ER"), dag = TRUE, coef = c(0.2, 0.4), nhub = 2 )
n |
number of observations. Vector for unbalance observations |
p |
number of genes |
k |
number of eQTLs |
sparse |
ratio of edges / gene_number |
df |
ratio of differential edges among two network |
sigma2 |
noise variance of error |
u |
variance of bias in SEM model. |
type |
type of generated network, can be selected as DG, ER, Scale-free network |
dag |
network is directed-acyclic or not. Default TRUE |
coef |
Range of absolute value of coefficients in simulated network matrices. Default (0.2, 0.4), or (0.5, 1) |
nhub |
If you select to generate ER network, nhub is the number of pre-defined hub node number. Default 2 |
list of generated data
List of observed, Xs, Ys, Sk
List of model, Bs, Fs, mu, n, p, k
randomFSSEMdata4Cor
randomFSSEMdata4Cor( n, p, k, sparse = 0.1, df = 0.2, sigma2 = 0.01, u = 5, type = c("DG", "ER"), dag = TRUE, coef = c(0.2, 0.4), nhub = 2, r = 0.5 )
randomFSSEMdata4Cor( n, p, k, sparse = 0.1, df = 0.2, sigma2 = 0.01, u = 5, type = c("DG", "ER"), dag = TRUE, coef = c(0.2, 0.4), nhub = 2, r = 0.5 )
n |
number of observations. Vector for unbalance observations |
p |
number of genes |
k |
number of eQTLs |
sparse |
ratio of edges / gene_number |
df |
ratio of differential edges among two network |
sigma2 |
noise variance of error |
u |
variance of bias in SEM model. |
type |
type of generated network, can be selected as DG, ER, Scale-free network |
dag |
network is directed-acyclic or not. Default TRUE |
coef |
Range of absolute value of coefficients in simulated network matrices. Default (0.2, 0.4), or (0.5, 1) |
nhub |
If you select to generate ER network, nhub is the number of pre-defined hub node number. Default 2 |
r |
correlation between different observations |
list of generated data
List of observed, Xs, Ys, Sk
List of model, Bs, Fs, mu, n, p, k
Power of detection for network prediction
TPR(X, B, PREC = 0)
TPR(X, B, PREC = 0)
X |
list of predicted network matrices |
B |
list of true network matrices |
PREC |
precision threshold for FDR test. Default 0. |
transx
transx(data)
transx(data)
data |
Collecting data structure generated by randomFSSEMdata function |
transformed list of eQTL matrices