R/DA-test-multiple-groups.R
run_test_multiple_groups.Rd
Statistical test for multiple groups
run_test_multiple_groups(
  ps,
  group,
  taxa_rank = "all",
  transform = c("identity", "log10", "log10p"),
  norm = "TSS",
  norm_para = list(),
  method = c("anova", "kruskal"),
  p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"),
  pvalue_cutoff = 0.05,
  effect_size_cutoff = NULL
)
a phyloseq::phyloseq object
character, the name of the sample variable that defines the groups
character, the taxonomic rank on which to perform differential analysis. Should be one of phyloseq::rank_names(phyloseq); "all", which summarizes the taxa by the top taxonomic rank (summarize_taxa(ps, level = rank_names(ps)[1])); or "none", which performs differential analysis on the original taxa (taxa_names(phyloseq), e.g., OTU or ASV).
character, the method used to transform the microbial abundance. See transform_abundances() for more details. The options include:
"identity": return the original data without any transformation (default).
"log10": the transformation is log10(object); if the data contains zeros, the transformation is log10(1 + object).
"log10p": the transformation is log10(1 + object).
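The three transform options described above can be sketched in base R. This is an illustration only, not the package's implementation: transform_sketch() is a hypothetical helper, and the zero-handling rule for "log10" is taken directly from the description above.

```r
# Hypothetical helper mirroring the documented options; not the package's code.
transform_sketch <- function(x, transform = c("identity", "log10", "log10p")) {
  transform <- match.arg(transform)
  switch(transform,
    identity = x,
    # "log10": plain log10 unless zeros are present, then log10(1 + x)
    log10 = if (any(x == 0)) log10(1 + x) else log10(x),
    log10p = log10(1 + x)
  )
}

# Toy abundances containing a zero, so "log10" falls back to log10(1 + x)
counts <- matrix(c(0, 9, 99, 999), nrow = 2)
transformed <- transform_sketch(counts, "log10")
```

With a zero in the data, "log10" and "log10p" give the same result in this sketch; they differ only when every value is strictly positive.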
the method used to normalize the microbial abundance data. See normalize() for more details. Options include:
"none": do not normalize.
"rarefy": random subsampling of counts to the smallest library size in the data set.
"TSS": total sum scaling, also referred to as "relative abundance"; the abundances are normalized by dividing by the corresponding sample library size.
"TMM": trimmed mean of m-values. First, a sample is chosen as reference. The scaling factor is then derived using a weighted trimmed mean over the differences of the log-transformed gene-count fold-change between the sample and the reference.
"RLE": relative log expression. RLE uses a pseudo-reference calculated using the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene count ratios between the samples and the reference.
"CSS": cumulative sum scaling, calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold.
"CLR": centered log-ratio normalization.
"CPM": per-sample normalization of the sum of the values to 1e+06.
a list of arguments passed to the specific normalization method
test method, must be one of "anova" or "kruskal"
method for multiple testing correction, default "none"; see stats::p.adjust for more details.
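As a small illustration of what the correction does, the sketch below applies stats::p.adjust to a made-up vector of p values. The p values are invented for the example; the function itself applies the correction internally according to p_adjust.

```r
# Made-up raw p values for five taxa
p <- c(0.001, 0.01, 0.02, 0.04, 0.2)

# Benjamini-Hochberg adjustment; "fdr" is an alias for "BH" in stats::p.adjust
p_bh <- p.adjust(p, method = "BH")
p_fdr <- p.adjust(p, method = "fdr")

# Bonferroni multiplies each p value by the number of tests, capped at 1
p_bonf <- p.adjust(p, method = "bonferroni")
```

Bonferroni is never less conservative than BH, so a marker that survives Bonferroni correction at a given cutoff also survives BH at the same cutoff.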
numeric, p value cutoff, default 0.05.
numeric, effect size cutoff, default NULL, which means no effect size filter. Eta squared is used to measure the effect size for the anova/kruskal test.
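For the ANOVA case, eta squared is conventionally the between-group sum of squares divided by the total sum of squares. The sketch below computes it with base R on simulated data; the data and object names are made up, and how the package derives eta squared for the Kruskal-Wallis test is not stated here, so that case is left out.

```r
set.seed(1)

# Simulated abundance of one taxon in three groups of ten samples each
abund <- c(rnorm(10, mean = 1), rnorm(10, mean = 2), rnorm(10, mean = 3))
grp <- factor(rep(c("A", "B", "C"), each = 10))

# One-way ANOVA; the summary table holds the sums of squares
fit <- aov(abund ~ grp)
ss <- summary(fit)[[1]][["Sum Sq"]]  # first entry: between groups, second: residuals

# Eta squared: share of the total variation explained by the grouping
eta_sq <- ss[1] / sum(ss)
```

Eta squared lies between 0 and 1, so a value passed as effect_size_cutoff would be expected to fall in that range as well.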
a microbiomeMarker object.
data(enterotypes_arumugam)
ps <- phyloseq::subset_samples(
  enterotypes_arumugam,
  Enterotype %in% c("Enterotype 3", "Enterotype 2", "Enterotype 1")
)
mm_anova <- run_test_multiple_groups(
  ps,
  group = "Enterotype",
  method = "anova"
)