Statistical test between two groups

run_test_two_groups(
  ps,
  group,
  taxa_rank = "all",
  transform = c("identity", "log10", "log10p"),
  norm = "TSS",
  norm_para = list(),
  method = c("welch.test", "t.test", "white.test"),
  p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"),
  pvalue_cutoff = 0.05,
  diff_mean_cutoff = NULL,
  ratio_cutoff = NULL,
  conf_level = 0.95,
  nperm = 1000,
  ...
)

Arguments

ps

a phyloseq::phyloseq object

group

character, the variable to set the group

taxa_rank

character to specify taxonomic rank to perform differential analysis on. Should be one of phyloseq::rank_names(phyloseq), or "all" means to summarize the taxa by the top taxa ranks (summarize_taxa(ps, level = rank_names(ps)[1])), or "none" means perform differential analysis on the original taxa (taxa_names(phyloseq), e.g., OTU or ASV).

transform

character, the methods used to transform the microbial abundance. See transform_abundances() for more details. The options include:

  • "identity", return the original data without any transformation (default).

  • "log10", the transformation is log10(object), and if the data contains zeros the transformation is log10(1 + object).

  • "log10p", the transformation is log10(1 + object).

norm

the methods used to normalize the microbial abundance data. See normalize() for more details. Options include:

  • "none": do not normalize.

  • "rarefy": random subsampling counts to the smallest library size in the data set.

  • "TSS": total sum scaling, also referred to as "relative abundance", the abundances were normalized by dividing the corresponding sample library size.

  • "TMM": trimmed mean of m-values. First, a sample is chosen as reference. The scaling factor is then derived using a weighted trimmed mean over the differences of the log-transformed gene-count fold-change between the sample and the reference.

  • "RLE", relative log expression, RLE uses a pseudo-reference calculated using the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene counts ratios between the samples and the reference.

  • "CSS": cumulative sum scaling, calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold.

  • "CLR": centered log-ratio normalization.

  • "CPM": pre-sample normalization of the sum of the values to 1e+06.

norm_para

arguments passed to specific normalization methods

method

test method, must be one of "welch.test", "t.test" or "white.test"

p_adjust

method for multiple test correction, default none, for more details see stats::p.adjust.

pvalue_cutoff

numeric, p value cutoff, default 0.05

diff_mean_cutoff, ratio_cutoff

cutoff of different means and ratios, default NULL which means no effect size filter.

conf_level

numeric, confidence level of interval.

nperm

integer, number of permutations for white non parametric t test estimation

...

extra arguments passed to t.test() or fisher.test()

Value

a microbiomeMarker object.

Author

Yang Cao

Examples

data(enterotypes_arumugam)
mm_welch <- run_test_two_groups(
    enterotypes_arumugam,
    group = "Gender",
    method = "welch.test"
)
mm_welch
#> microbiomeMarker-class inherited from phyloseq-class
#> normalization method:              [ TSS ]
#> microbiome marker identity method: [ welch.test ]
#> marker_table() Marker Table:       [ 3 microbiome markers with 5 variables ]
#> otu_table()    OTU Table:          [ 244 taxa and  39 samples ]
#> sample_data()  Sample Data:        [ 39 samples by  9 sample variables ]
#> tax_table()    Taxonomy Table:     [ 244 taxa by 1 taxonomic ranks ]