Differential abundance analysis for microbial absolute abundance data. This function is a wrapper around ANCOMBC::ancombc().

run_ancombc(
  ps,
  group,
  confounders = character(0),
  contrast = NULL,
  taxa_rank = "all",
  transform = c("identity", "log10", "log10p"),
  norm = "none",
  norm_para = list(),
  p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"),
  prv_cut = 0.1,
  lib_cut = 0,
  struc_zero = FALSE,
  neg_lb = FALSE,
  tol = 1e-05,
  max_iter = 100,
  conserve = FALSE,
  pvalue_cutoff = 0.05
)

Arguments

ps

a phyloseq::phyloseq object, which consists of a feature table, sample metadata, and a taxonomy table.

group

the name of the group variable in the sample metadata. Specifying group is required for detecting structural zeros and performing the global test.

confounders

character vector, the confounding variables to adjust for. Default is character(0), indicating no confounding variables.

contrast

this parameter is only needed for a two-group comparison when the group variable has more than two levels. See Details for more information.

taxa_rank

character, the taxonomic rank on which to perform the differential analysis. Should be one of phyloseq::rank_names(ps); "all", which summarizes the taxa by the top taxonomic rank (summarize_taxa(ps, level = rank_names(ps)[1])); or "none", which performs the differential analysis on the original taxa (taxa_names(ps), e.g., OTUs or ASVs).

transform

character, the methods used to transform the microbial abundance. See transform_abundances() for more details. The options include:

  • "identity", return the original data without any transformation (default).

  • "log10", the transformation is log10(object), and if the data contains zeros the transformation is log10(1 + object).

  • "log10p", the transformation is log10(1 + object).
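The three options can be illustrated on a plain numeric matrix. This is a minimal base R sketch of the behavior described above, not the package's transform_abundances() implementation; transform_sketch is a hypothetical helper name.

```r
# Sketch of the three `transform` options on a plain numeric matrix.
# `transform_sketch` is a hypothetical helper, not part of the package.
transform_sketch <- function(x, transform = c("identity", "log10", "log10p")) {
  transform <- match.arg(transform)
  switch(transform,
    identity = x,
    # "log10" falls back to log10(1 + x) only when zeros are present
    log10 = if (any(x == 0)) log10(1 + x) else log10(x),
    log10p = log10(1 + x)
  )
}

counts <- matrix(c(0, 9, 99, 999), nrow = 2)
transform_sketch(counts, "identity")
transform_sketch(counts, "log10")   # data contain a zero, so log10(1 + x)
transform_sketch(counts, "log10p")
```

On data without zeros, "log10" and "log10p" give different results, so the choice matters when comparing runs.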

norm

the methods used to normalize the microbial abundance data. See normalize() for more details. Options include:

  • "none": do not normalize.

  • "rarefy": randomly subsample the counts to the smallest library size in the data set.

  • "TSS": total sum scaling, also referred to as "relative abundance"; the abundances are normalized by dividing by the corresponding sample library size.

  • "TMM": trimmed mean of m-values. First, a sample is chosen as reference. The scaling factor is then derived using a weighted trimmed mean over the differences of the log-transformed gene-count fold-change between the sample and the reference.

  • "RLE": relative log expression. RLE uses a pseudo-reference calculated as the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene-count ratios between each sample and the reference.

  • "CSS": cumulative sum scaling, calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold.

  • "CLR": centered log-ratio normalization.

  • "CPM": per-sample normalization of the sum of the values to 1e+06.
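As a rough illustration of the two simplest scaling methods in the list, the following base R sketch computes TSS and CPM by hand on a toy matrix. It assumes taxa in rows and samples in columns, and it is not the package's normalize() code.

```r
# Toy count matrix: rows are taxa, columns are samples
# (the layout is an assumption made for this sketch).
counts <- matrix(
  c(10, 30, 60,   # sample s1, library size 100
    5, 5, 40),    # sample s2, library size 50
  nrow = 3,
  dimnames = list(paste0("taxon", 1:3), c("s1", "s2"))
)

lib_size <- colSums(counts)              # library size per sample
tss <- sweep(counts, 2, lib_size, "/")   # TSS: each column sums to 1
cpm <- tss * 1e6                         # CPM: each column sums to 1e6

tss
cpm
```

The other methods (TMM, RLE, CSS) estimate per-sample scaling factors instead of dividing by the raw library size, so they cannot be reproduced with a single sweep().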

norm_para

named list of other arguments passed to the specific normalization method. Most users will not need to pass any additional arguments here.

p_adjust

method used to adjust p-values. Default is "none", the first option in the usage above. Options include "none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY". See stats::p.adjust() for more details.
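To see how the choice of method changes the result, the sketch below applies a few of the options to a small, made-up vector of p-values using stats::p.adjust() directly. In stats::p.adjust(), "fdr" is an alias for "BH".

```r
# Made-up p-values, for illustration only
p <- c(0.01, 0.02, 0.03, 0.04)

# One column of adjusted p-values per method
adjusted <- sapply(
  c("none", "holm", "bonferroni", "BH"),
  function(method) stats::p.adjust(p, method = method)
)
adjusted
```

With these values, "holm" and "bonferroni" push most p-values above 0.05, while "BH" keeps all of them at 0.04, so the number of reported markers can depend strongly on this argument.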

prv_cut

a numerical fraction between 0 and 1. Taxa with a prevalence less than prv_cut will be excluded from the analysis. Default is 0.10.

lib_cut

a numerical threshold for filtering samples based on library size. Samples with a library size less than lib_cut will be excluded from the analysis. Default is 0, i.e. no samples are filtered.
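The effect of the prv_cut and lib_cut filters can be sketched in base R. The sketch assumes that prevalence means the fraction of samples in which a taxon has a non-zero count, and it applies both filters to the original matrix; the order and exact definition used inside ANCOMBC::ancombc() may differ.

```r
# Toy counts: rows are taxa, columns are samples (illustrative values only)
counts <- matrix(
  c(0, 0, 0, 5,    # taxonA: present in 1 of 4 samples
    3, 0, 2, 7,    # taxonB: present in 3 of 4 samples
    1, 1, 0, 9),   # taxonC: present in 3 of 4 samples
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC"), paste0("s", 1:4))
)
prv_cut <- 0.5
lib_cut <- 2

prevalence <- rowMeans(counts > 0)   # assumed definition of prevalence
lib_size <- colSums(counts)          # library size per sample

keep_taxa <- prevalence >= prv_cut   # drop taxa with prevalence < prv_cut
keep_samples <- lib_size >= lib_cut  # drop samples with library size < lib_cut
filtered <- counts[keep_taxa, keep_samples, drop = FALSE]
filtered
```

Here taxonA (prevalence 0.25) and sample s2 (library size 1) are dropped; with the defaults prv_cut = 0.1 and lib_cut = 0, nothing in this toy matrix would be removed.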

struc_zero

whether to detect structural zeros. Default is FALSE.

neg_lb

whether to classify a taxon as a structural zero in the corresponding study group using its asymptotic lower bound. Default is FALSE.

tol

the iteration convergence tolerance for the E-M algorithm. Default is 1e-05.

max_iter

the maximum number of iterations for the E-M algorithm. Default is 100.

conserve

whether to use a conservative variance estimate of the test statistic. It is recommended if the sample size is small and/or the number of differentially abundant taxa is believed to be large. Default is FALSE.

pvalue_cutoff

level of significance. Default is 0.05.

Value

a microbiomeMarker object.

Details

contrast must be a character vector of length two, or NULL (the default). It only needs to be set for a two-group comparison when the group variable has more than two levels. The order determines the direction of the comparison: the first element specifies the reference group (control) and is the denominator of the fold change; the second element is the group compared against it and is the numerator of the fold change. Otherwise, users do not need to set this parameter (leave it as the default NULL): if there are two groups, the first level of the group variable is used as the reference group; if there are more than two groups, an ANOVA-like test is performed to find markers that differ among the groups.
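The direction implied by the order of contrast can be illustrated with a toy calculation. This base R sketch uses made-up abundances for a single taxon and a plain ratio of group means; it only shows which group ends up in the denominator, and it is not ANCOM-BC's bias-corrected estimate.

```r
# Hypothetical toy data: one taxon measured in three groups
abundance <- c(10, 12, 40, 44, 100, 110)
group <- c("A", "A", "B", "B", "C", "C")

# Reference (denominator) first, compared group (numerator) second
contrast <- c("A", "B")

keep <- group %in% contrast                  # samples from other groups are ignored
g <- factor(group[keep], levels = contrast)  # first level = reference group
means <- tapply(abundance[keep], g, mean)

# Positive value: the taxon is more abundant in contrast[2] than in contrast[1]
log2_fc <- log2(means[[contrast[2]]] / means[[contrast[1]]])
log2_fc
```

Swapping the two elements flips the sign of the result. Following the usage above, a package call for three enterotypes would presumably look like run_ancombc(ps, group = "Enterotype", contrast = c("Enterotype 3", "Enterotype 2")).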

References

Lin, Huang, and Shyamal Das Peddada. "Analysis of compositions of microbiomes with bias correction." Nature communications 11.1 (2020): 1-11.

Examples

data(enterotypes_arumugam)
ps <- phyloseq::subset_samples(
    enterotypes_arumugam,
    Enterotype %in% c("Enterotype 3", "Enterotype 2")
)
run_ancombc(ps, group = "Enterotype")
#> 'ancombc' is deprecated 
#> Use 'ancombc2' instead
#> Warning: The group variable has < 3 categories 
#> The multi-group comparisons (global/pairwise/dunnet/trend) will be deactivated
#> microbiomeMarker-class inherited from phyloseq-class
#> normalization method:              [ none ]
#> microbiome marker identity method: [ ancombc ]
#> marker_table() Marker Table:       [ 26 microbiome markers with 5 variables ]
#> otu_table()    OTU Table:          [ 235 taxa and  24 samples ]
#> sample_data()  Sample Data:        [ 24 samples by  9 sample variables ]
#> tax_table()    Taxonomy Table:     [ 235 taxa by 1 taxonomic ranks ]