Linear discriminant analysis (LDA) effect size (LEfSe) analysis

Perform metagenomic LEfSe analysis on a phyloseq object.

run_lefse(
  ps,
  group,
  subgroup = NULL,
  taxa_rank = "all",
  transform = c("identity", "log10", "log10p"),
  norm = "CPM",
  norm_para = list(),
  kw_cutoff = 0.05,
  lda_cutoff = 2,
  bootstrap_n = 30,
  bootstrap_fraction = 2/3,
  wilcoxon_cutoff = 0.05,
  multigrp_strat = FALSE,
  strict = c("0", "1", "2"),
  sample_min = 10,
  only_same_subgrp = FALSE,
  curv = FALSE
)

Arguments

ps

a phyloseq-class object

group

character, the name of the column that defines the groups

subgroup

character, the name of the column that defines the subgroups

taxa_rank

character, the taxonomic rank at which to perform the differential analysis. Should be one of phyloseq::rank_names(ps); "all", which summarizes the taxa by the top taxa ranks (summarize_taxa(ps, level = rank_names(ps)[1])); or "none", which performs the differential analysis on the original taxa (taxa_names(ps), e.g., OTU or ASV).

transform

character, the method used to transform the microbial abundance. See transform_abundances() for more details. The options include:

"identity", return the original data without any transformation (default).
"log10", the transformation is log10(object), and if the data contains zeros the transformation is log10(1 + object).
"log10p", the transformation is log10(1 + object).
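The three options can be illustrated with a minimal base R sketch. The helper transform_sketch() and the toy matrix are hypothetical and only mirror the descriptions above; they are not the package's transform_abundances() implementation.

```r
# Toy abundance matrix (taxa in rows, samples in columns); values are made up.
abd <- matrix(
  c(0, 9, 99, 999),
  nrow = 2,
  dimnames = list(c("taxon_a", "taxon_b"), c("s1", "s2"))
)

# Hypothetical helper that mirrors the three documented options.
transform_sketch <- function(x, method = c("identity", "log10", "log10p")) {
  method <- match.arg(method)
  switch(method,
    identity = x,
    # log10(x), falling back to log10(1 + x) when the data contain zeros
    log10 = if (any(x == 0)) log10(1 + x) else log10(x),
    log10p = log10(1 + x)
  )
}

transform_sketch(abd, "log10p")
```

Because the toy matrix contains a zero, "log10" and "log10p" give the same result here.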

norm

the method used to normalize the microbial abundance data. See normalize() for more details. Options include:

"none": do not normalize.
"rarefy": randomly subsample counts to the smallest library size in the data set.
"TSS": total sum scaling, also referred to as "relative abundance"; abundances are normalized by dividing by the corresponding sample library size.
"TMM": trimmed mean of M-values. First, a sample is chosen as the reference. The scaling factor is then derived using a weighted trimmed mean over the differences of the log-transformed gene-count fold-changes between the sample and the reference.
"RLE": relative log expression. RLE uses a pseudo-reference calculated using the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene-count ratios between the samples and the reference.
"CSS": cumulative sum scaling, which calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold.
"CLR": centered log-ratio normalization.
"CPM": per-sample normalization of the sum of the values to 1e+06.
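To make the simpler options concrete, the following base R sketch computes TSS, CPM and CLR by hand on a made-up count matrix. It assumes taxa in rows and samples in columns and a zero-free matrix for CLR; the package's normalize() may handle zeros and other details differently.

```r
# Made-up counts: 3 taxa (rows) x 2 samples (columns), no zeros.
counts <- matrix(
  c(10, 30, 60, 5, 5, 40),
  nrow = 3,
  dimnames = list(paste0("taxon_", 1:3), c("s1", "s2"))
)

# Library size of each sample.
lib_size <- colSums(counts)

# TSS: divide each sample by its library size, so every column sums to 1.
tss <- sweep(counts, 2, lib_size, "/")

# CPM: rescale each sample so that its values sum to 1e+06.
cpm <- tss * 1e6

# CLR: log abundance minus the per-sample mean log abundance.
clr <- apply(counts, 2, function(x) log(x) - mean(log(x)))

colSums(tss)
colSums(cpm)
```

In this sketch CPM is simply TSS multiplied by 1e+06, and each CLR column sums to zero.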

norm_para

named list, other arguments passed to the specific normalization method. Most users will not need to pass any additional arguments here.

kw_cutoff

numeric, p-value cutoff of the Kruskal-Wallis (kw) test, default 0.05
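As a rough illustration of how such a cutoff screens features, the base R sketch below runs stats::kruskal.test() on each row of a toy matrix and keeps the features whose p-value does not exceed the cutoff. The data, the variable names, and the use of "<=" are assumptions for illustration, not the package's internal code.

```r
# Two groups of 10 samples each.
group <- factor(rep(c("A", "B"), each = 10))

# Toy features (rows): one clearly shifted between groups, one identical in both.
abd <- rbind(
  shifted = c(1:10, 21:30),
  flat = rep(c(1, 2), times = 10)
)

kw_cutoff <- 0.05

# Kruskal-Wallis p-value for every feature.
kw_p <- apply(abd, 1, function(x) kruskal.test(x, group)$p.value)

# Features that pass the screen.
kept <- names(kw_p)[kw_p <= kw_cutoff]
kept
```

Only the shifted feature passes here; the flat feature has the same distribution in both groups.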

lda_cutoff

numeric, LDA score cutoff, default 2

bootstrap_n

integer, the number of bootstrap iterations for LDA, default 30

bootstrap_fraction

numeric, the subsampling fraction value for each bootstrap iteration, default 2/3
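The interplay of bootstrap_n and bootstrap_fraction can be sketched in base R as below. This is only an illustration under stated assumptions: samples are drawn with replacement, a draw is repeated until both groups are present, and a plain difference of group means stands in for the LDA effect size. None of these details are taken from the package's implementation.

```r
set.seed(42)

bootstrap_n <- 30
bootstrap_fraction <- 2 / 3

# Toy data: one feature measured in two groups of 15 samples.
group <- rep(c("A", "B"), each = 15)
x <- c(rnorm(15, mean = 2), rnorm(15, mean = 0))

# Number of samples drawn in each bootstrap iteration.
n_draw <- round(length(x) * bootstrap_fraction)

effect <- vapply(seq_len(bootstrap_n), function(i) {
  # Redraw until the subsample contains both groups (assumption).
  repeat {
    idx <- sample(length(x), n_draw, replace = TRUE)
    if (length(unique(group[idx])) == 2) break
  }
  # Stand-in effect size: difference of group means in the subsample.
  mean(x[idx][group[idx] == "A"]) - mean(x[idx][group[idx] == "B"])
}, numeric(1))

# Average the per-iteration effects.
mean(effect)
```

Averaging over iterations makes the estimate less sensitive to any single subsample.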

wilcoxon_cutoff

numeric, p-value cutoff of the Wilcoxon test, default 0.05

multigrp_strat

logical, for multiple group tasks, whether the test is performed in a one-against-one (more strict) or in a one-against-all setting, default FALSE.

strict

multiple testing options, 0 for no correction (default), 1 for independent comparisons, 2 for dependent comparisons.

sample_min

integer, minimum number of samples per subclass for performing the Wilcoxon test, default 10

only_same_subgrp

logical, whether to perform the Wilcoxon test only among the subgroups with the same name, default FALSE

curv

logical, whether to perform the Wilcoxon test using Curtis's approach, default FALSE

Value

a microbiomeMarker object, in which the marker_table slot contains four variables:

feature, the significantly different features.
enrich_group, the group in which the differential feature is enriched.
lda, logarithmic LDA score (effect size).
pvalue, p-value of the Kruskal-Wallis (kw) test.

References

Segata, Nicola, et al. Metagenomic biomarker discovery and explanation. Genome Biology 12.6 (2011): R60.

Author

Yang Cao

Examples
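
A minimal usage sketch follows. It assumes the microbiomeMarker and phyloseq packages are installed, that the kostic_crc example data set ships with microbiomeMarker, and that its sample data has a DIAGNOSIS column; the cutoffs are illustrative choices, not recommendations. The call is guarded so the snippet does nothing when the package is unavailable.

```r
# Run only if microbiomeMarker is installed.
if (requireNamespace("microbiomeMarker", quietly = TRUE)) {
  library(microbiomeMarker)

  # Example data set assumed to ship with the package.
  data(kostic_crc)

  # Keep a single phylum so the example runs quickly.
  kostic_crc_small <- phyloseq::subset_taxa(
    kostic_crc,
    Phylum == "Firmicutes"
  )

  # LEfSe with stricter p-value cutoffs and a higher LDA score cutoff.
  mm_lefse <- run_lefse(
    kostic_crc_small,
    group = "DIAGNOSIS",
    kw_cutoff = 0.01,
    wilcoxon_cutoff = 0.01,
    multigrp_strat = TRUE,
    lda_cutoff = 4
  )

  # Inspect the identified markers.
  marker_table(mm_lefse)
}
```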