13 Taxa Prevalence and Detection

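Taxa prevalence and detection analyses quantify how often each taxon is present across the samples under study.
Prevalence is the fraction (or number) of samples in which a taxon is detected, where detection means that the taxon's abundance in a sample exceeds a chosen detection threshold.
High prevalence indicates that a taxon is found in many samples, while low prevalence suggests that it is sporadic or rare.
Analyzing taxa prevalence shows how different microbial taxa are distributed across samples, which can hint at their ecological roles, and it is commonly used to filter out rare taxa before downstream analysis.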
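As a minimal illustration of these two ideas, the sketch below computes prevalence from a plain count matrix (taxa as rows, samples as columns) at two detection thresholds. The `counts` matrix and the function name `taxa_prevalence_sketch` are hypothetical and not part of this chapter's data. The strict `>` comparison is our assumption of how `microbiome::prevalence()` treats the detection threshold by default.

```r
# Hypothetical toy count matrix: 4 taxa (rows) x 5 samples (columns)
counts <- matrix(
  c(10, 0, 1, 8, 12,
     0, 0, 1, 0,  0,
     5, 7, 9, 6,  4,
     0, 2, 0, 0,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("S", 1:5))
)

# Hypothetical helper: prevalence = fraction of samples in which a taxon's
# count is strictly greater than `detection` (assumed default semantics)
taxa_prevalence_sketch <- function(counts, detection = 0) {
  rowMeans(counts > detection)  # logical matrix -> per-taxon fraction of TRUE
}

taxa_prevalence_sketch(counts)                 # taxon1 0.8, taxon2 0.2, taxon3 1.0, taxon4 0.4
taxa_prevalence_sketch(counts, detection = 2)  # taxon1 0.6, taxon2 0.0, taxon3 1.0, taxon4 0.0
```

Raising the detection threshold lowers the prevalence of taxa that occur only at low counts (here `taxon2` and `taxon4`), which is why detection and prevalence thresholds are usually chosen together.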

13.1 Understand sample and taxa sizes

library(phyloseq)

# Minimum and maximum sample and taxa sums: these values are useful when choosing filtering thresholds.

cat("Minimum sample sums:", min(sample_sums(ps_raw)), "\n\n")
Minimum sample sums: 1776 
cat("Maximum sample sums:", max(sample_sums(ps_raw)), "\n\n")
Maximum sample sums: 28883 
cat("Minimum taxa sums:", min(taxa_sums(ps_raw)), "\n\n")
Minimum taxa sums: 0 
cat("Maximum taxa sums:", max(taxa_sums(ps_raw)), "\n\n")
Maximum taxa sums: 873314
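To make explicit what these summaries measure, the sketch below reproduces them on a hypothetical toy matrix (taxa as rows) using base R: sample sums are column totals and taxa sums are row totals. A minimum taxa sum of 0, as in the output above, means some taxa have no reads in any sample; such taxa are commonly dropped before prevalence filtering. In phyloseq, the equivalent step would be `prune_taxa(taxa_sums(ps_raw) > 0, ps_raw)`.

```r
# Hypothetical toy count matrix: taxa as rows, samples as columns
counts <- matrix(
  c(120, 80,  0,
      0,  0,  0,
     15, 40, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC"), c("S1", "S2", "S3"))
)

sample_totals <- colSums(counts)  # reads per sample (what sample_sums() reports)
taxa_totals   <- rowSums(counts)  # reads per taxon (what taxa_sums() reports)

cat("Minimum sample sums:", min(sample_totals), "\n")  # 60
cat("Minimum taxa sums:", min(taxa_totals), "\n")      # 0 (taxonB is never observed)

# Drop taxa with zero reads across all samples
counts_nonzero <- counts[taxa_totals > 0, , drop = FALSE]
rownames(counts_nonzero)  # "taxonA" "taxonC"
```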

13.2 Subsetting Microbiome Data by Taxa Prevalence

library(phyloseq)
library(microbiome)

# Define a function to subset microbiome data based on taxa prevalence
subset_by_prevalence <- function(physeq, prevalence_threshold, detection = 0) {
  # Fraction of samples in which each taxon's abundance exceeds the detection threshold
  taxa_prevalence <- prevalence(physeq, detection = detection)
  
  # prevalence() returns a named vector, so use names() rather than rownames()
  taxa_to_keep <- names(taxa_prevalence)[taxa_prevalence >= prevalence_threshold]
  
  # Keep only the taxa that meet the prevalence threshold
  ps_subset <- prune_taxa(taxa_to_keep, physeq)
  
  return(ps_subset)
}

# Example usage: subset microbiome data with a prevalence threshold of 0.1 (10%)
ps_subset <- subset_by_prevalence(ps_raw, 0.1)
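The filtering logic itself does not depend on phyloseq. The minimal base-R sketch below applies the same two steps to a hypothetical toy matrix (taxa as rows): first a detection threshold, then a prevalence threshold. The function name `filter_by_prevalence_sketch` and the data are illustrative only, and the strict `>` detection comparison is an assumption chosen to mirror the default described above.

```r
# Hypothetical toy count matrix: 3 taxa (rows) x 4 samples (columns)
counts <- matrix(
  c(3, 0, 5, 2,
    0, 0, 0, 4,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonX", "taxonY", "taxonZ"), paste0("S", 1:4))
)

# Hypothetical helper: keep taxa whose count exceeds `detection` in at least
# `prevalence_threshold` of the samples; returns the filtered matrix
filter_by_prevalence_sketch <- function(counts, prevalence_threshold, detection = 0) {
  prev <- rowMeans(counts > detection)  # named numeric vector, one value per taxon
  keep <- names(prev)[prev >= prevalence_threshold]
  counts[keep, , drop = FALSE]
}

rownames(filter_by_prevalence_sketch(counts, 0.1))                 # "taxonX" "taxonY" "taxonZ"
rownames(filter_by_prevalence_sketch(counts, 0.5))                 # "taxonX" "taxonZ"
rownames(filter_by_prevalence_sketch(counts, 0.5, detection = 1))  # "taxonX"
```

With `detection = 1`, `taxonZ` drops out even though it appears in every sample, because its count of 1 never exceeds the threshold. This shows why the detection threshold should be set with the sequencing depth summarized in 13.1 in mind.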
