13 Taxa Prevalence and Detection
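- Taxa prevalence and detection analyses quantify how often a taxon is present across the samples under study. Prevalence is the fraction (or number) of samples in which a taxon is detected, and the detection threshold is the minimum abundance a taxon must exceed to count as present in a sample.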
- High prevalence indicates that the taxon is commonly found across many samples.
- Low prevalence suggests that the taxon is more sporadic or rare.
- Analyzing taxa prevalence shows how different microbial taxa are distributed across samples (for example, widespread "core" taxa versus rare ones), which can hint at their potential ecological roles.
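To make these definitions concrete, here is a minimal base-R sketch on a small, hypothetical count table (taxa as rows, samples as columns). It assumes a taxon counts as detected when its count is strictly greater than the detection threshold; check the documentation of `microbiome::prevalence()` for how it treats values equal to the threshold. The helper `taxon_prevalence()` is hypothetical and only illustrates the arithmetic.

```r
# Hypothetical toy count table: rows are taxa, columns are samples
counts <- matrix(
  c(10, 0, 5, 8,
     0, 0, 1, 0,
     3, 2, 0, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB", "TaxonC"),
                  c("S1", "S2", "S3", "S4"))
)

# Hypothetical helper: prevalence = fraction of samples in which a taxon's
# count is strictly greater than the detection threshold (assumed rule)
taxon_prevalence <- function(counts, detection = 0) {
  rowMeans(counts > detection)
}

taxon_prevalence(counts)                 # TaxonA 0.75, TaxonB 0.25, TaxonC 0.75
taxon_prevalence(counts, detection = 4)  # TaxonA 0.75, TaxonB 0.00, TaxonC 0.25
```

Raising the detection threshold can only lower (never raise) a taxon's prevalence, which is why the two thresholds are usually chosen together.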
13.1 Understanding sample and taxa sums
library(phyloseq)
# Minimum and maximum sample sums (reads per sample) and taxa sums (reads per taxon)
# are useful reference points when choosing filtering thresholds.
cat("Minimum sample sums:", min(sample_sums(ps_raw)), "\n\n")
Minimum sample sums: 1776
cat("Maximum sample sums:", max(sample_sums(ps_raw)), "\n\n")
Maximum sample sums: 28883
cat("Minimum taxa sums:", min(taxa_sums(ps_raw)), "\n\n")
Minimum taxa sums: 0
cat("Maximum taxa sums:", max(taxa_sums(ps_raw)), "\n\n")
Maximum taxa sums: 873314
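A minimum taxa sum of 0 means at least one taxon has no reads in any sample. The base-R sketch below, on a hypothetical toy table, shows what these sums measure: column totals correspond to per-sample sums (as from `sample_sums()`) and row totals to per-taxon sums (as from `taxa_sums()`), assuming taxa are stored as rows. It also drops all-zero taxa; on a phyloseq object the same idea is commonly written as `prune_taxa(taxa_sums(ps_raw) > 0, ps_raw)`.

```r
# Hypothetical toy count table: rows are taxa, columns are samples
counts <- matrix(
  c(10, 0, 5,
     0, 0, 0,
     3, 2, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), c("S1", "S2", "S3"))
)

sample_totals <- colSums(counts)  # reads per sample (library size)
taxon_totals  <- rowSums(counts)  # reads per taxon across all samples

range(sample_totals)  # 2 13
range(taxon_totals)   # 0 15

# Taxa with zero total reads carry no information; keep only the others
counts_nonzero <- counts[taxon_totals > 0, , drop = FALSE]
rownames(counts_nonzero)  # "TaxonA" "TaxonC"
```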
13.2 Subsetting Microbiome Data by Taxa Prevalence
library(phyloseq)
library(microbiome)
# Define a function that keeps only taxa whose prevalence meets a threshold
subset_by_prevalence <- function(physeq, prevalence_threshold, detection = 0) {
  # Relative prevalence: fraction of samples in which each taxon exceeds `detection`.
  # prevalence() returns a named numeric vector with one value per taxon.
  taxa_prevalence <- prevalence(physeq, detection = detection)
  # Names of the taxa that meet the prevalence threshold
  taxa_to_keep <- names(taxa_prevalence)[taxa_prevalence >= prevalence_threshold]
  # Drop all other taxa from the phyloseq object
  ps_subset <- prune_taxa(taxa_to_keep, physeq)
  return(ps_subset)
}
# Example usage: keep taxa present in at least 10% of samples (prevalence threshold 0.1)
ps_subset <- subset_by_prevalence(ps_raw, 0.1)
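Because the prevalence threshold is a fraction of samples, its value strongly changes what survives: a low value such as 0.1 removes only very rare taxa, while a high value such as 0.9 keeps only taxa found in nearly every sample. The base-R sketch below reproduces the filtering logic on a hypothetical toy table so the effect of the threshold is easy to see; `filter_by_prevalence()` is a hypothetical helper, not part of phyloseq or microbiome, and it assumes the strict "count > detection" rule. The microbiome package also offers `core()` for prevalence-based filtering of phyloseq objects; consult its documentation for details.

```r
# Hypothetical helper: keep taxa whose count exceeds `detection` in at least
# `prevalence_threshold` (a fraction) of samples. Rows are taxa, columns samples.
filter_by_prevalence <- function(counts, prevalence_threshold, detection = 0) {
  prev <- rowMeans(counts > detection)
  counts[prev >= prevalence_threshold, , drop = FALSE]
}

# Hypothetical toy table: TaxonA in 5/5 samples, TaxonB in 1/5, TaxonC in 2/5
counts <- matrix(
  c(10, 4, 5, 8, 6,
     0, 0, 1, 0, 0,
     3, 0, 0, 7, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), paste0("S", 1:5))
)

rownames(filter_by_prevalence(counts, 0.1))  # "TaxonA" "TaxonB" "TaxonC"
rownames(filter_by_prevalence(counts, 0.3))  # "TaxonA" "TaxonC"
rownames(filter_by_prevalence(counts, 0.9))  # "TaxonA"
```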