2 Prepare Mothur Output

After the Mothur bioinformatics pipeline has finished, its output needs to be prepared for analysis in R. This step reads the sample metadata, the shared file (OTU counts per sample), and the consensus taxonomy file, converts each into a tidy table, and saves the results as CSV files in a common data directory. Downstream statistical and visualization tools expect consistently structured tables, so doing this once up front prevents format mismatches later on.

2.1 Creating data directories

Data directories are the folders in which a project's data files are stored. Keeping inputs and outputs in a predictable directory structure makes specific files easy to locate during analysis and helps keep the workflow reproducible. The code below creates a data/mothur folder for the tidy Mothur files if it does not already exist, then loads the tidyverse.

# Create data/ and data/mothur/ in one call if they do not exist yet
if (!dir.exists("data/mothur")) dir.create("data/mothur", recursive = TRUE)

# Load the tidyverse without printing its startup messages
suppressPackageStartupMessages(library(tidyverse))
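Because the code in this chapter reads Mothur output through relative paths, it can help to confirm those files exist before running the later steps. The following is a minimal sketch; check_inputs() and mothur_inputs are hypothetical names introduced here, and the paths are copied from this chapter, so they assume the same working-directory layout.

```r
# Hypothetical helper (not part of the original workflow): report which
# expected input files are present.
check_inputs <- function(paths) {
  present <- file.exists(paths)   # one TRUE/FALSE per path
  names(present) <- names(paths)  # keep the labels for readability
  if (!all(present)) {
    warning("Missing input file(s): ", paste(paths[!present], collapse = ", "))
  }
  present
}

# Paths copied from this chapter; they assume the working directory is the
# project root and that the Mothur project sits in a sibling folder.
mothur_inputs <- c(
  metadata = "data/mothur/mothur_sample_metadata.tsv",
  shared   = "../imap-mothur-bioinformatics/mothur_process/asv_analysis/final.asv.shared",
  taxonomy = "../imap-mothur-bioinformatics/mothur_process/asv_analysis/final.asv.ASV.cons.taxonomy"
)
```

Running check_inputs(mothur_inputs) returns TRUE or FALSE for each named file and warns about any that are missing, which is easier to act on than a read error halfway through the chapter.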

2.2 Mothur metadata

The Mothur metadata file describes the samples processed through the pipeline: sample identifiers, experimental conditions, and any other sample-level variables. The code below reads the tab-separated file and saves it as a CSV, matching the format of the other tidy tables.

read_tsv("data/mothur/mothur_sample_metadata.tsv", show_col_types = FALSE) %>% 
  write_csv("data/mothur/mothur_tidy_metadata.csv")
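The composite step later joins this table to the OTU table by sample, so it can be worth checking that the metadata has a group column and that no sample ID appears twice (a duplicated key would multiply rows in the join). This is a minimal sketch; check_metadata() is a hypothetical helper, and the toy table uses invented sample IDs.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical helper: sanity-check the tidy metadata before joining.
# Assumes sample IDs live in a column named "group", the join key used
# when the metadata is combined with the OTU table.
check_metadata <- function(metadata) {
  stopifnot("group" %in% names(metadata))       # join key must exist
  dup <- metadata %>% count(group) %>% filter(n > 1)
  if (nrow(dup) > 0) {
    stop("Duplicated sample IDs: ", paste(dup$group, collapse = ", "))
  }
  invisible(metadata)                            # return input unchanged
}

# Toy example with invented sample IDs, not real data
toy_metadata <- tibble(group = c("S1", "S2"), treatment = c("control", "diet"))
check_metadata(toy_metadata)
```

Here check_metadata(toy_metadata) returns the table unchanged, while a table with a repeated group value stops with an error.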

2.3 Mothur OTU table

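The Mothur OTU (Operational Taxonomic Unit) table, stored in Mothur's .shared format, records how many reads of each OTU were observed in each sample. It is the basis for comparing community structure and diversity between samples. In this workflow the features come from the ASV analysis (final.asv.shared) and are labeled OTU in the tidy table. The code below drops Mothur's bookkeeping columns (label and numASVs), replaces missing counts with 0, and reshapes the table from wide format (one column per OTU) to long format (one row per sample and OTU combination).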

read_tsv("../imap-mothur-bioinformatics/mothur_process/asv_analysis/final.asv.shared", show_col_types = FALSE) %>%
  dplyr::rename(group = "Group") %>%                                  # sample IDs
  dplyr::select(-c(label, numASVs)) %>%                               # drop Mothur bookkeeping columns
  mutate(across(-group, as.numeric)) %>%                              # one numeric count column per OTU
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0))) %>%  # missing counts become 0
  pivot_longer(-group, names_to = "OTU", values_to = "count") %>%     # wide to long: one row per sample and OTU
  write_csv("data/mothur/mothur_tidy_otutable.csv")
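To illustrate the wide-to-long reshaping, here is a minimal sketch on a made-up two-sample table laid out like a Mothur .shared file (label, Group, numASVs, then one column per feature). The label values, IDs, and counts are invented placeholders.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Toy stand-in for a .shared file: one row per sample, one column per feature
toy_shared <- tibble(
  label   = c("ASV", "ASV"),  # placeholder label values
  Group   = c("S1", "S2"),
  numASVs = c(3, 3),
  ASV001  = c(10, 0),
  ASV002  = c(5, 2),
  ASV003  = c(0, 8)
)

# Same reshaping idea as above: keep sample IDs, drop bookkeeping columns,
# then stack the feature columns into OTU/count pairs
toy_long <- toy_shared %>%
  rename(group = Group) %>%
  select(-c(label, numASVs)) %>%
  pivot_longer(-group, names_to = "OTU", values_to = "count")
```

The two-row, six-column input becomes six rows with the columns group, OTU, and count, which is the shape the composite join expects.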

2.4 Mothur taxonomy

The Mothur consensus taxonomy file (cons.taxonomy) assigns a taxonomic classification to each OTU. Each lineage is stored as one string: rank names separated by semicolons, each followed by a confidence value in parentheses, with a trailing semicolon. The code below removes the confidence values and splits the lineage into one column per rank, from Kingdom down to Genus.

read_tsv("../imap-mothur-bioinformatics/mothur_process/asv_analysis/final.asv.ASV.cons.taxonomy", show_col_types = FALSE) %>% 
  distinct() %>%
  dplyr::select(-Size) %>%                                # keep OTU and Taxonomy
  mutate(Taxonomy = gsub("\\([0-9]+\\)", "", Taxonomy),   # strip every confidence value, not only (100)
         Taxonomy = sub(";$", "", Taxonomy)) %>%          # drop the trailing ";" so each rank maps to one column
  separate(Taxonomy, into = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"), sep = ";") %>% 
  write_csv("data/mothur/mothur_tidy_taxonomy.csv")
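The lineage cleanup can be checked on a couple of hand-written strings. This is a minimal sketch assuming the usual Mothur style (semicolon-separated ranks, a confidence value in parentheses, a trailing semicolon); the ASV IDs, sizes, and lineages are illustrative, not taken from real output. It strips any numeric confidence value rather than only (100).

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Illustrative consensus-taxonomy rows (invented IDs and sizes)
toy_taxonomy <- tibble(
  OTU  = c("ASV001", "ASV002"),
  Size = c(15, 2),
  Taxonomy = c(
    "Bacteria(100);Firmicutes(100);Clostridia(100);Lachnospirales(100);Lachnospiraceae(100);Blautia(98);",
    "Bacteria(100);Bacteroidota(100);Bacteroidia(100);Bacteroidales(100);Bacteroidaceae(100);Bacteroides(100);"
  )
)

ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")

toy_tidy <- toy_taxonomy %>%
  select(-Size) %>%
  mutate(Taxonomy = gsub("\\([0-9]+\\)", "", Taxonomy),  # remove any "(nn)" value
         Taxonomy = sub(";$", "", Taxonomy)) %>%         # remove trailing ";"
  separate(Taxonomy, into = ranks, sep = ";")            # one column per rank
```

Note that "Blautia(98)" is cleaned as well; a pattern matching only "(100)" would have left it as "Blautia(98)" in the Genus column.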

2.5 Mothur Composite Object

The composite object combines the metadata, OTU table, and taxonomy into a single long-format table in R, with one row per sample and OTU. Joining the metadata to the OTU table by sample (group) and then to the taxonomy by OTU places every count next to its sample information and lineage. The code also computes each OTU's relative abundance within its sample (its count divided by the sample's total count), so downstream analyses can start from one file.

# Mothur composite
suppressPackageStartupMessages(library(tidyverse))

mothur_tidy_metadata <- read_csv("data/mothur/mothur_tidy_metadata.csv", show_col_types = FALSE)
mothur_tidy_otutable <- read_csv("data/mothur/mothur_tidy_otutable.csv", show_col_types = FALSE)
mothur_tidy_taxonomy <- read_csv("data/mothur/mothur_tidy_taxonomy.csv", show_col_types = FALSE)

mothur_composite <- inner_join(mothur_tidy_metadata, mothur_tidy_otutable, by = "group") %>%  # sample info + counts
  inner_join(mothur_tidy_taxonomy, by = "OTU") %>%                    # add each OTU's lineage
  group_by(group) %>% 
  mutate(rel_abund = count / sum(count)) %>%                          # share of each sample's reads
  ungroup() %>% 
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0))) %>%  # e.g. 0/0 = NaN for empty samples
  relocate(count, .before = rel_abund)

write_csv(mothur_composite, "data/mothur/mothur_composite.csv")
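The relative-abundance step divides each count by the total count of its sample, so the values within each sample sum to 1. A minimal sketch on invented counts, including a sample with no reads, shows why the NA replacement after this step matters: in R, 0/0 evaluates to NaN, and is.na() is TRUE for NaN.

```r
suppressPackageStartupMessages(library(dplyr))

# Invented long-format counts; S2 has no reads at all
toy_counts <- tibble(
  group = c("S1", "S1", "S2", "S2"),
  OTU   = c("ASV001", "ASV002", "ASV001", "ASV002"),
  count = c(30, 10, 0, 0)
)

toy_rel <- toy_counts %>%
  group_by(group) %>%
  mutate(rel_abund = count / sum(count)) %>%  # per-sample proportion
  ungroup() %>%
  mutate(rel_abund = if_else(is.na(rel_abund), 0, rel_abund))  # NaN from 0/0 becomes 0
```

Sample S1 gets 0.75 and 0.25, and the empty sample S2 gets 0 instead of NaN.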