3 Prepare QIIME2 Output

After running the QIIME2 bioinformatics pipeline, the next step is to prepare its output for downstream analysis. This means organizing the exported files and reshaping them into tidy tables, so that the data are structured consistently and can be read directly by the statistical analysis and visualization tools used later.

3.1 Data directories

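Data directories are the folders where the data files are stored. The code below creates the data and data/qiime2 folders if they do not already exist; every file prepared in this chapter is written to data/qiime2.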

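# Create the output folders only if they are missing
if (!dir.exists('data')) {dir.create('data')}
if (!dir.exists('data/qiime2')) {dir.create('data/qiime2')}

# Wrap library() in suppressPackageStartupMessages() to silence the startup banner
suppressPackageStartupMessages(library(tidyverse))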
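
The two dir.exists() checks above can also be collapsed into a single call. The following is a minimal sketch, not part of the original workflow; it writes to a temporary location (the out_dir name is an assumption made for this illustration) so that it does not touch the project folders.

```r
# Create a nested output directory in one call.
# recursive = TRUE also creates the parent 'data' folder if needed;
# showWarnings = FALSE keeps the call quiet when the folder already exists.
out_dir <- file.path(tempdir(), "data", "qiime2")  # hypothetical demo location
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

dir.exists(out_dir)
```

In a real project, out_dir would simply be "data/qiime2".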

3.2 QIIME2 Metadata

The QIIME2 metadata describes the samples processed through the QIIME2 bioinformatics pipeline. As with Mothur, it includes sample identifiers, experimental conditions, and any other sample-level variables. The code below renames the sample-id column to sample_id and saves the table as a CSV file.

library(dplyr)

# Read the sample metadata, give the ID column a syntactic name, and save it as CSV
read_tsv("../imap-qiime2-bioinformatics/resources/metadata/qiime2_sample_metadata.tsv", show_col_types = FALSE) %>% 
  dplyr::rename(sample_id = "sample-id") %>% 
  write_csv("data/qiime2/qiime2_tidy_metadata.csv")
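
Because sample_id is the key used for later joins, it can help to confirm that every sample appears exactly once. The sketch below is illustrative only: toy_metadata and its values are invented stand-ins for the real metadata file, and the check is an optional addition rather than part of the original workflow.

```r
library(dplyr)
library(tibble)

# Toy stand-in for qiime2_sample_metadata.tsv; all values are invented
toy_metadata <- tibble(
  `sample-id` = c("S1", "S2", "S3"),
  group       = c("control", "treated", "treated")
)

# Same renaming step as in the chapter
tidy_metadata <- toy_metadata %>%
  rename(sample_id = "sample-id")

# Each sample should have a non-missing identifier and appear exactly once
stopifnot(
  !anyNA(tidy_metadata$sample_id),
  anyDuplicated(tidy_metadata$sample_id) == 0
)
```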

3.3 QIIME2 Feature Table

In QIIME2, the Feature Table serves the same purpose as the OTU table in Mothur: it summarizes the microbial community composition across samples. QIIME2 uses the term “Feature” instead of “OTU” for the unique sequences identified in the community, although the exported table still labels its first column “#OTU ID”. The code below drops the Mock samples, replaces missing counts with zero, and reshapes the table to one row per sample and feature.

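library(dplyr)

# skip = 1 drops the comment line that precedes the header row
read_tsv("../imap-qiime2-bioinformatics/qiime2_process/export/feature-table.tsv", skip = 1, show_col_types = FALSE) %>%
  dplyr::rename(feature = '#OTU ID') %>%
  # Drop the Mock sample columns
  select(-starts_with('Mock')) %>% 
  # Coerce every count column to numeric and treat missing counts as zero
  mutate(across(-feature, as.numeric)) %>% 
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0))) %>% 
  # Reshape from one column per sample to one row per sample and feature
  pivot_longer(-feature, names_to = "sample_id", values_to = "count") %>% 
  relocate(sample_id, .before = feature) %>% 
  write_csv("data/qiime2/qiime2_tidy_otutable.csv")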
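
To make the reshaping step concrete, here is a small sketch of the same wide-to-long conversion on an invented table. The names toy_wide and toy_long and all counts are assumptions made for illustration; only the column handling mirrors the code above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Invented wide feature table: one row per feature, one column per sample
toy_wide <- tibble(
  `#OTU ID` = c("featA", "featB"),
  S1        = c(10, 0),
  S2        = c(3, 7),
  Mock1     = c(5, 5)
)

toy_long <- toy_wide %>%
  rename(feature = "#OTU ID") %>%
  select(-starts_with("Mock")) %>%            # drop the Mock column
  pivot_longer(-feature, names_to = "sample_id", values_to = "count") %>%
  relocate(sample_id, .before = feature)      # sample_id first, as in the chapter

toy_long
```

Two features across two retained samples give four rows, each holding a single sample, feature, and count.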

3.4 QIIME2 Taxonomy

As with Mothur, the QIIME2 taxonomy file assigns a taxonomic classification to each Feature identified in the microbial community, often at several levels such as phylum, class, order, family, genus, and species. The code below strips empty trailing ranks and the rank prefixes, keeps the features that are classified to genus but carry no species assignment, and splits the lineage into one column per rank from Kingdom to Genus.

library(dplyr)

read_tsv("../imap-qiime2-bioinformatics/qiime2_process/export/taxonomy.tsv", show_col_types=FALSE) %>% 
  dplyr::rename(feature="Feature ID") %>% 
  distinct() %>%
  mutate_if(is.numeric, ~replace(., is.na(.), 0)) %>%
  # Strip empty trailing ranks, working from species up to kingdom
  mutate(Taxon = str_replace_all(Taxon, "; s__$", ""),
         Taxon = str_replace_all(Taxon, "; g__$", ""),
         Taxon = str_replace_all(Taxon, "; f__$", ""),
         Taxon = str_replace_all(Taxon, "; o__$", ""),
         Taxon = str_replace_all(Taxon, "; c__$", ""),
         Taxon = str_replace_all(Taxon, "; p__$", ""),
         Taxon = str_replace_all(Taxon, "; k__$", ""),
         # Remove square brackets and all whitespace
         Taxon = str_replace_all(Taxon, "\\[|\\]", ""),
         Taxon = str_replace_all(Taxon, "\\s", "")) %>%
  # Match the rank prefixes literally; the pattern "s__*" would also match
  # any "s_" inside a taxon name
  dplyr::filter(!grepl("s__", Taxon, fixed = TRUE)) %>%
  dplyr::filter(grepl("g__", Taxon, fixed = TRUE)) %>% 
  select(-Confidence) %>% 
  # Drop the rank prefixes, then split the lineage into one column per rank
  mutate(Taxon = str_replace_all(Taxon, "\\w__", "")) %>% 
  separate(Taxon, into = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"), sep = ";") %>% 
  write_csv("data/qiime2/qiime2_tidy_taxonomy.csv")
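
The effect of the cleaning steps is easier to see on a few lineage strings. The sketch below uses invented features and, for brevity, swaps the seven per-rank replacements for a single regular expression that removes any run of empty trailing ranks; that compact pattern is an alternative assumed here for illustration, not the chapter's own code.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tibble)

# Invented lineages: genus-level, family-level only, and species-level
toy_tax <- tibble(
  feature = c("featA", "featB", "featC"),
  Taxon = c(
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__Blautia; s__",
    "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__; g__; s__",
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__Escherichia; s__coli"
  )
)

toy_tidy_tax <- toy_tax %>%
  mutate(
    # Remove every empty rank at the end of the string in one pass
    Taxon = str_remove(Taxon, "(;\\s*[kpcofgs]__)+$"),
    Taxon = str_remove_all(Taxon, "\\s")
  ) %>%
  filter(!str_detect(Taxon, fixed("s__"))) %>%  # drop species-level lineages
  filter(str_detect(Taxon, fixed("g__"))) %>%   # keep genus-level lineages
  mutate(Taxon = str_remove_all(Taxon, "[kpcofgs]__")) %>%
  separate(Taxon,
           into = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"),
           sep = ";")

toy_tidy_tax
```

Only featA survives: featB has no genus once its empty ranks are stripped, and featC is removed by the species filter.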

3.5 QIIME2 Composite Object

As with Mothur, the QIIME2 metadata, Feature table (equivalent to the OTU table in Mothur), and taxonomy are integrated into a single composite object. This object holds all the relevant information about the samples analyzed with the QIIME2 pipeline, together with the relative abundance of each Feature within its sample, which simplifies the downstream analyses.

# QIIME2 composite
suppressPackageStartupMessages(library(tidyverse))

qiime2_tidy_metadata <- read_csv("data/qiime2/qiime2_tidy_metadata.csv", show_col_types = FALSE)
qiime2_tidy_otutable <- read_csv("data/qiime2/qiime2_tidy_otutable.csv", show_col_types = FALSE)
qiime2_tidy_taxonomy <- read_csv("data/qiime2/qiime2_tidy_taxonomy.csv", show_col_types = FALSE)

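# Join metadata to counts by sample, then add taxonomy by feature
qiime2_composite <- inner_join(qiime2_tidy_metadata, qiime2_tidy_otutable, by = "sample_id") %>% 
  inner_join(., qiime2_tidy_taxonomy, by = "feature") %>% 
  # Relative abundance of each feature within its sample
  group_by(sample_id) %>% 
  mutate(rel_abund = count/sum(count)) %>% 
  ungroup() %>% 
  # Samples with a total count of zero yield NaN, which is set to 0 here
  mutate_if(is.numeric, ~replace(., is.na(.), 0)) %>%
  relocate(count, .before = rel_abund)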

write_csv(qiime2_composite, "data/qiime2/qiime2_composite.csv")
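
A quick sanity check on the composite is that the relative abundances sum to 1 within each sample. The sketch below rebuilds a tiny composite from invented tables (all toy_ names and values are assumptions) and runs that check. Because rel_abund is calculated after the join with the taxonomy table, the denominator only counts features that remain in that table.

```r
library(dplyr)
library(tibble)

# Invented inputs standing in for the three tidy CSV files
toy_metadata <- tibble(sample_id = c("S1", "S2"),
                       group     = c("control", "treated"))
toy_counts   <- tibble(sample_id = c("S1", "S1", "S2", "S2"),
                       feature   = c("featA", "featB", "featA", "featB"),
                       count     = c(10, 30, 5, 15))
toy_taxonomy <- tibble(feature = c("featA", "featB"),
                       Genus   = c("Blautia", "Bacteroides"))

# Same join and relative-abundance logic as in the chapter
toy_composite <- toy_metadata %>%
  inner_join(toy_counts, by = "sample_id") %>%
  inner_join(toy_taxonomy, by = "feature") %>%
  group_by(sample_id) %>%
  mutate(rel_abund = count / sum(count)) %>%
  ungroup()

# Relative abundances should sum to 1 within every sample
sample_totals <- toy_composite %>%
  group_by(sample_id) %>%
  summarise(total = sum(rel_abund), .groups = "drop")

stopifnot(all(abs(sample_totals$total - 1) < 1e-8))
```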