Data processing

General snakemake workflow


This tentative snakemake workflow defines the data processing rules as a DAG (directed acyclic graph). A detailed interactive snakemake HTML report is available here; it displays best on a wide screen.
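
As a rough illustration of what arranging rules in a DAG means, the following Python sketch orders a handful of rules so that each one runs only after the rules it depends on. The rule names and their dependencies are hypothetical, loosely mirroring the section titles of this guide; they are not taken from the actual Snakefile, and snakemake itself resolves the order from each rule's input and output files rather than from an explicit table like this.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules: each key maps to the set of rules whose output it needs.
RULE_DEPENDENCIES = {
    "tidy_data": set(),
    "composite_objects": {"tidy_data"},
    "phyloseq_objects": {"tidy_data"},
    "data_transformation": {"phyloseq_objects"},
    "data_reduction": {"data_transformation"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(RULE_DEPENDENCIES)
print(order)
```

Because the graph is acyclic, at least one valid order always exists; rules with no path between them (here, the composite and phyloseq steps) may run in either order or in parallel.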


Data Tidying

Requires:

  • Sample metadata
  • OTU tables from mothur and qiime2 pipelines.
  • Taxonomy tables from mothur and qiime2 pipelines.


Composite objects

Requires:

  • Tidy sample metadata.
  • Tidy OTU tables.
  • Tidy taxonomy tables.

The mothur and qiime2 composite objects are in long format, which suits most types of data visualization.
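
To make "long format" concrete, here is a minimal Python sketch that joins the three tidy inputs into one row per sample and OTU pair. The sample IDs, OTU names, column names, and values are invented for illustration, and the guide's own composite objects are built in R, so treat this as a picture of the resulting shape rather than the pipeline's implementation.

```python
# Hypothetical toy inputs standing in for the tidy tables.
metadata = {"S1": {"group": "control"}, "S2": {"group": "treated"}}
otu_table = {
    "S1": {"Otu001": 10, "Otu002": 0},
    "S2": {"Otu001": 3, "Otu002": 7},
}
taxonomy = {
    "Otu001": {"genus": "Bacteroides"},
    "Otu002": {"genus": "Prevotella"},
}


def to_long_format(metadata, otu_table, taxonomy):
    """Return one dict per (sample, OTU) pair with metadata and taxonomy attached."""
    rows = []
    for sample_id, counts in otu_table.items():
        for otu, count in counts.items():
            row = {"sample_id": sample_id, "otu": otu, "count": count}
            row.update(metadata[sample_id])  # sample-level columns
            row.update(taxonomy[otu])        # OTU-level columns
            rows.append(row)
    return rows


long_rows = to_long_format(metadata, otu_table, taxonomy)
for row in long_rows:
    print(row)
```

Each row carries everything a plotting layer needs (sample, group, taxon, count), which is why this shape is convenient for visualization.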


Phyloseq objects

Requires:

  • The phyloseq package.
  • Tidy sample metadata.
  • Tidy OTU tables.
  • Tidy taxonomy tables.


Data transformation

Requires:

  • The phyloseq package
  • The microbiome package
  • Phyloseq objects

Data transformation converts the values into ready-to-use matrices. Many methods exist; a few are listed here:

  • No transformation (raw abundance).
  • Compositional version (relative abundance).
  • Arc sine (asin) transformation.
  • Z-transform for OTUs.
  • Z-transform for samples.
  • Log10 transformation.
  • Log10p transformation, i.e. log10(1 + x).
  • CLR (centered log-ratio) transformation.
  • Shifting the baseline.
  • Data scaling.
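
A few of the transformations listed above can be sketched in plain Python for a single sample's counts. This is only an illustration of the arithmetic: the function names are hypothetical, the pseudocount of 1 in the CLR function is an assumption made here to avoid taking the log of zero, and the phyloseq and microbiome packages may handle zeros and scaling differently.

```python
import math


def compositional(counts):
    """Relative abundance: each count divided by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


def log10p(counts):
    """log10(1 + x), which stays defined for zero counts."""
    return [math.log10(1 + c) for c in counts]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio; the pseudocount is an assumption of this sketch."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def z_scores(values):
    """Plain z-score using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


sample_counts = [0, 9, 99]  # hypothetical counts for one sample
print(compositional(sample_counts))
print(log10p(sample_counts))
print(clr(sample_counts))
print(z_scores(sample_counts))
```

Applying `z_scores` across a row or a column of the abundance matrix corresponds to the per-OTU and per-sample variants in the list. Note that `compositional` fails for an all-zero sample and `z_scores` for constant values, so real data would need those cases handled.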


Data reduction

Dimensionality reduction reduces the number of variables while keeping as much of the variation as possible. It includes:

  1. Linear methods (commonly used in microbiome data analysis)
    • PCA (Principal Component Analysis)
    • Factor Analysis
    • LDA (Linear Discriminant Analysis)
    • More here.
  2. Non-linear methods
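
To show what "keeping as much of the variation as possible" means for PCA, the sketch below finds the first principal component of a tiny matrix using power iteration on the covariance matrix. It is a teaching example in standard-library Python with invented data and hypothetical function names; real analyses would normally rely on an established PCA routine rather than this loop.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean (rows are samples, columns are variables)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(matrix, iterations=200):
    """Return (direction, variance along it) via power iteration."""
    cov = covariance(center_columns(matrix))
    p = len(cov)
    vec = [1.0 / math.sqrt(p)] * p  # starting guess
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    variance = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return vec, variance


# Hypothetical data: four samples, two perfectly correlated variables.
data = [[1, 2], [2, 4], [3, 6], [4, 8]]
pc1, pc1_variance = first_principal_component(data)
cov = covariance(center_columns(data))
total_variance = cov[0][0] + cov[1][1]
print(pc1, pc1_variance / total_variance)
```

In this toy case a single component captures all of the variance, so two variables reduce to one with no loss. Power iteration can fail if the starting guess happens to be orthogonal to the leading component, which is one reason a library implementation is preferable in practice.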



Citation

Please consider citing the iMAP article[1] if you find any part of the iMAP practical user guides helpful in your microbiome data analysis.


References

[1] Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C., Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics, 20. https://doi.org/10.1186/s12859-019-2965-4



Appendix

Project main tree

.
├── 02_qiime2_data_processing.Rmd
├── LICENSE.md
├── README.md
├── Rplots.pdf
├── config
│   ├── config.yml
│   ├── pbs
│   ├── samples.tsv
│   ├── slurm
│   └── units.tsv
├── css
│   └── styles.css
├── dags
│   ├── rulegraph.png
│   └── rulegraph.svg
├── data
│   ├── mothur
│   └── qiime2
├── figures
│   ├── taxon_barplot.png
│   └── taxon_barplot.svg
├── images
│   ├── bkgd.png
│   ├── coders.png
│   ├── processdata.png
│   └── smkreport
├── imap-data-processing.Rproj
├── index.Rmd
├── library
│   ├── apa.csl
│   ├── imap.bib
│   └── references.bib
├── report.html
├── results
│   └── project_tree.txt
├── styles.css
└── workflow
    ├── Snakefile
    ├── Snakefile__
    ├── envs
    ├── report
    ├── rules
    ├── schemas
    └── scripts

19 directories, 25 files



Troubleshooting and FAQs

  1. Question
    • Answer
  2. Question
    • Answer