Data processing
General snakemake workflow
A tentative snakemake workflow defines the data processing rules as a DAG (directed acyclic graph). A detailed interactive snakemake HTML report is available here; it is best viewed on a wide screen.
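To illustrate what a DAG of processing rules implies for execution order, the sketch below sorts a handful of rules with Python's standard-library `graphlib`. The rule names simply mirror the section headings of this guide and are hypothetical; they are not the rule names used in the actual Snakefile, and snakemake itself infers dependencies from the input and output files of each rule.

```python
from graphlib import TopologicalSorter

# Hypothetical rule names: each rule maps to the set of rules it depends on.
rule_dependencies = {
    "tidy_data": set(),
    "composite_objects": {"tidy_data"},
    "phyloseq_objects": {"tidy_data"},
    "data_transformation": {"phyloseq_objects"},
    "data_reduction": {"data_transformation"},
}

def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dependencies).static_order())

order = execution_order(rule_dependencies)
print(order)
```

Any order returned this way runs a rule only after all of its prerequisites, which is the guarantee the acyclic structure provides.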
Data Tidying
Requires:
- Sample metadata.
- OTU tables from the mothur and qiime2 pipelines.
- Taxonomy tables from the mothur and qiime2 pipelines.
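As a rough illustration of tidying, the sketch below reshapes a small wide OTU table into one record per sample and OTU. It assumes that "tidy" means one observation per row; the table layout, the column names (`otu`, `sample_id`, `count`) and the counts are made up for the example and do not reproduce the actual mothur or qiime2 output formats.

```python
import csv
import io

# Toy wide OTU table (made-up counts): one row per OTU, one column per sample.
wide_otu_table = """otu\tS1\tS2
Otu001\t10\t0
Otu002\t5\t15
"""

def wide_to_long(tsv_text):
    """Reshape a wide OTU table into one record per (sample, OTU) pair."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        otu = row.pop("otu")  # the remaining keys are sample identifiers
        for sample_id, count in row.items():
            records.append({"sample_id": sample_id, "otu": otu, "count": int(count)})
    return records

tidy_otu = wide_to_long(wide_otu_table)
for record in tidy_otu:
    print(record)
```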
Composite objects
Requires:
- Tidy sample metadata.
- Tidy OTU tables.
- Tidy taxonomy tables.
The mothur and qiime2 composite objects are in long format, which is suitable for most types of data visualization.
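One plausible way to picture a long-format composite object is as a join of the three tidy inputs, with one row per sample and OTU. The sketch below is a minimal illustration under that assumption; the column names and all values are invented for the example and are not taken from the guide's data.

```python
# Toy tidy inputs (made-up values); column names are assumptions.
metadata = [
    {"sample_id": "S1", "group": "control"},
    {"sample_id": "S2", "group": "treated"},
]
otu_counts = [
    {"sample_id": "S1", "otu": "Otu001", "count": 10},
    {"sample_id": "S2", "otu": "Otu001", "count": 0},
    {"sample_id": "S1", "otu": "Otu002", "count": 5},
    {"sample_id": "S2", "otu": "Otu002", "count": 15},
]
taxonomy = [
    {"otu": "Otu001", "genus": "Bacteroides"},
    {"otu": "Otu002", "genus": "Lactobacillus"},
]

def build_composite(metadata, otu_counts, taxonomy):
    """Inner-join counts with metadata (by sample_id) and taxonomy (by otu)."""
    meta_by_sample = {row["sample_id"]: row for row in metadata}
    taxa_by_otu = {row["otu"]: row for row in taxonomy}
    composite = []
    for row in otu_counts:
        sample = meta_by_sample.get(row["sample_id"])
        taxon = taxa_by_otu.get(row["otu"])
        if sample is None or taxon is None:
            continue  # inner join: drop rows without a match on either side
        composite.append({**sample, **taxon, **row})
    return composite

composite = build_composite(metadata, otu_counts, taxonomy)
for row in composite:
    print(row)
```

Because every row carries its own sample and taxon annotations, such a table can be filtered or grouped by any column without further lookups, which is what makes the long format convenient for plotting.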
Phyloseq objects
Requires:
- The phyloseq package.
- Tidy sample metadata.
- Tidy OTU tables.
- Tidy taxonomy tables.
Data transformation
Requires:
- The phyloseq package.
- The microbiome package.
- Phyloseq objects.
Data transformation converts the values into ready-to-use matrices. Many methods exist; here are just a few:
- No transformation: the values remain raw abundances.
- Compositional version or relative abundance.
- Arc sine (asin) transformation.
- Z-transform for OTUs.
- Z-transform for Samples.
- Log10 transformation.
- Log10p transformation.
- CLR transformation.
- Shift the baseline.
- Data Scaling.
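A few of the transformations listed above can be sketched in plain Python for a single vector of counts. These are simplified textbook versions, not the implementations in the microbiome package; in particular, the pseudocount of 1 used for CLR and the use of the population standard deviation in the Z-transform are assumptions, and libraries differ on both.

```python
import math
from statistics import mean, pstdev

def relative_abundance(counts):
    """Compositional transform: each count divided by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]

def log10p(counts):
    """log10(1 + x), which stays defined for zero counts."""
    return [math.log10(1 + c) for c in counts]

def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean of the logs."""
    logs = [math.log(c + pseudocount) for c in counts]  # pseudocount is an assumption
    center = mean(logs)  # equals the log of the geometric mean
    return [value - center for value in logs]

def z_transform(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]  # fails if all values are equal

sample_counts = [10, 0, 30, 60]  # made-up counts for one sample
print(relative_abundance(sample_counts))
print(log10p(sample_counts))
print(clr(sample_counts))
```

Applying `z_transform` to the counts of one OTU across samples, or to the counts of one sample across OTUs, corresponds to the two Z-transform variants in the list.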
Data reduction
Dimensionality reduction reduces the number of variables while keeping as much of the variation as possible. It includes:
- Linear methods (commonly used in microbiome data analysis)
  - PCA (Principal Component Analysis)
  - Factor Analysis
  - LDA (Linear Discriminant Analysis)
  - More here.
- Non-linear methods
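To make the idea of keeping as much variation as possible concrete, the sketch below finds the first principal component of a tiny matrix with power iteration on the covariance matrix. It is a teaching sketch with invented data, assuming samples in rows and already-transformed abundances in columns; a real analysis would normally rely on an established PCA implementation.

```python
import math

def center_columns(matrix):
    """Subtract each column (variable) mean so the data are centered."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def covariance(centered):
    """Sample covariance matrix of centered data (variables in columns)."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols] for ci in cols]

def first_component(cov, iterations=500):
    """Power iteration: approaches the leading eigenvector when the top
    eigenvalue is unique and the start vector is not orthogonal to it."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Rayleigh quotient of the unit vector gives the matching eigenvalue.
    eigenvalue = sum(v * sum(c * w for c, w in zip(row, vec)) for v, row in zip(vec, cov))
    return vec, eigenvalue

# Made-up transformed abundances: 4 samples (rows) x 2 taxa (columns).
data = [[2.0, 1.9], [0.0, 0.1], [-1.0, -1.1], [-1.0, -0.9]]
centered = center_columns(data)
cov = covariance(centered)
pc1, variance_pc1 = first_component(cov)
explained = variance_pc1 / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
print(pc1, round(explained, 3))
```

Here `scores` holds one coordinate per sample along the first component, and `explained` is the fraction of the total variance that this single axis retains.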
Citation
Please consider citing the iMAP article[1] if you find any part of the IMAP practical user guides helpful in your microbiome data analysis.
References
Appendix
Project main tree
.
├── 02_qiime2_data_processing.Rmd
├── LICENSE.md
├── README.md
├── Rplots.pdf
├── config
│  ├── config.yml
│  ├── pbs
│  ├── samples.tsv
│  ├── slurm
│  └── units.tsv
├── css
│  └── styles.css
├── dags
│  ├── rulegraph.png
│  └── rulegraph.svg
├── data
│  ├── mothur
│  └── qiime2
├── figures
│  ├── taxon_barplot.png
│  └── taxon_barplot.svg
├── images
│  ├── bkgd.png
│  ├── coders.png
│  ├── processdata.png
│  └── smkreport
├── imap-data-processing.Rproj
├── index.Rmd
├── library
│  ├── apa.csl
│  ├── imap.bib
│  └── references.bib
├── report.html
├── results
│  └── project_tree.txt
├── styles.css
└── workflow
   ├── Snakefile
   ├── Snakefile__
   ├── envs
   ├── report
   ├── rules
   ├── schemas
   └── scripts
19 directories, 25 files
Troubleshooting and FAQs
- Question
- Question
- Answer
- Answer