Data processing

General snakemake workflow

This tentative Snakemake workflow defines the data processing rules as a DAG (directed acyclic graph). A detailed interactive Snakemake HTML report is available here; it is best viewed on a wide screen.
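As a rough illustration of what a DAG of rules means, the sketch below orders a handful of processing steps so that every step runs after the steps it requires. It is plain Python and not the project's Snakefile. The step names are hypothetical, and the dependencies are only inferred from the "Requires" lists in the sections of this guide.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names (not the actual rule names in the Snakefile).
# Each step maps to the set of steps it requires.
REQUIRES = {
    "tidy_metadata": set(),
    "tidy_otutable": set(),
    "tidy_taxonomy": set(),
    "composite_object": {"tidy_metadata", "tidy_otutable", "tidy_taxonomy"},
    "phyloseq_object": {"tidy_metadata", "tidy_otutable", "tidy_taxonomy"},
    "data_transformation": {"phyloseq_object"},
}


def execution_order(requires):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(requires).static_order())


order = execution_order(REQUIRES)
print(order)
```

Several orders can be valid for the same graph, which is why the independent tidying steps may run in any order or in parallel.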

Data Tidying

Requires:

Sample metadata.
OTU tables from mothur and qiime2 pipelines.
Taxonomy tables from mothur and qiime2 pipelines.

Composite objects

Requires:

Tidy sample metadata.
Tidy OTU tables.
Tidy taxonomy tables.

The mothur and qiime2 composite objects are in long format, which is suitable for most types of data visualization.
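To make the long format concrete, here is a minimal sketch that joins three small tidy tables into one row per sample and OTU pair. It only illustrates the idea and is not the guide's own code. The function name, column names, and all values are made up for the example.

```python
def make_composite(metadata, otu_table, taxonomy):
    """Join tidy tables into long-format rows, one row per (sample, OTU)."""
    rows = []
    for sample_id, counts in otu_table.items():
        for otu, count in counts.items():
            row = {"sample_id": sample_id, "otu": otu, "count": count}
            row.update(metadata[sample_id])  # add the sample variables
            row.update(taxonomy[otu])        # add the taxonomic ranks
            rows.append(row)
    return rows


# Toy inputs (hypothetical values, for illustration only).
metadata = {"S1": {"group": "A"}, "S2": {"group": "B"}}
otu_table = {"S1": {"Otu1": 10, "Otu2": 0}, "S2": {"Otu1": 3, "Otu2": 7}}
taxonomy = {"Otu1": {"genus": "Bacteroides"}, "Otu2": {"genus": "Prevotella"}}

composite = make_composite(metadata, otu_table, taxonomy)
for row in composite:
    print(row)
```

Each row carries the count together with its sample and taxon annotations, so plotting code can group or facet by any column without further joins.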

Phyloseq objects

Requires:

The phyloseq package.
Tidy sample metadata.
Tidy OTU tables.
Tidy taxonomy tables.

Data transformation

Requires:

The phyloseq package.
The microbiome package.
Phyloseq objects.

Data transformation converts the abundance values into ready-to-use matrices. Many methods exist; a few of them are listed here:

No transformation (identity), which keeps the raw abundances.
Compositional transformation (relative abundance).
Arcsine (asin) transformation.
Z-transform for OTUs.
Z-transform for samples.
Log10 transformation.
Log10p transformation, that is, log10(1 + x).
CLR (centered log-ratio) transformation.
Baseline shift.
Data scaling.
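A few of the transformations listed above can be sketched in plain Python for a single sample's count vector. This is an illustrative sketch only and does not reproduce the phyloseq or microbiome package functions. The function names are hypothetical. The pseudocount used for CLR and the plain standardization used for the Z-transform are assumptions that may differ from what those packages do.

```python
import math


def compositional(counts):
    """Relative abundance: each count divided by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


def log10p(counts):
    """log10(1 + x), which stays defined for zero counts."""
    return [math.log10(1 + c) for c in counts]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log value minus the mean of the log values.

    The pseudocount (an assumption made here) avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def z_transform(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


counts = [10, 30, 60]  # toy counts, for illustration only
print(compositional(counts))
print(log10p(counts))
print(clr(counts))
print(z_transform(counts))
```

Applying `z_transform` across the samples of one OTU or across the OTUs of one sample gives the two Z-transform variants in the list.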

Data reduction

Dimensionality reduction reduces the number of variables while keeping as much of the variation as possible. It includes:

Linear methods (commonly used in microbiome data analysis)
- PCA (Principal Component Analysis)
- Factor Analysis
- LDA (Linear Discriminant Analysis)
- More here.
Non-linear methods
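As a small illustration of a linear method, the sketch below finds the first principal component of a samples-by-features table using power iteration on the covariance matrix. It is a teaching sketch under stated assumptions, not the method used in this guide. The function name is hypothetical, the starting vector of ones is an arbitrary choice, and real analyses would normally rely on an established PCA implementation.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, per-sample scores) of the first component.

    rows: list of samples, each a list of feature values.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeated multiplication converges to the leading
    # eigenvector, unless the start vector is orthogonal to it.
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data variation")
        vec = [v / norm for v in nxt]
    scores = [sum(c[j] * vec[j] for j in range(p)) for c in centered]
    return vec, scores


# Toy data lying on the line y = 2x (illustration only).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

Here two correlated variables collapse into one score per sample, which is the sense in which the dimension is reduced while the variation is kept.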

Related work

Repo	Description	Status
IMAP-GLIMPSE	IMAP project overview	In-progress
IMAP-PART 01	Software requirement for microbiome data analysis with Snakemake workflows	In-progress
IMAP-PART 02	Downloading and exploring microbiome sample metadata from SRA Database	In-progress
IMAP-PART 03	Downloading and filtering microbiome sequencing data from SRA database	In-progress
IMAP-PART 04	Quality Control of Microbiome Next Generation Sequencing Reads	In-progress
IMAP-PART 05	Microbial profiling using MOTHUR and Snakemake workflows	In-progress
IMAP-PART 06	Microbial profiling using QIIME2 and Snakemake workflows	In-progress
IMAP-PART 07	Processing Output from 16S-Based microbiome bioinformatics pipelines	In-progress
IMAP-PART 08	Exploratory Analysis of 16S-Based Microbiome Processed Data	In-progress
IMAP-SUMMARY	Summary of snakemake workflows for microbiome data analysis	In-progress

Citation

Please consider citing the iMAP article [1] if you find any part of the IMAP practical user guides helpful in your microbiome data analysis.

References

[1] Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C., Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics, 20. https://doi.org/10.1186/S12859-019-2965-4

Appendix

Project main tree

.
├── 02_qiime2_data_processing.Rmd
├── LICENSE.md
├── README.md
├── Rplots.pdf
├── config
│   ├── config.yml
│   ├── pbs
│   ├── samples.tsv
│   ├── slurm
│   └── units.tsv
├── css
│   └── styles.css
├── dags
│   ├── rulegraph.png
│   └── rulegraph.svg
├── data
│   ├── mothur
│   └── qiime2
├── figures
│   ├── taxon_barplot.png
│   └── taxon_barplot.svg
├── images
│   ├── bkgd.png
│   ├── coders.png
│   ├── processdata.png
│   └── smkreport
├── imap-data-processing.Rproj
├── index.Rmd
├── library
│   ├── apa.csl
│   ├── imap.bib
│   └── references.bib
├── report.html
├── results
│   └── project_tree.txt
├── styles.css
└── workflow
    ├── Snakefile
    ├── Snakefile__
    ├── envs
    ├── report
    ├── rules
    ├── schemas
    └── scripts

19 directories, 25 files

Troubleshooting and FAQs

Question

Answer

Question

Answer

IMAP-PART 07

Processing Output from 16S-based microbiome bioinformatics pipelines

Fostering Reproducible Microbiome data Analysis with Snakemake workflow

Teresia Mrema-Buza

IMAP-PART 07
Latest GitHub-Repo
Maintained by Teresia Mrema-Buza

Updated on 2023-04-25

