5 Microbiome Bioinformatics Tools
5.1 Read quality control tools
We group the quality control (QC) tools under the bioinformatics tools. Most of them are available via the bioconda channel. Below are some of the most common tools used to inspect read features and their quality scores; click on each hyperlink to learn more about installing the software.
- SeqKit for simple read statistics[5].
- FastQC for quality assessment of bases[6].
- MultiQC for summarizing FastQC metrics[7].
- BBMap platform for read trimming and decontamination[8].
- Trimmomatic, a flexible read-trimming tool for Illumina NGS data[9].
- KneadData for performing quality control on metagenomic sequencing data, installable via either the bioconda or the biobakery channel[10].
Note that these links may become outdated; check each tool's website for the latest installation instructions.
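Once the tools are installed (see the commands in 5.1.1), a first QC pass commonly chains three of them: SeqKit for summary statistics, FastQC for per-base quality, and MultiQC to aggregate the FastQC reports. The function below is a minimal sketch of that pass, assuming gzipped reads with a `.fastq.gz` suffix in a single directory; the function name `run_read_qc`, the directory names and the thread count are hypothetical choices, not part of any tool's interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch): minimal read-QC pass with SeqKit, FastQC
# and MultiQC. Assumes all three tools are on PATH and reads are *.fastq.gz.
run_read_qc() {
  local fastq_dir="$1" out_dir="$2" tool

  # Fail early with a clear message if a tool is missing from the environment
  for tool in seqkit fastqc multiqc; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "error: $tool not found on PATH" >&2
      return 127
    fi
  done

  # FastQC does not create its output directory, so create it first
  mkdir -p "$out_dir/fastqc" || return 1

  # 1. Summary statistics per file (-a: all statistics, -T: tab-separated)
  seqkit stats -a -T "$fastq_dir"/*.fastq.gz > "$out_dir/seqkit_stats.tsv" || return 1

  # 2. Per-base quality assessment for every FASTQ file (4 threads)
  fastqc -t 4 -o "$out_dir/fastqc" "$fastq_dir"/*.fastq.gz || return 1

  # 3. Aggregate the FastQC reports into a single MultiQC report
  multiqc "$out_dir/fastqc" -o "$out_dir/multiqc"
}

# Example: run_read_qc raw_reads qc_results
```

With the example call, the SeqKit table, the FastQC reports and the MultiQC report would all end up under `qc_results/`. BBMap, Trimmomatic or KneadData would typically be run between steps 1 and 2 when reads need trimming or host decontamination.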
5.1.1 Installing using conda or mamba
Installing these bioinformatics tools is much easier with the conda or mamba package manager.
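mamba install -c conda-forge -c bioconda seqkit
mamba install -c conda-forge -c bioconda fastqc
mamba install -c conda-forge -c bioconda multiqc
mamba install -c conda-forge -c bioconda bbmap
mamba install -c conda-forge -c bioconda trimmomatic
# KneadData: install from either bioconda or the biobakery channel, not both
mamba install -c conda-forge -c bioconda kneaddata
mamba install -c biobakery kneaddata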
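After running the install commands, it is worth confirming that each tool's executable is actually on `PATH` in the active environment. The function below is a small sketch for that; `check_tools` is a hypothetical helper, and the executable names in the example are assumptions about what each package installs (BBMap, for instance, is invoked through scripts such as `bbduk.sh` rather than a `bbmap` command, and the bioconda Trimmomatic package provides a `trimmomatic` wrapper).

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch): report which executables are visible on
# PATH. Returns 0 only if every tool passed as an argument was found.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Exit status is usable in scripts, e.g. check_tools ... || exit 1
  [[ "$missing" -eq 0 ]]
}

# Example (executable names are assumptions, see above):
# check_tools seqkit fastqc multiqc bbduk.sh trimmomatic kneaddata
```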
5.2 Creating the readqc env from a YAML file
Filename: environment.yml
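A minimal sketch of such a file, assuming the `readqc` environment simply bundles the QC tools from 5.1 without version pins (the package list is an assumption, not necessarily the original file):

```yaml
# environment.yml (sketch): read-QC environment; package list assumed
# from the tools in 5.1, unpinned for brevity
name: readqc
channels:
  - conda-forge
  - bioconda
dependencies:
  - seqkit
  - fastqc
  - multiqc
  - bbmap
  - trimmomatic
  - kneaddata
```

The environment can then be created with `mamba env create -f environment.yml` (or `conda env create -f environment.yml`) and activated with `conda activate readqc`. Pinning versions with `package=version` entries makes the environment reproducible across machines.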
5.3 Microbial composition profiling tools
5.3.1 Mothur pipeline
- Best known for profiling microbial composition from 16S rRNA gene sequencing data.
- Mothur is an open-source software package for bioinformatics data processing.
- The Mothur platform comprises over 145 tools that can be combined into a custom pipeline.
- Mothur has a basic tutorial that helps users get started with 16S rRNA gene analysis.
- A stable release can be downloaded from here.
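Besides its interactive prompt, Mothur can run commands non-interactively: in command-line mode the commands are passed as one quoted string that starts with `#`, with multiple commands separated by semicolons. A minimal sketch, assuming Mothur is installed (it is available from bioconda, e.g. `mamba install -c conda-forge -c bioconda mothur`); the helper name `mothur_summary` and the FASTA filename are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch): summarise a FASTA file with Mothur's
# summary.seqs command, run in Mothur's non-interactive command-line mode.
mothur_summary() {
  local fasta="$1" processors="${2:-4}"
  if ! command -v mothur >/dev/null 2>&1; then
    echo "error: mothur not found on PATH" >&2
    return 127
  fi
  # The "#"-prefixed string makes mothur run the quoted command(s)
  # without entering the interactive prompt
  mothur "#summary.seqs(fasta=${fasta}, processors=${processors})"
}

# Example: mothur_summary stability.trim.contigs.fasta 8
```

For longer workflows, the same commands can be listed one per line in a batch file and run with `mothur <batchfile>`.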
5.3.2 QIIME2 pipeline
- Best known for profiling microbial composition from 16S rRNA gene sequencing data.
- QIIME2 is an open-source microbiome analysis platform with integrated software for quality control and denoising, such as DADA2.
- It is widely used and has an active community forum.
- QIIME2 has extensive tutorials that help users get started with 16S rRNA gene analysis.
- The latest version can be installed by following the instructions here.
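To illustrate how DADA2 is used from within QIIME2, the sketch below imports paired-end reads listed in a manifest file and denoises them. It assumes an activated QIIME2 environment; the helper name `qiime_dada2_paired`, the file names and the truncation lengths are hypothetical placeholders, and suitable truncation values should be chosen from the read-quality plots of your own data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch): import paired-end reads and denoise them
# with DADA2 inside QIIME2. Truncation lengths are placeholders only.
qiime_dada2_paired() {
  local manifest="$1" trunc_f="$2" trunc_r="$3"
  if ! command -v qiime >/dev/null 2>&1; then
    echo "error: qiime not found; activate your QIIME2 environment" >&2
    return 127
  fi
  # Import reads listed in a tab-separated manifest (sample-id + file paths)
  qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path "$manifest" \
    --input-format PairedEndFastqManifestPhred33V2 \
    --output-path demux.qza || return 1
  # Denoise with DADA2: writes a feature table, representative sequences
  # and per-sample denoising statistics into the current directory
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux.qza \
    --p-trunc-len-f "$trunc_f" \
    --p-trunc-len-r "$trunc_r" \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza
}

# Example: qiime_dada2_paired manifest.tsv 240 200
```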
5.3.3 MetaPhlAn pipeline
- MetaPhlAn is an open-source pipeline for taxonomic profiling from metagenomic shotgun sequencing data.
- The MetaPhlAn tutorial provides step-by-step guidance for taxonomic profiling of samples from different environments.
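A minimal sketch of a single-sample MetaPhlAn run, assuming MetaPhlAn is installed (for example in the biobakery3 environment from 5.6) and its marker database has already been downloaded, e.g. with `metaphlan --install`. The helper name `run_metaphlan` and the output naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch): taxonomic profiling of one shotgun
# sample with MetaPhlAn.
run_metaphlan() {
  local reads="$1" out_dir="$2" threads="${3:-4}"
  local sample
  if ! command -v metaphlan >/dev/null 2>&1; then
    echo "error: metaphlan not found; activate the biobakery3 environment" >&2
    return 127
  fi
  mkdir -p "$out_dir" || return 1
  # Sample name = file name without directories and .fastq(.gz) suffix
  sample=$(basename "$reads")
  sample="${sample%.gz}"
  sample="${sample%.fastq}"
  metaphlan "$reads" --input_type fastq --nproc "$threads" \
    -o "$out_dir/${sample}_profile.txt"
}

# Example: run_metaphlan reads/sample1.fastq.gz profiles 8
# Per-sample profiles can then be combined with MetaPhlAn's helper script:
# merge_metaphlan_tables.py profiles/*_profile.txt > merged_abundance_table.txt
```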
5.4 Functional and Metabolic Analysis Network tools
5.4.1 HUMAnN pipeline
- HUMAnN (the HMP Unified Metabolic Analysis Network) is an open-source pipeline for functional profiling from metagenomic sequencing data.
- The HUMAnN tutorial provides step-by-step guidance for functional profiling.
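A minimal sketch of a HUMAnN run for one sample, including the one-off database downloads. It assumes a HUMAnN 3.x installation (for example in the biobakery3 environment from 5.6); the helper names, database directory and thread count are hypothetical, and because the full databases are large it is worth checking disk space before downloading.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (a sketch) for HUMAnN 3.x functional profiling.

# One-off: download the nucleotide (ChocoPhlAn) and protein (UniRef90)
# databases into a chosen directory; both downloads are large.
download_humann_dbs() {
  local db_dir="$1"
  humann_databases --download chocophlan full "$db_dir" &&
    humann_databases --download uniref uniref90_diamond "$db_dir"
}

# Per sample: run the HUMAnN pipeline on one FASTQ file.
run_humann() {
  local reads="$1" out_dir="$2" threads="${3:-4}"
  if ! command -v humann >/dev/null 2>&1; then
    echo "error: humann not found; activate the biobakery3 environment" >&2
    return 127
  fi
  humann --input "$reads" --output "$out_dir" --threads "$threads"
}

# Example:
# download_humann_dbs "$HOME/humann_dbs"
# run_humann reads/sample1.fastq.gz humann_out 8
```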
5.5 Installing microbiome pipeline using mamba
Installing bioinformatics tools is much easier with the conda or mamba package manager.
5.6 Demo installing MetaPhlAn and HUMAnN
- We will demonstrate how to create a new environment named biobakery3 and install the MetaPhlAn and HUMAnN pipelines.
Note: When HUMAnN is installed (e.g., using conda), MetaPhlAn is also installed automatically for microbial profiling. A standalone MetaPhlAn installation can still be useful when you are interested in microbial profiling rather than functional profiling.
conda create --name biobakery3 python=3.9
conda activate biobakery3
# Set conda channel priority (each --add puts that channel at the top, so biobakery, added last, has the highest priority):
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --add channels biobakery
conda install humann -c biobakery
conda install metaphlan -c bioconda
# Test the installation
humann_test