9 Data Tranformation from Phyloseq Objects

The process of data transformation in microbiome analysis involves converting raw or relative abundance values into matrices that are suitable for further analysis. Data transformation focuses on modifying the data itself. In this context, we will explore various methods used for this transformation on a phyloseq object.

9.1 Import processed phyloseq object

load("../imap-data-preparation/data/phyloseq_raw_rel_psextra_df_objects.rda", verbose = TRUE) 
Loading objects:
  ps_raw
  ps_rel
  psextra_raw
  psextra_rel
  ps_df

9.2 Raw Abundance

ps_raw <- psextra_raw
otu_table(ps_raw)[1:10, 1:10]
OTU Table:          [10 taxa and 10 samples]
                     taxa are rows
                       Sample-1 Sample-2 Sample-3 Sample-4 Sample-5 Sample-6
Eubacterium limosum           1        1        1        1        1        1
Staphylococcus                0        0        0        0        0        0
Oceanospirillum               1        2        1        2        1        1
Ruminococcus obeum           89      901      620      476      457      375
Burkholderia                  1        1        2        0        2        1
Clostridium sphenoides       41      127      133       63       94       90
Fusobacteria                  4        5        5        5        5        8
Bifidobacterium              43       25      183      493       22       24
Methylobacterium              0        0        0        0        0        0
Clostridium ramosum           2        2        2        2        2        2
                       Sample-7 Sample-8 Sample-9 Sample-10
Eubacterium limosum           1        1        1         1
Staphylococcus                0        0        0         0
Oceanospirillum               1        1        1         2
Ruminococcus obeum          661      444      339       599
Burkholderia                  1        1        0         1
Clostridium sphenoides      153       92       55        84
Fusobacteria                  5        5        4         5
Bifidobacterium             116       62      151        20
Methylobacterium              0        0        0         0
Clostridium ramosum           3        2        2         2

9.3 No Transformation

Similar to raw abundance

(ps_identity <- microbiome::transform(ps_raw, 'identity'))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_identity)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                    Sample-1 Sample-2 Sample-3
Eubacterium limosum        1        1        1
Staphylococcus             0        0        0
Oceanospirillum            1        2        1
Ruminococcus obeum        89      901      620
Burkholderia               1        1        2

9.4 Relative abundance

(ps_rel = phyloseq::transform_sample_counts(ps_raw, function(x){x / sum(x)}))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

otu_table(ps_rel)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                        Sample-1     Sample-2     Sample-3
Eubacterium limosum 0.0001182173 4.742483e-05 3.462244e-05
Staphylococcus      0.0000000000 0.000000e+00 0.000000e+00
Oceanospirillum     0.0001182173 9.484966e-05 3.462244e-05
Ruminococcus obeum  0.0105213382 4.272977e-02 2.146591e-02
Burkholderia        0.0001182173 4.742483e-05 6.924488e-05

9.5 Arc sine (asin) transformation

  • Typically used when dealing with proportional and percentages.
  • Proportionals range from 0 to 1
  • Percentages range from 0 to 100
  • The Metaphlan3 relative abundances are in percentages! That means the column totals in 100.
x = otu_table(ps_rel)
y <- x/max(x)

ps_asin <- round(asin(sqrt(y)), 6)

ps_asin <- as.matrix(ps_asin)
ps_asin[1:5, 1:4]
OTU Table:          [5 taxa and 4 samples]
                     taxa are rows
                    Sample-1 Sample-2 Sample-3 Sample-4
Eubacterium limosum 0.012391 0.007848 0.006706 0.009903
Staphylococcus      0.000000 0.000000 0.000000 0.000000
Oceanospirillum     0.012391 0.011099 0.006706 0.014005
Ruminococcus obeum  0.117166 0.237814 0.167759 0.217764
Burkholderia        0.012391 0.007848 0.009484 0.000000

9.6 Compositional Version

Compositional data represents relative proportions or percentages of different microbial taxa within a sample, rather than the absolute abundance of each taxon. This transformation is necessary because raw abundance data is typically affected by various factors, such as sequencing depth, which can lead to spurious correlations and biases in downstream analyses.

(ps_compositional <- microbiome::transform(ps_raw, 'compositional'))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_compositional)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                        Sample-1     Sample-2     Sample-3
Eubacterium limosum 0.0001182173 4.742483e-05 3.462244e-05
Staphylococcus      0.0000000000 0.000000e+00 0.000000e+00
Oceanospirillum     0.0001182173 9.484966e-05 3.462244e-05
Ruminococcus obeum  0.0105213382 4.272977e-02 2.146591e-02
Burkholderia        0.0001182173 4.742483e-05 6.924488e-05

9.7 Z-transform for OTUs

(ps_z_otu <- microbiome::transform(ps_raw, 'Z', 'OTU'))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_z_otu)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                      Sample-1   Sample-2   Sample-3
Eubacterium limosum -0.2855358 -0.2855358 -0.2855358
Staphylococcus      -0.2685840 -0.2685840 -0.2685840
Oceanospirillum     -0.5570347  0.2530445 -0.5570347
Ruminococcus obeum  -0.7310885  2.1112033  1.6508691
Burkholderia         0.3632341  0.3632341  1.1488042

9.8 Z-transform for Samples

(ps_z_sample <- microbiome::transform(ps_raw, 'Z', 'sample'))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_z_sample)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                      Sample-1   Sample-2   Sample-3
Eubacterium limosum -0.7821978 -0.8924188 -0.8682102
Staphylococcus      -1.1517915 -1.2167251 -1.1804675
Oceanospirillum     -0.7821978 -0.7027117 -0.8682102
Ruminococcus obeum   1.2475566  1.9669852  1.7167961
Burkholderia        -0.7821978 -0.8924188 -0.6855514

9.9 Log10 Transform

(ps_log10 <- microbiome::transform(ps_raw, 'log10'))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_log10)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                    Sample-1  Sample-2  Sample-3
Eubacterium limosum 0.301030 0.3010300 0.3010300
Staphylococcus      0.000000 0.0000000 0.0000000
Oceanospirillum     0.301030 0.4771213 0.3010300
Ruminococcus obeum  1.954243 2.9552065 2.7930916
Burkholderia        0.301030 0.3010300 0.4771213

9.10 Log10p Transform

(ps_log10p <- microbiome::transform(ps_raw, 'log10p'))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_log10p)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                    Sample-1  Sample-2  Sample-3
Eubacterium limosum 0.301030 0.3010300 0.3010300
Staphylococcus      0.000000 0.0000000 0.0000000
Oceanospirillum     0.301030 0.4771213 0.3010300
Ruminococcus obeum  1.954243 2.9552065 2.7930916
Burkholderia        0.301030 0.3010300 0.4771213

9.11 CLR Transform

  • Note that small pseudocount is added if data contains zeroes
(ps_clr <- microbiome::transform(ps_raw, 'clr'))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_clr)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                     Sample-1  Sample-2  Sample-3
Eubacterium limosum -1.497647 -2.040486 -2.016371
Staphylococcus      -3.555474 -3.359449 -3.114983
Oceanospirillum     -1.497647 -1.490870 -2.016371
Ruminococcus obeum   2.855975  4.452252  4.008689
Burkholderia        -1.497647 -2.040486 -1.505545

9.12 Shift the baseline

(ps_shift <- microbiome::transform(ps_raw, 'shift', shift=1))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_shift)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                    Sample-1 Sample-2 Sample-3
Eubacterium limosum        2        2        2
Staphylococcus             1        1        1
Oceanospirillum            2        3        2
Ruminococcus obeum        90      902      621
Burkholderia               2        2        3

9.13 Data Scaling

(ps_scale <- microbiome::transform(ps_raw, 'scale', scale=1))
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

cat("\n\n")
otu_table(ps_scale)[1:5, 1:3]
OTU Table:          [5 taxa and 3 samples]
                     taxa are rows
                    Sample-1 Sample-2 Sample-3
Eubacterium limosum        1        1        1
Staphylococcus             0        0        0
Oceanospirillum            1        2        1
Ruminococcus obeum        89      901      620
Burkholderia               1        1        2

9.14 Save transformed objects

save(
  ps_asin, 
  ps_identity,
  ps_compositional, 
  ps_z_otu, 
  ps_z_sample, 
  ps_log10, 
  ps_log10p, 
  ps_clr, 
  ps_shift, 
  ps_scale, 
  file = "data/ps_transformed.rda")

9.15 Confirm saved object

load("data/ps_transformed.rda", verbose = TRUE)
Loading objects:
  ps_asin
  ps_identity
  ps_compositional
  ps_z_otu
  ps_z_sample
  ps_log10
  ps_log10p
  ps_clr
  ps_shift
  ps_scale