Services for project FOMC00000_DEMO include NGS sequencing of V1V3-region 16S rRNA gene amplicons from the samples. First and foremost, please
download this report and the raw sequence data using the download links provided below.
These links will expire after 60 days, and we cannot guarantee the availability of your data after that.
Full bioinformatics analysis service was requested. The analysis starts with raw sequence quality and noise filtering, paired-read merging, and chimera filtering, using the
DADA2 denoising algorithm and pipeline.
We also provide downstream analyses such as taxonomy assignment, alpha and beta diversity analyses, and differential abundance analysis.
For taxonomy assignment, the taxonomy barplots are the most informative. We provide interactive barplots that show the relative abundance of microbes at a taxonomic level of your choice (from phylum to species).
If you specify which groups of samples you want to compare for differential abundance, we provide both ANCOM and LEfSe differential abundance analyses.
The samples were processed and analyzed with the ZymoBIOMICS® Service: Targeted
Metagenomic Sequencing (Zymo Research, Irvine, CA).
DNA Extraction: If DNA extraction was performed, one of three DNA
extraction kits was used, depending on the sample type and sample volume,
according to the manufacturer’s instructions unless otherwise stated. The kit used
in this project is marked below:
☐ ZymoBIOMICS® DNA Miniprep Kit (Zymo Research, Irvine, CA)
☐ ZymoBIOMICS® DNA Microprep Kit (Zymo Research, Irvine, CA)
☐ ZymoBIOMICS®-96 MagBead DNA Kit (Zymo Research, Irvine, CA)
☑ N/A (DNA Extraction Not Performed)
Elution Volume: 50µL
Additional Notes: NA
Targeted Library Preparation: The DNA samples were prepared for targeted
sequencing with the Quick-16S™ NGS Library Prep Kit (Zymo Research, Irvine, CA).
The primers in this kit were custom-designed by Zymo Research to provide the best coverage
of the 16S gene while maintaining high sensitivity. The primer set used in this project
is marked below:
☐ Quick-16S™ Primer Set V1-V2 (Zymo Research, Irvine, CA)
☑ Quick-16S™ Primer Set V1-V3 (Zymo Research, Irvine, CA)
☐ Quick-16S™ Primer Set V3-V4 (Zymo Research, Irvine, CA)
☐ Quick-16S™ Primer Set V4 (Zymo Research, Irvine, CA)
☐ Quick-16S™ Primer Set V6-V8 (Zymo Research, Irvine, CA)
☐ Other: NA
Additional Notes: NA
The sequencing library was prepared using an innovative library preparation process in
which PCR reactions were performed in real-time PCR machines to control cycles and
therefore limit PCR chimera formation. The final PCR products were quantified with
qPCR fluorescence readings and pooled together based on equal molarity. The final
pooled library was cleaned up with the Select-a-Size DNA Clean & Concentrator™
(Zymo Research, Irvine, CA), then quantified with TapeStation® (Agilent Technologies,
Santa Clara, CA) and Qubit® (Thermo Fisher Scientific, Waltham, MA).
Control Samples: The ZymoBIOMICS® Microbial Community Standard (Zymo
Research, Irvine, CA) was used as a positive control for each DNA extraction, if
performed. The ZymoBIOMICS® Microbial Community DNA Standard (Zymo Research,
Irvine, CA) was used as a positive control for each targeted library preparation.
Negative controls (i.e. blank extraction control, blank library preparation control) were
included to assess the level of bioburden carried by the wet-lab process.
Sequencing: The final library was sequenced on Illumina® MiSeq™ with a V3 reagent kit
(600 cycles). The sequencing was performed with 10% PhiX spike-in.
Absolute Abundance Quantification*: A quantitative real-time PCR was set up with a
standard curve. The standard curve was made with plasmid DNA containing one copy
of the 16S gene and one copy of the fungal ITS2 region prepared in 10-fold serial
dilutions. The primers used were the same as those used in Targeted Library
Preparation. The equation generated by the plasmid DNA standard curve was used to
calculate the number of gene copies in the reaction for each sample. The PCR input
volume (2 µl) was used to calculate the number of gene copies per microliter in each
DNA sample.
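As an illustration of how a standard curve can be turned into copy numbers, the sketch below fits a straight line of Cq against log10(copies) and inverts it for one sample. The standard-curve values, the sample Cq, and the function names are hypothetical; only the 2 µl PCR input volume comes from the description above.

```python
from typing import List, Tuple

PCR_INPUT_UL = 2.0  # PCR input volume stated in the report


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_per_ul(cq: float, slope: float, intercept: float) -> float:
    """Invert the standard curve, then divide by the PCR input volume."""
    copies_in_reaction = 10 ** ((cq - intercept) / slope)
    return copies_in_reaction / PCR_INPUT_UL


# Hypothetical 10-fold dilution series: log10(plasmid copies) and measured Cq.
log10_copies = [7.0, 6.0, 5.0, 4.0, 3.0]
cq_values = [10.0, 13.3, 16.6, 19.9, 23.2]

slope, intercept = fit_line(log10_copies, cq_values)
sample = copies_per_ul(16.6, slope, intercept)  # hypothetical sample Cq
print(f"slope={slope:.2f} intercept={intercept:.2f} copies/ul={sample:.3g}")
```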
The number of genome copies per microliter DNA sample was calculated by dividing
the gene copy number by an assumed number of gene copies per genome. The value
used for 16S copies per genome is 4. The value used for ITS copies per genome is 200.
The amount of DNA per microliter DNA sample was calculated using an assumed
genome size of 4.64 × 10^6 bp, the genome size of Escherichia coli, for 16S samples, or
an assumed genome size of 1.20 × 10^7 bp, the genome size of Saccharomyces
cerevisiae, for ITS samples. This calculation is shown below:
Calculated Total DNA = Calculated Total Genome Copies × Assumed Genome Size (4.64 × 10^6 bp) ×
Average Molecular Weight of a DNA bp (660 g/mole/bp) ÷ Avogadro’s Number (6.022 × 10^23/mole)
* Absolute Abundance Quantification is only available for 16S and ITS analyses.
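The arithmetic above can be checked with a short sketch. The constants are the ones stated in this report; the function names and the example input of 400 gene copies are hypothetical.

```python
AVOGADRO = 6.022e23        # per mole
BP_WEIGHT = 660.0          # g/mole per base pair
COPIES_PER_GENOME = {"16S": 4, "ITS": 200}
GENOME_SIZE_BP = {"16S": 4.64e6, "ITS": 1.20e7}  # E. coli, S. cerevisiae


def genome_copies(gene_copies: float, target: str = "16S") -> float:
    """Convert gene copies to genome copies using the assumed copy number."""
    return gene_copies / COPIES_PER_GENOME[target]


def dna_mass_g(genomes: float, target: str = "16S") -> float:
    """Mass of DNA in grams for a given number of genome copies."""
    return genomes * GENOME_SIZE_BP[target] * BP_WEIGHT / AVOGADRO


# Hypothetical example: 400 16S gene copies per microliter.
genomes = genome_copies(400, "16S")   # 100 genome copies
mass = dna_mass_g(genomes, "16S")     # roughly 5.1e-13 g, i.e. about 0.5 pg
print(f"{genomes:.0f} genome copies, {mass:.3e} g DNA per microliter")
```

With these assumptions a single E. coli-sized genome corresponds to roughly 5 fg of DNA.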
The absolute abundance standard curve data can be viewed in Excel here:
The absolute abundance standard curve is shown below:
The complete report of your project, including all links in this report, can be downloaded by clicking the link provided below. The downloaded file is a compressed ZIP file. Once it is unzipped, open the file “REPORT.html” (it may be shown only as "REPORT" on your computer) by double-clicking it. Your default web browser will open it and you will see the exact content of this report.
Please download the file and save it to your computer storage device. The download link will expire 60 days after you receive this report.
Complete report download link:
To view the report, please follow these steps:
1. Download the .zip file from the report link above.
2. Extract all the contents of the downloaded .zip file to your desktop.
3. Open the extracted folder and find the "REPORT.html" file (it may be shown only as "REPORT").
4. Open the REPORT.html file by double-clicking it. Your default browser will open the top page of the complete report. Within the
report, there are links to view all the analyses performed for the project.
The raw NGS sequence data is available for download with the link provided below. The data is a compressed ZIP file and can be unzipped to individual sequence files.
Since this is paired-end sequencing, each of your samples is represented by two sequence files: one for READ 1,
with the file extension “*_R1.fastq.gz”, and another for READ 2, with the file extension “*_R2.fastq.gz”.
The files are in FASTQ format and are compressed. FASTQ format is a text-based data format for storing both a biological sequence
and its corresponding quality scores. Most sequence analysis software will be able to open them.
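For readers who want to inspect the files programmatically, the following is a minimal sketch of a FASTQ reader. It assumes four-line records and Phred+33 quality encoding; the function names are hypothetical, and established sequence-analysis libraries remain the more robust choice for real work.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, List[int]]  # (read name, bases, quality scores)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ text (assumes Phred+33 qualities)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)   # the '+' separator line
        qual = next(it, None)
        if qual is None or plus is None or seq is None:
            raise ValueError("truncated FASTQ record")
        seq, qual = seq.rstrip("\n"), qual.rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def read_fastq_gz(path: str) -> Iterator[Record]:
    """Stream records from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fastq(handle)


# Small in-memory example; a real call would be read_fastq_gz("<file>.fastq.gz").
records = list(parse_fastq(["@read1\n", "ACGT\n", "+\n", "II#5\n"]))
print(records)
```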
The Sample IDs associated with the R1 and R2 fastq files are listed in the table below:
| Sample ID | Original Sample ID | Read 1 File Name | Read 2 File Name |
|---|---|---|---|
| F12694.S100 | original sample ID here | zr12694_100V1V3_R1.fastq.gz | zr12694_100V1V3_R2.fastq.gz |
| F12694.S101 | original sample ID here | zr12694_101V1V3_R1.fastq.gz | zr12694_101V1V3_R2.fastq.gz |
| F12694.S102 | original sample ID here | zr12694_102V1V3_R1.fastq.gz | zr12694_102V1V3_R2.fastq.gz |
| F12694.S103 | original sample ID here | zr12694_103V1V3_R1.fastq.gz | zr12694_103V1V3_R2.fastq.gz |
| F12694.S104 | original sample ID here | zr12694_104V1V3_R1.fastq.gz | zr12694_104V1V3_R2.fastq.gz |
| F12694.S105 | original sample ID here | zr12694_105V1V3_R1.fastq.gz | zr12694_105V1V3_R2.fastq.gz |
| F12694.S106 | original sample ID here | zr12694_106V1V3_R1.fastq.gz | zr12694_106V1V3_R2.fastq.gz |
| F12694.S107 | original sample ID here | zr12694_107V1V3_R1.fastq.gz | zr12694_107V1V3_R2.fastq.gz |
| F12694.S108 | original sample ID here | zr12694_108V1V3_R1.fastq.gz | zr12694_108V1V3_R2.fastq.gz |
| F12694.S109 | original sample ID here | zr12694_109V1V3_R1.fastq.gz | zr12694_109V1V3_R2.fastq.gz |
| F12694.S010 | original sample ID here | zr12694_10V1V3_R1.fastq.gz | zr12694_10V1V3_R2.fastq.gz |
| F12694.S110 | original sample ID here | zr12694_110V1V3_R1.fastq.gz | zr12694_110V1V3_R2.fastq.gz |
| F12694.S111 | original sample ID here | zr12694_111V1V3_R1.fastq.gz | zr12694_111V1V3_R2.fastq.gz |
| F12694.S112 | original sample ID here | zr12694_112V1V3_R1.fastq.gz | zr12694_112V1V3_R2.fastq.gz |
| F12694.S113 | original sample ID here | zr12694_113V1V3_R1.fastq.gz | zr12694_113V1V3_R2.fastq.gz |
| F12694.S114 | original sample ID here | zr12694_114V1V3_R1.fastq.gz | zr12694_114V1V3_R2.fastq.gz |
| F12694.S115 | original sample ID here | zr12694_115V1V3_R1.fastq.gz | zr12694_115V1V3_R2.fastq.gz |
| F12694.S116 | original sample ID here | zr12694_116V1V3_R1.fastq.gz | zr12694_116V1V3_R2.fastq.gz |
| F12694.S117 | original sample ID here | zr12694_117V1V3_R1.fastq.gz | zr12694_117V1V3_R2.fastq.gz |
| F12694.S118 | original sample ID here | zr12694_118V1V3_R1.fastq.gz | zr12694_118V1V3_R2.fastq.gz |
| F12694.S119 | original sample ID here | zr12694_119V1V3_R1.fastq.gz | zr12694_119V1V3_R2.fastq.gz |
| F12694.S011 | original sample ID here | zr12694_11V1V3_R1.fastq.gz | zr12694_11V1V3_R2.fastq.gz |
| F12694.S120 | original sample ID here | zr12694_120V1V3_R1.fastq.gz | zr12694_120V1V3_R2.fastq.gz |
| F12694.S121 | original sample ID here | zr12694_121V1V3_R1.fastq.gz | zr12694_121V1V3_R2.fastq.gz |
| F12694.S122 | original sample ID here | zr12694_122V1V3_R1.fastq.gz | zr12694_122V1V3_R2.fastq.gz |
| F12694.S123 | original sample ID here | zr12694_123V1V3_R1.fastq.gz | zr12694_123V1V3_R2.fastq.gz |
| F12694.S124 | original sample ID here | zr12694_124V1V3_R1.fastq.gz | zr12694_124V1V3_R2.fastq.gz |
| F12694.S125 | original sample ID here | zr12694_125V1V3_R1.fastq.gz | zr12694_125V1V3_R2.fastq.gz |
| F12694.S126 | original sample ID here | zr12694_126V1V3_R1.fastq.gz | zr12694_126V1V3_R2.fastq.gz |
| F12694.S127 | original sample ID here | zr12694_127V1V3_R1.fastq.gz | zr12694_127V1V3_R2.fastq.gz |
| F12694.S128 | original sample ID here | zr12694_128V1V3_R1.fastq.gz | zr12694_128V1V3_R2.fastq.gz |
| F12694.S129 | original sample ID here | zr12694_129V1V3_R1.fastq.gz | zr12694_129V1V3_R2.fastq.gz |
| F12694.S012 | original sample ID here | zr12694_12V1V3_R1.fastq.gz | zr12694_12V1V3_R2.fastq.gz |
| F12694.S130 | original sample ID here | zr12694_130V1V3_R1.fastq.gz | zr12694_130V1V3_R2.fastq.gz |
| F12694.S131 | original sample ID here | zr12694_131V1V3_R1.fastq.gz | zr12694_131V1V3_R2.fastq.gz |
| F12694.S132 | original sample ID here | zr12694_132V1V3_R1.fastq.gz | zr12694_132V1V3_R2.fastq.gz |
| F12694.S133 | original sample ID here | zr12694_133V1V3_R1.fastq.gz | zr12694_133V1V3_R2.fastq.gz |
| F12694.S134 | original sample ID here | zr12694_134V1V3_R1.fastq.gz | zr12694_134V1V3_R2.fastq.gz |
| F12694.S135 | original sample ID here | zr12694_135V1V3_R1.fastq.gz | zr12694_135V1V3_R2.fastq.gz |
| F12694.S136 | original sample ID here | zr12694_136V1V3_R1.fastq.gz | zr12694_136V1V3_R2.fastq.gz |
| F12694.S137 | original sample ID here | zr12694_137V1V3_R1.fastq.gz | zr12694_137V1V3_R2.fastq.gz |
| F12694.S138 | original sample ID here | zr12694_138V1V3_R1.fastq.gz | zr12694_138V1V3_R2.fastq.gz |
| F12694.S139 | original sample ID here | zr12694_139V1V3_R1.fastq.gz | zr12694_139V1V3_R2.fastq.gz |
| F12694.S013 | original sample ID here | zr12694_13V1V3_R1.fastq.gz | zr12694_13V1V3_R2.fastq.gz |
| F12694.S140 | original sample ID here | zr12694_140V1V3_R1.fastq.gz | zr12694_140V1V3_R2.fastq.gz |
| F12694.S141 | original sample ID here | zr12694_141V1V3_R1.fastq.gz | zr12694_141V1V3_R2.fastq.gz |
| F12694.S142 | original sample ID here | zr12694_142V1V3_R1.fastq.gz | zr12694_142V1V3_R2.fastq.gz |
| F12694.S143 | original sample ID here | zr12694_143V1V3_R1.fastq.gz | zr12694_143V1V3_R2.fastq.gz |
| F12694.S144 | original sample ID here | zr12694_144V1V3_R1.fastq.gz | zr12694_144V1V3_R2.fastq.gz |
| F12694.S145 | original sample ID here | zr12694_145V1V3_R1.fastq.gz | zr12694_145V1V3_R2.fastq.gz |
| F12694.S146 | original sample ID here | zr12694_146V1V3_R1.fastq.gz | zr12694_146V1V3_R2.fastq.gz |
| F12694.S147 | original sample ID here | zr12694_147V1V3_R1.fastq.gz | zr12694_147V1V3_R2.fastq.gz |
| F12694.S148 | original sample ID here | zr12694_148V1V3_R1.fastq.gz | zr12694_148V1V3_R2.fastq.gz |
| F12694.S149 | original sample ID here | zr12694_149V1V3_R1.fastq.gz | zr12694_149V1V3_R2.fastq.gz |
| F12694.S014 | original sample ID here | zr12694_14V1V3_R1.fastq.gz | zr12694_14V1V3_R2.fastq.gz |
| F12694.S150 | original sample ID here | zr12694_150V1V3_R1.fastq.gz | zr12694_150V1V3_R2.fastq.gz |
| F12694.S151 | original sample ID here | zr12694_151V1V3_R1.fastq.gz | zr12694_151V1V3_R2.fastq.gz |
| F12694.S152 | original sample ID here | zr12694_152V1V3_R1.fastq.gz | zr12694_152V1V3_R2.fastq.gz |
| F12694.S153 | original sample ID here | zr12694_153V1V3_R1.fastq.gz | zr12694_153V1V3_R2.fastq.gz |
| F12694.S154 | original sample ID here | zr12694_154V1V3_R1.fastq.gz | zr12694_154V1V3_R2.fastq.gz |
| F12694.S155 | original sample ID here | zr12694_155V1V3_R1.fastq.gz | zr12694_155V1V3_R2.fastq.gz |
| F12694.S156 | original sample ID here | zr12694_156V1V3_R1.fastq.gz | zr12694_156V1V3_R2.fastq.gz |
| F12694.S157 | original sample ID here | zr12694_157V1V3_R1.fastq.gz | zr12694_157V1V3_R2.fastq.gz |
| F12694.S158 | original sample ID here | zr12694_158V1V3_R1.fastq.gz | zr12694_158V1V3_R2.fastq.gz |
| F12694.S159 | original sample ID here | zr12694_159V1V3_R1.fastq.gz | zr12694_159V1V3_R2.fastq.gz |
| F12694.S015 | original sample ID here | zr12694_15V1V3_R1.fastq.gz | zr12694_15V1V3_R2.fastq.gz |
| F12694.S160 | original sample ID here | zr12694_160V1V3_R1.fastq.gz | zr12694_160V1V3_R2.fastq.gz |
| F12694.S161 | original sample ID here | zr12694_161V1V3_R1.fastq.gz | zr12694_161V1V3_R2.fastq.gz |
| F12694.S162 | original sample ID here | zr12694_162V1V3_R1.fastq.gz | zr12694_162V1V3_R2.fastq.gz |
| F12694.S163 | original sample ID here | zr12694_163V1V3_R1.fastq.gz | zr12694_163V1V3_R2.fastq.gz |
| F12694.S164 | original sample ID here | zr12694_164V1V3_R1.fastq.gz | zr12694_164V1V3_R2.fastq.gz |
| F12694.S165 | original sample ID here | zr12694_165V1V3_R1.fastq.gz | zr12694_165V1V3_R2.fastq.gz |
| F12694.S166 | original sample ID here | zr12694_166V1V3_R1.fastq.gz | zr12694_166V1V3_R2.fastq.gz |
| F12694.S167 | original sample ID here | zr12694_167V1V3_R1.fastq.gz | zr12694_167V1V3_R2.fastq.gz |
| F12694.S168 | original sample ID here | zr12694_168V1V3_R1.fastq.gz | zr12694_168V1V3_R2.fastq.gz |
| F12694.S169 | original sample ID here | zr12694_169V1V3_R1.fastq.gz | zr12694_169V1V3_R2.fastq.gz |
| F12694.S016 | original sample ID here | zr12694_16V1V3_R1.fastq.gz | zr12694_16V1V3_R2.fastq.gz |
| F12694.S170 | original sample ID here | zr12694_170V1V3_R1.fastq.gz | zr12694_170V1V3_R2.fastq.gz |
| F12694.S171 | original sample ID here | zr12694_171V1V3_R1.fastq.gz | zr12694_171V1V3_R2.fastq.gz |
| F12694.S172 | original sample ID here | zr12694_172V1V3_R1.fastq.gz | zr12694_172V1V3_R2.fastq.gz |
| F12694.S173 | original sample ID here | zr12694_173V1V3_R1.fastq.gz | zr12694_173V1V3_R2.fastq.gz |
| F12694.S174 | original sample ID here | zr12694_174V1V3_R1.fastq.gz | zr12694_174V1V3_R2.fastq.gz |
| F12694.S175 | original sample ID here | zr12694_175V1V3_R1.fastq.gz | zr12694_175V1V3_R2.fastq.gz |
| F12694.S176 | original sample ID here | zr12694_176V1V3_R1.fastq.gz | zr12694_176V1V3_R2.fastq.gz |
| F12694.S177 | original sample ID here | zr12694_177V1V3_R1.fastq.gz | zr12694_177V1V3_R2.fastq.gz |
| F12694.S178 | original sample ID here | zr12694_178V1V3_R1.fastq.gz | zr12694_178V1V3_R2.fastq.gz |
| F12694.S179 | original sample ID here | zr12694_179V1V3_R1.fastq.gz | zr12694_179V1V3_R2.fastq.gz |
| F12694.S017 | original sample ID here | zr12694_17V1V3_R1.fastq.gz | zr12694_17V1V3_R2.fastq.gz |
| F12694.S180 | original sample ID here | zr12694_180V1V3_R1.fastq.gz | zr12694_180V1V3_R2.fastq.gz |
| F12694.S181 | original sample ID here | zr12694_181V1V3_R1.fastq.gz | zr12694_181V1V3_R2.fastq.gz |
| F12694.S182 | original sample ID here | zr12694_182V1V3_R1.fastq.gz | zr12694_182V1V3_R2.fastq.gz |
| F12694.S183 | original sample ID here | zr12694_183V1V3_R1.fastq.gz | zr12694_183V1V3_R2.fastq.gz |
| F12694.S184 | original sample ID here | zr12694_184V1V3_R1.fastq.gz | zr12694_184V1V3_R2.fastq.gz |
| F12694.S185 | original sample ID here | zr12694_185V1V3_R1.fastq.gz | zr12694_185V1V3_R2.fastq.gz |
| F12694.S186 | original sample ID here | zr12694_186V1V3_R1.fastq.gz | zr12694_186V1V3_R2.fastq.gz |
| F12694.S187 | original sample ID here | zr12694_187V1V3_R1.fastq.gz | zr12694_187V1V3_R2.fastq.gz |
| F12694.S188 | original sample ID here | zr12694_188V1V3_R1.fastq.gz | zr12694_188V1V3_R2.fastq.gz |
| F12694.S189 | original sample ID here | zr12694_189V1V3_R1.fastq.gz | zr12694_189V1V3_R2.fastq.gz |
| F12694.S018 | original sample ID here | zr12694_18V1V3_R1.fastq.gz | zr12694_18V1V3_R2.fastq.gz |
| F12694.S190 | original sample ID here | zr12694_190V1V3_R1.fastq.gz | zr12694_190V1V3_R2.fastq.gz |
| F12694.S191 | original sample ID here | zr12694_191V1V3_R1.fastq.gz | zr12694_191V1V3_R2.fastq.gz |
| F12694.S192 | original sample ID here | zr12694_192V1V3_R1.fastq.gz | zr12694_192V1V3_R2.fastq.gz |
| F12694.S193 | original sample ID here | zr12694_193V1V3_R1.fastq.gz | zr12694_193V1V3_R2.fastq.gz |
| F12694.S194 | original sample ID here | zr12694_194V1V3_R1.fastq.gz | zr12694_194V1V3_R2.fastq.gz |
| F12694.S195 | original sample ID here | zr12694_195V1V3_R1.fastq.gz | zr12694_195V1V3_R2.fastq.gz |
| F12694.S196 | original sample ID here | zr12694_196V1V3_R1.fastq.gz | zr12694_196V1V3_R2.fastq.gz |
| F12694.S197 | original sample ID here | zr12694_197V1V3_R1.fastq.gz | zr12694_197V1V3_R2.fastq.gz |
| F12694.S198 | original sample ID here | zr12694_198V1V3_R1.fastq.gz | zr12694_198V1V3_R2.fastq.gz |
| F12694.S199 | original sample ID here | zr12694_199V1V3_R1.fastq.gz | zr12694_199V1V3_R2.fastq.gz |
| F12694.S019 | original sample ID here | zr12694_19V1V3_R1.fastq.gz | zr12694_19V1V3_R2.fastq.gz |
| F12694.S001 | original sample ID here | zr12694_1V1V3_R1.fastq.gz | zr12694_1V1V3_R2.fastq.gz |
| F12694.S200 | original sample ID here | zr12694_200V1V3_R1.fastq.gz | zr12694_200V1V3_R2.fastq.gz |
| F12694.S201 | original sample ID here | zr12694_201V1V3_R1.fastq.gz | zr12694_201V1V3_R2.fastq.gz |
| F12694.S202 | original sample ID here | zr12694_202V1V3_R1.fastq.gz | zr12694_202V1V3_R2.fastq.gz |
| F12694.S203 | original sample ID here | zr12694_203V1V3_R1.fastq.gz | zr12694_203V1V3_R2.fastq.gz |
| F12694.S204 | original sample ID here | zr12694_204V1V3_R1.fastq.gz | zr12694_204V1V3_R2.fastq.gz |
| F12694.S205 | original sample ID here | zr12694_205V1V3_R1.fastq.gz | zr12694_205V1V3_R2.fastq.gz |
| F12694.S206 | original sample ID here | zr12694_206V1V3_R1.fastq.gz | zr12694_206V1V3_R2.fastq.gz |
| F12694.S207 | original sample ID here | zr12694_207V1V3_R1.fastq.gz | zr12694_207V1V3_R2.fastq.gz |
| F12694.S208 | original sample ID here | zr12694_208V1V3_R1.fastq.gz | zr12694_208V1V3_R2.fastq.gz |
| F12694.S209 | original sample ID here | zr12694_209V1V3_R1.fastq.gz | zr12694_209V1V3_R2.fastq.gz |
| F12694.S020 | original sample ID here | zr12694_20V1V3_R1.fastq.gz | zr12694_20V1V3_R2.fastq.gz |
| F12694.S210 | original sample ID here | zr12694_210V1V3_R1.fastq.gz | zr12694_210V1V3_R2.fastq.gz |
| F12694.S211 | original sample ID here | zr12694_211V1V3_R1.fastq.gz | zr12694_211V1V3_R2.fastq.gz |
| F12694.S212 | original sample ID here | zr12694_212V1V3_R1.fastq.gz | zr12694_212V1V3_R2.fastq.gz |
| F12694.S213 | original sample ID here | zr12694_213V1V3_R1.fastq.gz | zr12694_213V1V3_R2.fastq.gz |
| F12694.S214 | original sample ID here | zr12694_214V1V3_R1.fastq.gz | zr12694_214V1V3_R2.fastq.gz |
| F12694.S215 | original sample ID here | zr12694_215V1V3_R1.fastq.gz | zr12694_215V1V3_R2.fastq.gz |
| F12694.S216 | original sample ID here | zr12694_216V1V3_R1.fastq.gz | zr12694_216V1V3_R2.fastq.gz |
| F12694.S217 | original sample ID here | zr12694_217V1V3_R1.fastq.gz | zr12694_217V1V3_R2.fastq.gz |
| F12694.S218 | original sample ID here | zr12694_218V1V3_R1.fastq.gz | zr12694_218V1V3_R2.fastq.gz |
| F12694.S219 | original sample ID here | zr12694_219V1V3_R1.fastq.gz | zr12694_219V1V3_R2.fastq.gz |
| F12694.S021 | original sample ID here | zr12694_21V1V3_R1.fastq.gz | zr12694_21V1V3_R2.fastq.gz |
| F12694.S220 | original sample ID here | zr12694_220V1V3_R1.fastq.gz | zr12694_220V1V3_R2.fastq.gz |
| F12694.S221 | original sample ID here | zr12694_221V1V3_R1.fastq.gz | zr12694_221V1V3_R2.fastq.gz |
| F12694.S222 | original sample ID here | zr12694_222V1V3_R1.fastq.gz | zr12694_222V1V3_R2.fastq.gz |
| F12694.S223 | original sample ID here | zr12694_223V1V3_R1.fastq.gz | zr12694_223V1V3_R2.fastq.gz |
| F12694.S224 | original sample ID here | zr12694_224V1V3_R1.fastq.gz | zr12694_224V1V3_R2.fastq.gz |
| F12694.S225 | original sample ID here | zr12694_225V1V3_R1.fastq.gz | zr12694_225V1V3_R2.fastq.gz |
| F12694.S226 | original sample ID here | zr12694_226V1V3_R1.fastq.gz | zr12694_226V1V3_R2.fastq.gz |
| F12694.S227 | original sample ID here | zr12694_227V1V3_R1.fastq.gz | zr12694_227V1V3_R2.fastq.gz |
| F12694.S228 | original sample ID here | zr12694_228V1V3_R1.fastq.gz | zr12694_228V1V3_R2.fastq.gz |
| F12694.S229 | original sample ID here | zr12694_229V1V3_R1.fastq.gz | zr12694_229V1V3_R2.fastq.gz |
| F12694.S022 | original sample ID here | zr12694_22V1V3_R1.fastq.gz | zr12694_22V1V3_R2.fastq.gz |
| F12694.S230 | original sample ID here | zr12694_230V1V3_R1.fastq.gz | zr12694_230V1V3_R2.fastq.gz |
| F12694.S231 | original sample ID here | zr12694_231V1V3_R1.fastq.gz | zr12694_231V1V3_R2.fastq.gz |
| F12694.S232 | original sample ID here | zr12694_232V1V3_R1.fastq.gz | zr12694_232V1V3_R2.fastq.gz |
| F12694.S233 | original sample ID here | zr12694_233V1V3_R1.fastq.gz | zr12694_233V1V3_R2.fastq.gz |
| F12694.S234 | original sample ID here | zr12694_234V1V3_R1.fastq.gz | zr12694_234V1V3_R2.fastq.gz |
| F12694.S235 | original sample ID here | zr12694_235V1V3_R1.fastq.gz | zr12694_235V1V3_R2.fastq.gz |
| F12694.S236 | original sample ID here | zr12694_236V1V3_R1.fastq.gz | zr12694_236V1V3_R2.fastq.gz |
| F12694.S237 | original sample ID here | zr12694_237V1V3_R1.fastq.gz | zr12694_237V1V3_R2.fastq.gz |
| F12694.S238 | original sample ID here | zr12694_238V1V3_R1.fastq.gz | zr12694_238V1V3_R2.fastq.gz |
| F12694.S239 | original sample ID here | zr12694_239V1V3_R1.fastq.gz | zr12694_239V1V3_R2.fastq.gz |
| F12694.S023 | original sample ID here | zr12694_23V1V3_R1.fastq.gz | zr12694_23V1V3_R2.fastq.gz |
| F12694.S240 | original sample ID here | zr12694_240V1V3_R1.fastq.gz | zr12694_240V1V3_R2.fastq.gz |
| F12694.S241 | original sample ID here | zr12694_241V1V3_R1.fastq.gz | zr12694_241V1V3_R2.fastq.gz |
| F12694.S242 | original sample ID here | zr12694_242V1V3_R1.fastq.gz | zr12694_242V1V3_R2.fastq.gz |
| F12694.S243 | original sample ID here | zr12694_243V1V3_R1.fastq.gz | zr12694_243V1V3_R2.fastq.gz |
| F12694.S244 | original sample ID here | zr12694_244V1V3_R1.fastq.gz | zr12694_244V1V3_R2.fastq.gz |
| F12694.S245 | original sample ID here | zr12694_245V1V3_R1.fastq.gz | zr12694_245V1V3_R2.fastq.gz |
| F12694.S246 | original sample ID here | zr12694_246V1V3_R1.fastq.gz | zr12694_246V1V3_R2.fastq.gz |
| F12694.S247 | original sample ID here | zr12694_247V1V3_R1.fastq.gz | zr12694_247V1V3_R2.fastq.gz |
| F12694.S248 | original sample ID here | zr12694_248V1V3_R1.fastq.gz | zr12694_248V1V3_R2.fastq.gz |
| F12694.S249 | original sample ID here | zr12694_249V1V3_R1.fastq.gz | zr12694_249V1V3_R2.fastq.gz |
| F12694.S024 | original sample ID here | zr12694_24V1V3_R1.fastq.gz | zr12694_24V1V3_R2.fastq.gz |
| F12694.S250 | original sample ID here | zr12694_250V1V3_R1.fastq.gz | zr12694_250V1V3_R2.fastq.gz |
| F12694.S251 | original sample ID here | zr12694_251V1V3_R1.fastq.gz | zr12694_251V1V3_R2.fastq.gz |
| F12694.S252 | original sample ID here | zr12694_252V1V3_R1.fastq.gz | zr12694_252V1V3_R2.fastq.gz |
| F12694.S253 | original sample ID here | zr12694_253V1V3_R1.fastq.gz | zr12694_253V1V3_R2.fastq.gz |
| F12694.S254 | original sample ID here | zr12694_254V1V3_R1.fastq.gz | zr12694_254V1V3_R2.fastq.gz |
| F12694.S255 | original sample ID here | zr12694_255V1V3_R1.fastq.gz | zr12694_255V1V3_R2.fastq.gz |
| F12694.S256 | original sample ID here | zr12694_256V1V3_R1.fastq.gz | zr12694_256V1V3_R2.fastq.gz |
| F12694.S257 | original sample ID here | zr12694_257V1V3_R1.fastq.gz | zr12694_257V1V3_R2.fastq.gz |
| F12694.S258 | original sample ID here | zr12694_258V1V3_R1.fastq.gz | zr12694_258V1V3_R2.fastq.gz |
| F12694.S259 | original sample ID here | zr12694_259V1V3_R1.fastq.gz | zr12694_259V1V3_R2.fastq.gz |
| F12694.S025 | original sample ID here | zr12694_25V1V3_R1.fastq.gz | zr12694_25V1V3_R2.fastq.gz |
| F12694.S260 | original sample ID here | zr12694_260V1V3_R1.fastq.gz | zr12694_260V1V3_R2.fastq.gz |
| F12694.S261 | original sample ID here | zr12694_261V1V3_R1.fastq.gz | zr12694_261V1V3_R2.fastq.gz |
| F12694.S262 | original sample ID here | zr12694_262V1V3_R1.fastq.gz | zr12694_262V1V3_R2.fastq.gz |
| F12694.S263 | original sample ID here | zr12694_263V1V3_R1.fastq.gz | zr12694_263V1V3_R2.fastq.gz |
| F12694.S264 | original sample ID here | zr12694_264V1V3_R1.fastq.gz | zr12694_264V1V3_R2.fastq.gz |
| F12694.S265 | original sample ID here | zr12694_265V1V3_R1.fastq.gz | zr12694_265V1V3_R2.fastq.gz |
| F12694.S266 | original sample ID here | zr12694_266V1V3_R1.fastq.gz | zr12694_266V1V3_R2.fastq.gz |
| F12694.S267 | original sample ID here | zr12694_267V1V3_R1.fastq.gz | zr12694_267V1V3_R2.fastq.gz |
| F12694.S268 | original sample ID here | zr12694_268V1V3_R1.fastq.gz | zr12694_268V1V3_R2.fastq.gz |
| F12694.S026 | original sample ID here | zr12694_26V1V3_R1.fastq.gz | zr12694_26V1V3_R2.fastq.gz |
| F12694.S027 | original sample ID here | zr12694_27V1V3_R1.fastq.gz | zr12694_27V1V3_R2.fastq.gz |
| F12694.S028 | original sample ID here | zr12694_28V1V3_R1.fastq.gz | zr12694_28V1V3_R2.fastq.gz |
| F12694.S029 | original sample ID here | zr12694_29V1V3_R1.fastq.gz | zr12694_29V1V3_R2.fastq.gz |
| F12694.S002 | original sample ID here | zr12694_2V1V3_R1.fastq.gz | zr12694_2V1V3_R2.fastq.gz |
| F12694.S030 | original sample ID here | zr12694_30V1V3_R1.fastq.gz | zr12694_30V1V3_R2.fastq.gz |
| F12694.S031 | original sample ID here | zr12694_31V1V3_R1.fastq.gz | zr12694_31V1V3_R2.fastq.gz |
| F12694.S032 | original sample ID here | zr12694_32V1V3_R1.fastq.gz | zr12694_32V1V3_R2.fastq.gz |
| F12694.S033 | original sample ID here | zr12694_33V1V3_R1.fastq.gz | zr12694_33V1V3_R2.fastq.gz |
| F12694.S034 | original sample ID here | zr12694_34V1V3_R1.fastq.gz | zr12694_34V1V3_R2.fastq.gz |
| F12694.S035 | original sample ID here | zr12694_35V1V3_R1.fastq.gz | zr12694_35V1V3_R2.fastq.gz |
| F12694.S036 | original sample ID here | zr12694_36V1V3_R1.fastq.gz | zr12694_36V1V3_R2.fastq.gz |
| F12694.S037 | original sample ID here | zr12694_37V1V3_R1.fastq.gz | zr12694_37V1V3_R2.fastq.gz |
| F12694.S038 | original sample ID here | zr12694_38V1V3_R1.fastq.gz | zr12694_38V1V3_R2.fastq.gz |
| F12694.S039 | original sample ID here | zr12694_39V1V3_R1.fastq.gz | zr12694_39V1V3_R2.fastq.gz |
| F12694.S003 | original sample ID here | zr12694_3V1V3_R1.fastq.gz | zr12694_3V1V3_R2.fastq.gz |
| F12694.S040 | original sample ID here | zr12694_40V1V3_R1.fastq.gz | zr12694_40V1V3_R2.fastq.gz |
| F12694.S041 | original sample ID here | zr12694_41V1V3_R1.fastq.gz | zr12694_41V1V3_R2.fastq.gz |
| F12694.S042 | original sample ID here | zr12694_42V1V3_R1.fastq.gz | zr12694_42V1V3_R2.fastq.gz |
| F12694.S043 | original sample ID here | zr12694_43V1V3_R1.fastq.gz | zr12694_43V1V3_R2.fastq.gz |
| F12694.S044 | original sample ID here | zr12694_44V1V3_R1.fastq.gz | zr12694_44V1V3_R2.fastq.gz |
| F12694.S045 | original sample ID here | zr12694_45V1V3_R1.fastq.gz | zr12694_45V1V3_R2.fastq.gz |
| F12694.S046 | original sample ID here | zr12694_46V1V3_R1.fastq.gz | zr12694_46V1V3_R2.fastq.gz |
| F12694.S047 | original sample ID here | zr12694_47V1V3_R1.fastq.gz | zr12694_47V1V3_R2.fastq.gz |
| F12694.S048 | original sample ID here | zr12694_48V1V3_R1.fastq.gz | zr12694_48V1V3_R2.fastq.gz |
| F12694.S049 | original sample ID here | zr12694_49V1V3_R1.fastq.gz | zr12694_49V1V3_R2.fastq.gz |
| F12694.S004 | original sample ID here | zr12694_4V1V3_R1.fastq.gz | zr12694_4V1V3_R2.fastq.gz |
| F12694.S050 | original sample ID here | zr12694_50V1V3_R1.fastq.gz | zr12694_50V1V3_R2.fastq.gz |
| F12694.S051 | original sample ID here | zr12694_51V1V3_R1.fastq.gz | zr12694_51V1V3_R2.fastq.gz |
| F12694.S052 | original sample ID here | zr12694_52V1V3_R1.fastq.gz | zr12694_52V1V3_R2.fastq.gz |
| F12694.S053 | original sample ID here | zr12694_53V1V3_R1.fastq.gz | zr12694_53V1V3_R2.fastq.gz |
| F12694.S054 | original sample ID here | zr12694_54V1V3_R1.fastq.gz | zr12694_54V1V3_R2.fastq.gz |
| F12694.S055 | original sample ID here | zr12694_55V1V3_R1.fastq.gz | zr12694_55V1V3_R2.fastq.gz |
| F12694.S056 | original sample ID here | zr12694_56V1V3_R1.fastq.gz | zr12694_56V1V3_R2.fastq.gz |
| F12694.S057 | original sample ID here | zr12694_57V1V3_R1.fastq.gz | zr12694_57V1V3_R2.fastq.gz |
| F12694.S058 | original sample ID here | zr12694_58V1V3_R1.fastq.gz | zr12694_58V1V3_R2.fastq.gz |
| F12694.S059 | original sample ID here | zr12694_59V1V3_R1.fastq.gz | zr12694_59V1V3_R2.fastq.gz |
| F12694.S005 | original sample ID here | zr12694_5V1V3_R1.fastq.gz | zr12694_5V1V3_R2.fastq.gz |
| F12694.S060 | original sample ID here | zr12694_60V1V3_R1.fastq.gz | zr12694_60V1V3_R2.fastq.gz |
| F12694.S061 | original sample ID here | zr12694_61V1V3_R1.fastq.gz | zr12694_61V1V3_R2.fastq.gz |
| F12694.S062 | original sample ID here | zr12694_62V1V3_R1.fastq.gz | zr12694_62V1V3_R2.fastq.gz |
| F12694.S063 | original sample ID here | zr12694_63V1V3_R1.fastq.gz | zr12694_63V1V3_R2.fastq.gz |
| F12694.S064 | original sample ID here | zr12694_64V1V3_R1.fastq.gz | zr12694_64V1V3_R2.fastq.gz |
| F12694.S065 | original sample ID here | zr12694_65V1V3_R1.fastq.gz | zr12694_65V1V3_R2.fastq.gz |
| F12694.S066 | original sample ID here | zr12694_66V1V3_R1.fastq.gz | zr12694_66V1V3_R2.fastq.gz |
| F12694.S067 | original sample ID here | zr12694_67V1V3_R1.fastq.gz | zr12694_67V1V3_R2.fastq.gz |
| F12694.S068 | original sample ID here | zr12694_68V1V3_R1.fastq.gz | zr12694_68V1V3_R2.fastq.gz |
| F12694.S069 | original sample ID here | zr12694_69V1V3_R1.fastq.gz | zr12694_69V1V3_R2.fastq.gz |
| F12694.S006 | original sample ID here | zr12694_6V1V3_R1.fastq.gz | zr12694_6V1V3_R2.fastq.gz |
| F12694.S070 | original sample ID here | zr12694_70V1V3_R1.fastq.gz | zr12694_70V1V3_R2.fastq.gz |
| F12694.S071 | original sample ID here | zr12694_71V1V3_R1.fastq.gz | zr12694_71V1V3_R2.fastq.gz |
| F12694.S072 | original sample ID here | zr12694_72V1V3_R1.fastq.gz | zr12694_72V1V3_R2.fastq.gz |
| F12694.S073 | original sample ID here | zr12694_73V1V3_R1.fastq.gz | zr12694_73V1V3_R2.fastq.gz |
| F12694.S074 | original sample ID here | zr12694_74V1V3_R1.fastq.gz | zr12694_74V1V3_R2.fastq.gz |
| F12694.S075 | original sample ID here | zr12694_75V1V3_R1.fastq.gz | zr12694_75V1V3_R2.fastq.gz |
| F12694.S076 | original sample ID here | zr12694_76V1V3_R1.fastq.gz | zr12694_76V1V3_R2.fastq.gz |
| F12694.S077 | original sample ID here | zr12694_77V1V3_R1.fastq.gz | zr12694_77V1V3_R2.fastq.gz |
| F12694.S078 | original sample ID here | zr12694_78V1V3_R1.fastq.gz | zr12694_78V1V3_R2.fastq.gz |
| F12694.S079 | original sample ID here | zr12694_79V1V3_R1.fastq.gz | zr12694_79V1V3_R2.fastq.gz |
| F12694.S007 | original sample ID here | zr12694_7V1V3_R1.fastq.gz | zr12694_7V1V3_R2.fastq.gz |
| F12694.S080 | original sample ID here | zr12694_80V1V3_R1.fastq.gz | zr12694_80V1V3_R2.fastq.gz |
| F12694.S081 | original sample ID here | zr12694_81V1V3_R1.fastq.gz | zr12694_81V1V3_R2.fastq.gz |
| F12694.S082 | original sample ID here | zr12694_82V1V3_R1.fastq.gz | zr12694_82V1V3_R2.fastq.gz |
| F12694.S083 | original sample ID here | zr12694_83V1V3_R1.fastq.gz | zr12694_83V1V3_R2.fastq.gz |
| F12694.S084 | original sample ID here | zr12694_84V1V3_R1.fastq.gz | zr12694_84V1V3_R2.fastq.gz |
| F12694.S085 | original sample ID here | zr12694_85V1V3_R1.fastq.gz | zr12694_85V1V3_R2.fastq.gz |
| F12694.S086 | original sample ID here | zr12694_86V1V3_R1.fastq.gz | zr12694_86V1V3_R2.fastq.gz |
| F12694.S087 | original sample ID here | zr12694_87V1V3_R1.fastq.gz | zr12694_87V1V3_R2.fastq.gz |
| F12694.S088 | original sample ID here | zr12694_88V1V3_R1.fastq.gz | zr12694_88V1V3_R2.fastq.gz |
| F12694.S089 | original sample ID here | zr12694_89V1V3_R1.fastq.gz | zr12694_89V1V3_R2.fastq.gz |
| F12694.S008 | original sample ID here | zr12694_8V1V3_R1.fastq.gz | zr12694_8V1V3_R2.fastq.gz |
| F12694.S090 | original sample ID here | zr12694_90V1V3_R1.fastq.gz | zr12694_90V1V3_R2.fastq.gz |
| F12694.S091 | original sample ID here | zr12694_91V1V3_R1.fastq.gz | zr12694_91V1V3_R2.fastq.gz |
| F12694.S092 | original sample ID here | zr12694_92V1V3_R1.fastq.gz | zr12694_92V1V3_R2.fastq.gz |
| F12694.S093 | original sample ID here | zr12694_93V1V3_R1.fastq.gz | zr12694_93V1V3_R2.fastq.gz |
| F12694.S094 | original sample ID here | zr12694_94V1V3_R1.fastq.gz | zr12694_94V1V3_R2.fastq.gz |
| F12694.S095 | original sample ID here | zr12694_95V1V3_R1.fastq.gz | zr12694_95V1V3_R2.fastq.gz |
| F12694.S096 | original sample ID here | zr12694_96V1V3_R1.fastq.gz | zr12694_96V1V3_R2.fastq.gz |
| F12694.S097 | original sample ID here | zr12694_97V1V3_R1.fastq.gz | zr12694_97V1V3_R2.fastq.gz |
| F12694.S098 | original sample ID here | zr12694_98V1V3_R1.fastq.gz | zr12694_98V1V3_R2.fastq.gz |
| F12694.S099 | original sample ID here | zr12694_99V1V3_R1.fastq.gz | zr12694_99V1V3_R2.fastq.gz |
| F12694.S009 | original sample ID here | zr12694_9V1V3_R1.fastq.gz | zr12694_9V1V3_R2.fastq.gz |
Please download the file and save it to your computer storage device. The download link will expire 60 days after you receive this report.
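When working with many samples, it can help to pair the R1 and R2 files automatically. The sketch below assumes the naming pattern seen in the table above (for example, zr12694_10V1V3_R1.fastq.gz belongs to sample F12694.S010); that mapping is inferred from the listed rows rather than documented, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Pattern inferred from the file names listed in the sample table.
FILE_PATTERN = re.compile(r"^zr(\d+)_(\d+)V1V3_(R[12])\.fastq\.gz$")


def sample_id_from_filename(name: str) -> Tuple[str, str]:
    """Return (sample ID, read label) for a file name such as zr12694_10V1V3_R1.fastq.gz."""
    match = FILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    project, number, read = match.groups()
    return f"F{project}.S{int(number):03d}", read


def pair_reads(names: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group file names by sample ID and check that each sample has R1 and R2."""
    pairs: Dict[str, Dict[str, str]] = {}
    for name in names:
        sample, read = sample_id_from_filename(name)
        pairs.setdefault(sample, {})[read] = name
    incomplete = sorted(s for s, reads in pairs.items() if set(reads) != {"R1", "R2"})
    if incomplete:
        raise ValueError(f"samples missing a read file: {incomplete}")
    return pairs


pairs = pair_reads([
    "zr12694_1V1V3_R1.fastq.gz",
    "zr12694_1V1V3_R2.fastq.gz",
    "zr12694_100V1V3_R2.fastq.gz",
    "zr12694_100V1V3_R1.fastq.gz",
])
print(sorted(pairs))
```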
DADA2 is a software package that models and corrects Illumina-sequenced amplicon errors.
DADA2 infers sample sequences exactly, without coarse-graining into OTUs,
and resolves differences of as little as one nucleotide. DADA2 identified more real variants
and output fewer spurious sequences than other methods.
DADA2’s advantage is that it uses more of the data. The DADA2 error model incorporates quality information,
which is ignored by all other methods after filtering. The DADA2 error model incorporates quantitative abundances,
whereas most other methods use abundance ranks if they use abundance at all.
The DADA2 error model identifies the specific differences between sequences, e.g., A->C,
whereas other methods merely count the mismatches. DADA2 can parameterize its error model from the data itself,
rather than relying on previous datasets that may or may not reflect the PCR and sequencing protocols used in your study.
The DADA2 pipeline includes several tools for read quality control, including quality filtering, trimming, denoising, pair merging and chimera filtering. Below are the major processing steps of DADA2:
Step 1. Read trimming based on sequence quality
The quality of Illumina NGS reads often decreases toward the end of the reads.
DADA2 allows the poor-quality read ends to be trimmed off in order to improve
error model building and pair-merging performance.
Step 2. Learn the Error Rates
The DADA2 algorithm makes use of a parametric error model (err) and every
amplicon dataset has a different set of error rates. The learnErrors method
learns this error model from the data, by alternating estimation of the error
rates and inference of sample composition until they converge on a jointly
consistent solution. As in many machine-learning problems, the algorithm must
begin with an initial guess, for which the maximum possible error rates in
this data are used (the error rates if only the most abundant sequence is
correct and all the rest are errors).
Step 3. Infer amplicon sequence variants (ASVs) based on the error model built in the previous step. This step is also called sequence "denoising".
The outcome of this step is a list of ASVs that are the equivalent of oligonucleotides.
Step 4. Merge paired reads. If the sequencing products are read pairs, DADA2 will merge the R1 and R2 ASVs into single sequences.
Merging is performed by aligning the denoised forward reads with the reverse-complement of the corresponding
denoised reverse reads, and then constructing the merged “contig” sequences.
By default, merged sequences are only output if the forward and reverse reads overlap by
at least 12 bases, and are identical to each other in the overlap region (but these conditions can be changed via function arguments).
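The merging rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (ungapped comparison, longest overlap tried first); it is not DADA2's own implementation, and the names `reverse_complement`, `merge_pair`, `min_overlap` and `max_mismatch` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def merge_pair(fwd: str, rev: str, min_overlap: int = 12, max_mismatch: int = 0):
    """Merge a forward read with the reverse complement of its reverse read.

    Tries the longest possible overlap first and returns the merged contig,
    or None when no overlap of at least `min_overlap` bases stays within
    the allowed number of mismatches (assumption: no gaps in the overlap).
    """
    rc = reverse_complement(rev)
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-olen:], rc[:olen]))
        if mismatches <= max_mismatch:
            return fwd + rc[olen:]
    return None


# Toy amplicon: the two reads overlap by 16 bases.
amplicon = "ACGTTGCAAGGCTTACCGATGACTTGACCATGGTACCTAG"
fwd_read = amplicon[:28]
rev_read = reverse_complement(amplicon[12:])
print(merge_pair(fwd_read, rev_read) == amplicon)
```

With the defaults shown here, a pair is rejected as soon as the overlap is shorter than 12 bases or contains a single mismatch, which mirrors the default conditions stated above.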
Step 5. Remove chimeras.
The core dada method corrects substitution and indel errors, but chimeras remain. Fortunately, the accuracy of sequence variants
after denoising makes identifying chimeric ASVs simpler than when dealing with fuzzy OTUs.
Chimeric sequences are identified if they can be exactly reconstructed by
combining a left-segment and a right-segment from two more abundant “parent” sequences. The frequency of chimeric sequences varies substantially
from dataset to dataset, and depends on factors including experimental procedures and sample complexity.
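The idea of reconstructing a sequence exactly from two parents can be sketched as follows. This is a simplified illustration that assumes ungapped sequences compared at the same coordinates; the function name `is_bimera` is hypothetical, and the actual DADA2 routine is more elaborate (it works on alignments and considers abundance).

```python
def is_bimera(seq: str, parents: list[str]) -> bool:
    """Return True if `seq` equals a left segment of one parent joined to
    a right segment of a different parent (exact match, no gaps)."""
    n = len(seq)
    for left in parents:
        for right in parents:
            # A sequence cannot be its own parent, and the two parents must differ.
            if left == right or seq in (left, right):
                continue
            for brk in range(1, n):
                if left[:brk] == seq[:brk] and right[-(n - brk):] == seq[brk:]:
                    return True
    return False


parents = ["AAAAAAAAAA", "CCCCCCCCCC"]   # more abundant "parent" sequences
print(is_bimera("AAAAACCCCC", parents))  # left half of parent 1 + right half of parent 2
print(is_bimera("AAAAAGCCCC", parents))  # contains a base found in neither parent
```

In practice only sequences more abundant than the candidate would be passed in as `parents`.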
Results
1. Read Quality Plots NGS sequence analysis starts with visualizing the quality of the sequencing. Below are the quality plots of the first
sample for the R1 and R2 reads separately. In gray-scale is a heat map of the frequency of each quality score at each base position. The mean
quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines.
The forward reads are usually of better quality. It is common practice to trim the last few nucleotides to avoid the less well-controlled errors
that can arise there. The trimming affects the downstream steps, including error model building, merging and chimera calling. FOMC uses an empirical
approach that tests many combinations of trim lengths in order to obtain the best final amplicon sequence variants (ASVs); see the next
section, “Optimal trim length for ASVs”.
2. Optimal trim length for ASVs The final number of merged and chimera-filtered ASVs depends on the quality filtering (hence trimming) at the very beginning of the DADA2 pipeline.
In order to retain the highest percentage of input reads as final ASVs, an empirical approach was used:
Create a random subset of each sample consisting of 5,000 R1 and 5,000 R2 reads (to reduce computation time)
Trim 10 bases at a time from the ends of both R1 and R2, up to 50 bases
For each combination of trimmed lengths (e.g., 300x300, 300x290, 290x290, etc.), run the trimmed reads
through the entire DADA2 pipeline to obtain chimera-filtered merged ASVs
Select the combination with the highest percentage of input reads becoming final ASVs for the complete set of data
Below is the result of this procedure, showing ASV percentages of total reads for all trimming combinations (1st Column = R1 lengths in bases; 1st Row = R2 lengths in bases):
R1/R2    281       271       261       251       241       231
321      74.19%    74.86%    75.05%    75.85%    81.67%    79.83%
311      74.22%    74.96%    75.11%    75.92%    79.77%    37.28%
301      74.28%    75.02%    75.16%    74.11%    37.05%    20.78%
291      74.29%    75.01%    73.42%    34.18%    20.52%    15.86%
281      74.49%    73.44%    34.25%    19.15%    15.74%     2.02%
271      73.14%    34.82%    19.22%    14.69%     1.99%     1.29%
Based on the above result, the trim length combination of R1 = 321 bases and R2 = 241 bases (81.67% in the table above) was chosen for generating the final ASVs for all sequences.
This combination retained the highest percentage of reads as merged, non-chimeric ASVs and was used for downstream analyses, if requested.
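The selection step amounts to taking the maximum over the grid of percentages. A minimal sketch, using the values from the table above (the names `R2_LENGTHS`, `RETAINED` and `best_trim` are illustrative, not part of the FOMC pipeline):

```python
# Percentage of input reads retained as final ASVs, copied from the table above.
# Keys are R1 lengths; list positions follow R2_LENGTHS.
R2_LENGTHS = [281, 271, 261, 251, 241, 231]
RETAINED = {
    321: [74.19, 74.86, 75.05, 75.85, 81.67, 79.83],
    311: [74.22, 74.96, 75.11, 75.92, 79.77, 37.28],
    301: [74.28, 75.02, 75.16, 74.11, 37.05, 20.78],
    291: [74.29, 75.01, 73.42, 34.18, 20.52, 15.86],
    281: [74.49, 73.44, 34.25, 19.15, 15.74, 2.02],
    271: [73.14, 34.82, 19.22, 14.69, 1.99, 1.29],
}


def best_trim(retained, r2_lengths):
    """Return (r1_len, r2_len, percent) for the highest retained percentage."""
    cells = (
        (r1, r2, pct)
        for r1, row in retained.items()
        for r2, pct in zip(r2_lengths, row)
    )
    return max(cells, key=lambda cell: cell[2])


print(best_trim(RETAINED, R2_LENGTHS))  # (321, 241, 81.67)
```

The sharp drop along the lower-right of the grid is what one would expect when the trimmed reads become too short to overlap reliably during merging.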
3. Error plots from learning the error rates
After DADA2 has built the error model for the data set, it is always worthwhile, as a sanity check if nothing else, to visualize the estimated error rates.
The error rates for each possible transition (A→C, A→G, …) are shown below. Points are the observed error rates for each consensus quality score.
The black line shows the estimated error rates after convergence of the machine-learning algorithm.
The red line shows the error rates expected under the nominal definition of the Q-score.
Ideally, the estimated error rates (black line) are a good fit to the observed rates (points), and the error rates drop
with increased quality as expected.
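For orientation, the nominal definition of the Q-score mentioned above is the standard Phred relation p = 10^(-Q/10). The sketch below computes it; splitting the total evenly across the three possible substitutions is an assumption of this sketch about how a per-transition reference line could be derived, and the function names are hypothetical.

```python
def phred_to_error(q: float) -> float:
    """Nominal probability that a base call with Phred score `q` is wrong."""
    return 10 ** (-q / 10)


def nominal_substitution_rate(q: float) -> float:
    """Nominal rate for one specific substitution (e.g., A->C), assuming the
    total error is split evenly across the three possible wrong bases."""
    return phred_to_error(q) / 3


for q in (10, 20, 30, 40):
    print(q, phred_to_error(q), nominal_substitution_rate(q))
```

A score of Q30, for example, corresponds to a nominal error probability of one in a thousand.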
Forward Read R1 Error Plot
Reverse Read R2 Error Plot
The PDF versions of these plots are available here:
4. DADA2 Result Summary The table below summarizes the DADA2 analysis,
tracking the paired read counts of each sample through all the steps of the DADA2 denoising process,
including end-trimming (filtered), denoising (denoisedF, denoisedR), pair merging (merged) and chimera removal (nonchim).
Sample ID
F12694.S001
F12694.S002
F12694.S003
F12694.S004
F12694.S005
F12694.S006
F12694.S007
F12694.S008
F12694.S009
F12694.S010
F12694.S011
F12694.S012
F12694.S013
F12694.S014
F12694.S015
F12694.S016
F12694.S017
F12694.S018
F12694.S019
F12694.S020
F12694.S021
F12694.S022
F12694.S023
F12694.S024
F12694.S025
F12694.S026
F12694.S027
F12694.S028
F12694.S029
F12694.S030
F12694.S031
F12694.S032
F12694.S033
F12694.S034
F12694.S035
F12694.S036
F12694.S037
F12694.S038
F12694.S039
F12694.S040
F12694.S041
F12694.S042
F12694.S043
F12694.S044
F12694.S045
F12694.S046
F12694.S047
F12694.S048
F12694.S049
F12694.S050
F12694.S051
F12694.S052
F12694.S053
F12694.S054
F12694.S055
F12694.S056
F12694.S057
F12694.S058
F12694.S059
F12694.S060
F12694.S061
F12694.S062
F12694.S063
F12694.S064
F12694.S065
F12694.S066
F12694.S067
F12694.S068
F12694.S069
F12694.S070
F12694.S071
F12694.S072
F12694.S073
F12694.S074
F12694.S075
F12694.S076
F12694.S077
F12694.S078
F12694.S079
F12694.S080
F12694.S081
F12694.S082
F12694.S083
F12694.S084
F12694.S085
F12694.S086
F12694.S087
F12694.S088
F12694.S089
F12694.S090
F12694.S091
F12694.S092
F12694.S093
F12694.S094
F12694.S095
F12694.S096
F12694.S097
F12694.S098
F12694.S099
F12694.S100
F12694.S101
F12694.S102
F12694.S103
F12694.S104
F12694.S105
F12694.S106
F12694.S107
F12694.S108
F12694.S109
F12694.S110
F12694.S111
F12694.S112
F12694.S113
F12694.S114
F12694.S115
F12694.S116
F12694.S117
F12694.S118
F12694.S119
F12694.S120
F12694.S121
F12694.S122
F12694.S123
F12694.S124
F12694.S125
F12694.S126
F12694.S127
F12694.S128
F12694.S129
F12694.S130
F12694.S131
F12694.S132
F12694.S133
F12694.S134
F12694.S135
F12694.S136
F12694.S137
F12694.S138
F12694.S139
F12694.S140
F12694.S141
F12694.S142
F12694.S143
F12694.S144
F12694.S145
F12694.S146
F12694.S147
F12694.S148
F12694.S149
F12694.S150
F12694.S151
F12694.S152
F12694.S153
F12694.S154
F12694.S155
F12694.S156
F12694.S157
F12694.S158
F12694.S159
F12694.S160
F12694.S161
F12694.S162
F12694.S163
F12694.S164
F12694.S165
F12694.S166
F12694.S167
F12694.S168
F12694.S169
F12694.S170
F12694.S171
F12694.S172
F12694.S173
F12694.S174
F12694.S175
F12694.S176
F12694.S177
F12694.S178
F12694.S179
F12694.S180
F12694.S181
F12694.S182
F12694.S183
F12694.S184
F12694.S185
F12694.S186
F12694.S187
F12694.S188
F12694.S189
F12694.S190
F12694.S191
F12694.S192
F12694.S193
F12694.S194
F12694.S195
F12694.S196
F12694.S197
F12694.S198
F12694.S199
F12694.S200
F12694.S201
F12694.S202
F12694.S203
F12694.S204
F12694.S205
F12694.S206
F12694.S207
F12694.S208
F12694.S209
F12694.S210
F12694.S211
F12694.S212
F12694.S213
F12694.S214
F12694.S215
F12694.S216
F12694.S217
F12694.S218
F12694.S219
F12694.S220
F12694.S221
F12694.S222
F12694.S223
F12694.S224
F12694.S225
F12694.S226
F12694.S227
F12694.S228
F12694.S229
F12694.S230
F12694.S231
F12694.S232
F12694.S233
F12694.S234
F12694.S235
F12694.S236
F12694.S237
F12694.S238
F12694.S239
F12694.S240
F12694.S241
F12694.S242
F12694.S243
F12694.S244
F12694.S245
F12694.S246
F12694.S247
F12694.S248
F12694.S249
F12694.S250
F12694.S251
F12694.S252
F12694.S253
F12694.S254
F12694.S255
F12694.S256
F12694.S257
F12694.S258
F12694.S259
F12694.S260
F12694.S261
F12694.S262
F12694.S263
F12694.S264
F12694.S265
F12694.S266
F12694.S267
F12694.S268
Row Sum
Percentage
input
96,936
158,831
120,714
113,337
110,987
116,497
125,023
133,502
108,213
124,964
92,261
104,369
108,520
94,039
137,996
130,321
111,566
95,092
108,641
112,918
122,226
113,142
113,536
110,170
101,695
106,407
100,174
116,498
118,015
96,322
95,627
109,147
89,994
113,029
110,907
122,957
92,006
100,978
94,282
98,252
138,615
107,358
102,136
95,943
111,141
120,620
117,750
114,416
131,642
121,603
108,248
107,296
116,564
108,325
111,878
125,162
139,608
119,510
101,157
104,180
112,274
119,157
107,890
105,080
165,291
105,045
132,533
141,793
225,870
130,466
83,060
109,485
159,736
100,185
119,602
106,896
100,174
89,802
100,863
108,485
146,436
159,535
111,088
91,747
109,158
122,528
111,595
122,583
142,549
128,827
91,317
117,745
108,698
116,279
104,447
125,856
119,664
125,953
115,625
127,651
96,910
125,210
107,747
123,507
115,607
114,269
108,392
108,835
116,292
100,606
112,094
126,226
116,426
123,098
120,826
106,995
89,276
123,039
116,532
141,823
117,493
117,428
124,738
109,860
95,221
117,031
98,746
110,798
115,872
126,805
134,851
119,287
106,732
137,805
103,166
150,856
216,552
100,440
132,889
119,190
96,697
110,252
110,387
118,257
122,536
121,331
118,818
121,823
117,254
118,986
110,535
130,805
124,640
128,948
98,691
126,681
101,162
137,052
107,058
116,763
138,875
114,953
125,190
96,635
89,910
119,419
122,017
131,391
100,053
121,393
113,183
93,323
118,138
140,675
127,302
123,705
125,238
112,848
124,743
124,668
96,144
143,357
138,614
128,474
153,906
139,382
107,621
99,133
109,461
153,710
108,534
130,346
125,235
91,136
110,676
111,069
109,375
108,827
88,882
137,127
131,161
100,396
128,810
112,672
98,898
112,550
96,044
103,399
104,585
149,256
63,496
115,883
88,303
86,606
132,068
107,291
110,988
115,902
90,726
92,930
99,957
107,329
78,242
92,908
139,458
116,902
125,564
145,873
128,976
106,633
127,389
91,077
149,894
98,886
130,756
137,544
119,789
100,726
119,682
148,185
141,380
124,727
135,064
121,722
90,050
106,686
99,248
66,262
163,060
102,818
92,725
87,082
80,404
61,021
79,381
82,661
157,550
114,641
95,226
81,011
103,224
101,538
117,698
105,118
111,529
103,335
111,901
135,664
30,879,485
100.00%
filtered
94,032
153,870
117,085
109,804
107,625
112,916
120,992
129,161
104,838
121,173
89,432
101,105
105,191
91,184
133,720
126,359
108,101
92,227
105,275
109,506
118,360
109,752
109,839
106,817
98,614
103,118
97,051
112,912
114,385
93,252
92,669
105,687
87,185
109,330
107,466
119,017
89,079
97,700
91,273
95,205
134,336
103,985
98,943
92,900
107,742
116,872
114,033
110,727
127,444
117,859
104,876
103,937
112,929
104,975
108,447
121,330
135,358
115,854
98,169
100,958
108,886
115,419
104,498
101,736
160,102
101,940
128,317
137,395
218,996
126,365
80,408
105,987
154,735
97,001
115,934
103,564
97,142
87,000
97,618
105,148
141,914
154,715
107,752
88,687
105,693
118,762
108,160
118,762
138,141
124,798
88,458
114,122
105,198
112,753
101,290
122,006
116,029
122,148
111,985
123,725
93,878
121,379
104,425
119,797
112,069
110,796
105,043
105,406
112,677
97,457
108,648
122,136
112,625
119,407
116,994
103,675
86,536
119,184
112,806
137,305
113,908
113,774
120,793
106,466
92,252
113,242
95,482
107,381
112,343
122,800
130,599
115,618
103,363
133,427
99,984
146,126
209,833
97,281
128,757
115,401
93,722
106,789
106,974
114,607
118,738
117,538
115,174
117,972
113,601
115,317
107,116
126,791
120,912
124,902
95,689
122,704
97,988
132,715
103,716
113,155
134,599
111,325
121,344
93,574
87,075
115,676
118,193
127,416
96,977
117,627
109,624
90,423
114,451
136,312
123,279
119,787
121,414
109,378
120,759
120,871
93,217
138,868
134,338
124,543
149,222
135,174
104,338
96,124
106,146
148,876
105,055
126,327
121,299
88,359
107,327
107,567
105,851
105,526
86,109
132,874
127,048
97,353
124,818
109,037
95,881
109,075
93,013
100,222
101,302
144,685
61,510
112,406
85,716
83,870
128,004
103,994
107,733
112,357
87,923
89,961
96,801
104,119
75,799
90,071
135,261
113,292
121,765
141,415
125,074
103,332
123,425
88,201
145,321
95,744
126,729
133,226
116,109
97,593
115,927
143,377
136,980
120,957
130,886
117,986
87,294
103,402
96,081
64,174
157,853
99,723
89,905
84,360
77,966
59,136
77,039
80,133
152,702
111,204
92,231
78,494
99,911
98,282
113,991
101,886
107,977
100,090
108,414
131,468
29,920,721
96.90%
denoisedF
93,233
152,358
116,373
108,821
107,006
111,569
120,013
127,972
103,760
119,762
88,223
100,029
103,975
90,403
132,576
125,146
106,965
91,504
104,087
108,203
117,308
108,860
108,686
105,709
97,353
102,126
96,235
111,972
113,399
92,322
91,683
104,443
86,300
108,305
106,327
117,902
88,023
96,822
90,464
94,476
132,999
102,985
98,050
91,713
106,708
115,706
113,000
109,866
126,381
116,931
103,337
102,784
111,651
104,052
107,412
120,282
134,025
114,567
97,364
99,871
107,928
114,567
103,629
101,025
158,398
101,031
127,309
135,961
217,064
125,127
79,711
104,830
153,384
95,959
115,028
102,557
96,080
86,177
96,843
104,325
140,538
153,084
106,610
87,667
104,533
117,570
107,274
117,560
136,941
123,267
87,603
113,235
104,224
111,712
100,363
121,053
115,065
121,026
110,991
122,521
93,036
120,258
103,387
118,789
111,114
109,765
103,748
104,498
111,277
96,327
107,355
121,286
111,428
118,150
115,662
102,754
85,688
118,068
111,416
136,279
112,671
112,847
119,718
105,289
91,237
112,145
94,310
106,157
111,095
121,441
129,169
114,526
102,352
132,075
99,153
144,774
207,484
96,335
127,921
114,289
92,814
105,831
105,927
113,444
117,785
116,423
113,612
116,854
112,566
114,422
105,964
125,808
119,823
123,903
94,714
121,629
96,931
131,605
102,853
112,249
133,474
110,545
120,534
92,916
86,408
114,741
117,492
126,104
96,230
116,541
108,772
89,877
113,622
135,108
122,469
118,758
120,184
108,314
119,963
119,935
92,485
137,491
132,929
123,245
147,803
134,001
103,330
95,267
105,183
147,396
104,141
125,327
120,027
87,475
105,932
106,585
104,826
104,360
85,240
131,474
125,974
96,538
123,347
108,038
95,099
108,081
92,329
99,225
100,406
143,188
60,798
111,408
84,816
82,998
126,829
102,858
106,852
111,305
87,323
89,050
95,813
103,099
75,086
89,198
133,914
112,193
120,602
140,100
124,031
102,234
122,245
87,279
144,171
95,054
125,850
131,750
114,783
96,397
114,871
142,166
135,964
119,840
129,482
117,152
86,509
102,308
94,934
63,474
156,221
99,013
88,883
83,752
77,338
58,425
76,425
79,453
151,110
110,190
91,388
77,633
98,904
97,126
112,817
100,906
106,905
98,927
107,406
130,145
29,637,009
95.98%
denoisedR
92,567
151,667
115,508
108,253
106,334
110,975
119,136
126,922
103,113
119,284
87,861
99,369
103,011
89,870
131,403
124,552
106,418
90,743
103,288
107,507
116,476
108,237
108,135
105,183
96,903
101,318
95,690
111,072
112,780
91,795
91,049
103,703
85,615
107,294
105,386
117,319
87,649
96,230
89,806
94,055
132,027
102,508
97,358
91,521
106,151
114,840
112,269
109,123
125,618
116,099
103,175
102,339
111,256
103,594
106,699
119,647
133,374
113,891
96,859
99,345
107,312
113,734
102,799
100,440
157,341
100,368
126,598
134,756
215,700
123,979
78,919
104,011
152,616
95,351
114,285
101,993
95,556
85,524
96,288
103,786
139,484
152,247
106,172
87,339
103,659
116,926
106,508
116,898
136,382
122,394
87,105
112,535
103,583
111,035
99,688
120,157
114,277
120,015
110,235
121,960
92,402
119,540
102,871
118,240
110,452
109,377
103,285
103,970
110,582
96,001
106,888
120,434
110,887
117,525
114,821
102,232
84,975
117,158
110,467
135,313
111,960
111,814
118,850
104,614
90,636
111,367
93,728
105,599
110,314
121,050
128,373
113,545
102,049
131,190
97,975
143,695
206,504
95,665
126,910
113,606
92,435
105,068
105,399
112,872
117,207
115,565
113,049
116,060
111,754
113,727
105,055
124,892
119,000
123,136
94,034
120,999
96,122
130,928
102,003
111,414
132,604
109,731
119,615
92,128
85,640
113,989
116,474
125,199
95,365
116,008
108,046
89,061
112,817
134,169
121,390
118,130
119,692
107,686
119,227
119,032
91,917
136,775
132,404
122,513
146,709
133,028
102,644
94,582
104,449
146,427
103,318
124,511
119,320
87,006
105,455
106,069
104,117
103,775
84,733
130,971
125,334
95,755
122,632
107,082
94,402
107,347
91,604
98,858
99,880
142,764
60,321
110,704
84,322
82,578
126,046
102,134
106,340
110,329
86,807
88,332
95,320
102,442
74,668
88,542
132,935
111,411
119,692
139,264
123,276
101,372
121,312
87,067
143,059
94,461
124,817
130,965
114,153
96,063
114,012
141,442
135,127
119,020
128,571
116,496
86,081
101,682
94,415
63,160
155,165
98,335
88,351
83,178
76,663
57,869
75,959
78,986
150,352
109,610
90,920
77,282
98,371
96,645
112,211
100,215
106,212
98,516
106,760
129,368
29,450,797
95.37%
merged
88,799
144,885
110,870
103,621
102,324
105,303
114,121
120,562
97,345
112,929
83,335
93,858
96,494
85,443
124,542
118,621
101,484
86,755
97,667
102,375
111,155
103,500
103,510
100,752
91,530
96,542
91,411
106,043
107,834
87,364
86,098
98,364
80,802
102,206
99,891
112,334
82,917
91,674
86,073
90,074
125,267
97,910
93,273
86,680
101,563
109,457
107,601
104,951
120,729
111,445
98,402
97,175
106,256
99,306
101,317
114,731
127,680
108,548
93,232
94,635
102,881
109,175
98,716
96,319
150,165
95,978
121,877
127,464
205,951
118,063
74,819
99,126
146,277
90,738
109,739
97,585
91,235
81,921
92,384
99,228
133,144
145,552
101,486
83,308
98,101
111,486
101,758
111,425
131,541
116,148
83,186
108,315
99,611
106,484
94,993
114,979
109,451
114,934
105,984
116,699
88,002
114,661
98,310
112,508
106,437
104,843
97,746
100,076
104,652
92,082
101,629
115,466
106,557
111,994
109,360
98,186
80,908
111,971
104,358
129,686
106,733
107,070
114,169
99,338
86,201
106,137
88,977
99,841
105,462
115,776
122,450
108,277
97,813
125,549
93,626
137,386
197,499
90,972
121,891
108,535
88,133
101,349
100,656
107,903
113,225
110,511
106,711
111,112
107,034
109,121
100,238
119,124
113,642
118,087
89,381
115,870
90,937
125,654
97,829
106,161
127,245
106,059
115,358
88,696
82,074
108,866
112,657
119,542
91,263
110,865
104,181
85,295
108,369
128,559
116,590
112,818
114,115
102,627
114,961
113,672
88,408
131,162
126,195
116,786
140,609
127,221
97,769
90,364
100,252
140,397
98,142
119,105
113,921
82,657
100,040
102,146
99,765
98,616
80,802
124,858
120,178
91,573
116,583
102,465
91,020
102,651
88,367
95,419
95,443
136,390
57,467
106,333
80,283
78,748
120,664
97,127
102,522
104,986
83,780
83,800
90,926
97,997
71,034
84,491
127,576
106,622
114,245
133,573
118,503
96,531
115,517
83,493
136,857
91,240
119,444
124,575
109,072
91,390
108,847
136,166
130,132
114,576
122,404
112,181
82,865
96,698
89,330
60,311
148,898
94,739
83,928
80,563
73,450
54,935
72,677
75,326
142,987
103,980
87,554
73,633
93,453
92,330
107,277
95,760
101,481
93,502
101,854
122,819
28,142,171
91.14%
nonchim
69,753
125,616
102,179
89,501
87,730
95,474
99,282
107,977
86,578
96,121
69,736
81,944
86,088
76,003
114,041
95,920
90,287
75,339
90,026
95,008
91,989
83,970
89,416
86,693
78,914
82,246
80,405
86,397
99,420
76,530
79,429
89,583
73,762
94,798
90,649
98,115
68,843
81,678
75,501
78,302
117,355
86,052
86,704
76,793
90,262
102,466
93,847
84,724
100,796
96,171
84,518
77,391
89,619
85,913
94,219
103,044
106,938
94,145
81,680
84,906
88,579
93,609
87,980
86,452
140,617
84,536
110,304
116,290
182,023
110,443
65,913
90,475
128,807
76,024
83,567
84,691
79,689
70,971
79,435
86,913
122,691
136,222
81,854
71,555
90,042
94,641
90,727
95,211
99,538
107,860
74,186
96,265
91,182
99,192
80,914
106,351
98,883
105,611
98,925
93,963
70,415
103,474
83,633
98,756
95,738
93,023
87,098
87,968
98,059
77,335
81,883
102,201
90,332
104,012
99,792
87,705
74,584
104,821
97,545
117,367
87,613
96,680
98,935
91,930
79,081
88,357
81,373
90,692
97,325
100,347
114,572
100,160
74,971
94,356
82,215
125,854
185,791
77,788
113,864
101,311
76,964
86,825
87,219
95,237
96,383
100,697
94,952
105,154
96,630
100,592
89,204
109,610
106,821
99,274
75,917
98,432
82,416
104,172
88,697
93,474
114,957
95,197
109,392
82,073
72,633
89,167
97,676
107,216
84,307
96,709
94,815
76,046
94,339
116,128
102,000
93,765
100,643
87,559
106,313
102,556
74,921
116,264
113,631
103,796
125,974
109,322
89,498
81,513
92,057
132,206
87,871
104,501
100,775
73,539
88,567
80,796
92,574
89,426
68,711
112,213
107,136
84,885
108,564
95,889
84,290
89,719
72,398
86,908
83,255
121,246
52,414
97,185
71,023
67,298
110,145
88,915
91,091
92,468
69,965
76,939
83,623
87,329
63,853
74,179
116,746
92,316
108,811
122,703
104,464
91,272
104,833
74,169
117,285
84,413
110,674
118,466
98,483
79,265
101,631
124,075
116,074
107,034
113,912
95,280
75,834
82,584
81,021
55,760
141,570
74,982
77,886
74,916
68,343
48,541
62,894
64,899
121,252
88,535
75,339
66,122
79,137
79,263
90,041
87,004
91,515
75,256
91,332
110,731
24,938,569
80.76%
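The "Percentage" entries above are each step's total read count divided by the total input read count. A short sketch that reproduces them from the "Row Sum" values in the table (the names `TOTALS` and `percent_of_input` are illustrative):

```python
# "Row Sum" values for each DADA2 step, copied from the table above.
TOTALS = {
    "input": 30_879_485,
    "filtered": 29_920_721,
    "denoisedF": 29_637_009,
    "denoisedR": 29_450_797,
    "merged": 28_142_171,
    "nonchim": 24_938_569,
}


def percent_of_input(totals):
    """Express each step's read total as a percentage of the input reads."""
    base = totals["input"]
    return {step: round(100 * n / base, 2) for step, n in totals.items()}


for step, pct in percent_of_input(TOTALS).items():
    print(f"{step:10s} {pct:6.2f}%")
```

The largest single loss occurs at chimera removal, where the retained fraction falls from 91.14% to 80.76% of the input reads.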
This table can be downloaded as an Excel table below:
5. DADA2 Amplicon Sequence Variants (ASVs). A total of 20,085 unique merged and chimera-free ASV sequences were identified, and their corresponding
read counts for each sample are available in the "ASV Read Count Table", with rows for the ASV sequences and columns for the samples. This read count table can be used for
microbial profile comparison among different samples, and the sequences provided in the table can be used for taxonomy assignment.
The species-level, open-reference 16S rRNA NGS reads taxonomy assignment pipeline
Version 20210310
1. Raw sequence reads in FASTA format were BLASTN-searched against a combined set of 16S rRNA reference sequences.
The set consists of MOMD (version 0.1), HOMD (version 15.2 http://www.homd.org/index.php?name=seqDownload&file&type=R ),
HOMD 16S rRNA RefSeq Extended Version 1.1 (EXT), Greengenes Gold (GG)
(http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/gold_strains_gg16S_aligned.fasta.gz) ,
and the NCBI 16S rRNA reference sequence set (https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz).
These sequences were screened and combined to remove short sequences (<1000 nt), chimeras, duplicates and sub-sequences,
as well as sequences with poor taxonomy annotation (e.g., without species information).
This process resulted in 1,015 sequences from HOMD V15.22, 495 from EXT, 3,940 from GG and 18,044 from NCBI, a total of 25,120 sequences.
Altogether these sequences represent a total of 15,601 oral and non-oral microbial species.
NCBI BLASTN version 2.7.1+ (Zhang et al., 2000) was used with the default parameters.
Reads with ≥ 98% sequence identity to the matched reference and ≥ 90% alignment length
(i.e., ≥ 90% of the read length that was aligned to the reference and was used to calculate
the sequence percent identity) were classified based on the taxonomy of the reference sequence
with highest sequence identity. If a read matched with reference sequences representing
more than one species with equal percent identity and alignment length, it was subject
to chimera checking with USEARCH program version v8.1.1861 (Edgar 2010). Non-chimeric reads with multi-species
best hits were considered valid and were assigned with a unique species
notation (e.g., spp) denoting unresolvable multiple species.
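The assignment rule in step 1 can be summarized in a short sketch. It assumes the BLASTN hits for a read have already been parsed into simple records; the `Hit` class and `classify_read` function are hypothetical names, and the chimera check applied to multi-species ties is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str
    identity: float   # percent identity of the alignment
    align_len: int    # number of read bases aligned to the reference


def classify_read(read_len, hits, min_identity=98.0, min_coverage=90.0):
    """Return a species name, 'multispecies' for ties, or 'unassigned'."""
    passing = [
        h for h in hits
        if h.identity >= min_identity
        and 100.0 * h.align_len / read_len >= min_coverage
    ]
    if not passing:
        return "unassigned"
    best = max(passing, key=lambda h: (h.identity, h.align_len))
    # Species tied with the best hit on both identity and alignment length.
    tied = {
        h.species for h in passing
        if (h.identity, h.align_len) == (best.identity, best.align_len)
    }
    return best.species if len(tied) == 1 else "multispecies"


hits = [Hit("Roseburia hominis", 99.2, 450), Hit("Roseburia faecis", 98.1, 450)]
print(classify_read(460, hits))  # Roseburia hominis
```

Reads returned as "unassigned" here correspond to the pool that goes on to de novo OTU calling in step 2.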
2. Unassigned reads (i.e., reads with < 98% identity or < 90% alignment length) were pooled together, and reads shorter than 200 bases were
removed. The remaining reads were subjected to de novo
operational taxonomic unit (OTU) calling and chimera checking using the USEARCH program version v8.1.1861 (Edgar 2010).
The de novo OTU calling and chimera checking were done using 98% as the sequence identity cutoff, i.e., the species-level OTU.
This step produced species-level de novo clustered OTUs with 98% identity.
Representative reads from each of the OTUs/species were then BLASTN-searched
against the same reference sequence set again to determine the closest species for
these potential novel species. These potential novel species were pooled together with the reads that were assigned at the species level in
the previous step, for downstream analyses.
Reference:
Edgar RC. Search and clustering orders of magnitude faster than BLAST.
Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12. PubMed PMID: 20709691.
3. Designations used in the taxonomy:
1) Taxonomy levels are indicated by these prefixes:
k__: domain/kingdom
p__: phylum
c__: class
o__: order
f__: family
g__: genus
s__: species
Example:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia;s__faecis
2) Unique level identified – known species:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis
The above example shows some reads match to a single species (all levels are unique)
3) Non-unique level identified – known species:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3
The above example “s__multispecies_spp123_3” indicates that certain reads match equally to 3 species of the
genus Roseburia; “spp123” is a temporarily assigned species ID.
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__multigenus;s__multispecies_spp234_5
The above example indicates that certain reads match equally to 5 different species, which belong to multiple genera;
“spp234” is a temporarily assigned species ID.
4) Unique level identified – unknown species, potential novel species:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__hominis_nov_97%
The above example indicates that some reads have no match to any of the reference sequences with
sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%. However, this group
of reads (actually the representative read from a de novo OTU) has 97% identity to
Roseburia hominis; thus it is a potential novel species, closest to Roseburia hominis
(but not the same species).
5) Multiple level identified – unknown species, potential novel species:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia;s__multispecies_sppn123_3_nov_96%
The above example indicates that some reads have no match to any of the reference sequences
with sequence identity ≥ 98% and percent coverage (alignment length) ≥ 90%.
However, this group of reads (actually the representative read from a de novo OTU)
has 96% identity equally to 3 species in Roseburia. Thus there is no single
closest species; instead, this group of reads matches equally to multiple species at 96%.
Since the reads have passed the chimera check, they represent a potential novel species. “sppn123” is a
temporary ID for this potential novel species.
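When working with the read count tables, these designations can be split programmatically. The following is a minimal sketch based only on the conventions listed above; `parse_lineage` and `species_kind` are hypothetical helper names.

```python
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(lineage: str) -> dict:
    """Split a 'k__...;p__...;...;s__...' string into a rank -> name mapping."""
    result = {}
    for field in lineage.split(";"):
        prefix, _, name = field.partition("__")
        result[RANKS[prefix.strip()]] = name.strip()
    return result


def species_kind(species: str) -> str:
    """Classify a species label as 'single', 'multispecies' or 'novel'."""
    if "_nov_" in species:
        return "novel"          # potential novel species (single or multiple closest hits)
    if species.startswith("multispecies_"):
        return "multispecies"   # reads matching several known species equally
    return "single"


tax = parse_lineage(
    "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;"
    "f__Lachnospiraceae;g__Roseburia;s__multispecies_spp123_3"
)
print(tax["genus"], species_kind(tax["species"]))  # Roseburia multispecies
```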
4. The taxonomy assignment algorithm is illustrated in the flow chart below:
Read Taxonomy Assignment - Result Summary *
Code  Category                                          MPC=0% (>=1 read)   MPC=0.01% (>=2487 reads)
A     Total reads                                       24,938,569          24,938,569
B     Total assigned reads                              24,878,532          24,878,532
C     Assigned reads in species with read count < MPC   0                   159,883
D     Assigned reads in samples with read count < 500   0                   0
E     Total samples                                     268                 268
F     Samples with reads >= 500                         268                 268
G     Samples with reads < 500                          0                   0
H     Total assigned reads used for analysis (B-C-D)    24,878,532          24,718,649
I     Reads assigned to single species                  22,281,176          22,180,815
J     Reads assigned to multiple species                2,271,680           2,253,700
K     Reads assigned to novel species                   325,676             284,134
L     Total number of species                           1,007               201
M     Number of single species                          504                 175
N     Number of multi-species                           59                  13
O     Number of novel species                           444                 13
P     Total unassigned reads                            60,037              60,037
Q     Chimeric reads                                    407                 407
R     Reads without BLASTN hits                         252                 252
S     Others: short, low quality, singletons, etc.      59,378              59,378
A=B+P=C+D+H+Q+R+S
E=F+G
B=C+D+H
H=I+J+K
L=M+N+O
P=Q+R+S
* MPC = Minimal percent (of all assigned reads) read count per species, species with read count < MPC were removed.
* Samples with reads < 500 were removed from downstream analyses.
* The assignment result from MPC=0.01% was used in the downstream analyses.
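The bookkeeping identities listed under the summary table (A=B+P, H=I+J+K, and so on) can be checked directly against the reported values. A small sketch, with the numbers copied from the table above and illustrative names:

```python
COMMON = dict(A=24_938_569, B=24_878_532, D=0, E=268, F=268, G=0,
              P=60_037, Q=407, R=252, S=59_378)
COLUMNS = {
    "MPC=0%": dict(COMMON, C=0, H=24_878_532, I=22_281_176, J=2_271_680,
                   K=325_676, L=1_007, M=504, N=59, O=444),
    "MPC=0.01%": dict(COMMON, C=159_883, H=24_718_649, I=22_180_815,
                      J=2_253_700, K=284_134, L=201, M=175, N=13, O=13),
}


def check_identities(v):
    """Evaluate each identity from the table footnotes for one column."""
    return {
        "A=B+P": v["A"] == v["B"] + v["P"],
        "A=C+D+H+Q+R+S": v["A"] == v["C"] + v["D"] + v["H"] + v["Q"] + v["R"] + v["S"],
        "E=F+G": v["E"] == v["F"] + v["G"],
        "B=C+D+H": v["B"] == v["C"] + v["D"] + v["H"],
        "H=I+J+K": v["H"] == v["I"] + v["J"] + v["K"],
        "L=M+N+O": v["L"] == v["M"] + v["N"] + v["O"],
        "P=Q+R+S": v["P"] == v["Q"] + v["R"] + v["S"],
    }


for name, values in COLUMNS.items():
    print(name, check_identities(values))
```

All identities hold for both columns with the values reported here.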
Read Taxonomy Assignment - ASV Species-Level Read Counts Table
This table shows the read counts for each sample (columns) and each species identified based on the ASV sequences.
The downstream analyses were based on this table.
The species listed in the table have full taxonomy and a dynamically assigned species ID specific to this report.
When some reads match the reference sequences of more than one species equally (i.e., same percent identity and alignment coverage),
they cannot be assigned to a particular species. Instead, they are assigned to multiple species with a species notation such as
"s__multispecies_spp2_2". In this notation, spp2 is the dynamic ID assigned to these reads that hit multiple sequences, and the "_2"
at the end of the notation means there are two species in spp2.
You can look up which species are included in each multi-species assignment in the table below:
Another type of notation is "s__multispecies_sppn2_2", in which the "n" in sppn2 means it is a potential novel species, because all the reads in this species
have < 98% identity to any of the reference sequences. They were grouped together based on de novo OTU clustering at a 98% identity cutoff, and then
a representative sequence was chosen for a BLASTN search against the reference database to find the closest match (which will still be < 98%). This representative
sequence also matched equally to more than one species, hence the "spp" in the label.
In ecology, alpha diversity (α-diversity) is the mean species diversity in sites or habitats at a local scale.
The term was introduced by R. H. Whittaker together with the terms beta diversity (β-diversity)
and gamma diversity (γ-diversity). Whittaker's idea was that the total species diversity in a landscape
(gamma diversity) is determined by two different things, the mean species diversity in sites or habitats
at a more local scale (alpha diversity) and the differentiation among those habitats (beta diversity).
Diversity measures are affected by the sampling depth. Rarefaction is a technique to assess species richness from the results of sampling. Rarefaction allows
the calculation of species richness for a given number of individual samples, based on the construction
of so-called rarefaction curves. This curve is a plot of the number of species as a function of the
number of samples. Rarefaction curves generally grow rapidly at first, as the most common species are found,
but the curves plateau as only the rarest species remain to be sampled.
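A rarefaction curve can be approximated by repeatedly subsampling a sample's reads at increasing depths and counting the species seen. The sketch below is a minimal illustration with made-up counts; it treats sequencing reads as the sampled units (an assumption of this sketch), and `rarefied_richness` is a hypothetical name.

```python
import random


def rarefied_richness(counts, depth, seed=0):
    """Number of distinct species seen in `depth` reads drawn without replacement.

    `counts` maps species name -> read count for one sample.
    """
    pool = [species for species, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    return len(set(random.Random(seed).sample(pool, depth)))


# Hypothetical sample: two common species and two rare ones.
sample = {"sp_a": 50, "sp_b": 30, "sp_c": 5, "sp_d": 1}
for depth in (1, 10, 40, 86):
    print(depth, rarefied_richness(sample, depth))
```

Averaging over several seeds at each depth would give a smoother curve than a single draw.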
The two main factors taken into account when measuring diversity are richness and evenness.
Richness is a measure of the number of different kinds of organisms present in a particular area.
Evenness compares the similarity of the population size of each of the species present. There are
many different ways to measure the richness and evenness. These measurements are called "estimators" or "indices".
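As a concrete illustration of richness and evenness, the three indices used in this report (observed species, Shannon, Simpson) can be computed from a vector of per-species read counts. This is a sketch only: the Shannon index is computed here with base-2 logarithms and the Simpson index as 1 - sum(p_i^2), which are common conventions but assumptions of this sketch rather than statements about the exact software settings used.

```python
import math


def observed_species(counts):
    """Richness: number of species with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)) over species with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Simpson index expressed as 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


even = [25, 25, 25, 25]     # four equally abundant species
uneven = [97, 1, 1, 1]      # same richness, one dominant species
for name, counts in (("even", even), ("uneven", uneven)):
    print(name, observed_species(counts), shannon(counts), simpson(counts))
```

Both toy communities have the same richness, but the even one scores higher on the Shannon and Simpson indices, which also reflect evenness.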
Below are box plots of three commonly used indices, showing the values for all the samples (dots) and for each group (boxes).
 
Alpha Diversity Box Plots for All Groups
 
 
 
Alpha Diversity Box Plots for Individual Comparisons
To test whether alpha diversity differs significantly among the comparison groups, we use the Kruskal-Wallis H test
provided by the "alpha-group-significance" function in the QIIME 2 "diversity" package. The Kruskal-Wallis H test is the non-parametric alternative
to the one-way ANOVA. Non-parametric means that the test does not assume your data come from a particular distribution. The H test is used
when the assumptions for ANOVA are not met (such as the assumption of normality). It is sometimes called the one-way ANOVA on ranks,
as the ranks of the data values are used in the test rather than the actual data points. The H test determines whether the medians of two
or more groups are different.
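To make the "ANOVA on ranks" idea concrete, the H statistic can be computed by hand as sketched below. This is an illustrative standard-library implementation, not the QIIME 2 code; the helper names (`average_ranks`, `chi2_sf`, `kruskal_wallis`) and the Shannon values are hypothetical. The p-value uses the usual chi-square approximation with k - 1 degrees of freedom, which is rough for very small groups.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution (integer df)."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** i / math.factorial(i)
                                     for i in range(df // 2))
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def kruskal_wallis(*groups):
    """Return (H, p) for two or more groups of observations."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Standard correction for tied values.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical Shannon index values for three groups of three samples each.
group_a = [2.1, 2.4, 2.2]
group_b = [3.0, 3.3, 3.1]
group_c = [3.9, 4.2, 4.0]
h_stat, p_value = kruskal_wallis(group_a, group_b, group_c)
print(round(h_stat, 3), round(p_value, 4))
```

Only the ranks of the values matter: replacing the measurements with any other values that preserve their order gives the same H.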
Below are the Kruskal-Wallis H test results for each comparison based on three different alpha diversity measures: 1) Observed species (features),
2) Shannon index, and 3) Simpson index.
Beta diversity compares the similarity (or dissimilarity) of microbial profiles between different
groups of samples. There are many different similarity/dissimilarity metrics.
In general, they can be quantitative (using sequence abundance, e.g., Bray-Curtis or weighted UniFrac)
or binary (considering only presence-absence of sequences, e.g., binary Jaccard or unweighted UniFrac).
They can also be based on phylogeny (e.g., UniFrac metrics) or not (non-UniFrac metrics, such as Bray-Curtis, etc.).
For microbiome studies, species profiles of samples can be compared with the Bray-Curtis dissimilarity,
which is based on the count data type. The pair-wise Bray-Curtis dissimilarity matrix of all samples can then be
subject to either multi-dimensional scaling (MDS, also known as PCoA) or non-metric MDS (NMDS).
MDS/PCoA is a
scaling or ordination method that starts with a matrix of similarities or dissimilarities
between a set of samples and aims to produce a low-dimensional graphical plot of the data
in such a way that distances between points in the plot are close to original dissimilarities.
NMDS is similar to MDS; however, it does not use the dissimilarity values directly. Instead, it converts them into
ranks and uses these ranks in the calculation.
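The difference between a quantitative and a binary metric can be seen in a short sketch. The functions below follow the standard definitions of the Bray-Curtis dissimilarity and the binary Jaccard distance; the function names and the two example profiles are hypothetical, and the code is not taken from the pipeline used for this report.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator

def binary_jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - |shared| / |union|."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)

# Two hypothetical samples containing the same two species
# in very different proportions.
sample_1 = [90, 10, 0]
sample_2 = [10, 90, 0]
print(bray_curtis(sample_1, sample_2))     # abundance-based
print(binary_jaccard(sample_1, sample_2))  # presence/absence only
```

The abundance-based metric reports the two samples as quite dissimilar, while the presence/absence metric reports them as identical because both contain the same species.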
In our beta diversity analysis, the Bray-Curtis dissimilarity matrix was first calculated and then plotted by PCoA and
NMDS separately. Below are the beta diversity results for all groups together:
 
 
NMDS and PCoA Plots for All Groups
 
 
 
 
 
The above PCoA and NMDS plots are based on count data. The count data can also be transformed into centered log ratios (CLR)
for each species. The CLR data are no longer count data and cannot be used in the Bray-Curtis dissimilarity calculation. Instead,
CLR data can be compared with Euclidean distances. When CLR data are compared by Euclidean distance, the distance is also called
the Aitchison distance.
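A minimal sketch of the CLR transformation and the resulting Aitchison distance is given below. The function names are hypothetical, and the pseudocount of 0.5 added before taking logarithms is an assumption made here to handle zero counts; the report does not state how zeros were treated.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log ratio: log of each value minus the mean of the logs."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(a, b, pseudocount=0.5):
    """Euclidean distance between two CLR-transformed profiles."""
    clr_a, clr_b = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))

# Hypothetical profiles: the second has ten times the sequencing depth of
# the first but the same proportions; the third has the proportions reversed.
shallow = [1, 2, 4]
deep = [10, 20, 40]
reversed_profile = [4, 2, 1]
print(aitchison_distance(shallow, deep, pseudocount=0))
print(aitchison_distance(shallow, reversed_profile, pseudocount=0))
```

Without a pseudocount the distance depends only on the ratios between species, so two samples that differ only in sequencing depth are at (numerically) zero distance.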
Below are the NMDS and PCoA plots of the Aitchison distances of the samples:
Interactive 3D PCoA Plots - Bray-Curtis Dissimilarity
 
 
 
Interactive 3D PCoA Plots - Euclidean Distance
 
 
 
Interactive 3D PCoA Plots - Correlation Coefficients
 
 
 
Group Significance of Beta-diversity Indices
To test whether the between-group dissimilarities are significantly greater than the within-group dissimilarities,
the "beta-group-significance" function provided in the QIIME 2 "diversity" package was used with PERMANOVA
(permutational multivariate analysis of variance) as the group significance testing method.
Three beta diversity metrics were used: 1) Bray–Curtis dissimilarity, 2) correlation coefficient matrix, and 3) Aitchison distance
(Euclidean distance between CLR-transformed compositions).
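The logic of PERMANOVA can be sketched as follows: a pseudo-F statistic compares among-group to within-group sums of squared distances, and its significance is assessed by shuffling the group labels. This is an illustrative standard-library version with hypothetical names and toy data, not the QIIME 2 implementation.

```python
import random
from itertools import combinations

def pseudo_f(dist, labels):
    """Pseudo-F statistic computed from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i, label in enumerate(labels) if label == g]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    # Note: ss_within is zero only if all samples within each group coincide.
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical samples placed on a line, forming two well-separated groups.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
labels = ["A"] * 4 + ["B"] * 4
dist = [[abs(p - q) for q in positions] for p in positions]
f_stat, p_value = permanova(dist, labels)
print(round(f_stat, 2), p_value)
```

A large pseudo-F with a small permutation p-value indicates that samples are more dissimilar between groups than within groups.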
16S rRNA next generation sequencing (NGS) generates a fixed number of reads that reflect the proportion of different
species in a sample, i.e., the relative abundance of species, instead of the absolute abundance.
In mathematics, measurements involving probabilities, proportions, percentages, and ppm can all
be thought of as compositional data. This makes the microbiome read count data “compositional”
(Gloor et al., 2017). In general, compositional data represent parts of a whole which only
carry relative information (http://www.compositionaldata.com/).
The compositional nature of microbiome data becomes a problem when comparing two groups of samples to
identify “differentially abundant” species. Even if a species has the same absolute abundance in two
conditions, its relative abundances in the two conditions (e.g., percent abundance) can differ
if the relative abundances of other species change greatly. This problem can lead to incorrect conclusions
about the differential abundance of microbial species in the samples.
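A small numeric example makes this bias visible. The species names and absolute counts below are hypothetical: species_A and species_C are unchanged between the two conditions, and only species_B increases.

```python
import math

def relative_abundance(counts):
    """Convert absolute counts into percent abundance."""
    total = sum(counts.values())
    return {taxon: 100 * n / total for taxon, n in counts.items()}

# Hypothetical absolute abundances; only species_B changes.
condition_1 = {"species_A": 100, "species_B": 100, "species_C": 100}
condition_2 = {"species_A": 100, "species_B": 700, "species_C": 100}

rel_1 = relative_abundance(condition_1)
rel_2 = relative_abundance(condition_2)
print(round(rel_1["species_A"], 2), round(rel_2["species_A"], 2))

# The log ratio between the two unchanged species is the same in both conditions.
log_ratio_1 = math.log(rel_1["species_A"] / rel_1["species_C"])
log_ratio_2 = math.log(rel_2["species_A"] / rel_2["species_C"])
print(log_ratio_1, log_ratio_2)
```

The percent abundance of species_A falls from about a third to about a ninth even though its absolute abundance is unchanged, whereas its ratio to the equally unchanged species_C stays the same.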
When studying differential abundance (DA), the currently preferred approach is to transform the read count
data into log-ratio data. The ratios are calculated between the read count of each species in a sample and
a “reference” count (e.g., the mean read count of the sample). The log-ratio data allow the detection of DA
species without being affected by the percentage bias mentioned above.
In this report, the compositional DA analysis tool “ANCOM” (analysis of composition of microbiomes)
was used. ANCOM transforms the count data into log-ratios and is thus more suitable for comparing
the composition of microbiomes in two or more populations. ANCOM generates a table of features with
W-statistics and whether the null hypothesis is rejected. The “W” is the W-statistic, i.e., the number of
other features against which a given feature tests as significantly different. Hence, the higher the "W",
the stronger the statistical evidence that a feature/species is differentially abundant.
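How the W-statistic is counted can be illustrated with a deliberately simplified sketch. It assumes that the pairwise tests on the log-ratios have already been run and that their p-values are available in a matrix; the function name `ancom_w` and the p-values are hypothetical. The published method additionally adjusts the p-values for multiple testing and derives a cutoff on W, both of which are omitted here.

```python
def ancom_w(pvalues, alpha=0.05):
    """Count, for each feature i, the other features j whose log-ratio
    test (feature i versus feature j) is significant at level alpha."""
    n = len(pvalues)
    return [sum(1 for j in range(n) if j != i and pvalues[i][j] < alpha)
            for i in range(n)]

# Hypothetical p-values for four features: pvalues[i][j] is the p-value of the
# between-group test on log(feature_i / feature_j). Feature 0 differs from
# every other feature; the remaining features differ only from feature 0.
pvalues = [
    [1.0,   0.001, 0.001, 0.001],
    [0.001, 1.0,   0.6,   0.6],
    [0.001, 0.6,   1.0,   0.6],
    [0.001, 0.6,   0.6,   1.0],
]
w = ancom_w(pvalues)
print(w)
```

Feature 0 obtains the highest W because its log-ratio to every other feature differs between the groups, which is the pattern expected for a differentially abundant feature.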
References:
Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol.
2017 Nov 15;8:2224. doi: 10.3389/fmicb.2017.02224. PMID: 29187837; PMCID: PMC5695134.
Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of
microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis.
2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. PMID: 26028277; PMCID: PMC4450248.
Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction.
Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7.
PMID: 32665548; PMCID: PMC7360769.
Starting with version V1.2, we include the results of ANCOM-BC (Analysis of Compositions of
Microbiomes with Bias Correction) (Lin and Peddada 2020). ANCOM-BC is an updated version of "ANCOM" that:
(a) provides a statistically valid test with appropriate p-values,
(b) provides confidence intervals for differential abundance of each taxon,
(c) controls the False Discovery Rate (FDR),
(d) maintains adequate power, and
(e) is computationally simple to implement.
The bias correction (BC) addresses a challenging problem of the bias introduced by differences in
the sampling fractions across samples. This bias has been a major hurdle in performing DA analysis of microbiome data.
ANCOM-BC estimates the unknown sampling fractions and corrects the bias induced by their differences among samples.
The absolute abundance data are modeled using a linear regression framework.
Starting with version V1.43, ANCOM-BC2 is used instead of ANCOM-BC, so that multiple pairwise directional tests can be performed (if there are more than two groups in a comparison).
When performing pairwise directional tests, the mixed directional false discovery rate (mdFDR) is taken into account. The mdFDR
is the combination of the false discovery rates due to multiple testing, multiple pairwise comparisons, and directional tests within
each pairwise comparison. The mdFDR is adopted from Guo, Sarkar, and Peddada (2010) and Grandhi, Guo, and Peddada (2016). For a more detailed
explanation and additional features of ANCOM-BC2, please see the authors' documentation.
References:
Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction.
Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7.
PMID: 32665548; PMCID: PMC7360769.
Guo W, Sarkar SK, Peddada SD. Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics. 2010 Jun;66(2):485-92. doi: 10.1111/j.1541-0420.2009.01292.x. Epub 2009 Jul 23. PMID: 19645703; PMCID: PMC2895927.
Grandhi A, Guo W, Peddada SD. A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinformatics. 2016 Feb 25;17:104. doi: 10.1186/s12859-016-0937-5. PMID: 26917217; PMCID: PMC4768411.
LEfSe (Linear Discriminant Analysis Effect Size) is an alternative method to find "organisms, genes, or
pathways that consistently explain the differences between two or more microbial communities" (Segata et al., 2011).
Specifically, LEfSe uses the rank-based Kruskal-Wallis (KW) sum-rank test to detect features with significantly
different (relative) abundance with respect to the class of interest. Because the test is rank-based rather than proportion-based,
the differential species identified among the comparison groups are less biased (than those based on percent abundance).
Reference:
Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011 Jun 24;12(6):R60. doi: 10.1186/gb-2011-12-6-r60. PMID: 21702898; PMCID: PMC3218848.
To analyze the co-occurrence or co-exclusion of microbial species among different samples, network correlation
analysis tools are usually used. However, microbiome count data are compositional. If count data are normalized to the total number of counts in the
sample, the data are no longer independent, and traditional statistical metrics (e.g., correlation) for the detection
of species-species relationships can lead to spurious results. In addition, sequencing-based studies typically
measure hundreds of OTUs (species) on few samples; thus, inference of OTU-OTU association networks is severely
under-powered. Here we use SPIEC-EASI (SParse InversE Covariance Estimation
for Ecological Association Inference), a statistical method for the inference of microbial
ecological networks from amplicon sequencing datasets that addresses both of these issues (Kurtz et al., 2015).
SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model
inference framework that assumes the underlying ecological association network is sparse. SPIEC-EASI provides
two algorithms for network inference: 1) Meinshausen-Bühlmann's neighborhood selection (MB method) and 2) inverse covariance selection
(GLASSO method, i.e., graphical least absolute shrinkage and selection operator). Both are fundamentally distinct from SparCC, which essentially estimates pairwise correlations. In addition
to these two methods, we provide the results of a third method, SparCC (Sparse Correlations for Compositional Data) (Friedman & Alm 2012), which
is also a method for inferring correlations from compositional data. SparCC estimates the linear Pearson correlations between
the log-transformed components.
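The spurious correlation introduced by normalizing to the sample total can be demonstrated with an extreme two-taxon case. The absolute counts below are hypothetical and were chosen so that the two taxa are uncorrelated before normalization. The sketch does not implement SPIEC-EASI or SparCC; it only illustrates the problem those methods address.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical absolute abundances of two taxa across five samples.
absolute_a = [10, 20, 30, 40, 50]
absolute_b = [35, 15, 45, 25, 30]

# Normalize each sample to its total count (relative abundance).
relative_a = [a / (a + b) for a, b in zip(absolute_a, absolute_b)]
relative_b = [b / (a + b) for a, b in zip(absolute_a, absolute_b)]

r_absolute = pearson(absolute_a, absolute_b)
r_relative = pearson(relative_a, relative_b)
print(round(r_absolute, 3), round(r_relative, 3))
```

With only two taxa the proportions must sum to one, so they are perfectly negatively correlated regardless of the underlying biology. With many taxa the effect is weaker, but it still distorts correlations computed directly on relative abundances.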
References:
Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol. 2015 May 7;11(5):e1004226. doi: 10.1371/journal.pcbi.1004226. PMID: 25950956; PMCID: PMC4423992.
The results of this analysis are for research purposes only. They are not intended to diagnose, treat, cure, or prevent any disease. Forsyth and FOMC
are not responsible for the use of information provided in this report outside the research area.