ASAP pipeline

Description of Method

The amplicon sequence data were analyzed with an in-house pipeline, the Amplicon Sequencing Analysis Pipeline (ASAP, version 1.4). The MiSeq reads (2 × 251 bp) were first quality-checked with FastQC (version 0.11.5). Paired-end reads were then merged on their 3' overlap with PEAR (version 0.9.10) [1], using a quality score cutoff of 20, a minimum assembled length of 200 bp, a maximum assembled length of 400 bp and a minimum overlap of 50 bp. Reads were assigned to samples by barcode (demultiplexed) with the QIIME (version 1.9.1) [2] script split_libraries_fastq.py, allowing no barcode errors (maximum barcode error of 0) and using a quality-trimming cutoff of 20. Forward and reverse primer sequences were trimmed, and the demultiplexed sequences from the multiple sequencing rounds (two in this study) were pooled.

Dereplication was performed with the fastx_uniques command of USEARCH (version 9.2.64) [3], using the -sizeout option to record the abundance of each unique sequence. Operational Taxonomic Units (OTUs) were clustered with UPARSE (the -cluster_otus command of USEARCH) [4] at an identity threshold of 97%; singletons and chimeric sequences were removed during this step. The OTU table was generated with the -usearch_global command of USEARCH. Representative OTU sequences were classified with the RDP Classifier [5] (16S: training set 16, June 2016; ITS: training set fungalits_warcup, July 2016) at a confidence cutoff of 0.8. For 16S data, OTUs assigned to Chloroplast at the order level were removed; this step does not apply to ITS data.

The representative OTU sequences were also used to construct a phylogenetic tree. Sequences were aligned with MAFFT (version 3.8.31) [6], and the alignment was filtered with Gblocks (version 0.91b) [7] using the options -t=d (DNA sequences), -b4=3 (minimum block length of 3) and -b5=h (gap positions allowed in up to half of the sequences). FastTree [8] was then used to build the phylogenetic tree from the filtered alignment.
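
The read-merging step can be illustrated with a deliberately simplified overlap merger. This is a sketch, not PEAR's algorithm: it ignores quality scores and replaces PEAR's statistical test with a hypothetical mismatch budget (`max_mismatch_frac`). Only the minimum overlap (50 bp) and the assembled-length bounds (200 to 400 bp) come from the parameters above.

```python
from typing import Optional

# Complement table for unambiguous bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 50,
               min_len: int = 200, max_len: int = 400,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge a read pair on its 3' overlap, or return None if the pair fails.

    Simplified illustration, NOT PEAR: quality scores are ignored and
    max_mismatch_frac is a hypothetical stand-in for PEAR's statistical test.
    """
    rc = reverse_complement(rev)  # reverse read, now in the forward orientation
    # Scan overlap lengths from longest to shortest; the 3' end of fwd
    # must line up with the 5' end of the reverse-complemented read.
    for ov in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        tail, head = fwd[len(fwd) - ov:], rc[:ov]
        mismatches = sum(x != y for x, y in zip(tail, head))
        if mismatches <= max_mismatch_frac * ov:
            # Keep the fwd bases in the overlap (PEAR would weigh base qualities).
            merged = fwd + rc[ov:]
            return merged if min_len <= len(merged) <= max_len else None
    return None  # no overlap of at least min_overlap bases was acceptable
```

With 2 × 251 bp reads, a 300 bp amplicon leaves a 202 bp overlap and merges back to the full 300 bp, whereas a 460 bp amplicon overlaps by only 42 bp and is rejected.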
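
Demultiplexing with a maximum barcode error of 0 amounts to an exact lookup of each barcode in the sample map, and primer trimming removes the forward primer from the 5' end and the reverse-complemented reverse primer from the 3' end. The sketch below combines the two steps under stated assumptions: barcodes arrive as separate index reads, reads lacking either primer are discarded, and the barcode map and primer sequences are hypothetical. It does not reproduce split_libraries_fastq.py, which also performs quality trimming.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# IUPAC nucleotide codes, so degenerate primer positions match any allowed base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
# Complement table that also covers degenerate codes.
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def matches_primer(segment: str, primer: str) -> bool:
    """True if segment has the primer's length and each base is allowed by the primer code."""
    return len(segment) == len(primer) and all(
        base in IUPAC[code] for base, code in zip(segment, primer))


def trim_primers(seq: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Strip the forward primer (5' end) and the reverse-complemented reverse primer (3' end)."""
    rc_rev = rev_primer.translate(IUPAC_COMPLEMENT)[::-1]
    if not matches_primer(seq[:len(fwd_primer)], fwd_primer):
        return None
    if not matches_primer(seq[len(seq) - len(rc_rev):], rc_rev):
        return None
    return seq[len(fwd_primer):len(seq) - len(rc_rev)]


def demultiplex(records: Iterable[Tuple[str, str]], barcode_to_sample: Dict[str, str],
                fwd_primer: str, rev_primer: str) -> Dict[str, List[str]]:
    """Assign (barcode, read) records to samples by exact barcode match, then trim primers."""
    by_sample: Dict[str, List[str]] = {}
    for barcode, seq in records:
        sample = barcode_to_sample.get(barcode)  # exact match: zero barcode errors allowed
        if sample is None:
            continue  # unknown barcode: read discarded
        trimmed = trim_primers(seq, fwd_primer, rev_primer)
        if trimmed is not None:
            by_sample.setdefault(sample, []).append(trimmed)
    return by_sample
```

A barcode that differs from every known barcode by even one base is dropped, which is what a maximum barcode error of 0 implies.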
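
Dereplication collapses identical sequences and records how many times each occurred. The sketch below assumes USEARCH-style `;size=N;` abundance annotations with a hypothetical `Uniq` label prefix, and sorts the unique sequences by decreasing abundance.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dereplicate(seqs: Iterable[str], min_size: int = 1) -> List[Tuple[str, str]]:
    """Collapse identical sequences into (label, sequence) pairs with abundance annotations.

    Labels follow a 'Uniq<N>;size=<count>;' pattern, assumed here for illustration.
    """
    counts = Counter(seqs)
    # Most abundant first; ties broken by sequence so the output order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"Uniq{i};size={n};", seq)
            for i, (seq, n) in enumerate(ordered, start=1) if n >= min_size]
```

For example, the reads AAA, CCC, AAA, GGG, AAA, CCC collapse to three unique sequences with sizes 3, 2 and 1; setting `min_size=2` would drop the singleton GGG.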
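
The clustering and OTU-table steps can be sketched as greedy, abundance-ordered centroid clustering followed by mapping every read back to its most similar centroid. This approximates the idea behind UPARSE and -usearch_global and is not their implementation: identity is approximated as 1 - edit distance / longer sequence length (an assumption), and UPARSE's chimera filtering is omitted.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate pairwise identity (assumption): 1 - edit distance / longer length."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


def cluster_otus(uniques: List[Tuple[str, int]], threshold: float = 0.97,
                 min_size: int = 2) -> List[str]:
    """Greedy centroid clustering in decreasing abundance order; returns OTU centroids."""
    centroids: List[str] = []
    for seq, size in sorted(uniques, key=lambda item: -item[1]):
        if size < min_size:
            continue  # singletons are discarded before clustering
        if all(identity(seq, c) < threshold for c in centroids):
            centroids.append(seq)  # below 97% identity to every centroid: new OTU
    return centroids


def build_otu_table(reads_by_sample: Dict[str, List[str]], centroids: List[str],
                    threshold: float = 0.97) -> Dict[str, Dict[str, int]]:
    """Map each read to its most similar centroid (if >= threshold) and count per sample."""
    table: Dict[str, Dict[str, int]] = {f"OTU{i}": {} for i in range(1, len(centroids) + 1)}
    for sample, reads in reads_by_sample.items():
        for read in reads:
            if not centroids:
                break
            scores = [identity(read, c) for c in centroids]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] >= threshold:
                otu = f"OTU{best + 1}"
                table[otu][sample] = table[otu].get(sample, 0) + 1
    return table
```

Processing sequences in decreasing abundance order lets abundant sequences, which are more likely to be error-free, become OTU centroids, while rarer variants within 3% of a centroid are absorbed into it.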
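
The confidence cutoff and the chloroplast filter can be expressed as a short post-processing step on the classifier output. The lineage format `(rank, taxon, confidence)` and the function names are hypothetical, and the sketch assumes that a rank whose bootstrap confidence falls below 0.8 is truncated together with all deeper ranks, and that the order-level chloroplast check is applied to the truncated lineage.

```python
from typing import Dict, List, Tuple

# Hypothetical lineage format: (rank, taxon name, bootstrap confidence), root to tip.
Lineage = List[Tuple[str, str, float]]


def apply_confidence_cutoff(lineage: Lineage, cutoff: float = 0.8) -> Lineage:
    """Keep ranks from the root down until the first rank below the cutoff."""
    kept: Lineage = []
    for rank, name, conf in lineage:
        if conf < cutoff:
            break  # this rank and all deeper ranks are left unclassified
        kept.append((rank, name, conf))
    return kept


def is_chloroplast(lineage: Lineage) -> bool:
    """True if the lineage is assigned to Chloroplast at the order level."""
    return any(rank == "order" and name == "Chloroplast" for rank, name, _ in lineage)


def filter_otus(assignments: Dict[str, Lineage], cutoff: float = 0.8,
                remove_chloroplast: bool = True) -> Dict[str, Lineage]:
    """Apply the confidence cutoff and, for 16S data, drop chloroplast OTUs."""
    result: Dict[str, Lineage] = {}
    for otu, lineage in assignments.items():
        kept = apply_confidence_cutoff(lineage, cutoff)
        if remove_chloroplast and is_chloroplast(kept):
            continue
        result[otu] = kept
    return result
```

For ITS data the chloroplast filter does not apply, which corresponds to calling `filter_otus` with `remove_chloroplast=False`.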
The phylogenetic tree and the OTU table were used to calculate alpha diversity (phylogeny-based indices) and beta diversity (UniFrac distances) with QIIME scripts.
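
As an example of a phylogeny-based alpha-diversity index, Faith's phylogenetic diversity (PD) is the total length of the branches connecting a sample's observed OTUs to the root of the tree. The text above does not name the indices that were computed, so this choice is illustrative; the parent-map tree representation is also an assumption, and QIIME uses its own tree code for this calculation.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical tree representation: child node -> (parent node, length of the branch above it).
# The root is the only node that does not appear as a key.
Tree = Dict[str, Tuple[str, float]]


def faith_pd(tree: Tree, observed: Iterable[str]) -> float:
    """Faith's PD: summed length of all branches on paths from observed tips to the root."""
    counted = set()
    total = 0.0
    for tip in observed:
        node = tip
        # Walk towards the root, stopping early at a branch that is already counted.
        while node in tree and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total
```

For the tree ((A:1,B:2):0.5,C:3), a sample containing OTUs A and B has PD = 1 + 2 + 0.5 = 3.5, and a sample containing all three OTUs has PD = 6.5.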
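
Unweighted UniFrac measures the fraction of the branch length covered by two samples that leads to OTUs found in only one of them. The text above does not say whether weighted or unweighted UniFrac was used; the sketch below shows the unweighted form on the same hypothetical parent-map tree representation and is not QIIME's implementation.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: child -> (parent, branch length); the root is not a key.
Tree = Dict[str, Tuple[str, float]]


def covered_branches(tree: Tree, tips: Iterable[str]) -> Set[str]:
    """Nodes whose parent branch leads to at least one of the given tips."""
    covered: Set[str] = set()
    for tip in tips:
        node = tip
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(tree: Tree, sample_a: Iterable[str], sample_b: Iterable[str]) -> float:
    """Unique branch length divided by total branch length covered by either sample."""
    a = covered_branches(tree, sample_a)
    b = covered_branches(tree, sample_b)
    total = sum(tree[n][1] for n in a | b)
    unique = sum(tree[n][1] for n in a ^ b)  # branches leading to only one sample
    return unique / total if total > 0 else 0.0
```

On the tree ((A:1,B:2):0.5,C:3), samples {A} and {B} share only the 0.5 branch, giving a distance of 3 / 3.5 ≈ 0.857; identical samples give 0 and samples with no shared branches give 1.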

1. Zhang J, Kobert K, Flouri T, Stamatakis A: PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 2014, 30(5):614-620.
2. Kuczynski J, Stombaugh J, Walters WA, Gonzalez A, Caporaso JG, Knight R: Using QIIME to analyze 16S rRNA gene sequences from microbial communities. Curr Protoc Bioinformatics 2011, Chapter 10:Unit 10.7.
3. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26(19):2460-2461.
4. Edgar RC: UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods 2013, 10(10):996-998.
5. Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 2007, 73(16):5261-5267.
6. Katoh K, Misawa K, Kuma K, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002, 30(14):3059-3066.
7. Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 2000, 17(4):540-552.
8. Price MN, Dehal PS, Arkin AP: FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix. Mol Biol Evol 2009, 26(7):1641-1650.