An essential step in the analysis of metagenome sequences is the taxonomic classification of sequence fragments. We describe a web server for the taxonomic assignment of metagenome sequences with PhyloPythiaS. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades.

Taxonomic assignments with the web server can be made with a generic model, or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments.

Here, we demonstrate usage of our web server by taxonomic assignment of metagenome samples from an acidophilic biofilm community from an acid mine and of a microbial community from cow rumen.

Introduction

A metagenome sequence sample is obtained by sequencing the DNA of a mixture of microorganisms from an environment of interest [1].

Identification of the taxonomic affiliation of DNA sequences, either for individual reads or assembled contigs, is an essential step prior to further analysis, such as characterization of the functional and metabolic capabilities of the sequenced microbial community [2].

Various taxonomic assignment methods exist, which can be divided into three categories. Sequence composition-based methods use short substrings (k-mers) to represent a sequence as a vector of fixed length, which is used to assess similarity among sequences.
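For the composition-based category, the fixed-length vector can be sketched as a table of relative k-mer frequencies. The sketch below is illustrative only: the function names `kmer_vector` and `euclidean`, the default k = 4, and the choice of Euclidean distance are assumptions, not details of PhyloPythiaS.

```python
from itertools import product


def kmer_vector(seq: str, k: int = 4) -> list:
    """Return the relative k-mer frequency vector of a DNA sequence.

    The vector always has 4**k entries, one per possible k-mer in
    lexicographic order (AA..A, AA..C, ...), independent of sequence length.
    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    # Slide a window of width k over the sequence and count each k-mer.
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    # Normalize to relative frequencies so fragments of different lengths are comparable.
    return [c / total for c in counts] if total else [0.0] * len(counts)


def euclidean(u: list, v: list) -> float:
    """Euclidean distance between two composition vectors (one possible similarity measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

A 1 kb fragment and a 100 kb contig both map to 256 entries for k = 4, which is what makes them directly comparable. This sketch counts only the given strand; a real classifier may, for example, also account for reverse complements.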

Sequence alignment and phylogeny-based methods use sequence similarity as a measure of evolutionary relatedness between sequences. This approach is computationally more expensive than composition-based methods and thus requires more hardware resources for the analysis of large datasets.

Hybrid methods combine information from both sequence composition and alignment to assess similarity between sequences. From another perspective, taxonomic assignment methods can be categorized as either unsupervised or supervised methods.

Unsupervised methods cluster the sequences based on a similarity measure and then assign a taxonomic affiliation to the clusters. Supervised methods, on the other hand, infer a taxonomic model from sequences of known taxonomic origin, which is then used for the taxonomic assignment of novel metagenome sequences.

Provided that sufficient reference data for modeling are available, supervised methods are likely to be more accurate in taxonomic assignment than clustering techniques, as the effect of non-taxonomic signals, such as guanine and cytosine strand biases, on taxonomic assignment is minimized during model induction.
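The supervised workflow (induce a model from sequences of known origin, then assign novel sequences) can be illustrated with a deliberately simple stand-in: a nearest-centroid classifier over k-mer profiles. This is a minimal sketch, not the PhyloPythiaS model; the k-mer length `K = 3`, the function names, and the use of `math.dist` (Python 3.8+) as the distance are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

K = 3  # illustrative k-mer length (an assumption, not a PhyloPythiaS setting)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def profile(seq):
    """Relative frequencies of all 4**K k-mers in seq (non-ACGT k-mers ignored)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[km] for km in KMERS) or 1
    return [counts[km] / total for km in KMERS]


def train(labeled):
    """Model induction: average the profiles of reference sequences per clade.

    labeled: iterable of (sequence, clade) pairs with known taxonomic origin.
    Returns a dict mapping each clade to its centroid profile.
    """
    by_clade = {}
    for seq, clade in labeled:
        by_clade.setdefault(clade, []).append(profile(seq))
    return {
        clade: [sum(col) / len(vecs) for col in zip(*vecs)]
        for clade, vecs in by_clade.items()
    }


def assign(model, seq):
    """Assign a novel sequence to the clade whose centroid is nearest."""
    v = profile(seq)
    return min(model, key=lambda clade: math.dist(v, model[clade]))
```

Usage, with hypothetical clade labels: `model = train([("AAAAAAAAAA", "A-rich"), ("GCGCGCGCGC", "GC-rich")])` followed by `assign(model, "AAAAAAAAAAAA")` picks the clade whose averaged composition is closest. An unsupervised method would instead cluster the unlabeled profiles first and label the clusters afterwards.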

Recently, we developed PhyloPythiaS, a successor to our previously published software PhyloPythia [8][9].

PhyloPythiaS exhibits high prediction accuracy and allows rapid analysis of datasets of several hundred megabases or even gigabases. PhyloPythiaS was benchmarked on simulated and real datasets and shows good predictive performance.

PhyloPythiaS has notably shorter execution times than MEGAN [4] and PhymmBL [5] on a 13 Mb assembled metagenome sample, as no similarity searches against large databases are performed. It also shows better predictive performance on both simulated and real metagenome samples, in particular when only a limited amount of reference sequence is available for particular species.

Although all methods perform less favorably on short fragments than on fragments of 1 kb or longer [2], similarity-based assignment with MEGAN has the lowest error rate for short fragments.

PhyloPythiaS is freely available for non-commercial users and can be installed on a Linux-based machine [8]. PhyloPythiaS can be used in two different modes: generic and sample-specific. The generic model is suitable for the analysis of a metagenome sample if no further information on the sample's taxonomic composition or relevant reference data is available.

Assignment accuracy can be improved by creating and using a sample-specific model, which includes clades for the abundant populations of the sample, inferred from appropriate reference sequences.

A sample-specific model is inferred from public sequence data combined with sequences with known taxonomic affiliation identified from the metagenome sample, along with a customized taxonomy.

When they better match the taxa in the metagenome sample, sample-specific models exhibit higher predictive accuracy, improved resolution at low-ranking clades, and higher coverage in terms of assigned sequences compared to the generic model.
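Coverage and resolution can be made concrete with a small summary over a set of assignments. The sketch below shows one plausible way to compute them, assuming a simplified fixed rank list and a dictionary of assignments in which unassigned sequences map to `None`; the names `RANKS` and `coverage_by_rank` are hypothetical, and the definitions used in the original evaluation may differ.

```python
# Taxonomic ranks from highest to lowest (a simplified, assumed rank list).
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def coverage_by_rank(assignments, rank_of):
    """Summarize how many sequences were assigned, and how deeply.

    assignments: dict mapping sequence id -> assigned clade, or None if unassigned.
    rank_of: dict mapping each assigned clade name to one of RANKS.
    Returns a dict with the fraction of sequences assigned at all ("any") and,
    for each rank, the fraction assigned at that rank or a lower one.
    """
    n = len(assignments)
    assigned = [clade for clade in assignments.values() if clade is not None]
    summary = {"any": len(assigned) / n}
    for depth, rank in enumerate(RANKS):
        at_or_below = set(RANKS[depth:])
        summary[rank] = sum(rank_of[c] in at_or_below for c in assigned) / n
    return summary
```

Comparing these summaries for the same sample under a generic and a sample-specific model would show both effects described above: a higher "any" fraction (coverage) and larger fractions at genus and species level (resolution).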

Here we present a web server that makes PhyloPythiaS available for taxonomic sequence assignment through a web interface. The underlying functionality of the software is as described previously.

Web servers furthermore allow a visual presentation of results for a quick overview and exploration of datasets. Our server is unique in that it provides the ability to construct and use sample-specific models, in addition to enabling assignment with generic models.

We illustrate taxonomic metagenome assignment with the generic and sample-specific modes of the web server by analyzing metagenome samples of an acidophilic biofilm community from acid mine drainage (AMD) [13] and of a cow rumen microbial community [14].

Results

We demonstrate the functionality of the web server based on a taxonomic assignment of two metagenome sequence samples. For performance analysis, we assessed the consistency and taxonomic distance of assignments, as defined in [8].

A prediction for a sequence fragment was considered consistent if the fragment was assigned either to the correct clade or to an ancestral (parental) clade of the correct taxonomic label.
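The consistency criterion amounts to an ancestor check in the taxonomy. The sketch below assumes the taxonomy is stored as a hypothetical `parent` dictionary mapping each clade to its parent clade (the root has no entry); the function names are illustrative and not part of PhyloPythiaS.

```python
def lineage(clade, parent):
    """Return clade followed by all of its ancestors up to the root.

    parent: dict mapping each clade to its parent clade; the root has no entry.
    """
    path = [clade]
    while clade in parent:
        clade = parent[clade]
        path.append(clade)
    return path


def is_consistent(predicted, true_label, parent):
    """Consistent = predicted clade is the true clade or one of its ancestors."""
    return predicted in lineage(true_label, parent)


def consistency_rate(predictions, parent):
    """Fraction of (predicted, true_label) pairs that are consistent."""
    hits = sum(is_consistent(p, t, parent) for p, t in predictions)
    return hits / len(predictions)
```

For example, in a simplified toy taxonomy where "Leptospirillum" has the parent "Nitrospirae", predicting "Nitrospirae" for a fragment labeled "Leptospirillum" is consistent (a correct but less specific assignment), whereas predicting a species for a fragment whose correct label is only the genus is not.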

