Despite the long history of Escherichia coli as a model system and pathogen, relatively few species of the genus Escherichia have been identified to date. Five species are currently recognized: E. coli, E. fergusonii, E. albertii, E. marmotae, and E. hermannii. Several additional lineages were previously classified as Escherichia, but subsequent phylogenetic analysis led to the reclassification of the species formerly known as E. adecarboxylata, E. blattae, and E. vulneris as Leclercia adecarboxylata, Shimwellia blattae, and Pseudescherichia vulneris, respectively.
Although most isolates of Escherichia are associated with the gastrointestinal tract of humans or domestic animals, environmental surveys and population studies of the genus have also been conducted. One such study, conducted in 2005 in the laboratory of Thomas Whittam, uncovered five novel clades of Escherichia. This work used extended multilocus sequence typing (MLST) of internal sequences from 22 housekeeping genes to produce a phylogenetic analysis of strains from humans, wild animals, birds, and environmental and water sources in Australia, Asia, and North America. Because no distinguishing biochemical differences were found, the five new clades were labeled “cryptic” Escherichia. In a 2015 review, Dr. Seth Walk provided additional support for these clades through a phylogenetic tree analysis based on the original 22 MLST loci. The review also combined a genomic analysis with a survey of the literature to identify potential unifying features of some of these clades, including virulence-associated genes in Clade I (the clade most closely related to E. coli) and adaptation to an environmental niche outside the gastrointestinal tract in Clades II–V.
Because MLST requires the selection of specific loci for phylogenetic comparison, it is inherently vulnerable to selection bias. By contrast, comparative techniques that use the entire genome are unlikely to suffer from the same bias. A phylogenomic comparison is therefore likely to provide more complete and less biased results than a more limited phylogenetic comparison such as MLST. Here, we present the results of such a phylogenomic analysis of a set of 89 Escherichia strains that includes the previously described “cryptic” strains.