Sequence type 1 group B
Streptococcus, an emerging
cause of invasive disease in adults, evolves by small
genetic changes
Anthony R. Floresa, Jessica Galloway-Peñab, Pranoti Sahasrabhojaneb, Miguel Saldañab, Hui Yaoc, Xiaoping Suc,
Nadim J. Ajamid, Michael E. Holderd, Joseph F. Petrosinod, Erika Thompsone, Immaculada Margarit Y Rosf, Roberto Rosinif, Guido Grandif, Nicola Horstmannb, Sarah Teaterog, Allison McGeerh,i, Nahuel Fittipaldig,i,
Rino Rappuolif,1, Carol J. Bakera,d, and Samuel A. Shelburneb,j,1
aDepartment of Pediatrics and Molecular Virology and Microbiology, MD Anderson Cancer Center, Houston, TX 77030;bDepartment of Infectious Diseases,
Infection Control and Employee Health, MD Anderson Cancer Center, Houston, TX 77030;cDepartment of Bioinformatics and Computational Biology,
MD Anderson Cancer Center, Houston, TX 77030;dThe Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and
Microbiology, Baylor College of Medicine, Houston, TX 77030;eDNA Sequencing Facility, MD Anderson Cancer Center, Houston, TX 77030;fNovartis
Vaccines, 53100 Siena, Italy;gPublic Health Ontario, Toronto, ON, Canada M5G 1M1;hDepartment of Microbiology, Mount Sinai Hospital, Toronto, ON,
Canada M5G 1X5;iDepartment of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada MG5 1M1; andjDepartment of
Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030
Contributed by Rino Rappuoli, April 8, 2015 (sent for review September 1, 2014)
The molecular mechanisms underlying pathogen emergence in humans is a critical but poorly understood area of microbiologic investigation. Serotype V group BStreptococcus (GBS) was first isolated from humans in 1975, and rates of invasive serotype V GBS disease significantly increased starting in the early 1990s. We found that 210 of 229 serotype V GBS strains (92%) isolated from the bloodstream of nonpregnant adults in the United States and Canada between 1992 and 2013 were multilocus sequence type (ST) 1. Elucidation of the complete genome of a 1992 ST-1 strain revealed that this strain had the highest homology with a GBS strain causing cow mastitis and that the 1992 ST-1 strain differed from serotype V strains isolated in the late 1970s by acquisition of cell surface proteins and antimicrobial resistance determinants. Whole-genome comparison of 202 invasive ST-1 strains detected significant recombination in only eight strains. The remaining 194 strains differed by an average of 97 SNPs. Phylogenetic analysis revealed a temporally dependent mode of genetic diversification consistent with the emergence in the 1990s of ST-1 GBS as major agents of human disease. Thirty-one loci were identified as being under positive selective pressure, and mutations at loci encoding polysaccharide capsule production proteins, regulators of pilus ex-pression, and two-component gene regulatory systems were shown to affect the bacterial phenotype. These data reveal that phenotypic diversity among ST-1 GBS is mainly driven by small genetic changes rather than extensive recombination, thereby extending knowledge into how pathogens adapt to humans.
Streptococcus agalactiae
|
pathogenesis|
evolution|
single nucleotide polymorphisms|
surface proteinT
he recent increase in large-scale DNA sequencing feasibility has allowed for significant advances in understanding the population genetics of bacteria that cause disease in humans (1, 2). A major outcome of the rapid expansion of available bacterial genomes has been the appreciation of the marked intraspecies genetic variability present in a wide variety of human bacterial pathogens (3, 4). This genetic variation can have profound im-pact on host–pathogen interaction by affecting transmissibility, infection severity, and antimicrobial resistance (2, 5, 6). The observed genetic intraspecies variability can arise via many dis-tinct mechanisms including large-scale events such as recombi-nation and bacteriophage-mediated horizontal gene transfer as well as small-scale genetic changes such as short insertions, de-letions, and/or single nucleotide changes (2, 3, 5).Group BStreptococcus (GBS) is a common colonizer of hu-mans that emerged in the 1970s as the leading cause of invasive
bacterial disease in neonates and infants less than 3 mo of age (7). GBS is divided into 10 serotypes based on the carbohydrate composition of its sialic acid containing capsule, but gene con-tent at the genomic level does not necessarily correlate with capsular serotype (8, 9). A seven-gene multilocus sequence typ-ing (MLST) allows for the classification of the majority of GBS strains isolated from humans into five major clonal complexes (CCs) with a recent study by Da Cunha et al. showing that the major GBS CCs are primarily derived from a limited number of tetracycline-resistant clones, suggesting a key role of tetracycline resistance in GBS strain emergence (10, 11). CC-17 GBS strains have been particularly well studied given their role as the major cause of severe, invasive infant disease (10, 12). In contrast, se-rotype V strains cause a larger percentage of invasive disease in nonpregnant adults compared with neonates (13–15). Impor-tantly, rates of invasive GBS disease have been increasing during
Significance
Serotype V group BStreptococcus (GBS) infection rates in hu-mans have steadily increased during the past several decades. We determined that 92% of bloodstream infections caused by serotype V GBS in Houston and Toronto are caused by genet-ically related strains called sequence type (ST) 1. Whole-genome analysis of 202 serotype V ST-1 strains revealed the molecular re-lationship among these strains and that they are closely related to a bovine strain. Moreover, we found that a subset of GBS genes is under selective evolutionary pressure, indicating that proteins produced by these genes likely contribute to GBS host–pathogen interaction. These data will assist in understanding how bacteria adapt to cause disease in humans, thereby potentially informing new preventive and therapeutic strategies.
Author contributions: A.R.F., J.G.-P., H.Y., I.M.Y.R., R. Rosini, G.G., N.H., A.M., N.F., R. Rappuoli, C.J.B., and S.A.S. designed research; A.R.F., J.G.-P., P.S., M.S., H.Y., X.S., N.J.A., M.E.H., J.F.P., E.T., I.M.Y.R., R. Rosini, S.T., N.F., and S.A.S. performed research; A.R.F., H.Y., X.S., N.J.A., M.E.H., J.F.P., E.T., I.M.Y.R., R. Rosini, G.G., and N.H. contributed new reagents/ analytic tools; A.R.F., J.G.-P., P.S., M.S., H.Y., X.S., M.E.H., I.M.Y.R., R. Rosini, G.G., A.M., N.F., R. Rappuoli, C.J.B., and S.A.S. analyzed data; and A.R.F., J.G.-P., H.Y., X.S., N.J.A., M.E.H., J.F.P., E.T., R. Rosini, N.H., S.T., A.M., N.F., R. Rappuoli, C.J.B., and S.A.S. wrote the paper. The authors declare no conflict of interest.
Data deposition: The sequences reported in this paper have been deposited in the Na-tional Center for Biotechnology BioProject database,www.ncbi.nlm.nih.gov/bioproject
(BioProjectPRJNA274384). For a list of accession numbers, seeTable S1.
1To whom correspondence may be addressed. Email: rino.rappuoli@novartis.com or
sshelburne@mdanderson.org.
This article contains supporting information online atwww.pnas.org/lookup/suppl/doi:10. 1073/pnas.1504725112/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1504725112 PNAS | May 19, 2015 | vol. 112 | no. 20 | 6431–6436
MIC
the past 25 y in nonpregnant adults, with a significant part of the rise resulting from serotype V GBS strains (15–17).
Despite the clear and increasing impact of serotype V strains, data are limited regarding molecular epidemiology of serotype V GBS causing invasive disease in nonpregnant adults (14, 15, 18). Only a few studies of serotype V strains have investigated the noncapsular genetic makeup of the strains, and those that have done so have included colonizing and invasive GBS strains iso-lated from infants or have not described the clinical origin of the tested strains (18, 19). Thus, we sought to analyze a large cohort of clinically well-defined, geographically distinct, and temporally disparate GBS isolates by using a whole-genome approach to elucidate the population structure of serotype V GBS causing invasive disease in nonpregnant adults. Data using non–genome-wide level approaches found that many serotype V strains were closely related, suggesting that a particular clone, rather than a genetically diverse array of strains arising from large-scale re-combination, might be responsible for the majority of serotype V disease (18). Thus, we specifically sought to test the hypothesis that genetic diversity among invasive serotype V GBS strains is driven by small genetic changes at loci that are critical to GBS host–pathogen interaction.
Results
The Vast Majority of GBS Serotype V Bloodstream Isolates from Nonpregnant Adults Are Multi-Locus Sequence Type 1. We de-termined the MLST of 229 serotype V GBSs isolated from the blood of unique nonpregnant adults in Houston, TX, and Tor-onto, ON, Canada (SI Appendix, Table S1). A total of 200 iso-lates were sequence type (ST) 1 and an additional 10 isoiso-lates were single-loci variants that we will consider ST-1 for the pur-poses of this manuscript (Materials and Methods provides further details on these 10 strains), for a total of 210 of the 229 strains (92%). The non–ST-1 serotype V strains were a mixture of other STs, with the most common being ST-19. Thus, the over-whelming majority of invasive serotype V GBS strains causing bacteremia in nonpregnant adults in Houston and Toronto be-long to a single ST.
Determination of a Complete Genome Sequence of an ST-1 GBS Strain Causing Invasive Human Infection.As a complete genome of an ST-1 GBS strain isolated from a human had not been de-termined previously, we next used a combination of long reads by using a PacBio instrument and paired-end Illumina short-read data to completely assemble the genome of the ST-1 strain SGBS001 (20). The genome was 2,092,071 bp with 2,061 pre-dicted ORFs and contains genes prepre-dicted to encode alpha-like protein (Alp) 3 (Alp3) and pilus 1 and 2a (Fig. 1A) (21, 22). Compared with GBS strains whose complete genomes are available from the National Center for Biotechnology Infor-mation (NCBI), strain SGBS001 was most similar to the se-rotype V Swedish cow mastitis strain 09mas018883 (also ST-1; Fig. 1B) (23). Strain SGBS001 is >99% similar and has >99% coverage compared with strain 09mas018883, with the major difference being the presence of a ∼40 kb region uniquely in strain 09mas018883 (located between homologs of rdf_1233 andrdf_1234), which includes genes encoding proteins involved in lactose utilization (thelac.2 operon; Fig. 1B). The lac.2 operon has previously been shown to be preferentially found in GBS isolated from cattle (24). Given that the presence of the lac.2 operon in bovine GBS strains has been suggested to occur via lateral gene transfer (25), our findings indicate that a single genetic event could account for the majority of differences ob-served between these two strains.
Description of a Novel Alp Present in SGBS001.To begin to study why ST-1 GBS strains have a specific predilection for causing disease in nonpregnant adult humans, we searched the SGBS001 genome for cell-surface or actively secreted proteins that were unique to ST-1 GBS. We found a single gene,rdf_0594, present on the mobile genetic element (MGE) RDF.2 (Fig. 1A andSI
Appendix, Fig. S1), which encodes a cell-surface protein that was absent in non–ST-1 GBS as determined by BLAST analysis. RDF_0594 is predicted to be 1,769 aa long and to contain an N-terminal secretion signal sequence and a C-terminal LPXTG cell-wall localization motif consistent with cell-surface localiza-tion (Fig. 2A). The C-terminal porlocaliza-tion of RDF_0594 contains six repeats of 79 aa that are 77% similar to the repeats found in the alpha-like Rib protein (21). Alpha and Alps are major compo-nents of the GBS cell surface, although their precise role in GBS pathogenesis is unknown (26). Although the N-terminal signal sequence and C terminus of RDF_0594 place it in the Alp family, the N-terminal 1,187 aa of RDF_0594 have no readily identifiable homology to other GBS Alps. By homology model-ing usmodel-ing I-TASSER and GenTHREADER (27, 28), these 1,187 aa were predicted to consist of a repeat β-sheet structure typi-cally found in proteins involved in cell adhesion or binding to host proteins, such as fibrinogen or complement factor binding proteins (29). Interestingly, strains SS-1168 and SS-1172, which were among the first serotype V GBS strains isolated from hu-mans (in 1977 and 1978, respectively) and were confirmed to be ST-2, which differs by only one SNP in theatr allele from ST-1, lacked the RDF.2 MGE containingrdf_0594 and encoded Alp1 rather than Alp3 (Fig. 2B). Given that RDF_0594 is quite dis-tinct from other Alps (Fig. 2B) and has not previously been described outside of ST-1 GBS strains, we have named it AlpST-1 and suggest that further study of the role of this protein in GBS pathogenesis is warranted.
Large-Scale Recombination Is Not the Major Factor Driving Genetic Diversity Among ST-1 GBS.We next sought to determine the genetic relationship of the invasive ST-1 strains at the whole-genome level. Polymorphisms in the genome sequences of 201 ST-1 strains were identified relative to strain SGBS001 by using two independent pipelines [variant ascertainment algorithm (VAAL) Fig. 1. Whole-genome analysis of an ST-1 serotype V GBS strain isolated from the bloodstream of a nonpregnant adult. (A) Genome atlas for the reference serotype V ST-1 strain SGBS001. Genome scale in megabases is given in the innermost circle (circle 1). Circle 2 shows GC skew, calculated as (G− C)/(G + C) and averaged over a moving window of 10,000 bp showing excess G (green) and C (purple). GC content is displayed in circle 3 with values above average (outward directed) or below average (inward directed) in-dicated. Annotated coding sequences are shown in dark blue (forward, clockwise) and light blue (reverse, counterclockwise) in circle 4. Reference genome landmarks are displayed in circle 5 and include ribosomal operons (green), MGEs (red, RDF.1–6), pilus islands [blue, pilus island 1 (PI-1) and 2a (PI-2a)], capsule biosynthesis operon (blue, neuAcpsA), and Alp-encoding genes. (B) BLASTN comparisons of the reference genome SGBS001 and publically available completed GBS genomes as labeled in the legend. The innermost ring shows the genome scale (in megabases) and subsequent rings (inner- to outermost) show BLASTN comparisons in order of decreasing ho-mology (ring, serotype, NCBI accession no.) for 09mas018883 (1, V, NC_ 021485), A909 (2, 1a, NC_007432), GD201008-001 (3, 1a, NC_018646), 2603V/R (4, V, NC_004116), NEM316 (5, III, NC_004368), ILRI112 (6, VI, NC_021507), ILRI005 (7, V, NC_021486), 2–22 (8, Ia, NC_021195), SA20-06 (9, Ia, NC_019048), and 138P (10, Ib, CP007482). Reference genome landmarks for SGBS001 are shown as in A.
(30) and CLC Genomics Workbench version 7; Materials and Methods], and 9,876 unique SNP loci were used to determine phylogenetic relationships (Fig. 3A). After accounting for differ-ences in MGE content, eight strains showed greater phylogenetic divergence compared with the main phylogenetic cluster of ST-1 strains (Fig. 3B). By using Bayesian analysis of recombination (BRATNextGen) (31), we identified discrete regions of recom-bination with non–ST-1, non-serotype V GBS in these outlying strains (SI Appendix, Fig. S2 and Table S2). For example, strain SGBS064 contains a 41-kb region that most closely resembles DNA from the ST-23, serotype III strain NEM316 (labeled in orange as SGBS064.1 in Fig. 3C). Importantly, however, evidence of such recombination was quite rare among the ST-1 strains in this cohort, indicating that recombination is not a major force driving genetic diversity among serotype V ST-1 GBS. Given that we an-alyzed only serotype V strains, we would not have detected re-combination involving the capsular polysaccharide synthesis (cps) locus, but a recent study of 229 GBS isolates identified only one ST-1 strain that was not capsular type V, suggesting thatcps re-combination is not common among ST-1 strains (10). The eight ST-1 strains identified as having significant recombination were excluded from the remainder of our analysis, leaving a total of 194 sequenced ST-1 strains.
Genetic Relationship Among ST-1 GBS Strains Lacking Evidence of Recombination.By using the method of Harris et al. (1), we de-termined that the core genome of the 194 ST-1 strains without evidence of recombination was 1,931 genes, and included the gene encoding the AlpST-1 protein. Moreover, the majority of gene variance compared with SGBS001 was observed in only one or two strains (SI Appendix, Table S3). A total of 2,001 genes, or 97% of the SGBS001 genome, were present in at least 192 of the 194 strains, demonstrating that there were minimal differences in terms of gene presence or absence among the ST-1 strains.
The elucidation of the ST-1 core genome and the finding that only a few strains had significant recombination provided the opportunity to determine the genetic relationship of ST-1 strains.
Phylogenetic analysis for the 194 strains was performed by using 5,880 unique SNP loci (Fig. 3D). The average number of genetic polymorphisms separating any two ST-1 strains was 97, which is consistent with the relatively small number of SNPs separating strains of serotype M1 or M3 group AStreptococcus (4, 5). When temporal and geographic factors were considered, we observed a radial pattern of divergence that appeared temporally but not geographically dependent (Fig. 3D shows a radial phylogenetic tree whereas SI Appendix, Fig. S3A, shows a phylogram of the same dataset). For example, when strains were divided into early (1992–2000; Fig. 3D, red), middle (2001–2009; Fig. 3B, green), and late time groupings (2010–2012; Fig. 3D, blue), strains from the latter time group tended to be located on the periphery of the phylogenetic tree whereas strains form the early period were primarily located closest to center. Although one possible ex-planation for this appearance is that the Canadian strains were exclusively isolated in the later time period, this relationship was replicated when only Houston strains were analyzed (SI Appen-dix, Fig. S3B). Moreover, strains that were isolated from Toronto were interspersed among strains from Houston (Fig. 3D) despite the fact that two cities are separated by∼2,100 km. To statisti-cally test our visual observation of temporal divergence, we calculated the number of SNPs that separated strains within each group and found a statistically significant, progressive increase (i.e., late> middle > early) in the average number of intragroup Fig. 2. Characteristics of novel alpha-like protein from ST-1 GBS. (A)
Com-parison of structure of Alps from strain 2603V/R and SGBS001. Schematic of “S” (signal), “N” (N terminus), “A” (A repeat), “U” (unknown), “R” (repeat), and“C” (C terminus) regions are from Lachenauer et al. (21). Numbers refer to percent amino acid identify. (B) Phylogenetic tree of alpha and Alps was created by using MEGA following alignment via ClustalW. In bold type are alpha and alpha-like type sequences derived from Lachenauer et al. (21) with strain source indicated in parentheses. In regular type are alpha and Alps from the fully sequenced strains 2603V/R (serotype V), NEM316 (serotype III), and SGBS001 (serotype V) and the early serotype V isolates 1168 and SS-1172. Numbers refer to confidence of branching as determined by bootstrap analysis (1,000 replicates). Bracket refers to genetic distance.
Fig. 3. Whole genome-level phylogenetic analysis of invasive ST-1 GBS strains. (A) A rooted neighbor joining tree representing 10,365 unique SNP loci from 201 serotype V ST-1 GBSs relative to the reference genome SGBS001. The outgroup ST-2 strain SS1168 is labeled for reference. (B) Rooted neighbor joining tree after exclusion of MGEs (RDF.1–6) repre-senting 8,459 unique SNP loci from 201 GBS isolates as in A. (C) Genome atlas showing SNP distribution of strains NGBS519 (black, innermost, ring 1), SGBS064 (orange, ring 2), SGBS141 (green, ring 3), SGBS060 (red, ring 4), and SGBS042 (blue, ring 5). Ring 6 shows selected reference genome landmarks including pilus islands [pilus island 1 (PI-1) and 2a (PI-2a)] and capsule bio-synthesis genes (neuA-cpsA). Also displayed are regions of putative re-combination as identified by BratNextGen for each strain. (D) Rooted neighbor joining tree representing 6,375 unique SNP loci from 194 serotype V ST-1 GBSs after excluding strains with potential recombination. For A, B, and D, the year and location of strain isolation is as shown in the legend.
Flores et al. PNAS | May 19, 2015 | vol. 112 | no. 20 | 6433
MIC
SNPs (P < 0.05, Mann–Whitney U test;SI Appendix, Fig. S3C). Given the finding of temporal divergence, we used Path-O-Gen to estimate the origin of serotype V ST-1 strains and arrived at a date of 1963, which is relatively close to the first reported iso-lation of serotype V from humans in 1975 (18) (SI Appendix, Fig. S3D). We additionally analyzed 24 serotype V ST-1 strains from three different continents and for which whole-genome data were recently reported (10) and found that the strains were in-terspersed among our North American isolates, suggesting the global dissemination of the ST-1 clone (SI Appendix, Fig. S4).
High Rates of Antimicrobial Resistance Genes Present in ST-1 Strains.
Consistent with a recent report regarding the key role of tetra-cycline resistance in GBS evolution (10), tetratetra-cycline resistance elements were found ubiquitously throughout our collection (91%), with the most common determinant beingtetM (SI Ap-pendix, Fig. S5A). When present,tetM was invariably located in the transposable element Tn916 inserted in RDF.2 (SI Appendix, Fig. S1) at the location reported for the Tn916–1 lineage by Da Cunha et al. (10). Macrolide resistance factors were present at a relatively higher rate in the present study (59%;SI Appendix, Fig. S5A) than what has been reported in other GBS studies (10, 17). The majority of macrolide resistance elements were ermB or ermTR (SI Appendix, Fig. S5A), withermB being colocalized with tetM in Tn3872 as recently described (10). The vast majority of strains carryingermB alleles clustered on the same phylogenetic branch, indicating that single insertion events resulted in most of the macrolide resistant strains (SI Appendix, Fig. S5B). The two serotype V strains from the 1970s lackedtetM or erm genes, in accord with the absence of RDF.2.
Identification of GBS Genes Undergoing Positive Selection.We next determined the degree of genetic variation at each of the 1,931 loci present in the core genome for the 194 ST-1 strains to ask the question whether particular GBS genes have high levels of genetic variation, as such loci would be expected to participate in host–pathogen interaction (32). After accounting for multiple comparisons, 31 genes had significantly higher genetic variation levels compared with the remainder of the genome (Fig. 4, Table 1, and SI Appendix, Table S4). These highly polymorphic loci in-cluded genes encoding proteins involved in polysaccharide capsule synthesis, regulators of pilus production, and members of two-component gene regulatory systems (TCS) (Table 1). There was a strong bias toward nonsynonymous SNPs in these genes with a nonsynonymous:synonymous ratio of 5, suggesting that these genes are undergoing adaptive evolution as a result of selective pressure (32) (Table 1).
Small Genetic Changes Are Associated with Significant Changes in Capsule and Pilus Production. The genetic variation data sug-gested that alterations in polysaccharide capsule and pilus pro-duction figure prominently in ST-1 interstrain variation. Capsule and pilus proteins are the major targets of proposed GBS vaccines (33). To confirm that the observed genetic alterations were associated with phenotypic diversity in capsule and pilus levels, we tested the ability of strains with defined genetic al-terations (SI Appendix, Table S1) to produce type V capsule and type 1 and 2a pilus proteins by using monoclonal anti-bodies and FACS analysis (22). Consistent with the genetic data, the only strains to produce low or undetectable levels of type V capsule were those with insertions in the capsule bio-synthesis encoding genescpsG and cpsO (Fig. 5A). Similar to the capsule data, pilus 1 levels were low only in the three strains that contained genetic alterations encoding for pilus 1 accessory or backbone proteins (Fig. 5B). Compared with pilus 1, pilus 2a levels were more variable across the strains but were undetectable in the three strains that contained predicted del-eterious polymorphisms in genes encoding pilus 2a encoding proteins (Fig. 5C).
Polymorphisms in Regulatory Protein Encoding Genes Are Associated with Distinct Transcriptomes.Given that several regulator encod-ing genes contained high levels of genetic polymorphisms (Table 1), we next tested the hypothesis that strains with polymorphisms in known or putative regulatory genes have distinct tran-scriptomes. To this end, we used RNA sequencing (RNA-Seq) to compare the transcriptome of four strains with defined mutations in regulatory genes that were identified as being highly poly-morphic in our genetic variation analysis. The genotypes of the strains with defined mutations in four distinct regulatory proteins are shown in Fig. 5D (genotype details are provided inSI Ap-pendix, Table S1), whereas strain SGBS102 is WT at these four regulatory loci and thus was chosen as the control strain for our analysis. In concert with the idea that the observed genetic polymorphisms resulted in significant phenotypic variation, principal component analysis of transcriptome data showed that the global gene expression profiles of the tested strains were quite distinct (Fig. 5D andSI Appendix, Fig. S6).
The minimum number of genes with differential transcript levels (defined as mean transcript level at least twofold and final, corrected P ≤ 0.05) between the test and control strains (e.g., SGBS001 vs. SGBS102 or SGBS106 vs. SGBS102) was 25 for strain SGBS106, whereas the highest number was 92 for strain SGBS046, which encodes a truncated form of the TCS regulator RDF_0315 (SI Appendix, Table S5). Multiple genes known to contribute to GBS host–pathogen interaction were included among the differentially transcribed genes includingsrr-1, pilus encoding genes, Alp encoding genes, andbibA, which encodes an cell-surface adhesion critical to GBS survival in human blood Fig. 4. Identification of GBS genes with high levels of genetic variation.
Shown is the distribution ofχ2statistics for observed vs. expected numbers
of SNPs for the 1,931 genes in the serotype V ST-1 core genome. The genes with the five lowest corrected P values are labeled.
Table 1. Examples of GBS genes with significantly increased genetic variation
Gene location
Gene
name Mutations*
Putative function of encoded protein
rdf_1161 cpsO 17/0 Capsule synthesis protein rdf_0654 ape1 15/2 Pilus 1 regulator rdf_1162 cpsN 13/2 Capsule synthesis protein rdf_1167 cpsE 17/0 Capsule synthesis protein rdf_0315 7/1 Response regulator rdf_1166 cpsF 7/0 Capsule synthesis protein rdf_1410 rga 9/0 Pilus 2a regulator rdf_0973 12/3 Oligopeptide transporter rdf_1359 rogB 10/2 Pilus 2a regulator rdf_1119 skzL 10/3 Streptokinase B rdf_1169 cpsC 4/1 Capsule synthesis protein
*Nonsynonymous/synonymous mutations.
(Fig. 5D) (34). An example of the diversity observed among the studied strains comes from thealp3 gene, whose transcript level was∼25-fold lower in strains SGBS054 and SGBS046 compared with strain SGBS102 (Fig. 5D). In concert with the capsule and pilus expression data, the transcriptome analyses show that small genetic variations can be associated with significant phenotypic variation among ST-1 GBS strains.
Discussion
The increasing feasibility of whole-genome sequencing of large cohorts of clinical bacterial isolates is beginning to elucidate mechanisms of pathogen evolution, which, in turn, is providing key insights into a broad range of medically important issues such as occurrence of epidemics and transmission of drug-resistant path-ogens (2, 4, 6). Most studies that involve high density genomic sequencing of bacterial pathogens investigated bacteria that have been prevalent causes of human disease for centuries (1, 4, 6). In contrast, GBS has only recently emerged as a significant cause of human infection, perhaps as a result of the proliferation of tet-racycline-resistant clones during an era of widespread tetracycline use, as suggested by Da Cunha et al. (10, 15, 16, 18).
A key finding of the present work was that ST-1 strains that are highly similar at the whole-genome level accounted for
>90% of the invasive infections in Houston and Toronto caused by serotype V GBS in nonpregnant adults. Moreover, diversity in the ST-1 strains was primarily driven by small genetic events rather than recombination. This finding was somewhat unexpected, as recombination has been found to be a major driver of GBS genetic diversity when diverse serotypes are analyzed (3, 10). Together with previous studies, these data suggest that GBS evolution may be viewed as similar to the antigenic shift/antigenic drift model of influenza in which recombination drives the emergence of new GBS subtypes (as may have occurred for serotype V in the mid-1970s), which then slowly accumulate new genetic polymorphisms over time. A recent report by Da Cunha et al. analyzing a broad array of GBS serotypes also found that the majority of GBS CCs comprised strains that were highly clonal in nature, as exemplified by a 3% recombination rate among ST-17 strains, suggesting that a similar population structure may hold for other major GBS STs as we describe herein for ST-1 (10).
Intriguingly, analysis of two of the first serotype V strains causing disease in humans (strains ST-1168 and ST-1172) de-termined that these strains were a single loci variant of ST-1, but lacked the Alp3 and AlpST-1 proteins ubiquitous in our 1992– 2013 cohort. It has been shown that serotype V strains were present in humans for approximately 15 y before expanding in the early 1990s and that the pulsed-field gel electrophoresis (PFGE) pattern of serotype V strains responsible for the ma-jority of the expansion observed in the 1990s is the same as serotype V strains isolated as early as 1975 (18). Our MLST findings are in concert with the previously published PFGE data, but the increased sensitivity of our whole-genome sequencing approach reveals that acquisition of Alp3 and AlpST-1 by ST-1 strains occurred sometime between the first known isolation of serotype V GBS from humans in 1975 and the expansion of serotype V strains observed in the early 1990s. It was recently suggested that acquisition of tetracycline resistance has been a major force in the emergence of GBS, and the finding that ST-1168 and ST-1172 lacktetM is consistent with this proposal (10). Importantly, however,tetM is present in the RDF.2 MGE, which also contains the gene encoding AlpST-1 along with numerous other proteins (SI Appendix, Fig. S1). Thus, at least for ST-1 strains, it remains possible that the widespread nature oftetM is a result of its colocalization with other key genes rather than tet-racycline resistance per se being the driving force for clone emergence. Similarly to Da Cunha et al., we found that a sig-nificant proportion of our ST-1 strains contained the ermB macrolide resistance element present along withtetM in Tn3872 (10). Taken together, these data appear most consistent with the idea that expansion of ST-1 serotype V GBS since 1990 was facilitated by acquisition of genetic determinants that allowed for an increased capacity to cause disease in nonpregnant adults.
The relatively low rate of recombination among ST-1 strains allowed for elucidation of discrete GBS loci under positive selec-tion, only some of which have been previously identified as im-portant for GBS host–pathogen interaction (22, 26, 35). Thus, in conjunction with our Alp findings, these data provide a new plat-form for investigating why ST-1 GBS strains have emerged as a major cause of invasive disease in nonpregnant adults, such as analysis of the previously unstudied TCS encoded byrdf_0314-0315. Interestingly, although the exact genes under positive selection are distinct between GBS and GAS, the GBS findings are thematically similar to those previously reported for GAS in terms of the high prevalence of nonsynonymous polymorphisms predicted to alter the bacterial cell surface through variation in regulatory proteins con-trolling cell-surface composition or through changes in cell-surface proteins themselves (4, 5, 36).
In summary, we have discovered that the emergence of serotype V GBS causing invasive disease in nonpregnant adults is primarily driven by ST-1 strains that are highly similar at the whole-genome level but possess significant phenotypic diversity as a result of small genetic changes. These data provide a new platform for investigating factors that contribute to invasive GBS disease in Fig. 5. Small genetic changes lead to significant phenotypic variation. (A–C)
Indicated strains are color coded as follows: red strains are WT at capsule and pilus loci; green strains have polymorphisms in capsule encoding genes; or-ange strains have polymorphisms in pilus 1 encoding genes; purple strains have polymorphisms in pilus 2a encoding genes; and black/gray strains are controls. Strains that have mutations in one loci are WT at the other two loci (e.g., strains with capsule locus mutations are WT at pilus 1 and 2a loci), and the same 12 ST-1 strains are tested in A–C. Data graphed are fluorescence above baseline for indicated monoclonal antibody. Lower dashed line is cutoff between positive and negative, whereas upper dashed line is cutoff between low and high expression (33). (A) Type V capsule: strain 2603V/R (serotype V) is positive control whereas strain 515 (serotype 1a) is negative control. (B) Type 1 pilus: strain 515 is negative control. (C) Type 2a pilus: strain 515 is positive control and strain 2603V/R is negative control. (D) Heat map of RNA-Seq data for selected genes. Log2fold transcript level is shown
relative to strain SGBS102. Strain SGB102 is WT at the loci encoding regu-latory proteins which are mutated in strains as indicated in the figure.
Flores et al. PNAS | May 19, 2015 | vol. 112 | no. 20 | 6435
MIC
adult humans and shed new light into the molecular mechanisms by which pathogenic bacteria adapt to the human host. Materials and Methods
Bacterial Strains and Serotyping. The strains used in the present work are shown inSI Appendix, Table S1. The United States strains comprised consecutive serotype V GBSs isolated from the bloodstream of nonpregnant adults as part of an on-going surveillance program at the Texas Medical Center in Houston, TX, between 1992 and 2013. Canadian strains with the same characteristics were isolated as part of a surveillance program for invasive streptococcal infection in Toronto from 2011 to 2012 as described previously (13). Identification of GBS serotype was performed by using latex agglutination. SS-1168 and SS-1172 are serotype V strains isolated in 1977 and 1978, respectively, and were obtained from the Centers for Disease Control and Prevention from the collection of H. W. Wilkinson (37).
MLST and Whole-Genome Sequencing. MLST was performed as described previously (11) and verified directly from short-read whole-genome se-quence by using SRST2 (https://github.com/katholt/srst2). Ten strains differed from ST-1 by a single polymorphism. Whole-genome sequencing showed that these strains clustered tightly with ST-1 strains, and they were consid-ered as ST-1 strains for the purposes of this manuscript. Details on whole-genome sequencing of SGBS001 using a combination of PacBio and Illumia HiSeq data are presented inSI Appendix, SI Materials and Methods. The GenBank accession number for the SGBS001 genome is CP010867. Of the remaining 208 ST-1 strains, whole-genome sequencing of 201 GBS strains (seven random strains were not sequenced for multiplexing convenience reasons) was performed by using Illumina HiSeq or MiSeq instruments with an average sequencing depth of 120× (38). Sequence data for the 208 ge-nomes have been deposited in the NCBI Short Read Archive database (BioProject PRJNA274384). Noncore aspects of the ST-1 genome were defined as all sequences> 1 kb that were not present in all sequenced isolates, as previously described (1). Polymorphisms in the core genome were called against the reference SGBS001 genome by using VAAL (30) and the CLC
Genomics Workbench, version 7.0.3. Phylogenetic analyses and trees were generated by using the CLC Genomics Workbench, version 7.0.3. Regions of recombination were identified by using BratNextGen (31). By using the neighbor-joining tree of 194 ST-1 GBS strains, we generated a chronogram to estimate the time to most recent common ancestor by using Path-O-Gen (tree.bio.ed.ac.uk/software/pathogen). Genes with an excess of polymorphisms were identified as detailed inSI Appendix, SI Materials and Methods, using the Bonferroni method to correct for multiple comparisons (5). Genomic visualizations were generated by using BRIG (39).
Phylogenetic Analysis of the AlpST-1 Protein. Following alignment using CLUSTALW, sequences were analyzed in MEGA, version 5.2, to create radial trees by using the neighbor-joining statistical method and the maximum-likelihood composite model. The robustness of the nodes was evaluated via bootstrapping (1,000 replicates).
Capsule and Pilus Expression Level Analysis. Genotypes of tested strains are listed inSI Appendix, Table S1. Capsule and pilus expression level were de-termined by using monoclonal antibodies and FACS analysis as described previously (22).
RNA Isolation and Transcriptome Analysis. For RNA-Seq analysis, strains were grown in quadruplicate to midexponential phase in Todd–Hewitt broth (Difco) and RNA was isolated using a Qiagen RNeasy kit. RNA-Seq analysis was performed in quadruplicate per strain as described previously (40) and as detailed inSI Appendix, SI Materials and Methods. Transcript levels were considered significantly different if the mean transcript level difference was≥ 2.0-fold and the final adjusted P value was less than 0.05.
ACKNOWLEDGMENTS. This work was supported by National Institutes of Allergy and Infectious Diseases/National Institutes of Health Grant 5R01AI089891 (to S.A.S.), Chapman Foundation, and MD Anderson Cancer Center Support Grant P30 CA016672 (Bioinformatics Shared Resource).
1. Harris SR, et al. (2010) Evolution of MRSA during hospital transmission and in-tercontinental spread. Science 327(5964):469–474.
2. Chewapreecha C, et al. (2014) Dense genomic sampling identifies highways of pneumococcal recombination. Nat Genet 46(3):305–309.
3. Brochet M, et al. (2008) Shaping a bacterial genome by large chromosomal re-placements, the evolutionary history of Streptococcus agalactiae. Proc Natl Acad Sci USA 105(41):15961–15966.
4. Nasser W, et al. (2014) Evolutionary pathway to increased virulence and epidemic group A Streptococcus disease derived from 3,615 genome sequences. Proc Natl Acad Sci USA 111(17):E1768–E1776.
5. Beres SB, et al. (2010) Molecular complexity of successive bacterial epidemics decon-voluted by comparative pathogenomics. Proc Natl Acad Sci USA 107(9):4371–4376. 6. Casali N, et al. (2014) Evolution and transmission of drug-resistant tuberculosis in a
Russian population. Nat Genet 46(3):279–286.
7. Eickhoff TC, Klein JO, Daly AK, Ingall D, Finland M (1964) Neonatal sepsis and other infections due to group B beta-hemolytic streptococci. N Engl J Med 271:1221–1228. 8. Tettelin H, et al. (2005) Genome analysis of multiple pathogenic isolates of Strepto-coccus agalactiae: implications for the microbial“pan-genome”. Proc Natl Acad Sci USA 102(39):13950–13955.
9. Bellais S, et al. (2012) Capsular switching in group B Streptococcus CC17 hypervirulent clone: A future challenge for polysaccharide vaccine development. J Infect Dis 206(11):1745–1752. 10. Da Cunha V, et al.; DEVANI Consortium (2014) Streptococcus agalactiae clones infecting hu-mans were selected and fixed through the extensive use of tetracycline. Nat Commun 5:4544. 11. Jones N, et al. (2003) Multilocus sequence typing system for group B streptococcus.
J Clin Microbiol 41(6):2530–2536.
12. Tazi A, et al. (2010) The surface protein HvgA mediates group B streptococcus hy-pervirulence and meningeal tropism in neonates. J Exp Med 207(11):2313–2322. 13. Teatero S, et al. (2014) Characterization of invasive group B streptococcus strains from
the greater Toronto area, Canada. J Clin Microbiol 52(5):1441–1447.
14. Phares CR, et al.; Active Bacterial Core surveillance/Emerging Infections Program Network (2008) Epidemiology of invasive group B streptococcal disease in the United States, 1999-2005. JAMA 299(17):2056–2065.
15. Lamagni TL, et al. (2013) Emerging trends in the epidemiology of invasive group B streptococcal disease in England and Wales, 1991-2010. Clin Infect Dis 57(5):682–688. 16. Skoff TH, et al. (2009) Increasing burden of invasive group B streptococcal disease in
nonpregnant adults, 1990-2007. Clin Infect Dis 49(1):85–92.
17. Tazi A, et al. (2011) Invasive group B streptococcal infections in adults, France (2007-2010). Clin Microbiol Infect 17(10):1587–1589.
18. Elliott JA, Farmer KD, Facklam RR (1998) Sudden increase in isolation of group B streptococci, serotype V, is not due to emergence of a new pulsed-field gel electro-phoresis type. J Clin Microbiol 36(7):2115–2116.
19. Bohnsack JF, et al. (2008) Population structure of invasive and colonizing strains of Streptococcus agalactiae from neonates of six U.S. Academic Centers from 1995 to 1999. J Clin Microbiol 46(4):1285–1291.
20. Koren S, et al. (2013) Reducing assembly complexity of microbial genomes with sin-gle-molecule sequencing. Genome Biol 14(9):R101.
21. Lachenauer CS, Creti R, Michel JL, Madoff LC (2000) Mosaicism in the alpha-like protein genes of group B streptococci. Proc Natl Acad Sci USA 97(17):9630–9635. 22. Margarit I, et al. (2009) Preventing bacterial infections with pilus-based vaccines: The
group B streptococcus paradigm. J Infect Dis 199(1):108–115.
23. Zubair S, et al. (2013) Genome sequence of Streptococcus agalactiae strain 09mas018883, isolated from a Swedish cow. Genome Announcements 1(4):e00456-13.
24. Richards VP, Choi SC, Pavinski Bitar PD, Gurjar AA, Stanhope MJ (2013) Transcriptomic and genomic evidence for Streptococcus agalactiae adaptation to the bovine envi-ronment. BMC Genomics 14:920.
25. Richards VP, et al. (2011) Comparative genomics and the role of lateral gene transfer in the evolution of bovine adapted Streptococcus agalactiae. Infect Genet Evol 11(6):1263–1275. 26. Lindahl G, Stålhammar-Carlemalm M, Areschoug T (2005) Surface proteins of Strep-tococcus agalactiae and related proteins in other bacterial pathogens. Clin Microbiol Rev 18(1):102–127.
27. Zhang Y (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9:40. 28. Jones DT (1999) GenTHREADER: An efficient and reliable protein fold recognition
method for genomic sequences. J Mol Biol 287(4):797–815.
29. Xiang H, et al. (2012) Crystal structures reveal the multi-ligand binding mechanism of Staphylococcus aureus ClfB. PLoS Pathog 8(6):e1002751.
30. Nusbaum C, et al. (2009) Sensitive, specific polymorphism discovery in bacteria using massively parallel sequencing. Nat Methods 6(1):67–69.
31. Marttinen P, et al. (2012) Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Res 40(1):e6.
32. Lieberman TD, et al. (2011) Parallel bacterial evolution within multiple patients identifies candidate pathogenicity genes. Nat Genet 43(12):1275–1280.
33. Chen VL, Avci FY, Kasper DL (2013) A maternal vaccine against group B Streptococcus: Past, present, and future. Vaccine 31(suppl 4):D13–D19.
34. Santi I, et al. (2007) BibA: A novel immunogenic bacterial adhesin contributing to group B Streptococcus survival in human blood. Mol Microbiol 63(3):754–767. 35. Cieslewicz MJ, et al. (2005) Structural and genetic diversity of group B Streptococcus
capsular polysaccharides. Infect Immun 73(5):3096–3103.
36. Shea PR, et al. (2011) Distinct signatures of diversifying selection revealed by genome analysis of respiratory tract and invasive bacterial populations. Proc Natl Acad Sci USA 108(12):5039–5044.
37. Wilkinson HW (1977) Nontypable group B streptococci isolated from human sources. J Clin Microbiol 6(2):183–184.
38. Shelburne SA, et al. (2014) Streptococcus mitis strains causing severe clinical disease in cancer patients. Emerg Infect Dis 20(5):762–771.
39. Alikhan NF, Petty NK, Ben Zakour NL, Beatson SA (2011) BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics 12:402.
40. Horstmann N, et al. (2014) Dual-site phosphorylation of the control of virulence regulator impacts group a streptococcal global gene expression and pathogenesis. PLoS Pathog 10(5):e1004088.
SI Materials and Methods
Complete Genome Assembly of Strain SGBS001. Genomic DNA from overnight culture was
isolated using a phenol/chloroform extraction protocol and subjected to large-insert PacBio
library preparation following the User Bulletin – Guideline for Preparing 20 kb SMRTbellTM
Templates (version 2) and Procedure and Checklist – 20kb Template Preparation Using
BluePippin Size-Selection (Version 3)
(http://www.pacificbiosciences.com/support/pubmap/documentation.html). SMRTbell templates
were subjected to standard SMRT sequencing using an engineered phi29 DNA polymerase on
the PacBio RS system according to the manufacturer’s protocol. Subsequently, paired end
sequencing reads derived from short-read Illumina HiSeq data were mapped to the newly
assembled contig for error correction of the PacBio data (1). Annotation of the closed SGBS001
genome was performed using RAST (2) and BASYS (3) in addition to homologous annotation
transfer from the previously sequenced 2603V/R genome (accession number NC_004116) using
CLCBio Genomics Workbench version 7.0.3 (Qiagen). The GenBank accession number for the
SGBS001 genome is CP010867.
Identification of Genes with an Excess of Polymorphisms. To identify genes possibly under
positive selection, we sought to determine whether any of the 1,931 core ST-1 genes had a higher
number of SNPs than expected for a random distribution as previously described (4). The
expected numbers of SNPs was calculated for all core genes and corrected for the size of the
gene relative to the c
ore genome. The resulting χ
2probabilities were then corrected for multiple
testing using the Bonferroni method (Table 1).
Transcriptome Analysis Using RNASeq. For RNA-Seq analysis, strains were grown in
quadruplicate to mid-exponential phase and RNA was isolated using a RNeasy kit (Qiagen).
RNA integrity was confirmed using an Agilent Bioanalyzer. First and second strand cDNA
synthesis was performed on 500 ng of rRNA depleted RNA using the Ovation Prokaryotic
RNA-Seq System (NuGEN). The resulting cDNA was fragmented to 200 bp (mean fragment size)
with the S220 Focused-ultrasonicator (Covaris) and used to make barcoded sequencing libraries
on the SPRI-TE Nucleic Acid Extractor (Beckman-Coulter). No size selection was employed.
Libraries were quantitated by qPCR (KAPA Systems), multiplexed and 8 samples per lane were
sequenced on the HiSeq2000 using
76 bp paired-end
sequencing. The raw reads in FASTQ
format were aligned to the SGBS001 genome using Mosaik alignment software (version:
1.1.0021,
http://code.google.com/p/mosaik-aligner/
) with the following alignment parameters: 8
percent maximal percentage read length allowed to be errors and 50 percent minimal percentage
of the read length aligned. Duplicate fragments with a lower alignment quality were discarded.
Next, the overlaps between aligned reads and annotated genes were counted using HTSeq
software (
http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html
). If the number of
overlapped read of any given gene was less than one per million total mapped read for all
samples, this gene was excluded from further analysis. The gene counts were normalized using
the scaling factor method (5). A negative binomial generalized linear model was fitted to each
gene. Then a likelihood-ratio test was applied to examine if there was a difference in the gene
transcript levels among the compared strains (5). The Benjamini-Hochberg method was used to
control false discovery rate (FDR) (6). Next, pair-wise comparisons using the likelihood ratio
test were performed to compare the gene transcript levels between pairs of strains with the
Holm's method used to calculate adjusted P-values to correct for multiple testing. Transcript
levels were considered significantly different only if the mean transcript level difference
was ≥
1.5-fold and the final, adjusted P value was less than 0.05.
SI Material and Methods References
1.
Koren S, et al. (2013) Reducing assembly complexity of microbial genomes with
single-molecule sequencing. Genome Biol 14(9):R101.
2.
Aziz RK, et al. (2008) The RAST Server: rapid annotations using subsystems
technology. BMC Genomics 9:75.
3.
Van Domselaar GH, et al. (2005) BASys: a web server for automated bacterial genome
annotation. Nucleic Acids Res 33(Web Server issue):W455-459.
4.
Beres SB, et al. (2010) Molecular complexity of successive bacterial epidemics
deconvoluted by comparative pathogenomics. Proc Natl Acad Sci U S A
107(9):4371-4376.
5.
Anders S & Huber W (2010) Differential expression analysis for sequence count data.
Genome Biol 11(10):R106.
6.
Reiner A, Yekutieli D, & Benjamini Y (2003) Identifying differentially expressed genes
using false discovery rate controlling procedures. Bioinformatics 19(3):368-375.
Fig. S2. Identified regions of potential recombination in ST-1 GBS. Proportion of shared
ancestry (PSA) tree and recombinogenic segments were determined by BratNextGen.
PSA clusters based on 202 GBS ST-1 pseudogenomes are displayed on left. The
SGBS001 genome position is given on the bottom of the figure. Highlighted (gray) are
regions of the reference genome including mobile genetic elements (RDF.1 – RDF.6),
pilus islands (PI-1, PI-2a), and the capsule biosynthesis locus (neuA-cpsA). Also
highlighted (salmon) are the recombinogenic segments of GBS strains showing greater
phylogenetic distance from the core ST1 cluster (Fig. 3B).
Fig. S3. (A) Phylogram of neighbor-joining tree of 194 ST-1 strains lacking evidence of
recombination. Shape and color of strains are as indicated in legend. Genetic distance is
indicated by in-laid bar. Data are the same as for Fig. 3D except using a phylogram
rather than a radial tree depiction. (B) Neighbor-joining tree of all Houston isolated
strains after excluding strains with potential recombination. The year of isolation of
individual strains is noted in the legend. In-laid bar represents genetic distance. (C)
Graph plotting the mean ± standard deviation of the number of intra-group SNPs per
strain stratified by indicated time group. Asterisks indicate P < 0.0001 relative to
1992-2000 (*) and P < 0.0001 relative to both 1992-1992-2000 and 2001-2010 (**) as determined
Mann-Whitney U-test. (D) Estimation of the year of origin inferred from 194 ST-1 strains
based on 5,880 concatenated core SNPs by neighbor joining. Illustrated is the best fit of
the root-to-tip divergence for each of the strains as determined by Path-O-Gen. The
X-intercept (1963) is the best estimate of the time of origin of the most recent common
serotype V, ST-1 ancestor.
Fig. S4. Rooted neighbor joining tree of 194 North American ST-1 strains from this study
(as depicted in Fig. 3D) along with 24 serotype V, ST-1 strains from (10). The strains
from this study were all isolated in North America and are in blue whereas strains from
(10) were isolated in Europe, Africa, or Australia as indicated in the legend. All strains
from this study were invasive and thus are a depicted by a circle. The strains from (10)
were either invasive (circle) or colonizers/unknown clinical status (squares).
Fig. S5. Summary of antimicrobial resistance elements identified in ST-1 strains. (A) Antimicrobial resistance determinants were
antimicrobial resistance determinant. (B) erm alleles are indicated along with phylogenetic relationships as determined in Fig. 3D.
Country of origin of strains and erm alleles are as indicated in legend. Note clustering of strains with ermT/R and ermB_10 alleles
suggesting single insertion events. ermB_10 containing strains had the ermB gene located in Tn3872 which appeared to have occurred
from a single recombination event of Tn917 into Tn916 to create Tn3872 as described by Da Cunha et al. (10).
Fig. S6. Principal components analysis (PCA) of RNAseq data. PCA showing
inter-sample variation of quadruplicate replicates of indicated strains grown to mid-exponential
in TH broth. Strain numbers and phenotype are indicated in the legend. % values refer
to amount of variation explained by each axis.
Strain
isolation
Year of
Location of isolation
MLST
Genotype
a(if applicable)
NGBS008
2009
Toronto, CA
1
NGBS009
2010
Toronto, CA
1
NGBS010
2009
Toronto, CA
1
NGBS021
2009
Toronto, CA
1
NGBS022
2009
Toronto, CA
1
NGBS025
2009
Toronto, CA
1
NGBS028
2009
Toronto, CA
1
NGBS030
2010
Toronto, CA
1
NGBS035
2010
Toronto, CA
1
NGBS044
2010
Toronto, CA
1
NGBS054
2010
Toronto, CA
1
NGBS063
2010
Toronto, CA
1
NGBS068
2010
Toronto, CA
1
NGBS092
2010
Toronto, CA
1
NGBS093
2010
Toronto, CA
1
NGBS094
2010
Toronto, CA
1
NGBS099
2010
Toronto, CA
1
NGBS107
2010
Toronto, CA
1
NGBS110
2010
Toronto, CA
1
NGBS117
2010
Toronto, CA
1
NGBS134
2010
Toronto, CA
1
NGBS164
2010
Toronto, CA
1
NGBS171
2010
Toronto, CA
1
NGBS172
2010
Toronto, CA
1
NGBS177
2010
Toronto, CA
1
NGBS180
2010
Toronto, CA
1
NGBS200
2010
Toronto, CA
1
NGBS210
2011
Toronto, CA
1
NGBS229
2011
Toronto, CA
1
NGBS234
2010
Toronto, CA
1
NGBS241
2011
Toronto, CA
453
NGBS244
2011
Toronto, CA
1
NGBS246
2011
Toronto, CA
1
NGBS267
2010
Toronto, CA
1
NGBS272
2011
Toronto, CA
1
NGBS273
2011
Toronto, CA
1
NGBS275
2010
Toronto, CA
1
NGBS279
2010
Toronto, CA
1
NGBS283
2010
Toronto, CA
1
NGBS287
2010
Toronto, CA
1
NGBS288
2010
Toronto, CA
1
NGBS298
2011
Toronto, CA
1
NGBS303
2010
Toronto, CA
1
NGBS321
2011
Toronto, CA
1
NGBS323
2011
Toronto, CA
1
NGBS325
2011
Toronto, CA
1
NGBS330
2011
Toronto, CA
1
NGBS331
2011
Toronto, CA
1
NGBS332
2011
Toronto, CA
1
NGBS338
2011
Toronto, CA
1
NGBS348
2011
Toronto, CA
1
NGBS357
2011
Toronto, CA
1
NGBS359
2011
Toronto, CA
1
NGBS360
2011
Toronto, CA
1
NGBS372
2011
Toronto, CA
1
NGBS380
2011
Toronto, CA
1
NGBS381
2011
Toronto, CA
1
NGBS411
2011
Toronto, CA
1
NGBS418
2011
Toronto, CA
1
NGBS425
2011
Toronto, CA
1
NGBS434
2011
Toronto, CA
1
NGBS444
2011
Toronto, CA
1
NGBS462
2011
Toronto, CA
1
NGBS492
2012
Toronto, CA
1
NGBS494
2012
Toronto, CA
1
NGBS497
2012
Toronto, CA
1
NGBS499
2012
Toronto, CA
1
NGBS513
2012
Toronto, CA
1
NGBS519
2012
Toronto, CA
1
NGBS536
2012
Toronto, CA
1
NGBS553
2012
Toronto, CA
1
NGBS558
2012
Toronto, CA
1
NGBS561
2012
Toronto, CA
1
NGBS571
2012
Toronto, CA
1
NGBS579
2012
Toronto, CA
1
NGBS580
2012
Toronto, CA
1
NGBS586
2012
Toronto, CA
1
NGBS604
2012
Toronto, CA
1
NGBS624
2012
Toronto, CA
1
NGBS630
2012
Toronto, CA
1
NGBS633
2012
Toronto, CA
531
SGBS001
1993
Houston, TX, USA
1
595_596insT RDF_1357 (AP1_2a encoding gene) 1195delT RDF_1410 (rga )SGBS002
1993
Houston, TX, USA
1
SGBS003
1994
Houston, TX, USA
1
SGBS004
1995
Houston, TX, USA
1
SGBS005
1996
Houston, TX, USA
1
SGBS007
1997
Houston, TX, USA
1
SGBS008
1998
Houston, TX, USA
1
SGBS010
1999
Houston, TX, USA
1
SGBS011
2000
Houston, TX, USA
24
SGBS012
2000
Houston, TX, USA
452
SGBS013
2001
Houston, TX, USA
1
SGBS014
2002
Houston, TX, USA
1
SGBS015
2003
Houston, TX, USA
1
SGBS016
2003
Houston, TX, USA
1
SGBS019
2005
Houston, TX, USA
26
SGBS020
2006
Houston, TX, USA
1
SGBS021
2007
Houston, TX, USA
1
SGBS022
2008
Houston, TX, USA
1
SGBS023
2009
Houston, TX, USA
86
SGBS024
2010
Houston, TX, USA
370
SGBS025
2010
Houston, TX, USA
1
SGBS026
2011
Houston, TX, USA
1
SGBS027
2011
Houston, TX, USA
1
SGBS028
2011
Houston, TX, USA
19SGBS029
2011
Houston, TX, USA
1
SGBS030
2012
Houston, TX, USA
1
SGBS031
1992
Houston, TX, USA
1
SGBS032
1992
Houston, TX, USA
1
SGBS033
1994
Houston, TX, USA
1
SGBS034
1994
Houston, TX, USA
1
SGBS035
1994
Houston, TX, USA
1
SGBS036
1997
Houston, TX, USA
1
SGBS037
1997
Houston, TX, USA
1
SGBS038
1998
Houston, TX, USA
1
SGBS039
1998
Houston, TX, USA
1
435delG RDF_0659 (AP1-2a encoding gene)SGBS040
1998
Houston, TX, USA
1
SGBS041
1998
Houston, TX, USA
1
SGBS042
1998
Houston, TX, USA
1
SGBS043
1998
Houston, TX, USA
1
Wild-typeSGBS044
1999
Houston, TX, USA
1
SGBS045
1999
Houston, TX, USA
1
SGBS046
1999
Houston, TX, USA
1
Y222X RDF_0655 (BP1 encoding gene) E155X RDF_0315SGBS047
2000
Houston, TX, USA
1
SGBS048
2000
Houston, TX, USA
1
672_673insA RDF_1161 (cpsO)SGBS049
2000
Houston, TX, USA
1
SGBS050
2000
Houston, TX, USA
1
SGBS051
2000
Houston, TX, USA
1
SGBS052
2000
Houston, TX, USA
1
SGBS053
2000
Houston, TX, USA
1
SGBS054
2001
Houston, TX, USA
1
983delT RDF_1359 (rogB )SGBS057
2001
Houston, TX, USA
1
SGBS058
2001
Houston, TX, USA
1
SGBS059
2001
Houston, TX, USA
1
SGBS060
2001
Houston, TX, USA
1
SGBS061
2002
Houston, TX, USA
1
SGBS062
2002
Houston, TX, USA
1
SGBS063
2002
Houston, TX, USA
1
SGBS064
2002
Houston, TX, USA
1
SGBS065
2002
Houston, TX, USA
1
SGBS066
2002
Houston, TX, USA
1
SGBS067
2002
Houston, TX, USA
1
SGBS068
2003
Houston, TX, USA
1
SGBS069
2003
Houston, TX, USA
1
SGBS070
2003
Houston, TX, USA
1
SGBS071
2003
Houston, TX, USA
1
SGBS072
2003
Houston, TX, USA
1
SGBS073
2003
Houston, TX, USA
26
SGBS074
2004
Houston, TX, USA
1
SGBS075
2004
Houston, TX, USA
1
SGBS076
2004
Houston, TX, USA
1
SGBS077
2004
Houston, TX, USA
1
SGBS078
2004
Houston, TX, USA
1
SGBS079
2004
Houston, TX, USA
1
SGBS080
2004
Houston, TX, USA
1
SGBS081
2004
Houston, TX, USA
1
SGBS082
2004
Houston, TX, USA
1
SGBS083
2004
Houston, TX, USA
1
SGBS084
2005
Houston, TX, USA
1
SGBS085
2005
Houston, TX, USA
1
SGBS086
2005
Houston, TX, USA
1
SGBS087
2005
Houston, TX, USA
1
Wild-typeSGBS088
2005
Houston, TX, USA
1
SGBS089
2005
Houston, TX, USA
1
SGBS090
2005
Houston, TX, USA
190
SGBS092
2005
Houston, TX, USA
1
SGBS093
2005
Houston, TX, USA
1
SGBS094
2005
Houston, TX, USA
1
SGBS095
2005
Houston, TX, USA
1
SGBS096
2005
Houston, TX, USA
1
SGBS097
2005
Houston, TX, USA
1
SGBS098
2005
Houston, TX, USA
1
SGBS099
2005
Houston, TX, USA
19
SGBS100
2006
Houston, TX, USA
1
SGBS101
2006
Houston, TX, USA
1
SGBS102
2006
Houston, TX, USA
1
Wild-typeSGBS103
2006
Houston, TX, USA
1
SGBS105
2006
Houston, TX, USA
1
SGBS106
2006
Houston, TX, USA
1
433delA RDF_0654 (ape1)SGBS107
2006
Houston, TX, USA
1
159delA RDF_0659 (AP1_1 encoding gene)SGBS108
2006
Houston, TX, USA
1
SGBS109
2006
Houston, TX, USA
1
SGBS110
2007
Houston, TX, USA
1
252_253insG RDF_1165 (cpsG )SGBS111
2007
Houston, TX, USA
1
SGBS114
2007
Houston, TX, USA
1
SGBS115
2007
Houston, TX, USA
1
SGBS116
2007
Houston, TX, USA
1
SGBS117
2008
Houston, TX, USA
26
SGBS118
2008
Houston, TX, USA
1
SGBS119
2008
Houston, TX, USA
1
SGBS120
2008
Houston, TX, USA
1
SGBS121
2009
Houston, TX, USA
190
SGBS122
2009
Houston, TX, USA
1
SGBS123
2009
Houston, TX, USA
55
SGBS124
2009
Houston, TX, USA
49
SGBS125
2009
Houston, TX, USA
1
SGBS126
2009
Houston, TX, USA
1
SGBS127
2009
Houston, TX, USA
1
SGBS128
2009
Houston, TX, USA
19
SGBS129
2009
Houston, TX, USA
1
SGBS130
2009
Houston, TX, USA
19
SGBS131
2009
Houston, TX, USA
19
SGBS132
2009
Houston, TX, USA
1
SGBS133
2009
Houston, TX, USA
1
SGBS134
2009
Houston, TX, USA
452
SGBS135
2007
Houston, TX, USA
1
SGBS136
2007
Houston, TX, USA
1
SGBS137
2010
Houston, TX, USA
19
SGBS138
2010
Houston, TX, USA
1
SGBS139
2008
Houston, TX, USA
1
SGBS140
2008
Houston, TX, USA
1
SGBS141
2009
Houston, TX, USA
1
SGBS142
2009
Houston, TX, USA
1
SGBS143
2010
Houston, TX, USA
1
SGBS144
2010
Houston, TX, USA
1
SGBS145
2010
Houston, TX, USA
1
SGBS146
2010
Houston, TX, USA
1
751_delGAACTTTTAAATAAAC RDF_1159 (cpsK )SGBS147
2010
Houston, TX, USA
1
940delT RDF_1357 (AP1-2a encdoing gene)SGBS148
2011
Houston, TX, USA
1
930delT RDF_1357 (AP1-2a encoding gene)SGBS149
2012
Houston, TX, USA
19
SGBS150
2012
Houston, TX, USA
1
SGBS151
2012
Houston, TX, USA
1
SGBS152
2012
Houston, TX, USA
1
SGBS154
2013
Houston, TX, USA
19
SGBS155
2013
Houston, TX, USA
1
Strain
SGBS001
Start
aSGBS001
End
SGBS001 CDS
Length
(bp)
Best BLAST hit
b
Hit Start
cHit End
Identity
(%)
SGBS060 1658415 1674424 RDF_1615 to RDF_1635 15875 2603V/R (V) 1686219 1702221 99.94 SGBS064 834287 875209 RDF_0812 to RDF_0851 40922 NEM316 (III) 845210 885592 99.92 SGBS141 1245926 1274722 RDF_1206 to RDF_1232 28796 09mas018883 (V) 1217847 1246654 99.99 1294760 1300559 RDF_1253 to RDF_1258 5799 NEM316 (III) 1425086 1430886 99.1 1310117 1347812 RDF_1270 to RDF_1304 37695 2603V/R (V) 1328302 1362368 99.04 aBeginning and ending genome position of the reference genome (SGBS001)
b
Best BLAST hit determined by blastn alignment of de novo assembled query contig against all completed GBS genomes.
with serotype of identified strain noted in parentheses
c
Start and end points refer to genome identified as source of recombination
RDF#
# strains affected
Region of difference
aRDF_0167-0169
1
No
RDF_0217-0229
1
Yes, 1.2
RDF_0230-0238
10
Yes, 1.2
RDF_0240-0246
1
Yes, 1.3
RDF_0407-0408
1
No
RDF_0547-0592
8
Yes, 1.4
RDF_0948-0959
1
Yes, 5.4
RDF_1226-1234
2
Yes, 1.13
RDF_1240-1250
1
Yes, 2.6
RDF_1299-1301
1
Yes, 1.13
RDF_1440-1442
5
Yes, 1.17
RDF_1900-1906
1
Yes, 1.21
RDF_1877-1890
1
Yes, 2.6
RDF_1895-1916
1
No
RDF_1972-1985
2
Yes, 2.9
RDF_1986-1988
4
No
Table S3. Gene regions of strain SGBS001 > 1 kb missing from at least one serotype V, ST-1 strain
a