ContentslistsavailableatScienceDirect
Epidemics
jo u r n al h om ep ag e :w w w . e l s e v i e r . c o m / l o c a t e / e p i d e m i c s
Research
Paper
Clustering
of
contacts
relevant
to
the
spread
of
infectious
disease
Xiong
Xiao
a,d,
Albert
Jan
van
Hoek
a,
Michael
G.
Kenward
a,
Alessia
Melegaro
b,
Mark
Jit
a,c,∗aFacultyofEpidemiologyandPopulationHealth,LondonSchoolofHygieneandTropicalMedicine,KeppelStreet,LondonWC1E7HT,UnitedKingdom bDONDENACentreforResearchonSocialDynamics&PublicPolicy,UniversitàBocconi,ViaGuglielmoRöntgenn.1,20136Milan,Italy
cModellingandEconomicsUnit,PublicHealthEngland,61ColindaleAvenue,LondonNW95EQ,UnitedKingdom dDepartmentofEpidemiologyandBiostatistics,WestChinaSchoolofPublicHealth,SichuanUniversity,Chengdu,China
a
r
t
i
c
l
e
i
n
f
o
Articlehistory:
Received18January2016
Receivedinrevisedform4August2016 Accepted23August2016
Availableonline26August2016
Keywords: Clustering Contacts Infectiousdiseases Mathematicalmodelling Varicella-zostervirus
a
b
s
t
r
a
c
t
Objective:Infectiousdiseasespreaddependsoncontactratesbetweeninfectiousandsusceptible indi-viduals.Transmissionmodelsarecommonlyinformedusingempiricallycollectedcontactdata,butthe relevanceofdifferentcontacttypestotransmissionisstillnotwellunderstood.Somestudiesselect con-tactsbasedonasinglecharacteristicsuchasproximity(physical/non-physical),location,durationor frequency.Thisstudyaimedtoexplorewhetherclustersofcontactssimilartoeachotheracrossmultiple characteristicscouldbetterexplaindiseasetransmission.
Methods:IndividualcontactdatafromthePOLYMODsurveyinPoland,GreatBritain,Belgium,Finlandand ItalyweregroupedintoclustersbythekmedoidsclusteringalgorithmwithaManhattandistancemetric tostratifycontactsusingallfourcharacteristics.Contactclusterswerethenusedtofitatransmission modeltosero-epidemiologicaldataforvaricella-zostervirus(VZV)ineachcountry.
Resultsanddiscussion:Acrossthefivecountries,9–15clusterswerefoundtooptimisebothqualityof clus-tering(measuredusingaveragesilhouettewidth)andqualityoffit(measuredusingseveralinformation criteria).Ofthese,2–3clustersweremostrelevanttoVZVtransmission,characterisedby(i)1–2clusters ofage-assortativecontactsinschools,(ii)aclusteroflessage-assortativecontactsinnon-schoolsettings. Qualityoffitwassimilartousingcontactsstratifiedbyasinglecharacteristic,providingvalidationthat singlestratificationsareappropriate.However,usingclusteringtostratifycontactsusingmultiple char-acteristicsprovidedinsightintothestructuresunderlyinginfectiontransmission,particularlytheroleof age-assortativecontacts,involvingschoolagechildren,forVZVtransmissionbetweenhouseholds.
©2016TheAuthor(s).PublishedbyElsevierB.V.ThisisanopenaccessarticleundertheCCBYlicense (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Mathematicalmodelsofinfectiousdiseasetransmissionrequire assumptionsaboutmixingbetweendifferentsubgroupsina pop-ulationthatcanpotentiallyleadtotransmissionbetweeninfected andsusceptibleindividuals.Thesimplestassumptionisthat
every-Abbreviations: AIC,AkaikeInformationCriterion;AICc,small-sample-size cor-rectedAkaikeInformationCriterion;ASW,averagesilhouettewidth;BIC,Bayesian InformationCriterion;PAM,partitioningaroundmedoids;VZV,varicella-zoster virus;WAIFW,whoacquiresinfectionfromwhom.
∗ Correspondingauthorat:FacultyofEpidemiologyandPopulationHealth, Lon-donSchoolofHygieneandTropicalMedicine,KeppelStreet,LondonWC1E7HT, UnitedKingdom.
E-mailaddresses:Xiong.Xiao@lshtm.ac.uk
(X.Xiao),Albert.VanHoek@lshtm.ac.uk(A.J.vanHoek),Mike.Kenward@lshtm.ac.uk
(M.G.Kenward),alessia.melegaro@unibocconi.it(A.Melegaro),
mark.jit@lshtm.ac.uk(M.Jit).
onehasthesameprobabilityofcontactingeachother,butthiscan sometimesleadtomisleadingresults(KeelingandRohani,2008). Indeed,manyinfectioncontrolinterventionssuchasvaccinating children(Thorringtonetal.,2015)orclosingschoolsduringa pan-demic(Houseetal.,2011)arepredicatedontheassumptionthat certainsubgroupsinthepopulationarethemaintransmitters.
A more realistic assumption is to subdivide the population basedonsomecharacteristic,andintroduce amatrixofcontact rates capableof transmittinginfectionbetween eachsubgroup, calledthe“whoacquiresinfectionfromwhom”(WAIFW)matrix (VynnyckyandWhite,2010).Ageisthecharacteristicmost com-monlyusedasasourceofheterogeneityinmixingpatterns.Amodel withage-stratifiedcontactratescanbefittedtoage-specificdataon infectionhistory(suchassero-epidemiologicaldata,whichmarks theprevalenceofpreviousinfection)toestimatetheage-specific effectivecontactratesintheWAIFWmatrix.
ToinformtheelementsoftheWAIFWmatrix,thenumberof socialcontactsthatindividualsindifferentagegroupsreportcan http://dx.doi.org/10.1016/j.epidem.2016.08.001
beempiricallymeasuredandusedasproxiestotheactualcontact ratesunderlyingtransmission(Beutelsetal.,2006;Edmundsetal., 1997;Wallingaetal.,2006).Thelargestsuchstudyisadiary-based surveyof7290participantsineightEuropeancountriescollected in2006aspartofthePOLYMODproject(Mossongetal.,2008). Sincethencontactstudieshavebeencarriedoutinotherpartsof theworldusingsimilarmethodology.
In these studies, participants are asked to record the con-tacts thatthey have madeover a single dayand classify them usinganumberofcharacteristics(suchasphysical/non-physical, long/short,home/school/worketc.)Sinceitisunrealisticto mea-sureeffectivecontacts(i.e.contactsthatcantransmitaparticular infection)betweenindividualsdirectly,someformofself-reported socialbehavioursuchasface-to-faceconversationorskin-to-skin contactisusedasaproxy.Itisassumedthattheagedistribution ofthesesocialcontactsisrelatedtotheagedistributionof effec-tivecontactsbyaconstantproportionalityfactor,anassumption referredtoasthe“socialcontacthypothesis”(Wallingaetal.,2006). Hencetheage-specifictransmissionmatrixiscompletelydescribed byestimating this factor.Subsequentanalyses (Melegaroet al., 2011)foundthatstratifyingtheWAIFWmatrixaccordingto dif-ferentcharacteristicsofcontacts(withadifferentproportionality constantforeachtypeofcontact)significantlyimprovedthe good-nessoffitofthemodelstoserologicalmarkersofpastinfectionfor respiratoryinfections.Thisimpliesthatdifferentcharacteristicsof socialcontactcontributedifferentlytoinfectiontransmission.
A previous study explored the use of formal clustering algorithms to group POLYMOD survey respondents based on the number and location of their contacts (Kretzschmar and Mikolajczyk,2009).Thestudyfoundthatrespondentsacross dif-ferentcountriesfellintoasimilarrangeofcontactprofiles,withthe work,schoolandhouseholdcontactprofilesmostcommon. How-ever,thisdoesnottelluswhethertherearecertaintypesofcontacts (ratherthanrespondents)whichmaybeparticularlyrelevantfor thetransmissionofparticularinfectiousdiseases.Furthermore,the relevantclustersofcontactsmaybedisease-specific,sincedifferent infectionshaveseparateroutesoftransmission.
Henceanalternativeapproachwouldbetoexplorewhatkindof contacts(ratherthanrespondents)aremostrelevanttoinfection transmission.Theonlypreviousworkinthisareahasfocusedon singledimensionsofcontacts(Melegaroetal.,2011).Thissimply indicatedthat“intimate”(i.e.physical,home,long-durationand frequent)contactsarebetterabletoexplainage-dependent pat-ternsintheacquisitionofserologicalmarkersforvaricella-zoster virus(VZV)andparvovirusB19,thetwoinfectionexaminedinthe study.Sincethedimensionsofintimacyarehighlycorrelated,it maybemoreinformativetotakeallthecharacteristicsofsocial con-tactsintoaccountcollectivelywhenstratifyingtheWAIFWmatrix. Inparticularcontactsmadebydifferentrespondentscouldbe groupedintoclusters,andthenexaminedtoidentifywhichclusters bestexplainpatternsofinfectionacquisitioninthecorresponding populations.Here,weexploretheuseofclusteringalgorithmsto determineclustersofsocialcontactswhicharesimilartoeachother basedonmulti-dimensionalcharacteristicsofsocialcontacts.
Arangeofclusteringalgorithmshavebeendevelopedsuchas hierarchicalclustering,partitioningclusteringandlatentclass clus-tering(Everittetal.,2011).Inhierarchicalclustering,eachelement isplottedonagraphtodeterminetheoptimalnumberofclusters. Whenthenumberofelementsbecomesverylarge(suchasthe thousandsofcontactsforeachcountryinthePOLYMODsurvey), thisgraphicalmethodbecomesverycumbersome.Inlatentclass clustering,amultivariatedistributionisimposedonthedataand thevalidationoftheclusteringresultsdependsonseveral assump-tionssuchastheparametricformofthemultivariatedistribution, thedependencybetweenvariablesandtheapproximationof like-lihoodestimation.Sincesocialcontactdataincludedifferenttypes
ofvariables(binaryandordinal),someofwhicharestructured,it isdifficulttoidentifyanappropriatemultivariatedistributionto describethedata.Henceweusedpartitioningclustering,andin particularthekmedoidsmethodtoclassifycontacts.
We then investigate whether these clusters enhance our understanding of transmission patterns by using them to fit a transmission dynamic model to age-dependent patterns in the acquisition of varicella-zoster virus serological markers, as an exampleofachildhoodrespiratoryinfectionwithclear,long-lived markersofpastinfectionandnovaccinationhistoryatthetimeof datacollection.
2. Methods 2.1. Datasources
Age-specific contact matrices were constructed using social contactdatafromparticipantsofthePOLYMODproject(Mossong etal.,2008)livinginPoland(15808contacts,1003participants), Great Britain(11052contacts,996participants),Belgium (8810 contacts,747participants),Finland(10319contacts,973 partici-pants)andItaly(15788contacts,842participants).Contactswith anymissing information werediscarded, so ourdataset differs slightly frompreviousanalyses(Melegaroetal., 2011).Contact matriceswereadjustedforpopulationsizeandreciprocityusing well-describedprocedures(Melegaroetal.,2011;Wallingaetal., 2006).
ModelswerefittedtodataonthepresenceofantibodiestoVZV fromserumsamplescollectedin1996from1300participantsaged 0–19inPoland,2091participantsaged0–20inEnglandandWales (fittedtoGreatBritishcontacts),2760participantsaged0–39in Belgium,2500participantsaged0–79inFinlandand2517 par-ticipantsaged0–79inItaly(Vyseetal.,2004).Thesampleswere collectedfromunlinkedanonymised(apartfromage)residualsera followingmicrobiologicalorbiochemicalinvestigations(Osborne et al., 2000), and tested as partof the European Commission-fundedsecondEuropeanSero-epidemiologicalNetwork(ESEN2) (Melegaroetal.,2011;Nardoneetal.,2007).Childrenunder5years oldwereoversampledwiththesamplesizeineachagegroup rang-ingfrom117to192.For those aged5–20 yearsapproximately 100seraineachone-yearagegroupweretested.Allserological testsforVZV-specificIgGwereperformedatPrestonPublicHealth LaboratoryusingacommercialELISAassay.
Populationdatawereobtainedfromnationalstatisticsoffices inthefivecountriesconsideredasinpreviousanalyses(Melegaro etal.,2011).
2.2. Clusteranalysisofsocialcontactdata
Aclusterisdefinedasagroupofsocialcontactswhose mem-bers aremoresimilartoeach otherthan tonon-members.The similarity of two contacts is defined using the Manhattan dis-tancemeasure betweenthetwo (Everittet al.,2011),based on four contact characteristics: proximity (physical, non-physical), duration(<5min,5–15min,15minto1h,1–4h,>4h),frequency (daily, weekly, monthly, a few times a year, first time) and location (home, work, school, leisure, transport, other). Vari-ables representingeachcharacteristic wererecodedsothatthe distance between any two contacts lies between 0 and 1, so theManhattan distance becomes theGower’s similarity coeffi-cientwhichisanappropriateproximitymeasurementformixed data (Gower, 1971) (see Appendix A1 for details in Supple-mentary data). Contacts recorded as taking place in multiple locationswereassignedtoasinglelocationbasedonthefollowing hierarchy: home>work>school>leisure>other>transport. The
hierarchywasbasedontheputativedurationofcontactsinthe givensettingsassuggestedbypreviousinvestigators(Kretzschmar andMikolajczyk,2009).Socialcontactswereclassifiedinto clus-tersusingthePartitioningAroundMedoids(PAM)algorithm,the mostcommonrealisationofk-medoidsclustering.Thisapproach pre-defines a fixednumberof clusters, selectstypical elements ascentroidsofeachclusterandassignsotherelementstoa clus-ter according to their distance to the centroid (Kaufman and Rousseeuw,1990).
Thenumberofclusterswasvariedbetween2and15,withthe upperlimitincludedtoavoidpossibleproblemswithdatasparsity. TheoptimalnumberofclustersforthePAMalgorithmwasthen determined usingaverage silhouette width (ASW),which mea-sureshowwelleachcontactisassigned toitscluster(Kaufman andRousseeuw,1990).Replicationanalysis(Breckenridge,2000; Walesiaketal.,2008)wasperformedtoassesstherobustnessof theclusteringresults,bysplittingthedatasetintoseveralsubsets andusingthesamealgorithmseparatelyforeachsubsettocompare classificationagreement.
2.3. Fittingdynamictransmissionmodels
We constructed a realistic age-structured (Schenzle, 1984) susceptible-infected-recovered(SIR)dynamicmodelofVZV trans-mission(withnonaturalmortalityassumed).Inthis model,the nextgenerationmatrixrepresentingthepotentialnumberof infec-tiontransmissioneventsperperson,iscalculatedastheproductof the(adjusted)age-dependentcontactmatrix,aproportionality fac-torqrepresentingtheproportionofcontactsthatcanpotentially leadtoanewinfectionandvectorrepresentingthesizeofthe sus-ceptiblepopulationineachagegroup.Foreachclusteriofcontacts (1≤i≤NwhereNisthetotalnumberofclusters),wegenerated NseparatecontactmatricesCiwithcorrespondingproportionality factorsqi.Hencethenextgenerationmatrixisthelinear combi-nationofallcontactmatricesC1q1w+...+CNqNwwherewisa
verticalvectorofthepopulationsizeineachagegroup.Thebasic reproductionR0wascalculatedastheprincipaleigenvalueofthe
nextgenerationmatrix.
A grid search algorithm was used to find the qi’s which maximised the binomial likelihood of the model given sero-epidemiological data for VZVinfections (Melegaroet al., 2011; Ogunjimietal.,2009).Wedefinedrelevantclustersasthosewith correspondingbestfittingqi’sthatwerenon-zero,andhencewhich contributedtothenextgenerationmatrix.Aniterativemethodwas usedtoobtaintheproportionofimmunesateachagegroupasa functionoftheqi’s.Qualityofmodelfitwithdifferentnumbersof clusterswasmeasuredusingtheAkaikeInformationCriterion(AIC), itssmall-sample-size corrected version (AICc)and the Bayesian InformationCriterion(BIC).Goodnessoffitwasalsocomparedto modelswithcontactsstratifiedbyasinglecharacteristiconly. 2.4. Uncertaintyintervals
Uncertaintyintervalsfortheproportionalityfactorqandbasic reproductionnumberR0 wereestimatedbybootstrapsampling
frombothsocialcontactdataandsero-epidemiologicaldatawith thesamesamplesizeastheoriginaldata(Melegaroetal.,2011). For social contact data, bootstrap samples were generated by re-samplingparticipants’identitynumbers.Socialcontact matri-ces were then estimated for each bootstrap sample. For the sero-epidemiologicaldata,bootstrapsamplesweregeneratedby randomlydrawingfromaBernoullidistributionwithprobability ofsuccessequaltotheobservedseroprevalenceforthespecificage group.Afterthis,thenewlygeneratedsocialcontactmatricesand sero-epidemiologicaldatawerematched torepeatedly estimate thetransmissionparameters.
AllanalyseswereperformedinRversion2.15.2(RCoreTeam, 2012),usingtheclusterandclusterSimpackages.
2.5. Sensitivityanalysis
Toexplorewhetherbetween-countrydifferencesinfitted mod-els were due to heterogeneity in the age ranges over which seroprevalencedatawasavailable,werepeatedouranalysisusing onlyseroprevalencedatain therange 0–20years(whichis the maximumrangeforwhichdatawereavailableinallfivecountries). 3. Results
3.1. Optimalnumberofclusters
Evaluatingtheoptimalnumberofclustersinvolvesthree com-ponents:thequalityoftheclustering(measuredusingASW),how welltheresultingclustersfitthetransmissionmodelofVZV sero-prevalence(measuredusingAIC,AICcandBIC)andthenumberof clustersthatareactuallyrelevanttothismodelfit(Fig.1).
Inallcountries,ASWdecreasedbetween2and5clusters (indi-catingworseclusteringquality),probablybecausemostcontact characteristicshaveeither 2(physical/non-physical)or 5(other characteristics)levelsofencoding.After5clusters,theASW grad-uallyincreased,withtheincreasesoccurringatthepointswhere thenumberofrelevantclustersincreased.Thethreemeasuresof goodnessoffittoVZVseroprevalencedata(AIC,AICcandBIC)were highlyconsistentwitheachother.
Wedeterminedtheoptimalnumber ofclustersby maximiz-ingbothqualityoftheclusteringaswellasgoodnessoffittothe pointwherefurtherincreasesinthenumberofclusterswouldnot makenoticeablechangesinthequalityofclustering,modelfitsor thenumberofrelevantclusterssubstantially.Theoptimalnumber ofclustersis12inGreatBritain,14 inItaly,15 inBelgium,8in Finlandand7inPoland.TherelatedcorrectedRandindexforthe clusteringineachcountryrangesfrom89%to95%,whichsuggests thattheclusteringisrobust(Breckenridge,2000).TheRandindex representstheaverageagreementbetweentwodataclusteringin multiplereplications(withtenreplicationsusedinthisanalysis) ifcontactdataarerandomlydividedintotwosamplesandeach sampleisclusteredintosamenumberofclustersseparately. 3.2. Comparisonsbetweendifferentstratifications
Whencontactdataarestratifiedbyclusters,only2–3clusters ineachcountryarefoundtoberelevanttoVZVtransmission(i.e. havenon-zeroqi’swhenthemodelisfittedtoVZVserology). Pre-viousworkstratifyingcontactsbysinglecharacteristicsuggested thatmoreintimatecontacts(physical,home/school,frequent,long duration)bestexplainVZVtransmission(Melegaroetal.,2011). Whenclusteringcontactsusingmultiplecharacteristics,themost relevantclustersarestilllikelytocontainmoreintimatecontacts. However,thepictureismorenuancedandvariesacrossthefive countries.
DetailsofthemostrelevantclustersaregiveninTable2,Fig.2, Fig.3andAppendixA.2(Supplementarydata).Briefly,inFinland andItaly,thecountrieswiththemoststronglyage-assortative over-allcontact matrices,three contact clustersare relevant.Overall assortativityisdrivenbytwoclustersdominatedbydailyschool contacts:onephysicalandonenon-physical.Athirdcluster, domi-natedbynon-physicalleisurecontactsinFinlandandphysicalother contactsinItaly,isalsoassortativebutlesssothantheschool con-tacts.Inbothcountriestherearetwoclusterswithhomecontacts whicharenotfoundtoberelevant.
InGreatBritain,BelgiumandPoland,onlytwoclustersare rel-evant:oneconsistingmainlyofhighlyassortativedailyphysical
Fig.1.Qualityofclustering(measuredusingASW),qualityoffit(measuredusingAIC,AICcandBIC)andnumberofrelevantclustersformodelsineachcountrywithdifferent numberofclusters.Verticaldashedlineshowsthemodelwiththeoptimalnumberofclusters.
Fig.2.ThedistributionsofthecharacteristicsofcontactforeachclusterinGreatBritain.Eachclustercorrespondstoonenumbersbelowthex-axis.Theredstarmark indicatestherelevantclusterwhoseestimatedqisnotequaltozero.ResultsforothercountriesaregiveninAppendixA2(Supplementarydata).(Forinterpretationofthe referencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.)
schoolcontacts,andthesecondoflessassortativephysicalhome contacts.The home clustersshow a strong secondary diagonal probablyrepresentinginter-generationalcontacts,particularlyin Poland.Inallthree countries,there arenon-relevant homeand schoolclustersmoredominatedbylonger(>4h)contactsthanthe homeandschoolclustersactuallyfoundrelevant.
Table1comparesthebestfittingmodelwiththeoptimal num-berofclustersineachcountry,withmodelsofcontactsstratifiedby singlecharacteristicsasusedinpreviousanalyses(Melegaroetal., 2011).Ineverycountry,amodelbasedonclusteringcontactsusing multiplecharacteristicshasabetterqualityfit(measuringusing AIC)thanmodelswithcontactsstratifiedbyasingle characteris-tics.However,theimprovedfitisattheexpenseofpoorermodel
Table1
Modeloutcomesfordifferentstratificationsofsocialcontactdata.Paranthesesgive95%intervalforthebootstrapdistributionsofqandR0.
GreatBritain Italy Belgium Finland Poland
95%CI 95%CI 95%CI 95%CI 95%CI Allcontacts q 0.057 0.048–0.066 0.033 0.027–0.039 0.091 0.068–0.114 0.076 0.065–0.089 0.057 0.046–0.071 R0 4.65 3.95–5.50 4.59 3.74–5.64 7.72 5.81–10.31 5.84 4.95–7.01 6.87 5.49–8.90 AIC 264 295 277 194 181 Byproximity q(physical) 0.090 0.073–0.107 0.043 0.026–0.052 0.114 0.083–0.142 0.030 0.000–0.095 0.078 0.058–0.095 q(non-physical) 0.000 0.000–0.011 0.002 0.000–0.036 0.000 0.000–0.036 0.154 0.056–0.221 0.000 0.000–0.055 R0 3.70 3.29–4.40 3.55 3.12–4.71 5.74 4.46–7.64 8.34 5.78–11.19 5.47 4.47–7.43 AIC 202 273 253 182 158 Byduration q(<15min) 0.000 0.000–0.000 0.000 0.000–0.020 0.000 0.000–0.001 0.000 0.000–0.287 0.000 0.000–0.121 q(15–60min) 0.000 0.000–0.000 0.000 0.000–0.130 0.000 0.000–0.535 0.000 0.000–0.117 0.000 0.000–0.000 q(>1h) 0.081 0.060–0.096 0.043 0.018–0.050 0.121 0.026–0.148 0.107 0.042–0.131 0.079 0.027–0.103 R0 3.87 3.28–5.63 3.89 3.37–5.20 5.39 4.15–9.20 4.51 3.97–9.01 5.27 4.31–8.15 AIC 235 285 250 186 186 Bylocation q(home) 0.185 0.092–0.213 0.021 0.000–0.074 0.105 0.000–0.305 0.044 0.000–0.131 0.224 0.082–0.326 q(school) 0.000 0.000–0.111 0.044 0.023–0.060 0.154 0.000–0.298 0.142 0.068–0.191 0.025 0.000–0.049 q(work) 0.000 0.000–0.000 0.000 0.000–0.012 0.000 0.000–0.000 0.000 0.000–0.000 0.000 0.000–0.000 q(others) 0.000 0.000–0.337 0.008 0.000–0.065 0.000 0.000–0.120 0.026 0.000–0.179 0.000 0.000–0.229 R0 5.11 3.93–8.93 3.93 3.36–5.07 4.97 3.98–8.54 6.15 4.35–8.56 7.63 6.07–10.77 AIC 192 281 251 183 108 Byfrequency q(daily) 0.029 0.000–0.056 0.039 0.025–0.046 0.162 0.076–0.220 0.120 0.051–0.144 0.089 0.062–0.115 q(weekly) 0.155 0.057–0.268 0.000 0.000–0.094 0.000 0.000–0.129 0.015 0.000–0.405 0.000 0.000–0.279 q(lessthanweekly) 0.000 0.000–0.000 0.000 0.000–0.085 0.000 0.000–0.101 0.000 0.000–0.082 0.000 0.000–0.000
R0 4.62 3.68–6.60 4.19 3.50–5.33 4.83 3.80–7.50 5.35 4.60–7.17 5.53 4.03–9.82 AIC 166 282 231 183 187 Byclusters Clusters 12 14 15 8 7 Relevantclusters 2 3 2 3 2 q(firstcluster) 0.565 0.000–0.723 0.071 0.000–0.225 0.204 0.000–0.336 0.303 0.000–0.371 0.246 0.000–0.323 q(secondcluster) 0.085 0.000–0.178 0.127 0.000–0.202 0.443 0.000–0.948 0.183 0.000–0.329 0.081 0.000–0.129 q(thirdcluster) 0.064 0.000–0.105 0.030 0.000–0.357 R0 3.79 2.79–9.44 4.68 2.92–5.36 4.49 3.22–8.25 6.19 4.92–23.29 7.61 5.91–32.33 AIC 160 260 226 180 106 Randindex 0.89 0.91 0.91 0.95 0.90 Table2
Characteristicsofthemostrelevantclustersineachcountry.
Country Cluster Location Physical Frequency Duration Assortativity Importance
Great Britain
1(home) Home Y Mixed Mixed – ++
2(school) School Y Daily Mixed + +
Belgium 1(home) Home Y Daily Mixed – –
2(school) School Y Daily Mixed + +
Poland 1(home) Home Y Daily Mixed – ++
2(school) School(mostly) Y Daily(mostly) Mixed + +
Finland 1(schooltouch) School Y Daily(mostly) Mixed ++ +
2(schoolnon-touch) School N Daily(mostly) Mixed ++ –
3(leisure) Leisure N Mixed Mixed + +
Italy 1(schooltouch) School Y Daily(mostly) Mixed ++ +
2(schoolnon-touch) School N Daily(mostly) Mixed ++ +
3(other) Other Y Mixed Mixed + –
estimations(largeruncertaintyintervals)aroundtheestimatedR0.
Themodelsbasedonclusterswithmultiplecharacteristicsestimate
thatR0rangesfrom3.79(95%range2.79–9.44)inGreatBritainto
7.61(95%range5.91–32.33)inPoland.Thepointestimatesare
sim-ilartoestimatesusingsinglecharacteristicsandconsistentwitha
rangeof3–8intheliterature(Goeyvaertsetal.,2010;Iozzietal.,
2010;Melegaroetal.,2011;Ogunjimietal.,2009).
Fig.4showsmodelpredictionsofVZVseropositivityusingthe optimalnumberofclusterscomparedthemwiththetrue
propor-tionofseropositivesamples.Onthewholemodelpredictionsfit thesero-epidemiologicaldatawell.
3.3. Sensitivityanalysis
Whenonlyseroprevalencedatafor 0–20yearoldsisusedin Belgium, Finland and Italy to ensure comparability with Great BritainandPoland,theoverallresultsarenotgreatlyaltered.The samethreehighlyassortativeclusters(dominatedbytwoclusters
Fig.3.Thecorrectedage-specificcontactmatricesforthetwoidentifiedrelevantclusters(theclusternumber1and5)withsamplingweightingandreciprocalcorrection.
ofschoolcontacts)arerelevantinItaly.InFinland,twoassortative clusters(dominatedbyschoolandleisurecontactsrespectively) arerelevant,sothereislittleoverallchangeinage-assortativity.
InBelgium,there isa slightshiftasthehighlyassortative clus-terdominatedbyschoolcontactsisdroppedbutalessassortative clusterdominatedbyhomecontactsremainsrelevant,sooverall
Fig.4.ProportionsofsamplepositiveforVZVantibodiesbyage(bluecirclewithsizeproportionaltosamplesize),andmodelpredictionsofprevalence(reddashline)by age.Thegreylinesgivethe95%bootstrapinterval.(Forinterpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.)
thecontactmatrixissomewhatlessassortative.Fulldetailsarein AppendixA3(Supplementarydata).
4. Discussion
Understandingsocialcontactratesisessentialtomost realis-ticmodelsofinfectiousdiseasetransmission.Contactstudiessuch asPOLYMODanditssuccessorshaveadvancedthefieldby allow-ingsuchratestobemeasuredempiricallyratherthanassumed. However,infectiontransmissioneventscannotusuallybedirectly observedsoassumptionshavetobemadeaboutthecontribution ofdifferenttypesofreportedcontactsusedasproxiestocontacts leadingtotransmission.Wehaveconductedthefirststudyto inves-tigatethe contactsthat best explain past infectionstatus data, takingintoaccountmorethanonecharacteristicatatimeusing clusteringalgorithms.
Whenthesereportedcontactsare clusteredusingclustering algorithms,onlyafewoftheseclustersarefoundtoberelevantto VZVtransmission.VZVtransmissioninallfivecountriesis dom-inatedbya groupofhighlyassortative contactsanda groupof lessassortativecontacts.However,thebalancebetweenthetwo groupschangesacrosscountries:theformergroupdominatesin FinlandandItaly,whilethelattergroupdominatesinPoland.Great BritainandBelgiumlieinbetween.Thesecountrycharacteristics arerobusttoconstrainingtheVZVdatausedtofitmodelstodata
from0to20yearoldssotheydonotappeartobedrivenbycountry differencesintheavailabilityofVZVseroprevalencedata.
Interestingly,althoughschoolcontactsappeartoplayakeyrole acrossallfivecountriesinprovidingtheassortativenessincontacts, non-assortativenessincontactsderivefromhome,leisureorother contacts.ThismatchesinformationfromVZVsurveillancein Eng-landthatsuggestsschoolandpreschoolchildrenplayakeyrolein transmission(Brissonetal.,2001).Thisexplanationhasface valid-itysinceVZVistransmittedbetween,aswellaswithin,households (OrganisationforEconomicCooperationandDevelopment,2008). Ofnote,theproportionofchildren4yearsandunderenrolledin educationin2006(whenthePOLYMODcontactstudywas con-ducted)ishighestinBelgium,followedbyItaly,UnitedKingdom, FinlandandfinallyPoland.Thisexactlymatchestheorderinwhich assortativecontactsdominatetheclustersrelevanttoVZV trans-missionin thefive countries, withtheexception of Finland.In Finland,althoughformaleducationstartslate(atage7),thereis highenrolmentinpubliclysubsidisedday-carewhichisnot cap-turedinstandardisedinter-countrystatistics.
Thequalityoffitformodelsbasedonclustersstratifiedby mul-tiplevariableswasonlyslightlybetterthanforpreviousmodels basedoncontactsstratifiedbyasinglevariablealone.The simi-laritybetweenfitqualitywithsingleandmultiplestratifications ofcontactssuggeststhatasinglecharacteristiclargelysucceedsin capturingthedichotomybetweenrelevantandnon-relevant con-tacts.Thisisprobablybecausecontactscharacteristicsarehighly
associatedwitheachother(e.g.contactsthatare physical,long duration,homeandfrequentatthesametimeare disproportion-atelyrepresentedamongallcontacts).Also,byconsideringseveral contactcharacteristicssimultaneously,ourclusteringmethodgives insightsintotheinfectiontransmissionprocessthatmaynotbe seenwhenstratifyingbyasinglecharacteristic.Forinstance,we candeterminetherelevanceofshortdurationphysicalcontactsfor whichsinglecharacteristicstratificationwouldproduce conflict-ingconclusionsdependingonwhetherthestratificationwasby durationorproximity.
Recentanalyseshaveshownthatusinganage-specific propor-tionalityfactorenablesmodelstobetterfitVZVserologyinmany Europeancountries(Goeyvaertset al., 2010; Santermanset al., 2015).Inouranalysis,wevaryproportionalityfactorsaccording toother contact characteristics, but assume that theyare age-independentwithineach clusterof contacts.Thiswasdonefor simplicity and to allow direct comparison withprevious work (Melegaroetal.,2011)butmayhaveresultedinapoorerfittodata. 5. Conclusions
Usingthekmedoidsclusteringalgorithmstoidentifyclustersof contactsrelevanttoVZVtransmissiongivessimilarmodelfit qual-ityandR0 valuesascontactsstratifiedbyasinglecharacteristic.
However,consideringseveralcharacteristicssimultaneously pro-videskeyinsightsintothetypeofcontactsthataremostrelevantto infectiontransmission,andthosethatseemnottoplayan impor-tantrole.Firstly,itconfirmsthatsingle-characteristicstratifications areabletoprovideoptimalfitstodata,probablybecausetheyare abletocapturethemostintimatecontactsresponsibleformost oftransmission.Secondly,wefindthatschool-agechildrenplaya keyroleinalmostallcontactsrelevanttoVZVtransmissionacross fivecountries,andcontactsatschoolinparticulararemost impor-tantincountrieswithhighlyassortativecontactmatrices.Thiskind ofanalysismaythereforeprovideanalternativeorsupplementary approachtoprevioussinglecharacteristicstratificationmodels,as wellasvalidationforpreviousmethodsofcontactstratification. Extendingsuchanalysestoothercountriesandinfectionsmaybe usefulforfuturework.
Conflictsofinterest None.
Authorcontributions
MJandXXdesignedthestudy.XXcarriedouttheanalysis,wrote thecodes and drewtheconclusions withinputfromMJ, AJvH, MGKandAM.XXandMJwrotethefirstdraftofthemanuscript. Allauthorscontributedtocriticalrevisionsofthemanuscriptand approvedthefinalarticle.
Acknowledgments
We thank John Edmunds, Mirjam Kretzschmar and Rafaeal Mikolajczyk for helpful discussions. This work was supported bytheNational Institutefor HealthResearch Health Protection ResearchUnit(NIHRHPRU)inImmunisationattheLondonSchool ofHygieneandTropicalMedicineinpartnershipwithPublicHealth England(PHE).Theviewsexpressedarethoseoftheauthorand notnecessarily those of theNHS,theNIHR, theDepartment of HealthorPublicHealthEngland.Researchleadingtotheseresults hasreceivedfundingfromtheEuropeanResearchCouncilunder theEuropeanUnion’sSeventhFrameworkProgramme (FP7/2007-2013)/ERCGrantagreement283955(DECIDE)toAM.Thefunders
didnotinfluencestudydesign,collection,analysisand interpre-tationofdata,writingofthisreport,orthedecisiontosubmitfor publication.
AppendixA. Supplementarydata
Supplementarydataassociatedwiththisarticlecanbefound,in theonlineversion,athttp://dx.doi.org/10.1016/j.epidem.2016.08. 001.
References
Beutels,P.,Shkedy,Z.,Aerts,M.,VanDamme,P.,2006.Socialmixingpatternsfor transmissionmodelsofclosecontactinfections:exploringself-evaluationand diary-baseddatacollectionthroughaweb-basedinterface.Epidemiol.Infect. 134,1158–1166,http://dx.doi.org/10.1017/S0950268806006418.
Breckenridge,J.,2000.Validatingclusteranalysis:consistentreplicationand symmetry.MultivariateBehav.Res.35,261–285.
Brisson,M.,Edmunds,W.J.,Law,B.,Gay,N.J.,Walld,R.,Brownell,M.,Roos,L.L., Roos,L.,DeSerres,G.,2001.Epidemiologyofvaricellazostervirusinfectionin CanadaandtheUnitedKingdom.Epidemiol.Infect.127,305–314.
Edmunds,W.J.,O’Callaghan,C.J.,Nokes,D.J.,1997.Whomixeswithwhom?A methodtodeterminethecontactpatternsofadultsthatmayleadtothe spreadofairborneinfections.Proc.Biol.Sci.264,949–957,http://dx.doi.org/ 10.1098/rspb.1997.0131.
Everitt,B.S.,Landau,S.,Leese,M.,Stahl,D.,2011.Clusteranalysis.In:WileySeries inProbabilityandStatistics.Wiley,Chichester,UK,http://dx.doi.org/10.1007/ BF00154794.
Goeyvaerts,N.,Hens,N.,Ogunjimi,B.,Aerts,M.,Shkedy,Z.,Damme,P.,VanBeutels, P.,2010.Estimatinginfectiousdiseaseparametersfromdataonsocialcontacts andserologicalstatus.J.R.Stat.Soc.Ser.CAppl.Stat.59,255–277,http://dx. doi.org/10.1111/j.1467-9876.2009.00693.x.
Gower,J.C.,1971.AGeneralCoefficientofSimilarityandSomeofItsProperties. Biometrics27,857,http://dx.doi.org/10.2307/2528823.
House,T.,Baguelin,M.,VanHoek,A.J.,White,P.J.,Sadique,Z.,Eames,K.,Read,J.M., Hens,N.,Melegaro,A.,Edmunds,W.J.,Keeling,M.J.,2011.Modellingthe impactoflocalreactiveschoolclosuresoncriticalcareprovisionduringan influenzapandemic.Proc.Biol.Sci.278,2753–2760,http://dx.doi.org/10.1098/ rspb.2010.2688.
Iozzi,F.,Trusiano,F.,Chinazzi,M.,Billari,F.C.,Zagheni,E.,Merler,S.,Ajelli,M.,Del Fava,E.,Manfredi,P.,2010.LittleItaly:anagent-Basedapproachtothe estimationofcontactpatterns-fittingpredictedmatricestoserologicaldata. PLoSComput.Biol.6,e1001021,http://dx.doi.org/10.1371/journal.pcbi. 1001021.
Kaufman,L.,Rousseeuw,P.J.,1990.FindingGroupsinData:AnIntroductionto ClusterAnalysis(WileySeriesinProbabilityandStatistics).JohnWiley&Sons.
Keeling,M.J.,Rohani,P.,2008.ModelingInfectiousDiseasesinHumansand Animals.PrincetonUniversityPress,http://dx.doi.org/10.1038/453034a. Kretzschmar,M.,Mikolajczyk,R.T.,2009.Contactprofilesineighteuropean
countriesandimplicationsformodellingthespreadofairborneinfectious diseases.PLoSOne4,e5931,http://dx.doi.org/10.1371/journal.pone.0005931. Melegaro,A.,Jit,M.,Gay,N.,Zagheni,E.,Edmunds,W.,2011.Whattypesof
contactsareimportantforthespreadofinfections?:usingcontactsurveydata toexploreEuropeanmixingpatterns.Epidemics3,143–151,http://dx.doi.org/ 10.1016/j.epidem.2011.04.001.
Mossong,J.,Hens,N.,Jit,M.,Beutels,P.,Auranen,K.,Mikolajczyk,R.,Massari,M., Salmaso,S.,Tomba,G.S.,Wallinga,J.,Heijne,J.,Sadkowska-Todys,M.,Rosinska, M.,Edmunds,W.J.,2008.Socialcontactsandmixingpatternsrelevanttothe spreadofinfectiousdiseases.PLoSMed.5,e74.
Nardone,A.,deOry,F.,Carton,M.,Cohen,D.,vanDamme,P.,Davidkin,I.,Rota, M.C.,deMelker,H.,Mossong,J.,Slacikova,M.,Tischer,A.,Andrews,N.,Berbers, G.,Gabutti,G.,Gay,N.,Jones,L.,Jokinen,S.,Kafatos,G.,deAragón,M.V.M., Schneider,F.,Smetana,Z.,Vargova,B.,Vranckx,R.,Miller,E.,2007.The comparativesero-epidemiologyofvaricellazostervirusin11countriesinthe Europeanregion.Vaccine25,7866–7872,http://dx.doi.org/10.1016/j.vaccine. 2007.07.036.
Ogunjimi,B.,Hens,N.,Goeyvaerts,N.,Aerts,M.,VanDamme,P.,Beutels,P.,2009. Usingempiricalsocialcontactdatatomodelpersontopersoninfectious diseasetransmission:anillustrationforvaricella.Math.Biosci.218,80–87,
http://dx.doi.org/10.1016/j.mbs.2008.12.009.
OrganisationforEconomicCooperationandDevelopment.(2008).Educationata Glance,Statistics/EducationataGlance/2008:OECDIndicators[WWW Document].URL http://www.oecd-ilibrary.org/education/education-at-a-glance-2008eag-2008-en(accessed7.26.16).
Osborne,K.,Gay,N.,Hesketh,L.,Morgan-Capner,P.,Miller,E.,2000.Tenyearsof serologicalsurveillanceinEnglandandWales:methodsresults,implications andaction.Int.J.Epidemiol.29,362–368.
RCoreTeam.(2012).R:Alanguageandenvironmentforstatisticalcomputing. Santermans,E.,Goeyvaerts,N.,Melegaro,A.,Edmunds,W.J.,Faes,C.,Aerts,M.,
Beutels,P.,Hens,N.,2015.Thesocialcontacthypothesisundertheassumption ofendemicequilibrium:elucidatingthetransmissionpotentialofVZVin Europe.Epidemics11,14–23,http://dx.doi.org/10.1016/j.epidem.2014.12.005.
Schenzle,D.,1984.Anage-structuredmodelofpre-andpost-vaccinationmeasles transmission.IMAJ.Math.Appl.Med.Biol.1,169–191.
Thorrington,D.,Jit,M.,Eames,K.,2015.Targetedvaccinationinhealthyschool children—canprimaryschoolvaccinationalonecontrolinfluenza?Vaccine33, 5415–5424,http://dx.doi.org/10.1016/j.vaccine.2015.08.031.
Vynnycky,E.,White,R.,2010.AnIntroductiontoInfectiousDiseaseModelling. OxfordUniversityPress,Oxford.
Vyse,A.J.,Gay,N.J.,Hesketh,L.M.,Morgan-Capner,P.,Miller,E.,2004.
SeroprevalenceofantibodytovaricellazostervirusinEnglandandWalesin childrenandyoungadults.Epidemiol.Infect.132,1129–1134.
Walesiak,M.,Dudek,A.,Dudek,M.(2008).clusterSim:Searchingforoptimal clusteringprocedureforadataset.
Wallinga,J.,Teunis,P.,Kretzschmar,M.,2006.Usingdataonsocialcontactsto estimateage-specifictransmissionparametersforrespiratory-spread infectiousagents.Am.J.Epidemiol.164,936–944,http://dx.doi.org/10.1093/ aje/kwj317.