The effects of scheduling, workload type and consolidation scenarios on virtual machine performance and their prediction through optimized artificial neural networks


Contents lists available at ScienceDirect

The Journal of Systems and Software

journal homepage: www.elsevier.com/locate/jss


George Kousiouris a,∗, Tommaso Cucinotta b, Theodora Varvarigou a

a Dept. of Electrical and Computer Engineering, National Technical University of Athens, 9, Heroon Polytechniou Str., 15773 Athens, Greece

b Real-Time Systems Laboratory (ReTiS), Scuola Superiore Sant’Anna, Via Moruzzi, 1, 56100 Pisa, Italy

article info

Article history:
Received 25 February 2011
Received in revised form 6 April 2011
Accepted 6 April 2011
Available online 14 April 2011

Keywords:
Virtualization
Real-time scheduling
Cloud computing
Artificial neural networks
Genetic algorithms
Performance prediction

abstract

The aim of this paper is to study and predict the effect of a number of critical parameters on the performance of virtual machines (VMs). These parameters include allocation percentages, real-time scheduling decisions and co-placement of VMs when these are deployed concurrently on the same physical node, as dictated by the server consolidation trend and the recent advances in the Cloud computing systems. Different combinations of VM workload types are investigated in relation to the aforementioned factors in order to find the optimal allocation strategies. What is more, different levels of memory sharing are applied, based on the coupling of VMs to cores on a multi-core architecture. For all the aforementioned cases, the effect on the score of specific benchmarks running inside the VMs is measured. Finally, a black box method based on genetically optimized artificial neural networks is inserted in order to investigate the degradation prediction ability a priori of the execution and is compared to the linear regression method.

1. Introduction

During the recent years, server consolidation through the use of virtualization techniques and tools such as VMware (http://www.vmware.com/), XEN (http://www.xen.org/), KVM (http://www.linux-kvm.org/) is widely used in data centers or in new computing paradigms such as Cloud computing (Armbrust et al., 2010). Through this technique, applications with different characteristics and requirements can run inside virtual machines (VMs) on the same physical multi-core host, with increased levels of security, fault tolerance, isolation, dynamicity in resource allocation and ease of management.

However, a number of issues arise from this advancement in computing, like the degradation of the performance of the applications, due to the usage of the virtualization layer (Tikotekar et al., 2008). Except for the extra layer, a number of other parameters may also affect the performance in distributed and virtualized environments. These may be the percentage of CPU allocated to the VM and its effect on execution time and the scheduling granularity (the way this percentage is assigned) that may also affect the latter. What is more, consolidation decisions, whether these imply the assignment of VMs to the same core or neighbouring cores on the same physical node, may have an effect on application performance. In addition, different types of applications may have different interference effect on the co-scheduled ones on the same physical node. This may happen due to application internal characteristics, like type of calculations, usage of specific sub-circuits of the CPU, effect due to cache sharing, memory access patterns etc.

∗ Corresponding author. Tel.: +30 6939354121.
E-mail addresses: gkousiou@mail.ntua.gr, gkousiou@yahoo.com (G. Kousiouris), tommaso.cucinotta@sssup.it (T. Cucinotta), dora@telecom.ntua.gr (T. Varvarigou).

The degradation of performance in this case may lead to increased need for resources. However, the current Cloud business SPI model (Youseff et al., 2008) identifies discrete roles between the entity that offers a platform (PaaS provider) and the entity that offers the resources which this platform may use (IaaS or Cloud provider). This model, in conjunction with the aforementioned performance overhead, creates the following problem:

• A PaaS provider has estimated that X number of resources are needed for meeting the QoS features of the application inside the VM. This estimation will probably be made after observing the application behavior when running as standalone.

• The IaaS provider receives a request in order to reserve this X amount of resources. In an attempt to maximize profit, the VMs of the application will be consolidated with other VMs that are admitted in the infrastructure. The way this consolidation is performed is unknown to the PaaS layer. This does not mean that the IaaS provider will give less resources. The same amount as requested will be allocated.

• The performance degradation due to the aforementioned consolidation and according interference will be translated to decreased Quality of Service (QoS) levels for the application.

• The breach of confidence between the PaaS and IaaS provider is inevitable. The former will consider that the latter has deceived him/her, and that the allocation of resources was less than agreed. The latter will consider that the former made a poor estimation and is trying to transfer the responsibility to the infrastructure layer.

Therefore, it is imperative for an IaaS provider to be able to predict accurately the overhead that is produced by the consolidation strategy and therefore increase a priori the number of resources allocated to an application accordingly. This will not only lead to improved reputation but will enable the IaaS provider to optimize the assignment of VMs to hosts or even cores of one host, depending on the type of workload these VMs produce and the estimated interference of one type of workload over the other. By knowing which combinations result in less overhead, over-provisioning with regard to standalone execution can be minimized.

The aim of this paper is to take under consideration a wide range of parameters that affect application performance when running on consolidated virtualized infrastructures and quantify this overhead. Therefore, IaaS providers may perform intelligent decisions regarding the placement of VMs on physical nodes. This will mitigate the aforementioned problem and will aid the IaaS layer to minimize the need for over-provisioning. As indicative applications we consider the 6 standard Matlab benchmark tests, due to the fact that they represent a wide area of computational workload, from mathematical calculations to graphics processing, and are easily reusable by other researchers. The following aspects are investigated:

• Scheduling decisions for EDF-based (Earliest Deadline First) scheduling (budget Q over period P), both in terms of CPU share assigned (Q/P) and granularity of this assignment (P). This type of scheduling (Liu and Layland, 1973), which is very popular in the real-time world, is used in order to guarantee that the assignments of CPU percentages to the VMs will be followed and to ensure temporal isolation.

• Different combinations of benchmark tests, in order to discover the ones that show less interference (and therefore overhead), when running concurrently on the same node.

• Different placement decisions and how these affect the overhead (placement on the same core, in adjacent or non-adjacent cores in one physical node, compared with standalone execution). As adjacent cores we consider the ones that share a L2 cache memory.

• Ability to predict in advance these effects in order to proactively provision the resources needed by an application and be able to choose the suitable configuration that creates the least overhead. For achieving this, artificial neural networks (ANNs) are used, that are automatically designed and optimized through an innovative evolutionary (GA-based) approach. The topology of the networks is dynamic in terms of number of hidden layers, neurons per layer and transfer functions per layer and is decided through the optimization process of the GA.

The remainder of the paper is structured as follows. In Section 2, similar approaches in the related field are presented, while in Section 3 an extended analysis is made on the chosen parameters for investigation. Section 4 contains the description of the test-bed and the measuring process, while Section 5 presents the analytical results from the experiments. Section 6 details the approach used to predict the effect in advance and compares it to the multi-variate linear regression method, while Section 7 provides the overall conclusions from this study and intentions for the future.

2. Related work

In the past, we have investigated partly a number of the parameters that are considered here. In our previous works (Cucinotta et al., 2010a; Kousiouris et al., 2010), the main focus was given on the effect of scheduling parameters and % of CPU assignments on the QoS offered by specific applications (mainly interactive ones).

Given that virtualization technologies have become very popular during the recent years, a large number of interesting research efforts exist. In (Padala et al., 2007), the authors investigate the overhead that is inserted through the use of different virtualization tools, with the intention to discover the most efficient one. In (Makhija et al., 2006), the main focus is the scalability of virtual machines with mixed workloads and the effect on their performance when consolidated. The benchmarks are based on server applications. With the extended use of Cloud technologies, applications that are envisioned to be part of their workload may have more complicated workloads rather than traditional data center ones. Such examples may be the case of the Digital Film Post-production application (http://www.irmosproject.eu/Postproduction.aspx) of the IRMOS project or the availability of mathematical programming and optimization tools like GAMS in Cloud infrastructures (http://support.gams-software.com/doku.php?id=platform:aws). Furthermore, no temporal isolation between the virtual machines is used.

A very interesting approach, similar to our work, is Koh et al. (2007). In this case, the authors study the performance interference of combinations of elementary applications when running inside co-allocated VMs. Interesting score methodology is presented that can also be used for classifying applications. The workload characteristics of each application are gathered in order to be used as a comparison in the future for identifying unknown applications that are going to be inserted in the infrastructure and as predictors in a linear regression model. The main differences of our approach lie on the fact that we investigate a limited number of specific and generic benchmark tests, with the use of real time scheduling and a different prediction method based on ANNs. Especially the use of RT scheduling ensures the temporal isolation of VMs when running concurrently on the same CPU and the fair division of the resource percentages that each VM utilizes. In addition, we investigate different hardware setup interferences, such as running VMs on the same core or in adjacent cores in multi-core architectures and different setups in terms of granularity and percentage for each elementary application.

In Soror et al. (2008), a very promising approach for dynamically handling the resources on which concurrent VMs are executed is portrayed, in order to optimize system utilization and application response times. This is based on both offline recommendations and online corrections. However this work is centered around database applications. In Moller (2007), a design and validation methodology of benchmarks for virtual machines is presented, where also the concurrency effects are investigated. In Tickoo et al. (2010), a number of configurations is taken under investigation, like running on the same or adjacent cores, through hyper-threading or not. Experiments are focused on 3 tests and without taking under consideration scheduling parameters. Prediction methods are based on analytical formulas.

In Jin et al. (2008), VSCBenchmark is described, an analysis of the dynamic performance of VMs when new VMs are deployed. The authors investigate the behavior of the co-scheduled VMs for both launching and maintaining multiple instances on the same server. Boot time for a VM is a very resource consuming task, that may lead other co-scheduled tasks to resource starvation. In our work, this is avoided through the use of the real time scheduling that limits the resources assigned to a task (like a VM) to a predefined percentage. Thus temporal isolation is achieved between the concurrently running tasks.

In Weng et al. (2009), an interesting approach for scheduling VMs is presented, with two modes of operation (high-throughput and concurrent) which aims at efficiently handling CPU time for the phases where the VMs have multiple threads running concurrently. This approach is more focused on the scheduling aspects and does not take into consideration concurrent VMs running on the same server. The VirtualGEMS simulator for testing different configurations with regard to L2 setup between different concurrently running VMs appears in García-Guirodo et al. (2009).

A benchmarking suite specifically for VM performance is depicted in El-Refaey and Rizkae (2010). While it pinpoints some important features for benchmarking of virtualized applications, it mainly focuses on one type of workload (Linux kernel compilation). A survey of virtualization technologies in addition to performance results for different configurations appears in White and Pilbeam (2010). In Cucinotta et al. (2009, 2010b), an investigation is carried out on the possibility to enhance the temporal isolation among virtual machines concurrently running on the same core, by using the IRMOS real-time scheduler, focusing on compute-intensive and network-intensive workloads.

In Khalid et al. (2009) a very interesting approach for adaptive real-time management of VMs in High Performance Computing (HPC) workloads is presented. This mechanism is used during the operation of the HPC infrastructure, in which the VMs have a predefined running period which is decided by the infrastructure sharing policy. The main goal is to calculate in real time if the job that is included in the VM is going to be finished in time or not. The ones that will not anyway finish in time are prematurely killed in order to improve the success rate of the overall infrastructure. In our view, this work can be used in combination with the one presented in this paper. The models that are described here can be used for observing and minimizing the overhead of a specific allocation over the different nodes, while the work presented in Khalid et al. (2009) can be used for the runtime optimization of the virtualized infrastructure.

GA-based optimization of ANNs has been used in the past in a number of significant works. For example, in Leung et al. (2003), a number of the parameters of the network (like connection weights) are chosen through GAs. The main disadvantage of this approach lies on the fact that a static structure for the network is required. In other cases (like in Ilonen et al. (2003)) they are used for indirect optimization, through for example optimizing the training process. In Fiszelew et al. (2007), another approach combining GAs and ANN architecture is presented, however the structure of the network is limited to two hidden layers and static transfer functions, while experimenting with a different set of parameters such as number of nodes and their connections. The same applies for Giustolisi and Simeone (2006). In our approach, the networks may have a dynamic topology, in terms of number of hidden layers, number of neurons per layer and transfer functions per layer. Other parameters such as different training methods can be added at will.

3. Investigated parameters

In a general purpose operating system, when two tasks start to execute, they compete with each other for the underlying computational resources. The amount of the latter that is assigned to each task may be divided according to the temporal needs and demands of them. However, this results in an inability to predict the QoS offered by an application running inside a VM, in shared infrastructures. The percentage of CPU allocated to it may vary significantly depending on the co-located tasks demand (like the booting of another VM in Tickoo et al. (2010)). One solution to mitigate this effect is derived from the real time world. Through the use of EDF-based scheduling, the percentage assigned to a task may be guaranteed. In general, for this type of scheduling, there are two significant parameters, the budget Q and the period P. Q is the computational time on the physical node that is assigned to a task over a period P. Thus, the ratio of Q/P is the resulting percentage on the machine. However, for a given percentage, different granularities may exist, depending on P (for example, 50% may mean 50 ms every 100 ms or it may mean 250 over 500 ms). One interesting question is how these parameters affect application performance.
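The relation between budget, period and granularity can be made concrete with a short sketch. The function names below are illustrative only and are not part of the scheduler used in this work; the sketch simply shows that the same share Q/P can be delivered at different granularities.

```python
def cpu_share(budget_ms: float, period_ms: float) -> float:
    """CPU share Q/P guaranteed to a task that gets budget Q every period P."""
    if period_ms <= 0 or not 0 <= budget_ms <= period_ms:
        raise ValueError("need P > 0 and 0 <= Q <= P")
    return budget_ms / period_ms


def periods_per_second(period_ms: float) -> float:
    """Number of scheduling periods per second (smaller P means finer granularity)."""
    return 1000.0 / period_ms


# The two granularities from the example in the text, both giving a 50% share.
fine = cpu_share(50, 100)      # 50 ms every 100 ms
coarse = cpu_share(250, 500)   # 250 ms every 500 ms
print(fine, coarse, periods_per_second(100), periods_per_second(500))
```

Both calls return the same share, while the number of periods per second differs by a factor of five, which is the granularity dimension investigated in the experiments.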

As investigated in Cucinotta et al. (2010a), Kousiouris et al. (2010), the effect of real-time scheduling parameters may affect the application performance in a variety of ways. For example, in our aforementioned works, that investigate interactive applications, different granularities of the same CPU percentage assignments lead to increased standard deviation values for server response times. The conclusion from that analysis was that for interactive applications, low granularity (small scheduling period P) leads to more stable performance for the same percentage of CPU assigned. In this work, we focus on 6 computationally intensive applications. The effect of scheduling decisions (for both percentage of CPU allocated and granularity of this allocation) on their performance metrics is observed along with the interference of the co-allocation of VMs for the same CPU % assigned.

As identified in IRMOS project D5.1.1 (2009), the metric that is currently used by infrastructure providers in order to describe hardware capabilities is mainly CPU frequency. However this metric is far from sufficient, given that it does not reflect the physical node’s ability to solve problems (does not take under consideration other factors such as bus speed, memory speed, etc.). Other approaches such as the Berkeley dwarfs (http://view.eecs.berkeley.edu/wiki/Dwarfs) or Matlab benchmark tests are more able to capture a node’s computational capability. In this paper the second approach was chosen. The Matlab benchmarks (http://www.mathworks.com/help/techdoc/ref/bench.html) consist of 6 tests, that measure the physical node’s (and Matlab’s for this matter) ability to solve problems with different computational requirements. Therefore these tests are used for both determining the hardware computational capability (test score) and the characterization of the types of workloads (test number).

What is more, they represent a wide area of types of workloads (Table 1) with a limited number of tests and it is the reason the specific approach was chosen. For these, we investigate the effect of the CPU allocation and the effect of their combination in concurrently running VMs. It is expected that different combinations will have different effect, due to the nature of the computations. The degradation of the performance is reflected on the test score. Lower test scores indicate better performance. In general, the test score is the time needed to make the necessary calculations for each test.

Furthermore, different allocation scenarios are investigated for multi-core architectures. The VMs are pinned with a variety of ways on cores of the same physical node in order to examine the interference effect per scenario on their performance. This is expected to be reduced due to the temporal isolation when sharing the same core (with the exception of cache influence). However it is interesting to see what happens when the VMs that contain the individual tests are scheduled on different cores on the same physical node, thus running in real parallelism and contending for the same sub-circuits like the RAM bus.

Table 1
Details regarding the Matlab benchmark tests and the specific computational patterns that they represent (http://www.mathworks.com/help/techdoc/ref/bench.html).

Test 1    Floating point operations with regular memory accesses
Test 2    Floating point operations with irregular memory accesses
Test 3    Data structures
Test 4    Floating point and mixed integer operations
Test 5    2-D graphics

So, overall what is investigated is:

• different core % assignments for a single VM running an elementary test,

• different granularity of allocation assignments for a single VM running a test,

• mutual interference effect of combination of tests for
  ◦ concurrently running VMs on the same core (shared L1 and L2 cache memories) and for a variety of scheduling periods,
  ◦ concurrently running VMs on adjacent cores (with shared L2 cache) on the same physical node and for a variety of scheduling periods,
  ◦ concurrently running VMs on non-adjacent cores (non-shared L1/L2 cache) on the same physical node and for a variety of scheduling periods,

• ping times for different ping periods, % core assignments and scheduling periods of VMs in order to investigate the effect on network responsiveness.

4. Testbed

For the real time scheduler, the one that is presented in (Checconi et al., 2009) was chosen. The host of the experiments was a quad core (2.4 GHz) CPU, with 8 GB of RAM and 8 MB of L2 cache (divided in two 4 MB parts per core couple). Power management was switched off in order not to dynamically change the processor speed for power efficient operation. The host OS is Ubuntu Linux 10.10, the VM OS is Windows 7 (with Cygwin and OpenSSH) and the hypervisor is KVM. The Matlab executable containing the tests took as arguments the duration of the experiment, the number of test to run, an execution ID and the scheduling parameters (Q, P) for documentation purposes. It was compiled in Matlab R2007b. In order to be able to run individual tests and not the entire suite, the variation of the benchmark function that is provided in (http://www.mathworks.com/matlabcentral/fileexchange/11984) was used. This was combined with suitable logic in order to perform the tests for a specific period of time and not for a specific number of runs. The reason for this is that each one of the 6 elementary tests has a different execution time, which is also affected by the resources allocated to it. So the synchronization of them cannot be based on number of runs. Furthermore, in order to run each test for an increased number of times, this duration was significantly larger so that a large sample could be collected (in the range of hundreds of test runs per configuration). From this sample, the mean value of the execution times was extracted for each case.
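The time-bounded execution described above can be sketched as follows. This is a minimal Python illustration, not the Matlab code used in the experiments; `run_for_duration` and the stand-in workload are hypothetical names, and the injectable `clock` parameter exists only to make the loop easy to check.

```python
import time
from statistics import mean


def run_for_duration(test_fn, duration_s, clock=time.perf_counter):
    """Repeat test_fn until duration_s has elapsed and return the per-run times.

    A run that starts before the deadline is allowed to finish, so the total
    time can slightly exceed duration_s.
    """
    samples = []
    start = clock()
    while clock() - start < duration_s:
        t0 = clock()
        test_fn()
        samples.append(clock() - t0)
    return samples


if __name__ == "__main__":
    # Hypothetical stand-in workload; the paper runs the Matlab benchmark tests.
    samples = run_for_duration(lambda: sum(i * i for i in range(10_000)), 0.2)
    print(len(samples), "runs, mean time per run:", mean(samples), "s")
```

Bounding the run by time rather than by count keeps two tests with very different execution times active over the same interval, at the cost of a different number of samples per test.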

The experiments were coordinated by a Java module (Coordinator) that was pinned on one core (CPU 0). This code implemented the interface towards the real time scheduler, in order to change the Q, P scheduling parameters on the processes of the VMs. The VMs were pinned to either one core (CPU 2) or each one on different cores (CPU 1, 2 or 3), depending on the experiment. After the setup of the scheduler, the Coordinator was used in order to connect into the VMs through SSH and launch different combinations of the Matlab executables for a specific period of time. The time was measured from inside Matlab, and each execution with a given configuration was performed until a certain time period had passed (500 s). This leads to a small violation of the time period, but it was much smaller than the overall duration. It is critical to mention that if the functions used to measure time are based on the clock function of Matlab, this is far from accurate. It was observed that with high utilization inside the VM, these timing mechanisms lost completely track of time (indicatively, while the simulation was running for 2 h, inside the VM only 5 measured minutes had passed). This is why the tic-toc measuring function of Matlab was used.

The pseudo-code for the Coordinator appears in Fig. 1 and for the executable inside the VM in Fig. 2. The overall implementation design and the interactions between the necessary software components appear in Fig. 3. In detail, the different steps needed are:

• Launching of the Java Coordinator. This component has the responsibility of keeping track of the execution loops for the various parameters of the experiments like scheduling periods, CPU percentages assigned and test combinations. Furthermore, it raises the threads that contact the scheduler interface (Python script) for setting up the CPU allocations for each VM.

• The next step is to raise different threads in order to connect to the VMs and launch the Matlab executables with the specific arguments (test combinations). These threads are synchronized in the end of their execution, in order to ensure the concurrency of the next combination execution. It is critical to mention that in order to ensure a stable execution of the threads for an unlimited period of time, the implementation suggested in (http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html?page=4) was followed. This guarantees that the output and error streams are captured efficiently, in order to avoid computer hangs.

• When both executions inside the VMs are completed, the Java Coordinator progresses with the next configuration dictated by the loop.

Fig. 1. Pseudo-code for Java Coordinator sequence of actions.

Fig. 3. Software components needed for the implementation and their interaction: the main component is the Java Coordinator which keeps track of execution loops in order to cover the investigated interval, sets up the scheduling parameters of the VMs and connects to the guest OS in order to launch the specific test combination dictated by the current execution.

Thread synchronization was performed in order to ensure that the VMs are executed concurrently. The results of each VM were kept in a log file locally at the VM. At the end they were merged in order to obtain the overall logs. Details regarding the testbed are documented in Table 2.
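The launch-and-synchronize step can be illustrated with a short sketch. The Coordinator in this work is written in Java; the Python version below is only an analogue under that assumption, and `run_in_vm` and `run_combination` are hypothetical names. Capturing both output streams of each child process mirrors the precaution mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def run_in_vm(command):
    """Run one command and capture both of its output streams.

    Draining stdout and stderr prevents the child from blocking on a full
    pipe. In the setup described here, command would be an ssh invocation.
    """
    done = subprocess.run(command, capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr


def run_combination(commands, launcher=run_in_vm):
    """Launch one command per VM concurrently and wait for all of them.

    Returning only after every thread has finished is the end-of-run
    synchronization: the next combination cannot start early.
    """
    if not commands:
        return []
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(launcher, commands))
```

A caller would loop over periods, shares and test pairs and call `run_combination` once per configuration, passing one command list per VM; the exact ssh arguments depend on the testbed and are not specified here.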

The duration of each run (a specific test combination with a specific hardware configuration) was set to 500 s. This was sufficient time to ensure that the threads would be concurrently executed for the majority of the time, without having random ssh connection delays influence this process. The ssh connection delays ranged from 2.8 to 3.8 s. This is why thread synchronization was performed in the end of the 500-s interval. The thread that finished first waited for the second one to complete. It was also enough in order to let the tests run for an increased number of runs (in the range of hundreds), thus leading to sufficient sample gathering.

Table 2
Test-bed details regarding hardware and software.

Host OS          Linux Ubuntu 10.10
Guest OS         Windows 7
Hypervisor       KVM
Benchmarks       Matlab R2007b executable
Physical host    Intel Core 2 Duo Q6600 (Quad core 2.4 GHz, 8 MB cache, 8 GB RAM)

Furthermore, an execution ID was passed inside the VMs, in order to be included in the reported results. This was a unique number for each examined combination. This is necessary due to the fact that it was observed that for a number of random reasons (like broken tcp pipes, etc.) a limited number of combinations did not succeed in starting or finishing. For these cases the ID did not get logged. This way the results from the separate files that are kept inside each VM and consolidated in the end could be filtered in order to rule out the aforementioned cases and repeat the experiment for them.
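The filtering by execution ID can be sketched as below. The log layout, a sequence of (execution ID, score) records per VM, is an assumption made for illustration; the actual log file format is not described in this section.

```python
def missing_execution_ids(expected_ids, *vm_logs):
    """Return, in sorted order, the IDs absent from at least one VM log.

    Each log is an iterable of (execution_id, score) records. A combination
    counts as complete only if every VM logged its ID.
    """
    expected = set(expected_ids)
    complete = set(expected)
    for log in vm_logs:
        complete &= {exec_id for exec_id, _score in log}
    return sorted(expected - complete)


# Hypothetical logs: combination 2 never finished on the second VM.
vm1_log = [(1, 0.8), (2, 0.9), (3, 1.1)]
vm2_log = [(1, 0.7), (3, 1.0)]
print(missing_execution_ids([1, 2, 3], vm1_log, vm2_log))  # [2]
```

The returned IDs identify the combinations whose partial results should be dropped and whose experiment should be repeated.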

5. Measurements

In the following paragraphs the different deployment scenarios are described, along with the resulting measurements. For all cases, the scheduler granularity was investigated from 200 to 800 ms. A warm-up period for the system was introduced, as suggested by the literature, in order for the VMs to behave in a more stable fashion. That is why the experiments began from 150 ms, but the results were discarded for that specific value. Each data point in the graphs shown in the next sections is the mean value of the individual test runs inside a 500-s interval. In general these individual test runs were in the range of hundreds of times.
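How a curve in the graphs is obtained from the raw runs can be sketched as follows. The dictionary layout and the function name are assumptions made for illustration; only the 150 ms warm-up value and the use of the mean come from the text.

```python
from statistics import mean


def data_points(runs_by_period, warmup_period_ms=150):
    """Map each scheduling period to the mean of its test-run times.

    runs_by_period: {period_ms: [run time in s, ...]} for one configuration.
    The warm-up period is measured but left out of the result.
    """
    return {
        period: mean(times)
        for period, times in sorted(runs_by_period.items())
        if period != warmup_period_ms and times
    }


# Hypothetical run times, not measurements from the paper.
raw = {150: [2.0, 4.0], 200: [1.0, 1.5], 250: [1.5, 1.5]}
print(data_points(raw))
```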

5.1. Single VM on a core

In this scenario the maximum percentage assigned to each test is 80%. This is due to the fact that the remaining part is left for host system tasks to utilize. In this configuration, each VM is executed as standalone on one core. No other VM is running. This was performed in order to see the effect of the granularity of the scheduling parameters and the percentage allocation, in addition to acquiring the baseline scores of un-interfered execution (standalone) for comparison with the VM consolidation scenarios. As scheduling parameters are considered the CPU share X assigned to a task in a period P. In every period, the task is given X*P ms of CPU time. For example if we have a scheduling period of 500 ms and a CPU share of 40%, this means that the task will be given 200 ms of CPU time.

Fig. 4. Test scores for different CPU percentages and granularity (scheduling period) when the tests are running without any interference (co-scheduled tests). The test score (vertical axis) indicates the time needed for executing one test run and has been calculated from the mean value of hundreds of test runs inside a 500-s execution run. The scheduling period (horizontal axis) is the time interval in which the allocation of X% CPU share is guaranteed. (a) Standalone test 1; (b) Standalone test 4; (c) Standalone test 6.

In Fig. 4 the scores for each test are portrayed, for different CPU shares and periods of assignment of them. Lower test scores indicate better performance. In essence, the test score is the time needed for completing one test run. Only an indicative number is portrayed here, remaining graphs are included in Appendix A (Fig. A.1).

For tests 1–4, the conclusion is that the behavior is independent of the granularity of scheduling. This is anticipated since in a standalone basis, the effect of interference in cache memories from task switching is non-existent. With relation to the improvement of performance when the CPU share increases, this is almost linear, as seen from the inversely proportional test scores.
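The near-linear behavior can be expressed as an idealized model in which the test score scales as the inverse of the CPU share. The sketch below is such an idealization, not a fitted model from this work, and the numbers in the example are hypothetical rather than measured.

```python
def expected_score(ref_score_s, ref_share, share):
    """Idealized score at `share`, given a measured score at `ref_share`.

    Assumes execution time scales exactly as 1/share, which the standalone
    results for tests 1-4 follow only approximately.
    """
    if not (0 < ref_share <= 1 and 0 < share <= 1):
        raise ValueError("shares must lie in (0, 1]")
    return ref_score_s * ref_share / share


# Hypothetical: a test taking 1.0 s at an 80% share would ideally
# take twice as long at a 40% share.
print(expected_score(1.0, 0.8, 0.4))
```

Comparing measured standalone scores against this idealized curve gives a quick check of how close to linear a given test behaves.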

For tests 5–6 (graphics tests), the behavior is more oscillatory, which can be attributed to the difficulty of the virtualization layer to access or emulate the graphics card.

5.2. Concurrent VMs collocated on the same core

The second experiment aims at examining the effect of concurrency on the performance of VMs that are running on the same core, thus using shared L1 and L2 cache memories. For this reason, each VM is allocated 40% of the core capacity. The performance scores are compared to the 40% scores of experiment A. Results are shown in Fig. 5. These illustrate the scores of each individual test, when running on the same core with each of the other tests. That is why the (i, j) figure does not coincide with the (j, i) figure. Lower test scores mean better performance.

It is evident from the above graphs that as the scheduling period P of the assignments increases, the performance overhead of the co-scheduled tests decreases. This is owed mainly to the fact that with larger periods (and the same % of CPU) the applications have more time to utilize components such as the cache memories. A lower period results in more frequent task switching on the CPU and thus higher cache interference and contamination between the concurrent tasks. Furthermore, the overhead from the context switching (scheduler overhead) is increased.

Another conclusion was that in some parts the graphs seemed to portray some anomalies, like in the test 3 graph (Fig. 5c). However, after carefully observing the results, it seemed that when the high values of one test occurred, there were low values for the other test. This led us to plot the combined performance (added benchmark scores) of both co-scheduled VMs, which showed a more logical system-level graph. It seems that what matters in this case is which task starts first. In the test-bed description in Section 4 it was stated that the execution threads are synchronized in the end of the testing period, because this cannot happen in the beginning. The threads are launched simultaneously, but due to a number of random reasons (like ssh delay for example), the execution of each test in the respective VMs does not start at exactly the same time. This delay was in the range of 3 s, however it was much lower than the selected 500 s duration of each combination. From the system level graphs more accurate conclusions can be deduced. These are combined with the other scenarios and are depicted in Fig. 6.
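The combined (system-level) score is simply the sum of the two VMs' scores at each scheduling period. A minimal sketch follows, with hypothetical numbers chosen to show how opposite anomalies in the individual curves cancel out in the sum.

```python
def combined_scores(scores_vm_a, scores_vm_b):
    """Add the two VMs' mean scores for every period measured on both.

    Both arguments map scheduling period (ms) to mean test score (s).
    """
    shared = sorted(set(scores_vm_a) & set(scores_vm_b))
    return {p: scores_vm_a[p] + scores_vm_b[p] for p in shared}


# Hypothetical scores: at 200 ms VM A is slow while VM B is fast, and the
# reverse holds at 300 ms, yet the added score is the same.
vm_a = {200: 3.0, 300: 1.5}
vm_b = {200: 1.0, 300: 2.5}
print(combined_scores(vm_a, vm_b))  # {200: 4.0, 300: 4.0}
```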

Furthermore, there seem to be extremely high values for the 250 ms period P. This happens for almost all graphs and is contradictory to the general trend of the graphs. By taking under consideration the way the for loops are constructed in Fig. 1, we can conclude that this happens due to some internal interference in the guest OS (e.g. an OS task that starts executing inside the VM) and influences the measurements taken at this period.

Fig. 5. Effect on individual test scores for different combinations of concurrent tests. Each of the graphs indicates a test’s performance. It contains 7 lines which correspond to this test’s performance, when each of the other tests is running on the other VM, in addition to its performance when it is executed as standalone for comparison purposes. (a) Test 1 performance; (b) test 2 performance; (c) test 3 performance; (d) test 4 performance; (e) test 5 performance; (f) test 6 performance.

5.3. Concurrent VMs collocated on adjacent cores

The third experiment aims at investigating the effect on the performance of VMs when running on different but adjacent cores on the same CPU. This is to identify the effect of L1 cache memory interference, when compared to the results of B. In this setup, the cores have different L1 cache memories but common L2 ones. The chosen utilization percentage for this experiment was 40% per core. The reason for choosing this percentage was to have comparable results with B, for which maximum assignment of two tasks on one core is 40% per task (after subtracting approximately 20% that is always available for system tasks). Comparison with the other scenarios is included in Fig. 6.
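The 40% figure follows from dividing what remains of a core, after the reservation for system tasks, between the two VMs. The sketch below also includes a simple admission check based on the well-known EDF utilization bound for a single processor (total utilization not exceeding 1). Treating the host reservation as part of that total is an assumption of this sketch, and the function names are illustrative.

```python
def max_share_per_vm(n_vms, reserved_for_host=0.20):
    """Largest equal share per VM on one core after the host reservation."""
    return (1.0 - reserved_for_host) / n_vms


def fits_on_core(shares, reserved_for_host=0.20):
    """Admission check on one core: total utilization must not exceed 1.

    A small tolerance absorbs floating point rounding in the sum.
    """
    return sum(shares) + reserved_for_host <= 1.0 + 1e-9


print(max_share_per_vm(2))        # two VMs on the same core
print(fits_on_core([0.4, 0.4]))   # True
print(fits_on_core([0.5, 0.4]))   # False
```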

5.4. Concurrent VMs collocated on non-adjacent cores

The fourth configuration was decided to be pinning the VMs on non-adjacent cores. The reason for this setup is that the physical node of the experiments has 4 cores, which share a L2 cache in pairs. So by pinning the VMs on non-adjacent cores, the gain in terms of having no shared cache memories is investigated, in comparison to the previous experiment where L2 memory was shared. For all scenarios, the combined performance of the VMs per combination of tests is depicted in Fig. 6. In this figure, we have added the test scores of both VMs and have plotted all the deployment scenarios for comparison purposes. The remaining combinations that are not included in this figure are located in Appendix B (Fig. B.1).

Fig. 6. Comparison of system level (added scores) performance for all deployment scenarios and 40% CPU allocation per VM. In each of the graphs a specific combination of tests is investigated in order to compare their added scores (vertical axis) with regard to different deployment scenarios and how this performance is affected by the scheduling period P (horizontal axis). (a) Tests 1 and 1 combined performance; (b) tests 6 and 6 combined performance; (c) tests 1 and 4 combined performance; (d) tests 1 and 5 combined performance; (e) tests 2 and 4 combined performance; (f) tests 2 and 6 combined performance; (g) tests 3 and 5 combined performance; (h) tests 4 and 6 combined performance.

The anticipation at the beginning of these experiments was that interference between the memory usage of concurrently running VMs would significantly affect VM performance. This was validated in all tested cases. The baseline scores are much better than any other configuration, even though the percentage of core assigned to the VMs was the same. Furthermore, different tests create different levels of interference, thus validating the initial assumption that careful analysis must be made in order to optimize VM groupings on available nodes. Higher periods for the scheduler also seem to benefit the performance, due to better cache utilization.

Fig. 7. Percentage of the degradation of benchmark scores for all test combinations and scenarios and P = 800 ms, in order to illustrate the range of the overhead and the identification of optimum combinations. On the horizontal axis the different test combinations are portrayed, while on the vertical one the degradation of the performance as extracted from Eq. (1).

However, it was also anticipated that moving from tightly coupled configurations (same core) to more loosely coupled ones (adjacent and non-adjacent cores) would benefit the performance of the VMs. After observing the graphs, we can conclude that this does not happen. Given that the loosely coupled cases mean L1 cache independence (for adjacent cores) and L1/L2 cache independence (for non-adjacent cores), it can be assumed that the bottleneck that is causing this lack of performance improvement has to do with RAM accesses. The factor of influence in this case can be the memory bus (FSB) that is necessary for accessing the RAM. In the observed hardware, this was a 1066 MHz bus. However, this bandwidth is shared between the different cores trying to access the main memory. If more than one core tries to access the RAM at the same time, the available bandwidth will be divided between them. Thus, when allocating VMs on different cores (with the same 40% share), the divided bandwidth is a limiting factor in comparison to the same core scenario. For the same core deployment, VMs are executed one after the other, in a sequential, pseudo-parallel fashion. During this time, each VM has all the available bandwidth of the FSB. When allocated on different cores, for the interval in which the 40% activation periods of the VMs overlap, their bandwidth on the FSB will be divided. Given that the number (and timing) of the necessary RAM accesses can fluctuate, this can explain why in some cases one configuration is better than the other, with one effect prevailing over the other (cache interference versus shared FSB bandwidth).
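The bandwidth-sharing argument can be illustrated with a small sketch. It assumes, purely for illustration, that each fully loaded VM consumes its budget as one contiguous window at the start of every period and that the windows of the two cores are shifted by an arbitrary offset. The function name and this window model are hypothetical and are not taken from the experiments.

```python
def overlap_ms(offset_ms, budget_ms, period_ms):
    """Time per period during which two periodic windows are active at once.

    Window A covers [0, budget); window B covers [offset, offset + budget),
    wrapping around the period. Assumes 0 <= offset < period and budget <= period.
    """
    direct = max(0, budget_ms - offset_ms)                # B starts before A ends
    wrapped = max(0, offset_ms + budget_ms - period_ms)   # B's tail wraps into A
    return direct + wrapped


period = 700                 # scheduling period P in ms (example value)
budget = 40 * period // 100  # 40% share -> 280 ms of activity per period

# Average overlap over every integer offset between the two cores.
mean_overlap = sum(overlap_ms(d, budget, period) for d in range(period)) / period
```

Under these assumptions the two windows coincide for 112 ms per 700 ms period on average (16% of the time), during which the FSB bandwidth would be split. On a shared core the overlap is zero by construction, since the VMs run one after the other.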

In order to better conclude and illustrate for which combination of tests this improvement is greater, the percentage degradation of the system level test scores is portrayed in Fig. 7, for P = 800 ms. We chose this specific period because the previous graphs showed that for this value of P the system had the least overhead. The degradation of the performance is the ratio between the test scores of the added concurrent executions and the ones of the added standalone ones, for all the combinations of tests, and is derived from Eq. (1).

$$\%\,\mathrm{Degradation} = 100 \times \frac{(T_{A,I} + T_{B,I}) - (T_{A,\mathrm{Base}} + T_{B,\mathrm{Base}})}{T_{A,\mathrm{Base}} + T_{B,\mathrm{Base}}} \qquad (1)$$

In this formula, A and B are the test numbers per combination, I is the deployment scenario (same core, adjacent cores, non-adjacent cores) and Base denotes the scores when the tests are executed as standalone, without any other test running on the physical node.
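As a minimal sketch, Eq. (1) can be written as a small function. The function and argument names are illustrative, and the scores in the example are made up rather than measured.

```python
def degradation_percent(t_a, t_b, t_a_base, t_b_base):
    """Eq. (1): percentage degradation of the added test scores.

    t_a, t_b           -- scores (time per run) of tests A and B when co-located
    t_a_base, t_b_base -- scores of the same tests when run standalone
    """
    concurrent = t_a + t_b
    baseline = t_a_base + t_b_base
    return 100.0 * (concurrent - baseline) / baseline


# Made-up scores in seconds, for illustration only.
example = degradation_percent(1.5, 2.5, 1.0, 1.0)
```

With these made-up inputs the added concurrent score is twice the added standalone score, so the degradation is 100%. Identical concurrent and standalone scores give 0%.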

The aim of this figure is to show the concrete percentage overheads of different scenarios and for different test combinations and to identify any patterns that can be of benefit for extracting generic conclusions. From this it can be shown that the degradation of the computational capabilities of the VMs can range from almost 0 up to 160%, depending on the collocated tasks and the deployment scenario.

From this graph it is also evident that the best combinations for all applications are with test 5, and the second fittest candidate is test 6. The reason why these graphics tests are the fittest can be attributed to the fact that they use a small amount of data, on which they perform a large number of computations. The worst candidate for co-allocation is test 4, while the adjacent cores placement seems to produce the maximum overhead (combination of L2 and FSB interference). This is evidence of the importance of investigating the interference caused between co-scheduled VMs, which can lead to significant degradation of the QoS features of applications running in virtualized infrastructures. Furthermore, in most cases the adjacent cores setup has the worst performance, due to L2 cache interference and division of memory bus bandwidth. L1 cache interference seems to affect only a limited number of combinations (like 1 and 2).

5.5. Statistical analysis of the measurements

In order to gather the dataset that was presented in the previous graphs, each configuration (specific scheduling period, specific test combination and deployment scenario) was left to run for 500 s. This means that each point in the graphs (with the according setup) was produced by taking the mean over all the runs executed in this 500-s interval. The number of runs fluctuated, because each test had different computational needs and the overhead from the concurrently running VMs also varied. However, it was in the range of hundreds of runs for each data point. This is also indicated by the test scores, which give the time needed to perform one run. The mean value over these hundreds of executions for each configuration was used in the graphs. Details regarding the specific numbers for each experiment are included in Appendix D (Fig. D.1).

In order to further validate the data gathering process, we repeated a number of the experiments 10 times. This means that the executions with a specific configuration (test combinations, Q/P ratio, scheduling period P) were executed 10 times in order to investigate the difference between non-consecutive executions with the same conditions. During this experiment, each configuration (500-s interval) was run 10 times in a non-consecutive manner. The results (mean values of the test scores in the 500-s interval and standard deviations of the values inside this interval) are portrayed in Fig. 8. From this it can be observed that the difference between the different executions is very small.

Furthermore, the number of runs for each test in each of the executions was documented and is shown in Fig. 9. From this it can be observed that for each execution (from which the mean value and standard deviation portrayed in Fig. 8 were extracted) around 320 runs were conducted for test 4 and around 230 runs for test 5. The difference in the number of runs between the two tests arises from the fact that each test has different types of computations, and thus a different execution time. It was chosen to synchronize the executions based on a given time interval (and not on the number of runs) in order to ensure the concurrency of the computations. By having the same number of runs (and a different time for each run of each test) we would have the problem of one test finishing faster than another. Thus the remaining runs of the slower test would be executed as in the standalone phase.

From the above graphs we can conclude that, because the number of test runs in each execution is significantly high, repetition of the experiments will produce very similar values (as depicted in Fig. 8). The distribution of the individual values of the test scores for test 4 appears in Fig. 10, for one of the 10 executions (Fig. 10a) and all 10 repetitions (Fig. 10b). From this comparison it can be concluded that the two distributions are almost alike, showing the peaks in the same bins (0.8–1.2 and 1.2–1.4 values).

Fig. 8. Variation of the mean times and standard deviations of the test scores for 10 different non-consecutive executions of the same configuration (combination of tests 4 and 5, scheduling period = 700 ms, same core execution). The sample for extracting the mean value and std dev for each execution has been gathered after letting each execution run for 500 s.

Fig. 9. Number of test runs inside the 500-s sample gathering interval for each of the different non-consecutive executions of the same configuration (combination of tests 4 and 5, scheduling period = 700 ms, same core execution).

Regarding the statistical analysis of the samples gathered inside a single execution, the confidence intervals are depicted in Fig. 11, for both tests (4 and 5) of the selected run. For this run (500-s interval), test 4 was executed 232 times while test 5 was executed 321 times. In order to process the results, the dfittool of Matlab was used. From this we can observe that, especially for test 5, there are two major areas of concentration of the test scores, around the 0.9 and 1.3 values. This can be attributed to the inactive periods inserted by the scheduler for the task. This configuration had a 40% share over a period of 700 ms. This means that for 60% of the time (420 ms) the task is deactivated. Thus if the test run manages to finish before the inactive period, it will not have the extra delay of the 420 ms. If not, then it will finish right after that interval. This also agrees with the difference between the two major values of the distribution.
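The arithmetic behind this explanation can be checked with a short sketch. The helper name is illustrative, and the two mode positions are the approximate values quoted in the text, not exact measurements.

```python
def inactive_ms(share_percent, period_ms):
    """Length of the interval per period in which the task is not scheduled."""
    return (100 - share_percent) * period_ms / 100


gap_ms = inactive_ms(40, 700)         # 60% of 700 ms
mode_gap_ms = (1.3 - 0.9) * 1000      # distance between the two modes, in ms
```

The inactive interval comes out at 420 ms, which is close to the roughly 0.4 s that separates the two concentration areas of the test scores.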

5.6. Network response

During the initial tests for the experiments of Section 5.1 it was noticed that for low values of period P (below 100 ms) the VMs were extremely unresponsive, even for a simple ssh connection or ping.

Fig. 10. Histogram of the distribution of individual times for each test run for (a) 1 execution of the 500-s interval and (b) 10 executions of the 500-s interval with a given configuration (combination of tests 4 and 5, scheduling period = 700 ms, same core execution).

Fig. 11. Cumulative distribution function and confidence intervals for tests 4 and 5 execution samples for scheduling period = 700 ms, same core execution. The horizontal axis symbolizes the test scores and the vertical axis the probability to have these scores.

For this reason it was decided to conduct another experiment in order to investigate the deterioration of VM network response times. This investigation involved pinging one VM for a certain period of time (300 s) for a number of different configurations. Different ping periods were used. The number of samples can be derived by dividing the 300-s period by the pinging period. Different percentages of CPU assignment were also taken into consideration, as well as different periods P of the scheduling allocation. Detailed values appear in Table 3.
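The sample count per experiment follows directly from this division. A minimal sketch, using the pinging periods listed in Table 3 (the constant and function names are illustrative):

```python
PING_WINDOW_S = 300                       # duration of each ping experiment
PINGING_PERIODS_S = [0.5, 1, 1.5, 2]      # values listed in Table 3


def samples_per_experiment(pinging_period_s, window_s=PING_WINDOW_S):
    """Number of ping requests sent during one experiment."""
    return int(window_s / pinging_period_s)


samples = {p: samples_per_experiment(p) for p in PINGING_PERIODS_S}
```

The most demanding case, a pinging period of 0.5 s, yields 600 samples per configuration, and the 2 s case yields 150.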

In order to stress the CPU to full load, two approaches were considered. The first was with one of the Matlab benchmarks. In order to avoid the effect of some specific computation interference that could be caused by these tests (like increased I/O that could increase the served interrupts and affect the ping responses), another executable was created. This comprised a simple infinite for loop that calculated the square of the loop index. The results were almost identical. In the following graphs (Fig. 12), the mean value and deviation of the response times are depicted, for the most stressful experiment (pinging period = 0.5 s). More combinations are included in Appendix C (Fig. C.1).

Table 3
Parameters varied for the ping experiment.

Pinging period        0.5, 1, 1.5 and 2 s
Scheduling period P   150–800 ms (50 ms step)
VM status             Full load
Q/P                   20, 40, 60 and 80%

It is important to stress that the test began from P = 50 ms of granularity; however, the VMs showed little responsiveness, with many packet timeouts occurring. Full responsiveness was shown from 150 ms values, and the plots were created from these values onwards. What is seen from these measurements is that after the initial granularities for which the VM is unresponsive, low values of P are more beneficial for the network metrics (both mean value and deviation from that for each individual packet time). Furthermore, one theoretical expectation would be that the ping times would deteriorate as the P values increased, because for low percentages of CPU assignment, large inactive periods of the task would affect especially the deviation times. However, as seen from the graphs, the highest values for both metrics occur for middle values of P (the 450 ms case). After this maximum, the network metrics seem to improve again, however not to the levels of the low P values. Similar behavior was noticed for adjacent values of P (200 ms, 400 ms and 800 ms). These were skipped for better visibility of the graphs. Another interesting conclusion is that the pinging period does not seem to affect this behavior.

The similarity of the metrics for high values of the allocation Q/P (% CPU share) is due to the fact that for percentages close to 100% (like the 80% case), the CPU is almost dedicated to the task, so the period of this allocation does not influence the VM behavior. The task is active almost the entire time.

For the statistical analysis of the measurements we have used as an example the configuration with a scheduling period of 150 ms, a pinging period of 0.5 s and a CPU share of 60%. The distribution of the 600 individual values collected during the sample gathering experiment is depicted in Fig. 13a. In Fig. 13b the fitting of an exponential distribution to these values is shown, with mean value = 461.544, variance = 213,023, std error = 18.8739 and estimated covariance = 356.226. Exponential distributions are commonly used for networking times, as also shown in Cucinotta et al. (2010a). The CDF and confidence intervals for all CPU shares of this configuration are shown in Fig. 14, from which the benefit of having more CPU share is visualized in a more quantitative way.

Fig. 13. (a) Distribution of the individual ping delays in 500 ms bins and (b) density distribution of individual values (ping times) and fitting of an exponential distribution for scheduling period of 150 ms, pinging period of 0.5 s and CPU share of 60%. (a) Histogram of the distribution of the individual values and (b) density distribution and fitting of an exponential distribution.
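The reported fit parameters can be cross-checked against two standard properties of the exponential distribution: its variance equals the square of its mean, and the variance of the maximum-likelihood mean estimate equals the square of its standard error. The sketch below is only such a consistency check. The helper `fit_exponential` is a hypothetical stand-in and is not claimed to reproduce what the Matlab tool computes internally.

```python
import math


def fit_exponential(samples):
    """Maximum-likelihood fit of an exponential distribution.

    Returns (mean, variance, standard error of the mean estimate).
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = mean ** 2             # exponential: variance equals mean squared
    std_error = mean / math.sqrt(n)  # asymptotic standard error of the estimate
    return mean, variance, std_error


# Consistency check on the values reported for Fig. 13b.
reported_mean = 461.544
reported_variance = 213023
reported_std_error = 18.8739
implied_variance = reported_mean ** 2
implied_estimator_variance = reported_std_error ** 2
```

Both implied values land very close to the reported variance (213,023) and estimated covariance (356.226), which suggests the four reported numbers describe a single one-parameter exponential fit.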

6. Prediction of overhead

The experiments that were presented above help an infrastructure/Cloud provider to have a generalized picture of how application performance deteriorates following the placement decisions. The high level conclusions could always be extracted as fuzzy logic rules in a similar decision making mechanism that would aid the provider during the enforcement of scheduling and allocation policies. However, in order for this approach to meet its full potential, a suitable generic model should be designed. This model must be able to extract accurate predictions regarding the anticipated performance of the applications that have similar behavior to the investigated tests. It must be able to take as input the conditions of execution and produce the anticipated level of performance (test score), thus accurately quantifying the expected overhead.

Fig. 12. (a) Mean time of the response times for full load and various configurations of granularity of assignment (scheduling period) P and pinging period 0.5 s and (b) Mdev time (as reported by the ping command) of the response times for full load and various configurations of granularity of assignment (scheduling period) P and pinging period 0.5 s.

Fig. 14. Cumulative distribution function and according confidence intervals for all CPU shares, for scheduling period of 150 ms, pinging period of 0.5 s. The horizontal axis is the time delay in milliseconds and the vertical the probability to have a ping time less than or equal to this delay.

In order to implement such a system, the most generic black-box method was chosen, an artificial neural network (ANN). ANNs are a methodology that aims to imitate the behavior of the human brain and have been used up to now for a number of applications (e.g. pattern recognition, function approximation, classification or time series prediction). Their main ability is to model a system's behavior based only on a training data set (black box approach) that includes a number of system inputs and the according outputs. In general, the ANN correlates the input with the output data in order to find an overall approximation of the model function that describes the dependence of the output on the input. It was chosen since, as demonstrated in Section 5, there are numerous, even hidden, factors that may affect an application's performance, like hardware design details. An analytical modeling approach is almost impossible, given the variety of these factors. But even if one devotes the effort to create such a detailed model, it will be rendered useless by the next processor generation or a different architectural design. ANNs offer a much more generic and flexible approach and, given only a suitable data set, can achieve a quick and satisfactory model.

The identified inputs and outputs of the ANN model are depicted in Fig. 15 and include the investigated parameters of the previous sections.

The range or potential values of the parameters are listed in Table 4.

All the inputs and outputs were normalized in the (−1, 1) numeric interval. The non-numeric inputs were assigned different values for each state. The normalization is necessary for internal ANN representation and better depiction of the system. The data set that was presented in the previous sections (1100 different executions, each for 500 s) was used for training (50%), intermediate validation during training (20%) and independent validation (30%) of the network.

Fig. 15. ANN model.

Table 4
Range of parameters.

Test number/application type1   Can be from 1 to 6 (Matlab benchmarks)
Q/P (core %)                    0–80%
P                               Arbitrary (in our case it is from 200 to 800 ms)
Execution mode                  Can be standalone, same core, adjacent cores, non-adjacent cores
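A minimal sketch of this preprocessing is given below. The min-max scaling formula, the numeric codes chosen for the execution mode, and the random shuffling before the split are all assumptions made for illustration; the text does not state which scaling, codes or ordering were actually used.

```python
import random


def scale_to_unit_interval(value, lo, hi):
    """Min-max scaling of value from [lo, hi] to [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


# Hypothetical numeric codes for the non-numeric execution mode.
EXECUTION_MODE_CODE = {
    "standalone": -1.0,
    "same core": -1.0 / 3.0,
    "adjacent cores": 1.0 / 3.0,
    "non-adjacent cores": 1.0,
}


def split_dataset(rows, seed=0):
    """Shuffle and split rows into 50% / 20% / 30% subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 50 // 100
    n_inter = len(rows) * 20 // 100
    return (rows[:n_train],
            rows[n_train:n_train + n_inter],
            rows[n_train + n_inter:])


# 1100 executions, represented here only by their indices.
train, intermediate, independent = split_dataset(range(1100))
```

An even 50/20/30 partition of 1100 executions gives 550, 220 and 330 cases. Section 6.1 reports 328 independent validation cases, so the actual partition differs slightly from this sketch.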

6.1. Optimization of ANN structure and parameters

Despite their advantages, ANNs come with an important drawback. The decisions regarding their design parameters (like the number of hidden layers, the transfer functions for each layer and the number of neurons per layer) are more or less based on the designer's instinct and experience. In order to automate and optimize this process, an evolutionary approach was followed in this paper. The aforementioned parameters of the ANN design were inserted in a Genetic algorithm (GA) (Holland, 1975), in order to optimize their selection. In each generation of the GA, a set of solutions is tried out, through training and validating the respective networks. The performance metric for each solution, in our case the error in the training set, is then returned to the GA, as an indication of its suitability. Then the next generation is created from the best solutions of the current one (elitism), combinations of existing solutions (crossover) and slight differentiation of them (mutation). Through this algorithm, the parameter space is quickly searched for a near-optimal solution. The parameters that are fed into the GA appear in Fig. 16.
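The generational loop described above can be sketched as follows. This is a generic illustration, not the GA script used in the experiments. The chromosome encoding, the single-point crossover, the mutation rate, the number of elites and especially `placeholder_error` (which stands in for "train the network and return its training error") are all assumptions.

```python
import random

TRANSFER_FUNCTIONS = ["tansig", "logsig", "purelin", "radbas", "satlin"]  # subset
MAX_LAYERS, MAX_NEURONS = 10, 30


def random_design(rng):
    """One candidate: a list of (neurons, transfer function) pairs, one per layer."""
    n_layers = rng.randint(1, MAX_LAYERS)
    return [(rng.randint(1, MAX_NEURONS), rng.choice(TRANSFER_FUNCTIONS))
            for _ in range(n_layers)]


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    child = a[:rng.randint(1, len(a))] + b[rng.randint(0, len(b)):]
    return child[:MAX_LAYERS]


def mutate(design, rng):
    """Re-draw one randomly chosen layer."""
    design = list(design)
    i = rng.randrange(len(design))
    design[i] = (rng.randint(1, MAX_NEURONS), rng.choice(TRANSFER_FUNCTIONS))
    return design


def evolve(fitness, generations=30, population_size=20, n_elite=2, seed=0):
    """Minimise fitness(design) with elitism, crossover and mutation."""
    rng = random.Random(seed)
    population = [random_design(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        parents = ranked[:population_size // 2]
        next_gen = ranked[:n_elite]                      # elitism
        while len(next_gen) < population_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < 0.2:                       # mutation
                child = mutate(child, rng)
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)


def placeholder_error(design):
    """Stand-in for 'train the ANN and return its training error'."""
    return abs(len(design) - 3) + sum(abs(n - 7) for n, _ in design) / 30.0


best = evolve(placeholder_error)
```

Because the elites are copied unchanged into the next generation, the best error in the population can never get worse from one generation to the next. In a real setting the placeholder would be replaced by the comparatively expensive step of building and training each candidate network.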

In order to implement the aforementioned approach, an initial GA script is created in order to handle the population in each iteration and to launch the ANN creator script, after feeding it with the values that are depicted in the chromosomes. The ANN creator script takes the variables from the GA script and dynamically creates the ANNs according to the GA-selected parameters, by changing the network topology. Then it initializes and trains them (feed-forward, back propagation networks were used, trained with the Levenberg–Marquardt (Moré, 1978) algorithm) and returns the performance criterion (error in the training set) to the GA for the evaluation of each solution. An intermediate validation set was used in order to enhance the ANN generalization capabilities, that is, the network's ability to predict cases that it has not met before during training. In this case, each network was trained with 50% of the data set; then, if its predictions on the intermediate validation set had a mean absolute error (MAE) smaller than 10%, this network was saved. At the end of the generations, from all the saved networks, the final candidate was the one that showed the best performance (MAE) in the independent validation set. It was also investigated whether returning to the GA the MAE in the intermediate validation set, instead of the training set error, would help. This showed a worse performance.

Fig. 16. Parameters of the ANN design taken from the GA. These include how many hidden layers will be initialized and what their transfer functions and numbers of neurons will be. These parameters were in general decided by a human expert based on

Table 5
Details of the best ANN structure as this was derived from the optimization process, for different number of generations of the GA.

Number of layers/GA generations   Neurons per layer   Transfer functions per layer    Mean absolute error (%)   Standard deviation   Time
4/30                              5–7–19–1            Satlin–Radbas–Radbas–Satlins    5.94                      16.47                40 min
3/50                              5–7–1               Tansig–Tansig–Purelin           5.06                      13.85                1.1 h
4/100                             5–10–18–1           Radbas–Softmax–Radbas–Netinv    4.59                      14.80                3 h
4/150                             5–24–8–1            Purelin–Logsig–Logsig–Satlins   5.16                      14.61                5 h
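The two-stage selection described above (save on the intermediate validation set, choose on the independent validation set) can be sketched as follows. The tuple layout, the names and the error values are hypothetical.

```python
SAVE_THRESHOLD = 10.0   # MAE (%) on the intermediate validation set


def select_final_network(candidates):
    """Pick the final model from (name, intermediate_mae, independent_mae) tuples.

    A candidate is saved only if its intermediate-validation MAE is below the
    threshold; among the saved ones, the lowest independent-validation MAE wins.
    """
    saved = [c for c in candidates if c[1] < SAVE_THRESHOLD]
    if not saved:
        return None
    return min(saved, key=lambda c: c[2])


# Made-up error values, for illustration only.
candidates = [
    ("net-1", 12.0, 4.0),   # rejected: fails the save criterion
    ("net-2", 8.5, 6.1),
    ("net-3", 9.0, 5.2),
]
chosen = select_final_network(candidates)
```

In this made-up example "net-1" has the lowest independent-validation error but is never saved, so "net-3" is chosen. The filter keeps the independent set out of the training loop until the very last step.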

13 different transfer functions were taken into consideration (as they are described in http://www.mathworks.com/help/toolbox/nnet/function.html) and a maximum limit of ten (10) layers and thirty (30) neurons per layer was imposed. Four independent runs were conducted for different numbers of generations (30, 50, 100 and 150). Each generation investigated 20 possible solutions.

The ANN structure and overall prediction accuracy for these runs are described in Table 5. In this table, only the best model from each run is depicted. The mean absolute error for all 328 independent validation cases is between 4.5% and 5.94%, depending on the number of generations used, which indicates a very good estimator. Only the values in the independent validation set were used for the calculation of this error. These data have not been used during training.

The deterioration in the performance for the 150-generation run is not expected but can be attributed to the fact that each GA run depends on some random parameters, like the initial values of the chromosome or the randomness in the selection of characteristics to mutate. In a real-world situation, this can be easily tackled by adding generations of investigation until a specific performance criterion is met. The most suitable solutions are found in the 50 and 100 generation runs. The difference lies in the fact that for 100 generations a better mean error is acquired but a worse deviation metric. The final choice depends on the trade-off between these characteristics.

Another interesting conclusion is that smaller networks are better able to capture the dependencies between the input and the output. Even though the number of layers had a maximum value of 10, only 3- or 4-layer networks showed good performance (3 is the minimum limit for ANNs). This can be attributed to the more robust behavior of small networks due to the following aspect. The influence of the jth input on the ith output depends on all the alternative paths that connect them, and the increase of this path number (because of a higher number of layers) seems to increase the standard deviation of the prediction. A similar result appears in the circuit synthesis domain, where the increase in the number of paths (ladder compared to lattice synthesis) results in higher system sensitivity.
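The growth in the number of alternative paths can be made concrete with a short sketch. It assumes fully connected successive layers and reads the first number of each structure in Table 5 as the input layer; the function name is illustrative.

```python
from math import prod


def paths_between_input_and_output(layer_sizes):
    """Number of distinct paths from one input neuron to one output neuron
    in a fully connected feed-forward network.

    layer_sizes lists neurons per layer, input layer first, output layer last;
    every path picks one neuron in each intermediate layer.
    """
    return prod(layer_sizes[1:-1])


small = paths_between_input_and_output([5, 7, 1])         # 3-layer structure
larger = paths_between_input_and_output([5, 10, 18, 1])   # 4-layer structure
```

Under these assumptions the 5–7–1 network has 7 paths per input/output pair, while the 5–10–18–1 network has 180, so one extra layer already multiplies the number of paths considerably.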

In Fig. 17 the evolution of the mean absolute error of the saved networks is portrayed for each run. These networks are saved as the GA progresses, if the intermediate validation criterion is met (error in the intermediate validation set < 10%). The lack of stability (that is, of a continuously reduced mean error) in this process is due to the fact that randomness is, up to a point, inherent in an evolutionary approach due to the use of operators such as mutation and crossover.

Fig. 17. Temporal evolution of the performance (mean absolute error on the independent validation set) of the networks that meet the save criterion (MAE < 10% on the intermediate validation set). After a period of time, the GA has converged to a suitable solution, after which the performance of the investigated networks deteriorates.

Furthermore, from this graph it is evident that a large number of good quality networks are produced through the proposed optimization process. It can also be seen that the networks that are saved in later generations of one run appear to be losing quality. This can be attributed to the fact that when the GA finds a good solution, it also focuses on nearby similar configurations. If the good solution is a local or global optimum, then the nearby solutions will underperform and a large number of the candidates will be devoted to this area in vain (other solutions based on crossover will search the remaining sample space). However, this is also one of the strengths of GAs, since through this process vast areas of the sample space are tried out in the early stages, and afterwards the best ones are thoroughly examined in order to find the optimal solution in that particular interval.

The distribution of the errors of the network in predicting the performance (test scores) is directly compared to the multi-variate regression method in Fig. 18. This accuracy is based only on the independent validation set, for objectivity purposes.

6.2. Comparison with multi-variate linear regression

In order to compare our approach, the multivariate regression method was chosen, which is also used in Koh et al. (2007). The MATLAB function mvregress was used, on the same normalized data set. The only difference was that now the training and intermediate validation data vectors were merged. In this method, there is no need for an intermediate validation step. For the validation of the results, the same 30% independent validation data set was used, for objectivity purposes. The details of the approximation function appear in Table 6. The coefficients cell contains, in the respective order, the multipliers for each one of the 5 inputs depicted in Fig. 15, plus the constant factor. Their values are low because they have been derived for the normalized values of the data set in the [−1, 1] interval.

Fig. 18. Histogram of the errors in the independent validation set for 50 and 100 generation runs of the GA-ANN model compared to multi-variate regression. The difference lies in the center of the distributions, which for the proposed method is around 0 while for mvregress it is around 8%. The horizontal axis is the error (%) in the estimation (in bins) while the vertical is the number of validation cases for which the error was in the specific bin.

Table 6
Multi-variate regression details.

Coefficients                                   Mean absolute error   Standard deviation   Time needed
0.0218, −0.0171, 0.8326, −0.0801, 0.0631, 0    8.78%                 18.15                ∼1 s
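As a sketch of how the fitted linear model would be evaluated, the coefficients of Table 6 can be applied to a normalized input vector. The input order is assumed to follow Fig. 15, which is not reproduced here, and the example input is arbitrary; the function name is illustrative.

```python
# Coefficients from Table 6: five input multipliers followed by the constant.
COEFFICIENTS = [0.0218, -0.0171, 0.8326, -0.0801, 0.0631]
CONSTANT = 0.0


def predict_normalized_score(inputs):
    """Linear prediction for five inputs already scaled to [-1, 1]."""
    if len(inputs) != len(COEFFICIENTS):
        raise ValueError("expected exactly five normalized inputs")
    return CONSTANT + sum(c * x for c, x in zip(COEFFICIENTS, inputs))


# Arbitrary normalized input vector, for illustration only.
example = predict_normalized_score([0.0, 0.0, 1.0, 0.0, 0.0])
```

The result is a normalized test score that still has to be mapped back from the [−1, 1] interval to the original units. The third coefficient is by far the largest, so in this linear model the third input dominates the prediction.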

The results from this method have been incorporated in Fig. 18 for direct comparison with the best ANN models from the optimization method.

From the comparison of the two approaches it can be concluded that the optimized ANNs have a better performance in terms of mean absolute error and standard deviation. This is also evident from the distribution of the errors as depicted in the histogram. While the ANN models exhibit a high concentration of errors around the zero value, this is not the case for the regression model, whose errors concentrate around the 8–10% point.

The standard deviation values are high for both methods, but this is due to specific values of the 328-case validation set for which the measurements have large deviations. This happens for the peaks of the graphs presented in Section 5 and can be attributed to some random effects that affect system performance (e.g. system tasks that update the OS inside the VM, etc.), like in the case of P = 250 ms that was mentioned in the previous sections.

However, because the proposed method has much larger computational needs, for environments where new models must be created on the fly the multi-variate regression method offers a satisfactory performance in a very fast way. In the context of investigation in this paper (infrastructure/Cloud provider that wants to have a model for more efficient allocation of VMs over nodes), the computational needs of the optimized ANN method are not considered a constraint. The models will be created once, and may be refreshed in the future with new data. Tight timing constraints for the production of the models do not exist, since this is an off-line process.

Detailed percent improvements achieved by the compared method are depicted in Table 7.

For the response time of the models, that is, the duration needed for each model to provide the prediction after the conditions of execution are presented at the input, the multivariate regression was better. However, the ANN was also very fast (∼10 ms), so this is not a critical limitation.

Table 7
Comparison between multivariate regression and GA-optimized ANNs.

Method            % Improvement (ANN mean absolute error compared to mvregress)   % Improvement (ANN standard deviation compared to mvregress)
50 gen. GA-ANN    42.36%                                                          23.64%
100 gen. GA-ANN   47.61%                                                          18.4%
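The improvement percentages of Table 7 can be approximately recomputed from the errors printed in Tables 5 and 6. The sketch below assumes that the improvement is defined as the relative reduction with respect to the regression value; this definition is not stated explicitly in the text.

```python
def percent_improvement(ann_value, regression_value):
    """Relative reduction of a metric achieved by the ANN over the regression."""
    return 100.0 * (regression_value - ann_value) / regression_value


# (MAE, standard deviation) as printed in Tables 5 and 6.
REGRESSION = (8.78, 18.15)
ANN_50_GEN = (5.06, 13.85)
ANN_100_GEN = (4.59, 14.80)

mae_gain_50 = percent_improvement(ANN_50_GEN[0], REGRESSION[0])
std_gain_50 = percent_improvement(ANN_50_GEN[1], REGRESSION[1])
mae_gain_100 = percent_improvement(ANN_100_GEN[0], REGRESSION[0])
std_gain_100 = percent_improvement(ANN_100_GEN[1], REGRESSION[1])
```

The recomputed values agree with Table 7 to within roughly 0.1 percentage points. The remaining differences are presumably due to rounding of the printed errors.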

7. Conclusions and future work

As a conclusion, the emergence of virtualized and shared infrastructures like Clouds imposes a significant challenge for providers and applications. Applications that are running inside virtual machines are affected by many factors like virtualization and co-allocation of other VMs. In this paper, the effects of these allocation decisions were investigated, by taking into consideration a significant number of parameters like real time scheduling decisions, types of workload and different deployment scenarios. The performance overhead posed by these parameters can reach up to 150%, while carefully selecting the co-allocated tasks and the scenario of placement can minimize or even cancel this effect. What is more, higher scheduling periods result in a significant reduction in the overhead caused by the interference. Furthermore, the application of such a performance analysis method on new hardware designs may lead to optimized multi-core architectures for Cloud-specific environments, through the identification of bottlenecks like the shared memory bus.

In order to provide an automatic decision making mechanism that will guide the infrastructures to proper configurations and remove the need for human intervention, a GA-optimized ANN can be used to model, quantify and accurately predict the performance of the applications for a given configuration. The model acquired from this process is effective (error < 5%), generic and can be extended at any time in order to include other identified significant factors or scenarios. Through the use of this mechanism, an infrastructure/Cloud provider can have a priori knowledge of the interference. Thus they are able to optimize the management of the physical resources.

For the future, one interesting aspect to pursue is the automatic detection of the type of workload caused by each application. In this study, the analysis was based on the 6 Matlab benchmark tests, which depict different types of computations. For real-world applications this means that they must be matched to one or more of these elementary tests, based for example on the comparison of their footprints.

Acknowledgments

This research is partially funded by the European Commission as part of the European IST 7th Framework Program through the projects IRMOS under contract number 214777 and OPTIMIS under contract number 257115.

Appendix A. Remaining figures from Section 5.1

Fig. A.1. Remaining figures from Section 5.1 (Fig. 4: test scores for different CPU percentages and granularity (scheduling period) when the tests are running without any interference (co-scheduled tests). The test score (vertical axis) indicates the time needed for executing one test run and has been calculated from the mean value of hundreds of test runs inside a 500-s execution run. The scheduling period (horizontal axis) is the time interval in which the allocation of X% CPU share is guaranteed). (a) Standalone test 3; (b) standalone test 2; (c) standalone test 5.

Appendix B. Remaining combinations from Section 5.4

See Fig. B.1.

Fig. B.1. Remaining combinations from Section 5.4 (Fig. 6: comparison of system level (added scores) performance for all deployment scenarios and 40% CPU allocation per VM. In each of the graphs a specific combination of tests is investigated in order to compare their added scores (vertical axis) with regard to different deployment scenarios and how this performance is affected by the scheduling period P). (a) Tests 1 and 3 combined performance; (b) tests 4 and 4 combined performance; (c) tests 4 and 5 combined performance; (d) tests 5 and 5 combined performance; (e) tests 3 and 4 combined performance; (f) tests 2 and 3 combined performance; (g) tests 1 and 6 combined performance; (h) tests 3 and 6 combined performance; (i) tests 2 and 5 combined performance; (j) tests 2 and 2 combined performance; (k) tests 3 and 3 combined

Appendix C. Figures for pinging period = 2 s (continuation of Section 5.5)

Fig. C.1. (a) Mean time of the response times for full load and various configurations of granularity of assignment (scheduling period) P and pinging period 2 s and (b) Mdev time (as reported by the ping command) of the response times for full load and various configurations of granularity of assignment (scheduling period) P and pinging period 2 s (continuation of Section 5.5).

Appendix D. Number of executions for each of the data points in same core scenario graphs

Fig. D.1. Number of test runs from which the mean values for the scores were extracted for the same core scenario graphs (Fig. 5) for (a) test 1 when it is coscheduled with all other tests, (b) test 2 when it is coscheduled with all other tests, (c) test 3 when it is coscheduled with all other tests, (d) test 4 when it is coscheduled with all other tests, (e) test 5 when it is coscheduled with all other tests, (f) test 6 when it is coscheduled with all other tests. (a) Test 1 number of runs when co-scheduled with all other tests; (b) test 2 number of runs when co-scheduled with all other tests; (c) test 3 number of runs when co-scheduled with all other tests; (d) test 4 number of runs when co-scheduled

References

Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M., 2010. A view of cloud computing. Commun. ACM 53 (4), 50–58, doi:10.1145/1721654.1721672, http://doi.acm.org/10.1145/1721654.1721672.

Checconi, F., Cucinotta, T., Faggioli, D., Lipari, G., 2009. Hierarchical multiprocessor CPU reservations for the Linux Kernel. In: Proceedings of the 5th International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2009), Dublin, Ireland.

Cucinotta, T., Anastasi, G., Abeni, L., 2009. Respecting temporal constraints in virtualised services. In: Proceedings of the 2nd IEEE International Workshop on Real-Time Service-Oriented Architecture and Applications (RTSOAA 2009), Seattle, Washington, July.

Cucinotta, T., Checconi, F., Kousiouris, G., Kyriazis, D., Varvarigou, T., Mazzetti, A., Zlatev, Z., Papay, J., Boniface, M., Berger, S., Lamp, D., Voith, T., Stein, M., 2010a. Virtualised e-Learning with real-time guarantees on the IRMOS platform. In: Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications (SOCA 2010), Perth, Australia.

Cucinotta, T., Giani, D., Faggioli, D., Checconi, F., 2010b. Providing performance guarantees to virtual machines using real-time scheduling. In: Proceedings of the 5th Workshop on Virtualization and High-Performance Cloud Computing (VHPC 2010), Ischia (Naples), Italy.

El-Refaey, M.A., Rizkaa, M.A., 2010. CloudGauge: a dynamic cloud and virtualization benchmarking suite. In: Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE), 2010 19th IEEE International Workshop on, vol., no., pp.

Fiszelew, A., Britos, P., Ochoa, A., Merlino, H., Fernández, E., García-Martínez, R., 2007. Finding optimal neural network architecture using genetic algorithms. Adv. Comput. Sci. Eng. Res. Comput. Sci. 27.

García-Guirado, A., Fernández-Pascual, R., García, J.M., 2009. Virtual-GEMS: An infrastructure to simulate virtual machines. In: Fifth Annual Workshop on Modeling, Benchmarking and Simulation MoBS 2009 held in Conjunction with the 36th Annual International Symposium on Computer Architecture Sunday.

Giustolisi, O., Simeone, V., 2006. Optimal design of artificial neural networks by a multi-objective strategy: groundwater level predictions/Construction optimale de réseaux de neurones artificiels selon une stratégie multi-objectifs: prévisions de niveau piézométrique. Hydrol. Sci. J. 51 (3), 502–523.

Holland, J., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.

http://support.gams-software.com/doku.php?id=platform:aws. http://view.eecs.berkeley.edu/wiki/Dwarfs. http://www.irmosproject.eu/Postproduction.aspx. http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html?page=4. http://www.linux-kvm.org/. http://www.mathworks.com/help/techdoc/ref/bench.html. http://www.mathworks.com/help/toolbox/nnet/function.html. http://www.mathworks.com/matlabcentral/ﬁleexchange/11984. http://www.vmware.com/. http://www.xen.org/.

Ilonen,J.,Kamarainen,J.K.,Lampinen,J.,2003.Differentialevolutiontraining

algo-rithmforfeed-forwardneuralnetworks.NeuralProcess.Lett.17(February(1)).

IRMOSProjectD5.1.1,2009.Modelsofrealtimeapplicationsonserviceoriented

infrastructures,It-Innovationandotherpartners.

Jin,H.,Cao,W.,Yuan,P.,Xie,X.,2008.VSCBenchmark:benchmarkfordynamic

serverperformanceofvirtualizationtechnology.In:Proceedingsofthe1st

Inter-nationalForumonNext-generationMulticore/ManycoreTechnologies,Cairo,

Egypt.

Khalid,O.,Maljevic,I.,Anthony,R.,Petridis,M.,Parrott,K.,Schulz,M.,2009.Dynamic

schedulingofvirtualmachinesrunningHPCworkloadsinscientiﬁcgrids.In:

NewTechnologies,MobilityandSecurity(NTMS),20093rdInternational

Con-ferenceon,vol.,no.,pp.1–5,20–23Dec.,doi:10.1109/NTMS.2009.5384725.

Koh,Y.,Knauerhase,R.C.,Brett,P.,Bowman,M.,Wen,Z.,Pu,C.,2007.Ananalysis

ofperformanceinterferenceeffectsinvirtualenvironments.In:ISPASS,.IEEE

ComputerSociety,pp.200–209.

Kousiouris,G.,Checconi,F.,Mazzetti,A.,Zlatev,Z.,Papay,J.,Voith,T.,Kyriazis,D.,

2010.Distributedinteractivereal-timemultimediaapplications:asampling

andanalysisframework.In:1stInternationalWorkshoponAnalysisToolsand

MethodologiesforEmbeddedandReal-timeSystems(WATERS2010),Brussels,

Belgium.

Leung,F.H.F.,Lam,H.K.,Ling,S.H.,Tam,P.K.S.,2003.Tuningofthestructureand

parametersofaneuralnetworkusinganimprovedgeneticalgorithm.IEEETrans.

NeuralNetwor.14(January(1)).

Liu,C.L.,Layland,J.,1973.Schedulingalgorithmsformultiprogramminginahard

real-timeenvironment.J.ACM20(1).

Makhija,V.,etal.,2006.VMmark:ascalablebenchmarkforvirtualizedsystems.

Technicalreport,VMWare.

Moller,K.T.,Virtualmachinebenchmarking.DiplomaThesis.UniversitatKarlsruhe

(TH).2007.

Moré,J.J.,1978.TheLevenberg–Marquardtalgorithm:implementationand

the-ory.In:Watson,G.A.(Ed.),NumericalAnalysis,Dundee1977.LectureNotesin

Mathematics,vol.630.Springer,Berlin,pp.105–116.

Padala,X.Z.,Wanf,Z.,Singhal,S.,Shin,K.,2007.Performanceevaluationof

vir-tualizationtechnologiesforserver consolidation.HPLabs,Tech.Rep.

HPL-2007-59.

Soror, A.A., Minhas, U.F., Aboulnaga, A., Salem, K., Kokosielis,P., Kamath, S.,

2008.Automaticvirtualmachineconﬁgurationfordatabaseworkloads.ACM

Trans.DatabaseSyst.35(1),47p,Article7,doi:10.1145/1670243.1670250,

http://doi.acm.org/10.1145/1670243.1670250.

Tickoo,O.,Iyer,R.,Illikkal,R.,Newell,D.,2010.Modelingvirtualmachine

perfor-mance.ACMSIGMETRICSPerform.Eval.Rev.37,55.

Tikotekar,A.,Vallˇıee,G.,Naughton,T.,Ong,H.,Engelmann,C.,Scott,S.L.,Filippi,A.M.,

2008.Effectsofvirtualizationonascientiﬁcapplicationrunningahyperspectral

radiativetransfercodeonvirtualmachines.In:Proc.ofHPCVirt’08.USA,ACM,

pp.16–23.

Weng,C.,Wang,Z.,Li,M.,Lu,X.,2009.Thehybridschedulingframeworkfor

virtual machine systems VEE‘09. In:Proceedings of the 2009ACM

SIG-PLAN/SIGOPSInternationalConferenceonVirtualExecutionEnvironments,

ACM,pp.111–120.

White,J.,Pilbeam,A.,2010.ASurveyofVirtualizationTechnologiesWith

Perfor-manceTesting,ArXive-prints,1010.3233.

Youseff,L.,Butrico,M.,DaSilva,D.,2008.Towardauniﬁedontologyofcloud

comput-ing.In:GridComputingEnvironmentsWorkshop,2008.GCE‘08gridComputing

EnvironmentsWorkshop,2008,GCE‘08,pp.1–10.

George Kousiouris received his diploma in Electrical and Computer Engineering from the University of Patras, Greece in 2005. He is currently pursuing his Ph.D. in Grid and Cloud computing at the Telecommunications Laboratory of the Dept. of Electrical and Computer Engineering of the National Technical University of Athens and is a researcher for the Institute of Communication and Computer Systems (ICCS). He has participated in the EU funded projects BEinGRID, IRMOS and OPTIMIS and the National project GRID-APP. In the past he has worked for private telecommunications companies. His interests are mainly computational intelligence, optimization, computer networks and web services.

Tommaso Cucinotta graduated in Computer Engineering at the University of Pisa in 2000, and received the Ph.D. degree in Computer Engineering from the Scuola Superiore Sant’Anna of Pisa in 2004. He is Assistant Professor of Computer Engineering at the Real-Time Systems Laboratory (ReTiS) of Scuola Superiore Sant’Anna. His main research activities are in the areas of real-time and embedded systems, with a particular focus on real-time support for general-purpose Operating Systems, and security, with a particular focus on smart-card based authentication.

Theodora A. Varvarigou received the B.Tech degree from the National Technical University of Athens, Athens, Greece in 1988, the MS degrees in Electrical Engineering (1989) and in Computer Science (1991) from Stanford University, Stanford, California, and the Ph.D. degree from Stanford University as well in 1991. She worked at AT&T Bell Labs, Holmdel, New Jersey between 1991 and 1995. Between 1995 and 1997 she worked as an Assistant Professor at the Technical University of Crete, Chania, Greece. In 1997 she was elected as an Assistant Professor, and since 2007 she has been a Professor at the National Technical University of Athens, and Director of the Postgraduate Course “Engineering Economics Systems”. Prof. Varvarigou has great experience in the area of semantic web technologies, scheduling over distributed platforms, embedded systems and grid computing. In this area, she has published more than 150 papers in leading journals and