• Non ci sono risultati.

The effects of scheduling, workload type and consolidation scenarios on virtual machine performance and their prediction through optimized artificial neural networks

N/A
N/A
Protected

Academic year: 2021

Condividi "The effects of scheduling, workload type and consolidation scenarios on virtual machine performance and their prediction through optimized artificial neural networks"

Copied!
22
0
0

Testo completo

(1)

ContentslistsavailableatScienceDirect

The

Journal

of

Systems

and

Software

jo u r n al h om ep a g e :w w w . e l s e v i e r . c o m / l o c a t e / j s s

The

effects

of

scheduling,

workload

type

and

consolidation

scenarios

on

virtual

machine

performance

and

their

prediction

through

optimized

artificial

neural

networks

George

Kousiouris

a,∗

,

Tommaso

Cucinotta

b

,

Theodora

Varvarigou

a

aDept.ofElectricalandComputerEngineering,NationalTechnicalUniversityofAthens,9,HeroonPolytechniouStr.,15773Athens,Greece

bReal-TimeSystemsLaboratory(ReTiS),ScuolaSuperioreSant’Anna,ViaMoruzzi,1,56100Pisa,Italy

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Received25February2011

Receivedinrevisedform6April2011

Accepted6April2011

Available online 14 April 2011 Keywords:

Virtualization

Real-timescheduling

Cloudcomputing

Artificialneuralnetworks

Geneticalgorithms

Performanceprediction

a

b

s

t

r

a

c

t

Theaimofthispaperistostudyandpredicttheeffectofanumberofcriticalparametersonthe perfor-manceofvirtualmachines(VMs).Theseparametersincludeallocationpercentages,real-timescheduling decisionsandco-placementofVMswhenthesearedeployedconcurrentlyonthesamephysicalnode, asdictatedbytheserverconsolidationtrendandtherecentadvancesintheCloudcomputingsystems. DifferentcombinationsofVMworkloadtypesareinvestigatedinrelationtotheaforementionedfactors inordertofindtheoptimalallocationstrategies.Whatismore,differentlevelsofmemorysharingare applied,basedonthecouplingofVMstocoresonamulti-corearchitecture.Foralltheaforementioned cases,theeffectonthescoreofspecificbenchmarksrunninginsidetheVMsismeasured.Finally,ablack boxmethodbasedongeneticallyoptimizedartificialneuralnetworksisinsertedinorderto investi-gatethedegradationpredictionabilityapriorioftheexecutionandiscomparedtothelinearregression method.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

During the recent years, server consolidation through the use of virtualization techniques and tools such as VMware (http://www.vmware.com/), XEN (http:www.xen.org/), KVM (http://www.linux-kvm.org/)iswidelyusedindatacentersorin newcomputingparadigms suchasCloudcomputing (Armbrust etal.,2010).Throughthistechnique,applicationswithdifferent characteristicsandrequirementscanruninsidevirtualmachines (VMs)onthesamephysicalmulti-corehost,withincreasedlevels of security, fault tolerance, isolation, dynamicity in resource allocationandeaseofmanagement.

However,anumberofissuesarisefromthisadvancementin computing,likethedegradationoftheperformanceofthe applica-tions,duetotheusageofthevirtualizationlayer(Tikotekaretal., 2008).Exceptfortheextralayer,anumberofotherparametersmay alsoaffecttheperformanceindistributedandvirtualized environ-ments.ThesemaybethepercentageofCPUallocatedtotheVMand itseffectonexecutiontimeandtheschedulinggranularity(theway thispercentageisassigned)thatmayalsoaffectthelatter.What

∗ Correspondingauthor.Tel.:+306939354121.

E-mailaddresses:gkousiou@mail.ntua.gr,gkousiou@yahoo.com(G.Kousiouris),

tommaso.cucinotta@sssup.it(T.Cucinotta),dora@telecom.ntua.gr(T.Varvarigou).

ismore,consolidationdecisions,whethertheseimplythe assign-mentofVMstothesamecoreorneighbouringcoresonthesame physicalnode,mayhaveaneffectonapplicationperformance.In addition,differenttypesofapplicationsmayhavedifferent inter-ferenceeffectontheco-scheduledonesonthesamephysicalnode. Thismayhappenduetoapplicationinternalcharacteristics,like typeofcalculations,usageofspecificsub-circuitsoftheCPU,effect duetocachesharing,memoryaccesspatternsetc.

The degradation of performance in this case may lead to increasedneedforresources.However,thecurrentCloudbusiness SPImodel(Youseffetal.,2008)identifiesdiscreterolesbetween theentitythatoffersaplatform(PaaSprovider)andtheentitythat offerstheresourceswhichthisplatformmayuse(IaaSorCloud provider).Thismodel,inconjunctionwiththeaforementioned per-formanceoverhead,createsthefollowingproblem:

• APaaSproviderhasestimatedthatXnumberofresourcesare neededformeetingtheQoSfeaturesoftheapplicationinsidethe VM.Thisestimationwillprobablybemadeafterobservingthe applicationbehaviorwhenrunningasstandalone.

• TheIaaSproviderreceivesarequestinordertoreservethisX amountofresources.Inanattempttomaximizeprofit,theVMs oftheapplicationwillbeconsolidatedwithotherVMsthatare admittedintheinfrastructure.Thewaythisconsolidationis per-formedisunknowntothePaaSlayer.Thisdoesnotmeanthat

0164-1212/$–seefrontmatter © 2011 Elsevier Inc. All rights reserved.

(2)

theIaaSproviderwillgivelessresources.Thesameamountas requestedwillbeallocated.

• Theperformancedegradationduetotheaforementioned consol-idationandaccordinginterferencewillbetranslatedtodecreased QualityofService(QoS)levelsfortheapplication.

• ThebreachofconfidencebetweenthePaaSandIaaSprovideris inevitable.Theformerwillconsiderthatthelatterhasdeceived him/her,andthattheallocationofresourceswaslessthanagreed. Thelatterwillconsiderthattheformermadeapoorestimation andistryingtotransfertheresponsibilitytotheinfrastructure layer.

Therefore,itisimperativeforanIaaSprovidertobeableto pre-dictaccuratelytheoverheadthatisproducedbytheconsolidation strategyandthereforeincreaseapriorithenumberofresources allocatedtoanapplicationaccordingly.Thiswillnotonlyleadto improvedreputationbutwillenabletheIaaSprovidertooptimize theassignmentofVMstohostsorevencoresofonehost, depend-ingonthetypeofworkloadtheseVMsproduceandtheestimated interferenceofonetypeofworkloadovertheother.Byknowing whichcombinationsresultinlessoverhead,over-provisioningwith regardtostandaloneexecutioncanbeminimized.

Theaimofthis paperis totakeunderconsideration a wide rangeofparametersthataffectapplicationperformancewhen run-ningonconsolidatedvirtualizedinfrastructuresandquantifythis overhead.Therefore,IaaSprovidersmayperformintelligent deci-sionsregardingtheplacementofVMsonphysicalnodes.Thiswill mitigatetheaforementionedproblemandwillaidtheIaaSlayer tominimizetheneedforover-provisioning.Asindicative applica-tionsweconsiderthe6standardMatlabbenchmarktests,dueto thefactthattheyrepresentawideareaofcomputational work-load,frommathematicalcalculationstographicsprocessing,and areeasilyreusablebyotherresearchers.Thefollowingaspectsare investigated:

• Scheduling decisions for EDF-based (Earliest Deadline First) scheduling(budgetQoverperiodP),bothintermsofCPUshare assigned(Q/P)andgranularityofthisassignment(P).Thistype ofscheduling(LiuandLayland,1973),whichisverypopularin thereal-timeworld,isusedinordertoguaranteethatthe assign-mentsofCPUpercentagestotheVMswillbefollowedandto ensuretemporalisolation.

• Differentcombinationsofbenchmarktests,inordertodiscover theonesthatshowlessinterference(andthereforeoverhead), whenrunningconcurrentlyonthesamenode.

• Differentplacementdecisionsandhowtheseaffecttheoverhead (placementonthesamecore,inadjacentornon-adjacentcores inonephysicalnode,comparedwithstandaloneexecution).As adjacentcoresweconsidertheonesthatshareaL2cache mem-ory.

• Abilitytopredictinadvancetheseeffectsinordertoproactively provisiontheresourcesneededbyanapplicationandbeableto choosethesuitableconfigurationthatcreatestheleastoverhead. Forachievingthis,artificialneuralnetworks(ANNs)areused,that areautomaticallydesignedandoptimizedthroughaninnovative evolutionary(GA-based)approach.Thetopologyofthenetworks isdynamic interms ofnumber ofhidden layers,neuronsper layerandtransferfunctionsperlayerandisdecidedthroughthe optimizationprocessoftheGA.

Theremainingofthepaperisstructuredasfollows.InSection 2,similarapproachesintherelatedfieldarepresented,whilein Section3anextendedanalysisismadeonthechosenparameters forinvestigation.Section4containsthedescriptionofthetest-bed andthemeasuringprocess,whileSection5presentsthe analyt-icalresultsfromtheexperiments.Section6detailstheapproach

inordertopredictinadvancetheeffectandcomparesittothe multi-variatelinearregressionmethod,whileSection7provides theoverallconclusionsfromthisstudyandintentionsforthefuture.

2. Relatedwork

Inthepast,wehaveinvestigatedpartlyanumberofthe param-etersthatareconsideredhere.Inourpreviousworks(Cucinotta etal.,2010a;Kousiourisetal.,2010),themainfocuswasgivenon theeffectofschedulingparametersand%ofcpuassignmentson theQoSofferedbyspecificapplications(mainlyinteractiveones).

Given that virtualization technologies have become very popular during the recent years, a large number of interest-ing research efforts exist. In (Padala et al., 2007), the authors investigate the overhead that is inserted through the use of differentvirtualizationtools, withtheintentiontodiscoverthe mostefficient one. In (Makhija et al., 2006), the main focus is the scalability of virtual machines with mixed workloads and theeffectontheirperformancewhen consolidated.The bench-marks are based on server applications. With the extended use of Cloud technologies, applications that are envisioned to be part of their workload may have more complicated work-loads rather than traditional data center ones. Such examples may be the case of the Digital Film Post-production applica-tion (http://www.irmosproject.eu/Postproduction.aspx) of the IRMOS project or the availability of mathematical program-mingand optimizationtoolslikeGAMSinCloudinfrastructures (http://support.gams-software.com/doku.php?id=platform:aws). Furthermore,notemporalisolationbetweenthevirtualmachines isused.

Averyinterestingapproach,similartoourwork,isKohetal. (2007).Inthiscase,theauthorsstudytheperformanceinterference ofcombinationsofelementaryapplicationswhenrunninginside co-allocatedVMs.Interestingscoremethodologyispresentedthat canalsobeusedforclassifyingapplications.Theworkload char-acteristicsofeachapplicationaregatheredinordertobeusedas acomparisoninthefutureforidentifyingunknownapplications thataregoingtobeinsertedintheinfrastructureandaspredictors inalinearregressionmodel.Themaindifferencesofourapproach lieonthefactthatweinvestigatealimitednumberofspecificand genericbenchmarktests,withtheuseofrealtimeschedulingand adifferentpredictionmethodbasedonANNs.Especiallytheuseof RTschedulingensuresthetemporalisolationofVMswhen run-ningconcurrentlyonthesameCPU andthefairdivision ofthe resourcepercentagesthateachVMutilizes.Inaddition,we inves-tigatedifferenthardwaresetupinterferences,suchasrunningVMs onthesamecoreorinadjacentcoresinmulti-corearchitectures anddifferentsetupsintermsofgranularityandpercentageforeach elementaryapplication.

InSororetal.(2008),averypromisingapproachfordynamically handlingtheresourcesonwhichconcurrentVMsareexecutedis portrayed,inordertooptimizesystemutilizationandapplication responsetimes.Thisisbasedonbothofflinerecommendationsand onlinecorrections.Howeverthisworkiscenteredarounddatabase applications.InMoller(2007),adesignandvalidation methodol-ogyofbenchmarksforvirtualmachinesispresented,wherealso theconcurrencyeffectsareinvestigated.In Tickooetal.(2010), anumberofconfigurationsistakenunderinvestigation,like run-ningonthesameoradjacentcores,throughhyper-threadingornot. Experimentsarefocusedon3testsandwithouttakingunder con-siderationschedulingparameters.Predictionmethodsarebasedon analyticalformulas.

InJinetal.(2008),VSCBenchmarkisdescribed,ananalysisof thedynamic performanceofVMswhennewVMs aredeployed. Theauthorsinvestigatethebehavioroftheco-scheduledVMsfor

(3)

bothlaunchingandmaintainingmultipleinstancesonthesame server.BoottimeforaVMisaveryresourceconsumingtask,that mayleadotherco-scheduledtaskstoresourcestarvation.Inour work,thisisavoidedthroughtheuseoftherealtimescheduling thatlimitstheresourcesassignedtoatask(likeaVM)toa prede-finedpercentage.Thustemporalisolationisachievedbetweenthe concurrentlyrunningtasks.

InWengetal.(2009),aninterestingapproachforscheduling VMsispresented,withtwomodesofoperation(high-throughput andconcurrent)whichaimsatefficientlyhandlingCPUtimefor thephaseswheretheVMshavemultiplethreadsrunning concur-rently.Thisapproachismorefocusedontheschedulingaspectsand doesnottakeintoconsiderationconcurrentVMsrunningonthe sameserver.TheVirtualGEMSsimulatorfortestingdifferent con-figurationswithregardtoL2setupbetweendifferentconcurrently runningVMsappearsinGarcía-Guirodoetal.(2009).

A benchmarking suite specifically for VM performance is depictedinEl-RefaeyandRizkae(2010).Whileitpinpointssome importantfeaturesforbenchmarkingofvirtualizedapplications,it mainlyfocusesononetypeofworkload(linuxkernelcompilation). Asurveyofvirtualizationtechnologiesinadditiontoperformance resultsfordifferentconfigurationsappearsinWhiteandPilbeam (2010).InCucinottaetal.(2009,2010b),aninvestigationis car-riedoutonthepossibilitytoenhancethetemporalisolationamong virtualmachinesconcurrentlyrunningonthesamecore,byusing theIRMOSreal-timescheduler,focusingoncompute-intensiveand network-intensiveworkloads.

InKhalidetal.(2009)averyinterestingapproachforadaptive real-timemanagement of VMsin HighPerformance Computing (HPC)workloadsispresented.Thismechanismisusedduringthe operationoftheHPCinfrastructure,inwhichtheVMshavea prede-finedrunningperiodwhichisdecidedbytheinfrastructuresharing policy.Themaingoalistocalculateinrealtimeifthejobthatis includedintheVMisgoingtobefinishedintimeornot.Theones thatwillnotanywayfinishintimeareprematurelykilledinorderto improvethesuccessrateoftheoverallinfrastructure.Inourview, thisworkcanbeusedincombinationwiththeonepresentedinthis paper.Themodelsthataredescribedherecanbeusedforobserving andminimizingtheoverheadofaspecificallocationoverthe differ-entnodes,whiletheworkpresentedinKhalidetal.(2009)canbe usedfortheruntimeoptimizationofthevirtualizedinfrastructure. GA-basedoptimizationofANNshasbeenusedinthepastina numberofsignificantworks.Forexample,inLeungetal.(2003),a numberoftheparametersofthenetwork(likeconnectionweights) arechosenthroughGAs.Themaindisadvantageofthisapproach liesonthefactthatastaticstructureforthenetworkisrequired.In othercases(likeinIlonenetal.(2003))theyareusedforindirect optimization,throughforexampleoptimizingthetrainingprocess. InFiszelewetal.(2007),anotherapproachcombiningGAsandANN architectureispresented,howeverthestructureofthenetworkis limitedtotwohiddenlayersandstatictransferfunctions,while experimentingwithadifferentsetofparameterssuchasnumber ofnodesandtheirconnections.ThesameappliesforGiustolisiand Simeone(2006).Inourapproach,thenetworksmayhaveadynamic topology,intermsofnumberofhiddenlayers,numberofneurons perlayerandtransferfunctionsperlayer.Otherparameterssuch asdifferenttrainingmethodscanbeaddedatwill.

3. Investigatedparameters

Inageneralpurposeoperatingsystem,whentwotasksstartto execute,theycompetewitheach otherfor theunderlying com-putationalresources.Theamountofthelatterthatisassignedto each taskmaybedividedaccording tothetemporal needsand demandsofthem.However,thisresultsinaninabilitytopredict

theQoSofferedbyanapplicationrunninginsideaVM,inshared infrastructures.The percentage ofCPU allocatedtoit mayvary significantlydependingontheco-locatedtasksdemand(likethe bootingofanotherVMinTickooetal.(2010)).Onesolutionto mit-igatethiseffectisderivedfromtherealtimeworld.Throughthe useofEDF-basedscheduling,thepercentageassignedtoataskmay beguaranteed.Ingeneral,forthistypeofscheduling,therearetwo significantparameters,thebudgetQandtheperiodP.Qisthe com-putationaltimeonthephysicalnodethatisassignedtoataskover aperiodP.Thus,theratioofQ/Pistheresultingpercentageonthe machine.However,foragivenpercentage,differentgranularities mayexist,dependingonP(forexample,50%maymean50msevery 100msoritmaymean250over500ms).Oneinterestingquestion ishowtheseparametersaffectapplicationperformance.

As investigated in Cucinotta et al. (2010a), Kousiouriset al. (2010),theeffectofreal-timeschedulingparametersmayaffectthe applicationperformanceinavarietyofways.Forexample,inour aforementionedworks, that investigateinteractiveapplications, differentgranularitiesofthesame CPUpercentage assignments lead toincreasedstandard deviationvalues forserverresponse times.Theconclusionfromthatanalysiswasthatforinteractive applications,lowgranularity(smallschedulingperiodP)leadsto morestableperformanceforthesamepercentageofCPUassigned. In this work,we focus on6computationally intensive applica-tions.Theeffectofschedulingdecisions(forbothpercentageofCPU allocatedandgranularityofthisallocation)ontheirperformance metricsisobservedalongwiththeinterferenceoftheco-allocation ofVMsforthesameCPU%assigned.

AsidentifiedinIRMOSprojectD5.1.1 (2009),themetricthat iscurrentlyusedbyinfrastructureprovidersinordertodescribe hardwarecapabilitiesismainlyCPUfrequency.Howeverthis met-ricisfarfromsufficient,giventhatitdoesnotreflectthephysical node’s ability to solve problems (does not take under consid-eration other factors such as bus speed, memory speed, etc.). Otherapproaches suchastheBerkeleydwarfs(http://view.eecs. berkeley.edu/wiki/Dwarfs)or Matlabbenchmarktests aremore abletocaptureanode’scomputationalcapability.In thispaper the second approach was chosen. The Matlab benchmarks (http://www.mathworks.com/help/techdoc/ref/bench.html) con-sistof6tests,thatmeasurethephysicalnode’s(andMatlab’sfor thismatter)abilitytosolveproblemswithdifferentcomputational requirements.Thereforethesetestsareusedforbothdetermining thehardwarecomputationalcapability(testscore)andthe charac-terizationofthetypesofworkloads(testnumber).

Whatismore,theyrepresentawideareaoftypesofworkloads (Table1)withalimitednumberoftestsanditisthereasonthe spe-cificapproachwaschosen.Forthese,weinvestigatetheeffectofthe CPUallocationandtheeffectoftheircombinationinconcurrently runningVMs.Itisexpectedthatdifferentcombinationswillhave differenteffect,duetothenatureofthecomputations.The degra-dationoftheperformanceisreflectedonthetestscore.Lowertest scoresindicatebetterperformance.Ingeneral,thetestscoreisthe timeneededtomakethenecessarycalculationsforeachtest.

Furthermore,differentallocationscenariosareinvestigatedfor multi-corearchitectures.TheVMsarepinnedwithavarietyofways

Table1

Details regarding the Matlab benchmark tests and the specific

computa-tional patterns thattheyrepresent (http://www.mathworks.com/help/techdoc/

ref/bench.html).

Test1 Floatingpointoperationswithregularmemoryaccesses Test2 Floatingpointoperationswithirregularmemoryaccesses

Test3 Datastructures

Test4 Floatingpointandmixedintegeroperations

Test5 2-Dgraphics

(4)

oncoresofthesamephysicalnodeinordertoexaminethe

interfer-enceeffectperscenarioontheirperformance.Thisisexpectedtobe

reducedduetothetemporalisolationwhensharingthesamecore

(withtheexceptionofcacheinfluence).Howeveritisinterestingto

seewhathappenswhentheVMsthatcontaintheindividualtests

arescheduledondifferentcoresonthesamephysicalnode,thus

runninginrealparallelismandcontendingforthesamesub-circuits

liketheRAMbus.

So,overallwhatisinvestigatedis:

• differentcore%assignmentsforasingleVMrunningan

elemen-tarytest,

• differentgranularityofallocationassignmentsfora singleVM

runningatest,

• mutualinterferenceeffectofcombination oftestsfor

concur-rentlyrunningVMsonthesamecore(sharedL1andL2cache

memories)andforavarietyofschedulingperiods,

• mutualinterferenceeffectofcombination oftestsfor

concur-rentlyrunningVMsonadjacentcores(withsharedL2cache)on

thesamephysicalnodeandforavarietyofschedulingperiods,

• mutualinterferenceeffectofcombination oftestsfor

concur-rentlyrunning VMs onnon-adjacentcores (non-shared L1/L2

cache)onthesamephysicalnodeandforavarietyofscheduling

periods,

• ping times for differentping periods,%core assignments and

schedulingperiodsofVMsinordertoinvestigatetheeffecton

networkresponsiveness.

4. Testbed

For the real time scheduler, the one that is presented in

(Checconietal.,2009)waschosen.Thehostoftheexperimentswas aquadcore(2.4GHz)CPU,with8GBofRAMand8MBofL2cache (dividedintwo4MBpartspercorecouple).Powermanagement wasswitchedoffinordernottodynamicallychangetheprocessor speedforpowerefficientoperation.ThehostOSisUbuntuLinux 10.10,theVMOSisWindows7(withCygwinandOpenSSH)and thehypervisorisKVM.TheMatlabexecutablecontainingthetests tookasargumentsthedurationoftheexperiment,thenumberof testtorun,anexecutionIDandtheschedulingparameters(Q,P) fordocumentationpurposes.ItwascompiledinMatlabR2007b. In order to be able to run individual tests and not the entire suite,thevariationofthebenchmarkfunctionthatisprovidedin

(http://www.mathworks.com/matlabcentral/fileexchange/11984) was used. This was combined with suitable logic in order to performthetestsforaspecificperiodoftimeandnotforaspecific number of runs. The reason for this is that each one of the 6 elementary tests hasa different execution time, which is also affectedbytheresourcesallocatedtoit.Sothesynchronizationof themcannotbebasedonnumberofruns.Furthermore,inorder toruneachtestforanincreasednumberoftimes,thisduration wassignificantlylargersothatalargesamplecouldbecollected (intherangeofhundredsoftestrunsperconfiguration).Fromthis sample,themeanvalueoftheexecutiontimeswasextractedfor eachcase.

TheexperimentswerecoordinatedbyaJavamodule (Coordina-tor)thatwaspinnedononecore(CPU0).Thiscodeimplemented theinterfacetowardstherealtimescheduler,inordertochange theQ,PschedulingparametersontheprocessesoftheVMs.The VMswerepinnedtoeitheronecore(CPU2)oreachoneon dif-ferentcores(CPU1,2or3),dependingontheexperiment.After thesetupofthescheduler,theCoordinatorwasusedinorderto connectintotheVMsthroughSSHandlaunchdifferent combina-tionsoftheMatlabexecutablesforaspecificperiodoftime.The timewasmeasuredfrominsideMatlab,andeachexecutionwitha givenconfigurationwasperformeduntilacertaintimeperiodhad passed(500s).Thisleadstoasmallviolationofthetimeperiod, butitwasmuchsmallerthantheoverallduration.Itiscriticalto mentionthatifthefunctionsusedtomeasuretimearebasedonthe clockfunctionofMatlab,thisisfarfromaccurate.Itwasobserved thatwithhighutilizationinsidetheVM,thesetimingmechanisms lostcompletelytrackoftime (indicatively,whilethesimulation wasrunningfor2h,insidetheVMonly5measuredminuteshad passed).Thisiswhythetic-tocmeasuringfunctionofMatlabwas used.

Thepseudo-codefortheCoordinatorappearsinFig.1andfor theexecutableinsidetheVMinFig.2.Theoverall implementa-tiondesignandtheinteractionsbetweenthenecessarysoftware componentsappearinFig.3.Indetail,thedifferentstepsneeded are:

• Launching of the Java Coordinator. This component has the responsibility ofkeeping track of theexecution loops forthe variousparametersoftheexperimentslikeschedulingperiods, CPUpercentagesassignedandtestcombinations.Furthermore, itraisesthethreadsthatcontacttheschedulerinterface(Python script)forsettinguptheCPUallocationsforeachVM.

Fig.1.Pseudo-codeforJavaCoordinatorsequenceofactions.

(5)

Fig.3.Softwarecomponentsneededfortheimplementationandtheirinteraction:themaincomponentistheJavaCoordinatorwhichkeepstrackofexecutionloopsin

ordertocovertheinvestigatedinterval,setsuptheschedulingparametersoftheVMsandconnectstotheguestOSinordertolaunchthespecifictestcombinationdictated

bythecurrentexecution.

• The next step is to raise different threads in order to con-necttotheVMs and launchtheMatlab executableswiththe specificarguments(testcombinations).Thesethreadsare syn-chronized in the end of their execution, in order to ensure theconcurrencyofthenextcombination execution.It is crit-icalto mentionthat in orderto ensurea stable execution of thethreadsfor anunlimitedperiod oftime, the implementa-tionsuggestedin( http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html?page=4)was followed. This guaran-teesthattheoutputanderrorstreamsarecapturedefficiently, inordertoavoidcomputerhangs.

• WhenbothexecutionsinsidetheVMsarecompleted,theJava Coordinatorprogresseswiththenextconfigurationdictatedby theloop.

Threadsynchronizationwasperformedinordertoensurethat theVMsareexecutedconcurrently.TheresultsofeachVMwere keptinalogfilelocallyattheVM.Attheendtheyweremergedin ordertoobtaintheoveralllogs.Detailsregardingthetestbedare documentedinTable2.

Thedurationofeachrun(aspecifictestcombinationwitha spe-cifichardwareconfiguration)wassetto500s.Thiswassufficient timetoensurethatthethreadswouldbeconcurrentlyexecuted forthemajorityofthetime,withouthavingrandomssh connec-tiondelaysinfluencethisprocess.Thesshconnectiondelaysranged from2.8to3.8s.Thisiswhythreadsynchronizationwasperformed intheendofthe500-sinterval.Thethreadthatfinishedfirstwaited forthesecondonetocomplete.Itwasalsoenoughinordertoletthe testsrunforanincreasednumberofruns(intherangeofhundreds), thusleadingtosufficientsamplegathering.

Table2

Test-beddetailsregardinghardwareandsoftware.

HostOS LinuxUbuntu10.10 GuestOS Windows7 Hypervisor KVM

BenchmarksMatlabR2007bexecutable

PhysicalhostIntelCore2DuoQ6600(Quadcore2.4GHz,8MBcache,8GBRAM)

Furthermore,anexecution IDwaspassedinside theVMs,in

ordertobeincludedin thereportedresults. Thiswasaunique

numberforeachexaminedcombination.Thisisnecessarydueto

thefactthatit wasobserved thatfora numberofrandom

rea-sons(likebrokentcppipes,etc.)alimitednumberofcombinations

didnotsucceedinstartingorfinishing.ForthesecasestheIDdid

notgetlogged.Thiswaytheresultsfromtheseparatefilesthat

arekeptinsideeachVMandconsolidatedintheendcouldbe

fil-teredinordertoruleouttheaforementionedcasesandrepeatthe

experimentforthem.

5. Measurements

Inthefollowingparagraphsthedifferentdeploymentscenarios

aredescribed,alongwiththeresultingmeasurements.Forallcases,

theschedulergranularitywasinvestigatedfrom200to800ms.A

warm-upperiodforthesystemwasintroduced,assuggestedbythe

literature,inorderfortheVMstobehaveinamorestablefashion.

Thatiswhytheexperimentsbeganfrom150ms,buttheresults

werediscardedforthatspecificvalue.Eachdatapointinthegraphs

showninthenextsectionsisthemeanvalueoftheindividualtest

runsinsidea500-sinterval.Ingeneraltheseindividualtestruns

wereintherangeofhundredsoftimes.

5.1. SingleVMonacore

Inthisscenariothemaximumpercentageassignedtoeachtest

is80%.Thisisduetothefactthattheremainingpartisleftforhost

systemtaskstoutilize.Inthisconfiguration,eachVMisexecuted

asstandaloneononecore.NootherVMisrunning.Thiswas

per-formedinordertoseetheeffectofthegranularityofthescheduling

parametersandthepercentage allocation,inadditionto

acquir-ingthebaselinescoresofun-interferedexecution(standalone)for

comparisonwiththeVMconsolidationscenarios.Asscheduling

parametersareconsideredtheCPUshareXassignedtoataskin

aperiodP.Ineveryperiod,thetaskisgivenX*PmsofCPUtime.For

exampleifwehaveaschedulingperiodof500msandaCPUshare

of40%,thismeansthatthetaskwillbegiven200msofCPUtime

(6)

Fig.4. TestscoresfordifferentCPUpercentagesandgranularity(schedulingperiod) whenthetestsarerunningwithoutanyinterference(co-scheduledtests).Thetest score(verticalaxis)indicatesthetimeneededforexecutingonetestrunandhas beencalculatedfromthemeanvalueofhundredsoftestrunsinsidea500-s execu-tionrun.Theschedulingperiod(horizontalaxis)isthetimeintervalinwhichthe allocationofX%CPUshareisguaranteed.(a)Standalonetest1;(b)Standalonetest 4;(c)Standalonetest6.

In Fig.4thescoresfor each test areportrayed, fordifferent CPUsharesandperiodsofassignmentofthem.Lowertestscores indicatebetterperformance.Inessence,thetestscoreisthetime neededforcompletingonetestrun.Onlyanindicativenumberis portrayedhere,remaininggraphsareincludedinAppendixA(Fig. A.1).

Fortests1–4,theconclusionisthatthebehavioris indepen-dentofthegranularityofscheduling.Thisisanticipatedsinceina standalonebasis,theeffectofinterferenceincachememoriesfrom taskswitchingisnon-existent.Withrelationtotheimprovement ofperformancewhentheCPUshareincreases,thisisalmostlinear, asseenfromtheinverselyproportionaltestscores.

Fortests5–6(graphicstests),thebehaviorismoreoscillatory, whichcanbeattributedtothedifficultyofthevirtualizationlayer toaccessoremulatethegraphicscard.

5.2. ConcurrentVMscollocatedonthesamecore

Thesecondexperimentaimsatexaminingtheeffectof concur-rencyontheperformanceofVMsthatarerunningonthesame core,thususingsharedL1andL2cachememories.Forthisreason, eachVMisallocated40%ofthecorecapacity.Theperformance scoresarecomparedtothe40%scoresofexperimentA.Resultsare showninFig.5.Theseillustratethescoresofeachindividualtest, whenrunningonthesamecorewitheachoftheothertests.That iswhythe(i,j)figuredoesnotcoincidewiththe(j,i)figure.Lower testscoresmeanbetterperformance.

Itisevidentfromtheabovegraphsthatwhilethescheduling periodPoftheassignmentsincreases,theperformanceoverhead oftheco-scheduledtestsimproves.Thisisowedmainlytothefact thatwithlargerperiods(andthesame%ofCPU)theapplications havemoretimetoutilizecomponentssuchasthecachememories. AlowerperiodresultsinmorefrequenttaskswitchingontheCPU andthus highercache interferenceand contaminationbetween theconcurrenttasks.Furthermore,theoverheadfromthecontext switching(scheduleroverhead)isincreased.

Anotherconclusionwasthatinsomepartsthegraphsseemedto portraysomeanomalieslikeinthetest3graph(Fig.5c).However, aftercarefullyobservingtheresults,itseemedthatwhenthehigh valuesofonetestoccurred,therewerelowvaluesfortheothertest. Thisledustoplotthecombinedperformance(addedbenchmark scores)ofbothco-scheduledVMs,whichshowedamorelogical system-levelgraph.Itseemsthatwhatmattersinthiscaseiswhich taskstartsfirst.Inthetest-beddescriptioninSection4itwasstated thattheexecutionthreadsaresynchronizedintheendofthetesting period,becausethiscannothappeninthebeginning.Thethreads arelaunchedsimultaneously,butduetoanumberofrandom rea-sons(likesshdelayforexample),theexecutionofeachtestinthe respectiveVMsdoesnotstartatexactlythesametime.Thisdelay wasintherangeof3s,howeveritwasmuchlowerthantheselected 500sdurationofeachcombination.Fromthesystemlevelgraphs moreaccurateconclusionscanbededucted.Thesearecombined withtheotherscenariosandaredepictedinFig.6.

Furthermore,thereseemtobeextremelyhighvaluesforthe 250msperiodP.Thishappensforalmostallgraphsandis con-tradictory tothe general trend of the graphs. By taking under considerationthewaytheforloopsareconstructedinFig.1,we canconcludethatthishappensduetosomeinternalinterference intheguestOS(e.g.anOStaskthatstartsexecutinginsidetheVM) andinfluencesthemeasurementstakenatthisperiod.

5.3. ConcurrentVMscollocatedonadjacentcores

The third experiment aims at investigating the effect on the performance of VMs when running on different but adja-cent cores on the same CPU. This is to identify the effect of

(7)

Fig.5. Effectonindividualtestscoresfordifferentcombinationsofconcurrenttests.Eachofthegraphsindicatesatest’sperformance.Itcontains7lineswhichcorrespond

tothistest’sperformance,wheneachoftheothertestsisrunningontheotherVM,inadditiontoitsperformancewhenitisexecutedasstandaloneforcomparisonpurposes.

(a)Test1performance;(b)test2performance;(c)test3performance;(d)test4performance;(e)test5performance;(f)test6performance.

L1 cache memory interference, when compared to the results of B. In this setup, the cores have different L1 cache memo-riesbutcommonL2 ones.Thechosenutilizationpercentagefor thisexperiment was40% percore.Thereasonfor choosingthis percentage was to have comparable results with B, for which maximumassignmentoftwotasks ononecoreis40% pertask (aftersubtractingapproximately20%thatisalwaysavailablefor systemtasks).Comparisonwiththeotherscenarios isincluded inFig.6.

5.4. ConcurrentVMscollocatedonnon-adjacentcores

ThefourthconfigurationwasdecidedtobepinningtheVMson non-adjacentcores.Thereasonforthissetupisthatthephysical nodeoftheexperimentshas4cores,whichshareaL2cachein pairs.SobypinningtheVMs onnon-adjacentcores,thegainin termsofhavingnosharedcachememoriesisinvestigated,in com-parisontothepreviousexperimentwhereL2memorywasshared. Forallscenarios,thecombinedperformanceoftheVMsper

(8)

com-Fig.6. Comparisonofsystemlevel(addedscores)performanceforalldeploymentscenariosand40%CPUallocationperVM.Ineachofthegraphsaspecificcombination

oftestsisinvestigatedinordertocomparetheiraddedscores(verticalaxis)withregardtodifferentdeploymentscenariosandhowthisperformanceisaffectedbythe

schedulingperiodP(horizontalaxis).(a)Tests1and1combinedperformance;(b)tests6and6combinedperformance;(c)tests1and4combinedperformance;(d)tests1

and5combinedperformance;(e)tests2and4combinedperformance;(f)tests2and6combinedperformance;(g)tests3and5combinedperformance;(h)tests4and6

combinedperformance.

binationoftestsisdepictedinFig.6.Inthisfigure,wehaveadded thetestscoresofbothVMsandhaveplottedallthedeployment scenariosforcomparisonpurposes.Theremainingcombinations thatarenotincludedinthisfigurearelocatedinAppendixB(Fig. B.1).

Theanticipationduringthebeginningoftheseexperimentswas thatinterferencebetweenmemoryusageofconcurrentlyrunning VMswouldsignificantlyaffectVMperformance.Thiswasvalidated fromalltestedcases.Thebaselinescoresaremuchbetterthanany otherconfiguration,eventhoughthepercentageofcoreassignedto

(9)

Fig.7.Percentageofthedegradationofbenchmarkscoresforalltest

combina-tionsandscenariosandP=800msinordertoillustratetherangeoftheoverhead

andtheidentificationofoptimumcombinations.Inthehorizontalaxisthedifferent

testcombinationsareportrayedwhileintheverticalonethedegradationofthe

performanceasextractedfromEq.(1).

theVMswasthesame.Furthermore,differenttestscreatedifferent levelsofinterference,thusvalidatingtheinitialassumptionsthat carefulanalysismustbemadeinordertooptimizeVMgroupings onavailablenodes.Higherperiodsfortheschedulerseemalsoto benefittheperformance,duetobettercacheutilization.

However,itwasalsoanticipatedthatmovingfromtightly cou-pled configurations (same core) to more loosely coupledones (adjacentandnon-adjacentcores)wouldbenefittheperformance oftheVMs.Afterobservingthegraphs,wecanconcludetothefact thatthisdoesnothappen.Giventhattheloosely coupledcases meanL1cacheindependence(foradjacentcores)andL1/L2cache independence(fornon-adjacentcores)itcanbeassumedthatthe bottleneckthatiscausingthislackofperformanceimprovement hastodowithRAMaccesses.Thefactorofinfluenceinthiscasecan bethememorybus(FSB)thatisnecessaryforaccessingtheRAM. Intheobservedhardware,thiswasa1066MHzbus.However,this bandwidthissharedbetweenthedifferentcorestryingtoaccess themainmemory.IfmorethanonecorestrytoaccesstheRAMat thesametime,theavailablebandwidthwillbedividedbetween them.Thus,whenallocatingVMsondifferentcores(withthesame 40%share),thedividedbandwidthisalimitingfactorincomparison tothesamecorescenario.Forthesamecoredeployment,VMsare executedoneaftertheother,inasequential,pseudo-parallel fash-ion.Duringthistime,eachVMhasalltheavailablebandwidthof theFSB.Whenallocatedindifferentcores,fortheintervalinwhich the40%activationperiodsoftheVMsoverlap,theirbandwidthon theFSBwillbedivided.GiventhatthenumberofaccessestoRAM (orwhen)thatarenecessarycanbefluctuating,thiscanexplain whyinsomecasesoneconfigurationisbetterthantheother,with oneeffectprevailingovertheother(cacheinterferenceovershared FSBbandwidth).

Inordertobetterconcludeandillustrateforwhichcombination ofteststhisimprovementisgreater,thepercentagedegradationof thesystemleveltestscoresisportrayedinFig.7,forP=800ms.We chosethespecificperiodduetothefactthatitwasshownfrom thepreviousgraphsthatforthisvalueofPthesystemhadtheleast overhead.Thedegradationoftheperformanceistheratiobetween thetestscoresoftheaddedconcurrentexecutionsandtheonesof theaddedstandaloneones,forallthecombinationsoftestsandis derivedfromEq.(1).

%Degradation=100×(TAI+TTBI)−(TABase+TBBase)

ABase+TBBase (1)

Inthisformula,AandBarethetestnumberspercombination,Iis thedeploymentscenario(samecore,adjacentcores,non-adjacent cores)andBaseisthescoreswhenthetestsareexecutedas stan-dalone,withoutanyothertestrunningonthephysicalnode.

Theaimofthisfigureistoshowtheconcretepercentage over-heads ofdifferent scenariosand for differenttestcombinations andtoidentifyanypatternsthatcanbeofbenefitforextracting genericconclusions.Fromthisitcanbeshown thatthe compu-tationalcapabilitiesoftheVMscanrangefromalmost0andup to160%,dependingonthecollocatedtasksandthedeployment scenario.

Fromthisgraphitisalsoevidentthatthebestcombinations areforallapplicationswithtest5andthesecondfittestcandidate istest6.Thereasonwhythesegraphicstestsarethefittestcanbe attributedtothefactthattheyuseasmallamountofdata,onwhich theyperformalargenumberofcomputations.Worstcandidatefor co-allocationistest4,whiletheadjacentcoresplacementseemsto producethemaximumoverhead(combinationofL2andFSBbus interference).Thisisevidenceoftheimportanceofthe investiga-tionoftheinterferencecausedbetweenco-scheduledVMs,which canleadtosignificantdegradationoftheQoSfeaturesof applica-tionsrunninginvirtualizedinfrastructures.Furthermore,inmost casestheadjacentcoresetuphastheworstperformance,duetoL2 cacheinterferenceanddivisionofmemorybusbandwidth.L1cache interferenceseemstoaffectonlyalimitednumberofcombinations (like1and2).

5.5. Statisticalanalysisofthemeasurements

Inordertogatherthedatasetthatwaspresentedintheprevious graphs,eachconfiguration(specificschedulingperiod,specifictest combinationanddeploymentscenario)waslettorunfor500s.This meansthateachpointinthegraphs(withtheaccordingsetup)was producedfromtakingthemeanfromalltherunsexecutedinthis 500-sinterval.Thenumberofrunswasfluctuatingduetothefact thateachtesthaddifferentcomputationalneedsandalsothe over-headfromtheconcurrentlyrunningVMswasvarying.Howeverit wasintherangeofhundredsoftimesforeachdatapointThisis indicativealsoofthetestscores,whichindicatethetimeneededto performonerun.Themeanvaluefromthesehundredsoftimesof executionsforeachconfigurationwasusedinthegraphs.Details regardingthespecificnumbersforeachexperimentareincluded inAppendixD(Fig.D.1).

In order to further validate the data gathering process, we repeatedanumberoftheexperimentsfor10times.Thismeansthat theexecutionswithaspecificconfiguration(testcombinations,Q/P ratio,Pschedulingperiod)wereexecuted10timesinorderto inves-tigatethedifferencebetweennon-consecutiveexecutionswiththe sameconditions.Duringthisexperiment,eachconfiguration (500-sinterval)wasrunfor10timesinanon-consecutivemanner.The results(meanvalues ofthetestscoresinthe500-sintervaland standarddeviationsofthevaluesinsidethisinterval)areportrayed inFig.8.Fromthisitcanbeobservedthatthedifferencebetween thedifferentexecutionsisverysmall.

Furthermore,thenumberofrunsforeachtestineachofthe exe-cutionswasdocumentedandisshowninFig.9.Fromthisitcanbe observedthatforeachexecution(fromwhichthemeanvalueand standarddeviationthatisportrayedinFig.8wasextracted)around 320runswereconductedfortest4andaround230runsfortest5. Thedifferenceinthenumberofrunsbetweenthetwotestsoccurs fromthefactthateachtesthasdifferenttypesofcomputations, thusdifferenttimeofexecution.Itwaschosentosynchronizethe executionsbasedonagiventimeinterval(andnotonnumberof runs)inordertoensuretheconcurrencyofthecomputations.By havingthesamenumberofruns(anddifferenttimeforeachrunof eachtest)wewouldhavetheproblemofonetestfinishingfaster thananother.Thustheremainingrunsoftheslowertestwouldbe executedasinthestandalonephase.

Fromtheabovegraphswecanconcludethatduetothefactthat thetestrunsineachexecutionaresignificantlyhigh,repetitionof

(10)

Fig.8.Variationofthemeantimesandstandarddeviationsofthetestscoresfor

10differentnon-consecutiveexecutionsofthesameconfiguration(combination

oftests4and5,schedulingperiod=700ms,samecoreexecution).Thesamplefor

extractingthemeanvalueandstddevforeachexecutionhasbeengatheredafter

lettingeachexecutionrunfor500s.

Fig.9.Numberoftestrunsinsidethe500-ssamplegatheringintervalforeachof

thedifferentnon-consecutiveexecutionsofthesameconfiguration(combination

oftests4and5,schedulingperiod=700ms,samecoreexecution).

theexperimentswillproduceverysimilarvalues(asdepictedin Fig.8).Thedistributionoftheindividualvaluesofthetestscores fortest4appearsinFig.10,foroneofthe10executions(Fig.10a) andall10repetitions(Fig.10b).Fromthiscomparisonitcanbe concludedthatthetwodistributionsarealmostalike,showingthe peaksinthesamebins(0.8–1.2and1.2–1.4values).

Regardingthestatisticalanalysisofthesamplesgatheredinside asingleexecution,theconfidenceintervalsaredepictedinFig.11, forbothtests(4and5)oftheselectedrun.Forthisrun(500-s inter-val),test4wasexecuted232timeswhiletest5wasexecuted321 times.Inordertoprocesstheresults,thedfittoolofMatlabwas used.Fromthiswecanobservethat,especiallyfortest5thereare twomajorareasofconcentrationofthetest scores,aroundthe 0.9and1.3values.Thiscanbeattributedtotheinactiveperiods insertedbytheschedulerforthetask.Thisconfigurationhada40% shareoveraperiodof700ms.Thismeansthatfor60%ofthetime (420ms)thetaskisdeactivated.Thusifthetestrunmanagesto finishbeforetheinactiveperioditwillnothavetheextradelayof the420ms.Ifnot,thenitwillfinishrightafterthatinterval.This agreesalsowiththedifferencebetweenthetwomajorvaluesof thedistribution.

5.6. Networkresponse

Duringtheinitialtestsfortheexperimentsof5.1itwasnoticed that for low values of period P (below 100ms) the VMs were extremelyunresponsive,evenforasimplesshconnectionorping.

Fig.10. Histogramofthedistributionofindividualtimesforeachtestrunfor(a)1

executionofthe500-sintervaland(b)10executionsofthe500-sintervalwitha

givenconfiguration(combinationoftests4and5,schedulingperiod=700ms,same

coreexecution).

Fig.11.Cumulativedistributionfunctionandconfidenceintervalsfortests4and5

executionsamplesforschedulingperiod=700ms,samecoreexecution.The

hori-zontalaxissymbolizesthetestscoresandtheverticalaxistheprobabilitytohave

thesescores.

Forthisreasonitwasdecidedtoconductanotherexperimentin order toinvestigate the deterioration of VM network response times.ThisinvestigationinvolvedthepingingofoneVMfora cer-tainperiodoftime(300s)foranumberofdifferentconfigurations. Differentpingperiodswereused.Thenumberofsamplescanbe derived fromthedivision ofthe300-s period withthepinging period.DifferentpercentagesofCPUassignmentswerealsotaken underconsiderationanddifferentperiodsPofthescheduling allo-cation.DetailedvaluesappearinTable3.

InordertostresstheCPUtofullload,twoapproacheswere con-sidered.ThefirstwaswithoneoftheMatlabbenchmarks.Inorder toavoidtheeffectofsomespecificcomputationinterferencethat couldbecausedbythesetests(likeincreasedI/Othatcouldincrease

(11)

Table3

Parametersvariedforthepingexperiment.

Pingingperiod 0.5,1,1.5and2s

SchedulingperiodP 150–800ms(50msstep)

VMstatus Fullload

Q/P 20,40,60and80%

theservedinterruptsandaffectthepingresponses),another

exe-cutablewascreated.Thiswascomprisedofasimpleinfinitefor

loopthatcalculatedthesquarevalueoftheloopindex.Theresults

werealmostidentical.Inthefollowinggraphs(Fig.12),themean

valueanddeviationoftheresponsetimesaredepicted,forthemost stressfulexperiment (pingingperiod=0.5s).Morecombinations areincludedinAppendixC(Fig.C.1).

ItisimperativetostressthatthetestbeganfromP=50msof granularity,howevertheVMsshowedlittleresponsiveness,with manypackettimeoutsoccurring.Fullresponsivenesswasshown from150msvalues,andfromthesevaluesandontheplotswere created.Whatisseenfromthesemeasurementsisthatafterthe initialgranularitieswhentheVMisunresponsive,lowvaluesofP aremorebeneficialforthenetworkmetrics(bothmeanvalueand deviationfromthatforeachindividualpackettime).Furthermore, onetheoreticalexpectancywouldbethatthepingtimeswould deteriorateasthePvaluesincreased,duetothefactthatforlow percentagesofCPUassignments,largeinactiveperiodsofthetask wouldaffectespeciallythedeviationtimes.However,asseenfrom thegraphs,thehighestvaluesforbothmetricsareformiddlecases ofP(forthe450mscase).Afterthismaximum,thenetworkmetrics seemtoimproveagain,howevernottothelevelsofthelowP val-ues.SimilarbehaviorwasnoticedforadjacentvaluesofP(200ms, 400msand800ms).Thesewereskippedforbettervisibilityofthe graphs.Anotherinterestingconclusionisthatthepingingperiod doesnotseemtoaffectthisbehavior.

ThesimilarityofthemetricsforhighvaluesoftheallocationQ/P (%CPUshare)isduetothefactthatforpercentagescloseto100% (likethe80%case),theCPUisalmostdedicatedtothetask,sothe periodofthisallocationdoesnotinfluencetheVMbehavior.Almost theentiretimethetaskisactive.

Forthestatisticalanalysisofthemeasurementswehaveused asanexampletheconfigurationforschedulingperiodof150ms, pinging period of 0.5sand CPU share of 60%. The distribution of the600 individualvalues collected duringthe sample gath-eringexperiment is depictedin Fig.13a.In Fig. 13b thefitting of an exponential distribution to these values is shown, with meanvalue=461.544,variance=213,023,stderror=18.8739and estimatedcovariance=356.226.Exponentialdistributionsare com-monlyusedfornetworkingtimes,asalsoshowninCucinottaetal.

Fig.13.(a)Distributionoftheindividualpingdelaysin500msbinsand(b)Density

distributionofindividualvalues(pingtimes)andfittingofanexponential

distribu-tionforschedulingperiodof150ms,pingingperiodof0.5sandCPUshareof60%.(a)

Histogramofthedistributionoftheindividualvaluesand(b)densitydistribution

andfittingofanexponentialdistribution.

(2010a).TheCDFandconfidenceintervalsforallCPUsharesofthis configurationareshowninFig.14,fromwhichthebenefitofhaving moreCPUshareisvisualizedinamorequantitativeway.

6. Predictionofoverhead

The experiments that were presented above are helpfulfor an infrastructure/Cloud provider in order tobe able to have a generalizedpictureofhowapplicationperformancedeteriorates followingtheplacementdecisions.Thehighlevelconclusionscould alwaysbeextractedasfuzzylogicrulesinasimilardecision mak-ingmechanismthatwouldaidtheproviderduringthescheduling

Fig.12.(a)Meantimeoftheresponsetimesforfullloadandvariousconfigurationsofgranularityofassignment(schedulingperiod)Pandpingingperiod0.5sand(b)Mdev

time(asreportedbythepingcommand)oftheresponsetimesforfullloadandvariousconfigurationsofgranularityofassignment(schedulingperiod)Pandpingingperiod

(12)

Fig.14.Cumulativedistributionfunctionandaccordingconfidenceintervalsforall

CPUshares,forschedulingperiodof150ms,pingingperiodof0.5s.Thehorizontal

axisisthetimedelayinmillisecondsandtheverticaltheprobabilitytohaveaping

timelessorequaltothisdelay.

and allocation policiesenforcement.However, in order for this approachtomeetitsfullpotential,asuitablegenericmodelshould bedesigned.Thismodelmustbeabletoextractaccurate predic-tionsregarding theanticipated performance oftheapplications thathavesimilarbehaviortotheinvestigatedtests.Itmustbeable totakeasinputtheconditionsofexecutionandproducethe antici-patedlevelofperformance(testscore),thusaccuratelyquantifying theexpectedoverhead.

Inordertoimplementsuchasystem,themostgeneric black-boxmethodwaschosen,anartificialneuralnetwork(ANN).ANNs areamethodologythataimstoimitatethebehaviorofthehuman brainandhavebeenuseduptonowforanumberofapplications (e.g.patternrecognition,functionapproximation,classificationor timeseriesprediction).Theirmainabilityistomodelasystem’s behaviorbasedonlyona trainingdataset(blackboxapproach) thatincludesanumberofsysteminputsandtheaccording out-puts.Ingeneral,theANNcorrelatestheinputwiththeoutputdata inordertofindanoverallapproximationofthemodelfunctionthat describesthedependenceoftheoutputfromtheinput.Itwas cho-sensinceasdemonstratedinSection5,therearenumerous,even hidden,factorsthatmayaffectanapplication’sperformancelike hardwaredesigndetails.Havingananalyticalmodelingapproach isalmostimpossible,giventhevarietyofthesefactors.Buteven ifonedevotessuchanefforttocreateadetailedmodel,itwillbe deemeduselesswiththenextprocessorgenerationoradifferent architecturaldesign.TheANNsofferamuchmoregenericand flexi-bleapproach,andgivenonlyasuitabledataset,canachieveaquick andsatisfactorymodel.

TheidentifiedinputsandoutputsoftheANNmodelaredepicted inFig.15andincludetheinvestigatedparametersoftheprevious sections.

Therangeorpotentialvaluesoftheparametersarelistedin Table4.

All the inputs and outputs were normalized in the (−1,1) numericinterval.Forthenon-numericinputs,thesewereassigned

Fig.15.ANNmodel.

Table4

Rangeofparameters.

Testnumber/applicationtype1 Canbefrom1to6(Matlab benchmarks)

Testnumber/applicationtype1 Canbefrom1to6(Matlab benchmarks)

Q/P(core%) 0–80%

P Arbitrary(inourcaseitis

from200to800ms)

Executionmode Canbestandalone,same

core,adjacentcores, non-adjacentcores

differentvaluesforeachstate.Thenormalizationisnecessaryfor

internalANNrepresentationandbetterdepictionofthesystem.The

datasetthatwaspresentedintheprevioussections(1100different

executions,eachfor500s)wasusedfortraining(50%),intermediate

validationduringtraining(20%)andindependentvalidation(30%)

ofthenetwork.

6.1. OptimizationofANNstructureandparameters

Despitetheiradvantages,ANNscomewithanimportant

draw-back.Thedecisionsregardingtheirdesignparameters(likenumber

of hidden layers, transferfunctions for each layer and number

of neuronsper layer)are more or lessbased onthe designer’s

instinctandexperience.Inordertoautomateandoptimizethis

process,anevolutionaryapproachwasfollowedinthispaper.The

aforementionedparametersoftheANNdesignwereinsertedin

a Geneticalgorithm (GA) (Holland,1975), in order tooptimize

theirselection.In each generationof theGA,a setof solutions istried out,through trainingand validatingtherespective net-works.Theperformancemetricforeachsolution,inourcasethe errorinthetraining set,is thenreturnedtotheGA,asan indi-cationofitssuitability.Thenthenextgenerationiscreatedbythe bestsolutionsofthecurrentone(elitism),combinationsofexisting solutions(crossover)andslightdifferentiationofthem(mutation). Throughthisalgorithm,theparameterspaceisquicklysearched forasuboptimalsolution.TheparametersthatarefedintotheGA appearinFig.16.

Inorder toimplement theaforementionedapproach,an ini-tialGAscriptiscreatedinordertohandlethepopulationineach iteration and tolaunch theANNcreator script, afterfeeding it withthevaluesthataredepictedinthechromosomes.TheANN creatorscripttakesthevariablesfromtheGAscriptand dynami-callycreatestheANNsaccordingtotheGA-selectedparameters,by changingthenetworktopology.Thenitinitializesandtrainsthem (feed-forward,backpropagationnetworkswereused,trainedwith theLevenberg–Marquardt(Moré,1978)algorithm)andreturnsthe

Fig.16.ParametersoftheANNdesigntakenfromtheGA.Theseincludehowmany

hiddenlayerswillbeinitializedandwhatwilltheirtransferfunctionandtheir

neurons.Theseparameterswereingeneraldecidedbyahumanexpertbasedon

(13)

Table5

DetailsofthebestANNstructureasthiswasderivedfromtheoptimizationprocess,fordifferentnumberofgenerationsoftheGA.

Numberoflayers/GAgenerations Neuronsperlayer Transferfunctionsperlayer Meanabsoluteerror(%) Standarddeviation Time

4/30 5–7–19–1 Satlin–Radbas–Radbas–Satlins 5.94 16.47 40min

3/50 5–7–1 Tansig–Tansig–Purelin 5.06 13.85 1.1h

4/100 5–10–18–1 Radbas–Softmax–Radbas–Netinv 4.59 14.80 3h

4/150 5–24–8–1 Purelin–Logsig–Logsig–Satlins 5.16 14.61 5h

performancecriterion(errorinthetrainingset)totheGAforthe

evaluation ofeach solution.Anintermediate validationsetwas

usedinordertoenhancetheANNgeneralizationcapabilities,the

network’sabilitytopredictcasesthatithasnotmetbefore

dur-ingtraining.Inthiscase,eachnetworkwastrainedbythe50%of

thedatasetthenifitspredictionsontheintermediatevalidation

sethadameanabsoluteerror(MAE)smallerthan10%,this

net-workwassaved.Intheendofthegenerations,fromallthesaved

networks,thefinalcandidatewastheonethatshowedthebest

performance(MAE)intheindependentvalidationset.Itwasalso

investigatedifthereturnedmetricoftheGAwastheMAEinthe

intermediatevalidationset,insteadofthetrainingseterror.This

showedaworseperformance.

13 different transfer functionswere taken under

considera-tion(astheyaredescribedinhttp://www.mathworks.com/help/

toolbox/nnet/function.html) and a maximum limit of ten (10) layersandthirty(30)neuronsperlayerwasinserted.Four inde-pendentrunswereconductedfordifferentnumberofgenerations (30,50,100and150). Eachgenerationinvestigated20possible solutions.

TheANNstructureandoverallpredictionaccuracyfortheseis describedinTable5.Inthistable,onlythebestmodelfromeach runisdepicted.Themeanabsoluteerrorforall328independent validationcasesisbetween4.5and5.94,dependingonthe num-berofgenerationsused,whichcanbeconsideredasaverygood estimator.Onlythevaluesintheindependentvalidationsetwere usedforthecalculationofthiserror.Thesedatahavenotbeenused duringtraining.

The deterioration in the performance for 150 runs is not expectedbutcanbeattributedtothefactthateachGArundepends onsomerandomparameters,liketheinitialvalues ofthe chro-mosomeortherandomnessintheselectionofcharacteristicsto mutate. In a real-world situation,this can beeasily tackledby addinggenerations ofinvestigationuntila specificperformance criterionismet.Themostsuitablesolutionsarefoundinthe50 and100generationsruns.Thedifferenceliesinthefactthatfor 100generationsabettermeanerrorisacquiredbutaworse devi-ationmetric.Thefinalchoicedepends onthetrade-offbetween thesecharacteristics.

Anotherinteresting conclusionis that smallernetworks are moreabletocapturethedependenciesbetweentheinputandthe output.Eventhoughthenumberoflayershadamaximumvalue of10, only3or4-layernetworks showedgoodperformance(3 istheminimumlimitforANNs).Thiscanbeattributedtomore robustbehavior ofsmall networksdue tothefollowing aspect. Theinfluenceofthejthinputtotheithoutputdependsonallthe alternativepathsthatconnectthemandtheincreaseofthispath number(becauseofhighernumberoflayers)seemstoincrease thestandarddeviationoftheprediction.Asimilarresultappears inthecircuitsynthesisdomain,wheretheincreaseinthenumber ofpaths(laddercompared tolatticesynthesis)resultsinhigher systemsensitivity.

InFig.17theevolutionofthemeanabsoluteerrorofthesaved networksisportrayedforeachrun.Thesenetworksaresavedas theGAprogressesandiftheintermediatevalidationcriterionis met(errorintheintermediatevalidationset<10%).Thelackof sta-bility(continuouslyreducedmeanerror)inthisprocessisdueto

Fig.17.Temporalevolutionoftheperformance(meanabsoluteerroronthe

inde-pendentvalidationset)ofthenetworksthatmeetthesavecriterion(MAE<10%

ontheintermediatevalidationset).Afteraperiodoftime,theGAhasconverged

toasuitablesolution,afterwhichtheperformanceoftheinvestigatednetworks

deteriorates.

thefactthatrandomnessisuptoapointinherentinan evolution-aryapproach duetotheuseof operatorssuchasmutationand crossover.

Furthermore,fromthisgraphitisevidentthatalargenumber ofgoodqualitynetworksareproducedthroughtheproposed opti-mizationprocess.It canalsobeseenthatthenetworksthatare savedinlatergenerationsofonerunappeartobelosingquality. ThiscanbeattributedtothefactthatwhentheGAfindsagood solution,italsofocusesonnearbysimilarconfigurations.Ifthegood solutionisthelocalorglobaloptimum,thenthenearbysolutions willbeunderperformingandalargenumberofthecandidateswill bedevotedtothisareainvain(othersolutionsbasedoncrossover willsearchtheremainingsamplespace).Howeverthisisalsooneof thestrengthsofGAs,sincethroughthisprocessintheearlystages vastareasofthesamplespacearetriedoutandafterwardsthebest onesarethoroughlyexaminedinordertofindtheoptimalsolution inthatparticularinterval.

Thedistributionoftheerrorsofthenetworkinpredictingthe performance(testscores)isdirectlycomparedtothemulti-variate regressionmethodinFig.18.Thisaccuracyisbasedonlyonthe independentvalidationsetonly,forobjectivitypurposes.

6.2. Comparisonwithmulti-variatelinearregression

Inordertocompareourapproach,themultivariateregression methodwaschosen,thatisalsousedinKohetal.(2007).The MAT-LABfunctionmvregresswasused,inthesamenormalizeddataset. Theonlydifferencewasthatnowthetrainingandintermediate validationdatavectorsweremerged.Inthismethod,thereisno needtohaveintermediatevalidationstep.Forthevalidationofthe results,thesame30%independentvalidationdatasetwasused, forobjectivitypurposes.Thedetailsoftheapproximationfunction

(14)

Fig.18.Histogramoftheerrorsintheindependentvalidationsetfor50and100

generationrunsoftheGA-ANNmodelcomparedtomulti-variateregression.The

differenceliesonthecenterofthedistributions,whichfortheproposedmethod

isaround0whileformv-regressisaround8%,Thehorizontalaxisistheerror(%)

intheestimation(inbins)whiletheverticalisthenumberofvalidationcasesfor

whichtheerrorwasinthespecificbin.

Table6

Multi-variateregressiondetails.

Coefficients Meanabsoluteerror Standarddeviation Timeneeded

0.0218 8.78% 18.15 ∼1s −0.0171 0.8326 −0.0801 0.0631 0

appearinTable6.Thecoefficientscellcontainsintherespective

orderthemultipliersforeachoneofthe5inputsdepictedinFig.15, plustheconstantfactor.Theirvaluesarelowbecausetheyhave beenderivedforthenormalizedvaluesofthedatasetinthe[−1,1] interval.

TheresultsfromthismethodhavebeenincorporatedinFig.18 fordirectcomparisonwiththebestANNmodelsfromthe optimiza-tionmethod.

Fromthecomparisonofthetwoapproachesitcanbeconcluded thattheoptimizedANNshaveabetterperformanceintermsof meanabsoluteerrorandstandarddeviation.Thisisalsoevident fromthedistributionoftheerrorsasdepictedinthehistogram. WhiletheANNmodelsexhibithighconcentrationoferrorsaround thezerovalue,thisisnotthecasefortheregressionmodel,which concentratesaroundthe8–10%errorpoint.

Thestandarddeviationvaluesarehighforbothmethods,but this is due tospecific values of the328-case validation set for which the measurements have large deviations. This happens for thepeaks ofthe graphspresented in Section 5 and canbe attributedtosomerandomeffectsthataffectsystemperformance (e.g.systemtasks that updatethe OS inside theVM, etc.),like in the case of P=250ms that was mentioned in the previous sections.

However,becausetheproposedmethodhasmuchlarger com-putationalneeds,forenvironmentswhere newmodelsmustbe createdontheflythemulti-variateregression method offersa satisfactoryperformanceinaveryfastway.Inthecontextof inves-tigationinthispaper(infrastructure/Cloudproviderthatwantsto haveamodelformoreefficientallocationofVMsovernodes),the computationalneedsoftheoptimizedANNmethodarenot con-sideredaconstraint.Themodelswillbecreatedonce,andmaybe refreshedinthefuturewithnewdata.However,tighttiming con-straintsfortheproductionofthemodelsdonotexist.Thisisan off-lineprocess.

Detailed percent improvements achieved by the compared methodaredepictedinTable7.

For the response time of the models, that is the duration needed for each model to provide the prediction after the

Table7

ComparisonbetweenmultivariateregressionandGA-optimizedANNs.

Method %Improvement(ANN

meanabsoluteerror comparedto mvregress) %Improvement(ANN standarddeviation comparedto mvregress) 50gen.GA-ANN 42.36% 23.64% 100gen.GA-ANN 47.61% 18.4%

assertion of the conditions of execution to theinput, this was

better for the multivariate regression. However, for the ANN

it was also very fast (∼10ms), thus not making this a critical

limitation.

7. Conclusionsandfuturework

Asaconclusion,theemergenceofvirtualizedandshared

infras-tructureslikeCloudsimposesasignificantchallengeforproviders

and applications. Applications that are running inside virtual

machinesareaffectedbymanyfactorslikevirtualizationand

co-allocationofotherVMs.Inthispaper,theeffectsoftheseallocation

decisionswereinvestigated,bytakingunderconsiderationa

sig-nificantnumberofparameterslikerealtimeschedulingdecisions,

typesofworkloadanddifferentdeploymentscenarios.The

perfor-manceoverheadposedbytheseparameterscanreachupto150%,

whilecarefullyselectingtheco-allocatedtasksandthescenarioof

placementcanminimizeorevencancelthiseffect.Whatismore,

higherschedulingperiodsresultinasignificantreductioninthe

overheadcausedbytheinterference.Furthermore,theapplication

ofsuchaperformanceanalysismethodonnewhardwaredesign

mayleadtooptimizedmulti-corearchitecturesforCloud-specific

environments,throughtheidentificationofbottleneckslikethe

sharedmemorybus.

Inordertoprovideanautomaticdecisionmakingmechanism

thatwill guidetheinfrastructurestoproperconfigurationsand

removetheneedofhumanintervention,aGA-optimizedANNcan

beusedtomodel,quantifyandaccuratelypredicttheperformance

oftheapplicationsforagivenconfiguration.Themodelacquired

fromthisprocess iseffective(error<5%),generic andcanbeat

any time extended in order to include other identified

signifi-cantfactorsorscenarios.Throughtheuseofthismechanism,an

infrastructure/Cloudprovidercanhaveaprioriknowledgeofthe

interference.Thustheyareabletooptimizethemanagementof

thephysicalresources.

Forthefuture,oneinterestingaspecttopursueistheautomatic

detectionofthetypeofworkloadcausedbyeachapplication.Inthis

study,theanalysiswasbasedonthe6Matlabbenchmarktests,that

depictdifferenttypesofcomputations.Forreal-worldapplications

thismeansthattheymust bematchedtooneormoreofthese

elementarytests,basedforexampleonthecomparison oftheir

footprints.

Acknowledgments

ThisresearchispartiallyfundedbytheEuropeanCommission

aspartoftheEuropeanIST7thFrameworkProgramthroughthe

projectsIRMOSundercontractnumber214777andOPTIMISunder

contractnumber257115.

AppendixA. RemainingfiguresfromSection5.1

(15)

Fig.A.1.RemainingfiguresfromSection5.1(Fig.4:testscoresfordifferentCPUpercentagesandgranularity(schedulingperiod)whenthetestsarerunningwithoutany

interference(co-scheduledtests).Thetestscore(verticalaxis)indicatesthetimeneededforexecutingonetestrunandhasbeencalculatedfromthemeanvalueofhundreds

oftestrunsinsidea500-sexecutionrun.Theschedulingperiod(horizontalaxis)isthetimeintervalinwhichtheallocationofX%CPUshareisguaranteed).(a)Standalone

test3;(b)standalonetest2;(c)standalonetest5.

AppendixB. RemainingcombinationsfromSection5.4

SeeFig.B.1.

AppendixC. Figuresforpingingperiod=2s(continuation ofSection5.5)

(16)

Fig.B.1.RemainingcombinationsfromSection5.4(Fig.6:comparisonofsystemlevel(addedscores)performanceforalldeploymentscenariosand40%CPUallocation

perVM.Ineachofthegraphsaspecificcombinationoftestsisinvestigatedinordertocomparetheiraddedscores(verticalaxis)withregardtodifferentdeployment

scenariosandhowthisperformanceisaffectedbytheschedulingperiodP).(a)Tests1and3combinedperformance;(b)tests4and4combinedperformance;(c)tests4

and5combinedperformance;(d)tests5and5combinedperformance;(e)tests3and4combinedperformance;(f)tests2and3combinedperformance;(g)tests1and6

combinedperformance;(h)tests3and6combinedperformance;(i)tests2and5combinedperformance;(j)tests2and2combinedperformance;(k)tests3and3combined

(17)
(18)

Fig.C.1.(a)Meantimeoftheresponsetimesforfullloadandvariousconfigurationsofgranularityofassignment(schedulingperiod)Pandpingingperiod2sand(b)Mdev

time(asreportedbythepingcommand)oftheresponsetimesforfullloadandvariousconfigurationsofgranularityofassignment(schedulingperiod)Pandpingingperiod

2s(continuationofSection5.5).

AppendixD. Numberofexecutionsforeachofthedata pointsinsamecorescenariographs

(19)

Fig.D.1.Numberoftestrunsfromwhichthemeanvaluesforthescoreswereextractedforthesamecorescenariographs(Fig.5)for(a)test1whenitiscoscheduledwith

allothertests,(b)test2whenitiscoscheduledwithallothertests,(c)test3whenitiscoscheduledwithallothertests,(d)test4whenitiscoscheduledwithallothertests,

(e)test5whenitiscoscheduledwithallothertests,(f)test6whenitiscoscheduledwithallothertests.(a)Test1numberofrunswhenco-scheduledwithallothertests;(b)

test2numberofrunswhenco-scheduledwithallothertests;(c)test3numberofrunswhenco-scheduledwithallothertests;(d)test4numberofrunswhenco-scheduled

(20)
(21)

Fig.D.1. (Continued).

References

Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee,

G.,Patterson,D.,Rabkin,A., Stoica,I.,Zaharia, M.,2010.Aview ofcloud

computing. Commun. ACM 53 (4), 50–58, doi:10.1145/1721654.1721672,

http://doi.acm.org/10.1145/1721654.1721672.

Checconi,F.,Cucinotta,T.,Faggioli,D.,Lipari,G.,2009.Hierarchicalmultiprocessor

CPUreservationsfortheLinuxKernel.In:Proceedingsofthe5thInternational

WorkshoponOperatingSystemsPlatformsforEmbeddedReal-Time

Applica-tions(OSPERT2009),Dublin,Ireland.

Cucinotta,T.,Anastasi,G.,Abeni,L.,2009.Respectingtemporalconstraintsin

vir-tualisedservices.In:Proceedingsofthe2ndIEEEInternationalWorkshopon

Real-TimeService-OrientedArchitectureandApplications (RTSOAA2009),

Seattle,Washington,July.

Cucinotta,T.,Checconi,F.,Kousiouris,G.,Kyriazis,D.,Varvarigou,T.,Mazzetti,

A., Zlatev, Z., Papay, J., Boniface, M., Berger, S., Lamp, D., Voith, T.,

Stein, M., 2010a. Virtualised e-Learning with real-time guarantees on

the IRMOS platform. In: Proceedings of the IEEE International

Confer-enceonService-OrientedComputingandApplications(SOCA2010),Perth,

Australia.

Cucinotta,T.,Giani,D.,Faggioli,D.,Checconi,F.,2010b.Providingperformance

guar-anteestovirtualmachinesusingreal-timescheduling.In:Proceedingsofthe

5thWorkshoponVirtualizationandHigh-PerformanceCloudComputing(VHPC

2010),Ischia(Naples),Italy.

El-Refaey,M.A.,Rizkaa,M.A.,2010.CloudGauge:adynamiccloudandvirtualization

benchmarkingsuite.In:EnablingTechnologies:InfrastructuresforCollaborative

Enterprises(WETICE),201019thIEEEInternationalWorkshopon,vol.,no.,pp.

(22)

Fiszelew,A.,Britos,P.,Ochoa,A.,Merlino,H.,Fernández,E.,García-Martínez,R.,

2007.Findingoptimalneuralnetworkarchitectureusinggeneticalgorithms.

Adv.Comput.Sci.Eng.Res.Comput.Sci.27.

García-Guirado,A.,Fernández-Pascual,R.,García, J.M.,2009.Virtual-GEMS:An

infrastructuretosimulatevirtualmachines.In:FifthAnnualWorkshopon

Mod-eling,BenchmarkingandSimulationMoBS2009heldinConjunctionwiththe

36thAnnualInternationalSymposiumonComputerArchitectureSunday.

Giustolisi,O.,Simeone,V.,2006.Optimaldesignofartificialneuralnetworksbya

multi-objectivestrategy:groundwaterlevelpredictions/Constructionoptimale

deréseauxdeneuronesartificielsselonunestratégiemulti-objectifs:prévisions

deniveaupiézométrique.Hydrol.Sci.J.51(3),502–523.

Holland,J.,1975.AdaptationinNaturalandArtificialSystems.UniversityofMichigan

Press,AnnArbor.

http://support.gams-software.com/doku.php?id=platform:aws. http://view.eecs.berkeley.edu/wiki/Dwarfs. http://www.irmosproject.eu/Postproduction.aspx. http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html?page=4. http://www.linux-kvm.org/. http://www.mathworks.com/help/techdoc/ref/bench.html. http://www.mathworks.com/help/toolbox/nnet/function.html. http://www.mathworks.com/matlabcentral/fileexchange/11984. http://www.vmware.com/. http://www.xen.org/.

Ilonen,J.,Kamarainen,J.K.,Lampinen,J.,2003.Differentialevolutiontraining

algo-rithmforfeed-forwardneuralnetworks.NeuralProcess.Lett.17(February(1)).

IRMOSProjectD5.1.1,2009.Modelsofrealtimeapplicationsonserviceoriented

infrastructures,It-Innovationandotherpartners.

Jin,H.,Cao,W.,Yuan,P.,Xie,X.,2008.VSCBenchmark:benchmarkfordynamic

serverperformanceofvirtualizationtechnology.In:Proceedingsofthe1st

Inter-nationalForumonNext-generationMulticore/ManycoreTechnologies,Cairo,

Egypt.

Khalid,O.,Maljevic,I.,Anthony,R.,Petridis,M.,Parrott,K.,Schulz,M.,2009.Dynamic

schedulingofvirtualmachinesrunningHPCworkloadsinscientificgrids.In:

NewTechnologies,MobilityandSecurity(NTMS),20093rdInternational

Con-ferenceon,vol.,no.,pp.1–5,20–23Dec.,doi:10.1109/NTMS.2009.5384725.

Koh,Y.,Knauerhase,R.C.,Brett,P.,Bowman,M.,Wen,Z.,Pu,C.,2007.Ananalysis

ofperformanceinterferenceeffectsinvirtualenvironments.In:ISPASS,.IEEE

ComputerSociety,pp.200–209.

Kousiouris,G.,Checconi,F.,Mazzetti,A.,Zlatev,Z.,Papay,J.,Voith,T.,Kyriazis,D.,

2010.Distributedinteractivereal-timemultimediaapplications:asampling

andanalysisframework.In:1stInternationalWorkshoponAnalysisToolsand

MethodologiesforEmbeddedandReal-timeSystems(WATERS2010),Brussels,

Belgium.

Leung,F.H.F.,Lam,H.K.,Ling,S.H.,Tam,P.K.S.,2003.Tuningofthestructureand

parametersofaneuralnetworkusinganimprovedgeneticalgorithm.IEEETrans.

NeuralNetwor.14(January(1)).

Liu,C.L.,Layland,J.,1973.Schedulingalgorithmsformultiprogramminginahard

real-timeenvironment.J.ACM20(1).

Makhija,V.,etal.,2006.VMmark:ascalablebenchmarkforvirtualizedsystems.

Technicalreport,VMWare.

Moller,K.T.,Virtualmachinebenchmarking.DiplomaThesis.UniversitatKarlsruhe

(TH).2007.

Moré,J.J.,1978.TheLevenberg–Marquardtalgorithm:implementationand

the-ory.In:Watson,G.A.(Ed.),NumericalAnalysis,Dundee1977.LectureNotesin

Mathematics,vol.630.Springer,Berlin,pp.105–116.

Padala,X.Z.,Wanf,Z.,Singhal,S.,Shin,K.,2007.Performanceevaluationof

vir-tualizationtechnologiesforserver consolidation.HPLabs,Tech.Rep.

HPL-2007-59.

Soror, A.A., Minhas, U.F., Aboulnaga, A., Salem, K., Kokosielis,P., Kamath, S.,

2008.Automaticvirtualmachineconfigurationfordatabaseworkloads.ACM

Trans.DatabaseSyst.35(1),47p,Article7,doi:10.1145/1670243.1670250,

http://doi.acm.org/10.1145/1670243.1670250.

Tickoo,O.,Iyer,R.,Illikkal,R.,Newell,D.,2010.Modelingvirtualmachine

perfor-mance.ACMSIGMETRICSPerform.Eval.Rev.37,55.

Tikotekar,A.,Vallˇıee,G.,Naughton,T.,Ong,H.,Engelmann,C.,Scott,S.L.,Filippi,A.M.,

2008.Effectsofvirtualizationonascientificapplicationrunningahyperspectral

radiativetransfercodeonvirtualmachines.In:Proc.ofHPCVirt’08.USA,ACM,

pp.16–23.

Weng,C.,Wang,Z.,Li,M.,Lu,X.,2009.Thehybridschedulingframeworkfor

virtual machine systems VEE‘09. In:Proceedings of the 2009ACM

SIG-PLAN/SIGOPSInternationalConferenceonVirtualExecutionEnvironments,

ACM,pp.111–120.

White,J.,Pilbeam,A.,2010.ASurveyofVirtualizationTechnologiesWith

Perfor-manceTesting,ArXive-prints,1010.3233.

Youseff,L.,Butrico,M.,DaSilva,D.,2008.Towardaunifiedontologyofcloud

comput-ing.In:GridComputingEnvironmentsWorkshop,2008.GCE‘08gridComputing

EnvironmentsWorkshop,2008,GCE‘08,pp.1–10.

GeorgeKousiourisreceivedhisdiplomainElectricalandComputerEngineering

fromtheUniversityofPatras,Greecein2005.HeiscurrentlypursuinghisPh.D.in

GridandCloudcomputingattheTelecommunicationsLaboratoryoftheDept.of

ElectricalandComputerEngineeringoftheNationalTechnicalUniversityofAthens

andisaresearcherfortheInstituteofCommunicationandComputerSystems(ICCS).

HehasparticipatedintheEUfundedprojectsBEinGRIDIRMOSandOPTIMISandthe

NationalprojectGRID-APP.Inthepasthehasworkedforprivate

telecommunica-tionscompanies.Hisinterestsaremainlycomputationalintelligence,optimization,

computernetworksandwebservices.

TommasoCucinottagraduatedinComputerEngineeringattheUniversityofPisa

in2000,andreceivedthePh.D.degreeinComputerEngineeringfromtheScuola

SuperioreSant’AnnaofPisain2004.HeisAssistantProfessorofComputer

Engineer-ingattheReal-TimeSystemsLaboratory(ReTiS)ofScuolaSuperioreSant’Anna.His

mainresearchactivitiesareintheareasofreal-timeandembeddedsystems,with

aparticularfocusonreal-timesupportforgeneral-purposeOperatingSystems,and

security,withaparticularfocusonsmart-cardbasedauthentication.

TheodoraA.VarvarigoureceivedtheB.TechdegreefromtheNationalTechnical

UniversityofAthens,Athens,Greecein1988,theMSdegreesinElectrical

Engi-neering(1989)andinComputerScience(1991)fromStanfordUniversity,Stanford,

Californiain1989andthePh.D.degreefromStanfordUniversityaswellin1991.

SheworkedatAT&TBellLabs,Holmdel,NewJerseybetween1991and1995.

Between1995and1997sheworkedasanAssistantProfessorattheTechnical

UniversityofCrete,Chania,Greece.Since1997shewaselectedasanAssistant

Pro-fessorwhilesince2007sheisaProfessorattheNationalTechnicalUniversityof

Athens,andDirectorofthePostgraduateCourse“EngineeringEconomicsSystems”.

Prof.Varvarigouhasgreatexperienceintheareaofsemanticwebtechnologies,

schedulingoverdistributedplatforms,embeddedsystemsandgridcomputing.

Inthisarea,shehaspublishedmorethan150papersinleadingjournalsand

Figura

Fig. 1. Pseudo-code for Java Coordinator sequence of actions.
Fig. 3. Software components needed for the implementation and their interaction: the main component is the Java Coordinator which keeps track of execution loops in order to cover the investigated interval, sets up the scheduling parameters of the VMs and c
Fig. 4. Test scores for different CPU percentages and granularity (scheduling period) when the tests are running without any interference (co-scheduled tests)
Fig. 5. Effect on individual test scores for different combinations of concurrent tests
+7

Riferimenti

Documenti correlati

– Influisce sul tempo che il processo passerà nella ready queue in attesa della CPU.. • Un parametro di valutazione può dunque essere il tempo di attesa

❖ Se vi sono n processi nella ready queue ed il quanto di tempo è q , ciascun processo occupa 1/n del tempo di CPU in frazioni di, al più, q unità di tempo; nessun processo attende

To construct the neural network model ANN1, the following 82 input features were used: sine and cosine values with one year period, current and prior values for

  Con n processi nella coda e un quanto di tempo = q, ogni processo riceve 1/n di CPU time in blocchi di q unità per

Starvation: si verifica quando uno o più processi di priorità bassa vengono lasciati indefinitamente nella coda dei processi pronti, perchè vi è sempre almeno un processo pronto

A clairvoyant feedback control algorithm was assigned to each task, which knows in advance the execution time of each instance of the tasks and asks the supervisor for the

The use of NBCA in obliteration of a dilated bile duct seems to be a safe procedure with good results providing a less invasive option than hepatic resection and decreasing

In this paper, we addressed the problem of scheduling soft real-time systems by using reservation based schedul- ing techniques augmented by feedback control strategies.. We