Contents lists available atScienceDirect
Science
of
Computer
Programming
www.elsevier.com/locate/scico
From
CIL
to
Java
bytecode:
Semantics-based
translation
for
static
analysis
leveraging
Pietro Ferrara
a,
b,
∗
,
Agostino Cortesi
b,
Fausto Spoto
c aJuliaSoft,Verona,ItalybUniversitàCa’FoscaridiVenezia,Italy cUniversitàdiVerona,Italy
a
r
t
i
c
l
e
i
n
f
o
a
b
s
t
r
a
c
t
Articlehistory:
Received28June2019
Receivedinrevisedform20December2019 Accepted8January2020
Availableonline31January2020
Keywords:
Staticanalysis Abstractinterpretation Javabytecode CIL
AformaltranslationofCIL(i.e.,.Net)bytecodeintoJavabytecodeisintroducedandproved soundwith respecttothelanguagesemantics. Theresultingcodeisthenanalyzedwith Julia,anindustrialstaticanalyzerofJavabytecode.Theoverallprocessoftranslationand analysisisfast,scalestoindustrialprograms,andintroducesanegligiblenumberoffalse alarms. The main contribution ofthis work is to leverage existing,mature, and sound analyzers for Java bytecode by applying them alsoto the widerange of.Net software systems.Experimentalresultsshowtheactualeffectivenessofthisapproachwhenapplied to all the system libraries of the Microsoft .Net framework version 4.0.30319 (about 5 MLOCs).
©2020TheAuthor(s).PublishedbyElsevierB.V.Thisisanopenaccessarticleunderthe CCBY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Static analysis infers, at compile-time, properties about the run-time behavior of computer programs. It allows one to verify, forinstance,the absence ofrun-time errors orsecurity breaches. Static analysis applies alsoto compiled code inassemblyorbytecodeformat.ThisisparticularlyinterestingforapplicationsdistributedontheInternet,ordownloaded frompublic(andpossiblyunsafe)applicationrepositories(e.g.,theGooglePlaystore),whenthesourcecodeisnotavailable, buttheuserwouldliketostaticallychecksomesafetyorsecurityproperties.
The analysisofJava bytecode(from now on,JB) forthe Java VirtualMachine hasa long research tradition andmany analyzersexist [1].Some analyses buildonformal mathematicalroots,such asabstractinterpretation [2–6].Moreover,JB makes the design ofstaticanalysis easierby requiringbytecode to be type-checkable [7] andwithout unsafe operations such asfree pointer operations.On thecontrary, CILbytecode,that is,the compiledbytecode usedforthe .Netplatform (from nowon, justCIL), hasnot received muchattention fromthestaticanalysis communityyet.Therefore, whilethere areseveralsyntactic(thatis,notbasedonformal methods)staticanalyzersforCIL,thereareveryfewindustrialsemantic analyzersbasedonformalmethods.Moreover,CILcanbeusedinanunsafe way,thatis,byallowingfreepointeroperations, whichmakesitsstaticanalysisharder.However,theseoperationsareveryoftenusedinverycontrolledcontexts.Hence,in mostcases,astaticanalyzercouldpossiblycapturetheiractualbehavioranyway.
Despitecleardifferences,JBandCILsharestrongsimilarities:botharelow-levelobject-orientedlanguageswhereobjects are stored andshared in theheap. Hence, it is tempting to leverage mature existing staticanalyses and toolsfor JB by translating CILintoequivalent JB andbyrunning thetoolsonthe latter.Obviously,thisintroduces issuesabouttheexact
*
Correspondingauthor.E-mailaddresses:pietro.ferrara@unive.it(P. Ferrara),cortesi@unive.it(A. Cortesi).
https://doi.org/10.1016/j.scico.2020.102392
0167-6423/©2020TheAuthor(s).PublishedbyElsevierB.V.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).
meaningofequivalence betweenCILanditstranslationintoJB.Moreover,thetranslationshouldnotintroducecodeartifacts thatconfusetheanalyzerandshouldworkonindustrial-sizeCILapplications,supportingasmanyunsafepointeroperations aspossible.
1.1. Contribution
Themaincontributionofthiswork1 istheintroductionofatranslationofCILtoJBthatisboththeoreticallysoundand effectiveinpractice,sothatanindustrialstaticanalyzerforJBcanbeappliedto.Net(andinparticularC#)programs.More languagescompileintoJB(e.g.,JavaandScala)andCIL(e.g.,VB.NetandF#),withdistinctfeaturesandcodestructures.Here, wefocusonJavaandC#,thathavesimilarstructureandcompileintocomparablebytecode.
We startbyformalizingtheconcretesemantics ofarepresentative subsetofCILandJB,andthetranslationofCILinto JB.Then,weprovethistranslationsound,inthesensethattheconcretesemanticsoftheinitialCILprogramisequivalentto thatofthetranslatedJBprogram.Thisguaranteesthat,ifweproveapropertyoftheJBprogram,thensuchpropertyholds also forthe original CIL program. Ourimplementation supports thefull set of CILsafe statements.Fig. 7in Appendix B
reportsalltheCILstatementsasdefinedbyECMA335standard [9],andifandhowwetranslatethem.Thenwepresenta deepexperimentalevaluationoverindustrial-sizeopen-sourcepopularprograms,byapplyingtheJuliastaticanalyzer [4] to thetranslatedJB.
Ourexperimentsarefocusedonthreemainresearchhypothesestoprovethescalability,precision, andcoverageofour approach.Inparticular,thisarticleanswersthefollowingresearchquestions:
ResearchQuestion1(Scalability).DoestheCILtoJBtranslationscale,thatis,(i)canitdealwithlibrariesofindustrialsize(100 KLOCs) inafewminutes,and(ii)isitscomputationaltimenegligiblew.r.t.thatrequiredbytheoverallanalysis?
ResearchQuestion2(Precision).DoestheCILtoJBtranslationintroducelessthan10%offalsealarmsw.r.t.theoverallnumberof alarmsproducedbytheanalyzerwhenconsideringhighseveritywarnings?
ResearchQuestion3(Libraries).DespitesupportingonlyasubsetofCIL,doestheCILtoJBtranslationsucceedonatleast95%ofthe systemlibraries?
About scalability,theoverallgoalistoobtainatool thatcan beappliedduring thenormalsoftwaredevelopment life-cycle, thatis,itcanperformtheanalysisateachbuildofthesystem.Therefore,weneedatoolthatanalyzeshundreds of thousandsofLOCsinfewminutes.
About precision,the10%bound waschosen assuchlimitisperceived asstandard bythecommunitywhenevaluating theeffectivenessofstaticanalyzers.Falsealarmsarenotrealissuesinthecode,buttheyareduetosomeformsof approx-imationperformedbytheanalysis.Forinstance,arecentpaper [10] statedthatGoogleadoptsastaticanalyzerduringcode reviewifit“produceslessthan10%effectivefalsepositives”(aka,alarms),sincesuchamountofnoise doesnotcompromise thepracticaleffectivenessofthetoolwhenusedbysoftwaredevelopers.Notethatourresearchquestionreferstothefalse alarms duetothetranslation,andthatwouldnotbeproducedonthesamecodewritteninJavabytheJavaanalyzer.For instance,letusassumethatananalyzerproduces10falsealarmsonaJavaprogram.ResearchQuestion2issatisfiedifthe analysison thesameprogramwrittenin.NET (andtranslatedto JBbeforebeinganalyzedbythe Javaanalyzer)produces atmost11false alarms(the 10produced ontheJava program–that aredueto theimprecisionoftheanalysis– and1 introduced bythetranslation).Inourexperience,thesefalsealarmsusually happenbecausethe.NETcompilerintroduces some patterns(e.g.,whencheckingthenullnessofavariable)thataredifferentfromthoseproducedbytheJavacompiler, andtheJavastaticanalyzerapproximatesthemtoocoarsely.
Withrespecttoquestion 3,itiscrucialthatastaticanalyzerunderstandsthebehavior ofsystemlibraries(e.g.,semantics ofmethodcallsandassumptionsmadeontheirparameters)orotherwiseitcouldonlyrelyonmanualannotationsor (pos-siblyunsound) assumptions ontheirexecution. However,systemlibraries needtoaccessmemorythroughunsafepointer. Javaallowssuchbehaviors throughnativemethods(writteninlanguagesotherthanJavaandboundthroughtheJavaNative Interface),while.Net allowsunsafepointersdirectlyinits samecode.Inthesecases,ourtranslationproduces Javanative methods.ThroughResearchQuestion3,weensurethattheeffortofmanuallyannotating.Netlibrariesiscomparabletothat neededforJava.
Ourtranslationcouldalsobeapplied toletJavaandC# codeinteroperate,bycompilingthembothintoJavabytecode. Thisrequires,however,thetranslationofthelibraries calledby theapplications,includingthestandardlibraryofC#that containsrelevantportionsofunsafeandnativecode,whichisoutofthescopeofthiswork.
1 Thispaperisarevisedandextendedversionof [8],distinguishedpaperawardatICSE-FormaliSE(2018).Inparticular,thispaperaddsdetailedformal
1.2. Relatedwork
Few attemptshavebeenmadeinthepasttotranslate CILtoJB.Grasshopper isprobablythemostpopularone. How-ever, it is not available anymore.2 As far as we can see, it was abandoned about a decade ago, and we cannot make any comparisonwith our translation. A similar tool is CLR2JVM [11]: it translatesCIL to an intermediate X M LC L R rep-resentation, that can be then translated into X M LJ V M, and finally to JB. As far as we can see,3 the tool should read .NET executables, but it failed to parse all the executable files of our experiments (see Section 5 for the complete list). This probably happened because CLR2JVM is not maintained any more (the last commit to the repository https:// sourceforge.net/p/xmlvm/code/HEAD/tree/trunk/xmlvm/src/clr2jvm/occurredmorethansixyearsago),anditdoesnot sup-portthelastCILversions.NeitherGrasshoppernorCLR2JVMhasanydocumentationordiscussionabouthowthetranslation isperformed(in particular,howtheyhandleinstructionsthat aredifferentbetweenCILandJB,such asdirectreferences). Therefore,asfaraswecansee,ourtranslationfromCILtoJBistheonlyonethatworksonrecentreleasesofCILandJB, andisformalizedandprovedsound.
Other translationsbetweenlow-level languagesexist,justified bythe needofapplying verificationtoolsthatwork on aspecificlanguage only.Forinstance,[12] definesa translationfromBoogieintoWhyMLandproves itssoundness,aswe havedonefromCILtoJB.Similartranslationsworkalsoatruntime,inparticularinsideajust-in-timecompiler,asin [13]. However,wedidnotfindanyliteratureonthetranslationofCILintoJBforindustrial-sizesoftware.
Many other static analyzers for .Net exist, in particular forC# code. There are tools that verify compliance to some guideline,suchasFxcop [14] andCoverityPrevent [15].Others,suchasNDepend [16] andCodeMetrics [17],provide met-ricsaboutthe codeunderanalysis. ReSharper [18] applies syntacticalcodeinspections, finds codesmells andguarantees complianceagainstcodingstandards.Asfaraswecansee,thereexistonlytwomainfundamentaltoolswithscientificbase: Spec# [19],an extension ofC#withstaticcheckingofvariouskindsofmanual specifications,andCodeContracts [20],an abstractinterpretation-based staticanalyzerfor CIL.Inthe Javaworld, the numberofstaticanalyzers basedon syntactic reasoning(thatis,not basedonformal methods),such asCheckstyle [21],FindBugs [22] andPMD [23],iscomparableto that for.Net. However, Java attractedmuch more attentionfromthe scientific community, andmoresemantic analyzers havebeenintroducedduringthelastdecade,suchasCodeSonar [24],ThreadSafe [25] andJulia [4].Instead,therearequite fewer semantic static analyzers for.NET. In addition,few semantic analyzers, such asWALA [26], havebeen applied to variouslanguages(e.g.,JavaandJavaScript),butwithad-hoctranslationofthesourcetotheanalyzedlanguage.
Ourapproach lets usapply allthe Javaanalyzers on.Net programs (almost)forfree, that is,by translatingCIL intoJB andby using theanalyzers asthey are (weexpect that few manual annotations are neededtoimprove theprecision of theanalysis,inparticularwhendealingwithlibrarycalls).WehavealsostudiedperformanceandresultswithJulia.Asfar asweknow, ourworkisthe firsttranslationofCILintoJBforstaticanalysisthat isprovento besound andcomes with evidencethatthistranslationapplies toindustrial-sizesoftware, withresultsthatarecomparableintermsofprecision and efficiencytothoseobtainedonJB.
2. Background
Thissection providessome backgroundon CILandJB,arunningexample,anda discussiononthe architectureof our approach.ForanexhaustivedefinitionofJBandCIL,see [7] and [9],respectively.
2.1. CILandJB
Bytecode is a machine-independent low-level programming language, used as target of the compilation of high-level languages, that hence becomes machine-independent. Bytecode languages are interpreted by their corresponding virtual machine, specific to each execution architecture. Both .Net and Java compile into bytecode. However, they use distinct instructionsandvirtual machines..Net compilesintoCIL,whileJavacompiles intoJB.Thesehavestrongsimilarities:both use an operand stack for temporary values andan array of local variables standing for source code variables; both are object-oriented, withinstructions forobject creation, field accessand virtual methoddispatch. Despitethese undeniable similarities, CIL andJB differforthe wayof performing parameter passing (CIL uses aspecific array ofvariables forthe formal parameters, while JB merges them into the array of local variables); they handle object creation differently (CIL createsandinitializes theobjectatthe sametime, whiletheseare distinctoperationsinJB);they allocatememoryslots differently(inCILeach valueusesaslot,whileJBuses1or2slotsper32- or64-bitvalues,respectively);finally,CILuses pointersexplicitly,alsoin type-unsafeways, while JBhasnonotion ofpointer.We focusourformalization ona minimal representativesubsetofJBandCIL,asdefinedinFig.1.Thatfigurepresentsbytecodeinstructionsfor:
arithmetic: JBhastype-specificoperations,suchas
iadd
andladd
toaddtwointegerorlongvalues,respectively.Instead, CILhasgenericoperations,suchasadd
toaddtwonumericalvaluesofthesametype;2 Wewereunabletoaccessthewebsitehttp://dev.mainsoft.comthat,frompastforumdiscussions(http://stackoverflow.com/questions/95163/differences -between-msil-and-java-bytecode),seemstobethewebsiteofthetool.
JB CIL iadd
add (arith. op.)
ladd iload i ldloc i (local vars) lload i stloc i aload i ldarg i istore i lstore i astore i invokevirtual call (meth. call) invokestatic new T newobj T(· · · ) (objects) getfield f ldfld f putfield f stfld f
if_icmpgt bgt (cond. branch)
dup dup (stack) dup2 ldloca i (pointers) stind ldind
Fig. 1. JB and CIL minimal bytecode languages.
localvariablesaccess: JBhasasinglearrayofvariablesforbothlocalvariablesandmethodarguments,andreadsandwrites valuesfromthisarraythroughx
load
andxstore
,wherexisi
forintegervalues,l
forlongvaluesanda
forreferences, respectively.Inthisarray,64bitsvaluesusetwosubsequentslots.CIL,instead,usestwoarrays:oneformethod’sarguments (ldarg
i
loadsthevalueofthei-thargument)andoneforlocalvariables(ldloc
i
andstloc
i
readandwrite the i-thlocalvariable,respectively).Inaddition,itusesoneslotbothfor32- and64-bitvariables;methodcall: JBhasseveralkindsofmethodcallinstructions,suchas
invokevirtual
andinvokestatic
.Instead,CIL hasauniquecall
instruction;objectmanipulation: inJB,instructions
new
,getfield
,andputfield
allocateanewobject,read,andwriteitsfields, respectively.CILhassimilarinstructionsnewobj
,thatalsocallstheconstructor,ldfld
andstfld
;conditionalbranch: JBhastype-specificconditionalbranchinstructionssuchas
if_icmpgt
(tobranchifthegreaterthan operatorreturnstrueonthetopmosttwointegervaluesoftheoperandstack);CILhasgenericinstructionssuchasbgt
; stack: CILduplicates thetopvalueofthestackthroughthedup
instruction.JBdoesthesamewithdup
for32bitsvalues anddup2
for64bitsvalues;pointers: CIL contains some instructions to load the address of a local variable (
ldloca
i
), and to store and load a value into the memory cell pointedby a reference (stind
andldind
, respectively). Instead, JB has no direct pointer manipulation.Intherestofthisarticle,StCILandStJBdenoteCILandJBinstructionsorstatements,respectively.Amethod(bothinCIL
andJB)isrepresentedby(i) asequenceof(possiblyconditionalandbranching)statements,and(ii)thenumberandstatic typesofargumentsandlocalvariables.
2.2. Runningexample
Fig.2showstherunningexamplethat Sections3and4usetoclarifytheformalization.TheC#codeinFig.2adefines a class
Wrap
that wrapsan integer value, and a staticmethodWrapsCollection
that, given an integern
, returns a collectionofn wrappers containingvaluesfrom0 to n−
1.Fig.2bpresentsthe(simplified)CILobtainedfromits compi-lation:asusualwithCIL,codeisunstructured(e.g.,therearebranchesatlines 7and 20),andeach sourcecodestatement could be translatedinto manybytecodestatements (e.g.,line 7 inFig. 2a compiles into lines 3and 4in Fig.2b). Fig.2c presentstheresultsofourtranslationofCILintoJB.Thenextsectionsexplainthestepsofthetranslation.Firstofall,notice some ofthedifferenceshighlightedinSection 2.1.Namely,thetype-genericCILstatementstloc.1
atline 6ofFig.2bis translatedintothe type-specificistore_2
atline 7ofFig. 2c.Similarly,newobj
(line 11)istranslatedinto multipleJB statements(line 12-16).Therunningexamplecontainssomeinstructionsthatarenotpartoftheminimallanguagedefined inSection 2.1,andinparticular(i)ldc
andiconst_*
thatloadconstant(integer)valuesinCILandJB,respectively,(ii)blt
andif_icmplt
forconditionalbranchingwhenan(integer)valueisstrictlylessthananother,(iii)invokespecial
1 public class Wrap
2 {
3 int f ;
4 Wrap(int f) { this . f = f ; }
5 static ICollection <Wrap>WrapsCollection(int n)
6 {
7 ICollection <Wrap> result = new List<Wrap>();
8 for ( int i = 0; i < n; i ++)
9 result .Add(new Wrap(i));
10 return result ;
11 }
12 }
(a)TheC#sourceoftherunningexample.
1 static ICollection <Wrap>
2 WrapsCollection(int n) { 3 newobj List<Wrap>::.ctor() 4 stloc.0 5 ldc.0 6 stloc.1 7 br #18 8 9 ldloc.0 10 ldloc.1 11 newobj Wrap::.ctor(int32)
12 call ICollection <Wrap>::Add
13 ldloc.1 14 ldc.1 15 add 16 stloc.1 17 18 ldloc.1 19 ldarg.0 20 blt #9 21 22 ldloc.0 23 ret 24 }
(b)Compiling(a)intoCIL.
1 static WrapsCollection(I)LICollection {
2 new List
3 dup
4 invokespecial List.< init >
5 astore_1 6 iconst_0 7 istore_2 8 goto 23 9 10 aload_1 11 iload_2 12 istore_3 13 new Wrap 14 dup 15 iload_3 16 invokespecial Wrap.<init>
17 invokevirtual <ICollection .Add>
18 iload_2 19 iconst_1 20 iadd 21 istore_2 22 23 iload_2 24 iload_0 25 if_icmplt 10 26 27 aload_1 28 areturn 29 } (c)Translating(b)intoJB.
Fig. 2. The C# code, CIL, and JB of the running example.
1 void init ( ref int i )
2 { 3 i ++; 4 } 5 6 void run() 7 { 8 int i =0; 9 init ( ref i ); 10 } (a)C#code
1 void init (WrapRef i) {
2 i .value = i .value + 1;
3 }
4
5 int run() {
6 int i = 0;
7 WrapRef wrap = new WrapRef();
8 wrap.value = i ;
9 init (wrap);
10 i = wrap.value;
11 }
(b)Javacode
1 void init ( ref A& A)
2 { 3 ldarg.0 4 ldarg.0 5 ldind.i4 6 ldc. i4 .1 7 add 8 stind.i4 9 } 10 11 int run() 12 { 13 ldc. i4 .0 14 stloc.0 15 ldloca.s 0
16 call void Temporary.Foo
17 ::’ init ’( int32&)
18 }
(c)CIL
Fig. 3. CIL code usingrefparameters.
2.2.1. Examplewithdirectreferences
SafeC#codeadoptsdirectpointersonlyfor
out
andref
methodparameters.Theseparameters canbeassigned(and readaswell incaseofref
)insidethemethod,and“anychangetotheparameterinthecalledmethodisreflectedinthe callingmethod”.4 Inourtranslation,webuild wrapperobjectstosoundlyrepresenttheir semantics.Consider forinstance theC#codeinFig.3a.Methodinit
receivesaref
parameteranditincrementsitbyone.Thisiscompiled(Fig.3c)into amethodthat readsthevalue pointedby thedirectreference(line 5),andwritesit(line 8). Ourgoalistotranslate this code intothe Javacode inFig. 3b: we simulatethedirect referenceby constructing awrapper object(line 7), assigning thevalueofthelocalvariabletofieldvalue
ofthewrapper(line 8),andthenpropagatingbacktheresultsofthecalltoinit
(line 9)byassigningthevalueofthelocalvariablewiththeonestoredinthewrapper(line 10).Inaddition,theref
parameterisreplacedbythetypeofthewrapperobject.
Fig. 4. Architecture of the analysis.
2.3. Julia
Julia [4] isa static analyzerforJB, based onabstract interpretation [2]. It transforms JBto basic blocksof code,that it analyzesthroughafixpointalgorithm.Analysesare constraint-basedordenotational.Currently, Juliafeaturesaround 70 checkers,includingnullness,termination,synchronizationandtaintanalysis.SinceJuliaworksonJB,inprincipleitanalyzes anyprogramminglanguage thatcompilesintothatbytecode.Inparticular,inthisarticleweapplyittotheanalysisofthe compilation of CILinto JB.Note thatJulia verifies that the analyzedbytecode iswell-formed, that is, itmainly performs sanity checks to avoid the static analyzerto fail while analyzing the bytecode. Therefore, all the translated bytecode is verified by Julia before beinganalyzed. Julia relies onJava annotations in order to allow a userto specify the semantic modelofsomecomponents(e.g.,ifamethodisanentrypointinaparticularruntimeenvironment,oranAPIcallreturns user input). https://static.juliasoft.com/docs/latest/annotations.html reportsthe full list of Julia’s annotations. While these annotationsaddsemanticinformationabouttheapplicationandthelibraries,theydonotaffectthetranslationfromCILto JB.
Fig.4isahighlevelview ofouranalysis,wherethegraycomponentshavebeenimplementedandmodifiedtosupport .NETanalyses.Javacodeiscompiledby
javac
intoajar
file,thenparsedthroughtheByteCodeEngineeringLibrary [27] (BCEL). Juliareceives thislatterformat, appliesits analysis(byusingmanycomponents such asthe checkers,thatdefine the analysesrelying on variousabstractdomains, afixpoint engine,andtheframework specifyingthe semanticsofsome specific componentsoftheprogramminglanguage),andoutputsalistofwarnings.Theaddedcomponentinourapproach isthetranslationofCILintoBCEL format(greyarrowintheupperpartofFig.4).SinceaprograminBCELformatcanbe dumpedintoajarfile,wecandumpa.dllfileinthisformat.We instructed Julia by adding manual annotations to the main components of the .NET run-time environments. All together, this amounts to about 1.500 annotations on the main library APIs. For instance, we annotated field System
.
Decimal.
One to specify that it is externally initialized with Julia’s annotation Injected, and method System.
Web.
HttpRequest.
Params withUntrustedUserInputto specifythisreturns a userinput (thatis,it isa source forJulia’s In-jectionanalysis).3. Concretesemantics 3.1. Notation
Webrieflyrecallsomestandardnotationadoptedbythefurtherformalizationinthissectionandthenextone. Commonmathematicalelements. Firstofall,weadoptedstandardsetnotations,thatis:(i)a
∈
A denotesthatelementa is containedinset A,(ii) A×
B denotes theCartesianproduct ofsets A and B (thatis,theset ofall orderedpairs(
a,
b)
suchthata∈
A andb∈
V ,(iii)∪
and∩
denotesetunionandintersection,respectively,and(iv)|
A|
denotesthecardinality (thatis,thenumberofelements)ofset A.Aboutfunctions, wedenoteby (i) F
:
D→
C thesetofallfunctionsrelatingelementsinadomain D toacodomain C , (ii)[
d→
c]
thefunctionthatrelateselementd toc,(iii) f[
d→
c]
thefunctionthatbehaveslike f exceptforelementd that isrelatedtoc,and(iv) f(
d)
theapplicationoffunction f totheelementofthedomaind (e.g., f(
d)
=
c if f= [
d→
c]
).We represent lists and stacks as functions mapping natural indexes to elements. In particular, a stack s (or list) of elements in T isa functionin
N
→
T suchthat∃
i∈ N : ∀
i1≤
i:
i1∈
dom(
s)
∧ ∀
i2>
i:
i2∈
/
dom(
s)
; dom(
s)
denotes thedomainofthegivenfunction(thatis,theindexesonwhichstacks isdefinedinthisparticularcase).Wewillrefertoi as theheightofs (height
(
s)
).Givenastacks andanelemente,s::
e denotesastackwhosetopelement(thatis,theonewithtypeOf(v1)=typeOf(v2)
add, (s::v1::v2,l,a,h) →CIL(s:: (v1+v2),l,a,h) (add)
ldloc i, (s,l,a,h) →CIL(s::l(i),l,a,h) (ldloc)
stloc i, (s::v,l,a,h) →CIL(s,l[i →v],a,h) (stloc)
ldarg i, (s,l,a,h) →CIL(s::a(i),l,a,h) (ldarg)
isStatic(m(arg0,· · · , argi))=false∧t= null∧
body(m(arg0,· · · , argi), (t,v1,· · · ,vi)), ([], ∅, [0→t,j→vj:j∈ [1..i]],h) →CIL(s,l,a,h)
call m(arg1,· · · , argi), (s::t::v1:: · · · ::vi,l,a,h) →CIL(s,l,a,h)
(call)
isStatic(m(arg0,· · · , argi))=true∧
body(m(arg0,· · · , argi), (v1,· · · ,vi)), ([], ∅, [j−1→vj:j∈ [1..i]],h) →CIL(s,l,a,h)
call m(arg1,· · · , argi), (s::v1:: · · · ::vi,l,a,h) →CIL(s,l,a,h)
(call static)
fresh(T,h)= (r,h1)∧ body(ctor(arg1,· · · , argi), (v1,· · · ,vi)), ([], ∅, [0→r,j→vj:j∈ [1..i]],h1) →CIL(s,l,a,h)
newobj T(a1,· · · , ai), (s::v1:: · · · ::vi,l,a,h) →CIL(s::r,l,a,h)
(newobj) o= null ldfld f, (s::o,l,a,h) →CIL(s::h(o)(f),l,a,h) (ldfld) o= null s =h(o)[f →v] stfld f, (s::o::v,l,a,h) →CIL(s,l,a,h[o→s]) (stfld) typeOf(v1)=typeOf(v2)∧v1>v2 bgt l, (s::v1::v2,l,a,h) →CILl, (s,l,a,h)
(bgt true) typeOf(v1)=typeOf(v2)∧v1≤v2
bgt l, (s::v1::v2,l,a,h) →CIL(s,l,a,h) (bgt false) ldloca i, (s,l,a,h) →CIL(s::ri,l,a,h) (ldloca) stind, (s::ri::v,l,a,h) →CIL(s,l,a,h[ri→v]) (stind) dup, (s::v,l,a,h) →CIL(s::v::v,l,a,h) (dup) ldind, (s::ri,l,a,h) →CIL(s::h(ri),l,a,h) (ldind)
Fig. 5. Concrete CIL semantics.
thehighestindex)ise followedbythestacks.Inaddition,
< ...
>
denotesalistofelements(e.g.,<
a,
b>
denotesthelist representedbythefunction[
0→
a,
1→
b]
).Semantics. Whenformalizingconcretestatesandsemantics,wedenotedby
thesetofconcretestatesofexecutionforJB (
JB)andCIL(
CIL).ThesmallstepsemanticsofJBandCILisdenotedby
→
JBand→
CIL,respectively,where,forinstance, st,
σ
1→
JBσ
2 representsthattheexecutionofJBstatementstwithanentrystateσ
1 resultsintheexitstateσ
2.Thesmallstepsemanticsisdefinedthroughinferencerules.Forinstance,
n
=
n+
1 increment,n
→
nformalizesthatwhenweexecutestatementincrementonavaluen weobtainavaluendefinedasn
=
n+
1.Object-orientedcomponents. We denote the sets of referenceand numerical valuesby Ref andNum, respectively, and valuesby Val
=
Ref∪
Num.As usual forobject-oriented programminglanguages,an objectis amap from fieldnames to values,andaheapisamapfromreferencestoobjects.Formally,Heap:
Ref→
Field→
Val,whereFieldisthesetcontaining all field names. fresh(
T,
h)
= (
r,
h)
allocates an objectof type T in heap handreturns (i) thereference r ofthe freshly allocatedobject,and(ii)theheaphresultingfromtheallocationofmemoryonh.Forsimplicity,we consideronlyinteger(Int) andlong(Long)numericaltypes (NumTypes
= {
Int,
Long}
), andreferences (Ref).Givenavaluev,typeOf(
v)
returnsitstype(Int,Long,orRef).SinceJBinstructionsoftenprependaprefixtodistinguish instructionsdealingwithdifferenttypes(e.g.,iadd
andladd
),wedefineasupportfunctionJVMprefix that,givenatypet, returnstheprefixofthegiventype(i.e.,i
ift=
Int,l
ift=
Long,anda
ift=
Ref).WedefinebyWRefanobjecttypewith auniquefieldvalue.Given a method signature
m
and a list of arguments L (with the receiver in the first argument ifm
is not static), body(m,
L)
: N →
St returns the body of the method resolving the call, that is, a sequence of statements (represented byafunctionmappingindexestostatements).Similarly,eachstatementbelongstoamethod;hencegetBody(st)
=
b isthe bodyofthemethodwherest
occurs.Finally,isStatic(m
)meansthatm
isstatic.3.2. CIL
WedefinetheconcretesemanticsoftheCILfragmentofSection2.1.
ConcreteState. AlocalstateinCILiscomposedbyastackofvaluesorreferencetolocalvariablesStack
: N →
Val∪
RefLoc(where ri
∈
RefLoc representsthecell’s referenceof thei-thlocalvariable), anarray oflocalvariables Loc: N →
Val,andan array of method arguments Arg
: N →
Val. A concrete CIL state consistsof a local state anda heap, that is,CIL
=
Stack
×
Loc×
Arg×
Heap.ConcreteSemantics. Fig.5showstheconcreteCILsemantics
st,
σ
→
CILσ
.Forastatementst
andanentrystateσ
,ittypeOf(v)=Long dup, (s::v,l,h) →JB(s::v::v,l,h)
(dup) typeOf(v1)=Long
dup2, (s::v1::v2,l,h) →JB(s::v1::v2::v1::v2,l,h)
(dup2 32) typeOf(v)=Long
dup2, (s::v,l,h) →JB(s::v::v,l,h)
(dup2 64) typeOf(v1)=Int∧typeOf(v2)=Int
iadd, (s::v1::v2,l,h) →JB(s:: (v1+v2),l,h) (iadd)
typeOf(v1)=Long∧typeOf(v2)=Long
ladd, (s::v1::v2,l,h) →JB(s:: (v1+v2),l,h) (ladd) x=JVMprefix(typeOf(l(i))) xload i, (s,l,h) →JB(s::l(i),l,h) (xload) x=JVMprefix(typeOf(v)) xstore i, (s::v,l,h) →JB(s,l[i →v],h) (xstore)
isStatic(m(arg0,· · · , argi))=false∧t= null∧
body(m(arg0,· · · , argi), (t,v1,· · · ,vi)), ([], [0→t,j→vj:j∈ [1..i]],h) →JB(s,l,h)
invokevirtual m(arg1,· · · , argi), (s::t::v1:: · · · ::vi,l,h) →JB(s,l,h)
(invokevirtual)
isStatic(m(arg0,· · · , argi))=true∧
body(m(arg0,· · · , argi), (v1,· · · ,vi)), ([], [j−1→vj:j∈ [1..i]],h) →JB(s,l,h)
invokestatic m(arg1,· · · , argi), (s::v1:: · · · ::vi,l,h) →JB(s,l,h)
(invokestatic) fresh(T,h)= (r,h) new T, (s,l,h) →JB(s::r,l,h) (new) o= null getfield f, (s::o,l,h) →JB(s::h(o)(f),l,h) (getfield) o= null s=h(o)[f →v] putfield f, (s::o::v,l,h) →JB(s,l,h[o→s])
(putfield) typeOf(v1)=Int∧typeOf(v2)=Int∧v1>v2
if_icmpgt l, (s::v1::v2,l,h) →JBl, (s,l,h)
if_icmpgt true
typeOf(v1)=Int∧typeOf(v2)=Int∧v1≤v2
if_icmpgt l, (s::v1::v2,l,h) →JB(s,l,h)
if_icmpgt false
Fig. 6. Concrete JB semantics.
executeisthatat
l
.Otherwise,thenextinstructiontoexecuteisimplicitlyassumedtobethesubsequentone,sequentially (ifany).Forthemostpart,theconcretesemanticsisastraightforwardformalizationoftherun-timesemanticsdefinedbytheCIL ECMAStandard [9].Forinstance,rule
add
popsthetwotopmostvaluesoftheoperandstackandreplacesthemwiththeir addition.However,itssemanticsisdefinediffthetwovalueshavethesametype.Instead,ldloc
i
pushestotheoperand stackthevalue ofthei-thlocalvariables,whilestloc
i
storesthetopoftheoperandstackintothe i-thlocalvariable. Statements workingwithobjects,such asldfld
andstfld
,read fromandwrite intothe heap,iftheir receiveris notnull
.Rulescall
andcall
static
createaframe(i.e.,anarrayofarguments,anemptyarrayoflocalvariablesandan empty operandstack),executethecalleeandleaveitsreturnedvalue onthestack,ifany.Forsimplicity,theformalization assumes that there isno returned value.5 Finally,ldloca
i
loadsthe reference to the i-th local variable to the stack (represented byri),stind
storesthegivenvalue tothegivenreference,andldind
loadsthevalue pointedbythegiven reference.3.2.1. Runningexample
Consider the runningexamplein Fig.2bandapply theconcrete semanticswhen
n
is 1,thatis, whentheentrystate consistsofanemptyoperandstackandofanarrayoflocalvariables,whilethevalueoftheargumentsis[
0→
1]
.Assume thatnewobj
allocatestheobjectataddress#1.Then afterthefirstblock(lines 3-7)theaddressoftheobjectisstoredin localvariable 0,localvariable 1(representingvariablei
ofthesourceprogram)holds0,andaddress#1 intheheapholds an objectoftypeList
Collection
.Formally,theconcretestateatline 7is(
∅,
[
0→
#1,
1→
0],
[
0→
1],
[
#1→<>])
, where<>
standsfortheemptylist.Thenthebodyofthefor
loop(lines 9-16)isexecutedonceandcreatesanewWrap
object(assumeataddress#2)wrapping0,addsittothelistataddress#1 andincrementscounter
i
(i.e.,localvariable 0) by1.Hencetheexecutionoftheconcretesemanticsofthebodyoftheloopleadstotheconcretestateσ
= (∅,
[
0→
#1,
1→
1],
[
0→
1],
[
#1→<
#2>,
#2→ [
f→
0]])
atline 16.Theconditionatline 20willthenroutetheprogramtoline 22(since argument 0andlocalvariable 1hold1).Inconclusion,theconcretesemanticswillreachstatementret
atline 23witha stateσ
CIL equaltoσ
,butwheretheoperandstackcontainsreference#1.3.3. JB
WedefinetheconcretesemanticsoftheJBfragmentofSection2.1.
ConcreteState. A localstateinJBconsistsofastackofvaluesStack
: N →
Val andanarrayoflocalvariablesLoc: N →
Val.AconcreteJBstateconsistsofalocalstateandaheap,thatis,JB
=
Stack×
Loc×
Heap.5 Thiswouldhaverequiredtodefinetwodistinctrules(theonewithreturnedvalueswherethefinalstackcontainsthereturnedvaluefollowedby
thepreviousstackinadditiontotheonealreadyformalized).However,theimplementationfullysupportsthiscaseaswell,andweomitithereonlyto improvethereadabilityoftheformalizationandformalproofs.
ConcreteSemantics. Fig.6reportstheconcreteJBsemantics
→
JB.Forastatementst
andanentrystateσ
,ityieldsthestate
σ
resultingfromtheexecutionofst
overσ
.Forthemostpart,thebehaviorofthissemanticsisidenticaltothatfor CIL.ThemaindifferencesarethatJBinstructionsworkonspecifictypes(e.g.,whileCILadd
statementaddstwovaluesof thesametype,JBiadd
andladd
statementsadd thevaluesiffthey arebothint
orlong
,respectively),andnew
only allocatesanewobject,whileCILnewobj
statementsalsocallsaconstructor.3.3.1. Runningexample
TheapplicationoftheJBconcretesemanticsissimilar tothatforCIL,butthere aretwominordifferences:(i)thereis onlyonearrayoflocalvariablesrepresentingbothCILargumentsandlocalvariables(e.g.,CILlocalvariable 0isrepresented byJBlocalvariable 1,sincethefirstlocalvariableholdstheargumentofthemethod);and(ii)thereisaninstrumentation localvariable6 atindex 3.Therefore,afterweapplytheJBconcretesemanticsfromtheentrystatethatmapstheargument to1,weobtaintheconcretestate
(
[
0→
#1],
[
0→
1,
1→
#1,
2→
1,
3→
0],
[
#1→<
#2>,
#2→ [
f→
0]])
.4. FromCILtoJB 4.1. Concretestates
Function
T
σJ K
:
CIL→
JBtranslates CILconcretestates intoJBconcrete states(where l representsthenumber ofelementsinthestackofs):
T
σJ(
s,
l,
a,
h)
K
= (
s,
cnvrtLoc(
l,
a),
hl)
.Thisfunction(i) replacesdirectreferenceswithwrapperobjects,and(ii)mergesthearray oflocalvariablesand argu-ments,adjustingvariableindexesfor64bitsvalues.Formally:
i
∈
dom(s),l
=
height(s),
s=
i
→
s
(i)
if s(i)
∈
Valri if s
(i)
∈
RefLocwhere h−1
=
h,
and(
hi,
ri)
=
allocWrp(hi−1,i),
allocWrp(h,
j)=
⎧
⎨
⎩
(
h, null)
if s(
j)∈
Val(
h[
r→
h(r)
[value →
l(
j)]])
where
(r,
h)
=
fresh(WRef,
h)
if s(i)
∈
RefLocIntuitively,eachdirectreferenceintheoperandstack(thatis,s
(
i)
∈
RefLoc)isreplacedbyanotherreferencepointingtoawrapperobjectfreshlyallocatedandcontaininginitsfieldthevaluepointedbytheoriginaldirectreference. Then,foranarrayofvaluesb andanindexi,thefollowingfunctioncountsthe64bitstypesamongthefirsti:
64bi
= |{
j:
0≤
j<
i and b[
j]
is a 64 bit value}|
ThenthearraycnvrtLoc
(
l,
a)
isdefinedasfollows:∀
0≤
i<
|
a| :
cnvrtLoc(l,
a)
[
i+
64ia] =
a[
i]
∀
0≤
i<
|
l| :
cnvrtLoc(l,
a)
[|
a| +
64|aa|+
i+
64il] =
l[
i]
Definition1.Considerafunction f thatmapselementsofa pairofarrays
(
a,
b)
intoelements ofan arrayc.Let f(
a[
i])
=
c[
i]
and f(
b[
i])
=
c[
i]
.Thefunction f isanidentityembedding ifthefollowingconditionshold:1
)
∀
0≤
i<
|
a| :
a[
i] =
c[
i]
and∀
j<
|
b| :
b[
j] =
c[
j]
2
)
∀
0≤
i,j<
|
a| :
i≤
j⇒
i≤
j and∀
0≤
h,k<
|
b| :
h≤
k⇒
h≤
k3
)
{
i:
0≤
i≤ |
a|} ∩ {
j:
0≤
j≤ |
b|} = ∅
Lemma1.ThefunctioncnvrtLoc isanidentityembedding.
Proof. Itissufficienttoobservethat,byconstruction,thefunctionconcatenatesthetwoarraysbyshiftingindexeswhena 64bitsvalueoccurs,hencepreservingthevaluesandtheorderingoftheelements’indexes.
2
4.1.1. Runningexample
Consider theCILexitstatecomputedinSection3.2,thatis,
σ
CIL= ([
0→
#1],
[
0→
#1,
1→
1],
[
0→
1],
[
#1→<
#2>,
#2
→ [
f→
0]])
. TheCIL toJB translationcomputesT
σJ
σ
CILK
= ([
0→
#1],
[
0→
1,
1→
#1,
2→
1],
[
#1→<
#2>,
#2→
[
f→
0]])
(it merges CILargumentsandlocalvariables). Thisstate isalmost identicaltotheexit stateofthe JBconcrete semanticsappliedtotherunningexample(Section3.3)exceptfortheinstrumentationlocalvariableatindex 3.TJdup,s:: t,l,a,wK = dup if t=Long dup2 if t=Long TJadd,s:: t1:: t2,l,a,wK = iadd if t1=t2=Int ladd if t1=t2=Long
TJldloc i,s,l,a,wK = xloadj where j= |a| +64a|a|+i+64il∧x=JVMprefix(typeOf(l(i)))
TJstloc i,s:: t,l,a,wK = xstorej where j= |a| +64|aa+i+64i
l∧x=JVMprefix(typeOf(l(i)))
TJldarg i,s,l,a,wK = xloadj where j=i+64i
a∧x=JVMprefix(typeOf(a(i)))
TJcall m(arg1,· · · , argi), = invoke; aloadp1idx1; getfield value ;xidx1storep 2 idx1 ; · · · s::t1:: · · · ::ti,l,a,w::p1:: · · · ::piK · · · aloadp1idx
j; getfield value ;xidxjstorep
2 idxj;
whereinvoke=
invokestatic m(arg1,· · · , argi) if isStatic(m(arg1,· · ·, argi))
invokevirtual m(arg1,· · · , argi) otherwise
{idx1,· · · ,idxj} = {k: argk∈RefLoc}
∀k∈ [1..j] :xidxk=JVMprefix(typeOf(l(p 2 idxk)))∧ ∀r∈ [1..i] :pi= (p 1 i,p 2 j)
TJnewobj T(a1,· · · , ai), = xistoreidxi; · · · ;x1storeidx1; new T ; dup ;
s::t1:: · · · ::ti,l,a,wK x1loadidx1; · · · ;xiloadidxi; invokevirtual < init > (arg1,· · · , argi)
where∀j∈ [1..i] :xj=JVMprefix(aj)∧idxj=freshIdx(newobj T(a1,· · · , ai),j)
TJldfld f,s:: to,l,a,wK = getfield f
TJstfld f,s:: to:: tv,l,a,wK = putfield f
TJbgt k,s:: t1:: t2,l,a,wK = if_icmpgt kwhere k=statementIdx(getBody(bgt k)(k))if t1=t2=Int
TJldloca i,s,l,a,wK = TJnewobj WrapRef() ; dup2 ; stloc j; ldloc i; stfld value,s,l,a,wK where j=freshIdx(ldloca i,0)
TJstind,s,l,a,wK = TJstfld value,s,l,a,wK TJldind,s,l,a,wK = TJldfld value,s,l,a,wK
Fig. 7. Translation of CIL statements into JB.
4.2. Statements
Fig. 7 formalizesthe translation
T
Jst
CIL, K
K
= st
JB of a single CIL statement into a sequence ofJB statements.Thecomponentswithalinearaccent(e.g.,s)denotesstaticinformationaboutcomputationalvalues.Inparticular,ourtranslation relieson statictype informationaboutlocals (denotedby l),arguments(a)andstackelements (s),computedat
st
CIL bya standard algorithm [9]. Inparticular,types andheight ofthestack arefixed andknownforeach bytecode.Inaddition, thefourthcomponentwisastackofelementin
⊥
∪ (N × N)
that,foreachelementinthestack,tells(i)⊥
ifitisnot a directreference,or(ii)(
i,
j)
wherei istheindexofthelocalvariablespointedbythedirectreference,7 and j istheindex of thelocal instrumentationvariable containing apointer to thewrapper simulatingthereference. Instrumentation local variablesareappendedattheendofthearrayofJBlocalvariables(thatis,aftertheargumentsandthelocalvariablesused intheprogram).TheyareneededinordertocachesomevaluesthatarelaterneededtorepresentthesemanticsoftheCIL programinitsJBtranslation.Few CILstatements (namely,
ldfld
andstfld
) have a one-to-one translationinto a JB statement (getfield
andputfield
).Thestatementsreadingandwritinglocalvariablesandarguments(ldarg
,ldloc
,andstloc
)aretranslated intotheir JBcounterpart(xload
,xstore
,respectively),takingintoaccountthetype ofthevalueatthetopofthestack, andadjustingtheindexofthevariable,takingintoaccountargumentsand64bitvariables.SomeCILstatements(dup
)get translatedintodifferentJBstatementsonthebasisofcontextualinformationsuchasthetypeofvaluesintheoperandstack (dup
anddup2
).Other CILstatementscanbetranslatedonlyifthetypeofthevaluesintheoperandstackisnumeric:(i)add
can betranslatedintoladd
andiadd
,and(ii)bgt
tois_icmpgt
;iftheyare appliedtoreferences(asingeneric CILcode),thenthecodeisconsideredunsafeandisnottranslated.call
requiresto (i)translate themethod callto thecorresponding staticordynamic invocationstatement inJB,and (ii)to propagatethesideeffectsondirectpointerspassed tothemethodasout
/ref
parameterstothelocalvariablesof the callee. The translationofnewobj
is trickybecause ofthe differentpatterns usedin CILandJB forobject creation.8 While CIL createsandinitializes the object (i.e.,calls its constructor)witha single instruction, JB splits theseoperations andrequiresthenewlycreatedobjecttooccurbelowtheargumentsonthestack,beforecallingtheconstructor.Hence,the translationreliesonafunctionfreshIdx tostoreandloadthevaluesoftheconstructorargumentsthroughinstrumentation localvariables.Inparticular,givenaCILmethodm, thenumberandtypesofargumentsandlocalvariablesofthemethod are known(SectionII.15.4of [9]).Therefore,functioncnvrtLoc tellswhichlocalvariablesthe translatedJBmethodalready uses.Then,foreachargumentofeachnewobj
statementinm, itispossibletoallocateafreshlocalvariabletostoreand loaditsvalue.Inthisway,thetranslationallocatesanewobjectandputsitsaddressbelowtheconstructorarguments.7 SincethelanguageweintroducedinFig.1supportsonlyldlocatogetadirectpointer,weneedtotrackonlythisinformationintheformalization. 8 For sakeofsimplicity,weassumethe constructordoes nothaveout/refparameters.Intheimplementation,they aretreatedas forthecall
Instructionsdealingwithdirectpointers(namely,
ldloca
,stind
,andldind
inourminimallanguage)aretranslated through equivalent CILinstructions dealing withwrapperobjects (and their fieldvalue
). Therefore,stind
andldind
aresimplytranslatedthroughequivalent writeandreadoffield
value
,respectively.ldloca
insteadrequirestoallocate a wrapper objectnewobj
,stores a reference to the wrapper (stloc
) inan instrumentation variable obtainedthrough freshIdx,storesthevaluepointedbythedirectreferenceinthelocalvariabletoitsfieldvalue
(ldloc
andstfld
),and leavesareferencetothewrapperinthestack(dup
).Each CIL statement is translated into one or more JB statements, hence offsets are not preserved. Thus, function statementIdx
:
St→ N
yields the JB offset of the first statement in the translation of the given CIL statement. In addi-tion,sincedirectreferencesarereplacedbywrapperobjects,whenamethodparameterhasadirectreferencetype&T
(and thishappenswhenitisaref
orout
parameterinsafeC#),thisisreplacedbyawrapperobjectWRef.4.2.1. Runningexample
Considerthe runningexampleinFig.2.MostCILstatementsaretranslatedintoa singleJBstatement (e.g.,lines 18-20 and 22-23ofFig.2baretranslatedintolines 23-25and 27-28ofFig.2c),withthenoticeableexceptionoftheCIL
newobj
statements at lines 3 and 11, translated into lines 2-4 and 12-16, respectively. The former passes no argument to the constructor;thelatter(thatinstantiatesa
Wrap
)callsaconstructorwithanargument,hencerequiringaninstrumentation variableatindex 3.4.2.2. Directreferences
AssketchedinSection2.2.1,wemodelthesemanticsofpointersinsafeC#codethroughwrapperobjects.Inparticular,
ldind
(line 5ofFig.3c)istranslatedintothefieldaccessi.value
(rightsideoftheassignmentatline 2ofFig.3b),whilestind
(line 8 ofFig.3c)istranslatedintotheassignment ofi.value
(leftsideofthe assignmentatline 2ofFig.3b). In addition,ldloca
(line 15of Fig.3c) leads tothe construction andassignment ofa wrapperobject(lines 7 and 8 of Fig.3b),whileafterthemethodcallthevaluecontainedinthewrapperobjectiswrittenintothelocalvariable(line 10of Fig.3b).4.3. Correctness
ThissectionprovesthatthetranslationfromCILtoJBstatementsiscorrect.Namely,givenaconcreteCILstate
σ
CIL andapplyingtheoperationalsemanticsforastatement
st
,oneobtainsastatethat,whentranslatedintoJB,isexactlythestate resultingfromthetranslationofσ
CIL intoJBandtheapplicationoftheJBsemanticstoit:∀st ∈
StCIL,
σ
CIL∈
CIL: st,
σ
CIL→
CILσ
CIL andTJst, KK, T
σJ
σ
CILK →
JBσ
JB⇓
T
σJ
σ
CILK =
•σ
JBwhere
σ
1=
•σ
2 meansthatthetwostatesareequaluptoinstrumentationvariablesintroducedbythetranslationprocess.Formally,let
σ
1= (s
1, l
11:: . . . :: l
n1
, h
1)
andσ
2= (s
2, l
12:: . . . :: l
n2
,
:: l
n+12:: . . .:: l
n+k2, h
2)
,thenσ
1=
•σ
2 iffs
1= s
2,and∀i ≤ n : l
i1
= l
i2, and
h
1= h
2. Note that instrumentation variables are present only in the JB state, hence in the righthand-sideoftheequality. 4.3.1. Formalproofofsoundness
Lemma2.Thetranslationof
ldloc
iscorrect.Proof. Let usconsider the caseof integer arguments(the other casecan be treated analogously).Let usprove that the translationof
ldloc
forintegervaluesiscorrect,i.e.,that∀
σ
CIL∈
CIL,ifldloc i,
σ
CIL→
CILσ
CIL andTJldloc i,
(
s,
l,
a
,
w)
K
,T
σJ
σ
CILK
→
JBσ
JB thenT
σJ
σ
CILK
=
•σ
JB .Letσ
CIL= (
s,
l,
a,
h)
bearbitrary.Bytheldloc
−
CILrulewehaveσ
CIL=
(s
:: l[i], l, a, h)
.Byrule (3)ofFig.7,T
Jldloc i,
s,
l,
a,
wK
= iload
j where j= |
a|
+
64a|a|+
i+
64il.By definitionof
T
σJ
K
:T
σJ
σ
CILK = T
σJ(s, l, a, h)K = (s
,cnvrtLoc(l, a), h
).
Bytheiload
−
JBrule:iload j, (s
,cnvrtLoc(l, a), h
)
→
JB
(s
::
cnvrtLoc(l, a)[j],
cnvrtLoc(l, a), h)
=
σ
JB.
Bydefinitionof
T
σJ
K
wehaveT
σJ
σ
CILK = T
σJ(s :: l[i], l, a, h)K = (s :: l[i],
cnvrtLoc(l, a), h).Lemma3.Thetranslationof
ldarg
iscorrect.Proof. Let us consider thecase ofinteger arguments (the other case can be treatedanalogously). Letus provethat the translationof
ldarg
forintegervaluesiscorrect,i.e.,that∀
σ
CIL∈
CIL,ifldarg i,
σ
CIL→
CILσ
CIL andTJldarg i,
(
s,
l,
a
,
w)
K, T
σJ
σ
CILK
→
JBσ
JB thenT
σJ
σ
CILK
=
•σ
JB .Letσ
CIL= (
s,
l,
a,
h)
bearbitrary.Bytheldarg
−
CILrulewehaveσ
CIL=
(s
:: a[i], l, a, h)
.By rule (5)of Fig. 7,T
Jldarg i,
s,
l,
a,
wK
= iload
j where j=
i+
64a|a|+
i+
64il. Bydefinition of
T
σJ
K
:T
σJ
σ
CILK = T
σJ(s, l, a, h)K = (s
,
cnvrtLoc(l, a), h).
Bytheiload
−
JBrule:iload j, (s
,
cnvrtLoc(l, a), h)
→
JB
(s
::
cnvrtLoc(l, a)[j],
cnvrtLoc(l, a), h)
=
σ
JB.
Bydefinitionof
T
σJ
K
wehaveT
σJ
σ
CILK = T
σJ(s :: l[i], l, a, h)K = (s :: l[i],
cnvrtLoc(l, a), h).BydefinitionofcnvrtLoc wehavecnvrtLoc
(l,
a
)
[j] = l[i]
,whichimpliesT
σJ
σ
CILK
=
σ
JB ,andthusT
σJ
σ
CILK
=
•σ
JB .2
Lemma4.Thetranslationof
stloc
iscorrect.Proof. Let us consider thecase ofinteger arguments (the other case can be treatedanalogously). Letus provethat the translationof
stloc
forintegervaluesiscorrect,i.e.,that∀
σ
CIL∈
CIL,ifstloc i,
σ
CIL→
CILσ
CIL andTJstloc i,
(
s,
l,
a
,
w)
K
,T
σJ
σ
CILK
→
JBσ
JB thenT
σJ
σ
CILK
=
•σ
JB . Letσ
CIL= (
s::
v,
l,
a,
h)
be an arbitrary state where the stack hasan integer value v at the top. By the
stloc
−
CIL rule we haveσ
CIL= (s, l[i → v], a, h)
. By rule (4) of Fig. 7,T
Jstloc i,
s:: t,
l,
a,
wK
= istore
j where j= |
a|
+
64|aa|+
i+
64il.Bydefinitionof
T
σJ
K
:T
σJ
σ
CILK = T
σJ(s :: v, l, a, h)K = (s
::
v,cnvrtLoc(l, a), h)
Bytheistore
−
JBrule:istore j, (s
::
v,cnvrtLoc(l, a), h)
→
JB
(s
,
cnvrtLoc(l, a)[j → v], h
)
=
σ
JB.
Bydefinitionof
T
σJ
K
wehaveT
σJ
σ
CILK = T
σJ(s, l[i → v], a, h)K = (s,
cnvrtLoc(l[i → v], a), h
)
By definition ofcnvrtLoc we have that cnvrtLoc
(l
[i → v],
a
)
=
cnvrtLoc(l,
a
)
[j → v]
which impliesT
σJ
σ
CILK
=
σ
JB ,andthus
T
σJ
σ
CILK
=
•σ
JB .2
Lemma5.Thetranslationof
add
iscorrect.Proof. Let us consider thecase ofinteger arguments (the other case can be treatedanalogously). Letus provethat the translation of
add
for integer values is correct, i.e., that∀
σ
CIL∈
CIL, ifadd,
σ
CIL→
CILσ
CIL andTJadd, (
s,
l,
a,
w)
K,
T
σJ
σ
CILK
→
JBσ
JB thenT
σJ
σ
CILK
=
•σ
JB . Letσ
CIL= (
s::
v1::
v2,
l,
a,
h)
be an arbitrary state where the stack hastwointeger values v1 and v2 atthe top. By the
add
−
CILrule we haveσ
CIL= (s :: (v
1+ v
2), l, a, h)
. By rule(2) ofFig. 7,T
Jstloc i,
s:: t
1:: t
2,
l,
a,
wK
= iadd
j.BydefinitionofT
σJ
K
:T
σJ
σ
CILK = T
σJ(s :: v
1:: v
2, l, a, h)
K = (s
::
v1::
v2,cnvrtLoc(l, a), h
)
Bythe
iadd
−
JBrule:iadd j, (s
::
v1::
v2,
cnvrtLoc(l, a), h)
→
JB(s
:: (v
1+ v
2),
cnvrtLoc(l, a), h)
=
σ
JB.
Bydefinitionof
T
σJ
K
wehaveT
σJ
σ
CILK = T
σJ(s :: (v
1+ v
2), l, a, h)
K = (s
:: (v
1+ v
2),
cnvrtLoc(l, a), h)
whichimplies
T
σJ
σ
CILK
=
σ
JB ,andthusT
σJ
σ
CILK
=
•σ
JB .2
Proof. Let usconsider the caseof integer arguments(the other casecan be treated analogously).Let usprove that the translation of
dup
for integer values is correct, i.e., that∀
σ
CIL∈
CIL, ifdup,
σ
CIL→
CILσ
CIL andTJadd, (
s,
l,
a,
w)
K,
T
σJ
σ
CILK
→
JBσ
JB thenT
σJ
σ
CILK
=
•σ
JB .Letσ
CIL= (
s::
v,
l,
a,
h)
beanarbitrarystatewherethestackhasanintegervaluev atthetop.Bythe
dup
−
CILrulewehaveσ
CIL= (s :: v :: v, l, a, h)
.Byrule(1)ofFig.7,T
Jdup,
s:: t,
l,
a,
wK
= dup
j. BydefinitionofT
σJ
K
:T
σJ
σ
CILK = T
σJ(s :: v, l, a, h)K = (s
::
v,cnvrtLoc(l, a), h).
Bythedup
−
JBrule:dup, (s
::
v,cnvrtLoc(l, a), h)
→
JB(s
:: v :: v,
cnvrtLoc(l, a), h)
=
σ
JB.
Bydefinitionof
T
σJ
K
wehaveT
σJ
σ
CILK = T
σJ(s :: v, l, a, h)K = (s
:: v,
cnvrtLoc(l, a), h)
whichimplies
T
σJ
σ
CILK
=
σ
JB ,andthusT
σJ
σ
CILK
=
•σ
JB .2
Lemma7.Thetranslationof
ldfld
iscorrect.Proof. Letusconsiderthecaseofanumericalfield(thecaseofa referencefieldcanbetreatedanalogously).Wewantto provethat
∀
σ
CIL∈
CIL,ifldfld f,
σ
CIL→
CILσ
CIL andTJldfld f, (
s,
l,
a,
w)
K, T
σJ
σ
CILK
→
JBσ
JB thenT
σJ
σ
CILK
=
•σ
JB .Letσ
CIL= (
s::
o,
l,
a,
h)
beanarbitrarywherethestackcontainsareferenceo at thetop.Bytheldfld
−
CILrulewehave
σ
CIL= (s :: h(o)(f), l, a, h)
.Byrule(8)ofFig.7,T
Jldfld f,
s:: t
o,
l,
a,
wK
= getfield f
.BydefinitionofT
σJ
K
:T
σJ
σ
CILK = T
σJ(s :: o, l, a, h)K = (s
:: r
o,cnvrtLoc(l, a), h
)
wherero isdefinedasareferencetoawrapperJBobjectrepresentingtheCILobject.Bythe
getfield
−
JBrule:getfield f, (s
:: r
o,
cnvrtLoc(l, a), h)
→
JB(s
:: h
(r
o)(f),
cnvrtLoc(l, a), h)
=
σ
JB.
Bydefinitionof
T
σJ
K
wehaveT
σJ
σ
CILK = T
σJ(s :: h(o)(f), l, a, h)K = (s
:: h(r
o)(f),
cnvrtLoc(l, a), h)
whichimplies
T
σJ
σ
CILK
=
σ
JB ,andthusT
σJ
σ
CILK
=
•σ
JB .2
Lemma8.Thetranslationof
stfld
iscorrect.Proof. Letusconsiderthecaseofanumericalfield(thecaseofa referencefieldcanbetreatedanalogously).Wewantto provethat
∀
σ
CIL∈
CIL,ifstfld f,
σ
CIL→
CILσ
CIL andTJstfld f, (
s,
l,
a,
w)
K, T
σJ
σ
CILK
→
JBσ
JB thenT
σJ
σ
CILK
=
•σ
JB . Letσ
CIL= (
s::
o::
v,
l,
a,
h)
be an arbitrarywhere thestack contains areferenceo and avalue v at the top.Bythestfld
−
CIL rule we haveσ
CIL= (s, l, a, h[o → h(o)[f → v]])
. By rule (9) of Fig. 7,T
Jstfld f,
s:: t
o:: t
v,
l,
a,
wK
=
putfield f
.BydefinitionofT
σJ
K
:T
σJ
σ
CILK = T
σJ(s :: o :: v, l, a, h)K = (s
:: r
o:: v,
cnvrtLoc(l, a), h)
wherero isdefinedasareferencetoawrapperJBobjectrepresentingtheCILobject.Bythe
putfield
−
JBrule:putfield f, (s
:: r
o:: v,
cnvrtLoc(l, a), h)
→
JB(s
:: h
(r
o)(f),
cnvrtLoc(l, a), h[r
o→ h
(r
o)
[f → v]])
=
σ
JB.
Bydefinitionof
T
σJ
K
wehaveT
σJ
σ
CILK = T
σJ(s, l, a, h[o → h(o)[f → v]])K
= (s
,cnvrtLoc(l, a), h
[r
o→ h(o)[f → v]])
= (s
,cnvrtLoc(l, a), h
[r
o
→ h
(r
o)
[f → v]])
whichimplies