A Formalization of the Static Semantics of Rust

(1)

Universit`

a degli Studi di Pisa

DIPARTIMENTO DI INFORMATICA Corso di Laurea Magistrale in Informatica

A Formalization of the Static Semantics of Rust

A Formalization and an Implementation with Spoofax of the Static Semantics of Rust

Candidato:

Amin Ait Lamqadem

Relatore:

Andrea Corradini

(2)

(3)

5 Name Binding and Typing for MiniRust 31 5.1 Introduction . . . 31 5.2 Types . . . 31 5.2.1 Type Environment . . . 32 5.3 Typing Relation . . . 33 5.3.1 Module Elements . . . 33 5.3.2 Statements . . . 35 5.3.3 Variants . . . 35 5.3.4 Expressions . . . 36 5.3.5 Patterns . . . 37 5.3.6 Paths . . . 38 5.3.7 Types . . . 39

5.3.8 Mutability and Lifetimes . . . 39

5.4 Subtyping . . . 40

5.5 Implementation of the Type Checker using NaBL2 . . . 40

5.5.1 Scope Graphs . . . 40 5.5.2 NaBL2 . . . 43 5.5.3 Example . . . 47 5.6 Summary . . . 48 6 Lifetime Inference 49 6.1 Introduction . . . 49

6.2 Lexical and Non-Lexical Lifetimes . . . 49

6.3 Lifetime Relation . . . 50 6.4 Lifetime Checking . . . 52 6.4.1 New Names . . . 52 6.4.2 Borrowing . . . 53 6.4.3 Subtyping . . . 54 6.4.4 Checking Lifetimes . . . 55

6.5 Lifetime Variables and Substitution . . . 57

6.6 Minimal Scopes . . . 57

6.7 Constraint-Based Lifetime Checking . . . 58

6.8 Constraints Resolution . . . 62

6.9 Implementation of Lifetime Inference using Stratego . . . 65

6.10 Summary . . . 67

7 Borrow Checker 69 7.1 Introduction . . . 69

7.1.1 Path . . . 69

(5)

CONTENTS 5

7.1.3 Borrowing . . . 70

7.2 Restrictions . . . 71

7.3 Move Semantics . . . 72

7.4 Borrow Checker Rules . . . 73

7.4.1 Rules on restrictions . . . 75

7.4.2 Rules on duration . . . 77

7.4.3 The use of a path . . . 79

7.5 Borrow Checking for MiniRust . . . 79

7.5.1 Module elements . . . 79

7.5.2 Statements . . . 80

7.5.3 Expressions . . . 80

7.5.4 Control flow . . . 81

7.5.5 Array approximation . . . 82

7.6 Implementation of the Borrow Checker using Stratego . . . 82

7.7 Conclusion . . . 83 8 Related Work 85 8.1 Typing . . . 85 8.2 Influences on Rust . . . 85 8.3 Rust Formalizations . . . 85 8.4 Language Workbenches . . . 86 9 Conclusions 87 9.1 Context . . . 87 9.2 Summary . . . 87 9.3 Future Work . . . 88

A Borrow Checking Rules 89 A.1 Crates . . . 89

A.2 Module Elements . . . 89

A.3 Statements . . . 89

A.4 Expressions . . . 90

A.5 Other rules . . . 92

A.5.1 Using a Path . . . 92

A.5.2 Restricted Operations . . . 92

A.5.3 Duration Restrictions . . . 92

A.5.4 Paths . . . 93

A.5.5 Loans . . . 95

A.5.6 Types . . . 95

(6)

(7)

Chapter 1

Introduction

1.1 Context of the Study

New programming languages are constantly developed. The specification of a program-ming language involves various language aspects. One of these aspects is the specification of the static semantics. The static semantics of a language describes when a program is well-formed, using a type system. A static semantics serves two purposes: it guarantees that a class of runtime errors cannot occur (safety) and that certain runtime imple-mentation strategies will work correctly (efficiency) [10]. Type checkers are algorithms that implement the static semantics. This algorithms are usually written with general purpose languages, that deal with low level details hiding the declarative nature of the specification.

Language workbenches are tools that simplify the implementation of a compiler and of an integrated development environment for a programming language [6]. They provide high-level declarative metalanguages, that are used to specify different language aspects, such as syntax checking and static semantics. These high-level specifications are used by the language workbench to generate the correspondent implementation, such as a parser and a type checker.

Rust is a programming language recently developed by Mozilla ([7], 2010). The lan-guage tries to substitute C and C++ in the area of system programming. The goals of the language are to produce efficient code, while guaranteeing a certain degree of memory safety. To meet these requirements the language uses a novel system of ownership, bor-rowing and lifetimes. These are part of the static semantics of the language. Ownership ensures that at any point there is only one owner of any given resource. Borrowing allows to create a reference that borrows the ownership of a resource temporarily. Lifetimes are used to determine the duration of each borrow and they are inferred by the compiler. The Borrow Checker is the part of the compiler that ensures that the ownership system is respected.

(8)

2 CHAPTER 1. INTRODUCTION

1.2 Problem Statement

The Rust type system and its claims on memory safety have attracted some attention of researchers to define a formalization of its static semantics [1, 11, 22] . Rusty Types [1] and Rust Belt [11] use, respectively, a calculus and a separation logic, to formalize the language. This methods does not allow to derive a high level specification suitable for tools like language workbenches. Patina [22], instead, formalizes the Rust’s Borrow Checker defining a type system. Although this formalization can be used to derive a declarative specification, it does not include important aspects of borrow checking, such as lifetime inference. Furthermore the formalization is limited to a sub-language that does not include important constructs for borrow checking. For example, it does not formalize borrow checking for arrays, on which the Borrow Checker shows a special behavior.

1.3 Aim and Scope

The goal of this thesis is to define a formalization of the static semantics of Rust. This formalization should include important aspects such as lifetime inference. It also extends prior work to cover a set of constructs of the language large enough to show special cases of the Borrow Checker. On top of that, the formalization must be suitable to derive a declarative specification usable in a language workbench. Our focus for this formalization will be the Borrow Checker, thus we limit our formalization to a subset of the language that includes only essential features for borrow checking.

1.4 Significance of the Study

The contributions of this thesis are the following:

• We present a formalization of the Rust’s static semantics that includes lifetime inference.

• We implemented this formalization in the Spoofax language workbench through a set of declarative specifications.

1.5 Overview of the Study

The thesis is structured as follows: in Chapter 2 we introduce Rust and its main features. We then describe in Chapter 3 the language workbench that was used to implement our formalization: Spoofax. In the rest of the chapters we formalize our subset of Rust: MiniRust. In Chapter 4 we describe its syntax and the transformations operated on the language before type checking. We then focus, in Chapter 5, on name binding and type checking. We describe the judgments for typing of our language. In Chapter 6 we present the lifetime inference phase. In Chapter 7 we finally describe the Borrow Checker, providing the judgments that define borrow checking. For each of these chapters

(9)

1.5. OVERVIEW OF THE STUDY 3 we show the correspondent declarative specification in Spoofax. In Chapter 8 we review the related work. We conclude in Chapter 9 discussing the outcome of our work.

(10)

(11)

Chapter 2

Rust Overview

2.1 Introduction

In this chapter we will give an introduction of the Rust programming language. This chapter is thought for people who don’t already know the language: if the reader is already familiar with the language he can skip this chapter. Rust has a wide range of features, in this chapter we will explain only those that will be used in the rest of the thesis. For a comprehensive guide to Rust refer to [15]. In this chapter we will first look at the goals of Rust, then we will show some constructs of the language and finally we will look at Rust’s most interesting features, namely ownership, borrowing and lifetimes.

2.2 Rust

Rust is a programming language with a focus on system programming [15]. To be com-petitive in this field the compiler produces efficient code, whose efficiency is comparable to that of C or C++. On top of that the compiler also guarantees that the resulting programs are safe, in the sense that a set of memory errors are completely avoided. To produce fast code the language doesn’t rely on garbage collection, instead it uses a pow-erful type system that allows the compiler to insert in the program calls to the memory management. This type system uses the concept of ownership: we will see some examples of how ownership works in section 2.4. The compiler guarantees that a class of mem-ory errors never happens. Rust doesn’t have null pointers, which eliminates errors like null dereferencing. The ownership type system guarantees that accesses to uninitialized memory are catched at compile time. The Borrow Checker guarantees that dangling pointers are detected statically through the usage of lifetimes. Double free errors, where a program marks as unused a piece of memory twice, are not possible with ownership. Finally the programmer doesn’t have directly access to the memory management.

(12)

6 CHAPTER 2. RUST OVERVIEW

2.3 Basic Syntax

In Rust names can be created using the let statement. Variables are by default immutable, but mutable variables can be created using the keyword mut before the variable name. Types for variables are by default inferred, but they can be specified adding a colon and the type after the variable name.

{

let x = 3; // x is an immutable 32-bit integer

let mut y = true; // y is a mutable boolean

let z: i32 = 0; // z is an immutable 32-bit integer }

Rust has a wide range of primitive types, but for the scope of this thesis it is sufficient to know that there is a boolean type bool with values true and false and numeric types of which we will use i32: a signed 32-bit integer. Another type that we will use in the rest of the examples is Vec<i32> which is a vector of integers.

In Rust functions are declared using the keyword fn. Types for function arguments are not inferred and need to be explicitly specified. While types could be easily inferred also for functions, explicit typing gives a documentation of the function behaviour. The last expression, in the function’s body, is the returned value. Early returns can be specified using the return keyword.

{

// A function called 'sub' that takes two 32-bit integers

// as arguments and return their difference as a 32-bit integer fn sub(x: i32, y: i32) -> i32 {

x - y \\ last expression is returned }

let z = sub(4, 3) // 1 is bound to z }

The flow of the program can be handled using common constructs like if, while and for. Another kind of loop available in Rust is the infinite loop: to exit the loop you can use the break statement.

{

loop { // infinite loop c = get_command(); if c == 0 { break; } else { ... } } }

(13)

2.3. BASIC SYNTAX 7 Data can be structured using common construct like struct and enum. Rust’s structs are like C’s structs. After the keyword struct we specify the name and the list of fields enclosed between curly brackets. Fields are specified giving the name and type for each one. Fields can be accessed through the usual dot notation.

{

// A struct called Point struct Point {

x: i32, // x is an i32

y: i32 // y is an i32

}

// Get the x field of the given Point fn get_x(p: Point) -> i32 {

return p.x; }

}

Rust’s enums are used to represent sum-types. The value of a sum-type can be one of a list of variants. In the example below the enum declares a new type Option, that can be instantiated either with None or Some. Notice also that arbitrary data can be associated with a variant: in the example below we associate a 32-bit integer with the variant Some.

{

// An enum called Option which specifies two different variants: // a variant called None which has no data associated with it // a variant called Some which contains an i32

enum Option{ None, Some(i32) }

}

Matches are used to branch on enums. A branch is specified for each different variant of the matched type. For variants with associated data, these can be bound through variable names in the branch guard and can be used in the body of the branch.

{

// A match that is exhaustive on all // possible variants, that is the // values that `data` can take. match data {

None => return 0, Some(x) => return x

(14)

8 CHAPTER 2. RUST OVERVIEW }

}

2.4 Ownership

In this section we will introduce the concept of ownership which is one of the peculiarities of Rust. In Rust memory management is neither automatic, through the use of a garbage collector, nor manual, because the programmer does not need to manage the memory explicitly. Using an ownership type system the Rust compiler can figure out when data is not used anymore and can insert in the program calls to the memory management. In a type system based on ownership each object in memory has a unique owner [3]. In Rust the variable that refers to an object is the object’s owner. When the owner goes out of scope its referenced memory is deallocated. The compiler inserts calls to free the memory at the exit of each scope.

{

let mut v = vec![1, 2, 3]1_{; // v is the owner of the Vector of integers}

} // v goes out of scope, memory owned by v is dropped

To guarantee the uniqueness of the owner, whenever there is a copy of the pointer the ownership is moved from the old owner to the new variable. After the move the old owner becomes uninitialized and next uses of the old owner cause a static error.

{

let v1 = vec![1, 2, 3];

let v2 = v1; // move ownership to v2

println!("{}", v1[0]); // ERROR! v1 is uninitialized }

The ownership is moved also for the parameters of a function call. This happens because the function call implicitly copies the pointer from the actual parameters to the formal parameters. Returning the pointer moves back the ownership to the call site. {

fn vec_length(v: Vec<i32>) -> (i32, Vec<i32>) { (v.len(), v) // needs to return the ownership of v }

let v = vec![1, 2, 3];

let (v, i) = vec_length(v); // move the ownership to the function

// implicit assignment to function parameter }

Moving the field of a struct doesn’t de-initialize the whole struct, but just the moved field.

1

This syntax is the application of a macro, this macro creates a Vector with the elements within square brackets

(15)

2.5. BORROWING 9 {

let d = Data{ l: 0, v: vec![] };

let v1 = d.v; // move out the ownership of d.v only

println!("{}", d.l); // ok: we can still access to d.l println!("{:?}", d.v); // ERROR! d.v is uninitialized }

Rust defines a trait2 _{named Copy: types that are implementing this trait don’t need to}

be moved but instead their value is copied and the old owner is not de-initialized. A type that contains a reference cannot be Copy. An example of copy types are primitive types (i32, bool, f64, etc.).

{

let x : i32 = 5;

let y = x; // doesn't move the ownership of x just copies the value pritntln!("{}", x); // 5

}

2.5 Borrowing

In Rust the ampersand symbol & is used to create a reference and for reference types. The star operator * is used to access to the referenced value. References in Rust are not raw pointers, but are used to take temporarily the ownership of the referenced variable and are called borrows. A borrow type is the type associated with a borrow and it is indicated with the ampersand symbol & and the type of the referenced variable.

{

let v = vec![1, 2, 3];

let v_ref = & v; // take the ownership temporarily

assert_eq!(v[1], (*v_ref)[1])3_{; // the * operator is used to access to v}

}

Borrows are particularly useful for function calls where the ownership can be borrowed for the duration of the call and doesn’t need to be moved back and forth.

{

fn length(v_ref: & Vec<i32>) -> i32 { ...

}

let v = vec![1];

let i = length(& v); // take ownership temporarily println!("{:?}", v); // ownership back to v

}

2_{A trait defines the abstract behavior of a type.} _{It consists of a set of methods that must be}

implemented by a type to have that trait.

(16)

10 CHAPTER 2. RUST OVERVIEW A borrow can be immutable or mutable. A mutable borrow can be created using the keyword mut after the ampersand. Mutable borrows require the referenced variable to be declared as mutable and they can be used to change its value.

{

fn change(v_ref:&mut Vec<i32>, i: i32, x: i32) { (*v_ref)[i] = x;

}

let mut v = vec![1, 2, 3]; // owner needs to be mutable change(& mut v, 0, 0); // to take a mutable borrow println!("{:?}", v); // [0, 2, 3]

}

Immutable borrows can only read the value of the borrowed variable, but they can’t change it. It is possible to have multiple immutable borrows at the same time. Mutable borrows instead require to be unique. The general rule is that we can have multiple immutable borrows or a single mutable borrow at the same time, but not both.

{

let x = 4;

let x_ref1 = & x; // we can have multiple immutable borrows let x_ref2 = & x; // ok

let x_ref3 = & mut x; // ERROR! or a unique mutable borrow }

A borrow limits the usage of the borrowed variable. An immutable borrow requires that the ownership of the borrowed variable is not moved and that the borrowed variable is not written. Reading during an immutable borrow is instead allowed. A mutable borrow prevents reading, writing and moving the borrowed variable while it’s active.

{

let v1 = vec![1, 2, 3];

let v_ref = & v1; // immutable borrow: v_ref limits the usage of v1

let v2 = v1; // ERROR! cannot move out v1

v1[0] = 2; // ERROR! cannot write to v1

println!("{}", v1[0]); // reading is ok }

A borrow limits not only the exact variable that is borrowed, but also all the variables used to access to this variable. For example the borrow of a struct’s field limits the field itself and the variable containing the struct.

{

struct Point { ... } // as usual let mut p = Point { x: 0, y: 0 }; let x_ref = & p.x; // borrow p.x

p = Point { x: 1, y: 1 }; // ERROR! cannot mutate a prefix of p.x }

(17)

2.6. LIFETIMES 11 A mutable borrow, since it is unique, can, in turn, lend again the ownership and create another borrow: this operation is called re-borrowing.

{

let mut x = 5;

let r1 = & mut x; // borrow x

let r2 = & x; // ERROR! cannot borrow immutably x while r1 is active let r3 = & *r1; // ok: this borrow limits r1 usage

}

The compiler through a static analysis phase guarantees that at the same time we have only one or more immutable borrows or a single mutable borrow, but not both, and it guarantees also that the usage of a borrowed value is limited for the whole duration of the borrow. The compiler generates an error if a program does not satisfy these conditions. The part of the compiler that executes this static analysis is called Borrow Checker. We will see in detail how this works in Chapter 7.

2.6 Lifetimes

Lifetimes in Rust are used to identify the duration of a borrow. A lifetime variable a is written with the notation 'a. Knowing the duration of a borrow is crucial to avoid errors like the use-after-free.

{

// An example of use-after-free

fn invalid_reference() -> &Vec<i32> { let v = vec![];

return &v; // ERROR! v does not live long enough } // v is dropped here

let x = invalid_reference();

println!("{?:}", *x); // potential error: accesses to a dropped object }

The compiler through an analysis called lifetime inference assigns a span of code to each lifetime variable. This span of code represents the duration of the borrow with that lifetime, that is the point from the start of the borrow to its last use. These associations are used by the Borrow Checker to detect when a borrow is used after the borrowed variable has been dropped. Lifetimes are usually implicit: the compiler binds to each borrow a different lifetime variable. For example the type &'a i32 is a reference to a 32-bit integer that lives for a lifetime 'a.

{

let r1:& i32; // this is translated to the one below let r1:&'a i32; // the compiler adds a fresh lifetime

// to each borrow

(18)

12 CHAPTER 2. RUST OVERVIEW In Rust functions are polymorphic on lifetimes: lifetime parameters can be declared within angle brackets <>, using the usual notation for generics. Lifetime parameters are required if the function signature contains borrow types. The compiler automati-cally adds lifetime parameters for some common cases, allowing lifetimes to be elided in function signatures: this feature is called lifetime elision. When lifetime elision does not apply, the programmer needs to specify explicitly in the function signature a list of lifetime parameters and attach one to each borrow type.

{

fn foo(x: & i32) { ... } // lifetime elision turns this // into the one below

fn foo<'a>(x: &'a i32) { ... } // declares a lifetime 'a

} // x is a borrow that lives for 'a

If two borrow types have the same lifetime variable this guarantees that they will have the same duration.

{

// can return x or y, they need to have the same lifetime as // the returned value

fn borrow_x_or_y<'a>(x: &'a i32, y: &'a i32) -> &'a i32 { ... } }

{

// always returns p, p and the return value have the same lifetime fn borrow_p<'a, 'b>(p: &'a i32, q: &'b i32) -> &'a i32 { ... } }

2.7 Other Aspects of Rust

Rust is a general purpose programming language with a rich variety of features. In our presentation we included only the features that are relevant for the Borrow Checker. The language used in the thesis, that we called MiniRust, is a subset of Rust that contains all the features covered in this chapter. The syntax of MiniRust will be presented in Chapter 4. The constructs of Rust that were not described and are not part of this thesis are the following:

• Implementation blocks: allow to associate methods with a type;

• Generics: can be used to make the type used by a function or a data structure abstract;

• Traits: a trait consists of a set of methods that can be implemented by a type. A type that has a specific trait must implement the methods of that trait. Traits are used to define the behavior of a type in an abstract way. Traits can be used in generics to specify that the type must have a certain behavior;

(19)

2.8. SUMMARY 13 • Macros: unlike C macros, Rust macros are expanded after parsing and they are

indicated with a name terminating with an exclamation mark !;

• Unsafe blocks: sometimes it is useful to circumvent the Borrow Checker. The unsafe keyword allows to write blocks of code that are not checked by the Borrow Checker. In these blocks raw pointers and unsafe functions can be used.

2.8 Summary

We have seen some of the basic constructs of the language and we have described the ownership policy, the borrowing mechanism and how this relies on lifetimes. We will use this introduction to Rust to describe MiniRust, our implementation of a subset of the Rust language in the next chapters.

(20)

(21)

Chapter 3

Spoofax

In this chapter we will introduce Spoofax: the tool that we used to implement our specification. We will first introduce the concept of Language Workbench (LWB), we will then look at the Spoofax LWB and we will conclude the chapter presenting the structure of our project and the obtained editor.

3.1 Language Workbenches

Research in programming languages tries to close the gap between language constructs and domain concepts [30]. General purpose programming languages are used to solve problems in any domain. Developers are responsible of using the language constructs to model concepts of the problem’s domain.

Domain Specific Languages (DSLs) try to tackle the problem moving closer languages and domain concepts using linguistic abstractions [28]. DSLs offer notations, analysis, verification and optimization techniques that are specialized to a problem’s domain [30]. A developer first designs a language that models domain concepts and then uses the resulting language to solve the problem.

A DSL to be of practical use needs:

• a compiler to translate source code into executables;

• an Integrated Development Environment (IDE) to edit the code with some editor support.

Language construction tools exist to define a compiler for a language. For IDE’s support, on the other hand, often parts of the language definition used for the compiler have to be reimplemented again. Examples of these repeated definitions are: parsing or name and type analysis.

Language Workbenches (LWB) are used to derive a compiler and an IDE from the same definition. In that regard Language Workbenches are tools that integrate language construction tools with IDE’s support [14]. A developer creating a new language with a Language Workbench can focus on language design rather than on implementation

(22)

16 CHAPTER 3. SPOOFAX

Figure 3.1: How language aspects map to Spoofax’s metalanguages and the generated artifacts

details and obtain a compiler and an IDE from the same specification. Examples of LWBs are MPS [9], Xtext [5] and Spoofax [14].

3.2 Spoofax

Spoofax is a LWB for textual DSLs, with state of the art IDE support, that uses declara-tive language specifications for language aspects and IDE services [14]. We used Spoofax to write a specification of the syntax and the static semantics and we derived an IDE for our language MiniRust.

As for modern IDEs, Spoofax supports real-time syntactic and semantic analysis based on a parser scheduled in the background. The parser supports error recovery and origin tracking [14]. Examples of supported syntactic and semantic analysis phases are: syntax highlighting, syntax completion, name and type analysis and semantic content comple-tion.

A language definition involves different aspects that are:

• syntax, which describes the structure of well-formed sentences;

• name binding and scoping rules, which describe the relation between definition and uses of names;

• static semantics, which generates static constraints on syntactically well-formed programs, avoiding a large class of runtime errors;

(23)

3.3. MINIRUST PROJECT STRUCTURE 17 • transformations, which improve programs in some direction like performance or

understandability;

• dynamic semantics, which describe the dynamic behavior of programs; • code generation, which specifies a translation to lower level target code.

Spoofax provides declarative metalanguages to the developer to describe each lan-guage aspect [30]. Consider Fig. 3.1: the metalanlan-guage SDF3 deals with the syntax, NaBL2 handles the name binding and the static semantics aspects, DynSem on the other hand describes dynamic semantics of the language and Stratego is a term-rewriting lan-guage that is used for transformations and code generation.

3.3 MiniRust Project Structure

We have seen in the previous section how, in Spoofax, different metalanguages are used to describe different aspects of a language. In this section we will see how our language, MiniRust, is structured in this Language Workbench.

MiniRust MiniRust AST parsing

desugaring type

analysis _inferencelifetime checkingborrow

Figure 3.2: Project structure

The MiniRust frontend performs the steps reported in Fig. 3.2. Given a MiniRust program, the following steps are performed:

1. The program is first transformed in an AST by a parser. This parser was derived from a specification of the syntax of the language. This specification was written in the SDF3 metalanguage.

(24)

18 CHAPTER 3. SPOOFAX 2. On the AST we then apply transformations to: eliminate redundant constructs and thus simplify the subsequent phases, but also to make explicit elements that are implicit in the source code like lifetime variables. This happens during the desugaring phase. The transformations defined in this phase are specified with the Stratego metalanguage, which allows to describe transformations on trees;

3. Once the AST has been simplified we then proceed to the part of static semantics shown in the figure by the dashed box. The first phase of this analysis analyzes the correctness of names and types. This analysis does not change the AST, but creates auxiliary structures as a result of the analysis. This part of the analysis is described in NaBL2, the metalanguage used in Spoofax to describe the analysis of names and types.

4. After verifying that the AST is well typed, we proceed inferring the lifetime vari-ables to assign them a value. This phase was implemented with the metalanguage Stratego. Like the previous analysis, this does not change the AST, but adds new auxiliary structures where the result of this inference is also present.

5. The last phase uses the results of the previous phases to check that the program respects the rules for borrowing and ownership: that is it implements the Borrow Checker. Also this phase is implemented in Stratego and, as in the previous phases, leaves the AST unchanged and produces additional results.

In the next section we will describe the editor we obtained describing the syntax and the various parts of the static semantics of our language in Spoofax.

3.4 MiniRust Resulting Editor

The usage of the Spoofax language workbench to describe our language allowed us to automatically get an editor. A language defined in Spoofax can be distributed as an Eclipse plugin. Adding this plugin to an Eclipse installation, you get an editor with all the features given by the Eclipse IDE. The resulting editor parses the program displaying errors and suggestions. It does type checking marking type errors and displaying infor-mations on types when hovering on a specific name. It performs lifetime inference and borrow checking showing, in case errors are present, a message with the cause. All of this occurs live while the programmer types, therefore the analysis does not have to be run manually, but each time the program is modified it runs the entire pipeline again.

In Fig. 3.3 we see a screenshot of the resulting editor. Hovering on a variable the editor shows us the type as in the case of the field data. Errors, if present, are signaled at the point in which they arise as in the case of the access to the field v.len which is an invalid access detected by the Borrow Checker. The error message is reported in a special section that shows the cause of the error and the position in the program.

(25)

3.5. SUMMARY 19

Figure 3.3: Editor for MiniRust

3.5 Summary

We have described a Language Workbench in general and in particular Spoofax. We then discussed briefly the structure of the project and the obtained editor for our language. We will describe more in detail the metalanguages used to define the syntax and the static semantics aspects in the next chapters, looking at the definitions for MiniRust.

(26)

(27)

Chapter 4

Syntax and Desugaring for

MiniRust

4.1 Introduction

In this chapter we introduce and describe the syntax of our language MiniRust. We first show the syntax of the language, then how this was specified using SDF3, the Spoofax’s meta-language for syntax specification, later we look at some desugaring rules for MiniRust and we conclude by looking at the implementation of this desugaring rules in Stratego, the Spoofax’s meta-language for term rewriting.

4.2 MiniRust Syntax

In MiniRust the syntax is reduced to but a bare essential subset of Rust. It includes bor-rowing and lifetime syntax, since this is the main topic of this work, and other language constructs that give rise to interesting cases when interacting with borrowing.

In Fig. 4.1 we show the syntax of MiniRust. In the rest of this section we comment the aspects that are not obvious of MiniRust. The meta-variable c ranges over MiniRust’s programs. It represents a crate: crates in Rust are what in other languages are called packages. They group together a set of related functions and data structures. In MiniRust a crate is a possible empty list of module elements mii∈0..n. A module element m can

be a module declaration, a use statement, an enum declaration, a struct declaration or a function declaration. A module in MiniRust creates a namespace without any visibility limitation. To refer to a name x declared in the module m one can use the notation m::x. A use statement is used in order to make names from a module local. Enum declarations are such as the ones in Rust, but while in Rust a variant can be a label with attached any type of data, in MiniRust the type of the variant’s data is either unit or struct. Function declarations are polymorphic on lifetimes. They take a possible empty list of lifetime parameters ljj∈0..m. Borrow types in function parameters needs to

explicitly use one of the declared lifetimes parameters. 21

(28)

22 CHAPTER 4. SYNTAX AND DESUGARING FOR MINIRUST

c := mii∈0..n crate

m := module element

| mod x {mii∈0..n} module declaration

| use p; use statement

| enum x {vii∈1..n} enum declaration

| struct x {xi: tii∈1..n} struct declaration

| fn x<ljj∈0..m>(xi: teii∈0..n) → te {skk∈0..pe} function declaration

s := statements

| let x:t; let binding

| e; expression statement

| loop{sii∈1..n} loop statement

| break; break statement

| continue; continue statement

| m module element

e := expressions

| [eii∈1..n] array creation

| p= e assignment

| p op= e operational assignment

| e bop e binary operation

| {sii∈0..ne} block expression

| & q e borrow expression

| x<m>(eii∈0..n) function call

| if e {sii∈0..n e}else {sii∈0..m e} if expression

| match e {pati => ei1..n} match expression

| x {xi : eii∈1..n} struct creation

| p path expression

| liti | litb| lits literals

p := path

| x variable

| x1:: ... :: xn module access

| e.x field access

| e[i] array access

| ∗e borrow access

v := variant

| x named variant

| x{xi: tii∈1..n} struct variant

pat := pattern

| x named pattern

| x{xi: patii∈1..n} struct pattern

l := 'x lifetime q := mutability | ε immutable | mut mutable ¯ t := base types

| () | bool | i32 | string primitive types

| x type name

t := types

| ¯t base type

| [t] array type

| & q t borrow type

te _:= _{explicit types}

| ¯t base type

| [te_] _{array type}

| & l q te _{explicit borrow type}

(29)

4.3. IMPLEMENTATION OF THE SYNTAX WITH SDF3 23 Statements s are another syntactic category. A statement can be a let statement, an expression statement, a loop statement where we can use break and continue, or a module element m.

The meta-variable e ranges over expressions. For operational assignment the oper-ation op is one of the usual arithmetic operoper-ations +, -, /, *, %. The binary operator bop, used in the binary operation production, is one of: operators in arithmetic

opera-tors, equality operators (==, !=), relational operators (>, <, >=, <=) or boolean operators (&&, ||). In MiniRust calls to function declaring lifetime parameters needs to add an integer number m between angular brackets <m>, this number must correspond to the number of lifetime parameters in the called function. This syntax differs from that used by Rust. A function call, in its explicit form, needs a fresh lifetime variable for each lifetime parameter in the called function. As we will see at the end of this chapter, we will add lifetime variables during the desugaring phase. During this phase information on types, like the number of lifetime parameters required by the called function, are not yet available. For this reason we require the programmer to provide for us the number of lifetime parameter declared in the called function. Literals usable in MiniRust are integer literals liti, boolean literals litb with values true and false and string literals

lits where string values are enclosed in double quotes.

The meta-variable p ranges over paths, which represent the left values of the language. A path can be: a variable x, an access to the element of a module x1 :: ... :: xn, an access

to a struct’s field e.x, an access to an array’s element e[i] or an access to a borrow ∗e. The meta-variable ¯t ranges over the types that do not contain other types. These are the primitive types and the type name x used for structs and enums. Primitive types in MiniRust are the unit type (), the boolean type bool, the integer type i32 and the string type string. Using the meta-variable ¯t, the meta-variable t ranges over types with implicit lifetimes in borrow types. Types with explicit lifetimes in borrow types are described by the meta-variable te_{. Borrow types with explicit lifetimes & l q t must be}

used in function signatures. The meta-variable x ranges over names.

4.3 Implementation of the Syntax with SDF3

In this section we will introduce SDF3 the meta-language for syntax definition present in Spoofax. We show some examples of encoding in SDF3 of the productions defining the syntax of MiniRust.

4.3.1 SDF3

The syntax definitions in Spoofax are given using the SDF3 meta-language. SDF3 pro-vides a declarative syntax definition, it supports the full class of Context-Free (CF) grammars, handles ambiguity using declarative disambiguation rules and uses declara-tive annotations on productions to describe the resulting abstract syntax trees. Spoofax uses SDF3 specifications to generate a parser, a pretty-printer and some basic editor fea-tures. The generated parser uses SGLR parsing (Scannerless Generalized Left to Right),

(30)

24 CHAPTER 4. SYNTAX AND DESUGARING FOR MINIRUST which removes the need for two different parsers, one for the lexical analysis and one for the syntactic analysis [18].

4.3.2 Crates

The following listing shows the SDF3 specification for a crate. context-free syntax Crate.Crate = <<{ModElem " \n"}*>> ModElem = FnDecl ModElem = ModDecl ModElem = StructDecl ModElem = EnumDecl ModElem = UseDecl

The SDF3 specification looks very similar to the grammar in Fig. 4.1, the sort Crate can expand to the expression {ModElem "\n"}* which represents a list of ModElem sorts, separated by a new-line char, where the * is used to indicate a possible empty list. The choice operator | for m in the grammar translates to a list of productions for the sort ModElem in SDF3.

As SDF3 describes completely the parsing process also semantic actions are embedded in the syntax. The .Crate syntax is a semantic action that informs the parser which name will be used for the abstract syntax tree node generated following this reduction. Fig. 4.2(a) shows an example of a simple crate, the top part of the corresponding abstract syntax tree is in (b). use foo1; mod foo2 { ... } (a) UseDecl ModDecl Crate ... ... (b)

Figure 4.2: (a) An example of a possible crate. (b) AST of the code in the example.

4.3.3 Statements

In MiniRust statements are another important syntactic category. The following listing shows the SDF3 specification for the statements non-terminal Stmt.

(31)

4.3. IMPLEMENTATION OF THE SYNTAX WITH SDF3 25 context-free syntax Stmt = Let Stmt.ExprStmt = Expr ";" Stmt = Loop Stmt.Break = "break" ";" Stmt.Continue = "continue" ";" Stmt = ModElem

A statement can expand into a let-binding (Let), an expression followed by a semi-colon (ExprStmt), a loop (Loop), a break statement, a continue statement or a module element (ModElem).

The following listing presents the specification for function declarations. We only show the specification for function declarations because it is complex enough to show some of SDF3’s features.

context-free syntax

FnDecl.FnDecl = <fn <Id> <GenericParams?> <FnSig> <BlockExpr>>

FnSig = <(<{Param ", "}*>) -\> <Ty>> Param.Param = <<Id>:<Ty>>

Functions are declared with the keyword fn followed by the function signature FnSig and a block containing the function’s code BlockExpr. The sort GenericParams captures only lifetimes declaration, as generics are not available in MiniRust, the symbol ? is used to make the preceding sort optional. A function signature FnSig is a list of parameters Param followed by an arrow -> and a type, which is captured by the sort Ty. The SDF3 specification is used for multiple purposes, Spoofax generates from it not only a parser, but also a formatter and a generator, that are used by IDE services. Angle brackets in the FnDecl production enclose template syntax. In template syntax non-terminals have to be enclosed further between angle brackets, the rest are considered as terminal symbols. The formatter will use indentations, newlines and spaces for the resulting layout as defined in the production. One can use template syntax to specify to the formatter the desired layout for each production.

4.3.4 Expressions

Expressions are another syntactic category in MiniRust. In following listing we report the syntax specification for addition, subtraction, multiplication, division and remainder.

context-free syntax // arithmetic expressions Expr.Sum = <<Expr> + <Expr>> {left} Expr.Sub = <<Expr> - <Expr>> {left} Expr.Mul = <<Expr> * <Expr>> {left}

(32)

26 CHAPTER 4. SYNTAX AND DESUGARING FOR MINIRUST 3 - 3 - 4 (a) 3 -3 4 3 - (3 - 4) 4 -3 3 (3 - 3) - 4 (b)

Figure 4.3: An ambiguous expression (a) that can lead to the two displayed ASTs (b) Expr.Div = <<Expr> / <Expr>> {left}

Expr.Rem = <<Expr> % <Expr>> {left}

Grammars that can parse a string as multiple ASTs are called ambiguous. Fig. 4.3 shows an expression that parses as multiple ASTs. SDF3 handles ambiguities using disambiguation rules such as priority and associativity, exclusion/rejection, which can reject a specific production from being accepted, and follow restrictions rules, which accept a production only if the following characters (lookahead) are contained in a certain class. Associativity for each production can be specified by using attributes enclosed between curly brackets. For example the attribute left in the previous listing specifies that the preceding production is left-associative, which leads to the choice of the second AST for the example in Fig. 4.3.

The following listing shows how priorities were specified for arithmetic expressions. context-free priorities

{left: Expr.Mul Expr.Div Expr.Rem} >

{left: Expr.Sum Expr.Sub}

One can specify priorities between different productions in a context-free priorities section, by grouping productions between braces and using the > operator to indicate that productions in the left-hand side group have priority over the one in the right-hand side. Associativity between productions in the same group are also specified by the left: attribute inside the braces. In Fig. 4.4 we report an example of an expression with the AST derived according to the priority rules in the previous listing.

4.3.5 Literals

In order to complete our tour of SDF3 features we show an example of a lexical definition. Literals that are allowed in MiniRust are only booleans true and false, integers and strings. The following listing illustrates the lexical definition for integer literals.

(33)

4.4. DESUGARING 27

3 * 3 + 4 * 4

+

3 3

(3 * 3) + 4

Figure 4.4: An expression and its AST according to priorities defined.

Int = "-"? [0-9]+ lexical restrictions

Int -/- [0-9]

In the lexical syntax section square brackets enclose a character class, in this case all the numbers from 0 to 9, the + operator is the non-empty repetition. In the lexical restrictions section the -/- operator expresses a follow-restriction rule, this ensures longest match when matching a sequence of numbers. This concludes our little tour of SDF3, in the next section we will discuss desugaring for MiniRust.

4.4 Desugaring

p op= e def= p = p bop e operational assignment

mii∈0..n def

= <ljj∈0..m>mii∈0..n crates

& q e def= &l0 q e borrow expression

& q t def= &l0 q t borrow type

x<m>(eii∈0..n) def

= x<l0jj∈1..m>(eii∈0..n) function calls

Figure 4.5: Desugaring rules

In the syntax that we presented in Fig. 4.1 some constructs are convenient for the programmer using the language, but are not strictly necessary. We call syntactic sugar all the constructs for which there is an equivalent combination of simpler constructs. To simplify the next compilation steps we want to eliminate all the syntactic sugar, rewriting it to its equivalent form. This operation is called desugaring. In Fig. 4.5 we report our desugaring rules. An example of syntactic sugar present in our language is the operational assignment, which represent two operations: a binary operation between the path p and the expression e and the assignment of the result of this expression to the path p. The first rule in Fig. 4.5 represents this rewriting.

(34)

28 CHAPTER 4. SYNTAX AND DESUGARING FOR MINIRUST Rewriting rules are not only necessary for syntactic sugar, but also for elements that are implicit in the program. This is the case for lifetimes, each borrow expression & q e and each borrow type & q t has an implicit lifetime variable bound to it . The rules in Fig. 4.5 for borrow expressions and borrow types add a lifetime variable l0 _{to each of}

these, where the name of the added lifetime is fresh.

Another case where we need to create fresh lifetime variables is for function calls x<m>(eii∈0..n). The rule in Fig. 4.5 for function calls replaces the number m with a list

of m fresh lifetime variables l0

jj∈1..m. The number m must correspond to the number of

lifetime parameters declared in the called function. If this number is not correct this will entail an error during type checking.

These lifetimes behave like variables, as we will see in the next chapter to represent them as variables we need them to be declared somewhere. In our implementation we decided to extend the root of a program, the crate, with the list of lifetime variables bound to each borrow expression and borrow type. The rewriting rule on crates in Fig. 4.5 adds to the crate the list of lifetime names ljj∈0..m, where each lj is a name introduced by the

rules for borrow expression or borrow type.

4.4.1 Implementation of Desugaring using Stratego

Our desugaring phase is implemented in Stratego, which is the meta-DSL for term rewrit-ing used in Spoofax. In Stratego it is possible to define rules and strategies. A rule defines a rewriting on terms, a strategy instead defines how to apply a rule during a term traversal. In Stratego a term c(t1, ..., tn) is the application of a constructor c to a

list of sub-terms t1, ..., tn. The constructors generated by our specification in SDF3 are

Stratego’s terms. An example of a Stratego term is Id("x"), where Id is the constructor and "x" is the sub-term. A rule on terms has the form: R : p1 → p2, where R is the

name of the rule, p1 is the left-hand side pattern and p2 is the right-hand side pattern. A

pattern is a term with variables [2]. The following example shows the Stratego rewriting rule for operational assignments, in this case it rewrites a multiplication assignment to an assignment with a multiplication in its right-hand side.

desugar(|created-names):

AssignMul(lhs,rhs) → Assign(lhs, Mul(lhs,rhs))

In Stratego uppercase names are used for constructors and lowercase names are used for variables. The left-hand side pattern matches an AssignMul constructor and binds the variables lhs and rhs with the two sub-terms of the constructor. These variables can be used in the right-hand side pattern that builds a new term Assign with Mul as its second sub-term. The semantic of the rule is that if the left-hand side pattern is matched then it is rewritten as the right-hand side pattern, in this case we say that the rule succeeded, otherwise we say that the rule failed. Rules in Stratego are parametric with respect to rules and terms. A rule R with rules R1, ..., Rn and terms t1, ..., tn as

parameters takes the form R(R1, ..., Rn|t1, ..., tn) : p1 → p2. The rule desugar take the

(35)

4.4. DESUGARING 29 the next rules. In the following listing we show the Stratego rewriting rule for borrow expressions. desugar(|created-names): Borrow(q, e) → Borrow(lf, q, e) where lf := Lifetime(Id(<newname> "l")); <iset-add(|lf)> created-names

The rule rewrites a borrow expression with mutability q and an expression e to a borrow with a new term lf representing a fresh lifetime name. In this rule we used a where clause. In Statego a rewriting rule is applied only if the application of the rules in the where clause succeed. In this rule, in the first statement of the where clause we first build a new term Lifetime and we bind it to a variable lf. The notation <R> t stands for the application of the rule R to the term t. In the first statement we apply the rule newname to the term "l". This rule generates a name that is guaranteed to be fresh in the program, using the input term as prefix for the name. As we explained when we discussed the rewriting rules we need to remember the lifetime names that we introduce. This is the purpose of the term created-names, that represents the set of lifetime names that we have created. In the last statement we use the rule iset-add to add the new lifetime lf to this set.

In the following listing we show the Stratego rule for crates. add-all-life-vars-to-crate(|created-names):

Crate(mi) → Crate(GenericParams(lfj), mi) where

lfj := <iset-elements> created-names

The rule matches a Crate and rewrites it as a Crate where the first sub-term contains all the lifetime names lfj created during desugaring. In the last listing we show the Stratego rule desugar-all. The rule takes the program’s AST and combines the previous rules to generate the desugared AST.

desugar-all:

ast → desugared-ast where

created-names := <new-iset>();

ast' := <topdown(desugar(|created-names))> ast;

desugared-ast := <add-all-life-vars-to-crate(|created-names)> ast'; <iset-destroy> created-names

The rule first creates a new set that is bound to the variable created-names. It then uses the strategy topdown for the application of the rule desugar on the term ast. The strategy topdown(r) traverses the input term applying the rule r to the root constructor first and then to each of its subterms, and so on. Once the AST is desugared the rule adds all the created lifetimes to the crate and produces the result term desugared-ast. The last rule deletes the term created-names to cleanup the memory.

(36)

30 CHAPTER 4. SYNTAX AND DESUGARING FOR MINIRUST

4.5 Summary

In this chapter we described the syntax of our language and how we specified the syntax in SDF3. We showed the desugaring rules for MiniRust and their implementation in Stratego. In the next chapter we will discuss part of the static semantic of MiniRust, namely name binding and typing.

(37)

Chapter 5

Name Binding and Typing for

MiniRust

5.1 Introduction

In this chapter we will describe type checking for MiniRust programs. We first describe the semantic types of the language and how we represent the type environment. We then describe formally with well-typedness and well-formedness judgments of the type system of MiniRust. We conclude the chapter with an overview of our implementation with the NaBL2 meta-language, that uses the scope graph formalism, looking at the most interesting cases of NaBL2 specifications for MiniRust. We will exclude lifetimes for now from the discussion and we will look at them in the next chapter.

5.2 Types

In this section we describe the types that we will use in formedness and well-typedness judgments. The meta-variable τ ranges over the semantic values for types. The base types are: () for unit, bool for booleans, int for integers and string for strings. The type [τ] is the array type that contains elements of type τ. Struct’s types have the form {xi : τii∈1..n}, which denotes a struct with n fields where xi is the name of the ith

field and τi its type. The type of an enum is a list of n variants grouped between angle

brackets hxi : τii∈0..ni. Each variant has a name xi, used to refer that particular variant,

and a type τi that indicates the type of the data contained in that variant. The last type

is the polymorphic function type ∀`ij∈1..m.(τii∈1..n → τ ). This type denotes the type of

a function that is polymorphic on lifetimes. Given m lifetime parameters, it defines the type of a function that accepts n parameters, each one of type τi, and that returns a

value of type τ.

(38)

32 CHAPTER 5. NAME BINDING AND TYPING FOR MINIRUST

τ ::= types:

() unit type

bool boolean type

int integer type

string string type

[τ ] array type

{x_i : τii∈1..n} struct type

hxi : τii∈0..ni enum type

& ` µ τ borrow type

τii∈1..n→ τ function type

∀`jj∈1..m.(τii∈1..n→ τ ) polymorphic function type

The metavariable µ ranges over the semantic values for the mutability. The value imm is used for immutability and mut for mutability.

µ ::= mutabilities:

mut mutable

imm immutable

5.2.1 Type Environment

In the well-formedness and well-typedness judgments we use an environment Γ to keep track of the type associated with each name. Names can be used to refer to vari-ables, modules, types and lifetime variables. We define our environment Γ as a tuple of sub-environments (Γv,Γm,Γt,Γl,Γq), where each sub-environment represents a

dif-ferent namespace. The sub-environment Γv keeps track of the bindings between the

variables names and their types. The sub-environment Γm instead, keeps track of the

binding between a module’s name x and the environment Γ containing the names defined in the module. A struct and an enum declaration create a new type name. The sub-environment for type names is Γt, it contains the binding between a type name x and its

semantic value τ. Lifetime variables are recorded in the sub-environment Γl that is used

to check that, any time a lifetime l variable is used, this was first declared. Variables can be declared as mutable using the keyword mut in a let statement. The sub-environment Γq keeps track of the declared mutability µ for each variable x.

(39)

5.3. TYPING RELATION 33

Γ ::= contexts:

(Γv,Γm,Γt,Γl,Γq)

Γv ::= variable context

Γv, x: τ variable type binding

Γm ::= module context

Γm, x: Γ module environment binding

Γt ::= type context

Γt, x: τ type name type binding

Γl ::= lifetime context

Γl, x lifetime name binding

Γq ::= mutability context

Γq, x: µ variable mutability binding

An empty sub-environment is written ∅. To lookup the value associated with a name a in a sub-environment Γx we use the notation Γx(a). We write Γx, a 7→ b to denote an

environment that has the same sub-environments of Γ except for the sub-environment Γx which is extended with the binding a 7→ b.The union Γ ∪ Γ0 of two environments is

defined as the union of their sub-environments.

5.3 Typing Relation

In this section we describe the well-formedness and well-typedness judgments for MiniRust. The well-formedness judgment ` c defines when a crate c is well-formed and it is the start-ing point of our judgments. A crate <ljj∈0..m>mii∈1..ncontains a list of lifetime variables

ljj∈0..m and a list of module elements mii∈1..n. The lifetime variables are used in the

program and they were introduced during the desugaring phase. Each lifetime variable needs to be declared before its usage and this is the purpose of the environment Γ0_.

Module elements are then checked in sequence propagating the updated environment. ` c

Γ0 _{= (∅, ∅, ∅, {l}

jj∈0..m}, ∅) Γi−1` mi; Γi

`<ljj∈0..m>mii∈1..n (t-crate)

5.3.1 Module Elements

The well-formedness judgment Γ ` m; Γ0 _{checks that a module element m is well-formed}

with respect to types, using the environment Γ and producing the environment Γ0_.

Γ ` m; Γ0

A module is well-formed if all its module elements miare well-formed evaluating them in

sequence and if the resulting environment Γ0 _{contains a mapping from the module name}

x to the environment containing all the names defined by the module elements, that is the last resulting environment Γn_.

(40)

34 CHAPTER 5. NAME BINDING AND TYPING FOR MINIRUST Γi−1_{` m}

i ; Γi Γ0 = Γm, x 7→Γn

Γ0 _`_{mod x{m}

ii∈1..n}; Γ0 (t-mod)

The use statement is well-formed if the resulting environment Γ0 _{contains all the names}

defined in the initial environment Γ with the addition of all the names defined in the environment bound to the name of the module x.

Γ0= Γ ∪ Γm(x)

Γ `use x; ; Γ0 (t-use)

A struct declaration is well-formed if each syntactic type ti is well-typed and has type τi

and if the resulting environment Γ0 _{contains a mapping in the sub-environment for types}

from the struct’s name x to the struct’s type {xi : τii∈1..n}.

Γ ` ti : τi Γ0 = Γt, x 7→ {xi: τii∈1..n}

Γ `struct x{xi: tii∈1..n}; Γ0 (t-struct)

An enum declaration contains a list of variants vii∈0..n. Variants that are available in

the language are variants with just a name like None or struct variants like Some{x:i32}. A variant for an enum type can be used using the same notation used to refer to mod-ule elements. For example to refer to the variant None of the enum Option we write Option::None. Since the enum uses the same notation of the module we will represent it as a type and also as a module. Like in Rust, in MiniRust is not possible to have a enum and a module with the same name. Each variant is represented with its name and its type. An enum declaration is well-formed if each variant is well-typed with a type hx_i : τiiwhere xi is the name of the variant and τiis the type of the data attached to that

variant. The resulting environment must contain a mapping from the name of the enum xto its type hxi : τii∈0..ni. To allow the usage of a variant with the syntax used to refer

to a module element, we represent the enum also as a module containing a list of type declarations. The declared types have as name the name of a variant xi and as a type

the enum’s type. A new environment Γe _{contains a mapping from each variant’s name}

xi to the enum’s type. This environment is then used to extend the module environment

in which we add a mapping from the enum name x to the environment Γe_.

Γ ` vi: hxi : τii Γ0 = Γt, x 7→ hxi : τ_ii∈0..ni

Γe_{= (∅, ∅, {x}

i 7→ hxj : τjj∈0..ni i∈0..n

}, ∅, ∅) Γ00= Γ0m, x 7→Γe

Γ `enum x{vii∈0..n}; Γ00 (t-enum)

A function declaration declares a list of lifetime parameters that can be used in the types of the arguments and in the return type. These types needs to be well-typed assuming that the lifetime variables ljj∈0..m are already declared. The resulting environment Γ00

adds to the sub-environment for variables the mapping from the function name x to the function type ∀ljj∈0..m.(τii∈1..n → τ ). The function’s body needs to be well-typed using

an environment that contains the mapping of the function name and a mapping for each parameter name xi to its type τi. The type of the body τb needs to be subtype of the

(41)

5.3. TYPING RELATION 35 Γ0 = Γl, ljj∈0..m Γ0 ` ti: τi Γ0 ` t : τ

Γ00_{= Γ}

v, x 7→ ∀ljj∈0..m.(τii∈1..n→ τ )

Γ000 = Γv00, xi7→ τii∈1..n Γ000 ` {skk∈0..pe}: τb τb <: τ

Γ `fn x<ljj∈0..m>(xi : tii∈1..n) → t {skk∈0..pe}; Γ00 (t-fun)

5.3.2 Statements

The well-formedness judgment for statements is Γ ` s; Γ. Γ ` s; Γ

A let binding is well-formed if the mutability q is well-typed with semantic value µ, if the syntactic type t is well-typed with type τtand if the assigned expression e is well-typed

with type τe. The type of the expression τe needs to be subtype of the variable type τt.

The resulting environment must contain the binding between the variable name x and its type τtand the binding between the variable name x and its declared mutabillity µ.

q: µ Γ ` t : τt Γ ` e : τe τe<: τt

Γ0 _{= Γ}

v, x 7→ τt Γ00 = Γ0q, x 7→ µ

Γ `let q x : t = e; ; Γ00 (t-let)

A loop statement is well-formed if the body is well-typed and has type unit (). Γ ` {sii∈1..ne}: ()

Γ `loop {sii∈1..ne}; Γ (t-loop)

5.3.3 Variants

Well-typedness for enum variants is defined by the judgment Γ ` v : τ. Γ ` v : τ

To a named variant we assign an enum type with a single variant, this variant has name x and type unit () since this variant does not contain any data.

Γ ` x : hx : ()i (t-var1)

A struct variant is well-typed if each type term ti is well-typed and its type is an enum

with a single variant with name x and with type the struct’s type {xi : τii∈1..n}.

Γ ` ti : τi

(42)

36 CHAPTER 5. NAME BINDING AND TYPING FOR MINIRUST

5.3.4 Expressions

The judgment Γ ` e : τ defines when an expression e is well-typed. Γ ` e : τ

A binary operation is well-typed if the sub-expressions e1 and e2 are well-typed and

the resulting type is correct with respect to the used operation. The operator opb

repre-sents the boolean operations (&&, ||). The operator opnranges over arithmetic operators

(+, -, *, /). The operator ope represents equality operators (==, !=). The operator opr

represents comparison operators (>, <, >=, <=). Γ ` e1: bool Γ ` e2: bool Γ ` e1 opb e2: bool (t-op-bool) Γ ` e1 : int Γ ` e2 : int Γ ` e1 opn e2 : int (t-op-num) Γ ` e1: τ Γ ` e2: τ Γ ` e1 ope e2: bool (t-op-eq) Γ ` e1 : int Γ ` e2 : int Γ ` e1 opr e2: bool (t-op-rel) In MiniRust a block is an expression. A block is well-typed if all the statements in the block are formed and using the resulting environment the last expression is well-typed. The type of the block is the type of the last expression. A block limits the visibility of the names defined in it by not propagating the resulting environments.

Γi−1_{` s}

i ; Γi Γn` e : τ

Γ0 _{` {s}

ii∈1..ne}: τ (t-block)

An assignment is well-typed if the variable name x is already defined and has type τl, if

the assigned expression is well-typed with type τr and if the type of the expression is a

subtype of the variable’s type.

Γ ` x : τl Γ ` e : τr τr <: τl

Γ ` x = e : τr (t-assign)

An array expression is well-typed if each expression is well-typed and all the expressions have the same type τ. In this case the type of the resulting array is [τ].

Γ ` ei : τ

Γ `[eii∈0..n] : [τ] (t-arr-creat)

A struct expression is well-typed if the struct is already declared and has a type that contains all the fields xiused in the initialization. The type of the expression τi0 assigned

to each field must be a subtype of the field’s type τi.

Γ ` x : {xi : τii∈1..n} Γ ` ei : τi0 τi0 <: τi

Γ ` x {xi : eii∈1..n} : {xi: τii∈1..n} (t-struct)

A borrow expression requires that the lifetime variable l is already declared, the syntactic mutability q must be well-typed with the semantic mutability µ and that the path p is well-typed with type τ. The type of the borrow is & l µ τ.

(43)

5.3. TYPING RELATION 37 Γ ` l q : µ Γ ` p : τ

Γ `& l q p : & ` µ τ (t-borrow)

A polymorphic function call is well-typed if the function name is well-typed and its type is a polymorphic function. The lifetime variables lj were introduced during the

desugaring phase and are used to instantiate the generic function type. To do this we use a substitution σ that substitutes each lifetime parameter ¯lj, appearing in a type, with

the corresponding lifetime variable lj. The substitution is used to obtain an instance of

the formal types σ(¯τi) and of the return type σ(¯τ). As usual the type of each actual

parameter must be a subtype of the instantiated corresponding formal parameter. Γ ` x : ∀¯ljj∈0..m.( ¯τii∈0..n→ ¯τ)

Γ ` lj Γ ` ei: τi σ= [¯lj 7→ lj] τi <: σ( ¯τi)

Γ ` x<ljj∈0..m>(eii∈0..n) : σ(¯τ) (t-call)

An if expression is well-typed if the type of the condition e is the boolean type and if the then-branch and the else-branch have the same type τ. The type of the result is the type of the two branches τ.

Γ ` e : bool Γ ` {sii∈0..ne}: τ Γ ` {sjj∈0..me}: τ

Γ `if e {sii∈0..ne} else {sjj∈0..me}: τ (t-if)

A match expression is well-typed if the type of the matched expression e is an enum, if the patterns pati are well-typed having as a type one of the variants in the matched

enum type and if the expressions ei of each branch have the same type τ. The type

of the match expression is the type of the expressions τ. A pattern pati can introduce

new variables, these variables can be used by the corresponding expression ei. This is

modeled using the environment generated by the pattern for the well-typedness of the corresponding expression.

Γ ` e : hxi : τii∈0..ni Γ ` pati: hxi : τii; Γi Γi ` ei : τ

Γ `match e {pati ⇒ eii∈1..n} : τ (t-match)

Literals represent the values for the base types and they are well-typed using the following judgments.

Γ ` lits: string (t-string) Γ ` litn: int (t-int) Γ ` true : bool (t-true) Γ ` false : bool (t-false)

5.3.5 Patterns

Patterns are used in match expressions to match the variants defined for an enumeration type. A pattern can introduce new names that refer to the data contained in a variant. The judgment Γ ` pat : τ; Γ0 _{defines well-typedness for a pattern pat and the resulting}

(44)

38 CHAPTER 5. NAME BINDING AND TYPING FOR MINIRUST Γ ` pat : τ ; Γ0

A pattern for a label variant is well-typed if the variant’s name is well-typed and has as a type an enum type hxi : τii∈0..ni. The resulting type is an enum type with a

single variant with name x and type () since a label variant does not contain any data. A label variant does not bind new variables, thus the resulting environment is the initial environment Γ.

Γ ` x : hxi: τii∈0..ni

Γ ` x : hx : ()i; Γ (t-pat1)

A pattern for struct variants is well-typed if the name x is well-typed and has an enum type h¯xj : ¯τjj∈0..mi. This type must contain among its variant a variant with name x and

with a struct type. The variables yi are bound to the value of each struct’s field. The

resulting environment Γ0 _{extends the initial environment binding each variable y}

i to the

corresponding field’s type τi.

Γ ` x : h¯xj : ¯τjj∈0..mi

∃p. (¯xp = x ∧ ¯τp= {xi : τii∈1..n}) Γ0 = Γv, yi 7→ τii∈1..n

Γ ` x {xi : yii∈1..n} : hx : {xi : τii∈1..n}i; Γ0 (t-pat2)

5.3.6 Paths

The well-typedness of a path p is defined by the judgment Γ ` p : τ. Γ ` p : τ

A variable name x must be already declared and have type τ. Γv(x) = τ

Γ ` x : τ (t-var)

A path to a variable x1 :: ... :: xn is well-typed if the names x1, ..., xn−1 are modules

with an associated environment Γi _{and if the variable name x}

n is well-typed using the

environment Γn−1_{. The type of the path is the type of the last variable τ.}

Γi _{= Γ}

mi−1(xi) Γn−1` xn: τ

Γ0 _{` x}

1::...::xn: τ (t-path)

A field access is well-typed if the expression e has struct type that contains a field with the name xj. The type of the field access is the field’s type τj.

Γ ` e : {xi: τii∈1..n}

Γ ` e.xj : τj (t-field)

An access to a position of an array is well-typed if the expression e is well-typed with type array and if the expression for the index ei is well-typed with type int. The type of

A Formalization of the Static Semantics of Rust

Universit`

a degli Studi di Pisa

A Formalization of the Static Semantics of Rust

Contents

Chapter 1

Introduction

1.1

Context of the Study

1.2

Problem Statement

1.3

Aim and Scope

1.4

Significance of the Study

1.5

Overview of the Study

Chapter 2

Rust Overview

2.1

Introduction

2.2

Rust

2.3

Basic Syntax

2.4

Ownership

2.5

Borrowing

2.6

Lifetimes

2.7

Other Aspects of Rust

2.8

Summary

Chapter 3

Spoofax

3.1

Language Workbenches

3.2

Spoofax

3.3

MiniRust Project Structure

3.4

MiniRust Resulting Editor

3.5

Summary

Chapter 4

Syntax and Desugaring for

MiniRust

4.1

Introduction

4.2

MiniRust Syntax

4.3

Implementation of the Syntax with SDF3

4.4

Desugaring

4.5

Summary

Chapter 5

Name Binding and Typing for

MiniRust

5.1

Introduction

5.2

Types

5.3

Typing Relation