

UNIVERSITÀ DEGLI STUDI DI PISA
SCUOLA SUPERIORE SANT'ANNA

DEPARTMENT OF COMPUTER SCIENCE
DEPARTMENT OF INFORMATION ENGINEERING AND SCUOLA SUPERIORE SANT'ANNA

CORSO DI LAUREA MAGISTRALE in INFORMATICA E NETWORKING

Master's Thesis

Database Communication Technologies for Multi-Domain Traffic Engineering

Candidate: Maziar Sedghisaray (524923)
Supervisors: Dr. Filippo Cugini, Dr. Alessio Giorgetti


Acknowledgements

I would like to thank the many people who have directly or indirectly guided me in the successful accomplishment of this thesis work. In addition to my supervisors, many other people at the Istituto di Tecnologie della Comunicazione, dell'Informazione e della Percezione (TeCIP) contributed to this project in their own ways.

My deepest gratitude is directed to Professor Castoldi for selecting me for this work and being my mentor throughout the project.

I would like to thank my supervisors, Dr. Filippo Cugini and Dr. Alessio Giorgetti, for their great support and help during the development of the thesis. They gave me many interesting ideas to work with. Thank you very much!

I express my gratitude to the professors and staff of the Department of Computer Science (Informatica) at the University of Pisa, especially Professor Marco Danelutto, for their great support and help during this master's degree.

I would also like to thank the Italian Embassy in Iran and the DSU organization for their great help, without which it would have been impossible for me to reach this point in my life.

Finally, I want to give my greatest thanks to my parents, relatives and friends for their unconditional support and encouragement throughout the successful completion of this thesis work.


Abstract

In multi-domain networks, the exchange of Traffic Engineering (TE) information is required to enable effective end-to-end service provisioning.

So far, several solutions have been proposed in which Path Computation Elements (PCEs) communicate using specifically designed protocols; however, these solutions present significant scalability issues.

This work proposes direct communication among the database systems located at the PCEs to exchange TE information. By exploiting currently available database technologies, scalable and predictable performance is demonstrated.


Contents

ACKNOWLEDGEMENTS
ABSTRACT
CONTENTS

INTRODUCTION AND MOTIVATIONS
OBJECTIVES
OUTLINE OF THESIS REPORT

Part I: MPLS and PCE Background

1. Introduction to MPLS and Traffic Engineering
1.1 MPLS overview
1.2 MPLS Architecture
1.3 Protocols in MPLS Networks
1.4 Explicit Routing
1.5 Traffic Engineering (TE)
1.6 TE Extensions to OSPF Version 2

2. Overview of Path Computation Element
2.1 Definition
2.2 An introduction to PCE
2.3 Motivations to use a PCE based architecture
2.4 PCE Architecture
2.5 Traffic Engineering topology visibility

Part II: The Problem and Proposed Solution

3. The Problem
3.1 PCE in Multi-Domain Networks
3.1.1 Per-domain path computation
3.1.2 Simple cooperating PCEs
3.1.3 Backward computation
3.1.3.1 Standard backward path computation
3.1.3.2 Backward recursive path computation (BRPC)
3.1.4 H-PCE and other existing technologies

4. The Proposed Solution and its requirements
4.1 The idea of exploiting database technologies for TE exchange and its requirements
4.2 MariaDB Galera Cluster, technologies which meet the proposed solution requirements
4.2.1 Why MariaDB?
4.2.2 Why Galera?
4.2.3 MariaDB Galera Cluster - Known Limitations

Part III: Key Concepts of MariaDB and Galera

5. Understanding the Essentials of MariaDB
5.1 The MariaDB architecture
5.2 Storage engines
5.2.1 InnoDB Storage Engine
5.2.2 OQGRAPH Storage Engine
5.2.3 CONNECT Storage Engine
5.2.4 Performance_Schema Storage Engine
5.3 InnoDB data structures
5.4 MariaDB caches
5.4.1 InnoDB caches
5.4.2 InnoDB pages
5.4.3 The InnoDB buffer pool
5.4.3.2 Dirty pages
5.4.3.3 Old and new pages
5.4.3.4 The read ahead optimization
5.4.3.5 Diagnosing the buffer pool performance
5.4.3.6 Dumping and loading the buffer pool
5.4.3.7 The InnoDB change buffer
5.4.3.8 Explaining the doublewrite buffer
5.4.4 The query cache
5.4.4.1 Configuring the query cache
5.4.4.2 Information on the status of the query cache
5.5 The information_schema database
5.6 The performance_schema database
5.7 MariaDB resources

6. Database Replication Technologies
6.1 Database Replication
6.1.1 Asynchronous and Synchronous Replication
6.1.2 Advantages of Synchronous Replication
6.1.3 Disadvantages of Synchronous Replication
6.1.4 Solving the Issues in Synchronous Replication
6.2 Galera Cluster
6.2.1 How Galera Cluster Works
6.2.2 Certification-Based Replication
6.2.2.1 What Certification-based Replication requires
6.2.2.2 How Certification-based Replication Works
6.2.3 Replication API
6.2.3.1 wsrep API
6.2.3.2 Global Transaction ID
6.2.3.3 Galera Replication Plugin
6.2.3.4 Group Communication Plugins
6.2.4 Write-set Cache (GCache)

Part IV: Implementation Detail and Test case

7. Installation and Configuration
7.1 Requirements
7.2 Installation
7.2.1 Preparing the Server
7.2.2 Enabling the MariaDB Repository (yum Repository)
7.2.3 Installation of dependencies
7.2.4 Installing MariaDB Galera Cluster
7.2.5 Post-installation Configuration
7.3 System Configuration
7.3.1 Description of Configuration
7.4 Cluster Initialization
7.4.1 Adding Additional Nodes to the Cluster
7.4.2 Testing the Cluster
7.4.3 Restarting the Cluster

8. Scenario Development
8.1 Reference Scenario
8.1.1 Architecture of Database based PCE
8.1.2 Functional Description of Database based PCE
8.1.3 Reference Topology
8.2 Tables description in Database of PCE
8.2.1 Traffic Engineering Table
8.2.2 LSP_Candidate Table
8.3 K-Shortest Path Computation
8.3.1 OQGRAPH Engine
8.3.1.1 OQGRAPH Engine Limitations
8.3.2 Yen Algorithm
8.4 Internal Table synchronization
8.4.1 What is Trigger?
8.4.2 Structure of Trigger
8.4.3 The use case of Trigger in PCE

9. Test Case
9.1 Test Scenario 1: on TeCIP lab servers
9.1.1 Insert Performance Evaluation
9.1.2 Update Performance Evaluation
9.1.2.1 Duplicate Update Performance Evaluation
9.2 Test Scenario 2: on Cloud
9.2.1 Insert Performance Evaluation (Geo-Distributed)
9.2.2 Update Performance Evaluation (Geo-Distributed)

Conclusion
References
List of Abbreviations


Introduction and motivations

Traffic exchange across the Internet is practically enabled by the availability of routing information databases at network routers/gateways, Border Gateway Protocol (BGP) speakers and route reflectors (RRs). Such databases are populated and kept consistent by network protocols such as BGP and Interior Gateway Protocols (IGPs).

However, the way these traditional telecommunication protocols are defined, implemented and enhanced is not free from limitations.

First, they were designed and implemented more than 20 years ago, when processing capabilities at routers and bandwidth capacity on Internet links were extremely poor. Thus, they were designed to minimize the router effort and bandwidth usage at the unavoidable expense of performance efficiency. For example, in inter-domain routing through BGP, just shortest path routing is typically adopted, with rare use of efficient load balancing and no use at all of traffic engineering (TE). Such techniques have been utilized only within IGP areas, i.e., in networks of very limited size. Also in this case, TE has been implemented with limited efficiency. For example, in OSPF-TE, database synchronization follows a complex procedure (specifically designed OSPF packets for LSA header exchange to verify whether the actual Link State Advertisement (LSA) content is really needed). In addition, Opaque LSAs carrying TE updates may be significantly delayed (even 5 seconds).

Still today, one of the main concerns for any network architect is the scalability of the network in terms of the number of routes/addresses. However, as shown in Fig. 1, the order of magnitude is of only millions of entries, leading to databases of just GBytes. These are really negligible values for current systems and technologies. Additional limitations of traditional telecommunication protocols refer to their complex structure, objects, procedures and optional/obsolete operations, which reflect the attempt to track more than 20 years of technological evolution within protocols fundamentally not designed for performance efficiency. This has led to many issues. For example, (i) the whole standardization process is typically very slow; (ii) interoperability is always problematic (possible presence of non-standard extensions, unsupported fields, different protocol versions and optional functionalities not adequately implemented); (iii) potential scalability issues typically prevent/delay an efficient exchange of resource information (a node is typically unaware of the actual load of the remote element and, to avoid potential problems, just limits its pace of updates and applies abstraction of resources).

Finally, proprietary internal procedures are typically deployed at each network element to access/update its local routing information database, making the database unavailable for direct access by network applications or potentially innovative network functions. Today the Internet works at a limited percentage of its potential, leading to extra costs and power consumption for overprovisioning, vendor lock-in scenarios due to interoperability issues, and a lack of TE solutions. This impacts the provisioning of services crossing different domains, like bandwidth-hungry cloud-based services or the expected delay-sensitive 5G services. So far, many attempts have been made to overcome such limitations, with limited to null success. The main reason is that they have all focused on the same approach, proposing just evolutionary versions of traditional telecommunication protocol solutions. To overcome the above limitations, this work proposes to exploit the currently available database (DB) technologies to perform the exchange of routing database information (reachability and TE info). In particular, we propose to rely on the DB communication facilities and synchronization protocols as an attractive, already available alternative with respect to currently adopted telecommunication protocols. In recent years, database technologies have evolved, providing impressive performance. The use of such DB communication facilities has the potential to take advantage of the currently available technologies to significantly improve scalability performance, enable the introduction of effective TE strategies, speed up the support of newly introduced parameters and procedures, and enable controller operations on database parameters.


Objectives

The objective of this thesis work is to investigate the potential use of currently available database (DB) technologies to perform the exchange of Traffic Engineering information in Multi-Domain networks.

To achieve these goals, a cluster of database nodes including simulated TE information and path computation functionality is built and evaluated.

The technical goals are:

 Increasing the efficiency and performance of traffic engineering exchange in multi-domain networks by means of the integration of different technologies (currently available database and replication technologies) and their respective protocols.

 Speeding up the establishment of Label Switched Paths by having an already computed table of K-Shortest Paths from each source to any destination in the network topology.

 Investigating the performance of the proposed solution in such an environment.

To reach these technical goals, some intermediate goals have to be achieved:

 Verify the requirements and investigate the features and limitations of the currently available database and replication technologies.

 Understand the key concepts of database and replication technologies and implement them with a proper configuration.

 Design and simulate the TE information of a multi-domain network topology.

 Implement an algorithm that can perform K-Shortest Path (KSP) computation taking e.g. bandwidth constraints into consideration (a query sketch is given after this list).
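As an informal sketch of how such path computation can be expressed directly in SQL, the query below uses MariaDB's OQGRAPH storage engine (introduced later in section 8.3.1) to retrieve a shortest path over a table of simulated TE links. The table and column names (te_edges, te_graph, origid, destid, weight) are illustrative assumptions, not the exact schema used in the implementation; the K-Shortest Path extension based on Yen's algorithm and the bandwidth constraints are discussed in Chapter 8.

    -- Backing InnoDB table holding the simulated TE links (illustrative names).
    CREATE TABLE te_edges (
      origid BIGINT UNSIGNED NOT NULL,   -- source node id
      destid BIGINT UNSIGNED NOT NULL,   -- destination node id
      weight DOUBLE NOT NULL,            -- TE metric of the link
      PRIMARY KEY (origid, destid)
    ) ENGINE=InnoDB;

    -- OQGRAPH (v3) graph "view" over the edge table.
    CREATE TABLE te_graph (
      latch  VARCHAR(32) NULL,
      origid BIGINT UNSIGNED NULL,
      destid BIGINT UNSIGNED NULL,
      weight DOUBLE NULL,
      seq    BIGINT UNSIGNED NULL,
      linkid BIGINT UNSIGNED NULL,
      KEY (latch, origid, destid) USING HASH,
      KEY (latch, destid, origid) USING HASH
    ) ENGINE=OQGRAPH
      data_table='te_edges' origid='origid' destid='destid' weight='weight';

    -- Shortest path from node 1 to node 6 using Dijkstra's algorithm.
    SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
    FROM te_graph
    WHERE latch = 'dijkstras' AND origid = 1 AND destid = 6;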


Outline of thesis report

Part I addresses and introduces terminology and concepts as they are used in this work. Specifically, Chapter 1 gives an overview of MPLS Networks and Traffic Engineering. Chapter 2 introduces the Path Computation Element (PCE) architecture and discusses its key components for MPLS networks.

Part II discusses the main problem and the proposed solution. Chapter 3 provides an updated overview of the existing solutions for the exchange of TE information in multi-domain networks, along with their limitations and issues. Chapter 4 discusses the proposed solution and introduces MariaDB Galera Cluster. The benefits and limitations of the selected database and replication technologies are also given in that chapter.

Part III provides details of the key concepts of MariaDB and Galera which are relevant for this work. Chapter 5 discusses the essentials of MariaDB with a focus on MariaDB caches. Chapter 6 gives an introduction to database replication concepts and terminology, followed by details of the Galera technology.

Part IV discusses the implementation details of the reference scenario and the test cases. Chapter 7 covers the installation and configuration of MariaDB Galera Cluster, which is the core of this thesis work. Chapter 8 discusses the implementation of the reference scenario. Chapter 9 presents the results of the test scenarios developed in the Istituto di Tecnologie della Comunicazione, dell'Informazione e della Percezione (TeCIP) lab. Chapter 10 concludes this thesis work.


Part I

MPLS Network and PCE Background


1. Introduction to Multi-protocol Label Switching (MPLS) and Traffic Engineering (TE)

The primary goal of this chapter is to provide an understanding of what MPLS is and of the importance of Traffic Engineering information. Important definitions such as LSP, Traffic Engineering, RSVP-TE, and OSPF-TE [7] are given.


1.1 MPLS overview

Multiprotocol Label Switching (MPLS) is a packet-carrying mechanism used in high-performance telecommunications networks which directs and carries data using pre-established paths. MPLS makes it easy to create "virtual links" between distant nodes, and it can encapsulate packets of various network protocols. MPLS is a highly scalable, protocol-agnostic, data-carrying mechanism. In an MPLS network, data packets are assigned labels, and packet-forwarding decisions are made solely on the contents of this label, without the need to examine the packet itself. MPLS is nowadays a classic solution and a standard for carrying information in networks, and it has been a great solution for sending information using packet routing. MPLS enables high-performance networks, offering speed, quality of service and resource reservation protocols.

1.2 MPLS Architecture

MPLS is an extension of the IP network architecture and it offers a number of applications for the network, e.g. Traffic Engineering (TE) [section 1.6] and Fast Reroute (FRR). The routers in an MPLS network are called Label Switch Routers (LSRs) and they can be core routers or edge routers. Also, the edge routers can be ingress routers or egress routers. See the figure below.


Figure 1.1 shows the path traversed by a labeled packet in an MPLS network. This path is known as the Label Switched Path, or LSP, and it consists of a sequence of LSRs. The labeled packets are switched through this LSP in an MPLS network. MPLS has two main components in its architecture: the control plane and the data plane. See Figure 1.2.

Figure 1.2: Control plane and data plane [2]

Control Plane: The control plane exchanges labels between adjacent nodes and controls the routing information exchange. The Label Information Base (LIB) in an LSR holds all the local labels assigned by that LSR, and a mapping between those labels and the labels received from neighboring LSRs. LSPs are established by the control plane and the labeled packets use them. The control plane exchanges routing information by means of routing protocols such as Open Shortest Path First (OSPF).


Data Plane: The data plane is a forwarding engine that does not depend on the routing protocols or on the label exchange protocols. The data plane uses the Label Forwarding Information Base (LFIB) table to store the label information and forward the packets. The LFIB holds only those local labels that are currently being used for forwarding. It also has a mapping between these labels and the outgoing labels received from neighbor LSRs; additionally, it holds the egress interface to be used in forwarding and the IP next-hop address.

1.3 Protocols in MPLS Networks

As in a traditional IP network, IP routing protocols are used to forward the packets, but in an MPLS network the LSRs use label switching as the forwarding mechanism. Label Switched Paths (LSPs) are set up by label distribution throughout the network, and the most common label distribution protocols are the Label Distribution Protocol (LDP) and the Resource Reservation Protocol (RSVP). RSVP [RFC 2205] is a signaling protocol used in a network to achieve the creation and maintenance of distributed reservation state across the network. RSVP uses the Path message to establish a path from an ingress point to an egress point, and a Resv message to reserve the resources along the path. The original RSVP protocol has been extended to RSVP-TE to support the Traffic Engineering [section 1.6] capabilities.

1.4 Explicit Routing

When an LSP is going to be set up between an ingress and an egress node, and one wants the LSP to meet certain requirements, it is sometimes necessary to specify further details for that LSP. These requirements could be expressed in terms of e.g. bandwidth or switching capability. The ability to set up an LSP according to certain requirements (or constraints) is called constraint-based routing. A similar technique is explicit routing, where it is possible to specify the explicit path hop by hop. Further, the path can be specified by strict or loose hops. In strict routing, the path for a specific LSP is fixed: the LSP must follow exactly the specified sequence of hops between the two nodes. Loose routing specifies that an LSP should at least follow the given nodes, which means that, in addition to the specified nodes that must be followed, other nodes may also be included in the LSP.

1.5 Traffic Engineering (TE)

The ultimate goal of traffic engineering is to optimize the utilization of network resources and to minimize traffic congestion. It is a pragmatic way of handling traffic problems. One of the design goals for MPLS was to create a tool to achieve this. A description of traffic engineering can therefore be as follows:

“Traffic Engineering is all about discovering what paths and links are available in the network, what the current traffic usage is within the network and then directing traffic to routes other than the shortest so that optimal use is made of the resources within the network. This is achieved by a combination of extensions to the existing IGP routing protocols, traffic monitoring tools and traffic routing techniques” [Adrian Farrel].

1.6 TE Extensions to OSPF Version 2

For this thesis work, the focus is on the OSPF-TE protocol [7]. OSPF with TE extensions (OSPF-TE) is the most common routing protocol used in IP/MPLS networks.

Specifically, OSPF is a link state routing protocol which uses Link State Advertisements to gather the network topology and specific resource information about the links in the network. The OSPF routing protocol has been extended to also be aware of e.g. the capabilities of the transport network. Such information, e.g. bandwidth, is missing in the original OSPF routing protocol. Bandwidth parameters, such as maximum and utilized bandwidth, have been introduced in the OSPF traffic engineering extensions (OSPF-TE). That is why the concept of opaque Link State Advertisements (LSAs) has been introduced [RFC 3630]. To support further TE capabilities, OSPF-TE has also been extended with additional information.

In traffic engineering OSPF, the available bandwidth for a specific link is advertised to help understand where resources are available to place an LSP. For instance, a Gigabit Ethernet traffic engineering link may have the ability to carry 1 Gbit/s of payload, but on that link some bandwidth could be reserved for other LSPs, possibly with higher priority. Therefore, more information related to bandwidth has been added in the OSPF-TE extensions [RFC 4203].

The specific OSPF-TE fields that I implemented in this work [section 8.1.3] are the following (a table sketch is given after this list):

• Router Address: The Router Address specifies a stable IP address of the advertising router that is always reachable if there is any connectivity to it; this is typically implemented as a "loopback address". The key attribute is that the address does not become unusable if an interface is down. In other protocols, this is known as the "router ID".

• Link type: The Link Type defines the type of the link: 1 - Point-to-point

2 - Multi-access

• Link ID: The Link ID identifies the other end of the link. For point-to-point links, this is the Router ID of the neighbor. For multi-access links, this is the interface address of the designated router. The Link ID is identical to the contents of the Link ID field in the Router LSA for these link types.

• Local Interface IP Address: The Local Interface IP Address specifies the IP address (es) of the interface corresponding to this link.

• Remote Interface IP Address: The Remote Interface IP Address specifies the IP address (es) of the neighbor's interface corresponding to this link. This and the local address are used to discern multiple parallel links between systems.

• Traffic Engineering Metric: The Traffic Engineering Metric specifies the link metric for traffic engineering purposes.


• Maximum Bandwidth: The Maximum Bandwidth specifies the maximum bandwidth that can be used on this link, in this direction (from the system originating the LSA to its neighbor), in IEEE floating point format. This is the true link capacity. The units are bytes per second.

• Maximum Reservable Bandwidth: The Maximum Reservable Bandwidth specifies the maximum bandwidth that may be reserved on this link, in this direction, in IEEE floating point format. Note that this may be greater than the maximum bandwidth (in which case the link may be oversubscribed). This SHOULD be user-configurable; the default value should be the Maximum Bandwidth. The units are bytes per second.

• Unreserved Bandwidth: The Unreserved Bandwidth specifies the amount of bandwidth not yet reserved at each of the eight priority levels in IEEE floating point format. The values correspond to the bandwidth that can be reserved with a setup priority of 0 through 7. The initial values (before any bandwidth is reserved) are all set to the Maximum Reservable Bandwidth. Each value will be less than or equal to the Maximum Reservable Bandwidth. The units are bytes per second.

• Administrative Group: Administrative Group, it contains a 4-octet bit mask assigned by the network administrator. Each set bit corresponds to one administrative group assigned to the interface. A link may belong to multiple groups.
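As an illustration only, the following sketch shows how the OSPF-TE link attributes listed above can be mapped onto the columns of a relational table, in the spirit of the Traffic_Eng table described later in section 8.2.1. The table and column names are assumptions made for this example, not the exact schema of the implementation.

    -- Hypothetical relational mapping of the OSPF-TE link attributes listed above.
    CREATE TABLE te_link (
      router_address     VARCHAR(15)  NOT NULL,  -- stable (loopback) address of the advertising router
      link_type          TINYINT      NOT NULL,  -- 1 = point-to-point, 2 = multi-access
      link_id            VARCHAR(15)  NOT NULL,  -- identifies the other end of the link
      local_if_address   VARCHAR(15)  NOT NULL,  -- Local Interface IP Address
      remote_if_address  VARCHAR(15)  NOT NULL,  -- Remote Interface IP Address
      te_metric          INT UNSIGNED NOT NULL,  -- Traffic Engineering Metric
      max_bandwidth      FLOAT        NOT NULL,  -- Maximum Bandwidth (bytes per second)
      max_resv_bandwidth FLOAT        NOT NULL,  -- Maximum Reservable Bandwidth (bytes per second)
      unresv_bw_p0       FLOAT        NOT NULL,  -- Unreserved Bandwidth, setup priority 0
      unresv_bw_p1       FLOAT        NOT NULL,
      unresv_bw_p2       FLOAT        NOT NULL,
      unresv_bw_p3       FLOAT        NOT NULL,
      unresv_bw_p4       FLOAT        NOT NULL,
      unresv_bw_p5       FLOAT        NOT NULL,
      unresv_bw_p6       FLOAT        NOT NULL,
      unresv_bw_p7       FLOAT        NOT NULL,  -- Unreserved Bandwidth, setup priority 7
      admin_group        INT UNSIGNED NOT NULL,  -- Administrative Group (4-octet bit mask)
      PRIMARY KEY (router_address, link_id, local_if_address)
    ) ENGINE=InnoDB;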


The path computation process is crucial to achieve the desired TE objective. Its actual effectiveness depends on a number of factors. The mechanisms utilized to update topology and TE information, as well as the latency between path computation and resource reservation, which is typically distributed, may affect path computation efficiency. Moreover, TE visibility is limited in many network scenarios, and this may negatively impact resource utilization. The Internet Engineering Task Force (IETF) has promoted the Path Computation Element (PCE) architecture, proposing a dedicated network entity devoted to the path computation process. The next chapter discusses this architecture and its limitations.


2. Overview of Path Computation Element (PCE)

In the previous chapter, we gave an introduction to MPLS networks and the techniques used to engineer the traffic. This chapter presents an overview of the Path Computation Element (PCE): a new routing paradigm for computing MPLS Traffic Engineering LSPs. Important definitions such as PCE, Path Computation Element Protocol (PCEP) and multi-domain network are given in this chapter.

Note that an excellent overview and detailed definition of the Path Computation Element, in inter-layer and multi-layer contexts, using single, multiple, and hierarchical PCE architectures, can be found in [3].


2.1 Definition

A Path Computation Element (PCE) is an entity that is capable of computing a route based on a network topology, applying computational constraints to the computation [draft-PCE-Arch].

2.2 An introduction to PCE

MPLS technology provides sophisticated traffic engineering, which controls label switched paths with explicit routes to satisfy constraints such as bandwidth and delay. Constraint-based path computation can be performed by a path computation engine embedded in the head-end LSR. The path computation is based on the network status, including current TE information. However, as a network becomes large and consists of multiple domains, path computation becomes complex. Examples of domains include Interior Gateway Protocol (IGP) areas and Autonomous Systems (ASes). A conventional path computation approach is not completely able to provide optimal paths. Therefore, a new path computation model is required [4].

Because of the strong motivations for PCE in MPLS networks, the Internet Engineering Task Force (IETF) launched the PCE working group to standardize PCE protocols in January 2005. The PCE is functionally separate from the label switching routers (LSRs). A PCE is an entity that is capable of computing a network path or route based on a network graph, applying computational constraints. It is applied to intra-area, inter-area, inter-AS, and inter-layer traffic engineering. The PCE-based inter-layer traffic engineering framework optimizes network resource utilization globally, i.e. taking into account all layers, rather than optimizing resource utilization at each layer independently. This allows better network efficiency to be achieved. Although this work could be applied to multi-layer scenarios, our focus in this thesis is on the PCE-based computation model in MPLS (single-layer / multi-domain) networks. Figure 2.1 shows the key components of the PCE architecture.


Figure 2.1 Key components of PCE architecture

2.3 Motivations to use a PCE based architecture

The increasing demand for applications and services requiring flexible and guaranteed Quality of Service (QoS) has pushed network operators to adopt Multi-Protocol Label Switching (MPLS) in core networks [5] [RFC 2945]. MPLS provides the Traffic Engineering (TE) capability to route traffic flows, namely Label Switched Paths (LSPs), along explicit routes. Thanks to resource availability and topology information collected through routing protocols (e.g., Open Shortest Path First with TE extensions, OSPF-TE [section 1.6]), such TE capability allows source nodes to perform path computation subject to additional QoS constraints typical of such networks, e.g., guaranteed bandwidth in MPLS networks. The capability of the PCE to compute different types of paths allows the PCE to provide Traffic Engineering functions in an MPLS-enabled network.

There are several reasons behind the PCE based architecture. The constraint-based path computation is a fundamental building block for traffic engineering in MPLS networks. In MPLS networks a PCE is the tool that can perform computation of a path and take constraints into consideration.

The PCE collects link-state information and performs path computation on behalf of network nodes. The PCE provides the additional advantage that network nodes can avoid highly CPU-intensive path computations, and effective TE solutions are achievable also in the case of legacy network nodes.

2.4 PCE Architecture

The main role of a PCE is to respond to a path computation request from an ingress node or a router. A PCE can be implemented in a dedicated network server, i.e. a node that does not participate in forwarding traffic. On the other hand, it listens to the exchange of routing and traffic engineering information; in that respect, it is part of the control plane of the MPLS-enabled network. A path request is received by the ingress node and, before it can initiate signaling to establish a traffic engineering path, it makes a path computation request to the "external" path computation server. In this architecture, the PCE also operates on the Traffic Engineering information to compute the requested path. Here, the PCE listens to the routing protocol information to build the traffic engineering database [section 8.2]. The basic architecture of a PCE-based network is shown in Figure 2.2.

Figure 2.2: Basic architecture of PCE based network

 PCC: Generally a Path Computation Client (PCC) [draft-PCE-Arch] is an edge/ingress node of a network. The main task of a PCC is sending a path request to the path computation element according to the path initiating request from a user or a particular application.

 PCE: The Path Computation Element [draft-PCE-Arch] is a component or application or network node which performs path computation tasks according to a path computation request. After computing the required path, the PCE sends it to the path computation client.

 TED: A traffic engineering database is built up according to the resource and network topology information of a network domain. This database contains some additional information (e.g. bandwidth) about the network nodes. In this work, we use the terminology Traffic Engineering information, and in our database this information is stored in the Traffic_Eng table [Section 8.2].

 Path Computation Request: Path Computation request [draft-PCE-Arch] is sent by the PCC to Path Computation Element.

 Path Computation Response: Path Computation response [draft-PCE-Arch] is sent by the PCE to the respective PCC that requested a path computation.

Communication between a network element, referred to as Path Computation Client (PCC) and the PCE is achieved by exploiting the Path Computation Element communication Protocol (PCEP) [6].

So, from the above discussion we can understand that a PCE responds to a path request from a PCC by computing a path, and the PCC uses this path to set up an LSP. For example, when an ingress node or router is setting up a traffic engineering path, the edge node can query a PCE for the path between the two endpoints. The PCE should use path computation algorithms like CSPF to find the optimal path.
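In the database-based PCE developed later in this thesis (Part IV), such a path computation request can be answered with a plain SQL lookup on a table of precomputed K-Shortest Paths (the LSP_Candidate table of section 8.2.2). The query below is only a hedged sketch: the column names (src, dst, path, cost, available_bw) and the values are assumptions for illustration, not the schema used in the implementation.

    -- Hypothetical lookup of a precomputed candidate path between two endpoints,
    -- keeping only candidates with enough available bandwidth.
    SELECT path
    FROM LSP_Candidate
    WHERE src = '10.0.0.1'
      AND dst = '10.0.0.9'
      AND available_bw >= 125000000    -- requested bandwidth, bytes per second
    ORDER BY cost ASC
    LIMIT 1;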

2.5 Traffic Engineering topology visibility

The PCE-based path computation model was originally driven by constraint-based shortest path computation in inter-area and inter-AS traffic engineering. In an inter-area/inter-AS network, an LSR that participates in the IGP does not have the whole TE topology of the network. The IGP limits TE topology advertisement to one routing domain. As defined in [RFC4726], a domain is any collection of network elements within a common sphere of address management or path computational responsibility. Examples of such domains include IGP areas and ASes. In multi-domain networks, an ingress LSR does not have the TE topologies of other domains because TE information is not exchanged across the domain boundaries, due to the diversity of carriers' TE policies. Service providers want to have their own path computation engine, because it is key to traffic engineering, which plays an important role in their revenue. On the other hand, from the vendors' point of view, it is not desirable to implement a path computation engine that considers all the requirements from network providers.

Therefore, the ingress LSR is not able to provide a complete explicit route. The ingress LSR specifies explicit routes only for its own domain, and loose routes for the other domains. Border LSRs are expected to expand the loose routes into explicit routes during the signaling procedure. However, the route provided by the ingress LSR may not satisfy the requirements; this increases the path setup time. Even if a path that satisfies the constraints is found, it may not be optimal, because the border LSRs are not selected with the whole TE topology information.

Figure 2.3 shows the architecture of the PCE in multi-domain networks.


The issues for routing in multi-domain networks can be summarized as follows:

 The lack of full topology and TE information.

 No single node has the full visibility to determine an optimal or even feasible end-to-end path.

 How to select the exit point and next domain boundary from a domain.

 How can a head-end determine which domains should be used for the end-to-end path?

 Information exchange across multiple domains is limited due to the lack of trust relationships, security issues, or scalability issues, even if there is a trust relationship between domains.

To compute an inter-domain path, the PCE collects the multi-domain TE topology by participating in multiple IGP routing domains, and/or the PCE communicates with other PCEs to exchange their computed results. Inter-PCE communication is also achieved by exploiting PCEP [6]. [Section 3.1] discusses the existing technologies for solving this issue.

This part of the thesis discussed the essential background of this work. Chapter 3 is dedicated to the main problem: it discusses the existing technologies for the exchange of TE information across domains and their main limitations. Chapter 4 introduces the proposed solution, which overcomes those limitations by means of the integration of different technologies and their respective protocols.


Part II

The Problem and Proposed Solution


3. The Problem

Section 2.5 of this thesis work discussed the Traffic Engineering topology visibility in multi-domain networks. This chapter discusses the existing technologies for the exchange of TE information across domains and their main limitations.


3.1 PCE in Multi-Domain Networks

Section 2.5 of this thesis work discussed the Traffic Engineering topology visibility in multi-domain networks. One of the main motivations behind PCE deployment is to tackle the problem of multi-domain label switched path (LSP) establishment. In single-domain networks, the dissemination of TE information is fully allowed. However, in multi-domain networks, this is inhibited by the need to preserve control plane scalability and confidentiality (in multi-carrier scenarios) across domains, thus limiting the amount of TE information exchanged among domains [section 2.5]. Thus, for the computation of inter-domain paths, each PCE collects the multi-domain TE topology by communicating with other PCEs to exchange information about the other domains. In this regard, many solutions have been proposed. A comprehensive and updated overview of the existing protocol solutions for the exchange of TE information across domains is provided in the recently published [RFC7926] [8]. Some of the main solutions and their limitations are discussed in this chapter.

3.1.1 Per-domain path computation

Per-domain path computation is based on multiple path computations performed during the signaling phase of inter-domain LSP setup. Each domain is assumed to have a PCE responsible for path computation exclusively inside that domain. The sequence of PCEs/domains is assumed to be known in advance. In particular, each PCE, given a destination domain, is aware of the next domain hop. In Fig. 3.1 the per-domain (PD) procedure is sketched. The signaling phase and the path computation phase are interlaced, and the overall inter-domain path is computed and signaled in a distributed way through independent and uncoordinated partial path computations performed by each PCE of the involved domains. In this approach, each PCE computes the path from the ingress to the egress router of its domain. As shown in Figure 3.1, for each domain there is a request to the PCE of that domain for the computation of the intra-domain LSP segment.


Fig. 3.1. Per-domain path computation procedure.

The PD procedure, due to the lack of coordination among domain PCEs, leads to sub-optimal path computation and is subject to possible signaling error events. Moreover, if there are multiple connections between the domains, the source PCE may provide a path that is optimal locally, but not overall.

3.1.2 Simple cooperating PCEs

The cooperating PCEs configuration allows the PCEs to exchange information in order to find a better end-to-end connection. Each PCE sends the best solution to the next PCE in the chain, but the neighboring PCE can suggest another connection to the former PCE. However, each connection is chosen locally, which means that when the optimal end-to-end path does not use the locally optimal paths, the global solution cannot be found. Figure 3.2 shows the simple cooperating PCEs procedure.


Fig. 3.2. Simple cooperating PCEs procedure.

3.1.3 Backward computation

In inter-PCE path computation, coordination among the PCEs of different domains is introduced. This strategy enables the decoupling of path computation from the signaling phase. Each PCE computation is performed based on the segment computation results provided by the contiguous PCE through PCEP. Among the possible strategies enabling inter-PCE path computation, the backward computation is the most considered one, in two versions: standard and recursive. Both versions assume that, for a given source-destination couple, the PCE/domain sequence is determined a priori. In particular, each PCE is aware of the next involved PCE. However, the scope of such an exchange is limited to a few domains and the synchronization process is quite time consuming, with no intention to cover larger scenarios or the whole Internet.

3.1.3.1 Standard backward path computation

The standard backward path computation [9] is exploited through a chain of PCEP sessions between adjacent PCEs. The source and the destination PCEs are those referring to the domains containing the LSP source node and destination node, respectively. Fig. 3.3 shows the standard backward procedure. The sequence of involved PCEs (i.e., domains) is determined a priori. The source PCE (i.e., PCE1), upon a path request coming from a controlled node (e.g., node A requesting a path to node Z), triggers a Request towards the next PCE (i.e., PCE2) and, in turn, the Request is forwarded until the destination PCE (i.e., PCE3) is reached. The destination PCE computes a path between the destination node and one of the Border Nodes (BN) connecting its domain to the domain of the requesting PCE (e.g., segment W-Y-Z). The path segment is enclosed within a Request and provided to the penultimate PCE (i.e., PCE2), which computes a path between one BN of the requesting PCE and the BN indicated by the provided segment (i.e., segment I-L-M-N-O-W). The ERO segment is attached to the previous one and the operation is repeated until the source domain is reached. The overall path computation is the result of independent segment path computations because the choice of each BN is delegated to only one PCE. As a consequence, the overall computation is sub-optimal (e.g., the path computed in Fig. 3.3 has a hop-count metric equal to 11, in contrast with the shortest 9-hop path).


3.1.3.2 Backward recursive path computation (BRPC)

The BRPC method starts at the destination domain, which sends to its neighbor the cost from the edge router to the destination node, hence the term "backward" [10]. As a result, the neighboring domain can create a tree with its egress nodes and the destination node. This process continues until the origin domain, which then selects the best end-to-end path. Fig. 3.4 depicts three connected domains with one PCE per domain. Using BRPC, PCE1 sends a request to PCE2, which forwards it to PCE3. PCE3 replies with the distances from its border nodes with domain 2 (Node 3-2 A and Node 3-2 B). PCE2 carries out the same operation, sending a tree with the possible combinations. When multiple domains are interconnected, such an information exchange can be complicated. If the intermediate domains are known, this process is easier. The Border Gateway Protocol (BGP) can be used to select the possible domains for this PCE-based computation. However, BRPC does not scale with complex multi-domain topologies.

Fig.3.4. Backward recursive path computation.

3.1.4 Hierarchical PCE (H-PCE) and other existing technologies

The Hierarchical PCE (H-PCE) [11] architecture proposal defines the procedures for combining end-to-end path computation with an effective domain sequence computation. In the H-PCE, a single parent PCE (pPCE) is responsible for inter-domain path computation, while in each domain a local child PCE (cPCE) performs intra-domain path computation. Hierarchical PCEs do not have information on the whole network, but are only aware of the connectivity among the domains and provide coordination to the child PCEs. The path request is sent to an upper-hierarchy PCE, which asks the subsidiary PCEs about their connectivity for the candidate inter-domain connections. Once this answer is known, the best solution is selected and transmitted to the source PCE. Such a scenario is shown in Fig. 3.5.

Fig. 3.5. Hierarchical multi-domain path computation element configuration.

Within the H-PCE architecture, several methods have been proposed to store, update, and retrieve the intra-domain information at the pPCE, such as the recently standardized Border Gateway Protocol with Link State extensions (BGP-LS [12], [13]). However, this approach still introduces scalability concerns, especially under dynamic traffic conditions such as during restoration, because all the computation procedures for inter-domain paths directly involve the pPCE. Alternatively, emerging solutions propose to use the BGP-LS protocol to exchange the intra-domain information among peer PCEs [3]. This solution, illustrated in Fig. 3.6, is interesting for solving the scalability concerns at the pPCE but is still affected by the constraints implied by the utilization of a specific protocol. Indeed, the traditional way of defining and/or enhancing communication protocols is not free from limitations (e.g. BGP-LS). First, the whole process is typically slow: years are usually needed to define protocol requirements and protocol specifications, perform the related software implementation and run interoperability tests. Moreover, interoperability is always challenging to achieve among multiple different implementations. A second drawback of traditional communication protocols is that the amount of TE information advertised from one domain to another is typically limited by potential scalability issues that may affect the receiving domain. However, the remote domain may not be overloaded and could effortlessly be able to handle all the needed information, thus potentially improving path computation efficiency. Third, in traditional approaches (see Fig. 3.6 and Fig. 3.7), specific internal procedures need to be implemented at the controller to store and retrieve the information in its local database. In particular, a controller has to run an internal API to retrieve TE parameters from its local database and encapsulate them into a communication protocol or into a northbound interface for service use. At the remote controller, TE parameters are de-capsulated and inserted through an internal API into its local database. All these operations introduce controller complexity and limit the capacity to rapidly enhance controller capabilities.

Fig. 3.6. Traditional H-PCE architecture employing communication protocols (e.g., BGP-LS) for exchange of TE information.

Fig. 3.7. Emerging architectures employing communication protocols (e.g., BGP-LS) for exchange of TE information.


This chapter summarized the existing technologies for the propagation of TE information in multi-domain networks and their limitations. In this thesis work, we propose to exploit the currently available database (DB) technologies to improve the performance in the retrieval and synchronization of TE information. In particular, we propose to rely on the DB communication facilities and synchronization protocols as an attractive, already available alternative with respect to the currently adopted dedicated and traditional telecommunication protocols. The next chapter (Chapter 4) discusses this proposed solution and introduces advanced database and replication technologies. The reference scenario and the implementation details of this proposed solution are discussed in Part IV.


4. The Proposed Solution and its requirements

This chapter discusses the proposed solution for the exchange of TE information across domains and its requirements. The benefits and limitations of the selected database and replication technologies are also given in this chapter.


4.1 The idea of exploiting database technologies for TE exchange and its requirements

In the previous chapter of this work, the main limitations of the existing technologies for the exchange of Traffic Engineering information across domains have been discussed. The scope of such an exchange is limited to a few domains and the synchronization process is quite time consuming, with no intention to cover larger scenarios. In this thesis work, we propose to exploit the currently available database (DB) technologies to improve the performance in the retrieval and synchronization of TE information. In particular, we propose to rely on the DB communication facilities and synchronization protocols as an attractive, already available alternative with respect to the currently adopted dedicated and traditional telecommunication protocols. More specifically, the proposal is for relational database management systems (RDBMS) operating by means of the standard Structured Query Language (SQL). To achieve interoperability, besides the communication protocol and SQL, standard definitions of network and topology parameters are considered.

Once a controller stores or updates information locally, through standard SQL, the policy-based DB synchronization takes place in an automatic way, without additional operations performed by the controller.

In particular, DB replication is applied, copying and distributing database data from one database to another and achieving automatic synchronization among databases to maintain consistency. This is one of the main requirements of the proposed solution.

Several types of data replication technologies can be considered. In this work, we focus on transactional replication. This type typically starts with a snapshot of the database to be replicated. After the initial snapshot, modifications are distributed in near real time, in the same order as they occurred, guaranteeing consistency. This type of replication is typically utilized in server-to-server communications to address very high volumes of insert, update, and delete activity, applying incremental changes with low latency in data replication. Potential conflicts are automatically resolved through the execution of isolated (atomic) transactions.

Transactional replication is suitable for multi-domain networks since each controller is responsible for the resources under its domain of visibility. Moreover, it is the only element updating the related DB info (i.e., the DB master). No conflicts occur in the update process since only passive replication is applied. Furthermore, changes may involve a high volume of modifications with low experienced latency, which is an important requirement to achieve fast network convergence times.

With respect to traditional telecommunication protocols, where a single parameter variation (e.g., unreserved link bandwidth) requires the entire set of link parameters to be re-advertised, in the case of DB replication just the specific parameter variation is replicated, i.e. not the whole set of TE link info. This significantly simplifies the TE exchange process towards remote domains.
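For example, under the proposed approach the variation of a single TE parameter is written locally as an ordinary SQL transaction, and only that change is then replicated at commit time; the table and column names below are the illustrative ones used in the earlier te_link sketch (section 1.6), not a normative schema.

    -- Executed by the controller of the local domain on its own (master) DB node.
    START TRANSACTION;
    UPDATE te_link
       SET unresv_bw_p0 = 625000000          -- only the changed parameter is written
     WHERE router_address   = '10.0.0.1'
       AND link_id          = '10.0.0.2'
       AND local_if_address = '192.168.1.1';
    COMMIT;  -- at commit time the resulting write-set is propagated to the other domains

No dedicated protocol message is built by the controller: the write-set produced at commit is what the cluster replicates.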

Replication occurs at transaction commit time, by multicasting the transaction update to the cluster. Two types of replication can be considered: synchronous and asynchronous. The former guarantees that changes happening on one DB are synchronously propagated to the other master DBs of the cluster. Such a mechanism is also known as multi-master replication. The latter type provides no guarantees about the delay between applying changes on the master DB and their propagation to the slave DBs. This mechanism is typically known as master-slave replication. Synchronous replication is typically highly available and always consistent. However, it is typically slower than asynchronous replication because the data is considered available only once the slowest DB is in sync with all the other DBs.

Both multi-master and master-slave replication are of interest in the context of multi-domain networks. For example, in the case of a multi-domain single-carrier network with overlapped visibility among different controllers, a multi-master configuration may be preferable for scalability, availability and consistency reasons. On the other hand, in the case of either multi-carrier networks or a single-carrier network with non-overlapped visibility across domains (e.g., multi-vendor networks), a master-slave configuration may be more suitable because one domain is the publisher (master) of its own resources and a subscriber (slave) of the other domains' resources.

In the latter case of multi-carrier networks, it is important to preserve adequate confidentiality while advertising network resources. In this scenario, specific policies can be easily configured and applied at the master database, providing full administrative control over the resources to export to other carriers. This thesis work focuses only on the multi-master scenario.

Fig. 4.1 shows the proposed DB-based solution applied to the previously introduced multi-domain scenario. In this case, to exchange TE information with other controllers, it is no longer necessary to implement dedicated protocol messages which encapsulate/de-capsulate network parameters. Instead, the policy-based DB synchronization takes place in an automatic way, implementing the exchange of YANG-defined TE information without dedicated operations performed by the controller.

Fig. 4.1 Proposed solution employing DB synchronization protocols (e.g., SQL) for exchange of YANG-defined TE info.


4.2 MariaDB Galera Cluster, technologies which meet the proposed solution requirements

Galera Cluster, or simply Galera, is a cluster implementation for MariaDB and MySQL. MariaDB Galera Cluster is an official MariaDB distribution that contains the Galera technology. It follows the same major version numbers as the underlying MariaDB version.

4.2.1 Why MariaDB?

MariaDB is a mature, stable, open source relational database. From its beginning in 2009 as a branch, or fork, of the MySQL database, it has grown to its status today as the default database in most Linux distributions and the database of choice for many companies. MariaDB shares many features and capabilities of its parent database but, like most children, it has also surpassed its parent in many ways.

Here are some of the top advantages of MariaDB over other database technologies:

Truly Open Source: MariaDB development is more open and vibrant.

 All code in MariaDB is released under GPL, LGPL or BSD.

 MariaDB has no closed source modules like the ones that can be found in MySQL Enterprise Edition. In fact, all the closed source features of MySQL 5.5 Enterprise Edition are found in the MariaDB open source version.

 MariaDB client libraries (for C, for Java (JDBC), for Windows (ODBC)) are released under the LGPL to allow linking with closed source software. MySQL client libraries are released under the GPL, which does not allow linking with closed source software.

 MariaDB includes test cases for all fixed bugs. Oracle doesn't provide test cases for new bugs fixed in MySQL 5.5.

 All bugs and development plans are public.


More cutting edge features: MariaDB has had many more new features in recent years, and they are released earlier. Therefore we trust MariaDB to deliver the best features.

Great support team: MariaDB offers support engineers that are said to be experts in both MariaDB and MySQL. They offer 24/7 support with an enterprise subscription for mission-critical production systems.

Many database connectors: MariaDB offers a variety of database connectors including: ADO.NET, C, C++, D, Java, JavaScript, ODBC, Perl, PHP, Python, Ruby, and Visual Studio plug-in.

Galera active-active master clustering: MariaDB offers master-master and master-slave replication as well. MariaDB also uses Galera Cluster for multi-master replication. As of MariaDB 10.1, Galera is included with MariaDB. Galera, unlike traditional MySQL master-slave replication, provides master-master replication and thus enables a new kind of scalability architecture for MariaDB.

Better performance: MariaDB claims a much improved query optimizer and many other performance-related improvements. Certain benchmarks show that MariaDB is radically faster than MySQL. MariaDB 5.3 introduced many optimizer enhancements, and subqueries are now finally usable. The complete list and a comparison with MySQL are discussed in [14]. A benchmark can be found in [15].

Pool of Threads: This allows MariaDB to run with 200,000+ connections and with a notable speed improvement when using many connections [16]. There are also many speed improvements when a client connects to MariaDB.


4.2.2 Why Galera?

Galera Cluster [17] is a synchronous multi-master database cluster, based on synchronous replication and Oracle’s MySQL/InnoDB. When Galera Cluster is in use, you can direct reads and writes to any node, and you can lose any individual node without interruption in operations and without the need to handle complex failover procedures.

At a high level, Galera Cluster consists of a database server—that is, MySQL or MariaDB—that then uses the Galera Replication Plugin to manage replication. To be more specific, the MySQL replication plugin API has been extended to provide all the information and hooks required for true multi-master, synchronous replication. This extended API is called the Write-Set Replication API, or wsrep API.

Through the wsrep API, Galera Cluster provides certification-based replication. A transaction for replication, the write-set, not only contains the database rows to replicate, but also includes information on all the locks that were held by the database during the transaction. Each node then certifies the replicated write-set against other write-sets in the applier queue. The write-set is then applied, if there are no conflicting locks. At this point, the transaction is considered committed, after which each node continues to apply it to the tablespace.

This approach is also called virtually synchronous replication, given that while it is logically synchronous, the actual writing and committing to the tablespace happens independently, and thus asynchronously on each node.
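To make the certification step more concrete, the following purely conceptual Python sketch models a write-set as the set of row keys it modifies and certifies it against the write-sets still pending in a node's applier queue. This is only an illustration of the idea under simplified assumptions, not Galera's actual implementation, whose certification also takes the global ordering and the certification interval of each transaction into account.

    # Conceptual sketch of certification-based replication (not Galera's code):
    # a write-set carries the keys of the rows it touches, and a node certifies
    # it against the write-sets still pending in its applier queue.
    from dataclasses import dataclass, field

    @dataclass
    class WriteSet:
        seqno: int                                 # global ordering of the write-set
        keys: set = field(default_factory=set)     # identifiers of the modified rows

    @dataclass
    class Node:
        applier_queue: list = field(default_factory=list)   # write-sets not yet applied

        def certify(self, ws: WriteSet) -> bool:
            for pending in self.applier_queue:
                if pending.keys & ws.keys:         # overlap with a pending write-set
                    return False                   # certification fails: rollback
            self.applier_queue.append(ws)          # certified: will be applied locally
            return True

    node = Node()
    print(node.certify(WriteSet(seqno=1, keys={"link:7"})))   # True
    print(node.certify(WriteSet(seqno=2, keys={"link:7"})))   # False (conflict)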

Galera Cluster provides a significant improvement in high-availability for the MySQL ecosystem. The various ways to achieve high-availability have typically provided only some of the features available through Galera Cluster, making the choice of a high-availability solution an exercise in tradeoffs.


The following features are available through Galera Cluster:

• True multi-master: read and write to any node at any time (illustrated by the sketch below).

• Synchronous replication: no slave lag, no data is lost at node crash.

• Tightly coupled: all nodes hold the same state; no diverged data between nodes is allowed.

• Multi-threaded slave: for better performance, for any workload.

• No master-slave failover operations or use of VIP.

• Hot standby: no downtime during failover (since there is no failover).

• Automatic node provisioning: no need to manually back up the database and copy it to the new node.

• Supports InnoDB.

• Transparent to applications: requires no (or minimal) changes to the application.

• No read and write splitting needed.

The result is a high-availability solution that is both robust in terms of data integrity and high-performance with instant failovers.
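The sketch below illustrates the "read and write to any node" property from the application's point of view: a row is updated through one node of the cluster and read back through another. It is a minimal Python example assuming mysql-connector-python; the node addresses, credentials and the te_links table are hypothetical placeholders, and strict read-your-writes across nodes may additionally require the wsrep_sync_wait setting, since the apply step is asynchronous on each node.

    # Sketch of the active-active property of a Galera cluster: a write committed
    # on one node can be read back from another node. All names are placeholders.
    import mysql.connector

    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical Galera nodes

    def connect(host):
        return mysql.connector.connect(host=host, user="te_user",
                                       password="te_password", database="te_database")

    # Write through the first node ...
    writer = connect(NODES[0])
    cur = writer.cursor()
    cur.execute("UPDATE te_links SET available_bw = %s WHERE link_id = %s", (40, 7))
    writer.commit()
    writer.close()

    # ... and read the same row back through a different node.
    reader = connect(NODES[1])
    cur = reader.cursor()
    cur.execute("SELECT available_bw FROM te_links WHERE link_id = %s", (7,))
    print(cur.fetchone())
    reader.close()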

4.2.3 MariaDB Galera Cluster - Known Limitations

MariaDB Galera Cluster has some limitations, which are listed in [18]. However, most of them do not apply to the solution proposed in this work. These limitations are:

 Currently replication works only with the InnoDB storage engine. Any writes to tables of other types, including system (mysql.*) tables, are not replicated (this limitation excludes DDL statements such as CREATE USER, which implicitly modify the mysql.* tables; those are replicated). There is, however, experimental support for MyISAM. In any case, we are using the InnoDB storage engine and MyISAM is not used, so this limitation does not affect our solution.


 All tables should have a primary key (multi-column primary keys are supported). DELETE operations are unsupported on tables without a primary key, and rows in tables without a primary key may appear in a different order on different nodes. This is not an issue for our proposed solution.

 The query log cannot be directed to a table. If you enable query logging, you must forward the log to a file: log_output=FILE. The query log is disabled in our configuration, as discussed in Part IV [Implementation detail].

 XA transactions are not supported. We are not using this feature.

 Transaction size. While Galera does not explicitly limit the transaction size, a write-set is processed as a single memory-resident buffer and, as a result, extremely large transactions (e.g. LOAD DATA) may adversely affect node performance. To avoid that, the wsrep_max_ws_rows and wsrep_max_ws_size system variables limit transaction rows to 128K and the transaction size to 1 GB by default; if necessary, users may increase those limits, and future versions will add support for transaction fragmentation. In our scenario we mainly execute small but very frequent transactions, so this limitation does not apply to our proposed solution (a sketch showing how these limits can be inspected is given after this list).

 Prior to MariaDB Galera Cluster versions 5.5.40-galera and 10.0.14-galera, the query cache needed to be disabled. The query cache is disabled in our configuration, as discussed in Part IV [Implementation detail].

 In an asynchronous replication setup where a master replicates to a Galera node acting as a slave, parallel replication (slave-parallel-threads > 1) on the slave is currently not supported. We are using synchronous replication.


 Windows is not supported. Our deployment runs on Linux, so this is not an issue.

 While binlog_format is checked on startup and can only be ROW (see Binary Log Formats), it can be changed at runtime. Do NOT change binlog_format at runtime: doing so is likely not only to cause replication failures, but also to make all other nodes crash. The binary log is disabled in our configuration, as discussed in Part IV [Implementation detail].
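As referenced in the transaction-size limitation above, the following minimal Python sketch shows how the Galera write-set limits could be inspected at runtime. The connection parameters are hypothetical placeholders, while SHOW GLOBAL VARIABLES and the wsrep_max_ws_rows / wsrep_max_ws_size variables are standard in a MariaDB Galera setup; if larger transactions were ever needed, these limits could be raised in the server configuration.

    # Sketch: inspecting the Galera write-set limits from Python.
    # Connection parameters are hypothetical placeholders.
    import mysql.connector

    conn = mysql.connector.connect(host="10.0.0.1", user="te_user",
                                   password="te_password")
    cur = conn.cursor()
    cur.execute("SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws%'")
    for name, value in cur.fetchall():
        print(name, value)     # expected: wsrep_max_ws_rows, wsrep_max_ws_size
    conn.close()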

This chapter introduced the proposed solution for the exchange of Traffic Engineering information across domains, and discussed advanced technologies such as MariaDB Galera Cluster together with their benefits and limitations for this proposal. Before moving to the reference scenario and the implementation details, Part III of this work presents the key concepts of the database and replication technologies that are essential to understand this work and to support future work.


Part III

Key Concepts of MariaDB and Galera


5. Understanding the Essentials of MariaDB

This chapter presents in detail the critical concepts of the MariaDB architecture, with a specific emphasis on the parts most relevant to this thesis work.


5.1 The MariaDB architecture

MariaDB is a community-driven fork of MySQL that was started in 2009 by Monty Widenius, the original author of MySQL, after the original project was acquired by Oracle. The first version of MariaDB was based on MySQL 5.1, and the improvements to the MySQL base code are regularly merged into the MariaDB project. Other features are also merged from Percona Server, another fork that is very similar to the mainstream product. The most important Percona feature merged into MariaDB is XtraDB, a fork of the InnoDB storage engine. InnoDB is the default storage engine in modern MySQL and MariaDB versions; XtraDB fixes bugs that are still present in InnoDB before the official bug fixes are released by Oracle, and it also provides performance improvements and other minor features.

The protocol, the API, and most SQL statements that work with MySQL also fully work with MariaDB, and plugins written for MySQL work with MariaDB too. Thanks to these characteristics, most applications written for MySQL work with MariaDB without any modification, while at the same time switching to MariaDB allows one to use interesting features that are not available in MySQL.

While the reader is probably familiar with Database Management Systems (DBMS) in general, and with MariaDB or MySQL in particular, a quick architecture review might be useful. In this introductory chapter, the main components and the operations performed by the server are listed.


Fig. 5.1: Architecture of MariaDB

Basically, from a user's point of view, MariaDB receives SQL queries or statements, processes them, and returns a result set. Figure 5.1 shows this process and the components involved, and the following paragraphs describe them in more detail:

 When a client connects to MariaDB, authentication is performed based on the client's hostname, username, and password.

 If the login succeeds, the client can send a SQL query to the server.

 The parser interprets the SQL string.

 The server checks whether the client has the permissions required for the requested action.
