Search for a massive resonance decaying into a pair of Higgs with 4 b-quarks in the final state and development of machine learning technique applied to b-tagging algorithms.


UNIVERSITÀ DEGLI STUDI DI PISA

DIPARTIMENTO DI FISICA ‘E. FERMI’

Master's Thesis in Physics

Curriculum ‘Physics of Fundamental Interactions’

October 2017

Search for a heavy resonance decaying into

a pair of Higgs bosons in the $b\bar{b}b\bar{b}$ final state

and application of a deep learning technique to

a b-tagging algorithm with the CMS experiment

Candidate:

Andrea Malara

Supervisor:

Prof. Andrea Rizzi


Introduction

The discovery of the Higgs boson opens new possibilities for searches beyond the Standard Model. In particular, the recently discovered particle can be used as a tool to search for heavy resonances decaying into a pair of Higgs bosons. The largest branching ratio of the Higgs boson is into a pair of b-quarks. Thus, a natural choice is to search for a new massive state X decaying into a pair of Higgs bosons in the final state with four b-jets. This thesis presents a search for $X \to H(b\bar{b})H(b\bar{b})$ performed using 35.9 fb$^{-1}$ of proton-proton collision data recorded by the CMS detector at the LHC at a centre-of-mass energy of 13 TeV during 2016.

Signal events are expected to produce a peak in the invariant mass distribution of the four reconstructed b-jets, on top of the multi-jet background. The final goal of this analysis is to observe an excess of events above the background or, otherwise, to set an upper limit on the production cross section multiplied by the branching ratio of the process.

The event selection begins by identifying events containing at least four central b-jets. Among these jets, two pairs are chosen according to appropriate criteria such that each pair is compatible with the decay of a Higgs boson. This search covers a broad range of mass hypotheses for the resonance, between 260 GeV and 1200 GeV. The kinematics of the decay of such a resonance changes substantially over this range, and thus two different mass regimes are used: the Low Mass Regime (LMR) covers masses up to 650 GeV, while the Medium Mass Regime (MMR) covers masses above 550 GeV. In the MMR the Higgs bosons are sufficiently boosted that the two jets from each decay have a small angular separation, which can be used as a discriminating variable to select the jets originating from the same Higgs boson. In the LMR, on the other hand, we use the looser requirement that the di-jet invariant mass is compatible with the nominal Higgs mass.


To perform the analysis, we make use of a signal shape modelled from Monte Carlo (MC) simulation. The simulated signal events are produced under the assumption of a narrow-width resonance produced via gluon-gluon fusion, for two different spin hypotheses. The signal samples were produced for 20 mass points in steps of 10, 50 or 100 GeV across the mass range. Since the mass resolution is smaller than the step size, we interpolate both the shape and the normalisation of the signal to account for intermediate mass points.

The non-resonant multi-jet background, on the other hand, is modelled with a data-driven approach, using a fit with a smooth function. To avoid possible biases, the analysis is designed and tested without exploiting the data events falling in the Signal Region (SR); in this sense, the analysis is carried out following a “blind” procedure. For this reason, the fit strategy is validated in several control regions, defined in the 2D space of the two reconstructed Higgs candidate masses.

My personal contribution has been the optimisation of this analysis for the data collected during Run 2 of the LHC. I performed almost all parts of the analysis, studying the effects of the new b-tagging algorithms and the bias related to the choice of the background model and, with a view to minimising such bias, I improved the background modelling.

The background modelling strategy used for Run 1 data fits the distribution in the control regions. Unlike in the previous searches, where this approach worked properly, the larger data sample collected during Run 2 enhances some features of the distribution, making it difficult to model properly. We therefore adopt two different strategies for the LMR and MMR cases. For the LMR, an “ABCD” method is used to predict a background shape compatible with that of the signal region, on which the fit strategy is validated. I also showed that it is convenient to split the LMR into two overlapping ranges, separating the turn-on from the tail and avoiding most of the fitting problems. At the same time, this choice has the advantage of extending the total coverage of the LMR and of excluding from the MMR the kinematically problematic lower mass range, which is covered by the LMR. To support this choice I used several selection criteria, which led us to minimise the uncertainty related to the choice of the functional form of the background model.

Once the whole procedure had been defined, tested and verified, the analysis was unblinded and I searched for an excess of events above the background expectation. No statistically significant excess is observed for masses between 260 GeV and 1200 GeV, and upper limits at 95% confidence level are set on the production cross section multiplied by the branching ratio, as a function of the mass and for both spin hypotheses.


In the second part of this thesis, as a possible improvement of b-tagging for future analyses, I worked on machine learning algorithms as a first step towards the development of a new algorithm. In particular, in this search with four b-quarks in the final state, any improvement in the tagging of a single b-jet results in a better overall signal efficiency.

As a first step, I selected the optimal working point for the most recent algorithms (CMVA and deep-CSV) provided by the CMS offline reconstruction. I then optimised a new algorithm that uses a deep neural network and takes low-level information as input, unlike most of the state-of-the-art algorithms in CMS, which rely on information from secondary vertex reconstruction. The underlying assumption is that a deep neural network fed directly with low-level inputs can recover some of the information that is lost in the explicit vertex reconstruction.

The choice of the input variables is inspired by the standard vertex reconstruction algorithm (IVF), but no analytical fit of the secondary vertex is used. The use of a deep neural network is justified by the large number of variables needed to describe each jet accurately (of the order of a thousand if all the tracks are included), which would be difficult to handle with a simple neural network. After many attempts, a satisfactory implementation of the network was reached using feed-forward layers, recurrent (LSTM) units and convolutional units, each addressing a different task according to the shape of the input variables, so that the network architecture reflects the physics that motivated the choice of the inputs.

The results of this first step are promising and interesting for future developments, since the new algorithm outperforms the existing ones, improving the tagging efficiency by up to 10-15% at a fixed mistag rate (false positive rate). In particular, I tested how the new algorithm behaves for high-momentum jets, which are particularly relevant in the $X \to H(b\bar{b})H(b\bar{b})$ analysis.


1 Theoretical basis and motivations

The Standard Model (SM) is the theory that, so far, has been able to explain the phenomenology of the microscopic world and to identify its elementary constituents, and it has been confirmed by several high-energy physics experiments. It describes three of the four forces known in Nature: the electromagnetic, weak and strong forces. The gravitational interaction is negligible at these energy scales and is therefore not included in the model. The SM is a quantum field theory in which each interaction is mediated by the exchange of gauge bosons. To date, although all experimental results are in agreement with it, the Standard Model cannot be considered a complete theory of fundamental interactions, since it does not include a description of gravity and is not compatible with general relativity. There is therefore a need to explore beyond the electroweak scale (∼ 250 GeV), in search of symmetries or dimensions more extensive than those that characterise the Standard Model today. Moreover, the Higgs boson is of particular importance in the SM. This particle is a consequence of the spontaneous breaking of the electroweak symmetry which, through the Higgs mechanism, gives mass to the particles. In this chapter, we discuss the Standard Model, the Higgs mechanism and its consequences, the theoretical and experimental limits related to the Higgs boson and, finally, its production and decay processes at the Large Hadron Collider (LHC).

1.1 The Standard Model

The Standard Model (SM) is the theory which explains three of the four known forces through the bosons, and the structure of matter through the fermions. It includes 12 fermions of spin 1/2, which are the constituents of matter and obey the Pauli exclusion principle, and 4 bosons of spin 1, which carry the three forces (see Figure 1.1.1), plus the recently discovered Higgs boson, which is responsible for the masses of the fermions and of the weak gauge bosons. Theoretical considerations and experimental data have led to the conclusion that the strong nuclear force, the weak nuclear force and the electromagnetic force are described by a gauge theory based on the local symmetry group $U(1)_Y \times SU(2)_L \times SU(3)_C$, with a partial breaking of the symmetry induced by the Higgs mechanism in the $SU(2)_L \times U(1)_Y$ electroweak sector.

[Figure: chart of the Standard Model particles, giving charge, colour, mass and spin for the six quarks and six leptons (three generations, with mass increasing from the 1st to the 3rd), the force carriers (W, Z, photon, gluon) and the Higgs boson, grouped by the force they experience; the graviton is shown outside the Standard Model.]

Figure 1.1.1: The building blocks of matter, grouped by interaction.

1.1.1 Fermions

Fermions are grouped into three generations and further classified into two types: leptons, which interact via all forces except the strong interaction, and quarks, which carry a colour charge and can therefore also interact strongly. The neutral leptons are called neutrinos, denoted by the symbol ν, and they only experience the weak interaction. In Figure 1.1.1 the masses increase from left to right, for both leptons and quarks. Particles in the higher generations (having higher masses) are unstable; in order to observe and study them, we need to produce them in collisions of stable particles. While leptons exist as free particles, quarks do not seem to. It is a peculiarity of the strong force that quarks can be found only in bound combinations, not individually. This phenomenon is known as confinement. The quarks are the only fundamental fermions that interact through all the forces.

Each fermion is characterised by a set of quantum numbers, some of which correspond to charges conserved under the gauge invariance of the $U(1)_Y \times SU(2)_L \times SU(3)_C$ group. The Standard Model is a chiral theory, which is not invariant under parity transformations. It is worth remembering that a fermion field ψ (which satisfies the Dirac equation) can be divided into its left-handed part $\psi_L$ and its right-handed part $\psi_R$ as:

$$\psi = \psi_L + \psi_R \tag{1.1}$$

These two parts correspond to the two irreducible representations $(\tfrac{1}{2}, 0)$ and $(0, \tfrac{1}{2})$ of the restricted (proper orthochronous) Lorentz group. The corresponding spinors are called Weyl spinors.

1.1.2 Gauge Symmetries

Before describing the Lagrangian of the SM and the Higgs mechanism, it is necessary to describe the gauge invariance of quantum electrodynamics (QED), the field theory describing electromagnetic interactions.

The photon is massless and mediates the electromagnetic force between electrically charged particles. Mathematically, QED is an abelian gauge theory with the symmetry group $U(1)_{em}$. The gauge field, which mediates the interaction between the charged spin-1/2 fields, is the electromagnetic field. The QED Lagrangian for a spin-1/2 field interacting with the electromagnetic field is given in natural units by:

$$\mathcal{L}_{QED} = \bar{\psi}\,(i\gamma^\mu D_\mu - m)\,\psi - \frac{1}{4} F_{\mu\nu} F^{\mu\nu} \tag{1.2}$$

where $D_\mu \equiv \partial_\mu + iQA_\mu$ is the gauge covariant derivative, Q can be interpreted as the electric charge of the spinor field and m as its mass, $A_\mu$ is the covariant four-potential of the electromagnetic field and $F_{\mu\nu}$ is the electromagnetic field tensor. This Lagrangian remains unchanged under the local transformation of the abelian unitary group U(1):

$$\begin{cases} \psi \to \psi' = e^{iQ\theta(x)}\,\psi \\ A_\mu \to A'_\mu = A_\mu - \partial_\mu\theta(x) \end{cases} \tag{1.3}$$

where Q is the electric charge operator, the generator of the group U(1), and θ(x) is a phase that depends on the space-time coordinates. Invariance under a global U(1) transformation leads to the conservation of the electric charge. Furthermore, a mass term for the $A_\mu$ field is not allowed by gauge invariance, so the photon is massless; this result has been confirmed several times by experimental observations.

Weinberg and Salam’s electroweak theory [1] arises from the unification of QED and the weak interactions, with the aim of describing the two forces as different manifestations of the same interaction. The symmetry group is $SU(2)_L \times U(1)_Y$, whose generators are, respectively, the weak isospin operator $\vec{I}$ and the hypercharge operator Y. The SM is a chiral theory, i.e. the left-handed and right-handed components of the fermions transform differently under infinitesimal local gauge transformations:

$$SU(2)_L: \begin{cases} \psi_L \to \psi'_L = \left(1 + i\vec{\alpha}(x)\cdot\vec{I}\right)\psi_L \\ \psi_R \to \psi'_R = \psi_R \end{cases} \qquad U(1)_Y: \begin{cases} \psi_L \to \psi'_L = \left(1 + i\beta(x)Y\right)\psi_L \\ \psi_R \to \psi'_R = \left(1 + i\beta(x)Y\right)\psi_R \end{cases} \tag{1.4}$$

where $\vec{\alpha}(x)$ and β(x) are the parameters of the local gauge transformations, and $\psi_L$ and $\psi_R$ denote the weak isospin doublets and singlets respectively. In particular, leptons and quarks are represented in the following notation:

$$\psi_L^{lepton} = \begin{pmatrix} \nu \\ l^- \end{pmatrix}_L \qquad \psi_L^{quarks} = \begin{pmatrix} u \\ d \end{pmatrix}_L \qquad \psi_R^{lepton} = l_R^- \qquad \psi_R^{quarks} = u_R,\ d_R$$

The gauge invariance of this theory with respect to the transformations in Equation (1.4) leads to the following expression for the electroweak Lagrangian:

$$\mathcal{L}_{EWK} = i\bar{\psi}\gamma^\mu D_\mu\psi - \frac{1}{4}\vec{W}_{\mu\nu}\cdot\vec{W}^{\mu\nu} - \frac{1}{4}B_{\mu\nu}B^{\mu\nu} \tag{1.5}$$

where this time $D_\mu = \partial_\mu + i\frac{g'}{2}YB_\mu + ig\,\vec{I}\cdot\vec{W}_\mu$. Here $g'$ and g are the coupling constants of the U(1) and SU(2) gauge groups, and $B_\mu$ and $\vec{W}_\mu$ are the corresponding gauge fields, which obey the following transformations:

$$SU(2)_L: \begin{cases} \vec{W}_\mu \to \vec{W}'_\mu = \vec{W}_\mu + \partial_\mu\vec{\alpha}(x) + g\,\vec{\alpha}(x)\times\vec{W}_\mu \\ B_\mu \to B'_\mu = B_\mu \end{cases} \qquad U(1)_Y: \begin{cases} \vec{W}_\mu \to \vec{W}'_\mu = \vec{W}_\mu \\ B_\mu \to B'_\mu = B_\mu + \partial_\mu\beta(x) \end{cases} \tag{1.6}$$

Notice that the addition of fermion mass terms to the electroweak Lagrangian is forbidden, since terms of the form $m\bar{\psi}\psi$ do not respect the $SU(2)_L \times U(1)_Y$ gauge invariance. Nor is it possible to add explicit mass terms for the U(1) and SU(2) gauge fields. The Higgs mechanism is responsible for generating the gauge boson masses, while the fermion masses result from Yukawa-type interactions with the Higgs field.

1.1.3 The Higgs Mechanism

To explain the existence of massive fermions and bosons, the SM predicts a scalar boson, the Higgs boson H, which provides mass to the particles through a mechanism of spontaneous symmetry breaking [2–5] while preserving the gauge invariance of the Lagrangian. Symmetry breaking occurs when the Lagrangian of the physical system has a symmetry that is no longer respected by the field configuration of its ground state. In the case of the electroweak theory, the symmetry group $SU(2)_L \times U(1)_Y$ is spontaneously broken, preserving the $U(1)_{em}$ symmetry associated with the electric charge, which must remain unbroken in the vacuum.

In order to describe the main idea of symmetry breaking, consider the Lagrangian of a simple model with a complex scalar field φ and a quartic potential:

$$\mathcal{L}_H = \frac{1}{2}(\partial_\mu\phi)^\dagger(\partial^\mu\phi) - V(\phi) = \frac{1}{2}(\partial_\mu\phi)^\dagger(\partial^\mu\phi) - \frac{1}{2}\mu^2\phi^2 - \frac{1}{4}\lambda\phi^4 \tag{1.7}$$

The shape of the potential V(φ) depends on the sign of $\mu^2$:

• if $\mu^2 > 0$, the minimum of the potential is the state with φ = 0;

• if $\mu^2 < 0$, the minimum of the potential lies on the circle of radius $v = \sqrt{-\mu^2/\lambda}$.
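The two cases can be checked numerically. The sketch below scans the potential of Equation (1.7) along the radial direction and locates its minimum; the parameter values are purely illustrative and the helper names are introduced here for the example only.

```python
import math

def potential(phi, mu2, lam):
    """Quartic potential V(phi) = mu2/2 * phi^2 + lam/4 * phi^4 (radial direction)."""
    return 0.5 * mu2 * phi**2 + 0.25 * lam * phi**4

def numerical_minimum(mu2, lam, phi_max=5.0, steps=50001):
    """Brute-force scan of V on [0, phi_max]; returns the phi with the lowest V."""
    grid = [phi_max * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda phi: potential(phi, mu2, lam))

# Illustrative (non-physical) parameter choices
lam = 0.5
phi_sym = numerical_minimum(mu2=+1.0, lam=lam)     # mu^2 > 0: minimum at phi = 0
phi_broken = numerical_minimum(mu2=-1.0, lam=lam)  # mu^2 < 0: minimum at phi = v
v_expected = math.sqrt(1.0 / lam)                  # sqrt(-mu^2 / lambda) with mu^2 = -1

print(f"mu^2 > 0: minimum at phi = {phi_sym:.4f}")
print(f"mu^2 < 0: minimum at phi = {phi_broken:.4f}, expected v = {v_expected:.4f}")
```

For negative $\mu^2$ the scan should land, within the grid spacing, on the analytic value of v, while for positive $\mu^2$ it stays at the origin.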

Figure 1.1.2 shows the shape of the potential in both cases. Replacing the partial derivative with its covariant version $D_\mu = \partial_\mu + i\frac{g'}{2}YB_\mu + ig\,\vec{I}\cdot\vec{W}_\mu$ makes the Lagrangian $\mathcal{L}_H$ locally gauge invariant under the $SU(2)_L \times U(1)_Y$ group. Expanding around the minimum, with a suitable gauge choice, the field φ becomes:

$$\phi(x) = \begin{pmatrix} 0 \\ v + H(x) \end{pmatrix} \tag{1.8}$$

As a result, the Lagrangian becomes:

$$\mathcal{L}_H \simeq \frac{1}{2}(\partial_\mu H)^2 - \lambda v^2 H^2 + \frac{g^2v^2}{8}\left(W_1^2 + W_2^2\right) + \frac{v^2}{8}\left(gW_3^\mu - g'B^\mu\right)^2 + \text{higher order terms} \tag{1.9}$$

It can be seen that the fields $W_1$ and $W_2$ acquire a mass term with $m_W = \frac{gv}{2}$, as does the H field, with $m_H = \sqrt{2v^2\lambda}$. On the other hand, the fields $W_3$ and B are mixed, and after a rotation of these two fields one obtains:

$$\begin{pmatrix} A_\mu \\ Z_\mu \end{pmatrix} = \begin{pmatrix} \cos\theta_W & \sin\theta_W \\ -\sin\theta_W & \cos\theta_W \end{pmatrix} \begin{pmatrix} B_\mu \\ W_\mu^3 \end{pmatrix} \tag{1.10}$$

where $\cos\theta_W = \frac{m_W}{m_Z} = \frac{g}{\sqrt{g^2 + g'^2}}$ defines the Weinberg angle. The masses of $A_\mu$ and $Z_\mu$ are $m_A = 0$ and $m_Z = \frac{v}{2}\sqrt{g^2 + g'^2}$. It is also possible to determine the value of v, which enters the Higgs field definition, by using the experimental value of the Fermi constant $G_F$, measured in muon decay, in the following way:

$$\frac{g^2}{8m_W^2} = \frac{G_F}{\sqrt{2}} \implies v = \frac{1}{\sqrt{G_F\sqrt{2}}} \sim 246\ \text{GeV} \tag{1.11}$$

On the other hand, the Higgs mass is not predicted by the theory, because it depends on the unknown parameter λ that appears in the potential V(φ).
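As a quick numerical check of Equation (1.11), the sketch below evaluates v from the Fermi constant and then inverts $m_H = \sqrt{2v^2\lambda}$ to see which value of λ corresponds to the measured Higgs mass. The value of $G_F$ is an input assumed here (it is not quoted in the text), and the Higgs mass is the one used later in this chapter.

```python
import math

G_F = 1.1663787e-5   # Fermi constant in GeV^-2 (assumed input, measured in muon decay)
m_H = 125.09         # Higgs boson mass in GeV, as quoted later in this chapter

# Equation (1.11): v = 1 / sqrt(sqrt(2) * G_F)
v = 1.0 / math.sqrt(math.sqrt(2.0) * G_F)

# Inverting m_H = sqrt(2 * v^2 * lambda) gives lambda = m_H^2 / (2 * v^2)
lam = m_H**2 / (2.0 * v**2)

print(f"v      = {v:.1f} GeV")
print(f"lambda = {lam:.3f}")
```

With these inputs v comes out at about 246 GeV, and the measured Higgs mass corresponds to λ of roughly 0.13 in the normalisation used above.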

The mass term for fermions cannot be put explicitly in the Lagrangian because it breaks the $SU(2)_L \times U(1)_Y$ symmetry. It is possible to use the Higgs doublet φ to generate the masses of quarks and leptons, by adding to the Lagrangian a Yukawa term that is invariant under gauge transformations:

$$\mathcal{L}_L = -g_l\left(\bar{\psi}_L\phi\,\psi_R + \bar{\psi}_R\phi\,\psi_L\right) = -m_l\bar{\psi}\psi - \frac{m_l}{v}\bar{\psi}\psi H \tag{1.12}$$

where $g_l$ is the coupling constant for leptons and $m_l = \frac{g_l v}{2}$. Both the mass term and the coupling of the Higgs boson to the fermion are present. Similarly, for the quark masses, the conjugate Higgs doublet is used.

Figure 1.1.2: The “Mexican hat” shape of the quartic potential that allows for spontaneous symmetry breaking.

1.2 Higgs Boson Production and Decays

In proton-proton collisions at the centre-of-mass energies currently reached by the LHC (up to 13 TeV), the Higgs boson is expected to be produced mainly through four mechanisms: gluon-gluon fusion (ggF), vector boson fusion (VBF), associated production with a vector boson (VH) and production in association with a $t\bar{t}$ or $b\bar{b}$ pair. The production cross sections are listed in Table 1.2.1, the leading Feynman diagrams are shown in Figure 1.2.1, and a brief description of the mechanisms follows [6].


Gluon-Gluon Fusion. ggF is the dominant production mode, accounting for about 86% of the total cross section (Table 1.2.1). Since the gluon is massless, there is no direct coupling between gluons and the Higgs boson, and the leading diagram involves a quark loop: the dominant contribution to the SM amplitude arises from the top quark loop, the top being the only quark with a mass comparable to that of the Higgs boson. Despite its large production cross section, this production mode can be adequately studied only when the Higgs boson decays to particularly clean final states, allowing an efficient background rejection. The production cross section is potentially sensitive to contributions from hypothetical new massive particles with non-zero colour charge interacting within the strong sector.

Vector Boson Fusion. VBF has a cross section of about a tenth of that of ggF. The leading diagrams involve qq scattering with the exchange of vector bosons and the emission of a real Higgs boson. Since the momentum exchange is typically lower than the centre-of-mass energy of the two quarks, the channel is characterised by two well-separated, high-rapidity quarks in the final state, whose presence can be used as a signature of the VBF production channel: the two jets tend to be in the forward and backward regions of the detector, and this particular topology allows sufficient background rejection, depending also on the Higgs decay mode.

Higgs-Strahlung (VH). This process occurs when an off-shell vector boson (V) radiates a Higgs boson and becomes on-shell. Both the W and Z bosons contribute to this process, which is usually referred to as the VH production mode. Leptons from the electroweak decays of the vector bosons are particularly helpful for triggering and for background rejection in a hadronic environment, as they are easily identified and reconstructed.

Associated Production. ttH associated production, where the t quarks are mostly produced through gluon fusion, plays a role for light Higgs masses, providing a direct measurement of the top-Higgs Yukawa coupling. At $\sqrt{s} = 13$ TeV, its production cross section is rather small compared to the other Higgs production modes. Other processes, namely bbH associated production and single-top associated production, are currently not the object of direct searches, but their contribution is taken into account in global measurements of the Higgs properties.

The total production cross section in proton-proton collisions depends on the centre-of-mass energy. The current SM prediction for a Higgs mass of 125.09 GeV [7] is about 51 pb at $\sqrt{s} = 13$ TeV.

As for any unstable particle, the branching fractions of the Higgs boson are determined by the partial widths of the decays into each final state (χ):

$$BR(H \to \chi) = \frac{\Gamma(H \to \chi)}{\Gamma_H} \tag{1.13}$$

where the total width $\Gamma_H = \sum_\chi \Gamma(H \to \chi)$ is predicted by the SM to be about 4 MeV for $m_H = 125$ GeV [6].

[Figure: Feynman diagrams for (a) ggF, (b) VBF, (c) VH, (d) ggZH and (e) ttH production.]

Figure 1.2.1: Examples of leading order diagrams for different Higgs production modes.

Mode    σ [pb]    σ/σ_tot [%]
ggF     44.08     86.1
VBF      3.78      7.4
WH       1.37      2.7
ZH       0.88      1.7
ttH      0.51      1.0
bbH      0.49      1.0
tH       0.07      0.1

Table 1.2.1: Higgs boson production cross section for each mode, assuming $m_H = 125$ GeV, at NNLO+NNLL QCD accuracy. The total production cross section is predicted to be 51 pb. The relative contribution of each production mechanism to the total cross section is also given [6].
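The relative contributions quoted in Table 1.2.1 follow directly from the individual cross sections. A minimal sketch of the arithmetic, using only the values in the table (the variable names are chosen here for illustration):

```python
# Production cross sections in pb, copied from Table 1.2.1
cross_sections_pb = {
    "ggF": 44.08, "VBF": 3.78, "WH": 1.37, "ZH": 0.88,
    "ttH": 0.51, "bbH": 0.49, "tH": 0.07,
}

total_pb = sum(cross_sections_pb.values())

# Relative contribution of each mode, in percent of the total
fractions_percent = {
    mode: 100.0 * sigma / total_pb for mode, sigma in cross_sections_pb.items()
}

print(f"total = {total_pb:.2f} pb")
for mode, frac in fractions_percent.items():
    print(f"{mode:>4s}: {frac:5.1f} %")
```

The sum of the listed modes is close to the quoted total of 51 pb, and the fractions reproduce the last column of the table after rounding.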

The Standard Model predicts the Higgs boson decay amplitude and its branching ratio for each final state. The Higgs boson decays into pairs of fermions through Yukawa-like interactions, and the decay width at leading order is:

$$\Gamma(H \to f\bar{f}) = \frac{G_F N_C}{4\sqrt{2}\pi}\, m_H m_f^2 \left(1 - \frac{4m_f^2}{m_H^2}\right)^{3/2} \tag{1.14}$$

where $N_C$ is the colour factor, equal to 1 for leptons and 3 for quarks. The dominant decay into a pair of leptons is expected to be into $\tau^+\tau^-$. For hadronic decays, QCD corrections are needed, i.e. the loop contributions from the emission or exchange of a gluon in the final state. If $m_H \gg 2m_q$, NLO calculations lead to:

$$\Gamma_{NLO}(H \to q\bar{q}) = \frac{G_F N_C}{4\sqrt{2}\pi}\, m_H m_q^2 \left[1 + \frac{4}{3}\frac{\alpha_S}{\pi}\left(\frac{9}{4} + \frac{3}{2}\log\frac{m_q^2}{m_H^2}\right)\right] \tag{1.15}$$

where $\alpha_S$ is the strong coupling. In this case the $b\bar{b}$ decay is the dominant one, whereas the decay $H \to t\bar{t}$ is kinematically forbidden.
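As an illustration of the leading-order formula (1.14), the sketch below evaluates the width for the decay into a pair of τ leptons, where no QCD correction is needed. The value of $G_F$ is an assumed input not quoted in the text; the τ mass is the one shown in Figure 1.1.1 and the total width is the approximate 4 MeV quoted above. The function name is hypothetical.

```python
import math

G_F = 1.1663787e-5  # Fermi constant in GeV^-2 (assumed input, not quoted in the text)

def lo_width_ffbar(m_H, m_f, n_c):
    """Leading-order width Gamma(H -> f fbar) in GeV, following Equation (1.14)."""
    prefactor = G_F * n_c / (4.0 * math.sqrt(2.0) * math.pi)
    phase_space = (1.0 - 4.0 * m_f**2 / m_H**2) ** 1.5
    return prefactor * m_H * m_f**2 * phase_space

m_H = 125.09        # Higgs boson mass in GeV
m_tau = 1.777       # tau lepton mass in GeV
gamma_total = 4e-3  # approximate SM total width in GeV

gamma_tau = lo_width_ffbar(m_H, m_tau, n_c=1)  # colour factor 1 for leptons
br_tau = gamma_tau / gamma_total

print(f"Gamma(H -> tau tau) = {gamma_tau * 1e3:.3f} MeV")
print(f"BR(H -> tau tau)    = {100 * br_tau:.1f} %")
```

The resulting width is about a quarter of an MeV, which corresponds to a branching fraction of roughly 6%. For quarks the same function would need the running quark mass and the correction of Equation (1.15), which this sketch does not attempt.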

The decay rate of a Higgs boson into a pair of vector bosons (at least one of which is virtual, since $m_H < 2m_V$) is:

$$\Gamma(H \to VV) = \frac{G_F m_H^3}{16\sqrt{2}\pi}\,\delta_V\left(1 - 4x + 12x^2\right), \qquad x = \frac{m_V^2}{m_H^2} \tag{1.16}$$

where the subscript V stands for either W or Z, and $\delta_Z = 1$, $\delta_W = 2$.

The Higgs boson couples directly only to massive particles, so the decays $H \to gg$ and $H \to \gamma\gamma$ are absent at tree level. These decays are generated by quantum loops, and the dominant contributions to the decay amplitude come from massive particles, the top quark and the W boson. The respective rates are:

$$\Gamma(H \to \gamma\gamma) = \frac{G_F m_H^3}{8\sqrt{2}\pi}\,\frac{\alpha^2}{\pi^2}\,I \tag{1.17}$$

$$\Gamma(H \to gg) = \frac{G_F m_H^3}{36\sqrt{2}\pi}\,\frac{\alpha_S^2(m_H)}{\pi^2}\,I \tag{1.18}$$

where I is a factor depending on $m_t$ and $m_W$. The branching ratios of the dominant decay modes are listed in Table 1.2.2.

mode            BR [%]      mode              BR [%]
$b\bar{b}$      57          $b\bar{b}$        57
$\tau\tau$      6.3         $WW^*$            22
$c\bar{c}$      2.9         $gg$              8.6
$s\bar{s}$      57          $ZZ^*$            2.7
$\mu\mu$        0.02        $\gamma\gamma$    0.2
                            $Z\gamma$         0.01

Table 1.2.2: Branching fractions of the dominant Higgs boson decay modes, assuming $m_H = 125$ GeV [6].


1.2.1 Higgs and the LHC Run I Legacy

In 2012 both the ATLAS and CMS experiments at the LHC announced the observation of a new particle with a mass of approximately 125 GeV and Higgs-like properties [8, 9]. Since then, a great effort has been made to characterise the newly discovered object. The production and decay rates [10–19] and the spin-parity quantum numbers [11, 15, 20–22] of the new boson have been extensively studied. The results show that the properties of the observed particle are consistent with those expected for a SM Higgs boson. Various searches for anomalies in the Higgs boson coupling strengths have been performed, reporting no statistically significant deviations from the SM predictions [23, 24].

The Higgs mass has been measured with high precision using the $H \to \gamma\gamma$ and $H \to ZZ^* \to 4l$ channels, which are characterised by an excellent mass resolution (1-2 GeV). Good compatibility between the two channels has been reported by both experiments [21, 25].

The spin-parity properties have been studied exploiting the $H \to \gamma\gamma$, $H \to ZZ^* \to 4l$ and $H \to WW^* \to l\nu l\nu$ modes. The observed data have rejected the spin-1 and spin-2 hypotheses and, assuming that the boson has zero spin, have been shown to be consistent with the pure scalar hypothesis, $J^P = 0^+$, as predicted by the SM [22, 26].

1.2.2 Higgs Boson as a Portal for New Physics

The discovery of the Higgs boson and the measurement of its properties represent one of the greatest achievements of Run 1 of the LHC. The characterisation of the Higgs boson is however far from complete, and new measurements are expected already by the end of Run 2.

In dark matter (DM) scenarios [27], the Higgs boson acts as the mediator between the SM and the DM particles, possibly introducing invisible decay modes; bounds on these decays provide the strongest limits for low-mass dark matter candidates.

Furthermore, the Higgs boson can be used as a tool in searches for new physics phenomena beyond the SM. An important test would be the reconstruction of the Higgs potential through the measurement of the trilinear and quartic Higgs self-couplings, which are accessible in multi-Higgs production processes. In the SM, double Higgs production is the golden channel for studying the details of the Higgs potential at hadron colliders, but the predicted cross section is extremely small, of the order of a few fb. Despite the great improvements in Run 2, SM double Higgs production is expected to be accessible only at higher integrated luminosities [28].

However, enhanced double Higgs production is expected in many scenarios of new physics, preferentially within the electroweak sector. Tree-level couplings of the Higgs boson to fermions and gauge bosons are expected to be modified with respect to the SM prediction.

Clear modifications of the couplings can also arise in weakly-coupled extensions of the SM, such as supersymmetry, through the mixing of the Higgs boson with new states. Warped Extra Dimensional theories also predict the existence of a massive scalar (Radion) or a spin-2 particle (KK-Graviton), produced by gluon fusion, which could decay into a pair of Higgs bosons. This thesis uses these models as benchmarks for the production of signal events.

Warped Extra Dimensions

A large class of models suggests the existence of Warped Extra Dimensions (WED) [29], in order to explain the scale difference between gravity and the other fundamental interactions. In the WED models, resonant Higgs pair production is mediated by either a spin-2 massive Kaluza-Klein (KK) Graviton or by a spin-0 Radion.

In the simplest case, as first proposed by Randall and Sundrum, one spatial extra dimension of length L is introduced, compactified between two fixed points commonly called branes. The region between the branes is called the bulk and is described by an exponentially warped five-dimensional metric:

$$ds^2 = e^{-2ky} g_{\mu\nu}\,dx^\mu dx^\nu + dy^2 \tag{1.19}$$

where y is the coordinate in the 5th dimension. The gap between the Planck scale and the electroweak scale is controlled in the metric by the warp factor k, which parametrises the curvature. The brane where the density of the extra-dimensional metric is localised, defined by y = 0, is called the “Planck brane” or ultraviolet (UV) brane, while the other, at y = L, where the typical SM energies are localised, is called the “TeV brane” or infrared (IR) brane.

Perturbations of the space-time metric project onto the four-dimensional effective theory as towers of KK states. The zero mode of such oscillations corresponds to the massless mediator of gravity, the graviton, while the first massive graviton excitation is the KK-Graviton. Similarly, fluctuations of the metric along the extra dimension y produce the Radion field and its related KK states.

The mass of the KK-Graviton is proportional to the scale of the theory, $\Lambda_G = \sqrt{8\pi}\,e^{-kL} M_{Pl}$, as:

$$m_G \propto \frac{k}{M_{Pl}}\,\Lambda_G \tag{1.20}$$

The product of the curvature and the length of the extra dimension, kL ∼ 35, is determined by the difference between the Planck and electroweak scales through the warp factor of Equation (1.19). The Radion scale is related to that of the KK-Graviton by $\Lambda_R = \sqrt{6}\,\Lambda_G$ [30].
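The order of magnitude of kL can be understood with a short estimate: the warp factor $e^{-kL}$ has to bring the Planck scale down to the TeV scale. The sketch below solves this condition for kL and also evaluates the Radion scale for a given $\Lambda_G$. The Planck mass and the target scale are illustrative inputs assumed here, and the $\sqrt{8\pi}$ normalisation is ignored, so the result is only indicative.

```python
import math

# Illustrative inputs (assumptions, not values quoted in the text)
M_PLANCK_GEV = 1.22e19     # Planck mass in GeV
TARGET_SCALE_GEV = 1.0e3   # a TeV-scale value for e^{-kL} * M_Pl

# Solve e^{-kL} * M_Pl = target scale for the product kL
kL = math.log(M_PLANCK_GEV / TARGET_SCALE_GEV)

# Radion scale for Lambda_G = 1 TeV, using Lambda_R = sqrt(6) * Lambda_G
lambda_G_tev = 1.0
lambda_R_tev = math.sqrt(6.0) * lambda_G_tev

print(f"kL       = {kL:.1f}")
print(f"Lambda_R = {lambda_R_tev:.2f} TeV")
```

The estimate gives a value of kL in the mid-thirties, of the same size as the kL ∼ 35 quoted above; the exact number depends on the choice of scales and on the normalisation of the Planck mass.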

There are two possible ways of describing a KK-Graviton, depending on where the SM matter fields are localised: they can be confined to the IR brane (RS1) [29] or be allowed to explore the 5th dimension as well. In the latter configuration, known as bulk-RS [31], the Higgs doublet is localised on the IR brane with the gauge bosons, while the other SM matter fields are localised on the UV brane and allowed to propagate in the extra-dimensional bulk. In RS1, instead, only gravity is allowed to propagate in the extra-dimensional bulk, whereas the SM fields are localised on the IR brane. In RS1, the KK-Graviton and the Radion couple to light quarks and gluons with the same coefficient, while in the bulk-RS scenario they couple preferentially to the H, Z, W and t, and the couplings to light fermions are suppressed. In the bulk-RS scenario, since the couplings to fermions are suppressed, gluon fusion is the dominant process for both Radion and KK-Graviton production at the LHC [32]. Figure 1.2.2 shows the production cross sections for resonant di-Higgs production, where the intermediate state X is either the KK-Graviton or the Radion.

WED scenarios were probed at the LHC during Run 1. The most sensitive channels for resonant di-Higgs production are $HH \to b\bar{b}\gamma\gamma$ up to $m_X = 400$ GeV and $HH \to b\bar{b}b\bar{b}$ for higher-mass resonances. The limits set by CMS with Run 1 data exclude Radions with $\Lambda_G = 1$ TeV for masses ranging from 260 GeV to 1.5 TeV. The exclusion plots are shown in Figure 1.2.3 [33–35].

This thesis presents a model-independent search for resonant di-Higgs production in which both Higgs bosons decay into a $b\bar{b}$ pair. The natural width of the resonance is assumed to be narrow compared to the experimental resolution. The four b-jet final state is chosen to exploit the largest branching ratio of the SM Higgs boson. Despite the overwhelming QCD background, this fully hadronic final state is expected to be quite competitive in searches for new physics. For a resonance with a mass larger than 500 GeV and negligible natural width, several studies have shown that this channel could be sensitive to a σ × BR as small as a few fb with the LHC data collected in 2016.
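To give a feeling for the numbers involved, the sketch below combines the branching ratio of Table 1.2.2 with the integrated luminosity of the 2016 dataset. The cross section and the selection efficiency used in the example are hypothetical values chosen only to illustrate the arithmetic, and the function name is introduced here.

```python
BR_H_BB = 0.57             # BR(H -> b bbar) from Table 1.2.2
LUMINOSITY_FB_INV = 35.9   # integrated luminosity of the 2016 dataset, in fb^-1

# Both Higgs bosons must decay to b bbar, so the branching fractions multiply
br_hh_4b = BR_H_BB ** 2

def expected_events(sigma_hh_fb, efficiency=1.0):
    """Expected X -> HH -> 4b events for a given sigma(X) * BR(X -> HH) in fb."""
    return sigma_hh_fb * br_hh_4b * LUMINOSITY_FB_INV * efficiency

# Hypothetical example: sigma * BR(X -> HH) = 10 fb and a 10% selection efficiency
n_produced = expected_events(10.0)
n_selected = expected_events(10.0, efficiency=0.10)

print(f"BR(HH -> 4b) = {br_hh_4b:.3f}")
print(f"produced: {n_produced:.0f} events, selected: {n_selected:.0f} events")
```

About a third of all Higgs boson pairs decay into four b-quarks, which is why this final state retains a usable number of signal events even for cross sections of a few fb.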


Figure 1.2.2: Production cross sections times branching ratio as a function of the KK-Graviton and Radion masses.

Figure 1.2.3: Observed and expected 95% CL upper limits on the product of the production cross section of a narrow resonance and the corresponding branching ratio for different decay modes. The green and yellow bands represent, respectively, the 1 and 2 standard deviation extensions beyond the expected limit. Theory predictions corresponding to WED models are also shown.


2

The experimental framework

The data analysed in this thesis were recorded during 2016 by the Compact Muon Solenoid (CMS) detector [36] at the Large Hadron Collider (LHC) [37] at CERN. This chapter provides a brief description of the experimental framework, both the accelerator system and the detector, focusing on the trigger and tracking system, which are the most relevant elements for this analysis.

2.1 Large Hadron Collider

The LHC is a two-ring hadron accelerator and collider operated by the European Organization for Nuclear Research (CERN). It was built in the 26.7 km underground tunnel previously used for the Large Electron Positron collider (LEP). The LEP tunnel has eight straight sections and eight arcs and lies between 45 m and 170 m below the surface, on a plane inclined at 1.4% and sloping towards Lake Léman. The LHC is designed to collide proton beams at a centre-of-mass energy (√s) of up to 14 TeV and an instantaneous luminosity L(t) of $10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$, as well as lead ion beams ($^{208}\mathrm{Pb}^{82+}$) with a beam energy of up to 2.76 TeV/nucleon, which corresponds to √s = 1.15 PeV for ion-ion collisions, at an instantaneous luminosity of L(t) = $10^{27}\,\mathrm{cm^{-2}\,s^{-1}}$. This thesis uses data from proton-proton collisions taken in 2016 during Run 2 (Figure 2.1.2).

Dipole magnets and focusing quadrupoles are located in each arc. The beams are guided around the accelerator ring by a strong magnetic field (Bmax = 8.33 T) maintained by 1232 superconducting niobium-titanium (NbTi) dipole magnets. In addition, a total of 392 quadrupole magnets focus the beams, while superconducting radio-frequency cavities, tuned to oscillate at 400 MHz, accelerate them.

The LHC is the final part of the CERN accelerator complex shown in Figure 2.1.1. Protons are obtained by ionising hydrogen gas and are accelerated in steps by a chain of accelerators. The first step is the Linear Accelerator (LINAC 2), followed by the Proton Synchrotron Booster (PSB), the Proton Synchrotron (PS) and, finally, the Super Proton Synchrotron (SPS), where protons are accelerated up to an energy of 450 GeV and grouped into beams. They are then injected into the two LHC rings, where they circulate in opposite directions because the magnetic fields in the two rings are opposite. Each beam is split into 2808 bunches with a nominal number of protons per bunch N = $1.15 \times 10^{11}$. The further acceleration, up to the collision energy, is performed by the radio-frequency (RF) cavity system, which at the same time adjusts the shape of the beams.
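As a rough illustration of what the nominal filling scheme implies, the sketch below estimates the energy stored in one proton beam from the number of bunches, the protons per bunch and the energy per proton quoted above. The function name is hypothetical, and the 7 TeV per beam used for the second estimate is an assumption that follows from the 14 TeV design centre-of-mass energy, not the 2016 running conditions.

```python
# Illustrative sketch only: energy stored in one LHC proton beam.
JOULES_PER_EV = 1.602176634e-19  # exact SI value of the electronvolt


def stored_beam_energy_joules(n_bunches, protons_per_bunch, proton_energy_ev):
    """Total energy carried by one beam, in joules."""
    return n_bunches * protons_per_bunch * proton_energy_ev * JOULES_PER_EV


# Nominal filling scheme from the text, at the SPS injection energy (450 GeV).
injection = stored_beam_energy_joules(2808, 1.15e11, 450e9)

# Same filling scheme at the design energy, assumed to be 7 TeV per beam
# (half of the 14 TeV design centre-of-mass energy).
design = stored_beam_energy_joules(2808, 1.15e11, 7e12)

print(f"injection: {injection / 1e6:.1f} MJ, design: {design / 1e6:.0f} MJ")
```

The stored energy scales linearly with each of the three inputs, which is why the beam at collision energy carries more than ten times the energy it has at injection.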

Four experiments with different characteristics and purposes are located at the four interaction points. ATLAS (A Toroidal LHC ApparatuS) and CMS (Compact Muon Solenoid) are designed to investigate a broad range of phenomena. Their focus includes the measurement of the Higgs boson and the exploration of the energy frontier in a quest for new physics at the TeV scale. ALICE (A Large Ion Collider Experiment) is a detector optimised for heavy-ion collisions, designed to study the physics of the strong interaction at extremely high energy densities, where a phase of matter called quark-gluon plasma forms. The Large Hadron Collider beauty (LHCb) experiment specialises in precise measurements of CP-violating observables to search for indirect evidence of New Physics.


[Figure 2.1.2 plot: CMS Integrated Luminosity, pp, 2016, √s = 13 TeV. Horizontal axis: date (UTC), 1 May to 1 Oct; vertical axis: total integrated luminosity (fb−1). Data included from 2016-04-22 22:48 to 2016-10-27 14:12 UTC. LHC delivered: 41.07 fb−1; CMS recorded: 37.82 fb−1.]

Figure 2.1.2: Cumulative luminosity versus day delivered to (blue) and recorded by (orange) CMS during stable beams for p-p collisions at a centre-of-mass energy of 13 TeV in 2016 [39].

2.2 The CMS experiment at LHC

The Compact Muon Solenoid detector is a multipurpose detector designed to fully reconstruct proton-proton collision events. The primary physics motivations for the CMS detector were to test the Standard Model (SM), with particular attention to the search for the SM Higgs boson, discovered in 2012 [8, 9], and to search for possible signatures of physics beyond the Standard Model at the TeV scale.

The detector is designed around the cylindrical shape of the superconducting solenoid, which provides a uniform magnetic field of 3.8 T. The structure consists of two regions: the barrel (|η| ≤ 1.2), made of sub-detectors positioned at increasing values of the cylinder radius, and the endcaps (|η| ≥ 1.2), where sub-detectors are layered along the beam axis. The solenoid itself is 13 m long with a 6 m diameter. It contains, from the inside out, the tracker and the electromagnetic and hadronic calorimeters. Outside the magnet coil, the iron return yoke of the magnet hosts the muon spectrometer, used for the reconstruction of muon tracks.

The main features of the CMS detector are the superconducting solenoid, which allows a compact design with a strong magnetic field, a high-quality tracking system, a high-resolution and high-granularity electromagnetic calorimeter, a hermetic hadronic calorimeter and a redundant muon system. The overall length is 21.6 m, the diameter 14.6 m and the total weight about 14 500 t. An inner view of the CMS detector is shown in Figure 2.2.1.

The coordinate system used by the experiments at the LHC has its origin fixed at the nominal collision point. The X-axis points towards the centre of the LHC ring, the Y-axis points upwards, and the Z-axis points along the counter-clockwise beam direction. The azimuthal angle φ is measured from the X-axis in the X-Y plane and the polar angle θ is measured from the positive Z-axis. CMS uses a more suitable cylindrical coordinate system (r, φ, η), with r being the distance from the Z-axis and η the pseudo-rapidity, defined as η = − ln tan(θ/2). The pseudo-rapidity is the relativistic limit of the rapidity of a particle,

$$ y_z = \frac{1}{2} \ln \frac{E + p_z c}{E - p_z c}, $$

which depends on the particle energy (E) and longitudinal momentum (pz).
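The relation between pseudo-rapidity and rapidity can be checked numerically. The following is a minimal sketch, in natural units (c = 1), that evaluates both quantities for a particle emitted at a fixed polar angle with a low and a high momentum. The helper names and the choice of a charged-pion mass of 0.1396 GeV are assumptions made for illustration only.

```python
import math


def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))


def rapidity(energy, pz):
    """y = 0.5 * ln((E + pz) / (E - pz)), in natural units (c = 1)."""
    return 0.5 * math.log((energy + pz) / (energy - pz))


def energy_and_pz(p, theta, mass):
    """Return (E, pz) for momentum magnitude p, polar angle theta and mass."""
    return math.sqrt(p * p + mass * mass), p * math.cos(theta)


theta = math.radians(30.0)
pion_mass = 0.1396  # GeV, illustrative choice

eta = pseudorapidity(theta)
y_soft = rapidity(*energy_and_pz(0.5, theta, pion_mass))   # p = 0.5 GeV
y_hard = rapidity(*energy_and_pz(50.0, theta, pion_mass))  # p = 50 GeV

print(f"eta = {eta:.4f}, y(0.5 GeV) = {y_soft:.4f}, y(50 GeV) = {y_hard:.4f}")
```

For the high-momentum particle the mass is negligible compared to the momentum and the rapidity approaches η, while for the low-momentum one the two differ visibly.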

In a typical collision, the centre of mass is boosted along the Z-axis with respect to the laboratory frame. The kinematics of the collision products are therefore conveniently described by the coordinates (pT, yz, φ, m). Here, m indicates the invariant mass of the particle and pT the transverse momentum, given by pT = p sin θ. The transverse momentum, the azimuthal angle and the mass are invariant under boosts along the Z-axis, and the rapidity changes only by an additive constant. The difference in rapidity between two particles is therefore invariant under boosts along the Z-axis.
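The additive behaviour of the rapidity under longitudinal boosts can be verified with a few lines of code. The sketch below, in natural units, boosts two particles along the Z-axis and compares their rapidity difference before and after. The four-momentum components and the boost velocity are hypothetical values chosen only so that E > |pz|.

```python
import math


def rapidity(energy, pz):
    """y = 0.5 * ln((E + pz) / (E - pz)), in natural units (c = 1)."""
    return 0.5 * math.log((energy + pz) / (energy - pz))


def boost_z(energy, pz, beta):
    """Lorentz boost of (E, pz) along z to a frame moving with velocity beta."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (energy - beta * pz), gamma * (pz - beta * energy)


# Two hypothetical particles, given as (E, pz) in GeV.
particle_a = (30.0, 12.0)
particle_b = (45.0, -20.0)
beta = 0.6

dy_before = rapidity(*particle_a) - rapidity(*particle_b)
dy_after = (rapidity(*boost_z(*particle_a, beta))
            - rapidity(*boost_z(*particle_b, beta)))

# Each individual rapidity is shifted by the same constant, atanh(beta).
shift = rapidity(*particle_a) - rapidity(*boost_z(*particle_a, beta))

print(dy_before, dy_after, shift, math.atanh(beta))
```

The transverse momentum is not touched by the boost at all, which is why only E and pz need to be transformed here.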

Figure 2.2.1: A cutaway view of the CMS detector [40].

2.2.1 The inner tracking system

The tracker [41, 42] constitutes the innermost part of CMS and is designed to provide a precise measurement of charged-particle tracks and an efficient reconstruction of the primary and secondary interaction vertices.

The tracker occupies a cylindrical volume inside an almost uniform co-axial magnetic field of 3.8 T provided by the CMS solenoid [43]. Its total length and diameter are 5.8 m and 2.5 m, respectively, and its angular coverage reaches up to |η| = 2.5, for a total active surface of 210 m².

A silicon pixel detector is installed in the innermost region, closest to the interaction point, while silicon micro-strip detectors are used in the outer region. The pixel tracker consists of three co-axial barrel layers at radii between 4.4 cm and 10.2 cm and two pairs of endcap disks at z = ±34.5 cm and ±46.5 cm, with a hit position resolution of 10 µm in the transverse coordinate and 20-40 µm in the longitudinal coordinate.

The strip tracker is composed of four subsystems, for a total of ten barrel layers and twelve endcap disks of different sizes on each side. The Tracker Inner Barrel (TIB) and Disks (TID) cover r ≤ 55 cm and |z| ≤ 118 cm, and are composed of four barrel layers and three disks on each side, respectively. The Tracker Outer Barrel (TOB) covers r ≥ 55 cm and |z| ≤ 118 cm and consists of six barrel layers. The Tracker EndCaps (TEC) cover the region 124 cm ≤ |z| ≤ 282 cm. Each TEC is composed of nine disks, each containing up to seven concentric rings of silicon strip modules, extending the acceptance of the tracker up to a pseudorapidity of |η| ≤ 2.5. The strip tracker provides a resolution in r-φ of approximately 10-50 µm. A schematic drawing of the CMS tracker is shown in Figure 2.2.2. Green dashed lines delimit the tracker subsystems. Strip tracker modules that provide 2-D hits are shown by thin, black lines, while those permitting the reconstruction of hit positions in 3-D are shown by thick, blue lines. The latter each consist of two back-to-back strip modules, in which one module is rotated. The pixel modules, shown by the red lines, also provide 3-D hits. Within a given layer, adjacent modules overlap, so that gaps in the acceptance are avoided.

Figure 2.2.2: Schematic cross section through the CMS tracker in the r-Z plane. In this view, the tracker is symmetric about the r-axis. The centre of the tracker, corresponding to the approximate position of the pp collision point, is indicated by a star [41].


Tracker Upgrade

The elements of the tracker are exposed to high radiation levels and have a finite lifetime. The pixel system is very close to the interaction region and sees the largest flux of particles. CMS installed a new pixel detector during the 2016-2017 end-of-year shutdown; the data used in this thesis were therefore recorded with the previous version of the detector. Figure 2.2.3 shows a conceptual layout of the Phase 1 upgrade pixel detector. The current 3-layer barrel, 2-disk endcap system is replaced with a 4-layer barrel, 3-disk endcap system for four-hit coverage. The fourth barrel layer, at a radius of 16 cm, also provides a safety margin in case the first silicon strip layer of the TIB degrades more rapidly than expected, but its primary role is to provide redundancy in pattern recognition and to reduce fake rates and reconstruction time at high pile-up.


Figure 2.2.3: a) Comparison between the position of layers and disks in the existing (below the beam pipe) and the new (above the beam pipe) pixel trackers. b) Transverse view of the pixel barrels of the current (left) and the new (right) pixel detector [42].

The performance of the upgraded detector has been compared to that of the current detector. Figure 2.2.4 shows an example of such studies on simulated t¯t samples with 0, 25, 50 and 100 pile-up interactions, comparing the efficiency and misidentification rate of the two pixel detectors. The track reconstruction efficiency and fake rate are defined as follows:

$$ \text{Tracking efficiency} = \frac{\text{Number of truth tracks matched to reconstructed tracks}}{\text{Number of truth tracks}} $$

$$ \text{Track fake rate} = \frac{\text{Number of reconstructed tracks not matched to truth tracks}}{\text{Number of reconstructed tracks}} $$
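The two ratios can be computed directly from a truth-matching table. The following is a minimal sketch in which each reconstructed track carries the identifier of the truth track it is matched to, or `None` when no match is found. The function name, the data layout and the example identifiers are hypothetical, and the fake rate is normalised to the number of reconstructed tracks, which is the usual convention assumed here.

```python
def tracking_metrics(truth_ids, reco_matches):
    """Return (efficiency, fake_rate).

    truth_ids:    identifiers of the simulated (truth) tracks.
    reco_matches: one entry per reconstructed track, holding the identifier
                  of the matched truth track or None if unmatched.
    """
    truth = set(truth_ids)
    # Truth tracks that are matched by at least one reconstructed track.
    matched_truth = {m for m in reco_matches if m is not None and m in truth}
    # Reconstructed tracks with no valid truth match (fakes).
    n_fakes = sum(1 for m in reco_matches if m is None or m not in truth)

    efficiency = len(matched_truth) / len(truth) if truth else float("nan")
    fake_rate = n_fakes / len(reco_matches) if reco_matches else float("nan")
    return efficiency, fake_rate


# Five truth tracks; five reconstructed tracks, one of them unmatched and
# two of them matched to the same truth track (a duplicate).
eff, fake = tracking_metrics(truth_ids=[1, 2, 3, 4, 5],
                             reco_matches=[1, 2, 2, 4, None])
print(eff, fake)
```

Using a set for the matched truth tracks means that a truth track reconstructed twice is counted only once in the efficiency, while both reconstructed copies still enter the denominator of the fake rate.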


Figure 2.2.4: Tracking efficiency (a,c) and fake rate (b,d) for the t¯t sample as a function of track η, for the current detector (a,b) and the upgrade pixel detector (c,d). Results are shown for zero pileup (blue squares), an average pileup of 25 (red dots), an average pileup of 50 (black diamonds), and an average pileup of 100 (brown triangles) [42].

2.2.2 Calorimeter system

Around the tracker, two calorimeter layers measure the particle energies and provide information complementary to the tracks. Calorimeters are mainly composed of absorber material in which incident particles initiate secondary showers that can be measured. As the primary particles are absorbed in the process, this is a destructive technique. Two distinct sub-detectors are used to achieve an optimal measurement of both electrons and photons and strongly interacting particles.

Showers in electromagnetic calorimeters develop through both photon pair production (γ → e+e−) and electron bremsstrahlung (e± → γe±). A shower grows until the energy of the particles is sufficiently low for them to be captured and absorbed by the surrounding detector material. The fraction of energy a particle loses when crossing a certain amount of material can be expressed in terms of the radiation length X0, the average distance covered by an electron before its energy is reduced through bremsstrahlung to a fraction 1/e of its initial value. One radiation length is also equivalent to 7/9 of the mean free path of the photon, the average distance covered before pair production occurs. In the transverse direction, the extent of a shower is characterised by the Molière radius RM; a cylinder of radius r = RM contains on average 90% of the shower energy. Both X0 and RM are material dependent.

In a hadron calorimeter, hadrons shower through inelastic interactions, including multi-particle production (e.g. pion pair production) and nuclear decays. As hadron showers include neutral pions, which decay electromagnetically, they include a component that can be registered in an electromagnetic calorimeter. The characteristic length used to describe hadronic showers is the nuclear interaction length λ, the average distance crossed before undergoing an inelastic nuclear interaction. λ is also material dependent, and generally larger than X0. Considering the larger characteristic length of hadronic showers as well as the presence of an electromagnetic component, it makes sense for electromagnetic calorimeters to precede hadronic calorimeters in the detector layout.

Technology-wise, two broad categories of calorimeters exist: homogeneous calorimeters, consisting entirely of active material, and sampling calorimeters, alternating active and passive materials. Homogeneous calorimeters can provide excellent energy resolution but are less suited for particle identification due to their lower position accuracy and large thickness requirements. Sampling calorimeters, on the other hand, have a lower energy resolution, but provide better spatial resolution and, thanks to the absorber layers, manage to contain hadron showers with a smaller material thickness. The lower thickness requirement, which makes the detector cheaper, is often the main argument for the use of sampling calorimeters.

ECAL

One of the principal CMS design objectives was to construct a very high-performance electromagnetic calorimeter. The electromagnetic calorimeter (ECAL) is used to measure the energy and the direction of electrons and photons. It played an essential role in the study of the physics of electroweak symmetry breaking, particularly through the exploration of the Higgs sector. The ECAL is a hermetic, homogeneous scintillating crystal calorimeter which, unlike a sampling calorimeter, offers better energy resolution, since most of the energy from electrons or photons is deposited within the active volume of the calorimeter. The active material is lead tungstate (PbWO4). The ECAL is divided into two parts. The barrel part, called ECAL barrel (EB), covers up to |η| = 1.479 and extends from r = 1.3 m to r = 1.8 m; the two endcaps, called ECAL endcaps (EE), cover the region 1.479 ≤ |η| ≤ 3.0 and extend from |z| = 3 m to |z| = 3.8 m. A 20 cm thick pre-shower (ES) stands in front of the EE, covering the region 1.653 ≤ |η| ≤ 2.6. It consists of two active silicon strip layers and passive lead absorber layers, and it helps to distinguish energetic photons from neutral pions (π0), which appear as pairs of closely spaced photons. The longitudinal section of the calorimeter is shown in Figure 2.2.5.

Figure 2.2.5: Longitudinal view of one quarter of the CMS ECAL [44].

When a photon or an electron enters the dense material of the calorimeter, it creates an electromagnetic shower whose scintillation light is registered by photodetectors. PbWO4 was chosen because it has a short radiation length (X0 = 0.89 cm) and a small Molière radius (RM = 2.2 cm). Thanks to these properties, which follow from its high density (ρ = 8.28 g cm−3), the electromagnetic calorimeter has a compact shape and a fine granularity. In addition, PbWO4 is a fast scintillator: about 80% of the light is emitted within 25 ns.
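To give a feeling for these length scales, the sketch below applies the definition of the radiation length and the 7/9 relation for the photon mean free path to the PbWO4 value X0 = 0.89 cm quoted above. The function names are hypothetical, and the exponential law only describes the average bremsstrahlung energy loss of an electron, not the full shower development.

```python
import math

X0_PBWO4_CM = 0.89  # radiation length of PbWO4 quoted in the text


def mean_energy_fraction(depth_cm, x0_cm=X0_PBWO4_CM):
    """Average fraction of the initial electron energy left after depth_cm,
    considering bremsstrahlung losses only: exp(-x / X0)."""
    return math.exp(-depth_cm / x0_cm)


def photon_mean_free_path_cm(x0_cm=X0_PBWO4_CM):
    """Mean free path for pair production, from X0 = (7/9) * lambda_pair."""
    return 9.0 / 7.0 * x0_cm


def depth_in_radiation_lengths(depth_cm, x0_cm=X0_PBWO4_CM):
    """Express a material thickness in units of X0."""
    return depth_cm / x0_cm


print(f"energy fraction after 1 X0: {mean_energy_fraction(X0_PBWO4_CM):.3f}")
print(f"photon mean free path: {photon_mean_free_path_cm():.3f} cm")
print(f"10 cm of PbWO4 = {depth_in_radiation_lengths(10.0):.1f} X0")
```

A short radiation length means that a given crystal length corresponds to many X0, which is what allows the calorimeter to stay compact.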

The energy resolution of a calorimeter can be parametrised as

$$ \left(\frac{\sigma}{E}\right)^2 = \left(\frac{a}{\sqrt{E}}\right)^2 + \left(\frac{\sigma_n}{E}\right)^2 + c^2 \qquad (2.1) $$

where a is the stochastic term, which includes the effects of fluctuations in the number of photo-electrons, σn is the noise from the electronics and pile-up, and c is a constant term related to the calibration of the calorimeter. The values of the three constants measured in test beams are reported in Table 2.2.1 and the trend of the energy resolution is shown in Figure 2.2.6.

Contribution              EB (η = 0)    EE (η = 2)
Stochastic term           2.7%/√E       5.7%/√E
Constant term             0.55%         0.55%
Noise (high luminosity)   0.155 MeV     0.205 MeV
Noise (low luminosity)    0.210 MeV     0.245 MeV

Table 2.2.1: Contribution to the energy resolution of ECAL (the energy E is expressed in GeV) [44].
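As an illustration of how the three terms of Equation (2.1) combine, the sketch below evaluates the relative resolution for the EB column of Table 2.2.1 at a few energies. The function name is hypothetical, and the noise value is converted from the quoted MeV to GeV on the assumption that all terms must share the energy unit of E.

```python
import math


def relative_energy_resolution(energy_gev, stochastic, noise_gev, constant):
    """sigma/E from the quadrature sum of Equation (2.1):
    (a / sqrt(E))^2 + (sigma_n / E)^2 + c^2, with E in GeV."""
    return math.sqrt(
        (stochastic / math.sqrt(energy_gev)) ** 2
        + (noise_gev / energy_gev) ** 2
        + constant ** 2
    )


# EB values at eta = 0 from Table 2.2.1 (high-luminosity noise).
# Assumption: the quoted 0.155 MeV is converted to GeV to match E.
EB = dict(stochastic=0.027, noise_gev=0.155e-3, constant=0.0055)

for energy in (10.0, 50.0, 100.0):
    sigma_over_e = relative_energy_resolution(energy, **EB)
    print(f"E = {energy:5.1f} GeV  sigma/E = {100.0 * sigma_over_e:.2f}%")
```

With these inputs the stochastic term dominates at low energy, while at high energy the resolution flattens towards the constant term c.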

Figure 2.2.6: Energy resolution as a function of electron energy with and without a preshower consisting of a single layer of 2.5 X0 of lead [44].

HCAL

The hadron calorimeters are particularly important for the measurement of hadron jets and neutrinos or exotic particles resulting in apparent missing transverse energy.

The operating principle of the HCAL is the following: a particle traversing the absorber produces a shower, and a scintillator then measures the light produced by the charged particles in the shower. A large part of the calorimeter is inside the superconducting solenoid, so it is essential to use non-ferromagnetic materials. The absorber is made of brass, which has a short interaction length and is easy to shape. The scintillator is read out through wavelength-shifting fibres. The HCAL is divided into four parts: hadron barrel (HB), hadron endcaps (HE), hadron outer (HO) and hadron forward (HF).

Figure 2.2.7 shows the longitudinal view of the CMS detector. The HB and HE sit behind the tracker and the electromagnetic calorimeter as seen from the interaction point. The HB is radially restricted between the outer extent of the EB and the inner part of the magnet coil (1.77 m ≤ r ≤ 2.95 m). The HB covers the region |η| ≤ 1.3, while the HE covers the region up to |η| = 3. In the central region of CMS, the calorimeters are not able to fully contain very energetic jets, and the tail of the jet is measured by a tail catcher (HO) placed outside the coil, complementing the barrel calorimeter. Beyond |η| = 3, the forward hadron calorimeters, placed at 11.1 m ≤ |z| ≤ 14 m from the interaction point, extend the pseudorapidity coverage up to |η| = 5.2. The main challenge for such a detector is to cope with the very large particle flux in the forward direction.

The energy resolution of the central and endcap HCAL parts can be fitted with a parametrisation similar to the one used for the ECAL and shown in Equation (2.1). In the case of the HCAL, the contribution from the noise σn is negligible. In the energy range 2–300 GeV, the stochastic and constant terms are, respectively, $a = (84.7 \pm 1.6)\%\ \mathrm{GeV}^{1/2}$ and $c = (7.4 \pm 0.8)\%$ [45].

Figure 2.2.7: Longitudinal view of the CMS detector showing the locations of the hadron barrel (HB), endcap (HE), outer (HO) and forward (HF) calorimeters: the dashed lines are at fixed η values [36].

2.2.3 Muon system

Muons do not interact strongly, and their energy loss in matter is essentially due to ionisation: they lose about 3 GeV when traversing the entire detector volume, while most of the other interaction products are stopped by the ECAL and HCAL. For this reason, the muon system is the outermost part of the CMS detector. The muon system and the inner tracker together provide a measurement of the muon momentum; to achieve good physics performance, the muon system is therefore designed to deliver a fine transverse momentum resolution. Because the backgrounds are significantly reduced by the calorimeters, highly efficient stand-alone muon identification by the muon system is possible.

The muon system is divided into barrel and endcap sections (see Figure 2.2.8) and uses three different gas detector technologies to measure the muons: drift tubes (DT) in the barrel region, cathode strip chambers (CSC) in the endcap region, and resistive plate chambers (RPC) in both the barrel and the endcaps.

The barrel section covers the |η| ≤ 1.2 region, where standard drift tube (DT) chambers are used. The DT chambers are organised into four stations. The first three stations, each composed of 12 chambers, are hosted inside the magnetic field return plates. Four chambers provide measurements in the rφ-plane, four measure a coordinate in the rθ-plane and four provide measurements in the r-z plane. The fourth station is placed outside the magnetic field; it provides only rφ-coordinate measurements and is composed of 8 chambers, which are separated as much as possible to achieve the best angular resolution. The two endcap sections allow muons to be identified in the 0.9 ≤ |η| ≤ 2.4 region. In the endcaps, where the muon rates are higher than in the barrel, cathode strip chambers (CSC) are used to ensure good performance in a high-background environment and in non-uniform magnetic field conditions. The advantages of the CSC are their fine segmentation, fast response time and radiation hardness. Each endcap section consists of four CSC stations inter-layered between the magnet flux return plates. In each CSC station, six layers of approximately perpendicular anode wires and cathode strips provide efficient muon tracking and robust pattern recognition for background rejection. Both cathodes and anodes are read out, delivering rφ and η measurements, respectively. The anodes also provide good time resolution, which is used to identify the beam-crossing time of a muon.

In addition to the DT and CSC, resistive plate chambers (RPC) are used in the |η| ≤ 1.6 region. RPCs are gaseous parallel-plate detectors with an intrinsic time resolution of about 2 ns, made of double-gap modules operating in avalanche mode. Six RPC stations in the barrel region and three RPC stations in the endcap region provide additional pT trigger capabilities, thanks to the fast RPC response time, and extra information for resolving tracking ambiguities.

2.2.4 The trigger

At √s = 13 TeV, with the design LHC luminosity of L(t) = $10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$ and a total inelastic cross section of approximately 70 mb [47], proton-proton collisions occur at a rate of about $10^{9}$ events per second. After zero suppression, the data size per event is of the order of 1 MB. Because the LHC bunch crossing frequency is 40 MHz (bunches spaced by 25 ns), handling, storing and processing such a huge amount of data is not technically feasible, which imposes a reduction factor on the rate of events that can be written to permanent storage. Furthermore, the majority of collisions are not of interest.
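The orders of magnitude quoted above follow from a one-line calculation, rate = luminosity × cross section. The sketch below reproduces it and derives two related quantities, the average number of interactions per bunch crossing and the raw data throughput that would result without a trigger. The function name is hypothetical, and the estimate assumes for simplicity that every 25 ns crossing contains colliding bunches and that 1 MB means 10^6 bytes.

```python
MILLIBARN_IN_CM2 = 1e-27  # 1 mb = 1e-27 cm^2


def interaction_rate_hz(luminosity_cm2_s, cross_section_mb):
    """Interaction rate = instantaneous luminosity x cross section."""
    return luminosity_cm2_s * cross_section_mb * MILLIBARN_IN_CM2


rate_hz = interaction_rate_hz(1e34, 70.0)

# Simplifying assumption: all 40 MHz crossings are filled.
crossing_rate_hz = 40e6
interactions_per_crossing = rate_hz / crossing_rate_hz

# Untriggered throughput if every crossing were stored at ~1 MB per event.
event_size_bytes = 1e6
raw_throughput_tb_s = crossing_rate_hz * event_size_bytes / 1e12

print(f"interaction rate: {rate_hz:.1e} Hz")
print(f"interactions per crossing: {interactions_per_crossing:.1f}")
print(f"untriggered throughput: {raw_throughput_tb_s:.0f} TB/s")
```

The gap between this throughput and what can realistically be written to storage is what sets the rejection factor required from the trigger.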


Figure 2.2.8: Locations of the various muon stations and the steel disks: the four drift tube stations (DT, in light orange), the cathode strip chambers (CSC, in green) and the resistive plate chambers (RPC, in blue) [46].

The Trigger and Data Acquisition System (TriDAS) [48] performs a real-time selection and recording of the useful events, reducing the number of events that are archived for later offline analysis. The selection is based on the physics content of the events, so the online algorithms must approach the level of sophistication of the offline reconstruction. The selection criteria should be as inclusive as possible, so that unexpected new phenomena are not discarded; the selected events must therefore be tagged to indicate the reason for their selection. The required rejection power is too large to be achieved in a single processing step. For this reason, the full selection task is split into two stages.

The Level-1 Trigger (L1), which reduces the rate to at most 100 kHz, is hardware based: it uses custom electronics to implement the event selection using information from the calorimeters and muon systems only. The High-Level Trigger (HLT), designed to reduce the rate to the maximum that the data acquisition system can handle (100 Hz - 1 kHz), is a processor farm of several thousand CPU cores, capable of processing a maximum rate of 10 MHz. Events accepted by the L1 trigger are read out and passed to the HLT, which uses the complete detector information, including input from the tracker.

L1 trigger

The L1 trigger operates at the hardware level and, using simplified information, makes an accept-reject decision in less than 3.2 µs. During this time, while the decision is taken on lower-resolution data, the full detector information of the event is held in local buffers, whose capacity is limited to 128 bunch crossings. The L1 Trigger System is organised into three major subsystems: the L1 calorimeter trigger, the L1 muon trigger, and the L1 global trigger. The first level is made of the calorimeter trigger towers, which measure the energy deposited in the calorimeters (HCAL, ECAL and HF), and of the muon signals (DT, CSC and RPC). The second level is composed of regional triggers, which compute quantities such as the muon momentum, jet quantities, and the total and missing transverse energy, and then pass the data to the global triggers. Finally, at the third level, the Global Trigger (GT) selects the events that pass predefined criteria. An event is accepted if it satisfies all the requirements of at least one of the GT algorithms (logical OR of the individual algorithms). The GT can execute up to 130 algorithms in parallel, ranging from basic algorithms consisting of simple pT or ET thresholds on a single object to sophisticated algorithms based on topological selections. The result of each algorithm is represented by one bit, which indicates whether the event passed the algorithm requirements. If an L1 accept decision is made, the entire detector information is read out and passed to the event builder network (EB), which assembles a global event that is then transferred to the HLT.
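The decision logic of the Global Trigger, one bit per algorithm combined with a logical OR, can be sketched in software as follows. This is only an illustration of the logic: the real L1 runs in custom hardware, and the event fields, algorithm definitions and thresholds below are all hypothetical.

```python
def global_trigger_word(event, algorithms):
    """Return an integer whose bit i is set if algorithm i accepts the event."""
    word = 0
    for bit, algorithm in enumerate(algorithms):
        if algorithm(event):
            word |= 1 << bit
    return word


def l1_accept(word):
    """Logical OR of all algorithm bits: accept if any bit is set."""
    return word != 0


# Hypothetical algorithms, from single-object thresholds to a simple
# multi-object requirement (all conditions of one algorithm must hold).
algorithms = [
    lambda ev: ev["max_jet_et"] > 100.0,                    # bit 0
    lambda ev: ev["max_muon_pt"] > 20.0,                    # bit 1
    lambda ev: ev["n_jets"] >= 4 and ev["sum_et"] > 250.0,  # bit 2
]

event = {"max_jet_et": 80.0, "max_muon_pt": 5.0, "n_jets": 4, "sum_et": 300.0}
quiet = {"max_jet_et": 20.0, "max_muon_pt": 2.0, "n_jets": 1, "sum_et": 40.0}

word = global_trigger_word(event, algorithms)
print(f"{word:03b}", l1_accept(word))
print(l1_accept(global_trigger_word(quiet, algorithms)))
```

Keeping the full bit pattern, rather than only the final decision, records which algorithms fired and therefore why the event was selected.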

HLT trigger

The HLT is implemented in software. It decides whether to accept or reject an event in 100 ms on average. The data processing of the HLT is structured around the concepts of Path and Menu. An “HLT Path” is a set of algorithms and filters run in a predefined order and connected by a logical “AND”. An “HLT Menu” is the logical “OR” of the enabled trigger paths, which determines whether an event is rejected or stored. A path is subdivided into producers and filters, requiring the presence of one or more physics objects. The producers compute physics objects from the information of the event, and the filters decide whether the event satisfies a particular criterion. The full detector readout is available at the HLT but, to minimise CPU time, a trigger path is structured so that the information that can be reconstructed quickly is produced first and used to reduce the data rate for the successive producers and filters. If the event satisfies all the requirements of the path, a boolean variable (“trigger bit”) is set to 1; otherwise it is set to 0. The event needs to pass the logical AND of all the requirements to fire the trigger: a single unsatisfied requirement is sufficient to reject the event. In this way, time-consuming producers are run only on a small fraction of the most interesting events. This allows, for example, the full track reconstruction to be run only on events that cannot be rejected earlier using information from the calorimeters or the faster pixel-only track reconstruction. To avoid saturating the storage, a random “pre-scale” factor can be applied to reduce the amount of stored data [50].
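The path and menu structure described above can be summarised in a short sketch: modules of a path run in order, the first failing filter stops the path, and the menu stores the event if any path fires. This is not the CMS software framework; the module names, the path name, the thresholds and the prescale handling are hypothetical and only illustrate the early-rejection logic.

```python
import random


def run_path(event, modules):
    """Run (kind, function) modules in order; stop at the first failing filter."""
    for kind, function in modules:
        if kind == "producer":
            function(event)        # adds reconstructed objects to the event
        elif not function(event):  # filter: reject as early as possible
            return False
    return True


def run_menu(event, menu, rng=random):
    """menu maps a path name to (modules, prescale); return the trigger bits."""
    bits = {}
    for name, (modules, prescale) in menu.items():
        fired = run_path(event, modules)
        # Random prescale: keep on average one in `prescale` passing events.
        if fired and prescale > 1 and rng.random() >= 1.0 / prescale:
            fired = False
        bits[name] = fired
    return bits


calls = {"tracking": 0}  # counts how often the expensive producer runs


def make_calo_jets(event):
    event["calo_jets"] = [et for et in event["calo_et"] if et > 30.0]


def four_calo_jets(event):
    return len(event["calo_jets"]) >= 4


def full_tracking(event):
    calls["tracking"] += 1  # stands in for a time-consuming reconstruction
    event["tracks_done"] = True


def has_tracks(event):
    return event.get("tracks_done", False)


path = [("producer", make_calo_jets), ("filter", four_calo_jets),
        ("producer", full_tracking), ("filter", has_tracks)]
menu = {"HLT_HypotheticalQuadJet": (path, 1)}

rejected = run_menu({"calo_et": [50.0, 40.0, 10.0]}, menu)
accepted = run_menu({"calo_et": [90.0, 60.0, 45.0, 35.0]}, menu)
event_stored = any(accepted.values())  # logical OR over the menu
print(rejected, accepted, calls["tracking"])
```

Placing the cheap calorimeter-based filter before the expensive producer means the latter runs only for the event that survives the first requirement, which is the ordering principle described in the text.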

2.2.5 Computing system

Events accepted by the HLT have to be stored and processed before they can be analysed. Data produced in p–p collisions and simulated samples are stored in a structure called the Worldwide LHC Computing Grid, made of different computing centres spread all over the world and linked by a high-speed network [51]. CMS uses three data formats: RAW, RECO and AOD (Analysis Object Data). The RAW format contains the data collected by the CMS detector for events that fired the HLT (average size of 1.0 MB/event). The RECO data are obtained by applying reconstruction and identification algorithms, about twice a year, to produce high-level physics objects (average size of 2 MB/event). The last step is the AOD format, a filtered version of the RECO data containing more accessible objects (average size of 400 kB/event).

The amount of data expected during Run 2 would vastly exceed the available storage resources. To tackle this challenge, CMS has developed a new, condensed data format called Mini-AOD. Mini-AOD is produced centrally by the CMS computing group and provides a common foundation for CMS physics analyses. Its compressed format is one tenth the size of AOD, and it provides CMS with a solution for handling the large influx of data coming from Run 2 [52].
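The impact of the different formats on storage can be estimated from the average event sizes quoted above. The sketch below is only a back-of-the-envelope calculation: the number of events is a hypothetical input, decimal units are assumed (1 kB = 10^3 bytes, 1 TB = 10^12 bytes), and the Mini-AOD size is taken as exactly one tenth of the AOD size.

```python
# Average event sizes in kB, from the text.
EVENT_SIZE_KB = {"RAW": 1000.0, "RECO": 2000.0, "AOD": 400.0}
# Mini-AOD is described as one tenth the size of AOD.
EVENT_SIZE_KB["MiniAOD"] = EVENT_SIZE_KB["AOD"] / 10.0


def dataset_size_tb(n_events, data_format):
    """Storage needed for n_events in the given format, in terabytes."""
    return n_events * EVENT_SIZE_KB[data_format] * 1e3 / 1e12


n_events = 1_000_000_000  # hypothetical number of stored events

for data_format in ("RAW", "RECO", "AOD", "MiniAOD"):
    size = dataset_size_tb(n_events, data_format)
    print(f"{data_format:8s} {size:8.0f} TB")
```

For the same number of events, moving an analysis from AOD to Mini-AOD reduces the disk footprint by an order of magnitude, which is the motivation given for the new format.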

For MC simulations the corresponding data are called GEN–SIM–DIGI (average size of 2 MB/event). GEN indicates the generation of the physics processes, using complex simulation tools such as PYTHIA [53], POWHEG [54] and MADGRAPH [55]. SIM indicates the simulation, where the interaction of the particles with all the CMS detectors and the detector responses are described through the GEANT4 software [56]. Finally, DIGI indicates the digitisation, that is, the simulation of the electronic response.

The CMS computing environment has been constructed as a distributed system of computing services and resources that interact with each other as Grid services. The Worldwide LHC Computing Grid (WLCG) is composed of four levels, or “Tiers”. Each Tier is made up of several computer centres and provides a specific set of services. Between them, the tiers process, store and analyse all the data from the Large Hadron Collider (LHC).

Tier-0 (T0) is the CERN data centre. All of the data from the LHC passes through this central hub, which, however, provides less than 20% of the Grid’s total computing capacity. The standard workflow is as follows.

• Accept RAW data (millions of files from across the detectors) from the CMS Online Trigger and Data Acquisition System (TriDAS).

• Repack and archive this data into “primary datasets” based on trigger information (immutable bits).

• Distribute RAW data sets among the next tier stage resources (Tier-1) so that two copies of every RAW data are saved, one at CERN and another at a Tier-1.

• Perform “PromptCalibration” to get the calibration constants needed to run the reconstruction.

• Distribute the RECO datasets among Tier-1 centres, to match for each RAW the corresponding RECO.

• Distribute AOD to all Tier-1 centres.

Tier-1 (T1). There is a set of 15 Tier-1 sites, which are computer centres with round-the-clock support located in CMS collaborating countries (large national labs, e.g. INFN and FNAL). Tier-1 sites are in general used for large-scale processing and can provide data to and receive data from all Tier-2 sites. Each T1 centre:

• Provides tape archive of part of the RAW data (secure second copy) which it receives as a subset of the datasets from the T0.

• Provides substantial CPU power for data analysis.

• Stores an entire copy of the AOD.

• Distributes RECO and AOD to the other T1 centres, to CERN and to the associated group of Tier-2 centres.

• Provides secure storage and redistribution for MC events generated by the Tier-2.

Tier-2 (T2). Tier-2s are typically universities and other scientific institutes that can store sufficient data and provide adequate computing power for specific analysis tasks. There are around fifty Tier-2 sites around the world. T2s rely upon T1s for access to large datasets and for secure storage of the new data produced at the T2 (generally Monte Carlo). In summary, the Tier-2 sites provide:

• Services for local groups.

• Grid-based analysis for the whole experiment.

• Monte Carlo simulation for the whole experiment.

Individual scientists can access the Grid through local computing resources (Tier-3), which can consist of local clusters in a university department or even an individual PC. There is no formal engagement between WLCG and Tier-3.


3 Reconstruction of Physics Objects

In this Chapter, we describe how the information from different sub-detectors is used to reconstruct the outgoing particles from p-p interactions. In the first steps, the collected raw information is used to reconstruct high-level objects, including tracks, vertices and stable particles, using the particle flow algorithm. Then the high-level information is employed in the reconstruction of the physical objects, and those of primary interest for this analysis are reported. The present search focuses on a final state containing four b-jets coming from Higgs decays; thus, several jet reconstruction algorithms, jet substructure techniques, and b-tagging algorithms are considered.

3.1 Tracks

The combinatorial track finder (CTF), an adaptation of a combinatorial Kalman Filter (KF) [57] that uses reconstructed pixel and strip hits as input, is employed to estimate the trajectory and the momentum of the charged particles passing through the inner tracking system. To describe the expected helical path of a charged particle in the tracker, five parameters are required. At CMS, $d_0$, $z_0$, $\phi$, $\cot\theta$ and $p_T$ are used. $p_T$ is defined at the impact point $(x_0, y_0, z_0)$, the point of the track trajectory closest to the nominal beam axis ($z = 0$); $\theta$ and $\phi$ are the polar and the azimuthal angles of the track momentum vector at the point of closest approach. $d_0$ is the impact parameter¹ (IP) of the track in the transverse plane and is defined by the coordinates of the impact point, $d_0 = -y_0 \cos\phi + x_0 \sin\phi$.
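As an illustration of the five-parameter description above, the following minimal Python sketch evaluates the transverse impact parameter from the impact point and converts the track parameters into a Cartesian momentum vector. The container `TrackParameters` and the helper names are hypothetical and are not taken from the CMS software; the units (cm, GeV) are only assumed for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackParameters:
    """Illustrative container for the five helix parameters (hypothetical name)."""
    d0: float         # transverse impact parameter (assumed in cm)
    z0: float         # z coordinate of the impact point (assumed in cm)
    phi: float        # azimuthal angle of the momentum at closest approach [rad]
    cot_theta: float  # cotangent of the polar angle of the momentum
    pt: float         # transverse momentum (assumed in GeV)


def transverse_impact_parameter(x0, y0, phi):
    """Signed transverse impact parameter, d0 = -y0 cos(phi) + x0 sin(phi)."""
    return -y0 * math.cos(phi) + x0 * math.sin(phi)


def make_track(x0, y0, z0, phi, cot_theta, pt):
    """Build the five parameters from the impact point and the momentum direction."""
    d0 = transverse_impact_parameter(x0, y0, phi)
    return TrackParameters(d0=d0, z0=z0, phi=phi, cot_theta=cot_theta, pt=pt)


def momentum_vector(track):
    """Cartesian momentum (px, py, pz) at the point of closest approach."""
    px = track.pt * math.cos(track.phi)
    py = track.pt * math.sin(track.phi)
    pz = track.pt * track.cot_theta  # pz = pT / tan(theta)
    return px, py, pz


# Example: a 10 GeV track along the x axis, displaced by 0.01 cm in y.
track = make_track(x0=0.0, y0=0.01, z0=0.5, phi=0.0, cot_theta=1.0, pt=10.0)
print(track.d0, momentum_vector(track))
```

With this sign convention, a track moving along +x whose impact point lies at positive y has a negative $d_0$, while the magnitude of $d_0$ is the transverse distance of the impact point from the beam axis.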

Due to the high particle multiplicity inside the tracker, track reconstruction is a computationally challenging procedure. To reduce the combinatorial complexity, an iterative tracking process is used. This allows excluding the

1

The impact parameters d0 and dz are defined as the transverse and longitudinal