Trigger and Data Acquisition
at the Large Hadron Collider
Acknowledgments (again)
• This overview talk would not exist without the help of many colleagues and all the material available online
• I wish to thank the colleagues from ATLAS, CMS, LHCb and ALICE, in particular R. Ferrari, P. Sphicas, C. Schwick, E. Pasqualucci, A. Nisati, F. Pastore, S. Marcellini, S. Cadeddu, M. Zanetti, A. Di Mattia and many others, for their excellent reports and presentations
Day 2 - Summary
• Data acquisition
– Data flow scheme
– Readout: from the front-end to the readout buffers
• Event Building
– How-to
– Switching methods, limits and available technologies
– A challenging example: CMS
• High-Level Trigger
– Requirements
– Implementation
– Performance
• Final Conclusions
L1 Trigger Summary
Difficult experimental conditions at LHC !!!
• ~10⁹ interactions per second
@ L = 10³⁴ cm⁻²s⁻¹
• 22 interactions / bunch crossing
• DAQ-limited trigger rate:
100 kHz ATLAS, CMS, 1 MHz LHCb
• Large uncertainties in estimating trigger rates
• L1 trigger on fast
(calorimeter and muon) information only
• The L1 architecture in the LHC experiments
[Figure: cross-section plot — the physics signals of interest lie more than 10 orders of magnitude below the minimum-bias rate]
L1 Rate vs. Event Size
Need More Trigger Levels
• L1 trigger selection:
– 1 out of 1000-10000 (max. output rate ~ 100 kHz)
• This is NOT enough
– The typical ATLAS/CMS event size is 1 MB
– 1 MB × 100 kHz = 100 GB/s (!!!)
• What is the amount of data we could “reasonably” store nowadays?
– 100 MB/s (ATLAS, CMS, LHCb) to 1 GB/s (ALICE)
• More trigger levels are needed to further reduce the fraction of less interesting events in the data sample written to permanent storage (a back-of-the-envelope check follows)
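As a sanity check of this argument, here is a minimal sketch, assuming the L1 rate, event size and storage bandwidth quoted above:

```python
# Rough sizing of the post-L1 data reduction needed (numbers from the slides above).
l1_output_rate_hz = 100e3        # max. L1 accept rate (ATLAS/CMS)
event_size_bytes = 1e6           # typical ATLAS/CMS event size (~1 MB)
storage_bandwidth = 100e6        # "reasonable" storage bandwidth (~100 MB/s)

post_l1_throughput = l1_output_rate_hz * event_size_bytes   # 1e11 B/s = 100 GB/s
hlt_rejection = post_l1_throughput / storage_bandwidth      # additional factor needed

print(f"Post-L1 throughput: {post_l1_throughput / 1e9:.0f} GB/s")
print(f"Required additional rejection: ~{hlt_rejection:.0f} (keep ~1 event in {hlt_rejection:.0f})")
```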
Trigger/DAQ at LHC
Data Readout at LHC
Data flow: summary
• L1 pipelines (analog/digital)
• When the L1 accept arrives, data are read out from the front-end electronics
• Accepted event fragments are temporarily stored on readout buffers
• Local detector data (partially assembled) could be used to provide an intermediate
trigger level
• Assemble event
• Provide High Level trigger(s)
• Write to permanent storage
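To make the chain above concrete, here is a minimal toy sketch of it in Python; the detector names, thresholds and decision functions are invented placeholders, not the experiments' actual interfaces:

```python
# Toy model of the data-flow chain summarized above (all names and thresholds
# are illustrative placeholders).
from collections import deque

readout_buffers = {"calo": deque(), "muon": deque(), "tracker": deque()}
storage = []

def l1_accept(event):
    # L1 decision on fast (calorimeter/muon) information only; threshold is invented.
    return event["calo_et"] > 20.0

def hlt_accept(full_event):
    # Stand-in for the software High-Level Trigger decision.
    return full_event["calo"]["calo_et"] > 30.0

def process(event):
    if not l1_accept(event):
        return                                   # rejected: data simply age out of the L1 pipelines
    for det, buf in readout_buffers.items():
        buf.append(dict(event, det=det))         # front-end readout into the readout buffers
    full_event = {det: buf.popleft()             # event building: fragments -> one full event
                  for det, buf in readout_buffers.items()}
    if hlt_accept(full_event):
        storage.append(full_event)               # write to permanent storage

process({"calo_et": 35.0})                       # passes L1 and HLT -> stored
process({"calo_et": 5.0})                        # fails L1 -> never read out
print(len(storage))                              # -> 1
```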
ATLAS: the data flow
RoI-based: Regions of Interest identified by L1 are used by the L2 trigger for further investigation (an additional O(100) background rejection). The readout buffers are not pipelines: the L2 farm accesses their data directly, and only the RoI data are read.
CMS: the data flow
LHCb: the data flow
2 kHz
(find high impact-parameter (IP) tracks using silicon-detector information)
The Data Readout
• “Classical” data acquisition systems are bus-based (e.g. VME):
– Parallel data transfer on a common bus
– Only one source at a time can use the bus ⇒ bottleneck
• At LHC: point-to-point links
– Optical or electrical standards
– Serialized data
– All sources can send data together
• This is also a general trend in the market
– ISA, SCSI, IDE, VME in the 80s
– PCI, USB, FireWire in the 90s
– Today USB2, FireWire800, PCI-X, gigabit-Ethernet, …
Readout from the Front-End
[Diagram: Global Trigger Processor, Trigger Primitive Generator, Trigger Timing Control]
Need a Standard Interface to Front-End
CMS: detector Front-End Driver (FED), equivalent to the ROD in ATLAS
The Experiment Choices
• ATLAS: S-LINK
– Optical link @ 160 MB/s (GOL) with flow control
– Need ~1600 links
– Receiver card (read-out boards, ROB) in standard PCs
• CMS: SLINK-64
– Electrical (LVDS) link @ 200 MB/s (max. 15 m) with flow control
– Need ~500 links
– Peak throughput 400 MB/s
– Receiver (Front-end Readout Link, FRL) in standard PC
• LHCb: TELL-1 and GbE
– Copper quadruple GbE, IPv4, no flow control
– Need ~400 links
– Direct connection to GbE switch
• ALICE: DDL
– Optical link @ 200 MB/s
– Need ~400 links
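Multiplying the link counts by the per-link throughput gives a feel for the installed aggregate readout bandwidth; this is a rough sketch based on the figures above (the ~125 MB/s per GbE port assumed for LHCb is my estimate), not the sustained rates:

```python
# Rough aggregate readout bandwidth per experiment, from the link counts and
# per-link speeds listed above (installed capacity, not sustained throughput).
links = {
    "ATLAS (S-LINK)":    (1600, 160e6),      # ~1600 links @ 160 MB/s
    "CMS (SLINK-64)":    (500,  200e6),      # ~500 links @ 200 MB/s (peak 400 MB/s)
    "LHCb (TELL-1 GbE)": (400,  4 * 125e6),  # ~400 boards, quad GbE (assume ~125 MB/s/port)
    "ALICE (DDL)":       (400,  200e6),      # ~400 links @ 200 MB/s
}

for name, (n_links, bw_per_link) in links.items():
    print(f"{name:18s} ~{n_links * bw_per_link / 1e9:6.0f} GB/s aggregate")
```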
Receivers: the Readout Units
• Basic Task
– Merge data from N front-ends (usually in a “hardwired” way)
– Send event (multi-)fragments to the processor farm via the Event Builder
– Store data until no longer needed (data sent to processors or event rejected)
• Issues
– Input and output interconnect (bus/p2p/switch)
– Sustained bandwidth required (200-800 MB/s)
• Current status
– PCI-based boards everywhere (more or less…)
– DMA engines on board to perform data transfers with low CPU load
– Good performance and a good roadmap for the future
– …but limited by the bus architecture: shared medium and limited number of available slots in a PC motherboard
Event Building
Data flow: ATLAS vs. CMS
• CMS
– R/O Buffer: commodity (implemented with custom PCI boards sitting in standard PCs)
– Event Builder: challenging (100 kHz @ 1 MB = O(100) GB/s, requires traffic shaping)
• ATLAS
– R/O Buffer: challenging (RoI generation at L1, RoI Builder as a custom module, selective readout from the readout buffers to supply the L2 processors)
– Event Builder: commodity (1 kHz @ 1 MB = O(1) GB/s)
Event Builder Scheme
• Event fragments are stored in independent physical memories
• Each full event should be stored in one physical memory of the processing unit (a commodity PC)
• The EVB builds full events from event fragments
– must interconnect data sources to destinations ⇒ huge network switch
• How can this be implemented efficiently? (a toy sketch follows)
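The logical operation is simple to sketch: collect fragments by event number until a complete event sits in one memory. The snippet below uses invented names and a fixed number of sources purely for illustration:

```python
# Minimal illustration of event building: fragments arriving from N independent
# sources are assembled into complete events keyed by event number.
# N_SOURCES and the helper names are invented for illustration.
from collections import defaultdict

N_SOURCES = 4
partial = defaultdict(dict)   # event_id -> {source_id: fragment}
built_events = []

def receive_fragment(event_id, source_id, payload):
    """Called for every fragment arriving from a readout buffer."""
    partial[event_id][source_id] = payload
    if len(partial[event_id]) == N_SOURCES:          # all fragments present
        built_events.append(partial.pop(event_id))   # full event now in one memory

# Fragments may arrive interleaved between events and in any order:
for src in range(N_SOURCES):
    for evt in (1, 2):
        receive_fragment(evt, src, payload=f"data-{evt}-{src}")

print(len(built_events))   # -> 2 complete events
```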
Event Building with a Switch
• A switch sends data from a PC connected to one port (the input port) directly to a PC connected to another port (the output port), without duplicating the packet to all ports (as a hub does). The switch knows where the destination PC is connected and optimizes the data transfer
A type of switch you should be familiar with
Event Building via a Switch
[Figure: N readout buffers connected through a network switch to M builder units]
• EVB traffic: all sources send to the same destination concurrently ⇒ congestion
Event Building via a Switch
• The event builder should not lead to a readout buffer overflow
• Input traffic
– The average rate accepted by each switch input port (Rin) must be greater than or equal to the readout buffer data bandwidth (Bin)
• Output traffic
– M builder units (outputs) with bandwidth Bout receive fragments from N inputs; to avoid blocking, M × Bout ≥ N × Rin (a numerical check follows)
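A quick numerical check of this condition, using CMS-like numbers purely as assumptions (512 inputs at ~200 MB/s each; the output bandwidth per port is also assumed):

```python
# Check of the non-blocking condition M * Bout >= N * Rin.
# N, Rin, M, Bout below are assumed, CMS-like values, not official figures.
N, R_in  = 512, 200e6     # readout buffers, average rate per input port (B/s)
M, B_out = 512, 250e6     # builder units, bandwidth per output port (B/s)

aggregate_in  = N * R_in
aggregate_out = M * B_out
print(f"Aggregate input : {aggregate_in  / 1e9:.0f} GB/s")
print(f"Aggregate output: {aggregate_out / 1e9:.0f} GB/s")
print("Non-blocking condition satisfied" if aggregate_out >= aggregate_in
      else "Switch will block or drop packets")
```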
Switch implementation: crossbar
• Simultaneous data transfer between any arbitrary number of inputs and outputs
• Self-routing or arbiter-based routing
• Output Contention issues will reduce the effective bandwidth
• Need traffic shaping!
• Adding (very fast) memory to the switching elements could in principle make the switch non-blocking, but the bandwidth required of the memory used for the FIFOs becomes prohibitively large
EVB traffic shaping: barrel shifter
The sequence of sends from each source to each destination follows the cyclic permutations of the destinations
This allows a throughput close to 100% of the input bandwidth to be reached (a toy schedule is sketched below)
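A toy barrel-shifter schedule (N is arbitrary here): in time slot t, source i sends to destination (i + t) mod N, so every destination receives from exactly one source per slot and output contention never occurs:

```python
# Barrel-shifter traffic shaping: in time slot t, source i sends its fragment
# to destination (i + t) % N. Each destination sees exactly one sender per slot,
# so output contention is avoided by construction. N is arbitrary here.
N = 4   # number of sources = number of destinations

for t in range(N):                                   # one full cycle of the barrel shifter
    assignments = {src: (src + t) % N for src in range(N)}
    assert sorted(assignments.values()) == list(range(N))   # no two sources collide
    print(f"slot {t}: " + ", ".join(f"S{s}->D{d}" for s, d in assignments.items()))
```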
Switching Technologies
• Myricom Myrinet 2000
– 64 (of 128 possible) ports @ 2.5 Gb/s
– Clos network (a network of smaller switches)
– Custom firmware implements barrel shifting
– Transport with flow control at all stages (wormhole routing)
• Gigabit Ethernet
– FastIron 8000 series
– 64 ports @ 1.2 Gb/s
– Multi-port memory system
– Standard firmware
– Packets can be lost
EVB Example: CMS
Scalable at the RU level
EVB example: CMS (2)
[Figure: two-stage event builder with 8×8 and 64×64 switch stages, scalable from 1 to 8 slices]
Summary of EVB
• Event Building is implemented with commercial network technologies by means of huge network switches
• But EVB network traffic is particularly hard for switches
– This leads to switch congestion
– The switch either blocks (packets at the input have to wait) or throws packets away (Ethernet switches)
• Possible solutions
– Buy very expensive switches ($$$) with a lot of high speed memory inside
– Over-dimension the system in terms of bandwidth
– Use smart traffic shaping techniques to allow the switch to exploit nearly 100% of its resources
High-Level Trigger
Introduction
• The High-Level Trigger performs the final data reduction: ~1 event selected out of O(1000)
• ATLAS and CMS follow different approaches:
– ATLAS has an additional L2 farm, to build “global” RoIs
– CMS has L2, L2.5 and L3, all software trigger levels (running on the same processors)
• HLT algorithms perform the very first analysis in real time
– There are constraints on the available time and on the maximum data size that can be analyzed
– Once an event is rejected it is rejected forever ⇒ it cannot be recovered later
[Figure: interaction rate vs. selected events]
HLT Requirements
• Flexibility
– The operating conditions of the LHC and of the experiments in pp interactions at 14 TeV are difficult to predict
• Robustness
– HLT algorithms should not depend in a critical way on alignment and calibration constants
• Fast event rejection
– Events not selected should be discarded as fast as possible
• Inclusive selection
– The HLT selection should rely heavily (but not exclusively) on inclusive selections to guarantee maximum efficiency for new physics
• Selection efficiency
– It should be possible to evaluate it directly from the data
• Quasi-offline algorithms
HLT Implementation
• High-level triggers (beyond Level 1) are implemented as more or less advanced software trigger algorithms (almost offline-quality reconstruction) running on standard processor (PC) farms with Linux as the OS
• Very cost-effective
– Linux is free and very stable
– The interconnect technology exists on the market
ATLAS Implementation
LEVEL 2 TRIGGER
• Regions‐of‐Interest “seeds”
• Full granularity for all subdetector systems
• Fast Rejection “steering”
• O(10 ms) latency
EVENT FILTER
• “Seeded” by Level 2 result
• Potential full event access
• Offline‐like Algorithms
• O(1 s) latency
High Level Triggers (HLT) Software triggers
CMS Implementation
– Level-1 maximum trigger rate: 100 kHz
– Average event size: ≈ 1 Mbyte
– Number of in-out units: 512
– Readout network bandwidth: ≈ 1 Terabit/s
– Event filter computing power: ≈ 10⁶ SI95
– Data production: ≈ Tbyte/day
Pure-software, multi-level High-Level Trigger
ATLAS Muon Reconstruction
• Level 2
– µFast: MDT-only track segment fit and pt estimate through a LUT (~1ms)
– µComb: extrapolation to inner detectors and new pt estimate (~0.1 ms)
– µISOL: track isolation check in calorimeter
• Event Filter (Level 3)
– TrigMOORE: track segments helix fit in detector (including real magnetic field map) (~1s)
– MUID: track extrapolation to vertex by LUT (energy loss and multiple scattering are included), Helix fit (~0.1 s)
• The muon candidate is now ready for the final trigger-menu selection
CMS e/γ reconstruction
• Level 2 (calo info only)
– Confirm L1 candidates
– Super-cluster algorithm to recover bremsstrahlung
– Cluster reconstruction and Et threshold cut
• Level 2.5 (pixel info)
– Calorimeter particles are traced back to vertex detector
– Electron and photon stream separation and Et cut
• Level 3 (electrons)
– Track reconstruction in tracker with L2.5 seed
– Track-cluster quality cuts
– E/p cut
• Level 3 (photons)
– High Et cut
– γγ-event asymmetric Et cut, as in the H → γγ offline analysis
• Electrons and photons are now ready for the final trigger-menu selection
The Trigger Table
• Issue: what to save permanently on mass storage
– Which trigger streams have to be created?
– What is the bandwidth to be allocated to each stream?
• Selection Criteria
– Inclusive triggers: to cover the major known (and unknown) physics channels
– Exclusive triggers: to extend the physics potential to specific studies (e.g. b-physics)
– Prescaled, calibration and detector-monitor triggers
• For every trigger stream the allocated bandwidth depends on the status of the collider and of the experiment
• As a general rule, the trigger table should be flexible, extensible and non-biasing, and should allow the discovery of unexpected new physics (a toy prescale sketch follows)
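A minimal sketch of how such a table can be represented and applied; the stream names, thresholds and prescale factors below are invented purely for illustration:

```python
# Toy trigger table: each stream has a threshold and a prescale factor.
# Streams, thresholds and prescales here are invented for illustration only.
trigger_table = [
    # (stream name,          quantity,   threshold GeV, prescale)
    ("single_isolated_mu",   "muon_pt",   20.0,            1),   # inclusive trigger
    ("single_jet",           "jet_et",   400.0,            1),   # inclusive trigger
    ("low_et_jet_prescaled", "jet_et",    60.0,         1000),   # prescaled monitor trigger
]

counters = {name: 0 for name, *_ in trigger_table}

def hlt_decision(event):
    """Return the list of streams that accept this event."""
    accepted = []
    for name, quantity, threshold, prescale in trigger_table:
        if event.get(quantity, 0.0) < threshold:
            continue
        counters[name] += 1
        if counters[name] % prescale == 0:     # keep only 1 out of 'prescale' candidates
            accepted.append(name)
    return accepted

print(hlt_decision({"muon_pt": 25.0, "jet_et": 80.0}))  # -> ['single_isolated_mu']
```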
The Trigger Table @ L = 2×10³³ cm⁻²s⁻¹
Trigger stream                     | ATLAS: threshold (GeV) / rate (Hz) | CMS: threshold (GeV) / rate (Hz)
Isolated muon                      | 20 / 40                            | 19 / 25
Double muon                        | 10 / 10                            | 7 / 4
Isolated electron                  | 25 / 40                            | 29 / 33
Double isolated electron           | 15 / 1                             | 17 / 1
Isolated photon                    | 60 / 25                            | 80 / 4
Double isolated photon             | 20 / 2                             | 40, 25 / 5
Inclusive tau jet                  | –                                  | 86 / 3
Di-tau-jet                         | –                                  | 59 / 1
Tau + missing energy               | 35, 45 / 5                         | –
Inclusive b-jets                   | –                                  | 237 / 5
Electron + jet                     | –                                  | 19, 45 / 2
Single jet, 3 jet, 4 jet           | 400, 165, 110 / 30                 | 657, 247, 113 / 9
Jet + missing energy               | 70, 70 / 20                        | 180, 123 / 5
B-physics topology                 | – / 10                             | –
Other (pre-scales, calibration, …) | – / 20                             | – / 10
HLT Performance in CMS
CMS: how large should the HLT farm be?
All numbers are for a 1 GHz Intel Pentium III CPU (2003 estimate)
• The table above gives ~270 ms/event on average
• Therefore a system capable of 100 kHz requires ~30,000 CPUs (PIII @ 1 GHz)
• According to Moore's law this translates into ~40 ms/event in 2007, requiring O(1000) dual-CPU boxes (the arithmetic is sketched below)
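Writing the sizing argument out explicitly; the ~1.5-year doubling time for Moore's law is the usual rule of thumb, taken here as an assumption:

```python
# HLT farm sizing from the numbers above: CPU-seconds needed per second of data.
l1_rate_hz = 100e3          # events/s entering the HLT
t_event_2003 = 0.270        # s/event on a 1 GHz Pentium III (2003 estimate)

cpus_2003 = l1_rate_hz * t_event_2003            # ~27,000, quoted above as ~30,000 CPUs
print(f"2003 technology: ~{cpus_2003:,.0f} CPUs")

# Moore's-law extrapolation 2003 -> 2007, assuming performance doubles every ~1.5 years.
speedup = 2 ** (4 / 1.5)                         # ~6.3x over four years
t_event_2007 = t_event_2003 / speedup            # ~43 ms/event, i.e. the quoted ~40 ms
boxes_2007 = l1_rate_hz * t_event_2007 / 2       # dual-CPU boxes
print(f"2007 technology: ~{t_event_2007 * 1e3:.0f} ms/event, ~{boxes_2007:,.0f} dual-CPU boxes")
```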
HLT Summary
• CMS example shows that single-farm design works
• If at startup the L1 trigger rate is below 100 kHz, we can lower the thresholds of the L1 selection criteria and/or add triggers in order to fully use the available bandwidth
• If at startup the rate is higher, the L1 trigger can be reprogrammed to stay within the available bandwidth
• The HLT trigger streams shown here are only indicative; we will see what really happens on day 1
Final Conclusions…
• The L1 trigger takes the LHC experiments from the 25 ns timescale (40 MHz) to the 1–25 µs timescale
– Custom hardware
– huge fan-in/fan-out problems
– fast algorithms on coarse-grained, low-resolution data
• Depending on the experiment, the HLT is organized in one or more steps which usually occur after EVB
– Commercial hardware, large networks, Gb/s links
– The need for challenging custom hardware vs. commodity hardware also depends on the trigger architecture (for example on the existence of an RoI-based L2 trigger “à la” ATLAS)
• HLT: will run algorithms as close as possible to offline ones
– Large PC processor farm (“easy” nowadays)
– Monitoring issues…