Trigger and Data Acquisition
at the Large Hadron Collider
Acknowledgments (again)
• This overview talk would not exist without the help of many colleagues and all the material available online
• I wish to thank the colleagues from ATLAS, CMS, LHCb and ALICE, in particular R. Ferrari, P. Sphicas, C. Schwick, E. Pasqualucci, A. Nisati, F. Pastore, S. Marcellini, S. Cadeddu, M. Zanetti, A. Di Mattia and many others, for their excellent reports and presentations
Day 2 - Summary
• Data acquisition
– Data flow scheme
– Readout: from the front-end to the readout buffers
• Event Building
– How-to
– Switching methods, limits and available technologies
– A challenging example: CMS
• High-Level Trigger
– Requirements
– Implementation
– Performance
• Final Conclusions
L1 Trigger Summary
Difficult experimental conditions at LHC !!!
• ~10⁹ interactions per second
@ L = 10³⁴ cm⁻²s⁻¹
• 22 interactions / bunch crossing
• DAQ-limited trigger rate:
100 kHz ATLAS, CMS, 1 MHz LHCb
• Large uncertainties in estimating trigger rates
• L1 trigger on fast
(calorimeter and muon) information only
• The L1 architecture in the LHC experiments
[Figure: cross-section plot — the physics signals of interest lie more than 10 orders of magnitude below the minimum-bias rate]
L1 Rate vs. Event Size
Need More Trigger Levels
• L1 trigger selection:
– 1 out of 1000-10000 (max. output rate ~ 100 kHz)
• This is NOT enough
– The typical ATLAS/CMS event size is 1 MB
– 1 MB × 100 kHz = 100 GB/s (!!!)
• What is the amount of data we could “reasonably” store nowadays?
– 100 MB/s (ATLAS, CMS, LHCb) to 1 GB/s (ALICE)
• More trigger levels are needed to further reduce the fraction of less interesting events in the data sample written to permanent storage (a back-of-the-envelope check follows)
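As a sanity check of this argument, here is a minimal sketch, assuming the L1 rate, event size and storage bandwidth quoted above:

```python
# Rough sizing of the post-L1 data reduction needed (numbers from the slides above).
l1_output_rate_hz = 100e3        # max. L1 accept rate (ATLAS/CMS)
event_size_bytes = 1e6           # typical ATLAS/CMS event size (~1 MB)
storage_bandwidth = 100e6        # "reasonable" storage bandwidth (~100 MB/s)

post_l1_throughput = l1_output_rate_hz * event_size_bytes   # 1e11 B/s = 100 GB/s
hlt_rejection = post_l1_throughput / storage_bandwidth      # additional factor needed

print(f"Post-L1 throughput: {post_l1_throughput / 1e9:.0f} GB/s")
print(f"Required additional rejection: ~{hlt_rejection:.0f} (keep ~1 event in {hlt_rejection:.0f})")
```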
Trigger/DAQ at LHC
Data Readout at LHC
Data flow: summary
• L1 pipelines (analog/digital)
• When the L1 accept arrives, data are read out from the front-end electronics
• Accepted event fragments are temporarily stored on readout buffers
• Local detector data (partially assembled) could be used to provide an intermediate
trigger level
• Assemble event
• Provide High Level trigger(s)
• Write to permanent storage
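To make the chain above concrete, here is a minimal toy sketch of it in Python; the detector names, thresholds and decision functions are invented placeholders, not the experiments' actual interfaces:

```python
# Toy model of the data-flow chain summarized above (all names and thresholds
# are illustrative placeholders).
from collections import deque

readout_buffers = {"calo": deque(), "muon": deque(), "tracker": deque()}
storage = []

def l1_accept(event):
    # L1 decision on fast (calorimeter/muon) information only; threshold is invented.
    return event["calo_et"] > 20.0

def hlt_accept(full_event):
    # Stand-in for the software High-Level Trigger decision.
    return full_event["calo"]["calo_et"] > 30.0

def process(event):
    if not l1_accept(event):
        return                                   # rejected: data simply age out of the L1 pipelines
    for det, buf in readout_buffers.items():
        buf.append(dict(event, det=det))         # front-end readout into the readout buffers
    full_event = {det: buf.popleft()             # event building: fragments -> one full event
                  for det, buf in readout_buffers.items()}
    if hlt_accept(full_event):
        storage.append(full_event)               # write to permanent storage

process({"calo_et": 35.0})                       # passes L1 and HLT -> stored
process({"calo_et": 5.0})                        # fails L1 -> never read out
print(len(storage))                              # -> 1
```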
ATLAS: the data flow
RoI-based: Regions of Interest identified by L1 are used by the L2 trigger for further investigation (an additional O(100) background rejection). The readout buffers are not pipelines: the L2 farm accesses their data directly, and only the RoI data are read.
CMS: the data flow
LHCb: the data flow
2 kHz
(find high impact-parameter (IP) tracks using silicon-detector information)
The Data Readout
• “Classical” data acquisition systems are bus-based (e.g. VME):
– Parallel data transfer on a common bus
– Only one source at a time can use the bus ⇒ bottleneck
• At LHC: point-to-point links
– Optical or electrical standards
– Serialized data
– All sources can send data together
• This is also a general trend in the market
– ISA, SCSI, IDE, VME in the 80s
– PCI, USB, FireWire in the 90s
– Today USB2, FireWire800, PCI-X, gigabit-Ethernet, …
Readout from the Front-End
[Diagram: Global Trigger Processor, Trigger Primitive Generator, Trigger Timing Control]
Need a Standard Interface to Front-End
CMS: detector Front-End Driver (FED), equivalent to the ROD in ATLAS
The Experiment Choices
• ATLAS: S-LINK
– Optical link @ 160 MB/s (GOL) with flow control
– Need ~1600 links
– Receiver card (read-out boards, ROB) in standard PCs
• CMS: SLINK-64
– Electrical (LVDS) link @ 200 MB/s (max. 15 m) with flow control
– Need ~500 links
– Peak throughput 400 MB/s
– Receiver (Front-end Readout Link, FRL) in standard PC
• LHCb: TELL-1 and GbE
– Copper quadruple GbE, IPv4, no flow control
– Need ~400 links
– Direct connection to GbE switch
• ALICE: DDL
– Optical link @ 200 MB/s
– Need ~400 links
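Multiplying the link counts by the per-link throughput gives a feel for the installed aggregate readout bandwidth; this is a rough sketch based on the figures above (the ~125 MB/s per GbE port assumed for LHCb is my estimate), not the sustained rates:

```python
# Rough aggregate readout bandwidth per experiment, from the link counts and
# per-link speeds listed above (installed capacity, not sustained throughput).
links = {
    "ATLAS (S-LINK)":    (1600, 160e6),      # ~1600 links @ 160 MB/s
    "CMS (SLINK-64)":    (500,  200e6),      # ~500 links @ 200 MB/s (peak 400 MB/s)
    "LHCb (TELL-1 GbE)": (400,  4 * 125e6),  # ~400 boards, quad GbE (assume ~125 MB/s/port)
    "ALICE (DDL)":       (400,  200e6),      # ~400 links @ 200 MB/s
}

for name, (n_links, bw_per_link) in links.items():
    print(f"{name:18s} ~{n_links * bw_per_link / 1e9:6.0f} GB/s aggregate")
```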
Receivers: the Readout Units
• Basic Task
– Merge data from N front-ends (usually in a “hardwired” way)
– Send event (multi-)fragments to the processor farm via the Event Builder
– Store data until no longer needed (data sent to processors or event rejected)
• Issues
– Input and output interconnect (bus/p2p/switch)
– Sustained bandwidth required (200-800 MB/s)
• Current status
– PCI-based boards everywhere (more or less…)
– DMA engines on board to perform data transfers with low CPU load
– Good performance and a good roadmap for the future
– …but limited by the bus architecture: shared medium and limited number of available slots in a PC motherboard
Event Building
Data flow: ATLAS vs. CMS
• CMS
– R/O Buffer: commodity (implemented with custom PCI boards sitting in standard PCs)
– Event Builder: challenging (100 kHz @ 1 MB = O(100) GB/s, requires traffic shaping)
• ATLAS
– R/O Buffer: challenging (RoI generation at L1, RoI Builder as a custom module, selective readout from the readout buffers to supply the L2 processors)
– Event Builder: commodity (1 kHz @ 1 MB = O(1) GB/s)
Event Builder Scheme
• Event fragments are stored in independent physical memories
• Each full event should be stored in one physical memory of the processing unit (a commodity PC)
• The EVB builds full events from event fragments
– must interconnect data sources to destinations ⇒ huge network switch
• How can this be implemented efficiently? (a toy sketch follows)
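The logical operation is simple to sketch: collect fragments by event number until a complete event sits in one memory. The snippet below uses invented names and a fixed number of sources purely for illustration:

```python
# Minimal illustration of event building: fragments arriving from N independent
# sources are assembled into complete events keyed by event number.
# N_SOURCES and the helper names are invented for illustration.
from collections import defaultdict

N_SOURCES = 4
partial = defaultdict(dict)   # event_id -> {source_id: fragment}
built_events = []

def receive_fragment(event_id, source_id, payload):
    """Called for every fragment arriving from a readout buffer."""
    partial[event_id][source_id] = payload
    if len(partial[event_id]) == N_SOURCES:          # all fragments present
        built_events.append(partial.pop(event_id))   # full event now in one memory

# Fragments may arrive interleaved between events and in any order:
for src in range(N_SOURCES):
    for evt in (1, 2):
        receive_fragment(evt, src, payload=f"data-{evt}-{src}")

print(len(built_events))   # -> 2 complete events
```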
Event Building with a Switch
• A switch sends data from a PC connected to one port (the input port) directly to a PC connected to another port (the output port), without duplicating the packet to all ports (as a hub does). The switch knows where the destination PC is connected and optimizes the data transfer
A type of switch you should be familiar with
Event Building via a Switch
[Figure: N readout buffers connected through a network switch to M builder units]
• EVB traffic: all sources send to the same destination concurrently ⇒ congestion
Event Building via a Switch
• The event builder should not lead to a readout buffer overflow
• Input traffic
– The average rate accepted by each switch input port (Rin) must be greater than or equal to the readout buffer data bandwidth (Bin)
• Output traffic
– M builder units (outputs) with bandwidth Bout receive fragments from N inputs; to avoid blocking, M × Bout ≥ N × Rin (a numerical check follows)
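A quick numerical check of this condition, using CMS-like numbers purely as assumptions (512 inputs at ~200 MB/s each; the output bandwidth per port is also assumed):

```python
# Check of the non-blocking condition M * Bout >= N * Rin.
# N, Rin, M, Bout below are assumed, CMS-like values, not official figures.
N, R_in  = 512, 200e6     # readout buffers, average rate per input port (B/s)
M, B_out = 512, 250e6     # builder units, bandwidth per output port (B/s)

aggregate_in  = N * R_in
aggregate_out = M * B_out
print(f"Aggregate input : {aggregate_in  / 1e9:.0f} GB/s")
print(f"Aggregate output: {aggregate_out / 1e9:.0f} GB/s")
print("Non-blocking condition satisfied" if aggregate_out >= aggregate_in
      else "Switch will block or drop packets")
```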
Switch implementation: crossbar
• Simultaneous data transfer between any arbitrary number of inputs and outputs
• Self-routing or arbiter-based routing
• Output Contention issues will reduce the effective bandwidth
• Need traffic shaping!
• Adding (very fast) memory to the switching elements could in principle make the switch non-blocking, but the bandwidth required of the memory used for the FIFOs becomes prohibitively large
EVB traffic shaping: barrel shifter
The sequence of sends from each source to each destination follows the cyclic permutations of the destinations
This allows a throughput close to 100% of the input bandwidth to be reached (a toy schedule is sketched below)
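A toy barrel-shifter schedule (N is arbitrary here): in time slot t, source i sends to destination (i + t) mod N, so every destination receives from exactly one source per slot and output contention never occurs:

```python
# Barrel-shifter traffic shaping: in time slot t, source i sends its fragment
# to destination (i + t) % N. Each destination sees exactly one sender per slot,
# so output contention is avoided by construction. N is arbitrary here.
N = 4   # number of sources = number of destinations

for t in range(N):                                   # one full cycle of the barrel shifter
    assignments = {src: (src + t) % N for src in range(N)}
    assert sorted(assignments.values()) == list(range(N))   # no two sources collide
    print(f"slot {t}: " + ", ".join(f"S{s}->D{d}" for s, d in assignments.items()))
```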
Switching Technologies
• Myricom Myrinet 2000
– 64 (of 128 possible) ports @ 2.5 Gb/s
– Clos network (a network of smaller switches)
– Custom firmware implements barrel shifting
– Transport with flow control at all stages (wormhole routing)
• Gigabit Ethernet
– FastIron 8000 series
– 64 ports @ 1.2 Gb/s
– Multi-port memory system
– Standard firmware
– Packets can be lost
EVB Example: CMS
Scalable at the RU level
EVB example: CMS (2)
[Figure: two-stage event builder with 8×8 and 64×64 switch stages, scalable from 1 to 8 slices]
Summary of EVB
• Event Building is implemented with commercial network technologies by means of huge network switches
• But EVB network traffic is particularly hard for switches
– This leads to switch congestion
– The switch either blocks (packets at the input have to wait) or throws packets away (Ethernet switches)
• Possible solutions
– Buy very expensive switches ($$$) with a lot of high speed memory inside
– Over-dimension the system in terms of bandwidth
– Use smart traffic shaping techniques to allow the switch to exploit nearly 100% of its resources
High-Level Trigger
Introduction
• The High-Level Trigger performs the final data reduction: ~1 event selected out of O(1000)
• ATLAS and CMS follow different approaches:
– ATLAS has an additional L2 farm, to build “global” RoIs
– CMS has L2, L2.5 and L3, all software trigger levels (running on the same processors)
• HLT algorithms perform the very first analysis in real time
– There are constraints on the available time and on the maximum data size that can be analyzed
– Once an event is rejected it is rejected forever ⇒ it cannot be recovered later
[Figure: interaction rate vs. selected events]
HLT Requirements
• Flexibility
– The operating conditions of the LHC and of the experiments in pp interactions at 14 TeV are difficult to predict
• Robustness
– HLT algorithms should not depend in a critical way on alignment and calibration constants
• Fast event rejection
– Events not selected should be discarded as fast as possible
• Inclusive selection
– The HLT selection should rely heavily (but not exclusively) on inclusive selections to guarantee maximum efficiency for new physics
• Selection efficiency
– It should be possible to evaluate it directly from the data
• Quasi-offline algorithms
HLT Implementation
• High-level triggers (beyond Level 1) are implemented as more or less advanced software trigger algorithms (almost offline-quality reconstruction) running on standard processor (PC) farms with Linux as the OS
• Very cost-effective
– Linux is free and very stable
– The interconnect technology exists on the market
ATLAS Implementation
LEVEL 2 TRIGGER
• Regions‐of‐Interest “seeds”
• Full granularity for all subdetector systems
• Fast Rejection “steering”
• O(10 ms) latency
EVENT FILTER
• “Seeded” by Level 2 result
• Potential full event access
• Offline‐like Algorithms
• O(1 s) latency
High Level Triggers (HLT) Software triggers
CMS Implementation
– Level-1 maximum trigger rate: 100 kHz
– Average event size: ≈ 1 Mbyte
– Number of in-out units: 512
– Readout network bandwidth: ≈ 1 Terabit/s
– Event filter computing power: ≈ 10⁶ SI95
– Data production: ≈ Tbyte/day
Pure-software, multi-level High-Level Trigger
ATLAS Muon Reconstruction
• Level 2
– µFast: MDT-only track segment fit and pt estimate through a LUT (~1ms)
– µComb: extrapolation to inner detectors and new pt estimate (~0.1 ms)
– µISOL: track isolation check in calorimeter
• Event Filter (Level 3)
– TrigMOORE: track segments helix fit in detector (including real magnetic field map) (~1s)
– MUID: track extrapolation to vertex by LUT (energy loss and multiple scattering are included), Helix fit (~0.1 s)
• The muon candidate is now ready for the final trigger-menu selection
CMS e/γ reconstruction
• Level 2 (calo info only)
– Confirm L1 candidates
– Super-cluster algorithm to recover bremsstrahlung
– Cluster reconstruction and Et threshold cut
• Level 2.5 (pixel info)
– Calorimeter particles are traced back to vertex detector
– Electron and photon stream separation and Et cut
• Level 3 (electrons)
– Track reconstruction in tracker with L2.5 seed
– Track-cluster quality cuts
– E/p cut
• Level 3 (photons)
– High Et cut
– γγ-event asymmetric Et cut, as in the H → γγ offline analysis
• Electrons and photons are now ready for the final trigger-menu selection
The Trigger Table
• Issue: what to save permanently on mass storage
– Which trigger streams have to be created?
– What is the bandwidth to be allocated to each stream?
• Selection Criteria
– Inclusive triggers: to cover the major known (and unknown) physics channels
– Exclusive triggers: to extend the physics potential to specific studies (e.g. b-physics)
– Prescaled, calibration and detector-monitor triggers
• For every trigger stream the allocated bandwidth depends on the status of the collider and of the experiment
• As a general rule, the trigger table should be flexible, extensible and non-biasing, and should allow the discovery of unexpected new physics (a toy prescale sketch follows)
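A minimal sketch of how such a table can be represented and applied; the stream names, thresholds and prescale factors below are invented purely for illustration:

```python
# Toy trigger table: each stream has a threshold and a prescale factor.
# Streams, thresholds and prescales here are invented for illustration only.
trigger_table = [
    # (stream name,          quantity,   threshold GeV, prescale)
    ("single_isolated_mu",   "muon_pt",   20.0,            1),   # inclusive trigger
    ("single_jet",           "jet_et",   400.0,            1),   # inclusive trigger
    ("low_et_jet_prescaled", "jet_et",    60.0,         1000),   # prescaled monitor trigger
]

counters = {name: 0 for name, *_ in trigger_table}

def hlt_decision(event):
    """Return the list of streams that accept this event."""
    accepted = []
    for name, quantity, threshold, prescale in trigger_table:
        if event.get(quantity, 0.0) < threshold:
            continue
        counters[name] += 1
        if counters[name] % prescale == 0:     # keep only 1 out of 'prescale' candidates
            accepted.append(name)
    return accepted

print(hlt_decision({"muon_pt": 25.0, "jet_et": 80.0}))  # -> ['single_isolated_mu']
```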
The Trigger Table @ L = 2×10³³ cm⁻²s⁻¹
Trigger stream                     | ATLAS: threshold (GeV) / rate (Hz) | CMS: threshold (GeV) / rate (Hz)
Isolated muon                      | 20 / 40                            | 19 / 25
Double muon                        | 10 / 10                            | 7 / 4
Isolated electron                  | 25 / 40                            | 29 / 33
Double isolated electron           | 15 / 1                             | 17 / 1
Isolated photon                    | 60 / 25                            | 80 / 4
Double isolated photon             | 20 / 2                             | 40, 25 / 5
Inclusive tau jet                  | –                                  | 86 / 3
Di-tau-jet                         | –                                  | 59 / 1
Tau + missing energy               | 35, 45 / 5                         | –
Inclusive b-jets                   | –                                  | 237 / 5
Electron + jet                     | –                                  | 19, 45 / 2
Single jet, 3 jet, 4 jet           | 400, 165, 110 / 30                 | 657, 247, 113 / 9
Jet + missing energy               | 70, 70 / 20                        | 180, 123 / 5
B-physics topology                 | – / 10                             | –
Other (pre-scales, calibration, …) | – / 20                             | – / 10
HLT Performance in CMS
CMS: how large should the HLT farm be?
All numbers are for a 1 GHz Intel Pentium III CPU (2003 estimate)
• The table above gives ~270 ms/event on average
• Therefore a system capable of 100 kHz requires ~30,000 CPUs (PIII @ 1 GHz)
• According to Moore's law this translates into ~40 ms/event in 2007, requiring O(1000) dual-CPU boxes (the arithmetic is sketched below)
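Writing the sizing argument out explicitly; the ~1.5-year doubling time for Moore's law is the usual rule of thumb, taken here as an assumption:

```python
# HLT farm sizing from the numbers above: CPU-seconds needed per second of data.
l1_rate_hz = 100e3          # events/s entering the HLT
t_event_2003 = 0.270        # s/event on a 1 GHz Pentium III (2003 estimate)

cpus_2003 = l1_rate_hz * t_event_2003            # ~27,000, quoted above as ~30,000 CPUs
print(f"2003 technology: ~{cpus_2003:,.0f} CPUs")

# Moore's-law extrapolation 2003 -> 2007, assuming performance doubles every ~1.5 years.
speedup = 2 ** (4 / 1.5)                         # ~6.3x over four years
t_event_2007 = t_event_2003 / speedup            # ~43 ms/event, i.e. the quoted ~40 ms
boxes_2007 = l1_rate_hz * t_event_2007 / 2       # dual-CPU boxes
print(f"2007 technology: ~{t_event_2007 * 1e3:.0f} ms/event, ~{boxes_2007:,.0f} dual-CPU boxes")
```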
HLT Summary
• CMS example shows that single-farm design works
• If at startup the L1 trigger rate is below 100 kHz, we can lower the thresholds of the L1 selection criteria and/or add triggers in order to fully use the available bandwidth
• If at startup the rate is higher, the L1 trigger can be reprogrammed to stay within the available bandwidth
• The HLT trigger streams shown here are only indicative; we will see what really happens on day 1
Final Conclusions…
• The L1 trigger takes the LHC experiments from the 25 ns timescale (40 MHz) to the 1–25 µs timescale
– Custom hardware
– huge fan-in/fan-out problems
– fast algorithms on coarse-grained, low-resolution data
• Depending on the experiment, the HLT is organized in one or more steps which usually occur after EVB
– Commercial hardware, large networks, Gb/s links
– The need for challenging custom hardware vs. commodity hardware also depends on the trigger architecture (for example on the existence of an RoI-based L2 trigger “à la” ATLAS)
• HLT: will run algorithms as close as possible to offline ones
– Large PC processor farm (“easy” nowadays)
– Monitoring issues…