Trigger and DAQ at LHC

Contents
INTRODUCTION
  The context: LHC & experiments
PART 1: Trigger at LHC
  Requirements & concepts
  Muon and calorimeter triggers (CMS and ATLAS)
  Specific solutions (ALICE, LHCb)
  Hardware implementation
PART 2: Data Flow, Event Building and higher trigger levels
  Data flow of the 4 LHC experiments
  Data readout (interface to DAQ)
  Event Building: CMS as an example
Trigger summary
Experimental conditions:
– σ_inel ≈ 100 mb → 10^9 interactions/s @ L = 10^34 cm^-2 s^-1
– ≈ 23 events / BX (i.e. every 25 ns)
– Max. trigger rate 100 kHz (limited by DAQ)
– Synchronization of signals
– Large uncertainties in estimated trigger rates
– Dedicated trigger signals / sub-detectors
– Trigger on electromagnetic and hadronic E_T

Physics:
– σ ≈ 0.001 … 1 pb
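A back-of-the-envelope check of the interaction-rate and pile-up numbers above (a minimal sketch; the 100 mb value is the one quoted on the slide, the 40 MHz clock is the standard machine parameter):

```python
# Interaction rate: R = L * sigma_inel
L     = 1e34        # cm^-2 s^-1  (design luminosity)
sigma = 100e-27     # 100 mb in cm^2  (1 mb = 1e-27 cm^2)
rate  = L * sigma
print(f"interaction rate       : {rate:.1e} /s")       # 1e9 /s

# Dividing by the 40 MHz bunch-crossing clock (25 ns) gives the average
# pile-up; the quoted ~23 events/BX in addition folds in the exact
# inelastic cross section and the bunch filling pattern.
clock = 40e6        # Hz
print(f"mean interactions / BX : {rate / clock:.0f}")   # ~25
```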
– Complete trigger system architecture
– Trigger distribution
– Hardware implementation
– Special features for ALICE and LHCb
LHC experiments: Lvl-1 rate vs. event size
[Plot: Lvl-1 rate vs. event size for the LHC experiments, compared with previous or current experiments]
Further trigger levels
First level trigger selection: 1 out of 10,000 (max. 100 kHz rate)
This is not enough:
– Typical event size: 1 MB (ATLAS, CMS); 1 MB @ 100 kHz = 100 GB/s
– "Reasonable" data rate to permanent storage: 100 MB/s (CMS & ATLAS) … 1 GB/s (ALICE)
More trigger levels are needed to further reduce the fraction of less interesting events in the selected sample.
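The same argument in numbers (a sketch using the ATLAS/CMS figures quoted above):

```python
l1_rate    = 100e3          # Hz, max. Lvl-1 accept rate
event_size = 1e6            # B, typical ATLAS/CMS event
storage_bw = 100e6          # B/s, "reasonable" rate to permanent storage

l1_output  = l1_rate * event_size          # data rate after Lvl-1
hlt_reject = l1_output / storage_bw        # further rejection still needed
print(f"Lvl-1 output     : {l1_output/1e9:.0f} GB/s")        # 100 GB/s
print(f"extra rejection  : factor {hlt_reject:.0f}")          # ~1000
print(f"final event rate : {l1_rate/hlt_reject:.0f} Hz")      # ~100 Hz
```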
Trigger/DAQ parameters

ATLAS – 3 levels: LV-1 10^5 Hz, LV-2 10^3 Hz; event size 1.5x10^6 B; readout bandwidth 10 GB/s; HLT output 300 MB/s (2x10^2 events/s)
CMS   – 2 levels: LV-1 10^5 Hz; event size 10^6 B; readout bandwidth 100 GB/s; HLT output ~100 MB/s (10^2 events/s)
LHCb  – 3 levels: LV-0 10^6 Hz, LV-1 4x10^4 Hz; event size 2x10^5 B; readout bandwidth 4 GB/s; HLT output 40 MB/s (2x10^2 events/s)
ALICE – 4 levels: Pb-Pb 500 Hz, p-p 10^3 Hz; event size 5x10^7 B (Pb-Pb) / 2x10^6 B (p-p); readout bandwidth 5 GB/s; HLT output 1250 MB/s (Pb-Pb) / 200 MB/s (p-p) at 10^2 events/s
Data Acquisition: main tasks
[Diagram: Lvl-1 trigger and Lvl-1 pipelines in the front ends; readout buffers feeding Lvl-2 and the HLT]
– Data readout from the front-end electronics
– Temporary buffering of event fragments in readout buffers
– Provide the higher level triggers with event data
– Assemble events in a single location and provide them to the High Level Trigger (HLT)
– Write selected events to permanent storage
Implementation of high level triggers
• Higher level triggers are implemented in software. Farms of PCs investigate event data in parallel.
• The PCs are connected to the Readout buffers via computer networks (details later)
Data Flow: Architecture

Data Flow: ALICE
[Diagram: front-end pipelines (Lvl-0,1,2 trigger, 88 µs latency, custom hardware) → readout buffers → event builder (network switch) → HLT farm (PCs) → event buffer → permanent data storage; data rates decrease along the chain: 25 GB/s → 2.5 GB/s → 1.25 GB/s to storage]
Data Flow: LHCb
[Diagram: 10 MHz visible interactions (40 MHz clock) → front-end pipelines with Lvl-0 (custom hardware, 4 µs latency) → 1 MHz → readout links → readout buffers → readout/EVB network (network switch) → Lvl-1/HLT processing farm (PCs); Lvl-1 reduces 1 MHz → 40 kHz, the HLT → 200 Hz to storage]
Data Flow: ATLAS
[Diagram: 40 MHz → front-end pipelines with Lvl-1 (custom hardware, 3 µs latency) → 100 kHz → readout buffers; the ROI Builder feeds the Lvl-2 farm, which reduces 100 kHz → 1 kHz; event builder (network switch) → HLT farm (PCs) → 200 Hz to storage]

Region of Interest (ROI): identified by Lvl-1; a hint telling Lvl-2 where to investigate further.
Data Flow: CMS (one trigger level)
[Diagram: 40 MHz → front-end pipelines with Lvl-1 (custom hardware, 3 µs latency) → 100 kHz → readout buffers → event builder network (network switch, 100 kHz, 100 GB/s) → HLT processing farm (PCs) → 100 Hz to storage]
Data Flow: Readout Links

Data Flow: Data Readout
• Former times: use of bus systems
  – VME or Fastbus
  – Parallel data transfer (typically 32 bit) on a shared bus
  – One source at a time can use the bus (the shared data bus is the bottleneck)
• LHC: point-to-point links
  – Optical or electrical, data serialized
  – Custom or standard protocols
  – All sources can send data simultaneously into their buffers
• Compare trends in the industry market:
  – 198x: ISA, SCSI (1979), IDE, parallel port, VME (1982)
  – 199x: PCI (1990; 66 MHz in 1995), USB (1996), FireWire (1995)
Readout Links of the LHC Experiments

ALICE – DDL: optical, 200 MB/s, ≈ 400 links, flow control: yes.
  Half duplex: also controls the FE (commands, pedestals, calibration data). Receiver card interfaces to a PC.
ATLAS – SLINK: optical, 160 MB/s, ≈ 1600 links, flow control: yes.
  Receiver card interfaces to a PC.
CMS – SLINK-64: LVDS, 200 MB/s (max. 15 m), ≈ 500 links, flow control: yes.
  Peak throughput 400 MB/s to absorb fluctuations. Receiver card interfaces to a commercial NIC (Network Interface Card).
LHCb – TELL-1 & GbE link: copper, quad GbE, ≈ 400 links, flow control: no.
  Protocol: IPv4 (direct connection to a GbE switch). Forms "Multi Event Fragments".
Readout Links: Interface to PC
Problem:
– Read data into the PC with high bandwidth and low CPU load. Note: copying data costs a lot of CPU time!
Solution: buffer loaning
– Hardware shuffles data via DMA (Direct Memory Access) engines
– Software maintains tables of buffer chains
Advantage:
– No CPU copy involved
Used for the links of ATLAS, CMS and ALICE.
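A conceptual sketch of buffer loaning (not the actual driver code; class and method names are hypothetical). The software hands empty buffers to the hardware, the DMA engine fills them, and only ownership is passed back, never a copy of the data:

```python
from collections import deque

class BufferLoaningReadout:
    """Toy model of buffer loaning: the driver owns a pool of DMA-able
    buffers and exchanges only descriptors (ownership), never copies data."""

    def __init__(self, n_buffers, buf_size):
        self.free = deque(range(n_buffers))   # buffer ids loaned to the hardware
        self.full = deque()                   # ids filled by the DMA engine
        self.pool = [bytearray(buf_size) for _ in range(n_buffers)]

    def dma_write(self, data):
        """Called by the 'hardware': fill the next loaned buffer via DMA."""
        if not self.free:
            raise RuntimeError("back-pressure: no free buffers loaned")
        buf_id = self.free.popleft()
        self.pool[buf_id][:len(data)] = data  # stands in for the DMA transfer
        self.full.append((buf_id, len(data)))

    def read_fragment(self):
        """Called by the software: take ownership of a filled buffer (no copy)."""
        buf_id, length = self.full.popleft()
        return buf_id, memoryview(self.pool[buf_id])[:length]  # zero-copy view

    def return_buffer(self, buf_id):
        """Loan the buffer back to the hardware once the fragment is consumed."""
        self.free.append(buf_id)
```

The real receiver cards implement the same idea with descriptor tables in memory instead of Python containers.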
Event Building: example CMS

Data Flow: ATLAS vs CMS
Event Builder:
– ATLAS ("commodity"): 1 kHz @ 1 MB = O(1) GB/s
– CMS (challenging): 100 kHz @ 1 MB = 100 GB/s; increased complexity:
  • traffic shaping
  • specialized (commercial) hardware
Readout Buffer:
– ATLAS (challenging): concept of "Region Of Interest" (ROI); increased complexity:
  • ROI generation (at Lvl-1)
  • ROI Builder (custom module)
  • selective readout from the buffers
– CMS ("commodity"): implemented with commercial PCs
Current EVB architecture
[Diagram: Trigger → Front Ends → Readout Links → Readout Buffers → event builder network → Builder Units → High Level Trigger farm; supervised by EVB Control]
Intermezzo: Networking & Switches
[Diagram: a network hub with hosts a1 … a5 attached; host a2 sends a packet (Dst: a4, Src: a2, Data)]
A hub replicates incoming packets to all of its ports: every host receives the packet, and only the addressed host (a4) keeps it.
Intermezzo: Networking & Switches
[Diagram: a network switch with hosts a1 … a5 attached; host a2 sends a packet (Dst: a4, Src: a2, Data), which is forwarded only to a4]
A switch "knows" the addresses of the hosts connected to its "ports" and forwards each packet only to the port of its destination.
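A minimal sketch of how a switch can learn which address sits behind which port (hypothetical, simplified Ethernet-style learning; a hub, by contrast, floods every frame to all ports):

```python
class LearningSwitch:
    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.table = {}                       # address -> port

    def forward(self, in_port, src, dst):
        """Return the list of ports a frame (src -> dst) is sent out on."""
        self.table[src] = in_port             # learn where 'src' lives
        if dst in self.table:                 # known destination: one port only
            return [self.table[dst]]
        # unknown destination: flood like a hub (all ports except the input)
        return [p for p in range(self.n_ports) if p != in_port]

# Example: a2 (on port 2) talks to a4 before and after the switch has learned a4.
sw = LearningSwitch(5)
print(sw.forward(2, "a2", "a4"))   # flooded: [0, 1, 3, 4]
print(sw.forward(4, "a4", "a2"))   # a2 already learned: [2]
print(sw.forward(2, "a2", "a4"))   # now a4 is known: [4]
```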
Networking: EVB traffic
[Diagram: N Readout Buffers connected through a network switch to M "Builder Units"; all fragments of one Lvl-1 event converge on a single Builder Unit]
EVB traffic: all sources send to the same destination (almost) concurrently → congestion.
Ideal world for Event Building
• Aim: the Event Builder should not cause the Readout Buffers to overflow.
• Input traffic:
  – N Readout Buffers send data at a given rate to the Event Builder switch. The average bandwidth accepted by each switch input port must be larger than or equal to the input data rate of the Readout Buffer:
    B_input >= R_input
• Output traffic:
  – M Builder Units receive event fragments from each of the N input ports of the switch. To avoid blocking:
    M * B_output >= N * R_input
• These relations must hold for the particular traffic pattern of Event Building: all data sources send the fragments belonging to one event to the same destination.
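A numeric illustration of these two conditions with CMS-like port counts (the per-node bandwidth figures are assumed round values; only the orders of magnitude matter):

```python
N        = 64       # readout buffers (sources)
M        = 64       # builder units (destinations)
R_input  = 200e6    # B/s sustained per readout buffer (assumption)
B_input  = 250e6    # B/s wire speed of a switch input port (Myrinet-like)
B_output = 250e6    # B/s wire speed of a switch output port

# Input condition: every input port must absorb its source.
assert B_input >= R_input

# Output condition: the M outputs together must drain what the N inputs inject.
assert M * B_output >= N * R_input

print(f"aggregate input : {N * R_input  / 1e9:.1f} GB/s")   # 12.8 GB/s
print(f"aggregate output: {M * B_output / 1e9:.1f} GB/s")   # 16.0 GB/s
```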
Switch implementation: crossbar
[Diagram: crossbar with inputs I1 … I4 and outputs O1 … O4]
Who operates the crosspoints? Control logic reads the destination of each packet and sets the crosspoints appropriately.
Every input / output has a given maximum "wire speed" (e.g. 2 Gbit/s); the internal connections are much faster.
Switch implementation
Crossbar switch: congestion in EVB traffic
[Diagram: all inputs I1 … I4 hold packets for the same output]
Only one packet at a time can be routed to a given destination: "head-of-line" blocking.
Switch implementation
Crossbar switch, improvement: output FIFOs
[Diagram: crossbar with a FIFO in front of each output]
Non-blocking switch: the FIFOs would need N times the bandwidth of the input lines. For large switches this is hard (or not at all) achievable.
Switch implementation
Crossbar switch, improvement: additional input FIFOs
[Diagram: crossbar with FIFOs at the inputs as well as the outputs]
Still problematic: input FIFOs can absorb data fluctuations only until they are full. All is fine if the FIFO capacity is O(event size); in practice the FIFOs are much smaller.
For EVB traffic the "head-of-line" blocking problem remains.
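A small simulation of why input FIFOs do not save EVB traffic (a sketch: one packet per output port per time slot, one FIFO per input, only the head-of-line packet can be forwarded):

```python
from collections import deque

def crossbar_slots(input_fifos):
    """Count time slots needed to drain the input FIFOs through a crossbar
    that can deliver at most one packet per output port per slot."""
    fifos = [deque(f) for f in input_fifos]
    slots = 0
    while any(fifos):
        busy_outputs = set()
        for f in fifos:                       # only the head-of-line packet
            if f and f[0] not in busy_outputs:
                busy_outputs.add(f.popleft())
        slots += 1
    return slots

# "Shaped" traffic: input i sends to outputs i, i+1, i+2, i+3 (mod 4),
# so in every slot all head-of-line packets go to different outputs.
shaped = [[(i + t) % 4 for t in range(4)] for i in range(4)]
# EVB traffic: all inputs send every packet to the same destination (output 0).
evb    = [[0, 0, 0, 0] for _ in range(4)]

print(crossbar_slots(shaped))   # 4 slots  (full parallelism)
print(crossbar_slots(evb))      # 16 slots (congestion at output 0)
```

The "shaped" pattern is exactly what the barrel shifter described later enforces.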
Alternative switch implementation
[Diagram: inputs I1 … I4 and outputs O1 … O4 connected through a shared memory, managed by control logic]
Similar issue: the behaviour of the switch (blocking or non-blocking) depends largely on the amount of internal memory (FIFOs and shared memory).
Conclusion: EVB traffic and switches
• EVB network traffic is particularly hard for switches
  – The traffic pattern is such that it leads to congestion in the switch.
  – The switch either "blocks" (packets at the input have to wait) or throws away data packets (Ethernet switches).
• Possible cures:
  – Buy very expensive switches with a lot of high-speed memory inside.
  – "Over-dimension" your system in terms of bandwidth and accept that you only exploit a small fraction of the wire speed.
  – Find a clever method which allows you to exploit your switches to nearly 100% anyway: traffic shaping.
EVB example: CMS

EVB CMS: 2 stages

CMS: 3D-EVB
Stage 1: FED-Builder implementation
• FED Builder functionality
  – Receives event fragments from several (8) Front End Readout Links (FRLs).
  – FRL fragments are merged into super-fragments at the destination (Readout Unit); see the sketch below.
• FED Builder implementation
  – Requirements:
    • Sustained throughput of 200 MB/s for every data source (500 in total).
    • Input interfaces to an FPGA (in the FRL) → the protocol must be simple.
  – Chosen network technology: Myrinet
    • NICs (Network Interface Cards) with 2 x 2.0 Gb/s optical links
    • Switches based on crossbars
    • Full duplex with flow control (no packet loss)
    • NIC cards contain a RISC processor; a development system is available.
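A sketch of what the FED-Builder destination does conceptually: concatenate the 8 FRL fragments of one event into a super-fragment (the data layout here is hypothetical; the real firmware works on Myrinet packets):

```python
def build_super_fragment(event_id, frl_fragments):
    """Merge the fragments of one event coming from several FRLs
    into a single super-fragment for the Readout Unit."""
    assert len(frl_fragments) == 8, "one FED-Builder input serves 8 FRLs"
    ordered = [frl_fragments[frl] for frl in sorted(frl_fragments)]
    header  = event_id.to_bytes(4, "big") + len(ordered).to_bytes(4, "big")
    return header + b"".join(ordered)

# Example: 8 FRLs each deliver a small dummy fragment for event 42.
fragments = {frl: bytes([frl]) * 16 for frl in range(8)}
super_fragment = build_super_fragment(42, fragments)
print(len(super_fragment))   # 8 header bytes + 8*16 payload bytes = 136
```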
Performance of "1-rail" FED-Builder
Measurement configuration: 8 sources to 8 destinations.
[Plot: measured switch utilization for different fragment-size mixes; blue: all inputs 2 kB average; magenta: 4 x 1.33 kB and 4 x 2.66 kB; red: 4 x 1 kB and 4 x 3 kB; utilization ≈ 50%]

Solution: "over-dimension" the FED-Builder
Stage 2: RU-Builder
• Implementation: Myrinet
  – Connect 64 Readout Units to 64 Builder Units with a switch
  – Wire speed in Myrinet: 250 MB/s
• Traffic shaping with a barrel shifter
  – Network topology: Clos switch
    • No congestion if all source ports send to different destination ports
  – Chop event data into fixed-size blocks (re-assembly done at the receiver)
  – Barrel shifter (next slide)
Clos switch: 128 ports built from 8x8 crossbars
[Diagram: 8x8 crossbar switches interconnected to form a 128-port Clos network]
RU-Builder: Barrel shifter
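The barrel-shifter idea, sketched below: event data are chopped into fixed-size blocks and, in time slot t, Readout Unit i sends its next block to Builder Unit (i + t) mod N. Every output then receives from exactly one input per slot, so the EVB traffic pattern no longer congests the switch (a minimal sketch of the schedule, not the Myrinet firmware):

```python
def barrel_shifter_schedule(n_sources, n_slots):
    """For each time slot, return the destination assigned to every source:
    source i sends its next fixed-size block to destination (i + t) mod N."""
    return [[(i + t) % n_sources for i in range(n_sources)]
            for t in range(n_slots)]

for t, dests in enumerate(barrel_shifter_schedule(4, 4)):
    # In every slot all destinations are distinct -> no output congestion.
    assert len(set(dests)) == len(dests)
    print(f"slot {t}: source->destination {dests}")
```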
RU-Builder performance
• EVB demonstrator 32x32
• Block size 4 kB
• Throughput of 234 MB/s = 94% of the link bandwidth (250 MB/s)
(Measurement from 2003; the working point is marked in the plot.)
Event Building: EVB protocol
• Aim: the Event Builder should perform load balancing
  – If for some reason some destinations are slower than others, this should not slow down the entire DAQ system.
  – Another form of traffic shaping.
[Diagram: Trigger → Front Ends → Readout Links → Readout Buffers → event builder network → Builder Units → High Level Trigger farm; EVB Control: Event Manager]
Event Building: EVB protocol
[Animation: message exchange between a Builder Unit, the Event Manager and the Readout Buffers]
1. Builder Unit → Event Manager: "I have n free resources."
2. Event Manager → Builder Unit: "Build events id1, id2, … idn."
3. Builder Unit → Readout Buffers: "Send me the fragments for events id1, id2, … idn."
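The same message exchange as a sketch (hypothetical class and method names; the real protocol runs over the event builder network):

```python
class EventManager:
    """Hands out event ids to Builder Units on demand ("send on demand")."""
    def __init__(self):
        self.next_id = 0
    def allocate(self, n_free):
        ids = list(range(self.next_id, self.next_id + n_free))
        self.next_id += n_free
        return ids                               # "build events id1 ... idn"

class ReadoutBuffer:
    def __init__(self, source_id):
        self.source_id = source_id
    def get_fragment(self, event_id):
        return f"frag(src={self.source_id}, ev={event_id})"

class BuilderUnit:
    def __init__(self, evm, readout_buffers):
        self.evm = evm
        self.rus = readout_buffers
    def build_round(self, n_free):
        ids = self.evm.allocate(n_free)          # 1) announce free resources
        events = {}
        for ev in ids:                           # 2) request the fragments
            events[ev] = [ru.get_fragment(ev) for ru in self.rus]
        return events                            # complete events -> HLT

# One Builder Unit with 3 free resources asks for and assembles 3 events.
evm = EventManager()
bu  = BuilderUnit(evm, [ReadoutBuffer(s) for s in range(4)])
print(bu.build_round(3)[2])   # the 4 fragments of event id 2
```

Because a slow Builder Unit simply asks for fewer event ids, the scheme balances the load automatically.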
Conclusions
• Trigger / DAQ at LHC experiments
  – Many trigger levels (Lvl-1, Lvl-2, HLT):
    • partial event readout
    • complex readout buffer
    • "straightforward" EVB
  – One trigger level before the HLT (CMS: Lvl-1, HLT):
    • "simple" readout buffer
    • high-throughput EVB
    • complex EVB implementation (custom protocols, firmware)
• Detector readout: custom point-to-point links
• Event Building
  – Implemented with commercial network technologies
  – Event building is done via network switches in large distributed systems
  – Event Building traffic leads to network congestion; traffic shaping copes with this problem
Outlook

Moore's Law
[Plot: Moore's law trend, with "now" marked]
Technology: FPGAs
• Performance and features of today's "top of the line":
  – XILINX:
    • High-performance serial connectivity (3.125 Gb/s transceivers): 10GbE cores, InfiniBand, Fibre Channel, …
    • PCI Express core (x1 and x4 => 10GbE ready)
    • Embedded processors: 1 or 2 400 MHz PowerPC 405 cores on chip
  – ALTERA:
    • 3.125 Gb/s transceivers: 10GbE cores, InfiniBand, Fibre Channel, …
    • PCI Express core
    • Embedded processors: ARM processor (200 MHz); NIOS "soft" RISC, configurable
Technology: PCs
[Diagram: a modern server PC with PCI Express connectivity]
• Connectivity
  – PCI(-X) → PCI Express
  – 10GbE network interfaces
• Intel's processor technology
  – Future processors (towards 2015):
    • Parallel processing
    • Many CPU cores on the same silicon chip
    • Cores might be different (e.g. special cores for communication, to offload software)
Event Builder Tasks
• Synchronization of fragments
  – Fragments of the same BX can be assembled into an event
  – Possible means to achieve this:
    • Trigger number
    • Bunch counter (no overflow)
    • Time stamp (the time resolution needs to match the machine parameters)
• Destination assignment
  – Event fragments of one interaction need to be sent to the same destination in order to assemble the event
  – Possible implementations (a minimal sketch follows below):
    • An "Event Manager" distributes destinations for events identified by their trigger number
    • "Static" destination assignment based on trigger number
    • "Static" destination assignment based on BX number
    • An algorithm based on the time stamp
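A sketch of the two simplest options from the lists above: fragments are checked to carry the same trigger number, and the destination is derived statically from that number (hypothetical helpers; M Builder Units assumed):

```python
def assign_destination(trigger_number, n_builder_units):
    """Static destination assignment: same trigger number -> same Builder Unit."""
    return trigger_number % n_builder_units

def check_synchronization(fragments):
    """All fragments of one event must carry the same trigger number."""
    trigger_numbers = {frag["trigger_number"] for frag in fragments}
    if len(trigger_numbers) != 1:
        raise RuntimeError(f"event builder out of sync: {trigger_numbers}")
    return trigger_numbers.pop()

# Example: every source routes the fragment of trigger 1000 to the same place.
M = 64
fragments = [{"source": s, "trigger_number": 1000} for s in range(4)]
ev = check_synchronization(fragments)
print(assign_destination(ev, M))    # 1000 % 64 = 40, identical for all sources
```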
Event Builder Tasks (cont'd)
• Load balancing for destination units:
  – Needed to optimally exploit the available resources
  – Needs a "feedback" system: the work load of the destinations must be fed back to the Event Manager
  – Possible implementations:
    • "Send on demand": destination assignment is done as a function of requests from the destinations
    • Static destination assignment: destination tables need to be updated if destinations start to become idle
Current EVB architecture
[Diagram: Trigger → Front Ends → Readout Links → Readout Buffers → event builder network → Builder Units → High Level Trigger farm; supervised by EVB Control]
Future EVB architecture I
[Diagram: Trigger with destination assignment distributes "Lvl-1 Accept & destination" to the Front Ends → Readout Links → event builder network → Builder Units → High Level Trigger farm; no separate Readout Buffer stage shown]
Future EVB architecture II
[Diagram: Trigger with destination assignment distributes "Lvl-1 Accept & destination" to the Front Ends → Readout Links → event builder network → combined Builder and High Level Trigger farm]
EXTRA SLIDES

Example CMS: data flow
High Level Trigger: CPU usage
• Based on full simulation, full analysis and "offline" HLT code
• All numbers for a 1 GHz Intel Pentium III CPU
• Total: 4092 s for 15.1 kHz → 271 ms/event
• Expect improvements and additions
• A 100 kHz system requires 1.2x10^6 SI95
• Corresponds to 2000 dual-CPU boxes in 2007 (assuming Moore's law)
Trigger               CPU (ms)   Rate (kHz)   Total (s)
1e/γ, 2e/γ            160        4.3          688
1µ, 2µ                710        3.6          2556
1τ, 2τ                130        3.0          390
Jets, Jet * Miss-ET   50         3.4          170
e * jet               165        0.8          132
B-jets                300        0.5          150
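Checking the arithmetic of the table (a sketch; the SI95 figure and the 2007 box count quoted above additionally assume Moore's-law scaling):

```python
rows = {                      # trigger stream: (CPU per event [ms], rate [kHz])
    "1e/g, 2e/g":        (160, 4.3),
    "1mu, 2mu":          (710, 3.6),
    "1tau, 2tau":        (130, 3.0),
    "jets, jet*miss-ET": ( 50, 3.4),
    "e*jet":             (165, 0.8),
    "B-jets":            (300, 0.5),
}

total_cpu  = sum(ms / 1e3 * khz * 1e3 for ms, khz in rows.values())  # CPU-s per s of data
total_rate = sum(khz for _, khz in rows.values())                    # kHz
print(f"total CPU : {total_cpu:.0f} s per second of data")   # ~4090 s (slide: 4092 s)
print(f"per event : {total_cpu / (total_rate * 1e3) * 1e3:.0f} ms")  # ~260 ms (slide: 271 ms at 15.1 kHz)
# Scaling to a 100 kHz Lvl-1 rate with the same per-event cost:
print(f"100 kHz   : {100e3 * total_cpu / (total_rate * 1e3):.0f} 1-GHz Pentium III equivalents")
```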
GbE-EVB: raw packets & special driver
• Point to point: 120 MB/s
• EVB 32x32: sawtooth due to the MTU; above 10 kB a plateau of 115 MB/s, i.e. 92% of the link speed (1 Gb/s)
• Alteon AceNIC
• FastIron 8000
• Plus recovery from packet loss
GbE-EVB: TCP/IP, full standard
[Plot: throughput per node (MB/s) vs. fragment size (Byte); curves: link bandwidth (1 Gb/s), TCP/IP point-to-point 1-way stream, TCP/IP EVB 32x32]
• Point to point: at 1 kB, 88 MB/s (70%)
• EVB 32x32: at 16 kB, 75 MB/s (60%)
• Alteon AceNIC
• FastIron 8000
• Standard MTU (1500 B payload)
• Pentium III, Linux 2.4
Myrinet (old switch)
• Network built out of crossbars (Xbar16)
• Wormhole routing, built-in back pressure (no packet loss)
• Switch: 128-port Clos switch crate
• 64x64 ports at 2.0 Gbit/s (bisection bandwidth 128 Gbit/s)
• NIC: M3S-PCI64B-2 (LANai9 with RISC), custom firmware
Demonstrator equipment 2003
• 64 PCs
• SuperMicro 370DLE (733 MHz and 1 GHz Pentium III), 256 MB DRAM
• ServerWorks LE chipset; PCI: 32 bit / 33 MHz + 64 bit / 66 MHz (2 slots)
• Linux 2.4
• Myrinet 2000 Clos-128 switch (64 ports equipped) and M3M-PCI64B NICs
• Gigabit Ethernet: 64-port FastIron 8000 and Alteon AceNIC NICs
"State of the art" in 2001
LHCb
• Operates at L = 2 x 10^32 cm^-2 s^-1: 10 MHz event rate
• Lvl-0: 2-4 µs latency, 1 MHz output
  – Pile-up veto, calorimeter, muon
• Lvl-1: 52.4 ms latency, 40 kHz output
  – Impact parameter measurements
  – Runs on the same farm as the HLT and the EVB
• Pile-up veto
  – Can only tolerate one interaction per bunch crossing, since otherwise the trigger would always find a displaced vertex
Event Building
• ATLAS, ALICE, LHCb
  – Event building at the 2nd-level trigger rate, O(10 kHz)
  – Data throughput of the Event Builder O(10 GB/s)
• CMS
  – Only one trigger level (before the HLT selection on the computing farm)
  – Event building parameters:
    • First level trigger operates at 100 kHz
    • Event size O(1 MB)
    • Throughput 100 GB/s
  – Further requirements:
    • Scalability
    • Possibility to exploit technology improvements