The transformation of deep learning for jet flavour tagging in ATLAS
by
Nicole Michelle Hartman (TUM)
Europe/Berlin
ENC-D308
Description
At the Large Hadron Collider, we seek to determine the fundamental parameters of the Standard Model by identifying the origin of the final state particles in an event. A hadron collider produces copious amounts of jets in the final state, and the classification of the origin of these jets, or “jet flavor tagging”, is of paramount importance for the LHC physics program. In particular, identifying the jets from bottom quarks, i.e. b-jets, is crucial for the Higgs and top physics programs.
Deep neural network architectures for b-tagging from a set of constituent tracks have been advancing in ATLAS over the past seven years, starting with the introduction of recurrent neural networks and then evolving to deep sets and graph neural networks. Today’s state of the art in flavor tagging is a transformer model, the same backbone architecture as ChatGPT. These modern transformers deliver an impressive factor of 4 improvement over previous b-taggers, a gain also afforded by a ten-fold increase in training statistics. Additional terms in the loss function help the transformer learn more about the physics of the weakly decaying B-hadrons. The same architecture is also highly useful for related applications, such as identifying boosted Higgs bosons or regressing the b-jet momentum.
This talk reviews modern b-taggers in ATLAS and showcases the robust modelling we’ve seen in real data events. We conclude by highlighting how differentiable programming and fine-tuning such large “foundation models” can continue to push these performance gains further in the next generation of tagger development.