From 42279dc607f094859161e1c1f8ea66c239762e35 Mon Sep 17 00:00:00 2001
From: Javier Duarte
Date: Thu, 1 Feb 2024 18:44:17 -0800
Subject: [PATCH 1/2] add ssl-jets project

---
 projects/ssl-jets.yml | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 projects/ssl-jets.yml

diff --git a/projects/ssl-jets.yml b/projects/ssl-jets.yml
new file mode 100644
index 0000000..bb10205
--- /dev/null
+++ b/projects/ssl-jets.yml
@@ -0,0 +1,42 @@
+---
+name: Self-Supervised Approaches to Jet Assignment
+
+postdate: 2024-02-01
+categories:
+  - ML/AI
+durations:
+  - 3 months
+experiments:
+  - Any
+skillset:
+  - Python
+  - ML
+status:
+  - Available
+project:
+  - IRIS-HEP
+location:
+  - Any
+commitment:
+  - Any
+program:
+  - IRIS-HEP fellow
+
+shortdescription: Self-Supervised Approaches to Jet Assignment
+
+description: >
+  Supervised machine learning has assisted various tasks in experimental high energy physics. However, using supervised learning to solve complicated problems, like assigning jets to resonant particles like Higgs bosons, requires a statistically representative, accurate, and fully labeled dataset. With the HL-LHC upgrade [1] in the near future, we will need to simulate an order of magnitude more events with a more complicated detector geometry to keep up with the recorded data [2], facing both budgetary and technological challenges [2, 3]. Therefore, it is desirable to explore how to assign jets to reconstruct particles via self-supervised learning (SSL) methods, which pretrain models on a large amount of unlabeled data and fine-tune those models on a small high-quality labeled dataset. Existing attempts [4-6] to use SSL in HEP focus on performing tasks at the jet or event levels. In this project, we propose to use the reconstruction of Higgs bosons from bottom quark jets as a test case to explore SSL for jet assignment. We will explore different neural network architectures, including PASSWD-ABC [7] for the self-supervised pretraining and SPANet [8, 9] for the supervised fine-tuning. The SSL model's performance will be compared with a baseline model trained from scratch on the small labeled dataset. We will test if pretraining with diverse objectives [10] improves the model performance on downstream tasks like jet assignment or tagging. The code will be developed open source to help other SSL projects.
+  
+  1. [HL-LHC] https://arxiv.org/abs/1705.08830 \
+  2. [Computing for HL LHC] https://doi.org/10.1051/epjconf/201921402036 \
+  3. [Computing summary] https://arxiv.org/abs/1803.04165 \
+  4. [JetCLR] https://arxiv.org/abs/2108.04253 \
+  5. [DarkCLR] https://arxiv.org/abs/2312.03067 \
+  6. [SSL for new physics] https://doi.org/10.1103/PhysRevD.106.056005 \
+  7. [PASSWD-ABC] https://arxiv.org/abs/2309.05728 \
+  8. [SPANet1] https://arxiv.org/abs/2010.09206 \
+  9. [SPANet2] https://arxiv.org/abs/2106.03898 \
+  10. [Pretraining benefits] https://arxiv.org/abs/2306.15063
+contacts:
+  - name: Javier Duarte
+    email: jduarte@ucsd.edu

From 37d3a3cde9cf8c1234b93ff72a4c1fec40187af7 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 2 Feb 2024 02:44:58 +0000
Subject: [PATCH 2/2] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 projects/ssl-jets.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/projects/ssl-jets.yml b/projects/ssl-jets.yml
index bb10205..3385326 100644
--- a/projects/ssl-jets.yml
+++ b/projects/ssl-jets.yml
@@ -26,7 +26,7 @@ shortdescription: Self-Supervised Approaches to Jet Assignment
 
 description: >
   Supervised machine learning has assisted various tasks in experimental high energy physics. However, using supervised learning to solve complicated problems, like assigning jets to resonant particles like Higgs bosons, requires a statistically representative, accurate, and fully labeled dataset. With the HL-LHC upgrade [1] in the near future, we will need to simulate an order of magnitude more events with a more complicated detector geometry to keep up with the recorded data [2], facing both budgetary and technological challenges [2, 3]. Therefore, it is desirable to explore how to assign jets to reconstruct particles via self-supervised learning (SSL) methods, which pretrain models on a large amount of unlabeled data and fine-tune those models on a small high-quality labeled dataset. Existing attempts [4-6] to use SSL in HEP focus on performing tasks at the jet or event levels. In this project, we propose to use the reconstruction of Higgs bosons from bottom quark jets as a test case to explore SSL for jet assignment. We will explore different neural network architectures, including PASSWD-ABC [7] for the self-supervised pretraining and SPANet [8, 9] for the supervised fine-tuning. The SSL model's performance will be compared with a baseline model trained from scratch on the small labeled dataset. We will test if pretraining with diverse objectives [10] improves the model performance on downstream tasks like jet assignment or tagging. The code will be developed open source to help other SSL projects.
-  
+
   1. [HL-LHC] https://arxiv.org/abs/1705.08830 \
   2. [Computing for HL LHC] https://doi.org/10.1051/epjconf/201921402036 \
   3. [Computing summary] https://arxiv.org/abs/1803.04165 \