diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..3f23715 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,94 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +This is a Jekyll-based training module for the HEP Software Foundation (HSF) teaching "Matplotlib for HEP". It introduces the matplotlib plotting library and shows how to create plots commonly used in High Energy Physics, including specialized HEP styling via `mplhep`. + +## Recent Updates + +The training materials have been modernized to align with current matplotlib and Python best practices: +- Updated to modern matplotlib 3.x+ styling and API usage +- Enhanced with mplhep context managers and experiment-specific styles +- Fixed deprecated functions and improved plot aesthetics +- Added comprehensive requirements.txt for dependency management +- Updated MathJax to modern CDN for equation rendering + +## Common Development Commands + +### Local Development +- **Serve locally**: `make serve` - Builds and serves the Jekyll site locally +- **Build site**: `make site` - Builds the site without serving +- **Clean**: `make clean` - Removes generated files and caches + +### Docker Alternative +- **Docker serve**: `make docker-serve` - Uses Docker to serve the site (requires Docker) + +### Repository Maintenance +- **Repository check**: `make repo-check` - Validates repository settings +- **Lesson validation**: `make lesson-check` - Validates lesson Markdown files +- **Complete validation**: `make lesson-check-all` - Full validation including line lengths and whitespace +- **Unit tests**: `make unittest` - Runs tests on checking tools + +### Pre-commit Hooks +This repository uses pre-commit hooks for code quality: +```bash +pip3 install pre-commit +pre-commit install +``` + +## Site Architecture + +### Jekyll Structure +- **_config.yml**: Main Jekyll configuration for HSF training theme +- **_episodes/**: Lesson content
in Markdown format (7 episodes total) +- **_extras/**: Additional pages (about, discussion, figures, guide) +- **_includes/**: Reusable template components +- **fig/**: Image assets for lessons +- **Gemfile**: Ruby dependencies including hsf-training-theme + +### Content Organization +- Episodes are numbered sequentially (01-07) covering: + - **01-introduction.md**: Matplotlib basics, modern best practices, figure creation + - **02-coffee-break.md**: Interactive break elements + - **03-physics.md**: Standard Model background, particle physics theory + - **04-higgs-search.md**: Real ATLAS data analysis, histogram techniques + - **05-mplhep.md**: HEP-specific styling, experiment themes (CMS, ATLAS, etc.) + - **06-coffee-break.md**: Interactive break elements + - **07-dimuonspectrum.md**: Advanced analysis with invariant mass calculations +- Uses Carpentries lesson template structure +- Physics-focused content with LaTeX math support via modern MathJax + +### Build System +- Uses Jekyll with HSF training theme +- Ruby-based build system via Bundler +- Makefile provides convenience commands +- GitHub Pages deployment via gh-pages branch +- Pre-commit hooks enforce markdown and code formatting + +## Development Notes + +- Main development branch: `gh-pages` +- Uses HSF training theme from remote repository +- Includes physics equation rendering via modern MathJax 3.x +- Associated notebooks repository for interactive content: `hsf_matplotlib_notebooks` +- Multiple cloud platforms supported: Binder, Google Colab, GitHub Codespaces, CERN SWAN +- **requirements.txt** available for local Python environment setup + +## Python Dependencies + +The repository includes a comprehensive `requirements.txt` with: +- Core: matplotlib>=3.6.0, numpy, pandas +- HEP-specific: mplhep>=0.3.0, uproot>=4.0.0, hist>=2.6.0 +- Jupyter ecosystem: jupyterlab>=4.0.0, notebook, ipywidgets +- Data handling: h5py for HDF5 files + +## Code Style Guidelines + +- Use explicit `fig, ax = plt.subplots()` 
pattern +- Prefer `plt.show()` over `fig.show()` +- Use context managers for temporary styling: `with plt.style.context(hep.style.CMS):` +- Include `plt.tight_layout()` for better subplot spacing +- Add subtle grids with `alpha=0.3` for better readability +- Use modern error bar styling with `capsize` and `markersize` parameters \ No newline at end of file diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md index ccd3077..19106a3 100644 --- a/_episodes/01-introduction.md +++ b/_episodes/01-introduction.md @@ -20,6 +20,15 @@ The example-based nature of [Matplotlib documentation](https://matplotlib.org/) Matplotlib is the standard when it comes to making plots in Python. It is versatile and allows for lots of functionality and different ways to produce many plots. We will be focusing on using matplotlib for High Energy Physics. +> ## Modern Best Practices +> +> While matplotlib has evolved significantly, some key modern practices include: +> - Using explicit figure and axes creation with `fig, ax = plt.subplots()` +> - Leveraging context managers for temporary styling changes +> - Passing styling options as explicit keyword arguments (e.g., `alpha=0.8` to set transparency) +> - Taking advantage of improved default styling in matplotlib 3.x+ +{: .callout} + # A simple example As with any Python code it is always good practice to import the necessary libraries as a first step. @@ -38,7 +47,6 @@ plt.show() # Show the figure This code produces the following figure: -  > ## Notice @@ -130,11 +138,11 @@ plt.show() As mentioned, by default `fig, ax = plt.subplots()` creates the canvas automatically. We can have finer control over the shape and quality of the plots by using the keyword arguments `figsize` and `dpi` as follows. ```python -fig, ax = plt.subplots(figsize=(10, 10), dpi=150) +fig, ax = plt.subplots(figsize=(10, 8), dpi=100) ``` -This has to be set **before** any instance of `ax.plot` and it sets the width and height to 10 and 10 inches respectively.
The keyword `dpi` refers to a density of *Dots Per Inch*. -There is no particular reason to choose 150 as the value for dpi but there is a visually a noticeable difference in the size and quality of the plot. +This has to be set **before** any instance of `ax.plot` and it sets the width and height to 10 and 8 inches respectively. The keyword `dpi` refers to a density of *Dots Per Inch*. +Matplotlib's default is 100 DPI, which is a reasonable choice for on-screen display; you can increase it to 150 or 200 for higher-quality output. ### Title @@ -231,12 +239,11 @@ We have available a useful python package called [mplhep](https://mplhep.readthe ```python import mplhep as hep -hep.style.use(hep.style.ROOT) # For now ROOT defaults to CMS -# Or choose one of the experiment styles +# Modern mplhep usage - choose one of the experiment styles hep.style.use(hep.style.ATLAS) -# or -hep.style.use("CMS") # string aliases work too -# {"ALICE" | "ATLAS" | "CMS" | "LHCb1" | "LHCb2"} +# or using string aliases (recommended) +hep.style.use("CMS") +# Available styles: {"ALICE" | "ATLAS" | "CMS" | "LHCb1" | "LHCb2" | "ROOT"} ``` and with just this addition we can produce the same plot as before with this new look.
@@ -267,14 +274,25 @@ We will discuss histograms more in detail later but here is an example code and # first lets get some fake data data = np.random.normal(size=10_000) +# create figure +fig, ax = plt.subplots() + # now lets make the plot -counts, bin_edges, _ = ax.hist(data, bins=50, histtype="step") +counts, bin_edges, _ = ax.hist(data, bins=50, histtype="step", alpha=0.8) + +# we need to get the centers in order to get the correct location for the error bars +bin_centers = ( + bin_edges[:-1] + bin_edges[1:] +) / 2 # More explicit bin center calculation -# we need to get the centers in order get the correct location for the errobars -bin_centers = bin_edges[:-1] + np.diff(bin_edges) / 2 +# add error bars (Poisson errors for counts) +ax.errorbar(bin_centers, counts, yerr=np.sqrt(counts), fmt="none", capsize=2) -# add error bars -ax.errorbar(bin_centers, counts, yerr=np.sqrt(counts), fmt="none") +# Add labels and improve appearance +ax.set_xlabel("Value") +ax.set_ylabel("Counts") +ax.set_title("Histogram with Error Bars") +ax.grid(True, alpha=0.3) plt.show() ``` diff --git a/_episodes/03-physics.md b/_episodes/03-physics.md index b5ba127..8fd61e7 100644 --- a/_episodes/03-physics.md +++ b/_episodes/03-physics.md @@ -10,9 +10,9 @@ keypoints: - "Analysis studies Higgs boson decays" --- - + You can take a look at the mathematical structure of the Standard Model of Particle Physics in the next section, or go directly to the main Higgs production mechanisms at hadron colliders. diff --git a/_episodes/04-higgs-search.md b/_episodes/04-higgs-search.md index 881a08b..984b081 100644 --- a/_episodes/04-higgs-search.md +++ b/_episodes/04-higgs-search.md @@ -13,7 +13,7 @@ objectives: keypoints: - "In High-energy physics, histograms are used to analyze different data and MC distributions." - "With Matplotlib, data can be binned and histograms can be plotted in a few lines."
-- "Using [Uproot](https://github.com/scikit-hep/uproot5) and Matplotlib, data in ROOT files can be display without need of a full ROOT installation." +- "Using [Uproot](https://github.com/scikit-hep/uproot5) and Matplotlib, data in ROOT files can be displayed without need of a full ROOT installation." - "Histograms can be stacked and/or overlapped to make comparison between recorded and simulated data." --- In this episode, we will go through a first HEP analysis where you will be able to apply your knowledge of matplotlib and learn something new. @@ -150,14 +150,19 @@ ax.hist(branches["data_A"]["m4l"])  -**Tip:** In the previous plot the numbers in the axis are very small, we can change the font size (and font family) for all the following plots, including in our code: +**Tip:** In the previous plot the numbers on the axes are very small. We can change the font size (and font family), either temporarily with a context manager or globally for all the following plots: ```python -# Update the matplotlib configuration parameters: -mpl.rcParams.update({"font.size": 16, "font.family": "serif"}) +# Modern approach - use context manager for temporary changes +with plt.rc_context({"font.size": 16, "font.family": "serif"}): + fig, ax = plt.subplots() + # your plotting code here + +# Or update global settings (affects all subsequent plots) +plt.rcParams.update({"font.size": 16, "font.family": "serif"}) ``` -Note that this changes the global setting, but it can still be overwritten later. +The context manager approach is preferred when you want temporary styling changes, while `plt.rcParams.update()` is better for permanent changes in your session.
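The difference between the two approaches can be checked directly. The following is a minimal sketch, assuming only that matplotlib is installed; the `Agg` backend and the variable names (`original_size`, `inside_size`, `after_size`) are illustrative choices and not part of the lesson:

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Remember the font size that is active before any change
original_size = plt.rcParams["font.size"]

# Temporary change: only applies while the with-block is active
with plt.rc_context({"font.size": 16, "font.family": "serif"}):
    inside_size = plt.rcParams["font.size"]
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_xlabel("x")
    plt.close(fig)

# After the block exits, the previous setting is restored
after_size = plt.rcParams["font.size"]
```

Inside the `with` block the font size is 16; once the block exits, the earlier value is active again. A call to `plt.rcParams.update()` would instead keep the new value until it is changed again or the session ends.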
Let's do the plot again to see the changes: @@ -412,13 +417,21 @@ bins = 24 ``` ```python -fig, (ax_1, ax_2) = plt.subplots(1, 2) -fig.set_size_inches((12, 8)) +fig, (ax_1, ax_2) = plt.subplots(1, 2, figsize=(12, 8)) ax_1.set_title("MC samples without weights") -ax_1.hist(stack_mc_list_m4l, range=ranges[0], label=mc_samples, stacked=True, bins=bins) +ax_1.hist( + stack_mc_list_m4l, + range=ranges[0], + label=mc_samples, + stacked=True, + bins=bins, + alpha=0.8, +) ax_1.set_ylabel("Events") ax_1.set_xlabel(f"{var_name}{units}") ax_1.legend(frameon=False) +ax_1.grid(True, alpha=0.3) # Add subtle grid + ax_2.set_title("MC samples with weights") ax_2.hist( stack_mc_list_m4l, @@ -427,11 +440,14 @@ ax_2.hist( stacked=True, weights=stack_weights_list, bins=bins, + alpha=0.8, ) ax_2.set_ylabel("Events") ax_2.set_xlabel(f"{var_name}{units}") ax_2.tick_params(which="both", direction="in", top=True, right=True, length=6, width=1) +ax_2.grid(True, alpha=0.3) # Add subtle grid ax_2.legend(frameon=False) +plt.tight_layout() # Better subplot spacing ```  @@ -463,25 +479,62 @@ To make more easy the data vs. MC final plot, we can define the following helper When we want to make a plot that includes uncertainties we need to use the `ax.errorbar` function. ```python -def plot_data(data_var, range_ab, bins_samples): +def plot_data(data_var, range_ab, bins_samples, ax=None): + """ + Plot data histogram with Poisson error bars. + + Parameters: + ----------- + data_var : array-like + Data to histogram + range_ab : tuple + Range for histogram (min, max) + bins_samples : int + Number of bins + ax : matplotlib.axes.Axes, optional + Axes to plot on. 
If None, creates new figure + + Returns: + -------- + fig : matplotlib.figure.Figure or None + Figure object if ax was None, otherwise None + """ data_hist, bins = np.histogram(data_var, range=range_ab, bins=bins_samples) - print(f"{data_hist} {bins}") + print(f"Data histogram: {data_hist}") + print(f"Bin edges: {bins}") + + # Poisson errors (sqrt(N) for each bin) data_hist_errors = np.sqrt(data_hist) - bin_center = (bins[1:] + bins[:-1]) / 2 - fig, ax = plt.subplots() + + # Calculate bin centers more explicitly + bin_centers = (bins[:-1] + bins[1:]) / 2 + + if ax is None: + fig, ax = plt.subplots() + return_fig = True + else: + fig = None + return_fig = False + ax.errorbar( - x=bin_center, y=data_hist, yerr=data_hist_errors, fmt="ko", label="Data" + x=bin_centers, + y=data_hist, + yerr=data_hist_errors, + fmt="ko", + label="Data", + capsize=3, + markersize=5, ) - return fig + + return fig if return_fig else None ``` # Data vs. MC plot Finally, we can include the MC and data in the same figure, and see if they are in agreement :). 
```python -fig, ax = plt.subplots() -fig.set_size_inches((10, 8)) -plot_data(stack_data_list_m4l, ranges[0], bins) +fig, ax = plt.subplots(figsize=(10, 8)) +plot_data(stack_data_list_m4l, ranges[0], bins, ax=ax) ax.hist( stack_mc_list_m4l, range=ranges[0], @@ -489,11 +542,14 @@ ax.hist( stacked=True, weights=stack_weights_list, bins=bins, + alpha=0.8, ) ax.set_ylabel("Events") ax.set_xlabel(f"{var_name}{units}") ax.set_ylim(0, 30) -ax.legend(fontsize=18, frameon=False) +ax.grid(True, alpha=0.3) # Add subtle grid +ax.legend(fontsize=16, frameon=False) # Slightly smaller font +plt.tight_layout() # Better spacing ```  diff --git a/_episodes/05-mplhep.md b/_episodes/05-mplhep.md index ab72da0..652f340 100644 --- a/_episodes/05-mplhep.md +++ b/_episodes/05-mplhep.md @@ -8,7 +8,7 @@ questions: objectives: - "Learn how to reproduce a plot with HEP experiments style using mplhep" keypoints: -- "[Mplhep](https://github.com/scikit-hep/mplhep) is a wrapper for easily apply plotting styles approved in the HEP collaborations." +- "[Mplhep](https://github.com/scikit-hep/mplhep) is a wrapper for easily applying plotting styles approved in the HEP collaborations." - "Styles for LHC experiments (CMS, ATLAS, LHCb and ALICE) are available." - "If you would like to include a style for your collaboration, ask for it [opening an issue](https://github.com/scikit-hep/mplhep/issues)!" 
--- @@ -34,9 +34,13 @@ import numpy as np import pandas as pd import matplotlib.pyplot as plt +# Modern mplhep style setting hep.style.use("CMS") -# or use any of the following -# {CMS | ATLAS | ALICE | LHCb1 | LHCb2} +# Available experiment styles: {CMS | ATLAS | ALICE | LHCb1 | LHCb2 | ROOT} + +# Alternative: use matplotlib's style context manager for temporary styling +# with plt.style.context(hep.style.CMS): +# # your plotting code here ``` ## Getting the data @@ -125,7 +129,7 @@ ax.set_xlabel("4l invariant mass (GeV)", fontsize=15) ax.set_ylabel("Events / 3 GeV", fontsize=15) ax.set_xlim(rmin, rmax) ax.legend() -fig.show() +plt.show() ``` This would plot the following figure. @@ -198,7 +202,7 @@ yerrs = np.sqrt(hist) > label="Data" >) > ->ax.title( +>ax.set_title( > "$ \sqrt{s} = 7$ TeV, L = 2.3 $fb^{-1}$; $\sqrt{s} = 8$ TeV, L = 11.6 $fb^{-1}$ \n", > fontsize=15, >) @@ -209,7 +213,7 @@ yerrs = np.sqrt(hist) >ax.legend(fontsize=15) >hep.cms.label(rlabel="") > ->fig.show() +>plt.show() >``` >  {: .solution} @@ -245,7 +249,7 @@ ax.set_xlabel("4l invariant mass (GeV)", fontsize=15) ax.set_ylabel("Events / 3 GeV\n", fontsize=15) ax.set_xlim(rmin, rmax) -fig.show() +plt.show() ```  @@ -292,7 +296,7 @@ fig.show() > label="Data" >) > ->ax.title( +>ax.set_title( > "$ \sqrt{s} = 7$ TeV, L = 2.3 $fb^{-1}$; $\sqrt{s} = 8$ TeV, L = 11.6 $fb^{-1}$ \n", > fontsize=16, >) @@ -303,7 +307,7 @@ fig.show() >ax.legend(fontsize=15) > >fig.savefig("final-plot.png", dpi=140) ->fig.show() +>plt.show() >``` {: .solution} diff --git a/_episodes/07-dimuonspectrum.md b/_episodes/07-dimuonspectrum.md index 59ba3b3..ec9f3b2 100644 --- a/_episodes/07-dimuonspectrum.md +++ b/_episodes/07-dimuonspectrum.md @@ -17,9 +17,9 @@ keypoints: ## Looking at the dimuon spectrum over a wide energy range - +