94 changes: 94 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,94 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Jekyll-based training module for the HEP Software Foundation (HSF) that teaches "Matplotlib for HEP". It introduces the matplotlib plotting library and shows how to create plots commonly used in High Energy Physics, including specialized HEP styling via `mplhep`.

## Recent Updates

The training materials have been modernized to align with current matplotlib and Python best practices:
- Updated to modern matplotlib 3.x+ styling and API usage
- Enhanced with mplhep context managers and experiment-specific styles
- Fixed deprecated functions and improved plot aesthetics
- Added comprehensive requirements.txt for dependency management
- Updated MathJax to modern CDN for equation rendering

## Common Development Commands

### Local Development
- **Serve locally**: `make serve` - Builds and serves the Jekyll site locally
- **Build site**: `make site` - Builds the site without serving
- **Clean**: `make clean` - Removes generated files and caches

### Docker Alternative
- **Docker serve**: `make docker-serve` - Uses Docker to serve the site (requires Docker)

### Repository Maintenance
- **Repository check**: `make repo-check` - Validates repository settings
- **Lesson validation**: `make lesson-check` - Validates lesson Markdown files
- **Complete validation**: `make lesson-check-all` - Full validation including line lengths and whitespace
- **Unit tests**: `make unittest` - Runs tests on checking tools

### Pre-commit Hooks
This repository uses pre-commit hooks for code quality:
```bash
pip3 install pre-commit
pre-commit install
```

## Site Architecture

### Jekyll Structure
- **_config.yml**: Main Jekyll configuration for HSF training theme
- **_episodes/**: Lesson content in Markdown format (7 episodes total)
- **_extras/**: Additional pages (about, discussion, figures, guide)
- **_includes/**: Reusable template components
- **fig/**: Image assets for lessons
- **Gemfile**: Ruby dependencies including hsf-training-theme

### Content Organization
- Episodes are numbered sequentially (01-07) covering:
  - **01-introduction.md**: Matplotlib basics, modern best practices, figure creation
  - **02-coffee-break.md**: Interactive break elements
  - **03-physics.md**: Standard Model background, particle physics theory
  - **04-higgs-search.md**: Real ATLAS data analysis, histogram techniques
  - **05-mplhep.md**: HEP-specific styling, experiment themes (CMS, ATLAS, etc.)
  - **06-coffee-break.md**: Interactive break elements
  - **07-dimuonspectrum.md**: Advanced analysis with invariant mass calculations
- Uses Carpentries lesson template structure
- Physics-focused content with LaTeX math support via modern MathJax

### Build System
- Uses Jekyll with HSF training theme
- Ruby-based build system via Bundler
- Makefile provides convenience commands
- GitHub Pages deployment via gh-pages branch
- Pre-commit hooks enforce markdown and code formatting

## Development Notes

- Main development branch: `gh-pages`
- Uses HSF training theme from remote repository
- Includes physics equation rendering via modern MathJax 3.x
- Associated notebooks repository for interactive content: `hsf_matplotlib_notebooks`
- Multiple cloud platforms supported: Binder, Google Colab, GitHub Codespaces, CERN SWAN
- **requirements.txt** available for local Python environment setup

## Python Dependencies

The repository includes a comprehensive `requirements.txt` with:
- Core: matplotlib>=3.6.0, numpy, pandas
- HEP-specific: mplhep>=0.3.0, uproot>=4.0.0, hist>=2.6.0
- Jupyter ecosystem: jupyterlab>=4.0.0, notebook, ipywidgets
- Data handling: h5py for HDF5 files

## Code Style Guidelines

- Use explicit `fig, ax = plt.subplots()` pattern
- Prefer `plt.show()` over `fig.show()`
- Use context managers for temporary styling, e.g. `with plt.style.context(hep.style.CMS):` (a bare `hep.style.use("CMS")` applies the style globally)
- Include `plt.tight_layout()` for better subplot spacing
- Add subtle grids with `alpha=0.3` for better readability
- Use modern error bar styling with `capsize` and `markersize` parameters
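
The guidelines above can be combined in one short script. The following is a minimal sketch, not code taken from the lessons: the toy data, the seed, and the `Agg` backend are assumptions made so that it runs without a display, and `plt.rc_context` stands in for an experiment style so that `mplhep` is not required.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Toy data, made up for illustration only
rng = np.random.default_rng(seed=0)
data = rng.normal(size=1_000)
counts, bin_edges = np.histogram(data, bins=20)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Temporary styling via a context manager (stand-in for an mplhep style)
with plt.rc_context({"font.size": 14}):
    fig, ax = plt.subplots(figsize=(8, 6))  # explicit figure and axes
    ax.errorbar(
        bin_centers,
        counts,
        yerr=np.sqrt(counts),  # Poisson errors
        fmt="ko",
        capsize=3,
        markersize=5,
        label="Toy data",
    )
    ax.set_xlabel("Value")
    ax.set_ylabel("Counts")
    ax.grid(True, alpha=0.3)  # subtle grid
    ax.legend(frameon=False)
    fig.tight_layout()

# In an interactive session you would call plt.show() here;
# the sketch writes to an in-memory buffer instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```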
46 changes: 32 additions & 14 deletions _episodes/01-introduction.md
@@ -20,6 +20,15 @@ The example-based nature of [Matplotlib documentation](https://matplotlib.org/)
Matplotlib is the standard when it comes to making plots in Python. It is versatile and allows for lots of functionality and different ways to produce many plots.
We will be focusing on using matplotlib for High Energy Physics.

> ## Modern Best Practices
>
> While matplotlib has evolved significantly, some key modern practices include:
> - Using explicit figure and axes creation with `fig, ax = plt.subplots()`
> - Leveraging context managers for temporary styling changes
> - Passing styling options as explicit keyword arguments (e.g., `alpha=0.8` to set transparency)
> - Taking advantage of improved default styling in matplotlib 3.x+
{: .callout}

# A simple example

As with any Python code it is always good practice to import the necessary libraries as a first step.
@@ -38,7 +47,6 @@ plt.show() # Show the figure

This code produces the following figure:

<!-- ![basic_plot](https://matplotlib.org/stable/_images/sphx_glr_usage_002.png) -->
![basic_plot](https://matplotlib.org/3.5.1/_images/sphx_glr_usage_001_2_0x.png)

> ## Notice
@@ -130,11 +138,11 @@ plt.show()
As mentioned, by default `fig, ax = plt.subplots()` creates the canvas automatically. We can have finer control over the shape and quality of the plots by using the keyword arguments `figsize` and `dpi` as follows.

```python
fig, ax = plt.subplots(figsize=(10, 10), dpi=150)
fig, ax = plt.subplots(figsize=(10, 8), dpi=100)
```

This has to be set **before** any instance of `ax.plot` and it sets the width and height to 10 and 10 inches respectively. The keyword `dpi` refers to a density of *Dots Per Inch*.
There is no particular reason to choose 150 as the value for dpi but there is a visually a noticeable difference in the size and quality of the plot.
This has to be set **before** any instance of `ax.plot` and it sets the width and height to 10 and 8 inches respectively. The keyword `dpi` refers to a density of *Dots Per Inch*.
Matplotlib's default is 100 DPI, which is usually adequate for on-screen display; you can increase it to 150 or 200 for higher-quality output.
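
The relationship between `figsize`, `dpi`, and the pixel size of the canvas is plain multiplication: 10 x 8 inches at 100 dots per inch gives 1000 x 800 pixels. The sketch below checks this; the `Agg` backend is an assumption made so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 8), dpi=100)

# Size in inches times dots per inch gives the canvas size in pixels
width_in, height_in = fig.get_size_inches()
width_px = int(round(width_in * fig.dpi))
height_px = int(round(height_in * fig.dpi))

print(f"{width_in:.0f} x {height_in:.0f} in -> {width_px} x {height_px} px")
```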

### Title

@@ -231,12 +239,11 @@ We have available a useful python package called [mplhep](https://mplhep.readthe
```python
import mplhep as hep

hep.style.use(hep.style.ROOT) # For now ROOT defaults to CMS
# Or choose one of the experiment styles
# Modern mplhep usage - choose one of the experiment styles
hep.style.use(hep.style.ATLAS)
# or
hep.style.use("CMS") # string aliases work too
# {"ALICE" | "ATLAS" | "CMS" | "LHCb1" | "LHCb2"}
# or using string aliases (recommended)
hep.style.use("CMS")
# Available styles: {"ALICE" | "ATLAS" | "CMS" | "LHCb1" | "LHCb2" | "ROOT"}
```

and with just this addition we can produce the same plot as before with this new look.
@@ -267,14 +274,25 @@ We will discuss histograms more in detail later but here is an example code and
# first let's get some fake data
data = np.random.normal(size=10_000)

# create figure
fig, ax = plt.subplots()

# now let's make the plot
counts, bin_edges, _ = ax.hist(data, bins=50, histtype="step")
counts, bin_edges, _ = ax.hist(data, bins=50, histtype="step", alpha=0.8)

# we need to get the centers in order to get the correct location for the error bars
bin_centers = (
    bin_edges[:-1] + bin_edges[1:]
) / 2  # More explicit bin center calculation

# we need to get the centers in order get the correct location for the errobars
bin_centers = bin_edges[:-1] + np.diff(bin_edges) / 2
# add error bars (Poisson errors for counts)
ax.errorbar(bin_centers, counts, yerr=np.sqrt(counts), fmt="none", capsize=2)

# add error bars
ax.errorbar(bin_centers, counts, yerr=np.sqrt(counts), fmt="none")
# Add labels and improve appearance
ax.set_xlabel("Value")
ax.set_ylabel("Counts")
ax.set_title("Histogram with Error Bars")
ax.grid(True, alpha=0.3)
plt.show()
```
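
The old and new bin-center expressions in this hunk are two ways of writing the same quantity. A small NumPy-only sketch (the seed is an arbitrary assumption) confirms that they agree and shows the shapes involved:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility
data = rng.normal(size=10_000)

counts, bin_edges = np.histogram(data, bins=50)

# Midpoint of each pair of neighbouring edges
centers_mean = (bin_edges[:-1] + bin_edges[1:]) / 2
# Left edge plus half the bin width
centers_offset = bin_edges[:-1] + np.diff(bin_edges) / 2

# Poisson uncertainty on each bin count
errors = np.sqrt(counts)

same = np.allclose(centers_mean, centers_offset)
print(same, len(bin_edges), len(centers_mean))
```

With 50 bins there are 51 edges and 50 centers, so the centers line up one-to-one with `counts` and `errors`.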

4 changes: 2 additions & 2 deletions _episodes/03-physics.md
@@ -10,9 +10,9 @@ keypoints:
- "Analysis studies Higgs boson decays"
---

<!-- Mathjax Support -->
<!-- MathJax Support -->
<script type="text/javascript" async
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
</script>

You can take a look at the mathematical structure of the Standard Model of Particle Physics in the next section, or go directly to the main Higgs production mechanisms at hadron colliders.
92 changes: 74 additions & 18 deletions _episodes/04-higgs-search.md
@@ -13,7 +13,7 @@ objectives:
keypoints:
- "In High-energy physics, histograms are used to analyze different data and MC distributions."
- "With Matplotlib, data can be binned and histograms can be plotted in a few lines."
- "Using [Uproot](https://github.com/scikit-hep/uproot5) and Matplotlib, data in ROOT files can be display without need of a full ROOT installation."
- "Using [Uproot](https://github.com/scikit-hep/uproot5) and Matplotlib, data in ROOT files can be displayed without need of a full ROOT installation."
- "Histograms can be stacked and/or overlapped to make comparison between recorded and simulated data."
---
In this episode, we will go through a first HEP analysis where you will be able to apply your knowledge of matplotlib and learn something new.
@@ -150,14 +150,19 @@ ax.hist(branches["data_A"]["m4l"])

![m4lep_histogram_0]({{ page.root }}/fig/m4lep_histogram_0.png)

**Tip:** In the previous plot the numbers in the axis are very small, we can change the font size (and font family) for all the following plots, including in our code:
**Tip:** In the previous plot the numbers on the axes are very small. We can change the font size (and font family) for all the following plots. Modern best practice is to use a context manager for temporary changes, or to update the global settings explicitly:

```python
# Update the matplotlib configuration parameters:
mpl.rcParams.update({"font.size": 16, "font.family": "serif"})
# Modern approach - use context manager for temporary changes
with plt.rc_context({"font.size": 16, "font.family": "serif"}):
    fig, ax = plt.subplots()
    # your plotting code here

# Or update global settings (affects all subsequent plots)
plt.rcParams.update({"font.size": 16, "font.family": "serif"})
```

Note that this changes the global setting, but it can still be overwritten later.
The context manager approach is preferred when you want temporary styling changes, while `plt.rcParams.update()` is better for permanent changes in your session.
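
The difference between the two approaches can be checked directly by reading `plt.rcParams` before, inside, and after the context. This is a minimal sketch; the `Agg` backend is an assumption made so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

size_before = plt.rcParams["font.size"]

# Temporary change: only active inside the with-block
with plt.rc_context({"font.size": 16}):
    size_inside = plt.rcParams["font.size"]
    fig, ax = plt.subplots()

# The previous value is restored when the block exits
size_after = plt.rcParams["font.size"]

print(size_before, size_inside, size_after)
```

Had `plt.rcParams.update({"font.size": 16})` been used instead, the new value would stay in effect for the rest of the session.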

Let's do the plot again to see the changes:

@@ -412,13 +417,21 @@ bins = 24
```

```python
fig, (ax_1, ax_2) = plt.subplots(1, 2)
fig.set_size_inches((12, 8))
fig, (ax_1, ax_2) = plt.subplots(1, 2, figsize=(12, 8))
ax_1.set_title("MC samples without weights")
ax_1.hist(stack_mc_list_m4l, range=ranges[0], label=mc_samples, stacked=True, bins=bins)
ax_1.hist(
    stack_mc_list_m4l,
    range=ranges[0],
    label=mc_samples,
    stacked=True,
    bins=bins,
    alpha=0.8,
)
ax_1.set_ylabel("Events")
ax_1.set_xlabel(f"{var_name}{units}")
ax_1.legend(frameon=False)
ax_1.grid(True, alpha=0.3) # Add subtle grid

ax_2.set_title("MC samples with weights")
ax_2.hist(
    stack_mc_list_m4l,
@@ -427,11 +440,14 @@ ax_2.hist(
    stacked=True,
    weights=stack_weights_list,
    bins=bins,
    alpha=0.8,
)
ax_2.set_ylabel("Events")
ax_2.set_xlabel(f"{var_name}{units}")
ax_2.tick_params(which="both", direction="in", top=True, right=True, length=6, width=1)
ax_2.grid(True, alpha=0.3) # Add subtle grid
ax_2.legend(frameon=False)
plt.tight_layout() # Better subplot spacing
```
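
To see what the `weights` argument changes, it helps to look at a tiny histogram by hand. The sketch below uses `np.histogram` with made-up values and weights (all hypothetical, not the lesson's MC samples): without weights each event adds 1 to its bin, with weights each event adds its own weight.

```python
import numpy as np

values = np.array([0.5, 1.5, 1.5, 2.5])   # hypothetical event values
weights = np.array([2.0, 0.5, 0.5, 1.0])  # hypothetical per-event weights
edges = [0, 1, 2, 3]

# Unweighted: plain event counts per bin
raw_counts, _ = np.histogram(values, bins=edges)

# Weighted: sum of the event weights per bin
weighted_counts, _ = np.histogram(values, bins=edges, weights=weights)

print(raw_counts)       # [1 2 1]
print(weighted_counts)  # [2. 1. 1.]
```

When several samples are passed as a list, as in the stacked plot above, `weights` must be a list of arrays with the same shapes as the data arrays.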

![MC_histogram_4]({{ page.root }}/fig/MC_histogram_4.png)
Expand Down Expand Up @@ -463,37 +479,77 @@ To make more easy the data vs. MC final plot, we can define the following helper
When we want to make a plot that includes uncertainties we need to use the `ax.errorbar` function.

```python
def plot_data(data_var, range_ab, bins_samples):
def plot_data(data_var, range_ab, bins_samples, ax=None):
    """
    Plot data histogram with Poisson error bars.

    Parameters:
    -----------
    data_var : array-like
        Data to histogram
    range_ab : tuple
        Range for histogram (min, max)
    bins_samples : int
        Number of bins
    ax : matplotlib.axes.Axes, optional
        Axes to plot on. If None, creates new figure

    Returns:
    --------
    fig : matplotlib.figure.Figure or None
        Figure object if ax was None, otherwise None
    """
    data_hist, bins = np.histogram(data_var, range=range_ab, bins=bins_samples)
    print(f"{data_hist} {bins}")
    print(f"Data histogram: {data_hist}")
    print(f"Bin edges: {bins}")

    # Poisson errors (sqrt(N) for each bin)
    data_hist_errors = np.sqrt(data_hist)
    bin_center = (bins[1:] + bins[:-1]) / 2
    fig, ax = plt.subplots()

    # Calculate bin centers more explicitly
    bin_centers = (bins[:-1] + bins[1:]) / 2

    if ax is None:
        fig, ax = plt.subplots()
        return_fig = True
    else:
        fig = None
        return_fig = False

    ax.errorbar(
        x=bin_center, y=data_hist, yerr=data_hist_errors, fmt="ko", label="Data"
        x=bin_centers,
        y=data_hist,
        yerr=data_hist_errors,
        fmt="ko",
        label="Data",
        capsize=3,
        markersize=5,
    )
    return fig

    return fig if return_fig else None
```
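
The optional `ax` argument is what lets a helper draw onto axes that already exist instead of always opening a new figure. The sketch below shows one possible variant of that pattern; `draw_points` is a hypothetical helper, not the lesson's `plot_data`, and unlike `plot_data` it always returns the axes it drew on. The `Agg` backend and the toy numbers are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np


def draw_points(x, y, ax=None):
    """Hypothetical helper: draw on `ax`, or on a new figure if `ax` is None."""
    if ax is None:
        _, ax = plt.subplots()
    ax.errorbar(x, y, yerr=np.sqrt(y), fmt="ko", capsize=3, markersize=5)
    return ax


# Toy values, made up for illustration
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 9.0, 16.0])

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
returned = draw_points(x, y, ax=ax_left)  # reuses the existing left axes
standalone = draw_points(x, y)            # creates its own figure
```

Returning the axes in both cases is a design choice of this sketch: the caller can keep customizing the plot without checking for `None`, at the cost of not handing back the figure directly (it is still reachable as `standalone.figure`).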

# Data vs. MC plot

Finally, we can include the MC and data in the same figure, and see if they are in agreement :).
```python
fig, ax = plt.subplots()
fig.set_size_inches((10, 8))
plot_data(stack_data_list_m4l, ranges[0], bins)
fig, ax = plt.subplots(figsize=(10, 8))
plot_data(stack_data_list_m4l, ranges[0], bins, ax=ax)
ax.hist(
    stack_mc_list_m4l,
    range=ranges[0],
    label=mc_samples,
    stacked=True,
    weights=stack_weights_list,
    bins=bins,
    alpha=0.8,
)
ax.set_ylabel("Events")
ax.set_xlabel(f"{var_name}{units}")
ax.set_ylim(0, 30)
ax.legend(fontsize=18, frameon=False)
ax.grid(True, alpha=0.3) # Add subtle grid
ax.legend(fontsize=16, frameon=False) # Slightly smaller font
plt.tight_layout() # Better spacing
```

![m4lep_histogram_5]({{ page.root }}/fig/m4lep_histogram_5.png)