Skip to content

Conversation

@meliache
Copy link
Contributor

@meliache meliache commented Jan 27, 2022

Assuming the axis has an edges attribute, I assume it should also have a widths attribute, so I thought we can just use that for the pull bar widths instead of calculating our own assuming a regular spacing (which might not be true).

For the patch-width, we just use right_edge - left_edge, instead of what was done before, which dividing this difference by the number of pulls and then multiplying them again with the same number.

From git-blame it looks like @matthewfeickert had written the original function so I might mention him.

I should also mention I'm not using this function, I just looked at it because I was writing my own pull plot function and was considering adapting the code for the rectangular patches and then just wondered why hist.axes[0].widths isn't used. Thus, it would be good if this could be tested properly.

  • Find why pulls in ratio_plot with variable binning are larger than their errorbars and fix this (see this plot)

Assuming the axis has an `edges` attribute, I assume it should also have a
`widths` attribute, so I thought we can just use that for the pull bar widths
instead of calculating our own assuming a regular spacing (which might not be
true).

For the patch-width, we just use `right_edge - left_edge`, instead of what was
done before, which dividing this difference by the number of pulls and then
multiplying them again with the same number.
@meliache
Copy link
Contributor Author

meliache commented Jan 27, 2022

Now I looked at the file again and plot_ratio_array also uses a constant bar width assumption:

bar_width = (right_edge - left_edge) / len(ratio)

I didn't use that function and am not sure how it's used and if possibly the ratio doesn't make sense with irregular binnings 🤷 But if you say it's safe I might also change that. I noticed that when I saw the needs changelog label and thought I should consider which other functions would be affected by this and was thinking whether plot_ratio_array would now also allow for non-regular axes.

@henryiii
Copy link
Member

CC @matthewfeickert, IIRC, on that one?

@matthewfeickert
Copy link
Member

I'll try to take a look at this before EOD on Monday.

@meliache
Copy link
Contributor Author

Thanks. Before merging I would also update the changelog, I was just waiting for feedback to my above comment, depending on which I might add further changes.

@henryiii
Copy link
Member

That's what I'm waiting on feedback for, @matthewfeickert added it, IIRC. I think the answer is yes, that these were both added without realizing that axes had edges and such. Even if the calculation doesn't handle irregular bin widths, I still think it's not "wrong" to use the correct edges attributes to compute it (it would just need a check and error in that case).

@henryiii
Copy link
Member

I'll be trying to get a release out today, FYI.

@matthewfeickert
Copy link
Member

I'll be trying to get a release out today, FYI.

Once I'm out of my meeting I'll go through this before lunch.

@matthewfeickert
Copy link
Member

Thanks for the PR @meliache and sorry for letting it get buried in notifications.

Now I looked at the file again and plot_ratio_array also uses a constant bar width assumption:

bar_width = (right_edge - left_edge) / len(ratio)

I didn't use that function and am not sure how it's used and if possibly the ratio doesn't make sense with irregular binnings 🤷 But if you say it's safe I might also change that. I noticed that when I saw the needs changelog label and thought I should consider which other functions would be affected by this and was thinking whether plot_ratio_array would now also allow for non-regular axes.

Yeah, I think that's fine. In PR #161 I was indeed making the assumption that the bin widths are constant.

Even if the calculation doesn't handle irregular bin widths, I still think it's not "wrong" to use the correct edges attributes to compute it (it would just need a check and error in that case).

Nothing should explicitly depend on this, so I would agree that this can be propagated everywhere.

Beyond using the suggested behavior of your PR everywhere, can you also add a test that uses irregular binning?

I'll be trying to get a release out today, FYI.

@henryiii As I was slow and it is already evening for @meliache I doubt this would be ready to go into a new release today (I have other work that I need to get done before this afternoon as well). I can help @meliache with finishing this tonight or tomorrow if they'd like it, so given the "Hist Serialization and Interactivity" fellow project that you're going to be mentoring will probably have some new releases coming out soon what do you think about getting this PR into a following release, so that I don't slow you down here?

@henryiii
Copy link
Member

Following release would be fine. Maybe even this one, since scikit-hep/boost-histogram#708 is a serious regression and hist will have to wait till after bh gets fixed. 😞

@matthewfeickert
Copy link
Member

Following release would be fine. Maybe even this one, since scikit-hep/boost-histogram#708 is a serious regression and hist will have to wait till after bh gets fixed.

Ah very sorry to hear it. :( Hopefully, while serious, it isn't too much of a pain to patch. But sounds good RE: either this release or the next.

@meliache I like your plan, and if you'd like me to help add contributions if you're currently busy I'm happy to do so given that I delayed you here. Of course I'm happy to answer any questions you have and let you finish this one off yourself. 👍

@meliache
Copy link
Contributor Author

meliache commented Feb 15, 2022

@henryiii As I was slow and it is already evening for @meliache I doubt this would be ready to go into a new release today (I have other work that I need to get done before this afternoon as well). I can help @meliache with finishing this tonight or tomorrow if they'd like it

Thanks for considering my time zone, evening is good as I don't have any meeting or other work to do, so I can add a couple of things like adapting plot_ratio_array and adding a changelog entry and possibly a test (see below), but after this I'll be a way from my keyboard and probably unresponsive. I never considered any release dates. So, if I'm unresponsive after 21 CET and you want this to be merged before tomorrow, feel free add what you think might be missing.

Beyond using the suggested behaviour of your PR everywhere, can you also add a test that uses irregular binning?

Good idea, a test would make me more comfortable merging this. The question is what to test. I would assume

  1. Test that the changed plotting functions run without exceptions with regular as well as irregular bins
  2. Test that we get the expected widths, e.g. how the resulting plots look like. For that I would use a regular axis or a variable axis with a regular binning and check that the bin widths are what we would expect from the constant-width calculation used prior to this PR. I could either look at the matplotlib artists that the plotting functions return, or I could use something like unittest.mock to mock the matplotlib plotting functions and check which arguments they were called with. Tbh the latter seems simpler to me. Update: Oh, I see you actually do image comparisons in tests, never saw that but it's nice.

I just took a look at your tests directory, probably it should go into test_plot.py, either into test_general_plot_pull or a separate similar test. The

@henryiii
Copy link
Member

You can use nox -s regenerate to make/update test images (linux or docker). In 2-3 hours, boost-histogram release will complete, and I'll start working on releasing Hist.

@meliache
Copy link
Contributor Author

meliache commented Feb 16, 2022

I just realized that _fit_callable_to_hist also doesn't take into account varying bin widths when calling _curve_fit_wrapper. Instead of histogram.values() they should be somehow normalize by the widths, e.g. like what the binwnorm parameter does in the plotting functions. This is one of the things we should fix before claiming that h.plot_pull allows for variable axes.

Update: Untested hotfix in 37b6d3d

Necessary when doing pull plots for histograms with an axis of type
`hist.axis.Variable` with varying bin widths.
Co-authored-by: Henry Schreiner <[email protected]>
@henryiii
Copy link
Member

$ docker run --rm -v$PWD:/hist -w/hist -it quay.io/pypa/manylinux2014_x86_64
# pipx run nox -s regenerate

It's easier to compare online, so I'm just pushing the result.

@henryiii
Copy link
Member

henryiii commented Feb 16, 2022

(Sorry for the delay pushing, got nerd-sniped) Looks like this messes with the scaling - should this be normalized by the total width?

@meliache
Copy link
Contributor Author

meliache commented Feb 16, 2022

(Sorry for the delay pushing, got nerd-sniped) Looks like this messes with the scaling - should this be normalized by the total width?

Upps, I think I messed this up in 37b6d3d. I didn't test that and somehow assumed that it should be right at that time. Now I think it should work if I replace the np.sum in there with a np.mean, i.e.

    h_values_width_corrected = histogram.values() / (bin_widths / np.mean(bin_widths))

In that case, in case of a regular binning, the hist values remain unchanged, but if a bin is e.g. twice the mean bin width, the value will be divided by half. Instead of np.mean(bin_widths) we could also just do (right_edge - left_edge) / num_bins, but I doubt this will be much faster in practice.

Anyway, my local tests now fail again, but I think this is because you changed the baseline. Can I just revert that commit when I think I have fixed it?

In case your rushing to review this PR to get it into the new release, I'm not sure I can propagate this all the way up, since this affects more functions that I expected (at least if we want to allow variable axes in general) and I'm still not sure how to best tackle the tests and getting good coverage might need some time for me (though I'd be glad if @matthewfeickert could help with that).

Update: I see you already suggested a fix, thanks. And by the way, my mind was blown when I saw that github has image diffs.

@matthewfeickert
Copy link
Member

matthewfeickert commented Feb 24, 2022

...eventually making my way back to this after getting some other reviews done. It looks like tests are currently passing (:+1:) but

and I'm still not sure how to best tackle the tests

I can help add some example tests to cover the cases of irregular binning. I think(?) that hist doesn't do Codecov reporting or anything like that (@henryiii if you're game I could add Codecov), but from the diff there's basically nothing that would be uncovered (:+1:) and this is just testing functionality / sniffing for pathological examples / using tests as examples to future us.

@henryiii
Copy link
Member

Sure, I'd be okay with Codecov. Is that the one that adds a comment to every PR? :/

@matthewfeickert
Copy link
Member

Sure, I'd be okay with Codecov. Is that the one that adds a comment to every PR? :/

Yeah. It updates it in place each time there is a coverage change. c.f. scikit-hep/pyhf#1809 (comment)

Not sure if that's desirable or not in your mind.

@henryiii
Copy link
Member

That's mildly annoying but not terrible. Though I'm guessing there's a way to avoid it: pypa/build#443 for example.

@meliache
Copy link
Contributor Author

To be honest I also got busy with other things, sorry for not replying this week earlier, but I could revisit the tests.

I can help add some example tests to cover the cases of irregular binning.

Thanks a lot. I think I got overwhelmed by open questions how exactly to do the tests. E.g. should I include asserts in with variable binnings into the existing test functions or create new test functions? What should I best do to avoid code duplication, because necessarily I will basically repeat existing tests with non-regular binnigngs? How fine-grained should the tests be? Should I just test non-crashing behaviour or also generate example images and if yes, I'm not sure what the best procedure is for that. But as always don't let perfection be the enemy of the good

meliache and others added 4 commits March 11, 2022 20:06
I add 2 tests

- `test_image_plot_ratio_variable_axis_with_regular_bins`:
  Same as `test_image_plot_ratio_callable`, but use an `axis.Variable`, but set
  the bin edges to be regular, same as former test with `axis.Regular`. With
  this I just check that the plot doesn't depend on the axis type. For this we
  re-use the existing baseline plot.

- `test_image_plot_ratio_callable_bayesian_blocks_bins`:
  Actually try the ratio plot with variable bin widths. For this I ran the
  bayesian blocks algorithm on 5000 random numbers. I used that instead of 1000
  numbers because for 1000 the binning seemed very coarse for me and the fit
  plots just looked like a triangle. For this we create a new baseline plot.

Also I added a helper function to create these `plot_ratio_callable` test for
different axes/binnings in order to avoid code duplication.

TODO: The baseline plots for
`test_image_plot_ratio_callable_bayesian_blocks_bins` looks like something is
off, the centre pull is much larger than the pull errorbars. I have to check
that the bin widths are correctly taken into account in the errorbars and ratios.
@meliache
Copy link
Contributor Author

meliache commented Mar 14, 2022

I just add two tests:

  • test_image_plot_ratio_variable_axis_with_regular_bins: Same as test_image_plot_ratio_callable, but use an axis.Variablewith manually set regular bin edges, same as former test with axis.Regular, so the plot should look the same as the axis.Regular plot, thus I re-use the baseline here.

  • test_image_plot_ratio_callable_bayesian_blocks_bins: Do a ration plots with variable bin widths. For this I ran the bayesian blocks algorithm on 5000 random numbers because I was too lazy to hand-pick variable bins.

Also I added a helper function to create these plot_ratio_callable test for different axes/binnings in order to avoid code duplication.

However, I see that for some python versions the plot diffs fail 🤔 . Also I saw that the new baseline for the variable binning looks off, the ratios are larger than the errorbars, so I think I have to adapt the errorbar or ratio calculation, but I decided to upload it anyway to at least have some tests as a starting point.
Update: Seems fixed now after f8bb460

Most tests use `np.random.seed(42)` to re-seed the global RNG. Initially I didn't
follow this in my tests, because according to its documentation it is not
best-practice and a legacy function.

> The best practice is to not reseed a BitGenerator, rather to recreate a new
> one. This method is here for legacy reasons.

Instead, I created a custom RNG with a seed and used that to generate my random
numbers. However, the tests failed and I noticed that the random seed not only affects the numbers to
fill the histogram with, but also the fit, which I can't affect via a custom RNG,
so I have to re-seed the global RNG.

Therefore, I fall back to using `np.random.normal`.
@meliache
Copy link
Contributor Author

Now I have arrived where I'm personally happy in this PR, I think I propagated my change up so that irregular bin widths should work for plot_pull and plot_ratio and I think the tests and the new baseline for plot_ratio with irregular bins look good now. I will no stop working on this for now and await further review. If you need more tests please tell me.

Coding-style wise I noticed that I now often repeat the same 2 to 3 lines to normalize the histogram values to the mean bin width. I think this could be factored out into a private helper function but for just 2 lines this doesn't really seam worth it, so I'm not sure. I'll leave it like it's now, except you think it's better to write a function for it.

@matthewfeickert
Copy link
Member

Thanks. I'll take a look at this tonight after I've finished Snowmass papers. But CI is passing and there are tests now, so my guess is that things are probably 👍.

pulls: np.typing.NDArray[Any] = (
hist_values - compare_values
hist_values_width_corrected - compare_values
) / hist_values_uncert
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, hist_values_uncert seems unbound if other is not a callable or string but a histogram. This is not something I introduced in this PR so if I'm not mistaken this might warrant a separate issue.

The question would be whether we just want to use np.sqrt(self.variances()) or in case other also has variances, if we want to statistically combine both somehow 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants