Skip to content

Conversation

@matthewfeickert
Copy link
Member

@matthewfeickert matthewfeickert commented Apr 7, 2021

Add frequentist coverage intervals as a module (based off those added by @nsmith- to coffea) which will be used in PR #161.

Preview of relevant changes to docs: https://hist--176.org.readthedocs.build/en/176/reference/hist.html#module-hist.intervals

Suggested squash and merge message

* Add intervals module to provide frequentist coverage interval support
* Add Literal to hist.typing

@matthewfeickert matthewfeickert added the enhancement New feature or request label Apr 7, 2021
@matthewfeickert matthewfeickert self-assigned this Apr 7, 2021
@matthewfeickert matthewfeickert added the documentation Improvements or additions to documentation label Apr 7, 2021
@matthewfeickert matthewfeickert requested a review from henryiii April 7, 2021 20:32
@henryiii
Copy link
Member

henryiii commented Apr 8, 2021

I don't think I need to list "add tests for intervals module" to the squashed version, that should be implied by adding the module. :)

@matthewfeickert
Copy link
Member Author

I don't think I need to list "add tests for intervals module" to the squashed version, that should be implied by adding the module. :)

SGTM. I've revised the PR body. 👍

missing = np.where(values == 0)
available = np.nonzero(values)
if len(available[0]) == 0:
raise RuntimeError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent with docstring. I would suggest warnings.warn with a RuntimeWarning.

Copy link
Member Author

@matthewfeickert matthewfeickert Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Henry had suggested the RuntimeError and I had forgot to upate the docstring. Why should this be a warning though vs. an error? If everything is zero then something is wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid to plot an empty histogram with error bars, certainly if the data is not scaled then you even have a well-defined error bar. The interpretation of them can be challenging if the data is scaled, however, and there's no way to tell other than the user. Either way it seems we should not throw an exception.

Copy link
Member Author

@matthewfeickert matthewfeickert Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid to plot an empty histogram with error bars

Why are you plotting non-existent data? If the histogram is actually empty this doesn't make sense. Link to an example?

Also, why are you taking a ratio of anything that is non-existent? This makes even less sense to me. :? An example would be helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this poisson interval method used for plain errorbar histos (as well as ratios)? For ratios sure makes less sense. But imagine you are dumping 100 region plots to a file, now if one region is empty do you want your plot dumper script to crash or emit a warning?

Copy link
Member Author

@matthewfeickert matthewfeickert Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But imagine you are dumping 100 region plots to a file, now if one region is empty do you want your plot dumper script to crash or emit a warning?

IMO it should crash. It is your responsibility to clean your data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, its a valid opinion. Just seems at odds with the amount of work this routine does to make up vaguely reasonable error bars in the case where even all but one bin is zero, that it would give up if all are zero. I won't press the issue further then (except to update the docstring)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you expect it to crash, that's what try/except (or using "if" beforehand) is for. This could mask real problems, like all plots being empty because something is misconfigured? Nobody reads logs; warnings are next to useless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this poisson interval method used for plain errorbar histos (as well as ratios)?

In practice it could be, but at the moment it is only being used in ratio_uncertainty.

Just seems at odds with the amount of work this routine does to make up vaguely reasonable error bars in the case where even all but one bin is zero, that it would give up if all are zero

I see what you're saying, but then would you also suggest not allowing any empty values? Or how would you define the cutoff? I don't think that you're pressing anything, I just want to make sure that I understand what your thoughts are here as it is clear that you've thought about this far more than I have.

(Henry I think we're in agreement but if I'm misunderstanding (sorry) please let me know).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The all zero case is the only situation when there really isn't enough info to do something reasonable so its fine. LGTM

with np.errstate(divide="ignore"):
ratio = num / denom
if uncertainty_type == "poisson":
ratio_uncert = np.abs(poisson_interval(ratio, num / np.square(denom)) - ratio)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some more docs on what this is would be helpful? I am actually not sure what it is, poisson for numerator?

Copy link
Member

@nsmith- nsmith- Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked back at my code to see that indeed it is poisson for numerator. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm switching work at the moment, but I'll come back tonight and clean this up and make it more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poke me when ready. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poke me when ready

sorry I missed this comment. poke.

elif uncertainty_type == "poisson-ratio":
# poisson ratio n/m is equivalent to binomial n/(n+m)
ratio_uncert = np.abs(clopper_pearson_interval(num, num + denom) - ratio)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to add as well the simple propagation of error style uncertainty: https://github.com/CoffeaTeam/coffea/blob/84e805773c1fac32fc79bc9373ec324552371244/coffea/hist/plot.py#L82

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but is that needed in this PR to get things up and going for PR #161? That could be added later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to open as an issue and move forward with this for now. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering its the ROOT default ratio plot error (TH1::Divide) seems we would want that no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering its the ROOT default ratio plot error (TH1::Divide) seems we would want that no?

Let me go look, but I'm not sold on this argument. ROOT does plenty of things that are not a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do not consider ROOT's warning vs. error behavior to be a design standard for us. :)

Now if there's a valid reason to do this, then why is there a warning? And if not, why not just error now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If it's valid but rare, warnings can be squelched, while error's can't be)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add it originally either, but it is valid in some cases. See the PR that added it to coffea: scikit-hep/coffea#182

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add it originally either, but it is valid in some cases.

Looking back at that Issue it is brought up that

Ah yeah, fair enough, this would be needed for efficiencies derived from weighted data (which is kind of rare, but definitely happens).

But the specific example isn't really elaborated on. Can you either ELI5 why this is worth doing now, or make a new Issue from this discussion and it can get labeled as a "good first issue"? Again, you've thought about this all more than I have, so it is very possible I'm just not seeing something incredibly obvious to you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be convenient to copy it as the others, since it isn't much work. But also fine to put it off to later if you prefer. I can make the issue

@henryiii
Copy link
Member

henryiii commented Apr 8, 2021

@all-contributors please add @matthewfeickert for code

@allcontributors
Copy link
Contributor

@henryiii

I've put up a pull request to add @matthewfeickert! 🎉

@matthewfeickert matthewfeickert force-pushed the feat/add-intervals-module branch from eb13b6b to aa09c83 Compare April 8, 2021 21:24
@matthewfeickert matthewfeickert requested a review from henryiii April 8, 2021 21:27
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Apr 8, 2021

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 0.06%.

Quality metrics Before After Change
Complexity 13.10 🙂 13.10 🙂 0.00
Method Length 97.29 🙂 97.47 🙂 0.18 👎
Working memory 15.69 ⛔ 15.71 ⛔ 0.02 👎
Quality 41.54% 😞 41.48% 😞 -0.06% 👎
Other metrics Before After Change
Lines 474 473 -1
Changed files Quality Before Quality After Quality Change
src/hist/init.py 89.99% ⭐ 89.99% ⭐ 0.00%
src/hist/plot.py 37.70% 😞 37.70% 😞 0.00%
src/hist/typing.py 84.68% ⭐ 82.75% ⭐ -1.93% 👎

Here are some functions in these files that still need a tune-up:

File Function Complexity Length Working Memory Quality Recommendation
src/hist/plot.py plot_pull 23 😞 716 ⛔ 22 ⛔ 16.50% ⛔ Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions
src/hist/plot.py plot2d_full 9 🙂 278 ⛔ 14 😞 38.61% 😞 Try splitting into smaller methods. Extract out complex expressions
src/hist/plot.py _curve_fit_wrapper 5 ⭐ 156 😞 14 😞 50.72% 🙂 Try splitting into smaller methods. Extract out complex expressions
src/hist/plot.py _expr_to_lambda 9 🙂 129 😞 11 😞 54.60% 🙂 Try splitting into smaller methods. Extract out complex expressions
src/hist/plot.py plot_pie 1 ⭐ 51 ⭐ 10 😞 75.33% ⭐ Extract out complex expressions

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Let us know what you think of it by mentioning @sourcery-ai in a comment.

Copy link
Member

@henryiii henryiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@henryiii henryiii merged commit ef456a4 into master Apr 9, 2021
@henryiii henryiii deleted the feat/add-intervals-module branch April 9, 2021 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants