feat: Add frequentist coverage intervals module #176

matthewfeickert · 2021-04-07T20:27:51Z

Add frequentist coverage intervals as a module (based off those added by @nsmith- to coffea) which will be used in PR #161.

Preview of relevant changes to docs: https://hist--176.org.readthedocs.build/en/176/reference/hist.html#module-hist.intervals

Suggested squash and merge message

* Add intervals module to provide frequentist coverage interval support
* Add Literal to hist.typing

src/hist/intervals.py

henryiii · 2021-04-08T04:45:55Z

I don't think I need to list "add tests for intervals module" to the squashed version, that should be implied by adding the module. :)

matthewfeickert · 2021-04-08T04:49:47Z

I don't think I need to list "add tests for intervals module" to the squashed version, that should be implied by adding the module. :)

SGTM. I've revised the PR body. 👍

src/hist/intervals.py

nsmith- · 2021-04-08T13:24:49Z

src/hist/intervals.py

+        missing = np.where(values == 0)
+        available = np.nonzero(values)
+        if len(available[0]) == 0:
+            raise RuntimeError(


Inconsistent with docstring. I would suggest warnings.warn with a RuntimeWarning.

Henry had suggested the RuntimeError and I had forgotten to update the docstring. Why should this be a warning though vs. an error? If everything is zero then something is wrong.

It's valid to plot an empty histogram with error bars, certainly if the data is not scaled then you even have a well-defined error bar. The interpretation of them can be challenging if the data is scaled, however, and there's no way to tell other than the user. Either way it seems we should not throw an exception.

It's valid to plot an empty histogram with error bars

Why are you plotting non-existent data? If the histogram is actually empty this doesn't make sense. Link to an example?

Also, why are you taking a ratio of anything that is non-existent? This makes even less sense to me. :? An example would be helpful.

Isn't this poisson interval method used for plain errorbar histos (as well as ratios)? For ratios sure makes less sense. But imagine you are dumping 100 region plots to a file, now if one region is empty do you want your plot dumper script to crash or emit a warning?

But imagine you are dumping 100 region plots to a file, now if one region is empty do you want your plot dumper script to crash or emit a warning?

IMO it should crash. It is your responsibility to clean your data.

Ok, it's a valid opinion. It just seems at odds with the amount of work this routine does to make up vaguely reasonable error bars in the case where even all but one bin is zero, that it would give up if all are zero. I won't press the issue further then (except to update the docstring).

If you expect it to crash, that's what try/except (or using "if" beforehand) is for. This could mask real problems, like all plots being empty because something is misconfigured? Nobody reads logs; warnings are next to useless.

Isn't this poisson interval method used for plain errorbar histos (as well as ratios)?

In practice it could be, but at the moment it is only being used in ratio_uncertainty.

Just seems at odds with the amount of work this routine does to make up vaguely reasonable error bars in the case where even all but one bin is zero, that it would give up if all are zero

I see what you're saying, but then would you also suggest not allowing any empty values? Or how would you define the cutoff? I don't think that you're pressing anything, I just want to make sure that I understand what your thoughts are here as it is clear that you've thought about this far more than I have.

(Henry I think we're in agreement but if I'm misunderstanding (sorry) please let me know).

The all-zero case is the only situation where there really isn't enough info to do something reasonable, so it's fine. LGTM
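The "try/except (or using 'if' beforehand)" option raised in this thread can be sketched as follows. This is only an illustration: `interval_or_raise` and `dump_regions` are hypothetical names, and the square-root half-widths are a placeholder rather than the module's actual Poisson interval.

```python
def interval_or_raise(values):
    """Hypothetical stand-in for an interval routine that rejects all-zero input."""
    if not any(v != 0 for v in values):
        raise RuntimeError("All values are zero; no interval can be estimated.")
    # Placeholder result: sqrt(n) half-widths, NOT the real Poisson interval.
    return [v ** 0.5 for v in values]


def dump_regions(regions):
    """Process every region and record the empty ones instead of crashing."""
    results, skipped = {}, []
    for name, values in regions.items():
        try:
            results[name] = interval_or_raise(values)
        except RuntimeError:
            # The caller decides explicitly what an empty region means.
            skipped.append(name)
    return results, skipped


results, skipped = dump_regions({"signal": [4, 0, 9], "empty": [0, 0, 0]})
print(results)  # {'signal': [2.0, 0.0, 3.0]}
print(skipped)  # ['empty']
```

With this pattern the library still fails loudly on all-zero input, while a batch script that expects occasional empty regions handles them deliberately and can report them at the end.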

nsmith- · 2021-04-08T13:31:28Z

src/hist/intervals.py

+    with np.errstate(divide="ignore"):
+        ratio = num / denom
+    if uncertainty_type == "poisson":
+        ratio_uncert = np.abs(poisson_interval(ratio, num / np.square(denom)) - ratio)


Maybe some more docs on what this is would be helpful? I am actually not sure what it is, poisson for numerator?

I looked back at my code to see that indeed it is poisson for numerator. :)

I'm switching work at the moment, but I'll come back tonight and clean this up and make it more clear.

Poke me when ready. :)

Poke me when ready

sorry I missed this comment. poke.
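Why the call `poisson_interval(ratio, num / np.square(denom))` amounts to "Poisson for the numerator" can be checked with a little arithmetic. The sketch below assumes, as in the coffea routine this module was based on, that `poisson_interval` reads its two arguments as a sum of weights and a sum of squared weights and rescales the counts by their quotient. The helper name `effective_counts_and_scale` is hypothetical.

```python
def effective_counts_and_scale(sumw, sumw2):
    """Split a weighted sum into effective counts and a per-count scale.

    Assumption: scale = sumw2 / sumw and counts = sumw / scale, so that
    an interval computed on `counts` is multiplied by `scale` afterwards.
    """
    scale = sumw2 / sumw
    counts = sumw / scale
    return counts, scale


num, denom = 9.0, 4.0
ratio = num / denom  # plays the role of "sumw"
variance = num / denom**2  # plays the role of "sumw2"

counts, scale = effective_counts_and_scale(ratio, variance)
print(counts)  # 9.0, i.e. the numerator
print(scale)  # 0.25, i.e. 1 / denom
```

Under that assumption the effective counts equal `num` and the scale equals `1 / denom`, so the resulting interval is the Poisson interval on the numerator divided by the denominator, with the denominator treated as exact.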

nsmith- · 2021-04-08T13:33:05Z

src/hist/intervals.py

+    elif uncertainty_type == "poisson-ratio":
+        # poisson ratio n/m is equivalent to binomial n/(n+m)
+        ratio_uncert = np.abs(clopper_pearson_interval(num, num + denom) - ratio)
+    else:


Would be nice to add as well the simple propagation of error style uncertainty: https://github.com/CoffeaTeam/coffea/blob/84e805773c1fac32fc79bc9373ec324552371244/coffea/hist/plot.py#L82

Sure, but is that needed in this PR to get things up and going for PR #161? That could be added later.

Feel free to open as an issue and move forward with this for now. :)

Considering it's the ROOT default ratio plot error (TH1::Divide) seems we would want that no?

Considering it's the ROOT default ratio plot error (TH1::Divide) seems we would want that no?

Let me go look, but I'm not sold on this argument. ROOT does plenty of things that are not a good idea.

I also do not consider ROOT's warning vs. error behavior to be a design standard for us. :)

Now if there's a valid reason to do this, then why is there a warning? And if not, why not just error now?

(If it's valid but rare, warnings can be squelched, while errors can't be)

I didn't add it originally either, but it is valid in some cases. See the PR that added it to coffea: scikit-hep/coffea#182

I didn't add it originally either, but it is valid in some cases.

Looking back at that Issue it is brought up that

Ah yeah, fair enough, this would be needed for efficiencies derived from weighted data (which is kind of rare, but definitely happens).

But the specific example isn't really elaborated on. Can you either ELI5 why this is worth doing now, or make a new Issue from this discussion and it can get labeled as a "good first issue"? Again, you've thought about this all more than I have, so it is very possible I'm just not seeing something incredibly obvious to you.

I thought it would be convenient to copy it along with the others, since it isn't much work. But it's also fine to put it off until later if you prefer. I can make the issue.
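For reference, the "simple propagation of error" style discussed in this thread can be sketched as below. This is a generic first-order propagation for a ratio of two uncorrelated quantities, which is the form ROOT documents for TH1::Divide. It is not taken from the linked coffea function, and `ratio_with_propagated_error` is a hypothetical name. Defaulting the per-bin errors to the square root of the counts is an assumption that only holds for unweighted data.

```python
import math


def ratio_with_propagated_error(num, denom, num_err=None, denom_err=None):
    """Return num/denom and its first-order propagated uncertainty.

    Assumes num and denom are uncorrelated. If no errors are given,
    Poisson sqrt(n) errors are assumed (valid for unweighted counts only).
    """
    if num_err is None:
        num_err = math.sqrt(num)
    if denom_err is None:
        denom_err = math.sqrt(denom)
    ratio = num / denom
    # sigma_r = sqrt(e_num^2 * denom^2 + e_denom^2 * num^2) / denom^2
    err = math.sqrt(num_err**2 * denom**2 + denom_err**2 * num**2) / denom**2
    return ratio, err


ratio, err = ratio_with_propagated_error(100.0, 400.0)
print(ratio)  # 0.25
print(round(err, 5))  # 0.02795
```

Unlike the Poisson and Clopper-Pearson options, this gives a symmetric interval and does not guarantee coverage at low counts, which is part of why the discussion treats it as a separate follow-up.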

src/hist/__init__.py

henryiii · 2021-04-08T19:11:17Z

@all-contributors please add @matthewfeickert for code

allcontributors · 2021-04-08T19:11:25Z

@henryiii

I've put up a pull request to add @matthewfeickert! 🎉

src/hist/intervals.py

sourcery-ai · 2021-04-08T23:23:36Z

Sourcery Code Quality Report

❌ Merging this PR will decrease code quality in the affected files by 0.06%.

Quality metrics	Before	After	Change
Complexity	13.10 🙂	13.10 🙂	0.00
Method Length	97.29 🙂	97.47 🙂	0.18 👎
Working memory	15.69 ⛔	15.71 ⛔	0.02 👎
Quality	41.54% 😞	41.48% 😞	-0.06% 👎

Other metrics	Before	After	Change
Lines	474	473	-1

Changed files	Quality Before	Quality After	Quality Change
src/hist/__init__.py	89.99% ⭐	89.99% ⭐	0.00%
src/hist/plot.py	37.70% 😞	37.70% 😞	0.00%
src/hist/typing.py	84.68% ⭐	82.75% ⭐	-1.93% 👎

Here are some functions in these files that still need a tune-up:

File	Function	Complexity	Length	Working Memory	Quality	Recommendation
src/hist/plot.py	plot_pull	23 😞	716 ⛔	22 ⛔	16.50% ⛔	Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions
src/hist/plot.py	plot2d_full	9 🙂	278 ⛔	14 😞	38.61% 😞	Try splitting into smaller methods. Extract out complex expressions
src/hist/plot.py	_curve_fit_wrapper	5 ⭐	156 😞	14 😞	50.72% 🙂	Try splitting into smaller methods. Extract out complex expressions
src/hist/plot.py	_expr_to_lambda	9 🙂	129 😞	11 😞	54.60% 🙂	Try splitting into smaller methods. Extract out complex expressions
src/hist/plot.py	plot_pie	1 ⭐	51 ⭐	10 😞	75.33% ⭐	Extract out complex expressions

Legend and Explanation

The emojis denote the absolute quality of the code:

⭐ excellent
🙂 good
😞 poor
⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.

Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Let us know what you think of it by mentioning @sourcery-ai in a comment.

henryiii

LGTM, thanks!

matthewfeickert added the enhancement New feature or request label Apr 7, 2021

matthewfeickert self-assigned this Apr 7, 2021

matthewfeickert mentioned this pull request Apr 7, 2021

feat: Add ratio plot support through .plot_ratio API #161

Merged

8 tasks

matthewfeickert commented Apr 7, 2021

src/hist/intervals.py (resolved)

matthewfeickert added the documentation Improvements or additions to documentation label Apr 7, 2021

matthewfeickert requested a review from henryiii April 7, 2021 20:32

nsmith- requested changes Apr 8, 2021

henryiii reviewed Apr 8, 2021

src/hist/__init__.py (outdated, resolved)

allcontributors bot mentioned this pull request Apr 8, 2021

docs: add matthewfeickert as a contributor #181

Closed

henryiii reviewed Apr 8, 2021

src/hist/intervals.py (outdated, resolved)

matthewfeickert added 16 commits April 8, 2021 16:22

Checkout typing updates from PR 161

455a20f

Pull intervals module from PR 161

79299ea

Update changelot and docs

fe934ff

Add intervals to module lists

b12fb80

Add tests for intervals

3c3fdca

Revise given Lindesy's suggestions

19fa9a4

Apply Henry's advice from code review in PR 161

fb69e4e

reduce number of bins for simplicity

9abfccf

uncert -> uncertainity

cde55f1

Add docstring for ratio_uncertainty

b806357

Simplify again given the very course binning

4a0e13d

Raise TypeError if invalid uncertainity_type passed

b4295ca

Test for TypeError

f594fcb

full name again

97a0daf

Remove intervals from default imports as contains non-core dependencies

5a86413

Add ImportError check for scipy import

1a1640c

matthewfeickert added 3 commits April 8, 2021 16:23

Correct poisson_interval docstring

ff74f25

Change ImportError -> ModuleNotFoundError

821994a

Add expanded docstring on uncertainty_type options

aa09c83

matthewfeickert force-pushed the feat/add-intervals-module branch from eb13b6b to aa09c83 Compare April 8, 2021 21:24

matthewfeickert requested a review from henryiii April 8, 2021 21:27

Add the __dir__ trick

bf2388c

nsmith- approved these changes Apr 9, 2021

henryiii approved these changes Apr 9, 2021

henryiii merged commit ef456a4 into master Apr 9, 2021

henryiii deleted the feat/add-intervals-module branch April 9, 2021 02:13

matthewfeickert mentioned this pull request Apr 9, 2021

Add normal uncertainty interval to intervals module #182

Open
