Skip to content

Conversation

@paulgessinger
Copy link
Contributor

I was thinking about adding the efficiency error calculation as is implemented in TEfficiency. Before I write comments and/or tests, I wanted to ask/discuss if this makes sense, since there is already the num method.

One thing I noticed with the way the errors are calculated (see https://root.cern.ch/doc/master/TEfficiency_8cxx_source.html#l02515 and below) is that if num==denom then the error will be 0. Does that even make sense? Technically, scipy.stats.norm.ppf complains if it is called with sigma = 0, so I'm masking out zero elements and manually set those to zero errors.

Thoughts?

@paulgessinger paulgessinger changed the title Add normal interrval to plotratio Add normal interval to plotratio Oct 17, 2019
@lgray
Copy link
Collaborator

lgray commented Oct 17, 2019

Please fix flake8 violations:

$ python setup.py flake8
running flake8
coffea/hist/plot.py:80:1: E302 expected 2 blank lines, found 1
coffea/hist/plot.py:83:13: E226 missing whitespace around arithmetic operator
coffea/hist/plot.py:85:17: E201 whitespace after '('
coffea/hist/plot.py:85:30: E226 missing whitespace around arithmetic operator
coffea/hist/plot.py:85:50: E202 whitespace before ')'
coffea/hist/plot.py:88:15: E226 missing whitespace around arithmetic operator
coffea/hist/plot.py:88:18: E226 missing whitespace around arithmetic operator
The command "python setup.py flake8" exited with 1.

@lgray
Copy link
Collaborator

lgray commented Oct 17, 2019

Since the error on an efficiency is derived from two correlated numbers, when the numerator and denominator are equal their variations are 100% correlated and of the same size and so the statistical error on an efficiency value of 1 is zero.

This is manifestly undercovering and there are a ton of recipes for getting good coverage and the tradeoffs between them. Typically Clopper-Pearson is favored, but having other implementations of efficiency errors is welcome.

@lgray
Copy link
Collaborator

lgray commented Oct 17, 2019

Please add a test for this new function.

@nsmith-
Copy link
Member

nsmith- commented Oct 17, 2019

So, of course clopper-pearson is already available.
What this looks like is an attempt to find the frequentist coverage for a ratio of independent normal distributions, is that correct? Then can you cite https://en.wikipedia.org/wiki/Ratio_distribution#Uncorrelated_noncentral_normal_ratio rather than a line in ROOT with no context.

@paulgessinger
Copy link
Contributor Author

My understanding was that Clopper-Pearson is not valid for non-integer num/denoms.

I think what you're describing is basically what is happening there. The reason I link to the ROOT code is that that's literally where I have the algorithm from, and I think it makes sense to disclose that. I can certainly add the wikipedia link you provided as well, but I'd prefer keeping the other URL in there as well.

@lgray
Copy link
Collaborator

lgray commented Oct 17, 2019

Ah yeah, fair enough, this would be needed for efficiencies derived from weighted data (which is kind of rare, but definitely happens).

@paulgessinger
Copy link
Contributor Author

Well I have this specific case right now, that's why I thought of adding it 😄

@nsmith-
Copy link
Member

nsmith- commented Oct 17, 2019

Ah, nevermind, this is just the result of expanding eff = p/(p+f) in partial derivatives of p and f.
Maybe we should just allow a callback function to be passed into plotratio.

@nsmith-
Copy link
Member

nsmith- commented Oct 17, 2019

I think we should go ahead with this PR. @paulgessinger can you fix the flake8 complaints?

@paulgessinger
Copy link
Contributor Author

Will do.

@lgray
Copy link
Collaborator

lgray commented Oct 18, 2019

@paulgessinger can you please add a test for: normal_interval? Thanks!

@paulgessinger
Copy link
Contributor Author

I'm working on it.

@paulgessinger
Copy link
Contributor Author

paulgessinger commented Oct 18, 2019

Flake8 should be fixed, I added a test that compares against test output I got from TEfficiency. The test also covers edge cases like passed == total and passed == total == 0 (which results in a nan)

@lgray
Copy link
Collaborator

lgray commented Oct 18, 2019

Looks good, thank you!

@lgray lgray merged commit 97cb79b into scikit-hep:master Oct 18, 2019
@paulgessinger paulgessinger deleted the normal-interval-eff-error branch October 18, 2019 11:49
@lgray
Copy link
Collaborator

lgray commented Oct 18, 2019

I added a docstring to the function after the fact.
Can you take a look at:
https://github.com/CoffeaTeam/coffea/blob/master/coffea/hist/plot.py#L82-L98
and let me know if it's ok?

@paulgessinger
Copy link
Contributor Author

Yeah looks good I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants