Update sdc.md #1826
Deploying opensafely-docs
Latest commit: 6c41322
Status: ✅ Deploy successful!
Preview URL: https://27ea5f2a.opensafely-docs.pages.dev
Branch Preview URL: https://andrewscolm-patch-3.opensafely-docs.pages.dev
- The general principle is that **any statistic describing 7 or fewer patients, either directly or indirectly, should be redacted or combined into other statistics**. This includes:
+ The general principle is that **any statistic describing 5 or fewer patients, either directly or indirectly, should be redacted or combined into other statistics**. This includes:
* Redacting counts <=7 in frequency tables. Row and column totals should be recalculated after you have redacted the cell values, to ensure that the redacted values can not be inferred from the totals.
Rounding to the nearest 5 offers protection against this
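The concern in the quoted line about totals can be illustrated with a small sketch. It assumes a single table row of made-up counts and the <=7 threshold from the quoted line; `redact_row` and the numbers are hypothetical, not code from the OpenSAFELY docs.

```python
from typing import Optional

THRESHOLD = 7  # counts at or below this are redacted (value from the quoted line)


def redact_row(
    counts: list[int], threshold: int = THRESHOLD
) -> tuple[list[Optional[int]], int]:
    """Redact small cells, then recompute the total from the surviving cells."""
    redacted = [c if c > threshold else None for c in counts]
    total = sum(c for c in redacted if c is not None)
    return redacted, total


row = [3, 40, 57]                    # made-up counts for illustration
original_total = sum(row)            # total computed before redaction
cells, safe_total = redact_row(row)  # total recomputed after redaction

# If the original total were published next to the redacted cells,
# the hidden value could be recovered by simple subtraction.
leaked = original_total - safe_total
```

Here `leaked` equals the redacted cell, which is why the total has to be recalculated from the cells that remain.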
docs/outputs/sdc.md
Outdated
  In general, good SDC is consistent with good statistics: many observations, no influential outliers, well-behaved distributions etc both prevent disclosure and increase confidence in the statistics. The one area to be wary of is where you can say something for certain about entire groups (‘all patients presenting with X also needed treatment for Y’). Be cautious about statements like this.

- To understand what checks have to be made to outputs it is important to understand the **attribute types** that exist in data and how these could lead to **primary or secondary disclosure**. Importantly, OpenSAFELY requires that researchers redact any outputs based on counts <= 7 before they can be released.
+ To understand what checks have to be made to outputs it is important to understand the **attribute types** that exist in data and how these could lead to **primary or secondary disclosure**. Importantly, OpenSAFELY requires that researchers redact any outputs that can identify <=5 individuals. In order to achieve this for counts rounded to the nearest 5 counts of 7 or fewer must be redacted before rounding.
A comma would help:
In order to achieve this for counts rounded to the nearest 5, counts of 7 or fewer must be redacted before rounding.
However, I don't think this is correct. We don't have to redact (= completely remove a value) if the rounding precision doesn't lead to a rounding band with width <5. For example, if I round everything to the nearest 20, then we have the bands [-9, 9], [10, 29], [30, 49], ... mapping to the values 0, 20, 40, ..., which is allowed and doesn't require any redaction. Similarly for midpoint-5 and above.
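The bands described in this comment can be enumerated with a short sketch. It assumes round-half-up tie-breaking (Python's built-in `round` rounds ties to even, so it is avoided here) and only considers non-negative counts, so the first band starts at 0 rather than -9. The helper names are hypothetical.

```python
def round_half_up(count: int, base: int) -> int:
    """Round a non-negative integer count to the nearest multiple of base."""
    return (count + base // 2) // base * base


def rounding_bands(base: int, max_count: int) -> dict[int, tuple[int, int]]:
    """Map each rounded value to the (lowest, highest) count that produces it."""
    bands: dict[int, tuple[int, int]] = {}
    for count in range(max_count + 1):
        rounded = round_half_up(count, base)
        low, high = bands.get(rounded, (count, count))
        bands[rounded] = (min(low, count), max(high, count))
    return bands


bands_20 = rounding_bands(20, 49)  # rounding to the nearest 20
bands_5 = rounding_bands(5, 12)    # rounding to the nearest 5
```

Under these assumptions, rounding to the nearest 20 gives the bands 0-9, 10-29 and 30-49, while rounding to the nearest 5 gives 0-2, 3-7 and 8-12. The second case shows where the figure of 7 comes from: it is the largest count that still rounds to 5.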
I have reworded for clarity:
Importantly, OpenSAFELY requires that researchers redact any outputs that can identify <=5 individuals. For example, if you plan to round your counts to the nearest 5, you would need to redact counts of 7 or fewer before rounding.
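The reworded example (redact counts of 7 or fewer, then round to the nearest 5) could be sketched as follows. The function name, the use of `None` as the redaction marker, and the round-half-up rule are assumptions for illustration. The sketch also applies the threshold literally, so a zero count is redacted too; the thread does not say how zeros should be handled.

```python
from typing import Optional


def redact_then_round(
    count: int, threshold: int = 7, base: int = 5
) -> Optional[int]:
    """Return None for counts at or below the threshold, else the rounded count."""
    if count <= threshold:
        return None  # redacted before any rounding takes place
    return (count + base // 2) // base * base  # nearest multiple of base


# Made-up counts for illustration.
published = [redact_then_round(c) for c in [2, 7, 8, 12, 13]]
```

With these inputs the first two counts are redacted and the rest become 10, 10 and 15, so the smallest value that can appear in the output is 10.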
Add clarification to rounding requirements and redaction of values <=7