@andrewscolm andrewscolm commented Aug 20, 2025

Add clarification to rounding requirements and redaction of values <=7


cloudflare-workers-and-pages bot commented Aug 20, 2025

Deploying opensafely-docs with Cloudflare Pages

Latest commit: 6c41322
Status: ✅  Deploy successful!
Preview URL: https://27ea5f2a.opensafely-docs.pages.dev
Branch Preview URL: https://andrewscolm-patch-3.opensafely-docs.pages.dev

- The general principle is that **any statistic describing 7 or fewer patients, either directly or indirectly, should be redacted or combined into other statistics**. This includes:
+ The general principle is that **any statistic describing 5 or fewer patients, either directly or indirectly, should be redacted or combined into other statistics**. This includes:

* Redacting counts <=7 in frequency tables. Row and column totals should be recalculated after you have redacted the cell values, to ensure that the redacted values can not be inferred from the totals.
Contributor Author

Rounding to the nearest 5 offers protection against this
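
One reading of the total-recalculation step in the bullet under review, as a minimal Python sketch. The function name `redact_row` and the use of `None` as the redaction marker are hypothetical choices for illustration, not anything the docs prescribe; the threshold of 7 is taken from the bullet.

```python
from typing import Optional


def redact_row(cells: list[int], threshold: int = 7) -> tuple[list[Optional[int]], int]:
    """Redact small cells, then recompute the row total from the remaining cells only.

    Assumption: None stands in for a redacted cell; the caller decides how to display it.
    """
    redacted = [None if c <= threshold else c for c in cells]
    # The total is rebuilt from the cells that survive redaction, so a reader
    # cannot recover a redacted cell by subtracting the visible cells from it.
    total = sum(c for c in redacted if c is not None)
    return redacted, total
```

For a row of `[3, 20, 40]` this gives cells `[None, 20, 40]` and a total of 60 rather than 63, so the redacted 3 cannot be inferred by subtraction.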

@andrewscolm andrewscolm requested a review from wjchulme August 20, 2025 09:29
In general, good SDC is consistent with good statistics: many observations, no influential outliers, well-behaved distributions etc both prevent disclosure and increase confidence in the statistics. The one area to be wary of is where you can say something for certain about entire groups (‘all patients presenting with X also needed treatment for Y’). Be cautious about statements like this.

- To understand what checks have to be made to outputs it is important to understand the **attribute types** that exist in data and how these could lead to **primary or secondary disclosure**. Importantly, OpenSAFELY requires that researchers redact any outputs based on counts <= 7 before they can be released.
+ To understand what checks have to be made to outputs it is important to understand the **attribute types** that exist in data and how these could lead to **primary or secondary disclosure**. Importantly, OpenSAFELY requires that researchers redact any outputs that can identify <=5 individuals. In order to achieve this for counts rounded to the nearest 5 counts of 7 or fewer must be redacted before rounding.
Contributor

A comma would help:

In order to achieve this for counts rounded to the nearest 5, counts of 7 or fewer must be redacted before rounding.

However, I don't think this is correct. We don't have to redact (= completely remove a value) if the rounding precision doesn't lead to a rounding band with width <5. For example, if I round everything to the nearest 20, then we have [-9, 9], [10, 29], [30, 49],... mapping to values 0, 20, 40, ..., which is allowed, and doesn't require any redaction. Similarly for midpoint-5 and above.
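
To make the banding argument concrete, here is a small Python sketch that lists which raw counts map to each rounded value. The helper names `round_to_nearest` and `rounding_bands` are hypothetical, and rounding ties upward is an assumption, since the thread does not say how ties are broken. Counts are taken to be non-negative, so the lowest band starts at 0 rather than -9.

```python
def round_to_nearest(count: int, base: int) -> int:
    """Round a non-negative integer count to the nearest multiple of base.

    Assumption: exact ties (only possible for even bases) round up.
    """
    return ((count + base // 2) // base) * base


def rounding_bands(base: int, max_count: int) -> dict[int, tuple[int, int]]:
    """Map each rounded value to the (lowest, highest) raw count that produces it."""
    bands: dict[int, tuple[int, int]] = {}
    for count in range(max_count + 1):
        rounded = round_to_nearest(count, base)
        # Keep the first count seen for this band as its lower edge.
        low, _ = bands.get(rounded, (count, count))
        bands[rounded] = (low, count)
    return bands
```

`rounding_bands(20, 49)` returns `{0: (0, 9), 20: (10, 29), 40: (30, 49)}`, matching the bands listed above. `rounding_bands(5, 12)` returns `{0: (0, 2), 5: (3, 7), 10: (8, 12)}`, so with nearest-5 rounding the published values 0 and 5 between them cover raw counts 0 to 7.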

Contributor Author

I have reworded for clarity:

Importantly, OpenSAFELY requires that researchers redact any outputs that can identify <=5 individuals. For example, if you plan to round your counts to the nearest 5, you would need to redact counts of 7 or fewer before rounding.
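
The reworded example (redact counts of 7 or fewer, then round to the nearest 5) could be sketched in Python roughly as follows. The function name `redact_then_round` and the `None` marker are hypothetical, and treating a true zero like any other small count is an assumption the wording above does not settle.

```python
from typing import Optional


def redact_then_round(count: int, threshold: int = 7, base: int = 5) -> Optional[int]:
    """Return None (redacted) for counts at or below threshold, else round to the nearest base.

    Assumption: zero is redacted along with other counts <= threshold.
    """
    if count <= threshold:
        return None  # redacted before any rounding takes place
    # Integer arithmetic avoids the round-half-to-even behaviour of Python's built-in round().
    return ((count + base // 2) // base) * base
```

Applied to the counts 0, 3, 7, 8, 12 and 13, this yields `None`, `None`, `None`, 10, 10 and 15, so the smallest value that can appear in the output is 10.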
