feat(ai_grouping): Send token length metrics on stacktraces sent to Seer #99873
Conversation
…Seer
In preparation for making the switch to token length being considered instead of frame count of errors, we take metrics of the token length of stacktraces being sent, to be able to map out the statistics and the impact that change would make. Instrumented `get_token_count` to monitor how long it takes. We use the existing `tiktoken` library, which was already in use in Sentry.
```python
            sample_rate=options.get("seer.similarity.metrics_sample_rate"),
            tags={**shared_tags, "outcome": "block"},
        )
        metrics.distribution(
            "grouping.similarity.token_count",
            get_token_count(event, variants),
            sample_rate=options.get("seer.similarity.metrics_sample_rate"),
            tags={
                "platform": platform,
                "frame_check_outcome": "block",
            },
        )
        return True
```

```python
    metrics.incr(
        "grouping.similarity.frame_count_filter",
        sample_rate=options.get("seer.similarity.metrics_sample_rate"),
        tags={**shared_tags, "outcome": "pass"},
    )
    metrics.distribution(
        "grouping.similarity.token_count",
        get_token_count(event, variants),
        sample_rate=options.get("seer.similarity.metrics_sample_rate"),
        tags={
            "platform": platform,
            "frame_check_outcome": "pass",
        },
    )
    return False
```
Potential bug: The has_too_many_contributing_frames function calls get_token_count twice in its main execution path, instead of reusing the result from the first call.
Description: In the `has_too_many_contributing_frames` function, the token count is calculated multiple times unnecessarily. After an initial call to `get_token_count` whose result is stored in the `token_count` variable, the function is called again for metrics reporting in both the "block" and "pass" branches, instead of reusing the value from the `token_count` variable. This leads to a second, redundant execution of an expensive operation in a critical event ingestion path, impacting performance for every event processed by Seer.

Suggested fix: In the "block" and "pass" branches of `has_too_many_contributing_frames`, reuse the `token_count` variable for the `metrics.timing` call instead of calling `get_token_count` a second time. The "bypass" branch should also be refactored to avoid a separate call, perhaps by calculating the token count once at the top of the function.
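A sketch of the fix the bot describes, using the names from the diff above; `contributing_frame_count` and `FRAME_LIMIT` are stand-ins for the real frame-check logic, not Sentry's actual code:

```python
def has_too_many_contributing_frames(event, variants, platform) -> bool:
    # Compute the token count once, up front, and reuse it below.
    token_count = get_token_count(event, variants)

    if contributing_frame_count(event, variants) > FRAME_LIMIT:  # stand-in predicate
        metrics.distribution(
            "grouping.similarity.token_count",
            token_count,  # reused, not recomputed
            tags={"platform": platform, "frame_check_outcome": "block"},
        )
        return True

    metrics.distribution(
        "grouping.similarity.token_count",
        token_count,  # reused here as well
        tags={"platform": platform, "frame_check_outcome": "pass"},
    )
    return False
```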
severity: 0.6, confidence: 0.95
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff            @@
##           master   #99873     +/-  ##
==========================================
+ Coverage   78.81%   81.30%   +2.49%
==========================================
  Files        8699     8593     -106
  Lines      385940   383015    -2925
  Branches    24413    23858     -555
==========================================
+ Hits       304162   311429    +7267
+ Misses      81427    71243   -10184
+ Partials      351      343       -8
```
src/sentry/seer/similarity/utils.py
Outdated
```python
    try:
        timer_tags["has_content"] = False

        # Get the encoding for cl100k_base (used by GPT-3.5-turbo/GPT-4)
```
we should double check that this is the same/similar enough to the tokenizer our embeddings model uses
hmm that's actually a good point, I used what we use in Sentry; in Seer it's actually something else. I looked and it seems like the actual models we use in Seer are environment dependent, locally it uses a dummy and in production they come from Google storage, I'll ask the AI team, let's see
Oh, I thought it was the same one! Yeah, I def think we should try to align if possible.
src/sentry/seer/similarity/utils.py
Outdated
| timer_tags["has_content"] = False | ||
|
|
||
| # Get the encoding for cl100k_base (used by GPT-3.5-turbo/GPT-4) | ||
| encoding = tiktoken.get_encoding("cl100k_base") |
does this line have to be done every function call, or could it be static?
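For reference, a minimal sketch of making it effectively static, assuming `tiktoken` stays the tokenizer; `functools.cache` memoizes the load after the first call:

```python
import functools

import tiktoken


@functools.cache
def _get_encoding() -> tiktoken.Encoding:
    # Loaded once on first use, then served from the cache; avoids
    # re-fetching the cl100k_base encoding on every function call.
    return tiktoken.get_encoding("cl100k_base")
```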
src/sentry/seer/similarity/utils.py
Outdated
| timer_tags["source"] = "no_stacktrace_string" | ||
| return 0 | ||
|
|
||
| except Exception: |
should we report the exception to sentry?
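A hedged sketch of what that reporting could look like; the wrapper name is made up for illustration, while `sentry_sdk.capture_exception` is the standard call:

```python
import sentry_sdk
import tiktoken


def count_tokens_safely(text: str) -> int:
    # Hypothetical wrapper: report the failure to Sentry instead of
    # swallowing it silently, but still return 0 so ingestion continues.
    try:
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:
        sentry_sdk.capture_exception()
        return 0
```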
```python
    )


def get_token_count(
```
i'm wondering if we should put this behind an option for the initial deploy just to be safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a part of me feels like I am wrapping it with a try so it should be fine, but I don't mind being extra safe here. You mean like internally inside this `get_token_count` to not even perform the try and just return a zero? or around even sending the metrics?
just around being able to debug any exceptions / know why they're happening
oh whoops, I replied to the wrong comment. I mean just around running the function to report the metric, in case for whatever reason there's a problem when we deploy and need to roll back (thinking perf/memory/whatever problem)
makes sense to me.
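A rough sketch of that guard, assuming the option name the PR ultimately registers (`seer.similarity.token_count_metrics_enabled`); the wrapper name is illustrative:

```python
from sentry import options


def maybe_report_token_count_metric(event, variants, platform) -> None:
    # Killswitch: if the option is flipped off, skip token counting
    # entirely so a bad deploy can be mitigated without a revert.
    if not options.get("seer.similarity.token_count_metrics_enabled"):
        return
    report_token_count_metric(event, variants, platform)  # the PR's helper
```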
src/sentry/seer/similarity/utils.py
Outdated
```python
def get_token_count(
    event: Event | GroupEvent, variants: dict[str, BaseVariant] | None = None
```
Is there a reason variants is typed as being nullable? AFAIK that shouldn't be possible.
| data={"title": "Broken event"}, | ||
| ) | ||
|
|
||
| # Should not raise an exception, even with bad data |
Does this in fact raise an exception inside the function? Can we assert on that?
I will be adding a report to sentry there, I will assert it gets called
refactored a bunch, could use another look - still up in the air if tiktoken is going to be a good gauge for our Seer token count, so I will need to see about that
src/sentry/seer/similarity/utils.py
Outdated
```python
def report_token_count_metric(
    event: Event | GroupEvent,
    variants: dict[str, BaseVariant] | None,
```
Same as before - this shouldn't be nullable.
bah cursor keeps adding it for some reason 🤦
src/sentry/seer/similarity/utils.py
Outdated
```python
    try:
        timer_tags["has_content"] = False
        timer_tags["cached"] = event.data.get("stacktrace_string") is not None
        timer_tags["source"] = "stacktrace_string"
```
| timer_tags["source"] = "stacktrace_string" | |
| timer_tags["source"] = "cached_string" |
(To better differentiate from the get_stacktrace_string source.)
but there's a "cached" tag, though I guess I can just lose that and unifty it
actually I prefer it like this, I don't want to have to write "non_cached_string" on the other, and I don't want it to just say "stacktrace_string" and someone to need to know "cached_string" also exists.
```python
    stacktrace_text = event.data.get("stacktrace_string")

    if stacktrace_text is None:
        stacktrace_text = get_stacktrace_string(get_grouping_info_from_variants(variants))
```
```diff
         stacktrace_text = get_stacktrace_string(get_grouping_info_from_variants(variants))
+        timer_tags["source"] = "get_stacktrace_string"
```
(To address cursor's complaint, which I think is valid.)
not sure I understand this one, adding the source tag here? I set it above always, and I don't see any cursor comment about this
This is the comment I'm talking about:
So yeah, if you have the cached boolean separately (which I wasn't thinking about), then how you had it originally is fine. If you go with unifying them, though, having cached_stacktrace and get_stacktrace_string as two different values tells you whether or not it was there, just like cached would.
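A sketch of the two tagging schemes being weighed here, with values taken from the thread; `stacktrace_string` stands in for the cached lookup result:

```python
stacktrace_string = None  # stand-in for event.data.get("stacktrace_string")
timer_tags = {}

# Option A (what the PR keeps): a separate boolean "cached" tag.
timer_tags["cached"] = stacktrace_string is not None
timer_tags["source"] = "stacktrace_string"

# Option B (the unified alternative): one "source" tag whose value
# also tells you whether the string was cached.
timer_tags["source"] = (
    "cached_stacktrace" if stacktrace_string is not None else "get_stacktrace_string"
)
```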
```python
    def test_handles_exception_gracefully(self) -> None:
        """Test that get_token_count handles exceptions gracefully and returns 0."""
        # Create an event with cached stacktrace that will cause tiktoken to fail
```
I feel like this is a misleading comment - the only reason tiktoken fails is because we later force it to, not because of the cached stacktrace.
```python
        mock_encode.side_effect = ValueError("Tiktoken encoding failed")

        with patch("sentry.seer.similarity.utils.logger.error") as mock_logger_error:
            token_count = get_token_count(broken_event, variants=None, platform="python")
```
```diff
-            token_count = get_token_count(broken_event, variants=None, platform="python")
+            token_count = get_token_count(broken_event, variants={}, platform="python")
```
(Since variants should be non-nullable.)
src/sentry/seer/similarity/utils.py
Outdated
```python
]

IGNORED_FILENAMES = ["<compiler-generated>"]
TOKENIZER = Tokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")
```
Bug: Global Tokenizer Initialization Causes Startup Delays
The `TOKENIZER` is initialized globally at module import time via `Tokenizer.from_pretrained()`. This can significantly slow down application startup, make unnecessary network requests, and potentially crash the module import if the model fails to load, even when the token counting feature is disabled.
…cal file saved the model locally under data/models and added a readme for downloading it again
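A minimal sketch of the lazy-loading shape the PR moved to; the names follow the later diff, and the file path under `data/models` is illustrative:

```python
import threading

from tokenizers import Tokenizer

TOKENIZER_MODEL_PATH = "data/models/tokenizer.json"  # illustrative path


class _LazyTokenizer:
    def __init__(self) -> None:
        self._tokenizer: Tokenizer | None = None
        self._lock = threading.Lock()

    def get(self) -> Tokenizer:
        # Defer the (potentially slow) load until the first real use,
        # instead of paying for it at module import time.
        if self._tokenizer is None:
            with self._lock:
                # Double-check pattern to avoid racing loads
                if self._tokenizer is None:
                    self._tokenizer = Tokenizer.from_file(TOKENIZER_MODEL_PATH)
        return self._tokenizer
```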
src/sentry/seer/similarity/utils.py
Outdated
```python
    if stacktrace_text:
        timer_tags["has_content"] = True
        return len(get_tokenizer().encode(stacktrace_text))
```
Bug: Token Count Calculation and Stacktrace Metrics Issues
The `get_token_count` function incorrectly calculates token counts by using `len()` on the tokenizer's `Encoding` object instead of its `.ids` attribute. It also reports misleading `timer_tags["source"]` metrics, showing "stacktrace_string" even when the stacktrace is generated rather than cached. Furthermore, calling `get_stacktrace_string` with empty variants may lead to a crash.
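A sketch illustrating the suggested correction; `Tokenizer.from_pretrained` fetches the tokenizer config from the Hugging Face hub:

```python
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")
encoding = tokenizer.encode("ZeroDivisionError: division by zero")
token_count = len(encoding.ids)  # count the token ids explicitly
```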
```python
        token_count = get_token_count(broken_event, variants=variants, platform="python")
        mock_logger_exception.assert_called()

        assert token_count == 0
```
Bug: Test Fails to Validate Tokenizer Exception Handling
The `test_handles_exception_gracefully` test doesn't properly verify tokenizer exception handling. It attempts to mock a non-existent `TOKENIZER` global, and `get_token_count` returns early when `variants` is empty, preventing the mocked exception from ever being reached.
```python
        # Double-check pattern to avoid race conditions
        if self._tokenizer is None:
            # Try to load from local model first, fallback to remote
            if os.path.exists(TOKENIZER_MODEL_PATH):
```
this should always be true, no? the remote fallback might not be completely secure or successful (e.g., jinaai makes the huggingface repo private)
if for some reason we do want the remote fallback, then it should supply

```python
self._tokenizer = Tokenizer.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en",
    revision="322d4d7e2f35e84137961a65af894fda0385eb7a",
)
```
hmm yeah, I was conflicted on the remote fallback... looking at it again I am not sure it makes sense, I think I will just remove it. If this stays in our source code it should not get broken, and tests will fail if it does. I will consider it if I end up having to host the model file.
PR reverted: 982c529
…ent to Seer (#99873)" This reverts commit 7d8584b. Co-authored-by: yuvmen <[email protected]>
…eer (#101477)
In preparation for making the switch to token length being considered instead of frame count of errors, we take metrics of the token length of stacktraces being sent, to be able to map out the statistics and the impact that change would make. Instrumented `get_token_count` to monitor how long it takes. Introduces usage of the `tokenizers` library for token count. Added the local tokenization model to Sentry to be used for tokenization without external dependencies. Redo of #99873, which removed the `tiktoken` dep by mistake. It is still used in `getsentry` and causes build errors if removed.
…eer (#99873)
In preparation for making the switch to token length being considered instead of frame count of errors, we take metrics of the token length of stacktraces being sent, to be able to map out the statistics and the impact that change would make. Instrumented `get_token_count` to monitor how long it takes. Introduces usage of the `tokenizers` library for token count. Added the local tokenization model to Sentry to be used for tokenization without external dependencies.
…ent to Seer (#99873)" This reverts commit 7d8584b. Co-authored-by: yuvmen <[email protected]>
In preparation for making the switch to token length being considered instead of frame count of errors, we take metrics of the token length of stacktraces being sent, to be able to map out the statistics and the impact that change would make. Instrumented `get_token_count` to monitor how long it takes.
Introduces usage of the `tokenizers` library for token count. Added the local tokenization model to Sentry to be used for tokenization without external dependencies.

Note
Add token-count metrics for Seer stacktraces using a transformers tokenizer, gate via option, adjust ingest checks, and update tests.
- Adds `get_token_count` and `report_token_count_metric` using `transformers.AutoTokenizer` with `jinaai/jina-embeddings-v2-base-en`.
- Emits a `grouping.similarity.token_count` distribution and timing via `metrics.timer`; captures exceptions with `sentry_sdk`.
- Adjusts `should_call_seer_for_grouping` to run `_has_too_many_contributing_frames` after `_has_empty_stacktrace_string`.
- Adds the option `seer.similarity.token_count_metrics_enabled` (Bool, default True).
- Requires `transformers>=4.21.0`.

Written by Cursor Bugbot for commit d079587. This will update automatically on new commits.
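A minimal sketch of the final approach the note describes, assuming `transformers>=4.21.0` as the PR requires; the function body is illustrative, not the PR's exact code:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")


def get_token_count(stacktrace_text: str) -> int:
    # input_ids holds one id per token produced for the text
    return len(tokenizer(stacktrace_text)["input_ids"])
```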