Merged
22 commits
b4b77b8
feat(ai_grouping) - Send token length metrics on stacktraces sent to …
yuvmen Sep 18, 2025
159a9f9
test and typing fix
yuvmen Sep 19, 2025
fec50e8
Review refactors
yuvmen Sep 19, 2025
8518cf7
typing fixes
yuvmen Sep 19, 2025
4767202
PR comments
yuvmen Sep 19, 2025
2d29e85
Setting option to be default true
yuvmen Sep 19, 2025
da58c5d
fix test
yuvmen Sep 19, 2025
e0c7ac7
Merge branch 'master' into yuvmen/token-count-stacktraces-poc
yuvmen Sep 29, 2025
d079587
Changed tokenizer to same one used by Seer
yuvmen Sep 29, 2025
ff26951
swapped tokenizer library to tokenizers
yuvmen Sep 30, 2025
17c3a6a
:snowflake: re-freeze requirements
getsantry[bot] Sep 30, 2025
5b36cb2
Changed Tokenizer model to be lazy loaded at runtime and load from lo…
yuvmen Sep 30, 2025
06777b1
typing fix
yuvmen Sep 30, 2025
42e61d7
small refactor to tags and protection from empty variants
yuvmen Sep 30, 2025
4a8d2ac
fix tests
yuvmen Sep 30, 2025
2dbbe74
Merge branch 'master' into yuvmen/token-count-stacktraces-poc
yuvmen Oct 9, 2025
2e4d20a
Merge branch 'master' into yuvmen/token-count-stacktraces-poc
yuvmen Oct 14, 2025
f7de5db
removed remote fallback, raised error which gets caught instead
yuvmen Oct 14, 2025
a38b96c
return tiktoken dependency removed by mistake
yuvmen Oct 14, 2025
e1f2ab7
:snowflake: re-freeze requirements
getsantry[bot] Oct 14, 2025
8c3987a
correct no variants case to use `get_grouping_info_from_variants_lega…
yuvmen Oct 14, 2025
b94eeff
Merge branch 'master' into yuvmen/token-count-stacktraces-poc
yuvmen Oct 15, 2025
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -97,6 +97,7 @@ dependencies = [
"structlog>=22.1.0",
"symbolic>=12.14.1",
"tiktoken>=0.8.0",
"tokenizers>=0.22.0",
"tldextract>=5.1.2",
"toronado>=0.1.0",
"typing-extensions>=4.12.0",
@@ -295,6 +296,7 @@ module = [
"onelogin.saml2.idp_metadata_parser.*",
"rb.*",
"statsd.*",
"tokenizers.*",
"u2flib_server.model.*",
]
ignore_missing_imports = true
29 changes: 29 additions & 0 deletions src/sentry/data/models/README.md
@@ -0,0 +1,29 @@
# Sentry ML Models

This directory contains machine learning models used by Sentry.

## Tokenizer Model

### jina-embeddings-v2-base-en

This directory contains the tokenizer model for the Jina AI embeddings v2 base English model.

- **Model**: `jinaai/jina-embeddings-v2-base-en`
- **File**: `jina-embeddings-v2-base-en/tokenizer.json`
- **Usage**: Used by `src/sentry/seer/similarity/utils.py` for tokenizing stacktrace text

### Updating the Model

To update or re-download the tokenizer file, run the following (it needs network access, because `from_pretrained` downloads from the Hugging Face Hub):

```python
from tokenizers import Tokenizer
import os
from sentry.constants import DATA_ROOT

# Download and save the model
tokenizer = Tokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")
model_path = os.path.join(DATA_ROOT, "models", "jina-embeddings-v2-base-en", "tokenizer.json")
os.makedirs(os.path.dirname(model_path), exist_ok=True)
tokenizer.save(model_path)
```
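
Once the file is saved, it can be loaded without network access. The following is a minimal sketch of that idea, not the code in `src/sentry/seer/similarity/utils.py`: `LazyTokenizer`, `load_tokenizer_from_file`, the `loader` parameter, and `count_tokens` are hypothetical names chosen for illustration. It assumes the tokenizer should be loaded only on first use and that a missing file should raise an error rather than trigger a download.

```python
import os
from typing import Any, Callable, Optional


def load_tokenizer_from_file(path: str) -> Any:
    """Load a tokenizer.json with the `tokenizers` library (imported on first use)."""
    from tokenizers import Tokenizer

    return Tokenizer.from_file(path)


class LazyTokenizer:
    """Hypothetical helper: loads the tokenizer on first use, from a local file only."""

    def __init__(
        self,
        path: str,
        loader: Callable[[str], Any] = load_tokenizer_from_file,
    ) -> None:
        self._path = path
        self._loader = loader
        self._tokenizer: Optional[Any] = None

    def get(self) -> Any:
        """Return the cached tokenizer, loading it from disk the first time."""
        if self._tokenizer is None:
            if not os.path.exists(self._path):
                # No remote fallback: surface the problem to the caller instead.
                raise FileNotFoundError(f"Tokenizer file not found: {self._path}")
            self._tokenizer = self._loader(self._path)
        return self._tokenizer

    def count_tokens(self, text: str) -> int:
        """Count the token ids the tokenizer produces for `text`."""
        return len(self.get().encode(text).ids)
```

With the path from the snippet above, usage would look like `LazyTokenizer(model_path).count_tokens(stacktrace_text)`. Keeping the loader injectable is a design choice of this sketch so that the class can be exercised in tests without the real model file.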