Enterprise-grade error handling and logging #45
base: main
When I use JJ workspaces, the workspace has no .git folder. This makes flake.nix misbehave: it copies the whole directory into /nix/store, including the 1.1 GB ./target directory. In a Git repo, only the tracked files would have been copied. The restructuring fixes this issue.
Signed-off-by: Tzanko Matev <[email protected]>
We no longer emit DEBUG logs by default, which makes our tests less noisy. DEBUG logs can be turned on using RUST_LOG. We also tried rephrasing the error status document as an experiment to make Codex use simpler language.
…strings Signed-off-by: Tzanko Matev <[email protected]>
… what docs need to be produced Signed-off-by: Tzanko Matev <[email protected]>
Coverage Summary: Rust (lines), Python (statements); generated automatically.
This seems great, but it's so complex. I admit ~4k lines for error handling/logging seems like a lot to me; maybe naively, from the outside, I'd expect a whole recorder to be ~4k lines. Suitable ready-made libraries for some parts, like the policy/error facades, might reduce it. Then again, a lot of these may be touched/changed lines rather than new code, so maybe it makes sense and it's just hard for me to track quickly.
On the other hand, a lot of it makes sense, and there are many cases where custom logic/flexibility is needed, so it's probably very useful.
A lot of it is just tests. Basically, I told Codex that our error handling was ad hoc and asked it to fix it.
Tests are maybe ~500 lines, and documents maybe another ~500. There are a lot of new abstractions/helpers and places where code is wrapped with them; I guess they are useful, I just haven't gone deep into them.
Tried to centralize and manage error handling and logging. Here's the status of the task; see README.md for a fuller description of the new features.
WS1 – Foundations & Inventory
State: In progress
Tooling: just errors-audit (finds PyRuntimeError::new_err, unwrap/expect/panic!, PythonRuntimeError/ValueError).
What we saw: … RecorderError; raw Python exceptions survive in codetracer_python_recorder/session.py and tests (ISSUE-014). src/monitoring/tracer.rs still uses lock().unwrap() and lacks error reporting for callback failures (ISSUE-013).
Next moves:
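How just errors-audit is implemented isn't shown here. As a minimal sketch, assuming a plain substring scan over source text, the audit amounts to reporting every line that contains one of the listed patterns. The function audit_source, the Finding type, and the exact pattern spellings (".unwrap(", ".expect(") are hypothetical.

```rust
/// Patterns from the WS1 tooling description (spellings assumed).
const AUDIT_PATTERNS: [&str; 6] = [
    "PyRuntimeError::new_err",
    ".unwrap(",
    ".expect(",
    "panic!",
    "PythonRuntimeError",
    "ValueError",
];

/// One audit hit: 1-based line number and the pattern that matched.
#[derive(Debug, PartialEq)]
pub struct Finding {
    pub line: usize,
    pub pattern: &'static str,
}

/// Scan source text and report every (line, pattern) match, in order.
pub fn audit_source(source: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        for pattern in AUDIT_PATTERNS {
            if line.contains(pattern) {
                findings.push(Finding { line: idx + 1, pattern });
            }
        }
    }
    findings
}
```

A substring scan over-reports (for example, matches inside comments), which is acceptable for an inventory pass whose output is reviewed by hand.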
WS2 – recorder-errors Crate
State: Done (2025-10-02)
Highlights:
crates/recorder-errors with RecorderError, enums, context helpers, macros (usage!, enverr!, target!, bug!, ensure_*), plus tests and optional serde support.
cargo test -p recorder-errors and a workspace cargo check stay green.
Next moves: Use this crate everywhere in WS3/WS4 work.
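To illustrate how a crate like recorder-errors can fit together, here is a minimal sketch of an error type carrying a kind, a stable code, a message, and attached context, plus two declarative macros. The macro names usage! and ensure_usage! echo the list above, but their signatures, the field layout, parse_max_events, and the codes ERR_INVALID_MAX_EVENTS and ERR_BUG are assumptions, not the crate's actual API.

```rust
use std::fmt;

/// Error categories; variants mirror the Python classes listed under WS4.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind { Usage, Environment, Target, Internal }

/// Hypothetical RecorderError shape: kind, stable code, message, and
/// key/value context collected as the error travels up the stack.
#[derive(Debug)]
pub struct RecorderError {
    pub kind: ErrorKind,
    pub code: &'static str,
    pub message: String,
    pub context: Vec<(&'static str, String)>,
}

impl RecorderError {
    pub fn new(kind: ErrorKind, code: &'static str, message: impl Into<String>) -> Self {
        RecorderError { kind, code, message: message.into(), context: Vec::new() }
    }

    /// Context helper: record a key/value pair and hand the error back.
    pub fn with_context(mut self, key: &'static str, value: impl Into<String>) -> Self {
        self.context.push((key, value.into()));
        self
    }
}

impl fmt::Display for RecorderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.code, self.message)?;
        for (key, value) in &self.context {
            write!(f, " {}={}", key, value)?;
        }
        Ok(())
    }
}

impl std::error::Error for RecorderError {}

/// Build a usage error from a code and a format string (assumed signature).
macro_rules! usage {
    ($code:expr, $($arg:tt)+) => {
        RecorderError::new(ErrorKind::Usage, $code, format!($($arg)+))
    };
}

/// Return early with a usage error unless the condition holds (assumed signature).
macro_rules! ensure_usage {
    ($cond:expr, $code:expr, $($arg:tt)+) => {
        if !$cond {
            return Err(usage!($code, $($arg)+));
        }
    };
}

/// Hypothetical caller: validate a user-supplied limit.
fn parse_max_events(raw: &str) -> Result<u32, RecorderError> {
    let value = raw
        .trim()
        .parse::<u32>()
        .map_err(|_| usage!("ERR_INVALID_MAX_EVENTS", "max_events {:?} is not an integer", raw))?;
    ensure_usage!(value > 0, "ERR_INVALID_MAX_EVENTS", "max_events must be positive, got {}", value);
    Ok(value)
}
```

Keeping the code stable while the message stays free-form lets logs and Python callers match on the code without parsing prose.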
WS3 – Retrofit Rust Modules
State: Done (2025-10-02)
Highlights:
session/*, runtime/*, and monitoring/tracer.rs now return RecorderError via the shared macros.
… errors mapper; IO errors now carry context.
… PyRuntimeError::new_err left outside that mapper.
Next moves: Feed findings into WS4 and loop back to WS1 issues.
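One common way to make IO errors carry context is an extension trait on io::Result that wraps the failure with a code and a description of the attempted operation. The sketch below assumes that approach; IoContext, io_context, read_trace_metadata, the ERR_IO code, and the trimmed-down RecorderError are hypothetical.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Minimal stand-in for RecorderError, reduced to what this example needs.
#[derive(Debug)]
pub struct RecorderError {
    pub code: &'static str,
    pub message: String,
    pub cause: Option<io::Error>,
}

impl fmt::Display for RecorderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.code, self.message)
    }
}

impl Error for RecorderError {
    // Expose the original io::Error so callers can still inspect it.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_ref().map(|e| e as &(dyn Error + 'static))
    }
}

/// Extension trait: attach a code and a description of the attempted
/// operation to an IO failure instead of returning a bare io::Error.
pub trait IoContext<T> {
    fn io_context<F>(self, code: &'static str, what: F) -> Result<T, RecorderError>
    where
        F: FnOnce() -> String;
}

impl<T> IoContext<T> for io::Result<T> {
    fn io_context<F>(self, code: &'static str, what: F) -> Result<T, RecorderError>
    where
        F: FnOnce() -> String,
    {
        // The closure only runs on the error path, so success costs nothing extra.
        self.map_err(|err| RecorderError {
            code,
            message: format!("{}: {}", what(), err),
            cause: Some(err),
        })
    }
}

/// Hypothetical call site: read a file and report which path failed.
pub fn read_trace_metadata(path: &Path) -> Result<String, RecorderError> {
    fs::read_to_string(path).io_context("ERR_IO", || format!("failed to read {}", path.display()))
}
```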
WS4 – FFI Wrapper & Python Exception Hierarchy
State: Done (2025-10-02)
Highlights:
ffi guard around each PyO3 entry point to map RecorderError, plus panic safety.
… RecorderError, UsageError, EnvironmentError, TargetError, InternalError.
… (uv run cargo nextest run ...; .venv/bin/python -m pytest ...).
Next moves: Hold for WS5 until ISSUE-013 and ISSUE-014 close.
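The purpose of an FFI guard is that neither a RecorderError nor a Rust panic escapes a PyO3 entry point unconverted. The sketch below models this with std::panic::catch_unwind and returns a plain PyExceptionSpec (class name plus message) instead of a real PyErr, so it runs without PyO3. The names ffi_guard and PyExceptionSpec, and the one-to-one mapping from error kind to the Python classes listed above, are assumptions.

```rust
use std::panic::{self, AssertUnwindSafe};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind { Usage, Environment, Target, Internal }

/// Minimal stand-in for RecorderError.
#[derive(Debug)]
pub struct RecorderError {
    pub kind: ErrorKind,
    pub message: String,
}

/// What the Python caller would receive; a real wrapper would build a PyErr.
#[derive(Debug, PartialEq)]
pub struct PyExceptionSpec {
    pub class_name: &'static str,
    pub message: String,
}

/// Assumed one-to-one mapping onto the Python exception classes.
fn python_class(kind: ErrorKind) -> &'static str {
    match kind {
        ErrorKind::Usage => "UsageError",
        ErrorKind::Environment => "EnvironmentError",
        ErrorKind::Target => "TargetError",
        ErrorKind::Internal => "InternalError",
    }
}

/// Run an entry-point body so that neither errors nor panics unwind
/// across the FFI boundary: both become a PyExceptionSpec.
pub fn ffi_guard<T, F>(body: F) -> Result<T, PyExceptionSpec>
where
    F: FnOnce() -> Result<T, RecorderError>,
{
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(err)) => Err(PyExceptionSpec {
            class_name: python_class(err.kind),
            message: err.message,
        }),
        Err(payload) => {
            // Literal panics usually carry a &str payload; formatted ones carry a String.
            let detail = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            Err(PyExceptionSpec {
                class_name: "InternalError",
                message: format!("panic in entry point: {}", detail),
            })
        }
    }
}
```

Mapping panics to InternalError treats them as recorder bugs rather than user mistakes; the default panic hook still prints to stderr unless a custom hook is installed.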
WS5 – Policy Switches & Runtime Configuration
State: Done (2025-10-03)
Highlights:
TraceSession.start() and trace() now refresh policy from env vars and accept override mappings, so embedders can wire recorder switches without manual plumbing.
… configure_policy/configure_policy_from_env under the expected Python names; unit tests cover env-driven and explicit override flows.
… RecorderPolicy: callback errors respect on_recorder_error (disable detaches without surfacing exceptions), require_trace now fails cleanly when no events land, and partial traces are deleted or retained based on keep_partial_trace.
Next moves: Kick off WS6 once upstream WS1 cleanups land.
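Layering explicit overrides on top of environment-derived settings can be sketched as two passes over key/value maps, where the later pass wins. In this sketch the keys are the policy field names (the real environment variable names are not given above), the "abort" value and all defaults are assumptions, and resolve_policy and parse_bool are hypothetical helpers.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnRecorderError { Abort, Disable }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecorderPolicy {
    pub on_recorder_error: OnRecorderError,
    pub require_trace: bool,
    pub keep_partial_trace: bool,
}

impl Default for RecorderPolicy {
    // Assumed defaults, chosen only for illustration.
    fn default() -> Self {
        RecorderPolicy {
            on_recorder_error: OnRecorderError::Abort,
            require_trace: false,
            keep_partial_trace: false,
        }
    }
}

/// Parse a boolean switch, rejecting anything unrecognised.
fn parse_bool(key: &str, raw: &str) -> Result<bool, String> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" | "on" => Ok(true),
        "0" | "false" | "no" | "off" => Ok(false),
        other => Err(format!("{key}: invalid boolean {other:?}")),
    }
}

impl RecorderPolicy {
    /// Apply whichever keys are present; absent keys keep their current value.
    pub fn apply(&mut self, settings: &HashMap<String, String>) -> Result<(), String> {
        if let Some(raw) = settings.get("on_recorder_error") {
            self.on_recorder_error = match raw.as_str() {
                "abort" => OnRecorderError::Abort,
                "disable" => OnRecorderError::Disable,
                other => return Err(format!("on_recorder_error: unknown value {other:?}")),
            };
        }
        if let Some(raw) = settings.get("require_trace") {
            self.require_trace = parse_bool("require_trace", raw)?;
        }
        if let Some(raw) = settings.get("keep_partial_trace") {
            self.keep_partial_trace = parse_bool("keep_partial_trace", raw)?;
        }
        Ok(())
    }
}

/// Environment layer first, explicit overrides second (overrides win).
pub fn resolve_policy(
    env: &HashMap<String, String>,
    overrides: &HashMap<String, String>,
) -> Result<RecorderPolicy, String> {
    let mut policy = RecorderPolicy::default();
    policy.apply(env)?;
    policy.apply(overrides)?;
    Ok(policy)
}
```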
WS6 – Logging, Metrics, and Diagnostics
State: Done (2025-10-03)
Highlights:
… env_logger helper with a structured JSON logger that always emits run_id, active trace_id, and error_code fields while honouring policy-driven log level and log file overrides.
… RecorderMetrics sink and instrumented dropped locations, policy-triggered detachments, and caught panics across the monitoring/runtime paths; Rust unit tests exercise the metrics capture.
… --json-errors policy path so runtime shutdown emits a single-line JSON trailer on stderr; CLI integration tests now assert the abort flow surfaces the trailer alongside existing stack traces.
Next moves: Wire the metrics sink into the chosen exporter and align the log schema with Observability consumption before rolling out to downstream tooling.
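A structured record with fixed run_id, trace_id, and error_code fields can be produced with only the standard library, as long as string values are escaped. This sketch assumes a flat one-object-per-line schema where absent fields are written as null; json_log_line, the field order, the level strings, and the ERR_TRACE_MISSING code are hypothetical. Because newlines are escaped, each record stays on one line, which is the same property a single-line JSON trailer on stderr needs.

```rust
use std::fmt::Write;

/// Escape a string as a JSON string literal, including the quotes.
fn json_escape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len() + 2);
    out.push('"');
    for ch in raw.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Optional fields become JSON null when absent.
fn opt_json(value: Option<&str>) -> String {
    value.map(json_escape).unwrap_or_else(|| "null".to_string())
}

/// One log record as a single-line JSON object.
pub fn json_log_line(
    level: &str,
    message: &str,
    run_id: &str,
    trace_id: Option<&str>,
    error_code: Option<&str>,
) -> String {
    format!(
        "{{\"level\":{},\"message\":{},\"run_id\":{},\"trace_id\":{},\"error_code\":{}}}",
        json_escape(level),
        json_escape(message),
        json_escape(run_id),
        opt_json(trace_id),
        opt_json(error_code)
    )
}
```

Always emitting every field (null rather than omitted) keeps the schema fixed for downstream consumers.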
WS7 – Test Coverage & Tooling Enforcement
State: Done (2025-10-04)
Highlights:
recorder-errors and policy unit tests covering every macro (usage/target/internal ensures) plus invalid boolean parsing.
… dispatch/wrap_pyfunction, panic containment, and Python exception attribute propagation.
just lint orchestration running cargo clippy -D clippy::panic and a repository script that blocks unchecked .unwrap( usage outside the legacy allowlist.
Next moves: Monitor unwrap allowlist shrinkage once WS1 follow-ups land; evaluate extending the lint to .expect( once the monitoring refactor closes.
WS8 – Documentation & Rollout
State: Done (2025-10-05)
Highlights:
… RecorderError catch example.
docs/onboarding/error-handling.md with migration steps, policy wiring tips, and assertion rules for contributors.
codetracer-python-recorder/CHANGELOG.md to brief downstream tools on consuming structured errors.
Next moves: