diff --git a/.coverage b/.coverage deleted file mode 100644 index 5744558..0000000 Binary files a/.coverage and /dev/null differ diff --git a/.envrc b/.envrc index 3550a30..2a33d7a 100644 --- a/.envrc +++ b/.envrc @@ -1 +1,3 @@ -use flake +watch_file nix/flake.nix +watch_file nix/flake.lock +use flake ./nix diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index b87bec4..9e70c3c 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -23,13 +23,13 @@ jobs: extra_nix_config: | experimental-features = nix-command flakes - name: Prepare dev environment - run: nix develop --command bash -lc 'just venv ${{matrix.python-version}} dev' + run: nix develop ./nix --command bash -lc 'just venv ${{matrix.python-version}} dev' - name: Rust tests - run: nix develop --command bash -lc 'just cargo-test' + run: nix develop ./nix --command bash -lc 'just cargo-test' - name: Python tests - run: nix develop --command bash -lc 'just py-test' + run: nix develop ./nix --command bash -lc 'just py-test' coverage: name: Coverage (Python 3.12) @@ -49,17 +49,17 @@ jobs: experimental-features = nix-command flakes - name: Prepare dev environment (Python 3.12) - run: nix develop --command bash -lc 'just venv 3.12 dev' + run: nix develop ./nix --command bash -lc 'just venv 3.12 dev' - name: Collect coverage id: coverage-run - run: nix develop --command bash -lc 'just coverage' + run: nix develop ./nix --command bash -lc 'just coverage' - name: Generate coverage comment if: steps.coverage-run.outcome == 'success' run: | ROOT="$(pwd)" - nix develop --command bash -lc "python3 codetracer-python-recorder/scripts/generate_coverage_comment.py \ + nix develop ./nix --command bash -lc "python3 codetracer-python-recorder/scripts/generate_coverage_comment.py \ --rust-summary codetracer-python-recorder/target/coverage/rust/summary.json \ --python-json codetracer-python-recorder/target/coverage/python/coverage.json \ --output codetracer-python-recorder/target/coverage/coverage-comment.md \ diff --git a/.gitignore b/.gitignore index 7fc2ae7..e5670e6 100644 --- a/.gitignore +++ b/.gitignore @@ -7,5 +7,5 @@ build *~ .idea/ .cargo/ - +.coverage **/*.egg-info/ diff --git a/Justfile b/Justfile index 32a8c23..1dca614 100644 --- a/Justfile +++ b/Justfile @@ -31,22 +31,44 @@ venv version=PYTHON_DEFAULT_VERSION: # Build the module in dev mode dev: - uv run --directory codetracer-python-recorder maturin develop --uv + uv run --directory codetracer-python-recorder maturin develop --uv --features integration-test # Run unit tests of dev build test: cargo-test py-test # Run Rust unit tests without default features to link Python C library cargo-test: - uv run cargo nextest run --manifest-path codetracer-python-recorder/Cargo.toml --no-default-features + uv run cargo nextest run --manifest-path codetracer-python-recorder/Cargo.toml --workspace --no-default-features py-test: uv run --group dev --group test pytest codetracer-python-recorder/tests/python codetracer-pure-python-recorder + +lint: lint-rust lint-errors + +lint-rust: + uv run cargo clippy --manifest-path codetracer-python-recorder/Cargo.toml --workspace --no-default-features -- -D clippy::panic + +lint-errors: + uv run python3 codetracer-python-recorder/scripts/lint_no_unwraps.py # Run tests only on the pure recorder test-pure: uv run --group dev --group test pytest codetracer-pure-python-recorder +# Inspect ad-hoc error handling patterns across the Rust/Python recorder +errors-audit: + @echo "== PyRuntimeError construction ==" + @rg --color=never --no-heading -n "PyRuntimeError::new_err" codetracer-python-recorder/src codetracer-python-recorder/tests codetracer-python-recorder/codetracer_python_recorder || true + @echo + @echo "== unwrap()/expect()/panic! usage ==" + @rg --color=never --no-heading -n "\\.unwrap\\(" codetracer-python-recorder/src || true + @rg --color=never --no-heading -n "\\.expect\\(" codetracer-python-recorder/src || true + @rg --color=never --no-heading -n "panic!" codetracer-python-recorder/src || true + @echo + @echo "== Python-side bare RuntimeError/ValueError ==" + @rg --color=never --no-heading -n "raise RuntimeError" codetracer-python-recorder/codetracer_python_recorder || true + @rg --color=never --no-heading -n "raise ValueError" codetracer-python-recorder/codetracer_python_recorder || true + # Generate combined coverage artefacts for both crates coverage: just coverage-rust diff --git a/README.md b/README.md index 778a833..56e1fd0 100644 --- a/README.md +++ b/README.md @@ -2,11 +2,68 @@ This repository now hosts two related projects: -- codetracer-pure-python-recorder — the existing pure-Python prototype that records [CodeTracer](https://github.com/metacraft-labs/CodeTracer) traces using sys.settrace. -- codetracer-python-recorder — a new, Rust-backed Python extension module (PyO3) intended to provide a faster and more featureful recorder. +- codetracer-pure-python-recorder — a pure-Python tracer that still mirrors the early prototype. +- codetracer-python-recorder — a Rust-backed Python extension (PyO3 + maturin) with structured errors and tighter tooling. -> [!WARNING] -> Both projects are early-stage prototypes. Contributions and discussion are welcome! +Both projects are still in motion. Expect breaking changes while we finish the error-handling rollout. + +### Structured errors (Rust-backed recorder) + +The Rust module wraps every failure in a `RecorderError` hierarchy that reaches Python with a stable `code`, a readable `kind`, and a `context` dict. + +- `UsageError` → bad input or calling pattern. Codes like `ERR_ALREADY_TRACING`. +- `EnvironmentError` → IO or OS problems. Codes like `ERR_IO`. +- `TargetError` → the traced program raised or refused inspection. Codes like `ERR_TRACE_INCOMPLETE`. +- `InternalError` → a recorder bug or panic. Codes default to `ERR_UNKNOWN` unless classified. + +Quick catch example: + +```python +from codetracer_python_recorder import RecorderError, start, stop + +try: + session = start("/tmp/trace", format="json") +except RecorderError as err: + print(f"Recorder failed: {err.code}") + for key, value in err.context.items(): + print(f" {key}: {value}") +else: + try: + ... # run work here + finally: + session.flush() + stop() +``` + +All subclasses carry the same attributes, so existing handlers can migrate by catching `RecorderError` once and branching on `err.code` if needed. + +### CLI exit behaviour and JSON trailers + +`python -m codetracer_python_recorder` returns: + +- `0` when tracing and the target script succeed. +- The script's own exit code when it calls `sys.exit()`. +- `1` when a `RecorderError` bubbles out of startup or shutdown. +- `2` when the CLI arguments are incomplete. + +Pass `--codetracer-json-errors` (or set the policy via `configure_policy(json_errors=True)`) to stream a one-line JSON trailer on stderr. The payload includes `run_id`, `trace_id`, `error_code`, `error_kind`, `message`, and the `context` map so downstream tooling can log failures without scraping text. + +### Migration checklist for downstream tools + +- Catch `RecorderError` (or a subclass) instead of `RuntimeError`. +- Switch any string matching over to `err.code` values like `ERR_TRACE_DIR_CONFLICT`. +- Expect structured log lines (JSON) on stderr. Use the `error_code` field instead of parsing text. +- Opt in to JSON trailers for machine parsing and keep human output short. +- Update policy wiring to use `configure_policy` / `policy_snapshot()` rather than hand-rolled env parsing. +- Read `docs/onboarding/error-handling.md` for detailed migration steps and assertion rules. + +### Logging defaults + +The recorder now installs a JSON logger on first import. Logs include `run_id`, optional `trace_id`, and an `error_code` field when set. + +- Control the log filter with `RUST_LOG=target=level` (standard env syntax). +- Override from Python with `configure_policy(log_level="info")` or `log_file=...` for file output. +- Metrics counters record dropped events, detach reasons, and caught panics; plug your own sink via the Rust API when embedding. ### codetracer-pure-python-recorder @@ -54,6 +111,20 @@ Basic workflow: The CI workflow mirrors these commands. Pull requests get an automated comment with the latest Rust/Python coverage tables and downloadable artefacts (`lcov.info`, `coverage.xml`, `coverage.json`). +#### Debug logging + +Rust-side logging defaults to `warn` so test output stays readable. Export +`RUST_LOG` when you need more detail: + +```bash +RUST_LOG=codetracer_python_recorder=debug pytest \ + codetracer-python-recorder/tests/python/unit/test_backend_exceptions.py -q +``` + +Any filter accepted by `env_logger` still works, so you can switch to +`RUST_LOG=codetracer_python_recorder=info` or silence everything with +`RUST_LOG=off`. + ### Future directions The current Python support is an unfinished prototype. We can finish it. In the future, it may be expanded to function in a way to similar to the more complete implementations, e.g. [Noir](https://github.com/blocksense-network/noir/tree/blocksense/tooling/tracer). diff --git a/codetracer-python-recorder/CHANGELOG.md b/codetracer-python-recorder/CHANGELOG.md new file mode 100644 index 0000000..7cce9a7 --- /dev/null +++ b/codetracer-python-recorder/CHANGELOG.md @@ -0,0 +1,6 @@ +# codetracer-python-recorder — Change Log + +## Unreleased +- Documented the error-handling policy. README now lists the `RecorderError` hierarchy, policy hooks (`configure_policy`, JSON trailers), exit codes, and sample handlers so Python callers can consume structured failures. +- Added an onboarding guide under `docs/onboarding/error-handling.md` with migration steps for downstream tools. +- Recorded assertion guidance for contributors: prefer `bug!`/`ensure_internal!` over raw `panic!`/`.unwrap()` and keep `debug_assert!` paired with classified errors. diff --git a/codetracer-python-recorder/Cargo.lock b/codetracer-python-recorder/Cargo.lock index 09078b0..2399ba2 100644 --- a/codetracer-python-recorder/Cargo.lock +++ b/codetracer-python-recorder/Cargo.lock @@ -2,65 +2,6 @@ # It is not intended for manual editing. version = 4 -[[package]] -name = "aho-corasick" -version = "1.1.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "8e60d3430d3a69478ad0993f19238d2df97c507009a52b3c10addcd7f6bcb916" -dependencies = [ - "memchr", -] - -[[package]] -name = "anstream" -version = "0.6.20" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3ae563653d1938f79b1ab1b5e668c87c76a9930414574a6583a7b7e11a8e6192" -dependencies = [ - "anstyle", - "anstyle-parse", - "anstyle-query", - "anstyle-wincon", - "colorchoice", - "is_terminal_polyfill", - "utf8parse", -] - -[[package]] -name = "anstyle" -version = "1.0.11" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "862ed96ca487e809f1c8e5a8447f6ee2cf102f846893800b20cebdf541fc6bbd" - -[[package]] -name = "anstyle-parse" -version = "0.2.7" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "4e7644824f0aa2c7b9384579234ef10eb7efb6a0deb83f9630a49594dd9c15c2" -dependencies = [ - "utf8parse", -] - -[[package]] -name = "anstyle-query" -version = "1.1.4" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "9e231f6134f61b71076a3eab506c379d4f36122f2af15a9ff04415ea4c3339e2" -dependencies = [ - "windows-sys", -] - -[[package]] -name = "anstyle-wincon" -version = "3.0.10" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3e0633414522a32ffaac8ac6cc8f748e090c5717661fddeea04219e2344f5f2a" -dependencies = [ - "anstyle", - "once_cell_polyfill", - "windows-sys", -] - [[package]] name = "autocfg" version = "1.5.0" @@ -79,6 +20,12 @@ version = "2.9.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1b8e56985ec62d17e9c1001dc89c88ecd7dc08e47eba5ec7c29c7b5eeecde967" +[[package]] +name = "bumpalo" +version = "3.19.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "46c5e41b57b8bba42a04676d81cb89e9ee8e859a1a66f80a5a72e1cb76b34d43" + [[package]] name = "capnp" version = "0.21.4" @@ -129,20 +76,17 @@ version = "0.1.0" dependencies = [ "bitflags", "dashmap", - "env_logger", "log", "once_cell", "pyo3", + "recorder-errors", "runtime_tracing", + "serde", + "serde_json", "tempfile", + "uuid", ] -[[package]] -name = "colorchoice" -version = "1.0.4" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b05b61dc5112cbb17e4b6cd61790d9845d13888356391624cbe7e41efeac1e75" - [[package]] name = "dashmap" version = "5.5.3" @@ -162,29 +106,6 @@ version = "0.6.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "edd0f118536f44f5ccd48bcb8b111bdc3de888b58c74639dfb034a357d0f206d" -[[package]] -name = "env_filter" -version = "0.1.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "186e05a59d4c50738528153b83b0b0194d3a29507dfec16eccd4b342903397d0" -dependencies = [ - "log", - "regex", -] - -[[package]] -name = "env_logger" -version = "0.11.8" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "13c863f0904021b108aa8b2f55046443e6b1ebde8fd4a15c399893aae4fa069f" -dependencies = [ - "anstream", - "anstyle", - "env_filter", - "jiff", - "log", -] - [[package]] name = "errno" version = "0.3.14" @@ -240,42 +161,12 @@ version = "2.0.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f4c7245a08504955605670dbf141fceab975f15ca21570696aebe9d2e71576bd" -[[package]] -name = "is_terminal_polyfill" -version = "1.70.1" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "7943c866cc5cd64cbc25b2e01621d07fa8eb2a1a23160ee81ce38704e97b8ecf" - [[package]] name = "itoa" version = "1.0.15" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "4a5f13b858c8d314ee3e8f639011f7ccefe71f97f96e50151fb991f267928e2c" -[[package]] -name = "jiff" -version = "0.2.15" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "be1f93b8b1eb69c77f24bbb0afdf66f54b632ee39af40ca21c4365a1d7347e49" -dependencies = [ - "jiff-static", - "log", - "portable-atomic", - "portable-atomic-util", - "serde", -] - -[[package]] -name = "jiff-static" -version = "0.2.15" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "03343451ff899767262ec32146f6d559dd759fdadf42ff0e227c7c48f72594b4" -dependencies = [ - "proc-macro2", - "quote", - "syn", -] - [[package]] name = "jobserver" version = "0.1.33" @@ -286,6 +177,16 @@ dependencies = [ "libc", ] +[[package]] +name = "js-sys" +version = "0.3.81" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ec48937a97411dcb524a265206ccd4c90bb711fca92b2792c407f268825b9305" +dependencies = [ + "once_cell", + "wasm-bindgen", +] + [[package]] name = "libc" version = "0.2.175" @@ -355,12 +256,6 @@ version = "1.21.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "42f5e15c9953c5e4ccceeb2e7382a716482c34515315f7b03532b8b4e8393d2d" -[[package]] -name = "once_cell_polyfill" -version = "1.70.1" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "a4895175b425cb1f87721b59f0f286c2092bd4af812243672510e1ac53e2e0ad" - [[package]] name = "parking_lot_core" version = "0.9.11" @@ -386,15 +281,6 @@ version = "1.11.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f84267b20a16ea918e43c6a88433c2d54fa145c92a811b5b047ccbe153674483" -[[package]] -name = "portable-atomic-util" -version = "0.2.4" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d8a2f0d8d040d7848a709caf78912debcc3f33ee4b3cac47d73d1e1069e83507" -dependencies = [ - "portable-atomic", -] - [[package]] name = "proc-macro2" version = "1.0.97" @@ -482,43 +368,21 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "69cdb34c158ceb288df11e18b4bd39de994f6657d83847bdffdbd7f346754b0f" [[package]] -name = "redox_syscall" -version = "0.5.17" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5407465600fb0548f1442edf71dd20683c6ed326200ace4b1ef0763521bb3b77" -dependencies = [ - "bitflags", -] - -[[package]] -name = "regex" -version = "1.11.2" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "23d7fd106d8c02486a8d64e778353d1cffe08ce79ac2e82f540c86d0facf6912" +name = "recorder-errors" +version = "0.1.0" dependencies = [ - "aho-corasick", - "memchr", - "regex-automata", - "regex-syntax", + "serde", ] [[package]] -name = "regex-automata" -version = "0.4.10" +name = "redox_syscall" +version = "0.5.17" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6b9458fa0bfeeac22b5ca447c63aaf45f28439a709ccd244698632f9aa6394d6" +checksum = "5407465600fb0548f1442edf71dd20683c6ed326200ace4b1ef0763521bb3b77" dependencies = [ - "aho-corasick", - "memchr", - "regex-syntax", + "bitflags", ] -[[package]] -name = "regex-syntax" -version = "0.8.6" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "caf4aa5b0f434c91fe5c7f1ecb6a5ece2130b02ad2a590589dda5146df959001" - [[package]] name = "runtime_tracing" version = "0.14.0" @@ -551,6 +415,12 @@ dependencies = [ "windows-sys", ] +[[package]] +name = "rustversion" +version = "1.0.22" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d" + [[package]] name = "ryu" version = "1.0.20" @@ -661,10 +531,15 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "7264e107f553ccae879d21fbea1d6724ac785e8c3bfc762137959b5802826ef3" [[package]] -name = "utf8parse" -version = "0.2.2" +name = "uuid" +version = "1.18.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821" +checksum = "2f87b8aa10b915a06587d0dec516c282ff295b475d94abf425d62b57710070a2" +dependencies = [ + "getrandom", + "js-sys", + "wasm-bindgen", +] [[package]] name = "wasi" @@ -675,6 +550,65 @@ dependencies = [ "wit-bindgen-rt", ] +[[package]] +name = "wasm-bindgen" +version = "0.2.104" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c1da10c01ae9f1ae40cbfac0bac3b1e724b320abfcf52229f80b547c0d250e2d" +dependencies = [ + "cfg-if", + "once_cell", + "rustversion", + "wasm-bindgen-macro", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-backend" +version = "0.2.104" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "671c9a5a66f49d8a47345ab942e2cb93c7d1d0339065d4f8139c486121b43b19" +dependencies = [ + "bumpalo", + "log", + "proc-macro2", + "quote", + "syn", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-macro" +version = "0.2.104" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7ca60477e4c59f5f2986c50191cd972e3a50d8a95603bc9434501cf156a9a119" +dependencies = [ + "quote", + "wasm-bindgen-macro-support", +] + +[[package]] +name = "wasm-bindgen-macro-support" +version = "0.2.104" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9f07d2f20d4da7b26400c9f4a0511e6e0345b040694e8a75bd41d578fa4421d7" +dependencies = [ + "proc-macro2", + "quote", + "syn", + "wasm-bindgen-backend", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-shared" +version = "0.2.104" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bad67dc8b2a1a6e5448428adec4c3e84c43e561d8c9ee8a9e5aabeb193ec41d1" +dependencies = [ + "unicode-ident", +] + [[package]] name = "windows-link" version = "0.1.3" diff --git a/codetracer-python-recorder/Cargo.toml b/codetracer-python-recorder/Cargo.toml index 714f362..d368a27 100644 --- a/codetracer-python-recorder/Cargo.toml +++ b/codetracer-python-recorder/Cargo.toml @@ -1,3 +1,8 @@ +[workspace] +members = ["crates/recorder-errors"] +default-members = [".", "crates/recorder-errors"] +resolver = "2" + [package] name = "codetracer-python-recorder" version = "0.1.0" @@ -12,6 +17,7 @@ crate-type = ["cdylib", "rlib"] [features] extension-module = ["pyo3/extension-module"] +integration-test = [] default = ["extension-module"] [dependencies] @@ -20,8 +26,11 @@ runtime_tracing = "0.14.0" bitflags = "2.4" once_cell = "1.19" dashmap = "5.5" -log = "0.4" -env_logger = "0.11" +log = { version = "0.4", features = ["kv"] } +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" +uuid = { version = "1.10", features = ["v4"] } +recorder-errors = { version = "0.1.0", path = "crates/recorder-errors" } [dev-dependencies] pyo3 = { version = "0.25.1", features = ["auto-initialize"] } diff --git a/codetracer-python-recorder/codetracer_python_recorder/__init__.py b/codetracer-python-recorder/codetracer_python_recorder/__init__.py index b22a5a4..da6d108 100644 --- a/codetracer-python-recorder/codetracer_python_recorder/__init__.py +++ b/codetracer-python-recorder/codetracer_python_recorder/__init__.py @@ -1,16 +1,37 @@ -"""High-level tracing API built on a Rust backend. +"""Public tracing surface with structured recorder errors. -This module exposes a minimal interface for starting and stopping -runtime traces. The heavy lifting is delegated to the -`codetracer_python_recorder` Rust extension which will eventually hook -into `runtime_tracing` and `sys.monitoring`. For now the Rust side only -maintains placeholder state and performs no actual tracing. +Importing this package installs policy defaults, wires the Rust backend, +and exposes helpers to start and stop tracing. Every failure travels +through :class:`RecorderError` or one of its subclasses. Each exception +carries a stable ``code`` string (``ERR_*``), a ``kind`` label, and a +``context`` dict for tooling. """ from . import api as _api from .api import * # re-export public API symbols from .auto_start import auto_start_from_env +from .codetracer_python_recorder import ( + EnvironmentError, + InternalError, + RecorderError, + TargetError, + UsageError, + configure_policy, + configure_policy_from_env, + policy_snapshot, +) +configure_policy_from_env() auto_start_from_env() -__all__ = _api.__all__ +__all__ = ( + *_api.__all__, + "RecorderError", + "UsageError", + "EnvironmentError", + "TargetError", + "InternalError", + "configure_policy", + "configure_policy_from_env", + "policy_snapshot", +) diff --git a/codetracer-python-recorder/codetracer_python_recorder/__main__.py b/codetracer-python-recorder/codetracer_python_recorder/__main__.py index 6982239..7f44709 100644 --- a/codetracer-python-recorder/codetracer_python_recorder/__main__.py +++ b/codetracer-python-recorder/codetracer_python_recorder/__main__.py @@ -6,7 +6,12 @@ Codetracer options (must appear before the script path): --codetracer-trace PATH Output events file (default: trace.bin or trace.json) --codetracer-format {binary,json} Output format (default: binary) - --codetracer-capture-values BOOL Whether to capture values (default: true) + --codetracer-on-recorder-error MODE How to react to recorder errors (abort|disable) + --codetracer-require-trace Exit with failure if no trace is produced + --codetracer-keep-partial-trace Preserve partial traces when failures occur + --codetracer-log-level LEVEL Override Rust log filter (e.g. info,debug) + --codetracer-log-file PATH Write recorder logs to a file + --codetracer-json-errors Emit JSON error trailers on stderr Examples: python -m codetracer_python_recorder --codetracer-format=json app.py --flag=1 @@ -19,7 +24,7 @@ import sys from pathlib import Path -from . import DEFAULT_FORMAT, start, stop +from . import DEFAULT_FORMAT, configure_policy, configure_policy_from_env, start, stop import argparse @@ -48,6 +53,44 @@ def main(argv: list[str] | None = None) -> int: default=DEFAULT_FORMAT, help="Output format for trace events. 'binary' is compact; 'json' is human-readable. Default: %(default)s.", ) + parser.add_argument( + "--codetracer-on-recorder-error", + dest="on_recorder_error", + choices=["abort", "disable"], + help="How the recorder responds to internal errors (abort or disable).", + ) + parser.add_argument( + "--codetracer-require-trace", + dest="require_trace", + action="store_true", + help="Exit with status 1 if no trace output is produced.", + ) + parser.add_argument( + "--codetracer-keep-partial-trace", + dest="keep_partial_trace", + action="store_true", + help="Keep partial trace files on failure instead of cleaning up.", + ) + parser.add_argument( + "--codetracer-log-level", + dest="log_level", + default=None, + help="Override the Rust log filter (e.g. info,debug).", + ) + parser.add_argument( + "--codetracer-log-file", + dest="log_file", + default=None, + help="Path to a file where recorder logs should be written.", + ) + parser.add_argument( + "--codetracer-json-errors", + dest="json_errors", + action="store_true", + help="Emit JSON error trailers on stderr for machine parsing.", + ) + + configure_policy_from_env() # Only parse our options; leave script and script args in unknown ns, unknown = parser.parse_known_args(argv) @@ -62,6 +105,22 @@ def main(argv: list[str] | None = None) -> int: fmt = ns.format or DEFAULT_FORMAT trace_path = Path(ns.trace) if ns.trace else _default_trace_path(fmt) + policy_kwargs: dict[str, object] = {} + if ns.on_recorder_error: + policy_kwargs["on_recorder_error"] = ns.on_recorder_error + if ns.require_trace: + policy_kwargs["require_trace"] = True + if ns.keep_partial_trace: + policy_kwargs["keep_partial_trace"] = True + if ns.log_level is not None: + policy_kwargs["log_level"] = ns.log_level + if ns.log_file is not None: + policy_kwargs["log_file"] = ns.log_file + if ns.json_errors: + policy_kwargs["json_errors"] = True + if policy_kwargs: + configure_policy(**policy_kwargs) + old_argv = sys.argv sys.argv = [str(script_path)] + script_args # Activate tracing only after entering the target script file. @@ -69,6 +128,7 @@ def main(argv: list[str] | None = None) -> int: trace_path, format=fmt, start_on_enter=script_path, + policy=policy_kwargs if policy_kwargs else None, ) try: runpy.run_path(str(script_path), run_name="__main__") diff --git a/codetracer-python-recorder/codetracer_python_recorder/api.py b/codetracer-python-recorder/codetracer_python_recorder/api.py index d42d98c..6d8f1ea 100644 --- a/codetracer-python-recorder/codetracer_python_recorder/api.py +++ b/codetracer-python-recorder/codetracer_python_recorder/api.py @@ -1,4 +1,9 @@ -"""High-level tracing API built on a Rust backend.""" +"""High-level tracing helpers with structured error propagation. + +Expose the core session helpers (:func:`start`, :func:`stop`, +:func:`trace`, etc.). These wrappers bubble up :class:`RecorderError` +instances from the Rust layer so callers see stable ``ERR_*`` codes. +""" from __future__ import annotations from typing import Iterable diff --git a/codetracer-python-recorder/codetracer_python_recorder/auto_start.py b/codetracer-python-recorder/codetracer_python_recorder/auto_start.py index 8918cc7..cb3ea05 100644 --- a/codetracer-python-recorder/codetracer_python_recorder/auto_start.py +++ b/codetracer-python-recorder/codetracer_python_recorder/auto_start.py @@ -21,6 +21,9 @@ def auto_start_from_env() -> None: # Delay import to avoid boot-time circular dependencies. from . import session + from .codetracer_python_recorder import configure_policy_from_env as _configure_policy_from_env + + _configure_policy_from_env() if session.is_tracing(): log.debug("codetracer auto-start skipped: tracing already active") diff --git a/codetracer-python-recorder/codetracer_python_recorder/session.py b/codetracer-python-recorder/codetracer_python_recorder/session.py index 5014a6d..cd43a43 100644 --- a/codetracer-python-recorder/codetracer_python_recorder/session.py +++ b/codetracer-python-recorder/codetracer_python_recorder/session.py @@ -1,11 +1,18 @@ -"""Tracing session management helpers.""" +"""Tracing session management helpers with policy integration. + +These wrappers load policy from env vars, call into the Rust backend, +and surface structured :class:`RecorderError` instances on failure. +""" from __future__ import annotations import contextlib +import os from pathlib import Path -from typing import Iterator, Optional +from typing import Iterator, Mapping, Optional from .codetracer_python_recorder import ( + configure_policy as _configure_policy, + configure_policy_from_env as _configure_policy_from_env, flush_tracing as _flush_backend, is_tracing as _is_tracing_backend, start_tracing as _start_backend, @@ -17,7 +24,11 @@ class TraceSession: - """Handle representing a live tracing session.""" + """Handle representing a live tracing session. + + The object keeps the resolved trace path and format. Use + :meth:`flush` and :meth:`stop` to interact with the global session. + """ path: Path format: str @@ -47,8 +58,43 @@ def start( *, format: str = DEFAULT_FORMAT, start_on_enter: str | Path | None = None, + policy: Mapping[str, object] | None = None, + apply_env_policy: bool = True, ) -> TraceSession: - """Start a new global trace session.""" + """Start a new global trace session. + + Parameters + ---------- + path: + Destination directory for generated trace artefacts. + format: + Trace events serialisation format (``"binary"`` or ``"json"``). + start_on_enter: + Optional path that delays trace activation until the interpreter enters + the referenced file. + policy: + Optional mapping of runtime policy overrides forwarded to + :func:`configure_policy` before tracing begins. Keys match the policy + keyword arguments (``on_recorder_error``, ``require_trace``, etc.). + apply_env_policy: + When ``True`` (default), refresh policy settings from environment + variables via :func:`configure_policy_from_env` prior to applying + explicit overrides. + + Returns + ------- + TraceSession + Handle for the active recorder session. + + Raises + ------ + RecorderError + Raised by the Rust backend when configuration, IO, or the target + script fails. + RuntimeError + Raised when ``start`` is called while another session is still + active. The guard lives in Python so the error stays synchronous. + """ global _active_session if _is_tracing_backend(): raise RuntimeError("tracing already active") @@ -57,6 +103,11 @@ def start( normalized_format = _coerce_format(format) activation_path = _normalize_activation_path(start_on_enter) + if apply_env_policy: + _configure_policy_from_env() + if policy: + _configure_policy(**_coerce_policy_kwargs(policy)) + _start_backend(str(trace_path), normalized_format, activation_path) session = TraceSession(path=trace_path, format=normalized_format) _active_session = session @@ -88,9 +139,16 @@ def trace( path: str | Path, *, format: str = DEFAULT_FORMAT, + policy: Mapping[str, object] | None = None, + apply_env_policy: bool = True, ) -> Iterator[TraceSession]: """Context manager helper for scoped tracing.""" - session = start(path, format=format) + session = start( + path, + format=format, + policy=policy, + apply_env_policy=apply_env_policy, + ) try: yield session finally: @@ -120,6 +178,18 @@ def _normalize_activation_path(value: str | Path | None) -> str | None: return str(Path(value).expanduser()) +def _coerce_policy_kwargs(policy: Mapping[str, object]) -> dict[str, object]: + normalized: dict[str, object] = {} + for key, raw_value in policy.items(): + if key == "log_file" and raw_value is not None: + normalized[key] = os.fspath(raw_value) + elif key in {"on_recorder_error", "log_level"} and raw_value is not None: + normalized[key] = str(raw_value) + else: + normalized[key] = raw_value + return normalized + + __all__ = ( "TraceSession", "flush", diff --git a/codetracer-python-recorder/crates/recorder-errors/Cargo.toml b/codetracer-python-recorder/crates/recorder-errors/Cargo.toml new file mode 100644 index 0000000..a3dd7fb --- /dev/null +++ b/codetracer-python-recorder/crates/recorder-errors/Cargo.toml @@ -0,0 +1,17 @@ +[package] +name = "recorder-errors" +version = "0.1.0" +edition = "2021" +license = "MIT" +readme = "README.md" +description = "Shared error facade for codetracer recorders" +authors = ["CodeTracer Runtime Team"] +categories = ["development-tools"] +keywords = ["codetracer", "error-handling"] + +[dependencies] +serde = { version = "1.0", features = ["derive"], optional = true } + +[features] +default = [] +serde = ["dep:serde"] diff --git a/codetracer-python-recorder/crates/recorder-errors/README.md b/codetracer-python-recorder/crates/recorder-errors/README.md new file mode 100644 index 0000000..cfd5267 --- /dev/null +++ b/codetracer-python-recorder/crates/recorder-errors/README.md @@ -0,0 +1,6 @@ +# recorder-errors + +Shared error handling primitives for the CodeTracer recorders. This crate +provides the `RecorderError` type, error classification enums, ergonomic macros, +and opt-in serde support so higher-level crates can transport structured +failures across process and FFI boundaries. diff --git a/codetracer-python-recorder/crates/recorder-errors/src/lib.rs b/codetracer-python-recorder/crates/recorder-errors/src/lib.rs new file mode 100644 index 0000000..7f4ee21 --- /dev/null +++ b/codetracer-python-recorder/crates/recorder-errors/src/lib.rs @@ -0,0 +1,667 @@ +//! Recorder-wide error façade for CodeTracer components. +//! +//! The crate defines a structured [`RecorderError`] type backed by +//! [`ErrorKind`] and [`ErrorCode`] classifications. Use the provided macros +//! (`usage!`, `enverr!`, `target!`, `bug!`, `ensure_usage!`, `ensure_env!`, etc.) +//! to author failures consistently across the workspace. + +use std::borrow::Cow; +use std::collections::BTreeMap; +use std::error::Error as StdError; +use std::fmt; +use std::io; + +/// Result alias used throughout the recorder workspace. +/// +/// # Examples +/// +/// ``` +/// use recorder_errors::{ErrorCode, ErrorKind, RecorderError, RecorderResult}; +/// +/// fn validate(flag: &str) -> RecorderResult<()> { +/// if flag == "ok" { +/// Ok(()) +/// } else { +/// Err(RecorderError::new( +/// ErrorKind::Usage, +/// ErrorCode::UnsupportedFormat, +/// "flag must be 'ok'", +/// )) +/// } +/// } +/// +/// assert!(validate("ok").is_ok()); +/// assert!(validate("nope").is_err()); +/// ``` +pub type RecorderResult = Result; + +/// Key-value metadata associated with an error. +/// +/// # Examples +/// +/// ``` +/// use recorder_errors::{ContextMap, RecorderError, ErrorKind, ErrorCode}; +/// +/// let mut context: ContextMap = ContextMap::new(); +/// context.insert("path", "/tmp/trace.json".into()); +/// +/// let error = RecorderError::new( +/// ErrorKind::Environment, +/// ErrorCode::Io, +/// "failed to write trace", +/// ) +/// .with_context("path", "/tmp/trace.json"); +/// +/// assert_eq!(context.get("path"), Some(&"/tmp/trace.json".to_owned())); +/// assert_eq!(error.context.get("path"), Some(&"/tmp/trace.json".to_owned())); +/// ``` +pub type ContextMap = BTreeMap<&'static str, String>; + +/// High-level grouping of recorder failures. +/// +/// # Examples +/// +/// ``` +/// use recorder_errors::{ErrorCode, ErrorKind, RecorderError}; +/// +/// let err = RecorderError::new(ErrorKind::Target, ErrorCode::TraceIncomplete, "target failed"); +/// +/// match err.kind { +/// ErrorKind::Target => { +/// // Target code misbehaved; take recovery action. +/// } +/// // Non-exhaustive enums require a catch-all branch for forward compatibility. +/// _ => {} +/// } +/// ``` +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +#[non_exhaustive] +#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))] +pub enum ErrorKind { + /// Caller provided invalid input or violated usage constraints. + Usage, + /// External environment or IO prevented the recorder from continuing. + Environment, + /// Target code being traced raised/behaved unexpectedly. + Target, + /// Internal bug or invariant violation inside the recorder. + Internal, +} + +/// Stable error codes used for analytics and tooling. +/// +/// # Examples +/// +/// ``` +/// use recorder_errors::ErrorCode; +/// +/// let code = ErrorCode::TraceMissing; +/// assert_eq!(code.as_str(), "ERR_TRACE_MISSING"); +/// assert_eq!(ErrorCode::parse("ERR_TRACE_MISSING"), Some(ErrorCode::TraceMissing)); +/// assert_eq!(ErrorCode::parse("ERR_DOES_NOT_EXIST"), None); +/// ``` +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +#[non_exhaustive] +#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))] +pub enum ErrorCode { + /// Fallback code used when no specific code applies yet. + Unknown, + /// Attempted to start tracing while another session runs. + AlreadyTracing, + /// Requested trace directory exists but is not a directory. + TraceDirectoryConflict, + /// Failed to create the trace directory due to IO errors. + TraceDirectoryCreateFailed, + /// User requested an unsupported trace format. + UnsupportedFormat, + /// Introspection of positional arguments failed. + MissingPositionalArgument, + /// Introspection of keyword arguments failed. + MissingKeywordArgument, + /// Failed to resolve frame locals or metadata. + FrameIntrospectionFailed, + /// Failed to resolve frame globals during introspection. + GlobalsIntrospectionFailed, + /// Attempted to install multiple tracers simultaneously. + TracerInstallConflict, + /// General IO failure propagated from lower layers. + Io, + /// Invalid runtime policy configuration value. + InvalidPolicyValue, + /// Recorder was configured to require a trace but none was produced. + TraceMissing, + /// Recorder stopped early leaving partial trace artefacts behind. + TraceIncomplete, +} + +impl ErrorCode { + /// Stable identifier string for this error code. + /// + /// # Examples + /// + /// ``` + /// use recorder_errors::ErrorCode; + /// + /// assert_eq!(ErrorCode::Io.as_str(), "ERR_IO"); + /// ``` + pub const fn as_str(self) -> &'static str { + match self { + ErrorCode::Unknown => "ERR_UNKNOWN", + ErrorCode::AlreadyTracing => "ERR_ALREADY_TRACING", + ErrorCode::TraceDirectoryConflict => "ERR_TRACE_DIR_CONFLICT", + ErrorCode::TraceDirectoryCreateFailed => "ERR_TRACE_DIR_CREATE_FAILED", + ErrorCode::UnsupportedFormat => "ERR_UNSUPPORTED_FORMAT", + ErrorCode::MissingPositionalArgument => "ERR_MISSING_POSITIONAL_ARG", + ErrorCode::MissingKeywordArgument => "ERR_MISSING_KEYWORD_ARG", + ErrorCode::FrameIntrospectionFailed => "ERR_FRAME_INTROSPECTION_FAILED", + ErrorCode::GlobalsIntrospectionFailed => "ERR_GLOBALS_INTROSPECTION_FAILED", + ErrorCode::TracerInstallConflict => "ERR_TRACER_INSTALL_CONFLICT", + ErrorCode::Io => "ERR_IO", + ErrorCode::InvalidPolicyValue => "ERR_INVALID_POLICY_VALUE", + ErrorCode::TraceMissing => "ERR_TRACE_MISSING", + ErrorCode::TraceIncomplete => "ERR_TRACE_INCOMPLETE", + } + } + + /// Parse a string representation (e.g. `ERR_TRACE_MISSING`) back into an `ErrorCode`. + /// + /// # Examples + /// + /// ``` + /// use recorder_errors::ErrorCode; + /// + /// let parsed = ErrorCode::parse("ERR_TRACE_INCOMPLETE"); + /// assert_eq!(parsed, Some(ErrorCode::TraceIncomplete)); + /// assert!(ErrorCode::parse("ERR_UNKNOWN_CODE").is_none()); + /// ``` + pub fn parse(value: &str) -> Option { + match value { + "ERR_UNKNOWN" => Some(ErrorCode::Unknown), + "ERR_ALREADY_TRACING" => Some(ErrorCode::AlreadyTracing), + "ERR_TRACE_DIR_CONFLICT" => Some(ErrorCode::TraceDirectoryConflict), + "ERR_TRACE_DIR_CREATE_FAILED" => Some(ErrorCode::TraceDirectoryCreateFailed), + "ERR_UNSUPPORTED_FORMAT" => Some(ErrorCode::UnsupportedFormat), + "ERR_MISSING_POSITIONAL_ARG" => Some(ErrorCode::MissingPositionalArgument), + "ERR_MISSING_KEYWORD_ARG" => Some(ErrorCode::MissingKeywordArgument), + "ERR_FRAME_INTROSPECTION_FAILED" => Some(ErrorCode::FrameIntrospectionFailed), + "ERR_GLOBALS_INTROSPECTION_FAILED" => Some(ErrorCode::GlobalsIntrospectionFailed), + "ERR_TRACER_INSTALL_CONFLICT" => Some(ErrorCode::TracerInstallConflict), + "ERR_IO" => Some(ErrorCode::Io), + "ERR_INVALID_POLICY_VALUE" => Some(ErrorCode::InvalidPolicyValue), + "ERR_TRACE_MISSING" => Some(ErrorCode::TraceMissing), + "ERR_TRACE_INCOMPLETE" => Some(ErrorCode::TraceIncomplete), + _ => None, + } + } +} + +impl fmt::Display for ErrorCode { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + f.write_str(self.as_str()) + } +} + +/// Canonical error type flowing through the recorder workspace. +/// +/// # Examples +/// +/// ``` +/// use recorder_errors::{ErrorCode, ErrorKind, RecorderError}; +/// +/// let err = RecorderError::new( +/// ErrorKind::Environment, +/// ErrorCode::Io, +/// "failed to create trace directory", +/// ) +/// .with_context("path", "/tmp/trace") +/// .with_message("unable to prepare trace output"); +/// +/// assert_eq!(err.message(), "unable to prepare trace output"); +/// assert_eq!(err.context.get("path"), Some(&"/tmp/trace".to_owned())); +/// ``` +#[derive(Debug)] +pub struct RecorderError { + pub kind: ErrorKind, + pub code: ErrorCode, + pub message: Cow<'static, str>, + pub context: ContextMap, + source: Option>, +} + +impl RecorderError { + /// Create a new error with the provided classification and message. + /// + /// # Examples + /// + /// ``` + /// use recorder_errors::{ErrorCode, ErrorKind, RecorderError}; + /// + /// let err = RecorderError::new( + /// ErrorKind::Usage, + /// ErrorCode::UnsupportedFormat, + /// "format must be json", + /// ); + /// assert_eq!(err.code, ErrorCode::UnsupportedFormat); + /// ``` + pub fn new(kind: ErrorKind, code: ErrorCode, message: impl Into>) -> Self { + Self { + kind, + code, + message: message.into(), + context: ContextMap::new(), + source: None, + } + } + + /// Attach a context key/value pair to the error. + /// + /// # Examples + /// + /// ``` + /// use recorder_errors::{ErrorCode, ErrorKind, RecorderError}; + /// + /// let err = RecorderError::new(ErrorKind::Target, ErrorCode::TraceIncomplete, "failed") + /// .with_context("function", "process_event"); + /// assert_eq!(err.context.get("function"), Some(&"process_event".to_owned())); + /// ``` + pub fn with_context(mut self, key: &'static str, value: impl Into) -> Self { + self.context.insert(key, value.into()); + self + } + + /// Attach an underlying error source. + /// + /// # Examples + /// + /// ``` + /// use std::io; + /// use recorder_errors::{ErrorCode, ErrorKind, RecorderError}; + /// + /// let io_err = io::Error::new(io::ErrorKind::Other, "disk full"); + /// let err = RecorderError::new(ErrorKind::Environment, ErrorCode::Io, "write failed") + /// .with_source(io_err); + /// assert!(err.source_ref().is_some()); + /// ``` + pub fn with_source(mut self, source: E) -> Self + where + E: StdError + Send + Sync + 'static, + { + self.source = Some(Box::new(source)); + self + } + + /// Update the error message while retaining classification and metadata. + /// + /// # Examples + /// + /// ``` + /// use recorder_errors::{ErrorCode, ErrorKind, RecorderError}; + /// + /// let err = RecorderError::new(ErrorKind::Usage, ErrorCode::TraceMissing, "not found") + /// .with_message("trace is required"); + /// assert_eq!(err.message(), "trace is required"); + /// ``` + pub fn with_message(mut self, message: impl Into>) -> Self { + self.message = message.into(); + self + } + + /// Borrow the primary human-readable message. + /// + /// # Examples + /// + /// ``` + /// use recorder_errors::{ErrorCode, ErrorKind, RecorderError}; + /// + /// let err = RecorderError::new(ErrorKind::Usage, ErrorCode::MissingKeywordArgument, "missing"); + /// assert_eq!(err.message(), "missing"); + /// ``` + pub fn message(&self) -> &str { + self.message.as_ref() + } + + /// Borrow the optional underlying source. + /// + /// # Examples + /// + /// ``` + /// use std::io; + /// use recorder_errors::{ErrorCode, ErrorKind, RecorderError}; + /// + /// let io_err = io::Error::new(io::ErrorKind::Other, "network down"); + /// let err = RecorderError::new(ErrorKind::Environment, ErrorCode::Io, "request failed") + /// .with_source(io_err); + /// assert!(err.source_ref().is_some()); + /// ``` + pub fn source_ref(&self) -> Option<&(dyn StdError + Send + Sync + 'static)> { + self.source.as_deref() + } +} + +impl fmt::Display for RecorderError { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + write!(f, "{}", self.message) + } +} + +impl StdError for RecorderError { + fn source(&self) -> Option<&(dyn StdError + 'static)> { + self.source + .as_deref() + .map(|err| err as &(dyn StdError + 'static)) + } +} + +impl From for RecorderError { + /// Convert an `io::Error` into a recorder error with `ErrorKind::Environment`. + /// + /// # Examples + /// + /// ``` + /// use std::io; + /// use recorder_errors::{ErrorCode, RecorderError}; + /// + /// let io_err = io::Error::new(io::ErrorKind::Other, "disk full"); + /// let err: RecorderError = io_err.into(); + /// assert_eq!(err.code, ErrorCode::Io); + /// ``` + fn from(err: io::Error) -> Self { + RecorderError::new(ErrorKind::Environment, ErrorCode::Io, err.to_string()).with_source(err) + } +} + +/// Declare a recorder error using formatting syntax. +/// +/// # Examples +/// +/// ``` +/// let err = recorder_errors::recorder_error!( +/// recorder_errors::ErrorKind::Usage, +/// recorder_errors::ErrorCode::UnsupportedFormat, +/// "format not supported", +/// ); +/// assert_eq!(err.code, recorder_errors::ErrorCode::UnsupportedFormat); +/// ``` +#[macro_export] +macro_rules! recorder_error { + ($kind:expr, $code:expr, $msg:literal $(,)?) => { + $crate::RecorderError::new($kind, $code, $msg) + }; + ($kind:expr, $code:expr, $fmt:expr, $($arg:tt)+) => { + $crate::RecorderError::new($kind, $code, ::std::borrow::Cow::Owned(format!($fmt, $($arg)+))) + }; +} + +/// Convenience macro for usage errors. +/// +/// # Examples +/// +/// ``` +/// let err = recorder_errors::usage!( +/// recorder_errors::ErrorCode::MissingPositionalArgument, +/// "argument missing", +/// ); +/// assert_eq!(err.kind, recorder_errors::ErrorKind::Usage); +/// ``` +#[macro_export] +macro_rules! usage { + ($code:expr, $msg:literal $(,)?) => { + $crate::recorder_error!($crate::ErrorKind::Usage, $code, $msg) + }; + ($code:expr, $fmt:expr, $($arg:tt)+) => { + $crate::recorder_error!($crate::ErrorKind::Usage, $code, $fmt, $($arg)+) + }; +} + +/// Convenience macro for environment/IO errors. +/// +/// # Examples +/// +/// ``` +/// let err = recorder_errors::enverr!( +/// recorder_errors::ErrorCode::Io, +/// "failed to write", +/// ); +/// assert_eq!(err.kind, recorder_errors::ErrorKind::Environment); +/// ``` +#[macro_export] +macro_rules! enverr { + ($code:expr, $msg:literal $(,)?) => { + $crate::recorder_error!($crate::ErrorKind::Environment, $code, $msg) + }; + ($code:expr, $fmt:expr, $($arg:tt)+) => { + $crate::recorder_error!($crate::ErrorKind::Environment, $code, $fmt, $($arg)+) + }; +} + +/// Convenience macro for target errors. +/// +/// # Examples +/// +/// ``` +/// let err = recorder_errors::target!( +/// recorder_errors::ErrorCode::TraceIncomplete, +/// "target raised", +/// ); +/// assert_eq!(err.kind, recorder_errors::ErrorKind::Target); +/// ``` +#[macro_export] +macro_rules! target { + ($code:expr, $msg:literal $(,)?) => { + $crate::recorder_error!($crate::ErrorKind::Target, $code, $msg) + }; + ($code:expr, $fmt:expr, $($arg:tt)+) => { + $crate::recorder_error!($crate::ErrorKind::Target, $code, $fmt, $($arg)+) + }; +} + +/// Convenience macro for internal bugs/invariants. +/// +/// # Examples +/// +/// ``` +/// let err = recorder_errors::bug!( +/// recorder_errors::ErrorCode::TraceIncomplete, +/// "unexpected state", +/// ); +/// assert_eq!(err.kind, recorder_errors::ErrorKind::Internal); +/// ``` +#[macro_export] +macro_rules! bug { + ($code:expr, $msg:literal $(,)?) => { + $crate::recorder_error!($crate::ErrorKind::Internal, $code, $msg) + }; + ($code:expr, $fmt:expr, $($arg:tt)+) => { + $crate::recorder_error!($crate::ErrorKind::Internal, $code, $fmt, $($arg)+) + }; +} + +/// Ensure a predicate holds, returning a usage error when it does not. +/// +/// # Examples +/// +/// ``` +/// use recorder_errors::{ErrorCode, RecorderResult}; +/// +/// fn guard(active: bool) -> RecorderResult<()> { +/// recorder_errors::ensure_usage!(active, ErrorCode::AlreadyTracing, "already tracing"); +/// Ok(()) +/// } +/// +/// assert!(guard(true).is_ok()); +/// assert_eq!(guard(false).unwrap_err().code, ErrorCode::AlreadyTracing); +/// ``` +#[macro_export] +macro_rules! ensure_usage { + ($cond:expr, $code:expr, $msg:literal $(,)?) => { + if !$cond { + return Err($crate::usage!($code, $msg)); + } + }; + ($cond:expr, $code:expr, $fmt:expr, $($arg:tt)+) => { + if !$cond { + return Err($crate::usage!($code, $fmt, $($arg)+)); + } + }; +} + +/// Ensure a predicate holds, returning an environment error when it does not. +/// +/// # Examples +/// +/// ``` +/// use recorder_errors::{ErrorCode, RecorderResult}; +/// +/// fn guard(io_ready: bool) -> RecorderResult<()> { +/// recorder_errors::ensure_env!(io_ready, ErrorCode::Io, "io failure"); +/// Ok(()) +/// } +/// +/// assert!(guard(true).is_ok()); +/// assert_eq!(guard(false).unwrap_err().code, ErrorCode::Io); +/// ``` +#[macro_export] +macro_rules! ensure_env { + ($cond:expr, $code:expr, $msg:literal $(,)?) => { + if !$cond { + return Err($crate::enverr!($code, $msg)); + } + }; + ($cond:expr, $code:expr, $fmt:expr, $($arg:tt)+) => { + if !$cond { + return Err($crate::enverr!($code, $fmt, $($arg)+)); + } + }; +} + +/// Ensure a predicate holds, returning an internal bug otherwise. +/// +/// # Examples +/// +/// ``` +/// use recorder_errors::{ErrorCode, RecorderResult}; +/// +/// fn guard(invariant_ok: bool) -> RecorderResult<()> { +/// recorder_errors::ensure_internal!(invariant_ok, ErrorCode::TraceIncomplete, "corrupted state"); +/// Ok(()) +/// } +/// +/// assert!(guard(true).is_ok()); +/// assert_eq!(guard(false).unwrap_err().kind, recorder_errors::ErrorKind::Internal); +/// ``` +#[macro_export] +macro_rules! ensure_internal { + ($cond:expr, $code:expr, $msg:literal $(,)?) => { + if !$cond { + return Err($crate::bug!($code, $msg)); + } + }; + ($cond:expr, $code:expr, $fmt:expr, $($arg:tt)+) => { + if !$cond { + return Err($crate::bug!($code, $fmt, $($arg)+)); + } + }; +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn usage_macro_builds_error() { + let err = usage!(ErrorCode::UnsupportedFormat, "unsupported"); + assert_eq!(err.kind, ErrorKind::Usage); + assert_eq!(err.code, ErrorCode::UnsupportedFormat); + assert_eq!(err.message, "unsupported"); + } + + #[test] + fn ensure_usage_fails() -> RecorderResult<()> { + fn guarded(value: bool) -> RecorderResult<()> { + ensure_usage!(value, ErrorCode::AlreadyTracing, "still tracing"); + Ok(()) + } + + let failure = guarded(false).expect_err("guard should fail"); + assert_eq!(failure.kind, ErrorKind::Usage); + assert_eq!(failure.code, ErrorCode::AlreadyTracing); + guarded(true)?; + Ok(()) + } + + #[test] + fn context_and_source_roundtrip() { + let io_err = io::Error::new(io::ErrorKind::Other, "disk full"); + let err = RecorderError::from(io_err) + .with_context("path", "/tmp/out") + .with_message("failed to write trace"); + assert_eq!(err.kind, ErrorKind::Environment); + assert_eq!(err.code, ErrorCode::Io); + assert_eq!(err.context.get("path"), Some(&"/tmp/out".to_string())); + assert_eq!(err.to_string(), "failed to write trace"); + assert!(err.source_ref().is_some()); + } + + #[test] + fn ensure_env_reports_environment() { + fn guarded(value: bool) -> RecorderResult<()> { + ensure_env!(value, ErrorCode::TraceDirectoryCreateFailed, "io failure"); + Ok(()) + } + + let err = guarded(false).expect_err("should fail"); + assert_eq!(err.kind, ErrorKind::Environment); + assert_eq!(err.code, ErrorCode::TraceDirectoryCreateFailed); + } + + #[test] + fn bug_macro_marks_internal() { + let err = bug!(ErrorCode::FrameIntrospectionFailed, "panic avoided"); + assert_eq!(err.kind, ErrorKind::Internal); + assert_eq!(err.code, ErrorCode::FrameIntrospectionFailed); + } + + #[test] + fn target_macro_marks_target_error() { + let err = target!(ErrorCode::TraceIncomplete, "target callback failed"); + assert_eq!(err.kind, ErrorKind::Target); + assert_eq!(err.code, ErrorCode::TraceIncomplete); + } + + #[test] + fn ensure_internal_marks_internal_failures() { + fn guarded(assert_ok: bool) -> RecorderResult<()> { + ensure_internal!(assert_ok, ErrorCode::TraceIncomplete, "invariant broken"); + Ok(()) + } + + let err = guarded(false).expect_err("expected invariant failure"); + assert_eq!(err.kind, ErrorKind::Internal); + assert_eq!(err.code, ErrorCode::TraceIncomplete); + guarded(true).expect("guarded success"); + } + + #[test] + fn parse_roundtrip_matches_known_codes() { + for code in [ + ErrorCode::Unknown, + ErrorCode::AlreadyTracing, + ErrorCode::TraceDirectoryConflict, + ErrorCode::TraceDirectoryCreateFailed, + ErrorCode::UnsupportedFormat, + ErrorCode::MissingPositionalArgument, + ErrorCode::MissingKeywordArgument, + ErrorCode::FrameIntrospectionFailed, + ErrorCode::GlobalsIntrospectionFailed, + ErrorCode::TracerInstallConflict, + ErrorCode::Io, + ErrorCode::InvalidPolicyValue, + ErrorCode::TraceMissing, + ErrorCode::TraceIncomplete, + ] { + let code_str = code.as_str(); + assert_eq!(ErrorCode::parse(code_str), Some(code)); + } + assert_eq!(ErrorCode::parse("ERR_NOT_REAL"), None); + } +} diff --git a/codetracer-python-recorder/scripts/generate_coverage_comment.py b/codetracer-python-recorder/scripts/generate_coverage_comment.py index 82bab17..ef3d9f0 100755 --- a/codetracer-python-recorder/scripts/generate_coverage_comment.py +++ b/codetracer-python-recorder/scripts/generate_coverage_comment.py @@ -118,7 +118,13 @@ def _load_rust_rows( summary_path: pathlib.Path, repo_root: pathlib.Path, ) -> Tuple[List[Row], Dict[str, float]]: - rows, totals = load_rust_summary(summary_path, repo_root) + summary_result = load_rust_summary(summary_path, repo_root) + if not isinstance(summary_result, tuple): + summary_result = tuple(summary_result) + if len(summary_result) < 2: + raise SystemExit("Rust summary loader returned an unexpected payload") + + rows, totals = summary_result[:2] # Normalise totals dict to expected keys total = float(totals.get("count", 0)) covered = float(totals.get("covered", 0)) diff --git a/codetracer-python-recorder/scripts/lint_no_unwraps.py b/codetracer-python-recorder/scripts/lint_no_unwraps.py new file mode 100755 index 0000000..77c8f9d --- /dev/null +++ b/codetracer-python-recorder/scripts/lint_no_unwraps.py @@ -0,0 +1,61 @@ +#!/usr/bin/env python3 +"""Fail the build when unchecked `.unwrap(` usage appears outside the allowlist.""" +from __future__ import annotations + +import argparse +import pathlib +import sys +from typing import Iterable + +DEFAULT_ALLOWLIST = { + pathlib.Path("codetracer-python-recorder/src/runtime/value_encoder.rs"), + pathlib.Path("codetracer-python-recorder/src/runtime/mod.rs"), + pathlib.Path("codetracer-python-recorder/src/monitoring/mod.rs"), + pathlib.Path("codetracer-python-recorder/src/monitoring/tracer.rs"), +} + + +def scan_for_unsafe_unwraps(root: pathlib.Path, allowlist: Iterable[pathlib.Path]) -> list[pathlib.Path]: + repo_root = root.resolve() + src_root = repo_root / "codetracer-python-recorder" / "src" + allowed = {path.resolve() for path in (repo_root / entry for entry in allowlist)} + + failures: list[pathlib.Path] = [] + for candidate in src_root.rglob("*.rs"): + if candidate in allowed: + continue + text = candidate.read_text() + if ".unwrap(" in text: + failures.append(candidate.relative_to(repo_root)) + return failures + + +def main() -> int: + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + "--repo-root", + type=pathlib.Path, + default=pathlib.Path(__file__).resolve().parents[2], + help="Path to the repository root (auto-detected by default)", + ) + parser.add_argument( + "--allow", + action="append", + dest="allow", + default=[str(path) for path in DEFAULT_ALLOWLIST], + help="Additional relative paths that may contain unwrap usage", + ) + args = parser.parse_args() + + allowlist = [pathlib.Path(entry) for entry in args.allow] + failures = scan_for_unsafe_unwraps(args.repo_root, allowlist) + if failures: + print("Found disallowed `.unwrap(` usage in the recorder crate:", file=sys.stderr) + for failure in sorted(failures): + print(f" - {failure}", file=sys.stderr) + return 1 + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/codetracer-python-recorder/scripts/render_rust_coverage_summary.py b/codetracer-python-recorder/scripts/render_rust_coverage_summary.py index cb18ab4..e17df52 100755 --- a/codetracer-python-recorder/scripts/render_rust_coverage_summary.py +++ b/codetracer-python-recorder/scripts/render_rust_coverage_summary.py @@ -34,7 +34,7 @@ def parse_args(argv: Iterable[str] | None = None) -> argparse.Namespace: def load_rows(summary_path: pathlib.Path, repo_root: pathlib.Path) -> List[Tuple[str, int, int, float]]: payload = _load_payload(summary_path) - rows, _ = load_summary(summary_path, repo_root, payload) + rows, _, _ = load_summary(summary_path, repo_root, payload) return rows @@ -42,7 +42,7 @@ def load_summary( summary_path: pathlib.Path, repo_root: pathlib.Path, payload: Dict | None = None, -) -> Tuple[List[Tuple[str, int, int, float]], Dict[str, float]]: +) -> Tuple[List[Tuple[str, int, int, float]], Dict[str, float], Dict[str, Tuple[int, int]]]: if payload is None: payload = _load_payload(summary_path) @@ -50,6 +50,7 @@ def load_summary( rows: List[Tuple[str, int, int, float]] = [] totals: Dict[str, float] = {} + crate_totals: Dict[str, Tuple[int, int]] = {} for dataset in payload.get("data", []): dataset_totals = dataset.get("totals", {}) @@ -73,8 +74,19 @@ def load_summary( percent = float(line_summary.get("percent", 0.0)) rows.append((rel_path.as_posix(), total, missed, percent)) + crate_key = _crate_key(rel_path) + agg_total, agg_covered = crate_totals.get(crate_key, (0, 0)) + crate_totals[crate_key] = (agg_total + total, agg_covered + covered) + rows.sort(key=lambda item: item[0]) - return rows, totals + return rows, totals, crate_totals + + +def _crate_key(rel_path: pathlib.Path) -> str: + parts = rel_path.parts + if len(parts) >= 3 and parts[1] == "crates": + return "/".join(parts[:3]) + return parts[0] def render(rows: List[Tuple[str, int, int, float]]) -> str: @@ -90,11 +102,33 @@ def render(rows: List[Tuple[str, int, int, float]]) -> str: return "\n".join(lines) +def render_crates(crate_totals: Dict[str, Tuple[int, int]]) -> str: + if not crate_totals: + return "Rust crate coverage summary: no project files found" + + name_width = max(len(name) for name in crate_totals) + lines = [ + "Rust coverage summary by crate (lines):", + f"{'Crate'.ljust(name_width)} Lines Miss Cover", + ] + + for name in sorted(crate_totals): + total, covered = crate_totals[name] + missed = max(total - covered, 0) + percent = 0.0 if total == 0 else (covered / total) * 100 + lines.append(f"{name.ljust(name_width)} {total:5d} {missed:4d} {percent:5.1f}%") + + return "\n".join(lines) + + def main(argv: Iterable[str] | None = None) -> int: args = parse_args(argv) - rows, _ = load_summary(args.summary_path, args.root) - output = render(rows) - print(output) + rows, _, crate_totals = load_summary(args.summary_path, args.root) + crate_output = render_crates(crate_totals) + file_output = render(rows) + print(crate_output) + print() + print(file_output) return 0 diff --git a/codetracer-python-recorder/src/errors.rs b/codetracer-python-recorder/src/errors.rs new file mode 100644 index 0000000..9502f48 --- /dev/null +++ b/codetracer-python-recorder/src/errors.rs @@ -0,0 +1,6 @@ +//! Shared helpers for translating `RecorderError` into Python exceptions. + +use recorder_errors::RecorderResult; + +/// Convenient alias for recorder results used across the Rust modules. +pub type Result = RecorderResult; diff --git a/codetracer-python-recorder/src/ffi.rs b/codetracer-python-recorder/src/ffi.rs new file mode 100644 index 0000000..04f51c4 --- /dev/null +++ b/codetracer-python-recorder/src/ffi.rs @@ -0,0 +1,240 @@ +//! FFI helpers bridging `RecorderError` into Python exceptions with panic containment. + +use std::any::Any; +use std::panic::{catch_unwind, AssertUnwindSafe}; + +use pyo3::create_exception; +use pyo3::exceptions::PyException; +use pyo3::prelude::*; +use pyo3::types::PyDict; +use recorder_errors::{ErrorCode, ErrorKind, RecorderError, RecorderResult}; + +use crate::logging; + +create_exception!(codetracer_python_recorder, PyRecorderError, PyException); +create_exception!(codetracer_python_recorder, PyUsageError, PyRecorderError); +create_exception!( + codetracer_python_recorder, + PyEnvironmentError, + PyRecorderError +); +create_exception!(codetracer_python_recorder, PyTargetError, PyRecorderError); +create_exception!(codetracer_python_recorder, PyInternalError, PyRecorderError); + +/// Register the recorder exception hierarchy into the Python module. +pub fn register_exceptions(module: &Bound<'_, PyModule>) -> PyResult<()> { + let py = module.py(); + module.add("RecorderError", py.get_type::())?; + module.add("UsageError", py.get_type::())?; + module.add("EnvironmentError", py.get_type::())?; + module.add("TargetError", py.get_type::())?; + module.add("InternalError", py.get_type::())?; + Ok(()) +} + +/// Execute `operation`, mapping any `RecorderError` into the Python exception hierarchy +/// and containing panics as `PyInternalError` instances. +#[allow(dead_code)] +pub fn dispatch(label: &'static str, operation: F) -> PyResult +where + F: FnOnce() -> RecorderResult, +{ + match catch_unwind(AssertUnwindSafe(operation)) { + Ok(result) => result.map_err(map_recorder_error), + Err(panic_payload) => Err(handle_panic(label, panic_payload)), + } +} + +/// Convert a captured panic into a `PyInternalError` while logging the payload. +pub(crate) fn panic_to_pyerr(label: &'static str, payload: Box) -> PyErr { + handle_panic(label, payload) +} + +fn handle_panic(label: &'static str, payload: Box) -> PyErr { + let message = panic_payload_to_string(&payload); + logging::record_panic(label); + map_recorder_error(RecorderError::new( + ErrorKind::Internal, + ErrorCode::Unknown, + format!("panic in {label}: {message}"), + )) +} + +fn panic_payload_to_string(payload: &Box) -> String { + if let Some(message) = payload.downcast_ref::<&'static str>() { + message.to_string() + } else if let Some(message) = payload.downcast_ref::() { + message.clone() + } else { + "".to_string() + } +} + +/// Map a `RecorderError` into the appropriate Python exception subclass. +pub fn map_recorder_error(err: RecorderError) -> PyErr { + logging::log_recorder_error("recorder_error", &err); + logging::emit_error_trailer(&err); + let source_desc = err.source_ref().map(|src| src.to_string()); + let RecorderError { + kind, + code, + message, + context, + .. + } = err; + + let mut text = format!("[{code}] {message}"); + if !context.is_empty() { + let mut first = true; + text.push_str(" ("); + for (key, value) in &context { + if !first { + text.push_str(", "); + } + first = false; + text.push_str(key); + text.push('='); + text.push_str(value); + } + text.push(')'); + } + if let Some(source) = source_desc.as_ref() { + text.push_str(": caused by "); + text.push_str(source); + } + + let pyerr = match kind { + ErrorKind::Usage => PyUsageError::new_err(text.clone()), + ErrorKind::Environment => PyEnvironmentError::new_err(text.clone()), + ErrorKind::Target => PyTargetError::new_err(text.clone()), + ErrorKind::Internal => PyInternalError::new_err(text.clone()), + _ => PyInternalError::new_err(text.clone()), + }; + + Python::with_gil(|py| { + let instance = pyerr.value(py); + let _ = instance.setattr("code", code.as_str()); + let _ = instance.setattr("kind", format!("{:?}", kind)); + let context_dict = PyDict::new(py); + for (key, value) in &context { + let _ = context_dict.set_item(*key, value); + } + let _ = instance.setattr("context", context_dict); + }); + + pyerr +} + +/// Helper that guards a `#[pyfunction]` implementation, catching panics while +/// leaving existing `PyResult` usage intact. +pub fn wrap_pyfunction(label: &'static str, operation: F) -> PyResult +where + F: FnOnce() -> PyResult, +{ + match catch_unwind(AssertUnwindSafe(operation)) { + Ok(result) => result, + Err(panic_payload) => Err(handle_panic(label, panic_payload)), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use recorder_errors::{enverr, target, usage}; + + #[test] + fn map_recorder_error_sets_python_attributes() { + Python::with_gil(|py| { + let err = usage!( + ErrorCode::UnsupportedFormat, + "invalid trace format" + ) + .with_context("format", "yaml") + .with_source(std::io::Error::new(std::io::ErrorKind::Other, "boom")); + let pyerr = map_recorder_error(err); + let ty = pyerr.get_type(py); + assert!(ty.is(py.get_type::())); + let value = pyerr.value(py); + assert_eq!( + value + .getattr("code") + .expect("error code attribute") + .extract::() + .expect("code string"), + "ERR_UNSUPPORTED_FORMAT" + ); + assert_eq!( + value + .getattr("kind") + .expect("error kind attribute") + .extract::() + .expect("kind string"), + "Usage" + ); + let context_obj = value.getattr("context").expect("context attribute"); + let ctx = context_obj + .downcast::() + .expect("context attribute downcast"); + let format_value = ctx + .get_item("format") + .expect("context lookup failed") + .expect("context map missing 'format'"); + assert_eq!( + format_value + .extract::() + .expect("format value extraction"), + "yaml" + ); + }); + } + + #[test] + fn dispatch_converts_recorder_error_to_pyerr() { + Python::with_gil(|py| { + let result: PyResult<()> = dispatch("dispatch_env", || { + Err(enverr!(ErrorCode::Io, "disk full")) + }); + let err = result.expect_err("expected PyErr"); + let ty = err.get_type(py); + assert!(ty.is(py.get_type::())); + }); + } + + #[test] + fn dispatch_converts_panic_into_internal_error() { + Python::with_gil(|py| { + let result: PyResult<()> = dispatch("dispatch_panic", || panic!("boom")); + let err = result.expect_err("expected panic to map into PyErr"); + let ty = err.get_type(py); + assert!(ty.is(py.get_type::())); + assert!(err.to_string().contains("panic in dispatch_panic")); + }); + } + + #[test] + fn wrap_pyfunction_passes_through_success() { + let result = wrap_pyfunction("wrap_ok", || Ok::<_, PyErr>(42)); + assert_eq!(result.expect("expected success"), 42); + } + + #[test] + fn wrap_pyfunction_converts_errors_and_panics() { + Python::with_gil(|py| { + let err = wrap_pyfunction("wrap_error", || -> PyResult<()> { + Err(map_recorder_error(target!( + ErrorCode::TraceIncomplete, + "target failure" + ))) + }) + .expect_err("expected error"); + assert!(err.get_type(py).is(py.get_type::())); + + let panic_err = wrap_pyfunction("wrap_panic", || -> PyResult<()> { + panic!("boom"); + }) + .expect_err("expected panic"); + assert!(panic_err.get_type(py).is(py.get_type::())); + assert!(panic_err.to_string().contains("panic in wrap_panic")); + }); + } +} diff --git a/codetracer-python-recorder/src/lib.rs b/codetracer-python-recorder/src/lib.rs index 0359648..21711c6 100644 --- a/codetracer-python-recorder/src/lib.rs +++ b/codetracer-python-recorder/src/lib.rs @@ -5,8 +5,11 @@ //! the `sys.monitoring.DISABLE` sentinel. pub mod code_object; +mod errors; +mod ffi; mod logging; pub mod monitoring; +mod policy; mod runtime; mod session; @@ -24,11 +27,15 @@ use pyo3::prelude::*; #[pymodule] fn codetracer_python_recorder(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> { // Initialize logging on import so users see logs without extra setup. - // Respect RUST_LOG if present; otherwise default to debug for this crate. - logging::init_rust_logging_with_default("codetracer_python_recorder=debug"); + // Respect RUST_LOG if present; otherwise keep our crate quiet unless warnings/errors occur. + logging::init_rust_logging_with_default("codetracer_python_recorder=warn"); + ffi::register_exceptions(m)?; m.add_function(wrap_pyfunction!(start_tracing, m)?)?; m.add_function(wrap_pyfunction!(stop_tracing, m)?)?; m.add_function(wrap_pyfunction!(is_tracing, m)?)?; m.add_function(wrap_pyfunction!(flush_tracing, m)?)?; + m.add_function(wrap_pyfunction!(policy::configure_policy_py, m)?)?; + m.add_function(wrap_pyfunction!(policy::py_configure_policy_from_env, m)?)?; + m.add_function(wrap_pyfunction!(policy::py_policy_snapshot, m)?)?; Ok(()) } diff --git a/codetracer-python-recorder/src/logging.rs b/codetracer-python-recorder/src/logging.rs index 72c1095..ce28117 100644 --- a/codetracer-python-recorder/src/logging.rs +++ b/codetracer-python-recorder/src/logging.rs @@ -1,19 +1,685 @@ -//! Process-wide logging helpers shared by the PyO3 entry points and tests. +//! Diagnostics utilities: structured logging, metrics sinks, and error trailers. -use std::sync::Once; +use std::cell::Cell; +use std::collections::BTreeMap; +use std::fs::{File, OpenOptions}; +use std::io::{self, Write}; +use std::path::Path; +use std::str::FromStr; +use std::sync::atomic::{AtomicBool, Ordering}; +use std::sync::{Mutex, Once, RwLock}; +use std::time::{SystemTime, UNIX_EPOCH}; -/// Initialise the process-wide Rust logger with a default filter. +use log::{LevelFilter, Log, Metadata, Record}; +use once_cell::sync::OnceCell; +use pyo3::prelude::*; +use recorder_errors::{ErrorCode, RecorderError}; +use serde::Serialize; +use uuid::Uuid; + +use crate::policy::RecorderPolicy; + +thread_local! { + static ERROR_CODE_OVERRIDE: Cell> = Cell::new(None); +} + +static LOGGER_INSTANCE: OnceCell<&'static RecorderLogger> = OnceCell::new(); +static INIT_LOGGER: Once = Once::new(); +static JSON_ERRORS_ENABLED: AtomicBool = AtomicBool::new(false); +static ERROR_TRAILER_WRITER: OnceCell>> = OnceCell::new(); +static METRICS_SINK: OnceCell> = OnceCell::new(); + +/// Structured logging initialisation applied on module import and during tracing start. /// -/// The logger is only set up once per process. Callers can override the filter -/// by setting the `RUST_LOG` environment variable before the first invocation. +/// The first caller installs a process-wide logger that emits JSON records containing +/// `run_id`, `trace_id`, and optional `error_code` fields. Subsequent calls are no-ops. pub fn init_rust_logging_with_default(default_filter: &str) { - static INIT_LOGGER: Once = Once::new(); - INIT_LOGGER.call_once(|| { - let env = env_logger::Env::default().default_filter_or(default_filter); - // Use a compact format with timestamps and targets to aid debugging. - let mut builder = env_logger::Builder::from_env(env); - builder.format_timestamp_micros().format_target(true); - let _ = builder.try_init(); + let default_spec = FilterSpec::parse(default_filter, LevelFilter::Warn) + .unwrap_or_else(|_| FilterSpec::new(LevelFilter::Warn)); + + let initial_spec = std::env::var("RUST_LOG") + .ok() + .and_then(|spec| FilterSpec::parse(&spec, default_spec.global).ok()) + .unwrap_or_else(|| default_spec.clone()); + + let logger = RecorderLogger::new(default_spec, initial_spec); + let leaked: &'static RecorderLogger = Box::leak(Box::new(logger)); + log::set_logger(leaked).expect("recorder logger already initialised"); + log::set_max_level(leaked.filter.read().expect("filter lock").max_level()); + let _ = LOGGER_INSTANCE.set(leaked); }); } + +/// Apply the current policy to logging and diagnostics outputs. +pub fn apply_policy(policy: &RecorderPolicy) { + if let Some(logger) = LOGGER_INSTANCE.get() { + logger.apply_policy(policy); + } + JSON_ERRORS_ENABLED.store(policy.json_errors, Ordering::SeqCst); +} + +/// Scope a log emission with an explicit `ErrorCode` so the structured logger can attach it. +pub fn with_error_code(code: ErrorCode, op: F) -> R +where + F: FnOnce() -> R, +{ + ERROR_CODE_OVERRIDE.with(|cell| { + let previous = cell.replace(Some(code)); + let result = op(); + cell.set(previous); + result + }) +} + +/// Scope a log emission with an optional `ErrorCode` (falls back to `ERR_UNKNOWN`). +pub fn with_error_code_opt(code: Option, op: F) -> R +where + F: FnOnce() -> R, +{ + match code { + Some(code) => with_error_code(code, op), + None => with_error_code(ErrorCode::Unknown, op), + } +} + +/// Update the active trace identifier associated with subsequent log records. +pub fn set_active_trace_id(trace_id: Option) { + if let Some(logger) = LOGGER_INSTANCE.get() { + let mut guard = logger.trace_id.write().expect("trace id lock"); + *guard = trace_id; + } +} + +/// Log a structured representation of `err` for observability pipelines. +pub fn log_recorder_error(label: &str, err: &RecorderError) { + let message = build_error_text(err, Some(label)); + with_error_code(err.code, || { + log::error!(target: "codetracer_python_recorder::errors", "{}", message); + }); +} + +/// Emit a JSON error trailer on stderr when the policy requests it. +pub fn emit_error_trailer(err: &RecorderError) { + if !JSON_ERRORS_ENABLED.load(Ordering::SeqCst) { + return; + } + + let Some(logger) = LOGGER_INSTANCE.get() else { + return; + }; + + let trace_id = logger.trace_id.read().expect("trace id lock").clone(); + + let mut context = serde_json::Map::new(); + for (key, value) in &err.context { + context.insert((*key).to_string(), serde_json::Value::String(value.clone())); + } + + let payload = serde_json::json!({ + "run_id": logger.run_id, + "trace_id": trace_id, + "error_code": err.code.as_str(), + "error_kind": format!("{:?}", err.kind), + "message": err.message(), + "context": context, + }); + + if let Ok(mut bytes) = serde_json::to_vec(&payload) { + bytes.push(b'\n'); + if let Some(writer) = ERROR_TRAILER_WRITER.get() { + let mut guard = writer.lock().expect("error trailer writer lock"); + let _ = guard.write_all(&bytes); + let _ = guard.flush(); + } else { + let mut stderr = io::stderr().lock(); + let _ = stderr.write_all(&bytes); + let _ = stderr.flush(); + } + } +} + +/// Metrics interface allowing pluggable sinks (default: no-op). +pub trait RecorderMetrics: Send + Sync { + /// Record that an event stream was dropped for the provided reason. + fn record_dropped_event(&self, _reason: &'static str) {} + /// Record that tracing detached, optionally linked to an error code. + fn record_detach(&self, _reason: &'static str, _error_code: Option<&str>) {} + /// Record that a panic was caught and converted into an error. + fn record_panic(&self, _label: &'static str) {} +} + +struct NoopMetrics; + +impl RecorderMetrics for NoopMetrics {} + +fn metrics_sink() -> &'static dyn RecorderMetrics { + METRICS_SINK + .get_or_init(|| Box::new(NoopMetrics) as Box) + .as_ref() +} + +/// Install a custom metrics sink. Intended for embedding or tests. +#[cfg_attr(not(test), allow(dead_code))] +pub fn install_metrics(metrics: Box) -> Result<(), Box> { + METRICS_SINK.set(metrics) +} + +/// Record that we abandoned a monitoring location (e.g., synthetic filename). +pub fn record_dropped_event(reason: &'static str) { + metrics_sink().record_dropped_event(reason); +} + +/// Record that we detached per-policy or due to unrecoverable failure. +pub fn record_detach(reason: &'static str, error_code: Option<&str>) { + metrics_sink().record_detach(reason, error_code); +} + +/// Record that we caught a panic at the FFI boundary. +pub fn record_panic(label: &'static str) { + metrics_sink().record_panic(label); +} + +/// Attempt to read an `ErrorCode` attribute from a Python exception value. +pub fn error_code_from_pyerr(py: pyo3::Python<'_>, err: &pyo3::PyErr) -> Option { + let value = err.value(py); + let attr = value.getattr("code").ok()?; + let code_str: String = attr.extract().ok()?; + ErrorCode::parse(&code_str) +} + +/// Provide a helper for tests to override the error trailer destination. +#[cfg(test)] +pub fn set_error_trailer_writer_for_tests(writer: Box) { + let _ = ERROR_TRAILER_WRITER.set(Mutex::new(writer)); +} + +struct RecorderLogger { + run_id: String, + trace_id: RwLock>, + default_filter: FilterSpec, + filter: RwLock, + writer: Mutex, +} + +impl RecorderLogger { + fn new(default_filter: FilterSpec, initial: FilterSpec) -> Self { + Self { + run_id: Uuid::new_v4().to_string(), + trace_id: RwLock::new(None), + writer: Mutex::new(Destination::Stderr), + filter: RwLock::new(initial), + default_filter, + } + } + + fn apply_policy(&self, policy: &RecorderPolicy) { + let new_filter = match policy.log_level.as_deref() { + Some(spec) if !spec.trim().is_empty() => { + match FilterSpec::parse(spec, self.default_filter.global) { + Ok(parsed) => parsed, + Err(_) => { + with_error_code(ErrorCode::InvalidPolicyValue, || { + log::warn!( + target: "codetracer_python_recorder::logging", + "invalid log level filter '{}'; reverting to default", + spec + ); + }); + self.default_filter.clone() + } + } + } + _ => self.default_filter.clone(), + }; + + { + let mut guard = self.filter.write().expect("filter lock"); + *guard = new_filter.clone(); + } + log::set_max_level(new_filter.max_level()); + + match policy.log_file.as_ref() { + Some(path) => match open_log_file(path) { + Ok(file) => { + *self.writer.lock().expect("writer lock") = Destination::File(file); + } + Err(err) => { + with_error_code(ErrorCode::Io, || { + log::warn!( + target: "codetracer_python_recorder::logging", + "failed to open log file '{}': {}", + path.display(), + err + ); + }); + *self.writer.lock().expect("writer lock") = Destination::Stderr; + } + }, + None => { + *self.writer.lock().expect("writer lock") = Destination::Stderr; + } + } + } + + fn enabled(&self, metadata: &Metadata<'_>) -> bool { + self.filter.read().expect("filter lock").allows(metadata) + } + + fn write_entry(&self, entry: &LogEntry<'_>) { + match serde_json::to_vec(entry) { + Ok(mut bytes) => { + bytes.push(b'\n'); + if let Err(err) = self.writer.lock().expect("writer lock").write_all(&bytes) { + let mut stderr = io::stderr().lock(); + let _ = stderr.write_all(&bytes); + let _ = writeln!( + stderr, + "{{\"run_id\":\"{}\",\"message\":\"logger write failure: {}\"}}", + self.run_id, err + ); + } + } + Err(_) => { + // Fallback to plain message if serialization fails + let mut stderr = io::stderr().lock(); + let _ = writeln!( + stderr, + "{{\"run_id\":\"{}\",\"message\":\"failed to encode log entry\"}}", + self.run_id + ); + } + } + } +} + +impl Log for RecorderLogger { + fn enabled(&self, metadata: &Metadata<'_>) -> bool { + self.enabled(metadata) + } + + fn log(&self, record: &Record<'_>) { + if !self.enabled(record.metadata()) { + return; + } + + let thread_code = ERROR_CODE_OVERRIDE.with(|cell| cell.get()); + let error_code = thread_code.map(|code| code.as_str().to_string()); + let mut fields = BTreeMap::new(); + if let Some(code) = error_code.as_ref() { + fields.insert( + "error_code".to_string(), + serde_json::Value::String(code.clone()), + ); + } + + let trace_id = self.trace_id.read().expect("trace id lock").clone(); + + let entry = LogEntry { + ts_micros: current_timestamp_micros(), + level: record.level().as_str(), + target: record.target(), + run_id: &self.run_id, + trace_id: trace_id.as_deref(), + message: record.args().to_string(), + error_code, + module_path: record.module_path(), + file: record.file(), + line: record.line(), + fields, + }; + + self.write_entry(&entry); + } + + fn flush(&self) { + let _ = self.writer.lock().expect("writer lock").flush(); + } +} + +#[derive(Clone)] +struct FilterSpec { + global: LevelFilter, + targets: Vec<(String, LevelFilter)>, +} + +impl FilterSpec { + fn new(global: LevelFilter) -> Self { + Self { + global, + targets: Vec::new(), + } + } + + fn parse(spec: &str, default_global: LevelFilter) -> Result { + let mut filter = Self::new(default_global); + for part in spec.split(',') { + let trimmed = part.trim(); + if trimmed.is_empty() { + continue; + } + if let Some((target, level)) = trimmed.split_once('=') { + let lvl = LevelFilter::from_str(level.trim()).map_err(|_| ())?; + filter.targets.push((target.trim().to_string(), lvl)); + } else { + filter.global = LevelFilter::from_str(trimmed).map_err(|_| ())?; + } + } + Ok(filter) + } + + fn allows(&self, metadata: &Metadata<'_>) -> bool { + let mut allowed = self.global; + let mut matched_len = 0usize; + let target = metadata.target(); + for (pattern, level) in &self.targets { + if target == pattern + || target.starts_with(pattern) && target.chars().nth(pattern.len()) == Some(':') + { + if pattern.len() > matched_len { + matched_len = pattern.len(); + allowed = *level; + } + } + } + allowed >= metadata.level().to_level_filter() + } + + fn max_level(&self) -> LevelFilter { + self.targets + .iter() + .fold(self.global, |acc, (_, lvl)| acc.max(*lvl)) + } +} + +#[derive(Serialize)] +struct LogEntry<'a> { + ts_micros: i128, + level: &'a str, + target: &'a str, + run_id: &'a str, + #[serde(skip_serializing_if = "Option::is_none")] + trace_id: Option<&'a str>, + message: String, + #[serde(skip_serializing_if = "Option::is_none")] + error_code: Option, + #[serde(skip_serializing_if = "Option::is_none")] + module_path: Option<&'a str>, + #[serde(skip_serializing_if = "Option::is_none")] + file: Option<&'a str>, + #[serde(skip_serializing_if = "Option::is_none")] + line: Option, + #[serde(skip_serializing_if = "BTreeMap::is_empty")] + fields: BTreeMap, +} + +fn current_timestamp_micros() -> i128 { + match SystemTime::now().duration_since(UNIX_EPOCH) { + Ok(duration) => { + let secs = duration.as_secs() as i128; + let micros = duration.subsec_micros() as i128; + secs * 1_000_000 + micros + } + Err(_) => 0, + } +} + +enum Destination { + Stderr, + File(File), +} + +impl Destination { + fn write_all(&mut self, bytes: &[u8]) -> io::Result<()> { + match self { + Destination::Stderr => { + let mut stderr = io::stderr().lock(); + stderr.write_all(bytes)?; + stderr.flush() + } + Destination::File(file) => { + file.write_all(bytes)?; + file.flush() + } + } + } + + fn flush(&mut self) -> io::Result<()> { + match self { + Destination::Stderr => io::stderr().lock().flush(), + Destination::File(file) => file.flush(), + } + } +} + +fn open_log_file(path: &Path) -> io::Result { + OpenOptions::new().create(true).append(true).open(path) +} + +fn build_error_text(err: &RecorderError, label: Option<&str>) -> String { + let mut text = String::new(); + if let Some(label) = label { + text.push_str(label); + text.push_str(": "); + } + text.push_str(err.message()); + if !err.context.is_empty() { + text.push_str(" ("); + let mut first = true; + for (key, value) in &err.context { + if !first { + text.push_str(", "); + } + first = false; + text.push_str(key); + text.push('='); + text.push_str(value); + } + text.push(')'); + } + text +} + +#[cfg(test)] +pub mod test_support { + use super::*; + use once_cell::sync::OnceCell; + use std::sync::{Arc, Mutex}; + + #[derive(Clone, Default)] + pub struct CapturingMetrics { + events: Arc>>, + } + + #[derive(Clone, Debug, PartialEq, Eq)] + pub enum MetricEvent { + Dropped(&'static str), + Detach(&'static str, Option), + Panic(&'static str), + } + + impl CapturingMetrics { + pub fn take(&self) -> Vec { + let mut guard = self.events.lock().expect("metrics events lock"); + let events = guard.clone(); + guard.clear(); + events + } + } + + impl RecorderMetrics for CapturingMetrics { + fn record_dropped_event(&self, reason: &'static str) { + self.events + .lock() + .expect("metrics events lock") + .push(MetricEvent::Dropped(reason)); + } + + fn record_detach(&self, reason: &'static str, error_code: Option<&str>) { + self.events + .lock() + .expect("metrics events lock") + .push(MetricEvent::Detach( + reason, + error_code.map(|s| s.to_string()), + )); + } + + fn record_panic(&self, label: &'static str) { + self.events + .lock() + .expect("metrics events lock") + .push(MetricEvent::Panic(label)); + } + } + + static CAPTURING: OnceCell = OnceCell::new(); + + pub fn install() -> &'static CapturingMetrics { + CAPTURING.get_or_init(|| { + let metrics = CapturingMetrics::default(); + let _ = super::install_metrics(Box::new(metrics.clone())); + metrics + }) + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::policy::RecorderPolicy; + use once_cell::sync::OnceCell; + use recorder_errors::{ErrorCode, ErrorKind}; + use serde_json::Value; + use std::sync::{Arc, Mutex}; + use tempfile::tempdir; + + fn ensure_logger() { + init_rust_logging_with_default("codetracer_python_recorder=debug"); + } + + fn build_policy() -> RecorderPolicy { + RecorderPolicy::default() + } + + struct VecWriter { + buf: Arc>>, + } + + impl VecWriter { + fn new(buf: Arc>>) -> Self { + Self { buf } + } + } + + impl Write for VecWriter { + fn write(&mut self, data: &[u8]) -> io::Result { + let mut guard = self.buf.lock().expect("buffer lock"); + guard.extend_from_slice(data); + Ok(data.len()) + } + + fn flush(&mut self) -> io::Result<()> { + Ok(()) + } + } + + #[test] + fn structured_log_records_run_and_error_code() { + ensure_logger(); + let tmp = tempdir().expect("tempdir"); + let log_path = tmp.path().join("recorder.log"); + + let mut policy = build_policy(); + policy.log_level = Some("debug".to_string()); + policy.log_file = Some(log_path.clone()); + apply_policy(&policy); + + with_error_code(ErrorCode::TraceMissing, || { + log::error!(target: "codetracer_python_recorder::tests", "sample message"); + }); + + log::logger().flush(); + + let contents = std::fs::read_to_string(&log_path).expect("read log file"); + let line = contents.lines().last().expect("log line"); + let json: Value = serde_json::from_str(line).expect("valid json log"); + + assert!(json.get("run_id").and_then(Value::as_str).is_some()); + assert_eq!( + json.get("error_code").and_then(Value::as_str), + Some("ERR_TRACE_MISSING") + ); + assert_eq!( + json.get("message").and_then(Value::as_str), + Some("sample message") + ); + + apply_policy(&RecorderPolicy::default()); + } + + #[test] + fn json_error_trailers_emit_payload() { + ensure_logger(); + static BUFFER: OnceCell>>> = OnceCell::new(); + let buf = BUFFER.get_or_init(|| { + let buffer = Arc::new(Mutex::new(Vec::new())); + let writer = VecWriter::new(buffer.clone()); + set_error_trailer_writer_for_tests(Box::new(writer)); + buffer + }); + buf.lock().expect("buffer lock").clear(); + + let mut policy = build_policy(); + policy.json_errors = true; + apply_policy(&policy); + + let mut err = RecorderError::new( + ErrorKind::Usage, + ErrorCode::TraceMissing, + "no trace produced", + ); + err = err.with_context("path", "/tmp/trace".to_string()); + + emit_error_trailer(&err); + + let data = buf.lock().expect("buffer lock").clone(); + let payload = String::from_utf8(data).expect("utf8"); + let line = payload.lines().last().expect("json line"); + let json: Value = serde_json::from_str(line).expect("valid trailer json"); + + assert_eq!( + json.get("error_code").and_then(Value::as_str), + Some("ERR_TRACE_MISSING") + ); + assert_eq!( + json.get("message").and_then(Value::as_str), + Some("no trace produced") + ); + assert_eq!( + json.get("context") + .and_then(|ctx| ctx.get("path")) + .and_then(Value::as_str), + Some("/tmp/trace") + ); + + policy.json_errors = false; + apply_policy(&policy); + } + + #[test] + fn metrics_sink_records_events() { + let metrics = test_support::install(); + metrics.take(); + record_dropped_event("synthetic"); + record_detach("policy_disable", Some("ERR_TRACE_MISSING")); + record_panic("ffi_guard"); + let events = metrics.take(); + assert!(events.contains(&test_support::MetricEvent::Dropped("synthetic"))); + assert!(events.contains(&test_support::MetricEvent::Detach( + "policy_disable", + Some("ERR_TRACE_MISSING".to_string()) + ))); + assert!(events.contains(&test_support::MetricEvent::Panic("ffi_guard"))); + } +} diff --git a/codetracer-python-recorder/src/monitoring/tracer.rs b/codetracer-python-recorder/src/monitoring/tracer.rs index bc7512f..ff63aee 100644 --- a/codetracer-python-recorder/src/monitoring/tracer.rs +++ b/codetracer-python-recorder/src/monitoring/tracer.rs @@ -1,14 +1,19 @@ //! Tracer trait and sys.monitoring callback plumbing. use std::any::Any; +use std::panic::{catch_unwind, AssertUnwindSafe}; use std::sync::Mutex; use crate::code_object::{CodeObjectRegistry, CodeObjectWrapper}; +use crate::ffi; +use crate::logging; +use crate::policy::{self, OnRecorderError}; +use log::{error, warn}; use pyo3::{ - exceptions::PyRuntimeError, prelude::*, types::{PyAny, PyCode, PyModule}, }; +use recorder_errors::{usage, ErrorCode}; use super::{ acquire_tool_id, free_tool_id, monitoring_events, register_callback, set_events, @@ -108,6 +113,12 @@ pub trait Tracer: Send + Any { Ok(CallbackOutcome::Continue) } + /// Notify the tracer that an unrecoverable error occurred and the runtime + /// is transitioning into a detach/disable flow. + fn notify_failure(&mut self, _py: Python<'_>) -> PyResult<()> { + Ok(()) + } + /// Called on resumption of a generator/coroutine (not via throw()). fn on_py_resume( &mut self, @@ -254,20 +265,167 @@ struct Global { static GLOBAL: Mutex> = Mutex::new(None); -impl Global { - fn handle_callback(&self, py: Python<'_>, result: CallbackResult) -> PyResult> { - match result? { - CallbackOutcome::Continue => Ok(py.None()), - CallbackOutcome::DisableLocation => Ok(self.disable_sentinel.clone_ref(py)), +fn catch_callback(label: &'static str, callback: F) -> CallbackResult +where + F: FnOnce() -> CallbackResult, +{ + match catch_unwind(AssertUnwindSafe(callback)) { + Ok(result) => result, + Err(payload) => Err(ffi::panic_to_pyerr(label, payload)), + } +} + +fn call_tracer_with_code<'py, F>( + py: Python<'py>, + guard: &mut Option, + code: &Bound<'py, PyCode>, + label: &'static str, + callback: F, +) -> CallbackResult +where + F: FnOnce(&mut dyn Tracer, &CodeObjectWrapper) -> CallbackResult, +{ + let global = guard.as_mut().expect("tracer installed"); + let wrapper = global.registry.get_or_insert(py, code); + let tracer = global.tracer.as_mut(); + catch_callback(label, || callback(tracer, &wrapper)) +} + +fn handle_callback_result( + py: Python<'_>, + guard: &mut Option, + result: CallbackResult, +) -> PyResult> { + match result { + Ok(CallbackOutcome::Continue) => Ok(py.None()), + Ok(CallbackOutcome::DisableLocation) => Ok(guard + .as_ref() + .map(|global| global.disable_sentinel.clone_ref(py)) + .unwrap_or_else(|| py.None())), + Err(err) => handle_callback_error(py, guard, err), + } +} + +fn handle_callback_error( + py: Python<'_>, + guard: &mut Option, + err: PyErr, +) -> PyResult> { + let policy = policy::policy_snapshot(); + match policy.on_recorder_error { + OnRecorderError::Abort => Err(err), + OnRecorderError::Disable => { + let message = err.to_string(); + let code = logging::error_code_from_pyerr(py, &err); + logging::record_detach("policy_disable", code.map(|code| code.as_str())); + logging::with_error_code_opt(code, || { + error!( + "recorder callback error; disabling tracer per policy: {}", + message + ); + }); + if let Some(global) = guard.as_mut() { + if let Err(notify_err) = global.tracer.notify_failure(py) { + logging::with_error_code(ErrorCode::TraceIncomplete, || { + warn!( + "failed to notify tracer about disable transition: {}", + notify_err + ); + }); + } + } + uninstall_locked(py, guard)?; + Ok(py.None()) + } + } +} + +fn uninstall_locked(py: Python<'_>, guard: &mut Option) -> PyResult<()> { + if let Some(mut global) = guard.take() { + let finish_result = global.tracer.finish(py); + + let cleanup_result = (|| -> PyResult<()> { + let events = monitoring_events(py)?; + if global.mask.contains(&events.CALL) { + register_callback(py, &global.tool, &events.CALL, None)?; + } + if global.mask.contains(&events.LINE) { + register_callback(py, &global.tool, &events.LINE, None)?; + } + if global.mask.contains(&events.INSTRUCTION) { + register_callback(py, &global.tool, &events.INSTRUCTION, None)?; + } + if global.mask.contains(&events.JUMP) { + register_callback(py, &global.tool, &events.JUMP, None)?; + } + if global.mask.contains(&events.BRANCH) { + register_callback(py, &global.tool, &events.BRANCH, None)?; + } + if global.mask.contains(&events.PY_START) { + register_callback(py, &global.tool, &events.PY_START, None)?; + } + if global.mask.contains(&events.PY_RESUME) { + register_callback(py, &global.tool, &events.PY_RESUME, None)?; + } + if global.mask.contains(&events.PY_RETURN) { + register_callback(py, &global.tool, &events.PY_RETURN, None)?; + } + if global.mask.contains(&events.PY_YIELD) { + register_callback(py, &global.tool, &events.PY_YIELD, None)?; + } + if global.mask.contains(&events.PY_THROW) { + register_callback(py, &global.tool, &events.PY_THROW, None)?; + } + if global.mask.contains(&events.PY_UNWIND) { + register_callback(py, &global.tool, &events.PY_UNWIND, None)?; + } + if global.mask.contains(&events.RAISE) { + register_callback(py, &global.tool, &events.RAISE, None)?; + } + if global.mask.contains(&events.RERAISE) { + register_callback(py, &global.tool, &events.RERAISE, None)?; + } + if global.mask.contains(&events.EXCEPTION_HANDLED) { + register_callback(py, &global.tool, &events.EXCEPTION_HANDLED, None)?; + } + // if global.mask.contains(&events.STOP_ITERATION) { + // register_callback(py, &global.tool, &events.STOP_ITERATION, None)?; + // } + if global.mask.contains(&events.C_RETURN) { + register_callback(py, &global.tool, &events.C_RETURN, None)?; + } + if global.mask.contains(&events.C_RAISE) { + register_callback(py, &global.tool, &events.C_RAISE, None)?; + } + + set_events(py, &global.tool, NO_EVENTS)?; + free_tool_id(py, &global.tool)?; + Ok(()) + })(); + + if let Err(err) = finish_result { + if let Err(cleanup_err) = cleanup_result { + warn!( + "failed to reset monitoring callbacks after finish error: {}", + cleanup_err + ); + } + return Err(err); } + + cleanup_result?; } + Ok(()) } /// Install a tracer and hook it into Python's `sys.monitoring`. pub fn install_tracer(py: Python<'_>, tracer: Box) -> PyResult<()> { let mut guard = GLOBAL.lock().unwrap(); if guard.is_some() { - return Err(PyRuntimeError::new_err("tracer already installed")); + return Err(ffi::map_recorder_error(usage!( + ErrorCode::TracerInstallConflict, + "tracer already installed" + ))); } let tool = acquire_tool_id(py)?; @@ -364,67 +522,7 @@ pub fn install_tracer(py: Python<'_>, tracer: Box) -> PyResult<()> { /// Remove the installed tracer if any. pub fn uninstall_tracer(py: Python<'_>) -> PyResult<()> { let mut guard = GLOBAL.lock().unwrap(); - if let Some(mut global) = guard.take() { - // Give the tracer a chance to finish underlying writers before - // unregistering callbacks. - let _ = global.tracer.finish(py); - let events = monitoring_events(py)?; - if global.mask.contains(&events.CALL) { - register_callback(py, &global.tool, &events.CALL, None)?; - } - if global.mask.contains(&events.LINE) { - register_callback(py, &global.tool, &events.LINE, None)?; - } - if global.mask.contains(&events.INSTRUCTION) { - register_callback(py, &global.tool, &events.INSTRUCTION, None)?; - } - if global.mask.contains(&events.JUMP) { - register_callback(py, &global.tool, &events.JUMP, None)?; - } - if global.mask.contains(&events.BRANCH) { - register_callback(py, &global.tool, &events.BRANCH, None)?; - } - if global.mask.contains(&events.PY_START) { - register_callback(py, &global.tool, &events.PY_START, None)?; - } - if global.mask.contains(&events.PY_RESUME) { - register_callback(py, &global.tool, &events.PY_RESUME, None)?; - } - if global.mask.contains(&events.PY_RETURN) { - register_callback(py, &global.tool, &events.PY_RETURN, None)?; - } - if global.mask.contains(&events.PY_YIELD) { - register_callback(py, &global.tool, &events.PY_YIELD, None)?; - } - if global.mask.contains(&events.PY_THROW) { - register_callback(py, &global.tool, &events.PY_THROW, None)?; - } - if global.mask.contains(&events.PY_UNWIND) { - register_callback(py, &global.tool, &events.PY_UNWIND, None)?; - } - if global.mask.contains(&events.RAISE) { - register_callback(py, &global.tool, &events.RAISE, None)?; - } - if global.mask.contains(&events.RERAISE) { - register_callback(py, &global.tool, &events.RERAISE, None)?; - } - if global.mask.contains(&events.EXCEPTION_HANDLED) { - register_callback(py, &global.tool, &events.EXCEPTION_HANDLED, None)?; - } - // if global.mask.contains(&events.STOP_ITERATION) { - // register_callback(py, &global.tool, &events.STOP_ITERATION, None)?; - // } - if global.mask.contains(&events.C_RETURN) { - register_callback(py, &global.tool, &events.C_RETURN, None)?; - } - if global.mask.contains(&events.C_RAISE) { - register_callback(py, &global.tool, &events.C_RAISE, None)?; - } - - set_events(py, &global.tool, NO_EVENTS)?; - free_tool_id(py, &global.tool)?; - } - Ok(()) + uninstall_locked(py, &mut guard) } /// Flush the currently installed tracer if any. @@ -443,24 +541,32 @@ fn callback_call( callable: Bound<'_, PyAny>, arg0: Option>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_call(py, &wrapper, offset, &callable, arg0.as_ref()); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_call", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = + call_tracer_with_code(py, &mut guard, &code, "callback_call", |tracer, wrapper| { + tracer.on_call(py, wrapper, offset, &callable, arg0.as_ref()) + }); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] fn callback_line(py: Python<'_>, code: Bound<'_, PyCode>, lineno: u32) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global.tracer.on_line(py, &wrapper, lineno); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_line", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = + call_tracer_with_code(py, &mut guard, &code, "callback_line", |tracer, wrapper| { + tracer.on_line(py, wrapper, lineno) + }); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -469,14 +575,20 @@ fn callback_instruction( code: Bound<'_, PyCode>, instruction_offset: i32, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_instruction(py, &wrapper, instruction_offset); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_instruction", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_instruction", + |tracer, wrapper| tracer.on_instruction(py, wrapper, instruction_offset), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -486,14 +598,17 @@ fn callback_jump( instruction_offset: i32, destination_offset: i32, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_jump(py, &wrapper, instruction_offset, destination_offset); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_jump", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = + call_tracer_with_code(py, &mut guard, &code, "callback_jump", |tracer, wrapper| { + tracer.on_jump(py, wrapper, instruction_offset, destination_offset) + }); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -503,14 +618,20 @@ fn callback_branch( instruction_offset: i32, destination_offset: i32, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_branch(py, &wrapper, instruction_offset, destination_offset); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_branch", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_branch", + |tracer, wrapper| tracer.on_branch(py, wrapper, instruction_offset, destination_offset), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -519,12 +640,20 @@ fn callback_py_start( code: Bound<'_, PyCode>, instruction_offset: i32, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global.tracer.on_py_start(py, &wrapper, instruction_offset); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_py_start", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_py_start", + |tracer, wrapper| tracer.on_py_start(py, wrapper, instruction_offset), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -533,12 +662,20 @@ fn callback_py_resume( code: Bound<'_, PyCode>, instruction_offset: i32, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global.tracer.on_py_resume(py, &wrapper, instruction_offset); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_py_resume", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_py_resume", + |tracer, wrapper| tracer.on_py_resume(py, wrapper, instruction_offset), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -548,14 +685,20 @@ fn callback_py_return( instruction_offset: i32, retval: Bound<'_, PyAny>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_py_return(py, &wrapper, instruction_offset, &retval); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_py_return", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_py_return", + |tracer, wrapper| tracer.on_py_return(py, wrapper, instruction_offset, &retval), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -565,14 +708,20 @@ fn callback_py_yield( instruction_offset: i32, retval: Bound<'_, PyAny>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_py_yield(py, &wrapper, instruction_offset, &retval); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_py_yield", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_py_yield", + |tracer, wrapper| tracer.on_py_yield(py, wrapper, instruction_offset, &retval), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -582,14 +731,20 @@ fn callback_py_throw( instruction_offset: i32, exception: Bound<'_, PyAny>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_py_throw(py, &wrapper, instruction_offset, &exception); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_py_throw", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_py_throw", + |tracer, wrapper| tracer.on_py_throw(py, wrapper, instruction_offset, &exception), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -599,14 +754,20 @@ fn callback_py_unwind( instruction_offset: i32, exception: Bound<'_, PyAny>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_py_unwind(py, &wrapper, instruction_offset, &exception); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_py_unwind", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_py_unwind", + |tracer, wrapper| tracer.on_py_unwind(py, wrapper, instruction_offset, &exception), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -616,14 +777,20 @@ fn callback_raise( instruction_offset: i32, exception: Bound<'_, PyAny>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_raise(py, &wrapper, instruction_offset, &exception); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_raise", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_raise", + |tracer, wrapper| tracer.on_raise(py, wrapper, instruction_offset, &exception), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -633,14 +800,20 @@ fn callback_reraise( instruction_offset: i32, exception: Bound<'_, PyAny>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_reraise(py, &wrapper, instruction_offset, &exception); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_reraise", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_reraise", + |tracer, wrapper| tracer.on_reraise(py, wrapper, instruction_offset, &exception), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -650,15 +823,22 @@ fn callback_exception_handled( instruction_offset: i32, exception: Bound<'_, PyAny>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = - global - .tracer - .on_exception_handled(py, &wrapper, instruction_offset, &exception); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_exception_handled", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_exception_handled", + |tracer, wrapper| { + tracer.on_exception_handled(py, wrapper, instruction_offset, &exception) + }, + ); + handle_callback_result(py, &mut guard, result) + }) } // See comment in Tracer trait @@ -685,14 +865,20 @@ fn callback_c_return( callable: Bound<'_, PyAny>, arg0: Option>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_c_return(py, &wrapper, offset, &callable, arg0.as_ref()); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_c_return", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_c_return", + |tracer, wrapper| tracer.on_c_return(py, wrapper, offset, &callable, arg0.as_ref()), + ); + handle_callback_result(py, &mut guard, result) + }) } #[pyfunction] @@ -703,12 +889,18 @@ fn callback_c_raise( callable: Bound<'_, PyAny>, arg0: Option>, ) -> PyResult> { - if let Some(global) = GLOBAL.lock().unwrap().as_mut() { - let wrapper = global.registry.get_or_insert(py, &code); - let result = global - .tracer - .on_c_raise(py, &wrapper, offset, &callable, arg0.as_ref()); - return global.handle_callback(py, result); - } - Ok(py.None()) + ffi::wrap_pyfunction("callback_c_raise", || { + let mut guard = GLOBAL.lock().unwrap(); + if guard.is_none() { + return Ok(py.None()); + } + let result = call_tracer_with_code( + py, + &mut guard, + &code, + "callback_c_raise", + |tracer, wrapper| tracer.on_c_raise(py, wrapper, offset, &callable, arg0.as_ref()), + ); + handle_callback_result(py, &mut guard, result) + }) } diff --git a/codetracer-python-recorder/src/policy.rs b/codetracer-python-recorder/src/policy.rs new file mode 100644 index 0000000..091a639 --- /dev/null +++ b/codetracer-python-recorder/src/policy.rs @@ -0,0 +1,387 @@ +//! Runtime configuration policy for the recorder. + +use std::env; +use std::path::PathBuf; +use std::str::FromStr; +use std::sync::RwLock; + +use once_cell::sync::OnceCell; +use recorder_errors::{usage, ErrorCode, RecorderError, RecorderResult}; + +/// Environment variable configuring how the recorder reacts to internal errors. +pub const ENV_ON_RECORDER_ERROR: &str = "CODETRACER_ON_RECORDER_ERROR"; +/// Environment variable enforcing that a trace file must be produced. +pub const ENV_REQUIRE_TRACE: &str = "CODETRACER_REQUIRE_TRACE"; +/// Environment variable toggling whether partial trace files are kept. +pub const ENV_KEEP_PARTIAL_TRACE: &str = "CODETRACER_KEEP_PARTIAL_TRACE"; +/// Environment variable controlling log level for the recorder crate. +pub const ENV_LOG_LEVEL: &str = "CODETRACER_LOG_LEVEL"; +/// Environment variable pointing to a log destination file. +pub const ENV_LOG_FILE: &str = "CODETRACER_LOG_FILE"; +/// Environment variable enabling JSON error trailers on stderr. +pub const ENV_JSON_ERRORS: &str = "CODETRACER_JSON_ERRORS"; + +static POLICY: OnceCell> = OnceCell::new(); + +fn policy_cell() -> &'static RwLock { + POLICY.get_or_init(|| RwLock::new(RecorderPolicy::default())) +} + +/// Behaviour when the recorder encounters an error. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum OnRecorderError { + /// Propagate the error to callers; tracing stops with a non-zero exit. + Abort, + /// Disable tracing but allow the host process to continue running. + Disable, +} + +impl Default for OnRecorderError { + fn default() -> Self { + OnRecorderError::Abort + } +} + +#[derive(Debug)] +pub struct PolicyParseError(pub RecorderError); + +impl FromStr for OnRecorderError { + type Err = PolicyParseError; + + fn from_str(value: &str) -> Result { + match value.trim().to_ascii_lowercase().as_str() { + "abort" => Ok(OnRecorderError::Abort), + "disable" => Ok(OnRecorderError::Disable), + other => Err(PolicyParseError(usage!( + ErrorCode::InvalidPolicyValue, + "invalid on_recorder_error value '{}' (expected 'abort' or 'disable')", + other + ))), + } + } +} + +/// Recorder-wide runtime configuration. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct RecorderPolicy { + pub on_recorder_error: OnRecorderError, + pub require_trace: bool, + pub keep_partial_trace: bool, + pub log_level: Option, + pub log_file: Option, + pub json_errors: bool, +} + +impl Default for RecorderPolicy { + fn default() -> Self { + Self { + on_recorder_error: OnRecorderError::Abort, + require_trace: false, + keep_partial_trace: false, + log_level: None, + log_file: None, + json_errors: false, + } + } +} + +impl RecorderPolicy { + fn apply_update(&mut self, update: PolicyUpdate) { + if let Some(on_err) = update.on_recorder_error { + self.on_recorder_error = on_err; + } + if let Some(require_trace) = update.require_trace { + self.require_trace = require_trace; + } + if let Some(keep_partial) = update.keep_partial_trace { + self.keep_partial_trace = keep_partial; + } + if let Some(level) = update.log_level { + self.log_level = match level.trim() { + "" => None, + other => Some(other.to_string()), + }; + } + if let Some(path) = update.log_file { + self.log_file = match path { + PolicyPath::Clear => None, + PolicyPath::Value(pb) => Some(pb), + }; + } + if let Some(json_errors) = update.json_errors { + self.json_errors = json_errors; + } + } +} + +/// Internal helper representing path updates. +#[derive(Debug, Clone)] +enum PolicyPath { + Clear, + Value(PathBuf), +} + +/// Mutation record for the policy. +#[derive(Debug, Default, Clone)] +struct PolicyUpdate { + on_recorder_error: Option, + require_trace: Option, + keep_partial_trace: Option, + log_level: Option, + log_file: Option, + json_errors: Option, +} + +/// Snapshot the current policy. +pub fn policy_snapshot() -> RecorderPolicy { + policy_cell().read().expect("policy lock poisoned").clone() +} + +/// Apply the provided update to the global policy. +fn apply_policy_update(update: PolicyUpdate) { + let mut guard = policy_cell().write().expect("policy lock poisoned"); + guard.apply_update(update); + crate::logging::apply_policy(&guard); +} + +/// Load policy overrides from environment variables. +pub fn configure_policy_from_env() -> RecorderResult<()> { + let mut update = PolicyUpdate::default(); + + if let Ok(value) = env::var(ENV_ON_RECORDER_ERROR) { + let on_err = OnRecorderError::from_str(&value).map_err(|err| err.0)?; + update.on_recorder_error = Some(on_err); + } + + if let Ok(value) = env::var(ENV_REQUIRE_TRACE) { + update.require_trace = Some(parse_bool(&value)?); + } + + if let Ok(value) = env::var(ENV_KEEP_PARTIAL_TRACE) { + update.keep_partial_trace = Some(parse_bool(&value)?); + } + + if let Ok(value) = env::var(ENV_LOG_LEVEL) { + update.log_level = Some(value); + } + + if let Ok(value) = env::var(ENV_LOG_FILE) { + let path = if value.trim().is_empty() { + PolicyPath::Clear + } else { + PolicyPath::Value(PathBuf::from(value)) + }; + update.log_file = Some(path); + } + + if let Ok(value) = env::var(ENV_JSON_ERRORS) { + update.json_errors = Some(parse_bool(&value)?); + } + + apply_policy_update(update); + Ok(()) +} + +fn parse_bool(value: &str) -> RecorderResult { + match value.trim().to_ascii_lowercase().as_str() { + "1" | "true" | "t" | "yes" | "y" => Ok(true), + "0" | "false" | "f" | "no" | "n" => Ok(false), + other => Err(usage!( + ErrorCode::InvalidPolicyValue, + "invalid boolean value '{}' (expected true/false)", + other + )), + } +} + +// === PyO3 helpers === + +use pyo3::prelude::*; +use pyo3::types::PyDict; + +use crate::ffi; + +#[pyfunction(name = "configure_policy")] +#[pyo3(signature = (on_recorder_error=None, require_trace=None, keep_partial_trace=None, log_level=None, log_file=None, json_errors=None))] +pub fn configure_policy_py( + on_recorder_error: Option<&str>, + require_trace: Option, + keep_partial_trace: Option, + log_level: Option<&str>, + log_file: Option<&str>, + json_errors: Option, +) -> PyResult<()> { + let mut update = PolicyUpdate::default(); + + if let Some(value) = on_recorder_error { + match OnRecorderError::from_str(value) { + Ok(parsed) => update.on_recorder_error = Some(parsed), + Err(err) => return Err(ffi::map_recorder_error(err.0)), + } + } + + if let Some(value) = require_trace { + update.require_trace = Some(value); + } + + if let Some(value) = keep_partial_trace { + update.keep_partial_trace = Some(value); + } + + if let Some(value) = log_level { + update.log_level = Some(value.to_string()); + } + + if let Some(value) = log_file { + let path = if value.trim().is_empty() { + PolicyPath::Clear + } else { + PolicyPath::Value(PathBuf::from(value)) + }; + update.log_file = Some(path); + } + + if let Some(value) = json_errors { + update.json_errors = Some(value); + } + + apply_policy_update(update); + Ok(()) +} + +#[pyfunction(name = "configure_policy_from_env")] +pub fn py_configure_policy_from_env() -> PyResult<()> { + configure_policy_from_env().map_err(ffi::map_recorder_error) +} + +#[pyfunction(name = "policy_snapshot")] +pub fn py_policy_snapshot(py: Python<'_>) -> PyResult { + let snapshot = policy_snapshot(); + let dict = PyDict::new(py); + dict.set_item( + "on_recorder_error", + match snapshot.on_recorder_error { + OnRecorderError::Abort => "abort", + OnRecorderError::Disable => "disable", + }, + )?; + dict.set_item("require_trace", snapshot.require_trace)?; + dict.set_item("keep_partial_trace", snapshot.keep_partial_trace)?; + if let Some(level) = snapshot.log_level.as_deref() { + dict.set_item("log_level", level)?; + } else { + dict.set_item("log_level", py.None())?; + } + if let Some(path) = snapshot.log_file.as_ref() { + dict.set_item("log_file", path.display().to_string())?; + } else { + dict.set_item("log_file", py.None())?; + } + dict.set_item("json_errors", snapshot.json_errors)?; + Ok(dict.into()) +} + +#[cfg(test)] +mod tests { + use super::*; + use std::path::Path; + + fn reset_policy() { + let mut guard = super::policy_cell().write().expect("policy lock poisoned"); + *guard = RecorderPolicy::default(); + } + + #[test] + fn default_policy_snapshot() { + reset_policy(); + let snap = policy_snapshot(); + assert_eq!(snap.on_recorder_error, OnRecorderError::Abort); + assert!(!snap.require_trace); + assert!(!snap.keep_partial_trace); + assert!(!snap.json_errors); + assert!(snap.log_level.is_none()); + assert!(snap.log_file.is_none()); + } + + #[test] + fn configure_policy_updates_fields() { + reset_policy(); + let mut update = PolicyUpdate::default(); + update.on_recorder_error = Some(OnRecorderError::Disable); + update.require_trace = Some(true); + update.keep_partial_trace = Some(true); + update.log_level = Some("debug".to_string()); + update.log_file = Some(PolicyPath::Value(PathBuf::from("/tmp/log.txt"))); + update.json_errors = Some(true); + + apply_policy_update(update); + + let snap = policy_snapshot(); + assert_eq!(snap.on_recorder_error, OnRecorderError::Disable); + assert!(snap.require_trace); + assert!(snap.keep_partial_trace); + assert_eq!(snap.log_level.as_deref(), Some("debug")); + assert_eq!(snap.log_file.as_deref(), Some(Path::new("/tmp/log.txt"))); + assert!(snap.json_errors); + reset_policy(); + } + + #[test] + fn configure_policy_from_env_parses_values() { + reset_policy(); + let env_guard = env_lock(); + env::set_var(ENV_ON_RECORDER_ERROR, "disable"); + env::set_var(ENV_REQUIRE_TRACE, "true"); + env::set_var(ENV_KEEP_PARTIAL_TRACE, "1"); + env::set_var(ENV_LOG_LEVEL, "info"); + env::set_var(ENV_LOG_FILE, "/tmp/out.log"); + env::set_var(ENV_JSON_ERRORS, "yes"); + + configure_policy_from_env().expect("configure from env"); + + drop(env_guard); + + let snap = policy_snapshot(); + assert_eq!(snap.on_recorder_error, OnRecorderError::Disable); + assert!(snap.require_trace); + assert!(snap.keep_partial_trace); + assert_eq!(snap.log_level.as_deref(), Some("info")); + assert_eq!(snap.log_file.as_deref(), Some(Path::new("/tmp/out.log"))); + assert!(snap.json_errors); + reset_policy(); + } + + #[test] + fn configure_policy_from_env_rejects_invalid_boolean() { + reset_policy(); + let env_guard = env_lock(); + env::set_var(ENV_REQUIRE_TRACE, "sometimes"); + + let err = configure_policy_from_env().expect_err("invalid bool should error"); + assert_eq!(err.code, ErrorCode::InvalidPolicyValue); + + drop(env_guard); + reset_policy(); + } + + fn env_lock() -> EnvGuard { + EnvGuard + } + + struct EnvGuard; + + impl Drop for EnvGuard { + fn drop(&mut self) { + for key in [ + ENV_ON_RECORDER_ERROR, + ENV_REQUIRE_TRACE, + ENV_KEEP_PARTIAL_TRACE, + ENV_LOG_LEVEL, + ENV_LOG_FILE, + ENV_JSON_ERRORS, + ] { + env::remove_var(key); + } + } + } +} diff --git a/codetracer-python-recorder/src/runtime/frame_inspector.rs b/codetracer-python-recorder/src/runtime/frame_inspector.rs index b359b92..3bb06d2 100644 --- a/codetracer-python-recorder/src/runtime/frame_inspector.rs +++ b/codetracer-python-recorder/src/runtime/frame_inspector.rs @@ -2,12 +2,13 @@ use std::ptr; -use pyo3::exceptions::PyRuntimeError; use pyo3::prelude::*; use pyo3::types::{PyAny, PyDict, PyMapping}; use pyo3::{ffi, Py, PyErr}; +use recorder_errors::{enverr, ErrorCode}; use crate::code_object::CodeObjectWrapper; +use crate::ffi::map_recorder_error; extern "C" { fn PyFrame_GetLocals(frame: *mut ffi::PyFrameObject) -> *mut ffi::PyObject; @@ -62,9 +63,10 @@ pub fn capture_frame<'py>( ) -> PyResult> { let mut frame_ptr = unsafe { ffi::PyEval_GetFrame() }; if frame_ptr.is_null() { - return Err(PyRuntimeError::new_err( - "PyEval_GetFrame returned null frame", - )); + return Err(map_recorder_error(enverr!( + ErrorCode::FrameIntrospectionFailed, + "PyEval_GetFrame returned null frame" + ))); } unsafe { @@ -82,7 +84,10 @@ pub fn capture_frame<'py>( unsafe { ffi::Py_DECREF(frame_ptr.cast()); } - return Err(PyRuntimeError::new_err("PyFrame_GetCode returned null")); + return Err(map_recorder_error(enverr!( + ErrorCode::FrameIntrospectionFailed, + "PyFrame_GetCode returned null" + ))); } let frame_code: Py = unsafe { Py::from_owned_ptr(py, frame_code_ptr.cast()) }; if frame_code.as_ptr() == target_code_ptr { @@ -96,9 +101,10 @@ pub fn capture_frame<'py>( } if frame_ptr.is_null() { - return Err(PyRuntimeError::new_err( - "Failed to locate frame for code object", - )); + return Err(map_recorder_error(enverr!( + ErrorCode::FrameIntrospectionFailed, + "Failed to locate frame for code object" + ))); } unsafe { @@ -114,38 +120,63 @@ pub fn capture_frame<'py>( unsafe { ffi::Py_DECREF(frame_ptr.cast()); } - return Err(PyRuntimeError::new_err("PyFrame_GetLocals returned null")); + return Err(map_recorder_error(enverr!( + ErrorCode::FrameIntrospectionFailed, + "PyFrame_GetLocals returned null" + ))); } let locals_any = unsafe { Bound::::from_owned_ptr(py, locals_raw.cast()) }; - let locals_mapping = locals_any - .downcast::() - .map_err(|_| PyRuntimeError::new_err("Frame locals was not a mapping"))?; + let locals_mapping = locals_any.downcast::().map_err(|_| { + map_recorder_error(enverr!( + ErrorCode::FrameIntrospectionFailed, + "Frame locals was not a mapping" + )) + })?; let globals_raw = unsafe { PyFrame_GetGlobals(frame_ptr) }; if globals_raw.is_null() { unsafe { ffi::Py_DECREF(frame_ptr.cast()); } - return Err(PyRuntimeError::new_err("PyFrame_GetGlobals returned null")); + return Err(map_recorder_error(enverr!( + ErrorCode::GlobalsIntrospectionFailed, + "PyFrame_GetGlobals returned null" + ))); } let globals_any = unsafe { Bound::::from_owned_ptr(py, globals_raw.cast()) }; - let globals_mapping = globals_any - .downcast::() - .map_err(|_| PyRuntimeError::new_err("Frame globals was not a mapping"))?; + let globals_mapping = globals_any.downcast::().map_err(|_| { + map_recorder_error(enverr!( + ErrorCode::GlobalsIntrospectionFailed, + "Frame globals was not a mapping" + )) + })?; let locals_is_globals = locals_raw == globals_raw; let locals_dict = PyDict::new(py); - locals_dict - .update(&locals_mapping) - .expect("Failed to materialize locals dict"); + locals_dict.update(&locals_mapping).map_err(|err| { + map_recorder_error( + enverr!( + ErrorCode::FrameIntrospectionFailed, + "Failed to materialize locals dict" + ) + .with_context("details", err.to_string()), + ) + })?; let globals_dict = if locals_is_globals { None } else { let dict = PyDict::new(py); - dict.update(&globals_mapping) - .expect("Failed to materialize globals dict"); + dict.update(&globals_mapping).map_err(|err| { + map_recorder_error( + enverr!( + ErrorCode::GlobalsIntrospectionFailed, + "Failed to materialize globals dict" + ) + .with_context("details", err.to_string()), + ) + })?; Some(dict) }; diff --git a/codetracer-python-recorder/src/runtime/mod.rs b/codetracer-python-recorder/src/runtime/mod.rs index ccf0e60..45c8e29 100644 --- a/codetracer-python-recorder/src/runtime/mod.rs +++ b/codetracer-python-recorder/src/runtime/mod.rs @@ -15,20 +15,43 @@ use logging::log_event; use value_capture::{capture_call_arguments, record_return_value, record_visible_scope}; use std::collections::{hash_map::Entry, HashMap, HashSet}; +use std::fs; use std::path::{Path, PathBuf}; +#[cfg(feature = "integration-test")] +use std::sync::atomic::{AtomicBool, Ordering}; +#[cfg(feature = "integration-test")] +use std::sync::OnceLock; use pyo3::prelude::*; use pyo3::types::PyAny; +use recorder_errors::{bug, enverr, target, usage, ErrorCode, RecorderResult}; use runtime_tracing::NonStreamingTraceWriter; use runtime_tracing::{Line, TraceEventsFileFormat, TraceWriter}; use crate::code_object::CodeObjectWrapper; +use crate::ffi; +use crate::logging::{record_dropped_event, set_active_trace_id, with_error_code}; use crate::monitoring::{ events_union, CallbackOutcome, CallbackResult, EventSet, MonitoringEvents, Tracer, }; +use crate::policy::{policy_snapshot, RecorderPolicy}; -// Logging is handled via the `log` crate macros (e.g., log::debug!). +use uuid::Uuid; + +struct TraceIdResetGuard; + +impl TraceIdResetGuard { + fn new() -> Self { + TraceIdResetGuard + } +} + +impl Drop for TraceIdResetGuard { + fn drop(&mut self) { + set_active_trace_id(None); + } +} /// Minimal runtime tracer that maps Python sys.monitoring events to /// runtime_tracing writer operations. @@ -39,6 +62,10 @@ pub struct RuntimeTracer { program_path: PathBuf, ignored_code_ids: HashSet, function_ids: HashMap, + output_paths: Option, + events_recorded: bool, + encountered_failure: bool, + trace_id: String, } #[derive(Clone, Copy, Debug, PartialEq, Eq)] @@ -47,6 +74,133 @@ enum ShouldTrace { SkipAndDisable, } +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +enum FailureStage { + PyStart, + Line, + Finish, +} + +impl FailureStage { + fn as_str(self) -> &'static str { + match self { + FailureStage::PyStart => "py_start", + FailureStage::Line => "line", + FailureStage::Finish => "finish", + } + } +} + +// Failure injection helpers are only compiled for integration tests. +#[cfg_attr(not(feature = "integration-test"), allow(dead_code))] +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +enum FailureMode { + Stage(FailureStage), + SuppressEvents, + TargetArgs, + Panic, +} + +#[cfg(feature = "integration-test")] +static FAILURE_MODE: OnceLock> = OnceLock::new(); +#[cfg(feature = "integration-test")] +static FAILURE_TRIGGERED: AtomicBool = AtomicBool::new(false); + +#[cfg(feature = "integration-test")] +fn configured_failure_mode() -> Option { + *FAILURE_MODE.get_or_init(|| { + let raw = std::env::var("CODETRACER_TEST_INJECT_FAILURE").ok(); + if let Some(value) = raw.as_deref() { + log::debug!("[RuntimeTracer] test failure injection mode: {}", value); + } + raw.and_then(|raw| match raw.trim().to_ascii_lowercase().as_str() { + "py_start" | "py-start" => Some(FailureMode::Stage(FailureStage::PyStart)), + "line" => Some(FailureMode::Stage(FailureStage::Line)), + "finish" => Some(FailureMode::Stage(FailureStage::Finish)), + "suppress-events" | "suppress_events" | "suppress" => Some(FailureMode::SuppressEvents), + "target" | "target-args" | "target_args" => Some(FailureMode::TargetArgs), + "panic" | "panic-callback" | "panic_callback" => Some(FailureMode::Panic), + _ => None, + }) + }) +} + +#[cfg(feature = "integration-test")] +fn should_inject_failure(stage: FailureStage) -> bool { + matches!(configured_failure_mode(), Some(FailureMode::Stage(mode)) if mode == stage) + && mark_failure_triggered() +} + +#[cfg(not(feature = "integration-test"))] +fn should_inject_failure(_stage: FailureStage) -> bool { + false +} + +#[cfg(feature = "integration-test")] +fn should_inject_target_error() -> bool { + matches!(configured_failure_mode(), Some(FailureMode::TargetArgs)) + && mark_failure_triggered() +} + +#[cfg(not(feature = "integration-test"))] +fn should_inject_target_error() -> bool { + false +} + +#[cfg(feature = "integration-test")] +fn should_panic_in_callback() -> bool { + matches!(configured_failure_mode(), Some(FailureMode::Panic)) && mark_failure_triggered() +} + +#[cfg(not(feature = "integration-test"))] +#[allow(dead_code)] +fn should_panic_in_callback() -> bool { + false +} + +#[cfg(feature = "integration-test")] +fn suppress_events() -> bool { + matches!(configured_failure_mode(), Some(FailureMode::SuppressEvents)) +} + +#[cfg(not(feature = "integration-test"))] +fn suppress_events() -> bool { + false +} + +#[cfg(feature = "integration-test")] +fn mark_failure_triggered() -> bool { + !FAILURE_TRIGGERED.swap(true, Ordering::SeqCst) +} + +#[cfg(not(feature = "integration-test"))] +#[allow(dead_code)] +fn mark_failure_triggered() -> bool { + false +} + +#[cfg(feature = "integration-test")] +fn injected_failure_err(stage: FailureStage) -> PyErr { + let err = bug!( + ErrorCode::TraceIncomplete, + "test-injected failure at {}", + stage.as_str() + ) + .with_context("injection_stage", stage.as_str().to_string()); + ffi::map_recorder_error(err) +} + +#[cfg(not(feature = "integration-test"))] +fn injected_failure_err(stage: FailureStage) -> PyErr { + let err = bug!( + ErrorCode::TraceIncomplete, + "failure injection requested at {} without fail-injection feature", + stage.as_str() + ) + .with_context("injection_stage", stage.as_str().to_string()); + ffi::map_recorder_error(err) +} + fn is_real_filename(filename: &str) -> bool { let trimmed = filename.trim(); !(trimmed.starts_with('<') && trimmed.ends_with('>')) @@ -70,6 +224,10 @@ impl RuntimeTracer { program_path, ignored_code_ids: HashSet::new(), function_ids: HashMap::new(), + output_paths: None, + events_recorded: false, + encountered_failure: false, + trace_id: Uuid::new_v4().to_string(), } } @@ -79,7 +237,65 @@ impl RuntimeTracer { log::debug!("{}", start_path.display()); outputs .configure_writer(&mut self.writer, start_path, start_line) - .map_err(to_py_err) + .map_err(ffi::map_recorder_error)?; + self.output_paths = Some(outputs.clone()); + self.events_recorded = false; + self.encountered_failure = false; + set_active_trace_id(Some(self.trace_id.clone())); + Ok(()) + } + + fn mark_event(&mut self) { + if suppress_events() { + log::debug!("[RuntimeTracer] skipping event mark due to test injection"); + return; + } + self.events_recorded = true; + } + + fn mark_failure(&mut self) { + self.encountered_failure = true; + } + + fn cleanup_partial_outputs(&self) -> RecorderResult<()> { + if let Some(outputs) = &self.output_paths { + for path in [outputs.events(), outputs.metadata(), outputs.paths()] { + if path.exists() { + fs::remove_file(path).map_err(|err| { + enverr!(ErrorCode::Io, "failed to remove partial trace file") + .with_context("path", path.display().to_string()) + .with_context("io", err.to_string()) + })?; + } + } + } + Ok(()) + } + + fn require_trace_or_fail(&self, policy: &RecorderPolicy) -> RecorderResult<()> { + if policy.require_trace && !self.events_recorded { + return Err(usage!( + ErrorCode::TraceMissing, + "recorder policy requires a trace but no events were recorded" + )); + } + Ok(()) + } + + fn finalise_writer(&mut self) -> RecorderResult<()> { + TraceWriter::finish_writing_trace_metadata(&mut self.writer).map_err(|err| { + enverr!(ErrorCode::Io, "failed to finalise trace metadata") + .with_context("source", err.to_string()) + })?; + TraceWriter::finish_writing_trace_paths(&mut self.writer).map_err(|err| { + enverr!(ErrorCode::Io, "failed to finalise trace paths") + .with_context("source", err.to_string()) + })?; + TraceWriter::finish_writing_trace_events(&mut self.writer).map_err(|err| { + enverr!(ErrorCode::Io, "failed to finalise trace events") + .with_context("source", err.to_string()) + })?; + Ok(()) } fn ensure_function_id( @@ -109,22 +325,27 @@ impl RuntimeTracer { if self.ignored_code_ids.contains(&code_id) { return ShouldTrace::SkipAndDisable; } - let filename = code - .filename(py) - .expect("RuntimeTracer::should_trace_code failed to resolve filename"); + let filename = match code.filename(py) { + Ok(name) => name, + Err(err) => { + with_error_code(ErrorCode::Io, || { + log::error!("failed to resolve code filename: {err}"); + }); + record_dropped_event("filename_lookup_failed"); + self.ignored_code_ids.insert(code_id); + return ShouldTrace::SkipAndDisable; + } + }; if is_real_filename(filename) { ShouldTrace::Trace } else { self.ignored_code_ids.insert(code_id); + record_dropped_event("synthetic_filename"); ShouldTrace::SkipAndDisable } } } -fn to_py_err(e: Box) -> pyo3::PyErr { - pyo3::exceptions::PyRuntimeError::new_err(e.to_string()) -} - impl Tracer for RuntimeTracer { fn interest(&self, events: &MonitoringEvents) -> EventSet { // Minimal set: function start, step lines, and returns @@ -148,17 +369,40 @@ impl Tracer for RuntimeTracer { return Ok(CallbackOutcome::Continue); } + if should_inject_failure(FailureStage::PyStart) { + return Err(injected_failure_err(FailureStage::PyStart)); + } + + if should_inject_target_error() { + return Err(ffi::map_recorder_error( + target!( + ErrorCode::TraceIncomplete, + "test-injected target error from capture_call_arguments" + ) + .with_context("injection_stage", "capture_call_arguments"), + )); + } + log_event(py, code, "on_py_start", None); if let Ok(fid) = self.ensure_function_id(py, code) { match capture_call_arguments(py, &mut self.writer, code) { Ok(args) => TraceWriter::register_call(&mut self.writer, fid, args), Err(err) => { - let message = format!("on_py_start: failed to capture args: {}", err); - log::error!("{message}"); - return Err(pyo3::exceptions::PyRuntimeError::new_err(message)); + let details = err.to_string(); + with_error_code(ErrorCode::FrameIntrospectionFailed, || { + log::error!("on_py_start: failed to capture args: {details}"); + }); + return Err(ffi::map_recorder_error( + enverr!( + ErrorCode::FrameIntrospectionFailed, + "failed to capture call arguments" + ) + .with_context("details", details), + )); } } + self.mark_event(); } Ok(CallbackOutcome::Continue) @@ -176,10 +420,22 @@ impl Tracer for RuntimeTracer { return Ok(CallbackOutcome::Continue); } + if should_inject_failure(FailureStage::Line) { + return Err(injected_failure_err(FailureStage::Line)); + } + + #[cfg(feature = "integration-test")] + { + if should_panic_in_callback() { + panic!("test-injected panic in on_line"); + } + } + log_event(py, code, "on_line", Some(lineno)); if let Ok(filename) = code.filename(py) { TraceWriter::register_step(&mut self.writer, Path::new(filename), Line(lineno as i64)); + self.mark_event(); } let snapshot = capture_frame(py, code)?; @@ -211,6 +467,7 @@ impl Tracer for RuntimeTracer { log_event(py, code, "on_py_return", None); record_return_value(py, &mut self.writer, retval); + self.mark_event(); if self.activation.handle_return_event(code.id()) { log::debug!("[RuntimeTracer] deactivated on activation return"); } @@ -218,13 +475,23 @@ impl Tracer for RuntimeTracer { Ok(CallbackOutcome::Continue) } + fn notify_failure(&mut self, _py: Python<'_>) -> PyResult<()> { + self.mark_failure(); + Ok(()) + } + fn flush(&mut self, _py: Python<'_>) -> PyResult<()> { // Trace event entry log::debug!("[RuntimeTracer] flush"); // For non-streaming formats we can update the events file. match self.format { TraceEventsFileFormat::Json | TraceEventsFileFormat::BinaryV0 => { - TraceWriter::finish_writing_trace_events(&mut self.writer).map_err(to_py_err)?; + TraceWriter::finish_writing_trace_events(&mut self.writer).map_err(|err| { + ffi::map_recorder_error( + enverr!(ErrorCode::Io, "failed to finalise trace events") + .with_context("source", err.to_string()), + ) + })?; } TraceEventsFileFormat::Binary => { // Streaming writer: no partial flush to avoid closing the stream. @@ -237,9 +504,45 @@ impl Tracer for RuntimeTracer { fn finish(&mut self, _py: Python<'_>) -> PyResult<()> { // Trace event entry log::debug!("[RuntimeTracer] finish"); - TraceWriter::finish_writing_trace_metadata(&mut self.writer).map_err(to_py_err)?; - TraceWriter::finish_writing_trace_paths(&mut self.writer).map_err(to_py_err)?; - TraceWriter::finish_writing_trace_events(&mut self.writer).map_err(to_py_err)?; + + if should_inject_failure(FailureStage::Finish) { + return Err(injected_failure_err(FailureStage::Finish)); + } + + set_active_trace_id(Some(self.trace_id.clone())); + let _reset = TraceIdResetGuard::new(); + let policy = policy_snapshot(); + + if self.encountered_failure { + if policy.keep_partial_trace { + if let Err(err) = self.finalise_writer() { + with_error_code(ErrorCode::TraceIncomplete, || { + log::warn!( + "failed to finalise partial trace after disable: {}", + err.message() + ); + }); + } + if let Some(outputs) = &self.output_paths { + with_error_code(ErrorCode::TraceIncomplete, || { + log::warn!( + "recorder detached after failure; keeping partial trace at {}", + outputs.events().display() + ); + }); + } + } else { + self.cleanup_partial_outputs() + .map_err(ffi::map_recorder_error)?; + } + self.ignored_code_ids.clear(); + self.function_ids.clear(); + return Ok(()); + } + + self.require_trace_or_fail(&policy) + .map_err(ffi::map_recorder_error)?; + self.finalise_writer().map_err(ffi::map_recorder_error)?; self.ignored_code_ids.clear(); self.function_ids.clear(); Ok(()) @@ -250,6 +553,7 @@ impl Tracer for RuntimeTracer { mod tests { use super::*; use crate::monitoring::CallbackOutcome; + use crate::policy; use pyo3::types::{PyAny, PyCode, PyModule}; use pyo3::wrap_pyfunction; use runtime_tracing::{FullValueRecord, TraceLowLevelEvent, ValueRecord}; @@ -282,6 +586,18 @@ mod tests { LAST_OUTCOME.with(|cell| cell.get()) } + fn reset_policy(_py: Python<'_>) { + policy::configure_policy_py( + Some("abort"), + Some(false), + Some(false), + None, + None, + Some(false), + ) + .expect("reset recorder policy"); + } + #[test] fn detects_real_filenames() { assert!(is_real_filename("example.py")); @@ -420,24 +736,26 @@ result = compute()\n" #[pyfunction] fn capture_line(py: Python<'_>, code: Bound<'_, PyCode>, lineno: u32) -> PyResult<()> { - ACTIVE_TRACER.with(|cell| -> PyResult<()> { - let ptr = cell.get(); - if ptr.is_null() { - panic!("No active RuntimeTracer for capture_line"); - } - unsafe { - let tracer = &mut *ptr; - let wrapper = CodeObjectWrapper::new(py, &code); - match tracer.on_line(py, &wrapper, lineno) { - Ok(outcome) => { - LAST_OUTCOME.with(|cell| cell.set(Some(outcome))); - Ok(()) + ffi::wrap_pyfunction("test_capture_line", || { + ACTIVE_TRACER.with(|cell| -> PyResult<()> { + let ptr = cell.get(); + if ptr.is_null() { + panic!("No active RuntimeTracer for capture_line"); + } + unsafe { + let tracer = &mut *ptr; + let wrapper = CodeObjectWrapper::new(py, &code); + match tracer.on_line(py, &wrapper, lineno) { + Ok(outcome) => { + LAST_OUTCOME.with(|cell| cell.set(Some(outcome))); + Ok(()) + } + Err(err) => Err(err), } - Err(err) => Err(err), } - } - })?; - Ok(()) + })?; + Ok(()) + }) } #[pyfunction] @@ -446,24 +764,26 @@ result = compute()\n" code: Bound<'_, PyCode>, value: Bound<'_, PyAny>, ) -> PyResult<()> { - ACTIVE_TRACER.with(|cell| -> PyResult<()> { - let ptr = cell.get(); - if ptr.is_null() { - panic!("No active RuntimeTracer for capture_return_event"); - } - unsafe { - let tracer = &mut *ptr; - let wrapper = CodeObjectWrapper::new(py, &code); - match tracer.on_py_return(py, &wrapper, 0, &value) { - Ok(outcome) => { - LAST_OUTCOME.with(|cell| cell.set(Some(outcome))); - Ok(()) + ffi::wrap_pyfunction("test_capture_return_event", || { + ACTIVE_TRACER.with(|cell| -> PyResult<()> { + let ptr = cell.get(); + if ptr.is_null() { + panic!("No active RuntimeTracer for capture_return_event"); + } + unsafe { + let tracer = &mut *ptr; + let wrapper = CodeObjectWrapper::new(py, &code); + match tracer.on_py_return(py, &wrapper, 0, &value) { + Ok(outcome) => { + LAST_OUTCOME.with(|cell| cell.set(Some(outcome))); + Ok(()) + } + Err(err) => Err(err), } - Err(err) => Err(err), } - } - })?; - Ok(()) + })?; + Ok(()) + }) } const PRELUDE: &str = r#" @@ -1037,4 +1357,119 @@ snapshot() assert_var(len_snapshot, "n", SimpleValue::Int(3)); assert_no_variable(&snapshots, "len"); } + + #[test] + fn finish_enforces_require_trace_policy() { + Python::with_gil(|py| { + policy::configure_policy_py( + Some("abort"), + Some(true), + Some(false), + None, + None, + Some(false), + ) + .expect("enable require_trace policy"); + + let script_dir = tempfile::tempdir().expect("script dir"); + let program_path = script_dir.path().join("program.py"); + std::fs::write(&program_path, "print('hi')\n").expect("write program"); + + let outputs_dir = tempfile::tempdir().expect("outputs dir"); + let outputs = TraceOutputPaths::new(outputs_dir.path(), TraceEventsFileFormat::Json); + + let mut tracer = RuntimeTracer::new( + program_path.to_string_lossy().as_ref(), + &[], + TraceEventsFileFormat::Json, + None, + ); + tracer.begin(&outputs, 1).expect("begin tracer"); + + let err = tracer + .finish(py) + .expect_err("finish should error when require_trace true"); + let message = err.to_string(); + assert!( + message.contains("ERR_TRACE_MISSING"), + "expected trace missing error, got {message}" + ); + + reset_policy(py); + }); + } + + #[test] + fn finish_removes_partial_outputs_when_policy_forbids_keep() { + Python::with_gil(|py| { + reset_policy(py); + + let script_dir = tempfile::tempdir().expect("script dir"); + let program_path = script_dir.path().join("program.py"); + std::fs::write(&program_path, "print('hi')\n").expect("write program"); + + let outputs_dir = tempfile::tempdir().expect("outputs dir"); + let outputs = TraceOutputPaths::new(outputs_dir.path(), TraceEventsFileFormat::Json); + + let mut tracer = RuntimeTracer::new( + program_path.to_string_lossy().as_ref(), + &[], + TraceEventsFileFormat::Json, + None, + ); + tracer.begin(&outputs, 1).expect("begin tracer"); + tracer.mark_failure(); + + tracer.finish(py).expect("finish after failure"); + + assert!(!outputs.events().exists(), "expected events file removed"); + assert!( + !outputs.metadata().exists(), + "expected metadata file removed" + ); + assert!(!outputs.paths().exists(), "expected paths file removed"); + }); + } + + #[test] + fn finish_keeps_partial_outputs_when_policy_allows() { + Python::with_gil(|py| { + policy::configure_policy_py( + Some("abort"), + Some(false), + Some(true), + None, + None, + Some(false), + ) + .expect("enable keep_partial policy"); + + let script_dir = tempfile::tempdir().expect("script dir"); + let program_path = script_dir.path().join("program.py"); + std::fs::write(&program_path, "print('hi')\n").expect("write program"); + + let outputs_dir = tempfile::tempdir().expect("outputs dir"); + let outputs = TraceOutputPaths::new(outputs_dir.path(), TraceEventsFileFormat::Json); + + let mut tracer = RuntimeTracer::new( + program_path.to_string_lossy().as_ref(), + &[], + TraceEventsFileFormat::Json, + None, + ); + tracer.begin(&outputs, 1).expect("begin tracer"); + tracer.mark_failure(); + + tracer.finish(py).expect("finish after failure"); + + assert!(outputs.events().exists(), "expected events file retained"); + assert!( + outputs.metadata().exists(), + "expected metadata file retained" + ); + assert!(outputs.paths().exists(), "expected paths file retained"); + + reset_policy(py); + }); + } } diff --git a/codetracer-python-recorder/src/runtime/output_paths.rs b/codetracer-python-recorder/src/runtime/output_paths.rs index 088f932..040c8b5 100644 --- a/codetracer-python-recorder/src/runtime/output_paths.rs +++ b/codetracer-python-recorder/src/runtime/output_paths.rs @@ -1,10 +1,12 @@ //! File-system helpers for trace output management. -use std::error::Error; use std::path::{Path, PathBuf}; +use recorder_errors::{enverr, ErrorCode}; use runtime_tracing::{Line, NonStreamingTraceWriter, TraceEventsFileFormat, TraceWriter}; +use crate::errors::Result; + /// File layout for a trace session. Encapsulates the metadata, event, and paths /// files that need to be initialised alongside the runtime tracer. #[derive(Debug, Clone)] @@ -50,10 +52,22 @@ impl TraceOutputPaths { writer: &mut NonStreamingTraceWriter, start_path: &Path, start_line: u32, - ) -> Result<(), Box> { - TraceWriter::begin_writing_trace_metadata(writer, self.metadata())?; - TraceWriter::begin_writing_trace_paths(writer, self.paths())?; - TraceWriter::begin_writing_trace_events(writer, self.events())?; + ) -> Result<()> { + TraceWriter::begin_writing_trace_metadata(writer, self.metadata()).map_err(|err| { + enverr!(ErrorCode::Io, "failed to begin trace metadata") + .with_context("path", self.metadata().display().to_string()) + .with_context("source", err.to_string()) + })?; + TraceWriter::begin_writing_trace_paths(writer, self.paths()).map_err(|err| { + enverr!(ErrorCode::Io, "failed to begin trace paths") + .with_context("path", self.paths().display().to_string()) + .with_context("source", err.to_string()) + })?; + TraceWriter::begin_writing_trace_events(writer, self.events()).map_err(|err| { + enverr!(ErrorCode::Io, "failed to begin trace events") + .with_context("path", self.events().display().to_string()) + .with_context("source", err.to_string()) + })?; TraceWriter::start(writer, start_path, Line(start_line as i64)); Ok(()) } diff --git a/codetracer-python-recorder/src/runtime/value_capture.rs b/codetracer-python-recorder/src/runtime/value_capture.rs index b3d36a1..f0950f2 100644 --- a/codetracer-python-recorder/src/runtime/value_capture.rs +++ b/codetracer-python-recorder/src/runtime/value_capture.rs @@ -2,13 +2,14 @@ use std::collections::HashSet; -use pyo3::exceptions::PyRuntimeError; use pyo3::prelude::*; use pyo3::types::PyString; +use recorder_errors::{usage, ErrorCode}; use runtime_tracing::{FullValueRecord, NonStreamingTraceWriter, TraceWriter}; use crate::code_object::CodeObjectWrapper; +use crate::ffi; use crate::runtime::frame_inspector::{capture_frame, FrameSnapshot}; use crate::runtime::value_encoder::encode_value; @@ -38,9 +39,12 @@ pub fn capture_call_arguments<'py>( let positional_take = std::cmp::min(argcount, varnames.len()); for name in varnames.iter().take(positional_take) { - let value = locals - .get_item(name)? - .ok_or_else(|| PyRuntimeError::new_err(format!("missing positional arg '{name}'")))?; + let value = locals.get_item(name)?.ok_or_else(|| { + ffi::map_recorder_error(usage!( + ErrorCode::MissingPositionalArgument, + "missing positional arg '{name}'" + )) + })?; let encoded = encode_value(py, writer, &value); args.push(TraceWriter::arg(writer, name, encoded)); idx += 1; @@ -57,9 +61,12 @@ pub fn capture_call_arguments<'py>( let kwonly_take = std::cmp::min(kwonly, varnames.len().saturating_sub(idx)); for name in varnames.iter().skip(idx).take(kwonly_take) { - let value = locals - .get_item(name)? - .ok_or_else(|| PyRuntimeError::new_err(format!("missing kw-only arg '{name}'")))?; + let value = locals.get_item(name)?.ok_or_else(|| { + ffi::map_recorder_error(usage!( + ErrorCode::MissingKeywordArgument, + "missing kw-only arg '{name}'" + )) + })?; let encoded = encode_value(py, writer, &value); args.push(TraceWriter::arg(writer, name, encoded)); } diff --git a/codetracer-python-recorder/src/session.rs b/codetracer-python-recorder/src/session.rs index bfcfa9a..4acfc51 100644 --- a/codetracer-python-recorder/src/session.rs +++ b/codetracer-python-recorder/src/session.rs @@ -5,9 +5,10 @@ mod bootstrap; use std::path::{Path, PathBuf}; use std::sync::atomic::{AtomicBool, Ordering}; -use pyo3::exceptions::PyRuntimeError; use pyo3::prelude::*; +use recorder_errors::{usage, ErrorCode}; +use crate::ffi; use crate::logging::init_rust_logging_with_default; use crate::monitoring::{flush_installed_tracer, install_tracer, uninstall_tracer}; use crate::runtime::{RuntimeTracer, TraceOutputPaths}; @@ -19,59 +20,69 @@ static ACTIVE: AtomicBool = AtomicBool::new(false); /// Start tracing using sys.monitoring and runtime_tracing writer. #[pyfunction] pub fn start_tracing(path: &str, format: &str, activation_path: Option<&str>) -> PyResult<()> { - // Ensure logging is ready before any tracer logs might be emitted. - // Default only our crate to debug to avoid excessive verbosity from deps. - init_rust_logging_with_default("codetracer_python_recorder=debug"); - if ACTIVE.load(Ordering::SeqCst) { - return Err(PyRuntimeError::new_err("tracing already active")); - } + ffi::wrap_pyfunction("start_tracing", || { + // Ensure logging is ready before any tracer logs might be emitted. + // Default our crate to warnings-only so tests stay quiet unless explicitly enabled. + init_rust_logging_with_default("codetracer_python_recorder=warn"); + if ACTIVE.load(Ordering::SeqCst) { + return Err(ffi::map_recorder_error(usage!( + ErrorCode::AlreadyTracing, + "tracing already active" + ))); + } - let activation_path = activation_path.map(PathBuf::from); + let activation_path = activation_path.map(PathBuf::from); - Python::with_gil(|py| { - let bootstrap = TraceSessionBootstrap::prepare( - py, - Path::new(path), - format, - activation_path.as_deref(), - )?; + Python::with_gil(|py| { + let bootstrap = TraceSessionBootstrap::prepare( + py, + Path::new(path), + format, + activation_path.as_deref(), + ) + .map_err(ffi::map_recorder_error)?; - let outputs = TraceOutputPaths::new(bootstrap.trace_directory(), bootstrap.format()); + let outputs = TraceOutputPaths::new(bootstrap.trace_directory(), bootstrap.format()); - let mut tracer = RuntimeTracer::new( - bootstrap.program(), - bootstrap.args(), - bootstrap.format(), - bootstrap.activation_path(), - ); - tracer.begin(&outputs, 1)?; + let mut tracer = RuntimeTracer::new( + bootstrap.program(), + bootstrap.args(), + bootstrap.format(), + bootstrap.activation_path(), + ); + tracer.begin(&outputs, 1)?; - // Install callbacks - install_tracer(py, Box::new(tracer))?; - ACTIVE.store(true, Ordering::SeqCst); - Ok(()) + // Install callbacks + install_tracer(py, Box::new(tracer))?; + ACTIVE.store(true, Ordering::SeqCst); + Ok(()) + }) }) } /// Stop tracing by resetting the global flag. #[pyfunction] pub fn stop_tracing() -> PyResult<()> { - Python::with_gil(|py| { - // Uninstall triggers finish() on tracer implementation. - uninstall_tracer(py)?; - ACTIVE.store(false, Ordering::SeqCst); - Ok(()) + ffi::wrap_pyfunction("stop_tracing", || { + Python::with_gil(|py| { + // Uninstall triggers finish() on tracer implementation. + uninstall_tracer(py)?; + ACTIVE.store(false, Ordering::SeqCst); + Ok(()) + }) }) } /// Query whether tracing is currently active. #[pyfunction] pub fn is_tracing() -> PyResult { - Ok(ACTIVE.load(Ordering::SeqCst)) + ffi::wrap_pyfunction("is_tracing", || Ok(ACTIVE.load(Ordering::SeqCst))) } /// Flush buffered trace data (best-effort, non-streaming formats only). #[pyfunction] pub fn flush_tracing() -> PyResult<()> { - Python::with_gil(|py| flush_installed_tracer(py)) + ffi::wrap_pyfunction("flush_tracing", || { + Python::with_gil(|py| flush_installed_tracer(py)) + }) } diff --git a/codetracer-python-recorder/src/session/bootstrap.rs b/codetracer-python-recorder/src/session/bootstrap.rs index 1327f82..2e0e53d 100644 --- a/codetracer-python-recorder/src/session/bootstrap.rs +++ b/codetracer-python-recorder/src/session/bootstrap.rs @@ -3,10 +3,12 @@ use std::fs; use std::path::{Path, PathBuf}; -use pyo3::exceptions::PyRuntimeError; use pyo3::prelude::*; +use recorder_errors::{enverr, usage, ErrorCode}; use runtime_tracing::TraceEventsFileFormat; +use crate::errors::Result; + /// Basic metadata about the currently running Python program. #[derive(Debug, Clone)] pub struct ProgramMetadata { @@ -31,10 +33,13 @@ impl TraceSessionBootstrap { trace_directory: &Path, format: &str, activation_path: Option<&Path>, - ) -> PyResult { + ) -> Result { ensure_trace_directory(trace_directory)?; let format = resolve_trace_format(format)?; - let metadata = collect_program_metadata(py)?; + let metadata = collect_program_metadata(py).map_err(|err| { + enverr!(ErrorCode::Io, "failed to collect program metadata") + .with_context("details", err.to_string()) + })?; Ok(Self { trace_directory: trace_directory.to_path_buf(), format, @@ -65,33 +70,39 @@ impl TraceSessionBootstrap { } /// Ensure the requested trace directory exists and is writable. -pub fn ensure_trace_directory(path: &Path) -> PyResult<()> { +pub fn ensure_trace_directory(path: &Path) -> Result<()> { if path.exists() { if !path.is_dir() { - return Err(PyRuntimeError::new_err( - "trace path exists and is not a directory", - )); + return Err(usage!( + ErrorCode::TraceDirectoryConflict, + "trace path exists and is not a directory" + ) + .with_context("path", path.display().to_string())); } return Ok(()); } fs::create_dir_all(path).map_err(|e| { - PyRuntimeError::new_err(format!( - "failed to create trace directory '{}': {e}", - path.display() - )) + enverr!( + ErrorCode::TraceDirectoryCreateFailed, + "failed to create trace directory" + ) + .with_context("path", path.display().to_string()) + .with_context("io", e.to_string()) }) } /// Convert a user-provided format string into the runtime representation. -pub fn resolve_trace_format(value: &str) -> PyResult { +pub fn resolve_trace_format(value: &str) -> Result { match value.to_ascii_lowercase().as_str() { "json" => Ok(TraceEventsFileFormat::Json), // Accept historical aliases for the binary format. "binary" | "binaryv0" | "binary_v0" | "b0" => Ok(TraceEventsFileFormat::BinaryV0), - other => Err(PyRuntimeError::new_err(format!( - "unsupported trace format '{other}'. Expected one of: json, binary" - ))), + other => Err(usage!( + ErrorCode::UnsupportedFormat, + "unsupported trace format '{}'. Expected one of: json, binary", + other + )), } } @@ -124,6 +135,7 @@ pub fn collect_program_metadata(py: Python<'_>) -> PyResult { mod tests { use super::*; use pyo3::types::PyList; + use recorder_errors::ErrorCode; use tempfile::tempdir; #[test] @@ -139,7 +151,8 @@ mod tests { let tmp = tempdir().expect("tempdir"); let file_path = tmp.path().join("trace.bin"); std::fs::write(&file_path, b"stub").expect("write stub file"); - assert!(ensure_trace_directory(&file_path).is_err()); + let err = ensure_trace_directory(&file_path).expect_err("should reject file path"); + assert_eq!(err.code, ErrorCode::TraceDirectoryConflict); } #[test] @@ -156,12 +169,9 @@ mod tests { #[test] fn resolve_trace_format_rejects_unknown_values() { - Python::with_gil(|py| { - let err = resolve_trace_format("yaml").expect_err("should reject yaml"); - assert_eq!(err.get_type(py).name().expect("type name"), "RuntimeError"); - let message = err.value(py).to_string(); - assert!(message.contains("unsupported trace format")); - }); + let err = resolve_trace_format("yaml").expect_err("should reject yaml"); + assert_eq!(err.code, ErrorCode::UnsupportedFormat); + assert!(err.message().contains("unsupported trace format")); } #[test] diff --git a/codetracer-python-recorder/tests/python/test_error_handling.py b/codetracer-python-recorder/tests/python/test_error_handling.py new file mode 100644 index 0000000..838197e --- /dev/null +++ b/codetracer-python-recorder/tests/python/test_error_handling.py @@ -0,0 +1,134 @@ +from __future__ import annotations + +import json +import os +import stat +import subprocess +import sys +import textwrap +from pathlib import Path + +import pytest + +import codetracer_python_recorder as codetracer + + +@pytest.fixture(autouse=True) +def ensure_tracer_stopped() -> None: + yield + if codetracer.is_tracing(): + codetracer.stop() + + +def _run_python_script(tmp_path: Path, body: str, env: dict[str, str] | None = None) -> subprocess.CompletedProcess[str]: + script_path = tmp_path / "runner.py" + script_path.write_text(textwrap.dedent(body)) + return subprocess.run( + [sys.executable, str(script_path)], + capture_output=True, + text=True, + check=False, + env=env, + ) + + +@pytest.mark.skipif(os.name == "nt", reason="posix permissions required") +def test_start_tracing_raises_environment_error(tmp_path: Path) -> None: + locked_dir = tmp_path / "locked" + locked_dir.mkdir() + locked_dir.chmod(stat.S_IRUSR | stat.S_IXUSR) + trace_dir = locked_dir / "trace" + + try: + with pytest.raises(codetracer.EnvironmentError) as excinfo: + codetracer.start(trace_dir) + assert excinfo.value.code == "ERR_TRACE_DIR_CREATE_FAILED" + assert "failed to create trace directory" in str(excinfo.value) + finally: + locked_dir.chmod(stat.S_IRWXU) + + +TARGET_ERROR_SCRIPT = """ +import os +import sys +from pathlib import Path + +import codetracer_python_recorder as recorder + +trace_dir = Path(os.environ["TRACE_DIR"]) +try: + with recorder.trace(trace_dir): + def sample(value: int) -> int: + return value + 1 + sample(1) +except recorder.TargetError as exc: + print(f"caught {exc.__class__.__name__} {getattr(exc, 'code', '')}") + sys.exit(0) +except Exception as exc: # pragma: no cover - defensive logging for debugging + print(f"unexpected {type(exc).__name__}") + sys.exit(2) +else: + print("no-error") + sys.exit(1) +""" + + +def test_target_error_integration(tmp_path: Path) -> None: + env = os.environ.copy() + env["CODETRACER_TEST_INJECT_FAILURE"] = "target-args" + trace_dir = tmp_path / "target-trace" + env["TRACE_DIR"] = str(trace_dir) + + result = _run_python_script(tmp_path, TARGET_ERROR_SCRIPT, env=env) + if result.returncode != 0 and "no-error" in result.stdout: + pytest.skip("recorder built without integration-test hooks") + assert result.returncode == 0, result.stderr + assert "TargetError" in result.stdout + assert "ERR_TRACE_INCOMPLETE" in result.stdout + + +PANIC_SCRIPT = """ +import json +import os +import sys +from pathlib import Path + +import codetracer_python_recorder as recorder + +trace_dir = Path(os.environ["TRACE_DIR"]) +try: + with recorder.trace(trace_dir): + def probe() -> int: + return sum(range(5)) + probe() + probe() +except recorder.InternalError as exc: + print(f"caught {exc.__class__.__name__} {getattr(exc, 'code', '')}" ) + sys.exit(0) +except Exception as exc: # pragma: no cover - defensive logging + print(f"unexpected {type(exc).__name__}") + sys.exit(2) +else: + print("no-error") + sys.exit(1) +""" + + +def test_panic_in_callback_maps_to_internal_error(tmp_path: Path) -> None: + env = os.environ.copy() + env["CODETRACER_TEST_INJECT_FAILURE"] = "panic" + env["CODETRACER_JSON_ERRORS"] = "1" + trace_dir = tmp_path / "panic-trace" + env["TRACE_DIR"] = str(trace_dir) + + result = _run_python_script(tmp_path, PANIC_SCRIPT, env=env) + if result.returncode != 0 and "no-error" in result.stdout: + pytest.skip("recorder built without integration-test hooks") + assert result.returncode == 0, result.stderr + assert "InternalError" in result.stdout + stderr_lines = [line for line in result.stderr.splitlines() if line.strip()] + assert any("panic in on_line" in line for line in stderr_lines) + assert any( + line.strip().startswith("{") and json.loads(line).get("error_kind") == "Internal" + for line in stderr_lines + ), result.stderr diff --git a/codetracer-python-recorder/tests/python/test_policy_runtime.py b/codetracer-python-recorder/tests/python/test_policy_runtime.py new file mode 100644 index 0000000..bc42112 --- /dev/null +++ b/codetracer-python-recorder/tests/python/test_policy_runtime.py @@ -0,0 +1,137 @@ +from __future__ import annotations + +import json +import os +import subprocess +import sys +from pathlib import Path + +import pytest + + +def _run_cli(script: Path, *args: str, env: dict[str, str] | None = None) -> subprocess.CompletedProcess[str]: + cmd = [ + sys.executable, + "-m", + "codetracer_python_recorder", + *args, + str(script), + ] + return subprocess.run(cmd, capture_output=True, text=True, env=env, check=False) + + +def test_cli_disable_policy_detaches_on_internal_error(tmp_path: Path) -> None: + script = tmp_path / "app.py" + script.write_text("value = 1\nprint(value)\n") + trace_dir = tmp_path / "trace" + + env = os.environ.copy() + env["CODETRACER_TEST_INJECT_FAILURE"] = "line" + + result = _run_cli( + script, + "--codetracer-trace", + str(trace_dir), + "--codetracer-format", + "json", + "--codetracer-on-recorder-error", + "disable", + env=env, + ) + + assert result.returncode == 0, result.stderr + assert trace_dir.is_dir() + events = trace_dir / "trace.json" + metadata = trace_dir / "trace_metadata.json" + paths = trace_dir / "trace_paths.json" + assert not events.exists() + assert not metadata.exists() + assert not paths.exists() + assert "test-injected failure" in result.stderr + + +def test_cli_abort_policy_propagates_internal_error(tmp_path: Path) -> None: + script = tmp_path / "app.py" + script.write_text("value = 1\nprint(value)\n") + trace_dir = tmp_path / "trace" + + env = os.environ.copy() + env["CODETRACER_TEST_INJECT_FAILURE"] = "line" + + result = _run_cli( + script, + "--codetracer-trace", + str(trace_dir), + "--codetracer-format", + "json", + "--codetracer-on-recorder-error", + "abort", + env=env, + ) + + assert result.returncode != 0 + assert trace_dir.is_dir() + assert "test-injected failure" in result.stderr + + +@pytest.mark.skipif(not hasattr(os, "symlink"), reason="symlinks required") +def test_cli_require_trace_fails_when_no_events_recorded(tmp_path: Path) -> None: + script = tmp_path / "real_script.py" + script.write_text("print('ran')\n") + alias = tmp_path / "alias.py" + try: + alias.symlink_to(script) + except OSError as exc: # pragma: no cover - platform dependent branch + pytest.skip(f"symlinks unavailable: {exc}") + + trace_dir = tmp_path / "trace" + env = os.environ.copy() + env["CODETRACER_TEST_INJECT_FAILURE"] = "suppress-events" + + result = _run_cli( + alias, + "--codetracer-trace", + str(trace_dir), + "--codetracer-format", + "json", + "--codetracer-require-trace", + env=env, + ) + + assert result.returncode != 0 + assert trace_dir.is_dir() + assert "requires a trace but no events were recorded" in result.stderr + + +def test_cli_json_errors_emits_trailer(tmp_path: Path) -> None: + script = tmp_path / "app.py" + script.write_text("value = 1\nprint(value)\n") + trace_dir = tmp_path / "trace" + + env = os.environ.copy() + env["CODETRACER_TEST_INJECT_FAILURE"] = "line" + + result = _run_cli( + script, + "--codetracer-trace", + str(trace_dir), + "--codetracer-format", + "json", + "--codetracer-json-errors", + "--codetracer-on-recorder-error", + "abort", + env=env, + ) + + assert result.returncode != 0 + lines = [line for line in result.stderr.splitlines() if line.strip()] + for line in reversed(lines): + if line.lstrip().startswith("{"): + trailer = json.loads(line) + break + else: + raise AssertionError(f"missing JSON error trailer in stderr: {result.stderr!r}") + assert trailer["error_code"] == "ERR_TRACE_INCOMPLETE" + assert trailer["message"].startswith("test-injected failure") + assert trailer["run_id"] + assert trailer["trace_id"] diff --git a/codetracer-python-recorder/tests/python/unit/test_backend_exceptions.py b/codetracer-python-recorder/tests/python/unit/test_backend_exceptions.py new file mode 100644 index 0000000..5014b3b --- /dev/null +++ b/codetracer-python-recorder/tests/python/unit/test_backend_exceptions.py @@ -0,0 +1,32 @@ +"""Tests for the Python exception hierarchy exposed by the Rust module.""" +from __future__ import annotations + +import pytest + +import codetracer_python_recorder as codetracer +from codetracer_python_recorder.codetracer_python_recorder import ( + UsageError, + is_tracing, + start_tracing, + stop_tracing, +) + + +@pytest.fixture(autouse=True) +def stop_after() -> None: + yield + if is_tracing(): + stop_tracing() + + +def test_start_tracing_raises_usage_error(tmp_path) -> None: + start_tracing(str(tmp_path), "json", None) + with pytest.raises(UsageError) as excinfo: + start_tracing(str(tmp_path), "json", None) + err = excinfo.value + assert getattr(err, "code") == "ERR_ALREADY_TRACING" + assert "tracing already active" in str(err) + + +def test_exception_reexport_matches_underlying_type() -> None: + assert codetracer.UsageError is UsageError diff --git a/codetracer-python-recorder/tests/python/unit/test_policy_configuration.py b/codetracer-python-recorder/tests/python/unit/test_policy_configuration.py new file mode 100644 index 0000000..3e62961 --- /dev/null +++ b/codetracer-python-recorder/tests/python/unit/test_policy_configuration.py @@ -0,0 +1,106 @@ +"""Unit tests for runtime policy configuration.""" +from __future__ import annotations + +from pathlib import Path + +import pytest + +import codetracer_python_recorder as codetracer +from codetracer_python_recorder import session as session_api + + +@pytest.fixture(autouse=True) +def reset_policy() -> None: + codetracer.configure_policy( + on_recorder_error="abort", + require_trace=False, + keep_partial_trace=False, + log_level="", + log_file="", + json_errors=False, + ) + yield + codetracer.configure_policy( + on_recorder_error="abort", + require_trace=False, + keep_partial_trace=False, + log_level="", + log_file="", + json_errors=False, + ) + + +def test_configure_policy_overrides_values(tmp_path: Path) -> None: + target_log = tmp_path / "recorder.log" + codetracer.configure_policy( + on_recorder_error="disable", + require_trace=True, + keep_partial_trace=True, + log_level="info", + log_file=str(target_log), + json_errors=True, + ) + + snapshot = codetracer.policy_snapshot() + assert snapshot["on_recorder_error"] == "disable" + assert snapshot["require_trace"] is True + assert snapshot["keep_partial_trace"] is True + assert snapshot["log_level"] == "info" + assert snapshot["log_file"] == str(target_log) + assert snapshot["json_errors"] is True + + +def test_policy_env_configuration(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None: + monkeypatch.setenv("CODETRACER_ON_RECORDER_ERROR", "disable") + monkeypatch.setenv("CODETRACER_REQUIRE_TRACE", "true") + monkeypatch.setenv("CODETRACER_KEEP_PARTIAL_TRACE", "1") + monkeypatch.setenv("CODETRACER_LOG_LEVEL", "debug") + log_file = tmp_path / "policy.log" + monkeypatch.setenv("CODETRACER_LOG_FILE", str(log_file)) + monkeypatch.setenv("CODETRACER_JSON_ERRORS", "yes") + + codetracer.configure_policy_from_env() + + snapshot = codetracer.policy_snapshot() + assert snapshot["on_recorder_error"] == "disable" + assert snapshot["require_trace"] is True + assert snapshot["keep_partial_trace"] is True + assert snapshot["log_level"] == "debug" + assert snapshot["log_file"] == str(log_file) + assert snapshot["json_errors"] is True + + +def test_clearing_log_configuration(tmp_path: Path) -> None: + codetracer.configure_policy(log_level="debug", log_file=str(tmp_path / "log.txt")) + codetracer.configure_policy(log_level="", log_file="") + snapshot = codetracer.policy_snapshot() + assert snapshot["log_level"] is None + assert snapshot["log_file"] is None + + +def test_session_start_applies_policy_overrides(tmp_path: Path) -> None: + policy = { + "on_recorder_error": "disable", + "log_file": tmp_path / "policy.log", + "json_errors": True, + } + + session = session_api.start(tmp_path, policy=policy, apply_env_policy=False) + try: + snapshot = codetracer.policy_snapshot() + assert snapshot["on_recorder_error"] == "disable" + assert snapshot["log_file"] == str(tmp_path / "policy.log") + assert snapshot["json_errors"] is True + finally: + session.stop() + + +def test_session_start_respects_env_policy(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None: + monkeypatch.setenv("CODETRACER_REQUIRE_TRACE", "true") + + session = session_api.start(tmp_path) + try: + snapshot = codetracer.policy_snapshot() + assert snapshot["require_trace"] is True + finally: + session.stop() diff --git a/design-docs/capture-output.md b/design-docs/capture-output.md new file mode 100644 index 0000000..efa4f03 --- /dev/null +++ b/design-docs/capture-output.md @@ -0,0 +1,204 @@ +# ADR: Non-invasive stdout/stderr/stdin capture and line-level mapping for the Python tracer (PyO3 + runtime_tracing) + +**Status**: Accepted +**Date**: 2025-10-01 +**Owners**: Tracing/Recorder team +**Scope**: Recorder runtime (Rust/PyO3) and Python instrumentation glue + +--- + +## Context + +- The user-facing CLI remains `python -m codetracer_python_recorder`. +- On startup the CLI parses recorder flags, adjusts `sys.argv`, and calls `codetracer_python_recorder.start(...)`, which delegates to the Rust extension. +- The Rust side already installs a `RuntimeTracer` implementation that subscribes to `sys.monitoring` callbacks and writes runtime_tracing artifacts (`trace.bin`/`trace.json`, `trace_metadata.json`, `trace_paths.json`). +- We still lack non-invasive capture of stdout/stderr/stdin that aligns each chunk with the trace stream for replay. +- The original draft assumed the script would be executed from Rust; in practice `runpy.run_path` stays inside the Python entrypoint, so the design must integrate with that lifecycle. +- The runtime_tracing crate (`TraceWriter`, `TraceLowLevelEvent::Event`, `EventLogKind::*`) already provides primitives for persisting arbitrary IO events; we will rely on it rather than introducing a bespoke artifact. + +--- + +## Decision + +1. **Lifecycle & CLI compatibility** + Retain the existing Python launcher. `python -m codetracer_python_recorder script.py` continues to prepare `sys.argv`, call `start(...)`, execute `runpy.run_path`, and finally `stop()`. Capture plumbing lives entirely inside `start()`/`stop()`, keeping the public API stable. + +2. **FD-level capture component** + Within `start_tracing` we instantiate an `OutputCapture` controller that duplicates the original stdin/stdout/stderr descriptors (or Windows handles), installs pipes in their place, and spawns draining threads. The Python script still runs in-process, but every write to fd 1/2 is diverted through the controller. Each chunk receives a monotonic timestamp before being queued, and the bytes are simultaneously mirrored back to the preserved descriptors so users continue to see live console output. + +3. **stdin strategy** + The controller exposes a write-end that the CLI feeds. By default we mirror the user's real stdin into the pipe (teeing so we can persist what was provided) and close it on EOF. Later scripted input can use the same interface. + +4. **runtime_tracing integration** + `RuntimeTracer` keeps sole ownership of the `NonStreamingTraceWriter`. We wrap it in `Arc>` so the FD reader threads can call `TraceWriter::add_event(TraceLowLevelEvent::Event(...))`. Each chunk becomes an `EventLogKind::Write` (stdout), `WriteOther` (stderr), or `Read` (stdin) record whose `metadata` is a JSON document (stream, thread ids when known, activation state, step reference, timestamps) and whose `content` carries the base64-encoded bytes. IO data therefore lives in the same `trace.bin`/`trace.json` file as step and call events. + +5. **Line attribution** + `RuntimeTracer` already handles `LINE` and `PY_*` monitoring callbacks. We extend it to track the most recent `Step` per Python thread and expose a `Snapshot` API returning `(path_id, line, call_key)`. When an IO chunk arrives we fetch the latest snapshot for the emitting Python thread (or fall back to the global latest step if the writer thread is unknown) and include it in the metadata so replay tooling can align output with execution. + +6. **Recorder logging isolation** + Logger initialisation continues to route Rust `log` output to the preserved original stderr handle before redirection. Capture threads never use `println!`; they rely on the logger to avoid contaminating user streams. + +7. **Buffering & flushing** + After installing the pipes we request line buffering on `sys.stdout`/`sys.stderr` when possible. On `stop()` we drain pending chunks, restore the original descriptors, and finish the runtime_tracing writer (`finish_writing_trace_*`). + +--- + +## Alternatives Considered + +- Monkey-patching `sys.stdout`/`sys.stderr`: misses native writes and conflicts with user overrides. +- Spawning a subprocess wrapper around the script: breaks the in-process monitoring story and changes CLI semantics. +- Emitting IO to a bespoke JSON artifact: unnecessary because runtime_tracing already models IO events. +- Deferring IO capture to a UI component: prevents parity with existing CodeTracer replay capabilities. + +--- + +## Consequences + +- **Pros** + - IO, monitoring events, and metadata share the runtime_tracing stream, so downstream tooling continues to work. + - Users keep invoking the CLI exactly the same way. + - FD duplication captures writes coming from both Python and native extensions; recorder logging stays isolated. + +- **Cons / Risks** + - `NonStreamingTraceWriter` must be guarded for cross-thread use; we need to validate performance and correctness when `add_event` is called from background threads. + - Mapping IO chunks to the exact Python thread is best-effort because file descriptors do not expose thread identity. + - Reader threads must stay ahead of producers to avoid filling pipe buffers; shutdown paths must handle long-running readers. + +--- + +## Detailed Design + +### A. Execution lifecycle + +1. **CLI (Python)** + - Parse recorder flags, resolve `script_path`, choose format/path, call `start(trace_dir, format, start_on_enter=script_path)`. + +2. **`start_tracing` (Rust)** + - Initialise logging and guard against nested sessions. + - Ensure the trace directory exists and choose event/meta/path filenames. + - Instantiate `RuntimeTracer` with the requested format and activation path. + - Wrap the tracer in `Arc>` exposing `emit_io_chunk(...)`. + - Construct `OutputCapture`: + - Duplicate original fds/handles and store them for restoration and logger use. + - Create pipes (Unix: `pipe2`; Windows: `CreatePipe`). + - Redirect 0/1/2 via `dup2`/`SetStdHandle`. + - Spawn reader threads that block on the pipe, timestamp (`Instant::now()`), gather OS thread id when available, and push `IoChunk { stream, bytes, ts, os_thread }` into a channel while forwarding the same bytes to the saved descriptors to maintain passthrough console behaviour. + +3. **Tracing activation** + - Install `sys.monitoring` callbacks with `install_tracer(py, Box::new(runtime_tracer_clone))`. + - Start a draining thread that consumes `IoChunk`, resolves Python thread ids via a shared `DashMap` maintained by `RuntimeTracer` on `PY_START/PY_RESUME`, and calls `emit_io_chunk`. + +4. **Python script execution** + - Back in Python, `runpy.run_path(str(script_path), run_name="__main__")` runs the target script. IO already flows through the capture pipelines. + +5. **Shutdown** +- On normal completion or exception: + - Close writer handles to signal EOF. + - Join reader/draining threads with a timeout guard. + - Restore original descriptors/handles. + - Call `flush()`/`finish()` on `RuntimeTracer` and release the monitoring tool id. + +### B. Encoding IO chunks with runtime_tracing + +- `emit_io_chunk` composes a metadata JSON document similar to: + +```json +{ + "stream": "stdout", + "encoding": "base64", + "ts_ns": 1234567890123, + "os_thread": 140355679779840, + "py_thread": 123, + "step": { "path_id": 5, "line": 42, "call_key": 12 }, + "activation": "active" +} +``` + +- The payload is `base64::encode(chunk.bytes)`. +- We invoke `TraceWriter::add_event(TraceLowLevelEvent::Event(RecordEvent { kind, metadata, content }))`, where `kind` is `EventLogKind::Write` for stdout, `WriteOther` for stderr, and `Read` for stdin. +- This keeps IO data inside `trace.bin`/`trace.json`, allowing replay tools to process it alongside steps and returns. + +### C. Mapping algorithm + +- Maintain `DashMap` updated on every LINE event. A snapshot stores `(path_id, line, call_key, perf_counter_ns)`. +- When an IO chunk arrives: + - Map OS thread id to Python thread id if possible; otherwise leave it null. + - Retrieve the latest snapshot for that Python thread. If the chunk timestamp predates the snapshot by more than a configurable threshold, fall back to the global latest snapshot. + - Embed the snapshot in the metadata JSON. +- For multi-line outputs the drainers keep chunk boundaries (newline-aware if the writer flushes per line) so replay can group contiguous chunks with identical snapshots. + +--- + +## Implementation Notes (Unix/Windows) + +- **Unix**: rely on the `nix` crate for `pipe2`, `dup`, `dup2`, and `fcntl` (set CLOEXEC). Reader threads use blocking `read` and propagate EOF by pushing a terminal message. +- **Windows**: use `CreatePipe`, `SetStdHandle`, `DuplicateHandle`, and `ReadFile` in overlapped mode if needed. Convert UTF-16 console output to UTF-8 before encoding. Ensure we re-register the original handles via `SetStdHandle` on teardown. Forwarding threads write mirrored bytes via the saved handles (`WriteFile`) so console output remains visible. +- PTY support (for interactive shells) can be layered later with `openpty`/ConPTY; initial scope sticks to pipes. + +--- + +## Telemetry & Logging + +- Keep `env_logger` initialisation on module import. Default to debug for the recorder crate so developers can inspect capture lifecycle logs. +- Emit debug logs (`install`, `chunk`, `drain_exit`, `restore`) to the preserved stderr handle. Emit failures as `EventLogKind::Error` events with diagnostic metadata when feasible. + +--- + +## Key Files + +- `codetracer-python-recorder/src/lib.rs`: orchestrates tracer startup/shutdown, trace directory provisioning, and will host creation of the `OutputCapture` component. +- `codetracer-python-recorder/src/runtime_tracer.rs`: owns the `NonStreamingTraceWriter`, handles `sys.monitoring` callbacks, and will expose APIs for emitting IO events plus step snapshots. +- `codetracer-python-recorder/src/tracer.rs`: manages registration with `sys.monitoring`; may require adjustments to share thread metadata with the capture pipeline. +- `codetracer-python-recorder/codetracer_python_recorder/__main__.py`: CLI entrypoint that invokes `start()`/`stop()`; may need updates for new flags or environment toggles. +- `codetracer-python-recorder/codetracer_python_recorder/api.py`: Python façade over the Rust extension, coordinating session lifecycle and flushing semantics. +- `codetracer-python-recorder/src/output_capture.rs` (new): encapsulates platform-specific descriptor duplication, pipe management, mirroring, and reader threads. +- `codetracer-python-recorder/tests/` (integration tests): will gain coverage asserting IO events appear in traces and that console passthrough remains functional. + +--- + +## Testing Plan + +1. **Unit / small integration** + - Scripts emitting stdout/stderr; assert generated traces contain `EventLogKind::Write`/`WriteOther` records with correct base64 payloads. + - Validate metadata JSON includes expected `path_id` and `line` for simple cases. + +2. **Concurrency** + - Multi-threaded printing to ensure no deadlocks and that chunks remain ordered. + - `asyncio` tasks writing concurrently; confirm snapshots continue to resolve. + +3. **Input** + - `input()` and `sys.stdin.read()` scenarios; ensure `EventLogKind::Read` captures bytes and EOF. + - Passthrough from the real stdin preserves exact bytes. + +4. **Large output & stress** + - Emit payloads larger than the pipe buffer to validate continuous draining. + - Rapid start/stop cycles ensure descriptors restore cleanly. + +5. **Windows** + - Mirror the above coverage on Windows CI runners, focusing on handle restoration and CRLF handling. + +--- + +## Rollout + +- Gate the feature behind an environment flag (`CODETRACER_CAPTURE_IO=1`) for early adopters, removing it after validation. +- Update CLI help text to mention stdout/stderr/stdin capture. +- Add regression tests driven by `just test` that assert on IO events in traces generated from the examples. + +--- + +## Open Questions / Future Work + +- Improve thread attribution by integrating with Python `threading` hooks if OS-level mapping proves insufficient. +- Allow configurable passthrough (e.g., disable mirroring when running headless or redirect to a file) once the default teeing behaviour is in place. +- Investigate PTY support for interactive applications that expect terminal semantics. +- Consider compressing large payloads before base64 encoding to reduce trace sizes. + +--- + +## Acceptance Criteria + +- Running `python -m codetracer_python_recorder script.py` produces runtime_tracing files containing IO events alongside existing step/call records. +- stdout, stderr, and stdin bytes are captured losslessly and attributed to the most relevant step snapshot. +- Original descriptors are restored even on exceptions or early exits. +- Reader threads terminate cleanly without leaks on Unix and Windows. diff --git a/design-docs/error-handling-implementation-plan.md b/design-docs/error-handling-implementation-plan.md index 0abff33..608a499 100644 --- a/design-docs/error-handling-implementation-plan.md +++ b/design-docs/error-handling-implementation-plan.md @@ -7,25 +7,28 @@ - Provide developers with ergonomic macros, tooling guardrails, and comprehensive tests covering failure paths. ## Current Gaps -- Ad-hoc `PyRuntimeError` strings in `src/session.rs:21-76` and `src/runtime/mod.rs:77-190` prevent stable categorisation and user scripting. -- FFI trampolines in `src/monitoring/tracer.rs:268-706` and activation helpers in `src/runtime/activation.rs:24-83` still use `unwrap`/`expect`, so poisoned locks or filesystem errors abort the interpreter. -- Python facade functions (`codetracer_python_recorder/session.py:27-63`) return built-in exceptions and provide no context or exit codes. -- No support for JSON diagnostics, policy switches, or atomic output staging; disk failures can leave half-written traces and logs mix stdout/stderr. +- PyO3 entry points still surface plain Python exceptions; `codetracer_python_recorder/session.py:27-63` raises built-in errors without stable codes or hierarchy. +- Monitoring trampolines continue to rely on `GLOBAL.lock().unwrap()` and lack structured error propagation; poisoned mutexes will still panic. +- Policy toggles (`--on-recorder-error`, `--keep-partial-trace`, logging controls) and JSON diagnostics remain unimplemented. +- Atomic output staging is limited to the existing non-streaming writer; detaching/abort semantics are not yet centralised. ## Workstreams ### WS1 – Foundations & Inventory +**Status:** In progress (2025-10-02). `just errors-audit` added; call sites catalogued in the accompanying status log. - Add a `just errors-audit` command that runs `rg` to list `PyRuntimeError`, `unwrap`, `expect`, and direct `panic!` usage in the recorder crate. - Create issue tracker entries grouping call sites by module (`session`, `runtime`, `monitoring`, Python facade) to guide refactors. - Exit criteria: checklist of legacy error sites recorded with owners. ### WS2 – `recorder-errors` Crate +**Status:** Completed (2025-10-02). Crate scaffolded with central types, macros, and unit tests; workspace updated to include it. - Scaffold `recorder-errors` under the workspace with `RecorderError`, `RecorderResult`, `ErrorKind`, `ErrorCode`, context map type, and conversion traits from `io::Error`, `PyErr`, etc. - Implement ergonomic macros (`usage!`, `enverr!`, `target!`, `bug!`, `ensure_*`) plus unit tests covering formatting, context propagation, and downcasting. - Publish crate docs explaining mapping rules and promises; link ADR 0004. - Exit criteria: `cargo test -p recorder-errors` covers all codes; workspace builds with the new crate. ### WS3 – Retrofit Rust Modules +**Status:** Completed (2025-10-02). Core Rust modules now emit `RecorderError` instances; PyO3 entry points map them through `to_py_err` with stable codes and context. - Replace direct `PyRuntimeError` construction in `src/session/bootstrap.rs`, `src/session.rs`, `src/runtime/mod.rs`, `src/runtime/output_paths.rs`, and helpers with `RecorderResult` + macros. - Update `RuntimeTracer` to propagate structured errors instead of strings; remove `expect`/`unwrap` in hot paths by returning classified `bug!` or `enverr!` failures. - Introduce a small adapter in `src/runtime/mod.rs` that stages IO writes and applies the atomic/partial policy described in ADR 0004. @@ -52,12 +55,15 @@ ### WS7 – Test Coverage & Tooling Enforcement - Add unit tests for the new error crate, IO façade, policy switches, and FFI wrappers (panic capture, exception mapping). - Extend Python tests to cover the new exception hierarchy, JSON diagnostics, and policy flags. +- Add backend integration coverage for every `RecorderError` kind (exercise unwritable directories for `EnvironmentError`, force a mocked `capture_call_arguments` failure for `TargetError`, and trigger a panic inside a wrapped callback to assert `InternalError` mapping and log capture). +- Add regression tests that hit the Rust tracer on real scripts (validate IO context metadata from `TraceOutputPaths::configure_writer`, ensure repeated start/stop cycles leave no partial artefacts, and confirm debug logs stay quiet when `RUST_LOG` is unset). - Introduce CI lints (`cargo clippy --deny clippy::panic`, custom script rejecting `unwrap` outside allowed modules) and integrate with `just lint`. - Exit criteria: CI blocks regressions; failure-path tests cover disk full, permission denied, target exceptions, partial trace recovery, and SIGINT during detach. ### WS8 – Documentation & Rollout - Update README, API docs, and onboarding material to describe guarantees, exit codes, example snippets, and migration guidance for downstream tools. - Add a change log entry summarising the policy and how to consume structured errors from Python. +- Document assertion guidance: prefer `bug!`/`ensure_internal!` for invariant violations, reserve `assert!` for tests, and pair `debug_assert!` with the classified error when you need both dev-time tripwires and production containment. - Track adoption status in `design-docs/error-handling-implementation-plan.status.md` (mirror existing planning artifacts). - Exit criteria: Documentation merged, status file created, ADR 0004 promoted to **Accepted** once WS2–WS7 land. @@ -89,4 +95,3 @@ - All panics are caught before crossing into Python; fuzz tests confirm no UB. - `just test` (and targeted error suites) pass on Linux/macOS CI, with new structured logs and metrics visible. - Documentation reflects guarantees, and downstream teams acknowledge new exit codes. - diff --git a/design-docs/error-handling-implementation-plan.status.md b/design-docs/error-handling-implementation-plan.status.md new file mode 100644 index 0000000..199b233 --- /dev/null +++ b/design-docs/error-handling-implementation-plan.status.md @@ -0,0 +1,76 @@ +# Error Handling Implementation Plan — Status + +_Last updated: 2025-10-05_ + +## WS1 – Foundations & Inventory +State: In progress +Tooling: `just errors-audit` (finds `PyRuntimeError::new_err`, `unwrap`/`expect`/`panic!`, Python `RuntimeError`/`ValueError`). +What we saw: +- Rust modules now emit `RecorderError`; raw Python exceptions survive in `codetracer_python_recorder/session.py` and tests (ISSUE-014). +- `src/monitoring/tracer.rs` still uses `lock().unwrap()` and lacks error reporting for callback failures (ISSUE-013). +- Python glue keeps legacy assertions/unwraps (ISSUE-012). +Next moves: +- Land ISSUE-013 to sort the locking story. +- Plan the Python facade cleanup (ISSUE-014) once WS4 is steady. + +## WS2 – `recorder-errors` Crate +State: Done (2025-10-02) +Highlights: +- Added `crates/recorder-errors` with `RecorderError`, enums, context helpers, macros (`usage!`, `enverr!`, `target!`, `bug!`, `ensure_*`), plus tests and optional serde support. +- `cargo test -p recorder-errors` + workspace `cargo check` stay green. +Next moves: Use this crate everywhere in WS3/WS4 work. + +## WS3 – Retrofit Rust Modules +State: Done (2025-10-02) +Highlights: +- `session/*`, `runtime/*`, and `monitoring/tracer.rs` now return `RecorderError` via the shared macros. +- Python exposure happens through one `errors` mapper; IO errors now carry context. +- No stray `PyRuntimeError::new_err` left outside that mapper. +Next moves: Feed findings into WS4 and loop back to WS1 issues. + +## WS4 – FFI Wrapper & Python Exception Hierarchy +State: Done (2025-10-02) +Highlights: +- Added `ffi` guard around each PyO3 entry point to map `RecorderError` plus panic safety. +- Exposed Python classes `RecorderError`, `UsageError`, `EnvironmentError`, `TargetError`, `InternalError`. +- Rust and Python tests cover the new flow (`uv run cargo nextest run ...`; `.venv/bin/python -m pytest ...`). +Next moves: Hold for WS5 until ISSUES 013/014 close. + +## WS5 – Policy Switches & Runtime Configuration +State: Done (2025-10-03) +Highlights: +- `TraceSession.start()` and `trace()` now refresh policy from env vars and accept override mappings so embeds wire recorder switches without manual plumbing. +- Rust exports expose `configure_policy`/`configure_policy_from_env` under the expected Python names; unit tests cover env-driven and explicit override flows. +- Runtime tracer finish path honours `RecorderPolicy`: callback errors respect `on_recorder_error` (disable detaches without surfacing exceptions), `require_trace` now fails cleanly when no events land, and partial traces are deleted or retained based on `keep_partial_trace`. +- Python CLI integration tests exercise disable vs abort paths and require-trace enforcement using the new failure-injection toggles; CLI now propagates runtime shutdown errors so exit codes reflect policy outcomes while partial traces are cleaned per configuration. +Next moves: Kick off WS6 once upstream WS1 cleanups land. + +## WS6 – Logging, Metrics, and Diagnostics +State: Done (2025-10-03) +Highlights: +- Replaced the `env_logger` helper with a structured JSON logger that always emits `run_id`, active `trace_id`, and `error_code` fields while honouring policy-driven log level and log file overrides. +- Introduced a pluggable `RecorderMetrics` sink and instrumented dropped locations, policy-triggered detachments, and caught panics across the monitoring/runtime paths; Rust unit tests exercise the metrics capture. +- Enabled the `--json-errors` policy path so runtime shutdown emits a single-line JSON trailer on stderr; CLI integration tests now assert the abort flow surfaces the trailer alongside existing stack traces. +Next moves: Wire the metrics sink into the chosen exporter and align the log schema with Observability consumption before rolling out to downstream tooling. + +## WS7 – Test Coverage & Tooling Enforcement +State: Done (2025-10-04) +Highlights: +- Expanded `recorder-errors` and policy unit tests covering every macro (usage/target/internal ensures) plus invalid boolean parsing. +- Added FFI unit tests for `dispatch`/`wrap_pyfunction`, panic containment, and Python exception attribute propagation. +- Introduced integration coverage for environment permission failures, injected target argument capture errors, and synthetic callback panics (verifying JSON trailers and error classes). +- Implemented `just lint` orchestration running `cargo clippy -D clippy::panic` and a repository script that blocks unchecked `.unwrap(` usage outside the legacy allowlist. +Next moves: Monitor unwrap allowlist shrinkage once WS1 follow-ups land; evaluate extending the lint to `.expect(` once monitoring refactor closes. + +## WS8 – Documentation & Rollout +State: Done (2025-10-05) +Highlights: +- README now covers the recorder error policy, JSON trailers, exit codes, and a short Python `RecorderError` catch example. +- Added `docs/onboarding/error-handling.md` with migration steps, policy wiring tips, and assertion rules for contributors. +- Started `codetracer-python-recorder/CHANGELOG.md` to brief downstream tools on consuming structured errors. +Next moves: +- Share the onboarding doc with downstream maintainers and collect gaps before promoting ADR 0004 to **Accepted**. +- Fold feedback into the change log before the next release tag. + +## Upcoming Workstreams +- None. Hold for ADR 0004 promotion once downstream validation wraps up. diff --git a/docs/onboarding/error-handling.md b/docs/onboarding/error-handling.md new file mode 100644 index 0000000..0fdaacd --- /dev/null +++ b/docs/onboarding/error-handling.md @@ -0,0 +1,58 @@ +# Recorder Error Handling Onboarding + +This note aligns new contributors and downstream consumers on the structured error work. Keep it close when you wire the recorder into tools or review patches that touch failure paths. + +## Error classes at a glance +- `RecorderError` is the base class. Subclasses are `UsageError`, `EnvironmentError`, `TargetError`, and `InternalError`. +- Every instance exposes `code` (an `ERR_*` string), `kind` (matches the class), and a `context` dict with string keys. +- Codes stay stable. Add new codes instead of recycling strings. +- The Rust layer also attaches a source error when possible; Python reprs show it as `caused by ...`. + +## Python API quick start +```python +from codetracer_python_recorder import RecorderError, TargetError, start, stop + +try: + session = start("/tmp/trace", format="json") +except RecorderError as err: + print(f"Recorder failed: {err.code}") + for key, value in err.context.items(): + print(f" {key}: {value}") +else: + try: + ... # run traced work here + finally: + session.flush() + stop() +``` +- Catch `RecorderError` when you want a single guard. Catch subclasses when you care about `UsageError` vs `TargetError`. +- Calling `start` twice raises `RuntimeError` from a thin Python guard. Everything after the guard uses `RecorderError`. + +## CLI workflow and JSON trailers +- Run `python -m codetracer_python_recorder --codetracer-format=json app.py` to trace a script. +- Exit codes: `0` for success, script exit code when the script stops itself, `1` when a `RecorderError` escapes startup/shutdown, `2` on CLI misuse. +- Pass `--codetracer-json-errors` (or `configure_policy(json_errors=True)`) to mirror each failure as a one-line JSON object on stderr. +- JSON fields: `run_id`, optional `trace_id`, `error_code`, `error_kind`, `message`, `context`. + +## Migration checklist for existing clients +1. Replace `RuntimeError` / string matching with `RecorderError` + `err.code` checks. +2. Forward policy options through `configure_policy` (or `policy_snapshot`) instead of reinventing env parsing. +3. Expect structured log lines on stderr. Parse JSON and read the `error_code` field. +4. Opt in to JSON trailers when you need machine-readable failure signals. +5. Keep CLI wrappers short. Avoid reformatting the recorder message; attach extra context alongside it. + +## Assertion rules for recorder code +- Use `ensure_usage!`, `ensure_env!`, or `ensure_internal!` when translating invariants into classified failures. +- Reach for `bug!` when you hit a state that should never happen in production. +- Reserve `assert!` and `debug_assert!` for tests or temporary invariants. If you need a dev-only guard, combine `debug_assert!` with the matching `ensure_*` call so production still fails cleanly. +- Never reintroduce `.unwrap()` inside the recorder crate without extending the allowlist. Use the macros instead. + +## Tooling guardrails +- Run `just lint` before sending a patch. It runs Clippy with `-D clippy::panic` and our unwrap scanner. +- Run `just test` to exercise Rust (nextest) and Python suites. Failure injections cover permission errors, target crashes, and panic paths. +- Enable the `integration-test` cargo feature when you add new Python surface tests so the Rust hooks are active. +- When in doubt, add a regression test alongside the docs. The plan treats docs plus tests as the definition of done. + +## Need help? +- Check `design-docs/error-handling-implementation-plan.md` for context and open questions. +- Ping the error-handling working thread if a new code or policy toggle seems missing. The goal is to keep `RecorderError` exhaustive, not to fork ad hoc enums in downstream tools. diff --git a/issues.md b/issues.md index f60f9ef..a97250e 100644 --- a/issues.md +++ b/issues.md @@ -1,63 +1,69 @@ -# General Issues +## ISSUE-011 +### Description +Consolidate session/bootstrap error handling while migrating to the central +`RecorderError` façade. Current call sites return raw `PyRuntimeError` strings +without classification: +- `src/session.rs:26` +- `src/session/bootstrap.rs:71`, `:79`, `:92` -# Issues Breaking Declared Relations +### Definition of Done +- Replace the call sites above with `RecorderResult` + structured error codes. +- Python facade work is tracked under ISSUE-014. +- Unit tests cover usage/environment error variants for session startup. -This document lists concrete mismatches that cause the relations in `relations.md` to fail. +### Status +Completed via WS3 (2025-10-02) -It should be structured like so: -```md -## REL-001 -### ISSUE-001-001 -#### Description -Blah blah blah -#### Proposed solution -Blah blah bleh -### ISSUE-001-002 -... +## ISSUE-012 +### Description +Retrofit runtime helpers and value capture to use classified errors. A number +of hotspots still return bare `PyRuntimeError` or rely on `unwrap`/`expect`: +- `src/runtime/mod.rs:125`, `:159`, `:854` +- `src/runtime/frame_inspector.rs:65`, `:85`, `:99`, `:117`, `:122`, `:129`, + `:134`, `:141`, `:148` +- `src/runtime/value_capture.rs:43`, `:62` +- `src/runtime/value_encoder.rs:70-71` +- `src/runtime/activation.rs:24` -## REL-002 -... -``` +### Definition of Done +- All sites above return `RecorderResult` with stable `ErrorCode`s; unwraps are + replaced by guarded conversions or `bug!` macros. +- Runtime tracer IO paths adopt the atomic write façade defined in ADR 0004. +- Regression tests cover failure paths (missing locals/globals, encoding + mismatches). -## ISSUE-009 +### Status +Open + + +## ISSUE-013 ### Description -Unify list/sequence `lang_type` naming across recorders. The Rust tracer now -emits `TypeKind::Seq` with name "List" for Python `list`, while the -pure-Python recorder uses "Array". This divergence can fragment the trace -schema and complicate downstream consumers. +Harden monitoring/FFI plumbing around `GLOBAL` and tool management. The module +still uses `lock().unwrap()` and direct `PyRuntimeError`: +- `src/monitoring/tracer.rs:268`, `:270`, `:366`, `:432-706` +- `src/monitoring/mod.rs:123` ### Definition of Done -- Both recorders emit the same `lang_type` for Python list values. -- Fixtures and docs/spec are updated to reflect the chosen term. -- Cross-recorder tests pass with consistent types. - -### Proposed solution -- We will use "List" in order to match existing Python nomenclature +- Replace `unwrap` with fallible guard handling (`RecorderError` + policy). +- Install global panic-catching wrappers so monitoring callbacks never unwind + into CPython. +- Add integration tests simulating poisoned mutexes and double-install calls. ### Status -Low priority. We won't work on this unless it blocks another issue. +Open -## ISSUE-010 +## ISSUE-014 ### Description -Clarify scope of dict structural encoding and key typing. The current change -encodes any Python `dict` as a `Sequence` of `(key, value)` tuples and falls -back to generic encoding for non-string keys. Repo rules favor fail-fast over -defensive fallbacks, and ISSUE-008 focused specifically on `**kwargs`. +Introduce structured Python exception hierarchy for user-facing APIs. The +module still raises built-ins: +- `codetracer_python_recorder/session.py:34`, `:57`, `:62` ### Definition of Done -- Decide whether structural dict encoding should apply only to kwargs or to all - dict values; document the choice. -- If limited to kwargs, restrict structured encoding to kwargs capture sites. -- If applied generally, define behavior for non-string keys (e.g., fail fast) - and add tests for nested and non-string-key dicts. - -### Proposed solution -- Prefer failing fast on non-string keys in contexts representing kwargs; if - general dict encoding is retained, update the spec and tests and remove the - defensive fallback for key encoding. +- Define `RecorderError` Python base class and map subclasses to error kinds. +- Update API/unit tests to assert the new classes and associated error codes. +- Document upgrade guidance in the README/changelog. ### Status -Low priority. We won't work on this until a user reports that it causes issues. - +Open diff --git a/flake.lock b/nix/flake.lock similarity index 100% rename from flake.lock rename to nix/flake.lock diff --git a/flake.nix b/nix/flake.nix similarity index 100% rename from flake.nix rename to nix/flake.nix