215 changes: 214 additions & 1 deletion .agents/tasks/2025/08/21-0939-codetype-interface
@@ -97,4 +97,217 @@ diff --git a/None b/design-docs/workflow-platform-spec.md

Please address any inline comments on the diff, as well as any additional instructions below.

Now write a PR/FAQ document for this product. If there are important questions whose answers are unclear add them to the FAQ but don't write an answer
--- FOLLOW UP TASK ---
Implement full variable capture in codetracer-python-recorder. Add a comprehensive test suite. Here is the spec for the task and the tests:

Python Tracing Recorder: Capturing All Visible Variables at Each Line

Overview of Python Variable Scopes

In CPython, the accessible variables at a given execution point consist of:

Local variables of the current function or code block (including parameters).

Closure (nonlocal) variables that come from enclosing functions (if any).

Global variables defined at the module level (the current module’s namespace).

(Built-ins are also always accessible if not shadowed, but they are usually not included in “visible variables” snapshots for tracing.)


Each executing frame in CPython carries these variables in its namespace. To capture a snapshot of all variables accessible at a line, we need to inspect the frame’s environment, combining locals, nonlocals, and globals. This must work for any code construct (functions, methods, comprehensions, class bodies, etc.) under CPython.
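As a rough illustration of these three namespaces, the sketch below inspects a frame from pure Python with sys._getframe(); the recorder itself would go through the C API instead. The helper name visible_names and the demo functions outer/inner are invented for this example.

```python
import sys

def visible_names(frame):
    """Split the names visible in `frame` into locals, closure variables, and globals."""
    closure = set(frame.f_code.co_freevars)   # names borrowed from enclosing functions
    local_vars = dict(frame.f_locals)         # copy; also contains bound closure variables
    return {
        "locals": sorted(k for k in local_vars if k not in closure),
        "closure": sorted(k for k in local_vars if k in closure),
        # Skipping dunder names is a choice made for this demo only.
        "globals": sorted(k for k in frame.f_globals if not k.startswith("__")),
    }

def outer(x):
    def inner(z):
        w = x + z                              # x comes from the enclosing function
        return visible_names(sys._getframe())
    return inner(5)

scopes = outer(2)
print(scopes["locals"], scopes["closure"])
```

Built-ins are deliberately left out here, matching the note above that they are usually not part of a snapshot.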

Using the CPython C API (via PyO3) to Get Variables

1. Access the current frame: The sys.monitoring API’s line event callback does not directly provide a frame object. We can obtain the current PyFrameObject via the C API. For example, using PyO3’s FFI, you can call PyThreadState_GetFrame(PyThreadState_Get()) to get a strong reference to the current frame object. (This yields the top-of-stack frame – if your callback is a C function, that should be the frame of the user code. If your callback is a Python function, you may need frame.f_back to get the user code’s frame.)

2. Get all local and closure variables: Once you have the PyFrameObject *frame, retrieve the frame’s local variables mapping. In Python 3.13+ (PEP 667), frame.f_locals on a function frame is a write-through proxy that reflects both local variables and any closure (cell/free) variables with their current values; in Python 3.12 and earlier it is a dict that CPython refreshes from the fast locals each time it is accessed. In C, you can use PyFrame_GetLocals(frame) (available since 3.11) to get this mapping as a new reference. For a stable snapshot (independent copy), convert it to a real dictionary: create a new dict and update it from the mapping (for example with PyDict_Update), which captures the values at that moment. Note that PyFrame_GetLocalsCopy was only proposed in PEP 558, which was withdrawn in favor of PEP 667, so it is not available in released CPython.

3. Get global variables: The frame’s globals are in frame.f_globals. You can obtain this dictionary via PyFrame_GetGlobals(frame). This is the module’s global namespace. For completeness, you may copy it to avoid future mutations, but since it’s a regular dict (not an optimized locals proxy), reading it directly is fine.

4. Combine into a snapshot: The union of keys from the locals/closure snapshot and the globals dictionary constitutes all names accessible in that scope. In practice, you might present them separately (like debuggers do, showing locals vs. globals). But if needed, you can merge them (with locals overriding globals on name conflicts, as Python name resolution would). Each variable’s value can then be serialized or truncated as required.
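The four steps above can be sketched in pure Python with a sys.monitoring LINE callback (Python 3.12+). This is only an illustration of the flow, not the recorder's Rust/PyO3 implementation: record_line_snapshots, the tool name "snapshot-demo", and the choice to skip dunder globals are assumptions made for the example.

```python
import sys

def record_line_snapshots(func, *args):
    """Run func(*args); return (result, [(line, locals_copy, merged_view), ...])."""
    mon = sys.monitoring                       # only exists on Python 3.12+
    target = func.__code__
    snapshots = []

    def on_line(code, line_number):
        # Step 1: the callback only gets (code, line_number). Because this
        # callback is a Python function, the monitored frame is its caller.
        frame = sys._getframe(1)
        # Step 2: copy locals (closure variables included) into a plain dict.
        local_vars = dict(frame.f_locals)
        # Step 3: read globals (dunder names skipped for brevity).
        global_vars = {k: v for k, v in frame.f_globals.items()
                       if not k.startswith("__")}
        # Step 4: merge, letting locals override globals on name conflicts.
        snapshots.append((line_number, local_vars, {**global_vars, **local_vars}))

    tool_id = next(i for i in range(6) if mon.get_tool(i) is None)  # first free tool id
    mon.use_tool_id(tool_id, "snapshot-demo")
    mon.register_callback(tool_id, mon.events.LINE, on_line)
    mon.set_local_events(tool_id, target, mon.events.LINE)
    try:
        result = func(*args)
    finally:
        mon.set_local_events(tool_id, target, mon.events.NO_EVENTS)
        mon.register_callback(tool_id, mon.events.LINE, None)
        mon.free_tool_id(tool_id)
    return result, snapshots

limit = 100                                    # module-level global

def shadowing_function(x):
    limit = x + 1                              # local that shadows the global
    return limit

if hasattr(sys, "monitoring"):
    result, snapshots = record_line_snapshots(shadowing_function, 5)
else:                                          # older interpreter: nothing recorded
    result, snapshots = shadowing_function(5), []
```

Events are enabled only for the target code object via set_local_events, so the callback itself and unrelated code are not traced. A real recorder would likely keep locals and globals separate rather than storing a merged copy per line.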

Important Details and Edge Cases

Closure (free) variables: In modern CPython, closure variables are handled seamlessly via the frame’s locals proxy. You do not need to separately fetch function.__closure__ or outer frame variables – the frame’s local mapping already includes free vars. The PEP for frame proxies explicitly states that each access to frame.f_locals yields a mapping of local and closure variable names to their current values. This ensures that in a nested function, variables from an enclosing scope (nonlocals) appear in the inner frame’s locals mapping (bound to the value in the closure cell).
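A small sketch of this behavior, using hypothetical names (make_counter, bump): the inner frame's locals already carry the current value of the closure variable, and it matches what the function's __closure__ cells hold.

```python
import sys

def make_counter(start):
    count = start
    def bump(step):
        nonlocal count
        count += step
        # Copy the frame's locals; the free variable `count` is part of the mapping.
        return dict(sys._getframe().f_locals)
    return bump

bump = make_counter(10)
first = bump(1)
second = bump(5)

# The same value is reachable through the closure cells, but the frame's
# mapping made that extra lookup unnecessary.
cells = dict(zip(bump.__code__.co_freevars,
                 (cell.cell_contents for cell in bump.__closure__)))
print(first, second, cells)
```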

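Comprehensions and generators: Generator expressions run in their own frame (code name <genexpr>), with any needed closure variables included as described above, so grabbing that frame’s locals and globals works as usual. List, set, and dict comprehensions also had their own frames up to Python 3.11; since Python 3.12 (PEP 709) they are inlined into the enclosing frame, so their loop variables appear in the enclosing frame’s locals while the comprehension runs and there is no separate frame to inspect.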
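To illustrate the generator-expression case, the sketch below peeks at a suspended generator expression's frame through gi_frame. The function name genexpr_locals is invented for the example; the filter on names starting with "." drops the hidden iterator argument that CPython passes to the generator.

```python
def genexpr_locals():
    factor = 2
    gen = (n * factor for n in range(3))
    first = next(gen)                          # run the generator frame to its first yield
    frame_locals = dict(gen.gi_frame.f_locals)
    # Keep only user-visible names (the implicit iterator is named ".0").
    visible = {k: v for k, v in frame_locals.items() if not k.startswith(".")}
    return first, gen.gi_frame.f_code.co_name, visible

first, code_name, visible = genexpr_locals()
print(first, code_name, visible)
```

The loop variable n and the closed-over factor both show up in the generator's own frame, which is what a LINE callback would see while the expression is being iterated.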

Class bodies and module level: A class body or module top-level code is executed in an unoptimized frame where locals == globals (module) or locals is a new class namespace dict. In these cases, frame.f_locals is a real dict of the namespace, and copying it (for example with PyDict_Copy) still produces a snapshot of that dict. The global variables (for a class body, the “global” context is the module’s globals) are accessible via f_globals. So the method still enumerates everything correctly.
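The difference between the two unoptimized cases can be checked with a small sketch that executes a throwaway module source in a fresh namespace. SOURCE, ns, and the attribute names are made up for the demonstration.

```python
SOURCE = '''
import sys
CONSTANT = 42
# At module level the frame's locals and globals are the same dict.
module_locals_is_globals = sys._getframe().f_locals is sys._getframe().f_globals

class Sample:
    a = 10
    b = a + 5
    # Inside the class body, f_locals is the class namespace being built.
    body_names = sorted(k for k in sys._getframe().f_locals if not k.startswith("__"))
    body_is_globals = sys._getframe().f_locals is sys._getframe().f_globals
    sees_constant = "CONSTANT" in sys._getframe().f_globals
'''

ns = {}
exec(compile(SOURCE, "<demo>", "exec"), ns)    # run the source as a module-like namespace
sample = ns["Sample"]
print(ns["module_locals_is_globals"], sample.body_names, sample.body_is_globals)
```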

Builtins: Typically, built-in names (from frame.f_builtins) are implicitly accessible if not shadowed, but they are usually not included in a variables snapshot. You can choose to ignore builtins unless needed, to avoid dumping a large static list each time.

Name resolution order: If needed, CPython 3.12 introduced PyFrame_GetVar(frame, name) (and PyFrame_GetVarString), which retrieves a single variable of the frame by name, covering locals, cell variables, and free variables, and raises NameError if the frame has no such variable. It does not fall back to globals or builtins, so an interpreter-style lookup still has to check f_globals and then f_builtins. This could be used to fetch specific variables on demand. However, for capturing all variables, it’s more efficient to pull the mappings as described above rather than querying names one by one.
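For on-demand lookups, a pure-Python approximation of the locals, then globals, then builtins order might look like the sketch below. resolve_name is a hypothetical helper, and it only approximates the interpreter (the compiler decides statically which scope a name in a function belongs to).

```python
import sys

def resolve_name(frame, name):
    """Return (scope, value) for `name`, searching locals, globals, then builtins."""
    for scope, namespace in (("local", frame.f_locals),
                             ("global", frame.f_globals),
                             ("builtin", frame.f_builtins)):
        if name in namespace:
            return scope, namespace[name]
    raise NameError(name)

threshold = 10                                 # module-level global

def demo():
    value = 3                                  # plain local
    frame = sys._getframe()
    return [resolve_name(frame, n)[0] for n in ("value", "threshold", "len")]

scopes_found = demo()
print(scopes_found)
```

Because locals are checked first, a local that shadows a global or builtin is reported as "local", which matches how the merged snapshot resolves conflicts.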


Putting It Together

In your Rust/PyO3 tracing recorder, for each line event you can do something like:

Get the current frame (frame_obj).

Obtain a snapshot dict of locals (with closures) – e.g. by copying the mapping returned by PyFrame_GetLocals(frame_obj) (frame.f_locals) into a new dict.

Get the globals dict (globals_dict = PyFrame_GetGlobals(frame_obj)).

Iterate over these mappings to collect name/value pairs. Apply your truncation to large objects as needed.

Record or output this snapshot for the line.
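A compact Python sketch of this per-line routine, with truncation applied to large values, is shown below. The names truncate and snapshot_frame, the 40-character limit, and the use of repr() as the serialization are all assumptions for illustration; the recorder's real value encoding may differ.

```python
import sys

def truncate(value, max_len=40):
    """Return repr(value), shortened to at most max_len characters."""
    text = repr(value)
    return text if len(text) <= max_len else text[: max_len - 3] + "..."

def snapshot_frame(frame, max_len=40):
    """Collect truncated name/value pairs for a frame, locals and globals kept apart."""
    return {
        "line": frame.f_lineno,
        "locals": {k: truncate(v, max_len)
                   for k, v in dict(frame.f_locals).items()},
        "globals": {k: truncate(v, max_len)
                    for k, v in frame.f_globals.items()
                    if not k.startswith("__")},   # dunder names skipped for brevity
    }

def work():
    data = list(range(1000))                   # large value that needs truncation
    label = "ok"
    return snapshot_frame(sys._getframe())

snap = work()
print(snap["locals"])
```

Keeping locals and globals in separate sub-dicts mirrors how debuggers present them and avoids losing a global that is shadowed by a local.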


This approach will work for any Python code running on CPython. It relies on the official C-API intended for debuggers and monitoring tools, so tricky cases (like nonlocal variables, exec/eval contexts, etc.) are handled by the interpreter itself. In Python 3.13+, the frame.f_locals proxy from PEP 667 gives an up-to-date view of the frame’s environment, including cell/free variables; on Python 3.12, each call to PyFrame_GetLocals refreshes the frame’s locals dict before returning it.

By using these facilities via PyO3, you can reliably capture all visible variables at each line of execution in your tracing recorder.

References

Python C-API – Frame Objects: functions to access frame attributes (locals, globals, etc.).

PEP 667 – Frame locals proxy (Python 3.13): frame.f_locals now reflects local + cell + free variables’ values.

PEP 558 – Defined semantics for locals(): proposed PyLocals_GetCopy/PyFrame_GetLocalsCopy to snapshot locals safely; the PEP was withdrawn in favor of PEP 667, so these functions are not part of released CPython.

Comprehensive Test Suite for Python Tracing Recorder

This test suite is designed to verify that a tracing recorder (using sys.monitoring and frame inspection) correctly captures all variables visible at each executable line of Python code. Each test covers a distinct scope or visibility scenario in Python. The tracer should record every variable that is in scope at that line, ensuring no visible name is missed. We include functions, closures, globals, class scopes, comprehensions, generators, exception blocks, and more, to guarantee full coverage of Python's LEGB (Local, Enclosing, Global, Built-in) name resolution rules.

Each test case below provides a brief description of what it covers, followed by a code snippet (Python script) that exercises that behavior. No actual tracing logic is included – we only show the source code whose execution should be monitored. The expectation is that at runtime, the tracer’s LINE event will fire on each line and the recorder will capture all variables accessible in that scope at that moment.

1. Simple Function: Parameters and Locals

Scope: This test focuses on a simple function with a parameter and local variables. It verifies that the recorder sees function parameters and any locals on each line inside the function. On entering the function, the parameter should be visible; as lines execute, newly assigned local variables become visible too. This ensures that basic function scope is handled.

def simple_function(x):
    a = 1 # Parameter x is visible; local a is being defined
    b = a + x # Locals a, b and parameter x are visible (b defined this line)
    return a, b # Locals a, b and x still visible at return

# Test the function
result = simple_function(5)

Expected: The tracer should capture x (parameter) and then a and b as they become defined in simple_function.

2. Nested Functions and Closure Variables (nonlocal)

Scope: This test covers nested functions, where an inner function uses a closure variable from its outer function. We verify that variables in the enclosing (nonlocal) scope are visible inside the inner function, and that the nonlocal statement allows the inner function to modify the outer variable. Both the outer function’s locals and the inner function’s locals (plus closed-over variables) should be captured appropriately.

def outer_func(x):
    y = 1
    def inner_func(z):
        nonlocal y # Declare y from outer_func as nonlocal
        w = x + y + z # x (outer param), y (outer var), z (inner param), w (inner local)
        y = w # Modify outer variable y
        return w
    total = inner_func(5) # Calls inner_func, which updates y
    return y, total # y is updated in outer scope
result = outer_func(2)

Expected: Inside inner_func, the tracer should capture x, y (from outer scope), z, and w (once assigned) at each line. In outer_func, it should capture x, y, the local function inner_func, and total once the call to inner_func has returned. This ensures enclosing scope variables are handled (nonlocal variables are accessible to nested functions).

3. Global and Module-Level Variables

Scope: This test validates visibility of module-level (global) variables. It defines globals and uses them inside a function, including modifying a global with the global statement. We ensure that at each line, global names are captured when in scope (either at the module level or when referenced inside a function).

GLOBAL_VAL = 10
counter = 0

def global_test():
    local_copy = GLOBAL_VAL # Access a global variable
    global counter
    counter += 1 # Modify a global variable
    return local_copy, counter

# Use the function and check global effects
before = counter
result = global_test()
after = counter

Expected: The tracer should capture GLOBAL_VAL and counter as globals on relevant lines. At the module level, GLOBAL_VAL, counter, before, after, etc. are in the global namespace. Inside global_test(), it should capture local_copy and see GLOBAL_VAL as a global. The global counter declaration ensures counter is treated as global in that function and its updated value remains in the module scope.
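As a quick check of where a tracer would find counter, the sketch below (run as a top-level script) inspects the function's frame after the update. The tuple returned by global_test here is a demo addition, not part of the test case above.

```python
import sys

counter = 0

def global_test():
    global counter
    counter += 1                               # rebinds the module-level name
    frame = sys._getframe()
    # `counter` never becomes a local of this frame; it lives in f_globals.
    return "counter" in frame.f_locals, frame.f_globals["counter"]

in_locals, global_value = global_test()
print(in_locals, global_value)
```

This is why the recorder has to read f_globals as well as f_locals to report counter inside global_test().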

4. Class Definition Scope and Metaclass

Scope: This test targets class definition bodies, including the effect of a metaclass. When a class body executes, it has a local namespace that becomes the class’s attribute dictionary. We verify that variables assigned in the class body are captured, and that references to those variables or to globals are handled. Additionally, we include a metaclass to ensure that class creation via a metaclass is also traced.

CONSTANT = 42

class MetaCounter(type):
    count = 0
    def __init__(cls, name, bases, attrs):
        MetaCounter.count += 1 # cls, name, bases, attrs visible; MetaCounter.count updated
        super().__init__(name, bases, attrs)

class Sample(metaclass=MetaCounter):
    a = 10
    b = a + 5 # uses class attribute a
    print(a, b, CONSTANT) # can access class attrs a, b and global CONSTANT
    def method(self):
        return self.a + self.b

# After class definition, metaclass count should have incremented
instances = MetaCounter.count

Expected: Within MetaCounter, the tracer should capture class-level attributes like count as well as method parameters (cls, name, bases, attrs) during class creation. In Sample’s body, it should capture a once defined, then b and a on the next line, and even allow access to CONSTANT (a global) during class body execution. After definition, Sample.a and Sample.b exist as class attributes (not directly as globals outside the class). The tracer should handle the class scope like a local namespace for that block.

5. Lambdas and Comprehensions (List, Set, Dict, Generator)

Scope: This combined test covers lambda expressions and various comprehensions, each of which introduces an inner scope. We ensure the tracer captures variables inside these expressions, including any outer variables they close over and the loop variables within comprehensions. Notably, in Python 3, the loop variable in a comprehension is local to the comprehension and not visible outside.

Lambda: Tests an inline lambda function with its own parameter and expression.

List Comprehension: Uses a loop variable internally and an external variable.

Set & Dict Comprehensions: Similar scope behavior with their own loop variables.

Generator Expression: A generator comprehension that lazily produces values.


factor = 2
double = lambda y: y * factor # 'y' is local parameter, 'factor' is a module-level global looked up at call time

squares = [n**2 for n in range(3)] # 'n' is local to comprehension, not visible after
scaled_set = {n * factor for n in range(3)} # set comprehension capturing outer 'factor'
mapping = {n: n*factor for n in range(3)} # dict comprehension with local n
gen_exp = (n * factor for n in range(3)) # generator expression (lazy evaluated)
result_list = list(gen_exp) # force generator to evaluate

Expected: Inside the lambda, y (parameter) is visible as a local and factor is visible as a global (in this script factor is defined at module level, so it is not a closure variable). In each comprehension, the loop variable (e.g., n) and any outer variables (factor) should be captured during the comprehension's execution; on Python 3.12+ list, set, and dict comprehensions are inlined (PEP 709), so n is captured from the enclosing frame rather than from a separate one. After the comprehension, the loop variable is no longer defined (e.g., n is not accessible outside the list comprehension). The generator expression runs in its own frame; its variables should be captured when it's iterated. All these ensure the recorder handles anonymous function scopes and comprehension internals.

6. Generators and Coroutines (async/await)

Scope: This test covers a generator function and an async coroutine function. Generators use yield to produce values and suspend execution, while async coroutines use await. We ensure that local variables persist across yields/awaits and remain visible when execution resumes (on each line hit). This verifies that the tracer captures the state in suspended functions.

def counter_gen(n):
    total = 0
    for i in range(n):
        total += i
        yield total # At yield: i and total are visible and persisted across resumes
    return total

import asyncio
async def async_sum(data):
    total = 0
    for x in data:
        total += x
        await asyncio.sleep(0) # At await: x and total persist in coroutine
    return total

# Run the generator
gen = counter_gen(3)
gen_results = list(gen) # exhaust the generator

# Run the async coroutine
coroutine_result = asyncio.run(async_sum([1, 2, 3]))

Expected: In counter_gen, at each yield line the tracer should capture i and total (and after resumption, those values are still available). In async_sum, at the await line, x and total are captured and remain after the await. The tracer must handle the resumption of these functions (triggered by PY_RESUME events) and still see previously defined locals. This test ensures generator state and coroutine state do not lose any variables between pauses.
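The persistence of generator locals across suspensions can be observed without any tracer by looking at gi_frame while the generator is paused, as in this sketch (the states list and finished_frame are names introduced for the demo):

```python
def counter_gen(n):
    total = 0
    for i in range(n):
        total += i
        yield total
    return total

gen = counter_gen(3)
states = []
for value in gen:
    # The generator is suspended at its yield here; its frame still holds the locals.
    states.append((value, dict(gen.gi_frame.f_locals)))

finished_frame = gen.gi_frame                  # None once the generator is exhausted
print(states, finished_frame)
```

A recorder sees the same values when the LINE event fires after each resume, since the frame object is reused for the whole life of the generator.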

7. Try/Except/Finally and With Statement

Scope: This test combines exception handling blocks and context manager usage. It verifies that the tracer captures variables introduced in a try/except flow (including the exception variable, which has a limited scope) as well as in a with statement context manager. We specifically ensure the exception alias is only visible inside the except block, and that variables from try, else, and finally blocks, as well as the with target, are all accounted for.

def exception_and_with_demo(x):
    try:
        inv = 10 / x # In try: 'inv' defined if no error
    except ZeroDivisionError as e:
        error_msg = fError:
49 changes: 49 additions & 0 deletions codetracer-python-recorder/Cargo.lock

1 change: 1 addition & 0 deletions codetracer-python-recorder/Cargo.toml
@@ -25,3 +25,4 @@ env_logger = "0.11"

[dev-dependencies]
pyo3 = { version = "0.25.1", features = ["auto-initialize"] }
tempfile = "3.10"