added sample_filter_outputs utility and accompanying simple tests #526

Dekermanjian · 2025-06-17T16:22:14Z

This addresses #521. It allows state space users to sample filtered, predicted, smoothed, and observed states and covariances using a utility that is consistent with the PyMC workflow of sample -> sample_posterior_predictive.

jessegrabowski

This looks great to me, just a small question about the use of modelcontext here.

pymc_extras/statespace/core/statespace.py

jessegrabowski · 2025-07-21T10:24:32Z

Also you need to rebase :)

Copilot

Pull Request Overview

This PR adds a sample_filter_outputs utility method to the StateSpace class that enables users to sample filtered, predicted, smoothed, and observed states and covariances from fitted models. This aligns with PyMC's workflow pattern of sample → sample_posterior_predictive.

Adds sample_filter_outputs method to StateSpace class for sampling various filter outputs
Includes validation logic to ensure requested filter output names are valid
Adds comprehensive tests covering basic functionality and error handling

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
pymc_extras/statespace/core/statespace.py	Implements the main `sample_filter_outputs` method with validation and output sampling logic
tests/statespace/core/test_statespace.py	Adds test cases for the new utility including positive and negative test scenarios

Comments suppressed due to low confidence (1)

tests/statespace/core/test_statespace.py:1045

The error message test hardcodes the expected array string representation, which may be fragile across different NumPy versions or platforms. Consider using a more flexible assertion that checks for key parts of the error message instead of the exact format.

    msg = "['filter_covariances' 'filter_states'] not a valid filter output name!"
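
One way to make such a test less brittle is to match on the stable parts of the message rather than on the exact array formatting. The following is only a minimal sketch: `validate_names`, `error_message_for`, and `VALID_NAMES` are hypothetical stand-ins for the validation inside `sample_filter_outputs`, not code from this PR.

```python
import re

# Hypothetical set of valid names, for illustration only.
VALID_NAMES = {"filtered_states", "filtered_covariances"}


def validate_names(names):
    """Stand-in for the validation step; raises on unknown names."""
    unknown = sorted(set(names) - VALID_NAMES)
    if unknown:
        raise ValueError(f"{unknown} not a valid filter output name!")


def error_message_for(names):
    """Return the ValueError message raised for `names`, or None if valid."""
    try:
        validate_names(names)
    except ValueError as err:
        return str(err)
    return None


msg = error_message_for(["filter_states", "filter_covariances"])

# Check the key parts of the message instead of the exact list/array repr.
assert re.search(r"not a valid filter output name", msg)
assert "filter_states" in msg and "filter_covariances" in msg
```

In a pytest suite the same idea is usually written as `pytest.raises(ValueError, match="not a valid filter output name")`, since `match` is applied as a regular-expression search against the message.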

Copilot · 2025-07-21T10:24:49Z

pymc_extras/statespace/core/statespace.py

+                    case "filtered_states" | "predicted_states" | "smoothed_states":
+                        dims = [TIME_DIM, "state"]
+                    case "filtered_covariances" | "predicted_covariances" | "smoothed_covariances":
+                        dims = [TIME_DIM, "state", "state_aux"]
+                    case "observed_states":
+                        dims = [TIME_DIM, "observed_state"]
+                    case "observed_covariances":


The match statement uses hardcoded string literals that are repeated multiple times. Consider defining constants for the filter output names to improve maintainability and reduce the risk of typos.

Suggested change

-                    case "filtered_states" | "predicted_states" | "smoothed_states":
-                        dims = [TIME_DIM, "state"]
-                    case "filtered_covariances" | "predicted_covariances" | "smoothed_covariances":
-                        dims = [TIME_DIM, "state", "state_aux"]
-                    case "observed_states":
-                        dims = [TIME_DIM, "observed_state"]
-                    case "observed_covariances":
+                    case FILTERED_STATES | PREDICTED_STATES | SMOOTHED_STATES:
+                        dims = [TIME_DIM, "state"]
+                    case FILTERED_COVARIANCES | PREDICTED_COVARIANCES | SMOOTHED_COVARIANCES:
+                        dims = [TIME_DIM, "state", "state_aux"]
+                    case OBSERVED_STATES:
+                        dims = [TIME_DIM, "observed_state"]
+                    case OBSERVED_COVARIANCES:

A dictionary mapping cases to dims might be better than the casework here. There isn't already one in constants?

There was a dictionary in constants.py, FILTER_OUTPUT_DIMS, but its key names are singular whereas the names returned from both kalman_(filter|smoother).build_graph() are plural. I handled this inside the method, but I am wondering if you want to consolidate this either in constants.py or inside the kalman_(filter|smoother).build_graph() methods?
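
For readers following along, the dictionary lookup with the singular-to-plural adaptation could look roughly like the sketch below. The contents of `FILTER_OUTPUT_DIMS` shown here are a hypothetical excerpt, the value of `TIME_DIM` is assumed, and `dims_for` is an illustrative helper that does not exist in the library.

```python
TIME_DIM = "time"  # assumed value; only the constant's name appears in the diff

# Hypothetical excerpt with singular keys, as described in the comment above.
FILTER_OUTPUT_DIMS = {
    "filtered_state": (TIME_DIM, "state"),
    "filtered_covariance": (TIME_DIM, "state", "state_aux"),
}

# Adapt the singular keys to the plural names returned by build_graph().
plural_dims = {key + "s": dims for key, dims in FILTER_OUTPUT_DIMS.items()}


def dims_for(name):
    """Look up dims for a plural output name (illustrative helper)."""
    try:
        return plural_dims[name]
    except KeyError:
        raise ValueError(f"{name} not a valid filter output name!") from None


assert dims_for("filtered_states") == ("time", "state")
assert dims_for("filtered_covariances") == ("time", "state", "state_aux")
```

A single mapping replaces the whole match statement, and an unknown name fails in one place. The key rewriting step is the part that goes away once the names are consistent.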

Barf. This is the kind of inconsistency that we really need to stomp out. Do you see any reason why there should be a singular version and a plural version? My guess is that I just did it out of sloppiness. If you don't, can you pick one and use it everywhere?

My preference would be plural. The logic is that the variable (the symbolic output) really is many states. On the other hand, dimensions should be singular, because it's just one dimension. Example: the "state" dimension is a label for the 1st dimension of a (100, 5) tensor, vs the "filtered_states" object which is a (100, 5) tensor concatenating the evolution of 5 states over 100 timesteps.
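
The naming rule described above can be illustrated in a few lines. This is a sketch only: the dim name "time" is an assumption, and plain tuples stand in for a real tensor.

```python
# The variable holds many states, so its name is plural ...
filtered_states_shape = (100, 5)  # 100 timesteps, 5 states

# ... while each dimension label names a single axis, so it is singular.
filtered_states_dims = ("time", "state")

# Pair each axis label with its length.
labelled = dict(zip(filtered_states_dims, filtered_states_shape))
assert labelled == {"time": 100, "state": 5}
```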

Dekermanjian · 2025-07-22T22:25:46Z

Hey @jessegrabowski, I updated some of the constants in constants.py to be plural and updated the tests for any mismatches. There were a few constants that I was not sure about, so I wanted to ask you before changing them. These are:

ALL_STATE_DIM = "state(?s)"
ALL_STATE_AUX_DIM = "state(?s)_aux"
OBS_STATE_DIM = "observed_state(?s)"
OBS_STATE_AUX_DIM = "observed_state(?s)_aux"

NEVER_TIME_VARYING = ["initial_state(?s)", "initial_state(?s)_cov(?s)"]
VECTOR_VALUED = ["initial_state(?s)", "state(?s)_intercept(?s)", "obs_intercept(?s)"]

LONG_MATRIX_NAMES = [
    "initial_state(?s)",
    "initial_state(?s)_cov(?s)",
    "state(?s)_intercept(?s)",
    "obs_intercept(?s)",
    "obs_cov(?s)",
    "state_cov(?s)",
]

jessegrabowski · 2025-07-24T15:38:49Z

Dims should be singular, I have strong feelings on that.

For the matrix names, I have less of a strong preference. On one hand, x0 is a vector of states, but on the other hand, it is the state vector. So it could go either way. I guess I lean to not changing things in that case?

If you agree, that would mean all of the things you identified there would stay as-is I think.

Dekermanjian · 2025-07-24T18:51:02Z

I agree that the singular dim names sound better. No objections from me about keeping the rest as-is! I think the main issue of having the same thing named differently is now resolved 🤞

jessegrabowski

A few final nitpicks then let's merge this! It's looking really great.

jessegrabowski · 2025-07-25T08:53:28Z

pymc_extras/statespace/core/statespace.py

+            # Filter output names are singular in constants.py but are returned as plural from kalman_.build_graph()
+            # filter_output_dims_mapping = {}
+            # for k in FILTER_OUTPUT_DIMS.keys():
+            #     filter_output_dims_mapping[k + "s"] = FILTER_OUTPUT_DIMS[k]


oops! Sorry about this. That was a careless oversight. I will clean that up right away!

jessegrabowski · 2025-07-25T08:54:00Z

pymc_extras/statespace/core/statespace.py

+            else:
+                unknown_filter_output_names = np.setdiff1d(
+                    filter_output_names, [x.name for x in all_filter_outputs]
+                )
+                if unknown_filter_output_names.size > 0:
+                    raise ValueError(
+                        f"{unknown_filter_output_names} not a valid filter output name!"
+                    )
+                filter_output_names = [
+                    x for x in all_filter_outputs if x.name in filter_output_names
+                ]


Move the input validation up to the top, so we fail quickly without doing any work if the user passes invalid names
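
A fail-fast ordering along these lines might look like the sketch below. `sample_outputs`, `expensive_build`, and `AVAILABLE` are hypothetical names used only for illustration; the real method builds a PyMC model rather than a dictionary.

```python
def sample_outputs(requested, available, build_graph):
    """Validate the requested names first, then do the expensive work."""
    if isinstance(requested, str):
        requested = [requested]

    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"{unknown} not a valid filter output name!")

    graph = build_graph()  # only reached when every name is valid
    return {name: graph[name] for name in requested}


calls = []  # records each time the expensive step runs


def expensive_build():
    calls.append(1)
    return {"filtered_states": "fs", "smoothed_states": "ss"}


AVAILABLE = ("filtered_states", "smoothed_states")

try:
    sample_outputs("filter_states", AVAILABLE, expensive_build)
except ValueError:
    pass
assert calls == []  # the invalid name was rejected before any work was done

result = sample_outputs("filtered_states", AVAILABLE, expensive_build)
assert result == {"filtered_states": "fs"} and len(calls) == 1
```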

jessegrabowski · 2025-07-25T08:54:57Z

pymc_extras/statespace/core/statespace.py

+
+        frozen_model = freeze_dims_and_data(m)
+        with frozen_model:
+            idata_filter = pm.sample_posterior_predictive(


nit: no need for an intermediate variable here, just directly return

jessegrabowski · 2025-07-25T08:55:46Z

pymc_extras/statespace/core/statespace.py

+        with frozen_model:
+            idata_filter = pm.sample_posterior_predictive(
+                idata if group == "posterior" else idata.prior,
+                var_names=[x.name for x in frozen_model.deterministics],


just use filter_output_names here. I'm not sure anything could go wrong with your approach, but it's an unnecessary extra bit of complexity.

jessegrabowski · 2025-07-25T15:05:25Z

pymc_extras/statespace/core/statespace.py

@@ -1684,6 +1684,21 @@ def sample_filter_outputs(
        if isinstance(filter_output_names, str):
            filter_output_names = [filter_output_names]

+        drop_keys = {"predicted_observed_states", "predicted_observed_covariances"}


I think we shouldn't treat these as special (even though I agree it's silly to ask for them). I'd be confused if I tried to ask for them and it said it's not a valid filter output name.

Having everything in one place is convenient, even if it's duplicative.

Okay, yeah I agree with you.

I just wasn't sure, because FILTER_OUTPUT_DIMS in constants.py has predicted_observed_states and predicted_observed_covariances, but the output from kalman_filter.build_graph() doesn't. It seems these are named observed_states and observed_covariances there.

Should I change the names in constants.py to match the returned names from kalman_filter.build_graph()?

Yes, these should be consistent. But where does the name change currently happen between the filter and the idata? Maybe this is an issue for another PR.

@jessegrabowski, I believe this happens in _postprocess_scan_results() in kalman_filter.py. It looks like the names of the filter outputs are hardcoded in there.

If you want to make this consistent in this PR I have no objection. I don't have a good sense if you should change FILTER_OUTPUT_DIMS to match the output names, or change the output names to match the FILTER_OUTPUT_DIMS. I'll defer to you if you have a sense of which one is better.

Okay, I will do it in this PR because I think it is somewhat related. I think the names should match whatever we put in FILTER_OUTPUT_DIMS
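
A small consistency check of the kind discussed here could be sketched as follows. The `FILTER_OUTPUT_DIMS` excerpt, its dim tuples, and the `name_mismatches` helper are assumptions for illustration and are not taken from the library.

```python
TIME_DIM = "time"  # assumed value

# Hypothetical excerpt of the mapping in constants.py (dims are assumed).
FILTER_OUTPUT_DIMS = {
    "predicted_observed_states": (TIME_DIM, "observed_state"),
    "predicted_observed_covariances": (TIME_DIM, "observed_state", "observed_state_aux"),
}


def name_mismatches(emitted_names, dims_mapping):
    """Return (names without dims, dims keys never emitted), both sorted."""
    emitted, expected = set(emitted_names), set(dims_mapping)
    return sorted(emitted - expected), sorted(expected - emitted)


# Names as the thread says the filter emitted them before the change.
old_names = ["observed_states", "observed_covariances"]
extra, missing = name_mismatches(old_names, FILTER_OUTPUT_DIMS)
assert extra == ["observed_covariances", "observed_states"]
assert missing == ["predicted_observed_covariances", "predicted_observed_states"]

# Once the filter uses the keys of the mapping, nothing is mismatched.
assert name_mismatches(list(FILTER_OUTPUT_DIMS), FILTER_OUTPUT_DIMS) == ([], [])
```

Deriving the emitted names from the mapping's keys, rather than hardcoding them in two places, is what keeps a check like this trivially true.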

jessegrabowski

I made one last nitpick, but it's not a blocker. Feel free to address or not, then merge :)

Dekermanjian · 2025-07-28T10:50:40Z

> I made one last nitpick, but it's not a blocker. Feel free to address or not, then merge :)

Hey @jessegrabowski, I don't believe I have permissions to merge.

jessegrabowski · 2025-07-28T11:37:44Z

Great work as always :D

Dekermanjian · 2025-07-28T13:31:07Z

Thank you, Jesse! Always happy to help!

jessegrabowski requested changes Jul 21, 2025

jessegrabowski requested a review from Copilot July 21, 2025 10:23

Copilot AI reviewed Jul 21, 2025

Dekermanjian added 2 commits July 21, 2025 15:48

added sample_filter_outputs utility and accompanying simple tests

fd87691

Rebased from upstream

1. removed modelcontext call that is not needed
2. Added handle for when filter_output param is passed in as a str
3. removed case statement in favor of dictionary mapping that already exists in conf.py

5b064d4

Dekermanjian force-pushed the filter_outputs_utility branch from 0d4df37 to 5b064d4 Compare July 21, 2025 21:50

updated plurality for some of the constants in constants.py

d142a91

jessegrabowski requested changes Jul 25, 2025

cleaned up commented code, moved internal checks to the top, reduced intermediate variables

9e78bae

jessegrabowski reviewed Jul 25, 2025

jessegrabowski approved these changes Jul 25, 2025

updated kalman filter outputs to use names defined in constants.py, updated sample_filter_outputs to allow sampling any filter outputs defined in constants.py

46149ac

jessegrabowski merged commit 24930b5 into pymc-devs:main Jul 28, 2025
17 checks passed
