
Conversation

@keelerm84 keelerm84 commented Nov 14, 2025

In the previous setup, we would only check the fallback or recovery
conditions when the synchronizer returned an update. If the synchronizer
was stuck, or nothing was changing in the environment, we would never
check the conditions.

This configuration also exposed an interesting behavior. If the
synchronizer cannot connect, it emits error updates. Each time we
receive an error, we check whether we have failed to initialize for the
last 10 seconds; if so, we re-create the primary synchronizer.

If the re-created synchronizer also fails to connect, its very first
error update triggers the condition check again. Because the 10-second
window has already elapsed, it errors out immediately, without ever
getting a real chance to connect. With this change, a synchronizer is
guaranteed at least 10 seconds to try before the condition is evaluated.
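A minimal sketch of the guarantee described above, with illustrative names rather than the SDK's actual API: the fallback condition is evaluated on a timer, but only fires once the current synchronizer has had a full grace window since it was (re-)created.

```python
import time

# Illustrative constant; the SDK's actual window is configured elsewhere.
FALLBACK_GRACE_SECONDS = 10


class ConditionChecker:
    """Evaluates the fallback condition only after the current
    synchronizer has been given a full grace window to connect."""

    def __init__(self):
        self._sync_started_at = time.time()

    def restart_synchronizer(self):
        # Re-creating the synchronizer resets the grace window, so the
        # new instance cannot be torn down by a stale failure.
        self._sync_started_at = time.time()

    def should_fall_back(self, initialized, now=None):
        now = time.time() if now is None else now
        elapsed = now - self._sync_started_at
        return (not initialized) and elapsed >= FALLBACK_GRACE_SECONDS
```

Because the window is anchored to the synchronizer's start time rather than to a global clock, a freshly re-created synchronizer can no longer be condemned by a failure streak that predates it.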


Note

Separates FDv2 fallback/recovery checks from synchronizer emissions via a timer+reader thread, fixes initial status timestamp, removes streaming context manager hooks, and skips flaky file-source tests in CI.

  • FDv2 core:
    • Periodically evaluates conditions independent of updates: _consume_synchronizer_results now uses a queue, background reader thread, and RepeatingTask timer to trigger regular status checks and act without waiting for sync updates.
    • Minor flow tweak when starting the secondary synchronizer.
  • Status handling:
    • Initialize DataSourceStatus.since with time.time() in DataSourceStatusProviderImpl.
  • Streaming:
    • Remove context-manager methods (__enter__, __exit__) from StreamingDataSource.
  • Tests:
    • Update test_two_phase_init to ensure version progression and add readiness/change events.
    • Mark file data source tests as flaky-skipped when LD_SKIP_FLAKY_TESTS is set.
  • CI:
    • Set LD_SKIP_FLAKY_TESTS: true for test steps on Linux and Windows jobs.
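The queue-plus-reader-thread arrangement from the first bullet can be sketched as follows. All names here are illustrative, and the periodic wake-up is approximated with a queue timeout rather than the SDK's RepeatingTask (which instead ticks a timer that prompts the same check), but the effect is the same: conditions are evaluated on every tick, whether or not the synchronizer emitted anything.

```python
import queue
import threading

CHECK_INTERVAL_SECONDS = 0.5  # illustrative; not the SDK's actual value


def consume_synchronizer_results(synchronizer, check_conditions, stop_event):
    """Funnel synchronizer updates through a queue so that condition
    checks can fire periodically even when no updates arrive."""
    updates = queue.Queue()

    def reader():
        # Background thread: block on the synchronizer and forward
        # whatever it emits into the queue.
        for update in synchronizer:
            updates.put(update)

    sync_reader = threading.Thread(target=reader, daemon=True)
    sync_reader.start()

    while not stop_event.is_set():
        try:
            # Wake up periodically even if nothing was emitted.
            update = updates.get(timeout=CHECK_INTERVAL_SECONDS)
        except queue.Empty:
            update = None  # timer tick with no update

        # Conditions are evaluated on every tick, not only when the
        # synchronizer produces something.
        check_conditions(update)
```

The key design point is that the main loop never blocks indefinitely on the synchronizer itself, so a stuck or silent synchronizer can no longer starve the fallback/recovery checks.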

Written by Cursor Bugbot for commit fa34244.

@keelerm84 keelerm84 requested a review from a team as a code owner November 14, 2025 18:01
@keelerm84 keelerm84 force-pushed the mk/sdk-1579/improve-recovery branch from cbcd3d6 to 7e0dbce Compare November 14, 2025 18:08
synchronizer.stop()
timer.stop()

sync_reader.join(0.5)
Contributor
What is the purpose of the half second timeout here? My understanding is that if it doesn't stop in .5 seconds, execution will move on and that thread will still continue to process.

Contributor

Not a blocker, just curious.

Member Author

We are trying to shut down in an orderly fashion. The timeout gives us time to close out correctly, but eventually we do have to move on if the thread isn't halting as expected. The idea is that if close has been called, the Python interpreter is going to exit shortly afterward anyway.
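The shutdown sequence being discussed can be sketched like this (hypothetical helper names, not the SDK's code): signal the reader to stop, then join with a bounded timeout; because the reader runs as a daemon thread, a straggler cannot keep the interpreter alive after close.

```python
import threading


def shutdown(stop_event, sync_reader, join_timeout=0.5):
    """Signal the reader to stop, then wait a bounded time for it.

    Returns True if the thread exited within the timeout. Because the
    reader is a daemon thread, a leftover thread will not keep the
    interpreter alive once close() has been called.
    """
    stop_event.set()
    sync_reader.join(join_timeout)
    return not sync_reader.is_alive()
```

Note that `Thread.join(timeout)` never raises on timeout; the only way to know whether the thread actually stopped is to check `is_alive()` afterward, as above.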

@keelerm84 keelerm84 force-pushed the mk/sdk-1574/outage-detector branch from 85f59ba to 8e81cdb Compare November 18, 2025 18:11
@keelerm84 keelerm84 force-pushed the mk/sdk-1579/improve-recovery branch from 7e0dbce to 38fb5f4 Compare November 18, 2025 18:55
Base automatically changed from mk/sdk-1574/outage-detector to feat/fdv2 November 18, 2025 18:55
synchronizer.stop()
timer.stop()

sync_reader.join(0.5)

Bug: Thread Termination Failure Leads to Instability

The sync_reader thread is given only a 0.5-second timeout to exit. If it hasn't finished within that time, execution continues while the thread is still running, potentially causing race conditions or resource leaks when a new synchronizer is subsequently created and another reader thread is started.
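One way to guard against the overlap this note describes (a hypothetical helper, not the SDK's code) is to check `is_alive()` after the bounded join and refuse to start a replacement reader until the old one has actually exited:

```python
import threading


def replace_reader(old_reader, make_reader, join_timeout=0.5):
    """Join the old reader with a bounded timeout; only start a fresh
    reader once the old one has really exited. Returns the new thread,
    or None if the old thread is still running."""
    if old_reader is not None:
        old_reader.join(join_timeout)
        if old_reader.is_alive():
            # The old thread is still running; starting a second reader
            # now could race on shared state, so surface the problem
            # instead of silently stacking threads.
            return None
    new_reader = make_reader()
    new_reader.start()
    return new_reader
```

The caller can then retry, log, or escalate rather than accumulating reader threads that all drain the same queue.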


@keelerm84 keelerm84 force-pushed the mk/sdk-1579/improve-recovery branch 2 times, most recently from 1799cee to 644cc13 Compare November 19, 2025 15:22
@keelerm84 keelerm84 force-pushed the mk/sdk-1579/improve-recovery branch 2 times, most recently from 842e8fa to e578a2e Compare November 19, 2025 17:55
@keelerm84 keelerm84 force-pushed the mk/sdk-1579/improve-recovery branch from e578a2e to a37fa56 Compare November 19, 2025 18:33

# Send change events
if affected_items:
print("@@@@@ affected items are (delta)", affected_items, "\n")

Bug: Debug print statements left in production code

A debug print() statement with the @@@@@ marker was left in production code at line 374. It should be removed before merging.


@keelerm84 keelerm84 force-pushed the mk/sdk-1579/improve-recovery branch from a37fa56 to 2ca4af0 Compare November 19, 2025 19:45
@keelerm84 keelerm84 merged commit 7834f90 into feat/fdv2 Nov 19, 2025
24 of 25 checks passed
@keelerm84 keelerm84 deleted the mk/sdk-1579/improve-recovery branch November 19, 2025 21:35
keelerm84 added a commit that referenced this pull request Nov 19, 2025
keelerm84 added a commit that referenced this pull request Nov 19, 2025
4 participants