@sandhose sandhose commented Jun 20, 2025

The main goal of this PR is to move the handling of device list changes onto multiple writers, off the main process, so that logins can keep happening while Synapse is doing a rolling restart.

This is quite an intrusive change, so I would advise reviewing it commit by commit; I tried to keep the history as clean as possible.

There are a few things to consider:

  • the device_list_key in stream tokens becomes a MultiWriterStreamToken, which has a few implications for sync and for the storage layer
  • we had a split between DeviceHandler and DeviceWorkerHandler for master vs. worker processes. I've kept this split, but it is now writer vs. non-writer worker, with method overrides making replication calls when needed
  • there are a few operations that need to happen on a single worker at a time. Instead of using cross-worker locks, for now I made them run on the first writer in the list
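
To make the first and last points a bit more concrete, here is a rough Python sketch. It is not the code from this PR: `SketchMultiWriterToken`, `position_for` and `runs_exclusive_tasks` are hypothetical names, and the token is deliberately simplified to one shared position plus a per-writer map for the writers that are ahead of it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SketchMultiWriterToken:
    """Simplified, hypothetical stand-in for a multi-writer stream token."""

    # Position that every writer is assumed to have reached.
    stream: int
    # Writers that are ahead of `stream`, keyed by instance name.
    instance_map: Dict[str, int] = field(default_factory=dict)

    def position_for(self, instance_name: str) -> int:
        # A writer without its own entry is only known to be at `stream`.
        return self.instance_map.get(instance_name, self.stream)


def runs_exclusive_tasks(instance_name: str, writers: List[str]) -> bool:
    """Return True if this instance is the first writer in the list.

    Stands in for the "run on the first writer" rule, used here instead
    of a cross-worker lock.
    """
    return bool(writers) and writers[0] == instance_name


# Hypothetical instance names, for illustration only.
token = SketchMultiWriterToken(stream=10, instance_map={"device_writer2": 12})
writers = ["device_writer1", "device_writer2"]
```

With a token like this, code that used to compare a single integer has to ask for the position of a specific writer, which is roughly why the change reaches into both sync and the storage layer.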

@github-actions github-actions bot deployed to PR Documentation Preview June 20, 2025 07:38 Active
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from 80c04b8 to 4a8a124 Compare June 23, 2025 08:41
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from 4a8a124 to e504cad Compare June 23, 2025 09:03
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from e504cad to d943a6d Compare June 23, 2025 11:46
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from d943a6d to 874acc8 Compare June 23, 2025 12:37
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from 874acc8 to c28d2dd Compare June 23, 2025 15:06
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from 7b63f09 to a27d754 Compare June 23, 2025 15:32
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from 1829f79 to 2ac8e0e Compare June 24, 2025 15:45
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from cdf65c6 to b011080 Compare June 25, 2025 15:05
@sandhose sandhose force-pushed the quenting/device-changes-off-main branch from d5c23ec to 9392adc Compare June 25, 2025 15:17
@@ -0,0 +1 @@
Enable workers to write directly to the device lists stream and handle device list updates, reducing load on the main process.

Actually, adding /delete_devices uncovered two more bugs, I think:

  • deleting access tokens was only available on the main store, not the worker store (fixed with 07f32aa)
  • when we called notify_device_update through replication, it was missing a conversion from StrCollection (which can be a lot of things) to a proper list, which is needed for the value to be serializable. Grrr @ replication clients that don't retain proper typing… Fixed with d916e9e

-- @sandhose, #18581 (comment)
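
For anyone wondering why the second bug bites, here is a small sketch of the failure mode. It is not Synapse code: the `StrCollection` alias below is my assumption about the shape of the type (a tuple, list or set of strings), and `encode_payload` is a hypothetical helper standing in for whatever builds the replication request body.

```python
import json
from typing import AbstractSet, List, Tuple, Union

# Assumed shape of the alias: "can be a lot of things".
StrCollection = Union[Tuple[str, ...], List[str], AbstractSet[str]]


def encode_payload(user_id: str, device_ids: StrCollection) -> str:
    """Hypothetical helper: build a JSON body for a replication call."""
    # json.dumps cannot encode sets, so normalise to a plain list first.
    return json.dumps({"user_id": user_id, "device_ids": list(device_ids)})


def encodes_without_conversion(device_ids: StrCollection) -> bool:
    """Report whether json.dumps accepts the collection as-is."""
    try:
        json.dumps({"device_ids": device_ids})
    except TypeError:
        return False
    return True
```

A list or tuple goes through `json.dumps` untouched, but a set raises `TypeError`, and a static type checker will not flag it because all three satisfy the alias.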

Overall, this is the type of thing I can't really review (related discussion: #18581 (comment))

I can add an approval but it's just like "sure".

It feels like some typing/lints are missing to ensure that only serializable data is passed to a ReplicationEndpoint, but this is something to address outside of this PR.

@sandhose sandhose merged commit 5ea2cf2 into develop Jul 18, 2025
78 of 80 checks passed
@sandhose sandhose deleted the quenting/device-changes-off-main branch July 18, 2025 07:06
benbz added a commit to element-hq/ess-helm that referenced this pull request Aug 1, 2025
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Aug 7, 2025
Tested on NetBSD 9 amd64 (federation, multiple clients)

# Synapse 1.135.0 (2025-08-01)

## Features

- Add `recaptcha_private_key_path` and `recaptcha_public_key_path` config option. ([\#17984](element-hq/synapse#17984), [\#18684](element-hq/synapse#18684))
- Add plain-text handling for rich-text topics as per [MSC3765](matrix-org/matrix-spec-proposals#3765). ([\#18195](element-hq/synapse#18195))
- If enabled by the user, server admins will see [soft failed](https://spec.matrix.org/v1.13/server-server-api/#soft-failure) events over the Client-Server API. ([\#18238](element-hq/synapse#18238))
- Add experimental support for [MSC4277: Harmonizing the reporting endpoints](matrix-org/matrix-spec-proposals#4277). ([\#18263](element-hq/synapse#18263))
- Add ability to limit amount of media uploaded by a user in a given time period. ([\#18527](element-hq/synapse#18527))
- Enable workers to write directly to the device lists stream and handle device list updates, reducing load on the main process. ([\#18581](element-hq/synapse#18581))
- Support arbitrary profile fields. Contributed by @clokep. ([\#18635](element-hq/synapse#18635))
- Advertise support for Matrix v1.12. ([\#18647](element-hq/synapse#18647))
- Add an option to issue redactions as an admin user via the [admin redaction endpoint](https://element-hq.github.io/synapse/latest/admin_api/user_admin_api.html#redact-all-the-events-of-a-user). ([\#18671](element-hq/synapse#18671))
- Add experimental and incomplete support for [MSC4306: Thread Subscriptions](https://github.com/matrix-org/matrix-spec-proposals/blob/rei/msc_thread_subscriptions/proposals/4306-thread-subscriptions.md). ([\#18674](element-hq/synapse#18674))
- Include `event_id` when getting state with `?format=event`. Contributed by @tulir @ Beeper. ([\#18675](element-hq/synapse#18675))
anoadragon453 added a commit that referenced this pull request Sep 29, 2025
As we are now well past Synapse 1.135. This was originally added in
#18581.
Michael-Ixo pushed a commit to ixoworld/synapse that referenced this pull request Oct 23, 2025
Deployments that make use of the
[synapse-s3-storage-provider](https://github.com/matrix-org/synapse-s3-storage-provider)
module must upgrade to
[v1.6.0](https://github.com/matrix-org/synapse-s3-storage-provider/releases/tag/v1.6.0).
Using older versions of the module with this release of Synapse will prevent
users from being able to upload or download media.

No significant changes since 1.140.0rc1.

- Add [a new Media Query by ID Admin API](https://element-hq.github.io/synapse/v1.140/admin_api/media_admin_api.html#query-a-piece-of-media-by-id) that allows server admins to query and investigate the metadata of local or cached remote media via
  the `origin/media_id` identifier found in a [Matrix Content URI](https://spec.matrix.org/v1.14/client-server-api/#matrix-content-mxc-uris). ([\element-hq#18911](element-hq#18911))
- Add [a new Fetch Event Admin API](https://element-hq.github.io/synapse/v1.140/admin_api/fetch_event.html) to fetch an event by ID. ([\element-hq#18963](element-hq#18963))
- Update [MSC4284: Policy Servers](matrix-org/matrix-spec-proposals#4284) implementation to support signatures when available. ([\element-hq#18934](element-hq#18934))
- Add experimental implementation of the `GET /_matrix/client/v1/rtc/transports` endpoint for the latest draft of [MSC4143: MatrixRTC](matrix-org/matrix-spec-proposals#4143). ([\element-hq#18967](element-hq#18967))
- Expose a `defer_to_threadpool` function in the Synapse Module API that allows modules to run a function on a separate thread in a custom threadpool. ([\element-hq#19032](element-hq#19032))

- Fix room upgrade `room_config` argument and documentation for `user_may_create_room` spam-checker callback. ([\element-hq#18721](element-hq#18721))
- Compute a user's last seen timestamp from their devices' last seen timestamps instead of IPs, because the latter are automatically cleared according to `user_ips_max_age`. ([\element-hq#18948](element-hq#18948))
- Fix bug where ephemeral events were not filtered by room ID. Contributed by @frastefanini. ([\element-hq#19002](element-hq#19002))
- Update Synapse main process version string to include git info. ([\element-hq#19011](element-hq#19011))

- Explain how `Deferred` callbacks interact with logcontexts. ([\element-hq#18914](element-hq#18914))
- Fix documentation for `rc_room_creation` and `rc_reports` to clarify that a `per_user` rate limit is not supported. ([\element-hq#18998](element-hq#18998))

- Remove deprecated `LoggingContext.set_current_context`/`LoggingContext.current_context` methods which already have equivalent bare methods in `synapse.logging.context`. ([\element-hq#18989](element-hq#18989))
- Drop support for unstable field names from the long-accepted [MSC2732](matrix-org/matrix-spec-proposals#2732) (Olm fallback keys) proposal. ([\element-hq#18996](element-hq#18996))

- Cleanly shutdown `SynapseHomeServer` object, allowing artifacts of embedded small hosts to be properly garbage collected. ([\element-hq#18828](element-hq#18828))
- Update OEmbed providers to use 'X' instead of 'Twitter' in URL previews, following a rebrand. Contributed by @HammyHavoc. ([\element-hq#18767](element-hq#18767))
- Fix `server_name` in logging context for multiple Synapse instances in one process. ([\element-hq#18868](element-hq#18868))
- Wrap the Rust HTTP client with `make_deferred_yieldable` so it follows Synapse logcontext rules. ([\element-hq#18903](element-hq#18903))
- Fix the GitHub Actions workflow that moves issues labeled "X-Needs-Info" to the "Needs info" column on the team's internal triage board. ([\element-hq#18913](element-hq#18913))
- Disconnect background process work from request trace. ([\element-hq#18932](element-hq#18932))
- Reduce overall number of calls to `_get_e2e_cross_signing_signatures_for_devices` by increasing the batch size of devices the query is called with, reducing DB load. ([\element-hq#18939](element-hq#18939))
- Update error code used when an appservice tries to masquerade as an unknown device using [MSC4326](matrix-org/matrix-spec-proposals#4326). Contributed by @tulir @ Beeper. ([\element-hq#18947](element-hq#18947))
- Fix `no active span when trying to log` tracing error on startup (when OpenTracing is enabled). ([\element-hq#18959](element-hq#18959))
- Fix `run_coroutine_in_background(...)` incorrectly handling logcontext. ([\element-hq#18964](element-hq#18964))
- Add debug logs wherever we change current logcontext. ([\element-hq#18966](element-hq#18966))
- Update dockerfile metadata to fix broken link; point to documentation website. ([\element-hq#18971](element-hq#18971))
- Note that the code is additionally licensed under the [Element Commercial license](https://github.com/element-hq/synapse/blob/develop/LICENSE-COMMERCIAL) in SPDX expression field configs. ([\element-hq#18973](element-hq#18973))
- Fix logcontext handling in `timeout_deferred` tests. ([\element-hq#18974](element-hq#18974))
- Remove internal `ReplicationUploadKeysForUserRestServlet` as a follow-up to the work in element-hq#18581 that moved device changes off the main process. ([\element-hq#18988](element-hq#18988))
- Switch task scheduler from raw logcontext manipulation to using the dedicated logcontext utils. ([\element-hq#18990](element-hq#18990))
- Remove `MockClock()` in tests. ([\element-hq#18992](element-hq#18992))
- Switch back to our own custom `LogContextScopeManager` instead of OpenTracing's `ContextVarsScopeManager` which was causing problems when using the experimental `SYNAPSE_ASYNC_IO_REACTOR` option with tracing enabled. ([\element-hq#19007](element-hq#19007))
- Remove `version_string` argument from `HomeServer` since it's always the same. ([\element-hq#19012](element-hq#19012))
- Remove duplicate call to `hs.start_background_tasks()` introduced from a bad merge. ([\element-hq#19013](element-hq#19013))
- Split homeserver creation (`create_homeserver`) and setup (`setup`). ([\element-hq#19015](element-hq#19015))
- Swap near-end-of-life `macos-13` GitHub Actions runner for the `macos-15-intel` variant. ([\element-hq#19025](element-hq#19025))
- Introduce `RootConfig.validate_config()` which can be subclassed in `HomeServerConfig` to do cross-config class validation. ([\element-hq#19027](element-hq#19027))
- Allow any command of the `release.py` script to accept a `--gh-token` argument. ([\element-hq#19035](element-hq#19035))

* Bump Swatinem/rust-cache from 2.8.0 to 2.8.1. ([\element-hq#18949](element-hq#18949))
* Bump actions/cache from 4.2.4 to 4.3.0. ([\element-hq#18983](element-hq#18983))
* Bump anyhow from 1.0.99 to 1.0.100. ([\element-hq#18950](element-hq#18950))
* Bump authlib from 1.6.3 to 1.6.4. ([\element-hq#18957](element-hq#18957))
* Bump authlib from 1.6.4 to 1.6.5. ([\element-hq#19019](element-hq#19019))
* Bump bcrypt from 4.3.0 to 5.0.0. ([\element-hq#18984](element-hq#18984))
* Bump docker/login-action from 3.5.0 to 3.6.0. ([\element-hq#18978](element-hq#18978))
* Bump lxml from 6.0.0 to 6.0.2. ([\element-hq#18979](element-hq#18979))
* Bump phonenumbers from 9.0.13 to 9.0.14. ([\element-hq#18954](element-hq#18954))
* Bump phonenumbers from 9.0.14 to 9.0.15. ([\element-hq#18991](element-hq#18991))
* Bump prometheus-client from 0.22.1 to 0.23.1. ([\element-hq#19016](element-hq#19016))
* Bump pydantic from 2.11.9 to 2.11.10. ([\element-hq#19017](element-hq#19017))
* Bump pygithub from 2.7.0 to 2.8.1. ([\element-hq#18952](element-hq#18952))
* Bump regex from 1.11.2 to 1.11.3. ([\element-hq#18981](element-hq#18981))
* Bump serde from 1.0.224 to 1.0.226. ([\element-hq#18953](element-hq#18953))
* Bump serde from 1.0.226 to 1.0.228. ([\element-hq#18982](element-hq#18982))
* Bump setuptools-rust from 1.11.1 to 1.12.0. ([\element-hq#18980](element-hq#18980))
* Bump twine from 6.1.0 to 6.2.0. ([\element-hq#18985](element-hq#18985))
* Bump types-pyyaml from 6.0.12.20250809 to 6.0.12.20250915. ([\element-hq#19018](element-hq#19018))
* Bump types-requests from 2.32.4.20250809 to 2.32.4.20250913. ([\element-hq#18951](element-hq#18951))
* Bump typing-extensions from 4.14.1 to 4.15.0. ([\element-hq#18956](element-hq#18956))