Skip to content

Conversation

kegsay
Copy link
Contributor

@kegsay kegsay commented Sep 24, 2025

This implements MSC4354: Sticky Events. To enable it, set this in homeserver.yaml:

experimental_features:
    msc4354_enabled: true

Still a WIP because it needs:

  • more federation support (sending sticky events to newly joined servers)
  • complement tests
  • unit tests

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Copy link
Contributor

@MadLittleMods MadLittleMods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked everything over but didn't full evaluate the sticky events part against the MSC or fully ponder weird corners.


content: JsonDict = attr.Factory(dict)
unsigned: JsonDict = attr.Factory(dict)
sticky: Optional[StickyEventField] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sticky: Optional[StickyEventField] = None
sticky: Optional[StickyEventField] = None
"""
TODO
"""

It would be good to add a docstring to indicate where this is from and what's used for.

Comment on lines +198 to +199
# Mark this as spam so we don't re-evaluate soft-failure status.
redacted_event.internal_metadata.policy_server_spammy = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unrelated to sticky events

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's related because spam is currently flagged in Synapse via soft-failure, and sometimes via policy_server_spammy. As this PR is re-evaluating soft-failed events, we want to make sure we don't re-evaluate spam, hence need to distinguish it.

if not sticky_events_request.enabled:
return None
now = self.clock.time_msec()
from_id = from_token.stream_token.sticky_events_key if from_token else 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we copy this pattern?

if from_token:
from_stream_id = from_token.stream_token.thread_subscriptions_key
else:
from_stream_id = StreamToken.START.thread_subscriptions_key

Comment on lines +1006 to +1011
_, room_to_event_ids = await self.store.get_sticky_events_in_rooms(
actual_room_ids,
from_id,
to_token.sticky_events_key,
now,
)
Copy link
Contributor

@MadLittleMods MadLittleMods Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To double-check, it's desired to return all sticky events in the room on initial sync? It's fine if there are 100k+ sticky events? Currently, on initial sync, this will return everything from 0 to current.

Here is my suggestion:

I think the better way to go about is to apply the same pattern that we just worked through with thread subscriptions. We would need a few things:

  • A way to return the new sticky events in /sync
  • Dedicated sticky_events_limited flag when there are other sticky events that we didn't return
  • A pagination endpoint for the sticky events

To be more detailed, these could be normal timeline events. If there are more sticky events in between the given sync token and the current position, we set the sticky_events_limited flag. For the pagination endpoint, we could overload /messages with a new filter.

-- @MadLittleMods, internal room

Some other cross-links when we tackled this with thread_subscriptions -> #18695 (comment) and matrix-org/matrix-spec-proposals#4306 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the same problem as the timeline. While the section talks about the problem extensively, I'm not really convinced this is fine.

The problem is the amount of work needed to come up with the giant 100k response and the amount of network effort to get that back to the client.

content: The content of the event to be sent.
delay: How long (in milliseconds) to wait before automatically sending the event.
sticky_duration_ms: The sticky duration if any, see MSC4354.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be expanded a bit to explain what it means without having to look at MSC4354. I have to make a lot of assumptions otherwise.

"Time since xxx that the event should appear in xxx"

# Consumers call 'get_sticky_events_in_rooms' which has `WHERE expires_at > ?`
# to filter out expired sticky events that have yet to be deleted.
DELETE_EXPIRED_STICKY_EVENTS_MS = 60 * 1000 * 60 # 1 hour
MAX_STICKY_DURATION_MS = 3600000 # 1 hour
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the client event creation, we should be preventing creating events that have duration greater than MAX_STICKY_DURATION_MS

if (
type(sticky_duration_ms) is int
and sticky_duration_ms >= 0
and sticky_duration_ms <= MAX_STICKY_DURATION_MS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silently ignoring events with a duration greater than the MAX_STICKY_DURATION_MS seems a bit dubious. Is that correct? We should explain why that's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If created locally, we should return an error as per your earlier comment. Over federation, no auth rule forbids a high duration (the field itself isn't even protected from redaction) so we have to continue processing the event. We obviously don't want to action a long duration, so silently ignoring is the right thing to do in this particular case.

Comment on lines +381 to +385
critical_auth_types = (
EventTypes.JoinRules,
EventTypes.PowerLevels,
EventTypes.Member,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering how this plays with room versions. Do we need to be careful at all with forward compatibility, etc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using auth_types_for_event instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to be that careful here. We will need to keep this updated if we make new room versions which break the current auth model though.

9. `groups_key`: `1` (note that this key is now unused)
10. `un_partial_stated_rooms_key`: `379`
11. `thread_subscriptions_key`: 4242
12. `sticky_events_key`: 4141
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add it to the full example token above


# now we send a sticky event that we expect to be bundled with the fwd extrem event
sticky_event_id = self.helper.send_sticky_event(
room_1, "m.room.sticky", 60000, tok=u1_token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

60000 is obtuse. Would be nice to have a labeled variable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants