-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
┌─────────────────────────┐ Merged Stream
│ ┌───┬───┬───┬───┐ │ ┌───────────────────────────────┐
│ │ A │ B │ C │ D │ ... │────────┐ │ ┌───┬───╦═══╦───┬───╦═══╗ │
│ └───┴───┴───┴───┘ │ ├───────▶│ │ A │ B ║ B ║ C │ D ║ E ║ ... │
└─────────────────────────┘ │ │ └───┴─▲─╩═══╩───┴───╩═══╝ │
Stream 1 │ └───────────────────────────────┘
│ └ ─ ─ ─ ─ ─ ─ ┐
│
┌─────────────────────────┐ │ │
│ ╔═══╦═══╗ │ │
│ ║ B ║ E ║ ... │────────┘ Stable Sort: the merged
│ ╚═══╩═══╝ │ stream places equal rows from
└─────────────────────────┘ stream 1 *before* stream 2
Stream 2
Thus, if stream 1 and stream 2 had the same record values, it was guaranteed that the row from stream1 came out before stream 2. IOx uses this property correctly deduplicate updates (as streams with a larger index were inserted later).
IOx uses this property correctly deduplicate updates (as streams with a larger index were inserted later).
SortPreservingMergeStream appears to no longer be stable after #1596 was merged Specifically, the previous implementation of SortPreservingMergeSteam would always pick inputs with a lower 'stream index' when the sort keys were tied.
Describe the solution you'd like
I would like SortPreservingMergeStream to be stable
Describe alternatives you've considered
We could add a synthetic "column" to our input and then add it to the sort key, but this seems like a larger overhead than necessary
Additional context
I have a simple fix for this. Will get it up shortly.
cc @yjshen