
Conversation

@ianton-ru commented Jun 18, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added setting lock_object_storage_task_distribution_ms to improve cache locality with swarm clusters.

Documentation entry for user-facing changes

Solves #867.
The setting lock_object_storage_task_distribution_ms prevents an object from being processed on non-primary swarm nodes while the primary node has been active within the last lock_object_storage_task_distribution_ms milliseconds.
Example: a swarm has 2 nodes; node1 has processed all of its objects, while node2 is overloaded and has processed only part of its objects.
Behavior before this PR and by default (lock_object_storage_task_distribution_ms=0): node1 takes the unprocessed objects.
Behavior with lock_object_storage_task_distribution_ms=3000: node1 takes unprocessed objects only if node2 has not taken any object for 3 seconds. If node2 took something 1 second ago, node1 waits and makes a new request after 2 seconds. If node2 still has not taken any object by then, the objects go to node1. If node2 took something, node1 waits again until the end of the new 3-second period.
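
For illustration, a minimal C++ sketch of this timing rule (the function and parameter names are assumptions for this example, not the PR's actual code):

#include <optional>
#include <Poco/Timestamp.h>

/// Sketch: decide whether a non-primary node may take an unprocessed object.
/// Returns nothing when the lock window has expired (the object may be taken),
/// or the number of microseconds to wait before asking again.
std::optional<Poco::Timestamp::TimeDiff> retryAfterUs(
    Poco::Timestamp last_primary_activity,
    Poco::Timestamp::TimeDiff lock_us) /// lock_object_storage_task_distribution_ms * 1000
{
    Poco::Timestamp now; /// default-constructed as the current time
    if (now - last_primary_activity >= lock_us)
        return std::nullopt; /// primary inactive for the whole window: take the object
    return lock_us - (now - last_primary_activity); /// wait out the remainder of the window
}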

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

@ianton-ru force-pushed the feature/cache_locality_lock branch from fd2dde7 to f1a4e00 June 18, 2025 22:36
@ianton-ru force-pushed the feature/cache_locality_lock branch from f1a4e00 to 258bbbb June 23, 2025 11:05
SETTINGS {",".join(f"{k}={v}" for k, v in settings.items())}
""",
query_id=query_id_first,
timeout=30,
@ianton-ru (Author) commented:

Timeouts removed, because they create too many descriptors
(https://github.com/Altinity/ClickHouse/blob/antalya/tests/integration/helpers/client.py#L209).
Locally, tests failed with "too many open files".

return {};

StorageObjectStorageStableTaskDistributor::CommandInTaskResponse command(object_info->getPath());
if (command.is_parsed())
@ianton-ru (Author) commented:

Not elegant, but for backward compatibility a plain string is kept as the filename, while a JSON carries anything with additional information.
This can be extended in the future.
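
A minimal sketch of that convention, assuming Poco JSON parsing (the struct and field names here are illustrative, not the PR's actual class):

#include <cstdint>
#include <optional>
#include <string>
#include <Poco/JSON/Object.h>
#include <Poco/JSON/Parser.h>

/// Sketch: a task response is either a plain filename (old protocol)
/// or a JSON object carrying additional information (new protocol).
struct TaskResponse
{
    explicit TaskResponse(const std::string & text)
    {
        try
        {
            Poco::JSON::Parser parser;
            auto object = parser.parse(text).extract<Poco::JSON::Object::Ptr>();
            if (object->has("retry_after_us"))
                retry_after_us = object->getValue<int64_t>("retry_after_us");
            parsed_as_json = true;
        }
        catch (...)
        {
            path = text; /// not JSON: the whole string is a filename, as before
        }
    }

    bool parsed_as_json = false;
    std::string path;
    std::optional<int64_t> retry_after_us;
};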

Member commented:

It looks like storing a JSON in a variable named PATH, which is highly misleading... If you want to pass some extra data as JSON, please do it explicitly.

Either store the extra information in ObjectInfo's ObjectMetadata by adding an extra field to ObjectMetadata (preferred), or insert a special key into ObjectMetadata::attributes with a stringified JSON (less desirable), or do something else.

@ianton-ru (Author) commented:

Moved into RelativePathWithMetadata

: iterator(std::move(iterator_))
, connection_to_files(ids_of_nodes_.size())
, ids_of_nodes(ids_of_nodes_)
, lock_object_storage_task_distribution_us(lock_object_storage_task_distribution_ms_ * 1000)
@ianton-ru (Author) commented:

Poco::Timestamp is in microseconds, so here we convert from milliseconds to microseconds once, to avoid multiplications or divisions later.
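
A hedged sketch of what the one-time conversion buys (function name and layout are assumptions for illustration):

#include <cstdint>
#include <Poco/Timestamp.h>

/// Sketch: with the setting converted to microseconds once, later checks
/// compare Poco::Timestamp values directly, with no further arithmetic.
bool lockStillHeld(Poco::Timestamp last_activity, int64_t lock_ms)
{
    const Poco::Timestamp::TimeDiff lock_us = lock_ms * 1000; /// done once in the real constructor
    Poco::Timestamp activity_limit;                           /// defaults to "now"
    activity_limit -= lock_us;                                /// now minus the lock window
    return last_activity >= activity_limit;                   /// recent activity keeps the lock
}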

@Enmk changed the title lock_object_storage_task_distribution_ms setting → Antalya 25.3: lock_object_storage_task_distribution_ms setting Jul 1, 2025


def check_s3_gets(cluster, node, expected_result, cluster_first, cluster_second, enable_filesystem_cache):
def check_s3_gets(cluster, node, expected_result, cluster_first, cluster_second, enable_filesystem_cache, lock=False):
Member commented:

Here and everywhere else down from here, just for the sake of readability and maintainability... could you please consider renaming lock to something like use_lock_object_storage_task_distribution_ms, or even just lock_object_storage_task_distribution_ms with either None or a numerical value?

That way it is clear what the parameter is used for, and it is also more grep-friendly...

if (retry_after_us.has_value())
{
not_a_path = true;
/// TODO: Make asynchronous waiting without sleep in thread
Member commented:

/// TODO: Make asynchronous waiting without sleep in thread

When do you plan to implement this TODO?

@ianton-ru (Author) commented:

I need a deeper understanding of how to do it; I plan to make a separate PR later.
In any case it needs to be done before sending to upstream; for now this is an experimental level of code quality.


private:
bool successfully_parsed = false;
std::optional<uint64_t> retry_after_us;
Member commented:

Why not Poco::Timestamp::TimeDiff?
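
A minimal sketch of the suggested change (the struct name here is hypothetical):

#include <optional>
#include <Poco/Timestamp.h>

/// Poco::Timestamp::TimeDiff is a signed Int64 counting microseconds,
/// which matches the units retry_after_us already stores.
struct CommandInTaskResponseSketch
{
    bool successfully_parsed = false;
    std::optional<Poco::Timestamp::TimeDiff> retry_after_us;
};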

: iterator(std::move(iterator_))
, connection_to_files(ids_of_nodes_.size())
, ids_of_nodes(ids_of_nodes_)
, lock_object_storage_task_distribution_us(lock_object_storage_task_distribution_ms_ * 1000)
@Enmk (Member) commented Jul 2, 2025:

There is a slight chance of overflow, with a large positive value (most likely unintentional, due to misconfiguration or error) becoming a negative one, since Poco::Timestamp::TimeDiff is a signed int64.

IDK if it is even worth doing overflow checks here, but please make sure that negative values do not break anything down the road.

@ianton-ru (Author) commented:

Added checks for lock_object_storage_task_distribution_ms in getTaskIteratorExtension, before the constructor is called.
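
A sketch of such a guard (the bound and the exception type are illustrative assumptions, not the PR's actual code):

#include <cstdint>
#include <limits>
#include <stdexcept>

/// Sketch: reject values whose ms -> us conversion would overflow
/// a signed 64-bit TimeDiff, before the distributor is constructed.
int64_t validatedLockUs(uint64_t lock_ms)
{
    constexpr uint64_t max_ms = std::numeric_limits<int64_t>::max() / 1000;
    if (lock_ms > max_ms)
        throw std::invalid_argument("lock_object_storage_task_distribution_ms is too large");
    return static_cast<int64_t>(lock_ms) * 1000;
}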

/// Limit time of node activity to keep task in queue
Poco::Timestamp activity_limit;
Poco::Timestamp oldest_activity;
if (lock_object_storage_task_distribution_us)
Member commented:

lock_object_storage_task_distribution_us >= 0, since lock_object_storage_task_distribution_us is SIGNED

@ianton-ru force-pushed the feature/cache_locality_lock branch from e90db31 to d1f4b4c July 2, 2025 16:10
@Enmk merged commit dc82819 into antalya-25.3 Jul 3, 2025
322 of 348 checks passed
@svb-alt added the antalya-25.6 and port-antalya (PRs to be ported to all new Antalya releases) labels and removed the antalya-25.6 label Jul 14, 2025
ianton-ru pushed a commit that referenced this pull request Aug 6, 2025
Antalya 25.3: lock_object_storage_task_distribution_ms setting
Enmk added a commit that referenced this pull request Sep 9, 2025
…us_hashing

25.6.5 Antalya port of #709, #760, #866 - Rendezvous hashing
Labels
antalya, antalya-25.3, antalya-25.3.3, port-antalya (PRs to be ported to all new Antalya releases), swarms (Antalya Roadmap: Swarms)