Antalya 25.3: lock_object_storage_task_distribution_ms setting #866
Conversation
SETTINGS {",".join(f"{k}={v}" for k, v in settings.items())} | ||
""", | ||
query_id=query_id_first, | ||
timeout=30, |
Timeouts removed, because each one creates too many descriptors
(https://github.com/Altinity/ClickHouse/blob/antalya/tests/integration/helpers/client.py#L209).
Locally, tests failed with "too many open files".
return {};

StorageObjectStorageStableTaskDistributor::CommandInTaskResponse command(object_info->getPath());
if (command.is_parsed())
Not elegant, but for backward compatibility a plain string is kept as a filename, while JSON carries the additional information.
Can be extended in the future.
It looks like storing JSON in a variable named as a path, which is highly misleading... If you want to pass some extra data as JSON, please do it explicitly: either by storing some extra information in ObjectInfo's ObjectMetadata, by updating ObjectMetadata with an extra field (preferred), by inserting a special key into ObjectMetadata::attributes with stringified JSON (less desirable), or by doing something else.
Moved into RelativePathWithMetadata.
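For context, a minimal sketch of the dual-purpose payload discussed above. The class name, is_parsed(), and the two private members appear in this PR's diff; the "retry_after_us" JSON key name and the use of Poco::JSON are illustrative assumptions. A plain string stays a filename, while a parseable JSON object is treated as a command:

#include <optional>
#include <string>
#include <Poco/JSON/Object.h>
#include <Poco/JSON/Parser.h>

/// Illustrative only: a task payload is either a plain file path
/// (backward-compatible case) or a JSON command with extra fields.
class CommandInTaskResponse
{
public:
    explicit CommandInTaskResponse(const std::string & payload)
    {
        try
        {
            Poco::JSON::Parser parser;
            auto object = parser.parse(payload).extract<Poco::JSON::Object::Ptr>();
            if (object->has("retry_after_us"))      /// key name is an assumption
                retry_after_us = object->getValue<uint64_t>("retry_after_us");
            successfully_parsed = true;
        }
        catch (...)
        {
            /// Not a JSON object: the payload is an ordinary file path.
        }
    }

    bool is_parsed() const { return successfully_parsed; }
    std::optional<uint64_t> get_retry_after_us() const { return retry_after_us; }

private:
    bool successfully_parsed = false;
    std::optional<uint64_t> retry_after_us;
};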
: iterator(std::move(iterator_))
, connection_to_files(ids_of_nodes_.size())
, ids_of_nodes(ids_of_nodes_)
, lock_object_storage_task_distribution_us(lock_object_storage_task_distribution_ms_ * 1000)
Poco::Timestamp is in microseconds, so here we convert from milliseconds to microseconds once, to avoid multiplications or divisions later.
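A rough illustration of the payoff (hypothetical snippet, assuming the interval is converted once at construction; Poco::Timestamp arithmetic and elapsed() natively work in microseconds):

#include <Poco/Timestamp.h>

int main()
{
    const Poco::Timestamp::TimeDiff lock_ms = 3000;           /// setting value, milliseconds
    const Poco::Timestamp::TimeDiff lock_us = lock_ms * 1000; /// convert once up front

    Poco::Timestamp last_activity; /// default-constructed: "now", microseconds since epoch

    /// Later checks use Poco's native microsecond arithmetic directly,
    /// with no per-check unit conversion:
    const bool expired = last_activity.elapsed() > lock_us;
    return expired ? 1 : 0;
}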
def check_s3_gets(cluster, node, expected_result, cluster_first, cluster_second, enable_filesystem_cache):
def check_s3_gets(cluster, node, expected_result, cluster_first, cluster_second, enable_filesystem_cache, lock=False):
Here and everywhere else down from here, just for the sake of readability and maintainability... could you please consider renaming lock into something like use_lock_object_storage_task_distribution_ms, or even just lock_object_storage_task_distribution_ms (with either None or a numerical value)? That way it is clear what the parameter is used for, and it is also more grep-friendly.
if (retry_after_us.has_value())
{
    not_a_path = true;
    /// TODO: Make asynchronous waiting without sleep in thread
/// TODO: Make asynchronous waiting without sleep in thread
When do you plan to implement this TODO?
I need a deeper understanding of how to do it; I plan to make a separate PR later.
In any case it needs to be done before sending this upstream; for now the code quality here is just experimental.
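For reference, a minimal sketch of the current blocking approach under discussion (illustrative, not the PR's exact code): when the distributor answers "retry later", the reading thread simply sleeps for the requested interval before asking again.

#include <chrono>
#include <cstdint>
#include <optional>
#include <thread>

/// Illustrative polling loop: a worker that receives a retry hint
/// blocks its own thread until the lock interval may have elapsed.
void waitBeforeRetry(std::optional<uint64_t> retry_after_us)
{
    if (retry_after_us.has_value())
        std::this_thread::sleep_for(std::chrono::microseconds(*retry_after_us));
    /// An asynchronous design would instead re-enqueue the request with a
    /// deadline, freeing the thread for other work in the meantime.
}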
private:
    bool successfully_parsed = false;
    std::optional<uint64_t> retry_after_us;
Why not Poco::Timestamp::TimeDiff?
: iterator(std::move(iterator_))
, connection_to_files(ids_of_nodes_.size())
, ids_of_nodes(ids_of_nodes_)
, lock_object_storage_task_distribution_us(lock_object_storage_task_distribution_ms_ * 1000)
There is a slight chance of overflow here: a large positive value (most likely unintentional, due to misconfiguration or error) becomes a negative one, since Poco::Timestamp::TimeDiff is a signed int64. I am not sure it is worth adding overflow checks here, but please make sure that negative values do not break anything down the road.
Added a check for lock_object_storage_task_distribution_ms in getTaskIteratorExtension, before the constructor is called.
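A sketch of the kind of pre-constructor guard meant here (function name and clamp policy are assumptions, not the PR's actual code): validate the millisecond setting before the *1000 conversion, so the signed microsecond value can never overflow or go negative.

#include <cstdint>
#include <limits>

using TimeDiff = int64_t; /// stands in for Poco::Timestamp::TimeDiff

/// Clamp the user-provided millisecond setting so that ms * 1000
/// always fits into a signed 64-bit microsecond value.
TimeDiff sanitizeLockMs(int64_t lock_ms)
{
    constexpr int64_t max_ms = std::numeric_limits<int64_t>::max() / 1000;
    if (lock_ms < 0)
        return 0;      /// treat negative values as "feature disabled"
    if (lock_ms > max_ms)
        return max_ms; /// avoid signed overflow on conversion
    return lock_ms;
}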
/// Limit time of node activity to keep task in queue
Poco::Timestamp activity_limit;
Poco::Timestamp oldest_activity;
if (lock_object_storage_task_distribution_us)
This should be lock_object_storage_task_distribution_us >= 0, since lock_object_storage_task_distribution_us is SIGNED.
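A sketch of how the guarded activity window could look (function shape inferred from the snippet above; the exact bookkeeping is an assumption):

#include <Poco/Timestamp.h>

/// Illustrative: a task stays reserved for its primary node while that node
/// showed activity after `activity_limit`; a non-positive setting disables locking.
bool shouldKeepTaskReserved(Poco::Timestamp last_node_activity, Poco::Timestamp::TimeDiff lock_us)
{
    if (lock_us <= 0)               /// signed: also guards against negative misconfiguration
        return false;               /// locking disabled

    Poco::Timestamp activity_limit; /// default-constructed: "now"
    activity_limit -= lock_us;      /// window start: now minus the lock interval
    return last_node_activity >= activity_limit;
}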
Antalya 25.3: lock_object_storage_task_distribution_ms setting
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Setting lock_object_storage_task_distribution_ms to improve cache locality with swarm cluster.
Documentation entry for user-facing changes
Solves #867
The setting lock_object_storage_task_distribution_ms prevents an object from being processed on non-primary swarm nodes while its primary node has been alive within the last lock_object_storage_task_distribution_ms milliseconds.
Example: a swarm has 2 nodes; node1 has processed all of its objects, while node2 is overloaded and has processed only part of its objects.
Behavior before this PR and by default (lock_object_storage_task_distribution_ms=0): node1 takes the unprocessed objects.
Behavior with lock_object_storage_task_distribution_ms=3000: node1 takes unprocessed objects only if node2 has not taken any object for 3 seconds. If node2 took something 1 second ago, node1 waits and makes a new request after 2 seconds. If node2 still has not taken any object by then, the objects go to node1; if node2 took something, node1 waits again until the end of the new 3-second period.
Exclude tests: