
Conversation

@dr-m
Contributor

@dr-m dr-m commented May 28, 2025

  • The Jira issue number for this PR is: MDEV-36886

Description

If the execution of the two reads in log_t::get_lsn_approx() is interleaved with concurrent writes of those fields in log_t::write_buf() or log_t::persist(), the returned approximation will be an upper bound. If log_t::append_prepare_wait() is pending, the approximation could be a lower bound.
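
For context, here is a minimal standalone sketch of the kind of interleaving being described. It assumes, purely for illustration, that the approximate LSN is derived from two fields that writers update one after the other; the names below are invented for this sketch and are not the actual log_t members:

    #include <atomic>
    #include <cstdint>

    typedef uint64_t lsn_t;

    /* Hypothetical stand-ins for the two fields that a reader combines
    without holding any latch. */
    std::atomic<lsn_t> base_lsn{1000};
    std::atomic<lsn_t> buf_offset{40};

    /* Reader: each load is atomic on its own, but the pair is not read
    atomically, so the sum may combine a new base_lsn with a stale
    buf_offset (or vice versa) and over- or under-shoot the LSN that was
    current at any single instant. */
    lsn_t get_lsn_approx_sketch()
    {
      return base_lsn.load(std::memory_order_relaxed) +
             buf_offset.load(std::memory_order_relaxed);
    }

    /* Writer: advancing the written position and shrinking the in-buffer
    remainder is not one atomic step; a concurrent reader that runs
    between the two stores obtains an upper bound. */
    void advance_sketch(lsn_t written)
    {
      base_lsn.fetch_add(written, std::memory_order_relaxed);
      buf_offset.fetch_sub(written, std::memory_order_relaxed);
    }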

We must adjust each caller of log_t::get_lsn_approx() for the possibility that the return value is larger than
MAX(oldest_modification) in buf_pool.flush_list.

af_needed_for_redo(): Add a comment that explains why the glitch is not a problem.

page_cleaner_flush_pages_recommendation(): Revise the logic for the unlikely case cur_lsn < oldest_lsn. The original logic would have invoked af_get_pct_for_lsn() with a very large age value, which would likely cause an overflow of the local variable lsn_age_factor and make pct_for_lsn a "random number". Based on that value, total_ratio would still be normalized to something between 0.0 and 1.0, so nothing extremely bad should have happened in this case; innodb_io_capacity_max should not be exceeded.
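
As a rough sketch of what the revised caller logic boils down to (the function name and parameters here are hypothetical; the real function computes much more than this):

    #include <cstdint>

    typedef uint64_t lsn_t;

    /* Clamp the redo age to zero when the approximation happens to lag
    behind the oldest modification, instead of letting the unsigned
    subtraction wrap around and feed a huge value into the
    lsn_age_factor computation. */
    static lsn_t redo_age_sketch(lsn_t cur_lsn, lsn_t oldest_lsn)
    {
      return cur_lsn < oldest_lsn ? 0 : cur_lsn - oldest_lsn;
    }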

Release Notes

N/A. This probably has no observable impact.

How can this PR be tested?

@mariadb-SaahilAlam reproduced this in #4014 with RQG under rr record. I presume that the probability of hitting this is very low; it is a very unlikely race condition.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.


Contributor

@mariadb-DebarunBanerjee mariadb-DebarunBanerjee left a comment

Looks good. Please see the comment.

Comment on lines +2297 to 2305
  /* We may have oldest_lsn > log_sys.get_lsn_approx() if
  log_t::write_buf() or log_t::persist() are executing concurrently
  with this. In that case, age > af_lwm should hold, and
  buf_flush_page_cleaner() would execute one more timed wait. (Not a
  big problem.) */

  lsn_t age= log_sys.get_lsn_approx() - oldest_lsn;
  lsn_t af_lwm= static_cast<lsn_t>(srv_adaptive_flushing_lwm *
                static_cast<double>(log_sys.log_capacity) / 100);
Contributor

The explanation is correct, but we don't want to subtract a bigger number and have the negative result converted into a huge (incorrect) age that falsely recommends adaptive flushing. Can you please add a similar condition as in the other caller of get_lsn_approx() (page_cleaner_flush_pages_recommendation)?

Contributor Author

lsn_t is an unsigned integer (uint64_t, aliased via ib_uint64_t), and any wrap-around of unsigned integers is well defined behaviour. For signed integers, overflow from positive to negative or vice versa is undefined behaviour. This fact allows compilers to optimize some loop conditions when the loop variable is a signed int (although not in InnoDB).
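
For illustration only, a self-contained sketch of that unsigned wrap-around and of why the resulting comparison still holds (all values here are made up):

    #include <cassert>
    #include <cstdint>

    typedef uint64_t lsn_t;

    int main()
    {
      /* If oldest_lsn briefly exceeds the approximate LSN, the unsigned
      subtraction wraps around to a value close to 2^64; this is well
      defined, unlike signed overflow. */
      lsn_t approx_lsn= 1000, oldest_lsn= 1010;
      lsn_t age= approx_lsn - oldest_lsn;
      lsn_t af_lwm= 123456; /* any realistic low-water mark */
      assert(age > af_lwm); /* so af_needed_for_redo() would return true */
      return 0;
    }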

The comment tries to explain that the impact of the odd return true instead of a more correct return false is limited to the following in the only caller buf_flush_page_cleaner():

    possibly_unemployed:
      if (!soft_lsn_limit && !af_needed_for_redo(oldest_lsn))
        goto set_idle;

    buf_pool.page_cleaner_set_idle(false);
    buf_pool.n_flush_inc();
    mysql_mutex_unlock(&buf_pool.flush_list_mutex);

    if (UNIV_UNLIKELY(soft_lsn_limit != 0))
    {
      n= srv_max_io_capacity;
      goto background_flush;
    }

    if (!srv_adaptive_flushing)

That is, an accidental return true may lead to attempting to run an extra flushing batch instead of entering an unbounded wait for buf_pool_t::page_cleaner_wakeup(). This is similar to the glitch in page_cleaner_flush_pages_recommendation(), which could return an unnecessarily large number of recommended pages to flush.

@dr-m dr-m merged commit 4d37b1c into 10.11 May 28, 2025
11 of 13 checks passed
@dr-m dr-m deleted the 10.11-MDEV-36886 branch May 28, 2025 13:48
