
Conversation

@dr-m
Contributor

@dr-m dr-m commented May 28, 2025

  • The Jira issue number for this PR is: MDEV-36886

Description

If the execution of the two reads in log_t::get_lsn_approx() is interleaved with concurrent writes of those fields in log_t::write_buf() or log_t::persist(), the returned approximation will be an upper bound. If log_t::append_prepare_wait() is pending, the approximation could be a lower bound.
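
For context, here is a minimal standalone sketch of the kind of interleaving being described. It assumes, purely for illustration, that the approximate LSN is derived from two fields that writers update one after the other; the names below are invented for this sketch and are not the actual log_t members:

    #include <atomic>
    #include <cstdint>

    typedef uint64_t lsn_t;

    /* Hypothetical stand-ins for the two fields that a reader combines
    without holding any latch. */
    std::atomic<lsn_t> base_lsn{1000};
    std::atomic<lsn_t> buf_offset{40};

    /* Reader: each load is atomic on its own, but the pair is not read
    atomically, so the sum may combine a new base_lsn with a stale
    buf_offset (or vice versa) and over- or under-shoot the LSN that was
    current at any single instant. */
    lsn_t get_lsn_approx_sketch()
    {
      return base_lsn.load(std::memory_order_relaxed) +
             buf_offset.load(std::memory_order_relaxed);
    }

    /* Writer: advancing the written position and shrinking the in-buffer
    remainder is not one atomic step; a concurrent reader that runs
    between the two stores obtains an upper bound. */
    void advance_sketch(lsn_t written)
    {
      base_lsn.fetch_add(written, std::memory_order_relaxed);
      buf_offset.fetch_sub(written, std::memory_order_relaxed);
    }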

We must adjust each caller of log_t::get_lsn_approx() for the possibility that the return value is larger than
MAX(oldest_modification) in buf_pool.flush_list.

af_needed_for_redo(): Add a comment that explains why the glitch is not a problem.

page_cleaner_flush_pages_recommendation(): Revise the logic for the unlikely case cur_lsn < oldest_lsn. The original logic would have invoked af_get_pct_for_lsn() with a very large age value, which would likely cause an overflow of the local variable lsn_age_factor and make pct_for_lsn a "random number". Based on that value, total_ratio would still be normalized to something between 0.0 and 1.0, so nothing extremely bad should have happened in this case; innodb_io_capacity_max should not be exceeded.
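
As a rough sketch of what the revised caller logic boils down to (the function name and parameters here are hypothetical; the real function computes much more than this):

    #include <cstdint>

    typedef uint64_t lsn_t;

    /* Clamp the redo age to zero when the approximation happens to lag
    behind the oldest modification, instead of letting the unsigned
    subtraction wrap around and feed a huge value into the
    lsn_age_factor computation. */
    static lsn_t redo_age_sketch(lsn_t cur_lsn, lsn_t oldest_lsn)
    {
      return cur_lsn < oldest_lsn ? 0 : cur_lsn - oldest_lsn;
    }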

Release Notes

N/A. This probably has no observable impact.

How can this PR be tested?

@mariadb-SaahilAlam reproduced this in #4014 with RQG under rr record. I presume that the probability of hitting this is very low; it is a very unlikely race condition.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.


Contributor

@mariadb-DebarunBanerjee mariadb-DebarunBanerjee left a comment

Looks good. Please see the comment.

Comment on lines +2297 to 2305
  /* We may have oldest_lsn > log_sys.get_lsn_approx() if
  log_t::write_buf() or log_t::persist() are executing concurrently
  with this. In that case, age > af_lwm should hold, and
  buf_flush_page_cleaner() would execute one more timed wait. (Not a
  big problem.) */

  lsn_t age= log_sys.get_lsn_approx() - oldest_lsn;
  lsn_t af_lwm= static_cast<lsn_t>(srv_adaptive_flushing_lwm *
                static_cast<double>(log_sys.log_capacity) / 100);
Contributor

The explanation is correct, but we don't want to subtract a bigger number and have the negative result converted into a huge (incorrect) age that falsely recommends adaptive flushing. Can you please add a similar condition as in the other caller of get_lsn_approx() (page_cleaner_flush_pages_recommendation)?

Contributor Author

lsn_t is an unsigned integer (uint64_t, aliased via ib_uint64_t), and any wrap-around of unsigned integers is well defined behaviour. For signed integers, overflow from positive to negative or vice versa is undefined behaviour. This fact allows compilers to optimize some loop conditions when the loop variable is a signed int (although not in InnoDB).
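
For illustration only, a self-contained sketch of that unsigned wrap-around and of why the resulting comparison still holds (all values here are made up):

    #include <cassert>
    #include <cstdint>

    typedef uint64_t lsn_t;

    int main()
    {
      /* If oldest_lsn briefly exceeds the approximate LSN, the unsigned
      subtraction wraps around to a value close to 2^64; this is well
      defined, unlike signed overflow. */
      lsn_t approx_lsn= 1000, oldest_lsn= 1010;
      lsn_t age= approx_lsn - oldest_lsn;
      lsn_t af_lwm= 123456; /* any realistic low-water mark */
      assert(age > af_lwm); /* so af_needed_for_redo() would return true */
      return 0;
    }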

The comment tries to explain that the impact of the odd return true instead of a more correct return false is limited to the following in the only caller buf_flush_page_cleaner():

    possibly_unemployed:
      if (!soft_lsn_limit && !af_needed_for_redo(oldest_lsn))
        goto set_idle;

    buf_pool.page_cleaner_set_idle(false);
    buf_pool.n_flush_inc();
    mysql_mutex_unlock(&buf_pool.flush_list_mutex);

    if (UNIV_UNLIKELY(soft_lsn_limit != 0))
    {
      n= srv_max_io_capacity;
      goto background_flush;
    }

    if (!srv_adaptive_flushing)

That is, an accidental return true may lead to attempting to run an extra flushing batch instead of entering an unbounded wait for buf_pool_t::page_cleaner_wakeup(). This is similar to the glitch in page_cleaner_flush_pages_recommendation(), which could return an unnecessarily large number of recommended pages to flush.

@dr-m dr-m merged commit 4d37b1c into 10.11 May 28, 2025
11 of 13 checks passed
@dr-m dr-m deleted the 10.11-MDEV-36886 branch May 28, 2025 13:48
