-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
MDEV-36886 log_t::get_lsn_approx() isn't lower bound #4074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If the execution of the two reads in log_t::get_lsn_approx() is interleaved with concurrent writes of those fields in log_t::write_buf() or log_t::persist(), the returned approximation will be an upper bound. If log_t::append_prepare_wait() is pending, the approximation could be a lower bound. We must adjust each caller of log_t::get_lsn_approx() for the possibility that the return value is larger than MAX(oldest_modification) in buf_pool.flush_list. af_needed_for_redo(): Add a comment that explains why the glitch is not a problem. page_cleaner_flush_pages_recommendation(): Revise the logic for the unlikely case cur_lsn < oldest_lsn. The original logic would have invoked af_get_pct_for_lsn() with a very large age value, which would likely cause an overflow of the local variable lsn_age_factor, and make pct_for_lsn a "random number". Based on that value, total_ratio would be normalized to something between 0.0 and 1.0. Nothing extremely bad should have happened in this case; the innodb_io_capacity_max should not be exceeded.
|
|
mariadb-DebarunBanerjee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Please see the comment.
| /* We may have oldest_lsn > log_sys.get_lsn_approx() if | ||
| log_t::write_buf() or log_t::persist() are executing concurrently | ||
| with this. In that case, age > af_lwm should hold, and | ||
| buf_flush_page_cleaner() would execute one more timed wait. (Not a | ||
| big problem.) */ | ||
|
|
||
| lsn_t age= log_sys.get_lsn_approx() - oldest_lsn; | ||
| lsn_t af_lwm= static_cast<lsn_t>(srv_adaptive_flushing_lwm * | ||
| static_cast<double>(log_sys.log_capacity) / 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation is correct but we don't want to subtract a bigger number and then convert the -ve number to get a huge(incorrect) age and falsely recommend adaptive flushing. Can you please add similar condition as in the other caller of get_lsn_approx (page_cleaner_flush_pages_recommendation) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lsn_t is an unsigned integer (uint64_t aliased via ib_uint64_t), and any wrap-around for unsigned integers is well defined behaviour. For signed integers, wrap-around from positive to negative or vice versa is undefined behaviour. This fact allows compilers to optimize some loop conditions when the loop variable typically is a signed int (although not in InnoDB).
The comment tries to explain that the impact of the odd return true instead of a more correct return false is limited to the following in the only caller buf_flush_page_cleaner():
possibly_unemployed:
if (!soft_lsn_limit && !af_needed_for_redo(oldest_lsn))
goto set_idle;
buf_pool.page_cleaner_set_idle(false);
buf_pool.n_flush_inc();
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
if (UNIV_UNLIKELY(soft_lsn_limit != 0))
{
n= srv_max_io_capacity;
goto background_flush;
}
if (!srv_adaptive_flushing)That is, an accidental return true may lead to attempting to run an extra flushing batch instead of entering an unbounded wait for buf_pool_t::page_cleaner_wakeup(). This is similar to the glitch in page_cleaner_flush_pages_recommendation(), which could return an unnecessarily large number of recommended pages to flush.
Description
If the execution of the two reads in
log_t::get_lsn_approx()is interleaved with concurrent writes of those fields inlog_t::write_buf()orlog_t::persist(), the returned approximation will be an upper bound. Iflog_t::append_prepare_wait()is pending, the approximation could be a lower bound.We must adjust each caller of
log_t::get_lsn_approx()for the possibility that the return value is larger thanMAX(oldest_modification)inbuf_pool.flush_list.af_needed_for_redo(): Add a comment that explains why the glitch is not a problem.page_cleaner_flush_pages_recommendation(): Revise the logic for the unlikely casecur_lsn < oldest_lsn. The original logic would have invokedaf_get_pct_for_lsn()with a very large age value, which would likely cause an overflow of the local variablelsn_age_factor, and makepct_for_lsna "random number". Based on that value, total_ratio would be normalized to something between0.0and1.0. Nothing extremely bad should have happened in this case; theinnodb_io_capacity_maxshould not be exceeded.Release Notes
N/A. This probably has no observable impact.
How can this PR be tested?
@mariadb-SaahilAlam reproduced this in #4014 using RQG using
rr record, and I presume that the probability of this is very low. This is a very unlikely race condition.Basing the PR against the correct MariaDB version
mainbranch.PR quality check