-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
MDEV-15111 - fix simple InnoDB data race #582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dr-m
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me. @svoj please merge this after double-checking (you are the atomics expert).
|
Can one of the admins verify this patch? |
storage/innobase/srv/srv0srv.cc
Outdated
| if (difftime(time(NULL), | ||
| my_atomic_load32_explicit(&srv_last_monitor_time, | ||
| MY_MEMORY_ORDER_RELAXED)) | ||
| > 60) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you just move this branch to srv_monitor_thread? It doesn't seem to make any sense to update stats in a separate thread. Then you don't need atomic operations. You may also have to move this whole branch under srv_innodb_monitor_mutex protection, but that shouldn't be a big problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think if I make srv_last_monitor_time a local variable in srv_monitor_thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then how do you fix srv_printf_innodb_monitor()? I think it can be called by other threads. OTOH I doubt it uses srv_last_monitor_time correctly (probably it should actually use last_monitor_time?).
Anyway this is something that needs to be understood and fixed properly. If you can make it local eventually - I'm all for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, you're right. I can't make srv_last_monitor_time local easily. Let it be a subject of another PR.
I don't see TSAN reports now. But while this variable is accessed from different thread I guess I will with some different test in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I told:
You may also have to move this whole branch under srv_innodb_monitor_mutex protection
Something like:
static void srv_refresh_innodb_monitor_stats(void)
{
mutex_enter(&srv_innodb_monitor_mutex);
if (difftime(time(NULL), srv_last_monitor_time) > 60) {
goto end;
}
srv_last_monitor_time = time(NULL);
...
end:
mutex_exit(&srv_innodb_monitor_mutex);
Then all accesses except for one will be guarded by the mutex. The last one will be srv_last_monitor_time initialisation, which I'm not yet so sure how to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was inattentive, sorry. This is fixed now.
| { | ||
| return(m_lock_word); | ||
| return(my_atomic_load32_explicit(&m_lock_word, | ||
| MY_MEMORY_ORDER_RELAXED)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, thanks!
| mysql_mutex_lock(&thd_destructor_mutex); | ||
| srv_running = myvar; | ||
| my_atomic_storeptr_explicit(&srv_running, myvar, | ||
| MY_MEMORY_ORDER_RELAXED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of alright, but FWICS it is just a half of the fix. Though probably it is acceptable.
A few lines below there's second assignment srv_running = NULL. As well as two loads in srv0start.cc and srv0srv.cc that are most probably affected by data race as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made all accesses atomical.
svoj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix monitor stuff. The rest is optional.
e07bbc4 to
3ad2316
Compare
srv_last_monitor_time: make all accesses relaxed atomical
WARNING: ThreadSanitizer: data race (pid=12041)
Write of size 8 at 0x000003949278 by thread T26 (mutexes: write M226445748578513120):
#0 thd_destructor_proxy storage/innobase/handler/ha_innodb.cc:314:14 (mysqld+0x19b5505)
Previous read of size 8 at 0x000003949278 by main thread:
#0 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4180:11 (mysqld+0x1a03404)
#1 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc5ec73)
#2 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x134908d)
#3 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13484f0)
#4 init_server_components() sql/mysqld.cc:5345:7 (mysqld+0xbf720f)
#5 mysqld_main(int, char**) sql/mysqld.cc:5940:7 (mysqld+0xbf107d)
#6 main sql/main.cc:25:10 (mysqld+0xbe971b)
Location is global 'srv_running' of size 8 at 0x000003949278 (mysqld+0x000003949278)
WARNING: ThreadSanitizer: data race (pid=27869)
Atomic write of size 4 at 0x7b4800000c00 by thread T8:
#0 __tsan_atomic32_exchange llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:589 (mysqld+0xbd4eac)
#1 TTASEventMutex<GenericPolicy>::exit() storage/innobase/include/ib0mutex.h:467:7 (mysqld+0x1a8d4cb)
#2 PolicyMutex<TTASEventMutex<GenericPolicy> >::exit() storage/innobase/include/ib0mutex.h:609:10 (mysqld+0x1a7839e)
#3 fil_validate() storage/innobase/fil/fil0fil.cc:5535:2 (mysqld+0x1abd913)
#4 fil_validate_skip() storage/innobase/fil/fil0fil.cc:204:9 (mysqld+0x1aba601)
#5 fil_aio_wait(unsigned long) storage/innobase/fil/fil0fil.cc:5296:2 (mysqld+0x1abbae6)
#6 io_handler_thread storage/innobase/srv/srv0start.cc:340:3 (mysqld+0x21abe1e)
Previous read of size 4 at 0x7b4800000c00 by main thread (mutexes: write M1273, write M1271):
#0 TTASEventMutex<GenericPolicy>::state() const storage/innobase/include/ib0mutex.h:530:10 (mysqld+0x21c66e2)
#1 sync_array_detect_deadlock(sync_array_t*, sync_cell_t*, sync_cell_t*, unsigned long) storage/innobase/sync/sync0arr.cc:746:14 (mysqld+0x21c1c7a)
#2 sync_array_wait_event(sync_array_t*, sync_cell_t*&) storage/innobase/sync/sync0arr.cc:465:6 (mysqld+0x21c1708)
#3 TTASEventMutex<GenericPolicy>::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:516:6 (mysqld+0x1a8c206)
#4 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:635:10 (mysqld+0x1a782c3)
#5 fil_mutex_enter_and_prepare_for_io(unsigned long) storage/innobase/fil/fil0fil.cc:1131:3 (mysqld+0x1a9a92e)
#6 fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*, bool) storage/innobase/fil/fil0fil.cc:5082:2 (mysqld+0x1ab8de2)
#7 buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1112:3 (mysqld+0x1cb970a)
#8 buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1270:3 (mysqld+0x1cb7d70)
#9 buf_flush_try_neighbors(page_id_t const&, buf_flush_t, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1493:9 (mysqld+0x1cc9674)
#10 buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:1565:13 (mysqld+0x1cbadf3)
#11 buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1825:3 (mysqld+0x1cbbcb8)
#12 buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:1895:16 (mysqld+0x1cbb459)
#13 buf_flush_do_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:2065:2 (mysqld+0x1cbcfe1)
#14 buf_flush_lists(unsigned long, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:2167:8 (mysqld+0x1cbd5a3)
#15 log_preflush_pool_modified_pages(unsigned long) storage/innobase/log/log0log.cc:1400:13 (mysqld+0x1eefc3b)
#16 log_make_checkpoint_at(unsigned long, bool) storage/innobase/log/log0log.cc:1751:10 (mysqld+0x1eefb16)
#17 buf_dblwr_create() storage/innobase/buf/buf0dblwr.cc:335:2 (mysqld+0x1cd2141)
#18 innobase_start_or_create_for_mysql() storage/innobase/srv/srv0start.cc:2539:10 (mysqld+0x21b4d8e)
#19 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4193:8 (mysqld+0x1a5e3d7)
#20 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc74d33)
#21 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x1376d5d)
#22 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13761c0)
#23 init_server_components() sql/mysqld.cc:5348:7 (mysqld+0xc0d0ff)
#24 mysqld_main(int, char**) sql/mysqld.cc:5943:7 (mysqld+0xc06f9d)
#25 main sql/main.cc:25:10 (mysqld+0xbff71b)
WARNING: ThreadSanitizer: data race (pid=29031)
Write of size 8 at 0x0000039e48e0 by thread T15:
#0 srv_monitor_thread storage/innobase/srv/srv0srv.cc:1699:24 (mysqld+0x21a254e)
Previous write of size 8 at 0x0000039e48e0 by thread T14:
#0 srv_refresh_innodb_monitor_stats() storage/innobase/srv/srv0srv.cc:1165:24 (mysqld+0x21a3124)
#1 srv_error_monitor_thread storage/innobase/srv/srv0srv.cc:1836:3 (mysqld+0x21a2d40)
Location is global 'srv_last_monitor_time' of size 8 at 0x0000039e48e0 (mysqld+0x0000039e48e0)
…or srv_last_monitor_time
3ad2316 to
e8696e5
Compare
|
|
||
| if (difftime(current_time, srv_last_monitor_time) > 60) { | ||
| /* We referesh InnoDB Monitor values so that averages are | ||
| printed from at most 60 last seconds */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to unlock mutex here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Sorry once more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I will go through this patch once again soon. So far looks good.
| while (!srv_running->abort) | ||
| while (!my_atomic_loadptr_explicit(&srv_running, | ||
| MY_MEMORY_ORDER_RELAXED) | ||
| ->abort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted this atomic load: no other thread can update srv_running.
| srv_last_monitor_time = time(NULL); | ||
| time_t current_time = time(NULL); | ||
|
|
||
| if (difftime(current_time, srv_last_monitor_time) > 60) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inverted this condition, so that original meaning retained.
|
Pushed 8d32959, thanks! |
Original PR
I'm contributing this new code of the whole pull request, including one or several files that are either new files or modified ones, under the BSD-new license.