Skip to content

Conversation

@kevgs
Copy link
Contributor

@kevgs kevgs commented Jan 31, 2018

Original PR

I'm contributing this new code of the whole pull request, including one or several files that are either new files or modified ones, under the BSD-new license.

@svoj svoj added this to the 10.2 milestone Jan 31, 2018
Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me. @svoj please merge this after double-checking (you are the atomics expert).

@dr-m dr-m assigned svoj and unassigned dr-m Feb 8, 2018
@codership-jenkins
Copy link

Can one of the admins verify this patch?

if (difftime(time(NULL),
my_atomic_load32_explicit(&srv_last_monitor_time,
MY_MEMORY_ORDER_RELAXED))
> 60) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just move this branch to srv_monitor_thread? It doesn't seem to make any sense to update stats in a separate thread. Then you don't need atomic operations. You may also have to move this whole branch under srv_innodb_monitor_mutex protection, but that shouldn't be a big problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think if I make srv_last_monitor_time a local variable in srv_monitor_thread?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then how do you fix srv_printf_innodb_monitor()? I think it can be called by other threads. OTOH I doubt it uses srv_last_monitor_time correctly (probably it should actually use last_monitor_time?).

Anyway this is something that needs to be understood and fixed properly. If you can make it local eventually - I'm all for it.

Copy link
Contributor Author

@kevgs kevgs Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, you're right. I can't make srv_last_monitor_time local easily. Let it be a subject of another PR.

I don't see TSAN reports now. But while this variable is accessed from different thread I guess I will with some different test in future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I told:

You may also have to move this whole branch under srv_innodb_monitor_mutex protection

Something like:

static void srv_refresh_innodb_monitor_stats(void)
{
        mutex_enter(&srv_innodb_monitor_mutex);
        if (difftime(time(NULL), srv_last_monitor_time) > 60) {
                goto end;
        }
        srv_last_monitor_time = time(NULL);
        ...
end:
        mutex_exit(&srv_innodb_monitor_mutex);

Then all accesses except for one will be guarded by the mutex. The last one will be srv_last_monitor_time initialisation, which I'm not yet so sure how to fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was inattentive, sorry. This is fixed now.

{
return(m_lock_word);
return(my_atomic_load32_explicit(&m_lock_word,
MY_MEMORY_ORDER_RELAXED));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, thanks!

mysql_mutex_lock(&thd_destructor_mutex);
srv_running = myvar;
my_atomic_storeptr_explicit(&srv_running, myvar,
MY_MEMORY_ORDER_RELAXED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of alright, but FWICS it is just a half of the fix. Though probably it is acceptable.

A few lines below there's second assignment srv_running = NULL. As well as two loads in srv0start.cc and srv0srv.cc that are most probably affected by data race as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made all accesses atomical.

Copy link
Contributor

@svoj svoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix monitor stuff. The rest is optional.

@kevgs kevgs force-pushed the kevgs/data_race_10.2 branch from e07bbc4 to 3ad2316 Compare March 14, 2018 11:00
kevgs added 3 commits March 19, 2018 11:07
srv_last_monitor_time: make all accesses relaxed atomical

WARNING: ThreadSanitizer: data race (pid=12041)
  Write of size 8 at 0x000003949278 by thread T26 (mutexes: write M226445748578513120):
    #0 thd_destructor_proxy storage/innobase/handler/ha_innodb.cc:314:14 (mysqld+0x19b5505)

  Previous read of size 8 at 0x000003949278 by main thread:
    #0 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4180:11 (mysqld+0x1a03404)
    #1 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc5ec73)
    #2 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x134908d)
    #3 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13484f0)
    #4 init_server_components() sql/mysqld.cc:5345:7 (mysqld+0xbf720f)
    #5 mysqld_main(int, char**) sql/mysqld.cc:5940:7 (mysqld+0xbf107d)
    #6 main sql/main.cc:25:10 (mysqld+0xbe971b)

  Location is global 'srv_running' of size 8 at 0x000003949278 (mysqld+0x000003949278)

WARNING: ThreadSanitizer: data race (pid=27869)
  Atomic write of size 4 at 0x7b4800000c00 by thread T8:
    #0 __tsan_atomic32_exchange llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:589 (mysqld+0xbd4eac)
    #1 TTASEventMutex<GenericPolicy>::exit() storage/innobase/include/ib0mutex.h:467:7 (mysqld+0x1a8d4cb)
    #2 PolicyMutex<TTASEventMutex<GenericPolicy> >::exit() storage/innobase/include/ib0mutex.h:609:10 (mysqld+0x1a7839e)
    #3 fil_validate() storage/innobase/fil/fil0fil.cc:5535:2 (mysqld+0x1abd913)
    #4 fil_validate_skip() storage/innobase/fil/fil0fil.cc:204:9 (mysqld+0x1aba601)
    #5 fil_aio_wait(unsigned long) storage/innobase/fil/fil0fil.cc:5296:2 (mysqld+0x1abbae6)
    #6 io_handler_thread storage/innobase/srv/srv0start.cc:340:3 (mysqld+0x21abe1e)

  Previous read of size 4 at 0x7b4800000c00 by main thread (mutexes: write M1273, write M1271):
    #0 TTASEventMutex<GenericPolicy>::state() const storage/innobase/include/ib0mutex.h:530:10 (mysqld+0x21c66e2)
    #1 sync_array_detect_deadlock(sync_array_t*, sync_cell_t*, sync_cell_t*, unsigned long) storage/innobase/sync/sync0arr.cc:746:14 (mysqld+0x21c1c7a)
    #2 sync_array_wait_event(sync_array_t*, sync_cell_t*&) storage/innobase/sync/sync0arr.cc:465:6 (mysqld+0x21c1708)
    #3 TTASEventMutex<GenericPolicy>::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:516:6 (mysqld+0x1a8c206)
    #4 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:635:10 (mysqld+0x1a782c3)
    #5 fil_mutex_enter_and_prepare_for_io(unsigned long) storage/innobase/fil/fil0fil.cc:1131:3 (mysqld+0x1a9a92e)
    #6 fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*, bool) storage/innobase/fil/fil0fil.cc:5082:2 (mysqld+0x1ab8de2)
    #7 buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1112:3 (mysqld+0x1cb970a)
    #8 buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1270:3 (mysqld+0x1cb7d70)
    #9 buf_flush_try_neighbors(page_id_t const&, buf_flush_t, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1493:9 (mysqld+0x1cc9674)
    #10 buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:1565:13 (mysqld+0x1cbadf3)
    #11 buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1825:3 (mysqld+0x1cbbcb8)
    #12 buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:1895:16 (mysqld+0x1cbb459)
    #13 buf_flush_do_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:2065:2 (mysqld+0x1cbcfe1)
    #14 buf_flush_lists(unsigned long, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:2167:8 (mysqld+0x1cbd5a3)
    #15 log_preflush_pool_modified_pages(unsigned long) storage/innobase/log/log0log.cc:1400:13 (mysqld+0x1eefc3b)
    #16 log_make_checkpoint_at(unsigned long, bool) storage/innobase/log/log0log.cc:1751:10 (mysqld+0x1eefb16)
    #17 buf_dblwr_create() storage/innobase/buf/buf0dblwr.cc:335:2 (mysqld+0x1cd2141)
    #18 innobase_start_or_create_for_mysql() storage/innobase/srv/srv0start.cc:2539:10 (mysqld+0x21b4d8e)
    #19 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4193:8 (mysqld+0x1a5e3d7)
    #20 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc74d33)
    #21 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x1376d5d)
    #22 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13761c0)
    #23 init_server_components() sql/mysqld.cc:5348:7 (mysqld+0xc0d0ff)
    #24 mysqld_main(int, char**) sql/mysqld.cc:5943:7 (mysqld+0xc06f9d)
    #25 main sql/main.cc:25:10 (mysqld+0xbff71b)

WARNING: ThreadSanitizer: data race (pid=29031)
  Write of size 8 at 0x0000039e48e0 by thread T15:
    #0 srv_monitor_thread storage/innobase/srv/srv0srv.cc:1699:24 (mysqld+0x21a254e)

  Previous write of size 8 at 0x0000039e48e0 by thread T14:
    #0 srv_refresh_innodb_monitor_stats() storage/innobase/srv/srv0srv.cc:1165:24 (mysqld+0x21a3124)
    #1 srv_error_monitor_thread storage/innobase/srv/srv0srv.cc:1836:3 (mysqld+0x21a2d40)

  Location is global 'srv_last_monitor_time' of size 8 at 0x0000039e48e0 (mysqld+0x0000039e48e0)
@kevgs kevgs force-pushed the kevgs/data_race_10.2 branch from 3ad2316 to e8696e5 Compare March 19, 2018 08:16

if (difftime(current_time, srv_last_monitor_time) > 60) {
/* We referesh InnoDB Monitor values so that averages are
printed from at most 60 last seconds */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to unlock mutex here, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Sorry once more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I will go through this patch once again soon. So far looks good.

while (!srv_running->abort)
while (!my_atomic_loadptr_explicit(&srv_running,
MY_MEMORY_ORDER_RELAXED)
->abort)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this atomic load: no other thread can update srv_running.

srv_last_monitor_time = time(NULL);
time_t current_time = time(NULL);

if (difftime(current_time, srv_last_monitor_time) > 60) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I inverted this condition, so that original meaning retained.

@svoj
Copy link
Contributor

svoj commented Mar 22, 2018

Pushed 8d32959, thanks!

@svoj svoj closed this Mar 22, 2018
@kevgs kevgs deleted the kevgs/data_race_10.2 branch September 28, 2018 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants