Commit 4dcb1b5

MDEV-35049: Use CRC-32C and avoid allocating heap

For the adaptive hash index, dtuple_fold() and rec_fold() were employing a slow rolling hash algorithm, computing hash values ("fold") for one field and one byte at a time, while depending on calls to rec_get_offsets().

We already have optimized implementations of CRC-32C and have been successfully using that function in some other InnoDB tables, but not yet in the adaptive hash index.

Any linear function such as any CRC will fail the avalanche test that any cryptographically secure hash function is expected to pass: any single-bit change in the input key should affect on average half the bits in the output. But we always were happy with less than cryptographically secure: in fact, ut_fold_ulint_pair() or ut_fold_binary() are just about as linear as any CRC, using a combination of multiplication and addition, partly carry-less. It is worth noting that exclusive-or corresponds to carry-less subtraction or addition in a binary Galois field, or GF(2).

We only need some way of reducing key prefixes into hash values. The CRC-32C should be better than a Rabin–Karp rolling hash algorithm. Compared to the old hash algorithm, it has the drawback that there will be only 32 bits of entropy before we choose the hash table cell by a modulus operation. The size of each adaptive hash index array is (innodb_buffer_pool_size / 512) / innodb_adaptive_hash_index_parts. With the maximum number of partitions (512), we would not exceed 1<<32 elements per array until the buffer pool size exceeds 1<<50 bytes (1 PiB). We would hit other limits before that: the virtual address space on many contemporary 64-bit processor implementations is only 48 bits (256 TiB). So, we can simply go for the SIMD accelerated CRC-32C.

rec_fold(): Take a combined parameter n_bytes_fields. Determine the length of each field on the fly, and compute CRC-32C over a single contiguous range of bytes, from the start of the record payload area to the end of the last full or partial field.

For secondary index records in ROW_FORMAT=REDUNDANT, also the data area that is reserved for NULL values (to facilitate in-place updates between NULL and NOT NULL values) will be included in the count. Luckily, InnoDB always zero-initialized such unused area; refer to data_write_sql_null() in rec_convert_dtuple_to_rec_old(). For other than ROW_FORMAT=REDUNDANT, no space is allocated for NULL values, and therefore the CRC-32C will only cover the actual payload of the key prefix.

dtuple_fold(): For ROW_FORMAT=REDUNDANT, include the dummy NULL values in the CRC-32C, so that the values will be comparable with rec_fold().

innodb_ahi-t: A unit test for rec_fold() and dtuple_fold().

btr_search_build_page_hash_index(), btr_search_drop_page_hash_index(): Use a fixed-size stack buffer for computing the fold values, to avoid dynamic memory allocation.

btr_search_drop_page_hash_index(): Do not release part.latch if we need to invoke multiple batches of rec_fold().

dtuple_t: Allocate fewer bits for the fields. The maximum number of data fields is about 1023, so uint16_t will be fine for them. The info_bits is stored in less than 1 byte.

ut_pair_min(), ut_pair_cmp(): Remove. We can actually combine and compare int(n_fields << 16 | n_bytes).

PAGE_CUR_LE_OR_EXTENDS, PAGE_CUR_DBG: Remove. These were never defined, because they would only work with latin1_swedish_ci if at all.

btr_cur_t::check_mismatch(): Replaces !btr_search_check_guess().

cmp_dtuple_rec_bytes(): Replaces cmp_dtuple_rec_with_match_bytes(). Determine the offsets of fields on the fly.

page_cur_try_search_shortcut_bytes(): This caller of cmp_dtuple_rec_bytes() will not be invoked on the change buffer tree.

cmp_dtuple_rec_leaf(): Replaces cmp_dtuple_rec_with_match() for comparing leaf-page records.

buf_block_t::ahi_left_bytes_fields: Consolidated Atomic_relaxed<uint32_t> of curr_left_side << 31 | curr_n_bytes << 16 | curr_n_fields. The other set of parameters (n_fields, n_bytes, left_side) was removed as redundant.

btr_search_update_hash_node_on_insert(): Merged to btr_search_update_hash_on_insert().

btr_search_build_page_hash_index(): Take combined left_bytes_fields instead of n_fields, n_bytes, left_side.

btr_search_update_block_hash_info(), btr_search_update_hash_ref(): Merged to btr_search_info_update_hash().

btr_cur_t::n_bytes_fields: Replaces n_bytes << 16 | n_fields.

We also remove many redundant checks of btr_search.enabled. If we are holding any btr_sea::partition::latch, then a nonnull pointer in buf_block_t::index must imply that the adaptive hash index is enabled.

Reviewed by: Vladislav Lesin
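
To make the sizing argument in the commit message concrete, here is a minimal sketch of folding a contiguous byte range with CRC-32C and checking the array-size arithmetic. It is not the server's code: crc32c_sw() is a deliberately slow bitwise implementation standing in for the accelerated one the message refers to, and the helper names ahi_array_cells() and fold_to_cell() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
// Illustration only; a real build would use a hardware-accelerated routine.
// Passing a previous result as `crc` continues the checksum over more bytes.
inline uint32_t crc32c_sw(const unsigned char *data, size_t len,
                          uint32_t crc = 0)
{
  crc = ~crc;
  for (size_t i = 0; i < len; i++)
  {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1U)));
  }
  return ~crc;
}

// Hypothetical helper: the array size formula quoted in the commit message,
// (innodb_buffer_pool_size / 512) / innodb_adaptive_hash_index_parts.
inline uint64_t ahi_array_cells(uint64_t buffer_pool_bytes, uint64_t parts)
{
  return buffer_pool_bytes / 512 / parts;
}

// Hypothetical helper: pick a hash table cell from a 32-bit fold value
// by a modulus operation, as the commit message describes.
inline uint64_t fold_to_cell(uint32_t fold, uint64_t n_cells)
{
  return fold % n_cells;
}
```

With 512 partitions, ahi_array_cells(1ULL << 50, 512) evaluates to 1ULL << 32, which is the point at which a 32-bit fold value could no longer address every cell.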
1 parent 9c8bdc6 commit 4dcb1b5
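
The commit message names two packed 32-bit layouts: left_side << 31 | n_bytes << 16 | n_fields for buf_block_t::ahi_left_bytes_fields, and n_fields << 16 | n_bytes as a comparison key replacing ut_pair_cmp(). The following sketch shows how such packing and comparison could work. All function names are hypothetical, and the Atomic_relaxed wrapper is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack the three parameters in the layout quoted in
// the commit message. Bit 31 carries left_side, so n_bytes is assumed to
// fit in 15 bits here.
inline uint32_t pack_left_bytes_fields(bool left_side, uint16_t n_bytes,
                                       uint16_t n_fields)
{
  assert(n_bytes < (1U << 15));
  return uint32_t(left_side) << 31 | uint32_t(n_bytes) << 16 | n_fields;
}

inline uint16_t unpack_n_fields(uint32_t v) { return uint16_t(v); }
inline uint16_t unpack_n_bytes(uint32_t v)
{
  return uint16_t((v >> 16) & 0x7fff);
}
inline bool unpack_left_side(uint32_t v) { return (v >> 31) != 0; }

// Hypothetical helper: a key for ordering (fields, bytes) match pairs.
// Putting n_fields in the high half makes plain integer comparison
// lexicographic: fields first, then bytes.
inline uint32_t match_key(uint16_t n_fields, uint16_t n_bytes)
{
  return uint32_t(n_fields) << 16 | n_bytes;
}
```

One integer comparison of two match_key() values then does the work of a two-step pair comparison, and a single 32-bit load or store reads or updates all three block parameters together.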


43 files changed, 2187 insertions(+), 2695 deletions(-)

storage/innobase/CMakeLists.txt

Lines changed: 0 additions & 1 deletion
@@ -362,7 +362,6 @@ SET(INNOBASE_SOURCES
   include/ut0sort.h
   include/ut0stage.h
   include/ut0ut.h
-  include/ut0ut.inl
   include/ut0vec.h
   include/ut0vec.inl
   include/ut0wqueue.h

storage/innobase/btr/btr0btr.cc

Lines changed: 6 additions & 6 deletions
@@ -558,8 +558,8 @@ buf_block_t *btr_root_block_sx(dict_index_t *index, mtr_t *mtr, dberr_t *err)
     return root;
   }
 #ifdef BTR_CUR_HASH_ADAPT
-  else
-    ut_ad(!root->index || !root->index->freed());
+  ut_d(else if (dict_index_t *index= root->index))
+    ut_ad(!index->freed());
 #endif
   return root;
 }
@@ -772,7 +772,7 @@ static rec_offs *btr_page_get_parent(rec_offs *offsets, mem_heap_t *heap,
 {
   ut_ad(block->page.lock.have_u_or_x() ||
         (!block->page.lock.have_s() && index->lock.have_x()));
-  ulint up_match= 0, low_match= 0;
+  uint16_t up_match= 0, low_match= 0;
   cursor->page_cur.block= block;
   if (page_cur_search_with_match(tuple, PAGE_CUR_LE, &up_match,
                                  &low_match, &cursor->page_cur,
@@ -1976,7 +1976,7 @@ btr_root_raise_and_insert(

   ut_ad(dtuple_check_typed(tuple));
   /* Reposition the cursor to the child node */
-  ulint low_match = 0, up_match = 0;
+  uint16_t low_match = 0, up_match = 0;

   if (page_cur_search_with_match(tuple, PAGE_CUR_LE,
                                  &up_match, &low_match,
@@ -2660,7 +2660,7 @@ btr_insert_into_right_sibling(
     return nullptr;
   }

-  ulint up_match = 0, low_match = 0;
+  uint16_t up_match = 0, low_match = 0;

   if (page_cur_search_with_match(tuple,
                                  PAGE_CUR_LE, &up_match, &low_match,
@@ -3142,7 +3142,7 @@ btr_page_split_and_insert(
   page_cursor = btr_cur_get_page_cur(cursor);
   page_cursor->block = insert_block;

-  ulint up_match = 0, low_match = 0;
+  uint16_t up_match = 0, low_match = 0;

   if (page_cur_search_with_match(tuple,
                                  PAGE_CUR_LE, &up_match, &low_match,

storage/innobase/btr/btr0cur.cc

Lines changed: 35 additions & 45 deletions
@@ -1100,8 +1100,7 @@ dberr_t btr_cur_t::search_leaf(const dtuple_t *tuple, page_cur_mode_t mode,
   MEM_UNDEFINED(&up_bytes, sizeof up_bytes);
   MEM_UNDEFINED(&low_match, sizeof low_match);
   MEM_UNDEFINED(&low_bytes, sizeof low_bytes);
-  ut_d(up_match= ULINT_UNDEFINED);
-  ut_d(low_match= ULINT_UNDEFINED);
+  ut_d(up_match= low_match= uint16_t(~0u));

   ut_ad(!(latch_mode & BTR_ALREADY_S_LATCHED) ||
         mtr->memo_contains_flagged(&index()->lock,
@@ -1118,14 +1117,14 @@ dberr_t btr_cur_t::search_leaf(const dtuple_t *tuple, page_cur_mode_t mode,
         || latch_mode == BTR_MODIFY_TREE
         || latch_mode == BTR_MODIFY_ROOT_AND_LEAF);

-  flag= BTR_CUR_BINARY;
 #ifndef BTR_CUR_ADAPT
   guess= nullptr;
 #else
   auto info= &index()->search_info;
   guess= info->root_guess;

 # ifdef BTR_CUR_HASH_ADAPT
+  flag= BTR_CUR_BINARY;
 # ifdef UNIV_SEARCH_PERF_STAT
   info->n_searches++;
 # endif
@@ -1138,9 +1137,9 @@ dberr_t btr_cur_t::search_leaf(const dtuple_t *tuple, page_cur_mode_t mode,
                             latch_mode, this, mtr))
   {
     /* Search using the hash index succeeded */
-    ut_ad(up_match != ULINT_UNDEFINED || mode != PAGE_CUR_GE);
-    ut_ad(up_match != ULINT_UNDEFINED || mode != PAGE_CUR_LE);
-    ut_ad(low_match != ULINT_UNDEFINED || mode != PAGE_CUR_LE);
+    ut_ad(up_match != uint16_t(~0U) || mode != PAGE_CUR_GE);
+    ut_ad(up_match != uint16_t(~0U) || mode != PAGE_CUR_LE);
+    ut_ad(low_match != uint16_t(~0U) || mode != PAGE_CUR_LE);
     ++btr_cur_n_sea;

     return DB_SUCCESS;
@@ -1349,9 +1348,9 @@ dberr_t btr_cur_t::search_leaf(const dtuple_t *tuple, page_cur_mode_t mode,
     if (page_cur_search_with_match(tuple, mode, &up_match, &low_match,
                                    &page_cur, nullptr))
       goto corrupted;
-    ut_ad(up_match != ULINT_UNDEFINED || mode != PAGE_CUR_GE);
-    ut_ad(up_match != ULINT_UNDEFINED || mode != PAGE_CUR_LE);
-    ut_ad(low_match != ULINT_UNDEFINED || mode != PAGE_CUR_LE);
+    ut_ad(up_match != uint16_t(~0U) || mode != PAGE_CUR_GE);
+    ut_ad(up_match != uint16_t(~0U) || mode != PAGE_CUR_LE);
+    ut_ad(low_match != uint16_t(~0U) || mode != PAGE_CUR_LE);
     goto func_exit;
   }

@@ -1398,22 +1397,21 @@ dberr_t btr_cur_t::search_leaf(const dtuple_t *tuple, page_cur_mode_t mode,

 reached_latched_leaf:
 #ifdef BTR_CUR_HASH_ADAPT
-  if (btr_search.enabled && !(tuple->info_bits & REC_INFO_MIN_REC_FLAG))
+  if (!(tuple->info_bits & REC_INFO_MIN_REC_FLAG) && btr_search.enabled)
   {
-    if (page_cur_search_with_match_bytes(tuple, mode,
-                                         &up_match, &up_bytes,
-                                         &low_match, &low_bytes, &page_cur))
+    if (page_cur_search_with_match_bytes(*tuple, mode, &up_match, &low_match,
+                                         &page_cur, &up_bytes, &low_bytes))
       goto corrupted;
   }
   else
 #endif /* BTR_CUR_HASH_ADAPT */
-    if (page_cur_search_with_match(tuple, mode, &up_match, &low_match,
-                                   &page_cur, nullptr))
-      goto corrupted;
+  if (page_cur_search_with_match(tuple, mode, &up_match, &low_match,
+                                 &page_cur, nullptr))
+    goto corrupted;

-  ut_ad(up_match != ULINT_UNDEFINED || mode != PAGE_CUR_GE);
-  ut_ad(up_match != ULINT_UNDEFINED || mode != PAGE_CUR_LE);
-  ut_ad(low_match != ULINT_UNDEFINED || mode != PAGE_CUR_LE);
+  ut_ad(up_match != uint16_t(~0U) || mode != PAGE_CUR_GE);
+  ut_ad(up_match != uint16_t(~0U) || mode != PAGE_CUR_LE);
+  ut_ad(low_match != uint16_t(~0U) || mode != PAGE_CUR_LE);

   if (latch_mode == BTR_MODIFY_TREE &&
       btr_cur_need_opposite_intention(block->page, index()->is_clust(),
@@ -1656,9 +1654,9 @@ dberr_t btr_cur_t::pessimistic_search_leaf(const dtuple_t *tuple,
     err= DB_CORRUPTION;
   else
   {
-    ut_ad(up_match != ULINT_UNDEFINED || mode != PAGE_CUR_GE);
-    ut_ad(up_match != ULINT_UNDEFINED || mode != PAGE_CUR_LE);
-    ut_ad(low_match != ULINT_UNDEFINED || mode != PAGE_CUR_LE);
+    ut_ad(up_match != uint16_t(~0U) || mode != PAGE_CUR_GE);
+    ut_ad(up_match != uint16_t(~0U) || mode != PAGE_CUR_LE);
+    ut_ad(low_match != uint16_t(~0U) || mode != PAGE_CUR_LE);

 #ifdef BTR_CUR_HASH_ADAPT
     /* We do a dirty read of btr_search.enabled here. We will recheck in
@@ -1770,8 +1768,9 @@ dberr_t btr_cur_search_to_nth_level(ulint level,
   MEM_UNDEFINED(&cursor->low_bytes, sizeof cursor->low_bytes);
   cursor->up_match= 0;
   cursor->low_match= 0;
+#ifdef BTR_CUR_HASH_ADAPT
   cursor->flag= BTR_CUR_BINARY;
-
+#endif
 #ifndef BTR_CUR_ADAPT
   buf_block_t *block= nullptr;
 #else
@@ -2518,13 +2517,8 @@ btr_cur_optimistic_insert(
     ut_ad(entry->is_metadata());
     ut_ad(index->is_instant());
     ut_ad(flags == BTR_NO_LOCKING_FLAG);
-  } else if (index->table->is_temporary()) {
-  } else {
-    if (!reorg && cursor->flag == BTR_CUR_HASH) {
-      btr_search_update_hash_node_on_insert(cursor);
-    } else {
-      btr_search_update_hash_on_insert(cursor);
-    }
+  } else if (!index->table->is_temporary()) {
+    btr_search_update_hash_on_insert(cursor, reorg);
   }
 #endif /* BTR_CUR_HASH_ADAPT */

@@ -2588,7 +2582,9 @@ btr_cur_pessimistic_insert(
         || dict_index_is_clust(index)
         || (flags & BTR_CREATE_FLAG));

+#ifdef BTR_CUR_HASH_ADAPT
   cursor->flag = BTR_CUR_BINARY;
+#endif

   /* Check locks and write to undo log, if specified */

@@ -2694,9 +2690,8 @@ btr_cur_pessimistic_insert(
     ut_ad(index->is_instant());
     ut_ad(flags & BTR_NO_LOCKING_FLAG);
     ut_ad(!(flags & BTR_CREATE_FLAG));
-  } else if (index->table->is_temporary()) {
-  } else {
-    btr_search_update_hash_on_insert(cursor);
+  } else if (!index->table->is_temporary()) {
+    btr_search_update_hash_on_insert(cursor, false);
   }
 #endif /* BTR_CUR_HASH_ADAPT */
   if (inherit && !(flags & BTR_NO_LOCKING_FLAG)) {
@@ -3325,7 +3320,7 @@ static void btr_cur_trim_alter_metadata(dtuple_t* entry,
   if (n_fields != index->n_uniq) {
     ut_ad(n_fields
           >= index->n_core_fields);
-    entry->n_fields = n_fields;
+    entry->n_fields = uint16_t(n_fields);
     return;
   }

@@ -3361,7 +3356,7 @@ static void btr_cur_trim_alter_metadata(dtuple_t* entry,
   ut_ad(n_fields >= index->n_core_fields);

   mtr.commit();
-  entry->n_fields = n_fields + 1;
+  entry->n_fields = uint16_t(n_fields + 1);
 }

 /** Trim an update tuple due to instant ADD COLUMN, if needed.
@@ -3417,7 +3412,7 @@ btr_cur_trim(
       ulint n_fields = upd_get_nth_field(update, 0)
         ->field_no;
       ut_ad(n_fields + 1 >= entry->n_fields);
-      entry->n_fields = n_fields;
+      entry->n_fields = uint16_t(n_fields);
     }
   } else {
     entry->trim(*index);
@@ -4817,10 +4812,10 @@ class btr_est_cur_t

   /** Matched fields and bytes which are used for on-page search, see
   btr_cur_t::(up|low)_(match|bytes) comments for details */
-  ulint m_up_match= 0;
-  ulint m_up_bytes= 0;
-  ulint m_low_match= 0;
-  ulint m_low_bytes= 0;
+  uint16_t m_up_match= 0;
+  uint16_t m_up_bytes= 0;
+  uint16_t m_low_match= 0;
+  uint16_t m_low_bytes= 0;

 public:
   btr_est_cur_t(dict_index_t *index, const dtuple_t &tuple,
@@ -4844,12 +4839,7 @@ class btr_est_cur_t
       m_page_mode= PAGE_CUR_LE;
       break;
     default:
-#ifdef PAGE_CUR_LE_OR_EXTENDS
-      ut_ad(mode == PAGE_CUR_L || mode == PAGE_CUR_LE ||
-            mode == PAGE_CUR_LE_OR_EXTENDS);
-#else /* PAGE_CUR_LE_OR_EXTENDS */
       ut_ad(mode == PAGE_CUR_L || mode == PAGE_CUR_LE);
-#endif /* PAGE_CUR_LE_OR_EXTENDS */
       m_page_mode= mode;
       break;
     }
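
The debug assertions in this file compare against uint16_t(~0U) rather than a bare ~0U. A likely reason is integer promotion: a uint16_t operand is widened before the comparison, so it can never equal a 32-bit all-ones value. The sketch below illustrates this under the assumption that unsigned int is wider than 16 bits; both function names are hypothetical and a compiler may warn about the second one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true unless the value is the 16-bit all-ones marker,
// written the same way as in the assertions of the diff above.
inline bool is_defined(uint16_t match)
{
  return match != uint16_t(~0U);
}

// Hypothetical counterexample: without the cast, `match` is promoted and
// compared with the full-width ~0U, so the marker is never recognized
// (assuming unsigned int is wider than 16 bits).
inline bool is_defined_uncast(uint16_t match)
{
  return match != ~0U;
}
```

Here is_defined(uint16_t(~0U)) is false, as intended, while is_defined_uncast(uint16_t(~0U)) is true and so fails to spot the marker.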

storage/innobase/btr/btr0pcur.cc

Lines changed: 1 addition & 1 deletion
@@ -435,7 +435,7 @@ btr_pcur_t::restore_position(btr_latch_mode restore_latch_mode, mtr_t *mtr)
   rec_offs_init(offsets);
   restore_status ret_val= restore_status::NOT_SAME;
   if (rel_pos == BTR_PCUR_ON && btr_pcur_is_on_user_rec(this)) {
-    ulint n_matched_fields= 0;
+    uint16_t n_matched_fields= 0;
     if (!cmp_dtuple_rec_with_match(
           tuple, btr_pcur_get_rec(this), index,
           rec_get_offsets(btr_pcur_get_rec(this), index, offsets,
