-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-15487: Add queryability logic to View to prevent race to discover non-queryability #2029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
552efae
aca2ae2
732e6aa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -189,18 +189,26 @@ public void testIndexRebuildWhenAddingSStableViaRemoteReload() | |
| // unlink sstable and index context: expect no rows to be read by base and index | ||
| cfs.clearUnsafe(); | ||
| IndexMetadata indexMetadata = cfs.metadata().indexes.iterator().next(); | ||
| ((StorageAttachedIndex) cfs.getIndexManager().getIndex(indexMetadata)).getIndexContext().prepareSSTablesForRebuild(sstables); | ||
| // First confirm that if we pass an empty set, the index stays queryable. Not sure if this is a realistic | ||
| // code path, but it seems plausible that we could have an unrelated sstable passed to this method in the | ||
| // event of a data race, and in that case, we should stay queryable. | ||
| ((StorageAttachedIndex) cfs.getIndexManager().getIndex(indexMetadata)).getIndexContext().prepareSSTablesForRebuild(Collections.emptySet()); | ||
| assertEmpty(execute("SELECT * FROM %s WHERE a=1")); | ||
| assertEmpty(execute("SELECT * FROM %s WHERE c=1")); | ||
|
|
||
| // TODO why? This change reverts back to behavior from before https://github.com/datastax/cassandra/pull/1491, | ||
| // but it seems invalid. | ||
| // track sstable again: expect no rows to be read by index | ||
| // Now pass the actual sstables: expect no rows to be read by base and for the index to be non-queryable because | ||
| // preparing an sstable for rebuild removes the index from the view, which would otherwise result in | ||
| // partial results. | ||
| ((StorageAttachedIndex) cfs.getIndexManager().getIndex(indexMetadata)).getIndexContext().prepareSSTablesForRebuild(sstables); | ||
| assertEmpty(execute("SELECT * FROM %s WHERE a=1")); | ||
| assertInvalid("SELECT * FROM %s WHERE c=1"); | ||
|
|
||
| // track sstable again: expect index to remain non-queryable because it is not rebuilt (the view hasn't been updated) | ||
|
Comment on lines
-196
to
+206
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jasonstack - I see that you wrote these test assertions in #309. We've flipped this logic a couple of times this year. First we went towards non-queryable with https://github.com/datastax/cassandra/pull/1491/files#diff-98107b1903c60731ef493d2e86f26460e40d2b0e60862db3a49ea27ba4e282fa. Then, we reverted it back to the #309 state when I merged #1700. You also commented that this test could be removed here: #1700 (comment). So, my key question: do you have any concern with the design change to make the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
With this change, we effectively have two queryability states: one in Marking queryable:
Marking non-queryable. There are two possible approaches
I am +1 on keeping the code simple with current PR approach. WDYT? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think we can remove it and rely on CNDB side tests
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We already have a race. I am just proposing one way to solve it. We could alternatively make SIM get a reference to the view when checking queryability. In general, I tend to agree with you that two ways to track the same concept is a design smell. As the new tests show, there is a race already where it is possible to get an incomplete set of sstable indexes, which is all I am trying to fix. |
||
| cfs.getTracker().addInitialSSTables(sstables); | ||
| assertRows(execute("SELECT * FROM %s WHERE a=1"), row(1, 1, 1)); | ||
| assertEmpty(execute("SELECT * FROM %s WHERE c=1")); | ||
| assertInvalid("SELECT * FROM %s WHERE c=1"); | ||
|
|
||
| // remote reload should trigger index rebuild | ||
| // remote reload should trigger index rebuild, making the index queryable again | ||
| cfs.getTracker().notifySSTablesChanged(Collections.emptySet(), sstables, OperationType.REMOTE_RELOAD, Optional.empty(), null); | ||
| waitForIndexBuilds(KEYSPACE, indexName); // this is needed because index build on remote reload is async | ||
| assertRows(execute("SELECT * FROM %s WHERE a=1"), row(1, 1, 1)); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.cassandra.index.sai.plan; | ||
|
|
||
| import java.util.concurrent.ForkJoinPool; | ||
| import java.util.concurrent.TimeUnit; | ||
|
|
||
| import org.junit.Test; | ||
|
|
||
| import org.apache.cassandra.exceptions.ReadFailureException; | ||
| import org.apache.cassandra.index.sai.SAITester; | ||
| import org.apache.cassandra.index.sai.view.IndexViewManager; | ||
| import org.apache.cassandra.inject.ActionBuilder; | ||
| import org.apache.cassandra.inject.Expression; | ||
| import org.apache.cassandra.inject.Injection; | ||
| import org.apache.cassandra.inject.Injections; | ||
| import org.apache.cassandra.inject.InvokePointBuilder; | ||
| import org.awaitility.Awaitility; | ||
|
|
||
| import static org.junit.Assert.assertEquals; | ||
| import static org.junit.Assert.assertThrows; | ||
| import static org.junit.Assert.assertTrue; | ||
|
|
||
| /** | ||
| * When a query is in progress and an index build fails, we need to mark the index as non-queryable instead of | ||
| * returning partial or incorrect results. | ||
| */ | ||
| public class FailedSSTableIndexLoadWhileQueryingTest extends SAITester | ||
| { | ||
|
|
||
| @Test | ||
| public void testSSTableIndexInitFailsAfterQueryViewBuildEqualityQuery() throws Throwable | ||
| { | ||
| createTable("CREATE TABLE %s (k text PRIMARY KEY, x int)"); | ||
|
|
||
| var indexName = createIndex("CREATE CUSTOM INDEX ON %s(x) USING 'StorageAttachedIndex'"); | ||
|
|
||
| execute("INSERT INTO %s (k, x) VALUES (?, ?)", "a", 0); | ||
| execute("INSERT INTO %s (k, x) VALUES (?, ?)", "b", 0); | ||
| execute("INSERT INTO %s (k, x) VALUES (?, ?)", "c", 1); | ||
|
|
||
| testSSTableIndexInitFailsAfterQueryViewBuiltBeforeQueryExecution(indexName, "SELECT k FROM %s WHERE x = 0"); | ||
| } | ||
|
|
||
| @Test | ||
| public void testSSTableIndexInitFailsAfterQueryViewBuildNotContainsQuery() throws Throwable | ||
| { | ||
| createTable("CREATE TABLE %s (k text PRIMARY KEY, x set<int>)"); | ||
|
|
||
| var indexName = createIndex("CREATE CUSTOM INDEX ON %s(x) USING 'StorageAttachedIndex'"); | ||
|
|
||
| execute("INSERT INTO %s (k, x) VALUES ('a', {1, 2, 3})"); | ||
| execute("INSERT INTO %s (k, x) VALUES ('b', {1, 2, 3})"); | ||
| execute("INSERT INTO %s (k, x) VALUES ('c', {1, 2, 4})"); | ||
|
|
||
| testSSTableIndexInitFailsAfterQueryViewBuiltBeforeQueryExecution(indexName, "SELECT k FROM %s WHERE x NOT CONTAINS 3"); | ||
| } | ||
|
|
||
| private void testSSTableIndexInitFailsAfterQueryViewBuiltBeforeQueryExecution(String indexName, String query) throws Throwable | ||
| { | ||
| // This bug is only reachable when you fail to replace an sstable via compaction. | ||
| flush(); | ||
|
|
||
| Injection failSSTableIndexLoadOnInit = Injections.newCustom("FailSSTableIndexLoadOnInit-" + indexName) | ||
| .add(InvokePointBuilder.newInvokePoint() | ||
| .onClass("org.apache.cassandra.index.sai.SSTableIndex") | ||
| .onMethod("<init>") | ||
| .atEntry() | ||
| ) | ||
| .add(ActionBuilder.newActionBuilder().actions() | ||
| .doThrow(java.lang.RuntimeException.class, Expression.quote("Byteman-injected fault in MemtableIndexWriter.complete")) | ||
| ) | ||
| .build(); | ||
| Injections.inject(failSSTableIndexLoadOnInit); | ||
|
|
||
| // We use two barriers to ensure that we first flush and fail to load the index (thereby putting in place | ||
| // an invalid view) and then to make sure the query gets that view. | ||
| var badViewPoint = InvokePointBuilder.newInvokePoint().onClass(IndexViewManager.class).onMethod("update").atExit(); | ||
| var badViewBarrier = Injections.newBarrier("pause_after_setting_bad_view", 2, false).add(badViewPoint).build(); | ||
| Injections.inject(badViewBarrier); | ||
|
|
||
| // Flush in a separate thread since the badViewPointBarrier will block it, thereby preventing it from | ||
| // marking the index as non-queryable. | ||
| ForkJoinPool.commonPool().submit(() -> compact()); | ||
|
|
||
| // Wait for compaction to reach the badViewBarrier (the point where we know the view is bad but the | ||
| // index is still considered queryable). | ||
| Awaitility.waitAtMost(5, TimeUnit.SECONDS).until(() -> badViewBarrier.getCount() == 1); | ||
|
|
||
| // The query hits the getViewBarrier. | ||
| assertThrows(ReadFailureException.class, () -> execute(query)); | ||
| assertEquals("Confirm that flush hit the barrier, but did not pass it", 1, badViewBarrier.getCount()); | ||
|
|
||
| // Confirm index is considered queryable. The primary point of the remaining assertions is to show to readers | ||
| // that we have a period of time after creating a broken view and before marking the index as non-queryable. | ||
| assertTrue(isIndexQueryable(KEYSPACE, indexName)); | ||
| // Arrive and unblock the thread that will mark the index as non-queryable. | ||
| badViewBarrier.arrive(); | ||
| // Expect index to go non-queryable soon. | ||
| Awaitility.waitAtMost(5, TimeUnit.SECONDS).until(() -> !isIndexQueryable(KEYSPACE, indexName)); | ||
| } | ||
| } |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it works on CNDB which skips index build on writer.
If we create index on an empty table, existing CNDB directly marks the index as queryable. But with this change, will read request fail because view.isQueryable = false?
Can you create a CNDB PR to verify if tests are working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment. I was wondering if there was different logic in CNDB. I would tend to think this logic is captured in a test though, right? Here is the test pr https://github.com/riptano/cndb/pull/15531. I haven't reviewed the test failures yet, but I will now. None of them appear directly related to SAI at first glance, though.