Skip to content

Conversation

@pgomulka
Copy link
Contributor

@pgomulka pgomulka commented Feb 21, 2022

This PR properly registers NamedXContentRegistry entries for SystemIndexMigrationTaskParams and SystemIndexMigrationTaskState. It also adds tests for the XContent de/serialization for those classes,
and fixes a bug revealed by these tests in SystemIndexMigrationTaskState's parser.

Finally, it adds an integration test which simulates the conditions in which #84115 occurs: A node restart
while the migration is in progress. This ensures that we have fixed that particular bug.

Closes #84115


To add context for anyone looking at this fix as it applies to #84115 which is currently occurring in a
live cluster: Because this fix only changes how these classes are read, not how they are written,
installing a version which includes this fix should be sufficient to allow the impacted nodes to start.

Registering SystemIndexMigrationTaskParams and SystemIndexMigrationTaskState parsers into named xcontent registry.
closes elastic#84115
@pgomulka pgomulka added :Core/Infra/Core Core issues without another label auto-backport Automatically create backport pull requests when merged v8.1.1 v8.0.2 labels Feb 21, 2022
@pgomulka pgomulka self-assigned this Feb 21, 2022
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Feb 21, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Copy link
Collaborator

Hi @pgomulka, I've created a changelog YAML for you.

…thub.com:pgomulka/elasticsearch into main/systemindexmigration_register_namedxcontent
@pgomulka
Copy link
Contributor Author

I wonder if the problems from this PR together with #84142 are enough?
Should we try to have an integration test that would actually create that SystemIndexMigrationTask and attempt to read the cluster state from disk? (no idea if it is possible)

@DaveCTurner
Copy link
Contributor

Copying the content of our Slack conversation here:

  • I think we should have a test that restarts a node while the upgrade task is running. Assert written metadata is readable #84142 would have caught this specific problem, but "can read the metadata without throwing an exception" is a pretty weak property and there may be other problems related to restarting nodes while this task is running. Although the task is usually short-lived, it should be possible to write an ESIntegTestCase which adds a ClusterStateListener that blocks the master thread when it sees the task, and then restarts a node when the block happens.

  • Apart from that comment about testing, I'd rather have a core/infra person do the detailed review of the changes.

@pgomulka pgomulka requested a review from rjernst February 21, 2022 15:54
case 2:
return randomLong();
case 3:
return randomShort();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed short and float from there because when serialised/deserialised jackson will read this as integer/double

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test looks good, I left a couple of suggestions.

AtomicBoolean hasBlocked = new AtomicBoolean();

final String masterName = internalCluster().startMasterOnlyNode();
final String dataNodeName = internalCluster().startDataOnlyNode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I called this a "data node" when discussing the test but in fact it would be preferable to test it for master nodes. E.g. #84148 would drop the bad metadata on non-master nodes so this test wouldn't do anything any more.

import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.is;

public abstract class AbstractFeatureMigrationIntegTest extends ESIntegTestCase {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these tests share the same setup, so I think it is better they have it in one abstract class common to them all, rather than extending from the base test (I don't think MultiFeatureMigrationIT was meant to run tests from FeatureMigrationIT, but maybe @gwbrown can confirm?)
I was considering moving this all to utility class, but I also want to make sure that cleanup of static hooks (preMigrationHook and postMigrationHook ) is performed before every test. I had a failure due to lack of this cleanup https://gradle-enterprise.elastic.co/s/z75xl73xezq36/tests/:modules:reindex:internalClusterTest/org.elasticsearch.migration.SystemIndexMigrationIT/testSystemIndexMigrationCanBeInterruptedWithShutdown?top-execution=1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting this to a separate class is quite reasonable. MultiFeatureMigrationIT was to some degree intended to run the tests from FeatureMigrationIT as well as its own, but it's very possible that's not the best way to handle that, and I don't think we lose much test coverage as the test in MultiFeatureMigrationIT should cover those cases alright.

++ to adding automatic cleanup, we should have had it to start with.

@pgomulka pgomulka requested a review from DaveCTurner February 22, 2022 11:08
@gwbrown gwbrown changed the title Registration of SystemIndexMigrationTask named xcontent objects Register Feature migration persistent task state named XContent Feb 22, 2022
@gwbrown gwbrown merged commit 4c3b7b1 into elastic:master Feb 22, 2022
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:
An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 84192

gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Feb 22, 2022
…tic#84192)

This PR properly registers `NamedXContentRegistry` entries for `SystemIndexMigrationTaskParams` and `SystemIndexMigrationTaskState`. It also adds tests for the XContent de/serialization for those classes,
and fixes a bug revealed by these tests in `SystemIndexMigrationTaskState`'s parser.

Finally, it adds an integration test which simulates the conditions in which elastic#84115 occurs: A node restart
while the migration is in progress. This ensures that we have fixed that particular bug.

Co-authored-by: David Turner <[email protected]>
Co-authored-by: Gordon Brown <[email protected]>
(cherry picked from commit 4c3b7b1)
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Feb 22, 2022
…tic#84192)

This PR properly registers `NamedXContentRegistry` entries for `SystemIndexMigrationTaskParams` and `SystemIndexMigrationTaskState`. It also adds tests for the XContent de/serialization for those classes,
and fixes a bug revealed by these tests in `SystemIndexMigrationTaskState`'s parser.

Finally, it adds an integration test which simulates the conditions in which elastic#84115 occurs: A node restart
while the migration is in progress. This ensures that we have fixed that particular bug.

Co-authored-by: David Turner <[email protected]>
Co-authored-by: Gordon Brown <[email protected]>
(cherry picked from commit 4c3b7b1)
gwbrown added a commit that referenced this pull request Feb 23, 2022
This PR properly registers `NamedXContentRegistry` entries for `SystemIndexMigrationTaskParams` and `SystemIndexMigrationTaskState`. It also adds tests for the XContent de/serialization for those classes,
and fixes a bug revealed by these tests in `SystemIndexMigrationTaskState`'s parser.

Finally, it adds an integration test which simulates the conditions in which #84115 occurs: A node restart
while the migration is in progress. This ensures that we have fixed that particular bug.

Co-authored-by: David Turner <[email protected]>
Co-authored-by: Gordon Brown <[email protected]>
(cherry picked from commit 4c3b7b1)

Co-authored-by: Przemyslaw Gomulka <[email protected]>
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Feb 23, 2022
…tic#84192)

This PR properly registers `NamedXContentRegistry` entries for `SystemIndexMigrationTaskParams` and `SystemIndexMigrationTaskState`. It also adds tests for the XContent de/serialization for those classes,
and fixes a bug revealed by these tests in `SystemIndexMigrationTaskState`'s parser.

Finally, it adds an integration test which simulates the conditions in which elastic#84115 occurs: A node restart
while the migration is in progress. This ensures that we have fixed that particular bug.

Co-authored-by: David Turner <[email protected]>
Co-authored-by: Gordon Brown <[email protected]>
(cherry picked from commit 4c3b7b1)
@gwbrown
Copy link
Contributor

gwbrown commented Feb 23, 2022

💚 All backports created successfully

Status Branch Result
8.1
8.0
7.17

Questions ?

Please refer to the Backport tool documentation

pgomulka pushed a commit that referenced this pull request Feb 23, 2022
#84192) (#84258)

In addition to the backport, this commit also adds SystemIndexMigrationTaskParams/State to writeableRegistry of transport client
This will backport the following commits from master to 7.17:

Register Feature migration persistent task state named XContent (#84192)
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Feb 23, 2022
A follow up after elastic#84192
- refactor the static state in TestPlugin to be an instance
- refactor assertions to use hamcrest
- remove Simple from methods as it is not meaningful
- refactor xcontent tests to support unknown fields

closes elastic#84245
probakowski pushed a commit to probakowski/elasticsearch that referenced this pull request Feb 23, 2022
…tic#84192)

This PR properly registers `NamedXContentRegistry` entries for `SystemIndexMigrationTaskParams` and `SystemIndexMigrationTaskState`. It also adds tests for the XContent de/serialization for those classes, 
and fixes a bug revealed by these tests in `SystemIndexMigrationTaskState`'s parser.

Finally, it adds an integration test which simulates the conditions in which elastic#84115 occurs: A node restart 
while the migration is in progress. This ensures that we have fixed that particular bug.

Co-authored-by: David Turner <[email protected]>
Co-authored-by: Gordon Brown <[email protected]>
gwbrown added a commit that referenced this pull request Feb 23, 2022
This PR properly registers `NamedXContentRegistry` entries for `SystemIndexMigrationTaskParams` and `SystemIndexMigrationTaskState`. It also adds tests for the XContent de/serialization for those classes,
and fixes a bug revealed by these tests in `SystemIndexMigrationTaskState`'s parser.

Finally, it adds an integration test which simulates the conditions in which #84115 occurs: A node restart
while the migration is in progress. This ensures that we have fixed that particular bug.

Co-authored-by: David Turner <[email protected]>
Co-authored-by: Gordon Brown <[email protected]>
Co-authored-by: Przemyslaw Gomulka <[email protected]>
pgomulka added a commit that referenced this pull request Mar 21, 2022
A follow up after #84192

refactor the static state in TestPlugin to be an instance
refactor assertions to use hamcrest
remove Simple from methods as it is not meaningful
refactor xcontent tests to support unknown fields
closes #84245
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Mar 21, 2022
A follow up after elastic#84192

refactor the static state in TestPlugin to be an instance
refactor assertions to use hamcrest
remove Simple from methods as it is not meaningful
refactor xcontent tests to support unknown fields
closes elastic#84245
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Mar 21, 2022
A follow up after elastic#84192

refactor the static state in TestPlugin to be an instance
refactor assertions to use hamcrest
remove Simple from methods as it is not meaningful
refactor xcontent tests to support unknown fields
closes elastic#84245
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Mar 23, 2022
A follow up after elastic#84192

refactor the static state in TestPlugin to be an instance
refactor assertions to use hamcrest
remove Simple from methods as it is not meaningful
refactor xcontent tests to support unknown fields
closes elastic#84245
@pgomulka pgomulka deleted the main/systemindexmigration_register_namedxcontent branch July 25, 2022 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team v7.17.1 v8.0.2 v8.1.1 v8.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to parse SystemIndexMigrationTaskParams and SystemIndexMigrationTaskState

7 participants