
Conversation

sangyo7 (Owner) commented May 18, 2021

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store at job creation time as a persistent artifact
  • Deployments RPC transmits only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache (see the sketch below)
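
As an illustration of the example change log above, a minimal sketch of that flow. All names here (BlobDeploymentSketch, storeTaskInfo, retrieveTaskInfo) are hypothetical and only mirror the idea of shipping a blob reference instead of the payload over RPC:

```java
import java.io.Serializable;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class BlobDeploymentSketch {
    // Stand-in for the blob store: the artifact is persisted once at job creation time.
    static final Map<UUID, Serializable> BLOB_STORE = new ConcurrentHashMap<>();

    // Job creation side: store the TaskInfo and keep only the key.
    static UUID storeTaskInfo(Serializable taskInfo) {
        UUID blobKey = UUID.randomUUID();
        BLOB_STORE.put(blobKey, taskInfo); // persisted once
        return blobKey;                    // only this reference travels over RPC
    }

    // TaskManager side: resolve the key against the (cached) blob store.
    static Serializable retrieveTaskInfo(UUID blobKey) {
        return BLOB_STORE.get(blobKey);    // no re-transfer of the payload on recovery
    }
}
```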

Verifying this change

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4-node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

dianfu and others added 30 commits April 21, 2021 19:25
…if join condition contains 'IS NOT DISTINCT FROM'

Fix FLINK-22098, caused by a mistake when rebasing

This closes #15695
…TE TABLE' statement.

This commit tries to
1. resolve the conflicts
2. revert the changes made on old planner
3. apply spotless formatting
4. fix DDL missing `TEMPORARY` keyword for temporary table
5. display table's full object path as catalog.db.table
6. support displaying the expanded query for view
7. add view test in CatalogTableITCase, and adapt sql client test to the new test framework
8. adapt docs

This closes #13011
…t declare managed memory for Python operators

This closes #15665.
…precision isn't matching with declared DataType

This closes #15689
Additionally, speed up the AdaptiveScheduler ITCases by configuring a very low
jobmanager.adaptive-scheduler.resource-stabilization-timeout.
When this incorrect assertion is violated, the scheduler can trip into an unrecoverable
failover loop.
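
For reference, lowering the stabilization timeout mentioned above can be sketched as follows; the key is the documented jobmanager.adaptive-scheduler.resource-stabilization-timeout option, while the class name and the "1 ms" value are assumptions for illustration:

```java
import org.apache.flink.configuration.Configuration;

public class LowStabilizationTimeout {
    // Returns a configuration with a very low stabilization timeout, so the
    // AdaptiveScheduler starts executing as soon as any resources arrive.
    public static Configuration testConfiguration() {
        Configuration conf = new Configuration();
        // "1 ms" is an assumed value for illustration, not taken from the PR.
        conf.setString("jobmanager.adaptive-scheduler.resource-stabilization-timeout", "1 ms");
        return conf;
    }
}
```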
…IncrementalAggregate.out

After FLINK-22298 was finished, the ExecNode's id should always start from 1 in the json plan tests, but testIncrementalAggregate.out had been overridden by FLINK-20613

This closes #15698
… on persisting

An exception can be thrown, for example, if the task is being cancelled. This was leading to
the same buffer being recycled twice. Most of the time that just led to an
IllegalReferenceCount exception being thrown, which was ignored since the task was being cancelled.
However, on rare occasions this buffer could have been picked up by another task
after being recycled the first time, recycled a second time, and then picked up
by yet another (third) task. In that case two users shared the same buffer, which
could lead to all sorts of data corruption.
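
A minimal sketch of the kind of guard that prevents such a double recycle; the class below is hypothetical and does not reproduce Flink's actual network buffer classes:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard: hands a buffer back to its pool at most once, even if an
// exception during persisting triggers a second recycle attempt.
final class RecycleOnceBuffer {
    private final AtomicBoolean recycled = new AtomicBoolean(false);
    private final Runnable returnToPool;

    RecycleOnceBuffer(Runnable returnToPool) {
        this.returnToPool = returnToPool;
    }

    void recycle() {
        // compareAndSet makes the second (erroneous) call a no-op.
        if (recycled.compareAndSet(false, true)) {
            returnToPool.run();
        }
    }
}
```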
[FLINK-XXXX] Draft separation of leader election and creation of JobMasterService

[FLINK-XXXX] Continued work on JobMasterServiceLeadershipRunner

[FLINK-XXXX] Integrate RunningJobsRegistry, Cancelling state and termination future watching

[FLINK-XXXX] Delete old JobManagerRunnerImpl classes

[FLINK-22001] Add tests for DefaultJobMasterServiceProcess

[FLINK-22001][hotfix] Clean up ITCase a bit

[FLINK-22001] Add missing check for empty job graph

[FLINK-22001] Rename JobMasterServiceFactoryNg to JobMasterServiceFactory

This closes #15715.
HuangXingBo and others added 28 commits May 17, 2021 21:52
…rrelate json serialization/deserialization

This closes #15922.
…oupWindowAggregate json serialization/deserialization

This closes #15934.
…g casting

Compare children individually for anonymous structured types. This
fixes issues with primitive fields and Scala case classes.

This closes #15935.
…lasses

This removes the OrcTableSource and related classes, including OrcInputFormat. Use
the filesystem connector with an ORC format as a replacement. It is possible to
read via the Table & SQL API and convert the Table to the DataStream API if necessary.
The DataSet API is not supported anymore.

This closes #15891.
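
A sketch of the suggested migration (the same pattern applies to the Parquet and HBase removals further below), assuming a made-up table name, path, and schema:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class OrcMigrationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Replacement for the removed OrcTableSource: a filesystem table with the ORC format.
        tEnv.executeSql(
                "CREATE TABLE orc_source (" +
                "  id BIGINT," +
                "  name STRING" +
                ") WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = '/path/to/orc/files'," +
                "  'format' = 'orc'" +
                ")");

        Table table = tEnv.sqlQuery("SELECT id, name FROM orc_source");

        // Convert the Table to the DataStream API if necessary.
        tEnv.toDataStream(table).print();
        env.execute("orc-migration-sketch");
    }
}
```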
…oupAggregate json serialization/deserialization

This closes #15928.
…erAggregate json serialization/deserialization

This closes #15937.
At the moment Flink only cleans up the HA data (e.g. K8s ConfigMaps, or
ZooKeeper nodes) while shutting down the cluster. This is not enough for
a long-running session cluster to which you submit multiple jobs. With
this commit, we clean up the data for a particular job once it reaches a
globally terminal state.

This closes #15561.
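
A minimal sketch of the cleanup condition described above; the cleanupJobData hook is an assumed shape, not necessarily the exact Flink API:

```java
import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.JobStatus;

// Sketch: clean up job-specific HA data once the job can no longer be restarted.
final class HaCleanupSketch {
    interface HaServices {
        void cleanupJobData(JobID jobId) throws Exception; // assumed hook
    }

    static void onJobReachedTerminalState(JobStatus status, JobID jobId, HaServices ha)
            throws Exception {
        // Globally terminal = FINISHED, CANCELED or FAILED; a suspended job may be
        // recovered by another JobManager and must keep its HA data.
        if (status.isGloballyTerminalState()) {
            ha.cleanupJobData(jobId);
        }
    }
}
```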
…elated classes

This removes the ParquetTableSource and related classes including various ParquetInputFormats.
Use the filesystem connector with a Parquet format as a replacement. It is possible to
read via Table & SQL API and convert the Table to DataStream API if necessary.
DataSet API is not supported anymore.

This closes #15895.
The HA e2e test will start a Flink application first and wait for three successful checkpoints, then kill the JobManager. A new JobManager should be launched and recover the job from the latest successful checkpoint. Finally, the job is cancelled and all the K8s resources should be cleaned up automatically.

This closes #14172.
RetrievableStateStorageHelper is not only used by ZooKeeperStateHandleStore but
also by KubernetesStateHandleStore.

We experienced cases where the ConfigMap was updated but the corresponding HTTP
request failed due to connectivity issues. PossibleInconsistentStateException
is used to reflect cases where it's not clear whether the data was actually
written or not.
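
A hedged sketch of how such an ambiguous write outcome can be surfaced; the client interface and method names below are hypothetical, only the motivation matches the commit message:

```java
import java.io.IOException;

final class InconsistentStateSketch {
    /** Thrown when it is unclear whether the data was actually written. */
    static final class PossibleInconsistentStateException extends Exception {
        PossibleInconsistentStateException(Throwable cause) {
            super(cause);
        }
    }

    interface ConfigMapClient {
        void update(String key, byte[] value) throws IOException; // assumed API
    }

    static void write(ConfigMapClient client, String key, byte[] value)
            throws PossibleInconsistentStateException {
        try {
            client.update(key, value);
        } catch (IOException e) {
            // Connectivity issue: the ConfigMap may already have been updated,
            // so we cannot report a plain failure.
            throw new PossibleInconsistentStateException(e);
        }
    }
}
```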
…n references

The previous implementation stored the state in the StateHandle itself. This caused
problems when deserializing the state: a new instance was created that did not
point to the actual state but was merely a copy of it.

This refactoring introduces LongStateHandle handling the actual state and
LongRetrievableStateHandle referencing this handle.
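
A simplified sketch of that separation, with assumed signatures; the real test classes in Flink implement the RetrievableStateHandle machinery:

```java
// Simplified sketch of the refactoring described above.
final class LongStateHandleSketch {
    /** Owns the actual (mutable) state. */
    static final class LongStateHandle {
        private long value;
        LongStateHandle(long value) { this.value = value; }
        long getValue() { return value; }
        void setValue(long value) { this.value = value; }
    }

    /** References the handle instead of embedding a copy of the state. */
    static final class LongRetrievableStateHandle {
        private final LongStateHandle handle;
        LongRetrievableStateHandle(LongStateHandle handle) { this.handle = handle; }
        Long retrieveState() { return handle.getValue(); } // always reads the live state
    }
}
```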
…alGroupWindowAggregate which only supports insert-only input node

This closes #14830
… and related classes

This removes the HBaseTableSource/Sink and related classes including various HBaseInputFormats and
HBaseSinkFunction. It is possible to read via Table & SQL API and convert the Table to DataStream API
(or vice versa) if necessary. DataSet API is not supported anymore.

This closes #15905.
In order to better clean up job-specific HA services, this commit changes the layout of the
zNode structure so that the JobMaster leader, checkpoints and checkpoint counter are now grouped
below the jobs/ zNode.

Moreover, this commit groups the leaders of the cluster components (Dispatcher, ResourceManager,
RestServer) under /leader/process/latch and /leader/process/connection-info.

This closes #15893.
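
Read literally from the commit message, the new layout looks roughly as follows; the root path and exact child node names are assumptions:

```
/flink/<cluster-id>/
├── jobs/<job-id>/           # JobMaster leader, checkpoints, checkpoint counter (grouped per job)
└── leader/process/
    ├── latch                # election latches of Dispatcher, ResourceManager, RestServer
    └── connection-info      # their published leader addresses
```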
sangyo7 merged commit fe7c3e4 into sangyo7:master on May 18, 2021
