Skip to content

Conversation

@lw-lin
Copy link
Contributor

@lw-lin lw-lin commented Apr 23, 2017

What changes were proposed in this pull request?

Within the same streaming query, when one StreamingRelation is referred multiple times – e.g. df.union(df) – we should transform it only to one StreamingExecutionRelation, instead of two or more different StreamingExecutionRelations (each of which would have a separate set of source, source logs, ...).

How was this patch tested?

Added two test cases, each of which would fail without this patch.

@SparkQA
Copy link

SparkQA commented Apr 23, 2017

Test build #76080 has finished for PR 17735 at commit bf502a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lw-lin lw-lin changed the title [WIP][SS] Within the same streaming query, one StreamingRelation should only be transformed to one StreamingExecutionRelation [SPARK-20441][SPARK-20432][SS] Within the same streaming query, one StreamingRelation should only be transformed to one StreamingExecutionRelation Apr 23, 2017
@lw-lin
Copy link
Contributor Author

lw-lin commented Apr 25, 2017

@zsxwing @brkyvz would you take a look at this? thanks!

.collect { case ser: StreamingExecutionRelation => ser }
assert(executionRelations.size == 2)
assert(executionRelations.distinct.size == 1)
query.stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please wrap this in a finally?

@SparkQA
Copy link

SparkQA commented Apr 27, 2017

Test build #76208 has finished for PR 17735 at commit 98eb3dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 27, 2017

Test build #76211 has finished for PR 17735 at commit db11db0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lw-lin
Copy link
Contributor Author

lw-lin commented Apr 28, 2017

@brkyvz please take another look

@lw-lin
Copy link
Contributor Author

lw-lin commented May 1, 2017

Jenkins retest this please

@lw-lin
Copy link
Contributor Author

lw-lin commented May 1, 2017

@zsxwing @brkyvz would you take a look at this? thanks!

@SparkQA
Copy link

SparkQA commented May 1, 2017

Test build #76347 has finished for PR 17735 at commit db11db0.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lw-lin
Copy link
Contributor Author

lw-lin commented May 1, 2017

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented May 2, 2017

Test build #76363 has finished for PR 17735 at commit db11db0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.streamingQuery
val executionRelations =
query
.logicalPlan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to call query.awaitInitialization before accessing logicalPlan. Otherwise this test will be flaky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i see.
fixed. thanks!

@zsxwing
Copy link
Member

zsxwing commented May 2, 2017

Looks pretty good except one minor issue in tests.

@SparkQA
Copy link

SparkQA commented May 3, 2017

Test build #76398 has finished for PR 17735 at commit 63ed28a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz
Copy link
Contributor

brkyvz commented May 3, 2017

LGTM. Merging to master/2.2. Thanks for the PR!

asfgit pushed a commit that referenced this pull request May 3, 2017
…treamingRelation should only be transformed to one StreamingExecutionRelation

## What changes were proposed in this pull request?

Within the same streaming query, when one `StreamingRelation` is referred multiple times – e.g. `df.union(df)` – we should transform it only to one `StreamingExecutionRelation`, instead of two or more different `StreamingExecutionRelation`s (each of which would have a separate set of source, source logs, ...).

## How was this patch tested?

Added two test cases, each of which would fail without this patch.

Author: Liwei Lin <[email protected]>

Closes #17735 from lw-lin/SPARK-20441.

(cherry picked from commit 27f543b)
Signed-off-by: Burak Yavuz <[email protected]>
@asfgit asfgit closed this in 27f543b May 3, 2017
@lw-lin lw-lin deleted the SPARK-20441 branch May 4, 2017 14:19
cenyuhai pushed a commit to cenyuhai/spark that referenced this pull request Oct 8, 2017
…treamingRelation should only be transformed to one StreamingExecutionRelation

## What changes were proposed in this pull request?

Within the same streaming query, when one `StreamingRelation` is referred multiple times – e.g. `df.union(df)` – we should transform it only to one `StreamingExecutionRelation`, instead of two or more different `StreamingExecutionRelation`s (each of which would have a separate set of source, source logs, ...).

## How was this patch tested?

Added two test cases, each of which would fail without this patch.

Author: Liwei Lin <[email protected]>

Closes apache#17735 from lw-lin/SPARK-20441.

(cherry picked from commit 27f543b)
Signed-off-by: Burak Yavuz <[email protected]>

(cherry picked from commit b1a732f)
cenyuhai pushed a commit to cenyuhai/spark that referenced this pull request Oct 8, 2017
…treamingRelation should only be transformed to one StreamingExecutionRelation

## What changes were proposed in this pull request?

Within the same streaming query, when one `StreamingRelation` is referred multiple times – e.g. `df.union(df)` – we should transform it only to one `StreamingExecutionRelation`, instead of two or more different `StreamingExecutionRelation`s (each of which would have a separate set of source, source logs, ...).

## How was this patch tested?

Added two test cases, each of which would fail without this patch.

Author: Liwei Lin <[email protected]>

Closes apache#17735 from lw-lin/SPARK-20441.

(cherry picked from commit 27f543b)
// "df.logicalPlan" has already used attributes of the previous `output`.
StreamingExecutionRelation(source, output)
case streamingRelation@StreamingRelation(dataSource, _, output) =>
toExecutionRelationMap.getOrElseUpdate(streamingRelation, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using QueryPlan.sameResult? Our dedup could break it, right? cc @zsxwing @brkyvz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants