@thinkharderdev

Looks like parquet sources don't produce any data when scheduled. This PR adds a test case that reproduces the problem.

@tustvold
Owner

tustvold commented Apr 24, 2022

Thank you for the report. I'm afraid I am away from a computer for the next couple of days, but I'll take a look when I get back.

That being said, my hunch is that the test is panicking somewhere, and this is getting lost as the log facade isn't initialised. Perhaps you might test this out?

(As an aside, I'll add fixing the lack of panic guards to my list)
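For illustration, a panic guard of the kind mentioned above could look roughly like the sketch below. It uses only the standard library; `run_guarded` is a hypothetical helper name, not something taken from this PR or from the scheduler, and the real fix may look quite different.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical helper: runs `task` and turns a panic into an `Err`
/// carrying the panic message, so the failure is reported to the caller
/// instead of silently disappearing on a worker thread.
pub fn run_guarded<F, T>(task: F) -> Result<T, String>
where
    F: FnOnce() -> T,
{
    catch_unwind(AssertUnwindSafe(task)).map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

A worker would wrap each task in `run_guarded` and forward the `Err` to whoever is waiting on the output, so a failing test surfaces the panic even when no logger is initialised.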

@thinkharderdev
Author

Yeah, I can take a look

@thinkharderdev
Author

K, I think it's fixed. We have to treat CoalesceBatchesExec as a repartition pipeline since it will eagerly poll its input.

@tustvold
Owner

Hmm... That should just work... Something isn't quite right here... FWIW, CoalesceBatches is not a repartition operation; it stitches batches together within a partition, so this PR as it stands will change the behaviour of the plan.

@thinkharderdev
Author

K, looked some more and I think the issue is that the output channel closes as soon as any one output partition finishes. Pushed a change that tracks the active output partitions and closes the output channel only once all partitions are done.
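The idea can be sketched as follows. This is a minimal, standard-library-only illustration and not the code pushed to this PR: `OutputTracker`, `push`, and `finish_partition` are hypothetical names, and a plain `std::sync::mpsc` channel stands in for whatever channel the scheduler actually uses.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical tracker: keeps the output sender alive until every
/// output partition has reported completion.
pub struct OutputTracker<T> {
    sender: Option<Sender<T>>,
    finished: Vec<bool>,
}

impl<T> OutputTracker<T> {
    /// Creates a tracker for `partitions` output partitions and returns
    /// the receiving end of the output channel.
    pub fn new(partitions: usize) -> (Self, Receiver<T>) {
        let (tx, rx) = channel();
        // With zero partitions there is nothing to wait for: start closed.
        let sender = if partitions == 0 { None } else { Some(tx) };
        (Self { sender, finished: vec![false; partitions] }, rx)
    }

    /// Forwards an item; returns false if the channel is already closed.
    pub fn push(&self, item: T) -> bool {
        match &self.sender {
            Some(tx) => tx.send(item).is_ok(),
            None => false,
        }
    }

    /// Marks `partition` as finished. Returns true only when this call
    /// completed the last active partition and closed the channel.
    pub fn finish_partition(&mut self, partition: usize) -> bool {
        match self.finished.get_mut(partition) {
            Some(done) if !*done => *done = true,
            // Unknown or already-finished partition: nothing changes.
            _ => return false,
        }
        if self.finished.iter().all(|done| *done) {
            // Dropping the only sender closes the channel for the receiver.
            self.sender = None;
            true
        } else {
            false
        }
    }
}
```

The bug described above corresponds to dropping the sender on the first `finish_partition` call; here the receiver only sees end-of-stream after every partition has finished.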

@tustvold
Owner

Ahah! Yeah, that would do it. Thanks for investigating 👍

@tustvold
Owner

tustvold commented May 3, 2022

I fixed this as part of supporting partitioned execution; see apache@505e880

In particular, a reduced version of this test case can be found at apache@505e880#diff-0005c0590888cdd2c7efd378972ade4f764ec75f4c62e009eb863e4a2bef99f9R378

Thanks again for your help in diagnosing this issue 🏅

@tustvold tustvold closed this May 3, 2022