@cloud-fan
Contributor

Draft prototype; submitting this PR to test it via Jenkins.

TODO:

  1. The new Dataset map, mapPartitions, etc. conflict with the existing ones (which just forward to RDD). We should remove the old ones, but that would break other code, so for now we keep them and use different names, such as mapPartitions2, for the new ones.
  2. mapPartitions is the fundamental function and is enough for the prototype; I'll add map, flatMap, etc. later on top of it.
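The second TODO item, that mapPartitions alone is enough to derive map, flatMap and similar operators, can be illustrated with a toy model. This is a minimal Scala sketch, assuming a dataset is simply an in-memory sequence of partitions; `ToyDataset`, `mapPartitions2`, `map2`, `flatMap2` and `filter2` are hypothetical names that only mirror the "2" suffix from the first TODO item. It is not the PR's code and not Spark's API.

```scala
// Hypothetical toy model for illustration only: a "dataset" is an in-memory
// sequence of partitions. None of these names are Spark APIs.
final case class ToyDataset[T](partitions: Seq[Seq[T]]) {

  // The fundamental operator: the function receives a whole partition as an
  // iterator and returns the transformed partition as an iterator.
  def mapPartitions2[U](f: Iterator[T] => Iterator[U]): ToyDataset[U] =
    ToyDataset(partitions.map(p => f(p.iterator).toList))

  // map: apply f to every element of every partition.
  def map2[U](f: T => U): ToyDataset[U] =
    mapPartitions2(_.map(f))

  // flatMap: each element may expand to zero or more outputs.
  def flatMap2[U](f: T => Iterable[U]): ToyDataset[U] =
    mapPartitions2(_.flatMap(f))

  // filter: keep only the elements that satisfy the predicate.
  def filter2(p: T => Boolean): ToyDataset[T] =
    mapPartitions2(_.filter(p))

  // Concatenate all partitions in order.
  def collect(): Seq[T] = partitions.flatten
}
```

For example, `ToyDataset(Seq(Seq(1, 2), Seq(3))).map2(_ * 10)` keeps the two-partition layout, and only `mapPartitions2` ever touches the partition structure; every other operator is a one-line wrapper around it.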

@SparkQA
SparkQA commented Feb 8, 2016

Test build #50927 has finished for PR 11117 at commit 4c757f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PipelinedDataFrame(DataFrame):
    • case class PythonMapPartitions(
    • case class PythonMapPartitions(

@SparkQA
SparkQA commented Feb 14, 2016

Test build #51258 has finished for PR 11117 at commit 5282d42.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PipelinedDataFrame(DataFrame):
    • case class PythonMapPartitions(
    • case class PythonMapPartitions(

@SparkQA
SparkQA commented Feb 14, 2016

Test build #51262 has finished for PR 11117 at commit 6107495.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PipelinedDataFrame(DataFrame):
    • case class PythonMapPartitions(
    • case class PythonMapPartitions(

@SparkQA
SparkQA commented Feb 15, 2016

Test build #51297 has finished for PR 11117 at commit 15fd836.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

).compute(inputIterator, context.partitionId(), context)

if (outputIsPickled) {
outputIterator.map(bytes => InternalRow(bytes))
Contributor Author

To avoid copying the bytes, I create safe rows here. However, according to #10511, operators should always produce unsafe rows. The Python UDF operator (BatchPythonEvaluation) also produces safe rows, which may have the same problem. Should we bring back the requireUnsafeRow mechanism? In cases like this one, converting to unsafe rows is expensive and may not bring much benefit.

cc @davies

Contributor

BatchPythonEvaluation will produce UnsafeRow.

Contributor Author

Oh sorry, I missed the unsafe projection at the very end. Then we can probably add an unsafe projection here too.
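The trade-off weighed in this thread, wrapping the pickled output bytes in a row without copying versus adding an unsafe projection that copies them into a compact buffer, can be sketched with a toy model. This is a minimal Scala sketch under stated assumptions: `SafeRow`, `UnsafeRowLike` and `RowConversion` are hypothetical stand-ins, not Catalyst's `InternalRow`, `UnsafeRow` or `UnsafeProjection`, and the length-prefixed layout is a deliberate simplification of the real unsafe row format.

```scala
// Hypothetical toy model, not Catalyst's InternalRow/UnsafeRow classes.

// A "safe" row just holds a reference to the pickled payload: no copy.
final case class SafeRow(payload: Array[Byte])

// An "unsafe-like" row owns one contiguous buffer: a 4-byte big-endian
// length prefix followed by the payload bytes (simplified layout).
final case class UnsafeRowLike(buffer: Array[Byte]) {
  def payload: Array[Byte] = {
    val len = java.nio.ByteBuffer.wrap(buffer, 0, 4).getInt
    java.util.Arrays.copyOfRange(buffer, 4, 4 + len)
  }
}

object RowConversion {
  // Wrapping: constant work per row; the row shares the producer's array.
  def wrap(output: Iterator[Array[Byte]]): Iterator[SafeRow] =
    output.map(bytes => SafeRow(bytes))

  // "Projection": copies every payload into a fresh buffer, so the cost
  // grows with the payload size of each row.
  def project(rows: Iterator[SafeRow]): Iterator[UnsafeRowLike] =
    rows.map { r =>
      val buf = java.nio.ByteBuffer.allocate(4 + r.payload.length)
      buf.putInt(r.payload.length).put(r.payload)
      UnsafeRowLike(buf.array())
    }
}
```

In this model the wrapped row aliases the original bytes, while the projected row holds an independent copy; that extra copy per row is the cost the first comment in this thread is concerned about, and it is what buys a uniform row format for downstream operators.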

@SparkQA
SparkQA commented Feb 15, 2016

Test build #51300 has finished for PR 11117 at commit 6c26daa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 15, 2016

Test build #51301 has finished for PR 11117 at commit d96f103.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 15, 2016

Test build #51308 has finished for PR 11117 at commit 4862fe1.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA
SparkQA commented Feb 16, 2016

Test build #51354 has finished for PR 11117 at commit 4dfe604.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 17, 2016

Test build #51433 has finished for PR 11117 at commit 1c8e7b3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class PythonAppendColumns(
    • case class PythonMapGroups(
    • case class PythonAppendColumns(
    • case class PythonMapGroups(

@SparkQA
SparkQA commented Feb 17, 2016

Test build #51436 has finished for PR 11117 at commit 862288b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class PythonAppendColumns(
    • case class PythonMapGroups(
    • case class PythonAppendColumns(
    • case class PythonMapGroups(

@SparkQA
SparkQA commented Feb 17, 2016

Test build #51438 has finished for PR 11117 at commit e0ca98f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class PythonAppendColumns(
    • case class PythonMapGroups(
    • case class PythonAppendColumns(
    • case class PythonMapGroups(

@SparkQA
SparkQA commented Feb 19, 2016

Test build #51518 has finished for PR 11117 at commit a772492.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 19, 2016

Test build #51526 has finished for PR 11117 at commit 590308a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 19, 2016

Test build #51534 has finished for PR 11117 at commit c883fa6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA
SparkQA commented Feb 19, 2016

Test build #51559 has finished for PR 11117 at commit df53348.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 20, 2016

Test build #51588 has finished for PR 11117 at commit 97dcac2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 21, 2016

Test build #51610 has finished for PR 11117 at commit e0e86c2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA
SparkQA commented Feb 21, 2016

Test build #51617 has finished for PR 11117 at commit 4783c4c.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 21, 2016

Test build #51620 has finished for PR 11117 at commit 4c3c2b5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 21, 2016

Test build #51638 has finished for PR 11117 at commit 349b119.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Feb 22, 2016

Test build #51652 has finished for PR 11117 at commit 8c32d31.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA
SparkQA commented Feb 22, 2016

Test build #51680 has finished for PR 11117 at commit aec6fc4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA
SparkQA commented Feb 23, 2016

Test build #51720 has finished for PR 11117 at commit 1095d7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
