@nealrichardson nealrichardson commented Feb 18, 2021

First steps:

  • Rework selected_columns to hold field_refs instead of string column names; add code to back out the string field names where needed (e.g. dataset Project())
  • Create an array_ref pseudo-function to do the same as field_ref for array_expressions
  • Add a data argument to eval_array_expression in order to bind array_refs to the actual Arrays before evaluating
  • Refactor filter() NSE code for reuse in mutate()
  • Split up dplyr tests because we're going to be adding lots more
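
To make the first three steps concrete, here is a minimal, language-agnostic sketch in Python rather than the package's R code. It is a toy model only: `ArrayRef`, `Call`, `FUNCTIONS`, and `apply_elementwise` are hypothetical names, and `eval_array_expression` here merely mirrors the idea of taking a `data` argument so that references are bound to actual arrays at evaluation time.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayRef:
    """Placeholder for a column, identified by name only (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class Call:
    """A function applied to sub-expressions or literal values (hypothetical type)."""
    fun: str
    args: tuple


# Toy function table; plain Python lists stand in for Arrays.
FUNCTIONS = {"+": operator.add, "*": operator.mul, ">": operator.gt}


def apply_elementwise(fn, args):
    """Apply fn row by row, recycling scalar arguments to the column length."""
    lengths = [len(a) for a in args if isinstance(a, list)]
    if not lengths:
        return fn(*args)
    n = max(lengths)
    cols = [a if isinstance(a, list) else [a] * n for a in args]
    return [fn(*row) for row in zip(*cols)]


def eval_array_expression(expr, data):
    """Bind each ArrayRef to its column in `data`, then evaluate the tree."""
    if isinstance(expr, ArrayRef):
        return data[expr.name]
    if isinstance(expr, Call):
        args = [eval_array_expression(a, data) for a in expr.args]
        return apply_elementwise(FUNCTIONS[expr.fun], args)
    return expr  # literal value


def field_names(selected_columns):
    """Back out plain string names for entries that are bare references."""
    return [e.name for e in selected_columns.values() if isinstance(e, ArrayRef)]


# x + y * 2, built without touching any data
expr = Call("+", (ArrayRef("x"), Call("*", (ArrayRef("y"), 2))))
data = {"x": [1, 2, 3], "y": [10, 20, 30]}
result = eval_array_expression(expr, data)  # [21, 42, 63]
```

The point of the design is that the expression can be built and stored (for example in `selected_columns`) before any data is available, and the same expression can later be evaluated against different inputs.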

Then:

  • Basic mutate() and transmute() (done in 578d492)
  • Go through the examples in the dplyr::mutate() docs and add tests for all cases. Where possible, they are evaluated fully in arrow; where a function is not supported, mutate() falls back to the current behavior of pulling the data into R first.
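
The fallback behavior described in the last item can be sketched as follows. This is an illustrative Python toy, not the R implementation: `NATIVE_FUNCTIONS`, `HOST_FUNCTIONS`, `UnsupportedFunction`, and both `mutate` helpers are hypothetical, and copying the columns into plain lists stands in for pulling the data into R.

```python
import operator

# Functions the "native" engine can evaluate (assumed, for illustration).
NATIVE_FUNCTIONS = {"+": operator.add, "*": operator.mul}
# Functions available once the data is in the host language.
HOST_FUNCTIONS = {**NATIVE_FUNCTIONS, "paste": lambda a, b: f"{a} {b}"}


class UnsupportedFunction(Exception):
    """Raised when the native engine cannot evaluate a function."""


def native_mutate(columns, fun, args):
    """Evaluate fun over the named columns without materializing the data."""
    if fun not in NATIVE_FUNCTIONS:
        raise UnsupportedFunction(fun)
    rows = zip(*(columns[a] for a in args))
    return [NATIVE_FUNCTIONS[fun](*row) for row in rows]


def mutate(columns, new_name, fun, args):
    """Try the native path first; fall back to host evaluation if unsupported."""
    try:
        values = native_mutate(columns, fun, args)
        path = "native"
    except UnsupportedFunction:
        # Stand-in for pulling the data into the host language first.
        local = {k: list(v) for k, v in columns.items()}
        rows = zip(*(local[a] for a in args))
        values = [HOST_FUNCTIONS[fun](*row) for row in rows]
        path = "fallback"
    return {**columns, new_name: values}, path


cols = {"a": [1, 2], "b": [3, 4]}
summed, sum_path = mutate(cols, "c", "+", ["a", "b"])        # native path
pasted, paste_path = mutate(cols, "d", "paste", ["a", "b"])  # fallback path
```

Either way the caller gets the new column; only the place where the work happens differs.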

Followup JIRAs:

  • ARROW-11704: Wire up dplyr::mutate() for datasets
  • ARROW-11699: Implement dplyr::across() and autosplicing
  • ARROW-11700: Internationalize error handling in tidy eval
  • ARROW-11701: Implement dplyr::relocate()
  • ARROW-11702: Enable ungrouped aggregations in non-Dataset expressions
  • ARROW-11658: Handle mutate/rename inside group_by
  • ARROW-11705: Support scalar value recycling in RecordBatch/Table$create()
  • ARROW-11754: Support dplyr::compute()
  • ARROW-11752: Replace usage of testthat::expect_is()
  • ARROW-11755: Add tests from dplyr/test-mutate.r
  • ARROW-11785: Fallback when filtering Table with if_any() expression fails

@nealrichardson nealrichardson marked this pull request as ready for review February 19, 2021 22:14

@jonkeane jonkeane left a comment

This looks good. I went through it commit by commit, which helped before going back over the full diff. I have a few suggestions and a few questions (mostly to make sure I am reading this correctly).

@nealrichardson nealrichardson deleted the mutate branch March 22, 2021 18:44