Skip to content

cross_join() in place of by = character() #6604

@DavisVaughan

Description

@DavisVaughan

Rationale for adding an explicit cross_join() function in favor of the existing by = character() method:

  • I think by = character() is fairly unintuitive, and I’ve seen 2 cases recently where people didn’t even know it existed

  • I think by = character() should really be an error. I think its current behavior of doing a cross join is actually a little scary. Like, supplying c("x", "y") matches on two columns, supplying "x" matches on one column, but character() all of the sudden matches all rows? Seems very odd to me.

  • Importantly, currently only inner_join(by = character()) is a true cross join. SQL cross joins must result in nrow(x) * nrow(y) number of rows, and only the inner cross join does this in the edge case where one of the inputs has 0 rows (i.e. the left join would retain all x rows). I know this is edge case behavior, but the weirdness here feels worth resolving by just adding cross_join()

  • I think it would be more straightforward for dbplyr and friends to convert cross_join() to SQL. Converting left_join(by = character()) to SQL is probably quite odd.

  • It would be nice that cross_join() wouldn’t have a by argument at all, making it very clear it doesn’t have anything to do with columns

  • We could document cross_join() on its own help page, raising awareness of it

  • join_by() wouldn’t have to support cross joins, which it currently does by allowing empty join_by() calls to be treated like cross joins. I could turn that into an error for 1.1.0 if we also added cross_join() at the same time, which would be much nicer than having to deprecate it slowly in the future.

Metadata

Metadata

Assignees

Labels

featurea feature request or enhancementtables 🧮joins and set operations

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions