-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Description
Rationale for adding an explicit cross_join() function in favor of the existing by = character() method:
-
I think
by = character()is fairly unintuitive, and I’ve seen 2 cases recently where people didn’t even know it existed -
I think
by = character()should really be an error. I think its current behavior of doing a cross join is actually a little scary. Like, supplyingc("x", "y")matches on two columns, supplying"x"matches on one column, butcharacter()all of the sudden matches all rows? Seems very odd to me. -
Importantly, currently only
inner_join(by = character())is a true cross join. SQL cross joins must result innrow(x) * nrow(y)number of rows, and only the inner cross join does this in the edge case where one of the inputs has 0 rows (i.e. the left join would retain all x rows). I know this is edge case behavior, but the weirdness here feels worth resolving by just addingcross_join() -
I think it would be more straightforward for dbplyr and friends to convert
cross_join()to SQL. Convertingleft_join(by = character())to SQL is probably quite odd. -
It would be nice that
cross_join()wouldn’t have abyargument at all, making it very clear it doesn’t have anything to do with columns -
We could document
cross_join()on its own help page, raising awareness of it -
join_by()wouldn’t have to support cross joins, which it currently does by allowing emptyjoin_by()calls to be treated like cross joins. I could turn that into an error for 1.1.0 if we also addedcross_join()at the same time, which would be much nicer than having to deprecate it slowly in the future.