[ENH] Support dplyr 1.1.0~1.1.2 #189

@pwwang

Feature Type

  • Adding new functionality to datar

  • Changing existing functionality in datar

  • Removing existing functionality in datar

Problem Description

Feature Description

  • *_join()

    • A join specification can now be created through join_by(). This allows
      you to specify both the left and right hand side of a join using unquoted
      column names, such as join_by(sale_date == commercial_date). Join
      specifications can be supplied to any *_join() function as the by
      argument.

      Join specifications allow for new types of joins:

      • Equality joins: The most common join, specified by ==. For example,
        join_by(sale_date == commercial_date).

      • Inequality joins: For joining on inequalities, i.e. >=, >, <, and
        <=. For example, use join_by(sale_date >= commercial_date) to find
        every commercial that aired before a particular sale.

      • Rolling joins: For "rolling" the closest match forward or backwards when
        there isn't an exact match, specified by using the rolling helper,
        closest(). For example,
        join_by(closest(sale_date >= commercial_date)) to find only the most
        recent commercial that aired before a particular sale.

      • Overlap joins: For detecting overlaps between sets of columns, specified
        by using one of the overlap helpers: between(), within(), or
        overlaps(). For example, use
        join_by(between(commercial_date, sale_date_lower, sale_date)) to
        find commercials that aired before a particular sale, as long as they
        occurred after some lower bound, such as 40 days before the sale was made.

    • multiple is a new argument for controlling what happens when a row
      in x matches multiple rows in y. For equality joins and rolling joins,
      where this is usually surprising, this defaults to signalling a "warning",
      but still returns all of the matches. For inequality joins, where multiple
      matches are usually expected, this defaults to returning "all" of the
      matches. You can also return only the "first" or "last" match, "any"
      of the matches, or you can "error".

    • keep now defaults to NULL rather than FALSE. NULL implies
      keep = FALSE for equality conditions, but keep = TRUE for inequality
      conditions, since you generally want to preserve both sides of an
      inequality join.

    • unmatched is a new argument for controlling what happens when a row
      would be dropped because it doesn't have a match. For backwards
      compatibility, the default is "drop", but you can also choose to
      "error" if dropped rows would be surprising.

  • consecutive_id() for creating groups based on contiguous runs of the
    same values.

  • case_match() is a "vectorised switch" variant of case_when() that matches
    on values rather than logical expressions. It is like a SQL "simple"
    CASE WHEN statement, whereas case_when() is like a SQL "searched"
    CASE WHEN statement.

  • cross_join() is a more explicit and slightly more correct replacement for
    using by = character() during a join.

  • pick() makes it easy to access a subset of columns from the current group.
    pick() is intended as a replacement for across(.fns = NULL), cur_data(),
    and cur_data_all(). We feel that pick() is a much more evocative name when
    you are just trying to select a subset of columns from your data.

  • symdiff() computes the symmetric difference.

  • cur_data() and cur_data_all() are soft-deprecated in favour of
    pick().

  • across(), c_across(), if_any(), and if_all() now require the
    _cols and _fns arguments. In general, we now recommend that you use
    pick() instead of an empty across() call or across() with no _fns
    (e.g. across(c(x, y))). See also tidyverse/dplyr#6523 ("Quietly deprecate
    optional .cols and .fns cases").

  • Passing **kwargs to across() is deprecated because it's ambiguous when
    those arguments are evaluated. See also tidyverse/dplyr#6073
    ("Deprecate across(, ...)").
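
To pin down the semantics a port would need to reproduce, here is a rough plain-Python sketch of the difference between an inequality join and a rolling join, using the sale/commercial example from the *_join() item above. Everything in it is an assumption for illustration: the sample rows, the dict-per-row representation, and the helper names inequality_join and rolling_join are hypothetical and are not datar's or dplyr's API. It also assumes no ties among the closest matches.

```python
from datetime import date

# Hypothetical sample data; each row is a plain dict.
sales = [
    {"sale_id": 1, "sale_date": date(2023, 1, 10)},
    {"sale_id": 2, "sale_date": date(2023, 2, 1)},
]
commercials = [
    {"commercial_id": "a", "commercial_date": date(2023, 1, 1)},
    {"commercial_id": "b", "commercial_date": date(2023, 1, 8)},
    {"commercial_id": "c", "commercial_date": date(2023, 1, 20)},
]


def inequality_join(x, y):
    """Inner join keeping every pair with sale_date >= commercial_date."""
    return [
        {**sale, **com}  # both date columns are kept, like keep = TRUE
        for sale in x
        for com in y
        if sale["sale_date"] >= com["commercial_date"]
    ]


def rolling_join(x, y):
    """Inner join keeping, per sale, only the most recent earlier commercial."""
    out = []
    for sale in x:
        candidates = [c for c in y if sale["sale_date"] >= c["commercial_date"]]
        if candidates:
            # "Closest" here means the largest commercial_date that still
            # satisfies the inequality (ties are not handled in this sketch).
            closest = max(candidates, key=lambda c: c["commercial_date"])
            out.append({**sale, **closest})
    return out


all_matches = inequality_join(sales, commercials)
latest_only = rolling_join(sales, commercials)
```

With this data, the inequality join pairs each sale with every earlier commercial, while the rolling join keeps a single commercial per sale.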

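The vector helpers listed above (consecutive_id(), case_match(), symdiff()) can be sketched in plain Python to make the expected behaviour concrete. This is only an illustration on lists: the function names mirror dplyr's, but the signatures, the dict-of-tuples form used for the cases, and the list-based symdiff are assumptions, not datar's API.

```python
from itertools import groupby


def consecutive_id(values):
    """Give each contiguous run of equal values its own 1-based id."""
    ids = []
    for run_id, (_, run) in enumerate(groupby(values), start=1):
        ids.extend(run_id for _ in run)
    return ids


def case_match(values, cases, default=None):
    """Match on values, not conditions.

    `cases` maps a tuple of candidate values to the replacement
    (a hypothetical stand-in for dplyr's `old ~ new` formulas).
    """
    lookup = {old: new for olds, new in cases.items() for old in olds}
    return [lookup.get(v, default) for v in values]


def symdiff(x, y):
    """Unique elements found in exactly one of x and y, in order of appearance."""
    in_x, in_y = set(x), set(y)
    only_one_side = [v for v in x if v not in in_y] + [v for v in y if v not in in_x]
    return list(dict.fromkeys(only_one_side))  # drop duplicates, keep order


run_ids = consecutive_id(["a", "a", "b", "a", "c", "c"])
labels = case_match([1, 2, 3, 9], {(1, 2): "low", (3,): "high"}, default="other")
diff = symdiff([1, 2, 3], [2, 3, 4])
```

Note that consecutive_id() gives the second run of "a" a new id rather than reusing the first one, which is what distinguishes it from an ordinary group id.
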
Additional Context

No response
