[SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
### What changes were proposed in this pull request?
Extracting millennium, century, decade, millisecond, microsecond and epoch from a datetime is neither part of the ANSI standard nor common in modern SQL platforms. Most of the systems listed below do not support these fields; PostgreSQL and Redshift are the exceptions.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions050.htm
https://prestodb.io/docs/current/functions/datetime.html
https://docs.cloudera.com/documentation/enterprise/5-8-x/topics/impala_datetime_functions.html
https://docs.snowflake.com/en/sql-reference/functions-date-time.html#label-supported-date-time-parts
https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
This PR removes support for these fields from the extract function for date and timestamp values.
`isoyear` is PostgreSQL-specific, whereas `yearofweek` is more commonly used across platforms, so `isoyear` is renamed to `yearofweek`.
`isodow` is PostgreSQL-specific, whereas `iso` as a suffix is more commonly used across platforms, so `dow_iso` and `dayofweek_iso` replace it.
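As a rough illustration of what `yearofweek` means, the sketch below computes the ISO 8601 week-numbering year and week number with Python's standard library. It is not Spark code; the helper names `year_of_week` and `iso_week` are hypothetical and only mirror the field names discussed here. The sample dates are the ones used in the updated docstring.

```python
from datetime import date


def year_of_week(d: date) -> int:
    """ISO 8601 week-numbering year (hypothetical helper mirroring `yearofweek`)."""
    return d.isocalendar()[0]


def iso_week(d: date) -> int:
    """ISO 8601 week number (hypothetical helper mirroring `week`)."""
    return d.isocalendar()[1]


for d in (date(2005, 1, 2), date(2012, 12, 31)):
    # The calendar year and the week-numbering year differ near year boundaries.
    print(d.isoformat(), "year =", d.year,
          "yearofweek =", year_of_week(d), "week =", iso_week(d))
```

For 2005-01-02 the calendar year is 2005 while the week-numbering year is 2004 (week 53); for 2012-12-31 the calendar year is 2012 while the week-numbering year is 2013 (week 1).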
For historical reasons, Spark implements [`dayofweek`, `dow`] for the non-ISO day-of-week and a newly added `isodow`, borrowed from PostgreSQL, for the ISO day-of-week. Many other systems support only one week-numbering system and use either full names or abbreviations, so the situation in Spark is a little more complicated:
1. Because `isodow` exists, `dayofweek` needs an `iso`-prefixed counterpart as well, which gives [`dayofweek`, `isodayofweek`, `dow`, `isodow`].
2. Because few systems use an `iso` prefix and more systems use an `iso` suffix, this would become [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`].
3. `dayofweekiso` looks fine and is used by platforms listed above, e.g. Snowflake, but `dowiso` looks odd and no uses of it were found.
4. After discussion with the community, we agreed that an underscore before `iso` reads better. `isodow` is new and there is no standard for `iso`-style names, so this keeps things simple and clear for end users, provided the names are well documented.
Thus we end up with [`dayofweek`, `dow`] for the non-ISO day-of-week system and [`dayofweek_iso`, `dow_iso`] for the ISO system.
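The two numbering systems can be sketched as follows, assuming the semantics given in the docstring: Sunday(1) to Saturday(7) for the non-ISO fields and Monday(1) to Sunday(7) for the ISO fields. This is a plain Python illustration, not Spark's implementation, and the function names are hypothetical stand-ins for the field names.

```python
from datetime import date


def dayofweek(d: date) -> int:
    """Non-ISO day of week: Sunday(1) to Saturday(7), as for `dayofweek` / `dow`."""
    # isoweekday() returns Monday(1) to Sunday(7); rotate so Sunday becomes 1.
    return d.isoweekday() % 7 + 1


def dayofweek_iso(d: date) -> int:
    """ISO 8601 day of week: Monday(1) to Sunday(7), as for `dayofweek_iso` / `dow_iso`."""
    return d.isoweekday()


sunday = date(2005, 1, 2)    # a Sunday
monday = date(2012, 12, 31)  # a Monday
print(dayofweek(sunday), dayofweek_iso(sunday))
print(dayofweek(monday), dayofweek_iso(monday))
```

The same Sunday maps to 1 in the non-ISO system and 7 in the ISO system, which is why the two families of names need to be kept clearly apart.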
### Why are the changes needed?
Remove some nonstandard and uncommon features; we can add them back if necessary.
### Does this PR introduce any user-facing change?
No. We should target this to 3.0.0, and these fields were added during 3.0.0.
### How was this patch tested?
Remove unused tests
Closes #28284 from yaooqinn/SPARK-31507.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@@ -2206,26 +2082,20 @@ case class DatePart(field: Expression, source: Expression, child: Expression)
   arguments ="""
     Arguments:
       * field - selects which part of the source should be extracted
-          - Supported string values of `field` for dates and timestamps are:
-              - "MILLENNIUM", ("MILLENNIA", "MIL", "MILS") - the conventional numbering of millennia
-              - "CENTURY", ("CENTURIES", "C", "CENT") - the conventional numbering of centuries
-              - "DECADE", ("DECADES", "DEC", "DECS") - the year field divided by 10
+          - Supported string values of `field` for dates and timestamps are(case insensitive):
               - "YEAR", ("Y", "YEARS", "YR", "YRS") - the year field
-              - "ISOYEAR" - the ISO 8601 week-numbering year that the datetime falls in
+              - "YEAROFWEEK" - the ISO 8601 week-numbering year that the datetime falls in. For example, 2005-01-02 is part of the 53rd week of year 2004, so the result is 2004
               - "QUARTER", ("QTR") - the quarter (1 - 4) of the year that the datetime falls in
               - "MONTH", ("MON", "MONS", "MONTHS") - the month field (1 - 12)
               - "WEEK", ("W", "WEEKS") - the number of the ISO 8601 week-of-week-based-year. A week is considered to start on a Monday and week 1 is the first week with >3 days. In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-02 is part of the 53rd week of year 2004, while 2012-12-31 is part of the first week of 2013
               - "DAY", ("D", "DAYS") - the day of the month field (1 - 31)
               - "DAYOFWEEK",("DOW") - the day of the week for datetime as Sunday(1) to Saturday(7)
-              - "ISODOW" - ISO 8601 based day of the week for datetime as Monday(1) to Sunday(7)
+              - "DAYOFWEEK_ISO",("DOW_ISO") - ISO 8601 based day of the week for datetime as Monday(1) to Sunday(7)
               - "DOY" - the day of the year (1 - 365/366)
               - "HOUR", ("H", "HOURS", "HR", "HRS") - The hour field (0 - 23)
               - "MINUTE", ("M", "MIN", "MINS", "MINUTES") - the minutes field (0 - 59)
               - "SECOND", ("S", "SEC", "SECONDS", "SECS") - the seconds field, including fractional parts
-              - "MILLISECONDS", ("MSEC", "MSECS", "MILLISECON", "MSECONDS", "MS") - the seconds field, including fractional parts, multiplied by 1000. Note that this includes full seconds
-              - "MICROSECONDS", ("USEC", "USECS", "USECONDS", "MICROSECON", "US") - The seconds field, including fractional parts, multiplied by 1000000. Note that this includes full seconds
-              - "EPOCH" - the number of seconds with fractional part in microsecond precision since 1970-01-01 00:00:00 local time (can be negative)
-          - Supported string values of `field` for interval(which consists of `months`, `days`, `microseconds`) are:
+          - Supported string values of `field` for interval(which consists of `months`, `days`, `microseconds`) are(case insensitive):
               - "YEAR", ("Y", "YEARS", "YR", "YRS") - the total `months` / 12
               - "MONTH", ("MON", "MONS", "MONTHS") - the total `months` % 12
               - "DAY", ("D", "DAYS") - the `days` part of interval
@@ -2258,7 +2128,7 @@ case class Extract(field: Expression, source: Expression, child: Expression)