Skip to content

Conversation

@delamarch3
Copy link
Contributor

@delamarch3 delamarch3 commented Feb 24, 2025

Which issue does this PR close?

Rationale for this change

date_part with an interval returns results inconsistent with implementations in duckdb and postgres.

What changes are included in this PR?

Parts are excluded from the interval:
Milliseconds, microseconds and nanoseconds will exclude the minutes and hours from the date part, but not seconds, so date_part('ms', interval '61s') will return 1000
Seconds will exclude minutes and hours, so date_part('s', interval '3600s') will return 0
Hours will not exclude days, so date_part('h', interval '25h') returns 25
Months will exclude years, so date_part('month', interval '13 months') returns 1

Are there any user-facing changes?

Yes, date_part with IntervalDayTimeType and IntervalMonthDayNanoType will return a different result

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 24, 2025
match part {
DatePart::Year => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months / 12))),
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months))),
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months % 12))),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D SELECT datepart('year', interval '12 month');
┌────────────────────────────────────────────────┐
│ datepart('year', CAST('12 month' AS INTERVAL)) │
│                     int64                      │
├────────────────────────────────────────────────┤
│                       1                        │
└────────────────────────────────────────────────┘
D SELECT datepart('month', interval '12 month');
┌─────────────────────────────────────────────────┐
│ datepart('month', CAST('12 month' AS INTERVAL)) │
│                      int64                      │
├─────────────────────────────────────────────────┤
│                        0                        │
└─────────────────────────────────────────────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres agrees

postgres=# SELECT extract (month from interval '12 month');
 extract
---------
       0
(1 row)

Before this PR arrow also currently says 12.

> SELECT datepart('month', interval '12 month');
+---------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("month"),IntervalMonthDayNano("IntervalMonthDayNano { months: 12, days: 0, nanoseconds: 0 }")) |
+---------------------------------------------------------------------------------------------------------------+
| 12                                                                                                            |
+---------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

I think this change makes sense

}
DatePart::Nanosecond => {
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000)))
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D SELECT datepart('m', interval '1m 61s 33ms 44us');
┌─────────────────────────────────────────────────────┐
│ datepart('m', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│                        int64                        │
├─────────────────────────────────────────────────────┤
│                          2                          │
└─────────────────────────────────────────────────────┘
D SELECT datepart('s', interval '1m 61s 33ms 44us');
┌─────────────────────────────────────────────────────┐
│ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│                        int64                        │
├─────────────────────────────────────────────────────┤
│                          1                          │
└─────────────────────────────────────────────────────┘
D SELECT datepart('ms', interval '1m 61s 33ms 44us');
┌──────────────────────────────────────────────────────┐
│ datepart('ms', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│                        int64                         │
├──────────────────────────────────────────────────────┤
│                         1033                         │
└──────────────────────────────────────────────────────┘
D SELECT datepart('us', interval '1m 61s 33ms 44us');
┌──────────────────────────────────────────────────────┐
│ datepart('us', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│                        int64                         │
├──────────────────────────────────────────────────────┤
│                       1033044                        │
│                    (1.03 million)                    │
└──────────────────────────────────────────────────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behavior change for arrow-rs.

duckdb says 1

SELECT datepart('s', interval '1m 61s 33ms 44us'); ┌─────────────────────────────────────────────────────┐ │ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │ │ int64 │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘

postgres says almost 1:

postgres=# SELECT extract (seconds from  interval '1m 61s 33ms 44us');
 extract
----------
 1.033044
(1 row)

Arrow says 121 (before this PR)

> SELECT datepart('s', interval '1m 61s 33ms 44us');
+---------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("s"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 121033044000 }")) |
+---------------------------------------------------------------------------------------------------------------------+
| 121                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing for millisecond and microseconds

match part {
DatePart::Week => Ok(self.unary_opt(|d| Some(d.days / 7))),
DatePart::Day => Ok(self.unary_opt(|d| Some(d.days))),
DatePart::Hour => Ok(self.unary_opt(|d| Some(d.milliseconds / (60 * 60 * 1_000)))),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D SELECT datepart('hour', interval '25 hour');
┌───────────────────────────────────────────────┐
│ datepart('hour', CAST('25 hour' AS INTERVAL)) │
│                     int64                     │
├───────────────────────────────────────────────┤
│                      25                       │
└───────────────────────────────────────────────┘
D SELECT datepart('day', interval '25 hour');
┌──────────────────────────────────────────────┐
│ datepart('day', CAST('25 hour' AS INTERVAL)) │
│                    int64                     │
├──────────────────────────────────────────────┤
│                      0                       │
└──────────────────────────────────────────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, the current arrow behavior is the same it seems:

> SELECT datepart('hour', interval '25 hour');
+--------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("hour"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+--------------------------------------------------------------------------------------------------------------------------+
| 25                                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.030 seconds.

> SELECT datepart('day', interval '25 hour');
+-------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("day"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+-------------------------------------------------------------------------------------------------------------------------+
| 0                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------+

@delamarch3 delamarch3 marked this pull request as ready for review February 27, 2025 21:34
@alamb alamb changed the title Extract date parts out of intervals Fix: date_part to extract only the requested part (not the overall interval) Mar 7, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @delamarch3 -- this makes sense to me

cc @Omega359

Related

Thank you very much

match part {
DatePart::Week => Ok(self.unary_opt(|d| Some(d.days / 7))),
DatePart::Day => Ok(self.unary_opt(|d| Some(d.days))),
DatePart::Hour => Ok(self.unary_opt(|d| Some(d.milliseconds / (60 * 60 * 1_000)))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, the current arrow behavior is the same it seems:

> SELECT datepart('hour', interval '25 hour');
+--------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("hour"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+--------------------------------------------------------------------------------------------------------------------------+
| 25                                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.030 seconds.

> SELECT datepart('day', interval '25 hour');
+-------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("day"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+-------------------------------------------------------------------------------------------------------------------------+
| 0                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------+

}
DatePart::Nanosecond => {
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behavior change for arrow-rs.

duckdb says 1

SELECT datepart('s', interval '1m 61s 33ms 44us'); ┌─────────────────────────────────────────────────────┐ │ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │ │ int64 │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘

postgres says almost 1:

postgres=# SELECT extract (seconds from  interval '1m 61s 33ms 44us');
 extract
----------
 1.033044
(1 row)

Arrow says 121 (before this PR)

> SELECT datepart('s', interval '1m 61s 33ms 44us');
+---------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("s"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 121033044000 }")) |
+---------------------------------------------------------------------------------------------------------------------+
| 121                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.

}
DatePart::Nanosecond => {
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing for millisecond and microseconds

match part {
DatePart::Year => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months / 12))),
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months))),
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months % 12))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres agrees

postgres=# SELECT extract (month from interval '12 month');
 extract
---------
       0
(1 row)

Before this PR arrow also currently says 12.

> SELECT datepart('month', interval '12 month');
+---------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("month"),IntervalMonthDayNano("IntervalMonthDayNano { months: 12, days: 0, nanoseconds: 0 }")) |
+---------------------------------------------------------------------------------------------------------------+
| 12                                                                                                            |
+---------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

I think this change makes sense

@Omega359
Copy link
Contributor

Omega359 commented Mar 7, 2025

I agree this change makes sense for consistency with established systems.

@tustvold
Copy link
Contributor

tustvold commented Mar 8, 2025

As this is a behaviour change, even if the consensus appears to be that the current behaviour is somewhere between undesirable and a bug, should this wait until the next major release?

@alamb
Copy link
Contributor

alamb commented Mar 8, 2025

As this is a behaviour change, even if the consensus appears to be that the current behaviour is somewhere between undesirable and a bug, should this wait until the next major release?

That probably makes sense

That will be in April so not that long:

@alamb alamb added the next-major-release the PR has API changes and it waiting on the next major version label Mar 8, 2025
@alamb
Copy link
Contributor

alamb commented Mar 8, 2025

Marking with label

@alamb alamb added the api-change Changes to the arrow API label Mar 8, 2025
@alamb alamb merged commit 15db256 into apache:main Mar 25, 2025
13 checks passed
@alamb
Copy link
Contributor

alamb commented Mar 25, 2025

🚀 -- thank you for your patience @delamarch3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change Changes to the arrow API arrow Changes to the arrow crate next-major-release the PR has API changes and it waiting on the next major version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

date_part using an interval returns an incorrect result

4 participants