Fix parse '1'::interval as month by default#11454
Conversation
|
My approach to this problem maybe a bit naive! open to suggestions. use datafusion::prelude::*;
#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
// register the table
let ctx = SessionContext::new();
ctx.register_csv("example", "test.csv", CsvReadOptions::new()).await?;
// create a plan to run a SQL query
let df = ctx.sql("SELECT '-12'::interval, '1'::interval, '1 day 2 month 3 hour'::interval FROM example").await?;
// execute and print results
df.show().await?;
Ok(())
}+---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
| Utf8("-12") | Utf8("1") | Utf8("1 day 2 month 3 hour") |
+---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins -12.000000000 secs | 0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs | 0 years 2 mons 1 days 3 hours 0 mins 0.000000000 secs |
+---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+ |
'1'::interval as month by default
I recommend sqllogictests Here are the instructions: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest Ideally you should be able to extend one of the existing test files in https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files |
| } else { | ||
| // Arrow by default will parse str as Month for unit MonthDayNano. | ||
| // So we need to be explict that we want it to parse as second. | ||
| match (scalar, cast_type) { |
There was a problem hiding this comment.
one issue with this approach is that it will only work for single values like '1'::interval
the behavior of
create table foo as values ('1');
select column1::interval from foo;will not be affected
I recommend you update the code in kernels::cast to have the new semantics you have in mind
There was a problem hiding this comment.
Sorry but as my understand is that kernels::cast is within arrow-cast which it a different repo. So you mean this change is should happen in arrow-cast repo ?
There was a problem hiding this comment.
Yes, eventally the fix should make it up to arrow-cast
In the interim, you could make a wrapper function in datafusion with the same signature as cast in arrow and implement your logic there for interval and fall through to the kernel in arrow-cast otherwise
I haven't though through the implications of changing the semantics of casting strings to intervals to be honest
|
@nix010 we really need this, any chance you could take a look at failing tests, otherwise I'd be happy to have a go at it. |
| None | ||
| } | ||
|
|
||
| fn _kernels_cast_with_option( |
There was a problem hiding this comment.
maybe useful to add a docstring here explaining why this method is necessary / what it does.
| // Arrow by default will parse str as Month for unit MonthDayNano. | ||
| // So we need to be explict that we want it to parse as second. |
|
|
||
| ### date / timestamp (array) + interval (scalar) | ||
| query D | ||
| query error DataFusion error: Error during planning: table 'datafusion\.public\.t' not found |
There was a problem hiding this comment.
these should definitely work, not raise an error!
|
|
||
| ### interval with string literal default as second | ||
| query ???? | ||
| SELECT interval '1', '-12'::interval, '1'::interval, '1'::interval + '1'::interval |
There was a problem hiding this comment.
this is very hard to read, better to do separate tests.
|
I think this is the wrong approach, it would be better to fix the underlying implementation in arrow-rs. |
Ok, then I think this PR can be closed. |
1 similar comment
Ok, then I think this PR can be closed. |
|
Marking as draft to signify this PR isn't waiting on feedback anymore |
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Closes #11271 .
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?