Support different scales for decimal binary math functions #19874
Support different scales for decimal binary math functions #19874theirix wants to merge 7 commits into
Conversation
Introduce calculate_binary_math_decimal and calculate_binary_math_numeric functions with a proper handling of arguments of different scales and type casting. They supersede calculate_binary_math and calculate_binary_math_decimal due to having a slightly different functior signature with automatic passing of effective precision and scale (even if rescaled).
|
I'm having trouble understanding the rationale here; Also for |
For most of these functions, agree, only one argument is decimal, so we execute the decimal/non-decimal case with code for adjusting both scales untouched. Overall, I can see the following benefits:
So, symmetric functions like gcd/lcm (WIP), mod, div, etc benefit most from this PR. From the first glance, a For
Yes, I discovered it recently. I am wondering whether it could also be simplified using this PR, since it handles different input and output precisions and scales. |
|
I agree log can be greatly simplified, and it's something I'd love to get around to eventually (same for power), however I'm not sure if this is the way to do it; the code introduced here at a glance is not trivial and it doesn't solve any use case at hand (in that nothing is being simplified by this PR). Perhaps it would be easier to see the use case if one of the functions which could benefit from this are refactored to use it in a way that has clear benefits over the existing |
|
Hello @Jefffrey, I agree, there aren't many functions that support both decimal types. Most likely, it will be only The rationale for this PR is to support these symmetric functions for both decimal types and avoid scaling issues, as shown below. This implementation of When I tried to implement it using the old function, I immediately encountered a problem with the old function Details``` [2026-02-09T22:17:42Z INFO datafusion_functions::math::gcd] invoke gcd with PrimitiveArray [ 2, ] and Scalar(Decimal128(Some(3),38,0)) [2026-02-09T22:17:42Z INFO datafusion_functions::utils] Calculating binary math with left PrimitiveArray [ 2, ] and right Scalar(Decimal128(Some(30000000000),38,10)) [2026-02-09T22:17:42Z INFO datafusion_functions::math::gcd] euclid_gcd_unsigned a=2 b=30000000000 -> Ok(2) [2026-02-09T22:17:42Z INFO datafusion_functions::utils] Calculating binary math with result PrimitiveArray [ 2, ] ```This implementation avoids this bug by properly analysing scale, so rescaling shouldn't be handled in the UDF code. If you check the implementation closely, it uses rescaling only for the decimal-decimal case. Other cases have code almost identical to the old function. I can see two approaches for implementing decimal-decimal math functions:
I favour the first approach for consistency, but I'm happy to refactor with the second approach if you find it more suitable. |
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
@Jefffrey , I would appreciate hearing your thoughts on this |
|
Decided to go with the way (2) - support different scales per-udf since it is has a limited scope for two udfs only. This pr is an overkill. So closing Done in #22655 |
## Which issue does this PR close? - Closes apache#19057. ## Rationale for this change A binary gcd and lcm UDF in the datafusion-functions crate supports only Int64, but not Decimals. Adding missing support for decimals. ## What changes are included in this PR? 1. Updated gcd and lcm functions to add decimal support. The integer path is more performant and stays intact. For decimals, the Euclidean algorithm is used for GCD 2. Added coercion rules: casting to decimals if any argument is decimal; otherwise, stay with ints as before 3. Common functionality extracted to `common.rs` to avoid inter-UDF dependency 4. In order to use `calculate_binary_math` for Decimals, updated it to accept a target type instead of raw `Decimal128Type::DATA_TYPE` - it causes scaling issues for these UDFs, see apache#19621 A bit more on (4). The driving force is this failing example: ```sql query R select gcd(2::decimal(38, 0), 3::decimal(38, 0)); ---- 1 ``` Previously in apache#19874, I suggested a more complicated solution to extend `calculate_binary_math`. However, it only affected gcd/lcm and could be considered overkill. This PR extends these functions with an extra parameter `cast_target` for `calculate_binary_decimal_math` to perform a proper cast to the actual type used, rather than to the default `Decimal128Type::DATA_TYPE` - it is much lighter. ## Are these changes tested? - Added unit test for UDFs with decimals for array and scalar paths - Added unit tests for the gcd/lcm math itself - Added new SLT tests for decimals ## Are there any user-facing changes? No
Which issue does this PR close?
Rationale for this change
A helper
calculate_binary_math andUDFs relying on it could behave strangely if the scales of inputs and outputs are different. Original logic didn't fully handle it.So let's introduce
calculate_binary_math_decimalandcalculate_binary_math_numeric functionswith a proper handling of arguments of different scales and type casting for input and output arguments.They supersede
calculate_binary_mathandcalculate_binary_math_decimalbecause they have a slightly different functor signature that automatically passes the effective precision and scale (even if rescaled). The rest is compatible.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
Older functions could be deprecated. Since they are a part of the public interface of datafusion-functions, I just placed a comment without a full-fledged
deprecatemacro. Up to discussion whether it should be used