Array types coercion does not preserve child element nullability for list types #17305

Description

@sgrebnov

Describe the bug

After upgrading from DataFusion 47 to a newer version, I started seeing schema mismatch errors. They are caused by the updated array type coercion logic, which does not preserve nullability information for nested types.

SELECT offset[2]-offset[1] FROM rd;
Arrow error: Invalid argument error: column types must match schema types, expected List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0
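
The error suggests that list types are compared structurally, including the nullable flag of the child field. A minimal sketch of that comparison follows, using a toy std-only model. The Field and DataType definitions and the check_column helper are hypothetical stand-ins written for illustration, not the arrow-rs API.

```rust
/// Toy stand-in for an Arrow field (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Toy stand-in for an Arrow data type (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
}

/// Builds a list type whose child field is named "item".
fn list_of(element: DataType, nullable: bool) -> DataType {
    DataType::List(Box::new(Field {
        name: "item".to_string(),
        data_type: element,
        nullable,
    }))
}

/// Strict check: the column type must equal the schema type exactly.
/// The derived `PartialEq` also compares the child field's `nullable` flag.
fn check_column(expected: &DataType, found: &DataType) -> Result<(), String> {
    if expected == found {
        Ok(())
    } else {
        Err(format!(
            "column types must match schema types, expected {:?} but found {:?}",
            expected, found
        ))
    }
}
```

In this model, a schema that declares a nullable child and a column with a non-nullable child are treated as different types, even though the element type is Int32 in both.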

To Reproduce

The following unit test reproduces the behavior. It currently fails with the assertion output shown first; the test itself follows the output.

assertion left == right failed
left: [[List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]]
right: [[List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })]]
stack backtrace:

#[test]
fn test_get_valid_types_fixed_size_arrays() -> Result<()> {
    let function = "fixed_size_arrays";
    let signature = Signature::arrays(2, None, Volatility::Immutable);

    // Both inputs declare non-nullable child elements.
    let data_types = vec![
        DataType::new_fixed_size_list(DataType::Int64, 3, false),
        DataType::new_list(DataType::Int32, false),
    ];
    // Expected: the coerced list types keep `nullable: false` on the child field.
    assert_eq!(
        get_valid_types(function, &signature.type_signature, &data_types)?,
        vec![vec![
            DataType::new_list(DataType::Int64, false),
            DataType::new_list(DataType::Int64, false),
        ]]
    );

    Ok(())
}
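
The difference between the observed and the expected result can be sketched with a toy std-only model. Everything here is hypothetical and written for illustration: the DataType and Field types, the helper names, and the coerce_arrays function are not DataFusion code. The sketch also assumes one particular rule for the preserving mode, namely that the coerced child is nullable only if at least one input child is nullable.

```rust
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn item(element: DataType, nullable: bool) -> Box<Field> {
    Box::new(Field { name: "item".to_string(), data_type: element, nullable })
}

fn new_list(element: DataType, nullable: bool) -> DataType {
    DataType::List(item(element, nullable))
}

fn new_fixed_size_list(element: DataType, size: i32, nullable: bool) -> DataType {
    DataType::FixedSizeList(item(element, nullable), size)
}

/// Returns the child field of a list-like type, or `None` for other types.
fn child(dt: &DataType) -> Option<&Field> {
    match dt {
        DataType::List(f) | DataType::FixedSizeList(f, _) => Some(&**f),
        _ => None,
    }
}

/// Picks the wider of two integer element types (toy rule: Int64 wins).
fn widen(a: &DataType, b: &DataType) -> Option<DataType> {
    match (a, b) {
        (DataType::Int32, DataType::Int32) => Some(DataType::Int32),
        (DataType::Int32, DataType::Int64)
        | (DataType::Int64, DataType::Int32)
        | (DataType::Int64, DataType::Int64) => Some(DataType::Int64),
        _ => None,
    }
}

/// Coerces every list-like input to `List` of a common element type.
/// With `preserve_nullability == false` the child is always marked nullable,
/// which mirrors the behavior reported in this issue.
fn coerce_arrays(types: &[DataType], preserve_nullability: bool) -> Option<Vec<DataType>> {
    let mut element: Option<DataType> = None;
    let mut any_nullable = false;
    for dt in types {
        let field = child(dt)?;
        any_nullable |= field.nullable;
        element = Some(match element {
            None => field.data_type.clone(),
            Some(current) => widen(&current, &field.data_type)?,
        });
    }
    let element = element?;
    let nullable = if preserve_nullability { any_nullable } else { true };
    Some(types.iter().map(|_| new_list(element.clone(), nullable)).collect())
}
```

Called with the two input types from the unit test, the non-preserving mode yields lists with a nullable Int64 child (the left side of the failed assertion), while the preserving mode yields lists with a non-nullable Int64 child (the right side).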

This can also be observed by adding tracing to coerce_arguments_for_signature_with_scalar_udf. Note that data_type: Int32, nullable: false in the current type has changed to data_type: Int32, nullable: true in the coerced type.

/// Returns `expressions` coerced to types compatible with
/// `signature`, if possible.
///
/// See the module level documentation for more detail on coercion.
fn coerce_arguments_for_signature_with_scalar_udf(
    expressions: Vec<Expr>,
    schema: &DFSchema,
    func: &ScalarUDF,
) -> Result<Vec<Expr>> {
    if expressions.is_empty() {
        return Ok(expressions);
    }

    let current_types = expressions
        .iter()
        .map(|e| e.get_type(schema))
        .collect::<Result<Vec<_>>>()?;

    let new_types = data_types_with_scalar_udf(&current_types, func)?;

    println!("schema: {:?}", schema);
    println!("current_types: {:?}", current_types);
    println!("Coerced types: {:?}", new_types);

    expressions
        .into_iter()
        .enumerate()
        .map(|(i, expr)| expr.cast_to(&new_types[i], schema))
        .collect()
}

Output from the added println! calls:
schema: DFSchema { inner: Schema { fields: [Field { name: "offset", data_type: FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"content_computed_columns": "content_embedding,content_offset"} }, field_qualifiers: [Some(Bare { table: "rd" })], functional_dependencies: FunctionalDependencies { deps: [] } }

current_types: [FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), Int64]

Coerced types: [List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Int64]
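
Rather than comparing the printed types by eye, the change can be detected programmatically. The following is a minimal sketch over a toy std-only model. The DataType and Field types and the nullability_changes helper are hypothetical and not part of DataFusion or arrow-rs.

```rust
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

fn item(element: DataType, nullable: bool) -> Box<Field> {
    Box::new(Field { name: "item".to_string(), data_type: element, nullable })
}

/// Child nullability of a list-like type, or `None` for non-list types.
fn child_nullable(dt: &DataType) -> Option<bool> {
    match dt {
        DataType::List(f) | DataType::FixedSizeList(f, _) => Some(f.nullable),
        _ => None,
    }
}

/// Returns `(argument index, nullable before, nullable after)` for every
/// argument whose child nullability differs between the two type lists.
fn nullability_changes(current: &[DataType], coerced: &[DataType]) -> Vec<(usize, bool, bool)> {
    current
        .iter()
        .zip(coerced.iter())
        .enumerate()
        .filter_map(|(i, (before, after))| {
            match (child_nullable(before), child_nullable(after)) {
                (Some(b), Some(a)) if b != a => Some((i, b, a)),
                _ => None,
            }
        })
        .collect()
}
```

Applied to toy equivalents of the traced types above, the helper reports a single change at argument index 0, from non-nullable to nullable. The Int64 index argument is ignored because it is not a list type.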

Expected behavior

No response

Additional context

The original (correct) behavior was changed by the following improvement:
#15149 (comment)

Labels: bug