feat: add complex type support to native Parquet writer#3214
Conversation
Enables support for complex types (arrays, maps, structs) in Comet's native Parquet writer by removing the blocking check that previously prevented them. Changes: - Remove complex type blocking check in CometDataWritingCommand.scala - Add comprehensive test coverage for complex types including: - Basic complex types (array, struct, map) - Nested complex types (array of structs, struct containing array, etc.) - Nullable complex types with nulls at various nesting levels - Complex types containing decimal and temporal types - Empty arrays and maps - Fuzz testing with randomly generated complex type schemas - Update documentation to reflect complex type support Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3214 +/- ##
============================================
+ Coverage 56.12% 60.05% +3.92%
- Complexity 976 1429 +453
============================================
Files 119 170 +51
Lines 11743 15758 +4015
Branches 2251 2602 +351
============================================
+ Hits 6591 9463 +2872
- Misses 4012 4976 +964
- Partials 1140 1319 +179 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Enable spark.comet.scan.allowIncompatible in complex type tests so that native_iceberg_compat scan is used (which supports complex types) instead of falling back to native_comet (which doesn't support complex types). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
c6bc58a to
93d1f82
Compare
The CI sets COMET_PARQUET_SCAN_IMPL=native_comet for some test profiles, which overrides the default auto mode. Since native_comet doesn't support complex types, the scan falls back to Spark's reader which produces OnHeapColumnVector instead of CometVector, causing the native writer to fail. This fix explicitly sets COMET_NATIVE_SCAN_IMPL to "auto" in the test configuration, allowing native_iceberg_compat to be used for complex types. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@comphead ☝️ |
|
Thanks for the thorough review @wForget. I refactored the test framework and added assertions that the plans are running as intended (Spark vs Comet). |
| .strip_prefix("file://") | ||
| .or_else(|| part_file.strip_prefix("file:")) | ||
| .unwrap_or(&part_file); | ||
| let file_size = std::fs::metadata(local_path) |
There was a problem hiding this comment.
A question unrelated to this PR: #2929 supports write to hdfs. So, is the logic here correct when local_path is an hdfs path?

Closes #3023
Summary
Changes
CometDataWritingCommand.scalaCometParquetWriterSuite.scala:Test plan
🤖 Generated with Claude Code