feat: Support basic Delta scans by Kimahriman · Pull Request #3035 · apache/datafusion-comet

Kimahriman · 2026-01-04T13:55:43Z

Which issue does this PR close?

Related to #174. This is not full support, so that issue should probably stay open (or new tickets could be opened specifically for column mapping and deletion vectors).

Rationale for this change

When column mapping and deletion vectors are not used, Delta scans are simply Spark's built-in Parquet scans, so the existing Comet Parquet scans work for them as-is. This PR adds support for these cases in the scan rule, along with unit tests that could be reused in a future integration of delta-rs or delta-kernel-rs for full scan support.

What changes are included in this PR?

Updates the CometScanRule to use Comet parquet scans for Delta V1 scans of tables that don't use column mapping or deletion vectors. Adds a new DeltaReflection class for accessing the Delta properties to determine the enabled features.
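The PR reads Delta's table state through the new DeltaReflection class, presumably so that Delta can remain a test-only dependency. The following is a minimal sketch of that general technique, not the PR's code. It calls accessors by name through Java reflection, so the calling code needs no compile-time reference to Delta classes. FakeDeltaMetadata, ReflectiveDeltaAccess, and their methods are hypothetical. Only the delta.columnMapping.mode table property name comes from Delta itself.

```scala
import scala.util.Try

// Hypothetical stand-in for Delta's Metadata action, used only so this sketch
// runs without Delta on the classpath.
final case class FakeDeltaMetadata(configuration: Map[String, String])

object ReflectiveDeltaAccess {
  // Invokes a public no-arg method by name using Java reflection, so the caller
  // needs no compile-time dependency on the class that defines it. Any failure
  // (missing method, exception thrown by the method) becomes None.
  def invokeNoArg(target: AnyRef, methodName: String): Option[Any] =
    Try(target.getClass.getMethod(methodName).invoke(target)).toOption

  // Reads the table configuration reflectively and reports whether column mapping
  // is enabled. "delta.columnMapping.mode" is Delta's table property; a missing
  // value or "none" means column mapping is off.
  def usesColumnMapping(metadata: AnyRef): Option[Boolean] =
    invokeNoArg(metadata, "configuration").collect { case conf: Map[_, _] =>
      conf.asInstanceOf[Map[String, String]]
        .get("delta.columnMapping.mode")
        .exists(mode => !mode.equalsIgnoreCase("none"))
    }
}
```

Returning Option keeps "could not inspect the table" distinct from "feature is off", so a scan rule built on this could fall back to Spark whenever the result is None. A property check like this is only a heuristic for deletion vectors: turning off the enabling property does not remove deletion vectors that already exist, so the protocol's reader features are the more reliable signal for that feature.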

How are these changes tested?

Adds unit tests for different Delta scans and adds Delta as a test dependency.

Kimahriman · 2026-01-04T14:01:53Z

 reqwest = { version = "0.12", default-features = false, features = ["rustls-tls-native-roots", "http2"] }
 object_store_opendal = {version = "0.55.0", optional = true}
-hdfs-sys = {version = "0.3", optional = true, features = ["hdfs_3_3"]}
+hdfs-sys = {version = "0.3", optional = true, features = ["hdfs_3_3", "vendored"]}


With opendal enabled by default in #2929, I was getting failures finding libhdfs.so when running unit tests on Linux. The hdrs dependency below for macOS existed only to activate the vendored feature of hdfs-sys, so I enabled that feature globally here. That way, you don't need libhdfs available when developing on Linux either.

Kimahriman · 2026-01-04T14:02:30Z

+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>failureaccess</artifactId>
+      <version>1.0.3</version>
+      <scope>test</scope>
+    </dependency>


Without this I'm getting an error creating the Delta tables:

Cause: java.lang.ClassNotFoundException: com.google.common.util.concurrent.internal.InternalFutureFailureAccess
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
    at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
    at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
    at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
    ...

I'm not sure why. I could use some help figuring out which Guava piece is missing and which dependency is supposed to provide it.
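One way to narrow this down is to ask the test JVM directly which jar, if any, provides a given class. The sketch below is a hypothetical diagnostic helper (ClasspathProbe is not part of Comet) that uses only standard JDK reflection. For context, starting with Guava 27 the InternalFutureFailureAccess class ships in the separate com.google.guava:failureaccess artifact rather than in the main Guava jar. That is consistent with the test-scoped failureaccess dependency above making the error go away.

```scala
object ClasspathProbe {
  // Hypothetical helper: reports where a class would be loaded from.
  // Returns the jar or directory URL, or None when the class cannot be loaded
  // or has no code source (e.g. JDK classes from the bootstrap loader).
  def locate(className: String): Option[String] =
    try {
      // initialize = false avoids running static initializers just to probe.
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(source => Option(source.getLocation))
        .map(_.toString)
    } catch {
      // NoClassDefFoundError covers a class that is found but whose supertype
      // cannot be loaded, which is the kind of failure shown in the trace above.
      case _: ClassNotFoundException | _: NoClassDefFoundError => None
    }
}
```

Running, for example, ClasspathProbe.locate("com.google.common.util.concurrent.internal.InternalFutureFailureAccess") and ClasspathProbe.locate("com.google.common.util.concurrent.AbstractFuture") from inside a test would show whether each class resolves, and from which jar.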

codecov-commenter · 2026-01-04T14:29:02Z

Codecov Report

❌ Patch coverage is 65.78947% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.60%. Comparing base (f09f8af) to head (aa51b22).
⚠️ Report is 832 commits behind head on main.

Files with missing lines	Patch %	Lines
...scala/org/apache/comet/delta/DeltaReflection.scala	66.66%	9 Missing ⚠️
...ala/org/apache/spark/sql/comet/CometScanExec.scala	63.63%	1 Missing and 3 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #3035      +/-   ##
============================================
+ Coverage     56.12%   59.60%   +3.47%     
- Complexity      976     1379     +403     
============================================
  Files           119      168      +49     
  Lines         11743    15534    +3791     
  Branches       2251     2579     +328     
============================================
+ Hits           6591     9259    +2668     
- Misses         4012     4975     +963     
- Partials       1140     1300     +160


andygrove · 2026-01-05T19:33:35Z

Thanks, @Kimahriman. Please also add content to the documentation (either the user guide or the contributor guide) explaining this new feature.

Kimahriman · 2026-01-07T16:29:23Z

Added a snippet to the user guide. There's not much to add to the contributor guide, since this PR doesn't yet start integrating the Delta Rust libraries. I'm still trying to figure out whether and how to integrate those.

sebbegg · 2026-03-02T12:36:17Z

@Kimahriman any plans to continue work on this?

We rely on Delta tables, and Apache Comet sounds like it might be a very interesting option to speed up our queries without making a big transition off Spark.

andygrove · 2026-03-02T14:51:55Z

+    deltaMetadata.minReaderVersion match {
+      case 1 => true
+      case 2 => false
+      case 3 =>


Could you add a case _ to handle future versions?

Kimahriman (reply): Reader version 3 switched to named table features instead of incremental version numbers, so there will never be another reader version. I can still handle it more gracefully if you would like, though.
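To illustrate the review suggestion, here is a sketch of the reader-version check with a catch-all arm. Only the case 1, case 2, and case 3 patterns appear in the diff excerpt. The body of the case 3 arm, the object and method names, and the exact set of rejected features are assumptions based on the PR description, which names column mapping and deletion vectors as the unsupported features. The strings columnMapping and deletionVectors are the names the Delta protocol uses for those reader features. For simplicity, the sketch takes a plain Set rather than whatever type Delta's protocol object exposes.

```scala
object DeltaReaderProtocol {
  // Reader features that need more than a plain Parquet scan. Assumed from the PR
  // description; the PR's actual list may differ.
  val unsupportedReaderFeatures: Set[String] = Set("columnMapping", "deletionVectors")

  // Hypothetical helper: true when a table with this reader protocol can be read
  // with an ordinary Parquet scan.
  def isPlainParquetReadable(minReaderVersion: Int, readerFeatures: Set[String]): Boolean =
    minReaderVersion match {
      // Version 1: no reader features beyond the base protocol.
      case 1 => true
      // Version 2 is the legacy version introduced for column mapping; rejected conservatively.
      case 2 => false
      // Version 3 lists named reader features instead of bumping the version number.
      case 3 => readerFeatures.intersect(unsupportedReaderFeatures).isEmpty
      // Defensive catch-all for any version this code does not recognize.
      case _ => false
    }
}
```

A deny-list like this one accepts any reader feature it has not seen before. If new reader features are added to the protocol, an allow-list of known-safe features would be the more conservative choice.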

Kimahriman · 2026-03-03T14:01:31Z

> @Kimahriman any plans to continue work on this?
>
> We rely on Delta tables, and Apache Comet sounds like it might be a very interesting option to speed up our queries without making a big transition off Spark.

This should be good to go. I don't have immediate plans to address the reader features that require custom integration, such as deletion vectors (DVs) and column mapping, though the latter would likely be fairly straightforward to support.

github-actions · 2026-05-03T02:19:19Z

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

Kimahriman added 6 commits January 2, 2026 22:05

work on getting delta scans working · 6c01551
More tests · 82dada0
Scalafix · 0a995ef
Support Spark 3.4 · 86ff296
Remove unnecessary dependency and fix tests for Spark 4 · a31f3a0
Add test for CDF · b0bbba8

Kimahriman commented Jan 4, 2026

Add new suite to CI · 2076f1e

Kimahriman added 2 commits January 7, 2026 16:27

Add a snippet to supported sources · ef51e5e
Fix md · aa51b22

andygrove reviewed Mar 2, 2026

Comment thread spark/src/main/scala/org/apache/comet/delta/DeltaReflection.scala

andygrove reviewed Mar 2, 2026

Comment thread spark/src/main/scala/org/apache/comet/delta/DeltaReflection.scala Outdated

andygrove reviewed Mar 2, 2026

Kimahriman added 3 commits March 3, 2026 08:52

Fix comments · aada9b1
Merge branch 'main' into delta-scan · 3bbb78b
Fix error · 5a7633b

github-actions Bot added the Stale label May 3, 2026

github-actions Bot closed this May 10, 2026

adityavaish mentioned this pull request Jun 17, 2026: feat: support native Comet scan of plain Delta Lake tables #4669 (Draft)

Kimahriman wants to merge 12 commits into apache:main from Kimahriman:delta-scan

4 participants
