Skip to content

feat: Support hdfs with OpenDAL#2244

Merged
comphead merged 4 commits into
apache:mainfrom
wForget:opendal
Sep 3, 2025
Merged

feat: Support hdfs with OpenDAL#2244
comphead merged 4 commits into
apache:mainfrom
wForget:opendal

Conversation

@wForget

@wForget wForget commented Aug 27, 2025

Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #2243.

Rationale for this change

I also noticed the Apache OpenDAL project, which supports object_store and many file services. Perhaps we can integrate it to access more file services.

What changes are included in this PR?

add hdfs-opendal feature to support hdfs with opendal

How are these changes tested?

Successfully run CometReadHdfsBenchmark locally (tips: build native enable hdfs-opendal: cd native && cargo build --features hdfs-opendal)

@codecov-commenter

codecov-commenter commented Aug 27, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.77%. Comparing base (f09f8af) to head (488d649).
⚠️ Report is 434 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2244      +/-   ##
============================================
+ Coverage     56.12%   57.77%   +1.64%     
- Complexity      976     1291     +315     
============================================
  Files           119      145      +26     
  Lines         11743    13360    +1617     
  Branches       2251     2378     +127     
============================================
+ Hits           6591     7719    +1128     
- Misses         4012     4384     +372     
- Partials       1140     1257     +117     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@wForget wForget marked this pull request as ready for review August 27, 2025 09:35
@andygrove

Copy link
Copy Markdown
Member

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

@comphead

Copy link
Copy Markdown
Contributor

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

Yeah, the main concern was limited support for HDFS client https://github.com/Kimahriman/hdfs-native?tab=readme-ov-file#supported-hdfs-settings
which can lead to lack of support for IO retries, authentication, etc
but this PR uses openDAL as an optional feature, which IMO should be fine

@comphead

Copy link
Copy Markdown
Contributor

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

@andygrove

Copy link
Copy Markdown
Member

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

OpenDAL is governed by the Apache Software Foundation, which is nice too. It also supports object-store as a backend.

@mbutrovich

Copy link
Copy Markdown
Contributor

@wForget is there a real use case for open_dal?

On a separate note the crate is very actively evolving and in future might be a more successful candidate the hdfs

I think Iceberg-rs is relying on it, so that's a big motivation to contribute to it.

@wForget

wForget commented Aug 29, 2025

Copy link
Copy Markdown
Member Author

@comphead do you remember if we looked at OpenDAL originally for HDFS support?

Yeah, the main concern was limited support for HDFS client https://github.com/Kimahriman/hdfs-native?tab=readme-ov-file#supported-hdfs-settings which can lead to lack of support for IO retries, authentication, etc but this PR uses openDAL as an optional feature, which IMO should be fine

The dependency you mentioned corresponds to the services-hdfs-native feature of OpenDAL, which is a fully native HDFS client. This PR introduces the service-hdfs feature, which uses hdrs crate, a jvm based libhfs.

@wForget

wForget commented Aug 29, 2025

Copy link
Copy Markdown
Member Author

@wForget is there a real use case for open_dal?

No, currently we use gluten as spark native engine, but we also use jvm-based libhdfs

@comphead

Copy link
Copy Markdown
Contributor

Thanks @wForget I was referring to native-hdfs crate as you correctly mentioned.
Reg to hdrs it looks interesting and more supported than hdfs crate with the same architecture and based on libhdfs which based on jvm libs.

Is there an object store implementation based on hdrs? from PR ny understanding object_store_opendal is the implemenation?

@wForget

wForget commented Aug 29, 2025

Copy link
Copy Markdown
Member Author

Is there an object store implementation based on hdrs? from PR ny understanding object_store_opendal is the implemenation?

It seems to be https://github.com/apache/opendal/tree/main/integrations/object_store

@comphead

Copy link
Copy Markdown
Contributor

Thanks I'm planning to run some tests this weekend using this feature and local HDFS 3 node cluster

@comphead

comphead commented Sep 2, 2025

Copy link
Copy Markdown
Contributor

I'm still on it @wForget, the local hdfs cluster setup having some issue

@wForget

wForget commented Sep 2, 2025

Copy link
Copy Markdown
Member Author

I'm still on it @wForget, the local hdfs cluster setup having some issue

Thank you for your verification and feedback.

@comphead

comphead commented Sep 3, 2025

Copy link
Copy Markdown
Contributor

I made smoke checks on my local 3 nodes cluster and it works, however I haven't checked auth part, @parthchandra WDYT?

Its probably doesn't make sense of having 2 similar libhdfs clients in the project, however open dal one is more actively supported

@parthchandra

Copy link
Copy Markdown
Contributor

I made smoke checks on my local 3 nodes cluster and it works, however I haven't checked auth part, @parthchandra WDYT?

I'll try this out (though it may be a couple of days before I get to it). In the meantime, I see no issue with having this in the code base given that it is behind a feature flag (as is the current fs-hdfs based implementation).
I suspect that we will have to make minor changes to use this when accessing object store implementations that already have native implementations (e.g. S3) and I would like to be able to benchmark the various implementations at some point.

@comphead comphead left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @wForget and @parthchandra

@comphead comphead merged commit a74fea1 into apache:main Sep 3, 2025
93 checks passed
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
* feat: Support hdfs with OpenDAL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integrate Apache OpenDAL to support more file serivces

6 participants