feat: Support hdfs with OpenDAL #2244
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2244 +/- ##
============================================
+ Coverage 56.12% 57.77% +1.64%
- Complexity 976 1291 +315
============================================
Files 119 145 +26
Lines 11743 13360 +1617
Branches 2251 2378 +127
============================================
+ Hits 6591 7719 +1128
- Misses 4012 4384 +372
- Partials 1140 1257 +117
☔ View full report in Codecov by Sentry.
|
|
@comphead do you remember if we looked at OpenDAL originally for HDFS support? |
Yeah, the main concern was the HDFS client's limited support for HDFS settings: https://github.com/Kimahriman/hdfs-native?tab=readme-ov-file#supported-hdfs-settings |
|
@wForget is there a real use case for On a separate note, the crate is evolving very actively and in the future might be a more successful candidate the |
OpenDAL is governed by the Apache Software Foundation, which is nice too. It also supports object-store as a backend. |
I think Iceberg-rs is relying on it, so that's a big motivation to contribute to it. |
The dependency you mentioned corresponds to the services-hdfs-native feature of OpenDAL, which is a fully native HDFS client. This PR introduces the services-hdfs feature, which uses the hdrs crate, a binding to the JVM-based libhdfs. |
No. Currently we use Gluten as our Spark native engine, but we also use the JVM-based libhdfs. |
|
Thanks @wForget, I was referring to the hdfs-native crate, as you correctly mentioned. Is there an object store implementation based on hdrs? From the PR, my understanding |
It seems to be https://github.com/apache/opendal/tree/main/integrations/object_store |
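For readers following the thread, here is a small dependency-free sketch of one step any HDFS-backed object store needs before it can open a file: separating the name node address from the path inside the filesystem. OpenDAL's HDFS service is configured with a name node address, so a URL such as `hdfs://namenode:9000/data/file.parquet` has to be split first. The helper name `split_hdfs_url` and the example host are hypothetical and are not taken from this PR or from OpenDAL.

```rust
/// Splits an HDFS URL into (name node address, path within the filesystem).
/// Hypothetical helper for illustration only; not part of the PR or of OpenDAL.
pub fn split_hdfs_url(url: &str) -> Option<(String, String)> {
    // Only URLs with the `hdfs://` scheme are handled in this sketch.
    let rest = url.strip_prefix("hdfs://")?;

    // The authority (host[:port]) ends at the first '/'; the remainder is the path.
    let (authority, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, "/"),
    };

    // Reject URLs without a host, such as `hdfs:///data`.
    if authority.is_empty() {
        return None;
    }

    Some((format!("hdfs://{authority}"), path.to_string()))
}
```

The first element of the returned pair is the kind of value a name node setting expects, and the second is the object path. A real integration would likely use a proper URL parser instead of string slicing.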
|
Thanks, I'm planning to run some tests this weekend using this feature and a local 3-node HDFS cluster. |
|
I'm still on it @wForget; the local HDFS cluster setup is having some issues. |
Thank you for your verification and feedback. |
|
I ran smoke checks on my local 3-node cluster and it works; however, I haven't checked the auth part. @parthchandra WDYT? It probably doesn't make sense to have two similar libhdfs clients in the project, but the OpenDAL one is more actively supported. |
I'll try this out (though it may be a couple of days before I get to it). In the meantime, I see no issue with having this in the code base given that it is behind a feature flag (as is the current fs-hdfs based implementation). |
comphead
left a comment
lgtm thanks @wForget and @parthchandra
* feat: Support hdfs with OpenDAL
Which issue does this PR close?
Closes #2243.
Rationale for this change
I also noticed the Apache OpenDAL project, which supports object_store and many file services. Perhaps we can integrate it to access more file services.
What changes are included in this PR?
Add an hdfs-opendal feature to support HDFS with OpenDAL.
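As a rough illustration of what putting code behind the `hdfs-opendal` feature means in Rust, the sketch below compiles one of two function bodies depending on whether the feature is enabled. It assumes `hdfs-opendal` is declared under `[features]` in the crate's `Cargo.toml`; the function name `hdfs_backend` and its return values are hypothetical and do not come from this PR.

```rust
// Hypothetical sketch of Cargo feature gating; names are illustrative only.

/// Compiled only when the crate is built with `--features hdfs-opendal`.
#[cfg(feature = "hdfs-opendal")]
pub fn hdfs_backend() -> &'static str {
    "opendal"
}

/// Fallback compiled when the `hdfs-opendal` feature is not enabled.
#[cfg(not(feature = "hdfs-opendal"))]
pub fn hdfs_backend() -> &'static str {
    "none"
}
```

Because exactly one of the two definitions exists in any given build, the OpenDAL dependency and its code path are absent from the binary unless the feature is requested at build time.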
How are these changes tested?
Successfully ran CometReadHdfsBenchmark locally (tip: build the native library with the hdfs-opendal feature enabled: cd native && cargo build --features hdfs-opendal)