feat: Add support for remote Parquet HDFS writer with openDAL #2929
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2929      +/-   ##
============================================
+ Coverage     56.12%   59.59%   +3.46%
- Complexity      976     1377     +401
============================================
  Files           119      167      +48
  Lines         11743    15494    +3751
  Branches       2251     2570     +319
============================================
+ Hits           6591     9233    +2642
- Misses         4012     4961     +949
- Partials       1140     1300     +160
Branch updated from e2dfaa1 to cc21ac4.
Filed #2971 for the double writes.
Branch updated from cc21ac4 to 27c4d54.
- fn parse_hdfs_url(url: &Url) -> Result<(Box<dyn ObjectStore>, Path), object_store::Error> {
+ // Creates an HDFS object store from a URL using the native HDFS implementation
+ #[cfg(all(feature = "hdfs", not(feature = "hdfs-opendal")))]
+ fn create_hdfs_object_store(
Renamed from parse_hdfs_url to create_hdfs_object_store so that the name better reflects what the function does.
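The diff above gates the native implementation behind `#[cfg(all(feature = "hdfs", not(feature = "hdfs-opendal")))]`. As a minimal sketch of how mutually exclusive feature gates select one implementation at compile time, the hypothetical `hdfs_backend` function below is defined three times, and only the variant whose `cfg` predicate holds is compiled. The assumption that the OpenDAL path wins when `hdfs-opendal` is enabled is inferred from the `not(...)` clause, not taken from the PR's code.

```rust
// Hypothetical sketch: exactly one `hdfs_backend` definition survives
// conditional compilation for any combination of the two features.

/// Compiled when `hdfs-opendal` is enabled (assumed to take precedence).
#[cfg(feature = "hdfs-opendal")]
fn hdfs_backend() -> &'static str {
    "opendal"
}

/// Compiled when only the native `hdfs` feature is enabled; this is the
/// same predicate that guards `create_hdfs_object_store` in the diff.
#[cfg(all(feature = "hdfs", not(feature = "hdfs-opendal")))]
fn hdfs_backend() -> &'static str {
    "native"
}

/// Fallback when neither feature is enabled.
#[cfg(not(any(feature = "hdfs", feature = "hdfs-opendal")))]
fn hdfs_backend() -> &'static str {
    "none"
}
```

Built with plain `rustc` and no features, only the fallback is compiled, so `hdfs_backend()` returns `"none"`. Under Cargo, and assuming the crate declares these features, `cargo build --features hdfs-opendal` would compile only the first variant.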
- if (!cmd.outputPath.toString.startsWith("file:")) {
-   return Unsupported(Some("Only local filesystem output paths are supported"))
+ if (!cmd.outputPath.toString.startsWith("file:") && !cmd.outputPath.toString
+     .startsWith("hdfs:")) {
Do you mean to refer to fn is_hdfs_scheme?
Yes, but there doesn't seem to be an equivalent verification on the Java side; we can improve that later.
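The Scala guard above rejects any output path that starts with neither `file:` nor `hdfs:`. A minimal Rust sketch of the same prefix check follows; `is_supported_output_path` is a hypothetical name, and it is not the signature of the `is_hdfs_scheme` function mentioned in the thread, which is not shown here.

```rust
/// Hypothetical helper mirroring the Scala guard: an output path is
/// accepted only when it starts with the `file:` or `hdfs:` prefix.
/// Like JVM `String.startsWith`, the comparison is case-sensitive.
fn is_supported_output_path(path: &str) -> bool {
    const SUPPORTED_PREFIXES: [&str; 2] = ["file:", "hdfs:"];
    SUPPORTED_PREFIXES
        .iter()
        .any(|&prefix| path.starts_with(prefix))
}
```

Because URI schemes are case-insensitive under RFC 3986, a stricter check would parse the scheme and compare it case-insensitively; this sketch deliberately keeps the behavior of the Scala code.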
// Create HDFS writer lazily on first write
if hdfs_writer_opt.is_none() {
    let writer = op.writer(output_path.as_str()).await.map_err(|e| {
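The snippet above creates the HDFS writer only on the first write and keeps it in an `Option`. A minimal synchronous sketch of that lazy-initialization pattern is shown below. `RemoteWriter` and `open_writer` are hypothetical stand-ins for the OpenDAL writer and for `op.writer(...)`, and the real code is async while this sketch is not.

```rust
/// Hypothetical stand-in for a remote writer; it just buffers bytes.
struct RemoteWriter {
    path: String,
    written: Vec<u8>,
}

/// Hypothetical fallible factory, playing the role of `op.writer(...)`.
fn open_writer(path: &str) -> Result<RemoteWriter, String> {
    if path.is_empty() {
        return Err("empty output path".to_string());
    }
    Ok(RemoteWriter {
        path: path.to_string(),
        written: Vec::new(),
    })
}

/// Appends `batch`, opening the writer only on the first call.
fn write_batch(
    writer_opt: &mut Option<RemoteWriter>,
    output_path: &str,
    batch: &[u8],
) -> Result<(), String> {
    if writer_opt.is_none() {
        // First write: open the writer and cache it for later calls.
        // On error, `?` returns early and the Option stays `None`.
        *writer_opt = Some(open_writer(output_path)?);
    }
    // The branch above guarantees the Option is `Some` at this point.
    let writer = writer_opt.as_mut().expect("writer initialized above");
    writer.written.extend_from_slice(batch);
    Ok(())
}
```

Deferring creation this way means no remote file is opened for a task that never writes a batch, and a failed open leaves the state untouched so the error surfaces on the first write.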
Should we use a generic ObjectStore for writing instead of using the opendal Operator directly?
Thanks @wForget, relying on a generic interface rather than a specific filesystem implementation sounds like a good idea. I'll address it in a follow-up PR.
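The suggestion above is to write through a generic interface instead of a concrete OpenDAL `Operator`. As a minimal sketch of that idea, the write path below depends only on a hypothetical `OutputSink` trait, with an in-memory implementation standing in for a real store. This is not the `object_store` crate's `ObjectStore` API; a real adapter for that trait or for OpenDAL would sit behind `OutputSink` instead.

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical backend-agnostic interface: callers see only this trait,
/// never a concrete client such as an OpenDAL `Operator`.
trait OutputSink {
    fn append(&mut self, path: &str, bytes: &[u8]) -> io::Result<()>;
}

/// In-memory implementation used here in place of HDFS or a local disk.
#[derive(Default)]
struct MemorySink {
    objects: HashMap<String, Vec<u8>>,
}

impl OutputSink for MemorySink {
    fn append(&mut self, path: &str, bytes: &[u8]) -> io::Result<()> {
        self.objects
            .entry(path.to_string())
            .or_default()
            .extend_from_slice(bytes);
        Ok(())
    }
}

/// Write path written against the trait: switching backends means passing
/// a different `OutputSink`, with no change to this function.
fn write_parts(sink: &mut dyn OutputSink, path: &str, parts: &[&[u8]]) -> io::Result<usize> {
    let mut total = 0;
    for part in parts {
        sink.append(path, part)?;
        total += part.len();
    }
    Ok(total)
}
```

Using a trait object (`&mut dyn OutputSink`) keeps the backend choice a runtime decision, for example based on the output path's scheme, at the cost of dynamic dispatch per call.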
@wForget, thanks for the feedback. Do you see any other improvements that could be made?
Thanks @andygrove and @wForget for the review. I'll file some tickets to support a generic object store and using

Which issue does this PR close?
Running experiments using openDAL for HDFS writes on local and remote clusters.
Closes #2890.
Rationale for this change
What changes are included in this PR?
How are these changes tested?