I was going through KeyValueMetadata and found some error handling which I feel can be made better.
Like for KeyValueMetadata
Status KeyValueMetadata::Delete(int64_t index) {
keys_.erase(keys_.begin() + index);
values_.erase(values_.begin() + index);
return Status::OK();
}
Here if the index is out of bounds, i.e. index >= keys_.size() or index < 0, then this will throw a seg fault or may corrupt memory.
The script I used for testing this
#include <arrow/util/key_value_metadata.h>
#include <iostream>
using arrow::Status;
using std::cin;
using std::cout;
using std::endl;
Status reproduce_delete() {
arrow::KeyValueMetadata meta;
return meta.Delete(20);
}
void DeleteSegFault() {
// The delete method on KeyValueMetadata
Status st = reproduce_delete(); // this seg faults
if (st.ok()) { // if no seg fault then this would be triggered, also memory
// could be currupted
cout << "The bug exsits and returns a OK status even when the index to "
"delete is out of range."
<< endl;
}
}
int main() {
DeleteSegFault();
}
Would a fix like the below would be good? If so I will open a PR for it.
I also used ARROW_PREDICT_FALSE to minimise the overhead.
Status KeyValueMetadata::Delete(int64_t index) {
if (ARROW_PREDICT_FALSE(index < 0 || std::cmp_greater_equal(index, values_.size()))) {
return arrow::Status::IndexError("Index out of bounds. Expected 0 to ", std::to_string(keys_.size() - 1), ", but got ", std::to_string(index));
}
keys_.erase(keys_.begin() + index);
values_.erase(values_.begin() + index);
return Status::OK();
}
Also in C++20, the static_cast above can be changed with std::cmp_greater_equal(index, values_.size()) which avoids the edge case of casting an unsigned long (size_t) to a singed long (int64_t) but I see that Arrow supports a minimum of C++17 so I think static_cast is the only option.
Component(s)
C++
I was going through
KeyValueMetadataand found some error handling which I feel can be made better.Like for
KeyValueMetadataHere if the index is out of bounds, i.e.
index >= keys_.size()orindex < 0, then this will throw a seg fault or may corrupt memory.The script I used for testing this
Would a fix like the below would be good? If so I will open a PR for it.
I also used
ARROW_PREDICT_FALSEto minimise the overhead.Also in C++20, the
static_castabove can be changed withstd::cmp_greater_equal(index, values_.size())which avoids the edge case of casting an unsigned long (size_t) to a singed long (int64_t) but I see that Arrow supports a minimum of C++17 so I thinkstatic_castis the only option.Component(s)
C++