Skip to content

[C++] KeyValueMetadata gives seg fault while deleting a key #50311

Description

@OmBiradar

I was going through KeyValueMetadata and found some error handling which I feel can be made better.

Like for KeyValueMetadata

Status KeyValueMetadata::Delete(int64_t index) {
  keys_.erase(keys_.begin() + index);
  values_.erase(values_.begin() + index);
  return Status::OK();
}

Here if the index is out of bounds, i.e. index >= keys_.size() or index < 0, then this will throw a seg fault or may corrupt memory.

The script I used for testing this

#include <arrow/util/key_value_metadata.h>
#include <iostream>

using arrow::Status;
using std::cin;
using std::cout;
using std::endl;

Status reproduce_delete() {
  arrow::KeyValueMetadata meta;
  return meta.Delete(20);
}

void DeleteSegFault() {
  // The delete method on KeyValueMetadata
  Status st = reproduce_delete(); // this seg faults
  if (st.ok()) { // if no seg fault then this would be triggered, also memory
                 // could be currupted
    cout << "The bug exsits and returns a OK status even when the index to "
            "delete is out of range."
         << endl;
  }
}

int main() {
  DeleteSegFault();
}

Would a fix like the below would be good? If so I will open a PR for it.

I also used ARROW_PREDICT_FALSE to minimise the overhead.

Status KeyValueMetadata::Delete(int64_t index) {
  if (ARROW_PREDICT_FALSE(index < 0 || std::cmp_greater_equal(index, values_.size()))) {
    return arrow::Status::IndexError("Index out of bounds. Expected 0 to ", std::to_string(keys_.size() - 1), ", but got ", std::to_string(index));
  }
  keys_.erase(keys_.begin() + index);
  values_.erase(values_.begin() + index);
  return Status::OK();
}

Also in C++20, the static_cast above can be changed with std::cmp_greater_equal(index, values_.size()) which avoids the edge case of casting an unsigned long (size_t) to a singed long (int64_t) but I see that Arrow supports a minimum of C++17 so I think static_cast is the only option.

Component(s)

C++

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions