Skip to content

boolbv_index: handle incomplete extern array types in symbol registration#9055

Open
tautschnig wants to merge 2 commits into
diffblue:developfrom
tautschnig:extract/boolbv-index-incomplete-extern-array
Open

boolbv_index: handle incomplete extern array types in symbol registration#9055
tautschnig wants to merge 2 commits into
diffblue:developfrom
tautschnig:extract/boolbv-index-incomplete-extern-array

Conversation

@tautschnig

@tautschnig tautschnig commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

When boolbv flattens an array index expression where the array is a symbol of unbounded array_typet (e.g. an incomplete declaration like extern T arr[]), it registered that symbol with boolbv_mapt::get_literals using a width of array_width_opt.value_or(0). Passing 0 created a zero-width entry that tripped the size-equals-width invariant in get_literals when the same symbol was — or had been — registered at a non-zero width via its element-typed access path (e.g. T arr[i] returns a T-width value).

This was first surfaced by integration/linux scans of kernel sources that include <linux/ctype.h>, which declares extern const unsigned char _ctype[] — an incomplete extern array referenced by the is*() classifier macros that expand to _ctype[c]. CBMC aborted at boolbv_map.cpp:68 on every such TU.

The fix spans both back-ends, hence two commits:

  • boolbv_index — Skip the registration only when the symbol is already registered at a different width. The width-0 registration itself must be preserved: it is what lets unbounded-array counterexample traces and string-refinement arrays display their element values (e.g. regression/cbmc/trace-values/unbounded_array); only the conflicting re-registration that trips the invariant is dropped. The shared guard is factored into boolbvt::register_array_symbol so the two index branches cannot drift apart.

  • smt2_incremental — The incremental SMT2 back-end needs the same symbol handled once. It keyed its set of already-declared functions on the full expression irep, so the same SSA symbol reached via two sort-equivalent array-type ireps (e.g. _ctype[c] and _ctype[c + 1]) was emitted as two declare-funs, which z3 rejects (constant '_ctype#1' ... already declared). send_function_definition now deduplicates by SSA identifier, with a defensive invariant that the later expression's type is sort-equivalent to the existing declaration.

Regression test: regression/cbmc/incomplete_extern_array1/ indexes such an array at two distinct indices (forcing the symbol to be reached twice) under --bounds-check, and asserts that re-reading the same index yields the same value — so it guards against unsound flattening as well as the original invariant abort. As a CORE test it is additionally run by the cbmc-new-smt-backend CTest profile (--incremental-smt2-solver) when z3 is on PATH, which exercises the second commit.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Jun 18, 2026
Copilot AI review requested due to automatic review settings June 18, 2026 15:51

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes a boolbv symbol-registration bug when flattening index expressions over incomplete extern T arr[] array symbols by avoiding zero-width literal-map entries, and adds a regression test reproducing the kernel _ctype[] case.

Changes:

  • Skip boolbv_mapt::get_literals registration when array width is unknown during index conversion.
  • Duplicate the same guard for both the byte-operator and non-byte-operator index paths.
  • Add regression regression/cbmc/incomplete_extern_array1/ to prevent the invariant failure from recurring.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/solvers/flattening/boolbv_index.cpp Avoids registering array symbols in the literal map when width is unknown, preventing zero-width invariant failures.
regression/cbmc/incomplete_extern_array1/test.desc Adds a regression harness asserting success and absence of prior failure strings.
regression/cbmc/incomplete_extern_array1/main.c Reproduces indexing into an incomplete extern array at multiple indices.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/solvers/flattening/boolbv_index.cpp Outdated
Comment thread src/solvers/flattening/boolbv_index.cpp Outdated
Comment thread src/solvers/flattening/boolbv_index.cpp Outdated
Comment thread src/solvers/flattening/boolbv_index.cpp Outdated
Comment thread regression/cbmc/incomplete_extern_array1/test.desc Outdated
@tautschnig tautschnig force-pushed the extract/boolbv-index-incomplete-extern-array branch 2 times, most recently from 2e277ce to 232ff03 Compare June 19, 2026 08:55
@tautschnig tautschnig requested a review from TGWDB as a code owner June 19, 2026 08:55
@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.69%. Comparing base (19dcaca) to head (bfa0251).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9055      +/-   ##
===========================================
+ Coverage    80.68%   80.69%   +0.01%     
===========================================
  Files         1714     1714              
  Lines       189519   189526       +7     
  Branches        73       73              
===========================================
+ Hits        152916   152946      +30     
+ Misses       36603    36580      -23     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tautschnig tautschnig force-pushed the extract/boolbv-index-incomplete-extern-array branch from 232ff03 to dd4fe95 Compare June 22, 2026 12:23
@tautschnig tautschnig assigned kroening and unassigned tautschnig Jun 22, 2026
tautschnig and others added 2 commits June 22, 2026 19:03
…tion

When boolbv flattens an array index expression where the array is a
symbol of unbounded array_typet (e.g. an incomplete declaration like
'extern T arr[]'), it registered that symbol with
boolbv_mapt::get_literals using a width of array_width_opt.value_or(0).
Passing 0 created a zero-width entry that tripped the size-equals-width
invariant in get_literals when the same symbol was — or had been —
registered at a non-zero width via its element-typed access path (e.g. T
arr[i] returns a T-width value).

Skip the registration only when the symbol is already registered at a
different width.  The width-0 registration itself must be preserved: it
is what lets unbounded-array counterexample traces and string-refinement
arrays display their element values (e.g.
regression/cbmc/trace-values/unbounded_array); only the conflicting
re-registration that trips the invariant is dropped.  The shared logic is
factored into boolbvt::register_array_symbol so the two index branches
cannot drift apart.

This was first surfaced by integration/linux scans of kernel sources
that include <linux/ctype.h>, which declares 'extern const unsigned char
_ctype[]' — an incomplete extern array referenced by the is*()
classifier macros that expand to '_ctype[c]'.  CBMC aborted at
boolbv_map.cpp:68 on every such TU.

Regression test: regression/cbmc/incomplete_extern_array1/ indexes such an
array at two distinct indices (forcing the symbol to be reached twice)
under --bounds-check, and asserts that re-reading the same index yields
the same value, so the test guards against unsound flattening as well as
the invariant abort.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The incremental SMT2 back-end keyed its set of already-declared functions
on the full expression (irept).  An incomplete `extern T arr[]` symbol can
be reached via two expressions that share the same SSA identifier but
differ only in their (sort-equivalent) array-type irep -- e.g. _ctype[c]
and _ctype[c + 1] from <linux/ctype.h>.  Those are distinct expression
keys, so both reached send_function_definition and emitted a second
(declare-fun |_ctype#1| () (Array ...)), which z3 rejects with
"constant '_ctype#1' (with the given signature) already declared".

Deduplicate by SSA identifier in send_function_definition: if the symbol
is already in identifier_table, map the later expression to the existing
declaration instead of re-declaring it.  A defensive invariant checks that
the later expression's type is sort-equivalent to the existing one, so a
violated assumption fails loudly here rather than as an opaque downstream
solver error.

With this, regression/cbmc/incomplete_extern_array1 also passes under the
incremental SMT2 back-end (the cbmc-new-smt-backend CTest profile), so its
no-new-smt tag is removed.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the extract/boolbv-index-incomplete-extern-array branch from dd4fe95 to bfa0251 Compare June 22, 2026 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants