Skip to content

SMT2: encode mathematical and string/regex types and operations#9074

Open
tautschnig wants to merge 3 commits into
diffblue:developfrom
tautschnig:strata/smt2-mathematical-and-string-types
Open

SMT2: encode mathematical and string/regex types and operations#9074
tautschnig wants to merge 3 commits into
diffblue:developfrom
tautschnig:strata/smt2-mathematical-and-string-types

Conversation

@tautschnig

@tautschnig tautschnig commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Teach the (non-incremental) SMT2 back-end (smt2_convt) to encode SMT-LIB-native mathematical, string and regex types and their operators.

Motivation

Rather than threading a back-end-specific representation through the GOTO pipeline, front-ends (e.g. Strata) emit the canonical CPROVER string/regex built-in functions that the SAT string solver already understands. This PR additionally lowers those built-ins natively to the SMT-LIB theory of strings, so string/regex problems can be solved by the SMT2 back-end without the SAT string-refinement loop, and without any Strata-specific operators leaking into the IR.

What it does

Mathematical types/operators (commit 1)

  • Map integer/real (and mathematical-function) types to SMT-LIB Int/Real in type2id.
  • Encode mathematical integer mod/div with SMT-LIB Int semantics.
  • Allow arrays of unknown/non-constant size (e.g. indexed by a mathematical integer): drop the array-size invariant in convert_type so the SMT-LIB array sort is still emitted, and make flatten_array reject bit-vector flattening of such arrays with a clear unsupported-operation error rather than an assertion failure.

String/regex types and operators (commit 2)

  • Encode the String/RegLan types and string constants (only " is escaped, as "", per SMT-LIB 2.6).
  • Lower the CPROVER string/regex built-ins (cprover_string_*_func, cprover_regex_*_func) in convert_expr to the corresponding SMT-LIB operators: str.++, str.len, str.at, str.substr, str.contains, str.prefixof/str.suffixof (covering is_prefix/is_suffix and startswith/endswith), str.indexof, str.replace, =, is_empty(= s ""), str.to_re/str.in_re, and the regex family re.range/re.++/re.*/re.+/re.opt/re.loop/re.union/re.inter/re.comp/re.diff/re.all/re.allchar/re.none.
  • Operand contract: operands are SMT-LIB-native (String/RegLan/Int) and in SMT-LIB operand order; the per-operator contract is documented in convert_expr. Built-ins without an exact SMT-LIB counterpart (compare_to, of_int, parse_int, format, case folding, trim, …) are intentionally not lowered.
  • Soundness guard: a built-in carrying the refined-string (length, char-array / pointer) representation is rejected with unsupported_operation_exceptiont rather than silently mis-lowered — that representation is handled by the SAT string solver (--refine-strings).

Back-end selection (commit 3)

  • --refine-strings (a SAT refinement procedure) together with an SMT2 back-end is now a usage error instead of --refine-strings silently winning. With an SMT2 back-end and no --refine-strings, the native lowering above applies.

Scope / limitations

  • The incremental SMT back-end (src/solvers/smt2_incremental) uses a separate converter and does not support these built-ins; such inputs are reported via UNIMPLEMENTED_FEATURE (a note documents this).
  • Verification is via smt2_convt unit tests (lowering of each operator, string-literal escaping, and the soundness guard). End-to-end regression tests require a front-end that emits these native built-ins (the Strata/Python side) and are expected to land there.
  • This supersedes the SMT2-side of Add string type support to SMT back-end #8838 (now closed).
  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (unit/solvers/smt2/smt2_conv.cpp; end-to-end tests to follow via the emitting front-end).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Jun 18, 2026
Copilot AI review requested due to automatic review settings June 18, 2026 21:49

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Extends the SMT2 converter to support Strata mathematical Int/Real types plus SMT-LIB string/regex types, constants, and common Str.* / Re.* operations.

Changes:

  • Encode integer, real, string, and regex sorts in type2id / convert_type
  • Map selected Str.* and Re.* function applications to SMT-LIB string/regex operators
  • Adjust integer div/mod encoding and add SMT-LIB string constant emission

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/solvers/smt2/smt2_conv.cpp Outdated
Comment thread src/solvers/smt2/smt2_conv.cpp
Comment thread src/solvers/smt2/smt2_conv.cpp Outdated
Comment thread src/solvers/smt2/smt2_conv.cpp Outdated
Comment thread src/solvers/smt2/smt2_conv.cpp Outdated
fn_name = id2string(
to_symbol_expr(function_application_expr.function()).get_identifier());

// Map Strata function names to SMT-LIB names for string/regex ops

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should introduce expressions for these at some point.

@tautschnig tautschnig force-pushed the strata/smt2-mathematical-and-string-types branch from 8d36980 to 4fa51d7 Compare June 25, 2026 18:40
@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.75000% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.71%. Comparing base (02c2769) to head (208e0dd).

Files with missing lines Patch % Lines
src/solvers/smt2/smt2_conv.cpp 83.87% 20 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9074      +/-   ##
===========================================
+ Coverage    80.69%   80.71%   +0.02%     
===========================================
  Files         1714     1714              
  Lines       189597   189908     +311     
  Branches        73       73              
===========================================
+ Hits        152987   153289     +302     
- Misses       36610    36619       +9     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

tautschnig and others added 2 commits June 25, 2026 21:52
Map the mathematical integer/real types (and mathematical-function
types) to their SMT-LIB names in type2id, and encode the mathematical
integer mod and div operators with their SMT-LIB Int semantics (div/mod
rather than real division).

Also relax array handling for arrays of unknown/non-constant size (e.g.
those indexed by a mathematical integer): drop the array-size invariant
in convert_type so the SMT-LIB array sort can still be emitted, and make
flatten_array reject bit-vector flattening of such arrays with a clear
unsupported-operation error instead of an assertion failure.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Encode the SMT-LIB theory-of-strings String and RegLan types and string
constants, and lower the CPROVER string/regex built-in functions
(cprover_string_*_func, cprover_regex_*_func) to the corresponding
SMT-LIB string/regex operators in convert_expr.

Rather than threading a back-end-specific representation through the
GOTO pipeline, front-ends (e.g. Strata) emit the canonical CPROVER
string built-ins, which the SAT string solver already understands; here
we additionally lower them natively to the SMT-LIB theory of strings.

Operand contract: arguments are SMT-LIB-native (String/RegLan and scalar
Int) and in SMT-LIB operand order, so most built-ins lower to a flat
application. Specifically:
  concat -> str.++            length -> str.len
  char_at -> str.at           substring -> str.substr
  contains -> str.contains    is_prefix/is_suffix -> str.prefixof/suffixof
  replace -> str.replace      equal -> =
  index_of -> str.indexof (start offset defaults to 0)
  startswith/endswith -> str.prefixof/suffixof (affix reordered first)
  is_empty -> (= s "")
  to_regex/in_regex -> str.to_re/str.in_re
  regex range/concat/star/plus/opt/loop/union/inter/comp/diff/all/
  allchar/none -> the corresponding re.* operators
Built-ins without an exact SMT-LIB counterpart (compare_to, of_int,
parse_int, format, case folding, trim, ...) are intentionally not
lowered and fall through to the generic handling.

A soundness guard rejects (with unsupported_operation_exceptiont) any
such built-in carrying refined-string (array/pointer) operands: that
representation is handled by the SAT string solver (--refine-strings)
and must not be silently mis-lowered here. The incremental SMT back-end
(src/solvers/smt2_incremental) does not support these built-ins and
reports them via UNIMPLEMENTED_FEATURE; a note documents this.

Unit tests in unit/solvers/smt2/smt2_conv.cpp cover the lowering of each
operator, the string-literal escaping, and the soundness guard.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the strata/smt2-mathematical-and-string-types branch from 4fa51d7 to 1085c00 Compare June 25, 2026 21:53
The refined-string solver is a SAT (bit-vector refinement) procedure and
cannot run on an SMT2 back-end, which instead lowers string operations
to the SMT-LIB theory of strings directly. Previously --refine-strings
silently took precedence over --smt2/--incremental-smt2-solver; reject
the contradictory combination with a usage error instead.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the strata/smt2-mathematical-and-string-types branch from 1085c00 to 208e0dd Compare June 26, 2026 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants