Narrow preg_match/preg_match_all subject string type when match is truthy #5777

Merged
staabm merged 9 commits into phpstan:2.2.x from phpstan-bot:create-pull-request/patch-3uk0oyi
May 29, 2026
@phpstan-bot
Collaborator

Summary

When preg_match() or preg_match_all() is used as a condition (in if, ternary, &&, etc.) and returns truthy, the subject string parameter was not being narrowed based on the regex pattern. The PregMatchTypeSpecifyingExtension only narrowed the $matches array (arg 2) but never the subject string (arg 1). This fix adds subject narrowing using the regex pattern analysis that already exists in RegexArrayShapeMatcher.
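
A minimal sketch of the reported situation in plain PHP. The function name `describeId` and the pattern are hypothetical, and the comment describes what this PR says PHPStan should infer, not verified analyser output.

```php
<?php

/**
 * Hypothetical helper: returns the subject when it matches, else a fallback.
 */
function describeId(string $id): string
{
    // The pattern requires at least one character. Per the summary above,
    // $id in the truthy branch of the ternary should therefore be narrowed
    // from string to a non-empty string type instead of staying string.
    return preg_match('/^[a-z]+$/', $id) ? $id : 'unknown';
}
```

The same narrowing is expected when the call is the condition of an `if` statement, with or without a `$matches` argument.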

Changes

  • Added matchSubjectExpr() method to src/Type/Php/RegexArrayShapeMatcher.php that extracts the subject base type from a regex pattern (non-empty-string or non-falsy-string) using the existing RegexGroupParser infrastructure
  • Modified src/Type/Php/PregMatchTypeSpecifyingExtension.php to:
    • Narrow the subject parameter (arg 1) in truthy context using the regex-derived subject type
    • Work even when there's no $matches argument (previously returned empty SpecifiedTypes immediately)
    • Guard against narrowing non-string subjects (e.g. int|string) by checking isString()->yes()
    • Skip subject narrowing when the subject expression shares a root variable with the matches argument to avoid conflicting type specifications (e.g. preg_match($pat, $matches[2], $matches))
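
The last guard is easier to see at runtime. The sketch below uses made-up array contents to show that once the call returns, `$matches[2]` no longer refers to the subject that was matched, so a narrowed subject type would describe the wrong value.

```php
<?php

// $matches is both the source of the subject and the output array.
$matches = ['ignored', 'ignored', 'foo bar'];
$secondGroup = null;

if (preg_match('/(\w+) (\w+)/', $matches[2], $matches)) {
    // preg_match() has overwritten $matches, so $matches[2] is now the
    // second capture group ("bar"), not the original subject ("foo bar").
    $secondGroup = $matches[2];
}
```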

Analogous cases probed

  • preg_match_all: tested and working (uses the same extension)
  • Negated !preg_match: works correctly (falsey branch keeps string, truthy branch narrows)
  • Comparison preg_match() === 1: works correctly
  • && conditions: works correctly (both subjects narrowed)
  • Short ternary ?:: falsey branch correctly stays string
  • Non-constant patterns: correctly returns string (no narrowing when pattern can't be analyzed)
  • Non-string subjects (int|string): correctly skipped (no narrowing applied)
  • Subject sharing variable with matches ($matches[2] as subject, $matches as output): correctly skipped to avoid conflicts
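
A few of these shapes, sketched as plain PHP. The function names and patterns are hypothetical, and the comments restate the behaviour listed above rather than verified analyser output.

```php
<?php

/** Negated condition: the early return is the falsy branch, where $s stays string. */
function digitsOrNull(string $s): ?string
{
    if (!preg_match('/^\d+$/', $s)) {
        return null;
    }

    // Truthy branch: $s is expected to be narrowed to a non-empty string type.
    return $s;
}

/** A strict comparison against 1 is handled like a truthy check. */
function isHex(string $s): bool
{
    return preg_match('/^[0-9a-f]+$/i', $s) === 1;
}

/** Non-constant pattern: it cannot be analysed, so $s stays string. */
function matchesAny(string $pattern, string $s): bool
{
    return preg_match($pattern, $s) === 1;
}
```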

Root cause

PregMatchTypeSpecifyingExtension::specifyTypes() returned empty SpecifiedTypes when no $matches argument was present. Even when $matches was present, it narrowed only the matches array variable, never the subject string. The regex pattern analysis infrastructure (RegexGroupParser / RegexAstWalkResult::getSubjectBaseType()) already computed the subject type, but it was used only for the type of $matches[0], not for narrowing the subject variable itself.

Test

  • Added tests/PHPStan/Analyser/nsrt/bug-14710.php with regression tests covering:
    • Ternary with preg_match (the reported issue)
    • if with preg_match (same underlying bug)
    • With and without $matches argument
    • preg_match_all
    • Negated condition
    • === 1 comparison
    • Non-narrowing patterns (e.g. /^$/ that can match empty strings)
    • Non-constant patterns (no narrowing possible)
    • Non-string subjects
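
The "non-narrowing patterns" case exists because some patterns accept the empty string, so a truthy result proves nothing about the subject. A small runtime illustration, with patterns chosen here as examples:

```php
<?php

// '/^$/' matches the empty string, so a truthy result says nothing about
// the subject being non-empty.
$emptyMatches = preg_match('/^$/', '');

// '/^.+$/' needs at least one character, so '' can never match it.
$emptyRejected = preg_match('/^.+$/', '');
```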

Fixes phpstan/phpstan#14710

Contributor

@staabm staabm left a comment

do we need a similar fix for other type-specifying extensions?
can this fix be made in MutatingScope/TypeSpecifier/NodeScopeResolver instead, so it works for more expressions?

@phpstan-bot
Collaborator Author

Done. Here's my response to the review, with the investigation behind it.

Addressing @staabm's review

do we need a similar fix for other type-specifying extensions?

No new extension is needed — the codebase already applies this exact per-function pattern:

  • StrContainingTypeSpecifyingExtension narrows the haystack of str_contains/str_starts_with/strpos/etc. to non-empty-string/non-falsy-string when the call is truthy.
  • CtypeDigitFunctionTypeSpecifyingExtension narrows its argument to numeric-string.

Each lives in its own extension because the narrowed type is derived from that function's specific semantics. preg_match/preg_match_all was simply the missing case, and this PR fills it analogously (its narrowed type comes from regex pattern analysis). I checked the other string predicates and there's no remaining gap.

can this fix be made in MutatingScope/TypeSpecifier/NodeScopeResolver instead, so it works for more expressions?

It can't, and shouldn't:

  • The narrowing depends on regex pattern analysis (RegexGroupParser / RegexAstWalkResult::getSubjectBaseType()). Deciding "what does a function's truthy return imply about its arguments" is precisely the role of FunctionTypeSpecifyingExtension; the core scope machinery has no concept of it.
  • It already works for arbitrary subject expressions (property fetches, array dims, not just variables), because it narrows via typeSpecifier->create($subjectArg->value, …) on the expression node.

What I did change

To genuinely honor the "works for more expressions" intent, I made the subject narrowing smarter for nullable subjects — something preg_match can do that str_contains can't, because it knows the pattern at compile time:

  • A null subject is coerced to '' by preg_*, which cannot match a pattern whose derived subject type is non-empty. So ?string now narrows to non-falsy-string in the truthy branch (previously left as string|null). Non-string scalars like int are still left untouched, since an int can be coerced to a matching string — narrowing those away would be unsound.
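
The coercion argument can be checked at runtime. In this sketch the helper name `matchesDigits` and the pattern are hypothetical. The explicit `(string)` cast models the coercion preg_* applies in non-strict mode, and it avoids the deprecation notice PHP 8.1+ emits when null is passed to a string parameter.

```php
<?php

/**
 * Hypothetical helper: models how a non-string subject is coerced to a
 * string before matching.
 */
function matchesDigits(int|string|null $subject): bool
{
    // null becomes '', which a pattern requiring one or more digits
    // cannot match; an int such as 42 becomes '42', which it can.
    return preg_match('/^\d+$/', (string) $subject) === 1;
}

$nullMatches = matchesDigits(null);
$intMatches = matchesDigits(42);
```

Because null can never reach the truthy branch for such a pattern, removing it from a nullable string subject is sound. An int can reach it, which is why int subjects are left alone.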

I also added the regression coverage the PR claimed but was missing:

  • subject sharing its root variable with the $matches arg (the conflict-avoidance guard)
  • the int|string non-narrowing case

make tests (12539 tests) and make phpstan both pass. Pushed as 70f7cbbb6.

staabm and others added 5 commits May 29, 2026 13:41
…s truthy

- Add `matchSubjectExpr()` to `RegexArrayShapeMatcher` that extracts the
  subject base type (non-empty-string or non-falsy-string) from a regex
  pattern without requiring a $matches argument
- Modify `PregMatchTypeSpecifyingExtension` to narrow the subject parameter
  (arg 1) to the regex-derived subject type in truthy context
- Guard against narrowing non-string subjects (e.g. int|string) by checking
  `isString()->yes()` on the subject type
- Skip subject narrowing when the subject expression shares a root variable
  with the matches argument (e.g. `preg_match($pat, $matches[2], $matches)`)
  to avoid conflicting type specifications
A null subject is coerced to '' by preg_*, which cannot match a pattern
whose derived subject type is non-empty, so null can be soundly removed
from a nullable string subject. Non-string scalars (e.g. int) may be
coerced to a matching string and are still left untouched.

Also adds regression coverage for the subject-shares-variable-with-matches
guard and the int|string non-narrowing case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ables

Make the rationale behind the `instanceof Expr\Variable` guard explicit:
narrowing only plain variables covers the vast majority of real-world code
and avoids breaking exotic subjects like `preg_match($p, $matches[2], $matches)`
(Rules bug-9503), where the subject is an offset of the array receiving matches.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@staabm staabm force-pushed the create-pull-request/patch-3uk0oyi branch from 3f54a9c to 805983f Compare May 29, 2026 11:41
@staabm staabm requested a review from VincentLanglet May 29, 2026 12:29
@staabm staabm merged commit 04d314d into phpstan:2.2.x May 29, 2026
655 of 668 checks passed
@staabm staabm deleted the create-pull-request/patch-3uk0oyi branch May 29, 2026 12:57
Development

Successfully merging this pull request may close these issues.

narrow types when using ternary on preg_match
