-
Notifications
You must be signed in to change notification settings - Fork 342
Return SubSequence from SQLCommenter.getFirstWord to avoid per-inject substring #11736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
gh-worker-dd-mergequeue-cf854d
merged 5 commits into
master
from
dougqh/sqlcommenter-getfirstword-subseq
Jun 30, 2026
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
0ed82fb
Add SubSequence.startsWith / equalsIgnoreCase (CharSequence-friendly …
dougqh ea771f3
Return SubSequence from SQLCommenter.getFirstWord to avoid per-inject…
dougqh e9f9aea
Add SQLCommenterGetFirstWordBenchmark (getFirstWord alloc: 48 -> 0 B/op)
dougqh 2f648c7
Refresh SQLCommenterGetFirstWordBenchmark results to JDK 17 @Fork(5)
dougqh e66f238
Merge branch 'master' into dougqh/sqlcommenter-getfirstword-subseq
dougqh File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
82 changes: 82 additions & 0 deletions
82
...bc/src/jmh/java/datadog/trace/instrumentation/jdbc/SQLCommenterGetFirstWordBenchmark.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| package datadog.trace.instrumentation.jdbc; | ||
|
|
||
| import org.openjdk.jmh.annotations.Benchmark; | ||
| import org.openjdk.jmh.annotations.Fork; | ||
| import org.openjdk.jmh.annotations.Measurement; | ||
| import org.openjdk.jmh.annotations.Scope; | ||
| import org.openjdk.jmh.annotations.State; | ||
| import org.openjdk.jmh.annotations.Threads; | ||
| import org.openjdk.jmh.annotations.Warmup; | ||
|
|
||
| /** | ||
| * Benchmark for {@link SQLCommenter#getFirstWord(String)} -- the per-{@code inject} first-word | ||
| * scan. | ||
| * | ||
| * <p><b>What we're measuring.</b> {@code getFirstWord} used to return {@code sql.substring(b, e)} -- | ||
| * a fresh {@code String} (with its own backing array) allocated on every {@code inject} call, just | ||
| * to {@code startsWith}/{@code equalsIgnoreCase} it. It now returns a zero-copy {@code SubSequence} | ||
| * view. The question is empirical: does escape analysis elide the view in the transient consumption | ||
| * (→ 0 B/op), while the old {@code substring} always allocated? | ||
| * | ||
| * <p><b>Honest EA measurement.</b> The view is consumed exactly as {@code inject} consumes it -- a | ||
| * boolean decision ({@code startsWith("{")}) -- and the benchmark returns that boolean. It does NOT | ||
| * return/Blackhole the view itself, which would force it to escape and fake away the very EA win | ||
| * under test. The chained {@code getFirstWord(sql).startsWith("{")} (no typed local) also lets one | ||
| * source compile both before (String.startsWith) and after (SubSequence.startsWith), so before/after | ||
| * is a clean toggle of the production method. | ||
| * | ||
| * <p>Run at {@code @Threads(8)} so the allocation delta surfaces as throughput; {@code -prof gc} | ||
| * (gc.alloc.rate.norm) is the headline mechanism and is fork-stable. | ||
| * | ||
| * <pre> | ||
| * ./gradlew :dd-java-agent:instrumentation:jdbc:jmh # add -prof gc | ||
| * </pre> | ||
| * | ||
| * <p><b>Results</b> (JDK 17, MacBook M-series, {@code @Threads(8)}, {@code @Fork(5)}, {@code -prof | ||
| * gc}): | ||
| * | ||
| * <pre> | ||
| * throughput gc.alloc.rate.norm | ||
| * before (substring) 258.1M ± 21.0M ops/s 48 B/op | ||
| * after (SubSequence) 508.0M ± 21.6M ops/s ~0 B/op (10^-4) | ||
| * </pre> | ||
| * | ||
| * Escape analysis fully elides the view in the transient consumption (it never escapes the | ||
| * decision), so the per-call 48 B/op of the old {@code substring} (a String + its backing array) | ||
| * drops to ~0 and throughput rises ~2x at {@code @Threads(8)} — the allocation win surfacing as | ||
| * throughput. At {@code @Fork(5)} the error tightens (the earlier {@code @Fork(2)} spread was | ||
| * bimodal JIT, not signal); the allocation delta is exact and the throughput gap clears it. | ||
| */ | ||
| @Fork(5) | ||
| @Warmup(iterations = 2) | ||
| @Measurement(iterations = 5) | ||
| @Threads(8) | ||
| public class SQLCommenterGetFirstWordBenchmark { | ||
|
|
||
| // Representative first-word shapes: plain keywords, a stored-proc brace, a CALL, leading space. | ||
| static final String[] SQL = { | ||
| "SELECT * FROM foo WHERE id = 42", | ||
| "{call dogshelterProc(?, ?)}", | ||
| "CALL dogshelterProc(?, ?)", | ||
| "UPDATE accounts SET balance = balance - 100 WHERE id = 42", | ||
| " INSERT INTO logs VALUES (?)", | ||
| }; | ||
|
|
||
| /** Per-thread cursor so threads don't contend on a shared index under {@code @Threads(8)}. */ | ||
| @State(Scope.Thread) | ||
| public static class Cursor { | ||
| int index = 0; | ||
|
|
||
| String next() { | ||
| int i = index; | ||
| index = (i + 1) % SQL.length; | ||
| return SQL[i]; | ||
| } | ||
| } | ||
|
|
||
| @Benchmark | ||
| public boolean firstWordCheck(Cursor cursor) { | ||
| // Mirrors inject(): take the first word, make a boolean decision, discard it. | ||
| return SQLCommenter.getFirstWord(cursor.next()).startsWith("{"); | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for the generic
SubSequencecase to have extra helper methods such asstartsWithandequalsIgnoreCase. Looking back at the concretegetFirstWorduse cases, though, it could be much simpler: I would eliminategetFirstWordcompletely, since it does not seem to add much value here.It is only used to skip whitespace and check whether the next token is either
{or a case-insensitivecallfollowed by whitespace. That could be checked directly in a small amount of code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. I've been erring on the side of keeping the changes 1-to-1 and very mechanical, but I do agree that further simplification might make sense here.