simd: split cursor advancing from value matching #156
This refactors all SIMD modules to make the value-matching logic self-contained. As a result, all byte-cursor manipulations are now grouped and performed once at the end, outside of the SIMD logic.
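To illustrate the split, here is a minimal Rust sketch, not the crate's actual code. `Cursor`, `match_uri_len`, `is_uri_byte` and the 16-byte `LANE` are hypothetical names, and the "vector" loop is plain scalar code over 16-byte chunks standing in for real SIMD intrinsics. The matcher only reports how many leading bytes match; the caller then advances the cursor exactly once with that count.

```rust
/// Hypothetical stand-in for the crate's byte cursor; the real type differs.
struct Cursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Cursor { buf, pos: 0 }
    }

    /// Bytes not yet consumed.
    fn remaining(&self) -> &'a [u8] {
        let buf: &'a [u8] = self.buf;
        &buf[self.pos..]
    }

    /// Move the cursor forward by `n` bytes; panics if that would overrun.
    fn advance(&mut self, n: usize) {
        assert!(self.pos + n <= self.buf.len());
        self.pos += n;
    }
}

/// Hypothetical lane width; 16 bytes mirrors one SSE register.
const LANE: usize = 16;

/// Hypothetical, simplified predicate: visible ASCII excluding space.
#[inline]
fn is_uri_byte(b: u8) -> bool {
    (0x21..=0x7e).contains(&b)
}

/// Pure matcher: returns how many leading bytes of `bytes` match.
/// It never touches a cursor, so it can be tested in isolation.
fn match_uri_len(bytes: &[u8]) -> usize {
    let mut matched = 0;
    // Full lanes first; a real SIMD body would compare all 16 bytes at once.
    for chunk in bytes.chunks_exact(LANE) {
        match chunk.iter().position(|&b| !is_uri_byte(b)) {
            Some(i) => return matched + i,
            None => matched += LANE,
        }
    }
    // Scalar tail for the final (< LANE) bytes.
    let tail = &bytes[matched..];
    matched + tail.iter().take_while(|&&b| is_uri_byte(b)).count()
}
```

Typical usage is `let n = match_uri_len(cursor.remaining()); cursor.advance(n);`. Keeping the matcher a pure `&[u8] -> usize` function means each SIMD variant only has to agree on a count, while all cursor bookkeeping lives in one place.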
@seanmonstar this is ready for a review pass whenever you have time. There is a minor cleanup bundled in this PR (marking several functions as …). I'll be honest: I started this rework as part of hyperium/hyper#3574 before actually going for hyperium/hyper#3575, focused on memory usage/allocation patterns.
seanmonstar left a comment:
Beautiful PR, and the speed boosts seem out of this world!
Thanks for merging this. Even though I recorded those perf numbers myself, I'm still somewhat puzzled and a bit skeptical about them. Overall, I think the new code is a useful refactor, but I personally wouldn't guarantee that the pictured performance changes hold in all environments.
This reverts commit b2625f3.
This has major implications for default runtime performance, because it improves how the code is lowered and inlined (falling back to SSE4.2 for a handful of bytes was wasteful). Should supersede seanmonstar#175 and seanmonstar#156.
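A rough sketch of why a vector-path fallback for a handful of bytes is wasteful, assuming inputs shorter than one lane (and tails after the wide loop) are routed to a plain scalar loop; the commit's exact strategy may differ. `match_header_value_len`, `match_wide`, `match_scalar`, `is_header_value_byte` and the 32-byte `LANE` (one AVX2 register) are hypothetical, and `match_wide` is scalar code standing in for AVX2 intrinsics.

```rust
/// Hypothetical lane width: 32 bytes, i.e. one AVX2 register.
const LANE: usize = 32;

/// Hypothetical, simplified predicate: HTAB or visible ASCII including space.
#[inline]
fn is_header_value_byte(b: u8) -> bool {
    b == b'\t' || (0x20..=0x7e).contains(&b)
}

/// Scalar path: no setup cost, cheapest for a handful of bytes.
fn match_scalar(bytes: &[u8]) -> usize {
    bytes.iter().take_while(|&&b| is_header_value_byte(b)).count()
}

/// Stand-in for a vectorized path: walks whole lanes, then finishes the
/// short tail with the scalar loop instead of a second vector path.
fn match_wide(bytes: &[u8]) -> usize {
    let mut matched = 0;
    for chunk in bytes.chunks_exact(LANE) {
        let n = match_scalar(chunk);
        matched += n;
        if n < LANE {
            return matched; // mismatch found inside this lane
        }
    }
    matched + match_scalar(&bytes[matched..])
}

/// Dispatcher: only take the wide path when at least one full lane exists.
fn match_header_value_len(bytes: &[u8]) -> usize {
    if bytes.len() < LANE {
        match_scalar(bytes)
    } else {
        match_wide(bytes)
    }
}
```

The design point is that both paths return the same count for any input, so the choice between them is purely a cost decision based on input length.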
The performance impact on my AVX2-capable Intel workstation appears positive (benchmark noise filtered with an arbitrary >20% threshold):