Report profiling data in v2.4 intake format; compress files#53
Conversation
This was tested with both the Ruby (agent and agentless modes) and the PHP profilers. This also introduces a breaking API change: the `ddog_ProfileExporter_build` / `ProfileExporter::build` functions now take two additional arguments -- the profiling library name and version. Other than that change, using the v2.4 intake format is transparent to the libdatadog users. Thanks to @morrisonlevi for pairing with me on this. Note that this does not (yet) include support for including attributes in the reporting data. I'll leave that for a separate PR.
This isn't a thing in v2.4
c2ac732 to
61ccef9
Compare
|
@morrisonlevi I was able to test your lz4 changes successfully with the Ruby profiler as well. I had, of course, to disable my own compression -- funnily enough the intake would still accept profiles that were gzipped by Ruby + lz4'd by libdatadog but they never showed up on the profile list ;) |
| ) | ||
| let mut encoder = FrameEncoder::new(Vec::new()); | ||
| encoder.write_all(file.bytes)?; | ||
| form.add_reader_file(file.name, Cursor::new(encoder.finish()?), file.name) |
There was a problem hiding this comment.
How is the compression format defined in the payload ?
There was a problem hiding this comment.
AFAIK the intake just looks for magic bits to distinguish if an uploaded file is [gzip|lz4] compressed or not. Otherwise no changes needed.
There was a problem hiding this comment.
Ivo is right, according to Florian:
Levi: How does it know what compression format to use? Inspect magic bytes?
Florian: yes
Florian: I was looking at java profiler and it uses LZ4 by default
Florian: Jaroslav experiments has shown that it's a good space/CPU tradeoff for the profiler
It's also is why I picked lz4. I didn't test or verify these claims beyond "works for me."
There was a problem hiding this comment.
Thanks. There are indeed magic values at the start of the payload that you can use.
As a Datadog agnostic library, this is not super friendly.
There was a problem hiding this comment.
As a Datadog agnostic library, this is not super friendly.
Do you mean, if other teams at Datadog want to reuse this, or if non-datadog people want to reuse this? I'd like to understand your concerns :)
|
I'm clarifying with Florian whether it's important that Or if this is fine: |
morrisonlevi
left a comment
There was a problem hiding this comment.
I'm still waiting on Florian for a question about name vs filename. Aside from this, with fresh I eyes I think we should set profiling library name and version at the same place we set family, which is when we build the exporter, not the request, as this information is unlikely to change.
I also don't know how many people are re-using an exporter vs just making a new one so... it may be a moot point.
| } | ||
|
|
||
| /// Build a Request object representing the profile information provided. | ||
| #[allow(clippy::too_many_arguments)] |
There was a problem hiding this comment.
IMO this is fine for now, but we should probably work on breaking this up when we add support for the extra attributes feature.
| ) | ||
| let mut encoder = FrameEncoder::new(Vec::new()); | ||
| encoder.write_all(file.bytes)?; | ||
| form.add_reader_file(file.name, Cursor::new(encoder.finish()?), file.name) |
There was a problem hiding this comment.
Ivo is right, according to Florian:
Levi: How does it know what compression format to use? Inspect magic bytes?
Florian: yes
Florian: I was looking at java profiler and it uses LZ4 by default
Florian: Jaroslav experiments has shown that it's a good space/CPU tradeoff for the profiler
It's also is why I picked lz4. I didn't test or verify these claims beyond "works for me."
Right, that makes sense. I'll probably not have time today, but I've made and note and will come back soon-ish™️ |
I confirmed that intake doesn't care about the form field name for files other than |
morrisonlevi
left a comment
There was a problem hiding this comment.
This looks good to me, but then again I wrote or paired for all of it, so that's expected ^_^. I'll let someone else review it.
| ) -> anyhow::Result<ProfileExporter> { | ||
| ) -> anyhow::Result<ProfileExporter> | ||
| where | ||
| F: Into<Cow<'static, str>>, |
There was a problem hiding this comment.
why do we need to alias to different types ?
Also the Cow 'static lifetime seems strange to me.
There was a problem hiding this comment.
Instead of thinking Cow<str> as a string that gets cloned on write, think of it instead as either a Borrowed string, or an Owned one. In this case, we can borrow a 'static string because that will, by definition, live long enough. Doing so allows us to save some memory allocations if we can show the lifetime is static (such as PHP profiler calling this from Rust). However, if we can't prove it, then we need an Owned version that copies it; this is what will happen across the C FFI.
The reason for taking an Into<Cow<_>> is so that you can pass a &str or a String or anything else that the compiler knows how to convert via into or from, making it nicer for the caller since they don't have to wrap it into a Cow<_>. The reason for having it repeated 3 times is so each parameter can independently do this -- if we had only one type IntoCow: Into<Cow<'static, str>> that they all used, then they'd all have to be the same type, which isn't so nice. For instance, maybe the library name is a &str but the version is a String.
r1viollet
left a comment
There was a problem hiding this comment.
LGTM !
Happy to integrate this in native !
* Fix compilation of main branch Invert order of debug and pin_project macros to fix compilation issue on rust 1.56 * Bump version to 0.6.0-rc.1 * Fix license check requirements
What does this PR do?
This PR changes libdatadog to report profiling data using the v2.4 intake format. It also compresses the files
includingexcluding the newevent.jsonfile using lz4.This was tested with both the Ruby (agent and agentless modes) and the PHP profilers.
This also introduces two breaking changes:
ddog_ProfileExporter_new/ProfileExporter::newfunctions now take two additional arguments -- the profiling library name and version.We expect that other than the API change, using the v2.4 intake format and the added compression will be transparent to the libdatadog users.
Thanks to @morrisonlevi for pairing with me on this.
Note that this does not (yet) include support for including attributes in the reporting data. I'll leave that for a separate PR.
Motivation
Get libdatadog users to use intake v2.4, and lay the ground work for including attributes in the reported data.
Also save some bytes over the wire for files. A few stats:
- PHP CLI: running
composer create-project symfony/symfony-demoresulted in a 40016 byte pprof which compressed to 18139 bytes in 141661 nanoseconds. That's a compression ratio of 2.2061 and a space savings of 54.67%.- Ruby github relenv test app:
Additional Notes
(Nothing)
How to test the change?
Validate that data can still be reported to the backend, in both agent as well as agentless modes.