Parser performance by jdu2600 · Pull Request #273 · microsoft/krabsetw

jdu2600 · 2026-03-25T09:20:13Z

I'm writing a modern Rust ETW parsing library, and was doing some parsing benchmarking.

I was surprised at just how more performant my approach was againt existing approaches so I did a little digging.

The main issue for krabs seemed to be a linear scan over a per-event property cache. This is effectively an O(N²) scan if you grab all properties - and the cache actually only hits if you grab the same property twice. 😱

Plus wstring temporaries.

Event Type	krabsetw 4.4.8	krabsetw 4.4.6	ferrisetw 1.2.0	TdhFormatProperty	TdhFormatProperty (naive)	TdhGetProperty
Process Start (16 fields)	1286 ns	2522 ns	2503 ns	541 ns	1218 ns	3684 ns
RPC Client Call (9 fields)	516 ns	771 ns	1213 ns	338 ns	481 ns	1755 ns
Registry CreateKey (6 fields)	331 ns	467 ns	749 ns	548 ns	700 ns	1577 ns
File Create (7 fields)	410 ns	872 ns	867 ns	294 ns	489 ns	1146 ns
Thread Start (10 fields)	485 ns	1055 ns	905 ns	201 ns	849 ns	1482 ns

All measurements are per-event averages (ns) over the same captured set of events run on my laptop.

Replace the per-event deque/linear-scan property cache with a two-level scheme: a persistent name-to-index map in schema_locator (built once per event type, shared across all events) and a flat vector<property_info> in parser indexed by property position. This eliminates per-event hash table allocation and removes string comparisons during the blob walk. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace const std::wstring& with std::wstring_view in parser's public API (parse, try_parse, view_of) and internal helpers (find_property, assert_valid_assignment, throw_if_invalid). This eliminates heap-allocating std::wstring temporaries when callers pass string literals (the common case). Property names longer than MSVC's SSO threshold of 7 wchar_t previously caused a heap allocation and free per field access. Not a breaking change - std::wstring_view is implicitly constructible from both std::wstring and const wchar_t*. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

swannman

Outstanding work @jdu2600 - thank you for doing this!

kylereedmsft

Thank you for the PR. I know this has already been merged, but this change raises some new ideas that we should consider.

kylereedmsft · 2026-03-27T22:25:14Z

-
-        // Maintain a mapping from property name to blob data index.
-        std::deque<std::pair<const wchar_t *, property_info>> propertyCache_;
+        // Persistent name to index map shared across all events of the same type.


I'm not sure this layer of caching even makes sense. It's just overhead.
I can't think of any reason not to just maintain a map of name->property_info from the schema.

I'm fuzzy on all the various event type schemas, but looking at this code, it seems like the property_info blobs are expected to be consistent. Just trying to think through whether we ever need any of the old fallback code if we pre-populate and cache.

Yes and no. These were just the smallest and quickest wins that I could port across.

Since a cache existed, I kept it and just addressed its bugged performance characteristics.
Though, in practice, the O(1) map lookup only seems to have gains for events with 12+ fields. (Though I see much beter gains in Rust because of UTF-8 - smaller keys to hash).

We could just delete it - and use the (fallback) hinted linear scan.
This is actually even faster for my benchmark which accesses properties in event order.

I have implementioned both approaches - and will open two PRs for you.

property_info has dynamic per-event information for events with more than one variable length field - whereas the name->index mapping is static per-schema.

I actually do a lot more work upfront in my library - I allocate a parser, calculate the offsets for all fixed location properties and cache this. Variable location property offsets are still calculated lazily. The lookup step just resets the parser.

So you could pre-populate and cache a partial property_info[] also, but that's a bigger lift.

I'm happy to provide you both pre-release access to my library if you are interested.

Thanks for the PRs. Did you compare the two against each other? Just wondering whether you have a preference.
Re. lookup, yes the linear scans are SUPER fast and are almost always faster than the overhead of the hashing and node memory management from the maps.

I'm still trying to convince myself we need all that overhead in the parser to get the property_info. There's a lot more code in there than I would have expected taking a look with fresh eyes.

Hinting and/or simply tracking the last index between calls is probably more than enough and wouldn't need a cache at all.

After looking at them both, I really like the idea of having less than more. I can see the merits of 274 -- and it would line up most accurately with the current model. 275 seems better to me though, and I'm thinking it doesn't go far enough.

Some performance numbers for both approaches are in #275.
The hinted linear scan wins best case, and loses worst case. And the loss scales with the number of fields. TL;DR - O(1) versus O(N) scaling

If you can improve the map performance in a non-API-breaking way I would do that...

Otherwise, due to its simplicity, I would lean towards the linear scan using the highest accessed hint - despite it having worse performance in (arguably perverse) scenarios. I always consume properties in roughly the event order as this feels natural to me. It feels reasonable to assume that other developers mostly do the same...and will hit the fast path...

My only worry is that the user needs to understand the implemention details if they want best performance though. Whereas the map always gives adequate performance.

re: not far enough
What else are you thinking?

In my new library I optimised the map's performance, and provided a second API.

UTF-16LE -> UTF-8 keys

SipHash -> FxHash

// grab the parser from the cache - it's pre-warmed and reset for use let mut parser = schema_cache.parser(event)?; // convenience API - FxHashMap lookup keyed by UTF-8 property name let process_id: u32 = parser.get("ProcessID")?; // performance API - check hinted index first, then fall back to map lookup let image_name: String = parser.get_hinted("ImageName", 10)?;

I'm trying to decide whether even the vector in the parser makes sense. I will take some time today and look more closely to see if it's even worth THAT allocation.

I like the idea of the hint as a way to allow callers to help mitigate non-scan lookups. It's a nice fallback that isn't too hard to explain and doesn't require any caching in the core layer. Well, aside from the schema caching which has a huge perf benefit.

Eager caching is definitely the winner of the three approaches for my all-fields-sequentially benchmarks.

Event Type 4.4.6 PR#274 PR#275 PR#276 PR#276 hinted

Process Start 2620 ns 1322 ns 824 ns 624 ns 465 ns

Thread Start 1269 ns 495 ns 280 ns 152 ns 158 ns

RPC Client Call 871 ns 549 ns 304 ns 187 ns 194 ns

File Create 653 ns 415 ns 256 ns 192 ns 175 ns

Registry CreateKey 532 ns 351 ns 232 ns 153 ns 152 ns

It's less good for sparse/partial access though - especially for long events where only early properties are accessed.
Here are the times for accessing only the 2nd/4th/6th properties.
For an eager caching approach, this can potentially be mitigated by providing an (optional) max_eager_index limit to the constructor, and additional logic to handle lookups beyond the eager cache... at this point you are probably better off taking my approach of doing upfront parser pre-work and caching / resetting that.

Event Props PR#274 PR#275 PR#276 PR#276 hinted PR#276 + max_eager(5) PR#276 + max_eager(5) + hinted

Process Start 16 233 ns 147 ns 272 ns 260 ns 102 ns 91 ns

RPC ClientCall 9 204 ns 158 ns 121 ns 114 ns 115 ns 103 ns

Thread Start 8 206 ns 140 ns 103 ns 88 ns 94 ns 81 ns

File Create 7 202 ns 137 ns 126 ns 107 ns 94 ns 80 ns

Registry CreateKey 6 237 ns 170 ns 129 ns 119 ns 129 ns 120 ns

In practice though #276 is probably good enough for most real world use cases?

Do we have any of the history on why the lazy approach was chosen initially?
I assume there was something about how the team was using it internally at the time?

An earlier iteration of our ETW library parsing looked more like a map from name->functor to accept the data. That allowed a single, forward-only pass through the event. Only fields with a binding were actually extracted. When krabs was written, the original dev tried to keep that support for sparse access intact.
Krabs is nice at the consumer side, but the lazy, pull-through model comes with some major costs -- especially when using C# because of the thunk (hence the schema locator was added). I think #275 is probably good enough, especially with hinting plumbed through. If we really wanted to chase the dragon and squeeze out every last ns, a forward only, no-cache walk with binding layer would probably be the fastest, but it'd be a much more complicated change and would change the client API shape a lot.

I looked at our internal usage patterns; in most cases we do not end up reading all the properties on the event, we only care about a handful of properties (< 40% of the total properties available).
The benchmarks suggest #275 is the fastest for this pattern. Even in cases where we end up reading all the properties #276 will only be significantly better if the number of properties on the event is large (no such internal usage exists today), for a smaller number of properties the performance looks nearly the same.
I'm leaning towards 275 as the preferred approach to go with.

Thanks, @vmurthysuhas. That's my take too. It might still be worth pulling in the hint parameter changes from 276. We can even hold off on touching the managed layer for now and add that later if we need to.
@jdu2600 do you want to update 275 or would you like me to take your changes and apply the hinted overloads as a new PR?

👍 hinted overloads added to #275

we only care about a handful of properties (< 40% of the total properties available).
The benchmarks suggest #275 is the fastest for this pattern. Even in cases where we end up reading all the properties #276 will only be significantly better if the number of properties on the event is large (no such internal usage exists today), for a smaller number of properties the performance looks nearly the same.

The performance is less to do with the number of properties accessed, and more to do with the maximum property index accessed. And also to do with whether any slow size calculations (esp uncounted strings) can be skipped.

A better sparse benchmark might be every third property.
This always accesses a property near the end so #276's eager eval wins.

Event Props #275 #275 reverse #275 hinted (+reverse ) #276 #276 reverse #276 hinted (+reverse )

Process 6 of 16 482.9 1105.4 498.3 385.4 779.2 307.3

Thread 4 of 10 199.9 309.6 183.7 117.0 240.0 94.4

RPC 3 of 9 166.7 220.3 155.1 121.0 193.4 99.2

File 3 of 7 208.3 237.4 197.5 162.5 213.5 134.1

Registry 2 of 6 111.3 120.4 109.4 88.3 104.9 81.4

But if only the middle property is accessed, then #275's lazy eval sometimes wins.
But this depends on the event layout - and where the fixed/variable length fields sit.

Event type #275 #275 hinted #276 #276 hinted

Process 184.4 145.2 277.8 234.1

Thread 130.7 127.0 87.1 68.7

File 97.6 93.0 98.4 85.5

Registry 98.8 93.3 82.3 70.3

RPC 139.5 127.5 120.3 98.9

swannman · 2026-03-27T22:51:19Z

Thanks @kylereedmsft for the review - @jdu2600 if you have a moment, would you take a look?

John Uhlmann and others added 3 commits March 25, 2026 16:59

Bump NuGet version to 4.4.8

3a81f57

swannman approved these changes Mar 27, 2026

View reviewed changes

swannman merged commit f0934fd into microsoft:master Mar 27, 2026
2 checks passed

kylereedmsft reviewed Mar 27, 2026

View reviewed changes

This was referenced Apr 1, 2026

Parser performance feedback #274

Closed

replace name->property map with hinted linear scan #275

Merged

Event Type	4.4.6	PR#274	PR#275	PR#276	PR#276 hinted
Process Start	2620 ns	1322 ns	824 ns	624 ns	465 ns
Thread Start	1269 ns	495 ns	280 ns	152 ns	158 ns
RPC Client Call	871 ns	549 ns	304 ns	187 ns	194 ns
File Create	653 ns	415 ns	256 ns	192 ns	175 ns
Registry CreateKey	532 ns	351 ns	232 ns	153 ns	152 ns

Event	Props	PR#274	PR#275	PR#276	PR#276 hinted	PR#276 + max_eager(5)	PR#276 + max_eager(5) + hinted
Process Start	16	233 ns	147 ns	272 ns	260 ns	102 ns	91 ns
RPC ClientCall	9	204 ns	158 ns	121 ns	114 ns	115 ns	103 ns
Thread Start	8	206 ns	140 ns	103 ns	88 ns	94 ns	81 ns
File Create	7	202 ns	137 ns	126 ns	107 ns	94 ns	80 ns
Registry CreateKey	6	237 ns	170 ns	129 ns	119 ns	129 ns	120 ns

Event	Props	#275	#275 reverse	#275 hinted (+reverse )	#276	#276 reverse	#276 hinted (+reverse )
Process	6 of 16	482.9	1105.4	498.3	385.4	779.2	307.3
Thread	4 of 10	199.9	309.6	183.7	117.0	240.0	94.4
RPC	3 of 9	166.7	220.3	155.1	121.0	193.4	99.2
File	3 of 7	208.3	237.4	197.5	162.5	213.5	134.1
Registry	2 of 6	111.3	120.4	109.4	88.3	104.9	81.4

Event type	#275	#275 hinted	#276	#276 hinted
Process	184.4	145.2	277.8	234.1
Thread	130.7	127.0	87.1	68.7
File	97.6	93.0	98.4	85.5
Registry	98.8	93.3	82.3	70.3
RPC	139.5	127.5	120.3	98.9

Conversation

jdu2600 commented Mar 25, 2026

Uh oh!

swannman left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kylereedmsft left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kylereedmsft Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vmurthysuhas Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

swannman commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kylereedmsft Apr 3, 2026 •

edited

Loading

vmurthysuhas Apr 9, 2026 •

edited

Loading