Add span overloads to API #122
Conversation
Before:

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.5.20279.10
  [Host]     : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  DefaultJob : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
```

After:

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.5.20279.10
  [Host]     : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  DefaultJob : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
```

After (with new non-allocating API):

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.5.20279.10
  [Host]     : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  DefaultJob : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
```
I need to do another pass over the documentation and do the write benchmarks. I'll take care of those tomorrow; it's late.
```csharp
_currentValue = default;
var findKey = new MDBValue(new Span<byte>(key));
return mdb_cursor_get(_handle, ref findKey, ref _currentValue, operation) == 0;
#warning should this update _currentKey?
```
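If the answer is yes, one minimal sketch of keeping the key state in sync is below. This assumes `_currentKey` is an `MDBValue` field alongside `_currentValue` (the field and method names here are illustrative, not necessarily the library's exact shape); on a successful positioning call, LMDB writes the key it landed on back into the key struct, so it can simply be captured:

```csharp
// Sketch only: keep both cursor-state fields updated on a successful get.
// Assumes _currentKey is an MDBValue field like _currentValue.
public bool MoveTo(byte[] key)
{
    _currentValue = default;
    var findKey = new MDBValue(new Span<byte>(key));
    var found = mdb_cursor_get(_handle, ref findKey, ref _currentValue, operation) == 0;
    if (found)
        _currentKey = findKey; // LMDB fills findKey with the key it positioned on
    return found;
}
```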
Yes, it looks like I introduced this in my first pass at changing things. Good catch.
Question: for these Gets, if the key doesn't exist, should the cursor data be updated? The LMDB docs don't actually say what gets returned if the key isn't found.
I believe it's using search under the covers, and if my memory is correct it will move the cursor to the entry just below the first key "higher" than the one you looked for, if that makes sense.
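For reference, the underlying C API draws this distinction explicitly: `MDB_SET` is an exact-match positioning (it returns `MDB_NOTFOUND` on a miss), while `MDB_SET_RANGE` positions the cursor at the first key greater than or equal to the probe key. A hedged sketch of how a caller might use the two through a wrapper like this one (the `Set`/`SetRange` method names are illustrative):

```csharp
// Illustrative only: exact lookup vs. nearest-key lookup semantics.
// MDB_SET      -> fails with MDB_NOTFOUND if the key is absent.
// MDB_SET_RANGE -> lands on the first key >= the probe key.
if (cursor.Set(key))
{
    // Exact hit: the cursor is positioned on `key`.
}
else if (cursor.SetRange(key))
{
    // Miss: the cursor is now on the first key greater than `key`.
}
```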
That makes sense to me. One last update on the way then.
I've spent some time today seeing if there was any low-hanging fruit in terms of perf. In the table below:

The good news is that at peak we are hitting 5M+ reads per second, depending on batch and value sizes. That translates to read throughput of over 1.2 GB/s. Not too shabby. Also, the overhead of copying the value into an array is pretty low, which hurts my case for #121.

The not-so-great news is that the overhead of the transaction is felt more than I expected. The common pattern of a single read per transaction gives up about half of its performance to library overhead. I'm a little shocked by this result, because the overhead was only about 15% or so when using sequential keys and a smaller database; this test uses random keys and 256K entries in the DB instead of just 1000. But even the "worst case" is still ~2M reads per second, which ain't bad at all. Also, even though we are allocating 300 bytes per read, they are all clustered in Gen0, so the GC is not incurring much pain from it.

I'll probably create another issue to poke at it some more after I address feedback and remove those warnings I forgot about in this one.

```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
```
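Since the single-read-per-transaction pattern is where the overhead bites, one way to amortize it is to batch many reads into one read-only transaction. A sketch under the assumption of a typical LMDB wrapper surface (`BeginReadOnlyTransaction` and `TryGet` are illustrative names, not necessarily this library's exact API):

```csharp
// Amortize transaction cost: many reads per transaction instead of one.
// Method names here are illustrative, not this library's confirmed API.
using (var tx = env.BeginReadOnlyTransaction())
{
    foreach (var key in keys)
    {
        // The returned span points into LMDB's memory map and is only
        // valid until the transaction ends; copy it out if it must outlive tx.
        if (tx.TryGet(db, key, out ReadOnlySpan<byte> value))
            Process(value);
    }
} // one commit/abort for the whole batch
```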
This is all good news. Sounds like there might be more to "squeeze" out, but we're in a much better place with these changes than we were. Looking forward to getting these changes in. Thanks for all the effort here.
Force-pushed from 7d4a014 to 7958629.
I'm planning on pulling this in tomorrow unless you have any objections?
Yeah, I don't object. I can land anything else in another PR. Sorry about the delay here.
Added you as a collaborator if you're interested. All I ask is that bigger changes get discussed first. Regarding time, I get it; nothing to be sorry for.
```csharp
}
else
{
    fixed (byte* contiguousValuesPtr = new byte[overallLength])
```
Not sure if new buffers are created elsewhere, or indeed how often you'd expect this unoptimised path to be hit, but maybe it could use pooled memory instead of creating a new byte array, e.g. using ArrayPool from System.Buffers?
(just a random user happy to see Span support, having a quick browse through the PR 😄)
The previous implementation allocated a lot more also, so this should be a net improvement either way.
As for pooling vs allocs, I've actually been kinda frustrated at how good the .NET GC has become 😅. I haven't benchmarked this method specifically, but it is probably still improvable, the question is by how much.
If this path is important to your use case I can create an issue for it to investigate.
I haven't actually benchmarked (or even tried the new Spanified APIs yet); I was just looking at the PR and happened to notice this, and it's similar to a pattern I use a lot (fast path using stackalloc for small buffers, slow path using ArrayPool for larger buffers). But for sure, as-is it's a huge net benefit 😃
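For anyone browsing later, a minimal sketch of the pattern described above, applied to the `overallLength` buffer from the diff (the method and its parameters are hypothetical; only the stackalloc/`ArrayPool<byte>` split is the point):

```csharp
using System;
using System.Buffers;

static void WriteContiguous(ReadOnlySpan<byte> first, ReadOnlySpan<byte> second)
{
    int overallLength = first.Length + second.Length;
    byte[]? rented = null;
    // Fast path: small buffers live on the stack; larger ones are rented
    // from the shared pool instead of allocating a fresh array each call.
    Span<byte> buffer = overallLength <= 256
        ? stackalloc byte[256]
        : (rented = ArrayPool<byte>.Shared.Rent(overallLength));
    try
    {
        first.CopyTo(buffer);
        second.CopyTo(buffer.Slice(first.Length));
        // ... hand buffer.Slice(0, overallLength) to the native call ...
    }
    finally
    {
        if (rented != null)
            ArrayPool<byte>.Shared.Return(rented);
    }
}
```

Note that `Rent` may return an array larger than requested, so the code slices to `overallLength` rather than using `buffer.Length`.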

This PR closes #116. It surfaces Span overloads for most operations and should improve both read and write performance, primarily by reducing GC allocations.
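To illustrate the allocation win, a hedged usage sketch: with span overloads, a key can be built with `stackalloc` and passed straight through, so a read allocates nothing on the managed heap (the `BeginReadOnlyTransaction`/`TryGet` names are illustrative stand-ins for the library's actual surface):

```csharp
// Hypothetical usage of a Span-based overload: the key lives on the stack,
// so this read performs no managed-heap allocation.
Span<byte> key = stackalloc byte[sizeof(long)];
BitConverter.TryWriteBytes(key, orderId);

using (var tx = env.BeginReadOnlyTransaction())
{
    if (tx.TryGet(db, key, out ReadOnlySpan<byte> value))
    {
        // `value` points directly into LMDB's memory map; copy it out
        // if it must outlive the transaction.
    }
}
```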