
Conversation

@AlgorithmsAreCool
Collaborator

This PR closes #116. It surfaces Span overloads for most operations and should improve both read and write performance, primarily by reducing GC allocations.
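For context, the difference a Span overload makes for callers can be sketched like this (a hypothetical usage sketch; `tx`, `db`, and the exact `Get` signature are illustrative, not the library's confirmed API):

```csharp
// Before: each lookup allocates a key array on the heap.
byte[] keyArray = BitConverter.GetBytes(42L);
var result = tx.Get(db, keyArray);

// After: the key can live on the stack, so the lookup itself allocates nothing.
Span<byte> keySpan = stackalloc byte[sizeof(long)];
BitConverter.TryWriteBytes(keySpan, 42L);
var result2 = tx.Get(db, keySpan);
```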

@AlgorithmsAreCool
Collaborator Author

Before:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.5.20279.10
  [Host]     : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  DefaultJob : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT

| Method | OpsPerTransaction | ValueSize | KeyOrder | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Read | 1 | 8 | Sequential | 424.5 ns | 8.15 ns | 7.62 ns | 0.0710 | - | - | 336 B |
| Read | 1 | 64 | Sequential | 429.0 ns | 8.51 ns | 13.00 ns | 0.0830 | - | - | 392 B |
| Read | 1 | 256 | Sequential | 441.4 ns | 7.69 ns | 7.19 ns | 0.1240 | - | - | 584 B |
| Read | 100 | 8 | Sequential | 12,507.1 ns | 244.10 ns | 372.77 ns | 0.7324 | - | - | 3504 B |
| Read | 100 | 64 | Sequential | 15,080.6 ns | 288.58 ns | 364.96 ns | 1.9226 | - | - | 9104 B |
| Read | 100 | 256 | Sequential | 16,760.2 ns | 331.27 ns | 515.74 ns | 6.0120 | - | - | 28304 B |
| Read | 1000 | 8 | Sequential | 173,016.9 ns | 3,397.00 ns | 8,332.91 ns | 6.8359 | - | - | 32304 B |
| Read | 1000 | 64 | Sequential | 179,190.1 ns | 3,554.55 ns | 5,428.18 ns | 18.5547 | - | - | 88304 B |
| Read | 1000 | 256 | Sequential | 201,845.5 ns | 3,963.95 ns | 6,512.88 ns | 59.5703 | - | - | 280304 B |

After:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.5.20279.10
  [Host]     : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  DefaultJob : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT

| Method | OpsPerTransaction | ValueSize | KeyOrder | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Read | 1 | 8 | Sequential | 426.7 ns | 8.46 ns | 11.00 ns | 0.0710 | - | - | 336 B |
| Read | 1 | 64 | Sequential | 437.0 ns | 8.76 ns | 15.79 ns | 0.0830 | - | - | 392 B |
| Read | 1 | 256 | Sequential | 445.2 ns | 8.53 ns | 10.79 ns | 0.1240 | - | - | 584 B |
| Read | 100 | 8 | Sequential | 12,377.9 ns | 236.80 ns | 324.14 ns | 0.7324 | - | - | 3504 B |
| Read | 100 | 64 | Sequential | 14,276.2 ns | 281.75 ns | 438.66 ns | 1.9226 | - | - | 9104 B |
| Read | 100 | 256 | Sequential | 16,474.7 ns | 324.40 ns | 485.55 ns | 6.0120 | - | - | 28304 B |
| Read | 1000 | 8 | Sequential | 165,299.1 ns | 3,158.93 ns | 3,243.99 ns | 6.8359 | - | - | 32304 B |
| Read | 1000 | 64 | Sequential | 179,787.0 ns | 3,389.49 ns | 3,170.53 ns | 18.5547 | - | - | 88305 B |
| Read | 1000 | 256 | Sequential | 220,032.9 ns | 5,844.55 ns | 17,048.84 ns | 59.5703 | - | - | 280304 B |

After (with the new non-allocating API):

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.5.20279.10
  [Host]     : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  DefaultJob : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT

| Method | OpsPerTransaction | ValueSize | KeyOrder | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Read | 1 | 8 | Sequential | 449.6 ns | 8.88 ns | 14.83 ns | 0.0644 | - | - | 304 B |
| Read | 1 | 64 | Sequential | 435.0 ns | 8.72 ns | 21.88 ns | 0.0644 | - | - | 304 B |
| Read | 1 | 256 | Sequential | 415.3 ns | 7.60 ns | 8.44 ns | 0.0644 | - | - | 304 B |
| Read | 100 | 8 | Sequential | 12,363.8 ns | 240.95 ns | 337.78 ns | 0.0610 | - | - | 304 B |
| Read | 100 | 64 | Sequential | 13,812.4 ns | 272.65 ns | 505.38 ns | 0.0610 | - | - | 304 B |
| Read | 100 | 256 | Sequential | 14,835.0 ns | 293.48 ns | 521.66 ns | 0.0610 | - | - | 304 B |
| Read | 1000 | 8 | Sequential | 164,105.6 ns | 3,256.16 ns | 5,529.21 ns | - | - | - | 304 B |
| Read | 1000 | 64 | Sequential | 172,565.8 ns | 3,365.86 ns | 5,715.49 ns | - | - | - | 304 B |
| Read | 1000 | 256 | Sequential | 173,051.3 ns | 3,410.59 ns | 7,557.63 ns | - | - | - | 304 B |

@AlgorithmsAreCool
Collaborator Author

I need to do another pass over the documentation and do the write benchmarks.

I'll take care of those tomorrow; it's late.

```csharp
_currentValue = default;
var findKey = new MDBValue(new Span<byte>(key));
return mdb_cursor_get(_handle, ref findKey, ref _currentValue, operation) == 0;
#warning should this update _currentKey?
```
Owner


Yes, it looks like I introduced this in my first pass at changing things. Good catch.

Collaborator Author


Question: for these Gets, if the key doesn't exist, should the cursor data be updated?
The LMDB docs don't actually say what gets returned when the key isn't found.

Owner


I believe it's using search under the covers, and if my memory is correct it will position the cursor at the entry just below a key "higher" than the one you looked for, if that makes sense.

Collaborator Author


That makes sense to me. One last update on the way, then.

@CoreyKaylor
Owner

CoreyKaylor commented Jun 28, 2020

Seriously, well done.

@AlgorithmsAreCool
Collaborator Author

I've spent some time today seeing if there was any low-hanging fruit in terms of perf.

In the table below:

  • Read is a normal TryRead into a preallocated buffer
  • ReadNoCopy is a read with the value copy omitted
  • ReadMinimal is raw usage of the LmdbMethods, to see how much overhead the library's safety layer was adding

The good news is that at peak we are hitting 5M+ reads per second, depending on batch and value sizes. This translates to read throughput of over 1.2 GB/s. Not too shabby. Also, the overhead of copying the value into an array is pretty low, which hurts my case for #121.

The not-so-great news is that the overhead of the transaction is felt more than I expected. The common pattern of a single read per transaction is giving up about half of its performance to library overhead. I'm a little shocked by this result, because the overhead was only about 15% or so when using sequential keys and a smaller database. This test uses random keys and 256K entries in the DB instead of just 1000.

But even the "worst case" is still ~2M reads per second. That ain't bad at all. Also, even though we are allocating ~300 bytes per read, the allocations are all clustered in Gen 0, so the GC isn't incurring much pain from them.

I'll probably create another issue to poke at it some more after I address feedback and remove those warnings I forgot about in this one.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.5.20279.10
  [Host]     : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  DefaultJob : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT

| Method | OpsPerTransaction | DatabaseSize | ValueSize | KeyOrder | Mean | Error | StdDev | Throughput | Memory Throughput | Median | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Read | 1 | 262144 | 8 | Random | 513.3 ns | 9.67 ns | 14.17 ns | 1,948,162.52 / s | 15,585,300.18 MB/s | 513.3 ns | 2.05 | 0.07 | 0.0639 | - | - | 304 B |
| ReadNoCopy | 1 | 262144 | 8 | Random | 497.6 ns | 9.95 ns | 18.93 ns | 2,009,765.73 / s | 16,078,125.87 MB/s | 491.2 ns | 1.98 | 0.07 | 0.0639 | - | - | 304 B |
| ReadMinimal | 1 | 262144 | 8 | Random | 250.9 ns | 5.05 ns | 7.71 ns | 3,985,211.15 / s | 31,881,689.24 MB/s | 251.1 ns | 1.00 | 0.00 | - | - | - | - |
| Read | 1 | 262144 | 64 | Random | 492.7 ns | 9.69 ns | 13.59 ns | 2,029,661.31 / s | 129,898,323.92 MB/s | 488.3 ns | 2.08 | 0.07 | 0.0639 | - | - | 304 B |
| ReadNoCopy | 1 | 262144 | 64 | Random | 492.4 ns | 9.53 ns | 13.35 ns | 2,030,757.10 / s | 129,968,454.16 MB/s | 495.6 ns | 2.08 | 0.08 | 0.0639 | - | - | 304 B |
| ReadMinimal | 1 | 262144 | 64 | Random | 237.8 ns | 4.77 ns | 6.36 ns | 4,205,150.29 / s | 269,129,618.80 MB/s | 237.8 ns | 1.00 | 0.00 | - | - | - | - |
| Read | 1 | 262144 | 256 | Random | 498.5 ns | 9.97 ns | 14.61 ns | 2,005,850.63 / s | 513,497,761.70 MB/s | 503.6 ns | 2.04 | 0.10 | 0.0639 | - | - | 304 B |
| ReadNoCopy | 1 | 262144 | 256 | Random | 494.2 ns | 9.80 ns | 16.63 ns | 2,023,338.34 / s | 517,974,614.30 MB/s | 495.7 ns | 2.00 | 0.13 | 0.0644 | - | - | 304 B |
| ReadMinimal | 1 | 262144 | 256 | Random | 252.6 ns | 5.03 ns | 12.24 ns | 3,958,301.20 / s | 1,013,325,108.30 MB/s | 254.3 ns | 1.00 | 0.00 | - | - | - | - |
| Read | 100 | 262144 | 8 | Random | 19,900.9 ns | 389.37 ns | 594.61 ns | 5,024,900.04 / s | 40,199,200.35 MB/s | 20,074.8 ns | 1.26 | 0.04 | 0.0610 | - | - | 304 B |
| ReadNoCopy | 100 | 262144 | 8 | Random | 19,488.5 ns | 377.90 ns | 912.68 ns | 5,131,219.32 / s | 41,049,754.55 MB/s | 19,426.4 ns | 1.20 | 0.07 | 0.0610 | - | - | 304 B |
| ReadMinimal | 100 | 262144 | 8 | Random | 15,874.3 ns | 298.27 ns | 319.14 ns | 6,299,496.24 / s | 50,395,969.92 MB/s | 15,858.7 ns | 1.00 | 0.00 | - | - | - | - |
| Read | 100 | 262144 | 64 | Random | 20,338.6 ns | 402.07 ns | 774.65 ns | 4,916,754.83 / s | 314,672,309.30 MB/s | 20,428.5 ns | 1.18 | 0.05 | 0.0610 | - | - | 304 B |
| ReadNoCopy | 100 | 262144 | 64 | Random | 18,794.6 ns | 373.90 ns | 445.10 ns | 5,320,667.67 / s | 340,522,731.20 MB/s | 18,747.5 ns | 1.10 | 0.05 | 0.0610 | - | - | 304 B |
| ReadMinimal | 100 | 262144 | 64 | Random | 17,299.3 ns | 344.10 ns | 584.31 ns | 5,780,576.01 / s | 369,956,864.43 MB/s | 17,583.2 ns | 1.00 | 0.00 | - | - | - | - |
| Read | 100 | 262144 | 256 | Random | 21,189.2 ns | 417.04 ns | 390.10 ns | 4,719,374.46 / s | 1,208,159,862.97 MB/s | 21,199.8 ns | 1.20 | 0.03 | 0.0610 | - | - | 304 B |
| ReadNoCopy | 100 | 262144 | 256 | Random | 20,382.1 ns | 402.55 ns | 523.43 ns | 4,906,272.15 / s | 1,256,005,670.03 MB/s | 20,381.3 ns | 1.15 | 0.03 | 0.0610 | - | - | 304 B |
| ReadMinimal | 100 | 262144 | 256 | Random | 17,678.7 ns | 350.80 ns | 344.53 ns | 5,656,513.21 / s | 1,448,067,381.00 MB/s | 17,541.7 ns | 1.00 | 0.00 | - | - | - | - |
| Read | 1000 | 262144 | 8 | Random | 271,680.7 ns | 5,404.78 ns | 8,880.21 ns | 3,680,791.47 / s | 29,446,331.78 MB/s | 270,716.9 ns | 1.09 | 0.04 | - | - | - | 304 B |
| ReadNoCopy | 1000 | 262144 | 8 | Random | 257,954.2 ns | 4,940.35 ns | 5,073.38 ns | 3,876,657.19 / s | 31,013,257.49 MB/s | 257,343.4 ns | 1.06 | 0.04 | - | - | - | 304 B |
| ReadMinimal | 1000 | 262144 | 8 | Random | 248,480.0 ns | 4,924.56 ns | 8,227.83 ns | 4,024,468.93 / s | 32,195,751.46 MB/s | 247,993.7 ns | 1.00 | 0.00 | - | - | - | - |
| Read | 1000 | 262144 | 64 | Random | 279,058.9 ns | 5,554.03 ns | 7,024.07 ns | 3,583,473.35 / s | 229,342,294.41 MB/s | 281,347.1 ns | 1.06 | 0.04 | - | - | - | 304 B |
| ReadNoCopy | 1000 | 262144 | 64 | Random | 271,230.2 ns | 5,180.11 ns | 4,845.48 ns | 3,686,904.78 / s | 235,961,906.02 MB/s | 272,830.3 ns | 1.03 | 0.02 | - | - | - | 304 B |
| ReadMinimal | 1000 | 262144 | 64 | Random | 263,690.2 ns | 5,026.03 ns | 4,701.35 ns | 3,792,328.96 / s | 242,709,053.15 MB/s | 265,527.0 ns | 1.00 | 0.00 | - | - | - | - |
| Read | 1000 | 262144 | 256 | Random | 300,485.7 ns | 5,819.14 ns | 5,975.82 ns | 3,327,944.95 / s | 851,953,908.31 MB/s | 302,726.5 ns | 1.09 | 0.04 | - | - | - | 304 B |
| ReadNoCopy | 1000 | 262144 | 256 | Random | 284,745.1 ns | 5,492.79 ns | 6,105.23 ns | 3,511,912.41 / s | 899,049,576.07 MB/s | 287,330.3 ns | 1.03 | 0.02 | - | - | - | 304 B |
| ReadMinimal | 1000 | 262144 | 256 | Random | 276,339.5 ns | 5,475.50 ns | 5,858.72 ns | 3,618,736.54 / s | 926,396,553.86 MB/s | 278,851.8 ns | 1.00 | 0.00 | - | - | - | 1 B |
| Read | 10000 | 262144 | 8 | Random | 3,156,183.7 ns | 28,290.50 ns | 26,462.95 ns | 3,168,383.35 / s | 25,347,066.80 MB/s | 3,156,604.1 ns | 1.09 | 0.01 | - | - | - | 306 B |
| ReadNoCopy | 10000 | 262144 | 8 | Random | 3,082,680.0 ns | 26,803.50 ns | 25,072.01 ns | 3,243,930.62 / s | 25,951,444.96 MB/s | 3,075,674.0 ns | 1.07 | 0.01 | - | - | - | 304 B |
| ReadMinimal | 10000 | 262144 | 8 | Random | 2,886,256.1 ns | 28,983.67 ns | 24,202.68 ns | 3,464,696.05 / s | 27,717,568.39 MB/s | 2,885,249.2 ns | 1.00 | 0.00 | - | - | - | 5 B |
| Read | 10000 | 262144 | 64 | Random | 3,978,827.5 ns | 29,298.17 ns | 27,405.53 ns | 2,513,303.23 / s | 160,851,406.60 MB/s | 3,974,537.5 ns | 1.06 | 0.01 | - | - | - | 314 B |
| ReadNoCopy | 10000 | 262144 | 64 | Random | 3,869,549.2 ns | 24,773.98 ns | 23,173.60 ns | 2,584,280.35 / s | 165,393,942.24 MB/s | 3,865,378.9 ns | 1.03 | 0.01 | - | - | - | 306 B |
| ReadMinimal | 10000 | 262144 | 64 | Random | 3,761,864.1 ns | 26,050.39 ns | 21,753.25 ns | 2,658,256.56 / s | 170,128,420.15 MB/s | 3,758,513.3 ns | 1.00 | 0.00 | - | - | - | 5 B |
| Read | 10000 | 262144 | 256 | Random | 4,671,714.2 ns | 35,431.55 ns | 31,409.15 ns | 2,140,541.91 / s | 547,978,729.99 MB/s | 4,671,530.5 ns | 1.10 | 0.02 | - | - | - | 304 B |
| ReadNoCopy | 10000 | 262144 | 256 | Random | 4,362,596.4 ns | 27,258.76 ns | 24,164.19 ns | 2,292,212.97 / s | 586,806,520.99 MB/s | 4,352,195.3 ns | 1.03 | 0.01 | - | - | - | 310 B |
| ReadMinimal | 10000 | 262144 | 256 | Random | 4,231,405.8 ns | 45,450.64 ns | 40,290.81 ns | 2,363,280.78 / s | 604,999,879.20 MB/s | 4,225,128.1 ns | 1.00 | 0.00 | - | - | - | 37 B |

@CoreyKaylor
Owner

This is all good news. Sounds like there might be more to "squeeze" out, but we're in a much better place with these changes than we were. Looking forward to getting these changes in. Thanks for all the effort here.

@CoreyKaylor
Owner

I'm planning on pulling this in tomorrow unless you have any objections?

@AlgorithmsAreCool
Collaborator Author

Yeah, I don't object. I can land anything else in another PR. Sorry about the delay here.

@CoreyKaylor
Owner

Added you as a collaborator if you're interested. All I ask is bigger changes get discussed first. Regarding time, I get it, nothing to be sorry for.

@CoreyKaylor CoreyKaylor merged commit 7958629 into CoreyKaylor:master Jul 6, 2020
```csharp
}
else
{
    fixed (byte* contiguousValuesPtr = new byte[overallLength])
```


Not sure if new buffers are created elsewhere, or indeed how often you'd expect this unoptimised path to be hit, but maybe it could use pooled memory instead of creating a new byte array, e.g. using ArrayPool from System.Buffers?

(just a random user happy to see Span support, having a quick browse through the PR 😄)
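For reference, the suggestion above might look something like this (a sketch only; `overallLength` and the surrounding method shape are stand-ins for the actual library code):

```csharp
using System;
using System.Buffers;

// Slow path, sketched with a pooled buffer instead of `new byte[overallLength]`.
byte[] rented = ArrayPool<byte>.Shared.Rent(overallLength);
try
{
    // Rent can return a larger array than requested, so work with an
    // exact-length slice before pinning it for the native call.
    Span<byte> contiguousValues = rented.AsSpan(0, overallLength);
    // ... copy the values into contiguousValues, then pin and call into LMDB ...
}
finally
{
    ArrayPool<byte>.Shared.Return(rented);
}
```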

Collaborator Author


The previous implementation allocated a lot more also, so this should be a net improvement either way.

As for pooling vs allocs, I've actually been kinda frustrated at how good the .NET GC has become 😅. I haven't benchmarked this method specifically, but it is probably still improvable, the question is by how much.

If this path is important to your use case I can create an issue for it to investigate.


@cocowalla commented Jul 7, 2020


I haven't actually benchmarked it (or even tried the new Spanified APIs yet); I was just looking through the PR and happened to notice this, and it's similar to a pattern I use a lot (fast path using stackalloc for small buffers, slow path using ArrayPool for larger buffers). But for sure, as-is it's a huge net benefit 😃
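The pattern described above can be sketched roughly like this (an illustrative sketch, not the library's code; the threshold, method name, and C# 8+ conditional-stackalloc syntax are assumptions):

```csharp
using System;
using System.Buffers;

static void ProcessBuffer(int length)
{
    const int StackAllocThreshold = 256;

    byte[] rented = null;
    // Small buffers live on the stack; larger ones come from the shared pool.
    Span<byte> buffer = length <= StackAllocThreshold
        ? stackalloc byte[StackAllocThreshold]
        : (rented = ArrayPool<byte>.Shared.Rent(length));
    try
    {
        // Both stackalloc and Rent may hand back more space than needed,
        // so slice to the exact length before use.
        Span<byte> slice = buffer.Slice(0, length);
        // ... use slice ...
    }
    finally
    {
        if (rented != null)
            ArrayPool<byte>.Shared.Return(rented);
    }
}
```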


Development

Successfully merging this pull request may close these issues.

Re-evaluate public API for possibly using Span overloads
