Description
The new QueryContext and ExecContext APIs both take a []driver.NamedValue instead of a []driver.Value. Because pq uses driver.Value internally, the first thing both methods do is copy the arguments:
// Implement the "StmtExecContext" interface
func (st *stmt) ExecContext(ctx context.Context, args []driver.NamedValue) (driver.Result, error) {
	list := make([]driver.Value, len(args))
	for i, nv := range args {
		list[i] = nv.Value
	}

This means that every call to this function with arguments allocates a new slice. Note also that database/sql will use QueryContext if the driver implements it, so every query issued through database/sql now goes through that call path:
// queryDC executes a query on the given connection.
// The connection gets released by the releaseConn function.
// The ctx context is from a query method and the txctx context is from an
// optional transaction context.
func (db *DB) queryDC(ctx, txctx context.Context, dc *driverConn, releaseConn func(error), query string, args []interface{}) (*Rows, error) {
	queryerCtx, ok := dc.ci.(driver.QueryerContext)
	var queryer driver.Queryer
	if !ok {
		queryer, ok = dc.ci.(driver.Queryer)
	}
	if ok {
		var nvdargs []driver.NamedValue
		var rowsi driver.Rows
		var err error
		withLock(dc, func() {
			nvdargs, err = driverArgsConnLocked(dc.ci, nil, args)
			if err != nil {
				return
			}
			rowsi, err = ctxDriverQuery(ctx, queryerCtx, queryer, query, nvdargs)
		})

If all of pq's internal APIs used driver.NamedValue instead of driver.Value, the copy would be unnecessary, which saves an allocation in the most common case.
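To make the cost of the copy concrete, here is a minimal, self-contained sketch that compares the two shapes using testing.AllocsPerRun. The helpers execValues, execNamed, viaCopy, direct, and measure are hypothetical stand-ins written for this illustration and are not pq's actual functions. The package-level variables exist only to make the slices escape to the heap, on the assumption that the real driver passes its arguments into code the compiler cannot prove non-escaping.

```go
package main

import (
	"database/sql/driver"
	"fmt"
	"testing"
)

// retainedValues and retainedNamed keep the argument slices reachable so the
// compiler treats them as escaping. This is an assumption made to mimic
// arguments flowing into deeper driver code.
var (
	retainedValues []driver.Value
	retainedNamed  []driver.NamedValue
)

// execValues is a hypothetical internal helper with a []driver.Value signature.
func execValues(args []driver.Value) int {
	retainedValues = args
	return len(args)
}

// execNamed is the hypothetical equivalent that accepts []driver.NamedValue.
func execNamed(args []driver.NamedValue) int {
	retainedNamed = args
	return len(args)
}

// viaCopy mirrors the conversion loop quoted above: it builds a fresh
// []driver.Value before calling the internal helper.
func viaCopy(args []driver.NamedValue) int {
	list := make([]driver.Value, len(args))
	for i, nv := range args {
		list[i] = nv.Value
	}
	return execValues(list)
}

// direct hands the caller's slice straight through, with no conversion.
func direct(args []driver.NamedValue) int {
	return execNamed(args)
}

// measure reports the average heap allocations per call for each path.
func measure() (copyAllocs, directAllocs float64) {
	args := []driver.NamedValue{
		{Ordinal: 1, Value: int64(42)},
		{Ordinal: 2, Value: "hello"},
	}
	copyAllocs = testing.AllocsPerRun(1000, func() { viaCopy(args) })
	directAllocs = testing.AllocsPerRun(1000, func() { direct(args) })
	return copyAllocs, directAllocs
}

func main() {
	c, d := measure()
	fmt.Printf("copy path:   %.0f allocs/call\n", c)
	fmt.Printf("direct path: %.0f allocs/call\n", d)
}
```

Under these assumptions the copy path should report at least one allocation per call (the make), while the direct path reports none. The difference is a single small slice, so it is only likely to show up in benchmarks where the rest of the call is cheap.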
The patch implemented here: https://github.com/kevinburke/pq/compare/named-value?expand=1 improves the PreparedSelectString benchmark by about 4% on my Mac (the rest of the results appear to be noise):
name old time/op new time/op delta
BoolArrayScanBytes-10 530ns ± 1% 530ns ± 1% ~ (p=0.548 n=5+5)
BoolArrayValue-10 66.3ns ± 1% 66.7ns ± 0% ~ (p=0.095 n=5+5)
ByteaArrayScanBytes-10 980ns ± 2% 976ns ± 1% ~ (p=0.690 n=5+5)
ByteaArrayValue-10 279ns ± 2% 281ns ± 2% ~ (p=0.310 n=5+5)
Float64ArrayScanBytes-10 960ns ± 1% 956ns ± 4% ~ (p=0.310 n=5+5)
Float64ArrayValue-10 969ns ± 2% 966ns ± 1% ~ (p=0.421 n=5+5)
Int64ArrayScanBytes-10 626ns ± 1% 624ns ± 1% ~ (p=0.421 n=5+5)
Int64ArrayValue-10 483ns ± 2% 485ns ± 2% ~ (p=0.841 n=5+5)
Float32ArrayScanBytes-10 941ns ± 1% 950ns ± 2% ~ (p=0.246 n=5+5)
Float32ArrayValue-10 654ns ± 0% 663ns ± 2% +1.36% (p=0.016 n=5+5)
Int32ArrayScanBytes-10 623ns ± 1% 619ns ± 1% ~ (p=0.246 n=5+5)
Int32ArrayValue-10 328ns ± 1% 333ns ± 1% +1.53% (p=0.008 n=5+5)
StringArrayScanBytes-10 1.36µs ±10% 1.33µs ± 1% ~ (p=0.579 n=5+5)
StringArrayValue-10 2.51µs ± 2% 2.58µs ± 9% ~ (p=0.421 n=5+5)
GenericArrayScanScannerSliceBytes-10 2.62µs ± 1% 2.66µs ± 3% ~ (p=0.095 n=5+5)
GenericArrayValueBools-10 642ns ± 1% 647ns ± 1% ~ (p=0.151 n=5+5)
GenericArrayValueFloat64s-10 1.91µs ± 1% 1.89µs ± 1% ~ (p=0.151 n=5+5)
GenericArrayValueInt64s-10 1.09µs ± 0% 1.11µs ± 1% +1.26% (p=0.024 n=5+5)
GenericArrayValueByteSlices-10 2.69µs ± 1% 2.71µs ± 2% ~ (p=0.690 n=5+5)
GenericArrayValueStrings-10 2.88µs ± 1% 2.89µs ± 0% ~ (p=0.206 n=5+5)
SelectString-10 28.8µs ± 3% 28.6µs ± 1% ~ (p=0.690 n=5+5)
SelectSeries-10 52.1µs ± 2% 52.0µs ± 1% ~ (p=1.000 n=5+5)
MockSelectString-10 663ns ± 1% 665ns ± 2% ~ (p=1.000 n=5+5)
MockSelectSeries-10 7.12µs ± 1% 7.12µs ± 0% ~ (p=0.690 n=5+5)
PreparedSelectString-10 28.2µs ± 5% 27.0µs ± 1% -4.47% (p=0.008 n=5+5)
PreparedSelectSeries-10 45.1µs ± 1% 44.7µs ± 1% -0.86% (p=0.032 n=5+5)
MockPreparedSelectString-10 336ns ± 1% 342ns ± 3% +1.68% (p=0.016 n=5+5)
MockPreparedSelectSeries-10 6.77µs ± 1% 6.79µs ± 0% ~ (p=0.310 n=5+5)
EncodeInt64-10 22.6ns ± 0% 22.7ns ± 1% ~ (p=0.579 n=5+5)
EncodeFloat64-10 65.0ns ± 2% 64.6ns ± 1% ~ (p=0.548 n=5+5)
EncodeByteaHex-10 78.5ns ± 2% 80.9ns ± 4% +3.07% (p=0.016 n=5+5)
EncodeByteaEscape-10 125ns ± 1% 125ns ± 1% ~ (p=0.508 n=5+5)
EncodeBool-10 14.8ns ± 0% 14.9ns ± 1% ~ (p=0.143 n=4+5)
EncodeTimestamptz-10 264ns ± 1% 264ns ± 1% ~ (p=0.690 n=5+5)
DecodeInt64-10 32.5ns ± 1% 32.6ns ± 1% ~ (p=0.548 n=5+5)
DecodeFloat64-10 43.7ns ± 1% 44.0ns ± 2% ~ (p=0.595 n=5+5)
DecodeBool-10 2.56ns ± 0% 2.55ns ± 1% -0.62% (p=0.032 n=5+5)
DecodeTimestamptz-10 146ns ± 0% 147ns ± 1% ~ (p=0.063 n=5+5)
DecodeTimestamptzMultiThread-10 176ns ± 2% 179ns ± 3% ~ (p=0.222 n=5+5)
LocationCache-10 37.1ns ± 1% 36.7ns ± 1% -1.18% (p=0.008 n=5+5)
LocationCacheMultiThread-10 162ns ± 1% 160ns ± 1% -1.11% (p=0.008 n=5+5)
ResultParsing-10 3.81ms ± 0% 3.81ms ± 0% ~ (p=0.730 n=4+5)
_writeBuf_string-10 1.58ns ± 1% 1.56ns ± 0% -1.70% (p=0.008 n=5+5)
CopyIn-10 309ns ± 4% 308ns ± 2% ~ (p=0.889 n=5+5)
AppendEscapedText-10 2.29µs ± 1% 2.30µs ± 1% ~ (p=0.548 n=5+5)
AppendEscapedTextNoEscape-10 1.00µs ± 0% 1.01µs ± 0% +0.54% (p=0.024 n=5+5)
DecodeUUIDBinary-10 37.8ns ± 1% 38.0ns ± 1% ~ (p=0.095 n=5+5)
This patch also looks promising on my rickover dequeue benchmark (github.com/kevinburke/rickover), which measures how fast I can get rows out of the database. The differences below are not statistically significant at n=5, and I can try to collect more samples, but the mean number of allocations drops from 160 to 155 per operation, so it's reasonable to expect that performance improves as well.
$ benchstat /tmp/old /tmp/new
name old time/op new time/op delta
Dequeue/Dequeue1-10 8.00ms ±10% 7.74ms ± 4% ~ (p=0.421 n=5+5)
name old speed new speed delta
Dequeue/Dequeue1-10 0.00B/s 0.00B/s ~ (all equal)
name old alloc/op new alloc/op delta
Dequeue/Dequeue1-10 12.3kB ±13% 12.1kB ± 2% ~ (p=0.690 n=5+5)
name old allocs/op new allocs/op delta
Dequeue/Dequeue1-10 160 ±13% 155 ± 1% ~ (p=1.000 n=5+5)