Swap driver.Value for driver.NamedValue in internal APIs #1067

@kevinburke

The new QueryContext and ExecContext APIs both take a []driver.NamedValue instead of a []driver.Value. Because pq uses driver.Value internally, the first thing both APIs do is copy the arguments:

// Implement the "StmtExecContext" interface
func (st *stmt) ExecContext(ctx context.Context, args []driver.NamedValue) (driver.Result, error) {
	list := make([]driver.Value, len(args))
	for i, nv := range args {
		list[i] = nv.Value
	}

This means that every call to this function with arguments allocates. Note also that database/sql prefers QueryContext when the driver implements it, so every query from database/sql now goes through this code path:

// queryDC executes a query on the given connection.
// The connection gets released by the releaseConn function.
// The ctx context is from a query method and the txctx context is from an
// optional transaction context.
func (db *DB) queryDC(ctx, txctx context.Context, dc *driverConn, releaseConn func(error), query string, args []interface{}) (*Rows, error) {
	queryerCtx, ok := dc.ci.(driver.QueryerContext)
	var queryer driver.Queryer
	if !ok {
		queryer, ok = dc.ci.(driver.Queryer)
	}
	if ok {
		var nvdargs []driver.NamedValue
		var rowsi driver.Rows
		var err error
		withLock(dc, func() {
			nvdargs, err = driverArgsConnLocked(dc.ci, nil, args)
			if err != nil {
				return
			}
			rowsi, err = ctxDriverQuery(ctx, queryerCtx, queryer, query, nvdargs)
		})
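To make the dispatch order concrete, here is a minimal sketch of the same type-assertion pattern outside of database/sql. The types legacyConn and ctxConn and the helper queryPath are hypothetical names invented for illustration; only the driver.Queryer and driver.QueryerContext interfaces come from the standard library.

```go
package main

import (
	"context"
	"database/sql/driver"
	"fmt"
)

// legacyConn is a hypothetical connection that implements only the older
// driver.Queryer interface, which takes []driver.Value.
type legacyConn struct{}

func (legacyConn) Query(query string, args []driver.Value) (driver.Rows, error) {
	return nil, driver.ErrSkip
}

// ctxConn is a hypothetical connection that implements both interfaces,
// as pq does. Query is promoted from the embedded legacyConn.
type ctxConn struct{ legacyConn }

func (ctxConn) QueryContext(ctx context.Context, query string, args []driver.NamedValue) (driver.Rows, error) {
	return nil, driver.ErrSkip
}

// queryPath reports which interface a database/sql-style dispatcher would
// pick: QueryerContext first, then Queryer, then a prepared statement.
func queryPath(ci interface{}) string {
	if _, ok := ci.(driver.QueryerContext); ok {
		return "QueryerContext"
	}
	if _, ok := ci.(driver.Queryer); ok {
		return "Queryer"
	}
	return "prepared statement fallback"
}

func main() {
	fmt.Println(queryPath(ctxConn{}))    // QueryerContext
	fmt.Println(queryPath(legacyConn{})) // Queryer
}
```

Because a driver that implements QueryContext is always routed through it, any per-call cost inside that method is paid on every query.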

If all of pq's internal APIs used driver.NamedValue instead of driver.Value, this copy would be unnecessary, saving an allocation in the most common case.

The patch implemented here: https://github.com/kevinburke/pq/compare/named-value?expand=1 improves the PreparedSelectString benchmark by about 4% and PreparedSelectSeries by about 1% on my Mac (the rest of the results appear to be noise):

name                                  old time/op    new time/op    delta
BoolArrayScanBytes-10                    530ns ± 1%     530ns ± 1%    ~     (p=0.548 n=5+5)
BoolArrayValue-10                       66.3ns ± 1%    66.7ns ± 0%    ~     (p=0.095 n=5+5)
ByteaArrayScanBytes-10                   980ns ± 2%     976ns ± 1%    ~     (p=0.690 n=5+5)
ByteaArrayValue-10                       279ns ± 2%     281ns ± 2%    ~     (p=0.310 n=5+5)
Float64ArrayScanBytes-10                 960ns ± 1%     956ns ± 4%    ~     (p=0.310 n=5+5)
Float64ArrayValue-10                     969ns ± 2%     966ns ± 1%    ~     (p=0.421 n=5+5)
Int64ArrayScanBytes-10                   626ns ± 1%     624ns ± 1%    ~     (p=0.421 n=5+5)
Int64ArrayValue-10                       483ns ± 2%     485ns ± 2%    ~     (p=0.841 n=5+5)
Float32ArrayScanBytes-10                 941ns ± 1%     950ns ± 2%    ~     (p=0.246 n=5+5)
Float32ArrayValue-10                     654ns ± 0%     663ns ± 2%  +1.36%  (p=0.016 n=5+5)
Int32ArrayScanBytes-10                   623ns ± 1%     619ns ± 1%    ~     (p=0.246 n=5+5)
Int32ArrayValue-10                       328ns ± 1%     333ns ± 1%  +1.53%  (p=0.008 n=5+5)
StringArrayScanBytes-10                 1.36µs ±10%    1.33µs ± 1%    ~     (p=0.579 n=5+5)
StringArrayValue-10                     2.51µs ± 2%    2.58µs ± 9%    ~     (p=0.421 n=5+5)
GenericArrayScanScannerSliceBytes-10    2.62µs ± 1%    2.66µs ± 3%    ~     (p=0.095 n=5+5)
GenericArrayValueBools-10                642ns ± 1%     647ns ± 1%    ~     (p=0.151 n=5+5)
GenericArrayValueFloat64s-10            1.91µs ± 1%    1.89µs ± 1%    ~     (p=0.151 n=5+5)
GenericArrayValueInt64s-10              1.09µs ± 0%    1.11µs ± 1%  +1.26%  (p=0.024 n=5+5)
GenericArrayValueByteSlices-10          2.69µs ± 1%    2.71µs ± 2%    ~     (p=0.690 n=5+5)
GenericArrayValueStrings-10             2.88µs ± 1%    2.89µs ± 0%    ~     (p=0.206 n=5+5)
SelectString-10                         28.8µs ± 3%    28.6µs ± 1%    ~     (p=0.690 n=5+5)
SelectSeries-10                         52.1µs ± 2%    52.0µs ± 1%    ~     (p=1.000 n=5+5)
MockSelectString-10                      663ns ± 1%     665ns ± 2%    ~     (p=1.000 n=5+5)
MockSelectSeries-10                     7.12µs ± 1%    7.12µs ± 0%    ~     (p=0.690 n=5+5)
PreparedSelectString-10                 28.2µs ± 5%    27.0µs ± 1%  -4.47%  (p=0.008 n=5+5)
PreparedSelectSeries-10                 45.1µs ± 1%    44.7µs ± 1%  -0.86%  (p=0.032 n=5+5)
MockPreparedSelectString-10              336ns ± 1%     342ns ± 3%  +1.68%  (p=0.016 n=5+5)
MockPreparedSelectSeries-10             6.77µs ± 1%    6.79µs ± 0%    ~     (p=0.310 n=5+5)
EncodeInt64-10                          22.6ns ± 0%    22.7ns ± 1%    ~     (p=0.579 n=5+5)
EncodeFloat64-10                        65.0ns ± 2%    64.6ns ± 1%    ~     (p=0.548 n=5+5)
EncodeByteaHex-10                       78.5ns ± 2%    80.9ns ± 4%  +3.07%  (p=0.016 n=5+5)
EncodeByteaEscape-10                     125ns ± 1%     125ns ± 1%    ~     (p=0.508 n=5+5)
EncodeBool-10                           14.8ns ± 0%    14.9ns ± 1%    ~     (p=0.143 n=4+5)
EncodeTimestamptz-10                     264ns ± 1%     264ns ± 1%    ~     (p=0.690 n=5+5)
DecodeInt64-10                          32.5ns ± 1%    32.6ns ± 1%    ~     (p=0.548 n=5+5)
DecodeFloat64-10                        43.7ns ± 1%    44.0ns ± 2%    ~     (p=0.595 n=5+5)
DecodeBool-10                           2.56ns ± 0%    2.55ns ± 1%  -0.62%  (p=0.032 n=5+5)
DecodeTimestamptz-10                     146ns ± 0%     147ns ± 1%    ~     (p=0.063 n=5+5)
DecodeTimestamptzMultiThread-10          176ns ± 2%     179ns ± 3%    ~     (p=0.222 n=5+5)
LocationCache-10                        37.1ns ± 1%    36.7ns ± 1%  -1.18%  (p=0.008 n=5+5)
LocationCacheMultiThread-10              162ns ± 1%     160ns ± 1%  -1.11%  (p=0.008 n=5+5)
ResultParsing-10                        3.81ms ± 0%    3.81ms ± 0%    ~     (p=0.730 n=4+5)
_writeBuf_string-10                     1.58ns ± 1%    1.56ns ± 0%  -1.70%  (p=0.008 n=5+5)
CopyIn-10                                309ns ± 4%     308ns ± 2%    ~     (p=0.889 n=5+5)
AppendEscapedText-10                    2.29µs ± 1%    2.30µs ± 1%    ~     (p=0.548 n=5+5)
AppendEscapedTextNoEscape-10            1.00µs ± 0%    1.01µs ± 0%  +0.54%  (p=0.024 n=5+5)
DecodeUUIDBinary-10                     37.8ns ± 1%    38.0ns ± 1%    ~     (p=0.095 n=5+5)

This patch also appears to improve performance on my rickover dequeue benchmark (github.com/kevinburke/rickover), which measures how fast I can get rows out of the database. The differences below are not statistically significant yet, and I can try to get significant results, but the mean number of allocations drops (160 to 155 per op), so it's reasonable to assume that performance is also improved.

$ benchstat /tmp/old /tmp/new
name                 old time/op    new time/op    delta
Dequeue/Dequeue1-10    8.00ms ±10%    7.74ms ± 4%   ~     (p=0.421 n=5+5)

name                 old speed      new speed      delta
Dequeue/Dequeue1-10   0.00B/s        0.00B/s        ~     (all equal)

name                 old alloc/op   new alloc/op   delta
Dequeue/Dequeue1-10    12.3kB ±13%    12.1kB ± 2%   ~     (p=0.690 n=5+5)

name                 old allocs/op  new allocs/op  delta
Dequeue/Dequeue1-10       160 ±13%       155 ± 1%   ~     (p=1.000 n=5+5)
