Skip to content

Conversation

@olt
Copy link
Contributor

@olt olt commented Oct 31, 2013

See #74

@fdr
Copy link
Member

fdr commented Nov 1, 2013

This is a bizarre and very cute trick. Let me see if I have the contour right:

You use Prepare to attach a brand new implementation/state machine that's hidden in the Stmt type, and then Exec is used on that Stmt value to control it. Is that about right?

@olt
Copy link
Contributor Author

olt commented Nov 1, 2013

@fdr Yes, right. It was the only sane way that I found to get the COPY statement into Go's sql API.

I'm using it for OpenStreetMap imports into Postgres/PostGIS and it works pretty well (https://github.com/omniscale/imposm3). It does 30k/s inserts while the sql part and Postgres use <10% CPU. The rest is data crunching. Before the COPY change the load was nearly shared 50:50 between Imposm and Postgres.

conn.go Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean with "asynchronous" here. Are you referring to the usage of an internal buffer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because of the buffering and also because it reads the responses in a separate goroutine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit misleading and could use some rewording, but I'd like to hear other peoples' inputs as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about this for a bit more, I think the behavior actually fits all the criteria to call at least the error handling part of the feature "asynchronous". I think the documentation could be a bit more clear on what exactly that means in this case. My confusion arose from the use of "and"; my brain didn't connect the comment about Exec with the comment about asynchronicity. But that might just be me :-)

@johto
Copy link
Contributor

johto commented Nov 6, 2013

Thanks for working on this, I think it's looking pretty good in general. The only thing I'm worried about is the ugly interface database/sql is forcing us into. I see three options here:

  1. Go with this one (or a similar interface, but I don't see any clear room for improvement).
  2. Implement something similar to the proposed LISTEN/NOTIFY interface, i.e. force the user to open a new connection just for COPYing in and out of the database. That doesn't sound too bad for the case of bulk loading efficiently, but I'm not sure how well that would work as a general interface.
  3. Wait and hope for (or help) database/sql to implement an interface for grabbing a "session" from the pool.

I don't consider 1) to be unreasonable, but we'd have to support this interface for quite some time, which seems like an unwanted side effect.

Thoughts?

@olt
Copy link
Contributor Author

olt commented Nov 7, 2013

I think I addressed your comments. I hope the documentation is now more clear, I'm not a native speaker though.

To the interface: The benefit from having it integrated in the database/sql API is that you can use the connection and transaction for regular SQL commands as well. The only difference between INSERT and COPY FROM is the prepared SQL statement. Creating a new transaction, creating or truncating a table and preparing the statement is identical. See here where I could change tt.Spec.CopySQL() to tt.Spec.InsertSQL() to switch back to regular inserts.

I think the only wart is the empty Exec() to get all errors. I'll create a ticket for go to see if it's a bug that stmt.Close() does not pass the errors from the underlying statement back, or if this is intended behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about COPY TO (a.k.a. "copy out")? As-is this seems to foreclose doing something about that. Is it credible to use the protocol state returned from the server to determine what direction the COPY is in? If that is done, could statement-sniffing be avoided altogether?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's a different protocol message for both COPY FROM and COPY TO, and using that to realize we're doing a COPY would probably be an improvement over the current approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that vein, to prove it can be done, I'd like to see the other COPY (out) direction die in a lucid manner. Maybe with a test, if it's not herculean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now returns "COPY TO is not supported": olt@0bdf7d3
Before that the error was "unknown response for copy query: 'H'"

I can check if I can integrate it into prepareTo and then use the server response to decide if it's a COPY FROM. Not sure if this will work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a nice thing to confirm, even if COPY ... TO STDOUT is but a stub. What would the interface look like anyway? I could only imagine .Exec() (of no arguments) would yield an opaque []byte of the copy-out data record.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In-app pooling like database/sql. Now pooling is obviously not a concern for us here, but I don't think we should assume that the client gets to work with our implementations of datbase/sql/driver interfaces directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not, and would it help if the type assertion was to an interface, not a concrete type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now pooling is obviously not a concern for us here, but I don't think we should assume that the client gets to work with our implementations of datbase/sql/driver interfaces directly.

Why not, and would it help if the type assertion was to an interface, not a concrete type?

Because the implementation in database/sql can, as far as I can tell, interpose its own types wherever interfaces are returned and delegate to the concrete types of a given database/sql/driver implementation. So no, interfaces won't help.

I'm not saying this won't work, but I don't believe it's guaranteed to always work in the future, since it's depending on specific implementation semantics.of the database/sql framework.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having access to our driver.Result interface would allow an API where lib/pq would abstract all the nasty stuff (casting of Result, etc.). But the database/sql API is clear that it returns sql.Result and not driver.Result, so there is the danger that it will change in future Go releases.

Given the restrictions here, could we try to bullet-proof that by doing something like

db.Prepare(pq.CopyIn("table_name", "col1", "col2"))

I like that. How should we handle quoting of table and column names? We could quote everything when building the COPY statement, but how do we handle Postgres schemas then? Schema names and table names need to be quoted separately ("schema"."table").

Just quote everything and permit hack like:

    db.Prepare(pq.CopyIn("schema\".\"table_name", "col1", "col2"))

Or only quote if a string is not quoted already?

    db.Prepare(pq.CopyIn("\"schema\".\"table_name\"", "col1", "col2"))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. Hadn't thought of schemas. Given that they're somewhat less common, maybe just do a separate function for that? E.g., pq.CopyIn and pq.CopyInSchema. This is a little gross, but the other options seem kinda sketchy (i.e., we should not assume any special meanings for any characters in the identifiers).

@msakrejda
Copy link
Contributor

Could you please rebase when you have a chance? Just got back from vacation and merged a few unrelated fixes. As this is biggish, it'd be great to discuss it as a clean patch. Thanks!

For what it's worth, I actually kind of like @fdr's idea of using an io.Reader to provide data, but I think that can build on this as a later patch.

I do think this only handles some of the use cases of COPY, but I believe there's value to this, and it's a clever approach. Thanks for the patch.

@olt
Copy link
Contributor Author

olt commented Nov 14, 2013

I've added CopyIn and CopyInSchema and updated the docs.

I've also created a ticket for the 'sql.Stmt.Close does not return errors from driver.Stmt.Close' issue to get an official statement: http://code.google.com/p/go/issues/detail?id=6734 Maybe we are lucky and it gets fixed in the future.

@msakrejda
Copy link
Contributor

So I'm reasonably happy with this at this point. It's a hack, but it exposes a handy feature of Postgres, and the io.Reader approach would be a reasonable mechanism to extend this to existing already-formatted data. We should be able to hack an output mechanism as well using similar semantics. I think we should go ahead with this. Any other thoughts or objections?

@teburd
Copy link

teburd commented Dec 4, 2013

I'm already using this myself, and its great, I did however find I needed to change one thing shown here...

https://github.com/treetopllc/libpq/commit/ea2b0f61dd30b5d80a2a141eedaa7969c9f2d29b

I'm not entirely sure what the response is from or caused by, but found that if I didn't catch that case bad stuff happened, if I match it and ignore it the copy succeeds just fine

@johto
Copy link
Contributor

johto commented Dec 4, 2013

I'm already using this myself, and its great

Thanks for testing!

I did however find I needed to change one thing shown here...

treetopllc/pq@ea2b0f6

I'm not entirely sure what the response is from or caused by, but found that if I didn't catch that case bad stuff happened, if I match it and ignore it the copy succeeds just fine

Callers or recv1 don't need to ignore 'N' as of 7454a98, so rebasing against master should fix this issue.

@olt
Copy link
Contributor Author

olt commented Dec 12, 2013

I just did a rebase on the current master.

copy.go Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recv1() already ignores these, so you shouldn't try to handle them here.

@olt
Copy link
Contributor Author

olt commented Dec 12, 2013

@johto I removed the check.

@msakrejda
Copy link
Contributor

Rebased, squashed, and merged. Thanks!

@msakrejda msakrejda closed this Dec 15, 2013
@johto
Copy link
Contributor

johto commented Dec 16, 2013

@deafbybeheading:

Looks like this patch broke the tests against 8.4. This seems to be the cause:

    case []byte:
        return append(buf, fmt.Sprintf("\\\\x%x", v)...)

Passing parameterStatus and calling encode() instead should fix this, I think.

@msakrejda
Copy link
Contributor

This is in appendEncodedText, right? It'd be great if that could share a code path with encode. Passing parameter status will make Demeter spin in her grave:

ci.buffer = appendEncodedText(ci.conn.parameterStatus.serverVersion, ci.buffer, value)

but I don't see a simpler fix.

@msakrejda
Copy link
Contributor

Actually, I guess we're passing the whole parameterStatus struct, so it's not quite as bad.

@johto
Copy link
Contributor

johto commented Dec 16, 2013

This is in appendEncodedText, right? It'd be great if that could share a code path with encode.

Naively thinking I would expect both COPY and the "normal" interface to use encode(), but I have to admit I haven't looked at the code to see if that's actually possible.

@msakrejda
Copy link
Contributor

Yeah, or that--I think the idea is to append the text to an existing buffer and avoid copying where possible. It's nice for performance but not so nice for the code.

@johto
Copy link
Contributor

johto commented Dec 16, 2013

Ah. Can't we change the interface for encode() to do that instead? I think with COPY performance is often very important.

@msakrejda
Copy link
Contributor

Yeah, I think we can fix this one way or another. Unfortunately, I ran out of time for looking at this right now. I got this: msakrejda/pq@lib:master...broken-fix-8.4-copy but it fails some tests so my approach may be wrong. I can revisit tonight.

@johto
Copy link
Contributor

johto commented Dec 16, 2013

@deafbybeheading: Thanks, I might take a look a bit later.

@msakrejda
Copy link
Contributor

Fixed. The appendEncodedText method actually also escapes backslashes for COPY, so I had to add that. Thanks for pointing it out.

@johto
Copy link
Contributor

johto commented Dec 17, 2013

I think on 8.4 it needs to go through the entire appendEscapedText() treatment. That'll be a bit more costly performance-wise, though. Or one could try and add a boolean to escapeBytea() to properly escape the special characters for COPY.

I'm hoping we can use binary mode in the future for the normal code path, though, so I'm not sure if spending much work on escapeBytea() right now is justified. Especially only to to make COPY faster on 8.4, both of which were unsupported until a week ago.

@msakrejda
Copy link
Contributor

Oh hmm, yeah. Hopefully this will do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants