feat!(insert): RowBinaryWithNamesAndTypes
#244
Conversation
Perhaps we should do it as a follow-up.
Added mocked_insert benchmark with validation, results are a bit odd:
This indeed fixes #266.
```rust
// Compare the number of columns in the database schema with the number of
// fields in the Rust struct, and panic with a schema hint if they differ.
if columns.len() != T::COLUMN_NAMES.len() {
    panic!(
        "While processing struct {}: database schema has {} columns, \
         but the struct definition has {} fields.\
         \n#### All struct fields:\n{}\n#### All schema columns:\n{}",
        T::NAME,
        columns.len(),
        T::COLUMN_NAMES.len(),
        join_panic_schema_hint(T::COLUMN_NAMES),
        join_panic_schema_hint(&columns),
    );
}
```
This will break inserts that leave out columns with default values, and, relatedly, it will make it impossible to insert into tables with `MATERIALIZED` or `ALIAS` columns altogether. Perhaps this function should use the `DESCRIBE TABLE` output rather than the `RowBinaryWithNamesAndTypes` header?
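A minimal illustration of the concern, using a hypothetical table and row type (not taken from this PR): the struct legitimately omits the `MATERIALIZED` column, so a strict count comparison against the full column list rejects a perfectly valid insert.

```rust
use clickhouse::Row;
use serde::Serialize;

// Hypothetical schema: `message_len` is MATERIALIZED and therefore must NOT
// be sent by the client.
//
//   CREATE TABLE events (
//       id          UInt64,
//       message     String,
//       message_len UInt64 MATERIALIZED length(message)
//   ) ENGINE = MergeTree ORDER BY id;

#[allow(dead_code)]
#[derive(Row, Serialize)]
struct Event {
    id: u64,
    message: String,
}

// If validation counts all three table columns against the two struct fields,
// a check like `columns.len() != T::COLUMN_NAMES.len()` panics even though
// `INSERT INTO events (id, message)` is valid. `DESCRIBE TABLE events`
// reports `default_type = MATERIALIZED` for `message_len`, which would let
// validation skip such columns.
```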
Thanks, this is a good find; it's also worth testing with both the current and the updated version.
```rust
// Before:
let mut insert = client.insert("some")?;
// After:
let mut insert = client.insert::<MyRow>("some").await?;
```
Instead of `rust,ignore` we could use `rust,no_run` on these code fences, so that they at least get compile-tested. I could take that on as a task.
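For reference, a hypothetical doc comment showing the difference: `ignore` skips the snippet entirely, while `no_run` compiles it under `cargo test --doc` without executing it, so API breakage is still caught. The row type and table name below are placeholders.

```rust
/// Inserting a row (illustrative only).
///
/// ```rust,no_run
/// # use clickhouse::Client;
/// # #[derive(clickhouse::Row, serde::Serialize)]
/// # struct MyRow { id: u64 }
/// # async fn demo(client: Client) -> clickhouse::error::Result<()> {
/// let mut insert = client.insert::<MyRow>("some").await?;
/// insert.write(&MyRow { id: 42 }).await?;
/// insert.end().await?;
/// # Ok(())
/// # }
/// ```
pub fn example_in_docs() {}
```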
Can be done as a part of finalizing #160
```rust
        // …end of a CREATE TABLE statement embedded in a raw string literal:
        )
        ENGINE = MergeTree
        PRIMARY KEY (instrument_id, received_time)
        "#,
```
Using raw string literals is also idiomatic in SQLx, though strictly speaking it's only necessary when you need to quote identifiers, since it avoids escapes: https://docs.rs/sqlx/latest/sqlx/macro.query.html#type-overrides-output-columns
Since it appears ClickHouse uses backticks instead of double quotes for quoted identifiers, there's not much benefit to using raw string literals here.
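For context, a small standalone example of what raw string literals buy you when identifiers are quoted with double quotes (the identifier names here are made up):

```rust
fn main() {
    // Without a raw string, every double quote in the SQL must be escaped.
    let escaped = "SELECT \"user id\" FROM \"events\"";

    // A raw string literal avoids the escapes entirely.
    let raw = r#"SELECT "user id" FROM "events""#;

    assert_eq!(escaped, raw);

    // With backtick-quoted identifiers, a plain string literal needs no
    // escaping in the first place.
    let backticks = "SELECT `user id` FROM `events`";
    println!("{escaped}\n{raw}\n{backticks}");
}
```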
```rust
/// Cache for [`RowMetadata`] to avoid allocating it for the same struct more than once
```
Something to be careful about with this kind of caching is that the user may make concurrent changes to the table, which invalidate the cached metadata. This could easily happen if they run migrations during a rolling upgrade of an application.
We've had problems with this in SQLx with regard to prepared statement caching.
This isn't necessarily a bad idea, but at the very least you should provide a method for the user to manually invalidate the cache.
Indeed! I think ideally we should:
- provide a TTL (maybe one or a few hours by default)
- provide an ability to manually drop the caches
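A minimal sketch of such a cache (hypothetical names and structure, not the crate's actual implementation; fetching is shown as a synchronous closure for brevity): entries expire after a TTL and can also be dropped manually, e.g. right after running migrations.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Stand-in for the crate's per-struct metadata.
struct RowMetadata;

#[derive(Clone)]
struct MetadataCache {
    ttl: Duration,
    entries: Arc<Mutex<HashMap<TypeId, (Instant, Arc<RowMetadata>)>>>,
}

impl MetadataCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: Arc::new(Mutex::new(HashMap::new())) }
    }

    /// Returns cached metadata for `T`, or stores the value produced by
    /// `fetch` if the entry is missing or older than the TTL.
    fn get_or_insert_with<T: 'static>(
        &self,
        fetch: impl FnOnce() -> RowMetadata,
    ) -> Arc<RowMetadata> {
        let mut entries = self.entries.lock().unwrap();
        let now = Instant::now();
        match entries.get(&TypeId::of::<T>()) {
            Some((stored_at, meta)) if now.duration_since(*stored_at) < self.ttl => {
                Arc::clone(meta)
            }
            _ => {
                let meta = Arc::new(fetch());
                entries.insert(TypeId::of::<T>(), (now, Arc::clone(&meta)));
                meta
            }
        }
    }

    /// Manual invalidation, e.g. after schema migrations.
    fn clear(&self) {
        self.entries.lock().unwrap().clear();
    }
}
```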
```rust
// https://clickhouse.com/docs/en/sql-reference/syntax#identifiers
let sql = format!("INSERT INTO {table}({fields}) FORMAT RowBinary");
let format = if row_metadata.is_some() {
    "RowBinaryWithNamesAndTypes"
    // …
```
I'd like to see the format types defined as an enum, or at least as constants somewhere. That's something I could do. If we want the user to be able to specify formats that we don't know about (e.g. if new ones get added), there are options for that.
Currently, this is internal only; we can add some popular formats as an enum provided by the crate. This might be useful with `fetch_bytes` (https://github.com/ClickHouse/clickhouse-rs/blob/main/examples/stream_arbitrary_format_rows.rs) and its INSERT counterpart that is yet to be implemented: #174

So perhaps it can indeed be introduced in the scope of #174. However, it should implement `Into<String>` and not break the existing API expecting an `impl Into<String>`, because we want to export these enums as just a hint of what is available, and this library might not be in sync with the latest ClickHouse release that adds a new format, which the end user might want to use ASAP.
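A rough sketch of what such an enum could look like while keeping the existing `impl Into<String>` API intact (names are illustrative; a `Custom` variant covers formats added in newer ClickHouse releases):

```rust
/// A non-exhaustive hint of available formats.
#[allow(dead_code)]
enum Format {
    RowBinary,
    RowBinaryWithNamesAndTypes,
    JSONEachRow,
    /// Escape hatch for formats this crate doesn't know about yet.
    Custom(String),
}

impl From<Format> for String {
    fn from(format: Format) -> Self {
        match format {
            Format::RowBinary => "RowBinary".to_owned(),
            Format::RowBinaryWithNamesAndTypes => "RowBinaryWithNamesAndTypes".to_owned(),
            Format::JSONEachRow => "JSONEachRow".to_owned(),
            Format::Custom(name) => name,
        }
    }
}

// A hypothetical function taking `impl Into<String>` keeps accepting plain
// strings and now also accepts `Format`:
fn format_demo(format: impl Into<String>) {
    let format: String = format.into();
    println!("FORMAT {format}");
}

fn main() {
    format_demo("CSVWithNames");                         // still works
    format_demo(Format::RowBinaryWithNamesAndTypes);     // enum hint
    format_demo(Format::Custom("SomeNewFormat".into())); // future formats
}
```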
Let's squash this one as it became too big and continue in the follow-ups. I created a few issues based on the feedback in the comments:
These issues should be addressed before 0.14.0. We can release #294 (changing the panic to an error instead) as 0.14.1, since it is not a breaking change.
Summary
Implements RBWNAT support for insert and inserter.
Unfortunately, there will be quite a lot of breaking changes. Specifically, we absolutely need to make `Query::insert` async so we can properly fetch the schema, as we need to put the RBWNAT header in the stream before the first record. This also implies making `Inserter::write` async too (but at least this is aligned with how `insert` works already). On the flip side, the useless `Result` types from the `new` constructors for `insert` and `inserter` are now removed.
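A before/after sketch of the user-facing change (struct, field, and table names are placeholders):

```rust
use clickhouse::{Client, Row};
use serde::Serialize;

#[derive(Row, Serialize)]
struct MyRow {
    id: u64,
    name: String,
}

async fn demo(client: &Client) -> clickhouse::error::Result<()> {
    // Before: `insert` was synchronous.
    // let mut insert = client.insert("some")?;

    // After: `insert` is async, since the table schema has to be fetched
    // so that the RowBinaryWithNamesAndTypes header can be written to the
    // stream ahead of the first record.
    let mut insert = client.insert::<MyRow>("some").await?;
    insert.write(&MyRow { id: 1, name: "foo".into() }).await?;
    insert.end().await?;
    Ok(())
}
```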
Current progress:
Related issues
A checkmark means that we have all necessary integration tests in place.
Resolved after this PR
- RowBinaryWithNamesAndTypes #10
- FixedString type #49
- DB::Exception: Cannot read all data. Bytes read: N. Bytes expected: N #266

Previously closed issues
These issues were closed due to a schema mismatch in the code, but should now produce clearer error (panic) messages:
- CANNOT_READ_ALL_DATA error when serializing Nested types with Map fields #214

Wrapping up RBWNAT
After #221, we had a few leftovers:
- BTreeMap, IndexMap, etc.
- Mock API after all the breaking changes

Follow-ups
The flatten attribute is non-trivial and has been extracted as #264 to reduce scope; it will be implemented later.