-
Notifications
You must be signed in to change notification settings - Fork 824
row base serializer #5791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
row base serializer #5791
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
Thanks for the contribution! Please review the labels and make any necessary changes. |
enum_dispath is not IDE friendly, both do not support associate const( can be used in |
8020827
to
79ca832
Compare
@mergify update |
✅ Branch has been successfully updated |
Let's make life easier, enum_dispatch is not needed. |
8060016
to
289fec2
Compare
the serializer is used in limited places, not too bothering, I prefer keep it for now |
c95a8a9
to
fddffdc
Compare
https://www.db-fiddle.com/f/abKfoifJD29jCXVgBNnZap/0 MySQL outputs 1, but I think we do not need to follow MySQL strictly. |
I prefer always use
|
have some problem to use WriteFloatOptions
|
} | ||
} | ||
|
||
fn create_serializer_inner<'a>(&self, _col: &'a ColumnRef) -> Result<TypeSerializerImpl<'a>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used to for normal column( non-const)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_serializer_inner use for impl
create_serializer as interface
oops, SQL logic tests take effect, need to update expect the results for floats too |
Ok(Arc::new(ArrayColumnData::create(inner_data, offsets))) | ||
} | ||
|
||
fn serialize_json(&self, format: &FormatSettings) -> Result<Vec<Value>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So row based api not works on serialize_json
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serializer can has API for both row based and col-based(clickhouse, json::Value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format choose to use which
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn block_to_json_value(
block: &DataBlock,
format: &FormatSettings,
) -> Result<Vec<Vec<JsonValue>>> {
let cols = block_to_json_value_columns(block, format)?;
Ok(transpose(cols))
}
Since we need to transpose, so serialize_json
is row based ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess column-based may be faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's not zero alloc in the transpose function. So row based is faster in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's not zero alloc in the transpose function. So row based is faster in this case.
make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may I refactor it later in another pr later, this pr is large and need to change new-added tests case, hope it to merge early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also need add formats based on this
@sundy-li @zhang2014 rebase only to include new test cases, the original commits are not changed |
@sundy-li @zhang2014 all tests passed (except a build), please review |
/LGTM |
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
add new trait and trait functions so I can replace them increasingly
serialization problem with "f32 as f64" #5863
Conclusions of prev bench results:
Compared with row base impl in this pr:
with row base impl
enum_dispath
is 10% faster thandyn dispatch
@sundy-li if you vote for enum_dispath, I will delete the
dyn dispatch
code, and go on with other typessome notes about serializer:
.0
of float by default unless use it with WriteFloatOptions, since we now use the same float str format in all output formats (need more effort to specialize for each format). it is better to keep.0
to make it easy for other applicaitons to detect the schema for json, csv .etc.Changelog
Related Issues
Fixes #issue