-
Notifications
You must be signed in to change notification settings - Fork 0
feat: support for other data types #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dopiera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 14 unresolved discussions (waiting on @dopiera and @kboroszko)
tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 34 at r1 (raw file):
tensorflow::error::Code GoogleCloudErrorCodeToTfErrorCode( ::google::cloud::StatusCode code) {
Why these formatting changes?
tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 281 at r1 (raw file):
client_resource_(*client_resource), client_resource_unref_(client_resource), row_set_resource_(*row_set_resource),
What do we need it for? I don't see any uses. Besides, if we're storing this, we should increase the ref counter, no?
tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 351 at r1 (raw file):
Quoted 5 lines of code…
io::BigtableRowSetResource* row_set_resource; OP_REQUIRES_OK(ctx, GetResourceFromContext(ctx, "row_set", &row_set_resource)); core::ScopedUnref row_set_resource_unref_(row_set_resource);
This seems unrelated to the data types, no?
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 1 at r1 (raw file):
/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.
Please add a comment explaining that we're using XDR because this seems consistent with what HBase is using.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 16 at r1 (raw file):
==============================================================================*/ #ifndef BIGTABLE_LIB_H
That's a very generic file name compared with its very specific contents.
How about we call it along the lines of serialization.h?
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 20 at r1 (raw file):
Quoted 12 lines of code…
#include "absl/memory/memory.h" #include "google/cloud/bigtable/table.h" #include "google/cloud/bigtable/table_admin.h" #include "rpc/rpc.h" /* xdr is a sub-library of rpc */ #include "tensorflow/core/framework/common_shape_fns.h" #include "tensorflow/core/framework/dataset.h" #include "tensorflow/core/framework/op.h" #include "tensorflow/core/framework/op_kernel.h" #include "tensorflow/core/framework/resource_mgr.h" #include "tensorflow/core/framework/resource_op_kernel.h" #include "tensorflow_io/core/kernels/bigtable/bigtable_resource_kernel.h" #include "tensorflow_io/core/kernels/bigtable/bigtable_row_range.h"
Vast majority of these headers doesn't seem needed - please remove the unnecessary.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 22 at r1 (raw file):
rpc
then maybe rpc/xdr.h?
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 33 at r1 (raw file):
Quoted 5 lines of code…
using ::tensorflow::DT_STRING; using ::tensorflow::PartialTensorShape; using ::tensorflow::Status; namespace cbt = ::google::cloud::bigtable;
Please don't do this in header files.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 40 at r1 (raw file):
namespace tensorflow { namespace data { namespace {
Anonymous namespaces in headers is are fishy at best. Every compilation unit including this file will have their own copy of these functions. I don't think it makes any sense.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 42 at r1 (raw file):
namespace { std::string FloatToBytes(float v) {
When you define functions in a header file, you should make them inline - otherwise every compilation unit will compile them.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 47 at r1 (raw file):
xdrmem_create(&xdrs, buffer, sizeof(v), XDR_ENCODE); if (!xdr_float(&xdrs, &v)) { throw std::runtime_error("Error writing float to byte array.");
I'm afraid we should change the error reporting to StatusOr in TF. I guess the status should be "invalid argument".
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):
If an non-templated function is defined in a header file it should be inline.
BUT, Google's style guide says:
A decent rule of thumb is to not inline a function if it is more than 10 lines long.
I think it's fine if per-datatype functions are inline, but this one shouldn't.
Long story short - please move all these functions to a .cc file and make the others inline but not this one.
I was about to also say that we should also avoid calling this switch for every cell but given that we're doing a lookup in a hashmap anyway, I think this extra cost is negligible and we can leave the code structure as is.
tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 40 at r1 (raw file):
) def parallel_read_rows(
This seems to belong to a different PR.
tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 82 at r1 (raw file):
row_set: row_set.RowSet,
The row_set part seems to be from a different PR.
kboroszko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 14 unresolved discussions (waiting on @dopiera and @kboroszko)
tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 34 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Why these formatting changes?
linter
tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 281 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
What do we need it for? I don't see any uses. Besides, if we're storing this, we should increase the ref counter, no?
Done.
tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 351 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
io::BigtableRowSetResource* row_set_resource; OP_REQUIRES_OK(ctx, GetResourceFromContext(ctx, "row_set", &row_set_resource)); core::ScopedUnref row_set_resource_unref_(row_set_resource);This seems unrelated to the data types, no?
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 16 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
That's a very generic file name compared with its very specific contents.
How about we call it along the lines of
serialization.h?
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 20 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
#include "absl/memory/memory.h" #include "google/cloud/bigtable/table.h" #include "google/cloud/bigtable/table_admin.h" #include "rpc/rpc.h" /* xdr is a sub-library of rpc */ #include "tensorflow/core/framework/common_shape_fns.h" #include "tensorflow/core/framework/dataset.h" #include "tensorflow/core/framework/op.h" #include "tensorflow/core/framework/op_kernel.h" #include "tensorflow/core/framework/resource_mgr.h" #include "tensorflow/core/framework/resource_op_kernel.h" #include "tensorflow_io/core/kernels/bigtable/bigtable_resource_kernel.h" #include "tensorflow_io/core/kernels/bigtable/bigtable_row_range.h"Vast majority of these headers doesn't seem needed - please remove the unnecessary.
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 22 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
rpcthen maybe
rpc/xdr.h?
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 33 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
using ::tensorflow::DT_STRING; using ::tensorflow::PartialTensorShape; using ::tensorflow::Status; namespace cbt = ::google::cloud::bigtable;Please don't do this in header files.
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 40 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Anonymous namespaces in headers is are fishy at best. Every compilation unit including this file will have their own copy of these functions. I don't think it makes any sense.
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 42 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
When you define functions in a header file, you should make them inline - otherwise every compilation unit will compile them.
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 47 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I'm afraid we should change the error reporting to
StatusOrin TF. I guess the status should be "invalid argument".
Good point.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
If an non-templated function is defined in a header file it should be inline.
BUT, Google's style guide says:
A decent rule of thumb is to not inline a function if it is more than 10 lines long.
I think it's fine if per-datatype functions are inline, but this one shouldn't.
Long story short - please move all these functions to a
.ccfile and make the others inline but not this one.I was about to also say that we should also avoid calling this switch for every cell but given that we're doing a lookup in a hashmap anyway, I think this extra cost is negligible and we can leave the code structure as is.
Oh goodie.
tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 40 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
This seems to belong to a different PR.
damn it!
dopiera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you responded to the comments but forgot to push the changes (or at least I can't see them in Reviewable).
Reviewable status: 0 of 5 files reviewed, 13 unresolved discussions (waiting on @dopiera and @kboroszko)
kboroszko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 9 files reviewed, 13 unresolved discussions (waiting on @dopiera)
tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 82 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
row_set: row_set.RowSet,The
row_setpart seems to be from a different PR.
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 1 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.Please add a comment explaining that we're using XDR because this seems consistent with what HBase is using.
Done.
tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):
Previously, kboroszko (Kajetan Boroszko) wrote…
Oh goodie.
Done.
dopiera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 6 unresolved discussions (waiting on @dopiera and @kboroszko)
tensorflow_io/core/kernels/bigtable/serialization.h, line 18 at r5 (raw file):
Byte representation of basic types may depend on language and architecture, thus we need something that is invariant and consistent with HBase. XDR library seems to satisfy those constraints.
Sorry for being obnoxious here, but could you please move this comment to a separate comment? I almost missed it, thinking it's a part of the copyright header.
Also, could you please include a sentence along these lines:
Bigtable only stores byte buffers as values - except for
int64the server side does not have any notion of types. Tensorflow, needs to store shorter integers, floats, doubles, so we needed to decide on how. We chose to follow what HBase does, since there is a path for migrating from HBase to Bigtable. XDR seems to match what HBase does.
tensorflow_io/core/kernels/bigtable/serialization.h, line 25 at r5 (raw file):
#include "google/cloud/bigtable/table.h" #include "rpc/xdr.h" /* xdr is a sub-library of rpc */
Now this comment seems stale. Also, I don't think you need this #include in the header, please move it to the .cc file. I think this applies to most of the #includes in this file, actually.
tensorflow_io/core/kernels/bigtable/serialization.cc, line 1 at r5 (raw file):
/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.
I noticed it only now - there are functions for converting types to bytes - we're not using them here. Please remove them.
kboroszko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 6 unresolved discussions (waiting on @dopiera)
tensorflow_io/core/kernels/bigtable/serialization.h, line 18 at r5 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Byte representation of basic types may depend on language and architecture, thus we need something that is invariant and consistent with HBase. XDR library seems to satisfy those constraints.Sorry for being obnoxious here, but could you please move this comment to a separate comment? I almost missed it, thinking it's a part of the copyright header.
Also, could you please include a sentence along these lines:
Bigtable only stores byte buffers as values - except for
int64the server side does not have any notion of types. Tensorflow, needs to store shorter integers, floats, doubles, so we needed to decide on how. We chose to follow what HBase does, since there is a path for migrating from HBase to Bigtable. XDR seems to match what HBase does.
Now it's impossible to miss. 👍
tensorflow_io/core/kernels/bigtable/serialization.h, line 25 at r5 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Now this comment seems stale. Also, I don't think you need this
#includein the header, please move it to the.ccfile. I think this applies to most of the#includes in this file, actually.
Done.
tensorflow_io/core/kernels/bigtable/serialization.cc, line 1 at r5 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I noticed it only now - there are functions for converting types to bytes - we're not using them here. Please remove them.
Sure. I added them expecting the code in for writing to BT in the next PRs. But, since we're not doing that, I guess it's obsolete.
dopiera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on @dopiera)
This change is