Skip to content

Conversation

@kboroszko
Copy link
Collaborator

@kboroszko kboroszko commented Nov 8, 2021

This change is Reviewable

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 14 unresolved discussions (waiting on @dopiera and @kboroszko)


tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 34 at r1 (raw file):

tensorflow::error::Code GoogleCloudErrorCodeToTfErrorCode(
    ::google::cloud::StatusCode code) {

Why these formatting changes?


tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 281 at r1 (raw file):

        client_resource_(*client_resource),
        client_resource_unref_(client_resource),
        row_set_resource_(*row_set_resource),

What do we need it for? I don't see any uses. Besides, if we're storing this, we should increase the ref counter, no?


tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 351 at r1 (raw file):

Quoted 5 lines of code…
    io::BigtableRowSetResource* row_set_resource;
    OP_REQUIRES_OK(ctx,
                   GetResourceFromContext(ctx, "row_set", &row_set_resource));
    core::ScopedUnref row_set_resource_unref_(row_set_resource);

This seems unrelated to the data types, no?


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 1 at r1 (raw file):

/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.

Please add a comment explaining that we're using XDR because this seems consistent with what HBase is using.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 16 at r1 (raw file):

==============================================================================*/

#ifndef BIGTABLE_LIB_H

That's a very generic file name compared with its very specific contents.

How about we call it along the lines of serialization.h?


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 20 at r1 (raw file):

Quoted 12 lines of code…
#include "absl/memory/memory.h"
#include "google/cloud/bigtable/table.h"
#include "google/cloud/bigtable/table_admin.h"
#include "rpc/rpc.h" /* xdr is a sub-library of rpc */
#include "tensorflow/core/framework/common_shape_fns.h"
#include "tensorflow/core/framework/dataset.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/resource_mgr.h"
#include "tensorflow/core/framework/resource_op_kernel.h"
#include "tensorflow_io/core/kernels/bigtable/bigtable_resource_kernel.h"
#include "tensorflow_io/core/kernels/bigtable/bigtable_row_range.h"

Vast majority of these headers doesn't seem needed - please remove the unnecessary.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 22 at r1 (raw file):

rpc

then maybe rpc/xdr.h?


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 33 at r1 (raw file):

Quoted 5 lines of code…
using ::tensorflow::DT_STRING;
using ::tensorflow::PartialTensorShape;
using ::tensorflow::Status;

namespace cbt = ::google::cloud::bigtable;

Please don't do this in header files.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 40 at r1 (raw file):

namespace tensorflow {
namespace data {
namespace {

Anonymous namespaces in headers is are fishy at best. Every compilation unit including this file will have their own copy of these functions. I don't think it makes any sense.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 42 at r1 (raw file):

namespace {

std::string FloatToBytes(float v) {

When you define functions in a header file, you should make them inline - otherwise every compilation unit will compile them.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 47 at r1 (raw file):

  xdrmem_create(&xdrs, buffer, sizeof(v), XDR_ENCODE);
  if (!xdr_float(&xdrs, &v)) {
    throw std::runtime_error("Error writing float to byte array.");

I'm afraid we should change the error reporting to StatusOr in TF. I guess the status should be "invalid argument".


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):
If an non-templated function is defined in a header file it should be inline.

BUT, Google's style guide says:

A decent rule of thumb is to not inline a function if it is more than 10 lines long.

I think it's fine if per-datatype functions are inline, but this one shouldn't.

Long story short - please move all these functions to a .cc file and make the others inline but not this one.

I was about to also say that we should also avoid calling this switch for every cell but given that we're doing a lookup in a hashmap anyway, I think this extra cost is negligible and we can leave the code structure as is.


tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 40 at r1 (raw file):

        )

    def parallel_read_rows(

This seems to belong to a different PR.


tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 82 at r1 (raw file):

row_set: row_set.RowSet,

The row_set part seems to be from a different PR.

Copy link
Collaborator Author

@kboroszko kboroszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 14 unresolved discussions (waiting on @dopiera and @kboroszko)


tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 34 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Why these formatting changes?

linter


tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 281 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

What do we need it for? I don't see any uses. Besides, if we're storing this, we should increase the ref counter, no?

Done.


tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 351 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
    io::BigtableRowSetResource* row_set_resource;
    OP_REQUIRES_OK(ctx,
                   GetResourceFromContext(ctx, "row_set", &row_set_resource));
    core::ScopedUnref row_set_resource_unref_(row_set_resource);

This seems unrelated to the data types, no?

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 16 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

That's a very generic file name compared with its very specific contents.

How about we call it along the lines of serialization.h?

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 20 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
#include "absl/memory/memory.h"
#include "google/cloud/bigtable/table.h"
#include "google/cloud/bigtable/table_admin.h"
#include "rpc/rpc.h" /* xdr is a sub-library of rpc */
#include "tensorflow/core/framework/common_shape_fns.h"
#include "tensorflow/core/framework/dataset.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/resource_mgr.h"
#include "tensorflow/core/framework/resource_op_kernel.h"
#include "tensorflow_io/core/kernels/bigtable/bigtable_resource_kernel.h"
#include "tensorflow_io/core/kernels/bigtable/bigtable_row_range.h"

Vast majority of these headers doesn't seem needed - please remove the unnecessary.

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 22 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
rpc

then maybe rpc/xdr.h?

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 33 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
using ::tensorflow::DT_STRING;
using ::tensorflow::PartialTensorShape;
using ::tensorflow::Status;

namespace cbt = ::google::cloud::bigtable;

Please don't do this in header files.

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 40 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Anonymous namespaces in headers is are fishy at best. Every compilation unit including this file will have their own copy of these functions. I don't think it makes any sense.

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 42 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

When you define functions in a header file, you should make them inline - otherwise every compilation unit will compile them.

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 47 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

I'm afraid we should change the error reporting to StatusOr in TF. I guess the status should be "invalid argument".

Good point.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

If an non-templated function is defined in a header file it should be inline.

BUT, Google's style guide says:

A decent rule of thumb is to not inline a function if it is more than 10 lines long.

I think it's fine if per-datatype functions are inline, but this one shouldn't.

Long story short - please move all these functions to a .cc file and make the others inline but not this one.

I was about to also say that we should also avoid calling this switch for every cell but given that we're doing a lookup in a hashmap anyway, I think this extra cost is negligible and we can leave the code structure as is.

Oh goodie.


tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 40 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

This seems to belong to a different PR.

damn it!

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you responded to the comments but forgot to push the changes (or at least I can't see them in Reviewable).

Reviewable status: 0 of 5 files reviewed, 13 unresolved discussions (waiting on @dopiera and @kboroszko)

Copy link
Collaborator Author

@kboroszko kboroszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 13 unresolved discussions (waiting on @dopiera)


tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 82 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
row_set: row_set.RowSet,

The row_set part seems to be from a different PR.

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 1 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.

Please add a comment explaining that we're using XDR because this seems consistent with what HBase is using.

Done.


tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):

Previously, kboroszko (Kajetan Boroszko) wrote…

Oh goodie.

Done.

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 6 unresolved discussions (waiting on @dopiera and @kboroszko)


tensorflow_io/core/kernels/bigtable/serialization.h, line 18 at r5 (raw file):

Byte representation of basic types may depend on language and architecture, thus
we need something that is invariant and consistent with HBase. XDR library seems
to satisfy those constraints.

Sorry for being obnoxious here, but could you please move this comment to a separate comment? I almost missed it, thinking it's a part of the copyright header.

Also, could you please include a sentence along these lines:

Bigtable only stores byte buffers as values - except for int64 the server side does not have any notion of types. Tensorflow, needs to store shorter integers, floats, doubles, so we needed to decide on how. We chose to follow what HBase does, since there is a path for migrating from HBase to Bigtable. XDR seems to match what HBase does.


tensorflow_io/core/kernels/bigtable/serialization.h, line 25 at r5 (raw file):

#include "google/cloud/bigtable/table.h"
#include "rpc/xdr.h" /* xdr is a sub-library of rpc */

Now this comment seems stale. Also, I don't think you need this #include in the header, please move it to the .cc file. I think this applies to most of the #includes in this file, actually.


tensorflow_io/core/kernels/bigtable/serialization.cc, line 1 at r5 (raw file):

/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.

I noticed it only now - there are functions for converting types to bytes - we're not using them here. Please remove them.

Copy link
Collaborator Author

@kboroszko kboroszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 6 unresolved discussions (waiting on @dopiera)


tensorflow_io/core/kernels/bigtable/serialization.h, line 18 at r5 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
Byte representation of basic types may depend on language and architecture, thus
we need something that is invariant and consistent with HBase. XDR library seems
to satisfy those constraints.

Sorry for being obnoxious here, but could you please move this comment to a separate comment? I almost missed it, thinking it's a part of the copyright header.

Also, could you please include a sentence along these lines:

Bigtable only stores byte buffers as values - except for int64 the server side does not have any notion of types. Tensorflow, needs to store shorter integers, floats, doubles, so we needed to decide on how. We chose to follow what HBase does, since there is a path for migrating from HBase to Bigtable. XDR seems to match what HBase does.

Now it's impossible to miss. 👍


tensorflow_io/core/kernels/bigtable/serialization.h, line 25 at r5 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Now this comment seems stale. Also, I don't think you need this #include in the header, please move it to the .cc file. I think this applies to most of the #includes in this file, actually.

Done.


tensorflow_io/core/kernels/bigtable/serialization.cc, line 1 at r5 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

I noticed it only now - there are functions for converting types to bytes - we're not using them here. Please remove them.

Sure. I added them expecting the code in for writing to BT in the next PRs. But, since we're not doing that, I guess it's obsolete.

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on @dopiera)

@kboroszko kboroszko merged commit 1adb827 into master Nov 23, 2021
@kboroszko kboroszko deleted the kb/types2 branch November 23, 2021 12:47
kboroszko added a commit that referenced this pull request Nov 23, 2021
dopiera pushed a commit that referenced this pull request Nov 24, 2021
dopiera pushed a commit that referenced this pull request Dec 3, 2021
dopiera pushed a commit that referenced this pull request Dec 3, 2021
kboroszko added a commit that referenced this pull request Dec 13, 2021
kboroszko added a commit that referenced this pull request Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants