feat: support for other data types #5

kboroszko · 2021-11-08T09:55:21Z

This change is

dopiera

Reviewable status: 0 of 5 files reviewed, 14 unresolved discussions (waiting on @dopiera and @kboroszko)

tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 34 at r1 (raw file):

tensorflow::error::Code GoogleCloudErrorCodeToTfErrorCode(
    ::google::cloud::StatusCode code) {

Why these formatting changes?

tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 281 at r1 (raw file):

        client_resource_(*client_resource),
        client_resource_unref_(client_resource),
        row_set_resource_(*row_set_resource),

What do we need it for? I don't see any uses. Besides, if we're storing this, we should increase the ref counter, no?

tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 351 at r1 (raw file):

Quoted 5 lines of code…

    io::BigtableRowSetResource* row_set_resource;
    OP_REQUIRES_OK(ctx,
                   GetResourceFromContext(ctx, "row_set", &row_set_resource));
    core::ScopedUnref row_set_resource_unref_(row_set_resource);

This seems unrelated to the data types, no?

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 1 at r1 (raw file):

/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.

Please add a comment explaining that we're using XDR because this seems consistent with what HBase is using.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 16 at r1 (raw file):

==============================================================================*/

#ifndef BIGTABLE_LIB_H

That's a very generic file name compared with its very specific contents.

How about we call it along the lines of serialization.h?

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 20 at r1 (raw file):

Quoted 12 lines of code…

#include "absl/memory/memory.h"
#include "google/cloud/bigtable/table.h"
#include "google/cloud/bigtable/table_admin.h"
#include "rpc/rpc.h" /* xdr is a sub-library of rpc */
#include "tensorflow/core/framework/common_shape_fns.h"
#include "tensorflow/core/framework/dataset.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/resource_mgr.h"
#include "tensorflow/core/framework/resource_op_kernel.h"
#include "tensorflow_io/core/kernels/bigtable/bigtable_resource_kernel.h"
#include "tensorflow_io/core/kernels/bigtable/bigtable_row_range.h"

Vast majority of these headers doesn't seem needed - please remove the unnecessary.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 22 at r1 (raw file):

rpc

then maybe rpc/xdr.h?

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 33 at r1 (raw file):

Quoted 5 lines of code…

using ::tensorflow::DT_STRING;
using ::tensorflow::PartialTensorShape;
using ::tensorflow::Status;

namespace cbt = ::google::cloud::bigtable;

Please don't do this in header files.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 40 at r1 (raw file):

namespace tensorflow {
namespace data {
namespace {

Anonymous namespaces in headers is are fishy at best. Every compilation unit including this file will have their own copy of these functions. I don't think it makes any sense.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 42 at r1 (raw file):

namespace {

std::string FloatToBytes(float v) {

When you define functions in a header file, you should make them inline - otherwise every compilation unit will compile them.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 47 at r1 (raw file):

  xdrmem_create(&xdrs, buffer, sizeof(v), XDR_ENCODE);
  if (!xdr_float(&xdrs, &v)) {
    throw std::runtime_error("Error writing float to byte array.");

I'm afraid we should change the error reporting to StatusOr in TF. I guess the status should be "invalid argument".

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):
If an non-templated function is defined in a header file it should be inline.

BUT, Google's style guide says:

A decent rule of thumb is to not inline a function if it is more than 10 lines long.

I think it's fine if per-datatype functions are inline, but this one shouldn't.

Long story short - please move all these functions to a .cc file and make the others inline but not this one.

I was about to also say that we should also avoid calling this switch for every cell but given that we're doing a lookup in a hashmap anyway, I think this extra cost is negligible and we can leave the code structure as is.

tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 40 at r1 (raw file):

        )

    def parallel_read_rows(

This seems to belong to a different PR.

tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 82 at r1 (raw file):

row_set: row_set.RowSet,

The row_set part seems to be from a different PR.

kboroszko

Reviewable status: 0 of 5 files reviewed, 14 unresolved discussions (waiting on @dopiera and @kboroszko)

tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 34 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Why these formatting changes?

linter

tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 281 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

What do we need it for? I don't see any uses. Besides, if we're storing this, we should increase the ref counter, no?

Done.

tensorflow_io/core/kernels/bigtable/bigtable_dataset_kernel.cc, line 351 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

    io::BigtableRowSetResource* row_set_resource;
    OP_REQUIRES_OK(ctx,
                   GetResourceFromContext(ctx, "row_set", &row_set_resource));
    core::ScopedUnref row_set_resource_unref_(row_set_resource);

This seems unrelated to the data types, no?

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 16 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

That's a very generic file name compared with its very specific contents.

How about we call it along the lines of serialization.h?

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 20 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

#include "absl/memory/memory.h"
#include "google/cloud/bigtable/table.h"
#include "google/cloud/bigtable/table_admin.h"
#include "rpc/rpc.h" /* xdr is a sub-library of rpc */
#include "tensorflow/core/framework/common_shape_fns.h"
#include "tensorflow/core/framework/dataset.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/resource_mgr.h"
#include "tensorflow/core/framework/resource_op_kernel.h"
#include "tensorflow_io/core/kernels/bigtable/bigtable_resource_kernel.h"
#include "tensorflow_io/core/kernels/bigtable/bigtable_row_range.h"

Vast majority of these headers doesn't seem needed - please remove the unnecessary.

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 22 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

rpc
then maybe rpc/xdr.h?

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 33 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

using ::tensorflow::DT_STRING;
using ::tensorflow::PartialTensorShape;
using ::tensorflow::Status;

namespace cbt = ::google::cloud::bigtable;

Please don't do this in header files.

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 40 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Anonymous namespaces in headers is are fishy at best. Every compilation unit including this file will have their own copy of these functions. I don't think it makes any sense.

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 42 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

When you define functions in a header file, you should make them inline - otherwise every compilation unit will compile them.

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 47 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

I'm afraid we should change the error reporting to StatusOr in TF. I guess the status should be "invalid argument".

Good point.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

If an non-templated function is defined in a header file it should be inline.

BUT, Google's style guide says:

A decent rule of thumb is to not inline a function if it is more than 10 lines long.

I think it's fine if per-datatype functions are inline, but this one shouldn't.

Long story short - please move all these functions to a .cc file and make the others inline but not this one.

I was about to also say that we should also avoid calling this switch for every cell but given that we're doing a lookup in a hashmap anyway, I think this extra cost is negligible and we can leave the code structure as is.

Oh goodie.

tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 40 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

This seems to belong to a different PR.

damn it!

dopiera

I think you responded to the comments but forgot to push the changes (or at least I can't see them in Reviewable).

Reviewable status: 0 of 5 files reviewed, 13 unresolved discussions (waiting on @dopiera and @kboroszko)

kboroszko

Reviewable status: 0 of 9 files reviewed, 13 unresolved discussions (waiting on @dopiera)

tensorflow_io/python/ops/bigtable/bigtable_dataset_ops.py, line 82 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

row_set: row_set.RowSet,
The row_set part seems to be from a different PR.

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 1 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.
Please add a comment explaining that we're using XDR because this seems consistent with what HBase is using.

Done.

tensorflow_io/core/kernels/bigtable/bigtable_lib.h, line 147 at r1 (raw file):

Previously, kboroszko (Kajetan Boroszko) wrote…

Oh goodie.

Done.

dopiera

Reviewable status: 0 of 10 files reviewed, 6 unresolved discussions (waiting on @dopiera and @kboroszko)

tensorflow_io/core/kernels/bigtable/serialization.h, line 18 at r5 (raw file):

Byte representation of basic types may depend on language and architecture, thus
we need something that is invariant and consistent with HBase. XDR library seems
to satisfy those constraints.

Sorry for being obnoxious here, but could you please move this comment to a separate comment? I almost missed it, thinking it's a part of the copyright header.

Also, could you please include a sentence along these lines:

Bigtable only stores byte buffers as values - except for int64 the server side does not have any notion of types. Tensorflow, needs to store shorter integers, floats, doubles, so we needed to decide on how. We chose to follow what HBase does, since there is a path for migrating from HBase to Bigtable. XDR seems to match what HBase does.

tensorflow_io/core/kernels/bigtable/serialization.h, line 25 at r5 (raw file):

#include "google/cloud/bigtable/table.h"
#include "rpc/xdr.h" /* xdr is a sub-library of rpc */

Now this comment seems stale. Also, I don't think you need this #include in the header, please move it to the .cc file. I think this applies to most of the #includes in this file, actually.

tensorflow_io/core/kernels/bigtable/serialization.cc, line 1 at r5 (raw file):

/* Copyright 2021 The TensorFlow Authors. All Rights Reserved.

I noticed it only now - there are functions for converting types to bytes - we're not using them here. Please remove them.

kboroszko

Reviewable status: 0 of 10 files reviewed, 6 unresolved discussions (waiting on @dopiera)

tensorflow_io/core/kernels/bigtable/serialization.h, line 18 at r5 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Byte representation of basic types may depend on language and architecture, thus
we need something that is invariant and consistent with HBase. XDR library seems
to satisfy those constraints.
Sorry for being obnoxious here, but could you please move this comment to a separate comment? I almost missed it, thinking it's a part of the copyright header.

Also, could you please include a sentence along these lines:

Bigtable only stores byte buffers as values - except for int64 the server side does not have any notion of types. Tensorflow, needs to store shorter integers, floats, doubles, so we needed to decide on how. We chose to follow what HBase does, since there is a path for migrating from HBase to Bigtable. XDR seems to match what HBase does.

Now it's impossible to miss. 👍

tensorflow_io/core/kernels/bigtable/serialization.h, line 25 at r5 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Now this comment seems stale. Also, I don't think you need this #include in the header, please move it to the .cc file. I think this applies to most of the #includes in this file, actually.

Done.

tensorflow_io/core/kernels/bigtable/serialization.cc, line 1 at r5 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

I noticed it only now - there are functions for converting types to bytes - we're not using them here. Please remove them.

Sure. I added them expecting the code in for writing to BT in the next PRs. But, since we're not doing that, I guess it's obsolete.

dopiera

Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on @dopiera)

dopiera requested changes Nov 8, 2021

View reviewed changes

kboroszko commented Nov 19, 2021

View reviewed changes

dopiera reviewed Nov 21, 2021

View reviewed changes

kboroszko commented Nov 22, 2021

View reviewed changes

kboroszko force-pushed the kb/types2 branch from b385cab to 48dd85e Compare November 22, 2021 13:44

dopiera requested changes Nov 22, 2021

View reviewed changes

kboroszko added 8 commits November 23, 2021 10:40

added support for other data types

f43c714

pr fixes 1

69ccb2f

statuses in serialization.h and remove rowset

7cbb2ed

linter

bf5ea6a

moved byte parsing functions to cc file

b76b38e

linter

b93849f

organized imports

b6af692

better imports and comment on putCellValueInTensor

384fd22

kboroszko force-pushed the kb/types2 branch from b2a55e8 to 384fd22 Compare November 23, 2021 10:40

kboroszko commented Nov 23, 2021

View reviewed changes

linting

13b9647

dopiera approved these changes Nov 23, 2021

View reviewed changes

kboroszko merged commit 1adb827 into master Nov 23, 2021

kboroszko deleted the kb/types2 branch November 23, 2021 12:47

kboroszko added a commit that referenced this pull request Nov 23, 2021

feat: support for other data types (#5)

5494db7

dopiera pushed a commit that referenced this pull request Nov 24, 2021

feat: support for other data types (#5)

51d1ef1

dopiera pushed a commit that referenced this pull request Dec 3, 2021

feat: support for other data types (#5)

6c34901

dopiera pushed a commit that referenced this pull request Dec 3, 2021

feat: support for other data types (#5)

9d32181

kboroszko added a commit that referenced this pull request Dec 13, 2021

feat: support for other data types (#5)

e261104

kboroszko added a commit that referenced this pull request Dec 20, 2021

feat: support for other data types (#5)

70a3f6b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: support for other data types #5

feat: support for other data types #5

Uh oh!

kboroszko commented Nov 8, 2021 •

edited by dopiera

Loading

Uh oh!

dopiera left a comment

Uh oh!

kboroszko left a comment

Uh oh!

dopiera left a comment

Uh oh!

kboroszko left a comment

Uh oh!

dopiera left a comment

Uh oh!

kboroszko left a comment

Uh oh!

dopiera left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: support for other data types #5

feat: support for other data types #5

Uh oh!

Conversation

kboroszko commented Nov 8, 2021 • edited by dopiera Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dopiera left a comment

Choose a reason for hiding this comment

Uh oh!

kboroszko left a comment

Choose a reason for hiding this comment

Uh oh!

dopiera left a comment

Choose a reason for hiding this comment

Uh oh!

kboroszko left a comment

Choose a reason for hiding this comment

Uh oh!

dopiera left a comment

Choose a reason for hiding this comment

Uh oh!

kboroszko left a comment

Choose a reason for hiding this comment

Uh oh!

dopiera left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kboroszko commented Nov 8, 2021 •

edited by dopiera

Loading