
Conversation

Contributor

@DzAvril DzAvril commented Mar 28, 2022

The input of the op concatenate is a list of tensors. We tried to bring it to ACL and found that its JSON node looks like this:

{
  "nodes": [
    {
      "op": "input",
      "name": "arm_compute_lib_0_i0",
      "attrs": {
        "dtype": [
          [
            "int32",
            "int32",
            "int32"
          ]
        ],
        "shape": [
          [
            [1, 234, 234, 256],
            [1, 234, 234, 256],
            [1, 234, 234, 256]
          ]
        ]
      }
    },
    {
      "op": "kernel",
      "name": "concatenate",
      "inputs": [[
          0,
          0,
          0], [
          0,
          1,
          0], [
          0,
          2,
          0]],
      "attrs": {
        "num_outputs": "1",
        "num_inputs": "3",
        "dtype": [
          [
            "int32"
          ]
        ],
        "axis": [
          [
            "0"
          ]
        ],
        "shape": [
          [
            [3, 234, 234, 256]
          ]
        ]
      }
    }
  ],
  "arg_nodes": [0],
  "heads": [[
      1,
      0,
      0]],
  "node_row_ptr": [0, 3, 4]
}

The inputs of the kernel node are:

"inputs": [[
          0,
          0,
          0], [
          0,
          1,
          0], [
          0,
          2,
          0]]

Each element in inputs is an index triple identifying an input entry. Taking [0, 1, 0] as an example, the first 0 is the node id, the 1 is the index id, and the last 0 is the version.
But in the function Run, the statement uint32_t eid = EntryID(nid, 0); always uses 0 as the index id, so the other two inputs are ignored:

  void Run() override {
    for (size_t i = 0; i < input_nodes_.size(); ++i) {
      auto nid = input_nodes_[i];
      uint32_t eid = EntryID(nid, 0);
      if (nodes_[nid].GetOpType() == "input") {
        void* data = data_entry_[eid]->data;
        CheckACLError(layer_.inputs[i].allocator()->import_memory(data));
      }
    }

    for (size_t i = 0; i < outputs_.size(); ++i) {
      uint32_t eid = EntryID(outputs_[i]);
      void* data = data_entry_[eid]->data;
      CheckACLError(layer_.outputs[i].allocator()->import_memory(data));
    }

    this->layer_.function->run();
  }

This PR introduces a variable json_inputid_to_layer_inputid, which maps the input index of the JSON node to the index of the ACL layer's inputs.
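The indexing can be sketched in Python (illustrative only, not TVM code; `entry_id` mirrors how `JSONRuntimeBase::EntryID` derives entry ids from `node_row_ptr`):

```python
# Values taken from the JSON graph above.
node_row_ptr = [0, 3, 4]                    # entry ranges per node
inputs = [[0, 0, 0], [0, 1, 0], [0, 2, 0]]  # [node_id, index_id, version]

def entry_id(nid, index):
    # Mirrors JSONRuntimeBase::EntryID: each node owns a contiguous
    # range of entry ids starting at node_row_ptr[nid].
    return node_row_ptr[nid] + index

# The buggy loop passes index 0 for every input, so all three inputs
# collapse onto entry 0.
buggy = [entry_id(nid, 0) for nid, _, _ in inputs]

# Keeping the per-input index id yields distinct entries 0, 1, 2, which
# can then be mapped to the ACL layer's input slots.
fixed = [entry_id(nid, index) for nid, index, _ in inputs]

print(buggy, fixed)
```

With the buggy indexing all three entries resolve to the same id; with the per-input index they resolve to three distinct entries.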


u99127 commented Mar 28, 2022

@lhutton1 - could you take a look?

Contributor

@lhutton1 lhutton1 left a comment

Thanks @DzAvril, it's great to see more support being added to ACL again :) Apologies for the delayed review, I was on holiday the past couple of days. Overall LGTM! The comments are mostly about cleaning things up, testing, and tightening the conditions under which concatenate gets offloaded.

Contributor Author

DzAvril commented Mar 31, 2022

> Thanks @DzAvril, it's great to see more support being added to ACL again :) Apologies for the delayed review, I was on holiday the past couple of days. Overall LGTM! The comments are mostly about cleaning things up, testing, and tightening the conditions under which concatenate gets offloaded.

@lhutton1 Thanks for the comments. Please take another look.

@lhutton1
Contributor

Thanks @DzAvril LGTM!

cc @manupa-arm @comaniac @Mousius who may be able to help get this merged

Contributor Author

DzAvril commented Apr 7, 2022

> Thanks @DzAvril LGTM!
>
> cc @manupa-arm @comaniac @Mousius who may be able to help get this merged

Hi @lhutton1, thanks for your review. Since you have approved these changes, the status of this PR should be Approved, but it is still Review required. And to quote:

> Merging can be performed automatically with 1 approving review.

Will it be merged automatically after the status changes to Approved?

Contributor

lhutton1 commented Apr 7, 2022

Unfortunately, since I'm just a reviewer, my approval doesn't unblock merging; we need to wait for a committer to approve and merge this change.

Friendly ping @manupa-arm @comaniac @Mousius (I'm not sure who else might be interested in ACL support)

@lhutton1
Contributor

Hi @DzAvril, I was having a think about this over the weekend and remembered that for the ACL integration we don't run the MergeCompilerRegions pass, meaning each operation is offloaded to ACL separately. Because of this, I'm not sure offloading concatenate (as opposed to running it on TVM) would bring any performance improvement, since the operation doesn't really do any computation. So I'm curious: does this PR fix a separate issue you were seeing, or did you observe some kind of performance improvement while offloading concatenate?

My apologies for missing this previously - it's been a while since working on this.

Contributor Author

DzAvril commented Apr 11, 2022

Our team is working on a project that needs hot upgrades for models; our approach is to offload as many operations as possible to ACL, and performance is not a concern. So this PR fixes a separate issue: offloading operations that have a list as inputs, of which concatenate is a typical case.


cee1 commented Apr 11, 2022

> which have a list as inputs and offloading concatenate is a typical case of this issue.
Regarding src/runtime/contrib/json/json_runtime.h: JSONRuntimeBase misses handling for multiple entry ids (eid), as the patch points out.

(This issue hit us while hooking ACL concat for a "hot upgradable model" feature:

  • A "hot upgradable model" is pure data, with no executable code inside.
  • This calls for a common op library shared by ALL models.

We leverage ACL as the base of that common op library.)

Hi @lhutton1, which would be the more appropriate way:
a) Disable hooking into ACL's concat within relay/op/contrib/arm_compute_lib.py (while keeping the concat entry in runtime/contrib/arm_compute_lib/acl_runtime.cc), or
b) Submit only the fix to json_runtime.h?

Any ideas?

@lhutton1
Contributor

Thanks for the insight @DzAvril, @cee1. I think in that case it would be fine to keep the implementation. It might be a good idea to disable offloading concatenate by default, but then add an optional parameter, say offload_concat=False, to partition_for_arm_compute_lib to manually turn on partitioning for concatenate. WDYT?
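A minimal sketch of what such an opt-in flag could look like (hypothetical: the `offload_concat` parameter and the pattern names below are illustrative, not the actual TVM signature):

```python
def partition_for_arm_compute_lib(mod, params=None, offload_concat=False):
    """Hypothetical sketch: choose which patterns to partition for ACL."""
    patterns = ["conv2d", "dense"]   # illustrative supported patterns
    if offload_concat:               # concatenate stays off by default
        patterns.append("concatenate")
    return patterns                  # real code would partition `mod`
```

By default concatenate is not partitioned; passing `offload_concat=True` opts back in.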


cee1 commented Apr 12, 2022

> offload_concat=False to partition_for_arm_compute_lib to manually turn on partitioning for concatenate

Hi @lhutton1, would you please review this revision of the patch?

Contributor

@lhutton1 lhutton1 left a comment

Thanks for the revision @DzAvril, @cee1. LGTM.. again!


cee1 commented Apr 12, 2022

> Thanks for the revision @DzAvril, @cee1. LGTM.. again!

Hi @u99127, could you review this patch and give an approving review if it looks fine?

@lhutton1
Contributor

@masahi @areusch @tmoreau89 - we're struggling to find a committer to take a look at this PR, do you have any suggestions?

Contributor Author

DzAvril commented Apr 13, 2022

@masahi @lhutton1 Thanks for your comments. I reworked concatenate support using the pattern table mechanism, which brings two changes.

  1. Add a parameter disabled_ops to partition_for_arm_compute_lib that indicates the ops we do not offload to ACL; concatenate is included by default.
  2. Add a function AddCommonSingleJSONNode that adds a single op like concatenate to the graph:
  std::vector<JSONGraphNodeEntry> AddCommonSingleJSONNode(const CallNode* cn, std::string name) {
    std::vector<JSONGraphNodeEntry> inputs;
    for (const auto& arg : cn->args) {
      auto res = VisitExpr(arg);
      inputs.insert(inputs.end(), res.begin(), res.end());
    }
    auto node = std::make_shared<JSONGraphNode>(name,     /* name_ */
                                                "kernel", /* op_type_ */
                                                inputs, 1 /* num_outputs_ */);

    const auto* fn = cn->op.as<FunctionNode>();
    ICHECK(fn);
    const auto* callNode = fn->body.as<CallNode>();
    ICHECK(callNode);
    SetCallNodeAttribute(node, callNode);
    return AddNode(node, GetRef<Expr>(cn));
  }

This function calls AddNode, which returns a list of graph entry nodes and flattens tuple-type exprs. In this way, the core problem this PR aims to fix is gone, because after flattening the tuple-type expr the JSON graph looks like this:

  {
    "attrs": {
      "dtype": [
        [
          "float32"
        ]
      ],
      "shape": [
        [
          [
            1,
            234,
            234,
            256
          ]
        ]
      ]
    },
    "name": "",
    "op": "input"
  },
  {
    "attrs": {
      "dtype": [
        "float32"
      ],
      "shape": [
        [
          2,
          234,
          234,
          256
        ]
      ]
    },
    "name": "",
    "op": "input"
  },
  {
    "attrs": {
      "dtype": [
        "float32"
      ],
      "shape": [
        [
          3,
          234,
          234,
          256
        ]
      ]
    },
    "name": "",
    "op": "input"
  },
  {
    "attrs": {
      "axis": [
        [
          "0"
        ]
      ],
      "dtype": [
        [
          "float32"
        ]
      ],
      "num_inputs": "3",
      "num_outputs": "1",
      "shape": [
        [
          [
            6,
            234,
            234,
            256
          ]
        ]
      ]
    },
    "inputs": [
      [
        0,
        0,
        0
      ],
      [
        1,
        0,
        0
      ],
      [
        2,
        0,
        0
      ]
    ],
    "name": "concatenate",
    "op": "kernel"
  }

It works without json_inputid_to_layer_inputid. Even so, I think we should keep the changes I made in acl_runtime.cc and other related files in case other corner cases occur.
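The `disabled_ops` behaviour can be sketched as follows (only the parameter name and its default come from this PR; the table contents and filtering body are illustrative):

```python
def arm_compute_lib_pattern_table(disabled_ops=("concatenate",)):
    """Hypothetical sketch of a pattern table with a disabled-ops filter."""
    # Illustrative entries; the real table maps names to relay patterns.
    all_patterns = {"conv2d": object(), "dense": object(), "concatenate": object()}
    # Drop any pattern the caller has disabled.
    return {name: pat for name, pat in all_patterns.items()
            if name not in disabled_ops}
```

Note that a tuple default like `("concatenate",)` would also sidestep pylint's dangerous-default-value warning, since tuples are immutable.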

Contributor Author

DzAvril commented Apr 13, 2022

def arm_compute_lib_pattern_table(disabled_ops=["concatenate"]):

The default value of disabled_ops triggers an error in the Sanity Check:

Dangerous default value [] as argument

I have no idea how to fix this; can you give a hint, please?

Member

masahi commented Apr 13, 2022

You can add dangerous-default-value to the existing

# pylint: disable=invalid-name, unused-argument

line to suppress this warning.
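For illustration, here are the two usual ways around that warning (function names are hypothetical):

```python
# Option 1 (suggested above): suppress the pylint check.
# pylint: disable=dangerous-default-value
def pattern_table_v1(disabled_ops=["concatenate"]):
    return [op for op in ["conv2d", "concatenate"] if op not in disabled_ops]

# Option 2: the common idiom, use None as the default so no mutable
# list is shared between calls.
def pattern_table_v2(disabled_ops=None):
    if disabled_ops is None:
        disabled_ops = ["concatenate"]
    return [op for op in ["conv2d", "concatenate"] if op not in disabled_ops]
```

Both variants return the same result; option 2 avoids the warning without any pylint directive.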

@masahi masahi merged commit 52f52c8 into apache:main Apr 14, 2022
Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022
* [BYOC][ACL] Fix list is not supported as an input node

* fix clang lint error

* fix compile warning

* fix python module import error

* rename concatenate test file

* fix always MakeACLTensor with same eid 0

* do not offload concat default

* fix concatenate test failure

* fix test failure

* fix lint error

* fix lint

* remove global var offload_concat

* support concatenate with pattern table mechanism

* disable pylint dangerous-default-value warning

Co-authored-by: XuZhi <[email protected]>
altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022

5 participants