gs-olive
Contributor

@gs-olive gs-olive commented Nov 28, 2022

Description

Fix compilation issues with Citrinet-1024 arising from a type-casting issue in aten::sum and a layer-naming issue in aten::div.

  • Enable automatic type-casting in aten::sum for bool tensor inputs to agree with Torch casting behavior
  • Fix bug in aten::div where all internal div layers have the same name
  • Add test cases for aten::sum type-casting
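
For reference, the Torch casting behavior that the converter is meant to agree with can be checked directly in PyTorch. This is a minimal sketch, not code from this PR; the tensor values are made up for illustration. Summing a bool tensor without an explicit `dtype` promotes the result to an integer type.

```python
import torch

# Illustrative bool mask (values are an assumption, not taken from Citrinet-1024).
mask = torch.tensor([[True, False, True, True]])

# Without an explicit dtype, torch.sum promotes bool inputs to int64.
total = torch.sum(mask, dim=-1, keepdim=True)

# An explicit dtype overrides the default promotion.
total_float = torch.sum(mask, dim=-1, keepdim=True, dtype=torch.float32)

print(total.dtype, total_float.dtype)
```

A reduction backend that only accepts Float, Half, Int8, and Int32 inputs therefore needs the bool input cast to a supported type before reducing.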

Without type-casting, conversion fails with the following error:

DEBUG: [Torch-TensorRT - Debug Build] - ITensor shape: [1, 1, 256]
DEBUG: [Torch-TensorRT - Debug Build] - ITensor type: Bool
DEBUG: [Torch-TensorRT - Debug Build] - InDims 1 1 256
DEBUG: [Torch-TensorRT - Debug Build] - Dim to reduce(original):[-1]
DEBUG: [Torch-TensorRT - Debug Build] - Dim to reduce(converted):[2]
DEBUG: [Torch-TensorRT - Debug Build] - Axis Mask: 00000000000000000000000000000100
DEBUG: [Torch-TensorRT - Debug Build] - Keep dims: 1
WARNING: [Torch-TensorRT - Debug Build] - Sum converter disregards dtype
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 3: %66 : Tensor = aten::sum(%mask2.2, %65, %26, %3779)
# ReduceLayer only supports Float, Half, Int8, and Int32 data types.
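
The dimension handling in the log above (original dim `-1`, converted dim `2`, axis mask with bit 2 set) can be reproduced with a short sketch. The function name `reduce_axis_mask` is hypothetical and is not part of Torch-TensorRT; it only illustrates the arithmetic of normalizing negative dims and building a bitmask.

```python
def reduce_axis_mask(dims, rank):
    """Build a bitmask with one bit set per reduced axis (hypothetical helper)."""
    mask = 0
    for d in dims:
        if not -rank <= d < rank:
            raise IndexError(f"dim {d} out of range for rank {rank}")
        axis = d % rank      # normalize a negative dim, e.g. -1 -> 2 for rank 3
        mask |= 1 << axis    # set the bit for that axis
    return mask

# Input shape [1, 1, 256] has rank 3; reducing over dim -1 selects axis 2.
mask = reduce_axis_mask([-1], rank=3)
mask_bits = format(mask, "032b")  # 32-bit binary string, as in the debug log
print(mask_bits)
```

The dimension arithmetic is therefore sound; the failure comes from the Bool input type alone.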

With type-casting enabled, that error is resolved, but a second one appears:

ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [network.cpp::validate::2761] Error Code 4: Internal Error (Repeated layer name: tmp_div (layers must have distinct names))
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 2: [builder.cpp::buildSerializedNetwork::742] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
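
The repeated-name failure can be illustrated with a toy model. This is a sketch under stated assumptions: `ToyNetwork` and `layer_name` are hypothetical and only mimic the rule that layers must have distinct names; they are not the TensorRT or Torch-TensorRT API, and the node identifiers are made up.

```python
class ToyNetwork:
    """Toy stand-in for a network that rejects duplicate layer names."""

    def __init__(self):
        self._names = set()

    def add_layer(self, name):
        if name in self._names:
            raise ValueError(f"Repeated layer name: {name}")
        self._names.add(name)
        return name


def layer_name(base, node_id):
    # Hypothetical naming scheme: suffix the base name with a per-node identifier.
    return f"{base}_{node_id}"


net = ToyNetwork()
net.add_layer("tmp_div")
try:
    net.add_layer("tmp_div")  # a second div node reusing the fixed name
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# Deriving the name from the node keeps each internal layer distinct.
fixed = ToyNetwork()
names = [fixed.add_layer(layer_name("tmp_div", node)) for node in ("node_a", "node_b")]
```

Any scheme that makes the name depend on the node being converted avoids the collision; the suffix format here is only one option.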

With both updates, the model compiles successfully.

Fixes #1487

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • [ x ] My code follows the style guidelines of this project (You can use the linters)
  • [ x ] I have performed a self-review of my own code
  • [ x ] I have commented my code, particularly in hard-to-understand areas and hacks
  • [ x ] I have made corresponding changes to the documentation
  • [ x ] I have added tests to verify my fix or my feature
  • [ x ] New and existing unit tests pass locally with my changes
  • [ x ] I have added the relevant labels to my PR so that relevant reviewers are notified

@gs-olive gs-olive requested a review from narendasan November 28, 2022 19:28
@github-actions github-actions bot added component: conversion (Issues re: Conversion stage), component: converters (Issues re: Specific op converters), component: core (Issues re: The core compiler) labels Nov 28, 2022
Collaborator

@narendasan narendasan left a comment

LGTM

@narendasan narendasan merged commit cc4e3e6 into pytorch:master Dec 1, 2022

Labels

cla signed, component: conversion (Issues re: Conversion stage), component: converters (Issues re: Specific op converters), component: core (Issues re: The core compiler), component: tests (Issues re: Tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 [Bug] Compilation Error on Citrinet-1024

3 participants