lanluo-nvidia
Collaborator

Description

I traced the Windows build issue to torchtrt.dll.
Loading it fails on the getinstance method of the UndefinedTensorImpl class:

The procedure entry point ?getinstance@UndefinedTensorImpl@c10@@CAAEAU12@XZ could not be located in the dynamic link library C:\Users\lanl\git\venv_py310\Lib\site-packages\torch_tensorrt\lib\torchtrt.dll.
I believe this is related to the libtorch version. We currently download libtorch from:

https://download.pytorch.org/libtorch/${CHANNEL}/${CU_VERSION}/libtorch-win-shared-with-deps-latest.zip

and that archive is what causes the issue.
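
To make the drift concrete, here is a minimal sketch of how that URL expands. The helper name `libtorch_url` and the example values `"nightly"` and `"cu124"` are assumptions for illustration, not taken from the build scripts. Note that the expanded URL contains no build date or version, so nothing in it ties the downloaded libtorch to the torch wheel used during the build.

```python
from string import Template

# URL pattern quoted in the description above.
LIBTORCH_URL = Template(
    "https://download.pytorch.org/libtorch/${CHANNEL}/${CU_VERSION}/"
    "libtorch-win-shared-with-deps-latest.zip"
)


def libtorch_url(channel: str, cu_version: str) -> str:
    """Expand the libtorch archive URL (hypothetical helper)."""
    return LIBTORCH_URL.substitute(CHANNEL=channel, CU_VERSION=cu_version)


# "nightly" and "cu124" are assumed example values.
print(libtorch_url("nightly", "cu124"))
```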

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@lanluo-nvidia lanluo-nvidia self-assigned this Aug 14, 2024
@lanluo-nvidia lanluo-nvidia added build-release-artifacts Build the release artifacts ciflow/binaries/all Build for all Python Versions labels Aug 14, 2024

pytorch-bot bot commented Aug 14, 2024

No ciflow labels are configured for this repo.
For information on how to enable CIFlow bot see this wiki

@HolyWu
Contributor

HolyWu commented Aug 15, 2024

I don't think simply changing to local_repository will solve the issue, since the torch installed locally during the wheel-building phase is still the latest nightly version.

@lanluo-nvidia
Collaborator Author

lanluo-nvidia commented Aug 15, 2024

> I don't think simply changing to local_repository will solve the issue, since the torch installed locally during the wheel-building phase is still the latest nightly version.

In the pre_build script I uninstall the latest nightly version and always install the version pinned in py/requirements.txt.
This way torch and libtorch are always in sync. I have verified that the error no longer appears on Windows for cu121/cu124.
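
The idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual pre_build script: the helper names are hypothetical, and the requirements content and version strings in `SAMPLE` are made up for the example.

```python
import re
from typing import Optional


def pinned_version(requirements_text: str, package: str) -> Optional[str]:
    """Return the version from a `package==version` line, or None if absent."""
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*==\s*([^\s;#]+)", re.IGNORECASE
    )
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


def needs_reinstall(installed: Optional[str], pinned: Optional[str]) -> bool:
    """True when a pin exists and the installed version differs from it."""
    return pinned is not None and installed != pinned


# Made-up requirements content, for illustration only.
SAMPLE = """\
numpy
torch==2.5.0.dev20240801+cu124
torchvision==0.20.0.dev20240801+cu124
"""

pin = pinned_version(SAMPLE, "torch")
# A newer nightly than the pin is installed, so it should be replaced.
print(needs_reinstall("2.5.0.dev20240814+cu124", pin))
```

In a real script the installed version could be read with `importlib.metadata.version("torch")`, and a mismatch would trigger a pip uninstall followed by an install of the pinned version.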

However, there is a new error on Windows for cu118:
torchvision 0.19.0.dev20240617+cu118 depends on torch==2.5.0.dev20240616+cu118 on Windows,
but https://download.pytorch.org/whl/nightly/torch/ does not have a 2.5.0.dev20240616+cu118 build.

I think we will have to bump torchvision from 0.19 to 0.20 anyway. That causes some batch_norm test failures, but they are acceptable and can be fixed later.
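
For reference, the version strings above appear to follow a `<release>.dev<YYYYMMDD>+<CUDA tag>` pattern. A minimal sketch for splitting them apart, assuming that pattern holds (the `parse_nightly` helper and `NightlyVersion` type are hypothetical names):

```python
import re
from typing import NamedTuple, Optional


class NightlyVersion(NamedTuple):
    release: str    # e.g. "2.5.0"
    dev_date: str   # e.g. "20240616"
    local_tag: str  # e.g. "cu118"


# Assumed pattern: <release>.dev<YYYYMMDD>+<local tag>
_NIGHTLY = re.compile(r"^(\d+(?:\.\d+)*)\.dev(\d{8})\+([A-Za-z0-9.]+)$")


def parse_nightly(version: str) -> Optional[NightlyVersion]:
    """Split a nightly version string, or return None if it does not match."""
    match = _NIGHTLY.match(version)
    return NightlyVersion(*match.groups()) if match else None


# The two version strings quoted in the comment above.
torchvision_build = parse_nightly("0.19.0.dev20240617+cu118")
torch_pin = parse_nightly("2.5.0.dev20240616+cu118")
print(torchvision_build, torch_pin)
```

Splitting the strings this way makes it easy to see that the torchvision build requires a torch build dated one day earlier with the same CUDA tag.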

@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: build system Issues re: Build system component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Aug 15, 2024

@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_compile.py	2024-08-15 23:49:19.753876+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_compile.py	2024-08-15 23:49:38.292213+00:00
@@ -532,6 +532,6 @@

                with enable_torchbind_tracing():
                    exp_program = torch.export.export(
                        module, tuple(arg_inputs), kwargs=kwarg_inputs, strict=False
                    )
-                    torch.export.save(exp_program, file_path)
\ No newline at end of file
+                    torch.export.save(exp_program, file_path)

@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_compile.py	2024-08-16 04:05:49.326484+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_compile.py	2024-08-16 04:06:08.080367+00:00
@@ -532,6 +532,6 @@

                with enable_torchbind_tracing():
                    exp_program = torch.export.export(
                        module, tuple(arg_inputs), kwargs=kwarg_inputs, strict=False
                    )
-                    torch.export.save(exp_program, file_path)
\ No newline at end of file
+                    torch.export.save(exp_program, file_path)

@github-actions github-actions bot removed the component: build system Issues re: Build system label Aug 16, 2024

@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_compile.py	2024-08-16 04:25:34.264305+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_compile.py	2024-08-16 04:25:54.988511+00:00
@@ -532,6 +532,6 @@

                with enable_torchbind_tracing():
                    exp_program = torch.export.export(
                        module, tuple(arg_inputs), kwargs=kwarg_inputs, strict=False
                    )
-                    torch.export.save(exp_program, file_path)
\ No newline at end of file
+                    torch.export.save(exp_program, file_path)

@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_compile.py	2024-08-16 04:27:06.573034+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_compile.py	2024-08-16 04:27:25.968256+00:00
@@ -532,6 +532,6 @@

                with enable_torchbind_tracing():
                    exp_program = torch.export.export(
                        module, tuple(arg_inputs), kwargs=kwarg_inputs, strict=False
                    )
-                    torch.export.save(exp_program, file_path)
\ No newline at end of file
+                    torch.export.save(exp_program, file_path)

Collaborator

@narendasan narendasan left a comment

LGTM

@lanluo-nvidia lanluo-nvidia merged commit 4d8a94a into main Aug 16, 2024
Labels
build-release-artifacts Build the release artifacts ciflow/binaries/all Build for all Python Versions cla signed component: api [Python] Issues re: Python API component: conversion Issues re: Conversion stage component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths
4 participants