Conversation

@peri044 peri044 commented Jul 11, 2024

Description

  1. Converter additions for LLM models.
  2. Fixed memory allocations on GPU - models can now be exported on CPU and use the GPU only for TRT compilation (see the sketch after this list).

     Sample partitioning summary (dryrun tracker output) showing the dynamic input shapes:

     Inputs: List[Tensor: (1, (min=1, max=64))@int64]
       ...
       TRT Engine #1 - Submodule name: _run_on_acc_0
        Engine Inputs: List[Tensor: (1, (min=1, max=64))@int64]
        Number of Operators in Engine: 143
        Engine Outputs: List[Tensor: (1, (min=1, max=64), 32000)@float32]
       ...
      Outputs: List[Tensor: (1, (min=1, max=64), 32000)@float32]

  3. Modifications to the dryrun tracker to handle dynamic shapes.
  4. LLM examples.
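
A minimal sketch of the CPU-export / GPU-compile workflow from item 2, assuming a Hugging Face causal-LM and the (min=1, max=64) sequence-length bounds shown above; the model name and exact bounds are illustrative placeholders, not taken from this PR:

    import torch
    import torch_tensorrt
    from transformers import AutoModelForCausalLM

    # Illustrative sketch: "gpt2" and the shape bounds are placeholders.
    # Load and export entirely on CPU; no GPU memory is touched here.
    model = AutoModelForCausalLM.from_pretrained(
        "gpt2", use_cache=False, attn_implementation="eager"
    ).eval()
    seq_len = torch.export.Dim("seq_len", min=1, max=64)
    example_ids = torch.randint(0, 1000, (1, 8), dtype=torch.int64)
    exported = torch.export.export(
        model, (example_ids,), dynamic_shapes={"input_ids": {1: seq_len}}
    )

    # Only the TensorRT compilation step uses the GPU.
    trt_model = torch_tensorrt.dynamo.compile(
        exported,
        inputs=[
            torch_tensorrt.Input(
                min_shape=(1, 1),
                opt_shape=(1, 32),
                max_shape=(1, 64),
                dtype=torch.int64,
            )
        ],
        device=torch.device("cuda:0"),
    )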

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@github-actions github-actions bot added component: lowering Issues re: The lowering / preprocessing passes component: conversion Issues re: Conversion stage component: converters Issues re: Specific op converters component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Jul 11, 2024

@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_DryRunTracker.py	2024-08-19 21:00:09.967336+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_DryRunTracker.py	2024-08-19 21:00:32.451960+00:00
@@ -224,20 +224,22 @@
    """Format shapes and dtypes of input Tensors into a readable string"""

    def input_formatter_helper(shapes: Any, dtypes: Any) -> str:
        """Helper for input formatter"""
        # Base case 1 - single static/dynamic shape, single dtype
-        if isinstance(shapes, tuple) and all(isinstance(elt, (int, tuple)) for elt in shapes):
+        if isinstance(shapes, tuple) and all(
+            isinstance(elt, (int, tuple)) for elt in shapes
+        ):
            input_shape_string = "Tensor: ("
            for elt in shapes:
                if isinstance(elt, tuple):
-                    input_shape_string+= f"(min={elt[0]}, max={elt[1]}), "
+                    input_shape_string += f"(min={elt[0]}, max={elt[1]}), "
                else:
-                    input_shape_string+= f"{elt}, "
+                    input_shape_string += f"{elt}, "
            input_shape_string = input_shape_string[:-2] + ")" + f"@{str(dtypes)[6:]}, "
            return input_shape_string
-        
+
        # Base case 2 - dynamic shape, single dtype
        elif (
            isinstance(shapes, dict)
            and len(shapes) == 3
            and all(
--- /home/runner/work/TensorRT/TensorRT/tools/perf/utils.py	2024-08-19 21:00:10.003336+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/perf/utils.py	2024-08-19 21:00:37.999905+00:00
@@ -28,19 +28,16 @@
}


def load_hf_model(model_name_hf):
    print("Loading user-specified HF model: ", model_name_hf)
-    model_hf = (
-        AutoModelForCausalLM.from_pretrained(
-            model_name_hf,
-            trust_remote_code=True,
-            use_cache=False,
-            attn_implementation="eager",
-        )
-        .eval()
-    )
+    model_hf = AutoModelForCausalLM.from_pretrained(
+        model_name_hf,
+        trust_remote_code=True,
+        use_cache=False,
+        attn_implementation="eager",
+    ).eval()

    return {"model": model_hf}


class ModelStorage:
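
For reference, the helper being reformatted above is what produces the shape strings quoted in the PR description (e.g. "Tensor: (1, (min=1, max=64))@int64"). A standalone sketch of that base case, separate from the actual _DryRunTracker code:

    from typing import Any, Sequence

    import torch

    def format_input(shapes: Sequence[Any], dtype: torch.dtype) -> str:
        # A dynamic dimension arrives as a (min, max) tuple, a static one as an int.
        parts = []
        for elt in shapes:
            if isinstance(elt, tuple):
                parts.append(f"(min={elt[0]}, max={elt[1]})")
            else:
                parts.append(str(elt))
        # str(torch.int64) == "torch.int64"; dropping the first 6 chars leaves "int64".
        return "Tensor: (" + ", ".join(parts) + f")@{str(dtype)[6:]}"

    print(format_input((1, (1, 64)), torch.int64))  # Tensor: (1, (min=1, max=64))@int64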

@peri044 peri044 requested a review from narendasan August 20, 2024 22:08
@github-actions github-actions bot removed the component: tests Issues re: Tests label Aug 21, 2024
@peri044 peri044 requested a review from zewenli98 August 21, 2024 00:41

@narendasan narendasan left a comment

LGTM

@github-actions github-actions bot added the component: tests Issues re: Tests label Aug 28, 2024