
Simulated OpenVINO Backend for Testing Unmerged PR Features with Memory Profiling #21500


Draft
wants to merge 8 commits into master from gsoc2025

Conversation

Mohamed-Ashraf273
Contributor

Operating System: Ubuntu 22.04 (LTS)
Device used for inference: CPU
OpenVINO installation: PyPi
Programming Language: Python
Hardware Architecture: x86 (64 bits)
Model used: GPT-2
Model quantization: No

Performance issue description

During my GSoC project, I ran into the following issue: running the generate step with the OpenVINO backend shows unexpectedly high memory usage. The measurements below are based on these PRs:
Keras: #21491
Keras_hub: keras-team/keras-hub#2310

causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float32")
causal_lm.summary()

For OpenVINO, the model is serialized with the following size:

Total params: 354,823,168 (1.32 GB)
Trainable params: 354,823,168 (1.32 GB)
Non-trainable params: 0 (0.00 B)
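(For reference: 354,823,168 params × 4 bytes/param = 1,419,292,672 bytes ≈ 1.32 GiB, matching the reported float32 size.)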

Generate with max_length=20:

OpenVINO:
Generated text: Keras is  an open-source machine learning framework for Python, written by 
Keras is using the backend: openvino
Latency: 7.02 seconds
Throughput: 1.57 tokens/sec
CPU Memory Used (end - start): 2708.63 MB
Peak CPU Memory Used: 2832.45 MB

TensorFlow:
Generated text: Keras is  a powerful Python programming language that allows you to create powerful interactive models
Keras is using the backend: tensorflow
Latency: 8.97 seconds
Throughput: 1.67 tokens/sec
CPU Memory Used (end - start): 264.24 MB
Peak CPU Memory Used: 264.07 MB

JAX:
Generated text: Keras is  an object-oriented framework for building complex models with Python. It
Keras is using the backend: jax
Latency: 11.65 seconds
Throughput: 1.03 tokens/sec
CPU Memory Used (end - start): 260.25 MB
Peak CPU Memory Used: 260.07 MB


Torch:
Generated text: Keras is _____ and you want to use Keras?
This tutorial explains
Keras is using the backend: torch
Latency: 4.13 seconds
Throughput: 2.91 tokens/sec
CPU Memory Used (end - start): 56.97 MB
Peak CPU Memory Used: 56.61 MB

For the float16 variant:

causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float16")
causal_lm.summary()

For OpenVINO, the model is serialized with the following size:

Total params: 354,823,168 (676.77 MB)
Trainable params: 354,823,168 (676.77 MB)
Non-trainable params: 0 (0.00 B)
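(For reference: 354,823,168 params × 2 bytes/param = 709,646,336 bytes ≈ 676.77 MiB, matching the reported float16 size.)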

Generate with max_length=20:

OpenVINO:
Generated text: Keras is  an open source framework for rapid prototyping and automation that is designed
Keras is using the backend: openvino
Latency: 12.31 seconds
Throughput: 1.14 tokens/sec
CPU Memory Used (end - start): 5564.21 MB
Peak CPU Memory Used: 6352.55 MB

TensorFlow:
Generated text: Keras is  a great language library for the JavaScript programming language. It provides you
Keras is using the backend: tensorflow
Latency: 11.43 seconds
Throughput: 1.22 tokens/sec
CPU Memory Used (end - start): 441.70 MB
Peak CPU Memory Used: 441.34 MB

JAX:
Generated text: Keras is _____. It's a language that is written in, or at least
Keras is using the backend: jax
Latency: 11.12 seconds
Throughput: 1.17 tokens/sec
CPU Memory Used (end - start): 364.71 MB
Peak CPU Memory Used: 1909.30 MB

Torch:
Generated text: Keras is  a powerful machine learning library written by  Martin Odersky 
Keras is using the backend: torch
Latency: 19.95 seconds
Throughput: 0.55 tokens/sec
CPU Memory Used (end - start): 62.81 MB
Peak CPU Memory Used: 258.79 MB

Step-by-step reproduction

Using these PRs:
Keras: #21491
Keras_hub: keras-team/keras-hub#2310

Run the following code:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

os.environ["KERAS_BACKEND"] = "openvino"


import time
import psutil
import keras
import keras_hub
import threading


causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float16")

process = psutil.Process(os.getpid())
mem_before = process.memory_info().rss / (1024 ** 2)  # baseline RSS in MB

peak_memory = mem_before
done = [False]

def monitor_memory():
    # Poll RSS every 50 ms on a background thread to track the peak.
    global peak_memory
    while not done[0]:
        mem_now = process.memory_info().rss / (1024 ** 2)
        if mem_now > peak_memory:
            peak_memory = mem_now
        time.sleep(0.05)

monitor_thread = threading.Thread(target=monitor_memory)
monitor_thread.start()

start_time = time.perf_counter()
output = causal_lm.generate("Keras is ", max_length=20)

end_time = time.perf_counter()
done[0] = True
monitor_thread.join()

mem_after = process.memory_info().rss / (1024 ** 2)
memory_used = mem_after - mem_before
latency = end_time - start_time
tokens_generated = len(output.split())  # approximate token count (whitespace-split words)
throughput = tokens_generated / latency

print("Generated text:", output)
print(f"Keras is using the backend: {keras.backend.backend()}")
print(f"Latency: {latency:.2f} seconds")
print(f"Throughput: {throughput:.2f} tokens/sec")
print(f"CPU Memory Used (end - start): {memory_used:.2f} MB")
print(f"Peak CPU Memory Used: {peak_memory - mem_before:.2f} MB")

Issue submission checklist

  • I'm reporting a performance issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.

@Mohamed-Ashraf273 Mohamed-Ashraf273 marked this pull request as draft July 22, 2025 13:13
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @Mohamed-Ashraf273, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly advances the OpenVINO backend for Keras by implementing critical missing operations and introducing direct model export capabilities. The changes aim to improve the performance and memory efficiency of Keras models on OpenVINO, address the reported high memory usage, and enable a more seamless workflow for deploying Keras models with OpenVINO.

Highlights

  • Expanded OpenVINO Backend Operations: Implemented several previously unsupported Keras operations for the OpenVINO backend, including slice_update, repeat, tri, tril, triu, and categorical functions. This significantly broadens the range of Keras models and operations that can be executed efficiently with OpenVINO.
  • Improved OpenVINO Tensor Handling: Enhanced the OpenVINOKerasTensor class to support conversion to NumPy arrays via __array__ and numpy() methods, and enabled direct handling of ov.Output objects. This improves compatibility, debuggability, and integration within the Keras ecosystem.
  • Direct OpenVINO Model Export: Introduced a new export_openvino utility and integrated it into the keras.models.Model.export() method. Keras models can now be directly exported to the OpenVINO Intermediate Representation (IR) format (.xml and .bin) for optimized inference on Intel hardware; see the usage sketch after this list.
  • Test Suite Adjustments and Progress: Updated the OpenVINO backend's test exclusion lists, enabling several previously skipped NumPy and core operations tests (e.g., test_tri, test_repeat, test_slice_update). New test files for the OpenVINO export functionality were added, demonstrating successful export and inference for various model types and input structures.
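
A minimal usage sketch of the new export path, assuming the export format keyword is "openvino" and that the IR files (.xml/.bin) are written into the target directory; the argument names here are assumptions based on the highlight above, not confirmed API:

import keras

# Hypothetical usage of the export path described in the highlights;
# the format string "openvino" and the output layout are assumptions.
inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(4)(inputs)
model = keras.Model(inputs, outputs)

# Expected to write the OpenVINO IR files (model.xml / model.bin)
# into the given directory for inference with the OpenVINO runtime.
model.export("exported_model_ir", format="openvino")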

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances the OpenVINO backend by implementing previously unsupported operations and adding a new feature to export Keras models to the OpenVINO IR format. The code is generally of high quality. My review focuses on improving maintainability by suggesting refactoring for complex functions and duplicated code, fixing a minor issue in a test, and removing non-source files from the PR. Additionally, I've highlighted some tests that have been excluded, which may indicate areas needing further attention.

Comment on lines 1 to 49
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.4.0, pluggy-1.6.0 -- /home/mohamed-ashraf/Desktop/GSoC2025/env/bin/python
cachedir: .pytest_cache
rootdir: /home/mohamed-ashraf/Desktop/GSoC2025/keras-hub
configfile: pytest.ini
plugins: cov-6.1.1
collecting ... collected 15 items

keras_hub/src/models/gemma/gemma_causal_lm_test.py::TestCase::test_session SKIPPED [ 6%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_all_presets SKIPPED [ 13%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_cache_correctness SKIPPED [ 20%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_causal_lm_basics SKIPPED [ 26%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_early_stopping PASSED [ 33%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_flash_attention_call SKIPPED [ 40%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_generate PASSED [ 46%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_generate_compilation PASSED [ 53%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_generate_with_bfloat16 PASSED [ 60%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_multitoken_stopping PASSED [ 66%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_saved_model SKIPPED [ 73%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_score_layer_intercept_fn_exfiltration PASSED [ 80%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_score_logits PASSED [ 86%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_score_loss SKIPPED [ 93%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_session PASSED [100%]

=============================== warnings summary ===============================
../../../../../usr/lib/python3.12/multiprocessing/popen_fork.py:66
../../../../../usr/lib/python3.12/multiprocessing/popen_fork.py:66
/usr/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=450338) is multi-threaded, use of fork() may lead to deadlocks in the child.
self.pid = os.fork()

../env/lib/python3.12/site-packages/openvino/runtime/__init__.py:10
/home/mohamed-ashraf/Desktop/GSoC2025/env/lib/python3.12/site-packages/openvino/runtime/__init__.py:10: DeprecationWarning: The `openvino.runtime` module is deprecated and will be removed in the 2026.0 release. Please replace `openvino.runtime` with `openvino`.
warnings.warn(

../env/lib/python3.12/site-packages/_pytest/config/__init__.py:1474
/home/mohamed-ashraf/Desktop/GSoC2025/env/lib/python3.12/site-packages/_pytest/config/__init__.py:1474: PytestConfigWarning: Unknown config option: env

self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

../env/lib/python3.12/site-packages/google/protobuf/internal/well_known_types.py:91
/home/mohamed-ashraf/Desktop/GSoC2025/env/lib/python3.12/site-packages/google/protobuf/internal/well_known_types.py:91: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
_EPOCH_DATETIME_NAIVE = datetime.datetime.utcfromtimestamp(0)

keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_session
/usr/lib/python3.12/unittest/case.py:690: DeprecationWarning: It is deprecated to return a value that is not None from a test case (<bound method TensorFlowTestCase.test_session of <keras_hub.src.models.gemma.gemma_causal_lm_test.GemmaCausalLMTest testMethod=test_session>>)
return self.run(*args, **kwds)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================== 8 passed, 7 skipped, 6 warnings in 11.34s ===================

critical

This file appears to be a test log. Such files should not be included in the source code repository. Please remove it, along with gpt2_test.txt and mistral_test.txt.

@@ -815,10 +842,152 @@ def prepare_slice_index(val):


def slice_update(inputs, start_indices, updates):

medium

The slice_update function is lengthy and complex. Refactoring it into smaller, more focused helper functions could improve readability and maintainability. Consider helpers for processing start_indices, generating the multi-dimensional indices matrix, and calculating absolute indices.
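
To make the suggestion concrete, the decomposition might look roughly like the sketch below, written against plain NumPy rather than the OpenVINO opset so it stays self-contained; the helper names are illustrative and not part of the PR:

import numpy as np

def _normalize_start_indices(start_indices, ndim):
    # Coerce start indices to plain Python ints, one per dimension.
    return [int(i) for i in start_indices][:ndim]

def _update_index_grid(update_shape):
    # Build the multi-dimensional index matrix covering the update block,
    # one row of per-axis indices for every element of the update.
    grids = np.meshgrid(*[np.arange(s) for s in update_shape], indexing="ij")
    return np.stack([g.reshape(-1) for g in grids], axis=-1)

def _absolute_indices(relative_indices, starts):
    # Shift block-relative indices by the slice start offsets.
    return relative_indices + np.asarray(starts)

def slice_update(inputs, start_indices, updates):
    starts = _normalize_start_indices(start_indices, inputs.ndim)
    rel = _update_index_grid(updates.shape)
    abs_idx = _absolute_indices(rel, starts)
    out = inputs.copy()
    out[tuple(abs_idx.T)] = updates.reshape(-1)
    return out

# Example: write a 2x2 block of ones into the middle of a 4x4 zero matrix.
print(slice_update(np.zeros((4, 4)), (1, 1), np.ones((2, 2))))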

Comment on lines 1707 to 1714
x = get_ov_output(x)
ov_type = x.get_element_type()
shape = ov_opset.shape_of(x, Type.i32)
zero_const = ov_opset.constant(0, Type.i32)
minus2 = ov_opset.constant([-2], Type.i32)
minus1 = ov_opset.constant([-1], Type.i32)
M = ov_opset.squeeze(ov_opset.gather(shape, minus2, zero_const), zero_const)
N = ov_opset.squeeze(ov_opset.gather(shape, minus1, zero_const), zero_const)

medium

The logic to extract matrix dimensions M and N from the input shape is duplicated in tril and triu. Extracting this into a private helper function would improve code reuse and maintainability.
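
Based on the snippet above, one possible shape-extraction helper (the name is illustrative; ov_opset and Type refer to the module's existing OpenVINO imports) could be:

def _get_last_two_dims(x):
    # Extract the trailing matrix dimensions M (rows) and N (columns)
    # from the dynamic shape of x, as currently done in both tril and triu.
    shape = ov_opset.shape_of(x, Type.i32)
    zero_const = ov_opset.constant(0, Type.i32)
    minus2 = ov_opset.constant([-2], Type.i32)
    minus1 = ov_opset.constant([-1], Type.i32)
    M = ov_opset.squeeze(ov_opset.gather(shape, minus2, zero_const), zero_const)
    N = ov_opset.squeeze(ov_opset.gather(shape, minus1, zero_const), zero_const)
    return M, N

Both tril and triu could then start with M, N = _get_last_two_dims(x).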

Comment on lines 201 to 208
class TwoInputsModel(models.Model):
def call(self, x, y):
return x + y

def build(self, y_shape, x_shape):
self.built = True


medium

The build method signature build(self, y_shape, x_shape) has its arguments reversed compared to the call(self, x, y) method. For consistency, the build signature should match the order of arguments in call, i.e., build(self, x_shape, y_shape).

            def build(self, x_shape, y_shape):

@@ -1,6 +1,5 @@
NumPyTestRot90
NumpyArrayCreateOpsCorrectnessTest::test_eye
NumpyArrayCreateOpsCorrectnessTest::test_tri
NumpyDtypeTest::test_absolute_bool

medium

Several tests related to tri and repeat have been excluded. Please ensure these exclusions are intentional and documented, as they might indicate unimplemented or incorrectly implemented functionality.

@@ -32,7 +32,7 @@ keras/src/ops/linalg_test.py
keras/src/ops/nn_test.py
keras/src/optimizers
keras/src/quantizers
keras/src/random
keras/src/random/seed_generator_test.py

medium

The file keras/src/random/seed_generator_test.py is excluded. This suggests potential issues with random number generation or seed handling in the OpenVINO backend. Verify that the random number generation is working as expected.

CoreOpsCallsTests::test_switch_basic_call
CoreOpsCallsTests::test_unstack_basic_functionality
CoreOpsCorrectnessTest::test_associative_scan
CoreOpsCorrectnessTest::test_cond
CoreOpsCorrectnessTest::test_dynamic_slice
CoreOpsCorrectnessTest::test_fori_loop
CoreOpsCorrectnessTest::test_map
CoreOpsCorrectnessTest::test_scan
CoreOpsCorrectnessTest::test_scatter

medium

The CoreOpsCallsTests::test_slice_update_basic_call test is excluded. This exclusion indicates that slice_update may not be fully supported or may have issues. Verify the implementation of slice_update.

@codecov-commenter

codecov-commenter commented Jul 22, 2025

Codecov Report

❌ Patch coverage is 88.76404% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.72%. Comparing base (3554825) to head (bd3b9cc).
⚠️ Report is 4 commits behind head on master.

Files with missing lines               Patch %   Lines
keras/src/backend/openvino/core.py     82.45%    6 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21500      +/-   ##
==========================================
- Coverage   82.87%   82.72%   -0.15%     
==========================================
  Files         567      567              
  Lines       56073    56311     +238     
  Branches     8756     8800      +44     
==========================================
+ Hits        46470    46585     +115     
- Misses       7459     7566     +107     
- Partials     2144     2160      +16     
Flag Coverage Δ
keras 82.53% <88.76%> (-0.15%) ⬇️
keras-jax 63.81% <0.00%> (-0.22%) ⬇️
keras-numpy 58.31% <0.00%> (-0.21%) ⬇️
keras-openvino 34.66% <88.76%> (+0.05%) ⬆️
keras-tensorflow 64.23% <0.00%> (-0.23%) ⬇️
keras-torch 63.87% <0.00%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.


@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the gsoc2025 branch 2 times, most recently from 4327cc8 to 6855d8b Compare July 22, 2025 16:23