
Simulated OpenVINO Backend for Testing Unmerged PR Features with Memory Profiling #21500


Draft
wants to merge 8 commits into master from gsoc2025

Conversation

Mohamed-Ashraf273
Contributor

Operating System: Ubuntu 22.04 (LTS)
Device used for inference: CPU
OpenVINO installation: PyPi
Programming Language: Python
Hardware Architecture: x86 (64 bits)
Model used: GPT-2
Model quantization: No

Performance issue description

During my GSoC project, I ran into the following issue: running the generate step with the OpenVINO backend shows unexpectedly high memory usage. The measurements below are based on these PRs:
Keras: #21491
Keras_hub: keras-team/keras-hub#2310

causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float32")
causal_lm.summary()

For OpenVINO, the model is serialized with the following size:

Total params: 354,823,168 (1.32 GB)
Trainable params: 354,823,168 (1.32 GB)
Non-trainable params: 0 (0.00 B)
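(For reference: 354,823,168 params × 4 bytes/param = 1,419,292,672 bytes ≈ 1.32 GiB, matching the reported float32 size.)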

Generate with max_length=20:

OpenVINO:
Generated text: Keras is  an open-source machine learning framework for Python, written by 
Keras is using the backend: openvino
Latency: 7.02 seconds
Throughput: 1.57 tokens/sec
CPU Memory Used (end - start): 2708.63 MB
Peak CPU Memory Used: 2832.45 MB

TensorFlow:
Generated text: Keras is  a powerful Python programming language that allows you to create powerful interactive models
Keras is using the backend: tensorflow
Latency: 8.97 seconds
Throughput: 1.67 tokens/sec
CPU Memory Used (end - start): 264.24 MB
Peak CPU Memory Used: 264.07 MB

JAX:
Generated text: Keras is  an object-oriented framework for building complex models with Python. It
Keras is using the backend: jax
Latency: 11.65 seconds
Throughput: 1.03 tokens/sec
CPU Memory Used (end - start): 260.25 MB
Peak CPU Memory Used: 260.07 MB


Torch:
Generated text: Keras is _____ and you want to use Keras?
This tutorial explains
Keras is using the backend: torch
Latency: 4.13 seconds
Throughput: 2.91 tokens/sec
CPU Memory Used (end - start): 56.97 MB
Peak CPU Memory Used: 56.61 MB

For the float16 variant:

causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float16")
causal_lm.summary()

For OpenVINO, the model is serialized with the following size:

Total params: 354,823,168 (676.77 MB)
Trainable params: 354,823,168 (676.77 MB)
Non-trainable params: 0 (0.00 B)
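(For reference: 354,823,168 params × 2 bytes/param = 709,646,336 bytes ≈ 676.77 MiB, matching the reported float16 size.)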

Generate with max_length=20:

OpenVINO:
Generated text: Keras is  an open source framework for rapid prototyping and automation that is designed
Keras is using the backend: openvino
Latency: 12.31 seconds
Throughput: 1.14 tokens/sec
CPU Memory Used (end - start): 5564.21 MB
Peak CPU Memory Used: 6352.55 MB

TensorFlow:
Generated text: Keras is  a great language library for the JavaScript programming language. It provides you
Keras is using the backend: tensorflow
Latency: 11.43 seconds
Throughput: 1.22 tokens/sec
CPU Memory Used (end - start): 441.70 MB
Peak CPU Memory Used: 441.34 MB

JAX:
Generated text: Keras is _____. It's a language that is written in, or at least
Keras is using the backend: jax
Latency: 11.12 seconds
Throughput: 1.17 tokens/sec
CPU Memory Used (end - start): 364.71 MB
Peak CPU Memory Used: 1909.30 MB

Torch:
Generated text: Keras is  a powerful machine learning library written by  Martin Odersky 
Keras is using the backend: torch
Latency: 19.95 seconds
Throughput: 0.55 tokens/sec
CPU Memory Used (end - start): 62.81 MB
Peak CPU Memory Used: 258.79 MB

Step-by-step reproduction

Using these PRs:
Keras: #21491
Keras_hub: keras-team/keras-hub#2310

Run the following code:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

os.environ["KERAS_BACKEND"] = "openvino"


import time
import psutil
import keras
import keras_hub
import threading


causal_lm = keras_hub.models.GPT2CausalLM.from_preset("gpt2_medium_en", dtype="float16")

process = psutil.Process(os.getpid())
mem_before = process.memory_info().rss / (1024 ** 2)  # baseline RSS in MB

peak_memory = mem_before
done = [False]

def monitor_memory():
    # Poll RSS every 50 ms on a background thread to track the peak.
    global peak_memory
    while not done[0]:
        mem_now = process.memory_info().rss / (1024 ** 2)
        if mem_now > peak_memory:
            peak_memory = mem_now
        time.sleep(0.05)

monitor_thread = threading.Thread(target=monitor_memory)
monitor_thread.start()

start_time = time.perf_counter()
output = causal_lm.generate("Keras is ", max_length=20)

end_time = time.perf_counter()
done[0] = True
monitor_thread.join()

mem_after = process.memory_info().rss / (1024 ** 2)
memory_used = mem_after - mem_before
latency = end_time - start_time
tokens_generated = len(output.split())  # approximate token count (whitespace-split words)
throughput = tokens_generated / latency

print("Generated text:", output)
print(f"Keras is using the backend: {keras.backend.backend()}")
print(f"Latency: {latency:.2f} seconds")
print(f"Throughput: {throughput:.2f} tokens/sec")
print(f"CPU Memory Used (end - start): {memory_used:.2f} MB")
print(f"Peak CPU Memory Used: {peak_memory - mem_before:.2f} MB")

Issue submission checklist

  • I'm reporting a performance issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.

@Mohamed-Ashraf273 Mohamed-Ashraf273 marked this pull request as draft July 22, 2025 13:13
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @Mohamed-Ashraf273, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly advances the OpenVINO backend for Keras by implementing critical missing operations and introducing direct model export capabilities. The changes aim to improve the performance and memory efficiency of Keras models on OpenVINO, address the reported high memory usage, and enable a more seamless workflow for deploying Keras models with OpenVINO.

Highlights

  • Expanded OpenVINO Backend Operations: Implemented several previously unsupported Keras operations for the OpenVINO backend, including slice_update, repeat, tri, tril, triu, and categorical functions. This significantly broadens the range of Keras models and operations that can be executed efficiently with OpenVINO.
  • Improved OpenVINO Tensor Handling: Enhanced the OpenVINOKerasTensor class to support conversion to NumPy arrays via __array__ and numpy() methods, and enabled direct handling of ov.Output objects. This improves compatibility, debuggability, and integration within the Keras ecosystem.
  • Direct OpenVINO Model Export: Introduced a new export_openvino utility and integrated it into the keras.models.Model.export() method. Keras models can now be directly exported to the OpenVINO Intermediate Representation (IR) format (.xml and .bin) for optimized inference on Intel hardware; see the usage sketch after this list.
  • Test Suite Adjustments and Progress: Updated the OpenVINO backend's test exclusion lists, enabling several previously skipped NumPy and core operations tests (e.g., test_tri, test_repeat, test_slice_update). New test files for the OpenVINO export functionality were added, demonstrating successful export and inference for various model types and input structures.
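
A minimal usage sketch of the new export path, assuming the export format keyword is "openvino" and that the IR files (.xml/.bin) are written into the target directory; the argument names here are assumptions based on the highlight above, not confirmed API:

import keras

# Hypothetical usage of the export path described in the highlights;
# the format string "openvino" and the output layout are assumptions.
inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(4)(inputs)
model = keras.Model(inputs, outputs)

# Expected to write the OpenVINO IR files (model.xml / model.bin)
# into the given directory for inference with the OpenVINO runtime.
model.export("exported_model_ir", format="openvino")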

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances the OpenVINO backend by implementing previously unsupported operations and adding a new feature to export Keras models to the OpenVINO IR format. The code is generally of high quality. My review focuses on improving maintainability by suggesting refactoring for complex functions and duplicated code, fixing a minor issue in a test, and removing non-source files from the PR. Additionally, I've highlighted some tests that have been excluded, which may indicate areas needing further attention.

Comment on lines 1 to 49
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.4.0, pluggy-1.6.0 -- /home/mohamed-ashraf/Desktop/GSoC2025/env/bin/python
cachedir: .pytest_cache
rootdir: /home/mohamed-ashraf/Desktop/GSoC2025/keras-hub
configfile: pytest.ini
plugins: cov-6.1.1
collecting ... collected 15 items

keras_hub/src/models/gemma/gemma_causal_lm_test.py::TestCase::test_session SKIPPED [ 6%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_all_presets SKIPPED [ 13%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_cache_correctness SKIPPED [ 20%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_causal_lm_basics SKIPPED [ 26%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_early_stopping PASSED [ 33%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_flash_attention_call SKIPPED [ 40%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_generate PASSED [ 46%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_generate_compilation PASSED [ 53%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_generate_with_bfloat16 PASSED [ 60%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_multitoken_stopping PASSED [ 66%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_saved_model SKIPPED [ 73%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_score_layer_intercept_fn_exfiltration PASSED [ 80%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_score_logits PASSED [ 86%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_score_loss SKIPPED [ 93%]
keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_session PASSED [100%]

=============================== warnings summary ===============================
../../../../../usr/lib/python3.12/multiprocessing/popen_fork.py:66
../../../../../usr/lib/python3.12/multiprocessing/popen_fork.py:66
/usr/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=450338) is multi-threaded, use of fork() may lead to deadlocks in the child.
self.pid = os.fork()

../env/lib/python3.12/site-packages/openvino/runtime/__init__.py:10
/home/mohamed-ashraf/Desktop/GSoC2025/env/lib/python3.12/site-packages/openvino/runtime/__init__.py:10: DeprecationWarning: The `openvino.runtime` module is deprecated and will be removed in the 2026.0 release. Please replace `openvino.runtime` with `openvino`.
warnings.warn(

../env/lib/python3.12/site-packages/_pytest/config/__init__.py:1474
/home/mohamed-ashraf/Desktop/GSoC2025/env/lib/python3.12/site-packages/_pytest/config/__init__.py:1474: PytestConfigWarning: Unknown config option: env

self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

../env/lib/python3.12/site-packages/google/protobuf/internal/well_known_types.py:91
/home/mohamed-ashraf/Desktop/GSoC2025/env/lib/python3.12/site-packages/google/protobuf/internal/well_known_types.py:91: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
_EPOCH_DATETIME_NAIVE = datetime.datetime.utcfromtimestamp(0)

keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_session
/usr/lib/python3.12/unittest/case.py:690: DeprecationWarning: It is deprecated to return a value that is not None from a test case (<bound method TensorFlowTestCase.test_session of <keras_hub.src.models.gemma.gemma_causal_lm_test.GemmaCausalLMTest testMethod=test_session>>)
return self.run(*args, **kwds)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================== 8 passed, 7 skipped, 6 warnings in 11.34s ===================

critical

This file appears to be a test log. Such files should not be included in the source code repository. Please remove it, along with gpt2_test.txt and mistral_test.txt.

@@ -815,10 +842,152 @@ def prepare_slice_index(val):


def slice_update(inputs, start_indices, updates):

medium

The slice_update function is lengthy and complex. Refactoring it into smaller, more focused helper functions could improve readability and maintainability. Consider helpers for processing start_indices, generating the multi-dimensional indices matrix, and calculating absolute indices.
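
To make the suggestion concrete, the decomposition might look roughly like the sketch below, written against plain NumPy rather than the OpenVINO opset so it stays self-contained; the helper names are illustrative and not part of the PR:

import numpy as np

def _normalize_start_indices(start_indices, ndim):
    # Coerce start indices to plain Python ints, one per dimension.
    return [int(i) for i in start_indices][:ndim]

def _update_index_grid(update_shape):
    # Build the multi-dimensional index matrix covering the update block,
    # one row of per-axis indices for every element of the update.
    grids = np.meshgrid(*[np.arange(s) for s in update_shape], indexing="ij")
    return np.stack([g.reshape(-1) for g in grids], axis=-1)

def _absolute_indices(relative_indices, starts):
    # Shift block-relative indices by the slice start offsets.
    return relative_indices + np.asarray(starts)

def slice_update(inputs, start_indices, updates):
    starts = _normalize_start_indices(start_indices, inputs.ndim)
    rel = _update_index_grid(updates.shape)
    abs_idx = _absolute_indices(rel, starts)
    out = inputs.copy()
    out[tuple(abs_idx.T)] = updates.reshape(-1)
    return out

# Example: write a 2x2 block of ones into the middle of a 4x4 zero matrix.
print(slice_update(np.zeros((4, 4)), (1, 1), np.ones((2, 2))))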

Comment on lines 1707 to 1714
x = get_ov_output(x)
ov_type = x.get_element_type()
shape = ov_opset.shape_of(x, Type.i32)
zero_const = ov_opset.constant(0, Type.i32)
minus2 = ov_opset.constant([-2], Type.i32)
minus1 = ov_opset.constant([-1], Type.i32)
M = ov_opset.squeeze(ov_opset.gather(shape, minus2, zero_const), zero_const)
N = ov_opset.squeeze(ov_opset.gather(shape, minus1, zero_const), zero_const)

medium

The logic to extract matrix dimensions M and N from the input shape is duplicated in tril and triu. Extracting this into a private helper function would improve code reuse and maintainability.
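
Based on the snippet above, one possible shape-extraction helper (the name is illustrative; ov_opset and Type refer to the module's existing OpenVINO imports) could be:

def _get_last_two_dims(x):
    # Extract the trailing matrix dimensions M (rows) and N (columns)
    # from the dynamic shape of x, as currently done in both tril and triu.
    shape = ov_opset.shape_of(x, Type.i32)
    zero_const = ov_opset.constant(0, Type.i32)
    minus2 = ov_opset.constant([-2], Type.i32)
    minus1 = ov_opset.constant([-1], Type.i32)
    M = ov_opset.squeeze(ov_opset.gather(shape, minus2, zero_const), zero_const)
    N = ov_opset.squeeze(ov_opset.gather(shape, minus1, zero_const), zero_const)
    return M, N

Both tril and triu could then start with M, N = _get_last_two_dims(x).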

Comment on lines 201 to 208
class TwoInputsModel(models.Model):
def call(self, x, y):
return x + y

def build(self, y_shape, x_shape):
self.built = True


medium

The build method signature build(self, y_shape, x_shape) has its arguments reversed compared to the call(self, x, y) method. For consistency, the build signature should match the order of arguments in call, i.e., build(self, x_shape, y_shape).

            def build(self, x_shape, y_shape):

@@ -1,6 +1,5 @@
NumPyTestRot90
NumpyArrayCreateOpsCorrectnessTest::test_eye
NumpyArrayCreateOpsCorrectnessTest::test_tri
NumpyDtypeTest::test_absolute_bool

medium

Several tests related to tri and repeat have been excluded. Please ensure these exclusions are intentional and documented, as they might indicate unimplemented or incorrectly implemented functionality.

@@ -32,7 +32,7 @@ keras/src/ops/linalg_test.py
keras/src/ops/nn_test.py
keras/src/optimizers
keras/src/quantizers
keras/src/random
keras/src/random/seed_generator_test.py

medium

The file keras/src/random/seed_generator_test.py is excluded. This suggests potential issues with random number generation or seed handling in the OpenVINO backend. Verify that the random number generation is working as expected.

CoreOpsCallsTests::test_switch_basic_call
CoreOpsCallsTests::test_unstack_basic_functionality
CoreOpsCorrectnessTest::test_associative_scan
CoreOpsCorrectnessTest::test_cond
CoreOpsCorrectnessTest::test_dynamic_slice
CoreOpsCorrectnessTest::test_fori_loop
CoreOpsCorrectnessTest::test_map
CoreOpsCorrectnessTest::test_scan
CoreOpsCorrectnessTest::test_scatter

medium

The CoreOpsCallsTests::test_slice_update_basic_call test is excluded. This exclusion indicates that slice_update may not be fully supported or may have issues. Verify the implementation of slice_update.

@codecov-commenter

codecov-commenter commented Jul 22, 2025

Codecov Report

❌ Patch coverage is 88.76404% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.72%. Comparing base (3554825) to head (bd3b9cc).
⚠️ Report is 4 commits behind head on master.

Files with missing lines               Patch %   Lines
keras/src/backend/openvino/core.py     82.45%    6 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21500      +/-   ##
==========================================
- Coverage   82.87%   82.72%   -0.15%     
==========================================
  Files         567      567              
  Lines       56073    56311     +238     
  Branches     8756     8800      +44     
==========================================
+ Hits        46470    46585     +115     
- Misses       7459     7566     +107     
- Partials     2144     2160      +16     
Flag Coverage Δ
keras 82.53% <88.76%> (-0.15%) ⬇️
keras-jax 63.81% <0.00%> (-0.22%) ⬇️
keras-numpy 58.31% <0.00%> (-0.21%) ⬇️
keras-openvino 34.66% <88.76%> (+0.05%) ⬆️
keras-tensorflow 64.23% <0.00%> (-0.23%) ⬇️
keras-torch 63.87% <0.00%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.


@Mohamed-Ashraf273 Mohamed-Ashraf273 force-pushed the gsoc2025 branch 2 times, most recently from 4327cc8 to 6855d8b Compare July 22, 2025 16:23