Skip to content

Conversation

@mcopik
Copy link
Collaborator

@mcopik mcopik commented Aug 9, 2022

The features that we're planning to implement here are:

  • Building C++ functions with CMake.
  • Building and handling functions' Docker containers for C++.
  • Providing base images with AWS SDK, Boost, and Redis support.
  • Test sleep benchmarks.

Summary by CodeRabbit

  • New Features

    • Added first-class C++ support across benchmarks, packaging, CLI, and AWS runtimes.
    • New C++ benchmark implementations (sleep, thumbnailer, image recognition, PageRank, BFS) with S3/DynamoDB/Redis integrations.
    • Prebuilt Docker images and installer scripts for C++ dependencies (Boost, OpenCV, LibTorch, igraph, hiredis, runtime).
  • Refactor

    • Introduced strongly-typed language/architecture enums for consistent language handling.
  • Chores

    • Extended build tooling and CLI to support C++ and parallel/dependencies builds.

@mcopik mcopik mentioned this pull request Aug 9, 2022
6 tasks
@mcopik mcopik mentioned this pull request Apr 26, 2023
@coderabbitai
Copy link

coderabbitai bot commented Jun 18, 2025

Walkthrough

Adds first-class C++ support across the benchmark tooling: C++ benchmark implementations and wrappers, Docker build recipes for C++ dependencies, CMake packaging support, and refactors language handling to a typed Language enum used throughout packaging and runtime flows.

Changes

Cohort / File(s) Change Summary
Microbenchmark (sleep)
benchmarks/000.microbenchmarks/010.sleep/config.json, benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp
Enabled cpp language and added a C++ implementation that sleeps for the requested seconds and returns a JSON result.
AWS C++ wrappers
benchmarks/wrappers/aws/cpp/handler.cpp, .../key-value.cpp, .../key-value.hpp, .../redis.cpp, .../redis.hpp, .../storage.cpp, .../storage.hpp, .../utils.cpp, .../utils.hpp
Added AWS Lambda C++ handler, S3/DynamoDB/Redis wrappers, and time utility; includes retry/backoff, streaming, timing, and container/cold-start metadata handling.
Benchmarks (C++ workloads)
benchmarks/200.multimedia/210.thumbnailer/.../cpp/*, benchmarks/400.inference/411.image-recognition/cpp/*, benchmarks/500.scientific/*/cpp/*
Added multiple C++ benchmark implementations and headers (thumbnailer, image-recognition, pagerank, BFS), integrating OpenCV, Torch, igraph, UUIDs, S3 interactions, and timing metrics.
Configs
config/systems.json, benchmarks/*/config.json (multiple)
Added cpp language entries and cpp_dependencies in several benchmark configs; added AWS cpp language configuration and deployment file lists in systems.json.
Dockerfiles & installer
dockerfiles/aws/cpp/Dockerfile.build, dockerfiles/aws/cpp/Dockerfile.dependencies-* (boost,hiredis,runtime,sdk,opencv,igraph,torch), dockerfiles/cpp_installer.sh, dockerfiles/aws/nodejs/Dockerfile.build
New multi-stage Dockerfiles to build C++ dependencies and final build image; installer script to run CMake build; small package change in nodejs build Dockerfile.
SeBS core: types & C++ support
sebs/types.py, sebs/cpp_dependencies.py
Introduced Language and Architecture enums and a C++ dependency model with CMake snippet generation and dependency metadata.
SeBS packaging & build refactors
sebs/benchmark.py, sebs/aws/aws.py, sebs/faas/* (container.py, function.py, system.py), sebs/azure/azure.py, sebs/gcp/gcp.py, sebs/openwhisk/openwhisk.py, sebs.py
Replaced string language identifiers with Language enum across packaging/build/runtime; added C++ packaging path (CMake/CMakeLists generation), cloud runtime mapping for C++, updated packaging logic and CLI to accept cpp.
Tools
tools/build_docker_images.py
Added support for building dependency images, new CLI args (--type dependencies, --type-tag, --parallel), added cpp language option, and extended dependency/image build loops.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Dev as Developer/CLI
    participant SeBS as SeBS (packaging)
    participant Builder as DockerBuilder (build image)
    participant Registry as ImageRegistry
    participant Cloud as Cloud (AWS)
    participant Lambda as Lambda (function)

    Dev->>SeBS: sebs package --language cpp
    SeBS->>SeBS: generate CMakeLists & deployment (benchmarks/.../cpp)
    SeBS->>Builder: build docker image (Dockerfile.build, deps)
    Builder->>Registry: push dependency/base images (optional)
    Registry->>Cloud: register image (ECR / cloud registry)
    SeBS->>Cloud: create_function with runtime "provided.al2" and package
    Dev->>Lambda: invoke
    Lambda->>Wrappers: handler.cpp (cold-start check, timing)
    Wrappers->>Benchmark: invoke function(...) (C++ user code)
    Benchmark->>Lambda: return JSON result with measurements
    Lambda->>Dev: response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Points needing extra attention:

  • C++ build/dockerfiles (multi-stage complexity, library linking, libtorch symlink handling).
  • C++ wrappers (handler cold-start, JSON shaping, and error paths).
  • CMake generation and sebs/cpp_dependencies.py correctness for all dependency mappings.
  • Language enum propagation changes across many Python modules (typing and runtime mapping).

Possibly related PRs

Poem

"I nibble code in twilight's lap,
New C++ tunnels map the app.
Docker carrots, dependencies sprout,
Enums bring order, no more doubt.
Hop—benchmarks run, the warren cheers!" 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.41% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "Add support for C++ benchmarks" is directly aligned with the main objective of this changeset. The comprehensive modifications include adding C++ language support across the entire benchmarking infrastructure, including new benchmark implementations (sleep, thumbnailer, image-recognition, graph-pagerank, graph-bfs), AWS Lambda C++ runtime wrappers, dependency management wrappers for DynamoDB and Redis, Docker build configurations for C++ dependencies, CMake-based deployment packaging, and updates to the core framework to recognize and handle C++ as a first-class language. The title accurately captures this primary enhancement without being vague or overly broad.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cpp_benchmarks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 25

🔭 Outside diff range comments (4)
benchmarks/wrappers/aws/cpp/redis.hpp (1)

1-27: Add include guards and improve class design.

The header file lacks include guards and has several design issues that should be addressed:

  1. Missing include guards: The header should have include guards to prevent multiple inclusions
  2. Constructor should be explicit: Single-parameter constructors should be explicit to prevent implicit conversions
  3. Missing const correctness: Methods that don't modify state should be const
  4. Raw pointer management: Consider RAII principles for redisContext*
+#ifndef REDIS_HPP
+#define REDIS_HPP
+
 #include <cstdint>
 #include <string>

 #include <aws/core/utils/memory/stl/AWSString.h>

 #include <hiredis/hiredis.h>

 class Redis
 {
   redisContext* _context;
 public:

-  Redis(std::string redis_hostname, int redis_port);
+  explicit Redis(const std::string& redis_hostname, int redis_port);
   ~Redis();

-  bool is_initialized();
+  bool is_initialized() const;

-  uint64_t download_file(Aws::String const &key, int &required_retries, bool with_backoff);
+  uint64_t download_file(const Aws::String& key, int& required_retries, bool with_backoff) const;

-  uint64_t upload_file(Aws::String const &key, int size, char* pBuf);
+  uint64_t upload_file(const Aws::String& key, int size, const char* pBuf);

-  uint64_t delete_file(std::string const &key);
+  uint64_t delete_file(const std::string& key) const;

 };

+#endif // REDIS_HPP
benchmarks/wrappers/aws/cpp/key-value.hpp (1)

1-32: Add include guards and improve interface design.

The header file has several areas for improvement:

  1. Missing include guards: Add them to prevent multiple inclusions
  2. Const correctness: Methods that don't modify state should be const
  3. Parameter const-correctness: Input parameters should be const references
  4. Comment about non-copyable/non-movable: Should be enforced with deleted copy/move operators
+#ifndef KEY_VALUE_HPP
+#define KEY_VALUE_HPP
+
 #include <cstdint>
 #include <string>
 #include <initializer_list>
 #include <memory>

 #include <aws/core/utils/memory/stl/AWSString.h>
 #include <aws/dynamodb/DynamoDBClient.h>

 class KeyValue
 {
-  // non-copyable, non-movable
   std::shared_ptr<Aws::DynamoDB::DynamoDBClient> _client;
 public:

   KeyValue();
+  
+  // non-copyable, non-movable
+  KeyValue(const KeyValue&) = delete;
+  KeyValue& operator=(const KeyValue&) = delete;
+  KeyValue(KeyValue&&) = delete;
+  KeyValue& operator=(KeyValue&&) = delete;

-uint64_t download_file(Aws::String const &bucket,
-                        Aws::String const &key,
+uint64_t download_file(const Aws::String& bucket,
+                        const Aws::String& key,
                         int& required_retries,
                         double& read_units,
-                        bool with_backoff = false);
+                        bool with_backoff = false) const;

-uint64_t upload_file(Aws::String const &bucket,
-                        Aws::String const &key,
+uint64_t upload_file(const Aws::String& bucket,
+                        const Aws::String& key,
                           double& write_units,
                           int size,
-                          unsigned char* pBuf);
+                          const unsigned char* pBuf);

 };

+#endif // KEY_VALUE_HPP
benchmarks/wrappers/aws/cpp/storage.hpp (1)

1-29: Add include guards and improve interface consistency.

The header file needs several improvements:

  1. Missing include guards: Add them for safety
  2. Interface consistency: Consider aligning parameter patterns with other wrapper classes
  3. Const correctness: Download method should be const
+#ifndef STORAGE_HPP
+#define STORAGE_HPP
+
 #include <cstdint>

 #include <aws/core/utils/memory/stl/AWSString.h>
 #include <aws/s3/S3Client.h>

 class Storage
 {
   Aws::S3::S3Client _client;
 public:

-  Storage(Aws::S3::S3Client && client):
+  explicit Storage(Aws::S3::S3Client&& client):
     _client(client)
   {}

   static Storage get_client();

-  uint64_t download_file(Aws::String const &bucket,
-                          Aws::String const &key,
+  uint64_t download_file(const Aws::String& bucket,
+                          const Aws::String& key,
                           int &required_retries,
-                          bool report_dl_time);
+                          bool report_dl_time) const;

-  uint64_t upload_random_file(Aws::String const &bucket,
-                          Aws::String const &key,
-                          int size, char* pBuf);
+  uint64_t upload_random_file(const Aws::String& bucket,
+                          const Aws::String& key,
+                          int size, const char* pBuf);

 };

+#endif // STORAGE_HPP
tools/build_docker_images.py (1)

51-151: CRITICAL: Resolve Git merge conflicts before merging.

The file contains unresolved Git merge conflict markers (<<<<<<<, =======, >>>>>>>) that cause syntax errors and prevent the code from running. These conflicts must be resolved manually.

The conflicts appear to be between:

  • HEAD version with dot notation: f"{args.type}.{args.type_tag}"
  • Branch a9f3c27 with hyphen notation: f"{args.type}-{args.type_tag}"

Please resolve these conflicts by choosing the appropriate format and removing all conflict markers.

🧹 Nitpick comments (18)
dockerfiles/aws/nodejs/Dockerfile.build (1)

5-5: Optimize yum layer by cleaning cache.

To slim down the final image, combine the install and cleanup steps into a single RUN and clear the yum cache:

-RUN yum install -y cmake curl libcurl libcurl-devel
+RUN yum install -y cmake curl libcurl libcurl-devel && \
+    yum clean all && \
+    rm -rf /var/cache/yum
dockerfiles/aws/cpp/Dockerfile.dependencies-hiredis (1)

7-8: Consider adding error handling and cleanup.

The build process could benefit from explicit error handling and cleanup to ensure robust builds and smaller image sizes.

-RUN   git clone https://github.com/redis/hiredis.git && cd hiredis\
-      && PREFIX=/opt make -j${WORKERS} install
+RUN set -e && \
+    git clone --depth 1 --branch v1.2.0 https://github.com/redis/hiredis.git && \
+    cd hiredis && \
+    PREFIX=/opt make -j${WORKERS} install && \
+    cd .. && \
+    rm -rf hiredis
benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp (2)

7-7: Remove unused include.

The iostream header is included but never used in this file.

 #include <thread>
-#include <iostream>

16-17: Remove commented-out code.

Commented-out code should be removed to keep the codebase clean. If this alternative implementation is needed for reference, consider documenting it properly or moving it to documentation.

-  //std::string res_json = "{ \"result\": " + std::to_string(sleep) + "}";
-  //return aws::lambda_runtime::invocation_response::success(res_json, "application/json");
dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1)

8-10: Consider improving build robustness and cleanup.

The build process could benefit from better error handling and cleanup to reduce image size and improve reliability.

-RUN   git clone https://github.com/awslabs/aws-lambda-cpp.git\
-      && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/opt -B aws-lambda-cpp/build -S aws-lambda-cpp\
-      && cmake3 --build aws-lambda-cpp/build --parallel ${WORKERS} --target install
+RUN set -e && \
+    git clone --depth 1 --branch v0.2.8 https://github.com/awslabs/aws-lambda-cpp.git && \
+    cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/opt \
+           -B aws-lambda-cpp/build -S aws-lambda-cpp && \
+    cmake3 --build aws-lambda-cpp/build --parallel ${WORKERS} --target install && \
+    rm -rf aws-lambda-cpp
dockerfiles/aws/cpp/Dockerfile.dependencies-sdk (1)

9-9: Consider adding integrity verification and optimizing cmake configuration.

The cmake configuration looks appropriate, but consider adding integrity checks and optimization flags.

       && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3;dynamodb;sqs" -DENABLE_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/opt/ ..\
+      && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3;dynamodb;sqs" -DENABLE_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/opt/ -DCPP_STANDARD=17 -DCMAKE_CXX_FLAGS="-O3" ..\
dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1)

13-15: Remove commented code.

The commented lines should be removed as they add confusion and don't serve any purpose in the final implementation.

-#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
-#      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
-#      && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install
config/systems.json (1)

125-137: Consider explicit architecture specification for consistency.

The C++ configuration uses "all" for architecture while Python and Node.js specify explicit architectures ("x64", "arm64"). This inconsistency might cause confusion and could limit architecture-specific optimizations.

Consider updating to be consistent with other languages:

       "cpp": {
         "base_images": {
-          "all": "amazon/aws-lambda-provided:al2.2022.04.27.09"
+          "x64": "amazon/aws-lambda-provided:al2.2022.04.27.09",
+          "arm64": "amazon/aws-lambda-provided:al2.2022.04.27.09"
         },
         "dependencies": [
             "runtime", "sdk", "boost", "hiredis"
         ],
-        "versions": ["all"],
+        "versions": ["all"],
         "images": ["build"],
dockerfiles/aws/cpp/Dockerfile.build (1)

18-21: Optimize script copying and permission setting.

Consider combining COPY and chmod operations to reduce layers.

-COPY docker/entrypoint.sh /sebs/entrypoint.sh
-COPY docker/cpp_installer.sh /sebs/installer.sh
-RUN chmod +x /sebs/entrypoint.sh
-RUN chmod +x /sebs/installer.sh
+COPY --chmod=755 docker/entrypoint.sh /sebs/entrypoint.sh
+COPY --chmod=755 docker/cpp_installer.sh /sebs/installer.sh
benchmarks/wrappers/aws/cpp/redis.hpp (1)

11-11: Consider using smart pointers for automatic resource management.

The raw redisContext* pointer requires manual memory management. Consider using std::unique_ptr<redisContext, void(*)(redisContext*)> with a custom deleter for automatic cleanup.

+#include <memory>
+
 class Redis
 {
-  redisContext* _context;
+  std::unique_ptr<redisContext, void(*)(redisContext*)> _context;
 public:
benchmarks/wrappers/aws/cpp/storage.hpp (1)

18-26: Consider interface consistency across wrapper classes.

The method signatures differ from other wrapper classes (Redis, KeyValue). Consider standardizing parameter patterns for consistency, especially for retry handling and timing measurements.

For example, the Redis class uses int& required_retries and bool with_backoff, while this class uses bool report_dl_time. Consider aligning these interfaces for consistency across the AWS wrapper ecosystem.

benchmarks/wrappers/aws/cpp/key-value.cpp (1)

19-20: Consider making configuration more flexible.

The hardcoded CA file path and commented region configuration reduce flexibility across different environments.

Consider reading these from environment variables:

-  //config.region = "eu-central-1";
-  config.caFile = "/etc/pki/tls/certs/ca-bundle.crt";
+  const char* region = std::getenv("AWS_REGION");
+  if (region) {
+    config.region = region;
+  }
+  const char* caFile = std::getenv("AWS_CA_BUNDLE");
+  config.caFile = caFile ? caFile : "/etc/pki/tls/certs/ca-bundle.crt";
benchmarks/wrappers/aws/cpp/handler.cpp (1)

23-23: Fix typo in comment.

"Gateaway" should be "Gateway".

-  // HTTP trigger with API Gateaway sends payload as a serialized JSON
+  // HTTP trigger with API Gateway sends payload as a serialized JSON
benchmarks/wrappers/aws/cpp/storage.cpp (3)

46-50: Clarify the NOP operation purpose.

The current implementation to prevent compiler optimization is unusual. Consider a simpler approach or document why this specific pattern is needed.

             // Perform NOP on result to prevent optimizations
             std::stringstream ss;
             ss << s.rdbuf();
-            std::string first(" ");
-            ss.get(&first[0], 1);
+            // Simply access the stream to prevent optimization
+            volatile auto size = ss.tellp();
+            (void)size;  // Suppress unused variable warning

59-63: Remove commented code or document why it's preserved.

The commented backoff logic should either be removed or documented if it's intentionally kept for future use.

         } else {
             retries += 1;
-            //int sleep_time = retries;
-            //if (retries > 100) {
-            //    sleep_time = retries * 2;
-            //}
-            //std::this_thread::sleep_for(std::chrono::milliseconds(sleep_time));
+            // TODO: Consider adding backoff delay for failed downloads
         }

75-78: Simplify the comment about bufferstream usage.

The comment is quite long and could be more concise while still explaining the rationale.

-    /**
-     * We use Boost's bufferstream to wrap the array as an IOStream. Usign a light-weight streambuf wrapper, as many solutions 
-     * (e.g. https://stackoverflow.com/questions/13059091/creating-an-input-stream-from-constant-memory) on the internet
-     * suggest does not work because the S3 SDK relies on proper functioning tellp(), etc... (for instance to get the body length).
-     */
+    // Use Boost's bufferstream to wrap the buffer as an IOStream.
+    // The S3 SDK requires proper tellp() functionality, so simple streambuf wrappers don't work.
sebs/aws/aws.py (1)

671-677: Simplify the conditional structure.

The elif after return is unnecessary and can be simplified for better readability.

 def cloud_runtime(self, language: Language, language_version: str):
     if language in [Language.NODEJS, Language.PYTHON]:
         return ("{}{}".format(language, language_version),)
-    elif language == Language.CPP:
+    if language == Language.CPP:
         return "provided.al2"
-    else:
-        raise NotImplementedError()
+    raise NotImplementedError()
sebs/benchmark.py (1)

18-19: Clean up duplicate import.

Remove the duplicate Language import on line 19.

 from sebs.utils import find_benchmark, project_absolute_path, LoggingBase
-from sebs.types import BenchmarkModule, Language
 from sebs.types import Language
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d68834 and 9a96d72.

📒 Files selected for processing (30)
  • benchmarks/000.microbenchmarks/010.sleep/config.json (1 hunks)
  • benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/handler.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/key-value.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/key-value.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/redis.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/redis.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/utils.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/utils.hpp (1 hunks)
  • config/systems.json (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.build (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-hiredis (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-sdk (1 hunks)
  • dockerfiles/aws/nodejs/Dockerfile.build (1 hunks)
  • dockerfiles/cpp_installer.sh (1 hunks)
  • sebs.py (1 hunks)
  • sebs/aws/aws.py (6 hunks)
  • sebs/azure/azure.py (4 hunks)
  • sebs/benchmark.py (11 hunks)
  • sebs/faas/container.py (9 hunks)
  • sebs/faas/function.py (9 hunks)
  • sebs/faas/system.py (2 hunks)
  • sebs/gcp/gcp.py (4 hunks)
  • sebs/openwhisk/openwhisk.py (3 hunks)
  • sebs/types.py (2 hunks)
  • tools/build_docker_images.py (5 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
sebs/faas/container.py

[refactor] 143-143: Too many arguments (7/5)

(R0913)


[refactor] 143-143: Too many positional arguments (7/5)

(R0917)


[refactor] 143-143: Too many local variables (19/15)

(R0914)

tools/build_docker_images.py

[error] 51-51: Parsing failed: 'invalid syntax (build_docker_images, line 51)'

(E0001)

sebs/aws/aws.py

[refactor] 672-677: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

🪛 Ruff (0.11.9)
sebs/faas/function.py

251-253: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

tools/build_docker_images.py

51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


52-52: SyntaxError: Unexpected indentation


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-54: SyntaxError: Expected a statement


54-54: SyntaxError: Unexpected indentation


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected ',', found name


59-59: SyntaxError: Expected ',', found name


59-59: SyntaxError: Expected ',', found name


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


111-111: SyntaxError: Unexpected indentation


136-136: SyntaxError: unindent does not match any outer indentation level


136-136: SyntaxError: Expected a statement


136-137: SyntaxError: Expected a statement


137-137: SyntaxError: Unexpected indentation


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-139: SyntaxError: Expected a statement


140-140: SyntaxError: Unexpected indentation


149-149: SyntaxError: unindent does not match any outer indentation level


149-149: SyntaxError: Expected a statement


149-150: SyntaxError: Expected a statement


150-150: SyntaxError: Unexpected indentation


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected ',', found name


151-151: SyntaxError: Expected ',', found name


151-151: SyntaxError: Expected ',', found name


152-152: SyntaxError: Unexpected indentation


152-152: SyntaxError: Expected a statement


152-152: SyntaxError: Expected a statement


152-153: SyntaxError: Expected a statement


153-153: SyntaxError: Unexpected indentation

sebs/benchmark.py

19-19: Redefinition of unused Language from line 18

Remove definition: Language

(F811)

🔇 Additional comments (33)
dockerfiles/aws/nodejs/Dockerfile.build (1)

4-5: Update comment and review package dependencies for user management.

The # useradd, groupmod comment no longer matches the installed packages—you're now installing C++ build tools (cmake, curl, libcurl, libcurl-devel) but not shadow-utils (which provides useradd/groupmod). If you still rely on gosu to switch users, either re-add shadow-utils to the install line or remove/update the comment to reflect the new purpose.

sebs/types.py (1)

45-57: Well-structured Architecture enum implementation.

The Architecture enum with its serialize/deserialize methods follows good patterns and provides type safety for the codebase.

benchmarks/000.microbenchmarks/010.sleep/config.json (1)

4-4: LGTM - Consistent language addition.

The addition of "cpp" to the languages array is consistent with the existing configuration pattern and enables C++ benchmark support.

sebs.py (1)

67-67: LGTM - Correct CLI integration.

The addition of "cpp" to the language choices properly integrates C++ support into the command-line interface.

sebs/faas/system.py (1)

16-16: LGTM! Excellent type safety improvement.

The addition of the Language enum import and updating the method signature to use strongly-typed Language instead of string-based language_name significantly improves type safety and code maintainability. This change aligns well with the broader refactoring across the codebase.

Also applies to: 174-174

dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1)

9-9: Approve the CMake configuration choices.

The CMake configuration looks well-thought-out:

  • Release build type for optimized binaries
  • BUILD_SHARED_LIBS=OFF for static linking (appropriate for Lambda)
  • Custom install prefix to /opt for multi-stage builds
  • Parallel builds using ${WORKERS} for build efficiency
config/systems.json (1)

129-131: Verify dependency order and completeness.

The dependencies list looks comprehensive, but ensure the order matches any build dependencies (e.g., runtime before SDK).

#!/bin/bash
# Verify that all dependency Dockerfiles exist for the listed dependencies
echo "Checking for dependency Dockerfiles..."
for dep in runtime sdk boost hiredis; do
    dockerfile="dockerfiles/aws/cpp/Dockerfile.dependencies-${dep}"
    if [ -f "$dockerfile" ]; then
        echo "✓ Found: $dockerfile"
    else
        echo "✗ Missing: $dockerfile"
    fi
done
sebs/gcp/gcp.py (3)

27-27: Good refactoring for type safety.

The import of the Language enum improves type safety and consistency across the codebase.


142-147: Type-safe dictionary access implementation looks correct.

The refactoring properly updates both CONFIG_FILES and HANDLER dictionaries to use Language enum keys, and all access patterns use the typed language parameter.


128-148: I didn’t locate the Python Language enum – let’s search for its class definition and inspect it for a CPP member:

#!/bin/bash
# Locate and display the Python enum for Language
echo "Searching for Language enum in .py files..."
rg -n "class Language" -g "*.py"

echo -e "\nShowing definition and first 20 lines after declaration:"
file=$(rg -l "class Language" -g "*.py" | head -n1)
if [[ -n "$file" ]]; then
  start_line=$(grep -n "class Language" "$file" | head -n1 | cut -d: -f1)
  sed -n "${start_line},$((start_line+20))p" "$file"
else
  echo "❌ Language enum definition file not found."
fi
dockerfiles/aws/cpp/Dockerfile.build (2)

23-26: Dependency aggregation looks correct.

The sequential copying from all dependency stages properly aggregates the built libraries and headers into /opt.


30-31: ```shell
#!/bin/bash
echo "Locating Dockerfile.build files..."
fd --type f 'Dockerfile.build'

echo
echo "Printing lines 1–60 of dockerfiles/aws/cpp/Dockerfile.build:"
sed -n '1,60p' dockerfiles/aws/cpp/Dockerfile.build

echo
echo "Looking for any COPY commands or /sebs references:"
grep -R -n 'COPY|sebs' dockerfiles/aws/cpp/Dockerfile.build


</details>
<details>
<summary>sebs/openwhisk/openwhisk.py (3)</summary>

`21-21`: **Good addition of type safety with Language enum import.**

The import of `Language` enum enhances type safety and consistency across the codebase.

---

`100-100`: **Parameter type change improves type safety.**

Changing from `language_name: str` to `language: Language` provides compile-time type checking and prevents invalid language strings.

---

`111-111`: **Correct usage of language.value for backward compatibility.**

Using `language.value` maintains compatibility with downstream methods expecting string identifiers.

</details>
<details>
<summary>sebs/azure/azure.py (3)</summary>

`27-27`: **Good addition of Language enum import for type safety.**

The import enhances type consistency across cloud provider modules.

---

`122-122`: **Parameter type change improves type safety.**

The change from string to `Language` enum parameter provides better type checking.

---

`155-155`: **Dictionary lookup now uses proper enum type.**

Good use of the Language enum for dictionary access instead of string.

</details>
<details>
<summary>benchmarks/wrappers/aws/cpp/key-value.hpp (1)</summary>

`13-13`: **Good use of shared_ptr for AWS client management.**

Using `shared_ptr` for the DynamoDB client is appropriate as it handles the AWS SDK object lifecycle correctly.

</details>
<details>
<summary>benchmarks/wrappers/aws/cpp/storage.hpp (2)</summary>

`12-14`: **Good use of move semantics in constructor.**

The constructor correctly uses move semantics to efficiently transfer ownership of the S3Client, avoiding unnecessary copies.

---

`16-16`: **Static factory method follows good design patterns.**

The `get_client()` static factory method is a good design choice for encapsulating client creation logic.

</details>
<details>
<summary>sebs/faas/container.py (3)</summary>

`12-12`: **LGTM: Type-safe language handling.**

The import of the `Language` enum improves type safety and consistency across the codebase.

---

`143-151`: **LGTM: Method signature update for type safety.**

The change from `language_name: str` to `language: Language` parameter improves type safety and aligns with the broader refactoring to use strongly-typed language identifiers.

---

`164-164`: **LGTM: Consistent use of Language enum.**

The usage of `language.value` to extract the string representation is correct and consistent throughout the method.




Also applies to: 188-188, 201-201

</details>
<details>
<summary>sebs/aws/aws.py (3)</summary>

`25-25`: **LGTM: Import addition aligns with type safety improvements.**

The import of the `Language` enum from `sebs.types` supports the refactoring to use strongly-typed language identifiers throughout the codebase.

---

`121-121`: **LGTM: Parameter type improved with enum.**

Changing the parameter type from string to `Language` enum enhances type safety and prevents invalid language values.

---

`139-175`: ```bash
#!/bin/bash
set -eo pipefail

# 1. List all files under benchmarks/ to inspect their extensions
find benchmarks/ -type f

# 2. Search for any 'zip' commands or references to 'benchmark.zip' in those files
grep -R --include="*.sh" --include="Makefile" --include="*.py" -nE "zip|benchmark\.zip" benchmarks/ || true

# 3. Search for any build directory creations or references
grep -R -nE "mkdir.*build|cmake|make" benchmarks/ || true
sebs/faas/function.py (3)

12-12: LGTM: Centralized enum imports improve consistency.

Importing Language and Architecture from the shared sebs.types module enhances modularity and ensures consistent enum usage across the codebase.


221-227: LGTM: Enhanced JSON parsing for C++ triggers.

The enhanced parsing logic correctly handles AWS C++ trigger responses that may have nested JSON bodies, improving compatibility with the new C++ runtime.


290-292: LGTM: Consistent use of enum deserialization.

The update to use Language.deserialize() maintains consistency with the typed enum approach throughout the codebase.

sebs/benchmark.py (3)

7-7: LGTM: Appropriate import for CMake script formatting.

The textwrap import is correctly added to support the dedenting of the CMake script in the new C++ packaging method.


410-450: Well-structured C++ packaging method with future improvement opportunities.

The CMake script generation is well-implemented and correctly handles C++ Lambda runtime dependencies. The FIXME comments appropriately identify areas for future enhancement.

Consider addressing the FIXME comments in future iterations:

  • Generic configuration for different cloud providers
  • Optional dependency configuration
  • Dynamic dependency resolution

629-629: LGTM: Enhanced error logging for debugging.

Adding stderr decoding to the Docker build error logging provides better diagnostic information when builds fail.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
sebs/aws/aws.py (1)

676-682: Simplify the conditional structure.

The elif after return is unnecessary and can be simplified for better readability.

 def cloud_runtime(self, language: Language, language_version: str):
     if language in [Language.NODEJS, Language.PYTHON]:
         return ("{}{}".format(language, language_version),)
-    elif language == Language.CPP:
+    if language == Language.CPP:
         return "provided.al2"
-    else:
-        raise NotImplementedError()
+    raise NotImplementedError()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a96d72 and b5fbb12.

📒 Files selected for processing (5)
  • sebs/aws/aws.py (6 hunks)
  • sebs/benchmark.py (16 hunks)
  • sebs/faas/container.py (6 hunks)
  • sebs/faas/function.py (3 hunks)
  • sebs/openwhisk/openwhisk.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • sebs/openwhisk/openwhisk.py
  • sebs/faas/function.py
🧰 Additional context used
🪛 Pylint (3.3.7)
sebs/aws/aws.py

[refactor] 677-682: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

sebs/faas/container.py

[refactor] 137-137: Too many arguments (7/5)

(R0913)


[refactor] 137-137: Too many positional arguments (7/5)

(R0917)


[refactor] 137-137: Too many local variables (19/15)

(R0914)

🔇 Additional comments (15)
sebs/faas/container.py (3)

12-12: LGTM: Good type safety improvement.

The addition of the Language enum import supports the transition from string-based language handling to a more type-safe approach.


102-106: LGTM: Improved code formatting.

The parameter formatting improves readability without changing functionality.


140-140: LGTM: Proper enum usage throughout the method.

The method correctly uses Language enum type for the parameter and language.value when string representation is needed for Docker operations and file paths.

Also applies to: 158-158, 181-181, 193-193

sebs/aws/aws.py (3)

25-25: LGTM: Type safety improvement.

Adding the Language enum import supports better type safety throughout the codebase.


121-121: LGTM: Proper enum adoption.

The method signature correctly uses the Language enum, and the CONFIG_FILES dictionary properly maps enum values to their respective configuration files.

Also applies to: 144-146


148-180: LGTM: Well-structured C++ packaging logic.

The conditional logic properly handles different languages. The C++ implementation assumes the build script creates a zip file in the build directory, which aligns with typical CMake build patterns for AWS Lambda.

sebs/benchmark.py (9)

7-7: LGTM: Required imports for C++ support.

The textwrap import is needed for the CMakeLists.txt generation, and the Language enum import supports the type safety improvements.

Also applies to: 18-18


30-30: LGTM: Proper type annotation.

The type annotation correctly uses the Language enum for better type safety.


180-180: LGTM: Consistent enum usage in hash operations.

The hash methods now properly use the Language enum, maintaining consistency with the broader refactoring.

Also applies to: 253-253


257-282: LGTM: Comprehensive C++ file support.

The file handling dictionaries properly include C++ file types (*.cpp, *.hpp, dependencies.json) and integrate them consistently with existing Python and Node.js support.


327-334: LGTM: Updated copy logic for C++ files.

The copy_code method correctly handles C++ files using the Language enum.


413-454: Well-implemented C++ CMakeLists.txt generation.

The CMakeLists.txt generation includes all necessary dependencies for AWS Lambda C++ runtime:

  • AWS Lambda runtime
  • Boost libraries
  • AWS SDK (S3, DynamoDB)
  • Hiredis for Redis support

The FIXME comments appropriately mark areas for future enhancement (generic deployment support, optional dependencies).


541-546: LGTM: C++ package file configuration.

Correctly identifies CMakeLists.txt as the package file for C++ builds, consistent with CMake build systems.


632-632: LGTM: Improved error logging.

Adding stderr output to error logging provides better debugging information when Docker builds fail.


643-643: LGTM: Updated method signature for type safety.

The build method now properly uses the Language enum in its callable parameter type, maintaining consistency with the broader refactoring.

Also applies to: 684-684

@mcopik mcopik mentioned this pull request Jun 19, 2025
2 tasks
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
sebs/aws/aws.py (1)

168-185: Fix inconsistent return path for C++ packages.

The C++ packaging logic creates the archive in build/benchmark.zip (line 171) but the return statement (line 182) still uses the old path format {}.zip which doesn't account for the C++ build directory structure.

Apply this fix:

+        elif language == Language.CPP:
+            benchmark_archive = os.path.join(directory, "build", "benchmark.zip")
            return (
-                os.path.join(directory, "{}.zip".format(benchmark)),
+                benchmark_archive,
                bytes_size,
                container_uri,
            )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5fbb12 and 068adfd.

📒 Files selected for processing (12)
  • benchmarks/wrappers/aws/cpp/handler.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/key-value.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/redis.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/utils.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/utils.hpp (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-hiredis (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-sdk (1 hunks)
  • dockerfiles/cpp_installer.sh (1 hunks)
  • sebs/aws/aws.py (6 hunks)
  • sebs/types.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-hiredis
  • dockerfiles/aws/cpp/Dockerfile.dependencies-sdk
  • dockerfiles/cpp_installer.sh
  • dockerfiles/aws/cpp/Dockerfile.dependencies-runtime
  • benchmarks/wrappers/aws/cpp/key-value.cpp
  • sebs/types.py
  • benchmarks/wrappers/aws/cpp/handler.cpp
  • benchmarks/wrappers/aws/cpp/redis.cpp
  • benchmarks/wrappers/aws/cpp/storage.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
PR: spcl/serverless-benchmarks#99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.718Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
PR: spcl/serverless-benchmarks#99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.851Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
🪛 Pylint (3.3.7)
sebs/aws/aws.py

[refactor] 197-204: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

🔇 Additional comments (2)
benchmarks/wrappers/aws/cpp/utils.cpp (1)

6-10: LGTM! Naming inconsistency has been resolved.

The function name timeSinceEpochMicrosec now correctly reflects that it returns microseconds, matching the implementation. This addresses the previous naming inconsistency issue.

benchmarks/wrappers/aws/cpp/utils.hpp (1)

1-8: LGTM! Include guards have been properly added.

The header file now includes proper include guards and the necessary <cstdint> header, addressing the previous review comment about multiple inclusion prevention.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
sebs/aws/aws.py (1)

193-204: LGTM: C++ runtime support added correctly.

The method properly handles the new C++ language with the appropriate AWS Lambda custom runtime "provided.al2".

Consider addressing the style issue flagged by static analysis - the elif statements after return are unnecessary:

     def cloud_runtime(self, language: Language, language_version: str):
         # AWS uses different naming scheme for Node.js versions
         # For example, it's 12.x instead of 12.
         if language == Language.NODEJS:
             return f"{language}{language_version}.x"
-        elif language == Language.CPP:
+        if language == Language.CPP:
             return "provided.al2"
-        elif language in [Language.PYTHON]:
+        if language == Language.PYTHON:
             return f"{language}{language_version}"
-        else:
+        
             raise NotImplementedError()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 068adfd and ef25201.

📒 Files selected for processing (1)
  • sebs/aws/aws.py (6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
PR: spcl/serverless-benchmarks#99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.718Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
PR: spcl/serverless-benchmarks#99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.851Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
🪛 Pylint (3.3.7)
sebs/aws/aws.py

[refactor] 197-204: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

🔇 Additional comments (5)
sebs/aws/aws.py (5)

25-25: LGTM: Language enum import aligns with system-wide type safety improvements.

The import of the Language enum supports the transition from string-based to strongly-typed language handling throughout the codebase.


118-127: LGTM: Method signature updated to use Language enum.

The transition from string-based to enum-based language parameter improves type safety and consistency across the codebase.


144-146: LGTM: CONFIG_FILES dictionary updated for enum-based language handling.

The dictionary keys are properly updated to use Language enum values instead of strings.


216-216: LGTM: Correctly using Language enum from code package.

The change from string-based to enum-based language access is consistent with the type-safe approach introduced in this PR.


281-281: LGTM: Consistent enum usage in runtime configuration.

The method call correctly passes the Language enum to the updated cloud_runtime method signature.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 24

♻️ Duplicate comments (1)
benchmarks/wrappers/aws/cpp/storage.cpp (1)

16-24: Configure AWS region from environment variable.

The config.region is not set, which may cause the client to fall back to a default region. As noted in past reviews, the region should be configurable from the AWS_REGION environment variable for consistency.

Apply this diff to set the region from the environment:

 sebs::Storage sebs::Storage::get_client() {
   Aws::Client::ClientConfiguration config;
+  const char* region = std::getenv("AWS_REGION");
+  if (region) {
+    config.region = region;
+  }
+  config.caFile = "/etc/pki/tls/certs/ca-bundle.crt";
 
   char const TAG[] = "LAMBDA_ALLOC";
🧹 Nitpick comments (28)
benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp (1)

5-27: Consider adding input validation.

The function doesn't validate that in is non-empty or that width and height are positive. While OpenCV may handle some edge cases, explicit validation would catch issues earlier and provide clearer error messages.

Consider adding validation at the start of the function:

void thumbnailer(cv::Mat &in, int64_t width, int64_t height, cv::Mat &out)
{
  if (in.empty()) {
    throw std::invalid_argument("Input image is empty");
  }
  if (width <= 0 || height <= 0) {
    throw std::invalid_argument("Width and height must be positive");
  }
  // ... rest of implementation
}
sebs/cpp_dependencies.py (2)

56-61: Consider using a custom exception type.

The generic Exception works but a custom exception type would make error handling more specific for callers who want to catch C++ dependency errors separately.

Consider defining a custom exception:

+class CppDependencyError(Exception):
+    """Exception raised for unknown C++ dependencies."""
+    pass
+
 class CppDependencies(str, Enum):
     # ... existing code ...
     
     @staticmethod
     def deserialize(val: str) -> CppDependencies:
         for member in CppDependencies:
             if member.value == val:
                 return member
-        raise Exception(f"Unknown C++ dependency type {val}")
+        raise CppDependencyError(f"Unknown C++ dependency type {val}")

22-54: Cache the dependency dictionary to avoid recreating it.

The _dependency_dictionary method creates a new dictionary on each call. Since the configuration is static, this can be cached as a class attribute.

Apply this diff to cache the dictionary:

 class CppDependencies(str, Enum):
     """
     Enum for C++ dependencies used in the benchmarks.
     """
     TORCH = "torch"
     OPENCV = "opencv"
     IGRAPH = "igraph"
     BOOST = "boost"
     HIREDIS = "hiredis"
     
-    @staticmethod
-    def _dependency_dictionary() -> dict[str, CppDependencyConfig]:
-        return {
+    _DEPENDENCY_DICT: dict[str, CppDependencyConfig] = {
             CppDependencies.TORCH: CppDependencyConfig(
                 docker_img="dependencies-torch.aws.cpp.all",
                 cmake_package="Torch",
                 cmake_libs="${TORCH_LIBRARIES} -lm",
                 cmake_dir="${TORCH_INCLUDE_DIRS}"
             ),
             # ... rest of dictionary ...
         }
     
     @staticmethod
     def to_cmake_list(dependency: CppDependencies) -> str:
         """
         Returns the CMake target for the given C++ dependency.
         """
         if dependency not in CppDependencies:
             raise ValueError(f"Unknown C++ dependency {dependency}")
-        dependency_config = CppDependencies._dependency_dictionary()[dependency]
+        dependency_config = CppDependencies._DEPENDENCY_DICT[dependency]
         # ... rest of method ...
dockerfiles/aws/cpp/Dockerfile.dependencies-torch (1)

18-18: Add checksum verification for downloaded files.

The Dockerfile downloads LibTorch and torchvision without verifying checksums or signatures. Adding verification would protect against compromised downloads and ensure build reproducibility.

Consider adding SHA256 verification:

# For LibTorch
RUN curl -L -o torchlib.zip https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.7.0%2Bcpu.zip && \
    echo "EXPECTED_SHA256  torchlib.zip" | sha256sum -c -

# For torchvision
RUN curl -LO https://github.com/pytorch/vision/archive/refs/tags/v${TORCHVISION_VERSION}.zip && \
    echo "EXPECTED_SHA256  v${TORCHVISION_VERSION}.zip" | sha256sum -c -

Also applies to: 33-35

dockerfiles/aws/cpp/Dockerfile.dependencies-igraph (2)

10-15: Avoid redundant CMake installs; pick one source.

You install cmake3 via yum and then overlay a tarball. Keep only one to reduce build time and ambiguity on PATH.

Option A (use distro cmake3): remove the Kitware tarball block. Option B (use Kitware): drop cmake3 from yum list:

-    yum install -y cmake3 wget unzip curl git gcc gcc-c++ \
+    yum install -y wget unzip curl git gcc gcc-c++ \
                     make tar gzip libpng-devel zlib-devel \
                     libjpeg-turbo-devel python3-devel python3-pip && \

Also applies to: 17-21


5-6: Robust parallelism and toolchain hints.

Default WORKERS can be empty; prefer nproc, and export paths so builds can find /opt artifacts later.

Apply this diff:

-ARG WORKERS
-ENV WORKERS=${WORKERS}
+ARG WORKERS
+ENV WORKERS=${WORKERS}
@@
-    make -j${WORKERS} && \
+    make -j"$(test -n "$WORKERS" && echo "$WORKERS" || nproc)" && \
     make install
@@
 FROM ${BASE_IMAGE}
 
 WORKDIR /app
+ENV CMAKE_PREFIX_PATH=/opt \
+    PKG_CONFIG_PATH=/opt/lib64/pkgconfig:/opt/lib/pkgconfig \
+    LD_LIBRARY_PATH=/opt/lib64:/opt/lib:${LD_LIBRARY_PATH}
+# Optional on AL2:
+# RUN ldconfig

Also applies to: 34-35

benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (2)

30-31: Units mismatch: variables named “..._ms” hold microseconds.

Either rename or convert to milliseconds. Suggest converting to ms to match names.

Apply this diff:

-    graph_generation_time_ms = (timeSinceEpochMicrosec() - start_time);
+    graph_generation_time_ms = (timeSinceEpochMicrosec() - start_time) / 1000;
@@
-    compute_pr_time_ms = (timeSinceEpochMicrosec() - start_time);
+    compute_pr_time_ms = (timeSinceEpochMicrosec() - start_time) / 1000;

Also applies to: 41-42


4-6: Include for fabs and tighten error reporting.

Ensure fabs is declared; optional: print eigenvalue to aid debugging.

Apply this diff:

-#include <cfloat>
+#include <cfloat>
+#include <cmath>
@@
-  if (fabs(value - 1.0) > 32*DBL_EPSILON) {
-      fprintf(stderr, "PageRank failed to converge.\n");
+  if (std::fabs(value - 1.0) > 32*DBL_EPSILON) {
+      fprintf(stderr, "PageRank failed to converge, eigenvalue=%f\n", (double)value);
       igraph_vector_destroy(&pagerank);
       igraph_destroy(&graph);
       return 1;
   }

Also applies to: 43-47

config/systems.json (1)

126-133: Consider updating the base image and scoping heavy deps.

Base image tag al2.2022.04.27.09 is old; torch/opencv add significant size. If feasible, move “torch/opencv” into optional, per-benchmark dependencies to slim generic images.

If you want, I can propose a split of dependencies into “core” (runtime, sdk, boost, hiredis, igraph) and “extras” (opencv, torch) and update the dependency Docker build accordingly.

benchmarks/500.scientific/503.graph-bfs/cpp/main.cpp (3)

15-24: Seed once; current code seeds twice when seed exists.

Simplify to set seed value then seed RNG once.

Apply this diff:

-    if (request.ValueExists("seed")) {
-        seed = request.GetInteger("seed");
-        igraph_rng_seed(igraph_rng_default(), seed);
-    } else {
+    if (request.ValueExists("seed")) {
+        seed = request.GetInteger("seed");
+    } else {
         std::random_device rd;
         seed = rd();
     }
-    igraph_rng_seed(igraph_rng_default(), seed);
+    igraph_rng_seed(igraph_rng_default(), seed);

41-62: Check igraph return codes and report errors.

Add error handling for igraph_bfs and barabasi to avoid silent failures.

Example:

-    igraph_bfs(
+    if (igraph_bfs(
         &graph,
         0,          // root vertex
         nullptr,    // roots
         IGRAPH_ALL, // neimode
         true,       // unreachable
         nullptr,    // restricted
         &order,     // order
         nullptr,    // rank
         nullptr,    // father
         nullptr,    // pred
         nullptr,    //succ,
         nullptr,    // dist
         nullptr,    // callback
-        nullptr     // extra
-    );
+        nullptr     // extra
+    ) != IGRAPH_SUCCESS) {
+        igraph_vector_int_destroy(&order);
+        igraph_destroy(&graph);
+        Aws::Utils::Json::JsonValue err;
+        err.WithString("error", "igraph_bfs failed");
+        return err;
+    }

73-79: Use a consistent key: “measurements” (plural) like other benchmarks.

Current key is “measurement”.

Apply this diff:

-    Aws::Utils::Json::JsonValue measurement;
-    measurement.WithInt64("graph_generating_time", graph_generating_time);
-    measurement.WithInt64("compute_time", process_time);
-    result.WithObject("measurement", std::move(measurement));
+    Aws::Utils::Json::JsonValue measurements;
+    measurements.WithInt64("graph_generating_time", graph_generating_time);
+    measurements.WithInt64("compute_time", process_time);
+    result.WithObject("measurements", std::move(measurements));
benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp (3)

51-53: Use vector and direct imdecode.

vector + cv::Mat(vectordata) is suboptimal and type-mismatched for OpenCV.

Apply this diff:

-  std::vector<char> vectordata(body_str.begin(), body_str.end());
-  cv::Mat image = imdecode(cv::Mat(vectordata), 1);
+  std::vector<unsigned char> vectordata(body_str.begin(), body_str.end());
+  cv::Mat image = cv::imdecode(vectordata, cv::IMREAD_COLOR);
+  if (image.empty()) {
+    Aws::Utils::Json::JsonValue error;
+    error.WithString("error", "Failed to decode input image.");
+    return error;
+  }

88-93: Remove unused copy and rely on WithInt64 for metrics.

Avoid extra Aws::String allocation; use consistent 64-bit setters.

Apply this diff:

-  Aws::String upload_data(out_buffer.begin(), out_buffer.end());
-
   uint64_t upload_time = client_.upload_random_file(
       bucket_name, key_name, true,
       reinterpret_cast<char *>(out_buffer.data()), out_buffer.size());
@@
-  measurements.WithInteger("download_time", download_time);
-  measurements.WithInteger("upload_time", upload_time);
-  measurements.WithInteger("compute_time", computing_time);
-  measurements.WithInteger("download_size", vectordata.size());
-  measurements.WithInteger("upload_size", out_buffer.size());
+  measurements.WithInt64("download_time", download_time);
+  measurements.WithInt64("upload_time", upload_time);
+  measurements.WithInt64("compute_time", computing_time);
+  measurements.WithInt64("download_size", static_cast<long long>(vectordata.size()));
+  measurements.WithInt64("upload_size", static_cast<long long>(out_buffer.size()));

Also applies to: 94-99


63-65: Check imencode success and non-empty output.

Guard against failed encoding before upload.

Apply this diff:

-  cv::imencode(".jpg", image2, out_buffer);
+  if (!cv::imencode(".jpg", image2, out_buffer) || out_buffer.empty()) {
+    Aws::Utils::Json::JsonValue error;
+    error.WithString("error", "Failed to encode thumbnail.");
+    return error;
+  }
benchmarks/500.scientific/501.graph-pagerank/cpp/main.cpp (3)

17-17: Unused storage client.

Either use it or remove to avoid warnings.


9-13: Drop unused includes.

fstream, random, vector appear unused; remove to speed up builds.


2-3: Build error: AWS headers not found.

Ensure AWS SDK dev headers are installed and include paths are set in CMake/Docker (e.g., aws-sdk-cpp with s3/core, target_link_libraries + target_include_directories).

Can you confirm the Docker image used here already installs aws-sdk-cpp and exposes headers for compilation?

benchmarks/wrappers/aws/cpp/storage.hpp (2)

12-13: Encapsulation: avoid public client field.

Make _client private and expose a client() accessor to prevent external mutation.

-class Storage
-{
-public:
-  Aws::S3::S3Client _client;
+class Storage {
+public:
+  const Aws::S3::S3Client& client() const { return _client; }
+  Aws::S3::S3Client& client() { return _client; }
+private:
+  Aws::S3::S3::S3Client _client;

2-2: Clang 'cstdint' not found.

Likely tool env issue; ensure proper C++ toolchain in the build image (libstdc++/libc++ headers).

benchmarks/400.inference/411.image-recognition/cpp/main.cpp (4)

56-65: Set eval mode after loading the model.

Use inference settings to avoid training-time layers behavior.

             module = torch::jit::load("./resnet50.pt");
+            module.eval();

78-86: Disable autograd during inference.

Wrap forward in no-grad.

-        torch::Tensor out_tensor = module.forward({input_tensor}).toTensor();
+        torch::NoGradGuard no_grad;
+        torch::Tensor out_tensor = module.forward({input_tensor}).toTensor();

111-116: Return a meaningful status until S3 I/O + inference path is wired.

Avoid returning an empty object.

-    Aws::Utils::Json::JsonValue val;
-    Aws::Utils::Json::JsonValue result;
-    Aws::Utils::Json::JsonValue measurements;
-
-    return val;
+    Aws::Utils::Json::JsonValue resp;
+    resp.WithString("status", "not_implemented");
+    return resp;

I can add S3 download (Storage::download_file), model caching to /tmp, and inference wiring next.


1-2: Build error: AWS headers not found.

Confirm the C++ base image installs aws-sdk-cpp and proper include paths.

sebs/aws/aws.py (1)

135-141: Confirm ECR build accepts Language enum.

If build_base_image expects a string, convert via language.value.

tools/build_docker_images.py (1)

102-117: Harden 'dependencies' branch against unknown language keys.

Add a defensive check to avoid KeyError.

-    elif args.type == "dependencies":
-        if args.language:
-            if "dependencies" in system_config["languages"][args.language]:
-                language_config = system_config["languages"][args.language]
+    elif args.type == "dependencies":
+        if args.language:
+            if args.language not in system_config["languages"]:
+                raise RuntimeError(f"Unknown language: {args.language}")
+            if "dependencies" in system_config["languages"][args.language]:
+                language_config = system_config["languages"][args.language]
benchmarks/wrappers/aws/cpp/storage.cpp (1)

95-95: Fix typo in comment.

Line 95 has a typo: "Usign" should be "Using".

-  * We use Boost's bufferstream to wrap the array as an IOStream. Usign a
+  * We use Boost's bufferstream to wrap the array as an IOStream. Using a
sebs/benchmark.py (1)

648-648: Consider using logging.exception for better error context.

When logging errors inside an exception handler, logging.exception() automatically includes the stack trace, providing more debugging context than logging.error().

-                    self.logging.error(f"Stderr: {e.stderr.decode()}")
+                    self.logging.exception(f"Package build failed. Stderr: {e.stderr.decode()}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef25201 and 5a1557d.

📒 Files selected for processing (21)
  • benchmarks/200.multimedia/210.thumbnailer/config.json (1 hunks)
  • benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp (1 hunks)
  • benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp (1 hunks)
  • benchmarks/400.inference/411.image-recognition/config.json (1 hunks)
  • benchmarks/400.inference/411.image-recognition/cpp/main.cpp (1 hunks)
  • benchmarks/500.scientific/501.graph-pagerank/config.json (1 hunks)
  • benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (1 hunks)
  • benchmarks/500.scientific/501.graph-pagerank/cpp/main.cpp (1 hunks)
  • benchmarks/500.scientific/503.graph-bfs/config.json (1 hunks)
  • benchmarks/500.scientific/503.graph-bfs/cpp/main.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.hpp (1 hunks)
  • config/systems.json (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.build (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-igraph (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-opencv (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-torch (1 hunks)
  • sebs/aws/aws.py (6 hunks)
  • sebs/benchmark.py (16 hunks)
  • sebs/cpp_dependencies.py (1 hunks)
  • tools/build_docker_images.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dockerfiles/aws/cpp/Dockerfile.build
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
PR: spcl/serverless-benchmarks#99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
🧬 Code graph analysis (9)
benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp (2)
benchmarks/wrappers/aws/cpp/storage.cpp (2)
  • get_client (16-24)
  • get_client (16-16)
benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp (2)
  • thumbnailer (5-27)
  • thumbnailer (5-5)
benchmarks/wrappers/aws/cpp/storage.cpp (3)
benchmarks/wrappers/aws/cpp/storage.hpp (4)
  • Storage (14-16)
  • bucket (20-24)
  • bucket (34-35)
  • bucket (37-41)
benchmarks/wrappers/aws/cpp/key-value.cpp (2)
  • download_file (28-75)
  • download_file (28-30)
benchmarks/wrappers/aws/cpp/utils.cpp (2)
  • timeSinceEpochMicrosec (6-10)
  • timeSinceEpochMicrosec (6-6)
sebs/aws/aws.py (3)
sebs/types.py (1)
  • Language (25-35)
sebs/benchmark.py (3)
  • language (153-154)
  • language_version (161-162)
  • benchmark (101-102)
sebs/utils.py (1)
  • execute (43-53)
benchmarks/500.scientific/501.graph-pagerank/cpp/main.cpp (2)
benchmarks/wrappers/aws/cpp/storage.cpp (2)
  • get_client (16-24)
  • get_client (16-16)
benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (2)
  • graph_pagerank (8-55)
  • graph_pagerank (8-9)
sebs/benchmark.py (3)
sebs/cpp_dependencies.py (2)
  • CppDependencies (12-86)
  • to_cmake_list (64-86)
sebs/utils.py (2)
  • find_benchmark (144-147)
  • project_absolute_path (18-19)
sebs/types.py (2)
  • BenchmarkModule (5-7)
  • Language (25-35)
sebs/cpp_dependencies.py (1)
sebs/benchmark.py (1)
  • deserialize (68-79)
benchmarks/400.inference/411.image-recognition/cpp/main.cpp (1)
benchmarks/wrappers/aws/cpp/storage.cpp (2)
  • get_client (16-24)
  • get_client (16-16)
tools/build_docker_images.py (1)
sebs/config.py (1)
  • version (67-68)
benchmarks/wrappers/aws/cpp/storage.hpp (1)
benchmarks/wrappers/aws/cpp/storage.cpp (1)
  • client (22-22)
🪛 Clang (14.0.6)
benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp

[error] 2-2: 'aws/core/Aws.h' file not found

(clang-diagnostic-error)

benchmarks/wrappers/aws/cpp/storage.cpp

[error] 2-2: 'memory' file not found

(clang-diagnostic-error)

benchmarks/500.scientific/503.graph-bfs/cpp/main.cpp

[error] 1-1: 'aws/core/utils/json/JsonSerializer.h' file not found

(clang-diagnostic-error)

benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp

[error] 2-2: 'igraph.h' file not found

(clang-diagnostic-error)

benchmarks/500.scientific/501.graph-pagerank/cpp/main.cpp

[error] 2-2: 'aws/core/Aws.h' file not found

(clang-diagnostic-error)

benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp

[error] 2-2: 'opencv2/opencv.hpp' file not found

(clang-diagnostic-error)

benchmarks/400.inference/411.image-recognition/cpp/main.cpp

[error] 1-1: 'aws/core/Aws.h' file not found

(clang-diagnostic-error)

benchmarks/wrappers/aws/cpp/storage.hpp

[error] 2-2: 'cstdint' file not found

(clang-diagnostic-error)

🪛 Ruff (0.14.1)
sebs/aws/aws.py

160-160: Function call with shell=True parameter identified, security issue

(S604)

sebs/benchmark.py

262-262: Probable use of insecure hash functions in hashlib: md5

(S324)


648-648: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

sebs/cpp_dependencies.py

61-61: Create your own exception

(TRY002)


61-61: Avoid specifying long messages outside the exception class

(TRY003)


69-69: Avoid specifying long messages outside the exception class

(TRY003)

tools/build_docker_images.py

117-117: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (14)
benchmarks/500.scientific/503.graph-bfs/config.json (1)

4-6: LGTM!

The addition of C++ language support and igraph dependency is correctly configured.

benchmarks/200.multimedia/210.thumbnailer/config.json (1)

4-6: LGTM!

The C++ language support with opencv and boost dependencies is properly configured.

benchmarks/400.inference/411.image-recognition/config.json (1)

4-6: LGTM!

C++ support with torch and opencv dependencies is correctly configured.

benchmarks/500.scientific/501.graph-pagerank/config.json (1)

4-6: LGTM!

C++ language support with igraph dependency is properly configured.

dockerfiles/aws/cpp/Dockerfile.dependencies-opencv (1)

8-10: Verify OpenCV version is appropriate.

OpenCV 4.5.0 was released in December 2020. Ensure this version meets security and functional requirements, or consider upgrading to a more recent release for potential bug fixes and improvements.

dockerfiles/aws/cpp/Dockerfile.dependencies-torch (1)

18-18: LibTorch and torchvision versions are outdated and warrant verification.

LibTorch 1.7.0 and torchvision 0.8.0 were both released on October 27, 2020, making them exactly 5 years old. The current stable release as of October 27, 2025 is PyTorch 2.9. While the review comment's factual assessment is correct, the appropriate action depends on your specific build and security requirements. Verify that these pinned versions satisfy current security standards and compatibility needs, or upgrade to a more recent release if feasible.

config/systems.json (2)

124-138: CPP architecture gating confirmed—implicit rejection via unsupported versions.

The toolchain does reject arm64 for CPP, but through implicit gating. When CPP+arm64 is attempted, supported_language_versions() in sebs/config.py returns an empty list (since config/systems.json defines CPP base_images only for x64), causing get_function() in sebs/faas/system.py to raise an exception before deployment. All invocation paths converge on this check, ensuring rejection happens.

However, the error message ("Unsupported version") is misleading—it doesn't clarify that the issue is missing architecture support. Consider adding explicit language-architecture compatibility validation with a clearer error message early in the deployment flow.


126-129: I need to clarify the registry path discrepancy before finalizing the response.

Correct the registry path inconsistency in the multi-OS image suggestion.

The AL2023 variant suggestion is valid for future-proofing, but the example has a critical flaw: the AL2 image uses amazon/aws-lambda-provided while the AL2023 image uses public.ecr.aws/lambda/provided. AWS official documentation recommends public.ecr.aws/lambda/provided as the canonical path. Both should use the same registry for consistency.

Corrected example:

"base_images": {
  "x64": {
    "all": {
      "al2": "public.ecr.aws/lambda/provided:al2",
      "al2023": "public.ecr.aws/lambda/provided:al2023"
    }
  }
}
benchmarks/wrappers/aws/cpp/storage.cpp (1)

64-82: LGTM!

This overload has proper error handling with logging and returns appropriate failure indicators.

sebs/benchmark.py (5)

13-14: LGTM!

The import of CppDependencies is necessary for the C++ benchmark support being added.


28-40: LGTM!

The integration of cpp_dependencies into BenchmarkConfig follows the existing pattern for other configuration fields and properly handles the optional nature of the field.

Also applies to: 66-79


260-289: Good refactor to use Language enum.

The migration from string-based language keys to the Language enum improves type safety and consistency across the codebase.

Note: The static analysis tool flags MD5 usage at line 262, but MD5 is appropriate here for cache invalidation purposes (non-cryptographic hashing).


557-562: LGTM!

The PACKAGE_FILES dictionary correctly maps C++ to CMakeLists.txt, following the same pattern as Python and Node.js.


742-747: LGTM!

The formatting adjustment improves readability by clearly showing self as the first parameter.

Comment on lines +7 to +27
try
{
// Calculate thumbnail size while maintaining aspect ratio
int orig_width = in.cols;
int orig_height = in.rows;

double scale_w = static_cast<double>(width) / orig_width;
double scale_h = static_cast<double>(height) / orig_height;
double scale = std::min(scale_w, scale_h); // Use smaller scale to fit within bounds

int new_width = static_cast<int>(orig_width * scale);
int new_height = static_cast<int>(orig_height * scale);

// Resize image (equivalent to PIL's thumbnail method)
cv::resize(in, out, cv::Size(new_width, new_height), cv::INTER_LINEAR);
}
catch (const cv::Exception &e)
{
std::cerr << "OpenCV error: " << e.what() << std::endl;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Rethrow exceptions to avoid swallowing errors.

The catch block logs the error but doesn't rethrow, leaving the out parameter in an undefined state. The caller has no way to detect that the operation failed, which could lead to silent failures downstream.

Apply this diff to rethrow the exception:

   catch (const cv::Exception &e)
   {
     std::cerr << "OpenCV error: " << e.what() << std::endl;
+    throw;
   }
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp around lines 7 to
27, the catch block logs OpenCV errors but swallows the exception leaving `out`
undefined and the caller unaware of failure; after logging the error, rethrow
the exception (use `throw;` to preserve the original exception) so the failure
propagates to the caller and they can handle it appropriately.

Comment on lines +31 to +34
auto image_name = request.GetObject("object").GetString("key");
auto width = request.GetObject("object").GetInteger("width");
auto height = request.GetObject("object").GetInteger("height");

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate presence and types of object.key/width/height.

Avoid crashes when fields are missing or wrong types.

Apply this diff:

-  auto image_name = request.GetObject("object").GetString("key");
-  auto width = request.GetObject("object").GetInteger("width");
-  auto height = request.GetObject("object").GetInteger("height");
+  if (!request.ValueExists("object") ||
+      !request.GetObject("object").ValueExists("key") ||
+      !request.GetObject("object").ValueExists("width") ||
+      !request.GetObject("object").ValueExists("height")) {
+    Aws::Utils::Json::JsonValue error;
+    error.WithString("error", "Missing object.key/width/height.");
+    return error;
+  }
+  auto image_name = request.GetObject("object").GetString("key");
+  auto width = request.GetObject("object").GetInteger("width");
+  auto height = request.GetObject("object").GetInteger("height");
+  if (width <= 0 || height <= 0) {
+    Aws::Utils::Json::JsonValue error;
+    error.WithString("error", "Width/height must be positive.");
+    return error;
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto image_name = request.GetObject("object").GetString("key");
auto width = request.GetObject("object").GetInteger("width");
auto height = request.GetObject("object").GetInteger("height");
if (!request.ValueExists("object") ||
!request.GetObject("object").ValueExists("key") ||
!request.GetObject("object").ValueExists("width") ||
!request.GetObject("object").ValueExists("height")) {
Aws::Utils::Json::JsonValue error;
error.WithString("error", "Missing object.key/width/height.");
return error;
}
auto image_name = request.GetObject("object").GetString("key");
auto width = request.GetObject("object").GetInteger("width");
auto height = request.GetObject("object").GetInteger("height");
if (width <= 0 || height <= 0) {
Aws::Utils::Json::JsonValue error;
error.WithString("error", "Width/height must be positive.");
return error;
}
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp around lines 31 to 34,
the code assumes request.GetObject("object") always contains "key", "width", and
"height" with correct types; add defensive checks to validate existence and
types before using them: verify the top-level "object" exists and is an object,
check that "key" exists and is a string, and that "width" and "height" exist and
are integers (or convertible), and if any check fails return or throw a
descriptive error (or set an error response) instead of proceeding; use the
appropriate API (HasMember/isString/isInt or equivalent) to test types and
extract values safely.

Comment on lines +37 to +43
int w = image.size().width, h = image.size().height;
cv::Size scale((int)256 * ((float)w) / h, 256);
cv::resize(image, image, scale);
w = image.size().width, h = image.size().height;
image = image(cv::Range(16, 240), cv::Range(80, 304));
image.convertTo(image, CV_32FC3, 1.0f / 255.0f);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Unsafe fixed ROI crop; center-crop robustly after resizing the shorter side.

Current indices can go out of bounds (e.g., width 256). Replace with size-aware crop.

-    int w = image.size().width, h = image.size().height;
-    cv::Size scale((int)256 * ((float)w) / h, 256);
-    cv::resize(image, image, scale);
-    w = image.size().width, h = image.size().height;
-    image = image(cv::Range(16, 240), cv::Range(80, 304));
-    image.convertTo(image, CV_32FC3, 1.0f / 255.0f);
+    int w = image.cols, h = image.rows;
+    int short_side = std::min(w, h);
+    cv::Size scaled(
+        static_cast<int>(std::round(w * 256.0 / short_side)),
+        static_cast<int>(std::round(h * 256.0 / short_side)));
+    cv::resize(image, image, scaled);
+    w = image.cols; h = image.rows;
+    int x = std::max(0, std::min((w - kIMAGE_SIZE) / 2, w - kIMAGE_SIZE));
+    int y = std::max(0, std::min((h - kIMAGE_SIZE) / 2, h - kIMAGE_SIZE));
+    image = image(cv::Rect(x, y, kIMAGE_SIZE, kIMAGE_SIZE));
+    image.convertTo(image, CV_32FC3, 1.0f / 255.0f);

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +96 to +109
auto bucket_obj = request.GetObject("bucket");
if (!bucket_obj.IsObject())
{
Aws::Utils::Json::JsonValue error;
error.WithString("error", "Bucket object is not valid.");
return error;
}

auto bucket_name = bucket_obj.GetString("bucket");
auto input_prefix = bucket_obj.GetString("input");
auto model_prefix = bucket_obj.GetString("model");
auto key = request.GetObject("object").GetString("input");
auto model_key = request.GetObject("object").GetString("model");

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate request schema before GetString() calls.

Guard against missing keys to prevent exceptions.

-    auto bucket_obj = request.GetObject("bucket");
-    if (!bucket_obj.IsObject())
+    if (!request.ValueExists("bucket") || !request.GetObject("bucket").IsObject())
     {
         Aws::Utils::Json::JsonValue error;
         error.WithString("error", "Bucket object is not valid.");
         return error;
     }
-
-    auto bucket_name = bucket_obj.GetString("bucket");
-    auto input_prefix = bucket_obj.GetString("input");
-    auto model_prefix = bucket_obj.GetString("model");
-    auto key = request.GetObject("object").GetString("input");
-    auto model_key = request.GetObject("object").GetString("model");
+    auto bucket_obj = request.GetObject("bucket");
+    if (!bucket_obj.ValueExists("bucket") || !bucket_obj.ValueExists("input") || !bucket_obj.ValueExists("model")
+        || !request.ValueExists("object")
+        || !request.GetObject("object").ValueExists("input")
+        || !request.GetObject("object").ValueExists("model")) {
+        Aws::Utils::Json::JsonValue error;
+        error.WithString("error", "Missing required keys: bucket.bucket/input/model and object.input/model.");
+        return error;
+    }
+    auto bucket_name = bucket_obj.GetString("bucket");
+    auto input_prefix = bucket_obj.GetString("input");
+    auto model_prefix = bucket_obj.GetString("model");
+    auto key = request.GetObject("object").GetString("input");
+    auto model_key = request.GetObject("object").GetString("model");
🤖 Prompt for AI Agents
In benchmarks/400.inference/411.image-recognition/cpp/main.cpp around lines 96
to 109, the code calls GetString() on JSON members without checking they exist
or are strings; add guards that verify bucket_obj has the keys "bucket",
"input", and "model" and that request.GetObject("object") has "input" and
"model", and confirm each member IsString()/HasMember() (or equivalent) before
calling GetString(); if any check fails, build and return an
Aws::Utils::Json::JsonValue error with a clear message indicating which key is
missing or of the wrong type.

Comment on lines +17 to +23
uint64_t start_time = timeSinceEpochMicrosec();
igraph_barabasi_game(
/* graph= */ &graph,
/* n= */ size,
/* power= */ 1,
/* m= */ 10,
/* outseq= */ NULL,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Boundary checks for Barabási–Albert parameters.

m=10 requires size > 10; for smaller graphs igraph may error. Guard inputs or scale m.

Apply this diff:

-  igraph_rng_seed(igraph_rng_default(), seed);
+  igraph_rng_seed(igraph_rng_default(), seed);
   {
     uint64_t start_time = timeSinceEpochMicrosec();
-    igraph_barabasi_game(
+    int m = size > 1 ? std::min(10, size - 1) : 1;
+    igraph_barabasi_game(
       /* graph=    */ &graph,
       /* n=        */ size,
       /* power=    */ 1,
-      /* m=        */ 10,
+      /* m=        */ m,

Also applies to: 29-35

🤖 Prompt for AI Agents
In benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp around lines
17-23 (and similarly at 29-35), the call uses m=10 unconditionally which will
fail when size <= 10; compute a safe m before calling igraph_barabasi_game
(e.g., cap m to size-1 and ensure it is at least 1: m = (size > 1) ? min(10,
(int)size - 1) : 1), then pass that m to igraph_barabasi_game; apply the same
defensive m calculation to the second occurrence at lines 29-35.

Comment on lines +158 to +166
# FIXME: use zipfile
# create zip with hidden directory but without parent directory
execute("zip -qu -r9 {}.zip * .".format(benchmark), shell=True, cwd=directory)
benchmark_archive = "{}.zip".format(os.path.join(directory, benchmark))
self.logging.info("Created {} archive".format(benchmark_archive))

bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
mbytes = bytes_size / 1024.0 / 1024.0
self.logging.info("Zip archive size {:2f} MB".format(mbytes))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid shell zip (S604) and fix archive sizing path.

Use zipfile; also benchmark_archive already has directory prefix.

-            # FIXME: use zipfile
-            # create zip with hidden directory but without parent directory
-            execute("zip -qu -r9 {}.zip * .".format(benchmark), shell=True, cwd=directory)
-            benchmark_archive = "{}.zip".format(os.path.join(directory, benchmark))
-            self.logging.info("Created {} archive".format(benchmark_archive))
-
-            bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
+            benchmark_archive = os.path.join(directory, f"{benchmark}.zip")
+            from zipfile import ZipFile, ZIP_DEFLATED
+            with ZipFile(benchmark_archive, "w", compression=ZIP_DEFLATED) as zf:
+                for entry in os.listdir(directory):
+                    if entry == os.path.basename(benchmark_archive):
+                        continue
+                    zf.write(os.path.join(directory, entry), arcname=entry)
+            self.logging.info("Created {} archive".format(benchmark_archive))
+            bytes_size = os.path.getsize(benchmark_archive)
             mbytes = bytes_size / 1024.0 / 1024.0
             self.logging.info("Zip archive size {:2f} MB".format(mbytes))
🧰 Tools
🪛 Ruff (0.14.1)

160-160: Function call with shell=True parameter identified, security issue

(S604)

🤖 Prompt for AI Agents
In sebs/aws/aws.py around lines 158-166, replace the shell zip call and
incorrect size path with Python's zipfile: open a ZipFile in write mode, walk
the target directory and add files (including hidden files) using
arcname=os.path.relpath(path, start=directory) to avoid including the parent
directory, and ensure you close the archive (use context manager). Set
benchmark_archive to the absolute path of the created .zip (do not prepend
directory twice) and compute bytes_size with os.path.getsize(benchmark_archive);
then log the created archive path and its size in MB.

Comment on lines +171 to +176
benchmark_archive = os.path.join(directory, "build", "benchmark.zip")
self.logging.info("Created {} archive".format(benchmark_archive))

bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
mbytes = bytes_size / 1024.0 / 1024.0
self.logging.info("Zip archive size {:2f} MB".format(mbytes))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix size path for C++ archive.

Don’t re-join directory; path already absolute.

-            bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
+            bytes_size = os.path.getsize(benchmark_archive)
🤖 Prompt for AI Agents
In sebs/aws/aws.py around lines 171 to 176, the code incorrectly calls
os.path.getsize(os.path.join(directory, benchmark_archive)) even though
benchmark_archive is already an absolute path; replace that double-join with
os.path.getsize(benchmark_archive) so the correct file is sized, then compute
mbytes and log as before.

Comment on lines +193 to +204
def cloud_runtime(self, language: Language, language_version: str):

# AWS uses different naming scheme for Node.js versions
# For example, it's 12.x instead of 12.
if language == "nodejs":
return f"{runtime}.x"
return runtime
if language == Language.NODEJS:
return f"{language}{language_version}.x"
elif language == Language.CPP:
return "provided.al2"
elif language in [Language.PYTHON]:
return f"{language}{language_version}"
else:
raise NotImplementedError()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Incorrect runtime strings from enum; use value.

f"{language}{version}" renders 'Language.PYTHON3.10'. AWS expects 'python3.10', 'nodejs18.x', or 'provided.al2'.

-        if language == Language.NODEJS:
-            return f"{language}{language_version}.x"
+        if language == Language.NODEJS:
+            return f"{language.value}{language_version}.x"
-        elif language == Language.CPP:
+        if language == Language.CPP:
             return "provided.al2"
-        elif language in [Language.PYTHON]:
-            return f"{language}{language_version}"
-        else:
+        if language == Language.PYTHON:
+            return f"{language.value}{language_version}"
+        # else
             raise NotImplementedError()

You can quickly assert expected outputs:


🏁 Script executed:

#!/bin/bash
python - <<'PY'
from sebs.types import Language
from sebs.aws.aws import AWS
print("node:", AWS.cloud_runtime(None, Language.NODEJS, "18") if False else f"{Language.NODEJS.value}18.x")
print("py:", f"{Language.PYTHON.value}3.10")
print("cpp:", "provided.al2")
PY

Length of output: 326



---

🏁 Script executed:

```shell
#!/bin/bash
# Find and inspect the Language enum definition
rg -n "class Language" --type=py -A 20 | head -50

Length of output: 868


🏁 Script executed:

#!/bin/bash
# Inspect the current cloud_runtime method in aws.py
rg -n "def cloud_runtime" --type=py -A 15

Length of output: 892


🏁 Script executed:

#!/bin/bash
# Check if Language enum uses value property and what format
fd -type f -name "types.py" | xargs cat 2>/dev/null

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Look for all usages of Language enum string conversion to understand current behavior
rg -n "f.*{language}" --type=py -C 2

Length of output: 99


🏁 Script executed:

#!/bin/bash
# Test Python str Enum behavior without importing the full sebs module
python3 << 'EOF'
from enum import Enum

class Language(str, Enum):
    PYTHON = "python"
    NODEJS = "nodejs"
    CPP = "cpp"

# Test different ways to access enum value
print("f-string direct:", f"{Language.PYTHON}3.10")
print("f-string .value:", f"{Language.PYTHON.value}3.10")
print("str() direct:", str(Language.PYTHON) + "3.10")
print("repr():", repr(Language.PYTHON))
print("name:", Language.PYTHON.name)
print("value:", Language.PYTHON.value)
EOF

Length of output: 161


🏁 Script executed:

#!/bin/bash
# Also check if there are any other cloud_runtime usages or implementations
rg -n "cloud_runtime" --type=py -B 2 -A 2

Length of output: 561


Fix missing .value property on Language enum strings.

The current implementation produces invalid AWS Lambda runtime strings. Testing confirms that f"{Language.PYTHON}{version}" renders as "Language.PYTHON3.10" (using enum name) instead of "python3.10" (using enum value). Since Language inherits from str and Enum, the .value property must be explicitly accessed in f-strings.

Required fixes at sebs/aws/aws.py lines 193-204:

  • Line 198: Use f"{language.value}{language_version}.x"
  • Line 202: Use f"{language.value}{language_version}"
-            return f"{language}{language_version}.x"
+            return f"{language.value}{language_version}.x"
-            return f"{language}{language_version}"
+            return f"{language.value}{language_version}"
🤖 Prompt for AI Agents
In sebs/aws/aws.py around lines 193 to 204, the f-strings build AWS runtime
names using the Language enum itself (e.g. f"{language}{language_version}")
which yields enum names like "Language.PYTHON3.10" instead of the enum string
values; update the NodeJS and Python branches to use the enum's .value (i.e. use
f"{language.value}{language_version}.x" for NodeJS and
f"{language.value}{language_version}" for Python) so the runtime strings are
correct; leave the CPP branch and the NotImplementedError unchanged.

Comment on lines +420 to +470
def add_deployment_package_cpp(self, output_dir):

# FIXME: Configure CMakeLists.txt dependencies
# FIXME: Configure for AWS - this should be generic
# FIXME: optional hiredis

cmake_script = """
cmake_minimum_required(VERSION 3.9)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_FLAGS "-Os")
project(benchmark LANGUAGES CXX)
add_executable(
${PROJECT_NAME} "handler.cpp" "key-value.cpp"
"storage.cpp" "redis.cpp" "utils.cpp" "main.cpp"
)
target_include_directories(${PROJECT_NAME} PRIVATE ".")
target_compile_features(${PROJECT_NAME} PRIVATE "cxx_std_14")
target_compile_options(${PROJECT_NAME} PRIVATE "-Wall" "-Wextra")
find_package(aws-lambda-runtime)
target_link_libraries(${PROJECT_NAME} PRIVATE AWS::aws-lambda-runtime)
"""

for dependency in self._benchmark_config._cpp_dependencies:
cmake_script += CppDependencies.to_cmake_list(dependency)

cmake_script += """
find_package(AWSSDK COMPONENTS s3 dynamodb core)
target_link_libraries(${PROJECT_NAME} PUBLIC ${AWSSDK_LINK_LIBRARIES})
find_package(PkgConfig REQUIRED)
set(ENV{PKG_CONFIG_PATH} "/opt/lib/pkgconfig")
pkg_check_modules(HIREDIS REQUIRED IMPORTED_TARGET hiredis)
target_include_directories(${PROJECT_NAME} PUBLIC PkgConfig::HIREDIS)
target_link_libraries(${PROJECT_NAME} PUBLIC PkgConfig::HIREDIS)
# this line creates a target that packages your binary and zips it up
aws_lambda_package_target(${PROJECT_NAME})
"""

self.logging.info(
"Dependencies for CPP benchmark {benchmark} are " +
str(len(self._benchmark_config._cpp_dependencies)) + " dependencies."
)

build_script = os.path.join(output_dir, "CMakeLists.txt")
with open(build_script, "w") as script_file:
script_file.write(textwrap.dedent(cmake_script))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix logging statement and consider making file list configurable.

The implementation has several issues:

  1. Critical: Lines 462-464 have a syntax error in the logging statement - mixing f-string/format-style with string concatenation:
self.logging.info(
    "Dependencies for CPP benchmark {benchmark} are " + 
        str(len(self._benchmark_config._cpp_dependencies)) + " dependencies."
)
  1. Major: Line 432 hardcodes the source file list, which reduces flexibility for benchmarks with different file structures.

  2. Minor: The function is AWS-specific (aws-lambda-runtime, AWSSDK, hiredis are always included), which aligns with the learnings that C++ support is currently AWS-only.

Apply this diff to fix the logging statement:

-        self.logging.info(
-            "Dependencies for CPP benchmark {benchmark} are " + 
-                str(len(self._benchmark_config._cpp_dependencies)) + " dependencies."
-        )
+        self.logging.info(
+            f"Dependencies for CPP benchmark {self.benchmark} are "
+            f"{len(self._benchmark_config._cpp_dependencies)} dependencies."
+        )

For the hardcoded file list, consider making it configurable via the benchmark's config.json or by discovering .cpp/.hpp files automatically.

Based on learnings

Comment on lines 656 to 661
def build(
self,
deployment_build_step: Callable[
[str, str, str, str, str, bool, bool], Tuple[str, int, str]
[str, Language, str, str, str, bool, bool], Tuple[str, int, str]
],
) -> Tuple[bool, str, bool, str]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify breaking API change is coordinated with callers.

The deployment_build_step callable signature has been updated to accept Language as the first parameter (line 659). This is a breaking change that requires all callers to be updated accordingly.

Run the following script to find all usages of methods that pass deployment_build_step callbacks:


🏁 Script executed:

#!/bin/bash
# Find all locations where deployment_build_step or similar build callbacks are passed

# Search for methods that call Benchmark.build() or similar
rg -nP --type=py -C3 '\.build\s*\(' -g '!sebs/benchmark.py'

# Search for lambda or function definitions that might be build steps
rg -nP --type=py -C3 'def.*build.*step|lambda.*Language' -g '!sebs/benchmark.py'

Length of output: 1254


🏁 Script executed:

#!/bin/bash
# Find Benchmark class usages and build method calls more precisely

# Search for Benchmark.build() calls specifically
rg -nP --type=py 'Benchmark.*build|\.build\s*\(\s*lambda|\.build\s*\(\s*self\.' -C5

# Search for deployment_build_step definitions or usages
rg -nP --type=py 'deployment_build_step' -C3

# Look for function definitions that match the old pattern (6 string params + bool)
rg -nP --type=py 'def\s+\w+\s*\([^)]*\bstr\b[^)]*\bstr\b[^)]*\bbool\b' -C2

Length of output: 2612



🏁 Script executed:

#!/bin/bash
# Find package_code method definitions and implementations

# Search for package_code method definition
rg -nP --type=py 'def\s+package_code|self\.package_code' -C4

# Search for System class hierarchy to find all implementations
rg -nP --type=py 'class.*System.*:' -C2

# Look for any method that might be a build step callback
rg -nP --type=py '\(self,\s*str,\s*[Ll]anguage' -C2

Length of output: 7825



🏁 Script executed:

#!/bin/bash
# Get the complete signature of Local.package_code method
sed -n '113,135p' sebs/local/local.py

Length of output: 835


Update sebs/local/local.py:113 to accept Language type parameter.

The deployment_build_step callable signature change requires all system implementations to accept Language as the second parameter. However, sebs/local/local.py:113 still expects language_name: str and has not been updated. This will cause a runtime error when Benchmark.build() passes self.language (Language type) to the Local system's package_code() method.

All other system implementations (AWS, GCP, Azure, OpenWhisk) have been correctly updated with the Language parameter. The Local system implementation is the only one out of sync with the new signature.

🤖 Prompt for AI Agents
In sebs/local/local.py around line 113, the Local system's package_code (or
equivalent) function currently accepts language_name: str but must accept
language: Language to match the updated deployment_build_step signature; change
the parameter type from str to Language, update the parameter name to language,
and update any uses inside the function to reference language (e.g., use
language.name or language.value as needed when a string is required) so the
method can accept the Language object passed from Benchmark.build without
runtime type errors.

@DJAntivenom
Copy link

Hey @mcopik I would like to take over this branch and finish all the remaining todos, warnings, etc. once you agree to it. Please let me know if you have unfinished work locally somewhere otherwise I would start working on this branch tomorrow (Monday 16.11.25).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants