Skip to content

Conversation

mangelajo
Copy link
Member

@mangelajo mangelajo commented Sep 26, 2025

Summary by CodeRabbit

  • New Features

    • Configurable exporter offline timeout (default 180s) driving online/offline status and dynamic requeue timing.
    • Expanded gRPC keepalive settings exposed in chart values: timeout, maxConnectionIdle, maxConnectionAge, maxConnectionAgeGrace, time.
  • Refactor

    • Services and controllers now accept multiple gRPC server options and propagate exporter options end-to-end.
  • Chores

    • Updated linting tool version.
    • CI e2e job now targets release-0.7.
  • Tests

    • Tests updated to cover exporter offline handling and status behavior.

The version we were currently usign has a missing dependency
which isn't available anymore.
Copy link
Contributor

coderabbitai bot commented Sep 26, 2025

Walkthrough

Adds ExporterOptions to configuration and threads it through LoadConfiguration to services and ExporterReconciler; changes gRPC server option handling from a single option to a slice and extends gRPC keepalive/server-parameter fields; updates Helm model/schema/values and bumps golangci-lint in the Makefile.

Changes

Cohort / File(s) Summary
Build tooling
Makefile
Bump GOLANGCI_LINT_VERSION from v2.1.2 to v2.5.0.
Config types & loading
internal/config/types.go, internal/config/config.go, internal/config/grpc.go, internal/config/oidc.go
Add ExporterOptions type/field and preprocessing; introduce AuthenticationConfigResult and ConfigurationResult; extend Keepalive with Timeout, MaxConnectionIdle, MaxConnectionAge, MaxConnectionAgeGrace, Time; change several Load* signatures to return consolidated result types and []grpc.ServerOption; parse keepalive params and build server options slice.
Service structs & startup
internal/service/controller_service.go, internal/service/router_service.go, cmd/router/main.go
Replace single ServerOption with ServerOptions []grpc.ServerOption; compose base server options then append ServerOptions... when creating gRPC servers; update composite literal usage to ServerOptions.
Entrypoints
cmd/main.go, cmd/router/main.go
Update callers to accept ConfigurationResult and use configResult.* fields; populate ExporterReconciler.ExporterOptions; use ServerOptions field when initializing services.
Controller logic
internal/controller/exporter_controller.go, internal/controller/lease_controller_test.go
Add ExporterOptions to ExporterReconciler; derive offlineTimeout via ExporterOptions.GetOfflineTimeout(); replace hardcoded 1m offline threshold with configurable timeout and compute dynamic requeueAfter; update tests to wire ExporterOptions and set exporter-online conditions.
Helm chart model/schema/values
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py, .../values.schema.json, deploy/helm/jumpstarter/values.yaml
Add exporterOptions.offlineTimeout; extend Keepalive model/schema/values with server-parameter fields (timeout, maxConnectionIdle, maxConnectionAge, maxConnectionAgeGrace, time); add schema definitions and inline comments.
CI workflow
.github/workflows/e2e.yaml
Update e2e job key and change referenced jumpstarter action to release-0.7 (use release tag instead of specific commit).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Main as cmd/main.go
  participant Cfg as internal/config.LoadConfiguration
  participant Svc as ControllerService
  participant Reco as ExporterReconciler

  Main->>Cfg: LoadConfiguration(ctx,...)
  Cfg-->>Main: ConfigurationResult{ AuthenticationConfigResult, Router, ServerOptions[], Provisioning, ExporterOptions }
  Main->>Svc: New ControllerService{ ServerOptions: configResult.ServerOptions, Router: configResult.Router, ... }
  Main->>Reco: New ExporterReconciler{ ExporterOptions: configResult.ExporterOptions, ... }
  Note right of Reco: Reconciler uses ExporterOptions.GetOfflineTimeout() for scheduling
Loading
sequenceDiagram
  autonumber
  participant Reco as ExporterReconciler
  participant Cfg as ExporterOptions
  participant K8s as Kubernetes API

  Reco->>Cfg: GetOfflineTimeout()
  Reco->>K8s: Read Exporter.Status.LastSeen
  alt LastSeen older than offlineTimeout
    Reco->>K8s: Set status Offline
    Reco-->>K8s: Requeue after safetyMargin (10s)
  else LastSeen within offlineTimeout
    Reco->>Reco: expiration = LastSeen + offlineTimeout
    Reco-->>K8s: RequeueAfter = min(expiration + margin - now, cap)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • bennyz

Poem

A rabbit taps keys in a quiet nook,
Timeouts tuned where exporters once stood,
Keepalive heartbeats now list each look,
Servers gather options, tidy and good,
Hop—configs hum, deployment tastes like food 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly captures the main focus of the changes by referencing the addition of an exporter timeout feature and gRPC configuration knobs, directly correlating with the updates to configuration, controllers, and Helm charts.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch exporter-timeout-grpc

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cec95ca and 31165c8.

📒 Files selected for processing (4)
  • cmd/main.go (3 hunks)
  • internal/config/config.go (2 hunks)
  • internal/config/oidc.go (2 hunks)
  • internal/config/types.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
internal/config/config.go (4)
internal/config/types.go (8)
  • ConfigurationResult (93-99)
  • Router (79-79)
  • Config (12-17)
  • Authentication (19-22)
  • Grpc (32-34)
  • ExporterOptions (49-52)
  • AuthenticationConfigResult (87-90)
  • Provisioning (24-26)
internal/config/oidc.go (1)
  • LoadAuthenticationConfiguration (17-55)
internal/oidc/config.go (1)
  • LoadAuthenticationConfiguration (18-62)
internal/config/grpc.go (1)
  • LoadGrpcConfiguration (13-83)
internal/config/types.go (1)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (5)
  • Authentication (215-222)
  • Provisioning (17-20)
  • Grpc (65-68)
  • ExporterOptions (206-212)
  • Router (289-294)
cmd/main.go (5)
internal/config/config.go (1)
  • LoadConfiguration (44-105)
internal/config/types.go (3)
  • ExporterOptions (49-52)
  • Provisioning (24-26)
  • Router (79-79)
internal/authentication/bearer.go (1)
  • NewBearerTokenAuthenticator (19-21)
internal/authorization/basic.go (1)
  • NewBasicAuthorizer (21-27)
internal/authorization/metadata.go (2)
  • NewMetadataAttributesGetter (35-39)
  • MetadataAttributesGetterConfig (25-29)
internal/config/oidc.go (1)
internal/config/types.go (2)
  • AuthenticationConfigResult (87-90)
  • Internal (28-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: tests
  • GitHub Check: e2e-tests-release-0-7
  • GitHub Check: e2e-tests (ubuntu-24.04)
  • GitHub Check: lint-go
  • GitHub Check: deploy-kind
  • GitHub Check: e2e-tests (ubuntu-24.04-arm)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (9)
internal/config/types.go (1)

31-46: Keepalive timeout fields stay strings—shouldn’t we parse durations earlier?

All keepalive fields remain string, so invalid or malformed values will pass through config loading and only explode at runtime when gRPC parses them. Consider switching these to time.Duration (with metav1.Duration or custom parsing) during config load so we fail fast with a clear error.

cmd/router/main.go (1)

63-65: Variable naming still singular despite slice

Minor nit: serverOption is now a slice. Consider renaming to serverOptions for clarity, but optional.

deploy/helm/jumpstarter/values.yaml (1)

45-46: Document warnings about low offline timeout values

Maybe mention any operational caveats (e.g., too low timeout triggering flapping) beside the default? Optional but could help operators.

cmd/main.go (3)

157-171: Rename local option to serverOptions for clarity.

It holds a slice of grpc.ServerOption. Renaming avoids confusion with the previous single-option field and improves readability.

-	authenticator, prefix, router, option, provisioning, exporterOptions, err := config.LoadConfiguration(
+	authenticator, prefix, router, serverOptions, provisioning, exporterOptions, err := config.LoadConfiguration(

176-181: Defensive nil-safety when dereferencing exporterOptions.

Today LoadConfiguration always returns a non-nil pointer, but a small guard avoids future NPEs if that changes.

Example (outside this block, just before constructing the reconciler):

expOpts := config.ExporterOptions{}
if exporterOptions != nil {
	expOpts = *exporterOptions
}

Then here:

-		ExporterOptions: *exporterOptions,
+		ExporterOptions: expOpts,

218-220: Rename local option to serverOptions

Keeps naming consistent with the slice-based gRPC API. Update both the return binding and the SetupWithManager call:

- authenticator, prefix, router, option, provisioning, exporterOptions, err := config.LoadConfiguration(...)
+ authenticator, prefix, router, serverOptions, provisioning, exporterOptions, err := config.LoadConfiguration(...)-       ServerOptions: option,
+       ServerOptions: serverOptions,
internal/config/grpc.go (2)

68-75: Avoid shadowing the time package.

time, err := time.ParseDuration(...) shadows the imported package. Rename the local var.

-	if config.Keepalive.Time != "" {
-		time, err := time.ParseDuration(config.Keepalive.Time)
+	if config.Keepalive.Time != "" {
+		pingInterval, err := time.ParseDuration(config.Keepalive.Time)
 		if err != nil {
 			return nil, err
 		}
-		serverParams.Time = time
+		serverParams.Time = pingInterval
 	}

77-81: Comment and condition mismatch; ServerParameters are always added.

You always set Timeout (defaultGrpcTimeout), making serverParams non-zero. Either remove the condition or gate setting the default.

-	// Only add ServerParameters if at least one parameter is set
-	if serverParams != (keepalive.ServerParameters{}) {
-		serverOptions = append(serverOptions, grpc.KeepaliveParams(serverParams))
-	}
+	// Always add ServerParameters (Timeout defaults to defaultGrpcTimeout when unset)
+	serverOptions = append(serverOptions, grpc.KeepaliveParams(serverParams))
internal/config/config.go (1)

94-95: Avoid returning pointers to fields of a local struct (optional).

Returning &config.Provisioning and &config.ExporterOptions is safe (heap escape), but returning by value simplifies ownership and avoids interior pointers.

Consider changing the signature to return Provisioning and ExporterOptions by value and adjust call sites accordingly. If you keep pointers, fine as-is.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7dd883 and bcb6483.

📒 Files selected for processing (12)
  • Makefile (1 hunks)
  • cmd/main.go (3 hunks)
  • cmd/router/main.go (1 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (4 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json (3 hunks)
  • deploy/helm/jumpstarter/values.yaml (2 hunks)
  • internal/config/config.go (3 hunks)
  • internal/config/grpc.go (1 hunks)
  • internal/config/types.go (2 hunks)
  • internal/controller/exporter_controller.go (4 hunks)
  • internal/service/controller_service.go (3 hunks)
  • internal/service/router_service.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
internal/config/grpc.go (2)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (2)
  • Grpc (65-68)
  • Keepalive (29-62)
internal/config/types.go (2)
  • Grpc (27-29)
  • Keepalive (31-42)
internal/service/controller_service.go (3)
internal/authentication/types.go (1)
  • ContextAuthenticator (9-11)
internal/authorization/types.go (1)
  • ContextAttributesGetter (10-12)
internal/config/types.go (1)
  • Router (48-48)
internal/config/types.go (1)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (4)
  • Authentication (215-222)
  • Provisioning (17-20)
  • Grpc (65-68)
  • ExporterOptions (206-212)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)
internal/config/types.go (1)
  • ExporterOptions (44-46)
cmd/main.go (2)
internal/config/config.go (1)
  • LoadConfiguration (45-95)
internal/config/types.go (2)
  • ExporterOptions (44-46)
  • Router (48-48)
internal/config/config.go (2)
internal/config/types.go (5)
  • Router (48-48)
  • Provisioning (19-21)
  • ExporterOptions (44-46)
  • Config (7-12)
  • Grpc (27-29)
internal/config/grpc.go (1)
  • LoadGrpcConfiguration (13-83)
internal/controller/exporter_controller.go (2)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)
  • ExporterOptions (206-212)
internal/config/types.go (1)
  • ExporterOptions (44-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: e2e-tests (ubuntu-24.04-arm)
  • GitHub Check: e2e-tests (ubuntu-24.04)
  • GitHub Check: e2e-tests-28d6b1cc3b49ab9ae176918ab9709a2e2522c97e
  • GitHub Check: deploy-kind
  • GitHub Check: tests
  • GitHub Check: lint-go
🔇 Additional comments (11)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)

206-232: Schema mirrors Go types—looks good

The new ExporterOptions model lines up with the Go struct and Helm schema. No issues spotted.

internal/service/controller_service.go (2)

71-83: ServerOptions slice integrates cleanly

Switching to a slice cleanly preserves existing interceptors while letting callers append more options. Looks solid.


703-724: Base options rebuilt each start—works as intended

Collecting interceptors into a slice before appending external options keeps behavior identical while supporting multiple additional options. All good here.

deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json (3)

217-236: ExporterOptions schema matches controller config

The schema addition for exporter options aligns with the Go and Python models, enforcing the null-or-string contract as expected.


626-672: Config schema extension is consistent

Wiring the new exporterOptions property into JumpstarterConfig keeps the schema coherent. Nothing to fix.


678-770: Keepalive schema expanded appropriately

The additional keepalive fields line up with the backend changes and include clear descriptions. No issues spotted.

internal/service/router_service.go (2)

38-43: RouterService ServerOptions change looks good

Field rename to a slice keeps behavior while enabling composition—no concerns.


123-131: Start now supports multiple server options cleanly

Building a local slice before appending injected options mirrors the controller change—implementation looks correct.

Makefile (1)

188-189: Confirm golangci-lint v2.5.0 compatibility
Ensure CI and local contributor environments install golangci-lint v2.5.0 and that running

make golangci-lint && ./bin/golangci-lint version && ./bin/golangci-lint run ./...

yields no new lint failures.

internal/controller/exporter_controller.go (1)

187-234: Requeue scheduling logic looks solid.

  • Uses configurable timeout.
  • Schedules just after expiry with a safety margin.
  • Caps requeue to avoid long sleeps.

Please confirm the desired defaults:

  • safetyMargin = 10s
  • max requeue cap = 5m

If different, I can parameterize them.

internal/config/config.go (1)

20-20: Return type change to []grpc.ServerOption looks good.

Aligns with downstream slice-based ServerOptions.

@mangelajo mangelajo force-pushed the exporter-timeout-grpc branch 3 times, most recently from d69ad62 to 9e0e98a Compare September 26, 2025 14:55
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
cmd/main.go (2)

157-170: Rename local option to serverOptions for clarity.

It holds []grpc.ServerOption. Renaming improves readability and matches field name ServerOptions.

-	authenticator, prefix, router, option, provisioning, exporterOptions, err := config.LoadConfiguration(
+	authenticator, prefix, router, serverOptions, provisioning, exporterOptions, err := config.LoadConfiguration(
   ...
-		ServerOptions: option,
+		ServerOptions: serverOptions,

Also applies to: 218-220


176-181: Nil-safety check (defensive).

exporterOptions currently comes from &config.ExporterOptions and won’t be nil, but if the source changes later this would panic. Consider guarding or passing the pointer instead to keep the contract flexible.

Example:

-		ExporterOptions: *exporterOptions,
+		ExporterOptions: func() config.ExporterOptions {
+			if exporterOptions == nil {
+				return config.ExporterOptions{}
+			}
+			return *exporterOptions
+		}(),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcb6483 and 9e0e98a.

📒 Files selected for processing (8)
  • .github/workflows/e2e.yaml (1 hunks)
  • cmd/main.go (3 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (4 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json (3 hunks)
  • deploy/helm/jumpstarter/values.yaml (2 hunks)
  • internal/config/config.go (3 hunks)
  • internal/config/types.go (2 hunks)
  • internal/controller/exporter_controller.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • deploy/helm/jumpstarter/values.yaml
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py
🧰 Additional context used
🧬 Code graph analysis (4)
internal/config/config.go (2)
internal/config/types.go (5)
  • Router (48-48)
  • Provisioning (19-21)
  • ExporterOptions (44-46)
  • Config (7-12)
  • Grpc (27-29)
internal/config/grpc.go (1)
  • LoadGrpcConfiguration (13-83)
internal/controller/exporter_controller.go (3)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)
  • ExporterOptions (206-212)
internal/config/types.go (1)
  • ExporterOptions (44-46)
internal/oidc/op.go (1)
  • Signer (20-24)
cmd/main.go (2)
internal/config/config.go (1)
  • LoadConfiguration (45-95)
internal/config/types.go (2)
  • ExporterOptions (44-46)
  • Router (48-48)
internal/config/types.go (1)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (4)
  • Authentication (215-222)
  • Provisioning (17-20)
  • Grpc (65-68)
  • ExporterOptions (206-212)
🪛 actionlint (1.7.7)
.github/workflows/e2e.yaml

22-22: invalid job ID "e2e-tests-release-0.7". job ID must start with a letter or _ and contain only alphanumeric characters, -, or _

(id)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint-go
  • GitHub Check: tests
  • GitHub Check: deploy-kind
🔇 Additional comments (5)
internal/config/types.go (1)

7-12: LGTM: New ExporterOptions surface.

Adding ExporterOptions to Config and defining OfflineTimeout looks consistent with downstream usage.

Also applies to: 44-46

internal/controller/exporter_controller.go (2)

186-234: Requeue logic: sensible and capped.

Using LastSeen + offlineTimeout with a safety margin and a 5m cap looks good.


37-58: Data race: cached offlineTimeout written without synchronization. Also guard non-positive durations.

Concurrent reconciles can race on first write. Use sync.Once to init the cache, and treat empty/invalid/non-positive durations as default.

Apply this diff within the method:

-func (r *ExporterReconciler) getOfflineTimeout() time.Duration {
-	// Return cached value if already parsed
-	if r.offlineTimeout != 0 {
-		return r.offlineTimeout
-	}
-
-	// Parse and cache the duration
-	if r.ExporterOptions.OfflineTimeout == "" {
-		r.offlineTimeout = time.Minute // Default fallback
-	} else {
-		duration, err := time.ParseDuration(r.ExporterOptions.OfflineTimeout)
-		if err != nil {
-			// If parsing fails, use default fallback
-			r.offlineTimeout = time.Minute
-		} else {
-			r.offlineTimeout = duration
-		}
-	}
-
-	return r.offlineTimeout
-}
+func (r *ExporterReconciler) getOfflineTimeout() time.Duration {
+	r.offlineOnce.Do(func() {
+		const def = time.Minute
+		if r.ExporterOptions.OfflineTimeout == "" {
+			r.offlineTimeout = def
+			return
+		}
+		if d, err := time.ParseDuration(r.ExporterOptions.OfflineTimeout); err == nil && d > 0 {
+			r.offlineTimeout = d
+		} else {
+			r.offlineTimeout = def
+		}
+	})
+	return r.offlineTimeout
+}

Additions outside this hunk:

  • Import:
import "sync"
  • Struct field:
 type ExporterReconciler struct {
   ...
-  offlineTimeout  time.Duration // Cached parsed duration
+  offlineTimeout  time.Duration // Cached parsed duration
+  offlineOnce     sync.Once
 }
internal/config/config.go (2)

20-43: LGTM: Router server options now a slice.

Returning []grpc.ServerOption from LoadRouterConfiguration aligns with updated server code.


52-95: LGTM: Added ExporterOptions and serverOptions propagation.

Return shape updates and nil placements look correct; errors propagate properly.

@mangelajo mangelajo force-pushed the exporter-timeout-grpc branch from 9e0e98a to c1c96c6 Compare September 29, 2025 14:54
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
.github/workflows/e2e.yaml (2)

1-8: Harden workflow token scope.

Add explicit minimal permissions (e.g., contents: read) at the workflow level.

 name: End-to-end tests
 on:
   workflow_dispatch:
   pull_request:
     branches:
       - main
       - 'release-*'
+permissions:
+  contents: read

24-31: Optional: run the compat job on both architectures.

To mirror the primary job’s coverage, consider adding a strategy matrix (ubuntu-24.04 and ubuntu-24.04-arm) here as well.

-e2e-tests-release-0-7:
-  runs-on: ubuntu-latest
+e2e-tests-release-0-7:
+  strategy:
+    matrix:
+      os: [ubuntu-24.04, ubuntu-24.04-arm]
+  runs-on: ${{ matrix.os }}
deploy/helm/jumpstarter/values.yaml (1)

82-84: Default consistency: chart vs code fallback

Chart defaults offlineTimeout to 180s, but code currently falls back to 1m if unset/invalid. Either align the code fallback to 180s or make the field required in the schema to avoid divergent behavior outside Helm. I’ve proposed a small code tweak in internal/config/types.go to align to 180s.

internal/config/types.go (2)

51-70: Align fallback to chart (180s) and guard negative durations

Switch fallback from 1m to 3m (180s) to match values.yaml, and treat non-positive parsed values as invalid to avoid “immediate offline.”

 func (e *ExporterOptions) PreprocessConfig() {
   // Parse and cache the offline timeout duration
   if e.OfflineTimeout == "" {
-    e.offlineTimeoutDur = time.Minute // Default fallback
+    e.offlineTimeoutDur = 3 * time.Minute // Align with chart default (180s)
   } else {
     duration, err := time.ParseDuration(e.OfflineTimeout)
     if err != nil {
-      // If parsing fails, use default fallback
-      e.offlineTimeoutDur = time.Minute
+      // If parsing fails, use default fallback
+      e.offlineTimeoutDur = 3 * time.Minute
     } else {
-      e.offlineTimeoutDur = duration
+      if duration <= 0 {
+        // Prevent non-positive durations from causing immediate-offline behavior
+        e.offlineTimeoutDur = 3 * time.Minute
+      } else {
+        e.offlineTimeoutDur = duration
+      }
     }
   }
 }

71-74: Defensive getter when preprocessing wasn’t called

Make GetOfflineTimeout resilient (use same defaults/parse if offlineTimeoutDur is zero). This helps tests or alternative initialization paths.

 func (e *ExporterOptions) GetOfflineTimeout() time.Duration {
-  return e.offlineTimeoutDur
+  if e.offlineTimeoutDur != 0 {
+    return e.offlineTimeoutDur
+  }
+  if e.OfflineTimeout == "" {
+    return 3 * time.Minute
+  }
+  if d, err := time.ParseDuration(e.OfflineTimeout); err == nil && d > 0 {
+    return d
+  }
+  return 3 * time.Minute
 }
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (2)

42-63: Add JSON‑Schema pattern for duration strings

To fail fast on typos, add a regex pattern for Go-style durations (ns|us|ms|s|m|h). This propagates into values.schema.json.

-    timeout: Optional[str] = Field(
-        None,
-        description="How long the server waits for a ping response before closing the connection",
-    )
+    timeout: Optional[str] = Field(
+        None,
+        description="How long the server waits for a ping response before closing the connection",
+        json_schema_extra={"pattern": r"^\d+(ns|us|ms|s|m|h)$"},
+    )
@@
-    maxConnectionIdle: Optional[str] = Field(
+    maxConnectionIdle: Optional[str] = Field(
         None,
-        description="Maximum time a connection can be idle before being closed",
+        description="Maximum time a connection can be idle before being closed",
+        json_schema_extra={"pattern": r"^\d+(ns|us|ms|s|m|h)$"},
     )
@@
-    maxConnectionAge: Optional[str] = Field(
+    maxConnectionAge: Optional[str] = Field(
         None,
-        description="Maximum lifetime of a connection before it's closed",
+        description="Maximum lifetime of a connection before it's closed",
+        json_schema_extra={"pattern": r"^\d+(ns|us|ms|s|m|h)$"},
     )
@@
-    maxConnectionAgeGrace: Optional[str] = Field(
+    maxConnectionAgeGrace: Optional[str] = Field(
         None,
-        description="Grace period after max connection age before forcible closure",
+        description="Grace period after max connection age before forcible closure",
+        json_schema_extra={"pattern": r"^\d+(ns|us|ms|s|m|h)$"},
     )
@@
-    time: Optional[str] = Field(
+    time: Optional[str] = Field(
         None,
-        description="How often the server sends keepalive pings to clients",
+        description="How often the server sends keepalive pings to clients",
+        json_schema_extra={"pattern": r"^\d+(ns|us|ms|s|m|h)$"},
     )

206-213: Validate exporter offlineTimeout format

Apply the same duration pattern here to catch invalid values early.

-    offlineTimeout: Optional[str] = Field(
-        None,
-        description="How long to wait before marking the exporter as offline",
-    )
+    offlineTimeout: Optional[str] = Field(
+        None,
+        description="How long to wait before marking the exporter as offline",
+        json_schema_extra={"pattern": r"^\d+(ns|us|ms|s|m|h)$"},
+    )
internal/controller/exporter_controller.go (1)

192-209: Nice dynamic requeue; consider constants for tuning

Requeue with a 10s margin and 5m cap is sensible. Optionally lift these literals to package‑level consts for easier tuning and testability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e0e98a and c1c96c6.

📒 Files selected for processing (8)
  • .github/workflows/e2e.yaml (1 hunks)
  • cmd/main.go (3 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (4 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json (3 hunks)
  • deploy/helm/jumpstarter/values.yaml (2 hunks)
  • internal/config/config.go (3 hunks)
  • internal/config/types.go (2 hunks)
  • internal/controller/exporter_controller.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json
  • cmd/main.go
🧰 Additional context used
🧬 Code graph analysis (4)
internal/config/types.go (1)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)
  • ExporterOptions (206-212)
internal/controller/exporter_controller.go (3)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)
  • ExporterOptions (206-212)
internal/config/types.go (1)
  • ExporterOptions (46-49)
api/v1alpha1/exporter_types.go (1)
  • ExporterConditionTypeOnline (48-48)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)
internal/config/types.go (1)
  • ExporterOptions (46-49)
internal/config/config.go (3)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (4)
  • Router (289-294)
  • Provisioning (17-20)
  • ExporterOptions (206-212)
  • Grpc (65-68)
internal/config/types.go (5)
  • Router (76-76)
  • Provisioning (21-23)
  • ExporterOptions (46-49)
  • Config (9-14)
  • Grpc (29-31)
internal/config/grpc.go (1)
  • LoadGrpcConfiguration (13-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: tests
  • GitHub Check: e2e-tests (ubuntu-24.04-arm)
  • GitHub Check: e2e-tests-release-0-7
  • GitHub Check: e2e-tests (ubuntu-24.04)
  • GitHub Check: deploy-kind
  • GitHub Check: lint-go
🔇 Additional comments (8)
.github/workflows/e2e.yaml (2)

24-24: Job ID fix LGTM; no lingering references to e2e-tests-release-0.7 found.
No updates needed for external badges or docs.


27-30: Verify and pin action SHAs
The tag release-0.7 could not be resolved in jumpstarter-dev/jumpstarter-e2e (404); please confirm the correct tag name and update both steps to use immutable commit SHAs (for example, pin main to 743645c760a96250d7497652ac28e393e91e6a7f).

deploy/helm/jumpstarter/values.yaml (1)

93-99: Keepalive timeout docs vs controller defaults

Comment says “default: 20s” while our controller sets a default when unspecified (defaultGrpcTimeout). Please confirm defaultGrpcTimeout matches 120s (or the upstream default you intend) and update either the code default or this comment to avoid confusion.

internal/config/types.go (1)

34-44: MinTime requirement remains implicit

Parsing of keepalive MinTime is still unconditional downstream. Ensure schemas/values guarantee a non-empty minTime to prevent parse errors (previously discussed).

internal/controller/exporter_controller.go (2)

41-43: Plumbing looks good

ExporterOptions injected into the reconciler is clean and keeps policy separate from logic.


163-190: Safe use of preprocessed timeout; small robustness note

Using GetOfflineTimeout centralizes policy. Ensure tests constructing ExporterReconciler outside LoadConfiguration either call PreprocessConfig or rely on the defensive getter (see suggested change in types.go).

internal/config/config.go (2)

16-22: Switch to []grpc.ServerOption is appropriate

This unblocks composing multiple keepalive/server params. Callers updated accordingly in services.


94-98: Preprocessing placement is correct and all LoadConfiguration callers handle the extended return signature

func (e *ExporterOptions) PreprocessConfig() {
// Parse and cache the offline timeout duration
if e.OfflineTimeout == "" {
e.offlineTimeoutDur = time.Minute // Default fallback
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the chart the default is indeed 3 minutes

Copy link
Member Author

@mangelajo mangelajo Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, just trying to be a bit defensive here, perhaps someone not even using the charts, but I should probably use the same default value, that's true.

} else {
duration, err := time.ParseDuration(e.OfflineTimeout)
if err != nil {
// If parsing fails, use default fallback
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we should probably fail here or log a warning/error

@mangelajo mangelajo force-pushed the exporter-timeout-grpc branch 2 times, most recently from 4d29efd to cbad11b Compare September 30, 2025 20:42
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
cmd/main.go (1)

157-157: Rename option to serverOptions for clarity.

The variable option now holds a slice of gRPC server options ([]grpc.ServerOption), but the singular name suggests a single option. This creates a mismatch between the name and the actual type, reducing code readability.

Apply this diff to rename the variable:

-	authenticator, prefix, router, option, provisioning, exporterOptions, err := config.LoadConfiguration(
+	authenticator, prefix, router, serverOptions, provisioning, exporterOptions, err := config.LoadConfiguration(

Also update line 219:

-		ServerOptions: option,
+		ServerOptions: serverOptions,
internal/config/types.go (1)

55-70: LGTM! Solid preprocessing with defensive defaults.

The PreprocessConfig method correctly parses and caches the offline timeout duration, with a sensible 3-minute default when the configuration is empty. This aligns with the Helm chart defaults and provides efficient access via GetOfflineTimeout.

Minor cleanup: The else clause at line 63 is unnecessary after the return in the error case. You can simplify:

 	} else {
 		duration, err := time.ParseDuration(e.OfflineTimeout)
 		if err != nil {
 			return fmt.Errorf("PreprocessConfig: failed to parse exporter offline timeout: %w", err)
-		} else {
-			e.offlineTimeoutDur = duration
-		}
+		}
+		e.offlineTimeoutDur = duration
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c1c96c6 and cbad11b.

📒 Files selected for processing (9)
  • .github/workflows/e2e.yaml (1 hunks)
  • cmd/main.go (3 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (4 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json (3 hunks)
  • deploy/helm/jumpstarter/values.yaml (2 hunks)
  • internal/config/config.go (3 hunks)
  • internal/config/types.go (2 hunks)
  • internal/controller/exporter_controller.go (4 hunks)
  • internal/controller/lease_controller_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • deploy/helm/jumpstarter/values.yaml
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py
🧰 Additional context used
🧬 Code graph analysis (5)
internal/controller/exporter_controller.go (2)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)
  • ExporterOptions (206-212)
internal/config/types.go (1)
  • ExporterOptions (47-50)
internal/controller/lease_controller_test.go (2)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (1)
  • ExporterOptions (206-212)
internal/config/types.go (1)
  • ExporterOptions (47-50)
cmd/main.go (2)
internal/config/config.go (1)
  • LoadConfiguration (45-100)
internal/config/types.go (2)
  • ExporterOptions (47-50)
  • Router (77-77)
internal/config/config.go (2)
internal/config/types.go (5)
  • Router (77-77)
  • Provisioning (22-24)
  • ExporterOptions (47-50)
  • Config (10-15)
  • Grpc (30-32)
internal/config/grpc.go (1)
  • LoadGrpcConfiguration (13-83)
internal/config/types.go (1)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (3)
  • Provisioning (17-20)
  • Grpc (65-68)
  • ExporterOptions (206-212)
🔇 Additional comments (15)
.github/workflows/e2e.yaml (1)

22-30: LGTM! Good practice for backwards compatibility testing.

The addition of a dedicated job to test the current controller against the previous release (0.7) ensures backwards compatibility. The job ID correctly uses hyphens instead of dots, addressing the previous actionlint issue.

deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json (3)

217-236: LGTM! Schema correctly defines ExporterOptions.

The new ExporterOptions schema definition properly models the offlineTimeout field with appropriate type constraints (string|null) and descriptive metadata.


662-672: LGTM! ExporterOptions integrated into JumpstarterConfig.

The exporterOptions field is correctly added to JumpstarterConfig with proper reference to the ExporterOptions definition.


706-770: LGTM! Keepalive fields expanded appropriately.

The five new keepalive parameters (timeout, maxConnectionIdle, maxConnectionAge, maxConnectionAgeGrace, time) are properly defined with:

  • Correct type constraints (string|null)
  • Sensible defaults (null)
  • Clear descriptions of their purpose

These align with gRPC keepalive best practices and match the implementation in internal/config/grpc.go.

internal/config/config.go (2)

16-43: LGTM! LoadRouterConfiguration correctly updated to return slice.

The function signature properly returns []grpc.ServerOption instead of a single option, aligning with the pattern used throughout this PR for handling multiple gRPC server options.


45-99: LGTM! LoadConfiguration properly extended with ExporterOptions.

Key improvements:

  • Signature correctly returns *ExporterOptions as an additional value
  • All error returns properly updated with nil placeholders in the correct positions
  • PreprocessConfig() is called on ExporterOptions before returning (line 95), ensuring duration parsing happens at load time rather than lazy initialization
  • Consistent error handling throughout

The preprocessing approach addresses the concurrency concerns raised in past reviews by parsing configuration values once during initialization rather than at runtime.

internal/controller/lease_controller_test.go (1)

379-385: LGTM! Test properly configures ExporterOptions.

The test correctly:

  • Initializes ExporterReconciler with ExporterOptions containing a 1-minute offline timeout
  • Calls PreprocessConfig() to parse the duration before reconciliation
  • Validates the error from preprocessing

This mirrors the production initialization pattern and ensures tests exercise the real configuration path.

internal/controller/exporter_controller.go (3)

38-42: LGTM! ExporterOptions field added correctly.

The ExporterOptions field is properly added to the reconciler struct. Based on the discussion in past reviews, the preprocessing approach (parsing the duration during LoadConfiguration via PreprocessConfig()) is the correct solution rather than lazy initialization with sync.Once.


158-190: LGTM! Offline timeout now configurable.

The reconciliation logic correctly:

  • Retrieves the preprocessed offline timeout via GetOfflineTimeout()
  • Uses it consistently for all LastSeen comparisons
  • Updates condition messages to reflect the actual configured timeout

This replaces the previous hardcoded 1-minute timeout with a configurable value.


192-209: LGTM! Dynamic requeue logic is well-designed.

The new requeue calculation is sophisticated and correct:

  • Computes expiration time as LastSeen + offlineTimeout
  • Schedules requeue for expiration + 10-second safety margin
  • Includes a 5-minute cap to avoid excessive wait times
  • Handles the edge case where expiration has already passed

This ensures the controller efficiently re-evaluates exporter status around the time they should transition offline, rather than using a fixed requeue interval.

cmd/main.go (1)

176-184: LGTM! ExporterOptions correctly threaded through.

The ExporterOptions field is properly populated with the dereferenced exporterOptions returned from LoadConfiguration. The integration aligns with the changes in the ExporterReconciler struct.

internal/config/types.go (4)

4-5: LGTM! Required imports for duration handling.

The fmt and time imports are necessary for the new duration parsing and error handling in ExporterOptions.PreprocessConfig.


10-15: LGTM! ExporterOptions integrated into Config.

The ExporterOptions field is properly added to the Config struct with appropriate JSON tags, enabling serialization/deserialization of exporter-specific configuration.


34-45: LGTM! Comprehensive gRPC keepalive and timeout configuration.

The expanded Keepalive struct provides granular control over gRPC connection lifecycle through standard server parameters (Timeout, MaxConnectionIdle, MaxConnectionAge, MaxConnectionAgeGrace, Time). The inline comments clearly document each field's purpose, and the omitempty tags appropriately mark these as optional configuration values.


72-75: LGTM! Clean getter for cached duration.

The GetOfflineTimeout method provides efficient access to the pre-calculated duration. Since PreprocessConfig is called during LoadConfiguration (as shown in the relevant code snippets), the cached value is guaranteed to be initialized before any consumers access it.

@mangelajo mangelajo changed the title Exporter timeout grpc Exporter timeout and grpc knobs Oct 1, 2025
@mangelajo mangelajo force-pushed the exporter-timeout-grpc branch 2 times, most recently from 4d4a555 to c1579ae Compare October 2, 2025 10:03
@mangelajo mangelajo force-pushed the exporter-timeout-grpc branch from c1579ae to cec95ca Compare October 2, 2025 10:05
@mangelajo mangelajo requested review from bennyz and NickCao October 2, 2025 10:06
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
internal/config/grpc.go (1)

16-25: Make EnforcementPolicy optional; default MinTime to 5m when only PermitWithoutStream is set

Parsing MinTime unconditionally fails on empty config, making keepalive effectively mandatory. Build the policy only when configured; if only PermitWithoutStream is true, use grpc-go’s default 5m MinTime.

-	// Parse EnforcementPolicy parameters
-	minTime, err := time.ParseDuration(config.Keepalive.MinTime)
-	if err != nil {
-		return nil, err
-	}
-
-	serverOptions = append(serverOptions, grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
-		MinTime:             minTime,
-		PermitWithoutStream: config.Keepalive.PermitWithoutStream,
-	}))
+	// Build EnforcementPolicy only when MinTime or PermitWithoutStream is configured
+	if config.Keepalive.MinTime != "" || config.Keepalive.PermitWithoutStream {
+		var minTime time.Duration
+		if config.Keepalive.MinTime != "" {
+			var err error
+			minTime, err = time.ParseDuration(config.Keepalive.MinTime)
+			if err != nil {
+				return nil, fmt.Errorf("LoadGrpcConfiguration: invalid Keepalive MinTime: %w", err)
+			}
+		} else {
+			minTime = 5 * time.Minute // grpc-go default
+		}
+		serverOptions = append(serverOptions, grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
+			MinTime:             minTime,
+			PermitWithoutStream: config.Keepalive.PermitWithoutStream,
+		}))
+	}

Based on learnings

🧹 Nitpick comments (9)
.github/workflows/e2e.yaml (1)

27-27: Pin action to a commit SHA for supply‑chain safety.

Tags can be retagged; prefer an immutable SHA.

-      - uses: jumpstarter-dev/[email protected]
+      - uses: jumpstarter-dev/jumpstarter-e2e@<commit-sha-for-release-0.7>
internal/config/types.go (1)

75-77: Defensive fallback in GetOfflineTimeout (optional).

If PreprocessConfig is skipped accidentally, return a sane default.

 func (e *ExporterOptions) GetOfflineTimeout() time.Duration {
-  return e.offlineTimeoutDur
+  if e.offlineTimeoutDur <= 0 {
+    // fallback; PreprocessConfig should normally set this
+    return 3 * time.Minute
+  }
+  return e.offlineTimeoutDur
 }
cmd/main.go (2)

19-26: Add time import for configuration timeout (see below).

 import (
   "context"
   "crypto/tls"
   "encoding/pem"
   "flag"
   "net"
   "os"
+  "time"

157-171: Bound LoadConfiguration with a timeout.

Prevents hanging at startup if the API server is unreachable.

-  configResult, err := config.LoadConfiguration(
-    context.Background(),
+  cfgCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+  defer cancel()
+  configResult, err := config.LoadConfiguration(
+    cfgCtx,
     mgr.GetAPIReader(),
     mgr.GetScheme(),
     client.ObjectKey{
       Namespace: os.Getenv("NAMESPACE"),
       Name:      "jumpstarter-controller",
     },
     oidcSigner,
     string(pem.EncodeToMemory(&pem.Block{
       Type:  "CERTIFICATE",
       Bytes: oidcCert.Certificate[0],
     })),
   )
internal/config/grpc.go (4)

69-75: Avoid shadowing the time package

Local var named time shadows the imported package; linters will flag this and it’s confusing.

-	if config.Keepalive.Time != "" {
-		time, err := time.ParseDuration(config.Keepalive.Time)
+	if config.Keepalive.Time != "" {
+		pingInterval, err := time.ParseDuration(config.Keepalive.Time)
 		if err != nil {
-			return nil, err
+			return nil, fmt.Errorf("LoadGrpcConfiguration: failed to parse Keepalive Time: %w", err)
 		}
-		serverParams.Time = time
+		serverParams.Time = pingInterval
 	}

43-48: Consistently wrap parse errors with context

Only Timeout parse errors are wrapped. Wrap all parse errors to aid debugging.

-		maxIdle, err := time.ParseDuration(config.Keepalive.MaxConnectionIdle)
-		if err != nil {
-			return nil, err
+		maxIdle, err := time.ParseDuration(config.Keepalive.MaxConnectionIdle)
+		if err != nil {
+			return nil, fmt.Errorf("LoadGrpcConfiguration: failed to parse Keepalive MaxConnectionIdle: %w", err)
 		}
...
-		maxAge, err := time.ParseDuration(config.Keepalive.MaxConnectionAge)
-		if err != nil {
-			return nil, err
+		maxAge, err := time.ParseDuration(config.Keepalive.MaxConnectionAge)
+		if err != nil {
+			return nil, fmt.Errorf("LoadGrpcConfiguration: failed to parse Keepalive MaxConnectionAge: %w", err)
 		}
...
-		maxAgeGrace, err := time.ParseDuration(config.Keepalive.MaxConnectionAgeGrace)
-		if err != nil {
-			return nil, err
+		maxAgeGrace, err := time.ParseDuration(config.Keepalive.MaxConnectionAgeGrace)
+		if err != nil {
+			return nil, fmt.Errorf("LoadGrpcConfiguration: failed to parse Keepalive MaxConnectionAgeGrace: %w", err)
 		}
...
-		if err != nil {
-			return nil, err
+		if err != nil {
+			return nil, fmt.Errorf("LoadGrpcConfiguration: failed to parse Keepalive Time: %w", err)
 		}

Also applies to: 52-57, 60-66, 71-72


31-39: Confirm intent: KeepaliveParams is always applied due to default Timeout

Setting a default Timeout makes serverParams non-empty, so KeepaliveParams will always be appended. If that’s intended, drop the comment “Only add … if at least one parameter is set”. If not intended, consider gating the default behind an explicit keepalive config.

Option A (intended always-on): remove the guard/comment.

-	// Only add ServerParameters if at least one parameter is set
-	if serverParams != (keepalive.ServerParameters{}) {
-		serverOptions = append(serverOptions, grpc.KeepaliveParams(serverParams))
-	}
+	serverOptions = append(serverOptions, grpc.KeepaliveParams(serverParams))

Option B (opt-in only): only set default when any field explicitly set; otherwise skip KeepaliveParams. Happy to draft this if you prefer.

Also applies to: 77-80


31-37: Validate non-negative durations

Guard against negative durations (accepted by time.ParseDuration) which lead to surprising behavior at runtime.

Example for Timeout (apply similarly to others as appropriate):

 	if config.Keepalive.Timeout != "" {
 		timeout, err := time.ParseDuration(config.Keepalive.Timeout)
 		if err != nil {
 			return nil, fmt.Errorf("LoadGrpcConfiguration: failed to parse Keepalive Timeout: %w", err)
 		}
+		if timeout < 0 {
+			return nil, fmt.Errorf("LoadGrpcConfiguration: Keepalive Timeout must be >= 0")
+		}
 		serverParams.Timeout = timeout
 	} else {
 		serverParams.Timeout = defaultGrpcTimeout
 	}

If you want stricter validation (e.g., > 0), clarify and I can update all fields accordingly.

Also applies to: 42-48, 51-57, 60-66, 69-75

internal/config/config.go (1)

65-65: Wrap decode errors with context

Add function/field context to errors for easier triage.

-	if err := yaml.Unmarshal([]byte(rawRouter), &router); err != nil {
-		return nil, err
+	if err := yaml.Unmarshal([]byte(rawRouter), &router); err != nil {
+		return nil, fmt.Errorf("LoadConfiguration: failed to unmarshal router: %w", err)
 	}
...
-	if err := yaml.UnmarshalStrict([]byte(rawConfig), &config); err != nil {
-		return nil, err
+	if err := yaml.UnmarshalStrict([]byte(rawConfig), &config); err != nil {
+		return nil, fmt.Errorf("LoadConfiguration: failed to unmarshal config: %w", err)
 	}

Also applies to: 75-75

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbad11b and cec95ca.

📒 Files selected for processing (11)
  • .github/workflows/e2e.yaml (1 hunks)
  • cmd/main.go (3 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (4 hunks)
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json (3 hunks)
  • deploy/helm/jumpstarter/values.yaml (2 hunks)
  • internal/config/config.go (2 hunks)
  • internal/config/grpc.go (1 hunks)
  • internal/config/oidc.go (2 hunks)
  • internal/config/types.go (3 hunks)
  • internal/controller/exporter_controller.go (4 hunks)
  • internal/controller/lease_controller_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • deploy/helm/jumpstarter/values.yaml
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py
  • deploy/helm/jumpstarter/charts/jumpstarter-controller/values.schema.json
  • internal/controller/lease_controller_test.go
🧰 Additional context used
🧬 Code graph analysis (6)
internal/controller/exporter_controller.go (1)
internal/config/types.go (1)
  • ExporterOptions (49-52)
cmd/main.go (5)
internal/config/config.go (1)
  • LoadConfiguration (44-105)
internal/oidc/op.go (1)
  • Signer (20-24)
internal/config/types.go (3)
  • ExporterOptions (49-52)
  • Provisioning (24-26)
  • Router (79-79)
internal/authentication/bearer.go (1)
  • NewBearerTokenAuthenticator (19-21)
internal/authorization/basic.go (1)
  • NewBasicAuthorizer (21-27)
internal/config/types.go (1)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (5)
  • Authentication (215-222)
  • Provisioning (17-20)
  • Grpc (65-68)
  • ExporterOptions (206-212)
  • Router (289-294)
internal/config/config.go (3)
internal/config/types.go (8)
  • ConfigurationResult (93-99)
  • Router (79-79)
  • Config (12-17)
  • Authentication (19-22)
  • Grpc (32-34)
  • ExporterOptions (49-52)
  • AuthenticationConfigResult (87-90)
  • Provisioning (24-26)
internal/config/oidc.go (1)
  • LoadAuthenticationConfiguration (17-55)
internal/config/grpc.go (1)
  • LoadGrpcConfiguration (13-83)
internal/config/grpc.go (2)
internal/config/types.go (2)
  • Grpc (32-34)
  • Keepalive (36-47)
deploy/helm/jumpstarter/charts/jumpstarter-controller/model.py (2)
  • Grpc (65-68)
  • Keepalive (29-62)
internal/config/oidc.go (1)
internal/config/types.go (2)
  • AuthenticationConfigResult (87-90)
  • Internal (28-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: lint-go
  • GitHub Check: tests
  • GitHub Check: deploy-kind
  • GitHub Check: e2e-tests-release-0-7
  • GitHub Check: e2e-tests (ubuntu-24.04)
  • GitHub Check: e2e-tests (ubuntu-24.04-arm)
🔇 Additional comments (5)
internal/config/oidc.go (1)

17-55: Good refactor to structured result and internal JWT wiring.

Clear defaulting of prefix, safe CA handling (file vs inline), and cohesive return type.

internal/controller/exporter_controller.go (1)

163-209: Dynamic offline threshold and requeue logic looks solid.

Uses preprocessed timeout, clear condition messages, and capped requeue with safety margin.

Please confirm tests cover:

  • negative/zero offlineTimeout rejected by PreprocessConfig (after applying the guard),
  • online→offline transition timing near the boundary (within ±safetyMargin).
internal/config/config.go (3)

19-19: API change to []grpc.ServerOption looks good

Returning a slice aligns with grpc.NewServer(...opts...). LGTM.


59-60: Router unmarshal is non-strict — confirm this is intentional

Unmarshal (not Strict) allows unknown keys/typos in router. If you want to catch typos early, switch to UnmarshalStrict; if router is intentionally flexible, leave as-is.

I can provide a quick diff either way.


98-104: Nice consolidation into ConfigurationResult

Cleaner API and easier to evolve. LGTM.

@mangelajo mangelajo force-pushed the exporter-timeout-grpc branch from cec95ca to 31165c8 Compare October 2, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants