`docs/learn/documentation/versioned/jobs/samza-configurations.md` (1 addition, 0 deletions)
@@ -295,6 +295,7 @@ Samza supports both standalone and clustered ([YARN](yarn-jobs.html)) [deploymen
|Name|Default|Description|
|--- |--- |--- |
|cluster-manager.container.retry.count|8|If a container fails, it is automatically restarted by Samza. However, if a container keeps failing shortly after startup, that indicates a deeper problem, so we should kill the job rather than retrying indefinitely. This property determines the maximum number of times we are willing to restart a failed container in quick succession (the time period is configured with `cluster-manager.container.retry.window.ms`). Each container in the job is counted separately. If this property is set to 0, any failed container immediately causes the whole job to fail. If it is set to a negative number, there is no limit on the number of retries.|
|cluster-manager.container.retry.window.ms|300000|This property determines how frequently a container is allowed to fail before we give up and fail the job. If the same container has failed more than `cluster-manager.container.retry.count` times and the time between failures was less than this property's value (in milliseconds), then we fail the job. There is no limit to the number of times we will restart a container if the time between failures is greater than `cluster-manager.container.retry.window.ms`.|
|cluster-manager.container.preferred-host.last.retry.delay.ms|360000|The delay added to the last retry of a failing container, once all but one of the `cluster-manager.container.retry.count` retries have been used. The delay is only added when `job.host-affinity.enabled` is true and the retried request is for a preferred host. This addresses the scenario where a preferred host has become invalid but has not yet been marked as such, so the container repeatedly restarts and fails on it in quick succession; the delay keeps `cluster-manager.container.retry.count` from being exhausted before the host is marked invalid. See the example configuration after this table.|
|cluster-manager.jobcoordinator.jmx.enabled|true|This property is deprecated in favor of `job.jmx.enabled`.|
|cluster-manager.allocator.sleep.ms|3600|The container allocator thread is responsible for matching requests to allocated containers. The sleep interval for this thread is configured using this property.|
|cluster-manager.container.request.timeout.ms|5000|The allocator thread periodically checks the state of the container requests and allocated containers to determine the assignment of a container to an allocated resource. This property determines the number of milliseconds before a container request is considered to have expired (timed out). When a request expires, it may be matched to any available resource returned by the cluster manager. See the timing sketch after this table.|
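
As a rough illustration of how the retry settings interact, the snippet below restates the defaults from the table in a job's properties file. The values and the pairing with `job.host-affinity.enabled` are a sketch for illustration, not part of this change:

```properties
# Kill the job if the same container fails more than 8 times
# with less than 5 minutes (300000 ms) between failures.
cluster-manager.container.retry.count=8
cluster-manager.container.retry.window.ms=300000

# With host affinity on, delay the final retry by 6 minutes so an
# invalid preferred host has time to be marked as such before the
# last attempt is made on it.
job.host-affinity.enabled=true
cluster-manager.container.preferred-host.last.retry.delay.ms=360000
```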
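
Similarly, a minimal sketch of the allocator timing knobs, again using the defaults listed above:

```properties
# The allocator thread wakes every 3600 ms to match outstanding
# container requests to resources allocated by the cluster manager.
cluster-manager.allocator.sleep.ms=3600

# A request not satisfied within 5000 ms expires and may be matched
# to any available resource rather than its preferred host.
cluster-manager.container.request.timeout.ms=5000
```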