Open
Labels
core: Issues that should be addressed in Ray Core
Description
Background
We just refactored PutInLocalPlasmaCallback to return Status and to propagate failures through TaskManager instead of crashing. During shutdown, an IOError from a plasma put is tolerated and logged; otherwise, we surface a system failure to the task owner. See #55367.
We should apply this pattern consistently across other Raylet IPC and plasma callsites.
Goal:
Ensure all Raylet IPC and Plasma interactions:
- Return/propagate Status to callers instead of hard-crashing (no RAY_CHECK_OK on external IPC paths).
- Document expected Status codes per operation (e.g., OK/ObjectExists, ObjectStoreFull, IOError, Disconnected).
- Tolerate shutdown-specific IOErrors where appropriate (warn and continue), but treat non-shutdown failures as real errors handled at the task/owner level.
- Avoid status-message string matching; use status codes.
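A minimal sketch of how a caller might apply these rules to the result of a plasma put. Everything here is illustrative: `Status` and `StatusCode` are small stand-ins for Ray's real status type, and `PutOutcome` and `ClassifyPutStatus` are hypothetical names. Treating `ObjectExists` as success is also an assumption drawn from the "OK/ObjectExists" grouping above.

```cpp
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for ray::Status; the real class has more codes and helpers.
enum class StatusCode { OK, ObjectExists, ObjectStoreFull, IOError, Disconnected };

class Status {
 public:
  Status() = default;
  Status(StatusCode code, std::string msg) : code_(code), msg_(std::move(msg)) {}

  static Status OK() { return Status(); }
  static Status IOError(std::string msg) {
    return Status(StatusCode::IOError, std::move(msg));
  }
  static Status ObjectStoreFull(std::string msg) {
    return Status(StatusCode::ObjectStoreFull, std::move(msg));
  }

  bool ok() const { return code_ == StatusCode::OK; }
  StatusCode code() const { return code_; }
  const std::string &message() const { return msg_; }

 private:
  StatusCode code_ = StatusCode::OK;
  std::string msg_;
};

// Hypothetical: what the caller should do with the result of a plasma put.
enum class PutOutcome { kStored, kIgnoredDuringShutdown, kFailTask };

// Classify by status code only; the message text is logged but never inspected.
PutOutcome ClassifyPutStatus(const Status &status, bool shutting_down) {
  switch (status.code()) {
    case StatusCode::OK:
    case StatusCode::ObjectExists:  // Assumption: an existing object counts as stored.
      return PutOutcome::kStored;
    case StatusCode::IOError:
      if (shutting_down) {
        // Expected while the store is going away: warn and continue.
        std::cerr << "warning: plasma put failed during shutdown: "
                  << status.message() << "\n";
        return PutOutcome::kIgnoredDuringShutdown;
      }
      return PutOutcome::kFailTask;
    default:
      // Any other failure is a real error to surface to the task owner.
      return PutOutcome::kFailTask;
  }
}
```

With this shape, the same `IOError` is ignored when `shutting_down` is true and becomes a task-level failure otherwise, and no branch depends on the wording of the message.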
Scope (primary targets to audit):
- Core worker plasma paths in CoreWorker and CoreWorkerPlasmaStoreProvider:
  - PutInLocalPlasmaStore
  - Put
  - CreateExisting
  - SealExisting
  - Release
  - GetIfLocal
  - Contains
  - Delete/DeleteImpl
  - GetPlasmaUsage
 
- Raylet IPC paths relevant to object lifecycle:
  - PinObjectIDs RPC callback handling
  - UpdateObjectLocation/pubsub paths
 
- Higher-level consumers:
  - TaskManager return-object handling (done in current PR)
  - ObjectRecoveryManager
  - FutureResolver, generator streaming
  - HandleReportGeneratorItemReturns (evaluate best-effort vs strict propagation)
 
- Any remaining RAY_CHECK_OK immediately after Raylet/Plasma calls that can return expected runtime errors.
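As a rough illustration of the last item, the sketch below shows a callsite rewritten to return the failure instead of aborting. It is not taken from the Ray codebase: `Status`, `FakeStore`, `SealAndRelease`, and the `RETURN_IF_NOT_OK` macro are all hypothetical names used only to show the shape of the change.

```cpp
#include <string>

// Hypothetical minimal Status, standing in for Ray's real status type.
enum class StatusCode { OK, IOError };

struct Status {
  StatusCode code = StatusCode::OK;
  std::string message;
  bool ok() const { return code == StatusCode::OK; }
};

// Hypothetical early-return helper: hand a failed Status back to the caller.
#define RETURN_IF_NOT_OK(expr)             \
  do {                                     \
    Status status_ = (expr);               \
    if (!status_.ok()) return status_;     \
  } while (0)

// Hypothetical store whose calls can fail at runtime, e.g. when the
// connection to the object store has been closed.
struct FakeStore {
  bool connected = true;
  int sealed = 0;

  Status Seal() {
    if (!connected) return Status{StatusCode::IOError, "store connection closed"};
    ++sealed;
    return Status{};
  }

  Status Release() {
    if (!connected) return Status{StatusCode::IOError, "store connection closed"};
    return Status{};
  }
};

// Before (sketch): wrapping store.Seal() in a fatal check would abort the
// whole process on an IOError.
// After: each failure is returned, so the caller decides how to handle it.
Status SealAndRelease(FakeStore &store) {
  RETURN_IF_NOT_OK(store.Seal());
  RETURN_IF_NOT_OK(store.Release());
  return Status{};
}
```

The caller of `SealAndRelease` can then choose to tolerate the error during shutdown or fail the task, rather than having that decision made by a crash at the callsite.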