Skip to content

Job worker randomly restarts? #213

@keithschacht

Description

@keithschacht

I've had SolidQueue setup and working for awhile on a low traffic app. But I sporadically get a job that takes a long time to complete (I just had one that took 7 minutes). When I check the logs, it looks like the job starts, then the worker gets killed and takes a bit to restart. After it restarts, it picks up the job again and completes it quickly (the usual 10 seconds). I'm struggling to figure out the root cause of this since my UI needs this job to update things so the user is just sitting there.

Notably, the pattern is the sam: My app is warmed up and running. I'm taking a series of user actions on the front-end which each trigger a job to complete. I'm having success many times in a row, which I can tell because the UI quickly updates, and then randomly the UI stops updating. I check the logs and see it's doing this weird restart thing.

My database is postgres. I'm deployed to Render. I'm running Solid Queue's supervisor together with Puma (using plugin :solid_queue).

Here is the log output for a recent failure. I tried to delete lines that were definitely unrelated. Also, any lines beginning with "### " are simple puts messages from within my job:

[SolidQueue] Claimed 1 jobs
I, [2024-04-29T18:16:29.093278 #67]  INFO -- : [ActiveJob] [GetNextAIMessageJob] [33930e21-87b8-4173-9c6e-edc5e39ecf4b] Performing GetNextAIMessageJob (Job ID: 33930e21-87b8-4173-9c6e-edc5e39ecf4b) from SolidQueue(default) enqueued at 2024-04-29T18:16:29.024459609Z with arguments: 3357, 3
### GetNextAIMessageJob.perform(3357, 3, 1)
### Wait for reply
[SolidQueue] Restarting fork[67] (status: )
[SolidQueue] Starting Worker(pid=91, hostname=srv-cn7qcf7sc6pc73cbk1ug-64d5657b7b-4kdb6, metadata={:polling_interval=>0.1, :queues=>"*", :thread_pool_size=>5})
[1] - Worker 0 (PID: 103) booted in 0.0s, phase: 0
I, [2024-04-29T18:18:21.404711 #103]  INFO -- : [6ee21674-2654-445a-a387-2c64c24f786c] Started GET "/cable" for 23.112.39.144 at 2024-04-29 18:18:21 +0000
I, [2024-04-29T18:18:21.413428 #103]  INFO -- : [6ee21674-2654-445a-a387-2c64c24f786c] Started GET "/cable" [WebSocket] for 23.112.39.144 at 2024-04-29 18:18:21 +0000
I, [2024-04-29T18:18:21.413564 #103]  INFO -- : [6ee21674-2654-445a-a387-2c64c24f786c] Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: websocket)
I, [2024-04-29T18:18:21.514497 #103]  INFO -- : Turbo::StreamsChannel is streaming from Z2lkOi8vaG9zdGVkLWdwdC9Db252ZXJzYXRpb24vMTI1
[SolidQueue] Pruning dead process 360 - {"polling_interval"=>0.1, "queues"=>"*", "thread_pool_size"=>5}
[SolidQueue] Claimed 1 jobs
I, [2024-04-29T18:23:10.309082 #91]  INFO -- : [ActiveJob] [GetNextAIMessageJob] [33930e21-87b8-4173-9c6e-edc5e39ecf4b] Performing GetNextAIMessageJob (Job ID: 33930e21-87b8-4173-9c6e-edc5e39ecf4b) from SolidQueue(default) enqueued at 2024-04-29T18:16:29.024459609Z with arguments: 3357, 3
### GetNextAIMessageJob.perform(3357, 3, 1)
### Wait for reply
### Finished GetNextAIMessageJob.perform(3357, 3)
I, [2024-04-29T18:23:24.835017 #91]  INFO -- : [ActiveJob] [GetNextAIMessageJob] [33930e21-87b8-4173-9c6e-edc5e39ecf4b] Performed GetNextAIMessageJob (Job ID: 33930e21-87b8-4173-9c6e-edc5e39ecf4b) from SolidQueue(default) in 14525.39ms
I, [2024-04-29T18:23:25.344451 #91]  INFO -- : [ActiveJob] Enqueued Turbo::Streams::BroadcastStreamJob (Job ID: b0083a97-908d-4568-85ec-1ebceb374adf) to SolidQueue(default) with arguments: "Z2lkOi8vaG9zdGVkLWdwdC9Db252ZXJzYXRpb24vMTI1", {:content=>"<turbo-stream action=\"refresh\"></turbo-stream>"}
[SolidQueue] Claimed 1 jobs
I, [2024-04-29T18:23:25.365729 #91]  INFO -- : [ActiveJob] [Turbo::Streams::BroadcastStreamJob] [b0083a97-908d-4568-85ec-1ebceb374adf] Performing Turbo::Streams::BroadcastStreamJob (Job ID: b0083a97-908d-4568-85ec-1ebceb374adf) from SolidQueue(default) enqueued at 2024-04-29T18:23:25.335394484Z with arguments: "Z2lkOi8vaG9zdGVkLWdwdC9Db252ZXJzYXRpb24vMTI1", {:content=>"<turbo-stream action=\"refresh\"></turbo-stream>"}
I, [2024-04-29T18:23:25.366418 #91]  INFO -- : [ActiveJob] [Turbo::Streams::BroadcastStreamJob] [b0083a97-908d-4568-85ec-1ebceb374adf] Performed Turbo::Streams::BroadcastStreamJob (Job ID: b0083a97-908d-4568-85ec-1ebceb374adf) from SolidQueue(default) in 0.87ms
I, [2024-04-29T18:23:26.311872 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c] Started GET "/conversations/125/messages?version=1" for 108.162.245.166 at 2024-04-29 18:23:26 +0000
I, [2024-04-29T18:23:26.317496 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c] Processing by MessagesController#index as HTML
I, [2024-04-29T18:23:26.317594 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c]   Parameters: {"version"=>"1", "conversation_id"=>"125"}
[1] - Worker 1 (PID: 124) booted in 0.0s, phase: 0
I, [2024-04-29T18:23:28.218480 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c]   Rendered layout layouts/application.html.erb (Duration: 455.2ms | Allocations: 129565)
I, [2024-04-29T18:23:28.219448 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c] Completed 200 OK in 1902ms (Views: 448.0ms | ActiveRecord: 43.2ms | Allocations: 179138)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions