
Increase default keepAliveTimeout to 65s #59203


Open
wants to merge 2 commits into
base: main

Conversation

pras529

@pras529 pras529 commented Jul 25, 2025

Summary

This PR increases the default value of http.Server.keepAliveTimeout from 5000ms (5s) to 65000ms (65s). This change aligns Node.js with the keep-alive timeout expectations of most browsers, proxies, and mobile clients, reducing the risk of subtle long-lived connection failures due to premature socket closure.

Details

  • Default Change:
    The keepAliveTimeout default is now set to 65 seconds (65000ms) instead of the previous 5 seconds (5000ms).
  • Documentation & Comments:
    All relevant documentation and code comments have been updated to reflect the new default.

Motivation

The previous default of 5 seconds is shorter than the defaults used by common infrastructure, which can lead to unexpected connection terminations or failures. Setting the default to 65 seconds brings Node.js in line with industry standards and improves interoperability.

References


Please review carefully as this changes default server behavior for all applications that do not explicitly set keepAliveTimeout.

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/http
  • @nodejs/net

@nodejs-github-bot nodejs-github-bot added http Issues or PRs related to the http subsystem. needs-ci PRs that need a full CI run. labels Jul 25, 2025
@pras529
Author

pras529 commented Jul 25, 2025

@jasnell @targos @lpinca Please review my pull request when you have a chance. Thank you!

Member

@mcollina mcollina left a comment


Agreed, this should have been raised a long time ago.

What are the http keep-alive timeouts in browsers, smartphones etc? Can you provide references?

I did a quick check and Chrome would default to 300s. @nodejs/tsc can you weigh in if you want to increase it up to 350s?

Fastify uses 72 seconds to avoid any issues with ALB/ELBs and other load balancers that expect 60s (having the same timeout is slightly problematic).

Increasing this exposes users to the possibility of DoS attacks by exhaustion of file descriptors: just keeping the sockets open would prevent new connections from arriving.

@mcollina mcollina added the semver-major PRs that contain breaking changes and should be released in the next major version. label Jul 25, 2025
@pras529 pras529 marked this pull request as draft July 25, 2025 09:07

codecov bot commented Jul 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.06%. Comparing base (7215d9b) to head (e59f1f3).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #59203      +/-   ##
==========================================
+ Coverage   90.04%   90.06%   +0.02%     
==========================================
  Files         648      648              
  Lines      191041   191041              
  Branches    37448    37455       +7     
==========================================
+ Hits       172026   172065      +39     
+ Misses      11651    11601      -50     
- Partials     7364     7375      +11     
Files with missing lines Coverage Δ
lib/_http_server.js 97.23% <100.00%> (ø)

... and 32 files with indirect coverage changes


@pras529 pras529 marked this pull request as ready for review July 25, 2025 09:42
@pras529
Author

pras529 commented Jul 25, 2025

Hey @mcollina ,

Thanks for your feedback on the keep-alive timeout.

I’ve chosen 60 seconds to align with defaults used by major cloud load balancers like AWS ELB/ALB, ensuring broad compatibility and preventing unexpected connection drops. Increasing the timeout to 72 seconds, as seen in Fastify and NGINX, could slightly boost connection reuse, but also raises the risk of file descriptor exhaustion under DoS scenarios.

Overall, 60 seconds provides a safer balance between interoperability and security for most environments. I’m open to discussing further if there’s a strong case for a higher value.

@pras529
Author

pras529 commented Jul 25, 2025

@mcollina , I've resolved this issue and removed the breaking change alert regarding http.Server.keepAliveTimeout. The documentation now reflects the correct status, in line with the release team's guidance. Please review my pull request when you have a chance. Thank you!

@pras529 pras529 changed the title Increase default keepAliveTimeout to 60s and warn on legacy 5s usage Increase default keepAliveTimeout to 60s Jul 25, 2025
@pras529 pras529 requested a review from mcollina July 25, 2025 09:57
@mcollina
Member

mcollina commented Jul 25, 2025

I've seen 60 seconds cause problems due to network delays; I've actually seen this happen. Going over 60 seconds is necessary to account for that.

@pras529
Author

pras529 commented Jul 25, 2025

Okay, sure—that makes sense. Is 72 seconds sufficient to account for those network delays, or do you recommend going even higher? Happy to adjust as needed based on your experience.

@mcollina
Member

I literally picked 72 with no reasoning. We could do 65.

@pras529
Author

pras529 commented Jul 25, 2025

Thanks for clarifying. I’m fine with 65 seconds if that works better—happy to update the default to 65 to provide a bit more buffer over 60. Let me know if you have any concerns or if you think another value might be preferable.

@pras529 pras529 changed the title Increase default keepAliveTimeout to 60s Increase default keepAliveTimeout to 65s Jul 25, 2025
@mcollina
Member

@pras529 there are 4 failing tests and linting to be fixed, could you address them?

@pras529
Author

pras529 commented Jul 25, 2025

@mcollina Yes, I’ll address the failing tests and linting issues—just need a bit of time to work through them.

Member

@pimterry pimterry left a comment


I'm very happy to increase this as well, but it's worth noting that the description in the linked issue doesn't quite match the reality of what's going on. The example reproduction in #59193 specifically isn't correct - in fact the response there is delivered just fine AFAICT. The keepAliveTimeout we're talking about here is how long the server will wait between requests, to keep a connection alive, when there's no response pending. In cases like that repro, it's not relevant: the response will be delivered to the client, and then the connection will close 5 seconds later (11 seconds after the initial connection started, in that example).

I've done some quick research, there's a wide split in defaults elsewhere:

I did a quick check and Chrome would default to 300s.

I don't think browsers' configuration matters as much as LBs & reverse proxies. Given an RST when reusing an idle connection, they'll retry transparently without exposing the error (I've just tested Chrome & Firefox and confirmed), and I've seen them apply heuristics and various connection-pooling games to manage keep-alive and idle sockets in the past; it's not a fixed value.

Of course, it's still nice for performance to keep the connection open, but in browser-facing performance-sensitive cases you'll probably want HTTP/2 anyway, which this PR doesn't touch.

The only case where this is going to cause an actual problem (e.g. those 502 errors) is when a) the client doesn't gracefully handle this (i.e. it's not a browser) and b) you hit the race condition where the server closes the socket, but the client sends the next request before it processes the FIN packet.

Unless the Node timeout is greater than the LB timeout (i.e. many minutes) then that race condition is always possible, but any increase will exponentially decrease the probability of this race happening (particularly on production sites, where I'd strongly expect most LB->backend connections are rarely idle for more than a few seconds anyway).

Anyway, that's a long way of saying: given the above I think boosting this to 30+ seconds is a good idea, happy to go with 65 seconds as a balanced option 👍 but I don't think we should worry about going beyond that - we're balancing protection against resource usage vs race condition resistance, and there's diminishing returns.

@yujiang

yujiang commented Jul 25, 2025

Thanks for referencing this and working on a fix!
Looks like a great step forward, and I appreciate you taking the time to address it 🙏

@pras529 pras529 force-pushed the feat/keepalive-timeout-60s branch from e59f1f3 to 695d5d0 Compare July 25, 2025 12:56
@pras529 pras529 requested a review from pimterry July 25, 2025 13:02
Member

@pimterry pimterry left a comment


Code all looks good to me 👍

The first commit message is failing the linting - you'll need to use git to modify that. It's probably best to just squash all your commits and change the resulting commit message to describe the end result of the whole change here, because that message is what will be shown in the CHANGELOG when this is released.
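The squash-and-reword flow described here can be sketched as follows, shown in a throwaway repository for safety (assumes git >= 2.28 for `init -b`; on the real PR branch you would squash against upstream `main` and finish with `git push --force-with-lease`):

```shell
set -e
# Throwaway repo standing in for the contributor's fork:
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "base"

# Branch with several WIP commits, as on the PR:
git checkout -q -b feat/keepalive-timeout-60s
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "wip: bump timeout"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "fixup tests"

# Squash everything since main into one commit; its message is what will
# appear in the CHANGELOG when the PR lands:
git reset --soft main
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "http: increase default keepAliveTimeout to 65s"
git log --oneline main..HEAD   # exactly one commit remains on the branch
```

`git rebase -i main` with `squash`/`fixup` achieves the same result interactively.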

pras529 added a commit to pras529/node that referenced this pull request Jul 25, 2025
http: Requested changes addressed and resolved

http: Increase default keepAliveTimeout to 65s

http: explicitly set keepAliveTimeout to 65s in tests

http: increase keepAliveTimeout default

Increase the default keepAliveTimeout to 65 seconds in the http module. Tests have been
updated to explicitly set the new timeout value. All requested changes
have been addressed and resolved.

PR-URL: nodejs#59203
@pras529 pras529 force-pushed the feat/keepalive-timeout-60s branch from 695d5d0 to 33c1556 Compare July 25, 2025 17:19
@pras529
Author

pras529 commented Jul 26, 2025

Hello @mcollina,

Thank you for your review. I’ve addressed the changes you requested and updated the commit message to fully comply with our guidelines. I’m still a bit uncertain about the Linux and macOS test results—could you please take a look and let me know if anything else needs adjustment?

Thanks for your help!

Member

@mcollina mcollina left a comment


one more nit, sorry.

@mcollina
Member

I'm on the fence about whether to consider this a breaking change. If so, it will probably go out in v25, but I think it's pretty harmless to ship in v24 as well.

@pimterry wdyt?

Member

@benjamingr benjamingr left a comment


I think 65 is a much more reasonable default than 5. Apache is the only one I found that reduced it, and that was with a particular threading model (open idle connections each take a thread, so they are more expensive) at the time.

The nginx 75s keepalive timeout sounds a lot more reasonable to me ( https://nginx.org/en/docs/http/ngx_http_core_module.html )

Especially since reverse proxying Node behind nginx is common so I suspect gateway timeouts are needlessly common now for people using node:http directly.
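To illustrate the pairing benjamingr describes: nginx's client-facing `keepalive_timeout` defaults to 75s, and its *upstream* keep-alive connections have a separate `keepalive_timeout` (default 60s, since nginx 1.15.3). Keeping Node's timeout above the proxy's means the proxy, which can retry safely, always closes the idle socket first. A hypothetical fragment (port and pool size are illustrative):

```nginx
# Hypothetical nginx -> Node pairing; port and pool size are illustrative.
upstream node_backend {
    server 127.0.0.1:3000;
    keepalive 32;            # idle upstream connections kept in the pool
    keepalive_timeout 60s;   # nginx abandons an idle upstream socket after 60s
}
```

With Node at 65s, nginx gives up the idle socket before Node's FIN can ever race a reused request.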

Increase the default keepAliveTimeout to 65 seconds in the http module. Tests have been
updated to explicitly set the new timeout value.

PR-URL: nodejs#59203
@pras529 pras529 force-pushed the feat/keepalive-timeout-60s branch from 33c1556 to b33c716 Compare July 26, 2025 14:23
@pras529 pras529 requested a review from mcollina July 26, 2025 14:32
@mcollina
Member

@benjamingr any opinion on semversiness?

@pimterry
Member

I think it could be semver minor. This will change some behaviour (more connection reuse, subtly different default response headers, could very mildly increase resource usage in scenarios with many keep-alive-but-no-requests clients) but that's all quite limited, and nothing that would break any reasonable code I can imagine.

The nginx 75s keepalive timeout sounds a lot more reasonable to me

I don't mind going to 75s if people prefer, but I don't think finetuning matters much in practice: the important thing for actual failures (as opposed to peak performance) is the relative difference between the race condition time (client sending a packet before receiving the FIN = roughly RTT, say 5ms) and the total timeout. Node closing before the LB isn't a big problem, as long as the LB isn't sending a request at exactly the same time.

With this change, to hit an error a client needs to send a first request, wait exactly 64.995 seconds, and then send another request just before the FIN comes in. For any likely traffic distribution, I think odds go down exponentially as the timeout goes up, so by 65s this is already extremely unlikely to ever appear in practice. Meanwhile resource usage (no threads for us, but still handles + some memory at least) goes up linearly.

For failures specifically I don't think tuning individual seconds matters much here (for maximising performance in some envs fine tuning definitely does matter, but it's env specific and I'd expect 60s+ should already deliver pretty good results for typical node deployments).
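The exponential-odds argument above can be made concrete with a back-of-envelope model (the model and all numbers here are illustrative assumptions, not measurements from the PR): treat idle gaps between requests on a kept-alive connection as exponentially distributed, and note the race is only lost when a gap lands inside the narrow [timeout, timeout + RTT] window.

```javascript
// Toy model of the FIN race: idle gaps ~ Exponential(1 / meanGapMs).
// The client collides with the server's FIN only if its next request
// arrives inside the [timeout, timeout + RTT] window.
function finRaceProbability(timeoutMs, rttMs, meanGapMs) {
  const rate = 1 / meanGapMs;
  // P(gap in [T, T + rtt]) = e^(-T/mean) - e^(-(T + rtt)/mean)
  return Math.exp(-rate * timeoutMs) - Math.exp(-rate * (timeoutMs + rttMs));
}

// Assume 5s mean idle gaps and a 5ms RTT:
const pOld = finRaceProbability(5000, 5, 5000);  // old 5s default, ~3.7e-4
const pNew = finRaceProbability(65000, 5, 5000); // 65s default, ~2.3e-9
console.log(pOld, pNew, pOld / pNew); // ratio is exactly e^12, ~160,000x
```

Under this toy model the failure odds fall by a factor of e^((T2 - T1)/mean) as the timeout grows, matching the intuition above, while held resources grow only linearly with the timeout.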

Labels
http Issues or PRs related to the http subsystem. needs-ci PRs that need a full CI run. semver-major PRs that contain breaking changes and should be released in the next major version.
6 participants