Increase default keepAliveTimeout to 65s #59203
Conversation
Agreed, this should have been raised a long time ago.
What are the http keep-alive timeouts in browsers, smartphones etc? Can you provide references?
I did a quick check and Chrome would default to 300s. @nodejs/tsc can you weigh in if we want to increase it up to 350s?
Fastify uses 72 seconds to avoid any issues with ALB/ELBs and other load balancers that expect 60s (having the same timeout is slightly problematic).
Increasing this exposes users to the possibility of DoS attacks via file descriptor exhaustion - just keeping the sockets open would prevent new connections from arriving.
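(For context, a minimal sketch of an existing knob that bounds this risk regardless of the default - `net.Server#maxConnections` - with an illustrative value, not a recommendation:)

```js
const http = require('node:http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

// Once this many sockets are open, additional connections are rejected,
// which bounds file-descriptor consumption even if idle keep-alive sockets
// linger for a long time.
server.maxConnections = 10_000; // illustrative cap

server.listen(8080);
```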
Codecov Report: all modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #59203      +/-   ##
==========================================
+ Coverage   90.04%   90.06%   +0.02%
==========================================
  Files         648      648
  Lines      191041   191041
  Branches    37448    37455       +7
==========================================
+ Hits       172026   172065      +39
+ Misses      11651    11601      -50
- Partials     7364     7375      +11
```
Hey @mcollina, thanks for your feedback on the keep-alive timeout. I've chosen 60 seconds to align with defaults used by major cloud load balancers like AWS ELB/ALB, ensuring broad compatibility and preventing unexpected connection drops. Increasing the timeout to 72 seconds, as seen in Fastify and NGINX, could slightly boost connection reuse, but it also raises the risk of file descriptor exhaustion under DoS scenarios. Overall, 60 seconds provides a safer balance between interoperability and security for most environments. I'm open to discussing further if there's a strong case for a higher value.
@mcollina, I've resolved this issue and removed the breaking change alert regarding http.Server.keepAliveTimeout. The documentation now reflects the correct status, in line with the release team's guidance. Please review my pull request when you have a chance. Thank you!
I've seen 60 seconds cause problems due to network delays - I've actually seen this happen. Going over 60 seconds is necessary to account for that.
Okay, sure - that makes sense. Is 72 seconds sufficient to account for those network delays, or do you recommend going even higher? Happy to adjust as needed based on your experience.
I literally picked 72 with no reasoning. We could do 65.
Thanks for clarifying. I'm fine with 65 seconds if that works better - happy to update the default to 65 to provide a bit more buffer over 60. Let me know if you have any concerns or if you think another value might be preferable.
@pras529 there are 4 failing tests and linting to be fixed, could you address them?
@mcollina Yes, I'll address the failing tests and linting issues - just need a bit of time to work through them.
I'm very happy to increase this as well, but it's worth noting that the description in the linked issue doesn't quite match the reality of what's going on. The example reproduction in #59193 specifically isn't correct - in fact the response there is delivered just fine AFAICT. The keepAliveTimeout
we're talking about here is how long the server will wait between requests, to keep a connection alive, when there's no response pending. In cases like that repro, it's not relevant: the response will be delivered to the client, and then the connection will close 5 seconds later (11 seconds after the initial connection started, in that example).
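(A minimal sketch of that distinction using the public node:http API; the 6-second handler delay below is an illustrative stand-in for the slow response in that repro:)

```js
const http = require('node:http');

const server = http.createServer((req, res) => {
  // A response slower than keepAliveTimeout is still delivered in full:
  // the keep-alive timer only counts idle time *between* requests, after the
  // previous response has finished, not time spent producing a response.
  setTimeout(() => res.end('done'), 6_000);
});

// With the current 5 s default, the socket closes 5 s after the response
// above finishes (if no further request arrives), i.e. roughly 11 s after
// the connection opened. This PR only widens that idle window to 65 s.
server.listen(8080);
```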
I've done some quick research, there's a wide split in defaults elsewhere:
- Reverse proxies & standalone server tools generally all have long multi-minute keep-alive timeouts, e.g.:
  - Nginx uses 75 seconds
  - Caddy uses 300 seconds
  - Traefik uses 180 seconds
  - AWS ELBs use 60 seconds
  - Google Cloud uses 620 seconds
  - Cloudflare uses 900 seconds
- Meanwhile, most other programming languages and backends tend to use fairly short keep-alive timeouts:
  - Uvicorn defaults to 5 seconds
  - Gunicorn defaults to 2 seconds
  - Puma defaults to 20 seconds
  - Apache defaults to 5 seconds
  - Jetty defaults to 30 seconds
> I did a quick check and Chrome would default to 300s.
I don't think browsers' configuration matters as much as LBs & reverse proxies. Given an RST when reusing an idle connection, they'll retry transparently without exposing the error (I've just tested Chrome & Firefox and confirmed this), and I've seen them apply heuristics and various connection-pooling games to manage keep-alive and idle sockets in the past - it's not a fixed value.
Of course, it's still nice for performance to keep the connection open, but in browser-facing performance-sensitive cases you'll probably want HTTP/2 anyway, which this PR doesn't touch.
The only case where this is going to cause an actual problem (e.g. those 502 errors) is when a) the client doesn't gracefully handle this (i.e. it's not a browser) and b) you hit the race condition where the server closes the socket, but the client sends the next request before it processes the FIN packet.
Unless the Node timeout is greater than the LB timeout (i.e. many minutes), that race condition is always possible, but any increase will exponentially decrease the probability of this race happening (particularly on production sites, where I'd strongly expect most LB->backend connections are rarely idle for more than a few seconds anyway).
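(A sketch of the usual alignment under those assumptions - a load balancer with a 60 s idle timeout in front of Node; the constant names here are made up purely for illustration:)

```js
const http = require('node:http');

// Assumption for illustration: the load balancer in front of this server
// closes idle connections after 60 s (an ALB-style default).
const LB_IDLE_TIMEOUT_MS = 60_000;

const server = http.createServer((req, res) => res.end('ok'));

// Keep the backend's idle timeout above the LB's so the LB closes first and
// the backend never FINs a socket the LB is just about to reuse.
server.keepAliveTimeout = LB_IDLE_TIMEOUT_MS + 5_000; // 65 s, the proposed default

// headersTimeout is commonly kept slightly above keepAliveTimeout so a request
// whose headers arrive just as the idle timer fires is not rejected.
server.headersTimeout = server.keepAliveTimeout + 1_000;

server.listen(8080);
```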
Anyway, that's a long way of saying: given the above I think boosting this to 30+ seconds is a good idea, happy to go with 65 seconds as a balanced option 👍 but I don't think we should worry about going beyond that - we're balancing protection against resource usage vs race condition resistance, and there's diminishing returns.
Thanks for referencing this and working on a fix!
Force-pushed from e59f1f3 to 695d5d0
Code all looks good to me 👍
The first commit message is failing the linting - you'll need to use git to modify that. It's probably best to just squash all your commits and change the resulting commit message to describe the end result of the whole change here, because that message is what will be shown in the CHANGELOG when this is released.
http: Requested changes addressed and resolved
http: Increase default keepAliveTimeout to 65s
http: explicitly set keepAliveTimeout to 65s in tests
http: increase keepAliveTimeout default

Increase the default keepAliveTimeout to 65 seconds in the http module. Tests have been updated to explicitly set the new timeout value. All requested changes have been addressed and resolved.

PR-URL: nodejs#59203
Force-pushed from 695d5d0 to 33c1556
Hello @mcollina, Thank you for your review. I've addressed the changes you requested and updated the commit message to fully comply with our guidelines. I'm still a bit uncertain about the Linux and macOS test results - could you please take a look and let me know if anything else needs adjustment? Thanks for your help!
one more nit, sorry.
I'm on the fence about whether to consider this a breaking change. If so, this change will probably go out in v25. @pimterry wdyt?
I think 65 is a much more reasonable default than 5. Apache is the only one I found that reduced it, and that was with a particular threading model (open idle connections take a thread and are more expensive) at the time.
The nginx 75s keepalive timeout sounds a lot more reasonable to me ( https://nginx.org/en/docs/http/ngx_http_core_module.html )
Especially since reverse proxying Node behind nginx is common so I suspect gateway timeouts are needlessly common now for people using node:http directly.
Increase the default keepAliveTimeout to 65 seconds in the http module. Tests have been updated to explicitly set the new timeout value.

PR-URL: nodejs#59203
Force-pushed from 33c1556 to b33c716
@benjamingr any opinion on semversiness?
I think it could be semver minor. This will change some behaviour (more connection reuse, subtly different default response headers, could very mildly increase resource usage in scenarios with many keep-alive-but-no-requests clients) but that's all quite limited, and nothing that would break any reasonable code I can imagine.
I don't mind going to 75s if people prefer, but I don't think fine-tuning matters much in practice: the important thing for actual failures (as opposed to peak performance) is the relative difference between the race condition window (the client sending a packet before receiving the FIN, roughly one RTT, say 5ms) and the total timeout. Node closing before the LB isn't a big problem, as long as the LB isn't sending a request at exactly the same time. With this change, to hit an error a client needs to send a first request, wait exactly 64.995 seconds, and then send another request just before the FIN comes in.

For any likely traffic distribution, I think the odds go down exponentially as the timeout goes up, so at 65s this is already extremely unlikely to ever appear in practice. Meanwhile resource usage (no threads for us, but still handles plus some memory at least) goes up linearly. For failures specifically I don't think tuning individual seconds matters much here (for maximising performance in some envs fine-tuning definitely does matter, but it's env-specific and I'd expect 60s+ to already deliver pretty good results for typical Node deployments).
Summary
This PR increases the default value of http.Server.keepAliveTimeout from 5000ms (5s) to 65000ms (65s). This change aligns Node.js with the keep-alive timeout expectations of most browsers, proxies, and mobile clients, reducing the risk of subtle long-lived connection failures due to premature socket closure.

Details
The keepAliveTimeout default is now set to 65 seconds (65000ms) instead of the previous 5 seconds (5000ms). All relevant documentation and code comments have been updated to reflect the new default.

Motivation
The previous default of 5 seconds is shorter than the defaults used by common infrastructure, which can lead to unexpected connection terminations or failures. Setting the default to 65 seconds brings Node.js in line with industry standards and improves interoperability.

References
Please review carefully as this changes default server behavior for all applications that do not explicitly set keepAliveTimeout.
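For applications that prefer the previous behaviour, the old value can still be pinned explicitly; a minimal sketch, assuming no other server options need to change:

```js
const http = require('node:http');

const server = http.createServer((req, res) => res.end('ok'));

// Applications that already set keepAliveTimeout are unaffected by the new
// default; anyone relying on the old behaviour can pin it explicitly.
server.keepAliveTimeout = 5_000; // previous default

server.listen(8080);
```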