Skip to content

Conversation

@github-actions
Copy link
Contributor

@github-actions github-actions bot commented May 10, 2024

Backport of #101684 to release/8.0-staging

/cc @wfurt @rzikm

Customer Impact

Reported by customer via official support. Small repro available.
Customers on Linux sees increased memory usage when establishing parallel connections to the same host (note that parallel requests on HTTP/1.1 will always use parallel connections). Measured overhead can be easily 100M+, which is problem for containers in k8s clusters limited to 300M memory.

It also helped one customer in general "memory problems with .NET 6 -> 8 upgrade" issue - see comment.

Workaround is lowering TLS cache size via:

  • System.Net.Security.TlsCacheSize AppCtx switch, or
  • DOTNET_SYSTEM_NET_SECURITY_TLSCACHESIZE environment variable

Technical details:

The mechanism of the (bounded) memory leak is as follows:

  • On new TLS connection, we receive TLS session ticket so that future connections can use TLS Resume (for faster handshake) (this is new feature in .NET 7).
  • There are 2 TLS session caches: internal one in OpenSSL, and managed one in .NET. These caches are supposed to be in sync via callbacks (e.g. OpenSSL may remove an entry internally and notifies managed code via callback that a ticket has been invalidated).
  • When many connections are opened in a burst, we receive many TLS session tickets in quick succession, but the implementation keeps only the last one.
  • Sessions discarded from managed (.NET) TLS session cache were still being kept in the internal OpenSSL cache, until the ticket expired or the internal cache reached a predefined maximum size.

The fix is to keep the two caches in sync and remove the dropped TLS session tickets from the internal cache as well.

Regression

Yes, the bug is part of TLS Session resumption feature on Linux, introduced in .NET 7. For customers migrating from .NET 6 it manifests as E2E scenario regression.

Testing

Tested on customer provided minimal repro.
Customer was not willing to verify privates in production.
Note: Customer confirmed that the workaround helps them in production, which means we have high confidence, this fix is the real root cause of their production problems and will help them.

Risk

Low, the issue is well understood and the change is localized to the feature. Functional tests verified TLS resumption works.

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/ncl, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

@rzikm rzikm requested review from karelz and wfurt May 13, 2024 08:09
@rzikm rzikm changed the title [release/8.0-staging] Removed unused sessions from SSL_CTX internal cache [release/8.0] Removed unused sessions from SSL_CTX internal cache May 13, 2024
Copy link
Member

@wfurt wfurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@karelz karelz added os-linux Linux OS (any supported distro) Servicing-consider Issue for next servicing release review labels Jun 3, 2024
@karelz
Copy link
Member

karelz commented Jun 3, 2024

Memory usage impact on production systems. E2E scenario regression from .NET 6. We should service it.

@rzikm
Copy link
Member

rzikm commented Jun 3, 2024

Approved via email on 06/03.

@rzikm rzikm added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Jun 3, 2024
@rzikm rzikm merged commit bfaf24e into release/8.0-staging Jun 4, 2024
@jkotas jkotas deleted the backport/pr-101684-to-release/8.0-staging branch June 7, 2024 03:26
@karelz
Copy link
Member

karelz commented Jun 24, 2024

Will be part of July release 8.0.7.

@karelz karelz modified the milestones: 8.0.x, 8.0.7 Jun 25, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Net.Security os-linux Linux OS (any supported distro) Servicing-approved Approved for servicing release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants