24 changes: 17 additions & 7 deletions src/libraries/System.Net.NameResolution/src/System/Net/Dns.cs
@@ -641,9 +641,7 @@ private static bool LogFailure(object hostNameOrAddress, long? startingTimestamp
/// Since this is doing synchronous work on a thread pool thread, we want to limit how many threads end up being
/// blocked. We could employ a semaphore to limit overall usage, but a common case is that DNS requests are made
/// for only a handful of endpoints, and a reasonable compromise is to ensure that requests for a given host are
/// serialized. Once the data for that host is cached locally by the OS, the subsequent requests should all complete
/// very quickly, and if the head-of-line request is taking a long time due to the connection to the server, we won't
/// block lots of threads all getting data for that one host. We also still want to issue the request to the OS, rather
Comment on lines -644 to -646
@antonfirsov (Contributor Author), Oct 1, 2023:
> Once the data for that host is cached locally by the OS, the subsequent requests should all complete very quickly, and if the head-of-line request is taking a long time due to the connection to the server, we won't block lots of threads all getting data for that one host

This turned out to be a false assumption. On Ubuntu there is no caching by default, and each getaddrinfo call sends an A & AAAA DNS request and takes 5-50 ms. If the packet is lost, it waits 5 seconds before retrying.

Member:
> This turned out to be a false assumption. In Ubuntu there is no caching by default, and each getaddrinfo call sends an A & AAAA DNS request and takes 5-50ms.

On what are you basing this? On Ubuntu for:

[Benchmark]
public object GetHostEntry() => Dns.GetHostEntry("www.microsoft.com");

I get:

| Method       | Mean     |
|--------------|----------|
| GetHostEntry | 498.9 us |

Contributor Author:
[Benchmark]
public void GetHostAddresses() => Dns.GetHostAddresses("fb.lt.main.ml.mdn.skype.net");

averages 24.84 ms in my local Ubuntu 22.04 environment and 9.051 ms in the Azure D32 VM from #92054.

@MihaZupan (Member), Oct 1, 2023:
> fb.lt.main.ml.mdn.skype.net

That name resolves through a CNAME of fb.lt.main.ml.mdn.trafficmanager.net, which advertises a CNAME for fileserver-backend-lb.lt-westus-01.ic3-middle-lane-main-lt.westus-test.cosmic-int.office.net with a 0 second TTL. So my guess is the OS is just honoring the TTLs in this case.

The www.microsoft.com example above, on the other hand, uses non-zero TTLs.

(That's in no way saying that this isn't a real scenario; DNS-based connection load balancing works this way.)
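
To see the effect described above from managed code, one option is to time repeated lookups of both kinds of names. This is only an illustrative sketch (not part of the PR); the zero-TTL host name is a placeholder, and the numbers depend entirely on the local resolver setup.

```csharp
using System;
using System.Diagnostics;
using System.Net;

class DnsTimingProbe
{
    static void Main()
    {
        // A name with long-lived records: repeat calls should be fast once a local cache holds it.
        Measure("www.microsoft.com");

        // Placeholder for a name whose records carry TTL=0 (e.g. a DNS-load-balanced endpoint):
        // if nothing caches it, every call goes back to the nameserver and stays slow.
        Measure("zero-ttl.example.invalid");
    }

    static void Measure(string host)
    {
        for (int i = 1; i <= 3; i++)
        {
            var sw = Stopwatch.StartNew();
            try
            {
                IPAddress[] addresses = Dns.GetHostAddresses(host);
                Console.WriteLine($"{host}: attempt {i} took {sw.ElapsedMilliseconds} ms ({addresses.Length} addresses)");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"{host}: attempt {i} failed after {sw.ElapsedMilliseconds} ms ({ex.GetType().Name})");
            }
        }
    }
}
```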

@stephentoub (Member), Oct 1, 2023:
You can also check whether DNS caching in your system is enabled, e.g.

systemctl is-active systemd-resolved

My point is simply that stating that it's a false assumption that OSes cache DNS information is itself a false assumption. Every major OS I'm aware of is capable of doing DNS caching. Whether it's enabled is a different question, though to my knowledge it is by default in Ubuntu 22.04. And we shouldn't just be killing the associated comment. If you want to augment the comment to say that caching might be disabled, that's fine.

Contributor Author:
I was wrong thinking that answers are never cached on Ubuntu by default. However, to me the comment suggests that we can generally rely on short completion times because of the caching, which is also not true; e.g., as Miha pointed out, answers with TTL=0 are not cached, which then increases the probability of hitting #81023.

I will add the comment back & augment it.

/// serialized. We also still want to issue the request to the OS, rather
/// than having all concurrent requests for the same host share the exact same task, so that any shuffling of the results
/// by the OS to enable round robin is still perceived.
/// </remarks>
@@ -666,7 +664,18 @@ private static Task<TResult> RunAsync<TResult>(Func<object, long, TResult> func,
Debug.Assert(!Monitor.IsEntered(s_tasks));
try
{
return func(key, startingTimestamp);
using (cancellationToken.UnsafeRegister(static s =>
{
// In case a request is canceled, start a new queue for this key.
// This helps avoid congestion when a getaddrinfo call runs for too long.
Member:
What's the rationale for why a cancellation request means we no longer want to serialize?

Contributor Author:
If configured well, a cancellation would likely kick in for a long-running or stalled request. Closing the existing queue and starting a new one makes sure that requests following the cancellation are not blocked by the long-running request. This helped in my experiments, and validations are in progress in customer environments.
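
To make the pattern under discussion concrete, here is a simplified standalone sketch (illustrative only; names and shapes differ from the actual Dns internals): work items for the same key are serialized by chaining onto the previous task for that key, and a caller's cancellation drops the key so that later requests start a fresh chain instead of queueing behind a possibly stalled call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class SerializedRunner
{
    // Latest task per key; new work for a key is chained onto this task so that
    // requests for the same key run one at a time.
    private static readonly Dictionary<string, Task> s_tasks = new();

    public static Task<T> RunAsync<T>(string key, Func<T> work, CancellationToken cancellationToken)
    {
        lock (s_tasks)
        {
            Task prev = s_tasks.TryGetValue(key, out Task? existing) ? existing! : Task.CompletedTask;

            Task<T> task = prev.ContinueWith(_ =>
            {
                // If this request is canceled, drop the key from the dictionary so that requests
                // arriving afterwards start a fresh chain instead of waiting behind the
                // (possibly stalled) call currently in flight.
                using (cancellationToken.UnsafeRegister(static s =>
                {
                    var (dict, k) = ((Dictionary<string, Task>, string))s!;
                    lock (dict) { dict.Remove(k); }
                }, (s_tasks, key)))
                {
                    return work();
                }
            }, cancellationToken, TaskContinuationOptions.DenyChildAttach, TaskScheduler.Default);

            // The real code also removes the entry once the task completes; omitted here for brevity.
            s_tasks[key] = task;
            return task;
        }
    }
}
```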

@stephentoub (Member), Oct 2, 2023:
One of the original motivations for doing this queueing was a DNS request that could get stalled, and thus all requests for that DNS endpoint would end up blocking all threads in the pool. Doesn't this effectively put us back into the same place?

@antonfirsov (Contributor Author), Oct 2, 2023:
From what I see, #48566 doesn't talk about stalled (up to several seconds) requests but about slow (tens of milliseconds?) requests using up the ThreadPool threads. This change does not remove the serialization of those. The queue is only reset when a cancellation kicks in, which is typically triggered by a ConnectTimeout, usually in the multi-second range. If the customer mentioned in #34633 had stalled requests, they would likely have come back to us reporting #81023 much earlier.

cc @davidfowl if you can recall how long the slow requests took in #34633. Without that I would assume it's a case similar to what we discussed in #92862 (comment), and "slow" means that the requests are in the 5-50 ms range.

Member:
Right, so if every request for a given destination ends up taking a longer period of time and ends up being canceled as a result, every cancellation is going to cause subsequent requests to run concurrently with ones already there, and if it happens enough, won't it similarly spread out over all of the pool threads?

> when a cancellation kicks in which is typically triggered by a ConnectTimeout

That's one source. The Dns APIs that support cancellation are public.

@antonfirsov (Contributor Author), Oct 2, 2023:
> if every request for a given destination ends up taking a longer period of time and ends up being canceled as a result, every cancellation is going to cause subsequent requests to run concurrently with ones already there, and if it happens enough, won't it similarly spread out over all of the pool threads?

Yes, but how likely is this scenario, and is it more probable than hitting #81023? To me it looks like having all requests to the same host stalled for a long enough period to starve the ThreadPool is a catastrophic event which would have a serious impact on the application's functionality even without the pool starvation, and the customer has to go and fix the environmental setup anyway. Occasional stalled requests are more common IMO, and the problem is that they currently kill a possibly long chain of follow-up requests.

> That's one source. The Dns APIs that support cancellation are public.

IMO this doesn't really change things. Regularly occurring cases of cancellation are most likely timeouts, and cancelling a significant percentage of DNS lookups by setting an overly short timeout doesn't look like a sane scenario to me.

Member:
> Occasional stalled requests are more common IMO

What is the cause of the stalled requests you're seeing that makes this the case?

Contributor Author:
In #92054 we see the OS resolver (successfully) retrying a query after the default timeout of 5 seconds, and getaddrinfo is blocking for this 5-second period. The retries are sporadic and do not come in bursts. They are caused by packet loss, in my understanding, which shouldn't be very uncommon.

I don't have such confident information from the other cases, because this typically happens in production systems in a sporadic manner. Currently I'm working with @matt-augustine, checking if a private build of the changes in this PR helps with their symptoms. Would that bring enough confidence for you that the change is net positive?

Reply:
We see the issue in a production service with a containerized application running in a Kubernetes cluster on AKS. It usually occurs in new pods shortly after they are created. The cluster is using the default Kubernetes DNS configuration, which as I understand it means that DNS lookups are not cached in the container/pod, so every DNS query is routed to the CoreDNS service as the nameserver, which might be running on a different node.

Our theory is that UDP packet loss might be responsible for failed/retried DNS queries when new pods are created, but we have not confirmed that. We are currently testing a private build to see if it helps, and I'm going to try to make a simpler way to test it in a more repeatable way.

lock (s_tasks)
{
s_tasks.Remove(s!);
}
}, key))
{
return func(key, startingTimestamp);
}
}
finally
{
@@ -682,16 +691,17 @@ private static Task<TResult> RunAsync<TResult>(Func<object, long, TResult> func,
}, key, cancellationToken, TaskContinuationOptions.DenyChildAttach, TaskScheduler.Default);

// If it's possible the task may end up getting canceled, it won't have a chance to remove itself from
// the dictionary if it is canceled, so use a separate continuation to do so.
// the dictionary and report AfterResolution() if it is canceled, so use a separate continuation to do so.
if (cancellationToken.CanBeCanceled)
{
task.ContinueWith((task, key) =>
task.ContinueWith(delegate
Member:
Previously this wasn't allocating a new delegate and a new closure on every call. Now it is, as it's closing over the key, the task, and the startingTimestamp. It'll even result in that allocation even if !CanBeCanceled, because the state it's closing over is at the top-level method scope.

Contributor Author:
This is a fix for #92045. I don't see a way to capture startingTimestamp without an allocation. We can pass the info in a tuple to avoid the allocation in the !CanBeCanceled case. Or we can create a private class & cache instances to reduce the allocations in general.

Member:
At a minimum you can pass the required state in via the object state parameter. That'll require a tuple allocation/boxing, but the one allocation for that is still better than the two allocations for the closure and delegate. And it'll make it specific to the CanBeCanceled case, whereas with how it's currently written the closure allocation will happen first thing in the method body.
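
For illustration, a hedged sketch of what that suggestion could look like at this call site (assuming startingTimestamp is the long taken by the Func<object, long, TResult> in the hunk above; adjust if the local is nullable): the lambda becomes static, receives the antecedent task as its first argument, and gets key and startingTimestamp through the boxed state tuple, so the single allocation happens only on the CanBeCanceled path.

```csharp
if (cancellationToken.CanBeCanceled)
{
    task.ContinueWith(static (t, s) =>
    {
        // Unpack the state tuple instead of closing over locals; 'long' is an assumption here.
        (object key, long startingTimestamp) = ((object, long))s!;
        NameResolutionTelemetry.Log.AfterResolution(key, startingTimestamp, false);
        lock (s_tasks)
        {
            ((ICollection<KeyValuePair<object, Task>>)s_tasks).Remove(new KeyValuePair<object, Task>(key, t));
        }
    }, (key, startingTimestamp), CancellationToken.None,
        TaskContinuationOptions.OnlyOnCanceled | TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
}
```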

{
NameResolutionTelemetry.Log.AfterResolution(key, startingTimestamp, false);
lock (s_tasks)
{
((ICollection<KeyValuePair<object, Task>>)s_tasks).Remove(new KeyValuePair<object, Task>(key!, task));
}
}, key, CancellationToken.None, TaskContinuationOptions.OnlyOnCanceled | TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
}, CancellationToken.None, TaskContinuationOptions.OnlyOnCanceled | TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
}

// Finally, store the task into the dictionary as the current task for this key.
@@ -97,7 +97,7 @@ public ValueTask ConnectAsync(EndPoint remoteEP, CancellationToken cancellationT

saea.RemoteEndPoint = remoteEP;

ValueTask connectTask = saea.ConnectAsync(this);
ValueTask connectTask = saea.ConnectAsync(this, saeaCancelable: cancellationToken.CanBeCanceled);
if (connectTask.IsCompleted || !cancellationToken.CanBeCanceled)
{
// Avoid async invocation overhead
@@ -1202,11 +1202,11 @@ public ValueTask<int> SendToAsync(Socket socket, CancellationToken cancellationT
ValueTask.FromException<int>(CreateException(error));
}

public ValueTask ConnectAsync(Socket socket)
public ValueTask ConnectAsync(Socket socket, bool saeaCancelable)
{
try
{
if (socket.ConnectAsync(this, userSocket: true, saeaCancelable: false))
if (socket.ConnectAsync(this, userSocket: true, saeaCancelable: saeaCancelable))
{
return new ValueTask(this, _mrvtsc.Version);
}
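
For context, the saeaCancelable flag is an internal detail of how the caller's token gets wired into the connect operation; the public API shape is unchanged. A minimal usage sketch of the affected path (host, port, and timeout are placeholders):

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

class ConnectWithTimeout
{
    static async Task Main()
    {
        using var socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
        try
        {
            // Passes a cancelable token into the ValueTask-based connect path shown in the diff.
            await socket.ConnectAsync(new DnsEndPoint("example.com", 443), cts.Token);
            Console.WriteLine("Connected.");
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Connect attempt canceled (timed out).");
        }
    }
}
```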