Long-polling `/poll` hangs ~5 min on IP change, causes metrics gap #205


**What’s happening**
We run a PushProx client on a host behind a NAT whose public IP changes about twice a day. When the IP changes, the long-polling `/poll` request to the proxy stays open for almost 5 minutes before it finally errors out. During that time Prometheus receives no metrics, which leaves a 5 min gap and triggers the “down” alerts.


**Root cause**
The client’s transport sets a TCP keepalive of 30 s, but by default Linux sends **9** probes (each 30 s apart) before declaring the socket dead. That’s 30 s + (8 × 30 s) = 4 m 30 s from the moment the connection breaks until the socket is closed.
[client/main.go](https://github.com/prometheus-community/PushProx/blob/master/cmd/client/main.go)
```go
transport := &http.Transport{
    Proxy: http.ProxyFromEnvironment,
    DialContext: (&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
        DualStack: true,
    }).DialContext,
    MaxIdleConns:          100,
    IdleConnTimeout:       90 * time.Second,
    TLSHandshakeTimeout:   10 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
    TLSClientConfig:       tlsConfig,
}
```


