connectionTimeout is actually operationTimeout #63

@cduchesne

I noticed that when I use a short connection timeout (for example, 5 seconds) and then throttle traffic between my CSI plugin (csi-scaleio) and my storage API server, any operation that takes longer than the timeout fails. This is a big problem because the external-provisioner gives every request only connectionTimeout to complete. When a volume creation took longer than 5 seconds, which I tested via throttling, the provisioner sent a new request to the CSI plugin, resulting in duplicate volumes with different names. This happens because the first request had already reached the CSI plugin, which keeps processing it even though the provisioner has abandoned the operation on its side.

context.WithTimeout appears to be the culprit here: it wraps every command sent to the gRPC server, so the gRPC request is severed once connectionTimeout is reached. This might be acceptable if all operations were requested in an idempotent manner.

My suggestion is to apply connectionTimeout only at startup, or whenever the connection to the CSI plugin is first established, and possibly to introduce a new option, operationTimeout, for individual requests.

Some fixes are also required so that the same volume is requested each time, making volume creation idempotent. Every time Provision (in pkg/controller) is called, a new random share name is generated, so calling Provision multiple times results in duplicate volumes with different names. Only one volume will ever be tracked by Kubernetes; the others will be orphaned.
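One way to make the request idempotent is to derive the name from something stable about the claim instead of a random value. The sketch below is only an illustration: `volumeNameFor`, the `pvc` prefix, and the use of a claim UID as the stable input are my assumptions, not the current behavior of Provision.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// volumeNameFor is a hypothetical helper. It derives a deterministic,
// fixed-length name from a stable claim identifier, so that a retried
// request asks the plugin for the same volume rather than a new one.
func volumeNameFor(prefix, claimUID string) string {
	sum := sha256.Sum256([]byte(claimUID))
	// The first 8 bytes (16 hex characters) keep the name short.
	return prefix + "-" + hex.EncodeToString(sum[:8])
}

func main() {
	// Made-up identifier, used purely for illustration.
	uid := "example-claim-uid-1"

	first := volumeNameFor("pvc", uid)
	retry := volumeNameFor("pvc", uid)
	fmt.Println("same name on retry:", first == retry)
}
```

If the plugin treats a repeated create request for an existing name as success, a retry after a timeout would then find the volume from the first attempt instead of creating an orphan.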
