From 48bb6aa772fdda03505f0c555c017d9e815693d5 Mon Sep 17 00:00:00 2001
From: Joe Betz
Date: Fri, 26 Apr 2019 15:36:55 -0700
Subject: [PATCH 1/3] sig-api-machinery: Add scale targets to CRDs to GA KEP

---
 keps/sig-api-machinery/20180415-crds-to-ga.md | 78 ++++++++++++++++---
 1 file changed, 67 insertions(+), 11 deletions(-)

diff --git a/keps/sig-api-machinery/20180415-crds-to-ga.md b/keps/sig-api-machinery/20180415-crds-to-ga.md
index 3ea5f825054..26e518b6a79 100644
--- a/keps/sig-api-machinery/20180415-crds-to-ga.md
+++ b/keps/sig-api-machinery/20180415-crds-to-ga.md
@@ -25,6 +25,7 @@ see-also:
   - "[Umbrella Issue](https://github.com/kubernetes/kubernetes/issues/58682)"
   - "[Vanilla OpenAPI Subset Design](https://docs.google.com/document/d/1pcGlbmw-2Y0JJs9hsYnSBXamgG9TfWtHY6eh80zSTd8)"
   - "[Pruning for CustomResources KEP](https://github.com/kubernetes/enhancements/pull/709)"
+  - "[Defaulting for Custom Resources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190426-crd-defaulting.md)"
 ---
 
 # Title
@@ -102,14 +103,14 @@ See [Post-GA tasks](#post-ga-tasks) for decided out-of-scope features.
 ### Defaulting and pruning for custom resources is implemented
 
 Both defaulting and pruning and also read-only validation are blocked by the
-OpenAPI subset definition (next point). An update of the [old Pruning for
-CustomResources KEP](https://github.com/kubernetes/enhancements/pull/709) and the implementation
-([pruning PR](https://github.com/kubernetes/kubernetes/pull/64558), [defaulting
-PR](https://github.com/kubernetes/kubernetes/pull/63604)), are follow-ups as soon as unblocked.
+OpenAPI subset definition (next point).
+
+See the [Pruning for CustomResources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20180731-crd-pruning.md)
+and the [Defaulting for Custom Resources KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190426-crd-defaulting.md).
 
 ### CRD v1 schemas are restricted to a subset of the OpenAPI specification
 
-See [Vanilla OpenAPI Subset Design](https://docs.google.com/document/d/1pcGlbmw-2Y0JJs9hsYnSBXamgG9TfWtHY6eh80zSTd8)
+See [OpenAPI Subset KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190425-structural-openapi.md)
 
 ### Generator exists for CRD Validation Schema v3 (Kubebuilder)
 
@@ -121,8 +122,8 @@ to be integrated into kubebuidler’s controller-tools.
 
 ### CustomResourceWebhookConversion API is GA ready
 
-Currently CRD webhook conversion is alpha. We plan to take this to v1beta1 via the
-"Graduation Criteria" proposed in [PR #1004](https://github.com/kubernetes/enhancements/pull/1004).
+Currently CRD webhook conversion is alpha. We plan to take this to v1beta1 according to the
+[CustomResourceDefinition Conversion Webhook's Graduation Criteria](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190425-crd-conversion-webhook.md#graduation-criteria).
 We plan to then graduate this to GA as part of the CRD to GA graduation.
 
 ### CustomResourceSubresources API is GA ready
@@ -162,10 +163,65 @@ TODO: complete this list
 
 ### Scale Targets for GA
 
-* TODO quantify: Read/write latency of CRDs within X% of native Kubernetes types
-* TODO quantify: Latency degrades less than X% for up to 100k Custom Resources per CRD kind
-* TODO quantify: Webhook conversion QPS of a noop converter is within X% of QPS with no webhook
-* Coordinate with sig-scalability
+The scale target for GA of custom resources are defined by the same [API call latency
+SLIs/SLOs as the Kuberetes native types](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md#api-call-latency-slisslos-details).
+
+The targets are defined by the below thresholds, which are organized the same way as the [Kubernetes native type thresholds](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md#kubernetes-thresholds), with only a couple changes:
+
+- Since custom resources can be arbitrarily large, we have added a object size column and thresholds for a range of object sizes.
+- For clarity, we include thersolds with and without conversion webhooks.
+
+**Custom Resources per Definition, with Conversion webhook:**
+
+| Object size | Threshold scope=namespace | Threshold: scope=cluster |
+| --- | --- | --- |
+| 10kb | 2500 | 12500 |
+| 25kb | 1000 | 5000 |
+| 50kb | 500 | 2500 |
+
+**Custom Resources per Definition, without Conversion webhook:**
+
+| Object size | Threshold scope=namespace | Threshold: scope=cluster |
+| --- | --- | --- |
+| 10kb | 5000 | 25000 |
+| 25kb | 2000 | 10000 |
+| 50kb | 1000 | 5000 |
+
+_Note: For `scope: Namespaced` custom resource definitions, the scope=namespace
+threshold indicates how many custom resource objects may be in each namespace,
+and the scope=cluster threshold indicates how many custom resource objects may
+be in the cluster total. For `scope: Cluster` custom resource definitions, only
+the scope=cluster threshold applies._
+
+**Custom Resource Definitions:**
+
+| Threshold scope=namespace | Threshold: scope=cluster |
+| --- | --- |
+| n/a | 500 |
+
+_Note: The Custom Resource Definition threshold was selected not due to the
+above SLI/SLOs, but instead due to the latency OpenAPI publishing, which is a
+background process that occurs asychroniously each time a Custom Resource
+Definition is updated. For 500 Custom Resource Definitions it takes slightly
+over 35 seconds for a a definition change to be visible via the OpenAPI schema
+endpoint._
+
+_Note: Given that the performance and scalability of conversion webhooks are the
+responsibility of their author, CRD scale targets are defined for conversion
+webhook latency that includes serialization/deserialization cost, but does not
+include the webhook conversion operation cost (i.e. the cost of the custom
+authored conversion routines for a particular conversion webhook is not
+accounted for)._
+
+GA custom resource scale targets were selected based on an [analysis of our current scale limits](https://docs.google.com/document/d/1tEstPQvzGvaRnN-WwGUWx1H9xHPRCy_fFcGlgTkB3f8).
+
+We ran a month long survey of Custom Resource Definition scale needs across Kubernetes mailing lists, slack channels and social media.
+Of the custom resource definitions surveyed, 96% are currently within these scale thresholds, 91% are within these thresholds for their anticipated future growth, and survey data provides useful guidance for our post-GA scalability work. See [survey of real-world custom resource usage](https://docs.google.com/document/d/1MTd_gDlpgBaT5sAKM4j6tQVeCFIT9J44RHzt2yWOK_g) for details.
+
+As part of GA the threshold and SLO documentation will be updated to make this clear and to
+encourage CRD authors to provide concrete thresholds/SLOs for their custom
+resource kinds to their users that account for the per resource conversion cost
+of their conversion webhook and/or size of their custom resources.
 
 ## Graduation Criteria
 
From 5d95aee81efb7541db630438fc25149ae030721b Mon Sep 17 00:00:00 2001
From: Joe Betz
Date: Mon, 22 Jul 2019 13:13:48 -0700
Subject: [PATCH 2/3] Clarify scale limits

---
 keps/sig-api-machinery/20180415-crds-to-ga.md | 96 +++++++++++--------
 1 file changed, 55 insertions(+), 41 deletions(-)

diff --git a/keps/sig-api-machinery/20180415-crds-to-ga.md b/keps/sig-api-machinery/20180415-crds-to-ga.md
index 26e518b6a79..9533e8471ea 100644
--- a/keps/sig-api-machinery/20180415-crds-to-ga.md
+++ b/keps/sig-api-machinery/20180415-crds-to-ga.md
@@ -96,7 +96,7 @@ Bug fixes required to graduate CRDs to GA:
 
 * See “Required for GA” issues tracked via the [CRD Project Board](https://github.com/orgs/kubernetes/projects/28).
 
-For additional details on already completed features, see the [Umbrella Issue](https://github.com/kubernetes/kubernetes/issues/58682).
+For additional details on already completed features, see the [CRD Project Board](https://github.com/orgs/kubernetes/projects/28).
 
 See [Post-GA tasks](#post-ga-tasks) for decided out-of-scope features.
 
@@ -163,63 +163,77 @@ TODO: complete this list
 
 ### Scale Targets for GA
 
-The scale target for GA of custom resources are defined by the same [API call latency
+The scale targets for GA of custom resources are defined by the same [API call latency
 SLIs/SLOs as the Kuberetes native types](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md#api-call-latency-slisslos-details).
 
-The targets are defined by the below thresholds, which are organized the same way as the [Kubernetes native type thresholds](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md#kubernetes-thresholds), with only a couple changes:
+The targets are defined by the below suggested maximum limits, which are organized the same way as the [Kubernetes native type thresholds](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md#kubernetes-thresholds), with only one change:
 
-- Since custom resources can be arbitrarily large, we have added a object size column and thresholds for a range of object sizes.
-- For clarity, we include thersolds with and without conversion webhooks.
+- Since custom resources can be arbitrarily large, we have broken down the limit by custom resource object size.
 
-**Custom Resources per Definition, with Conversion webhook:**
+**Custom Resources per Definition:**
 
-| Object size | Threshold scope=namespace | Threshold: scope=cluster |
+| Object size | Suggested Maximum Limit: scope=namespace (5s p99 SLO) | Suggested Maximum Limit: scope=cluster (30s p99 SLO) |
 | --- | --- | --- |
-| 10kb | 2500 | 12500 |
-| 25kb | 1000 | 5000 |
-| 50kb | 500 | 2500 |
+| 10kb | 1500 | 10000 |
+| 25kb | 600 | 4000 |
+| 50kb | 300 | 2000 |
+
+Since, in practice, custom resources scale farther without conversion webhooks
+within the SLI/SLOs (roughly 2x according to our scale tests), custom resource
+definition authors should be careful to adhere to these limits so that in the
+future a webhook converter may safely be added as part of a custom resource
+version upgrade.
+
+_Note: For custom resources of custom resource definitions using `scope: Namespaced`: the scope=namespace
+suggested maximum limit indicates how many custom resource objects may be in each namespace,
+and the scope=cluster suggested maximum limit indicates how many custom resource objects may
+be in the cluster total. For custom resources of custom resource definitions using `scope: Cluster`: only
+the scope=cluster suggested maximum limit applies._
 
-**Custom Resources per Definition, without Conversion webhook:**
+**Custom Resource Definitions:**
 
-| Object size | Threshold scope=namespace | Threshold: scope=cluster |
-| --- | --- | --- |
-| 10kb | 5000 | 25000 |
-| 25kb | 2000 | 10000 |
-| 50kb | 1000 | 5000 |
+| Suggested Maximum Limit: scope=cluster |
+| --- |
+| 500 |
 
-_Note: For `scope: Namespaced` custom resource definitions, the scope=namespace
-threshold indicates how many custom resource objects may be in each namespace,
-and the scope=cluster threshold indicates how many custom resource objects may
-be in the cluster total. For `scope: Cluster` custom resource definitions, only
-the scope=cluster threshold applies._
+_Note: The Custom Resource Definition suggested maximum limit was selected not
+due to the above SLI/SLOs, but instead due to the latency OpenAPI publishing,
+which is a background process that occurs asychroniously each time a Custom
+Resource Definition schema is updated. For 500 Custom Resource Definitions it takes
+slightly over 35 seconds for a definition change to be visible via the OpenAPI
+spec endpoint._
 
-**Custom Resource Definitions:**
+**Conversion Webhooks:**
+
+Conversion Webhook SLOs are defined from the perspective of the conversion
+webhook. It does not include any api-server serialization/deserialization for
+making the request to the webhook, but it does include network latency.
 
-| Threshold scope=namespace | Threshold: scope=cluster |
+Given that the performance and scalability of conversion webhooks are the
+responsibility of their author, Custom resource scale targets are applied only for
+conversion webhooks that are within the follow latencies for the above suggested
+maximum limits.
+
+| scope | Expected conversion Webhook SLO: p99 latency |
 | --- | --- |
-| n/a | 500 |
-
-_Note: The Custom Resource Definition threshold was selected not due to the
-above SLI/SLOs, but instead due to the latency OpenAPI publishing, which is a
-background process that occurs asychroniously each time a Custom Resource
-Definition is updated. For 500 Custom Resource Definitions it takes slightly
-over 35 seconds for a a definition change to be visible via the OpenAPI schema
-endpoint._
-
-_Note: Given that the performance and scalability of conversion webhooks are the
-responsibility of their author, CRD scale targets are defined for conversion
-webhook latency that includes serialization/deserialization cost, but does not
-include the webhook conversion operation cost (i.e. the cost of the custom
-authored conversion routines for a particular conversion webhook is not
-accounted for)._
+| resource | 50ms |
+| namespace | 1 seconds |
+| cluster | 6 seconds |
+
+The above object size and suggested maximum limits in the Custom Resources per
+Definition table applies to these conversion webhook SLOs. For example, for a
+list request for 1500 custom resource objects that are 10k in size, the resource
+scope SLO of 1 second for the conversion webhook applies.
+
+**Scale Target Data**
 
 GA custom resource scale targets were selected based on an [analysis of our current scale limits](https://docs.google.com/document/d/1tEstPQvzGvaRnN-WwGUWx1H9xHPRCy_fFcGlgTkB3f8).
 
 We ran a month long survey of Custom Resource Definition scale needs across Kubernetes mailing lists, slack channels and social media.
-Of the custom resource definitions surveyed, 96% are currently within these scale thresholds, 91% are within these thresholds for their anticipated future growth, and survey data provides useful guidance for our post-GA scalability work. See [survey of real-world custom resource usage](https://docs.google.com/document/d/1MTd_gDlpgBaT5sAKM4j6tQVeCFIT9J44RHzt2yWOK_g) for details.
+Of the custom resource definitions surveyed, 96% are currently within these suggested maximum limits, 91% are within these limits for their anticipated future growth, and survey data provides useful guidance for our post-GA scalability work. See [survey of real-world custom resource usage](https://docs.google.com/document/d/1MTd_gDlpgBaT5sAKM4j6tQVeCFIT9J44RHzt2yWOK_g) for details.
 
-As part of GA the threshold and SLO documentation will be updated to make this clear and to
-encourage CRD authors to provide concrete thresholds/SLOs for their custom
+As part of GA the suggested maximum limits and SLO documentation will be updated to make this clear and to
+encourage CRD authors to provide concrete suggested maximum limits and SLIs/SLOs for their custom
 resource kinds to their users that account for the per resource conversion cost
 of their conversion webhook and/or size of their custom resources.
 
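The per-definition limits and conversion webhook SLO tables that patch 2 adds can be checked mechanically for a given CRD. The Python sketch below is illustrative only and is not part of the KEP. The names `SizeRow`, `suggested_limits`, `within_limits` and `per_object_webhook_budget_ms` are hypothetical. The sketch makes three assumptions:

* Each object-size row is an inclusive upper bound. Patch 3 later makes this reading explicit with `<=10kb`, `(10kb - 25kb]` and `(25kb - 50kb]`.
* No limit is defined above 50kb.
* A scope's p99 webhook budget can be spread evenly across objects. This is a simplification for estimating per-object conversion cost.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SizeRow:
    """One row of the "Custom Resources per Definition" table (hypothetical type)."""
    max_object_size_kb: int  # assumed inclusive upper bound of the size bucket
    namespace_limit: int     # suggested maximum per namespace (5s p99 SLO)
    cluster_limit: int       # suggested maximum per cluster (30s p99 SLO)


# Values from the "Custom Resources per Definition" table in patch 2.
PER_DEFINITION_LIMITS = [
    SizeRow(10, 1500, 10000),
    SizeRow(25, 600, 4000),
    SizeRow(50, 300, 2000),
]

# "Expected conversion Webhook SLO: p99 latency" by request scope, in milliseconds.
CONVERSION_WEBHOOK_P99_MS = {"resource": 50, "namespace": 1000, "cluster": 6000}


def suggested_limits(object_size_kb: float) -> Optional[SizeRow]:
    """Return the first row whose size bound covers the object, or None above 50kb."""
    for row in PER_DEFINITION_LIMITS:
        if object_size_kb <= row.max_object_size_kb:
            return row
    return None  # the tables define no limit for larger objects


def within_limits(object_size_kb: float, per_namespace_counts: Iterable[int],
                  cluster_count: int) -> bool:
    """Check one CRD's object counts; pass no namespace counts for `scope: Cluster` CRDs."""
    row = suggested_limits(object_size_kb)
    if row is None:
        return False
    return (all(n <= row.namespace_limit for n in per_namespace_counts)
            and cluster_count <= row.cluster_limit)


def per_object_webhook_budget_ms(scope: str, object_count: int) -> float:
    """Average per-object conversion time that keeps one request within its scope's p99 SLO."""
    return CONVERSION_WEBHOOK_P99_MS[scope] / object_count


if __name__ == "__main__":
    print(suggested_limits(10))                           # the 10kb row
    print(within_limits(10, [1500, 1200], 2700))          # True
    print(per_object_webhook_budget_ms("cluster", 4000))  # 1.5
```

Here is an example. A namespace-scoped list of 1500 objects of 10kb each sits exactly at the namespace limit. Under the 1 second namespace-scope SLO, that leaves a conversion webhook roughly 0.67ms per object on average.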
From 762a1186160b464997f54e0552bfa53dac4275de Mon Sep 17 00:00:00 2001
From: Joe Betz
Date: Mon, 29 Jul 2019 14:17:09 -0700
Subject: [PATCH 3/3] Add cluster total CR limit, explain CR per CRD limits in more detail

---
 keps/sig-api-machinery/20180415-crds-to-ga.md | 74 +++++++++++++------
 1 file changed, 52 insertions(+), 22 deletions(-)

diff --git a/keps/sig-api-machinery/20180415-crds-to-ga.md b/keps/sig-api-machinery/20180415-crds-to-ga.md
index 9533e8471ea..cc123fbc7a4 100644
--- a/keps/sig-api-machinery/20180415-crds-to-ga.md
+++ b/keps/sig-api-machinery/20180415-crds-to-ga.md
@@ -170,13 +170,53 @@ The targets are defined by the below suggested maximum limits, which are organiz
 
 - Since custom resources can be arbitrarily large, we have broken down the limit by custom resource object size.
 
+**Custom Resource Definitions:**
+
+| Suggested Maximum Limit: scope=cluster |
+| --- |
+| 500 |
+
+_Note: The Custom Resource Definition suggested maximum limit was selected not
+due to the above SLI/SLOs, but instead due to the latency of OpenAPI publishing,
+which is a background process that occurs asynchronously each time a Custom
+Resource Definition schema is updated. For 500 Custom Resource Definitions it takes
+slightly over 35 seconds for a definition change to be visible via the OpenAPI
+spec endpoint._
+
+**Custom Resources, Cluster Wide:**
+
+Cluster wide limits for custom resources are storage bound and custom resources
+share the storage space with all other objects. While determining the
+appropriate storage limit for a cluster is out-of-scope for this document, once
+an etcd storage limit is selected, suggested maximum limits for custom resources
+are:
+
+| etcd storage limit | Suggested Maximum Limit: scope=cluster |
+| --- | --- |
+| 4GB | 40000 |
+| 8GB | 80000 |
+
+These limits aim to keep custom resource storage usage to less than half of the
+total cluster storage capacity for custom resources of 50kb or less in size.
+
 **Custom Resources per Definition:**
 
+For each custom resource definition, the limit on the number of custom resources
+can be found by taking the (median) object size of the custom resource and finding
+the matching row in this table:
+
 | Object size | Suggested Maximum Limit: scope=namespace (5s p99 SLO) | Suggested Maximum Limit: scope=cluster (30s p99 SLO) |
 | --- | --- | --- |
-| 10kb | 1500 | 10000 |
-| 25kb | 600 | 4000 |
-| 50kb | 300 | 2000 |
+| <=10kb | 1500 | 10000 |
+| (10kb - 25kb] | 600 | 4000 |
+| (25kb - 50kb] | 300 | 2000 |
+
+The cluster scope indicates the total number of custom resources for that
+definition allowed in the entire cluster.
+
+The namespace scope indicates the total number of custom resources for that
+definition allowed in any particular namespace. The cumulative count of the
+custom resource across all namespaces must not exceed the cluster limit.
 
 Since, in practice, custom resources scale farther without conversion webhooks
 within the SLI/SLOs (roughly 2x according to our scale tests), custom resource
@@ -190,19 +230,6 @@ and the scope=cluster suggested maximum limit indicates how many custom resource
 be in the cluster total. For custom resources of custom resource definitions using `scope: Cluster`: only
 the scope=cluster suggested maximum limit applies._
 
-**Custom Resource Definitions:**
-
-| Suggested Maximum Limit: scope=cluster |
-| --- |
-| 500 |
-
-_Note: The Custom Resource Definition suggested maximum limit was selected not
-due to the above SLI/SLOs, but instead due to the latency OpenAPI publishing,
-which is a background process that occurs asychroniously each time a Custom
-Resource Definition schema is updated. For 500 Custom Resource Definitions it takes
-slightly over 35 seconds for a definition change to be visible via the OpenAPI
-spec endpoint._
-
 **Conversion Webhooks:**
 
 Conversion Webhook SLOs are defined from the perspective of the conversion
@@ -211,14 +238,17 @@ making the request to the webhook, but it does include network latency.
 
 Given that the performance and scalability of conversion webhooks are the
 responsibility of their author, Custom resource scale targets are applied only for
-conversion webhooks that are within the follow latencies for the above suggested
+conversion webhooks that are within the following latencies for the above suggested
 maximum limits.
 
-| scope | Expected conversion Webhook SLO: p99 latency |
-| --- | --- |
-| resource | 50ms |
-| namespace | 1 seconds |
-| cluster | 6 seconds |
+| scope | object count limit | Expected conversion Webhook SLO: p99 latency |
+| --- | --- | --- |
+| resource | 1 | 50ms |
+| namespace | 1500 (<=10kb), 600 (10-25kb) or 300 (25-50kb) | 1 second |
+| cluster | 10000 (<=10kb), 4000 (10-25kb) or 2000 (25-50kb) | 6 seconds |
+
+The scope=resource's higher "per-object" latency (50ms vs ~1.5ms for namespace
+and cluster scope) accommodates a constant per-request serving cost.
 
 The above object size and suggested maximum limits in the Custom Resources per
 Definition table applies to these conversion webhook SLOs. For example, for a