Organize FAQ page by section #4709

Merged
merged 3 commits into from
Apr 29, 2025
297 changes: 156 additions & 141 deletions docs/hugo/content/guide/frequently-asked-questions.md
@@ -1,8 +1,11 @@
---
title: FAQ
title: Frequently Asked Questions
linkTitle: FAQ
weight: -2
---
## Frequently Asked Questions
## Project and scope

_Questions about the Azure Service Operator project, its vision and scope._

### What is the release cadence?

@@ -42,42 +45,11 @@ support ticket.
No. If the Azure resource supports DR then you can configure it through ASO.
If the underlying Azure Resource doesn't support DR (or the story is more complicated/manual), then you cannot currently configure it through ASO.

### How can I protect against accidentally deleting an important resource?

1. You can set [serviceoperator.azure.com/reconcile-policy: detach-on-delete]( {{< relref "annotations#serviceoperatorazurecomreconcile-policy" >}}). This will allow the resource to be deleted in k8s but not delete the underlying resource in Azure.
2. You can use a project like <https://github.com/petrkotas/k8s-object-lock> to protect the resources you're worried about. Note: That project is not owned/sponsored by Microsoft.
3. You can manually add a finalizer to the resource. It stays in place until you remove it yourself when you are ready to delete the resource; see [Using Finalizers to Control Deletion](https://kubernetes.io/blog/2021/05/14/using-finalizers-to-control-deletion/).

There's also a proposal for [more general upstream support](https://github.com/kubernetes/kubernetes/issues/10179) on this topic, although there hasn't been movement on it in a while.

### What are some ASO best practices?

See [best practices]( {{< relref "best-practices" >}} ).

### Can I run ASO in active-active mode?

This is where two different ASO instances manage the same resource in Azure.

This _can_ be done but is not recommended. The main risk here is the goal state between the two instances of the same resource differing, causing thrashing in Azure
as each operator instance tries to drive to its goal. If you take great care to ensure that the goal state between the two clusters cannot differ, then
active-active can be done.

We instead recommend an active-passive approach, where in one cluster the resources are created and managed as normal, and in the other cluster the resources are only watched.
This can be accomplished with the [serviceoperator.azure.com/reconcile-policy: skip]( {{< relref "annotations#serviceoperatorazurecomreconcile-policy" >}} )
annotation used in the second cluster. In the case of a DR event, automation or manual action can remove the `skip` annotation in the passive cluster, turning it into active mode.

### Can ASO be used with IAC/GitOps tools?

Yes! We strongly recommend using something like [fluxcd](https://fluxcd.io/) or [argocd](https://argo-cd.readthedocs.io/en/stable/) with ASO.

If using argocd, make sure to **avoid** the `SyncPolicy Replace=true`, as that removes finalizers and annotations added by the operator whenever resources are re-applied.
ASO relies on the finalizer and annotations it adds being left alone to function properly. If they are unexpectedly removed the operator may not behave as expected.

### What's the difference between ASO and Crossplane.io?

There are a lot of similarities between ASO and Crossplane. They do similar things and have similar audiences. You can see some of this discussed [here](https://github.com/Azure/azure-service-operator/issues/1190).

**Today** primary differences are:
**Today** the primary differences are:

* ASO is officially maintained by Microsoft, while Crossplane Azure is community maintained.
* ASO focuses on simplicity. It doesn't offer any of the higher level abstractions that Crossplane does. ASO is not and will not ever be multi-cloud.
@@ -87,58 +59,6 @@ There are a lot of similarities between ASO and Crossplane. They do similar thin
We would like to share our code generator with Crossplane, as it’s higher fidelity than Terrajet (the code generator Crossplane uses to generate resources) for Azure resources.
Right now our focus is on getting ASO to GA, after which we will hopefully have more time to invest in that.

### Can I configure how often ASO re-syncs to Azure when there have been no changes?

Yes, using the `azureSyncPeriod` value in the Helm chart's values.yaml, or the `AZURE_SYNC_PERIOD` setting
in the `aso-controller-settings` secret. The value is a duration string such as `15m`, `1h`, or `24h`.

After changing this value, you must restart the `azureserviceoperator-controller-manager` pod in order for it to take effect
if the pod is already running.

Be careful setting this value too low as it can produce a lot of calls to Azure.

### How well does ASO scale?

ASO is designed to easily scale to thousands of resources, and we have customers who routinely reconcile thousands of resources with a single instance of ASO without issue.

A separate reconciler is run for each different kind of resource, and each reconciler is non-blocking. If resource creation (or update) triggers a long-running-operation (LRO), ASO doesn't sit there polling for completion of the operation. Instead, it stashes information about the operation on the resource as an annotation, schedules the resource for a retry of reconciliation, and moves on to reconcile the next resource due. Later, the LRO is picked up and checked to see if it has completed.

For the rare situation where users observe that ASO is failing to "keep up" with resource reconciliation (this can be monitored by looking at the queue length of the operator via [metrics]({{< relref metrics>}})), they can set `MAX_CONCURRENT_RECONCILES` to a larger value (the default is 1).

### I'm seeing Subscription throttling, what can I do?

ASO puts some steady load on your subscription due to re-reconciling resources periodically to ensure that
there is no drift from the desired goal state. The rate at which this syncing occurs is set by the
[AZURE_SYNC_PERIOD]({{< relref "aso-controller-settings-options#azure_sync_period" >}}) setting (`azureSyncPeriod` in Helm).
The default is `1h`.

When `azureSyncPeriod` is up for a particular resource, a new PUT is issued to the resource RP to correct any drift from
the goal state defined in ASO. There has been discussion about changing to do diffing locally to reduce requests to Azure,
see [#1491](https://github.com/Azure/azure-service-operator/issues/1491).

You can estimate the maximum idle request rate of ASO based on the configured `azureSyncPeriod` and the number of
resources being managed. The rough formula is: `numResources * 60/azureSyncPeriod(in minutes) = requestsPerHour`

For example:

| azureSyncPeriod | Number of resources | Requests / hour |
| --------------- | ------------------- | --------------- |
| 15m | 300 | 1200 |
| 15m | 1000 | 4000 |
| 1h | 1200 | 1200 |
| 24h | 28800 | 1200 |

### ASO is slow to reconcile some resources

If you have a large number of resources of a single type, ASO may not be able to keep up with the
number of reconciles it needs to run for that resource type. This would manifest as the
[workqueue_depth metric]({{< relref "metrics" >}}) staying consistently high for a single controller
(or set of controllers).

If this happens, you can increase the
[MAX_CONCURRENT_RECONCILES]( {{< relref "aso-controller-settings-options#max_concurrent_reconciles" >}})
setting to allow for more than a single reconcile. See the documentation of that option to understand what it means.

### Why doesn't ASO support exporting/importing all data from configmap/secret?

For configuration management details, where the values are statically known prior to deployment,
@@ -152,76 +72,46 @@ The problem of how to template Kubernetes resources has already been solved a nu
as Kustomize and Helm (among others). ASO composes with those projects, and others like them, rather than trying
to build our own templating.

### Should I use user-assigned or system-assigned identity?
## Running in Production

We don't take a position on whether it's universally better to deploy ASO using a user-assigned or system-assigned managed identity because the correct choice for you depends on your own context.
_How to set up and configure ASO in a production environment._

If you haven't already read it, Azure has a good [best practices for managed identity guide](https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/managed-identity-best-practice-recommendations) that may be useful.
### How can I protect against accidentally deleting an important resource?

### When using Workload Identity, how can I easily inject the ASO created User Managed Identity details onto the service account?
1. You can set [serviceoperator.azure.com/reconcile-policy: detach-on-delete]( {{< relref "annotations#serviceoperatorazurecomreconcile-policy" >}}). This will allow the resource to be deleted in k8s but not delete the underlying resource in Azure.
2. You can use a project like <https://github.com/petrkotas/k8s-object-lock> to protect the resources you're worried about. Note: That project is not owned/sponsored by Microsoft.
3. You can manually add a finalizer to the resource. It stays in place until you remove it yourself when you are ready to delete the resource; see [Using Finalizers to Control Deletion](https://kubernetes.io/blog/2021/05/14/using-finalizers-to-control-deletion/).

The [workload identity documentation](https://azure.github.io/azure-workload-identity/docs/topics/service-account-labels-and-annotations.html#service-account)
suggests that you need to set the `azure.workload.identity/client-id` annotation on the ServiceAccount.
This is not actually required! Setting that annotation instructs the Workload Identity webhook to inject the `AZURE_CLIENT_ID`
environment variable into the pods on which the ServiceAccount is used.
There's also a proposal for [more general upstream support](https://github.com/kubernetes/kubernetes/issues/10179) on this topic, although there hasn't been movement on it in a while.
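
As a minimal sketch of option 1, the manifest below shows where the `detach-on-delete` annotation sits on a resource. The resource name, namespace, and location are hypothetical, and the `ResourceGroup` API version is only an example; use whichever version your cluster has installed.

```yaml
apiVersion: resources.azure.com/v1api20200601
kind: ResourceGroup
metadata:
  name: important-rg          # hypothetical name
  namespace: default          # hypothetical namespace
  annotations:
    # Deleting this object in Kubernetes leaves the Azure resource in place.
    serviceoperator.azure.com/reconcile-policy: detach-on-delete
spec:
  location: westus2           # hypothetical location
```

With this annotation in place, an accidental `kubectl delete` removes only the Kubernetes object.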

If you've created your user managed identity with ASO, it's easier to just do that injection yourself by using the
`operatorSpec.configMaps` feature of the identity:
### What are some ASO best practices?

Identity:
See [best practices]( {{< relref "best-practices" >}} ).

```yaml
operatorSpec:
  configMaps:
    tenantId:
      name: identity-details
      key: tenantId
    clientId:
      name: identity-details
      key: clientId
```
### Can I run ASO in active-active mode?

and
This is where two different ASO instances manage the same resource in Azure.

Pod:
This _can_ be done but is not recommended. The main risk here is the goal state between the two instances of the same resource differing, causing thrashing in Azure
as each operator instance tries to drive to its goal. If you take great care to ensure that the goal state between the two clusters cannot differ, then
active-active can be done.

```yaml
env:
  - name: AZURE_CLIENT_ID
    valueFrom:
      configMapKeyRef:
        key: clientId
        name: identity-details
```
We instead recommend an active-passive approach, where in one cluster the resources are created and managed as normal, and in the other cluster the resources are only watched.
This can be accomplished with the [serviceoperator.azure.com/reconcile-policy: skip]( {{< relref "annotations#serviceoperatorazurecomreconcile-policy" >}} )
annotation used in the second cluster. In the case of a DR event, automation or manual action can remove the `skip` annotation in the passive cluster, turning it into active mode.
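
A sketch of what the passive cluster's copy of a resource might look like, assuming the same `ResourceGroup` is defined in both clusters. The name, namespace, and location are hypothetical; only the annotation comes from the text above.

```yaml
# Applied in the passive cluster only; the active cluster omits the annotation.
apiVersion: resources.azure.com/v1api20200601
kind: ResourceGroup
metadata:
  name: shared-rg             # hypothetical name, identical in both clusters
  namespace: default          # hypothetical namespace
  annotations:
    # The passive instance watches the resource but does not modify Azure.
    serviceoperator.azure.com/reconcile-policy: skip
spec:
  location: westus2           # hypothetical location
```

During failover, removing the annotation (for example with `kubectl annotate resourcegroup shared-rg serviceoperator.azure.com/reconcile-policy-`) switches this copy to active management.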

You can allow the other environment variables, volumes, and volume mounts to be injected automatically by the
[Azure Workload Identity webhook](https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html),
or you can avoid running the Azure Workload Identity webhook entirely, but doing so requires that you manually
include the `azure-identity` volume and volumeMount, as well as set the `AZURE_TENANT_ID` variable
alongside `AZURE_CLIENT_ID` in every pod that needs workload identity.
### Can ASO be used with IAC/GitOps tools?

Sample VolumeMount:
Yes! We strongly recommend using something like [fluxcd](https://fluxcd.io/) or [argocd](https://argo-cd.readthedocs.io/en/stable/) with ASO.

```yaml
volumeMounts:
  - mountPath: /var/run/secrets/azure/tokens/azure-identity-token
    name: azure-identity-token
    readOnly: true
```
If using argocd, make sure to **avoid** the `SyncPolicy Replace=true`, as that removes finalizers and annotations added by the operator whenever resources are re-applied.
ASO relies on the finalizer and annotations it adds being left alone to function properly. If they are unexpectedly removed the operator may not behave as expected.
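
For illustration, an Argo CD `Application` that syncs ASO manifests without the `Replace=true` option might look like the sketch below. The application name, repository URL, path, and namespaces are all hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: aso-resources                                  # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/your-org/your-repo.git  # hypothetical repository
    path: azure-resources                                # hypothetical path
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: default                                   # hypothetical namespace
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      # Do not add "Replace=true" here: it would strip the finalizer and
      # annotations that ASO adds to each resource.
```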

Sample Volume:
### Should I use user-assigned or system-assigned identity?

```yaml
volumes:
  - name: azure-identity-token
    projected:
      defaultMode: 420
      sources:
        - serviceAccountToken:
            audience: api://AzureADTokenExchange
            expirationSeconds: 3600
            path: azure-identity
```
We don't take a position on whether it's universally better to deploy ASO using a user-assigned or system-assigned managed identity because the correct choice for you depends on your own context.

If you haven't already read it, Azure has a good [best practices for managed identity guide](https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/managed-identity-best-practice-recommendations) that may be useful.

### How can I feed the output of one resource into a parameter for the next?

@@ -283,6 +173,66 @@ allowing it to adopt the existing resource in Azure) you must manually specify t
of the `RoleAssignment` as the original UUID. Otherwise, the UUID defaulting algorithm will choose a different UUID since
the namespace has changed.

### Can I configure how often ASO re-syncs to Azure when there have been no changes?

Yes, using the `azureSyncPeriod` value in the Helm chart's values.yaml, or the `AZURE_SYNC_PERIOD` setting
in the `aso-controller-settings` secret. The value is a duration string such as `15m`, `1h`, or `24h`.

After changing this value, you must restart the `azureserviceoperator-controller-manager` pod in order for it to take effect
if the pod is already running.

Be careful setting this value too low as it can produce a lot of calls to Azure.
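
The two ways of setting the period can be sketched as follows. This assumes `azureSyncPeriod` is a top-level key in the chart's values.yaml and that ASO is installed in the `azureserviceoperator-system` namespace; check both against your installation. The `30m` value is only an example.

```yaml
# Option A: Helm values.yaml (assumed to be a top-level key)
azureSyncPeriod: "30m"
---
# Option B: the aso-controller-settings secret.
# Only the relevant key is shown; keep the credential keys your secret
# already contains, otherwise applying this as-is would drop them.
apiVersion: v1
kind: Secret
metadata:
  name: aso-controller-settings
  namespace: azureserviceoperator-system   # assumed install namespace
stringData:
  AZURE_SYNC_PERIOD: "30m"
```

Either way, restart the controller pod afterwards, as described above.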

## Performance and scalability

_How and when to tune ASO to work well for you._

### How well does ASO scale?

ASO is designed to easily scale to thousands of resources, and we have customers who routinely reconcile thousands of resources with a single instance of ASO without issue.

A separate reconciler is run for each different kind of resource, and each reconciler is non-blocking. If resource creation (or update) triggers a long-running-operation (LRO), ASO doesn't sit there polling for completion of the operation. Instead, it stashes information about the operation on the resource as an annotation, schedules the resource for a retry of reconciliation, and moves on to reconcile the next resource due. Later, the LRO is picked up and checked to see if it has completed.

For the rare situation where users observe that ASO is failing to "keep up" with resource reconciliation (this can be monitored by looking at the queue length of the operator via [metrics]({{< relref metrics>}})), they can set `MAX_CONCURRENT_RECONCILES` to a larger value (the default is 1).

### I'm seeing Subscription throttling, what can I do?

ASO puts some steady load on your subscription due to re-reconciling resources periodically to ensure that
there is no drift from the desired goal state. The rate at which this syncing occurs is set by the
[AZURE_SYNC_PERIOD]({{< relref "aso-controller-settings-options#azure_sync_period" >}}) setting (`azureSyncPeriod` in Helm).
The default is `1h`.

When `azureSyncPeriod` is up for a particular resource, a new PUT is issued to the resource RP to correct any drift from
the goal state defined in ASO. There has been discussion about changing to do diffing locally to reduce requests to Azure,
see [#1491](https://github.com/Azure/azure-service-operator/issues/1491).

You can estimate the maximum idle request rate of ASO based on the configured `azureSyncPeriod` and the number of
resources being managed. The rough formula is: `numResources * 60/azureSyncPeriod(in minutes) = requestsPerHour`

For example:

| azureSyncPeriod | Number of resources | Requests / hour |
| --------------- | ------------------- | --------------- |
| 15m | 300 | 1200 |
| 15m | 1000 | 4000 |
| 1h | 1200 | 1200 |
| 24h | 28800 | 1200 |

### ASO is slow to reconcile some resources

If you have a large number of resources of a single type, ASO may not be able to keep up with the
number of reconciles it needs to run for that resource type. This would manifest as the
[workqueue_depth metric]({{< relref "metrics" >}}) staying consistently high for a single controller
(or set of controllers).

If this happens, you can increase the
[MAX_CONCURRENT_RECONCILES]( {{< relref "aso-controller-settings-options#max_concurrent_reconciles" >}})
setting to allow for more than a single reconcile. See the documentation of that option to understand what it means.
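
As a sketch, and assuming this option is supplied through the same `aso-controller-settings` secret as the other controller settings, the relevant fragment might look like this. The value `4` is an arbitrary example, not a recommendation.

```yaml
# Fragment of the aso-controller-settings secret; other keys omitted.
stringData:
  # Allow up to 4 concurrent reconciles per controller (the default is 1).
  MAX_CONCURRENT_RECONCILES: "4"
```

A higher value also means more concurrent calls to Azure, so raise it gradually while watching the `workqueue_depth` metric and any throttling.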

## Other questions

_Anything that didn't fit into the other categories._

### How can I import existing Azure resources into ASO?

See [Annotations understood by the operator]({{< relref "annotations#serviceoperatorazurecomreconcile-policy" >}}) for
@@ -368,3 +318,68 @@ func main() {
kubeClient.Create(ctx, obj)
}
```

### When using Workload Identity, how can I easily inject the ASO created User Managed Identity details onto the service account?

The [workload identity documentation](https://azure.github.io/azure-workload-identity/docs/topics/service-account-labels-and-annotations.html#service-account)
suggests that you need to set the `azure.workload.identity/client-id` annotation on the ServiceAccount.
This is not actually required! Setting that annotation instructs the Workload Identity webhook to inject the `AZURE_CLIENT_ID`
environment variable into the pods on which the ServiceAccount is used.

If you've created your user managed identity with ASO, it's easier to just do that injection yourself by using the
`operatorSpec.configMaps` feature of the identity:

Identity:

```yaml
operatorSpec:
  configMaps:
    tenantId:
      name: identity-details
      key: tenantId
    clientId:
      name: identity-details
      key: clientId
```

and

Pod:

```yaml
env:
  - name: AZURE_CLIENT_ID
    valueFrom:
      configMapKeyRef:
        key: clientId
        name: identity-details
```

You can allow the other environment variables, volumes, and volume mounts to be injected automatically by the
[Azure Workload Identity webhook](https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html),
or you can avoid running the Azure Workload Identity webhook entirely, but doing so requires that you manually
include the `azure-identity` volume and volumeMount, as well as set the `AZURE_TENANT_ID` variable
alongside `AZURE_CLIENT_ID` in every pod that needs workload identity.
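
If you skip the webhook, the pod's environment section might look like the sketch below. It assumes the `identity-details` ConfigMap written by the `operatorSpec.configMaps` example above, with its `tenantId` and `clientId` keys.

```yaml
env:
  # Both values are read from the ConfigMap that ASO populates.
  - name: AZURE_CLIENT_ID
    valueFrom:
      configMapKeyRef:
        name: identity-details
        key: clientId
  - name: AZURE_TENANT_ID
    valueFrom:
      configMapKeyRef:
        name: identity-details
        key: tenantId
```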

Sample VolumeMount:

```yaml
volumeMounts:
  - mountPath: /var/run/secrets/azure/tokens/azure-identity-token
    name: azure-identity-token
    readOnly: true
```

Sample Volume:

```yaml
volumes:
  - name: azure-identity-token
    projected:
      defaultMode: 420
      sources:
        - serviceAccountToken:
            audience: api://AzureADTokenExchange
            expirationSeconds: 3600
            path: azure-identity
```