The 5 Azure Managed Identity Mistakes I See in Every Client Environment

I've spent the last few years auditing, architecting, and fixing Azure environments for clients ranging from startups to enterprises. Managed identity is one of those features that Microsoft rightfully pushes hard — no secrets to rotate, no credentials in code, tight integration with Entra ID. Super important stuff.

But honestly? The implementation is where things fall apart. I keep seeing the same five mistakes. Over and over. In environments that have passed security reviews. In environments built by senior engineers.

Let me walk through each one, show you what goes wrong in production, and give you the CLI commands to fix it.

Mistake 1: Using User-Assigned Identity When System-Assigned Would Be Simpler and More Secure

What People Do Wrong

A team creates a user-assigned managed identity, assigns it to a single Azure Function, gives it permissions to one storage account, and moves on. They never share that identity with anything else. They never move it between resources.

So why did they make it user-assigned in the first place?

Why It Happens

The docs talk about user-assigned identities first in a lot of places. People read "you can reuse it across resources" and think that sounds like good architecture. Future-proofing. Flexibility.

What Actually Happens in Production

You now have an identity whose lifecycle is completely decoupled from the resource it serves. Delete the Function App? The identity and its RBAC assignments are still sitting there. Orphaned. Still has permissions. Nobody cleans it up because nobody remembers it exists.

I've seen client environments with 40+ orphaned user-assigned identities, each with active role assignments. That's 40 attack surfaces that serve zero purpose.

The Fix

If the identity only needs to belong to one resource, use system-assigned. It gets created with the resource and deleted with the resource.

# Enable system-assigned identity on an Azure Function
az functionapp identity assign \
  --name my-function-app \
  --resource-group rg-production
 
# The output gives you the principalId — use it for RBAC
# {
#   "principalId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
#   "tenantId": "...",
#   "type": "SystemAssigned"
# }

Compare that to the user-assigned approach:

# Creating a user-assigned identity (only do this if you need it on multiple resources)
az identity create \
  --name id-shared-storage-reader \
  --resource-group rg-identity
 
# Assigning it to a resource
az functionapp identity assign \
  --name my-function-app \
  --resource-group rg-production \
  --identities /subscriptions/<sub-id>/resourceGroups/rg-identity/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-shared-storage-reader

Docs vs Reality: The docs present both types as equally valid starting points. In reality, system-assigned should be your default. Only reach for user-assigned when you have a concrete reason — like multiple resources needing the same identity.
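To find existing candidates for this fix, you can count how many resources reference each user-assigned identity. The sketch below is my own illustration, not an official tool. It assumes you export one line per resource (name plus attached identity IDs) with `az resource list`, and the function name `uami_usage_report` is hypothetical.

```shell
# Sketch: count how many resources reference each user-assigned identity.
# Input (stdin): one line per resource, "<resource-name><TAB><comma-separated identity IDs>",
# for example (query shape is an assumption, adjust as needed):
#   az resource list \
#     --query "[?identity.userAssignedIdentities].[name, join(',', keys(identity.userAssignedIdentities))]" \
#     -o tsv
# Output: "<count><TAB><identity ID, lowercased><TAB><hint>", sorted by identity ID.
uami_usage_report() {
  awk -F'\t' '
    {
      n = split($2, ids, ",")
      for (i = 1; i <= n; i++) {
        key = tolower(ids[i])   # resource IDs are case-insensitive
        if (key != "") count[key]++
      }
    }
    END {
      for (key in count) {
        hint = "shared: check trust boundaries"
        if (count[key] == 1) hint = "single resource: consider system-assigned"
        printf "%d\t%s\t%s\n", count[key], key, hint
      }
    }' | sort -b -k2,2
}
```

Identities with a count of 1 that never move between resources are the ones worth converting. Identities attached to nothing won't show up here at all; that's the orphan problem covered in Mistake 5.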

Mistake 2: Sharing One Managed Identity Across Resources That Shouldn't Trust Each Other

What People Do Wrong

A team creates one user-assigned managed identity and assigns it to their API app, their background processor, their admin portal, and their reporting service. One identity to rule them all. Sounds efficient, right?

Why It Happens

Fewer identities means fewer role assignments to manage. It feels cleaner. "They all need access to the same storage account anyway."

What Actually Happens in Production

The blast radius explodes. If the background processor gets compromised — maybe through a deserialization vulnerability in a queue message — the attacker now has every permission you gave to the API, the admin portal, and the reporting service. Because they're all the same identity.

I've seen this break in client environments where a low-priority batch job shared an identity with a customer-facing API. The batch job needed write access to a blob container. The API only needed read. But because they shared an identity, the API effectively had write access too. A single SSRF vulnerability could have turned into data modification.

The Fix

One identity per trust boundary. Resources that serve different purposes or have different security profiles get their own identity.

# Create separate identities for separate trust boundaries
az identity create --name id-api-prod --resource-group rg-identity
az identity create --name id-batch-prod --resource-group rg-identity
az identity create --name id-admin-prod --resource-group rg-identity
 
# Assign minimal permissions to each
# API: read-only to blob storage
az role assignment create \
  --assignee-object-id $(az identity show --name id-api-prod --resource-group rg-identity --query principalId -o tsv) \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-data/providers/Microsoft.Storage/storageAccounts/stproddata
 
# Batch: write access to a specific container (built-in role here; Mistake 3 shows how to narrow it with a custom role)
az role assignment create \
  --assignee-object-id $(az identity show --name id-batch-prod --resource-group rg-identity --query principalId -o tsv) \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-data/providers/Microsoft.Storage/storageAccounts/stproddata/blobServices/default/containers/batch-output

Notice the scope difference. The API gets read on the whole storage account. The batch job gets write on a single container. Big difference.
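Those container-level scopes are long, and a scope that accidentally stops at the storage account is still valid, just much wider. A tiny helper that assembles the full path can keep scripts consistent. The function name `blob_container_scope` is hypothetical; the path follows the standard ARM resource ID layout for a blob container used above.

```shell
# Sketch: build the ARM scope for a single blob container so role assignments
# can target the narrowest level.
# Args: subscription ID, resource group, storage account, container.
blob_container_scope() {
  if [ "$#" -ne 4 ]; then
    echo "usage: blob_container_scope <sub-id> <resource-group> <storage-account> <container>" >&2
    return 1
  fi
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s/blobServices/default/containers/%s\n' \
    "$1" "$2" "$3" "$4"
}

# Usage (assumes you're logged in with az and PRINCIPAL_ID is set):
#   SCOPE=$(blob_container_scope "$SUB_ID" rg-data stproddata batch-output)
#   az role assignment create --assignee-object-id "$PRINCIPAL_ID" \
#     --assignee-principal-type ServicePrincipal \
#     --role "Storage Blob Data Contributor" --scope "$SCOPE"
```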

Mistake 3: Over-Scoping RBAC — Giving Contributor When a Custom Role With 3 Permissions Would Do

What People Do Wrong

Someone needs their managed identity to read blobs from a storage account. They assign the Contributor role at the subscription level. Done.

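That identity can now create VMs, delete databases, modify networking rules, and deploy whatever it wants. For reading blobs. And it doesn't even do the job cleanly: Contributor has no blob data actions, so reads with an Entra ID token still fail with a 403. It only "works" because Contributor can list the storage account keys, which drags the app back to the shared-key secrets managed identity was supposed to eliminate.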

Why It Happens

Built-in roles are easy. Contributor is the "it just works" role. Custom roles feel like overhead. And honestly, the Azure portal makes it really tempting — there's a big dropdown, you pick Contributor, you move on with your day.

What Actually Happens in Production

You've given a single application identity the keys to your entire subscription. If that identity gets abused — through a code vulnerability, token theft, or a misconfigured endpoint — the attacker has Contributor access to everything.

I've audited environments where 15+ managed identities all had Contributor at subscription scope. The actual permissions each one needed? Typically 3-5 specific actions on 1-2 resources.

The Fix

Step 1: Figure out the minimum permissions you actually need.

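# List what permissions a built-in role actually grants
# (data-plane permissions such as blob read/write live under dataActions, not actions)
az role definition list --name "Storage Blob Data Contributor" --output json \
  | jq '.[0].permissions[0] | {actions, dataActions}'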

Step 2: Create a custom role with only those permissions.

# Create a custom role definition
az role definition create --role-definition '{
  "Name": "Blob Writer - Batch Output Only",
  "Description": "Read and write blobs; intended to be assigned at the batch-output container scope",
  "Actions": [],
  "NotActions": [],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read"
  ],
  "NotDataActions": [],
  "AssignableScopes": [
    "/subscriptions/<sub-id>/resourceGroups/rg-data"
  ]
}'

Step 3: Assign the custom role at the narrowest possible scope.

# Assign to the managed identity at container scope
az role assignment create \
  --assignee-object-id $(az identity show --name id-batch-prod --resource-group rg-identity --query principalId -o tsv) \
  --assignee-principal-type ServicePrincipal \
  --role "Blob Writer - Batch Output Only" \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-data/providers/Microsoft.Storage/storageAccounts/stproddata/blobServices/default/containers/batch-output

Pro tip: Use az role assignment list --assignee <principal-id> --all to audit what a managed identity can actually do. Run this regularly. You'll be surprised.

# Audit all role assignments for a managed identity
az role assignment list \
  --assignee a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
  --all \
  --output table
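Once you're auditing more than a few identities, a filter helps. This sketch flags Owner, Contributor, and User Access Administrator assignments at subscription, management group, or root scope. It assumes tab-separated input from `az role assignment list`; the function name and the role list are my own choices, so adjust them to your policy.

```shell
# Sketch: flag privileged roles assigned at subscription, management group, or root scope.
# Input (stdin): "<principalId><TAB><roleDefinitionName><TAB><scope>", for example:
#   az role assignment list --all \
#     --query "[?principalType=='ServicePrincipal'].[principalId, roleDefinitionName, scope]" -o tsv
# The set of privileged roles below is an assumption; extend it to match your policy.
flag_broad_assignments() {
  awk -F'\t' '
    # Classify a scope by level; ARM IDs are case-insensitive, so compare lowercased.
    function scope_level(s) {
      s = tolower(s)
      if (s == "/") return "root"
      if (s ~ "^/providers/microsoft[.]management/managementgroups/[^/]+$") return "management-group"
      if (s ~ "^/subscriptions/[^/]+$") return "subscription"
      if (s ~ "^/subscriptions/[^/]+/resourcegroups/[^/]+$") return "resource-group"
      return "resource"
    }
    $2 == "Owner" || $2 == "Contributor" || $2 == "User Access Administrator" {
      level = scope_level($3)
      if (level == "root" || level == "management-group" || level == "subscription")
        printf "BROAD\t%s\t%s\t%s (%s)\n", $1, $2, $3, level
    }'
}
```

Anything this prints deserves a hard look; the fix is usually the custom role plus narrow scope pattern from Steps 2 and 3.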

Mistake 4: Assuming Managed Identity Works Everywhere

What People Do Wrong

A team designs their entire authentication architecture around managed identity. CI/CD pipelines, on-prem agents, third-party integrations, multi-cloud workloads — everything will use managed identity. Clean. No secrets anywhere.

Then they start building and realize half their workloads can't use it.

Why It Happens

Microsoft's marketing (and honestly, their docs) push managed identity as the answer to service authentication. And for workloads running inside Azure, it mostly is. But the moment you step outside the Azure compute boundary, it doesn't work, with Azure Arc-enabled servers as the main exception.

What Actually Happens in Production

Teams hit walls and start improvising. I've seen GitHub Actions workflows using hardcoded service principal secrets in repository settings because someone assumed managed identity would "just work" with external CI/CD. Azure DevOps self-hosted agents running on-prem can't get managed identity tokens from IMDS. Third-party SaaS platforms integrating with Azure APIs need service principals with certificates.

The worst case I've seen: a team spent two weeks trying to make managed identity work in a Kubernetes cluster running in AWS that needed to talk to Azure Key Vault. The answer was workload identity federation with a service principal — but they burned days because the architecture diagram said "managed identity everywhere."

The Fix

Know where managed identity works and where it doesn't.

Where managed identity works:

Azure VMs and VM Scale Sets
Azure App Service and Function Apps
Azure Container Apps and Azure Container Instances
Azure Kubernetes Service (AKS) via workload identity
Azure Logic Apps, Data Factory, API Management
Azure Arc-enabled servers (extends to on-prem/multi-cloud)

Where you still need service principals or workload identity federation:

GitHub Actions (use OIDC federation with azure/login action)
On-premises servers (without Azure Arc)
Multi-cloud workloads (AWS, GCP compute)
Third-party SaaS integrations
Azure Cloud Services (classic) — no managed identity support at all
Azure DevOps self-hosted agents not running on Azure compute

# For GitHub Actions: set up OIDC federation instead of secrets
# Step 1: Create an app registration and service principal
# Capture the appId from the create call; display names aren't unique in Entra ID,
# so looking the app up by name later can return the wrong one
APP_ID=$(az ad app create --display-name "github-actions-deploy" --query appId -o tsv)
az ad sp create --id "$APP_ID"
 
# Step 2: Add federated credential for GitHub OIDC
az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-actions-main",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:your-org/your-repo:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}'
 
# Step 3: Assign roles to the service principal (not the app)
SP_OBJECT_ID=$(az ad sp show --id $APP_ID --query id -o tsv)
az role assignment create \
  --assignee-object-id $SP_OBJECT_ID \
  --assignee-principal-type ServicePrincipal \
  --role "Contributor" \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-deploy
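One detail that trips people up: the `subject` must exactly match the `sub` claim in the token GitHub issues, and that claim depends on how the job runs. A job that targets a GitHub environment, for example, gets an `environment:` subject rather than the branch `ref:` subject shown above. Here is a small sketch of builders for the common default formats. The function names and the credential name are hypothetical, and the JSON builder assumes the values contain no double quotes.

```shell
# Sketch: build a GitHub OIDC subject claim (GitHub's default "sub" formats):
#   branch:       repo:<org>/<repo>:ref:refs/heads/<branch>
#   tag:          repo:<org>/<repo>:ref:refs/tags/<tag>
#   environment:  repo:<org>/<repo>:environment:<name>
#   pull request: repo:<org>/<repo>:pull_request
github_oidc_subject() {
  local repo="$1" kind="$2" value="$3"
  case "$kind" in
    branch)       printf 'repo:%s:ref:refs/heads/%s\n' "$repo" "$value" ;;
    tag)          printf 'repo:%s:ref:refs/tags/%s\n' "$repo" "$value" ;;
    environment)  printf 'repo:%s:environment:%s\n' "$repo" "$value" ;;
    pull_request) printf 'repo:%s:pull_request\n' "$repo" ;;
    *) echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}

# Emit the JSON passed to: az ad app federated-credential create --id "$APP_ID" --parameters <json>
federated_credential_json() {
  local name="$1" subject="$2"
  printf '{"name":"%s","issuer":"https://token.actions.githubusercontent.com","subject":"%s","audiences":["api://AzureADTokenExchange"]}\n' \
    "$name" "$subject"
}

# Usage (assumes APP_ID from Step 1):
#   az ad app federated-credential create --id "$APP_ID" --parameters \
#     "$(federated_credential_json github-actions-production \
#        "$(github_oidc_subject your-org/your-repo environment production)")"
```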

Docs vs Reality: The docs say "use managed identity when possible." True. But they don't always make it obvious where the boundary is. If your workload runs outside Azure compute, stop and evaluate before assuming managed identity will work.

Mistake 5: Not Understanding System-Assigned vs User-Assigned Lifecycle Behavior

What People Do Wrong

Teams use managed identities without understanding what happens when resources are deleted, moved, or recreated. They get surprised when permissions disappear or when orphaned identities pile up.

Why It Happens

The lifecycle behavior is documented, but it's buried. People learn "system = one resource, user = many resources" and stop there. The critical difference is about what happens during the lifecycle of the resource, not just during creation.

What Actually Happens in Production

Scenario 1: A team uses system-assigned identity on a VM. They delete and recreate the VM for maintenance. The new VM gets a new system-assigned identity with a new principal ID. All the old RBAC assignments? They still reference the deleted principal (Azure shows them as "Identity not found"), so they grant the new identity nothing. The app starts throwing 403s and nobody knows why.

Scenario 2: A team uses user-assigned identity everywhere. They decommission services but never delete the identities. Six months later, there are dozens of identities with active permissions that aren't assigned to any resource.

The Fix

Understand the lifecycle contract:

Behavior	System-Assigned	User-Assigned
Created	When the resource is created (or identity enabled)	Independently, before resource creation
Deleted	When the resource is deleted	Manually, independent of resources
Shared across resources	No — 1:1 relationship	Yes — 1:many relationship
Resource recreation	New identity, new principal ID, old RBAC lost	Same identity survives, RBAC intact
Orphan risk	None — dies with the resource	High — must be manually cleaned up
RBAC management	Tied to resource lifecycle	Decoupled — must manage separately

For system-assigned, automate the RBAC re-assignment:

# Script to re-assign roles after VM recreation
RESOURCE_GROUP="rg-production"
VM_NAME="vm-api-server"
 
# Get the new principal ID after recreation
PRINCIPAL_ID=$(az vm identity show \
  --resource-group $RESOURCE_GROUP \
  --name $VM_NAME \
  --query principalId -o tsv)
 
# Re-assign the required roles
az role assignment create \
  --assignee-object-id $PRINCIPAL_ID \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Reader" \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-data/providers/Microsoft.Storage/storageAccounts/stproddata
 
az role assignment create \
  --assignee-object-id $PRINCIPAL_ID \
  --assignee-principal-type ServicePrincipal \
  --role "Key Vault Secrets User" \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-security/providers/Microsoft.KeyVault/vaults/kv-prod
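Hardcoded calls like these tend to drift from what the VM actually needs. A variation, sketched with a hypothetical manifest format and function name, keeps the required role/scope pairs in a file next to your deployment code and prints the commands in dry-run mode by default, so you can review them before they run.

```shell
# Sketch: re-create role assignments from a manifest after a resource is rebuilt.
# Manifest format (hypothetical): one "<role name>|<scope>" pair per line; '#' starts a comment.
# With DRY_RUN=1 (the default) the az commands are printed instead of executed.
reassign_roles() {
  local principal_id="$1" manifest="$2"
  local role scope
  while IFS='|' read -r role scope; do
    case "$role" in ''|'#'*) continue ;; esac   # skip blank lines and comments
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'az role assignment create --assignee-object-id %s --assignee-principal-type ServicePrincipal --role "%s" --scope %s\n' \
        "$principal_id" "$role" "$scope"
    else
      az role assignment create \
        --assignee-object-id "$principal_id" \
        --assignee-principal-type ServicePrincipal \
        --role "$role" \
        --scope "$scope" || return 1
    fi
  done < "$manifest"
}

# Usage:
#   PRINCIPAL_ID=$(az vm identity show -g rg-production -n vm-api-server --query principalId -o tsv)
#   DRY_RUN=0 reassign_roles "$PRINCIPAL_ID" vm-api-server.roles
```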

For user-assigned, run regular cleanup audits:

# Find all user-assigned identities
az identity list --query "[].{Name:name, ResourceGroup:resourceGroup, PrincipalId:principalId}" -o table
 
# Check if an identity is actually attached to any resource
# (If this returns nothing, the identity is orphaned)
IDENTITY_ID=$(az identity show --name id-batch-prod --resource-group rg-identity --query id -o tsv)
# Resource IDs are case-insensitive, so compare them case-insensitively with grep -i
az resource list \
  --query "[?identity.userAssignedIdentities].[name, join(',', keys(identity.userAssignedIdentities))]" \
  -o tsv | grep -iF "$IDENTITY_ID"
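Checking identities one at a time doesn't scale past a handful. Here is a bulk version, sketched under the assumption that you've exported every user-assigned identity ID and every attached identity ID to two files (the export queries in the comments and the function name are illustrative). Anything in the first list but not the second is a cleanup candidate. ARM resource IDs are case-insensitive, so both sides are lowercased, which means the output is lowercased too.

```shell
# Sketch: list user-assigned identities that no resource references.
# $1: file of all identity resource IDs, one per line, e.g. from
#       az identity list --query "[].id" -o tsv > all-identities.txt
# $2: file of attached identity resource IDs, one per line, e.g. from
#       az resource list --query "[?identity.userAssignedIdentities].keys(identity.userAssignedIdentities)[]" -o tsv > attached.txt
find_orphaned_identities() {
  local all="$1" attached="$2" tmp
  tmp="$(mktemp)"
  # Normalize case and sort with a fixed collation so comm compares correctly
  tr '[:upper:]' '[:lower:]' < "$attached" | LC_ALL=C sort -u > "$tmp"
  tr '[:upper:]' '[:lower:]' < "$all" | LC_ALL=C sort -u | LC_ALL=C comm -23 - "$tmp"
  rm -f "$tmp"
}
```

Before deleting anything it reports, run the role assignment audit from Mistake 3 against it so you know what access you're removing.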

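Pro tip: If you're using Infrastructure as Code (Terraform, Bicep), system-assigned identities need a bit more care. The principal ID only exists once the resource is created, so you can't pre-stage RBAC, and every role assignment has to be ordered after the resource. Referencing the principal ID directly (identity[0].principal_id in Terraform, resource.identity.principalId in Bicep) creates that dependency for you; in Bicep, a role assignment on a resource in a different resource group needs a module. Also set the principal type to ServicePrincipal on these assignments, which helps avoid failures caused by replication delay on a freshly created identity. User-assigned identities sidestep most of this, since you create them (and their role assignments) first.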

The Decision Framework: Which Identity Type Should You Use?

Stop overthinking it. Here's the decision tree I use with every client:

Use System-Assigned when:

The identity serves exactly one resource
You want automatic cleanup when the resource is deleted
The resource won't be frequently deleted and recreated
You're managing infrastructure manually or with simple IaC

Use User-Assigned when:

Multiple resources need the same set of permissions
Resources are frequently deleted and recreated (e.g., ephemeral VMs, scale sets)
You need to pre-configure RBAC before the target resource exists
You want to separate identity management from resource management

Use a Service Principal when:

The workload runs outside Azure (CI/CD, on-prem, multi-cloud)
The service genuinely doesn't support managed identity
You need certificate-based authentication for regulatory compliance
Prefer OIDC federation over client secrets whenever possible
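If you want to bake this decision tree into a review checklist or a provisioning script, it boils down to three yes/no questions. A sketch; the function name and the order of the checks are my reading of the lists above, not an official rule.

```shell
# Sketch of the decision tree above.
# Args (each "yes" or "no"):
#   $1 workload runs on Azure compute that supports managed identity (or an Arc-enabled server)?
#   $2 multiple resources need the same set of permissions?
#   $3 resource is frequently recreated, or RBAC must exist before the resource does?
recommend_identity() {
  local on_azure="$1" shared="$2" churn="$3"
  if [ "$on_azure" != "yes" ]; then
    echo "service principal (prefer OIDC federation over client secrets)"
  elif [ "$shared" = "yes" ] || [ "$churn" = "yes" ]; then
    echo "user-assigned managed identity"
  else
    echo "system-assigned managed identity"
  fi
}
```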

# Quick audit: see all managed identities and their assignments in your subscription
echo "=== System-Assigned Identities ==="
# identity.type can be "SystemAssigned" or "SystemAssigned, UserAssigned", so match with contains()
az resource list --query "[?identity.type && contains(identity.type, 'SystemAssigned')].{Name:name, Type:type, PrincipalId:identity.principalId}" -o table

echo "=== User-Assigned Identities ==="
az identity list -o table

echo "=== Role Assignments for Service Principals (includes managed identities) ==="
az role assignment list --all \
  --query "[?principalType=='ServicePrincipal'].{Principal:principalId, Role:roleDefinitionName, Scope:scope}" \
  -o table

Wrapping Up

Managed identity is genuinely one of the best security features in Azure. No secrets. No rotation schedules. Tight Entra ID integration. But the implementation details matter.

The truth is, most of these mistakes come from reasonable-sounding decisions. "Use one identity for simplicity." "Give Contributor so it works." "Managed identity everywhere." They all sound right until production teaches you otherwise.

Audit your environment. Check your scoping. Understand your lifecycle behavior. And stop giving Contributor at subscription scope. Please.

Got questions about managed identity in your environment? Drop them in the comments — I read every one.
