Azure WAF and Front Door

Azure Front Door

Azure Front Door is a global, scalable entry point for your web applications. Think of it as a smart traffic cop sitting at the edge of Microsoft’s global network that routes users to the fastest, most available backend.

Key capabilities:

  • Global load balancing — distributes traffic across regions, routing users to the nearest or healthiest backend
  • SSL/TLS termination — handles HTTPS offloading at the edge, reducing backend load
  • URL-based routing — routes /api/* to one backend and /images/* to another
  • Caching — caches static content at edge locations (POPs) to reduce latency
  • Health probes — automatically detects unhealthy backends and reroutes traffic
  • Session affinity — sticky sessions to keep a user on the same backend

Front Door operates at Layer 7 (HTTP/HTTPS) and uses Microsoft’s global private WAN backbone, so traffic travels faster than the public internet.


Azure WAF (Web Application Firewall)

Azure WAF is a security layer that inspects and filters HTTP/S traffic to protect web apps from common exploits and vulnerabilities.

What it protects against:

  • SQL injection
  • Cross-site scripting (XSS)
  • OWASP Top 10 threats
  • Bot attacks and scraping
  • Rate limiting / DDoS at Layer 7
  • Custom rule-based threats (e.g. block specific IPs, countries, headers)

Two modes:

  • Detection mode — logs threats but doesn’t block (good for tuning)
  • Prevention mode — actively blocks malicious requests

How They Work Together

WAF is a feature/policy that runs on top of Front Door (and also on Application Gateway). You attach a WAF policy to your Front Door profile, and it inspects all incoming traffic before it reaches your backends.

User Request
      │
      ▼
┌─────────────────────────────┐
│ Azure Front Door            │ ← Global routing, caching, SSL termination
│  ┌───────────────────────┐  │
│  │ WAF Policy            │  │ ← Inspects & filters malicious traffic
│  └───────────────────────┘  │
└─────────────────────────────┘
      │
      ▼
Your Backend (App Service, AKS, VM, etc.)
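The attach step can be sketched with the Azure CLI. This is a hedged sketch, not a verified recipe: all resource names are placeholders, and the `az afd security-policy create` flags should be checked against your CLI version. `DRY_RUN=1` (the default) prints each command instead of executing it, so the flow can be reviewed before running against a real subscription.

```shell
#!/usr/bin/env bash
# Sketch: create a Front Door profile, a WAF policy, and bind them together.
# DRY_RUN=1 (default) prints each command instead of executing it.
run() { [ "${DRY_RUN:-1}" = 1 ] && echo "+ $*" || "$@"; }

run az afd profile create \
  --resource-group my-rg --profile-name my-frontdoor \
  --sku Premium_AzureFrontDoor

run az network front-door waf-policy create \
  --resource-group my-rg --name myWafPolicy \
  --sku Premium_AzureFrontDoor --mode Prevention

# A security policy is the object that binds the WAF policy to the
# profile's endpoint domains (resource IDs are placeholders).
run az afd security-policy create \
  --resource-group my-rg --profile-name my-frontdoor \
  --security-policy-name my-waf-binding \
  --domains "<endpoint-resource-id>" \
  --waf-policy "<waf-policy-resource-id>"
```

Set `DRY_RUN=0` only once the printed commands look right for your environment.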

Front Door Tiers

Feature                    Standard           Premium
CDN + load balancing       ✅                 ✅
WAF                        Basic rules only   ✅ Full (managed + custom rules)
Bot protection             ❌                 ✅
Private Link to backends   ❌                 ✅

When to Use What

Scenario                            Use
Global traffic routing + failover   Front Door alone
Protect a single-region app         Application Gateway + WAF
Protect a global app                Front Door + WAF (Premium)
Edge caching + security             Front Door + WAF

In short: Front Door gets traffic to the right place fast; WAF makes sure that traffic is safe.

Azure Resource Graph – find orphaned resource


What is Azure Resource Graph?

Azure Resource Graph lets you query all your resources across subscriptions using KQL (Kusto Query Language)—fast and at scale.

👉 Perfect for finding:

  • Orphaned disks
  • Unattached NICs
  • Unused public IPs
  • Resources missing relationships

What is an “Orphaned Resource”?

An orphaned resource is:

  • Not attached to anything
  • Still costing money or creating risk

Examples:

  • Disk not attached to any VM
  • Public IP not associated
  • NIC not connected
  • NSG not applied

Common Queries to Find Orphaned Resources


1. Unattached Managed Disks

Resources
| where type == "microsoft.compute/disks"
| where properties.diskState == "Unattached"
| project name, resourceGroup, location, diskSizeGB = properties.diskSizeGB

👉 Finds disks not connected to any VM


2. Unused Public IP Addresses

Resources
| where type == "microsoft.network/publicipaddresses"
| where isnull(properties.ipConfiguration)
| project name, resourceGroup, location, sku

👉 These are exposed but unused → security + cost risk


3. Unattached Network Interfaces (NICs)

Resources
| where type == "microsoft.network/networkinterfaces"
| where isnull(properties.virtualMachine)
| project name, resourceGroup, location

4. Unused Network Security Groups (NSGs)

Resources
| where type == "microsoft.network/networksecuritygroups"
| where isnull(properties.networkInterfaces) and isnull(properties.subnets)
| project name, resourceGroup, location

5. Empty Resource Groups (Bonus)

ResourceContainers
| where type == "microsoft.resources/subscriptions/resourcegroups"
| join kind=leftouter (
    Resources
    | summarize count() by resourceGroup
) on resourceGroup
| where count_ == 0 or isnull(count_)
| project resourceGroup

How to Run These Queries

You can run them in:

  • Azure Portal → Resource Graph Explorer
  • CLI: az graph query -q "<query>"
  • PowerShell: Search-AzGraph -Query "<query>"

Pro Tip (Senior-Level Insight)

👉 Don’t just find orphaned resources—automate cleanup

  • Schedule queries using:
    • Azure Automation
    • Logic Apps
  • Trigger:
    • Alerts
    • Cleanup workflows

Interview Answer

I use Azure Resource Graph with KQL queries to identify orphaned resources at scale across subscriptions. For example, I can query for managed disks where the disk state is unattached, or public IPs without an associated configuration. Similarly, I check for NICs not linked to VMs and NSGs not applied to subnets or interfaces.

Beyond detection, I typically integrate these queries into automated governance workflows—using alerts or scheduled jobs to either notify teams or trigger cleanup—so we continuously reduce cost and improve security posture.


One-Liner to Remember

👉
“Resource Graph + KQL = fast, cross-subscription visibility for orphaned resources.”


Here’s a solid production-ready pattern, plus a script approach you can talk through in an interview.

Production cleanup strategy

Use Azure Resource Graph for detection, then use Azure Automation with Managed Identity for controlled remediation. Resource Graph is built for cross-subscription inventory queries at scale, and its query language is based on KQL. You can run the same queries in the portal, Azure CLI with az graph query, or PowerShell with Search-AzGraph. (Microsoft Learn)

Safe workflow

Phase 1: Detect
Run queries for likely orphaned resources such as unattached disks, unused public IPs, unattached NICs, and unused NSGs. Azure documents advanced query samples and the CLI quickstart for running them. (Microsoft Learn)

Phase 2: Classify
Do not delete immediately. First separate findings into:

  • definitely orphaned
  • likely orphaned
  • needs human review

A good rule is to require at least one of these before cleanup:

  • older than X days
  • no keep tag
  • no recent change window
  • not in protected subscriptions or resource groups

You can also use Resource Graph change history to review whether a resource was recently modified before acting. (Microsoft Learn)
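The classification step can be sketched as a small shell function. This is a sketch under assumptions: it expects each finding as a JSON object with `.tags` and `.properties.timeCreated` (field names are illustrative and should be matched to your actual query output), and it uses GNU `date` for age math.

```shell
#!/usr/bin/env bash
# Bucket a single detection result into definitely/likely/needs-review.
# Assumes jq and GNU date; JSON field names are illustrative.
classify() {
  local json="$1" keep created now age_days
  keep=$(echo "$json" | jq -r '.tags.doNotDelete // "false"')
  created=$(echo "$json" | jq -r '.properties.timeCreated // empty')
  if [ "$keep" = "true" ]; then
    echo "needs-human-review"        # explicitly protected by tag
  elif [ -z "$created" ]; then
    echo "likely-orphaned"           # no age info: never auto-delete
  else
    now=$(date +%s)
    age_days=$(( (now - $(date -d "$created" +%s)) / 86400 ))
    if [ "$age_days" -ge 30 ]; then
      echo "definitely-orphaned"     # old enough to pass the age rule
    else
      echo "likely-orphaned"
    fi
  fi
}

classify '{"name":"old-disk","tags":{},"properties":{"timeCreated":"2020-01-01T00:00:00Z"}}'
# → definitely-orphaned
```

In a real pipeline you would feed it one resource per line from the Resource Graph JSON output and route each bucket to a different workflow.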

Phase 3: Notify
Send a report to the owning team or central platform team. Include:

  • resource ID
  • resource group
  • subscription
  • resource age or last change
  • proposed action
  • deadline for objection

Phase 4: Quarantine before delete
For risky resource types, first tag them with something like:

  • cleanupCandidate=true
  • cleanupMarkedDate=2026-04-13
  • cleanupOwner=platform

Then wait 7 to 30 days depending on the environment.
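The waiting-window check can be sketched as a predicate on the `cleanupMarkedDate` tag value. A minimal sketch, assuming GNU `date` and the `YYYY-MM-DD` tag format from the list above:

```shell
#!/usr/bin/env bash
# Returns success (0) only if the quarantine window has fully elapsed.
# Assumes GNU date and a cleanupMarkedDate tag in YYYY-MM-DD form.
quarantine_elapsed() {
  local marked="$1" wait_days="${2:-14}"
  local marked_s now_s
  marked_s=$(date -d "$marked" +%s) || return 2   # unparseable date: refuse
  now_s=$(date +%s)
  [ $(( (now_s - marked_s) / 86400 )) -ge "$wait_days" ]
}

if quarantine_elapsed "2020-01-01" 14; then
  echo "window elapsed: safe to move to the delete step"
fi
```

Deletion logic should call this gate first and skip anything still inside its window.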

Phase 5: Delete with guardrails
Only auto-delete low-risk items such as clearly unattached disks or unused public IPs after the waiting window. Keep production subscriptions on approval-based cleanup unless the criteria are extremely strict.

Good governance rules

A mature setup usually includes:

  • exclusion tags like doNotDelete=true
  • separate policy for prod vs non-prod
  • allowlist of critical subscriptions
  • dry-run mode by default
  • centralized logs of all cleanup actions
  • approval gate for medium-risk deletions

This aligns well with Azure’s broader security and operations guidance, and Azure Automation supports managed identities so runbooks can access Azure without stored secrets. (Microsoft Learn)

Example architecture

Azure Resource Graph
|
v
Scheduled Automation Runbook
(with Managed Identity)
|
+--> Query orphaned resources
+--> Filter by tags / age / subscription
+--> Write report to Storage / Log Analytics
+--> Notify owners
+--> Optional approval step
+--> Delete approved resources

Example: Azure CLI script

This is a simple version for unattached managed disks. Start in report-only mode.

#!/usr/bin/env bash
set -euo pipefail
QUERY="
Resources
| where type =~ 'microsoft.compute/disks'
| where properties.diskState =~ 'Unattached'
| project id, name, resourceGroup, subscriptionId, location, tags
"
echo "Finding unattached managed disks..."
az graph query -q "$QUERY" --first 1000 -o json > orphaned-disks.json
echo "Report saved to orphaned-disks.json"
cat orphaned-disks.json | jq -r '.data[] | [.subscriptionId, .resourceGroup, .name, .id] | @tsv'

Azure CLI supports az graph query for Resource Graph queries. (Microsoft Learn)

Example: safer delete flow in Bash

This version only deletes disks that:

  • are unattached
  • are not tagged doNotDelete=true

#!/usr/bin/env bash
set -euo pipefail
QUERY="
Resources
| where type =~ 'microsoft.compute/disks'
| where properties.diskState =~ 'Unattached'
| extend doNotDelete = tostring(tags.doNotDelete)
| where doNotDelete !~ 'true'
| project id, name, resourceGroup, subscriptionId, location
"
RESULTS=$(az graph query -q "$QUERY" --first 1000 -o json)
echo "$RESULTS" | jq -c '.data[]' | while read -r row; do
  ID=$(echo "$row" | jq -r '.id')
  NAME=$(echo "$row" | jq -r '.name')
  echo "Deleting unattached disk: $NAME"
  az resource delete --ids "$ID"
done

For production, add:

  • dry-run flag
  • approval list
  • deletion logging
  • retry handling
  • resource locks check
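Two of those additions, the dry-run flag and the locks check, can be sketched as a wrapper around the delete call. The lock-listing flags here are an assumption and should be verified against your CLI version; with `DRY_RUN=1` (the default) nothing is ever executed against Azure.

```shell
#!/usr/bin/env bash
# Guardrail sketch: dry-run by default, and skip lock-protected resources.
# The az lock flags are illustrative; verify them for your CLI version.
DRY_RUN="${DRY_RUN:-1}"

delete_resource() {
  local id="$1"
  if [ "$DRY_RUN" = 1 ]; then
    echo "[dry-run] would delete: $id"
    return 0
  fi
  # Skip anything protected by a management lock.
  if [ -n "$(az lock list --resource "$id" -o tsv 2>/dev/null)" ]; then
    echo "[skip] locked: $id"
    return 0
  fi
  az resource delete --ids "$id"
  echo "[deleted] $id"   # also append this to your central cleanup log
}

delete_resource "/subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Compute/disks/d1"
```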

Example: PowerShell runbook pattern

This is closer to what many platform teams use in Azure Automation.

Disable-AzContextAutosave -Scope Process
Connect-AzAccount -Identity
$query = @"
Resources
| where type =~ 'microsoft.network/publicipaddresses'
| where isnull(properties.ipConfiguration)
| extend doNotDelete = tostring(tags.doNotDelete)
| where doNotDelete !~ 'true'
| project id, name, resourceGroup, subscriptionId, location
"@
$results = Search-AzGraph -Query $query
foreach ($item in $results) {
    Write-Output "Cleanup candidate: $($item.name) [$($item.id)]"
    # Dry run by default
    # Remove-AzResource -ResourceId $item.id -Force
}

Search-AzGraph is the PowerShell command for Resource Graph, and Azure Automation supports system-assigned or user-assigned managed identities for authenticating runbooks securely. (Microsoft Learn)

What to say in an interview

A strong answer would sound like this:

I’d use Azure Resource Graph to detect orphaned resources across subscriptions, then feed those results into an Azure Automation runbook running under Managed Identity. I would never delete immediately. Instead, I’d apply filters like age, tags, subscription scope, and recent change history, then notify owners or mark resources for cleanup first. For low-risk resources in non-production, I might automate deletion after a quarantine period. For production, I’d usually keep an approval gate. That gives you cost control without creating operational risk. (Microsoft Learn)

Best resource types to target first

Start with the safest, highest-confidence cleanup candidates:

  • unattached managed disks
  • public IPs with no association
  • NICs not attached to VMs
  • NSGs not attached to subnets or NICs (Microsoft Learn)

Most Secure Identity in Microsoft Azure



The most secure identity type is:

👉 Managed Identity

Why Managed Identity is the most secure:

  • No credentials to store (no passwords, secrets, or keys)
  • Automatically managed by Azure
  • Uses Microsoft Entra ID behind the scenes
  • Eliminates risk of:
    • Credential leaks
    • Hardcoded secrets in code

Example:

An Azure VM accessing Azure Key Vault using Managed Identity—no secrets needed at all.
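From inside that VM, the flow is two CLI calls with no secret anywhere. The vault and secret names below are placeholders, and the helper only echoes the commands by default, since they can only succeed on an Azure VM with a managed identity.

```shell
#!/usr/bin/env bash
# Echo-only sketch (RUN=1 to actually execute, on an Azure VM only).
cmd() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

# The Azure Instance Metadata Service vouches for the VM; no password involved.
cmd az login --identity

# RBAC / access policy on the vault decides whether this identity may read.
cmd az keyvault secret show --vault-name kv-demo --name db-password \
  --query value -o tsv
```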


🧩 Types of Identities in Azure

There are 3 main identity types you should know:


1. 👤 User Identity

  • Represents a person
  • Used for:
    • Logging into Azure Portal
    • Admin access
  • Stored in Entra ID

2. 🧾 Service Principal

  • Identity for applications or services
  • Used in:
    • CI/CD pipelines (e.g., GitHub Actions)
    • Automation scripts
  • Requires:
    • Client ID + Secret or Certificate

⚠️ Less secure than Managed Identity because secrets must be managed


3. 🤖 Managed Identity (Best Practice)

  • Special type of Service Principal managed by Azure
  • Two subtypes:

• System-assigned

  • Tied to one resource (e.g., VM, App Service)
  • Deleted when resource is deleted

• User-assigned

  • Standalone, independent of any single resource
  • Can be shared across multiple resources

🧠 Interview-Ready Answer

“The most secure identity in Azure is Managed Identity because it eliminates the need to manage credentials like client secrets or certificates. It’s automatically handled by Azure and integrates with Entra ID, reducing the risk of credential leakage.

In Azure, there are three main identity types: user identities for people, service principals for applications, and managed identities, which are a more secure, Azure-managed version of service principals. Managed identities come in system-assigned and user-assigned forms, depending on whether they’re tied to a single resource or reusable across multiple resources.”


Managed Identity is usually the best choice—but not always.


🚫 When NOT to Use Managed Identity in Microsoft Azure

1. ❌ Accessing Resources Outside Azure

Managed Identity only works within Azure + Microsoft Entra ID.

👉 Don’t use it if:

  • You need to access:
    • AWS / GCP services
    • External APIs (Stripe, GitHub, etc.)
    • On-prem systems without Entra integration

✔️ Use instead:

  • Service Principal (with secret/cert)
  • Or API keys / OAuth depending on the service

2. ❌ Cross-Tenant Access

Managed Identities are tied to one Azure tenant.

👉 Problem:

  • You can’t easily use a Managed Identity to authenticate into another tenant

✔️ Use instead:

  • Service Principal with explicit cross-tenant permissions

3. ❌ Local Development / Non-Azure Environments

Managed Identity only exists inside Azure resources.

👉 Doesn’t work:

  • On your laptop
  • In local Docker containers
  • On-prem servers

✔️ Use instead:

  • Developer login (az login)
  • Service Principal for testing

4. ❌ CI/CD Pipelines Outside Azure (Important!)

If your pipeline runs in:

  • GitHub-hosted runners
  • Jenkins
  • GitLab

👉 Managed Identity won’t work directly (no Azure resource identity)

✔️ Use instead:

  • Service Principal
    OR (better modern approach):
  • Federated Identity Credentials (OIDC)

5. ❌ Fine-Grained Credential Control Needed

Managed Identity is:

  • Automatically rotated
  • Not directly visible or exportable

👉 Not ideal when:

  • You need explicit credential lifecycle control
  • You must integrate with legacy systems requiring static credentials

6. ❌ Unsupported Services / Legacy Scenarios

Some older or niche services:

  • Don’t support Managed Identity authentication

✔️ You’re forced to use:

  • Service Principal
  • Connection strings / secrets (secured via Azure Key Vault)

⚖️ Quick Rule of Thumb

👉 Use Managed Identity when:

  • Resource is in Azure
  • Target service supports Entra ID
  • Same tenant

👉 Avoid it when:

  • Outside Azure
  • Cross-tenant
  • Local/dev or external CI/CD

🧠 Interview-Level Answer

“Managed Identity is the most secure option in Azure, but it’s not suitable in all scenarios. For example, it doesn’t work outside Azure environments, so for local development or external CI/CD systems like GitHub Actions, you’d need a service principal or federated identity. It’s also limited to a single Entra ID tenant, so cross-tenant access scenarios typically require a service principal.

Additionally, if you’re integrating with external APIs or legacy systems that don’t support Entra ID, Managed Identity won’t work. In those cases, you fall back to service principals or other credential mechanisms, ideally storing secrets securely in Key Vault.”


This is exactly how interviewers probe deeper 👇


🎯 Tricky Scenario Question

“You have an application running in GitHub Actions that needs to deploy resources into Microsoft Azure. You want to avoid using secrets. Would you use Managed Identity?”


❗ What They Expect You to Notice

  • GitHub Actions runs outside Azure
  • ❌ No native Managed Identity available

👉 So if you answer “Managed Identity” → that’s wrong


✅ Strong Answer

“I would not use Managed Identity here because GitHub Actions runs outside Azure, so it doesn’t have access to a Managed Identity. Instead, I would use a Service Principal with Federated Identity Credentials using OIDC. This allows GitHub to authenticate to Azure without storing secrets, which maintains a high level of security.”


🔐 The Correct Architecture (Modern Best Practice)

  • GitHub Actions → OIDC token
  • Trusted by Microsoft Entra ID
  • Maps to a Service Principal
  • Azure grants access via RBAC

👉 Result:

  • ✅ No secrets
  • ✅ Short-lived tokens
  • ✅ Secure + scalable
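That architecture can be sketched as a GitHub Actions job. This assumes a Microsoft Entra app registration with a federated credential trusting this repository; the variable names are placeholders. Note that the client ID, tenant ID, and subscription ID are identifiers, not secrets.

```yaml
# Hypothetical workflow sketch for OIDC login to Azure.
permissions:
  id-token: write   # lets the job request an OIDC token from GitHub
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}       # app (client) ID, not a secret
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - name: Deploy
        run: az group list -o table
```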

🧠 Follow-Up Trap Question


Why not just use a Service Principal with a client secret?

🔥 Strong Answer:

“You can, but it introduces risk because the secret must be stored and rotated. If it’s leaked, it can be used until it expires. Federated identity with OIDC is more secure because it uses short-lived tokens and eliminates secret management entirely.”


💡 Bonus Edge Case

If you add this, you’ll stand out:

“In Azure-hosted pipelines like Azure DevOps with self-hosted agents running on Azure VMs, you could use Managed Identity—but for external platforms like GitHub Actions, federated identity is the better approach.”


🏁 One-Liner Summary

👉
“Managed Identity is best inside Azure; outside Azure, use federated identity instead of secrets.”


AZ Landing Zone with diagram

An Azure Landing Zone is basically the foundation of your cloud environment—a pre-configured setup in Microsoft Azure that ensures everything you build is secure, scalable, and well-organized from day one.


🧱 What is an Azure Landing Zone?

Think of it like setting up the rules and structure before building a city.

An Azure Landing Zone provides:

  • A standardized environment
  • Built using best practices (security, governance, networking)
  • Ready for workloads (apps, data, services) to be deployed

It’s part of the Cloud Adoption Framework (CAF) by Microsoft.


🧩 Core Components

1. Management Groups & Subscriptions

  • Organizes resources hierarchically
  • Example:
    • Root → Platform → Landing Zones → Workloads

2. Identity & Access Management

  • Uses Microsoft Entra ID
  • Controls:
    • Who can access what
    • Role-Based Access Control (RBAC)

3. Networking

  • Hub-and-spoke or Virtual WAN architecture
  • Includes:
    • VNets, subnets
    • Private endpoints
    • Firewalls

4. Governance & Policies

  • Uses Azure Policies to enforce rules:
    • Allowed regions
    • Naming conventions
    • Security requirements

5. Security & Monitoring

  • Tools like:
    • Microsoft Defender for Cloud
    • Microsoft Sentinel
  • Logging, alerts, compliance tracking

6. Platform Services

  • Shared services like:
    • Key Vault
    • DNS
    • Backup
  • Example: Azure Key Vault

🏗️ Types of Landing Zones

1. Platform Landing Zone

  • Shared infrastructure (networking, identity, security)

2. Application Landing Zone

  • Where actual apps/workloads run

🧠 Why It Matters (Interview Gold)

  • Prevents chaos and misconfiguration
  • Enforces security & compliance at scale
  • Enables automation (IaC + CI/CD)
  • Supports multi-team environments

💬 2-Minute Interview Answer (Polished)

“An Azure Landing Zone is a pre-configured, enterprise-ready environment in Microsoft Azure that provides the foundational setup for deploying workloads securely and consistently. It includes key components like management groups and subscriptions for organization, identity and access management through Entra ID, hub-and-spoke networking, and governance using Azure Policies.

It also integrates security and monitoring tools like Defender for Cloud and Sentinel, along with shared platform services such as Key Vault. The goal is to enforce best practices from the start—covering security, compliance, and scalability—so teams can deploy applications without worrying about underlying infrastructure standards.

In practice, I’ve seen Landing Zones implemented using Infrastructure as Code with tools like Terraform or Bicep, combined with CI/CD pipelines, to ensure everything is repeatable and governed automatically.”


Azure – Landing Zone

An Azure Landing Zone is the “plumbing and wiring” of your cloud environment. It is a set of best practices, configurations, and governance rules that ensure a subscription is ready to host workloads securely and at scale.

If you think of a workload (like a website or database) as a house, the Landing Zone is the city block—it provides the electricity, water, roads, and security so the house can function.


🏛️ The Conceptual Architecture

A landing zone follows a Hub-and-Spoke design, ensuring that common services (like firewalls and identity) aren’t repeated for every single application.

1. The Management Group Hierarchy

Instead of managing one giant subscription, you organize them into “folders” called Management Groups:

  • Platform: Contains the “Engine Room” (Identity, Management, and Connectivity).
  • Workloads (Landing Zones): Where your actual applications live (Production, Development, Sandbox).
  • Decommissioned: Where old subscriptions go to die while retaining data for audit.

🏗️ The 8 Critical Design Areas

When you build a landing zone, you must make decisions in these eight categories:

  1. Enterprise Agreement (EA) & Tenants: How you bill and manage the top-level account.
  2. Identity & Access Management (IAM): Setting up Microsoft Entra ID and RBAC.
  3. Network Topology: Designing the Hub-and-Spoke, VNet peering, and hybrid connectivity (VPN/ExpressRoute).
  4. Resource Organization: Establishing a naming convention and tagging strategy.
  5. Security: Implementing Defender for Cloud and Azure Policy.
  6. Management: Centralizing logging in a Log Analytics Workspace.
  7. Governance: Using Azure Policy to prevent “shadow IT” (e.g., “No VMs allowed outside of East US”).
  8. Deployment: Using Infrastructure as Code (Terraform, Bicep, or Pulumi) to deploy the environment.

🚀 Two Main Implementation Paths

A. “Platform” Landing Zone (The Hub)

This is the central infrastructure managed by your IT/Cloud Platform team.

  • Connectivity Hub: Contains Azure Firewall, VPN Gateway, and Private DNS Zones.
  • Identity: Dedicated subscription for Domain Controllers or Entra Domain Services.
  • Management: Centralized Log Analytics and Automation accounts.

B. “Application” Landing Zone (The Spoke)

This is a subscription handed over to a development team.

  • It comes pre-configured with network peering back to the Hub.
  • It has Policies already applied (e.g., “Encryption must be enabled on all disks”).
  • The dev team has “Contributor” rights to build their app, but they cannot break the underlying network or security rules.

🛠️ How do you actually deploy it?

Microsoft provides the “Accelerator”—a set of templates that allow you to deploy a fully functional enterprise-scale environment in a few clicks or via code.

  1. Portal-based: Use the “Azure Landing Zone Accelerator” in the portal.
  2. Bicep/Terraform: Use the official Azure/terraform-azurerm-caf-enterprise-scale modules.
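As a sketch, the Terraform route can start from a single module block. The inputs shown (`root_parent_id`, `root_id`, `root_name`) reflect the module's common usage but vary across versions, so treat this as illustrative rather than a pinned configuration.

```hcl
# Illustrative sketch; check the module's docs for required providers
# and version-specific inputs.
data "azurerm_client_config" "core" {}

module "enterprise_scale" {
  source = "Azure/caf-enterprise-scale/azurerm"

  root_parent_id = data.azurerm_client_config.core.tenant_id
  root_id        = "contoso"               # prefix for the management group hierarchy
  root_name      = "Contoso Landing Zone"
}
```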

✅ Why do it?

  • Scalability: You can add 100 subscriptions without manual setup.
  • Security: Guardrails are “baked in” from day one.
  • Cost Control: Centralized monitoring stops “orphan” resources from running up the bill.

Azure DNS zone with autoregistration enabled

Here’s what it means in plain terms:

The short version

When you link a Virtual Network to a Private DNS Zone with autoregistration enabled, Azure automatically maintains DNS records for every VM in that VNet. You don’t touch the DNS zone manually — Azure handles it for you.

What happens at each VM lifecycle event

When you link a virtual network with a private DNS zone with this setting enabled, a DNS record gets created for each virtual machine deployed in the virtual network. For each virtual machine, an address (A) record is created.

If autoregistration is enabled, Azure Private DNS updates DNS records whenever a virtual machine inside the linked virtual network is created, changes its IP address, or is deleted.

So the three automatic actions are:

  • VM created → A record added (vm-web-01 → 10.0.0.4)
  • VM IP changes → A record updated automatically
  • VM deleted or deallocated → A record removed from the zone

What powers it under the hood

The private zone’s records are populated by the Azure DHCP service — client registration messages are ignored. This means it’s the Azure platform doing the work, not the VM’s operating system. If you configure a static IP on the VM without using Azure’s DHCP, changes to the hostname or IP won’t be reflected in the zone.

Important limits to know

A specific virtual network can be linked to only one private DNS zone when automatic registration is enabled. You can, however, link multiple virtual networks to a single DNS zone.

Autoregistration works only for virtual machines. For all other resources like internal load balancers, you can create DNS records manually in the private DNS zone linked to the virtual network.

Also, autoregistration doesn’t support reverse DNS pointer (PTR) records.

The practical benefit

In a classic setup without autoregistration, every time a VM is deployed or its IP changes, someone has to go manually update the DNS zone. With autoregistration on, your VMs are always reachable by a friendly name like vm-web-01.internal.contoso.com from anywhere inside the linked VNet — with zero manual effort, and no stale records left behind after deletions.
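Note that autoregistration is a property of the VNet link, not of the zone itself. A sketch of the two CLI calls, with placeholder names and an echo-only helper so the commands can be reviewed before running:

```shell
#!/usr/bin/env bash
# Echo-only sketch (RUN=1 to actually execute). Zone/VNet names are placeholders.
cmd() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

cmd az network private-dns zone create \
  --resource-group rg-dns --name internal.contoso.com

# --registration-enabled true is what turns on autoregistration for this VNet.
cmd az network private-dns link vnet create \
  --resource-group rg-dns --zone-name internal.contoso.com \
  --name link-vnet-prod --virtual-network vnet-prod \
  --registration-enabled true
```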

AZ – IAM

Azure IAM is best understood as two interlocking systems working together. It is built around one question answered in two steps: who are you? and what are you allowed to do? Each of those steps maps to a distinct pillar.


Pillar 1 — Microsoft Entra ID (formerly Azure Active Directory): identity

This is the authentication layer. It answers “who are you?” by verifying credentials and issuing a token. It manages every type of identity in Azure: human users, guest accounts, groups, service principals (for apps and automation), and managed identities (the zero-secret identity type where Azure owns the credential). It also enforces Conditional Access policies — rules that say things like “only allow login from compliant devices” or “require MFA when signing in from outside the corporate network.”

Pillar 2 — Azure RBAC (Role-Based Access Control): access

This is the authorization layer. It answers “what can you do?” once identity is proven. RBAC works through three concepts combined into a role assignment:

  • A security principal — the identity receiving the role (user, group, service principal, or managed identity)
  • A role definition — what actions are permitted (e.g., Owner, Contributor, Reader, or a custom role)
  • A scope — where the role applies, which follows a hierarchy: Management Group → Subscription → Resource Group → individual Resource

A role assigned at a higher scope automatically inherits down. Give someone Reader on a subscription and they can read everything inside it.
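The three concepts come together in a single CLI call. A sketch with placeholder IDs, echo-only by default; note the scope is deliberately narrowed to a resource group rather than the whole subscription:

```shell
#!/usr/bin/env bash
# Echo-only sketch (RUN=1 to actually execute). IDs are placeholders.
cmd() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

# principal (--assignee) + role definition (--role) + scope (--scope)
cmd az role assignment create \
  --assignee "00000000-0000-0000-0000-000000000000" \
  --role "Reader" \
  --scope "/subscriptions/<sub-id>/resourceGroups/rg-app"
```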

The supporting tools

Three tools round out a mature IAM setup. PIM (Privileged Identity Management) implements just-in-time access — instead of being a permanent Owner, you request elevation for 2 hours, do the work, and the permission expires automatically. Access Reviews let you periodically re-validate who still needs access, cleaning up stale assignments. Azure Policy enforces guardrails at scale — for example, preventing anyone from assigning Owner at the subscription level without an approval workflow.

The core principle threading through all of it

Least privilege: grant the minimum role, at the narrowest scope, for the shortest duration. This is what PIM, custom roles, and resource-group-level assignments all support — shrinking the blast radius if any identity is ever compromised.

Types of ID are in Azure

Here’s the full breakdown:


🏆 Most secure identity: Managed Identity

What makes managed identities uniquely secure is that no one knows the credentials — they are automatically created by Azure, including the credentials themselves. This eliminates the biggest risk in cloud security: leaked or hardcoded secrets. Managed identity replaces secrets such as access keys or passwords, and can also replace certificates or other forms of authentication for service-to-service dependencies.


How many identity types are there in Azure?

At a high level, there are two types of identities: human and machine/non-human identities. Machine/non-human identities consist of device and workload identities. In Microsoft Entra, workload identities are applications, service principals, and managed identities.

Breaking it down further, Azure has 4 main categories with several sub-types:

1. Human identities

  • User accounts (employees, admins)
  • Guest/B2B accounts (external partners)
  • Consumer/B2C accounts (end-users via social login)

2. Workload/machine identities

  • Managed Identity — most secure; no secrets to manage
    • System-assigned: tied to the lifecycle of an Azure resource; when the resource is deleted, Azure automatically deletes the service principal.
    • User-assigned: a standalone Azure resource that can be assigned to one or more Azure resources — the recommended type for Microsoft services.
  • Service Principal — three main types exist: Application service principal, Managed identity service principal, and Legacy service principal.

3. Device identities

  • Entra ID joined (corporate devices)
  • Hybrid joined (on-prem + cloud)
  • Entra registered / BYOD (personal devices)

Why prefer Managed Identity over Service Principal?

Microsoft Entra tokens expire every hour, reducing exposure risk compared to Personal Access Tokens which can last up to one year. Managed identities handle credential rotation automatically, and there is no need to store long-lived credentials in code or configuration. Service principals, by contrast, require you to manually rotate client secrets or certificates — a 2025 report highlighted that 23.77 million secrets were leaked on GitHub in 2024 alone, underscoring the risks of hardcoded credentials.

The rule of thumb: use Managed Identity whenever your workload runs inside Azure. Use a Service Principal only when you need to authenticate from outside Azure (CI/CD pipelines, on-premises systems, multi-cloud).

How to investigate spike in azure


Quick Decision Tree First

Do you know which service spiked?

  • Yes → Skip to Step 3
  • No → Start at Step 1

Step 1: Pinpoint the Spike in Cost Management

  • Azure Portal → Cost Management → Cost Analysis
  • Set view to Daily to find the exact day
  • Group by Service Name first → tells you what spiked
  • Then group by Resource → tells you which specific resource

Step 2: Narrow by Dimension

Keep drilling down by:

  • Resource Group
  • Resource type
  • Region (unexpected cross-region egress is a common hidden cost)
  • Meter (very granular — shows exactly what operation you’re being charged for)

Step 3: Go to the Offending Resource

Once you know what it is:

Service                 Where to look
VM / VMSS               Check scaling events, uptime, instance count
Storage                 Check blob transactions, egress, data written
Azure SQL / Synapse     Query history, DTU spikes, long-running queries
ADF (Data Factory)      Pipeline run history — loops, retries, backfills
Databricks              Cluster history — was a cluster left running?
App Service             Scale-out events, request volume
Azure Functions         Execution count — was something stuck in a loop?

Step 4: Check Activity Log

  • Monitor → Activity Log
  • Filter by the spike timeframe
  • Look for:
    • New resource deployments
    • Scaling events
    • Config changes
    • Who or what triggered it (user vs service principal)

This answers “what changed?”
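The same Activity Log query can be run from the CLI (the resource group and timestamps below are placeholders for your spike window):

```shell
# Activity log entries for the spike window
az monitor activity-log list \
  --resource-group myRG \
  --start-time 2026-04-08T00:00:00Z --end-time 2026-04-09T00:00:00Z \
  --query "[].{when:eventTimestamp, op:operationName.localizedValue, who:caller}" \
  --output table
```

The `who` column distinguishes a human user from a service principal, which answers "who or what triggered it".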


Step 5: Check Azure Monitor Metrics

  • Go to the specific resource → Metrics
  • Look at usage metrics around the spike time:
    • CPU / memory
    • Data in/out (egress is often the culprit)
    • Request count
    • DTU / vCore usage

Correlate the metric spike timeline with the cost spike timeline.
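Metrics can be pulled from the CLI as well. A sketch for network egress on a VM (the resource ID and time window are placeholders, and the metric name varies by resource type):

```shell
# Hourly network egress for a VM around the spike
az monitor metrics list \
  --resource "/subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.Compute/virtualMachines/myVM" \
  --metric "Network Out Total" \
  --start-time 2026-04-08T00:00:00Z --end-time 2026-04-09T00:00:00Z \
  --interval PT1H --aggregation Total \
  --output table
```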


Step 6: Check Logs (Log Analytics / KQL)

If you have Log Analytics workspace connected:

// Example: Find expensive or long-running operations
AzureActivity
| where TimeGenerated between (datetime(2026-04-01) .. datetime(2026-04-11))
| where ActivityStatusValue == "Success"
| summarize count() by OperationNameValue, ResourceGroup
| order by count_ desc

// Check for VM scaling events
AzureActivity
| where OperationNameValue contains "virtualMachines"
| where TimeGenerated > ago(7d)
| project TimeGenerated, Caller, OperationNameValue, ResourceGroup

Step 7: Check for Common Culprits

These are the most frequent causes of unexpected spikes:

  • 🔁 Pipeline/job stuck in a loop (ADF, Functions, Logic Apps)
  • 💾 Unexpected data egress (cross-region or internet-bound traffic)
  • 📈 Auto-scaling that didn’t scale back down
  • 🗄️ Full table scan or bad query in SQL/Synapse
  • 🖥️ VM or cluster left running after a job
  • 📦 Historical data backfill triggered accidentally
  • 🔄 Snapshot or backup policy changed

The Mental Model

Cost Analysis (when + what?)
→ Drill by dimension (which resource?)
→ Activity Log (what changed?)
→ Metrics (how did usage behave?)
→ Logs/KQL (why did it happen?)

Azure VM Stress Testing: CPU, Memory, and I/O

If you need to put load on an Azure VM for testing (like checking if your Azure Monitor Alerts or Autoscale settings are working), using a Perl script is a classic, lightweight way to do it.

Below are three scripts tailored for different types of “stress”: CPU, Memory, and I/O.


1. CPU Load Script

This script performs high-intensity mathematical calculations in a loop to pin the processor.

Perl

#!/usr/bin/perl
# CPU Stress Test
use strict;
use warnings;

print "Starting CPU Load... Press Ctrl+C to stop.\n";

# This loop runs on a single core.
# To stress multiple cores, run this script multiple times in the background.
while (1) {
    my $x = sqrt(rand(1_000_000)) * sin(rand(1_000_000));
}
  • Pro Tip: If your VM has 4 cores and you want to hit 100% total CPU, run this 4 times: perl cpu_load.pl & perl cpu_load.pl & perl cpu_load.pl & perl cpu_load.pl &

2. Memory (RAM) Load Script

This script creates a massive string and keeps adding to it to consume available RAM. Warning: Be careful with this; if it consumes all RAM, the Linux OOM (Out of Memory) killer might crash the VM.

Perl

#!/usr/bin/perl
# Memory Stress Test
use strict;
use warnings;

print "How many MB of RAM should I consume? ";
my $mb_to_hit = <STDIN>;
chomp($mb_to_hit);

my $data  = "";
my $chunk = "A" x (1024 * 1024); # 1 MB string chunk

print "Allocating memory...\n";
for (1..$mb_to_hit) {
    $data .= $chunk;
    print "Currently holding approx $_ MB\n" if $_ % 100 == 0;
}
print "Memory allocated. Press Enter to release memory and exit.";
<STDIN>;

3. I/O (Disk) Load Script

This script continuously writes and deletes a file to stress the Virtual Machine’s disk IOPS (Input/Output Operations Per Second).

Perl

#!/usr/bin/perl
# Disk I/O Stress Test
use strict;
use warnings;

my $filename = "test_load_file.tmp";
print "Starting Disk I/O load... Press Ctrl+C to stop.\n";

while (1) {
    open(my $fh, '>', $filename) or die "Could not open file: $!";
    print $fh "This is a stress test line\n" x 10000;
    close $fh;
    unlink($filename); # delete the file immediately so the write repeats
}

💡 The “Cloud Native” Alternative: stress-ng

While Perl scripts are great, most Azure engineers use a purpose-built tool called stress-ng, which gives you much more granular control over exactly how many cores or how much RAM you hit.

To install and run (Ubuntu/Debian):

Bash

sudo apt update && sudo apt install stress-ng -y
# Stress 2 CPUs for 60 seconds
stress-ng --cpu 2 --timeout 60s
# Stress 1GB of RAM
stress-ng --vm 1 --vm-bytes 1G --timeout 60s

🛑 Important Reminder

When putting load on a VM, keep a separate window open with the command top or htop (if installed) to monitor the resource usage in real-time. If you are testing Azure Autoscale, remember that it usually takes 5–10 minutes for the Azure portal to reflect the spike and trigger the scaling action!