It sounds like you are describing a “Traffic Spike” or perhaps a “Burst” in resource usage. Since there are no alerts, you are in “detective mode,” looking for a silent surge that hasn’t crossed a threshold yet but is clearly visible in your telemetry.
If you heard “frost,” you might be referring to “Cold Start” spikes (common in Serverless/Functions) or a “Request Spike.” Here are the steps to track down the source of a sudden surge in Azure:
📊 Step 1: Use Azure Monitor “Metrics Explorer”
Since you don’t have alerts, you need to visualize the spike to see its “shape.”
- Go to the resource (e.g., App Service, VM, or Load Balancer).
- Select Metrics from the left menu.
- Add the Request Count (for apps) or CPU/Network In (for VMs) metric.
- The Secret Step: Change the “Aggregation” to Count or Sum and look for the exact timestamp of the spike.
- Apply Splitting: Split the metric by “RemoteIP” or “Instance”. This tells you if the spike is coming from one specific user/IP or hitting one specific server.
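If you want the same per-IP slice outside the portal, it can be sketched in Log Analytics, assuming your App Service HTTP logs are routed there (the AppServiceHTTPLogs table and its CIp column are assumptions about your diagnostic settings):

```kql
// Sketch: per-client-IP request counts around the spike window.
// AppServiceHTTPLogs / CIp assume App Service diagnostics flow to Log Analytics.
AppServiceHTTPLogs
| where TimeGenerated between (datetime(2026-04-10T11:30:00Z) .. datetime(2026-04-10T12:30:00Z))
| summarize Requests = count() by bin(TimeGenerated, 5m), CIp
| render timechart
```

A single line dominating the chart is the KQL equivalent of “splitting by RemoteIP” in Metrics Explorer.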
🕵️ Step 2: Dig into Log Analytics (KQL)
If the metrics show a spike but not the “who,” you need the logs. This is where you find the “Source.”
- Go to Logs (Log Analytics Workspace).
- Run a query to find the top callers during that spike period.
Example KQL for App Gateways/Web Apps:
```kql
// Find the top 10 IP addresses causing the spike
AzureDiagnostics
| where TimeGenerated > datetime(2026-04-10T12:00:00Z) // Set to your spike time
| where Category == "ApplicationGatewayAccessLog"
| summarize RequestCount = count() by clientIP_s
| top 10 by RequestCount
```
- Result: If one IP address has 50,000 requests while others have 10, you’ve found a bot or a misconfigured client.
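A follow-up split by user agent often separates bots from real clients; the userAgent_s column reflects the documented Application Gateway access-log schema, but confirm it exists in your workspace:

```kql
// Sketch: split the spike traffic by user agent to spot bots.
// userAgent_s is an assumption about your AzureDiagnostics schema.
AzureDiagnostics
| where Category == "ApplicationGatewayAccessLog"
| where TimeGenerated > datetime(2026-04-10T12:00:00Z)
| summarize RequestCount = count() by userAgent_s
| top 10 by RequestCount
```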
📈 Step 3: Check “Application Insights” (App Level)
If the spike is happening inside your application code (e.g., a “Cold Start” or a heavy API call):
- Go to Application Insights > Failures or Performance.
- Look at the “Top 10 Operations”.
- Check if a specific API endpoint (e.g., /api/export) suddenly jumped in volume.
- Use User Map to see if the traffic is coming from a specific geographic region (e.g., a sudden burst of traffic from a country you don’t usually service).
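The same check can be run as a query in the Application Insights Logs blade. This sketch uses the classic requests table; a workspace-based resource uses AppRequests with slightly different column names:

```kql
// Sketch against the classic Application Insights schema.
requests
| where timestamp > datetime(2026-04-10T12:00:00Z)
| summarize Count = count() by name, client_CountryOrRegion
| top 10 by Count
```

One endpoint/region pair dominating the result usually pinpoints both the “what” and the “where” in one pass.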
🗺️ Step 4: Network Watcher (Infrastructure Level)
If you suspect the spike is at the “packet” level (like a DDoS attempt or a backup job gone rogue):
- Go to Network Watcher > NSG Flow Logs.
- Use Traffic Analytics. It provides a map showing which VNets or Public IPs are sending the most data.
- Check for “Flows”: It will show you the “Source Port” and “Destination Port.” If you see a spike on Port 22 (SSH) or 3389 (RDP), someone is likely trying to brute-force your VMs.
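If Traffic Analytics is enabled, its flow data lands in Log Analytics and can be queried directly. The AzureNetworkAnalytics_CL table and the column names below follow the documented Traffic Analytics schema, but verify them against your workspace:

```kql
// Sketch against the Traffic Analytics schema (verify table/column names first).
AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
| where TimeGenerated > datetime(2026-04-10T12:00:00Z)
| summarize Flows = sum(AllowedInFlows_d + DeniedInFlows_d) by SrcIP_s, DestPort_d
| top 10 by Flows
```

A pile of flows against DestPort_d 22 or 3389 from a single SrcIP_s is the brute-force pattern described above.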
🤖 Step 5: Check for “Auto-Scaling” Events
Sometimes the “spike” isn’t a problem, but a reaction.
- Go to Activity Log.
- Filter for “Autoscale” events.
- If the spike happened exactly when a new instance was added, the “spike” might actually be the resource “warming up” (loading caches, etc.), which can look like a surge in CPU or Disk I/O.
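If the Activity Log is exported to your workspace, the same check works in KQL (the AzureActivity table is an assumption about your export settings; adjust the window to bracket your spike):

```kql
// Sketch, assuming Activity Log export to Log Analytics (AzureActivity table).
AzureActivity
| where TimeGenerated between (datetime(2026-04-10T11:00:00Z) .. datetime(2026-04-10T13:00:00Z))
| where OperationNameValue has "autoscale"
| project TimeGenerated, OperationNameValue, ActivityStatusValue, ResourceGroup
| order by TimeGenerated asc
```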
Summary Checklist:
- Metrics Explorer: To see when it happened and how big it was.
- Log Analytics (KQL): To find the specific Client IP or User Agent.
- Traffic Analytics: To see if it was a Network-level burst.
- Activity Log: To see if any Manual Changes or Scaling occurred at that exact second.
This is a common real-world “mystery spike” case. Since you mentioned “frost spike” and “source space,” you are likely referring to a Cost Spike or a Request/Throughput Spike in your resource namespace.
If there are no alerts firing, it means the spike either didn’t hit a specific threshold or was too brief to trigger a standard “Static” alert.
🗓️ Step 1: Establish the “When” and “What”
First, you need to see the “DNA” of the spike using Azure Monitor Metrics.
- Look at the Graph: Is it a “Square” spike (starts and stops abruptly, like a scheduled job)? Or a “Needle” spike (hits a peak and drops, like a bot attack)?
- Identify the Resource: Go to Metrics Explorer and check:
- For VMs: Percentage CPU or Network In/Out.
- For Storage/SQL: Transactions or DTU Consumption.
- For App Services: Requests or Data In.
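If platform metrics are routed to a Log Analytics workspace, the spike’s shape can also be charted in KQL. The AzureMetrics table only exists when a diagnostic setting sends metrics there, and the metric names below are examples, not a fixed list:

```kql
// Sketch: chart candidate metrics around the spike window.
// AzureMetrics is populated only when diagnostics route platform metrics.
AzureMetrics
| where TimeGenerated between (datetime(2026-04-10T11:00:00Z) .. datetime(2026-04-10T13:00:00Z))
| where MetricName in ("Percentage CPU", "Transactions", "Requests")
| summarize Avg = avg(Average) by bin(TimeGenerated, 5m), MetricName, Resource
| render timechart
```

A “square” spike shows up as a plateau in this chart; a “needle” spike as a single tall bin.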
🔍 Step 2: Finding the Source (The Detective Work)
Since you don’t know where it came from, you use “Splitting” and “Filtering” in Metrics Explorer.
- Split by Instance/Role: If you have 10 servers, split by InstanceName. Does only one server show the spike? If yes, it’s a local process (like a hanging Windows Update or a failed log rotation).
- Split by Operation: For Storage or SQL, split by API Name. Is it GetBlob? PutBlob? This tells you whether you are reading too much or writing too much.
- Split by Remote IP: If your load balancer shows the spike, split by ClientIP. If one IP has 100x the traffic of the others, you’ve found your source.
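For the “split by operation” case on a storage account, resource-specific tables make this a one-liner, assuming blob diagnostics are routed to Log Analytics (the StorageBlobLogs table is an assumption about your setup):

```kql
// Sketch, assuming blob diagnostic logs flow to the StorageBlobLogs table.
StorageBlobLogs
| where TimeGenerated > datetime(2026-04-10T12:00:00Z)
| summarize Count = count() by OperationName // e.g., GetBlob vs. PutBlob
| top 10 by Count
```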
🕵️ Step 3: Deep Dive with Log Analytics (KQL)
Metrics only show numbers. Logs show names. You need to run a KQL query in your Log Analytics Workspace.
Query to find “Who is talking to me”:
```kql
// This finds the top 5 callers during the spike window
AzureDiagnostics
| where TimeGenerated > datetime(2026-04-10T12:00:00Z) // Use your spike time
| summarize RequestCount = count() by clientIp_s, requestUri_s
| top 5 by RequestCount
```
- Result: This will literally list the IP address and the specific URL they were hitting.
💰 Step 4: The “Cost” Investigation
If the spike is financial (a “Cost Spike”), you check Azure Cost Management.
- Cost Analysis: View cost by Resource. Did one specific Disk or Data Transfer cost jump?
- Check for “Orphaned” Resources: Sometimes a spike comes from a process that created 1,000 snapshots or temporary disks and forgot to delete them.
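Orphaned resources can be enumerated with an Azure Resource Graph query (run in Resource Graph Explorer, not Log Analytics); the diskState filter below follows the documented unattached-disk pattern:

```kql
// Resource Graph sketch: list unattached managed disks that may be driving cost.
resources
| where type == "microsoft.compute/disks"
| where properties.diskState == "Unattached"
| project name, resourceGroup, timeCreated = properties.timeCreated
```

A similar filter on snapshot types catches the “1,000 forgotten snapshots” case.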
🤫 Step 5: Check the “Silent” Sources
If the metrics and logs don’t show an external attacker, check internal Azure “automated” sources:
- Resource Graph: Check for “Change Tracking.” Did someone deploy code or change a firewall rule at that exact minute?
- Backup/Recovery Services: A “huge spike” in disk I/O often aligns with a Storage Snapshot or an Azure Backup job starting.
- Defender for Cloud: Even if you don’t have a “Metric Alert,” check the Security Alerts. Defender might have seen the spike and flagged it as “Suspicious PowerShell Activity” or “Port Scanning.”
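To correlate the spike with deployments or configuration changes, the Activity Log can be filtered for administrative writes around the spike window (again assuming AzureActivity export; the has "write" filter is a loose heuristic, not a precise operation list):

```kql
// Sketch: who changed what around the spike (loose "write" heuristic).
AzureActivity
| where TimeGenerated between (datetime(2026-04-10T11:30:00Z) .. datetime(2026-04-10T12:30:00Z))
| where CategoryValue == "Administrative"
| where OperationNameValue has "write"
| project TimeGenerated, OperationNameValue, Caller, ResourceGroup
```

The Caller column tells you whether it was a person or a service principal (i.e., automation).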
✅ Summary Checklist
| Step | Action | Tool |
| --- | --- | --- |
| 1. Visualize | See the shape and duration of the spike. | Metrics Explorer |
| 2. Isolate | Split metrics by IP or Instance. | Metrics Explorer |
| 3. Identify | Run a query to find the specific Client IP or User. | Log Analytics (KQL) |
| 4. Correlate | Check if any “Deployments” happened at that time. | Activity Log / Change Analysis |
| 5. Network | Check for massive data transfers between regions. | Network Watcher / Traffic Analytics |
How to prevent this next time? Once you find the source, create a Dynamic Threshold Alert. Unlike static alerts, these use AI to learn your “normal” pattern and will fire if a spike looks “unusual,” even if it doesn’t hit a high maximum number.