“Traffic Spike” or perhaps a “Burst” in resource usage

It sounds like you are describing a “Traffic Spike” or perhaps a “Burst” in resource usage. Since there are no alerts, you are in “detective mode,” looking for a silent surge that hasn’t crossed a threshold yet but is clearly visible in your telemetry.

If you heard “frost,” you might be referring to “Cold Start” spikes (common in Serverless/Functions) or a “Request Spike.” Here are the steps to track down the source of a sudden surge in Azure:


πŸ” Step 1: Use Azure Monitor “Metrics Explorer”

Since you don’t have alerts, you need to visualize the spike to see its “shape.”

  1. Go to the resource (e.g., App Service, VM, or Load Balancer).
  2. Select Metrics from the left menu.
  3. Add the Request Count (for apps) or CPU/Network In (for VMs) metric.
  4. The Secret Step: Change the “Aggregation” to Count or Sum and look for the exact timestamp of the spike.
  5. Apply Splitting: Split the metric by “RemoteIP” or “Instance”. This tells you if the spike is coming from one specific user/IP or hitting one specific server.
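
If your platform metrics are also routed to a Log Analytics workspace via diagnostic settings, you can pinpoint the spike window with a query instead of scrolling the chart. A minimal sketch, assuming the AzureMetrics table is populated and the metric is named "Requests" (swap in the metric your resource actually emits):

```kusto
// Find the five busiest 5-minute windows for a metric
// (assumes diagnostic settings send platform metrics to Log Analytics)
AzureMetrics
| where MetricName == "Requests"      // use "Percentage CPU" for VMs
| summarize PeakValue = max(Maximum) by bin(TimeGenerated, 5m)
| top 5 by PeakValue
```

The timestamps this returns are the ones to plug into the log queries in the next step.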

πŸ•΅οΈ Step 2: Dig into Log Analytics (KQL)

If the metrics show a spike but not the “who,” you need the logs. This is where you find the “Source.”

  1. Go to Logs (Log Analytics Workspace).
  2. Run a query to find the top callers during that spike period.

Example KQL for App Gateways/Web Apps:

```kusto
// Find the top 10 IP addresses causing the spike
AzureDiagnostics
| where TimeGenerated > datetime(2026-04-10T12:00:00Z) // Set to your spike time
| where Category == "ApplicationGatewayAccessLog"
| summarize RequestCount = count() by clientIP_s
| top 10 by RequestCount
```

  • Result: If one IP address has 50,000 requests while others have 10, you’ve found a bot or a misconfigured client.

🌐 Step 3: Check “Application Insights” (App Level)

If the spike is happening inside your application code (e.g., a “Cold Start” or a heavy API call):

  1. Go to Application Insights > Failures or Performance.
  2. Look at the “Top 10 Operations”.
  3. Check if a specific API endpoint (e.g., /api/export) suddenly jumped in volume.
  4. Check the geography breakdown (Usage > Users) to see if the traffic is coming from a specific region (e.g., a sudden burst of traffic from a country you don’t usually serve).
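
The same check can be scripted against the Application Insights log schema. A sketch, assuming a workspace-based Application Insights resource where request telemetry lands in the requests table:

```kusto
// Top 10 operations by request volume over the last day
requests
| where timestamp > ago(1d)
| summarize RequestCount = count(), AvgDurationMs = avg(duration) by operation_Name
| top 10 by RequestCount
```

If one operation dwarfs the rest, you have likely found the endpoint behind the spike.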

πŸ—ΊοΈ Step 4: Network Watcher (Infrastructure Level)

If you suspect the spike is at the “packet” level (like a DDoS attempt or a backup job gone rogue):

  1. Go to Network Watcher > NSG Flow Logs.
  2. Use Traffic Analytics. It provides a map showing which VNets or Public IPs are sending the most data.
  3. Check for “Flows”: It will show you the “Source Port” and “Destination Port.” If you see a spike on Port 22 (SSH) or 3389 (RDP), someone is likely trying to brute-force your VMs.
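
Traffic Analytics data can also be queried directly. A hedged sketch, assuming Traffic Analytics is enabled and writing to the AzureNetworkAnalytics_CL table (the field names below follow the flow-log schema and may differ slightly in your workspace):

```kusto
// Count flows targeting SSH/RDP by source IP
AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
| where DestPort_d in (22, 3389)
| summarize FlowCount = count() by SrcIP_s, DestPort_d
| top 10 by FlowCount
```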

πŸ€– Step 5: Check for “Auto-Scaling” Events

Sometimes the “spike” isn’t a problem, but a reaction.

  1. Go to Activity Log.
  2. Filter for “Autoscale” events.
  3. If the spike happened exactly when a new instance was added, the “spike” might actually be the resource “warming up” (loading caches, etc.), which can look like a surge in CPU or Disk I/O.
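
If the Activity Log is exported to your workspace, the autoscale check can be done in one query. A sketch, assuming the AzureActivity table is available; adjust the window to bracket your spike:

```kusto
// Autoscale operations around the spike window
AzureActivity
| where TimeGenerated between (datetime(2026-04-10T11:30:00Z) .. datetime(2026-04-10T12:30:00Z))
| where OperationNameValue contains "autoscale"
| project TimeGenerated, OperationNameValue, ActivityStatusValue, Caller
```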

Summary Checklist:

  • Metrics Explorer: To see when it happened and how big it was.
  • Log Analytics (KQL): To find the specific Client IP or User Agent.
  • Traffic Analytics: To see if it was a Network-level burst.
  • Activity Log: To see if any Manual Changes or Scaling occurred at that exact second.

This is a common real-world “mystery spike” case. Since you mentioned “frost spike” and “source space,” you are likely referring to a Cost Spike or a Request/Throughput Spike in your resource namespace.

If there are no alerts firing, it means the spike either didn’t hit a specific threshold or was too brief to trigger a standard “Static” alert.


πŸ—οΈ Step 1: Establish the “When” and “What”

First, you need to see the “DNA” of the spike using Azure Monitor Metrics.

  • Look at the Graph: Is it a “Square” spike (starts and stops abruptly, like a scheduled job)? Or a “Needle” spike (hits a peak and drops, like a bot attack)?
  • Identify the Resource: Go to Metrics Explorer and check:
    • For VMs: Percentage CPU or Network In/Out.
    • For Storage/SQL: Transactions or DTU Consumption.
    • For App Services: Requests or Data In.
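
The “shape” is easiest to judge at fine granularity. A sketch, again assuming platform metrics flow into the AzureMetrics table:

```kusto
// Plot the metric at 1-minute resolution: a flat-topped plateau suggests a
// scheduled job; a sharp needle suggests a burst or bot
AzureMetrics
| where MetricName == "Percentage CPU"
| where TimeGenerated > ago(6h)
| summarize AvgValue = avg(Average) by bin(TimeGenerated, 1m)
| render timechart
```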

πŸ” Step 2: Finding the Source (The Detective Work)

Since you don’t know where it came from, you use “Splitting” and “Filtering” in Metrics Explorer.

  1. Split by Instance/Role: If you have 10 servers, split by InstanceName. Does only one server show the spike? If yes, it’s a local process (like a hanging Windows Update or a log-rotation fail).
  2. Split by Operation: For Storage or SQL, split by API Name. Is it GetBlob? PutBlob? This tells you if you are reading too much or writing too much.
  3. Split by Remote IP: If your load balancer shows the spike, split by ClientIP. If one IP has 100x the traffic of others, you’ve found your source.

πŸ•΅οΈ Step 3: Deep Dive with Log Analytics (KQL)

Metrics only show numbers. Logs show names. You need to run a KQL query in your Log Analytics Workspace.

Query to find “Who is talking to me”:

```kusto
// This finds the top 5 callers during the spike window
AzureDiagnostics
| where TimeGenerated > datetime(2026-04-10T12:00:00Z) // Use your spike time
| where Category == "ApplicationGatewayAccessLog"
| summarize RequestCount = count() by clientIP_s, requestUri_s
| top 5 by RequestCount
```

  • Result: This will literally list the IP address and the specific URL it was hitting.

πŸ’° Step 4: The “Cost” Investigation

If the spike is financial (a “Cost Spike”), you check Azure Cost Management.

  1. Cost Analysis: View cost by Resource. Did one specific Disk or Data Transfer cost jump?
  2. Check for “Orphaned” Resources: Sometimes a spike comes from a process that created 1,000 snapshots or temporary disks and forgot to delete them.

πŸ€– Step 5: Check the “Silent” Sources

If the metrics and logs don’t show an external attacker, check internal Azure “automated” sources:

  • Resource Graph: Check for “Change Tracking.” Did someone deploy code or change a firewall rule at that exact minute?
  • Backup/Recovery Services: A “huge spike” in disk I/O often aligns with a Storage Snapshot or an Azure Backup job starting.
  • Defender for Cloud: Even if you don’t have a “Metric Alert,” check the Security Alerts. Defender might have seen the spike and flagged it as “Suspicious PowerShell Activity” or “Port Scanning.”
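
Defender findings are also queryable when alerts are exported to the workspace. A sketch, assuming the SecurityAlert table exists there:

```kusto
// Recent Defender for Cloud alerts near the spike
SecurityAlert
| where TimeGenerated > ago(1d)
| project TimeGenerated, AlertName, AlertSeverity, CompromisedEntity
```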

βœ… Summary Checklist

| Step | Action | Tool |
| --- | --- | --- |
| 1. Visualize | See the shape and duration of the spike. | Metrics Explorer |
| 2. Isolate | Split metrics by IP or Instance. | Metrics Explorer |
| 3. Identify | Run a query to find the specific Client IP or User. | Log Analytics (KQL) |
| 4. Correlate | Check if any “Deployments” happened at that time. | Activity Log / Change Analysis |
| 5. Network | Check for massive data transfers between regions. | Network Watcher / Traffic Analytics |

How to prevent this next time? Once you find the source, create a Dynamic Threshold Alert. Unlike static alerts, these use AI to learn your “normal” pattern and will fire if a spike looks “unusual,” even if it doesn’t hit a high maximum number.
