Top Tools in Azure Network Watcher for Network Troubleshooting

If Azure Monitor is the “Central Nervous System,” Azure Network Watcher is the “Private Investigator.”

While regular monitoring tells you if a server is up, Network Watcher tells you why two resources can’t talk to each other, even though they both seem healthy. It focuses specifically on the IaaS (Infrastructure as a Service) networking layer: VNets, subnets, Network Security Groups (NSGs), and gateways.

The “Big Three” Troubleshooting Tools

Most people use Network Watcher for these three specific “Oh no, why isn’t this working?” scenarios:

1. IP Flow Verify

Have you ever been certain your Firewall/NSG rules were correct, but traffic still wasn’t getting through?

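What it does: You give it a VM, a direction (inbound or outbound), a protocol, and the local and remote IP and port. It evaluates the NSG rules that apply to that VM and tells you whether the traffic is allowed or denied, and exactly which rule made the call.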
The “Win”: No more scrolling through 50 NSG rules to find the one “Deny All” hidden at the bottom.
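
To make the idea concrete, here is a minimal sketch of priority-ordered rule evaluation. It is a simplified model and not the Azure API: the `Rule` class, the `verify_flow` function, and the `NoMatchingRule` label are hypothetical names for illustration, and real NSG rules also match on direction, protocol, port ranges, and service tags.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int       # lower number is evaluated first
    remote_prefix: str  # CIDR block, or "*" for any address
    port: str           # a single port as text, or "*" for any port
    access: str         # "Allow" or "Deny"


def verify_flow(rules, remote_ip, port):
    """Return (access, rule_name) for the first rule that matches the flow."""
    for rule in sorted(rules, key=lambda r: r.priority):
        ip_ok = rule.remote_prefix == "*" or (
            ip_address(remote_ip) in ip_network(rule.remote_prefix)
        )
        port_ok = rule.port == "*" or int(rule.port) == port
        if ip_ok and port_ok:
            return rule.access, rule.name
    # Hypothetical fallback for an empty rule list (not an Azure rule name).
    return "Deny", "NoMatchingRule"


rules = [
    Rule("AllowSqlFromApp", 100, "10.0.1.0/24", "1433", "Allow"),
    Rule("AllowHttps", 200, "*", "443", "Allow"),
    Rule("DenyAllInBound", 65500, "*", "*", "Deny"),
]

print(verify_flow(rules, "10.0.1.7", 1433))  # app subnet reaching SQL
print(verify_flow(rules, "10.0.2.9", 1433))  # a different subnet, same port
```

The second call falls all the way through to the catch-all deny, which is the "hidden at the bottom" case: the answer names the rule, so you know where to look.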

2. Next Hop

Sometimes a packet leaves a VM but never arrives, not because of a firewall, but because it got lost in the routing.

What it does: It tells you where a packet is headed next (e.g., Internet, Virtual Appliance, or VNet Gateway).
The “Win”: It helps you identify if a User-Defined Route (UDR) is accidentally sending your database traffic into a “black hole.”
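
Route selection works by longest prefix match, with user-defined routes winning ties over BGP and system routes. The sketch below models that with plain dictionaries. The `next_hop` function, the `routes` list, and the `"User"`/`"BGP"`/`"Default"` source labels are simplified assumptions for illustration, not the output format of the real tool.

```python
from ipaddress import ip_address, ip_network

# Tie-break order when two matching routes have the same prefix length.
SOURCE_RANK = {"User": 0, "BGP": 1, "Default": 2}


def next_hop(routes, destination_ip):
    """Pick the most specific matching route; return (next_hop_type, prefix)."""
    dest = ip_address(destination_ip)
    matches = [r for r in routes if dest in ip_network(r["prefix"])]
    if not matches:
        return "None", None
    best = min(
        matches,
        key=lambda r: (-ip_network(r["prefix"]).prefixlen, SOURCE_RANK[r["source"]]),
    )
    return best["next_hop_type"], best["prefix"]


routes = [
    {"prefix": "10.0.0.0/16", "source": "Default", "next_hop_type": "VnetLocal"},
    {"prefix": "0.0.0.0/0", "source": "Default", "next_hop_type": "Internet"},
    # A UDR that is more specific than the VNet route and drops the traffic.
    {"prefix": "10.0.2.0/24", "source": "User", "next_hop_type": "None"},
]

print(next_hop(routes, "10.0.1.5"))   # stays inside the VNet
print(next_hop(routes, "10.0.2.10"))  # caught by the more specific UDR
```

In this toy table the database subnet 10.0.2.0/24 is the “black hole”: the /24 beats the /16, and a next hop type of None means the packet is dropped.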

3. Connection Troubleshoot

This is the “All-in-One” button. It checks the connectivity between a source (VM or Application Gateway) and a destination (VM, URI, or IP).

What it does: It checks for DNS issues, routing problems, and port blockages all at once.
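
If you want a feel for the “check everything in order” idea, here is a rough sketch you could run from inside a VM. It is not the Network Watcher implementation: `check_connectivity` and its return shape are hypothetical, and it only covers DNS and a TCP connect. From inside the guest, a routing problem and a blocked port look the same (the connect just fails), which is the gap the real tool fills by inspecting the Azure side.

```python
import socket


def check_connectivity(host, port, resolve=socket.gethostbyname,
                       connect=socket.create_connection, timeout=3.0):
    """Run a DNS check, then a TCP check, and report the first stage that fails."""
    try:
        ip = resolve(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return {"status": "Unreachable", "failed_stage": "dns", "detail": str(exc)}
    try:
        conn = connect((ip, port), timeout)
        conn.close()
    except OSError as exc:  # refused, timed out, no route, and so on
        return {"status": "Unreachable", "failed_stage": "tcp", "detail": str(exc)}
    return {"status": "Reachable", "failed_stage": None, "detail": ip}


# Offline demo: stub out DNS so the example runs without a network.
def failing_dns(host):
    raise socket.gaierror("name not known")


print(check_connectivity("db.internal.example", 1433, resolve=failing_dns))
```

The resolver and connector are parameters only so the stages can be exercised without a network; with the defaults, the function uses the standard `socket` module.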

Advanced Monitoring & Logging

Network Watcher also handles the “heavy lifting” of network data analysis:

NSG Flow Logs: These record the IP flows passing through your Network Security Groups. They tell you who talked to whom, over which port, and whether the traffic was allowed.
- Pair it with: Traffic Analytics to turn that raw data into a beautiful map showing where your global traffic is coming from.
Packet Capture: If you need to go “Full Matrix,” you can trigger a remote packet capture on a VM. It creates a .cap file that you can open in Wireshark to see exactly what is happening at the byte level.
Topology: This automatically generates a visual map of your entire network. If you inherited a messy environment, this is how you figure out what is actually connected to what.
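
Flow log records store each flow as a compact comma-separated tuple. As a rough illustration of the “who talked to whom” data, the sketch below parses the first eight fields of one tuple. The `Flow` class and `parse_flow_tuple` function are hypothetical helpers, the sample values are illustrative, and the sketch ignores the extra state and byte-count fields that newer log versions append.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Flow:
    time: datetime
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str   # "TCP" or "UDP"
    direction: str  # "Inbound" or "Outbound"
    decision: str   # "Allowed" or "Denied"


def parse_flow_tuple(raw):
    """Parse the first eight comma-separated fields of a flow tuple."""
    f = raw.split(",")
    return Flow(
        time=datetime.fromtimestamp(int(f[0]), tz=timezone.utc),
        src_ip=f[1],
        dst_ip=f[2],
        src_port=int(f[3]),
        dst_port=int(f[4]),
        protocol={"T": "TCP", "U": "UDP"}[f[5]],
        direction={"I": "Inbound", "O": "Outbound"}[f[6]],
        decision={"A": "Allowed", "D": "Denied"}[f[7]],
    )


# Illustrative sample: an outbound HTTPS flow that was allowed.
sample = "1542110377,10.0.0.4,13.67.143.118,44931,443,T,O,A,B,,,,"
flow = parse_flow_tuple(sample)
print(flow.src_ip, "->", flow.dst_ip, flow.dst_port, flow.decision)
```

Once the tuples are structured like this, filtering for denied flows on a single port becomes a one-line list comprehension rather than a text search.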

Crucial Things to Know

It’s Regional: Network Watcher must be enabled for every region where you have resources. If you have VMs in East US but Network Watcher is only on in West US, you can’t troubleshoot the East US VMs.
The “NetworkWatcherRG”: You might see a resource group with this name appear automatically. Don’t delete it. That’s where Azure stores the Network Watcher instances for your regions.
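Cost: The diagnostic tools (IP Flow Verify, Next Hop) come with a free monthly allowance of checks, which usually covers ad hoc troubleshooting; heavy automated use is billed per check. Packet Captures and NSG Flow Logs incur storage costs, and Flow Logs add collection charges (plus processing costs if you use Traffic Analytics).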
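
The regional rule is easy to audit once you have two lists: the regions where your resources live and the regions where Network Watcher is enabled. Here is a small sketch of that comparison. The `uncovered_regions` function and the hard-coded region lists are assumptions for illustration; in practice you would pull both lists from your own inventory.

```python
def normalize(region):
    """Treat "East US" and "eastus" as the same region name."""
    return region.replace(" ", "").lower()


def uncovered_regions(resource_regions, watcher_regions):
    """Return regions that host resources but have no Network Watcher."""
    watched = {normalize(r) for r in watcher_regions}
    used = {normalize(r) for r in resource_regions}
    return sorted(used - watched)


# Hypothetical inventory: VMs in two regions, Network Watcher in one.
vm_regions = ["East US", "westus", "West US"]
watcher_regions = ["westus"]

print(uncovered_regions(vm_regions, watcher_regions))
```

Any region this returns is a place where the troubleshooting tools above won’t work until Network Watcher is enabled there.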

Peer Tip: If you’re ever stuck on a “Communication Link Failure” error between an App and a Database, run IP Flow Verify first. More often than not, it’s a missing NSG rule for the specific port you’re using.

In the cloud, “defense in depth” means assuming that at some point, one of your layers will be bypassed. Monitoring is your way of making sure that when it happens, you aren’t the last one to find out.

For a robust setup, you want to layer your visibility from the outside (the internet) all the way down to the code.

The “Defense in Depth” Monitoring Stack

Layer 1: The Perimeter (Network Watcher + NSG)

This is your “security camera” at the front gate.

NSG Flow Logs: Enable these for all critical subnets. They record every hit (and every block) on your NSGs.
Traffic Analytics: This is a must-add to Flow Logs. It visualizes the data so you can see if, for example, a random IP in a country you don’t do business with is hammering your SSH port.

Layer 2: The House (Azure Monitor + VM Insights)

This monitors the health of the “building” itself.

Azure Activity Log: This tracks who did what to your Azure resources, whether through the Portal, the CLI, or the API. If someone deletes a production database, the Activity Log is where you find the “fingerprints.”
Resource Health: Set up alerts for when Azure’s own infrastructure has an issue (e.g., a hardware failure in the data center).

Layer 3: The Interior (Azure Monitor Agent – AMA)

Once inside the VM, you need to know what’s happening in the “rooms.”

Syslog (Linux) / Event Logs (Windows): Use the AMA to stream these to Log Analytics. You’re looking for failed login attempts or unauthorized “sudo” commands.
Process Monitoring: VM Insights can show you if a strange, unnamed process is suddenly eating 90% of your CPU (a classic sign of crypto-jacking).

Layer 4: The Residents (Application Insights)

This is monitoring the behavior of the people (the code) inside.

Failure Tracking: If your app suddenly starts returning 401 Unauthorized errors, App Insights gives you the request data (which endpoints, which client IPs, how fast) to judge whether it’s a bug or a brute-force credential stuffing attack.
User Behavior: Monitor for unusual spikes in traffic to specific API endpoints.

Organizing Your “Command Center”

To keep this from becoming a chaotic mess of alerts, follow these three best practices:

Strategy	Action	Benefit
Centralize	Send all logs to a single Log Analytics Workspace.	Allows you to “correlate” data (e.g., see a network spike and a CPU spike at the same time).
Action Groups	Group your alerts by “Severity.”	SEV 0 goes to a phone call; SEV 3 just sends a quiet email or a Slack message.
Smart Defaults	Use Azure Policy to enforce monitoring.	Automatically installs the Monitoring Agent on any new VM created, so you never have “blind spots.”
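
The severity idea from the Action Groups row can be sketched as a simple lookup. This is not Azure configuration: the `SEVERITY_CHANNELS` table and `route_alert` function are hypothetical, and only the Sev 0 and Sev 3 rows come from the table above. The other rows are placeholders to adapt to your own team.

```python
# Azure Monitor alert severities run from 0 (most critical) to 4.
SEVERITY_CHANNELS = {
    0: ["phone_call"],
    1: ["sms", "email"],            # assumption, not from the table above
    2: ["email"],                   # assumption, not from the table above
    3: ["email", "chat_message"],
    4: ["chat_message"],            # assumption, not from the table above
}


def route_alert(name, severity):
    """Return one notification line per channel configured for the severity."""
    if severity not in SEVERITY_CHANNELS:
        raise ValueError(f"unknown severity: {severity}")
    return [
        f"{channel}: [Sev{severity}] {name}"
        for channel in SEVERITY_CHANNELS[severity]
    ]


for line in route_alert("Production database unreachable", 0):
    print(line)
for line in route_alert("Disk usage above 70%", 3):
    print(line)
```

Keeping the mapping in one table makes it obvious who gets woken up for what, which is most of the battle against noisy alerting.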

The “Final Boss” of Defense: Microsoft Sentinel

If you’re serious about “Defense in Depth,” you should eventually look at Microsoft Sentinel. It’s a SIEM (Security Information and Event Management) that sits on top of all the tools we’ve discussed.

It uses AI to look at your Network Watcher logs, your VM logs, and your App Insights and says: “Hey, I saw a weird login on this VM, and then five minutes later, that VM started sending weird traffic to an unknown IP. This looks like an attack.”
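
To show what that kind of correlation means, here is a toy version of the rule in the quote: a suspicious login followed within five minutes by outbound traffic from the same VM. It is only an illustration and not how Sentinel is implemented. The `correlate` function, the event dictionaries, and every value in them are made up for the example.

```python
from datetime import datetime, timedelta


def correlate(logins, flows, window=timedelta(minutes=5)):
    """Pair each login with outbound flows from the same VM inside the window."""
    incidents = []
    for login in logins:
        for flow in flows:
            delay = flow["time"] - login["time"]
            same_vm = flow["vm"] == login["vm"]
            if same_vm and timedelta(0) <= delay <= window:
                incidents.append((login["vm"], login["user"], flow["dst_ip"]))
    return incidents


# Made-up events: one odd login, then three outbound flows.
logins = [
    {"vm": "web-01", "user": "svc-backup", "time": datetime(2024, 5, 1, 10, 0)},
]
flows = [
    {"vm": "web-01", "dst_ip": "203.0.113.50", "time": datetime(2024, 5, 1, 10, 4)},
    {"vm": "web-01", "dst_ip": "203.0.113.51", "time": datetime(2024, 5, 1, 10, 10)},
    {"vm": "db-01", "dst_ip": "203.0.113.50", "time": datetime(2024, 5, 1, 10, 2)},
]

print(correlate(logins, flows))
```

Only the first flow qualifies: the second is outside the five-minute window and the third comes from a different VM. A real SIEM does this across far more signals, which is why centralizing your logs in one workspace matters.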

Peer Tip: Don’t try to alert on everything at once. You’ll get “Alert Fatigue” and start ignoring your inbox. Start with Availability (is it up?) and Errors (is it broken?), then refine from there.

