Monitoring Bandwidth Consumed by Content Delivery Networks
A few days ago a customer came to me with a question: they wanted to know which online services were consuming the most bandwidth on their Internet connection. Easy, I thought, and I proceeded to show them how to report on the top websites consuming bandwidth. The problem was that the top entries in the list were all sites associated with content delivery networks (CDNs).
In summary, CDNs host multiple copies of the same source data in different locations. Common types of data hosted on CDNs include software downloads and streaming media. When an end user attempts to download the data, they are directed to the nearest copy, which speeds things up. The available bandwidth capacity also increases, as you are not dependent on a single server. Ten servers with 1 Gbps connections could, in theory, provide 10 Gbps of download capacity. This type of innovation delivers the immediacy we now expect: we want our social network updates to appear instantly and we want to download data as fast as our Internet connections allow.
This is all great news for the end user but can create problems for the IT manager. A popular method of gaining visibility into what is happening on an Internet connection has been to capture flow records from a router or firewall. Flow records show the source and destination IP addresses of the connections going to and from web-based services. You can then find out which service is associated with an IP address by doing a reverse DNS lookup.
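As a rough illustration of that approach, the sketch below totals bytes per destination IP and attaches a reverse DNS name to each. It assumes the flow records have already been exported and parsed into (source IP, destination IP, bytes) tuples; the function names and the tuple layout are hypothetical and not tied to any particular flow collector.

```python
import socket
from collections import Counter


def reverse_dns(ip):
    """Return the PTR name for an IP address, or the IP itself if none is found."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # No PTR record, or the lookup failed; fall back to the raw address.
        return ip


def top_destinations(flows, resolve=reverse_dns, limit=5):
    """Sum bytes per destination IP and pair each total with a resolved name.

    `flows` is an iterable of (src_ip, dst_ip, byte_count) tuples
    (an assumed layout, not a real flow export format).
    """
    totals = Counter()
    for _src_ip, dst_ip, byte_count in flows:
        totals[dst_ip] += byte_count
    return [(ip, resolve(ip), total) for ip, total in totals.most_common(limit)]
```

Calling top_destinations(flows) with real records performs live DNS lookups. The resolve parameter is there so a cached or offline lookup table can be swapped in.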
The image below shows an example of this; it would appear that someone downloaded a lot of data from akamai.net.
The problem is that this download was triggered by an action on another website. The user may have clicked on a link to download a software application, and they were automatically redirected to the CDN from where the data was downloaded. It is not possible to see what the original website was by just looking at the destination IP address. So what are the options?
One option would be to check your proxy server logs for any occurrences of the source and destination IP addresses. If you then match these up to the date and time of the download, you should be able to work out the name of the original website.
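The matching step might look something like the sketch below. It assumes the proxy log has already been parsed into dictionaries; the field names (client_ip, server_ip, time, url, referer) are hypothetical and will differ from product to product, and the two-minute window is an arbitrary starting point.

```python
from datetime import datetime, timedelta


def match_proxy_entries(flow, proxy_entries, window=timedelta(minutes=2)):
    """Return proxy log entries that share the flow's IP pair and are close in time.

    `flow` and each entry are dicts with assumed keys:
    client_ip, server_ip and time (a datetime).
    """
    matches = []
    for entry in proxy_entries:
        same_hosts = (
            entry["client_ip"] == flow["client_ip"]
            and entry["server_ip"] == flow["server_ip"]
        )
        close_in_time = abs(entry["time"] - flow["time"]) <= window
        if same_hosts and close_in_time:
            matches.append(entry)
    return matches
```

Whether the matched entries reveal the original website depends on what the proxy records. If it logs the referring page for each request, that field is the one to read.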
If you don't have a record of the IP addresses on your proxy server, you could look at doing deep packet inspection (DPI) of the traffic as it goes to and from your network. You need to locate your network core and enable port mirroring on the switch port that links your network to the firewall. Port mirroring allows you to take a copy of network traffic without interfering with the operation of the network. Once port mirroring is configured, you need to deploy a DPI tool; options range from free tools like Wireshark to commercial products with special decoders for web traffic.
No matter what system you choose, you are looking to get an output similar to the one shown in the image below. Here we can see the original website name and the name of the link that triggered the download from the CDN.
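To give a feel for what a web traffic decoder does, here is a minimal sketch that pulls the requested host, the path and the referring page out of a raw HTTP request. It assumes the traffic is unencrypted HTTP/1.x, that the request payload has already been reassembled from the mirrored packets, and that the browser sent a Referer header; the function name is hypothetical and this is not how any particular DPI product works.

```python
def parse_http_request(payload: bytes):
    """Extract method, path, Host and Referer from a raw HTTP/1.x request.

    Returns None if the payload does not start with an HTTP request line.
    """
    # Only the header section (before the blank line) is of interest.
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    return {
        "method": parts[0],
        "path": parts[1],
        "host": headers.get("host"),        # the CDN host serving the file
        "referer": headers.get("referer"),  # the page that triggered the download
    }
```

In this sketch the host value is the CDN name, while referer, when present, points back to the original website. Encrypted HTTPS payloads will not parse this way and the function simply returns None for them.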
You could take it a step further and cross-reference the source IP address with your authentication infrastructure, which may have a record of which username was associated with that IP address at the time of the download. I covered a similar subject in my previous post on how to find the top users of bandwidth on your network.
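A simple version of that cross-reference is sketched below. It assumes logon events have been exported from the authentication system as (time, IP address, username) tuples, which is a hypothetical layout, and it treats the most recent logon from an IP before the download as the likely user.

```python
from datetime import datetime


def user_at(ip, when, logons):
    """Return the username of the latest logon from `ip` at or before `when`.

    `logons` is an iterable of (time, ip, username) tuples (assumed layout).
    Returns None if no logon from that IP precedes `when`.
    """
    candidates = [
        (time, user) for time, addr, user in logons
        if addr == ip and time <= when
    ]
    if not candidates:
        return None
    # The latest timestamp wins; max() compares the time field first.
    return max(candidates)[1]
```

This is only a heuristic: shared machines, short DHCP leases or missing logoff events can all make the most recent logon the wrong answer.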
Darragh Delaney is head of technical services at NetFort. As Director of Technical Services and Customer Support, he interacts on a daily basis with NetFort customers and is responsible for the delivery of a high quality technical and customer support service. Follow Darragh on Twitter @darraghdelaney