A common topic that comes up when implementing a new Celerra for a customer is how to configure networking. What's the point of five nines of uptime if your networking isn't reliable? So, what to do? The Celerra offers some great options for highly available network configurations, most built on industry standards and a few proprietary to the EMC device.
The first option is something most people are at least a little familiar with: port aggregation, also called EtherChannel depending on the standard used. The key here is that you bind multiple Ethernet connections together, creating a single logical connection with one IP address made up of multiple physical links.
First, something very important: using port aggregation does not automatically increase bandwidth. The reason is the statistical load balancing used to distribute traffic across the aggregated connections. Let's assume you have a single Celerra with four ports aggregated together and it's communicating with a single client device. There are several methods used to distribute load on a port channel, the two most common being MAC address and IP address hashing. This means that a client using one IP address talking to a Celerra with one IP address gets hashed onto one physical connection in the aggregated link. So a single client will not see 4Gb of bandwidth when talking to a Celerra (or any device) with a four-link port aggregation. But four clients can see a total bandwidth of 4Gb, since each one may be balanced onto a separate link. The takeaway here is to think about "conversations". A client talking to a Celerra, or any other server using aggregation, is having a single conversation, and a single conversation only gets sent over one physical link. Multiple conversations can be spread across multiple links. Also, port aggregation isn't intelligent: it simply hashes conversations onto links without considering actual utilization. You could end up with 20 very low-utilization connections on one port and 20 very intensive connections on another. To be blunt, a lot of people don't understand this, but it's a very important detail, especially in other areas such as proper iSCSI architecture.
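To make the "conversations" idea concrete, here's a minimal sketch in Python of how IP-hash load balancing pins each conversation to one link. This is an illustration only, not the actual hashing algorithm any particular switch or the Celerra uses; the port names and IP addresses are made up.

```python
# Illustrative sketch of IP-hash load balancing on an aggregated link.
# NOT the real Celerra/switch algorithm -- just the general technique.
import hashlib

# Four hypothetical 1Gb ports bound into one aggregation.
LINKS = ["cge0", "cge1", "cge2", "cge3"]

def pick_link(src_ip: str, dst_ip: str) -> str:
    """Hash the source/destination IP pair to choose a physical link.

    The same IP pair always hashes to the same link, which is why a
    single conversation never spreads across multiple links."""
    key = f"{src_ip}-{dst_ip}".encode()
    index = int(hashlib.md5(key).hexdigest(), 16) % len(LINKS)
    return LINKS[index]

celerra_ip = "10.0.0.50"  # hypothetical Celerra interface address
for client in ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]:
    # Each client is pinned to exactly one link; four clients may or
    # may not land on four different links -- the hash doesn't
    # guarantee an even spread, let alone balance by utilization.
    print(client, "->", pick_link(client, celerra_ip))
```

Run it a few times and note that the mapping never changes: one client always gets the same link, so its maximum throughput is that one link's bandwidth, no matter how many links are in the aggregation.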
The Celerra supports both Cisco EtherChannel and standard LACP (Link Aggregation Control Protocol). There are some minor differences between them, but all the rules we just discussed apply to both. When looking to implement aggregation, check whether your switches allow an aggregation group to span multiple physical switches. The benefit is that the group will survive a switch failure. Most newer switches support this within a switch stack; if yours does, take advantage of it. Remember, the idea here is resiliency.
Fail Safe Networking
The Celerra supports a feature known as Fail Safe Networking, or FSN. FSN provides an active/passive network configuration. An FSN can be made up of single ports, aggregations, or a combination of the two. A group of connections set up as an FSN shares a single MAC address but can have multiple IP addresses. Because it is an active/passive configuration, if you have four physical connections only two will be active at any one time. This is unlike simple port aggregation, where all ports are active at all times. When a Celerra Data Mover detects a failure, it automatically switches the connection to the passive interfaces.
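The failover behavior can be sketched in a few lines of Python. This is a hypothetical model, not EMC's implementation; the interface names and MAC address are invented for illustration.

```python
# Minimal sketch of FSN-style active/passive failover (hypothetical,
# not EMC's actual implementation).
class FSN:
    def __init__(self, active: str, standby: str):
        # Each side can be a single port or an aggregation (e.g. "trk0").
        self.active = active
        self.standby = standby
        # The whole FSN device presents one MAC address, so clients
        # never see an address change when a failover occurs.
        self.mac = "00:60:16:aa:bb:cc"  # made-up example MAC

    def link_failed(self) -> None:
        """Model the Data Mover detecting a failure: swap roles so the
        standby side starts carrying traffic under the same MAC."""
        self.active, self.standby = self.standby, self.active

fsn = FSN(active="trk0", standby="trk1")
fsn.link_failed()
print(fsn.active)  # the former standby now carries the traffic
```

The point of the model: the client-facing identity (the MAC, and the IPs on top of it) stays put while the physical path underneath changes, which is what lets the passive side sit on a completely different switch.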
So how does this compare to just using EtherChannel or LACP? The key differentiator is that FSN supports connections to different switches without the switches having to support any special functionality. The downside is that only half the connections are active at any one time. Which should you use? That depends on your switches. If they support port aggregation across switches in a stack, use LACP or EtherChannel; you'll get better aggregate bandwidth to multiple clients with a simpler configuration. If they don't, look at FSN: you'll lose half the available bandwidth but gain much better fault tolerance.