Within datacenters, one thing is always true: users and their applications need to be able to access their data. For user traffic, the winning transport technology is clearly Ethernet. For storage traffic, several alternative or complementary options have been proposed and used with some degree of success: InfiniBand, iSCSI, Fibre Channel and Ethernet with NFS are all names that come to mind. In this document we explore the most popular combination of technologies and the way they are coming to work together.
A few years back, the classic datacenter architecture would see the deployment of two distinct networks: the local-area network (LAN), based on Ethernet technology, and the storage-area network (SAN), based on Fibre Channel technology. The two networks differed in their capabilities in several respects.
For example, in those days the Ethernet network would mostly run at 1 Gbps at the edge and sometimes 10 Gbps in the core. Packet drops were widely accepted, and no special care was generally taken to guarantee bandwidth to specific traffic types. By contrast, the Fibre Channel network would operate at 1, 2 or 4 Gbps and provide a strict flow control mechanism, based on buffer-to-buffer credits, to avoid frame drops. Nowadays, 10 Gbps Ethernet at the access layer and 40 Gbps in the core are quickly becoming the norm, paired with 16G FC on the storage side. Apart from the evident bandwidth and performance increases, the original idea of having LAN traffic and SAN traffic flow on dedicated networks is still prevalent and considered a best practice by many customers, in terms of stability, availability and ease of use, with a clear demarcation of administrative roles and tasks. This is well reflected in the industry move to standardize and bring to market the next level of Fibre Channel technology: 32G FC (a.k.a. Gen 6 in Fibre Channel Industry Association terminology).
The year 2008 gave birth to a new technology called Fibre Channel over Ethernet, commonly known as FCoE, aimed at alleviating some of the challenges of the previously described “duplicated network” approach. With FCoE it became possible to transport FC frames in their entirety as the payload of Ethernet frames. In other words, the Fibre Channel cable was essentially replaced by an Ethernet pipe, with a mechanism introduced to guarantee no-drop behavior so that FC frames would not be lost along the way. This mechanism was derived from the “pause” technique that had been in use for years in Ethernet networks and was standardized by the IEEE as Priority Flow Control (PFC). It is worth mentioning that the buffer credit mechanism of Fibre Channel is a proactive technique whereas the pause mechanism is reactive in nature, but the end result is the same: no frame drops.
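To illustrate the basic idea, the sketch below builds an Ethernet frame whose payload is a complete, untouched FC frame. It is a simplified illustration under stated assumptions, not a bit-exact FC-BB-5 encoder: only the registered FCoE EtherType (0x8906) is taken from the standard, while the SOF/EOF delimiter values, MAC addresses and helper names are placeholders.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType registered for FCoE traffic

def encapsulate_fc_frame(fc_frame: bytes, src_mac: bytes, dst_mac: bytes) -> bytes:
    """Wrap a complete FC frame (including its own CRC) as the payload of an Ethernet frame.

    Simplified sketch: the real FC-BB-5 encapsulation header also carries a
    version field, reserved bits and SOF/EOF ordered sets, and the Ethernet
    FCS is appended by the adapter hardware.
    """
    ethernet_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    sof, eof = b"\x2e", b"\x41"           # placeholder start/end-of-frame delimiters
    frame = ethernet_header + sof + fc_frame + eof
    if len(frame) < 60:                   # pad to the Ethernet minimum (64 bytes with FCS)
        frame += b"\x00" * (60 - len(frame))
    return frame

# Example: a dummy 36-byte FC frame travels unchanged inside the Ethernet payload
dummy_fc_frame = bytes(36)
fcoe_frame = encapsulate_fc_frame(dummy_fc_frame,
                                  src_mac=b"\x0e\xfc\x00\x00\x00\x01",
                                  dst_mac=b"\x0e\xfc\x00\x00\x00\x02")
print(len(fcoe_frame))  # 60 bytes before the FCS is added
```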
After the complete standardization of FCoE technology by INCITS T11 back in 2009, IT leaders expected to save a considerable amount of money thanks to the promised reduction in equipment, and demand for this new solution started to take off. Considering cable simplification, the reduction in network adapters, converged networking and power consumption savings alone, customers enjoyed an estimated 30% improvement in their TCO. Since FCoE only works on 10 Gbps Ethernet links and higher, this technology also contributed to a wider adoption of high-speed Ethernet networking devices. Nowadays most x86 servers offer 10 Gbps ports on the motherboard, and converged switches with both 10 Gbps and 40 Gbps ports are deployed. It is clear that the adoption of FCoE technology started close to the servers, where it provides the fastest return on investment. This architecture can be conveniently described as “unified access” and it has seen wide commercial success, particularly in combination with blade server offerings. In fact, FCoE connectivity is often in use inside many of the Integrated Computing Stacks: pre-validated, application-optimized combinations of computing, networking and storage.
Before the advent of 16G FC devices, customers started to compare the bandwidth performance of 8G FC with 10G FCoE and realized FCoE would deliver roughly 50% higher throughput thanks to a more efficient encoding scheme (64b/66b vs. 8b/10b). Also, thanks to commercial implementations of FCoE networking devices, whereby an FCoE switch is essentially an Ethernet switch with a software license to enable additional capabilities, a few customers moved one step further and embraced an end-to-end FCoE architecture, from server to disk array.
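The roughly 50% figure follows directly from the line rates and encoding overhead: 8G FC signals at 8.5 GBaud with 8b/10b encoding, while 10G Ethernet signals at 10.3125 GBaud with 64b/66b encoding. The back-of-the-envelope calculation below is a sketch that ignores framing and protocol overhead; it comes out at about 47%, in line with the figure quoted above.

```python
# Approximate usable bit rates, ignoring framing and protocol overhead
fc_8g   = 8.5e9     * 8 / 10    # 8G FC: 8.5 GBaud line rate, 8b/10b encoding   -> ~6.8 Gbps of data
eth_10g = 10.3125e9 * 64 / 66   # 10G Ethernet: 10.3125 GBaud, 64b/66b encoding -> ~10.0 Gbps of data

print(f"8G FC  : {fc_8g / 1e9:.1f} Gbps")
print(f"10G Eth: {eth_10g / 1e9:.1f} Gbps")
print(f"Gain   : {(eth_10g / fc_8g - 1) * 100:.0f}% higher throughput")
```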
One benefit of this approach was the ability to use the exact same switch model for both conventional Ethernet traffic and FCoE traffic, even when separate networks were deployed. In other cases, both capabilities were enabled on a single device with logical separation of the two protocols, leading to further savings. This model became known as multi-hop FCoE, and a few customers have already adopted it. The advent of conveniently priced 16G FC products on the one hand, and the evolution of Ethernet networks from multi-tier to leaf-spine topologies on the other, have somewhat affected the popularity of multi-hop FCoE solutions, and organizations are now waiting for the next level of aggregation: 40 Gbps.
Looking to the future, 2015 could be the year when a further technical evolution makes yet another scenario possible. For conventional Ethernet deployments, changes in network traffic patterns have resulted in more “East-West” traffic across the data center, where server-to-server communication has encouraged architects to rethink the typical 3-tier approach (access, aggregation, core). As a result, a modern 2-tier design, also known as a leaf-spine or “Clos” fabric, provides a complementary option for data center architects. This has prompted networking vendors to consider the possibility of transporting FC frames over this new network topology using FCoE as an overlay technology. In its typical implementation it would be served by physically separated edge devices for SAN A and SAN B, but would leverage logical separation of SAN A and SAN B within the fabric.
The establishment of virtual ISL links within the fabric would happen automatically and dynamically, and the traditional SID/DID/OXID load-balancing scheme would be honored. While certainly not a requirement, there are three specific benefits to considering this type of approach. First, all of the available bandwidth in the data center topology is utilized across both SANs. In traditional physically separated SAN A/B environments, half of the available bandwidth is allocated to SAN A and the other half to SAN B; using this new overlay method, both logical SANs have access to the full bandwidth. As a consequence, the second benefit becomes clear: higher resource utilization in the case of a spine failure.
Normally, in a SAN A/SAN B design with physical separation and a single core switch per fabric, if the core switch for SAN A were to fail, the entire SAN A would be required to fail over, not just the traffic for the failed switch. That means we would lose the bandwidth and capabilities of the other switches in SAN A, even though they are operating normally. With the new overlay approach, the storage network would only miss the capacity of the one failed switch. Third, in such a situation a failover from SAN A to SAN B would not be triggered at all.
Since the Multipath I/O (MPIO) software on the host never receives a signal that it has lost connectivity, no “outage event” occurs. The underlying Ethernet fabric handles the traffic load balancing through normal equal-cost multi-pathing (ECMP) algorithms. At present, it is still up to networking vendors to demonstrate the advantages of this approach. Reliability and ease of use will also need to be assessed, so some time will pass before significant market traction comes into play. The storage administrators’ familiarity with physically separated fabrics, and their proven track record in terms of resiliency, will not be easy to overcome, but it is clear that a single physical fabric with an FCoE overlay and logical separation could represent a new approach, one where resource utilization is maximized and protection from human errors is retained (for example, zoning mistakes would be confined to individual logical SANs).
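As a rough sketch of how such per-exchange load balancing can work (the hash function, field values and link count below are illustrative assumptions; real switches use vendor-specific hardware hashes), a switch can hash the FC source ID, destination ID and originator exchange ID to pick one of several equal-cost uplinks, so that every frame of a given exchange stays in order on the same path while different exchanges spread across all links:

```python
import zlib

def pick_uplink(sid: int, did: int, oxid: int, num_links: int) -> int:
    """Choose an equal-cost uplink for an FC exchange.

    Hashing on S_ID/D_ID/OX_ID keeps every frame of an exchange on the same
    path (preserving in-order delivery) while spreading different exchanges
    across all available links. The CRC32 hash here is purely illustrative.
    """
    key = sid.to_bytes(3, "big") + did.to_bytes(3, "big") + oxid.to_bytes(2, "big")
    return zlib.crc32(key) % num_links

# Example: two exchanges between the same pair of ports may take different uplinks
print(pick_uplink(0x010203, 0x040506, 0x0001, num_links=4))
print(pick_uplink(0x010203, 0x040506, 0x0002, num_links=4))
```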
Virtualization, unification and consolidation tend to drive higher link speeds. FCoE over 40G pipes is already a reality on some networking products; anticipate broader commercial support during 2015, with 100G FCoE expected a year later. Only those customers that decide to move in this direction, with a dynamic FCoE overlay on high-speed links, will be able to claim a truly “unified fabric” deployment within their datacenters.
For more information about SNIA’s activities as they relate to technology initiatives, please visit www.snia-europe.org