Open-Source NFV Doesn’t Mean Cookie-Cutter NFV

C. Ashton

We recently published a couple of posts about the use of open-source technology in NFV, one asking “Will OPNFV become the de facto standard for NFV compatibility?” and the second explaining that “Yes, you can beat your NFV competitors to market while still leveraging open-source”.

In these two posts, we discussed many of the benefits that developers of Network Functions Virtualization (NFV) solutions can derive from open-source projects like OPNFV. These benefits include the flexibility afforded by multi-vendor solutions based on open standards and the time-to-market advantage that results from pre-integrated platforms.

It’s important to note that a decision to implement solutions based on open-source code doesn’t mean that you’re restricted to achieving the same performance and deploying the same features as all your competitors. Differentiation is alive and well in NFV. In this post, we’ll illustrate this point with a couple of examples showing how the right platform choice allows you to leverage the benefits of open-source technology while delivering best-in-class Virtual Network Function (VNF) performance as well as critical service reliability.

Let’s talk first about the VNF performance and OPEX savings that are enabled by high-performance virtual switching.

In the NFV architecture, the virtual switch (vSwitch) is responsible for switching network traffic between the core network and the VNFs running in Virtual Machines (VMs). The vSwitch runs on the same server platform as the VNFs, so any processor cores dedicated to the vSwitch are unavailable for running VNFs. This can significantly reduce the number of subscribers that can be supported on a single server blade, which in turn drives up the operational cost-per-subscriber and has a major influence on the OPEX improvements that can be achieved by a move to NFV.

Wind River’s Titanium Server NFV Infrastructure (NFVI) platform is a commercially available solution that addresses this challenge, thanks to its Carrier Grade Accelerated vSwitch (AVS), highlighted in the diagram below.

[Diagram: the Titanium Server NFVI platform, highlighting the Carrier Grade Accelerated vSwitch (AVS)]

Let’s look at a specific use case to demonstrate how Titanium Server’s AVS delivers significant OPEX savings for service providers while remaining fully compatible with all the relevant open standards:

To keep the analysis simple, we’ll assume that we need to instantiate a function such as a media gateway as a VNF, and that it requires 2 million packets per second (2Mpps) of bandwidth from the vSwitch. To simplify further, we’ll assume that we instantiate a single VM, running this VNF, on each processor core. The question is how many VMs we can actually instantiate on our server blade, given that some of the available cores will be needed by the vSwitch.

As the reference platform for our analysis, we’ll use a dual-socket Intel® Xeon® Processor E5-2600 series platform running at 2.9GHz, with a total of 24 cores available across the two sockets. All our performance measurements are based on bidirectional network traffic running from the Network Interface Card (NIC) to the vSwitch, through a VM and back through the vSwitch to the NIC. This is a real-world NFV configuration, rather than a simplified configuration in which traffic runs only from the NIC to the vSwitch and back to the NIC, bypassing the VM so that no useful work is performed.

In the first scenario, we use the open-source Open vSwitch (OVS) to switch traffic to the VMs. Measurements show that each core running OVS can switch approximately 0.3 million packets per second (Mpps) of 64-byte packets to a VM. The optimum configuration for our 24-core platform is to dedicate 20 cores to the vSwitch, delivering a total of 6Mpps (20 × 0.3Mpps). That is just enough to feed three 2Mpps VMs, so 3 cores run VMs and one core sits unused; running VMs on additional cores wouldn’t help, because OVS can’t deliver the bandwidth they would require. Our resource utilization is 3 VMs per blade.

What if we replace OVS with Titanium Server’s Accelerated vSwitch (AVS)? We can now switch approximately 12Mpps per core, again assuming 64-byte packets. Our 24-core platform can therefore be configured with just 4 cores running the vSwitch, delivering up to 48Mpps, which comfortably covers the 40Mpps required by 20 VMs running on the remaining 20 cores. Our resource utilization is now 20 VMs per blade, thanks to AVS software optimized for NFV infrastructure.
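To make this trade-off easy to reproduce, here is a minimal Python sketch of the core-allocation arithmetic. The per-core switching rates and the 2Mpps-per-VM requirement are the assumptions from the example above, not measured values for any particular deployment:

```python
# Core-allocation sketch: the vSwitch and the VNFs compete for the same
# processor cores, so more vSwitch cores means fewer cores left for VMs.
# Assumptions (from the example above): one VM per core, 2 Mpps per VM.

def vms_per_blade(total_cores, vswitch_mpps_per_core, vm_mpps=2.0):
    """Return (vm_count, vswitch_cores) for the best core split."""
    best = (0, 0)
    for vswitch_cores in range(1, total_cores):
        vm_cores = total_cores - vswitch_cores
        capacity_mpps = vswitch_cores * vswitch_mpps_per_core
        feedable_vms = int(capacity_mpps // vm_mpps)  # VMs the vSwitch can feed
        vms = min(vm_cores, feedable_vms)
        if vms > best[0]:
            best = (vms, vswitch_cores)
    return best

ovs_vms, ovs_cores = vms_per_blade(24, 0.3)   # OVS: ~0.3 Mpps per core
avs_vms, avs_cores = vms_per_blade(24, 12.0)  # AVS: ~12 Mpps per core

print(f"OVS: {ovs_vms} VMs per blade ({ovs_cores} vSwitch cores)")  # 3 VMs, 20 cores
print(f"AVS: {avs_vms} VMs per blade ({avs_cores} vSwitch cores)")  # 20 VMs, 4 cores
print(f"Improvement: {avs_vms / ovs_vms:.1f}x")                     # 6.7x
```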

[Diagram: resource utilization comparison, 3 VMs per blade with OVS vs. 20 VMs per blade with Titanium Server AVS]

From a business perspective, increasing the number of VMs per blade by a factor of 6.7 (20 divided by 3) allows us to serve the same number of customers using only 15% as many blades as with OVS, or to serve 6.7 times as many customers using the same server rack. Either way, this represents a very significant reduction in OPEX, and it can be achieved with no changes to the VNFs themselves.

Titanium Server’s AVS is fully compatible with all the applicable open NFV standards. Software written to use OVS will typically work with AVS unchanged. And as described in more detail here, Wind River provides both an open-source Kernel Loadable Module (KLM) and a DPDK Poll Mode Driver (PMD) that VNF vendors can optionally use to fully exploit the performance features of AVS. Both are available free of charge from Wind River’s open-source repository.

AVS is a great example of how you can retain all the advantages of a platform based on open-source software and open standards, while leveraging its differentiating features to deliver compelling OPEX reductions in live networks.

Carrier Grade reliability is another area of differentiation that brings critical business benefits.

Over decades, telecom service providers have engineered an extensive range of sophisticated features into their networks, to the point where they guarantee “five-nines” (99.999%) reliability both for critical services (e.g. E-911) and for enterprise-class services covered by stringent Service Level Agreements (SLAs). Delivering five-nines reliability for these services means guaranteeing six-nines (99.9999%) uptime for the underlying network infrastructure, which implies no more than about 32 seconds of downtime per year.
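As a quick sanity check on that downtime figure, here is a short Python sketch that converts an availability target into an annual downtime budget:

```python
# Convert an availability target ("number of nines") into the maximum
# downtime allowed per year.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 31.56 million seconds

def downtime_budget_seconds(nines):
    unavailability = 10 ** (-nines)  # e.g. six nines -> 0.000001
    return SECONDS_PER_YEAR * unavailability

print(f"five nines (99.999%):  {downtime_budget_seconds(5):6.1f} s/year")  # ~315.6 s
print(f"six nines (99.9999%):  {downtime_budget_seconds(6):6.1f} s/year")  # ~31.6 s
```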

Delivering this level of Carrier Grade reliability has broad and complex implications for network availability, security, performance and management. Meeting critical requirements in these areas represents a key business challenge for telecom service providers as they refine their plans to progressively introduce NFV into their networks. They know that they need to continue to meet expectations for reliability as they transition to NFV; otherwise they run the risk of losing their high-value customers and seeing increased subscriber churn. That would seriously impact their ability to reduce OPEX and increase subscriber revenues, which are after all the core business objectives behind the NFV initiative.

It’s extremely difficult to develop a network infrastructure platform that delivers Carrier Grade reliability. And NFV makes this even harder, because of the complex interactions between so many software elements within the platform. These include not only the Operating System itself (e.g. Linux) but also the hypervisor (e.g. KVM), the virtual switch, the management layer (e.g. OpenStack), the storage subsystem (e.g. Ceph) and the middleware functions.

There’s no way to meet the six-nines goal with solutions designed for enterprise-class IT applications; they fall well short of the critical performance and functionality requirements.


You have to start from scratch, developing a platform specifically for this purpose and designing in the reliability features from day one.

This requires not only a major engineering investment but also an in-depth technical understanding of the complex challenges involved. At Wind River, the Titanium Server engineering team leveraged Carrier Grade experience gained over many years at telecom equipment companies, applying the telecom industry’s standard TL 9000 methodology to guarantee the levels of reliability demanded by service providers.

The Titanium Server NFVI platform is based on open-source projects such as Carrier Grade Linux, KVM, OpenStack, Ceph Storage and Intel® DPDK. Our team was able to add the critical Carrier Grade reliability features while maintaining 100% compatibility with the relevant NFV standards, originally specified by the ETSI NFV Industry Specification Group (ISG) and now being augmented by the OPNFV initiative. Our experts are frequent contributors to these open-source projects, upstreaming hardened versions of a wide range of OpenStack and Linux components.

As with the virtual switching example described above, the right choice of NFVI platform allows you to deliver the level of infrastructure uptime that is an absolute requirement for live telecom networks, while still benefiting from a solution based on open-source projects.

Basing your NFV deployment on open-source software doesn’t have to mean that you end up with the same performance and reliability as your competitors. Choosing a platform that extends the open-source baseline with compatible, value-added enhancements allows you to differentiate yourself from the pack and grab market share during this exciting industry disruption.