By Charlie Ashton & Ron Breault
It’s been a busy 18 months for Wind River’s Titanium Server NFV Infrastructure (NFVI) platform. We announced Titanium Server at Mobile World Congress back in February 2014, as the industry’s first commercial Carrier Grade solution for NFVI, and it’s still the only platform in this category. We delivered the General Availability product release in October 2014 and shortly after that revealed that HP had adopted the technology for their Carrier Grade Helion solution: just the first of several customer announcements. We launched the Titanium Cloud partner ecosystem in June 2014 and this has quickly expanded to a rich set of validated solutions from industry-leading companies.
In this post, we’ll discuss how our customers’ target use cases have evolved during this time and highlight some of the key enhancements that we’ve made to Titanium Server reflecting this shift in focus within the industry.
At the beginning of 2014, the industry was emphasizing applications such as virtual EPC and virtual IMS as initial applications for NFV. The expectation was that these network core functions would not only yield significant OPEX savings but would also be suitable for early deployment.
That position seems to have changed in the intervening months. Our customers are now telling us that they see greater near-term business potential in applications such as virtual business CPE (vBCPE), Mobile Edge Computing (MEC) and virtual RAN (vRAN) use cases.
We consistently hear that these edge and access use cases will significantly accelerate service providers’ ability to deploy new services in response to customer requests, while also providing OPEX savings that are both significant and quantifiable. At the same time, they can be implemented without the need for comprehensive Management and Orchestration (MANO) solutions, which is helpful since the relevant ETSI standards are still under discussion.
In response to these shifting industry priorities, we’ve moved aggressively to implement new features in Titanium Server so that we can continue to support our customers’ target applications with an NFVI cloud that provides the reliability, performance and cost structure that they need. Our most recent release, now in use by customers and partners worldwide, includes a wide range of enhancements to the original platform.
Low system cost is critical to the viability of many vBCPE, MEC and vRAN solutions. These applications are often hosted either at the customer premises or in a local service provider Point of Presence (PoP), where large server racks are not cost-effective. So we’ve added a small-footprint version of Titanium Server that can be deployed on only two servers.
In this configuration, each of two redundant servers is partitioned into Compute, Control and Storage functions. The control and storage functions can each run on as few as a single processor core, leaving the lion’s share of cores available for the compute function which hosts revenue-generating services.
Unlike other enterprise-class platforms that require a third redundant control node (and therefore a third server) to arbitrate between the other two in the case of failures, Wind River has unique technology that avoids “split brain” conditions and enables Titanium Server to achieve full Carrier Grade reliability using just two servers, resulting in significant CAPEX and OPEX savings for our customers.
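To make the core budget in the two-server configuration concrete, here is a minimal sketch in Python. The function name and the 16-core server size are illustrative assumptions, not Titanium Server parameters; the only figure taken from the text above is that control and storage can each run on as few as one core.

```python
def compute_cores(total_cores: int, control_cores: int = 1, storage_cores: int = 1) -> int:
    """Return the cores left for the compute function on one server.

    Defaults reflect the minimum described in the post: one core each
    for the control and storage functions. Hypothetical helper.
    """
    remaining = total_cores - control_cores - storage_cores
    if remaining < 1:
        raise ValueError("no cores left for the compute function")
    return remaining


if __name__ == "__main__":
    total = 16  # assumed server size, for illustration only
    left = compute_cores(total)
    print(f"{left} of {total} cores ({left / total:.1%}) available for VNFs")
```

Under these assumptions, each server keeps 14 of its 16 cores for revenue-generating services.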
OPEX savings are critical for many of our customers, so processor resources need to be provisioned dynamically and optimally based on the actual network traffic at any given time, rather than over-allocated in anticipation of peak demand. To accomplish this, we’ve added a sophisticated CPU scale-up / scale-down capability to Titanium Server that provides full dynamic scaling without compromising Carrier Grade reliability. As traffic through a VM increases to the point where the VM is close to saturating the processor cores that it’s running on, Titanium Server automatically allocates additional processor cores to the VM. Similarly, when the load on a VM drops so that it needs fewer resources, processor cores are automatically removed. All this happens without any need to restart or reboot the VM, ensuring that there’s no risk of service downtime during the scale-up / scale-down process. Further, the triggers which initiate the scaling actions are flexible and policy driven, enabling full control over the process.
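A policy-driven trigger of this kind can be pictured as a pair of utilization thresholds with limits on the core count. The sketch below is only an illustration of that idea: the class, field names and threshold values are assumptions made for this example and do not represent the Titanium Server policy format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy: all names and defaults are assumptions."""
    scale_up_util: float = 0.85    # add a core at or above this average utilization
    scale_down_util: float = 0.40  # remove a core at or below this average utilization
    min_cores: int = 1
    max_cores: int = 8


def next_core_count(current_cores: int, avg_util: float, policy: ScalingPolicy) -> int:
    """Return the core count a VM should have after one policy evaluation.

    avg_util is the average utilization (0.0 to 1.0) of the VM's current cores.
    The count changes by at most one core per evaluation.
    """
    if avg_util >= policy.scale_up_util and current_cores < policy.max_cores:
        return current_cores + 1
    if avg_util <= policy.scale_down_util and current_cores > policy.min_cores:
        return current_cores - 1
    return current_cores


if __name__ == "__main__":
    policy = ScalingPolicy()
    for cores, util in [(2, 0.90), (4, 0.60), (4, 0.30)]:
        print(cores, util, "->", next_core_count(cores, util, policy))
```

Keeping a gap between the two thresholds is a common design choice: it prevents a VM from oscillating between core counts when its load hovers near a single cut-off.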
A key component of a service provider’s OPEX calculation is VM density, in other words the number of VMs that can be supported per server. In order to maximize the VM density in an NFV deployment, it’s important for the NFVI platform to support the very latest in high-performance Network Interface Cards (NICs). With Titanium Server, we work closely with the industry’s leading NIC providers and make sure that we implement optimized support for their high-performance solutions targeted at NFV applications. As an example, Titanium Server now supports the Intel® Ethernet Controller XL710 (formerly known as “Fortville”) as well as the Mellanox CX3 10G/40G NICs. We’ll continue to add support for new high-performance NICs as they become available from our partners.
As shown on the Titanium Cloud website, a large number of Virtual Network Function (VNF) suppliers are now supporting Titanium Server. It’s important for most of these partners to optimize the performance of their VNFs and ensure they’re fully leveraging the performance-oriented features of the platform, such as the Accelerated vSwitch.
We’ve added a vSwitch packet trace tool to Titanium Server, enabling these partners to efficiently tune their VNFs so that they can deliver the highest possible performance to our mutual customers.
Many of our VNF partners use the Intel® Data Plane Development Kit (Intel® DPDK) software library as a way to maximize the packet processing performance of their applications (and we also use DPDK within the Titanium Server Accelerated vSwitch). While some of those partners have migrated to the latest version of DPDK, revision 2.0, many are continuing to use earlier versions. With Titanium Server, we implemented support for VNFs based on DPDK 2.0 as soon as it became available and we also migrated the Accelerated vSwitch to DPDK 2.0. Uniquely, we continue to support VNFs based on older DPDK versions, so VNFs built against different DPDK versions can run concurrently on the same platform; there’s no requirement for our partners to move to DPDK 2.0 in the guest even though the host uses that version. This feature is key to OPEX savings, ensuring that service providers can choose when to upgrade their VNFs rather than being compelled to do so when an obscure platform limitation is exposed.
To simplify and accelerate the migration of network functions to Titanium Server, we’ve added support for standard “QinQ” tunneling. This ensures that complex applications implementing their own VLAN network segregation schemes don’t have to be rewritten when transitioning to the virtualized environment provided by Titanium Server. Applications can continue to employ their own VLAN tags while Titanium Server’s Accelerated vSwitch transparently tunnels traffic across and between nodes and networks, uniquely encapsulating and protecting each VNF’s traffic.
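For readers unfamiliar with QinQ, the sketch below shows the standard IEEE 802.1ad frame layout it relies on: an outer service tag (TPID 0x88A8) is pushed in front of the application’s own customer VLAN tag (TPID 0x8100), which is left untouched. This illustrates the tagging scheme only; the function names are hypothetical, and nothing here describes how the Accelerated vSwitch is actually implemented.

```python
import struct
from typing import Tuple

TPID_CTAG = 0x8100  # IEEE 802.1Q customer tag
TPID_STAG = 0x88A8  # IEEE 802.1ad service (outer) tag
MAC_HEADER_LEN = 12  # destination MAC (6 bytes) + source MAC (6 bytes)


def vlan_tag(tpid: int, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build a 4-byte VLAN tag: 16-bit TPID followed by 16-bit TCI."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)


def push_service_tag(frame: bytes, s_vid: int) -> bytes:
    """Insert an outer service tag directly after the MAC addresses."""
    return frame[:MAC_HEADER_LEN] + vlan_tag(TPID_STAG, s_vid) + frame[MAC_HEADER_LEN:]


def pop_service_tag(frame: bytes) -> Tuple[int, bytes]:
    """Remove the outer service tag; return its VLAN ID and the original frame."""
    tpid, tci = struct.unpack("!HH", frame[MAC_HEADER_LEN:MAC_HEADER_LEN + 4])
    if tpid != TPID_STAG:
        raise ValueError("frame does not carry an 802.1ad service tag")
    return tci & 0x0FFF, frame[:MAC_HEADER_LEN] + frame[MAC_HEADER_LEN + 4:]


if __name__ == "__main__":
    macs = bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001")
    # The application's own frame, already tagged with customer VLAN 100.
    customer_frame = macs + vlan_tag(TPID_CTAG, 100) + struct.pack("!H", 0x0800) + b"payload"
    tunneled = push_service_tag(customer_frame, s_vid=2000)
    s_vid, restored = pop_service_tag(tunneled)
    print(s_vid, restored == customer_frame)
```

Because the inner tag is carried as opaque payload, the application’s VLAN scheme survives the trip unchanged, which is why such applications need no rewriting.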
The final new feature that we’ll highlight in this post is one that’s critically important to service providers deploying an NFV cloud based on Titanium Server. For service providers, infrastructure deployment costs represent a significant portion of their overall OPEX and many have expressed nervousness about the learning curve for their IT teams as they roll out new platforms. We’ve addressed this concern through a new bulk provisioning capability. This graphical, wizard-like tool greatly simplifies the automated deployment of large, distributed Titanium Server clusters and supports accelerated installation from a boot server. It’s all part of our focus on ensuring ease-of-use for our customers while also maximizing service-level performance and reliability.
Besides the features that we’ve touched on above, the latest release of Titanium Server includes a wealth of other enhancements in areas such as: huge page support; VMs that span NUMA nodes; accelerated and distributed virtual routing; hyperthreading awareness in the scheduler; Link Aggregation Control Protocol (LACP) support; and more.
Please feel free to contact us to talk about any of these topics or to suggest other areas that we should investigate. The industry is moving quickly to focus on early use cases that will deliver strong Return on Investment and Wind River will continue to deliver the NFVI platform features that are required for these applications.