Interview with Girish Venkatasubramanian

After my blog post on Academic Simics earlier this summer, I got a very nice reply from Girish Venkatasubramanian of UF. It turned out that he and his group were doing some really interesting and exciting work with Simics, researching hypervisor architectures and hardware support for virtualization. Having been a PhD student myself, I can certainly appreciate the excitement and fun of working in that field. We ended up doing a virtual interview, which I am happy to present here.

We start with the basics. Can you introduce yourself?


I am Girish Venkatasubramanian, a PhD candidate at the Department of Electrical and Computer Engineering, University of Florida. I am a part of the Advanced Computer and Information Systems (ACIS) Lab and am advised by Prof. Renato Figueiredo.

What is your university and research group?


University of Florida (UF) is the largest university in Florida. The Department of Electrical and Computer Engineering is one of the oldest departments at UF (it celebrated its centennial in 2009).

The ACIS Lab was established in 2001 by Prof. Jose Fortes to conduct fundamental and applied research on all aspects of systems that integrate computing and information processing. Currently there are four faculty members, 12 students and one research scientist in the lab. Research at ACIS spans areas like computer architecture, grid-computing middleware, cyberinfrastructure for e-science, autonomic computing and peer-to-peer computing.

What is your research about?

My research is in the area of computer systems, architecture and virtualization – designing hardware to increase performance and provide manageability for high-end virtualized server computing systems. To improve the performance of virtualized workloads, the x86 architecture has been modified to provide hardware support for virtualization [1,2]. The latest in this set of changes is the modification of the Translation Lookaside Buffer (TLB) by adding tags as a part of each TLB entry and providing hardware primitives for tag comparison during TLB lookup, making hardware-managed tagged TLBs viable.

I, along with Prof. Figueiredo (UF), Ramesh Illikkal (Intel) and Donald Newell (Intel), have looked into the generation and management of tags to maximize the benefit of such hardware-managed tagged TLBs. We have proposed the Tag Manager Table (TMT) [3], a software-transparent solution for generating and managing process-specific TLB tags based on the Page Table Base Register (CR3 in x86). By using these tags, multiple address spaces can share the TLB, which reduces the number of TLB flushes and the TLB miss rate and increases the performance of the workloads. The TMT is designed to ensure low latency of TLB lookups and imposes a very small area overhead. We have also investigated how factors like the size of the TMT, the size and associativity of the TLB, the nature of the workload, whether it is consolidated with other workloads, and the page-walk latency influence the TLB behavior and the performance gain obtained from tagging [3,4]. These tags can also be used as a mechanism for controlling the TLB usage of different applications, to achieve selective performance improvement as well as isolation of the TLB behavior of one workload from others.
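To make the TMT idea concrete, here is a rough behavioral sketch in Python. It is an illustration of the general mechanism only, not the hardware design evaluated in [3]: the table size, the LRU eviction policy and all names here are assumptions. On each CR3 write, the TMT maps the new page-table base to a small tag; TLB entries are flushed only when a tag has to be recycled, instead of on every address-space switch:

```python
# Behavioral sketch of a Tag Manager Table (TMT) with a tagged TLB.
# Illustrative only: sizes, names and the eviction policy are assumptions.

class TaggedTLB:
    def __init__(self):
        self.entries = {}            # (tag, virtual page) -> physical page

    def lookup(self, tag, vpage):
        return self.entries.get((tag, vpage))   # None -> miss -> page walk

    def flush_tag(self, tag):
        # Flush only the entries belonging to one recycled tag.
        self.entries = {k: v for k, v in self.entries.items() if k[0] != tag}

class TagManagerTable:
    def __init__(self, num_tags=8):
        self.num_tags = num_tags
        self.cr3_to_tag = {}         # page-table base -> tag
        self.lru = []                # tags, least recently used first

    def tag_for(self, cr3, tlb):
        """Return the tag for an address space; invoked on CR3 writes."""
        if cr3 in self.cr3_to_tag:
            tag = self.cr3_to_tag[cr3]
        elif len(self.cr3_to_tag) < self.num_tags:
            tag = len(self.cr3_to_tag)           # a fresh, unused tag
        else:
            # Recycle the LRU tag: flush that tag's TLB entries only,
            # rather than flushing the whole TLB on the context switch.
            tag = self.lru.pop(0)
            victim = next(c for c, t in self.cr3_to_tag.items() if t == tag)
            del self.cr3_to_tag[victim]
            tlb.flush_tag(tag)
        self.cr3_to_tag[cr3] = tag
        if tag in self.lru:
            self.lru.remove(tag)
        self.lru.append(tag)                     # mark as most recently used
        return tag
```

In hardware, the CR3-to-tag map and the LRU bookkeeping would be a small associative structure consulted on CR3 writes rather than on every TLB lookup, which is how a design like this can stay off the TLB-lookup critical path.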

How do you use Simics?

One of the primary requirements for any simulation-based study of virtualized environments is a flexible and robust simulator, and Simics fits the requirement very well. Since Simics is a full-system simulator, we were able to create a disk image by booting Xen 3.1.0/2.6.18-xen on an x86-440bx simulated machine [3].

This Xen kernel was instrumented to indicate events of interest, like context switches. Using this disk image, multiple domains can be booted on the simulated machine and workloads run in them. We have been able to use Simics to create multi-domain (dom0 + 5 domUs), multi-processor (up to 8 CPUs) configurations to investigate consolidated scenarios. The Simics x86 TLB model has also been extended by adding tags, both process-specific and VM-specific, as a part of each TLB entry, together with tag-comparison logic, to simulate tagged TLBs. Timing models for the TLB have been created and incorporated into a simulation framework consisting of Simics and FeS2 in order to study the impact of the TLB on the performance of virtualized workloads, and the improvement in this performance due to tagged TLBs [4].
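In Simics, this kind of guest instrumentation is often done with magic instructions – special no-op instruction sequences compiled into the guest that the simulator traps on and exposes through the Core_Magic_Instruction hap. As a sketch of the general technique (not the group's actual scripts; the event number is made up, and the exact import style varies between Simics versions), a Python listener could look like this:

```python
# Sketch: reacting to magic instructions from an instrumented guest kernel.
# Illustration of the general Simics technique; the event numbering is an
# assumption, not what the Xen study actually used.
from simics import SIM_hap_add_callback, SIM_cycle_count

CONTEXT_SWITCH = 1   # hypothetical magic number compiled into the Xen kernel

def on_magic(user_data, cpu, magic_number):
    # Called every time a simulated CPU executes a magic instruction.
    if magic_number == CONTEXT_SWITCH:
        print("context switch on %s at cycle %d"
              % (cpu.name, SIM_cycle_count(cpu)))

SIM_hap_add_callback("Core_Magic_Instruction", on_magic, None)
```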


Moreover, as a part of a multi-university collaborative project with the NSF Center for Autonomic Computing, this Simics simulation framework with the Xen disk image is being used by different research groups to generate instruction and memory traces of virtualized workloads. These traces are generic enough that, with a little post-processing, they can be used as input to various simulators like DRAMSim and DEVS. As a part of the NSF Archer project, Simics has been deployed on a grid environment and is one of the most widely used simulators in this infrastructure.

A list of publications which have used Simics on Archer can be found at http://www.archer-project.org/#Publications_Using_Archer.

This Xen on Simics set-up has also been used for class assignments and projects for Prof. Figueiredo's Virtual Computing course. Students were provided with Simics checkpoints of virtualized workloads and asked to investigate the effect of various factors like cache size, TLB size and Xen scheduler settings on the workloads.
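In practice, such an assignment boils down to loading a checkpoint, tweaking a model parameter, running, and measuring. A minimal sketch of that workflow in Simics's Python, where the checkpoint file, object name and cache attribute are hypothetical and depend on the actual configuration and cache model:

```python
# Sketch of a checkpoint-based experiment. The file name, object name
# and attribute are hypothetical; they depend on the configured models.
from simics import SIM_read_configuration, SIM_get_object, SIM_continue

SIM_read_configuration("xen-workload.ckpt")   # load the saved system state

cache = SIM_get_object("l2_cache")            # hypothetical object name
cache.config_assoc = 8                        # e.g. vary the associativity

SIM_continue(100000000)                       # run, then inspect statistics
```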

So essentially, you are teaching virtual machine technology using a Simics virtual platform on a distributed virtualized cluster? Cool!

Yes, we use virtualization to teach virtualization. Students benefit tremendously from hands-on assignments where they can simulate, cycle-by-cycle, the execution of contemporary virtual machine hypervisors (e.g. Xen) using Simics. With Archer, we provide a plug-and-play virtual machine environment that makes it easy for students to accomplish this – they can download an appliance to their own desktop and run their first Simics simulation in a matter of minutes. Because Archer is openly accessible to academics and students, and because virtual machines are configured in the same way regardless of where they run, the hands-on educational modules we develop at UF can easily be re-used by educators and students at other universities. In the long run, we hope that many hands-on educational modules that use Simics will be created and shared by the community using the distributed virtual cluster technology in Archer, creating a "virtual university lab" of sorts for computer architecture students.

Do you have any interesting results to share?

Using the instrumented Xen disk image booted on the Simics simulator, the "flush profile" of a workload (a breakdown of the TLB flushes, based on the cause of each flush, into inter-VM flushes, intra-VM flushes and consistency flushes) can be determined. This profile gives insight into the TLB behavior of the workload and can be used as a coarse but quick metric for comparing different TLB improvement schemes. Moreover, we have seen that an 8-entry Tag Manager Table is able to avoid upwards of 90% of the TLB flushes, which reduces the TLB miss rate by about 65% for a 1024-entry TLB running TPCC-UVa (an implementation of the TPC-C benchmark).
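To illustrate what goes into such a flush profile, here is a simplified classification sketch in Python. The event fields and the sample data are invented for illustration; the real bookkeeping lives in the instrumented kernel and the simulator scripts:

```python
# Sketch: classifying TLB flushes into a "flush profile".
# Event fields and sample data are invented for illustration.
from collections import Counter

def classify(flush):
    if flush["new_vm"] != flush["old_vm"]:
        return "inter-VM"        # world switch between domains
    if flush["new_cr3"] != flush["old_cr3"]:
        return "intra-VM"        # process switch inside one domain
    return "consistency"         # e.g. page-table update or TLB shootdown

flush_events = [
    {"old_vm": 0, "new_vm": 1, "old_cr3": 0x1000, "new_cr3": 0x2000},
    {"old_vm": 1, "new_vm": 1, "old_cr3": 0x2000, "new_cr3": 0x3000},
    {"old_vm": 1, "new_vm": 1, "old_cr3": 0x3000, "new_cr3": 0x3000},
]

profile = Counter(classify(f) for f in flush_events)
print(profile)
```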

This roughly 65% reduction in TLB misses translates into a 50% reduction in the delay caused by TLB misses and the resulting page walks.

The increase in IPC due to this reduction in TLB misses depends on the workload, varying from 3% for I/O-intensive workloads to 8% for memory-intensive benchmarks. Along with the nature of the workload, the TLB size plays a huge role in determining this IPC increase. We have also seen that, in cases where TLB space is a scarce resource (which happens when many processes share a small TLB), the gain in IPC for a selected high-priority domain can be further increased by constraining the TLB usage of other, low-priority domains.

Thank you!

(Jakob is back) That was very interesting! I like how they use the same setup across their own research, in the Archer cluster used by other groups, and in their teaching. It is nice to be able to package complex things into a transportable unit with Simics.

References

This might sound a bit overly formal, but here are the references in the text above. Putting these long strings inline in the blog does not really work, so we used the classic "numbers in brackets" style to reference them here.

[1] Advanced Micro Devices, AMD-V Nested Paging, White paper, AMD, July 2008.

[2] G. Neiger, A. Santoni, F. Leung, D. Rodgers, and R. Uhlig, Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization, Intel Technology Journal, vol. 10, no. 3, pp. 167-178, August 2006.

[3] G. Venkatasubramanian, R. J. Figueiredo, R. Illikkal, and D. Newell, "TMT – A TLB Tag Management Framework for Virtualized Platforms," 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2009), October 2009.

[4] G. Venkatasubramanian, R. J. Figueiredo, R. Illikkal, and D. Newell, "A Simulation Framework for the Analysis of TLB Behavior in Virtualized Environments," 18th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2010), August 2010.
