NightWatch is an extension to the memory management system that provides general, transparent, and low-overhead cache pollution control. NightWatch divides memory mappings into two types: restrictive mapping and open mapping. Restrictive mapping confines the pollution caused by poor-locality data, while open mapping is used for cache-friendly data. When a malloc request arrives, NightWatch predicts the access locality of the memory about to be allocated, determines its cache demand, and selects the appropriate mapping type for the request. NightWatch is based on the observation that data within the same memory chunk, or chunks allocated in the same allocation context, often share similar locality properties. NightWatch applies this observation by monitoring current cache locality online to predict future behavior, and by proactively restricting potential cache polluters.
NightWatch is an extension for memory allocators that targets resource management of the CPU cache.
Traditional memory allocators are designed around main memory resource management, for example improving the efficiency of memory allocation and reducing memory fragmentation. However, on most commodity platforms, both CPU cache sets and physical pages are indexed by physical address. This implies that a piece of data’s mapping to main memory and its mapping to the CPU cache are closely coupled: once the main memory assignment for the data is finished, its mapping to the cache is automatically settled. With this coupling, low-locality and high-locality data can be mapped to the same cache sets, which degrades cache performance.
From this point of view, it is necessary to integrate cache resource management into dynamic memory allocators. In other words, the dynamic memory allocator should be extended to act as a dual-memory-layer manager that handles main memory allocation as well as cache management.
NightWatch is designed for this goal. When integrated with NightWatch, a traditional memory allocator can manage cache resources: once an allocation request arrives, NightWatch quantifies its cache demand and notifies the memory allocator to allocate memory with the proper data-to-cache mapping.
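The coupling described above can be made concrete with a little arithmetic. The following is a minimal sketch, not code from NightWatch: the `cache_geometry` struct, the function names, and the cache parameters used in the example are all assumptions chosen for illustration. It shows how a physical address determines both the cache set and the "page color" (the group of cache sets a physical page can occupy) in a physically indexed cache.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a physically indexed cache (sizes in bytes). */
struct cache_geometry {
    uint64_t cache_size;
    uint64_t associativity;
    uint64_t line_size;
    uint64_t page_size;
};

/* Number of distinct page colors: the size of one cache way divided by the page size. */
uint64_t num_page_colors(const struct cache_geometry *g)
{
    assert(g->associativity > 0 && g->page_size > 0);
    return g->cache_size / (g->associativity * g->page_size);
}

/* Cache set selected by a physical address. */
uint64_t cache_set_of(const struct cache_geometry *g, uint64_t paddr)
{
    assert(g->associativity > 0 && g->line_size > 0);
    uint64_t num_sets = g->cache_size / (g->associativity * g->line_size);
    assert(num_sets > 0);
    return (paddr / g->line_size) % num_sets;
}

/* Page color of the physical page that contains paddr. */
uint64_t page_color_of(const struct cache_geometry *g, uint64_t paddr)
{
    uint64_t colors = num_page_colors(g);
    assert(colors > 0);
    return (paddr / g->page_size) % colors;
}
```

For an assumed 8 MiB, 16-way cache with 64-byte lines and 4 KiB pages, this gives 128 colors. Physical addresses 0 and 128 * 4096 then share color 0 and compete for the same cache sets, regardless of which data the allocator placed there.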
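One way to picture "allocating memory with the proper data-to-cache mapping" is as a choice of page colors. The sketch below is an illustration under stated assumptions, not the NightWatch implementation: the constants `NUM_COLORS` and `RESTRICTED_COLORS`, the function names, and the decision to keep open-mapping pages out of the restricted colors are all hypothetical.

```c
#include <assert.h>

enum mapping_type { OPEN_MAPPING, RESTRICTIVE_MAPPING };

/* Assumed values for illustration only. */
#define NUM_COLORS        128u  /* page colors offered by the cache */
#define RESTRICTED_COLORS 4u    /* colors set aside for poor-locality data */

/* Returns nonzero if a page of this color may back the given mapping type. */
int color_allowed(enum mapping_type type, unsigned color)
{
    assert(color < NUM_COLORS);
    if (type == RESTRICTIVE_MAPPING)
        return color < RESTRICTED_COLORS;
    return color >= RESTRICTED_COLORS;
}

/* Picks the next color for a mapping type, cycling round-robin through
 * the colors that type may use. The caller keeps one cursor per type. */
unsigned next_color(enum mapping_type type, unsigned *cursor)
{
    unsigned first = (type == RESTRICTIVE_MAPPING) ? 0u : RESTRICTED_COLORS;
    unsigned count = (type == RESTRICTIVE_MAPPING)
                         ? RESTRICTED_COLORS
                         : NUM_COLORS - RESTRICTED_COLORS;
    unsigned color = first + (*cursor % count);
    *cursor = (*cursor + 1u) % count;
    return color;
}
```

In this sketch, poor-locality chunks are confined to a small slice of the cache, so they can only evict one another, while cache-friendly chunks spread over the remaining colors.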
For more of CGCL’s open-source code, please visit https://github.com/CGCL-codes.
NightWatch benefits your programs in any of the following cases:
Single-program cases, where weak-locality data and strong-locality data are accessed in parallel.
Multi-threaded cases, where weak-locality data and strong-locality data are accessed in parallel.
Multi-program cases, where some programs pollute the shared cache by accessing weak-locality data, while other programs need sufficient cache space for better performance.
NOTE: NightWatch only handles dynamic memory allocations. It does not manage cache assignment for data in the data segment, the bss segment, or the stack.
NightWatch’s service is transparent to the user’s application. A memory allocator integrated with NightWatch does not need to change its allocation interfaces.
OS kernel update. NightWatch relies on the page coloring technique to allocate cache resources. Our kernel patch is under /kernel_patch. The patch is for the Linux kernel “kernel-2.6.32-71.el6”. See /kernel_patch/readme.txt for more details.
Install PAPI. You can find the latest version of PAPI at http://icl.cs.utk.edu/papi/.
Install the NightWatch library. The single-threaded version is nightwatch_v1.01_serial, and the multi-threaded version is nightwatch_v2.0_parallel.
Modify the memory allocator. If you are an allocator developer and want to integrate NightWatch into your own memory allocator, you need to implement the interfaces defined in allocator.h. In this project, we have integrated NightWatch into tcmalloc; you can use the modified allocator (under /gperftools-2.4_NW_externed_v2.0) as an example. If you just want to try a cache-aware allocator, the modified allocator can be used directly without further changes. To use it, relink your application with the flag -ltcmalloc. For more detailed information, see /gperftools-2.4_NW_externed_v2.0/readme.txt.
The system framework is illustrated in the following figure. There are three main components: the memory manager, the locality monitor, and the locality predictor.
The locality monitor collects locality information from previously allocated chunks. It periodically samples the references to pages of the target chunks and evaluates each chunk’s locality property, which is sent to the locality predictor. Based on this historical locality information, the locality predictor determines the proper mapping for pending allocation requests. When a new request arrives, the predictor first checks its allocation context and uses the locality profiles of predecessor chunks from that context to predict the pending chunk’s locality property. Then, the predictor notifies the memory manager to perform the allocation.
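The prediction step can be sketched as a small per-context history table. This is a minimal illustration under assumptions, not the NightWatch code: the table layout, the integer `context_id` (for example, a hash of the allocation call site), the majority-vote rule, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum mapping_type { OPEN_MAPPING, RESTRICTIVE_MAPPING };

#define CONTEXT_SLOTS 256u  /* assumed table size */

/* Locality history of the chunks previously allocated from one context. */
struct context_profile {
    uint64_t context_id;
    unsigned weak_chunks;    /* predecessors judged to have weak locality */
    unsigned strong_chunks;  /* predecessors judged to have strong locality */
    int in_use;
};

static struct context_profile profiles[CONTEXT_SLOTS];

/* Finds the slot for a context; a new or colliding context starts with empty history. */
static struct context_profile *lookup(uint64_t context_id)
{
    struct context_profile *p = &profiles[context_id % CONTEXT_SLOTS];
    if (!p->in_use || p->context_id != context_id) {
        p->context_id = context_id;
        p->weak_chunks = 0;
        p->strong_chunks = 0;
        p->in_use = 1;
    }
    return p;
}

/* Called by the monitor once it has evaluated a chunk's locality. */
void record_chunk_locality(uint64_t context_id, int is_weak_locality)
{
    assert(is_weak_locality == 0 || is_weak_locality == 1);
    struct context_profile *p = lookup(context_id);
    if (is_weak_locality)
        p->weak_chunks++;
    else
        p->strong_chunks++;
}

/* Called on a pending request: majority vote over predecessor chunks.
 * A context with no history, or a tie, defaults to open mapping. */
enum mapping_type predict_mapping(uint64_t context_id)
{
    const struct context_profile *p = lookup(context_id);
    return (p->weak_chunks > p->strong_chunks) ? RESTRICTIVE_MAPPING
                                               : OPEN_MAPPING;
}
```

Defaulting to open mapping for unseen contexts is a choice made in this sketch: it avoids penalizing data whose locality is still unknown until the monitor has reported on chunks from that context.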