This repository contains the implementation and supporting components for our ASPLOS 2026 paper "PACT: A Criticality-First Design for Tiered Memory."
Overview. Our key insight is that access frequency is a poor proxy for performance: a page's true cost is its criticality - how much it stalls the CPU - which depends on access pattern, memory-level parallelism (MLP), and tier latency, not on how often it is touched. PACT therefore makes online, page-granular performance criticality a first-class placement signal. Its key technical contributions are: (1) Per-page Access Criticality (PAC), a metric that attributes per-tier CPU stall time to individual pages online, estimated from an analytical model over just four standard CPU performance counters with per-tier MLP decomposition from hardware queue-occupancy counters; and (2) two PAC-centric migration policies - eager demotion, which proactively frees fast-tier space, and adaptive promotion, which uses statistical sampling with adaptive binning to identify high-PAC pages without global sorting or hand-tuned thresholds. PACT realizes both as a lightweight runtime that places pages by their true performance impact, transparently and without application changes.
Note: Compared to the PACT system presented in the paper at ASPLOS'26, this repository includes several performance improvements. Most notably, we redesigned the multi-threaded sampling and migration into a coroutine-based scheme and added batched migration of memory pages. We also moved from a custom Linux 5.15 kernel to the built-in tiering support available in Linux 6.3. As a result, PACT in this repo delivers better performance than the numbers reported in the paper.
PACT is a standalone userspace runtime. It attaches to an unmodified, already-running workload (no recompilation or library changes) and ranks the workload's pages by performance criticality (PAC): how many CPU stall cycles each page is responsible for, independent of how often it is accessed. It then drives the kernel's tiering interface to keep the most critical pages in fast memory. Everything below runs on each ~20 ms cycle:
PACT's per-cycle pipeline: sample → score (PAC) → bin → migrate, on a single-core coroutine loop.
- Sample - PEBS records the pages the workload loads from the slow tier; CHA/uncore counters measure memory-level parallelism (MLP).
- Score - each page's PAC ≈ the CPU stall cycles attributable to it, computed from per-tier LLC misses weighted by tier latency and divided by MLP: LLC-stalls = k · LLC-misses / MLP. High PAC means the page stalls the CPU, independent of how often it is accessed.
- Bin - pages are grouped into PAC bins whose width self-tunes (Freedman–Diaconis) to the live PAC distribution, keeping promotion selective even under skewed (power-law) workloads.
- Migrate - pages in the highest-criticality bin are promoted to the fast tier; cold pages fall back to the slow tier.
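To make the Score and Bin steps above concrete, here is a minimal sketch of the two formulas. It is not taken from the PACT runtime: the function names `pac_score` and `fd_bin_width`, and every numeric input, are illustrative assumptions. `k` stands in for the per-tier calibrated constant, and the bin width uses the textbook Freedman–Diaconis rule, 2 · IQR / n^(1/3); how `src/` applies that rule to live samples may differ.

```shell
# pac_score K MISSES MLP: print the estimated stall cycles K * MISSES / MLP.
pac_score() {
    awk -v k="$1" -v misses="$2" -v mlp="$3" 'BEGIN {
        if (mlp <= 0) { print "pac_score: MLP must be > 0" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", k * misses / mlp
    }'
}

# fd_bin_width: read one PAC value per line on stdin and print the
# Freedman-Diaconis bin width, 2 * IQR / n^(1/3).
fd_bin_width() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR < 2) { print 0; exit }
            # Quartiles by linear interpolation between order statistics.
            q1 = quantile(0.25); q3 = quantile(0.75)
            printf "%.2f\n", 2 * (q3 - q1) / exp(log(NR) / 3)
        }
        function quantile(p,   pos, lo, frac) {
            pos = 1 + (NR - 1) * p
            lo = int(pos); frac = pos - lo
            if (lo >= NR) return v[NR]
            return v[lo] + frac * (v[lo + 1] - v[lo])
        }'
}

# Hypothetical pages on the same tier (same k): page B is touched 4x more
# often, but its misses overlap (high MLP), so it stalls the CPU less.
pac_score 300 1000 1.0    # page A, little overlap:   prints 300000.0
pac_score 300 4000 8.0    # page B, heavy overlap:    prints 150000.0

# Hypothetical PAC samples 1..8: IQR = 3.5, n^(1/3) = 2, so width = 3.50.
printf '%s\n' 1 2 3 4 5 6 7 8 | fd_bin_width
```

In this example the less frequently accessed page A gets the higher score, which is the case a frequency-based policy would rank the wrong way round.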
PACT/
├── src/ # the PACT runtime (C source + Makefile)
├── setup/ # host bring-up - do this first
│ ├── kernel/ # build/boot Linux 6.3 + the tierinit & kswapdrst modules
│ ├── env/ # uncore-freq pinning, CXL/NUMA layout, governor, THP/KSM off
│ └── perf/ # build the PAC-patched perf used for PAC sampling
├── run/ # run PACT on a workload (run-pact.sh, workloads.sh)
├── baselines/ # SOTA systems we compare against (TPP, NBT, Nomad, Colloid, Memtis, Soar/Alto)
└── modeling/ # PAC modeling scripts (Figure 3)
src/ is the PACT runtime - a standalone userspace process. It performs
PEBS sampling, computes per-page performance criticality, and drives page
migration through the kernel's tiering interface entirely from user space. It
is independent of the target workload: it attaches to a running workload
and needs no source changes, recompilation, or library linking on the
workload's side. The run/ scripts simply launch a workload and the src/
runtime side by side and pin them to separate cores.
The top-level directories follow the order you use them: setup/ → build
src/ → run/. baselines/ and modeling/ are independent and only needed
to reproduce specific paper results.
The artifact targets the same class of machine used in the paper:
- Hardware: an Intel Skylake-X server with a tiered memory layout. The paper uses CloudLab c220g5 (96 GB DRAM per socket, 2 NUMA nodes), with the remote NUMA node configured to emulate a slower (CXL-like) tier. PEBS and CHA uncore counters are required for PAC sampling.
- Kernel: vanilla Linux 6.3 for PACT (built by setup/kernel/). Baselines use their own kernel versions - see baselines/.
- Software: gcc, make, libnuma, numactl, vmtouch, gnuplot (plotting), and a patched perf built by setup/perf/install-perf.sh.
- Privileges: kernel install, module loading, and the environment-prep scripts require root.
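A quick pre-flight check along the following lines can catch missing prerequisites before the kernel build. It is a sketch, not a script shipped in this repository: `missing_tools` and `numa_node_count` are hypothetical helper names. It only checks for commands on `PATH` and for NUMA nodes in sysfs; it does not verify libnuma, PEBS, or CHA counter support.

```shell
# Print each named command that is not on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Count NUMA nodes exposed by the kernel in sysfs (0 if none are visible).
numa_node_count() {
    count=0
    for node in /sys/devices/system/node/node[0-9]*; do
        if [ -d "$node" ]; then
            count=$((count + 1))
        fi
    done
    printf '%s\n' "$count"
}

# Command-line tools named in the software requirements above.
missing=$(missing_tools gcc make numactl vmtouch gnuplot)
[ -z "$missing" ] || printf 'missing tools: %s\n' "$(printf '%s' "$missing" | tr '\n' ' ')"

# The two-tier layout needs at least two NUMA nodes.
[ "$(numa_node_count)" -ge 2 ] || echo "warning: fewer than 2 NUMA nodes visible"
```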
Scope. This is a single-node release. It reproduces the PAC modeling experiment end-to-end, builds the PACT runtime and all baseline kernels, and runs the single-command workloads. The paper's Redis-YCSB result uses a two-node SSH client/server harness that is not part of this release.
Follow these steps top to bottom on a fresh machine.
1. Set up the host - setup/
# 1a. Build, install, and boot vanilla Linux 6.3, then build the modules.
cd setup/kernel
./setup_kernel.sh # clone + configure + build + install + update-grub
# reboot into 6.3, then:
./build_modules.sh # builds tierinit.ko and kswapdrst.ko
# 1b. Prepare the machine (uncore pinning, CXL/NUMA layout, governor,
# disable turbo/THP/KSM/NUMA-balancing). Requires root.
cd ../env
sudo ./prepare_environment.sh
Required kernel modules. Even on a vanilla 6.3 kernel, the tiering subsystem needs two out-of-tree modules to work:
tierinit registers the far NUMA node as a slow tier and establishes demotion targets (without it there is no tier structure to migrate across), and kswapdrst keeps kswapd from permanently backing off so demotion does not stall. run-pact.sh loads both and refuses to start if either is missing. See setup/kernel/README.md.
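To check by hand whether both modules are loaded, a sketch like the one below reads the kernel's module list. The helper name `module_loaded` is hypothetical, and the module names are assumed to match the `.ko` basenames (`tierinit`, `kswapdrst`); run-pact.sh performs its own check independently of this.

```shell
# module_loaded NAME [LISTING]: succeed if NAME appears as a loaded module.
# LISTING defaults to /proc/modules; the override exists only for dry runs.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}" 2>/dev/null
}

for mod in tierinit kswapdrst; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded (build it with setup/kernel/build_modules.sh)"
    fi
done
```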
See setup/README.md for details and tuning knobs
(uncore-frequency targets, local-DRAM sizing via memmap=).
2. Build the PACT runtime - src/
The runtime is a self-contained userspace program with no workload-side dependencies.
cd src
make # produces ./pact
make check-format # optional: clang-format style check
See src/CODING_STYLE.md for style conventions.
3. Run a workload - run/
cd run
# Point the dataset paths in workloads.sh at your GAPBS / SPEC / Silo installs:
# export GAPBS_DIR=/path/to/gapbs SPEC_DIR=/path/to/cpu2017 ...
./run-pact.sh bc_kron_8t
run-pact.sh loads the kernel modules from setup/kernel/, optionally runs the environment prep, then launches the workload under PACT. See run/README.md for the full runner reference and the list of predefined workloads.
PACT is a standalone binary that attaches to an already-running, externally CPU-pinned workload by PID and runs as root (it needs PMU/PEBS access):
sudo ./src/pact --workload $(pgrep bc)
PACT exposes a set of tuning knobs that control its sampling, scoring, and migration behavior and can affect overall performance. The default values are the ones we used in the paper. A few of the most relevant knobs:
| Option | Meaning | Default |
|---|---|---|
| --workload PID | target process to manage (pin it externally, e.g. taskset) | required |
| --pebs-period N | PEBS sample period (1 sample per N slow-tier load events) | 400 |
| --max-migrations-per-cycle N | pages promoted per ~20 ms cycle; higher = faster convergence to the critical set | 4096 |
| --bin-count N | number of PAC bins; only the top (highest-criticality) bin is promoted | 20 |
| --monitor-cpu / --migration-cpu | pin the event loop / migration thread to one dedicated core | -1 (none) |
Run ./src/pact --help for the full option list (cooling, timing intervals,
logging, diagnostics).
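As a rough guide to what the defaults imply, the sketch below turns two of the knobs into back-of-the-envelope numbers. The helper names are hypothetical, and the 4 KiB page size is an assumption (base pages with THP disabled). The result is an upper bound on promotion traffic, not a measured rate.

```shell
# promotion_budget_mib PAGES_PER_CYCLE CYCLE_MS [PAGE_KIB]
# Upper bound on promotion traffic in MiB/s if every cycle uses its full budget.
promotion_budget_mib() {
    pages=$1; cycle_ms=$2; page_kib=${3:-4}
    echo $(( pages * page_kib * 1000 / cycle_ms / 1024 ))
}

# estimated_events SAMPLES PERIOD
# With 1 sample per PERIOD events, SAMPLES samples stand for about this many events.
estimated_events() {
    echo $(( $1 * $2 ))
}

promotion_budget_mib 4096 20   # defaults: 4096 pages per ~20 ms -> prints 800
estimated_events 250 400       # 250 samples at period 400      -> prints 100000
```

Under these assumptions, the default budget of 4096 pages per cycle caps promotion at roughly 800 MiB/s. Raising `--max-migrations-per-cycle` raises that ceiling in proportion.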
baselines/ holds kernel patches and build scripts for TPP,
NBT, Nomad, Colloid-tpp, Memtis, and Soar/Alto, each with its own README
documenting the exact kernel tag and build steps.
This is a research artifact built to reproduce the paper's results. A few constraints of the current implementation are worth knowing up front:
- Intel Skylake-X only. The PMU event codes, the CHA-to-core mapping, and the PAC model constants (k_dram, k_cxl) are calibrated for Skylake-X (e.g. CloudLab c220g5). PACT aborts on an unrecognized CPU by default rather than produce invalid results; other microarchitectures (Emerald/Sapphire Rapids, AMD) are not yet calibrated.
- Two tiers, single node. PACT assumes one fast tier (local DRAM) and one CXL-like slow tier on a 2-NUMA-node host. More than two tiers, multi-socket fan-out, and multi-node setups are out of scope for this release.
- Hardware sampling required. PAC profiling depends on PEBS plus CHA/uncore occupancy counters and a PAC-patched perf, and the runtime must run as root.
- Attaches to a running, pre-pinned workload. PACT manages an existing process by PID; it does not launch or CPU-pin workloads itself (the run/ scripts handle that around it).
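Because PACT attaches by PID, it helps to confirm that a name lookup resolves to exactly one process before attaching by hand. The filter below is a hypothetical helper, not part of the repository. The commented usage line assumes `pgrep -x` (exact process-name match) and a workload process named `bc`, as in the attach example.

```shell
# Read candidate PIDs (one per line) on stdin; print the PID only if there is
# exactly one, otherwise report the count on stderr and fail.
exactly_one_pid() {
    awk 'NF { n++; pid = $1 }
         END {
             if (n == 1) { print pid; exit 0 }
             printf "expected exactly 1 PID, got %d\n", n > "/dev/stderr"
             exit 1
         }'
}

# Typical use (not executed here): resolve the PID first, then attach.
#   pid=$(pgrep -x bc | exactly_one_pid) && sudo ./src/pact --workload "$pid"
printf '4242\n' | exactly_one_pid    # prints 4242
```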
Natural directions for future work (and good places to contribute): calibrating the PAC model for additional CPUs, generalizing placement beyond two tiers, alternative access-sampling backends, and a self-contained workload launcher.
Contributions are welcome. Issues and pull requests, whether bug fixes,
new platform calibrations, or documentation improvements, are appreciated.
Please keep changes consistent with src/CODING_STYLE.md
and make sure make and make check-format pass before opening a PR.
If you use this work, please cite our ASPLOS 2026 paper:
@InProceedings{pact.asplos26,
author = {Hamid Hadian and Jinshu Liu and Hanchen Xu and Hansen Idden and Huaicheng Li},
title = "{PACT: A Criticality-First Design for Tiered Memory}",
booktitle = {Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
year = {2026},
}
Maintainers:
- Hamid Hadian - hamidhadian@vt.edu
- Hanchen Xu - hanry@vt.edu
- Huaicheng Li - huaicheng@vt.edu
For questions about the research or implementation, please open an issue or contact a maintainer.
The PACT runtime is released under the MIT License. Vendored third-party headers, kernel patches (GPL-2.0), and documentation/data carry their own licenses - see LICENSE and NOTICE.