The Kernel's Write — AROM, a Friendly Field Guide

Before you begin

Twelve ideas the paper assumes you know

Each card clears up a single concept the paper builds on. Read them in order the first time, or skip to whatever looks unfamiliar.

DRAM: today's fast memory · LtRAM: the new cheap memory · Optane: the cautionary tale

1 · RAM is not the same as storage (disk / SSD)

Picture your desk and a filing cabinet. The desk holds whatever you're working on right now — instant to grab, but small, and cleared off at the end of the day. The cabinet is roomy and keeps everything safe overnight, but it's slower to reach.

A computer has both. RAM is the desk: fast, temporary working space, wiped blank when the power goes off. Storage (disk or SSD) is the cabinet: slower, but permanent — your files survive a reboot.

Now you're ready: this paper is entirely about the desk (RAM), never the cabinet. When it says memory is half a server's cost, it means the desk.

2 · What computer memory (RAM) actually does

Think of RAM like a kitchen whiteboard. While cooking you jot notes — the timer, what's in the oven, the next step. Fast to read, fast to scribble on, and wiped clean when you leave.

That's RAM for a computer: its fast, temporary working space. Every running program lives there. When a laptop "has 16 gigs of RAM," that whiteboard is what they mean — and more of it, faster, costs more.

Now you're ready: the paper proposes adding a new, cheaper whiteboard alongside the usual expensive one.

3 · The difference between reading and writing data

A sticky note on the fridge. You can glance at it (reading) a hundred times without touching it. Or you can cross it out and rewrite it (writing) — slower, takes effort, actually changes the paper.

To us these feel like one act. For hardware they're completely separate operations with completely different costs. This gap matters more than any other in the paper: the new memory is wonderful at reading but slow and clumsy at writing.

Now you're ready: the paper's central rule, AROM, is entirely about restricting writes while leaving reads free.

4 · What an operating system does

Picture an apartment building. Tenants (apps like a browser or game) live in their units but never touch the wiring, plumbing, or front‑door locks. For that there's a building manager with master keys.

An operating system — Windows, macOS, Android — is that manager. It's a privileged program between every app and the actual hardware. It can quietly pause an app, do work behind its back, and resume it, and it's the only thing trusted to touch the hardware directly.

Now you're ready: the paper hands the awkward job of managing the new memory to the OS — the one trusted manager — instead of to the memory chip.

5 · What copy‑on‑write is

A shared document set to "read‑only." Everyone can open it; the edit button is locked. Try to type anyway and the system silently does "Save As," makes a fresh copy just for you, and sends your typing there. The original is never touched.

That trick is copy‑on‑write — a decades‑old, ordinary OS technique, not something this paper invented. One word to defuse: when the system notices your write attempt, the technical term is a "fault." That sounds like a crash, but it's just a polite tap on the manager's shoulder.

Now you're ready: copy‑on‑write is exactly how the paper enforces its read‑only rule.

6 · Why memory wears out (write endurance)

A page you write in pencil, erase, and rewrite. The first few times are fine. But erase and rewrite the same spot enough and the paper thins, smudges, and finally tears.

Ordinary RAM is basically magic paper — you can rewrite it endlessly. But some memory is like that pencil page: each tiny cell survives only so many rewrites before it physically dies. Crucially, only writing wears it out — reading is harmless.

Now you're ready: because rewrites are limited, the OS has to ration how often the new memory gets written, so the chip survives its whole life.

7 · What a memory page or chunk is

A warehouse that only handles standard boxes. Even one small item goes in a whole box, and the forklift only moves complete boxes, never loose items.

Memory works the same way. We picture changing one letter at a time, but really memory is handled in fixed‑size chunks: the OS shuffles "pages" of about 4 kilobytes (the standard boxes); the hardware moves smaller 64‑byte chunks into the processor. Nothing happens one loose byte at a time.

Now you're ready: the paper's fix is to always write the new memory one whole 4‑kilobyte page at a time, so the chunks line up neatly and waste disappears.

8 · Why memory has a cost plateau

A gadget whose price dropped a little every year — until the discount suddenly stops and the price freezes, because one stubborn part inside simply can't be made cheaper. It has hit a permanent floor.

That's what happened to ordinary memory (DRAM). For decades it got cheaper per unit of data, then plateaued because of a physical limit in its tiny cells. This is different from a temporary price spike (like the AI‑driven shortage of 2025‑2026); it's a wall that won't move on its own. In data centers, memory is now over half the cost of each server.

Now you're ready: this stuck cost is the whole reason the paper hunts for a cheaper kind of memory to mix in.

9 · The hardware / software boundary

A restaurant kitchen. The cleverness of a meal can live in one fancy all‑in‑one machine, or in a plain dumb hotplate paired with a skilled chef. Same meal — but you've drawn the line between "machine" and "person" in a totally different place.

Computers face the same choice. A memory stick can contain its own little controller chip that hides its quirks (the fancy machine), or that intelligence can live in the OS, leaving the chip simple (the hotplate‑plus‑chef). That dividing line is the hardware/software interface — a contract of who does what, not a screen you click.

Now you're ready: the failed product Optane put all the cleverness in the chip; this paper moves it into the OS, keeping the chip cheap and dumb.

10 · What "LtRAM‑as‑peer" means

Older systems add cheap memory like a slow basement across the house: roomy but a pain to reach, so anything sent there gets "demoted" and you pay a delay every time you fetch it.

This paper does something different: a second shelf right beside your desk, at the same height, just as quick to read from. The cheap new memory (LtRAM) isn't demoted storage and isn't a temporary copy (a "cache"). It's a genuine equal partner to normal memory, holding different kinds of data.

Now you're ready: the paper insists LtRAM is a peer of normal memory, not a slow tier below it — and that framing makes its design different.

11 · What an "invariant" means

A board game with one rule players promise never to break: "the bank's money is never touched without a receipt." Everything else can vary, but that one rule holds every turn — so everyone plans around it with total confidence.

In computer systems that kind of always‑true rule is called an invariant. It just means "a promise the system guarantees will hold at all times, with no exceptions." AROM's invariant: regular programs may only read the new memory, never write it; only the OS writes it. Always.

Now you're ready: when the paper calls AROM an "invariant," it means this read‑only rule is never broken — and that reliability is what makes the approach work.

12 · What latency vs bandwidth means

A highway. Two different questions: how long does one car's trip take end to end? That's latency. How many cars pass a point each hour? That's bandwidth. A road can have short trips but few lanes, or long trips but many.

Memory has both measures. Latency is how long one read or write takes; bandwidth is how much data flows per second. People blur them into "fast," but a memory can be great at one and poor at the other.

Now you're ready: keeping these apart lets you read the paper's performance claims correctly — whether a number is about delay per access or total throughput.

The whole story in one place

AROM, in a nutshell

The problem, the failed shortcut, and the clean fix — then a map of how every concept connects.

In large data centers, a server now spends more than half its hardware budget on one component: memory, the fast "RAM" your computer works out of, called DRAM. Worse, the price of DRAM has stopped falling (a permanent manufacturing wall, not a passing spike), so as servers demand more memory this cost will only keep growing.

The obvious escape is a cheaper memory called LtRAM (Long‑term RAM): it reads just as fast as DRAM and is far denser and cheaper, but it writes slowly, accepts writes only in big chunks, and physically wears out if you rewrite the same spot too often. The one product that tried it, Intel Optane, disguised this odd memory as ordinary DRAM by bolting a complicated controller onto the memory stick. That disguise is exactly what made it slow and expensive.

This paper's key idea, AROM (Application Read‑Only Memory), throws the disguise away. Picture a library where visitors may read any book but only the librarian may shelve them. AROM lets ordinary programs only READ the new memory; only the operating system may write to it, and only when deliberately moving data into place. With small, messy writes forbidden, the memory chip can drop all of Optane's hidden machinery and stay dumb and cheap, while the software handles the awkward parts.

The payoff: across every candidate technology, projected read speed is 26–79% faster than Optane, landing within 0.9–3.2× of DRAM. The surprise is STT‑MRAM, projected at 0.9× — roughly 72 nanoseconds versus DRAM's 80 — meaning it could match or slightly beat ordinary memory on reads. The catch: STT‑MRAM costs about as much as DRAM and may never scale to DRAM‑sized capacity.

The work is honest about what's unsettled. No single LtRAM technology wins on speed, density, endurance, and manufacturability at once, so there's no clear front‑runner yet. And while the OS rations writes to keep the chip within its lifetime budget, it doesn't yet spread that wear evenly.

🔑 Key takeaways

LtRAM reads as fast as DRAM but writes slowly, in big chunks, and wears out — so it fits read‑mostly data.
One rule (apps may only read LtRAM; only the OS writes it) lets the chip drop all of Optane's costly hidden machinery.
Projected reads are 26–79% faster than Optane and within 0.9–3.2× of DRAM.
STT‑MRAM can match DRAM read speed (0.9×) but carries DRAM‑level cost and may not scale to DRAM capacity.
The OS handles migration invisibly via copy‑on‑write — apps need no changes.
Open questions: no single LtRAM technology dominates yet, and wear is bounded in total but not spread evenly.

The memory stack — speed vs. cost

Where each kind of memory sits

↑ Faster · closer to the CPU | cheaper per GB ↓

CPU caches (SRAM)

The CPU's own scratch pad — not "RAM" in the everyday sense

read ~1–5 ns · write ~1–5 ns · megabytes

MAIN MEMORY — the CPU touches this at full speed

DRAM · the RAM everyone means

Fast both ways · effectively unlimited writes · but cost‑per‑bit stopped falling a decade ago

read ~80 ns · write ~80 ns · cost: HIGH

LtRAM · the paper's addition

Nearly as fast to read · much cheaper · writes are slow & big‑chunk only · wears out · best for data you read all the time but rarely change

read ~20–300 ns · write slow · cost: LOW

STORAGE — a much slower path to the CPU

SSD / NVMe

Long‑term file storage, not active working memory

read ~50–150 µs · write ~50–200 µs · terabytes

Takeaway: DRAM is fast both ways but expensive. LtRAM is almost as fast to read and much cheaper, but slow to write and it wears out — so you only put data there that you mostly just read. That restriction is exactly what AROM enforces.

Concept map — how it all fits together

Why → the new memory → what went wrong → the answer

① Why, and what's good for it

DRAM cost plateau

A cost wall — capacitors can't shrink cheaper

→

LtRAM

Cheap, dense, read‑fast, write‑slow, wears out

→

Read‑mostly data

Code, AI weights, cached DBs

LtRAM sits as a peer beside ↓

DRAM

Fast, symmetric, expensive — the baseline partner

② Cautionary tale: Optane (what NOT to do)

AIT lookup table

+76–200 ns on every read (94% miss rate)

Read‑modify‑write

4× write amplification, –75% throughput

Hardware wear‑level

60 µs latency spikes, unpredictable

A fat on‑stick controller hid LtRAM's quirks — and that hiding is what cost so much.

③ The paper's answer: AROM + a thin interface

The AROM invariant

"Apps read LtRAM only; only the OS may write it." Always.

enforced by

Copy‑on‑Write

OS silently copies a page to DRAM before any app write lands

enables

Thin hardware + smart OS

HW: cache‑line reads · 4 KB page writes · cell‑health report.
OS: page placement · dirty‑bit scan · token allocator

Eleven everyday pictures

How to think about AROM

Each picture stands alone — read them in any order. Every card ends with an honest note about where the picture stops being true.

The central idea · AROM

The reference‑room encyclopedia

Anyone may walk up and read the giant encyclopedia for free — no limit. But there's one firm house rule: visitors never write in it. If a change is needed, only a librarian makes it, on a fresh copy elsewhere. The shared book stays untouched, so it keeps serving everyone fast. That one rule is what lets all the complicated write‑handling machinery be removed from the hardware.

Where it misleads: here the data isn't precious for its own sake — once copied off to be edited, the original slot can be reclaimed. And the interception is automatic and invisible, not a conscious request.

Enforcement · Copy‑on‑Write

The kitchen recipe cards

Master recipe cards the whole staff reads all day, with a rule: nobody writes on a master. The instant a cook touches pen to a card, a helper slides a fresh photocopy under the pen so the ink lands on the copy. The master is never marked, and the cook barely notices the swap. That reflex is copy‑on‑write: the moment a program tries to change read‑only data, the system instantly diverts the change to a private copy.

Where it breaks: the master card stays filed afterward — but here, once data is copied out, the system may discard or reuse the original's slot. The "helper" is just normal machinery reacting in an instant.

The new memory · LtRAM

Stone tablets vs. a whiteboard

Reading a stone tablet is instant — exactly as fast as reading a whiteboard, and stone is cheap to stack densely. But writing on stone is the opposite of easy: carving is slow, you re‑carve a whole tablet rather than fix one letter, and a tablet can only be reground a limited number of times before it crumbles. LtRAM is shaped like that stone: read‑fast and cheap, but write‑slow, big‑chunk only, and it wears out.

Where it breaks: don't import "stone = sluggish" — reading LtRAM is genuinely as quick as fast memory. And wearing out happens only from re‑carving (rewriting), never from reading or time passing.

The key constraint · Endurance

The pencil page that tears

Write in pencil, erase, rewrite — many times, but not infinitely. Each erase scuffs the paper until it thins and finally tears. Crucially, reading costs the paper nothing; you could glance a million times and it stays pristine. Only the erase‑and‑rewrite cycle uses up its life. Write endurance is the same: each cell survives only a bounded number of rewrites, so rewrites are a finite budget to spend carefully.

Where it breaks: real paper wears gradually; a memory cell works fine then fails abruptly once its budget is gone. And DRAM is "magic paper" that never wears out — the budget is specific to the newer memory.

The ideal tenant · Read‑mostly data

The laminated evacuation map

People glance at the fire‑exit map constantly, but it's essentially never edited — maybe once in a remodel, years apart. Because it's read often and changed almost never, it makes sense to print and laminate it (cheap, durable, fast to glance at) rather than keep it on an expensive rewritable display. Read‑mostly data is the computer version: program code, a fixed AI model's numbers, a cached lookup table.

Where it breaks: a poster is reliably static, but real data can suddenly start being edited heavily (a "phase change") — so judging "read‑mostly" only by recent quiet can be wrong.

The economic problem · Cost plateau

The apartment‑density floor

A city keeps fitting more apartments onto the same plots by building taller, so density rises every year. You'd expect price per apartment to keep falling — but every unit still needs one costly part (say a special elevator mechanism) that refuses to get cheaper. So even as buildings grow denser, the cost per apartment flattens out. DRAM is exactly this: engineers pack more in, but one stubborn cell ingredient won't get cheaper, so price per gigabyte has stopped falling.

Where it breaks: a city can also have a temporary rent spike that settles. That's NOT this — the plateau is the permanent floor underneath any spike. Don't mistake the 2025‑26 supply spike for the deeper wall.

Keeping the chip alive · Token allocator

The drought water allowance

One well must last the whole dry season, so the family sets a daily bucket allowance sized to reach the first rains. Each chore spends a bucket; quiet days bank unused buckets for a big laundry day later. The token allocator works the same: the OS releases "write tokens" at a steady, precalculated rate, every write spends one, unused tokens accumulate, and because writes can never outrun the token supply, the chip is guaranteed to survive its full lifetime.

Where it breaks: an allowance controls how MUCH water you use, not WHICH tap you run — you could wear out one valve. Likewise, tokens bound total writing but don't spread it evenly. That's a separate, harder problem the paper flags as unfinished.

Spreading the wear · Wear leveling

The rotating work boots

You own five pairs and wear one a day. Always grabbing your favorite kills it in a year; instead you rotate, so all five last roughly five times as long. Wear leveling is this rotation for memory: since each location survives only so many rewrites, the system deliberately spreads writing around so no spot wears out far ahead of the rest.

Where it breaks: some "ceremonial" boots you never wear get nothing from rotation — and similarly, write‑once data sits in a fresh spot that never takes its share of wear (a real gap the paper admits). One earlier memory did this rotation secretly in hardware mid‑use, causing jarring pauses; the paper lets software pick calm moments instead.

Moving data around · Page migration

The desk and the filing cabinet

Documents you're actively editing stay on the desk (fast memory), within instant reach; reference binders you only consult get filed in the cabinet (cheap memory). The arrangement isn't frozen — a binder that suddenly needs heavy editing comes back to the desk first; a desk document that settles down gets filed away. Page migration is this continual moving, steered by how each piece is currently used.

Where it breaks: moving a folder just leaves an empty slot, but memory has an extra chore — after data returns to fast memory, the freed cheap‑memory slot often must be deliberately wiped before reuse, slow enough to schedule for a quiet moment.

The cautionary tale · Intel Optane

The simultaneous interpreter

Two people who share no language, with an interpreter relaying every sentence both ways. It works — but every sentence detours through the interpreter, so it's slower, and you pay the interpreter's salary the whole time. Optane took a strange new memory and bolted a translator onto the stick so the computer could talk to it as ordinary memory. The disguise worked, but the translator sat in the middle of every access — and that constant detour made Optane slow and expensive.

Where it breaks: a human interpreter adds real understanding. Optane's translator added no value — pure overhead papering over a format mismatch, paid on every single read and write.

Optane's hidden tax · AIT

The coat‑check ledger

Behind the counter, attendants secretly reshuffle coats all night so no rack overcrowds. Because a coat is never where you left it, every ticket means first looking up its current rack in a big ledger in the back — a delay on every retrieval. The Address Indirection Table is that hidden ledger: a private map translating each address to where data actually sits. It lets the hardware shuffle data, but every read must consult the map first.

Where it breaks: a coat‑check looks things up only when you leave — a few times a night. The AIT is consulted on essentially every read, so a tiny per‑lookup cost becomes a heavy constant tax. The paper's design removes the shuffling, and thus the ledger, entirely.

The full explanation

How it actually works

The complete walk‑through, section by section, with each of the paper's diagrams rebuilt as a visual you can read at a glance.

1 · Introduction — the movie trailer

The fast working memory in servers (DRAM) has become shockingly expensive. At companies like Microsoft and Meta, memory alone is now more than half the cost of a server, and the price per unit has stopped dropping.

The proposed solution: mix in a cheaper memory, LtRAM. It's dense, cheap, and reads as fast as DRAM, but it writes slowly, accepts writes only in big chunks, and wears out, so it's perfect for data you read constantly but rarely change. The catch is the cautionary tale: Intel Optane made the new memory pretend to be DRAM by bolting a complicated controller onto the stick, and that disguise is exactly what made it slow and pricey. The authors say: stop forcing the disguise. Keep the hardware dumb, and let the OS handle the bookkeeping. Their rule for doing it safely is AROM: programs may only READ the new memory.

🍳

Analogy. Optane is one expensive all‑in‑one bread machine that secretly does every step but is slow and costly. This paper is a cheap bare hotplate (simple hardware) plus a skilled chef (the OS) who already knows the recipes. Same meal, far cheaper appliance — and the chef only enters the kitchen when an alarm goes off.

💡

Why it matters for a beginner: the one thing to hold onto is "make the memory chip simple and let the software be smart."

2 · Motivation — the money problem

Ordinary memory used to get cheaper every year, then stopped. Two things happen at once: computers want more and more memory, and the price per unit has flattened and isn't coming back down. The subtlety: this is a permanent wall, not a passing spike. Yes, an AI‑driven buying frenzy in 2025‑2026 roughly doubled memory prices, and that may ease — but underneath it is a deeper problem. The manufacturing process relies on a special capacitor that refuses to get cheaper as it shrinks, so more density no longer means a lower price per unit.

The usual tricks only go so far: you can compress data, or shove rarely‑used data onto slower memory ("tiering"). These help you use memory more efficiently, but none make memory itself fundamentally cheaper per unit. That sets up the pitch: maybe we need a genuinely cheaper kind of memory.

🥛

Analogy. Milk that dropped a little every year, then froze — not because the dairy ran out of ideas, but because one stubborn ingredient won't get cheaper. Coupons (compression) and the almost‑expired carton (tiering) trim your bill but never change the frozen base price. To actually pay less, you need a different, cheaper drink.

3 · LtRAM and the Optane warning

LtRAM is a family of new chips that share a "shape": read as fast as normal memory, much denser and cheaper, but with three matching downsides — writing is slow, writes must be big chunks, and the chip wears out. Candidate technologies have intimidating names (RRAM, PCM, FeFET, MRAM); treat them as brand names, none perfect.

Now the cautionary tale. The deep reason Optane cost so much is a chunk‑size mismatch. Inside Optane the smallest writable unit is a 256‑byte block (an "XPLine"), but programs often want to change just a few bytes. The hardware can't touch only those bytes — it must read the whole block, change the small part, and write the whole block back, every time. That's a read‑modify‑write, and it moves roughly 4× as much data as asked. On top of that, a secret lookup table (the AIT) translates every address, and it's too big to keep handy, so most reads pay an extra fetch. Hold onto that chunk‑size mismatch — it's the hinge of the whole paper.

🥚

Analogy. LtRAM is a pantry of eggs sold only by the carton. Reading is free, but you can only write a whole carton. To replace one cracked egg you take the whole carton down, swap the egg, and put it all back — that fetch‑swap‑return is the read‑modify‑write dance. The paper's fix: insist everyone always hands over a complete fresh carton, never a single egg, so the costly swap never happens.

Why Optane struggled

The granularity mismatch and the AIT tax

A · The chunk‑size mismatch

The CPU wants to change one 64‑byte cache line. Optane's media only writes a whole 256‑byte XPLine — four cache lines wide.

64 B (what the CPU wants) ↳ trapped inside → one 256‑byte XPLine = 4 cache lines

B · The read‑modify‑write dance: 3 steps and 2 media trips for 1 write

① Read · fetch the full 256 B (all old data) · ~75 ns

② Modify · patch the NEW 64 B into a scratch copy, leaving the other 192 B old · ~0 ns

③ Write · write the full 256 B (NEW + old) back · ~75 ns

📉

Result: ~4× write amplification. Random‑write throughput collapses from ~2.3 GB/s to ~0.56 GB/s (−75%). AROM's fix: the OS only ever writes complete 4 KB pages (a multiple of 256 B), so there is nothing to read back or patch. Steps ① and ② disappear, each block is written exactly once, and the dance never happens.
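The arithmetic behind that 4× figure can be checked in a few lines. This is a minimal sketch, not the paper's model: the function names are made up for illustration, and it assumes every request starts on a block boundary. The sizes (64 B, 256 B, 4 KB) come from the text above.

```python
import math

CACHE_LINE = 64   # bytes the CPU wants to change (from the text)
XPLINE = 256      # Optane's smallest writable unit (from the text)
PAGE = 4096       # AROM's write unit, one OS page (from the text)


def media_bytes_written(request_bytes: int, block_bytes: int) -> int:
    """Bytes the media writes when it can only write whole blocks.

    Assumption: the request starts on a block boundary.
    """
    blocks = math.ceil(request_bytes / block_bytes)
    return blocks * block_bytes


def write_amplification(request_bytes: int, block_bytes: int) -> float:
    """Media bytes written divided by bytes the caller asked to change."""
    return media_bytes_written(request_bytes, block_bytes) / request_bytes


def needs_read_first(request_bytes: int, block_bytes: int) -> bool:
    """A partly covered block must be read before it can be patched."""
    return request_bytes % block_bytes != 0


small = write_amplification(CACHE_LINE, XPLINE)  # one cache line into an XPLine
whole = write_amplification(PAGE, XPLINE)        # one full page, 16 XPLines
ideal_gbps = 2.3 / small                         # if throughput scaled exactly with amplification

print("64 B write :", small, "x, read first?", needs_read_first(CACHE_LINE, XPLINE))
print("4 KB write :", whole, "x, read first?", needs_read_first(PAGE, XPLINE))
print("naive throughput estimate:", ideal_gbps, "GB/s")
```

Dividing 2.3 GB/s by 4 gives about 0.575 GB/s, close to the ~0.56 GB/s quoted above, which suggests the amplification explains most of the collapse.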

C · The AIT lookup tax — a toll on every single read

Every read must first check Optane's secret Address Indirection Table. Its on‑stick cache covers only ~6% of the chip, so 94% of reads miss and pay an extra media trip.

6% HIT · 94% MISS → extra +76–200 ns before you even get your data

Read scenario	Total latency	vs DRAM (80 ns)
AIT cache HIT (rare)	351 ns	4.4× slower
AIT cache MISS (most reads)	427 ns	5.3× slower
DRAM reference	80 ns	1.0×
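To see why a 94% miss rate matters so much, it helps to work out the average read. The weighted average below is not a number from the paper; it is a back-of-envelope sketch that treats the table's two rows as the only outcomes and the 6% / 94% split as their probabilities.

```python
DRAM_NS = 80.0       # DRAM reference latency (from the table)
AIT_HIT_NS = 351.0   # Optane read when the AIT cache hits (from the table)
AIT_MISS_NS = 427.0  # Optane read when the AIT cache misses (from the table)
HIT_RATE = 0.06      # the on-stick cache covers ~6% (from the text)


def expected_read_ns(hit_rate: float, hit_ns: float, miss_ns: float) -> float:
    """Probability-weighted average of the hit and miss latencies."""
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns


avg_ns = expected_read_ns(HIT_RATE, AIT_HIT_NS, AIT_MISS_NS)
slowdown = avg_ns / DRAM_NS

print(f"average Optane read: {avg_ns:.1f} ns, about {slowdown:.1f}x DRAM")
```

Under these assumptions the average lands around 422 ns, almost indistinguishable from the miss case, because hits are too rare to pull it down.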

Takeaway: Optane tried to make weird memory look exactly like normal memory by stuffing a complex translator on the chip. That translator is what made it slow and expensive. AROM's insight: stop translating — expose the memory's quirks honestly and let the OS, which already manages all memory, handle them.

4 · The proposed hardware / software interface

This is the heart of the paper. Having shown that Optane failed by stuffing intelligence into the hardware, the authors flip it: make the hardware deliberately dumb, and move the clever decisions into the OS. The key that makes this safe is AROM — programs may only READ the new memory; only the OS writes it, and only when deliberately moving data. If a program tries to change something in LtRAM, the system quietly catches it (copy‑on‑write), copies the data into DRAM first, and lets the write land there. The program never notices.

Why does one little rule matter so much? Because it removes the need for all of Optane's hidden machinery. If programs can never make small messy writes — and AROM guarantees that — the hardware never does the read‑modify‑write dance and never needs a hidden translation table. The chip's whole job shrinks to three things: hand over data when read, accept whole 4 KB page writes from the OS, and report how worn each block is.

🖼️

Analogy. An art museum. Visitors may LOOK at the paintings but never touch them. The instant someone reaches out to scribble, a guard (the OS) makes a photocopy on ordinary paper and hands them that. The original is never altered. (But unlike a museum, the original isn't kept forever — once your data is copied to DRAM and you're editing it, the system may reclaim the old LtRAM space. The data is moved, not duplicated and preserved.)

AROM in action

Reading freely · copying before writing

▶ Read path — the fast, common case

① App asks to read Page A in LtRAM.

② The thin controller fetches the bytes directly: no lookup table, no detour.

③ Data comes straight back.

OS: 😴 sleeping — not involved at all.
Latency: just the raw LtRAM read time (~20–300 ns).

▶ Write path — the copy‑on‑write detour

① App tries to WRITE Page A, but it's protected. The write is frozen.

② Hardware raises a page fault (an alarm). The OS wakes up.

③ OS copies the whole 4 KB page A → DRAM (as A′) and remaps the address.

④ The write is retried and now lands on the DRAM copy. ✓

App thinks: "I wrote Page A." Reality: "You wrote A′ in DRAM. LtRAM was never touched." Later, the OS quietly erases the freed LtRAM block in the background.

📏

The rule in one line: Apps READ LtRAM directly. Apps WRITE DRAM (after an invisible copy). Only the OS kernel ever writes LtRAM — one full 4 KB page at a time.
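The two paths above can be mimicked with a toy model. This is only a sketch in ordinary Python, not kernel code and not the paper's implementation: `ToyKernel`, `Mapping` and `WriteFault` are hypothetical names, a Python exception stands in for the hardware page fault, and reclaiming the freed LtRAM block is left out.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096


class WriteFault(Exception):
    """Stand-in for the page fault raised on a write to a protected page."""


@dataclass
class Mapping:
    tier: str       # "LtRAM" or "DRAM"
    frame: int      # index into that tier's list of frames
    writable: bool


@dataclass
class ToyKernel:
    ltram: list = field(default_factory=list)       # each frame is a bytearray
    dram: list = field(default_factory=list)
    page_table: dict = field(default_factory=dict)  # virtual page -> Mapping
    ltram_writes: int = 0

    def migrate_to_ltram(self, vpage: int, data: bytes) -> None:
        """Only the kernel writes LtRAM, and only one whole page at a time."""
        assert len(data) == PAGE_SIZE
        self.ltram.append(bytearray(data))
        self.ltram_writes += 1
        self.page_table[vpage] = Mapping("LtRAM", len(self.ltram) - 1, False)

    def _frame(self, m: Mapping) -> bytearray:
        return (self.ltram if m.tier == "LtRAM" else self.dram)[m.frame]

    def read(self, vpage: int, offset: int) -> int:
        """Read path: follow the mapping and fetch the byte, no fault handling."""
        return self._frame(self.page_table[vpage])[offset]

    def _store(self, vpage: int, offset: int, value: int) -> None:
        m = self.page_table[vpage]
        if not m.writable:
            raise WriteFault(vpage)   # step 2: the write is frozen, alarm raised
        self._frame(m)[offset] = value

    def _handle_fault(self, vpage: int) -> None:
        """Step 3: copy the whole page into DRAM and remap the address."""
        old = self.page_table[vpage]
        self.dram.append(bytearray(self._frame(old)))
        self.page_table[vpage] = Mapping("DRAM", len(self.dram) - 1, True)

    def write(self, vpage: int, offset: int, value: int) -> None:
        try:
            self._store(vpage, offset, value)
        except WriteFault:
            self._handle_fault(vpage)
            self._store(vpage, offset, value)   # step 4: retry lands in DRAM


k = ToyKernel()
k.migrate_to_ltram(vpage=7, data=bytes(PAGE_SIZE))  # kernel places a zeroed page
before = k.read(7, 0)
k.write(7, 0, 42)                                   # app write triggers the detour
print(before, k.read(7, 0), k.page_table[7].tier, k.ltram[0][0], k.ltram_writes)
```

After the write, the app reads back its new value, the page is mapped to DRAM, and the LtRAM frame still holds the original byte with its write count unchanged.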

5 · Implementation — did they build it?

Real LtRAM chips don't exist yet, so the team built a stand‑in: a research computer called Enzian (a processor wired to a reprogrammable chip), turning that chip into a pretend memory controller, with cheap NOR flash standing in for the exotic LtRAM. NOR flash isn't ideal, but it has the same "read‑fast, write‑slow, wears‑out" shape. They ran ordinary Linux with new memory‑management rules added.

The OS rations writes with a "write token" budget — every move of data into LtRAM spends one, so the chip can never wear out faster than planned. And the headline result: every candidate LtRAM technology comes out 26–79% faster than Optane on reads, landing between 0.9× and 3.2× of ordinary memory. The best candidate, STT‑MRAM, is projected at about 0.9× — roughly 72 ns against DRAM's 80, i.e. projected to match or slightly beat ordinary memory. Two honest caveats: these are modeled estimates, not measured guarantees, and STT‑MRAM in particular may hit manufacturing hurdles that keep it from DRAM‑class capacity.

🎬

Analogy. A movie set. The real building (true LtRAM) doesn't exist yet, so the crew filmed a scene with a cardboard door. The NOR flash is far slower than the target technologies (its reads take ~490 ns), so the prototype's own speed numbers are NOT what the finished product would feel like — it proves the design runs real software correctly; the real speed is estimated separately.

The write‑token budget

Rationing LtRAM writes like a data allowance

Tokens drip in at a fixed rate r = N·E ÷ T.

How it works. The OS releases write tokens at rate r = N·E ÷ T — where N = total pages, E = erases each page survives, and T = the seconds the chip should last. Every migration into LtRAM spends one token. Quiet periods bank unused tokens for busy bursts; when the bucket hits 0, migrations wait. Because writes can never outrun the token supply, the chip is guaranteed to outlast its deployment — with almost no bookkeeping (just one running count).

📱

Like a phone data plan: chip lifetime = your 36‑month plan; token rate = daily allowance; a migration = downloading a file; unused allowance rolls over; when it's gone, wait until tomorrow. You never burn through 36 months in 6.
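The allowance scheme described above is a token bucket, and a minimal sketch fits in a few lines. The class and method names are hypothetical, time is passed in explicitly so the example is repeatable, and the tiny numbers in the demo are chosen only to make the rate come out at one token per second.

```python
class TokenAllocator:
    """Write-token bucket refilled at r = N * E / T tokens per second."""

    def __init__(self, pages: int, erases_per_page: int, lifetime_seconds: float):
        self.rate = pages * erases_per_page / lifetime_seconds  # r = N*E / T
        self.tokens = 0.0   # the single running count
        self.last = 0.0     # time of the previous refill, in seconds

    def _refill(self, now: float) -> None:
        # Unused allowance accumulates while the system is quiet.
        self.tokens += (now - self.last) * self.rate
        self.last = now

    def try_migrate(self, now: float) -> bool:
        """Spend one token for one page migration into LtRAM, if available."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # bucket empty: the migration has to wait


# Illustrative numbers only: 100 pages * 10 erases / 1000 s = 1 token per second.
alloc = TokenAllocator(pages=100, erases_per_page=10, lifetime_seconds=1000)
burst = [alloc.try_migrate(now=3.0) for _ in range(4)]  # 3 banked tokens, 4 requests
later = alloc.try_migrate(now=4.0)                      # one more second, one more token
print(burst, later)
```

Three seconds of quiet bank three tokens, so the fourth request in the burst is refused and succeeds only after another second of refill. Total migrations can never exceed the rate times the elapsed time.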

⚠️

Known limitation (open question): tokens control HOW MUCH is written, not WHERE. Pages written once (code, ML weights) sit frozen and never share wear, so other blocks age faster. Even wear is not yet solved.
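A small count-only simulation makes this limitation concrete. It is an illustration under invented numbers, not an experiment from the paper: ten pages, an endurance of 100 erases each, eight pages written once and then frozen, and every remaining token spent on the two pages that keep changing.

```python
def simulate_wear(pages: int, endurance: int, frozen: int):
    """Spend the whole lifetime token budget and count writes per page.

    The first `frozen` pages are written once and never again (think code
    or model weights); all later writes rotate over the remaining pages.
    """
    budget = pages * endurance   # total tokens ever released: N * E
    wear = [1] * pages           # initial placement writes every page once
    spent = pages
    hot = list(range(frozen, pages))
    i = 0
    while spent < budget:
        wear[hot[i % len(hot)]] += 1
        spent += 1
        i += 1
    return wear, budget


wear, budget = simulate_wear(pages=10, endurance=100, frozen=8)
print("total writes:", sum(wear), "of budget", budget)
print("per-page writes:", wear)
print("worst page vs endurance:", max(wear), "vs", 100)
```

The total stays exactly within the budget, yet the two busy pages each take 496 writes against an endurance of 100 while the frozen pages sit at 1. That gap is what wear leveling would have to close.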

How fast could LtRAM be?

Projected 128‑byte read latency · lower is better

Technology	Projected read latency	vs DRAM (80 ns)
DRAM (target)	80 ns	1.0×
STT‑MRAM (★ best, could beat DRAM)	72 ns	0.9×
3D V‑RRAM	152 ns	1.9×
3D FeFET	252 ns	3.2×
Optane	305–374 ns	4–5×
NOR prototype (stand‑in)	530 ns	6.6×

🏁

All projected LtRAM candidates beat Optane by 26–79%. Strip out Optane's hidden machinery (AROM) and every candidate jumps closer to DRAM; the best, STT‑MRAM, is within a rounding error of DRAM speed, though it carries DRAM‑level cost rather than the savings the denser candidates promise. Caveat: these are modeled projections, not production measurements, and no single technology yet wins on speed, density, cost, and endurance at once.
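The headline ratios can be recomputed from the latencies quoted above. One assumption needs flagging: this guide does not say which Optane figure the 26–79% range was measured against, so the sketch uses the midpoint of the 305–374 ns range, which happens to land on the same percentages.

```python
DRAM_NS = 80
OPTANE_RANGE_NS = (305, 374)
CANDIDATES_NS = {"STT-MRAM": 72, "3D V-RRAM": 152, "3D FeFET": 252}

# Assumption: compare against the midpoint of the quoted Optane range.
optane_mid_ns = sum(OPTANE_RANGE_NS) / 2

vs_dram = {}        # candidate latency as a multiple of DRAM latency
below_optane = {}   # percent latency reduction relative to the Optane midpoint
for name, ns in CANDIDATES_NS.items():
    vs_dram[name] = ns / DRAM_NS
    below_optane[name] = (1 - ns / optane_mid_ns) * 100
    print(f"{name:10s} {ns:4d} ns  {vs_dram[name]:.2f}x DRAM  "
          f"{below_optane[name]:.0f}% below Optane")
```

With that midpoint, STT‑MRAM comes out about 79% below Optane and 3D FeFET about 26%, matching the two ends of the quoted range, with 3D V‑RRAM in between.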

6 · Discussion — honest about the rough edges

The authors place their work next to three earlier buckets: making new memory pretend to be normal memory (Optane — failed on overhead); software that demotes rarely‑used data to a slower "basement" tier; and earlier thinkers who argued new memory deserves its own purpose‑built design. Their key distinction: they do NOT treat the cheap memory as a slow basement underneath DRAM — it's an equal partner beside it, specialized for read‑mostly data.

On open questions, they're upfront: which exact technology is best, how to cleverly guess which data is safe to move, how to spread wear when some data is written once and never touched, and how to avoid running out of DRAM if a workload suddenly starts writing a lot. These are flagged as future work, not failures.

7 · Conclusion — the recap

Start to finish: DRAM is now more than half a server's cost and its price has stopped falling. A cheaper memory (LtRAM) could help, but disguising it as DRAM (the Optane way) loads on too much hidden overhead. The answer is a thin, simple hardware/software interface, made safe by one rule, AROM: programs see the new memory as read‑only, only the OS writes it, and only while deliberately moving data. With that, the chip strips down to almost nothing while the OS takes over the hard decisions. They showed the design is buildable with a stand‑in prototype, aiming to let cheap memory sit as an equal partner to DRAM for read‑mostly data and meaningfully cut server cost.

🎯

If you read one section to "get" the paper, it's this one: the expensive problem, the failed shortcut, and the clean fix — a simple chip plus a smart OS following one read‑only rule.

Follow one operation, step by step

Worked examples

A photo service called "Snaply" keeps a 4 GB AI model in memory — read millions of times, written almost never. Let's trace what happens on a read, a write, a wear check, and an Optane failure.

① Reading from AROM — the happy path

Snaply's model lives in cheap LtRAM (here, the fastest candidate, STT‑MRAM). It needs to read one slice.

Snaply asks for a piece of the model — "give me 64 bytes at address X." It doesn't know or care which chip holds it.
The hardware sees address X is in LtRAM. No permission check needed — everyone may always READ LtRAM. That's the whole point of the rule.
The thin controller fetches the bytes directly. No secret lookup table, no bookkeeping, no detour — straight off the media.
Snaply gets its data — speed depends on the chip. NOR proto ~530 ns, 3D FeFET ~252 ns, 3D V‑RRAM ~152 ns, STT‑MRAM ~72 ns (essentially DRAM's ~80 ns). Snaply chose STT‑MRAM, so the read lands at ~72 ns and feels instant.

🔑

Key insight: reads are nearly all of the action for read‑mostly data, and AROM makes them dead simple. Nothing clever happens on the read path, so the cheap memory reads at its full native speed. Optane, by contrast, forced every read to first check a hidden table.

② Writing to AROM data — the copy‑on‑write detour

A developer nudges one weight — Snaply tries to WRITE 8 bytes at address X, which lives in read‑only LtRAM. Apps may never write LtRAM directly.

Snaply attempts the write — "store these 8 bytes at X," same as any memory.
The hardware catches it and raises an alarm. The 4 KB page is marked copy‑on‑write. The write is frozen mid‑air — it has NOT happened.
The OS copies the whole 4 KB page into DRAM. A fresh page in DRAM receives all 4,096 bytes.
The OS remaps address X to point at the new DRAM copy.
The write is retried — and now it lands. X is in writable DRAM, so the 8 bytes write instantly. Snaply never noticed the detour.
The OS reclaims the old LtRAM page later — scheduled for a slow erase in the background, off the hot path.

🔑

Key insight: the read‑only rule is never actually violated. Apps THINK they can write; under the hood the write is silently diverted to DRAM. LtRAM only ever receives writes from the OS, as whole clean 4 KB pages — never a messy 8‑byte poke.

③ Rationing writes — how tokens keep LtRAM alive

A small example with round numbers:

Chip specs: N = 1,000,000 pages; each survives E = 100,000 rewrites; target life T = 100,000,000 s (~3 years).
Total lifetime budget: N × E = 100,000,000,000 (one hundred billion) page‑writes over the whole life.
Convert to a steady drip: r = N·E ÷ T = 1,000 writes per second. The OS may hand out 1,000 tokens/second on average.
Spend a token on every move into LtRAM. If tokens are available, the move happens; if the bucket's empty, it waits.
Bank unused tokens during quiet hours, then spend the balance during a burst — like rolling over phone data. Long‑run average stays 1,000/s.
Notice what the OS does NOT track: a separate erase count for every page. It keeps just one running number (the token balance). Because the OS is the only thing that writes LtRAM, counting tokens is enough.
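The arithmetic in these steps can be checked in a few lines of Python. This is only a sanity check of the round numbers above; the function and variable names are chosen for illustration.

```python
def token_rate(total_pages, erases_per_page, lifetime_seconds):
    """Sustainable write-token rate r = N * E / T, in tokens per second."""
    return total_pages * erases_per_page / lifetime_seconds

N = 1_000_000      # pages on the chip
E = 100_000        # rewrites each page survives
T = 100_000_000    # target lifetime in seconds

lifetime_budget = N * E                 # total page-writes over the chip's life
r = token_rate(N, E, T)                 # steady drip the OS may hand out
years = T / (365 * 24 * 3600)           # target lifetime expressed in years
print(lifetime_budget, r, round(years, 2))
```

Changing any one input shows the trade-off directly: halve the target lifetime T and the sustainable rate doubles, while halving the endurance E cuts it in half.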

🔑

Key insight: wear is controlled by a single simple accountant — a token bucket dripping at exactly the sustainable rate. The chip is mathematically guaranteed to outlast its deployment with almost no bookkeeping. Caveat the paper names: the budget bounds TOTAL writes, not even distribution across the chip.

④ Why Optane failed — the granularity‑mismatch walkthrough

Follow one tiny operation: a program writes a single 64‑byte cache line into Optane, whose media only writes 256‑byte blocks.

The program asks to write 64 bytes (one cache line). Only those bytes need to change.
The hardware can't write just 64 bytes. They sit inside a 256‑byte block, surrounded by 192 bytes it must not disturb.
READ the whole 256‑byte block into scratch — 256 bytes moved, just to change 64.
MODIFY the 64 bytes in the scratch copy; the other 192 stay as they were.
WRITE the whole patched 256‑byte block back — another 256 bytes moved. That three‑step ritual is the read‑modify‑write.
Tally the waste: 256 ÷ 64 = 4× write amplification (8× total media traffic once you count the read). Throughput collapsed from ~2.3 GB/s to ~0.56 GB/s (−75%). Every small write paid it, every time.
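The tally above can be reproduced with a short Python sketch. It assumes the small write fits inside a single media block and that a write covering a whole aligned block needs no prior read; the function name is hypothetical, and the throughput figures are the ones quoted in the walkthrough.

```python
def read_modify_write_cost(write_bytes, block_bytes):
    """Media traffic for one write that fits inside a single block.

    Returns (write_amplification, total_traffic_factor), both relative to
    the number of bytes the program asked to change.
    """
    if not 0 < write_bytes <= block_bytes:
        raise ValueError("write must fit inside one block")
    # A full-block write needs no read first; a partial one must fetch the block.
    bytes_read = 0 if write_bytes == block_bytes else block_bytes
    bytes_written = block_bytes               # the whole block is always written back
    write_amp = bytes_written / write_bytes
    total = (bytes_read + bytes_written) / write_bytes
    return write_amp, total

write_amp, total = read_modify_write_cost(write_bytes=64, block_bytes=256)
throughput_drop = 1 - 0.56 / 2.3              # GB/s figures quoted above
print(write_amp, total, f"{throughput_drop:.0%}")
```

Calling the same function with a write the size of the whole block returns a factor of 1 for both values, which is the situation a full‑page OS write is meant to create.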

🔑

How it connects: AROM designs this out of existence. Forbid apps from writing LtRAM at all, so the only writes that reach the chip are full 4 KB pages from the OS, which line up with the media. A whole‑page write never splits a block, so the dance never happens. With no small writes to absorb, the controller can stay thin, which is what lets the candidates read a projected 26–79% faster than Optane.

Optional code sketches

Pseudocode for the curious — safe to skip.

How the OS handles a write to read‑only memory

# app tries to write to an LtRAM page
when app writes to LtRAM page:
    OS catches the write   # hardware raises a flag; the write has not happened yet
    OS copies the whole 4 KB page into regular DRAM
    OS updates its map: app now points to DRAM copy
    OS retries the write, which lands on the DRAM copy
    LtRAM original stays unchanged, erased later in the background

Like a librarian who won't let you write in the reference book but will photocopy the page so you can mark your own copy. The app never notices the swap.
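For readers who prefer something they can run, here is a toy Python model of the same detour. It is a simulation under simplifying assumptions, not how a real kernel is structured: pages are plain byte arrays, the "page table" is a dictionary, writes never cross a page boundary, and every class and method name is invented for this sketch.

```python
PAGE_SIZE = 4096  # bytes in one page

class ToyMemory:
    """Toy model of AROM copy-on-write; names and structure are illustrative."""

    def __init__(self):
        self.page_table = {}   # virtual page number -> [tier, page bytes]
        self.erase_queue = []  # old LtRAM pages awaiting a background erase
        self.cow_faults = 0

    def os_place_in_ltram(self, vpn, data):
        # Only the OS writes LtRAM, and only a whole page at a time.
        if len(data) != PAGE_SIZE:
            raise ValueError("LtRAM accepts full pages only")
        self.page_table[vpn] = ["ltram", bytearray(data)]

    def app_read(self, addr, length):
        vpn, offset = divmod(addr, PAGE_SIZE)
        return bytes(self.page_table[vpn][1][offset:offset + length])

    def app_write(self, addr, payload):
        vpn, offset = divmod(addr, PAGE_SIZE)
        if offset + len(payload) > PAGE_SIZE:
            raise ValueError("this sketch ignores writes that cross a page")
        if self.page_table[vpn][0] == "ltram":
            self._copy_on_write(vpn)          # fault: the write has not happened yet
        page = self.page_table[vpn][1]
        page[offset:offset + len(payload)] = payload  # retried write lands in DRAM

    def _copy_on_write(self, vpn):
        self.cow_faults += 1
        old_page = self.page_table[vpn][1]
        self.page_table[vpn] = ["dram", bytearray(old_page)]  # copy page, then remap
        self.erase_queue.append(old_page)                     # reclaim LtRAM later

mem = ToyMemory()
mem.os_place_in_ltram(0, bytes(PAGE_SIZE))  # a page of zeros stands in for model weights
mem.app_write(16, b"\x01" * 8)              # the app's 8-byte write triggers one fault
mem.app_write(24, b"\x02" * 8)              # second write hits DRAM directly, no fault
```

After the two writes, the page lives in DRAM, only one fault was taken, and the untouched LtRAM original sits in the erase queue, mirroring the librarian's photocopy.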

How the OS rations writes to protect LtRAM

every second: add r tokens to the bucket   # r = N·E ÷ T, the fixed drip rate

when OS wants to move a page into LtRAM:
    if bucket has at least 1 token:
        spend 1 token
        write the page to LtRAM
    else:
        hold off — try again later

Total tokens over the device's lifetime equals exactly the total writes the chip can handle, so it can never wear out ahead of schedule.
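A runnable Python version of the same idea is sketched below. It assumes an uncapped bucket (the guide describes banking unused tokens but gives no ceiling) and a caller that reports elapsed time itself; the class and method names are hypothetical.

```python
class WriteTokenBucket:
    """Token bucket refilled at a fixed rate; one token per page migration."""

    def __init__(self, rate_per_second, initial_tokens=0.0):
        self.rate = rate_per_second
        self.tokens = initial_tokens

    def tick(self, seconds=1.0):
        # Tokens drip in at the fixed rate; unused ones simply accumulate.
        self.tokens += self.rate * seconds

    def try_migrate(self):
        """Spend one token if available; return False if the migration must wait."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = WriteTokenBucket(rate_per_second=1000)
bucket.tick(seconds=2)                                    # a quiet spell banks tokens
granted = sum(bucket.try_migrate() for _ in range(2500))  # then a burst of requests
print(granted, bucket.tokens)
```

The burst is served only up to what was banked; the remaining requests get False and have to retry after later ticks, which is the "hold off" branch in the pseudocode.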

How reading from LtRAM works — no hidden steps

when app reads data at address X:
    send address X straight to LtRAM chip
    LtRAM chip returns the data directly
    done

# no lookup table, no translation, no detour

Optane consulted a secret address table (the AIT) before every read, adding 76–200 ns. Throwing that table away is the main reason the new design is projected 26–79% faster on reads.
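As a purely illustrative what-if, the Python sketch below adds the quoted 76–200 ns lookup cost in front of a direct read. Carrying Optane's overhead over unchanged to another technology is an assumption made only for this example, and the names are invented; the 72 ns figure is the projected STT‑MRAM read from the latency chart.

```python
AIT_OVERHEAD_NS = (76, 200)  # low and high extra delay quoted for Optane's table

def read_latency_ns(direct_ns, lookup_ns=0):
    """Latency of one read: the direct path plus any table lookup in front of it."""
    return direct_ns + lookup_ns

direct = read_latency_ns(72)  # projected STT-MRAM read with no lookup
with_table = [read_latency_ns(72, extra) for extra in AIT_OVERHEAD_NS]
print(direct, with_table)
```

Even the low end of the lookup cost would roughly double this read in the toy model, which is why the design keeps the read path free of tables.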

Test yourself

Practice problems

Try each one before opening the answer. Each ends with the common misconception it's checking for.

Problem 1 · easy

A game stores level textures (displayed constantly, never changed) in AROM. A modding feature lets a player paint a decal directly onto one. What happens the instant the paint stroke tries to change that texture, and does it succeed?

Answer

The paint stroke is an app write to read‑only LtRAM, which AROM forbids directly. The hardware catches the write and signals the OS (copy‑on‑write). The OS copies the whole 4 KB page into DRAM, repoints the address, and the write is retried there — where it succeeds. So the stroke DOES succeed; it just quietly lands on a DRAM copy. The player never notices.

Misconception: that the write is rejected, errors out, or crashes. Read‑only applies only to the app's direct access — the OS transparently reroutes the write.

Problem 2 · medium

Suppose the designers kept AROM's read‑only rule but ALSO kept Optane's hidden translation table that every read must consult. What does this do to ordinary reads, and does the design still deliver its headline benefit?

Answer

Reads get slower again. AROM reads are fast precisely because the thin controller hands bytes straight off the media with no lookup. Re‑adding the table forces every read to check it first, piling delay back onto the read path — exactly the overhead that hurt Optane. Since read‑mostly data is read far more than anything else, the design would lose much of its advantage and drift back toward Optane‑like performance.

Misconception: that a lookup table only affects writes, or is "just bookkeeping." It taxes every read — the most common operation for this data.

Problem 3 · medium

The art‑museum analogy: visitors may look but not touch, and a guard hands a photocopy to anyone who tries to scribble. Complete and explain: "The analogy breaks down when you assume the original painting ______." What really happens to the original data?

Answer

"...stays safely on the wall forever while you doodle on the copy." In a real museum the original is preserved and you end up with two copies. In AROM the data is MOVED, not duplicated‑and‑kept: once a page is copied to DRAM and is being changed, the system may reclaim and erase the original LtRAM page. The goal is to keep each piece in the single cheapest place that works, not maintain two permanent copies.

Misconception: that LtRAM permanently holds an untouched master while DRAM keeps a duplicate. There's no permanent second copy — the LtRAM page is freed and erased.

Problem 4 · hard

LtRAM is compared to a pantry of eggs sold only by the carton: glance at any egg for free, but only write a whole carton. Name TWO ways this analogy breaks down for real LtRAM, and why each matters.

Answer

(1) Real eggs spoil whether or not you touch them, but LtRAM only degrades from REWRITING — data you leave alone sits safely. This is exactly why read‑mostly data fits: reading never wears the chip. (2) A carton can be refilled endlessly, but an LtRAM block wears thin after enough rewrites (bounded endurance), so the OS must ration rewrites (the token budget). The "whole carton only" part IS accurate (writes happen a full 4 KB page at a time).

Misconception: that reading wears LtRAM out, or that it lasts forever. Reads are harmless; only rewrites consume its life.

Problem 5 · medium

A program wants to change just a few bytes. Describe how Optane handles this versus an AROM system, and explain why AROM avoids the expensive read‑modify‑write entirely.

Answer

Optane accepts the small write directly, but its media writes only in 256‑byte blocks, so it must read the whole block, patch a few bytes, and write it back — the read‑modify‑write dance, moving far more data than asked, every time. AROM forbids apps from writing LtRAM at all: the small write is diverted via copy‑on‑write into DRAM, so it never reaches LtRAM. The only writes LtRAM sees are whole 4 KB pages from the OS, which align with the media — a write never splits a block, so the dance never happens.

Misconception: that AROM makes small writes "fast" via a chip trick. It doesn't speed them up — it prevents them from ever reaching LtRAM, sending them to DRAM instead.

Problem 6 · medium

Both Optane and AROM must spread writes to avoid wear. Optane did it secretly in hardware; AROM uses an OS "write token" budget. Beyond who does the work, what real, observable difference does this make for a running program?

Answer

Optane's hidden hardware wear‑leveling occasionally shuffled data with no warning, causing sudden stalls — a read could freeze for ~60 µs (over 160× a normal read). Because the OS couldn't see it coming, latency was spiky and unpredictable. With AROM, the OS controls wear via a visible token budget and schedules slow erases during quiet moments — same protection, but predictable, with no surprise stalls on the hot path. The observable difference is steady latency versus random spikes.

Misconception: that moving the job to software just makes things slower. The opposite — software control removes the unpredictable stalls.

Problem 7 · medium

You're designing a music‑streaming app on AROM. It holds: (a) program code, (b) the fixed numbers of a recommendation model, (c) a live count of current listeners, (d) each user's playback position, updated every few seconds. Which go in AROM, which stay in DRAM, and why?

Answer

Put read‑mostly data in AROM: (a) program code and (b) the model's fixed numbers are read constantly but essentially never change, so they fit LtRAM and free up DRAM. Keep write‑heavy data in DRAM: (c) the live count and (d) per‑user position change every few seconds, so they belong in fast, freely‑writable DRAM. Placing write‑heavy items in AROM backfires — every update triggers a copy‑on‑write back to DRAM.

Misconception: "cheaper memory = put everything there." Write‑heavy data dragged into AROM just gets copied back to DRAM anyway.

Problem 8 · hard

A large batch sitting happily in AROM as read‑mostly suddenly becomes write‑heavy — a feature flips on and the whole region starts getting written constantly. Walk through what the system must do per write, and why the authors flag this as risky and unsolved.

Answer

Every write now triggers a copy‑on‑write fault: the OS must copy the page from LtRAM into DRAM before the write lands. If a huge region flips at once, the system pays this copy cost over and over, and all that data piles into DRAM — which can fill up and run out. It's unsolved because, unlike slow tiering (where a wrong placement only slows reads), here a wrong placement costs a copy on the critical path of every write and can exhaust DRAM. The authors mention admission control and DRAM‑occupancy watermarks as starting points — listed as future work.

Misconception: that AROM handles any workload gracefully. It's excellent for read‑mostly data; a sudden shift to write‑heavy is exactly where it strains.

Problem 9 · medium

Older systems treated cheap memory as a slow "basement" tier beneath DRAM. AROM insists its LtRAM is an "equal partner" beside DRAM. Using the desk‑and‑shelf analogy, explain the difference — and name the one way LtRAM is still NOT a full equal.

Answer

Older tiering is like renting a slow storage unit across town — cheap but far and slow, for rarely‑touched data. AROM is a second shelf right at desk height — an equal neighbor — because LtRAM reads about as fast as DRAM, so read‑mostly data on it isn't "demoted," just specialized. The one way it's NOT a full equal: LtRAM is still slow to write (and wears out), so it's a peer for READING but not a place for active, frequently‑changed data.

Misconception: "cheaper memory = slower in every way / a downgrade." LtRAM is a peer specifically for reads; the catch is writing, not reading.

Quick answers

Frequently asked questions

Fifteen of the questions a newcomer most often asks.

Q1 · Why not just buy more DRAM instead of inventing new memory?

Because DRAM is the single most expensive part of a server — at Microsoft Azure and Meta, memory alone is more than half the total cost, so doubling it would balloon the bill. Worse, the price per unit has stopped falling. The insight: most data in memory is barely ever changed, so you can keep that read‑mostly data in cheaper memory and only pull it into DRAM when it actually needs writing.

Q2 · What actually happens when an app tries to write to AROM memory?

Nothing breaks, and the app doesn't even notice. AROM means programs may only read this cheap memory, never write it directly. The instant a program tries to change something, the system catches it, copies that data into fast memory first, and lets the write land on the copy. This catch‑and‑copy reflex is the old, well‑known OS technique called copy‑on‑write.

Q3 · Is fast new memory like STT‑MRAM something I can buy today?

No. Real LtRAM chips don't yet exist as products. The team built a stand‑in prototype: a research computer (Enzian) wired to a reprogrammable chip as a memory controller, with cheap NOR flash standing in. The promising speed numbers are projections from a model, not measurements of a finished product. The paper even warns STT‑MRAM may hit manufacturing and scaling hurdles that keep it from DRAM‑class capacity.

Q4 · What if the OS guesses wrong and moves data that's actually written a lot?

That's one of the open problems the paper flags. The system moves data into cheap memory based on it looking read‑mostly, but something quiet all morning can suddenly start being edited (a "phase change"). A wrong guess isn't a crash — each write just triggers the copy‑back into fast memory. The real risk is running out of fast memory if a workload suddenly writes a lot. Listed as future work.

Q5 · How is this different from Intel Optane, which already tried cheap memory?

Optane disguised the new memory as ordinary memory by bolting a complicated controller onto the stick. That controller translated every access and quietly shuffled data to even out wear, adding delay and cost to nearly every read and write. This paper does the opposite: keep the chip dumb and move the clever bookkeeping into the OS. The key trick is forbidding small writes, so the hardware never needs Optane's costly read‑modify‑write or its hidden translation table. Projected result: 26–79% faster reads.

Q6 · Why are small writes such a big deal? Optane could write, couldn't it?

It could, but at a steep price. Optane's smallest physical write is a 256‑byte block, while programs often change just a few bytes. To do that, the hardware read the whole block, changed the small part, and wrote it all back — every time. That read‑modify‑write made the chip move roughly 4× as much data as asked, and random‑write speed collapsed by ~75%. This paper sidesteps it by only ever writing whole 4 KB pages.

Q7 · Does every application benefit from AROM?

No. AROM only helps read‑mostly data — read often, changed rarely. Good examples: program code, a fixed AI model's numbers, and cached lookup tables in stores like Redis or Memcached. The cheap memory is slow to write and wears out, so anything you constantly edit belongs in normal fast memory. An app that mostly churns rapidly‑changing data would gain little.

Q8 · Is AROM a kind of permanent storage, like a hard drive or SSD?

No — a common mix‑up. AROM is main memory that sits right alongside DRAM and is read by the processor directly, just specialized for data you rarely change. It's not a disk or a place for long‑term files. Although the underlying technologies can physically retain data, the paper treats this as a working partner to DRAM, not a storage tier underneath it.

Q9 · Is AROM just a cache for DRAM?

No. A cache keeps a second, temporary copy of data that also lives elsewhere. AROM doesn't — the cheap memory is an equal partner beside DRAM, not a layer that duplicates it. Data lives in one place at a time: read‑mostly data in cheap memory, actively‑written data in DRAM. When a program writes, the data is moved into DRAM and the old slot can be reclaimed. The aim is keeping each piece in the cheapest spot that works, not two copies.

Q10 · When the system copies my data on a write, did something go wrong?

Not at all. Copy‑on‑write is normal, expected behavior, not a crash. It's the very mechanism that makes the read‑only rule safe. Writes aren't forbidden; they're transparently redirected onto a fresh copy in fast memory at the moment they happen, so the shared original is never altered and the program keeps working without knowing. The only cost is the work of moving that page.

Q11 · If this memory wears out from writing, how do they stop it dying early?

The OS rations writes with a token allocator. It hands out write permits at a steady, precalculated rate, and every move of data into cheap memory spends one. Unused permits bank up for busy spells. Because writes can never outrun the permit supply, the chip is guaranteed to survive its full lifetime — and the system needs only the running permit count, not the wear of every spot. Reads cost nothing, so reading as often as you like never wears it out.

Q12 · Does this token system fully solve the wear‑out problem?

Not completely, and the paper says so. The token allocator controls how much writing happens (so the device lasts long enough), but it doesn't guarantee even spreading across the chip. Some spots could be overused while others are barely touched. In particular, data written once at load time and never changed sits in a fresh spot that never takes its share of wear. Spreading wear fairly is a separate, harder, unfinished problem.

Q13 · Which exact memory technology will AROM actually use?

The paper doesn't pick a winner — that's itself an open question. Candidates include 3D V‑RRAM, 3D FeFET, PCM, and STT‑MRAM. Each is strong on one trait and weak on another. The projections show all beating Optane, with STT‑MRAM fastest (roughly matching DRAM on read speed) but possibly hard to manufacture at large capacity, while denser candidates like 3D V‑RRAM trade some speed for much lower cost. Choosing is left as future work.

Q14 · Could a smarter program, maybe using ML, predict which data to move?

The paper raises exactly this as an open question. Right now the system decides based on how each piece is currently used — pushing data that's gone quiet into cheap memory and pulling it back the moment it gets written. Making that prediction smarter is listed as future work, alongside picking a technology and spreading wear fairly.

Q15 · Did they prove this is faster than DRAM, or is that still a hope?

Still a goal, not a proven result — and the paper is upfront. They built a working prototype that runs real Linux correctly, showing the design is buildable. But the headline speed numbers, including the best candidate matching DRAM, are modeled projections of a single read in isolation, not full application measurements. The authors explicitly say real application‑level evaluation and any cost model remain future work.

Look it up

Glossary & further reading

Every term in one place, then a guided path to go deeper.

Core terms

AROM (Application Read‑Only Memory)
The central rule: regular programs may only read the new cheaper memory, never write it directly — if a program tries to write, the OS quietly moves the data to fast memory first and lets the write land there.

LtRAM (Long‑term RAM)
A family of new, cheaper memory chips that read as fast as ordinary memory but write slowly and physically wear out — ideal for data you look at constantly but rarely change.

Mixed DRAM/LtRAM main memory
The top‑level proposal: put normal fast memory and the new cheaper memory side by side as equals — fast memory for data that changes often, cheap memory for data that's mostly read.

Copy‑on‑Write (CoW)
A well‑known OS trick the paper reuses: a page is marked read‑only, and the instant a program tries to write it, the OS copies it to fast memory and directs the write to the copy. The program never notices.

Intel Optane (DCPMM)
The only real‑world LtRAM‑style product you could buy (discontinued 2022), which hid all the new memory's quirks inside a complex on‑module controller — the cautionary example of what not to do.

Wear leveling
Spreading writes evenly so no single spot wears out early — Optane did this secretly in hardware (causing unpredictable slowdowns), while the paper moves the job into the OS.

Proposed HW/SW interface
The paper's central invention: keep the memory module deliberately simple — it only reads data, accepts full‑page writes, and reports wear — and hand all the clever decisions to the OS.

Token‑based wear allocator
An OS part that issues a fixed number of write tokens per second so LtRAM is never written faster than its rated lifetime allows — every page moved into LtRAM spends one token; unused tokens bank up.

Supporting terms

DRAM
The standard fast working memory in every server today — fast both ways, but very expensive because its cost per unit has stopped falling.

DRAM cost‑per‑bit plateau
The decade‑long stagnation where the price of a gigabyte of DRAM stopped dropping, caused by a physical limit in the tiny capacitor cell — a structural ceiling, not a temporary spike.

On‑DIMM translation layer
The hidden logic Optane (and flash drives before it) built onto the memory stick to make quirky new memory look ordinary — but all that bookkeeping added the very delays it was supposed to avoid.

AIT (Address Indirection Table)
A secret lookup table inside Optane recording where each piece of data actually sits, so it can shuffle data to spread wear — every read must consult it first, and it's too big to keep fully in fast cache.

Read‑Modify‑Write / Write Amplification
The expensive three‑step operation forced when a program changes a few bytes on memory that only writes large blocks: read the block, change the small part, write it all back — ~4× the work requested.

Endurance
The limited number of times a cell can be written before it fails — unlike DRAM (effectively unlimited), LtRAM has a fixed write budget that must last the whole deployment.

4 KB page write granularity
The rule that every write to LtRAM is exactly one full memory page, which lines up with the media's native block size and permanently eliminates read‑modify‑write.

Read‑mostly data
Data programs look at frequently but almost never change — program code, AI model weights, cached lookup tables — exactly what LtRAM holds cheaply and safely.

Page placement policies
The OS's three‑part strategy: put provably read‑only data directly into LtRAM, move any written page back to fast memory, and periodically promote long‑untouched pages into LtRAM.

Dirty‑bit tracking
The OS method for finding pages not written recently — a single flag per page checked on a timer; pages with a clear flag are candidates for promotion to LtRAM.

Enzian / NOR flash prototype
The research testbed: a computer with a reprogrammable chip acting as the LtRAM controller, with cheap NOR flash standing in for real LtRAM, running ordinary Linux.

LtRAM technology candidates
The shortlist of physical materials that could serve as LtRAM — 3D V‑RRAM, 3D FeFET, PCM, and STT‑MRAM — each trading off speed, density, cost, and endurance.

Projected read latency
The paper's estimate of a single read through the simplified interface, split into interconnect, controller, and media — showing every candidate 26–79% faster than Optane.

Static‑page wear problem
An open problem: data written once and never changed always sits in the same blocks, so those blocks never share wear while others exhaust themselves.

Mixed‑workload bandwidth collapse
An Optane‑specific issue: mixing reads and writes dropped its total read throughput by 67%, whereas DRAM holds bandwidth regardless of the mix.

Further reading — conceptual next steps

The computer memory hierarchy: RAM, cache, and storage

explainer · 30–45 min

How computers store data at different speeds and costs. Try "memory hierarchy explained" or "CPU cache vs RAM vs SSD" (Khan Academy, Computerphile, MIT OCW). The whole paper exists because different memory sits at different points on this curve.

How operating systems manage memory: virtual memory and page tables

short course · 1–2 hrs

How the OS gives each program its own view of memory, uses 4 KB "pages," and handles a page fault. See the free Operating Systems: Three Easy Pieces (ostep.org). The paper's solution lives almost entirely in the OS memory manager.

Copy‑on‑write: how the OS shares memory safely

explainer · 20–30 min

The single mechanical trick the paper uses to enforce AROM. Search "copy‑on‑write Linux explained."

How flash storage works: cells, erase cycles, and wear leveling

explainer · 30–45 min

Why flash erases in large blocks, why cells wear out, and how SSDs hide it. NOR flash is the prototype's stand‑in LtRAM, and these properties are shared by all serious LtRAM candidates.

Why memory is so expensive in cloud data centers

explainer · 20–30 min

How server hardware costs break down at hyperscalers, and why DRAM became such a large fraction. The paper opens with DRAM being over half of server cost at Azure and Meta.

Storage Class Memory is Dead, All Hail Managed‑Retention Memory · medium

Legtchenko et al. · Microsoft Research · HotOS '25

Proposes that emerging memory deserves its own purpose‑built interface — the intellectual parent of AROM. Short and accessible.

Towards Memory Specialization: A Case for Long‑Term and Short‑Term RAM · medium

Li, Abdurrahman, Cleaveland, Legtchenko et al. · DIMES '25

The paper that coined "LtRAM" — effectively "chapter one" of the story AROM continues.

Software‑Defined Far Memory in Warehouse‑Scale Computers · hard

Lagar‑Cavilla, Ahn, Souhlal et al. · Google · ASPLOS '19

How Google reduced DRAM by compressing cold pages in software — the best pure‑software answer, and why software alone isn't enough.

Basic Performance Measurements of the Intel Optane DC Persistent Memory Module · hard

Izraelevitz, Yang, Zhang et al. · arXiv 2019

The measurement study that found Optane's read bandwidth collapses 67% when reads and writes mix — the hard numbers motivating AROM.

Pond: CXL‑Based Memory Pooling Systems for Cloud Platforms · hard

Li, Berger, Hsu et al. · Microsoft/UW‑Madison · ASPLOS 2023

Sharing a common pool of memory over CXL — source of the "DRAM is more than half of Azure server cost" data point.