Trace Profiles · Tracerator

Profile guide

Base ISL Patterns

These profiles summarize the input sequence length distributions that drive prefill pressure, cache reuse opportunity, and replay behavior. Use them as visual shorthand when choosing a trace shape for planning runs.

Figure 3

Benchmarking workflow

Efficiency model

Effective efficiency

Successful output volume tells only part of the story. Useful throughput has to be divided by all of the system costs that consume the run envelope.

Effective efficiency drops as memory, network, storage I/O, and power costs expand around accelerator time.
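The page does not give a formula for this ratio, so the following is only a minimal sketch of one way to read it. It assumes every cost can be expressed in a common unit (for example, seconds of the run envelope) and summed. The names `RunCosts` and `effective_efficiency` are hypothetical and are not part of Tracerator.

```python
from dataclasses import dataclass


@dataclass
class RunCosts:
    """Hypothetical cost components for one run, all in the same unit."""
    accelerator: float
    memory: float = 0.0
    network: float = 0.0
    storage_io: float = 0.0
    power: float = 0.0

    def total(self) -> float:
        # Assumption: costs are additive once expressed in a common unit.
        return (self.accelerator + self.memory + self.network
                + self.storage_io + self.power)


def effective_efficiency(useful_output: float, costs: RunCosts) -> float:
    """Useful output divided by the total system cost of the run."""
    total = costs.total()
    if total <= 0:
        raise ValueError("total cost must be positive")
    return useful_output / total


# Same useful output, with and without costs around accelerator time.
accelerator_only = effective_efficiency(1000, RunCosts(accelerator=10))
with_overheads = effective_efficiency(
    1000, RunCosts(accelerator=10, memory=2, network=1, storage_io=5, power=2)
)
print(accelerator_only, with_overheads)  # 100.0 50.0
```

In this illustrative case, the non-accelerator costs match accelerator time, so the ratio halves even though the successful output volume is unchanged.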

Figure 2

KV cache hierarchy latencies

Log-scale bar chart showing GPU HBM at about 0.001 ms, CPU DRAM at about 0.05 ms, local flash at about 1 ms, and shared storage at about 10 ms. Once state leaves HBM, the choice between retrieving it and recomputing it becomes the dominant decision.
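The retrieve-or-recompute tradeoff can be sketched as a simple comparison. The tier latencies below are the approximate values from the chart. Everything else is an assumption made for illustration: the function name `should_retrieve` is hypothetical, each latency is treated as the cost of one fetch, and the recompute cost is a made-up figure rather than a measurement.

```python
# Approximate latencies from the chart, in milliseconds.
TIER_LATENCY_MS = {
    "gpu_hbm": 0.001,
    "cpu_dram": 0.05,
    "local_flash": 1.0,
    "shared_storage": 10.0,
}


def should_retrieve(tier: str, num_fetches: int, recompute_ms: float) -> bool:
    """Return True if fetching cached state is estimated to be cheaper than recomputing it.

    Assumption: retrieval cost scales linearly with the number of fetches.
    """
    retrieve_ms = TIER_LATENCY_MS[tier] * num_fetches
    return retrieve_ms < recompute_ms


# Hypothetical case: recomputing the state would take 5 ms.
for tier in TIER_LATENCY_MS:
    choice = "retrieve" if should_retrieve(tier, 1, 5.0) else "recompute"
    print(f"{tier}: {choice}")
```

With that hypothetical 5 ms recompute cost, a single fetch is cheaper from HBM, DRAM, and local flash, while shared storage at about 10 ms tips the decision toward recompute.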