One engine, built for real time at massive scale.
A distributed, columnar, memory-first database with a vectorized engine that orchestrates work across GPU and CPU. Tables shard across worker ranks as compressed columnar chunks; data tiers from GPU VRAM down to object storage; streams are queryable the instant they land.
Streams in. Every query shape out — one data path, no copies, no ETL window.
The engine routes each operation to the right silicon.
Kinetica compiles each query into vectorized operations: parallel work — scans, joins, distance math — runs across thousands of GPU cores at once, while sequential logic stays on the CPU.
The same primitives serve every model — vector search and spatial joins share one path, not a bolted-on library.
- Vectorized, not row-by-row — one instruction over many values, the way GPUs want work
- GPU + CPU orchestration — intensive compute on GPU, sequential logic on CPU
- GPU VRAM as the hottest tier — the active working set sits closest to the cores
Sharded across ranks, with nothing in the middle.
A cluster is worker ranks, each owning a slice of every table and computing it in parallel — no shared disk, no bottleneck. Within a rank, TOMs split tables into shards by a shard key.
A high-cardinality key spreads rows evenly, and tables sharded on the same key join locally with no network reshuffle. Add ranks to scale out on demand.
- Worker ranks → TOMs → shards — a clear ownership hierarchy, no shared state
- Shard-key locality — co-located joins avoid cross-network shuffles
- Replicated dimensions — broadcast small tables so every join stays local
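To make shard-key locality concrete, here is a minimal Python sketch of the idea, not Kinetica's implementation. The shard count, the `shard_for` hash, and the `orders`/`customers` tables are all hypothetical. It shows only that when two tables are partitioned by the same key with the same hash, matching rows always land in the same shard, so each shard can join its own slice without looking at any other.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a shard-key value to a shard id with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shard_table(rows, key_field, num_shards=NUM_SHARDS):
    """Partition rows into shards by hashing the shard-key column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards


def local_join(left_shards, right_shards, key_field, num_shards=NUM_SHARDS):
    """Join shard i of the left table only against shard i of the right table."""
    joined = []
    for shard_id in range(num_shards):
        # Build a hash index over this shard's slice of the right table.
        index = defaultdict(list)
        for right_row in right_shards.get(shard_id, []):
            index[right_row[key_field]].append(right_row)
        # Probe it with this shard's slice of the left table.
        for left_row in left_shards.get(shard_id, []):
            for right_row in index.get(left_row[key_field], []):
                joined.append({**left_row, **right_row})
    return joined


orders = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c2", "amount": 25},
    {"customer_id": "c1", "amount": 7},
]
customers = [
    {"customer_id": "c1", "region": "EU"},
    {"customer_id": "c2", "region": "US"},
    {"customer_id": "c3", "region": "APAC"},
]

joined = local_join(
    shard_table(orders, "customer_id"),
    shard_table(customers, "customer_id"),
    "customer_id",
)
```

Because both tables use the same key and hash, no row ever needs to cross shards to find its match. A small replicated dimension table gets the same effect by keeping a full copy on every shard instead.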
Columns, compressed, in skippable chunks.
Data is stored column-by-column, so a query reads only the columns it touches, and same-type values compress tightly.
Each column splits into chunks; in-memory min/max metadata lets the engine skip any chunk that can't match — most queries scan a fraction of the table.
- Columnar layout — read only the columns the query needs
- Dictionary encoding + compression — less memory, less I/O, more in VRAM
- Chunk-skipping — min/max metadata prunes scans automatically
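Chunk-skipping with min/max metadata can be sketched in a few lines of Python. This is an illustrative model under simple assumptions (integer values, fixed-size chunks, a closed range filter); the `Chunk` class and function names are hypothetical and do not reflect Kinetica's internal format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    values: List[int]
    min_value: int  # metadata kept in memory alongside the chunk
    max_value: int


def build_chunks(column, chunk_size):
    """Split a column into fixed-size chunks and record min/max for each."""
    chunks = []
    for start in range(0, len(column), chunk_size):
        values = column[start:start + chunk_size]
        chunks.append(Chunk(values, min(values), max(values)))
    return chunks


def scan_range(chunks, low, high):
    """Return values in [low, high] and the number of chunks actually read."""
    matches = []
    chunks_read = 0
    for chunk in chunks:
        # If the chunk's range cannot overlap the filter, skip it unread.
        if chunk.max_value < low or chunk.min_value > high:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.values if low <= v <= high)
    return matches, chunks_read


# A column that arrives roughly in order, such as event timestamps.
timestamps = list(range(1000))
chunks = build_chunks(timestamps, chunk_size=100)
matches, chunks_read = scan_range(chunks, 250, 320)
```

Here the filter touches 2 of the 10 chunks. Pruning works best when values are clustered, as they are with time-ordered data; a randomly ordered column gives every chunk a wide min/max range and little can be skipped.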
Query the data while it's still arriving.
Kinetica is streaming-first: Kafka and CDC records are queryable the instant they land — one table takes writes and reads at once, no load window.
Streaming materialized views keep derived results current as records arrive — no recompute.
- Simultaneous query & ingest — no ETL window, no staleness gap
- Streaming materialized views — derived state updates as data arrives
- Vectorized stencils — continuous lightweight compute over the stream
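The "no recompute" property of a streaming materialized view comes from incremental maintenance: each arriving record adjusts the stored result instead of triggering a rescan. The Python sketch below shows that idea for a per-key average. The `RunningAverageView` class and the sensor records are hypothetical; it is a conceptual model, not how Kinetica implements its views.

```python
from collections import defaultdict


class RunningAverageView:
    """Keeps per-key count and sum so the average is always current."""

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def on_record(self, key, value):
        """Apply one arriving record; cost is constant, independent of history."""
        self._count[key] += 1
        self._total[key] += value

    def average(self, key):
        """Read the derived result without rescanning the base records."""
        if key not in self._count:
            raise KeyError(key)
        return self._total[key] / self._count[key]


view = RunningAverageView()
for sensor, reading in [("sensor-a", 10.0), ("sensor-a", 20.0), ("sensor-b", 5.0)]:
    view.on_record(sensor, reading)

avg_before = view.average("sensor-a")  # reflects the first two readings
view.on_record("sensor-a", 30.0)       # a new record lands
avg_after = view.average("sensor-a")   # already up to date
```

Averages, counts, and sums maintain cleanly this way because they decompose into running totals. Aggregates such as exact medians need more state per key.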
Hot data in VRAM. Cold data in S3. One query plan.
Tables shard across nodes, and joins on a shared key stay local. Configurable tiered storage sits on top of that layout.
The working set lives in GPU VRAM and RAM, warm data on disk and SSD, cold data in S3, HDFS, or Azure Blob — moved transparently between tiers, so capacity is bounded only by the cheapest one.
- Shard-key locality — co-located joins, no network reshuffle
- Tiered storage: VRAM → RAM → disk and SSD → cold object store
- Petabyte capacity — guaranteed query completion at cold-storage size
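Transparent movement between tiers can be pictured as a chain of bounded caches that ends in an unbounded cold store. The Python sketch below is a simplification: it uses plain least-recently-used demotion, which is an assumption chosen for brevity and not Kinetica's actual eviction policy. The `TieredStore` class, tier names, and capacities are hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Ordered tiers, hottest first; overflow demotes the LRU item one tier down."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity); capacity None means unbounded.
        if tiers[-1][1] is not None:
            raise ValueError("the coldest tier must be unbounded")
        self._names = [name for name, _ in tiers]
        self._caps = [cap for _, cap in tiers]
        self._data = [OrderedDict() for _ in tiers]

    def put(self, key, value):
        """Write a chunk; new data always enters the hottest tier."""
        for tier in self._data:
            tier.pop(key, None)
        self._insert(0, key, value)

    def get(self, key):
        """Read a chunk from whichever tier holds it and promote it to the top."""
        for tier in self._data:
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)
                return value
        raise KeyError(key)

    def tier_of(self, key):
        for name, tier in zip(self._names, self._data):
            if key in tier:
                return name
        return None

    def _insert(self, level, key, value):
        tier = self._data[level]
        tier[key] = value  # most recently used sits at the end
        cap = self._caps[level]
        if cap is not None and len(tier) > cap:
            # Demote the least recently used entry to the next colder tier.
            old_key, old_value = tier.popitem(last=False)
            self._insert(level + 1, old_key, old_value)


store = TieredStore([("vram", 2), ("ram", 2), ("object_store", None)])
for i in range(5):
    store.put(f"chunk-{i}", f"data-{i}")
```

After the five writes, the two newest chunks sit in `vram`, the next two in `ram`, and the oldest in `object_store`. Reading the oldest chunk with `store.get("chunk-0")` pulls it back to `vram` and pushes colder chunks down, without the caller ever naming a tier.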
Many users, one cluster, no one starves.
A shared database can't let one heavy query crowd out a live dashboard. Kinetica governs resources per user and group — how fast a request runs, how much it consumes, how long its data stays hot.
Who runs first
Each user or group carries a queue priority, so high-priority requests run ahead of competing work.
How much they use
Caps on CPU threads, memory, and tier usage stop any one user from overloading the cluster — a guard against accidental denial of service.
How long data stays hot
Higher-priority data stays in a fast tier longer; lower-priority data is evicted first as the working set shifts, keeping critical workloads warm.
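Two of these controls, queue priority and priority-aware eviction, reduce to simple orderings. The Python sketch below illustrates them with a hypothetical numeric scale where a higher number means more important. The `PriorityScheduler` class, the user names, and the table names are invented for illustration and are not Kinetica's resource-group settings.

```python
import heapq
import itertools


class PriorityScheduler:
    """Runs higher-priority requests first; equal priorities stay first-in, first-out."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, user, priority, request):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), user, request))

    def next_request(self):
        _, _, user, request = heapq.heappop(self._heap)
        return user, request


def eviction_order(resident):
    """Given {object: eviction priority}, list objects in the order they would leave a tier."""
    return sorted(resident, key=lambda name: resident[name])


scheduler = PriorityScheduler()
scheduler.submit("analyst", 1, "monthly rollup")
scheduler.submit("dashboard", 9, "live tiles")
scheduler.submit("analyst", 1, "ad hoc export")

run_order = [scheduler.next_request() for _ in range(3)]
evict_first = eviction_order({"dashboard_facts": 9, "scratch": 1, "history": 4})
```

The dashboard request jumps ahead of both analyst requests even though it arrived second, and the low-priority `scratch` table is the first candidate to leave a fast tier. Usage caps on threads and memory would be a separate check at submission time and are not modeled here.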
Every data model runs over the same tables.
Every retrieval mode compiles to the same primitives, so there is no separate vector store, graph DB, or spatial engine to keep in sync.
The architecture is the advantage.
Real-time ingest, petabyte scale, every data model, GPU speed — in one engine. Spin up a free instance and run your own workload against it.