KeplorDB is an embeddable engine purpose-built for high-throughput, time-ordered event ingestion.
Declare your schema with #[derive(Schema)]: named fields, fluent builders, zero positional indexing. No server, no SQL, no background threads.
#[derive(Schema)] generates typed event + filter builders. No positional dims[0] indexing anywhere.
Wrong counter type, two bloom dims, missing schema_id — all rejected by the macro with spans pointing at your source.
Segment headers record the active schema. Opening on a mismatched directory errors instead of silently reading misinterpreted bytes.
Queries touch only the columns they need. Aggregations scan contiguous arrays via mmap + zerocopy — zero deserialization.
Per-segment bloom on the primary dim; per-chunk zone maps on all dims (256 rows); per-value compressed status bitmaps for O(1) lookups.
String → u16 resolve tables decompress once per segment via OnceLock. 170× faster filtered aggregate after warmup.
Vectorized sum, count, filtered-sum using 256-bit registers. Hardware prefetch 128-256 bytes ahead. Scalar fallback on unsupported targets.
Each append_batch lands on one shard — one fetch_add, one lock. 8-thread concurrent writers hit 2.5M ev/s.
Cross-segment aggregates fan out across cores. query_recent merges globally newest-first with max_ts early-termination.
Events persisted before return. Default: fsync every 64 events / 256 KB per shard. Set wal_sync_interval to 1 for zero-loss, or u32::MAX for best-effort.
Partial WAL frames detected via CRC32 and recovered up to the last complete frame. Rotating *.wal.rotating files replayed safely too.
Engine::open() in your Rust binary. No TCP, no SQL, no background threads, no external service.
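To make the per-chunk zone maps mentioned above more concrete, here is a minimal sketch of chunk skipping over an interned (u16) dimension column. It is not KeplorDB's code: the `Zone` struct, `build_zone_map`, and `filtered_sum` are hypothetical names for illustration, and only the 256-row chunk size comes from the text.

```rust
/// Rows per zone-map chunk (the feature list cites 256 rows per chunk).
const CHUNK_ROWS: usize = 256;

/// Min/max of one chunk of an interned (u16) dimension column.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Zone {
    min: u16,
    max: u16,
}

/// Build one zone per 256-row chunk of `dim_ids`.
fn build_zone_map(dim_ids: &[u16]) -> Vec<Zone> {
    dim_ids
        .chunks(CHUNK_ROWS)
        .map(|chunk| Zone {
            min: *chunk.iter().min().expect("chunks() never yields an empty slice"),
            max: *chunk.iter().max().expect("chunks() never yields an empty slice"),
        })
        .collect()
}

/// Sum `metric` over rows where `dim_ids[row] == target`.
/// Chunks whose [min, max] range cannot contain `target` are skipped unread.
/// Returns (sum, number_of_chunks_actually_scanned).
fn filtered_sum(dim_ids: &[u16], metric: &[i64], zones: &[Zone], target: u16) -> (i64, usize) {
    assert_eq!(dim_ids.len(), metric.len());
    let mut sum = 0i64;
    let mut scanned = 0usize;
    for (i, zone) in zones.iter().enumerate() {
        if target < zone.min || target > zone.max {
            continue; // zone map proves no row in this chunk can match
        }
        scanned += 1;
        let start = i * CHUNK_ROWS;
        let end = (start + CHUNK_ROWS).min(dim_ids.len());
        for row in start..end {
            if dim_ids[row] == target {
                sum += metric[row];
            }
        }
    }
    (sum, scanned)
}
```

A zone map can only rule chunks out. A chunk whose range contains the target still has to be scanned row by row, which is why the sketch keeps the equality check inside the loop.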
Measured with Criterion over 1 million events in 10 segments. Appends are WAL-durable; aggregates scan real, on-disk column data via mmap + AVX2 SIMD with rayon fan-out. Filtered queries skip chunks via zone maps and segments via bloom + max_ts.
Read-path throughput figures assume the intern resolve table is warm: OnceLock decompresses it once per segment, then reuses the cached HashMap<String, u16>.
Run cargo bench --workspace locally to reproduce.
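The warm-table assumption can be illustrated with std::sync::OnceLock. The sketch below is not the engine's implementation: `Segment`, `raw_strings`, and the `builds` counter are hypothetical stand-ins, and the "decompression" is simulated by building the map from an in-memory list.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical per-segment state; not KeplorDB's actual type.
struct Segment {
    /// Stand-in for the compressed intern table stored on disk.
    raw_strings: Vec<String>,
    /// Built lazily on the first lookup, then reused by every later query.
    resolve_table: OnceLock<HashMap<String, u16>>,
    /// Counts how often the table was built (for demonstration only).
    builds: AtomicUsize,
}

impl Segment {
    fn new(raw_strings: Vec<String>) -> Self {
        Self {
            raw_strings,
            resolve_table: OnceLock::new(),
            builds: AtomicUsize::new(0),
        }
    }

    /// Resolve a dimension string to its u16 id.
    /// The closure passed to `get_or_init` runs at most once per segment.
    fn resolve(&self, key: &str) -> Option<u16> {
        let table = self.resolve_table.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::Relaxed);
            self.raw_strings
                .iter()
                .enumerate()
                .map(|(i, s)| (s.clone(), i as u16))
                .collect()
        });
        table.get(key).copied()
    }
}
```

The first call to `resolve` pays for building the map; later calls only pay for a hash lookup, which is the difference between cold and warm numbers in a benchmark of this shape.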
| operation | latency | throughput |
|---|---|---|
| write path | | |
| batch append · 4096 ev | 4.76 ms | 860K ev/s |
| batch append · 1024 ev | 873 µs | 1.17M ev/s |
| concurrent · 8t × 1024 | 1.24 ms | 6.6M ev/s |
| wal memory-only | 352 µs | 2.9M ev/s |
| rotation · 1 shard · 1024 | 10.1 ms | compress+fsync |
| read path · 1M events · 10 segments | | |
| aggregate · no filter | 756 µs | 1.3G ev/s |
| aggregate · user filter | 254 µs | 3.9G ev/s |
| aggregate · time range | 274 µs | 3.7G ev/s |
| aggregate · user + time | 105 µs | 9.5G ev/s |
| query_recent · 100 | 37 µs | — |
| query_recent · 1000 | 456 µs | — |
| query_recent · user · 100 | 70 µs | — |
| rollup (in-memory) | | |
| rollup · single user · day | 7.5 µs | — |
| rollup · all buckets · day | 24 µs | — |
append_batch claims a shard with one fetch_add, takes one lock, and writes the whole batch. N concurrent callers → N shards, zero contention in the common case.

ArcSwap. Intern resolve tables are cached with OnceLock per segment, so decompression and the HashMap build happen once. query_recent merges globally newest-first across all segments.

wal_sync_interval / wal_sync_bytes. Three-phase crash-safe rotation: orphaned *.wal.rotating files are replayed on next open.

engine.gc(cutoff) drops segments whose max_ts is below the threshold, with no compaction and no write amplification. The manifest is updated in memory; unlink()s are issued once.
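The shard-claim step for append_batch (one fetch_add, one lock, whole batch written) might look roughly like the sketch below. This is an assumption-laden illustration, not the engine's code: `ShardedLog` is a hypothetical type, events are plain u64 values, and shards are in-memory vectors rather than WAL files.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical sharded buffer; each shard stands in for one WAL shard.
struct ShardedLog {
    /// Monotonic ticket counter used to pick the next shard.
    next: AtomicUsize,
    shards: Vec<Mutex<Vec<u64>>>,
}

impl ShardedLog {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        Self {
            next: AtomicUsize::new(0),
            shards: (0..shard_count).map(|_| Mutex::new(Vec::new())).collect(),
        }
    }

    /// Append a whole batch to exactly one shard.
    /// One atomic fetch_add picks the shard, one lock guards the write.
    /// Returns the index of the shard that received the batch.
    fn append_batch(&self, batch: &[u64]) -> usize {
        let idx = self.next.fetch_add(1, Ordering::Relaxed) % self.shards.len();
        let mut shard = self.shards[idx].lock().expect("shard mutex poisoned");
        shard.extend_from_slice(batch);
        idx
    }

    /// Total number of events across all shards.
    fn len(&self) -> usize {
        self.shards
            .iter()
            .map(|s| s.lock().expect("shard mutex poisoned").len())
            .sum()
    }
}
```

With as many shards as concurrent writers, consecutive tickets map to different shards, so writers usually take different locks. Two writers only contend when their tickets collide modulo the shard count.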
The raw LogEvent<D, C, L> (D indexed string dimensions, C unsigned counters, L free-form string labels) backs every typed schema. You rarely touch it directly; #[derive(Schema)] derives a typed builder with named setters and a .into_log_event() conversion. Ordered fields map positionally to the columns below. Caps: D ≤ 256, C ≤ 64, L ≤ 64.
| field | type | description |
|---|---|---|
| ts_ns | i64 | Nanosecond timestamp. Sorted, binary-searchable per segment. Delta-encoded + zstd, decoded once at mmap open. |
| metric | i64 | Primary signed metric — cost, duration, value. Delta-encoded + zstd. |
| counters[0..C] | u32 | C unsigned counters per event. Tagged with #[counter] in your schema struct. |
| latency_ms | u32 | Primary latency in milliseconds. |
| latency_detail_ms | u32 | Detailed latency breakdown in milliseconds. |
| status | u16 | Status code — HTTP, gRPC, application. Bitmap-indexed for O(1) lookups. |
| flags | EventFlags | 16 boolean bitflags, newtype-wrapped for type safety. |
| dims[0..D] | String | D indexed, filterable dimensions. Tagged #[dim]; optional #[dim(bloom)] or #[dim(rollup)] wires per-segment bloom and daily rollups. |
| labels[0..L] | String | L free-form string labels. Stored, not indexed. Returned by query_recent; invisible to aggregate. |
| id | String | Unique event identifier. Interned per segment for fast point lookups via engine.get_event(id). |
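The ts_ns and metric columns are described as delta-encoded before zstd compression. As a rough illustration of the delta step only (the zstd stage and the actual on-disk layout are not shown, and the function names are hypothetical), sorted timestamps turn into small differences that compress well:

```rust
/// Delta-encode a column: the first value is stored as-is (relative to 0),
/// every later value as the difference from its predecessor.
fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let delta = v.wrapping_sub(prev);
            prev = v;
            delta
        })
        .collect()
}

/// Inverse of `delta_encode`: a running sum restores the original values.
fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}
```

For example, the timestamps 1000, 1005, 1005, 1020 encode to 1000, 5, 0, 15. Wrapping arithmetic keeps the round trip exact even if a difference overflows i64.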
KeplorDB ships as a Cargo workspace — keplordb (engine) plus keplordb-macros (the #[derive(Schema)] proc-macro, re-exported from the main crate). One git dep gets both.
Requires Rust 1.82+. Pre-1.0 release — API and on-disk format may change before 1.0. See PRODUCTION.md for known gaps.