keplordb / docs · v0.1.0

KeplorDB, in one page.

A columnar, append-only log engine written in Rust — purpose-built for high-throughput structured event ingestion. No server, no SQL, no threads. Open the engine, append events, query columns.

Overview

KeplorDB is an embeddable library. It is not a database server. You link it into your Rust binary, call Engine::open(), and write events. Every append() goes to a WAL on disk and to an in-memory columnar buffer. When the buffer fills, it rotates into an immutable .kseg segment file.

Reads mmap those segment files and scan the relevant columns directly — no deserialization, no row reconstruction, no query planner. Aggregations run over contiguous i64 and u32 arrays using AVX2 SIMD, with a scalar fallback.

Not for: mutable rows, joins, secondary indexes, SQL, or multi-writer scenarios. KeplorDB assumes one writer, append-only, time-ordered.

Install

Add the crate to your Cargo.toml:

[dependencies]
keplordb = { git = "https://github.com/themankindproject/keplordb" }

Or via cargo:

$ cargo add keplordb --git https://github.com/themankindproject/keplordb

Requires Rust 1.82 or newer. The crate pulls in zstd, zerocopy, memmap2, thiserror, rustc-hash, hashbrown, mimalloc, and rayon as dependencies.

Quickstart

src/main.rs — open · append · aggregate

use keplordb::{Engine, EngineConfig, LogEvent, QueryFilter};
use std::time::{SystemTime, UNIX_EPOCH};

/// Wall-clock time in nanoseconds since the Unix epoch.
fn ts_ns() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_nanos() as i64
}

fn main() -> Result<(), keplordb::DbError> {
    let engine = Engine::open(EngineConfig {
        data_dir: "/tmp/my_logs".into(),
        wal_max_events: 100_000,
        ..Default::default()
    })?;

    // Write a single event.
    let mut e = LogEvent::new(ts_ns());
    e.dims[0] = "alice".into();
    e.dims[2] = "gpt-4o".into();
    e.metric       = 5_000_000;
    e.counters[0] = 1000;
    e.status       = 200;
    engine.append(&e)?;

    // Last 50 events for a user.
    let results = engine.query_recent(&QueryFilter {
        user_id: Some("alice".into()),
        ..Default::default()
    }, 50)?;

    // Full-segment aggregate (SIMD scan).
    let totals = engine.aggregate(&QueryFilter::default())?;
    println!("events: {}, metric sum: {}", totals.event_count, totals.metric);

    engine.flush()?;
    Ok(())
}
tip: Prefer append_batch(&events) when you have more than a handful of events. It bypasses per-event WAL framing and gets you ~973K ev/s versus ~843K ev/s for single appends.
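The saving is easy to see in byte terms. A hypothetical framing sketch — the header and event sizes here are illustrative, not KeplorDB's actual on-disk format:

```rust
// Hypothetical WAL framing: each single append pays a fixed frame
// header, while a batch shares one header across all its events.
const FRAME_HEADER: usize = 16; // magic + length + CRC (assumed sizes)
const EVENT_BYTES: usize = 128; // assumed fixed encoded size per event

/// Bytes written by n individual append() calls.
fn singles_bytes(n: usize) -> usize {
    n * (FRAME_HEADER + EVENT_BYTES)
}

/// Bytes written by one append_batch() of n events.
fn batch_bytes(n: usize) -> usize {
    FRAME_HEADER + n * EVENT_BYTES
}

fn main() {
    let n = 1_000;
    let saved = singles_bytes(n) - batch_bytes(n);
    // Batching saves (n - 1) frame headers' worth of WAL writes.
    assert_eq!(saved, (n - 1) * FRAME_HEADER);
    println!("batch saves {saved} header bytes per {n} events");
}
```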

Data model

Every record is a LogEvent — a flat, fixed-shape struct. There is no schema migration; every event has the same columns. Unused columns are cheap (they intern to the empty string or zero).

LogEvent schema

field           type    description
id              String  Unique event identifier.
ts_ns           i64     Nanosecond timestamp. Sorted, binary-searchable per segment.
metric          i64     Primary signed metric — cost, duration, scalar value.
counters[0..5]  u32     Five unsigned counters — tokens, bytes, retries.
latency_ms      u32     Primary latency (ms). Second latency lives in counters.
status          u16     Status code — HTTP, gRPC, or application-defined.
flags           u16     16 boolean bitflags.
dims[0..5]      String  Five indexed, filterable dimensions. Interned per segment.
labels[0..3]    String  Three free-form string labels.
payload         String  JSON metadata — opaque to the engine.
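For illustration, the fixed shape above can be mirrored as a plain struct. This is a sketch of the schema, not the crate's actual definition:

```rust
// Illustrative mirror of the LogEvent columns from the table above.
// Every event carries all fields; unused ones default to zero or the
// empty string, which the engine interns cheaply.
#[derive(Default, Debug)]
struct LogEvent {
    id: String,
    ts_ns: i64,
    metric: i64,
    counters: [u32; 5],
    latency_ms: u32,
    status: u16,
    flags: u16,
    dims: [String; 5],
    labels: [String; 3],
    payload: String,
}

fn main() {
    // Set only what you need; everything else is the cheap default.
    let e = LogEvent { ts_ns: 1, status: 200, ..Default::default() };
    assert_eq!(e.metric, 0);
    assert!(e.dims[0].is_empty());
    println!("{e:?}");
}
```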

Segments & WAL

A KeplorDB data directory contains:

  - wal.log — the current write-ahead log.
  - *.kseg — immutable, self-describing segment files.
  - meta.json — segment metadata, rebuilt from segment headers if missing.
  - corrupt/ — files quarantined after failing validation.

Writes go to both the WAL and an in-memory columnar buffer. When the buffer hits wal_max_events, it serialises to a new segment file and the WAL is truncated. Segments are never modified after creation — only deleted.
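The rotation rule can be sketched with stand-in types — Buffer and the Vec-backed segment here are illustrative, not KeplorDB internals:

```rust
// Sketch of segment rotation: once the in-memory buffer reaches
// wal_max_events, its contents are sealed into an immutable segment
// and the buffer restarts empty (mirroring WAL truncation).
struct Buffer {
    events: Vec<i64>, // stand-in for the columnar buffer
    wal_max_events: usize,
}

impl Buffer {
    /// Append one event; returns the sealed segment on rotation.
    fn append(&mut self, ts_ns: i64) -> Option<Vec<i64>> {
        self.events.push(ts_ns);
        if self.events.len() >= self.wal_max_events {
            Some(std::mem::take(&mut self.events)) // seal and reset
        } else {
            None
        }
    }
}

fn main() {
    let mut buf = Buffer { events: Vec::new(), wal_max_events: 3 };
    assert!(buf.append(1).is_none());
    assert!(buf.append(2).is_none());
    let seg = buf.append(3).expect("rotation at wal_max_events");
    assert_eq!(seg, vec![1, 2, 3]);
    assert!(buf.events.is_empty()); // buffer restarts after rotation
}
```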

Durability

Every append():

  1. Writes to the in-memory columnar buffer.
  2. Writes to the on-disk WAL file.
  3. fsyncs every 64 events by default (configurable).

On crash, Engine::open() replays the WAL and writes recovered events into a segment. Maximum data loss is one sync interval — with defaults, up to 63 events. Set wal_sync_interval: 1 for strict fsync per append, at a significant throughput cost.
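The interval logic amounts to a modulus check. A sketch, with should_sync as a hypothetical name:

```rust
// Sketch of interval syncing: fsync once every `interval` appends.
// With the default interval of 64, up to 63 events can sit unsynced.
fn should_sync(appended: u64, interval: u64) -> bool {
    appended % interval == 0
}

fn main() {
    let interval = 64;
    let synced: Vec<u64> = (1..=200)
        .filter(|&n| should_sync(n, interval))
        .collect();
    assert_eq!(synced, vec![64, 128, 192]);
    // Worst case: a crash just before the next sync loses interval - 1 events.
    assert_eq!(interval - 1, 63);
}
```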

caveat: fsync only protects against process crashes. For power-loss durability on consumer SSDs, also ensure your filesystem is mounted with data=journal or equivalent — KeplorDB does not attempt to flush hardware write caches.

Garbage collection

Retention is segment-level. engine.gc(cutoff_ts_ns) deletes every segment whose max_ts < cutoff. There is no compaction, no background merge, and no write amplification — GC is a few unlink() calls.

// drop segments older than 7 days
engine.gc(ts_ns() - 7 * 86_400 * 1_000_000_000)?;

API reference

The full surface of the Engine struct.

Lifecycle

method                description
Engine::open(config)  Open (or create) a data directory. Replays the WAL on start.
engine.flush()        Flush in-memory buffer + WAL to disk. Always called in Drop.

Write

method                        description
engine.append(&event)         Append a single event. WAL-durable.
engine.append_batch(&events)  Append a slice of events. Single WAL frame, bulk column writes.

Read

method                                                 description
engine.query_recent(&filter, limit)                    Return the most recent events matching filter, newest first.
engine.aggregate(&filter)                              SIMD-scanned totals: event count, metric sum, per-status tallies.
engine.query_rollups(from_day, to_day, user, api_key)  Per-day, per-user, per-key rollups across the selected range.
engine.get_event("id")                                 Point lookup by event id. Uses bloom filters to skip segments.
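The segment-skip idea behind get_event can be shown with a toy Bloom filter — the hash mix and sizes here are illustrative, not the engine's:

```rust
// Toy Bloom filter: if it reports "definitely absent", the segment
// never needs to be mmap'd or scanned. Two index derivations from one
// FNV-1a-style mix stand in for real hash functions.
struct Bloom {
    bits: Vec<bool>,
}

impl Bloom {
    fn new(nbits: usize) -> Self {
        Bloom { bits: vec![false; nbits] }
    }

    fn positions(&self, key: &str) -> [usize; 2] {
        let h = key
            .bytes()
            .fold(0xcbf29ce484222325u64, |h, b| (h ^ b as u64).wrapping_mul(0x100000001b3));
        [(h as usize) % self.bits.len(), ((h >> 32) as usize) % self.bits.len()]
    }

    fn insert(&mut self, key: &str) {
        for i in self.positions(key) {
            self.bits[i] = true;
        }
    }

    /// False means "definitely not here"; true means "maybe here".
    fn may_contain(&self, key: &str) -> bool {
        self.positions(key).iter().all(|&i| self.bits[i])
    }
}

fn main() {
    let mut b = Bloom::new(1024);
    b.insert("evt-123");
    assert!(b.may_contain("evt-123")); // inserted keys always hit
    // Absent keys usually miss; false positives are possible but rare.
}
```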

Admin

method                     description
engine.delete_event("id")  Tombstone a single event by id. Excluded from subsequent reads.
engine.gc(cutoff_ts_ns)    Drop every segment with max_ts < cutoff. Returns stats.

Errors

All fallible calls return Result<T, DbError>. Notable variants:

DbError::Io(io::Error) · io
Underlying filesystem or mmap failure. Engine state is typically preserved; retry after diagnosing.
DbError::WalCorrupt { offset } · recovery
WAL frame failed its CRC check. The engine truncates at offset and surfaces this once on open.
DbError::SegmentBadMagic { path } · recovery
Segment header is missing the expected magic bytes. The file is moved to corrupt/.
DbError::InternTableFull · write
A single segment accumulated more than 65,535 unique strings in one dim. Rotate the segment or reduce cardinality.
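The InternTableFull condition falls out of the u16 index space. A minimal sketch of per-segment interning — the type names and error string are hypothetical:

```rust
// Sketch of per-segment string interning with u16 indices. At most
// 65,535 distinct strings fit per dim; one more is the
// InternTableFull condition described above.
use std::collections::HashMap;

struct InternTable {
    map: HashMap<String, u16>,
}

impl InternTable {
    fn new() -> Self {
        InternTable { map: HashMap::new() }
    }

    /// Return the existing index for s, or assign the next one.
    fn intern(&mut self, s: &str) -> Result<u16, &'static str> {
        if let Some(&i) = self.map.get(s) {
            return Ok(i); // repeats reuse the same index
        }
        let n = self.map.len();
        if n >= 65_535 {
            return Err("intern table full: rotate the segment");
        }
        let i = n as u16;
        self.map.insert(s.to_string(), i);
        Ok(i)
    }
}

fn main() {
    let mut t = InternTable::new();
    assert_eq!(t.intern("alice"), Ok(0));
    assert_eq!(t.intern("bob"), Ok(1));
    assert_eq!(t.intern("alice"), Ok(0)); // deduplicated
}
```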

Segment format

Every .kseg file is self-describing and read-only. Columns are written in fixed-width blocks so mmap'd slices can be reinterpreted as typed arrays via zerocopy::FromBytes.

kseg — on-disk layout┌────────────────────────────────┐
│  header            256 B       │  magic · version · N · bloom offset
├────────────────────────────────┤
│  ts_ns             i64 × N     │  sorted, binary-searchable
│  metric            i64 × N     │  contiguous for SIMD SUM
│  counters          u32 × N × 5 │
│  latencies         u32 × N × 2 │
│  status · flags    u16 × N     │
│  dim indices       u16/u8 × 5  │  interned string refs
│  ext indices       u16 × N × 4 │
├────────────────────────────────┤
│  bloom filter      128 B       │  primary dim skip
│  intern table      zstd        │  lazy-decompressed
│  variable data     zstd        │  labels · payload
└────────────────────────────────┘
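Fixed-width blocks are what make the mmap read path cheap: a column of N little-endian i64 values decodes straight out of a byte slice at a known offset, with no per-row deserialization. A sketch using explicit from_le_bytes where zerocopy would do a true zero-copy reinterpret:

```rust
// Decode a fixed-width i64 column from a raw segment byte slice.
// (zerocopy::FromBytes reinterprets the slice in place; chunked
// from_le_bytes is shown here for clarity.)
fn read_i64_column(bytes: &[u8], offset: usize, n: usize) -> Vec<i64> {
    bytes[offset..offset + n * 8]
        .chunks_exact(8)
        .map(|c| i64::from_le_bytes(c.try_into().unwrap()))
        .collect()
}

fn main() {
    // A fake segment: 4-byte "header" followed by three i64 values.
    let mut seg = vec![0u8; 4];
    for v in [10i64, 20, 30] {
        seg.extend_from_slice(&v.to_le_bytes());
    }
    assert_eq!(read_i64_column(&seg, 4, 3), vec![10, 20, 30]);
}
```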

SIMD & scan

Hot scan kernels compile to AVX2 when the target supports it and fall back to scalar code otherwise.

sum_i64(col: &[i64]) -> i128 · avx2
Horizontal sum over the metric column. 4 lanes × 256-bit accumulators.
sum_u32_as_u64(col: &[u32]) -> u64 · avx2
Widening sum for counter columns — avoids overflow on long segments.
count_eq_u16(col: &[u16], needle: u16) -> usize · avx2
Vectorised equality count for status and flag columns.
filtered_aggregate(…) · avx2
Combined mask + sum pass: filter by dim index, sum metric in a single linear scan.
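Scalar fallbacks for two of these kernels are straightforward; the AVX2 paths compute the same results lane-parallel. A sketch:

```rust
// Scalar equivalents of the widening counter sum and the status
// equality count. Widening to u64 before summing avoids overflow on
// long segments.
fn sum_u32_as_u64(col: &[u32]) -> u64 {
    col.iter().map(|&v| v as u64).sum()
}

fn count_eq_u16(col: &[u16], needle: u16) -> usize {
    col.iter().filter(|&&v| v == needle).count()
}

fn main() {
    let counters = vec![u32::MAX, 1, 2];
    // This sum would overflow a u32 accumulator.
    assert_eq!(sum_u32_as_u64(&counters), u32::MAX as u64 + 3);

    let statuses = vec![200u16, 500, 200, 404];
    assert_eq!(count_eq_u16(&statuses, 200), 2);
}
```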

Configuration

field              type     default  description
data_dir           PathBuf  —        Directory to hold WAL + segments. Created if missing.
wal_max_events     u32      500_000  Events per segment before rotation.
wal_sync_interval  u32      64       WAL fsync interval, in events. Set to 1 for strict per-append fsync.
bloom_bits         u32      1024     Bits of bloom filter per segment. Higher = fewer false positives.
compress_level     i32      3        zstd level for intern table + variable data.

Crash recovery

Engine::open() scans the data directory in this order:

  1. Load meta.json — if missing, rebuild from segment headers.
  2. Validate each .kseg header magic; move corrupt files aside.
  3. Replay wal.log frame-by-frame, CRC-checked; truncate at first bad frame.
  4. Write replayed events into a new segment; rewrite meta.json.

Recovery is single-threaded and proportional to WAL size. For a default 64-event sync interval, recovery processes tens of thousands of events per second.
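Step 3 — replay until the first bad frame — can be sketched with an illustrative two-byte frame format, a toy checksum standing in for the real CRC:

```rust
// Sketch of WAL replay: consume frames in order, stop at the first
// checksum mismatch, and report the truncation offset. Frame layout
// here is illustrative: [payload byte][checksum = payload + 1].
fn replay(wal: &[u8]) -> (Vec<u8>, usize) {
    let mut events = Vec::new();
    let mut off = 0;
    while off + 2 <= wal.len() {
        let (payload, check) = (wal[off], wal[off + 1]);
        if check != payload.wrapping_add(1) {
            break; // bad frame: truncate here, keep everything before it
        }
        events.push(payload);
        off += 2;
    }
    (events, off)
}

fn main() {
    //          ok      ok      corrupt  never read
    let wal = [10, 11, 20, 21, 30, 99, 40, 41];
    let (events, off) = replay(&wal);
    assert_eq!(events, vec![10, 20]);
    assert_eq!(off, 4); // the WAL would be truncated at byte 4
}
```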

Sizing & limits

limit                           value      why
events / segment                2³¹        u32 row indices throughout the column layout.
unique strings / dim / segment  65_535     u16 intern index. Exceeding triggers early rotation.
payload size                    unbounded  Compressed together; aim for < 4 KB typical.
concurrent writers              1          Single-writer by design. Wrap Engine in an Arc<Mutex> for multi-producer.
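The Arc<Mutex> pattern from the last row looks like this in practice — a Vec stands in for the Engine here:

```rust
// Multi-producer over a single writer: threads share one writer
// behind Arc<Mutex>, so appends are serialized through the lock.
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let engine = Arc::new(Mutex::new(Vec::<u64>::new())); // stand-in Engine
    let handles: Vec<_> = (0..4u64)
        .map(|t| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || {
                for i in 0..100 {
                    engine.lock().unwrap().push(t * 100 + i); // serialized append
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(engine.lock().unwrap().len(), 400); // nothing lost
}
```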