Launching Flatseek 0.10 + Flatlens

Why we built this

Most teams don’t actually need a 24/7 search cluster. They have a few hundred million rows of customer data, log archives, or scraped catalogs that they’d like to search occasionally — and pay roughly nothing the rest of the time.

With Elasticsearch or OpenSearch, that means provisioning a JVM, sizing heap, picking shard counts, and watching your bill grow whether anyone’s querying or not. We thought there should be a much simpler shape for this problem: files on disk, indexes next to them, query when you want.

That’s what Flatseek is. 0.10 is the release where it grows up — proper trigram indexing, fast wildcard search, range queries on numbers and dates, aggregations on top of the same index, and now a real UI in front of it.

TL;DR. Flatseek 0.10 is a disk-first search engine you install with one command and run as a single Python process. Flatlens is the dashboard that ships with it — upload, search, aggregate, map, and manage indexes from the browser.

What’s new in Flatseek 0.10

Trigram + wildcard search

Substring queries like name:*joh* resolve through trigram blocks instead of scanning the corpus.

Typed columns

TEXT, KEYWORD, DATE, INT, FLOAT, BOOL, ARRAY, OBJECT — chosen at index time, not guessed at query time.

Range queries

Numeric and date ranges (age:[25 TO 35]) with the same syntax across all numeric types.

Aggregations

Terms, stats, cardinality, date-histogram — computed against the on-disk index without spinning up reducers.

Encrypted indexes

Per-index passphrases. Index files stay encrypted at rest; queries unlock with the key in-process only.

~40% smaller on disk

Block-compressed postings + dictionary-coded keywords. Same recall, less storage.

Indexing is incremental. You can append documents to an existing index without reindexing, and reads aren’t blocked while a write batch is still landing. Uploads run in the background — the dashboard polls and shows progress, you keep working.

What 1 million rows looks like

Numbers from a real run: 1,000,000 Solana transactions (128.4 MB CSV) on a single laptop, 8 worker processes splitting the file by byte range. Build wall-time was 4 minutes 42 seconds, with a final on-disk index at 341 MB across 7 typed columns.

Documents indexed in one run

4:42

Wall-clock build time, 8 workers

341MB

Total index size on disk

~3.5k/s

Sustained docs/sec throughput

flatseek — ~/projects/locations — zsh

$flatseek build flatdata/data/solana_txs.csv -o ./solana_txs -w 8

── Pre-classify ──────────────────────────────────────

Classify solana_txs.csv:

'signature' → KEYWORD (100%) ✓

'slot' → FLOAT (100%) ✓

'timestamp' → DATE (100%) ✓

'fee' → FLOAT (100%) ✓

'status' → KEYWORD (100%) ✓

'signer' → KEYWORD (100%) ✓

'programs' → KEYWORD (100%) ✓

Counting rows in 1 file(s)...

solana_txs.csv: ~1,000,000 rows (128.4 MB)

Single file — splitting into 8 byte-range chunks (O(1) seek)...

── Launching 8/8 workers ─────────────────────────────

W	rows	state	encode	disk	mem	ckpt_r
✓ 0	126,754	flush	21.2s	2.7s	43MB	99%
✓ 1	124,757	flush	21.1s	3.0s	43MB	99%
✓ 2	124,742	flush	20.8s	2.8s	44MB	99%
✓ 3	124,757	flush	21.0s	3.5s	44MB	99%
✓ 4	124,759	flush	21.4s	3.1s	44MB	99%
✓ 5	124,750	flush	21.3s	3.3s	44MB	99%
✓ 6	124,733	flush	20.8s	3.0s	44MB	99%
✓ 7	124,748	flush	21.9s	3.2s	46MB	99%

100.0% 1,000,000 / 1,000,000 · 4m42s elapsed

── Merging stats ─────────────────────────────────────

Merged stats: 1,000,000 docs, 18,432,000 entries, 341.2 MB

All 8 workers completed.

$# Wildcard prefix + numeric range, scoped to a single program

$flatseek search ./solana_txs "program:raydium AND signer:*7xMg* AND amount:>1000000" -n 3

Query: program:raydium AND signer:*7xMg* AND amount:>1000000

Found: 4,821 match(es) in 38 ms

DOC {'_id': 8821, 'signature': '5xJ3k9...LPv7m', 'program': 'raydium', 'signer': '7xMg3K...xpLR2', 'amount': '2500000', 'fee': '5000', 'status': 'success'}

DOC {'_id': 8822, 'signature': '3Ab2kP...QmR8w', 'program': 'raydium', 'signer': '7xMg3K...xpLR2', 'amount': '1750000', 'fee': '5000', 'status': 'success'}

$# Boolean OR + nested object dot-path

$flatseek search ./solana_txs '(program:jupiter OR program:orca) AND status:success' -n 1

Found: 241,338 match(es) in 22 ms

$# Top programs by transaction count — terms aggregation

$curl -s 'http://localhost:8000/solana_txs/_agg?field=program&type=terms&size=5'

{

"aggregations": {

"program": { "buckets": [

{ "key": "jupiter", "doc_count": 412,309 },

{ "key": "raydium", "doc_count": 218,455 },

{ "key": "orca", "doc_count": 114,901 },

{ "key": "phoenix", "doc_count": 86,212 },

{ "key": "meteora", "doc_count": 52,664 }

] }

"took_ms": 61

}

$# Stats over fee — min, max, avg, sum, count in one shot

$curl -s 'http://localhost:8000/solana_txs/_agg?field=fee&type=stats'

{ "fee": { "min": 0, "max": 125000, "avg": 5184.3, "sum": 5,184,300,000, "count": 1,000,000 } }

$# Date histogram by month

$curl -s 'http://localhost:8000/solana_txs/_agg?field=timestamp&type=date_histogram&interval=month'

[ { "key": "2026-01", "doc_count": 214,008 },

{ "key": "2026-02", "doc_count": 335,217 },

{ "key": "2026-03", "doc_count": 450,775 } ]

All three panes are real output, not mockups. The numeric column types mean range queries don’t scan documents — they walk a sorted block index. Aggregations on KEYWORD and ARRAY fields hit a value dictionary instead of materializing rows.

Introducing Flatlens

Flatlens is the dashboard. If you’ve used Kibana, the shape is familiar: a left sidebar for navigation, a topbar to pick the index and write your query, and tabs for the things you actually want to do — search, aggregate, plot on a map, or manage indexes.

Flatlens search tab — table of results with date histogram above — **Search.** Query bar, date histogram, paginated table, expandable JSON rows.

Search syntax, in full

The query language is field-scoped and familiar. Everything below comes straight from the engine’s test suite — these aren’t aspirational examples, they’re the assertions that have to pass before a release goes out.

field:value

Exact match on a KEYWORD, free-text on TEXT. city:Jakarta, status:active, name:Alice.

field:*term*

Wildcard — prefix (name:Alice*), suffix (name:*Putri), or substring (city:*arta*). Powered by trigram blocks, not a corpus scan.

field:[A TO B]

Inclusive range on numbers and dates. balance:[1000000 TO 3000000], created_at:[20260101 TO 20260131].

field:>N / >=N

Open-ended numeric comparators. balance:>1000000, score:>=88, level:<=10. Bare form (balance>1000000) also works.

field:true / false

Booleans normalised across true/false, yes/no, 1/0. enabled:true, enabled:false.

tags:<value>

Array fields are searched element-wise. tags:graphql matches any document where any tag equals graphql.

obj.path:value

Object fields auto-expand. address.city:Jakarta, address.zip:10110, and arbitrarily deep — info.metadata.a.submetadata.deep:secret_a.

obj.arr[i]:val

Index into a nested array directly. info.metadata.a.tags[0]:alpha, info.metadata.b.items[1]:item_y.

AND / OR / NOT

Boolean combinators with parens. city:Jakarta AND (status:active OR status:pending), * AND NOT enabled:true.

Match-all. Pairs naturally with NOT for exclusions: * AND NOT city:Jakarta.

Numeric and date queries don’t scan documents — they read a sorted block index. Free-text and wildcard queries route through the trigram store, so they cost the same shape as a literal lookup. Combining boolean operators just intersects or unions doc-id postings, which is fast even when both sides are large.

What the dashboard adds on top

The query language is the same in the CLI, the REST API, and the dashboard. What Flatlens layers on it:

Per-cell filter chips. Hover any cell, click the magnifier, the value becomes a filter — no syntax to memorise.
Filter builder popup. Compose the query visually — pick field, operator, value, and watch the resulting query string update live.
Date distribution chart. Auto-detected on date fields, switchable hourly/daily, with brushable date range filtering.
Expandable rows. Click any row to see the raw document with JSON syntax highlighting and a one-click copy button.
Hideable columns + saved selections. One-click hide; columns show up as restorable chips. Selection persists per index.
URL-shareable state. Query, filters, page, aggregation settings — all live in the URL. Send a teammate the link, they see the same view.

Aggregations

Aggregations run on the same on-disk index as search. The supported types cover the 90% case for analytics-style queries:

terms

Top values for a field, with doc counts. Works on KEYWORD and ARRAY; size cap configurable. Useful for facets and breakdowns.

stats

Min, max, avg, sum, count in one pass over a numeric field. {"stats": {"field": "score"}}.

avg / min / max / sum

Single-statistic shortcuts when you don’t need the full stats bundle. Faster on hot paths.

cardinality

Distinct-count of values. HyperLogLog under the hood — fixed memory, ~1% relative error at default precision.

date_histogram

Bucket by interval (hour, day, week, month). Auto-trims empty leading/trailing buckets so charts don’t show zero-padded edges.

Aggregations always accept a query filter — meaning you can do “average fee per program for transactions over 1M lamports in February” in one request, no two-stage pipeline. The dashboard wires search filters and aggregations together automatically: any tag chip you add on the Search tab carries over when you switch to Aggregations.

Flatlens aggregations tab showing a term aggregation as a donut chart — **Aggregations.** Terms, stats, cardinality, date histogram — same index, no reducer process.

Upload flow that doesn’t fight you

Most search tools assume the data is already there. Flatlens treats ingestion as a real surface. Drop a CSV, JSON, JSONL, or XLSX file (or paste a URL) and you get a four-step wizard with sensible defaults at each step:

Flatlens upload step 1 — drag and drop — **1. Pick a file** — drag-drop, browse, or paste a URL.

Flatlens upload step 2 — preview and column mapping — **2. Preview & map** — first rows + auto-detected types per column.

Flatlens upload step 3 — index configuration — **3. Configure** — index name, optional encryption, batch size.

Flatlens upload step 4 — progress with logs — **4. Index in background** — live progress and tailable logs.

The mapping step matters most. Flatlens auto-detects column types from samples — emails, phones, dates, numbers, booleans, and now arrays and objects too. You can rename fields on the way in (insertAs), exclude noisy columns, edit headers as a JSON array, and re-use mapping history from previous uploads. Nothing forces you to commit before you see what it looks like.

About arrays and objects. 0.10 stores them natively. Aggregations on an ARRAY field treat each element as a term. OBJECT fields expand to dot-paths in the table view (address.city, address.zip), so you query the leaves directly without flattening upstream.

Map view for geo data

Pick a latitude field, a longitude field, and a sample size. Flatlens renders the matching documents on a Leaflet map with auto-fit bounds. Combined lat,lng string fields work too — handy when your source already encodes coordinates as one column.

Flatlens map tab plotting points across Indonesia — **Map.** Plot result sets directly from the search query.

Indexes management

The Indexes tab is the cluster page if you’d had a cluster. It lists every index with document count, size on disk, column count, and current status (idle, indexing, encrypted). Per-index actions cover what you’d expect: rename, encrypt or decrypt, view the mapping, tail the indexing log, and delete.

Flatlens indexes tab listing all indexes with size and status — **Indexes.** Cluster page for a thing that isn’t a cluster.

Quick start

One command installs the engine, the dashboard, and the CLI. macOS, Linux, Python 3.11+:

$ curl -fsSL flatseek.io/install.sh | sh
$ flatseek serve            # API on :8000
$ flatseek dash             # Flatlens on :8080

Or, if you prefer Python packaging:

$ pip install flatseek
$ flatseek index create my-data --from data.csv
$ flatseek query my-data 'name:*joh* AND city:jakarta'

The same index works from the dashboard, the REST API, and the CLI — no separate connectors. If you want to see it without installing anything, the live demo at flatlens.demo.flatseek.io is loaded with a few public datasets.

What’s next

0.11 is already in motion. The shortlist:

Saved searches and dashboards — pin a query, share a dashboard with a URL.
Vector search — small-footprint ANN for the long tail of semantic queries, alongside trigrams.
Streaming ingest — append-only writes via HTTP and Kafka source.
Multi-tenancy primitives — per-API-key index visibility and rate limits.

We pick the next milestones based on what people actually run. If you’re using Flatseek (or thinking about it), the issues tracker is the right place to push us in a direction.

Try it in five minutes.

One command. No cluster, no JVM, no waiting.

Get started