Launch v0.10 — Flatseek + Flatlens

A search engine without the infrastructure.

Today we’re shipping Flatseek 0.10 together with Flatlens — a Kibana-style dashboard that runs on top of it. Disk-first, no JVM, no always-on cluster. Drop in a CSV and you’re searching billions of rows on a laptop.

Why we built this

Most teams don’t actually need a 24/7 search cluster. They have a few hundred million rows of customer data, log archives, or scraped catalogs that they’d like to search occasionally — and pay roughly nothing the rest of the time.

With Elasticsearch or OpenSearch, that means provisioning a JVM, sizing heap, picking shard counts, and watching your bill grow whether anyone’s querying or not. We thought there should be a much simpler shape for this problem: files on disk, indexes next to them, query when you want.

That’s what Flatseek is. 0.10 is the release where it grows up — proper trigram indexing, fast wildcard search, range queries on numbers and dates, aggregations on top of the same index, and now a real UI in front of it.

TL;DR. Flatseek 0.10 is a disk-first search engine you install with one command and run as a single Python process. Flatlens is the dashboard that ships with it — upload, search, aggregate, map, and manage indexes from the browser.

What’s new in Flatseek 0.10

Trigram + wildcard search

Substring queries like name:*joh* resolve through trigram blocks instead of scanning the corpus.

Typed columns

TEXT, KEYWORD, DATE, INT, FLOAT, BOOL, ARRAY, OBJECT — chosen at index time, not guessed at query time.

Range queries

Numeric and date ranges (age:[25 TO 35]) with the same syntax across all numeric types.

Aggregations

Terms, stats, cardinality, date-histogram — computed against the on-disk index without spinning up reducers.

Encrypted indexes

Per-index passphrases. Index files stay encrypted at rest; queries unlock with the key in-process only.

~40% smaller on disk

Block-compressed postings + dictionary-coded keywords. Same recall, less storage.

Indexing is incremental. You can append documents to an existing index without reindexing, and reads aren’t blocked while a write batch is still landing. Uploads run in the background — the dashboard polls and shows progress, you keep working.

What 1 million rows looks like

Numbers from a real run: 1,000,000 Solana transactions (128.4 MB CSV) on a single laptop, 8 worker processes splitting the file by byte range. Build wall-time was 4 minutes 42 seconds, with a final on-disk index at 341 MB across 7 typed columns.

1M
Documents indexed in one run
4:42
Wall-clock build time, 8 workers
341MB
Total index size on disk
~3.5k/s
Sustained docs/sec throughput
flatseek  —  ~/projects/locations  —  zsh
$flatseek build flatdata/data/solana_txs.csv -o ./solana_txs -w 8
── Pre-classify ──────────────────────────────────────
Classify solana_txs.csv:
'signature'KEYWORD (100%)
'slot'FLOAT (100%)
'timestamp'DATE (100%)
'fee'FLOAT (100%)
'status'KEYWORD (100%)
'signer'KEYWORD (100%)
'programs'KEYWORD (100%)
Counting rows in 1 file(s)...
solana_txs.csv: ~1,000,000 rows (128.4 MB)
Single file — splitting into 8 byte-range chunks (O(1) seek)...
── Launching 8/8 workers ─────────────────────────────
Wrowsstateencodediskmemckpt_r
✓ 0126,754flush21.2s2.7s43MB99%
✓ 1124,757flush21.1s3.0s43MB99%
✓ 2124,742flush20.8s2.8s44MB99%
✓ 3124,757flush21.0s3.5s44MB99%
✓ 4124,759flush21.4s3.1s44MB99%
✓ 5124,750flush21.3s3.3s44MB99%
✓ 6124,733flush20.8s3.0s44MB99%
✓ 7124,748flush21.9s3.2s46MB99%
100.0% 1,000,000 / 1,000,000 · 4m42s elapsed
── Merging stats ─────────────────────────────────────
Merged stats: 1,000,000 docs, 18,432,000 entries, 341.2 MB
All 8 workers completed.
$# Wildcard prefix + numeric range, scoped to a single program
$flatseek search ./solana_txs "program:raydium AND signer:*7xMg* AND amount:>1000000" -n 3
Query: program:raydium AND signer:*7xMg* AND amount:>1000000
Found: 4,821 match(es) in 38 ms
DOC {'_id': 8821, 'signature': '5xJ3k9...LPv7m', 'program': 'raydium', 'signer': '7xMg3K...xpLR2', 'amount': '2500000', 'fee': '5000', 'status': 'success'}
DOC {'_id': 8822, 'signature': '3Ab2kP...QmR8w', 'program': 'raydium', 'signer': '7xMg3K...xpLR2', 'amount': '1750000', 'fee': '5000', 'status': 'success'}
$# Boolean OR + nested object dot-path
$flatseek search ./solana_txs '(program:jupiter OR program:orca) AND status:success' -n 1
Found: 241,338 match(es) in 22 ms
$# Top programs by transaction count — terms aggregation
$curl -s 'http://localhost:8000/solana_txs/_agg?field=program&type=terms&size=5'
{
"aggregations": {
"program": { "buckets": [
{ "key": "jupiter", "doc_count": 412,309 },
{ "key": "raydium", "doc_count": 218,455 },
{ "key": "orca", "doc_count": 114,901 },
{ "key": "phoenix", "doc_count": 86,212 },
{ "key": "meteora", "doc_count": 52,664 }
] }
},
"took_ms": 61
}
$# Stats over fee — min, max, avg, sum, count in one shot
$curl -s 'http://localhost:8000/solana_txs/_agg?field=fee&type=stats'
{ "fee": { "min": 0, "max": 125000, "avg": 5184.3, "sum": 5,184,300,000, "count": 1,000,000 } }
$# Date histogram by month
$curl -s 'http://localhost:8000/solana_txs/_agg?field=timestamp&type=date_histogram&interval=month'
[ { "key": "2026-01", "doc_count": 214,008 },
{ "key": "2026-02", "doc_count": 335,217 },
{ "key": "2026-03", "doc_count": 450,775 } ]

All three panes are real output, not mockups. The numeric column types mean range queries don’t scan documents — they walk a sorted block index. Aggregations on KEYWORD and ARRAY fields hit a value dictionary instead of materializing rows.

Introducing Flatlens

Flatlens is the dashboard. If you’ve used Kibana, the shape is familiar: a left sidebar for navigation, a topbar to pick the index and write your query, and tabs for the things you actually want to do — search, aggregate, plot on a map, or manage indexes.

Flatlens search tab — table of results with date histogram above
Search. Query bar, date histogram, paginated table, expandable JSON rows.

Search syntax, in full

The query language is field-scoped and familiar. Everything below comes straight from the engine’s test suite — these aren’t aspirational examples, they’re the assertions that have to pass before a release goes out.

field:value
Exact match on a KEYWORD, free-text on TEXT. city:Jakarta, status:active, name:Alice.
field:*term*
Wildcard — prefix (name:Alice*), suffix (name:*Putri), or substring (city:*arta*). Powered by trigram blocks, not a corpus scan.
field:[A TO B]
Inclusive range on numbers and dates. balance:[1000000 TO 3000000], created_at:[20260101 TO 20260131].
field:>N / >=N
Open-ended numeric comparators. balance:>1000000, score:>=88, level:<=10. Bare form (balance>1000000) also works.
field:true / false
Booleans normalised across true/false, yes/no, 1/0. enabled:true, enabled:false.
tags:<value>
Array fields are searched element-wise. tags:graphql matches any document where any tag equals graphql.
obj.path:value
Object fields auto-expand. address.city:Jakarta, address.zip:10110, and arbitrarily deep — info.metadata.a.submetadata.deep:secret_a.
obj.arr[i]:val
Index into a nested array directly. info.metadata.a.tags[0]:alpha, info.metadata.b.items[1]:item_y.
AND / OR / NOT
Boolean combinators with parens. city:Jakarta AND (status:active OR status:pending), * AND NOT enabled:true.
*
Match-all. Pairs naturally with NOT for exclusions: * AND NOT city:Jakarta.

Numeric and date queries don’t scan documents — they read a sorted block index. Free-text and wildcard queries route through the trigram store, so they cost the same shape as a literal lookup. Combining boolean operators just intersects or unions doc-id postings, which is fast even when both sides are large.

What the dashboard adds on top

The query language is the same in the CLI, the REST API, and the dashboard. What Flatlens layers on it:

Aggregations

Aggregations run on the same on-disk index as search. The supported types cover the 90% case for analytics-style queries:

terms
Top values for a field, with doc counts. Works on KEYWORD and ARRAY; size cap configurable. Useful for facets and breakdowns.
stats
Min, max, avg, sum, count in one pass over a numeric field. {"stats": {"field": "score"}}.
avg / min / max / sum
Single-statistic shortcuts when you don’t need the full stats bundle. Faster on hot paths.
cardinality
Distinct-count of values. HyperLogLog under the hood — fixed memory, ~1% relative error at default precision.
date_histogram
Bucket by interval (hour, day, week, month). Auto-trims empty leading/trailing buckets so charts don’t show zero-padded edges.

Aggregations always accept a query filter — meaning you can do “average fee per program for transactions over 1M lamports in February” in one request, no two-stage pipeline. The dashboard wires search filters and aggregations together automatically: any tag chip you add on the Search tab carries over when you switch to Aggregations.

Flatlens aggregations tab showing a term aggregation as a donut chart
Aggregations. Terms, stats, cardinality, date histogram — same index, no reducer process.

Upload flow that doesn’t fight you

Most search tools assume the data is already there. Flatlens treats ingestion as a real surface. Drop a CSV, JSON, JSONL, or XLSX file (or paste a URL) and you get a four-step wizard with sensible defaults at each step:

Flatlens upload step 1 — drag and drop
1. Pick a file — drag-drop, browse, or paste a URL.
Flatlens upload step 2 — preview and column mapping
2. Preview & map — first rows + auto-detected types per column.
Flatlens upload step 3 — index configuration
3. Configure — index name, optional encryption, batch size.
Flatlens upload step 4 — progress with logs
4. Index in background — live progress and tailable logs.

The mapping step matters most. Flatlens auto-detects column types from samples — emails, phones, dates, numbers, booleans, and now arrays and objects too. You can rename fields on the way in (insertAs), exclude noisy columns, edit headers as a JSON array, and re-use mapping history from previous uploads. Nothing forces you to commit before you see what it looks like.

About arrays and objects. 0.10 stores them natively. Aggregations on an ARRAY field treat each element as a term. OBJECT fields expand to dot-paths in the table view (address.city, address.zip), so you query the leaves directly without flattening upstream.

Map view for geo data

Pick a latitude field, a longitude field, and a sample size. Flatlens renders the matching documents on a Leaflet map with auto-fit bounds. Combined lat,lng string fields work too — handy when your source already encodes coordinates as one column.

Flatlens map tab plotting points across Indonesia
Map. Plot result sets directly from the search query.

Indexes management

The Indexes tab is the cluster page if you’d had a cluster. It lists every index with document count, size on disk, column count, and current status (idle, indexing, encrypted). Per-index actions cover what you’d expect: rename, encrypt or decrypt, view the mapping, tail the indexing log, and delete.

Flatlens indexes tab listing all indexes with size and status
Indexes. Cluster page for a thing that isn’t a cluster.

Quick start

One command installs the engine, the dashboard, and the CLI. macOS, Linux, Python 3.11+:

$ curl -fsSL flatseek.io/install.sh | sh
$ flatseek serve            # API on :8000
$ flatseek dash             # Flatlens on :8080

Or, if you prefer Python packaging:

$ pip install flatseek
$ flatseek index create my-data --from data.csv
$ flatseek query my-data 'name:*joh* AND city:jakarta'

The same index works from the dashboard, the REST API, and the CLI — no separate connectors. If you want to see it without installing anything, the live demo at flatlens.demo.flatseek.io is loaded with a few public datasets.

What’s next

0.11 is already in motion. The shortlist:

We pick the next milestones based on what people actually run. If you’re using Flatseek (or thinking about it), the issues tracker is the right place to push us in a direction.

Try it in five minutes.

One command. No cluster, no JVM, no waiting.

Get started