flatseek — ~/projects/locations — zsh
$ flatseek build flatdata/data/locations.csv -o ./locations -w 8
── Pre-classify ──────────────────────────────────────
Classify locations.csv :
'id' → KEYWORD (80%) ✓
'name' → TEXT (100%) ✓
'country' → KEYWORD (100%) ✓
'lat' → FLOAT (100%) ✓
'lng' → FLOAT (100%) ✓
'category' → KEYWORD (100%) ✓
'population' → FLOAT (80%) ✓
Counting rows in 1 file(s)...
locations.csv : ~1,000,000 rows (52.9 MB)
Single file — splitting into 8 byte-range chunks (O(1) seek) ...
Plan written: ./locations/build_plan.json
── Launching 8/8 workers ─────────────────────────────
Worker 0 pid 52067 log → worker_0.log
Worker 1 pid 52068 log → worker_1.log
Worker 2 pid 52069 log → worker_2.log
Worker 3-7 …
  W  rows     state  ckpt  encode  disk  mem   ckpt_r
✓ 0  126,754  flush  2     21.2s   2.7s  43MB  99%
✓ 1  124,757  flush  2     21.1s   3.0s  43MB  99%
✓ 2  124,742  flush  2     20.8s   2.8s  44MB  99%
✓ 3  124,757  flush  2     21.0s   3.5s  44MB  99%
✓ 4  124,759  flush  2     21.4s   3.1s  44MB  99%
✓ 5  124,750  flush  2     21.3s   3.3s  44MB  99%
✓ 6  124,733  flush  2     20.8s   3.0s  44MB  99%
✓ 7  124,748  flush  2     21.9s   3.2s  46MB  99%
100.0%  1,000,000 / 1,000,000 · 4m42s elapsed
── Workers done (4m42s) ─────────────────────────────
Worker 0 [done] 126,754 new rows (total: 126,754 docs)
Worker 1 [done] 124,757 new rows (total: 424,757 docs)
Worker 2 [done] 124,742 new rows (total: 724,742 docs)
Worker 3 [done] 124,757 new rows (total: 1,024,757 docs)
Worker 4-7 …
── Merging stats ─────────────────────────────────────
Merged stats: 1,000,000 docs, 22,636,800 entries, 485.7 MB
All 8 workers completed.
$ # Single-keyword search across 1M docs
$ flatseek search ./locations "jakarta" -n 3
Query: jakarta
Found: 23,714 match(es) (page 1, showing 3)
--- 1 ---
DOC
{'_id': 55, 'id': '56', 'name': 'Jakarta', 'country': 'ID', 'lat': '-6.217089', 'lng': '106.833582', 'category': 'tourist', 'population': '121262'}
--- 2 ---
DOC
{'_id': 70, 'id': '71', 'name': 'Jakarta', 'country': 'ID', 'lat': '-6.225444', 'lng': '106.808682', 'category': 'regional', 'population': '94715'}
--- 3 ---
DOC
{'_id': 95, 'id': '96', 'name': 'Jakarta', 'country': 'ID', 'lat': '-6.228413', 'lng': '106.816618', 'category': 'tourist', 'population': '686130'}
$ # Field filter + boolean AND
$ flatseek search ./locations "country:ID AND category:tourist" -n 2
Query: country:ID AND category:tourist
Found: 52,515 match(es) (page 1, showing 2)
DOC
{'_id': 9, 'id': '10', 'name': 'Bogor', 'country': 'ID', 'lat': '-6.679548', 'lng': '106.800171', 'category': 'tourist', 'population': '251629'}
DOC
{'_id': 13, 'id': '14', 'name': 'Tangerang', 'country': 'ID', 'lat': '-6.22116', 'lng': '106.643498', 'category': 'tourist', 'population': '814544'}
$ # Wildcard prefix + numeric range
$ flatseek search ./locations 'name:bogor* AND population:[100000 TO 500000]' -n 2
Query: name:bogor* AND population:[100000 TO 500000]
Found: 572 match(es) (page 1, showing 2)
DOC
{'_id': 1459, 'id': '1460', 'name': 'Bogor', 'country': 'ID', 'category': 'major', 'population': '251629'}
DOC
{'_id': 2532, 'id': '2533', 'name': 'Bogor', 'country': 'ID', 'category': 'industrial', 'population': '187340'}
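The combined query above can be read as a per-document predicate. The sketch below is an in-memory illustration only, not how flatseek evaluates queries against its index. It assumes Lucene-style semantics: a case-insensitive wildcard on `name` (the lowercase pattern does match `Bogor` in the output above) and a range that includes both endpoints. The `matches` helper and the third row are hypothetical.

```python
from fnmatch import fnmatchcase

# Two of the hits printed above, plus one hypothetical row that should be rejected.
docs = [
    {"_id": 1459, "name": "Bogor", "population": "251629"},
    {"_id": 2532, "name": "Bogor", "population": "187340"},
    {"_id": -1, "name": "Bogor", "population": "814544"},  # hypothetical, out of range
]

def matches(doc, name_pattern, lo, hi):
    """Assumed semantics: case-insensitive wildcard AND inclusive numeric range."""
    name_ok = fnmatchcase(doc["name"].lower(), name_pattern.lower())
    pop_ok = lo <= float(doc["population"]) <= hi
    return name_ok and pop_ok

hits = [d["_id"] for d in docs if matches(d, "bogor*", 100_000, 500_000)]
print(hits)
```

Stored values come back as strings in the transcript, which is why the sketch converts `population` before comparing.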
$ flatseek stats ./locations
Docs: 1,000,000
Index: 465.0 MB (524288 files)
Doc store: 20.8 MB
Total: 485.7 MB
Columns:
category → KEYWORD
country → KEYWORD
id → KEYWORD
lat → FLOAT
lng → FLOAT
name → TEXT
population → FLOAT
$ # Plan a build without executing
$ flatseek plan ./flatdata/data/locations.csv -o ./locations
Counting rows in 1 file(s)...
locations.csv : ~1,000,000 rows (52.9 MB)
Single file — splitting into 4 byte-range chunks (O(1) seek) ...
Plan written: ./locations/build_plan.json
Worker 0: bytes [45…13,878,994 ] (13 MB) ~250,000 rows
Worker 1: bytes [13,878,994…27,757,919 ] (13 MB) ~250,000 rows
Worker 2: bytes [27,757,919…41,636,876 ] (13 MB) ~250,000 rows
Worker 3: bytes [41,636,876…55,515,777 ] (13 MB) ~250,000 rows
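The plan above assigns each worker a contiguous byte range, and Worker 0 starts at byte 45, apparently just past the header row. One common way to produce such ranges is to seek to evenly spaced offsets and advance each one to the next newline, so that no row is split between workers. The sketch below illustrates that idea under those assumptions; `plan_chunks` is a hypothetical helper, not flatseek's planner.

```python
import os
import tempfile

def plan_chunks(path, workers):
    """Split a file into `workers` byte ranges that begin and end on line boundaries."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.readline()                      # skip the header row
        start = f.tell()
        step = (size - start) // workers
        bounds = [start]
        for i in range(1, workers):
            f.seek(start + i * step)      # jump straight to the approximate cut point
            f.readline()                  # then move forward to the next line start
            bounds.append(f.tell())
        bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))

# Small demo file: one header line and 100 data rows.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("id,name\n")
    for i in range(100):
        tmp.write(f"{i},city{i}\n")
    path = tmp.name

chunks = plan_chunks(path, 4)
with open(path, "rb") as f:
    data = f.read()
rows_per_chunk = [data[a:b].count(b"\n") for a, b in chunks]
print(chunks, rows_per_chunk)
os.remove(path)
```

Each boundary costs one seek and one short read regardless of file size, which is consistent with the "O(1) seek" note in the plan output.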
$ flatseek serve -d ./locations -p 8000
2026-04-22 09:58:01 [INFO] flatseek.api.main: Flatlens dashboard mounted at /dashboard
INFO: Will watch for changes in these directories: ['./locations']
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Starting Flatseek API + Dashboard on 0.0.0.0:8000
Data directory: ./locations
→ API: http://localhost:8000
→ Dashboard: http://localhost:8000/dashboard
Opening dashboard in browser...
Press Ctrl+C to stop
2026-04-22 09:58:02 [INFO] Started server process [52729]
INFO: Application startup complete.
127.0.0.1:50021 - "GET /dashboard HTTP/1.1" 302 Found
127.0.0.1:50021 - "GET /dashboard/ HTTP/1.1" 200 OK
127.0.0.1:50029 - "GET /locations/_search?q=jakarta HTTP/1.1" 200 OK
$ # Run aggregations via REST — no doc loaded into RAM
$ curl -X POST http://localhost:8000/locations/_aggregate \
    -H "Content-Type: application/json" -d '{
      "query": "*",
      "aggs": {
        "by_country":  { "terms": { "field": "country",  "size": 5 } },
        "by_category": { "terms": { "field": "category", "size": 5 } },
        "pop_stats":   { "stats": { "field": "population" } }
      }
    }'
{
  "took": 42,
  "hits": { "total": 1000000 },
  "aggregations": {
    "by_country": {
      "buckets": [
        { "key": "ID", "doc_count": 211284 },
        { "key": "NG", "doc_count": 198557 },
        { "key": "BR", "doc_count": 187902 },
        { "key": "IN", "doc_count": 176443 },
        { "key": "US", "doc_count": 112985 }
      ]
    },
    "by_category": {
      "buckets": [
        { "key": "tourist", "doc_count": 252108 },
        { "key": "industrial", "doc_count": 249772 },
        { "key": "regional", "doc_count": 249654 },
        { "key": "major", "doc_count": 248466 }
      ]
    },
    "pop_stats": {
      "count": 1000000,
      "min": 512,
      "max": 9998247,
      "avg": 2503641.7,
      "sum": 2503641723
    }
  }
}
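A response of this shape is straightforward to post-process on the client. The sketch below works on a trimmed copy of the JSON above and turns the `by_category` buckets into shares of `hits.total`. The `bucket_shares` helper is hypothetical, and the code relies only on the field names visible in the response.

```python
import json

# Trimmed copy of the aggregation response shown above.
response = json.loads("""
{
  "hits": {"total": 1000000},
  "aggregations": {
    "by_category": {
      "buckets": [
        {"key": "tourist", "doc_count": 252108},
        {"key": "industrial", "doc_count": 249772},
        {"key": "regional", "doc_count": 249654},
        {"key": "major", "doc_count": 248466}
      ]
    }
  }
}
""")

def bucket_shares(resp, agg_name):
    """Map each bucket key of a terms aggregation to its fraction of all hits."""
    total = resp["hits"]["total"]
    buckets = resp["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] / total for b in buckets}

shares = bucket_shares(response, "by_category")
for key, share in shares.items():
    print(f"{key:<12}{share:6.1%}")
```

The four category counts add up to exactly 1,000,000, so the shares here sum to 1. That would not hold for `by_country`, where `size: 5` returns only the top five buckets.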
$ python
Python 3.11.6 (main, Oct 2 2023, 13:45:54) on darwin
>>> from flatseek import Flatseek
>>> qe = Flatseek("./locations")  # direct mode — no server needed
>>> r = qe.search(q="name:jakarta AND country:ID", size=2)
>>> print(f"total = {r.total:,}")
total = 23,714
>>> for doc in r.docs:
...     print(doc["name"], doc["category"], doc["population"])
Jakarta tourist 121262
Jakarta regional 94715
>>> agg = qe.aggregate(q="*", aggs={
...     "by_category": {"terms": {"field": "category", "size": 5}}
... })
>>> [(b["key"], b["doc_count"]) for b in agg.aggs["by_category"]["buckets"]]
[('tourist', 252108), ('industrial', 249772),
 ('regional', 249654), ('major', 248466)]