Documentation
Flatseek is a disk-first, serverless search platform. Index structured data (CSV, JSON, JSONL) using a trigram inverted index and query via REST API or web dashboard — without managing a search cluster.
Installation
Requirements
- Python 3.10+
- pip or pip3
- git (for installer script)
Install via pip
pip install flatseek
Install via installer script
curl -fsSL flatseek.io/install.sh | sh
This installs both the flatseek Python package and the flatlens dashboard to ~/.local/share/flatlens.
Install from source
git clone https://github.com/flatseek/flatseek.git
cd flatseek
pip install -e .
Verify installation
flatseek --version
Quick Start
1. Build an index from CSV
flatseek build ./data/people.csv -o ./data
This creates a search index in ./data. Flatseek auto-detects column types (name, email, phone, date, etc.).
2. Start the API server + dashboard
cd ./data
flatseek serve
Starts on http://localhost:8000 by default.
- API: http://localhost:8000
- Dashboard: http://localhost:8000/dashboard
- API Docs: http://localhost:8000/docs
3. Open the dashboard
Navigate to http://localhost:8000/dashboard. Select your index from the dropdown, enter a query, and explore results.
4. Search from CLI
flatseek search ./data "name:*john*"
flatseek search ./data "city:jakarta AND gender:L"
flatseek stats ./data
5. Search from Python
from flatseek import Flatseek
client = Flatseek("http://localhost:8000")
result = client.search(index="people", q="*john*", size=20)
print(result.total)
for doc in result.docs:
print(doc)
Architecture
Flatseek consists of four components:
- Flatseek Core — Disk-first trigram indexing engine. Reads from memory-mapped files without loading into heap.
- Flatseek API — FastAPI-based REST layer with Elasticsearch-compatible endpoints.
- Flatlens Dashboard — Web UI for uploading files, running queries, visualizing aggregations, and map views.
- Python Client — Dual-mode client supporting both API mode (HTTP) and direct mode (local files).
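The client's dual-mode behavior can be pictured as a simple dispatch on the target string: URLs go through HTTP, paths go to local files. This is an illustrative sketch of the idea, not Flatseek's actual implementation:

```python
def detect_mode(target: str) -> str:
    """Illustrative: how a dual-mode client might pick a backend.

    HTTP(S) URLs use API mode; anything else is treated as a local
    index directory (direct mode).
    """
    if target.startswith(("http://", "https://")):
        return "api"
    return "direct"
```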
Index Structure
When you build an index, Flatseek creates this directory structure:
./data/
├── index/ # Trigram posting lists (binary, memory-mapped)
│ └── *.bin
├── docs/ # Document store (compressed JSON)
│ └── *.zlib
├── column_map.json # Column type mappings
├── manifest.json # Index metadata
└── stats.json # Index statistics
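Since manifest.json and stats.json are plain JSON, you can inspect them directly from a script. A small helper (the field names in any given index are not specified here, so check your own files for the real schema):

```python
import json
from pathlib import Path

def load_index_metadata(data_dir: str) -> dict:
    """Read manifest.json and stats.json from a Flatseek index directory.

    Returns both parsed documents; their internal schema is whatever
    your Flatseek version wrote.
    """
    root = Path(data_dir)
    return {
        "manifest": json.loads((root / "manifest.json").read_text()),
        "stats": json.loads((root / "stats.json").read_text()),
    }
```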
CLI: build
Build a trigram index from CSV, JSON, JSONL, or directory of files.
Synopsis
flatseek build <csv_dir> [options]
Options
| Option | Description |
|---|---|
| -o, --output | Output directory (default: ./data) |
| -m, --map | Path to column_map.json |
| -s, --sep | CSV delimiter (default: comma). Use '#' for hash-separated files. |
| --columns | Comma-separated column names for headerless files |
| -w, --workers | Number of parallel workers (default: 1). Use >1 for multi-core builds. |
| --dataset | Dataset label (e.g. 'people', 'accounts') |
| --dedup | Skip duplicate rows during indexing |
| --dedup-fields | Dedup on specific columns only (e.g. --dedup-fields phone,nik) |
| --daemon | Memory mode: never write prefix buffers to disk at checkpoint. Faster but uses more RAM. |
| --estimate | Sample 5,000 rows before indexing to show speed/ETA estimate |
Examples
# Basic build
flatseek build ./people.csv -o ./data
# Multi-file directory
flatseek build ./csv_folder/ -o ./data
# Hash-delimited file
flatseek build ./data.csv -o ./data -s '#'
# Headerless CSV
flatseek build ./data.csv -o ./data --columns "id,name,email,phone"
# Parallel build (4 workers)
flatseek build ./large.csv -o ./data -w 4
# With ETA estimate
flatseek build ./data.csv -o ./data --estimate
# Daemon mode (uses more RAM, faster indexing)
flatseek build ./data.csv -o ./data --daemon
Use -w N to spawn N parallel workers. Each worker processes a portion of the data. This works for both single files (byte-range split) and multi-file directories.
Column Type Detection
Flatseek auto-detects semantic column types based on header names and sample values:
- name — Person names (with phonetic normalization)
- email — Email addresses
- phone — Phone numbers (normalized to digits only)
- birthday — Birth dates
- gender — Gender codes (L/G/P normalized to F/M)
- city — City names
- province — Province/region names
- address — Full addresses
- date — Date/datetime fields
- number — Numeric values
- status — Status enums
- id_number — ID numbers (NIK, KTP, etc.)
- string — Plain text (trigram-indexed)
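A column_map.json passed via -m/--map might look like the following. The exact schema is an assumption here; run flatseek classify to generate the authoritative file for your data:

```json
{
  "id": "id_number",
  "full_name": "name",
  "email": "email",
  "phone": "phone",
  "dob": "birthday",
  "city": "city",
  "notes": "string"
}
```

Keys are column names; values are the semantic types listed above.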
CLI: serve
Start the API server and Flatlens dashboard, serving data from the current directory.
Synopsis
flatseek serve [options]
Options
| Option | Description |
|---|---|
| -d, --data | Data directory to serve (default: current directory) |
| -p, --port | Port number (default: 8000) |
| --host | Host to bind to (default: 0.0.0.0) |
| --no-reload | Disable auto-reload on code changes |
Examples
# Start on default port 8000 (serves current directory)
flatseek serve
# Serve a specific data directory
flatseek serve -d ./data
# Start on custom port
flatseek serve -p 9000 -d ./my-index
# Bind to localhost only
flatseek serve --host 127.0.0.1
Also available: api
Start API server only (no dashboard):
flatseek api -d ./data
CLI: search
Search the index from the command line.
Synopsis
flatseek search <data_dir> [query] [options]
Options
| Option | Description |
|---|---|
| -c, --column | Restrict search to a specific column |
| -p, --page | Page number, 0-based (default: 0) |
| -n, --page-size | Results per page (default: 20) |
| --and | AND condition (repeatable), e.g. --and city:jakarta |
| --passphrase | Decryption passphrase for encrypted indexes |
Examples
flatseek search ./data "*john*"
flatseek search ./data "name:john"
flatseek search ./data "city:jakarta AND gender:L"
flatseek search ./data "*john*" --column name
flatseek search ./data --and city:jakarta --and gender:L
flatseek search ./data "*john*" -p 0 -n 50
CLI: classify
Detect and set column semantic types without building an index.
Synopsis
flatseek classify <csv_dir> [options]
Options
| Option | Description |
|---|---|
| -o, --output | Output path for column_map.json |
| -s, --sep | CSV delimiter |
| --columns | Comma-separated column names |
Examples
flatseek classify ./data.csv -o ./column_map.json
flatseek classify ./csv_folder/
CLI: stats
Show index statistics.
Synopsis
flatseek stats <data_dir>
Output includes
- Total documents indexed
- Number of columns
- Column names and types
- Index size on disk
- Indexing date
CLI: compress
Compress index files in-place with zlib. Run it after building, or once you have finished adding data.
Synopsis
flatseek compress <data_dir> [options]
Options
| Option | Description |
|---|---|
| -l, --level | Compression level 1-9 (default: 6). Higher = smaller but slower. |
| -w, --workers | Parallel workers (default: min(8, cpu_count)) |
Examples
flatseek compress ./data
flatseek compress ./data -l 9
flatseek compress ./data -w 4
CLI: encrypt / decrypt
Encrypt or decrypt index files in-place with ChaCha20-Poly1305.
Synopsis
flatseek encrypt <data_dir> [options]
flatseek decrypt <data_dir> [options]
Options
| Option | Description |
|---|---|
| --passphrase | Encryption/decryption passphrase. If omitted, prompted interactively. |
Examples
flatseek encrypt ./data --passphrase "mysecretpass"
flatseek decrypt ./data --passphrase "mysecretpass"
Encryption metadata is stored in encryption.json. The passphrase is derived via PBKDF2-HMAC-SHA256 (600,000 iterations), and all .bin and .zlib files are encrypted with ChaCha20-Poly1305. There is no recovery without the passphrase.
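The key-derivation step described above can be reproduced with the standard library. The hash and iteration count match the description; salt handling and how the derived key is actually used are illustrative only, not Flatseek's exact scheme:

```python
import hashlib

def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 256-bit key via PBKDF2-HMAC-SHA256 with 600,000
    iterations, as the encryption docs describe.

    The salt here is caller-supplied; where Flatseek stores its salt
    (presumably in encryption.json) is not specified.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, 600_000, dklen=32
    )
```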
CLI: dedup
Remove duplicate documents from an existing index. Works on both single-worker and parallel builds.
Synopsis
flatseek dedup <data_dir> [options]
Options
| Option | Description |
|---|---|
| --fields | Comma-separated columns to use for fingerprinting (default: all non-meta fields) |
| --dry-run | Report duplicates without making changes |
| -w, --workers | Parallel workers for rewrite phase |
Examples
# Dedup on all fields
flatseek dedup ./data
# Dedup on specific fields
flatseek dedup ./data --fields phone,nik
# Dry run (report only)
flatseek dedup ./data --dry-run
Dedup progress is tracked in dedup_checkpoint.json.
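Conceptually, field-based dedup fingerprints each document on the chosen columns and treats matching fingerprints as duplicates. A sketch of one plausible approach (Flatseek's actual fingerprinting may differ):

```python
import hashlib
import json

def fingerprint(doc: dict, fields=None) -> str:
    """Fingerprint a document on selected fields (or all fields).

    Serializes the chosen fields with sorted keys so field order in the
    source document does not change the hash. Illustrative only.
    """
    keys = fields if fields is not None else doc.keys()
    subset = {k: doc[k] for k in keys if k in doc}
    blob = json.dumps(subset, sort_keys=True, ensure_ascii=False)
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()
```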
CLI: delete
Delete an index directory quickly using parallel rm -rf on subdirectories.
Synopsis
flatseek delete <data_dir> [options]
Options
| Option | Description |
|---|---|
| -y, --yes | Skip confirmation prompt |
| -w, --workers | Parallel workers (default: min(16, cpu_count)) |
Examples
flatseek delete ./data
flatseek delete ./data --yes
flatseek delete ./data -w 8
CLI: join
Perform a cross-dataset join on a shared field.
Synopsis
flatseek join <data_dir> <query_a> <query_b> --on <field>
Examples
flatseek join ./data "_dataset:people" "_dataset:accounts" --on phone
flatseek join ./data "_dataset:orders" "_dataset:customers" --on customer_id
Joins rely on the _dataset field to distinguish between datasets. When building an index, use --dataset to label documents (e.g. flatseek build ./people.csv --dataset people).
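Conceptually, the operation is a hash join on the shared field across the two result sets. A minimal sketch of that idea, not Flatseek's internal implementation:

```python
def join_on(docs_a, docs_b, on):
    """Naive hash join: pair each doc in docs_a with every doc in
    docs_b that has the same value in the `on` field."""
    by_key = {}
    for doc in docs_b:
        by_key.setdefault(doc.get(on), []).append(doc)
    for doc in docs_a:
        for match in by_key.get(doc.get(on), []):
            yield doc, match
```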
CLI: chat
Interactive natural language query interface powered by Ollama.
Synopsis
flatseek chat <data_dir> [options]
Options
| Option | Description |
|---|---|
| --model | LLM model name (default: qwen2.5-coder) |
| --api-base | Ollama API base URL (default: http://localhost:11434/v1) |
Examples
flatseek chat ./data
flatseek chat ./data --model llama3
Dashboard: Upload Data
Click + Upload in the top-right corner of the Dashboard to open the upload wizard.
Supported file formats
- CSV — comma, semicolon, tab, pipe, colon, or hash delimiters (auto-detected)
- JSON — array of objects: [{"name":"Alice","age":30}, ...]
- JSONL — one object per line (newline-delimited)
- XLS / XLSX — Excel files with sheet selection
Step 1: Select File
Drag and drop files onto the dropzone, or click Browse Files. Multiple files can be selected.
Step 2: Preview & Mapping
After selecting a file, Flatseek parses the first few rows and shows a preview.
Column Mapping Options
- Exclude — skip this column during indexing
- Insert As — rename the column at insertion time
- Type — click the type badge to set the semantic type
Format Options
- Separator — auto-detected, but can be manually overridden
- First row is header — checkbox (default: on)
- Edit Headers — modify column names as JSON array
Step 3: Configure Index
- Upload to — "Create New Index" or "Existing Index"
- Index Name — lowercase, letters/numbers/underscores only
- Encrypt Index — enable password protection
- ID Field — specify a column as document ID (auto-generated if empty)
- Batch Size — documents sent per batch (default: 5000)
Step 4: Upload Progress
Live progress bar shows documents processed, throughput (docs/sec), and ETA.
Step 5: Complete
Summary shows total documents indexed and index size on disk.
Dashboard: Search
Select an index from the dropdown, enter a query in the search box, and press Search.
Query Syntax
*john* → contains "john"
name:john → name field = "john"
city:jakarta AND gender:L → both conditions
"john doe" → exact phrase
Filter Builder
Click Filter to open the advanced filter popup. Build filters by:
- Selecting a field
- Choosing an operator (contains, equals, starts with, ends with, greater than, less than, range)
- Entering a value
Click + Add Filter to add multiple conditions. Click Apply to add them as filter tags.
Results Table
- Pagination — navigate through results pages
- Column Visibility — click column header tags to show/hide columns
- Remove Duplicates — checkbox + column selector to deduplicate results
- Date Distribution Chart — shown when date columns are detected (toggle with arrow button)
- Click row — expand to see full document JSON
- Copy JSON — copy button in expanded document view
Dashboard: Aggregations
Go to the Aggregations tab to build aggregate summaries.
Aggregation Types
| Type | Description | Output |
|---|---|---|
| terms | Group by unique values | Top N terms with counts |
| stats | Statistical summary | Count, min, max, sum, avg |
| date_histogram | Time-series grouping | Counts per time bucket |
| avg | Average value | Single numeric average |
| min | Minimum value | Single numeric minimum |
| max | Maximum value | Single numeric maximum |
| sum | Sum of values | Single numeric total |
| cardinality | Unique value count | Single approximate count |
Chart Types
Toggle between Bar, Line, Donut, and Pie charts. Use Table view for raw data.
Common Use Cases
| Goal | Field | Type | Size |
|---|---|---|---|
| Top 10 cities | city | terms | 10 |
| Daily signups (30 days) | created_at | date_histogram | 30 |
| Average order value | order_amount | avg | — |
| Unique customers | customer_id | cardinality | — |
| Age statistics | age | stats | — |
Dashboard: Map View
Go to the Map tab to plot geo-tagged documents on an interactive Leaflet map.
Requirements
- Two numeric fields: latitude (-90 to 90) and longitude (-180 to 180)
- Up to 50,000 documents per map view
Combined lat,lng Field
If your data has a single field like "-6.2088, 106.8456" or "-6.2088 106.8456", use the combined lat,lng field selector instead of selecting latitude and longitude separately.
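If you need to split such combined values yourself (e.g. before upload), a parser consistent with the two formats above might look like this. Flatseek's internal parsing rules are not specified, so treat this as a sketch:

```python
import re

def parse_latlng(value: str):
    """Split a combined "lat, lng" or "lat lng" string into two floats,
    validating the ranges the map view requires (-90..90, -180..180)."""
    parts = re.split(r"[,\s]+", value.strip())
    if len(parts) != 2:
        raise ValueError(f"expected two coordinates, got: {value!r}")
    lat, lng = float(parts[0]), float(parts[1])
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lng}")
    return lat, lng
```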
Map Controls
- Marker clustering — nearby points cluster automatically
- Click cluster — zoom in
- Click marker — see document details in popup
- Size input — control max documents plotted (1–50,000)
Dashboard: Index Management
Click the cluster status button (top-right) to view all indices and their stats.
Index Actions
- View stats — document count, index size, columns, type mappings
- Rename — click the rename button, enter new name
- Encrypt — click encrypt button, enter passphrase twice
- Decrypt — click decrypt button, enter passphrase
- Delete — click delete button, confirm deletion
Encrypted Indices
When accessing an encrypted index, a password modal appears. Enter the passphrase to authenticate. The session stores the password in memory for subsequent queries on that index.
Upload Progress
While uploading, the index card shows upload progress with live stats (documents processed, files done, ETA).
Query Syntax Reference
Basic Operators
| Pattern | Description | Example |
|---|---|---|
| word | Match documents containing this word | john → all docs with "john" |
| *ord | Wildcard at start | *ohn → "john", "mohn" |
| wo*d | Wildcard in middle | j*doe → "jackson doe" |
| "exact phrase" | Exact phrase match | "john doe" |
| word1 AND word2 | Both words required | john AND jakarta |
| word1 OR word2 | Either word matches | john OR jane |
| NOT word | Exclude word | NOT john |
| ( ) | Grouping | (john OR jane) AND jakarta |
Field Prefixes
| Pattern | Description | Example |
|---|---|---|
| name:value | Field contains value | city:jakarta |
| name:*value* | Field wildcard | email:*@gmail.com |
| field:[min TO max] | Range query (date/number fields) | birthday:[1990-01-01 TO 1999-12-31] |
Special Characters
These characters have special meaning and must be escaped with \ to search literally:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
Example: to search for john+doe, use john\+doe
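A small helper that applies this escaping, assuming each special character can be escaped individually (the two-character operators && and || are then covered by escaping each & and |):

```python
# Characters the query parser treats specially, per the reference above.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')

def escape_query(text: str) -> str:
    """Backslash-escape special characters so they match literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)
```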
Query Examples
Basic Search
*john* # All documents containing "john"
"john doe" # Exact phrase match
john doe # Both words must appear (implicit AND)
Field-Specific
name:john # name field = john
city:jakarta # city field = jakarta
email:*@gmail.com # emails from gmail
phone:*1234* # phone containing "1234"
Boolean Logic
john AND jakarta # Both must appear
john OR jane # Either appears
john AND NOT jakarta # john but not jakarta
city:jakarta AND gender:L # Both conditions
Wildcards
*john* # Contains john anywhere
j* # Starts with j
*doe # Ends with doe
na*e # na followed by e (na + anything + e)
Grouping
(john OR jane) AND jakarta # Either john or jane, in jakarta
name:(john OR doe) # john or doe in name field
Date Ranges (date fields only)
birthday:[1990-01-01 TO 1999-12-31] # Born in the 1990s
created_at:[2024-01-01 TO 2024-12-31] # Created in 2024
Column Types
Column types determine how values are indexed and queried.
| Type | Indexing | Query Style | Example |
|---|---|---|---|
| TEXT | Trigrams + tokenization | Wildcard, contains | Free text, descriptions |
| KEYWORD | Exact value | Equals, terms aggregation | Tags, status, category |
| DATE | ISO date (YYYYMMDD) | Range queries, date_histogram | created_at, birthday |
| FLOAT | Numeric value | Range, stats aggregation | price, latitude |
| INT | Integer value | Range, stats aggregation | age, quantity |
| BOOL | Boolean | Equals | is_active, is_verified |
| ARRAY | JSON array | Contains | tags, interests |
| OBJECT | JSON object | Dot-path access | address.city, profile.name |
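For example, the YYYYMMDD form used for DATE values can be derived from an ISO date string like so (illustrative; the on-disk representation may differ):

```python
from datetime import date

def to_yyyymmdd(iso_date: str) -> int:
    """Convert an ISO date string (YYYY-MM-DD) to a YYYYMMDD integer,
    the compact form the DATE type is described as using."""
    d = date.fromisoformat(iso_date)
    return d.year * 10000 + d.month * 100 + d.day
```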
Semantic Types
Flatseek also auto-detects semantic types (name, email, phone, etc.) which apply additional normalization:
- name — Double Metaphone phonetic matching
- email — Domain extraction for domain queries
- phone — Digits-only normalization (strips +, -, spaces)
- gender — L/G/P normalized to F/M
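The phone normalization described above amounts to stripping every non-digit character:

```python
import re

def normalize_phone(raw: str) -> str:
    """Digits-only normalization: strips +, -, spaces, and anything
    else that is not a digit, as the phone type is described."""
    return re.sub(r"\D", "", raw)
```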
API Overview
The Flatseek API is a REST interface with Elasticsearch-compatible endpoints. It runs on port 8000 by default when using flatseek serve.
Base URL
http://localhost:8000
API Documentation
Interactive API docs (Swagger UI) available at:
http://localhost:8000/docs
Authentication
For encrypted indices, include the passphrase in the X-Index-Password header:
curl -H "X-Index-Password: mypass" \
http://localhost:8000/my_index/_search?q=*john*
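The same request can be built in Python with the standard library. Only the X-Index-Password header and the _search endpoint come from the docs; the helper itself is a sketch:

```python
import urllib.parse
import urllib.request

def search_encrypted_request(base_url: str, index: str,
                             query: str, passphrase: str):
    """Build (but do not send) a GET /{index}/_search request carrying
    the passphrase in the X-Index-Password header."""
    url = f"{base_url}/{index}/_search?q={urllib.parse.quote(query)}"
    return urllib.request.Request(
        url, headers={"X-Index-Password": passphrase}
    )
```

Pass the returned object to urllib.request.urlopen() to execute it against a running server.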
API: Search
Search Documents
GET /{index}/_search?q={query}&size={size}&from={offset}
or
POST /{index}/_search
{"query": "*john*", "size": 20, "from": 0}
Count Documents
GET /{index}/_count?q={query}
Aggregations
POST /{index}/_aggregate
{
"query": "*john*",
"aggs": {
"by_city": {"terms": {"field": "city", "size": 10}},
"age_stats": {"stats": {"field": "age"}}
}
}
Validate Query
POST /{index}/_validate
{"query": "name:john AND city:jakarta"}
Get Index Statistics
GET /{index}/_stats
Get Index Mapping
GET /{index}/_mapping
API: Upload
Bulk Index Documents
POST /{index}/_bulk
[{"name": "John", "city": "Jakarta"}, {"name": "Jane", "city": "Bandung"}]
Supports JSON array of documents. Returns summary of indexed count and errors.
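When uploading large datasets, it helps to send documents in batches (the dashboard defaults to 5,000 per batch). A sketch of batching plus request construction; the _bulk endpoint shape is from the docs, the batching helper is an assumption:

```python
import json
import urllib.request

def bulk_batches(docs, batch_size=5000):
    """Yield slices of `docs` sized for one POST /{index}/_bulk call."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

def bulk_request(base_url, index, batch):
    """Build (but do not send) a bulk-index request for one batch."""
    return urllib.request.Request(
        f"{base_url}/{index}/_bulk",
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```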
Upload Progress
GET /{index}/_upload_progress
Returns current progress of an in-progress upload.
Flush Index
POST /{index}/_flush
Force-write all pending buffers to disk.
API: Index Management
List Indices
GET /_indices
Create Index
PUT /{index_name}
{
"description": "My index",
"encrypt": true
}
Delete Index
DELETE /{index_name}
Rename Index
POST /{old_name}/_rename
{"new_name": "new_index_name"}
Create/Update Mapping
PUT /{index_name}/_mapping
{
"columns": [
{"name": "id", "type": "INT"},
{"name": "name", "type": "TEXT"},
{"name": "email", "type": "KEYWORD"}
]
}
API: Encryption
Check if Index is Encrypted
GET /{index}/_is_encrypted
Authenticate (Decrypt)
POST /{index}/_authenticate
{"passphrase": "mysecretpass"}
Logout (Clear Session)
DELETE /{index}/_authenticate
Encrypt Index
POST /{index}/_encrypt
{"passphrase": "mysecretpass"}
Returns a job ID. Poll progress with:
GET /{index}/_encrypt_progress?job_id={job_id}
Decrypt Index
POST /{index}/_decrypt
{"passphrase": "mysecretpass"}
Python Package: Client
Installation
pip install flatseek
API Mode (HTTP client)
from flatseek import Flatseek
client = Flatseek("http://localhost:8000")
# Search
result = client.search(index="people", q="*john*", size=20)
print(f"Found: {result.total}")
for doc in result.docs:
print(doc["name"], doc["city"])
# Count
count = client.count(index="people", q="city:jakarta")
print(f"Jakarta residents: {count.count}")
# Aggregations
result = client.aggregate(
index="people",
q="*",
body={
"aggs": {
"by_city": {"terms": {"field": "city", "size": 10}}
}
}
)
# Index a document
client.index(index="people", body={"name": "John", "city": "Jakarta"})
# Bulk index
client.bulk_insert(index="people", docs=[
{"name": "Alice", "city": "Bandung"},
{"name": "Bob", "city": "Surabaya"},
])
# Cluster health
health = client.cluster()
print(health)
Response Objects
- Response — search results with .hits, .total, .docs
- CountResponse — count result with .count
- AggsResponse — aggregation results with .total, .aggs
Python Package: Direct Mode
Direct mode accesses the index files directly without needing the API server. Faster for local scripts.
from flatseek import Flatseek
# Open local index directory
qe = Flatseek("./data")
qe = Flatseek("./data", index="people") # named sub-index
# Search
result = qe.search(q="name:*john*", size=10)
# Count
count = qe.count(q="city:jakarta")
# Aggregations
result = qe.aggregate(q="name:*john*", aggs={
"by_city": {"terms": {"field": "city", "size": 10}},
"birth_stats": {"stats": {"field": "birthday"}}
})
print(result.aggs["by_city"]["buckets"])
Python Package: Aggregations
Terms Aggregation
result = qe.aggregate(q="*", aggs={
"top_cities": {"terms": {"field": "city", "size": 10}}
})
buckets = result.aggs["top_cities"]["buckets"]
for b in buckets:
print(f"{b['key']}: {b['doc_count']}")
Stats Aggregation
result = qe.aggregate(q="*", aggs={
"age_stats": {"stats": {"field": "age"}}
})
stats = result.aggs["age_stats"]
print(f"Count: {stats['count']}, Avg: {stats['avg']}, Min: {stats['min']}, Max: {stats['max']}")
Date Histogram
result = qe.aggregate(q="*", aggs={
"daily_signups": {"date_histogram": {"field": "created_at", "interval": "day"}}
})
Multiple Aggregations
result = qe.aggregate(q="*", aggs={
"by_city": {"terms": {"field": "city", "size": 10}},
"by_gender": {"terms": {"field": "gender"}},
"avg_age": {"avg": {"field": "age"}},
"age_range": {"stats": {"field": "age"}}
})