Datasets

A dataset is the atom of analysis in OneAnalytics. Every report, every AI question, every alert reads from a dataset — never from a raw connection.

Three modes

Import — We copy rows into our columnar cache at refresh time. Fast queries, arbitrary SQL dialects, but data is as fresh as your last refresh.
Direct — We push queries straight to the upstream database. Always fresh, but latency and concurrency depend on the source.
Composite — Some tables imported, some direct. Useful when a fact table is huge (direct) but dimension tables are small (import).

Mode is picked at dataset creation and can be changed later. The semantic model travels with the dataset regardless of mode.

Lifecycle

created → syncing → ready ↔ refreshing → ready
                       ↓
                     error → (retry or disable)

row_count, size_bytes, and last_refresh_at are updated on every successful refresh.
Failed refreshes trigger an email + Slack alert to workspace admins (configurable in Settings → Alerts).
A dataset can be paused — queries continue against the last good snapshot; no refreshes fire.

Querying

Datasets are never queried with raw SQL from the web UI. You send a structured QueryRequest:

{
  "dimensions": [{"id": "order_date", "grain": "month"}],
  "measures": ["revenue"],
  "filters": [{"field": "region", "op": "in", "value": ["North", "South"]}],
  "order_by": [{"field": "order_date", "dir": "asc"}],
  "limit": 1000
}

The compiler binds this to the semantic model, emits dialect-specific SQL (PostgreSQL/MySQL/Snowflake/BigQuery/DuckDB), and returns rows plus a sql_preview for transparency.

Versioning

Every schema change (a new column, a changed type, a dropped table) creates a new dataset version. Reports pin to a version; opening an old report against a new version warns you if a referenced column is gone.

Quotas

Starter plans get 2 GB total across all datasets; Growth gets 50 GB; Scale is unlimited. Row counts over 100 M default to Direct mode.