Connect BigQuery
BigQuery is a first-class source. Authentication uses a service account JSON key, and IAM follows the least-privilege principle: roles/bigquery.dataViewer on the dataset plus roles/bigquery.jobUser on the project.
1. Create the service account
export PROJECT=my-gcp-project
export DATASET=my_bq_dataset
gcloud iam service-accounts create oa-reader \
  --display-name="OneAnalytics reader" --project="$PROJECT"
SA="oa-reader@${PROJECT}.iam.gserviceaccount.com"
# Dataset-scoped read (READER is the legacy dataset role equivalent to roles/bigquery.dataViewer)
bq --project_id="$PROJECT" show --format=prettyjson "$DATASET" > /tmp/acl.json
# … edit /tmp/acl.json: append {"role": "READER", "userByEmail": "<value of $SA>"} to the "access" array …
bq update --source=/tmp/acl.json "$PROJECT:$DATASET"
# Project-level job runner
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member="serviceAccount:$SA" --role="roles/bigquery.jobUser"
# Key
gcloud iam service-accounts keys create /tmp/oa-reader.json \
  --iam-account="$SA"
Upload /tmp/oa-reader.json in the OneAnalytics connection dialog. The key is stored encrypted at rest.
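The manual edit of /tmp/acl.json in step 1 could also be scripted. The following is a minimal sketch, assuming jq is installed; the function name add_reader is hypothetical and is not part of bq or OneAnalytics.

```shell
# Hypothetical helper: append a READER entry for a service account to the
# "access" array of a dataset description written by
# `bq show --format=prettyjson`. Assumes jq is installed.
add_reader() {
  local sa_email="$1"
  local acl_file="$2"
  # Create the "access" array if it is missing, then append the new entry.
  # The full, updated JSON document is printed to stdout.
  jq --arg sa "$sa_email" \
    '.access = ((.access // []) + [{"role": "READER", "userByEmail": $sa}])' \
    "$acl_file"
}

# Usage (commented out so that sourcing this sketch has no side effects):
# add_reader "$SA" /tmp/acl.json > /tmp/acl.new.json
# bq update --source=/tmp/acl.new.json "$PROJECT:$DATASET"
```

The sketch does not check whether the entry already exists, so running it twice would append a duplicate; inspect /tmp/acl.new.json before applying it.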
2. Add the connection
Sources → Add → BigQuery:
- Project ID: my-gcp-project
- Dataset (optional): my_bq_dataset (scopes the browser)
- Service account key: upload oa-reader.json
- Location: US, EU, asia-south1, … (matches your dataset's region)
Click Test. We list the tables the service account has read access to.
3. Pick a mode
- Direct: the typical choice. BigQuery charges by bytes scanned, so our compiler aggressively pushes down filters and projections. The sql_preview shows exactly what will be scanned.
- Import: for very small tables, or to cap BigQuery cost. Data streams in via the Storage Read API.
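To cross-check the SQL shown in sql_preview before running it in Direct mode, a BigQuery dry run reports how many bytes a query would process without executing it. This is a sketch under assumptions: the helper name estimate_scan and the LOCATION variable are hypothetical, PROJECT is the variable exported in step 1, and bq must already be authenticated.

```shell
# Hypothetical helper: ask BigQuery how many bytes a query would process.
# --dry_run validates the SQL and reports the estimate without running
# the query.
estimate_scan() {
  local sql="$1"
  bq --project_id="$PROJECT" --location="${LOCATION:-US}" \
    query --dry_run --use_legacy_sql=false "$sql"
}

# Usage (commented out; the table name "orders" is a hypothetical example):
# estimate_scan "SELECT order_id FROM \`$PROJECT.$DATASET.orders\` WHERE order_date = '2024-01-01'"
```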
Cost controls
- Per-query maximum bytes billed: set in Dataset Settings. Queries over the limit fail fast instead of running.
- Partitioned-table filter: we refuse to issue a query against a partitioned table that does not filter on the partition column, unless you tick the override. This prevents accidental full-table scans.
- Slot reservation: if you use flex slots, configure the reservation ID in Dataset Settings; we send it with every job.
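BigQuery offers comparable guards natively, which can be useful for ad hoc queries run outside OneAnalytics. The sketch below illustrates those bq flags only; it does not describe how OneAnalytics implements the settings above. The helper name capped_query and the table name orders are hypothetical.

```shell
# Hypothetical helper: run a query with a hard cap on billed bytes.
# If the query would bill more than the cap, BigQuery rejects it
# instead of running it.
capped_query() {
  local max_bytes="$1"
  local sql="$2"
  bq --project_id="$PROJECT" query \
    --use_legacy_sql=false \
    --maximum_bytes_billed="$max_bytes" \
    "$sql"
}

# Usage (commented out): cap a query at 1 GiB (1073741824 bytes).
# capped_query 1073741824 "SELECT COUNT(*) FROM \`$PROJECT.$DATASET.orders\`"

# Table-level guard (commented out): make BigQuery itself reject queries
# on a partitioned table that do not filter on the partition column.
# bq update --require_partition_filter "$PROJECT:$DATASET.orders"
```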
Labels
Every job we submit carries labels app=oneanalytics, workspace_id=<uuid>, dataset_id=<uuid>. Use them in INFORMATION_SCHEMA.JOBS_BY_PROJECT for cost attribution.
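A cost-attribution query over those labels might look like the sketch below. It assumes the jobs ran in the US multi-region (hence the region-us qualifier), uses a 30-day window chosen purely for illustration, and must be run by a principal allowed to list all jobs in the project, which the oa-reader account from step 1 may not be. The helper names are hypothetical.

```shell
# Hypothetical helper: print the attribution SQL. The quoted heredoc
# delimiter keeps backticks and quotes in the SQL literal.
label_cost_sql() {
  cat <<'SQL'
SELECT
  (SELECT value FROM UNNEST(labels) WHERE key = 'workspace_id') AS workspace_id,
  (SELECT value FROM UNNEST(labels) WHERE key = 'dataset_id') AS dataset_id,
  COUNT(*) AS jobs,
  SUM(total_bytes_billed) AS bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND EXISTS (
    SELECT 1 FROM UNNEST(labels) WHERE key = 'app' AND value = 'oneanalytics'
  )
GROUP BY workspace_id, dataset_id
ORDER BY bytes_billed DESC
SQL
}

# Hypothetical helper: run the report against the project from step 1.
label_cost_report() {
  bq --project_id="$PROJECT" --location=US \
    query --use_legacy_sql=false "$(label_cost_sql)"
}

# Usage (commented out):
# label_cost_report
```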
Nested / repeated fields
STRUCT and ARRAY types are fully supported. In the semantic model, reference nested fields with dotted paths (order.shipping.city), and UNNEST for arrays happens transparently when you add them as dimensions.
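For orientation, the sketch below shows the kind of GoogleSQL that a dotted path and an array dimension correspond to. The table orders, its STRUCT column shipping, and its ARRAY column items are hypothetical, and the SQL that OneAnalytics actually generates may differ. The helper only performs a dry run, so nothing is scanned.

```shell
# Hypothetical helper: print example SQL that reads a nested STRUCT field
# with a dotted path and flattens an ARRAY column with UNNEST.
nested_example_sql() {
  cat <<'SQL'
SELECT
  o.shipping.city AS shipping_city,
  item.sku AS item_sku,
  COUNT(*) AS line_items
FROM `my-gcp-project.my_bq_dataset.orders` AS o
CROSS JOIN UNNEST(o.items) AS item
GROUP BY shipping_city, item_sku
SQL
}

# Hypothetical helper: validate the example with a dry run.
check_nested_example() {
  bq --project_id="$PROJECT" query --dry_run --use_legacy_sql=false \
    "$(nested_example_sql)"
}

# Usage (commented out):
# check_nested_example
```

Note that CROSS JOIN UNNEST produces one row per array element, so rows whose array is empty drop out of the result.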