Dataset refresh fails

A failed refresh doesn't mean your data is gone — queries keep running against the last good snapshot. But until you fix the cause, that snapshot gets staler. Here's how to triage.

Read the error

In Dataset → Refresh history, the last row shows the status and a short error. Click it for the full message, a timing breakdown, and a stack trace if the failure originated in our code.

Common error families:

  • connection/* — can't reach the source
  • auth/* — credentials no longer work
  • schema/* — upstream schema changed incompatibly
  • capacity/* — timed out or ran out of memory
  • gateway/* — gateway agent offline or misconfigured
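If you process refresh errors in your own tooling, the family is the part of the code before the slash. The sketch below shows one way to split it out; the function name and the lookup table are hypothetical, and the descriptions are copied from the list above.

```python
# Family descriptions copied from the list above.
# The names ERROR_FAMILIES and error_family are illustrative, not a product API.
ERROR_FAMILIES = {
    "connection": "can't reach the source",
    "auth": "credentials no longer work",
    "schema": "upstream schema changed incompatibly",
    "capacity": "timed out or ran out of memory",
    "gateway": "gateway agent offline or misconfigured",
}


def error_family(code: str) -> str:
    """Return the family prefix of a code such as 'connection/timeout'."""
    family, sep, _detail = code.partition("/")
    if not sep or family not in ERROR_FAMILIES:
        raise ValueError(f"unrecognised error code: {code!r}")
    return family


print(error_family("capacity/statement-timeout"))  # prints: capacity
```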

connection/timeout or connection/refused

The source DB is down, the network blocks us, or the host name is wrong.

  • Test from our egress with the network-probe tool (Dataset → Test connection). It sends a TCP SYN to the host and port from our egress IPs (10.20.0.129, 10.20.0.130).
  • If it times out, the source firewall is most likely blocking us. Allowlist both IPs or use the gateway agent.
  • If it connects but auth fails, see auth/invalid-credentials below.
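To rule out a wrong host name or a database that is simply down, you can run a similar TCP check from your own machine. This is a minimal sketch, not the in-product probe: it runs from wherever you execute it rather than from our egress IPs, so an "open" result here does not prove the firewall admits us. The function name and return values are hypothetical.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and report 'open', 'timeout', or 'refused'.

    Any other OS-level failure (for example a DNS lookup error) is
    returned as 'error: <message>'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout: often a firewall silently dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


# Example call (host and port are placeholders for your own source):
# probe("db.example.internal", 5432)
```

A "refused" result usually points at the database or port rather than the network, while "timeout" from your own machine suggests the problem is not specific to our egress IPs.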

auth/invalid-credentials

Password rotated, certificate expired, or the service account was disabled.

  • Dataset → Settings → Edit connection → update credentials → Test.
  • For Snowflake, confirm the key pair is rotated in both places (Snowflake user + OneAnalytics connection).
  • For BigQuery, service-account keys can be disabled without deletion; check IAM → Service Accounts in GCP.

schema/column-not-found or schema/type-changed

A column that the semantic model references has been dropped or retyped upstream.

  • Dataset → Model → the validator highlights the broken reference.
  • Either restore the upstream column or update the model (drop the dimension/measure that depended on it) and save. The next refresh should pass.

We never auto-modify your semantic model: we prioritize stability over convenience.
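If you want to catch these problems before a refresh runs, you can compare the columns your model references against the upstream schema yourself. The sketch below is an illustration of that comparison, not the validator's actual logic; the function name, the dictionary shapes, and the type strings are assumptions.

```python
def find_broken_references(
    model_refs: dict[str, str], upstream: dict[str, str]
) -> dict[str, str]:
    """Map each broken column to the error code a refresh would likely raise.

    Both arguments map column name -> type name (hypothetical shape).
    """
    problems = {}
    for column, expected_type in model_refs.items():
        if column not in upstream:
            problems[column] = "schema/column-not-found"
        elif upstream[column] != expected_type:
            problems[column] = "schema/type-changed"
    return problems


model = {"order_id": "integer", "amount": "numeric", "region": "text"}
source = {"order_id": "integer", "amount": "text"}
print(find_broken_references(model, source))
# prints: {'amount': 'schema/type-changed', 'region': 'schema/column-not-found'}
```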

capacity/statement-timeout

The refresh query exceeded the statement timeout (60 s default on Postgres connections we configure).

  • Increase the timeout in Dataset → Settings → Timeout (max 10 min for Import, 2 min for Direct queries).
  • Or, for truly huge tables, enable CDC (logical replication on Postgres, binlog on MySQL). Incremental pulls avoid full-table scans.

capacity/out-of-memory

Rare but possible on very wide tables (hundreds of columns, millions of rows, Import mode).

  • Switch the affected dataset to Direct mode temporarily.
  • Or open a support ticket with the dataset ID; we'll move you to a larger refresh worker (this happens automatically on the Scale plan).

gateway/offline

The gateway agent hasn't sent a heartbeat in the last 60 s.

  • Check the service is running on the agent host.
  • Check that outbound 443 to gw.analytics.rstglobal.in is not being blocked or intercepted (we use mTLS; a transparent proxy that terminates TLS will break it).
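The offline rule itself is a simple staleness check. The sketch below shows the idea for your own monitoring; the names are hypothetical, and whether the real cutoff is strictly greater than 60 s or includes exactly 60 s is an assumption.

```python
from datetime import datetime, timedelta, timezone

# 60 s window taken from the text above.
HEARTBEAT_WINDOW = timedelta(seconds=60)


def is_gateway_offline(last_heartbeat: datetime, now: datetime) -> bool:
    """True once more than 60 s have passed since the last heartbeat.

    Assumption: the comparison is strict (exactly 60 s still counts as online).
    """
    return now - last_heartbeat > HEARTBEAT_WINDOW


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_gateway_offline(now - timedelta(seconds=90), now))  # prints: True
print(is_gateway_offline(now - timedelta(seconds=15), now))  # prints: False
```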

Partial failures

If the upstream has 50 tables and 2 fail, we mark the refresh partial-success: the good tables land and the bad tables keep their previous snapshot. The UI distinguishes this from a total failure.
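The rule can be pictured as a roll-up over per-table results. This is a sketch only: the function name and the "success" and "failed" labels are assumptions (only partial-success is named above), and treating a refresh with no tables as a success is a choice made here for simplicity.

```python
def refresh_status(table_ok: dict[str, bool]) -> str:
    """Roll per-table outcomes (table name -> succeeded?) into one status."""
    if all(table_ok.values()):
        return "success"          # also returned for an empty mapping
    if any(table_ok.values()):
        return "partial-success"  # some tables landed, others kept their snapshot
    return "failed"


# 50 tables, 2 of which fail, as in the example above.
results = {f"table_{i}": i not in (7, 23) for i in range(50)}
print(refresh_status(results))  # prints: partial-success
```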

Automatic retries

Transient failures (timeout, connection refused, 503) are retried automatically after delays of 5 s, 10 s, 20 s, 40 s, 60 s, and 60 s. Once the retries are exhausted, the refresh is marked failed and admins are notified via the configured channel(s).
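The retry delays match a doubling backoff that starts at 5 s and is capped at 60 s. The sketch below reproduces that sequence so you can estimate how long a dataset may sit in the retrying state; the function and parameter names are hypothetical, and whether the real scheduler adds jitter is not stated here.

```python
def retry_delays(base: int = 5, cap: int = 60, retries: int = 6) -> list[int]:
    """Delays in seconds: double from `base` each time, never exceeding `cap`."""
    return [min(base * 2**i, cap) for i in range(retries)]


delays = retry_delays()
print(delays)       # prints: [5, 10, 20, 40, 60, 60]
print(sum(delays))  # prints: 195
```

So the waits alone add up to 195 s (3 min 15 s) before the refresh is marked failed, not counting the time each attempt itself takes.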

Pausing

If a dataset is broken and you can't fix it immediately, Pause refresh from the dataset menu. Queries continue against the last good snapshot; no more retries fire.