
How to Stream OTLP to Remote DuckLake

Use the duckdb-otlp Docker image in r2-neon-ducklake mode to stream OTLP/HTTP exports into DuckLake with a Neon Postgres catalog and Cloudflare R2 data files.

The container initializes DuckDB, loads the required extensions, connects DuckLake to Neon, stores Parquet files in R2, starts the ingest server, and commits accepted rows in batches.

Live ingestion uses OTLP/HTTP on port 4318. WASM builds do not include the ingest server.

Create or choose a Neon database for the DuckLake catalog. Copy the connection details from Neon as separate values:

  • host
  • port
  • database
  • user
  • password
  • SSL mode
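Neon typically also shows these details as a single postgresql:// connection URI. As a rough sketch, a small shell helper can split such a URI into the separate values listed above. The function name neon_uri_to_env and the example host are hypothetical, and the helper assumes the URI contains no URL-encoded characters.

```shell
# Hypothetical helper: split a postgresql:// connection URI into NEON_PG* lines.
# Assumes the form user:password@host[:port]/database[?sslmode=...]
# and that no component contains URL-encoded or reserved characters.
neon_uri_to_env() (
  uri=${1#*://}                   # drop the scheme
  query=""
  case "$uri" in
    *'?'*) query=${uri#*\?}; uri=${uri%%\?*} ;;
  esac
  creds=${uri%%@*}                # user:password
  rest=${uri#*@}                  # host[:port]/database
  hostport=${rest%%/*}
  port=5432                       # Postgres default when the URI omits a port
  case "$hostport" in
    *:*) port=${hostport##*:} ;;
  esac
  sslmode=require                 # same value this guide uses for NEON_PGSSLMODE
  case "$query" in
    *sslmode=*) sslmode=${query#*sslmode=}; sslmode=${sslmode%%&*} ;;
  esac
  printf 'NEON_PGHOST=%s\n' "${hostport%%:*}"
  printf 'NEON_PGPORT=%s\n' "$port"
  printf 'NEON_PGDATABASE=%s\n' "${rest#*/}"
  printf 'NEON_PGUSER=%s\n' "${creds%%:*}"
  printf 'NEON_PGPASSWORD=%s\n' "${creds#*:}"
  printf 'NEON_PGSSLMODE=%s\n' "$sslmode"
)
```

For example, neon_uri_to_env 'postgresql://alice:s3cret@db.example.com/neondb?sslmode=require' prints six NEON_PG* lines that can be pasted into an env file.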

Create or choose a Cloudflare R2 bucket for DuckLake data files:

export CLOUDFLARE_ACCOUNT_ID=<account-id>
export R2_BUCKET_NAME=duckdb-otlp-ducklake
wrangler r2 bucket create "$R2_BUCKET_NAME"

Create an R2 S3-compatible access key pair that can write objects to the bucket. Save the access key ID and secret access key for the next step.

Create remote-ducklake.env:

DUCKDB_MODE=r2-neon-ducklake
DUCKDB_OTLP_TOKEN=dev-token-123456
DUCKLAKE_NAME=lake
DUCKDB_SCHEMA=otlp
DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456
CLOUDFLARE_ACCOUNT_ID=<account-id>
CLOUDFLARE_R2_BUCKET=<bucket-name>
CLOUDFLARE_R2_PREFIX=duckdb-otlp/
CLOUDFLARE_ACCESS_KEY_ID=<r2-s3-access-key-id>
CLOUDFLARE_SECRET_ACCESS_KEY=<r2-s3-secret-access-key>
NEON_PGHOST=<neon-host>
NEON_PGPORT=5432
NEON_PGDATABASE=<database>
NEON_PGUSER=<user>
NEON_PGPASSWORD=<password>
NEON_PGSSLMODE=require

If you use a custom R2 endpoint, add CLOUDFLARE_R2_ENDPOINT=<endpoint-host>.
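Before starting the container, it can help to confirm that no placeholder such as <account-id> is left in the file. The following is a minimal sketch of such a check. The function name check_env_file is hypothetical, and treating every listed key as required is an assumption based on the example file above, not a documented rule of the image.

```shell
# Hypothetical pre-flight check for remote-ducklake.env.
# Treats every key below as required, which is an assumption based on the example file.
check_env_file() (
  file="$1"
  status=0
  for key in DUCKDB_MODE DUCKDB_OTLP_TOKEN DUCKLAKE_NAME DUCKDB_SCHEMA \
             CLOUDFLARE_ACCOUNT_ID CLOUDFLARE_R2_BUCKET \
             CLOUDFLARE_ACCESS_KEY_ID CLOUDFLARE_SECRET_ACCESS_KEY \
             NEON_PGHOST NEON_PGPORT NEON_PGDATABASE NEON_PGUSER \
             NEON_PGPASSWORD NEON_PGSSLMODE; do
    # Use the last assignment of the key, if any.
    line="$(grep -E "^${key}=" "$file" | tail -n 1)"
    value="${line#*=}"
    case "$value" in
      "")      echo "missing: $key";     status=1 ;;
      "<"*">") echo "placeholder: $key"; status=1 ;;
    esac
  done
  exit "$status"
)
```

Running check_env_file remote-ducklake.env prints one line per problem and returns a non-zero status if any key is missing or still holds a placeholder.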

docker run --rm --name duckdb-otlp \
  --env-file remote-ducklake.env \
  -p 4318:4318 \
  -p 9494:9494 \
  ghcr.io/smithclay/duckdb-otlp:latest

The container creates the target tables in lake.otlp if they do not exist:

  • otlp_logs
  • otlp_traces
  • otlp_metrics_gauge
  • otlp_metrics_sum
  • otlp_metrics_histogram
  • otlp_metrics_exp_histogram

Leave the container running while clients send OTLP/HTTP requests.
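Applications instrumented with an OpenTelemetry SDK can usually be pointed at the container through the standard exporter environment variables. The values below are a sketch that reuses this guide's example port and token. The protocol setting is an assumption: this guide only demonstrates JSON payloads, so choose whichever OTLP/HTTP encoding the server and your SDK both support. Header values are percent-encoded here as the OpenTelemetry specification describes, though some SDKs may expect a literal space instead.

```shell
# Standard OpenTelemetry SDK exporter variables; values reuse this guide's examples.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
# Assumption: the server accepts this encoding; the guide itself only shows JSON.
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
# Bearer token from DUCKDB_OTLP_TOKEN, with the space percent-encoded.
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20dev-token-123456"
export OTEL_SERVICE_NAME="remote-ducklake-demo"
```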

POST a log record

In another terminal:

curl -sS http://localhost:4318/v1/logs \
  -H 'Authorization: Bearer dev-token-123456' \
  -H 'Content-Type: application/json' \
  -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"remote-ducklake-demo"}},{"key":"deployment.environment","value":{"stringValue":"docs"}}]},"scopeLogs":[{"scope":{"name":"duckdb-otlp-guide"},"logRecords":[{"timeUnixNano":"1704067200000000000","observedTimeUnixNano":"1704067200123456789","severityNumber":9,"severityText":"INFO","body":{"stringValue":"hello from remote DuckLake"},"attributes":[{"key":"guide","value":{"stringValue":"stream-to-remote-ducklake"}}]}]}]}]}'

Response:

{"status":"buffered","rows":1,"batches":1}

A buffered status means the rows were accepted but are not yet durable. They commit automatically in the background, on graceful shutdown, or immediately after an explicit flush.

Query committed rows

Flush and query through Quack from a DuckDB process on the host. The server process owns the DuckLake catalog connection while it runs, and the distroless image has no shell or bundled DuckDB CLI, so docker exec ... sh -c cannot be used to run inspection SQL. The examples in this guide enable Quack and publish port 9494 for this purpose.

duckdb <<'SQL'
INSTALL quack;
LOAD quack;
-- Commit any buffered rows so the query below can see them.
FROM quack_query(
  'quack:localhost:9494',
  'SELECT * FROM otlp_flush(''otlp:0.0.0.0:4318'')',
  token = 'dev-quack-token-123456'
);
-- Read back the most recent log rows for the demo service.
FROM quack_query(
  'quack:localhost:9494',
  $$
    SELECT service_name, severity_text, body
    FROM lake.otlp.otlp_logs
    WHERE service_name = 'remote-ducklake-demo'
    ORDER BY time_unix_nano DESC
    LIMIT 5
  $$,
  token = 'dev-quack-token-123456'
);
SQL
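When running several inspection queries, repeating the address and token gets tedious. One possible shortcut is a shell function that prints the DuckDB script for a single remote statement, following the same quack_query call shape used above. The function name quack_script and the variables QUACK_ADDR and QUACK_TOKEN are hypothetical, and the sketch assumes the statement does not itself contain the $$ delimiter.

```shell
# Hypothetical wrapper: print a DuckDB script that runs one statement through Quack.
# QUACK_ADDR and QUACK_TOKEN are illustrative names; defaults are this guide's example values.
# Assumes the remote SQL does not itself contain the $$ delimiter.
quack_script() {
  cat <<EOF
INSTALL quack;
LOAD quack;
FROM quack_query(
  'quack:${QUACK_ADDR:-localhost:9494}',
  \$\$ $1 \$\$,
  token = '${QUACK_TOKEN:-dev-quack-token-123456}'
);
EOF
}
```

For example, quack_script "SELECT count(*) AS row_count FROM lake.otlp.otlp_logs" | duckdb pipes the generated script to a host DuckDB process.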

Stop cleanly

If you plan to delete the Neon and R2 resources immediately, skip this step and use Clean up instead.

docker stop duckdb-otlp

The image sends otlp_stop('otlp:0.0.0.0:4318') during shutdown, so remaining buffered rows are committed before the process exits.
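By default, docker stop waits 10 seconds after sending the stop signal before it kills the container. If the final commit to Neon and R2 might take longer than that, a longer grace period may be safer. The sketch below wraps docker stop with the -t option; the function name stop_ingest and the 60-second default are assumptions for illustration, not values taken from duckdb-otlp.

```shell
# Stop the container, allowing up to N seconds (default 60) for the final commit.
# The 60-second default is an assumption, not a value from duckdb-otlp.
stop_ingest() {
  docker stop -t "${1:-60}" duckdb-otlp
}
```

Call stop_ingest for the default grace period, or stop_ingest 120 to wait up to two minutes.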

Clean up

Drop the DuckLake tables before deleting the Neon database or R2 bucket:

duckdb <<'SQL'
INSTALL quack;
LOAD quack;
FROM quack_query(
  'quack:localhost:9494',
  $$
    -- Stop ingestion first so no new rows arrive while tables are dropped.
    SELECT status FROM otlp_stop('otlp:0.0.0.0:4318');
    DROP TABLE IF EXISTS lake.otlp.otlp_logs;
    DROP TABLE IF EXISTS lake.otlp.otlp_traces;
    DROP TABLE IF EXISTS lake.otlp.otlp_metrics_gauge;
    DROP TABLE IF EXISTS lake.otlp.otlp_metrics_sum;
    DROP TABLE IF EXISTS lake.otlp.otlp_metrics_histogram;
    DROP TABLE IF EXISTS lake.otlp.otlp_metrics_exp_histogram;
    DETACH lake;
  $$,
  token = 'dev-quack-token-123456'
);
SQL
docker stop duckdb-otlp

Then delete the R2 prefix or bucket and remove the Neon database or branch you created for this guide.
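One way to remove the R2 data is sketched below. It assumes the AWS CLI is installed, that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY hold the R2 key pair created earlier, and that the bucket uses the default endpoint and the duckdb-otlp/ prefix from the example env file. The function names are hypothetical. R2 only deletes empty buckets, so the bucket delete fails if objects remain outside that prefix.

```shell
# Print the default S3-compatible endpoint for an R2 account.
r2_endpoint() {
  printf 'https://%s.r2.cloudflarestorage.com\n' "$1"
}

# Hypothetical cleanup: remove everything under the prefix, then delete the bucket.
# Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY hold the R2 key pair.
delete_r2_data() {
  aws s3 rm "s3://$R2_BUCKET_NAME/duckdb-otlp/" --recursive \
    --region auto \
    --endpoint-url "$(r2_endpoint "$CLOUDFLARE_ACCOUNT_ID")"
  wrangler r2 bucket delete "$R2_BUCKET_NAME"
}
```

With CLOUDFLARE_ACCOUNT_ID and R2_BUCKET_NAME exported as in the bucket creation step, delete_r2_data runs both commands in order.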

Troubleshooting

  • If the container cannot connect to Neon at startup, confirm NEON_PGHOST, NEON_PGDATABASE, NEON_PGUSER, NEON_PGPASSWORD, and NEON_PGSSLMODE=require.
  • If the container connects to Neon but cannot write data files, confirm the R2 access key can write objects to CLOUDFLARE_R2_BUCKET.
  • If R2 paths use the wrong location, confirm CLOUDFLARE_ACCOUNT_ID, CLOUDFLARE_R2_BUCKET, and CLOUDFLARE_R2_PREFIX.
  • If no rows appear after a 202 response, run the flush command before querying.
  • otlp_flush only commits buffered ingest rows; it does not compact data files. Run DuckDB or DuckLake maintenance commands when you need compaction.