How to Stream OTLP to Parquet
Use the duckdb-otlp Docker image in parquet mode to stream OTLP/HTTP exports into partitioned Parquet files.
Use this when you want ordinary DuckDB-readable files instead of a lakehouse catalog. The Parquet dataset is the only durable store — the daemon keeps no local copy of the data. Each seal writes the buffered rows to the configured Parquet root:
<export-root>/<table>/year=YYYY/month=MM/day=DD/*.parquet

PARQUET_EXPORT_PATH can be a local path such as /data/otlp-parquet or a DuckDB-writable URI such as s3://bucket/prefix.
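The layout above can be sketched as a small shell helper that maps a nanosecond timestamp to its partition directory. This is only an illustration: partition_dir is a hypothetical name, and it assumes partitions are computed in UTC, which this guide does not state explicitly.

```shell
# Sketch: derive the Hive-style partition directory for a nanosecond timestamp.
# Assumption: partitions are computed in UTC (not confirmed by the guide).
partition_dir() {
  local nanos="$1"
  # Integer division drops the sub-second digits.
  local secs=$(( nanos / 1000000000 ))
  # GNU and BusyBox date accept -d @SECONDS; BSD/macOS date accepts -r SECONDS.
  date -u -d "@${secs}" '+year=%Y/month=%m/day=%d' 2>/dev/null \
    || date -u -r "${secs}" '+year=%Y/month=%m/day=%d'
}

# Example: the timestamp of the log record posted later in this guide.
partition_dir 1704067200000000000
# prints year=2024/month=01/day=01
```

Appending the result to <export-root>/<table>/ gives the directory to list after a seal.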
Live ingestion uses OTLP/HTTP on port 4318. WASM builds do not include the ingest server.
Configure local Parquet
Create parquet.env:

DUCKDB_MODE=parquet
DUCKDB_OTLP_TOKEN=dev-token-123456
DUCKDB_SCHEMA=otlp
PARQUET_EXPORT_PATH=/data/otlp-parquet
DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456

If you omit PARQUET_EXPORT_PATH, the daemon writes to /data/parquet.
Configure S3 Parquet
For S3, set PARQUET_EXPORT_PATH to an s3:// URI and provide AWS credentials through DuckDB’s credential chain:

DUCKDB_MODE=parquet
DUCKDB_OTLP_TOKEN=dev-token-123456
DUCKDB_SCHEMA=otlp
PARQUET_EXPORT_PATH=s3://duckdb-otlp-telemetry/otlp
AWS_REGION=us-west-2
AWS_PROFILE=cli-dev
DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456

You can also use S3_BUCKET and S3_PREFIX instead of PARQUET_EXPORT_PATH; those resolve to s3://$S3_BUCKET/$S3_PREFIX.
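To see which export root a given environment would produce, the resolution rules can be approximated in shell. This is a sketch, not the daemon's code: resolve_export_root is a hypothetical helper, and the precedence (PARQUET_EXPORT_PATH first, then S3_BUCKET and S3_PREFIX, then the /data/parquet default) is an assumption, since the guide does not say what happens when both styles are set.

```shell
# Sketch of the export-root resolution described in this guide.
# Assumption: PARQUET_EXPORT_PATH wins when both styles are set.
resolve_export_root() {
  if [ -n "${PARQUET_EXPORT_PATH:-}" ]; then
    # Explicit path or URI, used as-is.
    printf '%s\n' "$PARQUET_EXPORT_PATH"
  elif [ -n "${S3_BUCKET:-}" ]; then
    # Bucket and prefix combine into s3://$S3_BUCKET/$S3_PREFIX.
    printf 's3://%s/%s\n' "$S3_BUCKET" "${S3_PREFIX:-}"
  else
    # Default documented for an omitted PARQUET_EXPORT_PATH.
    printf '%s\n' /data/parquet
  fi
}
```

With S3_BUCKET=duckdb-otlp-telemetry and S3_PREFIX=otlp and no PARQUET_EXPORT_PATH, the helper prints s3://duckdb-otlp-telemetry/otlp, the same root used in the S3 example.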
Start the server
Prefer to run manually in the DuckDB shell? See Run manually.
For local Parquet output:
mkdir -p data
docker run --rm --name duckdb-otlp \
  --env-file parquet.env \
  -p 4318:4318 \
  -p 9494:9494 \
  -v "$(pwd)/data:/data" \
  ghcr.io/smithclay/duckdb-otlp:latest

For S3 output, mount your AWS config read-only:

docker run --rm --name duckdb-otlp \
  --env-file parquet.env \
  -p 4318:4318 \
  -p 9494:9494 \
  -v "$(pwd)/data:/data" \
  -v "$HOME/.aws:/root/.aws:ro" \
  ghcr.io/smithclay/duckdb-otlp:latest

Each seal writes only to Parquet. For convenient inspection over Quack, the daemon lazily creates a read-only view over the exported files for each signal: otlp.otlp_logs, otlp.otlp_traces, otlp.otlp_metrics_gauge, otlp.otlp_metrics_sum, otlp.otlp_metrics_histogram, and otlp.otlp_metrics_exp_histogram. These views read the Parquet dataset directly, so they reflect exactly what is durable, including any at-least-once duplicates. Each view appears after the first seal for its signal and stores no data.

Logs and metrics partition by time_unix_nano; traces partition by start_time_unix_nano.
POST a log record
In another terminal:
curl -sS http://localhost:4318/v1/logs \
  -H 'Authorization: Bearer dev-token-123456' \
  -H 'Content-Type: application/json' \
  -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"parquet-demo"}},{"key":"deployment.environment","value":{"stringValue":"docs"}}]},"scopeLogs":[{"scope":{"name":"duckdb-otlp-guide"},"logRecords":[{"timeUnixNano":"1704067200000000000","observedTimeUnixNano":"1704067200123456789","severityNumber":9,"severityText":"INFO","body":{"stringValue":"hello from Parquet mode"},"attributes":[{"key":"guide","value":{"stringValue":"stream-to-parquet"}}]}]}]}]}'

Response:

{"status":"buffered","rows":1,"batches":1}

Rows are accepted before they are durable. They commit automatically in the background, on graceful shutdown, or immediately after an explicit flush.
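The request in this section uses a fixed timestamp, so its row always lands in the 2024-01-01 partition. To send records stamped with the current time, a payload can be generated in shell. The helper below is a minimal sketch: make_log_payload is a hypothetical name, it emits a reduced version of the payload (service name, timestamp, severity, body), and it inserts its arguments verbatim, so they must not contain quotes or backslashes.

```shell
# Sketch: build a one-record OTLP/HTTP JSON logs payload.
# make_log_payload is a hypothetical helper, not part of duckdb-otlp.
# Usage: make_log_payload SERVICE BODY [TIME_UNIX_NANO]
make_log_payload() {
  local service="$1" body="$2"
  # Default timestamp: current whole seconds padded to nanoseconds.
  local nanos="${3:-$(date +%s)000000000}"
  printf '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"%s"}}]},"scopeLogs":[{"logRecords":[{"timeUnixNano":"%s","severityNumber":9,"severityText":"INFO","body":{"stringValue":"%s"}}]}]}]}\n' \
    "$service" "$nanos" "$body"
}

# Possible use with the endpoint and token from this guide:
#   make_log_payload parquet-demo 'hello again' \
#     | curl -sS http://localhost:4318/v1/logs \
#         -H 'Authorization: Bearer dev-token-123456' \
#         -H 'Content-Type: application/json' \
#         --data-binary @-
```

Passing an explicit third argument keeps the timestamp reproducible, which helps when you want to predict the partition directory.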
Flush and inspect
Flush through Quack from a local DuckDB process:

duckdb -unsigned -c "INSTALL quack;
LOAD quack;
FROM quack_query(
  'quack:localhost:9494',
  'SELECT * FROM otlp_flush(''otlp:0.0.0.0:4318'')',
  token = 'dev-quack-token-123456'
);"

The example log record uses 2024-01-01, so local output lands under this path:
find data/otlp-parquet/otlp_logs/year=2024/month=01/day=01 -name '*.parquet'

For S3 output:

aws s3 ls \
  "s3://duckdb-otlp-telemetry/otlp/otlp_logs/year=2024/month=01/day=01/" \
  --profile cli-dev \
  --region us-west-2 \
  --recursive

To query the exported rows through the inspection view inside the running daemon (this reads the Parquet files, not a local copy):
duckdb -unsigned -c "INSTALL quack;
LOAD quack;
FROM quack_query(
  'quack:localhost:9494',
  \$\$
    SELECT time_unix_nano, service_name, severity_text, body
    FROM otlp.otlp_logs
    WHERE service_name = 'parquet-demo'
    ORDER BY time_unix_nano DESC
    LIMIT 5
  \$\$,
  token = 'dev-quack-token-123456'
);"

To query local Parquet files directly:

SELECT service_name, severity_text, body
FROM read_parquet(
  'data/otlp-parquet/otlp_logs/**/*.parquet',
  hive_partitioning = true
)
WHERE year = '2024' AND month = '01' AND day = '01';

For S3 Parquet, create an S3 secret first:
INSTALL httpfs;
LOAD httpfs;

CREATE OR REPLACE SECRET plain_s3_secret (
  TYPE s3,
  PROVIDER credential_chain,
  CHAIN config,
  PROFILE 'cli-dev',
  REGION 'us-west-2'
);

SELECT service_name, severity_text, body
FROM read_parquet(
  's3://duckdb-otlp-telemetry/otlp/otlp_logs/**/*.parquet',
  hive_partitioning = true
)
WHERE year = '2024' AND month = '01' AND day = '01';

Stop cleanly
If you plan to delete the Parquet resources immediately, skip this step and use Clean up instead.
docker stop duckdb-otlp
The image sends otlp_stop('otlp:0.0.0.0:4318') during shutdown, so remaining buffered rows are committed before the process exits.
Clean up
For local output:

rm -rf data

For S3 output:

aws s3 rm "s3://duckdb-otlp-telemetry/otlp/" \
  --profile cli-dev \
  --region us-west-2 \
  --recursive

Run manually
To run this configuration in a DuckDB 1.5.4+ shell instead of the daemon, execute the SQL below. Replace bracketed values with the corresponding values from this guide. Keep the shell open while clients send telemetry.

duckdb duckdb-otlp-control.duckdb

Local Parquet
-- The daemon embeds otlp statically; the shell loads the extension explicitly.
INSTALL otlp FROM community;
LOAD otlp;

-- Local Parquet needs no storage extension setup.

-- Create the target schema before starting ingestion.
CREATE SCHEMA IF NOT EXISTS otlp;

-- Start OTLP/HTTP. Seal cadence, file sizes, and buffer limits use defaults;
-- override only as needed; see the Live Ingest Reference:
-- https://smithclay.github.io/duckdb-otlp/reference/serve/
SELECT listen_url, catalog_name, schema_name
FROM otlp_serve(
  'otlp:0.0.0.0:4318',
  catalog := '',
  schema := 'otlp',
  token := 'dev-token-123456',
  allow_other_hostname := true,
  parquet_export_path := 'data/otlp-parquet'
);

-- The guide's daemon configuration also enables Quack.
INSTALL quack;
LOAD quack;
SELECT listen_uri
FROM quack_serve(
  'quack:0.0.0.0:9494',
  token := 'dev-quack-token-123456',
  allow_other_hostname := true
);

S3 Parquet
Use the profile, region, and export URI configured earlier in this guide.

-- The daemon embeds otlp statically; the shell loads the extension explicitly.
INSTALL otlp FROM community;
LOAD otlp;

-- Configure the AWS credential chain used by the S3 variant.
INSTALL aws;
INSTALL httpfs;
LOAD aws;
LOAD httpfs;
CREATE OR REPLACE SECRET plain_s3_secret (
  TYPE s3,
  PROVIDER credential_chain,
  CHAIN config,
  PROFILE 'cli-dev',
  REGION 'us-west-2'
);

-- Create the target schema before starting ingestion.
CREATE SCHEMA IF NOT EXISTS otlp;

-- Start OTLP/HTTP. Seal cadence, file sizes, and buffer limits use defaults;
-- override only as needed; see the Live Ingest Reference:
-- https://smithclay.github.io/duckdb-otlp/reference/serve/
SELECT listen_url, catalog_name, schema_name
FROM otlp_serve(
  'otlp:0.0.0.0:4318',
  catalog := '',
  schema := 'otlp',
  token := 'dev-token-123456',
  allow_other_hostname := true,
  parquet_export_path := 's3://duckdb-otlp-telemetry/otlp'
);

-- The guide's daemon configuration also enables Quack.
INSTALL quack;
LOAD quack;
SELECT listen_uri
FROM quack_serve(
  'quack:0.0.0.0:9494',
  token := 'dev-quack-token-123456',
  allow_other_hostname := true
);

Before closing DuckDB, stop both listeners cleanly:

-- Stop Quack, then commit buffered telemetry and stop OTLP.
CALL quack_stop('quack:0.0.0.0:9494');
SELECT status, dropped_rows
FROM otlp_stop('otlp:0.0.0.0:4318');

Troubleshooting
- If S3 startup cannot find credentials, confirm AWS_PROFILE in parquet.env matches a profile in the mounted $HOME/.aws directory.
- If S3 writes fail, confirm the profile can write objects under PARQUET_EXPORT_PATH.
- If no files appear after a 202 response, run otlp_flush before listing the output path.
- Plain Parquet object writes are not catalog transactions (see the at-least-once caution above). A signal that already exported is never re-written, but if a single seal’s COPY fails part-way through and the server retries, that signal’s rows can be duplicated under the same partition. Deduplicate downstream, or use a catalog mode for exactly-once.
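When checking whether a flush produced output locally, it can help to count the files in one day's partition rather than scanning find output by eye. The helper below is a sketch for local export roots only: count_parquet_files is a hypothetical name, and it relies on the <export-root>/<table>/year=YYYY/month=MM/day=DD layout described at the top of this guide.

```shell
# Sketch: count Parquet files under one day's partition of a local export root.
# count_parquet_files is a hypothetical helper, not part of duckdb-otlp.
# Usage: count_parquet_files ROOT TABLE YEAR MONTH DAY
count_parquet_files() {
  local root="$1" table="$2" year="$3" month="$4" day="$5"
  local dir="${root}/${table}/year=${year}/month=${month}/day=${day}"
  # A missing partition directory means nothing has been sealed for that day.
  if [ ! -d "$dir" ]; then
    echo 0
    return 1
  fi
  local n
  n=$(find "$dir" -type f -name '*.parquet' | wc -l | tr -d ' ')
  echo "$n"
  # Exit status is 0 only when at least one file exists.
  [ "$n" -gt 0 ]
}
```

For the example record in this guide, count_parquet_files data/otlp-parquet otlp_logs 2024 01 01 prints the number of files and returns a non-zero status while the partition is still empty, which makes it usable in a retry loop after otlp_flush.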