
How to Stream OTLP to Parquet

Use the duckdb-otlp Docker image in parquet mode to stream OTLP/HTTP exports into partitioned Parquet files.

Use this when you want ordinary DuckDB-readable files instead of a lakehouse catalog. The Parquet dataset is the only durable store — the daemon keeps no local copy of the data. Each seal writes the buffered rows to the configured Parquet root:

<export-root>/<table>/year=YYYY/month=MM/day=DD/*.parquet

PARQUET_EXPORT_PATH can be a local path such as /data/otlp-parquet or a DuckDB-writable URI such as s3://bucket/prefix.

Live ingestion uses OTLP/HTTP on port 4318. WASM builds do not include the ingest server.

Create parquet.env:

DUCKDB_MODE=parquet
DUCKDB_OTLP_TOKEN=dev-token-123456
DUCKDB_SCHEMA=otlp
PARQUET_EXPORT_PATH=/data/otlp-parquet
DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456

If you omit PARQUET_EXPORT_PATH, the daemon writes to /data/parquet.

For S3, set PARQUET_EXPORT_PATH to an s3:// URI and provide AWS credentials through DuckDB’s credential chain:

DUCKDB_MODE=parquet
DUCKDB_OTLP_TOKEN=dev-token-123456
DUCKDB_SCHEMA=otlp
PARQUET_EXPORT_PATH=s3://duckdb-otlp-telemetry/otlp
AWS_REGION=us-west-2
AWS_PROFILE=cli-dev
DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456

You can also use S3_BUCKET and S3_PREFIX instead of PARQUET_EXPORT_PATH; those resolve to s3://$S3_BUCKET/$S3_PREFIX.
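How these settings combine into one export root can be modeled in a few lines of Python. This is only an illustrative sketch, not the daemon's code: the function name `resolve_export_path` is hypothetical, and the rule that PARQUET_EXPORT_PATH wins when both forms are set is an assumption, since this guide does not state the precedence.

```python
def resolve_export_path(env: dict[str, str]) -> str:
    """Model how the export root could be derived from parquet.env values."""
    explicit = env.get("PARQUET_EXPORT_PATH")
    if explicit:
        # Assumption: an explicit path takes priority over S3_BUCKET/S3_PREFIX.
        return explicit
    bucket = env.get("S3_BUCKET")
    if bucket:
        # Form described in this guide: s3://$S3_BUCKET/$S3_PREFIX
        return f"s3://{bucket}/{env.get('S3_PREFIX', '')}"
    # Default described in this guide when PARQUET_EXPORT_PATH is omitted.
    return "/data/parquet"


print(resolve_export_path({"S3_BUCKET": "duckdb-otlp-telemetry", "S3_PREFIX": "otlp"}))
```

With the bucket and prefix from the S3 example, the sketch yields the same URI that PARQUET_EXPORT_PATH sets directly.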

Prefer to run manually in the DuckDB shell? See Run manually.

For local Parquet output:

mkdir -p data
docker run --rm --name duckdb-otlp \
--env-file parquet.env \
-p 4318:4318 \
-p 9494:9494 \
-v "$(pwd)/data:/data" \
ghcr.io/smithclay/duckdb-otlp:latest

For S3 output, mount your AWS config read-only:

docker run --rm --name duckdb-otlp \
--env-file parquet.env \
-p 4318:4318 \
-p 9494:9494 \
-v "$(pwd)/data:/data" \
-v "$HOME/.aws:/root/.aws:ro" \
ghcr.io/smithclay/duckdb-otlp:latest

Each seal writes only to Parquet. For convenient inspection over Quack, the daemon lazily creates a read-only view over the exported files for each signal:

  • otlp.otlp_logs
  • otlp.otlp_traces
  • otlp.otlp_metrics_gauge
  • otlp.otlp_metrics_sum
  • otlp.otlp_metrics_histogram
  • otlp.otlp_metrics_exp_histogram

These views read the Parquet dataset directly, so they reflect exactly what is durable, including any at-least-once duplicates. They appear after the first seal for that signal and store no data. Logs and metrics partition by time_unix_nano; traces partition by start_time_unix_nano.

POST a log record

In another terminal:

curl -sS http://localhost:4318/v1/logs \
-H 'Authorization: Bearer dev-token-123456' \
-H 'Content-Type: application/json' \
-d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"parquet-demo"}},{"key":"deployment.environment","value":{"stringValue":"docs"}}]},"scopeLogs":[{"scope":{"name":"duckdb-otlp-guide"},"logRecords":[{"timeUnixNano":"1704067200000000000","observedTimeUnixNano":"1704067200123456789","severityNumber":9,"severityText":"INFO","body":{"stringValue":"hello from Parquet mode"},"attributes":[{"key":"guide","value":{"stringValue":"stream-to-parquet"}}]}]}]}]}'

Response:

{"status":"buffered","rows":1,"batches":1}
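The same export can be sent from a script instead of curl. The sketch below uses only the Python standard library and a trimmed version of the payload above; the helper names `build_log_request` and `send` are hypothetical, and the endpoint and token are the example values from this guide.

```python
import json
import urllib.request


def build_log_request(endpoint: str, token: str, body_text: str,
                      time_unix_nano: int) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP JSON logs export request."""
    payload = {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "parquet-demo"}},
            ]},
            "scopeLogs": [{
                "scope": {"name": "duckdb-otlp-guide"},
                "logRecords": [{
                    # 64-bit integers are sent as strings, as in the curl example.
                    "timeUnixNano": str(time_unix_nano),
                    "severityNumber": 9,
                    "severityText": "INFO",
                    "body": {"stringValue": body_text},
                }],
            }],
        }]
    }
    return urllib.request.Request(
        f"{endpoint}/v1/logs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> tuple[int, str]:
    """Send the request; needs the daemon from this guide to be running."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Calling `send(build_log_request("http://localhost:4318", "dev-token-123456", "hello from Parquet mode", 1704067200000000000))` against the running container should return the same kind of buffered response shown above.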

Rows are accepted before they are durable. They commit automatically in the background, on graceful shutdown, or immediately after an explicit flush.

Flush through Quack from a local DuckDB process:

duckdb -unsigned -c "
INSTALL quack;
LOAD quack;
FROM quack_query(
'quack:localhost:9494',
'SELECT * FROM otlp_flush(''otlp:0.0.0.0:4318'')',
token = 'dev-quack-token-123456'
);
"

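The flush statement nests one SQL string inside another, which is why the inner otlp: URI is wrapped in doubled single quotes. If you generate such statements from a script, a small quoting helper keeps the nesting correct. This is a sketch; `sql_literal` and `flush_via_quack` are hypothetical helper names, not part of duckdb-otlp or Quack.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def flush_via_quack(quack_uri: str, otlp_uri: str, token: str) -> str:
    """Build the quack_query statement that runs otlp_flush on the daemon."""
    inner = f"SELECT * FROM otlp_flush({sql_literal(otlp_uri)})"
    return (
        f"FROM quack_query({sql_literal(quack_uri)}, "
        f"{sql_literal(inner)}, token = {sql_literal(token)});"
    )


print(flush_via_quack("quack:localhost:9494", "otlp:0.0.0.0:4318",
                      "dev-quack-token-123456"))
```

The printed statement matches the quoting used in the flush command above and can be passed to duckdb -c in the same way.
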
The example log record uses 2024-01-01, so local output lands under this path:

find data/otlp-parquet/otlp_logs/year=2024/month=01/day=01 -name '*.parquet'

For S3 output:

aws s3 ls \
"s3://duckdb-otlp-telemetry/otlp/otlp_logs/year=2024/month=01/day=01/" \
--profile cli-dev \
--region us-west-2 \
--recursive
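Both listings point at the same partition directory, which follows from the record's timeUnixNano. As a rough sketch of that mapping, the hypothetical helper `partition_dir` below rebuilds the directory from a nanosecond timestamp. It assumes partitions are computed in UTC, which this guide does not state explicitly.

```python
from datetime import datetime, timezone


def partition_dir(export_root: str, table: str, unix_nano: int) -> str:
    """Return the directory a row with this timestamp would be written under."""
    # Assumption: year/month/day are derived from the timestamp in UTC.
    ts = datetime.fromtimestamp(unix_nano // 1_000_000_000, tz=timezone.utc)
    return f"{export_root}/{table}/year={ts:%Y}/month={ts:%m}/day={ts:%d}"


print(partition_dir("data/otlp-parquet", "otlp_logs", 1704067200000000000))
# data/otlp-parquet/otlp_logs/year=2024/month=01/day=01
```

For traces, pass the span's start_time_unix_nano instead, since that is the column traces partition by.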

To query the exported rows through the inspection view inside the running daemon (this reads the Parquet files, not a local copy):

duckdb -unsigned -c "
INSTALL quack;
LOAD quack;
FROM quack_query(
'quack:localhost:9494',
\$\$
SELECT time_unix_nano, service_name, severity_text, body
FROM otlp.otlp_logs
WHERE service_name = 'parquet-demo'
ORDER BY time_unix_nano DESC
LIMIT 5
\$\$,
token = 'dev-quack-token-123456'
);
"

To query local Parquet files directly:

SELECT service_name, severity_text, body
FROM read_parquet(
'data/otlp-parquet/otlp_logs/**/*.parquet',
hive_partitioning = true
)
WHERE year = '2024' AND month = '01' AND day = '01';

For S3 Parquet, create an S3 secret first:

INSTALL httpfs;
LOAD httpfs;
CREATE OR REPLACE SECRET plain_s3_secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN config,
PROFILE 'cli-dev',
REGION 'us-west-2'
);
SELECT service_name, severity_text, body
FROM read_parquet(
's3://duckdb-otlp-telemetry/otlp/otlp_logs/**/*.parquet',
hive_partitioning = true
)
WHERE year = '2024' AND month = '01' AND day = '01';

Stop cleanly

If you plan to delete the Parquet resources immediately, skip this step and use Clean up instead.

docker stop duckdb-otlp

The image sends otlp_stop('otlp:0.0.0.0:4318') during shutdown, so remaining buffered rows are committed before the process exits.

Clean up

For local output:

rm -rf data

For S3 output:

aws s3 rm "s3://duckdb-otlp-telemetry/otlp/" \
--profile cli-dev \
--region us-west-2 \
--recursive

Run manually

To run this configuration in a DuckDB 1.5.4+ shell instead of the daemon, execute the SQL below. The SQL uses the example values from this guide; replace them with your own. Keep the shell open while clients send telemetry.

For local Parquet output:

duckdb duckdb-otlp-control.duckdb
-- The daemon embeds otlp statically; the shell loads the extension explicitly.
INSTALL otlp FROM community;
LOAD otlp;
-- Local Parquet needs no storage extension setup.
-- Create the target schema before starting ingestion.
CREATE SCHEMA IF NOT EXISTS otlp;
-- Start OTLP/HTTP. Seal cadence, file sizes, and buffer limits use defaults;
-- override only as needed — see the Live Ingest Reference:
-- https://smithclay.github.io/duckdb-otlp/reference/serve/
SELECT listen_url, catalog_name, schema_name
FROM otlp_serve(
'otlp:0.0.0.0:4318',
catalog := '',
schema := 'otlp',
token := 'dev-token-123456',
allow_other_hostname := true,
parquet_export_path := 'data/otlp-parquet'
);
-- The guide's daemon configuration also enables Quack.
INSTALL quack;
LOAD quack;
SELECT listen_uri
FROM quack_serve(
'quack:0.0.0.0:9494',
token := 'dev-quack-token-123456',
allow_other_hostname := true
);

For S3 output, use the profile, region, and export URI configured earlier in this guide.

-- The daemon embeds otlp statically; the shell loads the extension explicitly.
INSTALL otlp FROM community;
LOAD otlp;
-- Configure the AWS credential chain used by the S3 variant.
INSTALL aws;
INSTALL httpfs;
LOAD aws;
LOAD httpfs;
CREATE OR REPLACE SECRET plain_s3_secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN config,
PROFILE 'cli-dev',
REGION 'us-west-2'
);
-- Create the target schema before starting ingestion.
CREATE SCHEMA IF NOT EXISTS otlp;
-- Start OTLP/HTTP. Seal cadence, file sizes, and buffer limits use defaults;
-- override only as needed — see the Live Ingest Reference:
-- https://smithclay.github.io/duckdb-otlp/reference/serve/
SELECT listen_url, catalog_name, schema_name
FROM otlp_serve(
'otlp:0.0.0.0:4318',
catalog := '',
schema := 'otlp',
token := 'dev-token-123456',
allow_other_hostname := true,
parquet_export_path := 's3://duckdb-otlp-telemetry/otlp'
);
-- The guide's daemon configuration also enables Quack.
INSTALL quack;
LOAD quack;
SELECT listen_uri
FROM quack_serve(
'quack:0.0.0.0:9494',
token := 'dev-quack-token-123456',
allow_other_hostname := true
);

Before closing DuckDB, stop both listeners cleanly:

-- Stop Quack, then commit buffered telemetry and stop OTLP.
CALL quack_stop('quack:0.0.0.0:9494');
SELECT status, dropped_rows
FROM otlp_stop('otlp:0.0.0.0:4318');

Troubleshooting

  • If S3 startup cannot find credentials, confirm AWS_PROFILE in parquet.env matches a profile in the mounted $HOME/.aws directory.
  • If S3 writes fail, confirm the profile can write objects under PARQUET_EXPORT_PATH.
  • If no files appear after a 202 response, run otlp_flush before listing the output path.
  • Plain Parquet object writes are not catalog transactions (see the at-least-once caution above). A signal that already exported is never re-written, but if a single seal’s COPY fails part-way through and the server retries, that signal’s rows can be duplicated under the same partition. Deduplicate downstream, or use a catalog mode for exactly-once.
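
Downstream deduplication can be as simple as keeping the first row for each identity key. The sketch below is one possible approach, not something duckdb-otlp provides: `dedupe` and `LOG_KEY` are hypothetical names, and the choice of key columns is an assumption. Two genuinely distinct records that share every key column would be collapsed, so pick columns that fit your data.

```python
from typing import Iterable, Iterator

# Hypothetical identity for a log row; adjust to the columns you rely on.
LOG_KEY = ("time_unix_nano", "service_name", "severity_text", "body")


def dedupe(rows: Iterable[dict],
           key_columns: tuple[str, ...] = LOG_KEY) -> Iterator[dict]:
    """Yield each row once, keeping the first occurrence of every key."""
    seen = set()
    for row in rows:
        # Key values must be hashable (strings and integers here).
        key = tuple(row.get(col) for col in key_columns)
        if key not in seen:
            seen.add(key)
            yield row


rows = [
    {"time_unix_nano": 1704067200000000000, "service_name": "parquet-demo",
     "severity_text": "INFO", "body": "hello from Parquet mode"},
    {"time_unix_nano": 1704067200000000000, "service_name": "parquet-demo",
     "severity_text": "INFO", "body": "hello from Parquet mode"},
]
print(len(list(dedupe(rows))))
```

The same idea carries over to SQL, for example by selecting distinct rows over the chosen key columns when reading the Parquet files.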