How to Stream OTLP to Parquet
Use the duckdb-otlp Docker image in parquet mode to stream OTLP/HTTP exports into partitioned Parquet files.
Use this when you want ordinary DuckDB-readable files instead of a lakehouse catalog. The Parquet dataset is the only durable store; the daemon keeps no local copy of the data. Each seal (a flush of the daemon's buffer) writes the buffered rows to the configured Parquet root:

```text
<export-root>/<table>/year=YYYY/month=MM/day=DD/*.parquet
```

`PARQUET_EXPORT_PATH` sets `<export-root>`. It can be a local path such as `/data/otlp-parquet` or a DuckDB-writable URI such as `s3://bucket/prefix`.
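To predict which directory a row lands in, you can derive the partition path from the row's nanosecond timestamp (`time_unix_nano` for logs and metrics, `start_time_unix_nano` for traces). The `partition_dir` function below is a hypothetical helper, not part of duckdb-otlp, and it assumes the daemon buckets rows by UTC calendar date. It converts epoch days to a calendar date with integer arithmetic only, so it runs in plain bash without GNU `date`.

```shell
# Hypothetical helper (not part of duckdb-otlp): print
#   <export-root>/<table>/year=YYYY/month=MM/day=DD
# for a nanosecond Unix timestamp. Assumes UTC dates and a non-negative timestamp.
partition_dir() {
  local root=$1 table=$2 nanos=$3
  # Whole days since 1970-01-01T00:00:00Z.
  local days=$(( nanos / 1000000000 / 86400 ))
  # Days-to-civil-date conversion (proleptic Gregorian calendar). Years are
  # counted from March 1 so the leap day falls at the end of each year.
  local z=$(( days + 719468 ))
  local era=$(( z / 146097 ))                                # 400-year cycles
  local doe=$(( z - era * 146097 ))                          # day of era
  local yoe=$(( (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365 ))
  local doy=$(( doe - (365 * yoe + yoe / 4 - yoe / 100) ))   # day of March-based year
  local mp=$(( (5 * doy + 2) / 153 ))                        # month index, March = 0
  local d=$(( doy - (153 * mp + 2) / 5 + 1 ))
  local m=$(( mp < 10 ? mp + 3 : mp - 9 ))
  local y=$(( yoe + era * 400 + (m <= 2 ? 1 : 0) ))
  printf '%s/%s/year=%04d/month=%02d/day=%02d\n' "$root" "$table" "$y" "$m" "$d"
}
```

For example, `partition_dir data/otlp-parquet otlp_logs 1704067200000000000` prints `data/otlp-parquet/otlp_logs/year=2024/month=01/day=01`, the local directory that the `find` command later on this page lists.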
Live ingestion uses OTLP/HTTP on port 4318. WASM builds do not include the ingest server.
Configure local Parquet
Create `parquet.env`:

```ini
DUCKDB_MODE=parquet
DUCKDB_OTLP_TOKEN=dev-token-123456
DUCKDB_SCHEMA=otlp

PARQUET_EXPORT_PATH=/data/otlp-parquet

DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456
```

If you omit `PARQUET_EXPORT_PATH`, the daemon writes to `/data/parquet`.
Configure S3 Parquet
For S3, set `PARQUET_EXPORT_PATH` to an `s3://` URI and provide AWS credentials through DuckDB's credential chain:

```ini
DUCKDB_MODE=parquet
DUCKDB_OTLP_TOKEN=dev-token-123456
DUCKDB_SCHEMA=otlp

PARQUET_EXPORT_PATH=s3://duckdb-otlp-telemetry/otlp
AWS_REGION=us-west-2
AWS_PROFILE=cli-dev

DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456
```

You can also set `S3_BUCKET` and `S3_PREFIX` instead of `PARQUET_EXPORT_PATH`; they resolve to `s3://$S3_BUCKET/$S3_PREFIX`.
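This page gives three ways the export root is chosen: an explicit `PARQUET_EXPORT_PATH`, the `S3_BUCKET`/`S3_PREFIX` pair, or the `/data/parquet` default. The hypothetical `resolve_export_path` function below restates those rules in bash so you can check where output will land before starting the container. It is not the daemon's code, and the precedence of `PARQUET_EXPORT_PATH` over `S3_BUCKET` is an assumption; the page only presents the two as alternatives.

```shell
# Hypothetical helper restating the documented export-root rules; not part of duckdb-otlp.
# Assumed precedence: PARQUET_EXPORT_PATH, then S3_BUCKET/S3_PREFIX, then the default.
resolve_export_path() {
  if [ -n "${PARQUET_EXPORT_PATH:-}" ]; then
    # An explicit local path or DuckDB-writable URI.
    printf '%s\n' "$PARQUET_EXPORT_PATH"
  elif [ -n "${S3_BUCKET:-}" ]; then
    # Documented form: s3://$S3_BUCKET/$S3_PREFIX
    printf 's3://%s/%s\n' "$S3_BUCKET" "${S3_PREFIX:-}"
  else
    # Documented default when no export location is set.
    printf '%s\n' /data/parquet
  fi
}
```

For example, `(. ./parquet.env && resolve_export_path)` prints `s3://duckdb-otlp-telemetry/otlp` for the S3 configuration above.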
Start the server
For local Parquet output:

```shell
mkdir -p data

docker run --rm --name duckdb-otlp \
  --env-file parquet.env \
  -p 4318:4318 \
  -p 9494:9494 \
  -v "$(pwd)/data:/data" \
  ghcr.io/smithclay/duckdb-otlp:latest
```

For S3 output, mount your AWS config read-only:
```shell
docker run --rm --name duckdb-otlp \
  --env-file parquet.env \
  -p 4318:4318 \
  -p 9494:9494 \
  -v "$(pwd)/data:/data" \
  -v "$HOME/.aws:/root/.aws:ro" \
  ghcr.io/smithclay/duckdb-otlp:latest
```

Each seal writes only to Parquet. For convenient inspection over Quack, the daemon lazily creates a read-only view over the exported files for each signal: `otlp.otlp_logs`, `otlp.otlp_traces`, `otlp.otlp_metrics_gauge`, `otlp.otlp_metrics_sum`, `otlp.otlp_metrics_histogram`, and `otlp.otlp_metrics_exp_histogram`. Each view appears after the first seal for its signal and stores no data. It reads the Parquet dataset directly, so it reflects exactly what is durable, including any at-least-once duplicates. Logs and metrics partition by `time_unix_nano`; traces partition by `start_time_unix_nano`.
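The flush step below needs at least one exported log record in the buffer. One way to send it is a plain `curl` POST of OTLP/HTTP JSON to the standard `/v1/logs` path on port 4318. This is a sketch under two assumptions this page does not confirm: that the daemon accepts the JSON encoding of OTLP/HTTP (the protocol also allows protobuf), and that it expects `DUCKDB_OTLP_TOKEN` as a `Bearer` token in the `Authorization` header. The record uses `service.name=parquet-demo` and a 2024-01-01 timestamp so it matches the paths and queries below.

```shell
# Hypothetical sketch. Assumptions: the daemon accepts OTLP/HTTP JSON on /v1/logs
# and reads DUCKDB_OTLP_TOKEN from a Bearer Authorization header.

# Print a minimal OTLP JSON logs payload: one resource, one scope, one log record.
# severityNumber 9 is SEVERITY_NUMBER_INFO. Values are inserted without JSON
# escaping, so keep them free of double quotes and backslashes.
otlp_log_payload() {
  local service=$1 nanos=$2 body=$3
  cat <<EOF
{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"$service"}}]},
"scopeLogs":[{"scope":{"name":"manual-test"},"logRecords":[{"timeUnixNano":"$nanos","severityNumber":9,"severityText":"INFO","body":{"stringValue":"$body"}}]}]}]}
EOF
}

# POST the payload and print the HTTP status code returned by the daemon.
post_log() {
  otlp_log_payload "$@" | curl -sS -o /dev/null -w '%{http_code}\n' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${DUCKDB_OTLP_TOKEN:-dev-token-123456}" \
    --data-binary @- \
    http://localhost:4318/v1/logs
}
```

For example, `post_log parquet-demo 1704067200000000000 'hello parquet'` should print `202` when the daemon accepts the record; any other code points at the payload or authentication assumptions above.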
Flush and inspect
Flush through Quack from a local DuckDB process. The flush query is passed to `quack_query` as a SQL string, so its inner single quotes are doubled:

```shell
duckdb -unsigned -c "
INSTALL quack;
LOAD quack;
FROM quack_query(
  'quack:localhost:9494',
  'SELECT * FROM otlp_flush(''otlp:0.0.0.0:4318'')',
  token = 'dev-quack-token-123456'
);"
```

The example log record is timestamped 2024-01-01, so local output lands under this path:
```shell
find data/otlp-parquet/otlp_logs/year=2024/month=01/day=01 -name '*.parquet'
```

For S3 output:
```shell
aws s3 ls \
  "s3://duckdb-otlp-telemetry/otlp/otlp_logs/year=2024/month=01/day=01/" \
  --profile cli-dev \
  --region us-west-2 \
  --recursive
```

To query the exported rows through the inspection view inside the running daemon (this reads the Parquet files, not a local copy):
```shell
duckdb -unsigned -c "
INSTALL quack;
LOAD quack;
FROM quack_query(
  'quack:localhost:9494',
  \$\$
  SELECT time_unix_nano, service_name, severity_text, body
  FROM otlp.otlp_logs
  WHERE service_name = 'parquet-demo'
  ORDER BY time_unix_nano DESC
  LIMIT 5
  \$\$,
  token = 'dev-quack-token-123456'
);"
```

To query local Parquet files directly:
```sql
SELECT service_name, severity_text, body
FROM read_parquet(
  'data/otlp-parquet/otlp_logs/**/*.parquet',
  hive_partitioning = true
)
WHERE year = '2024' AND month = '01' AND day = '01';
```

For S3 Parquet, create an S3 secret first:
```sql
INSTALL httpfs;
LOAD httpfs;
-- The credential_chain secret provider comes from the aws extension.
INSTALL aws;
LOAD aws;

CREATE OR REPLACE SECRET plain_s3_secret (
  TYPE s3,
  PROVIDER credential_chain,
  CHAIN config,
  PROFILE 'cli-dev',
  REGION 'us-west-2'
);

SELECT service_name, severity_text, body
FROM read_parquet(
  's3://duckdb-otlp-telemetry/otlp/otlp_logs/**/*.parquet',
  hive_partitioning = true
)
WHERE year = '2024' AND month = '01' AND day = '01';
```

Clean up
For local output:

```shell
rm -rf data
```

For S3 output:
```shell
aws s3 rm "s3://duckdb-otlp-telemetry/otlp/" \
  --profile cli-dev \
  --region us-west-2 \
  --recursive
```

Troubleshooting
- If S3 startup cannot find credentials, confirm that `AWS_PROFILE` in `parquet.env` matches a profile in the mounted `$HOME/.aws` directory.
- If S3 writes fail, confirm that the profile can write objects under `PARQUET_EXPORT_PATH`.
- If no files appear after a `202` response, run `otlp_flush` before listing the output path.
- Plain Parquet object writes are not catalog transactions, so delivery to Parquet is at-least-once. A signal that has already been exported is never rewritten, but if a single seal's `COPY` fails part-way through and the server retries, that signal's rows can be duplicated within the same partition. Deduplicate downstream, or use a catalog mode for exactly-once delivery.
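One downstream option is to read through `SELECT DISTINCT *`, which collapses exact duplicate rows. The hypothetical `dedup_sql` helper below prints a DuckDB view definition that does this for one exported table. It assumes retried rows are exact copies of the originals, which a re-run `COPY` of the same buffer should produce. Note that it would also merge two genuinely separate events whose every column is identical, so check that this is acceptable for your data.

```shell
# Hypothetical helper (not part of duckdb-otlp): print a DuckDB view that reads one
# exported table and drops exact duplicate rows left by an at-least-once retry.
dedup_sql() {
  local root=$1 table=$2 glob
  # Double any single quotes so the path is a valid SQL string literal.
  glob=$(printf '%s/%s/**/*.parquet' "$root" "$table" | sed "s/'/''/g")
  printf "CREATE OR REPLACE VIEW %s_dedup AS SELECT DISTINCT * FROM read_parquet('%s', hive_partitioning = true);\n" \
    "$table" "$glob"
}
```

For example, `dedup_sql data/otlp-parquet otlp_logs | duckdb otlp.duckdb` creates the view in a local database file; query `otlp_logs_dedup` there instead of the raw files.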