
OTLP HTTP Ingest Server

You can run an embedded HTTP server that accepts live OTLP/HTTP exports and streams them into DuckDB. Point any OpenTelemetry exporter at the server. The server buffers rows, then commits them in batches to the connection’s default catalog, a DuckLake lakehouse, or another attached writable catalog such as an Iceberg REST catalog.

Native extension builds include the server. WASM builds omit it. Live ingestion uses HTTP only, with no gRPC listener.

For a runnable walkthrough, see the Live Ingest Quickstart. For lakehouse examples, see Stream to Local DuckLake, Stream to Remote DuckLake, Stream to Amazon S3 Tables, and Stream to Cloudflare R2 Data Catalog. For plain files or object storage, see Stream to Parquet. For the implementation model, see Architecture.

The extension registers four lifecycle functions:

| Function | What it does |
| --- | --- |
| otlp_serve([uri], ...) | Start an HTTP server and create/validate target tables. Returns one row describing the listener. |
| otlp_flush(uri) | Force a synchronous commit of buffered rows when readers need fresh data. Returns commit stats. It leaves catalog maintenance alone. |
| otlp_stop(uri) | Stop the server listening on uri (commits remaining rows first). Returns a status string. |
| otlp_server_list() | List all running servers with live counters, buffer state, and health. |

otlp_serve starts an OTLP/HTTP ingest server bound to uri. The uri argument is optional; with no argument it defaults to otlp:localhost:4318.

-- Stream into an attached catalog
SELECT * FROM otlp_serve('otlp:localhost:4318', catalog := 'lake', token := 'my-dev-token-123456');

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| uri (positional) | VARCHAR | otlp:localhost:4318 | Listen URI. See URI scheme. |
| catalog | VARCHAR | (default catalog) | Name of the target catalog. Empty means the connection’s default catalog (in-memory or file). Set this to an attached writable catalog such as DuckLake or an Iceberg REST catalog to stream OTLP into a lakehouse. See Catalog targeting. |
| token | VARCHAR | (random, see below) | Auth token clients must present. Must be at least 16 characters. If you omit it, otlp_serve generates a random 32-hex-character token and returns it in auth_token. |
| schema | VARCHAR | main | Schema (within the catalog) that holds the target tables. |
| parquet_export_path | VARCHAR | (none) | Plain Parquet export root. When set, each seal writes the sealed rows to <root>/<table>/year=YYYY/month=MM/day=DD/*.parquet as the only durable store (no local table copy); a read-only view per signal is created over the files for inspection. Mutually exclusive with a catalog target. Export is at-least-once (a COPY cannot be rolled back). |
| create_tables | BOOLEAN | true | Create the six target tables if they don’t exist. When false, the tables must already exist with the expected columns or otlp_serve fails fast. |
| allow_other_hostname | BOOLEAN | false | Allow binding to a non-localhost host. By default otlp_serve permits only localhost, 127.0.0.1, and ::1. |
| max_body_bytes | UBIGINT | 16777216 (16 MiB) | Reject request bodies larger than this with 413. Must be greater than zero. |
| http_threads | UBIGINT | host-based bounded default | Worker threads for concurrent HTTP requests. Must be greater than zero when set. |
| max_buffered_bytes | UBIGINT | 536870912 (512 MiB) | Backpressure cap. POSTs that would exceed this return 503. |

Output columns (one row):

| Column | Type | Description |
| --- | --- | --- |
| listen_uri | VARCHAR | The otlp: URI the server is bound to. |
| listen_url | VARCHAR | The equivalent http:// base URL (POST endpoints hang off this). |
| auth_token | VARCHAR | The token clients must present (the value you passed, or the generated one). |
| schema_name | VARCHAR | Schema holding the target tables. |
| logs_table | VARCHAR | otlp_logs |
| traces_table | VARCHAR | otlp_traces |
| metrics_gauge_table | VARCHAR | otlp_metrics_gauge |
| metrics_sum_table | VARCHAR | otlp_metrics_sum |
| metrics_histogram_table | VARCHAR | otlp_metrics_histogram |
| metrics_exp_histogram_table | VARCHAR | otlp_metrics_exp_histogram |
| catalog_name | VARCHAR | Target catalog. Empty for the connection’s default catalog. |

Starting a second server on the same URI fails (OTLP server already exists). The DuckDB DatabaseInstance owns the server lifetime: DuckDB stops all servers when the database closes, but it does not commit their buffers at that point (see Durability below). Call otlp_stop before closing the database to avoid losing buffered rows.

otlp_flush forces a synchronous commit: the server writes its in-memory buffer to the target in one transaction before the function returns. Normal ingest can rely on background commits, and otlp_stop performs a final commit. Use otlp_flush when readers need the latest accepted rows while the server stays running. otlp_flush handles durability and read freshness; it leaves compaction and other catalog maintenance alone.

-- Force a commit
SELECT * FROM otlp_flush('otlp:localhost:4318');

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| uri (positional) | VARCHAR | (required) | Listen URI of the server to flush. |

Output columns (one row):

| Column | Type | Description |
| --- | --- | --- |
| status | VARCHAR | sealed on success, or No server found listening on <uri>. The literal success value means the batch commit completed. |
| sealed_rows | UBIGINT | Rows committed by this call. |
| seals_total | UBIGINT | Total batch commits performed by this server since startup. |
| error | VARCHAR | Commit error detail, or NULL if none. |

otlp_stop stops the server listening on uri and frees the port. Any buffered rows are committed first, so a graceful stop loses no data.

SELECT status FROM otlp_stop('otlp:localhost:4318');

Output column:

| Column | Type | Description |
| --- | --- | --- |
| status | VARCHAR | Stopped listening on <uri> if a server was stopped, or No server found listening on <uri> if none matched. |

otlp_server_list lists running OTLP servers with live counters and buffer state. It takes no arguments.

SELECT
    listen_uri,
    catalog_name,
    total_rows,
    buffered_rows,
    last_seal_age_ms AS last_commit_age_ms,
    is_listening
FROM otlp_server_list();

Output columns (one row per running server):

| Column | Type | Description |
| --- | --- | --- |
| listen_uri | VARCHAR | The otlp: listen URI. |
| listen_url | VARCHAR | The http:// base URL. |
| host | VARCHAR | Bound host. |
| port | USMALLINT | Bound port. |
| catalog_name | VARCHAR | Target catalog. Empty for the default catalog. |
| schema_name | VARCHAR | Schema holding the target tables. |
| active_requests | UBIGINT | Requests in progress. |
| total_requests | UBIGINT | Requests handled since startup (includes failures). |
| total_rows | UBIGINT | Rows accepted (buffered) since startup. Once the buffer drains, this equals the rows committed. A /v1/metrics request counts rows across all four metric tables. |
| buffered_rows | UBIGINT | Rows in the buffer that the writer has not committed. |
| last_seal_age_ms | BIGINT | Age (ms) since the last successful batch commit, or NULL if none has completed. |
| seals_total | UBIGINT | Batch commits performed since startup. |
| is_listening | BOOLEAN | false once the listener has fallen over (e.g. an error after a successful bind). |
| last_error | VARCHAR | Last fatal listener error, or NULL if none. |
| seal_last_error | VARCHAR | Last batch commit error, or NULL if none. |

Use is_listening / last_error to detect a dead listener. Use seal_last_error to inspect writer failures, such as catalog conflicts.
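The health columns can be folded into a single status for monitoring. The following is a minimal sketch, assuming each otlp_server_list() row has already been fetched into a Python dict; the function name and the three status labels are hypothetical, not part of the extension.

```python
from typing import Optional, TypedDict


class ServerRow(TypedDict):
    """Subset of otlp_server_list() columns used for the health check."""
    is_listening: bool
    last_error: Optional[str]
    seal_last_error: Optional[str]


def classify_server(row: ServerRow) -> str:
    """Return 'dead', 'writer_failing', or 'ok' (hypothetical labels)."""
    # A fallen listener takes priority: nothing new can be ingested.
    if not row["is_listening"] or row["last_error"] is not None:
        return "dead"
    # The listener is up, but the last batch commit reported an error.
    if row["seal_last_error"] is not None:
        return "writer_failing"
    return "ok"
```

Treating a non-NULL last_error as dead even while is_listening is still true is a conservative assumption of this sketch.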

The target of a server is <catalog>.<schema>.<table>:

  • Default catalog (catalog omitted): rows land in the connection’s default catalog, either an in-memory database or the file you opened DuckDB with. Use this zero-setup path when you do not need a lakehouse catalog. The server still buffers ingest (a POST returns 202); rows become durable in that database at the next background commit.
  • DuckLake catalog (catalog := '<attached_db>'): rows stream into a DuckLake lakehouse with Parquet data files on local or object storage, tracked by a catalog. Attach the catalog first, then name it:
INSTALL ducklake; LOAD ducklake;
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'otlp_data/');
CALL otlp_serve('otlp:localhost:4318', catalog := 'lake');
-- rows buffer and commit into Parquet under otlp_data/
SELECT count(*) FROM lake.main.otlp_logs;

Each batch commit writes one Parquet data file per signal plus one DuckLake snapshot. After a conservative number of successful automatic row-seals, duckdb-otlp may run best-effort, catalog-native maintenance by issuing DuckDB’s non-force CHECKPOINT lake, but only when the recent ingest rate and pending bytes leave ample admission headroom. DuckLake owns the actual policy through settings such as auto_compact, retention, inlining, and target file size. See Durability and background commits.

  • Iceberg REST catalog (catalog := '<attached_db>'): rows stream into tables in an attached writable Iceberg REST catalog. Attach the catalog with DuckDB’s iceberg extension, create the target schema, then pass the catalog and schema to otlp_serve. See Stream to Amazon S3 Tables and Stream to Cloudflare R2 Data Catalog for managed provider paths. DuckDB’s Iceberg REST catalog docs describe no useful CHECKPOINT maintenance path today. The internal maintenance probe uses only the generic CHECKPOINT <catalog>. If the catalog reports checkpointing as unsupported, the server disables automatic maintenance for that server and ingest durability continues normally.

  • Plain Parquet export (parquet_export_path := '/data/otlp-parquet' or parquet_export_path := 's3://bucket/prefix'): each seal writes the sealed rows straight to <path>/<table>/year=YYYY/month=MM/day=DD/*.parquet. The Parquet dataset is the only durable store: the server keeps no local table copy (it creates a read-only view per signal over the files for inspection), and this mode is mutually exclusive with a catalog target. Because a COPY is a file write and cannot be rolled back, export is at-least-once: a signal that already exported is never re-written, but a seal whose COPY fails part-way can re-export that signal’s rows on retry, so deduplicate downstream if you need exactly-once. Use this when you want partitioned Parquet without a lakehouse catalog; see Stream to Parquet.
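When reading the exported dataset from another tool, it can help to compute the partition directory for a given day. This is a minimal sketch of the layout described for plain Parquet export, assuming zero-padded month and day values as the MM/DD placeholders suggest. The helper name is hypothetical, and which timestamp the server uses to pick the partition day is not stated here, so the caller supplies the date.

```python
import datetime


def partition_dir(root: str, table: str, day: datetime.date) -> str:
    """Build <root>/<table>/year=YYYY/month=MM/day=DD for one day."""
    return (
        f"{root.rstrip('/')}/{table}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"
    )


# Example: the directory that would hold one day of log files.
logs_dir = partition_dir("s3://bucket/prefix", "otlp_logs", datetime.date(2024, 3, 7))
```

Appending /*.parquet to the result gives a glob for that day's files.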

Listen URIs use the otlp: scheme:

| Form | Example |
| --- | --- |
| otlp:host:port | otlp:localhost:4318 |
| otlp://host:port | otlp://127.0.0.1:4318 |
| IPv6 (host in brackets) | otlp:[::1]:4318 |

By default, otlp_serve allows only localhost, 127.0.0.1, and ::1. To bind to any other host (for example 0.0.0.0 to accept remote exporters), pass allow_other_hostname := true. otlp_serve rejects non-localhost hosts before it binds a socket.
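To make the three URI forms and the localhost rule concrete, here is a minimal client-side sketch that parses a listen URI and applies the same default host restriction. It is an illustration only, not the extension's parser: the regular expression, the function name, and the error messages are assumptions.

```python
import re
from typing import NamedTuple

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

# Accepts otlp:host:port, otlp://host:port, and otlp:[ipv6]:port.
_URI_RE = re.compile(
    r"^otlp:(?://)?"
    r"(?:\[(?P<v6>[^\]]+)\]|(?P<host>[^:\[\]/]+))"
    r":(?P<port>\d+)$"
)


class ListenAddr(NamedTuple):
    host: str
    port: int


def parse_listen_uri(uri: str, allow_other_hostname: bool = False) -> ListenAddr:
    """Split an otlp: listen URI into host and port."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not an otlp: listen URI: {uri!r}")
    host = match.group("v6") or match.group("host")
    port = int(match.group("port"))
    if port > 65535:
        raise ValueError(f"port out of range: {port}")
    # Mirror the documented default: reject non-localhost hosts up front.
    if host not in LOCAL_HOSTS and not allow_other_hostname:
        raise ValueError(f"host {host!r} requires allow_other_hostname")
    return ListenAddr(host, port)
```

For example, parse_listen_uri("otlp:0.0.0.0:4318") raises ValueError, while passing allow_other_hostname=True returns the host and port.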

The http:// base URL from listen_url exposes:

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/logs | Ingest logs into otlp_logs. |
| POST | /v1/traces | Ingest traces into otlp_traces. |
| POST | /v1/metrics | Ingest metrics. Fans out across all four metric tables: otlp_metrics_gauge, otlp_metrics_sum, otlp_metrics_histogram, otlp_metrics_exp_histogram. |
| GET | /healthz | Liveness probe. Returns 200 with {"status":"ok"}. No auth required. |
| GET | /readyz | Readiness probe. Returns 200 with {"status":"ready"} once the listener is bound. No auth required. |

Tables live in <catalog>.<schema>, chosen by otlp_serve(catalog := ..., schema := ...).

The server picks a parser from the request Content-Type:

| Content-Type | Format |
| --- | --- |
| application/json, application/otlp+json | OTLP/JSON |
| application/x-ndjson | newline-delimited OTLP/JSON (JSONL) |
| application/x-protobuf, application/protobuf, application/otlp | OTLP/protobuf |

Any other content type returns 415.

The server accepts the Content-Encoding values identity (or no header), gzip, and deflate. Any other content encoding returns 415.
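The header rules can be modeled as a small lookup. This is a sketch of the documented behavior rather than the server's code: the parser labels are hypothetical, and ignoring media-type parameters such as charset is an assumption.

```python
from typing import Optional

# Content-Type -> parser label (labels are hypothetical).
PARSERS = {
    "application/json": "otlp-json",
    "application/otlp+json": "otlp-json",
    "application/x-ndjson": "otlp-jsonl",
    "application/x-protobuf": "otlp-protobuf",
    "application/protobuf": "otlp-protobuf",
    "application/otlp": "otlp-protobuf",
}
ENCODINGS = {"identity", "gzip", "deflate"}


def select_parser(content_type: str, content_encoding: Optional[str]) -> Optional[str]:
    """Return a parser label, or None where the server would answer 415."""
    # Assumption: parameters after ';' (e.g. charset) do not affect dispatch.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # A missing Content-Encoding header is treated as identity.
    encoding = (content_encoding or "identity").strip().lower()
    if encoding not in ENCODINGS:
        return None
    return PARSERS.get(media_type)
```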

Every POST must present the configured token through one of these headers:

  • Authorization: Bearer <token> (case-insensitive scheme)
  • x-api-key: <token>

The server checks the two headers independently, so a malformed Authorization header does not mask a valid x-api-key. A missing or invalid token returns 401. The server compares tokens with a constant-time check.

Tokens must be at least 16 characters. Auto-generated tokens (when token is omitted) are 32 hex characters (128 bits of entropy).
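The token rules above can be sketched in a few lines of Python. This models the documented behavior under stated assumptions and is not the server's implementation: header names are assumed to be lowercased already, and both function names are hypothetical.

```python
import hmac
import secrets
from typing import Mapping


def generate_token() -> str:
    """16 random bytes rendered as 32 hex characters (128 bits)."""
    return secrets.token_hex(16)


def is_authorized(headers: Mapping[str, str], expected: str) -> bool:
    """Check both headers independently against the configured token."""
    candidates = []
    scheme, _, value = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value:  # scheme is case-insensitive
        candidates.append(value.strip())
    api_key = headers.get("x-api-key")
    if api_key:
        candidates.append(api_key.strip())
    # A malformed Authorization header adds no candidate, so a valid
    # x-api-key still passes. compare_digest is a constant-time comparison.
    ok = False
    for candidate in candidates:
        ok |= hmac.compare_digest(candidate.encode(), expected.encode())
    return ok
```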

A successful POST returns 202 Accepted after the server buffers rows:

{"status":"buffered","rows":42,"batches":1}

A 202 means the server validated, converted, and accepted the rows into the in-memory buffer. The rows are not yet durable (see below).
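From the client side, a request and its acknowledgement might look like the following sketch. It builds a gzip-compressed OTLP/JSON POST with the standard library and reads the rows count from a 202 body. The function names are hypothetical, and the request is only constructed here, not sent.

```python
import gzip
import json
import urllib.request


def build_post(listen_url: str, signal: str, payload: dict, token: str) -> urllib.request.Request:
    """Build a POST for /v1/logs, /v1/traces, or /v1/metrics."""
    if signal not in ("logs", "traces", "metrics"):
        raise ValueError(f"unknown signal: {signal!r}")
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    return urllib.request.Request(
        url=f"{listen_url.rstrip('/')}/v1/{signal}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "Authorization": f"Bearer {token}",
        },
    )


def rows_buffered(response_body: bytes) -> int:
    """Extract the row count from a 202 acknowledgement body."""
    ack = json.loads(response_body)
    if ack.get("status") != "buffered":
        raise ValueError(f"unexpected acknowledgement: {ack!r}")
    return int(ack["rows"])
```

With a server running, urllib.request.urlopen(build_post(...)) would send the request. The row count only says what was buffered, not what is durable.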

Errors return JSON shaped like {"error":"<reason>","message":"<detail>"}:

| Status | When |
| --- | --- |
| 400 | OTLP body failed to parse (or other invalid input). |
| 401 | Missing or invalid auth token. |
| 413 | Body larger than max_body_bytes. |
| 415 | Unsupported Content-Type or Content-Encoding. |
| 503 | Buffer admission full (request would exceed max_buffered_bytes). Retry with backoff. |
| 500 | Internal error (also written to duckdb_logs). |
The server buffers ingest and commits rows in batches for each target. Batch commits avoid per-request tiny files and write conflicts: a single serialized writer prevents concurrent catalog writes from the ingest server.

The flow:

  1. A POST reserves admission bytes, parses, converts, and appends rows into the relevant per-signal in-memory buffer, then returns 202. The bounded worker pool does this concurrently; append locks only the target signal buffer.
  2. A single background writer commits the buffer to the target in one transaction when any trigger fires:
    • admitted request-body bytes reach the internal size threshold, 64 MiB today,
    • the oldest buffered row reaches the internal age limit, about 5 seconds today, or
    • an explicit otlp_flush.
  3. For a DuckLake target, each batch commit writes one Parquet data file per signal plus one snapshot.
  4. For named catalogs, the server may follow successful automatic row-seals with best-effort, non-force CHECKPOINT <catalog> outside the ingest transaction when recent ingest rate and pending bytes leave ample admission headroom. Treat this as internal scheduling. The server skips the default catalog. The hook also skips explicit otlp_flush, sustained high ingest, high pending buffered bytes, and shutdown drains; unsupported checkpoint implementations log once and disable the hook for that server.
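The commit triggers in step 2 can be modeled as a small decision function. This is a simplified sketch, not the writer's actual code: the constants copy the thresholds quoted above (which are internal and may change), and the type, the function name, and the order in which triggers are checked are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

SIZE_THRESHOLD_BYTES = 64 * 1024 * 1024  # "64 MiB today"
MAX_ROW_AGE_SECONDS = 5.0                # "about 5 seconds today"


@dataclass
class BufferState:
    admitted_bytes: int                      # admitted request-body bytes
    oldest_row_age_seconds: Optional[float]  # None when the buffer is empty
    flush_requested: bool = False            # an explicit otlp_flush is pending


def commit_trigger(state: BufferState) -> Optional[str]:
    """Return which trigger fires ('flush', 'size', 'age'), or None."""
    if state.flush_requested:
        return "flush"
    if state.admitted_bytes >= SIZE_THRESHOLD_BYTES:
        return "size"
    age = state.oldest_row_age_seconds
    if age is not None and age >= MAX_ROW_AGE_SECONDS:
        return "age"
    return None
```

Under this model, a lightly loaded server commits roughly every five seconds on the age trigger, while a busy one commits on size.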

Durability contract:

  • A 202 is not durable. Rows become durable at the next background commit, on otlp_stop, or on otlp_flush.
  • A crash or hard kill loses buffered-but-uncommitted rows (at-most-once for that window).
  • otlp_stop and otlp_flush commit remaining rows before returning, so those calls lose no accepted rows. A plain database/connection close does NOT commit buffered rows: the drain runs after DuckDB tears down the instance, when it can no longer write, so buffered rows may be dropped. Prefer otlp_stop before closing the database. Use otlp_flush when the server should keep running but readers need durable rows now.

The project tracks a future durable raw-spool journal for at-least-once delivery.

Backpressure: if admitting a request would exceed max_buffered_bytes (default 512 MiB) across in-flight and uncommitted accepted payloads, the POST returns 503 before parse/transform work. Clients should retry with backoff.
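A client-side retry loop for 503 responses could look like the sketch below. It uses exponential backoff with full jitter, which is one common choice and not something the server prescribes. The send callable, the default limits, and the function name are all hypothetical.

```python
import random
import time
from typing import Callable


def post_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 503.

    send performs one POST and returns the HTTP status code.
    Returns the last status seen (503 if every attempt was rejected).
    """
    status = 503
    for attempt in range(max_attempts):
        status = send()
        if status != 503:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff, capped, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return status
```

Only 503 is retried here. Statuses such as 400, 401, 413, and 415 indicate a request that will fail again unchanged.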

Keeping DuckLake tidy: each batch commit can leave a small Parquet file per signal. For DuckLake, the server lets the catalog run its own maintenance through DuckDB’s catalog-native CHECKPOINT machinery when recent ingest leaves enough admission headroom. The hook does not run after every seal, and it stays outside the ingest transaction, so a maintenance failure leaves committed rows intact. otlp_flush still forces only ingest durability and leaves compaction to catalog maintenance.

  • The server runs a bounded httplib worker pool. Workers parse, convert, and buffer requests concurrently; each signal table has its own buffer lock. In the daemon, set DUCKDB_OTLP_HTTP_THREADS to override the host-based default.
  • A single background writer thread writes to the target catalog. Serial writes let DuckLake, which uses optimistic concurrency, avoid conflict retries and tiny-file churn.

make test cannot issue HTTP POSTs, so the SQL logic tests cover only the lifecycle functions. To exercise the ingest hot path, run the manual concurrency harness. It covers auth, content-type handling, the metrics fan-out, buffering and batch commits, and Arrow → DuckDB conversion under concurrency:

uv run --script test/manual/otlp_serve_concurrency.py
# Override the payload / concurrency:
OTLP_PAYLOAD=test/data/logs_simple.jsonl OTLP_CONCURRENCY=64 \
uv run --script test/manual/otlp_serve_concurrency.py
# Exercise the DuckLake path (writes Parquet under the given dir and checks the
# automatic catalog-maintenance event):
OTLP_DUCKLAKE_DIR=/tmp/otlp_lake \
uv run --script test/manual/otlp_serve_concurrency.py

It covers auth and validation errors, low-buffer backpressure, metrics fanout, stop-under-load, concurrent flush/stop, and the optional local DuckLake maintenance checkpoint event, then reconciles accepted rows against committed row counts. Run it against a TSan/ASan build to catch races.