Architecture
This page describes the internal architecture of the DuckDB OpenTelemetry Extension.
Overview
The DuckDB OpenTelemetry Extension exposes typed readers for OpenTelemetry Protocol (OTLP) telemetry files. It reads traces, logs, and metrics from JSON and protobuf OTLP exports.
Extension Type
The extension includes:
- Table functions: read_otlp_traces, read_otlp_logs
- Metrics functions: read_otlp_metrics_gauge, read_otlp_metrics_sum, read_otlp_metrics_histogram, read_otlp_metrics_exp_histogram
- Live ingest functions: otlp_serve, otlp_flush, otlp_stop, otlp_server_list (native builds only). See OTLP HTTP Ingest Server.
How It Works
OpenTelemetry   File           duckdb-otlp        SQL
Collector       Exporter       Extension          Results
    │              │               │                 │
    │  OTLP/gRPC   │               │                 │
    ├─────────────►│  .jsonl/.pb   │                 │
    │              ├──────────────►│  read_otlp_*()  │
    │              │               ├────────────────►│
    │              │               │                 │

The extension reads OTLP files (JSON or protobuf), detects the format, and streams typed rows into DuckDB tables. Schemas use snake_case names and follow the OpenTelemetry ClickHouse exporter shape.
Core Components
Rust Backend
Location: external/otlp2records
Parses OTLP JSON, JSONL, and protobuf payloads and emits Arrow arrays through the Arrow C Data Interface.
Arrow Bridge
Location: src/otlp_arrow.cpp, src/function/read_otlp.cpp
Converts Arrow arrays from the Rust backend into DuckDB DataChunks for the table functions.
Schema Definitions
Location: src/schema/*.hpp
Centralized column layouts for traces, logs, and shape-specific metrics. See the Schema Reference for complete column details.
Data Flow
User: SELECT * FROM read_otlp_metrics_gauge('metrics.pb')
    ↓
otlp2records parses OTLP JSON/protobuf into Arrow arrays
    ↓
Arrow bridge copies arrays into DuckDB DataChunks

OTLP HTTP Ingest Server
Alongside the file readers, the extension can run an embedded HTTP server that accepts live OTLP/HTTP exports and streams them into a DuckDB catalog: the connection’s default in-memory/file catalog, an attached DuckLake lakehouse, or another writable catalog such as an Iceberg REST catalog. The extension registers it as a storage extension, so the database owns running servers and tears them down with it. See the Serve Reference for the SQL API.
The server requires a native build. Live ingestion uses HTTP only, with no gRPC listener.
The server buffers ingest and commits rows in batches. Worker threads validate, convert, and append rows into an in-memory buffer, then return 202. A single background writer commits the buffer to the target in one transaction. That model keeps the DuckLake path practical: one Parquet data file per signal per batch commit, and one serialized writer that avoids DuckLake optimistic-concurrency retries.
Components
| Component | Location | Role |
|---|---|---|
| OtlpServer | src/otlp_server.cpp / src/include/otlp_server.hpp | Base server: token validation/auth, content-type → format selection, Arrow → DuckDB conversion, the in-memory buffer, and the background writer that commits batches into the target catalog. |
| HttpOtlpServer | src/otlp_server.cpp | OtlpServer subclass wrapping httplib. Owns the worker pool and the /v1/logs, /v1/traces, /v1/metrics, /healthz, and /readyz routes; binds the socket synchronously so callers see bind failures. |
| OtlpStorageExtensionInfo | src/include/otlp_storage.hpp | Database-scoped registry of running servers (keyed by listen URI). Backs CreateServer / FlushServer / StopServer / ListServers, and stops every server when the database closes. It cannot commit buffered rows at that point; see the durability note below. |
| Lifecycle functions | src/otlp_start_stop.cpp | The otlp_serve, otlp_flush, otlp_stop, and otlp_server_list table functions that drive the registry. |
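The auth check and content-type → format selection that the table assigns to OtlpServer can be sketched in a few lines. This is a Python illustration, not the C++ implementation: the function names, the lowercase-header convention, and the single shared token are assumptions. The JSON and protobuf media types come from the OTLP/HTTP specification; the NDJSON media type used here is a guess.

```python
import hmac
from typing import Optional

# application/json and application/x-protobuf are the OTLP/HTTP media types.
# application/x-ndjson is an assumption made for this sketch.
_FORMATS = {
    "application/json": "json",
    "application/x-ndjson": "ndjson",
    "application/x-protobuf": "protobuf",
}


def format_from_content_type(header: Optional[str]) -> Optional[str]:
    """Return the payload format for a Content-Type header, or None if unsupported."""
    if not header:
        return None
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = header.split(";", 1)[0].strip().lower()
    return _FORMATS.get(media_type)


def check_auth(headers: dict, expected_token: str) -> bool:
    """Accept 'authorization: Bearer <token>' or 'x-api-key: <token>'.

    Assumes header names were lowercased by the caller (hypothetical convention).
    """
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        presented = auth[len("bearer "):].strip()
    else:
        presented = headers.get("x-api-key", "")
    # Constant-time comparison avoids leaking token prefixes through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

A request whose format resolves to None would be rejected before any parsing work; which status code the real server returns in that case is not stated on this page.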
Request flow
Exporter: POST http://localhost:4318/v1/logs (Bearer token, OTLP body)
    ↓
HttpOtlpServer route → CheckAuth (Bearer or x-api-key)
    ↓
FormatFromContentType (json / ndjson / protobuf)
    ↓
Worker thread: reserve admission bytes -> otlp_transform (FFI) -> convert -> append to per-signal buffer
    ↓
202 {"status":"buffered","rows":N,"batches":M} (NOT yet durable)

... asynchronously ...

Background writer (trigger: internal size/age threshold; optional otlp_flush)
    → one transaction
    → for DuckLake: one Parquet data file per signal + one snapshot
    → COMMIT
    → after conservative automatic row-seal cadence when there is admission headroom: best-effort CHECKPOINT <catalog> outside the ingest transaction

Batch commit model and durability
A single background writer commits the buffer to the target catalog when any trigger fires: admitted request-body bytes reach the internal size threshold, 64 MiB today; the oldest buffered row reaches the internal age limit, about 5 seconds today; or a caller runs otlp_flush. Each batch commit is one transaction.
- A 202 is not durable; rows become durable at the next background commit, on otlp_stop, or on otlp_flush. A crash loses buffered-but-uncommitted rows (at-most-once for that window).
- otlp_stop and otlp_flush commit remaining buffered rows before returning, so those calls lose no accepted rows. A plain database/connection close does NOT commit buffered rows. The shutdown drain runs after DuckDB tears down the instance, when db_ptr can no longer write, so DuckDB can drop buffered rows. Prefer otlp_stop before closing the database. Use otlp_flush when the server should stay running but readers need durable rows now. The project tracks a durable raw-spool journal and earlier shutdown hook for at-least-once delivery.
- After successful automatic row-seals into a named catalog, the writer may run non-force CHECKPOINT <catalog> as best-effort catalog-native maintenance when recent ingest rate and pending bytes leave ample admission headroom. The writer runs maintenance after the ingest transaction commits. It skips the default catalog; sustained high ingest and high pending buffered bytes defer maintenance; explicit otlp_flush and shutdown drains skip the hook. The server logs unsupported catalog implementations once and disables maintenance for that server. DuckLake uses this hook to apply its own maintenance policy; duckdb-otlp has no custom compaction planner.
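The three commit triggers described in this section (size, age, explicit flush) amount to a small decision function. The sketch below is a hypothetical Python model, not the writer's actual code: the names BufferState and commit_trigger are invented for illustration, the order in which triggers are checked is an assumption, and the constants mirror the "today" values quoted above, which the page treats as internal and subject to change.

```python
from dataclasses import dataclass
from typing import Optional

SIZE_THRESHOLD_BYTES = 64 * 1024 * 1024  # internal size threshold, 64 MiB today
MAX_AGE_SECONDS = 5.0                    # internal age limit, about 5 seconds today


@dataclass
class BufferState:
    admitted_bytes: int             # request-body bytes admitted since the last commit
    oldest_row_at: Optional[float]  # monotonic time of the oldest buffered row; None if empty
    flush_requested: bool = False   # set when a caller runs otlp_flush


def commit_trigger(state: BufferState, now: float) -> Optional[str]:
    """Return which trigger fired, or None if the writer should keep waiting."""
    if state.flush_requested:
        return "flush"
    if state.oldest_row_at is None:
        return None  # nothing buffered, nothing to commit
    if state.admitted_bytes >= SIZE_THRESHOLD_BYTES:
        return "size"
    if now - state.oldest_row_at >= MAX_AGE_SECONDS:
        return "age"
    return None
```

Under this model, a small payload accepted with 202 can wait for up to the age limit before it is committed, which is the window in which a crash loses rows.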
Concurrency model
The server uses a bounded httplib worker pool like duckdb-quack, but sends all target writes through one background thread. The pool parses, converts, and buffers requests concurrently; each signal table has its own buffer lock. Serial writes let a DuckLake target avoid tiny-file churn and optimistic-concurrency retries. Backpressure: if request admission would exceed max_buffered_bytes (default 512 MiB) across in-flight and uncommitted accepted payloads, POSTs return 503 before parse/transform work and clients should retry with backoff.
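The backpressure rule can be modeled as a byte counter guarded by a lock. This is a minimal Python sketch under stated assumptions: AdmissionController, try_admit, release, and status_for are hypothetical names, and the real server does its parse, transform, and buffer work between admission and the 202 response, which the sketch leaves out.

```python
import threading


class AdmissionController:
    """Tracks bytes held by in-flight and accepted-but-uncommitted payloads."""

    def __init__(self, max_buffered_bytes: int = 512 * 1024 * 1024):
        self._max = max_buffered_bytes
        self._held = 0
        self._lock = threading.Lock()

    def try_admit(self, body_bytes: int) -> bool:
        """Reserve space for a request body; False means answer 503."""
        with self._lock:
            if self._held + body_bytes > self._max:
                return False
            self._held += body_bytes
            return True

    def release(self, body_bytes: int) -> None:
        """Return bytes once a batch commits or a request fails after admission."""
        with self._lock:
            self._held = max(0, self._held - body_bytes)


def status_for(controller: AdmissionController, body_bytes: int) -> int:
    """HTTP status a worker would return based on admission alone."""
    return 202 if controller.try_admit(body_bytes) else 503
```

Checking admission before any parsing keeps a rejected request cheap, so an overloaded server spends its time draining the buffer.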
Key Design Decisions
Schema Design
The table functions emit schemas inspired by the OpenTelemetry ClickHouse exporter, with all column names in snake_case:
- Traces: 24 columns covering identifiers, scope metadata, resource attributes, events, links, and computed duration
- Logs: 18 columns with severity, body, resource/scope maps, and trace correlation fields
- Metrics (gauge): 17 columns with timestamp, service info, metric metadata, and numeric value fields
- Metrics (sum): 19 columns (gauge columns plus aggregation temporality and is_monotonic)
- Metrics (histogram): 22 columns with counts, sum, min/max, explicit bounds, and bucket counts
- Metrics (exponential histogram): 27 columns with scale, zero bucket, and positive/negative bucket data
Streaming Architecture
The extension streams data through DuckDB’s scan interface without loading entire files into memory. This supports:
- Files larger than available RAM
- Lower memory use for large datasets
- Glob pattern scans across many files
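The idea behind streaming scans is that a reader only ever holds one chunk of rows, not the whole file. The Python generator below illustrates that pattern for a JSONL input. It is a sketch, not how the extension works internally: iter_chunks is a hypothetical name, the chunk size of 2048 is an assumption borrowed from DuckDB's default vector size, and each line is treated as one record, whereas a real OTLP line can expand into many rows.

```python
import io
import json
from typing import Iterator, List, TextIO

CHUNK_ROWS = 2048  # assumption: mirrors DuckDB's default vector size


def iter_chunks(lines: TextIO, chunk_rows: int = CHUNK_ROWS) -> Iterator[List[dict]]:
    """Yield lists of parsed JSONL records, holding at most chunk_rows at a time."""
    chunk: List[dict] = []
    for line in lines:  # file objects are read lazily, one line per iteration
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk.append(json.loads(line))
        if len(chunk) == chunk_rows:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final partial chunk


# Usage: three records read in chunks of two.
sample = io.StringIO('{"i": 0}\n{"i": 1}\n{"i": 2}\n')
sizes = [len(chunk) for chunk in iter_chunks(sample, chunk_rows=2)]
```

Because the consumer pulls chunks on demand, peak memory depends on the chunk size, not the file size, and the same loop can be run over each file matched by a glob.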
File Organization
src/
├── include/      # Public headers (forwarding to implementation dirs)
├── storage/      # Extension entry point registration
├── function/     # Table function implementations (`read_otlp_*`)
├── schema/       # Column layout helpers
├── generated/    # Protobuf message stubs (DO NOT EDIT)
└── wasm/         # WASM build configuration

test/
├── sql/          # SQLLogicTests (primary test format)
└── data/         # Test data (OTLP JSON/protobuf files)

site/
├── src/content/docs/    # Astro/Starlight documentation pages
└── public/wasm-demo/    # Browser demo, WASM extension, and sample OTLP files

Generated Code
The src/generated/ directory contains protobuf message stubs generated from OpenTelemetry .proto files (*.pb.h, *.pb.cc). These provide the message types consumed by parsers/protobuf_parser.cpp.
Do not edit these files directly. The build excludes them from formatting and linting.
Dependencies
Managed via VCPKG:
- Protobuf - Wire format parsing for binary OTLP files in builds that include protobuf support
Python dependencies (via uv):
- black - Python formatting
- clang-format/clang-tidy - C++ formatting/linting
- pre-commit - Git hooks
Known Limitations
- Live OTLP ingestion supports HTTP only (otlp_serve / otlp_flush / otlp_stop / otlp_server_list). There is no gRPC listener. See OTLP HTTP Ingest Server. Ingest is buffered (a POST returns 202) and becomes durable at the next background commit, on otlp_flush, or on a graceful otlp_stop; a crash loses buffered-but-uncommitted rows (at-most-once). The project tracks a durable raw-spool journal for at-least-once delivery. The server requires a native build.
- WASM builds support JSON, JSONL, and protobuf file reads only. Native builds add live ingest.
- Protobuf parsing requires the protobuf runtime in builds that include protobuf support.
- Summary metrics are registered placeholders.
- The union metrics function (read_otlp_metrics) is a registered placeholder; use the shape-specific metric readers.
Building
See CONTRIBUTING.md for build instructions.
See Also
- API Reference - Table function signatures and parameters
- Schema Reference - Complete column layouts