How to Stream OTLP to Cloudflare R2 Data Catalog
Use the duckdb-otlp Docker image in r2-data-catalog mode to stream OTLP/HTTP exports into an Iceberg catalog hosted by Cloudflare R2 Data Catalog.
The container initializes DuckDB, loads the required extensions, attaches the R2 Data Catalog warehouse, starts the ingest server, and commits accepted rows in batches.
Choose R2 Data Catalog when you want Cloudflare to host an Iceberg REST catalog for an R2 bucket. To write Iceberg metadata and Parquet files to a regular R2 bucket through the S3-compatible API, use a different catalog setup.
Live ingestion uses OTLP/HTTP on port 4318. WASM builds do not include the ingest server.
Create R2 Data Catalog resources
Choose a bucket name, and use a Cloudflare account and API token that can create R2 buckets and enable R2 Data Catalog:
export CLOUDFLARE_ACCOUNT_ID=<account-id>
export CLOUDFLARE_API_TOKEN=<r2-admin-read-write-token>
export R2_BUCKET_NAME="duckdb-otlp-r2catalog-${CLOUDFLARE_ACCOUNT_ID}"

Do not paste the API token into logs or source files. The token needs R2 storage read/write and R2 Data Catalog read/write permissions. Cloudflare’s R2 Admin Read & Write token includes both.
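A mistyped bucket name only surfaces when wrangler r2 bucket create fails. The function below is a minimal sketch of a local pre-check. It assumes the naming rules Cloudflare documents for R2 buckets: 3 to 63 characters, only lowercase letters, digits, and hyphens, and no leading or trailing hyphen. valid_r2_bucket_name is a hypothetical helper, not part of Wrangler or duckdb-otlp. Check the current R2 documentation in case the rules have changed.

```shell
# Hypothetical helper: succeeds (exit status 0) if $1 satisfies the R2 bucket
# naming rules assumed here: 3-63 characters drawn from lowercase letters,
# digits, and hyphens, with no hyphen at the start or end.
valid_r2_bucket_name() {
  name=$1
  len=${#name}
  if [ "$len" -lt 3 ] || [ "$len" -gt 63 ]; then
    return 1
  fi
  case $name in
    -* | *-) return 1 ;;  # leading or trailing hyphen
    *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;  # any other character
  esac
  return 0
}

# Example:
# valid_r2_bucket_name "$R2_BUCKET_NAME" || echo "invalid R2 bucket name: $R2_BUCKET_NAME" >&2
```

The character class lists the letters explicitly instead of using a range like a-z, because some locales treat ranges as collation ranges that also match uppercase letters.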
Create the bucket:
wrangler r2 bucket create "$R2_BUCKET_NAME"

Enable R2 Data Catalog on the bucket:

wrangler r2 bucket catalog enable "$R2_BUCKET_NAME"

Wrangler prints the catalog values DuckDB needs:

Catalog URI: 'https://catalog.cloudflarestorage.com/<account-id>/<bucket-name>'
Warehouse: '<account-id>_<bucket-name>'

Save them in your shell:
export R2_CATALOG_URI="https://catalog.cloudflarestorage.com/${CLOUDFLARE_ACCOUNT_ID}/${R2_BUCKET_NAME}"
export R2_WAREHOUSE="${CLOUDFLARE_ACCOUNT_ID}_${R2_BUCKET_NAME}"

Enable R2 Data Catalog table maintenance before sustained ingest. Live OTLP ingest commits rows in batches. Without Cloudflare-managed maintenance, those commits can leave many small data files and table snapshots behind.
wrangler r2 bucket catalog compaction enable "$R2_BUCKET_NAME" \
  --target-size 128 \
  --token "$CLOUDFLARE_API_TOKEN"

wrangler r2 bucket catalog snapshot-expiration enable "$R2_BUCKET_NAME" \
  --token "$CLOUDFLARE_API_TOKEN" \
  --older-than-days 7 \
  --retain-last 10

Compaction combines small files into larger files for faster queries. Catalog-level compaction also applies retroactively to existing tables. Snapshot expiration removes old Iceberg snapshots and unreferenced files according to the retention policy. With the flags above, compaction targets 128 MB files, and snapshot expiration removes snapshots older than 7 days while retaining the 10 most recent. Snapshot expiration requires Wrangler 4.56.0 or newer.
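Snapshot expiration needs Wrangler 4.56.0 or newer, so you may want to check the installed version before running the command above. The sketch below compares two versions in POSIX shell. version_at_least is a hypothetical helper, and it assumes plain MAJOR.MINOR.PATCH strings without pre-release suffixes.

```shell
# Hypothetical helper: succeeds if version $2 is at least version $1.
# Both arguments must be plain MAJOR.MINOR.PATCH strings such as 4.56.0.
version_at_least() {
  want_major=${1%%.*}; want_rest=${1#*.}
  want_minor=${want_rest%%.*}; want_patch=${want_rest#*.}
  have_major=${2%%.*}; have_rest=${2#*.}
  have_minor=${have_rest%%.*}; have_patch=${have_rest#*.}

  # Compare numerically, most significant component first.
  for pair in "$have_major $want_major" "$have_minor $want_minor" "$have_patch $want_patch"; do
    set -- $pair
    if [ "$1" -gt "$2" ]; then return 0; fi
    if [ "$1" -lt "$2" ]; then return 1; fi
  done
  return 0  # all components are equal
}

# Example, assuming `wrangler --version` prints a MAJOR.MINOR.PATCH number:
# v=$(wrangler --version | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
# version_at_least 4.56.0 "$v" || echo "Wrangler $v is too old for snapshot expiration" >&2
```

The comparison is numeric per component, so 4.100.0 correctly counts as newer than 4.56.0, which a plain string comparison would get wrong.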
You also need an R2 S3-compatible access key pair that can write objects to the bucket. In the next step, you will add its access key ID and secret access key to cloudflare.env as CLOUDFLARE_ACCESS_KEY_ID and CLOUDFLARE_SECRET_ACCESS_KEY.
Configure
Create cloudflare.env:
DUCKDB_MODE=r2-data-catalog
DUCKDB_OTLP_TOKEN=dev-token-123456

DUCKDB_CATALOG=r2catalog
DUCKDB_SCHEMA=otlp

DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456

CLOUDFLARE_ACCOUNT_ID=<account-id>
CLOUDFLARE_R2_BUCKET=<bucket-name>
CLOUDFLARE_ACCESS_KEY_ID=<r2-s3-access-key-id>
CLOUDFLARE_SECRET_ACCESS_KEY=<r2-s3-secret-access-key>
CLOUDFLARE_CATALOG_URI=https://catalog.cloudflarestorage.com/<account-id>/<bucket-name>
CLOUDFLARE_CATALOG_TOKEN=<r2-admin-read-write-token>

Start the server
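Before you start the container, you can catch part of the first configuration problem listed under Troubleshooting locally. The function below is a minimal sketch and is not part of the image. check_cloudflare_env and env_value are hypothetical helpers that assume cloudflare.env uses plain KEY=VALUE lines like the example above, with no quotes and no export. The check reports empty values, angle-bracket placeholders that were not replaced, and a CLOUDFLARE_CATALOG_URI that does not match CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_R2_BUCKET. It cannot verify that the tokens or keys are valid.

```shell
# Hypothetical helper: print the value of key $2 from env file $1.
# Assumes plain KEY=VALUE lines; the last occurrence of a key wins.
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Hypothetical pre-flight check for cloudflare.env (not part of duckdb-otlp).
check_cloudflare_env() {
  env_file=$1
  status=0
  # Keys set in the example cloudflare.env from this guide.
  for key in DUCKDB_MODE DUCKDB_OTLP_TOKEN DUCKDB_CATALOG DUCKDB_SCHEMA \
    DUCKDB_QUACK_ENABLED DUCKDB_QUACK_ADDR DUCKDB_QUACK_TOKEN \
    CLOUDFLARE_ACCOUNT_ID CLOUDFLARE_R2_BUCKET \
    CLOUDFLARE_ACCESS_KEY_ID CLOUDFLARE_SECRET_ACCESS_KEY \
    CLOUDFLARE_CATALOG_URI CLOUDFLARE_CATALOG_TOKEN; do
    value=$(env_value "$env_file" "$key")
    case $value in
      '') echo "missing or empty: $key" >&2; status=1 ;;
      *'<'*'>'*) echo "placeholder not replaced: $key" >&2; status=1 ;;
    esac
  done

  # The catalog URI must point at the same account and bucket.
  expected="https://catalog.cloudflarestorage.com/$(env_value "$env_file" CLOUDFLARE_ACCOUNT_ID)/$(env_value "$env_file" CLOUDFLARE_R2_BUCKET)"
  actual=$(env_value "$env_file" CLOUDFLARE_CATALOG_URI)
  if [ "$actual" != "$expected" ]; then
    echo "CLOUDFLARE_CATALOG_URI is $actual, expected $expected" >&2
    status=1
  fi
  return "$status"
}

# Example:
# check_cloudflare_env cloudflare.env || echo "fix cloudflare.env before docker run" >&2
```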
docker run --rm --name duckdb-otlp \
  --env-file cloudflare.env \
  -p 4318:4318 \
  -p 9494:9494 \
  ghcr.io/smithclay/duckdb-otlp:latest

The container creates these Iceberg tables in the R2 Data Catalog namespace if they do not exist:
r2catalog.otlp.otlp_logs
r2catalog.otlp.otlp_traces
r2catalog.otlp.otlp_metrics_gauge
r2catalog.otlp.otlp_metrics_sum
r2catalog.otlp.otlp_metrics_histogram
r2catalog.otlp.otlp_metrics_exp_histogram
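The six names share one pattern: the catalog and schema from DUCKDB_CATALOG and DUCKDB_SCHEMA, followed by a fixed table name. Assuming that mapping also holds when you change those variables, a small hypothetical helper can print the names for queries or for the DROP TABLE statements in Clean up.

```shell
# Hypothetical helper: print the fully qualified OTLP table names.
# Assumes names follow <catalog>.<schema>.<table>; the defaults match this guide.
otlp_tables() {
  catalog=${1:-r2catalog}
  schema=${2:-otlp}
  for table in otlp_logs otlp_traces otlp_metrics_gauge otlp_metrics_sum \
    otlp_metrics_histogram otlp_metrics_exp_histogram; do
    printf '%s.%s.%s\n' "$catalog" "$schema" "$table"
  done
}

# Example: generate DROP TABLE statements like the ones used in Clean up.
# otlp_tables | sed 's/.*/DROP TABLE IF EXISTS &;/'
```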
POST a log record
In another terminal:
curl -sS http://localhost:4318/v1/logs \
  -H 'Authorization: Bearer dev-token-123456' \
  -H 'Content-Type: application/json' \
  -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"r2-data-catalog-demo"}},{"key":"deployment.environment","value":{"stringValue":"docs"}}]},"scopeLogs":[{"scope":{"name":"duckdb-otlp-guide"},"logRecords":[{"timeUnixNano":"1704067200000000000","observedTimeUnixNano":"1704067200123456789","severityNumber":9,"severityText":"INFO","body":{"stringValue":"hello from Cloudflare R2 Data Catalog"},"attributes":[{"key":"guide","value":{"stringValue":"stream-to-r2-data-catalog"}}]}]}]}]}'

Response:
{"status":"buffered","rows":1,"batches":1}

The server accepts rows before they are durable. Buffered rows are committed automatically in the background, on graceful shutdown, or immediately after an explicit flush.
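The example payload uses fixed timestamps from 2024-01-01. To send test records stamped with the current time, you could build the same OTLP/HTTP JSON structure in the shell. This is a minimal sketch: otlp_log_payload and json_escape are hypothetical helpers, only backslashes and double quotes are escaped, and the message is assumed to be a single line. Timestamps have whole-second resolution because date +%s reports seconds.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: print one OTLP log record as OTLP/HTTP JSON.
# $1 = service.name resource attribute, $2 = log body (single line).
otlp_log_payload() {
  service=$(json_escape "$1")
  message=$(json_escape "$2")
  # OTLP uses nanoseconds since the Unix epoch; date +%s gives whole seconds.
  now_ns="$(date +%s)000000000"
  printf '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"%s"}}]},"scopeLogs":[{"scope":{"name":"duckdb-otlp-guide"},"logRecords":[{"timeUnixNano":"%s","observedTimeUnixNano":"%s","severityNumber":9,"severityText":"INFO","body":{"stringValue":"%s"}}]}]}]}\n' \
    "$service" "$now_ns" "$now_ns" "$message"
}

# Example: POST the generated payload, reading the request body from stdin.
# otlp_log_payload r2-data-catalog-demo "hello again" |
#   curl -sS http://localhost:4318/v1/logs \
#     -H 'Authorization: Bearer dev-token-123456' \
#     -H 'Content-Type: application/json' \
#     --data-binary @-
```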
Query committed rows
Flush and query through Quack from a host DuckDB process:
The server image is distroless and has no shell or DuckDB CLI, so do not use docker exec ... sh -c for inspection SQL. The examples in this guide enable Quack and publish port 9494 for this purpose.
duckdb <<'SQL'
INSTALL quack;
LOAD quack;

FROM quack_query(
  'quack:localhost:9494',
  'SELECT * FROM otlp_flush(''otlp:0.0.0.0:4318'')',
  token = 'dev-quack-token-123456'
);

FROM quack_query(
  'quack:localhost:9494',
  $$
    SELECT service_name, severity_text, body
    FROM r2catalog.otlp.otlp_logs
    WHERE service_name = 'r2-data-catalog-demo'
    ORDER BY time_unix_nano DESC
    LIMIT 5
  $$,
  token = 'dev-quack-token-123456'
);
SQL

Stop cleanly
If you plan to delete the R2 Data Catalog resources immediately, skip this step and use Clean up instead.
docker stop duckdb-otlp
The image runs otlp_stop('otlp:0.0.0.0:4318') during shutdown, so remaining buffered rows are committed before the process exits.
Clean up
Drop the Iceberg tables before disabling the catalog and deleting the R2 bucket:
duckdb <<'SQL'
INSTALL quack;
LOAD quack;

FROM quack_query(
  'quack:localhost:9494',
  $$
    SELECT status FROM otlp_stop('otlp:0.0.0.0:4318');
    DROP TABLE IF EXISTS r2catalog.otlp.otlp_logs;
    DROP TABLE IF EXISTS r2catalog.otlp.otlp_traces;
    DROP TABLE IF EXISTS r2catalog.otlp.otlp_metrics_gauge;
    DROP TABLE IF EXISTS r2catalog.otlp.otlp_metrics_sum;
    DROP TABLE IF EXISTS r2catalog.otlp.otlp_metrics_histogram;
    DROP TABLE IF EXISTS r2catalog.otlp.otlp_metrics_exp_histogram;
    DETACH r2catalog;
  $$,
  token = 'dev-quack-token-123456'
);
SQL
docker stop duckdb-otlp

Then disable the catalog and delete the bucket:

wrangler r2 bucket catalog disable "$R2_BUCKET_NAME"
wrangler r2 bucket delete "$R2_BUCKET_NAME"

If bucket deletion reports that the bucket is not empty, delete the remaining catalog objects and retry. For a bucket you created for this guide, those objects are the Iceberg metadata and data files under __r2_data_catalog/:
export R2_OBJECTS_API="https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/r2/buckets/${R2_BUCKET_NAME}/objects"
# List object keys, percent-encode each one with jq's @uri filter, then delete it.
curl -fsS \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  "$R2_OBJECTS_API" |
  jq -r '.result[]?.key | @uri' |
  while IFS= read -r encoded_key; do
    curl -fsS -X DELETE \
      -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
      "${R2_OBJECTS_API}/${encoded_key}" >/dev/null
  done
wrangler r2 bucket delete "$R2_BUCKET_NAME"

Troubleshooting
- If the container cannot attach the catalog at startup, confirm that CLOUDFLARE_CATALOG_URI, CLOUDFLARE_CATALOG_TOKEN, CLOUDFLARE_ACCOUNT_ID, and CLOUDFLARE_R2_BUCKET all refer to the same bucket.
- If the container attaches the catalog but cannot write files, confirm that CLOUDFLARE_ACCESS_KEY_ID and CLOUDFLARE_SECRET_ACCESS_KEY can write objects to the R2 bucket.
- If no rows appear after a 202 response, run the flush command before querying.
- If bucket deletion reports that the bucket is not empty, delete the remaining catalog objects as shown in Clean up, then retry wrangler r2 bucket delete.
- R2 Data Catalog supports R2 buckets in the default jurisdiction.
- R2 Data Catalog stores live ingest timestamp columns with microsecond precision because the Iceberg catalog does not accept DuckDB TIMESTAMP_NS columns.
- If DuckDB reports unsupported catalog checkpointing, no action is required; ingest, flush, and stop durability behavior stays the same.
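To see what microsecond storage means for the record from POST a log record, compare its observedTimeUnixNano value with the same instant in microseconds. The sketch below assumes the conversion truncates the last three digits instead of rounding them, because the exact conversion is not documented here. ns_to_us is a hypothetical helper that relies on the shell's 64-bit integer arithmetic.

```shell
# Hypothetical helper: convert an OTLP nanosecond timestamp to microseconds,
# assuming truncation, and report how many nanoseconds are dropped.
ns_to_us() {
  ns=$1
  us=$(( ns / 1000 ))
  dropped=$(( ns % 1000 ))
  printf '%s us (drops %s ns)\n' "$us" "$dropped"
}

ns_to_us 1704067200123456789
# prints: 1704067200123456 us (drops 789 ns)
```

Under that assumption, two records whose timestamps differ only in the last three digits would sort as equal in the stored columns.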