How to Stream OTLP to Cloudflare R2 Data Catalog

Use the duckdb-otlp Docker image in r2-data-catalog mode to stream OTLP/HTTP exports into an Iceberg catalog hosted by Cloudflare R2 Data Catalog.

The container initializes DuckDB, loads the required extensions, attaches the R2 Data Catalog warehouse, starts the ingest server, and commits accepted rows in batches.

Choose R2 Data Catalog when you want Cloudflare to host an Iceberg REST catalog for an R2 bucket. If you instead want to write Iceberg metadata and Parquet files directly to a regular R2 bucket through the S3-compatible API, use a different catalog setup.

Live ingestion uses OTLP/HTTP on port 4318. WASM builds do not include the ingest server.

Create R2 Data Catalog resources

Choose a bucket name, and use a Cloudflare account and API token that can create R2 buckets and enable R2 Data Catalog:

export CLOUDFLARE_ACCOUNT_ID=<account-id>
export CLOUDFLARE_API_TOKEN=<r2-admin-read-write-token>
export R2_BUCKET_NAME="duckdb-otlp-r2catalog-${CLOUDFLARE_ACCOUNT_ID}"
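
Before creating the bucket, it can help to check the chosen name locally. The sketch below assumes R2's bucket naming rules as they are commonly documented (3 to 63 characters; lowercase letters, digits, and hyphens only; no leading or trailing hyphen). valid_r2_bucket_name is a hypothetical helper for illustration, not part of Wrangler or duckdb-otlp.

```shell
# Hypothetical helper: succeeds when the name fits the assumed R2 naming rules.
# The body runs in a subshell so the LC_ALL change does not leak to the caller.
valid_r2_bucket_name() (
  LC_ALL=C            # make the a-z range match ASCII lowercase only
  name=$1
  # Length must be between 3 and 63 characters.
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
  case $name in
    *[!a-z0-9-]*) return 1 ;;  # a character outside a-z, 0-9, or hyphen
    -* | *-)      return 1 ;;  # leading or trailing hyphen
  esac
  return 0
)
```

For example, valid_r2_bucket_name "$R2_BUCKET_NAME" || echo "invalid bucket name" >&2 reports a problem before Wrangler is called.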

Do not paste the API token into logs or source files. The token needs R2 storage read/write and R2 Data Catalog read/write permissions. Cloudflare’s R2 Admin Read & Write token includes both.

Create the bucket:

wrangler r2 bucket create "$R2_BUCKET_NAME"

Enable R2 Data Catalog on the bucket:

wrangler r2 bucket catalog enable "$R2_BUCKET_NAME"

Wrangler prints the catalog values DuckDB needs:

Catalog URI: 'https://catalog.cloudflarestorage.com/<account-id>/<bucket-name>'
Warehouse: '<account-id>_<bucket-name>'

Save them in your shell:

export R2_CATALOG_URI="https://catalog.cloudflarestorage.com/${CLOUDFLARE_ACCOUNT_ID}/${R2_BUCKET_NAME}"
export R2_WAREHOUSE="${CLOUDFLARE_ACCOUNT_ID}_${R2_BUCKET_NAME}"

Enable R2 Data Catalog table maintenance before sustained ingest. Live OTLP ingest commits rows in batches, so without Cloudflare-managed maintenance the tables accumulate many small data files and snapshots.

wrangler r2 bucket catalog compaction enable "$R2_BUCKET_NAME" \
  --target-size 128 \
  --token "$CLOUDFLARE_API_TOKEN"

wrangler r2 bucket catalog snapshot-expiration enable "$R2_BUCKET_NAME" \
  --token "$CLOUDFLARE_API_TOKEN" \
  --older-than-days 7 \
  --retain-last 10

Compaction combines small files into larger files for faster queries; catalog-level compaction applies retroactively to existing tables. Snapshot expiration removes old Iceberg snapshots and unreferenced files according to the retention policy. Snapshot expiration requires Wrangler 4.56.0 or newer.

You also need an R2 S3-compatible access key pair that can write objects to the bucket. Save those values as CLOUDFLARE_ACCESS_KEY_ID and CLOUDFLARE_SECRET_ACCESS_KEY in the next step.

Configure

Create cloudflare.env:

DUCKDB_MODE=r2-data-catalog
DUCKDB_OTLP_TOKEN=dev-token-123456

DUCKDB_CATALOG=r2catalog
DUCKDB_SCHEMA=otlp

DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456

CLOUDFLARE_ACCOUNT_ID=<account-id>
CLOUDFLARE_R2_BUCKET=<bucket-name>
CLOUDFLARE_ACCESS_KEY_ID=<r2-s3-access-key-id>
CLOUDFLARE_SECRET_ACCESS_KEY=<r2-s3-secret-access-key>
CLOUDFLARE_CATALOG_URI=https://catalog.cloudflarestorage.com/<account-id>/<bucket-name>
CLOUDFLARE_CATALOG_TOKEN=<r2-admin-read-write-token>
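
Docker reads --env-file values literally and does not expand shell variables, so the placeholders above have to be replaced with real values. One way to do that is to generate the file from the variables exported earlier. This is a minimal sketch: write_cloudflare_env, R2_ACCESS_KEY_ID, and R2_SECRET_ACCESS_KEY are hypothetical names chosen for illustration, and the two DUCKDB tokens are the example values from this guide.

```shell
# Hypothetical helper: writes cloudflare.env (or the path given as $1)
# from shell variables. Runs in a subshell so umask stays local.
write_cloudflare_env() (
  out=${1:-cloudflare.env}
  # Stop with an error if any required input is unset or empty.
  : "${CLOUDFLARE_ACCOUNT_ID:?}" "${R2_BUCKET_NAME:?}" "${R2_CATALOG_URI:?}"
  : "${CLOUDFLARE_API_TOKEN:?}" "${R2_ACCESS_KEY_ID:?}" "${R2_SECRET_ACCESS_KEY:?}"
  umask 077   # a newly created file is readable by the current user only
  cat >"$out" <<EOF
DUCKDB_MODE=r2-data-catalog
DUCKDB_OTLP_TOKEN=dev-token-123456

DUCKDB_CATALOG=r2catalog
DUCKDB_SCHEMA=otlp

DUCKDB_QUACK_ENABLED=1
DUCKDB_QUACK_ADDR=0.0.0.0:9494
DUCKDB_QUACK_TOKEN=dev-quack-token-123456

CLOUDFLARE_ACCOUNT_ID=${CLOUDFLARE_ACCOUNT_ID}
CLOUDFLARE_R2_BUCKET=${R2_BUCKET_NAME}
CLOUDFLARE_ACCESS_KEY_ID=${R2_ACCESS_KEY_ID}
CLOUDFLARE_SECRET_ACCESS_KEY=${R2_SECRET_ACCESS_KEY}
CLOUDFLARE_CATALOG_URI=${R2_CATALOG_URI}
CLOUDFLARE_CATALOG_TOKEN=${CLOUDFLARE_API_TOKEN}
EOF
)
```

Set the two access key variables in the current shell, then run write_cloudflare_env. Keep the generated file out of version control, since it contains the tokens.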

Start the server

Prefer to run manually in the DuckDB shell? See Run manually.

docker run --rm --name duckdb-otlp \
  --env-file cloudflare.env \
  -p 4318:4318 \
  -p 9494:9494 \
  ghcr.io/smithclay/duckdb-otlp:latest

The container creates these Iceberg tables in the R2 Data Catalog namespace if they do not exist:

r2catalog.otlp.otlp_logs
r2catalog.otlp.otlp_traces
r2catalog.otlp.otlp_metrics_gauge
r2catalog.otlp.otlp_metrics_sum
r2catalog.otlp.otlp_metrics_histogram
r2catalog.otlp.otlp_metrics_exp_histogram

POST a log record

In another terminal:

curl -sS http://localhost:4318/v1/logs \
  -H 'Authorization: Bearer dev-token-123456' \
  -H 'Content-Type: application/json' \
  -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"r2-data-catalog-demo"}},{"key":"deployment.environment","value":{"stringValue":"docs"}}]},"scopeLogs":[{"scope":{"name":"duckdb-otlp-guide"},"logRecords":[{"timeUnixNano":"1704067200000000000","observedTimeUnixNano":"1704067200123456789","severityNumber":9,"severityText":"INFO","body":{"stringValue":"hello from Cloudflare R2 Data Catalog"},"attributes":[{"key":"guide","value":{"stringValue":"stream-to-r2-data-catalog"}}]}]}]}]}'

Response:

{"status":"buffered","rows":1,"batches":1}

Rows are accepted before they are durable. They commit automatically in the background, on graceful shutdown, or immediately after an explicit flush.
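
The sample request uses a fixed timestamp (1704067200 seconds, 2024-01-01 UTC). To send records stamped with the current time, the payload can be built in the shell. This is a rough sketch: otlp_log_payload is a hypothetical helper, it trims the record to a few fields from the request above, and it does no JSON escaping.

```shell
# Hypothetical helper: prints an OTLP/JSON logs payload stamped with the current time.
# $1 = log message; it must not contain double quotes or backslashes,
# because this sketch does no JSON escaping.
otlp_log_payload() {
  now_ns="$(date +%s)000000000"  # current whole seconds, written as nanoseconds
  printf '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"r2-data-catalog-demo"}}]},"scopeLogs":[{"scope":{"name":"duckdb-otlp-guide"},"logRecords":[{"timeUnixNano":"%s","severityNumber":9,"severityText":"INFO","body":{"stringValue":"%s"}}]}]}]}\n' "$now_ns" "$1"
}

# Usage, with the server from this guide running:
# curl -sS http://localhost:4318/v1/logs \
#   -H 'Authorization: Bearer dev-token-123456' \
#   -H 'Content-Type: application/json' \
#   -d "$(otlp_log_payload 'hello again')"
```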

Query committed rows

The server image is distroless and has no shell or DuckDB CLI, so docker exec ... sh -c cannot run inspection SQL. The examples in this guide enable Quack and publish port 9494 for this purpose.

Flush and query through Quack from a DuckDB process on the host:

duckdb <<'SQL'
INSTALL quack;
LOAD quack;

FROM quack_query(
  'quack:localhost:9494',
  'SELECT * FROM otlp_flush(''otlp:0.0.0.0:4318'')',
  token = 'dev-quack-token-123456'
);

FROM quack_query(
  'quack:localhost:9494',
  $$
  SELECT service_name, severity_text, body
  FROM r2catalog.otlp.otlp_logs
  WHERE service_name = 'r2-data-catalog-demo'
  ORDER BY time_unix_nano DESC
  LIMIT 5
  $$,
  token = 'dev-quack-token-123456'
);
SQL
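
Each remote statement above repeats the same quack_query wrapper. For ad hoc inspection, a small shell function can generate that wrapper around any SQL text. This is only a sketch: quack_sql is a hypothetical helper, it assumes the address and token used in this guide, and reading the token from a DUCKDB_QUACK_TOKEN shell variable is an assumption made here for convenience.

```shell
# Hypothetical helper: wraps the SQL in $1 in a quack_query call and prints
# the script for the DuckDB CLI. $1 must not contain the $$ delimiter.
quack_sql() {
  cat <<EOF
INSTALL quack;
LOAD quack;

FROM quack_query(
  'quack:localhost:9494',
  \$\$
  $1
  \$\$,
  token = '${DUCKDB_QUACK_TOKEN:-dev-quack-token-123456}'
);
EOF
}

# Usage, with the container from this guide running:
# quack_sql "SELECT count(*) FROM r2catalog.otlp.otlp_logs" | duckdb
```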

Stop cleanly

If you plan to delete the R2 Data Catalog resources immediately, skip this step and use Clean up instead.

docker stop duckdb-otlp

The image calls otlp_stop('otlp:0.0.0.0:4318') during shutdown, so any rows still buffered are committed before the process exits.

Clean up

Stop ingestion and drop the Iceberg tables through Quack, then stop the container, before disabling the catalog and deleting the R2 bucket:

duckdb <<'SQL'
INSTALL quack;
LOAD quack;

FROM quack_query(
  'quack:localhost:9494',
  $$
  SELECT status FROM otlp_stop('otlp:0.0.0.0:4318');
  DROP TABLE IF EXISTS r2catalog.otlp.otlp_logs;
  DROP TABLE IF EXISTS r2catalog.otlp.otlp_traces;
  DROP TABLE IF EXISTS r2catalog.otlp.otlp_metrics_gauge;
  DROP TABLE IF EXISTS r2catalog.otlp.otlp_metrics_sum;
  DROP TABLE IF EXISTS r2catalog.otlp.otlp_metrics_histogram;
  DROP TABLE IF EXISTS r2catalog.otlp.otlp_metrics_exp_histogram;
  DETACH r2catalog;
  $$,
  token = 'dev-quack-token-123456'
);
SQL

docker stop duckdb-otlp
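
The DROP statements in the cleanup query follow one pattern per table. If DUCKDB_CATALOG or DUCKDB_SCHEMA was changed from the values in this guide, the statements can be generated rather than edited by hand. A minimal sketch, where otlp_drop_statements is a hypothetical helper and the table list is the one this guide says the container creates:

```shell
# Hypothetical helper: prints one DROP statement per table the guide creates.
# $1 = catalog name, $2 = schema name (defaults match cloudflare.env in this guide).
# The body runs in a subshell so its variables do not leak to the caller.
otlp_drop_statements() (
  catalog=${1:-r2catalog}
  schema=${2:-otlp}
  for table in otlp_logs otlp_traces otlp_metrics_gauge otlp_metrics_sum \
               otlp_metrics_histogram otlp_metrics_exp_histogram; do
    printf 'DROP TABLE IF EXISTS %s.%s.%s;\n' "$catalog" "$schema" "$table"
  done
)
```

Paste the printed statements into the quack_query body in place of the hard-coded ones.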

Then disable the catalog and delete the bucket:

wrangler r2 bucket catalog disable "$R2_BUCKET_NAME"
wrangler r2 bucket delete "$R2_BUCKET_NAME"

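If bucket deletion reports that the bucket is not empty, delete the remaining catalog objects and retry. For a bucket you created for this guide, those objects are the Iceberg metadata and data files under __r2_data_catalog/. The list request may return only one page of keys, so if deletion still fails afterwards, rerun the loop until the listing is empty: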

export R2_OBJECTS_API="https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/r2/buckets/${R2_BUCKET_NAME}/objects"

curl -fsS \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  "$R2_OBJECTS_API" |
  jq -r '.result[]?.key' |
  while IFS= read -r key; do
    encoded="$(node -e 'process.stdout.write(encodeURIComponent(process.argv[1]))' "$key")"
    curl -fsS -X DELETE \
      -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
      "${R2_OBJECTS_API}/${encoded}" >/dev/null
  done

wrangler r2 bucket delete "$R2_BUCKET_NAME"

Run manually

To run this configuration in a DuckDB 1.5.4+ shell instead of the daemon, open a database file and execute the SQL below. Replace the values in angle brackets with the corresponding values from this guide. Keep the shell open while clients send telemetry.

duckdb duckdb-otlp-control.duckdb

-- The daemon embeds otlp statically; the shell loads the extension explicitly.
INSTALL otlp FROM community;
LOAD otlp;

-- Configure R2 object storage and attach the Iceberg REST catalog.
INSTALL iceberg;
INSTALL httpfs;
LOAD iceberg;
LOAD httpfs;
CREATE OR REPLACE SECRET cloudflare_r2_secret (
  TYPE s3,
  KEY_ID '<r2-s3-access-key-id>',
  SECRET '<r2-s3-secret-access-key>',
  REGION 'auto',
  ENDPOINT '<account-id>.r2.cloudflarestorage.com',
  URL_STYLE 'path'
);
CREATE OR REPLACE SECRET cloudflare_catalog_secret (
  TYPE iceberg,
  TOKEN '<r2-admin-read-write-token>'
);
ATTACH '<account-id>_<bucket-name>' AS r2catalog (
  TYPE iceberg,
  ENDPOINT 'https://catalog.cloudflarestorage.com/<account-id>/<bucket-name>',
  SECRET cloudflare_catalog_secret
);

-- Create the target schema before starting ingestion.
CREATE SCHEMA IF NOT EXISTS r2catalog.otlp;

-- Match the daemon's default-catalog setting for unqualified queries.
USE r2catalog;

-- Start OTLP/HTTP. Seal cadence, file sizes, and buffer limits use defaults;
-- override only as needed — see the Live Ingest Reference:
-- https://smithclay.github.io/duckdb-otlp/reference/serve/
SELECT listen_url, catalog_name, schema_name
FROM otlp_serve(
  'otlp:0.0.0.0:4318',
  catalog := 'r2catalog',
  schema := 'otlp',
  token := 'dev-token-123456',
  allow_other_hostname := true
);

-- The guide's daemon configuration also enables Quack.
INSTALL quack;
LOAD quack;
SELECT listen_uri
FROM quack_serve(
  'quack:0.0.0.0:9494',
  token := 'dev-quack-token-123456',
  allow_other_hostname := true
);

Before closing DuckDB, stop both listeners cleanly:

-- Stop Quack, then commit buffered telemetry and stop OTLP.
CALL quack_stop('quack:0.0.0.0:9494');
SELECT status, dropped_rows
FROM otlp_stop('otlp:0.0.0.0:4318');

Troubleshooting

If the container cannot attach the catalog at startup, confirm CLOUDFLARE_CATALOG_URI, CLOUDFLARE_CATALOG_TOKEN, CLOUDFLARE_ACCOUNT_ID, and CLOUDFLARE_R2_BUCKET all refer to the same bucket.
If the container attaches the catalog but cannot write files, confirm CLOUDFLARE_ACCESS_KEY_ID and CLOUDFLARE_SECRET_ACCESS_KEY can write objects to the R2 bucket.
If no rows appear after a 202 response, the rows are probably still buffered. Run the otlp_flush query shown in Query committed rows, then query again.
If bucket deletion reports that the bucket is not empty, delete the remaining catalog objects as shown in Clean up, then retry wrangler r2 bucket delete.
R2 Data Catalog supports R2 buckets in the default jurisdiction.
In r2-data-catalog mode, live ingest stores timestamp columns with microsecond precision because the Iceberg catalog does not accept DuckDB TIMESTAMP_NS columns.
If DuckDB reports unsupported catalog checkpointing, no action is required; ingest, flush, and stop durability behavior stays the same.