Use Case: Replicating Public Register Data
This page describes how you can use the streaming API to replicate entire public registers into your own systems. By keeping a stream connection open, you can maintain a local copy that stays current within seconds of each registration, without waiting for batch file deliveries.
How the Endpoint Works
- Streams use chunked HTTP to deliver records as they become available.
- Offsets are tracked automatically, ensuring continuity across reconnects.
- Streams run in 25-second sessions (automatically re-established if kept open).
Understanding Offsets and Partitions
Offsets determine where in the data history your stream begins, and partitions provide parallel lanes for scale.
- Partitions split the stream into lanes so high volumes can be processed in parallel.
- Offsets mark your exact position within each lane, enabling resume, replay, or skipping ahead.
Think of an offset as a bookmark in each lane.
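To make the bookmark idea concrete, here is a minimal Python sketch of a per-partition checkpoint that a consumer might keep. It is an illustration under assumptions, not the API's documented record format: the field names `partition` and `offset` on each record are hypothetical, and integer offsets are assumed.

```python
from typing import Dict, Mapping


def advance(checkpoint: Dict[int, int], record: Mapping) -> None:
    """Remember the highest offset seen so far for the record's partition."""
    partition = record["partition"]  # hypothetical field name
    offset = record["offset"]        # hypothetical field name
    # Never move a bookmark backwards, e.g. when a replay re-delivers records.
    if offset > checkpoint.get(partition, -1):
        checkpoint[partition] = offset


checkpoint: Dict[int, int] = {}
for rec in [
    {"partition": 0, "offset": 41},
    {"partition": 1, "offset": 7},
    {"partition": 0, "offset": 42},
]:
    advance(checkpoint, rec)

print(checkpoint)  # {0: 42, 1: 7}
```

Keeping one bookmark per lane is what allows each partition to be resumed or replayed independently of the others.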
Example: Streaming Data
# Start streaming from the earliest available record
curl --location 'https://api.predicti.com/datahub/v1/sources/{sourceName}/stream' \
--header 'x-api-key: {API_KEY}' \
--header 'Accept: application/x-ndjson'
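The curl request above can also be issued from a program. The following is a minimal Python sketch, not an official client. It uses only the standard library and reuses the URL and headers from the curl example. It assumes that the client reopens the connection each time a session ends and that server-side offset tracking lets the next session continue where the previous one stopped. The names `parse_ndjson` and `stream_records`, the 60-second timeout, and the one-second pause are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Iterable, Iterator

STREAM_URL = "https://api.predicti.com/datahub/v1/sources/{source}/stream"


def parse_ndjson(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line."""
    for raw in lines:
        line = raw.strip()
        if line:  # skip blank lines, if the server sends any
            yield json.loads(line)


def stream_records(source: str, api_key: str) -> Iterator[dict]:
    """Yield records indefinitely, reopening the stream when a session ends."""
    request = urllib.request.Request(
        STREAM_URL.format(source=source),
        headers={"x-api-key": api_key, "Accept": "application/x-ndjson"},
    )
    while True:
        # urllib decodes chunked transfer encoding transparently; iterating
        # the response yields one line (one NDJSON record) at a time.
        with urllib.request.urlopen(request, timeout=60) as response:
            yield from parse_ndjson(response)
        time.sleep(1)  # assumed pause before reconnecting; tune as needed
```

A caller would loop over `stream_records("bbr", api_key)` and write each record to its own store. HTTP and network errors are deliberately left unhandled here; a production consumer would catch them and back off before retrying.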
Handling Offsets
- Default (no offset) → start from earliest available messages.
- -1 → jump to the latest records (skip history).
- -2 → reset to the very beginning (same as default).
- ISO 8601 timestamp → start from a specific point in time.
- Partition control → set offsets per partition for fine-grained replay.
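The single-offset options in this list map onto small JSON request bodies. As a hedged sketch, the helper below builds the body for the latest, earliest, and timestamp cases in Python. The body shape `{"offset": ...}` is taken from this page's curl examples; the function name `offset_body` and the constants `LATEST` and `EARLIEST` are illustrative names, not part of the API.

```python
import json
from datetime import datetime, timezone
from typing import Union

LATEST = -1    # jump to the latest records (skip history)
EARLIEST = -2  # reset to the very beginning


def offset_body(target: Union[int, datetime]) -> str:
    """Return the JSON body for an offset reset."""
    if isinstance(target, datetime):
        if target.tzinfo is None:
            raise ValueError("use a timezone-aware datetime")
        # Normalise to UTC and format as an ISO 8601 timestamp ending in Z.
        stamp = target.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return json.dumps({"offset": stamp})
    if target in (LATEST, EARLIEST):
        return json.dumps({"offset": target})
    raise ValueError("expected LATEST, EARLIEST, or a datetime")


print(offset_body(EARLIEST))
# {"offset": -2}
print(offset_body(datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)))
# {"offset": "2024-01-15T10:30:00Z"}
```

Rejecting naive datetimes avoids silently sending a local time that the server would interpret as UTC.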
Example: Reset to Earliest
curl -X POST \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
https://api.predicti.com/datahub/v1/sources/bbr/offsets \
-d '{"offset": -2}'
Example: Reset to Specific Time
curl -X POST \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
https://api.predicti.com/datahub/v1/sources/bbr/offsets \
-d '{"offset": "2024-01-15T10:30:00Z"}'
Example: Partition-specific Offsets
curl -X POST \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
https://api.predicti.com/datahub/v1/sources/bbr/offsets \
-d '{
  "offsets": [
    { "partition": 0, "offset": "1" },
    { "partition": 1, "offset": "1" },
    { "partition": 2, "offset": "2" },
    { "partition": 3, "offset": "2" },
    { "partition": 4, "offset": "3" },
    { "partition": 5, "offset": "3" },
    { "partition": 6, "offset": "4" },
    { "partition": 7, "offset": "4" },
    { "partition": 8, "offset": "5" }
  ]
}'
Adjusting offsets is useful for reprocessing historical data, skipping already-processed messages, or recovering after errors.
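For recovery after an error, an application that has saved its own position per partition could turn that saved state into the partition-specific request shown above. The Python sketch below only prepares the request and does not send it. The saved state is a hypothetical dictionary of partition number to offset, the function name `offsets_request` is illustrative, and offsets are sent as strings only because the curl example does so. Whether the stored value should be the last processed offset or the next one to read is not specified on this page, so the sketch sends it unchanged.

```python
import json
import urllib.request
from typing import Mapping

OFFSETS_URL = "https://api.predicti.com/datahub/v1/sources/{source}/offsets"


def offsets_request(
    source: str, token: str, saved: Mapping[int, int]
) -> urllib.request.Request:
    """Build (but do not send) a POST that sets one offset per partition."""
    body = {
        "offsets": [
            {"partition": partition, "offset": str(offset)}
            for partition, offset in sorted(saved.items())
        ]
    }
    return urllib.request.Request(
        OFFSETS_URL.format(source=source),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = offsets_request("bbr", "<token>", {1: 7, 0: 42})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would submit it. Building the request separately makes it easy to log or inspect the payload before changing the stream position.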
How This Compares to Traditional Approaches
- No waiting for nightly CSV files or scheduled imports.
- Data is always current, arriving within seconds of registration.
- Lightweight HTTP-based connection: no heavy brokers or clusters required.
- Built for both operational replication and analytical pipelines.
Practical Benefits for Organizations
Replicating via streaming enables:
- Maintaining a local database that mirrors official registers in real time.
- Removing delays caused by batch jobs or manual data transfers.
- Easier compliance and auditing with full replayable event logs.
- Real-time dashboards, alerts, and analytics that run on live data.
- Effortless scaling from single-system replication to enterprise-wide data lakes.
The result is a faster, more reliable, and lower-cost integration with public data sources.