Feature Data Replication
To replicate feature data, consume the following API endpoint:

https://api.predicti.com/datahub/v1/entities/{entityId}/features/{featureName}/history?start=1970-01-01T00:00:00Z

See the API documentation for more details.
How the Endpoint Works
- The endpoint delivers data as a chunked HTTP stream.
- It supports multiple response formats, selected via the `Accept` header.
- The `start` query parameter specifies the timestamp from which to begin streaming data. Set it to the `_lastModified` timestamp of the latest element received in the previous request.
- To fully replicate a feature's history, clients must:
  - Stream chunks of data continuously.
  - Track the `_lastModified` timestamp of the latest received element.
  - Use this timestamp as the `start` parameter in the next request to fetch updates incrementally and avoid duplication.
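The replication loop described above can be sketched in Python. This is a minimal sketch under assumptions the source does not state: the hypothetical `fetch` callable performs the HTTP request and yields each received element as a dict (for example, a decoded JSON object), `_lastModified` is an ISO 8601 string in the same format as the example `start` value, and the hypothetical `handle` callable stores each element.

```python
from datetime import datetime
from typing import Callable, Iterable, Optional

# Initial `start` value, taken from the example endpoint URL.
EPOCH_START = "1970-01-01T00:00:00Z"


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat only accepts 'Z' from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def replicate_once(
    fetch: Callable[[str], Iterable[dict]],
    handle: Callable[[dict], None],
    start: str,
) -> str:
    """Run one streaming request from `start`; return the `start` for the next one.

    `fetch` (hypothetical) issues the request with the given `start` query
    parameter and yields decoded elements as chunks arrive; `handle` stores them.
    """
    latest: Optional[str] = None
    for element in fetch(start):
        handle(element)
        modified = element["_lastModified"]
        # Compare parsed values rather than raw strings so differing offsets sort correctly.
        if latest is None or parse_timestamp(modified) > parse_timestamp(latest):
            latest = modified
    # Nothing new arrived: resume from the same position next time.
    return latest if latest is not None else start
```

A client would call `replicate_once` repeatedly, feeding each returned value back in as the next `start`, so every request only picks up changes made after the last one it saw.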
This approach ensures clients stay synchronized with the latest feature updates over time.
For performance reasons, it is recommended to send the `Accept: application/x-protobuf` header, which delivers updates in the most efficient format.
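As a minimal sketch, a request for the protobuf format can be built with Python's standard `urllib` modules. The base URL and path come from the endpoint above; the helper name `build_history_request` is hypothetical, and percent-encoding the path segments is a defensive choice rather than something this documentation requires.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.predicti.com/datahub/v1"


def build_history_request(entity_id: str, feature_name: str, start: str) -> Request:
    """Build a GET request for the feature history endpoint in protobuf format."""
    # Encode path segments fully so IDs containing '/' or spaces stay one segment.
    path = "/entities/{}/features/{}/history".format(
        quote(entity_id, safe=""), quote(feature_name, safe="")
    )
    # urlencode percent-encodes the ':' characters of the ISO 8601 timestamp.
    query = urlencode({"start": start})
    return Request(
        BASE_URL + path + "?" + query,
        headers={"Accept": "application/x-protobuf"},
    )
```

Passing the result to `urllib.request.urlopen` returns a file-like response whose body can be read incrementally; `http.client` decodes the chunked transfer encoding transparently.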
Full History and Data Updates
The feature data history endpoint provides the complete history of the requested feature each time a change happens, including all known historical values.
Because of this:
- Each update contains the entire set of historical records for the feature at that point in time.
- To keep data consistent, delete the previously stored records for the specific `id` and insert the newly received history instead.
- This method supports data sources that allow corrections or updates to historical feature values, helping to keep the data accurate and up to date.
Replacing old records with the new full history helps avoid duplication and ensures data stays synchronized.
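The delete-then-insert strategy can be sketched with Python's built-in `sqlite3` module. The table layout, the `replace_history` helper, keying rows by entity and feature, and storing every value as text are illustrative assumptions; the point is that the delete and the insert run in one transaction, so a failure midway leaves the previous history intact.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical local schema: one row per historical value of a feature.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feature_history (
    entity_id      TEXT    NOT NULL,
    feature_name   TEXT    NOT NULL,
    effective_from INTEGER NOT NULL,
    value          TEXT
)
"""


def replace_history(
    conn: sqlite3.Connection,
    entity_id: str,
    feature_name: str,
    history: Iterable[Tuple[int, str]],
) -> None:
    """Swap the stored history of one feature for one entity in a single transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "DELETE FROM feature_history WHERE entity_id = ? AND feature_name = ?",
            (entity_id, feature_name),
        )
        conn.executemany(
            "INSERT INTO feature_history VALUES (?, ?, ?, ?)",
            [(entity_id, feature_name, ts, value) for ts, value in history],
        )
```

Because each update carries the complete history, a corrected historical value (such as a revised first entry) simply replaces the old row instead of being appended next to it.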
Protobuf Implementation Details
When the `Accept: application/x-protobuf` header is used, the server responds with protobuf-encoded messages containing feature updates.
Below is the protobuf schema used for streaming feature updates:
```protobuf
// A value for a feature that occurs at a specific point in time
message Value {
  // The timestamp in history where this value occurs
  optional fixed64 effectiveFrom = 1;
  // The value for feature types KEYWORD, DANISH, LOCALDATE (in ISO format, yyyy-mm-dd)
  repeated string stringValue = 2;
  // The value for feature types FLOAT
  repeated double doubleValue = 3;
  // The value for feature types BOOLEAN
  repeated bool boolValue = 4;
  // The value for feature types INSTANT and LOCALDATETIME (as milliseconds since the epoch)
  repeated sfixed64 instantValue = 5;
}

// An update to an entity for a specific feature
message UpdatedEntity {
  // The feature which is being updated
  optional string featureName = 1;
  // The new value for the feature
  optional Value value = 2;
}

// An update to a feature for a specific entity
message UpdatedFeature {
  // The ID of the entity being updated
  optional string entityId = 1;
  // The timestamp at which this update has been applied in the system
  // (which is not the time at which the value takes effect historically)
  optional fixed64 lastModified = 2;
  // The new complete history of the current feature for the given entity
  repeated Value history = 3;
}
```
Notes on Protobuf Fields
- `Value.effectiveFrom` indicates the historical time the feature value applies to.
- `UpdatedFeature.lastModified` shows when the system recorded the update (not the historical time of the value).
- `history` contains the full set of known historical values for the feature at that moment.
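To make these fields concrete, the sketch below decodes an `UpdatedFeature` by hand using only the Python standard library and the standard protobuf wire format. It is an illustration, not the service's client: in practice, classes generated by `protoc` with the official protobuf runtime are the usual choice. Because the schema does not declare a syntax version, the decoder accepts repeated scalars in both packed and unpacked form, and it returns `effectiveFrom` and `lastModified` as raw integers since their units are not documented here (only `instantValue` is described as milliseconds since the epoch).

```python
import struct
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at `pos`; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, raw_value) for every field in a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint (bool)
            raw, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 8 bytes (fixed64, sfixed64, double)
            raw, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (string, message, packed values)
            length, pos = read_varint(buf, pos)
            raw, pos = buf[pos:pos + length], pos + length
        else:  # wire type 5 (32-bit) and group types do not occur in this schema
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, wire_type, raw


def _fixed64s(wire_type: int, raw: bytes, fmt: str) -> list:
    # Repeated 64-bit scalars may arrive unpacked (wire type 1) or packed (2).
    if wire_type == 1:
        return [struct.unpack(fmt, raw)[0]]
    return [item for (item,) in struct.iter_unpack(fmt, raw)]


def _bools(wire_type: int, raw: Union[int, bytes]) -> List[bool]:
    # Repeated bools may arrive as single varints (wire type 0) or packed (2).
    if wire_type == 0:
        return [bool(raw)]
    out, pos = [], 0
    while pos < len(raw):
        item, pos = read_varint(raw, pos)
        out.append(bool(item))
    return out


@dataclass
class Value:
    effective_from: Optional[int] = None
    string_value: List[str] = field(default_factory=list)
    double_value: List[float] = field(default_factory=list)
    bool_value: List[bool] = field(default_factory=list)
    instant_value: List[int] = field(default_factory=list)


@dataclass
class UpdatedFeature:
    entity_id: Optional[str] = None
    last_modified: Optional[int] = None
    history: List[Value] = field(default_factory=list)


def decode_value(buf: bytes) -> Value:
    value = Value()
    for number, wire_type, raw in iter_fields(buf):
        if number == 1:
            value.effective_from = struct.unpack("<Q", raw)[0]
        elif number == 2:
            value.string_value.append(raw.decode("utf-8"))
        elif number == 3:
            value.double_value.extend(_fixed64s(wire_type, raw, "<d"))
        elif number == 4:
            value.bool_value.extend(_bools(wire_type, raw))
        elif number == 5:
            value.instant_value.extend(_fixed64s(wire_type, raw, "<q"))
        # Other field numbers (with wire types 0-2) are skipped, like unknown fields.
    return value


def decode_updated_feature(buf: bytes) -> UpdatedFeature:
    update = UpdatedFeature()
    for number, wire_type, raw in iter_fields(buf):
        if number == 1:
            update.entity_id = raw.decode("utf-8")
        elif number == 2:
            update.last_modified = struct.unpack("<Q", raw)[0]  # units not documented
        elif number == 3:
            update.history.append(decode_value(raw))
    return update
```

Each decoded `UpdatedFeature` carries one entity's complete history, so `last_modified` is the value to track for the next request, while each `Value.effective_from` places a value in the feature's history.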
By combining HTTP chunked streaming with protobuf-encoded messages, the replication system efficiently delivers complete and incremental updates for feature data.
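This document does not state how individual protobuf messages are delimited within the chunked stream. A common convention is to prefix each message with its length encoded as a varint (the format produced by the Java runtime's `writeDelimitedTo`). Assuming that framing, which should be confirmed against the API documentation, the sketch below reassembles messages from HTTP chunks whose boundaries do not line up with message boundaries; `iter_delimited` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Optional, Tuple


def _try_read_varint(buf: bytearray, pos: int) -> Optional[Tuple[int, int]]:
    """Decode a varint at `pos`; return None if its bytes have not all arrived."""
    result, shift = 0, 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
    return None


def iter_delimited(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield complete varint-length-prefixed messages from arbitrary chunks.

    ASSUMPTION: each message is preceded by its byte length encoded as a varint.
    Chunk boundaries need not match message boundaries, so bytes are buffered
    until both the length prefix and the full payload are available.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while True:
            header = _try_read_varint(buffer, 0)
            if header is None:
                break  # length prefix still incomplete
            length, start = header
            if len(buffer) < start + length:
                break  # payload still incomplete
            yield bytes(buffer[start:start + length])
            del buffer[:start + length]
    if buffer:
        raise ValueError("stream ended in the middle of a message")
```

Each yielded payload would then be one encoded protobuf message, presumably an `UpdatedFeature`, ready to pass to a protobuf parser.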