Apache Kafka integration
Overview
Fusion Metadata Registry (FMR) can act as an Apache Kafka Producer where specified events are published on definable Kafka topics. It consists of a generalised ‘producer’ interface capable of publishing any information to definable Kafka topics, and a collection of ‘handlers’ for managing specific events.
At present there is only one event handler, Structure Notification, which publishes changes to structures as they occur. For additions or modifications to structures, the body of the message is always an SDMX structure message. The format is configurable at the event handler level, with choices including:
- SDMX-ML (1.0, 2.0 and 2.1)
- SDMX-JSON
- EDI
- Excel
- Fusion JSON (the JSON dialect that pre-dated the formal SDMX-JSON specification)
A ‘tombstone’ message is used for structure deletions with the structure URN in the Kafka Message Key, and a null body.
The list of supported events that can be published to Kafka will be extended over time.
Other events envisaged include:
- Audit events such as logins, data registrations and API calls
- Errors, for instance where a scheduled data import fails
- Configuration changes
- Changes to Content Security Rules
Configuration
Configuration is performed through the GUI by a user with 'admin' privileges.
Connection
The Connection form configures the parameters needed to connect to the Kafka broker service.
Parameter | Value |
---|---|
Client ID | A unique identifier for the Fusion Metadata Registry client. There are no particular restrictions on its value. |
Servers | A comma-separated list of host:port pairs for the Kafka Service, e.g. myserver:9093,anotherServer:9094,server3:9093 |
Compression Algorithm | The algorithm used to compress the message payload. Choices are: None, GZIP, Snappy, LZ4. |
Enable Kerberos Security | If Enabled, the Producer attempts to authenticate with the specified Kerberos service for access to Kafka |
Enable SSL | If Enabled, an SSL connection will be established - see the section below for further information. |
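For orientation, the Connection settings correspond broadly to standard Apache Kafka producer client properties. The sketch below uses the standard Kafka property names with illustrative values; it is not a dump of FMR's internal configuration:

# Client ID
client.id=fmr-producer
# Servers
bootstrap.servers=myserver:9093,anotherServer:9094,server3:9093
# Compression Algorithm (standard Kafka values: none, gzip, snappy, lz4)
compression.type=gzip
# Set when Enable SSL is ticked (see the SSL section below)
security.protocol=SSL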
Topics
The Topics form allows configuration of which events should be published on Kafka, and on what Kafka Topics.
Kafka Topic is the ID of the topic on which to publish.
Each message can be published onto multiple topics by providing a comma-separated list of topic names.
For example, 'FOO' publishes on just that single topic, while 'FOO,BAR' publishes on both the FOO and BAR topics.
The form also allows event-specific parameters to be set. For Structure Notification, this is the format of the Kafka Message Body.
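As a consumer-side illustration, an application wishing to receive the events published on the topics configured above simply subscribes to those topic names with a standard Kafka Consumer. A minimal Java sketch, using the example topics FOO and BAR and the example server above; the class name and consumer group are illustrative (see the poll loop sketched under Deletions below for how received records can be processed):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StructureEventSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "myserver:9093");                     // as configured in 'Servers'
        props.put("group.id", "structure-replica");                          // illustrative consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());   // the key is the structure URN
        props.put("value.deserializer", StringDeserializer.class.getName()); // the value is the structure message text

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Subscribe to the topic(s) configured on the Topics form
        consumer.subscribe(Arrays.asList("FOO", "BAR"));
    }
}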
SSL
It is possible to set up an SSL connection from the Registry to Kafka. It is required that:
- Your Kafka system has been set up correctly with a KeyStore, TrustStore and appropriate passwords
- The Registry has access to both KeyStore and TrustStore
Configuring Kafka for SSL
- Edit the Kafka configuration file: server.properties
- Locate the listeners address. This will either need to be changed to support SSL, or another port will need to be added. For example, two ports, one for plaintext and one for SSL, can be defined with:
listeners=PLAINTEXT://:9092,SSL://localhost:9093
- Add the following section for your SSL configuration:
ssl.keystore.location = <location of the keystore>
ssl.keystore.password = <keystore password>
ssl.key.password = <key password>
ssl.truststore.location = <location of the truststore>
ssl.truststore.password = <truststore password>
For example:
ssl.keystore.location = f:/keystores/kafka.server.keystore.jks
ssl.keystore.password = password
ssl.key.password = password
ssl.truststore.location = f:/keystores/kafka.server.truststore.jks
ssl.truststore.password = password
Configuring the Registry Kafka SSL communication
This can be specified on the Kafka Connection page.
Ensure that the "servers" field shows a connection to the SSL port as configured in Kafka.
The following five pieces of information must be known and entered in the appropriate input fields:
- KeyStore Location - the full URI is required
- KeyStore Password
- Key Password
- TrustStore Location - the full URI is required
- TrustStore Password
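These GUI fields correspond broadly to the standard Kafka client SSL settings. An illustrative mapping using the standard Kafka property names (the placeholders are not real values, and the property names are the standard client ones rather than anything FMR-specific):

security.protocol=SSL
ssl.keystore.location=<full URI of the KeyStore>
ssl.keystore.password=<KeyStore password>
ssl.key.password=<Key password>
ssl.truststore.location=<full URI of the TrustStore>
ssl.truststore.password=<TrustStore password>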
General Behaviour
Publication Reliability
Message publication is not 100% reliable.
FMR places event messages as they are created onto an internal in-memory Staging Queue. When a message is created, an immediate attempt is made to publish to the Kafka broker cluster. If the publication fails (the Kafka service is unavailable, for instance), the message is returned to the Queue. An independent Queue Processor thread makes periodic attempts to complete the publication of staged messages.
IMPORTANT: Messages stay on the Staging Queue until they are successfully published to Kafka. However, the Staging Queue is not persistent so any staged but unpublished messages will be lost when the FMR service terminates.
Each individual event message service manages the risk of message loss in a way appropriate for their specific use case. For Structure Notification, the risk is managed by forcing consumers to re-synchronise their complete structural metadata content with the Registry on Registry startup. Even if structure change event messages were lost, Consumers' metadata is reset to a consistent state.
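Conceptually, the staging-queue behaviour described above resembles the following Java sketch. It is purely illustrative of the publish-and-retry pattern and is not FMR's actual implementation; the producer, stagingQueue (of org.apache.kafka.clients.producer.ProducerRecord) and scheduler objects are assumed to exist.

// Immediate attempt to publish; on failure the message is returned to the in-memory queue
void publish(ProducerRecord<String, String> message) {
    producer.send(message, (metadata, error) -> {
        if (error != null) {
            stagingQueue.offer(message);     // publication failed, keep the message staged
        }
    });
}

// Independent queue-processor thread periodically retries staged messages
scheduler.scheduleAtFixedRate(() -> {
    ProducerRecord<String, String> staged;
    while ((staged = stagingQueue.poll()) != null) {
        publish(staged);
    }
}, 0, 30, TimeUnit.SECONDS);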
Events
The following FMR events are available for publication on a Kafka service.
Event | Details |
---|---|
Structure Notification | Addition / modification or deletion of SDMX structures |
Structure Notification
SDMX Structural Metadata is published to definable Kafka topics on FMR startup and each time structures are added, modified, or deleted. Metadata-driven applications can subscribe to the relevant topic(s) to receive notifications of changes to structures, allowing them to maintain an up-to-date replica of the structural metadata they need.
Example use cases include:
- Structural validation applications which require up-to-date structural metadata such as Codelists and DSDs to check whether data is correctly structured.
- Structure mapping applications which require up-to-date copies of relevant Structure Sets and maps to perform data transformations.
Kafka Message Key
The SDMX Structure URN is used as the Kafka Message Key. Consumers must therefore be prepared to interpret the key as a Structure URN when processing the message. This is particularly important for structure deletions, where the Message Key is the only source of information about which structure is being referred to.
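For illustration, the key for an event affecting a Codelist might look like the following (an illustrative URN following the standard SDMX form, not one published by any particular Registry):

urn:sdmx:org.sdmx.infomodel.codelist.Codelist=SDMX:CL_FREQ(2.0)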
Kafka Message Body
The Kafka Message Body is a normal SDMX structure message, except in the case of deletions where the message has no content (a null payload), and the structure to delete is identified exclusively by its URN in the Message Key. The format of the SDMX structure message is configurable using the Topics form with the following options available:
Format | Explanation | HTTP Accept Header Equivalent |
---|---|---|
SDMX-ML 1.0 | SDMX-ML 1.0 XML structure document | application/vnd.sdmx.structure+xml;version=1.0 |
SDMX-ML 2.0 | SDMX-ML 2.0 XML structure document | application/vnd.sdmx.structure+xml;version=2.0 |
SDMX-ML 2.1 | SDMX-ML 2.1 XML structure document | application/vnd.sdmx.structure+xml;version=2.1 |
SDMX-EDI | EDI | application/vnd.sdmx.structure;version=edi |
SDMX-JSON | SDMX-JSON 1.0 | application/vnd.sdmx.json |
Fusion-JSON | Metadata Technology's JSON format with similarities to SDMX-JSON provided primarily for backward compatibility | application/vnd.fusion.json |
Events for certain structures will only be published if the structure type is supported by the chosen format. Only Fusion-JSON supports all possible structures including FMR 'extended structures' like Validation Schemes.
Additions and Modifications
Addition and Modification of structures both result in an SDMX 'replace' message containing the full content of the structures. Deltas (such as the addition of a Code to a Codelist) are not supported.
Deletions
Deletion of structures results in a 'tombstone' message, i.e. one with a null payload but the URN of the deleted structure in the Message Key.
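A consumer therefore needs to distinguish tombstones from ordinary structure messages by checking for a null value. A minimal Java sketch of such a poll loop, assuming a KafkaConsumer<String, String> set up and subscribed as in the Topics section above (imports: java.time.Duration, org.apache.kafka.clients.consumer.ConsumerRecord and ConsumerRecords); deleteLocalCopy and upsertLocalCopy are hypothetical application methods:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
    for (ConsumerRecord<String, String> record : records) {
        String urn = record.key();                    // the SDMX structure URN
        if (record.value() == null) {
            deleteLocalCopy(urn);                     // tombstone: the structure has been deleted
        } else {
            upsertLocalCopy(urn, record.value());     // full SDMX structure message in the chosen format
        }
    }
}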
Kafka Transactions
Additions and modifications to structures in FMR are published using Kafka Transactional Messaging to help guarantee consistency by providing:
- Atomicity - The consumer will not be exposed to uncommitted transactions
- Durability - Kafka brokers guarantee not to lose any committed transactions
- Ordering - Transaction-aware consumers receive transactions in the original order
- Non-Duplication - No duplicate messages within transactions
If a single structure is added or changed, the transaction will contain just that structure.
Transactions will contain multiple structures in cases including:
- Where multiple structures are submitted to the Registry in a single SDMX-ML message.
- Where a change to a structure's Agency or ID affects dependent structures.
Subscribers to the Kafka topic must be transaction-aware, and must ensure that each transaction is processed atomically to maintain the referential integrity of their structural metadata replicas.
Structure deletions are never mixed with additions / modifications in the same transaction. This reflects the behaviour of the SDMX REST API whereby additions / modifications use HTTP POST, while deletions require HTTP DELETE.
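On the consumer side, receiving only committed transactions is controlled by the standard Kafka consumer isolation setting; a consumer that is not transaction-aware may otherwise see aborted or in-flight messages. The relevant standard consumer property is:

isolation.level=read_committed

(the Kafka default is read_uncommitted).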
Full Content Publication and Re-synchronisation
Under certain conditions, the FMR Producer will publish all of its structures in a single SDMX Structure Message of the chosen format to Kafka.
These are:
- When a new Kafka connection is set up, or the configuration of an existing Kafka connection is changed
- Registry startup
The principle here is to ensure that Consumers have a consistent baseline against which to process subsequent change notifications.
It also mitigates the risk that changes held in the Registry's in-memory queue of messages awaiting Kafka publication were lost on shutdown by forcing Consumers to re-synchronise on Registry startup.
Not Supported: Subscription to Specific Structures
The current implementation does not allow Kafka Consumers to subscribe to changes on specific structures or sets of structures, such as those maintained by a particular Agency. Consumers subscribing to the chosen topic will receive information about all structures and will need to select what they need.
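Where a Consumer is only interested in structures maintained by a particular Agency, one possible client-side approach is to filter on the Message Key, since the SDMX URN embeds the maintaining Agency. A minimal Java sketch; the parsing is simplistic, assumes the standard urn:sdmx:<package>.<Class>=<AgencyID>:<ID>(<Version>) form, and the helper and the agency "SDMX" are purely illustrative:

// Returns true if the structure identified by the URN is maintained by the given agency
static boolean isMaintainedBy(String urn, String agencyId) {
    int eq = urn.indexOf('=');
    int colon = urn.indexOf(':', eq + 1);
    if (eq < 0 || colon < 0) {
        return false;                      // not a recognisable SDMX structure URN
    }
    return urn.substring(eq + 1, colon).equals(agencyId);
}

// Example: keep only structures maintained by the illustrative agency "SDMX"
// if (isMaintainedBy(record.key(), "SDMX")) { ... }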