Difference between revisions of "Data Transformation Web Service"
Latest revision as of 08:41, 16 December 2024
Overview
The Data Transformation Web Service converts the dataset submitted in the POST body to the data transmission format specified by the Accept header, optionally transforming it to a different Data Structure Definition if the Registry has a Structure Map defining the mapping.
Entry Point | /ws/public/data/transform |
Access | Public (default). Configurable to Private |
HTTP Method | POST |
Accepts | CSV, XLSX, SDMX-ML, SDMX-EDI (any format for which there is a Data Reader) |
Compression | Zip files supported; gzip responses supported if loading from a URL |
Content-Type | 1. multipart/form-data (if attaching a file) - the attached file must be in the field named uploadFile; 2. application/text or application/xml (if submitting data in the body of the POST) |
Response Format | Determined by Accept Header - default SDMX 2.1 Structure Specific |
Response Statuses | 200 - Transformation performed; 400 - Transformation could not be performed (either an unreadable dataset, or an unresolvable reference to a required structure); 401 - Unauthorized (if access has been restricted); 500 - Server Error |
This is a synchronous service where the client receives the transformed data directly as the response to the HTTP POST request.
Benefits:
- Simple to use - the transformation can be completed in a single web service call, making it easy to use with Postman, curl and similar tools
Disadvantages:
- Suitable only for smaller datasets - HTTP timeouts may occur when processing larger datasets or complex transformations which take longer to execute
Use the Asynchronous Data Validation and Transformation Web Service for larger datasets and heavier workloads. This avoids the risk of HTTP timeouts by submitting the data to be transformed as a job and executing the transformation in the background.
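As a rough illustration of the single-call pattern described above, the following Python sketch builds (but does not send) the POST request using only the standard library. The entry point comes from the table above. The host name, the helper name build_transform_request, the placeholder body and the Accept media type are assumptions made for illustration and should be checked against your own server.

```python
import urllib.request

# Entry point taken from the table above; the host name used below is a placeholder.
ENTRY_POINT = "/ws/public/data/transform"


def build_transform_request(base_url, data, accept, content_type="application/xml"):
    """Build (but do not send) a POST request for the transformation service."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENTRY_POINT,
        data=data,
        method="POST",
        headers={"Accept": accept, "Content-Type": content_type},
    )


request = build_transform_request(
    "https://registry.example.org",  # hypothetical server
    b"<placeholder/>",  # stand-in for a real SDMX-ML dataset
    # Assumed media type for SDMX 2.1 Structure Specific; verify for your server.
    "application/vnd.sdmx.structurespecificdata+xml;version=2.1",
)
print(request.get_method(), request.full_url)

# Sending is then one blocking call, for example:
# with urllib.request.urlopen(request, timeout=60) as response:
#     transformed = response.read()
```

Because the service is synchronous, the timeout passed to urlopen bounds how long the client waits for the transformed data, which is where larger datasets may run into trouble.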
HTTP Headers
The Accept header defines the output format to transform the data to.
In addition, the following optional header parameters can be used to provide further details on the incoming dataset. If these details are not provided, the Fusion Registry will interrogate the dataset header to get the information. If the dataset is in a non-SDMX format, or does not contain the required information in the header, then an error response will be returned.
HTTP Header | Purpose | Allowed Values |
---|---|---|
Accept | The data transmission format to convert the dataset to.
|
SDMX Formats
Note that the Fusion Excel data transmission format is supported as the input, but not output of a transformation. |
Data-Format | Used to inform the server when the data is in CSV format. | csv;delimiter=[delimiter]
Where [delimiter] is either:
|
Structure | (optional) Provides the structure to validate the data against. This is optional as this information may be present in the header of the DataSet. If provided this value will override the value in the dataset (if present). |
Valid SDMX URN for Provision Agreement, Dataflow, or Data Structure Definition |
Receiver-Id (Since v9.8) |
The ReceiverId may be included in the validation report. If not provided, the ReceiverId will be taken from the header of the dataset if it is present. If the dataset does not contain a ReceiverId (for example a non-SDMX format) then the validation report will not contain a ReceiverId in the header. |
The following characters are allowed: A-Z, a-z, 0-9, $, _, -, @, \ |
Structure | Provides the structure used to read the data. This is optional as this information may be present in the header of the DataSet. If provided this value will override the value in the dataset (if present). |
Valid SDMX URN for Provision Agreement, Dataflow, or Data Structure Definition. |
Dataset-Idx | If the loaded file contains multiple datasets, this argument can be used to indicate which dataset is transformed. If this argument is not present then all datasets will be in the output file (if the file format permits multiple datasets). |
Zero indexed integer, example: 0 |
Dataset-Id (Since v9.8) |
An optional parameter which allows the user to specify the value of the DataSetID generated in the validation. |
The following characters are allowed:
A-Z, a-z
0-9
$, _, -, @, \
Specific variables permit the insertion of Data Structure / Data Flow values. These values are:
|
Dataset-Action (Since v9.8.1) |
An optional parameter which allows the user to specify the value of the DataSetAction generated in the validation report. If this parameter is not specified, the default value will be used. | May be one of the following:
|
Map-Structure (Since v9.2.13) |
An optional parameter to inform the Fusion Registry to transform the structure of the dataset to conform to another Data Structure Definition. The value provided can be a URN of a Dataflow or Data Structure Definition to map the incoming data to. A Structure Map must exist in the Fusion Registry which maps between the incoming Data Structure/Dataflow and Mapped Data Structure/Dataflow. Alternatively the URN may be the URN of the Data Structure Map to use for the mapping (since v9.4.4) |
Valid SDMX URN for Dataflow or Data Structure Definition. |
Inc-Unmapped (Since v9.6.5) |
If the Map-Structure Header is used, then the inclusion of Inc-Unmapped will output a second dataset, if there are unmapped series. The additional dataset contains the data that could not be mapped due to missing mapping rules, or ambiguous outputs. The format of the additional dataset is the same as the format of the output dataset. As the result may contain a separate file, the response is either a multipart/mixed message with a boundary per file, or, if the Zip header is set to true, a single zip file. The file names are 'out' and 'unmapped' with the file extension based on the output format. |
Boolean (true/false) |
Inc-UnmappedReport (Since v11.5.0) |
If the Map-Structure Header is used, then the inclusion of Inc-UnmappedReport may output another file, if there are unmapped series. The additional file contains a report on the information that could not be mapped due to missing mapping rules, or ambiguous outputs. The format of this report consists of JSON elements:
The result consists of an Input and an Output which details what the input managed to map to. The output also contains an array called "MissingDimensions" which lists the IDs of the missing dimensions. |
Boolean (true/false) |
Inc-Metrics (Since v9.6.5) |
Includes metrics on the transformation. The result will contain a separate file, either as a multipart/mixed message with a boundary per file, or if the Zip header is set to true, the output will be a single zip file. |
Boolean (true/false)
|
Fail-On-Error (Since v9.5.0) |
An optional parameter to tell the transformation process to fail if an error is detected in the dataset. |
Boolean (true/false) |
Zip (Since v9.6.5) |
Compresses the output as a zip file. If used in conjunction with Inc-Metrics or Inc-Unmapped, the zip will contain multiple files. |
Boolean (true/false) |
Duplicate-Behaviour (Since v11.1.6) |
Specifies the behaviour when duplicate observations are encountered. The duplicates can be preserved, or either the first or the last value can be used. |
May be one of the following:
|
Skip-Validation (Since v11.5.1) |
Allows the validation process to be skipped when transforming a file. Useful when the input file is well understood or large. Default is false. |
Boolean (true/false) |
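To show how several of the headers in the table combine, here is a minimal Python sketch that assembles headers for a mapped transformation and unpacks a zipped result. The header names and the 'out' / 'unmapped' file names come from the table above. The helper names, the Accept media type and the in-memory demo archive are assumptions for illustration only, not part of the service.

```python
import io
import zipfile


def build_mapping_headers(accept, map_structure, inc_unmapped=False,
                          inc_metrics=False, zip_output=False):
    """Assemble request headers for a transformation with structure mapping."""
    headers = {"Accept": accept, "Map-Structure": map_structure}
    # Boolean headers are only sent when switched on.
    if inc_unmapped:
        headers["Inc-Unmapped"] = "true"
    if inc_metrics:
        headers["Inc-Metrics"] = "true"
    if zip_output:
        headers["Zip"] = "true"
    return headers


def read_zip_result(zip_bytes):
    """Return the files of a zipped response keyed by name without extension."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name.rsplit(".", 1)[0]: archive.read(name)
                for name in archive.namelist()}


headers = build_mapping_headers(
    # Assumed media type; verify the supported values for your server.
    accept="application/vnd.sdmx.structurespecificdata+xml;version=2.1",
    map_structure="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS:OUT_FLOW(1.0)",
    inc_unmapped=True,
    zip_output=True,
)

# Stand-in for a real zipped response, built in memory for illustration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("out.xml", "<mapped/>")
    demo.writestr("unmapped.xml", "<unmapped/>")

files = read_zip_result(buffer.getvalue())
print(sorted(files))
```

Requesting a zip keeps the client code simple, since a single archive is easier to unpack than a multipart/mixed message with one boundary per file.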
Include Metrics
The following JSON is an example response when the Inc-Metrics header is set to true. RequestTime is epoch time in milliseconds, and Duration is the number of milliseconds taken to complete the transformation.
{
  "Meta": {
    "RequestTime": 1559124708568,
    "Duration": 220
  },
  "SourceData": {
    "Datasets": [
      {
        "Structure": "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS:IN_FLOW(1.0)",
        "Series": 3118,
        "Observations": 3118,
        "Groups": 0
      }
    ]
  },
  "OutputData": {
    "Datasets": [
      {
        "Structure": "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS:OUT_FLOW(1.0)",
        "Series": 1753,
        "Observations": 1855,
        "Groups": 0
      }
    ]
  },
  "UnMappedData": {
    "Datasets": [
      {
        "Structure": "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS:IN_FLOW(1.0)",
        "Series": 1263,
        "Observations": 1263,
        "Groups": 0
      }
    ]
  }
}
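A metrics document of this shape can be summarised with a few lines of Python. The sketch below uses a trimmed copy of the example above (Structure, Observations and Groups omitted for brevity). The function name summarise_metrics and the derived "unmapped_share" figure are illustrative additions, not fields returned by the service.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the example metrics document shown above.
metrics_json = """
{
  "Meta": {"RequestTime": 1559124708568, "Duration": 220},
  "SourceData": {"Datasets": [{"Series": 3118}]},
  "OutputData": {"Datasets": [{"Series": 1753}]},
  "UnMappedData": {"Datasets": [{"Series": 1263}]}
}
"""


def summarise_metrics(text):
    """Extract timing and series counts from a metrics document."""
    metrics = json.loads(text)

    def total_series(section):
        # Sum the Series counts over every dataset in the section, if present.
        datasets = metrics.get(section, {}).get("Datasets", [])
        return sum(dataset["Series"] for dataset in datasets)

    source = total_series("SourceData")
    unmapped = total_series("UnMappedData")
    return {
        # RequestTime is epoch milliseconds, so divide by 1000 for seconds.
        "requested_at": datetime.fromtimestamp(
            metrics["Meta"]["RequestTime"] / 1000, tz=timezone.utc
        ),
        "duration_ms": metrics["Meta"]["Duration"],
        "source_series": source,
        "output_series": total_series("OutputData"),
        "unmapped_series": unmapped,
        # Illustrative derived value: fraction of source series left unmapped.
        "unmapped_share": unmapped / source if source else 0.0,
    }


summary = summarise_metrics(metrics_json)
print(summary)
```

A check on the unmapped share can be a quick way to spot a Structure Map that is missing rules before inspecting the unmapped dataset itself.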