Data Transformation Web Service

From FMR Knowledge Base
Revision as of 04:38, 28 March 2024 by Vmurrell (talk | contribs)

Overview

The Data Transformation Web Service converts the dataset submitted in the POST body to the data transmission format specified by the Accept header, optionally transforming it to a different Data Structure Definition if the Registry has a Structure Map defining the mapping.

Entry Point: /ws/public/data/transform
Access: Public (default). Configurable to Private
HTTP Method: POST
Accepts: CSV, XLSX, SDMX-ML, SDMX-EDI (any format for which there is a Data Reader)
Compression: Zip files are supported; gzip responses are supported when loading from a URL
Content-Type:

1. multipart/form-data (if attaching a file) – the attached file must be in the field named uploadFile

2. application/text or application/xml (if submitting data in the body of the POST)

Response Format: Determined by the Accept header - default SDMX 2.1 Structure Specific (before FMR 11.5.0; from 11.5.0 the default is the input format)
Response Statuses:

200 - Transformation performed

400 - Transformation could not be performed (either an unreadable dataset, or an unresolvable reference to a required structure)

401 - Unauthorized (if access has been restricted)

500 - Server Error
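
As a minimal sketch of the entry point described above, the following Python builds (but does not send) a POST request carrying the dataset in the body. The host name in BASE_URL and the helper names are assumptions made for illustration; only the path, the method, the Content-Type values and the status meanings come from this page.

```python
import urllib.request

# Hypothetical host; replace with the address of your own registry.
BASE_URL = "https://registry.example.org"

# Status meanings as listed on this page.
STATUS_MEANINGS = {
    200: "Transformation performed",
    400: "Transformation could not be performed",
    401: "Unauthorized",
    500: "Server Error",
}


def build_transform_request(data: bytes, accept: str,
                            content_type: str = "application/xml") -> urllib.request.Request:
    """Build a POST request for the transformation entry point without sending it."""
    return urllib.request.Request(
        url=BASE_URL + "/ws/public/data/transform",
        data=data,
        method="POST",
        headers={"Content-Type": content_type, "Accept": accept},
    )


def describe_status(code: int) -> str:
    """Translate a response status code into the meaning documented above."""
    return STATUS_MEANINGS.get(code, "Unexpected status")
```

The request object can then be passed to urllib.request.urlopen; a 4xx or 5xx response raises urllib.error.HTTPError, whose code attribute can be given to describe_status.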

HTTP Headers

The Accept header defines the output format that the data is transformed to.

In addition, the following optional header parameters can be used to provide further details on the incoming dataset. If these details are not provided, the Fusion Registry will interrogate the dataset header to get the information. If the dataset is in a non-SDMX format, or does not contain the required information in the header, then an error response will be returned.

Each entry below gives the HTTP header, its purpose, and its allowed values.

Accept

The data transmission format to convert the dataset to.

Note: From FMR 11.5.0 the format (if not specified) defaults to the input format. Previous versions defaulted to SDMX Structure Specific 2.1.

SDMX Formats

  • application/vnd.sdmx.data+csv;version=2.0.0;labels=[id|name|both];timeFormat=[original|normalized];keys=[none|obs|series|both]
  • application/vnd.sdmx.genericdata+xml;version=2.1
  • application/vnd.sdmx.structurespecificdata+xml;version=2.1
  • application/vnd.sdmx.generictimeseriesdata+xml;version=2.1
  • application/vnd.sdmx.structurespecifictimeseriesdata+xml;version=2.1
  • application/vnd.sdmx.data+json;version=1.0.0
  • application/vnd.sdmx.data+csv;version=1.0.0;labels=[id|both];timeFormat=[original|normalized]
  • application/vnd.sdmx.data+edi
  • application/vnd.sdmx.data+json;version=2.0.0
  • application/vnd.sdmx.data+xml;version=3.0.0

Note that the Fusion Excel data transmission format is supported as the input, but not as the output, of a transformation.
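
The SDMX-CSV 2.0.0 media type above takes three bracketed options. A small sketch of a helper that assembles the Accept value and rejects options outside the listed alternatives might look like this. The function name and its default arguments are hypothetical; the page does not say which option values the server assumes when they are omitted.

```python
# Option values copied from the SDMX-CSV 2.0.0 entry in the list above.
CSV_OPTIONS = {
    "labels": ("id", "name", "both"),
    "timeFormat": ("original", "normalized"),
    "keys": ("none", "obs", "series", "both"),
}


def sdmx_csv_accept(labels: str = "id", time_format: str = "original",
                    keys: str = "none") -> str:
    """Return an Accept header value for SDMX-CSV 2.0.0 output."""
    chosen = {"labels": labels, "timeFormat": time_format, "keys": keys}
    for option, value in chosen.items():
        if value not in CSV_OPTIONS[option]:
            raise ValueError(f"{option} must be one of {CSV_OPTIONS[option]}, got {value!r}")
    options = ";".join(f"{option}={value}" for option, value in chosen.items())
    return "application/vnd.sdmx.data+csv;version=2.0.0;" + options
```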

    Data-Format

    Used to inform the server when the data is in CSV format.

    csv;delimiter=[delimiter]

    Where [delimiter] is either:

    • comma
    • tab
    • semicolon
    • space
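
For illustration, the Data-Format value can be assembled from one of the four delimiter names listed above. This is only a sketch; the helper name is hypothetical.

```python
# Delimiter names accepted in the Data-Format header, as listed above.
ALLOWED_DELIMITERS = ("comma", "tab", "semicolon", "space")


def data_format_header(delimiter: str) -> dict:
    """Return the Data-Format header for a CSV upload using the named delimiter."""
    if delimiter not in ALLOWED_DELIMITERS:
        raise ValueError(f"delimiter must be one of {ALLOWED_DELIMITERS}, got {delimiter!r}")
    return {"Data-Format": f"csv;delimiter={delimiter}"}
```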
    Structure

    (optional) Provides the structure to validate the data against.

    This is optional as this information may be present in the header of the DataSet. If provided this value will override the value in the dataset (if present).

    Valid SDMX URN for Provision Agreement, Dataflow, or Data Structure Definition
    Receiver-Id

    (Since v9.8)

    The ReceiverId may be included in the validation report.

    If not provided, the ReceiverId will be taken from the header of the dataset if it is present.

    If the dataset does not contain a ReceiverId (for example a non-SDMX format) then the validation report will not contain a ReceiverId in the header.

    The following characters are allowed: A-Z, a-z, 0-9, $, _, -, @, \

    Structure

    Provides the structure used to read the data.

    This is optional as this information may be present in the header of the DataSet. If provided this value will override the value in the dataset (if present).

    Valid SDMX URN for Provision Agreement, Dataflow, or Data Structure Definition.
    Dataset-Idx

    If the loaded file contains multiple datasets, this argument can be used to indicate which dataset is transformed. If this argument is not present then all datasets will be in the output file (if the file format permits multiple datasets).

    Zero indexed integer, example: 0
    Dataset-Id

    (Since v9.8)

    An optional parameter which allows the user to specify the value of the DataSetID generated in the validation.

    The following characters are allowed: A-Z, a-z, 0-9, $, _, -, @, \

    Specific variables permit the insertion of Data Structure / Dataflow values. These variables are:
    ${DATFLOW_ID}
    ${DATFLOW_ACY}
    ${DATFLOW_VER}
    ${DSD_ID}
    ${DSD_ACY}
    ${DSD_VER}

    Note that dots in the version number will be replaced with the _ character, since dots are not permitted in the ID.
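
The substitution is performed by the server; a client only sends the template in the Dataset-Id header. To make the rule about version numbers concrete, the following sketch imitates the expansion locally. It is an assumption about the behaviour based on the description above, not the registry's implementation, and the function name is hypothetical.

```python
def expand_dataset_id(template: str, values: dict) -> str:
    """Replace ${NAME} placeholders in a Dataset-Id template.

    Dots in values for variables ending in _VER are replaced with '_',
    mirroring the note that dots are not permitted in the ID.
    """
    result = template
    for name, value in values.items():
        if name.endswith("_VER"):
            value = value.replace(".", "_")
        result = result.replace("${" + name + "}", value)
    return result
```

For example, the template ${DSD_ID}_${DSD_VER} with a Data Structure Definition BOP at version 1.0 expands to BOP_1_0 under these assumptions.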

    Dataset-Action

    (Since v9.8.1)

    An optional parameter which allows the user to specify the value of the DataSetAction generated in the validation report. If this parameter is not specified, the default value will be used. May be one of the following:
    • Append
    • Replace
    • Merge
    • FullReplace
    • Delete
    • Information
    Map-Structure

    (Since v9.2.13)

    An optional parameter to inform the Fusion Registry to transform the structure of the dataset to conform to another Data Structure Definition.

    The value provided can be a URN of a Dataflow or Data Structure Definition to map the incoming data to. A Structure Map must exist in the Fusion Registry which maps between the incoming Data Structure/Dataflow and Mapped Data Structure/Dataflow.

    Alternatively the URN may be the URN of the Data Structure Map to use for the mapping (since v9.4.4)

    Valid SDMX URN for Dataflow or Data Structure Definition.

    Inc-Unmapped

    (Since v9.6.5)

    If the Map-Structure Header is used, then the inclusion of Inc-Unmapped will output a second dataset, if there are unmapped series. The additional dataset contains the data that could not be mapped due to missing mapping rules, or ambiguous outputs.

    The format of the additional dataset is the same format as the output dataset.

    As the result may contain a separate file, the response is either a multipart/mixed message with a boundary per file or, if the Zip header is set to true, a single zip file. The file names are 'out' and 'unmapped' with the file extension based on the output format.

    Boolean (true/false)
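
A multipart/mixed response has to be split into its parts before the mapped and unmapped datasets can be read. The sketch below does this with Python's standard email package. It assumes the caller already has the response's Content-Type header value (which carries the boundary) and the raw body; the function name is hypothetical.

```python
from email import message_from_bytes


def split_multipart(content_type: str, body: bytes) -> list:
    """Return the payload bytes of each part of a multipart response.

    A non-multipart body is returned unchanged as a single-item list.
    """
    # Prepend the Content-Type header so the email parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        return [body]
    return [part.get_payload(decode=True) for part in message.get_payload()]
```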
    Inc-UnmappedReport

    (Since v11.5.0)

    If the Map-Structure Header is used, then the inclusion of Inc-UnmappedReport may output another file, if there are unmapped series. The additional file contains a report on the information that could not be mapped due to missing mapping rules, or ambiguous outputs.

    The format of this report consists of JSON elements:

    • The StructureMap used in the mapping
    • The Source Structure URN
    • The Target Structure URN
    • The Result

    The result consists of an Input and an Output, which detail what the input managed to map to. The output also contains an array called "MissingDimensions" which lists the IDs of the missing dimensions.

    Boolean (true/false)
    Inc-Metrics

    (Since v9.6.5)

    Includes metrics on the transformation.

    The result will contain a separate file, either as a multipart/mixed message with a boundary per file, or if the Zip header is set to true, the output will be a single zip file.

    Boolean (true/false)


    Fail-On-Error

    (Since v9.5.0)

    An optional parameter to tell the transformation process to fail if an error is detected in the dataset.

    Boolean (true/false)
    Zip

    (Since v9.6.5)

    Compresses the output as a zip file. If used in conjunction with Inc-Metrics or Inc-Unmapped, the zip will contain multiple files.

    Boolean (true/false)
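
When Zip is set to true, the response body can be unpacked in memory. This is a small sketch using the standard zipfile module; the function name is hypothetical, and the entry names inside a real response depend on the output format and the other headers used.

```python
import io
import zipfile


def read_zip_entries(payload: bytes) -> dict:
    """Map each file name in a zipped response to its uncompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```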
    Duplicate-Behaviour

    (Since v11.1.6)

    Specifies the behaviour when duplicate observations are encountered. The duplicates can be preserved, or either the first or the last value can be used.

    May be one of the following:
    • useFirst
    • useLast
    • preserve
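
The three options can be illustrated on a plain list of (key, value) pairs. This is only a sketch of the semantics described above; how the server identifies duplicate observations is not stated on this page, so the key used here is an assumption.

```python
def resolve_duplicates(observations: list, behaviour: str) -> list:
    """Apply a Duplicate-Behaviour option to (key, value) pairs given in input order."""
    if behaviour == "preserve":
        return list(observations)
    if behaviour not in ("useFirst", "useLast"):
        raise ValueError(f"unknown behaviour: {behaviour!r}")
    chosen = {}
    for key, value in observations:
        # useLast always overwrites; useFirst keeps the value already stored.
        if behaviour == "useLast" or key not in chosen:
            chosen[key] = value
    return list(chosen.items())
```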
    Skip-Validation

    (Since v11.5.1)

    Allows the validation process to be skipped when transforming a file. Useful when the input file is well understood or large. Default is false.

    Boolean (true/false)


    Include Metrics

    The following JSON is an example response when the Inc-Metrics header is set to true. RequestTime is epoch time in milliseconds, and Duration is the number of milliseconds taken to complete the transformation.

    {
      "Meta": {
        "RequestTime": 1559124708568,
        "Duration": 220
      },
      "SourceData": {
        "Datasets": [
          {
            "Structure": "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS:IN_FLOW(1.0)",
            "Series": 3118,
            "Observations": 3118,
            "Groups": 0
          }
        ]
      },
      "OutputData": {
        "Datasets": [
          {
            "Structure": "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS:OUT_FLOW(1.0)",
            "Series": 1753,
            "Observations": 1855,
            "Groups": 0
          }
        ]
      },
      "UnMappedData": {
        "Datasets": [
          {
            "Structure": "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS:IN_FLOW(1.0)",
            "Series": 1263,
            "Observations": 1263,
            "Groups": 0
          }
        ]
      }
    }
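
A metrics document of this shape can be summarised with a few lines of Python. The sketch below assumes the field names shown in the example above; the function names are hypothetical.

```python
from datetime import datetime, timezone


def summarise_metrics(metrics: dict) -> dict:
    """Total the Series and Observations counts for each section of the metrics."""
    summary = {}
    for section in ("SourceData", "OutputData", "UnMappedData"):
        datasets = metrics.get(section, {}).get("Datasets", [])
        summary[section] = {
            "Series": sum(d["Series"] for d in datasets),
            "Observations": sum(d["Observations"] for d in datasets),
        }
    return summary


def request_time(metrics: dict) -> datetime:
    """Convert Meta.RequestTime (epoch milliseconds) to a UTC datetime."""
    return datetime.fromtimestamp(metrics["Meta"]["RequestTime"] / 1000, tz=timezone.utc)
```

The metrics text would first be parsed with json.loads and the resulting dictionary passed to these functions.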