DRAFT Data Process Service
Latest revision as of 02:47, 19 July 2023
Overview
The Data Process Service is a web service hosted by Fusion Registry which accepts a dataset (multiple formats supported) and a process instruction, which tells the Fusion Registry how to process the data, for example validate, map, transform, export.
Workflow
The Data Process Service has the following workflow:
- Load Data - Data is submitted using an HTTP POST. The server returns a token that is used to perform further actions on the data
- Run Process - The process to execute is sent to the server, along with the token of the dataset to apply the process to. The server returns a token to track the process
- Track Progress - A GET request tracks the progress of a running process, using the process's unique token as supplied by the server
- Export Data - If the final stage of a Process is to store the data in a temporary store, then it can be exported in any data format supported by Fusion Registry
Loaded data is kept only while it is in use: after 2 minutes of inactivity it is evicted from the system.
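The four workflow stages map onto four entry points, which can be sketched as small URL helpers. This is a minimal illustration only: the base URL is a hypothetical placeholder, and the helper names are not part of the service.

```python
from urllib.parse import quote

# Hypothetical base URL; replace with the address of your own Fusion Registry.
BASE_URL = "https://registry.example.org"


def load_url(base: str = BASE_URL) -> str:
    """Load Data: POST the dataset here; the response carries a data token."""
    return f"{base}/ws/secure/dataprocess/load"


def run_url(data_token: str, base: str = BASE_URL) -> str:
    """Run Process: POST the process instruction for a loaded dataset."""
    return f"{base}/ws/secure/dataprocess/run/{quote(data_token, safe='')}"


def status_url(process_token: str, base: str = BASE_URL) -> str:
    """Track Progress: GET the status of a running process."""
    return f"{base}/ws/secure/dataprocess/status/{quote(process_token, safe='')}"


def download_url(process_token: str, step_id: str, base: str = BASE_URL) -> str:
    """Export Data: GET the data held by a temporary-store step."""
    token = quote(process_token, safe="")
    step = quote(step_id, safe="")
    return f"{base}/ws/secure/dataprocess/download/{token}/{step}"
```

Tokens are percent-encoded before being placed in the path so that an unexpected character in a token cannot change the shape of the URL.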
Load Data
Purpose
Loads a dataset into the system, where it is stored to await a process request. Loaded data may have more than one process request executed against it. After 2 minutes of inactivity, the loaded data is removed from the system.
Service Definition
Entry Point | /ws/secure/dataprocess/load |
Access | Restricted |
Http Method | POST |
Accepts | Any Supported Data Format |
Compression | Zip files supported |
Content-Type | 1. multipart/form-data (if attaching a file); the attached file must be in the field named uploadFile. 2. application/text or application/xml (if submitting data in the body of the POST) |
Response Format | JSON |
Response Statuses | 200 - Data file received; 400 - No dataset provided; 401 - Unauthorized (if access has been restricted); 500 - Server Error |
Response
{ "token" : "uid123" }
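A load request that submits the data in the POST body might be prepared as in the sketch below. The base URL and function names are hypothetical, and authentication is left out because the mechanism behind the Restricted access level is not described here. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_load_request(base_url: str, payload: bytes,
                       content_type: str = "application/xml") -> urllib.request.Request:
    """Prepare (but do not send) a POST of the dataset to the load entry point."""
    return urllib.request.Request(
        url=f"{base_url}/ws/secure/dataprocess/load",
        data=payload,
        method="POST",
        headers={"Content-Type": content_type},
    )


def read_token(response_body: str) -> str:
    """Extract the token from a JSON response such as { "token" : "uid123" }."""
    return json.loads(response_body)["token"]
```

Sending the request with `urllib.request.urlopen(request)` and passing the decoded body to `read_token` would yield the token needed for the Run Process call.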
Run Process
Purpose
Runs a process against a loaded dataset. The purpose of running a process can include transformation, mapping, and data manipulation.
Service Definition
Entry Point | /ws/secure/dataprocess/run/{token} |
Access | Restricted |
Http Method | POST |
Accepts | Process Instruction (JSON format) |
Content-Type | 1. multipart/form-data (if attaching a file); the attached file must be in the field named uploadFile. 2. application/json (if submitting data in the body of the POST) |
Response Format | JSON |
Response Statuses | 200 - Data file received; 400 - If the token does not match a known dataset; 401 - Unauthorized (if access has been restricted); 500 - Server Error |
Response
{ "token" : "uid123" }
Process Instruction
The process instruction chains together one or more Process Steps. Each Process Step has a unique identifier (StepId) and links to a supported process through its ProcessId (for example VALIDATE). The supported Properties of a Process Step are specific to the Process being run. A typical example, for processes which output a dataset, is the Output property, which defines which Process Step to pass the output to.
{
  "Structure" : "urn", //Provision, Dataflow, or DSD URN used to read the data, only required if the dataset does not contain this, or if overriding the Dataset
  "ProcessSteps" : [
    {
      "ProcessId" : "VALIDATE", //Id of a registered process
      "StepId" : "STEP1", //Unique Id for the step
      "Metrics" : true, //true to capture metrics for the process which are output in the track progress report
      "Properties" : {} //optional map of properties specific to the process being run
    }
  ]
}
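Building the instruction programmatically avoids hand-written JSON mistakes. The following is a minimal sketch; the helper names are hypothetical, and the duplicate check is a client-side convenience based on StepId being described as unique, not documented server behaviour. Standard JSON has no comment syntax, so the serialised body carries no annotations.

```python
import json
from typing import Any, Dict, List, Optional


def process_step(process_id: str, step_id: str, metrics: bool = False,
                 properties: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Describe one Process Step using the field names from the instruction."""
    return {
        "ProcessId": process_id,
        "StepId": step_id,
        "Metrics": metrics,
        "Properties": dict(properties or {}),
    }


def process_instruction(steps: List[Dict[str, Any]],
                        structure: Optional[str] = None) -> Dict[str, Any]:
    """Chain steps into an instruction, rejecting repeated StepIds."""
    step_ids = [step["StepId"] for step in steps]
    if len(step_ids) != len(set(step_ids)):
        raise ValueError("StepId values must be unique within an instruction")
    instruction: Dict[str, Any] = {"ProcessSteps": steps}
    if structure is not None:
        # Only needed when the dataset does not identify its own structure.
        instruction["Structure"] = structure
    return instruction


body = json.dumps(process_instruction([process_step("VALIDATE", "STEP1", metrics=True)]))
```

The resulting `body` string is what would be sent with Content-Type application/json to the run entry point.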
Track Process
Purpose
To track the execution status of a process request.
Service Definition
Entry Point | /ws/secure/dataprocess/status/{token} |
Access | Restricted |
Http Method | GET |
Response Format | JSON |
Response Statuses | 200 - Data file received; 404 - If the token does not match a known process; 401 - Unauthorized (if access has been restricted); 500 - Server Error |
Response Example
{
  "ProcessToken" : "abcd",
  "StartTime" : 123456, //unix time milliseconds since 1970
  "EndTime" : null, //unix time milliseconds since 1970
  "Status" : 1, //0=running,1=success,2=error
  "Steps" : {
    "Step1" : {
      "StartTime" : 123456, //unix time milliseconds since 1970
      "EndTime" : null, //unix time milliseconds since 1970
      "Series" : 12,
      "Rows" : 132
    }
  }
}
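A client would typically decode the numeric Status and the millisecond timestamps before showing them to a user. The sketch below does this for a response shaped like the example; the function names and the shape of the returned summary are illustrative choices, not part of the service.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Status codes as listed in the response example.
STATUS_LABELS = {0: "running", 1: "success", 2: "error"}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(millis: Optional[int]) -> Optional[datetime]:
    """Convert unix milliseconds to an aware UTC datetime; null stays None."""
    if millis is None:
        return None
    return EPOCH + timedelta(milliseconds=millis)


def summarise_status(body: str) -> Dict[str, Any]:
    """Reduce a status response (without annotations) to a readable summary."""
    report = json.loads(body)
    return {
        "token": report["ProcessToken"],
        "status": STATUS_LABELS.get(report["Status"], "unknown"),
        "started": to_datetime(report.get("StartTime")),
        "ended": to_datetime(report.get("EndTime")),
        "steps": sorted(report.get("Steps", {})),
    }
```

A polling loop could call `summarise_status` on each response and stop once the status is no longer "running".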
Export Data
Purpose
Any data that is sent to a process with Id STORE_TMP can be exported in any format that the Fusion Registry supports. This enables datasets to be exported in a format different from the one in which they were loaded and, as this service can be called multiple times, the dataset can be exported in more than one format.
Service Definition
Entry Point | /ws/secure/dataprocess/download/{token}/{stepId} |
Access | Restricted |
Http Method | GET |
Response Format | JSON |
Response Statuses | 200 - Data file received; 404 - If the token does not match a known process; 401 - Unauthorized (if access has been restricted); 500 - Server Error |
Parameter | Required | Description |
---|---|---|
saveAs | False | If provided, the response will include a content-disposition HttpHeader with the value attachment; filename = {param value} |
format | False | Defines the export data format, as an alternative to using the HTTP Accept header. See the formats reference for the valid values for each format. |
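A download URL carrying the optional parameters might be assembled as follows. This sketch assumes saveAs and format are passed as query-string parameters, which is usual for a GET but is not stated explicitly above; the function name and base URL are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_download_url(base_url: str, token: str, step_id: str,
                       save_as: Optional[str] = None,
                       data_format: Optional[str] = None) -> str:
    """Build the export URL for the data held by one temporary-store step."""
    path = (f"{base_url}/ws/secure/dataprocess/download/"
            f"{quote(token, safe='')}/{quote(step_id, safe='')}")
    params = {}
    if save_as is not None:
        params["saveAs"] = save_as      # requests a content-disposition header
    if data_format is not None:
        params["format"] = data_format  # see the formats reference for values
    # Assumption: both parameters travel in the query string.
    return f"{path}?{urlencode(params)}" if params else path
```

Calling the function several times with different `data_format` values gives one URL per export format for the same stored dataset.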
Supported Processes
Mapping
Process ID: STRUCTURE_MAP
Purpose
Uses a Structure Map to map data conforming to one Data Structure Definition to another Data Structure Definition. Mapped output is sent to the Output step; any unmapped data is sent to the UnmappedStep (if configured to do so).
Parameters
Parameter Id | Required | Value |
---|---|---|
Output | true | String. The Process Step ID to pass the mapped data to |
Map | true | URN of the Structure Map to use |
UnmappedStep | false | The Process Step ID to pass the unmapped data to (if no mapping found) |
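The parameter table translates into the Properties of a STRUCTURE_MAP step. The sketch below checks the two required parameters on the client side before building the step; the function name is hypothetical, and the check is an illustrative convenience rather than a description of how the server validates input.

```python
from typing import Any, Dict, Optional


def structure_map_step(step_id: str, map_urn: str, output: str,
                       unmapped_step: Optional[str] = None) -> Dict[str, Any]:
    """Build a STRUCTURE_MAP Process Step from the documented parameters."""
    if not map_urn:
        raise ValueError("Map is required: the URN of the Structure Map to use")
    if not output:
        raise ValueError("Output is required: the step that receives mapped data")
    properties: Dict[str, Any] = {"Output": output, "Map": map_urn}
    if unmapped_step is not None:
        # Optional: where to send rows for which no mapping was found.
        properties["UnmappedStep"] = unmapped_step
    return {
        "ProcessId": "STRUCTURE_MAP",
        "StepId": step_id,
        "Properties": properties,
    }
```

Leaving `unmapped_step` unset omits the property altogether, which matches its optional status in the table.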
Dynamic Component
Process ID: DYNAMIC_COMPONENT
Purpose
Enriches a dataset by adding one or more components whose values can be fixed, or can contain content taken from other reported values for the same row of data. For example, the creation of a Series Title based on the reported Indicator, Reference Area, and Frequency.
Parameters
Parameter Id | Required | Value |
---|---|---|
Output | true | String. The Process Step ID to pass the enriched data to |
DynamicComponents | true | A JSON Map of Component Id to the value to use |
Example
{
  "Output" : "Step3",
  "DynamicComponents" : {
    "SERIES_TITLE" : "$[INDICATOR.name] for $[REF_AREA.name]",
    "OBS_STATUS" : "A"
  }
}
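When reviewing a DynamicComponents map it can help to see which reported components each value depends on. The sketch below extracts the $[COMPONENT.property] references from a value. The placeholder pattern is inferred solely from the single example above and is an assumption; how Fusion Registry actually resolves these references is not covered here.

```python
import re
from typing import Dict, List, Tuple

# Assumed placeholder shape, inferred from "$[INDICATOR.name]": $[<id>.<property>]
PLACEHOLDER = re.compile(r"\$\[([^.\]]+)\.([^\]]+)\]")


def referenced_components(value: str) -> List[Tuple[str, str]]:
    """Return (component id, property) pairs referenced by one value."""
    return PLACEHOLDER.findall(value)


def split_fixed_and_derived(components: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Separate component ids with fixed values from those built from other values."""
    fixed = [cid for cid, value in components.items() if not referenced_components(value)]
    derived = [cid for cid, value in components.items() if referenced_components(value)]
    return fixed, derived
```

For the example above, SERIES_TITLE would be classed as derived (from INDICATOR and REF_AREA) and OBS_STATUS as fixed.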
Temporary Store (for download)
Process ID: STORE_TMP
Purpose
This process step is a terminal step, which stores the data in a location which will be available for download via the download service. The store is preserved for 10 minutes before being removed from the system.
Parameters
Not Applicable
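Because STORE_TMP is terminal and takes no parameters, it is the natural last link in a chain. The sketch below assembles a three-step instruction ending in STORE_TMP and checks that every Output and UnmappedStep points at a StepId that exists. The step ids, the placeholder map URN, and the checking function are all hypothetical and only illustrate how the steps reference one another.

```python
from typing import Any, Dict, List

INSTRUCTION: Dict[str, Any] = {
    "ProcessSteps": [
        {"ProcessId": "STRUCTURE_MAP", "StepId": "STEP1",
         # Placeholder URN: substitute the URN of a real Structure Map.
         "Properties": {"Output": "STEP2", "Map": "urn:example:structure-map"}},
        {"ProcessId": "DYNAMIC_COMPONENT", "StepId": "STEP2",
         "Properties": {"Output": "STEP3",
                        "DynamicComponents": {"OBS_STATUS": "A"}}},
        # Terminal step: no Properties, data becomes available for download.
        {"ProcessId": "STORE_TMP", "StepId": "STEP3"},
    ]
}


def dangling_references(instruction: Dict[str, Any]) -> List[str]:
    """List step references that do not match any StepId in the instruction."""
    steps = instruction["ProcessSteps"]
    known = {step["StepId"] for step in steps}
    problems = []
    for step in steps:
        for key in ("Output", "UnmappedStep"):
            target = step.get("Properties", {}).get(key)
            if target is not None and target not in known:
                problems.append(f"{step['StepId']}.{key} -> {target}")
    return problems
```

With this chain, the download entry point would be called with the process token and the step id STEP3.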