DRAFT Data Process Service

From FMR Knowledge Base

Latest revision as of 01:47, 19 July 2023

Overview

The Data Process Service is a web service hosted by Fusion Registry. It accepts a dataset (multiple formats are supported) and a process instruction that tells the Fusion Registry how to process the data, for example to validate, map, transform, or export it.

[Figure: FR Process Example.jpg]

Workflow

The Data Process Service has the following workflow:

  1. Load Data - Data is submitted using HTTP POST. The server provides a token that is used to perform further actions on the data
  2. Run Process - The process to execute is sent to the server, along with the token of the dataset to apply the process to. The server provides a token to track the process
  3. Track Progress - A GET request tracks the progress of a running process, using the process's unique token as supplied by the server
  4. Export Data - If the final stage of a Process is to store the data in a temporary store, then it can be exported in any data format supported by Fusion Registry

Loaded data is retained for 2 minutes: if there is no activity on the data within 2 minutes of it being loaded, it is evicted from the system.
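The four workflow steps can be sketched as a single client-side function. This is a minimal sketch, not the Registry's own client: the `send` transport callable, the `run_workflow` name, the `poll_seconds` parameter, and the choice of `application/xml` for the load are assumptions made for illustration. Authentication is left to the transport. The entry points, the `token` field, and the Status codes (0=running, 1=success, 2=error) are taken from the service definitions on this page.

```python
import json
import time
from typing import Callable, Optional

# Hypothetical transport: (method, path, body, content_type) -> response body text.
# A real implementation would wrap an HTTP client and add authentication.
Transport = Callable[[str, str, Optional[bytes], Optional[str]], str]


def run_workflow(send: Transport, dataset: bytes, instruction: dict,
                 export_step: str, poll_seconds: float = 1.0) -> str:
    """Drive load -> run -> track -> export and return the exported data."""
    # 1. Load Data: the server answers with a token for the dataset.
    loaded = json.loads(send("POST", "/ws/secure/dataprocess/load",
                             dataset, "application/xml"))
    data_token = loaded["token"]

    # 2. Run Process: the server answers with a token for the process.
    body = json.dumps(instruction).encode("utf-8")
    started = json.loads(send("POST", f"/ws/secure/dataprocess/run/{data_token}",
                              body, "application/json"))
    process_token = started["token"]

    # 3. Track Progress: poll while Status is 0 (running).
    while True:
        status = json.loads(send(
            "GET", f"/ws/secure/dataprocess/status/{process_token}", None, None))
        if status["Status"] != 0:
            break
        time.sleep(poll_seconds)
    if status["Status"] == 2:
        raise RuntimeError(f"process {process_token} reported an error")

    # 4. Export Data: download the output of the step that stored the data.
    return send("GET",
                f"/ws/secure/dataprocess/download/{process_token}/{export_step}",
                None, None)
```

Passing the transport in as a parameter keeps the sketch independent of any particular HTTP library and allows it to be exercised without a running server.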

Load Data

Purpose

Loads a dataset into the system, where it is stored to await a process request. Loaded data may have more than one process request executed against it. After 2 minutes of inactivity, the loaded data is removed from the system.

Service Definition

Entry Point /ws/secure/dataprocess/load
Access Restricted
Http Method POST
Accepts Any Supported Data Format
Compression Zip files supported
Content-Type

1. multipart/form-data (if attaching a file) – the attached file must be in the form field named uploadFile

2. application/text or application/xml (if submitting data in the body of the POST)

Response Format JSON
Response Statuses

200 - Data file received

400 - If no dataset is provided

401 - Unauthorized (if access has been restricted)

500 - Server Error

Response

 {
   "token" : "uid123"
 }
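As an illustration of the second Content-Type option (data submitted in the body of the POST), the following sketch builds the load request with Python's standard library and reads the token from the response. The function names and the base URL used in the usage note are assumptions; the request is only constructed here, not sent, and any credentials needed for the restricted endpoint are omitted.

```python
import json
import urllib.request


def build_load_request(base_url: str, data: bytes,
                       content_type: str = "application/xml") -> urllib.request.Request:
    """Build (but do not send) a POST to the load entry point."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ws/secure/dataprocess/load",
        data=data,
        method="POST",
        headers={"Content-Type": content_type},
    )


def parse_token(response_body: str) -> str:
    """Extract the dataset token from the JSON response."""
    return json.loads(response_body)["token"]
```

A caller would pass the request to `urllib.request.urlopen` and then hand the decoded response body to `parse_token`, for example with a hypothetical base URL of `http://localhost:8080`.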

Run Process

Purpose

Runs a process against the dataset. The purpose of running a process can include transformation, mapping, and data manipulation.

Service Definition

Entry Point /ws/secure/dataprocess/run/{token}
Access Restricted
Http Method POST
Accepts Process Instruction (JSON format)
Content-Type

1. multipart/form-data (if attaching a file) – the attached file must be in the form field named uploadFile

2. application/json (if submitting data in the body of the POST)

Response Format JSON
Response Statuses

200 - Data file received

400 - If the token does not match a known dataset

401 - Unauthorized (if access has been restricted)

500 - Server Error

Response

 {
   "token" : "uid123"
 }

Process Instruction

The process instruction chains together one or more Process Steps. Each Process Step has an identifier (StepId) and links to a supported process (for example VALIDATE). The supported Properties of a Process Step are specific to the process being run. A typical example, for processes which output a dataset, is the Output property, which defines which Process Step to pass the output to.

{
  "Structure" : "urn",    //Provision, Dataflow, or DSD URN used to read the data, only required if the dataset does not contain this, or if overriding the Dataset
  "ProcessSteps" : [
    {
      "ProcessId"  : "VALIDATE", //Id of a registered process
      "StepId"     : "STEP1",    //Unique Id for the step
      "Metrics"    : true,       //true to capture metrics for the process, which are output in the track progress report
      "Properties" : {}          //optional map of properties specific to the process being run
    }
  ]
}
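A process instruction like the one above can be assembled programmatically. The helper names `make_step` and `make_instruction` are hypothetical, and the uniqueness check on StepId is a client-side convenience based on the "Unique Id for the step" comment, not documented server behaviour.

```python
import json
from typing import Optional


def make_step(process_id: str, step_id: str,
              properties: Optional[dict] = None, metrics: bool = False) -> dict:
    """Build one Process Step using the field names from the instruction format."""
    return {
        "ProcessId": process_id,
        "StepId": step_id,
        "Metrics": metrics,
        "Properties": properties or {},
    }


def make_instruction(steps: list, structure: Optional[str] = None) -> dict:
    """Chain steps into a process instruction, rejecting duplicate StepIds."""
    step_ids = [step["StepId"] for step in steps]
    if len(step_ids) != len(set(step_ids)):
        raise ValueError("StepId values must be unique")
    instruction = {"ProcessSteps": steps}
    if structure is not None:
        # Only needed if the dataset does not identify its own structure.
        instruction["Structure"] = structure
    return instruction


# Serialise for the body of the run request (Content-Type application/json).
payload = json.dumps(make_instruction([make_step("VALIDATE", "STEP1", metrics=True)]))
```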

Track Process

Purpose

To track the execution status of a process request.

Service Definition

Entry Point /ws/secure/dataprocess/status/{token}
Access Restricted
Http Method GET
Response Format JSON
Response Statuses

200 - Data file received

404 - If the token does not match a known process

401 - Unauthorized (if access has been restricted)

500 - Server Error

Response Example

 {
   "ProcessToken" : "abcd",
   "StartTime"    : 123456,   //unix time milliseconds since 1970
   "EndTime"      : null,     //unix time milliseconds since 1970
   "Status"       : 1,        //0=running,1=success,2=error
   "Steps"        :
     {
        "Step1" :
          {
            "StartTime"   : 123456,  //unix time milliseconds since 1970
            "EndTime"     : null,    //unix time milliseconds since 1970
            "Series"      : 12,
            "Rows"        : 132
          }
     }
 }
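A client might reduce the status report to a few fields of interest. The sketch below assumes the server returns plain JSON (without the explanatory comments shown in the example); the `summarise_status` name and the shape of its return value are illustrative choices, while the Status codes come from the example above.

```python
import json

# Status codes as documented in the response example.
STATUS_LABELS = {0: "running", 1: "success", 2: "error"}


def summarise_status(body: str) -> dict:
    """Turn a status response into a small summary dictionary."""
    report = json.loads(body)
    start, end = report.get("StartTime"), report.get("EndTime")
    # Elapsed time is only known once both timestamps are present.
    elapsed_ms = end - start if start is not None and end is not None else None
    return {
        "token": report["ProcessToken"],
        "state": STATUS_LABELS.get(report["Status"], "unknown"),
        "elapsed_ms": elapsed_ms,
        "steps": sorted(report.get("Steps") or {}),
    }
```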

Export Data

Purpose

Any data that is sent to a process with Id STORE_TMP can be exported in any format that the Fusion Registry supports. This enables datasets to be exported in a format different from the one in which they were loaded and, as this service can be called multiple times, in more than one format.

Service Definition

Entry Point /ws/secure/dataprocess/download/{token}/{stepId}
Access Restricted
Http Method GET
Response Format JSON
Response Statuses

200 - Data file received

404 - If the token does not match a known process

401 - Unauthorized (if access has been restricted)

500 - Server Error


Request Parameters
Parameter Required Description
saveAs False If provided, the response will include a Content-Disposition HTTP header with the value attachment; filename = {param value}
format False Defines the export data format, as an alternative to using the HTTP Accept header. See the formats reference for valid values for each format.
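The download URL, including the two optional request parameters, could be built as follows. The `build_download_url` name and the example values in the usage note (base URL, file name, and the format value `csv`) are assumptions; valid format values are defined by the formats reference, not by this sketch.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_download_url(base_url: str, token: str, step_id: str,
                       save_as: Optional[str] = None,
                       data_format: Optional[str] = None) -> str:
    """Build the export URL for the output stored by a given step."""
    path = "/ws/secure/dataprocess/download/{}/{}".format(
        quote(token, safe=""), quote(step_id, safe=""))
    params = {}
    if save_as is not None:
        params["saveAs"] = save_as      # triggers a Content-Disposition header
    if data_format is not None:
        params["format"] = data_format  # alternative to the HTTP Accept header
    query = "?" + urlencode(params) if params else ""
    return base_url.rstrip("/") + path + query
```

For example, `build_download_url("http://localhost:8080", "abcd", "STEP2", save_as="out.csv")` yields a URL ending in `/download/abcd/STEP2?saveAs=out.csv`.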

Supported Processes

Mapping

Process ID: STRUCTURE_MAP

Purpose

Uses a Structure Map to map data conforming to one Data Structure Definition to another Data Structure Definition. Mapped output is sent to the Output step; any unmapped data is sent to the UnmappedStep (if one is configured).

Parameters

Parameter Id Required Value
Output true String. The Process Step ID to pass the mapped data to
Map true URN of the Structure Map to use
UnmappedStep false The Process Step ID to pass the unmapped data to (if no mapping found)
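The parameters above might be combined into a process instruction that routes mapped and unmapped data to separate temporary stores. This is a sketch under assumptions: the step Ids `MAP`, `MAPPED`, and `UNMAPPED` are invented, the Map URN is supplied by the caller, and using two STORE_TMP steps in one instruction is assumed rather than confirmed by this page. The `dangling_targets` helper is a hypothetical client-side check that every referenced step exists.

```python
def structure_map_instruction(map_urn: str) -> dict:
    """Build an instruction that maps data and stores both outputs."""
    return {
        "ProcessSteps": [
            {
                "ProcessId": "STRUCTURE_MAP",
                "StepId": "MAP",
                "Properties": {
                    "Output": "MAPPED",          # receives mapped data
                    "Map": map_urn,              # URN of the Structure Map
                    "UnmappedStep": "UNMAPPED",  # receives unmapped data
                },
            },
            {"ProcessId": "STORE_TMP", "StepId": "MAPPED"},
            {"ProcessId": "STORE_TMP", "StepId": "UNMAPPED"},
        ]
    }


def dangling_targets(instruction: dict) -> list:
    """Return step Ids referenced by Output/UnmappedStep but never defined."""
    defined = {step["StepId"] for step in instruction["ProcessSteps"]}
    referenced = set()
    for step in instruction["ProcessSteps"]:
        properties = step.get("Properties", {})
        for key in ("Output", "UnmappedStep"):
            if key in properties:
                referenced.add(properties[key])
    return sorted(referenced - defined)
```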

Dynamic Component

Process ID: DYNAMIC_COMPONENT

Purpose

Enriches a dataset by adding one or more components whose values can be fixed, or can contain content taken from other reported values for the same row of data. For example, a Series Title can be created from the reported Indicator, Reference Area, and Frequency.

Parameters

Parameter Id Required Value
Output true String. The Process Step ID to pass the output data to
DynamicComponents true A JSON Map of Component Id to value to use


Variable Placeholder: The value can contain variables using the syntax $[COMPONENT_ID.{postfix}], where postfix is either name or desc, which will take the component's Name or Description respectively


Example

{
  "Output" : "Step3",
  "DynamicComponents"       :
  {
    "SERIES_TITLE" : "$[INDICATOR.name] for $[REF_AREA.name]",
    "OBS_STATUS"   : "A"
  }
}
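To illustrate how the placeholder syntax behaves, the following sketch expands `$[COMPONENT_ID.name]` and `$[COMPONENT_ID.desc]` against a lookup table. It is a client-side approximation, not the Registry's implementation: the `expand` function, the `labels` table, and the decision to leave unresolved placeholders untouched are all assumptions.

```python
import re

# Matches $[COMPONENT_ID.name] or $[COMPONENT_ID.desc].
_PLACEHOLDER = re.compile(r"\$\[([A-Za-z0-9_]+)\.(name|desc)\]")


def expand(template: str, labels: dict) -> str:
    """Replace placeholders using labels[component_id][postfix] where known."""
    def substitute(match: "re.Match") -> str:
        component_id, postfix = match.group(1), match.group(2)
        entry = labels.get(component_id, {})
        # Unknown components or postfixes are left as written.
        return entry.get(postfix, match.group(0))
    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical names for the values reported on one row of data.
row_labels = {
    "INDICATOR": {"name": "GDP", "desc": "Gross domestic product"},
    "REF_AREA": {"name": "France"},
}
title = expand("$[INDICATOR.name] for $[REF_AREA.name]", row_labels)
```

A fixed value such as `"A"` contains no placeholders and passes through unchanged.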

Temporary Store (for download)

Process ID: STORE_TMP

Purpose

This process step is a terminal step. It stores the data in a location that is available for download via the download service. The store is preserved for 10 minutes before being removed from the system.

Parameters

Not Applicable