Difference between revisions of "Data Validation"

From FMR Knowledge Base

Revision as of 06:14, 10 February 2020


Overview

The Fusion Registry is able to validate datasets for which there is a Dataflow present in the Registry.

Data Validation is split into three high-level validation processes:

  1. Syntax Validation - is the syntax of the dataset correct?
  2. Duplicates - a format-agnostic process of rolling up duplicate series and observations
  3. Syntax Agnostic Validation - does the dataset contain the correct content?

[Figure: Data-validation-process.png]

Data Validation can be performed either via the web User Interface of the Fusion Registry, or by POSTing data directly to the Fusion Registry's data validation web service.
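As a rough illustration of the web service route, the sketch below builds an HTTP POST request that carries a dataset as its body. The URL, the `application/xml` content type, and the function names are placeholders chosen for this example, not the Registry's documented API; check the web service documentation of your Registry for the real endpoint and headers.

```python
import urllib.request

# Hypothetical endpoint: replace with the validation URL of your own Registry.
VALIDATE_URL = "https://registry.example.org/ws/public/data/validate"


def build_validation_request(
    dataset: bytes,
    url: str = VALIDATE_URL,
    content_type: str = "application/xml",  # assumed content type for SDMX-ML
) -> urllib.request.Request:
    """Build (but do not send) a POST request with the dataset as its body."""
    return urllib.request.Request(
        url,
        data=dataset,
        headers={"Content-Type": content_type},
        method="POST",
    )


def validate(dataset: bytes, url: str = VALIDATE_URL) -> str:
    """Send the dataset and return the raw response body as text."""
    request = build_validation_request(dataset, url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it possible to inspect the method, headers, and body before anything goes over the network.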

Syntax Validation

Syntax Validation refers to validation of the reported dataset in terms of the file syntax. If the dataset is in SDMX-ML, this ensures the XML is well formed and that the XML Elements and XML Attributes are as expected. If the dataset is in Excel Format (proprietary to the Fusion Registry), these checks ensure the data complies with the expected Excel format.
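To make the first part of this concrete, the sketch below checks only that a piece of XML is well formed, using Python's standard library. It is not the Registry's implementation and it does not check that the elements and attributes match what SDMX-ML expects; the function name `xml_syntax_error` is made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_syntax_error(text: str) -> Optional[str]:
    """Return the parser's error message if the XML is malformed, else None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return str(error)
    return None
```

A dataset that fails this check cannot pass Syntax Validation, but a dataset that passes it may still fail the element and attribute checks.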

Duplicates Validation

Part of the validation process is the consolidation of a dataset. Consolidation refers to ensuring any duplicate series are 'rolled up' into a single series. This process is important for data formats such as SDMX-EDI, where the series and observation attributes are reported at the end of a dataset, after all the observation values have been reported.

Example, where "-" marks a value that is not reported in that row:

Frequency | Reference Area | Indicator | Time | Observation Value | Observation Note
A         | UK             | IND_1     | 2009 | 12.2              | -
A         | UK             | IND_1     | 2010 | 13.2              | -
A         | UK             | IND_1     | 2009 | -                 | A Note

After consolidation:

Frequency | Reference Area | Indicator | Time | Observation Value | Observation Note
A         | UK             | IND_1     | 2009 | 12.2              | A Note
A         | UK             | IND_1     | 2010 | 13.2              | -
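The roll-up shown in the example can be sketched as a merge keyed on the series dimensions plus the time period. This is a minimal illustration, not the Registry's algorithm: the names `KEY_FIELDS` and `consolidate` are hypothetical, unreported values are represented as `None`, and the rule that a later reported value overwrites an earlier one is an assumption, since the text does not say how conflicts are resolved.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]

# Fields that identify one observation: the series key plus the time period.
KEY_FIELDS = ("Frequency", "Reference Area", "Indicator", "Time")

EXAMPLE_ROWS: List[Row] = [
    {"Frequency": "A", "Reference Area": "UK", "Indicator": "IND_1",
     "Time": "2009", "Observation Value": "12.2", "Observation Note": None},
    {"Frequency": "A", "Reference Area": "UK", "Indicator": "IND_1",
     "Time": "2010", "Observation Value": "13.2", "Observation Note": None},
    {"Frequency": "A", "Reference Area": "UK", "Indicator": "IND_1",
     "Time": "2009", "Observation Value": None, "Observation Note": "A Note"},
]


def consolidate(rows: List[Row]) -> List[Row]:
    """Merge rows that share the same key into a single row."""
    merged: Dict[Tuple[Optional[str], ...], Row] = {}
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        # Start a new merged row holding only the key fields if the key is new.
        target = merged.setdefault(key, {field: row[field] for field in KEY_FIELDS})
        for name, value in row.items():
            if name in KEY_FIELDS:
                continue
            # Keep reported values; record None only if nothing is known yet.
            if value is not None or name not in target:
                target[name] = value
    # Rows come back in order of first appearance of each key.
    return list(merged.values())
```

Calling `consolidate(EXAMPLE_ROWS)` yields two rows: the 2009 row carries both the value 12.2 and the note, and the 2010 row is unchanged.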

Security

Data Validation is a public service by default, so a user can perform data validation without authenticating. The security level in the Registry can be changed to either:

  • Require that a user is authenticated before they can perform ANY data validation
  • Require that a user is authenticated before they can perform data validation on a dataset obtained from a URL
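The three security levels described above (the public default and the two stricter options) can be read as a simple access rule. The sketch below expresses that rule in code purely for illustration; the enum and function names are hypothetical and do not correspond to actual Registry configuration settings.

```python
from enum import Enum


class ValidationSecurity(Enum):
    """Hypothetical names for the three security levels."""
    PUBLIC = "public"              # default: no authentication required
    AUTH_FOR_URL = "auth_for_url"  # authentication required for datasets from a URL
    AUTH_FOR_ALL = "auth_for_all"  # authentication required for any validation


def may_validate(level: ValidationSecurity, authenticated: bool, from_url: bool) -> bool:
    """Return True if the request is allowed under the given security level."""
    if authenticated or level is ValidationSecurity.PUBLIC:
        return True
    if level is ValidationSecurity.AUTH_FOR_URL:
        # Anonymous users may still validate data they submit directly.
        return not from_url
    return False
```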