Difference between revisions of "Data Structure Definition V10"

From FMR Knowledge Base
(Conventions)
(Usage)
 
(49 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
+
[[Category:SDMX 2.1 Structures]]
[[Category:SDMX Structures]]
 
 
=Overview=
 
=Overview=
<p>An SDMX Data Structure Definition (DSD) describes the structure and dimensionality of a dataset in terms of its dimensions, attributes and measures.</p>
+
<p>An SDMX Data Structure Definition (DSD) provides a template which describes the dimensionality of related datasets in terms of their dimensions, attributes and measures.</p>
  
 
==Structure Properties==
 
==Structure Properties==
Line 10: Line 9:
 
|-
 
|-
 
! scope=row style="text-align: left;"  | Maintainable
 
! scope=row style="text-align: left;"  | Maintainable
| Yes
+
| [[Maintainable_V10|Yes]]
 
|-
 
|-
 
! scope=row style="text-align: left;"  | Identifiable
 
! scope=row style="text-align: left;"  | Identifiable
| Yes
+
| [[Identifiable V10|Yes]]
 
|-
 
|-
 
! scope=row style="text-align: left;"  | Item Scheme
 
! scope=row style="text-align: left;"  | Item Scheme
| No
+
| [[Item_Scheme_V10|No]]
 
|-
 
|-
 
! scope=row style="text-align: left;"  | SDMX Information Model Versions  
 
! scope=row style="text-align: left;"  | SDMX Information Model Versions  
 
| 1.0, 2.0, 2.1
 
| 1.0, 2.0, 2.1
 
|-
 
|-
! scope=row style="text-align: left;"  | Concept ID
+
! scope=row style="text-align: left;"  | URN - DataStructure namespace
| DSD
+
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure</nowiki>
 +
|-
 +
! scope=row style="text-align: left;"  | URN - Dimension namespace
 +
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.Dimension</nowiki>
 +
|-
 +
! scope=row style="text-align: left;"  | URN - Attribute namespace
 +
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.DataAttribute</nowiki>
 +
|-
 +
! scope=row style="text-align: left;"  | URN - MeasureDimension namespace
 +
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.MeasureDimension</nowiki>
 +
|-
 +
! scope=row style="text-align: left;"  | URN - TimeDimension namespace
 +
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.TimeDimension</nowiki>
 +
|-
 
|}
 
|}
  
 
==Context within the SDMX 2.1 Information Model==
 
==Context within the SDMX 2.1 Information Model==
::[[File:SDMX_Information_Model_-_Core_Artefacts_-_DSD.png|600px|frameless]]
+
 
<p>The schematic illustrates the Data Structure Definition artefact within the SDMX 2.1 Information Model</p>
+
[[File:L_DSD1.png|Data Structure Definition|600px]]
  
 
==Usage==
 
==Usage==
Line 33: Line 45:
 
* [[Dimension|Dimensions]]
 
* [[Dimension|Dimensions]]
 
* [[Attribute|Attributes]]
 
* [[Attribute|Attributes]]
* [[Measure|Measures]]
+
* Measures
 
and optionally the [[Representation]] for each Component.</p>
 
and optionally the [[Representation]] for each Component.</p>
  
<p>Each [[Dataflow|Dataflow]] references a single DSD which describes the structure of the dataset that the Dataflow represents.</p>
+
<p>Each [[Dataflow_V10|Dataflow]] references a single DSD which describes the structure of the dataset that the Dataflow represents.</p>
  
 
=Conventions=
 
=Conventions=
 
<p>
 
<p>
 
<strong>DSD IDs</strong><br>
 
<strong>DSD IDs</strong><br>
DSD IDs are conventionally written uppercase using underscores '_' as separators if required. Examples:<br>
+
DSD IDs are conventionally uppercase using underscores '_' as separators if required. Examples:<br>
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Agency !! DSD ID !! Description
+
! Agency !! DSD ID !! Description !! SDMX-ML
 
|-
 
|-
| Eurostat || NA_MAIN || European National Accounts Main Aggregated Statistical Indicators
+
| Eurostat || NA_MAIN || European National Accounts Main Aggregated Statistical Indicators || [https://registry.sdmx.org/ws/public/sdmxapi/rest/datastructure/ESTAT/NA_MAIN/1.11 SDMX-ML]
 
|-
 
|-
| IMF || BPO || Balance of Payments and International Investment Position
+
| IMF || BOP || Balance of Payments and International Investment Position || [https://registry.sdmx.org/ws/public/sdmxapi/rest/datastructure/IMF/BOP/1.13 SDMX-ML]
 
|-
 
|-
| IMF || ALT_FISCAL_DSD || Alternate Fiscal Data Structure Definition
+
| IMF || ALT_FISCAL_DSD || Alternate Fiscal Data Structure Definition || [https://sdmxcentral.imf.org/ws/public/sdmxapi/rest/datastructure/IMF/ALT_FISCAL_DSD/1.0 SDMX-ML]
| World Bank || WDI || World Development Indicators
+
|-
 +
| World Bank || WDI || World Development Indicators || [https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/datastructure/WB/WDI/1.0 SDMX-ML]
 
|}
 
|}
 +
The SDMX standard does not preclude using lowercase or mixed case for structure IDs. However, IDs are case sensitive, meaning that a DSD with ID 'NATIONAL_ACCOUNTS' is distinct from another with ID 'National_Accounts'.<br>
 +
<strong>TIME_PERIOD</strong><br>
 +
For Time Series DSDs, the [[Time Dimension]] Component is conventionally given the ID 'TIME_PERIOD'.
 
</p>
 
</p>
  
 
=Data Structure Components=
 
=Data Structure Components=
 
<p>
 
<p>
<strong>The Role of Concepts In Defining a DSD's Components</strong><br>
+
<strong>The role of Concepts in defining a DSD's Components</strong><br>
Every Dimension, Attribute and Measure is described by a predefined [[Concept]]. Concepts have their own default [[Representation]] which can be overridden by defining a [[Local Representation]] for the Component in the DSD. That's particularly helpful when using some standard Concepts like the [https://registry.sdmx.org/ws/public/sdmxapi/rest/conceptscheme/SDMX/CROSS_DOMAIN_CONCEPTS/2.0 SDMX Cross Domain Concepts] where the default Representation is 'String', but the Component needs to be [[Enumerated]] or have some use case specific restriction on what values are allowable.  
+
Every Dimension, Attribute and Measure is described by a predefined [[Concepts V10|Concept]]. Concepts have their own default [[Representation]] which can be overridden by defining a Local Representation for the Component in the DSD. That's particularly helpful when using standard Concepts like the [https://registry.sdmx.org/ws/public/sdmxapi/rest/conceptscheme/SDMX/CROSS_DOMAIN_CONCEPTS/2.0 SDMX Cross Domain Concepts] where the default Representation is 'String', but the Component needs to be Enumerated or have some use-case-specific restriction on which values are allowable.
 
</p>
 
</p>
  
Line 89: Line 105:
 
| n/a || Primary Measure || Observation Value || The observation value
 
| n/a || Primary Measure || Observation Value || The observation value
 
|}
 
|}
The Series Key is the concatenation of the Dimensions in the order specified in the DSD. In this example, the Series Key is:<br><br>
+
The Series Key is the dot (.) concatenation of the Dimensions in the order specified in the DSD.  
<code>INDICATOR.REF_AREA.FREQUENCY</code><br><br>
+
 
 +
For the DSD above, the Series Key is constructed as follows:
 +
 
 +
'''<INDICATOR>.<REF_AREA>.<FREQUENCY>'''
 +
 
 +
Examples
 +
ATMCO2.GRC.A
 +
ATMCO2.GRC.M
 +
TBSINDC.MDG.A
 +
TBSINDC.GBR.M
 +
 
 +
 
 
Attributes do not form part of the Series Key so have no explicit or implied ordering.
 
Attributes do not form part of the Series Key so have no explicit or implied ordering.
 
</p>
 
</p>
Line 96: Line 123:
 
<p>
 
<p>
 
<strong>Attributes</strong><br>
 
<strong>Attributes</strong><br>
Attributes allow extra concepts to be added to the dataset to provide additional information about the variable being measured such as the unit multipler or observation status.<br>
+
Attributes allow extra concepts to be added to the dataset to provide additional information about the variable being measured such as the unit multiplier or observation status.<br>
 
Attributes are unique in that they must be attached to [[#Attribute Attachment Levels|specific levels in the dataset]] at DSD design time.   
 
Attributes are unique in that they must be attached to [[#Attribute Attachment Levels|specific levels in the dataset]] at DSD design time.   
 
</p>
 
</p>
Line 107: Line 134:
 
<p>
 
<p>
 
<strong>Time Dimension</strong><br>
 
<strong>Time Dimension</strong><br>
A Time Dimension is required for DSDs representing [[Time Series]] datasets. Again, the Time Dimension must reference a Concept which should have an appropriate time representation - typically [[Observational Time Period]].
+
A Time Dimension is required for DSDs representing [[Time Series]] datasets. Again, the Time Dimension must reference a Concept which should have an appropriate time representation - typically Observational Time Period.
 
</p>
 
</p>
  
 
=Attribute Attachment Levels=
 
=Attribute Attachment Levels=
In designing a DSD, attributes must be attached to specific levels in the datasets.
+
<p>
 +
In designing a DSD, attributes must be attached to specific levels in the dataset.
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
Line 122: Line 150:
 
| Observation || A different value for the attribute can be set for each individual observation in a time series
 
| Observation || A different value for the attribute can be set for each individual observation in a time series
 
|-
 
|-
| Groupe || A different value for the attribute can be set for a [[Group]] of series
+
| Group || A different value for the attribute can be set for a [[Group]] of series
 
|}
 
|}
 +
Example 'Demography' DSD with Attributes attached at the Series and Observation level:<br><br>
 +
[[File:DSD With Attribute Attachments.PNG|500px]]
 +
</p>
 +
 +
=Time Series=
 +
<p>
 +
DSDs for Time Series are characterised by having an explicit [[Time Dimension]].<br>
 +
In combination with the DSD's other Dimensions, the Time Dimension uniquely identifies an individual Observation within a Dataset.
 +
</p>
 +
 +
=Non Time Series=
 +
<p>
 +
DSDs can be designed for non Time Series datasets by excluding the [[Time Dimension]]. This supports use cases like census statistics, where the observations relate to a fixed point in time so there is no sequence of observations over a period of time.
 +
</p>
 +
 +
=Data Structure Definitions with Multiple Measures=
 +
Data Structures containing multiple measures can be created in the Registry. However, the generate data set option does not support multiple measures, and the load/validation process will reject datasets containing multiple measures.

Latest revision as of 04:28, 28 March 2024

Overview

An SDMX Data Structure Definition (DSD) provides a template which describes the dimensionality of related datasets in terms of their dimensions, attributes and measures.

Structure Properties

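{| class="wikitable"
|-
! scope=row style="text-align: left;"  | Structure Type
| Standard SDMX Structural Metadata Artefact
|-
! scope=row style="text-align: left;"  | Maintainable
| Yes
|-
! scope=row style="text-align: left;"  | Identifiable
| Yes
|-
! scope=row style="text-align: left;"  | Item Scheme
| No
|-
! scope=row style="text-align: left;"  | SDMX Information Model Versions
| 1.0, 2.0, 2.1
|-
! scope=row style="text-align: left;"  | URN - DataStructure namespace
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure</nowiki>
|-
! scope=row style="text-align: left;"  | URN - Dimension namespace
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.Dimension</nowiki>
|-
! scope=row style="text-align: left;"  | URN - Attribute namespace
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.DataAttribute</nowiki>
|-
! scope=row style="text-align: left;"  | URN - MeasureDimension namespace
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.MeasureDimension</nowiki>
|-
! scope=row style="text-align: left;"  | URN - TimeDimension namespace
| <nowiki>urn:sdmx:org.sdmx.infomodel.datastructure.TimeDimension</nowiki>
|}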
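
As a rough illustration of how these namespaces are used, the sketch below appends a structure identity to a namespace to form a full URN. The <code>=AGENCY:ID(VERSION)</code> suffix and the trailing <code>.DIMENSION_ID</code> follow the usual SDMX URN pattern but are not stated on this page, so treat them as assumptions. The function names and the agency, DSD ID and version values are hypothetical.

```python
# Namespace prefixes copied from the Structure Properties table.
DSD_NAMESPACE = "urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure"
DIMENSION_NAMESPACE = "urn:sdmx:org.sdmx.infomodel.datastructure.Dimension"


def dsd_urn(agency: str, dsd_id: str, version: str) -> str:
    """Build a DSD URN (assumed suffix form: =AGENCY:ID(VERSION))."""
    return f"{DSD_NAMESPACE}={agency}:{dsd_id}({version})"


def dimension_urn(agency: str, dsd_id: str, version: str, dimension_id: str) -> str:
    """Build a Dimension URN (assumed form: DSD identity followed by .DIMENSION_ID)."""
    return f"{DIMENSION_NAMESPACE}={agency}:{dsd_id}({version}).{dimension_id}"


# Hypothetical agency, DSD ID and version, used only for illustration.
print(dsd_urn("EXAMPLE", "DEMO_DSD", "1.0"))
print(dimension_urn("EXAMPLE", "DEMO_DSD", "1.0", "REF_AREA"))
```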

Context within the SDMX 2.1 Information Model

[[File:L_DSD1.png|Data Structure Definition|600px]]

Usage

Data Structure Definitions (DSDs) are used to describe the structure of datasets by specifying their constituent Components:

* Dimensions
* Attributes
* Measures

and optionally the Representation for each Component.

Each Dataflow references a single DSD which describes the structure of the dataset that the Dataflow represents.

Conventions

DSD IDs
DSD IDs are conventionally uppercase using underscores '_' as separators if required. Examples:

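{| class="wikitable"
|-
! Agency !! DSD ID !! Description !! SDMX-ML
|-
| Eurostat || NA_MAIN || European National Accounts Main Aggregated Statistical Indicators || [https://registry.sdmx.org/ws/public/sdmxapi/rest/datastructure/ESTAT/NA_MAIN/1.11 SDMX-ML]
|-
| IMF || BOP || Balance of Payments and International Investment Position || [https://registry.sdmx.org/ws/public/sdmxapi/rest/datastructure/IMF/BOP/1.13 SDMX-ML]
|-
| IMF || ALT_FISCAL_DSD || Alternate Fiscal Data Structure Definition || [https://sdmxcentral.imf.org/ws/public/sdmxapi/rest/datastructure/IMF/ALT_FISCAL_DSD/1.0 SDMX-ML]
|-
| World Bank || WDI || World Development Indicators || [https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/datastructure/WB/WDI/1.0 SDMX-ML]
|}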

The SDMX standard does not preclude using lowercase or mixed case for structure IDs. However, IDs are case sensitive, meaning that a DSD with ID 'NATIONAL_ACCOUNTS' is distinct from another with ID 'National_Accounts'.
TIME_PERIOD
For Time Series DSDs, the Time Dimension Component is conventionally given the ID 'TIME_PERIOD'.
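
The conventions above can be sketched in a few lines. The check below tests only the uppercase-and-underscore convention (allowing digits is an assumption), and the URL helper mirrors the <code>datastructure/AGENCY/ID/VERSION</code> path seen in the registry links for the example DSDs on this page. All function names are hypothetical.

```python
import re

# Conventional form: uppercase letters and underscores.
# Allowing digits is an assumption; SDMX itself also permits other cases.
CONVENTIONAL_ID = re.compile(r"[A-Z0-9_]+")


def follows_convention(dsd_id: str) -> bool:
    """Return True if the ID matches the uppercase/underscore convention."""
    return CONVENTIONAL_ID.fullmatch(dsd_id) is not None


def same_structure_id(left: str, right: str) -> bool:
    """IDs are case sensitive, so they are compared exactly."""
    return left == right


def datastructure_url(base_url: str, agency: str, dsd_id: str, version: str) -> str:
    """Build a query URL following the path pattern of the links on this page."""
    return f"{base_url.rstrip('/')}/datastructure/{agency}/{dsd_id}/{version}"


print(follows_convention("NA_MAIN"))
print(same_structure_id("NATIONAL_ACCOUNTS", "National_Accounts"))
print(datastructure_url("https://registry.sdmx.org/ws/public/sdmxapi/rest",
                        "ESTAT", "NA_MAIN", "1.11"))
```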

Data Structure Components

The role of Concepts in defining a DSD's Components
Every Dimension, Attribute and Measure is described by a predefined Concept. Concepts have their own default Representation which can be overridden by defining a Local Representation for the Component in the DSD. That's particularly helpful when using standard Concepts like the SDMX Cross Domain Concepts where the default Representation is 'String', but the Component needs to be Enumerated or have some use-case-specific restriction on which values are allowable.
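
The override rule can be pictured with a minimal sketch. Representations are reduced to plain strings here, and the class names, the <code>effective_representation</code> method and the codelist name <code>CL_AREA</code> are all hypothetical. This is not how a registry stores them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: str
    default_representation: str  # e.g. "String"


@dataclass(frozen=True)
class Component:
    component_id: str
    concept: Concept
    # Set in the DSD to override the Concept's default; None means "no override".
    local_representation: Optional[str] = None

    def effective_representation(self) -> str:
        """Local Representation wins; otherwise fall back to the Concept's default."""
        if self.local_representation is not None:
            return self.local_representation
        return self.concept.default_representation


ref_area_concept = Concept("REF_AREA", "String")
plain = Component("REF_AREA", ref_area_concept)
# "Codelist CL_AREA" is a made-up enumerated representation for illustration.
coded = Component("REF_AREA", ref_area_concept, local_representation="Codelist CL_AREA")

print(plain.effective_representation())
print(coded.effective_representation())
```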

Dimensions
A DSD's Dimensions are the minimal set of statistical concepts capable of uniquely identifying a specific series. For Time Series, the Dimensions, in combination with the Time Dimension, uniquely identify an Observation.
In this sense, the Dimensions of a dataset together form its primary key.

Ordering of Dimensions in a DSD
The Dimensions in a DSD have a defined order and together form the dataset's Series Key.
Below is a simple example DSD:

{| class="wikitable"
|-
! Position !! Component Type !! Component ID !! Description
|-
| 1 || Dimension || INDICATOR || Indicator
|-
| 2 || Dimension || REF_AREA || Reference Area
|-
| 3 || Dimension || FREQUENCY || Data Frequency
|-
| n/a || Time Dimension || TIME_PERIOD || Observation Time
|-
| n/a || Attribute || UNIT_MULT || Unit Multiplier e.g. tens, thousands, millions
|-
| n/a || Attribute || Observation Status || Observation Status e.g. Estimated, Final
|-
| n/a || Primary Measure || Observation Value || The observation value
|}

The Series Key is the dot (.) concatenation of the Dimensions in the order specified in the DSD.

For the DSD above, the Series Key is constructed as follows:

<INDICATOR>.<REF_AREA>.<FREQUENCY>

Examples

ATMCO2.GRC.A
ATMCO2.GRC.M
TBSINDC.MDG.A
TBSINDC.GBR.M


Attributes do not form part of the Series Key so have no explicit or implied ordering.
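
The Series Key construction described above can be sketched as follows, using the dimension order of the example DSD. The function names are hypothetical, and the sketch assumes dimension values themselves never contain a dot.

```python
from typing import Dict, Sequence

# Dimension order taken from the example DSD (positions 1 to 3).
DIMENSION_ORDER = ("INDICATOR", "REF_AREA", "FREQUENCY")


def build_series_key(values: Dict[str, str],
                     order: Sequence[str] = DIMENSION_ORDER) -> str:
    """Join the dimension values with dots, in the order the DSD defines."""
    missing = [dim for dim in order if dim not in values]
    if missing:
        raise ValueError(f"missing dimension values: {missing}")
    return ".".join(values[dim] for dim in order)


def parse_series_key(key: str,
                     order: Sequence[str] = DIMENSION_ORDER) -> Dict[str, str]:
    """Split a Series Key back into a dimension-to-value mapping."""
    parts = key.split(".")
    if len(parts) != len(order):
        raise ValueError(f"expected {len(order)} parts, got {len(parts)}")
    return dict(zip(order, parts))


# The order of the input mapping does not matter; the DSD order decides.
print(build_series_key({"FREQUENCY": "A", "INDICATOR": "ATMCO2", "REF_AREA": "GRC"}))
print(parse_series_key("TBSINDC.MDG.A"))
```

Because the key depends on position alone, two DSDs with the same Dimensions in a different order would produce different Series Keys for the same data.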

Attributes
Attributes allow extra concepts to be added to the dataset to provide additional information about the variable being measured such as the unit multiplier or observation status.
Attributes are unique in that they must be attached to specific levels in the dataset at DSD design time.

Primary Measure
All DSDs must have a Primary Measure Component, which is used for the observation value of the main variable being measured. Like all components, the Primary Measure must reference a Concept. For many series, the measure is numeric, but does not need to be so.

Time Dimension
A Time Dimension is required for DSDs representing Time Series datasets. Again, the Time Dimension must reference a Concept which should have an appropriate time representation - typically Observational Time Period.

Attribute Attachment Levels

In designing a DSD, attributes must be attached to specific levels in the dataset.

{| class="wikitable"
|-
! Attachment Level !! Description
|-
| Dataset || A single value for the attribute is set for the complete dataset
|-
| Series || A different value for the attribute can be set for each series
|-
| Observation || A different value for the attribute can be set for each individual observation in a time series
|-
| Group || A different value for the attribute can be set for a Group of series
|}

Example 'Demography' DSD with Attributes attached at the Series and Observation level:

[[File:DSD With Attribute Attachments.PNG|500px]]
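
To make the attachment levels more concrete, the sketch below collects the attribute values that apply to one observation from the Dataset, Series and Observation levels. It is a simplified illustration rather than how a registry stores data: the Group level is left out for brevity, and the attribute IDs <code>SOURCE</code> and <code>OBS_STATUS</code> and all values are hypothetical.

```python
from typing import Dict, Tuple

AttributeValues = Dict[str, str]


def attributes_for_observation(
    dataset_attrs: AttributeValues,
    series_attrs: Dict[str, AttributeValues],
    observation_attrs: Dict[Tuple[str, str], AttributeValues],
    series_key: str,
    time_period: str,
) -> AttributeValues:
    """Collect the attribute values that apply to a single observation."""
    resolved = dict(dataset_attrs)  # one value for the whole dataset
    resolved.update(series_attrs.get(series_key, {}))  # one value per series
    # One value per individual observation within a series.
    resolved.update(observation_attrs.get((series_key, time_period), {}))
    return resolved


# Hypothetical values, keyed by the level each attribute is attached to.
dataset_attrs = {"SOURCE": "Example agency"}
series_attrs = {"ATMCO2.GRC.A": {"UNIT_MULT": "3"}}
observation_attrs = {("ATMCO2.GRC.A", "2020"): {"OBS_STATUS": "E"}}

print(attributes_for_observation(
    dataset_attrs, series_attrs, observation_attrs, "ATMCO2.GRC.A", "2020"))
```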

Time Series

DSDs for Time Series are characterised by having an explicit Time Dimension.
In combination with the DSD's other Dimensions, the Time Dimension uniquely identifies an individual Observation within a Dataset.

Non Time Series

DSDs can be designed for non Time Series datasets by excluding the Time Dimension. This supports use cases like census statistics, where the observations relate to a fixed point in time so there is no sequence of observations over a period of time.
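
The difference between the two cases can be sketched as the key that identifies an observation. This is a minimal illustration; the function name and the tuple layout are assumptions.

```python
from typing import Optional, Tuple


def observation_key(series_key: str, time_period: Optional[str] = None) -> Tuple[str, ...]:
    """Return the key that identifies one observation.

    Time Series: the dimension values plus the Time Dimension value.
    Non Time Series: the dimension values alone, as there is no Time Dimension.
    """
    if time_period is None:
        return (series_key,)
    return (series_key, time_period)


# Time Series: the same series has one observation per time period.
print(observation_key("ATMCO2.GRC.A", "2020"))
print(observation_key("ATMCO2.GRC.A", "2021"))

# Non Time Series: one observation per combination of dimension values.
print(observation_key("ATMCO2.GRC.A"))
```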

Data Structure Definitions with Multiple Measures

Data Structures containing multiple measures can be created in the Registry. However, the generate data set option does not support multiple measures, and the load/validation process will reject datasets containing multiple measures.