Data Structure Definition V10
Revision as of 03:30, 20 December 2019
Overview
An SDMX Data Structure Definition (DSD) describes the structure and dimensionality of a dataset in terms of its dimensions, attributes and measures.
Structure Properties
Property | Value |
---|---|
Structure Type | Standard SDMX Structural Metadata Artefact |
Maintainable | Yes |
Identifiable | Yes |
Item Scheme | No |
SDMX Information Model Versions | 1.0, 2.0, 2.1 |
Concept ID | DSD |
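Because a DSD is Maintainable, it is owned by a maintenance agency and identified by an agency ID, its own ID and a version. The minimal sketch below builds two references from that triple, assuming the SDMX 2.1 URN convention for the DataStructure class and the SDMX REST structure-query path pattern; the agency `EXAMPLE`, the DSD ID `DSD_EXAMPLE` and both helper function names are hypothetical.

```python
# Sketch: referencing a Maintainable DSD by (agency, ID, version).
# Assumes the SDMX 2.1 URN convention; "EXAMPLE" and "DSD_EXAMPLE" are hypothetical.

URN_PREFIX = "urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure="


def dsd_urn(agency_id: str, dsd_id: str, version: str) -> str:
    """Return the URN of a DSD in the form ...DataStructure=AGENCY:ID(VERSION)."""
    return f"{URN_PREFIX}{agency_id}:{dsd_id}({version})"


def dsd_rest_path(agency_id: str, dsd_id: str, version: str) -> str:
    """Return the REST structure-query path for the same DSD (assumed pattern)."""
    return f"datastructure/{agency_id}/{dsd_id}/{version}"


print(dsd_urn("EXAMPLE", "DSD_EXAMPLE", "1.0"))
print(dsd_rest_path("EXAMPLE", "DSD_EXAMPLE", "1.0"))
```

The same triple drives both forms, which is what lets two DSDs with the same ID coexist under different agencies or versions.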
Context within the SDMX 2.1 Information Model
The schematic illustrates the Data Structure Definition artefact within the SDMX 2.1 Information Model.
Usage
Data Structure Definitions (DSDs) describe the structure of datasets by specifying their constituent Components, namely Dimensions, Attributes and Measures, and optionally the Representation of each Component.
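To make the component breakdown concrete, here is a minimal sketch of a DSD as a container of typed Components, each with an optional Representation. The class names, field names and the codelist ID `CL_AREA` are hypothetical and do not come from any particular SDMX library.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified DSD model; names are illustrative only.
COMPONENT_KINDS = ("Dimension", "Attribute", "Measure")


@dataclass
class Component:
    component_id: str
    kind: str                             # one of COMPONENT_KINDS
    representation: Optional[str] = None  # optional, e.g. a codelist ID

    def __post_init__(self) -> None:
        # Reject anything that is not one of the three Component kinds.
        if self.kind not in COMPONENT_KINDS:
            raise ValueError(f"unknown component kind: {self.kind!r}")


@dataclass
class DataStructureDefinition:
    dsd_id: str
    components: List[Component] = field(default_factory=list)

    def of_kind(self, kind: str) -> List[Component]:
        """Return the Components of one kind, in declaration order."""
        return [c for c in self.components if c.kind == kind]


dsd = DataStructureDefinition("DSD_EXAMPLE", [
    Component("REF_AREA", "Dimension", representation="CL_AREA"),
    Component("UNIT_MULT", "Attribute"),  # no Representation given
    Component("OBS_VALUE", "Measure"),
])
print([c.component_id for c in dsd.of_kind("Dimension")])
```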
Each Dimension and Attribute is described by a Concept. Each Concept has its own default Representation, which can be overridden by defining a Local Representation for the Component in the DSD. This is particularly helpful when using standard Concepts such as the SDMX Cross Domain Concepts (https://registry.sdmx.org/ws/public/sdmxapi/rest/conceptscheme/SDMX/CROSS_DOMAIN_CONCEPTS/2.0), whose default Representation is 'String', when the Component needs to be Enumerated or restricted to a use-case-specific set of allowable values.
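The override rule can be sketched as a simple fallback: use the Component's Local Representation if one is defined, otherwise the Concept's default. This is a minimal illustration, not an SDMX library API; the function name and the string `"Codelist:CL_OBS_STATUS"` (standing in for an enumerated Representation) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: str
    default_representation: str  # e.g. "String"


@dataclass(frozen=True)
class Component:
    component_id: str
    concept: Concept
    local_representation: Optional[str] = None  # overrides the Concept default if set


def effective_representation(component: Component) -> str:
    """Local Representation wins; otherwise fall back to the Concept's default."""
    if component.local_representation is not None:
        return component.local_representation
    return component.concept.default_representation


# A Concept whose default Representation is free text ('String').
obs_status = Concept("OBS_STATUS", "String")
free_text = Component("OBS_STATUS", obs_status)
coded = Component("OBS_STATUS", obs_status,
                  local_representation="Codelist:CL_OBS_STATUS")  # hypothetical codelist

print(effective_representation(free_text))
print(effective_representation(coded))
```

The Concept itself is untouched by the override, so other DSDs reusing the same Concept keep its default Representation.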
The Dimensions in a DSD have a defined order and together form the dataset's Series Key.
To illustrate the principle, here's a simple DSD:
Position | Component Type | Component ID | Description |
---|---|---|---|
1 | Dimension | INDICATOR | Indicator |
2 | Dimension | REF_AREA | Reference Area |
3 | Dimension | FREQUENCY | Data Frequency |
n/a | Time Dimension | TIME_PERIOD | Observation Time |
n/a | Attribute | UNIT_MULT | Unit Multiplier, e.g. tens, thousands, millions |
n/a | Attribute | OBS_STATUS | Observation Status, e.g. Estimated, Final |
n/a | Primary Measure | OBS_VALUE | The observation value |
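The Series Key is formed by concatenating the values of the Dimensions, separated by dots, in the order specified in the DSD. The Time Dimension has no position and is not part of the Series Key; it identifies individual observations within a series. For this example, the Series Key therefore takes the form:
INDICATOR.REF_AREA.FREQUENCY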
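A minimal sketch of how a Series Key is assembled from the positioned Dimensions in the example table: sort the Dimensions by position, then join one value per Dimension with dots. The codes `GDP`, `FR` and `A` are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Dimension:
    position: int
    component_id: str


# The positioned Dimensions from the example DSD (deliberately listed out of order).
# TIME_PERIOD is absent: it has no position and is not part of the Series Key.
DIMENSIONS = [
    Dimension(3, "FREQUENCY"),
    Dimension(1, "INDICATOR"),
    Dimension(2, "REF_AREA"),
]


def key_template(dimensions: List[Dimension]) -> str:
    """Return the Series Key pattern, i.e. Dimension IDs in DSD order."""
    ordered = sorted(dimensions, key=lambda d: d.position)
    return ".".join(d.component_id for d in ordered)


def series_key(dimensions: List[Dimension], values: Dict[str, str]) -> str:
    """Join each Dimension's value in DSD order; a missing value raises KeyError."""
    ordered = sorted(dimensions, key=lambda d: d.position)
    return ".".join(values[d.component_id] for d in ordered)


print(key_template(DIMENSIONS))
print(series_key(DIMENSIONS, {"REF_AREA": "FR", "FREQUENCY": "A", "INDICATOR": "GDP"}))
```

Because the order comes from the position rather than from the order in which values are supplied, two providers sending the same values always produce the same key.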
DSDs are reusable: a single DSD can be used by multiple Dataflows. This matters where a number of datasets that share the same dimensionality and coding schemes need to be collected or disseminated. Reuse also allows standard DSDs to be published for use.
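Reuse can be pictured as several Dataflows holding a reference to the same DSD. In this minimal sketch every Dataflow points at exactly one DSD by agency, ID and version; the IDs `DSD_TRADE`, `DF_EXPORTS` and `DF_IMPORTS`, the agency `EXAMPLE` and the grouping helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DsdRef:
    """Reference to a Maintainable DSD by agency, ID and version."""
    agency_id: str
    dsd_id: str
    version: str


@dataclass(frozen=True)
class Dataflow:
    dataflow_id: str
    structure: DsdRef  # each Dataflow is based on exactly one DSD


def dataflows_by_structure(flows: List[Dataflow]) -> Dict[DsdRef, List[str]]:
    """Group Dataflow IDs by the DSD they are based on."""
    groups: Dict[DsdRef, List[str]] = {}
    for flow in flows:
        groups.setdefault(flow.structure, []).append(flow.dataflow_id)
    return groups


# Hypothetical: two Dataflows with the same dimensionality reuse one DSD.
shared = DsdRef("EXAMPLE", "DSD_TRADE", "1.0")
flows = [Dataflow("DF_EXPORTS", shared), Dataflow("DF_IMPORTS", shared)]
print(dataflows_by_structure(flows))
```

Since `DsdRef` is a frozen dataclass it is hashable, so it can serve directly as the grouping key.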
Consider three datasets on the topics of Education, Health and Infrastructure. A single, simple DSD suitable for all three datasets could be designed as follows:
In this model, indicators would need to be chosen carefully to avoid ambiguity between datasets. Nevertheless, it avoids a proliferation of Data Structure Definitions.
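The ambiguity risk can be checked mechanically once each dataset's indicator codes are listed. This minimal sketch assumes the three datasets share one DSD whose INDICATOR Dimension draws on a single code list; the dataset-to-code mapping, every code in it, and the helper name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_indicator_codes(indicators_by_dataset: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return INDICATOR codes used by more than one dataset, with those datasets.

    A shared code is only a candidate for ambiguity: it is fine if it means
    the same thing everywhere, but needs review if the meaning differs."""
    users: Dict[str, Set[str]] = defaultdict(set)
    for dataset, codes in indicators_by_dataset.items():
        for code in codes:
            users[code].add(dataset)
    return {code: sorted(ds) for code, ds in users.items() if len(ds) > 1}


# Hypothetical codes for three datasets sharing one DSD.
indicators = {
    "Education": {"EDU_ENROL_RATE", "STAFF"},  # STAFF = teaching staff
    "Health": {"HLT_LIFE_EXP", "STAFF"},       # STAFF = medical staff
    "Infrastructure": {"INF_ROAD_KM"},
}
print(shared_indicator_codes(indicators))
```

Here `STAFF` is flagged because it means different things in Education and Health; prefixing codes by topic, as the other codes do, is one way to remove such clashes.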