Difference between revisions of "SDMX-CSV Data"

From FMR Knowledge Base
Latest revision as of 03:18, 14 August 2023

Overview

The SDMX-CSV Data format is an official SDMX format described on the SDMX GitHub Repository. It comes in two flavours: the older version 1.0.0 format and the newer version 2.0.0 format. Using the newer version 2.0.0 format is recommended.

It can be used as both an import and export format for the Fusion Metadata Registry, and an export format for the Fusion Edge Server and Fusion Data Browser.

The format is a comma-separated format in which each row describes a single Observation value. The columns describe each component of the Observation value, reflecting the Components in the Data Structure Definition.

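The dataset can contain Code labels in addition to the Code IDs. The order of the columns is not important; however, each column header will contain the ID of the corresponding Component. The first column must be called STRUCTURE (DATAFLOW in version 1.0.0). As the examples on this page show, in version 2.0.0 it holds the structure type (for example dataflow) and is followed by a STRUCTURE_ID column containing the URN postfix of the Structure that the dataset is for; in version 1.0.0 the DATAFLOW column contains that postfix directly.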

Formatting Using Query Parameters

The following URL parameters can be used in a RESTful query for SDMX-CSV data.

  • format = sdmx-csv
  • labels = id | name | both (id is default)
  • timeFormat = normalized
  • bom = include | exclude (Include or exclude the Byte Order Mark (BOM).
    The BOM helps Excel interpret non-Latin characters when opening a CSV file)
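
As a rough illustration of what the BOM means for a program consuming the file, the Python sketch below decodes a CSV payload with the standard `utf-8-sig` codec, which drops a leading BOM if one is present and otherwise behaves like plain UTF-8. It assumes the output is UTF-8 encoded; the helper name `decode_csv_payload` and the sample payloads are hypothetical and not part of FMR.

```python
import codecs


def decode_csv_payload(payload: bytes) -> str:
    """Decode CSV bytes as UTF-8, dropping a leading BOM if one is present."""
    return payload.decode("utf-8-sig")


# Hypothetical payloads: the same header row with and without a BOM.
with_bom = codecs.BOM_UTF8 + "STRUCTURE,STRUCTURE_ID,ACTION\n".encode("utf-8")
without_bom = "STRUCTURE,STRUCTURE_ID,ACTION\n".encode("utf-8")

# Both payloads decode to the same text, so downstream parsing is unaffected.
print(decode_csv_payload(with_bom) == decode_csv_payload(without_bom))
```

Decoding this way lets the same reader accept output produced with either bom=include or bom=exclude.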


Query Parameter | Values | Description
format | sdmx-csv | Required to output the dataset in SDMX-CSV format
labels | id, name or both (default id) | Defines whether labels or IDs are used on output. If both is selected there are 2 columns per coded component (one for the ID, one for the label)
timeFormat | normalized | Outputs the time in the highest frequency. E.g. if annual data is supplied but the highest frequency is daily, then 2001 would become either 2001-01-01 or 2001-12-31 depending on whether startPeriod or endPeriod is used.
serieskey | include or exclude (default exclude) | If include, a series key column will be included in the output. Example -:-,-true-y.
bom | include or exclude | Include or exclude the Byte Order Mark (BOM). The BOM helps Excel interpret non-Latin characters when opening a CSV file
keys | none, series, obs, both, row or row:<name> (default none) | Outputs a series key or observation key (or both the series and observation key with "both") or the row key. The series and obs keys are colon-delimited strings detailing the series / obs, and empty values are left blank (e.g. A:FR:). The row key is a dot-delimited string of all the dimensions including the time dimensions, and empty values are specified as asterisks (e.g. A.FR.*)
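
The two key shapes described for the keys parameter can be sketched in Python as follows. This is only an illustration of the string formats (colon-delimited with blanks, dot-delimited with asterisks), not FMR's implementation; the function names `colon_key` and `row_key` are hypothetical.

```python
from typing import Optional, Sequence


def colon_key(values: Sequence[Optional[str]]) -> str:
    """Colon-delimited series/obs key; empty values are left blank."""
    return ":".join(v or "" for v in values)


def row_key(values: Sequence[Optional[str]]) -> str:
    """Dot-delimited row key; empty values are written as asterisks."""
    return ".".join(v or "*" for v in values)


# A key with three components where the last one has no value.
print(colon_key(["A", "FR", None]))  # A:FR:
print(row_key(["A", "FR", None]))    # A.FR.*
```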


The output type "all-lang" will output both id and name and will add an element to the start of each row which will be the language. For each series and each language a row will be output where the name in the row is for the specified language. If there is no name value for that language, the name is simply not output. See below for an example output.

Normalized Time: If the parameter value is normalized, the TIME_PERIOD values are converted to the most granular ISO 8601 representation, taking into account the highest frequency of the data in the message.
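
To make the normalization concrete, the sketch below reproduces the annual-to-daily case given for timeFormat (2001 becoming 2001-01-01 or 2001-12-31). It handles only that single case and is an assumption about the shape of the conversion, not the server's algorithm; the function name and the `use_end` flag (standing in for the startPeriod / endPeriod choice) are hypothetical.

```python
from datetime import date


def normalize_annual_to_daily(period: str, use_end: bool = False) -> str:
    """Convert an annual period such as '2001' to an ISO 8601 date.

    use_end=False picks the first day of the year, use_end=True the last.
    """
    year = int(period)
    day = date(year, 12, 31) if use_end else date(year, 1, 1)
    return day.isoformat()


print(normalize_annual_to_daily("2001"))                # 2001-01-01
print(normalize_annual_to_daily("2001", use_end=True))  # 2001-12-31
```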

Example: https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=csv&labels=both&delimiter=tab

Note: The same formatting can be applied using HTTP Accept Headers as opposed to query parameters.

Example

An example query using the format request parameters is shown below. HTTP Accept Headers can also be used to define the same format.
https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2/data/dataflow/WB/GCI/1.0/?c%5BREF_AREA%5D=GHA&c%5BINDICATOR%5D=GCI&c%5BSUB_INDICATOR%5D=RANK&format=sdmx-csv
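
The query string above can be assembled programmatically. The Python sketch below rebuilds the same URL with the standard library, percent-encoding the square brackets of the `c[...]` component filters. The helper name `build_query_url` is hypothetical, and the sketch only constructs the URL; it does not send a request.

```python
from typing import Mapping
from urllib.parse import urlencode

BASE = "https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2/data/dataflow/WB/GCI/1.0/"


def build_query_url(base: str, filters: Mapping[str, str], **params: str) -> str:
    """Append c[COMPONENT]=value filters and formatting parameters to a base URL."""
    query = [(f"c[{component}]", value) for component, value in filters.items()]
    query.extend(params.items())
    # urlencode percent-encodes '[' and ']' as %5B and %5D.
    return base + "?" + urlencode(query)


url = build_query_url(
    BASE,
    {"REF_AREA": "GHA", "INDICATOR": "GCI", "SUB_INDICATOR": "RANK"},
    format="sdmx-csv",
)
print(url)
```

Further formatting parameters from the table, such as labels="both", can be passed as additional keyword arguments.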

An example dataset with IDs only. Spaces have been added to this example to assist readability.

STRUCTURE, STRUCTURE_ID, ACTION, REF_AREA, INDICATOR, SUB_INDICATOR, FREQ, TIME_PERIOD, OBS_VALUE
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2008,        102
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2009,        114
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2010,        114
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2011,        114
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2012,        103
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2013,        114
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2014,        111
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2015,        119
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2016,        114
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2017,        111
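
A minimal sketch of reading such a dataset in Python is shown below. It uses the standard `csv` module and looks columns up by header name, since the column order is not significant. The padding spaces are stripped because they were added here only for readability; the function name `read_observations` and the two-row sample are assumptions for illustration.

```python
import csv
import io

# Two rows copied from the example above, including the readability padding.
SAMPLE = """\
STRUCTURE, STRUCTURE_ID, ACTION, REF_AREA, INDICATOR, SUB_INDICATOR, FREQ, TIME_PERIOD, OBS_VALUE
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2008,        102
dataflow,  WB:GCI(1.0),  I,      GHA,      GCI,       RANK,          A,    2009,        114
"""


def read_observations(text: str) -> list:
    """Return one dict per observation row, keyed by column header."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    return [
        dict(zip(header, (cell.strip() for cell in row)))
        for row in reader
        if row  # skip blank lines
    ]


observations = read_observations(SAMPLE)
print(observations[0]["TIME_PERIOD"], observations[0]["OBS_VALUE"])  # 2008 102
```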

The same dataset in SDMX-CSV with labels included.

STRUCTURE, STRUCTURE_ID,                              ACTION, REF_AREA:Reference Area, INDICATOR:Indicator,               SUB_INDICATOR:Sub Indicator, FREQ:Frequency, TIME_PERIOD:Time period, OBS_VALUE:Observation
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2008,                    102
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2009,                    114
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2010,                    114
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2011,                    114
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2012,                    103
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2013,                    114
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2014,                    111
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2015,                    119
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2016,                    114
dataflow,  WB:GCI(1.0): Global Competitiveness Index, I,      GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2017,                    111
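
When labels are included, each cell combines an ID and a label. The sketch below separates the two, assuming (from the example above only) that header cells use a colon and value cells use a colon followed by a space. Splitting values on the colon-space pair keeps identifiers such as WB:GCI(1.0) intact. The function names are hypothetical.

```python
from typing import Tuple


def split_header(cell: str) -> Tuple[str, str]:
    """Split a header cell such as 'REF_AREA:Reference Area' into (id, label)."""
    component_id, _, label = cell.partition(":")
    return component_id.strip(), label.strip()


def split_value(cell: str) -> Tuple[str, str]:
    """Split a value cell such as 'GHA: Ghana' into (id, label).

    Cells without a label (for example '2008') return an empty label.
    """
    code_id, _, label = cell.partition(": ")
    return code_id.strip(), label.strip()


print(split_header("REF_AREA:Reference Area"))
print(split_value("WB:GCI(1.0): Global Competitiveness Index"))
print(split_value("2008"))
```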

Example of SDMX CSV version 1.0.0

An example query using the format request parameters is shown below. HTTP Accept Headers can also be used to define the same format.
https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2/data/dataflow/WB/GCI/1.0/?c%5BREF_AREA%5D=GHA&c%5BINDICATOR%5D=GCI&c%5BSUB_INDICATOR%5D=RANK&format=sdmx-csv-1.0.0

An example dataset with IDs only. Spaces have been added to this example to assist readability.

DATAFLOW,    REF_AREA, INDICATOR, SUB_INDICATOR, FREQ, TIME_PERIOD, OBS_VALUE
WB:GCI(1.0), GHA,      GCI,       RANK,          A,    2008,        102
WB:GCI(1.0), GHA,      GCI,       RANK,          A,    2009,        114
WB:GCI(1.0), GHA,      GCI,       RANK,          A,    2010,        114
WB:GCI(1.0), GHA,      GCI,       RANK,          A,    2011,        114
WB:GCI(1.0), GHA,      GCI,       RANK,          A,    2012,        103
WB:GCI(1.0), GHA,      GCI,       RANK,          A,    2013,        114
WB:GCI(1.0), GHA,      GCI,       RANK,          A,    2014,        111
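
Because the two versions differ in their first column (STRUCTURE in version 2.0.0, DATAFLOW in version 1.0.0), a reader can guess which version it has been given from the header row. The Python sketch below shows one way to do that; it is a heuristic based on the examples on this page, and the function name is hypothetical.

```python
def detect_sdmx_csv_version(header_line: str) -> str:
    """Guess the SDMX-CSV version from the name of the first column."""
    # Drop a leading BOM if present, then take the first comma-separated cell.
    first_column = header_line.lstrip("\ufeff").split(",", 1)[0].strip().upper()
    if first_column == "STRUCTURE":
        return "2.0.0"
    if first_column == "DATAFLOW":
        return "1.0.0"
    raise ValueError(f"Unrecognised first column: {first_column!r}")


print(detect_sdmx_csv_version("DATAFLOW,    REF_AREA, INDICATOR"))         # 1.0.0
print(detect_sdmx_csv_version("STRUCTURE, STRUCTURE_ID, ACTION, REF_AREA"))  # 2.0.0
```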

The same dataset in SDMX-CSV with labels included.

DATAFLOW,                                  REF_AREA:Reference Area, INDICATOR:Indicator,               SUB_INDICATOR:Sub Indicator, FREQ:Frequency, TIME_PERIOD:Time period, OBS_VALUE:Observation
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2008,                    102
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2009,                    114
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2010,                    114
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2011,                    114
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2012,                    103
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2013,                    114
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana,              GCI: Global Competitiveness Index, RANK: Rank,                  A: Annual,      2014,                    111