Representation Map

From FMR Knowledge Base
Revision as of 03:06, 29 January 2021 by Mnelson

Overview

SDMX Version 3.0 Representation Maps are used to define mapping rules between a source value (or combination of values) and a target value (or combination of target values). When a source value is matched, the target value is output. Representation Maps can be used to map from one classification to another (for example, from ISO2 to ISO3 country codes), to map from non-coded to coded values ($ maps to USD), or to express more complex mappings that require a combination of source values to generate target values. A Representation Map is more than a simple lookup table, because it can include more complex rules: regular expression matches (which can use capture groups to transfer patterns from source to target), substring matches, and rules that apply only to certain periods of time.

Representation Maps can be used by Structure Maps when defining relationships between two Data Structure Definitions (DSDs). The Structure Map describes the relationships between the Components of the two DSDs, whilst Representation Maps describe how the values reported for each Component map. For example, a Structure Map may say that the COUNTRY Dimension in the source DSD maps to the REF_AREA Dimension in the target DSD; it is then the Representation Map that describes that the value GB maps to GBR.
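To make the division of labour concrete, here is a minimal Python sketch of the COUNTRY to REF_AREA example. It assumes a record is a plain dictionary of component ID to reported value; the names STRUCTURE_MAP, REPRESENTATION_MAP and transform are hypothetical illustrations, not part of SDMX or the FMR API.

```python
# Hypothetical sketch: the Structure Map says which source component feeds
# which target component; the Representation Map says how the values translate.
# These names are illustrative only, not an SDMX or FMR API.
STRUCTURE_MAP = {"COUNTRY": "REF_AREA"}           # source component -> target component
REPRESENTATION_MAP = {"GB": "GBR", "US": "USA"}   # source value -> target value


def transform(record):
    """Map a source-DSD record (component -> value) to a target-DSD record.

    Raises KeyError if a component or value has no mapping.
    """
    target = {}
    for source_component, target_component in STRUCTURE_MAP.items():
        value = record[source_component]
        target[target_component] = REPRESENTATION_MAP[value]
    return target


print(transform({"COUNTRY": "GB"}))  # {'REF_AREA': 'GBR'}
```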

Representation Map - Model

Source and Target

A Representation Map can describe the allowable inputs by linking to source Codelists or Valuelists; if the input is free text, the source is simply defined as Free Text. A Representation Map is not restricted to one source: it can define multiple sources, of mixed types. A combination of source values is a typical use case when a Representation Map is used by a Structure Map that maps more than one source Component. For example, a Structure Map may state that the combination of REF_AREA and CURRENCY is used to derive an output value. The corresponding Representation Map would then have to include two sources, one for REF_AREA and the other for CURRENCY. A rule may state that the UK in combination with USD maps to one output value, whilst the UK in combination with GBP maps to another.
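A rule over a combination of sources behaves like a lookup keyed on the tuple of input values. The sketch below assumes exact string matches only; the target values TARGET_A and TARGET_B and the function name map_combination are invented placeholders.

```python
# Hypothetical sketch: a Representation Map with two sources (REF_AREA and
# CURRENCY). Each rule is keyed on the combination of source values.
# The target values are invented placeholders.
from typing import Dict, Optional, Tuple

RULES: Dict[Tuple[str, str], str] = {
    ("GB", "USD"): "TARGET_A",  # UK in combination with US dollars
    ("GB", "GBP"): "TARGET_B",  # UK in combination with pounds sterling
}


def map_combination(ref_area: str, currency: str) -> Optional[str]:
    """Return the target for the (REF_AREA, CURRENCY) pair, or None if no rule matches."""
    return RULES.get((ref_area, currency))


print(map_combination("GB", "USD"))  # TARGET_A
print(map_combination("GB", "EUR"))  # None
```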

A Representation Map has the same options for targets: a target can link to a Codelist or Valuelist, or simply be declared as a free text target (any textual content is valid).

An example Representation Map that maps from ISO2 to ISO3 country codes may define a single source Codelist, CL_ISO2_COUNTRIES, and a target Codelist, CL_ISO3_COUNTRIES. Mapped source values will be Code IDs in the ISO2 Codelist, and the corresponding targets will be Codes in the ISO3 Codelist. It is perfectly valid to create the same mapping rules between ISO2 and ISO3 codes without linking the Representation Map to any source or target Codelist (just define both source and target as free text). The mapped values are the same; GB maps to GBR, for example. Linking to the country Codelists tells users of the map the intent of the mapping (to map between country Codelists). It also makes mappings discoverable by Codelist (answering the question "which mappings exist for ISO2 Countries?") and ensures referential integrity (a Code cannot be deleted from the Codelist if it is used in a mapping rule).
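The validation and referential-integrity benefits of linking to Codelists can be sketched as simple set checks. This is an illustration only: the Codelist contents are small invented subsets, and invalid_rules and can_delete_code are hypothetical helpers, not FMR behaviour or API.

```python
# Hypothetical sketch: why linking a Representation Map to Codelists helps.
# The Codelist contents are small invented subsets, not the real lists.
CL_ISO2_COUNTRIES = {"GB", "US", "FR"}
CL_ISO3_COUNTRIES = {"GBR", "USA", "FRA"}

MAPPINGS = {"GB": "GBR", "US": "USA"}


def invalid_rules(mappings, source_codes, target_codes):
    """Return rules whose source or target is not a Code in the linked Codelist."""
    return [
        (src, tgt)
        for src, tgt in mappings.items()
        if src not in source_codes or tgt not in target_codes
    ]


def can_delete_code(code, mappings):
    """A Code used as a source or target in any mapping rule must not be deleted."""
    return code not in mappings and code not in mappings.values()


print(invalid_rules(MAPPINGS, CL_ISO2_COUNTRIES, CL_ISO3_COUNTRIES))  # []
print(can_delete_code("GB", MAPPINGS))  # False
print(can_delete_code("FR", MAPPINGS))  # True
```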

Mapped Values

A Representation Map essentially defines a list of mappings, source to target. In its simplest form, the mapping rules can be thought of as a lookup table:

Simple mapping rules as a lookup table

Source Value    Target Value
GB              GBR
US              USA
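In this simplest form a Representation Map is just a dictionary lookup. A minimal Python sketch using the two rules from the table; map_value is a hypothetical name, and returning None for an unmatched value is an assumption rather than documented FMR behaviour.

```python
# Minimal sketch: the simple Representation Map above as a plain lookup table
# (source value -> target value), using the rules from the table.
SIMPLE_MAP = {
    "GB": "GBR",
    "US": "USA",
}


def map_value(source_value):
    """Return the mapped target value, or None when no rule matches (an assumption)."""
    return SIMPLE_MAP.get(source_value)


print(map_value("GB"))  # GBR
print(map_value("FR"))  # None
```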

However, each rule may have multiple source inputs (and could have multiple outputs). Each source input may also be more than a simple string comparison: it could take a substring before applying the comparison, or apply a regular expression to match any input that follows a particular pattern. In this way, a single rule can be thought of as a table of matching rules: every input must match before the rule is deemed to have passed and the corresponding output is generated.

Defining a single mapping rule, consisting of 3 inputs and one output

Source Value   Sub-String Start   Sub-String End   Is RegEx   Target Value
GB             -                  -                No         MY_NEW_CODE_ID
A              -                  1                No
[xyz]          2                  4                Yes
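The rule in the table can be evaluated input by input: take the substring (if any), compare or pattern-match, and pass the rule only when every input matches. The Python sketch below is illustrative; InputRule, input_matches and rule_matches are hypothetical names, and it assumes that "-" means "no bound", that the end index is exclusive, that an input too short for its substring does not match, and that a RegEx may match anywhere in the (sub)string.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputRule:
    """One source input of a mapping rule (fields mirror the table columns)."""
    value: str
    start: Optional[int] = None  # Sub-String Start; None means "-" (from the beginning)
    end: Optional[int] = None    # Sub-String End; None means "-" (to the end), exclusive
    is_regex: bool = False


def input_matches(rule: InputRule, source: str) -> bool:
    # Assumption: an input shorter than the requested end index does not match.
    if rule.end is not None and rule.end > len(source):
        return False
    candidate = source[rule.start:rule.end]  # the substring is taken first
    if rule.is_regex:
        # Assumption: the pattern may match anywhere in the candidate string.
        return re.search(rule.value, candidate) is not None
    return candidate == rule.value


def rule_matches(inputs: List[InputRule], sources: List[str]) -> bool:
    """The rule passes only if every input matches its corresponding source value."""
    return len(inputs) == len(sources) and all(
        input_matches(rule, src) for rule, src in zip(inputs, sources)
    )


# The rule from the table above, producing MY_NEW_CODE_ID when it passes.
RULE = [
    InputRule("GB"),
    InputRule("A", end=1),
    InputRule("[xyz]", start=2, end=4, is_regex=True),
]
TARGET = "MY_NEW_CODE_ID"

print(TARGET if rule_matches(RULE, ["GB", "ABC", "AAxA"]) else None)  # MY_NEW_CODE_ID
```

Here "ABC" passes the second input because its first character is "A", and "AAxA" passes the third because the substring at positions 2 to 3 ("xA") contains an x.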


Substring Rules

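An input rule can define the start and end index of the part of the string to match on. Indexes are 0-based (position 0 is the start of the string), and, as the example shows, the end index is exclusive: a start of 2 and an end of 4 selects the characters at positions 2 and 3.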

Example
Rule: Match Value = UK, Substring Start = 2, Substring End = 4
Matched Input: __UK__
Matched Input: AAUKBB
No Match Input: AUKBB
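The substring example can be reproduced with Python slicing, which uses the same 0-based, end-exclusive convention. A minimal sketch; substring_matches is a hypothetical name, and treating an input too short to contain the substring as a non-match is an assumption.

```python
def substring_matches(source: str, match_value: str, start: int, end: int) -> bool:
    """Compare match_value with source[start:end] (0-indexed, end exclusive)."""
    # Assumption: an input too short to contain the full substring does not match.
    if end > len(source):
        return False
    return source[start:end] == match_value


# The example rule: Match Value = UK, Substring Start = 2, Substring End = 4
for candidate in ["__UK__", "AAUKBB", "AUKBB"]:
    print(candidate, substring_matches(candidate, "UK", 2, 4))
# __UK__ True
# AAUKBB True
# AUKBB False   (positions 2 and 3 hold "KB")
```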


Regular Expressions

An input rule can define a pattern match using regular expressions. Regular expressions have been around since the 1950s and are in common use, with many online tools [1] available to help build a valid RegEx.

Example (zip code with or without dashes)
Rule: Match Value = (^\d{5}$)|(^\d{9}$)|(^\d{5}-\d{4}$)
Matched Input: 32225-1234
Matched Input: 322251234
Matched Input: 32225
No Match Input: 3222
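The zip code rule can be checked with Python's re module. This is only an illustration: the regex engine FMR uses is not stated here, and dialects differ in details (for example, Python's \d also matches non-ASCII digits). regex_matches is a hypothetical name.

```python
import re

# Pattern from the example: 5 digits, 9 digits, or 5 digits, a dash and 4 digits.
ZIP_PATTERN = re.compile(r"(^\d{5}$)|(^\d{9}$)|(^\d{5}-\d{4}$)")


def regex_matches(source: str) -> bool:
    # Each alternative carries its own ^...$ anchors, so re.search is sufficient.
    return ZIP_PATTERN.search(source) is not None


for candidate in ["32225-1234", "322251234", "32225", "3222"]:
    print(candidate, regex_matches(candidate))
# 32225-1234 True
# 322251234 True
# 32225 True
# 3222 False   (only 4 digits)
```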


Note: A RegEx rule can be used in combination with a substring rule. The substring is first taken from the input value, and the RegEx is then applied to that substring.

Capture Groups