Difference between revisions of "Representation Map"
(→Copy Value Unless) |
(→Copy Value Unless) |
||
Line 157: | Line 157: | ||
== Copy Value Unless == | == Copy Value Unless == | ||
− | This use case is when the user wants to copy the input to the output (verbatim), with the exception of a few conditions. This could be used when the source and target Codelists are largely the same, with a few exceptions. This use case is satisfied by using explicit mapping, and a catch all regular expression which is used to perform a copy from source to target by using the regex capture group. | + | This use case is when the user wants to copy the input to the output (verbatim), with the exception of a few conditions. This could be used when the source and target Codelists are largely the same, with a few exceptions. This use case is satisfied by using explicit mapping, and a catch all regular expression which is used to perform a copy from source to target by using the [[#Capture Groups|regex capture group]]. |
{| class="wikitable" | {| class="wikitable" |
Revision as of 04:49, 27 April 2022
Overview
Structure Properties
Structure Type | Standard SDMX Structural Metadata Artefact |
---|---|
Maintainable | Yes |
Identifiable | Yes |
Item Scheme | No |
SDMX Information Model Versions | 3.0-DRAFT |
Concept ID | RepresentationMap |
Useage
SDMX Version 3.0 Representation Maps are used to define a mapping rules between a source value (or combination of values) and a target value (or combination of targets). When a souce value is matched, the target value is output. Representation Maps can be used to map from one classification to another (for example from ISO2 to ISO3 character country codes), or can be used to map from non coded to coded values ($ maps to USD), or more complex mapping which require a combination of source values to generate target values. A Representation Map is more than a simple lookup table, as more complex rules can be introduced such as regular expression matches which can include capture groups to transfer patterns from source to target, substring matches, as well as rules which are only applicable to certain periods of time.
Representation Maps can be used by Structure Maps when defining relationships between two Data Structure Definitions (DSDs). The Structure Maps describe the relationships between the Components of the two DSDs, whilst Relationship Maps are used to describe how the values reported for each component map. For example, a Structure Map may say the COUNTRY Dimension in the source DSD maps to the REF_AREA Dimension in the target DSD - it is then the Relationship Map which describes that the value GB maps to GBR.
Representation Map - Model
Source and Target
A Representation Map can describe the allowable inputs by linking to source Codelists, or Valuelists, if the input is free text then the source is simply defined as Free Text. a Representation Map is not restricted to one source, it can define multiple, of mixed types. A combination of source values is a typical use case when a Representation Map is used by a Structure Map which is mapping more than one source component. For example, a Structure Map may state that the combination of REF_AREA and CURRENCY are used to derive an output value. The corresponding Representation Map would then have to include 2 sources, one for REF_AREA and the other for CURRENCY. A rule may state that the UK in combination with USD maps to a certain output value, whilst the UK in combination with GBP maps to another value.
A Representation Map has the same options for the targets, in that a target can link to a Codelist, Valuelist, or simply describe itself as a free text target (any textual content is valid).
An example Representation Map which maps from ISO2 to ISO3 Country Codes may define a single source Codelist of CL_ISO2_COUNTRIES and a target of CL_ISO3_COUNTRIES. Mapped source values will be code Ids in the ISO2 Codelist and the corresponding target will be codes in the ISO3 Codelist. It is perfectly valid to create the same mapping rules between ISO2 codes and ISO3 codes without linking the Representation Map to any source codelist or target codelist (just define both source and target as free text). The mapped values are the same, GB maps to GBR for example. The reason to link to the Country Codelist tells the user of the map, the intent of the mapping (to map between country codelists). It is also helpful to discover mappings for codelists (to ask the question which mappings exist for ISO2 Countries), and to ensure referential integrity (a Code can not be deleted from the Codelist if it is used in a Mapping Rule).
Mapped Values
A Representation Map essentially defines a list of mappings, source to target. In its simplest form mapping rules can be thought of as a lookup table
Source Value | Target Value |
---|---|
GB | GBR |
US | USA |
However, each rule may have multiple source inputs (and could have multiple outputs). Each source input rule may not be as simple as a string compare, it could define a rule to take a substring before applying the compare, or even apply a regular expression to match any input that follows a particular pattern. In this way, a single rule could be thought of as a table of matching rules, each input must match before the rule is deemed to have passed, and the corresponding output generated.
Source Value | Sub-String Start | Sub-String End | Is RegEx | Target Value |
---|---|---|---|---|
GB | - | - | No | MY_NEW_CODE_ID |
A | - | 1 | No | |
[xyz] | 2 | 4 | Yes |
Substring Rules
An input rule can define the start and end index of a string to match on, the index is 0 indexed (the 0 position is the start of the string).
Example
Rule: Match Value = UK, Substring Start = 2, Substring End = 4
Matched Input: __UK__
Matched Input: AAUKBB
No Match Input: AUKBB
Regular Expressions
An input rule can define a pattern match using regular expressions. Regular expressions have been around since the 1950s and as such are in common use, with many online [1] available to assist in generating a valid RegEx.
Example (zip code with or without dashes)
Rule: Match Value = (^\d{5}$)|(^\d{9}$)|(^\d{5}-\d{4}$)
Matched Input: 32225-1234
Matched Input: 322251234
Matched Input: 32225
Matched Input: 3222
Note: A RexEx rule can be used in combination with substring. The substring will first be taken on the input value, before applying the RegEx
Capture Groups
An extension to the Regular Expression rules, the captured information from a RegEx match can be used to further match an input, or an output. This is achieved using a Regular Expression capture group. When part of the RegEx is in parenthesis it forms a capture group, and expression may contain 0 to many capture groups, each group can be referred to by its index.
Example RegEx with 2 Capture Groups
RegEx: ([ABC])([DEF]) - match the character A,B, or C followed by D, E, or F
Match: AD
Capture Group 1: 1
Capture Group 2: 2
The above rule defines 2 capture groups, when an input value is matched, parts of the value which were matched by a capture group can be referred to by index (starting at 1), by using the \ prefix. For example, \1 would output A and \2 would output D from the above example.
A mapping rule can use capture groups by copying matched information from one source value, into either another source value, or a target value (of both!).
Example using Capture groups in target
Input Rule: ([ABC])([DEF])
Output Rule: \2&\1
Matched Input: AD
Output: D&A
Example using Capture groups with multiple source inputs
Source Rule 1 | Source Rule 2 | Input Value 1 | Input Value 2 | Is Match |
---|---|---|---|---|
([ABC])([DEF]) | \2\1 | AD | DA | Yes |
([ABC])([DEF]) | \2\1 | AD | AD | No |
Example Use Cases
Simple Lookup Table
To create a simple lookup table, for example ISO2 to ISO3 country codes, a simple Representation Map is required with a single source Codelist (ISO2 Countries) and a single target Codelist (ISO3 Countries)
ISO2 Countries | ISO3 Countries |
---|---|
AF | AFG |
AL | ALB |
DZ | DZA |
Default if not specified
This use case is supported by creating a final mapping which uses the Regular Expression for match all. This mapping will only be used if no other mappings match
ISO2 Countries | Is Reg Ex | ISO3 Countries |
---|---|---|
AF | false | AFG |
AL | false | ALB |
DZ | false | DZA |
.* | true | _Z |
In the above example if no country codes match, then _Z will be output.
Copy Value Unless
This use case is when the user wants to copy the input to the output (verbatim), with the exception of a few conditions. This could be used when the source and target Codelists are largely the same, with a few exceptions. This use case is satisfied by using explicit mapping, and a catch all regular expression which is used to perform a copy from source to target by using the regex capture group.
Country | Is Reg Ex | REF_AREA |
---|---|---|
RW | false | ROW |
IO | false | INO |
(.*) | true | \1 |
In the above example if the source country is RW it is mapped to ROW, IO is mapped to INO, for anything else, the value is simply copied as is.
If-Then-Else
This use case is supported by creating a number of RegEx expressions, which are executed in order. The first one to match wins.
Source | Is Reg Ex | Target | |
---|---|---|---|
S1 | false | S1A | Exact Match, S1 maps to S1A |
S2 | false | S2A | Exact Match, S1 maps to S2A |
{2}[A-Z]1 | true | _X | Map to _X if the source starts with two upper case A to Z characters followed by the number 1 |
.* | true | _Z | anything maps to _Z |
In this example, the mapping process first checks the exact matches, if none match, then each reg-ex pattern is tested in turn, in the above example the final reg ex just matches anything to output _Z.
Collapsing Dimensions
This use case is to facilitate the collection of data conforming to one DSD, whist the internal usage (or future dissemination) of the data may conform to another DSD. Consider the following DSDs which are made available by the IMF, both are Balance of Payments, however one is named as Global Use, and the other is For Collection. It is possible that the data collected against one DSD can be translated to the other DSD, however the Global Use DSD has many more Dimensions, these appear to have been replaced an INDICATOR Dimension in the For Collection DSD:
BOP_GBPM6 Balance of Payments _Global use_BOP_BPM6 | BOP6 Balance of Payments (BPM6) for Collection |
---|---|
|
[DATA_DOMAIN] Data Domain [REF_AREA] Reference Country or area [INDICATOR] Economic Indicator [COUNTERPART_AREA] Counterpart Area [FREQ] Frequency [TIME_PERIOD] Time period |
In this use case, we will need to create a Representation Map to describe how the combination of Source values are translated into a corresponding value for the INDICATOR Dimension. This Representation Map can then be used by a Structure Map in order to map the Global BOP Dataflow to the Collection BOP Dataflow.
The First step is to create a Representation Map and choose multiple Source Codelists, one for each Dimension highligthed in BOLD. Each of these Dimensions describes which Codelist it uses, the ADJUSTMENT Dimension for example uses IMF:CL_ADJUSTMENT(1.4). The Representation Map will have the following Source Codelists:
- IMF:CL_ADJUSTMENT(1.4)
- ESTAT:CL_SECTOR(1.9) (REF_SECTOR)
- ESTAT:CL_SECTOR(1.9) (COUNTERPART_SECTOR)
- IMF:CL_FSENTRY(1.1)
- IMF:CL_ACCOUNT_ENTRY(1.2)
- IMF:CL_ACCOUNTS_ITEM(1.5)
- IMF:CL_FUNCTIONAL_CAT(1.8)
- ESTAT:CL_INSTR_ASSET(1.8)
- ESTAT:CL_MATURITY(1.6)
- IMF:CL_UNIT(1.12) (UNIT_MEASURE)
- IMF:CL_UNIT(1.12) (CURRENCY_DENOM)
- ESTAT:CL_VALUATION(1.6)
- IMF:CL_COMP_METHOD(1.2)
Only One Target Codelist is required, the CL_INDICATOR Codelist.
When defining the mapping, it is important to think about each mapping as a row in a table, where the indicator is only mapped if each of the source values match for all the other Dimensions, for example:
ADJUSTMENT | REF_SECTOR | COUNTERPART_SECTOR | FLOW_STOCK_ENTRY | ACCOUNTING_ENTRY | INT_ACC_ITEM | FUNCTIONAL_CAT | INSTR_ASSET | MATURITY | UNIT_MEASURE | CURRENCY_DENOM | VALUATION | COMP_METHOD | INDICATOR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
_Z | S1 | S1 | T | A | CA | D | F | L | _Z | _Z | _Z | C | IND_ABCD1 |
_Z | S1 | S1 | T | B | CA | D | F | L | _Z | _Z | _Z | L | IND_ABCD2 |
_Z | S1 | S1 | T | C | CA | D | F | L | _Z | _Z | _Z | C | IND_ABCD3 |