Reference Metadata

From FMR Knowledge Base
Revision as of 08:04, 22 July 2022 by Glenn (talk | contribs)


See also Reference Metadata API

Overview

Reference Metadata is a structured document of information that can be attached to any SDMX structure. Whilst Reference Metadata can be used to capture information that drives business processes, the typical use case is to capture textual information that helps human users better understand the information they are looking at. This could be more detailed information about a dataset, contact details, legal text, or any other type of information.

Reference Metadata Reports are structured under headings and sub headings. The reported content can be a mixture of data types including HTML, numbers, coded values, booleans, URLs, and more. One way to think about a Reference Metadata Report is as a Microsoft Word document with headings and content, where the Word document is linked to some other part of the SDMX information model (a specific Concept or Code for example, or a collection of Dataflows).

Common use cases for Reference Metadata are to capture data quality metadata including aspects such as how the data was collected, legal information, contract information, and so on. The IMF Data Quality Framework provides a good example of this type of metadata, with their Dissemination Standards Bulletin Board (DSBB) demonstrating the use of Data Quality Metadata in a collection and dissemination space.

Designing a Metadata Collection

Metadata Structure Definition

In the same way that SDMX defines the shape and content of Datasets in terms of Dimensions and Attributes, it defines the shape and content of Reference Metadata using an equivalent structure.

The Metadata Structure Definition (MSD) is the structure used to define the allowable content of a Metadata Report. A good analogy for an MSD is a Word document with all the Headings and Sub Headings completed, but no content under any of the headings. This Word document can be thought of as a template which could be distributed to users to complete. In the same way, the MSD is a template which lays out the structure of the document's Headings and Sub Headings. The analogy is not perfect, because the MSD also describes cardinality (how many times each heading can repeat, and whether the heading is mandatory or optional) and the allowable content. In a Word document it is not possible to create a rule to say that the content under a particular heading is restricted to 400 characters, or that it can be in multiple languages, or that it must be a value from a corresponding Codelist; with an MSD it is. So whilst the Microsoft Word analogy helps visualise the role of the MSD, the MSD provides much more control over what can be reported.

In terms of the SDMX Information Model, the Metadata Structure Definition defines a list of Concepts (the analogy is a document heading). Each Concept can be put into a hierarchy (headings and sub headings). Concepts are given an allowable representation (is the authored content under the heading HTML, Integer, or Coded, does it have a maximum length, etc.). The allowable representation can even be set to ‘none’ to indicate the heading is just a presentational heading to group a set of sub headings.

The example below shows the Concept of Contact set to be presentational and allowing more than one occurrence. The sub Concepts of Name, Phone Number, Department, and Website are sub headings. Name would have a maximum length, Phone Number could be set to numerical or given a pattern which includes spaces, Department could come from a Codelist or be simple text, and Website would take valid URLs.

 Contact
   Name : Matthew Nelson
   Phone Number : 01234 444555
   Department : IT
   Website : https://metadatatechnology.com/
 Contact
   Name : Glenn Tice
   Phone Number : 01234 444666
   Department : Management
   Website : https://metadatatechnology.com/
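The kind of rules described above can be sketched in a few lines of Python. This is only an illustration of how a receiving system might check one Contact occurrence; the maximum length, the phone pattern, and the Department codes are hypothetical values chosen for the example and do not come from a real Metadata Structure Definition.

```python
import re
from urllib.parse import urlparse

# Hypothetical representation rules for the Contact sub headings.
NAME_MAX_LENGTH = 50                      # assumed maximum length for Name
PHONE_PATTERN = re.compile(r"[0-9 ]+")    # assumed pattern: digits and spaces
DEPARTMENT_CODES = {"IT", "Management"}   # stands in for a Codelist


def is_valid_url(value: str) -> bool:
    """Accept only absolute http(s) URLs with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def validate_contact(contact: dict) -> list:
    """Return the list of rule violations for one Contact occurrence."""
    errors = []
    if len(contact.get("Name", "")) > NAME_MAX_LENGTH:
        errors.append("Name exceeds maximum length")
    if not PHONE_PATTERN.fullmatch(contact.get("Phone Number", "")):
        errors.append("Phone Number does not match pattern")
    if contact.get("Department") not in DEPARTMENT_CODES:
        errors.append("Department is not in the Codelist")
    if not is_valid_url(contact.get("Website", "")):
        errors.append("Website is not a valid URL")
    return errors


# Contact repeats, so the report carries a list of occurrences.
contacts = [
    {"Name": "Matthew Nelson", "Phone Number": "01234 444555",
     "Department": "IT", "Website": "https://metadatatechnology.com/"},
    {"Name": "Glenn Tice", "Phone Number": "01234 444666",
     "Department": "Management", "Website": "https://metadatatechnology.com/"},
]

for contact in contacts:
    print(contact["Name"], validate_contact(contact))
```

Both occurrences from the example pass these checks; a Department outside the assumed code set or a Phone Number containing letters would be reported as a violation.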

Metadata Flow

The Metadata Structure Definition defines the template of the Metadata Report, but it is the Metadataflow that the Metadata Report is reported against.

The Metadataflow is an SDMX structure which does two things:

  1. It references a Metadata Structure Definition, this provides the rules to which the Metadata Report must conform
  2. It defines one or more allowable targets, explained below

When a Metadata Report is authored, it must be attached to one or more SDMX structures. For example, a report about data collection methodology might be attached to a specific Dataflow, such as National Accounts. The author of the report chooses which structure(s) to attach the report to. However, the collecting Agency, the owner of the Metadataflow, may want to restrict the options given to the report author. The Metadataflow enables the collecting Agency to define one or more such restrictions.

The restriction (an allowable target) may define a single allowable structure, such as

Dataflow=IMF:BOP(1.0)

Here the target must be a Dataflow owned by the IMF with the ID of BOP and the version of 1.0. This target gives the report author no real choice in what they attach their report to: it must be one and only one structure, which has been predefined.

The target could be less restrictive by allowing wildcard values:

Dataflow=IMF:*(*)

In this example the Metadata Report may only link to Dataflows owned by the IMF.

Any part of the target (including the structure type) can be left open. Therefore it is possible to define a target for any structure, any agency, any id, at any version.

A Metadataflow may define multiple targets. In the following example the Metadata Report could be attached to any Code as long as the Codelist ID is CL_FREQ, and it could also be attached to any Concept as long as the Concept ID is FREQ. The Metadata Report can attach to one or more structures as long as they comply with these restrictions. In this way a single Metadata Report could link to one Code, a Code and a Concept, multiple Concepts, or multiple Codes.

Code=*:CL_FREQ(*).*
Concept=*:*(*).FREQ
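To make the wildcard rules concrete, the following is a minimal sketch of how an allowable target could be compared with a structure reference. It assumes the shorthand notation used on this page (Type=agency:id(version), optionally followed by .item) and compares versions as plain strings; the function names and the SDMX-owned references in the usage lines are illustrative, and this is not how Fusion Registry itself is implemented.

```python
import re

# Shorthand used on this page: Type=agency:id(version) with an optional .item
TARGET_RE = re.compile(
    r"(?P<type>[^=]+)=(?P<agency>[^:]+):(?P<id>[^(]+)\((?P<version>[^)]+)\)"
    r"(?:\.(?P<item>.+))?"
)


def parse_target(text: str) -> dict:
    """Split a target or reference into its named parts."""
    match = TARGET_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid target: {text!r}")
    return match.groupdict()


def target_allows(rule: str, reference: str) -> bool:
    """True if every part of the reference equals the rule part or the rule part is '*'."""
    rule_parts, ref_parts = parse_target(rule), parse_target(reference)
    for name, allowed in rule_parts.items():
        actual = ref_parts[name]
        if allowed is None or actual is None:
            # The optional item part must be present in both or absent in both.
            if allowed != actual:
                return False
            continue
        if allowed != "*" and allowed != actual:
            return False
    return True


def report_allowed(rules: list, references: list) -> bool:
    """Every structure the report attaches to must satisfy at least one rule."""
    return all(any(target_allows(rule, ref) for rule in rules) for ref in references)


rules = ["Code=*:CL_FREQ(*).*", "Concept=*:*(*).FREQ"]
print(report_allowed(rules, ["Code=SDMX:CL_FREQ(2.0).A"]))
print(report_allowed(rules, ["Code=SDMX:CL_FREQ(2.0).A",
                             "Concept=SDMX:CROSS_DOMAIN_CONCEPTS(2.0).FREQ"]))
print(report_allowed(rules, ["Concept=SDMX:CROSS_DOMAIN_CONCEPTS(2.0).OBS_STATUS"]))
```

The first two calls are accepted because each reference satisfies one of the two rules; the last is rejected because the Concept ID is not FREQ.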

If the collection of Reference Metadata has multiple providers, then the Metadata Provision Agreement can be used to provide an extra control over the collection process.

Metadata Provision Agreement

Whilst a Metadata Report can be authored directly against a Metadataflow, in a collection environment where there may be multiple reports from multiple providers, the Metadata Provision Agreement is needed to define the ownership of each Metadata Report.

The Metadata Provision Agreement is an SDMX structure that has 3 pieces of information:

  1. It defines what Metadataflow it is for, which by extension defines the Metadata Structure Definition
  2. It defines who the Metadata Provider is (who owns the report)
  3. It can define additional targets which further restrict those on the Metadataflow

Linking a Metadata Provider to a Metadataflow enables further restrictions to be applied to the allowable targets, giving finer control over the Reference Metadata collection.

For example, a Metadataflow can be defined with a generic target such as ‘any IMF Dataflow’:

Dataflow=IMF:*(*) 

If Metadata Reports are collected directly against this Metadataflow the Report author could attach the report to any or all Dataflows. In reality, the provider may only report data against a subset of these Dataflows and therefore it makes no sense for them to report data collection methodologies against Dataflows that they do not report data for.

The Metadata Provision Agreement is created for the Metadata Provider and would further restrict the targets to the Dataflows they do report data against, for example:

Dataflow=IMF:BOP(*)
Dataflow=IMF:NAC(*)

A separate Metadata Provision Agreement would be created for a different Metadata Provider and may contain different restrictions, for example:

Dataflow=IMF:CGO(*)
Dataflow=IMF:CGD(*)

The Metadata Provision Agreement enables a single collection structure (the Metadataflow) to be reused by multiple Metadata Providers, each Metadata Provider reporting information which is against the same template (the MSD), but connecting to targets relevant to the Metadata Provider.
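The layering of the two sets of targets can be sketched as follows. This is a simplified illustration that uses Python's glob-style matching (fnmatch) as a stand-in for target matching; the assignment of the two target lists to the providers IMF.UK1 and IMF.FR1 is an assumption made for the example, as is the function naming.

```python
from fnmatch import fnmatchcase

# Targets on the Metadataflow: any IMF Dataflow.
METADATAFLOW_TARGETS = ["Dataflow=IMF:*(*)"]

# Targets on each Metadata Provision Agreement (provider assignment is assumed).
PROVISION_TARGETS = {
    "IMF.UK1": ["Dataflow=IMF:BOP(*)", "Dataflow=IMF:NAC(*)"],
    "IMF.FR1": ["Dataflow=IMF:CGO(*)", "Dataflow=IMF:CGD(*)"],
}


def matches_any(reference: str, patterns: list) -> bool:
    """Glob-style approximation: '*' matches any run of characters."""
    return any(fnmatchcase(reference, pattern) for pattern in patterns)


def may_attach(provider: str, reference: str) -> bool:
    """A reference must satisfy the Metadataflow targets and the provider's own targets."""
    return (matches_any(reference, METADATAFLOW_TARGETS)
            and matches_any(reference, PROVISION_TARGETS.get(provider, [])))


print(may_attach("IMF.UK1", "Dataflow=IMF:BOP(1.0)"))
print(may_attach("IMF.UK1", "Dataflow=IMF:CGO(1.0)"))
print(may_attach("IMF.FR1", "Dataflow=IMF:CGO(1.0)"))
```

Under these assumptions the first and third checks succeed, while the second fails: CGO is an IMF Dataflow, so the Metadataflow would allow it, but it is not among the targets in the first provider's agreement.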

Report Ownership

Reference Metadata Reports are maintainable artefacts, and as such are uniquely identified by 3 properties:

  1. The owner (agency)
  2. The identity (ID)
  3. The version

These 3 properties form a composite key, and therefore it is possible to have multiple reports with the same ID as long as either the owner and/or version are different.

Unlike any other SDMX structure, the owner of a Metadata Report does not have to be an SDMX Agency. When a Metadata Report is authored against a Metadata Provision Agreement, it is the Metadata Provider (defined by the Metadata Provision Agreement) who owns the report.

It is important to remember that the Metadata Provider itself is defined and therefore owned by an Agency. For example, if the Agency IMF wants to collect Metadata Reports from multiple Metadata Providers, they will first define who the Metadata Providers are in a Metadata Provider Scheme which is owned by the IMF. For example:

IMF Metadata Providers
UK1 - Bank of England
FR1 - Banque de France
ES1 - Instituto Nacional de Estadistica

When a Metadata Provider authors a report, the owner is identified by the ID of the Agency who owns the Metadata Provider Scheme (IMF for example) concatenated with the ID of the Metadata Provider (UK1 for example). The order is [agency id].[provider id], for example:

IMF.UK1

This structure enables the Metadata Provider to take ownership of their reports, and as such gives them permission to add, edit, and delete their reports.

When a Metadata Report is authored directly against a Metadataflow, the Agency who owns the Report is the Agency that authored the Report. It is therefore possible for the BIS to author a Metadata Report against a Metadataflow owned by the IMF.
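The ownership and identity rules can be summarised in a short sketch. The type and function names are hypothetical; the point is only that the owner is either an Agency ID or an [agency id].[provider id] pair, and that owner, ID, and version together form the key.

```python
from typing import NamedTuple, Optional


class ReportKey(NamedTuple):
    """Composite key of a Metadata Report: owner, ID, and version."""
    owner: str
    report_id: str
    version: str


def owner_id(agency_id: str, provider_id: Optional[str] = None) -> str:
    """Agency ID alone for reports against a Metadataflow,
    [agency id].[provider id] for reports against a Metadata Provision Agreement."""
    if provider_id is None:
        return agency_id
    return f"{agency_id}.{provider_id}"


# Three reports share the ID and version but differ in owner, so all three coexist.
reports = {
    ReportKey(owner_id("IMF", "UK1"), "EXAMPLE", "1.0.0"): "authored by provider UK1",
    ReportKey(owner_id("IMF", "FR1"), "EXAMPLE", "1.0.0"): "authored by provider FR1",
    ReportKey(owner_id("BIS"), "EXAMPLE", "1.0.0"): "authored directly by the BIS",
}

for key in reports:
    print(key.owner, key.report_id, key.version)
```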

Security rules can be defined in Fusion Registry to restrict Metadata Report read/write rules.

Reporting Reference Metadata

Reference Metadata must conform to the Metadata Structure Definition. It is authored in SDMX format; currently only SDMX-JSON is supported.

The Report has the following pieces of information:

  1. The identity. The owner, id, and version of the report, used for unique identification.
  2. The name/description.
  3. The metadataflow or provision. This describes what rules the Metadata Report conforms to.
  4. The target(s). This defines what structure or structures the report is for.
  5. The content. The content of the report is defined in Metadata Attributes. Each Metadata Attribute corresponds to a Concept in the Metadata Structure Definition.
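As a rough illustration, the five pieces of information can be assembled into a report object and serialised with Python's standard json module. The property names (id, names, version, agencyID, metadataflow, targets, attributes) are the ones used in the example message further down this page; the helper function itself is hypothetical, and the sketch omits the meta section of the message.

```python
import json


def build_report(owner, report_id, version, name, flow_or_provision_urn,
                 targets, attributes):
    """Assemble one Metadata Report from its five pieces of information."""
    return {
        "id": report_id,                        # 1. identity
        "version": version,
        "agencyID": owner,
        "names": {"en": name},                  # 2. name
        "metadataflow": flow_or_provision_urn,  # 3. Metadataflow or provision
        "targets": list(targets),               # 4. target(s)
        "attributes": attributes,               # 5. content
    }


report = build_report(
    owner="IMF.UK1",
    report_id="EXAMPLE",
    version="1.0.0",
    name="Example Report",
    flow_or_provision_urn=(
        "urn:sdmx:org.sdmx.infomodel.registry."
        "MetadataProvisionAgreement=IMF:MDF_UK1_DQAF(1.0.0)"
    ),
    targets=["urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=IMF:BOP(*)"],
    attributes=[{"id": "METHODOLOGY",
                 "attributes": [{"id": "SOURCE", "value": "IMF"}]}],
)

message = {"data": {"metadataSets": [report]}}
print(json.dumps(message, indent=2))
```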

Identity

The identity uniquely identifies the Metadata Report. The owner ID should be either the owning Agency (if the report is against a Metadataflow) or the owning Metadata Provider (if against a Metadata Provision Agreement). If against a Metadata Provision Agreement, the ID of the Metadata Provider should be prefixed by the ID of the Agency who owns the Metadata Provider Scheme.

Metadataflow / Provision

The SDMX-JSON refers to this property as metadataflow, but the value can be a reference to either a Metadataflow or a Metadata Provision Agreement. This tells the receiving system what to validate the report against. For dissemination, this information is used to decode the reported values, i.e. the Concepts used by the Metadata Structure Definition provide the labels for the Metadata Attributes, and any Coded values can be decoded by obtaining the corresponding Codelist.

Target

The Target of the metadata report defines what structure (or structures) the Metadata Report is against. The connection to the structure(s) is a weak connection, and as such the structures do not have to exist in Fusion Registry. It is therefore possible for a report to be authored before the structure exists, or for a structure to be deleted even if there is reference metadata against it.

The targets of the metadata report may include wildcards in the reference (*) and there may be multiple targets. A report could therefore be authored for a specific structure, or a collection of structures. Wildcarding the target enables reports to be linked to structures as soon as those structures are created.

Content

The Content of a Metadata Report comes under the attributes section, and must conform to the rules set out by the Metadata Structure Definition. This is the information the user is reporting in the Report.

Example

An example report:

Quality
  Legal
    The responsibility for collecting, processing, and....
  Resource
    Staff, facilities, computing resources, and financing
  Relevance
    The relevance and practical utility of existing statistics
Integrity
  ....

Would be transmitted in a Metadata Report like the following:

{
 "meta": {
   "id": "IREF920760",
   "test": false,
   "schema": "https://raw.githubusercontent.com/sdmx-twg/sdmx-json/master/metadata-message/tools/schemas/2.0.0/sdmx-json-metadata-schema.json",
   "prepared": "2022-07-21T08:27:55Z",
   "contentLanguages": ["en"],
   "sender": {"id": "FusionRegistry"}
 },
 "data": {
   "metadataSets": [
     {
       "id": "EXAMPLE",
       "names": {
         "en": "Example Report"
       },
       "version": "1.0.0",
       "agencyID": "IMF.UK1",
       "metadataflow": "urn:sdmx:org.sdmx.infomodel.registry.MetadataProvisionAgreement=IMF:MDF_UK1_DQAF(1.0.0)",
       "targets": ["urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=IMF:BOP(*)"],
       "attributes": [
         {
           "id": "QUALITY",
           "attributes": [
             {
               "id": "LEGAL",
               "value": { 
                          "en": "The responsibility for collecting, processing, and....",
                          "fr": "il est responsable de la collecte, du traitement et...." 
                        }
             },
             {
               "id": "RESOURCE",
               "value": { "en": "Staff, facilities, computing resources, and financing" }
             },
             {
               "id": "RELEVANCE",
               "value": { "en": "The relevance and practical utility of existing statistics"}
             }
           ]
         },
         {
           "id": "INTEGRITY",
           "attributes": [
             {
               "id": "INST",
               "value": {"en": "Statistics are produced on an impartial basis"}
             },
             {
               "id": "TRANSPARENCY",
               "value": {"en": "The terms and conditions under which statistics are collected, processed, and disseminated are available to the public."}
             },
             {
               "id": "ETHICAL",
               "value": {"en": "Guidelines for staff behavior are in place and are well known to the staff" }
             }
           ]
         },
         {
           "id": "METHODOLOGY",
           "attributes": [
             {
               "id": "SCOPE",
               "value": {"en": "The scope is broadly consistent with internationally accepted standards, guidelines or good practices" }
             },
             {
               "id": "CLASSIFICATION",
               "value": { "en": "systems used are broadly consistent with internationally accepted standard" }
             },
             {
               "id": "BASIS",
               "value": { "en": "Market prices are used to value flows and stocks." }
             },
             {
               "id": "SOURCE",
               "value": "IMF"
             }
           ]
         }
       ]
     }
   ]
 }
}

This example shows a Report owned by the IMF.UK1 Metadata Provider. It conforms to the rules of the Metadata Provision Agreement owned by the IMF with ID MDF_UK1_DQAF at version 1.0.0. The report is against the IMF Dataflow BOP at any version (it attaches to all Dataflows which match).

The reported content is given under the Metadata Attributes, where each attribute has an ID (which relates to the Metadata Attribute with the same ID defined in the Metadata Structure Definition).

Some attributes are presentational only, such as Quality, Collection Methodology, and Integrity. Other attributes have content.

With the exception of the SOURCE attribute, all attributes support multilingual text. Only the LEGAL attribute has been given text in a locale other than English (to demonstrate the syntax).

The report can be presented in a dissemination environment as a readable document, in the way metadata is presented in the IMF DSBB.
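One possible way to turn the attributes section into a readable outline is sketched below. It walks the nested attributes, picks the text for a requested language where the value is multilingual, and indents by depth. The function name and the label mapping are hypothetical; in practice the labels would come from the Concepts referenced by the Metadata Structure Definition.

```python
def render(attributes, labels, language="en", depth=0):
    """Return the report content as a list of indented lines."""
    lines = []
    for attribute in attributes:
        # Heading: the Concept label if known, otherwise the attribute ID.
        lines.append("  " * depth + labels.get(attribute["id"], attribute["id"]))
        value = attribute.get("value")
        if isinstance(value, dict):
            # Multilingual text: show only the requested language, if reported.
            text = value.get(language)
            if text is not None:
                lines.append("  " * (depth + 1) + text)
        elif value is not None:
            # Non-multilingual value, such as SOURCE in the example.
            lines.append("  " * (depth + 1) + str(value))
        # Presentational attributes carry only child attributes.
        lines.extend(render(attribute.get("attributes", []), labels, language, depth + 1))
    return lines


# A cut-down version of the example report's attributes section.
attributes = [
    {"id": "QUALITY", "attributes": [
        {"id": "LEGAL", "value": {
            "en": "The responsibility for collecting, processing, and....",
            "fr": "il est responsable de la collecte, du traitement et....",
        }},
    ]},
    {"id": "METHODOLOGY", "attributes": [
        {"id": "SOURCE", "value": "IMF"},
    ]},
]
labels = {"QUALITY": "Quality", "LEGAL": "Legal",
          "METHODOLOGY": "Methodology", "SOURCE": "Source"}

print("\n".join(render(attributes, labels)))
```

Passing language="fr" would print the French text for LEGAL instead; attributes with no text in the requested language are shown as headings only.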

Finding Reference Metadata

When a Metadata Report is saved to the Fusion Registry, its presence is immediately reflected in the structure or structures to which it relates. Any structure which has associated reference metadata will provide links back to each Metadata Report (SDMX 3.0 formats only). Therefore a report against a specific Dataflow will result in that Dataflow linking back to the Metadata Report when the Dataflow is exported in SDMX-ML or SDMX-JSON format.

The link from the SDMX Structure back to the Reference Metadata Report is generated dynamically by the Fusion Registry. As such, the owner of the Dataflow does not need to take any action to maintain this link. If the Metadata Report is deleted, the link is removed. If the Metadata Report is linked to multiple structures, then a link is created against each structure.

In addition to the linking mechanism, it is possible to find reference metadata by:

  1. Query for Reports by Metadataflow (and optionally filtered by Metadata Provider)
  2. Query for Reports by Unique Identifiers of the Report
  3. Query for Reports by Structure Query

The SDMX REST Specification for metadata provides full details on the web services available for discovering reference metadata.

Maintaining Reference Metadata

The Fusion Registry provides web services to manage, find, and remove reference metadata.