XML Open Data Dictionary Technical Guideline
Introduction
This specification defines the syntax and semantics of the schema for PWGSC Open Data XML data dictionary files. A data dictionary file is a metadata file that describes the format and contents of a dataset and of its data files and resource files. It may also contain information regarding the creation and maintenance of the individual resource files that make up a dataset on the Open Government Portal. It defines, at a minimum, the headers found in the data file and the links to the datasets that the headers apply to. It may also include, if available, any constraints or conditions that are applied to the data cells (e.g. format, value set). The use of an XML data dictionary allows for the possibility of generating JavaScript Object Notation for Linked Data (JSON-LD) formatted data files in order to meet the 4 and 5 star Openness Rating as defined on the Open Data portal.
Data Dictionary Structure
The general structure of an XML data dictionary is as follows. Only required elements are included in this outline.
<?xml …?>
<data_dictionary …>
  <headings>
    <heading id="h1">
      <label>heading label</label>
      <description xml:lang="en">English description</description>
      <description xml:lang="fr">Description française</description>
      …
    </heading>
    <heading id="h2">
      <label>heading label</label>
      <description xml:lang="en">English description</description>
      <description xml:lang="fr">Description française</description>
      …
    </heading>
    …
  </headings>
</data_dictionary>
The following sections provide details on the elements, their format, attributes and whether they are required or optional.
Declaration statement
The data dictionary must start with the xml declaration tag and must specify the character encoding as UTF-8.
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
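As a quick pre-check before parsing, the declaration rule can be tested on the raw text of a file. The following is a minimal Python sketch, not part of any PWGSC tooling: the helper name `has_utf8_declaration` is hypothetical, and the pattern assumes double-quoted attribute values as in the declaration shown above.

```python
import re

# Assumption: the declaration uses double quotes, as in the guideline's example.
DECLARATION = re.compile(
    r'^<\?xml\s+version="1\.0"\s+encoding="([^"]+)"'
    r'(\s+standalone="(yes|no)")?\s*\?>'
)


def has_utf8_declaration(text):
    """Return True if text starts with an XML declaration naming UTF-8."""
    match = DECLARATION.match(text)
    return match is not None and match.group(1).upper() == "UTF-8"
```

A file that starts directly with the root element, or that declares another encoding, would be reported as non-conforming by this check.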
Elements
The following elements or tags are defined in the PWGSC XML Data Dictionary schema.
The data_condition element
The data_condition element is an optional element that specifies any conditions that apply to the values in the data cells under the column heading. The data conditions are based on the CSV Schema Language (version 1.1), a language for defining and validating CSV data. If multiple conditions are specified, they must be separated by semicolons. The condition statements include, but are not limited to, the following:
Condition | Description |
---|---|
not | A Not Expression checks that the value of the column is not equal to the supplied string or the value in the referenced column. |
notEmpty | A Not Empty Expression checks that the column has some content. |
range | A Range Expression checks that the value of the column is a number lying between, or equal to, the supplied upper and lower bounds. |
unique | A Unique Expression checks that the column value is unique within the current column of the CSV file being validated. The value may still occur elsewhere in the file in another column. |
Note: The CSV Schema Language condition 'regex' must not be used. Use the data_pattern tag to specify the regular expression for the data values.
Note: If column headings are included in the conditions, the heading labels should be quoted, for example:
<data_condition>if($"Deposit-Transaction-Type-Code-Code-type-opération-dépôt"/not("CAD"),is(""))</data_condition>
If the heading label 'Deposit-Transaction-Type-Code-Code-type-opération-dépôt' is not properly quoted, an error may occur when the condition is evaluated using a CSV Schema Language tool.
If no data_condition tag appears in the heading tag, then there are no conditions applied to the data values.
Example code:
<heading id="h1"> … <data_condition>unique</data_condition> … </heading>
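To make the meaning of the listed conditions concrete, the sketch below applies three of them (unique, notEmpty, range) to the cell values of one column. It is an illustration written in Python under the descriptions in the table above, not an implementation of the CSV Schema Language; the function names are hypothetical and a real validation should use a CSV Schema Language tool.

```python
def is_unique(values):
    """unique: no value repeats within the column."""
    return len(values) == len(set(values))


def is_not_empty(values):
    """notEmpty: every cell in the column has some content."""
    return all(value != "" for value in values)


def in_range(values, lower, upper):
    """range: every cell is a number between the bounds, inclusive."""
    for value in values:
        try:
            number = float(value)
        except ValueError:
            return False  # a non-numeric cell cannot satisfy a range
        if not lower <= number <= upper:
            return False
    return True
```

For example, `is_unique(["a", "b", "a"])` is false because "a" repeats, while `in_range(["1", "5.5", "10"], 1, 10)` is true because both bounds are inclusive.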
The data_dictionary element
The data_dictionary element is the root of a data dictionary and must follow the declaration statement. The data_dictionary element must include attributes to specify the schema and namespace declarations.
Attribute | Description |
---|---|
dd_version | Specifies the data dictionary schema version. The current version of the schema is "1.0". |
xmlns:xsi | Declares the XML Schema instance namespace, which defines the xsi:noNamespaceSchemaLocation attribute. |
xsi:noNamespaceSchemaLocation | Specifies the location of the XML Schema document used to validate the data dictionary, whose elements are not in a namespace. |
The data_dictionary element must be coded as follows:
<data_dictionary dd_version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://donnees-data.spac-pspc.gc.ca/dd/spac-pspc-dd.xsd">
The data_dictionary tag must contain exactly one headings <headings> tag.
Example code:
<data_dictionary dd_version="1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://donnees-data.spac-pspc.gc.ca/dd/spac-pspc-dd.xsd">
  <headings>
    …
  </headings>
</data_dictionary>
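The root element rules above can be checked with a few lines of Python and the standard library parser. This is a sketch only: the function name `check_root` and the sample document are assumptions made for illustration, and the check does not replace validation against the published XSD.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Illustrative document following the structure described in this guideline.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<data_dictionary dd_version="1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://donnees-data.spac-pspc.gc.ca/dd/spac-pspc-dd.xsd">
  <headings>
    <heading id="h1">
      <label>col_1</label>
      <description xml:lang="en">English description</description>
      <description xml:lang="fr">Description française</description>
    </heading>
  </headings>
</data_dictionary>
"""


def check_root(xml_text):
    """Return a list of problems found on the root element (empty if none)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    problems = []
    if root.tag != "data_dictionary":
        problems.append("root element is not data_dictionary")
    if root.get("dd_version") != "1.0":
        problems.append("dd_version is missing or not 1.0")
    # ElementTree exposes prefixed attributes under their namespace URI.
    if root.get("{%s}noNamespaceSchemaLocation" % XSI) is None:
        problems.append("xsi:noNamespaceSchemaLocation is missing")
    if len(root.findall("headings")) != 1:
        problems.append("expected exactly one headings element")
    return problems
```

The xmlns:xsi declaration is not tested separately because the parser rejects a document that uses the xsi: prefix without declaring it.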
The data_pattern element
The data_pattern element is an optional element that specifies the pattern or regular expression used to validate the value that appears in a data cell under the column heading. The content of the tag is a full Perl regular expression. The regular expression must begin with the beginning-of-line character '^' and end with the end-of-string character '$'. If no data_pattern tag appears in the heading tag, any character may appear in the data cells (i.e. the pattern is ^.*$).
Example code for matching a date (YYYY-MM-DD):
<heading id="h1"> … <data_pattern>^\d\d\d\d-\d\d-\d\d$</data_pattern> … </heading>
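The effect of a data_pattern on cell values can be tried out in Python. The sketch below uses Python's `re` module as a stand-in for a Perl regular expression engine, which is an assumption: the two dialects agree on simple patterns such as the date example but differ in some advanced features. The helper names are hypothetical.

```python
import re

DATE_PATTERN = r"^\d\d\d\d-\d\d-\d\d$"  # the YYYY-MM-DD pattern from the example


def is_anchored(pattern):
    """Check the rule that a pattern begins with '^' and ends with '$'."""
    return pattern.startswith("^") and pattern.endswith("$")


def cell_matches(pattern, value):
    """Return True if the whole cell value matches the pattern."""
    # fullmatch requires the match to cover the entire value.
    return re.fullmatch(pattern, value) is not None
```

Note that the pattern checks only the shape of the value: a cell such as "2016-99-99" has the right shape and would pass, so a pattern alone does not guarantee a valid calendar date.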
The data_type element
The data_type element is an optional element that specifies the type of data (e.g. number, date, etc.) that appears in data cells under the column heading. The content of the tag is the URL of a type defined within a schema. Publicly available schemas can be found at schema.org. If no suitable type is available within a schema, the data_type may reference the heading tag in the current data dictionary. If no data_type tag appears in the heading tag, any type of data may appear in the data cells.
Example code for a type defined by schema.org:
<heading id="h1"> … <data_type>https://schema.org/Organization#legalName</data_type> … </heading>
Example code for a type not defined by any schema. The data type references the data dictionary, and the URL anchor matches the id attribute of the heading:
<heading id="h1"> … <data_type>http://donnees-data.spac-pspc.gc.ca/dd/my_dd.xml#h1</data_type> … </heading>
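When a data_type points back into the data dictionary, the anchor is expected to match a heading id. The following Python sketch shows one way this could be checked; the function name, the sample document and the assumption that the dictionary is published at the URL used in the example above are all illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Assumption: the dictionary is published at this URL, as in the example above.
DICTIONARY_URL = "http://donnees-data.spac-pspc.gc.ca/dd/my_dd.xml"

SAMPLE = """<data_dictionary>
  <headings>
    <heading id="h1">
      <label>col_1</label>
      <description xml:lang="en">English description</description>
      <description xml:lang="fr">Description française</description>
      <data_type>http://donnees-data.spac-pspc.gc.ca/dd/my_dd.xml#h1</data_type>
    </heading>
    <heading id="h2">
      <label>col_2</label>
      <description xml:lang="en">English description</description>
      <description xml:lang="fr">Description française</description>
      <data_type>https://schema.org/Organization#legalName</data_type>
    </heading>
  </headings>
</data_dictionary>"""


def local_type_references(xml_text, dictionary_url):
    """Map heading id -> True/False for data_type values that point into
    this dictionary, where True means the anchor matches a heading id."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    heading_ids = {heading.get("id") for heading in root.iter("heading")}
    result = {}
    for heading in root.iter("heading"):
        data_type = heading.findtext("data_type")
        if data_type is None:
            continue
        url, fragment = urldefrag(data_type.strip())
        if url == dictionary_url:
            result[heading.get("id")] = fragment in heading_ids
    return result
```

Types defined by an external schema, such as the schema.org type on h2, are skipped because only references into the dictionary itself can be resolved locally.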
The description element
The description element specifies the description of a heading or a dataset. The description is language specific and must include an xml:lang attribute to specify the language of the description.
Attribute | Description |
---|---|
xml:lang | Specifies the language of the description's content. |
The description element is a required element. It specifies the text for the description of a single CSV data file heading (column heading). There must be at least 2 description tags for each heading, one for each official language.
Example code for a heading:
<heading id="h1">
  <label>col_1</label>
  <description xml:lang="en">Identifies the fiscal year of payment issuance.</description>
  <description xml:lang="fr">Indique l'exercice financier au cours duquel le paiement a été émis.</description>
  …
</heading>
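The requirement for one description per official language can be checked by reading the xml:lang attribute of each description. The Python sketch below is an illustration under that reading of the rule; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<headings>
  <heading id="h1">
    <label>col_1</label>
    <description xml:lang="en">English description</description>
    <description xml:lang="fr">Description française</description>
  </heading>
  <heading id="h2">
    <label>col_2</label>
    <description xml:lang="en">English description only</description>
  </heading>
</headings>"""


def missing_description_languages(xml_text, required=("en", "fr")):
    """Map heading id -> list of required languages with no description."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    missing = {}
    for heading in root.iter("heading"):
        present = {d.get(XML_LANG) for d in heading.findall("description")}
        absent = [lang for lang in required if lang not in present]
        if absent:
            missing[heading.get("id")] = absent
    return missing
```

In the sample, h2 has only an English description, so it is the only heading reported.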
The heading element
The heading tag is a container tag for details of a single CSV data file heading (column heading). The heading tag must include an id attribute to uniquely identify the heading within the data dictionary.
Attribute | Description |
---|---|
id | An identifier that uniquely identifies the heading within the data dictionary. This is not the CSV column heading label. |
The heading tag must include the following required and optional tags in the following order:
- 1 or more required label <label> tags,
- 2 or more required description <description> tags,
- An optional data_type <data_type> tag,
- An optional data_pattern <data_pattern> tag,
- An optional data_condition <data_condition> tag,
- An unlimited number of optional related_resource <related_resource> tags.
Example code:
<heading id="h1">
  <label>heading label</label>
  <description xml:lang="en">English description</description>
  <description xml:lang="fr">French description</description>
  …
</heading>
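The ordering and cardinality rules for the children of a heading can be expressed compactly in code. The following Python sketch is one possible check, assuming the element names used in this guideline's section titles (in particular data_condition); the function name is hypothetical and the published XSD remains the authoritative definition.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Required order of child tags inside a heading.
ORDER = ["label", "description", "data_type",
         "data_pattern", "data_condition", "related_resource"]
MINIMUM = {"label": 1, "description": 2}
AT_MOST_ONE = ["data_type", "data_pattern", "data_condition"]


def heading_problems(heading):
    """Return a list of rule violations for one heading element."""
    tags = [child.tag for child in heading]
    problems = []
    if not heading.get("id"):
        problems.append("missing id attribute")
    unknown = [tag for tag in tags if tag not in ORDER]
    if unknown:
        problems.append("unexpected tags: %s" % unknown)
    ranks = [ORDER.index(tag) for tag in tags if tag in ORDER]
    if ranks != sorted(ranks):
        problems.append("child tags are out of order")
    counts = Counter(tags)
    for tag, minimum in MINIMUM.items():
        if counts[tag] < minimum:
            problems.append("fewer than %d %s tags" % (minimum, tag))
    for tag in AT_MOST_ONE:
        if counts[tag] > 1:
            problems.append("more than one %s tag" % tag)
    return problems
```

A heading whose description appears before its label, for instance, would be reported as out of order.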
The headings element
The headings tag is a container tag for a set of heading <heading> tags that contain details of CSV data file headings (column headings). The headings tag must include 1 or more heading <heading> tags.
Example code:
<headings> <heading> … </heading> … </headings>
The label element
The label element is a required element that specifies the text of a single CSV data file heading (column heading). The label may be language independent or language specific. If the label is language specific, it must include an xml:lang attribute to specify the language of the label.
Attribute | Description |
---|---|
xml:lang | Specifies the language of the label's content. |
If a heading <heading> tag contains a language independent label (i.e. no xml:lang attribute), it must not also contain a language specific label.
Example of a heading with a language independent label:
<heading id="h1"> <label>col_1</label> … </heading>
Example of a heading with language specific labels:
<heading id="h1"> <label xml:lang="en">column_1</label> <label xml:lang="fr">colonne_1</label> … </heading>
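The rule that language independent and language specific labels must not be mixed within one heading can be checked as follows. This is a minimal Python sketch with a hypothetical function name, shown only to illustrate the rule.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def labels_are_consistent(heading):
    """True if the heading has at least one label and its labels are either
    all language independent or all language specific."""
    languages = [label.get(XML_LANG) for label in heading.findall("label")]
    if not languages:
        return False  # at least one label is required
    independent = [lang for lang in languages if lang is None]
    return len(independent) in (0, len(languages))
```

Both examples above pass this check, while a heading that combines `<label>col_1</label>` with `<label xml:lang="fr">colonne_1</label>` would fail it.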
The related_resource element
The related_resource element is an optional element that references an online resource that supports or is related to this heading. The content of the tag is the URL of the related resource. If the resource is language specific, it must include an xml:lang attribute to specify the language of the resource.
Attribute | Description |
---|---|
xml:lang | Specifies the language of the related resource's content. |
Example code:
<heading id="h1">
  …
  <related_resource xml:lang="en">http://www.tpsgc-pwgsc.gc.ca/recgen/pceaf-gwcoa/1516/txt/rg-3-num-eng.html</related_resource>
  <related_resource xml:lang="fr">http://www.tpsgc-pwgsc.gc.ca/recgen/pceaf-gwcoa/1516/txt/rg-3-num-fra.html</related_resource>
  …
</heading>
Attributes
The following attributes are used by elements or tags defined in the PWGSC XML Data Dictionary schema.
The dd_version attribute
The dd_version attribute specifies the version of the XML Open Data Dictionary schema. The value is a number.
The id attribute
The id attribute specifies its element's unique identifier. The value must be unique amongst all the IDs in the data dictionary, must contain at least one character, and must not contain any space characters.
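The constraints on id values can be checked across a whole document. The Python sketch below is illustrative only: the function name is hypothetical, and it treats any whitespace character as a space character, which is an assumption about how the rule is meant.

```python
import xml.etree.ElementTree as ET


def id_problems(xml_text):
    """Return a list of problems with id attribute values in the document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    seen = set()
    problems = []
    for element in root.iter():
        value = element.get("id")
        if value is None:
            continue
        if value == "" or any(ch.isspace() for ch in value):
            problems.append("invalid id: %r" % value)
        if value in seen:
            problems.append("duplicate id: %r" % value)
        seen.add(value)
    return problems
```

For a document with two headings that both use "h1" and a third that uses "h 2", the check reports one duplicate and one invalid value.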
The xml:lang attribute
The xml:lang attribute specifies the base language of an element's attribute values and text content. The default value of this attribute is unknown. Values are two-letter ISO 639 language codes (e.g. "en", "fr").
Complete Example
The following is a complete example of an XML data dictionary that includes all the required tags and values.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/dd/dd.xsl"?>
<!-- Real Property branch, payment in lieu of taxes data dictionary. -->
<data_dictionary xsi:noNamespaceSchemaLocation="http://donnees-data.spac-pspc.gc.ca/dd/spac-pspc-dd.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" dd_version="1.0">
  <headings>
    <heading id="h1">
      <label xml:lang="en">Province/Territory</label>
      <label xml:lang="fr">Province/Territoire</label>
      <description xml:lang="en">Province/Territory Name: The name of Canadian Province or Territory where a Taxing Authority is located.</description>
      <description xml:lang="fr">Nom de la Province/Territoire : Le nom de la Province Canadienne ou du Territoire où est située une Autorité Taxatrice.</description>
      <data_type>https://schema.org/State#name</data_type>
    </heading>
    <heading id="h2">
      <label xml:lang="en">Taxing Authority</label>
      <label xml:lang="fr">Autorité taxatrice</label>
      <description xml:lang="en">Taxing Authority Name: The name of Canadian taxing authorities that host federal property belonging to federal departments of the Government of Canada.</description>
      <description xml:lang="fr">Nom de l’autorité taxatrice : Le nom des autorités taxatrices canadiennes qui comptent sur leur territoire des propriétés fédérales appartenant à divers ministères du Gouvernement du Canada.</description>
      <data_type>https://schema.org/Organization#legalName</data_type>
    </heading>
    <heading id="h3">
      <label xml:lang="en">2009 PILT Amount</label>
      <label xml:lang="fr">Montant des PERI pour 2009</label>
      <description xml:lang="en">Tax Year PILT Amount: Total PILT amount paid to a Taxing Authority for a specific tax year.</description>
      <description xml:lang="fr">Montant annuel de PERI : Le montant total de PERI versé à une Autorité Taxatrice pour une année de taxation donnée.</description>
      <data_type>https://schema.org/amount</data_type>
      <data_pattern>^\d+\.\d\d$</data_pattern>
    </heading>
  </headings>
</data_dictionary>
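A data dictionary like the one above can be used to validate the cells of a CSV data file. The Python sketch below reads the data_pattern of each heading and reports cells that do not match. It is an illustration only: the function names are hypothetical, the dictionary is a trimmed copy of two headings from the example, the CSV rows are made-up values, and Python's `re` module stands in for a Perl regular expression engine.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Trimmed copy of headings h1 and h3 from the complete example.
DICTIONARY = r"""<data_dictionary dd_version="1.0">
  <headings>
    <heading id="h1">
      <label xml:lang="en">Province/Territory</label>
      <label xml:lang="fr">Province/Territoire</label>
    </heading>
    <heading id="h3">
      <label xml:lang="en">2009 PILT Amount</label>
      <label xml:lang="fr">Montant des PERI pour 2009</label>
      <data_pattern>^\d+\.\d\d$</data_pattern>
    </heading>
  </headings>
</data_dictionary>"""

# Hypothetical rows, invented for this illustration.
CSV_TEXT = "Province/Territory,2009 PILT Amount\nOntario,1234.50\nQuebec,99.5\n"


def patterns_by_label(xml_text):
    """Map every label text to the pattern of its heading (default ^.*$)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    patterns = {}
    for heading in root.iter("heading"):
        pattern = heading.findtext("data_pattern") or "^.*$"
        for label in heading.findall("label"):
            patterns[label.text] = pattern
    return patterns


def invalid_cells(csv_text, patterns):
    """Return (row number, column, value) for each cell failing its pattern."""
    failures = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            pattern = patterns.get(column)
            if pattern and value is not None and re.fullmatch(pattern, value) is None:
                failures.append((row_number, column, value))
    return failures
```

With the sample rows, the value "99.5" is reported because the pattern requires exactly two digits after the decimal point.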