XML Interchange Format

Alice Zoë Bevan–McGregor edited this page Nov 12, 2015 · 5 revisions

Contentment allows for easy import and export of individual assets or whole trees. This document contains notes on the format.

The smallest acceptable XML file is:

<?xml version="1.0" encoding="utf-8"?>
<Extract xmlns="https://xml.webcore.io/component/asset/1.0">
</Extract>

This describes an export that contains nothing. The root tag of valid Contentment XML must always be Extract.
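As a rough illustration of the root-tag rule, the sketch below checks a document with the standard library's ElementTree. The helper name is_contentment_extract is hypothetical, and treating the namespace as mandatory is an assumption drawn from the minimal example rather than a stated rule.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the minimal document above.
NAMESPACE = "https://xml.webcore.io/component/asset/1.0"


def is_contentment_extract(source):
    """Return True if the XML bytes have a namespaced Extract root element.

    Hypothetical helper; assumes the namespace is required.
    """
    root = ET.fromstring(source)
    # ElementTree spells namespaced tags as "{namespace}localname".
    return root.tag == "{%s}Extract" % NAMESPACE


minimal = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<Extract xmlns="https://xml.webcore.io/component/asset/1.0">\n'
    b'</Extract>\n'
)

print(is_contentment_extract(minimal))
```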

Object Protocol

All objects persisted in the database must support data interchange. There are legal reasons for this, and it also provides a convenient way to perform backups. Objects participating in this protocol must define an __xml__ method that accepts one named argument, recursive, defaulting to False. This method must return an iterable of the unicode XML fragments used to describe that object.

A Python example:

class Something:
	def __xml__(self, recursive=False):
		return ["<Something />"]

Template functions built using cinje are suitable for direct use:

# encoding: cinje

: def export obj, recursive=False
	<Something />

With the above, the following Python would be valid:

from template import export

class Something:
	__xml__ = export

An accessor property named as_xml is provided to retrieve the non-recursive XML representation, matching as_html, as_json, and friends.
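One way such an accessor could be built on top of __xml__ is sketched below. The XMLExportMixin class is hypothetical, and joining the fragments into a single string is an assumption; the real accessor is supplied by Contentment and may behave differently.

```python
class XMLExportMixin(object):
    """Hypothetical mixin, for illustration only."""

    @property
    def as_xml(self):
        # Ask for the non-recursive form and join the fragments
        # (joining into one string is an assumption).
        return "".join(self.__xml__(recursive=False))


class Something(XMLExportMixin):
    def __xml__(self, recursive=False):
        # A generator is one acceptable kind of iterable of fragments.
        yield "<Something />"


print(Something().as_xml)
```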

Assets

The Asset base class defines the bulk of the export machinery for itself and its participating subclasses. The tag used is the name of the class. The smallest acceptable bare Asset is:

<Asset name="example">
	<title>Example Asset</title>
</Asset>
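The rule that the tag is the class name can be sketched as follows. This Asset is a hypothetical stand-in with only a name and a single untranslated title, not Contentment's actual class, and the output is emitted without the indentation shown above.

```python
from xml.sax.saxutils import escape, quoteattr


class Asset(object):
    """Hypothetical stand-in, not the real Asset base class."""

    def __init__(self, name, title):
        self.name = name
        self.title = title

    def __xml__(self, recursive=False):
        # The tag is the class name, so subclasses get their own tag.
        # The recursive flag is ignored here; this sketch has no children.
        tag = type(self).__name__
        yield "<%s name=%s>" % (tag, quoteattr(self.name))
        yield "<title>%s</title>" % escape(self.title)
        yield "</%s>" % tag


class Page(Asset):
    pass


print("".join(Asset("example", "Example Asset").__xml__()))
```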

XML Attributes

Attributes of Asset instances fall into three categories: simple, complex, and compound.

  • Simple types are generally the fundamental ones (unicode text, numbers, etc.) that do not act as containers for other values. These are stored as attributes on the containing XML tag.

  • Complex types are ones for which the value (really its class) has overridden export behaviour.

  • Compound types represent containers for other values. Both complex and compound types are stored as discrete child tags.
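The three categories could be told apart roughly as below. Both tests are assumptions: that "overridden export behaviour" means the value's class defines __xml__, and that compound values are the built-in containers. The function name classify is hypothetical.

```python
def classify(value):
    """Guess the export category of an attribute value (illustrative only)."""
    # Assumption: a class with its own __xml__ has overridden export behaviour.
    if hasattr(type(value), "__xml__"):
        return "complex"
    # Assumption: compound values are the built-in container types.
    if isinstance(value, (dict, list, tuple, set, frozenset)):
        return "compound"
    # Everything else (text, numbers, ...) becomes an XML attribute.
    return "simple"


class Properties(object):
    """Hypothetical placeholder for a class with custom export behaviour."""

    def __xml__(self, recursive=False):
        return []


print(classify("example"), classify({"en": "Example"}), classify(Properties()))
```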

Translated Attributes

Translated attributes are stored internally in a mapping, and as such represent a compound type. An example of this is the title of an Asset instance. These are encoded using a field-specific singular tag, which may differ from the field's name where that name is plural, with the tag repeated for each language.

<title>This page could use some color.</title>
<title lang="en">This page could use some colour.</title>
<title lang="en-US">This page could use some color.</title>
<title lang="fr">Cette page pourrait utiliser certaines couleurs.</title>

As can be seen above, both region-free ISO 639-1 codes and region-specific IETF language tags can be used. There may be an instance of the tag without a language specified, but there must not be more than one; this represents the ultimate default fallback, and is used last if no better match can be found. If you do not use the translation machinery, each such attribute appears as a single tag with no lang attribute.
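The fallback order described above might be sketched like this. The function pick_translation is hypothetical, and the matching strategy (exact tag, then progressively shorter tags, then the untagged default, all compared case-sensitively) is an assumption rather than Contentment's documented algorithm.

```python
import xml.etree.ElementTree as ET


def pick_translation(elements, language):
    """Choose the text best matching a language tag such as "en-CA"."""
    # Map each lang attribute to its text; the untagged default is keyed None.
    by_lang = {el.get("lang"): el.text for el in elements}

    candidate = language
    while candidate:
        if candidate in by_lang:
            return by_lang[candidate]
        # Drop the last subtag: "en-CA" -> "en" -> "".
        candidate = candidate.rpartition("-")[0]

    # Nothing matched: fall back to the untagged default, if any.
    return by_lang.get(None)


fragment = """<Asset name="example">
<title>This page could use some color.</title>
<title lang="en">This page could use some colour.</title>
<title lang="en-US">This page could use some color.</title>
<title lang="fr">Cette page pourrait utiliser certaines couleurs.</title>
</Asset>"""

titles = ET.fromstring(fragment).findall("title")
print(pick_translation(titles, "en-CA"))
```

Here a request for en-CA has no exact match, so the region-free en entry is used; a request for a language with no entry at all falls through to the untagged title.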

Metadata Properties

Metadata associated with an Asset instance via the properties accessor is stored via the Properties class, and represents a "complex" type. Properties may export data in two ways:

<property name="width" type="int">0</property>
<property name="title" separator=": " direction="ltr" />

Because metadata may be of a variety of types, the type must be included in the XML tag unless the value is a unicode string or a dictionary. If the property is itself a dictionary, it must contain only basic unicode strings, and is given a simplified, empty tag with its items encoded as XML attributes. (This, consequently, forbids use of name and type as metadata properties.)
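A reader for the two property encodings could look roughly like the following. The parse_property function and the CONVERTERS table are hypothetical; only the int type appears in the example above, and treating an untyped tag with text content as a plain string is an inference from the rule just described.

```python
import xml.etree.ElementTree as ET

# Hypothetical type table; only "int" is shown in the document.
CONVERTERS = {"int": int, "float": float}


def parse_property(element):
    """Return (name, value) for a single <property> element."""
    attributes = dict(element.attrib)
    name = attributes.pop("name")
    kind = attributes.pop("type", None)

    if kind is not None:
        # Typed value: convert the text content.
        return name, CONVERTERS[kind](element.text)
    if attributes:
        # Dictionary: the remaining XML attributes are its items.
        return name, attributes
    # Untyped with no extra attributes: assumed to be a plain string.
    return name, element.text or ""


width = ET.fromstring('<property name="width" type="int">0</property>')
title = ET.fromstring('<property name="title" separator=": " direction="ltr" />')
print(parse_property(width), parse_property(title))
```

Because name and type are popped before the dictionary is built, the sketch also shows why those two keys cannot survive a round trip inside a dictionary value.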

Children

All Asset instances may contain child Asset instances. These are encoded after all other properties. The "path" of an Asset is determined by the combination of its name and the names of its parent elements.
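Path construction can be illustrated with a small stand-in class. Node is hypothetical, and joining names with a leading and separating slash is an assumption based on the /theme/part/header style of target seen in block references; whether the root's own name is part of the path is likewise assumed.

```python
class Node(object):
    """Hypothetical stand-in for an Asset: just a name and a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    @property
    def path(self):
        # Walk up to the root collecting names, then reverse them.
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        # Assumption: slash-separated, with a leading slash.
        return "/" + "/".join(reversed(names))


theme = Node("theme")
part = Node("part", theme)
header = Node("header", part)
print(header.path)
```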

Page

Pages are containers for layout and general site content. A Page is an Asset containing a linear list of blocks. An example encoded page would be:

<Page name="terms">
	<title lang="en">Terms of Service</title>
	
	<ReferenceBlock target="/theme/part/header" />
	<TextBlock>
		<content lang="en"><![CDATA[Content would go here.]]></content>
	</TextBlock>
	<ReferenceBlock target="/theme/part/footer" />
</Page>

Notably, Asset contributions towards the exported XML are everything except the series of Blocks. Blocks behave according to the Asset encoding rules with regard to which attributes to supply as XML attributes, and which to populate as nested tags. In the above example, content is a translated attribute, but because TextBlock expects HTML content (which would otherwise require excessive encoding), it mandates wrapping of those values in CDATA.
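A minimal sketch of the CDATA wrapping follows. The helper wrap_cdata is hypothetical; the one subtlety it handles is that the sequence ]]> cannot appear inside a CDATA section, so it is split across two adjacent sections, which is a standard XML technique rather than anything specific to Contentment.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


html = "<p>Content would go here.</p>"
fragment = '<content lang="en">' + wrap_cdata(html) + "</content>"

# Parsing the fragment gives back the original markup as plain text.
print(ET.fromstring(fragment).text == html)
```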