xml

Topics related to xml:

Getting started with xml

Building blocks

Namespaces

Element and attribute names in XML are called QNames (qualified names).

A QName is made of:

  • a namespace (a URI)
  • a prefix (an NCName, short for "non-colonized name" because it contains no colon)
  • a local name (an NCName)

Only the namespace and the local name matter when comparing two QNames. The prefix is merely a stand-in for the namespace.

Both the namespace and the prefix are optional, but a QName that has a prefix always has a namespace. This is enforced at the syntactic level: a prefix that is not bound to a namespace is a namespace well-formedness error, so the two cannot get out of sync.

The lexical representation of a QName is prefix:local-name. The namespace is bound to the prefix separately, using the special xmlns:... attributes (reminder: names beginning with xml are reserved in XML).

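If the prefix is empty, no colon is used in the lexical representation of the QName, which then consists of the local name only. An unprefixed element name is in the default namespace if one is in scope (it is declared with a plain xmlns attribute), and has no namespace otherwise. An unprefixed attribute name never has a namespace, because the default namespace does not apply to attributes.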
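
The comparison rule can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree module, which exposes every name as "{namespace}local-name" and discards the prefix. The namespace URI http://example.com/ns and the element and attribute names are made-up placeholders, not part of any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used for illustration only.
NS = "http://example.com/ns"

# Same namespace and local name, bound through two different prefixes.
root_a = ET.fromstring(f'<a:item xmlns:a="{NS}" a:id="1"/>')
root_b = ET.fromstring(f'<b:item xmlns:b="{NS}" b:id="1"/>')

# No prefix and no default namespace in scope: the names have no namespace.
root_c = ET.fromstring('<item id="1"/>')

# Default namespace in scope: it applies to the unprefixed element name,
# but not to the unprefixed attribute name.
root_d = ET.fromstring(f'<item xmlns="{NS}" id="1"/>')

print(root_a.tag)                # {http://example.com/ns}item
print(root_a.tag == root_b.tag)  # True: the prefix plays no role
print(root_c.tag)                # item
print(root_d.tag)                # {http://example.com/ns}item
print(list(root_a.attrib))       # ['{http://example.com/ns}id']
print(list(root_d.attrib))       # ['id']
```

Other XML APIs represent QNames differently, but they should agree that root_a and root_b carry the same element name.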

Entities

From a storage perspective, an XML document is made of entities. One of the entities is the document entity, which is the main XML document itself.

Entities can be classified as follows (roughly in descending order of usage):

  • document entity: this is the main XML file.
  • internal general entities: the most common kind besides the document entity, and the one most XML users are aware of. The word "entity" is often used casually to mean these. They define shortcuts for longer replacement texts in document content, and they are declared in the DTD.
  • the external DTD subset: a separate file that holds part of the DTD.
  • parameter entities: shortcuts for replacement text that can only be used within the DTD.
  • external parsed general entities: XML fragments stored in other files.
  • unparsed entities: files on whose content XML places no restrictions, such as images or sounds.

In many cases, an XML document consists solely of the document entity.
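
As a minimal sketch of an internal general entity, the document below declares one in its internal DTD subset and references it in content. The entity name publisher, its replacement text, and the element names are hypothetical. The example assumes Python's standard xml.etree.ElementTree module, whose parser expands internal entities declared this way.

```python
import xml.etree.ElementTree as ET

# The document entity: it carries an internal DTD subset that declares
# one internal general entity (hypothetical name: "publisher").
doc = """<!DOCTYPE book [
  <!ENTITY publisher "Example Press, Ltd.">
]>
<book><imprint>&publisher;</imprint></book>"""

root = ET.fromstring(doc)

# The parser replaces the entity reference with its replacement text.
imprint_text = root.find("imprint").text
print(imprint_text)  # Example Press, Ltd.
```

Because entity expansion can be abused with untrusted input, parsers are often configured to restrict or disable it; this sketch assumes trusted input.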

Escaping

Characters can be escaped in XML using entity references, character references, or CDATA sections.

XML pre-defines five entities:

Named entity    Replacement text
amp             &
quot            "
apos            '
lt              <
gt              >

A consuming application cannot tell whether a given character was escaped in the source, or by which mechanism: the parser reports the same character either way.
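
The sketch below illustrates that the three escaping mechanisms are indistinguishable after parsing. It assumes Python's standard xml.etree.ElementTree module as the consuming application, and the element name t and the sample text are arbitrary. The escape helper from xml.sax.saxutils shows one way to produce escaped text in the other direction.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The same character data, written with three different escaping mechanisms.
variants = {
    "entity references": "<t>a &lt; b &amp;&amp; c</t>",
    "character references": "<t>a &#60; b &#x26;&#38; c</t>",
    "CDATA section": "<t><![CDATA[a < b && c]]></t>",
}

# After parsing, every variant yields exactly the same text.
texts = {name: ET.fromstring(src).text for name, src in variants.items()}
for name, text in texts.items():
    print(f"{name}: {text}")  # each line ends with: a < b && c

# Going the other way: escape() replaces &, < and > with entity references.
escaped = escape("a < b && c")
print(escaped)  # a &lt; b &amp;&amp; c
```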

DTD

XML Schema

XML Catalogs