XML Building Blocks: Elements and Attributes

David Gulbransen

This chapter is from the book
Special Edition Using XML, 2nd Edition

Well-Formedness Rules

XML documents must be well formed in order to be considered XML. The XML 1.0 Recommendation spells out some conditions that must be met for a document to be considered well formed. These conditions are called well-formedness constraints, and if the document fails to meet these constraints, it is not an XML document.

This can lead to some confusion when working with XML: you may have a document that appears to be perfect XML, yet it will not load into your XML application. That is because a parser that encounters a violation of the well-formedness constraints must report an error and cannot load the document. This is quite unlike the behavior of most Web browsers, which are very forgiving of errors in HTML.
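This strictness is easy to observe with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the choice of Python, the helper name is_well_formed, and the sample documents are assumptions made for illustration and do not come from the chapter.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<note><to>Reader</to></note>"
bad = "<note><to>Reader</note>"   # the <to> element is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser does not try to repair the second document the way a browser might repair HTML; it simply refuses it.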

It would be impractical to enumerate each of the well-formedness constraints in the XML 1.0 Recommendation without delving into minutiae that are not germane to creating XML documents. For example, if a document uses element names that are forbidden, such as <411>, then the document is not well formed. However, we've already discussed this rule in the context of naming your elements, so rehashing each of these details here would be tedious.

The important aspects of well-formedness can be boiled down to a few rules that should always be followed and that significantly lower your chances of creating a malformed XML document:

  1. All element and attribute names must follow the conventions for XML naming, as outlined previously (that is, not starting with a digit, and so on).

  2. Elements must be properly nested.

  3. Every start tag must have an end tag, or take the form of the empty element.

  4. The names in start tags and end tags must match exactly, including case.

  5. A well-formed document must have one, and only one, root element that contains all the other elements in the XML document.

  6. All entities must be properly referenced.

If you follow these rules, chances are your XML documents will be well formed.
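As a rough illustration of the rules above, the sketch below feeds one deliberately broken document per rule to Python's standard xml.etree.ElementTree parser. The sample documents and the check helper are hypothetical, and the exact error messages depend on the parser, so they are printed rather than assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: one violation per rule, plus a good one.
samples = {
    "name starts with a digit": "<411>info</411>",
    "improper nesting": "<a><b>text</a></b>",
    "missing end tag": "<a><b>text</a>",
    "case mismatch": "<Title>text</title>",
    "two root elements": "<a>one</a><b>two</b>",
    "undeclared entity": "<a>&copy;</a>",
    "well formed": "<a><b>text</b><c/></a>",
}

def check(text):
    """Return 'ok' if text parses, otherwise the parser's complaint."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        return f"rejected: {err}"

results = {label: check(text) for label, text in samples.items()}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Only the last sample, which nests its elements correctly and uses the empty-element form for c, should be accepted.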

Well-Formedness and Entities

An entity is just a way of using shorthand in XML. Entities can also be found in HTML. For example:

&copy;

is an entity that represents the copyright symbol, ©.

The syntax for most entities is

&entityname;

Entities can be used to replace long strings, or to represent symbols that you cannot legally include in an XML document. For example, let's say that you wanted to include a less-than symbol in a statement such as:

<equation>2 is less-than 7</equation>

You could not legally say

<equation>2 < 7</equation>

This violates the well-formedness constraints because it includes the < symbol that signifies the beginning of a tag. Fortunately, entities provide a way to reference this without actually including the symbol: &lt;. An entity exists for the greater-than symbol as well: &gt;.
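When XML is generated by a program, this substitution is usually done by a helper rather than by hand. A minimal sketch, assuming Python and its standard xml.sax.saxutils.escape function (neither is part of the chapter):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "2 < 7"
escaped = escape(raw)   # replaces &, <, and > with entity references
document = f"<equation>{escaped}</equation>"

# Parsing the document turns the entity reference back into the symbol.
element = ET.fromstring(document)

print(document)       # <equation>2 &lt; 7</equation>
print(element.text)   # 2 < 7
```

The entity reference exists only in the serialized document; an application reading the parsed element sees the ordinary less-than sign.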

There are a number of entities that are predefined for XML, so using these entities in your document does not violate any rules for well-formedness:

  • &amp;

    This entity is used to represent the ampersand symbol &.

  • &lt;

    The less-than entity is used to represent the less-than sign <, which also marks the beginning of every tag. For that reason, if you want to show a tag in text, or use the less-than symbol, you must use the &lt; entity.

  • &gt;

    The greater-than entity is similar to the less-than entity. You would use it to represent the greater-than symbol > in the content portion of an element.

  • &apos;

    The apostrophe entity is used to represent an apostrophe ' or a single quotation mark.

  • &quot;

    This entity is used to represent a double quotation mark ".
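To see all five predefined entities at work, the following sketch parses a small document with Python's standard xml.etree.ElementTree module. The element name text, the attribute name said, and their contents are made up for the example.

```python
import xml.etree.ElementTree as ET

# All five predefined entities in element content, with no DTD at all.
document = "<text>&amp; &lt; &gt; &apos; &quot;</text>"
decoded = ET.fromstring(document).text
print(decoded)   # & < > ' "

# &quot; lets a double quotation mark appear inside a double-quoted attribute.
quoted = ET.fromstring('<line said="He said &quot;hi&quot;"/>').get("said")
print(quoted)    # He said "hi"
```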

You should note that although these entities are also familiar from HTML, some entities found in HTML, such as &copy;, are not predefined in XML. Any other entities that are used in your document need to be declared by you in a DTD in order for the document to comply with well-formedness; XML Schema does not provide a way to declare entities.

There are two ways that you can declare entities. You can use an entity declaration in an external DTD, or you can declare entities in the internal DTD subset, which is contained within the document itself. We will discuss Document Type Definitions, both internal and external, in more detail in Chapter 4.
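The difference a declaration makes can be sketched with an internal DTD subset. The example below is an illustration, not the book's own listing: the footer element and the "Example Press" text are hypothetical, and the declaration defines copy through the numeric character reference &#169;, which is one possible way to write the copyright symbol. It uses Python's standard xml.etree.ElementTree parser, which expands entities declared in the internal subset.

```python
import xml.etree.ElementTree as ET

# No declaration: &copy; is not one of the five predefined entities.
undeclared = "<footer>&copy; Example Press</footer>"

# Same content, but the internal DTD subset declares the entity first.
declared = """<!DOCTYPE footer [
  <!ENTITY copy "&#169;">
]>
<footer>&copy; Example Press</footer>"""

try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

footer_text = ET.fromstring(declared).text

print(undeclared_ok)   # False
print(footer_text)     # © Example Press
```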

A document does not need to have a DTD associated with it to be well formed. As long as the document is structured correctly, it can be considered well formed. For many documents, there is no need for a DTD or Schema. By enforcing well-formedness, XML enables you to create flexible documents that might serve your needs without the added complexity of a Document Type Definition (DTD) or XML Schema.
