This chapter explains in concrete, practical terms how to make DocBook documents. It’s an overview of all the kinds of markup that are possible in DocBook documents. It explains how to create several kinds of DocBook documents: books, sets of books, chapters, articles, and reference manual entries. The idea is to give you enough basic information to actually start writing. The information here is intentionally skeletal; you can find the details in the reference section of this book.
1. Making an XML Document
An XML document consists of an optional XML declaration, an optional Document Type Declaration, which includes an optional internal subset, and a document (or root) element. We’ll discuss each of these in turn.
In XML vocabularies like DocBook, which are defined with RELAX NG (and also in the case of vocabularies defined with W3C’s XML Schema), it is common to omit the Document Type Declaration entirely. The Document Type Declaration associates a document with a particular Document Type Definition (DTD).
1.1. An XML Declaration
<?xml version="1.0" encoding="utf-8"?>
Identifying the version of XML ensures that future changes to the XML specification will not alter the semantics of this document. The encoding declaration tells the processor what character encoding this document uses. It must match the actual encoding that you use. The complete details of the XML declaration are described in the W3C standard, Extensible Markup Language (XML) 1.0 [XML].
If your document uses XML 1.0 and an encoding of either utf-8 or utf-16, the XML declaration is not required. But it is never wrong to include it. If you do not include an XML declaration, your document must conform to XML 1.0. If you want to use XML 1.1, you must include an XML declaration and specify version="1.1" in it.
The XML declaration is syntactically similar to a processing instruction, but it is not one. The XML declaration, if it is present, must be absolutely the first thing in your document and it may not appear anywhere else.
1.2. A Document Type Declaration
The Document Type Declaration identifies what the root element of the document will be and may specify the DTD that should be used when parsing the document. A typical Document Type Declaration for a DocBook V4.5 document looks like this:
<?xml version='1.0'?> <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
This declaration indicates that the root element will be
book and that the DTD used will be
DocBook V4.5, identified with both its public and
system identifiers. In this example, the DTD is
identified with an HTTP URI.
System identifiers in XML must be URIs. Almost all systems accept filenames and interpret them locally as URLs, but it’s always correct to fully qualify them.
You can specify a DTD for DocBook V5.0 documents:
<?xml version='1.0'?> <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V5.0//EN" "http://www.oasis-open.org/docbook/xml/5.0/docbook.dtd">
But the limited constraints that can be expressed in DTDs mean that the resultant document may or may not really be valid DocBook V5.0. The normative schema for DocBook V5.0 is the RELAX NG grammar with its Schematron annotations.
The only reason to use a DTD with DocBook V5.0 is if your editing environment (or other tool) requires one, for example, for syntax-directed editing. If you’re using a tool that requires DTDs, check with the vendor, as maybe a more recent version is available that supports RELAX NG.
1.3. An Internal Subset
<?xml version='1.0'?> <!DOCTYPE book [ <!ENTITY nwalsh "Norman Walsh"> <!ENTITY chap1 SYSTEM "chap1.xml"> <!ENTITY chap2 SYSTEM "chap2.xml"> ]>
These declarations form what is known as the internal subset. In this example, the DTD has been omitted, but the two are not mutually exclusive. If you are using a DTD (which is technically known as the external subset), you can include the internal subset immediately after the DTD:
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V5.0//EN"
               "http://www.oasis-open.org/docbook/xml/5.0/docbook.dtd" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.xml">
<!ENTITY chap2 SYSTEM "chap2.xml">
]>
When both are specified, the internal subset is parsed first. If multiple declarations for an entity occur, the first declaration is used. This means that declarations in the internal subset override declarations in the external subset.
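The first-declaration-wins rule can be seen without any external DTD. The following is a minimal sketch, assuming a hypothetical entity named product and placeholder titles; neither appears in the examples above.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
<!-- Hypothetical entity, declared twice on purpose.
     The first declaration is binding; the second is ignored. -->
<!ENTITY product "Example Product 2.0">
<!ENTITY product "Example Product 1.0">
]>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>&product; User Guide</title>
  <chapter>
    <title>Introduction</title>
    <para>This guide describes &product;.</para>
  </chapter>
</book>
```

A parser expands &product; to "Example Product 2.0" in both places. The same mechanism is what lets an internal subset override an entity that a DTD also declares, because the internal subset is read first.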
1.4. The Document (or Root) Element
All XML documents must have exactly one root element, although it may have sibling comments and processing instructions. If the document has a Document Type Declaration, the root element usually immediately follows it:
<?xml version='1.0'?> <!DOCTYPE book [ <!ENTITY nwalsh "Norman Walsh"> <!ENTITY chap1 SYSTEM "chap1.xml"> <!ENTITY chap2 SYSTEM "chap2.xml"> ]> <book xmlns="http://docbook.org/ns/docbook" version="5.0">…</book>
The important point is that the root element must be physically present immediately after the Document Type Declaration. You cannot place the root element of the document in an external entity.
2. Physical Divisions: Breaking a Document into Separate Files
The rest of this chapter describes how you can break documents into logical chunks, such as books, chapters, sections, and so on. Before we begin, and while the subject of the internal subset is fresh in your mind, let’s take a quick look at how to break documents into separate files.
Actually, we’ve already told you how to do it. If you recall, in the preceding sections we had declarations of the form:
<!ENTITY name SYSTEM "filename">
If you refer to the entity name in your document (as &name;) after this declaration, the system will insert the contents of the file filename into your document at that point. So, if you’ve got a book that consists of three chapters and two appendixes, you might create a file called book.xml, which looks like this:
<!DOCTYPE book [
<!ENTITY chap1 SYSTEM "chap1.xml">
<!ENTITY chap2 SYSTEM "chap2.xml">
<!ENTITY chap3 SYSTEM "chap3.xml">
<!ENTITY appa SYSTEM "appa.xml">
<!ENTITY appb SYSTEM "appb.xml">
]>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
<title>My First Book</title>
&chap1;
&chap2;
&chap3;
&appa;
&appb;
</book>
Documents that you reference with external parsed entities cannot have a Document Type Declaration. For example, Chapter 1 might begin like this:
<chapter xml:id="ch1"><title>My First Chapter</title> <para>My first paragraph.</para> …
But it must not begin with its own Document Type Declaration:
<!DOCTYPE chapter> <chapter xmlns="http://docbook.org/ns/docbook" xml:id="ch1"> <title>My First Chapter</title> <para>My first paragraph.</para> …
You can also break a document into separate files with XInclude. In that case, the same book might look like this:
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      version="5.0">
<title>My First Book</title>
<xi:include href="chap1.xml"/>
<xi:include href="chap2.xml"/>
<xi:include href="chap3.xml"/>
<xi:include href="appa.xml"/>
<xi:include href="appb.xml"/>
</book>
The essential trade-offs between external parsed entities and XInclude are:
XInclude can be used in a document that does not have a Document Type Declaration. Many web services applications (ones that rely on SOAP, anyway) forbid a Document Type Declaration and therefore cannot use entities of any sort.
The documents referenced by XInclude are complete, free-standing XML documents. They can declare their own local entities using a Document Type Declaration. Documents referenced by external parsed entities cannot have a Document Type Declaration. If they use entities, those entities must be declared in the document that does the including.
External parsed entities can have multiple top-level elements. They are not required to be “single rooted.” XIncluded documents must be wholly well-formed XML.
All XML validators support external parsed entities. (Validators that do not are not conformant XML processors.) XInclude is a separate specification and may or may not be supported by tools.
The XML validator expands entities and therefore “sees” the entire document. This means that ID/IDREF links can freely cross entity boundaries. Because XIncluded documents are free-standing, a document containing an IDREF that crosses a document boundary cannot be valid. It can be well-formed, and processors can do the right thing, but the validator cannot determine that the document is valid. What’s more, the same ID value can occur in several XIncluded documents without causing a validity error. This may cause subsequent processing to fail.
As time passes, the use of DTD-based mechanisms like entities is diminishing. If you have an eye on the future, to the extent that it is practical, it is probably better to use XInclude than entities.
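To illustrate the second trade-off, here is a sketch of what an XIncluded chap1.xml could look like when it declares its own local entity. The entity name draftnote and its text are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- A free-standing document: it may carry its own Document Type
     Declaration, which an external parsed entity may not. -->
<!DOCTYPE chapter [
<!-- Hypothetical local entity, visible only inside this file. -->
<!ENTITY draftnote "This chapter is still a draft.">
]>
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="ch1">
  <title>My First Chapter</title>
  <para>&draftnote;</para>
  <para>My first paragraph.</para>
</chapter>
```

Because the file has a single root element and resolves its own entities, it can be edited and checked on its own before it is pulled into the book.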
3. Logical Divisions: The Categories of Elements in DocBook
Divisions, which divide books
Components, which divide books or divisions
Sections, which subdivide components
In the rest of this section, we’ll describe briefly the elements that make up these categories. This section is designed to give you an overview. It is not an exhaustive list of every element in DocBook.
For more information about any specific element and the elements that it may contain, consult the reference page for the element in question.
A set contains two or more books. It’s the hierarchical top of DocBook. You use the set tag, for example, for a series of books on a single subject that you want to access and maintain as a single unit, such as the manuals for a series of computer systems or the documentation (tutorial, reference, etc.) for a programming language. Sets are allowed to contain other sets, though this is not common.
A book is probably the most common top-level element in a document. The DocBook definition of a book is very loose and general. Given the variety of books authored with DocBook and the number of different conventions for book organization used around the world, any attempt to impose a strict ordering of elements would make the content model extremely complex. Therefore, DocBook gives you free rein. You can use a local customization (see Chapter 5, Customizing DocBook) if you want to impose a stricter ordering for your applications.
A book consists of a mixture of the following elements:
- Dedications: dedication pages almost always occur at the front of a book.
- Navigational components
Divisions are the first hierarchical level below book. The divisions are part and reference. A part contains components. A reference contains refentrys. These are discussed more thoroughly in Section 8, “Making a Reference Page”.
Books can contain components directly and are not required to contain divisions.
Components are the chapter-like elements of a book or part. articles can also occur at the component level. We describe articles in more detail in Section 7, “Making an Article”. Components generally contain block elements and/or sections, and some can contain navigational components and refentrys.
The sect1 through sect5 elements are sectioning elements. They can occur in most component-level elements. These numbered section elements must be properly nested (sect2s can only occur inside sect1s, sect3s can only occur inside sect2s, and so on). There are five levels of numbered sections.
In addition to numbered sections, there is the simplesect element. It is a terminal section that can occur at any level, but it cannot have any other sectioning element nested within it.
A distinguishing feature of simplesect is that it does not occur in the Table of Contents.
bridgehead provides a section title without any containing section.
The refsection element is a recursive division in a refentry. It is an alternative to the numbered reference section tags (refsect1 through refsect3). Like the section element, it can be nested to any depth.
All of the elements at the section level and above, and many other elements, include a wrapper for meta-information about the content. That element is named info. In earlier versions of DocBook, there were many similarly named elements for this purpose: bookinfo, chapterinfo, etc. In DocBook V5.0, there is only one.
The meta-information wrapper is designed to contain bibliographic information about the content (author, title, publisher, and so on) as well as other meta-information such as revision histories, keyword sets, and index terms.
The info element can contain:
title: The text of the title of a section of a document or of a formal block-level element
titleabbrev: The abbreviation of a title
subtitle: The subtitle of a document
address: A real-world address, generally a postal address
artpagenums: The page numbers of an article as published
author: The name of an individual author
authorgroup: A wrapper for author information when a document has multiple authors or collaborators
authorinitials: The initials or other short identifier for an author
bibliocoverage: The spatial or temporal coverage of a document
biblioid: An identifier for a document
bibliomisc: Untyped bibliographic information
bibliomset: A cooked container for related bibliographic information
bibliorelation: The relationship of a document to another
biblioset: A raw container for related bibliographic information
bibliosource: The source of a document
collab: Identifies a collaborator
confgroup: A wrapper for document meta-information about a conference
contractnum: The contract number of a document
contractsponsor: The sponsor of a contract
copyright: Copyright information about a document
date: The date of publication or revision of a document
edition: The name or number of an edition of a document
editor: The name of the editor of a document
extendedlink: An XLink extended link
issuenum: The number of an issue of a journal
itermset: A set of index terms in the meta-information of a document
keywordset: A set of keywords describing the content of a document
legalnotice: A statement of legal obligations or requirements
mediaobject: A displayed media object (video, audio, image, etc.)
orgname: The name of an organization
othercredit: A person or entity, other than an author or editor, credited in a document
pagenums: The numbers of the pages in a book, for use in a bibliographic entry
printhistory: The printing history of a document
productname: The formal name of a product
productnumber: A number assigned to a product
pubdate: The date of publication of a document
publisher: The publisher of a document
publishername: The name of the publisher of a document
releaseinfo: Information about a particular release of a document
revhistory: A history of the revisions to a document
seriesvolnums: Numbers of the volumes in a series of books
subjectset: A set of terms describing the subject matter of a document
volumenum: The volume number of a document in a set (as of books in a set or articles in a journal)
The title, titleabbrev, and subtitle elements can usually appear either immediately before or inside the info wrapper (but not both). This means you don’t need the extra wrapper in the common case where all you want to specify is a title.
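The two placements can be compared side by side. This is a minimal sketch; the titles and the author name are placeholders.

```xml
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Title Placement</title>

  <!-- Common case: only a title is needed, so no info wrapper. -->
  <chapter>
    <title>Getting Started</title>
    <para>The title appears directly inside the chapter.</para>
  </chapter>

  <!-- More meta-information is needed, so the title moves inside info.
       It must not also appear outside the wrapper. -->
  <chapter>
    <info>
      <title>Advanced Topics</title>
      <author>
        <personname><firstname>Jane</firstname><surname>Example</surname></personname>
      </author>
    </info>
    <para>The title appears inside the info wrapper.</para>
  </chapter>
</book>
```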
3.6. Block Elements
The block elements occur immediately below the component and sectioning elements. These are the (roughly) paragraph-level elements in DocBook. They can be divided into a number of categories: lists, admonitions, line-specific environments, synopses of several sorts, tables, figures, examples, and a dozen or more miscellaneous elements.
3.6.1. Block versus inline elements
At the paragraph level, it’s convenient to divide elements into two classes, block and inline. From a structural point of view, this distinction is based loosely on their relative size, but it’s easiest to describe the difference in terms of their presentation.
Block elements are usually presented with a paragraph (or larger) break before and after them. Most can contain other block elements, and many can contain character data and inline elements. Paragraphs, lists, sidebars, tables, and block quotations are all common examples of block elements.
Inline elements are generally represented without any obvious breaks. The most common distinguishing mark of inline elements is a font change, but inline elements may present no visual distinction at all. Inline elements contain character data and possibly other inline elements, but they never contain block elements. Inline elements are used to mark up data such as cross-references, filenames, commands, options, subscripts and superscripts, and glossary terms.
calloutlist: A list of callouts and their descriptions. The callouts are marks, frequently numbered and typically on a graphic (imageobjectco) or verbatim environment (programlistingco or screenco), that are described in a calloutlist.
simplelist: An unadorned list of items. simplelists can be inline or arranged in columns.
variablelist: A list of terms and definitions or descriptions. (This list of list types is a variablelist.)
All of the admonitions have the same structure: an optional title followed by paragraph-level elements. DocBook does not impose any specific semantics on the individual admonitions. For example, DocBook does not mandate that warning be reserved for cases where bodily harm can result.
3.6.4. Line-specific environments
These environments preserve whitespace and line breaks in the source text. DocBook does not provide the equivalent of HTML’s br tag, so there’s no way to interject a line break into normal running text.
The address element is intended for postal addresses. In addition to being line-specific, address contains additional elements suitable for marking up names and addresses.
literallayout does not have any semantic association beyond the preservation of whitespace and line breaks. In particular, while programlisting and screen are frequently presented in a fixed-width font, a change of fonts is not ordinarily implied by literallayout.
The programlisting and programlistingco elements are verbatim environments, usually presented in Courier or some other fixed-width font, for program sources, code fragments, and similar listings. The two elements are the same, except that programlistingco supports markup for callouts.
The screen and screenco elements are verbatim or literal environments for text screen captures, other fragments of an ASCII display, and similar things. screen is also a frequent catchall for any verbatim text. The two elements are the same, except that screenco supports markup for callouts.
synopsis is a verbatim environment for command and function synopses.
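The following sketch puts several of these environments next to each other. The program, the shell commands, and the address are invented for illustration.

```xml
<sect1 xmlns="http://docbook.org/ns/docbook">
  <title>Line-Specific Environments</title>

  <!-- Program source: whitespace and line breaks are preserved. -->
  <programlisting>int main(void)
{
    return 0;
}</programlisting>

  <!-- Text screen capture (hypothetical commands). -->
  <screen>$ cc -o hello hello.c
$ ./hello</screen>

  <!-- Line breaks preserved, but no font change implied. -->
  <literallayout>Roses are red,
    Violets are blue.</literallayout>

  <!-- Line-specific, with extra markup for the parts of the address. -->
  <address><street>1 Example Street</street>
<city>Springfield</city> <postcode>00000</postcode></address>
</sect1>
```

Note that the content of each element starts immediately after the start tag; a line break placed there would be preserved in the output.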
3.6.5. Examples, figures, and tables
DocBook supports CALS tables (defined with elements such as tgroup, row, entry, and caption) and HTML tables (defined with elements such as tr and td).
Informal equations don’t have titles. For reasons of backward compatibility, equations are not required to have titles. However, it may be more difficult for some stylesheet languages to properly enumerate equations if they do not have titles.
3.6.8. Graphics and media
Graphics occur most frequently in figures and screenshots, but they can also occur outside those wrappers. DocBook considers a mediaobject a block element, even if it occurs in an inline context. For graphics that you want to be represented inline, use inlinemediaobject.
audioobject: A wrapper for audio data and its associated meta-information. (Which contains audiodata.)
imageobject: A wrapper for image data and its associated meta-information. (Which contains imagedata.)
imageobjectco: A wrapper for an image object with callouts. (Which contains imagedata and callout-related information.)
videoobject: A wrapper for video data and its associated meta-information. (Which contains videodata.)
textobject: A wrapper for a text description of an object and its associated meta-information. (Which contains a phrase, textdata, or block elements.)
The audio, image, video, and text data in a media object are, by definition, alternatives.
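A sketch of a media object that offers alternatives follows. The file names and the id are assumptions, and which alternative is chosen depends on the processor and the output format.

```xml
<figure xmlns="http://docbook.org/ns/docbook" xml:id="fig-overview">
  <title>System Overview</title>
  <mediaobject>
    <!-- Alternative 1: vector image (hypothetical file name). -->
    <imageobject>
      <imagedata fileref="overview.svg" format="SVG"/>
    </imageobject>
    <!-- Alternative 2: bitmap version of the same picture. -->
    <imageobject>
      <imagedata fileref="overview.png" format="PNG"/>
    </imageobject>
    <!-- Alternative 3: text for contexts where no image can be shown. -->
    <textobject>
      <phrase>Diagram of the system overview.</phrase>
    </textobject>
  </mediaobject>
</figure>
```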
3.6.9. Questions and answers
The qandaset element is suitable for FAQs (Frequently Asked Questions) and other similar collections of questions and answers. Each qandaentry contains a question and its answer(s). The set of questions and answers can be divided into sections with qandadiv.
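As a minimal sketch, a small FAQ with one division might be marked up like this. The question and answer text is invented.

```xml
<qandaset xmlns="http://docbook.org/ns/docbook">
  <title>Frequently Asked Questions</title>
  <!-- qandadiv groups related entries under a title. -->
  <qandadiv>
    <title>Installation</title>
    <qandaentry>
      <question>
        <para>Do I need a DTD to write DocBook V5.0?</para>
      </question>
      <answer>
        <para>No. A Document Type Declaration is optional.</para>
      </answer>
    </qandaentry>
  </qandadiv>
</qandaset>
```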
3.6.10. Procedures and tasks
DocBook provides a number of elements for describing command, function, and class synopses:
cmdsynopsis: A syntax summary for a software command. A cmdsynopsis is built from command, arg, and group elements. For long synopses, the sbr tag can be used to indicate where a break should occur. Complex synopses can be composed from synopfragments.
funcsynopsis: The syntax summary for a function definition. A function synopsis consists of one or more funcprototypes and may include additional, literal information in a funcsynopsisinfo. Each prototype consists of a funcdef and a collection of paramdefs.
classsynopsis: The syntax summary for a class definition. A class synopsis consists of one or more ooclass, ooexception, or oointerface elements followed by zero or more constructorsynopsis, destructorsynopsis, fieldsynopsis, and methodsynopsis elements. Like funcsynopsis, it may include additional, literal information, in this case, in a classsynopsisinfo.
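The following sketch shows a command synopsis and a function synopsis for a made-up tool. The command frob, the header frob.h, and the function frob_open are hypothetical names used only for illustration.

```xml
<sect1 xmlns="http://docbook.org/ns/docbook">
  <title>Synopsis Examples</title>

  <!-- frob [-v] {-a | -b} file... -->
  <cmdsynopsis>
    <command>frob</command>
    <arg choice="opt">-v</arg>
    <group choice="req">
      <arg choice="plain">-a</arg>
      <arg choice="plain">-b</arg>
    </group>
    <arg choice="plain" rep="repeat"><replaceable>file</replaceable></arg>
  </cmdsynopsis>

  <!-- int frob_open(const char *path, int flags); -->
  <funcsynopsis>
    <funcsynopsisinfo>#include &lt;frob.h&gt;</funcsynopsisinfo>
    <funcprototype>
      <funcdef>int <function>frob_open</function></funcdef>
      <paramdef>const char *<parameter>path</parameter></paramdef>
      <paramdef>int <parameter>flags</parameter></paramdef>
    </funcprototype>
  </funcsynopsis>
</sect1>
```

The markup records which arguments are optional, required, or repeatable, and leaves the brackets, braces, and ellipses to the stylesheet.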
3.6.12. Miscellaneous block elements
The following block elements are also available:
3.7. Inline Elements
Users of DocBook are provided with a surfeit of inline elements. Inline elements are used to mark up running text. In published documents, inline elements often cause a font change or other small change, but they do not cause line or paragraph breaks.
In practice, writers generally settle on the tagging of inline elements that suits their time and subject matter. This may be a large number of elements or only a handful. What is important is that you choose to mark up not every possible item, but only those for which distinctive tagging will be useful in the production of the finished document for the readers who will search through it.
The following comprehensive list may be a useful tool for the process of narrowing down the elements that you will choose to mark up; it is not intended to overwhelm you by its sheer length. For convenience, we’ve divided the inlines into several subcategories.
The classification used here is not meant to be authoritative, only helpful in providing a feel for the nature of the inlines. Several elements appear in more than one category, and arguments could be made to support the placement of additional elements in other categories or entirely new categories.
3.7.1. Traditional publishing inlines
3.7.2. Cross-references and linking
anchor: A spot in the document.
biblioref: A cross-reference to a bibliographic entry.
citation: An inline bibliographic reference to another published work.
citerefentry: A citation to a reference page.
citetitle: The title of a cited work.
firstterm: The first occurrence of a term.
glossterm: A glossary term.
link: A hypertext link.
olink: A link that addresses its target indirectly.
xref: A cross reference to another part of the document.
DocBook provides a number of common attributes to facilitate cross-references and hypertext linking. Broadly speaking, these facilities break down into three categories: standard XML ID/IDREF linking, [XLink], and DocBook-specific markup. These different and sometimes overlapping approaches exist because DocBook's long history predates many modern hypertext standards.
The easiest and most interoperable way to identify the target of
a link (the thing being linked to) is with an
xml:id attribute. Use the
anchor element if you need to place a link target at a
specific point in text where there isn't a convenient wrapper element
on which to hang the ID.
XML ID/IDREF linking is accomplished with the linkend attribute, most frequently on the link and xref elements. On a link, the linkend attribute establishes a hypertext link from the text of the link to the element with the xml:id attribute specified in the linkend attribute. The semantics of xref are analogous, but the link text is generated automatically.
For example, an xref to an xml:id on a figure element might generate link text like “Figure 3.1, Title of Figure”. The exact nature of the generated text is controlled by the tool processing the document.
The linkend attribute can be used on most inline elements, not just on link and xref. Used in this way, the element on which the linkend attribute appears behaves like a link. (This is a convenient alternative that avoids having to place the other inline element inside a link or place a link inside it.)
DocBook allows links to occur more-or-less ubiquitously. As a consequence, it is possible to nest one link inside another. This is not forbidden, but the semantics of such nested links are undefined.
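The ID/IDREF facilities described above can be sketched in one small fragment. The ids, titles, and file name are assumptions.

```xml
<sect1 xmlns="http://docbook.org/ns/docbook" xml:id="sec-linking">
  <title>Linking Example</title>

  <!-- Link target: any element can carry an xml:id. -->
  <figure xml:id="fig-arch">
    <title>Architecture</title>
    <mediaobject>
      <imageobject><imagedata fileref="arch.png"/></imageobject>
    </mediaobject>
  </figure>

  <!-- xref generates its own link text; link supplies the text itself. -->
  <para>See <xref linkend="fig-arch"/> for the big picture, or jump to
  <link linkend="note-here">the remark below</link>.</para>

  <!-- anchor marks a spot where no convenient wrapper element exists.
       linkend on filename makes that inline behave like a link. -->
  <para>This paragraph has a target <anchor xml:id="note-here"/>in the
  middle of its text. The <filename linkend="fig-arch">arch.png</filename>
  file is shown in the figure.</para>
</sect1>
```

Because every linkend value here is an IDREF, a validator can report a link whose target id does not exist.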
For cross-references to a bibliography entry, the citation and biblioref elements work analogously to the link and xref elements. Bibliographic references often require very specific formatting; having specific elements for this purpose simplifies processing.
Titles are often highlighted (in printed books they are usually
italicized, for example). The
citetitle element is used to
identify titles. It may be used with or without linking attributes.
XLink is a separate, general-purpose XML linking standard. It provides a rich and powerful vocabulary for describing a wide variety of linking scenarios. DocBook V5.0 has complete support for XLink and allows the XLink attributes on most elements.
The simplest and most common use of XLink in DocBook is to place the xlink:href attribute on an inline element. The value of the xlink:href attribute is technically a URI reference whose fragment identifier may be an XPointer (see [XPointer Framework]), though this is often simply a URI. DocBook places no constraints on the kinds of XPointers that can be used, though not all processors will be able to address all XPointers, so make sure you are satisfied that you can process the ones you use.
Using linkend on an element forms an intra-document link; xlink:href usually forms an inter-document link. There's nothing that prevents an XLink from being an intra-document link, using the bare fragment identifier form. The salient difference between using an ID/IDREF link (linkend) and a bare fragment identifier (xlink:href) is that the former is always checked by the validator and the latter never is. This means that errors in the latter form are less likely to be detected by the processor.
Two examples of simple XLinks:
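A minimal sketch of what simple XLinks can look like, assuming the conventional xlink prefix and illustrative link text:

```xml
<para xmlns="http://docbook.org/ns/docbook"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- xlink:href on a link element. -->
  Visit <link xlink:href="http://docbook.org/">the DocBook site</link>.
  <!-- xlink:href directly on another inline element. -->
  The book <citetitle xlink:href="http://docbook.org/">DocBook: The
  Definitive Guide</citetitle> is available online.
</para>
```

In both cases the xlink namespace must be declared on the element or one of its ancestors.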
XLink allows authors to describe some very useful linking structures that have long been ignored by web browsers. Consider, for example:
<para>What about
<citetitle xlink:type="extended">
  <link xlink:type="locator" xlink:href="http://docbook.org/"
        xlink:label="target" xlink:title="DocBook.org"/>
  <link xlink:type="locator" xlink:href="http://en.wikipedia.org/wiki/DocBook"
        xlink:label="target" xlink:title="DocBook on Wikipedia"/>
  <phrase xlink:type="resource" xlink:label="source">DocBook</phrase>
  <link xlink:type="arc" xlink:from="source" xlink:to="target"/>
</citetitle>
and <citetitle xml:id="xquery">XQuery</citetitle>.
</para>
This defines a link with one source, “DocBook”, and two targets: the DocBook website and the DocBook page on http://en.wikipedia.org/. It can get even more interesting. Suppose the following extendedlink appears somewhere in the document, for example in the info element:
<extendedlink xlink:type="extended">
  <locator xlink:type="locator" xlink:href="http://www.w3.org/TR/XQuery"
           xlink:label="target" xlink:title="XQuery specification"/>
  <locator xlink:type="locator" xlink:href="http://www.w3.org/XML/Query/"
           xlink:label="target" xlink:title="XQuery WG"/>
  <locator xlink:type="locator" xlink:href="http://en.wikipedia.org/wiki/XQuery"
           xlink:label="target" xlink:title="XQuery on Wikipedia"/>
  <locator xlink:type="locator" xlink:href="#xquery" xlink:label="source"/>
  <arc xlink:type="arc" xlink:from="source" xlink:to="target"/>
</extendedlink>
The word “XQuery” in the preceding example becomes a multi-ended link too! XLink allows links to be defined from either end and with either inline or standoff markup.
Support (or lack thereof!) for these more complex forms is, alas, highly variable. Check to make sure your processor can handle the forms you use.
One final note on XLink. The XLink specification uses an attribute, xlink:title, to contain the human-readable titles of locators, arcs, etc. DocBook assiduously avoids placing human-readable content in attribute values because they cannot contain inline graphics for non-Unicode characters and they complicate translation efforts. Nevertheless, for compatibility, the xlink:title attribute is used in XLink support. Consider the implications of placing human-readable titles in that attribute.
DocBook does not define a complete set of elements for representing equations. The Mathematical Markup Language (MathML) is a standard that defines a comprehensive grammar for representing equations. MathML markup may be used in any of the equation elements (equation, informalequation, and inlineequation). For simple mathematical equations that do not require extensive markup, the mathphrase element is an alternative.
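Both approaches are sketched below: a mathphrase for a simple inline formula and MathML for a titled equation. The mml prefix, the id, and the formulas are assumptions chosen for illustration.

```xml
<sect1 xmlns="http://docbook.org/ns/docbook"
       xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <title>Equation Examples</title>

  <!-- Simple mathematics: mathphrase with ordinary inline markup. -->
  <para>The area of a circle is
  <inlineequation>
    <mathphrase>A = πr<superscript>2</superscript></mathphrase>
  </inlineequation>.</para>

  <!-- Richer mathematics: MathML inside a formal (titled) equation. -->
  <equation xml:id="eq-energy">
    <title>Mass-Energy Equivalence</title>
    <mml:math>
      <mml:mi>E</mml:mi>
      <mml:mo>=</mml:mo>
      <mml:mi>m</mml:mi>
      <mml:msup><mml:mi>c</mml:mi><mml:mn>2</mml:mn></mml:msup>
    </mml:math>
  </equation>
</sect1>
```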
3.7.5. User interfaces
guilabel: The text of a label in a GUI
3.7.6. Programming languages and constructs
3.7.7. Operating systems
3.7.8. General purpose
4. Roots: Starting Your DocBook Document
There’s one final detail of the physical and logical structures of DocBook that we’ve left out: where can your document begin? In other words, what are the valid “document elements” of DocBook documents? Naturally, you can start at book, but can you also start at chapter? What about lower-level elements?
If you come to DocBook from the DTD days, this question may seem odd. A DTD doesn’t provide any facility to impose constraints on where a document can begin. If the element occurs in the DTD, you can start with it.
RELAX NG does give us the ability to impose such constraints. In fact, it requires that we do. Of course, we could make the constraint vacuous by listing every possible element as a potential document element.
But, on reflection, that’s not necessarily the best choice. It’s valuable to have metadata associated with documents, so only elements that can have an info element can be root elements, but not every element with an info element is currently included. In DocBook V5.0 the following elements are available:
With the next point release of DocBook, V5.1, the technical committee may take the position that any element that can have an info wrapper can be a document element. This would dramatically expand the list of valid root elements.
5. Making a DocBook Book
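A typical book, in English at least, consists of some meta-information in an info (title, author, copyright, etc.), one or more prefaces, several chapters, and perhaps a few appendixes. A book may also contain bibliographys, glossarys, indexes, and a colophon.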
Example 2.1, “A typical book” shows the structure of a typical book. Additional content is required where the ellipses occur.
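A minimal sketch of such a structure, with placeholder titles, names, and dates standing in for real content:

```xml
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <info>
    <title>My First Book</title>
    <author>
      <personname><firstname>Jane</firstname><surname>Doe</surname></personname>
    </author>
    <copyright><year>2010</year><holder>Jane Doe</holder></copyright>
  </info>
  <preface>
    <title>Foreword</title>
    <para>Placeholder preface text.</para>
  </preface>
  <chapter>
    <title>My First Chapter</title>
    <para>Placeholder chapter text.</para>
  </chapter>
  <appendix>
    <title>My First Appendix</title>
    <para>Placeholder appendix text.</para>
  </appendix>
  <!-- An empty index asks the processing application to generate one. -->
  <index/>
</book>
```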
6. Making a Chapter
prefaces, chapters, and appendixes all have a similar structure. They consist of a title, possibly some additional meta-information, and any number of block-level elements followed by any number of top-level sections. Each section may in turn contain any number of block-level elements followed by any number from the next section level, as shown in Example 2.2, “A typical chapter”.
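The pattern of blocks followed by sections, repeated at each level, can be sketched as follows. All titles, ids, and text are placeholders.

```xml
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="ch-intro">
  <title>Introduction</title>
  <!-- Block-level elements come first... -->
  <para>Opening paragraph of the chapter.</para>
  <!-- ...followed by top-level sections. -->
  <sect1 xml:id="ch-intro-scope">
    <title>Scope</title>
    <para>Blocks inside the first section.</para>
    <!-- The same pattern repeats one level down. -->
    <sect2>
      <title>Out of Scope</title>
      <para>Blocks inside the nested section.</para>
    </sect2>
  </sect1>
  <sect1>
    <title>Conventions</title>
    <para>Blocks inside the second section.</para>
  </sect1>
</chapter>
```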
7. Making an Article
For documents smaller than a book, such as journal articles, white papers, or technical notes, article is frequently the most logical starting point. The body of an article is essentially the same as the body of a chapter or any other component-level element, as shown in Example 2.3, “A typical article”.
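A sketch of a short article follows; the title, author, and date are invented for illustration.

```xml
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <info>
    <title>A Short Technical Note</title>
    <author>
      <personname><firstname>Jane</firstname><surname>Doe</surname></personname>
    </author>
    <pubdate>2010-01-01</pubdate>
  </info>
  <!-- The body looks just like the body of a chapter. -->
  <para>Opening paragraph of the article.</para>
  <sect1>
    <title>Background</title>
    <para>Placeholder section text.</para>
  </sect1>
</article>
```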
8. Making a Reference Page
The reference page or manual page in DocBook was inspired by, and in fact designed to reproduce, the common UNIX “manpage” concept. (We use the word “page” loosely here to mean a document of variable length containing reference material on a specific topic.) DocBook is rich in markup tailored for such documents, which often vary greatly in content, however well structured they may be. To reflect both the structure and the variability of such texts, DocBook specifies that reference pages have a strict sequence of parts, even though several of them are actually optional.
The info element contains meta-information about the reference page (which should not be confused with refmeta, which it precedes). It marks up information about the author of the document, or the product to which it pertains, or the document’s revision history, or other such information.
refmeta contains a title for the reference page (which may be inferred if the refmeta element is not present) and an indication of the volume number in which this reference page occurs. The manvolnum is a very UNIX-centric concept. In traditional UNIX documentation, the subject of a reference page is typically identified by name and volume number; this allows you to distinguish between the uname command, “uname(1)” in volume 1 of the documentation, and the uname function, “uname(3)” in volume 3.
Additional information of this sort, such as conformance or vendor information specific to the particular environment you are working in, may be stored in refmiscinfo.
The first obligatory element is refnamediv, which is a wrapper for information about whatever you’re documenting, rather than the document itself. It can begin with a refdescriptor if several items are being documented as a group and the group has a name. The refnamediv must contain at least one refname, that is, the name of whatever you’re documenting, and a single short statement that sums up the use or function of the item(s) at a glance: its refpurpose. Also available is the refclass, intended to detail the operating system configurations that the software element in question supports.
refsynopsisdiv is intended to provide a quick synopsis of the topic covered by the reference page. For commands, this is generally a syntax summary of the command, and for functions, the function prototype, but other options are possible. A title is allowed, but not required, presumably because the application that processes reference pages will generate the appropriate title if it is not given. In traditional UNIX documentation, its title is always “Synopsis.”
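The parts described so far fit together as in the following sketch. The command frob and everything said about it are hypothetical.

```xml
<refentry xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="frob">
  <!-- Meta-information about the page itself. -->
  <info>
    <author>
      <personname><firstname>Jane</firstname><surname>Doe</surname></personname>
    </author>
  </info>
  <!-- Title and volume number: frob(1). -->
  <refmeta>
    <refentrytitle>frob</refentrytitle>
    <manvolnum>1</manvolnum>
    <refmiscinfo>Example Tools</refmiscinfo>
  </refmeta>
  <!-- Obligatory: what is being documented, and its purpose. -->
  <refnamediv>
    <refname>frob</refname>
    <refpurpose>frobnicate one or more files</refpurpose>
  </refnamediv>
  <!-- Quick synopsis; the title is left for the processor to generate. -->
  <refsynopsisdiv>
    <cmdsynopsis>
      <command>frob</command>
      <arg choice="opt">-v</arg>
      <arg choice="plain" rep="repeat"><replaceable>file</replaceable></arg>
    </cmdsynopsis>
  </refsynopsisdiv>
  <refsect1>
    <title>Description</title>
    <para>Placeholder description of the command.</para>
  </refsect1>
</refentry>
```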
9. Making Front and Back Matter
DocBook contains markup for the usual variety of front and back matter necessary for books and articles: indexes, glossaries, bibliographies, and tables of contents. In many cases, these components are generated automatically, at least in part, from your document by an external processor, but you can create them by hand, and in either case, store them in DocBook.
Some forms of back matter, such as indexes and glossaries, usually require additional markup in the document to make generation by an application possible. Bibliographies are usually composed by hand like the rest of your text, unless you are automatically selecting bibliographic entries out of some larger database. Our principal concern here is to acquaint you with the kind of markup you need to include in your documents if you want to construct these components.
Front matter, like the table of contents, is almost always generated
automatically from the text of a document by the processing application.
If you need information about how to mark up a table of contents in
DocBook, please consult the reference page for toc.
9.1. Making an Index
In some highly structured documents such as reference manuals, you can automate the whole process of generating an index successfully without altering or adding to the original source. You can design a processing application to select the information and compile it into an adequate index. But this is rare.
In most cases—and even in the case of some reference manuals—a useful index still requires human intervention to mark occurrences of words or concepts that will appear in the text of the index.
9.1.1. Marking index terms
<para>
  <indexterm><primary>Big Cats</primary>
  <secondary>Tigers</secondary></indexterm>
  The tiger is a very large cat indeed.
</para>
This index term has two levels, primary and secondary. They correspond to an increasing amount of indented text in the resultant index. DocBook allows for three levels of index terms, with the third labeled tertiary.
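A three-level index term might look like the following sketch. The tertiary element is the third level of indexterm in DocBook; the “Siberian” entry and the sentence around it are invented here purely for illustration.

```xml
<para xmlns="http://docbook.org/ns/docbook">
  <indexterm>
    <primary>Big Cats</primary>
    <secondary>Tigers</secondary>
    <!-- third level; the term itself is a hypothetical example -->
    <tertiary>Siberian</tertiary>
  </indexterm>
  The Siberian tiger is a very large cat indeed.
</para>
```

In a formatted index, each level would typically be indented one step further than the level above it.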
There are two ways that you can index a range of text. The first is to put index marks at both the beginning and end of the discussion. The mark at the beginning asserts that it is the start of a range, and the mark at the end refers back to the beginning. In this way, the processing application can determine what range of text is indexed. Here’s the previous tiger example recast as starting and ending index terms:
<para>
  <indexterm xml:id="tiger-desc" class="startofrange">
  <primary>Big Cats</primary>
  <secondary>Tigers</secondary></indexterm>
  The tiger is a very large cat indeed…
</para>
⋮
<para>
  So much for tigers<indexterm startref="tiger-desc" class="endofrange"/>.
  Let's talk about leopards.
</para>
Another way to mark up a range of text is to specify that the entire content of an element, such as a chapter or section, is the complete range. In this case, all you need is for the index term to point to the xml:id of the element that contains the content in question. The zone attribute of indexterm provides this functionality.
One of the interesting features of this method is that the actual index marks do not have to occur anywhere near the text being indexed. It is possible to collect all of them together, for example, in one file, but it is not invalid to have the index marker occur near the element it indexes.
Suppose the discussion of tigers in your document comprises a whole text object (such as a chapter) with an xml:id value of tiger-desc. You can put the following tag anywhere in your document to index that range of text:
<indexterm zone="tiger-desc">
  <primary>Big Cats</primary>
  <secondary>Tigers</secondary></indexterm>
9.1.2. Printing an index
After you have added the appropriate markup to your document, an external application can use this information to build an index. The resultant index must have information about the page numbers on which the concepts appear. It’s usually the document formatter that builds the index. In this case, it may never be instantiated in DocBook.
However, there are applications that can produce an index marked up in DocBook. The following example includes some one- and two-level indexentry elements (which correspond to the primary and secondary levels in the indexterms themselves) that begin with the letter D:
<index><title>Index</title>
  <indexdiv><title>D</title>
    <indexentry>
      <primaryie>database (bibliographic), 253, 255</primaryie>
      <secondaryie>structure, 255</secondaryie>
      <secondaryie>tools, 259</secondaryie>
    </indexentry>
    <indexentry>
      <primaryie>dates (language specific), 179</primaryie>
    </indexentry>
    <indexentry>
      <primaryie>DC fonts, <emphasis>172</emphasis>, 177</primaryie>
      <secondaryie>Math fonts, 177</secondaryie>
    </indexentry>
  </indexdiv>
</index>
The structure of indexentry is parallel to the structure of indexterm.
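To make the correspondence concrete, here is a sketch that pairs an index term in the text with the kind of entry a processor might generate from it. The page numbers and the “Siberian” term are invented for illustration; tertiaryie is the indexentry counterpart of tertiary.

```xml
<!-- In the text: a three-level index term (hypothetical content) -->
<indexterm xmlns="http://docbook.org/ns/docbook">
  <primary>Big Cats</primary>
  <secondary>Tigers</secondary>
  <tertiary>Siberian</tertiary>
</indexterm>

<!-- In the generated index: the parallel entry (page numbers invented) -->
<indexentry xmlns="http://docbook.org/ns/docbook">
  <primaryie>Big Cats</primaryie>
  <secondaryie>Tigers, 12</secondaryie>
  <tertiaryie>Siberian, 15</tertiaryie>
</indexentry>
```

The two fragments live in different places: the indexterm sits in the body of the document, while the indexentry appears inside an indexdiv within index.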
9.2. Making a Glossary
A glossary, like a bibliography, is often constructed by hand. However, some applications are capable of building a skeletal glossary from glossary term markup in the document. If all of your terms are defined in some glossary database, it may even be possible to construct the complete glossary automatically.
To enable automatic glossary generation, or simply automatic linking from glossary terms in the text to glossary entries, you must add markup to your documents. In the text, you mark up a term for compilation later with the inline glossterm tag. This tag can have a linkend attribute whose value is the ID of the actual entry in the glossary.
For instance, if you have this markup in your document:
<glossterm linkend="xml">Extensible Markup Language</glossterm> is a new standard…
your glossary might look like this:
<glossary><title>Example Glossary</title>
  ⋮
  <glossdiv><title>E</title>
    <glossentry xml:id="xml"><glossterm>Extensible Markup Language</glossterm>
      <acronym>XML</acronym>
      <glossdef>
        <para>Some reasonable definition here.</para>
        <glossseealso otherterm="sgml"/>
      </glossdef>
    </glossentry>
  </glossdiv>
  ⋮
</glossary>
Note that the glossterm tag reappears in the glossary to mark up the term and distinguish it from its definition within the glossentry. The linkend of the glossterm in the text refers to the xml:id of the glossentry in the glossary itself. You can use the link between source and glossary to create a link in electronic formats, as we have done with the HTML and PDF forms of the glossary in this book.
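If the term as it appears in the text is not in the form used in the glossary (a plural, for example), the baseform attribute of glossterm can give the glossary form:
<para>
  Using <glossterm baseform="DTD">DTDs</glossterm> can be hazardous to your sanity.
</para>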
9.3. Making a Bibliography
There are two ways to set up a bibliography in DocBook: you can have the data raw or cooked. When you use “raw” data, you wrap your entry in the biblioentry element and mark up each item individually. The processor determines the display order and supplies punctuation. When you use “cooked” data, you wrap your entry in the bibliomixed element and provide the data in the order in which you want it displayed, and you include the punctuation yourself.
Here’s an example of a raw bibliographical item, wrapped in the biblioentry element:
<biblioentry xreflabel="Kites75">
  <authorgroup>
    <author><firstname>Andrea</firstname><surname>Bahadur</surname></author>
    <author><firstname>Mark</firstname><surname>Shwarek</surname></author>
  </authorgroup>
  <copyright><year>1974</year><year>1975</year>
    <holder>Product Development International Holding N. V.</holder>
  </copyright>
  <isbn>0-88459-021-6</isbn>
  <publisher>
    <publishername>Plenary Publications International, Inc.</publishername>
  </publisher>
  <title>Kites</title>
  <subtitle>Ancient Craft to Modern Sport</subtitle>
  <pagenums>988-999</pagenums>
  <seriesinfo>
    <title>The Family Creative Workshop</title>
    <seriesvolnums>1-22</seriesvolnums>
    <editor>
      <firstname>Allen</firstname>
      <othername role="middle">Davenport</othername>
      <surname>Bragdon</surname>
      <contrib>Editor in Chief</contrib>
    </editor>
  </seriesinfo>
</biblioentry>
The “raw” data in a
biblioentry is comprehensive to a fault—there
are enough fields to suit a host of different bibliographical styles,
and that is the point. An abundance of data requires processing
applications to select, punctuate, order, and format the bibliographical data, and it is unlikely
that all the information provided will actually be output.
All the “cooked” data in a
bibliomixed entry in a bibliography, on the
other hand, is intended to be presented to the reader in the form and
sequence in which it is provided. It even includes punctuation between
the fields of data:
<bibliomixed>
  <bibliomset relation="article">
    <surname>Walsh</surname>, <firstname>Norman</firstname>.
    <title role="article">Introduction to Cascading Style Sheets</title>.
  </bibliomset>
  <bibliomset relation="journal">
    <title>The World Wide Web Journal</title>
    <volumenum>2</volumenum><issuenum>1</issuenum>.
    <publishername>O'Reilly &amp; Associates, Inc.</publishername> and
    <corpname>The World Wide Web Consortium</corpname>.
    <pubdate>Winter, 1996</pubdate></bibliomset>.
</bibliomixed>
Clearly, these two ways of marking up bibliographical entries are suited to different circumstances. You should use one or the other for your bibliography, not both. Strictly speaking, mingling the raw and the cooked may be “kosher” as far as the schema is concerned, but it will almost certainly cause problems for most processing applications.
Some formatters are able to establish the link by examining the content of the terms and the glossary. In that case, the author does not need to make explicit links.