3 Refining the Encoding

This section provides more details about specific tasks in the encoding process after the initial preparation of a valid and well-formed TEI XML document (which is described in Getting Started).

The tasks are loosely organized in the order in which an encoder would likely want to perform them.

3.1 Front- and Back-matter

The minimal template described in Getting Started initially wraps the entire text of the document in a single ab element in the body element of the text element. However, TEI also provides elements for marking front-matter (such as "abstracts, title page, prefaces, dedications, etc.") and back-matter (such as indices or appendices).

When front-matter is present, it should be placed in a front element immediately before the body element inside the text element. Likewise, when back-matter is present, it should be placed in a back element immediately after the body element inside the text element.

Like the body element, front and back MUST NOT contain text directly: they should contain elements like ab or div.

If you are working from the recommended minimal template, you will need to split the ab element into multiple ab elements accordingly. For example, from a document with the text element structured like this:

<text>
  <body>
    <ab>
      All of the text here
    </ab>
  </body>
</text>

You might produce a new text element like this:

<text>
  <front>
    <ab>
      Front-matter goes here
    </ab>
  </front>
  <body>
    <ab>
      The main body goes here
    </ab>
  </body>
  <back>
    <ab>
      Back-matter goes here
    </ab>
  </back>
</text>

3.2 Chapters & Other Sections

Once we have tagged the front- and back-matter, the next step is marking large-scale structural divisions (like chapters or sections).

These divisions are marked with div tags, which have a type attribute that has one of a fixed list of values to denote the type of division, such as "chapter" or "section". (See Structural Elements under Formal Specification for the complete list of allowable type values.) If the division is numbered in the text, the number should be given in an n attribute.

Note that div elements MUST NOT directly contain text: the contents of the div must be wrapped in p, ab, head or similar tags.

A few special types of div are a particular priority for encoding, as they can improve our search feature:

"contents" (for the table of contents)
"intro" (for an introduction)
"index"
"bibl" (for a bibliography)
"dedication"
"ack" (for acknowledgements)

As an example, if by following the steps above you have produced a body element like this:

<body>
  <ab>
    Content from two chapters
  </ab>
</body>

You might encode the chapters like this:

<body>
  <div type="chapter" n="1">
    <ab>
      Content from chapter one
    </ab>
  </div>
  <div type="chapter" n="2">
    <ab>
      Content from chapter two
    </ab>
  </div>
</body>
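
The special division types listed above are encoded in the same way. The following is only a sketch: it assumes a work whose dedication and table of contents belong to the front-matter and whose bibliography and index belong to the back-matter, which will vary from document to document.

```xml
<text>
  <front>
    <!-- Assumed placement: dedication and contents in the front-matter -->
    <div type="dedication">
      <ab>
        Dedication goes here
      </ab>
    </div>
    <div type="contents">
      <ab>
        Table of contents goes here
      </ab>
    </div>
  </front>
  <body>
    <div type="chapter" n="1">
      <ab>
        Content from chapter one
      </ab>
    </div>
  </body>
  <back>
    <!-- Assumed placement: bibliography and index in the back-matter -->
    <div type="bibl">
      <ab>
        Bibliography goes here
      </ab>
    </div>
    <div type="index">
      <ab>
        Index goes here
      </ab>
    </div>
  </back>
</text>
```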

3.2.1 Sections Not by Ricœur

Some works contain sections that are not written by Paul Ricœur. Marking these sections as such is a particular priority, as they should be excluded from search results.

Some preparation is needed to encode authorship in a machine-readable manner. For each author or editor who needs to be identified, you must add an xml:id attribute to the corresponding author or editor element in the titleStmt. The value for this attribute must be unique across the entire document: often, the last name is a good choice. Note that the value "ricoeur" is reserved across all documents for the author element representing Paul Ricœur.

Once these identifiers have been assigned, you can mark a div element as being by a particular author/editor by adding a resp attribute. The value for the resp attribute must point to the corresponding identifier by prefixing it with the character #. For example, if you created an author with xml:id="smith", you would write resp="#smith".

Any div elements that do not have a resp attribute are assumed to be by Paul Ricœur. In addition, there is no need to add a resp attribute for div elements with a type of "contents" or "index".
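
As a sketch of how these pieces fit together, suppose a work by Paul Ricœur includes an introduction written by an editor. The editor's name and the identifier "smith" are hypothetical, chosen only for illustration, and the title is a placeholder. The fragments show only the titleStmt and the relevant div elements, not a complete document.

```xml
<!-- Assign identifiers in the titleStmt -->
<titleStmt>
  <title>Title of the work</title>
  <!-- "ricoeur" is the reserved identifier for Paul Ricœur -->
  <author xml:id="ricoeur">Paul Ricœur</author>
  <!-- "smith" is a hypothetical identifier for a hypothetical editor -->
  <editor xml:id="smith">Jane Smith</editor>
</titleStmt>

<!-- Point to the editor from the div using "#" plus the identifier -->
<div type="intro" resp="#smith">
  <ab>
    Introduction written by the editor
  </ab>
</div>
<!-- No resp attribute: assumed to be by Paul Ricœur -->
<div type="chapter" n="1">
  <ab>
    Content from chapter one
  </ab>
</div>
```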

A few documents, such as "Tragic Wisdom and Beyond", take the form of an extended dialogue between Paul Ricœur and some other party. The speakers in these documents are encoded through a special process. Rather than adding a resp attribute to the div elements, enclose each passage where a distinct individual is speaking in an sp ("speech") element. This element must have a who attribute that points to the speaker's identifier in the same way as a resp attribute. (The who attribute is required even when the speaker is Paul Ricœur.) Note that, like the div element, the sp element must not directly contain text. Because the information is contained in the who attribute, the notation of the speaker in the text itself can be removed.
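
A minimal sketch of a dialogue encoded this way follows. It assumes that the titleStmt contains an author element with xml:id="ricoeur" and a second author or editor with the hypothetical identifier "smith". The division type and the wording of the speeches are placeholders.

```xml
<div type="chapter" n="1">
  <!-- "smith" is a hypothetical identifier for the other party -->
  <sp who="#smith">
    <p>A question posed by the other party.</p>
  </sp>
  <!-- The who attribute is required even for Paul Ricœur -->
  <sp who="#ricoeur">
    <p>Paul Ricœur's reply, without the printed speaker label.</p>
  </sp>
</div>
```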

3.3 Footnotes & Endnotes

Footnotes (those references, notes, and citations appearing at the bottom of the page) and endnotes (those which appear at the end of a book or article) are encoded using the note element. Each note element must have a place attribute with a value of either "foot" or "end". It must also have an n attribute giving the number or symbol used to reference the note in the original.

Notes must be encoded where they are referenced. In other words, at the location of the note reference in the text, embed the note element itself in place.

Translation notes should have a type attribute with a value of "transl".

Any note which was added by someone other than Paul Ricœur (such as an editor or translator) must have a resp attribute with an appropriate value as described under Sections Not by Ricœur above.

Examples:

<note place="foot" n="1">We explain below why we use the uncommon term older logical writings.</note>
<note place="end" n="*">See Bachelard’s Poetics of Space, Beacon Press, Boston (1969), p. xxi; "retentissement" in French.</note>

Adapted from https://www.cdlib.org/groups/stwg/MS_BPG.html#fnote (but note that we encode endnotes differently)
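
To illustrate in-place encoding, the following sketch embeds a footnote and a translation note inside a paragraph at the points where they are referenced. The paragraph text, the note text, and the identifier "smith" (standing for an editor or translator given an xml:id as described under Sections Not by Ricœur) are all hypothetical.

```xml
<p>The first sentence ends with a note reference.<note place="foot" n="1">The
text of the author's footnote.</note> The second sentence uses a term the
translator chose to explain.<note place="foot" n="*" type="transl"
resp="#smith">The text of the translator's note.</note></p>
```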

3.4 Paragraphs, Headings, and Lists

Except when they can be computed automatically, encoding paragraphs is probably a fairly low priority for our project at this stage. Ultimately, however, each paragraph should be wrapped in a p element. Headings (such as chapter titles) should ideally be encoded with head elements, not p elements.

Lists are encoded specially. The list as a whole is wrapped in a list element. If it is a numbered list, it should have a rend attribute with a value of "numbered". Individual items on the list are wrapped with item elements.

Example:

<div type="chapter" n="1">
  <pb n="1"/>
  <head>The Question of Selfhood</head>
  <p>This is a <pb n="2"/> paragraph.</p>
  …
</div>
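
A list, as described above, might be encoded as in the following sketch. The item text is a placeholder, and placing the list directly inside a div is an assumption. The rend attribute is included because the list in this example is numbered.

```xml
<div type="section" n="2">
  <list rend="numbered">
    <item>The first item</item>
    <item>The second item</item>
  </list>
</div>
```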
