0.5.91

6 Document Segments

Segments are an approach to dividing the text of a TEI document into a linear stream of logical groupings that share certain metadata, such as location and authorship. Dividing a TEI document into such groupings is a common requirement for many applications: search (as in term-search) has been our initial motivating use-case, but the same process is needed to, for example, plot trends in the use of a particular term over the course of a book.

This library provides tei-document-segments, which implements the common functionality needed to divide a TEI document into segments. It also defines an extensible interface for working with segment metadata.

6.1 Segment Basics

procedure

(segment? v) → any/c

  v : any/c
A predicate recognizing segments.

A segment value represents a contiguous logical subdivision of a TEI document. While the XML structure of TEI documents involves nested and overlapping hierarchies, segments present a linear view of a document.

To a first approximation, a segment might correspond to a paragraph. All of the textual content that falls within a segment shares the same metadata: for example, a segment might come from chapter one, pages 2–3, and have been written by Paul Ricœur. In fact, segments can be more granular than paragraphs: a paragraph with a footnote in the middle might be divided into several segments. On the other hand, in a TEI document for which we have not yet completed paragraph inference (see tei-document-guess-paragraphs), segments might be based on page breaks and could be longer than a paragraph.

As the above suggests, a segment is associated with a specific TEI document: not just with the identity of the instance, as might be determined by instance-title/symbol, but with the state of the TEI document itself as reflected by tei-document-checksum.

Segments are a general, extensible way of managing this contextual information: concrete applications are likely to implement specialized representations, and these can support the segment interface using prop:segment.

This library defines two built-in kinds of segments, base segments and segment metadata values, along with a core function for dividing a TEI document into segments, tei-document-segments. A different part of this library uses the segment interface to support location and context information for search result values, even though a search result is a more fine-grained level of organization within a segment. Other application programs can similarly build on the common segment interface.

procedure

(tei-document-segments doc) → (listof base-segment?)

  doc : tei-document?

procedure

(base-segment? v) → any/c

  v : any/c

match expander

(base-segment meta-pat body-pat maybe-info-pat)

 
maybe-info-pat = 
  | plain-instance-info-pat

procedure

(base-segment-meta seg) → segment-meta?

  seg : base-segment?

procedure

(base-segment-body seg)
 → (and/c string-immutable/c
          #px"[^\\s]")
  seg : base-segment?

procedure

(base-segment-instance-info seg) → plain-instance-info?

  seg : base-segment?
The function tei-document-segments splits a TEI document doc into a list of segments. Specifically, it produces base segment values, which are recognized by the base-segment? predicate and the base-segment pattern for match. The result of tei-document-segments is weakly cached to reduce the cost of repeated calls.

A base segment can be used with the instance info interface to access bibliographic information about the instance represented by the TEI document from which it was created.

In addition to metadata, a base segment also contains the full text of the segment. The segment interface does not require this: most other kinds of segment values will likely not carry the text.
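As a minimal sketch of how these functions fit together, the following collects the text of every segment attributed to Ricœur. It assumes that the bindings documented on this page are provided by the module ricoeur/tei and that doc is a value satisfying tei-document?; the helper name ricoeur-segment-bodies is hypothetical.

```racket
#lang racket/base
;; Assumption: the bindings documented on this page come from ricoeur/tei.
(require ricoeur/tei)

;; ricoeur-segment-bodies is a hypothetical helper, not part of the library.
;; doc is assumed to satisfy tei-document?.
(define (ricoeur-segment-bodies doc)
  (for/list ([seg (in-list (tei-document-segments doc))]
             ;; keep only the segments written by Ricœur himself
             #:when (segment-by-ricoeur? seg))
    ;; base segments carry their full text
    (base-segment-body seg)))
```

Because the result of tei-document-segments is weakly cached, calling a helper like this repeatedly on the same document should not repeat the full cost of segmentation.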

6.2 Working with Segments

procedure

(segment-get-meta seg) → segment-meta?

  seg : segment?

procedure

(segment-meta? v) → any/c

  v : any/c

procedure

(segment-meta=? a b) → boolean?

  a : segment?
  b : segment?
Every segment has an associated segment metadata value that encapsulates its location, authorship, and other information. A segment metadata value can be recognized by the predicate segment-meta? and is itself a segment (and thus satisfies segment?). The segment metadata value contains all of the information needed by the segment interface. Any segment seg can be converted to a plain segment metadata value using segment-get-meta.

In addition to being the most minimal representation of a segment, segment metadata values can be serialized with racket/serialize.

The function segment-meta=? tests segments for equality based on their segment metadata values: it will consider segments of different specific types “the same” if they have equivalent segment metadata values. Any segments that are segment-meta=? can be used interchangeably for the purposes of the functions documented in this section.
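The sketch below illustrates the two points above: reducing a segment to its metadata value for serialization, and comparing segments by metadata. It assumes the bindings are provided by ricoeur/tei; the helper names segment->datum and round-trips? are hypothetical.

```racket
#lang racket/base
;; Assumption: the bindings documented on this page come from ricoeur/tei.
(require racket/serialize
         ricoeur/tei)

;; Hypothetical helper: reduce any segment to its metadata value,
;; then serialize that value with racket/serialize.
(define (segment->datum seg)
  (serialize (segment-get-meta seg)))

;; Hypothetical check: a deserialized metadata value is itself a segment,
;; so it can be compared with the original using segment-meta=?.
(define (round-trips? seg)
  (segment-meta=? seg (deserialize (segment->datum seg))))
```

One would expect round-trips? to hold for any segment, since the comparison looks only at the metadata, but the sketch does not rely on anything beyond what is documented here.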

match expander

(segment kw-pat ...)

 
kw-pat = #:title/symbol title/symbol-pat
  | #:checksum checksum-pat
  | #:counter counter-pat
  | #:resp-string resp-string-pat
  | #:page-spec page-spec-pat
  | #:location-stack location-stack-pat
Matches any segment value, then matches any sub-patterns against the values that would be returned by the corresponding functions documented below.

Each keyword may appear at most once.
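For illustration, the segment pattern can pull out just the fields of interest. This is a sketch assuming the bindings come from ricoeur/tei; describe-segment is a hypothetical name.

```racket
#lang racket/base
;; Assumption: the bindings documented on this page come from ricoeur/tei.
(require racket/match
         ricoeur/tei)

;; Hypothetical helper: build a one-line description of any segment.
(define (describe-segment seg)
  (match seg
    ;; Each keyword may appear at most once; unused keywords are omitted.
    [(segment #:counter n #:resp-string who)
     (format "segment ~a, by ~a" n who)]))
```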

procedure

(segment-resp-string seg) → string-immutable/c

  seg : segment?

procedure

(segment-by-ricoeur? seg) → boolean?

  seg : segment?
Functions to access the “responsible party” for the segment seg.

Internally, segment-resp-string obtains a string suitable for display to end-users naming the “responsible party” for the segment (such as Ricœur, an editor, or a translator) using lower-level functions such as tei-element-resp and instance-get-resp-string.

The predicate segment-by-ricoeur? recognizes only segments by Ricœur himself.

procedure

(segment-page-spec seg) → page-spec/c

  seg : segment?

value

page-spec/c : flat-contract?

 = 
(or/c (maybe/c string-immutable/c)
      (list/c (maybe/c string-immutable/c)
              (maybe/c string-immutable/c)))
Returns the location of the segment seg in terms of pages.

If the returned value is a two-element list, the segment spans more than one page: the first element of such a list represents the page on which the segment starts, and the second element the page on which it ends. Otherwise, if the returned value is not a list, the segment is fully contained in a single page, and the value represents that page.

In either case, a value of (nothing) signifies that the pb element it represents was not numbered (i.e. it had no n attribute). A just value contains the page number, taken from the value of the corresponding pb’s n attribute.
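The cases described above can be handled as in the following sketch, which turns a page-spec/c value into a display string. It assumes that maybe/c, just, and nothing come from the data/maybe module of the functional library; the helper names and the output wording are hypothetical.

```racket
#lang racket/base
;; Assumption: the maybe values used by page-spec/c are those of data/maybe.
(require data/maybe)

;; Hypothetical helper: a just value holds the page number as a string;
;; nothing means the pb element had no n attribute.
(define (page->string pg)
  (from-just "[unnumbered]" pg))

;; Hypothetical helper: format a value satisfying page-spec/c.
(define (page-spec->string spec)
  (if (list? spec)
      ;; two-element list: the segment spans more than one page
      (format "pp. ~a to ~a"
              (page->string (car spec))
              (page->string (cadr spec)))
      ;; otherwise the segment is contained in a single page
      (format "p. ~a" (page->string spec))))
```

A caller would apply page-spec->string to the result of segment-page-spec.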

procedure

(segment-location-stack seg) → location-stack?

  seg : segment?

procedure

(location-stack->strings location-stack)

 → (listof string-immutable/c)
  location-stack : location-stack?

procedure

(location-stack? v) → any/c

  v : any/c
Returns the location of the segment seg in terms of the structure of the source TEI document, with reference to chapters, sections, footnotes, and the like. The location is represented by an opaque location stack value, which is recognized by the predicate location-stack?.

A location stack can be converted to a list of strings suitable for display to end-users via location-stack->strings. The strings in the resulting list describe the location from the broadest level of organization to the narrowest (e.g. '("Chapter 1" "Footnote 3"), though the precise textual content of the returned strings is unspecified).
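A small sketch of producing a single display string from a segment's location, assuming the bindings come from ricoeur/tei; segment-location-string and the ", " separator are hypothetical choices.

```racket
#lang racket/base
;; Assumption: the bindings documented on this page come from ricoeur/tei.
(require racket/string
         ricoeur/tei)

;; Hypothetical helper: join the broadest-to-narrowest location strings.
;; The content of the individual strings is unspecified by the library.
(define (segment-location-string seg)
  (string-join (location-stack->strings (segment-location-stack seg))
               ", "))
```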

procedure

(segment-title/symbol seg) → symbol?

  seg : segment?

procedure

(segment-document-checksum seg) → symbol?

  seg : segment?
Functions that return the same symbols that would have been returned by instance-title/symbol or tei-document-checksum, respectively, applied to the source TEI document of the segment seg.

segment-order is an order (in the sense of data/order) on segments. It is an error to apply segment-order’s comparison functions to segments that do not come from the same TEI document, where “the same” encompasses both instance-title/symbol and tei-document-checksum.

Sorting segments according to segment-order’s less-than relation places them in the order in which they occurred in the source TEI document.
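As a sketch, sorting with the less-than relation might look like the following. It assumes the bindings come from ricoeur/tei and uses order-<? from data/order to extract the comparison function; sort-segments is a hypothetical name.

```racket
#lang racket/base
;; Assumption: the bindings documented on this page come from ricoeur/tei.
(require data/order
         ricoeur/tei)

;; Hypothetical helper: put segments back into document order.
;; All of segs must come from the same TEI document (same title symbol
;; and same checksum); otherwise the comparison is an error.
(define (sort-segments segs)
  (sort segs (order-<? segment-order)))
```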

procedure

(segment-counter seg) → natural-number/c

  seg : segment?
Returns an integer representing the position of the segment seg relative to other segments from the same source TEI document.

6.3 Implementing New Types of Segments

prop:segment is a structure type property for implementing new types of segments.

The value for the property must be a function that accepts an instance of the new structure type and returns a segment metadata value. An instance of the new structure type will satisfy segment? and can be used with any of the functions above equivalently to using the returned segment metadata value directly.

The function given as a value for prop:segment should always return the very same segment metadata value when called with the same argument. This invariant is not currently checked, but may be in the future.
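One way to satisfy this invariant is to store the segment metadata value in a field and have the property function return that field. The following is a sketch under the assumption that the bindings come from ricoeur/tei; the structure type annotated-segment and the function annotate are hypothetical.

```racket
#lang racket/base
;; Assumption: the bindings documented on this page come from ricoeur/tei.
(require ricoeur/tei)

;; Hypothetical structure type: a segment paired with an application note.
;; The property function returns the stored field, so repeated calls on
;; the same instance return the very same segment metadata value.
(struct annotated-segment (meta note)
  #:property prop:segment
  (λ (this) (annotated-segment-meta this)))

;; Hypothetical constructor wrapper: derive the metadata from any segment.
(define (annotate seg note)
  (annotated-segment (segment-get-meta seg) note))
```

Instances of annotated-segment would then satisfy segment? and could be passed to functions such as segment-page-spec or segment-resp-string.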