15 Data Sourcing
(require xiden/source) | package: xiden
A source is a value that implements gen:source. When used with fetch, a source produces an input port and an estimate of how many bytes that port can produce. Xiden uses sources to read data with safety limits. To tap a source means gaining a reference to the input port and estimate. To exhaust a source means gaining a reference to a contextual error value. We can also say a source is tapped or exhausted.
Note that these terms are linguistic conveniences. There is no value representing a tapped or exhausted state. The only difference is where control ends up in the program, and what references become available as a result of using fetch on a source.
value
tap/c : chaperone-contract? = (-> input-port? (or/c +inf.0 exact-positive-integer?) any/c)
The procedure is given an input port, and an estimate of the maximum number of bytes the port can produce. This estimate could be +inf.0 to allow unlimited reading, provided the user allows this in their configuration.
value
exhaust/c : chaperone-contract? = (-> any/c any/c)
The sole argument to the procedure depends on the source type.
procedure
(fetch source tap exhaust) → any/c
source : source? tap : tap/c exhaust : exhaust/c
fetch attempts to tap source. If successful, fetch calls tap in tail position, passing the input port and the estimated maximum number of bytes that port is expected to produce.
Otherwise, fetch calls exhaust in tail position using a source-dependent argument.
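The control flow described above can be sketched without Xiden itself. The following is a minimal model that uses only racket/base. The names toy-fetch, toy-bytes-source, toy-broken-source, and read-all are hypothetical stand-ins and are not part of xiden/source. The sketch only illustrates that exactly one of tap or exhaust receives control.

```racket
#lang racket/base

;; Hypothetical stand-in for a source: a procedure that accepts a tap
;; procedure and an exhaust procedure, and calls exactly one of them
;; in tail position.
(define (toy-bytes-source data)
  (λ (tap exhaust)
    (tap (open-input-bytes data) (bytes-length data))))

;; A toy source that can never be tapped.
(define (toy-broken-source reason)
  (λ (tap exhaust)
    (exhaust reason)))

;; Hypothetical stand-in for fetch. Control ends up in tap or exhaust.
(define (toy-fetch source tap exhaust)
  (source tap exhaust))

;; A tap procedure that reads no more than the estimated size.
;; Assumes the estimate is an exact integer, never +inf.0.
(define (read-all in est-size)
  (read-bytes est-size in))

(define tapped
  (toy-fetch (toy-bytes-source #"hello") read-all (λ (e) (list 'exhausted e))))

(define exhausted
  (toy-fetch (toy-broken-source 'offline) read-all (λ (e) (list 'exhausted e))))
```

Here tapped is the byte string read through the port, and exhausted is whatever the exhaust procedure returned. No value in the model records a "tapped" or "exhausted" state.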
procedure
(logged-fetch id source tap) → logged?
id : any/c source : source? tap : tap/c
struct
(struct $fetch $message (id errors))
id : any/c errors : (listof $message?)
The computed value of the logged procedure is FAILURE if the source is exhausted. Otherwise, the value is what’s returned from tap.
The log will gain a ($fetch id errors) message, where errors is empty if the fetch is successful.
15.1 Defining Source Types
syntax
(define-source (id [field field-contract] ...) body ...)
On expansion, define-source defines a new structure type using (struct id (field ...)). The type is created with a guard that enforces per-field contracts. Instances implement gen:source.
define-source injects several bindings into the lexical context of body:
%src, %tap, and %fail are each bound to their respective formal argument of fetch.
%fetch is (bind-recursive-fetch %tap %fail).
Each field identifier is bound to a respective value for an instance of the structure type.
To understand how these injected bindings work together, let’s go through a few examples.
Use %tap to fulfil data with an input port and an estimated data length. In the simplest case, you can return constant data.
byte-source uses %tap like so:
(define-source (byte-source [data bytes?])
  (%tap (open-input-bytes data)
        (bytes-length data)))
Notice that data is used both to define a field (where it appears beside bytes?) and to reference the value contained in that field (within open-input-bytes and bytes-length).
Use %fail in tail position with error information to indicate a source was exhausted.
file-source uses %fail like so:
(define-source (file-source [path path-string?])
  (with-handlers ([exn:fail:filesystem? %fail])
    (%tap (open-input-file path)
          (+ (* 20 1024) (file-size path)))))
Note that %fail is an exhaust/c procedure, so it does not have to be given an exception as an argument.
%fetch is a recursive variant of fetch that uses %tap, but a possibly different exhaust/c procedure. This allows sources to control an entire fetch process and fall back to alternatives.
first-available-source uses a recursive fetch to iterate over the available sources until it has none left to check.
(define-source (first-available-source [available (listof source?)] [errors list?])
  (if (null? available)
      (%fail (reverse errors))
      (%fetch (car available)
              (λ (e)
                (%fetch (first-available-source (cdr available) (cons e errors))
                        %fail)))))
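The fallback behavior can be modeled in plain Racket to show how errors accumulate. This is a sketch under assumptions: sources are represented as procedures of a tap and an exhaust argument, and ok-source, bad-source, toy-first-available, and read-all are hypothetical names that do not exist in Xiden.

```racket
#lang racket/base

;; Toy sources: each one calls tap or exhaust in tail position.
(define (ok-source data)
  (λ (tap exhaust)
    (tap (open-input-bytes data) (bytes-length data))))

(define (bad-source reason)
  (λ (tap exhaust)
    (exhaust reason)))

;; Try each toy source in order. When one is exhausted, remember the
;; error and move on. When none are left, exhaust with every error in
;; the order the sources were visited.
(define (toy-first-available sources)
  (λ (tap exhaust)
    (let loop ([available sources] [errors '()])
      (if (null? available)
          (exhaust (reverse errors))
          ((car available)
           tap
           (λ (e) (loop (cdr available) (cons e errors))))))))

;; A tap procedure for the demonstration. Assumes an exact estimate.
(define (read-all in est-size)
  (read-bytes est-size in))

(define found
  ((toy-first-available (list (bad-source 'a) (bad-source 'b) (ok-source #"ok")))
   read-all
   values))

(define none
  ((toy-first-available (list (bad-source 'a) (bad-source 'b)))
   read-all
   values))
```

found holds the bytes of the first source that could be tapped, while none holds the accumulated list of errors because every candidate was exhausted.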
Finally, %src is just a reference to an instance of the structure containing each field.
procedure
(bind-recursive-fetch %tap %fail)
→ (->* (source?) (exhaust/c) any/c) %tap : tap/c %fail : exhaust/c
15.2 Source Types
struct
(struct exhausted-source (value))
value : any/c
struct
(struct byte-source (data))
data : bytes?
struct
(struct first-available-source (sources errors))
sources : (listof source?) errors : list?
If all sources for an instance are exhausted, then the instance is exhausted. As sources are visited, errors are functionally accumulated in errors.
The value produced for an exhausted first-available-source is the longest possible list bound to errors.
struct
(struct text-source (data) #:extra-constructor-name make-text-source) data : string?
struct
(struct lines-source (suffix lines) #:extra-constructor-name make-lines-source) suffix : (or/c #f char? string?) lines : (listof string?)
(define src
  (lines-source "\r\n"
                '("#lang racket/base"
                  "(provide a)"
                  "(define a 1)")))

; "#lang racket/base\r\n(provide a)\r\n(define a 1)\r\n"
(fetch src consume void)
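To make the resulting text concrete, here is a minimal sketch of how a suffix could be appended to each line. It assumes the suffix follows every line, including the last, and it does not model a suffix of #f because this section does not say what #f means. join-with-suffix is a hypothetical helper, not a Xiden binding.

```racket
#lang racket/base

;; Append the suffix to every line and concatenate the results.
;; suffix may be a character or a string in this sketch.
(define (join-with-suffix suffix lines)
  (define s (if (char? suffix) (string suffix) suffix))
  (apply string-append
         (for/list ([line (in-list lines)])
           (string-append line s))))

(define joined
  (join-with-suffix "\r\n"
                    '("#lang racket/base" "(provide a)" "(define a 1)")))
```

Under these assumptions, joined is the string "#lang racket/base\r\n(provide a)\r\n(define a 1)\r\n".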
struct
(struct file-source (path) #:extra-constructor-name make-file-source) path : path-string?
If the source is exhausted, it yields a relevant exn:fail:filesystem exception.
struct
(struct http-source (request-url) #:extra-constructor-name make-http-source) request-url : (or/c url? url-string?)
If the source is exhausted, it yields a relevant exception.
The behavior of the source is impacted by XIDEN_DOWNLOAD_MAX_REDIRECTS.
struct
(struct http-mirrors-source (request-urls) #:extra-constructor-name make-http-mirrors-source) request-urls : (listof (or/c url-string? url?))
15.3 Source Expressions
The following procedures are useful for declaring sources in a package input.
procedure
(coerce-source variant) → source?
variant : (or/c string? source?)
If variant is already a source, returns variant. Otherwise, returns (string->source variant) in terms of the plugin.
procedure
(from-catalogs query-string [url-templates])
→ (listof url-string?) query-string : string? url-templates : (listof string?) = (XIDEN_CATALOGS)
syntax
(from-file relative-path-expr)
Due to this behavior, from-file will return different results when the containing source file changes location on disk.
15.4 Untrusted Source Expressions
struct
(struct $bad-source-eval $message (reason datum))
reason : (or/c 'security 'invariant) datum : any/c
procedure
(eval-untrusted-source-expression datum [ns]) → logged?
datum : any/c ns : namespace? = (current-namespace)
If the evaluation produces a source, then the result of the logged procedure is that source, and no other messages will appear in the program log.
If the evaluation does not produce a source, then the result is FAILURE and the program log gains a ($bad-source-eval 'invariant datum).
If the evaluation is blocked by the security guard, then the result is FAILURE and the program log gains a ($bad-source-eval 'security datum).
15.5 Transferring Bytes
(require xiden/port) | package: xiden
xiden/port reprovides all bindings from racket/port, in addition to the bindings defined in this section.
procedure
(mebibytes->bytes mebibytes) → exact-nonnegative-integer?
mebibytes : real?
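As a rough illustration of the conversion, one mebibyte is 1024 × 1024 bytes. The sketch below rounds fractional results up to a whole byte, which is an assumption because this section does not state the rounding rule. toy-mebibytes->bytes is a hypothetical name, and the sketch does not handle +inf.0.

```racket
#lang racket/base

;; Convert mebibytes to an exact byte count.
;; Rounding up is an assumption made for this sketch.
(define (toy-mebibytes->bytes mebibytes)
  (inexact->exact (ceiling (* mebibytes 1024 1024))))

(define one-mib (toy-mebibytes->bytes 1))
(define half-mib (toy-mebibytes->bytes 0.5))
```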
procedure
(transfer bytes-source bytes-sink #:on-status on-status #:max-size max-size #:buffer-size buffer-size #:transfer-name transfer-name #:est-size est-size #:timeout-ms timeout-ms) → void? bytes-source : input-port? bytes-sink : output-port? on-status : (-> $transfer? any) max-size : (or/c +inf.0 exact-positive-integer?) buffer-size : exact-positive-integer? transfer-name : non-empty-string? est-size : (or/c +inf.0 real?) timeout-ms : (>=/c 0)
transfer applies on-status repeatedly and synchronously with $transfer messages.
transfer reads no more than N bytes from bytes-source, and will wait no longer than timeout-ms for the next available byte.
The value of N is computed using est-size and max-size. max-size is the prescribed upper limit for total bytes to copy. est-size is an estimate of the number of bytes that bytes-source will actually produce (this is typically not decided by the user). If (> est-size max-size), then the transfer will not start. Otherwise N is bound to est-size to hold bytes-source accountable for the estimate.
If est-size and max-size are both +inf.0, then transfer will not terminate if bytes-source does not produce eof.
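The budget rule can be sketched as a small copy loop. This is a simplified model, not the implementation of transfer. It omits status messages and timeouts, it returns the symbol 'rejected instead of reporting a message, and it assumes est-size is either +inf.0 or an exact nonnegative integer. toy-transfer is a hypothetical name.

```racket
#lang racket/base

;; Copy bytes from in to out under a budget.
;; Returns 'rejected if the estimate exceeds the limit, otherwise the
;; number of bytes copied. N from the text is est-size here.
(define (toy-transfer in out
                      #:max-size max-size
                      #:est-size est-size
                      #:buffer-size [buffer-size 4096])
  (cond
    [(> est-size max-size) 'rejected]
    [else
     (define buffer (make-bytes buffer-size))
     (let loop ([copied 0])
       (define remaining (- est-size copied))
       ;; Never ask for more than the budget has left. The comparison
       ;; keeps the result exact even when remaining is +inf.0.
       (define want (if (< remaining buffer-size) remaining buffer-size))
       (cond
         [(zero? want) copied]
         [else
          (define n (read-bytes! buffer in 0 want))
          (cond
            [(eof-object? n) copied]
            [else
             (write-bytes buffer out 0 n)
             (loop (+ copied n))])]))]))

(define sink (open-output-bytes))

;; The source holds 8 bytes, but the estimate allows only 5.
(define copied
  (toy-transfer (open-input-bytes #"abcdefgh") sink
                #:max-size 100 #:est-size 5 #:buffer-size 2))

;; The estimate is larger than the allowed maximum.
(define refused
  (toy-transfer (open-input-bytes #"abc") (open-output-bytes)
                #:max-size 100 #:est-size 200))

;; Both limits are infinite, so the loop ends only at eof.
(define unlimited
  (toy-transfer (open-input-bytes #"xyz") (open-output-bytes)
                #:max-size +inf.0 #:est-size +inf.0))
```

In the last call the loop stops only because the byte port produces eof, which mirrors the caveat about two infinite limits.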
struct
(struct $transfer:scope $transfer (name message) #:prefab) name : string? message : (and/c $transfer? (not/c $transfer:scope?))
struct
(struct $transfer:progress $transfer ( bytes-read max-size timestamp) #:prefab) bytes-read : exact-nonnegative-integer? max-size : (or/c +inf.0 exact-positive-integer?) timestamp : exact-positive-integer?
Unless max-size is +inf.0, (/ bytes-read max-size) approaches 1. You can use this along with the timestamp (in seconds) to reactively compute an estimated time to complete.
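One way to compute such an estimate is to compare two progress samples. The sketch below uses a hypothetical progress structure with the same three fields as $transfer:progress so that it runs without Xiden. estimate-seconds-left is likewise a hypothetical helper, and it assumes a constant transfer rate between the two samples.

```racket
#lang racket/base

;; Hypothetical stand-in for $transfer:progress.
(struct progress (bytes-read max-size timestamp))

;; Estimate the seconds remaining from two samples, or return #f when
;; no estimate is possible (unbounded size, or no measurable progress).
(define (estimate-seconds-left earlier later)
  (define elapsed (- (progress-timestamp later) (progress-timestamp earlier)))
  (define delta (- (progress-bytes-read later) (progress-bytes-read earlier)))
  (define max-size (progress-max-size later))
  (cond
    [(or (eqv? max-size +inf.0) (<= elapsed 0) (<= delta 0)) #f]
    [else
     ;; remaining bytes divided by bytes per second
     (/ (- max-size (progress-bytes-read later))
        (/ delta elapsed))]))

;; 200 bytes arrived in 2 seconds, and 700 bytes remain.
(define eta
  (estimate-seconds-left (progress 100 1000 10)
                         (progress 300 1000 12)))

(define unknown
  (estimate-seconds-left (progress 100 +inf.0 10)
                         (progress 300 +inf.0 12)))
```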
struct
(struct $transfer:budget $transfer () #:prefab)
struct
(struct $transfer:budget:exceeded $message (size) #:prefab) size : exact-positive-integer?
See XIDEN_FETCH_TOTAL_SIZE_MB and XIDEN_FETCH_PKGDEF_SIZE_MB.
struct
(struct $transfer:budget:rejected $message ( proposed-max-size allowed-max-size) #:prefab) proposed-max-size : (or/c +inf.0 exact-positive-integer?) allowed-max-size : exact-positive-integer?
See XIDEN_FETCH_TOTAL_SIZE_MB and XIDEN_FETCH_PKGDEF_SIZE_MB.
struct
(struct $transfer:timeout $message (bytes-read wait-time) #:prefab) bytes-read : exact-nonnegative-integer? wait-time : (>=/c 0)