Tabular Asa
1 Sources
2 Quick Example
3 Introduction
4 Row vs. Column Major
5 Row vs. Record
6 Reading Tables
table-read/  sequence
table-read/  jsexpr
table-read/  csv
table-read/  json
7 Building Tables
table-builder%
new
add-column
add-row
add-record
build
for/  table
8 Tables
table
empty-table
table-preview
table-length
table-shape
table-empty?
table-header
table-columns
table-column
table-with-column
table-with-columns-renamed
table-cut
table-drop
table-irow
table-row
table-record
table-rows
table-records
table-head
table-tail
table-select
table-map
table-apply
table-filter
table-update
table-fold
table-groupby
table-drop-na
table-reverse
table-sort
table-distinct
table-join/  inner
table-join/  outer
9 Printing Tables
pretty-print-rows
write-table
print-table
display-table
10 Writing Tables
table-write/  string
table-write/  csv
table-write/  json
11 Columns
column
empty-column
build-column
column-length
column-empty?
column-compact
column-rename
column-ref
column-head
column-tail
column-reverse
column-sort
12 Groups
group-fold
group-count
group-min
group-max
group-mean
group-sum
group-product
group-and
group-or
group-list
group-unique
group-nunique
group-sample
13 Indexes
index
build-index
empty-index
index-scan-keys
index-scan
index-length
index-empty?
index-sorted?
index-find
index-member
index-ref
index-map
index-min
index-max
index-median
index-mode
14 Sort Ordering
sort-ascending
sort-descending
orderable?
8.2

Tabular Asa

Jeffrey Massung <massung@gmail.com>

 (require tabular-asa) package: tabular-asa

A fast, efficient, immutable, dataframes implementation.

1 Sources

The source code can be found at https://github.com/massung/tabular-asa.

2 Quick Example

This is a brief example of loading a table from a CSV, filtering, grouping, aggregating, and plotting the data. Note: This example uses ~> from the threading module for clarity, but Tabular Asa does not require it.

(define books (call-with-input-file "books.csv" table-read/csv))
 
(let ([df (~> books
              (table-drop-na '(Publisher))
              (table-cut '(Genre Title))
              (table-groupby '(Genre))
              (group-count))])
  (parameterize ([plot-x-tick-label-angle 30]
                 [plot-x-tick-label-anchor 'top-right])
    (plot (discrete-histogram (for/list ([x (table-column df 'Genre)]
                                         [y (table-column df 'Title)])
                                (list x y)))
          #:x-label "Genre"
          #:y-label "Number of Titles Published")))

3 Introduction

Tabular Asa is intended to fulfill the following goals:

Tabular Asa does this by starting with a couple very simple concepts and building on them. In order, those concepts are:

4 Row vs. Column Major

When thinking about tabular data, it’s very common to think of each row (or record) as a thing to be grouped together. However, this is extremely inefficient for most operations; it requires extracting data from a larger collection into a smaller collection for many operations. It is also an inefficient use of cache. For this reason Tabular Asa is column-major.

A simple example of this difference in implementation would be cutting or inserting columns (a SELECT operation in SQL) to a table. Consider the following table of data:

name

age

gender

Jeff

23

m

Sam

14

m

Kate

38

f

When rows are stored as a sequence or hash, removing or adding a column requires duplicating every single row of data and copying it into a new sequence or hash, essentially doubling the memory usage and increasing the time it takes to perform the operation. However, if the table is stored as 3 columns, then creating a new table with a column added only adds the memory cost of the new column. Selecting a subset of columns is even easier.

Additionally, tables contain an a vector which is the index of which rows it sees. This allows for tables that are filters to simply reference the existing column data, but with a new index. In the above example table, the index would be the vector #(0 1 2). If a new table was generated by filtering the original, keeping only the girls, then the new table would contain all the same column data, but the index would be #(2).

5 Row vs. Record

For the purposes of this documentation and function names, a "row" is defined as a list? and a "record" is defined as hash?.

Likewise, the symbol k is used in place of a column name (a symbol?) and the symbol ks is used for a list of column names.

6 Reading Tables

It is important to note that - when reading tables - columns that don’t already exist will be generated on-demand. The column names will be equivelant to calling (gensym "col") to ensure they are unique symbols.

procedure

(table-read/sequence seq [columns])  table?

  seq : 
(or/c (listof any/c)
      (sequenceof hash-eq?))
  columns : (listof symbol?) = '()
Creates and returns a new table from either a sequence of rows or a sequence of records.

procedure

(table-read/jsexpr jsexpr)  table?

  jsexpr : jsexpr?
Given a jsexpr?, use the shape of the object to determine how it should be transformed into a table.

If jsexpr is a JSON object (hash-eq?), then it is assumed to be a hash of columns, where each column contains the values for it.

If jsexpr is a JSON array (list?), then it is assumes to be a list of hash-eq? records, where the key/value pairs are the column names and values for each row of the table.

procedure

(table-read/csv port    
  [#:header? header    
  #:drop-index? drop-index    
  #:separator-char sep    
  #:quote-char quote    
  #:double-quote? double-quote    
  #:comment-char comment    
  #:strip? strip    
  #:na na    
  #:na-values na-values])  table?
  port : input-port?
  header : boolean? = #t
  drop-index : boolean? = #f
  sep : char? = #\,
  quote : char? = #\"
  double-quote : char? = #t
  comment : char? = #\#
  strip : boolean? = #f
  na : any/c = #f
  na-values : (listof string?)
   = (list "" "." "na" "n/a" "nan" "null")
Reads the data in port as a CSV file using the options specified and returns a new table. Most of the arguments are used for parsing the CSV.

The header argument - if #t - indicates that the first non-comment row of the CSV should be treated as the list of column names. If false then the column names will be generated as needed.

The drop-index arugment - if #t - assumes that the first column of the CSV is the row index (i.e., an auto-incrementing integer) and shouldn’t be kept. If there is a row index column, and it is not dropped, it’s important to note that it will be treated just like any other column and is NOT used as the table’s index.

The na-values argument is a list of strings that - when parsed as the value for a given cell - are replaced with the na value to indicate N/A (not available). The values in this list are case-insensitive.

procedure

(table-read/json port [#:lines? lines])  table?

  port : input-port?
  lines : boolean? = #f
Reads the data in port as a JSON value. If lines is #t then the port is read line-by-line, where each line is assumed to be a JSON object (hash-eq?) corresponding to a single record of the resulting table. Otherwise the entire JSON object is read into memory and passed to table-read/jsexpr.

7 Building Tables

Tables can also be built at constructed using an instance of table-builder% or with the for/table macro.

class

table-builder% : class?

  superclass: object%

A table-builder% is an object that can be sent rows or records for appending and automatically grow the shape of the table being built efficiently.

constructor

(new table-builder% 
    [[initial-size initial-size] 
    [columns columns] 
    [sort-columns sort-columns]]) 
  (is-a?/c table-builder%)
  initial-size : exact-nonnegative-integer? = 5000
  columns : (listof symbol?) = '()
  sort-columns : boolean? = #f
Creates a new table-builder% with an initial shape.

The initial-size is how many rows are initially reserved for each column.

The columns is the initial list (and order) of column names. Columns may be added dynamically as rows and records are appended to the table.

The sort-columns parameter makes it so that - upon building the table - the columns are sorted alphabetically. This is useful when building a table from records and you want to ensure a consistent ordering.

method

(send a-table-builder add-column name    
  [backfill])  void?
  name : symbol?
  backfill : any/c = #f
Appends a new column to the table being built. Any rows already added to the table will be be backfilled with backfill.

Typically, this method need not be called manually, as it will automatically be called as-needed by add-row and add-record.

method

(send a-table-builder add-row r ks)  void?

  r : list?
  ks : (or/c (non-empty-listof symbol?) #f)
Appends a new row of values to the table. If ks is #f (the default), then the current set of columns (in order) of the table is assumed. If the row contains more values than there are columns then additional columns will be added to the table with generated names.

method

(send a-table-builder add-record r)  void?

  r : hash-eq?
Appends a new row of values to the table. The record is assumed to be a hash-eq? of (k . value) pairings. If the record contains a column name not yet present in the table, a new column is created for it.

method

(send a-table-builder build)  table?

Builds a new index for the table, truncates the column data and returns it.

This method may be called more than once, and each time it will return a new table. This allows you to do things like add rows, build a table, add more rows and columns, build another table, etc.

Example:

(let ([builder (new table-builder%
                    [columns '(hero universe)])])
  (send builder add-row '("Superman" "DC"))
  (send builder add-record #hasheq((hero . "Wolverine") (universe . "Marvel")))
  (send builder build))

syntax

(for/table (init-forms ...) (for-clause ...) body-or-break ... body)

Builds a table using a for macro.

The init-forms are the optional, initial forms used to create a table-builder% instance.

Each iteration of body should return either a list? or hash-eq?, which will automatically be sent to the table-builder% using add-row or add-record. It’s also possible to mix and match (i.e., return a list for one iteration and a hash for another).

When the for-clause terminates, the table is built and returned.

Example:

(for/table ([initial-size 3]
            [columns '(hero universe)])
           ([i 3])
  (case i
   ((0) '("Superman" "DC"))
   ((1) '("Wolverine" "Marvel"))
   ((2) '("Batman" "DC"))))

8 Tables

struct

(struct table (index data)
    #:extra-constructor-name make-table)
  index : (vectorof exact-nonnegative-integer?)
  data : (listof (cons/c symbol? (vectorof any/c)))
The constructor for a new table structure. There should almost never be a need to call this directly as opposed to using one of the table-read/* functions to load a table from another container or a port.

All tables are also sequences and can be iterated using for, where each iteration returns the next index and row (list). For example:

(define df (table #(0 1 2)
                  '((hero . #("Superman" "Batman" "Wonder Woman"))
                    (gender . #(m m f)))))
(for ([(i row) df])
  (displayln row))

An immutable, empty table. Useful for building a table from scratch using table-with-column or returning from a function in failure cases, etc.

parameter

(table-preview)  (table? output-port? -> void?)

(table-preview proc)  void?
  proc : (table? output-port? -> void?)
 = procedure?
Controls how tables are previewed on the REPL. The default function, simply prints the table-shape on a single line like so:

#<table [359 rows x 8 cols]>

However, if you may supply your own function, or even replace it with display-table or print-table if you always want to see a preview of the table on the REPL.

procedure

(table-length df)  exact-nonnegative-integer?

  df : table?
Returns the number of rows in the table.

Returns the number of rows and columns in the table as multiple values.

procedure

(table-empty? df)  boolean?

  df : table?
Returns #t if there are no rows or no columns in the table.

procedure

(table-header df)  (listof symbol?)

  df : table?
Returns a list of symbols, which are the column names of the table.

procedure

(table-columns df)  (listof column?)

  df : table?
Returns a list of columns containing all the data in the table.

procedure

(table-column df k)  column?

  df : table?
  k : symbol?
Looks up the column named k and returns it. If a column with that name does not exist, raise an error.

procedure

(table-with-column df data [#:as as])  table?

  df : table?
  data : sequence?
  as : (or/c symbol? #f) = #f
Returns a new table with either the column data added or replaced if as is the same name as an existing column. If no column name is provided then a new column name is generated for it.

If the data sequence contains fewer elements than there are rows in the table, then the extra rows will be filled with #f for the new column. Likewise, if data contains more values than there are rows, the extra values will be dropped.

It’s important to note that the vector for the column generated will be as large as necessary for it to be indexed properly by the table! For example, let’s say you begin with a table that has 1M rows and filter it, which returns a new table with a single row. If the index of that single row is 999999 (i.e., the last row), then adding a new column will create a vector with 1M entries, all but one of which will contain #f.

procedure

(table-with-columns-renamed df rename-map)  table?

  df : table?
  rename-map : hash-eq?
Returns a new table with the columns in rename-map renamed. Example:

(table-with-columns-renamed df #hasheq((person . name)))

procedure

(table-cut df ks)  table?

  df : table?
  ks : (non-empty-listof symbol?)
Returns a new table with only the columns ks.

procedure

(table-drop df ks)  table?

  df : table?
  ks : (non-empty-listof symbol?)
Returns a new table with the columns ks removed.

procedure

(table-irow df i)  (listof any/c)

  df : table?
  i : exact-nonnegative-integer?
Given an index position, return a row (list) of the values in the columns at that position. An index position is the exact index into the column data the table references. This is usually not what you want, but can be useful in some situations.

For a more concrete example of this, imagine a brand new table with a single column of 3 value: #(a b c); it has an index of #(0 1 2). Now, reverse the table; the index is now #(2 1 0). Calling (table-irow df 2) will return '(c), because that’s the value at index 2. However, calling (table-row df 2) will return '(a), because that’s the value of the third row; the table has been reversed.

This can be seen in action easily with the following code:

(let ([df (table-reverse (table-with-column empty-table '(a b c)))])
  (for ([(i row) df] [n (in-naturals)])
    (displayln (format " for ~a = ~v" i row))
    (displayln (format "irow ~a = ~v" i (table-irow df i)))
    (displayln (format " row ~a = ~v" n (table-row df n)))))

The for loop follows the index, so index 2 should be output first, which is reference row 0. The last index output is 0, which is the last reference row (2).

procedure

(table-row df i)  (list/c any/c)

  df : table?
  i : exact-nonnegative-integer?
Given an reference position, return a row (list) of the values in the columns at that position. A reference position is similar to vector-ref or list-ref: it is the zero-based, nth row within the table.

See the comment for table-irow for more details.

procedure

(table-record df i)  hash-eq?

  df : table?
  i : exact-nonnegative-integer?
Given an reference position, return a record (hash) of the columns and values for that row.

procedure

(table-rows df)  (sequenceof list?)

  df : table?
Iterates over the table, returning a row (list) for each row. This is different from iterating over the table itself, because the table sequence also returns the index along with the row.

procedure

(table-records df)  (sequenceof hash-eq?)

  df : table?
Iterates over the table, returning a record (hash) for each row.

procedure

(table-head df [n])  table?

  df : table?
  n : exact-nonnegative-integer? = 10
Returns a new table that is just the first n rows of df.

procedure

(table-tail df [n])  table?

  df : table?
  n : exact-nonnegative-integer? = 10
Returns a new table that is just the last n rows of df.

procedure

(table-select df flags)  table?

  df : table?
  flags : (sequenceof any/c)
Given a sequence of boolean values, filters the rows of df and returns a new table. Use table-filter to filter using a predicate function.

procedure

(table-map df proc [ks])  table?

  df : table?
  proc : ((non-empty-listof any/c) -> any/c)
  ks : (or/c (non-empty-listof symbol?) #f) = #f
Provided an optional list of columns, pass a list of those columns to proc for every row in df and return a lazy sequence of results. If ks is #f then all columns are used.

procedure

(table-apply df proc [ks])  table?

  df : table?
  proc : procedure?
  ks : (or/c (non-empty-listof symbol?) #f) = #f
Like table-map, but applies each list of column values to proc.

procedure

(table-filter df proc [ks])  table?

  df : table?
  proc : procedure?
  ks : (or/c (non-empty-listof symbol?) #f) = #f
Like table-apply, but the resulting sequence is used for a table-select. A new table is returned.

procedure

(table-update df    
  k    
  proc    
  [#:ignore-na? ignore-na])  table?
  df : table?
  k : symbol?
  proc : procedure?
  ignore-na : boolean? = #t
Applies the column k to proc and returns a new table with the column values replaced. This is similar to:

(table-with-column df (table-apply df proc (list k)) #:as k)

If ignore-na is #t (the default), then all #f values are returned as #f instead of being updated.

procedure

(table-fold df proc i [final])  table?

  df : table?
  proc : (any/c any/c -> any/c)
  i : any/c
  final : (any/c -> any/c) = identity
Returns a table with a single row, where each column has been aggregated with proc, with an initial value of i. Optionally, a final function can be applied to the result before returning.

procedure

(table-groupby df ks [less-than?])

  (sequence/c (listof (list/c symbol? any/c)) table?)
  df : table?
  ks : (non-empty-listof symbol?)
  less-than? : (or/c (any/c any/c -> boolean?) #f)
   = sort-ascending
Creates and returns a sequence of reference indices grouped by the columns in ks. Each iteration of the sequence returns two values: an associative list of the group in (k value) form and the subtable of all rows for that group. If less-than? is #f then the groups are returned in the whatever order they appeared in the source table.

procedure

(table-drop-na df [ks])  table?

  df : table?
  ks : (or/c (non-empty-listof symbol?) #f) = #f
Returns a new table with all rows dropped that have missing values among the columns specified in ks (or any column if ks is #f).

procedure

(table-reverse df)  table?

  df : table?
Returns a new table with the index order reversed.

procedure

(table-sort df [ks less-than?])  table?

  df : table?
  ks : (or/c (non-empty-listof symbol?) #f) = #f
  less-than? : (any/c any/c -> boolean?) = sort-ascending
Returns a new table with the index of df sorted by the columns ks (or all columns if #f) sorted by less-than?. By default, it will sort in ascending order using a custom sorting predicate.

procedure

(table-distinct df [ks keep])  table?

  df : table?
  ks : (or/c (non-empty-listof symbol?) #f) = #f
  keep : (or/c 'first 'last 'none) = 'first
Returns a new table removing duplicate rows where all the columns specified in ks are equal?.

procedure

(table-join/inner df    
  other    
  on    
  [less-than?    
  #:with with])  table?
  df : table?
  other : table?
  on : (non-empty-listof symbol?)
  less-than? : (any/c any/c -> boolean?) = sort-ascending
  with : (non-empty-listof symbol?) = on
Performs an INNER join of df and other, joining the columns where the on and with columns are equal? between the two tables and returns the new table.

procedure

(table-join/outer df    
  other    
  on    
  [less-than?    
  #:with with])  table?
  df : table?
  other : table?
  on : (non-empty-listof symbol?)
  less-than? : (any/c any/c -> boolean?) = sort-ascending
  with : (non-empty-listof symbol?) = on
Performs an LEFT OUTER join of df and other, joining the columns where the on and with columns are equal? between the two tables and returns the new table.

9 Printing Tables

parameter

(pretty-print-rows)  (or/c exact-nonnegative-integer? #f)

(pretty-print-rows rows)  void?
  rows : (or/c exact-nonnegative-integer? #f)
 = 10
Controls the nmaximum umber of rows output by write-table. If set to #f then there is no limit and all rows will be printed.

procedure

(write-table df    
  [port    
  mode    
  #:keep-index? keep-index])  void?
  df : table?
  port : output-port? = (current-output-port)
  mode : boolean? = #t
  keep-index : boolean? = #t
Pretty prints a maximum of pretty-print-rows rows of df to port. If the table contains more rows then the head and the tail of the table is output. If keep-index is #t then the index column is also output as well.

procedure

(print-table df    
  [port    
  #:keep-index? keep-index])  void?
  df : table?
  port : output-port? = (current-output-port)
  keep-index : boolean? = #t
Calls write-table with the mode set to #t.

procedure

(display-table df    
  [port    
  #:keep-index? keep-index])  void?
  df : table?
  port : output-port? = (current-output-port)
  keep-index : boolean? = #t
Calls write-table with the mode set to #f.

10 Writing Tables

procedure

(table-write/string df [port])  void?

  df : table?
  port : output-port? = (current-output-port)
Pretty print the entire table to port using display-table, temporarily setting pretty-print-rows to #f beforehand.

procedure

(table-write/csv df    
  [port    
  #:keep-index? keep-index    
  #:header? header    
  #:separator-char sep    
  #:quote-char quote    
  #:escape-char escape    
  #:list-char list-sep    
  #:double-quote? double-quote    
  #:na-rep na    
  #:na-values na-values])  void?
  df : table?
  port : output-port? = (current-output-port)
  keep-index : boolean? = #t
  header : boolean? = #t
  sep : char? = #\,
  quote : char? = #\"
  escape : char? = #\\
  list-sep : char? = #\|
  double-quote : boolean? = #t
  na : string? = ""
  na-values : (listof any/c) = (quote(#f))
Outputs df to port in a CSV format.

procedure

(table-write/json df    
  [port    
  #:orient orient    
  #:lines? lines    
  #:na-rep na])  void?
  df : table?
  port : output-port? = (current-output-port)
  orient : (or/c 'records 'columns) = 'records
  lines : boolean? = #t
  na : any/c = (json-null)
Outputs df to port in JSON.

If orient is 'records (the default) then every row of the table is written as an array of JSON objects. If lines is #t (and orient is 'records) then instead of an array, then each row is written as a record on each line.

If orient is 'columns then the table is written as a single JSON object, where each key/value pairing is the column name and an array of values.

The na determines what Racket value in written out as a JSON null.

11 Columns

struct

(struct column (name index data)
    #:extra-constructor-name make-column)
  name : symbol?
  index : (vectorof exact-nonnegative-integer?)
  data : (vectorof any/c)
The constructor for a new column. There should almost never be a need to call this directly as opposed to having one created for you using the table-column function, which shares the same index and data values for the table. All columns are also sequences and can be iterated using for.

value

empty-column : column?

The immutable empty column.

procedure

(build-column data [#:as as])  column?

  data : (sequenceof any/c)
  as : (or/c symbol? #f) = #f
Builds a new column with the values in data. The data is copied and a new index is built for the column. If #:as is #f then a unique column name will be generated for it.

procedure

(column-length col)  exact-nonnegative-integer?

  col : column?
Returns the number of data items referenced by the index.

procedure

(column-empty? col)  boolean?

  col : column?
Returns #t if the column’s index is empty.

procedure

(column-compact col)  column?

  col : column?
Returns a new column with duplicated, but (presumably) reduced data and memory usage. This is useful if the original column contains lots of data, but a very small index.

procedure

(column-rename col [as])  column?

  col : column?
  as : (or/c symbol? #f) = #f
Returns a new column, referencing the same data as col, but with a different name. If as is not provided, then a unique colum name will be generated.

procedure

(column-ref col n)  any/c

  col : column?
  n : exact-nonnegative-integer?
Returns the nth item from the indexed data in the column.

procedure

(column-head col [n])  column?

  col : column?
  n : exact-nonnegative-integer? = 10
Returns a new column that shares data with col, but only contains the first n items.

procedure

(column-tail col [n])  column?

  col : column?
  n : exact-nonnegative-integer? = 10
Returns a new column that shares data with col, but only contains the last n items.

procedure

(column-reverse col)  column?

  col : column?
Returns a new column that shares data with col, but with the index reversed.

procedure

(column-sort col [less-than?])  column?

  col : column?
  less-than? : ((any/c any/c) -> boolean?) = sort-ascending
Returns a new column that shares data with col, but with the index sorted by the data values.

12 Groups

procedure

(group-fold proc init group [final])  table?

  proc : (any/c any/c -> any/c)
  init : any/c
  group : (sequence/c (listof (list/c symbol? any/c)) table?)
  final : (any/c -> any/c) = identity
Iterates over every table in the group and calls table-fold for each. The result of each fold is appended and returned in a final table of results.

procedure

(group-count group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
Counts every non #f value in each group.

procedure

(group-min group [less-than?])  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
  less-than? : (any/c any/c -> boolean?) = sort-ascending
Returns the minimum value for each group.

procedure

(group-max group [greater-than?])  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
  greater-than? : (any/c any/c -> boolean?) = sort-descending
Returns the maximum value for each group.

procedure

(group-mean group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
Sums all non #f values and then averages them at the end. The average is of all valid values and across all rows. For example, the mean of the values '(2 #f 4) is 3 not 2.

procedure

(group-sum group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
Adds every non #f value in each group.

procedure

(group-product group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
Multiplies every non #f value in each group.

procedure

(group-and group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
If every value is non #f then the result is #t, otherwise #f.

procedure

(group-or group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
If any value is non #f then the result is #t, otherwise #f.

procedure

(group-list group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
Collects all non #f values into a list.

procedure

(group-unique group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
Collects all non #f values into a list, keeping only unique values.

procedure

(group-nunique group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
Counts all non #f, unique values.

procedure

(group-sample group)  table?

  group : (sequence/c (listof (list/c symbol? any/c)) table?)
Picks a random value among all the non #f values. This uses Reservoir Sampling to ensure the selection is fair and works for arbitrarily large sets.

13 Indexes

struct

(struct index (keys less-than?)
    #:extra-constructor-name make-index)
  keys : (vectorof (cons/c any/c (vectorof exact-nonnegative-integer?)))
  less-than? : (or/c ((any/c any/c) -> boolean?) #f)
Constructor for a new index. The keys are a sorted vector of lists, where the first element of the list is the key value and the rest of the list are indices. The less-than? predicate is the same as was used to sort keys before passing them in.

procedure

(build-index data [less-than?])  index?

  data : (sequenceof any/c)
  less-than? : (or/c ((any/c any/c) -> boolean?) #f)
   = sort-ascending
Creates a new index by finding all the unique keys in the data sequence along with the ordered indices where they keys are located, then sorts them using the less-than? predicate if defined. If the less-than? predicate function is #f then no sorting takes place and the keys are in a random order.

An empty index.

procedure

(index-scan-keys ix [#:from from #:to to])  sequence?

  ix : index?
  from : any/c = #f
  to : any/c = #f
Returns a sequence from the from key (inclusive) to the to key (exclusive). If from is #f then the sequence begins with the first key. If to is #f then the sequence ends with the last key in the index. The sequence returns multiple values: the key and a list of all the reference indices of the data sequence where the keys originated from.

If the index is not sorted then the order of the sequence returned is undefined.

procedure

(index-scan ix [#:from from #:to to])  sequence?

  ix : index?
  from : any/c = #f
  to : any/c = #f
Similar to index-scan-keys, but instead of the sequence returning multiple values, this sequence only returns the indices, in order.

procedure

(index-length ix)  exact-nonnegative-integer?

  ix : index?
Returns the number of unique keys in the index.

procedure

(index-empty? ix)  boolean?

  ix : index?
True if the index has no keys.

procedure

(index-sorted? ix)  boolean?

  ix : index?
True if the index was defined with a less-than? predicate.

procedure

(index-find ix key [exact])  (or/c exact-nonnegative-integer? #f)

  ix : index?
  key : any/c
  exact : boolean? = #f
Searches the index looking for a matching key. If the index is sorted then this is a binary search, otherwise it’s a linear search through all the keys for a match.

If key is found, then the reference index to the keys is returned. When not found, if exact is #t then #f is returned. Otherwise, the next higher index for the next key is returned.

procedure

(index-member ix key)

  (or/c (list/c any/c exact-nonnegative-integer? ...) #f)
  ix : index?
  key : any/c
Searches the index looking for an exactly matching key. If found then the list of key and indices is returned, otherwise #f is returned.

procedure

(index-ref ix n)  (list/c any/c exact-nonnegative-integer? ...)

  ix : index?
  n : exact-nonnegative-integer?
Returns the key and indices at the given reference index.

procedure

(index-map ix v [#:from from #:to to])  sequence?

  ix : index?
  v : (vectorof any/c)
  from : any/c = #f
  to : any/c = #f
Scans the index and maps the indices across the values in v and returns them in a new sequence.

procedure

(index-min ix)  (or/c any/c #f)

  ix : index?
Returns #f if the index is empty, otherwise returns the first key in the index.

procedure

(index-max ix)  (or/c any/c #f)

  ix : index?
Returns #f if the index is empty, otherwise returns the last key in the index.

procedure

(index-median ix)  (or/c any/c #f)

  ix : index?
Returns #f if the index is empty, otherwise returns the median key in the index.

procedure

(index-mode ix)  (or/c any/c #f)

  ix : index?
Returns #f if the index is empty, otherwise returns the key that occurs the most often.

14 Sort Ordering

All functions that allow for sorting (e.g. table-sort) or indexing/grouping take an optional less-than? compare function. Tabular Asa comes with a generic orderable interface with sort-ascending and sort-descending functions for the following basic types:

Both generic functions will always sort #f values last regardless of sort direction.

procedure

(sort-ascending a b)  boolean?

  a : orderable?
  b : any/c
Returns #t if b is #f or a < b.

procedure

(sort-descending a b)  boolean?

  a : orderable?
  b : any/c
Returns #t if b is #f or a > b.

procedure

(orderable? x)  boolean?

  x : any/c
Returns #t if x is a of a type that can ordered using sort-ascending or sort-descending.