8.3

## Functional Relational Algebra

 (require fra) package: fra

This package provides a purely functional implementation of Relational Algebra in Racket.

### 1Examples

In this example, we will build a relational database for a university grade book. First, we define a database with a few relations:

 (define GradebookDB (Database [Students [StudentId Name Course] (0 "Jonah" 'CS330) (1 "Aidan" 'CS142) (2 "Sarah" 'CS630)] [Projects [ProjectId Course Title] (0 'CS330 "Garbage Collectors") (1 'CS330 "Prolog") (2 'CS142 "UFO") (3 'CS630 "Theorem Prover")] [Grades [StudentId ProjectId Grade] [0 0 2/3] [1 2 99] [2 3 -inf.0]]))

At this point GradebookDB is bound to a value obeying database/c that contains three relations: 'Students, 'Projects, and 'Grades. The first S-expression after each relation identifier is the schema/c for that relation. Each S-expression after that is a Tuple that is in the relation. As you can see, any Scheme value can be included in a tuple.

Let’s do some queries!

 (require racket/set) (define (print-relation r) (for ([c (in-list (relation-schema r))]) (printf "~a\t" c)) (printf "~n") (for ([t (in-set (relation-tuples r))]) (for ([i (in-range 0 (tuple-length t))]) (printf "~a\t" (tuple-ref t i))) (printf "~n")))

 > (with-database GradebookDB (print-relation (execute-query (query-relation 'Students))))
 StudentId Name Course 0 Jonah CS330 1 Aidan CS142 2 Sarah CS630

As you can see from this interaction, a relation is just a set of tuples, which are a vector-like abstraction. Now for some more interesting queries:

 (define (>30 n) (> 30 n))

 StudentId ProjectId Grade 0 0 2/3 2 3 -inf.0

Proposition can be any Scheme value that may be applied.

Suppose that we attempted to use that proposition on a relation that did not have 'Grade in its schema?

query-selection: Proposition must refer to subset of query's

As you can see, the error is detected before the query is ever run.

Now, let’s have some joins:

 > (with-database GradebookDB (print-relation (execute-query (query-rename 'Title 'Project (query-projection '(Name Course Title Grade) (query-natural-join (query-relation 'Projects) (query-natural-join (query-relation 'Grades) (query-relation 'Students))))))))
 Name Course Project Grade Jonah CS330 Garbage Collectors 2/3 Aidan CS142 UFO 99 Sarah CS630 Theorem Prover -inf.0

Finally, some functional database modification:

 > (with-database GradebookDB (print-relation (execute-query (query-relation 'Students))))
 StudentId Name Course 0 Jonah CS330 1 Aidan CS142 2 Sarah CS630
 > (define GradebookDB+1 (database-insert GradebookDB 'Students (Tuple 3 "Omega" (lambda () ((lambda (x) (x x)) (lambda (x) (x x)))))))
 > (with-database GradebookDB+1 (print-relation (execute-query (query-relation 'Students))))
 StudentId Name Course 0 Jonah CS330 1 Aidan CS142 2 Sarah CS630 3 Omega #
 > (with-database GradebookDB+2 (print-relation (execute-query (query-relation 'Students))))
 StudentId Name Course 1 Aidan CS142 2 Sarah CS630 3 Omega #

### 2API

This section documents the APIs of the package.

#### 2.1Schemas

Schemas are defined as lists of symbols.

 valueschema/c : contract?
Equivalent to (listof symbol?).

#### 2.2Propositions

Propositions are used by query-selection to compute sub-relations.

 procedure(prop? v) → boolean? v : any/c
Returns #t if v is a proposition, #f otherwise.

 procedure(make-prop:or lhs rhs) → prop? lhs : prop? rhs : prop?
 procedure(make-prop:and lhs rhs) → prop? lhs : prop? rhs : prop?
 procedure p : prop?
 procedure(make-prop:op op cols) → prop? op : procedure? cols : (listof symbol?)

Propositions constructors.

syntax

(Proposition p)

 p = (not p) | (and p p) | (or p p) | (proc attribute ...)

 proc : procedure?
 attribute : identifier?
Returns a proposition. The interpretation of not, and, and or is standard. When a procedure is included in a proposition, the values of the named attributes are extracted from the tuple and applied to the procedure value; if it returns #t, then the tuple is selected, otherwise it is rejected.

#### 2.3Queries

Queries are used by execute-query to run relational queries.

 procedure(query? v) → boolean? v : any/c
Returns #t if v is a query, #f otherwise.

 valuedatabase-schema/c : contract?
Equivalent to (-> symbol? schema/c).

 valuecurrent-database-schema : (parameter/c database-schema/c)
Used by query-schema to determine the schema of query-relation queries.

 procedure q : query?
Returns the schema of the relation q computes.

 procedure(query-relation rid) → query? rid : symbol?
Query of the relation rid.

 procedure(query-union q1 q2) → query? q1 : query? q2 : query?
 procedure(query-difference q1 q2) → query? q1 : query? q2 : query?
 procedure(query-intersection q1 q2) → query? q1 : query? q2 : query?
 procedure(query-product q1 q2) → query? q1 : query? q2 : query?
 procedure s : schema/c q : query?
 procedure p : prop? q : query?
 procedure(query-rename old-attr new-attr q) → query? old-attr : symbol? new-attr : symbol? q : query?
 procedure(query-rename* renaming q) → query? renaming : (hash/c symbol? symbol?) q : query?
 procedure(query-natural-join q1 q2 [equal?]) → query? q1 : query? q2 : query? equal? : (any/c any/c . -> . boolean?) = equal?
 procedure(query-theta-join p q1 q2) → query? p : prop? q1 : query? q2 : query?
 procedure(query-semi-join q1 q2) → query? q1 : query? q2 : query?
 procedure(query-anti-join q1 q2) → query? q1 : query? q2 : query?
 procedure(query-division q1 q2) → query? q1 : query? q2 : query?
 procedure(query-left-outer-join q1 q2) → query? q1 : query? q2 : query?
 procedure(query-right-outer-join q1 q2) → query? q1 : query? q2 : query?
 procedure(query-outer-join q1 q2) → query? q1 : query? q2 : query?
These construct queries represent the basic operations of relational algebra.

query-rename* applies multiple renamings at once using the dictionary to map old names to new names.

query-natural-join takes an optional equal? argument to compare attribute values for equality.

#### 2.4Tuples

Tuples are the contents of relations.

 procedure(tuple? v) → boolean? v : any/c
Returns #t if v is a tuple, #f otherwise.

 procedure t : tuple?
Returns the length of t.

 procedure(tuple-ref t i) → any/c t : tuple? i : exact-nonnegative-integer?
Returns the ith element of t.

 procedure(tuple elem ...) → tuple? elem : any/c
Returns a tuple that contains all the elems.

syntax

(Tuple elem ...)

 elem : any/c
Returns a tuple that contains all the elems.

#### 2.5Relations

Relations are the contents of databases and the results of queries.

 procedure v : any/c
Returns #t if v is a relation, #f otherwise.

 procedure r : relation?
Returns r’s schema.

 procedure r : relation?
Returns the set of tuples comprising the relation r.

syntax

 (Relation [attribute ...] (elem ...) ...)

 attribute : identifier?
 elem : any/c
Returns a relation with '(attribute ...) as its schema that contains each (Tuple elem ...) as its tuples.

#### 2.6Database

 valuedatabase/c : contract?
Equivalent to (hash/c symbol? relation? #:immutable #t).

 procedure(database-insert db rid t) → database/c db : database/c rid : symbol? t : tuple?
Returns a database that is identical to db, except t is in the relation rid.

 procedure(database-delete db rid t) → database/c db : database/c rid : symbol? t : tuple?
Returns a database that is identical to db, except t is not in the relation rid.

 procedure(call-with-database db thnk) → any db : database/c thnk : (-> any)
Executes (thnk) with db as the current database.

syntax

(with-database db e ...)

 db : database/c
Executes (begin e ...) with db as the current database.

 procedure q : query?
Computes the relation specified by q using the current database (chosen by with-database).

 valueNULL : any/c
The NULL value that is inserted by the evaluation of query-left-outer-join, query-right-outer-join, or query-outer-join.

syntax

 (Database [relation [attribute ...] (elem ...) ...] ...)

 relation : identifier?
 attribute : identifier?
 elem : any/c
Returns a database with each 'relation specified as
 (Relation [attribute ...] (elem ...) ...)

### 3Implementation Notes

The current implementation uses immutable hashes as relations, vectors as tuples (except that they can be efficient appended), and lists as schemas. Propositions are structures, but are compiled to procedures (with attribute identifiers resolved to tuple positions) prior to query execution.

execute-query uses a simple query optimizer. It has two passes: first it tries to push selection operations to the leaves of the query to reduce relation sizes prior to products, then it pulls selection operations towards the root (but not passed products) to reduce the number of iterations over all the elements of a tuple. During both passes, some simplifications are performed, such as merging adjacent renamings, projections, and identical (or trivial) selections. This optimization happens independent of any statistics about relation sizes, etc.