On this page:
3.1 Grouping
3.2 Traditional Aggregation
3.3 Illegal vs Quasi-Legal Selects
3.4 Grouped Join Aggregation
3.5 Summary
7.8

3 Aggregates

If you are going to follow along, be sure you
> (require plisqin-examples/adventure-works/schema)

3.1 Grouping

To understand aggregates, you must first understand how group-by changes the shape of the result set. First, consider the following query. It will return every row in the Product table.

(from p Product)

If we filter out some rows using where clauses, every row in the result set still corresponds to exactly one Product.
(from p Product
      (where (.> (ListPrice p)
                 (val 100))))

When we add a group-by clause, the shape of the result set is very different. In the following query, each row of the result set no longer corresponds to a single Product, but to a group of Products sharing the same Color.

> (aw:show-table
   (from p Product
         (group-by (Color p))
         (select (Color p))))

Show TableShow SQL

select p.Color as Color

from Product p

group by p.Color

Color

#<sql-null>

Black

Blue

Grey

Multi

Red

Silver

Silver/Black

White

Yellow

An Illegal Select
What would happen if you were to add (select (ListPrice p)) to the previous example? I call this an illegal select because it doesn’t make sense – there might be 10 yellow Products that all have different List Prices. PostgreSQL and MSSQL will raise an error if you try this. Unfortunately for my demonstration, SQLite tolerates illegal selects. In this example, it would choose a single ListPrice at random from the group. But I recommend that you avoid illegal selects.

Note that (select (Color p)) was not a illegal select. We grouped by Color, so each group will have exactly one Color by definition.

3.2 Traditional Aggregation

It seems that group-by is mostly useless on its own. Once we add aggregates, things start to make sense. Plisqin provides the following aggregate operations: count, avg, min, max, sum.

If we have Products grouped by Color, it does not make sense to ask "What is the ListPrice of each group of Products?" But we can ask "What is the maximum ListPrice of each group of Products?"

> (aw:show-table
   (from p Product
         (group-by (Color p))
         (select (Color p))
         (select (count p))
         (select (min (ListPrice p)))
         (select (max (ListPrice p)))))

Show TableShow SQL

select p.Color as Color

  , count(*)

  , min(p.ListPrice)

  , max(p.ListPrice)

from Product p

group by p.Color

Color

count(*)

min(p.ListPrice)

max(p.ListPrice)

#<sql-null>

248

0.0

229.49

Black

93

0.0

3374.99

Blue

26

34.99

2384.07

Grey

1

125.0

125.0

Multi

8

8.99

89.99

Red

38

34.99

3578.27

Silver

43

0.0

3399.99

Silver/Black

7

40.49

80.99

White

4

8.99

9.5

Yellow

36

53.99

2384.07

The previous query is an example of traditional aggregation. Soon we will contrast traditional aggregation with another style.

Implicit Group Caveat
You can use aggregates on an ungrouped query. Doing so implicitly puts every item into a single group. The following example is aggregating over the group of all Products.

> (aw:show-table
   (from p Product
         (select (count p))
         (select (min (ListPrice p)))
         (select (max (ListPrice p)))))

Show TableShow SQL

select count(*)

  , min(p.ListPrice)

  , max(p.ListPrice)

from Product p

count(*)

min(p.ListPrice)

max(p.ListPrice)

504

0.0

3578.27

Attempting to add (select (ListPrice p)) to the previous example would be another illegal select. The grouping is implicit, but the reasoning is the same – the group of all Products does not have a single List Price.

3.3 Illegal vs Quasi-Legal Selects

We previously saw the following example of traditional aggregation:
(from p Product
      (group-by (Color p))
      (select (Color p))
      (select (count p))
      (select (min (ListPrice p)))
      (select (max (ListPrice p))))

What if we wanted to modify this query to group by ProductCategory instead of by Color? Sticking with traditional aggregation, we can write the following query:

> (aw:show-table
   (from p Product
         (group-by (ProductCategoryID p))
         (select (ProductCategoryID p))
         (select (CategoryName p))
         (select (count p))
         (select (min (ListPrice p)))
         (select (max (ListPrice p)))))

Show TableShow SQL

select subcat.ProductCategoryID as ProductCategoryID

  , cat.Name as CategoryName

  , count(*)

  , min(p.ListPrice)

  , max(p.ListPrice)

from Product p

left join ProductSubcategory subcat

   on (p.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = p.ProductSubcategoryID))

left join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

group by subcat.ProductCategoryID

ProductCategoryID

CategoryName

count(*)

min(p.ListPrice)

max(p.ListPrice)

#<sql-null>

#<sql-null>

209

0.0

196.92

1

Bikes

97

539.99

3578.27

2

Components

134

20.24

1431.5

3

Clothing

35

8.99

89.99

4

Accessories

29

2.29

159.0

In the previous query, (select (CategoryName p)) might look like an illegal select, but I am going to say that it is a quasi-legal select instead, and here is why:
  • We have already grouped by ProductCategoryID.

  • ProductCategoryID is a unique key of the Category table.

  • Therefore, for every column C of the Category table, each group contains exactly one value of C.

  • CategoryName is a column of the Category table.

  • Therefore, each group contains exactly one CategoryName.

  • Therefore, "What is the CategoryName of each group?" has an unambiguous answer, and it is not an illegal select.

I think I read that PostgreSQL now accepts some quasi-legal selects, but only when it can prove that they are not illegal, which is probably impossible to do with 100% accuracy.

None of this would be worth mentioning if every database permitted these quasi-legal selects, but they don’t. MSSQL rejects all quasi-legal selects, so we would have to modify our query. Adding (group-by (CategoryName p)) would be acceptable to MSSQL. But we are using SQLite which accepts all quasi-legal selects (and illegal selects too), so the query works and produces a reasonable result.

3.4 Grouped Join Aggregation

We have seen a few examples of traditional aggregation, but now it is time to introduce grouped join aggregation. First, we will define a grouped join to mean "any join which contains a group-by clause." The following procedure returns a grouped join:
> (define/contract (Products-by-Category cat)
    (-> (instanceof ProductCategory) (instanceof Product))
    (join p Product #:to cat
          (join-type 'left)
          (group-by (ProductCategoryID p))
          (join-on (.= (?? (ProductCategoryID p) /void)
                       (ProductCategoryID cat)))))

When a grouped join is passed into an aggregate, the aggregate recognizes the grouped join and performs the aggregation over that group. This is called grouped join aggregation; here is an example:

According to my normal naming convention, group-of-products would be named productsG for brevity. The longer name is for clarity.

> (aw:show-table
   (from cat ProductCategory
         (select (ProductCategoryID cat))
         (select (CategoryName cat))
         (define group-of-products
           (Products-by-Category cat))
         (select (count group-of-products))
         (select (min (ListPrice group-of-products)))
         (select (max (ListPrice group-of-products)))))

Show TableShow SQL

select cat.ProductCategoryID as ProductCategoryID

  , cat.Name as CategoryName

  , p.__INJECT1

  , p.__INJECT2

  , p.__INJECT3

from ProductCategory cat

left join (

  select subcat.ProductCategoryID as __INJECT0

    , count(*) as __INJECT1

    , min(p.ListPrice) as __INJECT2

    , max(p.ListPrice) as __INJECT3

  from Product p

  left join ProductSubcategory subcat

     on (p.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = p.ProductSubcategoryID))

  group by subcat.ProductCategoryID

) p

   on (p.__INJECT0 is not null and (p.__INJECT0 = cat.ProductCategoryID))

ProductCategoryID

CategoryName

__INJECT1

__INJECT2

__INJECT3

1

Bikes

97

539.99

3578.27

2

Components

134

20.24

1431.5

3

Clothing

35

8.99

89.99

4

Accessories

29

2.29

159.0

Notice that grouped join aggregation gives us better composability than traditional aggregation thanks to separation of concerns. Specifically, the "grouping" and "aggregating" concerns are now separated. The Products-by-Category procedure says nothing about aggregation; it only encodes the relationship "a ProductCategory has a group of Products." This means we can reuse it in other queries with other aggregations. And each aggregate (such as max) says nothing about grouping, but it will accept any grouped join you give it.

Also notice that we no longer have the quasi-legal select that we did in the traditional aggregation version of this query. Each select is fully legal here. This means that grouped join aggregation is usually easier to use if you are using a database that does not allow quasi-legal selects.

Finally, notice how grouped join aggregation allows us to define an aggregate expression as a function of a single argument. The caller of this function sees MinListPrice as just another property of the ProductCategory table:
(define/contract (MinListPrice cat)
  (-> (instanceof ProductCategory) Number?)
  (min (ListPrice (Products-by-Category cat))))

3.5 Summary

Grouped join aggregation is more composable than traditional aggregation, but it is also more limited. That is, grouped join aggregation can always be converted to traditional aggregation, but the reverse it not true. Let’s look at the first example of traditional aggregation again:
; Traditional aggregation, this works:
(from p Product
      (group-by (Color p))
      (select (Color p))
      (select (count p))
      (select (min (ListPrice p)))
      (select (max (ListPrice p))))
; Grouped join aggregation, this does not work
; because there is no Color table:
(from c Color
      (define productsG (Products-by-Color c))
      (select (ColorName c))
      (select (count p))
      (select (min (ListPrice p)))
      (select (max (ListPrice p))))

In the previous example, grouped join aggregation is impossible because the Color table was hypothetical; it does not exist. But we do have a ProductCategory table, which is why both styles are possible in the following example:
; traditional aggregation:
(from p Product
      (group-by (ProductCategoryID p))
      (select (ProductCategoryID p))
      ; A quasi-legal select; this won't work in some databases:
      (select (CategoryName p))
      (select (count p))
      (select (min (ListPrice p)))
      (select (max (ListPrice p))))
; grouped join aggregation:
(from cat ProductCategory
      ; All selects are fully legal here:
      (select (ProductCategoryID cat))
      (select (CategoryName cat))
      (define group-of-products
        (Products-by-Category cat))
      (select (count group-of-products))
      (select (min (ListPrice group-of-products)))
      (select (max (ListPrice group-of-products))))

In the previous example, the two styles will produce different result sets. The traditional aggregation will include one more row representing the group of Products that don’t belong to any ProductCategory. (If every Product belonged to a Category, there would be no difference.) In this situation, I recommend that you choose grouped join aggregation unless you need the following "null Category" to appear in your result set:

> (aw:show-table
   (from p Product
         (group-by (ProductCategoryID p))
         (select (ProductCategoryID p))
         (select (CategoryName p))
         (select (count p))
         (select (min (ListPrice p)))
         (select (max (ListPrice p)))))

Show TableShow SQL

select subcat.ProductCategoryID as ProductCategoryID

  , cat.Name as CategoryName

  , count(*)

  , min(p.ListPrice)

  , max(p.ListPrice)

from Product p

left join ProductSubcategory subcat

   on (p.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = p.ProductSubcategoryID))

left join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

group by subcat.ProductCategoryID

ProductCategoryID

CategoryName

count(*)

min(p.ListPrice)

max(p.ListPrice)

#<sql-null>

#<sql-null>

209

0.0

196.92

1

Bikes

97

539.99

3578.27

2

Components

134

20.24

1431.5

3

Clothing

35

8.99

89.99

4

Accessories

29

2.29

159.0