On this page:
2.1 Teaser
2.2 Getting Started
2.3 The Tasks
2.3.1 Task 1:   Subcategories & Categories
2.3.2 Task 2:   Products & Subcategories & Categories
2.3.3 Task 3:   Products with Non-Zero Sales
2.3.4 Task 4:   Sales by Product
2.3.5 Task 5:   Sales by Subcategory
2.3.6 Task 6:   Sales by Anything
2.3.7 Task 7:   Sales by Anything with Date Range
2.3.7.1 Extra Credit 1
2.3.7.2 Extra Credit 2 (More Challenging)
2.4 Appendix A:   Generating the Initial Schema Definition
8.12

2 Using define-schema🔗ℹ

For this walkthrough, you are a new employee at AdventureWorks. AdventureWorks is a fictional company that sells bicycles and related products. The company has a mature database, but the SQL that has been written so far is scattered all over the place, making it difficult to find and reuse.

Your boss is going to ask you for a series of reports. As you produce these reports, you will capture certain facts about this database using define-schema, making it easier for yourself and other programmers to find and reuse them. You will also learn how Plisqin offers opportunities for code reuse that are simply impossible using plain SQL.

2.1 Teaser🔗ℹ

Here is a preview of what you will be able to do by the end of this walkthrough. Imagine that your boss asks to find the top 3 best-selling Products of all-time, showing Product Name, Category Name, and Total Sales. Thanks to the reusable definitions you have already captured using define-schema, it is this easy:

> (aw:show-table
   (from p Product
         (limit 3)
         (select (ProductName p))
         (select (CategoryName p))
         (select (TotalSales p))
         (order-by 'desc (TotalSales p))))

Show TableShow SQL

select p.Name as ProductName

  , cat.Name as CategoryName

  , round(detailsG.__INJECT1, 2) as TotalSales

from Product p

left join (

  select detailsG.ProductID as __INJECT0

    , sum(detailsG.LineTotal) as __INJECT1

  from SalesOrderDetail detailsG

  group by detailsG.ProductID

) detailsG

   on (detailsG.__INJECT0 = p.ProductID)

left join ProductSubcategory subcat

   on (p.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = p.ProductSubcategoryID))

left join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

order by round(detailsG.__INJECT1, 2) desc

limit 3

ProductName

CategoryName

TotalSales

Mountain-200 Black, 38

Bikes

4400592.8

Mountain-200 Black, 42

Bikes

4009494.76

Mountain-200 Silver, 38

Bikes

3693678.03

Click on "Show SQL" and you might be surprised! The generated SQL is much larger than the Plisqin query. This may seem like "too much magic" right now, but once you learn how it works it is actually pretty formulaic.

2.2 Getting Started🔗ℹ

Start a new file using #lang racket. Save it as aw-schema.rkt, short for "AdventureWorks schema". Add the the following code to this file:
(require plisqin
         (prefix-in aw: plisqin-examples/adventure-works))

You should now have access to the aw:show-table procedure from your REPL. This allows you to execute a query against the SQLite database. To make sure everything is set up correctly, try asking SQLite what time it is:

> (aw:show-table "select datetime('now')")

Show TableShow SQL

select datetime('now')

datetime('now')

2024-02-07 17:44:15

If you really want to know how to automatically generate this, see Appendix A: Generating the Initial Schema Definition.

OK, the first thing we need to add to our schema definition is the table and column information. There are ways to automatically inspect the database and generate the initial schema definition, but that is not what we are interested in right now. You can just use this file as your starting aw-schema.rkt file.

Save, Run, and the following query should now work on your REPL:

> (aw:show-table
   (from pc ProductCategory
         (select (Name pc))))

Show TableShow SQL

select pc.Name as Name

from ProductCategory pc

Name

Bikes

Components

Clothing

Accessories

Now that aw-schema.rkt is somewhat large, DrRacket may take a long time to run it. For this reason, I recommend that you require it from another file, where you will do any work that does not require a change to the define-schema code. (The answer keys that I will share with you do not follow this advice.)

There are a few REPL tricks you should learn before proceeding. The first identifier we passed into define-schema was adventure-works-schema. This procedure is designed to be used at the REPL as a manual alternative to autocomplete. For example, if you want to know what procedures exist that accept a ProductCategory, you can ask using the REPL:
> (adventure-works-schema '(_ ProductCategory))

'(ModifiedDate Name ProductCategoryID rowguid)

You can also ask what types of argument the ProductCategoryID procedure accepts.
> (adventure-works-schema '(ProductCategoryID _))

'(ProductCategory ProductSubcategory)

(The previous REPL interaction is very useful for discovering relationships between tables in this database. This is one reason that I prefer primary key columns to be named like "ProductCategoryID" rather than "ID".)

There is one final REPL trick which lists all the tables. The output of this will be long, so here is the command only:

(adventure-works-schema 'tables)

2.3 The Tasks🔗ℹ

Your boss is going to assign you some tasks. For each task, you will
  1. Create a query.

  2. Repeatedly refactor that query. In this part, you will add definitions to your aw-schema.rkt file.

  3. Recap and verify that your refactorings were correct. Here I will provide an answer key that you can check.

  4. Proceed to the next task.

During refactoring, I will link you to refactoring recipes that you will follow. These recipes are meant to be very thorough. As you build confidence with the recipes, you might start taking shortcuts. This is fine. You can use the Recap to make sure you are in sync.

2.3.1 Task 1: Subcategories & Categories🔗ℹ

I want to see a list of Subcategories with the Category that they belong to.

– your boss

Remember you can use (adventure-works-schema 'tables) to list all the tables.

We need to create a query. The first step is to determine which table we need to query. The ProductSubcategory table looks promising. Let’s see what it contains:

> (aw:show-table
   (from subcat ProductSubcategory
         (limit 5)))

Show TableShow SQL

select subcat.*

from ProductSubcategory subcat

limit 5

ProductSubcategoryID

ProductCategoryID

Name

rowguid

ModifiedDate

1

1

Mountain Bikes

{2D364ADE-264A-433C-B092-4FCBF3804E01}

2008-04-30 00:00:00

2

1

Road Bikes

{000310C0-BCC8-42C4-B0C3-45AE611AF06B}

2008-04-30 00:00:00

3

1

Touring Bikes

{02C5061D-ECDC-4274-B5F1-E91D76BC3F37}

2008-04-30 00:00:00

4

2

Handlebars

{3EF2C725-7135-4C85-9AE6-AE9A3BDD9283}

2008-04-30 00:00:00

5

2

Bottom Brackets

{A9E54089-8A1E-4CF5-8646-E3801F685934}

2008-04-30 00:00:00

This is a good start. Your boss didn’t say exactly which columns he wants to see, but he did say to include "the Category". It looks like ProductCategoryID is a foreign key. (Any column that ends in "ID" is probably a primary key or foreign key.) We can check if any other tables have a ProductCategoryID as follows:
> (adventure-works-schema '(ProductCategoryID _))

'(ProductCategory ProductSubcategory)

Based on the names, we conclude that ProductCategoryID is the primary key of the ProductCategory table. And we conclude that each ProductSubcategory record points to a single ProductCategory record via the ProductCategoryID column. We need to add a join to our query:
(from subcat ProductSubcategory
      (limit 5)
      (join cat ProductCategory
            (join-on (.= (ProductCategoryID cat)
                         (ProductCategoryID subcat)))))

You can run that query, but it will produce the same result set. Even though we added a join, the query does not contain any select clauses, so it just shows the primary table by default. (The primary table is ProductSubcategory here.) Let’s add some select clauses to control which columns get displayed:

> (aw:show-table
   (from subcat ProductSubcategory
         (limit 5)
         (join cat ProductCategory
               (join-on (.= (ProductCategoryID cat)
                            (ProductCategoryID subcat))))
         (select (Name subcat))
         (select (Name cat))))

Show TableShow SQL

select subcat.Name as Name

  , cat.Name as Name

from ProductSubcategory subcat

inner join ProductCategory cat

   on (cat.ProductCategoryID = subcat.ProductCategoryID)

limit 5

Name

Name

Mountain Bikes

Bikes

Road Bikes

Bikes

Touring Bikes

Bikes

Handlebars

Components

Bottom Brackets

Components

Now the query seems to be returning the correct data. We saw the "Mountain Bikes" subcategory earlier, but now we also see that it belongs to the "Bikes" category. One obvious problem is that both columns are shown as "Name". We will immediately fix that during refactoring.

Refactoring
Use the Name Clarification recipe to create the following equivalent query:
(from subcat ProductSubcategory
      (limit 5)
      (join cat ProductCategory
            (join-on (.= (ProductCategoryID cat)
                         (ProductCategoryID subcat))))
      (select (SubcategoryName subcat))
      (select (Name cat)))

Use the Name Clarification recipe again to create the following equivalent query:
(from subcat ProductSubcategory
      (limit 5)
      (join cat ProductCategory
            (join-on (.= (ProductCategoryID cat)
                         (ProductCategoryID subcat))))
      (select (SubcategoryName subcat))
      (select (CategoryName cat)))

That looks good. But we are not done refactoring. Use the Singular Join -> Schema Definition recipe to create the following equivalent query:
(from subcat ProductSubcategory
      (limit 5)
      (join cat (ProductCategory subcat))
      (select (SubcategoryName subcat))
      (select (CategoryName cat)))

But we are still not done refactoring. Use the Inline Join recipe to create the following equivalent query:
(from subcat ProductSubcategory
      (limit 5)
      (join cat (ProductCategory subcat))
      (select (SubcategoryName subcat))
      (select (CategoryName (ProductCategory subcat))))

But we are still not done refactoring. Use the Scalar Flattening recipe to create the following equivalent query:

> (aw:show-table
   (from subcat ProductSubcategory
         (limit 5)
         (select (SubcategoryName subcat))
         (select (CategoryName subcat))))

Show TableShow SQL

select subcat.Name as SubcategoryName

  , cat.Name as CategoryName

from ProductSubcategory subcat

inner join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

limit 5

SubcategoryName

CategoryName

Mountain Bikes

Bikes

Road Bikes

Bikes

Touring Bikes

Bikes

Handlebars

Components

Bottom Brackets

Components

And now we are done!

Refactoring Recap
While refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.
(SubcategoryName ProductSubcategory)
(CategoryName ProductCategory)
(ProductCategory ProductSubcategory)
(CategoryName ProductSubcategory)

My solution for this section is here.

2.3.2 Task 2: Products & Subcategories & Categories🔗ℹ

Show me a list of Products with Subcategory and Category names.

– your boss

We need to create a query. The first step is to determine the table we need to query. The Product table looks promising. Let’s see what it contains:

> (aw:show-table
   (from prd Product
         (limit 5)))

Show TableShow SQL

select prd.*

from Product prd

limit 5

ProductID

Name

ProductNumber

MakeFlag

FinishedGoodsFlag

Color

SafetyStockLevel

ReorderPoint

StandardCost

ListPrice

Size

SizeUnitMeasureCode

WeightUnitMeasureCode

Weight

DaysToManufacture

ProductLine

Class

Style

ProductSubcategoryID

ProductModelID

SellStartDate

SellEndDate

DiscontinuedDate

rowguid

ModifiedDate

1

Adjustable Race

AR-5381

0

0

#<sql-null>

1000

750

0

0.0

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

0

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

2008-04-30 00:00:00

#<sql-null>

#<sql-null>

{694215B7-08F7-4C0D-ACB1-D734BA44C0C8}

2014-02-08 10:01:36.827000000

2

Bearing Ball

BA-8327

0

0

#<sql-null>

1000

750

0

0.0

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

0

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

2008-04-30 00:00:00

#<sql-null>

#<sql-null>

{58AE3C20-4F3A-4749-A7D4-D568806CC537}

2014-02-08 10:01:36.827000000

3

BB Ball Bearing

BE-2349

1

0

#<sql-null>

800

600

0

0.0

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

1

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

2008-04-30 00:00:00

#<sql-null>

#<sql-null>

{9C21AED2-5BFA-4F18-BCB8-F11638DC2E4E}

2014-02-08 10:01:36.827000000

4

Headset Ball Bearings

BE-2908

0

0

#<sql-null>

800

600

0

0.0

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

0

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

2008-04-30 00:00:00

#<sql-null>

#<sql-null>

{ECFED6CB-51FF-49B5-B06C-7D8AC834DB8B}

2014-02-08 10:01:36.827000000

316

Blade

BL-2036

1

0

#<sql-null>

800

600

0

0.0

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

1

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

#<sql-null>

2008-04-30 00:00:00

#<sql-null>

#<sql-null>

{E73E9750-603B-4131-89F5-3DD15ED5FF80}

2014-02-08 10:01:36.827000000

It looks like ProductSubcategoryID is a foreign key to the ProductSubcategory table. But at least some of the records have dbnull for their ProductSubcategoryID. This means that not every Product belongs to a ProductSubcategory. We create a join and use (join-type 'left) to avoid eliminating these Products from the result set:

The code (?? expr /void) says that "if expr is null, it should not be considered equal to anything." You can read more about this at Nullability.

(from prd Product
      (limit 5)
      (join subcat ProductSubcategory
            (join-type 'left)
            (join-on (.= (ProductSubcategoryID subcat)
                         (?? (ProductSubcategoryID prd) /void)))))

Now we need to add some select clauses:

> (aw:show-table
   (from prd Product
         (limit 5)
         (join subcat ProductSubcategory
               (join-type 'left)
               (join-on (.= (ProductSubcategoryID subcat)
                            (?? (ProductSubcategoryID prd) /void))))
         (select (Name prd))
         (select (ProductNumber prd))
         (select (SubcategoryName subcat))))

Show TableShow SQL

select prd.Name as Name

  , prd.ProductNumber as ProductNumber

  , subcat.Name as SubcategoryName

from Product prd

left join ProductSubcategory subcat

   on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))

limit 5

Name

ProductNumber

SubcategoryName

Adjustable Race

AR-5381

#<sql-null>

Bearing Ball

BA-8327

#<sql-null>

BB Ball Bearing

BE-2349

#<sql-null>

Headset Ball Bearings

BE-2908

#<sql-null>

Blade

BL-2036

#<sql-null>

It looks like we are on the right track. The top 5 rows have a null SubcategoryName, but this seems to be correct. If you remove the limit clause you will see some rows with non-null Subcategories. Now we just need to include the name of the Category. Hang on, we’ve worked with CategoryName in the past, haven’t we? Let’s see what it is defined for:
> (adventure-works-schema '(CategoryName _))

'(ProductCategory ProductSubcategory)

Sweet! We can see that CategoryName is defined for ProductSubcategory. We did this as part of the previous task’s refactoring. That investment pays off now, because our current query has an instance of ProductSubcategory, so we can pass it into CategoryName:

> (aw:show-table
   (from prd Product
         (limit 5)
         (join subcat ProductSubcategory
               (join-type 'left)
               (join-on (.= (ProductSubcategoryID subcat)
                            (?? (ProductSubcategoryID prd) /void))))
         (select (Name prd))
         (select (ProductNumber prd))
         (select (SubcategoryName subcat))
         (select (CategoryName subcat))))

Show TableShow SQL

select prd.Name as Name

  , prd.ProductNumber as ProductNumber

  , subcat.Name as SubcategoryName

  , cat.Name as CategoryName

from Product prd

left join ProductSubcategory subcat

   on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))

left join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

limit 5

Name

ProductNumber

SubcategoryName

CategoryName

Adjustable Race

AR-5381

#<sql-null>

#<sql-null>

Bearing Ball

BA-8327

#<sql-null>

#<sql-null>

BB Ball Bearing

BE-2349

#<sql-null>

#<sql-null>

Headset Ball Bearings

BE-2908

#<sql-null>

#<sql-null>

Blade

BL-2036

#<sql-null>

#<sql-null>

This query looks good!

Refactoring
Use the Singular Join -> Schema Definition recipe to create the following equivalent query:
(from prd Product
      (limit 5)
      (join subcat (ProductSubcategory prd))
      (select (Name prd))
      (select (ProductNumber prd))
      (select (SubcategoryName subcat))
      (select (CategoryName subcat)))

Use the Inline Join recipe to create the following equivalent query:
(from prd Product
      (limit 5)
      (join subcat (ProductSubcategory prd))
      (select (Name prd))
      (select (ProductNumber prd))
      (select (SubcategoryName (ProductSubcategory prd)))
      (select (CategoryName (ProductSubcategory prd))))

Use the Scalar Flattening recipe twice to create the following equivalent query:
(from prd Product
      (limit 5)
      (select (Name prd))
      (select (ProductNumber prd))
      (select (SubcategoryName prd))
      (select (CategoryName prd)))

Use the Name Clarification recipe to create the following equivalent query:

> (aw:show-table
   (from prd Product
         (limit 5)
         (select (ProductName prd))
         (select (ProductNumber prd))
         (select (SubcategoryName prd))
         (select (CategoryName prd))))

Show TableShow SQL

select prd.Name as ProductName

  , prd.ProductNumber as ProductNumber

  , subcat.Name as SubcategoryName

  , cat.Name as CategoryName

from Product prd

left join ProductSubcategory subcat

   on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))

left join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

limit 5

ProductName

ProductNumber

SubcategoryName

CategoryName

Adjustable Race

AR-5381

#<sql-null>

#<sql-null>

Bearing Ball

BA-8327

#<sql-null>

#<sql-null>

BB Ball Bearing

BE-2349

#<sql-null>

#<sql-null>

Headset Ball Bearings

BE-2908

#<sql-null>

#<sql-null>

Blade

BL-2036

#<sql-null>

#<sql-null>

And now we are done!

Refactoring Recap
While refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.
(ProductSubcategory Product)
(SubcategoryName Product)
(CategoryName Product)
(ProductName Product)

My solution for this section is here.

2.3.3 Task 3: Products with Non-Zero Sales🔗ℹ

Show me a list of Products that have non-zero sales, with Subcategory and Category names.

– your boss

This just adds the "non-zero sales" criteria to the previous task. Your boss explains that the Product catalog could use some culling, but for now "has sales?" is an acceptable way to filter out obsolete Products.

How do we know whether a Product has been sold? Remember our REPL tricks. We might want to see the list of tables again:

(adventure-works-schema 'tables)

It looks like the SalesOrderHeader and SalesOrderDetail tables are the main suspects. Let’s see what is defined for both of these tables:
(adventure-works-schema '(_ SalesOrderHeader))
(adventure-works-schema '(_ SalesOrderDetail))

Aha, the SalesOrderDetail table has a ProductID column. This seems to be the relationship we need to consider. The presence of at least one SalesOrderDetail means that a Product has been sold. With that in mind, we can write this query:

> (aw:show-table
   (from prd Product
         (limit 5)
         (select (ProductName prd))
         (select (ProductNumber prd))
         (select (SubcategoryName prd))
         (select (CategoryName prd))
         (where (exists (from dtl SalesOrderDetail
                              (where (.= (ProductID dtl)
                                         (ProductID prd))))))))

Show TableShow SQL

select prd.Name as ProductName

  , prd.ProductNumber as ProductNumber

  , subcat.Name as SubcategoryName

  , cat.Name as CategoryName

from Product prd

left join ProductSubcategory subcat

   on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))

left join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

where exists (

  select dtl.*

  from SalesOrderDetail dtl

  where (dtl.ProductID = prd.ProductID)

)

limit 5

ProductName

ProductNumber

SubcategoryName

CategoryName

Sport-100 Helmet, Red

HL-U509-R

Helmets

Accessories

Sport-100 Helmet, Black

HL-U509

Helmets

Accessories

Mountain Bike Socks, M

SO-B909-M

Socks

Clothing

Mountain Bike Socks, L

SO-B909-L

Socks

Clothing

Sport-100 Helmet, Blue

HL-U509-B

Helmets

Accessories

This query looks good!

Refactoring
Use the Scalar -> Schema Definition recipe to create the following equivalent query:

> (aw:show-table
   (from prd Product
         (limit 5)
         (select (ProductName prd))
         (select (ProductNumber prd))
         (select (SubcategoryName prd))
         (select (CategoryName prd))
         (where (HasSales? prd))))

Show TableShow SQL

select prd.Name as ProductName

  , prd.ProductNumber as ProductNumber

  , subcat.Name as SubcategoryName

  , cat.Name as CategoryName

from Product prd

left join ProductSubcategory subcat

   on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))

left join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

where exists (

  select dtl.*

  from SalesOrderDetail dtl

  where (dtl.ProductID = prd.ProductID)

)

limit 5

ProductName

ProductNumber

SubcategoryName

CategoryName

Sport-100 Helmet, Red

HL-U509-R

Helmets

Accessories

Sport-100 Helmet, Black

HL-U509

Helmets

Accessories

Mountain Bike Socks, M

SO-B909-M

Socks

Clothing

Mountain Bike Socks, L

SO-B909-L

Socks

Clothing

Sport-100 Helmet, Blue

HL-U509-B

Helmets

Accessories

And now we are done!

Refactoring Recap
While refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.

(HasSales? Product)

My solution for this section is here. Warning: You might notice a small error in my solution. I did not use a fallback around this, which will cause an error if anyone passes a left join of the Product table into HasSales?.

Notice the encapsulation that (HasSales? Product) provides. Today it is implemented using exists, but in the future we might decide to denormalize and add a HasSales column to the Product table. The important point is that we can choose a different implementation without breaking any calling code - the calling code will always remain (HasSales? product). This is not true in SQL - switching from an "exists" implementation to a simple column access would require updating all the call sites.

2.3.4 Task 4: Sales by Product🔗ℹ

Show me a list of the best-selling Products of all time. Sort by total revenue. Include total quantity sold and subcategory.

– your boss

So far, each query prior to refactoring has been similar to how you would write the query using plain SQL. This time, the initial query will use grouped join aggregation, a technique that is unlike anything in SQL. You don’t need to understand this right now in order to perform the refactoring, but if you are curious, the Aggregates section should be next on your reading list.

We need to create a query. The first step is to determine the table we need to query. This query will be of the Product table. But in order to get "total revenue" and "total quantity sold", we will need to use some aggregations. We recall from the previous task that the SalesOrderDetail table points to the Product table. From the other direction, we can say that each Product has a group of SalesOrderDetail records – the records whose ProductID matches the Product. We build the detailsG grouped join and perform aggregations over it to solve this task.

> (aw:show-table
   (from prd Product
         (limit 5)
         (select (ProductNumber prd))
         (select (SubcategoryName prd))
         (join detailsG SalesOrderDetail
               (join-type 'left)
               (group-by (ProductID detailsG))
               (join-on (.= (ProductID detailsG)
                            (ProductID prd))))
         (select (>> (round (sum (LineTotal detailsG)) 2)
                     #:as 'TotalSales))
         (select (>> (sum (OrderQty detailsG))
                     #:as 'TotalQtySold))
         (order-by 'desc (sum (LineTotal detailsG)))))

Show TableShow SQL

select prd.ProductNumber as ProductNumber

  , subcat.Name as SubcategoryName

  , round(detailsG.__INJECT1, 2) as TotalSales

  , detailsG.__INJECT2 as TotalQtySold

from Product prd

left join (

  select detailsG.ProductID as __INJECT0

    , sum(detailsG.LineTotal) as __INJECT1

    , sum(detailsG.OrderQty) as __INJECT2

  from SalesOrderDetail detailsG

  group by detailsG.ProductID

) detailsG

   on (detailsG.__INJECT0 = prd.ProductID)

left join ProductSubcategory subcat

   on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))

order by detailsG.__INJECT1 desc

limit 5

ProductNumber

SubcategoryName

TotalSales

TotalQtySold

BK-M68B-38

Mountain Bikes

4400592.8

2977

BK-M68B-42

Mountain Bikes

4009494.76

2664

BK-M68S-38

Mountain Bikes

3693678.03

2394

BK-M68S-42

Mountain Bikes

3438478.86

2234

BK-M68S-46

Mountain Bikes

3434256.94

2216

This query looks good!

Refactoring
Use the Grouped Join -> Schema Definition recipe to create the following equivalent query:

> (aw:show-table
   (from prd Product
         (limit 5)
         (select (ProductNumber prd))
         (select (SubcategoryName prd))
         (join detailsG (DetailsG prd))
         (select (>> (round (sum (LineTotal detailsG)) 2)
                     #:as 'TotalSales))
         (select (>> (sum (OrderQty detailsG))
                     #:as 'TotalQtySold))
         (order-by 'desc (sum (LineTotal detailsG)))))

Show TableShow SQL

select prd.ProductNumber as ProductNumber

  , subcat.Name as SubcategoryName

  , round(detailsG.__INJECT1, 2) as TotalSales

  , detailsG.__INJECT2 as TotalQtySold

from Product prd

inner join (

  select detailsG.ProductID as __INJECT0

    , sum(detailsG.LineTotal) as __INJECT1

    , sum(detailsG.OrderQty) as __INJECT2

  from SalesOrderDetail detailsG

  group by detailsG.ProductID

) detailsG

   on (detailsG.__INJECT0 = prd.ProductID)

left join ProductSubcategory subcat

   on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))

order by detailsG.__INJECT1 desc

limit 5

ProductNumber

SubcategoryName

TotalSales

TotalQtySold

BK-M68B-38

Mountain Bikes

4400592.8

2977

BK-M68B-42

Mountain Bikes

4009494.76

2664

BK-M68S-38

Mountain Bikes

3693678.03

2394

BK-M68S-42

Mountain Bikes

3438478.86

2234

BK-M68S-46

Mountain Bikes

3434256.94

2216

You might want to go further with the refactoring by using the Inline Join recipe to move detailsG inline, then using the Scalar -> Schema Definition recipe to define (TotalSales Product) and (TotalQtySold Product). This might be a good idea, and you are welcome to do so. But I am going to stop here for now.

Refactoring Recap
While refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.

(DetailsG Product)

My solution for this section is here.

2.3.5 Task 5: Sales by Subcategory🔗ℹ

Show me a list of the best-selling Subcategories of all time. Sort by total revenue. Include total quantity sold and category name.

– your boss

This task is very similar to the previous task. We need to create a query. This query will be of the ProductSubcategory table. In order to get "total revenue" and "total quantity sold", we will use grouped join aggregation over the SalesOrderDetail table just as we did in the previous task. Again, you don’t need to fully understand grouped join aggregation at this time. You can take the initial query on faith; just make sure that the refactoring makes sense to you.

> (aw:show-table
   (from subcat ProductSubcategory
         (limit 5)
         (select (SubcategoryName subcat))
         (select (CategoryName subcat))
         (join detailsG SalesOrderDetail
               (join-type 'left)
               (join prd Product
                     (join-on (.= (ProductID prd)
                                  (ProductID detailsG))))
               (group-by (ProductSubcategoryID prd))
               (join-on (.= (?? (ProductSubcategoryID prd) /void)
                            (ProductSubcategoryID subcat))))
         (select (>> (round (sum (LineTotal detailsG)) 2)
                     #:as 'TotalSales))
         (select (>> (sum (OrderQty detailsG))
                     #:as 'TotalQtySold))
         (order-by 'desc (sum (LineTotal detailsG)))))

Show TableShow SQL

select subcat.Name as SubcategoryName

  , cat.Name as CategoryName

  , round(detailsG.__INJECT1, 2) as TotalSales

  , detailsG.__INJECT2 as TotalQtySold

from ProductSubcategory subcat

left join (

  select prd.ProductSubcategoryID as __INJECT0

    , sum(detailsG.LineTotal) as __INJECT1

    , sum(detailsG.OrderQty) as __INJECT2

  from SalesOrderDetail detailsG

  inner join Product prd

     on (prd.ProductID = detailsG.ProductID)

  group by prd.ProductSubcategoryID

) detailsG

   on (detailsG.__INJECT0 is not null and (detailsG.__INJECT0 = subcat.ProductSubcategoryID))

inner join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

order by detailsG.__INJECT1 desc

limit 5

SubcategoryName

CategoryName

TotalSales

TotalQtySold

Road Bikes

Bikes

43909437.51

47196

Mountain Bikes

Bikes

36445443.94

28321

Touring Bikes

Bikes

14296291.26

14751

Mountain Frames

Components

4713930.23

11621

Road Frames

Components

3851350.6

11753

This query looks good!

Refactoring
Use the Singular Join -> Schema Definition recipe to create the following equivalent query:
(from subcat ProductSubcategory
      (limit 5)
      (select (SubcategoryName subcat))
      (select (CategoryName subcat))
      (join detailsG SalesOrderDetail
            (join-type 'left)
            (join prd (Product detailsG))
            (group-by (ProductSubcategoryID prd))
            (join-on (.= (?? (ProductSubcategoryID prd) /void)
                         (ProductSubcategoryID subcat))))
      (select (>> (round (sum (LineTotal detailsG)) 2)
                  #:as 'TotalSales))
      (select (>> (sum (OrderQty detailsG))
                  #:as 'TotalQtySold))
      (order-by 'desc (sum (LineTotal detailsG))))

Use the Inline Join recipe to create the following equivalent query:
(from subcat ProductSubcategory
      (limit 5)
      (select (SubcategoryName subcat))
      (select (CategoryName subcat))
      (join detailsG SalesOrderDetail
            (join-type 'left)
            (join prd (Product detailsG))
            (group-by (ProductSubcategoryID (Product detailsG)))
            (join-on (.= (?? (ProductSubcategoryID (Product detailsG)) /void)
                         (ProductSubcategoryID subcat))))
      (select (>> (round (sum (LineTotal detailsG)) 2)
                  #:as 'TotalSales))
      (select (>> (sum (OrderQty detailsG))
                  #:as 'TotalQtySold))
      (order-by 'desc (sum (LineTotal detailsG))))

Use the Scalar Flattening recipe to create the following equivalent query:
(from subcat ProductSubcategory
      (limit 5)
      (select (SubcategoryName subcat))
      (select (CategoryName subcat))
      (join detailsG SalesOrderDetail
            (join-type 'left)
            (group-by (ProductSubcategoryID detailsG))
            (join-on (.= (?? (ProductSubcategoryID detailsG) /void)
                         (ProductSubcategoryID subcat))))
      (select (>> (round (sum (LineTotal detailsG)) 2)
                  #:as 'TotalSales))
      (select (>> (sum (OrderQty detailsG))
                  #:as 'TotalQtySold))
      (order-by 'desc (sum (LineTotal detailsG))))

Use the Grouped Join -> Schema Definition recipe to create the following equivalent query:

> (aw:show-table
   (from subcat ProductSubcategory
         (limit 5)
         (select (SubcategoryName subcat))
         (select (CategoryName subcat))
         (join detailsG (DetailsG subcat))
         (select (>> (round (sum (LineTotal detailsG)) 2)
                     #:as 'TotalSales))
         (select (>> (sum (OrderQty detailsG))
                     #:as 'TotalQtySold))
         (order-by 'desc (sum (LineTotal detailsG)))))

Show TableShow SQL

select subcat.Name as SubcategoryName

  , cat.Name as CategoryName

  , round(detailsG.__INJECT1, 2) as TotalSales

  , detailsG.__INJECT2 as TotalQtySold

from ProductSubcategory subcat

inner join (

  select prd.ProductSubcategoryID as __INJECT0

    , sum(detailsG.LineTotal) as __INJECT1

    , sum(detailsG.OrderQty) as __INJECT2

  from SalesOrderDetail detailsG

  inner join Product prd

     on (prd.ProductID = detailsG.ProductID)

  group by prd.ProductSubcategoryID

) detailsG

   on (detailsG.__INJECT0 is not null and (detailsG.__INJECT0 = subcat.ProductSubcategoryID))

inner join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

order by detailsG.__INJECT1 desc

limit 5

SubcategoryName

CategoryName

TotalSales

TotalQtySold

Road Bikes

Bikes

43909437.51

47196

Mountain Bikes

Bikes

36445443.94

28321

Touring Bikes

Bikes

14296291.26

14751

Mountain Frames

Components

4713930.23

11621

Road Frames

Components

3851350.6

11753

Again, you might want to go further with the refactoring and define properties like (TotalSales ProductSubcategory) and (TotalQtySold ProductSubcategory). This might be a good idea, and you are welcome to do so. But I am going to stop here for now.

Refactoring Recap
While refactoring our query, we made the following enhancements to our schema. You can try the following on your REPL and verify that no error is raised.
(Product SalesOrderDetail)
(ProductSubcategoryID SalesOrderDetail)
(DetailsG ProductSubcategory)

My solution for this section is here.

2.3.6 Task 6: Sales by Anything🔗ℹ

Let’s look at the previous two tasks.

Show me a list of the best-selling Products of all time. Sort by total revenue. Include total quantity sold and subcategory.

– your boss

Show me a list of the best-selling Subcategories of all time. Sort by total revenue. Include total quantity sold and category name.

– your boss

We can generalize this to Show me a list of the best-selling THINGS of all time. Sort by total revenue. Include total quantity sold and SOME_OTHER_STUFF.

Let’s look at where we left the previous two tasks:
; Sales by Product:
(from prd Product
      (limit 5)
      (select (ProductNumber prd))
      (select (SubcategoryName prd))
      (join detailsG (DetailsG prd))
      (select (>> (round (sum (LineTotal detailsG)) 2)
                  #:as 'TotalSales))
      (select (>> (sum (OrderQty detailsG))
                  #:as 'TotalQtySold))
      (order-by 'desc (sum (LineTotal detailsG))))
; Sales by Subcategory:
(from subcat ProductSubcategory
      (limit 5)
      (select (SubcategoryName subcat))
      (select (CategoryName subcat))
      (join detailsG (DetailsG subcat))
      (select (>> (round (sum (LineTotal detailsG)) 2)
                  #:as 'TotalSales))
      (select (>> (sum (OrderQty detailsG))
                  #:as 'TotalQtySold))
      (order-by 'desc (sum (LineTotal detailsG))))

Notice that the last few clauses of both queries are identical. We could easily remove this duplication using a macro, but we can also use a regular procedure if we recognize that queries are appendable. By "appendable" I mean that a query has a list of clauses, and you can easily add more clauses to an existing query. The following example demonstrates this.
> (define original
    (from p Product
          (select (ProductNumber p))
          (select (ProductName p))
          (select (SubcategoryName p))
          (select (CategoryName p))))
> (define first-half
    (from p Product
          (select (ProductNumber p))
          (select (ProductName p))))
> (define both-halves
    ; This appends two more clauses to `first-half`
    (from x first-half
          (select (SubcategoryName x))
          (select (CategoryName x))))
> (equal? original both-halves)

#t

With this pattern in mind, we can make a procedure sales-report which accepts a query and appends some clauses to it. We will append the clauses that were common to the two previous tasks:
(define (sales-report first-half)
  (from x first-half
        (limit 5)
        (join detailsG (DetailsG x))
        (select (>> (round (sum (LineTotal detailsG)) 2)
                    #:as 'TotalSales))
        (select (>> (sum (OrderQty detailsG))
                    #:as 'TotalQtySold))
        (order-by 'desc (sum (LineTotal detailsG)))))

Now we can use this procedure to reimplement the Sales by Product report. We just have to pass in the first-half as follows:

> (aw:show-table
   (sales-report (from prd Product
                       (select (ProductNumber prd))
                       (select (SubcategoryName prd)))))

Show TableShow SQL

select prd.ProductNumber as ProductNumber

  , subcat.Name as SubcategoryName

  , round(detailsG.__INJECT1, 2) as TotalSales

  , detailsG.__INJECT2 as TotalQtySold

from Product prd

inner join (

  select detailsG.ProductID as __INJECT0

    , sum(detailsG.LineTotal) as __INJECT1

    , sum(detailsG.OrderQty) as __INJECT2

  from SalesOrderDetail detailsG

  group by detailsG.ProductID

) detailsG

   on (detailsG.__INJECT0 = prd.ProductID)

left join ProductSubcategory subcat

   on (prd.ProductSubcategoryID is not null and (subcat.ProductSubcategoryID = prd.ProductSubcategoryID))

order by detailsG.__INJECT1 desc

limit 5

ProductNumber

SubcategoryName

TotalSales

TotalQtySold

BK-M68B-38

Mountain Bikes

4400592.8

2977

BK-M68B-42

Mountain Bikes

4009494.76

2664

BK-M68S-38

Mountain Bikes

3693678.03

2394

BK-M68S-42

Mountain Bikes

3438478.86

2234

BK-M68S-46

Mountain Bikes

3434256.94

2216

We can do the same thing for the Sales by Subcategory report:

> (aw:show-table
   (sales-report (from subcat ProductSubcategory
                       (select (SubcategoryName subcat))
                       (select (CategoryName subcat)))))

Show TableShow SQL

select subcat.Name as SubcategoryName

  , cat.Name as CategoryName

  , round(detailsG.__INJECT1, 2) as TotalSales

  , detailsG.__INJECT2 as TotalQtySold

from ProductSubcategory subcat

inner join (

  select prd.ProductSubcategoryID as __INJECT0

    , sum(detailsG.LineTotal) as __INJECT1

    , sum(detailsG.OrderQty) as __INJECT2

  from SalesOrderDetail detailsG

  inner join Product prd

     on (prd.ProductID = detailsG.ProductID)

  group by prd.ProductSubcategoryID

) detailsG

   on (detailsG.__INJECT0 is not null and (detailsG.__INJECT0 = subcat.ProductSubcategoryID))

inner join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

order by detailsG.__INJECT1 desc

limit 5

SubcategoryName

CategoryName

TotalSales

TotalQtySold

Road Bikes

Bikes

43909437.51

47196

Mountain Bikes

Bikes

36445443.94

28321

Touring Bikes

Bikes

14296291.26

14751

Mountain Frames

Components

4713930.23

11621

Road Frames

Components

3851350.6

11753

Interesting! The sales-report procedure accepts a query of any table for which DetailsG is defined! It appends some more clauses to create a larger query.

Refactoring Recap

My solution for this section is here.

We didn’t add anything to our schema definition in this section.

2.3.7 Task 7: Sales by Anything with Date Range🔗ℹ

What else could we do with the sales-report? Right now it is showing all-time sales. We could modify it to accept a time window of sales to consider as follows:
(define (sales-report first-half
                      #:start-date [start-date #f]
                      #:end-date [end-date #f])
  (from x first-half
        (limit 5)
        (join detailsG (DetailsG x)
              ; Like queries, joins are also appendable.
              ; The following clauses are appended to the join
              ; that (DetailsG x) returned:
              (join soh SalesOrderHeader
                    (join-on (.= (SalesOrderID soh)
                                 (SalesOrderID detailsG))))
              (when start-date
                (where (.>= (OrderDate soh)
                            start-date)))
              (when end-date
                (where (.< (OrderDate soh)
                           end-date))))
        (select (>> (round (sum (LineTotal detailsG)) 2)
                    #:as 'TotalSales))
        (select (>> (sum (OrderQty detailsG))
                    #:as 'TotalQtySold))
        (order-by 'desc (sum (LineTotal detailsG)))))

> (aw:show-table
   (sales-report
    #:start-date (val "2012-01-01" Datetime?)
    #:end-date (val "2013-01-01" Datetime?)
    (from subcat ProductSubcategory
          (select (SubcategoryName subcat))
          (select (CategoryName subcat)))))

Show TableShow SQL

select subcat.Name as SubcategoryName

  , cat.Name as CategoryName

  , round(detailsG.__INJECT1, 2) as TotalSales

  , detailsG.__INJECT2 as TotalQtySold

from ProductSubcategory subcat

inner join (

  select prd.ProductSubcategoryID as __INJECT0

    , sum(detailsG.LineTotal) as __INJECT1

    , sum(detailsG.OrderQty) as __INJECT2

  from SalesOrderDetail detailsG

  inner join SalesOrderHeader soh

     on (soh.SalesOrderID = detailsG.SalesOrderID)

  inner join Product prd

     on (prd.ProductID = detailsG.ProductID)

  where (soh.OrderDate >= '2012-01-01')

    and (soh.OrderDate < '2013-01-01')

  group by prd.ProductSubcategoryID

) detailsG

   on (detailsG.__INJECT0 is not null and (detailsG.__INJECT0 = subcat.ProductSubcategoryID))

inner join ProductCategory cat

   on (subcat.ProductCategoryID is not null and (cat.ProductCategoryID = subcat.ProductCategoryID))

order by detailsG.__INJECT1 desc

limit 5

SubcategoryName

CategoryName

TotalSales

TotalQtySold

Road Bikes

Bikes

17193556.04

19460

Mountain Bikes

Bikes

11791959.8

9034

Road Frames

Components

1692467.03

5564

Mountain Frames

Components

1541340.67

3168

Wheels

Components

492218.86

3802

Recap

My solution for this section is here.

We didn’t add anything to our schema definition in this section.

2.3.7.1 Extra Credit 1🔗ℹ

Apply the appropriate refactoring recipies so that OrderDate is defined for SalesOrderDetail. This allows you to refactor sales-report as follows:
(define (sales-report some-query
                      #:start-date [start-date #f]
                      #:end-date [end-date #f])
  (from x some-query
        (limit 5)
        (join detailsG (DetailsG x)
              (when start-date
                (where (.>= (OrderDate detailsG)
                            start-date)))
              (when end-date
                (where (.< (OrderDate detailsG)
                           end-date))))
        (select (>> (round (sum (LineTotal detailsG)) 2)
                    #:as 'TotalSales))
        (select (>> (sum (OrderQty detailsG))
                    #:as 'TotalQtySold))
        (order-by 'desc (sum (LineTotal detailsG)))))

Hint: First use the Singular Join -> Schema Definition recipe to define

(SalesOrderHeader SalesOrderDetail)

Next use the Inline Join recipe, and finally use the Scalar Flattening recipe to define

(OrderDate SalesOrderDetail)

Answer Key

My solution for this section is here.

2.3.7.2 Extra Credit 2 (More Challenging)🔗ℹ

Extra Credit: Extend the definition of DetailsG so that it is defined for the ProductCategory, SalesTerritory, and SalesPerson tables. Try using sales-report with these tables. Here are some hints:

The primary key of the ProductCategory table is ProductCategoryID, so you will need to group the SalesOrderDetail records by ProductCategoryID. You will need to join from SalesOrderDetail to Product to ProductSubcategory. The ProductSubcategory table has a ProductCategoryID column.

The primary key of the SalesTerritory table is TerritoryID, so you will need to group the SalesOrderDetail records by TerritoryID. You will need to join from SalesOrderDetail to SalesOrderHeader. The SalesOrderHeader table has a TerritoryID column.

The primary key of the SalesPerson table is BusinessEntityID. There is a foreign key from SalesOrderHeader.SalesPersonID to SalesPerson.BusinessEntityID. You probably have already defined the join from SalesOrderDetail to SalesOrderHeader. This is sufficient to solve the task, but the SalesPerson table does not contain much information. If you want to see, for example, the full name of the SalesPerson you will have to join to the Person table.

Answer Keys
Part 1: My definition of (DetailsG ProductCategory) is here.

Part 2: My definition of (DetailsG SalesTerritory) is here.

Part 3: My definition of (DetailsG SalesPerson) is here.

2.4 Appendix A: Generating the Initial Schema Definition🔗ℹ

Extracting table and column information for use with define-schema is currently an ad-hoc process. Plisqin may make this easier in the future. But it is already pretty easy if you are using SQL Server or PostgreSQL.

The AdventureWorks database is originally for SQL Server; Plisqin provides a port of this database to SQLite. So to generate the initial AdventureWorks schema, I ran this query against the SQL Server original. This produces Racket code that just needs some simple text massaging.

If you have no choice but to use SQLite, you would probably have to write a small program with some loops that uses "sqlite_master" and "pragma table_info". But the following queries show that the data is available:

> (aw:show-table
   "select type, name from sqlite_master where name like 'Business%'")

Show TableShow SQL

select type, name from sqlite_master where name like 'Business%'

type

name

table

BusinessEntity

table

BusinessEntityAddress

table

BusinessEntityContact

> (aw:show-table
   "pragma table_info(BusinessEntity)")

Show TableShow SQL

pragma table_info(BusinessEntity)

cid

name

type

notnull

dflt_value

pk

0

BusinessEntityID

integer_id

1

#<sql-null>

0

1

rowguid

text

1

#<sql-null>

0

2

ModifiedDate

text

1

#<sql-null>

0