On this page:
unnest-wider
unnest-longer

12 Unnesting

Sometimes, data that’s read into a data-frame will have deeply nested structure (for example, when reading JSON directly into a data-frame). These operations convert these variables into tidy data, as described in Pivoting; with enough unnesting, these operations can turn this data into a tidy data-frame (when all hashes, vectors, lists, etc are removed).

These operations only work on "list-like" (the sequence returns one value per iteration) or "dictionary-like" (two values per iteration) sequences.

The following example data-frame will be used in this section:
> (define deep-df
    (column-df
     [character #("Toothless" "Dory" "Holly")]
     [metadata
     (vector
      (hash 'species "dragon"
            'color   "black"
            'films   (vector "How to Train Your Dragon"
                             "How to Train Your Dragon 2"
                             "How to Train Your Dragon: The Hidden World"))
      (hash 'species "blue tang"
            'color   "blue"
            'films   (vector "Finding Nemo"
                             "Finding Dory"))
      (hash 'species "glaceon"
            'color   "also blue"
            'films   #f))]))

procedure

(unnest-wider df    
  column-name    
  [#:index-prefix index-prefix    
  #:remove? remove?])  data-frame?
  df : data-frame?
  column-name : string?
  index-prefix : string? = "idx-"
  remove? : boolean? = #t
Returns a data-frame like df, except the given column-name’s data is collapsed into new columns based on its keys. column-name is expected to contain sequences and NA values.

If column-name’s data is "list-like" (returns one element per iteration), the column names will be index-prefix concatenated with its index plus one. Otherwise, the column names will be the "keys" (first element) of the sequence.

If the keys of the given sequence are not strings, they will be converted to strings with ~a.

If remove? is true, column-name will be removed from the result data-frame.

Examples:
> (define expanded-metadata
    (~> deep-df
        (unnest-wider "metadata")
        introspect))

data-frame: 3 rows x 4 columns

┌─────────┬─────────┬─────────┬─────────┐

│character│color    │films    │species  

├─────────┼─────────┼─────────┼─────────┤

│Toothless│black    │#<vector>│dragon   

├─────────┼─────────┼─────────┼─────────┤

│Dory     │blue     │#<vector>│blue tang│

├─────────┼─────────┼─────────┼─────────┤

│Holly    │also blue│#f       │glaceon  

└─────────┴─────────┴─────────┴─────────┘

> (~> expanded-metadata
      (unnest-wider "films")
      (slice ["character" "idx-1" "idx-2" "idx-3"])
      show)

data-frame: 3 rows x 4 columns

┌─────────┬───────────────┬───────────────┬───────────────┐

│character│idx-3          │idx-1          │idx-2          

├─────────┼───────────────┼───────────────┼───────────────┤

│Toothless│How to Train...│How to Train...│How to Train...│

├─────────┼───────────────┼───────────────┼───────────────┤

│Dory     │#f             │Finding Nemo   │Finding Dory   

├─────────┼───────────────┼───────────────┼───────────────┤

│Holly    │#f             │#f             │#f             

└─────────┴───────────────┴───────────────┴───────────────┘

procedure

(unnest-longer df    
  column-name    
  [#:keys-to keys-to    
  #:values-to values-to    
  #:remove? remove?])  data-frame?
  df : data-frame?
  column-name : string?
  keys-to : (or/c string? #f) = #f
  values-to : (or/c string? #f) = #f
  remove? : boolean? = #t
Returns a data-frame like df, except the given column-name’s data is collapsed into new columns: keys-to for the keys (if there are any), and values-to for the value. column-name is expected to contain sequences and NA values.

If values-to is unspecified or #f, it defaults to column-name.

If keys-to is unspecified or #f, it defaults to (string-append column-name "-keys").

If column-name’s data is "list-like" (returns one element per iteration), the column keys-to is not added, even if it specified.

If remove? is #t, column-name will be removed from the result data-frame, unless column-name is equal to values-to.

Examples:
> (~> deep-df
      (unnest-longer "metadata")
      show)

data-frame: 9 rows x 3 columns

┌─────────────┬─────────┬─────────┐

│metadata-keys│character│metadata │

├─────────────┼─────────┼─────────┤

│color        │Toothless│black    

├─────────────┼─────────┼─────────┤

│species      │Toothless│dragon   

├─────────────┼─────────┼─────────┤

│films        │Toothless│#<vector>│

├─────────────┼─────────┼─────────┤

│color        │Dory     │blue     

├─────────────┼─────────┼─────────┤

│species      │Dory     │blue tang│

├─────────────┼─────────┼─────────┤

│films        │Dory     │#<vector>│

└─────────────┴─────────┴─────────┘

3 rows, 0 cols elided

(use (show df everything #:n-rows 'all) for full frame)

> (~> expanded-metadata
      (unnest-longer "films")
      show)

data-frame: 6 rows x 4 columns

┌───────────────┬─────────┬─────────┬─────────┐

│films          │color    │species  │character│

├───────────────┼─────────┼─────────┼─────────┤

│How to Train...│black    │dragon   │Toothless│

├───────────────┼─────────┼─────────┼─────────┤

│How to Train...│black    │dragon   │Toothless│

├───────────────┼─────────┼─────────┼─────────┤

│How to Train...│black    │dragon   │Toothless│

├───────────────┼─────────┼─────────┼─────────┤

│Finding Nemo   │blue     │blue tang│Dory     

├───────────────┼─────────┼─────────┼─────────┤

│Finding Dory   │blue     │blue tang│Dory     

├───────────────┼─────────┼─────────┼─────────┤

│#f             │also blue│glaceon  │Holly    

└───────────────┴─────────┴─────────┴─────────┘