fuzzy-search
1 Definitions
2 Approximate Matching
fuzzy-search
score/  c
score/  match/  c
score/  pair/  c
score/  forrest
score-if
score/  all
score/  all/  unmatched-letter
score/  all/  leading-letter
score/  match
score/  match/  sequence
score/  match/  first-letter
score/  match/  pair
score/  pair/  camel-case
score/  pair/  prefixed
3 Fuzzy Requires
fuzzy
4 Credits and Project Information
7.8

fuzzy-search

Forrest Smith (90%), Sage Gerard (10%)

This module provides approximate string matching procedures loosely based on Sublime Text’s approach.

1 Definitions

In the context of this collection, the following terms apply:

2 Approximate Matching

 (require fuzzy-search) package: fuzzy-search

procedure

(fuzzy-search needle 
  haystack 
  [assign-score 
  #:max-match-count max-match-count 
  #:max-depth max-depth 
  #:case-sensitive? case-sensitive?]) 
  
boolean?
exact-integer?
(hash/c exact-nonnegative-integer?
        exact-nonnegative-integer?
        #:immutable #t)
  needle : string?
  haystack : string?
  assign-score : score/c = forrest-skinner
  max-match-count : exact-positive-integer? = 256
  max-depth : exact-positive-integer? = 10
  case-sensitive? : any/c = #f
Searches haystack for the given needle. Returns three values:

  1. #t if a match was found.

  2. The value of (assign-score H haystack (string-length haystack) (length (hash-keys H))), where H is the returned hash table.

  3. A hash table H, such that (string-ref haystack (hash-ref H N)) returns the first character of the Nth (zero-based) match. This hash table will produce the highest score for the given haystack.

The search will record a match for every contiguous character in needle found in haystack. If the number of matches exceeds max-match-count, then the search will conclude as if no match occurred, and with a score of 0.

Recursive searches in substrings of haystack will commence in order to find higher scoring matches. max-depth limits the recursion depth of the search. While searches are themselves called in tail position, max-depth can still help reduce the runtime of the search.

When case-sensitive? is #f, characters from the needle and haystack are first normalized with char-downcase before being compared.

value

score/c : 
(-> (hash/c exact-nonnegative-integer?
            exact-nonnegative-integer?
            #:immutable #t)
    string?
    exact-positive-integer?
    exact-positive-integer?
    exact-integer?)
Defines a scoring procedure that assigns a score with awareness of all matches in a haystack.

This procedure accepts the following arguments, in order:

  1. An immutable hasheq. The meaning of the hash is the same as it is in fuzzy-search.

  2. A haystack.

  3. The length of the haystack.

  4. The number of matches found in the haystack.

value

score/match/c : 
(-> (hash/c exact-nonnegative-integer?
            exact-nonnegative-integer?
            #:immutable #t)
    string?
    exact-positive-integer?
    exact-positive-integer?
    exact-integer?
    exact-integer?
    exact-integer?)
Defines a scoring procedure that assigns a score with awareness of a particular match in a haystack.

Like score/c, but accepts two more arguments that represent a key and the associated value from the hash table. This allows per-occurrence scoring when used with score/match.

Defines a scoring procedure that assigns a score in context of a character that MAY appear in a match, followed by a character that DOES appear in a match.

A scoring procedure based on Forrest Smith’s approximation of Sublime Text’s behavior.

(define score/forrest
  (score/all
   (λ _ 100)
   (score/all/leading-letter -5 -15)
   (score/all/unmatched-letter -1)
   (score/match
    (score/match/sequence 15)
    (score/match/first-letter 15)
    (score/match/pair
     (score/pair/camel-case 30)
     (score/pair/prefixed 30 '(#\space #\_))))))

syntax

(score-if t v)

Expands to (if t v 0). Useful for scoring a match conditionally in expression position of a sum.

procedure

(score/all procs ...)  score/c

  procs : score/c
Equivalent to:

(λ args
  (for/sum ([f (in-list procs)])
    (apply f args)))

Useful for combining several score/c procedures.

Returns a scoring procedure. That procedure returns v times the number of unmatched characters in a search.

procedure

(score/all/leading-letter v limit)  score/c

  v : exact-integer?
  limit : exact-positive-integer?
Returns a scoring procedure. That procedure returns a score of v times the position of the first matching character in a haystack, capped to a magnitude of limit.

Use to base a penalty or reward on how far the first match is from the beginning of its haystack.

procedure

(score/match procedures ...)  score/c

  procedures : score/match/c
Like score/all, except the given procedures must accept two more arguments according to score/match/c. Those procedures are used to assign scores to individual matches.

procedure

(score/match/sequence v ...)  score/match/c

  v : exact-integer?
Returns a scoring procedure for use in score/match.

The scoring procedure assigns a score of v if a given match (other than the first) starts immediately after the prior match in their haystack.

Returns a scoring procedure for use in score/match.

That procedure returns a score of v if the given match starts at the beginning of the haystack.

procedure

(score/match/pair procedures ...)  score/match/c

  procedures : score/pair/c
Like score/all, except the procedures are meant to assign scores based on adjacent characters in the haystack.

The returned procedure is also meant for use in score/match.

Returns a scoring procedure for use in score/match/pair.

That procedure returns a score of v if a given pair of characters changes from lowercase to uppercase. It returns a score of 0 otherwise.

procedure

(score/pair/prefixed v chars)  score/pair/c

  v : exact-integer?
  chars : (listof char?)
Returns a scoring procedure for use in score/match/pair.

That procedure returns a score of v if the first of a given pair of characters appears in chars. It returns a score of 0 otherwise.

3 Fuzzy Requires

 (require fuzzy-search/require) package: fuzzy-search

syntax

(fuzzy needle ...)

When used as a require subform, fuzzy expands to (combine-in (file relative-path) ...), where each relative-path is the highest-scoring relative path in proximate directories to a given Racket module.

The filesystem search entails use of find-files starting from the directory of the module using fuzzy, or (current-directory) if the source directory is unknown. Paths relative to the file using fuzzy are then treated as haystacks for fuzzy-search.

This is useful if you are prototyping a Racket package and are likely to shuffle files around. Normally when you move a file, all relative path require subforms will break. fuzzy forms may break. A break would occur if a needle targeting a file becomes ambiguous, or if the intended file moves out of the search scope.

(require fuzzy-search/require
         (fuzzy "legacy"    ; legacy.rkt
                "widgets")) ; utils/widgets.rkt

Do not use fuzzy in user-facing code. Use it only to delay your commitment to concrete file paths. Even then, be careful not to use overly vague needles. (require (fuzzy "a")) is a dice roll.

4 Credits and Project Information

The implementation is based on Forrest Smith’s work with Sublime Text’s fuzzy search. That link will take you to his content and public domain source code. Thanks, Forrest!

You can find the source code of this package here.

My own contributions are as follows:

If you found my additions useful, then please consider sponsoring my work. I count on your support to continue writing libraries like this one.