8.2

## fuzzy-search

This module provides approximate string matching procedures loosely based on Sublime Text’s approach.

### 1Definitions

In the context of this collection, the following terms apply:

• needle: A user-defined string used as an input search pattern.

• haystack: A string that may contain needles.

• match: A position in a haystack where a needle’s character appears.

• score: A user-defined exact integer representing how “well” a needle matches a haystack.

• scoring procedure: A procedure that returns a score.

### 2Approximate Matching

 (require fuzzy-search) package: fuzzy-search

procedure

 (fuzzy-search needle haystack [ assign-score #:max-match-count max-match-count #:max-depth max-depth #:case-sensitive? case-sensitive?])

boolean?
exact-integer?
 (hash/c exact-nonnegative-integer? exact-nonnegative-integer? #:immutable #t)
needle : string?
haystack : string?
assign-score : score/c = forrest-skinner
max-match-count : exact-positive-integer? = 256
max-depth : exact-positive-integer? = 10
case-sensitive? : any/c = #f
Searches haystack for the given needle. Returns three values:

1. #t if a match was found.

2. The value of (assign-score H haystack (string-length haystack) (length (hash-keys H))), where H is the returned hash table.

3. A hash table H, such that (string-ref haystack (hash-ref H N)) returns the first character of the Nth (zero-based) match. This hash table will produce the highest score for the given haystack.

The search will record a match for every contiguous character in needle found in haystack. If the number of matches exceeds max-match-count, then the search will conclude as if no match occurred, and with a score of 0.

Recursive searches in substrings of haystack will commence in order to find higher scoring matches. max-depth limits the recursion depth of the search. While searches are themselves called in tail position, max-depth can still help reduce the runtime of the search.

When case-sensitive? is #f, characters from the needle and haystack are first normalized with char-downcase before being compared.

value

score/c :
 (-> (hash/c exact-nonnegative-integer? exact-nonnegative-integer? #:immutable #t) string? exact-positive-integer? exact-positive-integer? exact-integer?)
Defines a scoring procedure that assigns a score with awareness of all matches in a haystack.

This procedure accepts the following arguments, in order:

1. An immutable hasheq. The meaning of the hash is the same as it is in fuzzy-search.

2. A haystack.

3. The length of the haystack.

4. The number of matches found in the haystack.

value

score/match/c :
 (-> (hash/c exact-nonnegative-integer? exact-nonnegative-integer? #:immutable #t) string? exact-positive-integer? exact-positive-integer? exact-integer? exact-integer? exact-integer?)
Defines a scoring procedure that assigns a score with awareness of a particular match in a haystack.

Like score/c, but accepts two more arguments that represent a key and the associated value from the hash table. This allows per-occurrence scoring when used with score/match.

 value
Defines a scoring procedure that assigns a score in context of a character that MAY appear in a match, followed by a character that DOES appear in a match.

 value
A scoring procedure based on Forrest Smith’s approximation of Sublime Text’s behavior.

 (define score/forrest (score/all (λ _ 100) (score/all/leading-letter -5 -15) (score/all/unmatched-letter -1) (score/match (score/match/sequence 15) (score/match/first-letter 15) (score/match/pair (score/pair/camel-case 30) (score/pair/prefixed 30 '(#\space #\_))))))

 syntax(score-if t v)
Expands to (if t v 0). Useful for scoring a match conditionally in expression position of a sum.

 procedure(score/all procs ...) → score/c procs : score/c
Equivalent to:

 (λ args (for/sum ([f (in-list procs)]) (apply f args)))

Useful for combining several score/c procedures.

 procedure v : exact-integer?
Returns a scoring procedure. That procedure returns v times the number of unmatched characters in a search.

 procedure(score/all/leading-letter v limit) → score/c v : exact-integer? limit : exact-positive-integer?
Returns a scoring procedure. That procedure returns a score of v times the position of the first matching character in a haystack, capped to a magnitude of limit.

Use to base a penalty or reward on how far the first match is from the beginning of its haystack.

 procedure(score/match procedures ...) → score/c procedures : score/match/c
Like score/all, except the given procedures must accept two more arguments according to score/match/c. Those procedures are used to assign scores to individual matches.

 procedure v : exact-integer?
Returns a scoring procedure for use in score/match.

The scoring procedure assigns a score of v if a given match (other than the first) starts immediately after the prior match in their haystack.

 procedure v : exact-integer?
Returns a scoring procedure for use in score/match.

That procedure returns a score of v if the given match starts at the beginning of the haystack.

 procedure(score/match/pair procedures ...) → score/match/c procedures : score/pair/c
Like score/all, except the procedures are meant to assign scores based on adjacent characters in the haystack.

The returned procedure is also meant for use in score/match.

 procedure v : exact-integer?
Returns a scoring procedure for use in score/match/pair.

That procedure returns a score of v if a given pair of characters changes from lowercase to uppercase. It returns a score of 0 otherwise.

 procedure(score/pair/prefixed v chars) → score/pair/c v : exact-integer? chars : (listof char?)
Returns a scoring procedure for use in score/match/pair.

That procedure returns a score of v if the first of a given pair of characters appears in chars. It returns a score of 0 otherwise.

### 3Fuzzy Requires

 (require fuzzy-search/require) package: fuzzy-search

 syntax(fuzzy needle ...)
When used as a require subform, fuzzy expands to (combine-in (file relative-path) ...), where each relative-path is the highest-scoring relative path in proximate directories to a given Racket module.

The filesystem search entails use of find-files starting from the directory of the module using fuzzy, or (current-directory) if the source directory is unknown. Paths relative to the file using fuzzy are then treated as haystacks for fuzzy-search.

This is useful if you are prototyping a Racket package and are likely to shuffle files around. Normally when you move a file, all relative path require subforms will break. fuzzy forms may break. A break would occur if a needle targeting a file becomes ambiguous, or if the intended file moves out of the search scope.

 (require fuzzy-search/require (fuzzy "legacy"    ; legacy.rkt "widgets")) ; utils/widgets.rkt

Do not use fuzzy in user-facing code. Use it only to delay your commitment to concrete file paths. Even then, be careful not to use overly vague needles. (require (fuzzy "a")) is a dice roll.

### 4Credits and Project Information

The implementation is based on Forrest Smith’s work with Sublime Text’s fuzzy search. That link will take you to his content and public domain source code. Thanks, Forrest!

You can find the source code of this package here.

My own contributions are as follows:

• Port the JavaScript implementation of Forrest’s fuzzy match to Racket.

• Refactor the code to allow scoring procedures and combinators as part of the public API.

• Make case-sensitivity an option.

• Expose match positions in program output.

• Package for distribution on the Racket catalog.

• Add the fuzzy require subform.

If you found my additions useful, then please consider sponsoring my work. I count on your support to continue writing libraries like this one.