String Tools
Jens Axel Søgaard <jensaxel@soegaard.net>
| (require string-tools) | package: string-tools-lib |
1 String Functions
This section documents string-processing procedures provided by this package. They focus on practical operations such as splitting, substring counting, and character counting with explicit index bounds.
The procedures are intended to complement racket/string, so a typical setup is:
(require racket/string string-tools)
Procedures that are close in purpose to existing Racket procedures use distinct names (for example, string-at instead of string-ref) to avoid name clashes. Some procedure names follow the SRFI 13 naming tradition. If you need both libraries at once, import SRFI 13 with a prefix (for example, with prefix-in) so that its names do not collide with the ones provided here.
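A minimal sketch of such a prefixed import, assuming the string-tools package is installed. The prefix srfi: is an arbitrary name chosen here for illustration.

```racket
#lang racket/base
;; The prefix `srfi:` is an illustrative choice, not mandated by either library.
(require string-tools
         (prefix-in srfi: srfi/13))

;; Both libraries define string-index; the prefix keeps them apart.
(string-index "banana" #\a)       ; string-tools version
(srfi:string-index "banana" #\a)  ; SRFI 13 version
```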
Before diving into the individual procedures, consider skimming Extended Examples. It includes one example on parsing and analyzing structured log lines, one example on cleaning and validating CSV-like imported rows, and one example on normalizing and patching an INI-like configuration text.
1.1 Conventions
These conventions apply throughout the string procedures in this section.
For procedures that accept start and end, negative indices count from the end of the string, and indices are clamped to valid bounds.
start is included and end is not included when selecting a substring.
A value of -1 denotes the index of the last character. Because end is not included, using end as -1 stops just before the last character.
In string-slice/step, #f for start or end means the bound is omitted and defaults according to the step direction.
In procedures that accept a character matcher, a matcher may be a character, a character set, a string (treated as a character set), or a unary predicate on characters.
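The index conventions above can be modelled in a few lines of plain Racket. This is a sketch of the documented behaviour, not the library's implementation; normalize-index and slice-model are hypothetical names introduced only for illustration.

```racket
#lang racket/base
;; Hypothetical helper: map a possibly negative, possibly out-of-range
;; index to a valid position in [0, len].
(define (normalize-index i len)
  (define j (if (< i 0) (+ len i) i)) ; negative indices count from the end
  (max 0 (min len j)))                ; clamp to the string bounds

;; Hypothetical model of a half-open slice: start included, end excluded.
(define (slice-model s [start 0] [end (string-length s)])
  (define len (string-length s))
  (define a (normalize-index start len))
  (define b (normalize-index end len))
  (if (<= b a) "" (substring s a b)))

(slice-model "abcdef" -3 -1)    ; "de": -1 names the last character, which is excluded
(slice-model "abcdef" -100 100) ; "abcdef": both bounds are clamped
(slice-model "abcdef" 4 2)      ; "": end is not after start
```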
1.2 Function Index
Use this overview as a quick map from task to procedure family.
| Access | string-at |
| Slicing | string-slice string-slice/step |
| Split/Replace | string-split-at string-replace-range |
| Counting | string-count string-count-lines |
| Needles | string-count-needle |
| Index/Skip | string-index string-index-right string-skip string-skip-right |
| Trimming | string-trim-both string-trim-left string-trim-right |
| Needle Search | string-find-needle string-find-last-needle string-find-all-needle |
| Partitioning | string-partition string-partition-right string-between |
| Normalize | string-remove-prefix string-remove-suffix string-ensure-prefix string-ensure-suffix |
| Common Parts | string-common-prefix string-common-suffix |
| Line Ops | string-lines string-line-start-indices string-normalize-newlines string-chomp string-chop-newline string-ensure-ends-with-newline |
| Tabs/Width | string-expand-tabs string-display-width |
| Case/Map | string-capitalize string-swapcase string-map string-map! |
| Transform | string-repeat string-reverse string-rot13 string-pluralize string-singularize string-intersperse |
| Quoting | string-quote string-unquote |
| Visible Escapes | string-escape-visible string-unescape-visible |
| JSON/Regexp/ANSI | string-escape-json string-unescape-json string-escape-regexp string-strip-ansi string-squeeze |
| Tokenize/Fields | string-tokenize string-fields |
| Scan | string-scan |
| Layout | string-wrap string-indent string-dedent string-elide |
| Metrics | string-levenshtein string-jaro-winkler string-similarity |
| Predicates | string-blank? string-ascii? string-digit? |
1.3 Splitting and Slicing
This subsection covers positional extraction and replacement operations, from safe single-character access to stepped slicing and split-at-index workflows.
ℹ️ Think of string-slice as a nicer substring.
procedure
(string-slice s [start end]) → string?
s : string? start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices are normalized and clamped to the string bounds: negative indices count backward from the end, and out-of-range indices are clamped to valid positions. If the normalized end is less than or equal to the normalized start, the result is the empty string.
> (string-slice "abcdef") "abcdef"
> (string-slice "abcdef" 1 4) "bcd"
> (string-slice "abcdef" -3 -1) "de"
> (string-slice "abcdef" -100 100) "abcdef"
> (string-slice "abcdef" 4 2) ""
Related: string-slice/step, string-at.
procedure
(string-slice/step s [start end step]) → string?
s : string? start : (or/c exact-integer? #f) = #f end : (or/c exact-integer? #f) = #f step : exact-integer? = 1
⚠️ Gotcha: With negative step, omitted bounds (#f) behave differently from explicit negative indices such as -1.
When step is positive, traversal is left to right. When step is negative, traversal is right to left. A zero step raises an exception.
If start or end is #f, the bound is treated as omitted and defaults according to the step direction.
Indices may be negative and are clamped to the string bounds.
> (string-slice/step "abcdef") "abcdef"
> (string-slice/step "abcdef" 0 6 2) "ace"
> (string-slice/step "abcdef" 5 #f -2) "fdb"
> (string-slice/step "abcdef" #f #f -1) "fedcba"
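The examples above are consistent with Python-style stepped slicing. Under that assumption, the following plain-Racket model (not the library's implementation; pick is a hypothetical helper) shows why an omitted end differs from -1 when the step is negative: the traversal has to stop just before index 0, and no explicit index can say that because -1 already names the last character.

```racket
#lang racket/base
;; Hypothetical helper: collect the characters of s at the given indices.
(define (pick s indices)
  (list->string (for/list ([i indices]) (string-ref s i))))

(pick "abcdef" (in-range 0 6 2))   ; "ace",    like start 0,  end 6,  step 2
(pick "abcdef" (in-range 5 -1 -2)) ; "fdb",    like start 5,  end #f, step -2
(pick "abcdef" (in-range 5 -1 -1)) ; "fedcba", like start #f, end #f, step -1
```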
Related: string-slice, string-at.
ℹ️ Think of string-at as a safer string-ref.
procedure
(string-at s i [default])
s : string? i : exact-integer? default : any/c = #f
Indices are clamped to the string bounds, and negative indices count from the end of the string.
⚠️ Gotcha: For non-empty strings, out-of-range indices are clamped, so default is only used when s is empty.
If s is empty, default is returned.
> (string-at "abc" 0) #\a
> (string-at "abc" -1) #\c
> (string-at "abc" 3) #\c
> (string-at "abc" -10) #\a
> (string-at "" 0 #\x) #\x
Related: string-slice, string-slice/step.
procedure
(string-split-at s i ...) → (listof string?)
s : string? i : exact-integer?
Indices may be negative and are clamped to the string bounds. The indices may be given in any order and may contain duplicates; they are sorted and deduplicated before splitting.
The returned list contains the substrings of s between successive cut positions, including the beginning and end of the string.
> (string-split-at "abcdef" 2 4) '("ab" "cd" "ef")
> (string-split-at "abcdef" 4 2) '("ab" "cd" "ef")
> (string-split-at "abc" 1 1 2) '("a" "b" "c")
> (string-split-at "abc") '("abc")
> (string-split-at "abc" 0) '("" "abc")
> (string-split-at "abc" -1) '("ab" "c")
> (string-split-at "abc" 3) '("abc" "")
If no indices are provided, the result is a list containing s itself.
If exactly one index is provided, the result is a two-element list consisting of the prefix and suffix at that index.
An exception is raised if any index is not an exact integer.
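The sort-and-deduplicate behaviour described above can be sketched in plain Racket. This is a model of the documented results, not the library's code, and split-at-model is a hypothetical name.

```racket
#lang racket/base
(require racket/list) ; remove-duplicates

;; Hypothetical model: normalize each cut position, sort, deduplicate,
;; then emit the substrings between successive cuts.
(define (split-at-model s . is)
  (define len (string-length s))
  (define (norm i) (max 0 (min len (if (< i 0) (+ len i) i))))
  (define cuts (sort (remove-duplicates (map norm is)) <))
  (let loop ([from 0] [cuts cuts] [acc '()])
    (if (null? cuts)
        (reverse (cons (substring s from len) acc))
        (loop (car cuts)
              (cdr cuts)
              (cons (substring s from (car cuts)) acc)))))

(split-at-model "abcdef" 4 2) ; '("ab" "cd" "ef")
(split-at-model "abc" 1 1 2)  ; '("a" "b" "c")
(split-at-model "abc" 0)      ; '("" "abc")
```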
procedure
(string-replace-range s start end replacement) → string?
s : string? start : exact-integer? end : exact-integer? replacement : string?
The replaced portion starts at start and continues up to end, excluding the character at end.
Indices may be negative and are clamped to the string bounds.
> (string-replace-range "abcdef" 2 4 "XY") "abXYef"
> (string-replace-range "abcdefgh" 2 6 "X") "abXgh"
> (string-replace-range "abcdefgh" 2 4 "WXYZ") "abWXYZefgh"
> (string-replace-range "abcdef" -4 -2 "XY") "abXYef"
> (string-replace-range "abcdef" 4 2 "XY") "abXYef"
1.4 Character Counting
This subsection covers character-wise counting with flexible matching criteria, including character, character-set, string, and predicate forms.
procedure
(string-count s to-count [start end]) → exact-nonnegative-integer?
s : string? to-count : (or/c char? char-set? string? (-> char? any/c)) start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
If to-count is a procedure, it is applied to each character as a predicate. If it is a character set, each character is tested for membership. If it is a character, char=? is used. If it is a string, the string is converted to a character set.
An exception is raised if start or end is not an exact integer, or if to-count is not a character, character set, string, or unary procedure.
> (string-count "banana" #\a) 3
> (string-count "banana" "an") 5
> (require string-tools/char-set)
> (string-count "banana" (make-char-set #\a #\n)) 5
> (string-count "a1b2c3" char-numeric?) 3
> (string-count "banana" #\a 2 6) 2
> (string-count "banana" #\a -5 -1) 2
This procedure provides character counting with flexible matching criteria.
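One way to picture the matcher dispatch is as a conversion to a unary predicate. The sketch below is a plain-Racket model under that assumption; matcher->predicate and count-model are hypothetical names, and the character-set case is left out because it depends on string-tools/char-set.

```racket
#lang racket/base
;; Hypothetical helper: turn a matcher into a unary character predicate.
(define (matcher->predicate m)
  (cond
    [(char? m) (λ (c) (char=? c m))]
    ;; A string is treated as the set of its characters.
    [(string? m) (λ (c) (for/or ([x (in-string m)]) (char=? c x)))]
    [(procedure? m) m]
    [else (raise-argument-error 'matcher->predicate
                                "(or/c char? string? procedure?)"
                                m)]))

;; Hypothetical model of character counting over the whole string.
(define (count-model s m)
  (define match? (matcher->predicate m))
  (for/sum ([c (in-string s)])
    (if (match? c) 1 0)))

(count-model "banana" #\a)           ; 3
(count-model "banana" "an")          ; 5
(count-model "a1b2c3" char-numeric?) ; 3
```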
1.5 Needle Counting
This subsection groups substring-occurrence counting utilities for bounded regions of a string. Use these procedures when you need non-overlapping needle counts rather than per-character counting.
procedure
(string-count-needle s needle [start end]) → exact-nonnegative-integer?
s : string? needle : string? start : exact-integer? = 0 end : exact-integer? = (string-length s)
The search begins at start and stops before end, which defaults to the length of s. In other words, start is included and end is not included.
Indices may be negative and are clamped to the string bounds.
Occurrences are counted from left to right and do not overlap.
If needle is the empty string, the result is the number of insertion positions in the selected substring.
An exception is raised if start or end is not an exact integer.
> (string-count-needle "banana" "na") 2
> (string-count-needle "aaaa" "aa") 2
> (string-count-needle "aaaa" "aaa") 1
> (string-count-needle "banana" "na" 3 6) 1
> (string-count-needle "banana" "na" -4 -1) 1
> (string-count-needle "abc" "") 4
> (string-count-needle "abc" "" 1 3) 3
This procedure is intended to complement the string-search utilities in racket/string by providing a direct substring-count operation.
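The left-to-right, non-overlapping rule can be sketched as a simple scan that jumps past each match. This is a plain-Racket model over the whole string (no start or end), not the library's implementation; count-needle-model is a hypothetical name.

```racket
#lang racket/base
;; Hypothetical model of non-overlapping substring counting.
(define (count-needle-model s needle)
  (define n (string-length s))
  (define k (string-length needle))
  (if (zero? k)
      (add1 n) ; one insertion position before each character, plus the end
      (let loop ([i 0] [count 0])
        (cond
          [(> (+ i k) n) count]
          [(string=? (substring s i (+ i k)) needle)
           (loop (+ i k) (add1 count))] ; skip past the match: no overlap
          [else (loop (add1 i) count)]))))

(count-needle-model "banana" "na") ; 2
(count-needle-model "aaaa" "aa")   ; 2
(count-needle-model "aaaa" "aaa")  ; 1
(count-needle-model "abc" "")      ; 4
```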
1.6 Index Search and Trimming
This subsection combines left-to-right and right-to-left index/skip operations with matcher-driven trimming over bounded substring regions.
procedure
(string-index s to-find [start end]) → (or/c exact-nonnegative-integer? #f)
s : string? to-find : (or/c char? char-set? string? (-> char? any/c)) start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
If to-find is a character, char=? is used.
If it is a character set, membership is tested.
If it is a string, the string is converted to a character set.
If it is a procedure, the procedure is used as a predicate.
> (string-index "banana" #\a) 1
> (string-index "banana" "nz") 2
> (string-index "banana" (make-char-set #\n #\z)) 2
> (string-index "a1b2c3" char-numeric?) 1
> (string-index "banana" #\a -5 -1) 1
This procedure searches from left to right with configurable matching criteria.
procedure
(string-index-right s to-find [start end]) → (or/c exact-nonnegative-integer? #f)
s : string? to-find : (or/c char? char-set? string? (-> char? any/c)) start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
The right-to-left search starts at (sub1 end).
> (string-index-right "banana" #\a) 5
> (string-index-right "banana" "nz") 4
> (string-index-right "banana" (make-char-set #\n #\z)) 4
> (string-index-right "a1b2c3" char-numeric?) 5
> (string-index-right "banana" #\a -5 -1) 3
This procedure searches from right to left with configurable matching criteria.
procedure
(string-skip s to-skip [start end]) → (or/c exact-nonnegative-integer? #f)
s : string? to-skip : (or/c char? char-set? string? (-> char? any/c)) start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
> (string-skip "   abc" #\space) 3
> (string-skip "aaab" "a") 3
> (string-skip "123x5" char-numeric?) 3
> (string-skip "   abc" #\space -4 100) 3
This procedure provides left-to-right skipping using the complement criterion.
procedure
(string-skip-right s to-skip [start end]) → (or/c exact-nonnegative-integer? #f)
s : string? to-skip : (or/c char? char-set? string? (-> char? any/c)) start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
> (string-skip-right "abc " #\space) 2
> (string-skip-right "baaa" "a") 0
> (string-skip-right "5x321" char-numeric?) 1
> (string-skip-right "abc " #\space -100 -1) 2
This procedure provides right-to-left skipping using the complement criterion.
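The "complement criterion" means that skipping is an index search with the predicate negated. A minimal plain-Racket sketch of that relationship, restricted to predicate matchers and whole strings (index-model and skip-model are hypothetical names, not library procedures):

```racket
#lang racket/base
;; Hypothetical model: index of the first character satisfying pred, or #f.
(define (index-model s pred)
  (for/first ([c (in-string s)]
              [i (in-naturals)]
              #:when (pred c))
    i))

;; Skipping is the same search with the predicate complemented.
(define (skip-model s pred)
  (index-model s (λ (c) (not (pred c)))))

(index-model "a1b2c3" char-numeric?) ; 1
(skip-model "123x5" char-numeric?)   ; 3
(skip-model "111" char-numeric?)     ; #f, every character matches
```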
procedure
(string-trim-left s [to-trim start end]) → string?
s : string?
to-trim : (or/c char? char-set? string? (-> char? any/c)) = char-whitespace? start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
If to-trim is omitted, whitespace is trimmed. If to-trim is a string, it is converted to a character set.
> (string-trim-left " abc ") "abc "
> (string-trim-left "aaab" #\a) "b"
> (string-trim-left "aaab" "a") "b"
> (string-trim-left "abbaXYZ" "ab") "XYZ"
> (string-trim-left "123x5" char-numeric?) "x5"
> (string-trim-left "xxabcxx" #\x 2 7) "abcxx"
> (string-trim-left "xxabcxx" #\x -5 -1) "abcx"
> (string-trim-left "abcde" #\x 1 3) "bc"
procedure
(string-trim-right s [to-trim start end]) → string?
s : string?
to-trim : (or/c char? char-set? string? (-> char? any/c)) = char-whitespace? start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
If to-trim is omitted, whitespace is trimmed. If to-trim is a string, it is converted to a character set.
> (string-trim-right " abc ") " abc"
> (string-trim-right "baaa" #\a) "b"
> (string-trim-right "baaa" "a") "b"
> (string-trim-right "XYZabba" "ab") "XYZ"
> (string-trim-right "5x321" char-numeric?) "5x"
> (string-trim-right "xxabcxx" #\x 1 6) "xabc"
> (string-trim-right "xxabcxx" #\x -5 -1) "abc"
ℹ️ Similar to string-trim, but this procedure uses the matcher conventions in this module for to-trim.
procedure
(string-trim-both s [to-trim start end]) → string?
s : string?
to-trim : (or/c char? char-set? string? (-> char? any/c)) = char-whitespace? start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
If to-trim is omitted, whitespace is trimmed. If to-trim is a string, it is converted to a character set.
> (string-trim-both " abc ") "abc"
> (string-trim-both "aaabaa" #\a) "b"
> (string-trim-both "aaabaa" "a") "b"
> (string-trim-both "abbaXYZabba" "ab") "XYZ"
> (string-trim-both "123x5" char-numeric?) "x"
> (string-trim-both "xxabcxx" #\x 1 6) "abc"
> (string-trim-both "xxabcxx" #\x -5 -1) "abc"
1.7 Substring Search and Partitioning
This subsection groups substring search and partitioning helpers that return indices, ranges, or before/needle/after splits.
procedure
(string-find-needle s needle [start end]) → (or/c exact-nonnegative-integer? #f)
s : string? needle : string? start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
If needle is the empty string, the result is start.
> (string-find-needle "banana" "na") 2
> (string-find-needle "banana" "na" 3 6) 4
> (string-find-needle "banana" "na" -4 -1) 2
> (string-find-needle "abc" "") 0
This procedure provides direct substring search.
Related: string-find-all-needle, string-scan.
procedure
(string-find-last-needle s needle [start end]) → (or/c exact-nonnegative-integer? #f)
s : string? needle : string? start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
If needle is the empty string, the result is end.
> (string-find-last-needle "banana" "na") 4
> (string-find-last-needle "banana" "na" 0 5) 2
> (string-find-last-needle "banana" "na" -4 -1) 2
> (string-find-last-needle "abc" "") 3
This procedure is a right-to-left substring search companion to string-find-needle.
procedure
(string-find-all-needle s needle [start end #:overlap? overlap? #:ranges? ranges?]) → (listof (or/c exact-nonnegative-integer? (cons/c exact-nonnegative-integer? exact-nonnegative-integer?)))
s : string? needle : string? start : exact-integer? = 0 end : exact-integer? = (string-length s) overlap? : boolean? = #f ranges? : boolean? = #f
By default, returns start indices. When #:ranges? is true, returns (cons start end) pairs for each match.
When #:overlap? is true, overlapping matches are included.
Indices may be negative and are clamped to the string bounds.
> (string-find-all-needle "banana" "na") '(2 4)
> (string-find-all-needle "banana" "na" #:ranges? #t) '((2 . 4) (4 . 6))
> (string-find-all-needle "aaaa" "aa" #:overlap? #t) '(0 1 2)
Related: string-find-needle, string-scan.
procedure
(string-partition s needle [start end]) → string? string? string?
s : string? needle : string? start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
Three values are returned: the substring before the match, the matched separator, and the substring after the match.
If no match is found, the second and third values are empty strings, and the first value is the selected substring.
> (call-with-values (λ () (string-partition "a:b:c" ":")) list) '("a" ":" "b:c")
> (call-with-values (λ () (string-partition "abc" ":")) list) '("abc" "" "")
> (call-with-values (λ () (string-partition "banana" "na")) list) '("ba" "na" "na")
> (call-with-values (λ () (string-partition "banana" "na" -4 -1)) list) '("" "na" "n")
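Because three values are returned, define-values is often more convenient than call-with-values. The sketch below assumes the string-tools package is installed and relies only on the results shown in the examples above; the variable names are illustrative.

```racket
#lang racket/base
(require string-tools)

;; Bind the three results directly instead of collecting them into a list.
(define-values (before sep after) (string-partition "a:b:c" ":"))
(list before sep after) ; '("a" ":" "b:c"), as in the first example above

;; With a non-empty needle, an empty second value means no match was found.
(define-values (head found tail) (string-partition "abc" ":"))
(string=? found "") ; #t
```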
procedure
(string-partition-right s needle [start end]) → string? string? string?
s : string? needle : string? start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
Three values are returned: the substring before the match, the matched separator, and the substring after the match.
If no match is found, the second and third values are empty strings, and the first value is the selected substring.
> (call-with-values (λ () (string-partition-right "a:b:c" ":")) list) '("a:b" ":" "c")
> (call-with-values (λ () (string-partition-right "abc" ":")) list) '("abc" "" "")
> (call-with-values (λ () (string-partition-right "banana" "na")) list) '("bana" "na" "")
> (call-with-values (λ () (string-partition-right "banana" "na" -4 -1)) list) '("" "na" "n")
procedure
(string-between s left right [ start end #:left-match left-match #:right-match right-match #:include-left? include-left? #:include-right? include-right?]) → (or/c string? #f) s : string? left : (or/c char? string?) right : (or/c char? string?) start : exact-integer? = 0 end : exact-integer? = (string-length s) left-match : (or/c 'first 'last) = 'first right-match : (or/c 'first 'last) = 'first include-left? : boolean? = #f include-right? : boolean? = #f
Delimiters may be strings or single characters.
left-match and right-match choose whether each delimiter uses its first or last match in the selected bounds. The include-left? and include-right? options control whether delimiters are included.
Indices may be negative and are clamped to the string bounds.
> (string-between "a[b]c" "[" "]") "b"
> (string-between "a[b]c[d]e" "[" "]" #:left-match 'last) "d"
> (string-between "a[b]c[d]e" "[" "]" #:right-match 'last) "b]c[d"
> (string-between "a[b]c" "[" "]" #:include-left? #t #:include-right? #t) "[b]"
> (string-between "a[b]c" #\[ #\]) "b"
1.8 Prefix and Suffix Utilities
This subsection provides small prefix/suffix primitives for normalization and path/key shaping, including remove-if-present and ensure-if-missing forms.
> (string-remove-prefix "foobar" "foo") "bar"
> (string-remove-prefix "foobar" "bar") "foobar"
> (string-remove-suffix "foobar" "bar") "foo"
> (string-remove-suffix "foobar" "foo") "foobar"
> (string-ensure-prefix "bar" "foo") "foobar"
> (string-ensure-prefix "foobar" "foo") "foobar"
> (string-ensure-suffix "foo" "bar") "foobar"
> (string-ensure-suffix "foobar" "bar") "foobar"
> (string-common-prefix "foobar" "foobaz") "fooba"
> (string-common-prefix "abc" "xyz") ""
> (string-common-suffix "foobar" "xxbar") "bar"
> (string-common-suffix "abc" "xyz") ""
1.9 Lines
This subsection groups line-oriented utilities for text and file processing, including line splitting, counting, newline normalization, and display-column handling.
Line separators recognized are #\newline, #\return, and the two-character sequence #\return followed by #\newline.
If s ends with a line separator, no extra trailing empty line is added.
> (string-lines "") '()
> (string-lines "a\nb") '("a" "b")
> (string-lines "a\r\nb") '("a" "b")
> (string-lines "a\rb") '("a" "b")
> (string-lines "a\n") '("a")
procedure
(string-count-lines s) → exact-positive-integer?
s : string?
Line boundaries follow #\newline, #\return, and #\return followed by #\newline. A #\return followed by #\newline counts as one line boundary.
> (string-count-lines "") 1
> (string-count-lines "a\nb") 2
> (string-count-lines "a\r\nb") 2
> (string-count-lines "a\n") 2
procedure
(string-line-start-indices s) → (listof exact-nonnegative-integer?)
s : string?
Line boundaries follow the same rules as string-count-lines.
> (string-line-start-indices "") '(0)
> (string-line-start-indices "a\nb") '(0 2)
> (string-line-start-indices "a\r\nb") '(0 3)
> (string-line-start-indices "a\n") '(0 2)
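The line-boundary rules can be modelled with a single pass over the string. This is a plain-Racket sketch of the documented results, not the library's code; line-starts-model is a hypothetical name. In this model, the length of the result matches the counts shown for string-count-lines.

```racket
#lang racket/base
;; Hypothetical model: index 0 starts a line, and so does the position
;; right after every \n, \r, or \r\n.
(define (line-starts-model s)
  (define n (string-length s))
  (let loop ([i 0] [acc '(0)])
    (cond
      [(>= i n) (reverse acc)]
      [(char=? (string-ref s i) #\newline)
       (loop (add1 i) (cons (add1 i) acc))]
      [(char=? (string-ref s i) #\return)
       ;; \r followed by \n counts as a single boundary.
       (let ([next (if (and (< (add1 i) n)
                            (char=? (string-ref s (add1 i)) #\newline))
                       (+ i 2)
                       (add1 i))])
         (loop next (cons next acc)))]
      [else (loop (add1 i) acc)])))

(line-starts-model "")       ; '(0)
(line-starts-model "a\r\nb") ; '(0 3)
(line-starts-model "a\n")    ; '(0 2)
```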
Both #\return and #\return followed by #\newline are normalized to #\newline.
> (string-normalize-newlines "a\r\nb") "a\nb"
> (string-normalize-newlines "a\rb") "a\nb"
> (string-normalize-newlines "\r\n\rx\r") "\n\nx\n"
procedure
(string-expand-tabs s [ #:tab-width tab-width #:start-column start-column]) → string? s : string? tab-width : exact-positive-integer? = 8 start-column : exact-nonnegative-integer? = 0
Tabs advance to the next tab stop determined by tab-width. Newline and return characters reset the running column to zero.
> (string-expand-tabs "a\tb") "a       b"
> (string-expand-tabs "ab\tcd" #:tab-width 4) "ab  cd"
> (string-expand-tabs "\t" #:tab-width 4 #:start-column 2) "  "
procedure
(string-display-width s [ #:tab-width tab-width #:start-column start-column]) → exact-nonnegative-integer? s : string? tab-width : exact-positive-integer? = 8 start-column : exact-nonnegative-integer? = 0
ASCII printable characters count as width 1. Tabs advance to the next tab stop. Newline and return reset the running column to zero.
> (string-display-width "a\tb") 9
> (string-display-width "a\nbc") 2
> (string-display-width "\t" #:tab-width 4 #:start-column 2) 4
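The tab-stop rule can be sketched as a running column that each tab pads up to the next multiple of tab-width. This is a plain-Racket model of the behaviour described above, not the library's implementation; expand-tabs-model is a hypothetical name, and every non-tab, non-newline character is assumed to have width 1.

```racket
#lang racket/base
;; Hypothetical model of tab expansion with a running column.
(define (expand-tabs-model s #:tab-width [tw 8] #:start-column [col0 0])
  (define out (open-output-string))
  (for/fold ([col col0]) ([c (in-string s)])
    (cond
      [(char=? c #\tab)
       ;; Pad with spaces up to the next tab stop.
       (let ([pad (- tw (remainder col tw))])
         (write-string (make-string pad #\space) out)
         (+ col pad))]
      [(memv c '(#\newline #\return))
       (write-char c out)
       0] ; line breaks reset the column
      [else
       (write-char c out)
       (add1 col)]))
  (get-output-string out))

(expand-tabs-model "a\tb")                             ; "a", seven spaces, "b"
(expand-tabs-model "ab\tcd" #:tab-width 4)             ; "ab", two spaces, "cd"
(expand-tabs-model "\t" #:tab-width 4 #:start-column 2) ; two spaces
```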
If no trailing newline is present, s is returned unchanged.
> (string-chomp "abc") "abc"
> (string-chomp "abc\n") "abc"
> (string-chomp "abc\r\n") "abc"
> (string-chomp "abc\n\n") "abc\n"
1.10 String Construction and Transformation
This subsection collects string-building and transformation utilities, from repetition and case conversion to simple linguistic and mapping helpers.
procedure
(string-repeat s n) → string?
s : string? n : exact-nonnegative-integer?
> (string-repeat "ab" 0) ""
> (string-repeat "ab" 3) "ababab"
> (string-reverse "") ""
> (string-reverse "abc") "cba"
Use string-upcase when every character should be uppercased. Use string-titlecase for title-casing behavior across words.
> (string-capitalize "") ""
> (string-capitalize "hello world") "Hello world"
> (string-capitalize "hELLO WORLD") "Hello world"
> (string-swapcase "") ""
> (string-swapcase "AbC") "aBc"
> (string-swapcase "hello WORLD") "HELLO world"
🌐 See ROT13 on Wikipedia.
> (string-rot13 "Hello, World!") "Uryyb, Jbeyq!"
> (string-rot13 (string-rot13 "Racket")) "Racket"
> (string-rot13 (string-rot13 "uryyb")) "uryyb"
> (string-pluralize "cat") "cats"
> (string-pluralize "box") "boxes"
> (string-pluralize "city") "cities"
⚠️ Gotcha: This is heuristic, not full linguistic inflection.
> (string-singularize "cats") "cat"
> (string-singularize "boxes") "box"
> (string-singularize "cities") "city"
> (string-ensure-ends-with-newline "") "\n"
> (string-ensure-ends-with-newline "abc") "abc\n"
> (string-ensure-ends-with-newline "abc\n") "abc\n"
procedure
(string-map proc s [start end]) → string?
proc : (-> char? char?) s : string? start : exact-integer? = 0 end : exact-integer? = (string-length s)
This procedure does not mutate s.
Indices may be negative and are clamped to the string bounds.
> (string-map char-upcase "abc") "ABC"
> (string-map char-upcase "abcdef" 1 4) "aBCDef"
> (string-map char-upcase "abcdef" -4 -1) "abCDEf"
procedure
(string-map! proc s [start end]) → void?
proc : (-> char? char?) s : (and/c string? (not/c immutable?)) start : exact-integer? = 0 end : exact-integer? = (string-length s)
Indices may be negative and are clamped to the string bounds.
> (define m (string-copy "abcdef"))
> (string-map! char-upcase m -4 -1)
> m "abCDEf"
> (string-intersperse "," '()) ""
> (string-intersperse "," '("a")) "a"
> (string-intersperse "," '("a" "b" "c")) "a,b,c"
1.11 Escaping and Cleaning
This subsection groups escaping and cleanup utilities for both human-visible text and machine-oriented string formats such as quoted literals and JSON string content.
Escapes include "\\n", "\\r", "\\t", "\\b", "\\f", and "\\\\". Other ASCII control characters are rendered as "\\xNN".
> (string-escape-visible "\n\t\r") "\\n\\t\\r"
> (string-escape-visible "\\x") "\\\\x"
> (string-escape-visible (string #\nul #\rubout)) "\\x00\\x7F"
> (string-unescape-visible (string-escape-visible "a\n\tb")) "a\n\tb"
Recognized escapes include "\\n", "\\r", "\\t", "\\b", "\\f", "\\\\", "\\xNN", and "\\x...;".
> (string-unescape-visible "\\n\\t") "\n\t"
> (string-unescape-visible "\\x00\\x7F") "\u0000\u007F"
> (string-unescape-visible "\\x3BB;") "λ"
> (string-unescape-visible (string-escape-visible "a\n\tb")) "a\n\tb"
Related: string-quote, string-unquote.
> (string-quote "He said \"hi\"") "\"He said \\\"hi\\\"\""
> (string-quote "a'b" #:quote-char #\') "'a\\'b'"
> (string-unquote (string-quote "a\nb")) "a\nb"
Related: string-unquote, string-escape-visible, string-escape-json.
procedure
(string-unquote s [#:quote-char quote-char]) → string?
s : string? quote-char : char? = #\"
An exception is raised when outer quotes are missing or escapes are malformed.
> (string-unquote "\"a\\nb\"") "a\nb"
> (string-unquote "'a\\'b'" #:quote-char #\') "a'b"
> (string-unquote (string-quote "He said \"hi\"")) "He said \"hi\""
Related: string-quote, string-unescape-visible, string-unescape-json.
> (string-escape-regexp "a+b") "a\\+b"
> (regexp-match? (regexp (string-escape-regexp "a+b")) "a+b") #t
> (string-escape-json "\"\\/\n") "\\\"\\\\\\/\\n"
> (string-escape-json (string #\nul #\u001F)) "\\u0000\\u001F"
> (string-unescape-json (string-escape-json "hello\nλ")) "hello\nλ"
Related: string-unescape-json, string-quote.
> (string-unescape-json "\\u0041\\u03BB") "Aλ"
> (string-unescape-json "\\uD83D\\uDE00") "😀"
> (string-unescape-json (string-escape-json "hello\nλ")) "hello\nλ"
Related: string-escape-json, string-unquote.
> (string-strip-ansi "\e[31mred\e[0m") "red"
> (string-strip-ansi "a\e]0;title\ab") "ab"
procedure
(string-squeeze s [to-squeeze]) → string?
s : string?
to-squeeze : (or/c char? char-set? string? (procedure-arity-includes/c 1)) = char-whitespace?
If to-squeeze is a character, characters equal to it are squeezed. If it is a character set, characters in the set are squeezed. If it is a string, the string is treated as a character set. If it is a procedure, it is used as the character predicate.
> (string-squeeze "a b c" #\space) "a b c"
> (string-squeeze "a\t \n\nb") "a\tb"
> (string-squeeze "baaaana" "a") "bana"
1.12 Tokenization and Scanning
This subsection groups parsing-oriented helpers that split text into tokens or fields and scan text for successive match ranges.
procedure
(string-tokenize s [ to-separate start end #:quote quote #:escape escape]) → (listof string?) s : string?
to-separate : (or/c char? char-set? string? (procedure-arity-includes/c 1)) = char-whitespace? start : exact-integer? = 0 end : exact-integer? = (string-length s) quote : (or/c #f char?) = #f escape : (or/c #f char?) = #\\
If quote is provided, separators inside quoted text are ignored. If escape is provided, the escaped next character is treated literally.
Indices may be negative and are clamped to the string bounds.
> (string-tokenize " a b c ") '("a" "b" "c")
> (string-tokenize "a,b,c" #\,) '("a" "b" "c")
> (string-tokenize "a,\"b,c\",d" #\, #:quote #\") '("a" "b,c" "d")
> (string-tokenize "a,b\\,c,d" #\, #:escape #\\) '("a" "b,c" "d")
> (string-tokenize "abc def ghi" char-whitespace? -7 -1) '("def" "gh")
procedure
(string-fields s [ to-separate start end #:quote quote #:escape escape #:widths widths #:include-rest? include-rest?]) → (listof string?) s : string?
to-separate : (or/c char? char-set? string? (procedure-arity-includes/c 1)) = #\, start : exact-integer? = 0 end : exact-integer? = (string-length s) quote : (or/c #f char?) = #f escape : (or/c #f char?) = #\\ widths : (or/c #f (listof exact-positive-integer?)) = #f include-rest? : boolean? = #f
In delimiter mode, empty fields are preserved. In fixed-width mode (#:widths), the widths list defines field lengths from left to right. When #:include-rest? is true in fixed-width mode, one additional field contains any remaining substring.
Indices may be negative and are clamped to the string bounds.
> (string-fields "a,b,,c," #\,) '("a" "b" "" "c" "")
> (string-fields "a,\"b,c\",d" #\, #:quote #\") '("a" "b,c" "d")
> (string-fields "abcdefgh" #\, #:widths '(2 3 2)) '("ab" "cde" "fg")
> (string-fields "abcdefgh" #\, #:widths '(2 3 2) #:include-rest? #t) '("ab" "cde" "fg" "h")
> (string-fields "a,b,c,d" #\, -5 -1) '("b" "c" "")
procedure
(string-scan s matcher [start end #:overlap? overlap?]) → (-> (or/c (cons/c exact-nonnegative-integer? exact-nonnegative-integer?) #f))
s : string? matcher : (or/c string? char? char-set? (procedure-arity-includes/c 1)) start : exact-integer? = 0 end : exact-integer? = (string-length s) overlap? : boolean? = #f
If matcher is a string, it is treated as a substring needle. If matcher is a character, character set, or predicate, matching characters are returned as one-character ranges.
Indices may be negative and are clamped to the string bounds.
> (define (collect-ranges g) (let loop ([acc '()]) (define v (g)) (if v (loop (cons v acc)) (reverse acc))))
> (collect-ranges (string-scan "banana" "na")) '((2 . 4) (4 . 6))
> (collect-ranges (string-scan "aaaa" "aa" #:overlap? #t)) '((0 . 2) (1 . 3) (2 . 4))
> (collect-ranges (string-scan "abc123x" char-numeric?)) '((3 . 4) (4 . 5) (5 . 6))
> (collect-ranges (string-scan "banana" #\a -5 -1)) '((1 . 2) (3 . 4))
Related: string-find-all-needle, string-find-needle.
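Since string-scan returns a thunk that yields ranges until it produces #f, Racket's in-producer can drive it inside a for loop. The sketch assumes the string-tools package is installed and relies only on the behaviour shown in the examples above.

```racket
#lang racket/base
(require string-tools)

;; in-producer calls the thunk repeatedly and stops at the first #f.
(for/list ([r (in-producer (string-scan "banana" "na") #f)])
  r)
;; Expected to match the collect-ranges example above: '((2 . 4) (4 . 6))
```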
1.13 Formatting and Layout
This subsection covers presentation-oriented text shaping, including wrapping, indentation normalization, and width-constrained truncation.
procedure
(string-wrap s width [ #:mode mode #:preserve-words? preserve-words?]) → string? s : string? width : exact-positive-integer? mode : (or/c 'soft 'hard) = 'soft preserve-words? : boolean? = #t
In 'soft mode, wrapping prefers whitespace boundaries. In 'hard mode, lines are split exactly at width characters.
When preserve-words? is true in soft mode, long words are kept intact instead of being split.
> (string-wrap "alpha beta gamma" 10) "alpha\nbeta gamma"
> (string-wrap "supercalifragilistic" 8) "supercalifragilistic"
> (string-wrap "supercalifragilistic" 8 #:preserve-words? #f) "supercal\nifragili\nstic"
> (string-wrap "abcdefghij" 4 #:mode 'hard) "abcd\nefgh\nij"
procedure
(string-indent s n-or-prefix) → string?
s : string? n-or-prefix : (or/c exact-nonnegative-integer? string?)
If n-or-prefix is a nonnegative integer, that many spaces are used. If it is a string, that string is used as the line prefix.
> (string-indent "a\nb" 2) "  a\n  b"
> (string-indent "a\nb" "-> ") "-> a\n-> b"
Indentation is measured using leading spaces and tabs.
> (string-dedent " a\n b") "a\nb"
> (string-dedent " a\n  b") "a\n b"
> (string-dedent " a\n\n b") "a\n\nb"
procedure
(string-elide s width [ #:where where #:ellipsis ellipsis]) → string? s : string? width : exact-nonnegative-integer? where : (or/c 'left 'right 'middle) = 'right ellipsis : string? = "..."
The where option chooses whether truncation happens on the left, right, or in the middle.
> (string-elide "abcdef" 5) "ab..."
> (string-elide "abcdef" 5 #:where 'left) "...ef"
> (string-elide "abcdef" 5 #:where 'middle) "a...f"
> (string-elide "abcdef" 6 #:ellipsis "..") "abcdef"
1.14 Similarity and Distance
This subsection provides string similarity and distance metrics useful for ranking candidates, fuzzy matching, and suggestion-style diagnostics.
procedure
(string-levenshtein a b) → exact-nonnegative-integer?
a : string? b : string?
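Returns the Levenshtein edit distance between a and b: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Time complexity is O((string-length a) * (string-length b)).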
> (string-levenshtein "kitten" "sitting") 3
> (string-levenshtein "flaw" "lawn") 2
> (string-levenshtein "abc" "abc") 0
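The quadratic time bound comes from the classic dynamic-programming table for edit distance. Below is a minimal two-row sketch of that textbook algorithm; levenshtein-sketch is a hypothetical name, and the library's own implementation may differ.

```racket
#lang racket

;; Textbook Levenshtein distance using two rows of the DP table.
(define (levenshtein-sketch a b)
  (define m (string-length a))
  (define n (string-length b))
  ;; prev[j] holds the distance between the processed prefix of a
  ;; and the first j characters of b; initially the prefix is empty.
  (define prev (build-vector (add1 n) values))
  (for ([i (in-range 1 (add1 m))])
    (define curr (make-vector (add1 n) 0))
    (vector-set! curr 0 i)
    (for ([j (in-range 1 (add1 n))])
      (define cost
        (if (char=? (string-ref a (sub1 i)) (string-ref b (sub1 j))) 0 1))
      (vector-set! curr j
                   (min (add1 (vector-ref prev j))              ; deletion
                        (add1 (vector-ref curr (sub1 j)))       ; insertion
                        (+ cost (vector-ref prev (sub1 j))))))  ; substitution
    (vector-copy! prev 0 curr))
  (vector-ref prev n))

(levenshtein-sketch "kitten" "sitting") ; => 3
(levenshtein-sketch "flaw" "lawn")      ; => 2
```

Keeping only two rows reduces memory to O((string-length b)) while the time bound stays the same.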
procedure
(string-jaro-winkler a b [ #:prefix-scale prefix-scale]) → inexact-real?
a : string? b : string? prefix-scale : real? = 0.1
Time complexity is approximately O ((string-length a) * (string-length b)) in the worst case.
> (string-jaro-winkler "martha" "marhta") 0.9611111111111111
> (string-jaro-winkler "martha" "xyz") 0.0
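For readers unfamiliar with the metric, here is a sketch of the standard Jaro-Winkler definition: a Jaro score built from matches within a sliding window and half the number of out-of-order matches, then boosted by a common prefix of up to four characters. The names jaro-sketch and jaro-winkler-sketch are hypothetical, and edge-case choices (such as scoring two empty strings as 1.0) are assumptions rather than documented library behavior.

```racket
#lang racket

;; Standard Jaro similarity, computed with exact rationals until the end.
(define (jaro-sketch a b)
  (define la (string-length a))
  (define lb (string-length b))
  (cond
    [(and (zero? la) (zero? lb)) 1.0] ; assumption for the empty case
    [(or (zero? la) (zero? lb)) 0.0]
    [else
     (define window (max 0 (sub1 (quotient (max la lb) 2))))
     (define a-hit (make-vector la #f))
     (define b-hit (make-vector lb #f))
     ;; Count characters of a that match an unused character of b
     ;; within the window around the same position.
     (define m
       (for/sum ([i (in-range la)])
         (define lo (max 0 (- i window)))
         (define hi (min lb (+ i window 1)))
         (define hit
           (for/first ([j (in-range lo hi)]
                       #:when (and (not (vector-ref b-hit j))
                                   (char=? (string-ref a i) (string-ref b j))))
             j))
         (cond
           [hit (vector-set! a-hit i #t) (vector-set! b-hit hit #t) 1]
           [else 0])))
     (cond
       [(zero? m) 0.0]
       [else
        (define as (for/list ([i (in-range la)] #:when (vector-ref a-hit i))
                     (string-ref a i)))
        (define bs (for/list ([j (in-range lb)] #:when (vector-ref b-hit j))
                     (string-ref b j)))
        ;; Transpositions: half the matched pairs that are out of order.
        (define t (/ (for/sum ([x (in-list as)] [y (in-list bs)])
                       (if (char=? x y) 0 1))
                     2))
        (exact->inexact (/ (+ (/ m la) (/ m lb) (/ (- m t) m)) 3))])]))

;; Winkler boost: reward a shared prefix of at most four characters.
(define (jaro-winkler-sketch a b #:prefix-scale [p 0.1])
  (define j (jaro-sketch a b))
  (define prefix-len
    (for/sum ([x (in-string a)] [y (in-string b)] [k (in-range 4)]
              #:break (not (char=? x y)))
      1))
  (+ j (* prefix-len p (- 1.0 j))))

(jaro-winkler-sketch "martha" "marhta") ; close to 0.9611
```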
procedure
(string-similarity a b) → inexact-real?
a : string? b : string?
> (string-similarity "dixon" "dicksonx") 0.8133333333333332
1.15 Case Conversion and Predicates
This subsection provides lightweight whole-string predicates for whitespace, ASCII, and digit checks.
> (string-blank? "") #t
> (string-blank? " \t\n") #t
> (string-blank? " a ") #f
> (string-ascii? "") #t
> (string-ascii? "ABC123!?") #t
> (string-ascii? "café") #f
> (string-digit? "") #t
> (string-digit? "0123456789") #t
> (string-digit? "12a3") #f
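The empty-string results above follow naturally if each predicate is read as "every character satisfies a test", which is vacuously true for a string with no characters. The sketch below shows that reading; the names are hypothetical, and treating only #\0 through #\9 as digits is an assumption.

```racket
#lang racket

;; True when every character of s satisfies pred; vacuously true for "".
(define (all-chars? pred s)
  (for/and ([ch (in-string s)]) (pred ch)))

(define (blank-sketch? s) (all-chars? char-whitespace? s))
(define (ascii-sketch? s)
  (all-chars? (lambda (ch) (< (char->integer ch) 128)) s))
;; Assumption: only the ASCII digits 0-9 count as digits.
(define (digit-sketch? s)
  (all-chars? (lambda (ch) (char<=? #\0 ch #\9)) s))

(digit-sketch? "")     ; => #t
(digit-sketch? "12a3") ; => #f
```

When an empty field should be rejected, combine the predicate with an explicit length check.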
2 Character Sets
This section documents the character-set utilities used by string-count and available directly through string-tools/char-set.
Conceptually, a character set represents a collection of characters with membership operations and set operations such as union, intersection, and difference. It is useful when you want to classify characters efficiently and reuse that classification across multiple string-processing steps.
Character sets are represented with a hybrid structure: an ASCII bit mask for codepoints 0 through 127, plus a normalized collection of non-ASCII inclusive ranges. This representation gives fast membership tests for common ASCII text while keeping non-ASCII sets compact.
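To make the hybrid layout concrete, here is a small model of it. The struct and helper names are hypothetical and the real internals may differ; the sketch assumes bit k of the mask stands for codepoint k, and it scans the non-ASCII ranges linearly where a real implementation might use binary search.

```racket
#lang racket

;; ascii-mask: exact integer, bit k set means codepoint k is a member.
;; ranges: vector of (lo . hi) inclusive codepoint pairs, sorted and merged.
(struct cs-sketch (ascii-mask ranges) #:transparent)

;; Sort, deduplicate, and merge adjacent codepoints into inclusive ranges.
(define (normalize-ranges cps)
  (define sorted (sort (remove-duplicates cps) <))
  (define merged
    (for/fold ([acc '()]) ([cp (in-list sorted)])
      (cond
        [(and (pair? acc) (= cp (add1 (cdar acc))))
         (cons (cons (caar acc) cp) (cdr acc))] ; extend the latest range
        [else (cons (cons cp cp) acc)])))       ; start a new range
  (list->vector (reverse merged)))

(define (make-cs-sketch . chars)
  (define-values (mask others)
    (for/fold ([mask 0] [others '()]) ([ch (in-list chars)])
      (define cp (char->integer ch))
      (if (< cp 128)
          (values (bitwise-ior mask (arithmetic-shift 1 cp)) others)
          (values mask (cons cp others)))))
  (cs-sketch mask (normalize-ranges others)))

(define (cs-sketch-member? cs ch)
  (define cp (char->integer ch))
  (if (< cp 128)
      (bitwise-bit-set? (cs-sketch-ascii-mask cs) cp) ; single bit test
      (for/or ([r (in-vector (cs-sketch-ranges cs))])
        (<= (car r) cp (cdr r)))))

(cs-sketch-ascii-mask (make-cs-sketch #\a #\b #\a))
; => 475368975085586025561263702016, i.e. bits 97 and 98 set
```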
| (require string-tools/char-set) | package: string-tools-lib |
> (require string-tools/char-set)
> (char-set? (make-char-set #\a #\b)) #t
> (char-set? "ab") #f
value
empty-char-set : char-set?
> (require string-tools/char-set)
> (char-set-size empty-char-set) 0
> (char-set-member? empty-char-set #\a) #f
procedure
(make-char-set ch ...) → char-set?
ch : char?
> (require string-tools/char-set)
> (make-char-set #\a #\b #\a) (char-set 475368975085586025561263702016 '#())
procedure
(list->char-set xs) → char-set?
xs : (listof char?)
> (require string-tools/char-set)
> (define cs (list->char-set (list #\a #\b #\a)))
> (char-set-size cs) 2
> (char-set-member? cs #\b) #t
procedure
(string->char-set s) → char-set?
s : string?
> (require string-tools/char-set)
> (define cs (string->char-set "banana"))
> (char-set-size cs) 3
> (char-set-member? cs #\n) #t
procedure
(char-set-add cs ch) → char-set?
cs : char-set? ch : char?
> (require string-tools/char-set)
> (define cs (char-set-add empty-char-set #\x))
> (char-set-member? cs #\x) #t
procedure
(char-set-add-range cs lo-ch hi-ch) → char-set?
cs : char-set? lo-ch : char? hi-ch : char?
> (require string-tools/char-set)
> (define letters (char-set-add-range empty-char-set #\a #\z))
> (char-set-member? letters #\m) #t
> (char-set-member? letters #\A) #f
procedure
(char-set-member? cs ch) → boolean?
cs : char-set? ch : char?
> (require string-tools/char-set)
> (define vowels (make-char-set #\a #\e #\i #\o #\u))
> (char-set-member? vowels #\e) #t
> (char-set-member? vowels #\y) #f
procedure
(char-set-union a b) → char-set?
a : char-set? b : char-set?
> (require string-tools/char-set)
> (define vowels (make-char-set #\a #\e #\i #\o #\u))
> (define y (make-char-set #\y))
> (char-set-member? (char-set-union vowels y) #\y) #t
procedure
(char-set-intersection a b) → char-set?
a : char-set? b : char-set?
> (require string-tools/char-set)
> (define a (make-char-set #\a #\b #\c))
> (define b (make-char-set #\b #\c #\d))
> (char-set-size (char-set-intersection a b)) 2
procedure
(char-set-difference a b) → char-set?
a : char-set? b : char-set?
> (require string-tools/char-set)
> (define letters (char-set-add-range empty-char-set #\a #\f))
> (define vowels (make-char-set #\a #\e))
> (char-set-member? (char-set-difference letters vowels) #\b) #t
> (char-set-member? (char-set-difference letters vowels) #\a) #f
procedure
(char-set-size cs) → exact-nonnegative-integer?
cs : char-set?
> (require string-tools/char-set)
> (char-set-size (make-char-set #\a #\b #\a)) 2
> (require string-tools/char-set)
> (define vowels (make-char-set #\a #\e #\i #\o #\u))
> (char-set-member? vowels #\e) #t
> (char-set-size vowels) 5
> (define letters (char-set-add-range empty-char-set #\a #\z))
> (char-set-size (char-set-difference letters vowels)) 21
3 Extended Examples
This section presents three end-to-end workflows: log analysis, CSV-like import cleaning, and configuration normalization with patching.
3.1 Logs
This extended example uses a small synthetic log and shows a full normalize-parse-analyze pipeline. Each log line uses the format:
ts level service=... request_id=... msg="..."
Here, ts is the timestamp.
Prepare a small synthetic log input.
> (define raw-log
    (string-append
     "2026-02-21T22:10:00Z INFO service=api request_id=abc123 msg=\"start\"\r\n"
     "\e[31m2026-02-21T22:10:01Z ERROR service=api request_id=abc123 msg=\"timeout\"\e[0m\r\n"
     "2026-02-21T22:10:02Z WARN service=worker request_id=def456 msg=\"retrying\"\n"
     "2026-02-21T22:10:03Z INFO service=api request_id=abc123 msg=\"done\"\r"))
Normalize line endings and remove ANSI terminal escapes.
> (define cleaned-log (string-strip-ansi (string-normalize-newlines raw-log)))
Turn text into non-empty log lines and inspect quick counts.
> (define lines
    (filter (λ (s) (not (string-blank? s)))
            (string-lines cleaned-log)))
> (length lines) 4
> (string-count-needle cleaned-log "ERROR") 1
Define a parser that turns one line into a record.
> (define (line->record line)
    (define fs (string-fields line #\space))
    (define ts (list-ref fs 0))
    (define level (list-ref fs 1))
    (define service (string-between line "service=" " "))
    (define request-id (string-between line "request_id=" " "))
    (define msg (string-between line "msg=\"" "\"" #:right-match 'last))
    (list ts level service request-id msg))
Parse all lines and inspect the first parsed record.
> (define records (map line->record lines))
> (car records) '("2026-02-21T22:10:00Z" "INFO" "api" "abc123" "start")
Select all ERROR records.
> (define error-records
    (filter (λ (r) (string=? (list-ref r 1) "ERROR")) records))
> error-records '(("2026-02-21T22:10:01Z" "ERROR" "api" "abc123" "timeout"))
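A further analysis step that fits the same record shape is a per-level tally. The sketch below uses only core Racket; count-by-level is a hypothetical helper, and sample-records is written out by hand to mirror the (ts level service request-id msg) layout produced by line->record.

```racket
#lang racket

;; Hand-written records in the (ts level service request-id msg) shape.
(define sample-records
  '(("2026-02-21T22:10:00Z" "INFO" "api" "abc123" "start")
    ("2026-02-21T22:10:01Z" "ERROR" "api" "abc123" "timeout")
    ("2026-02-21T22:10:02Z" "WARN" "worker" "def456" "retrying")
    ("2026-02-21T22:10:03Z" "INFO" "api" "abc123" "done")))

;; Hypothetical helper: build an immutable hash from level to count.
(define (count-by-level records)
  (for/fold ([counts (hash)]) ([r (in-list records)])
    (hash-update counts (list-ref r 1) add1 0)))

(hash-ref (count-by-level sample-records) "INFO") ; => 2
```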
3.2 CSV-Like Import Cleaning
This example shows a small CSV-like import pipeline with quoted fields, whitespace cleanup, and row-level validation diagnostics.
Prepare a small CSV-like input and split it into rows.
> (define raw-csv
    (string-append "id,name,socre\r\n"
                   "1,\"Alice\",98\r\n"
                   "2,\" Bob \",87\r\n"
                   "x,\"Mallory\",91\r\n"
                   "4,\"Eve\",9a\r\n"
                   "5,\"\",100\r\n"))
> (define rows (string-lines (string-normalize-newlines raw-csv)))
Parse header and data rows.
> (define header (string-fields (car rows) #\, #:quote #\"))
> (define data-rows (cdr rows))
Validate header names and suggest likely intended names.
> (define expected-header '("id" "name" "score"))
> (define (best-column-suggestion col)
    (define-values (best-name best-score)
      (for/fold ([best-name #f] [best-score -1.0])
                ([cand (in-list expected-header)])
        (define score (string-similarity col cand))
        (if (> score best-score)
            (values cand score)
            (values best-name best-score))))
    (if (and best-name (>= best-score 0.7)) best-name #f))
> (define header-diagnostics
    (for/list ([col (in-list header)]
               #:unless (member col expected-header))
      (define suggestion (best-column-suggestion col))
      (if suggestion
          (string-append "unknown column " col "; did you mean " suggestion "?")
          (string-append "unknown column " col))))
> header-diagnostics '("unknown column socre; did you mean score?")
Define a small field normalizer used during import.
> (define (clean-field s) (string-trim-both (string-squeeze s #\space) #\space))
Parse each row as CSV-like fields and inspect parsed rows.
> (define parsed
    (for/list ([row (in-list data-rows)])
      (for/list ([field (in-list (string-fields row #\, #:quote #\"))])
        (clean-field field))))
> parsed
'(("1" "Alice" "98")
  ("2" "Bob" "87")
  ("x" "Mallory" "91")
  ("4" "Eve" "9a")
  ("5" "" "100"))
Validate rows: id and score must be digits; name must be non-blank.
> (define (row-error fs)
    (define id (list-ref fs 0))
    (define name (list-ref fs 1))
    (define score (list-ref fs 2))
    (cond
      [(not (string-digit? id)) "invalid id"]
      [(string-blank? name) "blank name"]
      [(not (string-digit? score)) "invalid score"]
      [else #f]))
Keep diagnostics for rows that fail validation.
> (define diagnostics
    (for/list ([row (in-list data-rows)]
               [fs (in-list parsed)]
               #:when (row-error fs))
      (list (row-error fs) (string-escape-visible row))))
> header '("id" "name" "socre")
> diagnostics
'(("invalid id" "x,\"Mallory\",91")
  ("invalid score" "4,\"Eve\",9a")
  ("blank name" "5,\"\",100"))
3.3 Config Normalization and Patching
This example parses an INI-like configuration text, validates keys, suggests fixes for unknown keys, patches one value in place, and emits normalized output with a final newline.
Prepare and normalize a small INI-like input.
> (define raw-config
    (string-append "; demo config\r\n"
                   "host = example.org\r\n"
                   "port = 8080\r\n"
                   "timeout = 30\r\n"
                   "retris = 2\r\n"
                   "mode fast\r\n"))
> (define normalized (string-normalize-newlines raw-config))
> (define lines (string-lines normalized))
> (define expected-keys '("host" "port" "timeout" "retries" "mode"))
Define helpers for comment detection and line parsing.
> (define (comment-line? t) (memv (string-at t 0 #f) '(#\# #\;)))
> (define (parse-config-line line)
    (define t (string-trim-both line))
    (cond
      [(string-blank? t) #f]
      [(comment-line? t) #f]
      [else
       (define-values (lhs sep rhs) (string-partition t "="))
       (if (string=? sep "")
           (list 'invalid (string-escape-visible line))
           (list (string-trim-both lhs) (string-trim-both rhs)))]))
Parse all lines and inspect the intermediate representation.
> (define parsed-lines (filter (λ (x) x) (map parse-config-line lines)))
> parsed-lines
'(("host" "example.org")
  ("port" "8080")
  ("timeout" "30")
  ("retris" "2")
  (invalid "mode fast"))
Validate keys and produce diagnostics with similarity-based suggestions.
> (define (best-key-suggestion k)
    (define-values (best-name best-score)
      (for/fold ([best-name #f] [best-score -1.0])
                ([cand (in-list expected-keys)])
        (define score (string-similarity k cand))
        (if (> score best-score)
            (values cand score)
            (values best-name best-score))))
    (if (and best-name (>= best-score 0.7)) best-name #f))
> (define diagnostics
    (for/list ([entry (in-list parsed-lines)]
               #:when (or (eq? (car entry) 'invalid)
                          (and (string? (car entry))
                               (not (member (car entry) expected-keys)))))
      (cond
        [(eq? (car entry) 'invalid)
         (string-append "malformed line: " (cadr entry))]
        [else
         (define key (car entry))
         (define sug (best-key-suggestion key))
         (if sug
             (string-append "unknown key " key "; did you mean " sug "?")
             (string-append "unknown key " key))])))
> diagnostics '("unknown key retris; did you mean retries?" "malformed line: mode fast")
Patch one setting in place and normalize the final output.
> (define old-timeout "timeout = 30")
> (define i (string-find-needle normalized old-timeout))
> (define patched
    (if i
        (string-replace-range normalized i (+ i (string-length old-timeout)) "timeout = 45")
        normalized))
> (define final-config (string-ensure-ends-with-newline patched))
> (displayln final-config)
; demo config
host = example.org
port = 8080
timeout = 45
retris = 2
mode fast