Regex Machine Language
1 Programs and Instructions
program?
program
example-program
program-instructions
instruction?
1.1 CHAR instructions
char-instruction?
char-instruction
char-instruction-unwrap
1.2 MATCH instructions
match-instruction?
match-instruction
1.3 JMP instructions
jmp-instruction?
jmp-instruction
jmp-instruction-target
1.4 SPLIT instructions
split-instruction?
split-instruction
split-instruction-target
split-instruction-fork-target
2 Regex Virtual Machine
vm?
vm
vm-program
vm-input
vm-status
vm-run
2.1 Regex VM Threads
vm-thread?
vm-threads
vm-thread-program-counter
vm-thread-character-counter
8.12

Regex Machine Language🔗ℹ

 (require regex-machine) package: regex-machine

1 Programs and Instructions🔗ℹ

procedure

(program? v)  boolean?

  v : any/c
A predicate that determines if v is a regex machine program.

procedure

(program inst ...)  program?

  inst : instruction?
Constructs a regex machine program comprised of the given insts.

Example:
> (program (char-instruction #\a)
           (char-instruction #\b)
           (split-instruction 3 5)
           (char-instruction #\x)
           match-instruction
           (char-instruction #\y)
           match-instruction)

(program

 (char-instruction #\a)

 (char-instruction #\b)

 (split-instruction 3 5)

 (char-instruction #\x)

 #<match-instruction>

 (char-instruction #\y)

 #<match-instruction>)

The example program. Used in documentation examples and in tests.

Example:
> example-program

(program

 (split-instruction 1 3)

 (char-instruction #\a)

 (jmp-instruction 0)

 (char-instruction #\b)

 (split-instruction 5 7)

 (char-instruction #\b)

 (jmp-instruction 4)

 #<match-instruction>)

procedure

(program-instructions prog)  (listof instruction?)

  prog : program?
Returns the list of instructions in prog.

Examples:
> (program-instructions
   (program (char-instruction #\a)
            (char-instruction #\b)
            (char-instruction #\c)))

(list (char-instruction #\a) (char-instruction #\b) (char-instruction #\c))

> (program-instructions example-program)

(list

 (split-instruction 1 3)

 (char-instruction #\a)

 (jmp-instruction 0)

 (char-instruction #\b)

 (split-instruction 5 7)

 (char-instruction #\b)

 (jmp-instruction 4)

 #<match-instruction>)

procedure

(instruction? v)  boolean?

  v : any/c
A predicate that determines if v is a regex machine instruction.

1.1 CHAR instructions🔗ℹ

procedure

(char-instruction? v)  boolean?

  v : any/c
A predicate that determines if v is a CHAR instruction. Implies instruction?.

procedure

(char-instruction char)  char-instruction?

  char : char?
Constructs a CHAR instruction that instructs the machine to consume char from the input.

Example:
> (char-instruction #\b)

(char-instruction #\b)

procedure

(char-instruction-unwrap inst)  char?

  inst : char-instruction?
Returns the character that inst instructs the machine to consume.

Example:

1.2 MATCH instructions🔗ℹ

procedure

(match-instruction? v)  boolean?

  v : any/c
A predicate that determines if v is the MATCH instruction. Implies instruction?.

The MATCH instruction, which instructs the machine to stop running the executing thread and mark that thread’s execution successful if all input has been consumed. If any input has not been consumed, the thread’s execution is considered failed.

1.3 JMP instructions🔗ℹ

procedure

(jmp-instruction? v)  boolean?

  v : any/c
A predicate that determines if v is a JMP instruction. Implies instruction?.

procedure

(jmp-instruction target)  jmp-instruction?

  target : natural?
Constructs a JMP instruction, which instructs the executing thread to jump to the instruction at position target in the program.

Example:
> (jmp-instruction 4)

(jmp-instruction 4)

procedure

(jmp-instruction-target inst)  natural?

  inst : jmp-instruction?
Returns the position that inst instructs the executing thread to jump to.

Examples:
> (define example-jmp (jmp-instruction 42))
> (jmp-instruction-target example-jmp)

42

1.4 SPLIT instructions🔗ℹ

procedure

(split-instruction? v)  boolean?

  v : any/c
A predicate that determines if v is a SPLIT instruction. Implies instruction?.

procedure

(split-instruction target fork-target)  split-instruction?

  target : natural?
  fork-target : natural?
Constructs a SPLIT instruction, which instructs the executing thread to jump to the instruction at position target in the program, similar to a JMP instruction. However, unlike a JMP instruction, the SPLIT instruction also instructs the machine to spawn a new thread that is forked from the thread executing the SPLIT, meaning the new thread is a copy of this thread with the same program position and the same consumed input. The forked thread then jumps to the fork-target instruction.

Example:
> (split-instruction 2 5)

(split-instruction 2 5)

procedure

(split-instruction-target inst)  natural?

  inst : split-instruction?

procedure

(split-instruction-fork-target inst)  natural?

  inst : split-instruction?
Functions that return either the position that inst instructs the executing thread to jump to, or the position that inst instructs the forked thread to jump to, respectively.

Examples:
> (define example-split (split-instruction 42 17))
> (split-instruction-target example-split)

42

> (split-instruction-fork-target example-split)

17

2 Regex Virtual Machine🔗ℹ

procedure

(vm? v)  boolean?

  v : any/c
A predicate that determines if v is a regex virtual machine.

procedure

(vm prog input)  vm?

  prog : program?
  input : string?
Constructs a regex virtual machine that matches input with prog.

Example:
> (vm example-program "abx")

image

procedure

(vm-program machine)  program?

  machine : vm?
Returns the program that machine is running.

Examples:
> (define example-vm (vm example-program "ab"))
> (vm-program example-vm)

(program

 (split-instruction 1 3)

 (char-instruction #\a)

 (jmp-instruction 0)

 (char-instruction #\b)

 (split-instruction 5 7)

 (char-instruction #\b)

 (jmp-instruction 4)

 #<match-instruction>)

procedure

(vm-input machine)  string?

  machine : vm?
Returns the input string that machine is trying to match with its program.

Examples:
> (define example-vm (vm example-program "ab"))
> (vm-input example-vm)

"ab"

procedure

(vm-status machine)  (or/c 'running 'success 'failure)

  machine : vm?
Returns the current status of machine. A VM can have one of three statuses:

Examples:
> (define multiple-as-then-one-b
    (program (split-instruction 1 3)
             (char-instruction #\a)
             (jmp-instruction 0)
             (char-instruction #\b)
             match-instruction))
> (define aab-vm
    (vm multiple-as-then-one-b "aab"))
> (vm-status aab-vm)

'running

> (vm-status (vm-run aab-vm #:steps 2))

'running

> (vm-status (vm-run aab-vm #:steps 10))

'success

procedure

(vm-run machine #:steps num-steps)  vm?

  machine : vm?
  num-steps : natural?
Advances machine by a num-steps execution steps and returns the updated machine. All threads in a regex machine run in parallel, so a single execution step of the machine updates multiple threads at once.

Note that virtual machines are immutable, so this function returns a new machine that is distinct from machine.

Examples:
> (define even-number-of-as
    (program (split-instruction 1 4)
             (char-instruction #\a)
             (char-instruction #\a)
             (jmp-instruction 0)
             match-instruction))
> (define aaaaaa-vm (vm even-number-of-as "aaaaaa"))
> aaaaaa-vm

image

> (vm-run aaaaaa-vm #:steps 1)

image

> (vm-run aaaaaa-vm #:steps 4)

image

> (vm-run aaaaaa-vm #:steps 20)

image

2.1 Regex VM Threads🔗ℹ

procedure

(vm-thread? v)  boolean?

  v : any/c
A predicate that determines if v is a thread of execution in a regex virtual machine.

procedure

(vm-threads machine)  (listof vm-thread?)

  machine : vm?
Returns a list of all threads that have ever been started by machine, including running threads, dead threads, and threads that successfully matched the input.

Examples:
> (define all-as-or-all-bs
    (program (split-instruction 1 4)
             (split-instruction 2 7)
             (char-instruction #\a)
             (jmp-instruction 1)
             (split-instruction 5 7)
             (char-instruction #\b)
             (jmp-instruction 4)
             match-instruction))
> (define aaaa-vm (vm all-as-or-all-bs "aaaa"))
> (vm-threads aaaa-vm)

(list (vm-thread 'running 0 0))

> (vm-threads (vm-run aaaa-vm #:steps 1))

(list (vm-thread 'running 1 0) (vm-thread 'running 4 0))

> (vm-threads (vm-run aaaa-vm #:steps 2))

(list

 (vm-thread 'running 2 0)

 (vm-thread 'running 5 0)

 (vm-thread 'running 7 0)

 (vm-thread 'running 7 0))

procedure

(vm-thread-program-counter thd)  natural?

  thd : vm-thread?
Returns the program counter of thd, which indicates what program instruction the thread is executing. CHAR and MATCH instructions increase this number by one, while JMP and SPLIT instructions set it to a new number.

procedure

(vm-thread-character-counter thd)  natural?

  thd : vm-thread?
Returns the character counter of thd, which is the total number of characters that thd has accepted from the input string so far. Each CHAR instruction successfully executed by a thread increases this number by one.

Note that threads must accept characters from the input string in left-to-right order and cannot skip characters, so the exact prefix accepted by the thread can be reconstructed given its character counter and the original input string.