i recently built a toy programming language (brtlang), which got me more interested about interpreters and compilers. so i thought of building a brainfuck interpreter (pretty obvious, didn't you see it coming?).
brainfuck is an esoteric programming language which is designed to be extremely minimalistic. the language consists of only eight simple commands, a data pointer, and an instruction pointer. even tho it has a very small set of keywords, it is a turning-complete language.
a brainfuck program is initialized with an array of 30,000 1 byte memory
blocks (would be referring it as tape
), in which each
cell/block is initialized to zero, an instruction pointer that loops over
the tape, and a data pointer that loops over the file contents.
the available 8 keywords/operators in brainfuck are:
>
- move the instruction pointer to the next block
<
- move the instruction pointer to the previous block
+
- increase the value of current block-
- decrease the value of current block[
- loops until current block value is equal to 0]
- if current block value is non-zero, it jumps back to
[
. [
and ]
combined form the loop
,
- reads the character from stdin
and stores
it at the current block
.
- prints out current block value to stdout
any character other than the above mentioned 8 keywords must be considered as a comment, basically ignored by the interpreter
>++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++.>>++++++[<+++++++>-]<++.------------.>++++++[<+++++++++>-]<+.<.+++.------.--------.>>>++++[<++++++++>-]<+.
let's break it down.
>++++++++[<+++++++++>-]<.
>++++[<+++++++>-]<+.
+++++++.
.
+++.
>>++++++[<+++++++>-]<++.
------------.
>++++++[<+++++++++>-]<+.
<.
+++.
------.
--------.
>>>++++[<++++++++>-]<+.
>++++++++[<+++++++++>-]<.
the instruction pointer moves one block to the right and increments the value by 8
>> current state of tape - [0,(8),...]
(()
indicates that the pointer is currently at that block)
the program then enters a loop. the instruction pointer moves one block to the left and increments the value by 9
>> current state of tape - [(9),8,...]
the instruction pointer moves one block to the right and decrements
the value by 1. as the current block value is non-zero, it jumps back
to [
and it continues until value of the second block
becomes zero, so 8 times. at the end of the loop, the value of first
block would be equal to 72
>> current state of tape - [72,(0),...]
the instruction pointer moves one block to the left and prints out current block value
>> current state of tape - [(72),0,...]
72 is ASCII code for letter H
>++++[<+++++++>-]<+.
the instruction pointer moves one block to the right and increments the value by 4
>> current state of tape - [72,(4),...]
the program enters a loop. the instruction pointer moves one block to the left and increments the value by 7
>> current state of tape - [(80),4,...]
the instruction pointer moves one block to the right and decrements
the value by 1. as the current block value is non-zero, it jumps back
to [
and it continues until the value of the second block
becomes zero, so 4 times. at the end of the loop, the value of first
loop would be equal to 72 + (4*7) = 100
>> current state of tape - [100,(0),...]
the instruction pointer moves one block to the left, increments the value by 1 and prints out current block's value
>> current state of tape - [(101),0,...]
101 is ASCII code for letter e
+++++++.
the instruction pointer increments value of current block by 7 and prints out current block's value
>> current state of tape - [(108),0,...]
108 is ASCII code for letter l
.
the instruction pointer prints out current block's value
>> curent state of tape - [(108),0,...]
108 is ascii code for letter l
+++.
the instruction pointer increments value of current block by 3 and prints out current block's value
>> current state of tape - [(111),0,...]
111 is ascii code for letter o
>>++++++[<+++++++>-]<++.
the instruction pointer moves two blocks to the right and increments value of current block by 6
>> current state of tape - [111,0,(6),...]
the program enters a loop. the instruction pointer moves one block to the left and increments value of current block by 7
>> current state of tape - [111,(7),6,...]
the instruction pointer moves one block to the right and decrements
value of current block by 1. as value of current block is non-zero,
the program jumps back to [
and it continues until value
of third block is equal to zero, so 6 times
>> current state of tape - [111,42,(0),...]
the instruction pointer moves one block to the left, increments value of current block by 2 and prints out the value
>> current state of tape - [111,(44),0,...]
44 is ASCII code for ,
so yea, you get it. let's now write the interpreter
let's structure interpreter
type Interpreter struct {
Tape [30000]uint8
Ptr int
Input []byte
}
Tape
is 30,000 byte array which is initialized at the start
of the program
Ptr
is current index of instruction pointerInput
is the source code
Tape
is of an array of type uint8
to have
wrapping
func (p *Interpreter) Run() {
bracketCounter := 0
for i := 0; i < len(p.Input); i++ {
switch p.Input[i] {
case '>':
p.Ptr++
case '<':
p.Ptr--
case '+':
p.Tape[p.Ptr]++
case '-':
p.Tape[p.Ptr]--
case '.':
fmt.Print(string(p.Tape[p.Ptr]))
case ',':
var input string
if _, err := fmt.Scan(&input); err != nil {
fmt.Println(InterpreterError("failed to read input", i))
os.Exit(1)
}
runes := []rune(input)
if len(runes) > 1 {
fmt.Println(InterpreterError(fmt.Sprintf("recieved %s. expected single character input", input), i))
os.Exit(1)
}
p.Tape[p.Ptr] = byte(runes[0])
case '[':
if p.Tape[p.Ptr] == 0 {
bracketCounter++
for p.Input[i] != ']' || bracketCounter != 0 {
i++
if p.Input[i] == '[' {
bracketCounter++
} else if p.Input[i] == ']' {
bracketCounter--
}
}
}
case ']':
if p.Tape[p.Ptr] != 0 {
bracketCounter++
for p.Input[i] != '[' || bracketCounter != 0 {
i--
if p.Input[i] == ']' {
bracketCounter++
} else if p.Input[i] == '[' {
bracketCounter--
}
}
}
}
}
}
i've defined Run
method on Interpreter
struct
which is responsible for interpreting the source code.
the code might look overwhelming at first, so let's break down.
Run
method loops the source code and checks value of each
character. i
in the for
loop acts like data
pointer, p.Ptr
acts as instruction pointer and
p.Tape[p.Ptr]
acts as current block
>
then p.Ptr
is
incremented by 1
<
then p.Ptr
is
decremented by 1
+
then value of current block
(p.Tape[p.Ptr]
) is incremented by 1
-
then value of current block
(p.Tape[p.Ptr]
) is decremented by 1
.
then string equivalent of current
block value (string(p.Tape[p.Ptr])
) is printed out
if current character is ,
then the input is read from
stdin
var input string
if _, err := fmt.Scan(&input); err != nil {
fmt.Println(InterpreterError("failed to read input", i))
os.Exit(1)
}
and checks whether the input is a multi-character or not.
runes := []rune(input)
if len(runes) > 1 {
fmt.Println(InterpreterError(fmt.Sprintf("recieved %s. expected single character input", input), i))
os.Exit(1)
}
InterpreterError
is just an utility function which
appends an prefix to an error message
func InterpreterError(msg string, idx int) string {
return fmt.Sprintf("an error occured at %d character: %s", idx, msg)
}
if the input isn't a multi-character then ASCII code of the first element is stored into the current block
p.Tape[p.Ptr] = byte(runes[0])
if current character is [
then it checks current block
value. if it is zero, then the loop is skipped until the matching
]
. if no, then it executes the loop body by moving the
data pointer i.e. incrementing i
bracketCounter
is used to track how many pairs of
brackets ([
and ]
) the interpreter has
encountered
if p.Tape[p.Ptr] == 0 {
bracketCounter++
for p.Input[i] != ']' || bracketCounter != 0 {
i++
if p.Input[i] == '[' {
bracketCounter++
} else if p.Input[i] == ']' {
bracketCounter--
}
}
}
if current character is ]
then it checks current block
value. if it is zero, then the loop is exited. if no, then it jumps
back to matching [
to execute the loop body until current
block value is equal to zero
if p.Tape[p.Ptr] != 0 {
bracketCounter++
for p.Input[i] != '[' || bracketCounter != 0 {
i--
if p.Input[i] == ']' {
bracketCounter++
} else if p.Input[i] == '[' {
bracketCounter--
}
}
}
and that's how you implement a brainfuck interpreter in under 100 lines in golang.
source code