A while ago I went through the first part of Crafting Interpreters and implemented Lox in Java as the book did and shared it on GitHub in MyLox Repository, which inspired me to create a similar programming language.
Note: I changed the name of this repository multiple times, I wanted to call it Onyx but the name was taken so I went with Qamar.
In this repository, I will be going through the second part of the book, and try to implement my own programming language and document my journey.
So far, I've implemented the Bytecode along with a couple of instructions. Additionally, I created a simple disassembler that takes those chunks and disassembles them for debugging purposes. This photo shows a hardcoded example in main for the disassembly of the instructions:
I have also developed the Virtual Machine upon which the interpreter operates. Additionally, I have introduced new Bytecode instructions supported by a disassembler and integrated them into the interpreter loop. As a result, I am now able to evaluate complex arithmetic expressions seamlessly. This screenshot shows a disassembled chunk of bytecode calculating the expression -((1.2 + 3.4) / 5.6)
After completing a significant portion of the backend for the compiler, I have now begun working on the frontend. Today, I focused on the initial component essential for a compiler's frontend: the Scanner (also known as the Lexer). I successfully implemented all the lexical grammars, resulting in an artifact capable of processing source code and generating corresponding tokens.
This screenshot shows the output of the lexer where:
1
is the line number|
means that this token is also at line 136
beingTOKEN_VAR
defined in theTokenType
Enum in scanner.h19
,13
,21
,10
,8
, and39
beingTOKEN_IDENTIFIER
,TOKEN_EQUAL
,TOKEN_NUMBER
,TOKEN_STAR
,TOKEN_SEMICOLON
, andTOKEN_EOF
respectively
Today, I completed the final segment of the VM's execution pipeline and developed a compiler capable of parsing high-level source code using Vaughan Pratt’s “top-down operator precedence parsing” technique. The compiler outputs a series of low-level instructions (Bytecode instructions) that can be executed by the VM. The types of instructions I can parse and execute include:
- Number literals
- Parentheses for grouping
- Unary Expressions
- The Four Horsemen of the Arithmetic +, -, *, /
So basically now I've got myself an over-engineered calculator
So far, our language has only had one data type (double), represented as the typedef
Value. However, in the past few days, I've been working on implementing new data types. I successfully added three new data types: NIL
, BOOL
, and STRING
. Additionally, I've included some helper macros to facilitate type-checking and conversion between the language's representation of these data types and their C counterparts.
Additionally, I've introduced several new operators to handle operations associated with the new data types. These include logical operators such as <
, <=
, >
, and ==
, among others. Furthermore, I've implemented string concatenation using the +
operator.
After working on the hash table implementation for the language, I utilized it to implement and support global variables. This allows us to define variables using the keyword var
, assign them values, and subsequently modify their values. The retrieval of variables is entirely facilitated through the hash table mechanism.
So far, all variables supported by the language have been global. Recently, I focused on extending the language to incorporate blocks, block scopes, and local variables. Additionally, I introduced a completely new runtime representation for variables using the new struct Compiler
that keeps track of the scope depth, and the number of the local variables in that scope.
So given this example test program provided in block.qmr
If we run the code it will produce the following output as expected
Our programming language is finally starting to look like a legitimate programming language.
I don't feel like typing a lot now, but I implemented if
statements and if else
, so given this program in if_statement.qmr:
It produces the following output:
There's not much to explain here except for the fact that we've added a new feature: looping, specifically while loops, along with a new test file that contains a code snippet to verify its functionality while_loop.qmr that counts down from 10 to 1
It produces the following output:
This is an example of a for
loop that prints even numbers from 1 through 10 provided in for_loop.qmr:
It produces the following output:
With all of these control structures in place, Onyx is finally considered to be a Turing-complete programming language.
I have implemented support for defining and calling functions. Additionally, I have incorporated Native Functions, which are functions defined in C that can be accessed by users (similar to standard library functions). As an example, I have implemented a clock
function that provides the elapsed time since the program started. The following example is provided in fib.qmr:
Running it produces the following:
Given this program in native.qmr:
It produces the following output:
In Qamar, closures are functions that can capture and retain references to variables from their surrounding scope, enabling them to operate on these variables even after their outer functions have completed execution. Upvalues are the captured variables themselves, which are bound to a closure. Each closure maintains its own set of upvalues, allowing it to preserve the environment in which it was defined. Given this example from closures.qmr:
It produces the following output: