Recommendations for my parser and what do I do next #121039
#include <iostream>
#include <string>
#include <map>
#include "Interpreter.h"
using namespace std;

// A struct for the data type tokens
struct DataTypeTokens {
    enum DataTypes {
        STRING,
        FLOAT,
        INT,
        CONST,
        CHAR,
    };
};

// A struct for the keyword tokens
struct Keywords {
    enum KeywordTokens {
        INTEGER8,
        INTEGER16,
        INTEGER32,
        INTEGER64,
        STRING8,
        STRING16,
        STRING32,
        STRING64,
        FLOAT8,
        FLOAT16,
        FLOAT32,
        FLOAT64,
        CHAR8,
        CHAR16,
        CHAR32,
        CHAR64,
        INLINE_ASSEMBLY,
        AUTOMATIC_POINTER,
        MANUAL_POINTER,
        PRINT,
    };
};

// Hold the token
struct Tokens {
    DataTypeTokens dataTypeTokens;
    Keywords keywords;
};

// Send the token to the interpreter
// Parse an instruction
int ParseInstruction(string instruction) {
    Tokens tokens;
    map<string, int> keywords = {
        {"PRINT", tokens.keywords.PRINT},
        {"INT8", tokens.keywords.INTEGER8},
        {"INT16", tokens.keywords.INTEGER16},
        {"INT32", tokens.keywords.INTEGER32},
        {"INT64", tokens.keywords.INTEGER64},
        {"FLOAT8", tokens.keywords.FLOAT8},
        {"FLOAT16", tokens.keywords.FLOAT16},
        {"FLOAT32", tokens.keywords.FLOAT32},
        {"FLOAT64", tokens.keywords.FLOAT64},
        {"CHAR8", tokens.keywords.CHAR8},
        {"CHAR16", tokens.keywords.CHAR16},
        {"CHAR32", tokens.keywords.CHAR32},
        {"CHAR64", tokens.keywords.CHAR64},
        {"STRING8", tokens.keywords.STRING8},
        {"STRING16", tokens.keywords.STRING16},
        {"STRING32", tokens.keywords.STRING32},
        {"STRING64", tokens.keywords.STRING64},
        {"ASM", tokens.keywords.INLINE_ASSEMBLY},
        {"AUTOMATIC_POINTER", tokens.keywords.AUTOMATIC_POINTER},
        {"MANUAL_POINTER", tokens.keywords.MANUAL_POINTER},
    };
    return keywords[instruction];
}

int main() {
    int convertedInstruction;  // ParseInstruction returns the token as an int
    string instruction;
    while (true) {
        cout << "BASIC > ";
        cin >> instruction;
        convertedInstruction = ParseInstruction(instruction);
        cout << convertedInstruction << endl;
    }
    return 0;
}
Hey there @PyDOS8

First off, what language are you trying to parse? Could you provide a link to a description of the language or give a code example? I did see your BASICPY repository with the instructions, so I am going to try to infer it from there. It looks like you are trying to parse some simplified version of assembly. So, assume you have the following input:

INT var1 9
INT var2 8
INT result 0
ADD result var1 var2

Then, what you do is call ParseInstruction on each line. So far, so good.

Tokenization

The first step you need to do now is tokenization. A token is an "atom" of the programming language. As far as I can tell, you need more tokens, namely: instruction tokens, a variable token, and a number token. You need to turn your input into a stream of tokens. Here is a very basic example in a pseudolanguage. First, we need a helper function:

# helper: allows any amount of whitespace between elements.
# splits the string on whitespace and makes sure the result contains only nonempty strings.
String[] split_elements(String s):
    String[] res = []
    String curr = ""
    for (i in range(len(s))):
        if (s[i].isWhitespace()):  # space, newline, etc.: okay, a new element begins.
            if (curr != ""):
                res.append(curr)
                curr = ""
        else:
            curr += s[i]
    if (curr != ""):  # keep the last element when the string does not end in whitespace
        res.append(curr)
    return res

This helper splits the string on whitespace, even if the user jams in extra whitespace.

The big meat: Top down!

First, if you are overwhelmed, here is the approach in a nutshell: top down. You take the string. You break it down. You parse each part. Left to right. Simple, right? Note the names: each construct (statement, instruction (binary op), variable) has a tokenizer function.

Token[] tokenize_statement(String s):
    String[] parts = split_elements(s)
    if (parts[0] not in keywords):
        raise Exception(f"Unknown instruction {parts[0]}")
    Token instr = keywords[parts[0]]
    switch(instr):
        case ADD:
            return [instr] + tokenize_binary_op(parts[1:])
        # ... all other instructions
    return None  # should not get here.

# handles the arguments of binary operations like ADD, SUBTRACT, MULTIPLY, DIVIDE, etc.
Token[] tokenize_binary_op(String[] s):
    if (s.length != 3):
        raise Exception("Require 3 arguments")
    return [tokenize_variable(s[0]), tokenize_variable(s[1]), tokenize_variable(s[2])]

Token tokenize_variable(String var):
    if (var[0].isDigit()):
        raise Exception("Variable should not start with digit!")
    if (var in keywords):
        raise Exception(f"Variable should not be named after keyword {var}")
    return VariableToken(var)

What does this do? It produces a stream (array) of tokens for a given statement. The way this is done is: split the statement into elements, look up the first element in the keyword table, then hand the remaining elements to the tokenizer for that instruction's arguments, which tokenizes each variable in turn.
Note: The above enforces all the rules of the language! For example, the above dictates that a binary operation such as ADD takes exactly three arguments, and that a variable name cannot start with a digit or be a keyword.

Again, big takeaway: top down, break the string into parts, and go left to right!

P.S. If this comment helped you, please consider marking it as the answer. Thanks.
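
For reference, the pseudocode above might translate to C++ roughly as follows. This is only a sketch under assumptions, not a finished tokenizer: `TokenKind`, `Token`, and `kKeywords` are hypothetical names invented for illustration, only ADD is wired up, and number tokens are left out.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; a real language needs more.
enum class TokenKind { Add, Variable };

struct Token {
    TokenKind kind;
    std::string text;  // original spelling, e.g. "ADD" or "var1"
};

// Assumed keyword table; only ADD is registered here.
const std::map<std::string, TokenKind> kKeywords = {
    {"ADD", TokenKind::Add},
};

// Split on runs of whitespace; operator>> never yields empty strings.
std::vector<std::string> split_elements(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> parts;
    std::string part;
    while (in >> part) parts.push_back(part);
    return parts;
}

// A variable may not start with a digit and may not be a keyword.
Token tokenize_variable(const std::string& name) {
    if (name.empty() || std::isdigit(static_cast<unsigned char>(name[0])))
        throw std::runtime_error("Variable should not start with digit: " + name);
    if (kKeywords.count(name) != 0)
        throw std::runtime_error("Variable should not be named after keyword " + name);
    return Token{TokenKind::Variable, name};
}

// Arguments of a binary operation: exactly three variables.
std::vector<Token> tokenize_binary_op(const std::vector<std::string>& args) {
    if (args.size() != 3) throw std::runtime_error("Require 3 arguments");
    std::vector<Token> out;
    for (const std::string& arg : args) out.push_back(tokenize_variable(arg));
    return out;
}

// Top down: split the line, look up the instruction, then dispatch
// to the tokenizer for that instruction's arguments.
std::vector<Token> tokenize_statement(const std::string& line) {
    const std::vector<std::string> parts = split_elements(line);
    if (parts.empty()) throw std::runtime_error("Empty statement");
    const auto it = kKeywords.find(parts[0]);
    if (it == kKeywords.end())
        throw std::runtime_error("Unknown instruction " + parts[0]);

    std::vector<Token> tokens{Token{it->second, parts[0]}};
    const std::vector<std::string> args(parts.begin() + 1, parts.end());
    switch (it->second) {
        case TokenKind::Add: {
            const std::vector<Token> operands = tokenize_binary_op(args);
            tokens.insert(tokens.end(), operands.begin(), operands.end());
            break;
        }
        case TokenKind::Variable:
            break;  // unreachable: the keyword table never maps to Variable
    }
    return tokens;
}
```

The lookup uses `find` rather than `keywords[instruction]` because `operator[]` on a `std::map` inserts a default value (0) for a missing key, so an unknown instruction would silently come back as the first enum value instead of being reported as an error.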
I am now working on a game, so this project remains unfinished. So, I'll mark this question as answered.