“Ad Hoc” lexical analyzer

KimM · Aug 10, 2012

So for a project I am trying to create a simple lexical analyzer for a fake programming language that is read in from a file. I asked a question earlier in the week about how to implement such a program and received an answer telling me to: create an input buffer and two output buffers; initialize two loop indices and increment them until I find the start of a token; once I have found the start, increment the second index until I find whitespace or a symbol; then use a case statement to write to the two output files, set the outer index equal to the inner one, and continue scanning. I've done some research, and this is similar to the loop-and-switch or "ad hoc" method. Here is what I have so far:

\[code\]
import java.io.*;

public class Lex {

    // Returns true if character b occurs anywhere in array a.
    public static boolean contains(char[] a, char b) {
        for (int i = 0; i < a.length; i++) {
            if (b == a[i])
                return true;
        }
        return false;
    }

    public static void main(String args[]) throws FileNotFoundException, IOException {

        // Declaring token values as constant integers.
        final int T_DOUBLE = 0;
        final int T_ELSE = 1;
        final int T_IF = 2;
        final int T_INT = 3;
        final int T_RETURN = 4;
        final int T_VOID = 5;
        final int T_WHILE = 6;
        final int T_PLUS = 7;
        final int T_MINUS = 8;
        final int T_MULTIPLICATION = 9;
        final int T_DIVISION = 10;
        final int T_LESS = 11;
        final int T_LESSEQUAL = 12;
        final int T_GREATER = 13;
        final int T_GREATEREQUAL = 14;
        final int T_EQUAL = 16;
        final int T_NOTEQUAL = 17;
        final int T_ASSIGNOP = 18;
        final int T_SEMICOLON = 19;
        final int T_PERIOD = 20;
        final int T_LEFTPAREN = 21;
        final int T_RIGHTPAREN = 22;
        final int T_LEFTBRACKET = 23;
        final int T_RIGHTBRACKET = 24;
        final int T_LEFTBRACE = 25;
        final int T_RIGHTBRACE = 26;
        final int T_ID = 27;
        final int T_NUM = 28;

        // Character classes used to recognize the start of a token.
        char[] letters_ = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
                           'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','_'};
        char[] numbers = {'0','1','2','3','4','5','6','7','8','9'};
        char[] symbols = {'+','-','*','/','<','>','!','=',':',',','.','(',')','[',']','{','}'};

        // One input file and two output files.
        FileInputStream fstream = new FileInputStream("src\\testCode.txt");
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        BufferedWriter bw1 = new BufferedWriter(new FileWriter(new File("src\\output.txt"), true));
        BufferedWriter bw2 = new BufferedWriter(new FileWriter(new File("src\\output2.txt"), true));

        String scanner;
        String temp = "";
        int n = 0;

        while ((scanner = br.readLine()) != null) {
            for (int i = 0; i < scanner.length(); i++) {
                for (int j = 0; j < scanner.length(); j++) {
                    if (contains(letters_, scanner.charAt(i)) || contains(numbers, scanner.charAt(i)) || contains(symbols, scanner.charAt(i))) {
                        j++;
                        n++;
                        // Guard against running past the end of the line.
                        if (j < scanner.length()
                                && (scanner.charAt(j) == ' ' || scanner.charAt(j) == '\n' || scanner.charAt(j) == '\t')) {
                            // Token boundary found; this is where I am stuck.
                        }
                    }
                }
            }
        }

        // Close the reader and both writers.
        br.close();
        bw1.close();
        bw2.close();
    }
}
\[/code\]

My question is: how can I determine which token to assign to a word once I find whitespace or a symbol? Can I collect each character that comes before the whitespace or symbol into a string and compare that? I tried something similar, but it wrote my whole input file into the string, so my tokens never matched in my switch statement. Also, using this method, how can I safely ignore comments and comment blocks, since they should not be tokenized?
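For reference, here is a minimal sketch of the "collect the characters into a string and compare" idea, written as one possible approach rather than a definitive answer. Everything in it that is not in the post is an assumption: the class name `LexSketch` and the method `tokenize` are hypothetical, the comment syntax is assumed to be `//` and `/* ... */` (the post does not say what the fake language uses), symbols are all labeled with a placeholder `T_SYMBOL` because the post does not say which characters map to `T_EQUAL` versus `T_ASSIGNOP`, and tokens are returned as strings instead of being written to the two output files. Note that `Character.isLetter` also accepts non-ASCII letters, which is broader than the `letters_` array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LexSketch {

    // Keyword lexemes mapped to the token names used in the post.
    static final Map<String, String> KEYWORDS = Map.of(
            "double", "T_DOUBLE", "else", "T_ELSE", "if", "T_IF", "int", "T_INT",
            "return", "T_RETURN", "void", "T_VOID", "while", "T_WHILE");

    // Scans src with two indices: i marks the start of a lexeme, j its end.
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);

            // Whitespace never starts a token.
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            // Assumed line comment: skip to the end of the line.
            if (src.startsWith("//", i)) {
                while (i < src.length() && src.charAt(i) != '\n') i++;
                continue;
            }
            // Assumed block comment: skip to the closing marker (or end of input).
            if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? src.length() : end + 2;
                continue;
            }

            int j = i;
            if (Character.isLetter(c) || c == '_') {
                // Identifier or keyword: extend j over letters, digits and '_'.
                while (j < src.length()
                        && (Character.isLetterOrDigit(src.charAt(j)) || src.charAt(j) == '_')) {
                    j++;
                }
                String lexeme = src.substring(i, j);
                out.add(KEYWORDS.getOrDefault(lexeme, "T_ID") + ":" + lexeme);
            } else if (Character.isDigit(c)) {
                // Number: extend j over digits only.
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                out.add("T_NUM:" + src.substring(i, j));
            } else {
                // Symbol: one character, or two when '<', '>', '!' or '=' is followed by '='.
                j = i + 1;
                if (j < src.length() && src.charAt(j) == '=' && "<>!=".indexOf(c) >= 0) j++;
                out.add("T_SYMBOL:" + src.substring(i, j));
            }
            // Make the outer index equal to the inner one and keep scanning.
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("int x = 10; // set x\nif (x <= 3) /* note */ return x;")) {
            System.out.println(token);
        }
    }
}
```

The point of the sketch is that the lexeme is only ever the substring between `i` and `j`, so it cannot grow to hold the whole file, and comments are consumed before any lexeme is started. Replacing the placeholder `T_SYMBOL` with a `switch` on the symbol substring would give the per-operator tokens from the constant list.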
