
Lexical Analyzer

DFA Formation

  • Single DFA for identifiers and keywords. When an identifier is recognized, check whether its length is <= 20; if not, it is an error. Otherwise, look up the identified string in a lookup table to check whether it is a keyword (a sketch of this check appears after this list).
  • Real Numbers and Integers
  • Operators
  • Nested comments?
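
A minimal sketch (in C) of the identifier/keyword accepting-state action described in the first bullet above. The token codes, the keyword table contents, and the MAX_ID_LEN name are placeholders for illustration, not the project's actual definitions:

```c
#include <string.h>

#define MAX_ID_LEN 20          /* assumption: identifiers are limited to 20 characters */

/* hypothetical token codes; the real project presumably defines its own */
enum { TK_ID, TK_WHILE, TK_INTEGER, TK_ERROR };

/* small keyword lookup table; a hash table would also work */
static const struct { const char *lexeme; int code; } keywords[] = {
    { "while",   TK_WHILE   },
    { "integer", TK_INTEGER },
};

/* Action executed when the combined identifier/keyword DFA reaches its accepting state. */
int classify_identifier(const char *lexeme)
{
    if (strlen(lexeme) > MAX_ID_LEN)
        return TK_ERROR;                       /* identifier longer than the allowed limit */

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].lexeme) == 0)
            return keywords[i].code;           /* reserved word: return its own code */

    return TK_ID;                              /* plain identifier */
}
```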

Notes:

  • The LA keeps track of line numbers for each token (to associate error messages with line numbers in other phases of the compiler).
  • The transition diagram can have some regular definitions as the label of a transition.
  • Each accepting state has an action attached (typically, returning a token and its attribute value) which is executed when the state is reached.
    • e.g. return(get_token_code(), name): get_token_code() searches a table to check whether the name is a reserved word and returns its integer code if so; otherwise it returns the integer code of TK_ID, together with the string of characters forming the token (a sketch of this action appears after this list)
    • Stopping if no further transition is possible based on the lookahead
    • Associating the appropriate label
    • Converting the lexeme to a number if required
    • Associating the line number with the lexeme
  • The LA cannot recognize errors like fi ( a == f(x) ) (fi is a lexically valid identifier), but it can catch errors like a = 2r
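
A rough sketch of how the accepting-state action could package a token with its attribute and line number, as described in the list above. The Token struct, its field names, and the helper functions are assumptions for illustration, not the project's actual API:

```c
#include <stdlib.h>
#include <string.h>

enum { TK_ID, TK_NUM };            /* hypothetical token codes for this sketch */

/* hypothetical token record: code, attribute value, and the line it appeared on */
typedef struct {
    int  code;
    char lexeme[21];               /* identifier text (<= 20 chars + NUL) */
    long num_value;                /* converted value for numeric tokens */
    int  line_no;                  /* kept so later phases can report errors by line */
} Token;

/* keyword-table lookup; stubbed here, see the sketch under "DFA Formation" */
static int get_token_code(const char *name) { (void)name; return TK_ID; }

/* Accepting-state action for identifiers/keywords. */
Token make_id_token(const char *name, int line_no)
{
    Token t = {0};
    t.code = get_token_code(name);           /* keyword code if reserved, else TK_ID */
    strncpy(t.lexeme, name, 20);
    t.line_no = line_no;
    return t;
}

/* Accepting-state action for integer literals: convert the lexeme to a value. */
Token make_num_token(const char *digits, int line_no)
{
    Token t = {0};
    t.code = TK_NUM;
    t.num_value = strtol(digits, NULL, 10);  /* lexeme-to-number conversion */
    t.line_no = line_no;
    return t;
}
```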

Some issues faced

  • Tokenize the following
    • 40.0e++ => ERROR(40.0e+) PLUS
    • OR98 => ID(OR98)
    • 123ea => NUM(123) ID(ea)
    • 1e4 => NUM(1) ID(e4)
    • 1.0ebc => ERROR(1.0e) ID(bc)
    • bc1.0e2f => ID(bc1) ERROR(.) NUM(0) ID(e2f)
    • 1.43e+ab => ERROR(1.43e+) ID(ab)
  • Capturing the effect of . for floating-point numbers vs .. for the range operator requires two retractions/look-aheads (see the sketch below)
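
One possible sketch of the two-character look-ahead/retraction for . versus .., assuming the input sits in a buffer indexed by a forward pointer (a twin-buffer scheme would behave the same way). Function and token names here are placeholders:

```c
#include <ctype.h>

/* hypothetical token codes for this sketch only */
enum { TK_NUM, TK_RNUM, TK_RANGEOP };

/*
 * After the integer part of a number, a single '.' may begin either a real
 * number (1.5) or a range operator (1..5).  Telling them apart needs up to
 * two look-ahead characters; retraction is just not advancing the forward
 * pointer.
 */
int decide_after_integer(const char *buf, int *forward)
{
    int f = *forward;                  /* buf[f] is the first unread character */

    if (buf[f] != '.')
        return TK_NUM;                 /* no dot: the integer token ends here */

    if (buf[f + 1] == '.')
        return TK_NUM;                 /* "..": integer ends, the rangeop DFA takes over at the dots */

    if (isdigit((unsigned char)buf[f + 1])) {
        *forward = f + 1;              /* consume the '.', continue in the real-number DFA */
        return TK_RNUM;
    }

    /* a lone '.' not followed by a digit: both look-ahead characters are
     * retracted simply by leaving *forward where it was */
    return TK_NUM;
}
```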

Some implementation ideas:

  • Every token is represented by a unique number (e.g. via #define); see the sketch after this list
  • Attributes for the token need to be saved
    • identifier: lexeme of the token, or a pointer to its symbol table entry
    • integer/real number: value of the number, or a pointer to the value
    • keywords: NIL (no attribute required)
    • line numbers
  • If the DFA reaches a dead state, retract to the last accepting state; it represents the type and length of the longest valid lexeme
  • A symbol table can also be implemented as a pair of tables: one for maintaining the keywords and the other for the identifiers
  • Data structures for Symbol tables
    • Hash table (for identifiers)
    • Look up table (for keywords)
    • Hash table of hash tables (for maintaining the scope)
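
A rough sketch tying the ideas above together: unique numeric token codes, a per-token attribute union, and a chained hash table for identifiers. Every name, the bucket count, and the hash function are illustrative assumptions, not the project's actual code:

```c
#include <stdlib.h>
#include <string.h>

/* Every token gets a unique number; an enum works as well as a row of #defines. */
#define TK_ID    1
#define TK_NUM   2
#define TK_RNUM  3
#define TK_WHILE 4
/* ... */

/* Per-token attributes: identifiers carry a symbol-table pointer, numbers
 * carry a value, keywords need nothing beyond the code itself. */
typedef struct SymEntry SymEntry;

typedef struct {
    int code;
    int line_no;
    union {
        SymEntry *sym;      /* TK_ID: pointer to the symbol-table entry */
        long      ival;     /* TK_NUM */
        double    rval;     /* TK_RNUM */
    } attr;
} Token;

/* Chained hash table for identifiers; keywords could live in a separate,
 * fixed lookup table since their set is known in advance. */
#define BUCKETS 211

struct SymEntry {
    char      name[21];
    SymEntry *next;
};

static SymEntry *table[BUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Return the entry for an identifier, inserting it on first sight. */
SymEntry *symtab_lookup(const char *name)
{
    unsigned b = hash(name);
    for (SymEntry *e = table[b]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;

    SymEntry *e = calloc(1, sizeof *e);
    strncpy(e->name, name, 20);
    e->next  = table[b];
    table[b] = e;
    return e;
}
```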

Some DFA implementation examples

Final DFA

(Diagram images: front and back of the final DFA.)

Food for thought

  • Think about how you would implement dynamic scoping in the language (a rough sketch follows)
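
Purely as a starting point, here is a hypothetical sketch of one way dynamic scoping could be modelled: a stack of per-activation tables searched from the most recent scope back along the call chain. None of these names come from the project, and the hash-table-of-hash-tables idea above could replace the linked lists:

```c
#include <stdlib.h>
#include <string.h>

typedef struct Binding {
    char            name[21];
    int             value;        /* placeholder attribute */
    struct Binding *next;
} Binding;

typedef struct Scope {
    Binding      *bindings;       /* names defined in this activation */
    struct Scope *caller;         /* scope active when this one was entered */
} Scope;

static Scope *top;                /* innermost (most recently entered) scope */

void enter_scope(void) { Scope *s = calloc(1, sizeof *s); s->caller = top; top = s; }
void leave_scope(void) { top = top->caller; }   /* freeing omitted in this sketch */

/* Assumes enter_scope() has been called at least once. */
void define_name(const char *name, int value)
{
    Binding *b = calloc(1, sizeof *b);
    strncpy(b->name, name, 20);
    b->value = value;
    b->next = top->bindings;
    top->bindings = b;
}

/* Dynamic-scope lookup: the first match on the call chain wins. */
Binding *lookup(const char *name)
{
    for (Scope *s = top; s; s = s->caller)
        for (Binding *b = s->bindings; b; b = b->next)
            if (strcmp(b->name, name) == 0)
                return b;
    return NULL;
}
```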

ToDo:

  • Add Lexemes for DRIVERDEF and DRIVERENDDEF