-
Notifications
You must be signed in to change notification settings - Fork 12
LexicalAnalyzer
Harpinder edited this page Feb 12, 2020
·
16 revisions
- Single DFA for identifiers and keywords. When you recognize an identifier, check if it's length <= 20 or not, if not, it's an error. Else, look for the identified string in a lookup table to check if it's a keyword.
- Real Numbers and Integers
- Operators
- Nested comments?
- LA keeps tracks of Line numbers for each token (to associate error messages with line # in other phases of the compiler).
- The transition diagram can have some regular definitions as the label of a transition.
- Each accepting state has an action attached (typically, returning a token and its attribute value) which is executed when the state is reached.
- e.g.
return(get_token_code(), name)
, get_token_code() searches a table to check if the name is a reserved word and returns it's integer code if so, else returns integer code of TK_ID, with the name of the string of characters forming the token - Stopping if no further transition possible based on lookahead
- Associating appropriate label
- Converting lexeme to a number if reqd
- Associating line # with lexeme
- e.g.
- LA can't recognize errors like
fi ( a == f(x) )
but can likea = 2r
- Tokenize the following
- 40.0e++ => ERROR(40.0e+) PLUS
- OR98 => ID(OR98)
- 123ea => NUM(123) ID(ea)
- 1e4 => NUM(1) ID(e4)
- 1.0ebc => ERROR(1.0e) ID(bc)
- bc1.0e2f => ID(bc1) ERROR(.) NUM(0) ID(e2f)
- 1.43e+ab => ERROR(1.43e+) ID(ab)
- Capturing the effect of . for floating-point numbers vs .. for rangeop requires two retractions/look aheads
- Every token represented by a unique number. (#define)
- Attributes for the token need to be saved
- identifier: lexeme of token or ptr to the symbol table entry
- int num/float num : value of number/ ptr to value
- keywords: NIL (no attribute required)
- Line numbers
- If you reach a dead state, the last accepting state is the one that represents the type and length of the longest valid lexeme
- A symbol table can also be implemented as a pair of two tables- one for maintaining the keywords and other for the identifiers
- Data structures for Symbol tables
- Hash table (for identifiers)
- Look up table (for keywords)
- Hash table of hash tables (for maintaining the scope)
- Identifiers
- Identifiers and keywords
- Integers
- Numbers
- Numbers with corresponding actions
- Operators
- Rel operator
- combined NFA
- Think how you will implement dynamic scoping in the language
- Add Lexemes for DRIVERDEF and DRIVERENDDEF