[Back to PARSING SWAG index] [Back to Main SWAG index] [Original]
Type
RW_toKEN = Record
token_str :String[9];
token_cod :toKEN_CODE;
end;
RW_Type = Array[0..9] of RW_toKEN;
RWT_PTR = ^RW_Type;
Const
NULL = '';
Rw_2 :RW_Type = ((token_str : 'do'; token_cod : tdo),
(token_str : 'if'; token_cod : tif),
(token_str : 'in'; token_cod : tin),
(token_str : 'of'; token_cod : tof),
(token_str : 'or'; token_cod : tor),
(token_str : 'to'; token_cod : tto),
(token_str : NULL; token_cod : NO_toKEN),
(token_str : NULL; token_cod : NO_toKEN),
(token_str : NULL; token_cod : NO_toKEN),
(token_str : NULL; token_cod : NO_toKEN)
);
...the difference being the explicit declaration of the Constant
Record fields. (I'm used to Array Constants, not Record
Constants - I was unaware of the requirement)
PARSinG NUMBERS
Now we'll concentrate on parsing Integer and Real numbers.
The Pascal definition of a number begins With an UNSIGNED
Integer. An unsigned Integer consists of one or more consecutive
DIGITS. The simplest Form of a number token is an unsigned
Integer:
1 9 120 12654
A number token can also be an unsigned Integer (the whole part)
followed by a fraction part. A fraction part consists of a
decimal point followed by an unsigned Integer, such as:
123.45 0.9987564
These numbers have whole parts 123 and 0 respectively, and
fraction parts .45 and .9987564 respectively.
A number token can also be a whole part followed by an EXPONENT
part. An exponent part consists of an "E" (or "e") followed by
an unsigned Integer. An optional exponent sign, + or -, can
appear between the letter and the first exponent digit.
Examples:
134e2 2E99 123e-45 73623E+4
Finally, a number token can be a whole part followed by a
fraction part and an exponent part, in that order:
2.3498E7 0.00034e-66
I arbitrarily limit the number of digits to 20, and the exponent
value from -37 to +37 - the exact value necessary to limit this
value is dependant on how Real values are represented on the
Computer.
The "get_number" Function is likely to be the biggest Function
in your scanner, but it should be relatively straighForward to
code...in light of what has already been done With the scanner/
tokenizer module, and the definition of a number.
EXERCISE #1
Write the get_number Function to parse Integers and Real
numbers.
You will need to add the following Types and Variables to your
global data segment:
Type { add "Real"s to list... }
LITERAL_Type = (Integer_LIT, Real_LIT, String_LIT);
LITERAL_REC = Record
Case lType:LITERAL_Type of
Integer_LIT: (ivalue :Integer);
Real_LIT : (rvalue :Real );
String_LIT : (svalue :String );
end;
Var
digit_count :Word;
count_error :Boolean;
-------------- PART 2 ---------------------------------------
The rest of this post will cover two simple topics - parsing
Strings inside quotes, and parsing comments.
PARSinG COMMENTS {}
The Compiler should ignore the input between two curly braces
({}), and the curly braces themselves. My scanner is written so
the entire comment is replace by a Single blank (" "), although
you could possibly Write the scanner so that comments are
_totally_ ignored.
EXERCISE #2:
Integrate COMMENT detection into the get_Char routine, so that
when your Character fetching routine will ignore comments and
pass a blank when a comment is encountered, skipping the comment
entirely For the next fetch.
Make sure that the routine keeps reading Until the right curly
brace is detected, even past the end-of-line. if the end-of-File
is encountered beFore the right curly brace is found, an
"unexpected end" error should be generated.
PARSinG StringS (QUOTES) ''
The quote Character delimits Strings, any Character between the
Strings is ignored by the Compiler, except to stored as a String
LITERAL. if you wish a ' (quote) to be included in the literal,
and extra ' must precede it.
One possible tricky area is the {} (comment) Character. You must
be careful not to inadvertently trigger the comment routine within
the quote routine While reading a String, otherwise you will
have a BUG.
EXERCISE #3:
Add a quote routine to the get_token routine within your module,
to fetch Strings, as a LITERAL IDENTifIER when the QUOTE
Character is detected.
The following mods to your Types are required:
Eof_Char = #$7F;
Type
Char_CODE = (LETTER, DIGIT, QUOTE, SPECIAL, Eof_CODE);
{ The following code init's the Character maping table: }
Var
ch :Byte;
begin
For ch := 0 to 255 do
Char_table[ch] := SPECIAL;
For ch := ord('0') to ord('9') do
Char_table[ch] := DIGIT;
For ch := ord('A') to ord('Z') do
Char_table[ch] := LETTER;
For ch := ord('a') to ord('z') do
Char_table[ch] := LETTER;
Char_table[ord(Eof_Char)] := Eof_CODE;
Char_table[39] := QUOTE;
end;
----------------------------------------------------------------
PLEASE, please let me know what you think about these posts,
even if they're negative - I want to have some feedback on the
difficulties, and whether or not people are having trouble
following the material - I _can_ be more concise at the cost of
being more verbose - if it's needed!
if you are having problems With your source code, and want me to
do a detailed examination of your code, expecially if it's
written in a language other than Pascal, send me email via the
Internet - to avoid "carpet bombing" the conference with
undesired material.
NEXT POST:
Error codes, and putting your code to the test - our first
utility (other than the lister) : a source Program Compactor
(not cruncher).
FUTURE POSTS:
- Review and (hopefully) a status report from "students"
- Symbol table
- YA utility (cross - referencer)
- YA utility (source Program CRUNCHer)
- YA utility (source Program UNcruncher)
- Parsing simple expressions
- Utility : CALC, using infix-to-postfix conversions and stack
ops.
- Parsing statements
- Utility: Pascal syntax checker part I
- Parsing declarations (Var, Type, etc)
incl's: much improved (and much more Complex) symbol table
- Utility: Declarations analyzer.
- Syntax Checker part II
- Parsing Program, Procedure, and Function declarations
(routines).
- Syntax checker Part III
- Review and discussion?
[Back to PARSING SWAG index] [Back to Main SWAG index] [Original]