DCG parsing
In Prolog, atoms can have spaces in them if they are quoted; e.g. 'an atom'. Atoms that start
with a lower-case letter and contain only letters, digits and underscores can be written down without quotes; e.g. anAtom.
Prolog traditionally stores strings in double quotes ("rabbit ears") as lists of character codes. So:
?- X = "09azAZ".
X = [48, 57, 97, 122, 65, 90]
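The mapping between quoted text and code lists can be checked directly. This is a minimal sketch, assuming a system where the double_quotes flag can be set to codes (SWI-Prolog 7 and later read "..." as a string object by default, so the directive is needed there). The predicate name codes_demo/2 is made up for this illustration.

```prolog
% Make "..." read as a list of character codes.
% (This is already the default in many older Prolog systems.)
:- set_prolog_flag(double_quotes, codes).

% codes_demo(-Codes, -Atom): hypothetical helper, not part of num1.pl.
codes_demo(Codes, Atom) :-
    Codes = "09azAZ",          % the same code list as in the query above
    atom_codes(Atom, Codes).   % rebuild an atom from the codes
```

Calling ?- codes_demo(C, A). should bind C to the list of six codes and A to the atom '09azAZ'.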
Since a quoted string is just a list of character codes, we can use DCGs to recognize valid numbers:
% num1.pl
% this file is a simple recognizer. see num2.pl
% for an example of processing the string as we go
% standard demo predicate.
% ignore(X) ensures that we always get to the told
% (i.e. no open files left hanging)
demo :- tell('num1.out'), ignore(demo1), told.
% a failure-driven loop- results not accessible
% after the loop quits except via global asserts.
demo1 :- demo2, fail.
% note also that demo1/0 can be tested and debugged
% without wasting disc space with disc I/O.
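demo2 :-
    member(X,["23", " 23", "23.3 "]),
    % when calling a DCG parser, often best
    % to say the parse ends here; i.e. the
    % resulting list is the empty list
    % (the two goals below are missing from this copy; they are
    % reconstructed from the comment above and the output shown later)
    num(X,[]),
    print(num(X)), nl.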
space --> " ". % same as space --> [32].
tabb --> [9].
dot --> ".".
zero --> "0".
one --> "1".
two --> "2".
three --> "3".
four --> "4".
five --> "5".
six --> "6".
seven --> "7".
eight --> "8".
nine --> "9".
% "|" is the same as ";" i.e. "or"
blank --> space
        | tabb. % can't use "tab" since that is a
                % standard prolog predicate.
                % problem: single global name space
                % solution: 1) modules (non-portable)
                %           2) my accessor system
whitespace --> []
             | blank, whitespace.
digit --> one | two | three | four
        | five | six | seven | eight | nine | zero.
% problem: wasted computation- digit can get parsed twice.
% solution: 1) memoing- oldt resolution- non-standard prolog
%           2) don't write big parsers this way- use
%              operators and let the prolog reader
%              work it all out for you
digits --> digit
         | digit, digits.
num1 --> digits
       | digits, dot, digits.
num --> whitespace,num1,whitespace.
% note the neat declarative nature of the above
Which generates
num([50, 51])
num([32, 50, 51])
num([50, 51, 46, 51, 32])
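The "wasted computation" noted in the comments of num1.pl comes from num1 parsing the same digits again when its first alternative fails. One common workaround, which is neither of the two solutions listed in those comments, is to left-factor the rule so the integer part is parsed once and the fraction is optional. The sketch below is self-contained and makes some assumptions of its own: it sets the double_quotes flag to codes, replaces the ten one-line digit rules with a code-range test, and adds the hypothetical names digits_rest, fraction and recognizes/1. It also calls the grammar through the standard phrase/2, which has the same effect as passing the empty list as the final argument.

```prolog
:- set_prolog_flag(double_quotes, codes).

% digit: any code between "0" and "9" (a shortcut for the ten
% one-line rules in num1.pl).
digit --> [C], { C >= 0'0, C =< 0'9 }.

% one or more digits
digits --> digit, digits_rest.
digits_rest --> [] | digits.

% Left-factored num1: the integer part is shared by both cases,
% so it is not parsed again when the number has a fraction.
num1 --> digits, fraction.
fraction --> [] | ".", digits.

blank --> " " | [9].
whitespace --> [] | blank, whitespace.

num --> whitespace, num1, whitespace.

% recognizes(+Codes): hypothetical wrapper. phrase/2 insists that
% the whole list is consumed, like calling num(Codes, []).
recognizes(Codes) :- phrase(num, Codes).
```

With this sketch, recognizes("23.3 ") should succeed, while inputs such as "23." or "2 3" should fail because no fraction digits follow the dot or because a blank splits the number.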
The same grammar can be extended to convert strings to numbers:
% num2.pl
% converter of strings to numbers.
% based on the grammar of num1.pl
demo :- tell('num2.out'), ignore(demo1), told.
demo1 :- demo2, fail.
demo2 :-
    member(X,["23", " 23", "23.3 ", "34.5671 "]),
    % here's the output bound to the empty list again
    % (the goal below is missing from this copy; reconstructed)
    num(N,X,[]),
    % ~s will print a string of ascii numbers in their
    % atom form. e.g. X = [116, 105, 109], format('<~s>',[X])
    % prints <tim>
    format('"~s" = [~10f]\n',[X,N]).
space --> " ".
tabb --> [9].
dot --> ".".
zero --> "0".
one --> "1".
two --> "2".
three --> "3".
four --> "4".
five --> "5".
six --> "6".
seven --> "7".
eight --> "8".
nine --> "9".
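ascii2Number(N0,N) :-
    % Zero must be bound to the code for "0"; that goal is missing
    % from this copy, so the line below is a reconstruction
    zero([Zero],[]),
    N is N0 - Zero.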
blank --> space | tabb.
whitespace --> []
             | blank, whitespace.
digit -->
      one | two | three | four
    | five | six | seven | eight | nine | zero.
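% digitNumber does not use the DCG syntax since it wants
% to grab the head of the list without popping it off the
% list.
digitNumber(N,[N0|L],Out) :-
    % (body missing from this copy; reconstructed: check that the
    % head is a digit, then convert its code to a number)
    digit([N0|L],Out),
    ascii2Number(N0,N).
% note the added variables- DCGs can not only parse, but
% bind variables as a side effect of the parse
digits(N,0) --> digitNumber(N).
digits(N,P) -->
    % (the two goals below are missing from this copy; reconstructed
    % from the arithmetic that follows)
    digitNumber(N1),
    digits(N2,P0),
    % P0 is the number of tens places used by the
    % remaining digits. e.g. "123" would take P=2 (P0=1 for "23")
    % the DCG expansion (adding in carry variables)
    % is disabled inside {curly brackets}
    % (this extra syntax could be avoided via dcgfix)
    {P is P0 + 1,
     N is N1*10^P + N2}.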
num1(N) --> digits(N,_).
num1(N) --> digits(N1,_),
    % (the two goals below are missing from this copy; reconstructed
    % from the arithmetic that follows)
    dot,
    digits(N2,P),
    % so neat- the final number is the number
    % LHS of the dot plus the right hand side
    % number divided by a factor for the number
    % of digits
    {N is N1 + N2/10^(P+1)}.
num(N) --> whitespace,num1(N),whitespace.
Which generates
"23" = [23.0000000000]
" 23" = [23.0000000000]
"23.3 " = [23.3000000000]
"34.5671 " = [34.5671000000]
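num2.pl computes the value of a digit string by tracking how many tens places the digits occupy. A common alternative, shown here only as a sketch and not as the method used in num2.pl, is to carry an accumulator left to right, so no place count is needed. The names digit_value, int and int_rest are hypothetical, and the sketch assumes the double_quotes flag is set to codes.

```prolog
:- set_prolog_flag(double_quotes, codes).

% digit_value(-D): consume one digit code and return its value 0..9.
digit_value(D) --> [C], { C >= 0'0, C =< 0'9, D is C - 0'0 }.

% int(-N): one or more digits, read left to right.
int(N) --> digit_value(D), int_rest(D, N).

% int_rest(+Acc, -N): Acc is the value of the digits seen so far;
% each new digit multiplies Acc by ten and adds the digit.
int_rest(Acc, N) -->
    digit_value(D),
    { Acc1 is Acc*10 + D },
    int_rest(Acc1, N).
int_rest(N, N) --> [].
```

For example, ?- phrase(int(N), "123"). should bind N to 123. The fractional part could be handled the same way, at the cost of also counting the digits after the dot.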
Not © Tim Menzies, 2001
Share and enjoy- information wants to be free.
But if you take anything from this site, please credit tim@menzies.com.