EECE 571F: Domain-Specific Languages
  
This is a page from yet
another great ECE, UBC subject.

DCG parsing


In Prolog, atoms can contain spaces if they are quoted; e.g. 'an atom'. Atoms that start with a lowercase letter and contain only letters, digits, and underscores can be written without quotes; e.g. anAtom.

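Prolog traditionally stores strings in double quotes ("rabbit ears") as lists of character codes. (Newer systems such as SWI-Prolog 7 only do this when the double_quotes flag is set to codes.) So: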


?- X = "09azAZ".
X = [48, 57, 97, 122, 65, 90] 
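The query above can be reproduced from a file with the short sketch below. It assumes a system that supports the ISO double_quotes flag; the names sample/1 and codes_of/2 are made up for illustration and are not part of num1.pl. atom_codes/2 gives the same code list without relying on the flag.

```prolog
% Ask for the traditional reading of "..." as a list of character
% codes (assumption: the ISO double_quotes flag is supported).
:- set_prolog_flag(double_quotes, codes).

% sample/1 is a hypothetical fact holding the string from the text.
sample("09azAZ").

% codes_of(+Atom, -Codes): hypothetical helper; atom_codes/2 is a
% standard builtin that does not depend on the flag above.
codes_of(Atom, Codes) :-
    atom_codes(Atom, Codes).

% ?- sample(X).
% X = [48, 57, 97, 122, 65, 90].
% ?- codes_of('09azAZ', X).
% X = [48, 57, 97, 122, 65, 90].
```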

Because strings are just lists of character codes, we can use DCGs to recognize valid numbers, as in num1.pl:


% num1.pl

% this file is a simple recognizer. see num2.pl
% for an example of processing the string as we go

% standard demo predicate.
% ignore(X) ensures that we always get to the told
% (i.e. no open files left hanging)

demo :- tell('num1.out'), ignore(demo1), told.

% a failure-driven loop- results are not accessible
% after the loop quits except via global asserts.

demo1 :- demo2, fail.
demo1.

% note also that demo1/0 can be tested and debugged
% without wasting disc space with disc I/O.

demo2 :-
	member(X,["23", " 23", "23.3 "]),
	% when calling a DCG parser, often best
	% to say the parse ends here; i.e. the
	% resulting list is the empty list
	num(X,[]),
	print(num(X)),
	nl.

space --> " ". % same as space --> [32].
tabb  --> [9].
dot   --> ".".
zero  --> "0".
one   --> "1".
two   --> "2".
three --> "3".
four  --> "4".
five  --> "5".
six   --> "6".
seven --> "7".
eight --> "8".
nine  --> "9".

% "|" is the same as ";" i.e. "or"
blank --> space
        | tabb. % can't use "tab" since that is a
                % standard prolog predicate. 
                % problem: single global name space
                % solution: 1) modules (non-portable)
                %           2) my accessor system

whitespace --> []
            | blank, whitespace.

digit --> one | two | three | four
        | five | six | seven | eight | nine | zero.

% problem: wasted computation- digit can get parsed twice.
% solution: 1) memoing- oldt resolution- non-standard prolog
%           2) don't write big parsers this way- use
%              operators and let the prolog reader
%              work it all out for you
digits --> digit
        |  digit, digits.

num1 --> digits
        | digits, dot, digits.

num --> whitespace,num1,whitespace.

% note the neat declarative nature of the above

Which generates num1.out:


num([50, 51])
num([32, 50, 51])
num([50, 51, 46, 51, 32])
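The comment in num1.pl about digits being parsed twice suggests one more option: left-factor the grammar so that each rule commits to its shared prefix once. The following is only a sketch of that idea, not the course's own solution. It accepts the same strings as num1.pl, but the names digits_rest, fraction and recognize/1 are hypothetical, and it tests digits by code range instead of listing ten rules.

```prolog
% digit: any single code between "0" and "9".
digit --> [C], { C >= 0'0, C =< 0'9 }.

% digits: one digit, then zero or more further digits.
% The first digit is consumed once and never re-parsed.
digits --> digit, digits_rest.
digits_rest --> digit, digits_rest.
digits_rest --> [].

blank --> [32].   % space
blank --> [9].    % tab

whitespace --> blank, whitespace.
whitespace --> [].

% num1: the integer part is parsed once; the fraction is optional.
num1 --> digits, fraction.
fraction --> [46], digits.   % 46 is "."
fraction --> [].

num --> whitespace, num1, whitespace.

% recognize(+Atom): succeeds if the text of Atom is a valid number.
% phrase/2 plays the role of calling num(Codes, []) directly.
recognize(Atom) :-
    atom_codes(Atom, Codes),
    phrase(num, Codes),
    !.
```

For example, recognize('23.3 ') succeeds, while recognize('23.') fails because a dot must be followed by at least one digit.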

We can also convert strings to numbers, as in num2.pl:


% num2.pl

% converter of strings to numbers.
% based on the grammar of num1.pl

demo :- tell('num2.out'), ignore(demo1), told.

demo1 :- demo2, fail.
demo1.

demo2 :-
	member(X,["23", " 23", "23.3 ", "34.5671 "]),
	% here's the output bound to the empty list again
	num(N,X,[]),
	% ~s prints a list of ascii codes as text,
	% e.g.   X = [116, 105, 109], format('<~s>',[X])
	% prints <tim>
	format('"~s" = [~10f]\n',[X,N]).

space --> " ". 
tabb  --> [9].
dot   --> ".".
zero  --> "0".
one   --> "1".
two   --> "2".
three --> "3".
four  --> "4".
five  --> "5".
six   --> "6".
seven --> "7".
eight --> "8".
nine  --> "9".

ascii2Number(N0,N) :-
	zero([Zero],[]),
	N is N0 - Zero.

blank --> space | tabb.

whitespace --> []
            | blank, whitespace.

digit --> 
	one | two | three | four
        | five | six | seven | eight | nine | zero.

% digitNumber does not use the DCG syntax since it wants
% to grab the head of the list without popping it off the
% list.
digitNumber(N,[N0|L],Out) :-
	digit([N0|L],Out),
	ascii2Number(N0,N).

% note the added variables- DCGs can not only parse, but
% bind variables as a side effect of the parse
digits(N,0) --> digitNumber(N).
digits(N,P) -->
	    digitNumber(N1),
	    % P0 is the highest power of ten used by the
	    % remaining digits. e.g. in "123" the tail "23"
	    % gives P0=1, so P=2
	    digits(N2,P0),
	    % the DCG expansion (adding in carry variables)
	    % is disabled inside {curly brackets}
	    % (this extra syntax could be avoided via dcgfix)
	    {P is P0 + 1,
	     N is N1*10^P + N2}.

num1(N) --> digits(N,_).
num1(N) --> digits(N1,_),
	    dot,
	    digits(N2,P),
	    % so neat- the final number is the number to the
	    % left of the dot plus the number to the right
	    % of the dot divided by a power of ten set by
	    % its number of digits
	    {N is N1 + N2/10^(P+1)}.

num(N) --> whitespace,num1(N),whitespace.

Which generates num2.out:


"23" = [23.0000000000]
" 23" = [23.0000000000]
"23.3 " = [23.3000000000]
"34.5671 " = [34.5671000000]
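As a worked check of the arithmetic in num2.pl: for "34.5671 " the parse binds N1=34, N2=5671 and P=3, so N is 34 + 5671/10^4 = 34.5671. An alternative way to get the same values is to fold the digits left to right with an accumulator, which avoids computing a power of ten at every step. The sketch below illustrates that alternative under stated assumptions and is not the code from num2.pl; digits_rest, frac_opt, ws and to_number/2 are hypothetical names.

```prolog
% digit(D): consume one code and return its value 0..9.
digit(D) --> [C], { C >= 0'0, C =< 0'9, D is C - 0'0 }.

% digits(N, Len): N is the value of one or more digits,
% Len is how many digits were read.
digits(N, Len) --> digit(D), digits_rest(D, 1, N, Len).

% digits_rest(Acc, L0, N, Len): fold further digits into Acc.
digits_rest(Acc, L0, N, Len) -->
    digit(D),
    { Acc1 is Acc*10 + D, L1 is L0 + 1 },
    digits_rest(Acc1, L1, N, Len).
digits_rest(N, Len, N, Len) --> [].

% frac_opt(I, N): optional ".digits" after the integer part I.
frac_opt(I, N) -->
    [46],                       % 46 is "."
    digits(F, Len),
    { N is I + F / 10.0^Len }.  % e.g. 34 + 5671/10000.0
frac_opt(N, N) --> [].

ws --> [32], ws.   % space
ws --> [9], ws.    % tab
ws --> [].

num(N) --> ws, digits(I, _), frac_opt(I, N), ws.

% to_number(+Atom, -N): convert the text of Atom to a number.
to_number(Atom, N) :-
    atom_codes(Atom, Codes),
    phrase(num(N), Codes),
    !.
```

With this sketch, to_number(' 23', N) binds N to the integer 23, and inputs with a fractional part yield a float.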


Not © Tim Menzies, 2001
Share and enjoy- information wants to be free.
But if you take anything from this site,
please credit tim@menzies.com.