DCG parsing


In Prolog, atoms can have spaces in them if they are quoted; e.g. 'an atom'. Atoms that start with a lower-case letter and contain only letters, digits and underscores can be written down without quotes; e.g. anAtom.

Prolog has traditionally stored strings written in double quotes ("rabbit ears") as lists of character codes. So:

    ?- X = "09azAZ".
    X = [48, 57, 97, 122, 65, 90]
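Reading double quotes as code lists is the traditional default, but it is not universal: SWI-Prolog 7 and later, for instance, treat double-quoted text as a separate string type unless the double_quotes flag is set to codes. The sketch below assumes SWI-Prolog and sidesteps the flag by using the standard atom_codes/2. The predicate name codes_demo/1 is made up for this illustration and is not part of num1.pl or num2.pl.

```prolog
% codes_demo/1 is a hypothetical helper for illustration only.
% atom_codes/2 yields the same list that "09azAZ" denotes in a
% Prolog where double quotes mean lists of character codes.
codes_demo(Codes) :-
    atom_codes('09azAZ', Codes).

% Assumption about your system: under SWI-Prolog 7+, putting this
% directive at the top of a file makes double quotes in that file
% read as code lists again.
% :- set_prolog_flag(double_quotes, codes).
```

Calling codes_demo(X) binds X to [48, 57, 97, 122, 65, 90], the same list as in the query above.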

Which means we can use DCGs to recognize valid numbers: num1.pl=

    % num1.pl
    % this file is a simple recognizer. see num2.pl
    % for an example of processing the string as we go

    % standard demo predicate.
    % ignore(X) ensures that we always get to the told
    % (i.e. no open files left hanging)
    demo :- tell('num1.out'), ignore(demo1), told.

    % a failure-driven loop- results not accessible
    % after loop quits except via global asserts.
    demo1 :- demo2, fail.
    demo1.

    % note also that demo1/0 can be tested and debugged
    % without wasting disc space with disc I/O.
    demo2 :-
        member(X,["23", " 23", "23.3 "]),
        % when calling a DCG parser, often best
        % to say the parse ends here; i.e. the
        % resulting list is the empty list
        num(X,[]),
        print(num(X)), nl.

    space --> " ". % same as space --> [32].
    tabb  --> [9].
    dot   --> ".".
    zero  --> "0".
    one   --> "1".
    two   --> "2".
    three --> "3".
    four  --> "4".
    five  --> "5".
    six   --> "6".
    seven --> "7".
    eight --> "8".
    nine  --> "9".

    % "|" is the same as ";" i.e. "or"
    blank --> space | tabb.
    % can't use "tab" since that is a
    % standard prolog predicate.
    % problem: single global name space
    % solution: 1) modules (non-portable)
    %           2) my accessor system

    whitespace --> [] | blank, whitespace.

    digit --> one | two | three | four | five
            | six | seven | eight | nine | zero.

    % problem: wasted computation- digit can get parsed twice.
    % solution: 1) memoing- oldt resolution- non-standard prolog
    %           2) don't write big parsers this way- use
    %              operators and let the prolog reader
    %              work it all out for you
    digits --> digit | digit, digits.

    num1 --> digits | digits, dot, digits.

    num --> whitespace,num1,whitespace.
    % note the neat declarative nature of the above

Which generates num1.out=

    num([50, 51])
    num([32, 50, 51])
    num([50, 51, 46, 51, 32])
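The call num(X,[]) in demo2 works because each DCG rule is translated into an ordinary predicate with two extra arguments: the list before the parse and the list left over afterwards. The standard phrase/2 wrapper says the same thing without exposing those arguments. Here is a minimal sketch of that idea, which also shows one compact alternative to writing ten separate digit rules. The names digit_code//0, digit_run//0 and all_digits/1 are invented for this example, and the code uses 0'c character notation so it does not depend on how double quotes are read.

```prolog
% Hypothetical toy recognizer, not part of num1.pl.
% A digit is any single code between 0'0 (48) and 0'9 (57).
digit_code --> [C], { C >= 0'0, C =< 0'9 }.

% One or more digits.
digit_run --> digit_code.
digit_run --> digit_code, digit_run.

% phrase(digit_run, Codes) behaves like digit_run(Codes, []):
% it succeeds only if the whole list is consumed.
all_digits(Atom) :-
    atom_codes(Atom, Codes),
    phrase(digit_run, Codes).
```

With this loaded, all_digits('23') succeeds, while all_digits('23 ') fails because the trailing space is left unparsed.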

And we can convert strings to numbers: num2.pl=

    % num2.pl
    % convertor of strings to numbers.
    % based on the grammar of num1.pl

    demo :- tell('num2.out'), ignore(demo1), told.

    demo1 :- demo2, fail.
    demo1.

    demo2 :-
        member(X,["23", " 23", "23.3 ", "34.5671 "]),
        % here's the output bound to the empty list again
        num(N,X,[]),
        % ~s will print a string of ascii numbers in their
        % atom form. e.g. X = [116, 105, 109], format('<~s>',[X])
        % prints <tim>
        format('"~s" = [~10f]\n',[X,N]).

    space --> " ".
    tabb  --> [9].
    dot   --> ".".
    zero  --> "0".
    one   --> "1".
    two   --> "2".
    three --> "3".
    four  --> "4".
    five  --> "5".
    six   --> "6".
    seven --> "7".
    eight --> "8".
    nine  --> "9".

    ascii2Number(N0,N) :- zero([Zero],[]), N is N0 - Zero.

    blank --> space | tabb.

    whitespace --> [] | blank, whitespace.

    digit --> one | two | three | four | five
            | six | seven | eight | nine | zero.

    % digitNumber does not use the DCG syntax since it wants
    % to grab the head of the list without popping it off the
    % list.
    digitNumber(N,[N0|L],Out) :-
        digit([N0|L],Out),
        ascii2Number(N0,N).

    % note the added variables- DCGs can not only parse, but
    % bind variables as a side effect of the parse
    digits(N,0) --> digitNumber(N).
    digits(N,P) -->
        digitNumber(N1),
        % P0 is the number of tens places used by the
        % digits. e.g. "123" would take P0=2
        digits(N2,P0),
        % the DCG expansion (adding in carry variables)
        % is disabled inside {curly brackets}
        % (this extra syntax could be avoided via dcgfix)
        {P is P0 + 1, N is N1*10^P + N2}.

    num1(N) --> digits(N,_).
    num1(N) -->
        digits(N1,_), dot, digits(N2,P),
        % so neat- the final number is the number
        % LHS of the dot plus the right hand side
        % number divided by a factor for the number
        % of digits
        {N is N1 + N2/10^(P+1)}.

    num(N) --> whitespace,num1(N),whitespace.

Which generates num2.out=

    "23" = [23.0000000000]
    " 23" = [23.0000000000]
    "23.3 " = [23.3000000000]
    "34.5671 " = [34.5671000000]
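num2.pl builds each value from the right, which is why digits//2 has to carry the place count P. One alternative, offered only as a sketch and not as the method used in num2.pl, threads an accumulator from left to right so that no power of ten is needed. It handles integers only. The names digit_value//1, int_value//1, int_rest//2 and atom_int/2 are invented for this example.

```prolog
% Hypothetical left-to-right integer converter, not part of num2.pl.
% digit_value(D) consumes one digit code and yields its value 0..9.
digit_value(D) --> [C], { C >= 0'0, C =< 0'9, D is C - 0'0 }.

% An integer is one digit followed by zero or more further digits.
int_value(N) --> digit_value(D), int_rest(D, N).

% Each further digit shifts the accumulator one decimal place left.
int_rest(Acc, N) -->
    digit_value(D),
    { Acc1 is Acc*10 + D },
    int_rest(Acc1, N).
% No digits left: the accumulator is the answer.
int_rest(N, N) --> [].

% Convenience wrapper: convert the text of an atom to an integer.
atom_int(Atom, N) :-
    atom_codes(Atom, Codes),
    phrase(int_value(N), Codes).
```

For example, atom_int('123', N) binds N to 123, stepping the accumulator through 1, 12 and 123. The trade-off is that the fractional part of a decimal would still need a digit count (or a running divisor) to be scaled correctly, which is the job P does in num2.pl.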

Not © Tim Menzies, 2001
Share and enjoy- information wants to be free.
But if you take anything from this site,
please credit tim@menzies.com.