2012/06/19
Fixed a bug that made the parser return wrong beginning positions for the
matched symbols when the right-hand side of the rule to reduce with would
begin with nullable non terminal symbols. Section 2.4 of the manual has
been updated.
__________________________________________________________________________
2012/05/28
Fixed a bug that made the parser return wrong beginning positions for the
matched symbols (layout characters were not skipped).
__________________________________________________________________________
2012/04/19
Fixed a bug that made the parser return wrong positions for the matched
symbols.
__________________________________________________________________________
2012/03/14
Added the function:
val is_re_name : ('t,'o,'gd,'ld,'l) parser_pilot -> string -> bool
Now you can ask the parser_pilot whether s is the name of a declared
regexp:
if is_re_name dyp.parser_pilot s
then Regexp (RE_Name s)
else failwith "regexp name undefined"
__________________________________________________________________________
2011/11/27
Fixed a bug that prevented merging from happening when reducing with a
rule with a right-hand side beginning with a non terminal reducing to the
empty string.
Fixed a bug that made the parser raise Syntax_error when the starting non
terminal (the entry point of the grammar) was present in a right-hand
side.
Fixed a bug that prevented the ASTs of the parse forest from being
returned with their respective priorities (instead empty strings were
returned).
Fixed a bug that prevented merging from happening when reducing with a
rule that had an inherited attribute or an early action in the right-hand
side.
__________________________________________________________________________
2011/06/18
Added the option --Werror that makes the warnings become errors (except
for merge warnings).
__________________________________________________________________________
2011/03/28
Fixed a bug that prevented calls to the functions parse and lexparse from
working properly.
Fixed a bug that happened when using Parser in the parser command list in
some instances (the number identifying the grammar was not updated).
__________________________________________________________________________
2011/02/13
Two functions now make it possible to save a parsing_device without any
functional values and to attach them again after loading it. See manual
section 7.4 for more information.
__________________________________________________________________________
2011/02/02
Fixed a bug that made the functions parse and lexparse not take into
account the optional global_data and local_data.
The constructor Parser of the parser commands list now takes a value of
type parsing_device instead of parser_pilot.
The type parser_pilot is now concrete:
type ('token,'obj,'global_data,'local_data,'lexbuf) parser_pilot = {
pp_dev : ('token,'obj,'global_data,'local_data,'lexbuf) parsing_device;
pp_par : ('token,'obj,'global_data,'local_data,'lexbuf) parser_parameters;
pp_gd : 'global_data;
pp_ld : 'local_data }
When you want to save the automaton to a file you marshal
dyp.parser_pilot.pp_dev instead of dyp.parser_pilot.
__________________________________________________________________________
2011/01/27
Code generated by dypgen now tests whether the version of dypgen used to
generate the code matches the version of the dyp library.
It is now possible to load a parser at parsing time using the constructor
Parser of ('token, 'obj,'global_data,'local_data,'lexbuf) parser_pilot
of the parser commands list. See section 7.4 of the manual for more
information.
__________________________________________________________________________
2010/09/01
Fixed a bug that made the function next_lexeme raise
Invalid_argument("String")
Fixed a bug that made the function next_lexeme return an empty list
Fixed a bug that made the lexer raise Invalid_argument("String.create")
Note that the changes made to the lexer in the version 2009/11/15 are not
effective anymore (they did not work anyway).
__________________________________________________________________________
2010/06/20
Fixed a bug that exchanged inherited attributes between distinct parsing
paths inappropriately.
Fixed a bug that made the parser loop infinitely when a left recursive
rule had an early action.
Left recursive rules with inherited attributes do not make the parser loop
indefinitely anymore (but inherited attributes are still not properly
handled in this case, see the manual section 5.4 for more details).
__________________________________________________________________________
2009/11/15
Layout regexps are no longer preferred over non-layout regexps when their
match is longer. For instance a regexp that matches the empty string and
that is expected will be preferred over a layout character. For example
the following rule:
a: "x" - "y"? "z"
now matches the string "x z", while it didn't previously (assuming space
is a layout character).
Fixed a bug that caused a syntax error in the generated Caml code. It
happened when the type of an entry point was declared and an early action
was used.
__________________________________________________________________________
2009/04/30
Improved the speed of generation of the automaton.
Fixed a bug that made some merges be performed several times instead of
just once.
Renamed the temporary files parser.temp.ml to parser_temp.ml to avoid
ocaml 3.11 emitting a warning.
__________________________________________________________________________
2009/04/11
Fixed a bug that raised Invalid_argument with large grammars using
priorities.
Fixed two bugs that made the parser consume a very large amount of memory
with large grammars using priorities.
Fixed a bug (introduced in the previous version) that made the parser
perform the reductions in a wrong order in some cases resulting in lost
parse trees.
__________________________________________________________________________
2009/04/09
Fixed a bug that raised Not_found with no reason in some instances.
Changed a data structure that made the parser use too much memory for
large grammars with priorities.
__________________________________________________________________________
2009/03/10
Fixed a bug with inherited attributes. They did not work when there was an
early action in the right-hand side of the rule.
The behavior of early actions with respect to dyp.local_data changes
slightly: the scope of local_data now is the whole right-hand side (final
action still excluded) even when there are early actions in the rhs.
Fixed a bug with inherited attributes that caused some parse trees to be
duplicated.
Fixed a bug with lexers generated with dypgen: when the character '-' was
used before a nullable non terminal nt in the rhs of a rule in the parser
definition, the forbidding to have a layout character would be applied
before the previous symbol instead of being applied before nt.
The temporary file is now named with the extension .temp.ml instead of
.ml.temp for compatibility with ocamlc 3.12.
__________________________________________________________________________
2009/02/23
Fixed 2 bugs with inherited attributes. Parsing would fail in some
instances when it shouldn't and the parser would not keep track correctly
of the different synthesized ASTs when the same rule appeared several
times with different inherited attributes.
__________________________________________________________________________
2009/02/21
Fixed a bug happening when extending the grammar in some instances.
The generated automaton would not parse as intended or the generation
would fail.
Added the option --cpp-options "options" to dypgen to pass options to cpp
when called by dypgen. It is useful to pass the flag -w to cpp to avoid
the warning messages. --cpp is not needed when using --cpp-options.
dypgen now supports a subset of the class of L-attribute grammars. You can
send down an inherited attribute except for left-recursive rules, which
make the parser loop indefinitely when used with an inherited attribute.
See section 5.5 of the manual.
__________________________________________________________________________
2009/02/10
Nested rules are now enclosed between [ and ] instead of ( and ).
If the right-hand side of a rule is not empty you can state no action;
this means that the value bound to the last symbol of the rhs is returned.
For example you can use it with nested rules this way:
["kw1" | "kw2"- ]
here "kw1" can be followed by layout characters while "kw2" cannot.
Another example:
nt ["," nt]*
to have a list of nt separated with commas.
Note that something like ['a'] will still be interpreted as a character
set and not as a nested rule; write ["a"] instead.
Bug fixed: dypgen raised a syntax error when named regexps were declared
without any main lexer.
__________________________________________________________________________
2009/02/05
Added the following functions to Dyp :
val set_newline : 'obj dyplexbuf -> unit
val set_fname : 'obj dyplexbuf -> string -> unit
documented at the end of section 2.1.
Lines of the form:
# -line-number- "filename" -anything-until-end-of-line-
are now allowed in .dyp files and taken into account for error location by
dypgen and Caml. As a consequence dypgen now accepts cpp preprocessing of
.dyp files. With the option --cpp dypgen calls the C preprocessor cpp on
the input file before processing it. For example:
#define INFIX(op,p) expr(<=p) #op expr(
for dyp.Dyp.rhs_end_pos
__________________________________________________________________________
2009/01/29
Fixed a bug that made the parser miss some reductions in some cases; as a
result, parsing failed in some instances when it should not.
Bug fixed: when several actions were bound to a rule, they were performed
in reverse order instead of source file order (manual section 5.1).
It is now possible to execute all the actions bound to each rule instead
of just the first that doesn't raise Giveup. For this use the option:
--use-all-actions or declare: let dypgen_use_all_actions = true.
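The difference between the two modes can be modelled outside of dypgen. The
following is only a sketch: the exception Giveup is redefined locally so the
code compiles on its own (real actions raise Dyp.Giveup), and first_action,
all_actions and the three sample actions are hypothetical names that are not
part of the library.

```ocaml
(* Local stand-in for Dyp.Giveup, so that the sketch is self-contained. *)
exception Giveup

(* Default mode: try the actions in source file order and keep the result
   of the first one that does not raise Giveup. *)
let rec first_action actions arg =
  match actions with
  | [] -> raise Giveup
  | f :: rest -> (try f arg with Giveup -> first_action rest arg)

(* Model of --use-all-actions: run every action and collect the results of
   those that do not raise Giveup. *)
let all_actions actions arg =
  List.filter_map (fun f -> try Some (f arg) with Giveup -> None) actions

(* Three hypothetical actions bound to the same rule. *)
let actions = [
  (fun n -> if n < 0 then n else raise Giveup);
  (fun n -> n + 1);
  (fun n -> n * 2);
]
```

With the argument 3 the first action gives up, so first_action returns the
result of the second action only, while all_actions returns the results of
the second and third actions.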
__________________________________________________________________________
2009/01/26
Several functions of the GLR algorithm and of analysis of the grammar have
been rewritten. This fixes several bugs: in some cases some ASTs were lost
or parsing failed on valid inputs.
Cyclic grammars (when a non terminal can derive itself) are now allowed.
__________________________________________________________________________
2009/01/05
Bug fixed: the main function of the parser was not tail recursive when
using the lexer generated by dypgen. This is fixed and results in a big
performance improvement, especially with large files.
Another improvement makes the parser faster (regardless of which lexer
generator is used) when the grammar is ambiguous.
__________________________________________________________________________
2009/01/01
Bugs fixed:
- When a layout regular expression could match the empty string it caused
the parser to loop indefinitely in some instances.
- Stating two regular expressions on a rhs with the second one being a
sequence of regular expressions made dypgen consider them as just one
regexp instead of two separate regexps.
- In some instances, when using dypgen_choose_token = `all, the lexer
would stop lexing inappropriately and the parser would raise Syntax_error.
__________________________________________________________________________
2008/12/31
Bug fixed: merge happened even when one rule forbade layout characters and
the other did not. Such merges do not happen anymore.
__________________________________________________________________________
2008/12/30
Fixed a bug that happened in some instances when extending the grammar.
It is now possible to forbid layout characters more precisely than with
'!':
The type symb is now:
type symb =
| Ter of string
| Ter_NL of string
| Non_ter of string * string nt_prio
| Non_ter_NL of string * string nt_prio
| Regexp of regexp
| Regexp_NL of regexp
instead of:
type symb =
| Ter of string
| Non_ter of string * string nt_prio
| Regexp of regexp
The suffix _NL tells that the symbol cannot be preceded by layout
characters. It is only relevant with lexers generated by dypgen.
In the .dyp file, you use the character '-' before a symbol.
The type rule is now:
type rule = string * (symb list) * string * rule_options list
instead of:
type rule = string * (symb list) * string * bool
with:
type rule_options = No_layout_inside | No_layout_follows
Using [] is like using true in the previous version and using
[No_layout_inside] like using false. No_layout_follows forbids layout
characters after the last symbol of the right-hand side of the rule. In
the .dyp file you use the character '-' after the last symbol in the rhs
of the rule.
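As an illustration of the new rule type, here is a sketch of how a rule
written for the previous version might be converted. The types are copied
locally so that the code compiles without the library (real code takes them
from Dyp); the type regexp is reduced to a single constructor, the nt_prio
constructors are assumed to be those listed for 2007/11/09, and
options_of_bool, the rule names and the token names are hypothetical.

```ocaml
(* Local copies of the library types, for a self-contained sketch. *)
type 'a nt_prio =
  | No_priority
  | Eq_priority of 'a
  | Less_priority of 'a
  | Lesseq_priority of 'a
  | Greater_priority of 'a
  | Greatereq_priority of 'a

(* Stand-in: the real regexp type has more constructors. *)
type regexp = RE_Name of string

type symb =
  | Ter of string
  | Ter_NL of string
  | Non_ter of string * string nt_prio
  | Non_ter_NL of string * string nt_prio
  | Regexp of regexp
  | Regexp_NL of regexp

type rule_options = No_layout_inside | No_layout_follows
type rule = string * (symb list) * string * rule_options list

(* Translate the boolean of the previous rule type: true allowed layout
   characters inside the reduced input, false forbade them. *)
let options_of_bool allow_layout =
  if allow_layout then [] else [No_layout_inside]

(* expr: expr PLUS expr, formerly written with the boolean true. *)
let plus_rule : rule =
  ("expr",
   [Non_ter ("expr", No_priority); Ter "PLUS"; Non_ter ("expr", No_priority)],
   "default_priority",
   options_of_bool true)

(* Hypothetical rule  call: ident -LPAREN expr RPAREN-
   No layout is allowed before LPAREN nor after RPAREN. *)
let call_rule : rule =
  ("call",
   [Non_ter ("ident", No_priority); Ter_NL "LPAREN";
    Non_ter ("expr", No_priority); Ter "RPAREN"],
   "default_priority",
   [No_layout_follows])
```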
__________________________________________________________________________
2008/12/17
The default behavior of dypgen with respect to merging sub trees has
changed. When merging, dypgen used to pick one of the global data and one
of the local data arbitrarily. Now dypgen does not merge anymore when data
are different (according to global_data_equal and local_data_equal). This
behavior turns out to be more natural. You can still have the old behavior
by using the new optional argument:
?keep_data:[`both|`global|`local|`none]
of the functions parse and lexparse with `none. Or define the variable:
val dypgen_keep_data : [`both|`global|`local|`none]
in the header of the parser (see 3.2.4 in the manual for more info).
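The merge policy described above can be pictured with the following sketch.
It is a model and not dypgen's code: may_merge is a hypothetical function,
and the reading of `global and `local (only the global data, respectively
only the local data, must be equal for a merge to happen) is an assumption
drawn from the names of the variants.

```ocaml
type keep_data = [ `both | `global | `local | `none ]

(* Decide whether two parse paths carrying (global_data, local_data) pairs
   may be merged.  `none models the old behavior: always merge. *)
let may_merge (keep : keep_data) ~global_data_equal ~local_data_equal
    (gd1, ld1) (gd2, ld2) =
  match keep with
  | `none -> true
  | `global -> global_data_equal gd1 gd2
  | `local -> local_data_equal ld1 ld2
  | `both -> global_data_equal gd1 gd2 && local_data_equal ld1 ld2
```

For two paths with the same global data but different local data, this
model allows the merge with `none and `global and refuses it with `local
and `both.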
__________________________________________________________________________
2008/12/16
Bug fixed related to merging sub-trees, some merged trees were duplicated.
__________________________________________________________________________
2008/12/14
The record dyp has a new field:
next_lexeme : unit -> string list
next_lexeme allows the user action to know the next lexeme to be matched
by the lexer. It only works for the main lexer generated by dypgen. For
more information about next_lexeme see section 5.7 of the manual.
__________________________________________________________________________
2008/12/13
Bug fixed: dypgen_choose_token = `all did not work.
__________________________________________________________________________
2008/12/12
Bug fixed: merging of sub-trees was not handled properly when the grammar
contained epsilon rules and an assert failure could be raised.
__________________________________________________________________________
2008/12/09
Bug fixed: when a rule began with an early action the generated code was
wrong.
Bug fixed: in some instances, the layout characters were not skipped.
Bug fixed: long sequences and alternatives of regexp in a .dyp file made
dypgen loop forever.
The function lexparse has a new optional argument:
?choose_token:[`first|`all]
When `all is used the lexer uses all the tokens that are the longest match
instead of just the first one. By default `first is used.
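The effect of ?choose_token can be modelled on a list of candidate matches.
This is only a sketch: choose_tokens is a hypothetical function, candidates
are represented as (token name, match length) pairs, and "first" is assumed
to mean first in the order of the list.

```ocaml
(* Keep the candidates whose match is the longest, then either only the
   first of them (`first) or all of them (`all). *)
let choose_tokens (mode : [ `first | `all ]) candidates =
  let longest = List.fold_left (fun m (_, len) -> max m len) 0 candidates in
  let best = List.filter (fun (_, len) -> len = longest) candidates in
  match mode, best with
  | _, [] -> []
  | `first, tok :: _ -> [tok]
  | `all, _ -> best

(* Hypothetical candidates for the input "if": two tokens match two
   characters and one matches a single character. *)
let candidates = [("IDENT", 2); ("KW_IF", 2); ("LETTER", 1)]
```

Here `first keeps IDENT only, while `all keeps both IDENT and KW_IF and
lets the parser try each of them.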
__________________________________________________________________________
2008/09/25
Dypgen can now generate a lexer for the parser and auxiliary lexers to be
called from the main lexer. These lexers support neither Unicode nor
submatching (bindings with the keyword "as"). See sections 1.5 and 2.1 of
the manual for information about the dypgen lexer generator. The main lexer
can be extended by using regular expressions in the right-hand sides of
new grammar rules.
The type symb is now:
type symb =
| Ter of string
| Non_ter of string * (string nt_prio)
| Regexp of regexp
instead of:
type symb =
| Ter of int
| Non_ter of string * (string nt_prio)
This means that terminals are referred to by a string (their name "TOKEN")
instead of an int (t_TOKEN), and that regular expressions are allowed in
the rhs of a rule. The type regexp is given in section 6.1.
Don't use names of undefined tokens when you write new rules (it raises
Undefined_ter): you cannot define new terminal symbols.
The type obj now owns the constructor:
Lexeme_matched of string
It is the constructor for the strings returned by regular expressions.
This constructor is present even if you don't use dypgen lexer generator.
There is one constructor Lex_lexer_name for each auxiliary lexer (where
lexer_name is the name of the lexer), and one constructor:
Lex_lexer_name_Arg_arg_name
for each parameter of the lexer (where arg_name is the name of the
parameter). The user won't have to deal with these constructors.
When using dypgen as the lexer generator you must use the function
lexparse instead of parse (section 7.3).
The patterns for symbols are now enclosed between < and > instead of [ and
], these brackets are now used for characters intervals.
It is no longer possible to omit the action code after a rhs (this
meant the user action returned None). It was too ambiguous.
You should not have a non terminal named eof because this is the regular
expression that matches the end of input.
You can use the character '!' at the beginning of the rhs of a rule. This
means that the parser will reduce with this rule only if no layout
character was matched in the part of the input that is being reduced. This
only applies when dypgen is the lexer generator.
As a consequence the type of grammar rules changes. It is now:
type rule = string * (symb list) * string * bool
instead of:
type rule = string * (symb list) * string
When the bool is true layout characters are allowed to be matched in the
part of the input to be reduced.
The documentation has been updated. Partial actions are now called early
actions in the manual.
The keyword %parser is equivalent to %% in .dyp files.
__________________________________________________________________________
2008/09/01
--no-pp now works when non terminals are declared with %start too.
__________________________________________________________________________
2008/08/31
Added the option --no-pp. This prevents dypgen from declaring pp in the
.mli file (only works when no non terminal is declared with %start).
Added the option --no-obj. This prevents dypgen from declaring the type
obj in the .mli file (does not work if pp is declared).
The types token and obj declared in the .mli now use type names that are
completely prefixed with module names (it was already the case for the
value pp). Therefore you should avoid opening modules in %mlitop and
%mlimid. In particular this fixes a type error that happened when a module
containing a module of the same name was opened in the .dyp file.
__________________________________________________________________________
2008/08/27
Fixed a bug that happened when extracting strings from parser.extract_type
__________________________________________________________________________
2008/07/02
Added the option --command (see manual 10.6).
dypgen now works on an auxiliary file .ml.temp when generating the .mli
before saving the result in the .ml file.
Changed the type name parser to parser_pilot and the value name parser
to pp, this allows camlp4 to parse the generated .ml file. The function
update_parser is renamed update_pp and the field parser of the record
dypgen_toolbox is renamed parser_pilot.
__________________________________________________________________________
2008/06/14
The operators *, + and ? have been added to dypgen syntax, see section 4.5
of the manual for more information.
The .ml generated file defines the value parser.
The record dyp has a new field named parser. Because of the type of this
field, the typechecker of ocamlc complains in some cases, like when using
a partial action that returns a parser commands list (i.e. beginning with
...@{ ). In such cases you will have to use the option -rectypes of ocamlc
and use --ocamlc "-rectypes" with dypgen.
The following functions have been added to the module Dyp of the library:
val update_parser :
('token,'obj,'global_data,'local_data,'lexbuf) parser ->
('token,'obj,'global_data,'local_data,'lexbuf) dyp_action list ->
('token,'obj,'global_data,'local_data,'lexbuf) parser
val parse :
('token, 'obj,'global_data,'local_data,'lexbuf) parser -> string ->
?global_data:'global_data ->
?local_data:'local_data ->
?match_len:[`longest|`shortest] ->
?lexpos:('lexbuf -> (Lexing.position * Lexing.position)) ->
('lexbuf -> 'token) ->
'lexbuf ->
(('obj * string) list)
The function parse makes it possible to parse for any non terminal symbol
of the grammar and to parse recursively from the action code.
The function update_parser makes it possible to modify a parser with a
list of parser commands of type dyp_action. Both functions can be used
inside the action code and outside.
See section 7 for more information about parser, parse and update_parser.
The types of start entry points, global_data and local_data are now
inferred by Caml. You don't have to state them anymore. The keywords
%global_data_type and %local_data_type are therefore discarded.
The keyword %lexbuf_type is discarded. If you want to use another lexer
than ocamllex you have to define the function dypgen_lexbuf_position in
the header. If you don't need the positions of the lexer you may use the
line: let dypgen_lexbuf_position = Dyp.dummy_lexbuf_position
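For a lexer other than ocamllex, the definition might look like the sketch
below. It assumes that dypgen_lexbuf_position takes the lexer buffer and
returns the start and end positions of the last token, like the optional
argument lexpos of parse. The types token and my_lexbuf and the functions
pos, make_lexbuf and next_token are hypothetical.

```ocaml
(* Hypothetical tokens and a hypothetical lexer buffer holding a list of
   pre-computed tokens together with their positions. *)
type token = INT of int | PLUS | EOF

type my_lexbuf = {
  mutable pending : (token * Lexing.position * Lexing.position) list;
  mutable start_pos : Lexing.position;
  mutable end_pos : Lexing.position;
}

(* Build a position from a character offset only. *)
let pos cnum = { Lexing.dummy_pos with Lexing.pos_cnum = cnum }

let make_lexbuf toks =
  { pending = toks; start_pos = Lexing.dummy_pos; end_pos = Lexing.dummy_pos }

(* The lexer function: return the next token and record its positions. *)
let next_token lb =
  match lb.pending with
  | [] -> EOF
  | (tok, s, e) :: rest ->
    lb.pending <- rest;
    lb.start_pos <- s;
    lb.end_pos <- e;
    tok

(* To be placed in the header of the parser definition. *)
let dypgen_lexbuf_position lb = (lb.start_pos, lb.end_pos)
```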
Added the option --ocamlc string to dypgen. In order to know some types,
dypgen calls ocamlc -i (see 8.2). --ocamlc makes it possible to pass some
command-line options to ocamlc. Example:
dypgen --ocamlc "-I ../dypgen/dyplib -rectypes" parser.dyp
The types pliteral and lit are renamed psymbol and symb.
The constructor Will_shift of bool is replaced by Dont_shift.
The program pgen has been removed, dypgen generates itself now.
The type error messages are less difficult to understand in some cases.
Added the option --no-mli : dypgen does not generate a .mli file.
Added the variable dypgen_match_length (see section 10.7).
You can state no action after the right-hand side of a rule (see 4.6).
__________________________________________________________________________
2008/02/04
Some improvements in the speed of extension of the grammar.
__________________________________________________________________________
2008/01/11
The constructor Relations has been renamed Relation for consistency with
dypgen syntax.
Some improvements in the speed of extension of the grammar.
A bug that could happen when extending the relation between priorities has
been fixed.
__________________________________________________________________________
2008/01/04
The constructor Keep_grammar has no argument anymore (you just state
Keep_grammar instead of Keep_grammar true)
It is now possible to add new priorities and make the relation true
between priorities. See the manual section 9 for more information.
__________________________________________________________________________
2007/12/26
The option --lexer does not exist anymore. If you want to use another lexer
than ocamllex, use the keyword %lexbuf_type to assign the type you want to
the lexer buffer (see section 13.5 of the manual for more info).
It is now also possible to have relevant information about the positions
(i.e. with the functions: symbol_start, symbol_start_pos, ...) when not
using ocamllex too (see section 13.6 of the manual for more info).
Added the example program position_token_list. It is the same as the demo
program position except that it first uses ocamllex to make a list of
tokens and then uses dypgen to parse this list of tokens.
New keyword %mlimid to add code to the mli between the type token and the
declaration of the entry point functions.
__________________________________________________________________________
2007/11/09
It is not possible to remove rules at parsing time in this version anymore.
The option --prio-pt does not exist anymore (a new method combines the
benefit of both former methods).
The commands to change the relation between priorities and add new
priorities are not available anymore (they were bugged in the previous
versions).
The constructors Remove_rules and Priority_data have been removed from
the type dyp_action. The fields priority_data, add_nt and find_nt have
been removed from the record dyp (type dypgen_toolbox).
The following functions are not available anymore:
empty_priority_data
is_relation
insert_priority
find_priority
set_relation
update_priority
add_list_relations
dyp.find_nt
dyp.add_nt
The following constructor has been added to the type dyp_action:
Bind_to_cons of (string * string) list
(the type dyp_action is the type of the list that user actions return
along with the resulting tree)
When one returns:
Bind_to_cons [("nt1","Cons1"),("nt2","Cons2")]
the non terminal nt1 is bound to the constructor Cons1 and nt2 to Cons2.
The new rules introduced at parsing time are constructed differently:
one now simply uses the string of the non terminal and the string of
the priority. e.g.
("expr",
[Non_ter ("expr",No_priority); Ter t_PLUS; Non_ter ("expr",No_priority)],
"default_priority")
for the rule:
expr: expr PLUS expr
The types to construct rules were previously:
type token_name
type non_ter
type priority
type non_terminal_priority =
| No_priority
| Eq_priority of priority
| Less_priority of priority
| Lesseq_priority of priority
| Greater_priority of priority
| Greatereq_priority of priority
type 'a pliteral =
| Ter of token_name
| Non_ter of 'a * non_terminal_priority
type lit = (non_ter * non_terminal_priority) pliteral
type rule = non_ter * (lit list) * priority
And now they are:
type token_name
type 'a nt_prio =
| No_priority
| Eq_priority of 'a
| Less_priority of 'a
| Lesseq_priority of 'a
| Greater_priority of 'a
| Greatereq_priority of 'a
type 'a pliteral =
| Ter of token_name
| Non_ter of 'a
type lit = (string * (string nt_prio)) pliteral
type rule = string * (lit list) * string
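A rule with priority constraints could be written as follows with the new
types. This is a sketch: the types are copied locally so that it compiles
without the library, token_name (abstract in the library) is taken to be an
int here, only the constructor Bind_to_cons of dyp_action is reproduced,
and t_PLUS, the priority "pp", the constructor name "Obj_expr" and the
helper rhs_non_terminals are hypothetical.

```ocaml
(* Local copies of the library types, for a self-contained sketch. *)
type token_name = int
type 'a nt_prio =
  | No_priority
  | Eq_priority of 'a
  | Less_priority of 'a
  | Lesseq_priority of 'a
  | Greater_priority of 'a
  | Greatereq_priority of 'a
type 'a pliteral =
  | Ter of token_name
  | Non_ter of 'a
type lit = (string * (string nt_prio)) pliteral
type rule = string * (lit list) * string

(* Only the constructor discussed above is reproduced. *)
type dyp_action = Bind_to_cons of (string * string) list

let t_PLUS : token_name = 0 (* hypothetical token number *)

(* expr: expr(<=pp) PLUS expr(<pp)   with priority pp *)
let plus_rule : rule =
  ("expr",
   [Non_ter ("expr", Lesseq_priority "pp");
    Ter t_PLUS;
    Non_ter ("expr", Less_priority "pp")],
   "pp")

(* Command binding the non terminal expr to the constructor Obj_expr. *)
let bindings = Bind_to_cons [("expr", "Obj_expr")]

(* Names of the non terminals of a right-hand side, in order. *)
let rhs_non_terminals ((_, rhs, _) : rule) =
  List.filter_map (function Non_ter (nt, _) -> Some nt | Ter _ -> None) rhs
```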
Added the commands infix, infixl and infixr to the example tinyML
(see test_infix.tiny)
__________________________________________________________________________
2007/11/08
Fixed a bug that raised an Assert_failure when using an ambiguous
grammar with an epsilon production and using Dyp.keep_all.
Partial actions must now be preceded by three dots `...' to prevent
puzzling errors when one forgets a `|' between productions.
__________________________________________________________________________
2007/07/29
global_data and local_data can be passed to the parser since they are
now optional arguments of the parser. If 'main' is a start symbol of
the grammar then we have the function:
val main : ?global_data:gd_type -> ?local_data:ld_type ->
(Lexing.lexbuf -> token) -> Lexing.lexbuf ->
((main_type) * Dyp.priority) list
where gd_type and ld_type represent the types of the global and local
data. Those have to be declared in the parser definition using:
%global_data_type obj -> obj list.
The parser now returns a list of pairs (parse-tree,priority) instead of
just a list of parse trees.
The default automaton is now the LR(0).
A new method which embeds the enforcement of priorities in the automaton
can be optionally chosen.
__________________________________________________________________________
2007/1/31
Bug fixed : %relation p1:p2 would introduce p1:p1 and p2:p2 in addition to
p1:p2, thus (p2) and (=p2) would denote the same thing. This has been fixed.
__________________________________________________________________________
2007/1/23
Fixed an efficiency issue (a lot of %token lines would make the generation
very slow).
One can now use an LR(0) automaton. This is stated with an option to dypgen.
__________________________________________________________________________
2007/1/15
The automaton of the parser is now built when one actually uses the parser,
it is not marshalled anymore in the .ml file (this fixes a bug). As a result
the .ml file is lighter but the parser takes longer to start when it is used.
The license changes to CeCILL-B for all the files.
Simplification of the installation.
__________________________________________________________________________
2006/12/26
The merge warning now returns the position of the part of the input considered.
__________________________________________________________________________
2006/12/25
The functions symbol_start, ... which tell which part of the input is
reduced to a given non terminal are now available.
When one uses a lexer different from ocamllex, one must use the option -lexer.
The type of the action code changes slightly.
__________________________________________________________________________
2006/12/19
Bug fixed: in some particular cases the parser would not shift while it had to.
Bug fixed: Taking into account the lookahead token for the reduction by a
dynamic rule prevented the parser from reducing in some particular cases when
it had to. As a consequence the lookahead token is not considered anymore to
decide whether to reduce by a dynamic rule or not.
__________________________________________________________________________
2006/12/16
Introduction of the generic merge functions and of predefined merge functions.
The type of the specific merge functions changes and is now :
val merge_nt : priority_data -> ('obj * priority) list -> ('obj * priority) ->
('obj * priority) list
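Two merge functions compatible with this type are sketched below. They are
illustrations only: the names are hypothetical, and it is assumed that the
list argument holds the values kept so far while the single value is the
new one to be merged with them.

```ocaml
(* Keep every alternative, which yields a parse forest. *)
let merge_keep_all _priority_data old_values new_value =
  new_value :: old_values

(* Keep a single alternative and drop the others. *)
let merge_keep_one _priority_data old_values new_value =
  match old_values with
  | [] -> [new_value]
  | first :: _ -> [first]
```

Both functions are polymorphic, so they can be used whatever the types obj
and priority_data are.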
The command line option -merge_warning makes the generated parser emit a warning
on the standard output each time a merge happens.
__________________________________________________________________________
2006/12/15
By default the parser now builds an LALR(1) automaton instead of
an LR(1) one. The latter is still available by stating %LR1 in the
parser definition.
__________________________________________________________________________
2006/12/13
Fixed a bug which prevented merging of values to happen.
One can define a general merge function now. It applies
to any non terminal which has no merge function of its own.
Added an example which yields a parse forest.
__________________________________________________________________________
2006/12/12
Renamed the flag verbose to dypgen_verbose.
Minor bug fixed.
__________________________________________________________________________
2006/11/29
Several entry points are now allowed.
The parsing function is not dyp_parse anymore but
has the name of the corresponding non terminal
as with ocamlyacc.
The end of file token is not a special token anymore
and can be used in the rules as with ocamlyacc.
__________________________________________________________________________
2006/10/26
The type of dyp_parse is now:
val dyp_parse : ('a -> token) -> 'a -> main_type list
instead of
val dyp_parse : (Lexing.lexbuf -> token) -> Lexing.lexbuf -> main_type list
In addition to ocamllex, any other lexer can now be used with dypgen.
__________________________________________________________________________
2006/10/24
Fixed bugs with merge functions
Added a compact syntax for transitive relations
Completed user's manual.
__________________________________________________________________________
2006/10/20
First version