2012/06/19
Fixed a bug that made the parser return wrong beginning positions for the
matched symbols when the right-hand side of the rule used for the reduction
began with nullable non terminal symbols. Section 2.4 of the manual has been
updated.
__________________________________________________________________________

2012/05/28
Fixed a bug that made the parser return wrong beginning positions for the
matched symbols (layout characters were not skipped).
__________________________________________________________________________

2012/04/19
Fixed a bug that made the parser return wrong positions for the matched
symbols.
__________________________________________________________________________

2012/03/14
Added the function:

  val is_re_name : ('t,'o,'gd,'ld,'l) parser_pilot -> string -> bool

Now you can ask the parser_pilot whether s is the name of a declared regexp:

  if is_re_name dyp.parser_pilot s
  then Regexp (RE_Name s)
  else failwith "regexp name undefined"
__________________________________________________________________________

2011/11/27
Fixed a bug that prevented merges from happening when reducing with a rule
whose right-hand side began with a non terminal deriving the empty string.

Fixed a bug that made the parser raise Syntax_error when the start non
terminal (the entry point of the grammar) appeared in a right-hand side.

Fixed a bug that prevented the ASTs of the parse forest from being returned
with their respective priorities (empty strings were returned instead).

Fixed a bug that prevented merges from happening when reducing with a rule
that had an inherited attribute or an early action in the right-hand side.
__________________________________________________________________________

2011/06/18
Added the option --Werror, which turns warnings into errors (except for
merge warnings).
__________________________________________________________________________

2011/03/28
Fixed a bug that prevented calls to the functions parse and lexparse from
working properly.

Fixed a bug that occurred in some instances when using Parser in the parser
commands list (the number identifying the grammar was not updated).
__________________________________________________________________________

2011/02/13
Two new functions make it possible to save a parsing_device without any
functional values and to attach the functional values again after loading
it. See section 7.4 of the manual for more information.
__________________________________________________________________________

2011/02/02
Fixed a bug that made the functions parse and lexparse ignore the optional
arguments global_data and local_data.

The constructor Parser of the parser commands list now takes a value of type
parsing_device instead of parser_pilot.

The type parser_pilot is now concrete:

  type ('token,'obj,'global_data,'local_data,'lexbuf) parser_pilot = {
    pp_dev : ('token,'obj,'global_data,'local_data,'lexbuf) parsing_device;
    pp_par : ('token,'obj,'global_data,'local_data,'lexbuf) parser_parameters;
    pp_gd : 'global_data;
    pp_ld : 'local_data }

When you want to save the automaton to a file, marshal
dyp.parser_pilot.pp_dev instead of dyp.parser_pilot.
__________________________________________________________________________

2011/01/27
Code generated by dypgen now checks whether the version of dypgen that
generated the code matches the version of the dyp library.

It is now possible to load a parser at parsing time using the constructor

  Parser of ('token,'obj,'global_data,'local_data,'lexbuf) parser_pilot

of the parser commands list. See section 7.4 of the manual for more
information.
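The 2011/02/13 entry and the 2011/02/02 advice to marshal only
dyp.parser_pilot.pp_dev rest on a general OCaml constraint: Marshal rejects
closures unless the Marshal.Closures flag is given. The following is a
minimal sketch of the strip-then-reattach pattern, assuming a hypothetical
device record; it is not dypgen's parsing_device type, and it does not use
the two library functions the entry mentions (see manual section 7.4 for
those).

```ocaml
(* Hypothetical stand-in for a parsing device: [tables] is plain data,
   [actions] holds functional values such as user actions.
   This is NOT dypgen's parsing_device type. *)
type device = {
  tables : (string * int) list;
  actions : (int -> int) array option;
}

(* Drop the functional part so that the value can be marshaled. *)
let strip d = { d with actions = None }

(* Reattach the functional part after the data part has been loaded. *)
let attach d acts = { d with actions = Some acts }

let save d = Marshal.to_string (strip d) []
let load s : device = Marshal.from_string s 0

(* [true] if Marshal rejects [d] because it contains closures. *)
let marshal_fails d =
  match Marshal.to_string d [] with
  | _ -> false
  | exception Invalid_argument _ -> true

let acts = [| (fun x -> x + 1); (fun x -> x * 2) |]
let dev = { tables = [ ("expr", 0); ("term", 1) ]; actions = Some acts }

(* Round trip: save the data only, load it, then reattach the actions. *)
let restored = attach (load (save dev)) acts
```

Marshaling dev directly raises Invalid_argument, while the stripped copy
round-trips; the functions must be supplied again by the loading program,
which is why they are kept out of the saved value.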
__________________________________________________________________________

2010/09/01
Fixed a bug that made the function next_lexeme raise
Invalid_argument("String").

Fixed a bug that made the function next_lexeme return an empty list.

Fixed a bug that made the lexer raise Invalid_argument("String.create").

Note that the changes made to the lexer in the version 2009/11/15 are no
longer in effect (they did not work anyway).
__________________________________________________________________________

2010/06/20
Fixed a bug that inappropriately exchanged inherited attributes between
distinct parsing paths.

Fixed a bug that made the parser loop infinitely when a left-recursive rule
had an early action.

Left-recursive rules with inherited attributes no longer make the parser loop
indefinitely (but inherited attributes are still not handled properly in this
case; see section 5.4 of the manual for more details).
__________________________________________________________________________

2009/11/15
Layout regexps are no longer preferred over non-layout regexps when their
match is longer. For instance, a regexp that matches the empty string and
that is expected will be preferred over a layout character. For example, the
following rule:

  a: "x" - "y"? "z"

now matches the string "x z", while it did not previously (assuming space is
a layout character).

Fixed a bug that produced a syntax error in the generated Caml code. It
happened when the type of an entry point was declared and an early action was
used.
__________________________________________________________________________

2009/04/30
Improved the speed of generation of the automaton.

Fixed a bug that made some merges be performed several times instead of just
once.

Renamed the temporary file parser.temp.ml to parser_temp.ml to prevent
ocaml 3.11 from emitting a warning.
__________________________________________________________________________

2009/04/11
Fixed a bug that raised Invalid_argument with large grammars using
priorities.
Fixed two bugs that made the parser consume a very large amount of memory
with large grammars using priorities.

Fixed a bug (introduced in the previous version) that, in some cases, made
the parser perform reductions in the wrong order, resulting in lost parse
trees.
__________________________________________________________________________

2009/04/09
Fixed a bug that raised Not_found for no reason in some instances.

Replaced a data structure that made the parser use too much memory for large
grammars with priorities.
__________________________________________________________________________

2009/03/10
Fixed a bug with inherited attributes: they did not work when there was an
early action in the right-hand side of the rule.

The behavior of early actions with respect to dyp.local_data changes
slightly: the scope of local_data is now the whole right-hand side (final
action still excluded), even when there are early actions in the rhs.

Fixed a bug with inherited attributes that caused some parse trees to be
duplicated.

Fixed a bug with lexers generated by dypgen: when the character '-' was used
before a nullable non terminal nt in the rhs of a rule in the parser
definition, the prohibition of layout characters was applied before the
previous symbol instead of before nt.

The temporary file is now named with the extension .temp.ml instead of
.ml.temp for compatibility with ocamlc 3.12.
__________________________________________________________________________

2009/02/23
Fixed 2 bugs with inherited attributes. In some instances parsing would fail
when it should not, and the parser would not correctly keep track of the
different synthesized ASTs when the same rule appeared several times with
different inherited attributes.
__________________________________________________________________________

2009/02/21
Fixed a bug happening in some instances when extending the grammar.
The generated automaton would not parse as intended, or the generation would
fail.

Added the option --cpp-options "options" to pass options to cpp when it is
called by dypgen. It is useful for passing the flag -w to cpp to suppress its
warning messages. --cpp is not needed when using --cpp-options.

dypgen now supports a subset of the class of L-attributed grammars. You can
pass down an inherited attribute, except in left-recursive rules, which make
the parser loop indefinitely when used with an inherited attribute. See
section 5.5 of the manual.
__________________________________________________________________________

2009/02/10
Nested rules are now enclosed between [ and ] instead of ( and ).

If the right-hand side of a rule is not empty, you can omit the action; the
value bound to the last symbol of the rhs is then returned. For example, you
can use it with nested rules this way:

  ["kw1" | "kw2"- ]

Here "kw1" can be followed by layout characters while "kw2" cannot. Another
example:

  nt ["," nt]*

to have a list of nt separated by commas.

Note that something like ['a'] is still interpreted as a character set and
not as a nested rule; write ["a"] instead.

Bug fixed: dypgen raised a syntax error when named regexps were declared
without any main lexer.
__________________________________________________________________________

2009/02/05
Added the following functions to Dyp:

  val set_newline : 'obj dyplexbuf -> unit
  val set_fname : 'obj dyplexbuf -> string -> unit

documented at the end of section 2.1.

Lines of the form:

  # -line-number- "filename" -anything-until-end-of-line-

are now allowed in .dyp files and taken into account for error locations by
dypgen and Caml. As a consequence dypgen now accepts cpp preprocessing of
.dyp files. With the option --cpp, dypgen calls the C preprocessor cpp on the
input file before processing it.
For example: #define INFIX(op,p) expr(<=p) #op expr( for dyp.Dyp.rhs_end_pos
__________________________________________________________________________

2009/01/29
Fixed a bug that made the parser miss some reductions in some cases; as a
result, parsing failed in some instances when it should not have.

Bug fixed: when several actions were bound to a rule, they were performed in
reverse order instead of source file order (manual section 5.1).

It is now possible to execute all the actions bound to each rule instead of
just the first one that does not raise Giveup. To do so, use the option:

  --use-all-actions

or declare:

  let dypgen_use_all_actions = true
__________________________________________________________________________

2009/01/26
Several functions of the GLR algorithm and of the grammar analysis have been
rewritten. This fixes several bugs: in some cases some ASTs were lost or
parsing failed on valid inputs.

Cyclic grammars (where a non terminal can derive itself) are now allowed.
__________________________________________________________________________

2009/01/05
Bug fixed: the main function of the parser was not tail recursive when using
the lexer generated by dypgen. The fix results in a big performance
improvement, especially with large files.

Another improvement makes the parser faster (regardless of which lexer
generator is used) when the grammar is ambiguous.
__________________________________________________________________________

2009/01/01
Bugs fixed:
- When a layout regular expression could match the empty string, it caused
  the parser to loop indefinitely in some instances.
- Stating two regular expressions in a rhs, the second one being a sequence
  of regular expressions, made dypgen treat them as one regexp instead of two
  separate regexps.
- In some instances, when using dypgen_choose_token = `all, the lexer would
  stop lexing inappropriately and the parser would raise Syntax_error.
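The difference between the default and the --use-all-actions behavior
described in the 2009/01/29 entry can be modeled in a few lines. This is a
simplified sketch, not dypgen's implementation: actions are plain functions
tried in source file order, and the local exception Giveup stands in for the
library's own Giveup. The three actions are hypothetical.

```ocaml
(* Local stand-in for the Giveup exception named in the 2009/01/29 entry;
   a real dypgen action would raise the library's own exception instead. *)
exception Giveup

(* Default behavior: try the actions in source file order and keep the
   result of the first one that does not raise Giveup. *)
let first_action actions arg =
  let rec go = function
    | [] -> None
    | act :: rest -> ( try Some (act arg) with Giveup -> go rest)
  in
  go actions

(* --use-all-actions behavior: keep the result of every action that does
   not raise Giveup, still in source file order. *)
let all_actions actions arg =
  List.filter_map (fun act -> try Some (act arg) with Giveup -> None) actions

(* Three hypothetical actions bound to the same rule. *)
let actions =
  [ (fun n -> if n < 0 then raise Giveup else "non-negative");
    (fun n -> if n mod 2 = 0 then "even" else raise Giveup);
    (fun _ -> "any") ]
```

With the argument 4, first_action returns Some "non-negative" while
all_actions returns ["non-negative"; "even"; "any"]; with -3 both keep only
"any", because the first two actions give up.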
__________________________________________________________________________

2008/12/31
Bug fixed: merges happened even when one rule forbade layout characters and
the other did not. Such merges no longer happen.
__________________________________________________________________________

2008/12/30
Fixed a bug that happened in some instances when extending the grammar.

It is now possible to forbid layout characters more precisely than with '!'.
The type symb is now:

  type symb =
    | Ter of string
    | Ter_NL of string
    | Non_ter of string * string nt_prio
    | Non_ter_NL of string * string nt_prio
    | Regexp of regexp
    | Regexp_NL of regexp

instead of:

  type symb =
    | Ter of string
    | Non_ter of string * string nt_prio
    | Regexp of regexp

The suffix _NL means that the symbol cannot be preceded by layout characters.
It is only relevant with lexers generated by dypgen. In the .dyp file, you
use the character '-' before a symbol.

The type rule is now:

  type rule = string * (symb list) * string * rule_options list

instead of:

  type rule = string * (symb list) * string * bool

with:

  type rule_options = No_layout_inside | No_layout_follows

Using [] is like using true in the previous version, and using
[No_layout_inside] is like using false. No_layout_follows forbids layout
characters after the last symbol of the right-hand side of the rule. In the
.dyp file, you use the character '-' after the last symbol of the rhs.
__________________________________________________________________________

2008/12/17
The default behavior of dypgen with respect to merging subtrees has changed.
When merging, dypgen used to pick one of the global data and one of the local
data arbitrarily. Now dypgen no longer merges when the data differ (according
to global_data_equal and local_data_equal). This behavior turns out to be
more natural. You can still get the old behavior by passing `none to the new
optional argument:

  ?keep_data:[`both|`global|`local|`none]

of the functions parse and lexparse.
Or define the variable:

  val dypgen_keep_data : [`both|`global|`local|`none]

in the header of the parser (see 3.2.4 in the manual for more info).
__________________________________________________________________________

2008/12/16
Bug fixed related to merging subtrees: some merged trees were duplicated.
__________________________________________________________________________

2008/12/14
The record dyp has a new field:

  next_lexeme : unit -> string list

next_lexeme allows the user action to know the next lexeme to be matched by
the lexer. It only works for the main lexer generated by dypgen. For more
information about next_lexeme see section 5.7 of the manual.
__________________________________________________________________________

2008/12/13
Bug fixed: dypgen_choose_token = `all did not work.
__________________________________________________________________________

2008/12/12
Bug fixed: merging of subtrees was not handled properly when the grammar
contained epsilon rules, and an Assert_failure could be raised.
__________________________________________________________________________

2008/12/09
Bug fixed: when a rule began with an early action, the generated code was
wrong.

Bug fixed: in some instances, the layout characters were not skipped.

Bug fixed: long sequences and alternatives of regexps in a .dyp file made
dypgen loop forever.

The function lexparse has a new optional argument:

  ?choose_token:[`first|`all]

When `all is used, the lexer uses all the tokens that are the longest match
instead of just the first one. By default `first is used.
__________________________________________________________________________

2008/09/25
Dypgen can now generate a lexer for the parser, as well as auxiliary lexers
to be called from the main lexer. These lexers support neither Unicode nor
submatching (bindings with the keyword "as"). See sections 1.5 and 2.1 of the
manual for information about the dypgen lexer generator.
The main lexer can be extended by using regular expressions in the
right-hand sides of new grammar rules.

The type symb is now:

  type symb =
    | Ter of string
    | Non_ter of string * (string nt_prio)
    | Regexp of regexp

instead of:

  type symb =
    | Ter of int
    | Non_ter of string * (string nt_prio)

This means that terminals are referred to by a string (their name "TOKEN")
instead of an int (t_TOKEN), and that regular expressions are allowed in the
rhs of a rule. The type regexp is given in section 6.1.

Do not use names of undefined tokens when you write new rules (it raises
Undefined_ter): you cannot define new terminal symbols.

The type obj now has the constructor:

  Lexeme_matched of string

It is the constructor for the strings returned by regular expressions. This
constructor is present even if you do not use the dypgen lexer generator.
There is one constructor Lex_lexer_name for each auxiliary lexer (where
lexer_name is the name of the lexer), and one constructor:

  Lex_lexer_name_Arg_arg_name

for each parameter of the lexer (where arg_name is the name of the
parameter). The user does not have to deal with these constructors.

When using dypgen as the lexer generator you must use the function lexparse
instead of parse (section 7.3).

The patterns for symbols are now enclosed between < and > instead of [ and ];
square brackets are now used for character intervals.

It is no longer possible to omit the action code after a rhs (this meant the
user action returned None). It was too ambiguous.

You should not have a non terminal named eof, because eof is the regular
expression that matches the end of input.

You can use the character '!' at the beginning of the rhs of a rule. This
means that the parser will reduce with this rule only if no layout character
was matched in the part of the input being reduced. This only applies when
dypgen is the lexer generator.
As a consequence the type of grammar rules changes; it is now:

  type rule = string * (symb list) * string * bool

instead of:

  type rule = string * (symb list) * string

When the bool is true, layout characters are allowed to be matched in the
part of the input to be reduced.

The documentation has been updated. Partial actions are now called early
actions in the manual.

The keyword %parser is equivalent to %% in .dyp files.
__________________________________________________________________________

2008/09/01
--no-pp now also works when non terminals are declared with %start.
__________________________________________________________________________

2008/08/31
Added the option --no-pp. This prevents dypgen from declaring pp in the .mli
file (only works when no non terminal is declared with %start).

Added the option --no-obj. This prevents dypgen from declaring the type obj
in the .mli file (does not work if pp is declared).

The types token and obj declared in the .mli now use type names that are
fully qualified with module names (this was already the case for the value
pp). Therefore you should avoid opening modules in %mlitop and %mlimid. In
particular this fixes a type error that happened when a module containing a
module of the same name was opened in the .dyp file.
__________________________________________________________________________

2008/08/27
Fixed a bug that happened when extracting strings from parser.extract_type.
__________________________________________________________________________

2008/07/02
Added the option --command (see manual 10.6).

dypgen now works on an auxiliary file .ml.temp when generating the .mli,
before saving the result in the .ml file.

Changed the type name parser to parser_pilot and the value name parser to
pp; this allows camlp4 to parse the generated .ml file. The function
update_parser is renamed update_pp and the field parser of the record
dypgen_toolbox is renamed parser_pilot.
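The 2008/12/30 entry states that the rule_options list [] behaves like the
bool true of this 2008/09/25 rule type, and [No_layout_inside] like false.
A minimal migration sketch follows, assuming local copies of the types
rather than the real ones from module Dyp. The regexp type is cut down to
the single constructor RE_Name seen in the 2012/03/14 entry, and the names
old_rule, upgrade_rule, plus_rule and plus_rule_nl are hypothetical.

```ocaml
(* Local copies of the types from the 2008/12/30 entry. [regexp] is
   reduced to RE_Name (seen in the 2012/03/14 entry); the real type has
   more constructors. *)
type 'a nt_prio =
  | No_priority
  | Eq_priority of 'a
  | Less_priority of 'a
  | Lesseq_priority of 'a
  | Greater_priority of 'a
  | Greatereq_priority of 'a

type regexp = RE_Name of string

type symb =
  | Ter of string
  | Ter_NL of string
  | Non_ter of string * string nt_prio
  | Non_ter_NL of string * string nt_prio
  | Regexp of regexp
  | Regexp_NL of regexp

type rule_options = No_layout_inside | No_layout_follows

(* Hypothetical name for the 2008/09/25 rule shape: the final bool says
   whether layout characters may be matched inside the reduced input.
   The newer symb type is reused here for brevity. *)
type old_rule = string * symb list * string * bool

type rule = string * symb list * string * rule_options list

(* [true] becomes [], [false] becomes [No_layout_inside]. *)
let upgrade_rule ((lhs, rhs, prio, layout_allowed) : old_rule) : rule =
  (lhs, rhs, prio, if layout_allowed then [] else [ No_layout_inside ])

(* expr: expr PLUS expr, reduced only if no layout was matched inside. *)
let plus_rule =
  upgrade_rule
    ( "expr",
      [ Non_ter ("expr", No_priority); Ter "PLUS"; Non_ter ("expr", No_priority) ],
      "default_priority",
      false )

(* Finer control from 2008/12/30: no layout right before PLUS, and none
   after the last symbol of the rhs. *)
let plus_rule_nl : rule =
  ( "expr",
    [ Non_ter ("expr", No_priority); Ter_NL "PLUS"; Non_ter ("expr", No_priority) ],
    "default_priority",
    [ No_layout_follows ] )
```

The second rule shows why the list form replaced the bool: restrictions can
now target a single position in the rhs instead of the whole reduced input.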
__________________________________________________________________________

2008/06/14
The operators *, + and ? have been added to dypgen syntax; see section 4.5 of
the manual for more information.

The generated .ml file defines the value parser. The record dyp has a new
field named parser. Because of the type of this field, the ocamlc type
checker complains in some cases, for instance when using a partial action
that returns a parser commands list (i.e. beginning with ...@{ ). In such
cases you have to use the option -rectypes of ocamlc and pass
--ocamlc "-rectypes" to dypgen.

The following functions have been added to the module Dyp of the library:

  val update_parser :
    ('token,'obj,'global_data,'local_data,'lexbuf) parser ->
    ('token,'obj,'global_data,'local_data,'lexbuf) dyp_action list ->
    ('token,'obj,'global_data,'local_data,'lexbuf) parser

  val parse :
    ('token,'obj,'global_data,'local_data,'lexbuf) parser ->
    string ->
    ?global_data:'global_data ->
    ?local_data:'local_data ->
    ?match_len:[`longest|`shortest] ->
    ?lexpos:('lexbuf -> (Lexing.position * Lexing.position)) ->
    ('lexbuf -> 'token) ->
    'lexbuf ->
    (('obj * string) list)

The function parse makes it possible to parse for any non terminal symbol of
the grammar and to parse recursively from the action code. The function
update_parser makes it possible to modify a parser with a list of parser
commands of type dyp_action. Both functions can be used inside and outside
the action code. See section 7 for more information about parser, parse and
update_parser.

The types of start entry points, global_data and local_data are now inferred
by Caml. You do not have to state them anymore. The keywords
%global_data_type and %local_data_type are therefore discarded.

The keyword %lexbuf_type is discarded. If you want to use a lexer other than
ocamllex, you have to define the function dypgen_lexbuf_position in the
header.
If you do not need the positions of the lexer, you may use the line:

  let dypgen_lexbuf_position = Dyp.dummy_lexbuf_position

Added the option --ocamlc string to dypgen. In order to know some types,
dypgen calls ocamlc -i (see 8.2). --ocamlc makes it possible to pass
command-line options to ocamlc. Example:

  dypgen --ocamlc "-I ../dypgen/dyplib -rectypes" parser.dyp

The types pliteral and lit are renamed psymbol and symb.

The constructor Will_shift of bool is replaced by Dont_shift.

The program pgen has been removed; dypgen now generates itself.

The type error messages are easier to understand in some cases.

Added the option --no-mli: dypgen does not generate a .mli file.

Added the variable dypgen_match_length (see section 10.7).

You can omit the action after the right-hand side of a rule (see 4.6).
__________________________________________________________________________

2008/02/04
Some improvements in the speed of extension of the grammar.
__________________________________________________________________________

2008/01/11
The constructor Relations has been renamed Relation for consistency with
dypgen syntax.

Some improvements in the speed of extension of the grammar.

Fixed a bug that could happen when extending the relation between
priorities.
__________________________________________________________________________

2008/01/04
The constructor Keep_grammar no longer has an argument (you just state
Keep_grammar instead of Keep_grammar true).

It is now possible to add new priorities and make the relation true between
priorities. See section 9 of the manual for more information.
__________________________________________________________________________

2007/12/26
The option --lexer no longer exists. If you want to use a lexer other than
ocamllex, use the keyword %lexbuf_type to assign the type you want to the
lexer buffer (see section 13.5 of the manual for more info).

It is now also possible to have relevant information about the positions
(i.e. with the functions symbol_start, symbol_start_pos, ...) when not using
ocamllex (see section 13.6 of the manual for more info).

Added the example program position_token_list. It is the same as the demo
program position, except that it first uses ocamllex to make a list of tokens
and then uses dypgen to parse this list of tokens.

New keyword %mlimid to add code to the .mli between the type token and the
declaration of the entry point functions.
__________________________________________________________________________

2007/11/09
It is no longer possible to remove rules at parsing time in this version.

The option --prio-pt no longer exists (a new method combines the benefits of
both former methods).

The commands to change the relation between priorities and to add new
priorities are no longer available (they were buggy in previous versions).

The constructors Remove_rules and Priority_data have been removed from the
type dyp_action.

The fields priority_data, add_nt and find_nt have been removed from the
record dyp (type dypgen_toolbox).

The following functions are no longer available:

  empty_priority_data
  is_relation
  insert_priority
  find_priority
  set_relation
  update_priority
  add_list_relations
  dyp.find_nt
  dyp.add_nt

The following constructor has been added to the type dyp_action:

  Bind_to_cons of (string * string) list

(the type dyp_action is the type of the list that user actions return along
with the resulting tree). When one returns:

  Bind_to_cons [("nt1","Cons1");("nt2","Cons2")]

the non terminal nt1 is bound to the constructor Cons1 and nt2 to Cons2.

The new rules introduced at parsing time are constructed differently: one now
simply uses the string of the non terminal and the string of the priority,
e.g.:
  ("expr", [Non_ter ("expr",No_priority); Ter t_PLUS;
            Non_ter ("expr",No_priority)], "default_priority")

for the rule:

  expr: expr PLUS expr

The types used to construct rules were previously:

  type token_name
  type non_ter
  type priority
  type non_terminal_priority =
    | No_priority
    | Eq_priority of priority
    | Less_priority of priority
    | Lesseq_priority of priority
    | Greater_priority of priority
    | Greatereq_priority of priority
  type 'a pliteral =
    | Ter of token_name
    | Non_ter of 'a * non_terminal_priority
  type lit = (non_ter * non_terminal_priority) pliteral
  type rule = non_ter * (lit list) * priority

And now they are:

  type token_name
  type 'a nt_prio =
    | No_priority
    | Eq_priority of 'a
    | Less_priority of 'a
    | Lesseq_priority of 'a
    | Greater_priority of 'a
    | Greatereq_priority of 'a
  type 'a pliteral =
    | Ter of token_name
    | Non_ter of 'a
  type lit = (string * (string nt_prio)) pliteral
  type rule = string * (lit list) * string

Added the commands infix, infixl and infixr to the example tinyML (see
test_infix.tiny).
__________________________________________________________________________

2007/11/08
Fixed a bug that raised an Assert_failure when using an ambiguous grammar
with an epsilon production and using Dyp.keep_all.

Partial actions must now be preceded by three dots `...' to prevent puzzling
errors when one forgets a `|' between productions.
__________________________________________________________________________

2007/07/29
global_data and local_data can be passed to the parser, since they are now
optional arguments of the parser. If 'main' is a start symbol of the grammar,
then we have the function:

  val main : ?global_data:gd_type -> ?local_data:ld_type ->
    (Lexing.lexbuf -> token) -> Lexing.lexbuf ->
    ((main_type) * Dyp.priority) list

where gd_type and ld_type represent the types of the global and local data.
Those have to be declared in the parser definition using:

  %global_data_type
  %local_data_type

and main_type represents the type of the value returned by the non terminal
'main'.

You still need to define global_data and local_data in the header of the
parser definition, but their types no longer have to be refs. They may have a
polymorphic type; for instance %global_data_type <'a list> is valid.

The record dyp no longer has any mutable field. To give instructions to the
parser, one now uses a list of values of type dyp_action instead of assigning
values to some fields of dyp:

  type ('obj,'gd,'ld) dyp_action =
    | Global_data of 'gd
    | Local_data of 'ld
    | Priority_data of priority_data
    | Add_rules of (rule * (('obj,'gd,'ld) dypgen_toolbox ->
        ('obj list -> 'obj * ('obj,'gd,'ld) dyp_action list))) list
    | Remove_rules of rule list
    | Will_shift of bool
    | Keep_grammar of bool
    | Next_state of out_channel
    | Next_grammar of out_channel

If the action returns such a list along with the returned AST, then the
character '@' is stated just before the left brace that begins the action
code. For instance, to add a new rule:

  @{ returned_AST, [Add_rules [(new_rule, new_action)]] }

To change global_data:

  @{ returned_AST, [Global_data new_global_data] }

and so on.

The user actions that are introduced at parsing time must now return a pair:

  (returned_AST, dyp_action_list)

instead of just returning an AST.

The keywords %constructor ... %for ... can now be used with tokens as well.
This is useful to save constructors, or to make the compilation of the
generated code less demanding by using fewer polymorphic variants, for
instance by using the same constructor for any token with no argument. A
single constructor of the type obj can be shared by non terminals and tokens
accepting one argument. In particular the constructors associated with tokens
with an argument can be used as constructors for new non terminals
introduced at parsing time.

The manual has been updated.
Fixed a bug that, in some cases, caused parse trees to be lost or
Syntax_error to be raised when it should not have been.

Fixed a bug that happened with more than one partial action in a right-hand
side.
__________________________________________________________________________

2007/07/26
New dypgen option --noemit-token-type: the type token is not emitted in the
.mli or .ml files; it must be provided by the user instead.

New keywords %mltop and %mlitop to add code at the top of the generated .ml
or .mli file.
__________________________________________________________________________

2007/07/22
Added the map:

  val ter_of_string : token_name String_ter_map.t

where

  module String_ter_map : Map.S with type key = string

It maps strings of terminal symbols to their corresponding token_name values.
It is defined in the module Dyp_symbols in the generated file. The following
list is also available in this module:

  val ter_string_list : (string * token_name) list

Each string of a terminal symbol is associated with its corresponding value
of type token_name.

The type of the merge functions changes again. It is now:

  (nt_type * global_data_t * local_data_t) list ->
    nt_type list * global_data_t * local_data_t

The merges between values for a given part of the input and a given non
terminal (and a given priority) all happen at once. As always, you can
return a list of ASTs if you want to keep a forest of them, but you must
choose only one global data and one local data.

Fixed a bug that lost some parse trees in some cases.
__________________________________________________________________________

2007/07/21
Fixed a bug in the dypgen lexer that happened when the string "\\" appeared
in the Caml code.
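A merge function with the list-based type of the 2007/07/22 entry receives
all competing candidates at once and returns a list of ASTs together with a
single global data and a single local data. The sketch below assumes a
hypothetical expr type for nt_type and hypothetical function names; it picks
the data of the first candidate, which is one arbitrary but valid choice,
not necessarily what dypgen's own generic merge functions do.

```ocaml
(* A small hypothetical AST type standing in for nt_type. *)
type expr = Num of int | Add of expr * expr

(* Merge function with the 2007/07/22 shape:
     (nt_type * global_data_t * local_data_t) list
       -> nt_type list * global_data_t * local_data_t
   It keeps every AST (a parse forest) but only one global data and one
   local data: those of the first candidate. *)
let keep_forest (candidates : ('nt * 'gd * 'ld) list) : 'nt list * 'gd * 'ld =
  match candidates with
  | [] -> invalid_arg "keep_forest: no candidates"
  | (_, gd, ld) :: _ -> (List.map (fun (ast, _, _) -> ast) candidates, gd, ld)

(* Variant that keeps a single AST instead of the whole forest. *)
let keep_first candidates =
  match candidates with
  | [] -> invalid_arg "keep_first: no candidates"
  | (ast, gd, ld) :: _ -> ([ ast ], gd, ld)

(* Two competing parses of "1+2+3" for the same part of the input. *)
let candidates =
  [ (Add (Add (Num 1, Num 2), Num 3), 0, "left");
    (Add (Num 1, Add (Num 2, Num 3)), 1, "right") ]
```

keep_forest keeps both associativities of the ambiguous sum, while
keep_first resolves the ambiguity by discarding all but one tree; in both
cases exactly one global data and one local data survive.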
__________________________________________________________________________

2007/07/16
The merge functions are now of type:

  nt_type list -> global_data_t -> local_data_t ->
  nt_type -> global_data_t -> local_data_t -> merge_result

with:

  type merge_result =
    | Merge of (nt_type list * global_data_t * local_data_t)
    | Dont_merge

The second, third, fifth and sixth arguments are the global_data and the
local_data associated with, respectively, the previous parsing and the
current parsing being merged. The global_data and local_data that are kept
must be returned in the result of the merge function, like:

  Dyp.Merge (tree_list, global_data, local_data)

The constructor Merge is defined in the module Dyp of the library.

If you do not want to merge the two parsings because the two global_data
values or the two local_data values are distinct, and you want to keep two
GLR parsings with distinct global_data or local_data, then the merge function
must just return Dyp.Dont_merge.

The section of the manual about merge functions has been updated.
__________________________________________________________________________

2007/07/15
The generic merge functions keep_older and keep_newer have been replaced by
keep_one, to make it clear that it is not possible to know which AST is
chosen by the default merge function.

The merge functions are now associated with the constructors of the type obj
instead of being associated with the non terminals.

The section of the manual about merge functions has been fixed and updated;
see it for more info.
__________________________________________________________________________

2007/07/13
dyp.keep_grammar is now of type bool. If it is set to true, the parser keeps
the current grammar after the current reduction. It has no effect when the
current action itself changes the grammar.

A field last_local_data has been added to the record dyp.
It is equal to the value of local_data at the time the last non terminal of the right-hand side of the current rule was reduced. To prevent local_data from being forgotten, use:
  dyp.local_data <- dyp.last_local_data;
This allows local_data to "climb up" a node in the parse tree, and thus extends its scope. Unlike local_data, last_local_data is immutable.
Manual updated, see sections 6.5 and 8.3.
The example tinyML now uses dyp.keep_grammar.
__________________________________________________________________________
2007/07/06
The type obj is now also available when using --pv-obj.
The value default_priority is defined in the module Dyp.
Added the field keep_grammar to the record dyp.
  dyp.keep_grammar <- `First
makes the parser keep, after the current reduction, the grammar that was used after reducing the first symbol of the right-hand side of the rule.
  dyp.keep_grammar <- `Last
makes the parser keep the current grammar after the current reduction.
This feature is not documented in the manual yet; see the example tinyML.
__________________________________________________________________________
2007/06/27
Fixed a bug that, in some cases where a merge was needed, caused some parse trees to be lost.
__________________________________________________________________________
2007/06/24
Bug fixed: the exception Bad_constructor was raised inappropriately when both of the following conditions were true:
1) a non terminal is declared with %constructor Cons %for nt or %non_terminal nt, and this non terminal is not used in the initial grammar but subsequently in extensions of the grammar;
2) the non terminal of the left-hand side of a rule that introduces extensions of the grammar can derive epsilon (the empty string).
__________________________________________________________________________
2007/06/23
New type for Bad_constructor:
  exception Bad_constructor of (string * string * string)
The 3rd string is the name of the constructor that has been used.
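The effect of dyp.local_data <- dyp.last_local_data (2007/07/13 entry) can be sketched in plain OCaml with a hypothetical two-field record. The record type dyp_fields and the function action_keeping_scope are stand-ins that only model the mutability described above (local_data mutable, last_local_data immutable); they are not the actual dypgen_toolbox type, and using a list of names in scope as local data is an assumption for illustration.

```ocaml
(* Hypothetical stand-in for the two fields of the record dyp discussed in
   the 2007/07/13 entry; the real record has many more fields. *)
type 'ld dyp_fields = {
  mutable local_data : 'ld; (* mutable: the action may overwrite it *)
  last_local_data : 'ld; (* immutable: value after the last rhs non terminal *)
}

(* Local data here is a list of names in scope (an assumption). *)
let action_keeping_scope (dyp : string list dyp_fields) : unit =
  (* Let the bindings made while parsing the right-hand side survive the
     reduction, i.e. "climb up" one node of the parse tree. *)
  dyp.local_data <- dyp.last_local_data

let () =
  let dyp = { local_data = [ "x" ]; last_local_data = [ "y"; "x" ] } in
  action_keeping_scope dyp;
  print_endline (String.concat ", " dyp.local_data)
```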
__________________________________________________________________________
2007/06/22
Faster generation of the .ml file.
Better encapsulation: the type dypgen_toolbox (i.e. the type of the record dyp) is now in the module Dyp.
The types of dyp.add_nt and dyp.find_nt change:
  dyp.add_nt : string -> string -> Dyp.non_ter
instead of
  dyp.add_nt : string -> Dyp.non_ter
and
  dyp.find_nt : string -> Dyp.non_ter * string
instead of
  dyp.find_nt : string -> Dyp.non_ter
See section 8.4 of the manual for more information.
Three new exceptions:
  exception Bad_constructor of (string * string)
This exception is raised when a user action returns a value with a bad constructor (one that does not correspond to the non terminal). This can only happen with rules defined dynamically. The 1st string is the rule and can be printed. The 2nd string is the name of the constructor that should have been used.
  exception Constructor_mismatch of (string * string)
This exception is raised when a non terminal is added using dyp.add_nt with a constructor cons but it already exists with another constructor. The 1st string is the name of the previous constructor, the 2nd string is the name of the constructor one tried to add.
  exception Undefined_nt of string
This exception is raised when the grammar contains a non terminal that appears in a right-hand side but never in a left-hand side (i.e. it is never defined). The string represents this non terminal. This exception is not raised if the option --no-undef-nt is used.
__________________________________________________________________________
2007/06/17
Fixed a bug that made dyp.add_nt return a wrong value when used on a new non terminal.
Added a few new fields to dyp for debugging purposes (see sections 10 and 13.1).
__________________________________________________________________________
2007/06/16
Added a keyword %non_terminal, which makes it possible to include non terminals in the initial grammar that are not part of any rule.
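The interplay between dyp.add_nt, dyp.find_nt and Constructor_mismatch described in the 2007/06/22 entry can be modelled with a small registry. This is a hypothetical sketch, not dypgen's implementation: the table, the counter next_id, the choice of returning the existing id when a name is re-added with the same constructor, and find_nt raising Not_found for unknown names are all assumptions.

```ocaml
(* Hypothetical model of the 2007/06/22 semantics of dyp.add_nt,
   dyp.find_nt and Constructor_mismatch; not dypgen's implementation. *)
type non_ter = int (* as in Dyp_tools: type non_ter = int *)

exception Constructor_mismatch of (string * string)

(* Maps a non terminal name to its id and its constructor name. *)
let table : (string, non_ter * string) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

(* add_nt name cons:
   - unknown name: allocate a fresh id and record the constructor;
   - known with the same constructor: return the existing id;
   - known with another constructor: raise
     Constructor_mismatch (previous constructor, attempted constructor). *)
let add_nt (name : string) (cons : string) : non_ter =
  match Hashtbl.find_opt table name with
  | Some (id, prev) ->
      if prev = cons then id else raise (Constructor_mismatch (prev, cons))
  | None ->
      let id = !next_id in
      incr next_id;
      Hashtbl.add table name (id, cons);
      id

(* find_nt name: the id and the constructor; raises Not_found if the
   non terminal is unknown (an assumption made for this sketch). *)
let find_nt (name : string) : non_ter * string = Hashtbl.find table name
```

Storing the constructor name next to the id is what makes the new find_nt return type, Dyp.non_ter * string, natural in this model.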
__________________________________________________________________________
2007/06/13
The generation of the .ml file takes less time.
__________________________________________________________________________
2007/06/11
Some type errors that used to be reported by Caml in the generated .ml file are now reported in the .dyp file and are less complex.
__________________________________________________________________________
2007/06/08
Fixed a bug that raised Not_found when a starting non terminal was not used.
Added the keyword %type, which behaves as in ocamlyacc.
__________________________________________________________________________
2007/06/07
Added nested rules:
  nt1:
    | symb1 ( symb2 symb3 { action1 } prio1
            | symb4 symb5 { action2 } prio2
            | symb6 symb7 { action3 } prio3 ) symb8 symb9 { action4 } prio4
    | ...
__________________________________________________________________________
2007/06/06
Fixed a bug that happened when using the option --prio-pt along with priorities and extensibility of the grammar.
Speed of the automaton generation improved.
__________________________________________________________________________
2007/05/31
Refactoring of the code (the parsing is now table driven instead of automaton driven).
Speed of the automaton generation improved.
Parsing speed improved.
Memory usage decreased.
The options --automaton LALR and --automaton LR1 are not available anymore.
New option --version.
Fixed a bug that caused parse trees to be lost in case of ambiguity.
By default, the priorities are now embedded into the automaton (this used to be the option --prio-aut); the option --prio-pt disables this.
__________________________________________________________________________
2007/05/23
Bug fixed: it caused Uncaught Exception: "Index out of bounds" at the initialization of the parser when using priorities.
__________________________________________________________________________
2007/5/20
Fixed a bug that replaced '$' by '_' in strings in the code of actions.
Fixed a bug (introduced with the previous optimisations) that made the parser perform the reductions in the wrong order in some situations.
insert_partially_ordered is now tail recursive.
Further optimisations when initializing the parser and when extending the grammar.
Installation of the library with support for ocamlfind.
__________________________________________________________________________
2007/05/18
The library dyp does not use the -pack option anymore.
The module Dyp_tools does not exist anymore. The values and types contained in the module Dyp_tools are now accessible in the module Dyp of the library dyp.cm[x]a. This module is not open by default.
The values of the non terminals are now encapsulated in a module Dyp_symbols, i.e. if you have a non terminal `expr' then you use Dyp_symbols.expr instead of expr to build new rules. The token names are also encapsulated in this module, and they now begin with t_ instead of token_.
The priorities and the initial priority_data are encapsulated in the module Dyp_priority_data.
It is now possible to assign a common constructor to several non terminals with the directive
  %constructor Cons for nt1 nt2 ...
Some slight optimisations when initializing the parser and when extending the grammar.
There is a new section in the manual about name conflicts and the accessible modules (section 11).
__________________________________________________________________________
2007/5/13
Bug fixed: the parser was not able to handle a grammar where the right-hand side of the rule of the entry point ends with a non terminal, like:
  statementsx:
    | statement_aster statements_terminator
  statements_terminator:
    | ENDMARKER
Bug fixed: the interface was incorrect when a start symbol returned a tuple.
Cyclic grammars are now detected and cause an error when the parser is used (but not when the parser is generated).
Some of the options have been renamed; run dypgen --help for more details.
merge and merge_nt (where nt is the name of a non terminal) have been renamed dyp_merge and dyp_merge_nt.
The following values and types are now encapsulated in a module (which is currently already open; this may change):
  module Dyp_tools :
  sig
    val dypgen_verbose : int ref
    type token_name = int
    type non_ter = int
    type 'a pliteral =
      | Ter of token_name
      | Non_ter of 'a
    type priority
    val default_priority : priority
    type non_terminal_priority =
      | No_priority
      | Eq_priority of priority
      | Less_priority of priority
      | Lesseq_priority of priority
      | Greater_priority of priority
      | Greatereq_priority of priority
    type priority_data
    val empty_priority_data : priority_data
    val is_relation : priority_data -> priority -> priority -> bool
    val insert_priority : priority_data -> string -> (priority_data * priority)
    val find_priority : priority_data -> string -> priority
    val set_relation : priority_data -> bool -> priority -> priority -> priority_data
    val update_priority : priority_data -> (priority * priority * bool) list -> priority_data
    val add_list_relations : priority_data -> (priority list) -> priority_data
    type lit = (int * non_terminal_priority) pliteral
    type rule = non_ter * (lit list) * priority
    exception Giveup
    exception Syntax_error
    val keep_all : 'a list -> 'a -> 'a list
    val keep_oldest : 'a list -> 'a -> 'a list
    val keep_newest : 'a list -> 'a -> 'a list
  end
__________________________________________________________________________
2007/5/7
1) Added the option -symbol-with-variants, which makes it possible to construct the symbols with polymorphic variants instead of constructors. This is useful if you reach the maximum number of non-constant constructors.
2) Parsers generated by dypgen are now re-entrant. This affects the names of some variables (e.g. dyp.global_data instead of data).
The record dyp is now used in actions: it is an argument of the action, and it has type dypgen_toolbox:
  type ('obj,'data,'local_data) dypgen_toolbox = {
    mutable global_data : 'data;
    mutable local_data : 'local_data;
    mutable priority_data : priority_data;
    mutable add_rules : (rule * (('obj,'data,'local_data) dypgen_toolbox -> 'obj list -> 'obj)) list;
    mutable remove_rules : rule list;
    mutable will_shift : bool;
    symbol_start : unit -> int;
    symbol_start_pos : unit -> Lexing.position;
    symbol_end : unit -> int;
    symbol_end_pos : unit -> Lexing.position;
    rhs_start : int -> int;
    rhs_start_pos : int -> Lexing.position;
    rhs_end : int -> int;
    rhs_end_pos : int -> Lexing.position;
    add_nt : string -> non_ter;
    find_nt : string -> non_ter }
For example, when you want to add rules, you no longer write:
  add_rules := your-rule-list
but:
  dyp.add_rules <- your-rule-list
To prevent a shift, you write:
  dyp.will_shift <- false
and so on.
The variable that sets the initial value of the local data is still a ref named local_data. For the global data, it is now a ref named global_data instead of data.
__________________________________________________________________________
2007/5/2
Fixed a bug that prevented appending code to the .mli with %mli.
When one changes the refs data and local_data before calling the parser, their new values are now taken into account as initial values.
__________________________________________________________________________
2007/4/1
Fixed a bug that happened with two or more partial actions in the same rule.
__________________________________________________________________________
2007/3/18
Added a script to generate documentation of the grammar (by Pierre Hyvernat).
__________________________________________________________________________
2007/3/16
Added let rec to the example language tinyML.
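The 2007/5/7 change from assigning references to mutating fields of dyp can be illustrated with a cut-down, hypothetical record that keeps only three of the mutable fields listed above (global_data, local_data and will_shift). The type toolbox, the action sum_action and its policy of refusing the shift on a negative value are invented for illustration; a real action receives the full dypgen_toolbox record from the generated parser.

```ocaml
(* Hypothetical cut-down version of the 2007/5/7 dypgen_toolbox record,
   keeping only three of its mutable fields. *)
type ('data, 'local_data) toolbox = {
  mutable global_data : 'data;
  mutable local_data : 'local_data;
  mutable will_shift : bool;
}

(* A hypothetical action: it sums its arguments, counts reductions in
   global_data, and asks the parser not to shift after this reduction
   when the result is negative. *)
let sum_action (dyp : (int, unit) toolbox) (args : int list) : int =
  let v = List.fold_left ( + ) 0 args in
  dyp.global_data <- dyp.global_data + 1;
  if v < 0 then dyp.will_shift <- false;
  v

let () =
  let dyp = { global_data = 0; local_data = (); will_shift = true } in
  let v = sum_action dyp [ 1; 2; -6 ] in
  Printf.printf "value=%d reductions=%d will_shift=%b\n" v dyp.global_data
    dyp.will_shift
```

Because the record is passed to each action rather than held in global refs, two parsers can run without sharing state, which is the re-entrancy mentioned in the same entry.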
__________________________________________________________________________
2007/2/26
Minor changes in the makefiles.
__________________________________________________________________________
2007/2/13
Fixed a bug that caused the following error:
  The constructor Obj_structure expects 1 argument(s), but is here applied to 0 argument(s)
__________________________________________________________________________
2007/2/10
Several bugs fixed in pattern matching for symbols.
Improved location accuracy of syntax errors in .dyp files.
__________________________________________________________________________
2007/2/9
Added pattern matching for symbols in right-hand sides of rules. In particular, this makes guarded reductions possible and makes it possible to bind names to the arguments of actions.
__________________________________________________________________________
2007/2/8
Added partial actions: actions within the right-hand side of a rule.
Added the priority sets (=p), (>p) and (>=p), to be used after a non terminal in the right-hand side of a rule.
Fixed a bug in the generation of the LR(1) and LALR(1) automata that prevented the parser from handling grammars with rules like:
  a:
    | (empty side)
    | b T
  b: a U
Updated the manual, with more details on local_data.
__________________________________________________________________________
2007/2/7
Syntax errors in .dyp files are now located by dypgen with the line and character number.
Using a token that is not declared is an error.
If an undeclared priority is used, dypgen emits a warning.
__________________________________________________________________________
2007/2/7
The following constructors are renamed:
  To_priority becomes Less_priority
  Toeq_priority becomes Lesseq_priority
__________________________________________________________________________
2007/2/6
The interface of the action code has been significantly simplified.
There is no difference between a dynamic action and a classic action anymore.
The keywords %dynamic and %full have been removed.
Access to and changes of data, priority data and the grammar are now performed through the assignment of references.
datadyn has been renamed local_data.
Adding rules as with %dynamic(nt) in the previous versions is not possible anymore. That method allowed adding actions that returned values of the same type as the one returned by the non terminal nt instead of values of type obj. Now any action introduced at runtime must return a value of type obj. An action added dynamically is now just: obj list -> obj.
Priorities with sets have been removed.
Priority functions have been removed. Now a rule returns a constant priority.
The exception Giveup_reduce is renamed Giveup.
The exception Giveup_shiftreduce has been removed.
One can now bind several actions to a rule; the parser applies only the first one that does not raise Giveup and does not try the others.
The relations between priorities are now denoted '<' instead of ':'.
(p) is now ( obj -> obj list.
And the parser returns a list of pairs (parse-tree,priority) instead of just a list of parse trees.
The default automaton is now LR(0).
A new method that embeds the enforcement of priorities in the automaton can optionally be chosen.
__________________________________________________________________________
2007/1/31
Bug fixed: %relation p1:p2 would introduce p1:p1 and p2:p2 in addition to p1:p2, so (p2) and (=p2) would denote the same thing. This has been fixed.
__________________________________________________________________________
2007/1/23
Fixed an efficiency issue (many %token lines made the generation very slow).
One can now use an LR(0) automaton; this is selected with an option to dypgen.
__________________________________________________________________________
2007/1/15
The automaton of the parser is now built when the parser is actually used; it is no longer marshalled in the .ml file (this fixes a bug).
As a result, the .ml file is lighter, but the parser takes longer to start when it is used.
The license changes to CeCILL-B for all the files.
Simplification of the installation.
__________________________________________________________________________
2006/12/26
The merge warning now reports the position of the part of the input considered.
__________________________________________________________________________
2006/12/25
The functions symbol_start, ..., which tell which part of the input is reduced to a given non terminal, are now available.
When one uses a lexer other than ocamllex, one must use the option -lexer.
The type of the action code changes slightly.
__________________________________________________________________________
2006/12/19
Bug fixed: in some particular cases the parser would not shift when it had to.
Bug fixed: taking into account the lookahead token for the reduction by a dynamic rule prevented the parser from reducing in some particular cases when it had to. As a consequence, the lookahead token is no longer considered when deciding whether to reduce by a dynamic rule.
__________________________________________________________________________
2006/12/16
Introduction of the generic merge functions and of predefined merge functions.
The type of the specific merge functions changes and is now:
  val merge_nt : priority_data -> ('obj * priority) list -> ('obj * priority) -> ('obj * priority) list
The command line option -merge_warning makes the generated parser emit a warning on the standard output each time a merge happens.
__________________________________________________________________________
2006/12/15
By default the parser builds an LALR(1) automaton instead of an LR(1) one; the latter is still available by stating %LR1 in the parser definition.
__________________________________________________________________________
2006/12/13
Fixed a bug that prevented merging of values from happening.
One can now define a general merge function. It applies to any non terminal which has now its own merge function.
Added an example that yields a parse forest.
__________________________________________________________________________
2006/12/12
Renamed the flag verbose to dypgen_verbose.
Minor bug fixed.
__________________________________________________________________________
2006/11/29
Several entry points are now allowed. The parsing function is no longer dyp_parse; instead, it has the name of the corresponding non terminal, as with ocamlyacc.
The end-of-file token is no longer a special token and can be used in the rules, as with ocamlyacc.
__________________________________________________________________________
2006/10/26
  val dyp_parse : ('a -> token) -> 'a -> main_type list
instead of
  val dyp_parse : (Lexing.lexbuf -> token) -> Lexing.lexbuf -> main_type list
In addition to ocamllex, any other lexer can be used with dypgen.
__________________________________________________________________________
2006/10/24
Fixed bugs with merge functions.
Added a compact syntax for transitive relations.
Completed the user's manual.
__________________________________________________________________________
2006/10/20
First version.
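The 2006/10/26 signature ('a -> token) -> 'a -> main_type list abstracts over the lexer state, so the lexing buffer need not be a Lexing.lexbuf. The following hypothetical sketch mimics that shape with a hand-written parser for INT (PLUS INT)* EOF and a lexer that pops tokens from a list reference. The token type, main_type = int, the body of dyp_parse and the local Syntax_error exception are all stand-ins, not code generated by dypgen; the result is a list only to match the signature, since a GLR parser may return several ASTs.

```ocaml
type token = INT of int | PLUS | EOF

exception Syntax_error

(* Same shape as ('a -> token) -> 'a -> main_type list, with main_type = int.
   Accepts INT (PLUS INT)* EOF and returns the sum as the single result. *)
let dyp_parse (lexer : 'a -> token) (lexbuf : 'a) : int list =
  (* acc is the sum of the operands read so far *)
  let rec after_operand acc =
    match lexer lexbuf with
    | PLUS -> (
        match lexer lexbuf with
        | INT n -> after_operand (acc + n)
        | PLUS | EOF -> raise Syntax_error)
    | EOF -> [ acc ]
    | INT _ -> raise Syntax_error
  in
  match lexer lexbuf with
  | INT n -> after_operand n
  | PLUS | EOF -> raise Syntax_error

(* A lexer that is not built with ocamllex: its "lexbuf" is a token list ref. *)
let list_lexer (buf : token list ref) : token =
  match !buf with
  | [] -> EOF
  | t :: rest ->
      buf := rest;
      t

let () =
  List.iter (Printf.printf "%d\n")
    (dyp_parse list_lexer (ref [ INT 1; PLUS; INT 2 ]))
```

Passing Lexing.from_string "..." together with an ocamllex-generated function would instantiate 'a with Lexing.lexbuf, which recovers the older signature.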