Need help with regular expressions in the Ragel language

vjadhav

about 1 year ago

Hi

Our language has a concept called a tag comment. For example:

 |test#

       Description of code

 #test|

Here, test is treated as the tag of a multi-line comment, and the opening and closing markers must both contain exactly the same word, test.

The tag is not fixed. It can be anything, such as test, abc, or xyz, but the opening and closing tags must match for the block to be identified.



We have tried to implement this with the following Ragel expression:

tag_comment =
  '|' (nonnewline - ws)* '#' @comment (
    newline %{ entity = INTERNAL_NL; } %lang_ccallback
    |
    ws
    |
    (nonnewline - ws) @comment
  )* :>> ('#' (nonnewline - ws)* '|');

However, this expression also matches a block such as |test# some description #abc|, which we do not want. So we need something like a back-reference within the Ragel expression itself.
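To make the requirement concrete, here is a minimal sketch in plain C (the host language of the Ragel actions above) of what a back-reference would have to do: remember the opening tag and accept only a closer that carries the same tag. The function name tag_comment_length is hypothetical and is not part of Ragel or of the grammar above; it only illustrates the matching rule.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Ragel API): returns the length of the tag
 * comment that starts at text[0], or 0 if no well-formed comment starts
 * there. A comment is "|TAG#" body "#TAG|", where TAG contains no
 * whitespace and the closing TAG equals the opening TAG. */
size_t tag_comment_length(const char *text)
{
    if (text[0] != '|')
        return 0;

    /* Scan the opening tag: non-whitespace characters up to '#'. */
    const char *tag = text + 1;
    size_t tag_len = 0;
    while (tag[tag_len] != '\0' && tag[tag_len] != '#' &&
           !isspace((unsigned char)tag[tag_len]))
        tag_len++;
    if (tag[tag_len] != '#')
        return 0;

    /* Search the body for '#', then the same tag, then '|'. */
    const char *body = tag + tag_len + 1;
    for (const char *q = body; *q != '\0'; q++) {
        if (*q == '#' &&
            strncmp(q + 1, tag, tag_len) == 0 &&
            q[1 + tag_len] == '|')
            return (size_t)(q + 1 + tag_len + 1 - text);
    }
    return 0; /* no closer with a matching tag */
}
```

A closer with a different tag, such as #abc|, is simply treated as part of the body, so the scan continues until the matching #test| is found or the input ends.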

Also there is a concept called "Length-specified Comments", where the opening and closing comment symbols specify the number of characters in the comment, as shown below:

Here is some text |20# please#|&#@|ignore #20| surrounding a length-specified comment.

In the example above, if the comment inside the block is longer than 20 characters, it is treated as an invalid comment.

The second problem is that we want an expression for recursive brace matching in Ragel.

For example, {doc-next { { } } } should be valid, but {doc-next{ }}} should not be.

Is there a way to handle the above scenarios using regular expressions in Ragel?

Thanks & Regards, Vijay


mitchell

about 1 year ago

For length-specified comments, you can add @{} actions at the start and end points you are interested in, and use the 'when' statement to require that the difference between the start and end positions is less than 20.

e.g. (untested)

char *s, *e;

%%{

comment = ('|' @{ s = p; } stuff* '#' @{ e = p; }) when { e - s < 20 };

%%}
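The same start/end bookkeeping can be sketched in plain C, outside of Ragel, to show the check itself. This is an illustration under assumptions, not a tested Ragel solution: the function name is_length_comment is hypothetical, the declared length is read from the opener instead of being hard-coded as 20, and a body of exactly the declared length is accepted, following the question's statement that only a body longer than the declared length is invalid.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Ragel API): returns 1 if text starts with a
 * valid length-specified comment "|N#" body "#N|", else 0.
 * Assumption: the comment is valid when the body has at most N characters. */
int is_length_comment(const char *text)
{
    if (text[0] != '|' || !isdigit((unsigned char)text[1]))
        return 0;

    /* Parse the declared length N and remember where its digits are. */
    const char *digits = text + 1;
    size_t ndigits = 0;
    size_t declared = 0;
    while (isdigit((unsigned char)digits[ndigits])) {
        declared = declared * 10 + (size_t)(digits[ndigits] - '0');
        ndigits++;
    }
    if (digits[ndigits] != '#')
        return 0;

    /* s marks the start of the body; e scans for the closer "#N|". */
    const char *s = digits + ndigits + 1;
    for (const char *e = s; *e != '\0'; e++) {
        if (*e == '#' &&
            strncmp(e + 1, digits, ndigits) == 0 &&
            e[1 + ndigits] == '|')
            return (size_t)(e - s) <= declared;
    }
    return 0; /* no closer found */
}
```

For the example in the question, the body between |20# and #20| is exactly 20 characters long, so this sketch accepts it.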

For nested expressions, look at the d.rl ragel parser. It has nested '/+ +/' comments that should help you.
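Nesting cannot be expressed by a purely regular expression, so a common workaround is to keep a depth counter in host-language code. The plain C sketch below shows only that counting idea. It is not taken from d.rl, and the function name braces_balanced is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not a Ragel API): returns 1 if every '{' in text
 * has a matching '}' and no '}' appears before its '{', else 0. */
int braces_balanced(const char *text)
{
    int depth = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == '{') {
            depth++;            /* entering a nested block */
        } else if (text[i] == '}') {
            if (depth == 0)
                return 0;       /* closer without an opener */
            depth--;            /* leaving a nested block */
        }
    }
    return depth == 0;          /* all blocks must be closed */
}
```

With this check, {doc-next { { } } } is accepted and {doc-next{ }}} is rejected because of the extra closing brace. In a Ragel grammar, the increment and decrement would presumably live in actions attached to the brace characters.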