Help with backreferences in Ragel regular expressions

Hardik Parikh

about 1 year ago

Hi

I want to use a backreference in a Ragel regular expression to build a parser for my language. The requirement is to match a beginning tag against an ending tag, where the tag can be any arbitrary string, e.g.

1. /test/ this is sample /test/

or it can be

2. /hello/ this is test /hello/

and the following should be considered invalid:

3. /abc/ this is invalid /xyz/

So, as in the examples above, the tag can be any arbitrary string, but the beginning and ending tag strings must be the same. To write a regular expression for this in Ragel, I would like to use a backreference as in JavaScript. Can somebody suggest how to achieve this in Ragel? Any pointer would be a great help.

Thanks,
Hardik Parikh


mitchell

12 months ago

I would try capturing the first /.../ and starting the enqueueing process (@enqueue). Wrap and capture the delimiting /.../ in a 'when' statement that compares it with the first capture. Outside of and after the 'when' statement, commit the queue (@commit).

Pseudocode:

'/' @{ capture start of tag } whatever+ '/' @{ capture end of tag and get substring } @enqueue (
  ... stuff ...
)* :>> (('/' @{ capture start of delimiting tag } whatever+ '/' @{ capture end of delimiting tag and get substring }) when { substrings match }) @commit;
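
The capture-then-compare idea in the pseudocode above can be sketched in plain C, outside of Ragel, to make the matching rule concrete. This is only an illustration under assumptions: `match_tagged_comment` is a hypothetical helper, not part of Ragel or ohcount, and it treats the first `/candidate/` that equals the opening tag as the closing delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Ragel or ohcount API): returns 1 if text has
 * the form "/tag/ ... /tag/" with the same non-empty tag at both ends,
 * and 0 otherwise. */
int match_tagged_comment(const char *text)
{
    assert(text != NULL);
    if (text[0] != '/')
        return 0;

    /* Capture the opening tag: everything between the first two slashes. */
    const char *tag = text + 1;
    const char *tag_end = strchr(tag, '/');
    if (tag_end == NULL || tag_end == tag)
        return 0;
    size_t tag_len = (size_t)(tag_end - tag);

    /* Scan the body for "/candidate/" and compare it with the opening tag. */
    const char *p = tag_end + 1;
    while ((p = strchr(p, '/')) != NULL) {
        const char *cand = p + 1;
        const char *cand_end = strchr(cand, '/');
        if (cand_end == NULL)
            return 0;                /* no terminating slash left */
        size_t cand_len = (size_t)(cand_end - cand);
        if (cand_len == tag_len && memcmp(cand, tag, tag_len) == 0)
            return 1;                /* the "substrings match" case */
        p = cand_end;                /* this slash may open the next candidate */
    }
    return 0;
}
```

Comparing lengths before comparing bytes keeps a tag such as `ab` from matching a longer tag such as `abc`.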

Hardik Parikh

12 months ago

Consider the following valid comment:

/abc/ Some description ……. /abc/

And the following invalid comment:

/abc/
         Some description …….
/xyz/

Following your suggestion, I tried the code below:

action mark_start {
  start = fpc + 1;
  // printf(" Mark start at %c\n", fc);
}
action tagstart {
  size_t len = fpc - start;
  s = calloc(len, sizeof(char));
  strncpy(s, start, len);
  s[len] = '\0';
  // printf("Start : %s\n", s);
}

action tagend {
  size_t len = fpc - start;
  e = calloc(len, sizeof(char));
  strncpy(e, start, len);
  e[len] = '\0';
  stringcmp = strncmp(e, s, sizeof(s));
  // printf("End : %s\n", e);
}

curltagcomment =
  '/' @mark_start (nonnewline - ws)* '/' @tagstart @enqueue @comment (
    newline %{ entity = INTERNAL_NL; } %curlccallback
    |
    ws
    |
    (nonnewline - ws) @comment
  )* :>> ('/' @mark_start (nonnewline - ws)* '/' @tagend when { stringcmp == 0 }) @commit @comment;

Now I can capture both the start and end substrings properly and do the match in the when condition, but the parser still reports a valid comment even for an invalid statement, like:

curl comment /abc/
curl comment          Some description …….
curl comment /xyz/

I have also tried removing @enqueue and @commit, and it still gives the same result.

Please let me know where I am going wrong, or whether there is a better way to implement this.


mitchell

12 months ago

I've had mixed success with the following (maybe it will help):

action mark_start {
  start = fpc + 1;
  //printf("Mark start at %c\n", fc);
}
action tagstart {
  size_t len = fpc - start;
  s = calloc(len + 1, sizeof(char));  // +1 leaves room for the terminating NUL
  strncpy(s, start, len);
  s[len] = '\0';
  printf("Start: %s\n", s);
}
action tagend {
  size_t len = fpc - start;
  e = calloc(len + 1, sizeof(char));  // +1 leaves room for the terminating NUL
  strncpy(e, start, len);
  e[len] = '\0';
  printf("End: %s\n", e);
}
tag_comment_end := '/' @commit @comment @{ start = NULL; e = NULL; } @{ fgoto tag_line; };
tag_comment =
  '/' @mark_start (nonnewline - ws)* '/' @tagstart @comment @enqueue (
    '/' @mark_start (nonnewline - ws)* '/' @tagend @{ if (e && s && strcmp(e, s) == 0) { fhold; fgoto tag_comment_end; } }
    |
    newline %{ entity = INTERNAL_NL; } %tag_ccallback
    |
    ws
    |
    ^space @comment
  )*;
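
Two C details are easy to get wrong in actions like tagstart and tagend: the copy needs `len + 1` bytes so the terminating NUL fits, and `sizeof` applied to a `char *` yields the pointer size rather than the string length, so a `strncmp` bounded by it only compares the first few bytes. A minimal sketch of both pieces follows; `copy_tag` and `tags_equal` are hypothetical helper names, not part of Ragel or ohcount.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the span [start, end) into a freshly allocated,
 * NUL-terminated string. The caller owns the result and must free() it. */
char *copy_tag(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && end >= start);
    size_t len = (size_t)(end - start);
    char *tag = malloc(len + 1);     /* +1 for the terminating NUL */
    if (tag == NULL)
        return NULL;
    memcpy(tag, start, len);
    tag[len] = '\0';
    return tag;
}

/* Hypothetical helper: full-string comparison of two captured tags.
 * strcmp compares up to the NUL, so tags of any length are handled. */
int tags_equal(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}
```

With a pointer-sized bound, two long tags that share their first bytes (for example `abcdefghijkl` and `abcdefghXXXX` where pointers are 8 bytes) would compare as equal; the full-string comparison rejects them.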

Hardik Parikh

12 months ago

Hi mitchell

In the code above, the definition of tag_comment_end refers to tag_line, but I could not find a definition for tag_line.
I also tried replacing it with tag_comment, but that throws a compilation error: Can not enter inside longest match construction as an entry point.
Can you please describe what tag_line should be here?

Thanks,
Hardik Parikh


mitchell

12 months ago

I've called the language 'tag'. If it was C, then the line would be c_line; Ruby would be ruby_line, etc.

I would pursue the solution given by Adrian on the Ragel mailing list though.


Hardik Parikh

12 months ago

I tried the suggested solution, but it does not produce any output when I run the ./ohcount --annotate command. It seems that fgoto is not working in my case, and I cannot work out what is going wrong.


Hardik Parikh

11 months ago

Hi

blockcomment = "|" @initbuf1 ([a-z]+ $ident1_collect tagcomment);

tagcomment =
  "#" @comment (
    newline %{ entity = INTERNAL_NL; } %curlccallback
    |
    ws
    |
    (nonnewline - ws) @comment
  )* :>> "#" @initbuf2 [a-z]+ $ident2_collect "|" when { index1 == index2 && strncmp(buf1, buf2, index1) == 0 } @comment;

In the code above, ident1_collect and ident2_collect are actions that collect the begin and end tag names into buf1 and buf2 respectively.
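
For reference, the collect-and-compare logic behind buf1/index1 and buf2/index2 might look roughly like the C sketch below. The actual action bodies are not shown in this thread, so everything here is an assumption: `TagBuf`, `TAG_MAX`, and the `tagbuf_*` functions are hypothetical names, and the fixed capacity is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_MAX 64  /* arbitrary capacity chosen for this sketch */

/* Hypothetical stand-in for a (buf, index) pair such as buf1/index1. */
typedef struct {
    char   data[TAG_MAX];
    size_t len;       /* number of characters collected so far */
    int    overflow;  /* set when a tag exceeds TAG_MAX */
} TagBuf;

/* What an init action would do: reset the buffer before a new tag. */
void tagbuf_init(TagBuf *b)
{
    assert(b != NULL);
    b->len = 0;
    b->overflow = 0;
}

/* What a per-character collect action would do with the current char. */
void tagbuf_collect(TagBuf *b, char c)
{
    assert(b != NULL);
    if (b->len < TAG_MAX)
        b->data[b->len++] = c;
    else
        b->overflow = 1;  /* never write past the end of the buffer */
}

/* The comparison from the when condition: same length and same bytes. */
int tagbuf_equal(const TagBuf *a, const TagBuf *b)
{
    assert(a != NULL && b != NULL);
    return !a->overflow && !b->overflow &&
           a->len == b->len &&
           memcmp(a->data, b->data, a->len) == 0;
}
```

Checking the lengths first mirrors the `index1 == index2` test, which is what keeps `ab` from matching `abc`.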

But when the tags do not match, it produces duplicated output and wrongly marks lines as comments. For example, for a given input file:

mylang comment |abc#
mylang comment Source file created
mylang code |abc#
mylang code Source file created
mylang code #ab|

When the tags match, it gives valid output for the input file, as below:

mylang comment |abc#
mylang comment Source file created
mylang comment #abc|

Can anybody suggest what is going wrong to cause the duplicated output, with lines wrongly parsed as comments, when the tags do not match?

Thanks,
Hardik


mitchell

11 months ago

That's what @enqueue and @commit are for.