this post was submitted on 29 Jul 2024

So, I'm basically trying to parse a string literal with nom. This is the code I've come up with:

use nom::{
    bytes::complete::{tag, take_until},
    error::{Error, ErrorKind},
    sequence::delimited,
    IResult,
};

/// Parses string literals.
fn parse_literal<'a>(input: &'a str) -> IResult<&'a str, &'a str> {
    // The first character is the delimiter; only single and double quotes are valid.
    // Anything else (or empty input) is reported as an error instead of panicking.
    let delimiter = match input.chars().next() {
        Some('\'') => "'",
        Some('"') => "\"",
        _ => return Err(nom::Err::Error(Error::new(input, ErrorKind::Verify))),
    };

    // Opening quote, everything up to the next quote, closing quote.
    delimited(tag(delimiter), take_until(delimiter), tag(delimiter))(input)
}

#[cfg(test)]
mod literal_tests {
    use super::*;
    use rstest::rstest;

    #[rstest]
    #[case(r#""foo""#, "foo")]
    #[case(r#""foo bar""#, "foo bar")]
    #[case(r#""foo \" bar""#, r#"foo " bar"#)]
    fn test_dquotes(#[case] input: &str, #[case] expected_output: &str) {
        let result = parse_literal(input);
        assert_eq!(result, Ok(("", expected_output)));
    }

    #[rstest]
    #[case("'foo'", "foo")]
    #[case("'foo bar'", "foo bar")]
    #[case(r#"'foo \' bar'"#, "foo ' bar")]
    fn test_squotes(#[case] input: &str, #[case] expected_output: &str) {
        let result = parse_literal(input);
        assert_eq!(result, Ok(("", expected_output)));
    }

    #[rstest]
    #[case(r#""foo'"#, "foo'")]
    #[case(r#"'foo""#, r#"foo""#)]
    fn test_errs(#[case] input: &str, #[case] expected_err_input: &str) {
        let result = parse_literal(input);
        assert_eq!(
            result,
            Err(nom::Err::Error(nom::error::Error::new(
                expected_err_input,
                nom::error::ErrorKind::TakeUntil
            ))),
        );
    }
}

Note: the tests use the rstest crate for parameterized cases.

It looks a little complex, but it actually isn't. The parser function is parse_literal, and the tests are split into double-quote, single-quote, and error cases.

When you run the tests, you will see that the first and second cases pass for both single and double quotes. The problem is the third case of each: #[case(r#""foo \" bar""#, r#"foo " bar"#)] in test_dquotes and #[case(r#"'foo \' bar'"#, "foo ' bar")] in test_squotes.

Ideally, if a string literal is delimited by single quotes and its content contains a single quote, that quote can be escaped with a backslash. The same goes for double quotes. In pseudocode:

"foo ' bar" // is ok
"foo \" bar" // is ok
"foo " bar" // is err
'foo " bar' // is ok
'foo \' bar' // is ok
'foo ' bar' // is err

Currently, the code uses take_until to consume characters up to the next delimiter. Let's say the input is guaranteed to contain only the string literal; then, in the first and second test cases, the next delimiter is the closing quote at the end of the input, so those cases pass.

Of course, this fails for the third case of each test: the escaped quote is the same character as the delimiter, so take_until stops there, the closing tag matches the escaped quote, and the rest of the literal is returned as remaining input.

This is only for research purposes, so a fully featured answer isn't necessary; a pointer in the right direction is appreciated as well.

Thanks in advance.

Reply from [email protected] (1 point, 4 months ago):

What I do to parse strings (pseudo code since I'm on mobile, don't copy-paste):

delimited(
    char('"'),
    many0(alt((
        any_character_except_quote_or_slash,
        pair(char('\\'), escaped_char),
    ))),
    char('"'),
)

Here any_character_except_quote_or_slash and escaped_char are defined elsewhere; the rest of the parsers come from nom.

You may want to wrap pair in a map, so that both alt branches return the same type, and many0 in recognize, to get the matched slice back instead of a Vec.