this post was submitted on 08 Nov 2024

Rust Programming

fn get_links(link_nodes: Select) -> Option<String> {

        let mut rel_permalink: Option<String> = for node in link_nodes {
            link = String::from(node.value().attr("href")?);

            return Some(link);
        };

        Some(rel_permalink)
    }

This is what I'm trying to do, and I've been stuck on this code for an hour; I simply don't know how to put this function together. Essentially I would like to take some link_nodes and just return the link String, but I'm stuck on the use of Option with the ? operator. Perhaps trying to write it with match would clear things up(?)

Also, I come from JavaScript, in which expressions do not have their own scope, so I'm having trouble understanding how to get a variable out of a for loop. Should I initialize the rel_permalink variable as the for loop result?

These are the errors I get:

error[E0308]: mismatched types
  --> src/main.rs:55:49
   |
55 |           let mut rel_permalink: Option<String> = for node in link_nodes {
   |  _________________________________________________^
56 | |             link = String::from(node.value().attr("href")?);
57 | |
58 | |             return Some(link);
59 | |         };
   | |_________^ expected `Option<String>`, found `()`
   |
   = note:   expected enum `Option<String>`
           found unit type `()`
note: the function expects a value to always be returned, but loops might run zero times
  --> src/main.rs:55:49
   |
55 |         let mut rel_permalink: Option<String> = for node in link_nodes {
   |                                                 ^^^^^^^^^^^^^^^^^^^^^^ this might have zero elements to iterate on
56 |             link = String::from(node.value().attr("href")?);
   |                                                          - if the loop doesn't execute, this value would never get returned
57 |
58 |             return Some(link);
   |             ----------------- if the loop doesn't execute, this value would never get returned
   = help: return a value for the case when the loop has zero elements to iterate on, or consider changing the return type to account for that possibility
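
A minimal sketch of what the help message points at, assuming a simplified stand-in for the scraped nodes: each node is modeled as an `Option<&str>` holding its `href` (the real `scraper` types are not used, and `first_link` and `first_link_iter` are hypothetical names). A `for` loop always evaluates to `()`, so it cannot be assigned to an `Option<String>`; the value has to leave the loop through `return`, with a fallback after the loop for the case where it never runs.

```rust
// Stand-in for the scraped nodes: each item is the node's `href`, if it has one.
// With scraper, the equivalent value would come from `node.value().attr("href")`.
fn first_link(hrefs: &[Option<&str>]) -> Option<String> {
    for href in hrefs {
        // Skip nodes without an href instead of using `?`, which would
        // return `None` from the whole function at the first such node.
        if let Some(link) = *href {
            return Some(link.to_string());
        }
    }
    // Reached when the loop runs zero times or no node had an href.
    None
}

// The same logic written with an iterator adapter instead of a loop.
fn first_link_iter(hrefs: &[Option<&str>]) -> Option<String> {
    hrefs.iter().find_map(|href| href.map(|link| link.to_string()))
}
```

Both functions return only the first link found; collecting every link needs a `Vec<String>` return type instead.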
[–] [email protected] 1 points 2 weeks ago (10 children)

Hi! First of all thank you so much for the detailed explanation!

What I'm trying to do is scrape some content.

Yes, I'm trying to return all the links (maybe in a vector). I have a list of elements (Select, which is actually scraper::html::Select<'_, '_>) that essentially contains selections of HTML nodes. I would like to take each of them, extract the actual link value (&str), convert it into a String, and push it first into a vector containing all the links and then into an instance of a struct which will later hold several pieces of data about the scraped page.

I was trying to use a for loop because that was the first structure that came to my mind. I'm finding it hard to wrap my head around ownership and error handling in Rust. Using the if let construct could be a good idea, and I hadn't considered using break!

I also managed to build the "match version" of what I was trying to achieve:

fn get_links(link_nodes: scraper::html::Select<'_, '_>) -> Vec<String> {
    let mut links = vec![];

    for node in link_nodes {
        match node.value().attr("href") {
            Some(link) => {
                links.push(link.to_string());
            }
            None => (),
        }
    }

    dbg!(&links);
    links
}

I didn't understand that I had to return the same type from each of the Option match arms; I thought enum variants could have different types. So if the Some arm returns (), the None arm has to do the same.
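
When one arm has nothing to do, `if let` expresses the same thing as a `match` whose other arm is `()`. A minimal sketch, modeling each node as an `Option<&str>` rather than using the real `scraper` types (`collect_links` is a hypothetical name):

```rust
// Stand-in input: one `Option<&str>` per node, as `attr("href")` would return.
fn collect_links(hrefs: &[Option<&str>]) -> Vec<String> {
    let mut links = Vec::new();
    for href in hrefs {
        // Equivalent to `match *href { Some(link) => { ... } None => () }`.
        if let Some(link) = *href {
            links.push(link.to_string());
        }
    }
    links
}
```

`links.push(...)` followed by a semicolon evaluates to `()`, which is why a `None => ()` arm type-checks next to it in the `match` form.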

If I try with a simpler example I still cannot understand why I cannot do something like:

enum OperativeSystem {
            Linux,
            Windows,
            Mac,
            Unrecognised,
        }

        let absent_os = OperativeSystem::Unrecognised;
        find_os(absent_os);

        fn find_os(os: OperativeSystem) -> String {
            match os {
                debian => {
                    let answer = "This pc uses Linux";
                    answer.to_string()
                }
                windows10home => {
                    let answer = "This pc uses Windows, unlucky you!";
                    answer.to_string()
                }
                ios15 => {
                    let answer = "This pc uses Mac, I'm really sorry!";
                    answer.to_string()
                }
                _ => {
                    let is_unrecognised = true;
                    is_unrecognised
                }
            }
        }
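
A sketch of one way to make the example above compile, based on two observations. A lowercase name such as `debian` in a pattern is a new variable binding that matches any value, so the variants have to be written as `OperativeSystem::Linux` and so on. And a `match` is an expression, so every arm must evaluate to the function's return type. Returning `Option<String>` for the unrecognised case is an assumption about the intent here.

```rust
enum OperativeSystem {
    Linux,
    Windows,
    Mac,
    Unrecognised,
}

// Every arm evaluates to the same type, `Option<String>`.
fn find_os(os: OperativeSystem) -> Option<String> {
    match os {
        // Patterns name the enum variants explicitly.
        OperativeSystem::Linux => Some("This pc uses Linux".to_string()),
        OperativeSystem::Windows => Some("This pc uses Windows, unlucky you!".to_string()),
        OperativeSystem::Mac => Some("This pc uses Mac, I'm really sorry!".to_string()),
        // "No answer" is expressed as `None` rather than as a `bool`.
        OperativeSystem::Unrecognised => None,
    }
}
```

Calling `find_os(OperativeSystem::Unrecognised)` then gives `None`, and the caller decides what to do with it.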

match is much more intuitive for a beginner; there's a lot of stuff that goes on under the hood with ?
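
For comparison, a minimal sketch of what `?` does on an `Option`, written out by hand with `match`. The two functions behave the same; `href_len_question` and `href_len_match` are hypothetical names, and the input is a plain `Option<&str>` standing in for the result of `attr("href")`.

```rust
fn href_len_question(href: Option<&str>) -> Option<usize> {
    // On `None`, `?` returns `None` from the function immediately.
    let link = href?;
    Some(link.len())
}

fn href_len_match(href: Option<&str>) -> Option<usize> {
    // The hand-written equivalent of `href?`.
    let link = match href {
        Some(link) => link,
        None => return None,
    };
    Some(link.len())
}
```

The `return None` arm never produces a value for `link`, so it does not conflict with the `&str` produced by the `Some` arm.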

[–] AsudoxDev 1 points 2 weeks ago* (last edited 1 week ago) (2 children)

Here's what you are trying to do, as a one-liner:

fn get_links(mut link_nodes: Select) -> Vec<String> {
    link_nodes.retain(|node| node.value().attr("href").is_some()).into_iter().fold(Vec::new(), |links, node| links.push(link.value().attr("href").unwrap().to_string()))
}

edit: shorter and updated version:

fn get_links(link_nodes: Select) -> Vec<String> {
    link_nodes.into_iter().filter_map(|node| node.value().attr("href").map(|href| href.to_string())).collect()
}

In the first version, retain is meant to get rid of all the nodes which don't have an href attribute, and the fold after it is meant to extract the href from each remaining node and push it into the vector.

It might or might not work; I've written this from memory and I don't know exactly what that Select is.

I also hope you begin reading The Book without half assing it.

[–] [email protected] 2 points 2 weeks ago (1 children)

You should use filter_map instead of retain followed by unwrap, and you don't need a fold to build a Vec from an iterator; you can just use collect for that at the end.
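
To make the comparison concrete, here is a sketch of both shapes on a stand-in iterator of `Option<&str>` values (one per node, modeling `node.value().attr("href")`; the function names are hypothetical). The first is roughly what a working filter-then-fold would look like, using `Iterator::filter` since iterators have no `retain`; the second is the `filter_map` plus `collect` form.

```rust
// Filter out the `None`s, then fold the rest into a Vec by hand.
fn links_with_fold<'a>(hrefs: impl Iterator<Item = Option<&'a str>>) -> Vec<String> {
    hrefs
        .filter(|href| href.is_some())
        .fold(Vec::new(), |mut links, href| {
            // The unwrap cannot panic: the filter above removed every `None`.
            links.push(href.unwrap().to_string());
            // The closure must hand the accumulator back to `fold`.
            links
        })
}

// `filter_map` drops the `None`s and unwraps the `Some`s in one step;
// `collect` builds the Vec.
fn links_with_filter_map<'a>(hrefs: impl Iterator<Item = Option<&'a str>>) -> Vec<String> {
    hrefs
        .filter_map(|href| href.map(|link| link.to_string()))
        .collect()
}
```

Both return the same vector for the same input; the second avoids the `unwrap` entirely.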

[–] AsudoxDev 1 points 1 week ago

Here, it definitely is shorter. I'll keep filter_map in mind, thanks:

fn get_links(link_nodes: Select) -> Vec<String> {
    link_nodes.into_iter().filter_map(|node| node.value().attr("href").map(|href| href.to_string())).collect()
}