
Web scraping for images in one specific page #434

Open
candelavillalonga opened this issue Dec 1, 2024 · 0 comments

Comments

@candelavillalonga

Hi there!

I'm quite new but excited to be here :)

Though not exactly thrilled, because I've been trying all day to do something that looked very simple, and I keep getting errors, which brought me here.

I'd like to do something as SIMPLE as scraping the logos found under these three sections of this website, and downloading them into two different folders:

xpath_financio <- "//*[@id='h.71472b2c8f27c01b_62']"
xpath_apoyo1 <- "//*[@id='h.3d1935040932c0ea_10']"
xpath_apoyo2 <- "//*[@id='h.1fb167b9350294fd_75']"

I've had errors such as:

Error in tokenize(css) : Unexpected character '/' found at position 1

so I used a \ instead, which changed the error to:

Error in tokenize(css) : Unexpected character '@' found at position 4

I found a Reddit thread suggesting it's a CSS-selector-related problem, though I don't know how to tailor that advice to my specific code (which I paste below).
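For reference, a minimal sketch of what appears to trigger this error, using a tiny in-memory page (hypothetical markup, not the real site): rvest's `html_nodes()` treats an unnamed string as a CSS selector, so an XPath expression has to be passed via the named `xpath` argument.

```r
library(rvest)

# A tiny in-memory page standing in for one of the sections (hypothetical markup)
pagina <- minimal_html('<div id="h.71472b2c8f27c01b_62"><img src="logo.png"></div>')

# This reproduces the error: the unnamed string is parsed as a CSS selector,
# and a CSS selector cannot start with '/':
# html_nodes(pagina, "//*[@id='h.71472b2c8f27c01b_62']")

# Naming the argument makes rvest evaluate it as XPath instead
nodos <- html_nodes(pagina, xpath = "//*[@id='h.71472b2c8f27c01b_62']")
html_attr(html_nodes(nodos, "img"), "src")  # "logo.png"
```

The same `xpath =` naming applies anywhere `html_nodes()` (or the newer `html_elements()`) is called with an XPath string.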

I really don't know what else to try!

Would someone be so kind as to help me with my code, please? (:

Thanks a lot!

The code in question:

library(rvest)
library(magick)

# URL of the web page
url <- "https://www.aymurai.info/inicio"

# XPaths of the target sections
xpath_financiero <- "//*[@id='h.71472b2c8f27c01b_62']"
xpath_apoyo1 <- "//*[@id='h.3d1935040932c0ea_10']"
xpath_apoyo2 <- "//*[@id='h.1fb167b9350294fd_75']"

# Function to extract the images in a section and save them to a folder
extraer_y_guardar <- function(xpath, carpeta) {

  # Read the web page
  pagina <- read_html(url)

  # Get the nodes matching the XPath (named argument, otherwise rvest parses it as CSS)
  nodos <- html_nodes(pagina, xpath = xpath)

  # Collect the <img> elements inside those sections
  imagenes <- html_nodes(nodos, "img")

  # Create the folder if it does not exist
  dir.create(carpeta, showWarnings = FALSE)

  # Iterate over the images and save each one
  for (i in seq_along(imagenes)) {
    # Resolve relative src attributes against the page URL
    imagen_url <- url_absolute(html_attr(imagenes[i], "src"), url)
    imagen <- image_read(imagen_url)
    image_write(imagen, paste0(carpeta, "/", i, ".jpg"))
  }
}

# Call the function for each XPath and folder
extraer_y_guardar(xpath_financiero, "FINANCIO")
extraer_y_guardar(xpath_apoyo1, "APOYO")
extraer_y_guardar(xpath_apoyo2, "APOYO")
