Webscrapping for images in one specific page #434

candelavillalonga · 2024-12-01T01:07:23Z

Hi there!

I'm quite new but excited to be here :)

Though not so much as in thrilled cause I've been trying the whole day to do something that looked very simple and I keep on getting errors which..made me come here.

I'd like to do something as SIMPLE as scrapping from this website the logos that are found under this three sections, and downloading them into two different folders:

xpath_financio <- "//[@id='h.71472b2c8f27c01b_62']"
xpath_apoyo1 <- "//[@id='h.3d1935040932c0ea_10']"
xpath_apoyo2 <- "//*[@id='h.1fb167b9350294fd_75']"

I've had errors such as:

Error in tokenize(css) : Unexpected character '/' found at position 1 --> used a \ instead

Then it changed the error to

Error in tokenize(css) : Unexpected character '@' found at position 4

Found that maybe it is what they are saying here in Reddit that it's a CSS-problem related. Though I don't know how to taylor it to my specific code (which I paste down here)

I really don't know !

Would someone be so kind in helping me and my code, please? (:

Thanks a lot!

da code in question:

library(rvest)
library(magick)

URL de la página web

url <- "https://www.aymurai.info/inicio"

XPath de los elementos

xpath_financiero <- "\[@id='h.71472b2c8f27c01b_62']"
xpath_apoyo1 <- "\[@id='h.3d1935040932c0ea_10']"
xpath_apoyo2 <- "\*[@id='h.1fb167b9350294fd_75']"

Función para extraer imágenes y guardarlas

extraer_y_guardar <- function(xpath, carpeta) {

Leer la página web

pagina <- read_html(url)

Obtener los nodos con el XPath

nodos <- html_nodes(pagina, xpath)

Crear la carpeta si no existe

dir.create(carpeta, showWarnings = FALSE)

Iterar sobre los nodos y guardar las imágenes

for (i in seq_along(nodos)) {
imagen_url <- html_attr(nodos[i], "src")
imagen <- image_read(imagen_url)
image_write(imagen, paste0(carpeta, "/", i, ".jpg"))
}
}

Llamar a la función para cada XPath y carpeta

extraer_y_guardar(xpath_financiero, "FINANCIO")
extraer_y_guardar(xpath_apoyo1, "APOYO")
extraer_y_guardar(xpath_apoyo2, "APOYO")

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Webscrapping for images in one specific page #434

Webscrapping for images in one specific page #434

candelavillalonga commented Dec 1, 2024

Webscrapping for images in one specific page #434

Webscrapping for images in one specific page #434

Comments

candelavillalonga commented Dec 1, 2024

da code in question:

URL de la página web

XPath de los elementos

Función para extraer imágenes y guardarlas

Leer la página web

Obtener los nodos con el XPath

Crear la carpeta si no existe

Iterar sobre los nodos y guardar las imágenes

Llamar a la función para cada XPath y carpeta