Error messages #110

Open
rjaffe-exponent opened this issue Apr 5, 2022 · 1 comment

rjaffe-exponent commented Apr 5, 2022

Hi,
I'm getting the following errors when running pdf_text() on multiple PDF files:

PDF error: Can't get Fields array<0a>
PDF error (2126): Unexpected MC Type: 7
PDF error: Couldn't find trailer dictionary
PDF error: Couldn't read xref table
Error in poppler_pdf_text(loadfile(pdf), opw, upw) : PDF parsing failure.

Could you please provide more details on the meaning of these errors? I'm wondering whether using pdf_ocr_text() would solve the problem, but testing it on all my files would take too long (the errors do not say which files cause them).
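
Since the output does not name the offending files, one way to narrow things down might be to read the files one at a time and record which ones raise an error. The following is only a minimal sketch: `find_failing_pdfs` and the `my_pdfs` directory are hypothetical names, and it assumes the failure surfaces as a regular R error, as in the last line of the output above.

```r
# Hypothetical helper: try `reader` on each file and collect the error
# message for every file that fails. `reader` defaults to pdftools::pdf_text.
find_failing_pdfs <- function(files, reader = pdftools::pdf_text) {
  messages <- vapply(files, function(f) {
    tryCatch({
      reader(f)
      NA_character_  # no error for this file
    }, error = function(e) conditionMessage(e))
  }, character(1), USE.NAMES = FALSE)

  failed <- !is.na(messages)
  data.frame(file = files[failed], error = messages[failed],
             stringsAsFactors = FALSE)
}

# Usage, assuming the PDFs live in a hypothetical "my_pdfs" directory:
# pdfs <- list.files("my_pdfs", pattern = "\\.pdf$", full.names = TRUE)
# find_failing_pdfs(pdfs)
```

The returned data frame lists only the files that failed, so a slower fallback such as pdf_ocr_text() would need to be tried on those files alone rather than on the whole collection.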

Here is my session info:

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pdfsearch_0.3.0 tesseract_5.0.0 pdftools_3.1.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4 readr_2.1.2
[9] tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 cellranger_1.1.0 pillar_1.7.0 compiler_4.1.3 dbplyr_2.1.1 tokenizers_0.2.1 tools_4.1.3 digest_0.6.29
[9] jsonlite_1.8.0 lubridate_1.8.0 lifecycle_1.0.1 gtable_0.3.0 pkgconfig_2.0.3 rlang_1.0.2 reprex_2.0.1 rstudioapi_0.13
[17] DBI_1.1.2 cli_3.2.0 haven_2.4.3 xml2_1.3.3 withr_2.5.0 httr_1.4.2 rappdirs_0.3.3 askpass_1.1
[25] fs_1.5.2 generics_0.1.2 vctrs_0.3.8 hms_1.1.1 grid_4.1.3 tidyselect_1.1.2 glue_1.6.2 qpdf_1.1
[33] R6_2.5.1 fansi_1.0.3 readxl_1.4.0 tzdb_0.3.0 modelr_0.1.8 magrittr_2.0.2 SnowballC_0.7.0 backports_1.4.1
[41] scales_1.1.1 ellipsis_0.3.2 rvest_1.0.2 assertthat_0.2.1 colorspace_2.0-3 utf8_1.2.2 stringi_1.7.6 munsell_0.5.0
[49] broom_0.7.12 crayon_1.5.1

Thanks!

jeroen (Member) commented Oct 4, 2022

Can you include an example PDF, please?
