TypeError: 'PDFObjRef' object is not iterable #1004

corobin · 2024-07-10T00:03:30Z

After updating to version 20240706, calling extract_text() on a PDF throws TypeError: 'PDFObjRef' object is not iterable.

This did not occur on the previous version, 20231228.

Python 3.12.4 (tags/v3.12.4:8e8a4ba, Jun  6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> from pdfminer.high_level import extract_text
>>> text = extract_text("Working.pdf")
>>> text = extract_text("Error.pdf")
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    text = extract_text(path)
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\high_level.py", line 169, in extract_text
    for page in PDFPage.get_pages(
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 171, in get_pages
    for (pageno, page) in enumerate(cls.create_pages(doc)):
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 127, in create_pages
    yield cls(document, objid, tree, next(page_labels))
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 63, in __init__
    mediabox_params: List[Any] = [
TypeError: 'PDFObjRef' object is not iterable
>>>

Working.pdf - a newly created blank page made with Acrobat.

Error.pdf - downloaded; I cannot change how it is created. I deleted all visible text on the page, which did not appear to affect the error.

felixxm · 2024-07-10T09:39:00Z

We hit the same issue with next(high_level.extract_pages(pdf_page_path)) calls.

myhloli · 2024-07-23T15:52:22Z

Same error here: opendatalab/MinerU#198

dhdaines · 2024-07-31T19:46:33Z

Probably need to call resolve1 on self.attrs["MediaBox"] as well... it's indirect objects all the way down...
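To illustrate the suggestion above, here is a minimal sketch of the failure mode and the proposed fix. It uses a toy `ObjRef` class and a toy `resolve` helper as stand-ins for pdfminer's `PDFObjRef` and `resolve1`; both are assumptions made for illustration, not pdfminer's actual code. The idea is that resolving each element of the MediaBox is not enough when the MediaBox entry itself is an indirect reference.

```python
from typing import Any, Dict, List


class ObjRef:
    """Toy stand-in for an indirect object reference (hypothetical, not pdfminer's class)."""

    def __init__(self, store: Dict[int, Any], objid: int) -> None:
        self.store = store
        self.objid = objid


def resolve(x: Any) -> Any:
    """Follow references until a concrete value is reached (the role resolve1 plays)."""
    while isinstance(x, ObjRef):
        x = x.store[x.objid]
    return x


def mediabox_unpatched(attrs: Dict[str, Any]) -> List[Any]:
    # Resolves each element, but iterates over attrs["MediaBox"] directly.
    # If that entry is itself a reference, iteration raises TypeError.
    return [resolve(p) for p in attrs["MediaBox"]]


def mediabox_patched(attrs: Dict[str, Any]) -> List[Any]:
    # Resolves the entry first, then each element inside it.
    return [resolve(p) for p in resolve(attrs["MediaBox"])]


# A toy object store: object 1 is a whole MediaBox array, object 2 a single number.
store: Dict[int, Any] = {1: [0, 0, 612, 792], 2: 792}

inline_box = {"MediaBox": [0, 0, 612, ObjRef(store, 2)]}  # only an element is indirect
indirect_box = {"MediaBox": ObjRef(store, 1)}             # the array itself is indirect

print(mediabox_unpatched(inline_box))
print(mediabox_patched(indirect_box))
try:
    mediabox_unpatched(indirect_box)
except TypeError as exc:
    print("unpatched fails:", exc)
```

In this toy model the unpatched version handles an inline array but fails on the indirect one, which matches the shape of the traceback in the report.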

MarcoPeli · 2024-08-03T00:49:50Z

> Probably need to call resolve1 on self.attrs["MediaBox"] as well... it's indirect objects all the way down...

I had the same error; using resolve1 fixed it for me.

jroakes · 2024-10-26T22:15:53Z

@MarcoPeli, y'all OK providing a bit more detail on the fix here, for users calling:

from pdfminer.high_level import extract_text
text = extract_text("Working.pdf")

jsvine mentioned this issue Jul 14, 2024

Update version of pdfminer-six to 20240706 jsvine/pdfplumber#1166

Open

dotlambda mentioned this issue Jul 25, 2024

python312Packages.pdfminer-six: 20231228 -> 20240706 NixOS/nixpkgs#329409

Merged


dhdaines added a commit to dhdaines/pdfminer.six that referenced this issue Jul 31, 2024

fix: dereference MediaBox (fixes: pdfminer#1004)

ad101c1

dhdaines linked a pull request Jul 31, 2024 that will close this issue

Make sure to dereference MediaBox in /Pages #1027

Open

sarahec mentioned this issue Sep 4, 2024

build failure: python3Packages.pdf-plumber unit tests fail due to error in python3Packages.pdfminer-six NixOS/nixpkgs#339639

Closed

dotlambda added a commit to dotlambda/nixpkgs that referenced this issue Sep 5, 2024

python312Packages.pdfminer-six: fix `TypeError: 'PDFObjRef' object is not iterable`

5410e1e

This fixes upstream issue pdfminer/pdfminer.six#1004 and the build of python3Packages.pdfplumber.

dotlambda mentioned this issue Sep 5, 2024

python312Packages.pdfminer-six: fix TypeError: 'PDFObjRef' object is not iterable NixOS/nixpkgs#339919

Merged
