from pdfminer.high_level import extract_pages

file_path = "***.pdf"
with open(file_path, "rb") as f:
    # extract_pages yields one laid-out page object per page
    for page in extract_pages(f):
        page_index = page.pageid
        print(page_index)

It took about 10 minutes.
Well, extract_pages is actually doing layout analysis on every page, so one might expect this to be slow! Do you find that it gets slower with each successive page?
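To answer the question about whether it gets slower with each successive page, one option is to time how long each page takes to come out of the iterator. The following is only a rough sketch: `timed` and `report_page_times` are hypothetical helper names, not part of pdfminer, and the only pdfminer call used is `extract_pages` from the report above.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def timed(items: Iterable[T]) -> Iterator[Tuple[int, float, T]]:
    """Yield (index, seconds_to_produce, item) for each item of an iterable.

    Hypothetical helper: it measures only the time spent inside next(),
    i.e. the work the iterator does to produce each item.
    """
    iterator = iter(items)
    index = 0
    while True:
        start = time.perf_counter()
        try:
            item = next(iterator)
        except StopIteration:
            return
        elapsed = time.perf_counter() - start
        yield index, elapsed, item
        index += 1


def report_page_times(file_path: str) -> None:
    """Print the time extract_pages needs for each page (hypothetical helper)."""
    # Imported here so that timed() stays usable without pdfminer installed.
    from pdfminer.high_level import extract_pages

    for index, seconds, page in timed(extract_pages(file_path)):
        print(f"page {index} (pageid={page.pageid}): {seconds:.2f}s")
```

Calling `report_page_times("***.pdf")` on the same file should show whether the per-page time stays roughly constant or grows as the loop advances.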
中国神华:中国神华2023年度报告.PDF (China Shenhua: China Shenhua 2023 Annual Report)
The code is very simple:

from pdfminer.high_level import extract_pages

file_path = "***.pdf"
with open(file_path, "rb") as f:
    for page in extract_pages(f):
        page_index = page.pageid
        print(page_index)

It took about 10 minutes.