
PowerPoint PDF data loaded into PDF-Lib does not open in Adobe Acrobat Pro DC #1206

Open
2 tasks done
ahaganDEV opened this issue Apr 5, 2022 · 3 comments · May be fixed by #1215

Comments

@ahaganDEV

What were you trying to do?

Given PDF file data retrieved from our API, load it into PDF-Lib, manipulate it (draw stamps, etc.), and merge the results into one PDF document that is written to disk. The merged PDF should then open in Adobe Acrobat Pro DC.

How did you attempt to do it?

  1. Initially we receive PDF data from an API that returns it as a Uint8Array.

  2. Load the data into PDF-Lib:
    const embedDoc = await PDFDocument.load(pdfFileData);

  3. Embed the pages into the document:

    const pages = embedDoc.getPages();
    for (const page of pages) {
      // Exclude blank pages (pages with no content stream)
      if (!page.node.Contents()) {
        continue;
      }

      // Add a new page with the same dimensions as the source page
      const newPage = embedDoc.addPage([page.getWidth(), page.getHeight()]);

      const embedPage = await embedDoc.embedPage(page);

      // Draw the embedded page at full size onto the new page
      newPage.drawPage(embedPage, {
        x: 0,
        y: 0,
        xScale: 1,
        yScale: 1,
        width: page.getWidth(),
        height: page.getHeight(),
      });
    }

  4. Later on in the process, merge multiple PDF documents together:

    for (const pdfFile of this.pdfFiles) {
      const copiedPages = await mergedPdf.copyPages(
        pdfFile,
        pdfFile.getPageIndices()
      );
      copiedPages.forEach((page) => {
        mergedPdf.addPage(page);
      });
    }

  5. Save and then output the merged file to disk:

    const pdfData = await mergedPdf.save();
    // Backslash escaped so the Windows path separator survives in the string
    fs.writeFileSync('my-path\\myfile.pdf', pdfData);

What actually happened?

The generated PDF opens in the native PDF readers on Windows, macOS and Ubuntu. However, Adobe Acrobat Pro DC fails to open it and gives the following error:

image

When run through this PDF Checker tool https://www.pdf-online.com/osa/repair.aspx it outputs the following error:

0x80410306 - E - The "Length" key of the stream object is wrong.
    - Object No.: 10
    - File: Generated_Merged_File.pdf

When repaired, this PDF can then be opened in Adobe Acrobat Pro DC.
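
The checker's complaint can be approximated locally with a small script. The sketch below is a rough check, not a full PDF parser: for every object that carries a stream, it reads the `/Length` value and confirms that the `endstream` keyword sits right after that many bytes. `findBadStreamLengths` is a hypothetical helper name (it is not part of pdf-lib), and the sketch assumes `/Length` is a direct integer (indirect references such as `/Length 12 0 R` are skipped) and that the file fits in memory.

```javascript
// Hypothetical helper, not part of pdf-lib. Assumes direct integer /Length
// values and a file small enough to decode into one latin1 string.
function findBadStreamLengths(pdfBytes) {
  // latin1 maps every byte to exactly one character, so string indices
  // are byte offsets.
  const text = Buffer.from(pdfBytes).toString('latin1');
  const problems = [];
  const objHeader = /(\d+)\s+(\d+)\s+obj\b/g;
  let header;

  while ((header = objHeader.exec(text)) !== null) {
    const dictStart = objHeader.lastIndex;
    const streamIdx = text.indexOf('stream', dictStart);
    const endObjIdx = text.indexOf('endobj', dictStart);
    // Not a stream object if `endobj` comes before any `stream` keyword.
    if (streamIdx === -1 || (endObjIdx !== -1 && endObjIdx < streamIdx)) continue;

    // The `stream` keyword must be followed by CRLF or LF.
    let eolLength;
    if (text.startsWith('\r\n', streamIdx + 6)) eolLength = 2;
    else if (text.startsWith('\n', streamIdx + 6)) eolLength = 1;
    else continue;
    const dataStart = streamIdx + 6 + eolLength;

    // `\s+` after "Length" keeps /Length1, /Length2, ... from matching.
    const dict = text.slice(dictStart, streamIdx);
    const lengthMatch = /\/Length\s+(\d+)(\s+\d+\s+R)?/.exec(dict);
    if (!lengthMatch || lengthMatch[2]) continue; // missing or indirect

    const declaredLength = Number(lengthMatch[1]);
    const tail = text.slice(dataStart + declaredLength, dataStart + declaredLength + 11);
    if (/^(\r\n|\r|\n)?endstream/.test(tail)) {
      // Length is consistent: skip the payload so its bytes are not rescanned.
      objHeader.lastIndex = dataStart + declaredLength;
    } else {
      problems.push({ objectNumber: Number(header[1]), declaredLength });
    }
  }
  return problems;
}
```

Running this on the bytes returned by `mergedPdf.save()` before writing them to disk should flag the same kind of mismatch that the repair tool reports, assuming the affected stream uses a direct `/Length`.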

Opened in RUPS, here is the basic structure and the stream length of the above object:
image

Here is the RUPS view of the repaired PDF (notice the differing stream length highlighted):
image

What did you expect to happen?

The PDF file opens correctly in Adobe Acrobat Pro DC.

How can we reproduce the issue?

Here is the original PowerPoint PDF file that is retrieved from our API (this PDF itself opens fine in Adobe Acrobat)
simple_ppt.pdf

Here is the generated PDF after it is passed through PDF-Lib and has gone through the merge process (this does NOT open in Adobe Acrobat)
Generated_Merged_File.pdf

Here is the output of the repaired PDF using the tool https://www.pdf-online.com/osa/repair.aspx (this does open in Adobe Acrobat)
Generated_Merged_File.pdf_recovered.pdf

Example code snippets are shown above.

Version

1.16.0

What environment are you running pdf-lib in?

Node

Checklist

  • My report includes a Short, Self Contained, Correct (Compilable) Example.
  • I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

No response

@Trapfether
Contributor

I have determined that the issue is caused by license text that is embedded with the font. The license text contains PDF keywords that confuse the stream parser in pdf-lib.

I'm working on an improvement that would make pdf-lib resilient to this particular issue.
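
To illustrate the general failure mode (this is a minimal sketch, not pdf-lib's actual parser and not the change in the linked pull request): a reader that finds the end of a stream by scanning for the `endstream` keyword stops early when the payload itself contains that keyword, whereas a reader that trusts a verified `/Length` does not. Both function names and the sample license string are hypothetical.

```javascript
// Hypothetical helpers for illustration only; these are not pdf-lib functions.

// Fragile: treat the first `endstream` after the data start as the end marker.
function readStreamByKeyword(text, dataStart) {
  const end = text.indexOf('endstream', dataStart);
  // Drop the end-of-line that separates the payload from the keyword.
  return text.slice(dataStart, end).replace(/\r?\n$/, '');
}

// Sturdier: use the declared length when `endstream` really follows it,
// and fall back to the keyword scan only when it does not.
function readStreamByLength(text, dataStart, declaredLength) {
  const end = dataStart + declaredLength;
  const tail = text.slice(end, end + 11);
  if (/^(\r\n|\r|\n)?endstream/.test(tail)) {
    return text.slice(dataStart, end);
  }
  return readStreamByKeyword(text, dataStart);
}

// Made-up payload standing in for font license text that mentions a PDF keyword.
const payload = 'License: do not strip the endstream marker';
const object =
  `7 0 obj\n<< /Length ${payload.length} >>\nstream\n${payload}\nendstream\nendobj\n`;
const dataStart = object.indexOf('stream\n') + 'stream\n'.length;

const byKeyword = readStreamByKeyword(object, dataStart);
const byLength = readStreamByLength(object, dataStart, payload.length);

console.log(byKeyword === payload); // false: cut off at the keyword inside the payload
console.log(byLength === payload); // true
```

If a truncated payload like `byKeyword` were carried through to the output, the stream's size and its `/Length` entry could end up disagreeing, which is one plausible route to the error reported above.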

@Trapfether Trapfether linked a pull request Apr 16, 2022 that will close this issue
10 tasks
@ahaganDEV
Author

@Trapfether I see your PR has been open for over 3 weeks now. Do you know if it is likely to be merged and released soon?

@Trapfether
Contributor

@ahaganDEV a new release of pdf-lib is cut every few months as needed. I haven't yet received any contact or feedback from the maintainer, so I doubt it will be released soon.

In the meantime, you can apply my changes to your local copy, depending on how you use pdf-lib. I use the browser-based version, so I run the build myself and use the resulting files.

If you're using the backend version, you can use npm link, or maintain your own repository and install the package from there instead of from this one. However, you should check back periodically and switch back to this repository once the change has been merged, so that you also get later patches.
