Sorting the order of multiple globbed files #292

Open
psychemedia opened this issue Aug 30, 2022 · 7 comments

Comments

@psychemedia

psychemedia commented Aug 30, 2022

PR #248 added support for passing multiple files, but I notice that if I pass a glob pattern (e.g. *.html) to combine multiple files into a single PDF:

  • in the command line version of pandoc, the files matched by the glob are parsed in alphabetical order;
  • in pypandoc, the files matched by the glob are parsed in an arbitrary order.

If a glob pattern is passed, e.g. to capture files 01.md, 02.md etc., it would be really useful if the sort order were respected.
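
A minimal sketch of the behaviour being asked for, using only the Python standard library. `expand_sorted` and `demo` are hypothetical names for illustration, not part of pypandoc. `glob.glob` makes no promise about the order of its results, so sorting explicitly is what gives the shell-like alphabetical order:

```python
import glob
import os
import tempfile


def expand_sorted(pattern):
    """Expand a glob pattern and return the matches in alphabetical order.

    Hypothetical helper for illustration; not part of pypandoc.
    """
    # glob.glob returns matches in whatever order the OS lists them,
    # so sort explicitly to get a stable order.
    return sorted(glob.glob(pattern))


def demo():
    # Create files out of order in a scratch directory, then expand a pattern.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("03.md", "01.md", "02.md"):
            open(os.path.join(tmp, name), "w").close()
        matches = expand_sorted(os.path.join(tmp, "*.md"))
        return [os.path.basename(path) for path in matches]
```

Calling `demo()` returns the three file names in sorted order regardless of the order in which they were created.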

@hey24sheep
Contributor

@JessicaTegner I have added a fix for this in my PR. Can I be assigned to this? Let me know if I have to edit my code, thanks.

JessicaTegner added a commit that referenced this issue Oct 1, 2022
@fsoedjede

fsoedjede commented Dec 29, 2022

Hello @JessicaTegner,

I saw that this request was merged here: #292

Upgrading from 1.9 to 1.10 caused a regression in my application.
As we can pass a list of files, with this change, it's now impossible to pass a list of files with my preferred order.
If I have files named book.md, addenda01.md, addenda02.md and I want to pass them keeping the order, it will not work.

My suggestion is that the sorting should be done on the user's side. The change should then be reverted.
If it's not possible to revert the change, I suggest that another argument sort_input_files=False is added (I don't think it's a good idea to add an argument just for that but I see no other possibility).
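
A rough sketch of how such an opt-in flag could behave, assuming that an explicit list keeps the caller's order, that each glob pattern is expanded in sorted order, and that the whole list is sorted only on request. `resolve_inputs` is a hypothetical helper and the flag name is simply borrowed from the suggestion above; none of this is taken from pypandoc's code:

```python
import glob


def resolve_inputs(sources, sort_input_files=False):
    """Expand glob patterns in `sources` while keeping the caller's order.

    Hypothetical sketch; neither the function nor the flag comes from pypandoc.
    """
    resolved = []
    for source in sources:
        if any(ch in source for ch in "*?["):
            # A pattern: expand it, sorting the matches for a stable order.
            resolved.extend(sorted(glob.glob(source)))
        else:
            # A plain path: keep it exactly where the caller put it.
            resolved.append(source)
    # Only reorder the whole list when the caller explicitly asks for it.
    return sorted(resolved) if sort_input_files else resolved
```

With this shape, `resolve_inputs(["book.md", "addenda01.md", "addenda02.md"])` leaves the list untouched, while passing `sort_input_files=True` puts the addenda first.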

cc @psychemedia

@psychemedia
Author

IIRC, my original issue was the inconsistency between the default pandoc and pypandoc behaviours.

@fsoedjede

I think the handling of multiple files must be left to pandoc instead of being done in this library.

Pandoc accepts this command: pandoc _posts/*.md --from markdown -o books/cookbook.epub

When we use pypandoc like this: pypandoc.convert_file('_posts/*.md', 'epub', outputfile="books/test.epub")

instead of listing the files individually before passing them to Pandoc, as is done here: https://github.com/JessicaTegner/pypandoc/blob/master/pypandoc/__init__.py#L159-L168
the glob (_posts/*.md) must be passed as is to pandoc.

This would avoid inconsistency between pypandoc and pandoc and reduce the processing needed in this package. WDYT @JessicaTegner?

@JessicaTegner
Owner

@fsoedjede that seems okay on the surface, but I'm not sure whether it would mean we run into errors with the automatic detection of the file type (since I'm pretty sure we do that at some point)

@JessicaTegner
Owner

Also @fsoedjede, I just tried your command and I get the following output:

pandoc.exe: tests/*.md: withBinaryFile: invalid argument (Invalid argument)                                             
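
One possible explanation for this output, offered as an assumption rather than a confirmed diagnosis: in the pandoc _posts/*.md example it is the Unix shell that expands the pattern before pandoc starts. When a program is launched without a shell, or from a shell that does not expand wildcards, it receives the literal string and tries to open a file with that name. The sketch below shows the mechanism with a child Python process standing in for pandoc; `argv_seen_by_child` is a hypothetical helper:

```python
import subprocess
import sys


def argv_seen_by_child(*args):
    """Run a child Python process and return the arguments it received."""
    # Passing a list (and no shell=True) means no shell is involved,
    # so wildcard characters are handed to the child untouched.
    child_code = "import sys; print('\\n'.join(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Here `argv_seen_by_child("tests/*.md")` reports the unexpanded pattern, which suggests that a wrapper wanting CLI-like behaviour would have to expand the glob itself before invoking pandoc.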

@qtfkwk

qtfkwk commented Nov 8, 2023

Hi @JessicaTegner! I'm trying to track down the same withBinaryFile: invalid argument (Invalid argument) error I'm seeing with pandoc on Windows... any idea what could cause it? (There aren't many hits after searching for a while; in fact I think this error is the first I've seen.) Appreciate any insight you might have. Thanks!

Admittedly, I'm kinda doing something weird, but maybe these details help (?)... I'm running something like pandoc file.md -o file.docx on Windows with the latest pandoc version 3.1.9 and it works flawlessly in either PowerShell or Git Bash... but if I run from within a Rust utility in Git Bash via std::process::Command::new("sh").args(["-c", "pandoc file.md -o file.docx"]).spawn().unwrap().wait().unwrap() suddenly it doesn't work... but other commands via this method work fine. I have also tried several other Pandoc versions and so far they're all consistently working / not working in the same way.

It'd be nice if searching pandoc source code repo and/or issues for this error produced anything... ;-)


5 participants