Product format error message #998

Open
idomic opened this issue Aug 25, 2022 · 4 comments
Labels: bug, good first issue

Comments

@idomic
Contributor

idomic commented Aug 25, 2022

Could not determine format for product ‘outputs/validation_result0.pkl’. Pass a valid extension (‘.html’, ‘.ipynb’, ‘.md’, ‘.pdf’, ‘.rst’, and ‘.tex’) or pass “nbconvert_exporter_name”. If you want this task to generate multiple products, pass a dictionary to “product”, with the path to the output notebook in the “nb” key (e.g. “output.ipynb”) and any other output paths in other keys)

We need to support pkl as a format or, if it is already supported and the problem is a missing dictionary key, make the error message specific and include an example (for instance, to pass a pickle product, use product: 'file.pkl'). We should also include a link to our community Slack channel.

idomic added the bug and good first issue labels on Aug 25, 2022
@edublancas
Contributor

edublancas commented Aug 26, 2022

It's not that we don't support .pkl as a format. We support all formats, since the user is in charge of the serialization; we just take the extension as-is.

This error happens because Ploomber determined that one of the tasks is a notebook. Hence, it requires one of the products to be a notebook (so we store the output notebook there).

For example, the pipeline might be something like this:

tasks:
  - source: notebook.ipynb
    product: something.pkl

Ploomber will throw the error since it expects one of the products to be either an .ipynb or a "report" format, for example, a pdf, html, or any format that can be exported from an .ipynb. We expect something like this:

tasks:
  - source: notebook.ipynb
    product:
      nb: output.ipynb  # or output.html, output.pdf, etc.
      data: something.pkl
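
The check described above can be sketched in Python as follows. This is a minimal illustration, not Ploomber's actual implementation: the function name `find_notebook_product` and the `NOTEBOOK_EXTENSIONS` constant are hypothetical, and the extension list is simply copied from the error message.

```python
from pathlib import Path

# Extensions listed in the error message (hypothetical constant, not Ploomber's own)
NOTEBOOK_EXTENSIONS = {".html", ".ipynb", ".md", ".pdf", ".rst", ".tex"}


def find_notebook_product(product):
    """Return the path that can hold the executed notebook, or None.

    `product` is either a single path (str) or a dict mapping keys to
    paths, mirroring the two YAML forms shown above.
    """
    if isinstance(product, dict):
        # With multiple products, the output notebook goes under the "nb" key
        candidate = product.get("nb")
    else:
        candidate = product

    if candidate is None:
        return None

    # Only notebook or report formats can store the output notebook
    if Path(candidate).suffix.lower() in NOTEBOOK_EXTENSIONS:
        return candidate
    return None


print(find_notebook_product("something.pkl"))
print(find_notebook_product({"nb": "output.ipynb", "data": "something.pkl"}))
```

The first call finds no place to store the output notebook, which is the situation that triggers the error; the second call finds `output.ipynb` under the `nb` key.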

Note that the same would happen if the source is a script, since Ploomber converts scripts to notebooks and then executes them.

This would generate the same error:

tasks:
  - source: script.py
    product: something.pkl

Because we expect something like:

tasks:
  - source: script.py
    product:
      nb: output.ipynb
      data: something.pkl

I agree that the error is confusing, and the code should run even if the path to the output notebook is missing. Here's my proposed solution:

  1. If the source is an .ipynb and there is no output ipynb/html/pdf, execute the notebook in place (this duplicates "feature idea: notebook inplace execution" #941, so I closed that one).
  2. If the source is a .py and there is no output ipynb/html/pdf, convert it into a notebook (as we currently do) and execute it, but run it in a temporary directory and delete it at the end of the execution. Maybe we should also show a warning here, telling the user that the path to the output notebook is missing, along with an example.
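
The two cases above can be sketched as a small decision function. This only illustrates the proposal: `plan_execution`, the strategy names it returns, and the warning text are all hypothetical and not part of Ploomber.

```python
import warnings
from pathlib import Path

# Hypothetical: formats that can hold the executed notebook (from the error message)
NOTEBOOK_EXTENSIONS = {".html", ".ipynb", ".md", ".pdf", ".rst", ".tex"}


def plan_execution(source, product):
    """Decide how to run a notebook or script task (hypothetical helper).

    Returns "to-product" when an output notebook path is declared,
    "inplace" for an .ipynb source without one, and "temporary" for a
    .py source without one.
    """
    # A dict product stores the output notebook under the "nb" key
    nb = product.get("nb") if isinstance(product, dict) else product
    if nb is not None and Path(nb).suffix.lower() in NOTEBOOK_EXTENSIONS:
        return "to-product"

    suffix = Path(source).suffix.lower()
    if suffix == ".ipynb":
        # Case 1: no output notebook, so run the source notebook itself
        return "inplace"
    if suffix == ".py":
        # Case 2: convert and run in a temporary directory, then discard
        warnings.warn(
            "No output notebook path in 'product'; the converted notebook "
            "will run in a temporary directory and be deleted. To keep it, "
            "use: product: {nb: output.ipynb, data: something.pkl}"
        )
        return "temporary"
    raise ValueError(f"Unsupported source: {source!r}")
```

Returning a strategy name instead of executing anything keeps the decision easy to test separately from the actual notebook execution.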

@aadityasinha-dotcom
Contributor

Is this issue available? Can I work on it?

@abhishak3
Contributor

abhishak3 commented Oct 9, 2022

I'd like to work on this issue too with @aadityasinha-dotcom

@edublancas
Contributor

Sure, @aadityasinha-dotcom and @abhishak3, feel free to work on it.

I assigned it to both of you. Feel free to work independently or together.
