@Ubehebe , sorry for the late response -- just saw your post.
I've thought about this insofar as the indirection bothers me, too. But I haven't spent time trying to design something better. In particular, number 2 resonates with me: it would be great to make editing in an IDE/text editor easier. Definitely open to hearing your suggestions. Thanks so much for the thoughtful message!
-
My main question is: why does marimo have to serialize the dataflow graph into the source file? Why can't it be an in-memory data structure on the server?
The first problem is that marimo needs some way to partition a Python source file into cells.
The edges of the dataflow graph (the parameters of the synthetic functions) are what confuse non-marimo tools. Why do they need to be in the source file? Couldn't marimo run the dataflow analysis once on startup and keep the graph in memory? This might slow down the initial time to interactive, but I doubt it would be significant when
If we're able to make marimo notebooks more idiomatic Python so that other tools can work on them seamlessly, I think that's a good tradeoff.
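To make the "run the dataflow analysis once on startup" idea concrete, here is a minimal sketch using only the standard-library `ast` module. It is not marimo's implementation: `defs_and_refs` and `build_graph` are hypothetical helpers, the cells are assumed to be already partitioned into a list of source strings, each name is assumed to be defined by at most one cell, and nested scopes are ignored for brevity.

```python
import ast
import builtins


def defs_and_refs(source):
    """Return (names a cell defines, names it reads from elsewhere).

    Simplification: every name in the cell is treated as top-level,
    so nested function scopes are not handled correctly.
    """
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                refs.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as x` binds `x`.
                defs.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.add(node.name)
    # Names defined in the same cell, and builtins, are not cross-cell edges.
    refs -= defs
    refs -= set(dir(builtins))
    return defs, refs


def build_graph(cells):
    """Map each cell index to the set of cell indices it depends on."""
    analyzed = [defs_and_refs(src) for src in cells]
    definer = {name: i for i, (defs, _) in enumerate(analyzed) for name in defs}
    return {
        i: {definer[name] for name in refs if name in definer}
        for i, (_, refs) in enumerate(analyzed)
    }


# The cells are only parsed, never executed, so pandas need not be installed.
cells = [
    "import pandas as pd",
    "df = pd.DataFrame({'a': [1, 2]})",
    "total = df['a'].sum()",
]
graph = build_graph(cells)
print(graph)
```

The graph lives only in memory here, and each cell's source stays plain top-level Python. The partitioning problem is left open: the sketch sidesteps it by taking the cells as a list.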
-
I renamed this topic to reflect the most important issue (and to focus less on specific solutions). marimo's text format is good compared to other notebook formats, but I think it can be even better.
-
One of the main advantages of marimo compared to other notebook formats is that marimo notebooks are syntactically valid Python files. This means that tools that analyze Python files (linters, formatters, type-checkers, IDEs) can generally do something useful with marimo notebooks without any setup.
As I've used marimo more, I've discovered some exceptions. marimo notebooks are syntactically valid Python, but they aren't idiomatic Python. This means that some tools can't analyze marimo notebooks in a useful way.
Here's an example. When you use an import in a notebook:
marimo serializes that to disk as something like:
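A rough, self-contained sketch of that shape follows. It makes several assumptions so that it runs without dependencies: the `App` class is a hypothetical stand-in for `marimo.App`, the standard-library `statistics` module stands in for pandas, and the cell functions have explicit names (`imports`, `compute`) so they can be called by hand.

```python
import inspect


class App:
    """Hypothetical stand-in for marimo.App; it only records decorated cells."""

    def __init__(self):
        self.cells = []

    def cell(self, fn):
        self.cells.append(fn)
        return fn


app = App()


# Notebook cell 1, as typed in the UI:  import statistics as st
@app.cell
def imports():
    import statistics as st
    return (st,)


# Notebook cell 2, as typed in the UI:  avg = st.mean([1, 2, 3])
@app.cell
def compute(st):
    avg = st.mean([1, 2, 3])
    return (avg,)


# The edge between the two cells exists only as a parameter name.
print(list(inspect.signature(compute).parameters))

# Wiring the cells together by hand, the way a runner would:
(st,) = imports()
(avg,) = compute(st)
print(avg)
```

Inside `compute`, `st` is just an untyped parameter. Nothing in the file tells a static tool that it is the module imported in `imports`.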
This is basically a serialized DAG: the nodes (cells) are represented by top-level functions decorated with `@app.cell`, and the edges (dependencies) are represented by function parameters and return values.
The serialization is elegant, but tools other than marimo can't understand the indirection. For example, they can't tell that the `DataFrame` constructor comes from the pandas import. This means that:
`import pandas as pd` statement remains.
I can see a few approaches we might take to improve this situation, but before proposing anything specific, I wanted to start a discussion. Maintainers, have you thought about this? How important do you think it is to improve?
My own view is that it's medium importance. For (1), unused imports can significantly slow down notebook execution. And for (2), being able to use IDE features to edit marimo notebooks would make large codebases significantly more maintainable (refactoring, etc.).
Thanks for your time!