Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite disko in Python #902

Draft
wants to merge 37 commits into
base: master
Choose a base branch
from
Draft

Rewrite disko in Python #902

wants to merge 37 commits into from

Conversation

iFreilicht
Copy link
Contributor

See #789 for rationale.

This is very barebones, but I'm at a point where I'm confident enough in the basic structure to get feedback from the community.

How to test

You can use nix develop to get the same shell I've been developing in.
After that, you can just run ./disko2 to try out the CLI, or pytest to run the test suite.

Right now, only the disk, gpt and filesystem types are implemented.

The most interesting commands to try are ./disko2 generate, which generates a disko-config.nix at the project root, and ./disko2 format,mount --dry-run disko-config.nix, which will show you what commands disko would run to align your current system with the target configuration.

Right now, even without --dry-run, no commands are executed.

Thoughts on the rewrite

In general, it was quite a bit of effort to get a disko config into python and use it in a type-safe manner. Now that that's possible, working with the object feels quite nice.

Error handling and communicating issue to the user properly is significantly improved, which is a huge bonus, and being able to generate configs from real world systems and test the plan generation against that should help us test disko a lot more thoroughly.

You will notice that there is a lot more code, though. While you can template complete scripts in python as well, I wanted to actively avoid that to obviate the need for interpreting commands with a shell in the first place. The idea is that all commands will be executed with subprocess.run, which means the binaries are called directly. This will solve escaping, but might make some things (like conditional execution) more complicated to implement.

To Do

  • Display plan in a human-readable way
  • Add ability to actually run the plan
  • Add VM tests (will require JSON input)
  • Find a good design for conditional execution of some steps
  • Implement more of the types
  • Resolve merge conflicts

After that, I also have to think about how many of the currently nix-only functionality can be implemented. How would something like config.system.build.format be generated, for example?

This will be essential for type safety later on.
Error handling could be done better, but I'm pretty happy with it
already.
The important part is aggregating all errors instead of failing on the
first issue we encounter.
This requires a lot of restructuring. The .nix files have to be bundled
together with the python files, so they need to follow python's module
system structure.

I ran `nix-fast-build --no-link -j 2 --eval-workers 3 --flake .#checks`
and it succeeded, so I'm reasonably confident I changed everything as
required.
Other people might use other tools, but having a known-good
configuration is useful.
Using untyped dicts was not a safe way to pass arguments to error
message rendering. Also, having one giant match statement wouldn't scale
well. Using function pointers and kwargs, mypy can now properly check
whether all required arguments are passed to DiskoMessage().

As a bonus, it's now easy to use the "Go to definition" LSP command for
error codes, which wasn't possible before.
It combines multiple tools and is much faster.
This allows to create plans only based on the things that changed in the
configuration.
This makes the tests more readable and easier to browse.
Making sure the tests always run fine, no matter from which directory
they're started is a little detail we should either test for or
document.
This is required to be able to generate sgdisk invocations that are
equivalent to what's currently generated in nix.
This will be very useful for generating documentation and python type
definitions.
The default `strict = true` is too permissive for my liking, especially
how it allows using `Any` in many places and doesn't warn about them at
all.
This will be very useful for generating documentation and python type
definitions.
There are still two issues: The type of "topology" in zpool is not
created, and gpt_partitions_options_hybrid_options for some reason
contains a `_create: "_create"` entry. This is an issue with the nix
evaluation, though, not with the code generator.

I'm fixing these issues manually to have some state I can start working
from.
Belongs to d382a3c, but I authored that
on another machine.

Making sure vscode uses mypy from the environment is very important now,
because some of these errors get triggered in different ways depending
on the version.
Copy link
Contributor

@bittner bittner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments from a pythonista. 🐍

Re development tooling, I'd suggest to embrace uv, which is about to become the de-facto installer and packaging tool, and ruff as a universal linter and code formatter. See painless-software/cicd/app/cli (GitLab) for an example CLI setup with Tox that wraps Ruff for formatting and linting.

Some of your code mentions that a Nix test should be run in CI. I use the docker.io/nixos/nix container images to run nix flake check. See painless-software/nixos-config (GitLab). If you want this in GHA in this repo I could contribute that.

Comment on lines +104 to +122
match type_str:
case "str":
return "str"
case "absolute-pathname":
return "str"
case "bool":
return "bool"
case "int":
return "int"
case "anything":
return "Any"
# Set up discriminated unions to reduce error messages when validation fails
case "deviceType":
return '"deviceType" = Field(..., discriminator="type")'
case "partitionType":
return '"partitionType" = Field(..., discriminator="type")'
case _:
# Probably a type alias, needs to be quoted in case the type is defined later
return f'"{type_str}"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With traditional Python you would write it like that:

Suggested change
match type_str:
case "str":
return "str"
case "absolute-pathname":
return "str"
case "bool":
return "bool"
case "int":
return "int"
case "anything":
return "Any"
# Set up discriminated unions to reduce error messages when validation fails
case "deviceType":
return '"deviceType" = Field(..., discriminator="type")'
case "partitionType":
return '"partitionType" = Field(..., discriminator="type")'
case _:
# Probably a type alias, needs to be quoted in case the type is defined later
return f'"{type_str}"'
types_map = {
"str": "str",
"absolute-pathname": "str",
"bool": "bool",
"int": "int",
"anything": "Any",
# Set up discriminated unions to reduce error messages when validation fails
"deviceType": '"deviceType" = Field(..., discriminator="type")',
"partitionType": '"partitionType" = Field(..., discriminator="type")',
}
try:
return types_map[type_str]
except KeyError:
# Probably a type alias, needs to be quoted in case the type is defined later
return f'"{type_str}"'

This is (still) the pythonic way. I understand that match/switch/case is known in other languages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it's more terse anyway. I used match here initially because I called a function in one of the branches, but when all the results are static, that's not needed anymore.

buffer = io.StringIO()

buffer.write(
"""# File generated by scripts/generate_python_types.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could write it like that, which helps to keep the comment lines visually aligned.

Suggested change
"""# File generated by scripts/generate_python_types.py
"""\
# File generated by scripts/generate_python_types.py


DEFAULT_CONFIG_FILE = "disko-config.nix"

HEADER_COMMENT = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a backslash if you don't want an empty first line.

Suggested change
HEADER_COMMENT = """
HEADER_COMMENT = """\


"""

PARTIAL_FAILURE_COMMENT = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a backslash if you don't want an empty first line.

Suggested change
PARTIAL_FAILURE_COMMENT = """
PARTIAL_FAILURE_COMMENT = """\

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, totally forgot about this.

nix_code = re.sub(r"^\{ disko = \{ devices", "{ disko.devices", config_as_nix.value)
nix_code = re.sub(r"\}; \}$", "}", nix_code)

with open(DEFAULT_CONFIG_FILE, "w") as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to consider using pathlib.Path instead, e.g.

Suggested change
with open(DEFAULT_CONFIG_FILE, "w") as f:
with Path(DEFAULT_CONFIG_FILE).open("w") as f:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know pathlib is useful for path operations and type safety, but does it have any advantage here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more about consistency in modern Python.

@iFreilicht
Copy link
Contributor Author

@bittner Thank you for your comments. ruff is already used, you can see it in the dependencies for devShell in flake.nix. It's not mentioned in the pyproject.toml because we use Nix as the package manager.
I do still have to configure it properly, though.

Regarding uv, I'm not sure I understand how it would fit into the picture here. It seems to be an alternative to pip or poetry, but as we use (and must use) Nix, I'm unsure what it would even do in this project.

We already run nix tests in CI using buildbot. You can have a look at other PR's CI Actions if you're curious. Our tests are plentiful and take a long time to evaluate, multiple hours in fact if you were to run them in sequence. Thus, we nix-fast-build to speed up the build time to 45 Minutes locally and ~10 Minutes in CI. Most of our tests also run VMs to check whether the formatting process works correctly, which I think would be tricky to do inside containers, right? If your setup does have unique advantages, let me know.

@bittner
Copy link
Contributor

bittner commented Jan 16, 2025

Regarding uv, I'm not sure I understand how it would fit into the picture here. It seems to be an alternative to pip or poetry, but as we use (and must use) Nix, I'm unsure what it would even do in this project.

It was meant for the dev-setup document. Ack, I'm a beginner in NixOS. What is uv2nix, btw, does that fit for using uv to develop with Python on NixOS?

We already run nix tests in CI using buildbot.

I wasn't aware of that, sorry. Is the source code of this setup included in this repo?

If your setup does have unique advantages, let me know.

No, probably not. I had played around with Nix in container images for running checks over my nixos-config and thought I share my findings. I didn't see that anywhere else. Sounds like your setup is much more than just some light linting and nix flake check. Sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants