Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPICE-0002: URI Rewriting for Mirroring #4

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

holzensp
Copy link

@holzensp holzensp commented May 1, 2024

This adds a SPICE for URI rewriting for e.g. running Pkl in air-gapped / mirroring environments.

* @param with The string to substitute for the first match of [replace], which may contain
* references to capture groups (`$0`, `$1`, etc).
*/
public EvaluatorBuilder addRewriteRule(Pattern replace, String with) {
Copy link
Contributor

@bioball bioball May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with allowed modules, allowed resources, I would just have Pattern here.

Might be good to have a record RewriteRule(Pattern replace, String with); and to have:

addRewriteRule(RewriteRule rewriteRule)

setRewriteRules(Collection<RewriteRule> rewriteRules)

And maybe these should be added to HttpClient#Builder instead of EvaluatorBuilder

@Since { version = "0.26.0" }
class UriRewrite {
/// What to match in the given URI.
replace: String|Regex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using pkl.Regex as a configuration value is an anti-pattern, because it is specifically a Java 8 compatible regex, and not portable across languages.

Following that, I think this should be a String instead:

/// A regular expression pattern describing what to match in a given URI.
///
/// Regular expressions follow Java rules.
@SourceCode { language = "RegExp" }
replace: String(isRegex)

And with unconditionally accepts placeholders for capture groups.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping for an easy way to allow non-regex pattern specification, to not have to always have to handle escaping (in the URI case, especially of .) and I wasn't considering the case where Pkl's own configuration would be object-mapped to a host language.

Mind you, Regex uses Pattern.compile(pattern, Pattern.UNICODE_CHARACTER_CLASS | Pattern.UNICODE_CASE), where isRegex uses Pattern.compile(self, Pattern.UNICODE_CASE), so they're the same Java language level, regardless, and the former is slightly more expressive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, since we are dealing with Java regexes here, it's quite nice that we can use the isRegex helper.

The pattern that I was saying we should follow is to use strings to represent regex values, which is a universal way to describe them (String is a suitable type no matter the regex flavor).

IMO: Regex should only be used for in-language string matching.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users need to understand the exact regex semantics. “follow Java rules” is too vague (and why refer Pkl users to Java?), and isRegex only covers regex syntax. Because the regex will be evaluated by Pkl (even if it happens in Java code), replace: Regex seems logical to me. I don’t see why portability matters here.

Copy link
Contributor

@bioball bioball May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users need to understand the exact regex semantics. “follow Java rules” is too vague (and why refer Pkl users to Java

I disagree with this, actually. When writing regex, it's much easier to understand "this is a Java regex" than needing to look through the full set of available features. It's not any different than the many tools that refer their users to Perl (see PCRE).

As a matter of principle, Regex shouldn't be used to express configuration data in Pkl; it's a data type that's meant for in-language operations. The same can be said of IntSeq, Pair, List, Map, etc.

The first, longest-matching rewrite rule is applied.
In other words:
- when multiple rules match, a rule that matches a longer substring of the URI is preferred over one matching a shorter substring; and
- when multiple rules match the same substring length, the first rule is chosen (in definition order).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these rewrite rules are user-defined as an in-order list, I think we can just pick the first rewrite rule that matches.

It would also be cheaper this way; it allows us to short-circuit once we find a rule that matches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also want to validate that a rewrite can only be rewritten to the same scheme, and perhaps that the rewrite is only for http/https.

Rewriting to a different scheme can be problematic because of minor differences between how different schemes work (e.g. HTTPS urls don't support glob imports).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we want this to intentionally exclude custom resource readers?

106 | text = renderer.renderDocument(value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
at pkl.base#Module.output.text (https://github.com/apple/pkl/blob/0.25.2/stdlib/base.pkl#L106)
----
Copy link
Contributor

@bioball bioball May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work if the rewrite rule is provided via another means, like via a CLI flag?

Also, it doesn't sound great to insert stack frames to point to nodes that aren't actually being executed. We do this sparingly (e.g. to represent values that initialized as parse-time constants, which is a language implementation optimization detail).

Rather than this, I suggest that we decorate the error with hints. Take a look at VmExceptionBuilder#withHint.

new UriRewrite {
replace = Regex("^")
with = throw("URI found that is uncovered by any of the rewrite rules")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this can work--in order to configure the evaluator at runtime, all of the rewrite rules need to be consumed (ergo, deep forced).

Comment on lines +5 to +6
* Status: Review
* Implemented in: Pkl 0.26.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Status: Review
* Implemented in: Pkl 0.26.0
* Status: TBD
* Implemented in: TBD

@@ -1,6 +1,6 @@
= SPICE Name

* Proposal: [SPICE-NNNN](./SPICE-NNNN-name-of-proposal.adoc)
* Proposal: link:./NNNN-name-of-proposal.adoc[SPICE-NNNN]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is causing a merge conflict; rebase and resolve


1. Via the API;
2. Via `pkl:Project`; and
3. Via arguments on the CLI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also:

  • Gradle plugin
  • Java evaluator API
  • Server mode message passing API
  • ~/.pkl/settings.pkl file

In particular I think the settings file is an important place to be able to configure rewrite rules. Not everything needs to be a project, and in some cases can't be a project. This enables regular evaluation of Pkl without needing to introduce projects.

It also is really useful for, say, if you have a machine image (e.g. Docker) that you want to distribute that is already configured to talk to an internal host.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Via the API" was intended to refer to the Java Evaluator API (hence the proposed changes to EvaluatorBuilder). If this is reworded, are you expecting support for other APIs?

Similarly, how do we want to do these for SPICEs generally? List all surfaces where some work needs doing? Then it also needs to list every supported language binding / build system. I was generally assuming it understood that any change/addition of the Evaluator API implies changes/additions to build systems / server / etc. Basically; do we want SPICEs to show the explicit design for every contact surface?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all components that are expected to require changes should at least be mentioned, especially those in apple/pkl. Supporting a new feature in the Evaluator API doesn’t necessarily mean that it should also be supported in the Gradle plugin.

=== Implementation

The proposed design is to implement this behaviour in `IoUtils::resolve`.
`IoUtils::resolve` is used throughout to resolve URIs, regardless of whether a URI is used in an `import`, `import*`, `extends`, `amends`, `read`, or `read*`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably use org.pkl.core.http.RequestRewritingClient instead, because IoUtils.resolve doesn't affect how packages get loaded.

We can definitely use org.pkl.core.http.RequestRewritingClient if we only permit http/https rewrites. It'd be more problematic if we allow other types of rewrites though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ This goes back to whether we want to restrict rewriting to HTTP(S) and ignore custom resource readers.

@bioball bioball changed the title Add SPICE-0002 on URI rewriting SPICE-0002: URI Rewriting for Mirroring May 1, 2024
Mirroring packages from internet repositories in a local store is insufficient, because the mirrored packages may have hard-coded dependencies.
Importing mirrored packages, using URIs pointing to the mirror, will eventually evaluate `import`s of those hard-coded dependencies, referring to external sites.

Another reason to want to consume (transitive) dependencies from a mirror is the requirement of altered versions of packages.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This amounts to monkey patching. Is it wise for a deterministic config language to encourage/support this?

Copy link
Contributor

@bioball bioball May 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO: I don't think this is a valid use-case; packages have their integrity verified by checksum. Altering a package would break any integrity checks that we would do. The checksums within PklProject.deps.json should be consistent regardless of rewrite rules.

Packages whose contents need to be altered should probably become forks.


For systems that require run-time air-gapping, packages imported from `pkg.pkl-lang.org` or other sites on the internet are not available.
This can be remedied by pre-populating a cache directory with all required dependencies (see link:../spices/0001-import-graph-analyzer.pkl[SPICE 1: Import Graph Analyzer API]).
When dependencies can not be determined ahead of time, pre-populating the cache is not an option.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this happen?

* @param replace The literal string to match.
* @param with The literal string to substitute for the first match of [replace].
*/
public EvaluatorBuilder addRewriteRule(String replace, String with) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the method name should be more specific. Suggestions: addModuleUriRewriteRule, addImportUriRewriteRule, rewriteImportUri. When I see String replace, I’m thinking that’s the replacement.


== Introduction

This SPICE discusses rewrite rules for URIs of modules being `import`ed, so as to use mirrored alternatives.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a scary feature. I’m worried that it might cause more harm than good. Does Pkl do “feature previews” like (say) Java? Does this feature exist in other languages, especially in languages with a similar safety mindset?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is naturally an opt-in feature. Pkl would not do any rewriting unless specifically configured to do so. Unless you were thinking of something else? E.g. a library author being able to opt out of rewrites?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants