-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPICE-0002: URI Rewriting for Mirroring #4
base: main
Are you sure you want to change the base?
Conversation
* @param with The string to substitute for the first match of [replace], which may contain | ||
* references to capture groups (`$0`, `$1`, etc). | ||
*/ | ||
public EvaluatorBuilder addRewriteRule(Pattern replace, String with) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with allowed modules, allowed resources, I would just have Pattern
here.
Might be good to have a record RewriteRule(Pattern replace, String with)
; and to have:
addRewriteRule(RewriteRule rewriteRule)
setRewriteRules(Collection<RewriteRule> rewriteRules)
And maybe these should be added to HttpClient#Builder
instead of EvaluatorBuilder
@Since { version = "0.26.0" } | ||
class UriRewrite { | ||
/// What to match in the given URI. | ||
replace: String|Regex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using pkl.Regex
as a configuration value is an anti-pattern, because it is specifically a Java 8 compatible regex, and not portable across languages.
Following that, I think this should be a String
instead:
/// A regular expression pattern describing what to match in a given URI.
///
/// Regular expressions follow Java rules.
@SourceCode { language = "RegExp" }
replace: String(isRegex)
And with
unconditionally accepts placeholders for capture groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping for an easy way to allow non-regex pattern specification, to not have to always have to handle escaping (in the URI case, especially of .
) and I wasn't considering the case where Pkl's own configuration would be object-mapped to a host language.
Mind you, Regex
uses Pattern.compile(pattern, Pattern.UNICODE_CHARACTER_CLASS | Pattern.UNICODE_CASE)
, where isRegex
uses Pattern.compile(self, Pattern.UNICODE_CASE)
, so they're the same Java language level, regardless, and the former is slightly more expressive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, since we are dealing with Java regexes here, it's quite nice that we can use the isRegex
helper.
The pattern that I was saying we should follow is to use strings to represent regex values, which is a universal way to describe them (String
is a suitable type no matter the regex flavor).
IMO: Regex
should only be used for in-language string matching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users need to understand the exact regex semantics. “follow Java rules” is too vague (and why refer Pkl users to Java?), and isRegex
only covers regex syntax. Because the regex will be evaluated by Pkl (even if it happens in Java code), replace: Regex
seems logical to me. I don’t see why portability matters here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users need to understand the exact regex semantics. “follow Java rules” is too vague (and why refer Pkl users to Java
I disagree with this, actually. When writing regex, it's much easier to understand "this is a Java regex" than needing to look through the full set of available features. It's not any different than the many tools that refer their users to Perl (see PCRE).
As a matter of principle, Regex
shouldn't be used to express configuration data in Pkl; it's a data type that's meant for in-language operations. The same can be said of IntSeq
, Pair
, List
, Map
, etc.
The first, longest-matching rewrite rule is applied. | ||
In other words: | ||
- when multiple rules match, a rule that matches a longer substring of the URI is preferred over one matching a shorter substring; and | ||
- when multiple rules match the same substring length, the first rule is chosen (in definition order). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these rewrite rules are user-defined as an in-order list, I think we can just pick the first rewrite rule that matches.
It would also be cheaper this way; it allows us to short-circuit once we find a rule that matches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also want to validate that a rewrite can only be rewritten to the same scheme, and perhaps that the rewrite is only for http/https.
Rewriting to a different scheme can be problematic because of minor differences between how different schemes work (e.g. HTTPS urls don't support glob imports).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we want this to intentionally exclude custom resource readers?
106 | text = renderer.renderDocument(value) | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
at pkl.base#Module.output.text (https://github.com/apple/pkl/blob/0.25.2/stdlib/base.pkl#L106) | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work if the rewrite rule is provided via another means, like via a CLI flag?
Also, it doesn't sound great to insert stack frames to point to nodes that aren't actually being executed. We do this sparingly (e.g. to represent values that initialized as parse-time constants, which is a language implementation optimization detail).
Rather than this, I suggest that we decorate the error with hints. Take a look at VmExceptionBuilder#withHint
.
new UriRewrite { | ||
replace = Regex("^") | ||
with = throw("URI found that is uncovered by any of the rewrite rules") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this can work--in order to configure the evaluator at runtime, all of the rewrite rules need to be consumed (ergo, deep forced).
* Status: Review | ||
* Implemented in: Pkl 0.26.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Status: Review | |
* Implemented in: Pkl 0.26.0 | |
* Status: TBD | |
* Implemented in: TBD |
@@ -1,6 +1,6 @@ | |||
= SPICE Name | |||
|
|||
* Proposal: [SPICE-NNNN](./SPICE-NNNN-name-of-proposal.adoc) | |||
* Proposal: link:./NNNN-name-of-proposal.adoc[SPICE-NNNN] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this is causing a merge conflict; rebase and resolve
|
||
1. Via the API; | ||
2. Via `pkl:Project`; and | ||
3. Via arguments on the CLI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also:
- Gradle plugin
- Java evaluator API
- Server mode message passing API
~/.pkl/settings.pkl
file
In particular I think the settings file is an important place to be able to configure rewrite rules. Not everything needs to be a project, and in some cases can't be a project. This enables regular evaluation of Pkl without needing to introduce projects.
It also is really useful for, say, if you have a machine image (e.g. Docker) that you want to distribute that is already configured to talk to an internal host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Via the API" was intended to refer to the Java Evaluator API (hence the proposed changes to EvaluatorBuilder
). If this is reworded, are you expecting support for other APIs?
Similarly, how do we want to do these for SPICEs generally? List all surfaces where some work needs doing? Then it also needs to list every supported language binding / build system. I was generally assuming it understood that any change/addition of the Evaluator API implies changes/additions to build systems / server / etc. Basically; do we want SPICEs to show the explicit design for every contact surface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all components that are expected to require changes should at least be mentioned, especially those in apple/pkl. Supporting a new feature in the Evaluator API doesn’t necessarily mean that it should also be supported in the Gradle plugin.
=== Implementation | ||
|
||
The proposed design is to implement this behaviour in `IoUtils::resolve`. | ||
`IoUtils::resolve` is used throughout to resolve URIs, regardless of whether a URI is used in an `import`, `import*`, `extends`, `amends`, `read`, or `read*`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably use org.pkl.core.http.RequestRewritingClient
instead, because IoUtils.resolve
doesn't affect how packages get loaded.
We can definitely use org.pkl.core.http.RequestRewritingClient
if we only permit http
/https
rewrites. It'd be more problematic if we allow other types of rewrites though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^ This goes back to whether we want to restrict rewriting to HTTP(S) and ignore custom resource readers.
Mirroring packages from internet repositories in a local store is insufficient, because the mirrored packages may have hard-coded dependencies. | ||
Importing mirrored packages, using URIs pointing to the mirror, will eventually evaluate `import`s of those hard-coded dependencies, referring to external sites. | ||
|
||
Another reason to want to consume (transitive) dependencies from a mirror is the requirement of altered versions of packages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This amounts to monkey patching. Is it wise for a deterministic config language to encourage/support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO: I don't think this is a valid use-case; packages have their integrity verified by checksum. Altering a package would break any integrity checks that we would do. The checksums within PklProject.deps.json
should be consistent regardless of rewrite rules.
Packages whose contents need to be altered should probably become forks.
|
||
For systems that require run-time air-gapping, packages imported from `pkg.pkl-lang.org` or other sites on the internet are not available. | ||
This can be remedied by pre-populating a cache directory with all required dependencies (see link:../spices/0001-import-graph-analyzer.pkl[SPICE 1: Import Graph Analyzer API]). | ||
When dependencies can not be determined ahead of time, pre-populating the cache is not an option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this happen?
* @param replace The literal string to match. | ||
* @param with The literal string to substitute for the first match of [replace]. | ||
*/ | ||
public EvaluatorBuilder addRewriteRule(String replace, String with) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the method name should be more specific. Suggestions: addModuleUriRewriteRule
, addImportUriRewriteRule
, rewriteImportUri
. When I see String replace
, I’m thinking that’s the replacement.
|
||
== Introduction | ||
|
||
This SPICE discusses rewrite rules for URIs of modules being `import`ed, so as to use mirrored alternatives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a scary feature. I’m worried that it might cause more harm than good. Does Pkl do “feature previews” like (say) Java? Does this feature exist in other languages, especially in languages with a similar safety mindset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is naturally an opt-in feature. Pkl would not do any rewriting unless specifically configured to do so. Unless you were thinking of something else? E.g. a library author being able to opt out of rewrites?
This adds a SPICE for URI rewriting for e.g. running Pkl in air-gapped / mirroring environments.