Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(crd-generator): Add CRD-Generator Maven plugin #5979

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

baloo42
Copy link
Contributor

@baloo42 baloo42 commented May 3, 2024

Description

Add CRD-Generator Maven plugin (using api-v2).

Fixes #5944

Examples: https://github.com/baloo42/crd-generator-maven-examples

mvn crd-generator:help -Ddetail=true -Dgoal=generate

crd-generator:generate

  Available parameters:

    classpath (Default: WITH_RUNTIME_DEPENDENCIES)
      The classpath which should be used during the CRD generation.
      Choice of:
      * PROJECT_ONLY: Only classes in the project.
      * WITH_RUNTIME_DEPENDENCIES: Classes from the project and any runtime
      dependencies.
      * WITH_COMPILE_DEPENDENCIES: Classes from the project and any compile time
      dependencies.
      * WITH_ALL_DEPENDENCIES: Classes from the project, compile time and
      runtime dependencies.
      * WITH_ALL_DEPENDENCIES_AND_TESTS: Classes from the project (including
      tests), compile time, runtime and test dependencies.
      User property: fabric8.crd-generator.classpath

    customResourceClasses
      Custom Resource classes, which should be considered to generate the CRDs.
      If set, scanning is disabled.
      User property: fabric8.crd-generator.customResourceClasses

    dependenciesToScan
      Dependencies which should be scanned for Custom Resources.
      User property: fabric8.crd-generator.dependenciesToScan

    exclusions
      Exclusions, used to filter Custom Resource classes after scanning.
      User property: fabric8.crd-generator.exclusions

    forceIndex (Default: false)
      If true, a Jandex index will be created even if the directory or JAR file
      contains an existing index.
      User property: fabric8.crd-generator.forceIndex

    forceScan (Default: false)
      If true, directories and JAR files are scanned even if Custom Resource
      classes are given.
      User property: fabric8.crd-generator.forceScan

    implicitPreserveUnknownFields (Default: false)
      If true, x-kubernetes-preserve-unknown-fields: true will be added on
      objects which contain an any-setter or any-getter.
      User property: fabric8.crd-generator.implicitPreserveUnknownFields

    inclusions
      Inclusions, used to filter Custom Resource classes after scanning.
      User property: fabric8.crd-generator.inclusions

    outputDirectory (Default:
    ${project.build.outputDirectory}/META-INF/fabric8/)
      The output directory where the CRDs are emitted.
      User property: fabric8.crd-generator.outputDirectory

    parallel (Default: true)
      If true, the CRDs will be generated in parallel.
      User property: fabric8.crd-generator.parallel

    skip (Default: false)
      If true, execution will be skipped.
      User property: fabric8.crd-generator.skip

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I Added CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@baloo42 baloo42 force-pushed the crd-generator-maven-plugin branch 2 times, most recently from fa9d221 to dc50235 Compare May 8, 2024 06:59
@baloo42
Copy link
Contributor Author

baloo42 commented May 8, 2024

@manusa Can you have second look on this? Is this now going into the right direction?

The logic to collect and load custom resource classes is now included in an own module "crd-generator-collector" intended to be shared between maven-plugin, gradle-plugin, cli etc.

TBD:

  • Do we have to care about Zip Bombs, if we collect custom resource classes from a jar file? (I have added some safeguards but sonar is not recognizing it)
  • Do we need reset methods? (the collector classes contain a state) --> added
  • Do we want to include the examples in some way? --> https://github.com/baloo42/crd-generator-maven-examples
    --> added as integration tests

A prototype for the CLI which is using the same base can be found here: baloo42#9

@baloo42
Copy link
Contributor Author

baloo42 commented May 8, 2024

@shawkins, @andreaTP Maybe you can have a look on this too.

Marc has mentioned that he is not sure about the final / future purpose of the new module (me too), so I'll try to explain a few thoughts I had.

To use the CRD-Generator from CLI, Maven, Gradle etc. we need in general two main features which are not implemented in the api-v2 module:

  • Search / Collect / Filter custom resources classes
  • Customizable class loading

The new crd-generator-collector module implements both features via the main component CustomResourceCollector which can be used in the following way:

CustomResourceCollector customResourceCollector = new CustomResourceCollector()
        .withClasspathElements(<additional classpath elements>)
        .withFilesToIndex(<e.g. directory with class files or jar file with class files>)

CustomResourceInfo[] customResourceInfos = customResourceCollector.findCustomResources();

CRDGenerator crdGenerator = new CRDGenerator()
        .customResources(customResourceInfos)
        .inOutputDir(outputDirectory);
        
crdGenerator.detailedGenerate()

In conclusion this approach is some kind of "pre-processor" for the CRDGenerator.

Another approach would be to implement it as a facade ("processor"):

CRDGeneratorFacade facade = new CRDGeneratorFacade()
        .withClasspathElements(<additional classpath elements>)
        .withFilesToIndex(<e.g. directory with class files or jar file with class files>)
        .inOutputDir(outputDirectory)

facade.detailedGenerate()

This approach is nicer from a user perspective but we need to passthrough every option (or use some kind of nested builder pattern).

A third approach would be to implement both features in the api-v2 module itself and make jandex an optional dependency (or allow to extend it by using a service loader).

CRDGenerator crdGenerator = new CRDGenerator()
        .withClasspathElements(<additional classpath elements>)
        .withFilesToIndex(<e.g. directory with class files or jar file with class files>)
        .inOutputDir(outputDirectory)

crdGenerator.detailedGenerate()

This approach might be the nicest but it would also be the most advanced: the api-v2 module has now to deal somehow with additional dependencies like Jandex and the code base of the api-v2 would grow.

Which approach do you prefer? Other thoughts?

@shawkins
Copy link
Contributor

shawkins commented May 8, 2024

Which approach do you prefer? Other thoughts?

I'd like to keep the core api as simple as possible - whatever is needed to satisfy Quarkus should be good.

The first option seems good enough. I don't really see much need for coupling between the collector and the crd generator - it doesn't really need to provide CustomResourceInfo, just classes. I mentioned on a pr somewhere that the collector to me just looks like something that you could provide a list of interfaces / base classes, and a list of annotations, and it will just find everything that matches that.

@baloo42
Copy link
Contributor Author

baloo42 commented May 8, 2024

I'd like to keep the core api as simple as possible - whatever is needed to satisfy Quarkus should be good.
👍

The first option seems good enough. I don't really see much need for coupling between the collector and the crd generator - it doesn't really need to provide CustomResourceInfo, just classes.

Yes, you are right. Providing only classes could basically do the job but then we cannot implement high level filtering (e.g. by group or version). Do we need high level filtering? If yes: High level filtering could also be implemented in api-v2 so that we can stay by the strategy to return only classes instead of CustomResourceInfo's.

I mentioned on a pr somewhere that the collector to me just looks like something that you could provide a list of interfaces / base classes, and a list of annotations, and it will just find everything that matches that.

Yes, at the moment the base interface (HasMetadata) and the additional required annotations (Version, Group) to identify a custom resource class are hardcoded. But this can be easily changed to be configurable (with sane defaults) if you think this is useful. Do we need to search for other Interfaces/Annotations?

@andreaTP
Copy link
Member

andreaTP commented May 8, 2024

Completely agree with what @shawkins said, let's keep the core API as simple as possible.
I don't think that filtering is required, and can be implemented in the tooling when necessary.

@shawkins
Copy link
Contributor

shawkins commented May 8, 2024

Do we need high level filtering? If yes: High level filtering could also be implemented in api-v2 so that we can stay by the strategy to return only classes instead of CustomResourceInfo's.

Right I'd vote for putting that into the api if needed.

It would probably be good to consider whether having a public constructor on CustomResourceInfo makes sense - that opens up a different usage model for the CRDGenerator that isn't bound to the typical annotations.

Do we need to search for other Interfaces/Annotations?

Not yet. I was just highlighting that it doesn't really need to be seen as that specific to CRs.

@baloo42
Copy link
Contributor Author

baloo42 commented May 8, 2024

Thanks for the feedback. To sum it up, I have the following todo's so far:

  • Return a list of custom resource classes instead of a list of CustomResourceInfo's
  • Remove high level filtering from crd-generator-collector module with no replacement atm
  • Try to generalize the used interfaces / annotations to identify a custom resource

@shawkins
Copy link
Contributor

shawkins commented May 8, 2024

Try to generalize the used interfaces / annotations to identify a custom resource

Don't worry about this step just yet, what you currently have should work fine for the first iteration.

@baloo42 baloo42 force-pushed the crd-generator-maven-plugin branch from 3705e74 to 2842f0d Compare May 9, 2024 18:56
return customResourceClasses;
}

public Class<? extends HasMetadata>[] findCustomResourceClasses() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is required because the CRDGenerator accepts only an array.

Can we extend the CRDGenerator with an additional method?

  public CRDGenerator customResourceClasses(Collection<Class<? extends HasMetadata>> crClasses) {
    return customResources(crClasses.stream().map(CustomResourceInfo::fromClass).toArray(CustomResourceInfo[]::new));
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawkins This question was mostly intended for you. What plans do you have for:

https://github.com/fabric8io/kubernetes-client/blob/main/crd-generator/api-v2/src/main/java/io/fabric8/crdv2/generator/CRDGenerator.java#L114

Are you open to add an alternative with a collection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baloo42 yes, feel free to add whatever method works best.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we could possibly make API changes in the v2 version if needed… as long as it is possible to generate CRDs programmatically, incrementally registering classes to generate from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added two additional input methods for collections (both nullsafe) and kept the existing ones, so that upstream projects can now choose between a collection, an array or a single CustomResourceInfo or CustomResource class.

Note that we could possibly make API changes in the v2 version if needed… as long as it is possible to generate CRDs programmatically, incrementally registering classes to generate from.

Good to know, but IMHO we should keep both possibilities for now, so that we don't introduce unecessary changes. Please let me now if I should mark the array methods as deprecated, otherwise this thread can be marked as resolved in my opinion.

*
* @see JandexCustomResourceClassScanner#findCustomResourceClasses(IndexView)
*/
private Index createBaseIndex() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be optimized if required.
Existing indices of the fabric8 modules can be used or a specialized index can be created at build time.

Do we need further optimization within this PR?

Copy link
Contributor

@shawkins shawkins May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should reuse the existing index is possible - since users wil need to modify their pom / gradle file to use these plugins, it doesn't seem out of the question to just document also requiring the jandex plugin be configured as well for projects that want to create crds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently writing some usage docs about the different use cases, including when additional configuration is required. This should help to get an overview from the user perspective.

Some notes on a more technically perspective:

  • Jandex is only required to "scan" for Custom Resource classes. If the list of classes is given, no scanning at all is required. (The classloader does the heavy lifting)
  • The implementation detects existing indices in the scan targets automatically and prefers those. Only if a scan target does not contain an index, an own in-memory index will be created. This means having an existing index is just optimization, never a requirement.
  • Custom Resource classes must be loaded before they can be used by the CRDGenerator itself. Any dependency which is required to load those classes must be in the classpath.
  • A scan target must be also in the classpath. But not every classpath element is a scan target.
  • Additional indexed classes beside the "base-index" are only required, if the user wants to scan a target with Custom Resource classes which don't implement the CustomResource or HasMetadata interface directly. In this case, the user must ensure that the used intermediate class is included somehow in the underlying composite index. For example, he can add the dependency which includes the class to the list of scan targets, so that an index is automatically created if needed.

In conclusion, a user doesn't need to know anything about Jandex. Even for the edge case from the last bullet point, the user only has to care about including all "relevant" classes to the list of scan targets and not Jandex specifically.

(But if his project already uses Jandex, the scans will be a little bit faster.)

Keeping this in mind, optimizing the creation of the base index is mostly about performance.
Only if there are other "intermediate" classes like "CustomResource", then we should add them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote an integration test for the "intermediate class" case (last bullet point from above):

ddc3bec#diff-0488fc4e2f91d3e88ce94d75d1317631a14f2854194a2cf68354be4712e9c2cb

In this case, the user must add the dependency, which contains the intermediate class, to the list of dependencies to scan. Under the hood the classes are now added to the composite index and Custom Resources Classes will be found by the collector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any questions on this open or can we resolve this conversion?

crd-generator/maven-plugin/pom.xml Outdated Show resolved Hide resolved

private int maxJarEntries = 100000;
private long maxBytesReadFromJar = 100000000; // 100 MB
private long maxClassFileSize = 1000000; // 1 MB
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need limits?

@baloo42 baloo42 marked this pull request as ready for review May 12, 2024 19:24
@baloo42 baloo42 force-pushed the crd-generator-maven-plugin branch from cc4ec43 to b79bae8 Compare May 15, 2024 09:54
@baloo42 baloo42 force-pushed the crd-generator-maven-plugin branch from 59dffef to 84f1800 Compare May 30, 2024 10:26
@baloo42 baloo42 force-pushed the crd-generator-maven-plugin branch from 98c50e7 to f216529 Compare May 30, 2024 13:30
@metacosm
Copy link
Collaborator

Which approach do you prefer? Other thoughts?

I'd like to keep the core api as simple as possible - whatever is needed to satisfy Quarkus should be good.

The first option seems good enough. I don't really see much need for coupling between the collector and the crd generator - it doesn't really need to provide CustomResourceInfo, just classes. I mentioned on a pr somewhere that the collector to me just looks like something that you could provide a list of interfaces / base classes, and a list of annotations, and it will just find everything that matches that.

I need to look deeper at this PR but one thing that is raising my interest is the use of Jandex. The Quarkus extension for the Java Operator SDK is using classes to get to the needed CustomResourceInfo… If we'd have a generator that worked directly off of Jandex' ClassInfo, that'd be a win. That said, the Jackson-based approach requires classes so… 🤷🏼
What we really need is a Jackson processor that works from Jandex indices! 😀

Copy link

sonarcloud bot commented May 31, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot
75.4% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

@baloo42
Copy link
Contributor Author

baloo42 commented May 31, 2024

I need to look deeper at this PR but one thing that is raising my interest is the use of Jandex. The Quarkus extension for the Java Operator SDK is using classes to get to the needed CustomResourceInfo… If we'd have a generator that worked directly off of Jandex' ClassInfo, that'd be a win. That said, the Jackson-based approach requires classes so… 🤷🏼
What we really need is a Jackson processor that works from Jandex indices! 😀

I agree, using directly such ClassInfo objects would be a nice solution. As you said, the underlying JsonSchemaGenerator from Jackson requires classes atm. We would need Jackson to directly work with Jandex or create synthetically classes from ClassInfo objects. But I'm really unsure if this is possible or worth it.

For now Jandex is only used to "scan" for Custom Resource classes. If a project already uses Jandex to create an index (or a dependency, which should be scanned, contains a Jandex index) then those existing indices will be used instead of generating an own index, so that finding Custom Resource classes should be little bit faster.

@metacosm
Copy link
Collaborator

I need to look deeper at this PR but one thing that is raising my interest is the use of Jandex. The Quarkus extension for the Java Operator SDK is using classes to get to the needed CustomResourceInfo… If we'd have a generator that worked directly off of Jandex' ClassInfo, that'd be a win. That said, the Jackson-based approach requires classes so… 🤷🏼
What we really need is a Jackson processor that works from Jandex indices! 😀

I agree, using directly such ClassInfo objects would be a nice solution. As you said, the underlying JsonSchemaGenerator from Jackson requires classes atm. We would need Jackson to directly work with Jandex or create synthetically classes from ClassInfo objects. But I'm really unsure if this is possible or worth it.

Yes, it's probably not worth it at this point as that would be a major undertaking, not to mention that we want the CRD generator to follow Jackson's conventions…

For now Jandex is only used to "scan" for Custom Resource classes. If a project already uses Jandex to create an index (or a dependency, which should be scanned, contains a Jandex index) then those existing indices will be used instead of generating an own index, so that finding Custom Resource classes should be little bit faster.

In a Quarkus extension, like is the case for the Quarkus extension for the Java Operator SDK, we already have an index, so being able to pass the index to the generator would be good, indeed.

@baloo42
Copy link
Contributor Author

baloo42 commented May 31, 2024

For now Jandex is only used to "scan" for Custom Resource classes. If a project already uses Jandex to create an index (or a dependency, which should be scanned, contains a Jandex index) then those existing indices will be used instead of generating an own index, so that finding Custom Resource classes should be little bit faster.

In a Quarkus extension, like is the case for the Quarkus extension for the Java Operator SDK, we already have an index, so being able to pass the index to the generator would be good, indeed.

As far as I understand atm the code from QOSDK and Quarkus, I assume the following (please correct me if it's wrong):

  • A jandex index will be always generated for the current project (Because it's a core Quarkus feature for CDI)
  • In QOSDK, the index is used to find reconcilers and associated CRDS, as well as non-reconciler associated CRDs.
  • The results of the CRD generation are important to further generate the CSV manifests

The current code of the CustomResourceCollector already supports contributing existing indices:

https://github.com/baloo42/kubernetes-client/blob/crd-generator-maven-plugin/crd-generator/collector/src/main/java/io/fabric8/crd/generator/collector/CustomResourceCollector.java#L98

Therefore it should be possible to reuse the collector module from within QOSDK. At least for the non-reconciler associated CRDs:

https://github.com/quarkiverse/quarkus-operator-sdk/blob/next-fabric8-version/core/deployment/src/main/java/io/quarkiverse/operatorsdk/deployment/CRDGenerationBuildStep.java#L78

This would result in the exact same behaviour to find a custom resource class if we compare qosdk, crd-generator maven plugin, gradle plugin etc. At the same time it wouldn't need too much changes on QOSDK side.

On CustomResourceCollector side, we only have to ensure that we can disable creating own indices entirely e.g. by adding a flag.

What do you think?

@metacosm
Copy link
Collaborator

For now Jandex is only used to "scan" for Custom Resource classes. If a project already uses Jandex to create an index (or a dependency, which should be scanned, contains a Jandex index) then those existing indices will be used instead of generating an own index, so that finding Custom Resource classes should be little bit faster.

In a Quarkus extension, like is the case for the Quarkus extension for the Java Operator SDK, we already have an index, so being able to pass the index to the generator would be good, indeed.

As far as I understand atm the code from QOSDK and Quarkus, I assume the following (please correct me if it's wrong):

* A jandex index will be always generated for the current project (Because it's a core Quarkus feature for CDI)

* In QOSDK, the index is used to find reconcilers and associated CRDS, as well as non-reconciler associated CRDs.

* The results of the CRD generation are important to further generate the CSV manifests

The current code of the CustomResourceCollector already supports contributing existing indices:

https://github.com/baloo42/kubernetes-client/blob/crd-generator-maven-plugin/crd-generator/collector/src/main/java/io/fabric8/crd/generator/collector/CustomResourceCollector.java#L98

Therefore it should be possible to reuse the collector module from within QOSDK. At least for the non-reconciler associated CRDs:

https://github.com/quarkiverse/quarkus-operator-sdk/blob/next-fabric8-version/core/deployment/src/main/java/io/quarkiverse/operatorsdk/deployment/CRDGenerationBuildStep.java#L78

This would result in the exact same behaviour to find a custom resource class if we compare qosdk, crd-generator maven plugin, gradle plugin etc. At the same time it wouldn't need too much changes on QOSDK side.

On CustomResourceCollector side, we only have to ensure that we can disable creating own indices entirely e.g. by adding a flag.

What do you think?

Sounds reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CRD-Generator v2: Maven Plugin
5 participants