This repository has been archived by the owner on May 3, 2022. It is now read-only.

Shading of PMML breaks apps that extend MLUpdate #336

Open
srowen opened this issue Jul 12, 2017 · 2 comments

@srowen
Member

srowen commented Jul 12, 2017

Quoting from the mailing list:

    I ran into a problem when running the batch layer.
    I wrote a batch layer update, LRScalaUpdate, in Scala; it extends MLUpdate and overrides the buildModel() and evaluate() methods. Then I get an exception when running the batch layer.
    I'm wondering why it calls MLUpdate.buildModel instead of my LRScalaUpdate.buildModel.
    Can you give me some suggestions? Thank you.

17/07/12 14:55:06 INFO cluster.YarnClusterScheduler: Removed TaskSet 7.0, whose tasks have all completed, from pool 
17/07/12 14:55:06 INFO scheduler.DAGScheduler: ResultStage 7 (isEmpty at MLUpdate.java:360) finished in 0.093 s
17/07/12 14:55:06 INFO scheduler.DAGScheduler: Job 7 finished: isEmpty at MLUpdate.java:360, took 0.109474 s
Exception in thread "streaming-job-executor-0" java.lang.AbstractMethodError: com.cloudera.oryx.ml.MLUpdate.buildModel(Lorg/apache/spark/api/java/JavaSparkContext;Lorg/apache/spark/api/java/JavaRDD;Ljava/util/List;Lorg/apache/hadoop/fs/Path;)Loryx/org/dmg/pmml/PMML;
	at com.cloudera.oryx.ml.MLUpdate.buildAndEval(MLUpdate.java:314)
	at com.cloudera.oryx.ml.MLUpdate.lambda$findBestCandidatePath$0(MLUpdate.java:259)
	at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
	at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
...

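Oryx shades (relocates) the JPMML classes it uses, org.jpmml.* and org.dmg.*, into oryx.org.* to avoid classpath conflicts with Spark, which has its own JPMML dependency. That's fine as long as the shaded classes stay internal to Oryx.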

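Except there's one key thing I overlooked: MLUpdate effectively forms an API outside the api package, and its signatures use one PMML class, org.dmg.pmml.PMML, as the return type of buildModel(). In the shaded Oryx build that return type becomes oryx.org.dmg.pmml.PMML, while a subclass compiled against unshaded JPMML declares buildModel() returning org.dmg.pmml.PMML. The JVM matches overrides by name and full descriptor, return type included, so the subclass method does not implement the abstract one, and calling it fails with the AbstractMethodError above. That is also why the error names MLUpdate.buildModel rather than LRScalaUpdate.buildModel.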
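To make the mismatch concrete, here is a toy Java sketch that applies the org.dmg to oryx.org.dmg relocation to buildModel's method descriptor as a plain string. It is not how the Shade plugin works internally (the plugin rewrites class references in bytecode), and the unshaded descriptor is inferred from the shaded one in the stack trace. RelocationDemo and relocate are hypothetical names.

```java
/**
 * Toy model of a Maven Shade "relocation" applied to a JVM method descriptor.
 * Hypothetical helper: the real plugin rewrites class references in bytecode;
 * this only rewrites the internal class names inside a descriptor string.
 */
public class RelocationDemo {

    /** Rewrites every object type whose internal name starts with fromPrefix. */
    static String relocate(String descriptor, String fromPrefix, String toPrefix) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < descriptor.length()) {
            char c = descriptor.charAt(i);
            if (c == 'L') {
                // Object type: 'L' + internal name + ';'
                int end = descriptor.indexOf(';', i);
                String name = descriptor.substring(i + 1, end);
                if (name.startsWith(fromPrefix)) {
                    name = toPrefix + name.substring(fromPrefix.length());
                }
                out.append('L').append(name).append(';');
                i = end + 1;
            } else {
                // '(', ')', '[' or a primitive type code: copy unchanged
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // buildModel's descriptor as a subclass compiled against unshaded JPMML sees it
        String original = "(Lorg/apache/spark/api/java/JavaSparkContext;"
            + "Lorg/apache/spark/api/java/JavaRDD;Ljava/util/List;"
            + "Lorg/apache/hadoop/fs/Path;)Lorg/dmg/pmml/PMML;";
        // Toy version of the org.dmg -> oryx.org.dmg relocation
        String shaded = relocate(original, "org/dmg/", "oryx/org/dmg/");
        System.out.println(original);
        System.out.println(shaded);
        // Different descriptors mean different methods to the JVM, so the
        // subclass's buildModel does not implement MLUpdate's abstract one
        System.out.println("same method? " + original.equals(shaded));
    }
}
```

Running main prints the unshaded descriptor, then the shaded one (identical to the descriptor in the stack trace), then `same method? false`.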

@srowen
Member Author

srowen commented Jul 12, 2017

Currently, this can be worked around by relocating JPMML in the client app the same way Oryx does, with the Maven Shade plugin:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.0.0</version>
    <executions>
        <execution>
            <id>shade</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
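                <artifactSet>
                    <!-- Placeholder: use your app's own groupId. Only your classes
                         are packaged; the relocated JPMML classes are already in
                         the Oryx jar under oryx.org.* at runtime. -->
                    <includes>
                        <include>your.group:*</include>
                    </includes>
                </artifactSet>
                <!-- Rewrite references to org.jpmml.* and org.dmg.* to the
                     package names Oryx shades JPMML into. -->
                <relocations>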
                    <relocation>
                        <pattern>org.jpmml</pattern>
                        <shadedPattern>oryx.org.jpmml</shadedPattern>
                        <includes>
                            <include>org.jpmml.**</include>
                        </includes>
                    </relocation>
                    <relocation>
                        <pattern>org.dmg</pattern>
                        <shadedPattern>oryx.org.dmg</shadedPattern>
                        <includes>
                            <include>org.dmg.**</include>
                        </includes>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>
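One way to check whether the workaround took effect is to compare, at runtime, the JVM descriptors of MLUpdate.buildModel and the subclass's buildModel. This is a minimal sketch, assuming it runs with the same classpath as the batch layer (Spark, Hadoop, Oryx and the app jar); Descriptors and its methods are hypothetical helpers, not part of Oryx.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical diagnostic: prints JVM method descriptors, the form shown in
 * AbstractMethodError messages, for methods with a given name.
 */
public class Descriptors {

    /** JVM field descriptor for a class, e.g. int -> "I", String -> "Ljava/lang/String;". */
    static String typeDescriptor(Class<?> c) {
        if (c == void.class) return "V";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c.isArray()) {
            // Array class names are already descriptor-shaped, e.g. "[Ljava.lang.String;"
            return c.getName().replace('.', '/');
        }
        return "L" + c.getName().replace('.', '/') + ";";
    }

    /** Full method descriptor: parameter types, then return type. */
    static String methodDescriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) {
            sb.append(typeDescriptor(p));
        }
        return sb.append(')').append(typeDescriptor(m.getReturnType())).toString();
    }

    /** Prints every method declared directly on owner with the given name. */
    static void printDescriptors(Class<?> owner, String name) {
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                System.out.println(owner.getName() + "." + name + methodDescriptor(m));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        for (String className : args) {
            // Load without running static initializers
            Class<?> c = Class.forName(className, false, Descriptors.class.getClassLoader());
            printDescriptors(c, "buildModel");
        }
    }
}
```

For example, `java Descriptors com.cloudera.oryx.ml.MLUpdate com.example.LRScalaUpdate` (the subclass's package is made up here) should print matching descriptors, both ending in `Loryx/org/dmg/pmml/PMML;`, once the relocation is applied; a subclass descriptor ending in `Lorg/dmg/pmml/PMML;` means the app jar was not relocated.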

@srowen
Member Author

srowen commented Jul 13, 2017

Any solution I can think of requires an API change to MLUpdate, which, while not a formal API, is something people might want to extend. On the other hand, anyone trying to extend it today will find that it doesn't work anyway.

The modified API would be a little clunky, making people pass Strings instead of PMML objects.

The workaround above isn't that bad and can be documented in the example project. Also, Spark 2.3 will shade JPMML, which would let Oryx stop shading it, perhaps in Oryx 2.6.

For now, I favor just documenting the workaround and removing all the shading later, since that is probably the smallest change.

srowen added a commit to srowen/oryx that referenced this issue Jul 13, 2017
srowen added a commit to srowen/oryx that referenced this issue Jul 13, 2017
srowen added this to the 2.6.0 milestone Sep 4, 2017
srowen modified the milestones: 2.6.0, 2.7.0 Oct 25, 2017
srowen modified the milestones: 2.7.0, 3.0.0 Aug 3, 2018