Standalone jar #1166

Open
wants to merge 32 commits into
base: master
32 commits
a184584
Include Spark in the jar.
Feb 2, 2022
77d08c5
Make logging less verbose.
Feb 2, 2022
61adce7
Improve error message for memory.
Feb 3, 2022
ff01959
Fixes.
Feb 3, 2022
fbf4470
Merge branch 'master' into Jar
ghislainfourny Feb 3, 2022
8b5a237
Merge branch 'master' into Jar
ghislainfourny Feb 8, 2022
f1538a3
Adapt default message.
Feb 11, 2022
6b1cdb0
Merge branch 'master' into Jar
ghislainfourny Apr 12, 2022
1028e49
Merge branch 'master' into Jar
ghislainfourny Jun 14, 2022
b7b5fc2
Merge branch 'master' into Jar
ghislainfourny Nov 2, 2022
d4a84ec
Merge branch 'master' into Jar
ghislainfourny Nov 7, 2022
e8472d0
Fix typo in default screeen of standalone jar
ghislainfourny Nov 7, 2022
9e1dba1
Version bump
ghislainfourny Nov 7, 2022
88f1329
Fix mistake in default screen for repl
ghislainfourny Nov 7, 2022
1ec74ec
Merge branch 'master' into Jar
ghislainfourny Nov 7, 2022
698dbfa
Merge branch 'master' into Jar
ghislainfourny Mar 23, 2023
f20c7ff
Merge branch 'master' into Jar
ghislainfourny May 4, 2023
688c213
Update Base64BinaryItem.java
ghislainfourny May 4, 2023
9960320
Merge branch 'master' into Jar
ghislainfourny May 16, 2023
9b3756e
Fix parser.
May 16, 2023
630214d
Merge branch 'master' into Jar
ghislainfourny Feb 27, 2024
191fa63
Merge branch 'master' into Jar
ghislainfourny Jul 10, 2024
a37fe9b
Merge branch 'master' of github.com:RumbleDB/rumble into Jar
Jul 10, 2024
1d841be
Upgrade versions.
Jul 10, 2024
caf2882
Merge branch 'master' into Jar
ghislainfourny Jul 10, 2024
0fe3fe8
Merge branch 'master' into Jar
ghislainfourny Oct 24, 2024
b3dd773
Merge branch 'delta-lake-functions' of github.com:RumbleDB/rumble int…
Oct 24, 2024
f70ddb3
Merge branch 'master' into Jar
ghislainfourny Oct 24, 2024
d6477cd
Merge branch 'master' of github.com:RumbleDB/rumble into Jar
Oct 28, 2024
2cf1e2e
Merge branch 'master' of github.com:RumbleDB/rumble into Jar
Oct 28, 2024
180c46d
Upgrade ant version.
Oct 28, 2024
0d88f87
Revert.
Oct 28, 2024
12 changes: 4 additions & 8 deletions pom.xml
@@ -210,31 +210,27 @@
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.4.2</version>
<scope>provided</scope>
<version>3.4.3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.4.2</version>
<scope>provided</scope>
<version>3.4.3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.12</artifactId>
<version>3.4.2</version>
<scope>provided</scope>
<version>3.4.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>3.3.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>3.4.2</version>
<version>3.4.3</version>
</dependency>
<dependency>
<groupId>org.antlr</groupId>
17 changes: 16 additions & 1 deletion src/main/java/org/rumbledb/cli/JsoniqQueryExecutor.java
@@ -33,6 +33,7 @@
import org.rumbledb.exceptions.ExceptionMetadata;
import org.rumbledb.optimizations.Profiler;
import org.rumbledb.runtime.functions.input.FileSystemUtil;
import org.slf4j.Logger;

import sparksoniq.spark.SparkSessionManager;
import java.io.IOException;
@@ -43,11 +44,13 @@
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.spark.internal.Logging;

public class JsoniqQueryExecutor {
public class JsoniqQueryExecutor implements Logging {
private RumbleRuntimeConfiguration configuration;

public JsoniqQueryExecutor(RumbleRuntimeConfiguration configuration) {
initializeLogIfNecessary(true, true);
this.configuration = configuration;
SparkSessionManager.COLLECT_ITEM_LIMIT = configuration.getResultSizeCap();
}
@@ -238,4 +241,16 @@ public long runInteractive(String query, List<Item> resultList) throws IOExcepti
return SparkSessionManager.collectRDDwithLimitWarningOnly(rdd, resultList);
}

@Override
public Logger org$apache$spark$internal$Logging$$log_() {
// TODO Auto-generated method stub
return null;
}

@Override
public void org$apache$spark$internal$Logging$$log__$eq(Logger x$1) {
// TODO Auto-generated method stub

}

}
8 changes: 7 additions & 1 deletion src/main/java/org/rumbledb/cli/Main.java
@@ -109,7 +109,13 @@ private static void handleException(Throwable ex, boolean showErrorInfo) {
"⚠️ Java went out of memory."
);
System.err.println(
"If running locally, try adding --driver-memory 10G (or any quantity you need) between spark-submit and the RumbleDB jar in the command line to see if it fixes the problem. If running on a cluster, --executor-memory is the way to go."
"If running locally with java -jar, try adding -Xmx10g (or any quantity you need) before the RumbleDB jar in the command line to see if it fixes the problem."
);
System.err.println(
"If running locally with spark-submit, try adding --driver-memory 10G (or any quantity you need) between spark-submit and the RumbleDB jar in the command line to see if it fixes the problem."
);
System.err.println(
"If running on a cluster, --executor-memory should be used instead."
);
if (showErrorInfo) {
ex.printStackTrace();
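
The three out-of-memory hints above each map a launch mode to a different memory flag. A minimal sketch of that mapping follows, assuming a hypothetical `LaunchMode` enum and `suggestedFlag` helper that are not part of this PR; only the flag names (`-Xmx`, `--driver-memory`, `--executor-memory`) come from the messages themselves.

```java
public class MemoryHint {
    // Hypothetical enum for illustration; the PR does not define launch modes.
    enum LaunchMode { JAVA_JAR, SPARK_SUBMIT_LOCAL, CLUSTER }

    // Returns the memory flag that the error message suggests for a launch mode.
    static String suggestedFlag(LaunchMode mode, int gigabytes) {
        switch (mode) {
            case JAVA_JAR:
                // JVM heap option, single leading dash
                return "-Xmx" + gigabytes + "g";
            case SPARK_SUBMIT_LOCAL:
                return "--driver-memory " + gigabytes + "G";
            default:
                return "--executor-memory " + gigabytes + "G";
        }
    }

    public static void main(String[] args) {
        // Maximum heap the JVM will attempt to use, in bytes.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.err.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MiB");
        for (LaunchMode mode : LaunchMode.values()) {
            System.err.println(mode + ": " + suggestedFlag(mode, 10));
        }
    }
}
```

Printing `Runtime.getRuntime().maxMemory()` next to the hint is one way to let a user confirm that the flag took effect.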
20 changes: 20 additions & 0 deletions src/main/java/sparksoniq/spark/SparkSessionManager.java
@@ -120,6 +120,26 @@ public SparkSession getOrCreateSession() {

private void setDefaultConfiguration() {
try {
if (System.getProperty("hadoop.home.dir") == null) {
System.err.println(
"[WARNING] The hadoop home directory was not set. Setting to \"/\"."
);
System.setProperty("hadoop.home.dir", "/");
}
String javaVersion = System.getProperty("java.version");
if (!javaVersion.startsWith("1.8") && !javaVersion.startsWith("11.")) {
System.err.println("[Error] RumbleDB requires Java 8 or Java 11.");
System.err.println("Your Java version: " + System.getProperty("java.version"));
}

/*
* System.err.println(
* "[INFO] Total available memory: " + (Runtime.getRuntime().maxMemory() / 1000000000) + " GB"
* );
* System.err.println(
* "[INFO] Total available cores: " + Runtime.getRuntime().availableProcessors()
* );
*/
this.configuration = new SparkConf();
if (this.configuration.get("spark.app.name", "<none>").equals("<none>")) {
LogManager.getLogger("SparkSessionManager")
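
The Java version check in this hunk matches on string prefixes, so a version string of plain `11` (no dot) would not satisfy `startsWith("11.")`. A possible alternative, shown only as a sketch, parses the major version number instead. The class and method names (`JavaVersionCheck`, `majorVersion`, `isSupported`) are hypothetical and not part of this PR; the set of accepted versions (8 and 11) is taken from the error message above.

```java
public class JavaVersionCheck {
    // Extracts the major version from a java.version string.
    // Legacy scheme: "1.8.0_292" yields 8. Newer scheme: "11.0.2" or "11" yields 11.
    static int majorVersion(String javaVersion) {
        // Split on the separators that can follow a version component.
        String[] parts = javaVersion.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9, the string started with "1." followed by the major version.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Accepts only the versions named in the PR's error message.
    static boolean isSupported(String javaVersion) {
        int major = majorVersion(javaVersion);
        return major == 8 || major == 11;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        if (!isSupported(version)) {
            System.err.println("[Error] RumbleDB requires Java 8 or Java 11.");
            System.err.println("Your Java version: " + version);
        }
    }
}
```

This sketch assumes the version string begins with a numeric component; a string that does not would raise `NumberFormatException`, which a production check would need to handle.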
36 changes: 11 additions & 25 deletions src/main/resources/assets/defaultscreen.txt
@@ -1,62 +1,48 @@
RumbleDB is a JSONiq engine that can be used both on your laptop and on a
cluster (e.g. with Amazon EMR or Azure HDInsight).

It runs on top of Apache Spark and must be invoked with spark-submit, both for
local use and for cluster use. Spark must be installed either on your laptop,
or on the cluster.
This is the standalone jar that does not require the installation of Spark.

If you need more control over Spark or use it on a cluster, we recommend using
the leaner jars instead, which you can download from www.rumbledb.org.

If you do not want to install Spark, then you need to use the standalone jar
instead from www.rumbledb.org.

Usage:
spark-submit <Spark arguments> <path to RumbleDB's jar> <mode> <parameters>
java -jar <path to RumbleDB's jar> <mode> <parameters>

The first optional argument specifies the mode:
**** run ****
for directly running a query from an input file or (with -q) provided directly on the command line.

It is the default mode.

spark-submit rumbledb-1.22.0.jar run my-query.jq
spark-submit rumbledb-1.22.0.jar run -q '1+1'
java -jar rumbledb-1.22.0.jar run my-query.jq
java -jar rumbledb-1.22.0.jar run -q '1+1'

You can specify an output path with -o like so:
spark-submit rumbledb-1.22.0.jar run -q '1+1' -o my-output.txt
java -jar rumbledb-1.22.0.jar run -q '1+1' -o my-output.txt

**** serve ****
for running as an HTTP server listening on the specified port (-p) and host (-h).

spark-submit rumbledb-1.22.0.jar serve -p 9090
java -jar rumbledb-1.22.0.jar serve -p 9090

RumbleDB also supports Apache Livy for use in Jupyter notebooks, which may be
even more convenient if you are using a cluster.

**** repl ****
for shell mode.

spark-submit rumbledb-1.22.0.jar repl
java -jar rumbledb-1.22.0.jar repl


**** resource use configuration ****

For a local use, you can control the number of cores, as well as allocated
memory, with:
spark-submit --master local[*] rumbledb-1.22.0.jar repl
spark-submit --master local[*] rumbledb-1.22.0.jar repl
spark-submit --master local[2] rumbledb-1.22.0.jar repl
spark-submit --master local[*] --driver-memory 10G rumbledb-1.22.0.jar repl

You can use RumbleDB remotely with:
spark-submit --master yarn rumbledb-1.22.0.jar repl

(Although for clusters provided as a service, --master yarn is often implicit
and unnecessary).

For remote use (e.g., logged in on the Spark cluster with ssh), you can set the
number of executors, cores and memory with:
spark-submit --executor-cores 3 --executor-memory 5G rumbledb-1.22.0.jar repl

For remote use, you can also use other file system paths such as S3, HDFS, etc:
spark-submit rumbledb-1.22.0.jar run hdfs://server:port/my-query.jq -o hdfs://server:port/my-output.json
java -jar -Xmx10g rumbledb-1.22.0.jar repl

More documentation on available CLI parameters is available on https://www.rumbledb.org/