[SPARK-1808] Route bin/pyspark through Spark submit #799
The bin/pyspark script takes two pathways, depending on the application. If the application is a python file, bin/pyspark passes the python file directly to Spark submit, which launches the python application as a sub-process within the JVM. If the application is the pyspark shell, however, bin/pyspark starts the python REPL as the parent process, which launches the JVM as a sub-process. A significant benefit here is that all keyboard signals are propagated first to the Python interpreter properly. The existing code already provided a code path to do this; all we need to change is to use spark-submit instead of spark-class to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX) for both cases, and using IPython.
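The two pathways described above can be sketched in Python. This is only an illustration of the dispatch rule, not the script itself: the function name `choose_pathway` and the return labels are hypothetical, and the only thing taken from the PR is the "ends in `.py`" test on the first argument.

```python
import re


def choose_pathway(args):
    """Return "submit" if the first argument is a Python file, else "shell".

    Hypothetical helper: it mirrors the rule that a .py file goes straight
    to Spark submit, while anything else starts the Python REPL as parent.
    """
    if args and re.search(r"\.py$", args[0]):
        return "submit"  # JVM is the parent, Python app is the sub-process
    return "shell"       # Python REPL is the parent, JVM is the sub-process


print(choose_pathway(["examples/src/main/python/pi.py", "100"]))
print(choose_pathway(["--master", "local"]))
```

Only the first argument is inspected, so options placed before the file name would select the shell pathway in this sketch.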
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15034/
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15038/
Merged build finished. All automated tests passed.
Note that this reflects changes incorporated in apache#799.
 * Merge a sequence of comma-separated file lists into a single comma-separated string.
 * The provided strings may be null or empty to indicate no files.
 */
def mergeFileLists(lists: String*): String = {
Why was this moved to Utils? The code to deal with nulls and empty strings and such is pretty specific to `spark-submit`, we don't need other parts of Spark to use it.
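For readers unfamiliar with the helper under discussion, here is a minimal Python sketch of the behaviour its doc comment describes (merging comma-separated lists where null or empty means "no files"). It is not the Scala implementation from the PR; the name `merge_file_lists` and the choice to return an empty string when nothing is left are assumptions made for illustration.

```python
def merge_file_lists(*lists):
    """Merge comma-separated file lists into one comma-separated string.

    None and "" are both treated as "no files". Returning "" for an empty
    result is an assumption of this sketch.
    """
    merged = []
    for file_list in lists:
        if not file_list:  # skips both None and the empty string
            continue
        # Drop empty fragments produced by stray commas such as "a,,b"
        merged.extend(part for part in file_list.split(",") if part)
    return ",".join(merged)


print(merge_file_lists("a.py,b.py", None, "", "c.zip"))
```

The null and empty-string handling is exactly the part the reviewer considers specific to `spark-submit`.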
Another thing is that when you kill the Python/IPython shell the JVM still seems to be active. I tested this by running the shell, killing it and running it again. It complains that the Spark UI port is already taken.
Interesting.. I wonder if the `--die-on-broken-pipe` is somehow not being passed through, since that parameter is supposed to deal with this.
exec ipython $IPYTHON_OPTS
# If a python file is provided, directly run spark-submit
if [[ "$1" =~ \.py$ ]]; then
  exec $FWDIR/bin/spark-submit $PYSPARK_SUBMIT_ARGS
This won't work with quoted arguments. The problem is that when you convert `$@` to a variable the type changes to a string from an array. Check out the way `ORIG_ARGS` is handled inside of `spark-submit`.
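The quoting problem can be seen from the Python side, where the shell later has to turn a `PYSPARK_SUBMIT_ARGS`-style string back into a list. The sketch below is illustrative only: `parse_submit_args` is a hypothetical name, and the empty-input guard reflects the commit titled "Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set" rather than the PR's actual code.

```python
import shlex


def parse_submit_args(raw):
    """Split a flattened argument string back into a list, honouring quotes."""
    # shlex.split(None) falls back to reading standard input, which looks
    # like a hang, so treat None and "" as "no arguments".
    if not raw:
        return []
    return shlex.split(raw)


flattened = '--name "my app" --master local[4]'

# Naive whitespace splitting breaks the quoted app name into two pieces.
print(flattened.split(" "))

# shlex applies shell-like quoting rules and keeps "my app" together.
print(parse_submit_args(flattened))
```

Splitting with `shlex` recovers quoted arguments that contain spaces; the thread later notes that names with escaped quotes remained a problem on Windows.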
At the end of parsing options, we add an empty string to child arguments. This caused the arguments to py4j.JavaGateway to be interpreted as ["", "--die-on-broken-pipe", "0"].
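A short Python sketch of why a stray empty string matters for a consumer that reads its arguments by position. The helper `build_child_args` is hypothetical, and filtering out empty strings is just one possible guard; the thread does not show how the PR actually fixed it.

```python
def build_child_args(app_args, gateway_args):
    """Concatenate argument lists, dropping empty strings.

    Hypothetical guard: an empty string at the front would otherwise take
    the position where the first real option is expected.
    """
    return [arg for arg in app_args + gateway_args if arg != ""]


gateway_args = ["--die-on-broken-pipe", "0"]

buggy = [""] + gateway_args           # the shape described in the comment above
fixed = build_child_args([""], gateway_args)

print(buggy[0] == "--die-on-broken-pipe")
print(fixed[0] == "--die-on-broken-pipe")
```

A blanket filter like this would also drop an empty argument that a user passed on purpose, so it is a trade-off rather than a general solution.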
Changes look good to me.
Merged build finished. All automated tests passed.
This is tested on Windows 7. The only thing that does not work is having perverse app names with escaped quotes (e.g. "foo \" jam"). App names with spaces are fine. Other than that, the behavior on Windows is the same as on OSX.
This is ready for merge from my side, provided that the tests pass (they should).
Merged build finished. All automated tests passed.
@@ -28,7 +28,7 @@ esac
 # Enter posix mode for bash
 set -o posix

-if [[ "$@" == *--help* ]]; then
+if [[ "$@" = *--help ]] || [[ "$@" = *--h ]]; then
What is the second condition for? It doesn't seem to match `-h`. I noticed if I run this with `-h` then I seem to just get the help option from `spark-submit`.
patrick@patrick-t430s:~/Documents/spark$ ./bin/spark-shell -h
Usage: spark-submit [options] <app jar | python file> [app options]
Options:
LGTM - a few minor comments. Mind taking a quick look?
@@ -25,6 +25,12 @@ export SPARK_HOME="$FWDIR"

 SCALA_VERSION=2.10

+if [[ "$@" = *--help ]] || [[ "$@" = *--h ]]; then
This doesn't seem to match `-h` for me... not sure if that was the intention of the second condition.
I think maybe you want this to say
if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
One thing is this will only detect if `-h` or `--help` is the _last_ argument, but I think anything other than that is pretty tricky.
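The pattern behaviour discussed in these comments can be approximated in Python with `fnmatch`, whose `*` wildcard behaves like the shell glob for these simple patterns. This is a sketch for checking the patterns, not the script's code: `wants_help` is a hypothetical name, and joining the arguments with spaces only roughly mimics how `"$@"` is compared inside `[[ ]]`.

```python
from fnmatch import fnmatchcase


def wants_help(args, patterns=("*--help", "*-h")):
    """Return True if the space-joined arguments match any glob pattern."""
    joined = " ".join(args)  # rough stand-in for "$@" inside [[ ]]
    return any(fnmatchcase(joined, pattern) for pattern in patterns)


original_patterns = ("*--help", "*--h")

# "*--h" requires the string to end in two dashes plus "h", so "-h" fails.
print(wants_help(["-h"], original_patterns))

# With the suggested "*-h", a trailing -h is detected.
print(wants_help(["-h"]))

# Only a trailing flag is caught; --help in the middle is missed.
print(wants_help(["--help", "--master", "local"]))
```

Because the pattern is anchored only at the end of the joined string, `*-h` also matches any final argument that merely ends in `-h` (for example an app name such as `foo-h`), which is part of why matching the flag anywhere is described as tricky.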
LGTM - thanks Andrew!
**Problem.** For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.

**Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent.

**Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.

This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.

Author: Andrew Or <[email protected]>

Closes #799 from andrewor14/pyspark-submit and squashes the following commits:

bf37e36 [Andrew Or] Minor changes
01066fa [Andrew Or] bin/pyspark for Windows
c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
1866f85 [Andrew Or] Windows is not cooperating
456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
b7ba0d8 [Andrew Or] Address a few comments (minor)
06eb138 [Andrew Or] Use shlex instead of writing our own parser
05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
6fba412 [Andrew Or] Deal with quotes + address various comments
fe4c8a7 [Andrew Or] Update --help for bin/pyspark
afe47bf [Andrew Or] Fix spark shell
f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a371d26 [Andrew Or] Route bin/pyspark through Spark submit

(cherry picked from commit 4b8ec6f)
Signed-off-by: Patrick Wendell <[email protected]>
A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for python too.

Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example,

```
bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
```

Author: Andrew Or <[email protected]>

Closes #802 from andrewor14/python-examples and squashes the following commits:

cf50b9f [Andrew Or] De-indent python comments (minor)
50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
c362f69 [Andrew Or] Update docs to use spark-submit for python applications
7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
427a5f0 [Andrew Or] Update docs
d32072c [Andrew Or] Remove <master> from examples + update usages

(cherry picked from commit cf6cbe9)
Signed-off-by: Patrick Wendell <[email protected]>
Merged build finished. All automated tests passed.
###### _excavator_ is a bot for automating changes across repositories. Changes produced by the roomba/latest-baseline check. # Release Notes ## 0.50.0 [feature] Warn against .parallel() calls on Java streams (#537) [fix] Correct prioritisation of versions.props to match nebula logic (#533) ## 0.51.0 - [feature] New 'com.palantir.baseline-reproducibility' plugin (#539) - [improvement] `./gradlew idea` deletes redundant ipr files (#550) - [fix] ValidateConstantMessage error-prone check is more accurate. (#546) ## 0.51.1 - [fix] Fix cleanup of old idea project files (#559) - [fix] Remove stale references to no longer existing puppycrawl checkstyle DTDs (#556) ## 0.52.0 - [improvement] errorprone 2.3.2 -> 2.3.3 (#561) - [feature] new plugin: `com.palantir.baseline-exact-dependencies` helps declare necessary and sufficient dependencies (#548) - [improvement] Split out circle style plugin into generic junit reports plugin #564 ## 0.53.0 - [improvement] Disallow javafx imports with checkstyle #569 - [fix] Avoid lambda to allow build caching of checkstyle results #576 ## 0.54.0 - [feature] New `com.palantir.baseline-release-compatibility` plugin (#582) ## 0.55.0 [break] Enable running of unique class check on multiple configurations (#583) ## 0.55.1 [fix] checkImplicitDependencies shouldn't count ignored artifacts (#601) ## 0.55.2 [fix] BaselineReleaseCompatibility up-to-date checking of compile tasks (#605) ## 0.56.0 [feature] Add an errorprone rule GradleCacheableTaskAction that prevents passing a lambda to Task.doFirst or Task.doLast when implementing gradle plugins (#608) ## 0.57.0 * [feature] Error prone rule to replace `Iterables.partition(List, int)` with `Lists.partition(List, int)` (#622) * [feature] Error prone rule to prefer `Lists` or `Collections2` `transfrom` over `Iterables.transform` (#623) ## 0.58.0 [improvement] make CheckClassUniquenessTask cacheable (#637) [fix] Add Javac Settings to uncheck "Use compiler from module target JDK when possible" (#629) 
[fix] class uniqueness rule must have a config (#638) ## 0.59.0 [improvement] Spotless to remove blank lines at start of methods (#641) ## 0.60.0 * [improvement] New PreferBuiltInConcurrentKeySet suggestion (#649) * [improvement] Start publishing plugin to the [Gradle plugin portal](https://plugins.gradle.org/plugin/com.palantir.baseline) (#613) ## 0.61.0 - [improvement] Sensible defaults for test tasks (HeapDumpOnOutOfMemory) (#652) ## 0.62.0 * [improvement] Ensure Optional#orElse argument is not method invocation (#655) ## 0.62.1 [fix] Revert "[improvement] Ensure Optional#orElse argument is not method invocation" (#659) ## 0.63.0 [improvement] Support auto-applying error-prone suggested fixes (#660) ## 0.64.0 * [improvement] Refaster rule compilation (#661) ## 0.64.1 - [improvement] JUnit 5 boilerplate #666 ## 0.65.0 [improvement] Error-prone check to help prevent logging AuthHeader and BearerToken (#654) [fix] fix potential NPE when configuring testing (#669) [fix] Fix refaster compilation to support version recommendations (#667) ## 0.66.0 [improvement] Ignore DesignForExtension for ParameterizedTest (#673) ## 0.66.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | The PreventTokenLogging error-prone check will now correctly handle null use in SLF4J and Safe/Unsafe Arg functions. | palantir/gradle-baseline#674 | ## 1.0.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Add refaster rule to migrate away from optional.orElse(supplier.get()) | palantir/gradle-baseline#679 | | Fix | Projects can now compile using Java12, because the one errorprone check that breaks (Finally) is now disabled when you use this toolchain. It remains enabled when compiling against earlier JDKs. 
| palantir/gradle-baseline#681 | ## 1.1.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Ensure that format tasks execute after compilation | palantir/gradle-baseline#688 | ## 1.1.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Auto-fix OptionalOrElseMethodInvocation using `-PerrorProneApply`. | palantir/gradle-baseline#690 | ## 1.2.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Spotless check for disallowing dangling parenthesis. | palantir/gradle-baseline#687 | ## 1.3.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Don't cache test tasks in the build cache by default.<br>It's possible to restore caching by adding `com.palantir.baseline.restore-test-cache = true` to your `gradle.properties`. | palantir/gradle-baseline#694 | ## 1.4.0 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | No longer cache javaCompile tasks when applying errorprone or refaster checks. | palantir/gradle-baseline#696 | | Feature | Test helper for refaster checks. | palantir/gradle-baseline#697 | ## 1.5.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Determine whether to use junitPlatform on a per source set basis | palantir/gradle-baseline#701 | | Feature | OptionalOrElseMethodInvocation now checks for constructor invocations. | palantir/gradle-baseline#702 | ## 1.6.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | The severity of PreferSafeLoggableExceptions and PreferSafeLoggingPreconditions is now WARNING. | palantir/gradle-baseline#704 | | Fix | OptionalOrElseMethodInvocation now allows method references in orElse. 
| palantir/gradle-baseline#709 | ## 1.6.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Do not overwrite user provided test configure when using junit5 | palantir/gradle-baseline#712 | ## 1.7.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Baseline can now re-format all your Java files using the Eclipse formatter. This is currently an opt-in preview, try it out by running `./gradlew format -Pcom.palantir.baseline-format.eclipse`. | palantir/gradle-baseline#707 | | Improvement | Add errorprone check to ensure junit5 tests are not used with junit4 Rule/ClassRule | palantir/gradle-baseline#714 | ## 1.8.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Checkstyle now tolerates empty lambda bodies (e.g. `() -> {}` | palantir/gradle-baseline#715 | ## 1.8.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Correctly set dependency between spotlessApply and baselineUpdateConfig to prevent a race | palantir/gradle-baseline#724 | ## 1.8.2 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Correctly handle `EnableRuleMigrationSupport` in `JUnit5RuleUsage` errorprone-rule | palantir/gradle-baseline#725 | ## 1.9.0 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Wrap long parameterized types where necessary | palantir/gradle-baseline#716 | | Improvement | Allow suppression of the TODO checkstyle check by giving it an ID. Clarify its comment to allow // TODO(username): ... | palantir/gradle-baseline#727 | | Improvement | IntelliJ GitHub issue navigation | palantir/gradle-baseline#729 | | Improvement | print out suggestion for module dependencies inclusion in useful format | palantir/gradle-baseline#733 | | Fix | The `checkImplicitDependencies` task will no longer suggest a fix of the current project. 
| palantir/gradle-baseline#736, palantir/gradle-baseline#567 | | Improvement | Implement DangerousCompletableFutureUsage errorprone check | palantir/gradle-baseline#740 | ## 1.10.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Refaster to use `execute` over `submit` when the result is ignored | palantir/gradle-baseline#741 | ## 1.10.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Enable applying refaster rules for repos with -Xlint:deprecation | palantir/gradle-baseline#742 | ## 1.11.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Apply `InputStreamSlowMultibyteRead` error prone check at ERROR severity | palantir/gradle-baseline#749 | ## 1.12.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | The `baseline-idea` plugin now generates configuration more closely aligned with Gradle defaults. | palantir/gradle-baseline#718 | | Improvement | Apply the suggested fixes for `UnusedMethod` and `UnusedVariable`. | palantir/gradle-baseline#751 | | Improvement | Refaster `stream.sorted().findFirst()` into `stream.min(Comparator.naturalOrder())` | palantir/gradle-baseline#752 | | Improvement | Error prone validation that Stream.sort is invoked on comparable streams | palantir/gradle-baseline#753 | | Improvement | `DangerousStringInternUsage`: Disallow String.intern() invocations | palantir/gradle-baseline#754 | ## 1.12.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Do not apply the suggested fixes for `UnusedMethod` and `UnusedVariable` which automaticall remove code with side effects. 
| palantir/gradle-baseline#757 | ## 1.13.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Remove errorprone `LogSafePreconditionsConstantMessage` | palantir/gradle-baseline#755 | | Improvement | Disable errorprone `Slf4jLogsafeArgs` in test code | palantir/gradle-baseline#756 | | Improvement | error-prone now detects `Duration#getNanos` mistakes and bans URL in equals methods | palantir/gradle-baseline#758 | ## 1.14.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Implement `OptionalOrElseThrowThrows` to prevent throwing from orElseThrow | palantir/gradle-baseline#759 | ## 1.15.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | LogSafePreconditionsMessageFormat disallows slf4j-style format characters | palantir/gradle-baseline#761 | | Improvement | Error Prone LambdaMethodReference check | palantir/gradle-baseline#763 | ## 1.16.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | baseline-circleci no longer integrates with the (deprecated) FindBugs plugin, as a pre-requisite for Gradle 6.0 compatibility. | palantir/gradle-baseline#766 | ## 1.17.0 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | The `TypeParameterUnusedInFormals` errorprone check is disabled when compiling on Java 13, to workaround an error-prone bug. 
| palantir/gradle-baseline#767 | | Improvement | Publish scm information within POM | palantir/gradle-baseline#769 | ## 1.17.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | LambdaMethodReference avoids suggestions for non-static methods | palantir/gradle-baseline#771 | ## 1.17.2 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Remove pom only dependencies from analysis in checkUnusedDependencies | palantir/gradle-baseline#773 | ## 1.18.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | When computing unused dependencies, compileOnly and annotationProcessor<br>dependencies are ignored due to false positives as these dependencies<br>will not appear as dependencies in the generated byte-code, but are in<br>fact necessary dependencies to compile a given module. | palantir/gradle-baseline#783 | ## 1.19.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Disable `PreconditionsConstantMessage` on gradle plugins | palantir/gradle-baseline#790 | ## 2.0.0 | Type | Description | Link | | ---- | ----------- | ---- | | Break | Add gradle 6.0-20190904072820+0000 compatibiltiy. This raises minimum required version of gradle for plugins from this repo to 5.0. | palantir/gradle-baseline#791 | ## 2.1.0 | Type | Description | Link | | ---- | ----------- | ---- | | Feature | Automatically configure the [Intellij Eclipse format plugin](https://plugins.jetbrains.com/plugin/6546-eclipse-code-formatter) to use the eclipse formatter | palantir/gradle-baseline#794 | ## 2.1.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Stop applying error prone patches for checks that have been turned off. | palantir/gradle-baseline#793 | ## 2.2.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | baseline-circleci now validates that the rootProject.name isn't the CircleCI default (`project`) as can interfere with publishing. 
| palantir/gradle-baseline#775 | | Improvement | Remove JGit dependency | palantir/gradle-baseline#798 | ## 2.2.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Don't add whitespace to blank lines inside comments. Fixes apache#799 | palantir/gradle-baseline#800 | | Fix | Eclipse formatter now aligns multicatch so that it passes checkstyle. | palantir/gradle-baseline#807 | ## 2.2.2 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | ClassUniquenessPlugin now checks the `runtimeClasspath` configuration by default. | palantir/gradle-baseline#810 | ## 2.3.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | SafeLoggingExceptionMessageFormat disallows `{}` in safelog exception messages | palantir/gradle-baseline#815 | ## 2.4.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | A new `StrictUnusedVariable` check will catch any unused arguments (e.g. AuthHeaders) to public methods. If you need to suppress this, rename your variable to have an underscore prefix (e.g. `s/foo/_foo/`) or just run `./gradlew classes -PerrorProneApply` to auto-fix | palantir/gradle-baseline#819 | | Improvement | Message format checks use instanceof rather than catching | palantir/gradle-baseline#821 | ## 2.4.1 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Avoid false positives caused by `module-info.class` when checking class uniqueness | palantir/gradle-baseline#823 | ## 2.4.2 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Checkstyle tasks only check their own source set and only actual java sources. They don't look in your `src/*/resources` directory anymore. | palantir/gradle-baseline#830 | ## 2.4.3 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Add link to StrictUnusedVariable that directs users to baseline repo. 
| palantir/gradle-baseline#829 | | Fix | Long try-with-resources statements are now aligned such that the first assignment stays on the first line. | palantir/gradle-baseline#835 | ## 2.5.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | Error Prone StringBuilderConstantParameters. StringBuilder with a constant number of parameters should be replaced by simple concatenation. The Java compiler (jdk8) replaces concatenation of a constant number of arguments with a StringBuilder, while jdk 9+ take advantage of JEP 280 (https://openjdk.java.net/jeps/280) to efficiently pre-size the result for better performance than a StringBuilder. | palantir/gradle-baseline#832 | ## 2.6.0 | Type | Description | Link | | ---- | ----------- | ---- | | Fix | Excavator PRs that apply other refaster rules (e.g. Witchcraft ones) will not also apply baseline refaster rules. | palantir/gradle-baseline#827 | | Improvement | Added a new ErrorProne check `PreferAssertj` to assist migration to AssertJ from legacy test frameworks. It may be necessary to add a dependency on `org.assertj:assertj-core` in modules which do not already depend on AssertJ. If there's a technical reason that AssertJ cannot be used, `PreferAssertj` may be explicitly disabled to prevent future upgrades from attempting to re-run the migration. | palantir/gradle-baseline#841 | ## 2.7.0 | Type | Description | Link | | ---- | ----------- | ---- | | Improvement | `StrictUnusedVariable` now ignores variables prefixed with `_` and the suggested fix will rename all unused parameters in public methods instead of removing them | palantir/gradle-baseline#833 | | Improvement | ErrorProne will now detect dangerous usage of `@RunWith(Suite.class)` that references JUnit5 classes, as this can cause tests to silently not run! 
| palantir/gradle-baseline#843 |

## 2.8.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | PreferAssertj provides better replacements fixes | palantir/gradle-baseline#850 |
| Improvement | Do not run error prone on any code in the build directory | palantir/gradle-baseline#853 |

## 2.8.1

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Fix hamcrest arrayContainingInAnyOrder conversion | palantir/gradle-baseline#859 |

## 2.9.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | StrictUnusedVariable can only be suppressed with `_` prefix | palantir/gradle-baseline#854 |
| Improvement | StrictUnusedVariable is now an error by default | palantir/gradle-baseline#855 |
| Fix | The PreferAssertj refactoring will only be applied if you have explicitly opted in (e.g. using `baselineErrorProne { patchChecks += 'PreferAssertj' }`) | palantir/gradle-baseline#861 |

## 2.9.1

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Error prone will correctly ignore all source files in the build directory and in any generated source directory | palantir/gradle-baseline#864 |
| Fix | Ensure that `StrictUnusedVariable` correctly converts previously suppressed variables `unused` to `_` | palantir/gradle-baseline#865 |

## 2.9.2

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | When removing unused variables, `StrictUnusedVariable` will preserve side effects | palantir/gradle-baseline#870 |

## 2.10.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | A new `checkJUnitDependencies` task detects misconfigured JUnit dependencies which could result in some tests silently not running. | palantir/gradle-baseline#837 |
| Improvement | Some AssertJ assertions can now be automatically replaced with more idiomatic ones using refaster. | palantir/gradle-baseline#851 |
| Fix | PreferAssertj check avoids ambiguity in assertThat invocations | palantir/gradle-baseline#874 |
| Improvement | Improve performance of error prone PreferAssertj check | palantir/gradle-baseline#875 |
| Improvement | StringBuilderConstantParameters suggested fix doesn't remove comments | palantir/gradle-baseline#877 |

## 2.10.1

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Allow junit4 dependencies to exist without junit4 tests | palantir/gradle-baseline#880 |

## 2.11.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | PreferAssertj supports migration of zero-delta floating point array asserts | palantir/gradle-baseline#883 |

## 2.11.1

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | checkJunitDependencies only checks Java source | palantir/gradle-baseline#885 |

## 2.11.2

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | AssertJ Refaster fixes use static `assertThat` imports | palantir/gradle-baseline#887 |

## 2.12.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Disable `UnusedVariable` error prone rule by default | palantir/gradle-baseline#888 |

## 2.13.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Refaster for AssertJ isZero/isNotZero/isOne and collections | palantir/gradle-baseline#881 |
| Improvement | AssertJ refaster migrations support string descriptions | palantir/gradle-baseline#891 |
| Fix | Certain error-prone checks are disabled in test code, and the presence of JUnit5's `@TestTemplate` annotation is now used to detect whether a class is test code. | palantir/gradle-baseline#892 |
| Fix | BaselineFormat task exclude generated code on Windows | palantir/gradle-baseline#896 |

## 2.14.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Refaster rules for AssertJ tests | palantir/gradle-baseline#898 |
| Improvement | refaster replacement for assertj hasSize(foo.size) -> hasSameSizeAs | palantir/gradle-baseline#900 |
| Fix | Keep spotless plugin from eagerly configuring all tasks | diffplug/spotless#444 |
| Fix | Continue when RefasterRuleBuilderScanner throws | palantir/gradle-baseline#904 |
| Improvement | Refaster now works on repos using Gradle 6.0 | palantir/gradle-baseline#804, palantir/gradle-baseline#906 |

## 2.15.0

_No documented user facing changes_

## 2.16.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Rewrite ImmutableCollection#addAll to add for arrays | palantir/gradle-baseline#743 |
| Improvement | Add refaster rule to simplify empty optional asserts | palantir/gradle-baseline#911 |
| Improvement | Baseline now allows static imports of AssertJ and Mockito methods. | palantir/gradle-baseline#915 |
| Improvement | Remove refaster AssertjIsOne rule. | palantir/gradle-baseline#917 |
| Improvement | Add assertj refaster rules for map size asserts | palantir/gradle-baseline#919 |
| Improvement | Added a Refaster rule to change `isEqualTo` checks into `hasValue` checks | |

## 2.17.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Implement AssertjCollectionHasSameSizeAsArray | palantir/gradle-baseline#922 |
| Improvement | Implement assertj map refactors for containsKey and containsEntry | palantir/gradle-baseline#925 |
| Improvement | Refaster assertj migrations support descriptions with format args | palantir/gradle-baseline#926 |
| Improvement | Refaster out String.format from describedAs | palantir/gradle-baseline#927 |

## 2.18.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Refaster rules to simplify negated boolean expressions and extract null checks. | palantir/gradle-baseline#935 |
| Improvement | Refaster rules for checks that maps do not contain a specific key | palantir/gradle-baseline#935 |
| Improvement | Refaster rule 'CollectionStreamForEach' | palantir/gradle-baseline#942 |
| Improvement | ExecutorSubmitRunnableFutureIgnored as error prone ERROR | palantir/gradle-baseline#943 |

## 2.19.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | checkJUnitDependencies detects a possible misconfiguration with spock and JUnit5 which could lead to tests silently not running. | palantir/gradle-baseline#951 |

## 2.20.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Use Mockito verifyNoInteractions over deprecated verifyZeroInteractions | palantir/gradle-baseline#924 |
| Improvement | Errorprone rules for usage of Guava static factory methods | palantir/gradle-baseline#941 |
| Improvement | Fix error-prone `UnnecessaryParentheses` by default | palantir/gradle-baseline#952 |
| Improvement | Implement Error Prone `ThrowError` to discourage throwing Errors in production code<br>Errors are often handled poorly by libraries resulting in unexpected<br>behavior and resource leaks. It's not obvious that 'catch (Exception e)'<br>does not catch Error.<br>This check is intended to be advisory - it's fine to<br>`@SuppressWarnings("ThrowError")` in certain cases, but is usually not<br>recommended unless you are writing a testing library that throws<br>AssertionError. | palantir/gradle-baseline#957 |
| Improvement | Improve TestCheckUtils.isTestCode test detection | palantir/gradle-baseline#958 |
| Improvement | Implement Error Prone `Slf4jLevelCheck` to validate that slf4j level checks agree with contained logging. | palantir/gradle-baseline#960 |

## 2.20.1

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Suppress error-prone PreferCollectionConstructors on jdk13 | palantir/gradle-baseline#968 |

## 2.21.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Feature | Users can opt-in to format their files using our fork of google-java-format (palantir-java-format) | palantir/gradle-baseline#936 |

## 2.22.0

_Automated release, no documented user facing changes_

## 2.23.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Implement error prone ReverseDnsLookup for unexpected reverse dns lookups<br><br>Calling address.getHostName may result in a DNS lookup which is a network request,<br>making the invocation significantly more expensive than expected depending on the<br>environment.<br>This check is intended to be advisory - it's fine to<br>@SuppressWarnings("ReverseDnsLookup") in certain cases, but is usually not<br>recommended. | palantir/gradle-baseline#970 |

## 2.24.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | The deprecated `verifyZeroInteractions` now gets rewritten to `verifyNoMoreInteractions`, which has the same behaviour. | palantir/gradle-baseline#975 |
| Improvement | ReadReturnValueIgnored: Check that read operation results are not ignored | palantir/gradle-baseline#978 |
| Improvement | Stop migrating source sets to safe-logging, unless they already have the requisite library (`com.palantir.safe-logging:preconditions`). | palantir/gradle-baseline#981 |
| Improvement | For users who opted into palantir-java-format, we now reflow strings and reorder imports. | palantir/gradle-baseline#982 |

## 2.25.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | checkstyle Indentation rule is disabled when palantir-java-format is enabled | palantir/gradle-baseline#987 |
| Improvement | Load palantir-java-format dynamically from the same configuration set up by `com.palantir-java-format` which is also used to determine the version used by IntelliJ. | palantir/gradle-baseline#989 |

## 2.26.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Run `./gradlew formatDiff` to reformat the relevant sections of any uncommitted changed Java files (relies on `git diff -U0 HEAD` under the hood) | palantir/gradle-baseline#988 |

## 2.27.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Slf4jLogsafeArgs fixes safe-log wrapped throwables | palantir/gradle-baseline#1001 |
| Improvement | `DangerousParallelStreamUsage` checks for `Collection.parallelStream()` and `StreamSupport` utility methods with parallel=true. | palantir/gradle-baseline#1005 |
| Improvement | DangerousThrowableMessageSafeArg disallows Throwables in SafeArg values.<br>Throwables must be logged without an Arg wrapper as the last parameter, otherwise unsafe data may be leaked from the unsafe message or the unsafe message of a cause. | palantir/gradle-baseline#997 |
| Improvement | Implement a suggested fix for CatchBlockLogException | palantir/gradle-baseline#998 |

## 2.28.0

| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Implement `FinalClass` error prone check, replacing the checkstyle implementation | palantir/gradle-baseline#1008 |
| Improvement | Error prone validation to avoid redundant modifiers | palantir/gradle-baseline#1010 |

## 2.28.1

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Fix `RedundantModifier` interpretation of implicit modifiers | palantir/gradle-baseline#1014 |

## 2.28.2

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Fix RedundantModifier failures types nested in interfaces | palantir/gradle-baseline#1017 |

## 2.28.3

| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Fix error-prone matching literal null as a subtype.<br>The most common issue this fixes is failures on `SafeArg.of("name", null)`<br>assuming that the null literal value parameter may be a throwable. | palantir/gradle-baseline#1020 |

To enable or disable this check, please contact the maintainers of Excavator.
Problem. For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, `bin/pyspark` needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.

Solution. Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user-facing Spark scripts consistent.

Details. `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. When `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest.

For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.

This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.
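
The two cases described above can be sketched roughly as follows. This is an illustrative Python model of the dispatch decision only, not the actual `bin/pyspark` script: the names `LaunchPlan` and `plan_pyspark_launch` are hypothetical, the `.py` suffix test is an assumed way of recognizing a python application, and forwarding the remaining arguments through a `PYSPARK_SUBMIT_ARGS` environment variable is an assumption about how the python process might hand Spark submit arguments to the JVM it launches.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class LaunchPlan:
    """Hypothetical description of what a pyspark-style launcher would run."""
    command: List[str]
    extra_env: Dict[str, str] = field(default_factory=dict)


def plan_pyspark_launch(args: Sequence[str], spark_home: str,
                        python_exe: str = "python") -> LaunchPlan:
    spark_submit = os.path.join(spark_home, "bin", "spark-submit")

    # Case (1): a python application. Hand the file and its arguments
    # straight to spark-submit, which runs it as a sub-process of the JVM.
    # (Assumption: a leading argument ending in ".py" marks an application.)
    if args and args[0].endswith(".py"):
        return LaunchPlan(command=[spark_submit, *args])

    # Case (2): the shell. Python stays the parent process so keyboard
    # signals reach the interpreter first; the JVM is started later as a
    # sub-process via spark-submit. The arguments are stashed in an
    # environment variable for that later step (assumed mechanism; the
    # space-joined string is naive and does not handle quoting).
    return LaunchPlan(command=[python_exe],
                      extra_env={"PYSPARK_SUBMIT_ARGS": " ".join(args)})


if __name__ == "__main__":
    print(plan_pyspark_launch(["app.py", "--verbose"], "/opt/spark"))
    print(plan_pyspark_launch(["--master", "local"], "/opt/spark"))
```

Keeping the decision in one function that returns a plan, rather than executing anything, makes the two pathways easy to compare side by side; a real launcher would go on to execute the planned command with the extra environment applied.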