[SPARK-56374][BUILD] Align SBT assembly shade rules with Maven #55307
yadavay-amzn wants to merge 1 commit into apache:master
Add missing shade rules to SparkBuild.scala for three connect modules:

1. SparkConnect (server): Add guava and guava.thirdparty relocations to match Maven. Without these, grpc classes in the assembly reference unshaded com.google.common.*, which fails at runtime since Guava is shaded to org.sparkproject.guava in spark-network-common.
2. SparkConnectClient (jvm): Add org.apache.arrow relocation to match Maven. Arrow classes are now shaded under org/sparkproject/connect/client/org/apache/arrow/.
3. SparkConnectJdbc: Add org.apache.arrow relocation for consistency with Maven and the jvm client module.

Closes SPARK-56374
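For readers unfamiliar with sbt-assembly, the relocations listed above could be expressed roughly as follows. This is a minimal sketch, not the actual diff in this PR: the object and value names (`ShadeRulesSketch`, `connectServerShading`, `connectClientShading`) are hypothetical, and it assumes the sbt-assembly plugin is available to the build definition. The JDBC module is omitted because its relocation target is not spelled out here.

```scala
// Sketch only: illustrates the shape of the rules, not the real SparkBuild.scala change.
// Assumes sbt 1.x with the sbt-assembly plugin on the build classpath.
import sbt._
import sbtassembly.AssemblyPlugin.autoImport._

object ShadeRulesSketch {
  // Hypothetical settings for the Connect server assembly:
  // relocate Guava so grpc classes reference the shaded package.
  lazy val connectServerShading: Seq[Setting[_]] = Seq(
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.guava.@1").inAll,
      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.guava.thirdparty.@1").inAll
    )
  )

  // Hypothetical settings for the Connect JVM client assembly:
  // relocate Arrow under the client's shaded prefix.
  lazy val connectClientShading: Seq[Setting[_]] = Seq(
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename(
        "org.apache.arrow.**" -> "org.sparkproject.connect.client.org.apache.arrow.@1"
      ).inAll
    )
  )
}
```

The `@1` placeholder stands for whatever the `**` wildcard matched, so sub-packages keep their structure under the new prefix.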
Hi @yadavay-amzn, your GA still seems to be disabled. Could you confirm?

Hi @yadavay-amzn, I have a few concerns about this PR.
cc: @LuciferYang who raised SPARK-56374

@yadavay-amzn Thank you for submitting the PR. I'd like to clarify the intent behind it.

Why I'm proposing this work

Right now,

The end result is that an SBT-built Spark distribution cannot replace a Maven-built one at runtime. Downstream code that relies on shaded, relocated classes will fail. The release process currently uses Maven builds as the source of truth, which means we cannot use a more efficient method for building and releasing versions.

What "done" looks like

My target end state:

Explicit non-goals:

Where the work lives

Read all these poms first. They are the specification. The modules covered above are the scope we need to align.

How to test

What I can think of for now is that the PySpark and Connect tests should use the shaded jars to verify normal runtime behavior. The YARN shuffle service needs to start properly and load the Netty native libraries correctly. For Spark Connect JDBC, the client should run normally with no class-not-found errors.

This is quite a challenging task, and I'm really glad you're interested in it. Feel free to ping me anytime if there are updates. Thanks ~

What changes were proposed in this pull request?

Add missing shade rules to `project/SparkBuild.scala` to align SBT assembly output with Maven for three connect modules:

1. SparkConnect (server): Add `com.google.common` → `org.sparkproject.guava` and `com.google.thirdparty` → `org.sparkproject.guava.thirdparty` relocations. Maven's `sql/connect/server/pom.xml` has these but SBT was missing them.
2. SparkConnectClient (jvm): Add `org.apache.arrow` → `org.sparkproject.connect.client.org.apache.arrow` relocation. Maven's `connector/connect/client/jvm/pom.xml` has this but SBT was missing it.
3. SparkConnectJdbc: Add `org.apache.arrow` relocation for consistency with Maven's `sql/connect/client/jdbc/pom.xml`.

Why are the changes needed?
SBT assembly shade rules were out of sync with Maven, causing differences in the assembled JARs:

- Server: Without the guava relocation, grpc classes in the server assembly reference unshaded `com.google.common.*`. At runtime these fail to resolve because Guava is shaded to `org.sparkproject.guava` in `spark-network-common`. Verified by inspecting `ManagedChannelImpl.class`: after the fix, references correctly point to `org/sparkproject/guava/base/MoreObjects` instead of `com/google/common/base/MoreObjects`.
- Client JVM: Arrow classes were not being shaded. After the fix, they appear under `org/sparkproject/connect/client/org/apache/arrow/`.

Other Maven modules with shade rules (`core`, `sql/core`, `network-yarn`, root `pom.xml`) were verified to be intentionally different in SBT: those modules don't produce separate assembly JARs in SBT, and their shading is handled at a different level in the SBT build architecture.

Does this PR introduce any user-facing change?
No.
How was this patch tested?
Built all three affected SBT assemblies and verified the output JARs:
Verified shading in the output JARs:
- `strings ManagedChannelImpl.class` shows `org/sparkproject/guava/base/MoreObjects` (was `com/google/common/base/MoreObjects` before the fix)
- `jar tf` shows arrow classes under `org/sparkproject/connect/client/org/apache/arrow/` with zero unshaded `org/apache/arrow` entries

Was this patch authored or co-authored using generative AI tooling?
No
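
For reviewers who want to repeat the `jar tf` check described under "How was this patch tested?" programmatically, a small helper along these lines may be useful. It is an illustrative sketch, not part of this patch: `CheckShading` and `unshadedEntries` are hypothetical names, the jar path is supplied by the caller, and Scala 2.13 or later is assumed for `scala.jdk.CollectionConverters`.

```scala
import java.util.jar.JarFile
import scala.jdk.CollectionConverters._

// Hypothetical helper: lists jar entries that still live under an unshaded path prefix.
object CheckShading {
  /** Returns the names of all entries in the jar that start with `prefix`. */
  def unshadedEntries(jarPath: String, prefix: String): List[String] = {
    val jar = new JarFile(jarPath)
    try {
      // Materialize the list before the jar is closed.
      jar.entries().asScala.map(_.getName).filter(_.startsWith(prefix)).toList
    } finally {
      jar.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to an assembled jar (caller-supplied)
    val leftovers = unshadedEntries(args(0), "org/apache/arrow/")
    if (leftovers.isEmpty) println("no unshaded Arrow entries found")
    else leftovers.take(10).foreach(println)
  }
}
```

Entries relocated to `org/sparkproject/connect/client/org/apache/arrow/` do not start with `org/apache/arrow/`, so only genuinely unshaded classes are reported.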