You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
cutecycle
changed the title
[BUG]: JVM IPC Deserialization uses BinaryFormatter, which is now Deprecated for OWASP CWE
JVM IPC Deserialization uses BinaryFormatter, which is now Deprecated for OWASP CWE
Jan 6, 2023
This is entirely pointless for UDFs. UDFs execute arbitrary code remotely on workers by design. The whole entire point of UDF on Spark is distributed RCE. Whatever formatter you use does not change the fact that the payload is, and has to be arbitrary computation.
Describe the bug
We'll be getting SYSLIB0011 errors for the way the Broadcast, Worker, and RDD are formatting streams.
My current understanding:
Historical example: jni4net (also uses BinaryFormatter, but has a some level of struct definition:
*https://github.com/jni4net/jni4net/blob/ac2189c37253710e7b729797631419b0bf3b8559/jni4net.tested.n/src/generated/net/sf/jni4net/tested/JavaInstanceFields.generated.cs#L16
Notes:
Can we write directly to MemoryStream?
#1112 (comment)
"passing protobufs between Java and C using JNI":
https://medium.com/@dhaval.durve/passing-protobufs-between-java-and-native-c-code-using-jni-9808b60f6d2c
An equivalent of this CVE, and the object filter used to resolve it
https://security.snyk.io/vuln/SNYK-PYTHON-PYSPARK-3021140
https://github.com/apache/spark/pull/18166/files#diff-6a1d1601920af68466d7c30dc02170468abbe408138734c00d50d2ba1b81ba35R179
BinaryFormatter Guidance:
https://learn.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-security-guide
Arrow buffers:
https://arrow.apache.org/docs/python/ipc.html
BinaryFormatter Marshaller in ProtoBuf.net:
https://github.com/protobuf-net/protobuf-net.Grpc/blob/main/tests/protobuf-net.Grpc.Test.Integration/CustomMarshaller.cs
Protobuf scalar
bytes
for arbitrary byte lengths:https://developers.google.com/protocol-buffers/docs/proto3#scalar
Wind down plan in dotnet:
dotnet/designs@bd0a066
"Is binary serialization inherently unsafe?"
https://stackoverflow.com/a/66825699
pyspark's implementation of this is based on py4j; they were going to use protobuf but opted for strings
https://github.com/py4j/py4j/blob/b4514ecd40ea121a35f9cf50bbf2ccea95354245/py4j-python/src/py4j/protocol.py#L9
https://github.com/py4j/py4j/blob/1f8a0b6dc216f16092d9c1b2556897eec8653a62/py4j-python/src/py4j/java_gateway.py#L1737
Though I will say... things seem to be BinaryFormatter all the way down?
https://github.com/protobuf-net/protobuf-net/search?q=binaryformatter
The text was updated successfully, but these errors were encountered: