IBM to Purchase DataStax for Database, GenAI Capabilities

February 26, 2025


IBM today announced its intent to acquire DataStax, the longtime backer of the Apache Cassandra database that has recently broadened its reach into streaming data and generative AI. IBM cited DataStax's capability to manage unstructured data as well as its vector database, which is used for developing RAG solutions.

Apache Cassandra was originally developed at Facebook in 2008 to serve the fledgling social network's need for a highly scalable, fault-tolerant database to store the massive amounts of data generated by users on its website. Facebook was a big user and creator in the nascent big data ecosystem, building its social media empire atop non-relational technology like Apache Hadoop and HBase, another NoSQL data store, as well as Apache Hive, which it created to make Hadoop look like a relational database. (Facebook would eventually move back to relying on relational databases, specifically MySQL, but that's another story.)

Cassandra, which technically is a wide-column store that favors data availability and reliability (at the expense of data consistency), became a top-level project at the Apache Software Foundation in 2010. That's the same year that Jonathan Ellis and Matt Pfeil co-founded a company in Austin, Texas called Riptano, which they soon renamed DataStax.

At first, DataStax followed the typical commercial open source business model, offering an enterprise version of Apache Cassandra called DataStax Enterprise (DSE). The company, which had moved to Santa Clara, California by 2014, attracted Fortune 500 customers such as FedEx, Capital One, and Verizon. It had raised $106 million in venture capital at an $830 million valuation, and was on pace for an IPO in the 2015 or 2016 timeframe.

That IPO never happened, as MongoDB dominated the NoSQL space and went public in 2017. In May 2020, DataStax launched Astra DB, a fully managed version of Cassandra running in the cloud atop Kubernetes, giving customers the scalability and availability benefits of the NoSQL database without the management burden (like many distributed systems, Cassandra can be difficult to manage). Later that year, it launched K8ssandra, an open source distribution of the database that runs on Kubernetes.

Soon, the company started branching out beyond NoSQL databases. In 2021, it launched Astra Streaming, an event streaming platform based on Apache Pulsar, a publish and subscribe (pub-sub) data platform that competes with Apache Kafka. In 2023, DataStax bought Kaskada, an AI startup that helped automate tedious feature engineering tasks, and made the software open source under the Luna ML brand.

DataStax further bolstered its generative AI capabilities in 2023 with the launch of a vector store in Astra DB. Vector stores emerged as critical tools for building retrieval-augmented generation (RAG) pipelines, which improve the accuracy of large language model (LLM) output in generative AI applications. Then in 2024, DataStax further fleshed out its RAG story when it acquired Langflow, which developed an open source framework for building RAG pipelines.
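The retrieval step that a vector store performs in a RAG pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataStax's Astra DB API or Langflow's: the document list, the hand-written three-dimensional "embeddings," and the function names `retrieve` and `build_prompt` are all hypothetical. A production system would generate embeddings with a model and use an approximate nearest-neighbor index rather than the linear scan shown here.

```python
import math

# Hypothetical toy "vector store": each document is stored with a precomputed
# embedding. These 3-dimensional vectors are made up for illustration only;
# real embeddings come from a model and have hundreds of dimensions.
DOCS = [
    ("Cassandra is a wide-column NoSQL database.", [0.9, 0.1, 0.0]),
    ("Apache Pulsar is a pub-sub messaging platform.", [0.1, 0.9, 0.0]),
    ("Langflow is a framework for building RAG pipelines.", [0.0, 0.2, 0.9]),
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the text of the k stored documents most similar to the query."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, query_vec: list[float], k: int = 1) -> str:
    """Augment the user's question with retrieved context before it goes to an LLM."""
    context = "\n".join(retrieve(query_vec, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # The query vector stands in for an embedding of the question.
    print(build_prompt("What is Cassandra?", [0.8, 0.2, 0.1]))
```

The "augmented" part of RAG is simply that the retrieved text is prepended to the prompt, so the LLM answers from supplied context rather than from its training data alone.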


All the capabilities that DataStax built and acquired clearly caught the eye of IBM. Big Blue, which has been rallying its business to some extent on the back of its watsonx AI offerings, cited open source projects like Apache Cassandra, Apache Pulsar, Langflow, and OpenSearch (a fork of Elasticsearch and Kibana) in its press release announcing the acquisition.

IBM is particularly enamored of how DataStax has built its unstructured data management capabilities into a single product. While it didn't mention DataStax's Hyper-Converged Data Platform (HCDP) by name, it seems clear that IBM is banking on the technology to help customers turn unstructured data into successful AI applications.

"Unstructured data represents a treasure trove of untapped business intelligence, representing 93% of all enterprise data in 2024, according to IDC," Ritika Gunnar, IBM's general manager of data and AI, says in a blog post. "Harnessing the power of this data within generative AI applications is critical. But to do that, enterprises must first make order out of data chaos."

According to Gunnar, IBM wants to bring DataStax's open source offerings together with the open source projects in its watsonx portfolio, particularly Apache Iceberg, Apache Spark, Velox, and Presto, to help customers make use of large amounts of unstructured data.

"The data infrastructure required for AI is much more than just vector," Gunnar writes. "Many modalities of data (JSON, time-series, key/value, tabular, graph) need to come together to make the data ingest and search accurate and relevant. By having them built into a simplified and scalable solution (thanks to generative AI), users don't have to stitch together a multitude of data representations to gain value from their enterprise data."

In his own blog post, DataStax CEO Chet Kapoor discussed how DataStax and IBM have worked together on open source software (OSS) since 2020, including deploying DataStax products atop the Red Hat OpenShift platform.

"We respect the leadership and stewardship that IBM has demonstrated with OSS and the great OSS companies that have found a home at IBM, like Red Hat and others, and we're excited to become part of a company that understands the power of openness," Kapoor writes. "With our technologies and IBM's watsonx.data, their hybrid, open data lakehouse, we can bring vector and AI search to your entire data estate and make IBM's capabilities available to every developer."

Terms of the deal, which is expected to close in the second quarter, were not disclosed. DataStax was valued at $1.6 billion during its most recent funding round, in June 2022. The company has raised $342.6 million over several rounds. It has hundreds of paying customers, according to IBM.
