

On the other hand, what if you want to create a separate environment for developers for each feature branch? In this case, the correct approach is to create a single Cassandra node and store data in the distributed storage (i.e., Ceph and GlusterFS become an option). In this situation, the most appropriate solution is to store data on the node’s disk via local persistent volumes or by mounting hostPath. Therefore, there is no need to use distributed data storage systems like Ceph or GlusterFS in the case of a Cassandra cluster comprising a large number of nodes. We wish to emphasize that Cassandra itself has built-in mechanisms for data replication. We will provide a personal PersistentVolume to every pod that contains Cassandra You can use this mechanism effortlessly since it is already well developed. In Kubernetes, there is a PersistentVolume for storing data. Cassandra also has a unique instance called Commit Log that keeps records of all transactions and is stored on disk. There is another part of the data that is saved to disk as SSTable. Data storageĪs we mentioned above, Cassandra stores some of its data in RAM, in the form of MemTable. So, we’d rather focus on what we need to make this transition possible and what tools can help us with this. For us as a company maintaining a lot of K8s clusters, it’s all about managing it in a more convenient way. What we’re not going to discuss here is the reason for migrating Cassandra to Kubernetes. Cassandra stores part of the data in RAM to speed up reading and writing.Īnd now, let’s get to the actual migration to Kubernetes.Cassandra uses an IP address to identify a node.Cluster - the collection of all Datacenters.Datacenter - the combination of all Racks located in the same data center.Rack - the collection of Cassandra’s instances grouped by some attribute and located in the same data center.Node - the single deployed instance of Cassandra.The topology of Cassandra includes several layers:.The project hardly needs a detailed introduction, so I will only remind you of those main features of Cassandra that are important in the context of this article. It provides high availability with no single point of failure.

What is Cassandra? Cassandra is a distributed NoSQL DBMS designed to handle large amounts of data. “The one who can control a woman can cope with the country as well” In this article, we will share our vision of the necessary steps, criteria, and existing solutions for migrating Cassandra to K8s. We regularly deal with the Apache Cassandra and the need to operate it as a part of a Kubernetes-based infrastructure (for example, for Jaeger installations).
