Thursday, 11 June 2015

VoltDB Q&A

1. When a failed node is re-added to the cluster after being fixed, will the cluster see it as a completely new node, or re-allocate the partitions that belonged to it before the failure?

The rejoined node is treated essentially as a clean node, but it will host the same partitions that used to be on that machine, because of the consistent hashing algorithm.
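The key property can be sketched in a few lines of Python. This is an illustration of the idea, not VoltDB's actual hash function: because a key's partition is a pure function of the key and the cluster-wide partition count, the mapping does not depend on which nodes happen to be up, so a rejoined node reclaims exactly the partitions it held before.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical cluster-wide partition count

def partition_for(key: str) -> int:
    """Deterministically map a partition key to a partition.

    The mapping depends only on the key and the partition count,
    not on current node membership (a sketch, not VoltDB's code).
    """
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# The mapping is stable: before a node fails and after it rejoins,
# the same keys hash to the same partitions.
before = {k: partition_for(k) for k in ("cust-1", "cust-2", "cust-3")}
after = {k: partition_for(k) for k in ("cust-1", "cust-2", "cust-3")}
assert before == after
```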

2. What’s the max size of each partition? And how many partitions are allowed in a cluster? Are they 1:1 with CPU cores?

Each partition has access to all of the node's memory, so there is no fixed per-partition size limit. In theory you can have 4 partitions on a node, have all the data hash to the same partition, and leave the other three empty; hence the selection of the right partition key is important. As for the number of partitions in a cluster, the partition count is a per-node setting, and partitions are 1:1 with cores. The usual recommendation is to start tuning at 75% of the physical cores (not hyper-threaded). To truly gauge a production environment, tune with command logging on and a level of k-safety that is representative of production.
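The skew problem is easy to demonstrate. In this illustrative sketch (the column names are made up, and the hash is not VoltDB's), a low-cardinality key such as a country code funnels every row into one partition, while a high-cardinality key such as a customer id spreads rows across all of them:

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # e.g., four partitions on one node

def partition_for(key: str) -> int:
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Skewed choice: 10,000 rows that all share one country code.
skewed = Counter(partition_for("US") for _ in range(10_000))

# Better choice: 10,000 rows keyed by distinct customer ids.
balanced = Counter(partition_for(f"cust-{i}") for i in range(10_000))

print(len(skewed), "partition(s) used with the skewed key")      # 1
print(len(balanced), "partition(s) used with customer ids")      # 4
```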

3. How does VoltDB handle distributed transactions in an efficient way? Could you give us more details about "Serialized requests are queued for single-threaded execution on a dedicated core"?

Single-partition transactions are routed through the SPI (single-partition initiator), which can complete the transaction with just that one partition's thread. A distributed transaction, i.e. a multi-partition transaction, goes through the MPI (multi-partition initiator), which schedules the task to run concurrently on all partitions. Since every stored procedure runs as an atomic transaction, undo logs are maintained for all of its mutations.
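The routing split can be sketched as follows. This is a simplified model of the idea, not VoltDB's internals: a single-partition transaction lands on exactly one partition's work queue, while a multi-partition transaction is fanned out to every partition.

```python
from queue import Queue

NUM_PARTITIONS = 4
partition_queues = [Queue() for _ in range(NUM_PARTITIONS)]

def route_single_partition(txn, key_hash: int) -> None:
    # SPI-like path: only the owning partition's thread sees this work.
    partition_queues[key_hash % NUM_PARTITIONS].put(txn)

def route_multi_partition(txn) -> None:
    # MPI-like path: schedule the fragment on every partition concurrently.
    for q in partition_queues:
        q.put(txn)

route_single_partition("update one customer", key_hash=7)
route_multi_partition("report across all customers")

depths = [q.qsize() for q in partition_queues]
print(depths)  # the single-partition txn landed on just one queue
```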

4. Each partition has two replicas in the cluster (three copies in total). When the master is updated, one of the two replicas is updated synchronously. Is that correct?

When the master is updated, both replica partitions are updated synchronously. Since the read workload is load-balanced between the master and the replicas, you don't want any replica to be out of sync and serving stale data. To avoid this, the master and replicas are updated in an active-active manner.
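The active-active idea can be sketched like this. The class and function names are illustrative, not VoltDB API: every copy applies the same mutation before the write is acknowledged, so a read served by any copy returns the same result.

```python
class PartitionCopy:
    """One copy (master or replica) of a partition's data."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

def synchronous_write(master, replicas, key, value):
    # All copies apply the mutation before the client is acked.
    for copy in [master] + replicas:
        copy.apply(key, value)
    return "ack"

master = PartitionCopy()
replicas = [PartitionCopy(), PartitionCopy()]  # k-safety = 2
synchronous_write(master, replicas, "cust-1", {"balance": 100})

# Any copy can now serve the read with an identical, current result.
assert all(c.data["cust-1"]["balance"] == 100 for c in [master] + replicas)
```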

5. What are replicated tables used for?

Replicated tables are for storing reference data that will often be joined against the transactional data in queries. For example:
  1. Customer Profile Data - partitionable by customerId
  2. Customer Clickstream Data - partitionable by customerId
  3. Age Demographic Data - not related to a specific customer; something like A -> 15-25, B -> 25-35, C -> 35-45. This is a good candidate for a replicated table
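The payoff of replicating the small reference table can be sketched conceptually (table contents and names here are the illustrative ones from the example above, modeled as plain Python data): because every partition holds a full copy of the demographic table, joining it against that partition's customer rows never needs to leave the partition.

```python
# Replicated table: every partition holds the full copy.
age_demographics = {"A": "15-25", "B": "25-35", "C": "35-45"}

# One partition's local slice of the partitioned customer table.
local_customers = [("cust-1", "B"), ("cust-7", "A")]

# The join runs entirely inside the partition -- no cross-partition
# traffic is needed to resolve the demographic band.
result = [(cid, age_demographics[band]) for cid, band in local_customers]
print(result)  # [('cust-1', '25-35'), ('cust-7', '15-25')]
```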

6. What’s the largest VoltDB cluster? And what’s a standard node (e.g., CPU cores, memory, disk) for VoltDB?

The largest production deployment of VoltDB is 20+ nodes holding around 3 TB of data. Usually we recommend not going beyond 24 to 32 cores and 256 GB of RAM per machine, for two reasons:

  1. All the partitions write their command logs to the same disk. A very large number of cores can cause contention on the disk I/O.
  2. When you have a lot of memory on a single machine, snapshots take a long time to save. Likewise, when you have a catastrophic event on the cluster and need to restore, that much data needs to be read off disk to repopulate memory. 256 GB is a good balance between storing a large amount of data per node and not having a long reload time.
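A back-of-the-envelope calculation makes the reload concern concrete. The disk throughput figure below is a hypothetical assumption for illustration, not a VoltDB benchmark:

```python
# Rough restore-time estimate for a fully loaded 256 GB node.
ram_gb = 256
read_mb_per_s = 500  # assumed sustained sequential read rate

restore_minutes = ram_gb * 1024 / read_mb_per_s / 60
print(f"~{restore_minutes:.1f} minutes to reload {ram_gb} GB")
```

At that assumed rate a full node takes on the order of ten minutes to repopulate, and the time scales linearly with RAM, which is why very large per-node memory sizes are discouraged.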
