I would like to run the cluster on GlusterFS to get high availability and make the storage ‘elastic’. Do you think GlusterFS will cause issues? Of course each node will use its own space, and no file will be shared with other databases.
Since I will run on EC2, do you think that using EBS volumes in RAID10 would provide the same benefits?
Another idea I have is to use new SSD instances with GlusterFS:
SSD = performance
GlusterFS = distributed FS
I don’t quite get the idea. What kind of “elastic” storage will GlusterFS provide in case of using XtraDB Cluster?
Each node in XDC needs the same amount of disk space, and the idea of the cluster is to keep the service up on the remaining nodes (which must all already have the same data set) in case of a node failure. Mixing their storage using GlusterFS seems to me like additional, superfluous data redundancy that gives you no benefit and only introduces overhead.
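To put rough numbers on the redundancy point, here is a minimal Python sketch. It assumes a dataset of `dataset_gb`, one full copy per XtraDB Cluster node (what Galera already gives you), and, optionally, a replicated GlusterFS volume underneath that keeps `gluster_replica` copies of every file. The function name and the simple multiplicative model are illustrative assumptions, not measurements.

```python
def raw_storage_gb(dataset_gb: float, xdc_nodes: int, gluster_replica: int = 1) -> float:
    """Hypothetical model of total raw disk used by the cluster.

    Each XtraDB Cluster node keeps a full copy of the dataset; if that copy
    lives on a replicated GlusterFS volume, every file is stored
    `gluster_replica` more times on top of that.
    """
    if xdc_nodes < 1 or gluster_replica < 1:
        raise ValueError("node and replica counts must be >= 1")
    # Galera already gives one complete, independent copy per node.
    copies_from_galera = xdc_nodes
    # A replicated Gluster volume multiplies each of those copies again.
    return dataset_gb * copies_from_galera * gluster_replica


if __name__ == "__main__":
    d = 100.0  # assumed dataset size in GB
    print(raw_storage_gb(d, 3))     # three independent copies
    print(raw_storage_gb(d, 3, 2))  # six copies of the same data
```

With three nodes and a replica-2 Gluster volume you would be paying for six copies of the same 100 GB, while the cluster's failover behavior is no better than with three.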
RAID10 provides data safety at each node’s storage level as well as additional I/O capacity, and combined with SSDs, even more I/O capacity.
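A rough sketch of what RAID10 buys you on a single node, assuming classic RAID 1+0 (mirrored pairs striped together) and an idealized per-disk IOPS figure. It ignores controller overhead, caching, and the network path that EBS volumes add, so treat the numbers as upper bounds, not predictions. The names here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Raid10Estimate:
    usable_gb: float
    guaranteed_failures: int  # disk failures survived in the worst case
    max_read_iops: float
    max_write_iops: float


def raid10_estimate(disks: int, disk_gb: float, disk_iops: float) -> Raid10Estimate:
    """Idealized RAID 1+0 figures for `disks` identical disks."""
    if disks < 4 or disks % 2:
        raise ValueError("classic RAID 1+0 needs an even number of disks, at least 4")
    pairs = disks // 2
    return Raid10Estimate(
        usable_gb=pairs * disk_gb,         # half the raw space holds mirror copies
        guaranteed_failures=1,             # losing both disks of one pair loses data
        max_read_iops=disks * disk_iops,   # reads can be served by either mirror
        max_write_iops=pairs * disk_iops,  # each write must land on both disks of a pair
    )


if __name__ == "__main__":
    print(raid10_estimate(4, 500.0, 3000.0))
```

Four 500 GB volumes give 1000 GB usable, survive any single disk failure, and in this idealized model double write throughput and quadruple read throughput relative to one disk.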
IMHO distributed networked filesystems are sweet when you want to operate on large data volumes, where you need redundancy at the host or even the datacenter level, but each node can most likely store only a fraction of the total dataset. This way you can just add more nodes when you need more space.
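The "add more nodes when you need more space" case can be sketched as simple capacity arithmetic. This assumes one brick per node and a distributed-replicated volume, where the brick count has to be a multiple of the replica count; the function names are hypothetical and real usable space will be somewhat lower.

```python
import math


def usable_capacity_gb(nodes: int, per_node_gb: float, replica: int) -> float:
    """Idealized usable space: raw capacity divided by copies kept per file."""
    return nodes * per_node_gb / replica


def nodes_needed(dataset_gb: float, per_node_gb: float, replica: int) -> int:
    """Smallest node count that fits `dataset_gb` with `replica` copies,
    rounded up to a multiple of `replica` so bricks form complete replica sets."""
    n = math.ceil(dataset_gb * replica / per_node_gb)
    return math.ceil(n / replica) * replica


if __name__ == "__main__":
    print(usable_capacity_gb(10, 2000.0, 2))
    print(nodes_needed(5000.0, 2000.0, 3))
```

Here no single node holds the whole dataset; capacity grows with node count. That is the opposite of XtraDB Cluster, where every node must hold everything.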
This approach really negates what XtraDB Cluster is designed to do. The cluster already replicates changes in the database through Galera replication, creating a shared-nothing architecture where every node has an independent copy of the data. Adding Gluster underneath really gains you nothing but complexity, and could possibly break the cluster, since each node is meant to have a separate copy of the data, not a shared copy.
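A toy Python model of the shared-nothing idea: each node owns a private copy of the data, a write is applied to every live node before it returns, and losing a node leaves the others with the full dataset. This is a deliberately simplified illustration, not Galera's actual certification-based protocol; it ignores quorum, conflicts, and state transfers, and the class names are hypothetical.

```python
class Node:
    """One cluster member with its own private copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}  # never shared with other nodes
        self.alive = True


class ToyCluster:
    """Toy synchronous replication: a write returns only after every
    live node has applied it to its own copy."""

    def __init__(self, names: list[str]) -> None:
        self.nodes = [Node(n) for n in names]

    def live_nodes(self) -> list[Node]:
        return [n for n in self.nodes if n.alive]

    def write(self, key: str, value: str) -> None:
        live = self.live_nodes()
        if not live:
            raise RuntimeError("no live nodes")
        for node in live:
            node.data[key] = value  # each node updates its own copy

    def read(self, key: str) -> str:
        live = self.live_nodes()
        if not live:
            raise RuntimeError("no live nodes")
        return live[0].data[key]  # any live node has the full dataset


if __name__ == "__main__":
    cluster = ToyCluster(["db1", "db2", "db3"])
    cluster.write("order:1", "paid")
    cluster.nodes[0].alive = False  # simulate a node failure
    print(cluster.read("order:1"))
```

Redundancy here comes from the independent copies themselves. If the nodes' storage were a single shared location, the replication loop would just rewrite the same bytes, and damage to that one copy would affect every node at once.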
Gluster is a neat tool and is really designed more for hosting content like video files, image files, virtual machine datastores, and application data, where you want to scale read/write capability to deliver high concurrency against a huge collection of data. You could look at it this way: XtraDB Cluster and GlusterFS are both applications responsible for handling/abstracting the complexity of distributed storage, one for databases, the other for files.