I am working on a project intended to replace Amazon S3 in yet another backup solution. To simplify migration, the project will provide exactly the same interface as Amazon S3 (namely, objects/files identified by free-format string keys of up to 1 KB). It is going to be a Linux/Apache/Tomcat/Java/?MySQL? platform, where MySQL is supposed to store all the keys and a mapping from each key to the data store where the actual object/file resides.
The target scale is huge: several petabytes of storage and several thousand requests per second. As a result, a single MySQL server will not be able to handle it, and it seems that MySQL Cluster will not be able to handle it either.
The way I see to improve this is to shard or partition the keys across several MySQL servers. The issues/comments/complexities along this way are as follows:
- The backup solution uses only one Amazon S3 user and only one bucket to store all the data, so per-user sharding is not possible.
- The LIST operation must be implemented, so key ordering must be preserved.
- The best way seems to be to partition the lexicographical order of the keys.
- The keys are arbitrary, unknown strings, so "static" sharding is not possible; sharding must be dynamic, based on the actual data.
- The data set will grow; as a result, the number of MySQL servers involved will grow as well, so smooth data migration must be supported.
- And in the other direction as well: some partitions/shards may be merged and MySQL servers decommissioned.
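To make the dynamic-partitioning idea concrete, here is a minimal sketch (class and method names are mine, not from any existing library): a sorted map from range-start key to shard id, where a key is routed to the shard owning the greatest range start that is less than or equal to it. Splitting a full shard's range is just inserting a new boundary, and merging is removing one, which is what makes migration and decommissioning incremental.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Dynamic range-based sharding over the lexicographic order of the keys.
public class ShardRouter {
    // Maps the first key of each range to the id of the shard owning it.
    private final TreeMap<String, Integer> ranges = new TreeMap<>();

    public ShardRouter(int initialShard) {
        ranges.put("", initialShard); // "" covers the whole keyspace at first
    }

    // Route a key to the shard whose range contains it.
    public int shardFor(String key) {
        return ranges.floorEntry(key).getValue();
    }

    // Split: keys >= splitKey belong to newShard from now on
    // (the actual row migration would happen in the background).
    public void split(String splitKey, int newShard) {
        ranges.put(splitKey, newShard);
    }

    // Merge: drop a boundary; its range is absorbed by the predecessor range.
    public void merge(String boundaryKey) {
        ranges.remove(boundaryKey);
    }

    // For LIST, shards must be visited in key order starting at the marker.
    public List<Integer> shardsFrom(String marker) {
        List<Integer> result = new ArrayList<>();
        result.add(shardFor(marker));
        result.addAll(ranges.tailMap(marker, false).values());
        return result;
    }
}
```

In a real system this routing table itself would have to live somewhere highly available (e.g. replicated, or in ZooKeeper-style coordination storage), since every request consults it.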
Another approach could be to switch from MySQL to some other platform. For instance, Google's BigTable solves all of these complications, so given that only a very limited set of operations is required (PUT/GET/DELETE/LIST), a BigTable clone could be a good fit.
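Whichever backend is chosen, the required interface stays small. As an illustration only (names and signatures are hypothetical, not the real S3 API), here is the key index reduced to those four operations over an in-memory sorted map; the sorted map is what gives LIST its lexicographic order, and a real implementation would swap it for the sharded MySQL or BigTable-style backend:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory sketch of the four required operations. The value stored per key
// would be the location of the object in the actual data store.
public class ObjectIndex {
    private final NavigableMap<String, byte[]> index = new TreeMap<>();

    public void put(String key, byte[] location) { index.put(key, location); }
    public byte[] get(String key) { return index.get(key); }
    public void delete(String key) { index.remove(key); }

    // LIST: up to maxKeys keys >= marker, in lexicographic order
    // (mirroring S3-style marker/max-keys pagination).
    public List<String> list(String marker, int maxKeys) {
        List<String> keys = new ArrayList<>();
        for (String k : index.tailMap(marker, true).keySet()) {
            if (keys.size() >= maxKeys) break;
            keys.add(k);
        }
        return keys;
    }
}
```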
What do you think?