I am wondering if MySQL is the right tool for this and, if it might be, whether anyone can offer pointers on database design for this type of project:
A part of the platform I am building requires a large data table (starting at tens of millions of records, scaling to hundreds of millions within a year or two, maybe reaching billions at some point). The structure of the table is: int, int, float, datetime, datetime. The table will accept data from a single source (an import script) in batches of up to ten million rows, and I have full control over that script. Various applications will be pulling data via a web API and probably a custom TCP server. I am expecting requests for individual rows in bursts of up to 50,000 per second. At first this sounds like a good fit for a key-value design, but many of the requests will take the form:
select float where int = A and datetime1 < B and datetime2 < C order by datetime1 desc, datetime2 desc limit 0,1
The basic idea is that I’m getting the data point for a given series that has the latest datetime pair below a user-defined threshold. I will probably be able to do some logic at the application layer to pull an entire series at a time, but much of that ordering will still fall to the database layer.
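For concreteness, here is roughly how I picture the table and an index that could serve that lookup. All names are placeholders, and I haven’t settled on what the real primary key should be:

create table datapoints (
    series_id   int not null,       -- the first int: identifies the series
    other_id    int not null,       -- placeholder for the second int
    value       float not null,     -- the float I'm actually requesting
    datetime1   datetime not null,  -- first datetime of the pair
    datetime2   datetime not null,  -- second datetime of the pair
    key idx_lookup (series_id, datetime1, datetime2)
) engine=InnoDB;

select value
from datapoints
where series_id = ? and datetime1 < ? and datetime2 < ?
order by datetime1 desc, datetime2 desc
limit 1;

If I understand the optimizer correctly, that composite index should let the lookup walk the index backwards within a single series instead of sorting anything.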
I’m currently running a prototype on SQL Server 2005 and it’s very responsive at up to 1,000 requests per second with 10 million records. I am concerned about scaling to hundreds of millions of rows at 50,000 requests per second.
What do you think? Is MySQL the tool for the job? Should I look into NoSQL solutions? If MySQL is it, any pointers or ideas about how best to structure the importer to minimize the impact on read performance?
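For what it’s worth, here is the rough shape of the batched import I’ve been considering: load each batch into a staging table first, then move it into the live table. The file path and table names are placeholders:

create table datapoints_staging like datapoints;

load data infile '/path/to/batch.csv'
into table datapoints_staging
fields terminated by ',' lines terminated by '\n'
(series_id, other_id, value, datetime1, datetime2);

-- move the batch into the live table; this could also be done in
-- chunks to keep individual transactions from getting huge
insert into datapoints select * from datapoints_staging;

drop table datapoints_staging;

Does that seem sane, or is there a better way to keep the bulk load from hurting concurrent reads?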
Thanks!
theserge