Sizing of servers for a project

Hi everyone,

For an online community project I would like to buy or lease my own servers.

At the moment the project runs on a single rented root server: a 2-CPU dual-core Opteron 265 with 4GB RAM and 2x250GB SATA in RAID1.
The project runs on Debian/PHP4/MySQL 4.1/Apache2.
The maximum number of users online at the same time is 750-950 (in the evening hours). The performance of the server is OK at this point, but I guess it is close to the maximum number of concurrent users it can handle. Because the provider offers no possibility to upgrade or to add a second dedicated server for the database, I would like to get my own hardware and my own colocation.

I’m not sure about the sizing of the servers. Over the next year I expect the number of concurrent online users to grow to 3000.
If this goal is reached, extending the hardware should be no problem, but at the moment the budget is limited.

Would it be better to get one big server for Apache and one for the database? Or would it be better to get maybe 4 or 6 cheaper single-CPU machines and build a kind of cluster?

Maybe for the database something like this:
Dual-CPU dual-core machine (I prefer the new Xeon CoreDuo)
8GB ECC Ram
SCSI 15000rpm hardware RAID 10 with 4 or 6 drives

and maybe the same for Apache but with less expensive hard drives. I think RAID1 is enough.

These machines could then be extended with RAM up to 32GB if needed.

What are your thoughts?

Thanks in advance
Daniel

Hi Daniel,

Do you know where your bottlenecks are or will be? This will greatly affect how you want to structure your platform from a hardware POV.

(When you say “maximum concurrent users” how are you measuring what the maximum is?)

Is it webserver processing that’s the bottleneck? Then get more/faster webservers - these should be super easy to scale out horizontally.

If it’s HTTP connection handling, look at using a reverse proxy such as Squid to free up your heavy/expensive/valuable apache processes.

If it’s DB that’s your bottleneck, then you’ll get much greater performance gains from optimising any queries that are run very often or are very slow than you will from buying hardware (unless you’re severely RAM limited say) so maybe look at this first. If your queries are running well and you just need more DB capacity, then:

Managing a single DB server is quite a bit easier than managing a cluster, so if you can get away with it from a performance POV then great. (Take reliability/failover into account too - you may also want to run a replicated slave DB which can act as failover)

Scaling a DB platform vertically (eg. bigger boxes) can get expensive very quickly though.

With a relatively small platform (i.e. just a few boxes) separating the DB from the webserver doesn’t always make best use of your limited resources (eg the webserver may be 90% idle and the DB server fully loaded or vice-versa) but it does make platform management and tuning easier.

If you can post where you think your bottleneck is (and why) then people might be able to suggest ways of reducing it through optimisation/configuration/software to let you get the most out of your hardware :) (and if not, at least help you to decide what you need to focus your HW purchases on)

HTH :)
Toasty

Hi Daniel,

First I would mention that you should be careful using the number of online users - this metric is very misleading. Some people count anyone with a page view in the last minute as online, while others count anyone active within the last 30 minutes. HTTP is a stateless protocol, so the only real metrics are the number of concurrent requests and requests per second… but even then, not all requests are equal.
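
To make the point about "online users" concrete, here is a minimal Python sketch. The access log, user names and timestamps are made up for illustration; the function simply counts distinct users with a hit inside a chosen time window.

```python
from typing import Iterable, Tuple


def online_users(hits: Iterable[Tuple[str, float]], now: float, window_s: float) -> int:
    """Count distinct users with at least one hit in the last window_s seconds."""
    return len({user for user, ts in hits if 0 <= now - ts <= window_s})


# Hypothetical access log: (user id, timestamp in seconds)
hits = [("anna", 10), ("ben", 600), ("anna", 1700), ("carl", 1750), ("dora", 1790)]
now = 1800

one_minute = online_users(hits, now, 60)       # "online" = active in the last minute
thirty_minutes = online_users(hits, now, 1800)  # "online" = active in the last 30 minutes
print(one_minute, thirty_minutes)
```

The same traffic gives a different "online" figure depending only on the window, which is why requests per second is the safer number to size hardware against.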

Before going with an upgrade I’d check whether your application is optimized enough - I do not know how complex your application is, but you have plenty of hardware to throw at your users at this point. Even if you have to upgrade later anyway, spending time optimizing the application first is very efficient, as it may significantly reduce the hardware you need for the upgrade.

Next I would also look into scaling out - it is much better if the application can be improved so that it scales by adding servers rather than by moving to more and more powerful servers.

Also listen to Toasty. You need to assess your operational skills and requirements, your high availability needs and where the bottleneck is. Is it the web server or the database? If it is the database, is it CPU or IO bound? What is the database size and what is the working set (this defines how much memory you need)? Which storage engine are you using, etc.

I obviously would be happy to take a closer look at your application and help you with sizing and/or optimization.

Hello Peter!

First of all, I really like your site… good job!

Can you tell me how to make a good estimate of the maximum number of requests a site can handle at any one moment? Or, to put the question another way: if you expect a community site to handle one million page impressions (PI) in its busiest hour, what would that mean in terms of hardware needs…
What are the most relevant things to take into consideration? MySQL, Apache or lighttpd, HDD performance, etc… Is there any forum or blog where I can find some relevant data?
Which one is the most critical in your opinion when building such a site?
Community sites tend to use: forum, lots of pics (albums), internal email system, blog, tags… and show lots of information for users on every page…

thanks,
Rich

Thank you Rich,

Generally it is really hard to estimate how many page views a site can handle other than by running a benchmark. And it had better be a realistic benchmark. For example, with ab (apache benchmark) always hitting the same page you will not get a realistic load; you will measure something close to peak speed with all caches full.
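
One way to get closer to a realistic load is to drive the benchmark from a weighted mix of pages rather than a single URL. The sketch below only builds such a request plan; the paths and traffic shares are hypothetical and would ideally come from your real access logs, and replaying the plan against the site is left to whatever load tool you use.

```python
import random
from collections import Counter


def build_request_plan(url_weights, n_requests, seed=0):
    """Draw n_requests URLs at random, in proportion to their traffic share."""
    rng = random.Random(seed)  # fixed seed so a benchmark run can be repeated
    urls = list(url_weights)
    weights = [url_weights[u] for u in urls]
    return rng.choices(urls, weights=weights, k=n_requests)


# Hypothetical page mix (path -> relative share of traffic)
mix = {
    "/forum/thread?id=1": 50,
    "/album/view?id=7": 30,
    "/mail/inbox": 15,
    "/blog/post?id=3": 5,
}

plan = build_request_plan(mix, 1000)
counts = Counter(plan)
print(counts.most_common())
```

In a real test the ids in the URLs should vary as well, otherwise every request still hits warm caches.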

Your hardware needs depend a lot on how you write your software and how complex it is internally. For example, I know Digg ran for a long time on a couple of MySQL servers while serving many millions of PV per hour.

If you’re targeting 1 million PV per hour (which for a community site probably means a large database as well) everything becomes important - your web layer and caching, database design, software efficiency - i.e. you can’t spend too much CPU time generating pages.
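
As a rough back-of-the-envelope check, the target can be turned into an average request rate and a CPU time budget per page. The core count and target utilization below are assumptions chosen only for illustration, and the real peak within the hour will be higher than the average.

```python
def avg_requests_per_second(page_views_per_hour: float) -> float:
    """Average page views per second over one hour."""
    return page_views_per_hour / 3600.0


def cpu_budget_ms(rps: float, cores: int, target_utilization: float) -> float:
    """CPU milliseconds available per page if `cores` run at `target_utilization`."""
    return 1000.0 * cores * target_utilization / rps


rps = avg_requests_per_second(1_000_000)

# Assumption: 8 web server cores in total, kept at about 70% busy
budget = cpu_budget_ms(rps, cores=8, target_utilization=0.7)

print(f"{rps:.1f} page views/s on average")
print(f"{budget:.1f} ms of CPU time per page")
```

If a page takes more CPU time than this budget, you need either more cores or cheaper pages (caching, fewer queries).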

The most important thing is proper planning - check the software you have and assess how it will work with many users and large data sets. There could be show stoppers - e.g. full table scans on tables which may grow large. Once you have assessed your software you can estimate your hardware needs and think about whether you can optimize the software to reduce them.

If you’re going to have 20 servers, even a 10% performance gain is important.

Sorry I’m not giving you a hard answer, but any answer I gave would be misleading. Depending on circumstances, the amount of hardware needed could differ by a factor of 10-100.

I have sometimes optimized applications to run 10 times faster simply by adding proper indexes :)
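
To illustrate the full table scan versus index point, here is a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in, not MySQL, and the table and index names are hypothetical. On MySQL the equivalent check is to run EXPLAIN on the query before and after adding the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO posts (user_id, body) VALUES (?, ?)",
    [(i % 1000, "text") for i in range(10000)],
)


def query_plan(conn, sql):
    """Return SQLite's query plan for sql as one string (detail is column 3)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


sql = "SELECT id, body FROM posts WHERE user_id = 42"

plan_before = query_plan(conn, sql)  # no index on user_id: the whole table is scanned
conn.execute("CREATE INDEX idx_posts_user ON posts (user_id)")
plan_after = query_plan(conn, sql)   # the new index is used for the lookup

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan reports a scan of the table and the second reports a search using the index.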