Let me say straight off that I do not have to much control over the
MySQL server itself, but I can say that it has fairly good capacity
and is running MySQL 5.0.19 and is on a Linux box. You can assume a
I have a good amount of experience with MySQL, however MySQL
optimization is not one of my better skill sets. I currently have a
site (really a series of them) which employs a small amount of MySQL,
but not for the main storage engine. I need to shift to a full MySQL
solution. It’s really not a very complicated site, however there is
lots of data.
The site works essentially like a blog. In fact its a small blog
aggregator. There are posts with dates, the source the post came from
as well as a link to the original article. So what has to be stored is
relatively simple. That said there are currently about 200,000 posts
and counting (uncompressed over 1 gig of data). This could easily grow to over 1 million.
The site is split into sub domains. Each sub
domain relates to a specific topic. For example:
There is a celebrity section and the sub domain is celebrity.MySite.net.
there is a video game section and the sub domain is game.MySite.net
Currently there are 19 of these sub domains in all, but it’s growing.
MySite uses two very small tables at present. The rest of the data is
stored in gziped text files (that’s the part that has to change).
These tables are as fallows.
- sub domain ( stores info about a particular sub domain )
sd_id int(10) unsigned NOT NULL auto_increment,
sd_name varchar(255) NOT NULL default ‘’,
sd_name_disp varchar(255) NOT NULL default ‘’,
sd_title varchar(255) NOT NULL default ‘’,
sd_desc varchar(255) NOT NULL default ‘’,
sd_keywords varchar(255) NOT NULL default ‘’,
sd_feed_alt_url varchar(255) NOT NULL default ‘NONE’,
sd_today_post_num int(10) unsigned NOT NULL default ‘0’,
sd_other varchar(255) default NULL,
PRIMARY KEY (
- feeds ( stores RSS feed URL’s of each of the data sources. Each row
is related to the sub domain it belongs to via the “sd_id” field.
feed_id int(10) unsigned NOT NULL auto_increment,
sd_id int(10) unsigned NOT NULL default ‘0’,
feed_name varchar(255) NOT NULL default ‘’,
feed_site_url varchar(255) NOT NULL default ‘’,
feed_rss_url varchar(255) NOT NULL default ‘’,
feed_post_length varchar(255) NOT NULL default ‘’,
feed_type enum(‘headline’,‘main’,‘both’) NOT NULL default ‘main’,
feed_other varchar(255) default NULL,
PRIMARY KEY (
) TYPE=MyISAM AUTO_INCREMENT=536 ;
So what I do currently is on an hourly bases (via a cron job) read
through each of the rows in the “feeds” table and store any new posts
in the appropriate folder for the corresponding sub domain. But what I
need to do, as I have said is store this data in a MySQL DB. It will
need to be searchable, quickly accessible via sub domain, and quickly
sorted by date time. I would also like to be able to have a global
search across all sub domains if possible.
Currently there is no way to search MySite and if speed is going to
be a problem as far as search I can live with that. The key is to have
quick date sorting so that I can call the latest posts for sub domain
“celebrity” in chronological order with out to much delay.