Let me say straight off that I do not have to much control over the
MySQL server itself, but I can say that it has fairly good capacity
and is running MySQL 5.0.19 and is on a Linux box. You can assume a
default setup.
I have a good amount of experience with MySQL, however MySQL
optimization is not one of my better skill sets. I currently have a
site (really a series of them) which employs a small amount of MySQL,
but not for the main storage engine. I need to shift to a full MySQL
solution. It’s really not a very complicated site, however there is
lots of data.
The site works essentially like a blog. In fact its a small blog
aggregator. There are posts with dates, the source the post came from
as well as a link to the original article. So what has to be stored is
relatively simple. That said there are currently about 200,000 posts
and counting (uncompressed over 1 gig of data). This could easily grow to over 1 million.
The site is split into sub domains. Each sub
domain relates to a specific topic. For example:
There is a celebrity section and the sub domain is celebrity.MySite.net.
there is a video game section and the sub domain is game.MySite.net
etc…
Currently there are 19 of these sub domains in all, but it’s growing.
MySite uses two very small tables at present. The rest of the data is
stored in gziped text files (that’s the part that has to change).
These tables are as fallows.
- sub domain ( stores info about a particular sub domain )
CREATE TABLE subdomain
(
sd_id
int(10) unsigned NOT NULL auto_increment,
sd_name
varchar(255) NOT NULL default ‘’,
sd_name_disp
varchar(255) NOT NULL default ‘’,
sd_title
varchar(255) NOT NULL default ‘’,
sd_desc
varchar(255) NOT NULL default ‘’,
sd_keywords
varchar(255) NOT NULL default ‘’,
sd_feed_alt_url
varchar(255) NOT NULL default ‘NONE’,
sd_today_post_num
int(10) unsigned NOT NULL default ‘0’,
sd_other
varchar(255) default NULL,
PRIMARY KEY (sd_id
)
) TYPE=MyISAM
- feeds ( stores RSS feed URL’s of each of the data sources. Each row
is related to the sub domain it belongs to via the “sd_id” field.
CREATE TABLE feeds
(
feed_id
int(10) unsigned NOT NULL auto_increment,
sd_id
int(10) unsigned NOT NULL default ‘0’,
feed_name
varchar(255) NOT NULL default ‘’,
feed_site_url
varchar(255) NOT NULL default ‘’,
feed_rss_url
varchar(255) NOT NULL default ‘’,
feed_post_length
varchar(255) NOT NULL default ‘’,
feed_type
enum(‘headline’,‘main’,‘both’) NOT NULL default ‘main’,
feed_other
varchar(255) default NULL,
PRIMARY KEY (feed_id
)
) TYPE=MyISAM AUTO_INCREMENT=536 ;
So what I do currently is on an hourly bases (via a cron job) read
through each of the rows in the “feeds” table and store any new posts
in the appropriate folder for the corresponding sub domain. But what I
need to do, as I have said is store this data in a MySQL DB. It will
need to be searchable, quickly accessible via sub domain, and quickly
sorted by date time. I would also like to be able to have a global
search across all sub domains if possible.
Currently there is no way to search MySite and if speed is going to
be a problem as far as search I can live with that. The key is to have
quick date sorting so that I can call the latest posts for sub domain
“celebrity” in chronological order with out to much delay.