I have a large fact table like this:
(day date not null,
advertiser_id int not null,
publisher_id int not null,
ad_template_id int not null,
status_id tinyint not null,
partner_id int not null,
event_source_id tinyint not null,
product_id int not null,
impressions int,
partialimpressions float,
primary key (day, advertiser_id, publisher_id, ad_template_id, status_id, partner_id, event_source_id, product_id)
);
all queries i do against this table select using day and often some combination of the rest of the key [i think many times using advertiser_id so its second…]
my question is: this is a large table (you can see the combinations get big), and so presumably traversing the index will take a good chunk of memory. i have some other tables that also have a similar structure.
i think right now querying against these tables is flushing the innodb_buffer (i can only give innodb_buffer 4GB…) repeatedly.
my questions are:
are there any alternatives to this ‘large primary key’ approach? one reason i want it is i want to keep the unique semantic across the key.
i calculated the size of the key and data by having a ballpark of a reasonable lookback, and then multiplying each row size by the number or rows and i believe i should aim to have enough memory to keep this in memory if i want to query against it? is this the right approach here?
(i set innodb_buffer_pool_size to the max i could without making the machine swap unfortunately there are other things running on it right now…)
also am i right in thinking i can monitor either vmstat (bi/bo) [which unfortunately does not work so well because of other stuff running on the machine] and also i am thinking of monitoring ‘innodb_buffer_pool_reads/innodb_buffer_pool_read_requests’ to count the % cache misses.
sorry a large braindump - appreciate any comments.
thanks