Actual table size decrease

Hey,

I’m looking at a table here that is kinda big… but it doesn’t need to be. I’m wondering if I should suggest a change to how it’s stored; inserts would gain an added layer of complexity, but the queries should be faster.

A big, big parameter table with:

parameter_id int
parameter_type_id int (points to the type of the parameter, e.g. int, string, date, large string, etc.)
parameter_value varchar(64000)?

Data fills this table from XML; the parameter value could be as small as an int (‘1’) or as big as a large string containing log data.

If a new row gets inserted and it’s only an int, it will still take up the space for varchar(64000), right?
And doing a search for the int ‘1’ will involve a text conversion (fulltext), etc.?

Should I split this table into a table per parameter_type?
As in one table for int values, one table for datetime values, one table for small strings, one table for mediumtexts, etc.

something like

main table -
parameter_id int
parameter_type_id int (points to the type of the parameter, e.g. int, string, date, large string, etc.)

int table -
int_parameter_id int
parameter_id int (same as maintable)
parameter_value int

datetime table -
datetime_parameter_id int
parameter_id int (same as maintable)
parameter_value datetime

large string table -
largestring_parameter_id int
parameter_id int (same as maintable)
parameter_value varchar(64000)

medium string table -
mediumstring_parameter_id int
parameter_id int (same as maintable)
parameter_value varchar(333)
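
In DDL it would be roughly this (just a sketch, names are illustrative; one child table per type):

create table maintable (
    parameter_id      int not null auto_increment primary key,
    parameter_type_id int not null
);

create table inttable (
    int_parameter_id int not null auto_increment primary key,
    parameter_id     int not null,  -- same as maintable
    parameter_value  int not null,
    index (parameter_id),
    index (parameter_value)
);

-- datetable, medium_strtable and large_strtable would follow the same
-- pattern, with datetime, varchar(333) and text value columns respectively.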

So a query would be something like:

select it.parameter_value, dt.parameter_value, st.parameter_value, mst.parameter_value
from maintable mt
left outer join inttable it on mt.parameter_id = it.parameter_id
left outer join datetable dt on mt.parameter_id = dt.parameter_id
left outer join large_strtable st on mt.parameter_id = st.parameter_id
left outer join medium_strtable mst on mt.parameter_id = mst.parameter_id
where mt.parameter_id = 3

If the parameter is an int, then dt.parameter_value, st.parameter_value, and mst.parameter_value will be null.

I know the extra joins will add to the query time, but hmm, for the drastic reduction in storage it might be worth it?

This table is HUUGGGEEEE, disk-size wise, compared to the number of rows in it; the index file for the table is huge too, with indexes on the primary key and the value.

Inserts have an added layer of complexity too, but I’m looking at things like http://doc.dev.md/mysql-4.1/optimization.html#data-size

Does anyone have a nicer way to optimize a table with different data types for its values? I know what datatype each value is; it just seems a huge waste dumping them all in the same column, and good luck trying to do a query on date ranges as it is!

Thanks :)

/Bal

er

(select it.parameter_value
 from maintable mt
 inner join inttable it on mt.parameter_id = it.parameter_id
 where mt.parameter_id = 3)
union
(select dt.parameter_value
 from maintable mt
 inner join datetable dt on mt.parameter_id = dt.parameter_id
 where mt.parameter_id = 3)
union
(select lst.parameter_value
 from maintable mt
 inner join large_strtable lst on mt.parameter_id = lst.parameter_id
 where mt.parameter_id = 3)
union
(select mst.parameter_value
 from maintable mt
 inner join medium_strtable mst on mt.parameter_id = mst.parameter_id
 where mt.parameter_id = 3)
limit 1

might make more sense as a query. I’m also wondering if LIMIT 1 would stop the query from completing the unions once it found a value in the first table ;)

I know this is complex and a bit awkward, but does it make any sense? Should I just avoid doing this and stick to one table? Is there a nicer way to organise the table?

Hmm

InnoDB might not have any issue with storage here?
What about indexes, though?

MyISAM pads everything, though?

It sounds a bit odd that you have anything from a one-character string ‘1’ to 64000 characters of XML in the same column. Could it be that these parameters are mostly all present, so that you should instead have four different columns in this table? One int, one date, etc., each containing the data of its type. That way you can also have different indexes on them; for example, on the large VARCHAR you might want a fulltext index, while on the other columns you want normal indexes.
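
Just to sketch what I mean (made-up names, and assuming MyISAM since fulltext indexes require it in MySQL 4.1):

CREATE TABLE parameters (
    parameter_id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    parameter_type_id INT NOT NULL,
    int_value         INT NULL,
    date_value        DATETIME NULL,
    small_str_value   VARCHAR(255) NULL,
    large_str_value   TEXT NULL,
    INDEX (int_value),
    INDEX (date_value),
    INDEX (small_str_value),
    FULLTEXT (large_str_value)
) ENGINE=MyISAM;

Only the column matching parameter_type_id is filled in; the others stay NULL, which costs next to nothing.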

Doing your 4-table join strategy is something I would definitely advise against (based on your description, it sounds like you are going to have a lot of work and no direct benefit).

Quote:

Data fills this table from XML; the parameter value could be as small as an int (‘1’) or as big as a large string containing log data.

Technically it wouldn't be an int, since the value would be the string '1'; as long as you still query with "WHERE yourVarCharColumn = '1'", there is no implicit conversion.
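
For example (using the made-up column name from above):

-- compared as a string: the index on the column can be used
SELECT * FROM yourTable WHERE yourVarCharColumn = '1';

-- numeric literal: MySQL casts the column values to numbers,
-- so the index on the column can't be used
SELECT * FROM yourTable WHERE yourVarCharColumn = 1;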

But your main concern seems to be that you think this table will be excessively large, and it sounds like you think a VARCHAR(64000) will be very large even when it only contains a small string.
That is not so; a VARCHAR value takes up stringLength + 2 bytes, regardless of the declared maximum.
So the string ‘1’ will only occupy 3 bytes of storage.

As for indexes (if you use a normal index), you can use the yourVarCharColumn(x) syntax when creating the index, which says that only the first x bytes of the string should be part of the index.
The point is to keep the index small: usually the first 20-30 bytes are enough to locate the matching records, and the rest of the data is just payload that is better fetched from the table. That way the index itself can be kept small and fast.
But none of this applies to fulltext indexes, which are a totally different beast.
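
For example, indexing only the first 30 bytes (illustrative names again):

CREATE INDEX idx_value_prefix ON yourTable (yourVarCharColumn(30));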