Script to analyze all entries in a table and suggest optimal column data types.

RichardBronosky · June 27, 2007, 2:57pm

I’ve been working on a bash script that does this. I thought it was going to be really short and simple, but it is turning out to be quite complex. Before I continue with it, I want to know if such a thing already exists.

Basically I do a lot of queries to test each column like:

"select 'false.', `$field` from $TABLE where lpad(cast(0+`$field` as binary), length(`$field`), '0') != `$field` limit 1;"

This tests whether the data could be stored as an INT ZEROFILL: if the query returns a row, at least one value does not survive the round trip.
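For anyone who wants to try the idea without a server, here is a minimal Python sketch of the same round-trip test on plain strings. The function names are made up for illustration, and it is stricter than MySQL's string-to-number coercion: it only accepts unsigned ASCII digit strings, so values like '-5' or '' that the SQL lets through are rejected here.

```python
def round_trips_as_zerofill(value: str) -> bool:
    """True if converting to a number and zero-padding back gives the same string.

    Assumption: only unsigned ASCII digit strings count as numeric here.
    """
    if not (value.isascii() and value.isdigit()):
        return False
    # Same shape as lpad(cast(0+field as binary), length(field), '0') in the query.
    return str(int(value)).rjust(len(value), "0") == value


def first_zerofill_failure(values):
    """Return the first value that fails the test, or None (like the LIMIT 1 query)."""
    for value in values:
        if not round_trips_as_zerofill(value):
            return value
    return None
```

As with the query, getting a value back means the column is not a candidate; getting None means every value passed.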

Is there already a solution for this? If not, would anyone else be interested in helping to develop this? I intended to create a SourceForge project for it once I got a working version, but I’m not finding the time to complete it.

RichardBronosky · June 29, 2007, 12:14pm

Okay, since no one is interested in helping me with it… Is anyone interested in using it when it is done?

Here is the basic outline. For each column in the table, analyze the values:
1. Decide if the column is currently BINARY.
2. Get the length of the longest value.
3. If the values are numeric, then:
   1. Decide whether they require zerofill.
   2. Decide if they are unsigned.
   3. If they are all integers, decide what size is the best fit. Else…
   4. If they are [fixed point] DECIMAL, decide their significant digits and number of decimal places. Else…
   5. Decide if DOUBLE or single precision FLOAT is required.
   6. (optionally) Decide the significant digits and number of decimal places that best fit.
4. If the values are temporal:
   1. Decide which is the best fit (YEAR, DATE, TIME, TIMESTAMP, or DATETIME, in that order of precedence).
5. If an ENUM would be appropriate, suggest it.
6. If a CHAR would be most efficient, suggest it (BINARY if needed).
7. If the length found in step 2 is < 255 (or 65535 from MySQL 5.0.3 on), use VARCHAR (VARBINARY if needed). Otherwise, find the TEXT/BLOB size that best fits.
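To make the outline a bit more concrete, here is a rough Python sketch of the decision order for one column whose values have already been fetched as strings. It is not the bash script, just an illustration under assumptions: all names are hypothetical, lengths are counted in characters, `varchar_max` defaults to the pre-5.0.3 limit, and zerofill, BINARY detection, FLOAT/DOUBLE, ENUM, YEAR and TIMESTAMP are left out.

```python
import re
from datetime import datetime

# MySQL integer types with their width in bits, smallest first.
INT_TYPES = [("TINYINT", 8), ("SMALLINT", 16), ("MEDIUMINT", 24),
             ("INT", 32), ("BIGINT", 64)]
INT_RE = re.compile(r"-?\d+")
NUM_RE = re.compile(r"-?(\d+)(?:\.(\d+))?")
TEMPORAL_FORMATS = [("DATE", "%Y-%m-%d"), ("TIME", "%H:%M:%S"),
                    ("DATETIME", "%Y-%m-%d %H:%M:%S")]


def _integer_type(values):
    """Smallest integer type that holds every value, or None if none does."""
    nums = [int(v) for v in values]
    lo, hi = min(nums), max(nums)
    for name, bits in INT_TYPES:
        if lo >= 0 and hi < 2 ** bits:
            return f"{name} UNSIGNED"
        if lo < 0 and -(2 ** (bits - 1)) <= lo and hi < 2 ** (bits - 1):
            return name
    return None


def _decimal_type(values):
    """DECIMAL(precision, scale) wide enough for every value."""
    matches = [NUM_RE.fullmatch(v) for v in values]
    int_digits = max(len(m.group(1).lstrip("0")) for m in matches)
    scale = max(len(m.group(2) or "") for m in matches)
    return f"DECIMAL({int_digits + scale},{scale})"


def _all_parse(values, fmt):
    """True if every value parses with the given strptime format."""
    try:
        for v in values:
            datetime.strptime(v, fmt)
    except ValueError:
        return False
    return True


def suggest_type(values, varchar_max=255):
    """Suggest a column type for a list of string values (None means NULL)."""
    values = [v for v in values if v is not None]
    if not values:
        return None
    if all(INT_RE.fullmatch(v) for v in values):
        int_type = _integer_type(values)
        if int_type:
            return int_type
    if all(NUM_RE.fullmatch(v) for v in values):
        return _decimal_type(values)
    for name, fmt in TEMPORAL_FORMATS:
        if _all_parse(values, fmt):
            return name
    longest = max(len(v) for v in values)
    if longest <= 255 and all(len(v) == longest for v in values):
        return f"CHAR({longest})"
    if longest <= varchar_max:
        return f"VARCHAR({longest})"
    if longest <= 65535:
        return "TEXT"
    return "MEDIUMTEXT" if longest <= 16777215 else "LONGTEXT"
```

Because numeric checks run before temporal ones, a column of four-digit years comes out as an integer type in this sketch, which is one reason YEAR is skipped.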

What’s missing:
1. BIT type (I don’t use it)
2. SET type (I don’t want to code it)
3. Spatial Types (I hope I never need it)

I hope that draws some interest. I’ll take your continued silence to mean that I’m an idiot and the only person in the community who would like a tool for doing this.

Speeple · July 1, 2007, 12:33pm

The path I would take here is parsing the table schema, then contrasting the data type definitions with the actual data in the table.
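A minimal Python sketch of what that could look like, assuming the schema arrives as `SHOW CREATE TABLE`-style text and the rows as dictionaries of strings. The function names and the regex are illustrative only, and the comparison is limited to declared versus observed VARCHAR lengths.

```python
import re

# Matches column lines such as:  `zip` varchar(255) DEFAULT NULL,
# Key and constraint lines do not start with a backtick, so they are skipped.
COLUMN_RE = re.compile(
    r"^[ \t]*`(?P<name>[^`]+)`\s+(?P<type>\w+)(?:\((?P<args>[^)]*)\))?",
    re.MULTILINE,
)


def declared_columns(create_table_sql):
    """Map column name -> (TYPE, args-or-None) from a CREATE TABLE statement."""
    return {
        m.group("name"): (m.group("type").upper(), m.group("args"))
        for m in COLUMN_RE.finditer(create_table_sql)
    }


def oversized_varchars(create_table_sql, rows):
    """Report VARCHAR columns declared wider than the longest stored value.

    rows: list of dicts mapping column name -> string (or None for NULL).
    Returns {column: (declared_length, observed_length)}.
    """
    report = {}
    for name, (col_type, args) in declared_columns(create_table_sql).items():
        if col_type != "VARCHAR" or not args:
            continue
        declared = int(args)
        observed = max(
            (len(r[name]) for r in rows if r.get(name) is not None), default=0
        )
        if observed < declared:
            report[name] = (declared, observed)
    return report
```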

RichardBronosky · July 1, 2007, 6:24pm

This is a good idea. It would be worth adding.

My original purpose for the script was to serve my own need. I’m currently working for a company whose idea of data storage was delimited text files or Excel files. I’ve loaded dozens of tables into MySQL from comma- or tab-delimited files. In the interest of speed, I generally create the tables with all varchar fields. These tables are currently being used by individuals in the company. When it is decided that an application is going to be built against data in one of these tables, I either optimize the table or I pull the needed columns out of many tables and build one streamlined table.

I’d like to be able to take all these nasty tables that I have (and continue to create) and clean them up. So, as it stands, I don’t much care about the current schema. I know what it is, and I know it is bad.

roman · July 15, 2007, 6:23am

I think you can probably start with this query:

SELECT * FROM tbl PROCEDURE ANALYSE()

RichardBronosky · July 25, 2007, 10:00am

Unfortunately that almost always just suggests that I make everything an ENUM. I think that should be a REALLY powerful tool, but it turns out to be basically useless.
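For a home-grown script, one way to keep ENUM from being suggested everywhere is to gate it on cardinality. Here is a small Python sketch of that idea; the function name and both thresholds are arbitrary assumptions for illustration, not anything PROCEDURE ANALYSE itself does.

```python
def enum_candidate(values, max_distinct=16, min_rows_per_value=10):
    """Return an ENUM(...) definition only when the column repeats few values.

    Assumed thresholds: at most max_distinct distinct values, and on average
    at least min_rows_per_value rows per distinct value. None means NULL.
    """
    non_null = [v for v in values if v is not None]
    distinct = sorted(set(non_null))
    if not distinct or len(distinct) > max_distinct:
        return None
    if len(non_null) < len(distinct) * min_rows_per_value:
        return None
    # Double any single quotes so the literals stay valid in SQL.
    quoted = ",".join("'" + v.replace("'", "''") + "'" for v in distinct)
    return f"ENUM({quoted})"
```

With these defaults a three-row column with three distinct values is not flagged, while thirty rows cycling through the same three values are.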
