What is faster: Columns or Rows

Let’s say I have a table that contains text (still varchar) in a lot of languages (let’s say 50 languages).

I would only use this to do selects the whole day.

And i always need to search for 1 key in 2 or 3 of this languages with 1 query and display this to the user.

What would be faster:

ID english lang2 lang3
1 text text text
(so I get both long vertical/horizontal table)

OR
ID lang text
1 english text
(so I get very long vertical table)

Normalization tells me the last, but using MySQL 4 i noticed the top one is faster for as much i can test… but what should be faster (maybe I’m testing wrongly ))

when testing the second query in mysql 4, are you sure you have indexed fuly?

Ie, a single clustered index on (id, langid) ?
I would imagine this will be the fastest method

Final results:

A horizontal table is WAY faster if it becomes very big.

Horzontal contains:
262,069 total rows
with id dutch english french german spanish italian romanian as columns

Vertical includes exactly the same information and thus becomes
1,568,702 total rows and only id, languageID, text

====
HORIZONTAL TABLE

  • with ID primary (no index further)

SELECT SQL_NO_CACHE english, dutch, id
FROM phrases
WHERE (
dutch LIKE ‘%hoe%’
OR english LIKE ‘%hoe%’
)
AND (
dutch LIKE ‘%lang%’
OR english LIKE ‘%lang%’
)
AND (
dutch LIKE ‘%duurt%’
OR english LIKE ‘%duurt%’
)
LIMIT 0 , 30

==> 0.0081
== On very big database => 0.2149

— 2nd search method
SELECT SQL_NO_CACHE dutch, english
FROM phrases
WHERE id
IN ( 50400011, 60100011, 60100027 )

==> 0.0004 sec
== On very big database => 0.0269

And after putting index on ID (for big table: 0.0013

=========================================================
VERTICAL TABLE

  • With language and ID primary, no index further

SELECT SQL_NO_CACHE *
FROM site_phrases
WHERE
(language = 0 OR language = 1)
AND phrase LIKE ‘%hoe%’
AND phrase LIKE ‘%lang%’
AND phrase LIKE ‘%duurt%’

==> 0.0049
== ON very big database ==> 0.7106

— 2nd search method
SELECT SQL_NO_CACHE *
FROM site_phrases
WHERE
(language = 0 OR language = 1)
AND
id IN ( 50400011, 60100011, 60100027 )

==> 0.0007
== ON very big database ==> 0.1460

After putting index on ID ==> 0.0005
And for big table with index ==> 0.0706

the results are very clear: NOT normalizing this table layout is WAY faster

Well those tables are not normalized, normally you would surely have a seperate table with each language? confused:

I’ve tested that too:

==SPLITTED VERTICAL TABLE

SELECT SQL_NO_CACHE lang_Dutch.id,lang_Dutch.text, lang_English.text FROM lang_Dutch
JOIN lang_English using (ID) WHERE (
lang_Dutch.text LIKE ‘%hoe%’
OR lang_English.text LIKE ‘%hoe%’
) AND
(
lang_Dutch.text LIKE ‘%lang%’
OR lang_English.text LIKE ‘%lang%’
) AND
(
lang_Dutch.text LIKE ‘%duurt%’
OR lang_English.text LIKE ‘%duurt%’
)
LIMIT 0 , 30

= 1000rows, 6languages ==> 0.0196

–union method

SELECT SQL_NO_CACHE lang_Dutch.id, lang_Dutch.text, lang_English.text
FROM lang_Dutch
JOIN lang_English
USING ( ID )
WHERE (
lang_Dutch.text LIKE ‘%hoe%’
OR lang_English.text LIKE ‘%hoe%’
)
AND (
lang_Dutch.text LIKE ‘%lang%’
OR lang_English.text LIKE ‘%lang%’
)
AND (
lang_Dutch.text LIKE ‘%duurt%’
OR lang_English.text LIKE ‘%duurt%’
)
LIMIT 0 , 30
UNION
SELECT SQL_NO_CACHE lang_Dutch.id, lang_Dutch.text, lang_English.text
FROM lang_English
JOIN lang_Dutch
USING ( ID )
WHERE (
lang_Dutch.text LIKE ‘%hoe%’
OR lang_English.text LIKE ‘%hoe%’
)
AND (
lang_Dutch.text LIKE ‘%lang%’
OR lang_English.text LIKE ‘%lang%’
)
AND (
lang_Dutch.text LIKE ‘%duurt%’
OR lang_English.text LIKE ‘%duurt%’
)
LIMIT 0 , 30

== 0.0401 (to slow)

Normally you wouldnt do it this way :s This ISNT normalisation, surely if you wanted to properly normalise you wouldnt have to add a new table every time you want to add a language? You would simply add a row to a seperate table. Basically somewhere your schema is incorrect.

You would have one table for languages, one for text and then crossreferance them, anyway, id really suggest NOT using “like” anyway, as it doesnt scale well at all… Have you looked into sphinx (sphinxsearch.com)?

What you suggest does look exactly what I tested:

ID english lang2 lang3
1 text text text
(so I get both long vertical/horizontal table)

OR

ID lang text
1 1 text

(so I get very long vertical table + 2nd language table)

But a very long table seems very slow… I will get some 1.000.000 rows!!

I indeed looked into that program, but as I run it on a virtual-host at an ISP, that will be a problem.

Somewhat offtopic:
It isnt really advised to have something that big on a virtualhost anyway…

You could do something with words and mapping, i looked into this for a search engine schema before i started using sphinx (as this wasnt efficient enough for my likings)

Word Table:
id
word (varchar)

Map Table:
word_id
lang_id

Lang table:
id
language (varchar)

Then something similar to:
SELECT … FROM lang LEFT JOIN map ON map.lang_id = lang.id LEFT JOIN words ON map.word_id = words.id WHERE word = ‘Blah blah’

That should work, not sure on speeds really, and might not be accustumed to what you want to use this for. However it might work )

Remember, for this sort of table good indexes are key to success )