Tag-based schema for different entities + translations

Hello,

We are using Percona Server with Mysql 8 and we are having major performance issues displaying search results, we have even given up including some tables in queries (translation tables).

This is a streaming platform, users can use the platform’s search engine to search for videos, the returned result must match at least:

  • Video title in default language (English)
  • Video title in one of 12 translations
  • Category name in default language (English)
  • Category name in one of 12 translations
  • Tag name in default language (English)
  • Tag name in one of 12 translations
  • Actor name
  • Name of the producer

It is a FULL_TEXT search that is made.

Currently, it is only done on the video title in the default language, otherwise, it takes a long time to display the result.

The videos table contains more than 4 million records.

I read a blog post that talks about “Tag-based schema

I wanted to have your opinion to apply the same thing to our system.

Add a text type field in the videos table that I can call “properties or tags” and save in it:

  • Video Title
  • 12 video title translations
  • Category names
  • 12 translations of each category
  • Tag names
  • 12 translations of each of the tags
  • Names of actors
  • Name of the producer

I ask myself a lot of questions:

  • Isn’t that a lot for a FULL_TEXT INDEX?
  • Doesn’t that make duplicate content?
  • Is there no other solution?
  • Didn’t I forget other things?

Thanks for your help.

1 Like

You could do that. It will be a lot of data per row. Yes, probably a lot of duplicate content. How will you identify a tag which is the title vs a tag which is a category name vs a tag which is the name of the producer?

Sounds like you have 1 function in your app that should “do everything” which is typically a bad design. Instead your “search box” needs to be limited to only the title, and have additional boxes specifically for actors or producers. Have another box specifically for category. Go into google drive and do a search. You’ll see how google drive breaks down each “property” into its own search field.

1 Like

Hi @matthewb,

Thanks for your return.

I want to use the added field just to filter the search, there will be joins to select actors, categories, tags, etc.

The goal is to speed up the search while filtering several entities

An example query would be:

SELECT 
  Videos.id AS Videos__id, 
  Videos.title AS Videos__title, 
  ... 
FROM 
  videos Videos 
  LEFT JOIN videos_translations VideosTranslation ON (
    VideosTranslation.locale = 'fr' 
    AND Videos.id = VideosTranslation.id
  ) 
WHERE 
  (
    Videos.active = TRUE 
    AND Videos.segment = 1 
    AND (Videos.deleted) IS NULL
	AND MATCH(
      Videos.properties
    ) AGAINST(
      'my search' IN NATURAL LANGUAGE MODE
    )
  ) 
ORDER BY 
  Videos.views DESC 
LIMIT 
  40 OFFSET 0

Here is part of the current database schema:

-- -----------------------------------------------------
-- videos
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `videos` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `producer_id` INT UNSIGNED NULL,
  `title` VARCHAR(120) NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT INDEX `title` (`title`) VISIBLE,
  INDEX `producer_id_INDEX` (`producer_id` ASC) VISIBLE
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `videos_translations` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `locale` CHAR(2) NOT NULL,
  `title` VARCHAR(120) NOT NULL,
  PRIMARY KEY (`id`, `locale`))
ENGINE = InnoDB;

-- -----------------------------------------------------
-- categories
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `categories` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `name` (`name` ASC) VISIBLE
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `categories_translations` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `locale` CHAR(2) NOT NULL,
  `name` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`id`, `locale`),
  UNIQUE INDEX `name` (`locale` ASC, `name` ASC) VISIBLE)
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `categories_videos` (
  `video_id` INT UNSIGNED NOT NULL,
  `category_id` INT UNSIGNED NOT NULL,
  PRIMARY KEY (`video_id`, `category_id`),
  INDEX `category_video_INDEX` (`category_id` ASC, `video_id` ASC) VISIBLE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- producers
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `producers` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `name` (`name` ASC) VISIBLE
ENGINE = InnoDB;

-- -----------------------------------------------------
-- actors
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `actors` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `name` (`name` ASC) VISIBLE
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `actors_videos` (
  `video_id` INT UNSIGNED NOT NULL,
  `actor_id` INT UNSIGNED NOT NULL,
  PRIMARY KEY (`video_id`, `actor_id`),
  INDEX `actor_video_INDEX` (`actor_id` ASC, `video_id` ASC) VISIBLE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- tags
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `tags` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `name` (`name` ASC) VISIBLE
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `tags_translations` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `locale` CHAR(2) NOT NULL,
  `name` VARCHAR(50) NULL DEFAULT NULL,
  PRIMARY KEY (`id`, `locale`))
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `tags_videos` (
  `video_id` INT UNSIGNED NOT NULL,
  `tag_id` INT UNSIGNED NOT NULL,
  PRIMARY KEY (`video_id`, `tag_id`),
  INDEX `tag_video_INDEX` (`tag_id` ASC, `video_id` ASC) VISIBLE)
ENGINE = InnoDB;
1 Like

That all seems fairly straightforward. Have you ran ‘EXPLAIN’ on the query?

1 Like

Not yet, because we wanted to have your opinion on the solution before starting its integration.

1 Like

My recommendation, as is most of Percona’s, is to simply test it out. :slight_smile: Clone this server, make the changes and give it a try. Or just make a copy of the database locally

CREATE DATABASE testingvideo;
CREATE TABLE testingvideo.videos LIKE proddb.videos;
INSERT INTO testingvideo.videos SELECT * FROM proddb.videos;

Run the ALTER’s on testingvideo.videos and then EXPLAIN your queries on that dataset.

2 Likes

Very well.
I think we’re going to do that.
I will keep you informed of the result.
Thank you.

1 Like

Hello @matthewb,

We made the changes (still in development) and here is the result of the explain:

EXPLAIN SELECT 
  Videos.id AS Videos__id, 
  Videos.videokey AS Videos__videokey, 
  Videos.title AS Videos__title,
  MATCH(Videos.glossary) AGAINST(
    '+my search' IN BOOLEAN MODE
  ) AS score 
FROM 
  videos Videos 
WHERE 
  (
    Videos.active = TRUE 
    AND Videos.segment = 1 
    AND (Videos.deleted) IS NULL
    AND MATCH(Videos.glossary) AGAINST(
      '+my search' IN BOOLEAN MODE
    ) 
  ) 
ORDER BY 
  score DESC, 
  Videos.views DESC 
LIMIT 
  40 OFFSET 0;

Here is the screenshot:

The request takes about 3 seconds, some +4 to execute.

I tried to optimize the query by creating a sub-query to filter a part, but the execution time remains unchanged.

EXPLAIN SELECT 
  Videos.id AS Videos__id, 
  Videos.videokey AS Videos__videokey, 
  Videos.title AS Videos__title,
  MATCH(Videos.glossary) AGAINST(
    '+my search' IN BOOLEAN MODE
  ) AS score 
FROM 
  (
    SELECT 
      Videos.id, 
      Videos.videokey, 
      Videos.title
    FROM 
      videos Videos 
    WHERE 
      (
        Videos.active = TRUE 
        AND Videos.segment = 1 
        AND (Videos.deleted) IS NULL
      )
  ) Videos 
WHERE 
  MATCH(Videos.glossary) AGAINST(
    '+my search' IN BOOLEAN MODE
  ) 
ORDER BY 
  score DESC 
LIMIT 
  40 OFFSET 0; 

Here is the screenshot:

1 Like

Looks like the FULLTEXT indexes are in use. The EXPLAIn plan on the 2nd query looks incomplete; there’s no section for the subquery. If the performance is good for you then it looks like you succeeded. The other issue is that you can’t use more than 1 index per table-access and you can’t combine FULLTEXT with standard indexes. Make an index on (active, segment, deleted) and try the sub-select again. Or you might try a CTE instead of sub-query.

1 Like

I find that the result is not at the top level response time, that’s why I tried the subquery. There is already an INDEX for (active, segment and deleted) and another FULLTEXT for (glossary).

I also followed the advice on the blog and subscribed to Releem.com I followed all the recommendations and for two weeks we are at 100% and no drop.

What should be done to speed up the search?

1 Like

You tried the subquery, but notice it made no difference because no different index was used. As I said, MySQL can only use 1 index per table. So you need to rewrite the query a different way. Try a CTE instead of the subquery, or try a self-join, or you may need to use an external FULLTEXT indexer like Sphinx or Lucene to handle that aspect. Good luck!

1 Like

Very well. Thank you for your help and advices.

1 Like

Hi,
We just did some tests with self-joins and CTE, the result is identical.
Looks like Mysql has these limit regarding FULLTEXT search.
We are thinking of looking at one of the solutions recommended by the forums:
sphinx, solr, elasticsearch or lucene.

1 Like

The limitation is not FULLTEXT specifically. The limitation is more general in that MySQL cannot use more than 1 index per table-access. The fulltext index must be used when using MATCH AGAINST() syntax, thus the other indexes cannot be used.

There is one other suggestion which is to partition the table on (active, segment, deleted). This, in effect, creates a “second index” that can be used by the optimizer. This is known as “partition pruning”.

1 Like

Hi,

Thank you for the table partition solution but we are thinking of setting up Elasticsearch to avoid possible problems later, knowing that the database will increase every day…

What is the best solution to retrieve the results:

  1. Save only the fields to be searched in ES (id, title, categories and tags), do the search by retrieving the IDs of the videos then make a select in the “videos” table in mysql to retrieve the other properties to display (1 ES query + 1 Mysql query)

  2. Put all the fields necessary for the search and for the display (only 1 ES request)

Thanks for your feedback.

1 Like

I’m not an ES expert so it’s up to you on how you want to store the data. I would probably do option 1, but that might be more operationally complex. However, option 2 requires more data be stored in ES which may affect backups, and other costs associated with ES.

1 Like