pt-archiver not using --columns list when index is specified manually

Hello,

We’re currently trying to use pt-archiver to delete old records from a MySQL 5.7 AWS Aurora database, but we can’t specify both an index and a column list at the same time: the utility seems to ignore the --columns option.

Output from --dry-run:

[root@~]# pt-archiver --source h=dns,P=3306,u=user,p=pwd,D=db,t=attempt,i=IDX_18EC02668B8E8428 --purge --where "created_at between '2018-07-26 00:00:00.000' and '2018-07-26 01:00:00.000'" --limit 1000 --txn-size 1000 --bulk-delete --low-priority-delete --no-check-charset --no-safe-auto-increment --columns id,created_at --dry-run

SELECT /*!40001 SQL_NO_CACHE */ `id`,`created_at`,`application_id`,`integration_id`,`campaign_payout_rule_id`,`campaign_credential_id`,`tier_id`,`sequence_number`,`request_timeout`,`request_method`,`request_uri`,`request_headers`,`request_data`,`request_body`,`response_code`,`response_headers`,`response_data`,`response_time`,`duped`,`approved`,`price`,`parse_status`,`archive_status`,`updated_at`,`parsed_at`,`archived_at` FROM `dois`.`attempt` FORCE INDEX(`idx_18ec02668b8e8428`) WHERE (created_at between '2018-07-26 00:00:00.000' and '2018-07-26 01:00:00.000') ORDER BY `created_at` LIMIT 1000

SELECT /*!40001 SQL_NO_CACHE */ `id`,`created_at`,`application_id`,`integration_id`,`campaign_payout_rule_id`,`campaign_credential_id`,`tier_id`,`sequence_number`,`request_timeout`,`request_method`,`request_uri`,`request_headers`,`request_data`,`request_body`,`response_code`,`response_headers`,`response_data`,`response_time`,`duped`,`approved`,`price`,`parse_status`,`archive_status`,`updated_at`,`parsed_at`,`archived_at` FROM `dois`.`attempt` FORCE INDEX(`idx_18ec02668b8e8428`) WHERE (created_at between '2018-07-26 00:00:00.000' and '2018-07-26 01:00:00.000') AND ((`created_at` >= ?)) ORDER BY `created_at` LIMIT 1000

DELETE LOW_PRIORITY FROM `dois`.`attempt` WHERE (((`created_at` >= ?))) AND (((`created_at` <= ?))) AND (created_at between '2018-07-26 00:00:00.000' and '2018-07-26 01:00:00.000') LIMIT 1000

Table DDL:

CREATE TABLE `attempt` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `application_id` int(11) NOT NULL,
  `integration_id` int(11) NOT NULL,
  `campaign_payout_rule_id` int(11) NOT NULL,
  `campaign_credential_id` int(11) NOT NULL,
  `tier_id` int(11) NOT NULL,
  `sequence_number` smallint(5) unsigned NOT NULL DEFAULT '1',
  `request_timeout` decimal(10,2) NOT NULL DEFAULT '0.00',
  `request_method` int(10) unsigned NOT NULL DEFAULT '0',
  `request_uri` varchar(2083) COLLATE utf8_unicode_ci NOT NULL,
  `request_headers` json DEFAULT NULL,
  `request_data` json NOT NULL COMMENT '(DC2Type:json)',
  `request_body` mediumtext COLLATE utf8_unicode_ci,
  `response_code` smallint(5) unsigned NOT NULL,
  `response_headers` json DEFAULT NULL COMMENT '(DC2Type:json)',
  `response_data` mediumtext COLLATE utf8_unicode_ci,
  `response_time` decimal(10,2) NOT NULL DEFAULT '0.00',
  `duped` tinyint(1) NOT NULL DEFAULT '0',
  `approved` tinyint(1) NOT NULL DEFAULT '0',
  `price` decimal(10,2) NOT NULL DEFAULT '0.00',
  `parse_status` smallint(5) unsigned NOT NULL DEFAULT '1',
  `archive_status` smallint(5) unsigned NOT NULL DEFAULT '1',
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '(DC2Type:timestamp)',
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '(DC2Type:timestamp)',
  `parsed_at` timestamp NULL DEFAULT NULL COMMENT '(DC2Type:timestamp)',
  `archived_at` timestamp NULL DEFAULT NULL COMMENT '(DC2Type:timestamp)',
  PRIMARY KEY (`id`),
  KEY `IDX_18EC02663E030ACD` (`application_id`),
  KEY `IDX_18EC02669E82DDEA` (`integration_id`),
  KEY `IDX_18EC0266FA64550D` (`campaign_payout_rule_id`),
  KEY `IDX_18EC0266E4640ADF` (`campaign_credential_id`),
  KEY `IDX_18EC0266A354F9DC` (`tier_id`),
  KEY `IDX_18EC02668B8E8428` (`created_at`),
  KEY `IDX_18EC02665552D714` (`parsed_at`),
  KEY `IDX_18EC0266720357E8` (`parse_status`),
  KEY `IDX_18EC026686057109` (`archive_status`),
  KEY `IDX_18EC026642C5213` (`archived_at`),
  CONSTRAINT `FK_18EC02663E030AC1` FOREIGN KEY (`application_id`) REFERENCES `application` (`id`),
  CONSTRAINT `FK_18EC02669E82DDE2` FOREIGN KEY (`integration_id`) REFERENCES `integration` (`id`),
  CONSTRAINT `FK_18EC0266A354F9D3` FOREIGN KEY (`tier_id`) REFERENCES `tier` (`id`),
  CONSTRAINT `FK_18EC0266E4640AD4` FOREIGN KEY (`campaign_credential_id`) REFERENCES `campaign_credential` (`id`),
  CONSTRAINT `FK_18EC0266FA645505` FOREIGN KEY (`campaign_payout_rule_id`) REFERENCES `campaign_payout_rule` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Version:

[root@~]# pt-archiver --version
pt-archiver 3.3.1

Any thoughts?

Thank you.

Renato.

Hello again,

Would anyone have an idea why this is happening? Could this be a bug in the archiver utility?

We’re currently running a DAG that purges billions of records from this table. Since we don’t need to archive the data anywhere, selecting all the columns is inefficient, and trimming the column list would probably reduce the load on the database.

Any help is appreciated.

Thank you.

Renato.

Hi,
I’ve never used --columns before, but from the manual I understand that --columns is used for archiving (writing rows out, e.g. to a file). You’ve specified --purge, meaning you want the records deleted, not archived. Could that be your problem, that --purge and --columns are not meant to be used together?
If you don’t want to archive your data, --purge alone should do it.
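For example, here’s a rough sketch of the two modes as I read the manual (reusing your connection details; the file path is just an illustration, and both commands use --dry-run so nothing is touched):

```shell
# Archiving mode: rows are written to a file, so --columns controls
# which columns end up in the output.
pt-archiver \
  --source h=dns,P=3306,u=user,p=pwd,D=db,t=attempt \
  --file '/tmp/attempt-archive.txt' \
  --columns id,created_at \
  --where "created_at between '2018-07-26 00:00:00' and '2018-07-26 01:00:00'" \
  --limit 1000 --dry-run

# Purge mode: rows are only deleted, never written anywhere, so there
# is no output for a column list to shape.
pt-archiver \
  --source h=dns,P=3306,u=user,p=pwd,D=db,t=attempt \
  --purge \
  --where "created_at between '2018-07-26 00:00:00' and '2018-07-26 01:00:00'" \
  --limit 1000 --dry-run
```

If what you’re after is a lighter SELECT while purging, the manual also mentions --primary-key-only, which is described as a shortcut for --columns with just the primary-key columns; that might be worth a try for your case.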
Greetings,
Steffen
