I ran the online-schema-change tool on a table containing 1.5 million records with just the basic command, and it stopped running after about two hours. While it was running, though, it was copying everything correctly.
I was running this on an AWS stack with 1 GB of RAM.
I don't think there is any problem with not setting chunk-time when running the tool. chunk-time and chunk-size determine how the tool copies the data.
If you set chunk-time, the tool adjusts the chunk size dynamically so that each data-copy query takes that long to execute. It tracks the copy rate (rows per second) and adjusts the chunk size after each data-copy query, so that the next query takes this amount of time (in seconds) to execute. It keeps an exponentially decaying moving average of queries per second, so that if the server's performance changes due to changes in server load, the tool adapts quickly.
If you set chunk-size, it overrides the default behavior, which is to adjust the chunk size dynamically so that chunks run in about --chunk-time seconds. When this option isn't set explicitly, its default value is used as a starting point, but after that the tool ignores the option's value. If you set this option explicitly, however, it disables the dynamic adjustment behavior and tries to make all chunks exactly the specified number of rows.
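For example, assuming the tool in question is pt-online-schema-change (and using a hypothetical database mydb, table mytable, and ALTER clause), the two modes look roughly like this:

    # Let the tool size chunks dynamically so each copy query takes ~0.5 seconds
    pt-online-schema-change \
      --alter "ADD COLUMN flag TINYINT NOT NULL DEFAULT 0" \
      --chunk-time 0.5 \
      --execute \
      D=mydb,t=mytable

    # Or force fixed-size chunks of 1000 rows, which disables the dynamic adjustment
    pt-online-schema-change \
      --alter "ADD COLUMN flag TINYINT NOT NULL DEFAULT 0" \
      --chunk-size 1000 \
      --execute \
      D=mydb,t=mytable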
Thank you so much for the clarification. We did some further investigation to understand why it stopped abruptly, and we figured out the issue.
Before the end of the day we logged off the Linux machine where the tool was running, and the tool's script has a function that handles signals and exits on SIGHUP.
We commented out those lines and successfully ran the entire ALTER. It took two days to complete!
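For anyone who hits the same thing: rather than editing the script, another option (a standard shell approach, not something we tested here) is to start the tool detached from the login session so it never receives SIGHUP when you log off, for example with nohup or inside screen/tmux. A rough sketch, reusing the hypothetical database and table names from the example above:

    # nohup makes the process ignore the SIGHUP sent when the session ends;
    # output goes to osc.log and the job runs in the background
    nohup pt-online-schema-change \
      --alter "ADD COLUMN flag TINYINT NOT NULL DEFAULT 0" \
      --execute \
      D=mydb,t=mytable > osc.log 2>&1 &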