Fixing Tungsten Replication Failure
- Posted by Gayan
Scenario: We will simulate the loss of THL files on the Tungsten master server (master1) that have not yet been applied to the slave (slave1). We assume the MySQL binary logs are still available on the master server.
These are the steps to simulate the error.
- Make sure both the Tungsten master and slave are working fine.
- Take the Tungsten slave offline.
- Insert dummy data on the master side so that the Tungsten master generates THL information.
- Take the Tungsten master offline.
- Delete the available THL files on the Tungsten master server.
- Bring the Tungsten master online.
- Try to bring the Tungsten slave online.
This scenario can happen in a system where you have a Tungsten master-slave setup and the slave server has been offline for a long time, allowing the Tungsten master to delete its THL files according to its purge retention setting.
At step 7, we get the following message, since the slave is unable to retrieve the required THL from the master.
pendingExceptionMessage: Client handshake failure: Client response validation failed: Master log does not contain requested transaction: client source ID=slave1 seqno=7 epoch number=0 master min seqno=13 master max seqno=18
OK. This says the slave is up to date through seqno 7, and it could not retrieve seqno 8 because the master server only holds THL files starting from seqno 13.
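The reading of the error message above can be sketched in Python. This is only an illustration, not part of Tungsten: the function names are hypothetical, and the regular expression assumes the message keeps exactly the field layout shown in the error above.

```python
import re

# Matches the fields at the end of the handshake failure message.
# Assumption: the field order and labels are as shown in the post.
HANDSHAKE_RE = re.compile(
    r"client source ID=(?P<source>\S+) "
    r"seqno=(?P<seqno>\d+) "
    r"epoch number=(?P<epoch>\d+) "
    r"master min seqno=(?P<min>\d+) "
    r"master max seqno=(?P<max>\d+)"
)


def parse_handshake_error(message):
    """Extract the slave and master sequence numbers from the message."""
    m = HANDSHAKE_RE.search(message)
    if m is None:
        raise ValueError("not a recognised handshake failure message")
    return {
        "source": m["source"],
        "client_seqno": int(m["seqno"]),
        "epoch": int(m["epoch"]),
        "master_min": int(m["min"]),
        "master_max": int(m["max"]),
    }


def missing_seqnos(info):
    """Seqnos the slave still needs but the master no longer has in THL."""
    return range(info["client_seqno"] + 1, info["master_min"])


message = (
    "Client handshake failure: Client response validation failed: "
    "Master log does not contain requested transaction: "
    "client source ID=slave1 seqno=7 epoch number=0 "
    "master min seqno=13 master max seqno=18"
)
info = parse_handshake_error(message)
print(list(missing_seqnos(info)))  # [8, 9, 10, 11, 12]
```

In other words, seqnos 8 through 12 are the gap that has to be rebuilt from the binary logs.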
Let’s find out the exact binlog position for seqno 7 on the slave side.
Alright, sequence number 7 maps to event ID “mysql-bin.000010:0000000000001579;-1”, that is, offset 1579 in binary log file mysql-bin.000010.
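For scripting, the event ID can be split into its binlog file and offset. The helper below is a hypothetical sketch that assumes the `file:offset;suffix` layout seen in this one example; the field after the semicolon is returned untouched rather than interpreted.

```python
def parse_event_id(event_id):
    """Split an event ID such as 'mysql-bin.000010:0000000000001579;-1'.

    Returns (binlog_file, offset, trailer). The layout is an assumption
    based on the single example in this post.
    """
    position, _, trailer = event_id.partition(";")
    binlog_file, sep, offset = position.partition(":")
    if not sep or not offset.isdigit():
        raise ValueError(f"unexpected event id: {event_id!r}")
    # int() drops the zero padding of the offset.
    return binlog_file, int(offset), trailer


print(parse_event_id("mysql-bin.000010:0000000000001579;-1"))
# ('mysql-bin.000010', 1579, '-1')
```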
Note that seqno 7 has already been applied. Let’s check whether seqno 8 is available on the master side.
Looks like it has been deleted. Let’s check whether the binary log corresponding to seqno 7 is available on the master side.
OK. Binary log 000010 is available on the master side. Therefore all the missing information is still in the binary logs, just not in the THL files. Now we need to generate the THL information starting from mysql-bin.000010:0000000000001579, and the first event generated from this starting position should get sequence number 8.
Let’s go to the master server and regenerate the THL files starting from those specific binlog coordinates.
This command generates THL files starting from the specified binlog coordinates. By default, the first file is named ‘thl.data.0000000001’.
We can check the content of the file as follows.
Looks like we were able to regenerate the THL files, but the problem is that the first sequence number is not 8.
If we try to bring the slave online with this configuration, it gives us the following error.
pendingExceptionMessage: Client handshake failure: Client response validation failed: Master log does not contain requested transaction: client source ID=slave1 seqno=7 epoch number=0 master min seqno=19 master max seqno=30
Note that the slave still asks for seqno 8, but the minimum sequence number available on the master has changed to 19. This is expected, as our newly generated THL files start from seqno 19, one past the previous master maximum of 18.
Now it is obvious that we need to regenerate the THL files starting at a specific sequence number (8).
When Tungsten regenerates THL files, it picks the next sequence number from a value recorded in the trep_commit_seqno table. Let’s go ahead and take the Tungsten master offline and check the value.
We need to update the seqno value to 7, so that the next generated event gets seqno 8, and set epoch_number to 0 (the epoch number changes every time we regenerate the master THL files).
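The arithmetic here is simple but easy to get wrong by one. The sketch below models it under the assumption, inferred from the behaviour seen in this post rather than from Tungsten documentation, that the first regenerated event receives the recorded seqno plus one. Both function names are hypothetical.

```python
def first_regenerated_seqno(recorded_seqno):
    """Seqno assigned to the first regenerated THL event (assumed rule)."""
    return recorded_seqno + 1


def seqno_to_record(wanted_first_seqno):
    """Value to store so that regeneration starts at the wanted seqno."""
    return wanted_first_seqno - 1


# With the old master maximum of 18 still recorded, regeneration began at 19.
print(first_regenerated_seqno(18))  # 19
# The slave needs seqno 8 next, so the recorded value must be 7.
print(seqno_to_record(8))  # 7
```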
Now, let’s generate the THL files again.
Let’s analyze the new THL file.
Looks like we were able to generate the correct seqno, but the epoch number still reflects the point at which the new THL generation started, rather than 0.
Since we have the correct seqno and the correct data, we can try bringing the slave online again.
Awesome, it is online now. Let’s check whether the data has propagated properly.
The data segment for seqno 8 has been applied to the slave.