
Testing of Hypertable RangeServer Failover

03.04.2013  |  Testing

As we mentioned in our previous post announcing the newly arrived RangeServer failover feature, robustness and application transparency were our top priorities. To achieve these objectives, we placed enormous emphasis on testing. While this testing effort was painstaking and led to a very long development cycle, it has paid enormous dividends in quality and robustness. The following table lists the regression tests that were added to Hypertable to verify correctness and ensure that the expected behavior continues to hold in future releases.

# Test Description
1 RangeServer-failover-basic-1  Start Hypertable with two RangeServers (rs1,rs2) and then 1) Load table with data, 2) Kill rs1 and wait for recovery to complete, 3) Stop Hypertable and then restart with just rs2. Dump keys after each step and verify that the table contains the exact set of keys that were loaded. Verify that no ranges are assigned to rs1.
2 RangeServer-failover-basic-2  Start Hypertable with three RangeServers (rs1,rs2,rs3) and then 1) Load table with data, 2) Kill rs1 and wait for recovery to complete, 3) Stop Hypertable and then restart with just rs2 and rs3. Dump keys after each of these three steps and verify that the table contains the exact set of keys that were loaded.
3 RangeServer-failover-basic-3  Start Hypertable with five RangeServers (rs1,rs2,rs3,rs4,rs5) and then 1) Load table with data, 2) Kill rs1 and rs2 and wait for recovery to complete for both servers, 3) Stop Hypertable and then restart with just rs3, rs4, and rs5. Dump keys after each of the three steps and verify that the table contains the exact set of keys that were loaded.
4 RangeServer-failover-basic-4  Start Hypertable with two RangeServers (rs1,rs2) and then load table with data. Kill rs1 and wait for recovery to complete. Try to restart rs1 and verify that it does not come up because it has been recovered. Restart all servers. Again verify that rs1 does not come up because it has been recovered.
5 RangeServer-failover-quorum  Start Hypertable with five RangeServers (rs1,rs2,rs3,rs4,rs5) and then 1) Load table with data, 2) Kill rs1, rs2, rs3, and rs4 and wait for the Master to indicate that a quorum has not been reached and therefore recovery is blocked, then start rs6 and wait for recovery of rs1, rs2, rs3, and rs4 to complete, 3) Restart Hypertable with just rs5 and rs6. Dump keys after each of the three steps and verify that the table contains the exact set of keys that were loaded.
6 RangeServer-failover-standby-master  Start Hypertable with two RangeServers (rs1,rs2) and a second hot-standby Master. 1) Load table with data, 2) Kill rs1 and original master and wait for hot-standby Master to take over and recover rs1, 3) Restart Hypertable with just rs2. Dump keys after each of the three steps and verify that the table contains the exact set of keys that were loaded.
7 RangeServer-failover-graceperiod  Start Hypertable with two RangeServers (rs1,rs2) and then 1) load table with data, then 2) kill rs1, wait 10 seconds, kill rs2 and then wait for both to be recovered. Dump keys after each step and verify that the table contains the exact set of keys that were loaded. Also verify that the temporary recovery barrier was put up after each server was stopped.
8 RangeServer-failover-sigstop  Start Hypertable with two RangeServers (rs1,rs2) and then 1) load table with data, then 2) suspend rs1 by sending it the SIGSTOP signal, wait for it to be recovered, and then restart the servers with just rs2. Dump keys after each step and verify that the table contains the exact set of keys that were loaded.
9 RangeServer-failover-bad-fragments  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=bad-log-fragments-1:signal:0 switch to the Master to force it to report two bad fragments after replay completes. Install a notification script to capture all notifications. Then load table with data, stop rs1 and wait for it to recover. Verify that the notification script was called to 1) Notify that failover took place and 2) Report that the bad fragment error was encountered.
10 RangeServer-failover-restart  Start Hypertable with two RangeServers (rs1,rs2) and then 1) load table with data, then 2) restart servers without rs2 and wait for recovery of rs2 to complete. Dump keys after both steps and verify that the table contains the exact set of keys that were loaded.
11 RangeServer-failover-create-table  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=user-load-range-4:exit:0 switch to rs2 to force it to fail during the loading of the initial range. Create a table and verify that rs2 failed. Run load generator to load table with 200,000 keys. Dump keys and verify that the table contains the exact set of keys that were loaded.
12 RangeServer-failover-scan-1  Start Hypertable with two range servers (rs1, rs2). Supply the --induce-failure=create-scanner-user-1:exit:1 switch to rs1 to force it to fail during table scan. Create a table and run load generator to load table with 200,000 keys. Dump keys and verify that the table contains the exact set of keys that were loaded. Verify that rs1 was recovered.
13 RangeServer-failover-scan-2  Start Hypertable with two range servers (rs1, rs2). Supply the --induce-failure=fetch-scanblock-user-1:exit:0 switch to rs1 to force it to fail during table scan. Create a table and run load generator to load table with 200,000 keys. Dump keys and verify that the table contains the exact set of keys that were loaded. Verify that rs1 was recovered.
14 split-failover-1  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-split-1:exit:1 switch to rs1 to force it to fail after the split log has been installed during the split of a METADATA range. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
15 split-failover-2  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-split-2:exit:1 switch to rs1 to force it to fail after a METADATA range has been compacted and shrunk during a split and persisted to the SPLIT_SHRUNK state. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
16 split-failover-3  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-split-3:exit:1 switch to rs1 to force it to fail during a METADATA split, after the newly split-off range has been reported to the Master, but before the range state has been persisted in the range server meta log and the call to Master::relinquish_acknowledge() is made. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
17 split-failover-4  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-split-4:exit:1 switch to rs1 to force it to fail during a METADATA split, after the newly split-off range has been reported to the Master and the range state has been persisted in the range server meta log, but before the call to Master::relinquish_acknowledge() is made. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
18 split-failover-5  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-load-range-3:exit:1 switch to rs1 to force it to fail after the range server has taken ownership of the newly split-off METADATA range by writing the new location into the METADATA table, but before the split log has been linked into the primary commit log and the range has been persisted in the range server's metalog. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
19 split-failover-6  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-load-range-3:exit:1 switch to rs1 to force it to fail after the range server has taken ownership of the newly split-off METADATA range by writing the new location into the METADATA table and the split log has been linked into the primary commit log and the range has been persisted in the range server's metalog, but before success is reported back to the Master. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
20 split-failover-7  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=user-range-acknowledge-load-pause-1:pause(10000):3 and --induce-failure=metadata-load-range-3:exit:1 switches to rs1. This causes it to pause for 10 seconds and then fail (exit) inside Range::acknowledge_load(), immediately before it persists load_acknowledged=true for the range in the range server's metalog. The 10-second delay increases the likelihood that the range will have received additional updates. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
21 RangeServer-failover-master-1  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-initial-1:exit:0 argument to the Master to force it to fail during the INITIAL state of the recovery of the ROOT range. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
22 RangeServer-failover-master-2  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-initial-2:exit:0 argument to the Master to force it to fail during the INITIAL state of the recovery of the ROOT range, after the recovery plan and state transition to PHANTOM_LOAD has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
23 RangeServer-failover-master-3  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-load-3:exit:0 argument to the Master to force it to fail during the PHANTOM_LOAD state of the recovery of the ROOT range, after the phantom_load has successfully completed, but before the state transition to REPLAY_FRAGMENTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
24 RangeServer-failover-master-4  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-replay-3:exit:0 argument to the Master to force it to fail during the REPLAY_FRAGMENTS state of the recovery of the ROOT range, after the replay was successful, but before the state transition to PREPARE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
25 RangeServer-failover-master-5  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-prepare-3:exit:0 argument to the Master to force it to fail during the PREPARE state of the recovery of the ROOT range, after the calls to RangeServer::phantom_prepare_ranges() were successful, but before the state transition to COMMIT has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
26 RangeServer-failover-master-6  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of the ROOT range, after the calls to RangeServer::phantom_commit_ranges() were successful, but before the state transition to ACKNOWLEDGE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
27 RangeServer-failover-master-7  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of the ROOT range, after the calls to RangeServer::load_acknowledge() were successful, but before the state transition to COMPLETE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
28 RangeServer-failover-master-8  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-load-3:exit:0 argument to the Master to force it to fail during the PHANTOM_LOAD state of the recovery of a METADATA range, after the phantom_load has successfully completed, but before the state transition to REPLAY_FRAGMENTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
29 RangeServer-failover-master-9  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-replay-3:exit:0 argument to the Master to force it to fail during the REPLAY_FRAGMENTS state of the recovery of a METADATA range, after the replay was successful, but before the state transition to PREPARE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
30 RangeServer-failover-master-10  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-prepare-3:exit:0 argument to the Master to force it to fail during the PREPARE state of the recovery of a METADATA range, after the calls to RangeServer::phantom_prepare_ranges() were successful, but before the state transition to COMMIT has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
31 RangeServer-failover-master-11  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of a METADATA range, after the calls to RangeServer::phantom_commit_ranges() were successful, but before the state transition to ACKNOWLEDGE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
32 RangeServer-failover-master-12  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of a METADATA range, after the calls to RangeServer::load_acknowledge() were successful, but before the state transition to COMPLETE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
33 RangeServer-failover-master-13  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-load-3:exit:0 argument to the Master to force it to fail during the PHANTOM_LOAD state of the recovery of a USER range, after the phantom_load has successfully completed, but before the state transition to REPLAY_FRAGMENTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
34 RangeServer-failover-master-14  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-replay-3:exit:0 argument to the Master to force it to fail during the REPLAY_FRAGMENTS state of the recovery of a USER range, after the replay was successful, but before the state transition to PREPARE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
35 RangeServer-failover-master-15  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-prepare-3:exit:0 argument to the Master to force it to fail during the PREPARE state of the recovery of a USER range, after the calls to RangeServer::phantom_prepare_ranges() were successful, but before the state transition to COMMIT has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
36 RangeServer-failover-master-16  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of a USER range, after the calls to RangeServer::phantom_commit_ranges() were successful, but before the state transition to ACKNOWLEDGE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
37 RangeServer-failover-master-17  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of a USER range, after the calls to RangeServer::load_acknowledge() were successful, but before the state transition to COMPLETE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
38 RangeServer-failover-master-18  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-1:exit:0 argument to the Master to force it to fail inside the toplevel recovery state machine in the INITIAL state, after the range server's metalog has been read and a recovery plan was created, but before the state transition to ISSUE_REQUESTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
39 RangeServer-failover-master-19  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-2:exit:0 argument to the Master to force it to fail inside the toplevel recovery state machine in the INITIAL state, after the range server's metalog has been read and a recovery plan was created and the state transition to ISSUE_REQUESTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
40 RangeServer-failover-master-20  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-3:exit:0 argument to the Master to force it to fail inside the toplevel recovery state machine in the ISSUE_REQUESTS state, after recovery operations have been created for each of the failed range server's commit log types (ROOT, METADATA, SYSTEM, USER) and the state transition to FINALIZE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
41 RangeServer-failover-master-21  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-4:exit:0 argument to the Master to force it to fail inside the toplevel recovery state machine in the FINALIZE state, after the range server's connection has been purged from the connection manager. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
42 RangeServer-failover-master-22  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-phantom-load-ranges:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::phantom_load(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
43 RangeServer-failover-master-23  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-replay-fragments:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::replay_fragments(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
44 RangeServer-failover-master-24  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-phantom-prepare-ranges:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::phantom_prepare_ranges(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
45 RangeServer-failover-master-25  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-phantom-commit-ranges:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::phantom_commit_ranges(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
46 RangeServer-failover-master-26  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-acknowledge-load:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::acknowledge_load(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
47 RangeServer-failover-master-27  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-load-user:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_load() during the recovery of one of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
48 RangeServer-failover-master-28  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-load-user:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-load-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::phantom_load() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::phantom_load() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
49 RangeServer-failover-master-29  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=replay-fragments-user-0:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::replay_fragments() during the recovery of one of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
50 RangeServer-failover-master-30  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=replay-fragments-user-0:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-replay-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::replay_fragments() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::replay_fragments() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
51 RangeServer-failover-master-31  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=replay-fragments-user-1:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::replay_fragments(), after the replay completed successfully, but before notification was sent back to the Master, during the recovery of one of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
52 RangeServer-failover-master-32  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-update-user:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_update(), during the recovery of one of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
53 RangeServer-failover-master-33  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-update-user:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-replay-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::phantom_update() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::replay_fragments() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
54 RangeServer-failover-master-34  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-update-metadata:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_update(), during the recovery of one of rs1's METADATA ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
55 RangeServer-failover-master-35  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-prepare-ranges-user-1:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_prepare_ranges(), after ranges have been prepared, but before the PHANTOM range entries have been written to the range server's metalog, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
56 RangeServer-failover-master-36  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-prepare-ranges-user-2:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_prepare_ranges(), after the PHANTOM range entries have been written to the range server's metalog, but before success has been reported back to the Master, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
57 RangeServer-failover-master-37  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-prepare-ranges-user-2:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-prepare-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::phantom_prepare_ranges() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::phantom_prepare_ranges() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
58 RangeServer-failover-master-38  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-prepare-ranges-user-3:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_prepare_ranges(), after success has been reported back to the Master, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
59 RangeServer-failover-master-39  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-1:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_commit_ranges(), before the Location column of the METADATA table has been updated for the ranges, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
60 RangeServer-failover-master-40  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-2:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_commit_ranges(), after the Location column of the METADATA table has been updated for all of the ranges, but before the range server's metalog has been updated, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
61 RangeServer-failover-master-41  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-2:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-commit-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::phantom_commit_ranges() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::phantom_commit_ranges() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
62 RangeServer-failover-master-42  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-3:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_commit_ranges(), after the range server's metalog has been successfully updated, but before success has been reported back to the Master, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
63 RangeServer-failover-master-43  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-4:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_commit_ranges(), after success has been reported back to the Master, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
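
Most of the tests in the table rely on the --induce-failure switch, whose values all follow the shape label:action:number, for example metadata-split-1:exit:1 or user-range-acknowledge-load-pause-1:pause(10000):3. As a reading aid, here is a minimal Python sketch that splits such a switch into its parts. It is not Hypertable code: FailureSpec and parse_induce_failure are hypothetical names, the value in parentheses is assumed to be milliseconds (the table pairs pause(10000) with a 10 second pause), and the final number is kept as an opaque integer because this post does not say what it means.

```python
import re
from typing import NamedTuple, Optional


class FailureSpec(NamedTuple):
    """Parts of one --induce-failure value (hypothetical helper type)."""
    label: str                 # fail point name, e.g. "metadata-split-1"
    action: str                # e.g. "exit", "signal", "throw", "pause"
    action_arg: Optional[int]  # number in parentheses, e.g. 10000 in pause(10000)
    trailing: int              # final numeric field; meaning not stated in the post


# label ":" action [ "(" digits ")" ] ":" digits
_SPEC = re.compile(
    r"^(?P<label>[^:]+):(?P<action>[a-z]+)(?:\((?P<arg>\d+)\))?:(?P<n>\d+)$"
)


def parse_induce_failure(option: str) -> FailureSpec:
    """Split a full '--induce-failure=...' switch into its fields."""
    prefix = "--induce-failure="
    if not option.startswith(prefix):
        raise ValueError(f"not an --induce-failure switch: {option!r}")
    match = _SPEC.match(option[len(prefix):])
    if match is None:
        raise ValueError(f"unrecognized failure spec: {option!r}")
    arg = match.group("arg")
    return FailureSpec(
        label=match.group("label"),
        action=match.group("action"),
        action_arg=int(arg) if arg is not None else None,
        trailing=int(match.group("n")),
    )


if __name__ == "__main__":
    print(parse_induce_failure("--induce-failure=metadata-split-1:exit:1"))
    print(parse_induce_failure(
        "--induce-failure=user-range-acknowledge-load-pause-1:pause(10000):3"))
```

A helper like this could be used to group the tests above by fail point or by action (exit, throw, pause, signal) when reviewing coverage.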

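Nearly every test above ends with the same check: dump the keys and verify that the table contains the exact set of keys that were loaded. The sketch below shows one way such a check could be expressed in Python. It assumes that both the loaded keys and the dumped keys are available as one key per line; the function names and that file layout are illustrative assumptions, not the actual Hypertable test scripts.

```python
from typing import Iterable, NamedTuple, Set


class KeyDiff(NamedTuple):
    """Difference between the keys that were loaded and the keys dumped."""
    missing: Set[str]     # loaded but absent from the dump (lost during failover)
    unexpected: Set[str]  # present in the dump but never loaded


def read_keys(lines: Iterable[str]) -> Set[str]:
    """Collect one key per non-empty line, ignoring trailing newlines."""
    return {line.rstrip("\n") for line in lines if line.strip()}


def diff_keys(loaded: Iterable[str], dumped: Iterable[str]) -> KeyDiff:
    """Compare the two key collections as sets."""
    loaded_set, dumped_set = set(loaded), set(dumped)
    return KeyDiff(missing=loaded_set - dumped_set,
                   unexpected=dumped_set - loaded_set)


def keys_match(loaded: Iterable[str], dumped: Iterable[str]) -> bool:
    """True only when the dump contains exactly the loaded keys."""
    diff = diff_keys(loaded, dumped)
    return not diff.missing and not diff.unexpected


if __name__ == "__main__":
    loaded = read_keys(["k1\n", "k2\n", "k3\n"])
    after_recovery = read_keys(["k1\n", "k3\n"])
    print(diff_keys(loaded, after_recovery))
```

Reporting the missing and unexpected keys separately, rather than a single pass/fail flag, makes it easier to tell data loss during recovery apart from duplicated or stray entries.
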
Thanks again to everyone who participated in this considerable development effort!

Posted By:  Doug Judd, CEO, Hypertable Inc.
