The Hypertable Blog

Testing of Hypertable RangeServer Failover

03.04.2013  |  Testing  |  0

As we mentioned in our previous post announcing the newly arrived RangeServer failover feature, robustness and application transparency were our top priorities.  To achieve these objectives, we placed enormous emphasis on testing.  While this testing effort was painstaking and led to a very long development cycle, it has paid substantial dividends in quality and robustness.  The following table lists the regression tests that were added to Hypertable to guarantee correctness and ensure that the expected behavior holds true for future releases.

#  Test Name  Description
1 RangeServer-failover-basic-1  Start Hypertable with two RangeServers (rs1,rs2) and then 1) Load table with data, 2) Kill rs1 and wait for recovery to complete, 3) Stop Hypertable and then restart with just rs2. Dump keys after each step and verify that the table contains the exact set of keys that were loaded. Verify that no ranges are assigned to rs1.
2 RangeServer-failover-basic-2  Start Hypertable with three RangeServers (rs1,rs2,rs3) and then 1) Load table with data, 2) Kill rs1 and wait for recovery to complete, 3) Stop Hypertable and then restart with just rs2 and rs3. Dump keys after each of these three steps and verify that the table contains the exact set of keys that were loaded.
3 RangeServer-failover-basic-3  Start Hypertable with five RangeServers (rs1,rs2,rs3,rs4,rs5) and then 1) Load table with data, 2) Kill rs1 and rs2 and wait for recovery to complete for both servers, 3) Stop Hypertable and then restart with just rs3, rs4, and rs5. Dump keys after each of the three steps and verify that the table contains the exact set of keys that were loaded.
4 RangeServer-failover-basic-4  Start Hypertable with two RangeServers (rs1,rs2) and then load table with data. Kill rs1 and wait for recovery to complete. Try to restart rs1 and verify that it does not come up because it has been recovered. Restart all servers. Again verify that rs1 does not come up because it has been recovered.
5 RangeServer-failover-quorum  Start Hypertable with five RangeServers (rs1,rs2,rs3,rs4,rs5) and then 1) Load table with data, 2) Kill rs1, rs2, rs3, rs4 and wait for Master to indicate that a quorum has not been reached and therefore recovery is blocked. Start rs6 and wait for recovery of rs1, rs2, rs3, and rs4 to complete. 3) Restart Hypertable with just rs5 and rs6. Dump keys after each of the three steps and verify that the table contains the exact set of keys that were loaded.
6 RangeServer-failover-standby-master  Start Hypertable with two RangeServers (rs1,rs2) and a second hot-standby Master. 1) Load table with data, 2) Kill rs1 and the original Master and wait for the hot-standby Master to take over and recover rs1, 3) Restart Hypertable with just rs2. Dump keys after each of the three steps and verify that the table contains the exact set of keys that were loaded.
7 RangeServer-failover-graceperiod  Start Hypertable with two RangeServers (rs1,rs2) and then 1) load table with data, then 2) kill rs1, wait 10 seconds, kill rs2 and then wait for both to be recovered. Dump keys after each step and verify that the table contains the exact set of keys that were loaded. Also verify that the temporary recovery barrier was put up after each server was stopped.
8 RangeServer-failover-sigstop  Start Hypertable with two RangeServers (rs1,rs2) and then 1) load table with data, then 2) suspend rs1 by sending it the SIGSTOP signal and wait for it to be recovered and then restart servers with just rs2. Dump keys after each step and verify that the table contains the exact set of keys that were loaded.
9 RangeServer-failover-bad-fragments  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=bad-log-fragments-1:signal:0 switch to the Master to force it to report two bad fragments after replay completes. Install a notification script to capture all notifications. Then load table with data, stop rs1 and wait for it to recover. Verify that the notification script was called to 1) notify that failover took place and 2) report that the bad fragment error was encountered.
10 RangeServer-failover-restart  Start Hypertable with two RangeServers (rs1,rs2) and then 1) load table with data, then 2) restart servers without rs2 and wait for recovery of rs2 to complete. Dump keys after both steps and verify that the table contains the exact set of keys that were loaded.
11 RangeServer-failover-create-table  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=user-load-range-4:exit:0 switch to rs2 to force it to fail during the loading of the initial range. Create a table and verify that rs2 failed. Run load generator to load table with 200,000 keys. Dump keys and verify that the table contains the exact set of keys that were loaded.
12 RangeServer-failover-scan-1  Start Hypertable with two range servers (rs1, rs2). Supply the --induce-failure=create-scanner-user-1:exit:1 switch to rs1 to force it to fail during table scan. Create a table and run load generator to load table with 200,000 keys. Dump keys and verify that the table contains the exact set of keys that were loaded. Verify that rs1 was recovered.
13 RangeServer-failover-scan-2  Start Hypertable with two range servers (rs1, rs2). Supply the --induce-failure=fetch-scanblock-user-1:exit:0 switch to rs1 to force it to fail during table scan. Create a table and run load generator to load table with 200,000 keys. Dump keys and verify that the table contains the exact set of keys that were loaded. Verify that rs1 was recovered.
14 split-failover-1  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-split-1:exit:1 switch to rs1 to force it to fail after the split log has been installed during the split of a METADATA range. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
15 split-failover-2  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-split-2:exit:1 switch to rs1 to force it to fail after a METADATA range has been compacted and shrunk during a split and persisted to the SPLIT_SHRUNK state. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
16 split-failover-3  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-split-3:exit:1 switch to rs1 to force it to fail during a METADATA split, after the newly split-off range has been reported to the Master, but before the range state has been persisted in the range server meta log and the call to Master::relinquish_acknowledge() is made. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
17 split-failover-4  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-split-4:exit:1 switch to rs1 to force it to fail during a METADATA split, after the newly split-off range has been reported to the Master and the range state has been persisted in the range server meta log, but before the call to Master::relinquish_acknowledge() is made. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
18 split-failover-5  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-load-range-3:exit:1 switch to rs1 to force it to fail after the range server has taken ownership of the newly split-off METADATA range by writing the new location into the METADATA table, but before the split log has been linked into the primary commit log and the range has been persisted in the range server's metalog. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
19 split-failover-6  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=metadata-load-range-3:exit:1 switch to rs1 to force it to fail after the range server has taken ownership of the newly split-off METADATA range by writing the new location into the METADATA table and the split log has been linked into the primary commit log and the range has been persisted in the range server's metalog, but before success is reported back to the Master. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
20 split-failover-7  Start Hypertable with three range servers (rs1, rs2, rs3). Supply the --induce-failure=user-range-acknowledge-load-pause-1:pause(10000):3 and --induce-failure=metadata-load-range-3:exit:1 switches to rs1. This will cause it to pause for 10 seconds and then fail (exit) inside Range::acknowledge_load(), immediately before it persists load_acknowledged=true for the range in the range server's metalog. The 10-second delay increases the likelihood that the range will have received additional updates. Run load generator to load table with 200,000 keys. Verify that rs1 was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
21 RangeServer-failover-master-1  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-initial-1:exit:0 argument to the Master to force it to fail during the INITIAL state of the recovery of the ROOT range. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
22 RangeServer-failover-master-2  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-initial-2:exit:0 argument to the Master to force it to fail during the INITIAL state of the recovery of the ROOT range, after the recovery plan and state transition to PHANTOM_LOAD has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
23 RangeServer-failover-master-3  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-load-3:exit:0 argument to the Master to force it to fail during the PHANTOM_LOAD state of the recovery of the ROOT range, after the phantom_load has successfully completed, but before the state transition to REPLAY_FRAGMENTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
24 RangeServer-failover-master-4  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-replay-3:exit:0 argument to the Master to force it to fail during the REPLAY_FRAGMENTS state of the recovery of the ROOT range, after the replay was successful, but before the state transition to PREPARE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
25 RangeServer-failover-master-5  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-prepare-3:exit:0 argument to the Master to force it to fail during the PREPARE state of the recovery of the ROOT range, after the calls to RangeServer::phantom_prepare_ranges() were successful, but before the state transition to COMMIT has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
26 RangeServer-failover-master-6  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of the ROOT range, after the calls to RangeServer::phantom_commit_ranges() were successful, but before the state transition to ACKNOWLEDGE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
27 RangeServer-failover-master-7  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-root-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of the ROOT range, after the calls to RangeServer::load_acknowledge() were successful, but before the state transition to COMPLETE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
28 RangeServer-failover-master-8  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-load-3:exit:0 argument to the Master to force it to fail during the PHANTOM_LOAD state of the recovery of a METADATA range, after the phantom_load has successfully completed, but before the state transition to REPLAY_FRAGMENTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
29 RangeServer-failover-master-9  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-replay-3:exit:0 argument to the Master to force it to fail during the REPLAY_FRAGMENTS state of the recovery of a METADATA range, after the replay was successful, but before the state transition to PREPARE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
30 RangeServer-failover-master-10  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-prepare-3:exit:0 argument to the Master to force it to fail during the PREPARE state of the recovery of a METADATA range, after the calls to RangeServer::phantom_prepare_ranges() were successful, but before the state transition to COMMIT has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
31 RangeServer-failover-master-11  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of a METADATA range, after the calls to RangeServer::phantom_commit_ranges() were successful, but before the state transition to ACKNOWLEDGE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
32 RangeServer-failover-master-12  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-metadata-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of a METADATA range, after the calls to RangeServer::load_acknowledge() were successful, but before the state transition to COMPLETE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
33 RangeServer-failover-master-13  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-load-3:exit:0 argument to the Master to force it to fail during the PHANTOM_LOAD state of the recovery of a USER range, after the phantom_load has successfully completed, but before the state transition to REPLAY_FRAGMENTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
34 RangeServer-failover-master-14  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-replay-3:exit:0 argument to the Master to force it to fail during the REPLAY_FRAGMENTS state of the recovery of a USER range, after the replay was successful, but before the state transition to PREPARE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
35 RangeServer-failover-master-15  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-prepare-3:exit:0 argument to the Master to force it to fail during the PREPARE state of the recovery of a USER range, after the calls to RangeServer::phantom_prepare_ranges() were successful, but before the state transition to COMMIT has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
36 RangeServer-failover-master-16  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of a USER range, after the calls to RangeServer::phantom_commit_ranges() were successful, but before the state transition to ACKNOWLEDGE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
37 RangeServer-failover-master-17  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-ranges-user-commit-3:exit:0 argument to the Master to force it to fail during the COMMIT state of the recovery of a USER range, after the calls to RangeServer::load_acknowledge() were successful, but before the state transition to COMPLETE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
38 RangeServer-failover-master-18  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-1:exit:0 argument to the Master to force it to fail inside the toplevel recovery state machine in the INITIAL state, after the range server's metalog has been read and a recovery plan was created, but before the state transition to ISSUE_REQUESTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
39 RangeServer-failover-master-19  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-2:exit:0 argument to the Master to force it to fail inside the toplevel recovery state machine in the INITIAL state, after the range server's metalog has been read and a recovery plan was created and the state transition to ISSUE_REQUESTS has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
40 RangeServer-failover-master-20  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-3:exit:0 argument to the Master to force it to fail inside the toplevel recovery state machine in the ISSUE_REQUESTS state, after recovery operations have been created for each of the failed range server's commit log types (ROOT, METADATA, SYSTEM, USER) and the state transition to FINALIZE has been persisted to the Master metalog. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
41 RangeServer-failover-master-21  Start Hypertable with two RangeServers (rs1,rs2) and supply the --induce-failure=recover-server-4:exit:0 argument to the Master to force it to fail inside the toplevel recovery state machine in the FINALIZE state, after the range server's connection has been purged from the connection manager. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
42 RangeServer-failover-master-22  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-phantom-load-ranges:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::phantom_load(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
43 RangeServer-failover-master-23  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-replay-fragments:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::replay_fragments(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
44 RangeServer-failover-master-24  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-phantom-prepare-ranges:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::phantom_prepare_ranges(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
45 RangeServer-failover-master-25  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-phantom-commit-ranges:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::phantom_commit_ranges(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
46 RangeServer-failover-master-26  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=recover-server-ranges-user-acknowledge-load:throw:0 argument to the Master to force it to throw an exception during recovery of the USER commit log, after the first successful call to RangeServer::acknowledge_load(). Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
47 RangeServer-failover-master-27  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-load-user:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_load() during the recovery of one of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
48 RangeServer-failover-master-28  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-load-user:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-load-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::phantom_load() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::phantom_load() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
49 RangeServer-failover-master-29  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=replay-fragments-user-0:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::replay_fragments() during the recovery of one of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
50 RangeServer-failover-master-30  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=replay-fragments-user-0:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-replay-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::replay_fragments() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::replay_fragments() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
51 RangeServer-failover-master-31  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=replay-fragments-user-1:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::replay_fragments(), after the replay completed successfully, but before notification was sent back to the Master, during the recovery of one of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
52 RangeServer-failover-master-32  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-update-user:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_update(), during the recovery of one of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
53 RangeServer-failover-master-33  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-update-user:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-replay-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::phantom_update() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::replay_fragments() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
54 RangeServer-failover-master-34  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-update-metadata:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_update(), during the recovery of one of rs1's METADATA ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
55 RangeServer-failover-master-35  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-prepare-ranges-user-1:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_prepare_ranges(), after ranges have been prepared, but before the PHANTOM range entries have been written to the range server's metalog, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
56 RangeServer-failover-master-36  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-prepare-ranges-user-2:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_prepare_ranges(), after the PHANTOM range entries have been written to the range server's metalog, but before success has been reported back to the Master, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
57 RangeServer-failover-master-37  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-prepare-ranges-user-2:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-prepare-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::phantom_prepare_ranges() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::phantom_prepare_ranges() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
58 RangeServer-failover-master-38  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-prepare-ranges-user-3:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_prepare_ranges(), after success has been reported back to the Master, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
59 RangeServer-failover-master-39  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-1:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_commit_ranges(), before the Location column of the METADATA table has been updated for the ranges, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
60 RangeServer-failover-master-40  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-2:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_commit_ranges(), after the Location column of the METADATA table has been updated for all of the ranges, but before the range server's metalog has been updated, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
61 RangeServer-failover-master-41  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-2:exit:0 argument to rs2 and supply the --induce-failure=recover-server-ranges-user-commit-2:exit:0 to the Master. This causes rs2 to fail (exit) inside the call to RangeServer::phantom_commit_ranges() during the recovery of one of rs1's USER ranges, and also causes the Master to fail when it detects the failure of the call to RangeServer::phantom_commit_ranges() at rs2. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
62 RangeServer-failover-master-42  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-3:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_commit_ranges(), after the range server's metalog has been successfully updated, but before success has been reported back to the Master, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
63 RangeServer-failover-master-43  Start Hypertable with three RangeServers (rs1,rs2,rs3) and supply the --induce-failure=phantom-commit-user-4:exit:0 argument to rs2 to force it to fail (exit) inside the call to RangeServer::phantom_commit_ranges(), after success has been reported back to the Master, during the recovery of rs1's USER ranges. Run load generator to load table with 200,000 keys. Kill rs1 and verify that it was recovered. Dump keys and verify that the table contains the exact set of keys that were loaded.
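
Nearly all of these tests share the same skeleton: load a known set of keys, induce a failure, wait for recovery to complete, then dump the table and verify that exactly the loaded key set survived.  The following is a minimal Java sketch of that verify loop; the driver scripts it invokes (load-table.sh, kill-server.sh, wait-for-recovery.sh, dump-keys.sh) are hypothetical stand-ins for the actual test harness, not part of the Hypertable distribution.

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Paths;
  import java.util.HashSet;
  import java.util.Set;

  // Sketch of the load -> kill -> recover -> dump -> verify skeleton shared
  // by the tests above.  All script names are hypothetical placeholders.
  public class FailoverTestSkeleton {

    static void run(String... cmd) throws IOException, InterruptedException {
      Process p = new ProcessBuilder(cmd).inheritIO().start();
      if (p.waitFor() != 0)
        throw new RuntimeException("command failed: " + String.join(" ", cmd));
    }

    static Set<String> keySet(String file) throws IOException {
      return new HashSet<>(Files.readAllLines(Paths.get(file)));
    }

    public static void main(String[] args) throws Exception {
      run("./load-table.sh", "golden.keys");      // 1) load a known key set
      run("./kill-server.sh", "rs1");             // 2) induce the failure
      run("./wait-for-recovery.sh", "rs1");       //    block until recovered
      run("./dump-keys.sh", "dumped.keys");       // 3) dump surviving keys

      // The table must contain the exact set of keys that were loaded.
      if (!keySet("dumped.keys").equals(keySet("golden.keys")))
        throw new AssertionError("key set mismatch after recovery");
      System.out.println("recovered key set matches loaded key set");
    }
  }

The set comparison is deliberately exact in both directions: a missing key would indicate data loss, while an unexpected key would indicate a replayed or phantom update that should never have been committed.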

Thanks again to everyone who participated in this considerable development effort!

Posted By:  Doug Judd, CEO, Hypertable Inc.




Hypertable has Reached a Major Milestone!

02.14.2013  |  RangeServer Failover  |  2

With the release of Hypertable version 0.9.7.0 comes support for automatic RangeServer failover.  Hypertable will now detect when a RangeServer has failed, logically remove it from the system, and automatically re-assign the ranges that it was managing to other RangeServers.  This represents a major milestone for Hypertable and allows for very large scale deployments.  We have been actively working on this feature, full-time, for a year and a half.  To give you an idea of the magnitude of the change, here are the commit statistics:

  • 441 changed files
  • 17,522 line additions
  • 6,384 line deletions

This feature has been a long time in the making because we held it to a very high standard of quality: under no circumstances should a RangeServer failure lead to consistency problems or data loss.  We're confident that we've achieved 100% correctness under every conceivable circumstance.  The two primary goals for the feature, robustness and application transparency, are described below.

Robustness

We designed the RangeServer failover feature to be extremely robust.  RangeServers can fail in any state (mid-split, transferring, etc.) and will be recovered properly.  The system can also withstand the loss of any RangeServer, even the ones holding the ROOT or other METADATA ranges.  To achieve this level of robustness, we added 63 regression tests that verify the correct handling of RangeServer failures in every conceivable failure scenario.  We will follow up later with a blog post describing these tests.

Application Transparency

Another important aspect of our RangeServer failover implementation is application transparency.  Aside from a transient delay in database access, RangeServer failures are...


Roadmap to Hypertable 1.0

07.23.2012  |  Release Status  |  11

With the release of Hypertable version 0.9.6.0, I thought I would take some time to describe where we are in terms of the Hypertable 1.0 release and what work remains.  We had intended to make the next Hypertable release our beta release.  However, it’s been four months since the release of 0.9.5.6, and since the beta release is not quite ready to go, we decided to do one last alpha release and call it 0.9.6.0.  In this release we’ve put considerable effort into fixing a number of stability issues that have affected prior releases.

0.9.6.0 Stability Improvements for HDFS deployments

The biggest source of instability for Hypertable deployments running on top of HDFS has to do with the unclean shutdown of either the Master or RangeServer.  Upon restart after this situation has occurred, the RangeServer (or Master) can fail to come up, with an error message similar to the following in its log file:

1342810317 ERROR Hypertable.RangeServer : verify_backup (/root/src/hypertable/src/cc/Hypertable/Lib/MetaLogReader.cc:131): MetaLog file '/hypertable/servers/rs12/log/rsml/0' has length 0 < backup file '/opt/hypertable/0.9.5.6/run/log_backup/rsml/rs12/0' length 11376

This problem was due to a misunderstanding on our part of the HDFS API semantics.  Whenever the Master or RangeServer writes data to any of its log files, it makes a call to FSDataOutputStream.sync() to ensure that the data makes it into the filesystem and is persistent.  However, after making this call, a call to FileStatus.getLen() does not return the correct value.  FileStatus.getLen() only returns the correct file length if the file was properly closed.  HDFS provides an alternate API, DFSClient.DFSDataInputStream.getVisibleLength(), that returns the actual length of the file regardless...
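
The sketch below illustrates the distinction against the Hadoop 1.x-era API discussed above; the path and byte count are illustrative only, and it assumes a client where DistributedFileSystem.open() returns a DFSClient.DFSDataInputStream.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.DFSClient;

  public class VisibleLength {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path log = new Path("/hypertable/demo/commit.log");  // illustrative path

      FSDataOutputStream out = fs.create(log);
      out.write(new byte[11376]);
      out.sync();  // data is now persistent, but the file is deliberately
                   // left open, mimicking an unclean shutdown

      // For a file that was never closed, getLen() can report a stale length...
      long statedLen = fs.getFileStatus(log).getLen();

      // ...while getVisibleLength() reports what a reader can actually see.
      FSDataInputStream in = fs.open(log);
      long visibleLen = ((DFSClient.DFSDataInputStream) in).getVisibleLength();
      in.close();

      System.out.println("getLen()=" + statedLen
                         + "  getVisibleLength()=" + visibleLen);
    }
  }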


Secondary Indices Have Arrived!

03.22.2012  |  New Feature  |  17

Until now, SELECT queries in Hypertable had to include a row key, row prefix or row interval specification in order to be fast. Searching for rows by specifying a cell value or a column qualifier involved a full table scan, which resulted in poor performance and scaled badly because queries took longer as the dataset grew. With 0.9.5.6, we’ve implemented secondary indices that will make such SELECT queries lightning fast!

Hypertable supports two kinds of indices: a cell value index and a column qualifier index. This blog post explains what they are, how they work and how to use them.

The cell value index

Let’s look at an example of how to create those two indices.  A big telco asks us to design a table for its customer data.  Every user profile has a customer ID as the row key. But our system also wants to provide fast queries by phone number, since customers can dial in and our automated answering system can then immediately figure out who’s calling by checking the caller ID.  We therefore decide to create a secondary index on the phone number.  The following statement might be used to create this table along with a phone number index:

CREATE TABLE customers ( name, address, phone_no, INDEX phone_no );

Internally, Hypertable will now create a table customers and an index table ^customers. Every cell that is now inserted into the phone_no column family will be transformed and inserted into the index table as well. If you’re curious, you can insert some phone numbers and run SELECT * FROM “^customers”; to see how the index was updated.
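
To see the index in action from a client, here is a hedged sketch using the Java Thrift bindings; the connection helper and method names (ThriftClient.create, namespace_open, hql_query, namespace_close) and the default ThriftBroker port are recalled from the 0.9.x Thrift interface, so treat the exact signatures as assumptions rather than a definitive reference.

  import org.hypertable.thrift.ThriftClient;
  import org.hypertable.thriftgen.HqlResult;

  public class PhoneIndexDemo {
    public static void main(String[] args) throws Exception {
      // Connect to a local ThriftBroker (default port 38080, assumed here).
      ThriftClient client = ThriftClient.create("localhost", 38080);
      long ns = client.namespace_open("/");

      // Inserting a phone_no cell also writes a corresponding entry
      // to the ^customers index table.
      client.hql_query(ns, "INSERT INTO customers VALUES "
          + "(\"cust-0001\", \"phone_no\", \"555-0100\")");

      // With the index in place, a lookup by value avoids a full table scan.
      HqlResult result = client.hql_query(ns,
          "SELECT * FROM customers WHERE phone_no = \"555-0100\"");
      System.out.println(result);

      client.namespace_close(ns);
    }
  }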

Not every query makes use of the...


Sehrch.com: A Structured Search Engine Powered By Hypertable

03.15.2012  |  Hypertable Case Study  |  10

Sehrch.com is a structured search engine.  It provides powerful querying capabilities that enable users to quickly complete complex information retrieval tasks.  It gathers conceptual awareness from the Linked Open Data cloud, and can be used as (1) a regular search engine or (2) as a structured search engine.  In both cases conceptual awareness is used to build entity centric result sets.  Try this simple query: Pop singers less than 20 years old.

Sehrch.com gathers data from the Semantic Web in the form of RDF, crawling the Linked Open Data cloud and making requests with headers accepting RDF NTriples.  Data dumps are also obtained from various sources.  In order to store this data, we required a data store capable of storing tens of billions of triples on as little hardware as possible while still delivering high performance.  So we conducted our own study to find the most appropriate store for this type and quantity of data.

As Semantic Web people, our initial choice would have been to use native RDF data stores, better known as triplestores.  But from our initial usage we quickly concluded that SPARQL compliant triplestores and large quantities of data do not mix well.  As a challenge, we attempted to load 1.3 billion triples (the entire DBpedia and Freebase datasets) into a dual core machine with only 3GB memory.  The furthest any of the open source triplestores (4store, TDB, Virtuoso) got in loading the datasets on the given hardware was around 80 million triples.  We were told that the only solution was more hardware.  We weren't the only ones facing significant hardware requirements when attempting to load this volume of data.  For example, in the following post a machine with 8 cores and 32GB...


Welcome!

02.07.2012  |  Hypertable Unveils New Website  |  5

Welcome to the new Hypertable website.  This new website is easy to navigate and has all of the tools you'll need to learn about Hypertable and easily deploy it for your big data applications.

We have put a tremendous amount of effort into the Documentation section of the website.  There you'll find an Architectural Overview, Installation Instructions, Administrator Guides, Reference Guides and much more.  Be sure to check out the Code Examples section for working code examples written in Java, C++, PHP, Python, Perl and Ruby.

We're also very excited to announce new products and services available today:

  • UpTime Support Subscription – for round-the-clock, 7 days a week, 365 days a year "uptime" assurance for your Hypertable deployments -- with support staff located in the United States and Europe
  • Training and Certification –  taught by big data experts, Hypertable Training and Certification classes are held in Silicon Valley, USA and Munich, Germany
  • Commercial License – for organizations such as OEMs, ISVs, and VARs who distribute Hypertable with their closed source products, Hypertable Inc. offers the software under a flexible OEM commercial license

Also, please check out the Hypertable vs. HBase Performance Evaluation II.  In this in-house performance test, Hypertable demonstrates its power by easily loading 167 billion records into a 16-node test cluster, a test in which HBase failed, struggling with memory management.

If you're looking to take commercial advantage of the power of big data, then the Hypertable database platform is the right choice for you.  Our new product and service offerings will ensure that you get the most out of your Hypertable deployment, allowing you to take full advantage of Hypertable's unprecedented performance and cost-efficiency...
