• Twitter
  • Facebook
  • MySpace
  • StumbleUpon
  • Reddit
  • Digg
  • Del.icio.us
  • E-mail

New and Improved Secondary Indexes

07.10.2014  |  New Feature

As part of our ongoing effort to make Hypertable useful for a wide range of applications, we've greatly improved our support for secondary indexing.  These improvements, introduced in the 0.9.8.0 release, enable much more powerful queries that access rows by means other than the primary key.  As part of this effort, we've changed the query language to allow for more expressive queries and have brought it more in-line with SQL.

Like the original implementation, thew new secondary indexes are fully consistent and massively scalable.  They can be used to index not only column values, but also column qualifiers.  Secondary indexes can be defined in the CREATE TABLE statement with the INDEX and QUALIFIER INDEX clauses, for example:

CREATE TABLE customers (

    name,
    INDEX name,
    tag,
    QUALIFIER INDEX tag,
    info,
    INDEX info,
    QUALIFIER INDEX info
);

The ALTER TABLE command has been enhanced to allow for creating and dropping indexes of preexisting tables.  For example:

ALTER TABLE customers ADD ( QUALIFIER INDEX name ) DROP INDEX ( info );

To populate indexes added with ALTER TABLE or to convert indexes created with older versions of Hypertable into the new format, the indexes need to be rebuilt.  This can be accomplished with the new REBUILD INDICES command.  For example:

REBUILD INDICES customers;

The examples which follow in this post assume that the following table has been created and populated with products.tsv.

CREATE TABLE products (
  title,
  section,
  info,
  category,
  INDEX section,
  INDEX info,
  QUALIFIER INDEX info,
  QUALIFIER INDEX category
);

A set of scripts and data files for running all of the examples in this post can be found in the archive secondary-indexes.tgz.

Queries against the value index

One important difference between secondary index queries and normal table scan queries is that with secondary index queries, the SELECT statement behaves like SQL SELECT in that the WHERE predicate acts as a boolean selector of rows and the columns listed after the SELECT keyword act as a projection of the selected rows. In other words, you can project an arbitrary set of columns that don't necessarily have to be referenced in the WHERE predicate.

Exact match

Select the title column of all rows whose section column contains exactly "books".

SELECT title FROM products WHERE section = "books";
0307743659	title	The Shining Mass Market Paperback
0321321928	title	C++ Common Knowledge: Essential Intermediate Programming [Paperback]
0321776402	title	C++ Primer Plus (6th Edition) (Developer's Library)

Select the title column of all rows that contain an info:actor column containing exactly "Jack Nicholson".

SELECT title FROM products WHERE info:actor = "Jack Nicholson";
B00002VWE0	title	 Five Easy Pieces (1970)
B002VWNIDG	title	The Shining (1980)

Prefix match

Select the title and info:publisher columns of all rows that contain an info:publisher column whose value starts with "Addison-Wesley".

SELECT title,info:publisher FROM products WHERE info:publisher =^ 'Addison-Wesley';
SELECT title,info:publisher FROM products WHERE info:publisher =~ /^Addison-Wesley/;
0321321928	title	C++ Common Knowledge: Essential Intermediate Programming [Paperback]
0321321928	info:publisher	Addison-Wesley Professional; 1 edition (March 10, 2005)
0321776402	title	C++ Primer Plus (6th Edition) (Developer's Library)
0321776402	info:publisher	Addison-Wesley Professional; 6 edition (October 28, 2011)

Regular expression match

Select the title and info:publisher columns of all rows that contain an info:author column whose value starts with "Stephen " followed by either a 'P' or 'K' character.

SELECT title,info:author FROM products WHERE info:author =~ /^Stephen [PK]/;
SELECT title,info:author FROM products WHERE Regexp(info:author, '^Stephen [PK]');
0307743659	title	The Shining Mass Market Paperback
0307743659 info:author Stephen King
0321776402	title	C++ Primer Plus (6th Edition) (Developer's Library)
0321776402	info:author	Stephen Prata

Queries against the qualifier index

Like the value index queries, the qualifier index queries have the traditional SQL select/project semantics.

Exact match

Select the title column of all rows that contain an info:studio column.

SELECT title FROM products WHERE Exists(info:studio);
B00002VWE0	title	Five Easy Pieces (1970)
B000Q66J1M	title	2001: A Space Odyssey [Blu-ray]
B002VWNIDG	title	The Shining (1980)

Regular expression match

Select the title column of all rows that contain a category column with qualifier that starts with "/Movies".

SELECT title FROM products WHERE Exists(category:/^\/Movies/);
B00002VWE0	title	Five Easy Pieces (1970)
B000Q66J1M	title	2001: A Space Odyssey [Blu-ray]
B002VWNIDG	title	The Shining (1980)

Select the title column of all rows that contain a category column with qualifier that starts with "/Books" followed at some point later with "Programming".

SELECT title FROM products WHERE Exists(category:/^\/Books.*Programming/);
0321321928	title	C++ Common Knowledge: Essential Intermediate Programming [Paperback]
0321776402	title	C++ Primer Plus (6th Edition) (Developer's Library)

Boolean operators

The boolean operators AND and OR can be used to combine column predicates. The boolean AND operator can also combine column predicates and ROW intervals.

OR operator

Select the title column of all rows that contain an info:author column that starts with "Stephen P" or contain an info:publisher column that starts with "Anchor".

SELECT title FROM products WHERE info:author =~ /^Stephen P/ OR info:publisher =^ "Anchor";
SELECT title FROM products WHERE info:author =~ /^Stephen P/ OR info:publisher =~ /^Anchor/;
0307743659	title	The Shining Mass Market Paperback
0321776402	title	C++ Primer Plus (6th Edition) (Developer's Library)

AND operator

Select the title column of all rows that contain an info:author column that starts with "Stephen " followed by either the letter 'P' or 'K' and contain an info:publisher column that starts witht "Anchor"

SELECT title FROM products
  WHERE info:author =~ /^Stephen [PK]/ AND info:publisher =^ "Anchor";
SELECT title FROM products
  WHERE info:author =~ /^Stephen [PK]/ AND info:publisher =~ /^Anchor/;
0307743659	title	The Shining Mass Market Paperback

Select the title column of all rows whose row key is greater than 'B00002VWE0' and that contain an info:actor column containing exactly "Jack Nicholson".

SELECT title FROM products WHERE ROW > 'B00002VWE0' AND info:actor = 'Jack Nicholson';
B002VWNIDG	title	The Shining (1980)

Select the title column of all rows whose row key starts with 'B' and that contain an info:actor column containing exactly "Jack Nicholson".

SELECT title FROM products WHERE ROW =^ 'B' AND info:actor = 'Jack Nicholson';
B00002VWE0	title	Five Easy Pieces (1970)
B002VWNIDG	title	The Shining (1980)

Programming examples (in all supported languages) of the above queries can be found in the Hypertable Developer Guide. Special thanks goes to Philip Gatt for providing feedback on the original implementation that led to these improvements!

Posted By:  Doug Judd

Here's what other people had to say

There are no comments yet... be the first!

What about you?

Name

Email

Comment

Remember my personal information
Notify me of follow-up comments?

Please enter the word you see in the image below: