Indexing your Models
- Basic Indexing
- Index Names
- Real-time Indices vs SQL-backed Indices
- Conditions and Groupings
- Sanitizing SQL
- Index Options
- Multiple Indices
- Real-time Callbacks
- Processing your Index
Basic Indexing
Everything to set up the indices for your models goes in files in app/indices. The files themselves can be named however you like, but I generally opt for model_name_index.rb. At the very least, the file name should not be the same as your model’s file name. Here’s an example of what goes in the file:
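As a rough sketch, an index file such as app/indices/article_index.rb could look like the following. The column names (subject, content, author_id and the timestamps) and the author association are assumptions made for illustration, and the fragment only makes sense inside an application that has Thinking Sphinx loaded.

```ruby
# app/indices/article_index.rb (hypothetical file, columns and association)
ThinkingSphinx::Index.define :article, :with => :active_record do
  # fields: searchable text
  indexes subject, :sortable => true
  indexes content
  indexes author.name, :as => :author, :sortable => true

  # attributes: used for filtering, sorting and grouping
  has author_id, created_at, updated_at
end
```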
You’ll notice the first argument is the model name downcased and as a symbol, and we are specifying the processor - :active_record - to use SQL-backed indices. Everything inside the block is just like previous versions of Thinking Sphinx, if you’re familiar with that (and if not, keep reading).
An equivalent index definition if you want to use real-time indices would be:
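A sketch of what a real-time definition might look like, assuming an Article model with subject, content, author_id and timestamp columns plus an author association (all hypothetical). Attribute types are spelled out because real-time indices need them set manually.

```ruby
# app/indices/article_index.rb (hypothetical file, columns and association)
ThinkingSphinx::Index.define :article, :with => :real_time do
  # fields reference Ruby methods on the model
  indexes subject, :sortable => true
  indexes content
  indexes author.name, :as => :author, :sortable => true

  # attributes need an explicit :type in real-time indices
  has author_id,  :type => :integer
  has created_at, :type => :timestamp
  has updated_at, :type => :timestamp
end
```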
You’ll also want to add a real-time callback to your model.
When you’re defining indices for namespaced models, the model reference is a lowercase string with / separating the namespaces, cast to a symbol:
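To illustrate the shape of that reference, here is a small plain-Ruby sketch. index_reference is a hypothetical helper written for this example (it is not part of Thinking Sphinx), and Admin::Person is an assumed model name.

```ruby
# Hypothetical helper: turn a class name into the symbol used as the
# first argument of an index definition.
def index_reference(class_name)
  class_name
    .gsub("::", "/")                     # namespace separator becomes a slash
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')   # split CamelCase words with underscores
    .downcase
    .to_sym
end

index_reference("Article")        # => :article
index_reference("Admin::Person")  # => :"admin/person"

# Writing the reference by hand gives the same symbol:
"admin/person".to_sym             # => :"admin/person"
```

The resulting symbol is what you would pass as the first argument of the index definition.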
Index Names
When translating these index definitions into Sphinx configuration, Thinking Sphinx will use the model’s name for the index, and append a _core suffix to it. So, an index for Article will be named article_core.
If you’re using SQL-backed indices with deltas, then there is also a corresponding index with the _delta suffix - e.g. article_delta.
You can set different index names if you wish, using the :name option (as noted later in this documentation related to multiple indices for a single model). However, the suffixes will always be applied.
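The naming rules can be summed up in a few lines of plain Ruby. This is only a sketch of the convention described here: sphinx_index_names is a hypothetical helper, not something Thinking Sphinx provides, and published_article is an assumed custom name.

```ruby
# Hypothetical helper, for illustration only: Thinking Sphinx derives these
# names internally.
def sphinx_index_names(reference, name: nil, delta: false)
  base  = (name || reference).to_s
  names = ["#{base}_core"]            # every index gets the _core suffix
  names << "#{base}_delta" if delta   # SQL-backed indices with deltas only
  names
end

sphinx_index_names(:article)
# => ["article_core"]
sphinx_index_names(:article, delta: true)
# => ["article_core", "article_delta"]
sphinx_index_names(:article, name: :published_article)
# => ["published_article_core"]
```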
Real-time Indices vs SQL-backed Indices
Thinking Sphinx allows for definitions of both real-time indices and SQL-backed indices. (In previous versions, only SQL-backed indices were available.)
Real-time indices are processed using Sphinx’s SphinxQL protocol, and thus are managed by Thinking Sphinx via Ruby, with the following advantages:
- Your fields and attributes reference Ruby methods.
- Real-time records can be updated directly, thus keeping your Sphinx data up-to-date almost immediately. This removes the need for delta indices.
The SQL-backed indices, however, have the potential to be much faster: the indexing process avoids the need to iterate through every record separately, and can use SQL joins to load association data directly.
You’ll need to consider which approach will work best for your application, but certainly if your data is changing frequently and you’d like it to be up-to-date, it’s worth starting with real-time indices.
The two approaches are distinguished by the processor specified in the index definition, such as :active_record for SQL-backed indices.
Any differences in behaviour within an index definition are noted in the documentation below.
The indexes method adds one (or many) fields, by referencing the model’s method names (for real-time indices) or column names (for SQL-backed indices). You cannot reference model methods with SQL-backed indices - in this case, Sphinx talks directly to your database, and Ruby doesn’t get loaded.
You don’t need to keep the same names as your model, though. Use the :as option to signify a new name. Field and attribute names must be unique, so specifying custom names (instead of the column name for both) is essential.
You can also flag fields as being sortable.
Use the :facet option to signify a facet.
For real-time indices, you can drill down on methods that return single objects (such as
If you want to collect multiple values into a single field, you will need a method in your model to aggregate this:
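A sketch of such an aggregating method, using plain-Ruby stand-ins so it runs on its own. Tag, tags and tag_names are assumed names; in a real application Article would be an ActiveRecord model and tags an association.

```ruby
# Plain-Ruby stand-ins for ActiveRecord models (hypothetical names).
Tag = Struct.new(:name)

class Article
  attr_reader :tags

  def initialize(tags)
    @tags = tags
  end

  # Collapse every associated tag name into one string, so that a
  # real-time index can treat it as a single field.
  def tag_names
    tags.map(&:name).join(" ")
  end
end

article = Article.new([Tag.new("ruby"), Tag.new("sphinx")])
article.tag_names # => "ruby sphinx"
```

Within a real-time index definition, the method is then referenced like any other field, e.g. indexes tag_names.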
With SQL-backed indices, if there are associations in your model you can drill down through them to access other columns. Explicit names with the :as option are required when doing this.
There may be times when a normal column value isn’t exactly what you’re after, so you can also define your indexes as raw SQL:
Again, in this situation, an explicit name is required, and it only works with SQL-backed indices.
The has method adds one (or many) attributes, and just like the indexes method, it requires references to the model’s methods (for real-time indices) or column names (for SQL-backed indices).
Real-time indices require the attribute types to be set manually, but SQL-backed indices have the ability to introspect on the database to determine types. Known types for real-time indices are:
The syntax is very similar to setting up fields. You can set custom names, and drill down into associations. You don’t ever need to label an attribute as :sortable though - in Sphinx, all attributes can be used for sorting.
You’ll also see below that multi-value attributes in real-time indices need the :multi option to be set.
Again: fields and attributes cannot share names - they must all be unique. Use the :as option to provide custom names when a column is being used more than once.
Conditions and Groupings
Because SQL-backed indices are translated to SQL, you may want to add some custom conditions or groupings manually - and for that, you’ll want the
For real-time indices you can define a custom scope to preload associations or apply custom conditions:
This scope only comes into play when populating all records at once, not when single records are created or updated.
Sanitizing SQL
Note: this section applies only to SQL-backed indices.
As previously mentioned, your index definition results in SQL from the indexes, the attributes, conditions and groupings, etc. With this in mind, it may be useful to simplify your index.
One way would be to use something like ActiveRecord::Base.sanitize_sql to generate the required SQL for you. For example:
This will produce the expected WHERE published = 1 for MySQL.
Index Options
Most Sphinx index configuration options can be set on a per-index basis using the set_property method within your index definition. Here’s an example for the
set_property takes a hash of options, but also can be called as many times as you’d like.
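As a sketch, the following sets two Sphinx options on one index. min_infix_len and min_word_len are chosen purely for illustration, and the subject and content columns are assumptions; the fragment needs Thinking Sphinx loaded to mean anything.

```ruby
# Hypothetical index; the option values are examples, not recommendations.
ThinkingSphinx::Index.define :article, :with => :active_record do
  indexes subject, content

  # set_property takes a hash of options...
  set_property :min_infix_len => 3
  # ...and can be called again to add further settings.
  set_property :min_word_len => 2
end
```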
Multiple Indices
If you want more than one index defined for a given model, just add further ThinkingSphinx::Index.define calls - but make sure you give every index a unique name, and have the same attributes defined in all indices.
These index definitions can be in the same file or separate files - it’s up to you.
Real-time Callbacks
If you’re using real-time indices, you will want to add a callback to your model to ensure changes are reflected in Sphinx:
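A sketch of the simplest case, assuming an Article model indexed under the :article reference and the ThinkingSphinx::RealTime.callback_for helper. Depending on the Thinking Sphinx version you're running, callbacks may be registered differently, so treat this as an illustration rather than the definitive form.

```ruby
# app/models/article.rb (hypothetical model)
class Article < ActiveRecord::Base
  # Push this record into the real-time index whenever it is saved.
  after_save ThinkingSphinx::RealTime.callback_for(:article)
end
```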
If you want changes to associated data to fire Sphinx updates for a related model, you can specify a method chain for the callback.
The first argument, in all situations, should match the index definition’s first argument: a symbolised version of the model name. The second argument is a chain, and should be in the form of an array of symbols, each symbol representing methods called to get to the indexed object (so, an instance of the Article model in the example above).
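To make the chain idea concrete, here is a plain-Ruby sketch of how an array of symbols can be walked from the saved object to the indexed one. resolve_chain and the Struct stand-ins are hypothetical names used for illustration; this is not Thinking Sphinx's own implementation.

```ruby
# Plain-Ruby stand-ins for models (hypothetical).
Article = Struct.new(:title)
Comment = Struct.new(:article)
Reply   = Struct.new(:comment)

# Call each method in the chain in turn, starting from the saved instance.
def resolve_chain(instance, chain)
  chain.inject(instance) { |object, method_name| object.public_send(method_name) }
end

article = Article.new("Indexing your Models")
comment = Comment.new(article)
reply   = Reply.new(comment)

resolve_chain(comment, [:article])          # the article itself
resolve_chain(reply, [:comment, :article])  # two hops to the same article

# In a Comment model, the matching callback would be along the lines of:
#   after_save ThinkingSphinx::RealTime.callback_for(:article, [:article])
```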
If you wish to have your callbacks update Sphinx only in certain conditions, you can either define your own callback and then invoke TS if/when needed:
Or supply a block to the callback instantiation which returns an array of instances to process:
You do not need to add a destroy callback - Thinking Sphinx does this automatically for all indexed models.
Processing your Index
Once you’ve got your index set up just how you like it, you can run the rake task to get Sphinx to process the data.
If you have made structural changes to your index (which is anything except adding new data into the database tables), you’ll need to stop Sphinx, re-process, and then re-start Sphinx - which can be done through a single rake call.
Index Guard Files
Any given SQL-backed index cannot be processed more than once concurrently. To avoid multiple indexing requests, Thinking Sphinx adds a lock file in the indices directory while indexing occurs, named ts-INDEXNAME.tmp. When you’re processing all indices in the one call (via either of the above rake tasks), then the lock file is instead named
In rare cases (generally when the parent process crashes completely), orphan lock files may remain - these are safe to remove if no indexing is occurring. If you’re finding some of your indices aren’t being processed reliably, checking for these lock files is recommended.
These lock files are not created when processing real-time indices.
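The guard file idea can be sketched in plain Ruby as follows. with_index_guard is a hypothetical helper written for this example, not Thinking Sphinx's code; only the ts-INDEXNAME.tmp naming comes from the description above.

```ruby
require "tmpdir"

# Hypothetical guard: run the block only if no lock file exists for the
# index, and always remove the lock file afterwards.
def with_index_guard(directory, index_name)
  lock_path = File.join(directory, "ts-#{index_name}.tmp")

  begin
    # CREAT | EXCL raises if the file already exists, so only one caller wins.
    File.open(lock_path, File::WRONLY | File::CREAT | File::EXCL).close
  rescue Errno::EEXIST
    return :skipped
  end

  begin
    yield
    :processed
  ensure
    File.delete(lock_path) if File.exist?(lock_path)
  end
end

Dir.mktmpdir do |dir|
  with_index_guard(dir, "article_core") do
    # A second request for the same index while the first is still
    # running finds the lock file and is skipped.
    with_index_guard(dir, "article_core") { puts "never reached" }
  end
end
```

If the process dies before the ensure clause runs, the lock file is left behind, which is the orphan case described above.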
You can disable the use of these lock files if you wish, by changing the guarding strategy:
ts:index will instruct Sphinx to process all indices (and this has always been how Thinking Sphinx has behaved). This means that Sphinx will prepare all of the new data together before switching the daemon over to use it.
It is possible, though, to instead process each index one at a time (and thus, the daemon uses each index’s new data as that index’s processing is completed):
Should you wish to build your own indexing strategy, you can give ThinkingSphinx::Configuration.instance.indexing_strategy anything you like that responds to call and expects an array of index options, and yields index names. You can see the implementations of the two approaches in the Thinking Sphinx source.
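As a sketch of what a custom strategy could look like, the module below yields core indices before any others. CoreFirstStrategy and FakeIndex are hypothetical names, and the sketch assumes each element of the array responds to name.

```ruby
# Stand-in for the objects handed to a strategy (assumed to respond to #name).
FakeIndex = Struct.new(:name)

# Hypothetical strategy: core indices first, then the rest, one at a time.
module CoreFirstStrategy
  def self.call(indices)
    core, others = indices.map(&:name).partition { |name| name.end_with?("_core") }
    (core + others).each { |name| yield name }
  end
end

# In an application this would be registered with:
#   ThinkingSphinx::Configuration.instance.indexing_strategy = CoreFirstStrategy

processed = []
indices = [FakeIndex.new("article_delta"), FakeIndex.new("article_core")]
CoreFirstStrategy.call(indices) { |name| processed << name }
processed # => ["article_core", "article_delta"]
```

A module with a call class method is used here, but a lambda or any other object responding to call would fit the same contract.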
You can also process just specific indices via the INDEX_FILTER environment variable: