Indexing your Models
- Basic Indexing
- Real-time Indices vs SQL-backed Indices
- Conditions and Groupings
- Sanitizing SQL
- Index Options
- Multiple Indices
- Real-time Callbacks
- Processing your Index
Basic Indexing
Everything needed to set up the indices for your models goes in files in app/indices. The files themselves can be named however you like, but I generally opt for model_name_index.rb. At the very least, the file name should not be the same as your model’s file name. Here’s an example of what goes in the file:
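A minimal sketch of such a file, assuming a hypothetical Article model with subject and content columns, an author_id foreign key, the usual timestamps, and a belongs_to :author association. Adjust the names to your own schema.

```ruby
# app/indices/article_index.rb
# Hypothetical Article model: subject, content, author_id, timestamps,
# and belongs_to :author.
ThinkingSphinx::Index.define :article, :with => :active_record do
  # Fields (full-text searchable)
  indexes subject, :sortable => true
  indexes content
  indexes author.name, :as => :author, :sortable => true

  # Attributes (filtering and sorting)
  has author_id, created_at, updated_at
end
```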
You’ll notice the first argument is the model name, downcased and as a symbol, and that the :with option sets the processor to :active_record, which makes this a SQL-backed index. Everything inside the block works just as it did in previous versions of Thinking Sphinx, if you’re familiar with those (and if not, keep reading).
An equivalent index definition if you want to use real-time indices would be:
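A sketch of that real-time equivalent, using the same hypothetical Article model. The processor changes to :real_time, and each attribute carries an explicit :type because real-time indices cannot introspect the database.

```ruby
# app/indices/article_index.rb
ThinkingSphinx::Index.define :article, :with => :real_time do
  # Fields reference Ruby methods on the model
  indexes subject, :sortable => true
  indexes content
  indexes author.name, :as => :author, :sortable => true

  # Attributes need their types set manually
  has author_id,  :type => :integer
  has created_at, :type => :timestamp
  has updated_at, :type => :timestamp
end
```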
You’ll also want to add a real-time callback to your model.
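A sketch of that callback, assuming the ThinkingSphinx::RealTime.callback_for helper found in the 3.x/4.x releases; other releases may expose a different callback API, so check the version you are running.

```ruby
# app/models/article.rb
class Article < ActiveRecord::Base
  belongs_to :author

  # Push each saved Article into the :article real-time index
  after_save ThinkingSphinx::RealTime.callback_for(:article)
end
```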
When you’re defining indices for namespaced models, use a lowercase string with /’s for namespacing as the model reference:
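For example, assuming a hypothetical Blog::Article model with title and content columns, the reference would be 'blog/article':

```ruby
# app/indices/blog_article_index.rb
# Hypothetical namespaced model Blog::Article
ThinkingSphinx::Index.define 'blog/article', :with => :active_record do
  indexes title, content
end
```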
Thinking Sphinx v1/v2
Note: Index definitions for Thinking Sphinx versions before 3.0.0 went in the model files instead, inside a define_index block. Don't forget to place this block below your associations and any accepts_nested_attributes_for calls; otherwise, any references to them in fields and attributes will not work.
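A sketch of the older style, assuming the same hypothetical Article model; note the association is declared before the define_index block so the block can reference it.

```ruby
# app/models/article.rb (Thinking Sphinx v1/v2 only)
class Article < ActiveRecord::Base
  # Associations first, so the index block can refer to them
  belongs_to :author

  define_index do
    indexes subject, :sortable => true
    indexes content
    indexes author.name, :as => :author, :sortable => true

    has author_id, created_at, updated_at
  end
end
```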
Real-time Indices vs SQL-backed Indices
Thinking Sphinx allows for definitions of both real-time indices and SQL-backed indices. (In previous versions, only SQL-backed indices were available.)
Real-time indices are processed using Sphinx’s SphinxQL protocol, and thus are managed by Thinking Sphinx via Ruby, with the following advantages:
- Your fields and attributes reference Ruby methods.
- Real-time records can be updated directly, thus keeping your Sphinx data up-to-date almost immediately. This removes the need for delta indices.
The SQL-backed indices, however, have the potential to be much faster: the indexing process avoids the need to iterate through every record separately, and can use SQL joins to load association data directly.
You’ll need to consider which approach will work best for your application, but certainly if your data is changing frequently and you’d like it to be up-to-date, it’s worth starting with real-time indices.
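The two approaches are distinguished by the :with option passed to ThinkingSphinx::Index.define: :active_record for SQL-backed indices, and :real_time for real-time indices.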
Any differences in behaviour within an index definition are noted in the documentation below.
The indexes method adds one (or many) fields by referencing the model’s method names (for real-time indices) or column names (for SQL-backed indices). You cannot reference model methods with SQL-backed indices: in this case, Sphinx talks directly to your database, and Ruby doesn’t get loaded.
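A small sketch of both forms, assuming hypothetical subject, content and summary columns on the articles table:

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  # One field per call...
  indexes subject
  # ...or several fields in a single call
  indexes content, summary
end
```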
Thinking Sphinx v1/v2
Keep in mind that if you're referencing a column that shares its name with a core Ruby method (such as id, name or type) and you're using Thinking Sphinx v1 or v2, then you'll need to specify it using a symbol.
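For instance, in a hypothetical Product model with name and description columns, the clashing name column would be passed as a symbol:

```ruby
# app/models/product.rb (Thinking Sphinx v1/v2 only)
class Product < ActiveRecord::Base
  define_index do
    # name is also a core Ruby method, so reference the column as a symbol
    indexes :name
    indexes description
  end
end
```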
You don’t need to keep the same names as your model, though. Use the :as option to signify a new name. Field and attribute names must be unique, so when the same column is used for both a field and an attribute, specifying a custom name for at least one of them is essential.
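A minimal sketch, assuming a hypothetical content column that should appear in Sphinx under the name post:

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  # The content column is exposed to Sphinx as a field called "post"
  indexes content, :as => :post
end
```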
You can also flag fields as being sortable.
Similarly, use the :facet option to signify a facet.
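Both options are passed alongside the column reference. This sketch assumes the hypothetical subject column and author association used earlier:

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  # Sortable: search results can be ordered by this field
  indexes subject, :sortable => true
  # Facet: results can be grouped and counted by author name
  indexes author.name, :as => :author, :facet => true
end
```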
For real-time indices, you can drill down on methods that return single objects (such as belongs_to associations):
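A sketch assuming the hypothetical belongs_to :author association; Thinking Sphinx calls article.author.name in Ruby when the record is indexed.

```ruby
ThinkingSphinx::Index.define :article, :with => :real_time do
  # author returns a single object, so its methods can be chained
  indexes author.name, :as => :author
end
```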
With real-time indices, if you want to collect multiple values (such as those from a has_many association) into a single field, you will need a method in your model to aggregate them:
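A sketch assuming a hypothetical has_many :tags association on Article, where each Tag has a name; tag_names is an illustrative method name, not part of Thinking Sphinx.

```ruby
# app/models/article.rb
class Article < ActiveRecord::Base
  has_many :tags

  # Hypothetical helper: every tag name joined into one string
  def tag_names
    tags.collect(&:name).join(' ')
  end
end

# app/indices/article_index.rb
ThinkingSphinx::Index.define :article, :with => :real_time do
  indexes tag_names
end
```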
With SQL-backed indices, if there are associations in your model, you can drill down through them to access other columns. Explicit names with the :as option are required when doing this.
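A sketch assuming hypothetical belongs_to :author and has_many :comments associations. Here the associations are joined in the generated SQL rather than loaded through Ruby.

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  # belongs_to association: one author name per article
  indexes author.name, :as => :author
  # has_many association: all comment bodies aggregated into one field
  indexes comments.content, :as => :comments
end
```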
There may be times when a normal column value isn’t exactly what you’re after, so you can also define your indexes as raw SQL:
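A sketch of a raw SQL field, assuming the hypothetical articles.subject column; the table prefix avoids ambiguity if joined tables share column names.

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  # Raw SQL is passed as a string and must be given an explicit name
  indexes "LOWER(articles.subject)", :as => :subject_lower, :sortable => true
end
```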
Again, in this situation, an explicit name is required, and it only works with SQL-backed indices.
The has method adds one (or many) attributes and, just like the indexes method, requires references to the model’s methods (for real-time indices) or column names (for SQL-backed indices).
Real-time indices require attribute types to be set manually, whereas SQL-backed indices can introspect the database to determine types. Known types for real-time indices are:
- :integer
- :boolean
- :string
- :timestamp
- :float
- :bigint
- :json
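A sketch of typed real-time attributes, assuming hypothetical author_id, published, created_at and rating columns on Article:

```ruby
ThinkingSphinx::Index.define :article, :with => :real_time do
  indexes subject

  # Each real-time attribute declares its type explicitly
  has author_id,  :type => :integer
  has published,  :type => :boolean
  has created_at, :type => :timestamp
  has rating,     :type => :float
end
```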
The syntax is very similar to setting up fields. You can set custom names and drill down into associations. You don’t ever need to label an attribute as :sortable, though: in Sphinx, all attributes can be used for sorting.
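A sketch for a SQL-backed index, assuming a hypothetical has_many :tags association; attribute types are taken from the database, so none are given.

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  indexes subject

  # Plain column attributes; types come from database introspection
  has author_id, created_at
  # Custom name, drilling down into the tags association
  has tags.id, :as => :tag_ids
end
```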
Multi-value attributes in real-time indices need the :multi option to be set.
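A sketch assuming a hypothetical has_many :tags association; tag_ids is the id-array reader that ActiveRecord generates for that association.

```ruby
ThinkingSphinx::Index.define :article, :with => :real_time do
  indexes subject

  # tag_ids returns an array of integers, stored as a multi-value attribute
  has tag_ids, :type => :integer, :multi => true
end
```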
Again: fields and attributes cannot share names; they must all be unique. Use the :as option to provide custom names when a column is being used more than once.
Conditions and Groupings
Because SQL-backed indices are translated to SQL, you may want to add some custom conditions or groupings manually, and for that, you’ll want the where and group_by methods:
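A sketch assuming a hypothetical status column on the articles table; both strings are inserted as raw SQL into the generated query.

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  indexes subject, content

  # Added to the WHERE clause of the generated SQL
  where "articles.status = 'published'"
  # Added to the GROUP BY clause of the generated SQL
  group_by "articles.author_id"
end
```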
For real-time indices you can define a custom scope to preload associations or apply custom conditions:
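A sketch assuming the hypothetical author association and a published column; the block returns an ActiveRecord relation used when populating every record.

```ruby
ThinkingSphinx::Index.define :article, :with => :real_time do
  # Preload authors and skip unpublished articles during full population
  scope { Article.includes(:author).where(:published => true) }

  indexes subject
  indexes author.name, :as => :author
end
```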
This scope only comes into play when populating all records at once, not when single records are created or updated.
Sanitizing SQL
Note: this section applies only to SQL-backed indices.
As previously mentioned, your index definition is translated into SQL: the fields, attributes, conditions, groupings and so on. With this in mind, it may be useful to simplify your index definition by letting Rails write some of that SQL.
One way would be to use something like ActiveRecord::Base.sanitize_sql to generate the required SQL for you. For example:
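A sketch assuming a hypothetical published boolean column. It calls sanitize_sql on the model via send, because that method is not public in every Rails version; the value is quoted by whichever database adapter is configured.

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  indexes subject, content

  # The ? placeholder is replaced with the adapter-quoted value of true
  where Article.send(:sanitize_sql, ["published = ?", true])
end
```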
This will produce the expected WHERE published = 1 for MySQL.
Index Options
Most Sphinx index configuration options can be set on a per-index basis using the
set_property method within your index definition. Here’s an example for the
set_property takes a hash of options, and can also be called as many times as you’d like.
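A sketch of both styles; min_prefix_len, html_strip and morphology are standard Sphinx index settings chosen purely for illustration.

```ruby
ThinkingSphinx::Index.define :article, :with => :active_record do
  indexes subject, content

  # Several settings in one hash...
  set_property :min_prefix_len => 3, :html_strip => true
  # ...or one call per setting
  set_property :morphology => 'stem_en'
end
```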
Multiple Indices
If you want more than one index defined for a given model, just add further ThinkingSphinx::Index.define calls, but make sure you give every index a unique name and define the same attributes in all of them.
These index definitions can be in the same file or separate files - it’s up to you.
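A sketch of two indices on the hypothetical Article model. The second is named through a :name option (assumed here; the first keeps the default name derived from :article), and both define the same attributes.

```ruby
# app/indices/article_index.rb
ThinkingSphinx::Index.define :article, :with => :active_record do
  indexes subject, content
  has author_id, created_at
end

# Second index on the same model: unique name, same attributes
ThinkingSphinx::Index.define :article, :with => :active_record, :name => 'article_authors' do
  indexes author.name, :as => :author
  has author_id, created_at
end
```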
Thinking Sphinx v1/v2
Note: Defining multiple indices in Thinking Sphinx v2 or older is just a matter of using define_index multiple times, and supplying a unique name for each:
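A sketch of the v1/v2 form inside the hypothetical Article model; the index names are illustrative.

```ruby
# app/models/article.rb (Thinking Sphinx v1/v2 only)
class Article < ActiveRecord::Base
  belongs_to :author

  define_index 'article_content' do
    indexes subject, content
    has author_id, created_at
  end

  define_index 'article_authors' do
    indexes author.name, :as => :author
    has author_id, created_at
  end
end
```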
Real-time Callbacks
If you’re using real-time indices, you will want to add a callback to your model to ensure changes are reflected in Sphinx.
If you want changes to associated data to fire Sphinx updates for a related model, you can specify a method chain for the callback.
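A sketch assuming a hypothetical Comment model that belongs to Article, and the ThinkingSphinx::RealTime.callback_for helper from the 3.x/4.x releases. Saving a comment re-indexes its article.

```ruby
# app/models/comment.rb
class Comment < ActiveRecord::Base
  belongs_to :article

  # [:article] means: call comment.article to reach the indexed object
  after_save ThinkingSphinx::RealTime.callback_for(:article, [:article])
end
```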
The first argument, in all situations, should match the index definition’s first argument: a symbolised version of the model name. The second argument is a chain, and should be an array of symbols, each naming a method called in turn to get to the indexed object (so, an instance of the Article model in the example above).
If you wish to have your callbacks update Sphinx only in certain conditions, you can either define your own callback and then invoke Thinking Sphinx if and when needed:
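A sketch assuming a hypothetical published column and the callback object returned by ThinkingSphinx::RealTime.callback_for in the 3.x/4.x releases, which responds to after_save. populate_to_sphinx is an illustrative method name.

```ruby
# app/models/article.rb
class Article < ActiveRecord::Base
  after_save :populate_to_sphinx

  private

  # Hypothetical rule: only published articles are sent to Sphinx
  def populate_to_sphinx
    return unless published?

    ThinkingSphinx::RealTime.callback_for(:article).after_save(self)
  end
end
```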
Or supply a block to the callback instantiation which returns an array of instances to process:
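A sketch of the block form under the same assumptions. The parentheses make sure the block attaches to callback_for rather than to after_save.

```ruby
# app/models/article.rb
class Article < ActiveRecord::Base
  # The block receives the saved record and returns the instances to index;
  # an empty array means nothing is sent to Sphinx
  after_save(
    ThinkingSphinx::RealTime.callback_for(:article) { |article|
      article.published? ? [article] : []
    }
  )
end
```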
Processing your Index
Once you’ve got your index set up just how you like it, you can run the rake task to get Sphinx to process the data: ts:index for SQL-backed indices, or ts:generate to populate real-time indices.
However, if you have made structural changes to your index (which is anything except adding new data into the database tables), you’ll need to stop Sphinx, re-process, and then re-start Sphinx. A single rake call handles all of this: ts:rebuild for SQL-backed indices, or ts:regenerate for real-time indices.