Common Questions and Issues
Depending on how you have Sphinx set up, or what database you’re using, you might come across little issues and curiosities. Here are a few to be aware of.
- Editing the generated Sphinx configuration file
- Running multiple instances of Sphinx on one machine
- Record IDs found by Sphinx but not by ActiveRecord
- Viewing Result Weights
- Wildcard Searching
- Slow Indexing
- MySQL and Large Fields
- PostgreSQL with Manual Fields and Attributes
- Delta Indexing Not Working
- Running Delta Indexing with Passenger
- Can only access the first thousand search results
- Filtering on String Attributes
- Removing HTML from Excerpts
- Using other Database Adapters
- Using OR Logic with Attribute Filters
- Catching Exceptions when Searching
- Using with Unicorn
- Alternatives to MVAs with Strings
- Indices not being processed
Editing the generated Sphinx configuration file
In most situations, you won’t need to edit this file yourself, and can rely on Thinking Sphinx to generate it reliably.
If you do want to customise the settings, you’ll find most options can be set via config/thinking_sphinx.yml - many are mentioned on the Advanced Sphinx Configuration page. Options that aren’t mentioned on that page are still worth trying, and there’s a fair chance they will work.
On the off chance that you actually do need to edit the file, make sure you’re running the ts:index task with the INDEX_ONLY environment variable set to true, otherwise the task will always regenerate the configuration file, overwriting your customisations.
Running multiple instances of Sphinx on one machine
You can run as many Sphinx instances as you wish on one machine - but each must be bound to a different port. You can do this via the config/thinking_sphinx.yml file - just add a mysql41 setting with the port number under the specific environment.
Other options are documented on the Advanced Sphinx Configuration page.
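As a rough illustration of the port setting described above, here’s a minimal Ruby sketch that parses a sample of what config/thinking_sphinx.yml might contain and reports any port shared between environments. The port numbers and the port_clashes helper are assumptions made up for this example, not part of Thinking Sphinx.

```ruby
require 'yaml'

# Hypothetical contents of config/thinking_sphinx.yml. The port numbers are
# illustrative only.
SAMPLE_CONFIG = <<~YAML
  development:
    mysql41: 9306
  test:
    mysql41: 9307
YAML

# Returns a hash of port => environment names for every port that is used by
# more than one environment. Environments without a mysql41 setting are ignored.
def port_clashes(settings)
  settings
    .group_by { |_environment, options| (options || {})['mysql41'] }
    .reject { |port, entries| port.nil? || entries.length < 2 }
    .transform_values { |entries| entries.map(&:first) }
end

clashes = port_clashes(YAML.safe_load(SAMPLE_CONFIG))
puts clashes.empty? ? 'Every environment has its own port' : "Shared ports: #{clashes}"
```

The same check could be pointed at the real file with YAML.safe_load(File.read('config/thinking_sphinx.yml')), provided the file contains plain YAML without ERB.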
Record IDs found by Sphinx but not by ActiveRecord
If ThinkingSphinx::Search::StaleIdsException exceptions are being raised with the message “Record IDs found by Sphinx but not by ActiveRecord”, then it’s likely that your Sphinx data/daemon is out-of-sync with your database. This can happen if data changes occur without firing ActiveRecord callbacks, or if the daemon is somehow orphaned from its pidfile.
In either case, it’s recommended that you restart the daemon (via the ts:restart task) and/or reprocess your indices (via ts:rebuild) to ensure both daemon and data are correct again.
If you find these exceptions occur regularly, then review every place where data for the indexed models is changed - if no ActiveRecord callbacks are being invoked (especially when it comes to data deletion), then this could be the underlying cause. To bulk-delete records from Sphinx, the following example code may be helpful:
Viewing Result Weights
To retrieve the weights/rankings of each search result, you can enumerate through your matches using each_with_weight, once you’ve added the appropriate mask:
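Here’s a minimal sketch of that approach, assuming Thinking Sphinx v3 with Sphinx 2.1 or newer. Article is a hypothetical indexed model and each_article_with_weight is a helper invented for this example; the select clause and the WeightEnumeratorMask mask follow the v3 conventions, so check them against the version you have installed.

```ruby
# Yields each matching article together with its Sphinx weight.
# Article is a hypothetical model with a Sphinx index.
def each_article_with_weight(query)
  # weight() must be part of the select clause for the mask to read it
  search = Article.search(query, :select => '*, weight()')
  search.masks << ThinkingSphinx::Masks::WeightEnumeratorMask

  search.each_with_weight do |article, weight|
    yield article, weight
  end
end

# Usage, inside a Rails app:
#   each_article_with_weight('pancakes') do |article, weight|
#     puts "#{article.id}: #{weight}"
#   end
```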
If you want to access weights directly for each search result, you should add a weight pane to the search context:
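A minimal sketch of the pane approach, again assuming Thinking Sphinx v3 and a hypothetical Article model. The articles_with_weights helper exists only for this example, and the pane name follows the v3 conventions, so verify it against your installed version.

```ruby
# Returns [article, weight] pairs, reading the weight straight off each result.
# Article is a hypothetical model with a Sphinx index.
def articles_with_weights(query)
  search = Article.search(query, :select => '*, weight()')
  # The pane adds a #weight method to every search result
  search.context[:panes] << ThinkingSphinx::Panes::WeightPane

  weighted = []
  search.each { |article| weighted << [article, article.weight] }
  weighted
end
```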
Wildcard Searching
Sphinx can support wildcard searching (for example: Austr*), though it is turned off by default in Sphinx 2.1. To enable it, you need to add two settings to your
You can set the min_infix_len value to something higher if you don’t need single characters with a wildcard being matched. This may be a worthwhile fine-tuning, because the smaller the infixes are, the larger your index files become.
Don’t forget to rebuild your Sphinx indexes after making this change.
Slow Indexing
If Sphinx is taking a while to process all your records, there are a few common reasons for this happening. Firstly, make sure you have database indexes on any foreign key columns and any columns you filter or sort by.
Secondly - are you using fixtures, or are there large gaps between primary key values for your models? Sphinx isn’t set up to process disparate IDs efficiently by default - and Rails’ fixtures have randomly generated IDs, which are usually extremely large integers. To get around this, you’ll need to set sql_range_step in your config/thinking_sphinx.yml file for the appropriate environments:
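To see why the step size matters, here’s a back-of-the-envelope sketch in plain Ruby. It assumes the indexer issues roughly one ranged query per step-sized window between the smallest and largest ID, that Sphinx’s default step is 1024, and that fixture IDs can reach about 2**30. The larger step value is only an illustration, not a recommendation.

```ruby
# Approximate number of ranged queries needed to walk from min_id to max_id
# in windows of `step` IDs (integer ceiling division).
def range_query_count(min_id, max_id, step)
  span = max_id - min_id + 1
  (span + step - 1) / step
end

DEFAULT_STEP = 1024         # Sphinx's default sql_range_step
LARGER_STEP  = 10_000_000   # illustrative value only

max_fixture_id = 2**30 - 1  # assumed upper bound for Rails fixture IDs

puts range_query_count(1, max_fixture_id, DEFAULT_STEP) # => 1048576
puts range_query_count(1, max_fixture_id, LARGER_STEP)  # => 108
```

With a handful of fixture records scattered across that range, nearly all of the million-odd default-sized windows would be empty, which is where the indexing time goes.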
MySQL and Large Fields
If you’ve got a field that is built from multiple values in one column of a MySQL database (for example, through a has_many association), then you may hit MySQL’s default limit for string concatenation: 1024 characters. You can increase the group_concat_max_len value by adding the following to your index definition:
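A minimal sketch of such an index definition, assuming a hypothetical Article model with a comments association and Thinking Sphinx v3’s index DSL. The value 8192 is only an example, and the defined? guard is there purely so the snippet can be loaded outside a Rails app.

```ruby
# app/indices/article_index.rb (hypothetical file, model and association names)
if defined?(ThinkingSphinx::Index) # guard only so this sketch loads without the gem
  ThinkingSphinx::Index.define :article, :with => :active_record do
    indexes title
    # Values from many comments are concatenated into this one field
    indexes comments.content, :as => :comment_content

    # Raise MySQL's concatenation limit for this index (example value)
    set_property :group_concat_max_len => 8192
  end
end
```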
If these fields get particularly large though, then there’s another setting you may need to set in your MySQL configuration: max_allowed_packet, which has a default of sixteen megabytes. You can’t set this option via Thinking Sphinx though (it’s a rare edge case).
PostgreSQL with Manual Fields and Attributes
If you’re using fields or attributes defined by strings (raw SQL) in SQL-backed indices, then the columns used in them aren’t automatically included in the GROUP BY clause of the generated SQL statement. To make sure the query is valid, you will need to explicitly add these columns to the GROUP BY clause.
A common example is if you’re converting latitude and longitude columns from degrees to radians via SQL.
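As a sketch of that latitude/longitude case, assuming a hypothetical Location model with latitude and longitude columns stored in degrees, and Thinking Sphinx v3’s index DSL. The defined? guard is there purely so the snippet can be loaded outside a Rails app.

```ruby
# app/indices/location_index.rb (hypothetical file and model names)
if defined?(ThinkingSphinx::Index) # guard only so this sketch loads without the gem
  ThinkingSphinx::Index.define :location, :with => :active_record do
    indexes name

    # Attributes defined by raw SQL strings
    has "RADIANS(latitude)",  :as => :latitude,  :type => :float
    has "RADIANS(longitude)", :as => :longitude, :type => :float

    # Columns used in the SQL strings above are added to GROUP BY by hand
    group_by "latitude", "longitude"
  end
end
```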
Delta Indexing Not Working
Often people find delta indexing isn’t working on their production server. Sometimes, this is because Sphinx is running as one user on the system, and the Rails application is being served as a different user. Check your production.log and Apache/Nginx error log file for mentions of permissions issues to confirm this.
Indexing for deltas is invoked by the web user, so that user needs access to the index files. The simplest way to ensure this is to run all Thinking Sphinx rake tasks as that web user.
If you’re still having issues, and you’re using Passenger, read the next hint.
Running Delta Indexing with Passenger
If you’re using Phusion Passenger on your production server, with delta indexing on some models, a common issue people find is that their delta indexes don’t get processed.
If it’s not a permissions issue (see the previous hint), another common cause is that Passenger has its own PATH set up, and can’t execute the Sphinx binaries (indexer and searchd) implicitly.
The way around this is to find out where your binaries are on the server:
And then set the bin_path option in your config/thinking_sphinx.yml file for the production environment:
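If you’d rather locate the binaries from Ruby, here’s a small sketch that walks the PATH of the current process looking for indexer and searchd. The locate_binary helper is invented for this example. Run it from a shell where the binaries are known to work, since Passenger’s own PATH is the one that’s missing them.

```ruby
# Returns the full path of the first executable called `command` found in
# `search_path`, or nil when there is none.
def locate_binary(command, search_path = ENV.fetch('PATH', ''))
  search_path.split(File::PATH_SEPARATOR).each do |directory|
    candidate = File.join(directory, command)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

%w[indexer searchd].each do |binary|
  puts "#{binary}: #{locate_binary(binary) || 'not found on PATH'}"
end
```

The directory containing the binaries (File.dirname of the path found) is presumably the value to use for bin_path.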
Can only access the first thousand search results
This is actually how Sphinx is supposed to behave. Have a read of the Large Result Sets section of the Advanced Configuration page to see why, and how to work around it if you really need to.
Filtering on String Attributes
To filter by string attributes, you must be using Sphinx 2.2.3 or newer. If that’s not possible, the workarounds covered in older documentation could be viable.
Removing HTML from Excerpts
For a while, Thinking Sphinx auto-escaped excerpts. However, Sphinx itself can remove HTML entities for indexing and excerpts, which is a better way to approach this. So, you’ll want to add the following setting to your
Using other Database Adapters
If you’re using Thinking Sphinx in combination with a database adapter that isn’t quite run-of-the-mill, you may need to add a snippet of code to a Rails initialiser or equivalent.
Using OR Logic with Attribute Filters
It is possible to filter on attributes using OR logic. There are two steps: firstly, you need to create a computed attribute while searching, using Sphinx’s select option, and then filter by that computed value.
Here’s an example where we want to return all publicly visible articles, as well as articles belonging to the user with an ID of 5.
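A minimal sketch of that search, assuming a hypothetical Article model whose index has visible and user_id attributes. The public_or_owned_articles helper and the display alias are names chosen for this example.

```ruby
# Returns articles that are publicly visible OR owned by the given user.
# Article is a hypothetical model; visible and user_id are assumed attributes.
def public_or_owned_articles(query, user_id)
  # Step 1: compute an attribute in the select clause (1 when either condition
  # holds). Integer() keeps anything but a number out of the query string.
  select = "*, IF(visible = 1 OR user_id = #{Integer(user_id)}, 1, 0) AS display"

  # Step 2: filter on the computed attribute
  Article.search(query, :select => select, :with => {'display' => 1})
end

# Usage, inside a Rails app:
#   public_or_owned_articles('pancakes', 5)
```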
If you’ve given your attributes aliases (using the :as option) in your index definition, then you must refer to those attributes by their aliases, not the original database columns. This applies generally to anything using those attributes (filtering, ordering, facets, etc).
Catching Exceptions when Searching
By default, Thinking Sphinx does not execute the search query until you examine your search results - which is usually in the view. This is so you can chain sphinx scopes without sending multiple (unnecessary) queries to Sphinx.
However, this means that exceptions will be fired from within the view - and most people put their exception handling in the controller. To force exceptions to fire when you actually define the search, all you need to do is to inform Thinking Sphinx that it should populate the results immediately:
If you’re chaining scopes together, make sure you add this at the end with a final search call:
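Both cases can be sketched as follows, assuming a hypothetical Article model with a hypothetical published Sphinx scope. The helper names are invented for this example, and ThinkingSphinx::SphinxError is assumed to be the base class for search errors in Thinking Sphinx v3, so check it against your installed version.

```ruby
# Runs the search immediately, so any Sphinx error is raised here (for
# example in a controller action) rather than later in the view.
def search_articles(query)
  Article.search(query, :populate => true)
rescue ThinkingSphinx::SphinxError => error
  warn "Search failed: #{error.message}"
  [] # fall back to an empty result set
end

# With chained scopes, :populate goes on the final search call.
def search_published_articles(query)
  Article.published.search(query, :populate => true)
end
```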
Using with Unicorn
If you’re using Unicorn as your web server, you’ll want to ensure the connection pool is cleared after forking.
Alternatives to MVAs with Strings
Given Sphinx doesn’t support multi-value string attributes, what are alternative ways to achieve similar functionality?
The easiest approach is when the string values are coming from an association. In this case, use the foreign key ids instead, and translate string values to the underlying id when you’re filtering your searches.
Otherwise, you could look into using CRC’d integer values of strings, though there is the possibility of collisions.
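On the Ruby side, the CRC approach might look like the sketch below, which uses Zlib.crc32 from the standard library. The crc_for helper and the tag_crcs attribute are hypothetical, and the index would need to store integers produced by the same CRC-32 function for the filter to match.

```ruby
require 'zlib'

# Maps a string to an unsigned 32-bit integer that can stand in for it in a
# multi-value integer attribute. Different strings can share a value.
def crc_for(value)
  Zlib.crc32(value.to_s)
end

tag_crcs = %w[ruby sphinx search].map { |tag| crc_for(tag) }
puts tag_crcs.inspect

# Hypothetical filter, inside a Rails app:
#   Article.search 'pancakes', :with => {:tag_crcs => crc_for('ruby')}
```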
Indices not being processed
If you’re finding indices aren’t being processed - particularly delta indices - it could be that guard files haven’t been cleaned up properly. They are located in the indices directory, and take the name pattern
Provided there is no indexing occurring, they can safely be deleted.