Guest Post: Chris Corbyn of Flippa
(Later sections written by Chris Corbyn)
Wondering why we developers don’t talk more about search engine design, I asked on Twitter:
App developers with searches: Am I the only one here who thinks search design should be much much more important than it is? I’m interested!
— Florian Hanke (@hanke) March 3, 2012
The man behind the curious pseudonym @d11wtq was Chris Corbyn, who took the time to respond in full on the design of Flippa, “The #1 Marketplace for Buying and Selling Websites”, where the search engine takes center stage.
With his gracious permission I am reprinting his email in full.
In Chris’ words:
“The motivation behind putting a focus on search
At Flippa, we’re currently up to our 3rd implementation of search and we consider it hugely important to the success of our business. We’re something along the lines of an eBay platform, but built purely for buying and selling websites. If buyers cannot find what they are looking for, we quickly lose those users, and if we don’t have buyers on the site, logically we lose the sellers who market to them. I still think we have a lot of room to improve, but when we look back at our first implementation, we have come a long way. I guess we have come to learn over time just how important search is, rather than it being something that was apparent to us right from the day we launched Flippa (three years ago).
The first implementation
When we built Flippa, we knew we needed a search, but the scope was simply something that had to exist so that users could find listings by keywords. It was not well-integrated with the rest of the application. We used Solr (as was the fashion at the time) and search was just a “side feature” that was often forgotten about. Users could enter a keyword and get a set of results in a listing format entirely different from the layout we use when browsing our listings via the primary navigation. Users regularly complained, and with good reason: our search was more or less useless for their needs, which were far more complex than matching on keywords.
The second implementation
Acknowledging that users needed to be able to search on a range of metrics and that we needed to make some rather substantial changes to our search infrastructure, we sat down to discuss what our end goals were. Our users are more interested in raw numbers than in text (e.g. they search for websites based on revenue, on page views, on Alexa rank, etc). Users also wanted the ability to put together a custom search and save it to the database, so that when they returned to the site they could easily repeat a previous search.
We decided that we effectively needed to build a complete model around our search system, providing all the criteria our users would search on, in such a way that a fully-built set of criteria would be saved into the database for re-use. We also wanted to integrate our primary navigation with this search system. I don’t remember what the driving force was behind using the same system for our primary navigation, but I suspect it was mostly about unifying the UI and the underlying model code, in addition to improving our categorization of listings—sellers could previously specify if their website was “high end” or “turnkey”, for example, but with this new search system we could determine such things by looking at the numbers, in realtime.
We dropped Solr in favour of Sphinx, for two reasons:
- Indexing our MySQL data was considerably faster.
- It provided SphinxSE, a storage-engine plugin for MySQL that allows the Sphinx index to be queried through MySQL itself.
We built an advanced search library around Sphinx, allowing us to compose searches from a selection of pre-defined criteria, which were all exposed in the UI through our advanced search page. Because of the MySQL integration, internally searches became a combination of full-text index querying + an INNER JOIN to our listings table in MySQL. A sort of hybrid of MySQL and Sphinx full-text querying. Searches could be saved to the database, though regrettably, as entire serialized objects. We actually stored our primary navigation options this way in the database too. This turned out to be a big mistake when it came to data portability.
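As a rough sketch of that hybrid querying, here is how a set of pre-defined criteria might be compiled into a SphinxSE-style statement. The table and column names (`sphinx_listings`, `listings`, `monthly_revenue`) are illustrative, not Flippa’s actual schema:

```ruby
# Hypothetical sketch of compiling search criteria into a SphinxSE hybrid
# query. SphinxSE exposes the Sphinx index as a MySQL table: the full-text
# query is passed through the special `query` column, while real listing
# data comes from an INNER JOIN back to the source table.
def sphinxse_query(keywords, criteria)
  # Treat each criterion as a numeric minimum (illustrative only).
  where = criteria.map { |col, val| "l.#{col} >= #{val.to_i}" }
  where_sql = where.empty? ? "" : " AND " + where.join(" AND ")
  "SELECT l.* FROM sphinx_listings s " \
  "INNER JOIN listings l ON l.id = s.id " \
  "WHERE s.query = '#{keywords};mode=extended'#{where_sql}"
end

sphinxse_query("wordpress blog", { monthly_revenue: 500 })
```

The appeal of this approach is that the application only ever speaks MySQL: the full-text match and the relational filtering happen in a single statement.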
From our users’ perspective, the search capabilities were good, but the interface was too difficult to use. We had tried to provide every option they could ever need, but the end result was that there were too many options, some of which seemed ambiguous and confusing. Additionally, every time we changed the name of a search field, the serialized objects saved in the database broke, and the migration procedure was much more complicated than it should have been.
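The fragility of serialized blobs is easy to demonstrate. The actual implementation stored serialized objects; this sketch uses JSON and invented field names purely to show why a rename forces a rewrite of every stored row:

```ruby
require 'json'

# A saved search stored as one opaque blob (illustrative field names).
stored = JSON.generate("keywords" => "blog", "page_views" => 1000)

# Later the application renames `page_views` to `traffic`. Every stored
# blob must now be rewritten, or loading silently yields a dead field
# that no longer matches anything in the search code.
RENAMES = { "page_views" => "traffic" }

def migrate(blob, renames)
  JSON.parse(blob).each_with_object({}) do |(key, value), out|
    out[renames.fetch(key, key)] = value
  end
end

migrate(stored, RENAMES)  # => {"keywords"=>"blog", "traffic"=>1000}
```

With a native object serializer the situation is worse still, since a renamed or removed class can make old blobs fail to deserialize at all.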
Users also found it difficult to “narrow down” their search, since it wasn’t clear what impact changing an input in the advanced search would have on the size of the result set, without performing that search.
At the time we built this particular search implementation, the only way to index your data with Sphinx was to rebuild the entire index. Fortunately this only took about 20 seconds in our case (Sphinx is good at doing this stuff efficiently). But since we wanted close-to-realtime results (our data changes practically every second or two), we were re-indexing the entire dataset every minute via cron, which added some strain to our database servers.
The third (and current) implementation
Generally we are happy with what we have now, but we do have some things planned for further improvements.
When we rewrote search this time around, beyond our desire to improve the underlying code internally, we wanted to make it easier for users to “visualize” the data as they browsed. Since users were searching primarily on factors such as revenue and page views, we set about building a faceted search designed to allow click-by-click drill-down of the results, where the facets always show how many results you’ll get if you click on them. The facets would be displayed all the time, no matter what you were searching for. This presented some challenges, since now instead of executing a single query per search, we had to execute something like 20 queries.
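The facet counts themselves are simple to picture. In production each count came from its own Sphinx query (hence the ~20 queries per page); this in-memory sketch, with made-up listings and revenue buckets, shows only what each facet is computing:

```ruby
# In-memory sketch of facet counting: for each facet bucket, count how
# many listings a click on it would return. In production this was done
# with one Sphinx query per facet, not Ruby iteration over an array.
LISTINGS = [
  { revenue: 0,    page_views: 200 },
  { revenue: 150,  page_views: 5_000 },
  { revenue: 2500, page_views: 90_000 },
].freeze

REVENUE_FACETS = {
  "Under $100/mo"  => ->(l) { l[:revenue] < 100 },
  "$100 - $1,000"  => ->(l) { l[:revenue].between?(100, 999) },
  "Over $1,000/mo" => ->(l) { l[:revenue] >= 1000 },
}.freeze

def facet_counts(listings, facets)
  facets.transform_values { |test| listings.count(&test) }
end

facet_counts(LISTINGS, REVENUE_FACETS)
# => {"Under $100/mo"=>1, "$100 - $1,000"=>1, "Over $1,000/mo"=>1}
```

Displaying these counts next to every facet is what lets users see the size of a result set before they click, which was the missing piece in the second implementation.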
Like the previous implementation, we used Sphinx—albeit a newer version with support for realtime indexes and multi-queries, which is how the facets are able to execute efficiently. We also retained the idea of having our primary navigation hooked into our search system. This had worked well for us in the previous implementation. We ditched SphinxSE due to the complexity it added to our server infrastructure and the fact we wanted to use multi-queries in Sphinx, which would not work efficiently through MySQL. While we still use the search system for our primary navigation (which means you’ll always have facets down the side of the page), we stopped storing these in the database and simply have them formalized in code. This makes tweaking them simpler, since it’s a code edit, not a data migration. We also built a proper schema for saving searches, instead of being lazy and serializing objects to the database (the benefits of which probably do not require further explanation).
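A “proper schema” for saved searches might look something like the following: one row per criterion rather than one opaque blob per search. The struct and column names here are invented for illustration:

```ruby
# Sketch of a normalized saved-search schema: each criterion becomes a
# structured row (search_id, field, operator, value), so renaming a
# field is a single UPDATE rather than a blob-by-blob migration.
SavedCriterion = Struct.new(:search_id, :field, :operator, :value)

rows = [
  SavedCriterion.new(42, "revenue",    ">=", "500"),
  SavedCriterion.new(42, "page_views", ">=", "10000"),
]

# Rebuilding a saved search is now a plain query over typed columns.
def rebuild(rows, search_id)
  rows.select { |r| r.search_id == search_id }
      .map { |r| "#{r.field} #{r.operator} #{r.value}" }
end

rebuild(rows, 42)  # => ["revenue >= 500", "page_views >= 10000"]
```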
Since the primary complaint with our previous implementation, from the user experience perspective, was that it was too confusing to use, we spent a considerable amount of time assessing what options we were providing to users via our advanced search page. It was overly complicated and ambiguous in places. As a result, we decided to either remove search options entirely, or combine them together, thus greatly simplifying the UI for our users. I believe we also added new options at the same time, but the end result was still simpler. Part of this change, however, was designed to draw the focus away from the advanced search and more towards our pre-defined facets, which suit the needs of most casual users browsing the site.
The feedback we’ve had regarding our current search has been extremely positive. Many of our listings are at the low end of the scale, which many buyers are not interested in. Now buyers are able to quickly filter these out simply by clicking on the facets, directly from our primary navigation options. This is something we were aiming to achieve… we’re looking to encourage more quality listings, so making it easier for buyers to reach these listings and hide the 3-day-old WordPress blogs solves this.
All the code is custom-written in PHP (parts of our site are written in PHP, other parts are written in Ruby). We’ll likely be porting this code to Ruby at some point, though we need a Sphinx gem that supports the features we’re using from Sphinx 2, and Pat Allan’s Riddle gem doesn’t offer this just yet. We may end up writing this ourselves.
Some things we have built around our search
- From any of our primary navigation options, you may click “Advanced” at the top of the facets, to load the advanced search page with the inputs used to execute the search for that navigation option, either for inspection, or to modify them.
- We have a JSON search API, available only on request, used by third-parties who analyze our data for use on their own websites.
- Users can have the results of a search emailed to them on a daily basis. This simply loads the search from the database and executes it via a background job.
- Some smaller features, such as watching certain tags and sellers, use the search internally.
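The daily-email feature above can be sketched as a small background job. The class and collaborator names here are hypothetical, not Flippa’s actual code:

```ruby
# Hypothetical sketch of the daily saved-search alert: load each stored
# search, execute it against the search engine, and mail any results.
# `search_engine` and `mailer` are injected callables for illustration.
class DailySearchAlert
  def initialize(saved_searches, search_engine, mailer)
    @saved_searches = saved_searches
    @engine = search_engine
    @mailer = mailer
  end

  def run
    @saved_searches.each do |search|
      results = @engine.call(search)
      # Skip the email entirely when the search matched nothing today.
      @mailer.call(search[:user_email], results) unless results.empty?
    end
  end
end
```

Running this from a daily cron entry (or any job queue) keeps the email feature entirely decoupled from the web request cycle.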
Where to next?
We have some things on the agenda for future improvements to our search, though nothing quite as major as our previous iterations. There are some internal optimizations we can certainly make, such as having an effective caching strategy (though cache invalidation is hard). We also have some changes planned that focus on tailoring the search according to the region of the user, though I can’t go into details on this. All in all, we think we’re getting there!"
Thanks / Guest Posts
Many thanks to Chris! Please post feedback right here or send to Chris’ Twitter.
If you don’t have a blog but are interested in writing a guest post, roughly in these areas: Ruby, Framework Design, Search Design or similar, please contact me.