
Picky

In Detail

Huh? What's a semantic text search engine?

A semantic text search engine does not operate on huge blobs of text but on small, highly categorized pieces of text, for example varchar database fields.

If your data isn't well categorized (like the text of a book), you should choose a full-text search engine such as Sphinx or Solr (Lucene) instead.

Then why use it?

Full-text search engines are often misused: they are let loose on highly categorized (semantic) text.

Picky helps your user find data that a full-text search engine would bury in a heap of results, and it lets him do so with a single, Google-style search field.

Sure, the word "peter" occurs most often in document #7, but the user actually just wants documents by someone with the surname "Peter", not everything related to peters.

Picky's comfortable interface helps him refine his search until he gets exactly what he wants.
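To make the "peter" example concrete, here is a minimal Ruby sketch of the idea behind categorized search: one inverted index per category instead of a single index over all the text. It is a hypothetical illustration only; the class name `CategorizedIndex` and its `add`/`search` methods are made up for this example and are not Picky's API, and it leaves out everything a real engine needs beyond this, such as ranking.

```ruby
# Hypothetical illustration only: this is NOT Picky's API.
# Each category (e.g. :first_name, :surname) gets its own inverted index,
# mapping a lowercased word to the ids of the records containing it.
class CategorizedIndex
  def initialize(*categories)
    # The default proc creates an empty id list the first time a word is added.
    @indexes = categories.map { |c| [c, Hash.new { |h, k| h[k] = [] }] }.to_h
  end

  # Indexes every configured category of the record under the given id.
  def add(id, record)
    @indexes.each do |category, index|
      value = record[category]
      next unless value
      value.downcase.split.each { |word| index[word] << id }
    end
  end

  # Returns { category => ids } for every category in which the word occurs,
  # so the caller can tell a first name "Peter" apart from a surname "Peter".
  def search(word)
    word = word.downcase
    @indexes.each_with_object({}) do |(category, index), result|
      ids = index.fetch(word, []) # fetch does not trigger the default proc
      result[category] = ids unless ids.empty?
    end
  end
end

# Usage: three people, two of whom have "Peter" as a surname.
index = CategorizedIndex.new(:first_name, :surname)
index.add(1, { first_name: "Peter", surname: "Muller" })
index.add(2, { first_name: "Anna", surname: "Peter" })
index.add(3, { first_name: "Peter", surname: "Peter" })
index.search("peter") # => { first_name: [1, 3], surname: [2, 3] }
```

Because results come back grouped by category, a search interface can ask the user whether he meant the first name or the surname "Peter" instead of handing back one undifferentiated list.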

But why not use a full-text search engine?

Full-text search engines do one thing especially well: making full text (i.e. uncategorized heaps of it) searchable.

For small, highly categorized text, we simply need new ideas. Picky is one of them.

Ok, that was my elevator pitch ;)

See me show (it) off

Using a real telephone search as an example.

This was at the fantastic EuRuKo 2010 Conference in beautiful Krakow.

Why would one write a search engine in Ruby?

It's fast enough, and Ruby's high level of abstraction really helped me understand the engine as it evolved. Some parts are written in pedal-to-the-metal C code.

How does it perform?

This depends on many factors, but generally we recommend using Picky with at most 150 million data points, i.e. words (we have used it at that scale). It is probably at its best below 20 million. Your mileage may vary, of course, depending on how many partial indexes you use, etc.

See the use case in the enterprise section.

Indexing is not particularly fast, and I'd be glad if it were faster. In return, you get the full power of Ruby and fully customizable indexing.

Why the octopus?

Glad you asked. But first, read this Wikipedia entry about octopuses. Also, watch this movie. Finished? I think that sums it up pretty well. And it's cuuute, don't you think? :)

But don't call him that. He likes to be called "Octor the Destroyer".

Who wrote it?

Mainly me, Florian Hanke, but I also had excellent help from friends and coworkers.

Why the LGPL license?

I'd have preferred an MIT license. In the end it was a compromise between my former employer and me.

Roadmap

See the roadmap in the wiki.

Alternatives

There aren't many real Ruby search engines, just more or less elegant adapters for existing ones. I found two real ones:

Whistlepig by William Morgan. "Whistlepig is a minimalist real-time full-text search".

Ion by Rico Sta. Cruz, a Ruby search engine with a Redis backend.

Logos and all images are CC Attribution licensed to Florian Hanke.