This is the one page help document for Picky.
Search for things using your browser (use ⌘F).
Edit typos directly in the github page of a section using the edit button.
It's All Ruby. You'll never feel powerless. Look at your index data anytime.
Creating an example app to get you up and running fast, Servers or Clients.
Generating them:
More information on the applications:
How to integrate Picky in:
How data is cut into little pieces for the index and when searching.
How the data is stored and what you can do with Indexes.
Configuring an index:
How does data get into an index?
How is the data categorized?
How is the data prepared?
Getting at the data:
There are four different store types:
Advanced topics:
How to configure a search interface over an index (or multiple).
What options does a user have when searching?
Advanced topics:
When you need a slice over a category's data.
What a picky search returns.
We include a JavaScript library to make writing snazzy interfaces easier – see the options.
A bit of thanks!
Never forget this: Picky is all Ruby, all the time!
Even though we only describe examples of classic and Sinatra style servers, Picky can be included directly in Rails, as a client or server. Or in DRb. Or in your simple script without HTTP. Anywhere you like, as long as it's Ruby, really.
To drive the point home, remember that Picky is mainly two pieces working together: An index, and a search interface on indexes.
The index normally has a source, knows how to tokenize data, and has a few data categories. And the search interface normally knows how to tokenize incoming queries. That's it (copy and run in a script):
require 'picky'
Person = Struct.new :id, :first, :last
index = Picky::Index.new :people do
source { People.all }
indexing splits_text_on: /[\s-]/
category :first
category :last
end
index.add Person.new(1, 'Florian', 'Hanke')
index.add Person.new(2, 'Peter', 'Mayer-Miller')
people = Picky::Search.new index do
searching splits_text_on: /[\s,-]/
end
results = people.search 'Miller'
p results.ids # => [2]
You can put these pieces anywhere, independently.
Picky tries its best to be transparent so you can go have a look if something goes wrong. It wants you to never feel powerless.
All the indexes can be viewed in the /index directory of the project. They are waiting for you to inspect their JSONy goodness.
Should anything not work with your search, you can investigate how it is indexed by viewing the actual index files (remember, they are in readable JSON) and change your indexing parameters accordingly.
You can also log as much data as you want to help you improve your search application until it's working perfectly.
Picky offers a few generators to have a running server and client up in 5 minutes. So you can either get started right away or install the generators with
gem install picky-generators
and simply enter
picky generate
This will raise a Picky::Generators::NotFoundException and show you the possibilities.
The "All In One" Client/Server might be interesting for Heroku projects, as it is a bit complicated to set up two servers that interact with each other.
Currently, Picky offers two generated example projects that you can adapt to your project: Separate Client and Server (recommended) and All In One.
If this is your first time using Picky, we suggest starting out with these even if you already have a project where you want to integrate Picky.
The server is generated with
picky generate server target_directory
and generates a full Sinatra server that you can try immediately. Just follow the instructions.
All In One is actually a single Sinatra server containing the Server AND the client. This server is generated with
picky generate all_in_one target_directory
and generates a full Sinatra Picky server and client in one that you can try immediately. Just follow the instructions.
Picky currently offers an example Sinatra client that you can adapt to your project (or look at it to get a feeling for how to use Picky in Rails).
This client is generated with
picky generate client target_directory
and generates a full Sinatra Picky client (including Javascript etc.) that you can try immediately. Just follow the instructions.
Picky, from version 3.0 onwards, is designed to run anywhere, in anything. An octopus has eight legs, remember?
This means you can have a Picky server running in a DRb instance if you want to. Or in irb, for example.
We do run and test the Picky server in two styles, Classic and Sinatra.
But don't let that stop you from just using it in a class or just a script. This is a perfectly ok way to use Picky:
require 'picky'
include Picky # So we don't have to type Picky:: everywhere.
books_index = Index.new(:books) do
source Sources::CSV.new(:title, :author, file: 'library.csv')
category :title
category :author
end
books_index.index
books_index.reload
books = Search.new books_index do
boost [:title, :author] => +2
end
results = books.search "test"
results = books.search "alan turing"
require 'pp'
pp results.to_hash
More Ruby, more power to you!
A Sinatra server is usually just a single file. In Picky, it is a top-level file named
app.rb
We recommend using the modular Sinatra style as opposed to the classic style. It's possible to write a Picky server in the classic style, but using the modular style offers more options.
require 'sinatra/base'
require 'picky'

class BookSearch < Sinatra::Application
  # Index and Search live in the Picky namespace.
  books_index = Picky::Index.new(:books) do
    source { Book.order("isbn ASC") }
    category :title
    category :author
  end

  books = Picky::Search.new books_index do
    boost [:title, :author] => +2
  end

  # Returns the results for params[:query] as JSON.
  get '/books' do
    results = books.search params[:query],
                           params[:ids]    || 20,
                           params[:offset] || 0
    results.to_json
  end
end
This is already a complete Sinatra server.
The Sinatra Picky server uses the same routing as Sinatra (of course). More information on Sinatra routing.
If you use the server with the picky client software (provided with the picky-client gem), you should return JSON from the Sinatra get. Just call to_json on the returned results to get the results in JSON format.
get '/books' do
results = books.search params[:query], params[:ids] || 20, params[:offset] || 0
results.to_json
end
The above example search can be called using, for example, curl:
curl 'localhost:8080/books?query=test'
TODO Update this section.
This is one way to do it:
MyLogger = Logger.new "log/search.log"
# ...
get '/books' do
results = books.search "test"
MyLogger.info results
results.to_json
end
or set it up in separate files for different environments:
require "logging/#{PICKY_ENVIRONMENT}"
Note that this is not Rack logging, but Picky search engine logging. The resulting file can be used with the picky-statistics gem.
The All In One server is a Sinatra server and a Sinatra client rolled in one.
It's best to just generate one and look at it:
picky generate all_in_one all_in_one_test
and then follow the instructions.
When would you use an All In One server? One place is Heroku, since it is a bit more complicated to set up two servers that interact with each other.
It's nice for small convenient searches. For production setups we recommend using a separate server to make everything separately cacheable etc.
How do you integrate Picky in…?
There are two basic ways to integrate Picky in Rails: inside the Rails app itself, or as a separate search server.
The advantage of the first setup is that you don't need to manage an external server. However, having a separate search server is much cleaner: You don't need to load the indexes on Rails startup as you just leave the search server running separately.
If you just want a small search engine inside your Rails app, this is the way to go.
In config/initializers/picky.rb, add the following (lots of comments to help you):
# Set the Picky logger.
#
Picky.logger = Picky::Loggers::Silent.new
# Picky.logger = Picky::Loggers::Concise.new
# Picky.logger = Picky::Loggers::Verbose.new
# Set up an index and store it in a constant.
#
BooksIndex = Picky::Index.new :books do
# Our keys are usually integers.
#
key_format :to_i
# key_format :to_s # From eg. Redis they are strings.
# key_format ... (whatever method needs to be called on
# the id of what you are indexing)
# Some indexing options to start with.
# Please see: http://florianhanke.com/picky/documentation.html#tokenizing
# on what the options are.
#
indexing removes_characters: /[^a-z0-9\s\/\-\_\:\"\&\.]/i,
stopwords: /\b(and|the|of|it|in|for)\b/i,
splits_text_on: /[\s\/\-\_\:\"\&\/]/,
rejects_token_if: lambda { |token| token.size < 2 }
# Define categories on your data.
#
# They have a lot of options, see:
# http://florianhanke.com/picky/documentation.html#indexes-categories
#
category :title
category :subtitle
category :author
category :isbn,
:partial => Picky::Partial::None.new # Only full matches
end
# BookSearch is the search interface
# on the books index. More info here:
# http://florianhanke.com/picky/documentation.html#search
#
BookSearch = Picky::Search.new BooksIndex
# We are explicitly indexing the book data.
#
Book.all.each { |book| BooksIndex.add book }
That's already a nice setup. Whenever Rails starts up, this will add all books to the index.
If you have multiple indexes, call Picky::Indexes.index from anywhere to index them all.
Ok, this sets up the index and the indexing. What about the model?
In the model, here app/models/book.rb, add this:
# Two callbacks.
#
after_save :picky_index
after_destroy :picky_index
# Updates the Picky index.
#
def picky_index
if destroyed?
BooksIndex.remove id
else
BooksIndex.replace self
end
end
I actually recommend using after_commit, but it did not work at the time of writing.
Now, in the controller, you need to return some results to the user.
# GET /books/search
#
def search
results = BookSearch.search params[:query], params[:ids] || 20, params[:offset] || 0
# Render nicely as a partial.
#
results = results.to_hash
results.extend Picky::Convenience
results.populate_with Book do |book|
render_to_string :partial => "book", :object => book
end
respond_to do |format|
format.html do
render :text => "Book result ids: #{results.ids.to_s}"
end
format.json do
render :text => results.to_json
end
end
end
The first line executes the search using the query params. You can try this using curl:
curl http://127.0.0.1:4567/books/search?query=test
The next few lines use the results as a hash, and populate the results with data loaded from the database, rendering a book partial.
Then, we respond to HTML requests with a simple web page, or respond to JSON requests with the results rendered in JSON.
As you can see, you can do whatever you want with the results. You could use this in an API, or send simple text to the user, or...
TODO Using the Picky client JavaScript.
TODO
TODO Reloading indexes live
TODO Prepending the current user to filter
# Prepends the current user filter to
# the current query.
#
query = "user:#{current_user.id} #{params[:query]}"
TODO
TODO Also mention Padrino.
TODO
TODO
The indexing method in an Index describes how index data is handled.
The searching method in a Search describes how queries are handled.
This is where you use these options:
Picky::Index.new :books do
indexing options_hash_or_tokenizer
end
Search.new *indexes do
searching options_hash_or_tokenizer
end
Both take either an options hash, your hand-rolled tokenizer, or a Picky::Tokenizer instance initialized with the options hash.
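Picky by default goes through the following list, in order:
* substitutes_characters_with: A character substituter that responds to #substitute(text) #=> substituted text
* removes_characters: Regexp of characters to remove.
* stopwords: Regexp of stopwords to remove.
* splits_text_on: Regexp on where to split the text.
* removes_characters_after_splitting: Regexp of characters to remove after splitting.
* normalizes_words: [[/matching_regexp/, 'replace match \1']]
* max_words: How many words are passed into the core engine. Default: Infinity (Don't go there, ok?).
* rejects_token_if: ->(token){ token == 'hello' }
* case_sensitive: true or false, false is default.
* stems_with: A stemmer, i.e. an object that responds to stem(text) that returns stemmed text.
You pass the above options into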
Search.new *indexes do
searching options_hash
end
You can provide your own tokenizer:
Search.new books_index do
searching MyTokenizer.new
end
TODO Update what the tokenizer needs to return.
The tokenizer needs to respond to the method #tokenize(text), returning a Picky::Query::Tokens object. If you have an array of tokens, e.g. [:my, :nice, :tokens], you can pass it into Picky::Query::Tokens.process(my_tokens) to get the tokens and return these.
rake 'try[text,some_index,some_category]' (some_index, some_category optional) tells you how a given text is indexed.
A tokenizer needs to be programmed in a performance-efficient way if you want your search engine to be fast.
Even though you usually provide options (see below), you can provide your own:
Picky::Index.new :books do
indexing MyTokenizer.new
end
The tokenizer must respond to tokenize(text) and return [tokens, words], where tokens is an Array of processed tokens and words is an Array of words that represent the original words in the query (or as close as possible to the original words).
It is also possible to return [tokens], where tokens is the Array of processed query words. (Picky will then just use the tokens as words.)
A very simple tokenizer that just splits the input on commas:
class MyTokenizer
def tokenize text
tokens = text.split ','
[tokens]
end
end
MyTokenizer.new.tokenize "Hello, world!" # => [["Hello", " world!"]]
Picky::Index.new :books do
indexing MyTokenizer.new
end
The same can be achieved with this:
Picky::Index.new :books do
indexing splits_text_on: ','
end
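A slightly richer hand-rolled tokenizer could look like the following sketch. It assumes the [tokens, words] return format described above. The class name SimpleTokenizer and its rules (splitting on whitespace and dashes, dropping one-character words, downcasing) are illustrative choices, not part of Picky.

```ruby
# Hypothetical tokenizer; only the #tokenize contract comes from the text above.
class SimpleTokenizer
  # Returns [tokens, words]: the processed tokens plus the original words.
  def tokenize(text)
    words  = text.split(/[\s-]+/).reject { |word| word.size < 2 }
    tokens = words.map(&:downcase)
    [tokens, words]
  end
end

tokens, words = SimpleTokenizer.new.tokenize("Peter Mayer-Miller a")
p tokens # => ["peter", "mayer", "miller"]
p words  # => ["Peter", "Mayer", "Miller"]
```

Keeping the original words next to the processed tokens lets a caller show the user what was actually typed.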
Usually, you use the same options for indexing and searching:
tokenizer_options = { ... }
index = Picky::Index.new :example do
indexing tokenizer_options
end
Search.new index do
searching tokenizer_options
end
However, consider this example.
Let's say your data has lots of words in it that look like this: all-data-are-tokenized-by-dashes.
And people would search for them using spaces to keep words apart: searching for data.
In this case it's a good idea to split the data and the query differently.
Split the data on dashes, and queries on \s:
index = Picky::Index.new :example do
indexing splits_text_on: /-/
end
Search.new index do
searching splits_text_on: /\s/
end
The rule number one to remember when tokenizing is: Tokenized query text needs to match the text that is in the index.
So both the index and the query need to tokenize to the same string:
all-data-are-tokenized-by-dashes
=> ["all", "data", "are", "tokenized", "by", "dashes"]
searching for data
=> ["searching", "for", "data"]
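The effect of the two different split patterns can be checked with plain String#split, without Picky. This is only a sketch of the splitting step; the variable names are made up for illustration.

```ruby
# Data is split on dashes, the query on whitespace.
indexed = "all-data-are-tokenized-by-dashes".split(/-/)
queried = "searching for data".split(/\s/)

p indexed           # => ["all", "data", "are", "tokenized", "by", "dashes"]
p queried           # => ["searching", "for", "data"]

# Only tokens that look the same on both sides can match.
p queried & indexed # => ["data"]
```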
Either look in the /index directory (the "prepared" files are the tokenized data), or use Picky's try rake task:
$ rake try[test]
"test" is saved in the Picky::Indexes index as ["test"]
"test" as a search will be tokenized as ["test"]
You can tell Picky which index, or even category to use:
$ rake try[test,books]
$ rake try[test,books,title]
Indexes do three things:
Picky offers a choice of four index types:
This is how they look in code:
books_memory_index = Index.new :books do
# Configuration goes here.
end
books_redis_index = Index.new :books do
backend Backends::Redis.new
# Configuration goes here.
end
Both save the preprocessed data from the data source in the /index directory so you can go look if the data is preprocessed correctly.
Indexes are then used in a Search interface.
Searching over one index:
books = Search.new books_index
Searching over multiple indexes:
media = Search.new books_index, dvd_index, mp3_index
The resulting ids should be from the same id space to be useful – or the ids should be exclusive, such that eg. a book id does not collide with a dvd id.
The in-memory index saves its indexes transparently as JSON files that reside in the /index directory.
When the server is started, they are loaded into memory. As soon as the server is stopped, the indexes are deleted from memory.
Indexing regenerates the JSON index files and can be reloaded into memory, even in the running server (see below).
The Redis index saves its indexes in the Redis server on the default port, using database 15.
When the server is started, it connects to the Redis server and uses the indexes in the key-value store.
Indexing regenerates the indexes in the Redis server – you do not have to restart the server running Picky.
TODO
TODO
If you don't have access to your indexes directly, like so
books_index = Index.new(:books) do
# ...
end
books_index.do_something_with_the_index
and for example you'd like to access the index from a rake task, you can use
Picky::Indexes
to get all indexes.
To get a single index use
Picky::Indexes[:index_name]
and to get a single category of an index, use
Picky::Indexes[:index_name][:category_name]
That's it.
This is all you can do to configure an index:
books_index = Index.new :books do
source { Book.order("isbn ASC") }
indexing removes_characters: /[^a-z0-9\s\:\"\&\.\|]/i, # Default: nil
stopwords: /\b(and|the|or|on|of|in)\b/i, # Default: nil
splits_text_on: /[\s\/\-\_\:\"\&\/]/, # Default: /\s/
removes_characters_after_splitting: /[\.]/, # Default: nil
normalizes_words: [[/\$(\w+)/i, '\1 dollars']], # Default: nil
rejects_token_if: lambda { |token| token == :blurf }, # Default: nil
case_sensitive: true, # Default: false
substitutes_characters_with: Picky::CharacterSubstituters::WestEuropean.new, # Default: nil
stems_with: Lingua::Stemmer.new # Default: nil
category :id
category :title,
partial: Partial::Substring.new(:from => 1),
similarity: Similarity::DoubleMetaphone.new(2),
qualifiers: [:t, :title, :titre]
category :author,
partial: Partial::Substring.new(:from => -2)
category :year,
partial: Partial::None.new,
qualifiers: [:y, :year, :annee]
result_identifier 'boooookies'
end
Usually you won't need to configure all that.
But if your boss comes in the door and asks why X is not found… you know. And you can improve the search engine relatively quickly and painlessly.
More power to you.
Data sources define where the data for an index comes from. There are explicit data sources and implicit data sources.
Explicit data sources are mentioned in the index definition using the #source method.
You define them on an index:
Index.new :books do
source Book.all # Loads the data instantly.
end
Index.new :books do
source { Book.all } # Loads on indexing. Preferred.
end
Or even on a single category:
Index.new :books do
category :title,
source: lambda { Book.all }
end
TODO more explanation how index sources and single category sources might work together.
Explicit data sources must respond to #each; an Array, for example, works.
Picky supports any data source as long as it supports #each.
See under Flexible Sources how you can use this.
In short. Model:
class Monkey
attr_reader :id, :name, :color
def initialize id, name, color
@id, @name, @color = id, name, color
end
end
The data:
monkeys = [
Monkey.new(1, 'pete', 'red'),
Monkey.new(2, 'joey', 'green'),
Monkey.new(3, 'hans', 'blue')
]
Setting the array as a source
Index::Memory.new :monkeys do
source { monkeys }
category :name
category :couleur, :from => :color # The couleur category will take its data from the #color method.
end
If you define the source directly in the index block, it will be evaluated instantly:
Index::Memory.new :books do
source Book.order('title ASC')
end
This works with ActiveRecord and other similar ORMs since Book.order returns a proxy object that will only be evaluated when the server is indexing.
For example, this would instantly get the records, since #all is a kicker method:
Index::Memory.new :books do
source Book.all # Not the best idea.
end
In this case, it is better to give the source method a block:
Index::Memory.new :books do
source { Book.all }
end
This block will be executed as soon as the indexing is running, but not earlier.
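Why the block form defers loading can be seen in a small stand-alone sketch. TinyIndex is a hypothetical stand-in, not Picky's Index class; it only mimics the idea of accepting either data or a block and resolving it when indexing runs.

```ruby
# Hypothetical stand-in for an index; not Picky's implementation.
class TinyIndex
  def initialize(source = nil, &block)
    @source = block || source
  end

  # Resolves the source only now, when indexing actually runs.
  def index
    data = @source.respond_to?(:call) ? @source.call : @source
    data.each { |item| puts "indexing #{item}" }
  end
end

loads = 0
lazy  = TinyIndex.new { loads += 1; %w[book1 book2] }

p loads # => 0, nothing has been loaded yet
lazy.index
p loads # => 1, the data was loaded during indexing
```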
Implicit data sources are not mentioned in the index definition, but rather, the data is added (or removed) via realtime methods on an index, like #add, #<<, #unshift, #remove, #replace, and a special form, #replace_from.
So, you don't define them on an index or category as in the explicit data source, but instead add to either like so:
index = Index.new :books do
category :example
end
Book = Struct.new :id, :example
index.add Book.new(1, "Hello!")
index.add Book.new(2, "World!")
Or to a specific category:
index[:example].add Book.new(3, "Only add to a single category")
Currently, there are 6 methods to change an index:
* #add: Adds the thing to the end of the index (even if already there). index.add thing
* #<<: Adds the thing to the end of the index (shows up last in results). index << thing
* #unshift: Adds the thing to the beginning of the index (shows up first in results). index.unshift thing
* #remove: Removes the thing from the index (if there). index.remove thing
* #replace: Replaces the thing in the index (if there, otherwise like #add). Equal to #remove followed by #add. index.replace thing
* #replace_from: Pass in a Hash. Replaces the thing in the index (if there, otherwise like #add). Equal to #remove followed by #add. index.replace_from id: 1, example: "Hello, I am Hash!"
See Tokenizing for tokenizer options.
Categories – usually what other search engines call fields – define categorized data. For example, book data might have a title, an author and an isbn.
So you define that:
Index.new :books do
source { Book.order('author DESC') }
category :title
category :author
category :isbn
end
(The example assumes that a Book has readers for title, author, and isbn.)
This already works and a search will return categorized results. For example, a search for "Alan Tur" might categorize both words as author, but it might also at the same time categorize both as title. Or one as title and the other as author.
That's a great starting point. So how can I customize the categories?
The partial option defines if a word is also found when it is only partially entered. So, Picky will be found when typing Pic.
The default partial marker is *, so entering Pic* will force Pic to be looked for in the partial index.
The last word in a query is always partial, by default. If you want to force a non-partial search on the last query word, use " as in last query word would be "partial", but here partial would not be searched in the partial index.
By default, the partial marker is * and the non-partial marker is ". You change the markers by setting
Picky::Query::Token.partial_character = '\*'
Picky::Query::Token.no_partial_character = '"'
You define it like this:
category :some, partial: (some generator which generates partial words)
The Picky default is
category :some, partial: Picky::Partial::Substring.new(from: -3)
You get this one by defining no partial option:
category :some
The option Partial::Substring.new(from: 1) will make a word completely partially findable.
So the word Picky would be findable by entering Picky, Pick, Pic, Pi, or P.
If you don't want any partial finds to occur, use:
category :some, partial: Partial::None.new
There are four built-in partial options. All examples use "hello" as the token.
* Partial::None.new
Generates no partials, using * will use exact word matching.
* Partial::Postfix.new(from: startpos)
Generates all postfixes.
from: 1 # => ["hello", "hell", "hel", "he", "h"]
from: 4 # => ["hello", "hell"]
* Partial::Substring.new(from: startpos, to: endpos)
Generates substring partials. to: -1 is set by default.
from: 1 # => ["hello", "hell", "hel", "he", "h"]
from: 4 # => ["hello", "hell"]
from: 1, to: -2 # => ["hell", "hel", "he", "h"]
from: 4, to: -2 # => ["hell"]
* Partial::Infix.new(min: minlength, max: maxlength)
Generates infix partials. max: -1 is set by default.
min: 1 # => ["hello", "hell", "ello", "hel", "ell", "llo", "he", "el", "ll", "lo", "h", "e", "l", "l", "o"]
min: 4 # => ["hello", "hell", "ello"]
min: 1, max: -2 # => ["hell", "ello", "hel", "ell", "llo", "he", "el", "ll", "lo", "h", "e", "l", "l", "o"]
min: 4, max: -2 # => ["hell", "ello"]
The general rule is: The more tokens are generated from a token, the larger your index will be. Ask yourself whether you really need an infix partial index.
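The example lists for "hello" can be reproduced in plain Ruby. The following sketch is not Picky's implementation: the method names substring_partials and infix_partials are made up for illustration, and only positive from and min values are handled.

```ruby
# Prefixes of the token, longest first, down to length `from`.
# A negative `to` first cuts characters off the end (-1 keeps everything).
def substring_partials(token, from: 1, to: -1)
  base = token[0..to]
  base.length.downto(from).map { |length| base[0, length] }
end

# All substrings between `min` and the maximum length, longest first.
# A negative `max` counts from the full length (-1 means the whole token).
def infix_partials(token, min: 1, max: -1)
  longest = max < 0 ? token.length + max + 1 : max
  longest.downto(min).flat_map do |length|
    token.chars.each_cons(length).map(&:join)
  end
end

p substring_partials("hello", from: 4)         # => ["hello", "hell"]
p substring_partials("hello", from: 1, to: -2) # => ["hell", "hel", "he", "h"]
p infix_partials("hello", min: 4)              # => ["hello", "hell", "ello"]
p infix_partials("hello", min: 1).size         # => 15
```

The last line makes the index-size warning concrete: a five-letter token already yields 15 infix entries, compared to 5 for substring partials.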
You can also pass in your own partial generators. How?
Implement an object which has a single method #each_partial(token, &block)
. That method should yield all partials for a given token. Want to implement a (probably useless) random partial search? No problem.
Example:
You need an alphabetic index search. If somebody searches for a name, it should only be found if typed as a whole. But you'd also like to find it when just entering a, for Andy, Albert, etc.
class AlphabeticIndexPartial
def each_partial token, &block
[token[0], token].each &block
end
end
This will result in "A" and "Andy" being in the index for "Andy".
Pretty straightforward, right?
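A generator like this can be checked by hand before it is plugged into a category. The sketch below repeats the class from above and simply collects what #each_partial yields; the partials array is only there for the demonstration.

```ruby
class AlphabeticIndexPartial
  # Yields the first letter and the whole token.
  def each_partial(token, &block)
    [token[0], token].each(&block)
  end
end

partials = []
AlphabeticIndexPartial.new.each_partial("Andy") { |partial| partials << partial }
p partials # => ["A", "Andy"]
```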
The weight option defines how strongly a word is weighed. By default, Picky rates a word according to the logarithm of its occurrence. This means that a word that occurs more often will be weighed slightly higher.
You define a weight option like this:
category :some, weight: MyWeights.new
The default is Weights::Logarithmic.new.
You can also pass in your own weight generators. See this article to learn more.
If you don't want Picky to calculate weights for your indexed entries, you can use constant or dynamic weights.
With 0.0 as a constant weight:
category :some, weight: Weights::Constant.new # Returns 0.0 for all results.
With 3.14 as a constant weight:
category :some, weight: Weights::Constant.new(3.14) # Returns 3.14 for all results.
Or with a dynamically calculated weight:
Weights::Dynamic.new do |str_or_sym|
  str_or_sym.length # Uses the length of the string or symbol as weight.
end
You almost never need to define weights. More often than not, you can fiddle with boosting combinations of categories, via the boost method in searches.
Usually it is preferable to boost specific search results, say "florian hanke" mapped to [:first_name, :last_name], but sometimes you want a specific category boosted wherever it occurs.
For example, the title in a movie search engine would need to be boosted in all searches it occurs. Do this:
category :title, weight: Weights::Logarithmic.new(+1)
This adds +1 to all weights. Why the logarithmic? By default, Picky weighs categories using the logarithm of occurrences. So the default would be:
category :title, weight: Weights::Logarithmic.new # The default.
The Logarithmic initializer accepts a constant to be added to the result. Adding the constant +1 is like multiplying the number of occurrences by Math::E (e is Euler's number), since log(n) + 1 = log(e * n). If you don't understand, don't worry, just know that by adding a constant you multiply by a certain value.
In short:
* Use weight on the index, if you need a category to be boosted everywhere, wherever it occurs
* Use boosting if you need to boost specific combinations of categories only for a specific search.
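The relationship between an added constant and a multiplication can be checked numerically. This sketch assumes the weight is the natural logarithm of the number of occurrences plus the constant, as described above; the occurrence count of 10 is an arbitrary example.

```ruby
occurrences = 10

default_weight = Math.log(occurrences)           # plain logarithmic weight
boosted_weight = Math.log(occurrences) + 1       # with the constant +1 added
scaled_weight  = Math.log(occurrences * Math::E) # e times as many occurrences

# Adding 1 gives the same weight as e times as many occurrences.
same = (boosted_weight - scaled_weight).abs < 1e-12
p same                                  # => true
p boosted_weight - default_weight       # the boost itself, 1.0 up to rounding
```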
The similarity option defines if a word is also found when it is typed wrong, or close to another word. So, "Picky" might already be found when typing "Pocky~" (Picky will search for similar words when you use the tilde, ~).
You define a similarity option like this:
category :some, similarity: Similarity::None.new
(This is also the default)
There are several built-in similarity options, like
category :some, similarity: Similarity::Soundex.new
category :this, similarity: Similarity::Metaphone.new
category :that, similarity: Similarity::DoubleMetaphone.new
You can also pass in your own similarity generators. See this article to learn more.
Usually, when you search for title:wizard you will only find books with "wizard" in their title.
Maybe your client would like to be able to only enter t:wizard. In that case you would use this option:
category :some, qualifier: "t"
Or if you'd like more to match:
category :some,
qualifiers: ["t", "title", "titulo"]
(This matches "t", "title", and also the Italian "titulo")
Picky will warn you if on one index the qualifiers are ambiguous (Picky will assume that the last "t" for example is the one you want to use).
This means that:
category :some, qualifier: "t"
category :other, qualifier: "t"
Picky will assume that if you enter t:bla, you want to search in the other category.
Searching in multiple categories can also be done. If you have:
category :some, :qualifier => 's'
category :other, :qualifier => 'o'
Then searching with s,o:bla will search for bla in both :some and :other. Neat, eh?
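How a qualified query word breaks down can be sketched in plain Ruby. This is not Picky's parser: parse_qualified and the mapping Hash are hypothetical and only illustrate the qualifier:text form with comma-separated qualifiers.

```ruby
# Splits a query word like "s,o:bla" into the categories to search and the text.
def parse_qualified(token, mapping)
  qualifiers, text = token.split(":", 2)
  # Without a qualifier, every category is searched.
  return [mapping.values, qualifiers] if text.nil?
  # Raises KeyError for a qualifier that no category defines.
  categories = qualifiers.split(",").map { |qualifier| mapping.fetch(qualifier) }
  [categories, text]
end

# Hypothetical mapping matching the two categories above.
mapping = { "s" => :some, "o" => :other }

p parse_qualified("s,o:bla", mapping) # => [[:some, :other], "bla"]
p parse_qualified("o:bla", mapping)   # => [[:other], "bla"]
p parse_qualified("bla", mapping)     # => [[:some, :other], "bla"]
```

Because the mapping is a Hash, a qualifier defined twice keeps only its last category, which mirrors the "last one wins" behaviour described for ambiguous qualifiers.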
Usually, the categories will take their data from the reader or field that is the same as their name.
Sometimes though, the model does not have the right names. Say you have an Italian book model, Libro. But you still want to use English category names.
Index.new :books do
source { Libro.order('autore DESC') }
category :title, :from => :titulo
category :author, :from => :autore
category :isbn
end
You can also populate the index at runtime (e.g. with index.add) using a lambda. The required argument inside the lambda is the object being added to the index.
Index.new :books do
category :authors, :from => lambda { |book| book.authors.map(&:name) }
end
You will almost never need to use this, as the key format will usually be the same for all categories, in which case you would define it on the index.
But if you need to, use it as you would on the index.
Index.new "books" do
category :title,
:key_format => :to_s
end
You will almost never need to use this, as the source will usually be the same for all categories, in which case you would define it on the index.
But if you need to, use it as you would on the index.
Index.new :books do
category :title,
source: some_source
end
Set this option to false when you give Picky already tokenized data (an Array, or generally an Enumerator).
Index.new :people do
category :names, tokenize: false
end
And Person has a method #names which returns this array:
class Person
def names
['estaban', 'julio', 'ricardo', 'montoya', 'larosa', 'ramirez']
end
end
Then Picky will simply use the tokens in that array without (pre-)processing them. Of course, this means you need to do all the tokenizing work. If you leave the tokens in uppercase formatting, then nothing will be found, unless you set the Search to be case-sensitive, for example.
Users can use some special features when searching. They are:
* something* (By default, the last word is implicitly partial)
* "something" (The quotes make the query on this word explicitly non-partial)
* something~ (The tilde makes this word eligible for similarity search)
* title:something (Picky will only search in the category designated as title, in each index of the search)
* title,author:something (Picky will search in title and author categories, in each index of the search)
* year:1999…2012 (Picky will search all values in a Ruby Range: (1999..2012))
These options can be combined (e.g. title,author:funky~"): This will try to find similar words to funky (like "fonky"), but no partials of them (like "fonk"), in both title and author.
Non-partial will win over partial, if you use both, as in test*".
Also note that these options need to make it through the tokenizing, so don't remove any of *":,-. TODO unclear
By default, the indexed data points to keys that are integers, or differently said, are formatted using to_i.
If you are indexing keys that are strings, use to_s – good examples are MongoDB BSON keys, or UUID keys.
The key_format method lets you define the format:
Index.new :books do
key_format :to_s
end
The Picky::Sources already set this correctly. However, if you use an #each source that supplies Picky with symbol ids, you should tell it what format the keys are in, e.g. key_format :to_s.
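Why the key format matters for string keys is easy to see in plain Ruby. The sketch assumes that the key format is simply the name of a method called on each id; format_key and the example keys are hypothetical.

```ruby
# Applies a key format by calling the named method on the id.
def format_key(id, key_format)
  id.public_send(key_format)
end

p format_key("42", :to_i)        # => 42
p format_key("ab12-cd34", :to_i) # => 0, the string key is lost
p format_key("ab12-cd34", :to_s) # => "ab12-cd34"
```

With to_i, every key that does not start with digits collapses to 0, so distinct records would become indistinguishable.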
By default, an index is identified by its name in the results. This index is identified by :books:
Index.new :books do
# ...
end
This index is identified by media in the results:
Index.new :books do
# ...
result_identifier 'media'
end
You still refer to it as :books in e.g. Rake tasks, Picky::Indexes[:books].reload. The result_identifier option is just for the results.
Indexing can be done programmatically, at any time. Even while the server is running.
Indexing all indexes is done with
Picky::Indexes.index
Indexing a single index can be done either with
Picky::Indexes[:index_name].index
or
index_instance.index
Indexing a single category of an index can be done either with
Picky::Indexes[:index_name][:category_name].index
or
category_instance.index
Loading (or reloading) your indexes in a running application is possible.
Loading all indexes is done with
Picky::Indexes.load
Loading a single index can be done either with
Picky::Indexes[:index_name].load
or
index_instance.load
Loading a single category of an index can be done either with
Picky::Indexes[:index_name][:category_name].load
or
category_instance.load
To communicate with your server using signals:
books_index = Index.new(:books) do
# ...
end
Signal.trap("USR1") do
books_index.reindex
end
This reindexes the books_index when you call
kill -USR1 <server_process_id>
You can refer to the index like so if you want to define the trap somewhere else:
Signal.trap("USR1") do
Picky::Indexes[:books].reindex
end
Reindexing your indexes is just indexing followed by reloading (see above).
Reindexing all indexes is done with
Picky::Indexes.reindex
Reindexing a single index can be done either with
Picky::Indexes[:index_name].reindex
or
index_instance.reindex
Reindexing a single category of an index can be done either with
Picky::Indexes[:index_name][:category_name].reindex
or
category_instance.reindex
Picky offers a Search
interface for the indexes. You instantiate it as follows:
Just searching over one index:
books = Search.new books_index # searching over one index
Searching over multiple indexes:
media = Search.new books_index, dvd_index, mp3_index
Such an instance can then search over all its indexes and return a Picky::Results
object:
results = media.search "query", # the query text
20, # number of ids
0 # offset (for pagination)
Please see the part about Results to know more about that.
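The second and third arguments are what you need for pagination. A minimal sketch, assuming a 1-based page number coming from your interface; page_arguments is a hypothetical helper, not part of Picky:

```ruby
# Hypothetical helper (not part of Picky): turns a 1-based page number
# into the [number of ids, offset] pair described above.
def page_arguments(page, per_page = 20)
  [per_page, (page - 1) * per_page]
end

ids, offset = page_arguments(3)

# With a Search instance called media, you would then call:
#   results = media.search "query", ids, offset
puts "ids: #{ids}, offset: #{offset}"
```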
You use a block to set search options:
media = Search.new books_index, dvd_index, mp3_index do
searching tokenizer_options_or_tokenizer
boost [:title, :author] => +2,
[:author, :title] => -1
end
See Tokenizing for tokenizer options.
The boost
option defines what combinations to boost.
This is unlike boosting in most other search engines, where you can only boost a given field. I've found it much more useful to boost combinations.
For example, you have an index of addresses. The usual case is that someone is looking for a street and a number. So if Picky encounters that combination (in that order), it should promote the results containing that combination to a more prominent spot. On the other hand, if Picky encounters a street number followed by a street name, which is unlikely to be a search for an address (where I come from), you might want to demote that result.
So let's boost street, streetnumber
, while at the same time deboosting streetnumber, street
:
addresses = Picky::Search.new address_index do
boost [:street, :streetnumber] => +2,
[:streetnumber, :street] => -1
end
If you still want to boost a single category, check out the category weight option. For example:
Picky::Index.new :addresses do
category :street, weight: Picky::Weights::Logarithmic.new(+4)
category :streetnumber
end
This boosts the weight of the street category for all searches using the index with this category. So whenever the street category is found in results, it will boost these.
Picky combines consecutive categories in searches for boosting. So if you search for "star wars empire strikes back", when you defined [:title] => +1
, then that boosting is applied.
Why? In earlier versions of Picky we found that boosting specific combinations is less useful than boosting a specific order of categories.
Let me give you an example from a movie search engine. Instead of having to say boost [:title] => +1, [:title, :title] => +1, [:title, :title, :title] => +1
, it is far more useful to say "If you find any number of title words in a row, boost it". So, when searching for "star wars empire strikes back 1980", it is less important that the query contains 5 title words than that it contains a title followed by a release year. So in this particular case, a boost defined by [:title, :release_year] => +3
would be applied.
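The combining of consecutive categories can be pictured in a few lines of plain Ruby. This is only a sketch of the idea, not how Picky does it internally; the collapse helper and the boosts hash are made up for illustration.

```ruby
# A sketch of the idea only, not Picky's actual implementation.
boosts = {
  [:title]                => 1,
  [:title, :release_year] => 3
}

# Collapse each run of the same category into a single entry.
def collapse(categories)
  categories.chunk_while { |a, b| a == b }.map(&:first)
end

# Five title words followed by a release year.
allocation = [:title, :title, :title, :title, :title, :release_year]

collapsed = collapse(allocation)
boost     = boosts.fetch(collapsed, 0) # 0 if no boost is defined

puts "#{collapsed.inspect} is boosted by #{boost}"
```

Any number of title words in a row ends up as a single :title, so one boost definition covers them all.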
There's a full blog post devoted to this topic.
In short, an ignore :name
option makes the Search throw away (ignore) any tokens (words) that map to category name
.
Let's say we have a search defined:
names = Picky::Search.new name_index do
ignore :first_name
end
Now, if Picky finds the tokens "florian hanke" in both :first_name, :last_name
and :last_name, :last_name
, then it will throw away the solutions for :first_name
("florian" will be thrown away) leaving only "hanke", since that is a last name. The [:last_name, :last_name]
combinations will be left alone – ie. if "florian" and "hanke" are both found in last_name
.
The ignore
option also takes arrays. If you give it an array, it will throw away all solutions where that order of categories occurs.
Let's say you want to throw away results where last name is found before first name, because your search form is in order: [first_name last_name]
.
names = Picky::Search.new name_index do
ignore [:last_name, :first_name]
end
So if somebody searches for "peter paul han" (each a last name as well as a first name), and Picky finds the following combinations:
[:first_name, :first_name, :first_name]
[:last_name, :first_name, :last_name]
[:first_name, :last_name, :first_name]
[:last_name, :first_name, :first_name]
[:last_name, :last_name, :first_name]
then the combinations
[:last_name, :first_name, :first_name]
[:last_name, :last_name, :first_name]
will be thrown away, since they are in the order [:last_name, :first_name]
. Note that [:last_name, :first_name, :last_name]
is not thrown away since it is last-first-last.
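Here is one reading of that rule in plain Ruby that fits the example: runs of the same category are collapsed, and a combination is thrown away if what remains equals the ignored order. It is a toy model under that assumption, not Picky's internals, and ignored? is a made-up helper.

```ruby
# One reading of the rule that fits the example; not Picky's internals.
# Runs of the same category are collapsed first. A combination is thrown
# away if the collapsed form equals the ignored order.
def ignored?(combination, ignored_order)
  combination.chunk_while { |a, b| a == b }.map(&:first) == ignored_order
end

f = :first_name
l = :last_name

combinations = [
  [f, f, f],
  [l, f, l],
  [f, l, f],
  [l, f, f],
  [l, l, f]
]

kept = combinations.reject { |c| ignored?(c, [l, f]) }
kept.each { |c| p c }
```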
This is the opposite of the ignore
option above.
Almost. The only
option only takes arrays. If you give it an array, it will keep only solutions where that order of categories occurs.
Let's say you want to keep only results where first name is found before last name, because your search form is in order: [first_name last_name]
.
names = Picky::Search.new name_index do
only [:first_name, :last_name]
end
So if somebody searches for "peter paul han" (each a last name as well as a first name), and Picky finds the following combinations:
[:first_name, :first_name, :last_name]
[:last_name, :first_name, :last_name]
[:first_name, :last_name, :first_name]
[:last_name, :first_name, :first_name]
[:last_name, :last_name, :first_name]
then only the combination
[:first_name, :first_name, :last_name]
will be kept, since it is the only one that consists of first names followed by last names, in that order.
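In plain Ruby, the example could be modelled like this. It is a toy rule that fits the combinations listed, not Picky's internals; matches_order? is a made-up helper that collapses runs of the same category before comparing.

```ruby
# A toy rule that fits the example; not Picky's internals.
def matches_order?(combination, order)
  combination.chunk_while { |a, b| a == b }.map(&:first) == order
end

f = :first_name
l = :last_name

combinations = [
  [f, f, l],
  [l, f, l],
  [f, l, f],
  [l, f, f],
  [l, l, f]
]

# Split into what `only [:first_name, :last_name]` would keep and the rest.
kept, thrown_away = combinations.partition { |c| matches_order?(c, [f, l]) }

p kept
```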
There's a full blog post devoted to this topic.
In short, the ignore_unassigned_tokens true/false
option makes Picky be very lenient with your queries. Usually, if one of the search words is not found, say in a query "aston martin cockadoodledoo", Picky will return an empty result set, because "cockadoodledoo" is not in any index, in a car search, for example.
By ignoring the "cockadoodledoo" that can't be assigned sensibly, you will still get results.
This could be used in a search for advertisements that are shown next to the results.
If you've defined an ads search like so:
ads_search = Search.new cars_index do
ignore_unassigned_tokens true
end
then even if Picky does not find anything for "aston martin cockadoodledoo", it will find an ad, simply ignoring the unassigned token.
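A toy model of the difference, in plain Ruby. The word list and the assignable_tokens helper are made up, and a real Picky index is of course more than a set of words.

```ruby
require 'set'

# Toy model, not Picky code: the "index" is just a set of known words.
KNOWN_WORDS = Set['aston', 'martin', 'db9']

def assignable_tokens(query, ignore_unassigned_tokens:)
  tokens  = query.split
  unknown = tokens.reject { |token| KNOWN_WORDS.include?(token) }
  return tokens if unknown.empty?

  # Strict: one unassignable token means no results at all.
  # Lenient: drop it and search with what remains.
  ignore_unassigned_tokens ? tokens - unknown : []
end

query = 'aston martin cockadoodledoo'
p assignable_tokens(query, ignore_unassigned_tokens: false)
p assignable_tokens(query, ignore_unassigned_tokens: true)
```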
The max_allocations(integer)
option cuts off calculation of allocations.
What does this mean? Say you have code like:
phone_search = Search.new phonebook do
max_allocations 1
end
And someone searches for "peter thomas".
Picky then generates all possible allocations and sorts them.
It might get
[first_name, last_name]
[last_name, first_name]
[first_name, first_name]
with the first allocation being the most probable one.
So, with max_allocations 1
it will only use the topmost one and throw away all the others.
It will only go through the first one and calculate only results for that one. This can be used to speed up Picky in case of exploding amounts of allocations.
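Sketched in plain Ruby, with made-up scores (Picky calculates the real ones), the cut-off looks roughly like this:

```ruby
# Sketch with made-up scores; not Picky's implementation.
allocations = [
  { categories: [:last_name, :first_name],  score: 2.1 },
  { categories: [:first_name, :last_name],  score: 5.3 },
  { categories: [:first_name, :first_name], score: 0.7 }
]

max_allocations = 1

# Sort the most probable allocation to the front, then cut off the rest.
used = allocations.sort_by { |allocation| -allocation[:score] }
                  .first(max_allocations)

p used.map { |allocation| allocation[:categories] }
```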
The terminate_early(integer)
or terminate_early(with_extra_allocations: integer)
option stops Picky from calculating all ids of all allocations.
However, this will also return a wrong total.
So, important note: Only use when you don't display a total. Or you want to fool your users (not recommended).
Examples:
Stop as soon as you have calculated enough ids for the allocation.
phone_search = Search.new phonebook do
terminate_early # The default uses 0.
end
Stop as soon as you have calculated enough ids for the allocation, and then calculate 3 allocations more (for example, to show to the user).
phone_search = Search.new phonebook do
terminate_early 3
end
There's also a hash form to be more explicit. So the next coder knows what it does. (However, us cool Picky hackers know ;) )
phone_search = Search.new phonebook do
terminate_early with_extra_allocations: 5
end
This option speeds up Picky if you don't need a correct total.
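Why is the total wrong? A toy model in plain Ruby shows it. I'm assuming here that "enough ids" means the number of ids requested; calculate is a made-up method, not Picky's implementation.

```ruby
# Toy model, not Picky's implementation. Each allocation is an array of ids,
# already sorted with the most probable allocation first.
def calculate(allocations, ids_needed, extra_allocations: nil)
  calculated  = []
  extras_left = extra_allocations

  allocations.each do |ids|
    calculated.concat ids
    next if extra_allocations.nil?        # never terminate early
    next if calculated.size < ids_needed  # not enough ids yet
    break if extras_left.zero?            # enough ids, no extras left
    extras_left -= 1
  end

  { ids: calculated.first(ids_needed), total: calculated.size }
end

allocations = [[1, 2, 3], [4, 5], [6], [7, 8]]

full    = calculate(allocations, 3)
early   = calculate(allocations, 3, extra_allocations: 0)
early_1 = calculate(allocations, 3, extra_allocations: 1)

p full, early, early_1
```

The ids are the same in all three cases, but the total only counts what was actually calculated.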
Results are returned by the Search
instance.
books = Search.new books_index do
searching splits_text_on: /[\s,]/
boost [:title, :author] => +2
end
results = books.search "test"
p results # Returns results in log form.
p results.to_hash # Returns results as a hash.
p results.to_json # Returns results as JSON.
If no sorting is defined, Picky results will be sorted in the order of the data provided by the data source.
However, you can sort the results any way you want.
You can define an arbitrary sorting on results by calling Results#sort_by
.
It takes a block with a single parameter: The stored id of a result item.
This example looks up a result item via id and then takes the priority of the item to sort the results.
results.sort_by { |id| MyResultItemsHash[id].priority }
The results are only sorted within their allocation.
If you, for example, searched for Peter
, and Picky allocated results in first_name
and last_name
, then each allocation's results would be sorted.
Picky is optimized: it only sorts results which are actually visible. So if Picky looks for the first 20 results, and the first allocation already has more than 20 results in it (say, 100), then it will only sort the 100 results of the first allocation. It will still calculate all other allocations, but not sort them.
If you do not call Results#sort_by, then sorting incurs no costs.
sort_hash = {
1 => 10, # important
2 => 100 # not so important
}
results.sort_by { |id| sort_hash[id] }
Note that in Ruby, a lower value => more to the front (the higher up in Picky).
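To picture what "only sorted within their allocation" means, here is a toy model in plain Ruby. The allocations hash and the priorities are made up; a real Picky::Results object looks different.

```ruby
# Toy model, not Picky code: ids grouped by allocation.
allocations = {
  [:first_name] => [3, 1, 2],
  [:last_name]  => [5, 4]
}

# Made-up priorities; a lower value sorts further to the front.
priority = { 1 => 10, 2 => 100, 3 => 50, 4 => 1, 5 => 7 }

# Each allocation's ids are sorted on their own.
# The order of the allocations themselves is left untouched.
sorted = allocations.transform_values do |ids|
  ids.sort_by { |id| priority[id] }
end

p sorted
```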
TODO Update with latest logging style and ideas on how to separately log searches.
Picky results can be logged wherever you want.
A Picky Sinatra server logs whatever to wherever you want:
MyLogger = Logger.new "log/search.log"
# ...
get '/books' do
results = books.search "test"
MyLogger.info results
results.to_json
end
or set it up in separate files for different environments:
require "logging/#{PICKY_ENVIRONMENT}"
A Picky classic server logs to the logger defined with the Picky.logger=
writer.
Set it up in a separate logging.rb
file (or directly in the app/application.rb
file).
Picky.logger = Picky::Loggers::Concise.new STDOUT
and the Picky classic server will log the results into it, if it is defined.
Why in a separate file? So that you can have different logging for different environments.
More power to you.
Here's the Wikipedia entry on facets. I fell asleep after about 5 words. Twice.
In Picky, categories are explicit slices over your index data. Picky facets are implicit slices over your category data.
What does "implicit" mean here?
It means that you didn't explicitly say, "My data is shoes, and I have these four brands: Nike, Adidas, Puma, and Vibram".
No, instead you told Picky that your data is shoes, and there is a category "brand". Let's make this simple:
index = Picky::Index.new :shoes do
category :brand
category :name
category :type
end
index.add Shoe.new(1, 'nike', 'zoom', 'sports')
index.add Shoe.new(2, 'adidas', 'speed', 'sports')
index.add Shoe.new(3, 'nike', 'barefoot', 'casual')
With this data in mind, let's look at the possibilities:
Index facets are very straightforward.
You ask the index for facets and it will give you all the facets it has and how many results there are within:
index.facets :brand # => { 'nike' => 2, 'adidas' => 1 }
The category type is a good candidate for facets, too:
index.facets :type # => { 'sports' => 2, 'casual' => 1 }
What are the options?
at_least
: index.facets :brand, at_least: 2 # => { 'nike' => 2 }
counts
: index.facets :brand, counts: false # => ['nike', 'adidas']
index.facets :brand, at_least: 2, counts: false # => ['nike']
at_least
only gives you facets which occur at least n times and counts
tells the facets method whether you want counts with the facets or not. If counts are omitted, you'll get an Array
of facets instead of a Hash
.
Pretty straightforward, right?
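If you'd like to convince yourself of what the options return without firing up Picky, this plain-Ruby toy does the same counting over the shoe data. The facets method here is a made-up stand-in and needs Ruby 2.7 or later for tally; it is not how Picky calculates facets.

```ruby
# Toy model, not Picky code: facets as counts over one attribute.
shoe  = Struct.new(:id, :brand, :name, :type)
shoes = [
  shoe.new(1, 'nike',   'zoom',     'sports'),
  shoe.new(2, 'adidas', 'speed',    'sports'),
  shoe.new(3, 'nike',   'barefoot', 'casual')
]

def facets(items, category, at_least: 1, counts: true)
  tally = items.map(&category).tally.select { |_, count| count >= at_least }
  counts ? tally : tally.keys
end

p facets(shoes, :brand)
p facets(shoes, :brand, at_least: 2)
p facets(shoes, :brand, counts: false)
```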
Search facets are quite similar:
Search facets work similarly to index facets. In fact, you can use them in the same way:
search_interface.facets :brand # => { 'nike' => 2, 'adidas' => 1 }
search_interface.facets :type # => { 'sports' => 2, 'casual' => 1 }
search_interface.facets :brand, at_least: 2 # => { 'nike' => 2 }
search_interface.facets :brand, counts: false # => ['nike', 'adidas']
search_interface.facets :brand, at_least: 2, counts: false # => ['nike']
However search facets are more powerful, as you can also filter the facets with a filter query option:
shoes.facets :brand, filter: 'some filter query'
What does that mean?
Usually you want to use multiple facets in your interface. For example, a customer might already have filtered results by type "sports" because they are only interested in sports shoes. Now you'd like to show them the remaining brands, so that they can filter on the remaining facets.
How do you do this?
Let's say we have an index as above, and a search interface to the index:
shoes = Picky::Search.new index
If the customer has already filtered for sports, you simply pass the query to the filter
option:
shoes.facets :brand, filter: 'type:sports' # => { 'nike' => 1, 'adidas' => 1 }
This will give you only 1 "nike" facet. If the customer filtered for "casual":
shoes.facets :brand, filter: 'type:casual' # => { 'nike' => 1 }
then we'd only get the casual nike facet (from that one "barefoot" shoe picky loves so much).
As said, filtering works like the query string passed to picky. So if the customer has filtered for brand "nike" and type "sports", you'd get:
shoes.facets :brand, filter: 'brand:nike type:sports' # => { 'nike' => 1 }
shoes.facets :name, filter: 'brand:nike type:sports' # => { 'zoom' => 1 }
Playing with it is fun :)
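The filtering can be modelled in plain Ruby as "narrow down first, then count". This toy only understands exact category:value pairs separated by spaces, which is far less than a real Picky query; filtered_facets is a made-up method and needs Ruby 2.7 or later for tally.

```ruby
# Toy model, not Picky code: filter the items, then count one attribute.
shoe  = Struct.new(:id, :brand, :name, :type)
shoes = [
  shoe.new(1, 'nike',   'zoom',     'sports'),
  shoe.new(2, 'adidas', 'speed',    'sports'),
  shoe.new(3, 'nike',   'barefoot', 'casual')
]

def filtered_facets(items, category, filter: '')
  # 'brand:nike type:sports' => [['brand', 'nike'], ['type', 'sports']]
  conditions = filter.split.map { |part| part.split(':', 2) }

  matching = items.select do |item|
    conditions.all? { |name, value| item[name] == value }
  end

  matching.map(&category).tally
end

p filtered_facets(shoes, :brand, filter: 'type:sports')
p filtered_facets(shoes, :name,  filter: 'brand:nike type:sports')
```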
See below for testing and performance tips.
Let's say we have an index with some data:
index = Picky::Index.new :people do
category :name
category :surname
end
person = Struct.new :id, :name, :surname
index.add person.new(1, 'tom', 'hanke')
index.add person.new(2, 'kaspar', 'schiess')
index.add person.new(3, 'florian', 'hanke')
This is how you test facets:
# We should find two surname facets.
#
index.facets(:surname).should == {
'hanke' => 2, # hanke occurs twice
'schiess' => 1 # schiess occurs once
}
# Only one occurs at least twice.
#
index.facets(:surname, at_least: 2).should == {
'hanke' => 2
}
# Passing in no filter query just returns the facets.
# (finder is a search interface over the index above.)
#
finder = Picky::Search.new index
finder.facets(:surname).should == {
'hanke' => 2,
'schiess' => 1
}
# A filter query narrows the facets down.
#
finder.facets(:name, filter: 'surname:hanke').should == {
'tom' => 1,
'florian' => 1
}
# It allows explicit partial matches.
#
finder.facets(:name, filter: 'surname:hank*').should == {
'tom' => 1,
'florian' => 1
}
Two rules:
A good example for a meaningful use of facets would be brands of shoes. There aren't many different brands (usually fewer than 100).
So this facet query
finder.facets(:brand, filter: 'type:sports')
does not return thousands of facets.
Should you find yourself in a position where you have to use a facet query on uncontrolled data, eg. user entered data, you might want to cache the results:
category = :name
filter = 'age_bracket:40'
some_cache[[category, filter]] ||= finder.facets(category, filter: filter)
Picky offers a standard HTML interface that works well with its JavaScript. Render this into your HTML (needs the picky-client
gem):
Picky::Helper.cached_interface
Adding a JS interface (written in jQuery for brevity):
$(document).ready(function() {
pickyClient = new PickyClient({
// A full query displays the rendered results.
//
full: '/search/full',
// More options...
});
});
See the options described and listed below.
The variable pickyClient has the following functions:
// Params are params for the controller action. Full is either true or false.
//
pickyClient.insert(query, params, full);
// Resends the last query.
//
pickyClient.resend();
// If not given a query, will use query from the URL (needs history.js).
//
pickyClient.insertFromURL(overrideQuery);
When creating the client itself, you have many more options, as described here:
Search options are about configuring the search itself.
There are four different callbacks that you can use. The part after the ||
describes the default, which is an empty function.
The beforeInsert
callback is executed before a call to pickyClient.insert
. Use this to sanitize queries coming from URLs:
var beforeInsertCallback = config.beforeInsert || function(query) { };
The before
callback is executed before a call to the server. Use this to add any filters you might have from radio buttons or other interface elements:
var beforeCallback = config.before || function(query, params) { };
The success
callback is executed just after a successful response. Use this to modify returned results before Picky renders them:
var successCallback = config.success || function(data, query) { };
The after
callback is called just after Picky has finished rendering results – use it to make any changes to the interface (like update an advertisement or similar).
var afterCallback = config.after || function(data, query) { };
This will cause the interface to search even if the input field is empty:
var searchOnEmpty = config.searchOnEmpty || false;
If you want to tell the server you need more than 0 live search results, use liveResults
:
var liveResults = config.liveResults || 0;
If the live results need to be rendered, set this to be true. Usually used when full results need to be rendered even for live searches (search as you type):
var liveRendered = config.liveRendered || false;
After each keystroke, Picky waits for a designated interval (default is 180ms) for the next keystroke. If no key is hit, it will send a "live" query to the search server. This option lets you change that interval time:
var liveSearchTimerInterval = config.liveSearchInterval || 180;
You can completely exchange the backend used to make calls to the server – in this case I trust you to read the JS code of Picky yourself:
var backends = config.backends;
With these options, you can change the text that is displayed in the interface.
These options can be locale dependent.
Qualifiers are needed when a category in the index is addressed by a qualifier that differs from the category's name. Eg. category :application, qualifiers: ['app']
. You'd then have to tell the Picky interface to map the category correctly to a qualifier.
qualifiers: {
en:{
application: 'app'
}
},
Remember that you only need this if you do funky stuff. Keep to the defaults and you'll be fine.
Explanations are the small headings over allocations (grouped results). Picky just writes "with author soandso" – if you want a better explanation, use the explanations option:
explanations: {
en:{
title: 'titled',
author: 'written by',
year: 'published in',
publisher: 'published by',
subjects: 'with subjects'
}
}
Picky would now write "written by soandso", making it much nicer to read.
Choices describe the choices that are given to a user when Picky would like to know what the user was searching. This is done when Picky gets too many results in too many allocations, i.e. it is very unclear what the user was looking for.
An example for choices would be:
choices: {
en:{
'title': {
format: "Called <strong>%1$s</strong>",
filter: function(text) { return text.toUpperCase(); },
ignoreSingle: true
},
'author': 'Written by %1$s',
'subjects': 'Being about %1$s',
'publisher': 'Published by %1$s',
'author,title': 'Called %1$s, written by %2$s',
'title,author': 'Called %2$s, written by %1$s',
'title,subjects': 'Called %1$s, about %2$s',
'author,subjects': '%1$s who wrote about %2$s'
}
},
Was the user just looking for a title? (Displayed as eg. "ULYSSES", because of the filter and format.) Or was he looking for an author? (Displayed as "Written by Ulysses".)
Multicategory combinations are possible. If the user searches for Ulysses Joyce, then Picky will most likely ask if this is a title and an author: "Called Ulysses, written by Joyce".
This is a much nicer way to ask the user, don't you think?
The nonPartial option describes which categories should not show ellipses (…) behind the text if the user searched for them in a partial way. Use this when the categories are not partially findable on the server.
nonPartial: ['year', 'id']
When searching for "1977", this will result in the text being "written in 1977" instead of "written in 1977…", where the ellipses don't make much sense.
The last option describes how to group the choices in a text. Play with this to see the effects (I know, am tired ;) ).
groups: ['title', 'author'];
There are quite a few selector options – you only need those if you heavily customise the interface. You tell Picky where to find the div containing the results or the search form etc.
The selector that contains the search input and the result:
var enclosingSelector = config['enclosingSelector'] || '.picky';
The selector that describes the form the input field is in:
var formSelector = config['formSelector'] || (enclosingSelector + ' form');
The formSelector
(short fs
) is used to find the input etc.:
config['input'] = $(config['inputSelector'] || (fs + ' input[type=search]'));
config['reset'] = $(config['resetSelector'] || (fs + ' div.reset'));
config['button'] = $(config['buttonSelector'] || (fs + ' input[type=button]'));
config['counter'] = $(config['counterSelector'] || (fs + ' div.status'));
The enclosingSelector
(short es
) is used to find the results:
config['results'] = $(config['resultsSelector'] || (es + ' div.results'));
config['noResults'] = $(config['noResultsSelector'] || (es + ' div.no_results'));
config['moreSelector'] = config['moreSelector'] ||
es + ' div.results div.addination:last';
The moreSelector refers to the clickable "more results" pagination/addination.
The result allocations are selected by these options:
config['allocations'] = $(config['allocationsSelector'] ||
(es + ' .allocations'));
config['shownAllocations'] = config['allocations'].find('.shown');
config['showMoreAllocations'] = config['allocations'].find('.more');
config['hiddenAllocations'] = config['allocations'].find('.hidden');
config['maxSuggestions'] = config['maxSuggestions'] || 3;
Results rendering is controlled by:
config['results'] = $(config['resultsSelector'] ||
(enclosingSelector + ' div.results'));
config['resultsDivider'] = config['resultsDivider'] || '';
config['nonPartial'] = config['nonPartial'] || [];
// e.g. ['category1', 'category2']
config['wrapResults'] = config['wrapResults'] || '<ol></ol>';
The option wrapResults
refers to what the results are wrapped in, by default <ol></ol>
.
Thanks to whoever made the Sinatra README page for the inspiration.