Experimental Features for Picky 5
This is a quick post about two experimental features in Picky 4.11+ that are slated to become stable in Picky 5.
Intro
Picky is very much driven by its users.
After I added stemming in Picky 4.6.6, prompted by John Barton and Glen Maddern of goodfil.ms fame, Andy Kitchen supplied a piece of code for automatic word segmentation and mentioned that he needed range queries.
They are now both available as experimental features.
Range queries
Let’s say you’d like to find all people born in 1977, 1978, or 1979. Previously, this was not easy to do in Picky.
Now it is. Let’s look at a full copy-and-paste-able example:
require 'picky'

index = Picky::Index.new :people do
  key_format :to_s
  category :year
end

Person = Struct.new :id, :year

index.add Person.new('Picky', 2008)
index.add Person.new('Kaspar', 1978)
index.add Person.new('Florian', 1977)
index.add Person.new('Joe', 1955)

people = Picky::Search.new index

p people.search('1977-1979').ids
p people.search('year:1977-1979').ids
p people.search('year:1900-2010').ids
The first result will be
["Florian", "Kaspar"]
since I was born in 1977, and Kaspar was born in 1978. If you categorize it with year:1977-1979, it will yield the same result. If you only want results for a specific category, remember to categorize the query by prefixing the search term or range with category_name: (here, year:).
By going over the whole range, as in the third result, you’ll get
["Joe", "Florian", "Kaspar", "Picky"]
as the range year:1900-2010 includes all the results.
Range queries the Ruby way
Picky internally uses Enumerable#inject, so any range will work. For example, initial:a-d will yield results for each of "a", "b", "c", and "d". Cool, eh?
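To make the idea concrete, here is a minimal sketch in plain Ruby. It is not Picky's actual code: the hash `year_index` is a hypothetical stand-in for a category's index, and `ids_for_range` is a made-up helper. Only the use of `Enumerable#inject` over a range is taken from the description above.

```ruby
# Hypothetical stand-in for an index category: token => ids.
# (Picky's real index structures differ.)
year_index = {
  '1955' => ['Joe'],
  '1977' => ['Florian'],
  '1978' => ['Kaspar'],
  '2008' => ['Picky']
}

# Walks the range with Enumerable#inject and collects the ids
# found for each token. Anything that responds to #inject works here.
def ids_for_range(index, range)
  range.inject([]) do |ids, token|
    ids + index.fetch(token.to_s, [])
  end
end

p ids_for_range(year_index, '1977'..'1979') # => ["Florian", "Kaspar"]
p ids_for_range(year_index, 1900..2010)     # => ["Joe", "Florian", "Kaspar", "Picky"]
```

Because the helper only relies on `#inject`, a string range such as `'a'..'d'` can be passed in exactly the same way.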
Not impressed? Read on…
Custom ranges!
Andy Kitchen was happy with the range queries, but he needed ranges that wrap around. If somebody wanted to find, e.g., an event that was on between 10pm and 2am, the range query implementation did not allow that: event_start:10-2 did not work (#each or #inject will yield nothing).
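The reason is plain Ruby behaviour: a range whose start is greater than its end is empty, so there is nothing to iterate over. A quick check, using an integer range for simplicity:

```ruby
# A "backwards" range is empty ...
backwards = (10..2)
p backwards.to_a # => []

# ... so #inject never calls its block and just returns the initial value.
visited = backwards.inject([]) { |hours, hour| hours << hour }
p visited # => []
```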
Because Picky accepts any kind of range, he implemented a wrapping range (the version here is a slight rewrite of the original):
class Wrap12Hours
  include Enumerable

  def initialize(min, max)
    @hours = 12
    @min   = min.to_i
    @top   = max.to_i
    @top  += @hours if @top < @min
  end

  def each
    @min.upto(@top) do |i|
      yield((i % @hours).to_s)
    end
  end
end
This is then passed into an index category like this
category :hour, ranging: Wrap12Hours
to make Picky use this “ranging” for that category.
The result: If Wrap12Hours is given a range like 10-2, it will #each this: ["10", "11", "0", "1", "2"], which is exactly what he needed.
Picky range queries use #inject, but there is no #inject on Wrap12Hours, so why does it work? Note that Andy does an include Enumerable. Enumerable implements #inject and some other methods on top of the #each method, which is already there. Pretty snazzy! (And, I might add, the Ruby way of doing things.)
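This can be checked without Picky. The sketch below repeats the class from above so that it runs on its own, then calls a few methods that Enumerable derives from #each. It assumes the class is instantiated with the two ends of the range as strings; how Picky actually instantiates it is not shown here. Note that the hours come out as strings, because #each calls #to_s on each value.

```ruby
class Wrap12Hours
  include Enumerable

  def initialize(min, max)
    @hours = 12
    @min   = min.to_i
    @top   = max.to_i
    @top  += @hours if @top < @min # 10-2 becomes 10 up to 14
  end

  def each
    @min.upto(@top) do |i|
      yield((i % @hours).to_s)
    end
  end
end

late_night = Wrap12Hours.new('10', '2')

# None of these methods are defined on Wrap12Hours itself;
# Enumerable builds them on top of #each.
p late_night.to_a                                   # => ["10", "11", "0", "1", "2"]
p late_night.inject([]) { |all, hour| all << hour } # => ["10", "11", "0", "1", "2"]
p late_night.include?('0')                          # => true

# A range that does not wrap behaves like a normal range.
p Wrap12Hours.new('3', '5').to_a                    # => ["3", "4", "5"]
```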
The ability to implement custom ranges is very powerful and underlines the flexibility of Picky.
Automatic word segmentation
Just a quick note on this as it is just a sketch, currently. A fully functional sketch, though.
What if you don’t want to split on a regexp as you usually would, but would like Picky to split on the words in the index?
So if you had “purple”, “rainbow”, and “pony” (don’t ask) in your index, then you’d want Picky to automatically split a query like “purplerainbowpony” into “purple”, “rainbow”, “pony”.
This can be achieved by giving the search category option splits_text_on an automatic splitter rather than a regexp. The automatic splitter is initialized with the index category you’d like to use for the splitter.
automatic_splitter = Picky::Splitters::Automatic.new index[:text]

some_search = Picky::Search.new index do
  searching splits_text_on: automatic_splitter
end
That’s it!
Note that if you want to test the splitter itself, you can simply call #split on it, as this is the method called by the Picky Tokenizer to split incoming queries:
automatic_splitter.split 'hellopicky' # => ['hello', 'picky']
Please give it a go and report back!
The partial option
The automatic splitter supports a partial option. This will make Picky also use the partial index.
automatic_splitter = Picky::Splitters::Automatic.new index[:text], partial: true
What does it mean? It means that it will
automatic_splitter.split 'hellopic' # => ['hello', 'pic']
correctly split off the partial ‘pic’. The non-partial version would simply split off ‘hello’:
automatic_splitter.split 'hellopic' # => ['hello']
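For intuition about what such a splitter has to do, here is a small plain-Ruby sketch of dictionary-based segmentation. It is an illustration only and not how Picky::Splitters::Automatic is implemented: the `segment` helper and the word set are hypothetical, and the set merely stands in for the words of an index category. With `partial: true`, the last piece may be a prefix of a known word.

```ruby
require 'set'

# Hypothetical helper: returns the split that covers the most characters
# from the start of the text, using only words from the given set.
# Unmatched trailing text is dropped.
def segment(text, words, partial: false)
  return [] if text.empty?

  candidates = []

  # Try every known word the text starts with, then split the rest.
  1.upto(text.size) do |length|
    head = text[0, length]
    next unless words.include?(head)
    candidates << [head] + segment(text[length..-1], words, partial: partial)
  end

  # In partial mode the whole remainder may be the beginning of a known word.
  if partial && words.any? { |word| word.start_with?(text) }
    candidates << [text]
  end

  candidates.max_by { |pieces| pieces.join.size } || []
end

words = Set['hello', 'picky', 'purple', 'rainbow', 'pony']

p segment('purplerainbowpony', words)       # => ["purple", "rainbow", "pony"]
p segment('hellopicky', words)              # => ["hello", "picky"]
p segment('hellopic', words)                # => ["hello"]
p segment('hellopic', words, partial: true) # => ["hello", "pic"]
```

The sketch favours readability over speed; for long queries or large vocabularies, results for each remaining substring would need to be memoized.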
Have fun!
As Picky grows and grows, I am especially happy that Picky is fed well by its enthusiastic and helpful users.
This is much appreciated, amigos! Keep it coming :D
Outlook for Picky 5
The above features will, after some polishing and feedback, be included in Picky 5.
Environments
After a discussion with Kaspar Schiess (my cofounder at The Technology Astronauts), I am very inclined to drop environments (i.e. development, test, production) in the next Picky.
Have you ever asked yourself if you really need environments?
I hope to cover this topic in the next post.
Cheers, and have (pink, tentacly) fun!