Experimental Features for Picky 5

ruby / picky

This is a quick post about two experimental features in Picky 4.11+ that will become stable in Picky 5.

Intro

Picky is very much driven by its users.

After I added stemming in Picky 4.6.6, prompted by John Barton and Glen Maddern of goodfil.ms fame, Andy Kitchen supplied a piece of code for automatic word segmentation and mentioned that he also needed a range query.

They are now both available as experimental features.

Range queries

Let’s say you’d like to find all people born in 1977, 1978, and 1979. Previously, this was not too easy to do in Picky.

Now you can. Let’s look at a full copy-and-paste-able example:

require 'picky'
  
index = Picky::Index.new :people do
  key_format :to_s
  category :year
end

Person = Struct.new :id, :year

index.add Person.new('Picky',   2008)
index.add Person.new('Kaspar',  1978)
index.add Person.new('Florian', 1977)
index.add Person.new('Joe',     1955)

people = Picky::Search.new index

p people.search('1977-1979').ids
p people.search('year:1977-1979').ids
p people.search('year:1900-2010').ids

The first result will be

["Florian", "Kaspar"]

since I was born in 1977, and Kaspar was born in 1978. If you categorize it with year:1977-1979 it will yield the same result. If you only want results for a specific category, remember to prefix the search term or range with category_name:.

By going over the whole range, as in the third result, you’ll get

["Joe", "Florian", "Kaspar", "Picky"]

as the range year:1900-2010 includes all the results.

Range queries the Ruby way

Picky internally uses Enumerable#inject, so any range will work. For example, initial:a-d will yield results for each "a", "b", "c", and "d". Cool, eh?
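To see why string endpoints are enough, here is a minimal plain-Ruby sketch (no Picky required). The postings hash is a hypothetical stand-in for an index category, and the way the results are merged is an assumption for illustration, not Picky's actual internals.

```ruby
# Hypothetical stand-in for an index category: token => ids.
postings = {
  '1955' => ['Joe'],
  '1977' => ['Florian'],
  '1978' => ['Kaspar'],
  '2008' => ['Picky']
}

# String endpoints form a Range that steps from one end to the other.
letters = ('a'..'d').to_a
p letters # => ["a", "b", "c", "d"]

# Enumerable#inject folds over the range, collecting the ids for each token.
ids = ('1977'..'1979').inject([]) do |result, year|
  result + postings.fetch(year, [])
end
p ids # => ["Florian", "Kaspar"]
```

Tokens with no entry (here '1979') simply contribute nothing.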

Not impressed? Read on…

Custom ranges!

Andy Kitchen was happy with the range queries, but he needed ranges that wrap around. If somebody wanted to find, e.g., an event that was on between 10pm and 2am, the current range query implementation did not allow that: event_start:10-2 did not work, because a range from 10 to 2 is empty (#each or #inject will yield nothing).
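The empty range is easy to reproduce in plain Ruby. A minimal sketch, using integers for simplicity (how Picky builds the range from the query text is not shown here and is not assumed):

```ruby
# A range whose start is greater than its end has no elements.
hours = (10..2)

listed = hours.to_a
p listed # => []

# inject therefore never calls the block and just returns the initial value.
collected = hours.inject([]) { |result, hour| result << hour }
p collected # => []
```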

Because Picky accepts any kind of range, he implemented a wrapping range (the version here is a slight rewrite of the original):

class Wrap12Hours
  include Enumerable

  def initialize(min, max)
    @hours = 12
    @min   = min.to_i
    @top   = max.to_i
    @top   += @hours if @top < @min
  end

  # Yields each hour in the range as a string, wrapping around at @hours.
  def each
    @min.upto(@top) do |i|
      yield((i % @hours).to_s)
    end
  end
end

This is then passed into an index category like this

category :hour, ranging: Wrap12Hours

to make Picky use this “ranging” for that category.

The result: If Wrap12Hours is given a range like 10-2, it will #each this: ["10", "11", "0", "1", "2"] (12 wraps around to 0, and each hour is yielded as a string), which is exactly what he needed.

Picky range queries use #inject, but there is no #inject defined on Wrap12Hours, so why does it work? Note that Andy does an include Enumerable. Enumerable implements #inject (and many other methods) on top of the #each method that is already there. Pretty snazzy! (And, I might add, the Ruby way of doing things.)
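The same trick carries over to other wrapping ranges. As an illustration only, here is a hypothetical 24-hour variant (Wrap24Hours is not part of Picky) that can be tried in plain Ruby. It assumes, as Wrap12Hours does, that it is handed the two ends of the range.

```ruby
# Hypothetical 24-hour variant of Wrap12Hours, for illustration only.
class Wrap24Hours
  include Enumerable

  HOURS = 24

  def initialize(min, max)
    @min = min.to_i
    @max = max.to_i
    # If the end lies "before" the start, push it into the next day.
    @max += HOURS if @max < @min
  end

  # The only method written by hand; Enumerable builds on it.
  def each
    @min.upto(@max) do |i|
      yield((i % HOURS).to_s)
    end
  end
end

night = Wrap24Hours.new('22', '2')

p night.to_a                                      # => ["22", "23", "0", "1", "2"]
p night.inject([]) { |result, h| result + [h] }   # => ["22", "23", "0", "1", "2"]
p night.include?('0')                             # => true
```

Only #each is defined here; #to_a, #inject, and #include? all come from Enumerable.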

The ability to implement custom ranges is very powerful and underlines the flexibility of Picky.

Automatic word segmentation

Just a quick note on this, as it is currently just a sketch. A fully functional sketch, though.

What if you don’t want to split on a regexp as you usually would, but would like Picky to split on the words in the index instead?

So if you had “purple”, “rainbow”, and “pony” (don’t ask) in your index, then you’d want Picky to automatically split a query like “purplerainbowpony” into “purple”, “rainbow”, “pony”.

This can be achieved by passing an automatic splitter, rather than a regexp, to the search option splits_text_on. The automatic splitter is initialized with the index category you’d like it to take its words from.

automatic_splitter = Picky::Splitters::Automatic.new index[:text]

some_search = Picky::Search.new index do
  searching splits_text_on: automatic_splitter
end

That’s it!

Note that if you want to test the splitter itself, you can simply call #split on it, as this is the method called by the Picky Tokenizer to split incoming queries:

automatic_splitter.split 'hellopicky' # => ['hello', 'picky']

Please give it a go and report back!
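To get a feel for what segmenting against a known vocabulary involves, here is a minimal plain-Ruby sketch. It is not Picky's implementation: the segment method and its memoized search are assumptions made purely for illustration, with a Set standing in for the words of an index category.

```ruby
require 'set'

# Hypothetical helper, not part of Picky: splits text into known words.
# Returns an array of words, or nil if the text cannot be fully segmented.
def segment(text, vocabulary, memo = {})
  return [] if text.empty?
  return memo[text] if memo.key?(text)

  result = nil
  1.upto(text.size) do |length|
    head = text[0, length]
    next unless vocabulary.include?(head)

    # Try to segment whatever follows this known word.
    rest = segment(text[length..-1], vocabulary, memo)
    if rest
      result = [head] + rest
      break
    end
  end

  # Remember the outcome (even a failure) so no suffix is searched twice.
  memo[text] = result
end

vocabulary = Set.new(%w[purple rainbow pony])

p segment('purplerainbowpony', vocabulary) # => ["purple", "rainbow", "pony"]
p segment('purplexpony', vocabulary)       # => nil
```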

The partial option

The automatic splitter supports a partial option. This will make Picky also use the partial index.

automatic_splitter = Picky::Splitters::Automatic.new index[:text], partial: true

What does it mean? It means that it will

automatic_splitter.split 'hellopic' # => ['hello', 'pic']

correctly split off the partial ‘pic’. The non-partial version would simply split off ‘hello’:

automatic_splitter.split 'hellopic' # => ['hello']
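The difference between the two modes can be mimicked in a few lines of plain Ruby. This is a hedged sketch, not Picky's code: split_known is a hypothetical helper, it uses a simple greedy longest-match strategy, and a plain array stands in for the indexed words.

```ruby
# Hypothetical helper, not part of Picky: greedy longest-match splitting.
def split_known(text, words, partial: false)
  tokens = []
  rest   = text.dup

  until rest.empty?
    # Longest known word that the remaining text starts with.
    match = words.select { |word| rest.start_with?(word) }.max_by(&:size)
    break unless match

    tokens << match
    rest = rest[match.size..-1]
  end

  # With partial: true, keep a leftover that is the beginning of a known word.
  if partial && !rest.empty? && words.any? { |word| word.start_with?(rest) }
    tokens << rest
  end

  tokens
end

words = %w[hello picky]

p split_known('hellopic', words)                # => ["hello"]
p split_known('hellopic', words, partial: true) # => ["hello", "pic"]
p split_known('hellopicky', words)              # => ["hello", "picky"]
```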

Have fun!

As Picky grows and grows, I am especially happy that Picky is fed well by its enthusiastic and helpful users.

This is much appreciated, amigos! Keep it coming :D

Outlook for Picky 5

The above features will, after some polishing and feedback, be included in Picky 5.

Environments

After a discussion with Kaspar Schiess (my cofounder at The Technology Astronauts), I am very inclined to drop environments (i.e. development, test, production) in the next Picky.

Have you ever asked yourself if you really need environments?

I hope to cover this topic in the next post.

Cheers, and have (pink, tentacly) fun!
