Searching with Picky: Data Sources Tweet
What is a Data Source in Picky?
A data source is where the indexes get their data. Every index needs a data source.
The way to do this is pass the
index(identifier, source) method’s source param a source instance, like so (in
Here we passed a database source that uses a simple select. Which database the source uses is defined in the file
books_index = index :books, Sources::DB.new('SELECT id, title, author FROM books', file: 'app/db.yml')
app/db.ymland follows the configuration structure of Active Record. You could, instead of passing in a
fileoption, just pass in the Active Record config hash.
There are various data sources already defined beside the DB source (see below), but if the one you need is missing, writing your own is easy.
After that comes the most important part in Picky! :) No, really. Because what we are now going to do is categorize the data we got from the source.
Categorizing the data is so important, because it allows Picky to make guesses as to which category a query word is in and get better feedback from the user. Say, if you categorized both first name and last name in the category
name, Picky would not be able to help your users find what you are looking for, since it can’t ask back specifically what you mean, like “Did you mean Florian as first name or last name?”.
It’s best if you just get started, and see for yourself. Picky is best experienced, and not told.
Back to the example: Now that we have defined a data source, it’s easy to define a category on it. If you define a
it will use whatever data came back from the database.
If your database doesn’t have nice column names, don’t worry, you have two options:
SELECT id, t_01 as title ... or use the
from option when you define the category:
books_index.define_category :title, :from => :t_01
from option is quite cool, as it allows you to have multiple categories on the same data! Say you wanted a similarity search in one category and none on the other:
Lots of possibilities, I’m sure you’ll find more useful ones!
books_index.define_category :title, :from => :t_01 books_index.define_category :similar_title, :from => :t_01, similarity: Similarity::Phonetic.new(3)
There’s more. You can have crazy indexes where every category has its own data source:
Now the title category takes its data from a library.csv. If you do this, be careful that all data sources use the same ids or Picky’s core mechanism won’t work.
books_index.define_category :title, source: Sources::CSV.new(:title, :author, file: 'data/library.csv', col_sep: ',')
Currently available data sources
Picky offers a few data sources,
DB for databases,
CSV for comma-separated files,
Couch for couch DB, and
Delicious, for delicious bookmarks. Mmh.
This is how you use them. We’ve already seen the database source:
Don’t hesitate to use JOINs or other SQL expressions for some extreme databasing!
Sources::DB.new('SELECT id, title, author FROM books', file: 'app/db.yml')
This source assumes that your first column is the id column. It takes its data from the file given in the
Sources::CSV.new(:title, :author, :isbn, :year, :publisher, :subjects, file: 'data/books.csv')
The CouchDB source takes a url where couch DB serves its data. By default it assumes that you are using Hex Keys. But you can pass in one of
Sources::Couch.new(:title, :author, :isbn, url: 'http://localhost:5984/picky', keys: Sources::Couch::UUIDKeys.new)
keysoption to tell Picky what keys you have. I’m afraid that currently you have to recalculate your keys in the client to get back the original keys. I am working on non-integer keys, but it takes its time. Sorry about that.
Delicious is the easiest source, since it comes with fixed data categories
urlthat you can categorize.
How do I define my own Data Source?
Defining your own source is easy. The Couch DB source for example has actually been sent in by Stanley.
This piece of code is the superclass of all sources in Picky and is there simply for illustrative purposes, so you can see what methods should be implemented: http://github.com/floere/picky/blob/master/server/lib/picky/sources/base.rb.
I recommend to make your source also its subclass, since it implements empty methods that are called by the indexer. But it actually just needs one worker method. This one:
It gets the index and the current category and should
yield(id, text_data_for_id). It is called by the indexer when it needs the data.
The two other methods that are called by the indexer are
connect_backend, which is called once per index/category, and
take_snapshot, which is called once for each index, before
harvest-ing the data. Use it to create temporary tables etc.
So if your duck subclasses
#harvest and yields
id, text_data_for_id your data source is set to go!
Simple and easy to understand, isn’t it?
So we’ve seen
- what a data source in Picky is.
- what data sources are currently available.
- how you write your own.
Hope you learnt something new :)
Contributing one to Picky
If you write your own data source, please let me know!Next Searching with Picky: Live Parameters Part 1
Previous Searching with Picky: Partial Search