Normalizing Indexed Data
A quick blog post on a Picky tokenizer option.
Intro / Problem
On mobile devices it can be a bit annoying to enter special symbols like "&", and it would be easier to just enter "and".
Or maybe there are a lot of abbreviations, like "e.g.", but you'd still like to find the item when searching for "example given".
Or maybe you’d like number
1 to be findable with
In the search engine domain, this is part of text normalization; examples include expanding abbreviations and converting numbers.
In Picky, this is done using the tokenizer option normalizes_words.
Tokenizer option “normalizes_words”
This option makes the tokenizer normalize words before indexing them.
Usage is simple: pass an array of [regexp, replacement] pairs to the normalizes_words option, like so:
index = Picky::Index.new :normalized do
  indexing normalizes_words: [
    [/\+/, 'plus'],                    # + -> plus
    [/\&/, 'and'],                     # & -> and
    [/\bw\//, 'with'],                 # w/ -> with (\w would match any word character, not just "w")
    [/\babbr(ev)?\b/, 'abbreviation'], # abbr, abbrev -> abbreviation (boundaries avoid re-expanding "abbreviation")
    [/e\.g\./, 'example given']        # e.g. -> example given (the periods have to survive until this step)
  ]
end
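To picture what pairs like these do to a piece of text, here is a minimal plain-Ruby sketch. It assumes each pair is applied in order with String#gsub; the NORMALIZATIONS constant and the normalize method are hypothetical names, and this is an illustration of the idea, not Picky's internal implementation. The word boundaries (\b) keep an already expanded word such as "abbreviation" from being expanded again.

```ruby
# Hypothetical sketch of the idea behind normalizes_words: apply each
# [regexp, replacement] pair, in order, to the text. NOT Picky's own code.
NORMALIZATIONS = [
  [/\+/, 'plus'],                    # + -> plus
  [/&/, 'and'],                      # & -> and
  [/\bw\//, 'with'],                 # w/ -> with
  [/\babbr(ev)?\b/, 'abbreviation'], # abbr, abbrev -> abbreviation
  [/\be\.g\./, 'example given']      # e.g. -> example given
].freeze

# Each replacement sees the output of the previous one, so order matters.
def normalize(text, patterns = NORMALIZATIONS)
  patterns.reduce(text) do |result, (regexp, replacement)|
    result.gsub(regexp, replacement)
  end
end

puts normalize('salt & pepper')   # salt and pepper
puts normalize('coffee w/ milk')  # coffee with milk
puts normalize('abbreviation')    # abbreviation (not expanded twice)
```

Because the pairs run in sequence, a later pattern can match text produced by an earlier replacement, which is worth keeping in mind when the list grows.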
Note that the following are handled by dedicated tokenizer options and should be configured there, not in normalizes_words:
- character removal
- character replacement
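The e.g. pattern in the example above only works if its periods are still present when word normalization runs. The plain-Ruby sketch below shows why ordering matters under one assumption: if a character-removal step that strips punctuation ran first, "e.g." would already be "eg" and could no longer match. The remove_characters and expand_eg methods are hypothetical stand-ins, not Picky's API; check Picky's documentation for the order its tokenizer actually applies these steps.

```ruby
# Hypothetical stand-ins, NOT Picky's API: a punctuation-stripping step
# and a single e.g. normalization, to show how their order matters.
EG_PATTERN = /\be\.g\./

# Removes every character that is not a letter, digit, &, +, / or whitespace.
def remove_characters(text, removable = /[^a-z0-9&+\/\s]/i)
  text.gsub(removable, '')
end

def expand_eg(text)
  text.gsub(EG_PATTERN, 'example given')
end

text = 'fruit, e.g. apples'

# Normalize first, then remove characters: the expansion happens.
puts remove_characters(expand_eg(text))  # fruit example given apples

# Remove characters first: "e.g." has become "eg", so nothing is expanded.
puts expand_eg(remove_characters(text))  # fruit eg apples
```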
What if this doesn’t work for you?
No problemo! Picky is all Ruby, so feel free to monkey patch it or, probably better, preprocess the data to your heart's content.
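As an illustration of preprocessing, the number case from the intro could be handled on your own data before it reaches Picky. The sketch below is hypothetical: the Record struct and the spell_out_digits helper are made up for this example, and it only handles standalone single digits. It keeps the original digit and appends the spelled-out word, so a title containing "1" can match both "1" and "one".

```ruby
# Hypothetical preprocessing step, run on your own data before indexing:
# append the spelled-out form of each standalone single digit.
DIGIT_WORDS = %w[zero one two three four five six seven eight nine].freeze

def spell_out_digits(text)
  # \b\d\b matches a lone digit, so multi-digit numbers like "66" are untouched.
  text.gsub(/\b\d\b/) { |digit| "#{digit} #{DIGIT_WORDS[digit.to_i]}" }
end

Record = Struct.new(:id, :title)

records = [
  Record.new(1, 'Top 1 hits'),
  Record.new(2, 'Route 9 diner')
]

# Build new records instead of mutating the originals.
preprocessed = records.map { |r| Record.new(r.id, spell_out_digits(r.title)) }
preprocessed.each { |r| puts r.title }
# Top 1 one hits
# Route 9 nine diner
```

Keeping the digit next to the word is a deliberate choice: searches for either form still find the record.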
Have fun!