Hacker News

When you can't have Apache Lucene, because you are using PHP/RoR/Django/node.js/whatever, you can always have "basic text indexing and search".
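For context, "basic text indexing and search" usually boils down to an inverted index: a map from each term to the documents that contain it. Here is a minimal in-memory sketch in plain Python. The class and function names are hypothetical and it does no ranking, stemming or persistence, which is exactly the gap Lucene fills.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Toy in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Record doc_id in the posting list of every term in the text.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Apache Lucene is a Java search library")
index.add(2, "Whoosh is a pure-Python search library")
print(sorted(index.search("search library")))  # [1, 2]
print(sorted(index.search("python")))  # [2]
```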

That's bad news for users of all those PHP/RoR/Django/node.js apps, who will never get proper on-site search functionality. The majority of lazy devs won't go for a Solr-like solution.



My environment of choice is Python. I've used Solr before, but for a project a while back I used Whoosh, a pure-Python search library: http://packages.python.org/Whoosh/

The intention was to quickly develop the extra search pieces we needed in Python and then port them to Solr. (For example, we needed a custom scoring mechanism and had to experiment with spelling errors, pronunciation equivalency, etc.) However, Whoosh turned out to be performant enough that I didn't need to touch Solr again (XML config files always make me shudder!).

So if you use Python, I strongly recommend giving Whoosh a go, especially when starting out on a project, as you'll be more productive.
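The comment doesn't say how pronunciation equivalency was handled. One classic technique is American Soundex, which maps similar-sounding words to the same four-character key, so you can index and query on the key as well as the word. This is a stdlib-only sketch of that swapped-in technique, not Whoosh's API or the commenter's code.

```python
# Soundex digit groups: letters that sound alike share a digit.
# Vowels, y, h and w get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the American Soundex key (letter + 3 digits) for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    key = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Skip uncoded letters and repeats of the previous digit.
        if code and code != prev:
            key += code
            if len(key) == 4:
                break
        # h and w do not separate equal digits; vowels do.
        if ch not in "hw":
            prev = code
    return (key + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Matching then means comparing `soundex(query_term)` against keys stored at index time. Soundex is tuned for English surnames, so for other languages you would want a different phonetic scheme.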


This is just flat-out misinformation. You can use Lucene from pretty much any environment. In Rails it is utterly trivial to integrate Solr/Lucene; it's probably about 2 or 3 lines of code. I assume it's similar for other frameworks.
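The reason it works from any stack is that Solr exposes Lucene over plain HTTP. Here is a rough sketch in Python using only the standard library. It assumes a local Solr on its default port 8983 and a hypothetical core named `mycore`; older single-core setups use `/solr/select` without the core segment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(query, base="http://localhost:8983/solr", core="mycore", rows=10):
    """Build a URL for Solr's standard select handler (core name is an assumption)."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{base}/{core}/select?{urlencode(params)}"


def solr_search(query, **kwargs):
    """Run the query against a live Solr and return the matching documents."""
    with urlopen(solr_select_url(query, **kwargs)) as resp:
        # Solr's JSON response nests hits under response.docs.
        return json.load(resp)["response"]["docs"]


print(solr_select_url("title:lucene"))
# http://localhost:8983/solr/mycore/select?q=title%3Alucene&wt=json&rows=10
```

PHP, Django or node.js code would do the same thing with its own HTTP client. The Rails gems mostly wrap this request/response cycle plus indexing hooks.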


Wouldn't you want to use Solr (which wraps Lucene behind a nice API) if you're using PHP? It's what I've always done.

I don't see the bad news at all; if you want to implement proper search, you need something like that.


Solr works great for English, but once you want support for other languages, you will want to use Lucene directly.

Anyway, what if my hosting provider won't let me run Solr or any other Java software?


What host doesn't let you run Java software? I did a dry run a long time ago with Solr and Ubuntu under VMware Fusion in a half-gig VM (or maybe 684 MB), and my impression is that Solr won't run well in limited memory (Sphinx works fine), but it's been a while.


You can't run Solr/Lucene properly on Google App Engine (just an example). Other software may run better in such circumstances, but its quality is questionable.


We used it only for Spanish; it worked well enough (and fast enough). We deployed on bare metal, so no hosting provider (apart from the rack space) was in the middle. If you are doing things this "difficult", you'll need at least a VPS, of course.


Solr will support any combination of analyzers that lucene supports.


Lucene search with Rails is easy and very common. Sunspot (http://sunspot.github.com/) is just one example.


Don't know about other frameworks, but for RoR there are a lot of full-text search solutions that can be set up with almost zero effort.


Thanks to rake, brew, etc., Thinking Sphinx, Sunspot and Elasticsearch/Tire are all pretty easy with Rails if you want default indexes on English-language docs. It all gets complex quickly when you start layering on multiple search strategies and indexes, n-gram search, converting ISO Latin to ASCII, etc., not to mention the S word, "scaling", and anything approaching real-time index updates.

http://stackoverflow.com/questions/9160305/elastic-search-vs...

http://adventuresincoding.com/2012/05/full-text-search-in-ra...

http://www.slideshare.net/dkeener/rails-and-the-apache-solr-...
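To see what the n-gram and "ISO Latin to ASCII" steps in that comment actually do to text, here is a small Python sketch of such an analysis chain. It is illustrative only, not how Solr or Elasticsearch implement it. The ASCII folding here relies on Unicode decomposition, so letters with no decomposition (e.g. "ø", "ß") pass through unchanged; Lucene's ASCIIFoldingFilter uses an explicit mapping table instead.

```python
import unicodedata


def fold_to_ascii(text):
    """Strip diacritics: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_ngrams(term, n=3):
    """Character n-grams of a term, padded with '_' to mark word boundaries."""
    padded = f"_{term}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def analyze(text, n=3):
    """Toy analysis chain: fold accents, lowercase, split, emit n-grams."""
    terms = fold_to_ascii(text).lower().split()
    return [gram for term in terms for gram in char_ngrams(term, n)]


print(fold_to_ascii("Café Crème"))  # Cafe Creme
print(char_ngrams("cafe"))  # ['_ca', 'caf', 'afe', 'fe_']
```

Because "Crème" and "creme" produce identical n-grams, an accent-less or slightly misspelled query still overlaps with the indexed term. The price is a much larger index, which is part of why layering strategies gets complicated.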


What if you want a high-quality search service? Install an even better search plugin in RoR?


As long as there is a better one, yes. Why not? There are Elasticsearch clients that pretty much plug into ActiveRecord, and Elasticsearch is pretty darn good.


You are not speaking very clearly.

What exactly is the issue/problem that you're trying to convey?

One can use any search solution, from the most basic to the most advanced one, with any server-side technology.

Even if your hosting provider doesn't allow you to run some technology stack, there are hosted search solutions with APIs you can use.

And any site beyond the most basic should be on at least a VPS anyway.


>When you can't have Apache Lucene, because you are using PHP/RoR/Django/node.js/whatever, you can always have "basic text indexing and search".

Nothing in PHP/RoR/Django/node.js makes them incompatible with Lucene and/or Solr. You just need to run a JVM in parallel.

And it's not like every page needs a "full text indexing and search" solution.

Personally, for a lot of use cases I prefer exact string matches over BS stem indexing.


>Personally, for a lot of use cases I prefer exact string matches over BS stem indexing.

Really? In the past I've worked on a few search projects in different spaces (venues, aka places/stores; source code; and products), and while exact string matches are often a good sign of quality, stemming and other analyzers make huge improvements in recall. When we measured transaction volume in A/B tests, strict string matching performed substantially worse. Certainly, if you throw out the exact-match signal (i.e. only index stemmed terms), I've seen quality deteriorate. What sort of data do you work with?
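One way to keep the exact-match signal while still getting stemming's recall is to index every token twice, once verbatim and once stemmed, and weight verbatim hits higher. This is a minimal sketch of that idea. The suffix-stripping `crude_stem` and the two weights are hypothetical placeholders, not Porter/Snowball and not tuned values; in Solr the usual equivalent is two fields with different analyzers and per-field boosts.

```python
import re
from collections import defaultdict

EXACT_WEIGHT = 2.0  # hypothetical boost for verbatim token matches
STEM_WEIGHT = 1.0   # hypothetical boost for stemmed matches


def crude_stem(word):
    """Hypothetical toy stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class DualFieldIndex:
    """Keeps an exact-token field and a stemmed field for every document."""

    def __init__(self):
        self.exact = defaultdict(set)
        self.stemmed = defaultdict(set)

    def add(self, doc_id, text):
        for tok in re.findall(r"\w+", text.lower()):
            self.exact[tok].add(doc_id)
            self.stemmed[crude_stem(tok)].add(doc_id)

    def search(self, query):
        """Return (doc_id, score) pairs, best first."""
        scores = defaultdict(float)
        for tok in re.findall(r"\w+", query.lower()):
            for doc in self.exact.get(tok, ()):
                scores[doc] += EXACT_WEIGHT
            for doc in self.stemmed.get(crude_stem(tok), ()):
                scores[doc] += STEM_WEIGHT
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = DualFieldIndex()
index.add("a", "red running shoes")
index.add("b", "red running shoe")
print(index.search("shoes"))  # [('a', 3.0), ('b', 1.0)]
```

Both documents are found thanks to the stemmed field, but the verbatim match still ranks first.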



