Saturday, July 7, 2007

Beagle vs GDLinux (Google Desktop Linux)

Beagle
- Slow: a search takes 3-4 seconds
- Must be forced to reindex all documents regularly; otherwise its accuracy decreases. I created a script that forces Beagle to reindex at least once a week to maintain its accuracy
+ Can be manually forced to use all the CPU for indexing, so indexing documents is very fast. In my case (around 10 GB of data), indexing finished within roughly 3 hours (an estimate only)
+ Can find text inside any Microsoft Office document

GDLinux (beta)
+ Very fast at searching
- Cannot be manually forced to use all the CPU for indexing; the index is built only during idle time. In my case, building the index for 10 GB of data took around 5 hours of idle time, so in practice the index took one day to finish (around 15 hours)
- Cannot find text inside Microsoft Office documents; it looks like this version of GDLinux indexes only the titles of Microsoft Office documents

Recommendation
So far, I think the best solution is to use Beagle only for searching Microsoft Office documents, and GDLinux for searching everything else.

1 comment:

Unknown said...

Hi,

Must be forced reindex all documents regularly, otherwise Beagle accuracy will be decreased. I created a script to force index Beagle at least once a week to maintain Beagle accuracy

I'm the maintainer of Beagle; this shouldn't be necessary. Can you give me a better idea of why you need to do this? Feel free to email me at joe@joeshaw.org.

Thanks,
Joe