From Electron Cloud
Latest revision as of 20:38, 10 March 2009
Strigi and Beagle are too slow. locate does not give enough information, and it does not work across the network. Swish++, though, is fast enough to implement a really complete whole-system search solution; I even used it that way once at a previous job. (It must be native code, not Python/C#/Java etc.!)
Here are the features I want:
- index ALL files, not just my home directory (I've done that)
- include symbols in libraries (.a and .so) (I've done that)
- use plugins or external parsers to convert unsupported things (like Word documents and PDFs) to text for indexing (I've done that, it is the Swish++ approach)
- the client can search only the local machine, or can distribute the same query to a collection of machines on the network, and aggregate results, showing where the found documents are located
- for remote machines which are sometimes turned off, a machine which is more likely to be on can be configured as a proxy, so you can still search the files of machines that are offline
- security features like slocate (users can optionally not be allowed to search files they aren't allowed to read)
- re-index files as they change (get notifications via inotify or something)
- cross-platform (Linux, Mac and Windows) (on the Mac, maybe just tie into Spotlight and make its database available across the net)
- search text in open xterms/rxvts/konsoles too (often I have way too many of them open and forget what I was doing where)
- maybe quickie searches (for applications, or text in open apps) should have a different UI than searching for documents etc.
- from a technical perspective it makes sense to combine the search for apps and docs (like Spotlight and the KDE4 search thingy on the start menu), but to put the search for text in open windows into the window list (the one that flies out from the upper-right corner). Those two aren't actually related: documents and apps are indexed, but the "open windows" search is on-demand, so the apps need to support an API for doing that. Not sure which is best for usability though.
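
The "distribute the same query and aggregate results" item above could look roughly like this on the client side. This is only a sketch of the merge step: the `Hit` and `LocatedHit` structures, the `aggregate` function and the idea that every machine reports a numeric score are my assumptions, not anything Swish++ provides, and the network transport is left out entirely.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// One hit as returned by a single machine's index (hypothetical structure).
struct Hit {
    std::string path;   // path of the document on that machine
    double score;       // relevance reported by that machine's search
};

// A hit annotated with the machine it was found on.
struct LocatedHit {
    std::string host;
    std::string path;
    double score;
};

// Merge per-host result lists into a single list, best score first.
// Ties are broken by host, then path, so the order is deterministic.
std::vector<LocatedHit> aggregate(
    const std::vector<std::pair<std::string, std::vector<Hit>>>& per_host)
{
    std::vector<LocatedHit> merged;
    for (const auto& entry : per_host) {
        for (const auto& hit : entry.second) {
            merged.push_back(LocatedHit{entry.first, hit.path, hit.score});
        }
    }
    std::sort(merged.begin(), merged.end(),
              [](const LocatedHit& a, const LocatedHit& b) {
                  if (a.score != b.score) return a.score > b.score;
                  if (a.host != b.host) return a.host < b.host;
                  return a.path < b.path;
              });
    return merged;
}
```

Keeping the host name on every hit is what lets the UI show where each found document is located. Scores from different indexes are compared directly here, which only makes sense if all machines rank the same way.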
 
Of course it needs to integrate into the "universal command line" some day, with link suggestions showing up as you type.
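
The slocate-like security feature in the list above could be approximated by filtering results at query time. A minimal sketch, assuming a POSIX system and a hypothetical `filter_readable` helper (not part of Swish++ or slocate): it uses `access()`, which checks against the real user and group IDs of the calling process, so it also does the right thing if the search client runs setgid to read a protected index.

```cpp
#include <string>
#include <vector>
#include <unistd.h>   // access(), R_OK (POSIX)

// Keep only the paths that the calling (real) user is allowed to read.
// Paths that no longer exist are dropped as well, since access() fails.
std::vector<std::string> filter_readable(const std::vector<std::string>& paths)
{
    std::vector<std::string> visible;
    for (const auto& p : paths) {
        if (access(p.c_str(), R_OK) == 0) {
            visible.push_back(p);
        }
    }
    return visible;
}
```

This only hides results; the index itself would still need to be unreadable to ordinary users for the restriction to mean anything. For hits coming from remote machines, the check would have to run on the machine that owns the files.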