The user end of any search engine is just a web interface for digging through a database, and there are many
other web-searchable databases: 411 directories,
lexical databases (like LexFN), and etexts, for example.
A metasearch just adds another layer to the onion, acting as an interpreter between us and the databases. (Or rather,
between us and the web interfaces to the databases, an interpreter talking to an interpreter, so sometimes you don't
get the joke.) Perl or PHP gives more flexibility,
but front-ends can also be javascripted; see web scripting secrets,
Bombastic Search Engine Front-end, or
snooz and the all-in-one searches on fravia's.
It's easy to make your own two-dimensional (linear, read: slow) metasearches. Anyone with even basic programming skills can use LWP::Simple and HTML::Parser to automate web retrieval. I think REBOL is particularly well suited to this, with its built-in HTML parser and document retriever. A 3d one will take some experience with multi-threading/tasking (which I don't have yet, other than playing with fork()s ;)). There are several other search-engine front-ends on the web: Oingo tries to add natural language recognition to AltaVista and dmoz, as did electricmonk.com, which seems to be closed as of this writing.
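To make the "two-dimensional versus 3d" distinction concrete, here is a minimal sketch in Python (chosen for brevity; the same idea works in Perl with threads or fork()). The engine names and the fetch() body are made-up placeholders standing in for real LWP::Simple-style page grabs; the point is only the shape: the linear version hits one engine after another, the parallel one hits them all at once.

```python
# Sketch of a parallel ("3d") metasearch using a thread pool.
# fetch() is a placeholder for a real HTTP GET against an engine's
# search CGI; engine names here are purely illustrative.
from concurrent.futures import ThreadPoolExecutor

def fetch(engine, query):
    # stand-in for retrieving and parsing the engine's result page
    return f"{engine}: results for {query!r}"

def metasearch(engines, query):
    # linear version would be: [fetch(e, query) for e in engines]
    # parallel version: all engines queried at once
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        return list(pool.map(lambda e: fetch(e, query), engines))

results = metasearch(["altavista", "dmoz"], "archetypal figure")
```

pool.map keeps the results in engine order, so merging them afterwards is no harder than in the linear case.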
Other possible directions to take metasearching (or front-ending?):
Hmm, let's try it real quick, using, say, +%archetypal figure +Jung, 'coz I'm interested in that stuff right
now :) Entering that query in the synonyms metasearcher will return the following url:
http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=en&pg=q&text=yes&q=%2b%28archetypal+|+archetypical+|+prototypal+|+prototypic+|+prototypical%29+figure+%2bJung&search=Search
And, wow, that came out much better than I thought it would, heh, good example :)
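The transformation from the expanded query to that URL is plain percent-encoding: '+' becomes %2b, the parentheses become %28/%29, and spaces become '+'. A quick sketch in Python (the '|' is left unescaped, as AltaVista's own interface did; hex case may differ from the URL above):

```python
# Percent-encode the synonym-expanded query into AltaVista's q parameter.
from urllib.parse import quote_plus

q = '+(archetypal | archetypical | prototypal | prototypic | prototypical) figure +Jung'
encoded = quote_plus(q, safe='|')   # keep '|' literal, encode the rest
url = ('http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=en&pg=q'
       '&text=yes&q=' + encoded + '&search=Search')
```

This is exactly what the last few substitutions in the CGI below do by hand with s/// on the joined token list.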
Well, here is the relevant part of the CGI (the rest is just HTML stuff):
    use LWP::Simple;   # no need for anything more in-depth

    # parse the query string
    @in = split(/&/, $ENV{'QUERY_STRING'});
    foreach $i (@in) {
        $i =~ s/\+/ /g;
        $i =~ s/%(..)/chr(hex($1))/ge;
        @key_val = split(/=/, $i, 2);
        $in{$key_val[0]} = $key_val[1];
    }
    $in{'q'} =~ s/[^\w()|+\-~"% ]//g;   # strip bad chars

    open(L, ">>$logf");   # append to the log
    print L time() . "\n$ENV{'REMOTE_ADDR'}\n$ENV{'HTTP_USER_AGENT'}\n$in{'q'}\n";
    close(L);

    # protect quoted phrases by joining their words with underscores
    # (there _must_ be a better way to do this! am I just stupid or what?!)
    $b = $in{'q'};
    while ($b =~ /^.*?"(.*?)"(.*)$/s) {
        $a = $1;
        $b = $2;
        $c = $a;
        $c =~ tr/ /_/s;
        $in{'q'} =~ s/$a/$c/s;
    }
    @tokens = split(/ /, $in{'q'});
    foreach $token (@tokens) { $token =~ tr/_/ /; }

    foreach $token (@tokens) {
        if ($token =~ /^([+\-~|]*\(?)%/) {
            $t = "$1(";
            $token =~ s/[^\w ]//g;
            @syns = ($token);
            $token =~ s/ /+/g;
            # grabbing and parsing out the synonyms
            $p = get("http://www.raisch.com/cgi-bin/lexfn/lexfn-cuff.cgi?sWord=$token&tWord=&query=show&maxReach=2&ASYN=on&ABAK=on") or last;
            $p =~ /^.*<\/form>(.*)<font/is;
            $p = $1;
            while ($p =~ /<b><a.*?>(.*?)<\/a>(.*)$/is) {
                push @syns, $1;
                $p = $2;
            }
            # quote spaced synonyms
            foreach $s (@syns) {
                if ($s =~ / /) { $s = '"' . $s . '"'; }
            }
            $token = $t . join(' | ', @syns) . ')';
        }
    }
    $query = join(' ', @tokens);
    $query =~ s/\+/%2b/g;
    $query =~ s/ /+/g;
    $url = "http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=$in{'kl'}&pg=q&text=yes&q=$query&search=Search";
    print "Location: $url\n";
    print "Content-type: text/html\n\n";

Are these kinds of search-engine additions useful? Or maybe more can be got from Lexical FreeNet?
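About the underscore dance that protects quoted phrases: there is indeed a shorter way. A single regex can grab either a quoted phrase or a run of non-space characters in one pass. Sketched in Python; the same pattern works in Perl as @tokens = $query =~ /"[^"]*"|\S+/g;

```python
# Split a query into tokens, keeping quoted phrases intact,
# instead of swapping spaces for underscores and back.
import re

def tokenize(query):
    # a quoted phrase, or else a run of non-space characters
    return re.findall(r'"[^"]*"|\S+', query)

tokens = tokenize('+%archetypal "red king" +Jung')
# tokens: ['+%archetypal', '"red king"', '+Jung']
```

The alternation is tried left to right, so a token starting with a quote is consumed as a whole phrase before \S+ gets a chance to split it.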