~ Berlin talk: 22C3
Private Investigations ~
Thursday December 29th, 2005 ~
Time: 13:00 ~
Location: 22C3 Berlin
This file dwells at http://www.searchlores.org/private.htm
INTRODUCTION Structure, Opera and Proxomitron
Excuse my English, please, which is my third language,
and please note that I'm not sure I'll always be able to be politically correct.
Also -as you can see- no powerpoint in my talks: there's no need to turn everything into a sales pitch,
and with powerpoint even the few pre-chewed "ideas",
hidden inside the bambinesque noise, are simplified to the point that they become
redundant and unclear.
The purpose of this talk is to show you how to search the web effectively, and hence
give you cosmic power, no more and no less. In only one hour we'll be able to
examine just a broad palette of searching approaches. Let's hope that your attention span is good enough, and that
you'll later be able to work by yourself, in order to learn much more on your own: Probieren geht
über studieren (trying beats studying), duh, yet without your own work and application
most of you will just remain the poor "one word" searchers that you are.
It would be a pity!
Once you know how to search the web,
the entire human knowledge will become available to you.
Enough blabbering: let's begin from the beginning. Using google, we can see how a simple "moronical" query like
"index.of" warez
has your target
signal
submerged under such heavy commercial noise as to be next to useless. In fact people create on their servers many
Index of/archiv/warez or Index of/archiv/porn subdirectories
just in order to attract some (moronical) traffic :-)
The index.of querystring is one of the oldest tricks used to bypass the commercial vultures, in the hope
of fetching directly the targets you are seeking.
It still works, btw: playmates index.of
will indeed give you some
good results. But the commercial beasts and the search engines' spammers (that call themselves SEOs)
incorporated this index.of
string in their scam pages a long time ago, so you cannot rely on this querystring -alone- anymore.
So what should we use instead?
Well, a query like the following one should cut some more mustard (In case people should look for software on the web instead of buying it):
("wares" OR "warez" OR "appz" OR "gamez" OR "abandoned" OR "pirate" OR "war3z") ("download" OR "ftp" OR "index of" OR "cracked" OR "release" OR "full") ("nfo" OR "rar" OR "zip" OR "ace")
Of course this is useful when used with a specific target NAME:
teleport.pro ("wares" OR "warez" OR "appz" OR "gamez" OR "abandoned" OR "pirate" OR "war3z") ("download" OR "ftp" OR "index of" OR "cracked" OR "release" OR "full") ("nfo" OR "rar" OR "zip" OR "ace"),
which confirms the paramount importance of NAMES on the Web.
Here is another nice mp3 querystring, just smash it inside google:
imagine "snd *.mp3 *-*-2005 *:* *.*m" OR "snd *.mp3 *-*-2005 *:* *.*k" OR "snd *.mp3 *-*-2005 *:* *.*"
and obtain the following link: http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=imagine++%22snd+*.mp3+*-*-2005+*%3A*+*.*m%22+OR+%22snd+*.mp3+*-*-2005+*%3A*+*.*k%22+OR+%22snd+*.mp3+*-*-2005+*%3A*+*.*%22++&btnG=Search
Change imagine to boogie, dylan, garfunkel, mendelssohn, mozart or whatnots. Change 2005 to 2004 (or earlier) for a different (but possibly more stale) search.
Look at the query:
can you understand WHY this query works?
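The step from the querystring to the long link above is nothing more than URL encoding. Here is a minimal sketch, assuming Python's standard urllib.parse. The helper names (mp3_listing_query, google_url) are my own and purely illustrative, and only the q parameter is rebuilt: the hl, lr, ie and btnG parameters of the link above are left out.

```python
from urllib.parse import urlencode


def mp3_listing_query(name: str, year: int) -> str:
    """Rebuild the querystring shown above for a given name and year.

    Hypothetical helper: it only concatenates the three quoted patterns.
    """
    stem = f"snd *.mp3 *-*-{year} *:* *.*"
    phrases = [f'"{stem}m"', f'"{stem}k"', f'"{stem}"']
    return f"{name} " + " OR ".join(phrases)


def google_url(query: str) -> str:
    """Percent-encode the query; '*' is kept literal, as in the link above."""
    return "http://www.google.com/search?" + urlencode({"q": query}, safe="*")


if __name__ == "__main__":
    for name in ("imagine", "mozart"):
        print(google_url(mp3_listing_query(name, 2005)))
```

Swapping the name or the year, as suggested above, is then a matter of changing the two arguments.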
The above are just examples, and they use just one of the main search engines: Google, one of
the most important, but (and hence) also one of the most abused search engines.
Always remember that
the main search engines (at the moment the most important ones are google, yahoo, msn and teoma) cover -at best- just a third
of the whole web...
So the problem is to wade through the slimy commercial morasses of the web, which were
made specifically "ad captandum vulgus", and to quickly "cut"
this useless ballast in order to find our targets.
We'll use google a lot as an example today, but
we'll examine various different ways to fetch your targets "by hand". I say "by hand" because that's what
we are going to do, for pedagogical reasons.
Real seekers try to
automate the process as much as possible and usually employ ad hoc bots to do the gritty digging. But that's
for later, first the basics.
Let's first have a short look at what the web looks like from a searcher's point of view.
Outside linkers are fetched through klebing (and stalking and social engineering),
the bulk and the outside linked
through combing and short and long term seeking, the hidden and commercial databases through
password breaking or guessing, social engineering or, more simply, just seeking
databases'
hardcoded passwords (à la
Borland Interbase's
"politically correct") on the web.
Here is for instance one of these lists:
defpasslist1.htm
In fact the web was made for SHARING information, not for "hoarding" nor for "selling" it. And
it was made for solidity: its structure was made
in order to resist a possible nuclear attack. It will resist even the commercial beasts that
have tried to bury real useful information under tons of commercial crap, aggressive commercial
porn sites and an avalanche of silly and useless advertisements.
Learning how to search, you'll be able to "cut" through the commercial pudding and morasses and
fetch quickly (or relatively quickly) your target jewels.
But to "cut" the Web you'll need first of all a SWORD with a sharp blade: a capable and quick browser.
That's the first and foremost
step. MSIE, Microsoft Internet Explorer, is a no-no-no: too buggy, bloated and prone to all sorts of nasty attacks. The
two current "philosophical schools" are either Firefox or Opera... which is the one I am using now.
Whichever of the two "real" browsers you use, no sword will suffice without a SHIELD. And your shield,
and a mighty one, is proxomitron.
Proxomitron is a very powerful tool. Its power lies
in its ability to rewrite webpages on the fly, filter communications between
your computer and the web servers of the sites you visit, and to allow easy management of external proxy use.
Here is a
link to an old, but very good essay about proxomitron basic installation: anony_8.htm, and a link
to
another essay, Oncle Faf goes inside proxomitron about further finetuning.... Let's sum it up:
"Only morons 'just do it' without Proxomitron."
A word of warning: You'll most probably forget most of the things you'll learn today rather quickly, since
nowadays most
young people (and many older ones as well), after having been heavily bombarded by advertisements
from their birth onwards, have an attention span of just a few minutes and a memory as weak
as an autumn leaf. But hopefully you'll gather today the basics of searching the web correctly. You may
even want to test your skills, afterwards, on your own, on some assignments.
So, in case you forget,
for instance, how to quickly find any mp3 on the fly, you'll be able to use your combing
knowledge to quickly find on the web
many searchers who will teach you how to find mp3 rather quickly. Or maybe even their teachers :-)
MUST KNOW & DISCLAIMER sine qua non
Just a short tour around the house
Main, regional and local search engines
ftp, blogs and all the various targets
usenet irc and then, of course, trolls
Again: anonymity and stalking, maybe some luring as well...
Disclaimer
The information provided during this conference should only be used for academic purposes
and must not be used to infringe the patents, copyrights (or any other legal rights) of any company,
organisation, government or legal entity.
The information provided during this conference must not be used to engage in any illegal activity.
On the other hand you may use the information provided in order to FIGHT any illegal
activity, be it performed by private parties or by one of the aforementioned legal entities :-)
GUESSING is a very important art for seekers.
Here are two 'images-related' examples of the "guessing" approach:
1)
A database and a search engine for advertisements,
completely free until recently: you could enlarge (and copy)
any picture and watch
any ad-spot.
Advertisements from Austria to Zimbabwe. Very useful
for advertisement reversing purposes.
For instance: http://media3.adforum.com/zrIf58670C/B/BM/BMPD_03884/BMPD_03884_0048601T.JPG,
an English volkswagen advertisement.
Alas, the clowns are no longer free. Let's see what we can do.
Let's isolate the image, and now
let's play the guessing game, because we don't really want to [shudder] pay advertisers in order to see
their crap, do we?
Now we notice that BMPD_03884_0048601T.JPG has a "t" inside.
It may be "t" for tiny.
Then we may have "w" for wide and maybe also "a" for art :-)
http://media3.adforum.com/zrIf58670C/B/BM/BMPD_03884/BMPD_03884_0048601W.JPG ... see?
http://media3.adforum.com/zrIf58670C/B/BM/BMPD_03884/BMPD_03884_0048601A.JPG ... q.e.d.
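The guessing game above amounts to swapping one letter in the file name. A small sketch of that enumeration, under the assumption (taken from the example, not from any documentation of the site) that the size marker is the single letter just before the extension. The function name size_variants is hypothetical, and the code only builds candidate names: it fetches nothing.

```python
import string


def size_variants(url: str, letters: str = string.ascii_uppercase) -> list[str]:
    """Return copies of `url` with the letter before the extension swapped.

    Assumes names shaped like ..._0048601T.JPG, as in the example above.
    """
    stem, dot, ext = url.rpartition(".")
    if not dot or not stem or not stem[-1].isalpha():
        raise ValueError("expected a name ending in <letter>.<extension>")
    current = stem[-1].upper()
    return [f"{stem[:-1]}{c}.{ext}" for c in letters if c != current]


if __name__ == "__main__":
    tiny = ("http://media3.adforum.com/zrIf58670C/B/BM/"
            "BMPD_03884/BMPD_03884_0048601T.JPG")
    for candidate in size_variants(tiny, letters="TWA"):
        print(candidate)
```

With the three letters guessed in the talk ("T", "W", "A") this prints the "W" and "A" names; with the default alphabet you get the other 25 letters to try.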
The web was made for SHARING, not for hoarding and not for selling. Its very STRUCTURE will deliver
searchers whatever they are looking for.
But the web is really deep.
For instance, if you study advertisement debunking, you may also want to take account of the EVOLUTION of advertisement, and this is
where sites like http://scriptorium.lib.duke.edu/adaccess/browse.html (1911-1956)
may come in handy.
2)
http://www.acclaimimages.com/_gallery/_pages/
As you can see, when you click on one of these links, for instance on this construction
crane, you get a useless small and watermarked image: http://www.acclaimimages.com/_gallery/_SM/0153-0512-1500-1119_SM.jpg.
Now note the STRUCTURE of this address: http://www.acclaimimages.com/_gallery/_SM/0153-0512-1500-1119_SM.jpg
That _SM must mean SMALL, and in this case, also, watermarked.
Some simple guessing (and some experience) will prove once again that the web was made for SHARING, not for hoarding and
not for selling: as you can see the 'similar pictures', below, have the following structure: http://www.acclaimimages.com/_gallery/_TN/0153-0512-1500-1119_TN.jpg,
hence _TN must be TINY.
Let's first try http://www.acclaimimages.com/_gallery/_BG/0153-0512-1500-1119_BG.jpg:
BG for BIG
Ahi, ahi: "was not found on this server". In fact, I'll stop here and
let you find out -I mean, guess out- the correct character
sequence as an assignment :-)
But we can still eliminate the watermarks now, so that you see that you may have a three-letter combination as well as a two-letter one :-)
http://www.acclaimimages.com/_gallery/_SM2/0153-0512-1500-1119_SM2.jpg
How did we guess that "SM2" -instead of SM- would eliminate the watermark?
We didn't guess at all :-) We just searched
http://www.google.com/search?&rls=en&q=%22password=rya%22&num=100
(you can also try password=riaa, fuckriaa, and
so on, riaa is the "Recording Industry Association of America", a bunch of patents enforcers)
http://www.google.com/search?&rls=en&q=%22password=tolkien%22&num=100
Simple & elegant Webbit, for Combing, rather than for fishing purposes...
You'll find more about "free" email in the ad hoc section of searchlores, the
most important thing is to NEVER give out real data on the web unless you are really compelled to do so (and
even in that case there are many ways to avoid it).
Always choose the first option, whatever it is,
when you (have to) "choose" some options from a menu ("Your income", "Your profession", Your "State" and so on):
State=Afghanistan, Income=less than 15 euro per year and so on...
If you want to play, there are some funny national options like "American Samoa", "Wallis and Futuna
Islands" and so on.
The option "other" that you often find on these menus is also great, because you will get
the wannabe sniffers thinking hard about updating their long palette of options, adding even more crap to their possible choices.
Do not feel bad while feeding only lies to anyone asking for your data on line: such people
are just scum that will use EVERYTHING you tell them for profit the very moment you do,
and they don't even have the decency to admit it. Screw them black and blue, such
clowns deserve far worse than that: never believe for a minute that their 'privacy
pleas' about how they will "never use your data" could be anything other than cheap sarcasm.
The
very reason they set up such "free" email address sites (and such "free" search engines and
"free" file repositories) is -of course- to
READ everything you write and to have a copy of everything you upload or create.
Of course, klaro, no human being will ever read what you write, but their bots and grepping
algos will do it for the owners of the "free" email services
(or of the "free" search engines), presenting them
nice tables built on your private data as a result.
This brings us to a very interesting contradiction: on one side "echelon" and
the total big-brotherish control, on the other "wardriving" and pretty good anonymity...
Examples of "one shot" email addresses...
Mailinator http://www.mailinator.com/mailinator/Welcome.do
Another example:
http://www.pookmail.com/
For instance using pookmail as an example:
http://www.whois.sc/pookmail.com (scroll down for contact names and info)
A very powerful similar tool:
http://www.domainsdb.net/
How to discover IP-related sites
Erom pointed this gem out some time ago:
http://www.searchmee.com/web-info/ip-hunt.php
it lets you see which websites are cohosted on the same ip. Really great for hidden web private research, ahem ;)
Wikipedia: the power of good non commercial approaches
Very useful
for our in-depth "private investigations"
Useful autocompletion...
Lumrix
http://wiki.lumrix.net/en/
Of course also in German and so on: http://wiki.lumrix.net/de/
http://en.wikisource.org/wiki/Main_Page Wikibooks
How come it works?
A legitimate question would be "why does wikipedia work?". Anybody and his cat can write whatever he believes to be the truth.
The arbitration committee kills only the most obnoxious kooks and wanna-be experts, and lets many incompetent buffoons
write whatever they want. The whole project is open
to trolls, with little defence against them. Yet it works perfectly, and this annoys all the clowns that hate any free successful project :-)
There are even various idiots and subhumans that PLANT false information in wikipedia, on purpose, in order to accuse
it immediately afterwards of spreading false information. To no avail. Wikipedia works very
well.
So how come it works?
It works BECAUSE it is an open, anarchistical, non commercial, collaborative project: the power of the unwashed
masses against the academical experts.
Wikipedia is about as accurate on science as the Encyclopaedia Britannica:
The British journal Nature ran blind tests asking experts to compare scientific entries from both publications.
The reviewers were asked to check for errors, but were not told about the source of the information.
Only eight serious errors, such as misinterpretations of
important concepts, were detected in the pairs of articles reviewed,
four from each encyclopaedia.
Reviewers found 162 factual errors in the Wikipedia documents, compared to 123 in the Britannica documents.
Nature also said that its reviewers found that Wikipedia entries were often poorly
structured and confused.
Wikipedia's reliability is just a byproduct of the sheer SCALE of the project. It is not due to a peer-controlling academic
process (with all its strengths and weaknesses), but to the 'self-improving' nature of information that is shared on the web.
That's the reason why its reliability, already better than many academic experts would be ready to admit, is IMPROVING. That's
why anybody that has some expertise in some specific field should contribute. That's why I will do it myself as soon as I find the time :-)
Wikipedia is indeed "a brilliant product of open-source intellectual collaboration". Even its enemies
now have obtorto collo to admit it :-)
Caveat lector, of course, but this holds true for all "established" encyclopaedias as well.
Indeed much knowledge lies outside of academic study and "experts" . However -again- caveat lector:
any seeker would soon be confused, inside or outside academia, without solid evaluation skills.
A small digression about scientific articles
"The contradictions of journal searching"
Now, let's imagine that for our in-depth "private investigations" we need a given COMPLETE ARTICLE,
not an abstract, a complete text, and we do not want to pay anyone for that.
Let's imagine we want
something mathematics-related: I have chosen as examples ["polynomial"] and ["prime factorization"].
Most searchers would use the two most "common" search engines for MATHEMATICS-RELATED articles
of the visible web:
http://www.emis.de/ZMATH/, which you can use to start a search and
http://www.ams.org/mathscinet/search which you SHOULD NOT use, due to its commercial crappiness
Let's search for "polynomial"
Let's imagine we are interested in the
third result: "The minimum period of the Ehrhart quasi-polynomial of a rational polytope", alas! Now we would be
supposed "to pay" in order to consult/see/download it.
But we're seekers, right?
Let's use a part of the abstract in order to fetch our target in extenso: " called the Ehrhart quasi-polynomial of"...
see? Let's repeat this with any other article on this database...
Of course we could also have used google scholar
So, we have seen how to bypass commercial yokes using the previously explained "long string searching" approach.
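The "long string searching" approach is easy to mechanise: take a visible snippet of text, cut it into runs of consecutive words, and submit each run as a quoted exact phrase. A minimal sketch follows. The function name phrase_windows and the default window size are my own assumptions, not something the search engines prescribe.

```python
import re


def phrase_windows(text: str, size: int = 6) -> list[str]:
    """Return every run of `size` consecutive words as a quoted phrase query.

    Hyphenated words such as "quasi-polynomial" are kept as one word.
    """
    words = re.findall(r"[\w'-]+", text)
    return ['"' + " ".join(words[i:i + size]) + '"'
            for i in range(len(words) - size + 1)]


if __name__ == "__main__":
    snippet = "called the Ehrhart quasi-polynomial of"
    for query in phrase_windows(snippet, size=4):
        print(query)
```

Longer windows are more specific but break as soon as a copy of the text differs by one word, so it may pay to try a few sizes.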
The funny thing is that the web is so deep that we do not need at all to go through such bazaars.
In fact the "open source" waves are already purifying the closed world of the scientific journals as well. Good riddance!
Let's search on The Front (arxiv.org), that
is slowly beating the two "established" euroamerican commercial repositories black and blue...
for instance: "prime factorization",
but, to keep our previous example, also: "The minimum period of the Ehrhart"... et voilà.
On one side the Americans, who do not even let you search if you do not pay up-front (US-mathscinet) & on the other
one the Europeans, who let you search, but then
want you to pay in order to fetch your results (EU-ZMATH). Of course we could still
find our targets starting from there, but it
is refreshing to know that there is also -amazingly coexisting on the same web- a complete 'journals' search engine, with a
better (& rapidly growing) database and everything you need for free: the Front ("It freed anyone from the need
to be in Princeton, Heidelberg or Paris in order to do frontier research").
So -once again- the web is BOTH a bottomless cornucopia and an immense
commercial garbage dump, and -of course- you need to know how to search both sides of the same mirror.
First of all, for your 'private investigations' a good start is the ISBN finder that all major
search engines provide:
isbn 0596005458 at google
isbn 0596005458 at yahoo
All on-line repositories are quite useful for finding books:
http://www.uploadscout.com/UploadScout/newindex.aspx: rapidshare & megaupload index.
You could input -for instance- digital photography on that search mask, but you can as well search with
rapidshare digital.photography: google
rapidshare "digital photography": yahoo
{frsh=94} {mtch=69} {popl=33} rapidshare "digital photography": msnsearch
or whatever local/main search engine you may like...
Well, I'm using "digital photography", or "photoshop"
query examples just to demonstrate that
finding "photoshop-related" books is almost as easy as writing them (everyone and his dog is writing a photoshop book
nowadays).
Yet
maybe many of the friends in this room would prefer, instead of "digital photography",
this kind of books?
Note, however, that the rapidshare search-examples above are JUST ONE EXAMPLE:
Rapidshare is one of many "upload repositories" where people can (and do with gusto) upload large files.
It's quick, it allows unlimited downloads, and it has some free-happy-hours in the morning.
So you don't need, of course, to pay. But there are many similar repositories:
rapidshare.de/: 30 Mb max, forever but after 30 days unused the file is removed, daily download limit of 3,000 MB for hosted files
YouSendIt: 1 Giga max, after 7 days or 25 downloads (whichever occurs first)
the file is automatically removed
mytempdir: 25 Mb max, 14 days * 1200 free downloads, after that only from 23.00 to 7.00.
Sendmefile: 30 Mb max, after 14 days the file is automatically removed
Megaupload: 500 Mb max (!), forever but after 30 days unused the file is removed (like rapidshare)
ultrashare.net/: 30 Mb max, forever but after 30 days unused the file is removed (like rapidshare)
http://www.spread-it.com/: 500Mb - Forever or after 14 days if unused
http://turboupload.com/: 70Mb - download delay in order to show pub
http://www.4shared.com/: 100Mb - 10Mb per file Forever or after 30 days if unused
and so on, a fairly complete list of many Files and images repositories is here.
Anyhow,
there's a whole section regarding books searches (and a fairly complete
library as well) at searchlores, and, if interested in finding
any book whatsoever, you'll be able
to find more pointers there.
Suffice to say that most books mankind has written are already on the web somewhere,
and that while we are sitting here, hic et nunc, hundreds of fully scanned libraries are going on line... if you
are attentive enough,
and if your searching scripts are good, you can
even hear the clinking "thud" of those huge databases going on line...
In order to fetch books you just need some correct strings.
A simple trick is to use the powerful A9 engine,
for instance, for conan doyle,
http://a9.com/conan%20doyle?a=obooks
and then fetch the
study
in scarlet.
Of course once we have some arrows, it is
relatively easy to fetch whole copies of a book all over the web...
A simple trick is to use google's books' search facility. Let's search for 'The Hound of the Baskervilles':
http://books.google.com/books?q=doyle&btnG=Search+Books&hl=en:
now let's choose a phrase from page three (more pages we cannot see because of the "patents' dictatorship"): "The probability lies in that direction":
It's more than enough: "The probability lies in that direction". q.e.d.
Of course this is also true for all kind of patented books... let's see:
"I have no fitting gifts to give you at our parting,"...
and we land here, for instance: 'I have no fitting gifts to give you at our parting,' said Faramir;
`but take these staves... (J.R.R. Tolkien: Two Towers)
The more "popular" a target, the easier it is to find it:
"some students were standing up to get a better look at Harry as he sat, frozen, in his seat"
(Harry Potter and the Goblet of Fire)
A msnsearch "index of" webbit:
Nov-2005
intitle:"Index of /" {frsh=9999} , for instance Nov-2005 intitle:"Index of /" {frsh=9999} "digital photography"
A "classical" Bookish Webbit: -inurl:htm -inurl:html intitle:"index of" +("/ebooks"|"/book") +(chm|pdf|zip) +"o'reilly"
IMAGES SEARCHING APPROACHES
Target name guessing.
Uhmmm. Is this a 'mature' and sufficiently politically incorrect audience? Anyway we are speaking of 'private' investigations,
aren't we?
Here is a better search than
the (still working) playmates index.of that we have seen
at the beginning:
1981-02.jpg playmates
Of course you could try a different approach:
Finally, a very nice trick to avoid those "index of" spammers & clowns
intitle:"index of/" "Apr-2004" "jpg" playboy
note
the "Apr-2004" snippet, that you can change at leisure :-)
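Changing the date snippet "at leisure" is easy to script. The sketch below generates consecutive month tokens in the "Apr-2004" shape that directory listings display, and drops each into the intitle webbit. Both function names are hypothetical, and the English month abbreviations are hardcoded on purpose, so the result does not depend on your locale.

```python
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_tokens(year: int, month: int, count: int) -> list[str]:
    """Return `count` consecutive 'Mon-YYYY' tokens starting at year/month."""
    tokens = []
    for _ in range(count):
        tokens.append(f"{MONTHS[month - 1]}-{year}")
        # Roll over into January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return tokens


def listing_query(token: str, *terms: str) -> str:
    """Combine the 'index of' title filter, a quoted date token and free terms."""
    return " ".join(['intitle:"index of/"', f'"{token}"', *terms])


if __name__ == "__main__":
    for token in month_tokens(2004, 11, 4):
        print(listing_query(token, '"jpg"', "monet"))
```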
It's often useful to (try to) find images with some ad hoc target name guessing.
In this case, since we have one playmate every month, chances are that there are date-related images' URLs.
Note that you could just try 1983-03.jpg,
without the specifying suffix 'playmates', or you could change that suffix to 'playboy', or you could
repeat the search with
1983_03.jpg (note the underscore
instead of the hyphen) and so on, or even try
something like
playmate6.jpg, in the (correct)
assumption, that where there are at least 6 jpgs, you'll have more.
Of course such searches do not need to be so frivolous or 'Pr0n' oriented:
monet8.jpg,
and you'll land in Monet-heavy and images-rich sites.
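Target name guessing of this kind boils down to two small generators: date-shaped names with different separators, and numbered names. A sketch under the assumptions made in the text (year-month names, hyphen or underscore, a counter glued to a stem); the function names are illustrative only.

```python
def dated_names(year: int, month: int, ext: str = "jpg") -> list[str]:
    """Year-month file names with the two separators tried above."""
    return [f"{year}{sep}{month:02d}.{ext}" for sep in ("-", "_")]


def numbered_names(stem: str, upto: int, ext: str = "jpg") -> list[str]:
    """Names like monet1.jpg ... monetN.jpg for a given stem."""
    return [f"{stem}{n}.{ext}" for n in range(1, upto + 1)]


if __name__ == "__main__":
    print(dated_names(1983, 3))
    print(numbered_names("monet", 8))
```

Each generated name is then used as a search term on its own, or together with a qualifying word, exactly as in the examples above.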
Here are some useful images' repositories (à la 'rapidshare'):
http://www.fapomatic.com/,
http://www.imghost.com/,
http://www.glowfoto.com/,
http://www.imageshack.us/,
http://www.imgspot.com/,
http://www.mytempdir.com/,
http://www.bestupload.com/,
http://www.netpix.org/,
http://www.jotapeges.com/,
http://www.rapidshare.com/,
http://www.filesupload.com/,
http://www.updownloadserver.de/,
http://www.dropload.com/,
http://www.sendthisfile.com/,
http://www.fireupload.com/,
http://www.yousendit.com/,
http://www.youshareit.com/,
http://www.glintfiles.net/,
http://www.paintedover.com/,
http://www.2and2.com/,
http://www.imagehosting.com/,
http://www.xs.com/,
http://www.imagehigh.com/,
http://www.imagevenue.com/,
http://www.shareitagain.com/,
http://www.ultrashare.net/,
http://www.sendmefile.com/,
http://www.perushare.com/,
http://www.megaupload.com/,
http://www.imageranch.com/,
http://www.photobucket.com/
A completely new wave of music searching has opened up through the relatively recent
mp3 blogs phenomenon, but usually it is MUCH simpler to just fetch the music you
need from the web any time you need it.
See the Combing webbit above.
Your imagination is the limit!
Simply adding for
instance "4.6M" (or whatever similar you may fancy) to your querystring
will ensure that there are enough big and juicy MP3s
in your targets: imagine lennon mp3 OR ma4 OR ogg intitle:"Index of" -metallica +"4.6M"
Most simple trick:
"index of" imagine m4a|wma
Another one (for music videos):
?intitle:index.of? "crazy frog" wmv "axel f"
or
?intitle:index.of? "madonna" wmv
Another one (for mp3 & co):
intitle:index.of + mp3 + "garfunkel" -html -htm -php -asp -txt -pls
Another one:
intitle:index.of + "mp3" + "band name" -htm -html -php -asp
Or even this, found with the previous query, so big that it may crash our browsers...
http://24.91.184.80/jserver/files/music/
or
http://mensa.familia.rebello.nom.br/media/Som/MPG_RA_VQF/Mp3/,
found through
imagine lennon mp3 OR ma4 OR ogg intitle:"Index of" -metallica
Some of the webbits we use for images work for music as well:
intitle:"index of/" "Apr-2004" "jpg" garfunkel
However, all these techniques are overkill. In fact the amazing thing is that
even the most stupid searches, those that should NOT work, will give results:
madonna index.of mp3... q.e.d:
wherever, whenever, whatever.
Here is, as promised, some "private investigation related magic"...
Finding photos BY CAMERA model
http://photos.alexa.com/
This is interesting because it is part of the now public Alexa indexes: http://websearch.alexa.com/welcome.html
How to subdivide a query in manageable chunks (by Various Authors)
Do any search engines
or techniques exist to get more than -say- 1000 results from a search engine?
Usually you just refine your search.
You can narrow your query in various ways:
eliminating crap (the infamous -tits example)
adding further -but relevant- terms ("digital photography": 20.400.000, +"shutter priority"=190.000, +"tiff"=40.500)
and with the results it is still good to jump first
to -say- page five or ten, and then go backwards when evaluating the results :-)
If you have time, you can try things like these on Google
(the strategy for other search engines is mutatis mutandis
the same):
With this kind of strategy
you can divide your SERPs in more or less four equal parts. If you use another common keyword or feature,
you can double the number of equal sets, for each new keyword...
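The splitting strategy can be sketched as follows: every "splitter" keyword is either required (+) or excluded (-), so n keywords give 2^n queries whose result sets do not overlap and together cover the original query. The splitter words below are placeholders of my own choosing (hypothetical), and whether the parts come out "more or less equal" depends entirely on picking words that appear on roughly half of the pages.

```python
from itertools import product


def partition_queries(base: str, splitters: list[str]) -> list[str]:
    """One query per include/exclude combination of the splitter keywords."""
    queries = []
    for signs in product("+-", repeat=len(splitters)):
        parts = [f"{sign}{word}" for sign, word in zip(signs, splitters)]
        queries.append(" ".join([base, *parts]))
    return queries


if __name__ == "__main__":
    # Two splitters -> four subsets; a third would double that to eight.
    for query in partition_queries("blabla", ["the", "and"]):
        print(query)
```

Each of the resulting queries can then be paged through up to the engine's own result limit.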
There is a more direct way to achieve what you are asking, with the antipagination extension in firefox.
https://addons.mozilla.org/extensions/moreinfo.php?id=853
It flattens result pages (even works in forum pages)
There is a userjs that works in opera too and does the same only for google's result pages here:
http://userscripts.org/scripts/show/1392
Another trick:
blabla -inurl:htm 1.680.000
blabla -inurl:html 2.050.000
the differences are noticeable after the first pumped results
of course you can add and play with -/+ php or -/+ pdf or regional parameters (-/+fr -/+nl etcetera)
1) Sourceror2 (by Mordred & rai.jack)
try it right away
Right click and, in opera, select "add link to bookmarks"
javascript: z0x=document.createElement('form'); f0z=document.documentElement; z0x.innerHTML = '<textarea rows=10 cols=80>' + f0z.innerHTML + '</textarea><br>'; f0z.insertBefore(z0x, f0z.firstChild); void(0);
javascript:document.write(document.documentElement.outerHTML.replace(new RegExp("<","g"), "&lt;"));
2) Another google approach
http://www.google.com/complete/search?hl=en&js=tru%20e&qu=photography
3) Another google approach (by Mordred)
Here is a
way to gather relevant info about your target
"index+of/" "rain.wav******"
Useful to see date and size that follow your target name...
Bookmarklets: Weapons for the seeker
4) Google's advanced operators: "aeroplane finder" and other crap
Here is a
google's easy implementation:
from berlin to helsinki
Clicking on the first link
you have an automatic price comparison.
On a similar path, there is the useful define: operator we have already seen, and all
the other advanced operators (stocks and other crap).
A possibly useful one is the 'change' operator: 234 USD in euro,
234 euro in CHF,
234 french money in GBP,
currency
of germany in malaysian money
and so on.
Other useful possibilities are the mathematical operators:
twenty miles in kilometers
45 Fahrenheit in celsius
((894151*66771)+456)/1241: 48 109 070.8
But here you should not use google: for mathematical calculations
yahoo is better: ((894151*66771)+456)/1241=48,109,070.8114423826.
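If you want to check such figures yourself rather than trust either engine, exact integer arithmetic settles it. A short sketch, assuming Python's standard decimal module; the function name check_quotient and the default of 18 significant digits are my own choices. In exact terms the division leaves quotient 48,109,070 and remainder 1007, so the fractional part is 1007/1241.

```python
from decimal import Decimal, localcontext


def check_quotient(digits: int = 18):
    """Evaluate ((894151*66771)+456)/1241 exactly and to `digits` digits."""
    numerator = 894151 * 66771 + 456           # exact integer arithmetic
    quotient, remainder = divmod(numerator, 1241)
    with localcontext() as ctx:                # do not disturb global precision
        ctx.prec = digits
        value = Decimal(numerator) / Decimal(1241)
    return quotient, remainder, value


if __name__ == "__main__":
    quotient, remainder, value = check_quotient()
    print(quotient, remainder, value)
```

Run it and compare the trailing digits with the two figures quoted above.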
5) ElKilla bookmarklet (by ritz)
try it right away (no more clicking, press DEL to delete and ESC to cancel)
Right click and, in opera, select "add link to bookmarks"
More about bookmarklets in the javascript bookmark tricks essay.
http://fireddl.info/apps.htm: one of the many doors to the warez world
SEARCHING FOR DISAPPEARED SITES
http://webdev.archive.org/
~ The 'Wayback' machine, explore the Net as it was!
Visit The 'Wayback' machine at Alexa.
Alternatively, learn how to navigate through
[Google's cache]!
(http://www.netcraft.com/ ~ Explore 15,049,382 web sites)
VERY useful: you find a lot of sites based on their own name, which is another possible way to get to your target...
http://local.live.com/: pretty good ms concoction
http://maps.google.com/: google starter
http://maps.yahoo.com/: Yahoo (limited to the states and kanuk): for instance:
zip 56554
This is useful for ALL Europe:
http://www.nl.map24.com/ just input street, town and country :-)
The same in english: http://www.uk.map24.com/
Check also the ad hoc stalking section peoplesearch
Structure of the Web ~
WebStructure + Hidden Web ~
Main search engines' coverage ~
Short and long term seeking: % ~
Short and long term seeking: noise ~
Structure of the web
Short and long term seeking: percentages
Main search engines' coverage
Bulk, Hidden web and main search engines' coverage
The structure of the Web, Hidden web visible
Structure of the web. Explain tie
model and diameter 19-21: do not despair, never...
How big is the web? 24 billion? The s.e. cover between 1/3 and 1/4...
Short term searching/Long term searching: noise
Popularity versus time
Both axes on a logarithmic scale
1) 5 minutes: Harry Potter,
the Goblet of Fire ~ the Half-Blood Prince (pdf or doc)
2) John Lennon "Imagine" (mp3 or ma4) (using imagine lennon mp3 OR ma4 OR ogg intitle:"Index of" -metallica and jumping direct to page 3)
3) Lord of the rings trilogy (pdf, html or audiobook)
4) Nero Wolfe: The girl who cried wolf, audio (or any other earlier radioshow)
5) an old Louis Vuitton advertisement (they want us to pay, so let's enlarge it manually):
http://media3.adforum.com/zrIf58670C/E/EU/EURR_01547/EURR_01547_0005730W.JPG ... you want more?
http://media3.adforum.com/zrIf58670C/E/EU/EURR_01547/EURR_01547_0005730A.JPG
6) Several years: a black-and-white Bulgarian film of the fifties, or even from the late seventies,
for instance
ADVANTAGE
Bulgaria 1978, 142 min. Dir.: Georgi Dyulgerov
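The querystring of example 2 above (a couple of plain terms, a choice of file extensions, an intitle: filter and an excluded word) can be assembled mechanically. A minimal sketch: buildQuery and googleUrl are names invented here, and the start=20 offset for "page 3" assumes ten results per page.

```javascript
// Sketch: assemble a query like
//   imagine lennon mp3 OR m4a OR ogg intitle:"Index of" -metallica
// All function and field names below are hypothetical, chosen for illustration.
function buildQuery({ terms, anyOf = [], intitle = "", exclude = [] }) {
  const parts = [...terms];
  // Alternatives (for instance file extensions) are joined with OR.
  if (anyOf.length > 0) parts.push(anyOf.join(" OR "));
  // Restrict to pages whose title contains the given phrase.
  if (intitle) parts.push('intitle:"' + intitle + '"');
  // A leading minus excludes a word.
  for (const word of exclude) parts.push("-" + word);
  return parts.join(" ");
}

// Assumption: results are paged ten at a time through a "start" offset,
// so page 3 begins at start=20.
function googleUrl(query, page = 1) {
  return (
    "https://www.google.com/search?q=" +
    encodeURIComponent(query) +
    "&start=" +
    (page - 1) * 10
  );
}

const query = buildQuery({
  terms: ["imagine", "lennon"],
  anyOf: ["mp3", "m4a", "ogg"],
  intitle: "Index of",
  exclude: ["metallica"],
});
console.log(query);
console.log(googleUrl(query, 3));
```

Swap the terms, the extensions and the excluded noise-word and the same skeleton serves for the other targets in the list.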
What a search looks like (Private investigations: The cranberry path) | Top |
"Your nose is as red as that cranberry sauce," answered Fan,
coming out of the big chair where she had been curled up for an
hour or two"
Hey, what the heck is a cranberry?
NON SPECIFIC LINKS/APPROACHES (can be used for most targets, doesn't need to be a cranberry :-)
google define: cranberry
wikipedia cranberry
yahoo education cranberry
cranberry: The Columbia Encyclopedia, 6th Edition.
Assorted links (out of thin air):
cranberry institute --> cranberryinstitute;
cranberry magazine --> cranberriesmagazine
cranberry bibliography --> Maine Uni bibliography
Wisconsin Cranberry School Proceedings (browse journals)
google images
& yahoo images
SPECIFIC LINKS/APPROACHES (cranberry-related: should be used only for plants-targets)
plants database
Synecdochical searching | Top |
A
Synecdoche ("sin-EK-doh-kee") is the rhetorical or metaphorical
substitution of a part for the whole, or vice versa. This approach is widely used in searching,
because it allows you to get at your signal 'from the bottom', eliminating part
of the noise.
For some specific examples see synecdoc.htm.
Here let's just have "a visual look" at a search:
The red cylinder below represents the TOTALITY of accessible web sites that
could be of interest to you -in the context of your current search. The small
rings show four different specific clusters of interesting sites.
Please remember that inside the cylinder the 'void' is only APPARENT!
That's the part
of the internet you cannot reach through the main search engines. There are interesting sites
there as well (as a matter of fact MANY more than on the 'accessible' outside), but to grab them you'll have to use
more advanced techniques than commercial engines :-)
1 You first land on an interesting cluster of sites through your 'clean cut'
2 You have 'synecdochically' moved horizontally, modifying your original clean cut
3 These sites will be relatively easy to find: they lie both on a horizontal and on a
vertical synecdoche. Note that the signal width of the vertical synecdoches
(e.g. the yellow one on the right side of the image) may vary quite a lot,
while the width of horizontal synecdoches seems more constant.
4 You'll never find this cluster
with your current synecdochical approaches, you'll have to
devise a COMPLETELY DIFFERENT cut.
Regional searching The importance of languages and of online translation services and tools | Top |
One of the main reasons why the main search engines together cover (at best) just something less than 1/2 of the web
is a LINGUISTIC one. The main search engines are, in fact, "Englishcentric" if I may use this term, and
in many cases - which is even worse - are subject to a heavy "Americancentric bias".
The web is truly international, to an extent that even those who did
both physically travel and virtually browse a lot tend to underestimate.
Some of the pages you'll find may point to problems, ideals and aims so 'alien' from your point
of view that -even if you knew their languages or if they happen to be in English- you
cannot even hope to understand them.
On the other hand this multicultural
and truly international cooperation may bring some fresh air in a
world of cloned Euro-American zombies who drink the same coke with the same bottles, wear the same shirts,
the same shoes (and the same pants),
and sit ritually in the same McDonalds in order to perform their compulsory
and quick "reverse shitting".
But seekers need to understand this Babel if they want to add depth to their queries.
There are MANY linguistic aids out there on the web, and many systems that allow you to translate a page, or a snippet of text from,
say, Spanish into English or vice versa. But much rarer, and much more useful for us, are sites
that allow us to understand -even roughly- pages written in Japanese, Chinese, Hindi, Russian, Korean, you name the funny alphabet :-)
As an example of how powerful such services can be in order to understand, for example, a Japanese site,
have a look at the following trick:
RIKAI
An incredible translator!
http://www.rikai.com/perl/Home.pl
Try it for instance on http://www.shirofan.com/ See? It "massages" WWW pages and
places "popup translations" from the EDICT database behind the Japanese text!
for instance
http://www.rikai.com/perl/LangMediator.En.pl?mediate_uri=http%3A%2F%2Fwww.shirofan.com%2F
See?
You can use this tool to "guess" the meaning of many a Japanese page or -and especially- Japanese search engine options,
even if you do not know Japanese :-)
You can easily understand how, in this way, you can -with the proper tools- explore the wealth of results that the
Japanese, Chinese, Korean, you name them, search engines may (and probably will) give you.
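The mediator address shown above is nothing more than the target address, percent-encoded, glued onto the mediate_uri parameter. A tiny sketch (rikaiUrl is a made-up helper name; the endpoint is the one quoted in this talk and may of course move or vanish):

```javascript
// Sketch: wrap any page address in the Rikai mediator address quoted above.
// rikaiUrl is a hypothetical helper; only the endpoint comes from the talk.
function rikaiUrl(pageUrl) {
  // encodeURIComponent turns ":" into %3A and "/" into %2F, as in the example.
  return (
    "http://www.rikai.com/perl/LangMediator.En.pl?mediate_uri=" +
    encodeURIComponent(pageUrl)
  );
}

console.log(rikaiUrl("http://www.shirofan.com/"));
```

Feed it the address of a Japanese search engine instead of a fan site and you get its options page with the popup translations attached.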
Let's search for "spanish search engines"... see?
Let's now search for "buscadores hispanos"... see?
A 'portable' translator
| Top |
Highlight the following text:
Nous sommes en 50 avant Jésus-Christ. Toute la Gaule
est occupée par les Romains... Toute? Non! Un village peuplé d'irréductibles
Gaulois résiste encore et toujours à l'envahisseur. Et la vie n'est pas facile
pour les garnisons de légionnaires romains des camps retranchés de Babaorum,
Aquarium, Laudanum et Petitbonum...
click here: translate,
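javascript:(function () {
  /* language pair: change 'fr|en' to any pair the service supports */
  var params = '?langpair=fr|en';
  var txt = '';
  if (window.getSelection) {
    /* String() turns the Selection object into the highlighted text */
    txt = String(window.getSelection());
  } else if (document.selection) {
    /* old MSIE */
    txt = document.selection.createRange().text;
  }
  if (txt) {
    params += '&text=' + encodeURIComponent(txt);
  }
  /* when saved as a bookmarklet the whole script must be kept on 1 line */
  void window.open('http://translate.google.com/translate_t' + params, 'translate',
    'location=no,status=yes,menubar=no,scrollbars=yes,resizable=yes,width=547,height=442');
})();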
Ok, simple and quick (and rough) javascript. French into English was easy, of course. But -again-
note inside the code the params = '?langpair=fr|en'; snippet, that you can change to anything!
For instance to Korean
in order to translate or to browse starting from the following:
http://www.japanpr.com/shimane/shimane_default.htm
click here: Translate the page ko|en,
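Since the langpair is the only part that changes, the translation address (and a one-line bookmarklet for it) can be generated for any pair of languages. A rough sketch, assuming the translate_t endpoint used in this talk; translateUrl and makeBookmarklet are names invented here, and the service may well have changed its address since.

```javascript
// Sketch: build the translation address for any language pair.
// Assumption: the translate_t endpoint and the langpair/text parameters of this talk.
function translateUrl(text, from, to) {
  const params = "?langpair=" + from + "|" + to + "&text=" + encodeURIComponent(text);
  return "http://translate.google.com/translate_t" + params;
}

// Sketch: emit a bookmarklet for a given pair, already on one single line.
// Only /* */ comments (or none) are safe inside a one-line script.
function makeBookmarklet(from, to) {
  return (
    "javascript:" +
    "var t=String(window.getSelection?window.getSelection():'');" +
    "void window.open('http://translate.google.com/translate_t?langpair=" +
    from + "|" + to + "'" +
    "+(t?'&text='+encodeURIComponent(t):''),'translate');"
  );
}

console.log(translateUrl("Toute la Gaule", "fr", "en"));
console.log(makeBookmarklet("ko", "en"));
```

Paste the string printed by makeBookmarklet into the address field of a new bookmark, highlight some text and click.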
I would also like to draw your attention to the paramount
importance of names on the web...
The ethical aspect...
An unfair society...
websearch importance nowadays recognized and obvious, you'll see tomorrow :-)...
libraries and documents: frills and substance...
the guardian of the light tower, the young kid in Central Africa and the yuppie in New York...
Ode to the seekers
Like a skilled native, the able seeker has become part of the web. He knows the smell of
his forest: the foul-smelling mud of the popups, the slime of
a rotting commercial javascript. He knows the sounds of the web: the gentle rustling of the jpgs,
the cries of the brightly colored
mp3s that chase one another among the trees, singing as they go;
the dark snuffling of the m4as, the mechanical, monotone clinking
of the huge, blind databases, the pathetic cry of the common user:
a plaintive cooing that slides from one useless page down to the next until
it dies away in a sad, little moan. In fact, to all
those who do not understand it,
today's Internet looks more and more
like a closed, hostile and terribly boring commercial world.
Yet if you
stop and
listen attentively, you may be able to hear the seekers, deep in the shadows,
singing
a lusty chorus of praise to this wonderful world of theirs -- a world that gives them everything they want.
The web is the habitat of the seeker, and in return for his knowledge and skill
it satisfies all his needs.
The seeker
no longer even needs to hoard on his hard
disks whatever he has found: all the various images,
music, films, books and whatnot that he fetches from the web...
he can just taste and leave there what he finds, without even copying it, because he knows that nothing
can disappear any more:
once anything lands on the web, it will always be
there, available for eternity to all those that possess its secret name...
The web-quicksand moves all the time, yet nothing can sink.
In order to fetch all kinds of delicious fruits, the seeker just needs to raise his sharp searchstrings.
In perfect
harmony with the surrounding internet forest, he can fetch again and again, at will, any target he fancies,
wherever it may have been "hidden". The seeker
moves unseen among sites and backbones,
using his anonymity skills, his powerful proxomitron shield and his mighty HOSTS file.
If need be, he can quickly hide among the zombies, mimicking their behaviour and thus disappearing into the mass.
Moving silently along the cornucopial forest of his web, picking his fruits and digging his jewels,
the seeker easily avoids the many vicious traps that have been set to catch
all the furry, sad little animals that happily use MSIE (and outlook), that use only
one-word google "searches",
and that
browse and chat around all the time without proxies, bouncing against trackers and web-bugs
and smearing all their personal data around.
Moreover the seeker is armed:
his sharp browser will quickly cut to pieces any slimy javascript
or rotting advertisement that the
commercial beasts may have put on his way. His bots' jaws will tear apart any database defense, his powerful
scripts will send
perfectly balanced searchstrings far into the forest.
So, that was it. Any questions?
Your own private investigations
The power of searching at your fingertips, what are you waiting for?
Start your own private investigations! Here are two rather naïve examples.
1) Inflation
Don't you have the impression that the real inflation
we all have to endure (with more and more expensive everyday prices) is waaay more than that
ludicrous 2,1% (circa) that our powers that be claim year after year?
Well, there are a series of newspapers with their COMPLETE ARCHIVES on the web, searchable, for free.
Supermarket chains,
Aldi, Carrefour, you name it, have also published on the web their "fabulous" offers and prices.
Or try this:
http://www.google.com/catalogs :-)
You'll be able to find the older pages, as we have seen, using webarchive or similar web-snapshot repositories.
Assignment: Find the real inflation using all available data.
Some simple suggestions:
Use an
average of price components that is weighted and categorized in
approximately the same manner as the official Consumer Price Index.
Housing should represent the largest component at 40% with other categories having lesser impact.
The inflation rate should be calculated as a price multiplier with a base year of 1995,
to represent the number of "1st January 2006"
euro that are required to purchase what one euro (or its pre-euro equivalent) bought in 1995.
The annualized inflation rate is the
equivalent average compounded yearly inflation rate over the 10-year period.
Take account of education and medical care costs, also easy to find and check on the web (some combing
and social engineering will go
a long way in order to find them).
Create two subgroups: 1995-1999, 2000-2005
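The arithmetic behind these suggestions is short: the price multiplier is the weighted average of each category's price ratio (price now divided by price in the base year), and the annualized rate is multiplier^(1/years) - 1. A sketch only: apart from the 40% housing weight suggested above, every weight and every price ratio below is an invented placeholder, to be replaced with the figures you dig out yourself.

```javascript
// Sketch: weighted price multiplier and annualized (compounded) inflation rate.
// ALL ratios and all weights except housing (40%, as suggested above) are
// invented placeholders, NOT real data.
const basket = [
  { category: "housing", weight: 0.4, ratio: 2.0 },
  { category: "food", weight: 0.2, ratio: 1.8 },
  { category: "transport", weight: 0.15, ratio: 1.9 },
  { category: "education", weight: 0.1, ratio: 2.2 },
  { category: "medical care", weight: 0.15, ratio: 2.1 },
];

// ratio = price today / price in the base year, per category.
// Dividing by the total weight keeps the result sane if weights do not sum to 1.
function priceMultiplier(items) {
  const totalWeight = items.reduce((sum, item) => sum + item.weight, 0);
  const weighted = items.reduce((sum, item) => sum + item.weight * item.ratio, 0);
  return weighted / totalWeight;
}

// Average yearly rate that, compounded over `years`, yields `multiplier`.
function annualizedRate(multiplier, years) {
  return Math.pow(multiplier, 1 / years) - 1;
}

const multiplier = priceMultiplier(basket);
const rate = annualizedRate(multiplier, 10);
console.log("multiplier:", multiplier.toFixed(3));
console.log("annualized rate:", (rate * 100).toFixed(2) + "%");
```

For the two subgroups, run the same two functions on the 1995-1999 ratios and on the 2000-2005 ratios separately, adjusting the number of years accordingly.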
If you do it, you will soon realize that -while the euro itself has nothing to do with it-
the high inflation trend (around 6~8% real, not 2,1%) has been a (quite interesting) constant.
Purchasing power and living standards have been steadily reduced, in Europe and elsewhere,
through a higher than admitted real inflation, which
translates of course into an automatic salary decrease for the great majority, bar speculators.
This has been coupled
with a prolonged (by law) "active life" (read shorter pension), and longer working hours (and working days) without
any salary compensation whatsoever for the unwashed masses.
2) Punishing greedy and corrupt ones
D'you have in your town a station being built, a new industrial area being planned, any building permits being granted,
any committee for the management of public housing?
You can bet that -in 99% of cases- someone is using law-loopholes and/or a net of political protection in order
to make money illegally.
But you now have the power of the seeker! Don't underestimate it.
You can explore all newspapers' databases, you can easily find
related
news, you can seek in many
languages...
In a more and more Internet-oriented society a seeker can find out quite a lot about
his targets.
You can stalk people, lure and/or troll info out of them or about them, find out where they live,
how much they earn, when, where and
how they started to work (political appointment? Public competition? Father's connections?)
You can, with simple social engineering tricks, get in touch with their co-workers, enter their databases,
have a look at the code of their .doc documents, where Word, often enough and by default, keeps all the
corrections and changes which have been made to a document...
Your 'private investigations' may be small crumbs, but even small crumbs may grind the
well-greased wheels of your own local political/commercial vermin!