
Yahoo!/AllTheWeb's image search syntax

No one lights a candle and hides it under a bushel, except Yahoo!

 


First published @ Searchlores in June 2007 | Version 0.01 | By Nemo


¤ Introduction ¤ AllTheWeb's syntax ¤ References ¤

Introduction

Is AllTheWeb dead? Not quite: AllTheWeb image search (the engine powering Yahoo! image search) is alive and kicking, and underwent major redevelopment last year. Unfortunately Yahoo seems to be shy, and once more seekers had to join the dots to guess the syntax, which is the most advanced on earth (followed quite closely by Exalead, which offers regular expression searches). The purpose of this essay is precisely to document Yahoo's image search syntax and to provide examples showing its usefulness.

To begin, how do we know the syntax? Well, Yahoo bought three of Google's competitors (AllTheWeb, AltaVista and Inktomi) in 2003/2004, so it is a good guess that one of these is powering Yahoo image search. To know which one, all we have to do is test the three search syntaxes:

As we can see, the AllTheWeb search syntax, cf. [1] and [2], is the supported one. On the other hand, Paul Bausch published in Yahoo! Hacks the Yahoo video search syntax, which had already leaked to the web, cf. [3] and [4]. As Yahoo video search also supports the AllTheWeb search syntax, it is a good guess that image and video search share the same query language, which is the case for most operators. From the names of the operators, I guessed another one: fromurl:.

Now that you know how I found the syntax, let's see how to use Yahoo image search to its full power.

AllTheWeb's syntax

Elementary seeking-jutsu

Let's start with the most basic seeking-jutsu, the image's color, which is useful for searching old images, from when photos were black & white. Compare: Champs-Elysées (black & white) vs. Champs-Elysées (color). It is also useful for searching scans, as documents are usually black & white. Compare: Ode to Joy (black & white) vs. Ode to Joy (color).

Another easy one is the image's format (JPG, GIF or PNG). Almost everyone knows that GIF and PNG are great for computer-generated graphics, clip art and drawn graphics with few colors or large blocks of color, whereas JPG is far better for photographs and illustrations with large numbers of colors, as each format gives better quality for the same file size on its own kind of image. As far as I know, Yahoo image search does not have a file type operator; the best we can do is search for keywords in the URL, and the operator for that is url.all:. Compare for instance:

Intermediate seeking-jutsu

Like it or not, you must use words for searching images and, besides, image search engines are totally blind! For instance, if you are searching for oranges, Yahoo! shows you oranges not because it is able to identify the images' content, but because the word oranges appears in at least one of the following locations:

  1. the image's file name,
  2. the image's alt attribute (an HTML snippet of text which is supposed to describe the image's content to blind people using an audio or braille browser),
  3. a small snippet of text in the image's neighborhood,
  4. the image's and the document's URL, perhaps the document's title and the anchors (visible link text) of links pointing to the document containing the image.

To get a better grasp of what the hell I'm talking about, take a look at the same query, oranges, on AltaVista image search, which is powered by AllTheWeb image search. If you click on the link More info of the fourth image search result, you'll see the small snippet of text (Abstract:), taken from the original document, describing the image.

Now that you grasp the importance of names, let's see how AllTheWeb handles words. AllTheWeb uses an automatic Boolean AND between terms, but it also offers a way of searching for any of your search terms (OR operation): enclose them in parentheses. Compare illumination parchment and (illumination parchment). The main uses of the OR operator are expanding your search query with synonyms, because different people use different words to describe the same thing, and giving a context to your query. For instance, if you were searching for illuminations on parchment, you could use the following expanded query: illumination (parchment vellum), where the expanded term is replaced by its synonyms enclosed in parentheses. To get synonyms, you can use the following online tools:
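
The grouping rule above can be sketched in a few lines of Python. This is only an illustration of how such a query string is assembled; the helper names or_group and and_query are made up for this sketch and are not part of any Yahoo or AllTheWeb tool.

```python
def or_group(terms):
    """Wrap alternatives in parentheses, which AllTheWeb reads as OR."""
    return "(" + " ".join(terms) + ")"


def and_query(*parts):
    """Join parts with spaces; the engine applies AND between them."""
    return " ".join(parts)


# Expand "parchment" with a synonym, keep "illumination" mandatory.
query = and_query("illumination", or_group(["parchment", "vellum"]))
print(query)  # illumination (parchment vellum)
```

Keeping the synonyms in a list makes it easy to add or drop alternatives without retyping the whole query.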

Warning! For reasons that will be explained later, along with the respective syntax, images from the site flickr.com will be excluded preemptively by appending -fromsite:flickr.com to the following queries.

Let's see another use of the OR operator: giving a context to your search query. If you were searching for illustrations of Captain Nemo -fromsite:flickr.com or Gandalf -fromsite:flickr.com, you would be disappointed with what you get, as the search results are flooded with commercial trash and littered with pictures of someone's favorite pet. The root of the problem is that search engines do not have a clue what you want. It's up to you to give them the context of your query string. That is fairly easy to do: just go to the Seekers' Oracle, type two keywords which belong to the same context in the box, select web search, hit search and you'll get a list of keywords, sorted by frequency, which appear in Yahoo's web search results and are likely to be related to your search terms. The rationale is that it is far easier to get related keywords by seeing them than by brainstorming. With this in mind, let's see how we give a context to our two queries. For the first, using the keywords nemo verne, we easily spot the following keywords belonging to our context (verne, jules, captain, leagues, sea, under, nautilus, mysterious, submarine, island, underwater, ocean, voyage & crew), which allow us to build the following query:

nemo (verne jules captain leagues sea under nautilus mysterious submarine island underwater ocean voyage crew) -fromsite:flickr.com

Similarly for Gandalf, using the keywords gandalf tolkien, gandalf aragorn and gandalf frodo we manage to give a context to our query:

gandalf (tolkien rings lord hobbit wizard ring frodo middle earth lotr bilbo fellowship grey white aragorn king return towers balrog saruman hobbits galadriel legolas sam baggins sauron battle elrond gimli gollum magic moria council theonering gwaihir elves hobbiton pippin three wise battles thorin evil shire arwen merry boromir gondor theoden tirith faramir eomer minas elf rohan secret eowyn son heir isengard rivendell denethor army arathorn palantir samwise throne pipe wood quest journey mordor rivendell friends light orcs gamgee destroy sword dead death precious mithril) -fromsite:flickr.com
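
The idea behind a frequency-sorted keyword list can be sketched as follows. This is not the Seekers' Oracle's actual code, which I have not seen; it is a minimal guess at the principle, and the snippets, the stopword list and the function name related_keywords are all invented for the illustration.

```python
import re
from collections import Counter

# A tiny, arbitrary stopword list; a real tool would need a longer one.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "by", "is", "on", "where"}


def related_keywords(snippets, seeds, top=10):
    """Count words across result snippets, skipping seeds and stopwords."""
    counts = Counter()
    for text in snippets:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and w not in seeds)
    return [word for word, _ in counts.most_common(top)]


# Made-up snippets standing in for real search results.
snippets = [
    "Captain Nemo and the Nautilus in Twenty Thousand Leagues Under the Sea",
    "Jules Verne: Captain Nemo, commander of the submarine Nautilus",
    "The Mysterious Island by Jules Verne, where Captain Nemo returns",
]
keywords = related_keywords(snippets, seeds={"nemo", "verne"})
print(keywords)
```

The most frequent words float to the top, and the seeker still has to pick by eye which of them really belong to the context.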

But only a quarter of the job is done, as so far we have only eliminated the documents where our main keyword appears alone, either by design (off-topic pages or pages containing random lists of keywords made by incompetent spammers) or by chance (someone's pet name). Of course savvy SEOs have also done this kind of research and have stuffed and optimized their pages with keywords from this context... Notwithstanding, they are betrayed by their own pages, which expose the keywords revealing how they monetize their "content"... The way to handle these offenders is to get those very keywords. For that, we once more use the Seekers' Oracle, but this time with only one keyword, to maximize the crap to content ratio in the search results! As web pages and images are likely to be monetized in different ways, we choose the option images to spot the keywords clinging to our main keyword in image search results. For instance, using the keyword nemo we spot the following list of troublesome keywords: finding, disney, movies, movie, shop & jamba. All we have to do is exclude the pages containing those very same keywords:

nemo (verne jules captain leagues sea under nautilus mysterious submarine island underwater ocean voyage crew) -fromsite:flickr.com -"finding nemo" -disney -movies -movie -shop -jamba
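
Long queries like this one are easier to maintain when they are assembled from lists. The sketch below rebuilds the query above; the helper name exclude is hypothetical, and the only syntax it relies on is what this essay already shows (a leading minus sign, double quotes around phrases, parentheses for OR).

```python
def exclude(terms):
    """Prefix each term with '-', quoting multi-word phrases first."""
    parts = []
    for term in terms:
        if " " in term:
            term = f'"{term}"'
        parts.append("-" + term)
    return " ".join(parts)


context = ("verne jules captain leagues sea under nautilus mysterious "
           "submarine island underwater ocean voyage crew").split()
noise = ["finding nemo", "disney", "movies", "movie", "shop", "jamba"]

query = f"nemo ({' '.join(context)}) -fromsite:flickr.com {exclude(noise)}"
print(query)
```

Adding a newly spotted troublesome keyword is then a matter of appending it to the noise list.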

Similarly, using the keyword gandalf, we spot the following list of troublesome keywords (movie, movieshots, products, product, shop, banner, movies, cinematrix, game & games), which allows us to build the following query:

gandalf (tolkien rings lord hobbit wizard ring frodo middle earth lotr bilbo fellowship grey white aragorn king return towers balrog saruman hobbits galadriel legolas sam baggins sauron battle elrond gimli gollum magic moria council theonering gwaihir elves hobbiton pippin three wise battles thorin evil shire arwen merry boromir gondor theoden tirith faramir eomer minas elf rohan secret eowyn son heir isengard rivendell denethor army arathorn palantir samwise throne pipe wood quest journey mordor rivendell friends light orcs gamgee destroy sword dead eowyn death precious mithril) -fromsite:flickr.com -movie -movieshots -products -product -shop -banner -movies -cinematrix -game -games

Some quarries are better described by phrases. The "philosopher's stone" is one of those examples (phrases are searched by enclosing them in double quotes). To get more search results, let's search for the philosopher's stone in different languages. Thanks to Wikipedia, that's a trivial task. Take a look at the following page: Philosopher's stone, where you can get the translation in a bunch of other languages just by following the links in the table In other languages located on the left. So our quest for the philosopher's stone is expanded by the following query:

("Philosopher's stone" "Философски камък" "Pedra filosofal" "De vises sten" "Stein der Weisen" "Piedra filosofal" "Ŝtono de la saĝuloj" "Pierre philosophale" "Pietra filosofale" "אבן החכמים" "Filosofinis akmuo" "Bölcsek köve" "Философски камен" "Steen der wijzen" "賢者の石" "De vises stein" "Dei vises stein" "Kamień filozoficzny" "Философский камень" "Kameň mudrcov" "Viisasten kivi" "Bato ng pilosopo" "賢者之石")

Using the Seekers' Oracle to investigate the phrase philosopher's stone, we get the following list of off topic keywords –harry, potter, dvd, youngactors, harrypotter, movie, shop, product, products, hogwarts, fawkes, games, game, trailer, amazon, cd & shopping–, which we promptly use to refine our query:

("Philosopher's stone" "Философски камък" "Pedra filosofal" "De vises sten" "Stein der Weisen" "Piedra filosofal" "Ŝtono de la saĝuloj" "Pierre philosophale" "Pietra filosofale" "אבן החכמים" "Filosofinis akmuo" "Bölcsek köve" "Философски камен" "Steen der wijzen" "賢者の石" "De vises stein" "Dei vises stein" "Kamień filozoficzny" "Философский камень" "Kameň mudrcov" "Viisasten kivi" "Bato ng pilosopo" "賢者之石") -harry -potter -dvd -youngactors -harrypotter -movie -shop -product -products -hogwarts -fawkes -games -game -trailer -amazon -cd -shopping

So far we have only half refined our three main queries: nemo, gandalf & philosopher's stone. The other half requires advanced seeking-jutsu, and those techniques will be taught in the corresponding section.

AllTheWeb also offers powerful ways of refining our queries by the document's or the image's location. There are two prime cases where location is very useful. When you want to explore the content of a specific site:

or when you want to search some classes of sites / documents having better signal to noise ratios:

Advanced seeking-jutsu

AllTheWeb offers five powerful operators (date:, width:, height:, filesize: and aspect:) which allow you to focus on the images' properties and not on the surrounding context:

Convert a Date/Time to a Unix timestamp
(number of seconds elapsed since midnight UTC of January 1, 1970)
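
For readers without a converter at hand, the same conversion takes a few lines of Python. The function name to_unix is made up for this sketch; the date is interpreted as UTC, matching the definition above.

```python
from datetime import datetime, timezone


def to_unix(year, month, day, hour=0, minute=0, second=0):
    """Return the seconds elapsed since midnight UTC of January 1, 1970."""
    moment = datetime(year, month, day, hour, minute, second,
                      tzinfo=timezone.utc)
    return int(moment.timestamp())


print(to_unix(1970, 1, 1))   # 0
print(to_unix(2001, 9, 11))  # 1000166400
```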


To get a better grasp of how these operators work, let's start with the date: operator. The date: operator has two main uses: seeing how something or someone (fravia, for instance) has evolved with time, by taking one-year time slices (or any other time span you prefer), which we do by enclosing the beginning and ending dates between square brackets

or seeing how an important event changed someone / something, for instance the World Trade Center before (on the left) and after (on the right) September 11, 2001
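
One-year time slices can be generated mechanically. The sketch below rests on two assumptions that should be checked against the live engine: that date: expects Unix timestamps, and that the two bounds inside the square brackets are separated by a semicolon, as in the aspect: ranges used elsewhere in this essay. The function names are hypothetical.

```python
from datetime import datetime, timezone


def unix(year, month=1, day=1):
    """Unix timestamp of midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def year_slice(year):
    """ASSUMED syntax: date:[start;end] with Unix timestamps as bounds."""
    start = unix(year)
    end = unix(year + 1) - 1  # last second of the year
    return f"date:[{start};{end}]"


# One query per year, to watch how the results for a keyword evolve.
for year in range(2004, 2007):
    print("fravia", year_slice(year))
```

The same helper covers the before/after comparison: one range ending just before the event and one starting at it.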

Before we give the final refinements to our queries (nemo, gandalf & philosopher's stone), let's solve three of fravia's riddles, which are quite instructive in the way they illustrate how to use the other four operators.

Riddle I (from An ochre colored "image searching" riddle)
Searching an image without knowing its name. Can you locate/understand/explain this image?
Right clicking on the image, we learn that the image's width = 691 pixels, height = 460 pixels and file size = 42162 bytes. This data is usually sufficient to uniquely identify an image on the web and allows us to write the following query: width:691 height:460 filesize:42162. There is only one search result, and it is located on this forum. I guess this is a copy from somewhere else and that the image's name (probulemfrog.jpg) is the original one. Searching probulemfrog on Google, we finally locate the original site.
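
The three numbers can also be read from a saved file instead of the browser's dialog. The sketch below handles PNG files only, whose width and height sit at fixed offsets in the IHDR chunk; a JPEG such as the riddle's image would need a different parser. The function names are made up for the illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) from the IHDR chunk of raw PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def fingerprint_query(width, height, filesize):
    """Build the query that pins an image down by its properties."""
    return f"width:{width} height:{height} filesize:{filesize}"


def query_for_png(data):
    """Fingerprint query for a PNG given as bytes."""
    width, height = png_dimensions(data)
    return fingerprint_query(width, height, len(data))


# The riddle's values, typed in by hand.
print(fingerprint_query(691, 460, 42162))
```

For a file on disk, query_for_png(open(path, "rb").read()) would give the query directly.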
Riddle II (from Hidden essays)
Well, ahem... on a real ~S~ site a small (sortof) "riddle" is imho necessary and welcome (per angusta ad augusta). sincerely hope you'll have quite a difficult task :-) trying to fetch my own "dark" and "light" essays...

~S~ fravia+, September 2000

Right clicking on the images, we see that both have width = 301 pixels and height = 366 pixels, and that their names are mucha3.jpg and mucha4.jpg. Seeing this information, I guess that the "hidden" page has two more mucha images and that those images probably have the same width and height. Testing that hypothesis (width:301 height:366), we find on page 20 (ymmv) two other mucha images located here. The caption of the second image on that page, the width and height of the images?, is quite interesting... Was this kind of search what fravia had in mind?
Riddle III (from not so easy little historical 'riddle')
For those that like riddles (and web-searching for images). How comes there's a swastika (green, smack in the middle, see?) on this 1917 russian 1000 rubles banknote? And: can you find the other side (the one with the building) on the web?
Right clicking on the image, we see that its width = 818 pixels and its height = 495 pixels. We also know that this image is a scan of a real note of unknown width and height, but the width / height ratio (aspect ratio) of an object remains unchanged by magnifications or reductions, so the original banknote has an aspect ratio of approximately 1.65, that is, 165 in the units of the aspect: operator (100 × width / height). Giving some room for the way the images on the web might have been cropped, we guess that the aspect ratio of those images should range from 155 to 175. Joining all the information, we construct the following query: aspect:[155;175] (1000 rubles banknote) 1917.
This query can be expanded to other languages using Wikipedia, following the links in the table In other languages located on the left. For the word ruble, we get its plural in other languages by consulting the tables, the image captions or the banknote values followed by the currency name. For the word banknote, we get the translations from the article titles. With this linguistic expansion, our query becomes:
aspect:[155;175] (1000 rubles rublů Rubel rubla rublos roubles ruskih rubli roebel рублей рубаља rubalja ruplaa Rubler карбованців 卢布 รัสเซีย Banknote "Gîn-phiò" Банкнота Bankovka Monbileto اسکناس Billet Nota 지폐 שטר כסף Banconota Popieriniai pinigai Bankbiljet 紙幣 Pengeseddel پۇل Banknot Банкнота Bankovka Bankovec Seteli Sedel ธนบัตร Банкнота 紙幣) 1917
Among these pages, this one contains the other side of the banknote, cf. other side 1 and other side 2.
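
The arithmetic of Riddle III is short enough to script. The sketch assumes, as the numbers in this essay suggest, that aspect: values are 100 × width / height; the function names and the default margin of 10 are choices made for this illustration.

```python
def aspect(width, height):
    """Aspect value as used in this essay: 100 * width / height, rounded."""
    return round(100 * width / height)


def aspect_range(width, height, margin=10):
    """Range query leaving some room for cropped copies of the image."""
    centre = aspect(width, height)
    return f"aspect:[{centre - margin};{centre + margin}]"


print(aspect(818, 495))        # 165
print(aspect_range(818, 495))  # aspect:[155;175]
```

A wider margin catches more heavily cropped copies at the price of more noise.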

Finally, let's see how these operators can be used to further reduce the noise in our search results:

It's now time to give the final refinements to our three queries. If you get We did not find results for..., hit the reload button, as we are pushing AllTheWeb to the limit! Trust me, AllTheWeb can handle it :)

nemo (verne jules captain leagues sea under nautilus mysterious submarine island underwater ocean voyage crew) -fromsite:flickr.com -"finding nemo" -disney -movies -movie -shop -jamba aspect:[30;300] -aspect:[66;67] -aspect:75 -aspect:133 -aspect:150 -aspect:177 width:>250 filesize:>10000
gandalf (tolkien rings lord hobbit wizard ring frodo middle earth lotr bilbo fellowship grey white aragorn king return towers balrog saruman hobbits galadriel legolas sam baggins sauron battle elrond gimli gollum magic moria council theonering gwaihir elves hobbiton pippin three wise battles thorin evil shire arwen merry boromir gondor theoden tirith faramir eomer minas elf rohan secret eowyn son heir isengard rivendell denethor army arathorn palantir samwise throne pipe wood quest journey mordor rivendell friends light orcs gamgee destroy sword dead eowyn death precious mithril) -fromsite:flickr.com -movie -movieshots -products -product -shop -banner -movies -cinematrix -game -games aspect:[30;300] -aspect:[66;67] -aspect:75 -aspect:133 -aspect:150 -aspect:177 width:>500 filesize:>50000
("Philosopher's stone" "Философски камък" "Pedra filosofal" "De vises sten" "Stein der Weisen" "Piedra filosofal" "Ŝtono de la saĝuloj" "Pierre philosophale" "Pietra filosofale" "אבן החכמים" "Filosofinis akmuo" "Bölcsek köve" "Философски камен" "Steen der wijzen" "賢者の石" "De vises stein" "Dei vises stein" "Kamień filozoficzny" "Философский камень" "Kameň mudrcov" "Viisasten kivi" "Bato ng pilosopo" "賢者之石") -harry -potter -dvd -youngactors -harrypotter -movie -shop -product -products -hogwarts -fawkes -games -game -trailer -amazon -cd -shopping aspect:[30;300] -aspect:[66;67] -aspect:75 -aspect:133 -aspect:150 -aspect:177 width:>200 filesize:>10000
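
The excluded aspect values in these queries line up with a handful of standard formats. The mapping below is my reading of the numbers, not something Yahoo documents: it assumes that aspect: values are 100 × width / height and that the fractional part is dropped.

```python
# Standard formats, as (width, height) proportions.
COMMON_FORMATS = [(2, 3), (3, 4), (4, 3), (3, 2), (16, 9)]


def aspect_value(width, height):
    """100 * width / height with the fractional part dropped (assumed)."""
    return 100 * width // height


for w, h in COMMON_FORMATS:
    print(f"{w}:{h} -> aspect {aspect_value(w, h)}")
```

This yields 66, 75, 133, 150 and 177, the values excluded above (the range [66;67] covers 2:3 whichever way the engine rounds), so the queries appear to drop snapshots, screenshots and wallpapers in stock proportions and keep the odd-sized scans and illustrations.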

References:

  1. http://www.ysearchblog.com/archives/000238.html
  2. http://web.archive.org/web/20030703134412/http://www.alltheweb.com/help/faqs/query_language.html
  3. http://www.searchengineshowdown.com/blog/2006/01/new_search_prefixes_for_yahoo.shtml
  4. http://www.digitalalchemy.tv/2006/09/yahoo-video-conduct-very-specific.html

(c) Nemo 2007    nemo vitam meam regit@yahoo.com    replace white spaces by underscores.



(c) III Millennium: [fravia+], all rights reserved, reversed, reviled and revealed