pdffing.htm: How to search the web, by fravia+ pdffing

~ Those annoying pdf files ~

Version August 2005

Introduction
PDF-related essays
Converting any file to PDF
Converting an Acrobat PDF into ASCII
The modern art of HTML2PDF conversion
Enabling Print-Challenged PDF Files
The use of pdf2txt@adobe.com
De-protecting PDF files
PDF tools
Cleaning Adobe bloated reader 6
"Remote Approach" pdf malware

Adobe PDF2HTML:

Ta-daaa!

  Joseph Cox's Adobe Reader Speed-Up (v1.31): [ar-speedup.zip] : 149 Kbytes zipped

It disables the majority of plugins (which are completely useless),
so Adobe Reader, should you really be compelled to use it, will start in a jiffy.

Precious
Item

A free, quick alternative pdf reader to use instead of Adobe's monstrosity:
Foxit PDF reader: a free and QUICK reader for PDF documents. You can view and print them with it.
It's small (less than 1 MB to download) and doesn't need any lengthy installation, so you can run it as soon as you have downloaded it. It can find text inside pdf documents and lets you select and copy it.
And it starts up immediately, so you don't have to wait ages for a heap of useless dlls to load or for an annoying "Welcome" screen to disappear.

And if you really want to create your own pdf-files...
PDFCreator easily creates PDFs from any Windows program. Use it like a printer in Word, StarCalc or any other Windows application.

Introduction

OK, admittedly: pdf files are a pain in the stomach: cumbersome, difficult to grep, search and retrieve automatically, awkward to cut and paste from, and they bog your computer down with the Acrobat overload. But they also have some positive aspects, of course, hence people still use them, and you will find USEFUL pdf files every now and then on the web - see for instance AltaVista's very important
Search Intranet Developer's Kit (Ver. 2.6, April 1999), and all the other pdf files in our library.

Evidently -once found- you may want to fiddle with them: use them, catalogue them, grep them, whatever.
But, alas, pdf files are annoying: they can be write protected, password protected, whatever. Searchers should, of course, know how to overcome these small annoyances.
Also, Carpathia pointed out this useful tool for a lot of multiformat conversions:
http://wheel.compose.cs.cmu.edu:8001/cgi-bin/browse/objweb
Enjoy!

PDF-related essays

[son_font.htm]: An essay on identifying and getting ahold of fonts
by sonof, february 2003 part of the [Essays], of the [pdf], and of the [Targets] sections.
With a script to extract fonts from pdf files
[getting a non-protected PDF file] by heather spoonheim (September 2002)
[Converting Word-Docs (or anything you like) to PDF] Some tips by ashok hariharan and other common knowledge methods
[The use of pdf2txt@adobe.com] by fravia+ (September 2000)
How to create PDF-Files
from any application that allows printing
(The GSview approach)
by hassan, May 2000

[Converting an Acrobat PDF into ASCII] by Wolfgang Redtenbacher (March 2000)

[Enabling Print-Challenged PDF Files] by Kayaker (March 2000)

by Zer0+, November 1997

Quick starting notes and pdf again
(boolean variables inside PDF files)

by Ragica, November 1997

A Response to +ORC's Message Regarding reversing PDF
The biggest collection ever (in fact the only collection so far!) of information regarding hacking PDF, with links to relevant information
(Kevin Lair CGI-hacks, the GhostScript hack, many good starting points for the USER crack)

by SiuL+Hacky, November 1997

Linux cracking: the live approach (acrobat reader)
Linux advanced reverse engineering: imported functions

by Snatch, November 1997

Cracking all nag-screen and time-trial protections (Aerial32 as example)
(Resource-ID fishing)

by JimBob, October 1997

The Aerial trick
How to crack any pdf security setting
(The Aerial RTF format converter)

by zeezee, October 1997

Create PDF documents for free reversing Adobe PDF Writer
You know how to create .pdf documents? No? I will -shortly- explain it
(another nice HIEW lesson)

Converting Word-Docs (or anything you like) to PDF

The word document is a very lousy format -- it just takes up too much space.
Now you can convert all your huge documents to lightweight pdfs. Just follow these steps blindly:
1) Go to Start menu -> Settings -> Printers and select 'Add Printer'.
2) Select 'Local Printer' when it prompts you for a local or network printer.
3) From the list of printers select any printer ending with 'PS'; this indicates that the printer has PostScript support. I generally choose something like 'HP Color Laserjet 5/5M PS'.
4) Click Next and in the next screen select the port FILE:.
5) Click Next again and finish. (Say no to default printer.)
6) Download and install GhostView (GSview on Windows, together with Ghostscript) from http://www.cs.wisc.edu/~ghost/
7) Now open the Word document that you want to convert to PDF in winword.
8) In winword select File->Print. As printer name select the printer that you just added, and check the option 'Print to file'. Now click OK.
9) In the 'Print to File' Save As dialog save the file to a folder as filename.prn.
10) Now launch GhostView.
11) Use File->Open to open filename.prn in GhostView.
12) Now use File->Print. The printer setup dialog is displayed.
13) Select 'device:' as pdfwrite, select 'resolution:' as 300, select 'Print to File' and click OK. When prompted, enter the output file name as filename.pdf.
14) That's it! You can now view your old word document as a PDF file in Acrobat Reader.
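
Steps 10 to 13 can also be done without the GUI, by calling Ghostscript directly. The following is only a minimal Python sketch under a few assumptions: Ghostscript is installed and on the PATH, the executable is called "gs" (on Windows the console version is usually named gswin32c), and the helper names build_gs_command and convert are made up for this illustration.

```python
import subprocess

def build_gs_command(ps_path, pdf_path, resolution=300, gs_exe="gs"):
    """Return the Ghostscript argument list that turns a PostScript file into a PDF."""
    return [
        gs_exe,                       # assumption: "gs" is on the PATH
        "-dBATCH",                    # exit after the last file instead of prompting
        "-dNOPAUSE",                  # do not pause between pages
        "-sDEVICE=pdfwrite",          # the same 'pdfwrite' device chosen in step 13
        f"-r{resolution}",            # resolution in dpi, 300 as in step 13
        f"-sOutputFile={pdf_path}",   # where the PDF is written
        ps_path,                      # the .prn file printed in steps 8 and 9
    ]

def convert(ps_path, pdf_path):
    # Raises CalledProcessError if Ghostscript reports a failure.
    subprocess.run(build_gs_command(ps_path, pdf_path), check=True)

if __name__ == "__main__":
    print(" ".join(build_gs_command("filename.prn", "filename.pdf")))
```

Calling convert("filename.prn", "filename.pdf") should then give the same result as the print dialog in step 13.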

There are other tools that may come in handy:
swftools converts anything (also pdf files) to swf

Yet there are many other methods:
One of the simplest ways to convert to pdf is to use OpenOffice, which is available at openoffice.org.

It is a great replacement for the Micro$oft Office package. Use the text editor in OpenOffice and simply click on "export as pdf". Within a few seconds (depending on the size of your file), the PDF document is created.

OpenOffice does a lot more.
+ It replaces the entire Micro$oft Office package for home users.
+ It dispenses with the need to use Word for composing documents.
+ It is only a little over 60 MB.
+ It is free.
+ It is open source.
+ And, again, you can create PDF files by the mere click of a button.

Another possibility is the Win2pdf package, available at http://www.daneprairie.com. It also installs as a windoze print driver, so all you need to do is print from any application to create a PDF file. Unregistered versions are free for non-commercial use.

Converting an Acrobat PDF into ASCII

Thanks to Wolfgang Redtenbacher for the bulk of the following advice about converting an Acrobat PDF file into ASCII text.

Several solutions, like using Aerial (that can be easily cracked) or sending an e-mail to "pdf2txt@adobe.com", have been suggested.

What does not seem to be known widely, however, is the fact that there exist freeware programs to convert PDF to TXT locally.

One solution is to download Acrobat 4.0 (ca. 6 MB) from www.adobe.com, plus the accessibility plug-in (ca. 1.2 MB) from "access.adobe.com". This plug-in permits you to load a PDF file into Acrobat and save it as .TXT or .HTM.
An even better solution (regarding program size and conversion quality) is the program "pdftotext" which is part of the XPDF-package (a freeware PDF viewer for several operating systems).

You need to download the following files:

DOS: ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.90-dos.zip (1298148 bytes) and
ftp://ftp.cdrom.com/pub/infozip/MSDOS/gzip124.exe (119146 bytes)

Win32: ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.90-win32.zip (584326 bytes) and
http://www.gzip.org/gzip124xN.zip (62203 bytes)

After unpacking, you only need 1 file from each of the archives (either DOS or Win32):

pdftotext.exe (964341 bytes for DOS, 354304 bytes for Win32)
gzip.exe (39910 bytes for DOS, 91648 bytes for Win32)

Move these 2 files into a directory that is in your search path (environment variable PATH= ...), enter the command "pdftotext xyz.pdf", and within seconds you get an ASCII text conversion result in the file "xyz.txt" ("xyz" has to be replaced by the real file name, of course).

NOTE: While the Win32 version of "pdftotext.exe" is more compact than the DOS version (which contains additional DOS extender code), it does not work with the widespread DOS version of "gzip.exe" as it needs gzip with long file name support. Therefore make sure to use either both programs in the DOS version or both in the Win32 version. (The DOS version runs flawlessly on a Win32 platform - it is just a bigger EXE-file.)
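
If you have a whole folder of pdf files, the same command can be run in a loop. A small Python sketch, assuming that "pdftotext" is in your search path as described above; the function names text_name, pdftotext_command and convert_all are invented for this example.

```python
import subprocess
from pathlib import Path

def text_name(pdf_path):
    """Name of the text file for a given pdf: same path, .txt suffix."""
    return Path(pdf_path).with_suffix(".txt")

def pdftotext_command(pdf_path):
    # pdftotext accepts the output file as an optional second argument;
    # passing it explicitly gives the same name as the default shown above.
    return ["pdftotext", str(pdf_path), str(text_name(pdf_path))]

def convert_all(folder):
    """Convert every *.pdf in a folder and return the list of text files."""
    converted = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(pdftotext_command(pdf), check=True)
        converted.append(text_name(pdf))
    return converted
```

The resulting *.txt files can then be grepped like any other text.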

The modern art of HTML2PDF conversion

by Christian Wolfgang Hujer

Hello,

>> where can i find source code for converting html files
>> to pdf format, source code being in java.

several tools exist for that purpose.

I searched in google for "HTML Java PDF Conversion".

I found iText, which is an open source PDF library in Java that already has 
capabilities of converting HTML to PDF.
On
http://www.lowagie.com/iText/links.html
are links to several other PDF engines in Java.
iText is an open source project hosted on sourceforge.

But the "modern art of HTML2PDF conversion" is the 
following:
1. Make sure the HTML files are valid XHTML (or at least well-formed XML).
2. Use a transformation stylesheet that transforms HTML to XSL Formatting 
Objects using XSL Transformation. Apply that stylesheet using an XSLT 
processor like xt, xalan, saxon...
3. Run a Formatting Objects engine (like FOP from James Tauber / Apache 
Group) that converts the generated FO-Tree from step 2 to PDF.

Most XSLT processors and most Formatting Objects engines are written in Java, 
including all mentioned products (xt, xalan, saxon, fop) and come with their 
source code.
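
To give an idea of what steps 1 and 2 produce, here is a rough sketch. Note that it swaps in plain Python (xml.etree.ElementTree) for the XSLT stylesheet, purely for illustration: it only handles h1 and p elements, and the page size, margins and the function name xhtml_to_fo are assumptions, not something from the mail above. The output is a small XSL:FO tree that a Formatting Objects engine such as FOP should be able to render in step 3.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

def fo(tag):
    """Qualified name of an element in the XSL:FO namespace."""
    return f"{{{FO_NS}}}{tag}"

def xhtml_to_fo(xhtml_text):
    # Step 1: fromstring() raises ParseError unless the input is well-formed XML.
    html = ET.fromstring(xhtml_text)

    # Page layout skeleton (A4 with 2cm margins, an arbitrary choice).
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page", "page-height": "29.7cm",
        "page-width": "21cm", "margin": "2cm"})
    ET.SubElement(page, fo("region-body"))
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})

    # Step 2 (much simplified): every heading or paragraph becomes one fo:block.
    for element in html.iter():
        name = element.tag.split("}")[-1]   # drop the XHTML namespace, if any
        if name in ("h1", "p"):
            block = ET.SubElement(flow, fo("block"))
            block.text = "".join(element.itertext())
            if name == "h1":
                block.set("font-size", "18pt")
    return ET.tostring(root, encoding="unicode")
```

A real stylesheet would of course cover far more of HTML (lists, tables, inline markup); the point here is only the shape of the FO tree.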

The following points must be kept in mind:
- Knowledge
XSLT and XSL:FO are 1-4 new languages to learn (depending on your point of view
and your knowledge; in my opinion it's four languages altogether: XML,
XPath, XSLT and XSL:FO, but these are quite easy languages).
- Development speed
XSLT stylesheets and XSL Formatting are quite easy to learn and very quick to
develop.
It takes only a little time to write an XSLT stylesheet.
- Servlet usage
XSLT and XSL:FO are usable in servlets.
- Performance
The performance of a native Java library might be considerably better.
- XSLT and XSL:FO are highly configurable. You control nearly every pixel
(or rather point) of the resulting PDF.

But I've never used the non-XSL:FO way, so I can't say much about that.


Just my 2 cents.

Greetings

-- 
Christian Wolfgang Hujer

Enabling Print-Challenged PDF Files

I've seen a number of queries recently about printing PDF files when the
Document Security doesn't allow printing, so I thought I'd pass this along
before I file it with my notes.

With Acrobat Reader 4.0 all that is required is to enable the Print menu item:
enable it and you print. There is no second check before it goes to the code that
actually calls the Print Common Dialog! This is different from Acrobat Reader 3.x,
where there is a second check, which stupidly gives you a Message Box saying
"This Operation is Not Allowed" if you try clicking on your newly enabled Print
function. By saying essentially "You shouldn't have been able to do this", it
gives you, the reverser, something to work with to bypass it, of course.

The check for whether printing is allowed occurs as soon as you click on the File
drop down menu. I was using the Building Win95 Apps PDF by Kevin Goodman which is
available here and there.

You all know the Win32 API function EnableMenuItem of course:

The EnableMenuItem function enables, disables, or grays the specified menu item.

BOOL EnableMenuItem(
  HMENU hMenu,        // handle to menu
  UINT uIDEnableItem, // menu item to enable, disable, or gray
  UINT uEnable        // menu item flags
);

The 2nd parameter specifies the menu item in question. It is either the
identifier of the menu item, if the uEnable parameter carries the MF_BYCOMMAND flag,
or the zero-based position of the menu item, if it carries the MF_BYPOSITION flag.
The MF_BYPOSITION flag is the one used here (MF_BYCOMMAND is the API default).

We can find out the position of the Print menu item in the drop down list with
the GetMenuItemID function, which retrieves the menu item identifier of a menu
item located at the specified position in a menu.

UINT GetMenuItemID(
  HMENU hMenu, // handle of menu
  int nPos     // position of menu item
);

If a regular menu item, the Return Value is an identifier.
If a submenu, the Return Value is 0xFFFFFFFF.
If a separator, the Return Value is 0.

If you cycle through the GetMenuItemID function by setting a breakpoint on the
2nd parameter (the 1st parameter PUSHed in SoftIce), you see an interesting pattern
forming. The first item in the drop down list is given the position # 0 (hex) and
an identifier as a return value, the second #1 and so on, including the separators.

The following table can be made (and I apologize for the /PRE formatting spacing):

Position   Identifier   Menu Item
(hex)      (hex)

0          1770         Open
1          0            Separator
2          1772         Close
3          0            Separator
4          1774         Page Setup
5          1775         Print
6          0            Separator
7          FFFF         Document Info (submenu)
8          FFFF         Preferences (submenu)
9          0            Separator
A          1783         Adobe Online
B          0            Separator
C          1785         Recent File 1
D          17CC         Recent File 2 (up to 4 files)
E          0            Separator
F          1787         Exit

So, without even going to this trouble we can deduce what the BYPOSITION position
value will be for that 2nd parameter of EnableMenuItem simply by counting the
number of menu items, including separators, in the drop down list.
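
The counting itself is trivial, but as a quick illustration, here is the File menu from the table above modelled as a Python list (separators included). The names SEPARATOR, FILE_MENU and position_of are made up for this sketch; nothing here touches the real Win32 API.

```python
SEPARATOR = None  # placeholder for a separator line in the menu

# File menu as listed in the table above; submenus and separators count too.
FILE_MENU = ["Open", SEPARATOR, "Close", SEPARATOR, "Page Setup", "Print",
             SEPARATOR, "Document Info", "Preferences", SEPARATOR,
             "Adobe Online", SEPARATOR, "Recent File 1", "Recent File 2",
             SEPARATOR, "Exit"]

def position_of(menu, label):
    """Zero-based MF_BYPOSITION index of a menu entry, separators included."""
    return menu.index(label)

print(position_of(FILE_MENU, "Print"))      # 5
print(hex(position_of(FILE_MENU, "Exit")))  # 0xf
```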

OK, now what? We know that somewhere between the time the identifier (1775) is
allocated to the Print menu item by GetMenuItemID, and the EnableMenuItem function
is called, there is a check to see if this file is actually supposed to be printable.

So how about doing a TRACE between the two and see what's going on?

We want the first break to be when the 2nd parameter of GetMenuItemID (the first
parameter PUSHed) is equal to 5 (the position number of Print in the drop down list).

The address of the 1st parameter on the function call stack is given by (ESP+4),
the 2nd by (ESP+8), so this works:

BPX GetMenuItemID IF *(SS:ESP+8)==5
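
Why ESP+4 and ESP+8? A toy model of the 32-bit stack at the moment the breakpoint hits may help. This is only a sketch in Python: the menu handle, the return address and the starting ESP value are invented numbers, and stack_at_entry is a hypothetical helper, not a SoftIce feature.

```python
def stack_at_entry(args, return_address=0x0054C11D, esp=0x0063F000):
    """Model the stack just after CALL: returns ({address: value}, new ESP)."""
    memory = {}
    for value in reversed(args):      # the last parameter is PUSHed first
        esp -= 4
        memory[esp] = value
    esp -= 4
    memory[esp] = return_address      # CALL itself pushes the return address
    return memory, esp

# GetMenuItemID(hMenu, nPos) with a made-up handle and position 5 (Print)
memory, esp = stack_at_entry([0x00000A5C, 5])
print(hex(memory[esp]))      # return address, at ESP+0
print(hex(memory[esp + 4]))  # 1st parameter (hMenu)
print(memory[esp + 8])       # 2nd parameter (nPos)
```

So at function entry the return address sits at ESP, the 1st parameter at ESP+4 and the 2nd at ESP+8, which is exactly what the conditional breakpoint tests.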

If we set up a macro to display this address in the data window we can verify it
broke at the right time:

MACRO Position = "dd SS:ESP+8"

and the first BPX can become:

BPX GetMenuItemID IF *(SS:ESP+8)==5 DO "Position"

Break here, F11 and notice the menu identifier (1775) in EAX and the position (5)
in the first line of the data window.

Set up the second breakpoint similarly:

BPX EnableMenuItem IF *(SS:ESP+8)==5 DO "Position"
(again, we are looking at the 2nd parameter on the stack)

Then set up the Trace. You may want to increase the Trace Buffer size from the
default of 8K.

TASK will give the Taskname Acrord32

BPRW Acrord32 T will set the trace

Press F5 and you will break back into SoftIce after the code between the two
function calls has been executed. F11 to return to Acrord32. You might temporarily
toggle out the Register/Data/Code windows with WR/WD/WC and maximize LINES before
typing SHOW to display the trace.

SHOW 1 will show the last command executed

000001  0137:0054C117  FF159C295700    CALL  [USER32!EnableMenuItem]

and you can use the arrow keys to scroll up and down.

A full screen's worth of this trace gives a nice screendump with the Icedump
PAGEIN N c:\filename.txt command.

Looking back through the trace code, you quickly see a suspicious jump. You can
patch this or force EAX to 1 a few lines back by changing

SBB EAX, EAX
INC EAX (EAX=0)

into

XOR EAX, EAX
INC EAX (EAX=1)
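
For those who want to see why this works: SBB EAX, EAX subtracts EAX and the carry flag from EAX, so the result depends only on the carry flag set by the preceding check, while XOR EAX, EAX always gives 0. A tiny Python simulation of the two instruction pairs (the function names are invented for this sketch):

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits

def sbb_eax_eax(carry):
    # SBB EAX, EAX computes EAX - EAX - CF: 0 when CF=0, 0xFFFFFFFF when CF=1
    return (0 - carry) & MASK

def inc(value):
    return (value + 1) & MASK

def original(carry):
    """SBB EAX, EAX / INC EAX"""
    return inc(sbb_eax_eax(carry))

def patched(carry):
    """XOR EAX, EAX / INC EAX"""
    eax = 0          # XOR EAX, EAX clears the register whatever the carry flag is
    return inc(eax)

for carry in (0, 1):
    print(carry, original(carry), patched(carry))
```

With the carry flag set the original code ends with EAX=0, the patched code always ends with EAX=1. Both SBB EAX, EAX and XOR EAX, EAX are two-byte instructions, so the patch fits in place.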

I won't give the actual addresses to patch, as that would take the fun out ;-)

Please correct any mistakes I might have made.

Cheers, Kayaker

The use of pdf2txt@adobe.com

This is an extract from an email I wrote some time ago. It is common knowledge (see also Wolfgang Redtenbacher's contribution), but hey! It works fine for me!

More and more documents are stored in Adobe's pdf format on the Web.

That may be fine for frill-formatting purposes, but it is quite annoying for the rest of us, since pdf files are cumbersome to cut & paste from and to search & grep. I have realized that many don't know that there's a nice (email) utility by Adobe itself for those of you who prefer plain *.txt files (that can be searched, cut, pasted or grepped ad libitum).

Simply send an email with your pdf files attached (i.e. use the "insert file" option) to the following email address:

pdf2txt@adobe.com

You don't need to add either a message text or a subject.

After a couple of minutes: "Hey bingo!" you'll get your text file emailed back to you (for free of course).
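
If you want to script this, the message can be put together with Python's standard email package. A minimal sketch only: the sender address in the usage line is a placeholder, build_request is a made-up helper name, and actually sending the message (for instance with smtplib and your own mail server) is left out.

```python
from email.message import EmailMessage

def build_request(sender, pdf_name, pdf_bytes):
    """Build a mail to pdf2txt@adobe.com carrying one pdf file as attachment."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = "pdf2txt@adobe.com"
    # No subject and no body text: only the attachment is needed.
    message.add_attachment(pdf_bytes, maintype="application",
                           subtype="pdf", filename=pdf_name)
    return message

if __name__ == "__main__":
    demo = build_request("you@example.com", "xyz.pdf", b"%PDF-1.3 ...")
    print(demo["To"], [part.get_filename() for part in demo.iter_attachments()])
```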

Also useful: http://www.adobe.com/products/acrobat/access_email.html: Adobe PDF conversion by e-mail

getting a non-protected PDF file

I received the following from heather spoonheim (September 2002):

+Fravia;

I have been following your site since early 1999, and am not sure just how active you are, but, I thought I would send a quick note as an addendum to the PDFFING page.

Something that was not mentioned was just how simple getting a non-protected PDF file from a protected PDF file really is.

First I did some searching and found out that non-printing was actually up to the PDF viewer. That means the software is requested by the document to not print the document, and the software honours the request. What does this mean? It means that if you have the source to a viewer, you can tell it to ignore the request. Just do a search for open source PDF viewers, and you find xpdf, just like on your PDFFING page. You now have the source for PDF software.

To create your own PDF software that ignores the "print disable" information, edit the pdftops.cc file. Find this section of the file:

  // check for print permission
  if (!doc->okToPrint()) {
    error(-1, "Printing this document is not allowed.");
    goto err1;
  }

Comment out that block, compile, and presto, whammy, you can convert the .PDF to .PS using the pdftops.exe program.
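
What the viewer is honouring is, as far as I can tell, nothing more than a set of bit flags: with the standard PDF security handler the encryption dictionary carries an integer /P entry, and each permission is one bit of it. A Python sketch of how such a value can be decoded; the bit positions are those given in Adobe's PDF Reference (bit 3 for printing, counting from 1), while the names PERMISSIONS and decode_permissions and the sample values are only for illustration.

```python
PERMISSIONS = {
    "print": 1 << 2,     # bit 3 in the PDF Reference's 1-based numbering
    "modify": 1 << 3,    # bit 4
    "copy": 1 << 4,      # bit 5
    "annotate": 1 << 5,  # bit 6
}

def decode_permissions(p_value):
    """Map the signed 32-bit /P integer to a dict of allowed operations."""
    bits = p_value & 0xFFFFFFFF   # view the signed value as raw bits
    return {name: bool(bits & mask) for name, mask in PERMISSIONS.items()}

print(decode_permissions(-5))   # everything allowed except printing
print(decode_permissions(-1))   # all bits set: everything allowed
```

A viewer that simply never looks at the "print" bit behaves exactly like the recompiled pdftops above.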

What to do now? Well, any PostScript laser printer understands .PS files, so you can just dump the file to the printer from the command line, or you can run Adobe Distiller on the file and create a new PDF, or you can run Ghostscript on the file. Anything you want.

ps: To compile on a M$ platform you need either MSVC (big bucks) or DJGPP (free as in speech). I didn't try with BCC (free as in beer), but it would probably work too. The ms_make.bat file that comes with the source package worked flawlessly for me the first time.

PDF 'tools'

Advanced PDF Password Recovery Pro 2.12:
Decrypt protected Adobe Acrobat PDF files, which have "user" and/or "owner" passwords set, preventing the file from opening or editing, printing, selecting text and graphics etc.

Gymnast 3.5 (build 149):
Gymnast converts text files to Adobe Acrobat PDF format without the need for any additional Adobe software.
The alternative to the full Acrobat suite (which is relatively easy to find on the web: search for KWW500R7150122-128 or for acroba5 .zip / .rar / .ace): Gymnast supports hyperlinks, annotations, and automatic generation of bookmarks from headings. You can even include links to Web sites and other PDF files. Produce professional-looking documents for the Web. Development stopped end 1999.
Now freeware: Use this key to register your full copy: GYM03-35672-11110-33170

GNU-Linux free pdf-viewers: http://lwn.net/Articles/113094/ (GNOME Ghostview (ggv), kghostview, xpdf, gpdf, kpdf)

Cleaning Adobe bloated reader 6

How to use liposuction to repair Adobe Reader 6
And give it mouth-to-mouth respiration too

(article by Fernando Cassia taken from The Inquirer and published on searchlores in february 2004)

I reported some time ago on the slooooooow nature of Adobe's latest version of the acrobat (.pdf) reader, recently renamed "Adobe Reader" by their marketing spinmeisters. See "Adobe's quiet release of Reader software causes people to scream".

And just a few days ago an e-mail from a reader, Kelly Cook, sneaked into my inbox, claiming to have found a way to trim the fat off Adobe Reader 6 and improve its load time to more reasonable levels. Apparently the reader spotted this info on a Mac-related blog and decided to try it on the Windows version. I have tried it myself and guess what, it worked.

On my tests, I decided to use the "lowest of the low-end" system: my Thinkpad 380ed with a Pentium I-MMX class CPU. Before the liposuction, Adobe Reader 6 took 41 seconds to load (without any PDF file), after the fat-removal procedure, it took 20 seconds. On high-end systems, however, the results are more dramatic: Kelly claims Adobe Reader 6 took over 20 seconds to load on a 1.8 Ghz. Pentium 4 system, and just under two seconds after the procedure.

So here are the dirty details

1) Install Adobe Reader 6 :)
2) From the Start->Run Windows menu, open the "x:\Program Files\Adobe\Acrobat 6.0\Reader" folder, where x is the right drive letter.
3) Find the plug_ins folder and rename it plug_ins_disabled
4) Create a new folder named plug_ins
5) Copy the following files from "plug_ins_disabled" to "plug_ins": EWH32.api, printme.api, and search.api
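
The folder shuffle can be scripted as well. A minimal Python sketch, assuming you pass in the Reader folder mentioned above and have the rights to modify it; trim_plugins and KEEP are names invented for this example, and files missing from the original plug_ins folder are simply skipped.

```python
import shutil
from pathlib import Path

KEEP = ("EWH32.api", "printme.api", "search.api")

def trim_plugins(reader_dir, keep=KEEP):
    """Park all plug-ins in plug_ins_disabled and copy back only those in keep."""
    reader_dir = Path(reader_dir)
    active = reader_dir / "plug_ins"
    disabled = reader_dir / "plug_ins_disabled"
    active.rename(disabled)          # rename plug_ins to plug_ins_disabled
    active.mkdir()                   # create a new, empty plug_ins folder
    copied = []
    for name in keep:                # copy back only the essentials
        source = disabled / name
        if source.exists():
            shutil.copy2(source, active / name)
            copied.append(name)
    return copied
```

To experiment with leaving more plug-ins in, pass a longer keep tuple; to undo everything, delete the new plug_ins folder and rename plug_ins_disabled back.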

Of course this will limit the functionality to viewing non-encrypted pdf files, but that's exactly what I want Acrobat ^B^B^B^B^B Adobe Reader for, 99.9% of the time. You might want to experiment leaving some of the fat in, I mean, .API files, like reflow.api and search5.api (if it's there), and see how it affects functionality and load times.

With the files listed, you get half the load time on low-end systems, and a 2-sec load time on high-end ones. Still, you might prefer to use Acrobat Reader 4.05 on old systems, since it loads in just seven seconds instead of 20.

Your mileage might vary. Liposuction is a dangerous clinical procedure. Consult your doctor. All lawsuits and claims should go not to me, but to our editor and our reader Kelly. ;)

"Remote Approach" pdf malware

The following taken from http://lwn.net/Articles/129729/

Unexpected features in Acrobat 7

(This article was contributed by Joe 'Zonker' Brockmeier)
	
Linux users may have been pleased to find that Adobe has finally
made available a new version of its Acrobat Reader, with
accessibility features, a much slicker interface than Acrobat 5.x
and other spiffy new features. However, there are a few other
features that Linux users should be aware of.

A company called Remote Approach is promising to alert PDF
publishers as to the "reach and use of their materials." We were
curious to find out how Remote Approach was going to make good on
its promise, given that PDF has largely been seen as a one-way
medium. To find out, we created a test account and uploaded a PDF
to be "tagged" by Remote Approach, and then downloaded the
modified document to see whether Remote Approach could log our use
of the document.

Remote Approach's reporting did not work when we viewed the
document with Kpdf, Xpdf and Adobe Reader 5.0.10. It also failed
using Apple's "Preview" application on Mac OS X. The document was
still viewable with no apparent glitch in other PDF readers, but
the reporting function did not work. However, when we opened the
file using Adobe Acrobat Reader 7, Remote Approach started logging
views from our IP address. After doing a little research, we found
that Adobe's Reader was connecting to
http://www.remoteapproach.com/remoteapproach/logging.asp each time
we opened the document. The information is submitted over port 80
using HTTP, so it is unlikely that a home or office firewall
would, in a normal configuration, block the activity, unless the
firewall administrator is attempting to block Web browsing.

Apparently, Remote Approach's "tag" to our document included the
addition of JavaScript code causing Acrobat to report back to
their server; the information reported includes the fact that the
document had been read, our IP address, and which viewer it had
been read in. (Interestingly, Remote Approach does not seem to
recognize the Linux version of Acrobat Reader, as it left the
"User Agent" field blank in its reports.)

What many Linux users may not have realized, since Adobe did not
release an Acrobat Reader 6.x for Linux, is that Adobe has added
JavaScript support to PDF and the official Acrobat readers since
Acrobat 6.x. For those interested in the JavaScript support and
its abilities in Acrobat, see Adobe's scripting reference or
scripting guide. (Both are PDFs, of course.)

By default, Adobe Reader 7 turns on JavaScript, so the "tagged"
document is able to "phone home" without the user's awareness.
Turning off JavaScript disables the document's code, and prevents
Remote Approach (or any other entity) from tracking views of the
document. No doubt, Remote Approach is using features that would
normally be used to submit information from a PDF form.

The inclusion of JavaScript in Adobe Reader 7 for Linux no doubt
provides a number of welcome features for users, but it also
raises some privacy issues. The reader does not inform the user
that information is being submitted, so users are likely to be
oblivious to the fact that another party is aware of their PDF
reading habits. While a user may not find it objectionable to
notify the publisher, there are those of us who don't care to
allow publishers to snoop on activities taking place on our
personal computers.

Lucky for us, there are plenty of alternatives to Adobe's Reader.
Free PDF readers are unlikely to adopt features allowing the
reader to silently phone home in response to code stored within
the document itself. If you must use Acrobat, however, you may
want to have a look at the JavaScript settings first.
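
A quick way to get a first idea whether a pdf file carries script code at all is to look for the /JavaScript and /JS names in the raw file. The following Python sketch does just that; it is a crude heuristic and not part of the article above. A clean result proves nothing, since names can be escaped or hidden inside compressed object streams, and the function names are invented for this example.

```python
import re

# Name objects that introduce script actions in a PDF file
JS_MARKERS = re.compile(rb"/(JavaScript|JS)\b")

def has_javascript_markers(pdf_bytes):
    """True if the raw bytes contain a /JavaScript or /JS name."""
    return JS_MARKERS.search(pdf_bytes) is not None

def check_file(path):
    # Read the file as bytes: PDF is a binary format.
    with open(path, "rb") as handle:
        return has_javascript_markers(handle.read())
```

Calling check_file on a "tagged" document before opening it in Adobe's reader should, under these assumptions, flag the phone-home code.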

(pointed out by Kane)