Adobe PDF2HTML:
Ta-daaa!
Joseph Cox's Adobe Reader Speed-Up (v1.31): [ar-speedup.zip] : 149 Kbytes zipped
disables the majority of plugins (which are
completely useless), so Adobe Reader, should you really be compelled to use it, will start in a jiffy
Precious Item
A free, quick alternative PDF reader to replace Adobe's monstrosity:
Foxit PDF reader: Free and QUICK reader
for PDF documents. You can view and print them with it.
It's small (less than a 1 MB download).
It doesn't need any lengthy installation, so you can run it as soon as you have downloaded it.
It finds text inside PDF documents and lets you select and copy it.
And it starts up immediately, so you don't need to wait ages for many useless DLLs to load or for an annoying "Welcome" screen to disappear.
And if you really want to create your own pdf-files...
PDFCreator easily creates PDFs from any Windows program. Use it like a printer in Word, StarCalc or any other Windows application.
OK, admittedly: pdf files are a pain in the stomach: cumbersome,
difficult to grep, search, and automate for retrieval, awkward for cut-and-paste purposes,
and they bog down your computer with the Acrobat overload. But they also have some
positive aspects, of course, hence people still use them,
and you will find USEFUL pdf-files every now and then on
the web - see for instance
altavista's very important
Search Intranet Developer's Kit (Ver. 2.6, April 1999)
And all other pdf files in our library.
Evidently -once found- you may want to fiddle with them: use them, catalogue them, grep
them, whatever.
But, alas,
pdf files are annoying: they can be write protected, password protected,
whatever. Searchers should, of course, know how to overcome these small annoyances.
Also, Carpathia pointed out this useful tool for a lot of multiformat conversions:
http://wheel.compose.cs.cmu.edu:8001/cgi-bin/browse/objweb
Enjoy!
-
[son_font.htm]:
An essay on identifying and getting ahold of fonts
by sonof, february 2003
part of the [Essays],
of the [pdf],
and of the [Targets] sections.
With a script to extract fonts from pdf files
-
[getting a non-protected PDF file] by heather spoonheim (September 2002)
-
[Converting Word-Docs
(or anything you like) to PDF] Some tips by
ashok
hariharan and other common knowledge methods
-
[The use of pdf2txt@adobe.com] by fravia+ (September 2000)
-
How to create PDF-Files
from any application that allows printing
(The GSview approach)
by hassan, May 2000
-
[Converting an Acrobat
PDF into ASCII] by Wolfgang Redtenbacher (March 2000)
-
[Enabling
Print-Challenged PDF
Files] by Kayaker (March 2000)
-
by Zer0+, November 1997
Quick starting notes and pdf again
(boolean variables inside PDF files)
-
by Ragica, November 1997
A Response to +ORC's Message Regarding reversing PDF
the biggest collection ever (in fact
the only collection ever so far!) of information regarding hacking
PDF and links to relevant information
(Kevin Lair CGI-hacks, the GhostScript hack, many good starting point for the USER crack)
-
by SiuL+Hacky, November 1997
Linux cracking: the live approach
(acrobat reader)
Linux advanced reverse engineering: imported functions
-
by Snatch, November 1997
Cracking all nag-screen and time-trial protections
(Aerial32 as example)
(Resource-ID fishing)
-
by JimBob, October 1997
The Aerial trick
How to crack any pdf security setting
(The Aerial RTF format converter)
-
by zeezee, October 1997
Create PDF documents for free reversing Adobe PDF Writer
You know how to create .pdf documents? No? I will -shortly- explain it
(another nice HIEW lesson)
Converting Word-Docs (or anything you like) to PDF
The Word document is a very lousy format -- it just takes up
too much space.
Now you can convert all your huge documents to lightweight PDFs.
Just follow these steps blindly:
1) Go to Start menu -> Settings -> Printers and select 'Add Printer'.
2) Select 'Local Printer' when it prompts you for a local or network printer.
3) From the list of printers select any printer ending with 'PS';
this indicates that the printer has PostScript support. I generally choose
something like 'HP Color Laserjet 5/5M PS'.
4) Click Next and in the next screen select the Port as FILE:.
5) Click Next again and finish. (Say no to default printer.)
6) Download and install GhostView from http://www.cs.wisc.edu/~ghost/
7) Now open the Word document that you want to convert to PDF in
winword.
8) In winword select File->Print. As the printer name select the name of
the printer that you just added, and check the option 'Print to file'. Now
click OK.
9) In the 'Print to File' Save As dialog save the file to a folder as filename.prn.
10) Now launch GhostView.
11) Use File->Open to open filename.prn in GhostView.
12) Now use File->Print. The printer setup dialog is displayed.
13) Select 'device:' as pdfwrite, select 'resolution:' as 300, select
'Print to File' and click OK. Enter the output file name when prompted as
filename.pdf.
14) That's it! You can now view your old Word document as a PDF file in
Acrobat Reader.
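Steps 10 to 13 can probably be done without the GUI as well, by calling Ghostscript directly. The following is only a sketch under assumptions: it presumes a Ghostscript executable is on your PATH (the name "gs" is a guess; on Windows it is usually something like gswin32c), and the file names are the placeholders used in the steps above. The switches shown (-dBATCH, -dNOPAUSE, -sDEVICE=pdfwrite, -r, -sOutputFile) are standard Ghostscript options.

```python
import subprocess


def build_gs_command(ps_path, pdf_path, resolution=300, gs_exe="gs"):
    """Build the Ghostscript argument list for a PostScript -> PDF run.

    gs_exe is an assumption: adjust it to the name of the Ghostscript
    console executable installed on your system.
    """
    return [
        gs_exe,
        "-dBATCH",                    # exit when the input has been processed
        "-dNOPAUSE",                  # do not wait for a key press between pages
        "-sDEVICE=pdfwrite",          # same device chosen in step 13
        f"-r{resolution}",            # same resolution chosen in step 13
        f"-sOutputFile={pdf_path}",   # where the PDF is written
        ps_path,                      # the 'print to file' output from step 9
    ]


def convert(ps_path, pdf_path):
    """Run Ghostscript; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_gs_command(ps_path, pdf_path), check=True)


if __name__ == "__main__":
    print(" ".join(build_gs_command("filename.prn", "filename.pdf")))
```

Calling convert("filename.prn", "filename.pdf") would then replace the manual File->Print dialog, assuming Ghostscript is installed.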
There are other tools that may come in handy:
swftools converts anything (also pdf files) to swf
Yet there are many other methods:
One of the simplest ways
to convert into pdf is to use OpenOffice, which is available at
openoffice.org.
It is a great replacement for the Micro$oft Office package. Use the text
editor in OpenOffice and simply click on "export as pdf". Within a few
seconds (depending on the size of your file), the PDF document is
created.
OpenOffice does a lot more.
+ It replaces the entire Micro$oft Office package for home users.
+ It dispenses with the need to use Word for composing documents
+ It is only a little over 60 MB
+ It is free
+ It is Open source
+ And, again, you can create PDF files by the mere click of a button.
Another possibility is the Win2pdf package, available at http://www.daneprairie.com.
It also installs as a windoze print driver, so all you need to do is to print from
any application to create a PDF file. Unregistered versions are free for non-commercial use.
Converting an Acrobat PDF into ASCII
Thanks to Wolfgang Redtenbacher for the bulk of the following
advice about converting an Acrobat PDF file into ASCII text.
Several solutions, like using Aerial (which can be easily cracked) and
sending an e-mail to "pdf2txt@adobe.com", have been
suggested.
What does not seem to be known widely, however, is the fact that
there exist freeware programs to convert PDF to TXT locally.
- One solution is to download Acrobat 4.0 (ca. 6 MB) from
www.adobe.com, plus the accessibility plug-in (ca. 1.2 MB)
from "access.adobe.com". This plug-in permits you to load a
PDF file into Acrobat and save it as .TXT or .HTM.
- An even better solution (regarding program size and
conversion
quality) is the program "pdftotext" which is part of the
XPDF-package (a freeware PDF viewer for several operating
systems).
You need to download the following files:
DOS:   ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.90-dos.zip (1298148 bytes) and
       ftp://ftp.cdrom.com/pub/infozip/MSDOS/gzip124.exe (119146 bytes)
Win32: ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.90-win32.zip (584326 bytes) and
       http://www.gzip.org/gzip124xN.zip (62203 bytes)
After unpacking, you only need 1 file from each of the
archives (either DOS or Win32):
pdftotext.exe (964341 bytes for DOS, 354304 bytes for Win32)
gzip.exe (39910 bytes for DOS, 91648 bytes for Win32)
Move these 2 files into a directory that is in your search
path (environment variable PATH= ...), enter the command
"pdftotext xyz.pdf", and within seconds you get an ASCII
text conversion result in the file "xyz.txt" ("xyz" has to be
replaced by the real file name, of course).
NOTE: While the Win32 version of "pdftotext.exe" is more
compact than the DOS version (which contains additional DOS
extender code), it does not work with the widespread DOS
version of "gzip.exe" as it needs gzip with long file name
support. Therefore make sure to use either both programs in
the DOS version or both in the Win32 version. (The DOS version
runs flawlessly on a Win32 platform - it is just a bigger
EXE-file.)
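If you have many files to convert, the "pdftotext xyz.pdf" call described above can be wrapped in a few lines of script. This is a minimal sketch, assuming pdftotext is reachable through PATH as explained; the helper names are made up for illustration, and the output name simply follows the xyz.pdf -> xyz.txt rule stated in the text.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, exe="pdftotext"):
    """Return the command line and the text file pdftotext is expected to write."""
    pdf = Path(pdf_path)
    # With a single argument, pdftotext writes next to the input,
    # replacing the .pdf extension with .txt (xyz.pdf -> xyz.txt).
    return [exe, str(pdf)], pdf.with_suffix(".txt")


def convert_all(pdf_paths):
    """Convert each PDF in turn and return the list of expected .txt files."""
    results = []
    for pdf_path in pdf_paths:
        command, expected_txt = pdftotext_command(pdf_path)
        subprocess.run(command, check=True)  # raises if pdftotext fails
        results.append(expected_txt)
    return results


if __name__ == "__main__":
    command, expected = pdftotext_command("xyz.pdf")
    print(command, "->", expected)
```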
The modern art of HTML2PDF conversion
by Christian Wolfgang Hujer
Hello,
>> where can i find source code for converting html files
>> to pdf format, source code being in java.
Several tools exist for that purpose.
I searched in google for "HTML Java PDF Conversion".
I found iText, which is an open source PDF library in Java that already has
capabilities of converting HTML to PDF.
On
http://www.lowagie.com/iText/links.html
are links to several other PDF engines in Java.
iText is an open source project hosted on sourceforge.
But the "modern art of HTML2PDF conversion" is the
following:
1. Make sure the HTML files are valid XHTML (or at least well-formed XML).
2. Use a transformation stylesheet that transforms HTML to XSL Formatting
Objects using XSL Transformation. Apply that stylesheet using an XSLT
processor like xt, xalan, saxon...
3. Run a Formatting Objects engine (like FOP from James Tauber / Apache
Group) that converts the generated FO-Tree from step 2 to PDF.
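Step 1 is easy to check automatically, and step 3 boils down to one command. The sketch below is in Python rather than Java, purely for brevity, and rests on assumptions: the function names are invented for illustration, the well-formedness test uses Python's standard XML parser (it checks well-formed XML only, not XHTML validity), and the FOP call assumes a "fop" launcher on your PATH that accepts the -fo and -pdf options.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Step 1: return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def fop_command(fo_path, pdf_path, fop_exe="fop"):
    """Step 3: command line that asks FOP to render an FO file to PDF.

    fop_exe is an assumption; point it at your own FOP launcher script.
    """
    return [fop_exe, "-fo", fo_path, "-pdf", pdf_path]


if __name__ == "__main__":
    good = "<html><body><p>Hello<br/></p></body></html>"
    bad = "<html><body><p>Hello<br></p></body></html>"  # unclosed <br>
    print(is_well_formed(good), is_well_formed(bad))
    print(" ".join(fop_command("page.fo", "page.pdf")))
```

Typical HTML "tag soup" fails the first check because of unclosed elements such as a bare br tag, which is why step 1 comes before any stylesheet work.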
Most XSLT processors and most Formatting Objects engines are written in Java,
including all mentioned products (xt, xalan, saxon, fop) and come with their
source code.
The following points must be kept in mind:
- Knowledge
XSLT and XSL:FO are 1-4 new languages to learn (depending on your point of view
and your knowledge; my opinion is that it's four languages altogether: XML,
XPath, XSLT and XSL:FO, but these are quite easy languages)
- Development speed
XSLT stylesheets and XSL Formatting are quite easy to learn and very quick to
develop.
It takes only a little time to write an XSLT stylesheet.
- Servlet usage
XSLT and XSL:FO are usable as servlets.
- Performance
Java native library performance might be considerably better.
- The XSLT and XSL:FO approach is highly configurable. You control nearly every pixel
(or rather point) of the resulting PDF.
But I've never used the non-XSL:FO way, so I can't say much about that.
Just my 2 cents.
Greetings
--
Christian Wolfgang Hujer
Enabling Print-Challenged PDF Files
I've seen a number of queries recently about printing PDF files when the
Document Security doesn't allow printing, so I thought I'd pass this along
before I file it with my notes.
With Acrobat Reader 4.0 all that is required is to enable the Print menu
item: enable it and you print. There is no second check before it goes to
the code that actually calls the Print Common Dialog! This is different
from Acrobat Reader 3.x, where there is a second check, which stupidly
gives you a Message Box saying "This Operation is Not Allowed" if you try
clicking on your newly enabled Print function. By saying essentially "You
shouldn't have been able to do this", it gives you, the reverser,
something to work with to bypass it of course.
The check for whether printing is allowed occurs as soon as you click on
the File drop down menu. I was using the Building Win95 Apps PDF by Kevin
Goodman which is available here and there.
You all know the Win32 API function EnableMenuItem of course:
The EnableMenuItem function enables, disables, or grays the specified
menu item.
    BOOL EnableMenuItem(
        HMENU hMenu,          // handle to menu
        UINT uIDEnableItem,   // menu item to enable, disable, or gray
        UINT uEnable          // menu item flags
    );
The 2nd parameter specifies the menu item in question and will be either
the identifier of the menu item, if the uEnable parameter carries the
MF_BYCOMMAND flag, or the relative position of the menu item, if it
carries the MF_BYPOSITION flag.
The MF_BYPOSITION flag is normally used.
We can find out the position of the Print menu item in the drop down list
with the GetMenuItemID function, which retrieves the menu item identifier
of a menu item located at the specified position in a menu.
    UINT GetMenuItemID(
        HMENU hMenu,   // handle of menu
        int nPos       // position of menu item
    );
If a regular menu item, the Return Value is an identifier.
If a submenu, the Return Value is 0xFFFFFFFF.
If a separator, the Return Value is 0.
If you cycle through the GetMenuItemID function by setting a breakpoint
on the 2nd parameter (the 1st parameter PUSHed in SoftIce), you see an
interesting pattern forming. The first item in the drop down list is
given the position #0 (hex) and an identifier as a return value, the
second #1 and so on, including the separators.
The following table can be made (and I apologize for the /PRE formatting
spacing):
Position   Identifier   Menu Item
0          1770         Open
1          0            Separator
2          1772         Close
3          0            Separator
4          1774         Page Setup
5          1775         Print
6          0            Separator
7          FFFF         Document Info (submenu)
8          FFFF         Preferences (submenu)
9          0            Separator
A          1783         Adobe Online
B          0            Separator
C          1785         Recent File 1
D          17CC         Recent File 2 (up to 4 files)
E          0            Separator
F          1787         Exit
So, without even going to this trouble we can deduce what the BYPOSITION
position value will be for that 2nd parameter of EnableMenuItem simply by
counting the number of menu items, including separators, in the drop down
list.
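The counting rule is trivial, but a tiny illustration may help. The sketch below is not Win32 code: it just models the File menu as a plain list (item names taken from the menu described in this essay, with the two submenus and the recent files as they appeared there) and shows that a zero-based index which counts separators gives the position value.

```python
SEPARATOR = None  # stand-in for a menu separator line

# The File drop down list, top to bottom, as described in this essay.
FILE_MENU = [
    "Open", SEPARATOR,
    "Close", SEPARATOR,
    "Page Setup", "Print", SEPARATOR,
    "Document Info", "Preferences", SEPARATOR,
    "Adobe Online", SEPARATOR,
    "Recent File 1", "Recent File 2", SEPARATOR,
    "Exit",
]


def menu_position(menu, label):
    """Zero-based position of a menu item, counting separators too."""
    return menu.index(label)


if __name__ == "__main__":
    for label in ("Open", "Print", "Exit"):
        print(f"{label}: position {menu_position(FILE_MENU, label):X}")
```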
OK, now what? We know that somewhere between the time the identifier
(1775) is allocated to the Print menu item by GetMenuItemID, and the
EnableMenuItem function is called, there is a check to see if this file
is actually supposed to be printable.
So how about doing a TRACE between the two and see what's going on?
We want the first break to be when the 2nd parameter of GetMenuItemID
(the first parameter PUSHed) is equal to 5 (the position number of Print
in the drop down list).
The address of the 1st parameter on the function call stack is given by
(ESP+4), the 2nd by (ESP+8), so this works:
    BPX GetMenuItemID IF *(SS:ESP+8)==5
If we set up a macro to display this address in the data window we can
verify it broke at the right time:
    MACRO Position = "dd SS:ESP+8"
and the first BPX can become:
    BPX GetMenuItemID IF *(SS:ESP+8)==5 DO "Position"
Break here, F11 and notice the menu identifier (1775) in EAX and the
position (5) in the first line of the data window.
Set up the second breakpoint similarly:
    BPX EnableMenuItem IF *(SS:ESP+8)==5 DO "Position"
(again, we are looking at the 2nd parameter on the stack)
Then set up the Trace. You may want to increase the Trace Buffer size
from the default of 8K.
    TASK               will give the Taskname Acrord32
    BPRW Acrord32 T    will set the trace
Press F5 and you will break back into SoftIce after the code between the
two function calls has been executed. F11 to return to Acrord32. You
might temporarily toggle out the Register/Data/Code windows with WR/WD/WC
and maximize LINES before typing SHOW to display the trace.
SHOW 1 will show the last command executed
    000001 0137:0054C117 FF159C295700 CALL [USER32!EnableMenuItem]
and you can use the arrow keys to scroll up and down.
A full screen's worth of this trace gives a nice screendump with the
Icedump
    PAGEIN N c:\filename.txt
command.
Looking back through the trace code, you quickly see a suspicious jump.
You can patch this or force EAX to 1 a few lines back by changing
    SBB EAX, EAX
    INC EAX        (EAX=0)
into
    XOR EAX, EAX
    INC EAX        (EAX=1)
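For readers not fluent in x86, the arithmetic of the two instruction pairs can be modelled in a few lines. This is only an emulation of the instruction semantics on 32-bit values, not code from Acrobat: SBB EAX,EAX computes EAX - EAX - CF, so the result depends solely on the carry flag, while XOR EAX,EAX always yields zero. Reading the (EAX=0) note above as the carry-set case is my interpretation.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def sbb_same_register(carry_flag):
    """SBB EAX, EAX: EAX - EAX - CF, i.e. 0 or 0xFFFFFFFF."""
    return (0 - carry_flag) & MASK32


def xor_same_register():
    """XOR EAX, EAX: always 0, whatever the flags were."""
    return 0


def inc(value):
    """INC EAX with 32-bit wrap-around."""
    return (value + 1) & MASK32


# Result in EAX after each pair, for both possible carry flag values.
original = {cf: inc(sbb_same_register(cf)) for cf in (0, 1)}
patched = {cf: inc(xor_same_register()) for cf in (0, 1)}

if __name__ == "__main__":
    print("SBB/INC:", original)
    print("XOR/INC:", patched)
```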
I won't give the actual addresses to patch, as that would take the fun
out ;-)
Please correct any mistakes I might have made.
Cheers,
Kayaker
The use of pdf2txt@adobe.com
This is an extract from an email I wrote some time ago... common knowledge (see also
Wolfgang Redtenbacher's contribution), but hey! it works fine for me!
More and more documents are stored in Adobe's pdf format on the Web.
That may be fine for frill-formatting purposes, but quite annoying for the rest
of us, since pdf files are quite cumbersome for cut & paste and for search & grepping
purposes.
I have realized that many don't know that there's a nice (email) utility by
Adobe itself for those of you that prefer plain *.txt files (that can be searched,
cut, pasted or grepped ad libitum).
Simply send an email with your pdf files attached (i.e. use the "insert file" option)
to the following email address:
pdf2txt@adobe.com
You don't need to send either a text or a subject.
After a couple of minutes: "Hey bingo!" you'll get your text
file emailed back to you (for free of course).
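If you would rather script the request than click through a mail client, something along these lines should do. It is a sketch under assumptions: the sender address and the SMTP host in the usage note are placeholders you must replace with your own, and whether the Adobe address still answers is something only a test can tell. The message itself follows the recipe above: no subject, empty body, one PDF attached.

```python
from email.message import EmailMessage


def build_request(filename, pdf_bytes, sender="you@example.org"):
    """Build a mail for the conversion service: empty body, one PDF attached.

    sender is a placeholder address; use your own.
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = "pdf2txt@adobe.com"
    message.set_content("")  # neither text nor subject is needed
    message.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=filename,
    )
    return message


if __name__ == "__main__":
    demo = build_request("document.pdf", b"%PDF-1.3 placeholder bytes")
    print(demo["To"], [part.get_filename() for part in demo.iter_attachments()])
```

To actually send it you would hand the message to smtplib, for example smtplib.SMTP("your.mail.server").send_message(message), where the server name is again yours to fill in.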
Also useful: http://www.adobe.com/products/acrobat/access_email.html: Adobe PDF conversion by e-mail
getting a non-protected PDF file
I received the following from heather spoonheim (September 2002):
+Fravia;
I have been following your site since early 1999, and
am not sure just how active you are, but, I thought I
would send a quick note as an addendum to the PDFFING
page.
Something that was not mentioned was just how simple
getting a non-protected PDF file from a protected PDF
file really is.
First I did some searching and found out that
non-printing was actually up to the PDF viewer. That
means the software is requested by the document to not
print the document, and the software honours the
request. What does this mean? It means that if you
have the source to a viewer, you can tell it to ignore
the request. Just do a search for open source PDF
viewers, and you find xpdf, just like on your PDFFING
page. You now have the source for PDF software.
To create your own PDF software that ignores the
"print disable" information, edit the pdftops.cc file.
Find this section of the file:
    // check for print permission
    if (!doc->okToPrint()) {
      error(-1, "Printing this document is not allowed.");
      goto err1;
    }
Comment out that block, compile, and presto, whammy,
you can convert the .PDF to .PS using the
pdftops.exe program.
What to do now? Well, all laser printers understand
.PS files, so you can just do a dump from the command
line to the printer, or you can run Adobe Distiller on
the file, and create a new PDF, or you can run
Ghostscript on the file. Anything you want.
ps: To compile on a M$ platform you need either
MSVC(big bucks), or DJGPP(free as in speech). I
didn't try with BCC (free as in beer), but it would
probably work also. The ms_make.bat file that comes
with the source package worked flawlessly for me the
first time.
Advanced PDF Password Recovery Pro 2.12:
Decrypt protected Adobe Acrobat PDF files, which
have "user" and/or "owner" passwords set,
preventing the file from opening or editing,
printing, selecting text and graphics etc.
Gymnast 3.5 (build 149):
Gymnast converts text files to Adobe Acrobat PDF format without the need for
any additional Adobe software.
The alternative to the full Acrobat suite (which is relatively
easy to find on the web: search for
KWW500R7150122-128 or for acroba5 .zip / .rar / .ace):
Gymnast supports hyperlinks, annotations, and automatic generation
of bookmarks from headings. You can even include links to Web sites
and other PDF files. Produce professional-looking
documents for the Web. Development stopped end 1999.
Now freeware: Use this key to register your full copy:
GYM03-35672-11110-33170
GNU-Linux free pdf-viewers: http://lwn.net/Articles/113094/ (GNOME Ghostview (ggv), kghostview,
xpdf, gpdf, kpdf)
Cleaning Adobe bloated reader 6
How to use liposuction to repair Adobe Reader 6
And give it mouth-to-mouth respiration too
(article by Fernando Cassia, taken from The Inquirer and published on
searchlores in February 2004)
I reported some time ago on the
slooooooow nature of Adobe's latest version of the acrobat (.pdf)
reader, recently renamed "Adobe Reader" by their marketing
spinmeisters. See "Adobe's quiet release of Reader software causes
people to scream".
And just a few days ago an e-mail from a reader, Kelly Cook, sneaked
into my inbox, claiming to have found a way to trim the fat off
Adobe Reader 6, and improving its load time to more reasonable
levels. Apparently the reader spotted this info on a Mac-related
blog and decided to try it on the Windows version. I have tried it
myself and guess what, it worked.
On my tests, I decided to use the "lowest of the low-end" system:
my Thinkpad 380ed with a Pentium I-MMX class CPU. Before the
liposuction, Adobe Reader 6 took 41 seconds to load (without any
PDF file), after the fat-removal procedure, it took 20 seconds. On
high-end systems, however, the results are more dramatic: Kelly
claims Adobe Reader 6 took over 20 seconds to load on a 1.8 GHz
Pentium 4 system, and just under two seconds after the procedure.
So here are the dirty details
1. Install Adobe Reader 6 :)
2. From the Start->Run Windows menu, open the "x:\Program
Files\Adobe\Acrobat 6.0\Reader" folder, where x is the right drive
letter.
3. Find the plug_ins folder and rename it plug_ins_disabled
4. Create a new folder named plug_ins
5. Copy the following files from "plug_ins_disabled" to "plug_ins":
EWH32.api, printme.api, and search.api
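The rename, create and copy steps above are easy to script if you have several machines to slim down. What follows is a minimal sketch, not part of the original article: the function name is invented, the folder path you pass in is whatever "Reader" folder applies on your system, and the default list of kept files is simply the three named above.

```python
import shutil
from pathlib import Path

KEEP = ("EWH32.api", "printme.api", "search.api")  # files named in the article


def trim_plugins(reader_dir, keep=KEEP):
    """Rename plug_ins to plug_ins_disabled and copy back only the kept files.

    Returns the names that were actually copied. Refuses to run twice so an
    existing plug_ins_disabled folder is never overwritten.
    """
    reader = Path(reader_dir)
    active = reader / "plug_ins"
    disabled = reader / "plug_ins_disabled"
    if disabled.exists():
        raise FileExistsError(f"{disabled} already exists; nothing changed")
    active.rename(disabled)   # rename the original folder
    active.mkdir()            # create a new, empty plug_ins
    copied = []
    for name in keep:         # copy back the chosen .api files
        source = disabled / name
        if source.is_file():
            shutil.copy2(source, active / name)
            copied.append(name)
    return copied


if __name__ == "__main__":
    import sys
    print(trim_plugins(sys.argv[1]))
```

To experiment with other plug-ins, as suggested below, pass a longer keep list, for example trim_plugins(folder, KEEP + ("reflow.api",)).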
Of course this will limit the functionality to viewing
non-encrypted pdf files, but that's exactly what I want Acrobat
^B^B^B^B^B Adobe Reader for, 99.9% of the time. You might want to
experiment leaving some of the fat in, I mean, .API files, like
reflow.api and search5.api (if it's there), and see how it affects
functionality and load times.
With the files listed, you get half the load time on low-end
systems, and a 2-sec load time on high-end ones. Still, you might
want to prefer using Acrobat Reader 4.05 on old systems, since it
loads in just seven seconds instead of 20.
Your mileage might vary. Liposuction is a dangerous clinical
procedure. Consult your doctor. All lawsuits and claims should go
not to me, but to our editor and our reader Kelly. ;)
"Remote Approach" pdf malware
The following taken from http://lwn.net/Articles/129729/
Unexpected features in Acrobat 7
(This article was contributed by Joe 'Zonker' Brockmeier)
Linux users may have been pleased to find that Adobe has finally
made available a new version of its Acrobat Reader, with
accessibility features, a much slicker interface than Acrobat 5.x
and other spiffy new features. However, there are a few other
features that Linux users should be aware of.
A company called Remote Approach is promising to alert PDF
publishers as to the "reach and use of their materials." We were
curious to find out how Remote Approach was going to make good on
its promise, given that PDF has largely been seen as a one-way
medium. To find out, we created a test account and uploaded a PDF
to be "tagged" by Remote Approach, and then downloaded the
modified document to see whether Remote Approach could log our use
of the document.
Remote Approach's reporting did not work when we viewed the
document with Kpdf, Xpdf and Adobe Reader 5.0.10. It also failed
using Apple's "Preview" application on Mac OS X. The document was
still viewable with no apparent glitch in other PDF readers, but
the reporting function did not work. However, when we opened the
file using Adobe Acrobat Reader 7, Remote Approach started logging
views from our IP address. After doing a little research, we found
that Adobe's Reader was connecting to
http://www.remoteapproach.com/remoteapproach/logging.asp each time
we opened the document. The information is submitted over port 80
using HTTP, so it is unlikely that a home or office firewall
would, in a normal configuration, block the activity, unless the
firewall administrator is attempting to block Web browsing.
Apparently, Remote Approach's "tag" to our document included the
addition of JavaScript code causing Acrobat to report back to
their server; the information reported includes the fact that the
document had been read, our IP address, and which viewer it had
been read in. (Interestingly, Remote Approach does not seem to
recognize the Linux version of Acrobat Reader, as it left the
"User Agent" field blank in its reports.)
What many Linux users may not have realized, since Adobe did not
release an Acrobat Reader 6.x for Linux, is that Adobe has added
JavaScript support to PDF and the official Acrobat readers since
Acrobat 6.x. For those interested in the JavaScript support and
its abilities in Acrobat, see Adobe's scripting reference or
scripting guide. (Both are PDFs, of course.)
By default, Adobe Reader 7 turns on JavaScript, so the "tagged"
document is able to "phone home" without the user's awareness.
Turning off JavaScript disables the document's code, and prevents
Remote Approach (or any other entity) from tracking views of the
document. No doubt, Remote Approach is using features that would
normally be used to submit information from a PDF form.
The inclusion of JavaScript in Adobe Reader 7 for Linux no doubt
provides a number of welcome features for users, but it also
raises some privacy issues. The reader does not inform the user
that information is being submitted, so users are likely to be
oblivious to the fact that another party is aware of their PDF
reading habits. While a user may not find it objectionable to
notify the publisher, there are those of us who don't care to
allow publishers to snoop on activities taking place on our
personal computers.
Lucky for us, there are plenty of alternatives to Adobe's Reader.
Free PDF readers are unlikely to adopt features allowing the
reader to silently phone home in response to code stored within
the document itself. If you must use Acrobat, however, you may
want to have a look at the JavaScript settings first.
(pointed out by Kane)
(c) III Millennium: [fravia+], all rights
reserved, reversed and deserved