Making Compact PDFs Using Acrobat Distiller
Portable Document Format (PDF) files are a de facto standard for publishing complex documents on web sites. Forms, visually rich brochures and long documents are commonly presented as PDFs. It is easy to ‘print’ to a PDF-generating printer such as Distiller or PDFWriter instead of rebuilding the page in HTML or SWF, and the Reader is free. As a result, placing PDFs on the web has become commonplace.
The web, however, is not the fast, high-bandwidth, unlimited-download environment many of us experience at work. Sometimes even at work our internet access is throttled by slow modems, slow proxy servers or, god forbid, both.
Whilst making a PDF is relatively easy, making a small, compact and efficient one takes a little setup and thought.
This article is not going to describe how to make extreme PDFs. That would require a long, in-depth study of PDF as a file format, squeezing every last byte, redundant space and element from the file.
PDFWriter or Distiller?
Acrobat 3.0 and 4.0 installed a printer called ‘PDFWriter’. PDFWriter takes GDI (Windows) or QuickDraw (Mac OS 8/9) printer calls and converts them into a PDF file. QuickDraw and GDI are not as accurate as Postscript at positioning.
Using a baseline file containing one graph (vector graphics), one image and text set in Times New Roman (TrueType), we can compare the file sizes produced by Distiller and PDFWriter. In both instances, I used the JPEG Medium setting for image compression.
Source Microsoft Word 2000 Document: 221184 bytes
PDFWriter: 92656 bytes
Distiller: 20676 bytes
Why are PDFWriter’s PDFs larger? Isn’t it the same file? Well, yes, but the path taken is different. You can travel from Perth to Sydney the quickest way, or via Darwin. Either way you arrive at the destination, but the distance you travel is very different. PDFWriter is no different. We are asking the application to generate a representation of the page using the operating system’s internal graphics, which are tuned for screen output but not for printed output.
Looking “inside” the PDFs with Acrobat 5.0’s Tools>PDF Consultant>Audit Space Usage reveals the differences between the two files:
Distiller: Fonts 9.5% [1966 bytes], Images 68% [14053 bytes]
Above: Tools>PDF Consultant>Audit Space Usage Report for Distiller created PDF
PDFWriter: Fonts 4.5% [4162 bytes], Images 92.3% [85491 bytes]
Above: Tools>PDF Consultant>Audit Space Usage Report for PDFWriter created PDF
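The audit percentages can be checked with a few lines of arithmetic. This Python sketch reuses only the byte counts quoted above (the helper name `share` is hypothetical). It also shows that almost all of the extra bytes in the PDFWriter file are image data.

```python
def share(part: int, total: int) -> float:
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)

# Byte counts from the Audit Space Usage reports above.
distiller = {"total": 20676, "fonts": 1966, "images": 14053}
pdfwriter = {"total": 92656, "fonts": 4162, "images": 85491}

for name, pdf in (("Distiller", distiller), ("PDFWriter", pdfwriter)):
    # Fonts and images as a percentage of each file.
    print(name, share(pdf["fonts"], pdf["total"]), share(pdf["images"], pdf["total"]))

# How much of the difference between the two files is image data?
extra_total = pdfwriter["total"] - distiller["total"]
extra_images = pdfwriter["images"] - distiller["images"]
print(share(extra_images, extra_total))
```

The first loop reproduces the report figures (9.5% and 68.0%, then 4.5% and 92.3%). The last line shows that about 99% of the extra bytes are images.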
Why does the Distiller produce a smaller PDF? Postscript is targeted at printed output, anything from low-end laser and inkjet printers to very high quality print.
The Distiller, at its core, has a Postscript engine. It takes the Postscript generated by the application and interprets it, leaving the resulting images, text and graphics in an internal, object-based state. This state is easily represented in PDF.
I recommend using the Distiller as the preferred path for creating a compact PDF.
The Creating Application
– Print in RGB. Images in the CMYK colour space take extra disk space (four colour channels instead of three) and contain less colour information (technically, a smaller colour gamut) than RGB. The Distiller will convert from CMYK to RGB anyway. So try to keep the source images and colour definitions in RGB.
– Reduce image use. As images take the most space in a PDF, use vector graphic elements wherever possible. This is especially so for elements such as corporate logos: they are usually available in vector format, and they are used many times in a PDF.
– Do not convert text to outlines. Applications such as Illustrator and FreeHand can convert type to outlines, usually so that the font does not have to be sent along with the document. In a PDF, every outlined character is a vector shape describing that glyph (character). Each time the character is used again, the whole shape is described again, which can take many bytes. Keeping live text that references a typeface takes about one byte per character.
– Reduce font variation. Try to minimise the number of typefaces used in a document. For every font or typeface used, Acrobat may (depending on the settings) embed more font information. This increases the resulting file size.
– Use Zapf Dingbats as much as possible. Symbol fonts such as Wingdings are fine, but whenever they are used, the Distiller has to embed their characters into the PDF. If Zapf Dingbats has a similar character or mark, use that glyph instead. Zapf Dingbats is installed along with Acrobat Reader, so the local version of the font will be used.
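Here is a rough figure for the CMYK point in the list above. Before compression, an image stores one sample per channel per pixel. A four-channel CMYK image is therefore a third larger than the same image in three-channel RGB. This sketch assumes 8 bits per channel and a hypothetical 600 × 400 pixel image. JPEG or ZIP compression changes the absolute sizes, but the extra channel still has to be stored.

```python
def raw_image_bytes(width_px: int, height_px: int, channels: int, bits: int = 8) -> int:
    """Uncompressed image size: one sample per channel for every pixel."""
    return width_px * height_px * channels * bits // 8

# Hypothetical 600 x 400 pixel image at 8 bits per channel.
rgb = raw_image_bytes(600, 400, channels=3)   # R, G, B
cmyk = raw_image_bytes(600, 400, channels=4)  # C, M, Y, K
print(rgb, cmyk, cmyk / rgb)
```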
Tuning The Distiller
Now that we are using the Distiller as the tool to create compact PDFs, it’s time to understand the process it goes through to create the resulting PDF.
The creating application first generates Postscript. Applications that do not produce Postscript directly use the AdobePS printer driver to generate it.
The heart of the Distiller is a Postscript engine, but the instructions it follows are contained in a file called the ‘joboptions’ file. This file tells the Distiller which fonts to embed and how, as well as what compression to apply to images.
Acrobat 4.0 and 5.0 include a joboptions file called “ScreenOptimized” and “Screen” respectively. It applies a standard set of job options to the source Postscript, and these directly affect the resulting PDF. These settings are a good starting point for creating compact PDFs.
We can however further tweak these settings to make even smaller, more compact PDFs.
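Behind the dialog boxes, a joboptions file is essentially a Postscript dictionary of named Distiller parameters passed to `setdistillerparams`. The Python sketch below renders a compact-PDF set of such parameters in that form. The key names are taken from Adobe’s Distiller parameters documentation, but treat them as an assumption and check them against the documentation for your Distiller version. The function name and the chosen values are illustrative only.

```python
def render_joboptions(params: dict) -> str:
    """Render a dict as a Postscript setdistillerparams dictionary (sketch)."""
    lines = ["<<"]
    for key, value in params.items():
        if isinstance(value, bool):       # check bool before numbers: bool is an int subclass
            text = "true" if value else "false"
        elif isinstance(value, str):
            text = "/" + value            # Postscript name, e.g. /sRGB
        else:
            text = str(value)             # numbers are written as-is
        lines.append(f"/{key} {text}")
    lines.append(">> setdistillerparams")
    return "\n".join(lines)

# Assumed parameter names; values follow the advice in this article.
compact = {
    "CompatibilityLevel": 1.3,            # Acrobat 4; use 1.2 for Acrobat 3
    "Optimize": True,                     # Optimise for Fast Web View
    "ASCII85EncodePages": False,          # binary output, not ASCII
    "SubsetFonts": True,
    "DownsampleColorImages": True,
    "ColorImageResolution": 72,
    "ColorConversionStrategy": "sRGB",
}
print(render_joboptions(compact))
```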
Launch Acrobat Distiller and choose Settings>Job Options.
We are going to look at each of its five tabs and choose the most appropriate options to reduce the size of our PDF.
The first panel displayed is “General”.
Acrobat 3, 4 or 5? What we are really choosing is not the version of Acrobat but the underlying version of the PDF file format. Each time Adobe releases a new version of Acrobat, the file format changes to support the features added to the application. For instance, Acrobat 4.0 added the ability to embed Times, Helvetica, Courier and Symbol in a PDF, and Acrobat 5.0 added the ability to express transparency.
Acrobat 3 = PDF 1.2
Acrobat 4 = PDF 1.3
Acrobat 5 = PDF 1.4
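Every PDF begins with a header such as `%PDF-1.3` that records its file format version, so you can check which Acrobat release a file is compatible with. Here is a minimal Python sketch that maps versions using the list above. The function names are hypothetical. The sketch reads only the header and ignores the version override that PDF 1.4 files may carry in the document catalog.

```python
import re

# PDF version -> Acrobat release, from the list above.
ACROBAT_FOR_PDF = {"1.2": "Acrobat 3", "1.3": "Acrobat 4", "1.4": "Acrobat 5"}

def pdf_version(data: bytes) -> str:
    """Return the version in the '%PDF-1.x' header at the start of a file."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    if match is None:
        raise ValueError("not a PDF header")
    return match.group(1).decode("ascii")

def acrobat_release(data: bytes) -> str:
    """Acrobat release matching the header version, or 'unknown'."""
    return ACROBAT_FOR_PDF.get(pdf_version(data), "unknown")

header = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n"
print(pdf_version(header), acrobat_release(header))
```

To check a file on disk, pass in its first few bytes, for example `open(path, "rb").read(16)`.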
The decision to use a particular version of Acrobat, and therefore the new features added to the file format, must be based on what you believe your readers have installed on their PCs. The Reader may be a free download from the Adobe web site, but end users do not install it the day it is released.
Personally, I’ve found it usually takes 6 to 12 months after a new Acrobat release for its PDF version to become the ‘standard’, and therefore the compatibility target when creating PDFs.
The good news is that Acrobat 5.0 Distiller can make Acrobat 3, 4 or 5 PDFs.
When using Acrobat 4 or 5 compatibility, the file gets marginally larger than with Acrobat 3. This is because the Distiller adds a colour management (ICC) profile to each image.
Unless you are very particular about colour, there is no need to attach a colour profile to each image. Leaving the profiles out further reduces the size of our compact PDF.
Acrobat 3.0 compatibility has other limitations: a maximum of 32768 pages and a maximum page size of 45×45 inches.
Optimise for Fast Web View: The Distiller restructures the file to prepare for page-at-a-time downloading. It also compresses text and line art, overriding the setting in the Compression panel. This makes for faster access and viewing when downloading.
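A file saved with Fast Web View is a ‘linearized’ PDF. It starts with a linearization parameter dictionary, which the PDF specification requires to sit within the first 1024 bytes of the file. The minimal Python check below looks only for the `/Linearized` key in that region and does not validate the rest of the structure. The function name is hypothetical.

```python
def is_linearized(data: bytes) -> bool:
    """Rough check for a Fast Web View (linearized) PDF.

    Looks for the /Linearized key in the first 1024 bytes, where the
    linearization parameter dictionary must appear. Hint tables are not checked.
    """
    return b"/Linearized" in data[:1024]

sample = b"%PDF-1.3\n1 0 obj\n<< /Linearized 1 /L 20676 /N 1 >>\nendobj\n"
print(is_linearized(sample))
```

In practice you would pass it the start of a file, for example `open(path, "rb").read(1024)`.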
Thumbnails: Thumbnails are small versions of the PDF’s pages, stored inside the document for the Thumbnails palette in Acrobat. Before Acrobat 5.0, Thumbnails had to be created by the Distiller or manually inside Acrobat itself. Acrobat 5.0 creates Thumbnails dynamically.
Therefore, to keep our PDF small, it’s best to turn this option off.
As images make up a large share of the space in PDFs, it’s best to optimise them as much as possible.
The first important setting is the ‘target’ resolution of 72dpi, roughly the resolution at which most computer screens display the compact PDF. New in Distiller 5.0 is the ‘for images above’ threshold. Only images whose resolution exceeds it are downsampled, so setting it to 72dpi as well forces every image above 72dpi down to 72dpi.
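Pixel count falls with the square of the resolution. A 300dpi image reduced to 72dpi keeps only (72/300)² of its pixels. The Python sketch below models the target/threshold pair described above. The function names and the example image are hypothetical, and the sketch ignores the resampling method (average, bicubic and so on).

```python
from typing import Optional, Tuple

def effective_ppi(source_ppi: float, target_ppi: float = 72,
                  threshold_ppi: Optional[float] = None) -> float:
    """Resolution after downsampling: only images above the threshold are resampled."""
    if threshold_ppi is None:
        threshold_ppi = target_ppi  # i.e. downsample everything above the target
    return target_ppi if source_ppi > threshold_ppi else source_ppi

def pixel_size(width_in: float, height_in: float, ppi: float) -> Tuple[int, int]:
    """Pixel dimensions of an image placed at width_in x height_in inches."""
    return round(width_in * ppi), round(height_in * ppi)

# Hypothetical 4 x 3 inch photo scanned at 300 ppi.
original = pixel_size(4, 3, 300)
downsampled = pixel_size(4, 3, effective_ppi(300))
ratio = (original[0] * original[1]) / (downsampled[0] * downsampled[1])
print(original, downsampled, round(ratio, 1))
```

Here the image goes from 1200 × 900 to 288 × 216 pixels, about 17 times fewer pixels, before any JPEG compression is applied.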
The quality setting you choose will depend on the image quality you are willing to ‘live with’ in the compact PDF. Remember, we are looking at images as the average end user will experience them.
In our baseline document, the three different compression settings result in the following file sizes:
Medium: 20876 bytes
Low: 16615 bytes
Minimum: 12677 bytes
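Here are the same sizes expressed as savings relative to Medium. The Python helper name below is hypothetical, and the byte counts are the ones listed above.

```python
def savings_vs(sizes: dict, baseline_key: str) -> dict:
    """Percentage by which each size is smaller than the baseline entry."""
    base = sizes[baseline_key]
    return {key: round(100 * (base - size) / base, 1) for key, size in sizes.items()}

sizes = {"Medium": 20876, "Low": 16615, "Minimum": 12677}
for setting, saving in savings_vs(sizes, "Medium").items():
    print(f"{setting:8} {sizes[setting]:6} bytes  {saving:4.1f}% smaller than Medium")
```

Low saves about 20% and Minimum about 39% over Medium.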
See the difference between the three generated PDFs using Medium, Low and Minimum Compression: Link To Compression Comparison
Before discussing fonts, we need to look at how the Acrobat Reader renders type. The optimal way of displaying type is for the Acrobat Reader to use the operating system’s built-in type display mechanism. Apple and Microsoft spend many engineer-years tuning their operating systems to display text as quickly as possible.
In an environment where a PDF can go anywhere, you cannot rely on every font (and font version) used in a document being available everywhere in the world. Even relying on a fairly common font such as Arial is not recommended. Do Linux users have exactly the same version? What about different versions of Windows, or different language versions of the same operating system?
This is where PDF first made its name: a universal format where the font issues are handled by the Reader. Fonts used on the creating computer are embedded into the PDF file as part of the Distiller process, provided the font permits embedding. This is a ‘switch’ in the font, turned on or off by the maker or designer of the typeface.
There is no need to send your fonts along with the PDF. The Acrobat Reader is supplied with a collection of typefaces representing the fonts supplied as standard with Postscript Level 1: Times, Helvetica, Courier, Symbol and Zapf Dingbats. These fonts ship with the Reader, so if you use them in your design, there is no need to embed them in a compact PDF. This is not recommended for PDFs intended for high-quality output, however. The versions of these fonts differ slightly depending on the Reader version the end user has, and between platforms.
Another smart feature of Acrobat since the first version is the creation of faux fonts (faux is French for false). The Acrobat Reader looks at typeface information embedded in the PDF by the Distiller that describes the font’s features (serif or sans serif, weight, descender details and so on). It then creates a false version of the typeface on the end user’s display. It will not look exactly the same, but it may suffice for the purposes of your compact PDF.
In our baseline Microsoft Word document, three fonts are used: Times New Roman (TrueType), Arial (TrueType) in the graph and Imago ExtraBold in the headline. If you view File>Document Info>Fonts, you will notice that these fonts are listed:
The important column to notice in the above screen dump is ‘Actual Font’, which lists the font actually used in place of the original. Imago ExtraBold lists “AdobeSansMM”. This indicates that the font used for display is a sans serif Multiple Master (MM) font; sans serif fonts lack the little strokes at the ends of the character shapes. In other words, Acrobat has created a false (faux) version of this font to reduce file size. Not all fonts can be fauxed.
In contrast, Times New Roman and Arial list other typefaces: TimesNewRomanPSMT and ArialMT. This indicates that the original font is not embedded, and Acrobat is substituting the font to display it as accurately as possible.
The most appropriate option is to subset embedded fonts and never embed the base-14 fonts. The result is that the Distiller:
– does not embed the base-14 fonts that are included with Acrobat Reader
– does not embed fonts that can be substituted with the AdobeSerifMM/AdobeSansMM faux fonts
– does embed fonts that cannot be fauxed
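The three rules above can be written as a small decision function. The base-14 font names are the standard ones. The `can_faux` and `embeddable` inputs are hypothetical, because whether a font can be fauxed, and whether its embedding switch is on, are judged by Acrobat and the font itself. This is a sketch of the rules as described here, not of the Distiller’s actual internals.

```python
BASE_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

def embed_decision(font_name: str, embeddable: bool, can_faux: bool) -> str:
    """Apply the compact-PDF font rules described above (hypothetical inputs)."""
    if font_name in BASE_14:
        return "not embedded (supplied with Reader)"
    if can_faux:
        return "not embedded (Reader builds a faux font)"
    if embeddable:
        return "subset embedded"
    return "not embedded (embedding not permitted; substituted)"

print(embed_decision("Helvetica", embeddable=True, can_faux=True))
print(embed_decision("Imago-ExtraBold", embeddable=True, can_faux=True))  # hypothetical name
print(embed_decision("Wingdings", embeddable=True, can_faux=False))
```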
To create the smallest files, the Distiller converts all colour to sRGB (Acrobat 4 or 5) or Calibrated RGB (Acrobat 3). To reduce the file size further, it’s best to set the RGB and CMYK ‘Working Spaces’ to ‘None’. This discards any ICC colour profiles from the images before they are embedded in the PDF.
The only option relevant to file size is “ASCII format”. This option is a legacy of the bad old days of dial-in bulletin board systems and modems, when the most appropriate way to transfer files was sometimes 7-bit ASCII rather than 8-bit communications. Turning this option on increases the file size by up to 12%.
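Assuming the ‘ASCII format’ option encodes the PDF’s binary streams with ASCII85 (the Distiller parameter `ASCII85EncodePages`), the cost per stream is easy to measure. ASCII85 turns every 4 bytes into 5 printable characters, so an encoded stream grows by about 25%. The growth for the whole file is smaller, because dictionaries and other parts of a PDF are already plain text. This Python sketch uses the standard library encoder on sample data without zero bytes, so ASCII85’s ‘z’ shortcut for all-zero groups never applies.

```python
import base64

def ascii85_growth(data: bytes) -> float:
    """Encoded length divided by original length for ASCII85."""
    return len(base64.a85encode(data)) / len(data)

# 4000 bytes of binary data containing no zero bytes.
sample = bytes(i % 255 + 1 for i in range(4000))
print(ascii85_growth(sample))
```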
Individual Graphic Tuning
In some circumstances, you may need to individually tune the images in a PDF to reduce file size or increase the quality of certain elements.
Tools such as Quite A Box of Tricks or Enfocus PitStop let you optimise individual elements of a PDF.
Beware and Be-aware Of
– Combining PDFs (taking the pages from one PDF and inserting them into another using Acrobat) produces larger PDFs than combining the pages in the source document.
– Adding form fields, buttons, tags and bookmarks to PDFs increases their size. These elements are much smaller than the images a PDF may contain, but they still add to the size. Consider the audience of the PDF. If you expect visually impaired people to use the PDF, tagging is required for screen readers to interpret and read the document correctly. Long, text-filled documents cry out for bookmarks. Visually rich documents for on-screen viewing may need links created in Acrobat.
– Applications such as Adobe Illustrator 9 and 10 and InDesign 2.0 can create bitmap images where you might not expect them. When these applications create a Postscript file from a document containing transparency, they may flatten the transparency into a bitmap, even where the source elements are vector graphics or text.
“Flattening” is a function of printing, and is executed to ensure the printed result matches the designer’s intent. As we know, bitmaps take more space than vector and text elements. Therefore, the file size may blow out.
– Some typefaces are large. Complex fonts with more intricate paths take more space than simple typefaces. Using bold, semibold or italic variations of a typeface forces the Distiller to embed more font data.
– In Microsoft Word and other applications, you can “scale up” images. The application stores the image data at its original size and only scales the pixels up at print time. The PDF, however, holds the image as it was printed from the application, at the enlarged size. This can make the PDF larger than the original application document, because the PDF holds more pixels.
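Pixel count grows with the square of the scale factor. If an image enlarged to 200% is resampled to its printed size, its pixel data quadruples. Here is a small Python sketch with a hypothetical 400 × 300 pixel image. Whether a particular application or printer driver actually resamples this way is an assumption in this model.

```python
def scaled_pixel_count(width_px: int, height_px: int, scale: float) -> int:
    """Pixels in an image resampled to its scaled size (hypothetical model)."""
    return round(width_px * scale) * round(height_px * scale)

original = scaled_pixel_count(400, 300, 1.0)
enlarged = scaled_pixel_count(400, 300, 2.0)  # image stretched to 200% in the document
print(original, enlarged, enlarged // original)
```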
How Low Can it Go?
Out of interest, how small can you make a PDF?
The simplest piece of Postscript that will produce a PDF is the command ‘showpage’. Normally, the Postscript interpreter treats this as “take everything that has been rendered to the page and generate output”. With nothing on the page, you get, well, nothing.
Acrobat Distiller 5.0.5 generates a PDF of 2116 bytes in length from this simple Postscript command.
– half of this is taken up by an XMP (Extensible Metadata Platform) packet containing information about the file.
Handcoding a PDF would result in a smaller file, but I’ll leave this as an exercise for the reader.
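For the curious, here is a minimal Python sketch of that exercise. It builds a blank, one-page PDF 1.2 file containing a catalog, a page tree, one page and a cross-reference table, computing the byte offsets as the file is assembled. The output is hand-built, not something the Distiller produces, and it carries no XMP metadata. The function name is hypothetical.

```python
def minimal_pdf() -> bytes:
    """Build a blank one-page PDF 1.2 file by hand (a sketch, not Distiller output)."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # US Letter page with an empty resource dictionary and no content stream.
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.2\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)

pdf = minimal_pdf()
print(len(pdf))
```

Writing the result to disk, for example with `open("blank.pdf", "wb").write(pdf)` (a hypothetical file name), gives a blank page in a file of a few hundred bytes.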
Obviously, a PDF with no contents is as useful as a non-alcoholic beverage at an Australian barbeque.
If you are interested in a tool that automatically enhances the size (that is, makes it smaller), I suggest you look into PDFEnhancer.