According to AbbreviationFinder, PDF (Portable Document Format) is a document storage format developed by Adobe Systems. It is a composite format that can combine vector images, bitmaps and text.
It is designed for documents intended for printing: it specifies all the information needed for the final presentation and fixes every detail of how the document will look, so no prior adjustment or layout step is required.
- It is multiplatform, that is, it can be displayed on the main operating systems (Windows, Unix / Linux or Mac) without changing the appearance or structure of the original document.
- It can integrate any combination of text, multimedia elements such as video or sound, hypertext elements such as links and bookmarks, and page thumbnails.
- It is one of the most widespread formats on the Internet for exchanging documents. For this reason it is widely used by companies, governments and educational institutions.
- It is an open specification, for which free software tools have been developed that allow creating, viewing or modifying documents in PDF format. Examples are the OpenOffice.org office suite and the LaTeX typesetting system.
- It can be encrypted to protect its content and can even be digitally signed.
- A PDF file can be created from various applications by exporting the file, such as the OpenOffice.org programs and the Microsoft Office 2007 office suite (if it is upgraded to SP2).
- It can be generated from any application by installing a virtual printer in the operating system, which is useful for applications that lack built-in PDF export.
- A subset of it, PDF/A, is an ISO standard (ISO 19005-1:2005) for electronic document files intended for long-term preservation.
- PDF files are device independent: the same file can be printed on an inkjet printer or on an imagesetter. To optimize printing, the options used when creating the PDF file can be configured.
Development of the PDF format, along with applications to view and create such documents, began in 1991, and early commercial and general adoption was very low. The software was distributed under a commercial license. At that time the PDF document viewer was available free of charge, but it was not free software.
The early versions of PDF did not support external hyperlinks; for this reason, adoption on the Internet was limited and the format was not very popular. In those days, dial-up modem connections were common, and PDF documents were much larger than other types of documents, such as unformatted plain text; the later spread of broadband was therefore a key factor in the format's acceptance on the Internet. In addition, other document types already competed strongly with PDF, such as PostScript documents (.ps), which were quite common at the time.
Over time, PDF documents became popular in a number of different areas, such as advertising. Their use grew considerably until the format became a standard. A PDF document is seen as a digital page that is ready to be printed exactly as it is displayed on the screen, without the margin problems that can occur when printing other digital documents.
In the more recent years of its popularity, several reader applications for this type of file have appeared. This popularity has also made it possible to create PDF documents with free software programs, as OpenOffice.org does today. Other applications can even edit them, without relying on Adobe's own application for creating and editing PDF documents.
The PDF file format has changed several times as new versions of Adobe Acrobat have been released. There have been nine versions of PDF:
- (1993) – PDF 1.0 / Acrobat 1.0
- (1994) – PDF 1.1 / Acrobat 2.0
- (1996) – PDF 1.2 / Acrobat 3.0
- (1999) – PDF 1.3 / Acrobat 4.0
- (2001) – PDF 1.4 / Acrobat 5.0
- (2003) – PDF 1.5 / Acrobat 6.0
- (2005) – PDF 1.6 / Acrobat 7.0
- (2006) – PDF 1.7 / Acrobat 8.0
- (2008) – PDF 1.7, Adobe Extension Level 3 / Acrobat 9.0
PDF file format
Regardless of how a PDF file was created, all PDF files share the same internal structure, composed of four parts:
- Header: Identifies the version of the PDF specification that the file follows.
- Body: Description of the elements (objects) used in the pages of the file.
- Cross-reference table: Location information for the elements described in the body, so that each one can be found within the file.
- Trailer: Indicates where to find the cross-reference table.
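The four parts listed above can be illustrated with a small sketch. It assembles a tiny PDF-like byte string and then reads back the version from the header and the cross-reference table offset from the closing section (marked by the `trailer` and `startxref` keywords in the file). The two objects and the helper names `build_pdf`, `read_version` and `read_startxref` are illustrative assumptions, not part of any library, and the result is a simplified skeleton rather than a document meant for a real viewer.

```python
import re

# Illustrative content only: a header line and two minimal body objects.
header = b"%PDF-1.4\n"
objects = [
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
]

def build_pdf(header, objects):
    """Concatenate header, body, cross-reference table and trailer."""
    body = b""
    offsets = []
    for obj in objects:
        offsets.append(len(header) + len(body))  # byte offset of this object
        body += obj
    xref_offset = len(header) + len(body)
    xref = b"xref\n0 %d\n" % (len(objects) + 1)
    xref += b"0000000000 65535 f \n"             # entry for the free object 0
    for off in offsets:
        xref += b"%010d 00000 n \n" % off        # one entry per body object
    trailer = (b"trailer\n<< /Size %d /Root 1 0 R >>\n"
               b"startxref\n%d\n%%%%EOF\n" % (len(objects) + 1, xref_offset))
    return header + body + xref + trailer

def read_version(data):
    """Return the version string announced in the header, or None."""
    m = re.match(rb"%PDF-(\d\.\d)", data)
    return m.group(1).decode("ascii") if m else None

def read_startxref(data):
    """Return the offset recorded after the last 'startxref' keyword."""
    found = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    return int(found[-1]) if found else None

pdf = build_pdf(header, objects)
xref_at = read_startxref(pdf)
print(read_version(pdf))            # version taken from the header
print(pdf[xref_at:xref_at + 4])     # bytes found at the recorded offset
```

Reading from the end of the file first is the point of the design: the closing section tells a reader where the cross-reference table is, and the table in turn gives the position of each body object.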
It should be noted that when a PDF file is modified and new content is added, it gains new body, cross-reference table and trailer sections. When saving the document, however, it can be optimized so that the duplicate sections are merged into one and the file is reorganized.
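The effect of modifying a file can be sketched as follows. The byte strings are simplified stand-ins, not valid PDF content, and the offsets 40 and 180 are made-up placeholders. The sketch assumes the usual convention that each appended section ends with its own `%%EOF` marker and that the newer trailer points back to the previous cross-reference table through a `/Prev` entry; the helper names are hypothetical.

```python
import re

# Simplified stand-in for an original file: header, body, xref, trailer.
base = (b"%PDF-1.4\n"
        b"% ...original body objects...\n"
        b"xref\n% ...original entries...\n"
        b"trailer\n<< /Size 3 >>\n"
        b"startxref\n40\n%%EOF\n")

# Simplified stand-in for what a modification appends to the file.
update = (b"% ...new or changed body objects...\n"
          b"xref\n% ...entries for the changed objects...\n"
          b"trailer\n<< /Size 4 /Prev 40 >>\n"
          b"startxref\n180\n%%EOF\n")

modified = base + update

def count_sections(data):
    """Count body/xref/trailer groups by their end-of-file markers."""
    return data.count(b"%%EOF")

def startxref_values(data):
    """List every recorded cross-reference offset, oldest first."""
    return [int(v) for v in re.findall(rb"startxref\s+(\d+)", data)]

print(count_sections(base), count_sections(modified))
print(startxref_values(modified))
```

An optimizing save would rewrite such a file so that only one body, one cross-reference table and one trailer remain.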
Color representation in PDF
The PDF format is suitable for printing documents because it specifies all the information needed to define them. It is therefore worth looking at how color is represented in a PDF file.
The PDF format specifies color spaces, which describe how the colors in the document are to be interpreted.
A color is defined by one or more numerical components and the interpretation of these will be done according to the specified color space.
Color spaces can be device dependent, device independent, or special color spaces.
Device-dependent color spaces are the simplest and least precise way to reproduce colors, and are used by devices that do not have color management. Each point is described by a color made up of certain amounts of colorants.
PDF has three device-dependent color spaces:
- DeviceCMYK: Color values are described by the CMYK colorants (cyan, magenta, yellow and black) through subtractive mixing.
- DeviceRGB: Color values are described by the RGB colorants (red, green and blue) through additive mixing.
- DeviceGray: Color values are described by an achromatic scale from white to black.
| Color space | Color      | Component values    |
| CMYK        | Pure green | (66%, 0%, 100%, 0%) |
| Gray        | Pure green | Black = 20%         |
With device-dependent color definitions, the same color values will be reproduced differently depending on the device that renders them.
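To make the three device-dependent descriptions concrete, the sketch below expresses one color as RGB, CMYK and gray components. The conversion formulas are simple textbook approximations chosen for illustration (including the 0.3 / 0.59 / 0.11 gray weights); they are assumptions, not what any particular printer or screen does. A real device mixes its own colorants, so the components it needs for the same visible color can differ from these figures.

```python
def rgb_to_cmyk(r, g, b):
    """Naive additive-to-subtractive conversion; components are in 0.0..1.0."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)                 # shared darkness moved to the black channel
    if k == 1.0:                     # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    scale = 1.0 - k
    return ((c - k) / scale, (m - k) / scale, (y - k) / scale, k)

def rgb_to_gray(r, g, b):
    """Weighted sum of the components (weights are an assumed approximation)."""
    return 0.3 * r + 0.59 * g + 0.11 * b

pure_green = (0.0, 1.0, 0.0)
print(rgb_to_cmyk(*pure_green))
print(rgb_to_gray(*pure_green))
```

Because nothing in these numbers says how a given ink or phosphor actually looks, two devices fed the same components can produce visibly different greens.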
Device-independent color spaces are based on the work of the CIE, an international organization that studies light and color. Their objective is to describe in detail how human beings see colors and to reproduce those colors in the same way regardless of the device used. These colors are also called calibrated colors.
Colors are described by numerical matrices and are transformed using reference values for the lightest and darkest neutral colors (the white point and the black point).
PDF has four device-independent color spaces:
- Calibrated RGB: Color values are described by the RGB colorants (red, green and blue) through additive mixing, but intensity, hue and gradation depend on decoding functions that apply a specific gamma value to each colorant.
- Calibrated gray: Color values are described by an achromatic scale from white to black, but intensity, hue and gradation depend on a decoding function that applies a specific gamma value to the colorant.
- Lab: CIE-based color space whose three components, A, B and C, are assigned the L*, a* and b* values of the CIELAB color space (Lab color space).
- ICC-based: Color spaces defined not by the entries of a color space dictionary but by color profiles from the International Color Consortium (ICC).
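The role of the gamma value in the calibrated spaces can be sketched for the gray case. The sketch assumes the decoding function raises the gray component to the power gamma and scales the reference white point by the result; the gamma of 2.2 and the white point values are illustrative assumptions, and `decode_cal_gray` is a hypothetical helper name.

```python
def decode_cal_gray(a, gamma=2.2, white_point=(0.9505, 1.0, 1.089)):
    """Map a gray component a (0.0..1.0) to CIE XYZ values.

    Assumed model: each of X, Y, Z is the white point component
    multiplied by a ** gamma.
    """
    level = a ** gamma
    xw, yw, zw = white_point
    return (xw * level, yw * level, zw * level)

for a in (0.0, 0.5, 1.0):
    print(a, decode_cal_gray(a))
```

With a gamma above 1, a mid-scale component such as 0.5 decodes to well under half of the white point's luminance, which is why two files using the same component values but different gamma settings do not look alike.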
Special color spaces use special methods of color reproduction.
- Separation color spaces: Monochrome color spaces in which special colorants, such as metallic or fluorescent inks, are used.
- DeviceN color spaces: Used when objects need additional colorants in printing. These color spaces allow the colorants of the device to be treated as a multi-component device color space.
PDF files can be compressed, and each element in a file can be compressed with a different algorithm.
Text and PostScript commands can be compressed using the Lempel-Ziv-Welch (LZW) algorithm, and images using JPEG, ZIP or RLE.
- JPEG (Joint Photographic Experts Group): Normally a lossy method, used for grayscale or color images. Recompressing an image causes a cumulative loss of information.
- ZIP (ZIP compression format): Lossless method based on the Deflate algorithm, which replaces repeated sequences with references to earlier occurrences. Suited to color and grayscale images.
- RLE (Run-length encoding): Lossless method used for line-art images, which contain long runs of identical pixels.
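The two lossless approaches in the list can be tried on a small sample. The sketch uses Python's standard `zlib` module for Deflate-style compression and a deliberately simplified run-length scheme that stores (count, value) pairs; that pair layout is an assumption made for readability and is not the byte format used inside PDF files. The sample data stands in for one row of a black-and-white line image.

```python
import zlib

def rle_encode(data):
    """Collapse runs of identical bytes into (count, value) pairs."""
    runs = []
    for byte in data:
        if runs and runs[-1][1] == byte and runs[-1][0] < 255:
            runs[-1] = (runs[-1][0] + 1, byte)   # extend the current run
        else:
            runs.append((1, byte))               # start a new run
    return runs

def rle_decode(runs):
    """Expand (count, value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in runs)

# One row of a line image: long white run, short black run, long white run.
row = b"\xff" * 40 + b"\x00" * 6 + b"\xff" * 40

runs = rle_encode(row)
packed = zlib.compress(row)

print(runs)
print(len(row), len(packed))
```

Both methods are reversible, so decoding returns the exact original bytes; that is the property that separates them from lossy JPEG compression.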