iText

Topics related to iText:

Getting started with iText

If you look at PDF creation, you'll find two different approaches:

Graphical designers use desktop applications such as Adobe Acrobat or Adobe InDesign to create a document in a manual or semimanual process.
In another context, PDF documents are created programmatically, using an API to produce PDFs directly from software applications with little or no human intervention. Sometimes the document is first created in an intermediary format (e.g. XML or HTML) and then converted to PDF.

These different approaches demand different software products.

The same goes for PDF manipulation.

You can update a PDF manually in tools such as Adobe Acrobat.
There are also tools that fill out forms automatically based on information from a database.

iText is a tool that focuses on the automation side of things.

What is iText?

iText is an SDK that was developed to allow developers to do the following (and much more):

Generate documents and reports based on data from an XML file or a database
Create maps and books, exploiting numerous interactive features available in PDF
Add bookmarks, page numbers, watermarks, and other features to existing PDF documents
Split or concatenate pages from existing PDF files
Fill out interactive forms
Digitally sign PDF documents
Serve dynamically generated or manipulated PDF documents to a web browser

iText is not an end-user tool. You have to build iText into your own applications so that you can automate the PDF creation and manipulation process.

When to use iText?

Typically, iText is used in projects that have one of the following requirements:

The content isn't available in advance: it's calculated based on user input or real-time database information.
The PDF files can't be produced manually due to the massive volume of content: a large number of pages or documents.
Documents need to be created in unattended mode, in a batch process.
The content needs to be customized or personalized; for instance, the name of the end user has to be stamped on a number of pages.

Often you'll encounter these requirements in web applications, where content needs to be served dynamically to a browser. Normally, you'd serve this information in the form of HTML, but for some documents, PDF is preferred over HTML for better printing quality, for identical presentation on a variety of platforms, for security reasons, to comply with specific industry standards (such as PAdES, PDF/A, or PDF/UA), or to reduce the file size.

Fonts: iText 5 versus iText 7

In the first versions of iText, there was only one font class: Font.

With this class, you could create a Font object for any of fourteen fonts from five font families: Helvetica (regular, bold, oblique, bold-oblique), Times Roman (regular, bold, italic, bold-italic), Courier (regular, bold, oblique, bold-oblique), Symbol and Zapf Dingbats.

Such a Font object was created like this:

Font font = new Font(FontFamily.TIMES_ROMAN);

You could also define the font size, for instance:

Font font14pt = new Font(FontFamily.TIMES_ROMAN, 14);

The default font was Helvetica; the default font size 12.

iText evolved and more fonts were supported. The BaseFont class was used to deal with these fonts internally. A BaseFont object was created like this:

BaseFont bf_russian = BaseFont.createFont(
    "resources/fonts/FreeSans.ttf",
    "CP1251",
    BaseFont.EMBEDDED);

The first parameter is the path to a font program, for instance a TTF file. The second parameter is the encoding, for instance CP1251 for Cyrillic characters. The third parameter indicates whether a subset of the font needs to be embedded.

The BaseFont class is to be used when you add content at the lowest level, for instance when creating text objects in your code using beginText(), setFontAndSize(), setTextMatrix(), showText(), endText() sequences. Typically, you will only use this low-level approach if you are a PDF specialist. If you don't know anything about PDF syntax, you shouldn't use such a sequence.

You can also use the BaseFont class to create a Font object:

Font russian = new Font(bf_russian, 12);

Now we can use the russian font to create a Paragraph that contains Russian text.

There are some other ways in iText 5 to create Font objects, but this is the most common procedure. People were sometimes confused by the difference between Font and BaseFont, and they didn't always use the correct approach.

What we fixed in iText 7:

We made things more uniform. There is now a single PdfFont class, and you create a font using a PdfFontFactory:

PdfFont font = PdfFontFactory.createFont(FontConstants.TIMES_ROMAN);
PdfFont russian = PdfFontFactory.createFont(
    "src/main/resources/fonts/FreeSans.ttf", "CP1251", true);

You no longer need to create different font objects if you want to switch to another font size. Switching to a different font size can simply be done using the setFontSize() method:

Paragraph p = new Paragraph("Hello World! ")
    .add(new Text("Hallo Wereld! ").setFontSize(14))
    .add(new Text("Bonjour le monde! ").setFontSize(10));

The default font is still Helvetica and the default font size is still 12, but you can now define a font (and a font size) for the document:

document.setFont(font);

In this case font will be the default font when adding a building block (for instance a Paragraph) without specifying a font.

Want to know more?

Read Introducing the PdfFont class which is chapter 1 in the iText 7: Building Blocks tutorial. Get the free ebook!

Styles: iText 5 versus iText 7

Creating a document in which you have to switch between styles frequently tends to be tedious in iText 5. You need to create a lot of Chunk objects and you always have to make a trade-off between applying the styles directly to every new Chunk or creating a helper method that creates the Chunk for you.

What we fixed in iText 7:

It is now possible to chain methods. The setFont(), setFontSize(), addStyle(), and other methods all return the object on which they are invoked. Adding a Paragraph involving different styles can now be done in a single statement:

document.add(
    new Paragraph()
        .add("In this example, named ")
        .add(new Text("HelloWorldStyles").addStyle(style))
        .add(", we experiment with some text in ")
        .add(new Text("code style").addStyle(style))
        .add("."));

Using the Style object, you can now also apply different properties (font, font color, background color, font size,...) in one go with the addStyle() method.
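The chaining described above works because each setter returns the object it was called on. The following is a minimal sketch of that pattern only; StyledText and its methods are hypothetical stand-ins written for this illustration and are not iText classes. The defaults (Helvetica, size 12) mirror the defaults mentioned earlier.

```java
// Hypothetical stand-in for a text element; this is NOT an iText class.
final class StyledText {
    private final String content;
    private String fontName = "Helvetica"; // default font, as mentioned in the text
    private float fontSize = 12;           // default font size, as mentioned in the text
    private boolean bold;

    StyledText(String content) {
        this.content = content;
    }

    // Each setter returns 'this', which is what makes chaining possible.
    StyledText setFontName(String fontName) {
        this.fontName = fontName;
        return this;
    }

    StyledText setFontSize(float fontSize) {
        this.fontSize = fontSize;
        return this;
    }

    StyledText setBold() {
        this.bold = true;
        return this;
    }

    // Summarizes the content and the properties currently set on it.
    String describe() {
        return content + " [" + fontName + ", " + fontSize + (bold ? ", bold" : "") + "]";
    }

    public static void main(String[] args) {
        // Three properties are set in one chained expression.
        StyledText code = new StyledText("code style")
            .setFontName("Courier")
            .setFontSize(14)
            .setBold();
        System.out.println(code.describe());
    }
}
```

Without the "return this" in each setter, every property would need its own statement on a named variable, which is the tedium described for iText 5.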

Want to know more?

Read Introducing the PdfFont class which is chapter 1 in the iText 7: Building Blocks tutorial. Get the free ebook!

Tables: iText 5 versus iText 7

The iText 5 class names PdfPTable and PdfPCell were chosen because we already had classes named Table and Cell to create table and cell objects at the highest programming level. There was also a class named PdfTable to be used by iText internally. Those classes had a lot of flaws and they were deprecated in favor of PdfPTable and PdfPCell. They were removed a long time ago.

Over the years, PdfPTable and PdfPCell also received some criticism from users. For instance: users didn't understand the difference between text mode and composite mode.

Text mode is used when you create a PdfPCell like this:

cell = new PdfPCell(new Phrase("Cell with rowspan 2"));

In this case, you define properties like the horizontal alignment on the level of the PdfPCell.

Composite mode kicks in the moment you use the addElement() method:

cell = new PdfPCell();
cell.addElement(new Phrase("Cell 1.2"));

In this case, some properties defined at the level of the PdfPCell (such as the horizontal alignment) are ignored. The horizontal alignment is to be defined at the level of the elements added to the cell. For instance: if you want to create a cell in which different paragraphs need to have a different horizontal alignment, you will switch to composite mode.

If you look at the screen shot of the table created with the iText 5 example, you will notice that the cells with content Cell 1.1 (added in text mode) and Cell 1.2 (added in composite mode) are aligned quite differently.

In response to criticism of the odd alignment, we introduced methods to use ascender and descender information. We use these methods for the cells with content Cell 2.1 (added in text mode) and Cell 2.2 (added in composite mode). We also introduced a padding of 5 for these cells.

Now the result is much better.

What we fixed in iText 7:

Since we created iText 7 from scratch, we had no legacy classes with names we couldn't reuse. We introduced a new Table and a new Cell class.

There is no more text mode and no more composite mode. A Cell is created either without parameters, or with parameters that define the rowspan and the colspan. All content is added the same way: using the add() method.

Our customers also asked for a way to distinguish between a margin and a padding. In the iText 7 example, we added a gray background to show the difference. In the cell with content Cell 2.1, we define a margin of 5 user units; the default padding is 2. In the cell with content Cell 2.2, we define a padding of 5 user units; the default margin is 0.

As you can tell from the screen shots, the cells are rendered quite nicely. We didn't have to use methods to set the ascender or descender. The default behavior is much closer to the behavior a developer would expect.

Want to know more about tables and cells in iText 7?

Read Adding AbstractElement objects (part 2) which is chapter 5 in the iText 7: Building Blocks tutorial. Get the free ebook!

Text to PDF: iText 5 versus iText 7

The code to convert a plain text file to a PDF document is pretty simple whether you use iText 5 or iText 7. In iText 7, you have the advantage that you can define the alignment at the level of the document. In iText 5, you have to set the alignment for every separate Paragraph object.

To understand the real difference between iText 5 and iText 7 in this pair of examples, we have to take a look at the resulting PDF. In iText 5, we end up with 35 pages of text. In iText 7, we have the same text distributed over 38 pages.

The text is easier to read when created by iText 7 because different defaults are used when creating the layout. You could get the same result from iText 5 code, but then you'd have to change some values with respect to spacing.

In iText 7, the default values were chosen based on 16 years of experience with iText. This way, you get a better result with less code.

Want to know more?

Read Working with the RootElement which is chapter 2 in the iText 7: Building Blocks tutorial. Get the free ebook!

Columns: iText 5 versus iText 7

In iText 5, you can't use the add() method to add a Paragraph to a Document if you want to organize the content in columns. We can't reuse the code of the Text2Pdf.java (iText 5) example.

Instead we have to create a ColumnText object, we have to add all the Paragraph objects to this object, and once we've finished adding all the content, we can start rendering that content using the go() method. While doing so, we have to keep track of the columns, and create new pages when necessary.

What we fixed in iText 7:

With iText 7, we can copy and paste the code from the Text2Pdf.java (iText 7) example. We can continue using the add() method the same way we did before. If we want to render the content in two columns instead of in one, we simply have to change the document renderer:

Rectangle[] columns = {
    new Rectangle(36, 36, 254, 770),
    new Rectangle(305, 36, 254, 770)};
document.setRenderer(new ColumnDocumentRenderer(document, columns));
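The numbers in the two rectangles above are not explained in the text. One plausible reading, stated here as an assumption, is an A4 page (595 x 842 user units) with a margin of 36 on every side and a gutter of 15 between the columns. The sketch below shows that arithmetic in plain Java; ColumnLayout and Box are hypothetical helpers written for this illustration and are not part of iText.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of iText: computes equal-width column boxes.
final class ColumnLayout {

    // Plain stand-in for a rectangle: lower-left corner (x, y), width and height.
    static final class Box {
        final float x;
        final float y;
        final float width;
        final float height;

        Box(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return "Box(" + x + ", " + y + ", " + width + ", " + height + ")";
        }
    }

    static List<Box> columns(float pageWidth, float pageHeight,
                             float margin, float gutter, int count) {
        // Width left for text once both side margins and all gutters are removed.
        float usableWidth = pageWidth - 2 * margin - (count - 1) * gutter;
        float columnWidth = usableWidth / count;
        float columnHeight = pageHeight - 2 * margin;

        List<Box> boxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Each column starts one column width plus one gutter after the previous one.
            float x = margin + i * (columnWidth + gutter);
            boxes.add(new Box(x, margin, columnWidth, columnHeight));
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Assumed values: A4 page, margin 36, gutter 15, two columns.
        for (Box box : columns(595, 842, 36, 15, 2)) {
            System.out.println(box);
        }
    }
}
```

Under those assumptions the helper yields x = 36 and x = 305, each with width 254 and height 770, which matches the values passed to the two Rectangle constructors.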

Want to know more?

Read Working with the RootElement which is chapter 2 in the iText 7: Building Blocks tutorial. Get the free ebook!

Page events (iText 5) versus Event handlers and Renderers (iText 7)

In iText 5, we introduced the concept of page events to allow developers to add specific behavior when a document is opened, when a new page is opened, when a page ends, and when a document is closed.

In the documentation, we made it very clear that it was forbidden to add content in the onStartPage() method; content can only be added in the onEndPage() method. We also made it very clear that the Document object passed to the page event methods was passed for read-only purposes only. It was forbidden to use document.add() even in the onEndPage() method.

Unfortunately, many developers completely ignored the documentation, which led to problems such as:

I can't remember how many times I got agitated because yet another developer posted a duplicate of these questions. People often wonder why they get a harsh answer, but they don't realize that a minimum of effort from their side would have saved everyone, including themselves, plenty of time. All of these questions could have been answered by saying "Read The (you-know-which) Manual."

Another option was a complete overhaul of iText so that these kinds of problems could be avoided.

Due to the organic growth of iText, the page event class had also been extended with functionality that was unrelated to page events. It contained generic chunk functionality, it registered the start and the end of paragraphs, and so on.

What we fixed in iText 7:

We removed the page event functionality.

For all events with respect to pages, we now implement the IEventHandler interface, and we use the addEventHandler() method to register this handler with the PdfDocument for a specific PdfDocumentEvent type. In the example, we use an END_PAGE event, but we could also have used a START_PAGE event. It doesn't matter any more whether you add content at the start or at the end. You can read more about this in Handling events; setting viewer preferences and writer properties which is chapter 7 in the iText 7: Building Blocks tutorial.
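The registration pattern described above (attach a handler to a document for one event type, then let the document call it) can be sketched without iText. Everything in the block below is a hypothetical stand-in written for this illustration: SketchPdfDocument, EventType and footers() are not iText classes, and the sketch simplifies matters by firing both events as soon as a page is added.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical stand-ins that mimic the registration pattern; NOT iText classes.
final class PageEventSketch {

    enum EventType { START_PAGE, END_PAGE }

    static final class SketchPdfDocument {
        private final Map<EventType, List<IntConsumer>> handlers =
            new EnumMap<>(EventType.class);
        private int pageCount;

        // Registers a handler for one event type; several handlers per type are allowed.
        void addEventHandler(EventType type, IntConsumer handler) {
            handlers.computeIfAbsent(type, key -> new ArrayList<>()).add(handler);
        }

        // Simplification: both events fire immediately when the page is added.
        void addNewPage() {
            pageCount++;
            dispatch(EventType.START_PAGE);
            dispatch(EventType.END_PAGE);
        }

        // Calls every handler registered for the given type with the current page number.
        private void dispatch(EventType type) {
            for (IntConsumer handler : handlers.getOrDefault(type, Collections.emptyList())) {
                handler.accept(pageCount);
            }
        }
    }

    // Collects the footer text an END_PAGE handler would produce for each page.
    static List<String> footers(int pages) {
        List<String> footers = new ArrayList<>();
        SketchPdfDocument pdf = new SketchPdfDocument();
        pdf.addEventHandler(EventType.END_PAGE, page -> footers.add("Page " + page));
        for (int i = 0; i < pages; i++) {
            pdf.addNewPage();
        }
        return footers;
    }

    public static void main(String[] args) {
        System.out.println(footers(3)); // prints [Page 1, Page 2, Page 3]
    }
}
```

The point of the design is that the handler is attached to the low-level document rather than to a page event class, so the code that reacts to a page is separate from the code that adds content.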

We improved the building blocks in the sense that we made them more hierarchical (see Before we start: Overview of the classes and interfaces which is the introduction of the iText 7: Building Blocks tutorial). We also introduced a set of Renderer classes, one for each building block, and we allow developers to adapt these renderers so that a building block shows a different behavior when rendered. See for instance the renderer example in Adding AbstractElement objects (part 1) which is chapter 4 in the iText 7: Building Blocks tutorial.

These changes simplify the functionality for developers who don't (want to) know much about PDF and iText, while at the same time offering an abundance of flexibility to those developers who aren't afraid to dig deep into the iText code to create a PDF exactly the way they want it.

Want to know more? Get the free ebook!

PDF creation: iText 5 versus iText 7

In the original design for iText, it was possible to create a high-level Document object, and then have different DocListener objects listening to that Document object. This was achieved by using different writers: a PdfWriter, an HTMLWriter, and an RtfWriter. When using a PdfWriter, a PdfDocument was created internally. This low-level class took care of all PDF-related structures. More or less the same was true for the other formats.

Over the years, iText specialized and it became a pure PDF library. The creation of HTML and RTF was abandoned, hence it was no longer necessary to create a Document before creating a PdfWriter, but we had to stick to the original architecture because we weren't ready to break the API.

Over the years, we added more and more PDF functionality to iText, and the fact that PdfDocument was a class for internal use only became problematic. We used workarounds so that we could introduce new PDF features that belonged in the PdfDocument class up until the point that we reached the ceiling of what we considered acceptable as workarounds.

That's when we decided to rewrite iText from scratch and to create a completely new architecture for iText. Now we have a clear distinction between the PdfDocument (for low-level operations) and the Document (for high-level functionality). We no longer have to open the document, and if we use the try-with-resources approach, we don't even have to close it ourselves.
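The try-with-resources approach mentioned above is a general Java feature: any object that implements AutoCloseable is closed automatically when the try block ends. The sketch below shows the mechanism with a hypothetical SketchDocument class that is not an iText class; it only records what happens to it, so the automatic close is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a closeable document; this is NOT an iText class.
final class SketchDocument implements AutoCloseable {
    private final List<String> log;

    SketchDocument(List<String> log) {
        this.log = log;
    }

    // Records the added text instead of writing real PDF content.
    void add(String text) {
        log.add("add: " + text);
    }

    @Override
    public void close() {
        log.add("close");
    }

    // Adds content inside try-with-resources and returns the recorded steps.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (SketchDocument document = new SketchDocument(log)) {
            document.add("Hello World!");
        } // close() is called here automatically, even if add() had thrown
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [add: Hello World!, close]
    }
}
```

Note that the code never calls close() itself; the "close" entry in the log comes from the try statement.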

Want to know more? Get the free ebook!

Forms: iText 5 versus iText 7

iText 5 is a library that has grown organically. Many developers contributed code. For instance: one developer contributed code to create form fields from scratch, using classes such as TextField and PdfFormField; another developer contributed code to change existing form fields, using the AcroFields class and a series of setFieldProperty() methods.

In iText 5, the classes used to create form fields cannot be used to change form fields, and vice versa. There is no relationship whatsoever between the two sets of classes. That's confusing for many users. For instance: some users discover the TextField class, and assume they can use that class to change the properties of an existing text field. This isn't the case; they need to use the AcroFields class instead.

All of this is fixed in iText 7. We created a new set of classes such as PdfFormField and its subclass PdfTextFormField that can be used to create a new field, as well as to update an existing form field.

The iText 7 form field methods can be chained to make your code more compact, and they are much more intuitive than the corresponding methods in iText 5. Making the form functionality more elegant was one of the key reasons to rewrite iText from scratch.

Q & A about versions

Why do the version numbers jump from 2 to 5, and from 5 to 7?

There are several reasons for skipping version numbers. In 2009, the version numbers of iText (Java) and iTextSharp (C#) were not in sync. The Java version was at version 2.1.7; the C# version was at version 4.1.6. A decision was taken to move to Java 5 for the Java version and to harmonize the version numbers of iText and iTextSharp, so both moved to version 5.

What's the deal with iText 4?

Third parties have released iText 4.2.z versions, but those aren't official versions. They aren't supported by anyone and shouldn't be used because no one really knows what's inside.

Is iText 7 backward compatible?

iText 7 is a totally new version of iText, written from scratch by the iText team. Backward compatibility is broken in favor of a more intuitive interface. The Java version of iText has moved from Java 5 to Java 7, which was one of the reasons to jump from iText 5 to iText 7.