This post briefly presents the XSL Formatting Objects (XSL-FO) recommendation, with an example using Apache FOP.

This article is part of a series of posts about the XSL recommendations.
From the W3C page on the Extensible Stylesheet Language Family (XSL), http://www.w3.org/Style/XSL/:
XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts:

  • XSL Transformations (XSLT): a language for transforming XML; An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO. XSLT is developed by the W3C XSLT Working Group (members only) whose charter is to develop the next version of XSLT. XSLT is part of W3C’s XML Activity, whose work is described in the XML Activity Statement.
  • The XML Path Language (XPath): an expression language used by XSLT (and many other languages) to access or refer to parts of an XML document; XPath is developed jointly by the XQuery and XSLT Working Groups.
  • XSL Formatting Objects (XSL-FO): an XML vocabulary for specifying formatting semantics. XSL-FO is now developed by the XML Print and Page Layout Working Group.

Presentation
XSL Formatting Objects (XSL-FO), defined in the W3C XSL recommendation, is a formatting language: an XML-based markup language that defines the layout of text, images, lines and other graphical elements.

XSL-FO makes it possible to produce high-quality output on paper or on screen. Unlike XHTML / HTML, which is mainly suited to browsers, XSL-FO is mostly used in printing and archiving, for documents with many pages. In 2001, the World Wide Web Consortium published XSL-FO as a W3C Recommendation, i.e. a standard language for converting XML documents to a print format.

XSL-FO offers the following elements, functions and attributes (non-exhaustive list):

  • Regions, borders and areas of a page;
  • Width, height and order of pages;
  • Page management;
  • Frames, spacing, multi-column presentation and blocks;
  • Paragraphs, lists and tables;
  • Text layout, such as font formatting, line breaks and hyphenation;
  • Lines, images and other objects;
  • …etc.

The XSL-FO tags
To produce a printable version of a document, XSL-FO defines a set of tags, each with its own behavior, used in an XSL-FO document, which is itself an XML document. An XSL-FO document has the following structure:

<?xml version="1.0" encoding="ISO-8859-1"?>

 <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

 <fo:layout-master-set>
   <fo:simple-page-master master-name="A4">
     <!-- Page template goes here -->
   </fo:simple-page-master>
 </fo:layout-master-set>

 <fo:page-sequence master-reference="A4">
   <!-- Page content goes here -->
 </fo:page-sequence>

 </fo:root>
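
One detail of this structure is worth checking by hand: the master-reference of each fo:page-sequence has to name a page master declared inside fo:layout-master-set. The small sketch below verifies this with the JDK's own XML parser only (no FOP needed). The class name FoSkeletonCheck, the method mastersResolve and the SKELETON constant are made up for this illustration, and the check only considers fo:simple-page-master, not fo:page-sequence-master.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only
class FoSkeletonCheck {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // The skeleton from the article, with a region-body added as page template
    static final String SKELETON =
          "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4'>"
        + "<fo:region-body margin='1in'/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'>"
        + "<fo:flow flow-name='xsl-region-body'><fo:block>Hello</fo:block></fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>";

    /** True if there is at least one fo:page-sequence and each one references a declared simple-page-master. */
    static boolean mastersResolve(String foXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed to match on the fo namespace URI
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(foXml)));

            // Collect the names of all declared page masters
            Set<String> declared = new HashSet<>();
            NodeList masters = doc.getElementsByTagNameNS(FO_NS, "simple-page-master");
            for (int i = 0; i < masters.getLength(); i++) {
                declared.add(((Element) masters.item(i)).getAttribute("master-name"));
            }

            // Every page sequence must point at one of them
            NodeList sequences = doc.getElementsByTagNameNS(FO_NS, "page-sequence");
            for (int i = 0; i < sequences.getLength(); i++) {
                String ref = ((Element) sequences.item(i)).getAttribute("master-reference");
                if (!declared.contains(ref)) {
                    return false;
                }
            }
            return sequences.getLength() > 0;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XSL-FO document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mastersResolve(SKELETON));
    }
}
```

A check like this can catch a mistyped master name before the document is handed to the XSL-FO processor.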

Some tags of XSL-FO:

  • fo:root:
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
       ...
    </fo:root>
    

    The fo:root element is the root element of XSL-FO documents.

  • fo:layout-master-set:
    <fo:layout-master-set>
       <!-- All page templates go here -->
    </fo:layout-master-set>
    

    The fo:layout-master-set element contains one or more page templates.

  • …etc:
    Many other tags exist for more complex layouts; see https://www.w3schools.com/xml/xsl_intro.asp, http://www.whoishostingthis.com/resources/xsl-fo/

There are several implementations of XSL-FO processors. In this article, we describe how to create a PDF document using XSL-FO and FOP, developed by the Apache Software Foundation.

Apache FOP:
From Apache http://xmlgraphics.apache.org/fop/index.html:
Apache™ FOP (Formatting Objects Processor) is a print formatter driven by XSL formatting objects (XSL-FO) and an output independent formatter. It is a Java application that reads a formatting object (FO) tree and renders the resulting pages to a specified output. Output formats currently supported include PDF, PS, PCL, AFP, XML (area tree representation), Print, AWT and PNG, and to a lesser extent, RTF and TXT. The primary output target is PDF.

The required JAR libraries are:

  • fop.jar (1.0)
  • avalon-framework-4.2.0.jar
  • batik-all-1.7.jar
  • commons-io-1.3.1.jar
  • commons-logging-1.0.4.jar
  • xmlgraphics-commons-1.4.jar

Here, we will study a concrete case: (XSL-FO) =[XSL-FO processor]=> PDF
Often, however, the XSL-FO document does not exist yet, so the PDF is created from an XML stream with the following steps:

  • XML is transformed to XSL-FO using an XSLT stylesheet: XML =[XSLT processor]=> XSL-FO
  • XSL-FO is transformed to PDF using FOP: XSL-FO =[XSL-FO processor]=> PDF

…or from a JSON stream with the steps below:

  • The JSON object is deserialized to a POJO
  • The POJO is serialized to XML (XStream)
  • XML is transformed to XSL-FO using an XSLT stylesheet: XML =[XSLT processor]=> XSL-FO
  • XSL-FO is transformed to PDF using FOP: XSL-FO =[XSL-FO processor]=> PDF
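
The XML =[XSLT processor]=> XSL-FO step, common to both pipelines, can be sketched with the XSLT processor bundled in the JDK (JAXP). This is only an illustration under assumptions: the input XML, the stylesheet, the class name XmlToFo and its transform method are all invented for this example and are not part of the project described in this post.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, for illustration only
class XmlToFo {

    // Hypothetical business XML: a list of websites
    static final String XML =
          "<sites>"
        + "<site><name>JavaBlog.fr</name><url>http://www.javablog.fr</url></site>"
        + "<site><name>Java.lu</name><url>http://www.java.lu/</url></site>"
        + "</sites>";

    // Hypothetical XSLT stylesheet producing one fo:block per site
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/sites'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4'>"
        + "<fo:region-body margin='1in'/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<xsl:for-each select='site'>"
        + "<fo:block><xsl:value-of select='name'/>: <xsl:value-of select='url'/></fo:block>"
        + "</xsl:for-each>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the generated XSL-FO as a string. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSLT));
    }
}
```

Here the XSL-FO is written to a string so that it can be inspected. With FOP, the same transformer would typically write to a SAXResult wrapping the FOP handler instead, so that both steps run in one pass.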

So, in our example, we will create a PDF with a logo and a table listing websites with their names and URLs. We will use the following XSL-FO file, xslFolFile.fo:

<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  
  <fo:layout-master-set>
    <fo:simple-page-master master-name="root-reference">
      <fo:region-body margin="1in" />
    </fo:simple-page-master>
  </fo:layout-master-set>
  
  <fo:page-sequence master-reference="root-reference">
    <fo:flow flow-name="xsl-region-body">

  	  <fo:block>
		<fo:external-graphic width="100pt" height="100pt" content-width="50pt" content-height="50pt" text-align="end" display-align="before" src="images/fop.jpg"/>
	  </fo:block>
	  
      <fo:block>Here are my favorite sites:</fo:block>
      
      <fo:block>
        <fo:table>
          <fo:table-body>

            <fo:table-row>
              <fo:table-cell border="solid 1px black" text-align="center" font-weight="bold">
                <fo:block>URL</fo:block>
              </fo:table-cell>
              <fo:table-cell border="solid 1px black" text-align="center" font-weight="bold">
                <fo:block>Name</fo:block>
              </fo:table-cell>
            </fo:table-row>

            <fo:table-row>
              <fo:table-cell border="solid 1px black" text-align="center">
                <fo:block><fo:basic-link external-destination="http://www.javablog.fr">http://www.javablog.fr</fo:basic-link></fo:block>
              </fo:table-cell>
              <fo:table-cell border="solid 1px black" text-align="center">
                <fo:block>JavaBlog.fr</fo:block>
              </fo:table-cell>
            </fo:table-row>
            
            <fo:table-row>
              <fo:table-cell border="solid 1px black" text-align="center">
                <fo:block><fo:basic-link external-destination="http://www.java.lu/">http://www.java.lu/</fo:basic-link></fo:block>
              </fo:table-cell>
              <fo:table-cell border="solid 1px black" text-align="center">
                <fo:block>Java.lu</fo:block>
              </fo:table-cell>
            </fo:table-row>
          
            <fo:table-row>
              <fo:table-cell border="solid 1px black" text-align="center">
                <fo:block><fo:basic-link external-destination="http://xmlgraphics.apache.org/">http://xmlgraphics.apache.org/</fo:basic-link></fo:block>
              </fo:table-cell>
              <fo:table-cell border="solid 1px black" text-align="center">
                <fo:block>Apache XML Graphics</fo:block>
              </fo:table-cell>
            </fo:table-row>
                      
          </fo:table-body>
        </fo:table>

      </fo:block>
      
    </fo:flow>
  </fo:page-sequence>
</fo:root>

Some explanations concerning this XSL-FO document:

  • an image is added in the PDF via the tag fo:external-graphic:
    <fo:external-graphic width="100pt" height="100pt" content-width="50pt" content-height="50pt" text-align="end" display-align="before" src="images/fop.jpg"/>
    
  • a table is created via the tags fo:table, fo:table-body, fo:table-row and fo:table-cell
    ...
            <fo:table>
              <fo:table-body>
    ...
                <fo:table-row>
                  <fo:table-cell border="solid 1px black" text-align="center" font-weight="bold">
                    <fo:block>URL</fo:block>
                  </fo:table-cell>
                  <fo:table-cell border="solid 1px black" text-align="center" font-weight="bold">
                    <fo:block>Name</fo:block>
                  </fo:table-cell>
                </fo:table-row>
    ...
                          
              </fo:table-body>
            </fo:table>
    
  • a hyperlink is added via the tag fo:basic-link
    <fo:basic-link external-destination="http://www.javablog.fr">http://www.javablog.fr</fo:basic-link>
    

The Java code for the XSL-FO processing is:

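import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;

/**
 * Generate PDF = XSL-FO document + XSL-FO processor (FOP)
 * 
 * @author Huseyin OZVEREN
 *
 */
public class TestGeneratePdf {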

	public static void main(String[] args) {
		String foPathFile = "xslFolFile.fo";
		buildPdf(foPathFile);
	} 
	
	/**
	 * Builds the PDF from XSL-FO + FOP processor
	 * @param foPathFile name of the XSL-FO file, resolved on the classpath relative to this class
	 */
	public static void buildPdf(String foPathFile)  {
		
		OutputStream out = null;
		
		try {
			// Construct a FopFactory (reuse if you plan to render multiple documents!)
			FopFactory fopFactory = FopFactory.newInstance();

			// Setup input stream
			InputStream foStream = TestGeneratePdf.class.getResourceAsStream(foPathFile);
			Source src  = new StreamSource(foStream);
			
			// Setup output stream.
			// Note: Using BufferedOutputStream for performance reasons (helpful with FileOutputStreams).
			// The pdf is generated  in "bin" folder "test_xml_xslfo\bin\com\ho\test\xsl\xslfo\fop"
			String myfilePdf = TestGeneratePdf.class.getResource(foPathFile).getPath() + ".pdf"; 
			out = new BufferedOutputStream(new FileOutputStream(new File(myfilePdf)));
			
			// Construct fop with desired output format
			Fop fop = fopFactory.newFop("application/pdf", out);

			// Setup JAXP using identity transformer
			TransformerFactory factory = TransformerFactory.newInstance();
			Transformer transformer = factory.newTransformer(); // identity transformer

	        // Resulting SAX events (the generated FO) must be piped through to FOP
			Result res = new SAXResult(fop.getDefaultHandler());
		            
			// Start XSLT transformation and FOP processing
			transformer.transform(src, res);

		} catch(Throwable th){
			th.printStackTrace();
		} finally {
		  //Clean-up
			if(out!=null){
				  try{out.close();} catch(Throwable th){th.printStackTrace();}
			}
		}
	}
}
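
To see what "piping SAX events" means in the code above without needing FOP on the classpath, here is a minimal sketch in which a small handler takes the place of fop.getDefaultHandler(). The identity transformer reads the XSL-FO and forwards each element as a SAX event; the handler simply counts the fo:block elements it receives. The names SaxPipeDemo, BlockCounter, countBlocks and SAMPLE_FO are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only
class SaxPipeDemo {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Small XSL-FO document with two fo:block elements
    static final String SAMPLE_FO =
          "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='root-reference'>"
        + "<fo:region-body margin='1in'/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='root-reference'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<fo:block>First block</fo:block>"
        + "<fo:block>Second block</fo:block>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>";

    /** Stand-in for the FOP handler: counts the fo:block start events it receives. */
    static class BlockCounter extends DefaultHandler {
        int blocks = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (FO_NS.equals(uri) && "block".equals(localName)) {
                blocks++;
            }
        }
    }

    /** Pushes the XSL-FO through an identity transformer into the counting handler. */
    static int countBlocks(String foXml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            BlockCounter handler = new BlockCounter();
            identity.transform(new StreamSource(new StringReader(foXml)), new SAXResult(handler));
            return handler.blocks;
        } catch (Exception e) {
            throw new IllegalStateException("Could not process the XSL-FO document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBlocks(SAMPLE_FO));
    }
}
```

In buildPdf, FOP's handler receives the same kind of events and builds the pages from them, which is why no intermediate XSL-FO file has to be written when the XSL-FO is generated on the fly.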

Running this code produces console output such as:

2 oct. 2012 00:48:24 org.apache.fop.events.LoggingEventListener processEvent
ATTENTION: The following feature isn't implemented by Apache FOP, yet: table-layout="auto" (on fo:table) (See position 20:19)
2 oct. 2012 00:48:24 org.apache.fop.events.LoggingEventListener processEvent
ATTENTION: Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".
2 oct. 2012 00:48:24 org.apache.fop.events.LoggingEventListener processEvent
ATTENTION: Font "ZapfDingbats,normal,700" not found. Substituting with "ZapfDingbats,normal,400".

… the following PDF is generated in the “bin” folder test_xml_xslfo\bin\com\ho\test\xsl\xslfo\fop:

Note: The PDF shown in the above screenshot contains hyperlinks.

In this post, I have briefly presented the XSL-FO recommendation of the XSL family, together with the Apache FOP XSL-FO processor, to create a PDF from an XSL-FO stream. This bit of code will help you create PDFs programmatically.

Source: test_xml_xslfo.zip

That’s all!!!

Huseyin OZVEREN