Apache: Commons lang (example StringEscapeUtils.escapeHtml)

A short post about a very useful class, org.apache.commons.lang3.StringEscapeUtils, found in the commons-lang3-3.1.jar library of Apache Commons Lang, which provides helpers for manipulating Java core classes.

From the official Apache documentation:
The standard Java libraries fail to provide enough methods for manipulation of its core classes. Apache Commons Lang provides these extra methods.
Lang provides a host of helper utilities for the java.lang API, notably String manipulation methods, basic numerical methods, object reflection, concurrency, creation and serialization and System properties. Additionally it contains basic enhancements to java.util.Date and a series of utilities dedicated to help with building methods, such as hashCode, toString and equals.
Note that Lang 3.0 (and subsequent versions) use a different package (org.apache.commons.lang3) than the previous versions (org.apache.commons.lang), allowing it to be used at the same time as an earlier version.

In this article, we answer the question: how do you escape or convert all the "HTML" characters in a String?
The API provides the following methods:

  • the method escapeHtml4 escapes the characters in a String using HTML entities,
  • the method unescapeHtml4 unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes.
import org.apache.commons.lang3.StringEscapeUtils;

public class StringEscapeUtilsExample {

    public static void main(String[] args) {

        // Example n°1: escape a full HTML document, then restore it
        {
            System.out.println("------ Example 1 -------");
            String contentWithHTML = "<html><head><title>javablog.fr<title></head><body>JavaBlog.fr / Java.lu - Java Development and Tools</body></html>";
            System.out.println("Content with HTML characters: " + contentWithHTML);
            // Escape the HTML special characters as HTML entities
            String escapedContentWithHTML = StringEscapeUtils.escapeHtml4(contentWithHTML);
            System.out.println("Content after the escaping of the HTML characters to HTML entities: " + escapedContentWithHTML);
            // Convert the entity escapes back to the actual Unicode characters
            String initialContentWithHTML = StringEscapeUtils.unescapeHtml4(escapedContentWithHTML);
            System.out.println("Initial content with HTML characters: " + initialContentWithHTML);
        }

        // Example n°2: quotes and ampersand
        {
            System.out.println("------ Example 2 -------");
            String contentWithHTML = "\"bread\" & \"butter\"";
            System.out.println("Content with HTML characters: " + contentWithHTML);
            // Escape the HTML special characters as HTML entities
            String escapedContentWithHTML = StringEscapeUtils.escapeHtml4(contentWithHTML);
            System.out.println("Content after the escaping of the HTML characters to HTML entities: " + escapedContentWithHTML);
            // Convert the entity escapes back to the actual Unicode characters
            String initialContentWithHTML = StringEscapeUtils.unescapeHtml4(escapedContentWithHTML);
            System.out.println("Initial content with HTML characters: " + initialContentWithHTML);
        }

        // Example n°3: unescape a named entity for a non-ASCII character
        {
            System.out.println("------ Example 3 -------");
            String escapedContentWithHTML = "&lt;Fran&ccedil;ais&gt;";
            System.out.println("Content with containing entity escapes: " + escapedContentWithHTML);
            // Convert the entity escapes back to the actual Unicode characters
            String contentWithHTML = StringEscapeUtils.unescapeHtml4(escapedContentWithHTML);
            System.out.println("Initial content with HTML characters: " + contentWithHTML);
        }

        // Example n°4: an unrecognized entity (&zzzz;) is left unchanged
        {
            System.out.println("------ Example 4 -------");
            String escapedContentWithHTML = "&gt;&zzzz;x";
            System.out.println("Content with containing entity escapes: " + escapedContentWithHTML);
            // Convert the entity escapes back to the actual Unicode characters
            String contentWithHTML = StringEscapeUtils.unescapeHtml4(escapedContentWithHTML);
            System.out.println("Initial content with HTML characters: " + contentWithHTML);
        }
    }
}

The console output is:

------ Example 1 -------
Content with HTML characters: <html><head><title>javablog.fr<title></head><body>JavaBlog.fr / Java.lu - Java Development and Tools</body></html>
Content after the escaping of the HTML characters to HTML entities: &lt;html&gt;&lt;head&gt;&lt;title&gt;javablog.fr&lt;title&gt;&lt;/head&gt;&lt;body&gt;JavaBlog.fr / Java.lu - Java Development and Tools&lt;/body&gt;&lt;/html&gt;
Initial content with HTML characters: <html><head><title>javablog.fr<title></head><body>JavaBlog.fr / Java.lu - Java Development and Tools</body></html>
------ Example 2 -------
Content with HTML characters: "bread" & "butter"
Content after the escaping of the HTML characters to HTML entities: &quot;bread&quot; &amp; &quot;butter&quot;
Initial content with HTML characters: "bread" & "butter"
------ Example 3 -------
Content with containing entity escapes: &lt;Fran&ccedil;ais&gt;
Initial content with HTML characters: <Français>
------ Example 4 -------
Content with containing entity escapes: &gt;&zzzz;x
Initial content with HTML characters: >&zzzz;x
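
To make the character-to-entity mapping seen in these outputs more concrete, here is a minimal hand-written sketch that uses only the JDK. It is not how Apache Commons Lang implements escapeHtml4 and unescapeHtml4: the class HtmlEscapeSketch, the methods escapeBasic and unescapeBasic, and the NAMED table are hypothetical names introduced for illustration, and the table only knows the five entities that appear in the examples, whereas the library covers the full HTML 4 entity set.

```java
import java.util.HashMap;
import java.util.Map;

public class HtmlEscapeSketch {

    // Hypothetical, deliberately tiny entity table (entity name -> character).
    // The real library knows far more entities than these five.
    private static final Map<String, Character> NAMED = new HashMap<>();
    static {
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("amp", '&');
        NAMED.put("quot", '"');
        NAMED.put("ccedil", '\u00E7'); // ç
    }

    // Replace the four basic HTML special characters with their entities.
    public static String escapeBasic(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Replace known "&name;" sequences with their character.
    // Unknown entities are copied through unchanged, as in Example 4.
    public static String unescapeBasic(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int end = (c == '&') ? input.indexOf(';', i + 1) : -1;
            Character decoded = (end > 0) ? NAMED.get(input.substring(i + 1, end)) : null;
            if (decoded != null) {
                out.append(decoded.charValue());
                i = end + 1; // skip past the whole entity
            } else {
                out.append(c); // plain character, or start of an unknown entity
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeBasic("\"bread\" & \"butter\""));
        System.out.println(unescapeBasic("&lt;Fran&ccedil;ais&gt;"));
        System.out.println(unescapeBasic("&gt;&zzzz;x"));
    }
}
```

In real code, prefer the library methods: they handle many more entities and edge cases than this sketch does.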

That’s all!

Source: Apache Commons Lang
