A short post about a very handy class, org.apache.commons.lang3.StringEscapeUtils, found in commons-lang3-3.1.jar, the Apache Commons Lang library of helpers for manipulating the Java core classes.

From the official Apache documentation:
The standard Java libraries fail to provide enough methods for manipulation of its core classes. Apache Commons Lang provides these extra methods.
Lang provides a host of helper utilities for the java.lang API, notably String manipulation methods, basic numerical methods, object reflection, concurrency, creation and serialization and System properties. Additionally it contains basic enhancements to java.util.Date and a series of utilities dedicated to help with building methods, such as hashCode, toString and equals.
Note that Lang 3.0 (and subsequent versions) use a different package (org.apache.commons.lang3) than the previous versions (org.apache.commons.lang), allowing it to be used at the same time as an earlier version.

In this article, we will answer the question: how do you convert (escape or unescape) all the HTML characters in a String?
The API provides the following methods:

  • the method escapeHtml4 escapes the characters in a String using HTML entities,
  • the method unescapeHtml4 unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes.
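To make the idea of "escaping with HTML entities" concrete before using the library, here is a minimal sketch that relies only on the JDK. The class name BasicHtmlEscapeSketch and the helper escapeBasicHtml are hypothetical names invented for this illustration; they are not part of Commons Lang, and this is not how the library is implemented. The sketch handles only the four basic characters (&, <, >, "), whereas escapeHtml4 also covers the other named entities it supports.

```java
class BasicHtmlEscapeSketch {

    // Hypothetical helper (not a Commons Lang method): replaces only the
    // four basic HTML special characters with their named entities.
    static String escapeBasicHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);        // every other character is copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &quot;bread&quot; &amp; &quot;butter&quot;
        System.out.println(escapeBasicHtml("\"bread\" & \"butter\""));
        // prints &lt;b&gt;bold&lt;/b&gt;
        System.out.println(escapeBasicHtml("<b>bold</b>"));
    }
}
```

In real code, prefer the library methods listed above, which are used in the examples that follow.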
// Requires: import org.apache.commons.lang3.StringEscapeUtils;
public static void main(String[] args) {
		
	// Example n°1
	{
		System.out.println("------ Example 1 -------");
		String contentWithHTML = "<html><head><title>javablog.fr<title></head><body>JavaBlog.fr / Java.lu - Java Development and Tools</body></html>";
		System.out.println("Content with HTML characters: " + contentWithHTML);
		// Escape the HTML special characters to HTML entities
		String escapedContentWithHTML = StringEscapeUtils.escapeHtml4(contentWithHTML);
		System.out.println("Content after the escaping of the HTML characters to HTML entities: " + escapedContentWithHTML);
		// Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes.
		String initialContentWithHTML = StringEscapeUtils.unescapeHtml4(escapedContentWithHTML);
		System.out.println("Initial content with HTML characters: " + initialContentWithHTML);
	}

	// Example n°2
	{
		System.out.println("------ Example 2 -------");
		String contentWithHTML = "\"bread\" & \"butter\"";
		System.out.println("Content with HTML characters: " + contentWithHTML);
		// Escape the HTML special characters to HTML entities
		String escapedContentWithHTML = StringEscapeUtils.escapeHtml4(contentWithHTML);
		System.out.println("Content after the escaping of the HTML characters to HTML entities: " + escapedContentWithHTML);
		// Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes.
		String initialContentWithHTML = StringEscapeUtils.unescapeHtml4(escapedContentWithHTML);
		System.out.println("Initial content with HTML characters: " + initialContentWithHTML);
	}
		
	// Example n°3
	{
		System.out.println("------ Example 3 -------");
		String escapedContentWithHTML = "&lt;Fran&ccedil;ais&gt;";
		System.out.println("Content with containing entity escapes: " + escapedContentWithHTML);
		// Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes.
		String contentWithHTML = StringEscapeUtils.unescapeHtml4(escapedContentWithHTML);
		System.out.println("Initial content with HTML characters: " + contentWithHTML);
	}	
		
	// Example n°4
	{
		System.out.println("------ Example 4 -------");
		String escapedContentWithHTML = "&gt;&zzzz;x";
		System.out.println("Content with containing entity escapes: " + escapedContentWithHTML);
		// Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes.
		String contentWithHTML = StringEscapeUtils.unescapeHtml4(escapedContentWithHTML);
		System.out.println("Initial content with HTML characters: " + contentWithHTML);
	}							
}

The console output is:

------ Example 1 -------
Content with HTML characters: <html><head><title>javablog.fr<title></head><body>JavaBlog.fr / Java.lu - Java Development and Tools</body></html>
Content after the escaping of the HTML characters to HTML entities: &lt;html&gt;&lt;head&gt;&lt;title&gt;javablog.fr&lt;title&gt;&lt;/head&gt;&lt;body&gt;JavaBlog.fr / Java.lu - Java Development and Tools&lt;/body&gt;&lt;/html&gt;
Initial content with HTML characters: <html><head><title>javablog.fr<title></head><body>JavaBlog.fr / Java.lu - Java Development and Tools</body></html>
------ Example 2 -------
Content with HTML characters: "bread" & "butter"
Content after the escaping of the HTML characters to HTML entities: &quot;bread&quot; &amp; &quot;butter&quot;
Initial content with HTML characters: "bread" & "butter"
------ Example 3 -------
Content with containing entity escapes: &lt;Fran&ccedil;ais&gt;
Initial content with HTML characters: <Français>
------ Example 4 -------
Content with containing entity escapes: &gt;&zzzz;x
Initial content with HTML characters: >&zzzz;x
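Example 4 shows that an unrecognised entity such as &zzzz; is left untouched while known entities are converted. The following JDK-only sketch illustrates that behaviour with a deliberately tiny entity table. The names BasicHtmlUnescapeSketch, ENTITIES and unescapeBasicHtml are hypothetical and chosen for this illustration only; the real unescapeHtml4 knows far more entities (including numeric ones, which this sketch ignores) and is implemented differently.

```java
import java.util.Map;

class BasicHtmlUnescapeSketch {

    // Hypothetical, deliberately small table: entity name -> character.
    private static final Map<String, String> ENTITIES = Map.of(
            "lt", "<",
            "gt", ">",
            "amp", "&",
            "quot", "\"",
            "ccedil", "\u00E7");

    // Hypothetical helper (not a Commons Lang method): converts known named
    // entities and copies anything it does not recognise unchanged.
    static String unescapeBasicHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                int semi = input.indexOf(';', i + 1);
                if (semi > 0) {
                    String replacement = ENTITIES.get(input.substring(i + 1, semi));
                    if (replacement != null) {
                        out.append(replacement);
                        i = semi + 1;           // skip past the whole entity
                        continue;
                    }
                }
            }
            out.append(c);                      // unknown entity or ordinary character
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same input as Example 3: the three known entities are converted.
        System.out.println(unescapeBasicHtml("&lt;Fran&ccedil;ais&gt;"));
        // Same input as Example 4: prints >&zzzz;x
        System.out.println(unescapeBasicHtml("&gt;&zzzz;x"));
    }
}
```

The key design point is the fallback: when the lookup fails, the ampersand is copied as an ordinary character, so the rest of the unknown entity passes through unchanged.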

That’s all!

Source: Apache Commons Lang