Locating Information with XPath in Java

If you want to locate a specific piece of information in an XML document, it can be a bit of a hassle to navigate the nodes of the DOM tree. The XPath language makes it simple to access tree nodes. For example, suppose you have this XHTML document:

<html>
  <head>
    <title>. . .</title>
  </head>
</html>

You can get the title text by evaluating the XPath expression

/html/head/title/text()

That’s a lot simpler than the plain DOM approach:

  1. Get the document root.
  2. Get the first child and cast it as an Element.
  3. Locate the title element among its children.
  4. Get its first child and cast it as a CharacterData node.
  5. Get its data.
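For comparison, the plain DOM steps might look roughly like this in Java. This is a minimal sketch, not the book's code: the class name DomTitleLookup and the helper firstChildElement are hypothetical, and the sample document is made up. A parser reports the whitespace between tags as text nodes, so the helper skips non-element children instead of blindly taking the first child.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomTitleLookup {
    // Parses an XML string into a DOM tree (default, non-namespace-aware factory).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: first child element with the given name, skipping
    // whitespace text nodes and any other non-element children.
    static Element firstChildElement(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("No <" + name + "> child");
    }

    static String title(Document doc) {
        Element root = doc.getDocumentElement();                  // 1. the document root (<html>)
        Element head = firstChildElement(root, "head");           // 2. the <head> child, as an Element
        Element titleElement = firstChildElement(head, "title");  // 3. <title> among its children
        CharacterData text = (CharacterData) titleElement.getFirstChild(); // 4. its text child
        return text.getData();                                    // 5. the character data
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html>\n  <head>\n    <title>Sample Page</title>\n  </head>\n</html>";
        System.out.println(title(parse(xml))); // prints: Sample Page
    }
}
```

Even this short version needs a helper, casts, and null-prone navigation; the single XPath expression replaces all of it.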

An XPath can describe a set of nodes in an XML document. For example, the XPath

/html/body/form

describes the set of all form elements that are children of the body element in an XHTML file. You can select a particular element with the [] operator:

/html/body/form[1]

is the first form. (The index values start at 1.)

Use the @ operator to get attribute values. The XPath expression

/html/body/form[1]/@action

describes the action attribute of the first form. The XPath expression

/html/body/form/@action

describes all action attribute nodes of all form elements that are children of the body element.
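To see which nodes each of these expressions selects, the sketch below runs them against a made-up page with two forms, using the javax.xml.xpath calls covered later in this section. The class name FormSelectors and the sample page are hypothetical; selected elements are shown by tag name and selected attributes by value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FormSelectors {
    // Made-up sample page: two forms inside <body>.
    static final String PAGE = "<html><head><title>Demo</title></head><body>"
        + "<form action=\"/login\"></form><form action=\"/search\"></form>"
        + "</body></html>";

    // Evaluates an expression as a node set and describes each selected node:
    // attribute nodes by their value, elements by their tag name.
    static List<String> select(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath path = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) path.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            result.add(node.getNodeType() == Node.ATTRIBUTE_NODE
                ? node.getNodeValue()
                : "<" + node.getNodeName() + ">");
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select("/html/body/form"));            // [<form>, <form>]
        System.out.println(select("/html/body/form[1]"));         // [<form>]
        System.out.println(select("/html/body/form[1]/@action")); // [/login]
        System.out.println(select("/html/body/form/@action"));    // [/login, /search]
    }
}
```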

There are a number of useful XPath functions. For example,

count(/html/body/form)

returns the number of form children of the body element. There are many more elaborate XPath expressions; see the specification at www.w3c.org/TR/xpath or the online tutorial at www.zvon.org/xxl/XPathTutorial/General/examples.html.
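The following sketch evaluates count and two other standard XPath 1.0 functions, last and string-length, against a made-up page. It uses the string-returning evaluate method described below, so numbers and booleans come back as text such as "2" or "true". The class name XPathFunctions and the page contents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathFunctions {
    // Made-up sample page with two forms.
    static final String PAGE = "<html><head><title>Demo</title></head><body>"
        + "<form action=\"/login\"></form><form action=\"/search\"></form>"
        + "</body></html>";

    // Evaluates an expression and returns its result converted to a string.
    static String eval(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath path = XPathFactory.newInstance().newXPath();
        return path.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eval("count(/html/body/form)"));          // 2
        System.out.println(eval("count(/html/body/form) > 1"));      // true
        System.out.println(eval("/html/body/form[last()]/@action")); // /search
        System.out.println(eval("string-length(/html/head/title)")); // 4
    }
}
```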

To evaluate XPath expressions, first create an XPath object from an XPathFactory:

XPathFactory xpfactory = XPathFactory.newInstance();
XPath path = xpfactory.newXPath();

Then, call the evaluate method to evaluate XPath expressions:

String title = path.evaluate("/html/head/title/text()", doc);

You can use the same XPath object to evaluate multiple expressions.
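Putting these steps together, a complete program might parse a document from a string and then reuse one XPath object for two lookups. This is a sketch; the class name TitleLookup and the sample markup are made up.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TitleLookup {
    // Parses the document, then reuses one XPath object for several expressions.
    static List<String> lookUp(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPathFactory xpfactory = XPathFactory.newInstance();
        XPath path = xpfactory.newXPath();
        String title = path.evaluate("/html/head/title/text()", doc);
        String action = path.evaluate("/html/body/form[1]/@action", doc);
        return List.of(title, action);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><head><title>Demo</title></head>"
            + "<body><form action=\"/login\"></form></body></html>";
        System.out.println(lookUp(xml)); // [Demo, /login]
    }
}
```

Reusing the object is fine within one thread, but the javax.xml.xpath documentation states that an XPath object is neither thread-safe nor reentrant, so concurrent code should give each thread its own instance.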

This form of the evaluate method returns a string result. It is suitable for retrieving text, such as the text child of the title element in the preceding example. If an XPath expression yields multiple nodes, make a call such as the following:

XPathNodes result = path.evaluateExpression("/html/body/form", doc, XPathNodes.class);

XPathNodes is an interface similar to NodeList, but it extends Iterable<Node>, so you can traverse the result with an enhanced for loop.
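As a sketch of that enhanced for loop, the following collects the action attribute of every form. The class name FormActions and the sample page are hypothetical; the code needs Java 9 or later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathNodes;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FormActions {
    // Collects the action attribute of every form by iterating over XPathNodes.
    static List<String> actions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath path = XPathFactory.newInstance().newXPath();
        XPathNodes forms = path.evaluateExpression("/html/body/form", doc, XPathNodes.class);
        List<String> actions = new ArrayList<>(forms.size());
        for (Node form : forms) { // XPathNodes is an Iterable<Node>
            actions.add(((Element) form).getAttribute("action"));
        }
        return actions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><form action=\"/login\"></form>"
            + "<form action=\"/search\"></form></body></html>";
        System.out.println(actions(xml)); // [/login, /search]
    }
}
```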

The evaluateExpression method and the XPathNodes interface were added in Java 9. In older releases, use the following call instead:

NodeList nodes = (NodeList) path.evaluate("/html/body/form", doc, XPathConstants.NODESET);
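NodeList is not Iterable, so with the older call the result is walked by index. The sketch below does the same job as an XPathNodes loop while avoiding Java 9+ APIs; the class name LegacyFormActions and the sample page are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LegacyFormActions {
    // Collects every form's action attribute using only pre-Java 9 APIs.
    static List<String> actions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath path = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) path.evaluate("/html/body/form", doc, XPathConstants.NODESET);
        List<String> actions = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) { // NodeList has no iterator
            Element form = (Element) nodes.item(i);
            actions.add(form.getAttribute("action"));
        }
        return actions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><form action=\"/login\"></form>"
            + "<form action=\"/search\"></form></body></html>";
        System.out.println(actions(xml)); // [/login, /search]
    }
}
```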

If the result is a single node, use one of the following calls:

Node node = path.evaluateExpression("/html/body/form[1]", doc, Node.class);
node = (Node) path.evaluate("/html/body/form[1]", doc, XPathConstants.NODE);
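Here is a sketch of both single-node calls side by side, each reading the action attribute of the first form. The class name FirstForm and the sample page are hypothetical, and the null checks are a defensive assumption for documents in which no node matches.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstForm {
    // Made-up sample page with two forms.
    static final String PAGE = "<html><body>"
        + "<form action=\"/login\"></form><form action=\"/search\"></form>"
        + "</body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Java 9+: ask for a Node directly.
    static String firstAction(String xml) throws Exception {
        XPath path = XPathFactory.newInstance().newXPath();
        Node node = path.evaluateExpression("/html/body/form[1]", parse(xml), Node.class);
        return node == null ? null : ((Element) node).getAttribute("action"); // guard: no match
    }

    // Older releases: request XPathConstants.NODE and cast the Object result.
    static String firstActionLegacy(String xml) throws Exception {
        XPath path = XPathFactory.newInstance().newXPath();
        Node node = (Node) path.evaluate("/html/body/form[1]", parse(xml), XPathConstants.NODE);
        return node == null ? null : ((Element) node).getAttribute("action"); // guard: no match
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstAction(PAGE));       // /login
        System.out.println(firstActionLegacy(PAGE)); // /login
    }
}
```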

If the result is a number, use:

int count = path.evaluateExpression("count(/html/body/form)", doc, Integer.class);
count = ((Number) path.evaluate("count(/html/body/form)",
    doc, XPathConstants.NUMBER)).intValue();
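The sketch below runs both numeric calls on the same document and also requests a Boolean, which evaluateExpression supports in the same way. The class name FormCount, the method names, and the sample page are hypothetical; XPathConstants.NUMBER yields a Double, hence the conversion through Number.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FormCount {
    // Made-up sample page with two forms.
    static final String PAGE = "<html><body>"
        + "<form action=\"/login\"></form><form action=\"/search\"></form>"
        + "</body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    static int countForms(String xml) throws Exception {
        Document doc = parse(xml);
        XPath path = XPathFactory.newInstance().newXPath();
        // Java 9+: request an Integer directly (auto-unboxed to int).
        int count = path.evaluateExpression("count(/html/body/form)", doc, Integer.class);
        // Older API: the NUMBER result is a Double, converted with intValue().
        int legacy = ((Number) path.evaluate("count(/html/body/form)",
            doc, XPathConstants.NUMBER)).intValue();
        if (count != legacy) throw new IllegalStateException("APIs disagree");
        return count;
    }

    // Boolean results work the same way: does the page contain any form?
    static boolean hasForms(String xml) throws Exception {
        XPath path = XPathFactory.newInstance().newXPath();
        return path.evaluateExpression("count(/html/body/form) > 0", parse(xml), Boolean.class);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countForms(PAGE)); // 2
        System.out.println(hasForms(PAGE));   // true
    }
}
```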

You don’t have to start the search at the document root; you can start at any node or node list. For example, if you have a node from a previous evaluation, you can call

String result = path.evaluate(expression, node);
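A common pattern is to select a set of nodes once and then evaluate relative expressions, which have no leading slash, with each node as the context. In this sketch, @action and count(input) are resolved against each form in turn. The class name RelativeLookup and the sample page with its input elements are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathNodes;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativeLookup {
    // Made-up page: a login form with two inputs, a search form with one.
    static final String PAGE = "<html><body>"
        + "<form action=\"/login\"><input name=\"user\"/><input name=\"password\"/></form>"
        + "<form action=\"/search\"><input name=\"q\"/></form>"
        + "</body></html>";

    // Summarizes each form as "action:numberOfInputs".
    static List<String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath path = XPathFactory.newInstance().newXPath();
        XPathNodes forms = path.evaluateExpression("/html/body/form", doc, XPathNodes.class);
        List<String> summary = new ArrayList<>();
        for (Node form : forms) {
            String action = path.evaluate("@action", form); // relative to this form
            int inputs = path.evaluateExpression("count(input)", form, Integer.class);
            summary.add(action + ":" + inputs);
        }
        return summary;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(summarize(PAGE)); // [/login:2, /search:1]
    }
}
```

Note that an expression beginning with / still starts at the document root, whatever node you pass as the context.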

If you do not know the type of the result of an XPath expression (perhaps because the expression comes from a user), call

XPathEvaluationResult<?> result = path.evaluateExpression(expression, doc);

The expression result.type() is one of the constants STRING, NODESET, NODE, NUMBER, or BOOLEAN of the XPathEvaluationResult.XPathResultType enumeration. Call result.value() to get the value.
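As a sketch of handling a result of unknown type, the following switches on result.type() and casts the value accordingly. The class name AnyResult and the sample page are hypothetical. The casts assume, as the XPathEvaluationResult documentation describes, that a NODESET value is an XPathNodes and a NODE value is a Node; the remaining types are simply printed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathEvaluationResult;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathNodes;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AnyResult {
    // Made-up sample page with two forms.
    static final String PAGE = "<html><head><title>Demo</title></head><body>"
        + "<form action=\"/login\"></form><form action=\"/search\"></form>"
        + "</body></html>";

    // Evaluates an expression whose result type is not known in advance
    // and describes the result according to result.type().
    static String describe(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath path = XPathFactory.newInstance().newXPath();
        XPathEvaluationResult<?> result = path.evaluateExpression(expression, doc);
        switch (result.type()) {
            case NODESET:
                XPathNodes nodes = (XPathNodes) result.value();
                return "NODESET with " + nodes.size() + " node(s)";
            case NODE:
                return "NODE " + ((Node) result.value()).getNodeName();
            default: // STRING, NUMBER, or BOOLEAN: print the value as is
                return result.type() + ": " + result.value();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe("/html/body/form"));            // NODESET with 2 node(s)
        System.out.println(describe("concat('a', 'b')"));           // STRING: ab
        System.out.println(describe("count(/html/body/form) > 1")); // BOOLEAN: true
        System.out.println(describe("count(/html/body/form)"));     // NUMBER: ...
    }
}
```

Note that a path expression such as /html/body/form[1] is still an XPath node set, so it is reported as NODESET even when it matches a single node.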

The program in Listing 3.6 demonstrates evaluation of arbitrary XPath expressions. Load an XML file and type an expression; the result of the expression is displayed.

Source: Horstmann Cay S. (2019), Core Java. Volume II – Advanced Features, Pearson; 11th edition.
