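import org.custommonkey.xmlunit.*;
import org.w3c.dom.*;
// Wrap the XML parser so that loosely structured HTML can still be built into a DOM tree.
TolerantSaxDocumentBuilder tolerantSaxDocumentBuilder = new TolerantSaxDocumentBuilder(XMLUnit.getTestParser());
HTMLDocumentBuilder htmlDocumentBuilder = new HTMLDocumentBuilder(tolerantSaxDocumentBuilder);
Document doc = htmlDocumentBuilder.parse(content);
XpathEngine engine = XMLUnit.newXpathEngine();
// getMatchingNodes returns every matching node; evaluate would return only a single string value.
NodeList anchors = engine.getMatchingNodes("/html/body//a[@href]", doc);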
Sunday, June 14, 2009
Parsing HTML using XMLUnit
XMLUnit's HTML document builder is also useful outside a testing environment when you want to parse HTML files into a DOM model. XMLUnit also provides a powerful XPath engine that makes traversing the HTML easier. The sample code fetches every anchor tag that has an href attribute defined.
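
Once the anchors have been matched, the usual next step is to read their href values. The sketch below shows that step using only the JDK's own javax.xml parser and XPath classes in place of XMLUnit, so it runs without the library on the classpath. It therefore assumes well-formed markup, which is exactly the restriction HTMLDocumentBuilder is meant to lift. The class name AnchorHrefs, the helper extractHrefs and the sample page are made up for illustration. The loop over the NodeList should carry over unchanged, since XMLUnit's XPath engine hands back a standard org.w3c.dom.NodeList as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AnchorHrefs {

    // Hypothetical helper: collects the href value of every <a href="..."> under /html/body.
    static List<String> extractHrefs(String xhtml) {
        try {
            // Assumption: the input is well-formed. The JDK parser rejects sloppy HTML.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Same XPath expression as in the post, evaluated as a node set.
            NodeList anchors = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("/html/body//a[@href]", doc, XPathConstants.NODESET);

            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                // The expression selects only elements, so the cast is safe.
                hrefs.add(((Element) anchors.item(i)).getAttribute("href"));
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page: two links with href and one named anchor without.
        String page = "<html><body>"
                + "<a href=\"http://example.com/\">Example</a>"
                + "<p><a name=\"top\">no href</a> <a href=\"/about\">About</a></p>"
                + "</body></html>";
        System.out.println(extractHrefs(page));
    }
}
```

The predicate [@href] does the filtering inside the XPath expression itself, so the loop never has to check whether the attribute exists.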