Forgiving SAX-style HTML parser for robust HTML document processing
ekhtml
$ ekhtml input.html
$ cat document.html | ekhtml
$ ekhtml broken.html > output.xml
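
To illustrate what "forgiving SAX-style" parsing means in the description above, here is a minimal sketch using Python's standard-library `html.parser` module. It is not ekhtml's own API: the `EventCollector` class and `collect_events` helper are hypothetical names chosen for this example. The idea is that a SAX-style parser reports start-tag, end-tag, and text events through callbacks as it reads the input instead of building a document tree, and a forgiving one keeps going when the markup is malformed.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as (kind, value) tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag the parser encounters.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called only for closing tags that actually appear in the input.
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Called for text between tags; whitespace-only runs are skipped.
        text = data.strip()
        if text:
            self.events.append(("data", text))


def collect_events(markup):
    """Feed markup to the parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # The <b> element is never closed; parsing continues regardless.
    for kind, value in collect_events("<p>Hello <b>world</p>"):
        print(kind, value)
```

In this sketch the unclosed `<b>` does not stop parsing: the callbacks simply never receive an end event for it, and handling the imbalance is left to the caller.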