libhtml is a C library for parsing HTML that aims to conform to the HTML5 specification and be useful for parsing real-world web pages.
Currently, the library is still in the planning stage and consists only of a simple test program that attempts to detect the character encoding of HTML files.
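As a rough illustration of what encoding detection involves, the sketch below checks a buffer for a leading byte order mark, which is the first step of the encoding sniffing described in the HTML specification. It is not taken from the libhtml test program: the function name `sniff_bom` and its signature are assumptions made for this example, and a real detector would go on to examine `<meta>` declarations and transport-level hints.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libhtml.
 * Returns the encoding implied by a leading byte order mark,
 * or NULL if the buffer does not start with a recognized one. */
const char *sniff_bom(const unsigned char *buf, size_t len)
{
    /* UTF-8 BOM: EF BB BF */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";

    /* UTF-16 big-endian BOM: FE FF */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";

    /* UTF-16 little-endian BOM: FF FE */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";

    /* No BOM: the caller must fall back to other heuristics. */
    return NULL;
}
```

A caller would read the first few bytes of a file into a buffer and pass them to `sniff_bom`. A NULL result means only that no byte order mark was found, not that the encoding is unknown.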
To check out the code from the Subversion repository:
svn co https://libhtml.svn.sourceforge.net/svnroot/libhtml/trunk libhtml
For more information, see the SourceForge project page for libhtml.