Title: Web Page Classification

Document classification is an important and time-consuming task in settings such as libraries and web portals. Automated document classification approaches use techniques similar to those of spam filters: they treat a document as a collection of terms and use a training set to determine the most productive terms, that is, the terms that best separate one class from another. Web pages embed structure in a document by means of HTML. The aim of this project is to implement a document classifier and extend it to take into account both the HTML structure of a web page and the other web pages it links to.
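As a rough illustration of the idea, the following Python sketch combines a term-based classifier with a simple use of HTML structure. It is only one possible starting point, not a prescribed design: the choice of multinomial naive Bayes, the names TermExtractor, NaiveBayes and extract, and the weights in TAG_WEIGHTS are all assumptions made for this example. Terms found in a title or heading count more than body text, and outgoing links are collected but not followed; how to use the linked pages is left open.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Assumed weights (illustrative only): terms inside these tags count more.
TAG_WEIGHTS = {"title": 3, "h1": 2, "h2": 2, "h3": 2}
SKIPPED_TAGS = {"script", "style"}


class TermExtractor(HTMLParser):
    """Collects weighted term counts and outgoing links from one page."""

    def __init__(self):
        super().__init__()
        self.terms = Counter()
        self.links = []
        self._stack = []  # tags that are currently open

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if SKIPPED_TAGS.intersection(self._stack):
            return
        # A term takes the highest weight among its enclosing tags.
        weight = max((TAG_WEIGHTS.get(t, 1) for t in self._stack), default=1)
        for term in re.findall(r"[a-z]+", data.lower()):
            self.terms[term] += weight


def extract(html):
    """Return (weighted term counts, list of link targets) for a page."""
    parser = TermExtractor()
    parser.feed(html)
    parser.close()
    return parser.terms, parser.links


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over weighted terms."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # label -> term -> count
        self.doc_counts = Counter()              # label -> training pages
        self.vocab = set()

    def train(self, html, label):
        terms, _ = extract(html)
        self.term_counts[label].update(terms)
        self.doc_counts[label] += 1
        self.vocab.update(terms)

    def classify(self, html):
        terms, _ = extract(html)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.term_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for term, n in terms.items():
                score += n * math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = NaiveBayes()
    clf.train("<title>Football results</title><p>The team won the match.</p>", "sports")
    clf.train("<title>Pasta recipe</title><p>Boil the pasta and add sauce.</p>", "cooking")
    print(clf.classify("<title>Match report</title><p>A late goal</p>"))
```

A real project would need a much larger training set, and could extend classify to fetch the pages returned in the link list and fold their terms in with a lower weight.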

Tools and required skills: knowledge of machine learning, knowledge of HTML, and good programming skills.