Nello Cristianini, John Shawe-Taylor
This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequences analysis, etc., and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications.
I'm looking for a really good tutorial on machine learning for text classification perhaps using Support vector machine (SVM) or other appropriate technology for large-scale supervised text classification. If there isn't a great tutorial, can anyone give me pointers to how a beginner should get started and do a good job with things like feature detection for English language Text Classification.
Books, articles, anything that can help beginners get started would be super helpful!
In its classical flavour the Support Vector Machine (SVM) is a binary classifier (i.e., it solves classification problems involving two classes). However, it can be also used to solve multi-class classification problems by applying techniques likes One versus One, One Versus All or Error Correcting Output Codes [Alwein et al.]. Also recently, a new modification of the classical SVM the multiclass-SVM allows to solve directly multi-class classification problems [Crammer et al.].
Now as far as it concerns document classification, your main problem is feature extraction (i.e., how to acquire certain classification features from your documents). This is not a trivial task and there's a batch of bibliography on the topic (e.g., [Rehman et al.], [Lewis]).
Once you've overcome the obstacle of feature extraction, and have labeled and placed your document samples in a feature space you can apply any classification algorithm like SVMs, AdaBoost e.t.c.