The Support Vector Machine (SVM) is a classic machine learning algorithm with a solid theoretical foundation and strong practical performance, and it is widely used in many real-life classification and regression problems. Despite this success, a major challenge remains the scalability of kernel SVM: how can we efficiently train an accurate kernel SVM model on a large-scale dataset (e.g., a dataset with billions of samples and millions of features)? To address this scalability issue, we propose to investigate the problem from the following three aspects:

(1) We first propose to combine the advantages of the Nystrom method and random projection to develop a new framework for large-scale kernel SVM. Furthermore, we will theoretically investigate the effect of random projection on both the Nystrom method and kernel SVM.

(2) We propose to design new column sampling methods that address the limitations of the Subsampled Randomized Hadamard Transform (SRHT), a data-independent random projection method. These new column sampling techniques will effectively exploit the underlying data properties and thereby improve performance.

(3) We propose to design new column combination methods that address the limitations of Sparse Embedding, another popular data-independent random projection method. These new column combination techniques will effectively exploit the underlying data distribution and thereby improve performance.

This project aims to address the scalability issues of kernel SVM on large-scale data, which has both important scientific and practical value.
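To illustrate the idea behind aspect (1), the following is a minimal numpy sketch of Nystrom feature approximation followed by a random projection. It is not the project's actual method: a plain Gaussian projection is used as a simple stand-in for a structured transform such as SRHT, and all function names, the landmark count `m`, the target dimension `k`, and the RBF bandwidth `gamma` are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise RBF (Gaussian) kernel matrix between rows of X and Y.
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def nystrom_features(X, m, gamma=0.5, seed=None):
    # Nystrom method: sample m landmark points, then map each sample to an
    # m-dimensional feature vector so that Phi @ Phi.T approximates the
    # full n-by-n kernel matrix without ever forming it.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    L = X[idx]                              # landmark points
    K_nm = rbf_kernel(X, L, gamma)          # n x m cross-kernel
    K_mm = rbf_kernel(L, L, gamma)          # m x m landmark kernel
    w, V = np.linalg.eigh(K_mm)             # eigendecompose to get K_mm^{-1/2}
    w = np.maximum(w, 1e-12)                # guard against tiny/negative eigenvalues
    return K_nm @ V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def gaussian_projection(Phi, k, seed=None):
    # Data-independent random projection to k dimensions; in the proposed
    # framework a structured transform (e.g., SRHT) would play this role.
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((Phi.shape[1], k)) / np.sqrt(k)
    return Phi @ R

# Small synthetic demonstration: after these two steps, a *linear* SVM can be
# trained on Z instead of solving a kernel SVM on the full kernel matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
Phi = nystrom_features(X, m=50, seed=1)     # 200 x 50 Nystrom features
Z = gaussian_projection(Phi, k=20, seed=2)  # 200 x 20 projected features

# Relative error of the implicit kernel approximation K ~= Phi @ Phi.T.
K_exact = rbf_kernel(X, X)
err = np.linalg.norm(K_exact - Phi @ Phi.T) / np.linalg.norm(K_exact)
print(Z.shape, err)
```

The design point this sketch makes concrete is that both steps avoid the O(n^2) kernel matrix: the Nystrom step costs O(nm) kernel evaluations plus an m-by-m eigendecomposition, and the projection further shrinks the feature dimension before the (linear) SVM is trained.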
This project is supported by NSFC - Young Scientists Fund (Project No. 61906161).
For further information on this research topic, please contact Dr. LAN, Liang.