Spam Detection in Voice-over-IP Calls through Semi-Supervised Clustering

Download

PDF

Author

Yu-Sung Wu, Saurabh Bagchi, Navjot Singh, Ratsameetip Wita

Tech report number

CERIAS TR 2009-03

Entry type

conference

Abstract

In this paper, we present an approach for detecting spam calls over IP telephony, called SPIT, in Voice-over-IP (VoIP) systems. SPIT detection differs from email spam detection in three ways: the process has to be soft real-time, fewer features are available for examination because voice traffic is difficult to mine at runtime, and the signaling traffic of legitimate and malicious callers is similar. Our approach differs from existing work in its adaptability to new environments without the need for laborious and error-prone manual parameter configuration. We cluster calls based on their call parameters, leveraging optional user feedback for some calls, which users mark as SPIT or non-SPIT. We improve on a popular algorithm for semi-supervised learning, called MPCK-Means, to make it scalable to a large number of calls. Our evaluation on captured call traces shows a fifteen-fold reduction in computation time, along with an improvement in detection accuracy.
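
The idea of clustering calls while honoring optional user feedback can be illustrated with a small sketch. The code below is not the MPCK-Means variant described in the abstract; it is a much simpler seeded k-means in which calls that a user has marked stay pinned to their marked cluster and the remaining calls join the nearest centroid. The function name `seeded_kmeans`, the two call features (calls per hour, mean call duration in seconds), and all data values are hypothetical and chosen only for illustration.

```python
import math


def seeded_kmeans(points, labels, k=2, iters=20):
    """Cluster `points`, keeping user-labelled points in their marked cluster.

    points: list of equal-length feature tuples (hypothetical call parameters).
    labels: list with a cluster id (0..k-1) for calls marked by a user,
            or None for calls without feedback.
    Returns (assignments, centroids).
    """
    dim = len(points[0])

    def mean(members):
        return tuple(sum(p[d] for p in members) / len(members) for d in range(dim))

    # Initialise each centroid from the calls the user marked for that cluster.
    centroids = []
    for c in range(k):
        seeds = [p for p, l in zip(points, labels) if l == c]
        if not seeds:
            raise ValueError(f"cluster {c} needs at least one labelled call")
        centroids.append(mean(seeds))

    assign = list(labels)
    for _ in range(iters):
        # Labelled calls keep their label; unlabelled calls go to the nearest centroid.
        new_assign = [
            l if l is not None
            else min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p, l in zip(points, labels)
        ]
        if new_assign == assign:
            break
        assign = new_assign
        # Recompute centroids from all current members (never empty: seeds are pinned).
        centroids = [
            mean([p for p, a in zip(points, assign) if a == c]) for c in range(k)
        ]
    return assign, centroids


# Hypothetical calls: (calls per hour from the caller, mean call duration in seconds).
calls = [(2.0, 180.0), (3.0, 200.0), (1.0, 150.0), (60.0, 10.0), (55.0, 12.0), (70.0, 8.0)]
# User feedback for two calls only: 0 = non-SPIT, 1 = SPIT.
feedback = [0, None, None, 1, None, None]
assignments, centroids = seeded_kmeans(calls, feedback)
print(assignments)
```

In this sketch the features are left unscaled and distances are plain Euclidean; a metric-learning method such as MPCK-Means would instead adapt the distance measure using the feedback, which this example does not attempt.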

Date

2009-06-29

Key alpha

Voice-over-IP systems, spam detection, SPIT detection, semi-supervised learning, clustering

Publication Date

2009-06-29

BibTeX-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTeX document. Note that the text may not contain all macros that BibTeX supports.

@Conference{ CERIAS-TR-2009-03,
	title = "Spam Detection in Voice-over-IP Calls through Semi-Supervised Clustering",
	author = "Yu-Sung Wu and Saurabh Bagchi and Navjot Singh and Ratsameetip Wita",
	year = "2009",
	month = "6",
	day = "29",
	keywords = "Voice-over-IP systems, spam detection, SPIT detection, semi-supervised learning, clustering",
	abstract = "In this paper, we present an approach for detecting spam calls over IP telephony, called SPIT, in Voice-over-IP (VoIP) systems. SPIT detection differs from email spam detection in three ways: the process has to be soft real-time, fewer features are available for examination because voice traffic is difficult to mine at runtime, and the signaling traffic of legitimate and malicious callers is similar. Our approach differs from existing work in its adaptability to new environments without the need for laborious and error-prone manual parameter configuration. We cluster calls based on their call parameters, leveraging optional user feedback for some calls, which users mark as SPIT or non-SPIT. We improve on a popular algorithm for semi-supervised learning, called MPCK-Means, to make it scalable to a large number of calls. Our evaluation on captured call traces shows a fifteen-fold reduction in computation time, along with an improvement in detection accuracy.",
}