With its critical role in business and service delivery through mobile
devices, SMS (Short Message Service) has long been abused for spamming, which
is still on the rise today, possibly due to the emergence of A2P
(application-to-person) bulk messaging.
The effort to control SMS spam has been hampered by the lack of up-to-date
information about illicit activities. In our research, we proposed a novel
solution to collect recent SMS spam data, at a large scale, from Twitter, where
users voluntarily report the spam messages they receive. For this purpose, we
designed and implemented SpamHunter, an automated pipeline to discover SMS spam
reporting tweets and extract message content from the attached screenshots.
Leveraging SpamHunter, we collected from Twitter a dataset of 21,918 SMS spam
messages in 75 languages, spanning over four years. To the best of our
knowledge, this is the largest SMS spam dataset ever made public. More
importantly, SpamHunter
enables us to continuously monitor emerging SMS spam messages, which
facilitates the ongoing effort to mitigate SMS spamming. We also performed an
in-depth measurement study that sheds light on new trends in spammers’
strategies, infrastructure, and spam campaigns. In addition, we used our SMS spam
data to evaluate the robustness of the spam countermeasures put in place by the
SMS ecosystem, including anti-spam services, bulk SMS services, and text
messaging apps. Our evaluation shows that these protections cannot effectively
handle such spam samples: they either introduce significant false positives or
miss a large number of newly reported spam messages.