Deep Learning-Based Email Spam Identification and Classification for Enhanced Cybersecurity

Vaibhav Maniar¹, Aniruddha Arjun Singh Singh², Rami Reddy Kothamaram³, Dinesh Rajendran⁴, Venkata Deepak Namburi⁵, Vetrivelan Tamilmani⁶

¹Oklahoma City University, MBA / Product Management, USA.
²ADP, Agile Team Leader, USA.
³California University of Management and Science, MS in Computer Information Systems, USA.
⁴Coimbatore Institute of Technology, MSc Software Engineering, USA.
⁵Department of Computer Science, University of Central Missouri, USA.
⁶Principal Service Architect, SAP America.


Abstract

The spread of unsolicited and malicious emails is a major problem for digital communication and cybersecurity. Beyond cluttering inboxes, spam emails serve as vectors for phishing, malware, and financial fraud. Conventional spam detection methods, such as rule-based filters and classical machine learning models, rely on predetermined patterns and require extensive manual feature engineering, which limits their ability to detect new forms of spam. To address these challenges, this study proposes a deep learning-driven framework that automatically extracts complex, hierarchical patterns from email data. The model is trained and evaluated on the Spambase benchmark dataset, which consists of 4601 emails described by 57 features. Preprocessing techniques, including parsing, tokenization, stemming, case folding, error correction, and regular expression-based extraction, are applied to ensure high-quality input data, and feature extraction and dimensionality reduction further refine the dataset before classification. With a 70:30 train-test split, an Artificial Neural Network (ANN) achieved an accuracy of 99.50% and a recall, precision, and F1-score of 99.68% each. These results indicate the robustness, scalability, and effectiveness of the proposed framework for detecting spam emails, contributing to improved cybersecurity and more efficient email communication systems. Future work could explore multi-modal and transformer-based approaches to detect advanced spam threats more effectively.
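The evaluation protocol summarized in the abstract (a 70:30 hold-out split scored by accuracy, precision, recall, and F1-score) can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' code: the helper names `holdout_split` and `binary_metrics` are hypothetical, the split is a plain seeded shuffle rather than whatever procedure the study actually used, and label 1 is taken to mean spam (the convention of Spambase's class column). The toy labels in the usage section are invented for illustration and are not results from the study.

```python
import random
from dataclasses import dataclass
from typing import Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def holdout_split(rows: Sequence[T], test_fraction: float = 0.3,
                  seed: int = 0) -> tuple[list[T], list[T]]:
    """Hypothetical helper: shuffle indices with a fixed seed and cut off a
    test fraction (0.3 gives a 70:30 train-test split)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Metrics:
    """Hypothetical helper: accuracy, precision, recall and F1 for a binary
    task, assuming label 1 = spam and 0 = non-spam."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive (spam) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # 4601 rows (the size of Spambase) split 70:30.
    train, test = holdout_split(list(range(4601)))
    print(len(train), len(test))  # 3221 1380

    # Toy labels, not results from the study.
    m = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
    print(f"accuracy={m.accuracy:.2f} precision={m.precision:.2f} "
          f"recall={m.recall:.2f} f1={m.f1:.2f}")
    # accuracy=0.80 precision=1.00 recall=0.67 f1=0.80
```

Because F1 is the harmonic mean of precision and recall, it equals both whenever they coincide, which is consistent with the identical 99.68% recall, precision, and F1-score reported above.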

Keywords

Email Spam Detection, Deep Learning, Machine Learning, Cybersecurity, Malicious Email Filtering, Email Classification, Support Vector Machines (SVM), Naive Bayes (NB), Long Short-Term Memory (LSTM).
