• Atif Khan, Islamia College Peshawar
• Junaid Yousaf, Islamia College Peshawar
• Tila Muhammad, Islamia College Peshawar
• Muhammad Ismail, University of Engineering and Technology Peshawar, Pakistan
Keywords: Text analysis, NLP, classification algorithms, text categorization, machine learning algorithms, hate speech.


Toxic online content has become a significant societal problem as a result of the exponential growth in internet use by people from all walks of life, including those with varied cultural and educational backgrounds. Automatic identification of harmful text is challenging because it must distinguish merely disrespectful language from hate speech. In this paper, we present a technique for automatically categorizing text as hateful or non-hateful. We discuss the difficulty of automatic hate speech detection and examine several ways of combining machine learning and natural language processing. The experimental results are then compared in terms of their suitability for this task. After evaluating the candidate models and fine-tuning the one with the highest performance, we achieve 94% accuracy on the test data.
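The paper does not publish its exact pipeline here, but a typical setup for this kind of binary text categorization combines a bag-of-words style feature extractor with a linear classifier. The sketch below, which assumes TF-IDF features and a linear SVM (one plausible choice, not necessarily the authors' final model) trained on a tiny illustrative dataset, shows the overall shape of such a system:

```python
# Hedged sketch: TF-IDF features + linear SVM for hateful vs. non-hateful
# text classification. The toy texts and labels below are illustrative
# placeholders, not the dataset used in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = [
    "i hate those people they should leave",
    "those people are awful and dangerous",
    "go back to where you came from",
    "what a lovely day for a walk",
    "the new library opens next week",
    "great match last night well played",
]
labels = ["hateful", "hateful", "hateful",
          "non-hateful", "non-hateful", "non-hateful"]

# TfidfVectorizer turns each text into a weighted word-count vector;
# LinearSVC then fits a linear decision boundary between the classes.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC()),
])
clf.fit(texts, labels)

print(clf.predict(["the new library opens next week"])[0])
```

In a real experiment the corpus would be split into train and test sets, and model selection (the "fine-tuning" the abstract refers to) would be done by comparing validation accuracy across candidate classifiers.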

