This project addresses violence detection in videos using computer vision and machine learning techniques. The authors propose a novel model that combines a Full Temporal Cross Fusion Network (FTCF) with a Residual Network (ResNet) to capture both the spatial and temporal characteristics of violent events. The FTCF integrates information from multiple sources, such as RGB frames and optical flow, to model temporal dynamics, while ResNet's residual connections enable the training of very deep networks. The proposed model achieves state-of-the-art performance in violence detection and could help increase accuracy while reducing the human cost in this field. It reaches a test accuracy of 95% within 100 epochs on the Movies dataset, a popular benchmark for violence detection.
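To make the architectural idea concrete, here is a minimal NumPy sketch of the two ingredients the paragraph names: a residual block (the ResNet component, where a skip connection eases training of deep stacks) and a late fusion of per-frame RGB and optical-flow features over time (the general two-stream idea behind temporal fusion). This is an illustrative sketch under assumed shapes and function names, not the authors' actual FTCF implementation.

```python
import numpy as np

def residual_block(x, w1, w2):
    # ResNet-style block: output = ReLU(x @ w1) @ w2 + x.
    # The skip connection (+ x) lets gradients bypass the transform,
    # which is what makes very deep networks trainable.
    h = np.maximum(x @ w1, 0.0)   # ReLU activation
    return h @ w2 + x             # residual (skip) connection

def fuse_streams(rgb_feats, flow_feats):
    # Fuse the appearance (RGB) and motion (optical-flow) streams:
    # concatenate per-frame features, then pool over the time axis
    # to obtain one clip-level descriptor.
    fused = np.concatenate([rgb_feats, flow_feats], axis=-1)  # (T, 2D)
    return fused.mean(axis=0)                                 # (2D,)

rng = np.random.default_rng(0)
T, D = 16, 8                        # hypothetical: 16 frames, 8-dim features
rgb = rng.normal(size=(T, D))       # per-frame RGB features
flow = rng.normal(size=(T, D))      # per-frame optical-flow features

w1 = rng.normal(size=(D, D))
w2 = rng.normal(size=(D, D))
rgb = residual_block(rgb, w1, w2)   # deepen spatial features residually

clip_feature = fuse_streams(rgb, flow)
print(clip_feature.shape)           # clip descriptor fed to a classifier
```

A real model would replace the random weights with learned convolutional backbones and use a richer temporal fusion than mean pooling, but the residual-plus-fusion structure is the same.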