Autonomous driving technology has become increasingly important in recent years, with the potential to revolutionize transportation systems and improve road safety. Vision-based methods have long been used in this field, but object detection still faces two major challenges: efficiency and occlusion. Anchor-free collaborative detection has been proposed as a promising way to address these challenges, yet research on this approach remains limited. This study proposes an efficient vision-based multi-view object detection and localization method that leverages anchor-free collaborative detection to improve the accuracy of pedestrian detection. The method first generates feature maps to extract the head and foot positions of pedestrians and then applies spatial aggregation to fuse information from different views. In addition, the study compares the efficiency of several convolutional neural network architectures for the feature-extraction backbone and identifies ResNet-18 and ResNet-34 as the most efficient models for the task. The proposed method has the potential to significantly improve the accuracy of pedestrian detection and localization in autonomous driving scenarios, where reliable perception is critical for safety. Overall, this work contributes to the development of vision-based methods for autonomous driving and has significant implications for the future of transportation technology.
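
The multi-view spatial-aggregation step described above can be sketched as follows. This is a minimal illustration, assuming (as in typical multi-view pedestrian detectors) that each camera's feature map is warped onto a shared ground plane via a per-view homography before fusion; the function names, nearest-neighbor warping, and element-wise max fusion rule are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def project_to_ground(feature_map, homography, out_shape):
    """Warp one camera's feature map (C, H_in, W_in) onto the shared
    ground plane using the inverse of the camera-to-ground homography.
    Nearest-neighbor sampling is an assumption for simplicity."""
    H_out, W_out = out_shape
    C, H_in, W_in = feature_map.shape
    out = np.zeros((C, H_out, W_out), dtype=feature_map.dtype)
    H_inv = np.linalg.inv(homography)
    # Homogeneous coordinates of every ground-plane cell.
    ys, xs = np.meshgrid(np.arange(H_out), np.arange(W_out), indexing="ij")
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ pts
    src /= src[2]  # back to inhomogeneous image coordinates
    u = np.round(src[0]).astype(int)
    v = np.round(src[1]).astype(int)
    # Keep only ground cells that project inside the camera's view.
    valid = (u >= 0) & (u < W_in) & (v >= 0) & (v < H_in)
    out[:, ys.ravel()[valid], xs.ravel()[valid]] = feature_map[:, v[valid], u[valid]]
    return out

def aggregate_views(feature_maps, homographies, out_shape):
    """Spatial aggregation: warp each view to the ground plane, then fuse
    overlapping cells (element-wise max here; a learned fusion is also common)."""
    warped = [project_to_ground(f, H, out_shape)
              for f, H in zip(feature_maps, homographies)]
    return np.max(np.stack(warped), axis=0)
```

For example, with two 4x4 single-channel feature maps and identity homographies, each view's activations land on the same ground-plane grid and the fused map keeps the stronger response at every cell; in the real multi-camera setting each homography would come from the camera calibration.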