External impact metrics associated with the publication. For more detail:
| Field | Value |
|---|---|
| Indexed | |
| DOI | 10.1109/LA-CCI62337.2024.10814874 |
| Year | 2024 |
| Type | |
| Total Citations | |
| Authors with Chilean Affiliation | |
| Chilean Institutions | |
| % International Participation | |
| Authors with Foreign Affiliation | |
| Foreign Institutions | |
In autonomous driving and video surveillance, efficient object detection is crucial, requiring a balance between speed and accuracy. Recent deep learning advancements, particularly the integration of Transformers into computer vision, have often led to trade-offs between these aspects. This paper introduces YotoR (You Only Transform One Representation), a novel architecture designed to address these challenges by combining the strengths of Swin Transformers and YoloR. The YotoR architecture combines the Swin Transformer backbone with the YoloR neck and head, aiming to leverage the precision of Transformers while maintaining the speed of YoloR models. Experimental results demonstrate that the YotoR models TP5 and BP4 consistently outperform both YoloR P6 and Swin Transformer models in terms of object detection accuracy and inference speed. These results highlight the potential for further model combinations and improvements in real-time object detection with Transformers. The paper concludes by emphasizing the broader implications of YotoR, including its potential to enhance transformer-based models for image-related tasks.
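The abstract describes YotoR as a Swin Transformer backbone wired into a YoloR neck and head. The following is a minimal, hypothetical PyTorch sketch of that wiring only; it is not the authors' implementation, and the class names, channel counts, strides, and anchor settings are illustrative assumptions rather than values from the paper.

```python
# Hypothetical sketch: a Swin-style multi-scale backbone feeding a YoloR-style
# detection head, mirroring the backbone/neck-and-head split named in the abstract.
import torch
import torch.nn as nn


class SwinLikeBackbone(nn.Module):
    """Placeholder backbone returning P3/P4/P5 feature maps (strides 8/16/32).
    A real Swin Transformer would use shifted-window attention stages instead."""
    def __init__(self, chs=(192, 384, 768)):
        super().__init__()
        self.stage3 = nn.Conv2d(3, chs[0], 3, stride=8, padding=1)        # /8
        self.stage4 = nn.Conv2d(chs[0], chs[1], 3, stride=2, padding=1)   # /16
        self.stage5 = nn.Conv2d(chs[1], chs[2], 3, stride=2, padding=1)   # /32

    def forward(self, x):
        p3 = self.stage3(x)
        p4 = self.stage4(p3)
        p5 = self.stage5(p4)
        return p3, p4, p5


class YoloRLikeHead(nn.Module):
    """Placeholder YoloR-style head: one per-scale prediction conv emitting
    box (4) + objectness (1) + class scores for each anchor."""
    def __init__(self, chs=(192, 384, 768), num_classes=80, num_anchors=3):
        super().__init__()
        out_ch = num_anchors * (5 + num_classes)
        self.preds = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in chs)

    def forward(self, feats):
        return [pred(f) for pred, f in zip(self.preds, feats)]


class YotoRSketch(nn.Module):
    """Assumed composition: transformer-style backbone -> YoloR-style head."""
    def __init__(self):
        super().__init__()
        self.backbone = SwinLikeBackbone()
        self.head = YoloRLikeHead()

    def forward(self, x):
        return self.head(self.backbone(x))


if __name__ == "__main__":
    model = YotoRSketch()
    outs = model(torch.randn(1, 3, 640, 640))
    for o in outs:
        print(o.shape)  # (1, 255, 80, 80), (1, 255, 40, 40), (1, 255, 20, 20)
```

The sketch only illustrates the data flow; any real reproduction would swap the placeholder stages for an actual Swin Transformer backbone and the YoloR neck and head from the respective reference implementations.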
| Ord. | Author | Gender | Institution - Country |
|---|---|---|---|
| 1 | Villa, Jose Ignacio Diaz | - | Universidad de Chile - Chile |
| 2 | Loncomilla, Patricio | - | Centro Avanzado de Tecnologia para la Mineria - Chile |
| 3 | Ruiz-Del-Solar, Javier | - | Centro Avanzado de Tecnologia para la Mineria - Chile |