SciELO Chile Collection

Department of Knowledge Management, Monitoring and Foresight
Questions or comments: productividad@anid.cl



Intramodal consistency in triplet-based cross-modal learning for image retrieval
Indexed in
WoS WOS:001434447200001
Scopus SCOPUS_ID:85219645781
DOI 10.1007/S10994-024-06710-Z
Year 2025
Type Research article


Abstract



Cross-modal retrieval requires building a common latent space that captures and correlates information from different data modalities, usually images and texts. Cross-modal training based on the triplet loss with hard negative mining is a state-of-the-art technique to address this problem. This paper shows that such an approach is not always effective in handling intra-modal similarities. Specifically, we found that this method can lead to inconsistent similarity orderings in the latent space, where intra-modal pairs with unknown ground-truth similarity are ranked higher than cross-modal pairs representing the same concept. To address this problem, we propose two novel loss functions that leverage intra-modal similarity constraints available in a training triplet but not used by the original formulation. Additionally, this paper explores the application of this framework to unsupervised image retrieval problems, where cross-modal training can provide the supervisory signals that are otherwise missing in the absence of category labels. To our knowledge, we are the first to evaluate cross-modal training for intra-modal retrieval without labels. We present comprehensive experiments on MS-COCO and Flickr30k, demonstrating the advantages and limitations of the proposed methods in cross-modal and intra-modal retrieval tasks in terms of performance and novelty measures. We also conduct a case study on the ROCO dataset to assess the performance of our method on medical images, and present an ablation study on one of our approaches to understand the impact of the different components of the proposed loss function. Our code is publicly available on GitHub: https://github.com/MariodotR/FullHN.git.
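As a minimal sketch of the baseline the abstract refers to, the pure-Python code below implements a VSE++-style triplet loss with hardest-negative mining over a batch of matched image/text embeddings, together with a small diagnostic that counts the inconsistent orderings the abstract describes: an image whose most similar other image in the batch scores higher than its own matching caption. This is not the authors' code from the FullHN repository and does not implement their two proposed loss functions. The function names, the use of cosine similarity, the margin of 0.2, averaging over the batch, and the assumption that embeddings are non-zero vectors are all illustrative choices.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def triplet_loss_hardest_negative(images: List[Vector], texts: List[Vector],
                                  margin: float = 0.2) -> float:
    """Hypothetical VSE++-style max-of-hinges loss.

    images[i] and texts[i] form a matching pair; every other item in the
    batch is treated as a negative, and only the hardest (most similar)
    negative in each retrieval direction contributes. Averaged over the batch.
    """
    n = len(images)
    # s[i][j]: similarity between image i and text j (cross-modal only).
    s = [[cosine(images[i], texts[j]) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        pos = s[i][i]
        hardest_text = max(s[i][j] for j in range(n) if j != i)   # image as anchor
        hardest_image = max(s[j][i] for j in range(n) if j != i)  # text as anchor
        total += max(0.0, margin - pos + hardest_text)
        total += max(0.0, margin - pos + hardest_image)
    return total / n


def inconsistent_orderings(images: List[Vector], texts: List[Vector]) -> int:
    """Hypothetical diagnostic: count images whose nearest *other image*
    is more similar than their own caption, i.e. an intra-modal pair of
    unknown ground-truth similarity outranks the matching cross-modal pair."""
    n = len(images)
    count = 0
    for i in range(n):
        cross = cosine(images[i], texts[i])
        intra = max(cosine(images[i], images[j]) for j in range(n) if j != i)
        if intra > cross:
            count += 1
    return count


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.9, 0.1]]
    caps = [[0.6, 0.8], [0.0, 1.0]]
    print("loss:", triplet_loss_hardest_negative(imgs, caps))
    print("inconsistent:", inconsistent_orderings(imgs, caps))
```

Note that the baseline loss only ever compares an image with a text, never two images or two texts, so it places no direct constraint on the intra-modal similarities that the diagnostic inspects.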

Journal

Machine Learning (ISSN 0885-6125)

Research Disciplines



WoS
Computer Science, Artificial Intelligence
Scopus
No disciplines assigned
SciELO
No disciplines assigned

Shows the distribution of disciplines for this publication.

Publications covered: WoS (editions: ISSHP, ISTP, AHCI, SSCI, SCI), Scopus, SciELO Chile.


Authors - Affiliation



No. Author - Institution - Country
1 Mallea, Mario - Universitat Politècnica de Catalunya - Spain
2 Nanculef, Ricardo - Universidad Técnica Federico Santa María - Chile
3 Araya, Mauricio - Universidad Técnica Federico Santa María - Chile

Shows the affiliation of each co-author of the publication.

Funding



Source
National Agency for Research and Development (Agencia Nacional de Investigación y Desarrollo, ANID, Chile)
AC3E ANID-Basal

Shows the funding sources declared in the publication.

Acknowledgments



This research was partially funded by National Agency for Research and Development (ANID, Chile), grant numbers FONDEF IT21I0019 and AC3E ANID-Basal Project AFB240002.