Enhancing Content-Based Image Retrieval Using Aggregation of Binary Features, Deep Learning, and Supermetric Search


UNIVERSITÀ DI PISA

DIPARTIMENTO DI INGEGNERIA DELL’INFORMAZIONE

Dottorato di Ricerca in Ingegneria dell’Informazione

Activity Report by the Student LUCIA VADICAMO – PhD Program, cycle XXX

Tutors:

• Dr. Giuseppe Amato (CNR, Institute of Information Science and Technologies (ISTI))

• Dr. Fabrizio Falchi (CNR, Institute of Information Science and Technologies (ISTI))

• Prof. Francesco Marcelloni (UniPI, Dipartimento di Ingegneria dell’Informazione)

1. Research Activity

The research activity that I carried out during the three-year PhD Program focused on proposing efficient and effective methods for Content-Based Image Retrieval (CBIR) and Similarity Search. CBIR embraces any technology that allows systems to organize archives of digital pictures so that they can be searched and retrieved by their visual content (i.e. without using text or other metadata associated with the images). Content-based search relies on extracting numerical descriptors (features) of the visual content of the images such that similar images have similar representations. The image descriptors usually lie in a metric space, where a distance function can be used to compare images. The descriptors are stored and managed in a database so that, given a query image, the system searches for the database objects whose features are closest to those of the query. Finally, to perform image retrieval on a large scale, the image descriptors are indexed and appropriate search algorithms are adopted for the search phase.
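As an illustration of the query-by-example search described above, the following minimal sketch performs an exhaustive k-nearest-neighbour scan over a tiny in-memory collection. The function names, the toy two-dimensional features, and the choice of the Euclidean distance are assumptions made for the example; a large-scale system would replace the linear scan with an index.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(query, database, k, distance=euclidean):
    """Return the k database entries closest to the query.

    `database` maps an image identifier to its feature vector.
    This is a plain linear scan: every stored descriptor is compared
    with the query, which is only viable for small collections.
    """
    scored = sorted(
        (distance(query, features), image_id)
        for image_id, features in database.items()
    )
    return [(image_id, dist) for dist, image_id in scored[:k]]


# Toy descriptors (hypothetical values, two dimensions for readability).
database = {
    "img_a": [0.0, 0.0],
    "img_b": [1.0, 0.0],
    "img_c": [5.0, 5.0],
}

results = knn_search([0.9, 0.1], database, k=2)
for image_id, dist in results:
    print(image_id, round(dist, 3))
```

The distance function is passed as a parameter because, in a metric-space setting, it is the only operation the search needs to know about the data.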

My study focused on three fundamental stages of a CBIR system: the numerical representation of the image visual content, the processing/indexing of the image features, and the query-by-example search.

Initially, I focused on effective image representations by investigating and experimentally comparing Convolutional Neural Network (CNN) features, aggregations of local features (e.g. BoW, VLAD, and FV), and their combination. I performed the experimental evaluation in an application scenario concerning the recognition of ancient inscriptions and other objects related to cultural heritage (publications [J3, JN1, C3, C9, C10]). One main outcome of this activity was showing that very high effectiveness is achieved by combining CNN features and aggregation methods. However, the extraction of the local features (e.g. SIFT) used in the aggregation approaches may be too costly for devices with very limited resources. To address these efficiency issues, I investigated methods to aggregate binary local features (e.g. ORB, LATCH, AKAZE), whose extraction process is up to two orders of magnitude faster than that of non-binary features (publications [J2, C6, C7]). In this regard, I proposed a novel encoding schema, named BMM-FV, which generalizes the FV approach by using a Bernoulli Mixture Model to describe the distribution of a set of binary vectors. I performed an extensive experimental evaluation on benchmarks for image retrieval, which shows that the proposed BMM-FV outperforms other state-of-the-art aggregations of binary features and achieves state-of-the-art results when combined with CNN features.
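To make the idea of aggregating binary local features with a Bernoulli Mixture Model more concrete, the sketch below computes, for a set of binary descriptors, the averaged gradient of the mixture log-likelihood with respect to the Bernoulli means. It is a simplified illustration under stated assumptions, not the BMM-FV encoding itself: the mixture parameters are hypothetical hand-picked values rather than learned ones, only the gradient with respect to the means is used, and no Fisher-information normalization is applied.

```python
import math


def bernoulli_log_prob(x, mu):
    """Log-probability of binary vector x under independent Bernoullis.

    Every entry of mu must lie strictly between 0 and 1.
    """
    return sum(math.log(m) if bit else math.log(1.0 - m) for bit, m in zip(x, mu))


def posteriors(x, weights, means):
    """Posterior probability of each mixture component given x."""
    logs = [math.log(w) + bernoulli_log_prob(x, mu) for w, mu in zip(weights, means)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_gradient_encoding(descriptors, weights, means):
    """Average gradient of log p(x) with respect to the Bernoulli means.

    For component j and bit i the per-descriptor gradient is
    gamma_j(x) * (x_i - mu_ji) / (mu_ji * (1 - mu_ji)).
    The result is a single vector of length K * D, whatever the
    number of local descriptors in the image.
    """
    k, d = len(means), len(means[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        gamma = posteriors(x, weights, means)
        for j in range(k):
            for i in range(d):
                mu = means[j][i]
                acc[j][i] += gamma[j] * (x[i] - mu) / (mu * (1.0 - mu))
    n = len(descriptors)
    return [value / n for row in acc for value in row]


# Hypothetical 2-component mixture over 3-bit descriptors.
weights = [0.5, 0.5]
means = [[0.9, 0.9, 0.1], [0.1, 0.2, 0.8]]
descriptors = [[1, 1, 0], [0, 0, 1]]

encoding = mean_gradient_encoding(descriptors, weights, means)
print(len(encoding))
```

The key property shown here is that a variable-size set of binary local descriptors is mapped to one fixed-length real-valued vector, which can then be compared with a standard distance function.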


Subsequently, I investigated and proposed novel approaches to process some state-of-the-art image descriptors, like CNN features and VLAD vectors (publications [C5, C8]). Specifically, I proposed 1) an efficient and effective technique, called Deep Permutation, to represent and index deep features using a permutation-based approach, and 2) the Blockwise Surrogate Text Representation, to represent and index compound metric objects using an off-the-shelf text search engine.
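For readers unfamiliar with permutation-based representations, the following sketch shows the generic pivot-based variant: each object is represented by the order in which it "sees" a fixed set of reference objects (pivots), and two objects are compared through their permutations. This is a textbook-style illustration, not necessarily how Deep Permutation derives its permutations; the pivots, the sample points, and the use of the Spearman footrule distance are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation(obj, pivots, distance=euclidean):
    """Pivot indices ordered from the closest pivot to the farthest.

    Ties are broken by pivot index so the result is deterministic.
    """
    return sorted(range(len(pivots)), key=lambda i: (distance(obj, pivots[i]), i))


def spearman_footrule(perm_a, perm_b):
    """Sum of the absolute differences between the ranks of each pivot."""
    rank_in_b = {pivot: rank for rank, pivot in enumerate(perm_b)}
    return sum(abs(rank - rank_in_b[pivot]) for rank, pivot in enumerate(perm_a))


# Hypothetical pivots and objects in a 2-D space.
pivots = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
perm_x = permutation([1.0, 0.5], pivots)
perm_y = permutation([1.2, 0.4], pivots)  # close to x
perm_z = permutation([0.5, 3.5], pivots)  # far from x

print(perm_x, perm_y, perm_z)
print(spearman_footrule(perm_x, perm_y), spearman_footrule(perm_x, perm_z))
```

Nearby objects tend to order the pivots similarly, so the permutation distance acts as a cheap proxy for the original distance. Because a permutation is a sequence of discrete symbols, it can also be encoded as text and handled by an inverted index.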

Finally, I worked on indexing and searching algorithms to efficiently perform similarity search in metric spaces. In this context, the general interest is to efficiently find data objects that are close to an arbitrary query object, where the distance function is the only way by which two objects can be compared. This topic has been thoroughly investigated in past research literature. However, I analyzed the foundations of metric search from a different point of view by using finite isometric embeddings in Euclidean spaces. I have shown that some metric properties can be reinterpreted in the light of discrete geometry and that discrete geometry allows new algorithms for metric search to be defined. This research activity was carried out in collaboration with Prof. Richard Connor of the University of Strathclyde. Together with Prof. Connor, I studied a large class of metric spaces, called supermetric spaces, which satisfy the four-point property, a discrete geometric property that is stronger than the triangle inequality. We showed that many supermetric spaces commonly used in applications have a further geometric property called the n-point property. By exploiting the four-point and n-point properties, we derived bounds on the distance that are tighter than those obtained using the triangle inequality, and we proposed techniques that improve search in supermetric spaces (publications [JS1, J1, C2, C4]). The proposed approaches, which were validated both theoretically and experimentally, show promising results for space pruning, (super)metric indexing, and dimensionality reduction tasks.
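The gain over the triangle inequality can be illustrated with the exclusion test used in hyperplane partitioning. Given two pivots p1 and p2 and a query q that is closer to p2, the sketch below compares two lower bounds on the distance between q and any object lying on the p1 side of the partition: the bound that follows from the triangle inequality alone, and the tighter bound available when the space has the four-point property (the Euclidean space used here does). The function names and the sample points are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triangle_bound(d_q_p1, d_q_p2):
    """Lower bound on d(q, s) for any s with d(s, p1) <= d(s, p2).

    Valid in every metric space (follows from the triangle inequality).
    """
    return (d_q_p1 - d_q_p2) / 2.0


def four_point_bound(d_q_p1, d_q_p2, d_p1_p2):
    """Lower bound on d(q, s) for any s with d(s, p1) <= d(s, p2).

    Valid when the space has the four-point property; in the Euclidean
    case it is the distance from q to the hyperplane bisecting p1 and p2.
    """
    return (d_q_p1 ** 2 - d_q_p2 ** 2) / (2.0 * d_p1_p2)


# Hypothetical pivots, query, and search radius.
p1, p2, q = [0.0, 0.0], [2.0, 0.0], [3.0, 1.0]
radius = 1.5

d1, d2, delta = euclidean(q, p1), euclidean(q, p2), euclidean(p1, p2)
weak = triangle_bound(d1, d2)
strong = four_point_bound(d1, d2, delta)

# The p1 side can be skipped only if the bound exceeds the search radius.
print("triangle bound:", round(weak, 3), "exclude:", weak > radius)
print("four-point bound:", round(strong, 3), "exclude:", strong > radius)
```

Since d(q, p1) + d(q, p2) is never smaller than d(p1, p2), the four-point bound is always at least as large as the triangle bound, so it can only exclude more partitions, never fewer.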

During the last year, I also worked on a research activity with ISTI-CNR, ILC-CNR, and IIT-CNR, which addresses the challenge of training a visual sentiment classifier starting from a large set of user-generated and unlabeled content. IIT-CNR collected more than 3 million tweets containing both text and images (randomly selected from the stream of all globally produced tweets). ILC-CNR classified the sentiment polarity of the texts and selected the tweets with the most confident predictions to form a training dataset. I collaborated with other researchers of ISTI-CNR to train a visual classifier able to discover the sentiment polarity of images by leveraging transfer learning. Although the text of the tweets is often noisy or misleading with respect to the image content (e.g. irrelevant comments), we showed that our cross-media and self-supervised approach can be profitably used for learning visual sentiment classifiers in the wild [C1]. This work has not been included in the thesis since it is outside the scope of the main topics of the dissertation, but it still represents a valuable research activity carried out during the PhD period.

All research was carried out within the following projects:

• EAGLE, Europeana network of Ancient Greek and Latin Epigraphy, co-funded by the European Commission, CIP-ICT-PSP.2012.2.1 - Europeana and creativity, Grant Agreement n. 325122;


2. Training Activity

During the PhD Program, I have attended the following courses:

• Game Theory and Optimization in Communications and Networking, Prof. M. Luise, Dr. L. Sanguinetti, 16 hours (4 CFU);

• Multi-modal Registration of Visual Data, Dr. M. Corsini, 15 hours (4 CFU);

• Smart Spaces, Prof. Dmitry G. Korzun, 20 hours (5 CFU);

• Probabilità, statistica e processi stocastici (Probability, statistics, and stochastic processes), Prof. F. Flandoli, 24 hours (6 CFU);

• English for writing and presenting scientific papers, A. Wallwork, 20 hours (2 CFU);

• Cloud Computing for Big Data Analysis, Dr. C. Lucchese, Dr. F. M. Nardini, Dr. N. Tonellotto, 20 hours (5 CFU);

• Enterprise Information Management, Dr. G. Amato, Dr. F. Falchi, Dr. C. Gennaro, Dr. P. Bolettieri, 90 hours (9 CFU);

for a total of 35 credits.

3. Research Periods at Qualified Research Institutions

In May 2016, I visited the Laboratory of Data Intensive Systems and Applications (DISA) of Masaryk University (Brno, Czech Republic) for about four weeks. During that period, I had the opportunity to collaborate with researchers and PhD students of the DISA laboratory on the following research topics:

• Similarity searching in motion capture data. Specifically, I performed an experimental evaluation of a number of approaches (FV, covariance matrix, CNN features) to encode motion capture data into fixed-size vector representations for motion retrieval and recognition tasks.

• Indexing CNN features using binary sketches. I investigated and tested several approaches to transform CNN features into compact binary sketches to be used for speeding up similarity searching in a database of CNN features.
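As a rough illustration of the binary-sketch idea mentioned in the last item, the sketch below turns a real-valued feature vector into a compact bit string by recording on which side of a set of random hyperplanes it falls, and then compares sketches with the Hamming distance. This is one common sketching technique chosen for the example, not necessarily one of the approaches tested during the visit; the dimensionality, the number of bits, and the sample vectors are assumptions.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(feature, hyperplanes):
    """Pack one bit per hyperplane into an integer.

    A bit is 1 when the feature has a non-negative dot product
    with the corresponding hyperplane normal.
    """
    bits = 0
    for normal in hyperplanes:
        dot = sum(f * w for f, w in zip(feature, normal))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer-packed sketches."""
    return bin(a ^ b).count("1")


# Hypothetical 4-dimensional features and 16-bit sketches.
n_bits = 16
planes = make_hyperplanes(dim=4, n_bits=n_bits)

v = [0.2, 0.7, 0.1, 0.9]
opposite = [-x for x in v]

print(hamming(sketch(v, planes), sketch(v, planes)))
print(hamming(sketch(v, planes), sketch(opposite, planes)))
```

Hamming distances on packed integers are much cheaper to compute than distances on the original vectors, so sketches are typically used to filter candidates before the exact distance is evaluated on the survivors.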

4. Awards

• ISTI Grants for Young Mobility 2015: grant to carry out research in cooperation with foreign Universities and Research Institutions of clear international standing

– http://www.isti.cnr.it/news/yawards-gym.php

– I used the travel grant for visiting the DISA Laboratory (Masaryk University).

• SISAP 2016 Best Paper: best paper at the 9th International Conference on Similarity Search and Applications (Tokyo, 24-26 October 2016) for the paper Supermetric Search with the Four-Point Property.

– http://www.sisap.org/2016/awards.html

• ISTI Young Researcher Award 2017: award for young staff members (less than 35 years old) of the Institute of Information Science and Technologies (ISTI-CNR) with high scientific production.

– http://puma.isti.cnr.it/dfdownloadnew.php?ident=/cnr.isti/2017-TR-006&langver=en&scelta=NewMetadata


5. Publications

List of publications that appeared during the PhD period:

International Journals

[J1] R. Connor, L. Vadicamo, F. A. Cardillo, F. Rabitti: "Supermetric Search", Information Systems, 2018, in press.

[J2] R. Connor, F. A. Cardillo, L. Vadicamo, F. Rabitti: "Hilbert Exclusion: Improved Metric Search through Finite Isometric Embeddings", ACM Transactions on Information Systems (TOIS), 35(3), 17, June 2017.

[J3] G. Amato, F. Falchi, L. Vadicamo: "Aggregating binary local descriptors for image retrieval", Multimedia Tools and Applications (MTAP), pp. 1-31, March 2017.

[J4] G. Amato, F. Falchi, L. Vadicamo: "Visual Recognition of Ancient Inscriptions using Convolutional Neural Network and Fisher Vector", Journal on Computing and Cultural Heritage (JOCCH), 9(4), 21, December 2016.

National Journals

[JN1] G. Amato, P. Bolettieri, F. Falchi, L. Vadicamo: "Sistema di riconoscimento delle immagini e mobile app", Forma Urbis, vol. XXI (1), pp. 22-25, E.S.S. Editorial Service System, January 2016. (National journal)

International Conferences/Workshops with Peer Review

[C1] L. Vadicamo, F. Carrara, A. Cimino, S. Cresci, F. Dell'Orletta, F. Falchi, M. Tesconi: "Cross-Media Learning for Image Sentiment Analysis in the Wild", In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, October 2017.

[C2] R. Connor, L. Vadicamo, F. Rabitti: "High-Dimensional Simplexes for Supermetric Search", In: Proceedings of the International Conference on Similarity Search and Applications (SISAP 2017), Springer International Publishing (Lecture Notes in Computer Science), October 2017.

[C3] G. Amato, A. Mannocci, L. Vadicamo, F. Zoppi: "Coping with interoperability in cultural heritage data infrastructures: the Europeana network of Ancient Greek and Latin Epigraphy", In: Proceedings of the 6th AIUCD Conference 2017 (AIUCD 2017), pages 211-215, January 2017.

[C4] R. Connor, L. Vadicamo, F. A. Cardillo, F. Rabitti: "Supermetric Search with the Four-Point Property", In: Proceedings of the International Conference on Similarity Search and Applications (SISAP 2016), Springer International Publishing (Lecture Notes in Computer Science), pages 51-64, October 2016.

[C5] G. Amato, F. Falchi, C. Gennaro, L. Vadicamo: "Deep Permutations: Deep Convolutional Neural Networks and Permutation Based Indexing", In: Proceedings of the International Conference on Similarity Search and Applications (SISAP 2016), Springer International Publishing (Lecture Notes in Computer Science), pages 93-106, October 2016.

[C6] G. Amato, F. Falchi, F. Rabitti, L. Vadicamo: "Combining Fisher Vector and Convolutional Neural Networks for image retrieval", In: CEUR Workshop Proceedings of the 7th Italian


[C7] G. Amato, F. Falchi, L. Vadicamo: "How Effective Are Aggregation Methods on Binary Features?", In: Proceedings of the 11th International Conference on Computer Vision Theory and Applications - Volume 4: VISAPP 2016, pages 566-573, February 2016.

[C8] G. Amato, P. Bolettieri, F. Falchi, C. Gennaro, L. Vadicamo: "Using Apache Lucene to Search Vector of Locally Aggregated Descriptors", In: Proceedings of the 11th International Conference on Computer Vision Theory and Applications - Volume 4: VISAPP 2016, pages 383-392, February 2016.

[C9] P. Bolettieri, V. Casarosa, F. Falchi, L. Vadicamo, P. Martineau, S. Orlandi, R. Santucci: "Searching the EAGLE epigraphic material through image recognition via a mobile device", In: Proceedings of the International Conference on Similarity Search and Applications (SISAP 2015), Springer International Publishing (Lecture Notes in Computer Science), volume 9371, pages 351-354, October 2015.

[C10] G. Amato, P. Bolettieri, F. Falchi, F. Rabitti, L. Vadicamo: "Visual Recognition in the EAGLE Project", In: CEUR Workshop Proceedings of the 6th Italian Information Retrieval Workshop (IIR 2015), May 2015.

Others

[O1] P. Barsocchi, D. Basile, L. Candela, V. Ciancia, M. Delle Piane, A. Esuli, A. Ferrari, M. Girardi, R. Guidotti, F. Lonetti, D. Moroni, F. M. Nardini, S. Rinzivillo, L. Vadicamo: "ISTI Young Research Award 2017", Technical report, 2017.

Pisa, 24/04/2018 Lucia Vadicamo