Computer Vision on Mobile Devices: A few case studies

S. Battiato, G. M. Farinella, E. Messina, G. Puglisi, D. Ravì

University of Catania – Dipartimento di Matematica ed Informatica

http://iplab.dmi.unict.it

Viale Andrea Doria 6 – 95125 – Catania (Italy)

1. Introduction

In recent years there has been growing interest in new technology to be employed in the context of mobile devices. Although today's mobile devices (e.g., smartphones, tablets, etc.) are still limited in terms of resources (e.g., processor speed, available RAM, etc.), novel Computational Photography solutions make it possible to build appealing imaging applications that were not feasible before. The main idea is to overcome the limitations of traditional imaging devices by using computational methods that exploit the different inputs offered by a mobile device (e.g., low-level data such as the Bayer pattern, GPS position, etc.) [10]. Since several cameras are usually embedded in new-generation devices, computer vision algorithms will be extremely useful in many applications in the near future. For example, visual tracking can be exploited to interact with video games, and the recognition of visual content could help in building new applications in the context of cultural heritage (e.g., returning information on a recognized archaeological site).

The main contribution of this work is the porting and testing of some classic computer vision algorithms on mobile platforms. Specifically, a few algorithms covering the main tasks of Computer Vision have been considered: keypoint extraction, face detection and image segmentation. The porting has been performed for the following operating systems: Maemo, used in the Nokia N900, and Android, used in the LG Optimus One and the Samsung Galaxy S II. These operating systems have been chosen because they can be easily extended with customized libraries and/or programs, and because they provide a standardized and fairly widespread API (Application Programming Interface).

It is worth noting that the aforementioned algorithms should be optimized to work properly on low-resource devices. For instance, the FCAM library [1], available for the Nokia N900 smartphone, allows interaction with the low-level algorithms (e.g., demosaicing, white balancing, denoising, etc.) and data (Bayer pattern) involved in the imaging pipeline [10]. This enables a better design of computer vision algorithms for resource-constrained devices.

Finally, comparative tests evaluate, both quantitatively and qualitatively, the performance of the algorithms on mobile devices.

2. Operating Systems and Computational Platforms

The Maemo Operating System is based on the GNU/Linux kernel. This means that many development tools generally used on a desktop computer (such as gcc, make, etc.) can be integrated in Maemo OS. The porting of the OpenCV Library [9] has then been performed in a standard way.

The default framework used to write a new application in Maemo OS is Qt; it is cross-platform and written in C++, and it adds a layer of abstraction to access the low-level functions (GUI, network, GPS, etc.). Qt applications can be easily cross-compiled for many different platforms, such as Maemo/MeeGo, Symbian, Windows Mobile, desktop PC, consumer electronics, car entertainment, etc.

The device used for testing OpenCV with Maemo OS was the Nokia N900. This device mounts a high-end OMAP 3430 with an ARM Cortex-A8 as main processor, running at 600 MHz. The GPU is a PowerVR SGX 530, which supports OpenGL ES 2.0. The TMS320C64x processor, running at 430 MHz, handles image processing (camera), audio processing and data transmission. The system has 256 MB of dedicated high-performance RAM (Mobile DDR) paired with 768 MB of swap space managed by the OS, for a total of 1 GB of virtual memory.

Like Maemo, the Android Operating System is based on a Linux kernel. Android is currently the world's best-selling smartphone platform and has a large community of developers. There are currently over 200,000 apps available on the market.

Applications for the Android platform are Java based and run on the Dalvik virtual machine, which features JIT compilation: code is continuously translated and cached to minimize performance degradation. Unlike Maemo, Android does not natively support the full set of standard GNU libraries, which makes it difficult to port existing GNU/Linux applications; it has its own C library instead. Therefore, the OpenCV Library cannot be compiled directly, and the Java Native Interface (JNI) programming framework is required to interact with the Java classes. Android allows programming in C/C++ through the Native Development Kit (NDK), used together with the standard SDK. To simplify the wrapping of OpenCV code into JNI functions, we used SWIG [2], a tool that connects programs written in C/C++ with many other languages (in this case Java). It works by taking the declarations found in header files and using them to generate the wrapper code that the target languages need to access the underlying native code.

We used two devices to test Android OS: the LG Optimus One and the Samsung Galaxy S II. The first has a 3.15 Mpx camera (2048x1536 pixels) and VGA video resolution at 18 fps. It mounts a Qualcomm MSM7227 chipset with an ARM CPU running at 600 MHz and 512 MB of RAM. The second mounts two cameras: a rear one of 8.1 Mpx (3264x2448 pixels) and a front one of 2.0 Mpx. The video resolution is 1080p at 30 fps. It embeds a dual-core ARM Cortex-A9 CPU running at 1.2 GHz and 1024 MB of RAM.

3. Involved Algorithms

The algorithms tested on the selected platforms are:

• Feature extraction;

• Face detection;

• Graph cut segmentation.

Feature extraction algorithms are typically used to detect the points of interest (also called keypoints) in an image [8]. These features can be used for many purposes: image registration, visual tracking, image retrieval, etc. In our tests we employed FAST [3], STAR [4] and SURF [5] as implemented in the OpenCV Library [9].
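To give an idea of what a keypoint detector computes, the following is a minimal pure-Python sketch of the segment test at the core of FAST [3]: a pixel is accepted as a corner when a long enough contiguous arc of the 16 pixels on a radius-3 circle around it is entirely brighter or entirely darker than the centre by more than a threshold. This is an illustration only and not the OpenCV implementation used in our tests; the function name `is_fast_corner`, the threshold of 20 and the arc length of 9 are assumptions chosen for the example, and steps such as non-maximum suppression are omitted.

```python
# Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3,
# listed in order around the circle.
CIRCLE = [
    (0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def is_fast_corner(img, x, y, threshold=20, arc_length=9):
    """Segment test on pixel (x, y) of a grayscale image (list of rows).

    threshold and arc_length are illustrative values, not tuned settings.
    The pixel must lie at least 3 pixels away from the image border.
    """
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # first look for a brighter arc, then a darker one
        flags = [sign * (value - centre) > threshold for value in ring]
        run = 0
        # The ring is traversed twice so that arcs wrapping past the
        # starting pixel are also found.
        for flag in flags + flags:
            run = run + 1 if flag else 0
            if run >= arc_length:
                return True
    return False


# Synthetic 7x7 test images: a bright quadrant (corner), a vertical
# step (edge) and a uniform patch (flat).
corner_img = [[200 if (x >= 3 and y >= 3) else 10 for x in range(7)] for y in range(7)]
edge_img = [[200 if x >= 3 else 10 for x in range(7)] for y in range(7)]
flat_img = [[10] * 7 for _ in range(7)]

print(is_fast_corner(corner_img, 3, 3))  # True: 11 contiguous darker pixels
print(is_fast_corner(edge_img, 3, 3))    # False: only 7 contiguous darker pixels
print(is_fast_corner(flat_img, 3, 3))    # False: no pixel differs from the centre
```

The test only needs a handful of comparisons per pixel, which is one reason this family of detectors is attractive on devices with limited processing power.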

The face detection method used in our tests is based on the widespread Viola and Jones [6] object detection algorithm. In the first step, a boosted cascade classifier (working with Haar-like features) is trained with a few hundred samples of a particular object (in this case a face), called positive examples, and a set of arbitrary images (not faces), called negative examples. Once the classifier is trained, it can be applied to a region of interest in an input image.
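As an illustration of the Haar-like features mentioned above, the sketch below computes a two-rectangle feature with the help of an integral image (summed-area table), so that the sum over any rectangle costs four table look-ups regardless of its size. It is a minimal pure-Python example and not the OpenCV code used in our tests; the function names and the 4x4 test image are assumptions made for the example, and the training of the boosted cascade is not shown.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x, y, w, h):
    """Sum of the pixels in the w x h rectangle whose top-left corner is (x, y)."""
    return table[y + h][x + w] - table[y][x + w] - table[y + h][x] + table[y][x]


def two_rect_feature(table, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(table, x, y, half, h) - rect_sum(table, x + half, y, half, h)


# Hypothetical 4x4 patch: bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
table = integral_image(image)
print(rect_sum(table, 0, 0, 4, 4))          # 80: sum of all pixels
print(two_rect_feature(table, 0, 0, 4, 4))  # 64: 72 (left) minus 8 (right)
```

A large response indicates a strong vertical intensity transition inside the window; a cascade combines many such weak cues, evaluated at several positions and scales.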

The last algorithm considered was Graph cuts [7], a popular approach used for image segmentation. The basic idea of this approach is the following:

• each image pixel is viewed as a vertex of a graph;

• the similarity between two pixels is viewed as the weight of the edge of these two vertices;

• segmentation is achieved by cutting edges in the graph to form a good set of connected components.
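The three steps above can be sketched on a toy example. The following pure-Python code segments a one-dimensional strip of pixels by computing a minimum cut between a foreground seed and a background seed with the Edmonds-Karp max-flow algorithm. It is a simplified illustration under stated assumptions and not the implementation of [7] used in our tests: the similarity measure (255 minus the absolute intensity difference), the function name `min_cut_segmentation` and the sample strip are all chosen for the example.

```python
from collections import deque


def min_cut_segmentation(pixels, fg_seed, bg_seed):
    """Label each pixel of a 1-D strip as foreground (True) or background (False)."""
    n = len(pixels)
    source, sink = n, n + 1
    # capacity[u][v] is the residual capacity of the edge u -> v.
    capacity = [dict() for _ in range(n + 2)]

    def add_edge(u, v, weight):
        capacity[u][v] = capacity[u].get(v, 0) + weight
        capacity[v][u] = capacity[v].get(u, 0) + weight

    # Each pixel is a vertex; similar neighbours get a heavy edge (costly to cut).
    for i in range(n - 1):
        add_edge(i, i + 1, 255 - abs(pixels[i] - pixels[i + 1]))

    # Seeds are tied to the terminals with edges too heavy to be cut.
    heavy = 255 * n + 1
    add_edge(source, fg_seed, heavy)
    add_edge(bg_seed, sink, heavy)

    def reachable_from_source():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    # Edmonds-Karp: augment along shortest paths until the sink is unreachable.
    while True:
        parents = reachable_from_source()
        if sink not in parents:
            break
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, capacity[parents[v]][v])
            v = parents[v]
        v = sink
        while parents[v] is not None:
            u = parents[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] += bottleneck
            v = u

    # Vertices still reachable from the source lie on the foreground side of the cut.
    side = reachable_from_source()
    return [i in side for i in range(n)]


strip = [200, 210, 205, 40, 50, 45]  # hypothetical intensities
labels = min_cut_segmentation(strip, fg_seed=0, bg_seed=5)
print(labels)  # [True, True, True, False, False, False]
```

The cut falls on the weakest link, between the third and fourth pixels, where the intensity changes most. On a real image every pixel has several neighbours and the graph is far larger, which is consistent with segmentation being the most time-consuming of the algorithms we tested.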

4. Results

Some tests have been performed to estimate the performance of the different platforms involved. A significant effort has been made to properly configure the different devices in order to run the considered algorithms.

On the N900 device we ran the Graph cuts and Face detection algorithms. These tests were performed for different image resolutions and the results are shown in Fig. 1, Fig. 2 and Fig. 3.
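A measurement of this kind can be organised as in the following sketch, which averages the wall-clock time of an algorithm over a few repetitions for each input resolution and converts a per-frame time into a frame rate. It is a generic illustration and not the harness used for our experiments: the helper names (`average_runtime`, `to_fps`, `make_gray_image`, `sum_pixels`) are hypothetical, and the pixel-summing function is only a stand-in for a real vision algorithm.

```python
import time


def average_runtime(algorithm, make_input, sizes, repeats=3):
    """Return {size: average seconds per run} for each input size."""
    results = {}
    for size in sizes:
        data = make_input(size)  # input creation is excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            algorithm(data)
        results[size] = (time.perf_counter() - start) / repeats
    return results


def to_fps(seconds_per_frame):
    """Convert an average processing time per frame into frames per second."""
    return 1.0 / seconds_per_frame


def make_gray_image(size):
    """Synthetic grayscale image of the given (width, height)."""
    width, height = size
    return [[(x + y) % 256 for x in range(width)] for y in range(height)]


def sum_pixels(image):
    """Stand-in workload; replace with the algorithm under test."""
    return sum(sum(row) for row in image)


timings = average_runtime(sum_pixels, make_gray_image, [(32, 32), (64, 64)])
for size, seconds in timings.items():
    print(size, f"{seconds:.6f} s")
```

Averaging over several runs reduces the influence of background activity on the device, which can be significant on a phone.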

Figure 1: Average detection time vs image resolution.

[Bar chart "Face Detection N-900". Time in seconds per resolution: 1024x1024: 18.66; 731x731: 9.04; 512x512: 4.19; 256x256: 1.03; 170x170: 0.41.]

Figure 2: Average segmentation time at different iteration steps vs image resolution.

[Plot "Graph Cuts N-900". X axis: iterations (1 to 6). Y axis: time in seconds (0 to 250). One series per resolution: 800x600, 640x480, 480x360, 320x240, 240x180.]

Figure 3: First row: the input image with the background and foreground seeds provided to Graph cuts. Second and third rows: visual assessment of the segmentation results at different iterations. These screenshots were captured directly on the N900.

On the LG Optimus One and Samsung Galaxy S II devices we tested the FAST, STAR and SURF keypoint extraction algorithms, varying image resolution and color depth. The resulting frame rates, for both devices, are shown in Fig. 4 and Fig. 5.

Figure 4: Average frame rate (FPS) vs color resolution depth on the Samsung platform.

[Plot "Samsung Galaxy S II". X axis: resolution and color depth. Y axis: FPS (0 to 35). Series: FAST, STAR, SURF.]

Figure 5: Average frame rate (FPS) vs color resolution depth on the LG platform.

[Plot "LG Optimus One". X axis: resolution and color depth. Y axis: FPS (0 to 16). Series: FAST, STAR, SURF.]

References

[1] A. Adams, E. Talvala, S. Park, D. E. Jacobs, B. Ajdin, N. Gelfand, J. Dolson, D. Vaquero, J. Baek, M. Tico, H. P. A. Lensch, W. Matusik, K. Pulli, M. Horowitz, M. Levoy: “The Frankencamera: An Experimental Platform for Computational Photography”, ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH 2010, Vol. 29, Issue 4, pp. 29:1-29:12, 2010.

[2] D. M. Beazley: “SWIG An Easy to Use Tool for Integrating Scripting Languages with C and C++”, 4th Annual Tcl/Tk Workshop, 1996.

[3] E. Rosten and T. Drummond: “Machine learning for high-speed corner detection”, European Conference on Computer Vision, pp. 430-443, 2006.

[4] M. Agrawal, K. Konolige, M. R. Blas: “CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching”, European Conference on Computer Vision, pp. 102-115, 2008.

[5] R. Funayama, H. Yanagihara, L. Van Gool, T. Tuytelaars, H. Bay: “Robust Interest Point Detector and Descriptor”, US 2009238460, published 24-09-2009.

[6] P. Viola, M. Jones: “Robust Real-time Object Detection”, International Journal of Computer Vision, Vol. 57, Issue 2, pp. 137-154, 2001.

[7] R. Zabih, V. Kolmogorov: “Spatially Coherent Clustering Using Graph Cuts”, IEEE International Conference on Computer Vision and Pattern Recognition, pp. 437-444, 2004.

[8] R. Szeliski: “Computer Vision: Algorithms and Applications”, ISBN 978-1-84882-934-3, Springer, 2010.

[9] http://opencv.willowgarage.com/wiki/

[10] S. Battiato, A. R. Bruna, G. Messina, G. Puglisi (Eds.): “Image Processing for Embedded Devices”, Applied Digital Imaging ebook series, ISSN 1879-7458, ISBN 978-1-60805-170-0, Bentham Science Publisher, 2010.