
Cairo University
Faculty of Engineering
Electrical, Electronics and Communications Department

Touchscreen Add-On
Official Website: YosrON.com

July, 2012

YosrON: Touchscreen add-on


By:
Donia Alaa Eldin Hassan Idriss: [email protected]
Muhammad Al-Sherbeeny Hassan: [email protected]

Under the Supervision of Dr. Ibrahim Qamar


[email protected]

A Graduation Project Report Submitted to the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Electronics and Communications Engineering
Faculty of Engineering, Cairo University
Giza, Egypt
July 2012

Table of Contents
List of Figures
Acknowledgments
Abstract
Chapter 1: Introduction
  1.1 Why is it important?
  1.2 Other related projects
  1.3 YosrON is built on the 2nd version of EverScreen
    1.3.1 The hardware
    1.3.2 The software
    1.3.3 The advantages of YosrON
    1.3.4 The challenges we expected
    1.3.5 The skills we needed
    1.3.6 The plan
Chapter 2: YosrON structure
  2.1 System Description
  2.2 Scanlines
  2.3 Noise reduction
  2.4 Fast pointer detection
  2.5 Positioning the cameras
  2.6 Calibration phase
  2.7 Tracking algorithm
  2.8 Resolution accuracy
  2.9 Algorithm complexity
  2.10 Settings and system performance
Chapter 3: Notes on the code
  3.1 Overall flow of the code
  3.2 Notes on main.cpp
    3.2.1 Defining the two webcams we need
    3.2.2 Smoothing neighborhood in averaging
    3.2.3 The color model to be used
    3.2.4 Debugging
    3.2.5 Luminance
    3.2.6 Control the code while running
  3.3 Notes on constants.h
    3.3.1 Threshold of colors difference in each pixel
    3.3.2 Consecutive pixels threshold
    3.3.3 Calibration touch offset
    3.3.4 Consecutive detections to locate a corner
    3.3.5 Limit of attempts to locate a corner
    3.3.6 Calibration scanlines distances
    3.3.7 Picture format, resolution, fps and grab method
  3.4 Compiling the code after any edits
Chapter 4: Challenges
  4.1 The environment
    4.1.1 OpenCV on Windows
    4.1.2 C/C++ programming on Linux/Ubuntu
    4.1.3 Libraries that must be installed for the code
  4.2 The cameras
  4.3 The fisheye lenses
Chapter 5: Conclusions and Future Work
References
Appendix
  0.1 Installing Ubuntu 11.10
  0.2 Installing the required libraries and packages
    0.2.1 Installing "build-essential" using the "Terminal"
    0.2.2 Installing libraries/packages using "Ubuntu Software Center"
  0.3 Check webcam supported formats and UVC compliance
    0.3.1 UVC compliance check
    0.3.2 Supported configurations and formats
    0.3.3 Troubleshooting webcams

List of Figures

Figure 1.1: Survey results on the UniMasr.com website.
Figure 1.2: Survey results on the YosrON Facebook page.
Figure 1.3: Touchscreen add-on by TouchMagic.
Figure 2.1: Visual representation of scanlines.
Figure 2.2: The buffer used for the analysis of the green row shows a clear peak.
Figure 2.3: The system correctly detects only the pointer coming from above.
Figure 2.4: The vertical contiguity constraint applied to a hand holding a pen.
Figure 2.5: Example of a simple but inefficient configuration.
Figure 2.6: Suggested configuration to optimize the use of the cameras.
Figure 2.7: Resolution accuracy of W1 in t.
Figure 2.8: A4Tech webcam, PK 720G model.
Figure 4.1: Full-frame fisheye image.
Figure 4.2: Remapped full-frame fisheye image into rectilinear perspective.
Figure 4.3: Circular fisheye image.
Figure 4.4: The image of a circular fisheye after remapping (defisheye).
Figure 4.5: Fisheye for home doors.
Figure 6.1: Windows Disk Management Tool.
Figure 6.2: Shrink dialog box.
Figure 6.3: Windows partitions after successfully freeing space.
Figure 6.4: Don't allow any updates.
Figure 6.5: Install Ubuntu alongside your current OS.
Figure 6.6: Disabling automatic updates of Ubuntu.
Figure 6.7: Using Ubuntu Software Center to install required libraries/packages.
Figure 6.8: Checking supported configurations and formats using guvcview.

Acknowledgments

We would like to thank those who helped us make this dream come true. No matter how big or small the help they offered, we would like to mention them all, as much as we can, in the order in which they helped us.

Thanks to Dr. Ibrahim Qamar for accepting us and our idea, and for his valuable time, understanding and kindness in discussing many problems with us and leading us to solutions. Thanks to Eng. Abdel-Mohsen for advising us on which programming language to use (Matlab is easy but slow; C++ is good with toolboxes and very fast for image processing). Thanks to Eng. Khaled Yeiha and Eng. Ahmad Ismail for giving us useful guidelines for the algorithm. Thanks to Eng. E. Rustico (from Italy) for supporting us with documents, code and instructions that helped us very much, as we built our project on his work, EverScreen. Thanks to Dr. Essam, the glasses maker, for helping us with the fisheye lenses. Thanks to Eng. Shaimaa Mahmoud and Eng. Dina Zeid for helping us with the OpenCV toolbox and with some translations (from Italian to English). Thanks to Muhammad Sherif and Sherif Medhat for helping us with programming on Ubuntu. Thanks to Eng. Muhammad Hosny for helping us debug some code and solve many problems we faced with the OS and the software. Thanks to Mr. Muhammad Reda for helping us find compatible webcams. Thanks to Eng. Sherbeeny Hasan, Muhammad's father, for helping us with the webcams and the fisheye lenses. Thanks to our families for supporting us in every way, all the time.

Abstract

The entire world is heading toward designing all operating systems and programs to work with touch technology. But most Egyptians, and many others around the world, can't afford the cost of a touchscreen for their computers. That's why we came up with YosrON. YosrON is meant to be a touchscreen add-on that can be put on any computer screen, PC or laptop, to add the "touch" feature to the screen using a USB connection and software. It has been built as a complete and inexpensive system to track the movements of a physical pointer on a flat surface. Any opaque object can be used as a pointer (fingers, pens, etc.), and it is possible to discriminate whether the surface is being touched or just pointed at. The system relies on two entry-level webcams and uses a fast scanline-based algorithm. A calibration wizard helps the user during the initial setup of the two webcams. No markers, gloves or other hand-held devices are required. Since the system is independent of the nature of the pointing surface, it is possible to use a screen or a projected wall as a virtual touchscreen. The complexity of the algorithms used by the system grows less than linearly with resolution, making the software layer very lightweight and suitable also for low-powered devices like embedded controllers. We were planning to make a resizable plastic frame as housing for the webcams and the added wide-angle (fisheye) lenses, but we ran out of time and faced many problems, so we postponed the frame, along with the multi-touch feature, to future work. For now, YosrON is just two webcams fixed at a distance from the touch surface, plus software for calibration and moving the mouse.


Chapter 1:

Introduction

1.1 Why is it important?


The advances in technology and the widespread usage of computers in almost every field of human activity are necessitating new interaction methods between humans and machines. The traditional keyboard and mouse combination has proved its usefulness but also, and more extensively, its weaknesses and limitations. In order to interact in an efficient and expressive way with the computer, humans need to be able to communicate with machines in a manner more similar to human-human communication. In fact, throughout their evolution, human beings have used their hands, alone or with the support of other means and senses, to communicate with others, to receive feedback from the environment, and to manipulate things. It therefore seems important that technology make it possible to interact with machines using some of these traditional skills. The human-computer interaction (HCI) community has invented various tools to exploit human gestures, the first attempts resulting in mechanical devices. Devices such as data gloves can prove especially interesting and useful in certain specific applications, but have the disadvantage of often being onerous, complex to use, and somewhat obtrusive. The use of computer vision can consequently be a possible alternative. Recent advances in computer vision techniques and the availability of fast computing have made the real-time requirements for HCI feasible. Consequently, extensive research has been done in the field of computer vision to identify hand poses and static gestures and also, more recently, to interpret the dynamic meaning of gestures. Computer vision systems are less intrusive and impose fewer constraints on the user, since they use video cameras to capture movements and rely on software applications to perform the analysis.

Among the existing graphical input devices, computer users especially love touchscreens. The reason is that they reflect, as no other device does, the way we are used to getting in touch and interacting with the reality around us: we point at and touch directly with our hands what we see around us, and touchscreens allow us to do the same with our fingers on computer interfaces. This preference is confirmed by a strong trend in the industry of high-end platforms (e.g. Microsoft Surface and TouchWall) and in the market of mobile devices: Apple, Samsung and Nokia, to cite only a few examples, finally chose a touch-sensitive display for their leading products, while interest in this technology is also growing in design studios, industrial environments and public information points like museums and ATMs. Unfortunately, touchscreen flexibility is low: finger tracking is impossible without physical contact; it is not possible to use sharp objects on them; and large touch-sensitive displays are expensive because of their manufacturing cost and damage-proneness. YosrON is made of low-cost devices, without using any kind of equipment that cannot be found in any computer shop for less than 300 EGP, which is a reasonable price for the Egyptian market and other similar markets. It is important to offer such an add-on at a low price because the upcoming Microsoft Windows 8 (Windows being the most common OS in Egypt) is mainly designed for touchscreens. Of course it can be used without a touchscreen, but that would be a great loss for the user experience. We made a simple survey asking many computer users and resellers whether they would buy such an add-on and how much they would pay for it. The results are shown in fig. 1.1 and fig. 1.2.

Figure 1.1: Survey results on the UniMasr.com website.

Figure 1.2: Survey results on the YosrON Facebook page.

1.2 Other related projects


The only commercial product we found is TouchMagic (fig. 1.3), which is available in the USA and can be found in the Middle East only in the UAE, KSA and the occupied lands of Palestine (Israel). This product is available in fixed sizes, with a minimum cost of $170 (about 1000 EGP) for 15" screens. So, if you change your computer or screen for any reason, you will probably need to buy a new add-on that fits your screen size. That is why it is not wanted in the market: it is expensive and not resizable. But when we look at research and projects on computer interfaces, we find all of them turning back to the human body, trying to adapt the way we communicate with computers to our natural ways of moving and behaving. Speech-driven interfaces, gesture-recognition software and facial expression interpreters are just some examples of this recent trend. There is a growing interest in the ones that involve real-time body tracking, especially if no expensive hardware is required and the user does not need to wear any special equipment. The simplest and cheapest choice is to use optical devices to track a specific part of the body (head, eyes, hands or even the nose; see [GMR02] in the references).
Figure 1.3: Touchscreen add-on by TouchMagic.

We focus on finger tracking systems that do not require lasers, markers, gloves or hand-held devices [SP98, DUS01, Lee07]. The main application of finger tracking is to move a digital pointer over a screen, enabling the user to replace the pointing device (e.g. the mouse) with his hands. While for eye or head tracking we have to direct the camera(s) towards the user's body, finger tracking gives us a wider range of choices. The first possibility is to direct the camera towards the user's body, as for head tracking, and to translate the absolute or relative position of the user's finger to screen coordinates. In [WSL00] an empty background is needed; in [IVV01] the whole arm position is reconstructed, and in [Jen99] a combination of depth and color analysis helps to robustly locate the finger. Some works tried to estimate the position of the fingertip relative to the view frustum of the user; this was done in [CT06] with one camera and in [pHYssCIb98] with stereovision, but both had strong limits in the accuracy of the estimation. The second possibility is to direct the camera towards the pointing surface, which may be static or dynamic. Some works require a simple black pad as pointing surface, making it easy to locate the user's finger with only one camera [LB04]; however, we may need additional hardware [Mos06] or stereovision [ML04] to distinguish whether the user is just hovering the finger over it or there is physical contact between the finger and the surface. A physical desktop is an interesting surface to track a pointer on. Some works are based on the DigitalDesk setup [Wel93], where an overhead projector and one or more cameras are directed downwards on a desk and virtual objects can interact with physical documents [Ber03, Wil05]; others use a similar approach to integrate physical and virtual drawings on vertical or horizontal whiteboards [Wil05, vHB01, ST05], and one integrates visual information with acoustic triangulation to achieve better accuracy [GOSC00]. These works use differencing algorithms to segment the user's hands from the background, and then shape analysis or finger template matching to locate the fingertips; they rely on the assumption that the background surface is white, or in general of a color different from skin. Other approaches work also on highly dynamic surfaces. It is possible to robustly suppress the background by analyzing the screen color space [Zha03] or by applying polarizing filters to the cameras [AA07]; in the first, the mouse click has to

be simulated with a keystroke, while in the latter a sophisticated mathematical finger model allows detecting physical contact with stereovision. Unfortunately, these two techniques cannot be applied to a projected wall. Directing the camera towards the pointing surface implies, in general, the use of computationally expensive algorithms, especially when we have to deal with dynamic surfaces. A third possible approach, which may drastically reduce the above problems, is to have the cameras watching sidewise, i.e. lying on the same plane as the surface; with this point of view we do not have any problem with dynamic backgrounds, either behind the user or on the pointing surface, and this enables us to set up the system also in environments that would otherwise be problematic (e.g. large displays, outdoors, and so on). Among the very few works using this approach, in [QMZ95] the webcam is on top of the monitor looking towards the keyboard, and the finger is located with a color segmentation algorithm. The movement of the hand along the axis perpendicular to the screen is mapped to the vertical movement of the cursor, and a keyboard button press simulates the mouse click. However, the position of the webcam has to be calibrated and the vertical movement is mapped in an unnatural way. Also in [WC05] we find a camera on top of a laptop display directed towards the keyboard, but the mouse pointer is moved according to the motion vectors detected in the gray-scale video flow; a capacitive touch sensor enables and disables the tracking, while the mouse button has to be pressed with the other hand. In [Mor05], finally, the lateral approach is used to embed four smart cameras into a plastic frame that can be overlapped on a traditional display. The above approaches need to process the entire image as it is captured by the webcam. Thus, all of the above algorithms are at least quadratic with respect to resolution (or linear with respect to image area). Although it is possible to use smart region-finding algorithms, these would not resolve the problem entirely. In [FR08] the authors proposed the 1st version of EverScreen, a different way to track user movements while keeping the complexity low. They drastically decreased the scanning area to a discrete number of pixel lines from two uncalibrated cameras. Their system requires a simple calibration phase that is easy to perform even for non-experienced users. The proposed technique only regards the tracking of a pointer; it is not about gesture recognition. The output of the system, at present, is directly translated into mouse movements, but it may instead be interpreted by gesture recognition software.

1.3 YosrON is built on the 2nd version of EverScreen


The 1st version of EverScreen focused its attention mostly on the mapping algorithm and provided only a description of an early stage of the system. The 2nd version introduces a more efficient and mature system, exploiting improved pointer detection while remaining as computationally and economically cheap as the previous one. Among the improvements:
- Two proximity constraints in the pointer detection help to reduce the number of false positives.
- A convolution-based algorithm is used to locate the presence of a pointer.
- The gap from the reference backgrounds is kept under control to detect camera movements.
- The calibration phase is faster, and the system graphically shows the points to touch.
- Iterative algorithms are used to solve the linear systems instead of direct formulas.

1.3.1 The hardware


YosrON was planned to consist of four 90-degree view-angle cameras fixed in the corners of a resizable frame, together with arrays of IR or visible LEDs, all connected to a USB hub so that the whole add-on connects to the computer through a single port. We also planned to implement the software on a microprocessor to eliminate any processing load on the host computer. But we had to reduce the hardware because of some challenges that will be mentioned later.

1.3.2 The software


The software performs image processing and geometrical calculations on the cameras' outputs to determine the position of the finger (pointing tool). It was planned to be C++ code using the OpenCV toolbox in Visual Studio on Windows. We faced some problems with the configuration of the environment and some limitations of the toolbox, so we migrated to Ubuntu 11.10, 64-bit, with many libraries to be mentioned later.

1.3.3 The advantages of YosrON


- Resizable: with no glass used, the same item can be used with any screen of any size.
- Low cost: the expected cost for end users is around 200 EGP. (The prototype cost less than 300 EGP, so a single item in mass production would cost less.)
- Fast: with a configuration of 30 fps, the response of the software is effectively immediate (a frame is processed in a few milliseconds; see section 2.10).
- Accurate: with a 320x240 resolution configuration, the accuracy is acceptable for touchscreen use (touch-oriented OSs and programs are designed with big buttons).
- Easy fabrication: manufacturers can easily fabricate it in mass production without the need for any new or complicated technology.

1.3.4 The challenges we expected


- Cameras: finding USB cameras of low cost and fast response with a wide view angle (at least 90°).
- Resizable frame: fabricating a plastic resizable frame and mounting the cameras on it.
- Processing: building the software that can interact with the cameras and process the images to determine the pointer/finger position.
- Load: reducing the processing load on the host computer using a microprocessor.

1.3.5 The skills we needed


- Image processing using the OpenCV toolbox with Visual Studio C++ (before we migrated to Ubuntu).
- Installing and configuring the Linux/Ubuntu OS.
- C/C++ programming on Ubuntu.
- Debugging and troubleshooting.

For production, we will also need to write drivers for different OSs and to implement the software on a microprocessor.

1.3.6 The plan


- Purchasing and installing webcams and wide-angle lenses.
- Building the initial image processing code on live stream images from a single webcam, for finger detection only.
- Building the calibration code and solving for the pointer position from the two webcam streams.
- Building the mouse controlling code.
- Fabricating the resizable frame housing.
- Refining the software after housing.
- Building the driver and calibration software.

Chapter 2:

YosrON structure

2.1 System Description


The system now consists of two off-the-shelf webcams positioned sidewise, so that the lateral silhouette of the hand is captured in an image like fig. 2.1. After a quick auto-calibration, the software layer is able to interpret the image flow and translate it into absolute screen coordinates and mouse button clicks; the corresponding mouse events are simulated on the OS in a way that is completely transparent to the application level. We call the rectangle of surface to be tracked the pointing surface; as pointing surface we can choose a desk, an LCD panel, a projected wall, etc. An automatic region stretching is done to map the coordinates of the pointing surface to the target display. Any opaque object can be used to point at or touch the surface: the system will track a finger as well as a pencil, a chalk or a wooden stick.

2.2 Scanlines
We focus the processing only on a small number of pixel lines from the whole image provided by each webcam; we call these lines scanlines. Each scanline is horizontal and ideally parallel to the pointing surface; we call the lowest scanline (the nearest to the pointing surface) the touching scanline, and every other one a pointing scanline. The calibration phase requires grabbing a frame before any pointer enters the tracking area; these reference frames (one per webcam) will be stored as reference backgrounds and will be used to look for runs of consecutive pixels different from the reference background. We will see later how we detect such scanline interruptions (fig. 2.1). The detection of a finger only in pointing scanlines means that the surface is only being pointed at, while a detection in all the scanlines means that the user is currently touching the surface. To determine whether a mouse button press has to be simulated, we can just look at the touching scanline: we assume that the user is clicking if the touching scanline is occluded in at least one of the two views.

Figure 2.1: Visual representation of scanlines.

During the calibration phase the number of scanlines of interest may vary from a couple to tens; during the tracking, three or four scanlines will suffice for an excellent accuracy. A detailed description of the calibration will be given later.
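As a minimal illustration of this decision rule (hypothetical names, not the project's code), the mapping from scanline occlusions to a pointer state for one view could look like this:

#include <vector>

// Hypothetical sketch of the decision rule above (not the project's code).
// "pointing" holds one occlusion flag per pointing scanline and "touching"
// is the flag of the touching scanline, all for a single webcam view.
enum PointerState { NO_POINTER, POINTING, TOUCHING };

PointerState classify_view(const std::vector<bool>& pointing, bool touching)
{
    bool any_pointing = false;
    for (bool occluded : pointing)
        if (occluded) any_pointing = true;

    if (touching)
        return TOUCHING;      // lowest scanline broken: physical contact
    if (any_pointing)
        return POINTING;      // only upper scanlines broken: hovering
    return NO_POINTER;
}
// A mouse click is simulated when at least one of the two views reports TOUCHING.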

2.3 Noise reduction


We detect the presence of a physical pointer in the view frustum of a webcam by comparing the current frame with the reference background. This is simple in the absence of noise; unfortunately, the video flow captured from a CMOS sensor (the most common type of sensor in low-cost video devices) is definitely not ideal and presents white noise, salt-and-pepper noise and Motion JPEG artifacts. This makes pointer detection more difficult, especially when the pointer is not very close to the camera and its silhouette is therefore only a few pixels wide. To keep the overall complexity low we avoided applying any post-processing filter to each of the grabbed frames, and we adopted two simple strategies to reduce the impact of noise on our algorithm.


The first strategy is to store, as a reference background, not just the first frame but the average of the first b frames captured (in the current implementation, b = 4). The average root mean square deviation of a frame from the reference background, after this simple operation, decreases from ~1.52 to ~1.26 (about 17%). The second strategy is to apply a simple convolution to the scanlines we focus on. The kernel we use is [1 1 1] with divisor 3. This is equivalent to saying that we replace each pixel with the average of a 1-pixel neighborhood on the same row; it is not worth increasing the neighborhood of interest, because by increasing it we decrease the tracking accuracy.

Finally, we keep track of the Root Mean Square Error (RMSE) with respect to the reference frames; if the RMSE gets higher than a threshold, this is probably due to a disturbing entity in the video or to a movement of the camera rather than to systematic noise. In this case, the system automatically stops tracking and informs the user that a new reference background is about to be grabbed.
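A simplified, single-channel sketch of these two strategies plus the RMSE check (hypothetical names and types, not the actual implementation) could look like this:

#include <cmath>
#include <cstdint>
#include <vector>

// 1) Reference background: average of the first b grabbed rows (b = 4 here).
std::vector<float> average_reference(const std::vector<std::vector<std::uint8_t>>& rows)
{
    std::vector<float> ref(rows[0].size(), 0.0f);
    for (const std::vector<std::uint8_t>& row : rows)
        for (std::size_t x = 0; x < row.size(); ++x)
            ref[x] += row[x];
    for (float& v : ref)
        v /= rows.size();
    return ref;
}

// 2) Horizontal smoothing: convolution of the scanline with [1 1 1] / 3.
std::vector<float> smooth_row(const std::vector<std::uint8_t>& row)
{
    std::vector<float> out(row.size());
    for (std::size_t x = 0; x < row.size(); ++x) {
        std::size_t l = (x == 0) ? x : x - 1;
        std::size_t r = (x + 1 == row.size()) ? x : x + 1;
        out[x] = (row[l] + row[x] + row[r]) / 3.0f;
    }
    return out;
}

// 3) RMSE of a smoothed row against the reference; above a threshold the
//    system stops tracking and grabs a new reference background.
float row_rmse(const std::vector<float>& row, const std::vector<float>& ref)
{
    double acc = 0.0;
    for (std::size_t x = 0; x < row.size(); ++x) {
        double d = row[x] - ref[x];
        acc += d * d;
    }
    return static_cast<float>(std::sqrt(acc / row.size()));
}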

2.4 Fast pointer detection


Although some noise has been reduced, we cannot rely only on a binary differencing algorithm. A set of pixels different from the reference frame is meaningful if they are close to each other; we apply this spatial contiguity principle both horizontally and vertically. This approach imitates the so-called Helmholtz principle for human perception. The Helmholtz principle states that an observed geometric structure is perceptually meaningful if its number of occurrences would be very small in a random situation (see [MmM01]).


The first goal is to find a run of consecutive pixels significantly different from the reference; what we care about is the X coordinate of the center of such an interruption. We initialize to zero a buffer of the same size as one row, and then we start scanning the selected line (say l). For each pixel p = (px, pl), we compute the absolute difference dp from the corresponding reference value; then, for each pixel q = (qx, ql) in a neighborhood of length n, we add to the buffer this dp multiplied by a factor m inversely proportional to |px − qx|. Finally we read in the buffer a peak value corresponding to the X coordinate of the center of the interruption (fig. 2.2); if no interruption occurred in the row (i.e. pixels different from the reference were not close to each other), we will have only low peaks in the buffer. To distinguish between a high and a low peak we can use a fixed or a relative threshold; in our tests, a safe threshold was about 20 times the neighborhood length.

Figure 2.2: The buffer used for the analysis of the green row shows a clear peak.
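A sketch of this scan, again with hypothetical names and a single-channel difference, might look like the following; the 20 x n threshold follows the rule of thumb quoted above:

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sketch of the interruption search (not the project's code). Each pixel
// spreads its absolute difference from the reference over a neighborhood of
// length n, with a weight that shrinks with distance; a clear peak in the
// buffer marks the centre of an interruption.
int find_interruption(const std::vector<std::uint8_t>& line,
                      const std::vector<std::uint8_t>& reference,
                      int n)
{
    const int width = static_cast<int>(line.size());
    std::vector<double> buffer(width, 0.0);

    for (int px = 0; px < width; ++px) {
        double dp = std::abs(static_cast<int>(line[px]) - static_cast<int>(reference[px]));
        int first = std::max(0, px - n);
        int last  = std::min(width - 1, px + n);
        for (int qx = first; qx <= last; ++qx) {
            double m = 1.0 / (1.0 + std::abs(px - qx));   // ~ 1 / |px - qx|
            buffer[qx] += dp * m;
        }
    }

    const double threshold = 20.0 * n;    // rule of thumb from the text
    int peak_x = -1;
    double peak_value = threshold;
    for (int x = 0; x < width; ++x)
        if (buffer[x] > peak_value) { peak_value = buffer[x]; peak_x = x; }

    return peak_x;   // X of the interruption centre, or -1 if none was found
}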


Now we have a horizontal proximity check, but not a vertical one yet. Each webcam sees the pointer always breaking into the view frustum from the upper side. The pointer silhouette may be straight (like a stick) or curved (e.g. a finger); in both cases, the interruptions found on scanlines close to each other should not differ by more than a given threshold. This vertical proximity constraint gives a linear upper bound to the curvature of the pointer, and helps discard interruptions caused by noise or by other objects entering the view frustum; in other words, the system detects only pointers coming from above, and keeps working correctly if other objects appear in the view frustum from a different direction (e.g. the black pen in fig. 2.3).

Figure 2.3: The system correctly detects only the pointer coming from above.

These two simple proximity checks make the recognition of the pointer an easier task. Fig. 2.4 shows the correct detection of the pointer (a hand holding a pen) over a challenging background. The lower end of the vertical sequence of interruptions is marked with a little red cross.


Figure 2.4: The vertical contiguity constraint applied to a hand holding a pen.
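The vertical constraint itself reduces to comparing the interruption centers found on consecutive scanlines; a sketch (hypothetical names and threshold) is:

#include <cstddef>
#include <cstdlib>
#include <vector>

// Sketch of the vertical proximity constraint (not the project's code).
// xs[i] is the X coordinate of the interruption found on scanline i,
// ordered from top to bottom, or -1 if that scanline was not broken.
bool vertically_contiguous(const std::vector<int>& xs, int max_shift)
{
    for (std::size_t i = 1; i < xs.size(); ++i) {
        if (xs[i] < 0 || xs[i - 1] < 0)
            return false;                          // a scanline is missing
        if (std::abs(xs[i] - xs[i - 1]) > max_shift)
            return false;                          // curvature bound exceeded
    }
    return !xs.empty() && xs[0] >= 0;              // interruptions line up
}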

2.5 Positioning the cameras


The proposed technique requires positioning the two webcams relative to the pointing surface. The simplest choice is to put them so that one detects only movements along the X axis, while the other one detects Y axis changes. This solution is the simplest to implement, but it requires the webcams to have their optical axes perfectly aligned along the sides of the pointing surface. Moreover, the wider the view field of a webcam, the more we lose

accuracy on the opposite side of the surface. On the other hand, the narrower the view field of the webcams, the farther we have to put them to capture the entire surface.
Figure 2.5: Example of a simple but inefficient configuration.


In fig. 2.5, for example, the webcam along the Y axis of the surface has a wide view field, but this brings resolution loss on segment DC; on the other side, the webcam along the X axis of the surface has a narrow view field, but it has to be positioned far from the pointing surface to cover the whole area. If the surface is a 2 x 1.5 m projected wall and the webcam has a 45° view field, we have to put the camera ~5.2 meters away to catch the whole horizontal size. A really usable system should not bother the final user with webcam calibration, view angles and so on. A way to minimize the calibration effort is to position the webcams near two non-opposite corners of the pointing surface, far enough to catch the whole surface and oriented so that the surface diagonals are approximately the bisectors of the respective view fields (fig. 2.6). With this configuration there is no need to put the webcams far away from the surface; this reduces the accuracy loss on the far sides.

Figure 2.6: Suggested configuration to optimize the use of the view frustum of the cameras.

In the rest of this project we will assume, for the sake of clarity, that the webcams are in the same locations and orientations as in fig. 2.6. However, the proposed tracking algorithm works with a variety of configurations without changes in the calibration phase: the cameras may be positioned anywhere around the surface; we only require that they do not face each other.

2.6 Calibration phase


When the system is loaded, the calibration phase starts. In this phase, after grabbing the reference backgrounds, we ask the user to touch the vertices of the pointing surface and its center. When a pointer is detected in both views, we track the position of its lower end (the red cross in figs. 2.4 and 2.3); if this position holds with a low variance for a couple of seconds, the corresponding X coordinate is stored. After we have grabbed the position of all five points, we compute the Y coordinate of a special scanline as the lowest row not intercepting the pointing surface: during the tracking we will focus only on this row to grab the position of the pointer, so that the overall complexity will be linear in the horizontal resolution.
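The "held with low variance for a couple of seconds" test can be sketched as follows (hypothetical names and thresholds; the actual tuning constants live in constants.h):

#include <cstddef>
#include <deque>
#include <numeric>

// Sketch of the steadiness test used to lock a corner during calibration
// (not the project's code).
bool corner_locked(std::deque<double>& recent_x, double new_x,
                   std::size_t needed_samples, double max_variance)
{
    recent_x.push_back(new_x);
    if (recent_x.size() > needed_samples)
        recent_x.pop_front();
    if (recent_x.size() < needed_samples)
        return false;                       // not observed long enough yet

    double mean = std::accumulate(recent_x.begin(), recent_x.end(), 0.0)
                  / recent_x.size();
    double var = 0.0;
    for (double x : recent_x)
        var += (x - mean) * (x - mean);
    var /= recent_x.size();

    return var <= max_variance;             // steady: store this corner's X
}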

2.7 Tracking algorithm


During the calibration phase we stored the X coordinate of each vertex as seen by the webcams. The basic idea is to calculate the perspective transformation that translates the absolute screen coordinates to absolute coordinates in the viewed image. We store vertices in homogeneous coordinates and use a 3x3 transformation matrix M:
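(In the standard homogeneous-coordinates form that the element names below assume, a reconstruction consistent with the surrounding text, the mapping reads:)

\[
P = z \begin{pmatrix} x_p \\ y_p \\ 1 \end{pmatrix}
  = M \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
M = \begin{pmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \\ l_{31} & l_{32} & l_{33} \end{pmatrix},
\qquad
x_p = \frac{l_{11}\,x + l_{12}\,y + l_{13}}{l_{31}\,x + l_{32}\,y + l_{33}},
\]

where (x, y) are screen coordinates, (x_p, y_p) the corresponding coordinates in the viewed image, and P the projected point in homogeneous coordinates with scale factor z.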

Since P is determined up to a proportionality factor, there is no loss of generality in setting one of the elements of M to an arbitrary non-zero value. In the following we set the element l33 = 1. To obtain all the other elements of M, in principle the correspondence between four pairs of points must be given. The proposed application only needs to look at horizontal scanlines; for this reason there is no need to know the coefficients l21, l22, l23 of M, and we only have to determine the values of l11, l12, l13, l31, l32. The number of unknown matrix elements has been decreased to five, so we only need the x coordinate of five points (instead of the x and y of four points).


During the calibration phase, we ask the user to touch the four vertices of the pointing surface and its center. This setup greatly simplifies the computation of the unknown coefficients. Indeed, points A, B, C, D and the center E (see fig. 2.6) have, respectively, the screen coordinates:
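(Assuming A is the upper-left corner at the display origin, B the upper-right, C the lower-right and D the lower-left; the exact labeling of fig. 2.6 is an assumption here.)

\[
A = (0, 0), \quad B = (W, 0), \quad C = (W, H), \quad D = (0, H), \quad E = \left(\tfrac{W}{2}, \tfrac{H}{2}\right),
\]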

when the display resolution is W x H. If Q is a point on the surface, let Qxp be the x coordinate of the corresponding projected point. The final linear system to solve is:
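(Reconstructed by substituting the five points above into the projection formula above, with l33 = 1, and writing Axp, ..., Exp for the projected x coordinates measured during calibration:)

\[
\begin{aligned}
l_{13} &= A_{xp},\\
W\,l_{11} + l_{13} - W B_{xp}\,l_{31} &= B_{xp},\\
W\,l_{11} + H\,l_{12} + l_{13} - W C_{xp}\,l_{31} - H C_{xp}\,l_{32} &= C_{xp},\\
H\,l_{12} + l_{13} - H D_{xp}\,l_{32} &= D_{xp},\\
\tfrac{W}{2}\,l_{11} + \tfrac{H}{2}\,l_{12} + l_{13} - \tfrac{W}{2} E_{xp}\,l_{31} - \tfrac{H}{2} E_{xp}\,l_{32} &= E_{xp},
\end{aligned}
\]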

which makes it easy to obtain l11, l12, l13, l31 and l32 for each camera. During the tracking phase we face a somewhat inverse problem: we know the projected x coordinate in each view, and from these values (let them be Xl and Xr) we would like to compute the x and y coordinates of the corresponding unprojected point (that is, the point the user is touching). Let lij be the transformation values for the first camera, and rij for the second one; the linear system we have to solve in this case is
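(In one consistent formulation, with zl and zr denoting the homogeneous scale factors of the two projections; this notation is an assumption here:)

\[
\begin{aligned}
z_l X_l &= l_{11}\,x + l_{12}\,y + l_{13},\\
z_l &= l_{31}\,x + l_{32}\,y + 1,\\
z_r X_r &= r_{11}\,x + r_{12}\,y + r_{13},\\
z_r &= r_{31}\,x + r_{32}\,y + 1.
\end{aligned}
\]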


It is convenient to divide the first two equations by zl and the latter two by zr, and rename the unknown variables as follows

So that the final system is

This is a determined linear system, and it is possible to prove that in the setting above there is always one and only one solution. By solving this system in x and y we find the absolute coordinates of the point that the user is pointing at or touching on the surface. We can solve this system very quickly by computing an LU factorization of the coefficient matrix once and using it to compute x and y for each pair of frames; we could also use numerical methods, such as Singular Value Decomposition, or direct formulas. In the previous version of the system direct formulas were used, while now an LU factorization is implemented.
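Since the flow chart in section 3.1 hands the calibration values to GSL for these calculations, the sketch below shows how the touched point could be recovered with GSL's LU routines from an algebraically equivalent 2x2 rearrangement of the projection equations. It only illustrates the GSL calls; the function name, interface and the reduced form are not necessarily the ones used in the project's code:

#include <gsl/gsl_linalg.h>

/* Sketch (not the project's code): given the calibration coefficients lij, rij
 * and the projected x coordinates Xl, Xr seen by the two webcams, solve
 *   (l11 - Xl*l31)*x + (l12 - Xl*l32)*y = Xl - l13
 *   (r11 - Xr*r31)*x + (r12 - Xr*r32)*y = Xr - r13
 * for the touched point (x, y) using an LU factorization. */
int unproject(const double l[5], const double r[5], /* l11,l12,l13,l31,l32 */
              double Xl, double Xr, double *x, double *y)
{
    double a_data[4] = {
        l[0] - Xl * l[3], l[1] - Xl * l[4],
        r[0] - Xr * r[3], r[1] - Xr * r[4]
    };
    double b_data[2] = { Xl - l[2], Xr - r[2] };

    gsl_matrix_view A = gsl_matrix_view_array(a_data, 2, 2);
    gsl_vector_view b = gsl_vector_view_array(b_data, 2);
    gsl_vector *sol = gsl_vector_alloc(2);
    gsl_permutation *p = gsl_permutation_alloc(2);
    int signum;

    gsl_linalg_LU_decomp(&A.matrix, p, &signum);   /* factor the 2x2 matrix */
    gsl_linalg_LU_solve(&A.matrix, p, &b.vector, sol);

    *x = gsl_vector_get(sol, 0);
    *y = gsl_vector_get(sol, 1);

    gsl_permutation_free(p);
    gsl_vector_free(sol);
    return 0;
}

On Ubuntu this builds against the GSL development package and links with -lgsl -lgslcblas -lm.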

2.8 Resolution accuracy


Let's now consider how accurate the tracking system is, depending on the physical characteristics of the display and the webcams.


Let t = (xt, yt) be a point on the pointing surface, XD x YD the display resolution (i.e. the resolution of the projector for a projected wall) and XW1 x YW1 the resolution of a webcam W1; consider the bisector of the view frustum of W1, and let the upper left corner of the surface be the origin of our coordinate system (with Y pointing downwards, as in fig. 2.7). We assume for simplicity that the view frustum of the camera is centered on the bisector of the coordinate system, but the following considerations keep their validity also in slightly different configurations. The higher the number of pixels detected by the webcam for each real pixel of the display, the more accurate the tracking will be. Thus, if we want to know how accurate the detection of a point on the pointing surface is, we can consider the ratio between the number of pixels detected by the webcam W1 and the length in pixels of the segment Xt passing through t and perpendicular to the bisector of W1. We call this ratio the resolution accuracy of W1 in t and denote it ρ(W1, t). It is clear that we only care about the horizontal resolution of W1, which is constant in the whole view frustum of the camera (fig. 2.7).

Figure 2.7: We define the resolution accuracy of W1 in t as the ratio between the number of pixels detected by W1 and the length of Xt.


Because pixels are approximately square, the number of pixels along the diagonal of a square is equal to the number of pixels along an edge of the square; thus, the length of Xt will be equal to the distance from the origin of one of the two points where Xt intercepts the X and Y axes. For every point p on Xt we have xp + yp = k (a constant); the length of Xt will therefore be equal to the y-intercept of the line passing through t and perpendicular to the bisector. So we have |Xt| = xt + yt; hence, the resolution accuracy of W1 in t is ρ(W1, t) = XW1 / (xt + yt).

One of the most interesting applications of the system is to projected walls, so that they become virtual blackboards. A very common projector resolution is nowadays 1024 x 768 pixels, while one of the maximum resolutions that recent low-cost webcams support is 1280 x 1024 pixels at 15 frames per second. In this configuration, the resolution accuracy in t = (1024, 768) is ρ(W1, t) = 1280 / (1024 + 768) ≈ 0.71. This is the lowest resolution accuracy we have with W1 in the worst orientation; if we invert the X axis to get the accuracy for W2 (supposing that W2 is placed on the upper right corner of the surface), ρ(W2, t) ≈ 1.7. In the central point u = (512, 384) of the display we have ρ(W1, u) = ρ(W2, u) ≈ 1.4; it follows immediately that, in the above configuration, the average resolution accuracy is higher than 1:1 (sub-pixel).


2.9 Algorithm complexity


The number of scanlines is constant and in the tracking phase it is not useful to use more than 3 or 4 of them. For each scanline we do a noise reduction (in linear time), we apply a linear convolution filter (in linear time too) and then we do a linear search for a peak. Finally, we solve the system (in constant time). The total complexity is therefore linear with the horizontal resolution of the webcams.

2.10 Settings and system performance


The webcams we used for testing are two A4Tech PK 720G units, with the following specifications:
- Image sensor: 1/6" CMOS, 640x480 pixels
- Lens: F = 2.4, f = 3.5 mm
- View angle: 54 degrees
- Exposure control: automatic
- White balance: automatic
- Computer interface: USB 2.0
- Focus range: automatic focus, 10 cm to infinity
- Frame rates: 30 fps at 160x120, 320x240 and 640x480
Figure 2.8: A4Tech webcam, PK 720G model.

Their 2012 price was about 110 EGP each. There is a mature Video4Linux2-compliant driver (uvcvideo) available for GNU/Linux. Our prototype has good resolution accuracy and excellent time performance: less than 10 milliseconds are needed to process a new frame and compute the pointer coordinates. Two USB webcams connected to the same computer can usually send less than 20 frames per second simultaneously, while the software layer could process hundreds more. The tracking system is written in C++ in a GNU/Linux environment; in the relatively small source code, all software layers are strictly separated, so that it is possible to port the whole system to different platforms with very few changes in the source.


Chapter 3:

Notes on the code

The code consists of separate files. Most of them are standard header files or contain many standard functions. Most of our coding effort went into the files constants.h, main.cpp and the makefile.

3.1 Overall flow of the code


1. Start: detect the screen size, then initialize the webcams and the mouse handler.
2. Grab 4 frames per webcam and average them to set a reference image for each webcam.
3. Ask the user to touch the 4 corners of the screen and its center.
4. For each corner, compare the live frames of each webcam with its reference image, checking that the RMSE stays below 8.0; the touchline is redefined after each corner is located. If the detection attempts for any corner exceed 100, exit.
5. Calibration completed: send the values to GSL for the calculations.
6. Tracking: whenever interruptions are detected inside the tracking area, interruptions in the pointing scanlines move the mouse; if there are also interruptions in the touchline below the pointing interruptions, the mouse is clicked.


3.2 Notes on main.cpp


3.2.1 Defining the two webcams we need
The following lines are responsible for defining which webcams to use:
const char *videodevice1 = "/dev/video1";
const char *videodevice2 = "/dev/video2";

If the host computer doesn't have any other webcam (no built-in webcam), these lines should instead be:
const char *videodevice1 = "/dev/video0";
const char *videodevice2 = "/dev/video1";

In general, we used an application called "Cheese" to test the webcams and determine their IDs. After installing "Cheese" using the "Ubuntu Software Center", go to Edit > Preferences and you will see a list of all connected webcams and their IDs.

3.2.2 Smoothing neighborhood in averaging


It can be defined in the file constants.h, but for now it is defined in the file main.cpp. It determines how many pixels before and after each pixel are blurred horizontally.
unsigned int SMOOTHING_NEIGHBORHOOD = 2;

It shouldn't be too high, in order to keep the reference image realistic.

3.2.3 The color model to be used


Two color models are available in the code: YUV and RGB. The selection is made using the following line:
bool RGB_INSTEAD_OF_YUV = false;

false for YUV; true for RGB.


3.2.4 Debugging
There are two debugging modes. debug_one debugs only the first webcam: we see a live stream from the first webcam with a single horizontal line across the image marking the scanline, and a histogram below the live stream showing interruptions as in fig. 2.2. The other mode is an overall debug mode. Either mode is activated using the following lines:
debug = false;
debug_one = false;

If debug_one is activated (set to "true"), it will prevent the rest of the code from running.

3.2.5 Luminance
The value of the following variable should be set depending on the luminance of the surroundings.
norm_luminance = false;

3.2.6 Control the code while running


Some options can be altered while the code is running, as follows:
- q: quit.
- s: edit the smoothing neighborhood.
- l: select the line to scan.
- h: choose the histogram mode (l for live, p for peak, s for static, d for differential).
- m: choose the color model (y for YUV, r for RGB).
- u: update the reference images.


3.3 Notes on constants.h


3.3.1 Threshold of colors difference in each pixel
In general, and for the YUYV model, the thresholds can be controlled using the following lines:
const unsigned char COLOR_THRESHOLD = 20;
const unsigned char Y_THRESHOLD = 20;

For the RGB model, the threshold is applied separately to each channel R, G and B:


const unsigned char R_THRESHOLD = 35;
const unsigned char G_THRESHOLD = 38;
const unsigned char B_THRESHOLD = 35;

3.3.2 Consecutive pixels threshold


How many consecutive pixels must differ (or not differ) from the reference to mark the start (or end) of an interruption?
const unsigned int LENGTH_THRESHOLD = 16;
const unsigned int HOLE_THRESHOLD = 3;

3.3.3 Calibration touch offset


The offset between the lowest breakpoint detected in the image and the height of the scanline chosen for the interruption.
const unsigned int CALIBRATION_TOUCH_OFFEST = 8; //would edit it to make it 2

3.3.4 Consecutive detections to locate a corner


How many consecutive breaks are necessary to claim to have located the corner?
const unsigned int ALT_CALIBRATION_CONSECUTIVE_INTERRUPTIONS = 6; // make it 15


3.3.5 Limit of attempts to locate a corner


Maximum number of attempts for each corner detection.
const unsigned int CALIBRATION_CORNER_ATTEMPTS = 100;

3.3.6 Calibration scanlines distances


Distance between scanlines. The height of the touching line is established in the calibration, the others are calculated using this value.
const unsigned int CALIBRATION_SCALINES_DISTANCE = 20;

3.3.7 Picture format, resolution, fps and grab method


In the following lines, you should enter only a resolution, fps, format and grab method supported by the webcams. Check the appendix for more details on how to obtain this information for any webcam.
const unsigned int width = 320;
const unsigned int height = 240;
const unsigned int fps = 30;
const int grabmethod = 1; // Use mmap (default)
// const int grabmethod = 0; // Ask for read instead of the default mmap

const int format = V4L2_PIX_FMT_YUYV; // Better quality, lower framerate
// const int format = V4L2_PIX_FMT_MJPEG; // Lower quality, higher frame rate

Note that entering an unsupported option will lead to error 22, and entering a higher resolution without lowering the fps or using the MJPEG format will lead to error 28, which is due to the USB 2.0 bandwidth limitation. More details about error 22 and error 28 can be found in section 4.2.
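For reference, the sketch below shows how settings like those above are typically requested from a UVC webcam through the V4L2 API. It is not the project's code, but it illustrates where error 22 (EINVAL) comes from when an unsupported combination is requested:

#include <fcntl.h>
#include <linux/videodev2.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/* Sketch (not the project's code): request 320x240 YUYV at 30 fps from a
 * V4L2 device. An unsupported combination typically makes the driver adjust
 * the values or fail with EINVAL (errno 22); running out of USB bandwidth
 * when a second camera starts streaming shows up later as ENOSPC (errno 28). */
int configure_camera(const char *device)
{
    int fd = open(device, O_RDWR);
    if (fd < 0) { perror("open"); return -1; }

    struct v4l2_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 320;
    fmt.fmt.pix.height = 240;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;   /* raw, higher bandwidth */
    fmt.fmt.pix.field = V4L2_FIELD_ANY;
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) {
        perror("VIDIOC_S_FMT");                    /* e.g. errno 22 (EINVAL) */
        return -1;
    }

    struct v4l2_streamparm parm;
    memset(&parm, 0, sizeof(parm));
    parm.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    parm.parm.capture.timeperframe.numerator = 1;  /* 30 fps = 1/30 s/frame */
    parm.parm.capture.timeperframe.denominator = 30;
    if (ioctl(fd, VIDIOC_S_PARM, &parm) < 0)
        perror("VIDIOC_S_PARM");

    return fd;
}

guvcview (see the appendix) can be used to check which combinations a given webcam actually accepts.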


3.4 Compiling the code after any edits


To compile the code on Ubuntu, press "Alt+Ctrl+T" to open the terminal. If the code is in the folder "YosrON" on the Desktop, then type: cd Desktop/YosrON. Note that all commands in the terminal are case-sensitive, even the folder names.

To remove older compilation files, type: make clean
To make new compilation files, type: make

To run the code, type (for example): ./yosron


Chapter 4:

Challenges

4.1 The environment


We spent a very long time searching for the best software environment, starting from the programming language and toolboxes/libraries to use and ending with the OS.

4.1.1 OpenCV on Windows


We started with the OpenCV toolbox and Visual Studio C++ on Windows 7, 64-bit. We faced many problems at first due to incompatibilities between the latest version of OpenCV and Windows 7. After lots of online searching, we were advised to use an older version of OpenCV. We used version 2.2 and we were able to interface with the webcams. When we started to work on the code, we needed to process only a single horizontal line of pixels instead of processing the entire image, which is an essential function for our project, as we wanted the software to be as fast and light as possible. After consulting engineers with OpenCV experience, we were told that OpenCV can't do this and must process the entire image. So we had to look for other alternatives, which led us to C/C++ programming on Linux/Ubuntu.

4.1.2 C/C++ programming on Linux/Ubuntu


We had to change our track from Windows to Linux, even though our time was very limited. We were encouraged to do so after we communicated with Eng. E. Rustico, the designer of EverScreen, who supported us with very useful documentation, code and instructions that helped us achieve our main target. The OS used is Ubuntu 11.10, 64-bit, with kernel version 3.0.0-22 and gcc/g++ version 4.4.6 (gcc/g++ is the C/C++ compiler on Linux). Installing Ubuntu is a little bit tricky, as there are many options. We first tried to install it using Wubi (Windows Ubuntu Installer) but we had many problems. After many attempts to fix those problems, we assumed that they would disappear if we tried the

installation all over again using another method. We had to remove the installation and install it again from a boot CD alongside Windows 7. Details about this process are available in the appendix.

4.1.3 Libraries that must be installed for the code


Build-essential: an informational list of packages needed for C/C++ programming on Linux; it generally includes gcc/g++ and other utilities and libraries.

Libc dev: It provides headers from the Linux kernel. These headers are used by the installed headers for GNU glibc and other system libraries.

SDL dev (libsdl1.2-dev): Simple DirectMedia Layer is a cross-platform multimedia library designed to provide low-level access to audio, keyboard, mouse, joystick, 3D hardware via OpenGL, and the 2D video framebuffer. It is used by MPEG playback software, emulators, and many popular games, including the award-winning Linux port of "Civilization: Call To Power." SDL supports Linux, Windows, Windows CE, BeOS, MacOS, Mac OS X, FreeBSD, NetBSD, OpenBSD, BSD/OS, Solaris, IRIX, and QNX. The code contains support for AmigaOS, Dreamcast, Atari, AIX, OSF/Tru64, RISC OS, SymbianOS, and OS/2, but these are not officially supported. SDL is written in C, but works with C++ natively, and has bindings to several other languages, including Ada, C#, D, Eiffel, Erlang, Euphoria, Go, Guile, Haskell, Java, Lisp, Lua, ML, Objective C, Pascal, Perl, PHP, Pike, Pliant, Python, Ruby, Smalltalk, and Tcl.

GSL dev (libgsl0-dev): The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total, with an extensive test suite.

Xorg XTest (libxtst-dev): The X window system (commonly X Window System or X11, based on its current major version being 11) is a computer software system and network protocol that provides a basis for graphical user interfaces (GUIs) and rich input device capability for networked computers. It creates a hardware abstraction layer where software is written to use a generalized set of commands, allowing for device independence and reuse of programs on any computer that implements X.

V4L2 dev (libv4l-dev): Video4Linux or V4L is a video capture application

programming interface for Linux. Many USB webcams, TV tuners, and other devices are supported. Video4Linux is closely integrated with the Linux kernel. V4L2 is the second version of V4L. The original V4L was introduced late in the 2.1.x development cycle of the Linux kernel. Video4Linux2 fixed some design bugs and started appearing in the 2.5.x kernels. Video4Linux2 drivers include a compatibility mode for Video4Linux1 applications, though in practice the support can be incomplete and it is recommended to use V4L2 devices in V4L2 mode.


It is an API that provides unified access to various video capture devices, such as TV tuners, USB web cameras, etc.

UVC drivers: The USB video device class (also USB video class or UVC) is a USB device class that describes devices capable of streaming video like webcams, digital camcorders, transcoders, analog video converters, television tuners, and still-image cameras.

The latest revision of the USB video class specification carries the version number 1.1 and was defined by the USB Implementers Forum in a set of documents describing both the basic protocol and the different payload formats. Webcams were among the first devices to support the UVC standard and they are currently the most popular UVC devices. It can be expected that in the near future most webcams will be UVC compatible, as this is a logo requirement for Windows, and since Linux 2.6.26 the driver is included in the kernel source distribution.

luvcview: luvcview is a camera viewer for UVC based webcams. It includes an mjpeg decoder and is able to save the video stream as an AVI file.

guvcview: provides a simple GTK interface for capturing and viewing video from devices supported by the Linux UVC driver, although it should also work with any V4L2-compatible device. The project is based on luvcview for video rendering, but all controls are built using a GTK2 interface. It can also be used as a control window only.


4.2 The cameras


The cameras were very hard to find in the Egyptian market because the detailed technical specifications we needed were rarely available before buying. The cameras must be UVC compliant and must support different control options for resolution, frames per second, color profiles, etc. We also needed them to be mechanically solid and stiff, capable of being fixed on any surface, with lenses that can be pointed in any direction.

First, we bought two 2B webcams and they worked nicely with OpenCV. But when we migrated to Ubuntu, we hit a major problem in the first phase of the project (pointer/finger detection while streaming from one webcam only): the cameras worked well with guvcview but produced an error (error 22) with our code. We checked their driver to make sure they are UVC compliant, as we can't use the Windows driver provided on the CD. (Checking UVC compliance for webcams is covered in the appendix.) Error 22 was produced because the code was configured for the MJPEG picture format, a compressed form of the stream, while the cameras only support the YUYV format, which is the uncompressed/raw form. MJPEG had been chosen in the beginning because it needs little USB bandwidth, so that we can use 4 or more webcams on the same USB 2.0 bus, while YUYV consumes much more bandwidth for slightly better quality. Unfortunately, most or all webcams in the Egyptian market do not support MJPEG, and we were told that ones that do would be much more expensive. (Checking the formats supported by a webcam is covered in the appendix.)

When we moved to the second phase (streaming from two webcams for calibration and calculating the pointer/finger position to move the mouse), we faced other errors (28 and 16). After searching online, we found that error 28 is due to the USB bandwidth limitation and error 16 means the device has hung. The USB 2.0 bus supports a total bandwidth of 480 Mbps, and the bandwidth a webcam requires depends on its configuration. For a resolution of 640 x 480, 30 frames per second and 32-bit color, the required bandwidth = 640 x 480 x 30 x 32 = 294,912,000 bits/second = 294.912 Mbps.


So the total required bandwidth for two webcams = 2 x 294.912 = 589.824 Mbps, which is higher than the 480 Mbps total bandwidth of USB 2.0. Overcoming this problem was supposed to be easy, by configuring the webcams for fewer frames (15 fps) or a lower resolution (320 x 240), but that didn't work. After spending more than a week investigating the problem and trying all the suggested solutions, we suspected that the 2B webcams support only one bandwidth setting regardless of the configuration, which means each webcam reserves a fixed USB bandwidth, much more than it really needs, no matter how it is configured.

Error 16 is closely related to error 28: it means the device has hung and can't be accessed. When one webcam starts streaming, it reserves its bandwidth. When the other webcam on the same bus starts, it requests the bandwidth it needs, which is no longer available because of the first webcam. Both webcams then hang and stop responding, while the system keeps their ports (e.g. /dev/video1) reserved, forcing us to unplug and replug them.

Our final solution for these errors was to buy another two webcams that support either the MJPEG format or a bandwidth reservation that varies with the configuration. We didn't find webcams in the Egyptian market that support MJPEG, but we found A4Tech webcams whose bandwidth reservation depends on the configuration. The A4Tech webcams don't support MJPEG and only support 30 frames per second, so we had to work with a 320 x 240 resolution, which is acceptable for our needs.
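To make the error-22 and bandwidth story concrete, here is an illustrative sketch (not the project code; the device path and the 30 fps / 32 bits-per-pixel figures follow the example above). It requests 320 x 240 MJPEG with VIDIOC_S_FMT; the driver is allowed to change the fields to what it actually supports, so inspecting the returned pixel format is how code can detect a silent fallback to YUYV. It then prints the uncompressed bandwidth implied by the negotiated resolution:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main(void)
{
    int fd;
    struct v4l2_format fmt;
    double fps = 30.0, bits_per_pixel = 32.0, mbps;

    fd = open("/dev/video0", O_RDWR);       /* device path is an assumption */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 320;
    fmt.fmt.pix.height = 240;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_MJPEG;   /* request compressed frames */
    fmt.fmt.pix.field = V4L2_FIELD_NONE;

    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) {
        perror("VIDIOC_S_FMT");                     /* EINVAL here is "error 22" */
        close(fd);
        return 1;
    }

    /* The driver may have replaced MJPEG with a format it supports (e.g. YUYV). */
    if (fmt.fmt.pix.pixelformat != V4L2_PIX_FMT_MJPEG)
        printf("driver fell back to another format (fourcc 0x%08x)\n",
               fmt.fmt.pix.pixelformat);

    /* Uncompressed bandwidth estimate: width x height x fps x bits per pixel,
       the same arithmetic used in the report's example above. */
    mbps = fmt.fmt.pix.width * fmt.fmt.pix.height * fps * bits_per_pixel / 1e6;
    printf("%ux%u at %.0f fps needs about %.1f Mbps uncompressed\n",
           fmt.fmt.pix.width, fmt.fmt.pix.height, fps, mbps);

    close(fd);
    return 0;
}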

4.3 The fisheye lenses


We need the view angle of each webcam to be more than 90 degrees so that we can place them very near to the screen without leaving any blind areas. Most webcams have a view angle of less than 60 degrees, so we need to install a fisheye lens on each webcam.


We needed a full-frame fisheye lens that produces images as in fig. 4.1.

Figure 4.1: Full-frame fisheye image.

Then we remap it into a rectilinear perspective (defisheye) with any of the available tools, such as Panorama Tools, as in fig. 4.2.

Figure 4.2: Full-frame fisheye image remapped into rectilinear perspective.
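For completeness, the remapping amounts to a change of radial projection. Assuming an ideal equidistant fisheye (image radius r = f * theta) and a rectilinear pinhole target (r = f * tan(theta)), which is only an approximation of a real lens such as a door viewer, a sketch of the per-pixel mapping from the rectilinear output back to the fisheye source is:

#include <math.h>

/* Map a rectilinear output pixel (x, y), measured from the image centre,
   to the source coordinates (*sx, *sy) in the fisheye image.
   'f' is the focal length in pixels; the equidistant model is an assumption. */
void defisheye_map(double x, double y, double f, double *sx, double *sy)
{
    double r_rect, theta, r_fish, scale;

    r_rect = sqrt(x * x + y * y);          /* radius in the output image   */
    if (r_rect == 0.0) { *sx = 0.0; *sy = 0.0; return; }

    theta = atan(r_rect / f);              /* angle from the optical axis  */
    r_fish = f * theta;                    /* equidistant fisheye radius   */
    scale = r_fish / r_rect;

    *sx = x * scale;                       /* same direction, new radius   */
    *sy = y * scale;
}

Each output pixel is then filled by sampling the fisheye image at (*sx, *sy), typically with bilinear interpolation; tools like Panorama Tools fit the real projection of the lens instead of assuming the ideal model.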

We searched in many places and asked many photographers and glass makers to help us find a single lens that serves as a full-frame fisheye and is small enough for our webcams, but all attempts failed.


We also couldn't find a circular fisheye lens that would produce an image as in fig. 4.3.

Figure 4.3: Circular fisheye image.

That can also be remapped into a normal image, as in fig. 4.4.

Figure 4.4: The circular fisheye image after remapping (defisheye).


Our final hope was to use the only fisheye we could find that is small enough for YosrON: the fisheye used in home doors, shown in fig. 4.5. We removed its metallic housing, which we don't need, and we also needed to make the lens smaller to fit in the plastic frame.
Figure 4.5: Fisheye for home doors.

After removing the housing of the webcams and fixing the fisheye lenses on them, we faced a problem that we couldn't overcome due to the lack of time and available support in Egypt. The fisheye lens produced internal reflections in the image (i.e. light sources were repeated in other parts of the image), increasing the noise to unacceptable levels. Another problem was the difficulty of finding two exactly identical fisheye lenses. We thought it would be simple if we bought them both from the same brand and the same shop, but believe it or not, they weren't identical. Although the mismatch could be compensated for in software, the internal reflections were the killing problem, and they made us postpone the fisheye addition and the plastic frame to future work.


Chapter 5:

Conclusions and Future Work

5.1 Conclusions
We presented a low-cost system for bare-finger tracking that can turn an LCD display into a touchscreen, a desk into a design board, or a wall into an interactive whiteboard. Many application domains can benefit from the proposed solution: designers, teachers, gamers, and interface developers. The proposed system requires only a simple calibration phase.

5.2 Future work


Future work will be devoted to improving the robustness of the calibration and pointer-detection subsystems; moreover, suitable evaluation procedures to test the empirical accuracy of the tracking will be addressed. Adding multitouch support will also be considered.

The system needs a GUI for installation, calibration and configuration, as all of them are currently done by editing the source code, which is of course not user friendly.

It would also be better if the processing load were not on the host computer. That can be done by using a standalone DSP unit for the image processing and position calculations, which will lead to changes in the cameras and the code. A standalone DSP processing unit would also make the system cross-platform, as all the processing would be done on that unit and it would only send signals to the OS over USB to move the mouse, perform clicks and even multitouch gestures. That would save us from writing drivers and code modifications for each OS, such as Windows, Linux and Mac OS.

Solving the problem of the fisheye lenses is still essential for YosrON to become a user-friendly product. After solving this problem we can easily seek to put the entire hardware inside a resizable plastic housing.

References
[Figure 1.1] Survey from the UniMasr.com website: http://unimasr.com/community/viewtopic.php?t=87470.
[Figure 1.2] Survey from the YosrON page on Facebook (http://fb.com/yosronx): http://fb.com/questions/242871132427684/.
[Figure 1.3] Image and price details from http://www.magictouch.com and local resellers, available at: http://www.magictouch.com/middleeast.html.
[Figure 2.8] A4Tech webcam, PK 720G model: http://a4tech.com/product.asp?cid=77&scid=167&id=693.
E. Rustico. Low cost finger tracking for a virtual blackboard. http://www.dmi.unict.it/~rustico/docs/Low%20cost%20finger%20tracking%20for%20a%20virtual%20blackboard.pdf.
[AA07] A. Agarwal, S. Izadi, M. Chandraker, and A. Blake. High precision multi-touch sensing on surfaces using overhead cameras. In Horizontal Interactive Human-Computer Systems (TABLETOP '07), Second Annual IEEE International Workshop on, pages 197-200, 2007.
[Ber03] F. Bérard. The magic table: computer-vision based augmentation of a whiteboard for creative meetings. IEEE International Conference on Computer Vision, 2003.
[CT06] Kelvin Cheng and Masahiro Takatsuka. Estimating virtual touchscreen for fingertip interaction with large displays. In OZCHI '06: Proceedings of the 20th conference of the computer-human interaction special interest group (CHISIG) of Australia on Computer-human interaction: design: activities, artefacts and environments, pages 397-400, New York, NY, USA, 2006. ACM.
[DUS01] Klaus Dorfmüller-Ulhaas and Dieter Schmalstieg. Finger tracking for interaction in augmented environments. Augmented Reality, International Symposium on, 0:55, 2001.
[FR08] G.M. Farinella and E. Rustico. Low cost finger tracking on flat surfaces. In Eurographics Italian Chapter 2008, 2008.
[GMR02] D. Gorodnichy, S. Malik, and G. Roth. Nouse "use your nose as a mouse": a new technology for hands-free games and interfaces, 2002.


[GOSC00] Christophe Le Gal, Ali Erdem Ozcan, Karl Schwerdt, and James L. Crowley. A sound magicboard. In ICMI '00: Proceedings of the Third International Conference on Advances in Multimodal Interfaces, pages 65-71, London, UK, 2000. Springer-Verlag.

[IVV01] Giancarlo Iannizzotto, Massimo Villari, and Lorenzo Vita. Hand tracking for human-computer interaction with gray level visual glove: turning back to the simple way. In PUI '01: Proceedings of the 2001 workshop on Perceptive user interfaces, pages 1-7, New York, NY, USA, 2001. ACM.

[Jen99] Cullen Jennings. Robust finger tracking with multiple cameras. In Proc. of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pages 152-160, 1999.

[LB04] Julien Letessier and François Bérard. Visual tracking of bare fingers for interactive surfaces. In UIST '04: Proceedings of the 17th annual ACM symposium on User interface software and technology, pages 119-122, New York, NY, USA, 2004. ACM.

[Lee07] Johnny Chung Lee. Head tracking for desktop VR displays using the Wii remote. http://www.cs.cmu.edu/~johnny/projects/wii, 2007.
[ML04] Shahzad Malik and Joe Laszlo. Visual touchpad: a two-handed gestural input device. In ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces, pages 289-296, New York, NY, USA, 2004. ACM.

[MmM01] Lionel Moisan and Jean-Michel Morel. Edge detection by Helmholtz principle. Journal of Mathematical Imaging and Vision, 14:271-284, 2001.

[Mor05] Gerald D. Morrison. A camera-based input device for large interactive displays. IEEE Computer Graphics and Applications, 25(4):52-57, 2005.

[Mos06] Tomer Moscovich. Multi-finger cursor techniques. In GI '06: Proceedings of the 2006 conference on Graphics Interface, pages 1-7, 2006.
[pHYssCIb98] Yi-Ping Hung, Yao-Strong Yang, Yong-Sheng Chen, and Ing-Bor Hsieh. Free-hand pointer by use of an active stereo vision system. In Proc. 14th Int. Conf. on Pattern Recognition, pages 1244-1246, 1998.


[QMZ95] F. Quek, T. Mysliwiec, and M. Zhao. FingerMouse: a freehand computer pointing interface, 1995.
[SP98] Joshua Strickon and Joseph Paradiso. Tracking hands above large interactive surfaces with a low-cost scanning laser range finder. In Proceedings of CHI '98, pages 231-232. ACM Press, 1998.

[ST05] Le Song and Masahiro Takatsuka. Real-time 3D finger pointing for an augmented desk. In AUIC '05: Proceedings of the Sixth Australasian conference on User interface, pages 99-108, Darlinghurst, Australia, 2005. Australian Computer Society, Inc.

[vHB01] Christian von Hardenberg and François Bérard. Bare-hand human-computer interaction. In PUI '01: Proceedings of the 2001 workshop on Perceptive user interfaces, pages 1-8, New York, NY, USA, 2001. ACM.

[WC05] Andrew D. Wilson and Edward Cutrell. FlowMouse: a computer vision-based pointing and gesture input device. In Interact '05, 2005.
[Wel93] Pierre Wellner. Interacting with paper on the DigitalDesk. Communications of the ACM, 36:87-96, 1993.
[Wil05] Andrew D. Wilson. PlayAnywhere: a compact interactive tabletop projection-vision system. In Patrick Baudisch, Mary Czerwinski, and Dan R. Olsen, editors, UIST, pages 83-92. ACM, 2005.

[WSL00] Andrew Wu, Mubarak Shah, and N. Da Vitoria Lobo. A virtual 3D blackboard: 3D finger tracking using a single camera. In Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 536-543, 2000.

[Zha03] Zhengyou Zhang. Vision-based interaction with fingers and papers. In Proc. International Symposium on the CREST Digital Archiving Project, pages 83-106, 2003.

Details about the guvcview package from: http://guvcview.sourceforge.net.
Details about the luvcview package from: http://packages.ubuntu.com/hardy/luvcview.
Details about the V4L2 library from: http://en.wikipedia.org/wiki/Video4Linux.
Details about the SDL library from: http://www.libsdl.org.
Details about the GSL library from: http://www.gnu.org/software/gsl.


Details about Xorg XTest from: http://en.wikipedia.org/wiki/X_Window_System.
Details about the build-essential package from: http://packages.ubuntu.com/lucid/build-essential.
Details about UVC drivers from: http://en.wikipedia.org/wiki/USB_video_device_class.
Details about the Libc dev package from: http://packages.debian.org/sid/linuxlibc-dev.
Details about fisheye lenses from: http://en.wikipedia.org/wiki/Fisheye_lens.
Details about defisheye scripts from: http://www.fmwconcepts.com/imagemagick/defisheye/index.php.
How to install Ubuntu 11.10 from a CD or USB flash memory. From: http://blog.sudobits.com/2011/09/11/how-to-install-ubuntu-11-10-from-usbdrive-or-cd/.

How to free space on your hard disk and make it unallocated using Windows Disk Management Tool. From: http://technet.microsoft.com/enus/magazine/gg309169.aspx.

How to disable automatic updates in Ubuntu. From: http://www.garron.me/linux/turn-off-stop-ubuntu-automatic-update.html.
How to install build-essential. From: https://help.ubuntu.com/community/CompilingEasyHowTo.
How to check UVC compliance of a webcam and troubleshoot it. From: http://www.ideasonboard.org/uvc/faq.


Chapter 0:

Appendix

0.1 Installing Ubuntu 11.10


The instructions given in this section assume that you want to install Ubuntu 11.10 as a dual boot with Windows 7 (or XP/Vista or whatever you already have installed). This is recommended for absolute beginners: if any problem occurs with Ubuntu, you can still access Windows. If you want something else, such as removing Windows, erasing the whole disk, or installing Ubuntu on a new computer, most of the steps are the same and the few differences are pointed out below.

Preparing for installation: first of all, back up your important data. This step is very important, especially for beginners, as some mistakes can lead to reformatting the entire hard disk and losing data. So, before starting the installation procedure you are strongly recommended to back up your data (using a backup disk or an online backup program). Although you shouldn't lose anything if you have multiple partitions on your drive and follow the custom installation procedure, you are supposed to have a backup of all your critical data before starting any experiments.

Step 1: Download the Ubuntu 11.10 ISO file. First, download the Ubuntu 11.10 ISO (http://releases.ubuntu.com/oneiric), selecting the archive file (ISO) that matches your computer architecture, such as Intel x86 or AMD64. If you are not sure, go for the first one. When the download is complete, move on to the next step.


Step 2: Create bootable media (USB/CD). You can create a bootable USB stick or a CD/DVD from the ISO file you've just downloaded. Creating a bootable CD/DVD is pretty easy: you just need to burn the ISO image to the disc. In Windows 7 you can burn ISO files directly in a few simple steps: insert a disc into the tray, right-click the ISO file, select the option to burn the ISO, and you will get a bootable CD. If you want to install Ubuntu from a USB flash memory (pendrive), use the free program Universal USB Installer: download it from "http://www.pendrivelinux.com/universalusb-installer-easy-as-1-2-3", run it, locate the ISO file, choose your USB drive as the target, and you will be done in a minute.

Step 3: Free enough space. Explore your partitions and make sure that one of them has at least 20 GB free. Then use the Windows 7 Disk Management tool, which provides a simple interface for managing partitions and volumes. Here's an easy way to shrink a volume: 1. Open the Disk Management console by pressing "Windows key + R" and running diskmgmt.msc (or typing it at an elevated command prompt). 2. In Disk Management, right-click the volume that you want to shrink, and then click "Shrink Volume".

Figure 0.1: Windows Disk Management Tool.


3. In the field provided in the Shrink dialog box, enter the amount of space by which to shrink the disk.

Figure 0.2: Shrink dialog box.

The Shrink dialog box provides the following information:

- Total Size Before Shrink In MB: lists the total capacity of the volume in MB. This is the formatted size of the volume.
- Size Of Available Shrink Space In MB: lists the maximum amount by which you can shrink the volume. This doesn't represent the total amount of free space on the volume; rather, it represents the amount of space that can be removed, not including any data reserved for the master file table, volume snapshots, page files, and temporary files.
- Enter The Amount Of Space To Shrink In MB: lists the total amount of space that will be removed from the volume. The initial value defaults to the maximum amount of space that can be removed. For optimal drive performance, you should ensure that the volume has at least 10 percent free space left after the shrink operation.
- Total Size After Shrink In MB: lists what the total capacity of the volume in MB will be after you shrink it. This is the new formatted size of the volume.


4. After clicking "Shrink", you should see the free space as a green partition.

Figure 0.3: Windows partitions after successfully freeing space.

That free unallocated space will be automatically used by the Ubuntu installer.

Step 4: Insert the USB drive (or CD) and restart. Now restart your computer and enter the BIOS to make sure that it is configured to boot first from CD or USB drives. The steps of this configuration are not the same for every computer, so you have to do it yourself; you can search online for your motherboard model or ask any technical support available to you. While the computer is booting, if you have set any BIOS passwords, enter your supervisor BIOS password, as the system may not boot from CD if you enter the user BIOS password. Your computer should boot automatically from the bootable media, and Ubuntu will be loaded into RAM. If the option appears, select "Try Ubuntu without installing" if you want to take a look before installing it on your hard drive. Then click on the "Install Ubuntu 11.10" icon on the desktop to begin, and select the language "English" to continue.


Step 5: Select the installation type. For YosrON, we should not allow any updates to the environment, especially kernel updates or a full upgrade, as this might lead to incompatibilities between the software headers and the kernel headers. So make sure to uncheck "Download Updates". You can check "Install third-party software", but you must be connected to the Internet (if the wireless network doesn't seem to work, use a wired connection). There is no hurry, as you can always install it later, so it's optional.

Figure 0.4: Don't allow any updates.

Then click "Continue", and a new window will appear where you need to select the installation type.


Figure 0.5: Install Ubuntu alongside your current OS.

You may get different options depending on your computer configuration. The above snapshot has been taken while installing Ubuntu 11.10 on a computer with Ubuntu 10.04 and Windows 7 pre-installed as dual boot.

Install Ubuntu alongside them: installs Ubuntu 11.10 alongside existing operating systems such as Windows 7.

Erase entire disk and install Ubuntu: erases your whole hard drive, and everything will be deleted (your files as well as other operating systems). This is useful only if your hard drive doesn't have any important files, or you just bought a new computer and want to keep only one OS, i.e. Ubuntu.

Something else: create, allocate and choose the partition on which you want to install Ubuntu, using the advanced partition manager. At first look it may seem a little difficult, but it is better as it gives you more options and control.

However, we will go with the first option: select "Install Ubuntu alongside them" and continue.


Step 6: Finishing the installation. The rest of the steps are easy for any user and are standard, as documented online. But it's important to select the correct keyboard layout to avoid problems later; most keyboards in Egypt use the "Arabic 101" layout. Also, it's very important to set a password for Ubuntu and remember it well, as we will use it when installing the required libraries and packages for YosrON.

Step 7: Disabling automatic updates. As we mentioned before, it's very important for YosrON to disable the automatic updates feature in Ubuntu. From the menu on the left of the screen, open the Ubuntu Software Center, go to Edit -> Software Sources, and set the option "Automatically check for updates:" to "Never".

Figure 0.6: Disabling automatic updates of Ubuntu.

Then click "close". This will disable automatic update on you Ubuntu box.


0.2 Installing the required libraries and packages


Some libraries/packages can be installed directly from the "Ubuntu Software Center" and others must be installed from the "Terminal". An Internet connection is required in both cases. The "Ubuntu Software Center" can be opened from the menu on the left of the screen. The "Terminal" (i.e. the command line) can be opened by pressing "Ctrl+Alt+T".

0.2.1 Installing "build-essential" using the "Terminal"


We need the gcc compiler, which can be obtained by installing the build-essential package.

Step 1: Prepare your system for building packages. By default, Ubuntu does not come with the tools required. You need to install the package build-essential for building packages and checkinstall for putting them into your package manager. These can be found on the install CD or in the repositories, by searching in the Synaptic Package Manager or using apt-get in the terminal:

sudo apt-get install build-essential checkinstall

Since you may want to get code from some projects with no released version, you should also install the appropriate version-management software:

sudo apt-get install cvs subversion git-core mercurial

You should then create a common directory for yourself where you'll be building these packages. We recommend creating "/usr/local/src", but really you can put it anywhere you want. Make sure this directory is writable by your primary user account by running

sudo chown $USER /usr/local/src

and, just to be safe,

sudo chmod u+rwx /usr/local/src

After you've done this, you're set up to start getting the programs you need.

Step 2: Resolving dependencies. One nice thing about modern Linux distributions is that they take care of dependencies for the user. That is to say, if you want to install a program, apt will make sure it installs all the needed libraries and other dependent programs, so installing a program is never more difficult than just specifying what you need. Unfortunately with some programs this is not the case, and you'll have to do it manually. It is this stage that trips up even fairly experienced users, who often give up in frustration because they cannot figure out what they need to get.

You probably want to read about the possibilities and limitations of auto-apt (https://help.ubuntu.com/community/AutoApt) first, which will attempt to take care of dependency issues automatically. The following instructions are for fulfilling dependencies manually:

To prepare, install the package "apt-file" and then run sudo apt-file update. This will download a list of all the available packages and all of the files those packages contain, which as you might expect can be a very large list. It will not provide any feedback while it loads, so just wait. The "apt-file" program has some interesting functions; the two most useful are apt-file search, which searches for a particular file name, and apt-file list, which lists all the files in a given package. (Two explanations: 1{http://debaday.debian.net/2007/01/24/apt-file-search-for-files-in-packagesinstalled-or-not/} and 2{http://www.debianhelp.co.uk/findfile.htm}.)

To check the dependencies of your program, change into the directory you created in step 1 (cd /usr/local/src). Extracting the tarball or downloading from cvs/subversion will have made a sub-directory under "/usr/local/src" that contains the source code. This newly-created directory will contain a file called "configure", which is a script that makes sure the program can be compiled on your computer. To run it, run the command

./configure

This command checks whether you have all the programs needed to build the program; in most cases you will not, and it will error out with a message about a missing program.

If you run ./configure without any options, you will use the default settings for the program. Most programs have a range of settings that you can enable or disable; if you are interested in this, check the README and INSTALL files found in the directory after decompressing the tar file. You can check the developer documentation, and in many cases ./configure --help will list some of the key configuration options. A very common option is ./configure --prefix=/usr, which will install your application into "/usr" instead of "/usr/local" as these instructions do.

If configure errors out, the last line of output will be something like

configure: error: Library requirements (gobbletygook) not met, blah blah blah stuff we don't care about

but right above that it will list a filename that it cannot find (often a filename ending in ".pc", for instance). What you need to do then is to run apt-file search missingfilename.pc, which will tell you which Ubuntu package the missing file is in. You can then simply install the package using sudo apt-get install requiredpackage. Then try running ./configure again and see if it works. If you get a bunch of text that finishes with "config.status: creating Makefile" followed by no obvious error messages, you're ready for the next steps.

Step 3: Build and install. If you got this far, you've done the hardest part already. Now all you need to do is make sure you are inside the program folder (for example, a folder on the desktop called YosrON):

cd Desktop/YosrON

Then run the command

make


which does the actual building (compiling) of the program. (You can use make clean to remove older compilation files after any edits you make to the code, then use make again.) Make sure you have installed all the libraries/packages needed for YosrON before running this command; check the following sections. When it's done, install the program. You probably want to use

sudo checkinstall

which puts the program in the package manager for clean, easy removal later. This replaces the old sudo make install command; see the complete documentation at CheckInstall (https://help.ubuntu.com/community/CheckInstall). Note: if checkinstall fails, you may need to run it as sudo checkinstall --fstrans=0, which should allow the install to complete successfully. Then the final stage of the installation will run; it shouldn't take long. When finished, if you used checkinstall, the program will appear in the Synaptic Package Manager. If you used sudo make install, your application will be installed to "/usr/local/bin" and you should be able to run it from there without problems. Finally, it is better to change the group of "/usr/local/src/" to admin and give the group rwx privileges, since anyone adding and removing software should be in the admin group.

0.2.2 Installing libraries/packages using "Ubuntu Software Center"


"Ubuntu Software Center" is much easier to be used for installing packages and libraries. First of all, you need to enable installing/updating software from other sources other than Ubuntu. This is done by all opening the software center from the menu on the left of the screen and going to Edit Software sources Other software


Check the box "Canonical Partners" as in fig. 6.7, then click "Close".

Figure 0.7: Using Ubuntu Software Center to install required libraries/packages.

Then type the name (code) of what you want to install in the search box found in the upper right of the window. For example, type "guvcview" and it will appear in the results; just click "Install". Some libraries/packages can't be installed from the "Ubuntu Software Center", which leads us to the "Terminal". For example, to install the SDL library, type:

sudo apt-get install libsdl1.2-dev
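Assuming the package names listed earlier in this report, everything YosrON needs can also be installed from the Terminal in one command (guvcview is optional but useful for testing):

sudo apt-get install build-essential checkinstall libsdl1.2-dev libgsl0-dev libxtst-dev libv4l-dev guvcview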

0.3 Check webcam supported formats and UVC compliance


0.3.1 UVC compliance check
1. First find out the vendor ID (VID) and product ID (PID) of the webcam. Use: lsusb which will list all your USB devices including the VID and PID in this format: VID:PID.


2. Use the lsusb tool again to look for video class interfaces. In this example, the VID is 046d and the PID is 08cb:

lsusb -d 046d:08cb -v | grep "14 Video"

If the webcam is a UVC device, you should see a number of lines that look like this:
bFunctionClass         14 Video
bInterfaceClass        14 Video
bInterfaceClass        14 Video
bInterfaceClass        14 Video

In this case the Linux UVC driver should recognize your camera when you plug it in. If there are no such lines, your device is not a UVC device.

0.3.2 Supported configurations and formats


This is done using guvcview. Type guvcview in the terminal, then go to "Video & files". There you can see all the supported configurations and formats for every webcam.

Figure 0.8: Checking supported configurations and formats using guvcview.


0.3.3 Troubleshooting webcams


If the webcam is UVC-compatible, it should be supported out of the box in any recent Linux distribution. Failures are usually caused by buggy applications or broken hardware (cameras, USB cables and USB host controllers can all be faulty). You should start by trying several applications: qv4l2, guvcview and luvcview are common test tools for UVC webcams, but feel free to try other V4L2 applications as well. In particular, be aware that different webcams might use different video formats, and some of them can be unsupported in some applications. If all applications display the same failure, chances are that your hardware is broken (or at least buggy), or that you have been lucky enough to hit a bug in the UVC driver. To diagnose the problem, please follow this procedure:

1. Make sure the webcam is UVC compliant, as described in the previous section.

2. Enable all uvcvideo module traces:

sudo sh -c "echo 0xffff > /sys/module/uvcvideo/parameters/trace"

3. Reproduce the problem. The driver will print many debugging messages to the kernel log, so don't let video capture run for too long. You can disable the uvcvideo module traces when you're done:

sudo sh -c "echo 0 > /sys/module/uvcvideo/parameters/trace"

4. Capture the contents of the kernel log:

dmesg > dmesg.log

5. If your device is not listed in the supported devices list

(http://www.ideasonboard.org/uvc/#devices), dump its USB descriptors: lsusb -d VID:PID -v > lsusb.log (replace VID and PID with your device VID and PID)

