Capture images using V4L2 on Linux

Original Post on my new Blog

I have always used OpenCV’s VideoCapture API to capture images from webcams and USB cameras. OpenCV supports V4L2, but I wanted to use something other than VideoCapture, so I started digging into V4L2. With the help of a few links and examples, I wrote a small program that grabs an image using V4L2, converts it to OpenCV’s Mat structure and displays it.

What is V4L2?

V4L2 is the second version of Video for Linux, a video capture API for Linux. Here you can find amazing documentation about the API. It gives you a very easy interface to use from C, C++ and Python. I haven’t tried the Python bindings yet.

How To Use V4L2 API?

I started reading documentation but didn’t really understand much until I found this example. The code had some issues and wasn’t working properly. But I just copied it and tried understanding it. So this is my understanding of the code.

Step 1: Open the Capture Device.

In Linux, the default capture device is generally /dev/video0, but if you’re using USB webcams, the index will vary accordingly.

#include <fcntl.h>   /* open, O_RDWR */
#include <stdio.h>   /* perror */

int fd;
fd = open("/dev/video0", O_RDWR);
if (fd == -1) {
    // couldn't find capture device
    perror("Opening Video device");
    return 1;
}
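
The check above can be wrapped in a small helper. This is only a sketch: open_capture_device is a hypothetical name of mine, and the extra stat() test (rejecting anything that is not a character device) is an addition that the snippet above does not perform.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for reading and writing.
 * Returns a file descriptor, or -1 if the path is missing, is not a
 * character device, or cannot be opened. */
int open_capture_device(const char *path)
{
    struct stat st;

    assert(path != NULL);

    if (stat(path, &st) == -1) {
        perror(path);                 /* e.g. no such file or directory */
        return -1;
    }
    if (!S_ISCHR(st.st_mode)) {
        fprintf(stderr, "%s is not a character device\n", path);
        return -1;
    }

    int fd = open(path, O_RDWR);
    if (fd == -1)
        perror(path);                 /* e.g. permission denied */
    return fd;
}
```

Calling open_capture_device("/dev/video1") instead of hard-coding /dev/video0 makes it easy to try another index when several cameras are plugged in.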

Step 2: Query the Capture

Here you check whether the device can capture at all. V4L2 doesn’t support some cameras, and for those this call fails with an error. We need to use the v4l2_capability structure and the VIDIOC_QUERYCAP ioctl to query the device. Read More here.

struct v4l2_capability caps = {0};
if (-1 == xioctl(fd, VIDIOC_QUERYCAP, &caps)) {
    perror("Querying Capabilities");
    return 1;
}

Here xioctl is a wrapper function over ioctl that retries the call when it is interrupted by a signal (EINTR). ioctl() is a function to manipulate device parameters of special files. Read more here.

#include <errno.h>
#include <sys/ioctl.h>

static int xioctl(int fd, unsigned long request, void *arg)
{
    int r;
    do {
        r = ioctl(fd, request, arg);   /* retry if a signal interrupted the call */
    } while (-1 == r && EINTR == errno);
    return r;
}

Step 3: Image Format

V4L2 provides an easy interface to check the image formats and colorspaces that your webcam supports and provides. The v4l2_format structure is used to change the image format.

struct v4l2_format fmt = {0};
fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;   /* required: tells the driver which stream to configure */
fmt.fmt.pix.width = 320;
fmt.fmt.pix.height = 240;
fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_MJPEG;
fmt.fmt.pix.field = V4L2_FIELD_NONE;

if (-1 == xioctl(fd, VIDIOC_S_FMT, &fmt)) {
    perror("Setting Pixel Format");
    return 1;
}

I have set the image width and height to 320 and 240 respectively. You should check which formats your camera supports. My camera supports MJPEG and YUV, and hence I have set the image format to MJPEG.

Step 4: Request Buffers

A buffer contains data exchanged by the application and the driver using streaming I/O methods. v4l2_requestbuffers is used to allocate device buffers. Read more here.

struct v4l2_requestbuffers req = {0};
req.count = 1;
req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;   /* required: buffers are for the capture stream */
req.memory = V4L2_MEMORY_MMAP;

if (-1 == xioctl(fd, VIDIOC_REQBUFS, &req)) {
    perror("Requesting Buffer");
    return 1;
}

The VIDIOC_REQBUFS ioctl is used to initiate memory-mapped (mmap) or user-pointer based I/O.

Step 5: Query Buffer

After requesting a buffer from the device, we need to query it in order to get at the raw data. Read more here.

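struct v4l2_buffer buf = {0};
buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
buf.memory = V4L2_MEMORY_MMAP;
buf.index = 0;   /* only one buffer was requested, so its index is 0 */
if (-1 == xioctl(fd, VIDIOC_QUERYBUF, &buf)) {
    perror("Querying Buffer");
    return 1;
}

/* buffer is a pointer declared elsewhere, e.g. uint8_t *buffer; needs <sys/mman.h> */
buffer = mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, buf.m.offset);
if (buffer == MAP_FAILED) {
    perror("Mapping Buffer");
    return 1;
}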

The mmap() function asks to map length bytes starting at offset in the memory of the device specified by fd into the application address space, preferably at address start. Read more here
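
The same mmap() call can be tried without a camera. In the sketch below an ordinary temporary file stands in for the video device (that substitution is my assumption, purely so the example runs anywhere), and map_region and mmap_roundtrip_demo are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `length` bytes of `fd` starting at `offset`, using the same flags
 * as the capture code. Returns NULL on failure. */
void *map_region(int fd, size_t length, off_t offset)
{
    assert(length > 0);
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    return p == MAP_FAILED ? NULL : p;
}

/* Write a message to a temporary file, map the file and read the
 * message back through the mapping. Returns 0 on success, -1 otherwise. */
int mmap_roundtrip_demo(void)
{
    const char msg[] = "frame-data";
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;

    int fd = fileno(tmp);
    if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
        fclose(tmp);
        return -1;
    }

    char *mapped = map_region(fd, sizeof(msg), 0);
    if (mapped == NULL) {
        fclose(tmp);
        return -1;
    }

    int ok = memcmp(mapped, msg, sizeof(msg)) == 0;
    munmap(mapped, sizeof(msg));      /* release the mapping when done */
    fclose(tmp);
    return ok ? 0 : -1;
}
```

With a real device the length and offset come from the buf.length and buf.m.offset values that VIDIOC_QUERYBUF returned, and the mapping should likewise be released with munmap() before the device is closed.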

Step 6: Capture Image

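After querying the buffer, the only thing left is capturing the frame and saving it in the buffer. The buffer is first queued so that the driver has somewhere to write, streaming is switched on, and once select() reports the device as readable the filled buffer is dequeued.

/* Hand the buffer to the driver; without this there is nothing to dequeue. */
if (-1 == xioctl(fd, VIDIOC_QBUF, &buf)) {
    perror("Queue Buffer");
    return 1;
}

if (-1 == xioctl(fd, VIDIOC_STREAMON, &buf.type)) {
    perror("Start Capture");
    return 1;
}

/* Wait up to 2 seconds for a frame. */
fd_set fds;
FD_ZERO(&fds);   /* the set must be cleared before FD_SET */
FD_SET(fd, &fds);
struct timeval tv = {0};
tv.tv_sec = 2;
int r = select(fd + 1, &fds, NULL, NULL, &tv);
if (-1 == r) {
    perror("Waiting for Frame");
    return 1;
}

/* Take the filled buffer back; the frame is now in the mapped memory. */
if (-1 == xioctl(fd, VIDIOC_DQBUF, &buf)) {
    perror("Retrieving Frame");
    return 1;
}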
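
select() returns 0 when the timeout expires, which is a different case from an error. A small sketch of a wait helper that tells the three outcomes apart; wait_for_readable and wait_demo are hypothetical names, and the demo uses a pipe in place of the camera descriptor so that it runs without hardware.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait until `fd` is readable or `seconds` have elapsed.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_for_readable(int fd, int seconds)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        fd_set fds;
        struct timeval tv;

        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        tv.tv_sec = seconds;
        tv.tv_usec = 0;

        int r = select(fd + 1, &fds, NULL, NULL, &tv);
        if (r == -1 && errno == EINTR)
            continue;                 /* interrupted by a signal: try again */
        if (r == -1)
            return -1;
        return r > 0 ? 1 : 0;
    }
}

/* Demo on a pipe: returns 0 if the empty pipe times out and the pipe
 * becomes readable after one byte is written, -1 otherwise. */
int wait_demo(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int before = wait_for_readable(p[0], 0);   /* nothing written yet */
    ssize_t n = write(p[1], "x", 1);
    int after = wait_for_readable(p[0], 1);    /* one byte pending */

    close(p[0]);
    close(p[1]);
    return (n == 1 && before == 0 && after == 1) ? 0 : -1;
}
```

In the capture code, a return value of 0 from wait_for_readable(fd, 2) would mean the camera produced no frame within two seconds, so calling VIDIOC_DQBUF at that point is unlikely to succeed.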

Step 7: Store data in OpenCV datatype

I wanted to store the retrieved data in an OpenCV image structure. It took me a few hours to figure out the right way. So here’s how I did it.

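/* The buffer holds a JPEG-compressed frame, so wrap the buf.bytesused
   encoded bytes in a 1-row matrix and let OpenCV decode them. */
CvMat cvmat = cvMat(1, buf.bytesused, CV_8UC1, (void*)buffer);
IplImage * img;
img = cvDecodeImage(&cvmat, 1);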

So this is how I captured frames from my webcam and stored them in an OpenCV image data structure.
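
Since the format was set to MJPEG, the dequeued buffer should normally contain JPEG data, which makes it easy to sanity-check a frame or dump it to disk without OpenCV. This is a sketch under that assumption; looks_like_jpeg and save_frame are hypothetical helpers, and some cameras emit JPEG frames that not every viewer can open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if `data` begins with the JPEG start-of-image marker
 * (0xFF 0xD8), 0 otherwise. */
int looks_like_jpeg(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    return len >= 2 && data[0] == 0xFF && data[1] == 0xD8;
}

/* Write `len` bytes of frame data to `path`.
 * Returns 0 on success, -1 on failure. */
int save_frame(const char *path, const uint8_t *data, size_t len)
{
    assert(path != NULL && data != NULL);

    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, out);
    if (fclose(out) != 0 || written != len)
        return -1;
    return 0;
}
```

After VIDIOC_DQBUF, the number of valid bytes is in buf.bytesused, so a call such as save_frame("frame.jpg", buffer, buf.bytesused) would write only the frame and not the unused remainder of the mapping.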

You can find the complete code here on my GitHub

P.S. The coding period for GSoC has started and I have to start working.

Kinect with OpenCV using Freenect

I recently got a Kinect to work on at my summer internship at Ducere Technologies. Having heard so much about OpenNI, I tried installing it on my Ubuntu 12.04 LTS 64-bit machine. It took some time to configure, build and install it. I got help from a few blogs and managed to install it somehow. I connected the Kinect, fired up the demo program and ran into trouble. I was getting the following error.

A timeout has occurred while waiting for new data

Read More At My New blog

OpenCV with Android SDK Camera

Original Post Here – Using Android SDK Camera with OpenCV on my blog.

So I’m currently working on an HTC Evo V 4G and was desperately trying to obtain images from both of its cameras. One thing was sure: I couldn’t use OpenCV’s Java Camera or Native Camera (it doesn’t even work with ICS). I decided to use the Android SDK Camera. I tried posting questions on stackoverflow and the OpenCV forum, but couldn’t find any proper solution. I tried taking pieces of code from wherever I could and wrote something, but it wouldn’t work. I also found perfectly working code, but it was giving me Static Linkage Errors, which meant the OpenCV Manager couldn’t be loaded in the application. I had done everything step by step but it wasn’t working.

Since my phone has two back cameras (stereoscopic), I was finding it very difficult to access both cameras and convert their output to OpenCV images (Mat format). Someone advised me to use the HTC Open Sense SDK to access both cameras. So I downloaded the HTC Open Sense SDK and installed it as described on the HTC Website. I loaded one of the example applications on the phone and started using it. It worked all right. I browsed the source and found code which used the 3D camera. You can find it on the HTC Website.

So, I used it as the base of my code. I tried it and switched off the 2D view. So far so good. The application was working fine. Dan advised me to use Camera Intent. I tried Camera Intent and it turned out to be very good. It started a new thread/application which captured an image on pressing the button and also asked whether you wanted to save it or not. Pretty useful application. But it couldn’t be used in my case because whenever I used the Intent, it started the camera in 2D mode, i.e. only one camera was capturing the image. So I had to look for some other solution.

There’s a similar question on stackoverflow which proved to be very useful. Using the code provided there to capture images and store them as bytes, I used OpenCV Mat’s put method to store the data in an OpenCV Mat. Plus, I edited the HTC code a bit for my own use so that I could use both cameras and store the image data in bytes.

Whenever I pressed on the text/button, using onTouchEvent, I’d use addCallbackBuffer and setPreviewCallbackWithBuffer to get the raw image data in bytes.

public boolean onTouchEvent(MotionEvent event) {
    switch (event.getAction()) {
    case MotionEvent.ACTION_DOWN:
        // toggle();
        // Intent cameraIntent = new Intent(android.provider.MediaStore.ACTION_IMAGE_CAPTURE);
        // startActivityForResult(cameraIntent, 1337);
        int bufferSize = width * height * 3;

        // New preview buffer.
        byte[] mPreviewBuffer = new byte[bufferSize + 4096];

        // A callback "with buffer" only fires after a buffer has been added.
        // mCamera is an assumed name for the already opened Camera instance.
        mCamera.addCallbackBuffer(mPreviewBuffer);
        mCamera.setPreviewCallbackWithBuffer(mCameraCallback);
        break;
    }
    return true;
}

Now, the function where I’d store the bytes data in OpenCV Mat.

private final Camera.PreviewCallback mCameraCallback = new Camera.PreviewCallback() {
    public void onPreviewFrame(byte[] data, Camera c) {
        Log.d(TAG, "ON Preview frame");
        // Single-channel Mats: one for the raw YUV bytes, one for the result.
        img = new Mat(height, width, CvType.CV_8UC1);
        gray = new Mat(height, width, CvType.CV_8UC1);
        img.put(0, 0, data);
        Imgproc.cvtColor(img, gray, Imgproc.COLOR_YUV420sp2GRAY);
        // Sample one pixel from each half of the side-by-side image.
        String pixvalue = String.valueOf(gray.get(300, 400)[0]);
        String pixval1 = String.valueOf(gray.get(300, 400 + width / 2)[0]);
        Log.d(TAG, pixvalue);
        Log.d(TAG, pixval1);
        // to do the camera image split processing using "data"
    }
};

The image that we get from the Android SDK Camera is in the YUV420sp colorspace and we wanted it in the BGRA/grayscale colorspace. So we tried converting it, but we were getting only 0.0 as the data, so we figured there was some problem with the YUV420sp format. We looked it up on Google and realized that OpenCV requires a 1-channel (not 3- or 4-channel) Mat to store a YUV420sp image. So, now we have both images stored in one Mat, placed side by side. We can use either image by splitting the Mat in half.
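
The layout behind this is simple enough to sketch outside Android. A YUV420sp frame is a full-resolution Y (brightness) plane followed by a half-size interleaved chroma plane, so the Y plane alone is already a grayscale image. The C sketch below is illustrative only, written in C rather than Java and assuming even width and height; yuv420sp_size and copy_luma_half are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes in one YUV420sp frame: width * height luma bytes followed by
 * half as many interleaved chroma bytes. */
size_t yuv420sp_size(size_t width, size_t height)
{
    return width * height + (width * height) / 2;
}

/* Copy the left (half == 0) or right (half != 0) half of the Y plane
 * into `out`, which must hold (width / 2) * height bytes.
 * Returns `out` for convenience. */
uint8_t *copy_luma_half(const uint8_t *frame, size_t width, size_t height,
                        int half, uint8_t *out)
{
    assert(frame != NULL && out != NULL);

    size_t half_width = width / 2;
    for (size_t row = 0; row < height; row++) {
        const uint8_t *src = frame + row * width + (half ? half_width : 0);
        memcpy(out + row * half_width, src, half_width);
    }
    return out;
}
```

For a side-by-side stereo frame, calling copy_luma_half once with half set to 0 and once with 1 yields the two grayscale views separately.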

So this is a way to use the Android SDK Camera to take an image and convert it to an OpenCV Mat. I have posted the solution on stackoverflow. You can find it here.

P.S. More posts to come on image stitching and Android.

Freehand | Design Innovation 2013 | MIT Media Lab | PESIT

Searching for a free bathroom to bathe in cold water on a cold morning in Bangalore: well, that’s how the Design Innovation workshop started for all of us who were placed at RIE. Despite the terrible accommodation, the workshop turned … Continue reading

SIFT Keypoint Matching using Python OpenCV

I have been working on a SIFT based keypoint tracking algorithm and something happened on Reddit. Kat wanted this in Python so I added this feature in SimpleCV. Here’s the pull request which got merged. SIFT KeyPoints Matching using OpenCV-Python: To … Continue reading

SIFT based Tracker

Scale-invariant feature transform (or SIFT) is an algorithm in computer vision to detect and describe local features in images. The algorithm was published by David Lowe in 1999. SIFT is a method to detect distinct, invariant image feature points, which easily … Continue reading