New path ahead!

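Somehow I have switched careers into the online advertising industry, and the experience I built up over the previous two years is almost entirely useless in my current position. My earlier work mostly involved implementing image/video algorithms, with a focus on GPU-based GPGPU heterogeneous computing; now the focus has shifted to enterprise Java applications.

So the OpenCV/OpenCL content may be on hold indefinitely. Later, as I gain experience, I may write about things more closely related to my current work.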

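Learning iOS 7 development

Recently I started learning iOS development from scratch. Along the way I have really felt how friendly the learning materials from Apple and the community are to beginners. Even Apple's own technical documentation carries the company's aesthetic. I have not felt this when learning any other programming language or technology. For a long time, the style of programmers (especially in open-source communities) has been to ship pitifully little documentation, or material so incomplete and outdated as to be useless. With an unfamiliar technology or open-source codebase, you basically cannot count on the developers' documentation and have to read the code yourself to understand it.

To make things easier for those who come after me, here are the materials I have consulted while learning iOS development: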

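  1. Stanford iOS 7 online course via iTunes U. My first and main learning resource. The course is very rich in content: full 1080p classroom video at about one hour per lecture, complete lecture notes and assignments, an extremely professional professor, and no registration required. I recommend studying it seriously.

  2. Start Developing iOS Apps Today. An entry-level iOS resource, but it spans a lot of ground and covers nearly the complete development workflow of an iOS app.

  3. iOS Human Interface Guidelines. The guidelines are not dry at all; the many image and video examples make the document far more readable. You can learn a great deal here about the design details Apple (Jonathan Ive) put into every aspect of iOS 7, and you come away impressed by how deeply Apple understands user interface and experience. It is practically a bible for Apple fans. Recommended reading.

  4. Programming with Objective-C. To write iOS programs, learning Objective-C is essential.

  5. More to be added later... (xp, don't you think this post is rather thin?)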

I don't know C

C is purer than C++: it has far fewer obscure features and less ambiguous grammar. Knowing this, I thought there was nothing more for me to learn about the C language itself. I more or less believed that until I ran into some open-source projects written in C, namely x264 and ffmpeg. In this article I will not talk about x264 techniques, only about the C language.

A colleague poked me yesterday and asked how to read the array declaration below, which comes from the x264 implementation. I edited it for the purposes of explanation:

int16_t (*mv[2][2])[2];

I had rarely seen an array declaration written this way. I paused for a few seconds and recalled the spiral rule I had learnt in college (probably 5 years ago). Back then I did not pay much attention to it, because I lacked the coding experience to understand it. At first I could not decipher the declaration in a way both of us could understand, so I went back through the spiral rule.

Following the spiral rule, we can draw it like this:

                     +-----------+
                     | +---+     |
                     | ^   |     |
            int16_t (*mv[2][2])[2];
             ^       ^     |     |
             |       +-----+     |
             +-------------------+

In plain English, it reads:

mv is a 2x2 2D array of pointers to int16_t[2].

That may still be hard to follow, so let me expand it:

mv is a 2x2 2D array. Each array element is a pointer. Each pointer points to an int16_t[2] element.

With that in mind, it is no longer so weird to see x264 access mv with patterns such as mv[0][1][6376][1]: mv[0][1] selects one pointer, [6376] steps to the 6377th int16_t[2] that the pointer addresses, and the final [1] picks the second of its two int16_t values.
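To convince myself, here is a minimal sketch of the same declaration in use. The storage array and the function names are my own inventions for illustration and have nothing to do with the real x264 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backing store: 8 pairs of int16_t. */
static int16_t storage[8][2];

/* Writes through mv[0][1][row][comp] and reads the value back from storage. */
int16_t mv_write_read(size_t row, size_t comp, int16_t value)
{
    /* A 2x2 array of pointers, each pointing to int16_t[2]. */
    int16_t (*mv[2][2])[2] = { { NULL, NULL }, { NULL, NULL } };

    mv[0][1] = storage;            /* storage decays to int16_t (*)[2] */
    mv[0][1][row][comp] = value;   /* same as (*(mv[0][1] + row))[comp] */
    return storage[row][comp];
}

/* The whole table is just four pointers, no matter how much they point at. */
size_t mv_table_size(void)
{
    int16_t (*mv[2][2])[2];
    return sizeof mv;
}
```

The point of mv_table_size is that the declaration itself reserves no room for the int16_t pairs; the four pointers have to be aimed at memory that lives somewhere else.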


While debugging, we found that x264 sometimes uses negative indexes into an array, e.g.:

int t = some_random_int_array[-1]; 

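My first reaction was: how on earth can an index be negative? It turns out to be legal C, and unlike Python, where a negative index counts from the end, here it refers to elements before the one the pointer addresses. This works because array[idx] is equivalent to *(array + idx), so it is valid as long as the resulting address still falls inside the same underlying array (for example, when the name is really a pointer into the middle of a larger buffer). A Stack Overflow thread (see References) explains this and quotes the following from C99 §6.5.2.1/2: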

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
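A small sketch of what this looks like in practice. The buffer and function names are made up for illustration; the only assumption is that the pointer really does point into the interior of an array, otherwise the behavior is undefined.

```c
/* Returns the element just before the one p points at.
   Only valid if p - 1 is still inside the same array. */
int element_before(const int *p)
{
    return p[-1];   /* identical to *(p - 1) */
}

int negative_index_demo(void)
{
    int buf[4] = { 10, 20, 30, 40 };
    int *mid = buf + 2;                     /* points at buf[2] */

    return mid[-2] + element_before(mid);   /* buf[0] + buf[1] */
}
```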


The story of learning new facts went on, and then I met designated initializers, but I do not want to repeat every detail of the specification here. As a short example, I saw this in ffmpeg:

AVCodec ff_libx264_encoder = {
    .name             = "libx264",
    .long_name        = NULL_IF_CONFIG_SMALL("libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10"),
    .type             = AVMEDIA_TYPE_VIDEO,
    .id               = AV_CODEC_ID_H264,
    .priv_data_size   = sizeof(X264Context),
    .init             = X264_init,
    .encode2          = X264_frame,
    .close            = X264_close,
    .capabilities     = CODEC_CAP_DELAY | CODEC_CAP_AUTO_THREADS,
    .priv_class       = &x264_class,
    .defaults         = x264_defaults,
    .init_static_data = X264_init_static,
};

I could guess what the dot-prefixed names are about, but I never imagined C could do something like this!

All these facts, new and old, changed my attitude towards C. I knew C++ is a language whose details are hard to master, but I had always underestimated C as well. Languages keep evolving, even C.
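For the record, here is a tiny self-contained version of the same idea. The demo_codec struct and everything in it are hypothetical and much smaller than the real AVCodec; the sketch only shows the C99 rules: members can be named in any order, and anything not named is zero-initialized.

```c
/* Hypothetical codec descriptor, much smaller than ffmpeg's AVCodec. */
typedef struct demo_codec {
    const char *name;
    int         id;
    int         priv_data_size;
    int       (*init)(void);
    int         capabilities;
} demo_codec;

static int demo_init(void) { return 0; }

const demo_codec demo_encoder = {
    .name = "demo",
    .init = demo_init,   /* order need not match the declaration order */
    .id   = 7,
    /* priv_data_size and capabilities are not named, so they are zero */
};

/* Array designators work the same way: slots not named become zero. */
const int sparse_table[5] = { [1] = 10, [3] = 30 };
```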


References

The ``Clockwise/Spiral Rule''

Negative array indexes in c

Designated Initializers

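A simple facedetect library

Over the past couple of days I built a Cascade Classifier face detection project and put it on Github.

Main features:

  1. Cascade files preloaded into memory
  2. Some small utilities for reading I420 images and video
  3. An API designed to take a raw pointer to user-supplied image data, plus the image color format and image size
  4. OpenCL support

The implementation is straightforward, but a few points are interesting and worth noting: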

xml2header.cmake

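The idea behind preloading the cascade file into memory is to process the cascade xml file with the project's xml2header.cmake script, generating a .h header that contains a long string, which a .cpp file then includes. One problem is that some xml files are very large, such as the commonly used haarcascade_frontalface_alt.xml. If such a file is compiled directly into a single static long string, the compiler is likely to fail. So in the cmake script I split the file into several smaller std::string pieces, and at program initialization I use std::accumulate to reassemble them into the complete cascade string. Also note that when reading the file into a string, \\\ in the file must be converted to \\\\\\, and a \n must be appended at the end of each line.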
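The reassembly step can be sketched as follows. Note that this is not the project's code: the project uses std::string and std::accumulate in C++, while this sketch shows the same split-then-join idea in plain C. The chunk contents and the names cascade_chunks, join_chunks and load_cascade_text are placeholders I made up.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the generated header: the large xml text
   is stored as several small string literals instead of one huge one. */
static const char *const cascade_chunks[] = {
    "<?xml version=\"1.0\"?>\n",
    "<opencv_storage>\n",
    "</opencv_storage>\n",
};

/* Joins count chunks into one newly allocated, NUL-terminated string.
   The caller frees the result. Returns NULL if allocation fails. */
char *join_chunks(const char *const *chunks, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += strlen(chunks[i]);

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    for (size_t i = 0; i < count; ++i) {
        size_t n = strlen(chunks[i]);
        memcpy(dst, chunks[i], n);
        dst += n;
    }
    *dst = '\0';
    return out;
}

/* Called once at start-up to rebuild the complete cascade text. */
char *load_cascade_text(void)
{
    return join_chunks(cascade_chunks,
                       sizeof cascade_chunks / sizeof cascade_chunks[0]);
}
```

Each literal stays small enough for the compiler, and the full text only ever exists at run time.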

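Reading the video source

In testing I used two kinds of I420 video sources: .y4m files, which have a header, and .yuv files, which do not. For .y4m, we can follow the descriptions of the y4m format available online and read the file frame by frame.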
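Roughly, a .y4m file starts with one text line beginning with YUV4MPEG2 that carries space-separated parameters such as W (width) and H (height), and every frame is preceded by a text line beginning with FRAME. Below is a minimal sketch of a reader built on that; the function names are my own, it only looks at W and H, it assumes 4:2:0 data, and it assumes the header line fits in 255 characters. It is not the code from the project.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parses the stream header line. Returns 0 on success, -1 on failure. */
int y4m_read_header(FILE *f, int *width, int *height)
{
    char line[256];

    if (fgets(line, sizeof line, f) == NULL)
        return -1;
    if (strncmp(line, "YUV4MPEG2", 9) != 0)
        return -1;

    *width = *height = 0;
    for (char *tok = strtok(line + 9, " \n"); tok != NULL;
         tok = strtok(NULL, " \n")) {
        if (tok[0] == 'W')
            *width = atoi(tok + 1);
        else if (tok[0] == 'H')
            *height = atoi(tok + 1);
    }
    return (*width > 0 && *height > 0) ? 0 : -1;
}

/* Bytes in one I420 frame: a full-size Y plane plus two chroma planes
   subsampled by 2 in each direction (rounded up for odd sizes). */
size_t i420_frame_size(int width, int height)
{
    size_t cw = ((size_t)width + 1) / 2;
    size_t ch = ((size_t)height + 1) / 2;
    return (size_t)width * (size_t)height + 2 * cw * ch;
}

/* Reads one frame into buf. Returns 1 on success, 0 at end of file,
   -1 on a malformed or truncated frame. */
int y4m_read_frame(FILE *f, unsigned char *buf, size_t frame_size)
{
    char line[128];

    if (fgets(line, sizeof line, f) == NULL)
        return 0;
    if (strncmp(line, "FRAME", 5) != 0)
        return -1;
    return fread(buf, 1, frame_size, f) == frame_size ? 1 : -1;
}
```

The file should be opened in binary mode ("rb"). A headerless .yuv file needs neither parsing step: the caller supplies width and height, and each frame is simply the next i420_frame_size(width, height) bytes.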

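Reading the cascade string from memory

To handle the cascade string, we can create a stream with FileStorage and pass it to the read method of OpenCV's cv::CascadeClassifier class. During implementation, however, I found that read only supports the new cascade files, those trained with traincascade (see the OpenCV API documentation). To get around this, I rewrote a small part of the load function so that old cascade files can also be read from memory.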