Dawei Huang | Pieces

Assemble your Jigsaw.

Mid-Autumn Day is over.



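I got up this morning and, as usual, read the RSS feeds in Google Reader, checked my FAH progress, wrote code, and read articles; today's ramblings are no longer than any other day's. If I were at home, I would probably be quietly walking up the hill with my parents to look at the moon and eating a few of the mooncakes they had saved. This year, though, that task will have to wait until the Spring Festival.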




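A few days ago I read a post on “乱象 印记”, and I strongly agree with one of its points: whenever you arrive in a new city, you should quickly find your own coordinates. That way you always have a fixed direction, old friends will not forget you, and new friends are easier to find. This Mid-Autumn Festival I am still not quite sure of my own coordinates, but my heart is steady; and my friends far away are probably all keeping their heads down, working hard to find theirs. So here, on this humble blog, writing becomes something meaningful.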

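Today I started copying out How to Solve It by hand. That the palest ink beats the best memory is the voice of experience, and that muscle memory helps is basic science. If Mid-Autumn really is a special day, then today I at least picked up a new habit.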

Happy Mid-Autumn Day!


How to use BLAT to annotate a newly sequenced genome

//My first post here.
I have recently been put in charge of annotating the Date Palm genome. In fact, there is only one person doing this: me.
UCSC and Ensembl are the best-known genome sequence databases, and each has developed an effective tool for this job: BLAT from UCSC and pMatch from Ensembl.
pMatch needs protein sequences as input, so it is suited to using a rich protein dataset to annotate the genome of a similar species that has few sequenced proteins of its own.
BLAT is good at cDNA-to-genome alignment. We chose BLAT because we have Newbler-assembled cDNA sequences from 454.

The Date Palm genome assembly is based on two data sources: the 454 contigs assembled by Newbler and SOLiD mate-pair reads.
Newbler assembles the 454 genomic reads into contigs, which provide the main structure of the genome assembly.
The SOLiD mate-pair data provide information on the relationships between contigs, including their order and gap lengths.

454 sequencing does not provide strand information, so after assembly a cDNA may be either the sense or the antisense strand. Since both genome strands can encode genes (we call them positive-strand genes and negative-strand genes), BLAT mapping can produce four kinds of results: either cDNA orientation combined with either gene strand.
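The four cases can be made concrete with a small sketch. It assumes (this is not stated in the post) that BLAT reports '+' when the cDNA aligns as given and '-' when it aligns as its reverse complement. The function name blat_strand and the CASES table are made up for illustration.

```python
from itertools import product


def blat_strand(cdna_orientation: str, gene_strand: str) -> str:
    """Strand BLAT would report for a cDNA of the given orientation.

    cdna_orientation: "sense" or "antisense" (unknown in practice for 454 data).
    gene_strand: "+" or "-" (the genome strand that encodes the gene).
    """
    if cdna_orientation not in ("sense", "antisense"):
        raise ValueError(f"bad orientation: {cdna_orientation!r}")
    if gene_strand not in ("+", "-"):
        raise ValueError(f"bad strand: {gene_strand!r}")
    # A sense cDNA reads like the gene's own strand; an antisense cDNA
    # reads like the opposite strand.
    reads_like_plus = (cdna_orientation == "sense") == (gene_strand == "+")
    return "+" if reads_like_plus else "-"


# All four combinations of cDNA orientation and gene strand.
CASES = [
    (orientation, strand, blat_strand(orientation, strand))
    for orientation, strand in product(("sense", "antisense"), ("+", "-"))
]

for orientation, strand, reported in CASES:
    print(f"{orientation:9s} cDNA, gene on {strand} strand -> BLAT strand {reported}")
```

Note that two different cases give the same BLAT strand, so the BLAT strand alone cannot tell a positive-strand gene from a negative-strand one. That is why extra strand information is needed.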

So the first step of annotation is to convert the antisense cDNAs to sense-strand sequences; then only two types of mapping result remain: positive-strand genes and negative-strand genes.
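Converting an antisense cDNA to its sense strand is just a reverse complement. Here is a minimal sketch in plain Python; the names reverse_complement and to_sense are my own, and the is_antisense flag is assumed to come from whatever strand evidence is available.

```python
# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_sense(cdna: str, is_antisense: bool) -> str:
    """Return the sense-strand version of a cDNA.

    is_antisense is an assumed input: it must be decided beforehand
    from strand evidence such as splice signals.
    """
    return reverse_complement(cdna) if is_antisense else cdna


if __name__ == "__main__":
    print(to_sense("ATGC", is_antisense=True))
    print(to_sense("ATGC", is_antisense=False))
```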

There are several sources of strand information: splice signals, transcriptome data from SOLiD strand-specific experiments, CDS prediction data, and alignment-based annotation of the cDNA.

The splice signal is the first two and last two bases of an intron. cDNA is sequenced from mRNA, which consists only of the exons of a gene. When a cDNA is mapped back to the genome, its alignment consists of multiple blocks. The blocks are usually contiguous on the cDNA, whereas on the genome they are separated by unaligned stretches, usually introns.


//Originally posted 2011-01-30 15:07 on Renren (人人网).
The iPhone is a perfect device for reading, especially for reading in a second language. But a lot of people complain that it cannot look up a word instantly with a touch. Many apps can do this, for example:

good apps
1. Kindle from Amazon
2. iBooks from Apple
3. Instapaper from Instapaper
Then the question is how to get the material you want to read into these apps.
First, let's look at Instapaper.
The app is elegantly designed and is combined with a web service. You can save anything you like while browsing the web: just click the “Read Later” link to store it in Instapaper.

Read Later link in your bookmark bar

it shows “saving”

It is similar to Google's Reader service. You can create folders and drag each folder's own “Read Later” link to your bookmark bar for convenience. Although Instapaper costs 4.99 USD, I think it is worth it. But its integrated dictionary is poor: the explanations are short and copied from Wikipedia. So this app is great for first-language readers, but without a professional dictionary it is not so good for English learners.

Second, we turn to iBooks.
This app is great. A lot of people became Apple users because of the iPad, and they bought it mostly for this app. The page-flip animation is beautiful and the integrated dictionary is good, but not good enough for iPhone and iPod touch users. Because of their screen size, Apple did not design it to show the definition and the context on one page. So when you save a screenshot as a vocabulary card, it contains only the dictionary page, and it is difficult to tell from the card what you were reading. The good thing is that you can transfer any material through iTunes, and ePub books can be edited in and exported from Pages, part of Apple's iWork package. So with its high-quality dictionary, you can make good vocabulary cards from screenshots.
But better cards come from Kindle screenshots. Without transfer-and-organize software like iTunes, Kindle is not as convenient as iBooks for reading your personal material. But this changed when Instapaper and Kindle hooked up. Here is how I use them.
You can store any material in Instapaper using its web service, which is completely free. Then open its main page in Safari on your iPhone or iPod touch and click the ebook format you want in the download toolbox.

instapaper web service

Choose ePub if you prefer iBooks, but I'll use Kindle. When the new page opens, Instapaper will ask whether you want to open it in “Kindle”.

download ebook in safari on ios device

Clicking it launches Kindle automatically. Then enjoy your reading, your dictionary, and your vocabulary cards.

For iBooks, the process is the same: just click the ePub button.
That's all. Enjoy your reading!

open instapaper ebook in kindle

OpenCL on OS X, Part 1: Setting Up the Development Environment

//Originally posted 2011-08-12 15:14 on Renren (人人网).




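After collecting the materials, the next step is getting things running and learning from them.

Because I use a Mac, because OpenCL was originally proposed under Apple's lead, and because there is plenty of commercial software on OS X that uses CL (search the App Store), I needed to learn Xcode.

I had learned it once, back before the iPhone SDK came out. The problem is that since the iOS SDK appeared, Xcode has been updated more and more often, and its interface has changed beyond recognition. Update addiction is probably a common affliction among MacHeads. So I poked around a bit, and that is how my first program came about.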


1. Xcode: find Xcode through Launchpad or in /Developer/Applications and open it.

2. New Project -> in the left pane: Mac OS X -> Application -> Command Line Tool -> click [Next] -> key in a name and choose your preferred language (in this case, choose C) -> save your project

3. Add OpenCL.framework


In the left pane, click your project -> in the right pane, click Build Phases -> under Link Binary With Libraries, click [+] -> in the search window, click “OpenCL.framework” -> it will then appear in your project, and you can drag it into your “Frameworks” folder.

4. Key in your first OpenCL program.


Note that the kernel program here is written inside the host program, which differs from the usual practice of keeping the kernel in a separate file.

5. Compile and run in Xcode

Just click Run. The compiled file is in the Products folder in the left pane.

6. Compile and run in Linux



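//Originally posted 2011-08-25 18:03 on Renren (人人网).

A junior labmate went back home, so I took over the machine he left behind. Before lunch I checked the FAH stats: sure enough, with two clients running, the points rate is much higher, though I am still slowly catching up behind inactive donors.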


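The reason my mother gave was nothing more than the worry that it would interfere with my studies. But not having a computer did not stop me from writing a pile of programs on my Wenquxing (文曲星) electronic dictionary, albeit in BASIC, and most of them were of little use. The programming environment on the Wenquxing back then could not have been simpler. I even spent time going through my mom's college BASIC textbook. That was also the year Apple ported NeXTSTEP to the Mac and released Mac OS X… Looking back now, constrained by that system, I did not really learn the essence of programming…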









//Originally posted on September 6 in the Google Group "sth. about genomics" (mirrored on this site).



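3. After submitting, you can close the console, but only once the log file has printed "Done making ooc file." This part of the functionality still needs improvement.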

Intel(R) Xeon(R)  Processor (Intel64 Harpertown)
=====  Processor composition  =====
Processors(CPUs)  : 8
Packages(sockets) : 2
Cores per package : 4
Threads per core  : 1

memory: 16G





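A trip to Al Khobar is even simpler than one to the Gold Coast: we skipped even the beach and just slept in at the hotel, read, and swam. The last one in particular could not have been more pleasant. Keep in mind this is Saudi Arabia, where oil is burned to desalinate seawater~
4. Psychological barriers are something we always have to overcome. In swimming, the barrier is fear of water. This is easy to understand, since humans are not aquatic. Once we leave the amniotic fluid, the fear of this liquid that can suffocate us is buried deep in the subconscious. The result is a tense body and uncoordinated movements when learning to swim, which lead to problems such as fatigue, choking on water, and being unable to do in the water what was practiced on land. Psychological barriers are common whenever we learn a new skill. The theory of two human states maps onto this nicely: a person is either in a learning state or a comfort state. In the learning state we keep updating our skills and ideas and solve unfamiliar problems; in the comfort state we keep polishing existing skills and solve familiar problems. Clearly the latter is the more comfortable one: we all like doing what we are good at. The better we complete a task, the more we can flaunt our own cleverness. What a shame~ Only by forcing yourself to stay in the learning state can you keep improving. So overcome your psychological barriers.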
