Projects and Products

People doing project work see product people making money faster and cannot help feeling envious, wanting to build products of their own.

They want to build a product but are unwilling to invest in one, so the only option left is to build the product inside projects, or rather, to distill a product out of projects.

In this process, the product-manager role is usually vacant. And when project teams talk about "building a product", all they actually consider is lowering module coupling a bit, making features a bit more generic, and making the design a bit more flexible. The main concern is reducing the implementation cost of the next similar project, not the users the product is supposed to serve.

Building a product inside a project usually fails. Project work is about completing a fixed set of requirements within a fixed time. The project manager cares only about delivering the current phase's requirements, and to protect the implementation and delivery schedule, compromises of every kind must be made.

The difference between a product manager and a project manager should be this: the product manager plans the product from the business side, including its features and operations, and is accountable for the product's success; the project manager implements the product on the technical side and is accountable for its stability and schedule.

There is a simple way to tell the two roles apart: a project manager doubles as a developer while tracking the project, whereas a product manager doubles as the product designer. A project can be delivered with the traditional waterfall model, but if you build a product without going agile, what comes out will certainly not be a customer-facing product.

When building a product, aligning the team's thinking also matters. You will quickly notice a problem in requirement discussions and during development: people still follow the traditional project-delivery mindset and fear the client raising new requirements. When a new feature is discussed, the most common sentence is "if we build XXX, what if the client then wants YYY?" This is exactly what agile methodology pointed out long ago: if the product delivered at the end of a project matches the design made before the project started, it is bound to be a failed product. As a project progresses, both the implementers and the client gain a deeper understanding of it and will inevitably raise requirements that better fit actual needs. That is why the so-called products built under the waterfall model so often end up unloved even by their own makers; you can imagine how clients react.

Building a product can also swing to the opposite extreme. When discussing feature design with users, I often notice a tendency to pursue excessive flexibility: chasing extreme flexibility in APIs and features beyond any foreseeable real need. This tendency is harmful. First, an overly flexible design greatly increases implementation complexity and takes a real toll on performance; second, lacking any foreseeable real requirement, the investment is badly out of proportion to the return.

Using a Multi-Character String as the Field Delimiter in Hive

The FIELDS TERMINATED BY clause in a Hive CREATE TABLE statement accepts only a single character, which is awkward when the data uses a multi-character delimiter. Our current field delimiter is '@#@'. Besides changing the delimiter itself, Hive can also support multi-character delimiters through a SerDe.

For example, for data delimited by '@#@' with 3 fields:

create table hive_test(
id string,
tour_cd string,
flt_statis_cd string )
ROW FORMAT
SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
( 'input.regex' = '^([^@#]*)@#@([^@#]*)@#@([^@#]*)',
'output.format.string' = '%1$s %2$s %3$s' )
STORED AS TEXTFILE;

input.regex is a Java-style regular expression with one capture group per field.

output.format.string simply references the capture groups in increasing order.
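
To see what those capture groups do, here is a small standalone sketch (plain Python re rather than Hive's Java regex engine, though the pattern syntax is the same here; the sample row and its values are made up) showing how the pattern splits one '@#@'-delimited line into three string fields:

```python
import re

# Same capture pattern as in the table definition: three fields
# separated by the literal delimiter '@#@'. Each [^@#]* group
# matches a field that contains neither '@' nor '#'.
pattern = re.compile(r'^([^@#]*)@#@([^@#]*)@#@([^@#]*)')

line = '1001@#@TOUR01@#@FLT99'  # hypothetical sample row
m = pattern.match(line)
print(m.groups())  # -> ('1001', 'TOUR01', 'FLT99'); one group per column
```

Note that because the groups exclude both '@' and '#', this pattern assumes field values never contain either character; if they can, the regex needs to be written more carefully.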

Note that the column types must all be string, or you will get an error such as:

FAILED: Error in metadata: java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.contrib.serde2.RegexSerDe only accepts string columns, but column[3] named id_valid_ind has type int)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Once the table is created you can load data into it, but at query time you are likely to hit this error:

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.contrib.serde2.RegexSerDe
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:891)
	at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:233)
	at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:366)
	... 33 more

Run the add jar command to put hive-contrib.jar on the classpath, then rerun the Hive statement:

hive> add jar /usr/lib/hive/lib/hive-contrib-0.9.0-Intel.jar;
Added /usr/lib/hive/lib/hive-contrib-0.9.0-Intel.jar to class path
Added resource: /usr/lib/hive/lib/hive-contrib-0.9.0-Intel.jar

Here is an example CREATE TABLE statement for a partitioned external table using a custom multi-character delimiter:

create EXTERNAL table hive_test(
seg_fr_bs string,
tour_cd string,
flt_statis_cd string )
PARTITIONED BY(dt STRING)
ROW FORMAT
SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
( 'input.regex' = '^([^@#]*)@#@([^@#]*)@#@([^@#]*)',
'output.format.string' = '%1$s %2$s %3$s' )
STORED AS TEXTFILE
LOCATION '/user/adhoc/file/pir2_base_ics_wxl';

Walking

All through my schooling, countless maxims and parables told me that effort alone leads to success, and that not succeeding only means you have not worked hard enough. Yet after quite a few years of work, I have achieved nothing.

Run ragged by one project after another, dragged along by the pressures of reality, I never had time to stop and think: effort is fine, but have I really chosen the right way? To reach a city, I can walk, ride a bike, take a bus, drive, take the high-speed rail, or fly.

My boss tells me walking is good: from a plane you cannot see the scenery on the ground. Do I really want to spend my short life this way?
But what if the scenery I want to see is precisely at the destination? By the time I want to fly, I can no longer afford the ticket.

Fresh out of school I could not see clearly and did not know which road to choose. But the trend is now increasingly obvious: the IT outsourcing business is getting harder. Yet what really matters is not the trend but the change in the trend; the change in the trend is what decides success.

Can the company transform itself? It has never lacked ideas, as the pies of all sizes drawn by the leadership make clear. What it lacks most is execution of those ideas.
To bring in the new, you must push out the old. Yet right now our limited manpower and time are spent defending yesterday, going all in on the old data-warehouse playbook.
Everyone knows the future is changing, yet everyone is busy with the past. So if we want to transform, where is the confidence supposed to come from?

Red Hat RHEL 6: kernel: nf_conntrack: table full, dropping packet.

During HAWQ stress testing I suddenly found I could not connect to the server. On inspection, the service had inexplicably failed over to the standby node.

Check the heartbeat log:

Jul 24 15:55:17 big3hd02.corp.haier.com heartbeat: [23081]: info: Link big3hd01.corp.haier.com:bond0 dead.
Jul 24 15:55:17 big3hd02.corp.haier.com ipfail: [23133]: info: Link Status update: Link big3hd01.corp.haier.com/bond0 now has status dead
Jul 24 15:55:18 big3hd02.corp.haier.com ipfail: [23133]: info: Asking other side for ping node count.
Jul 24 15:55:18 big3hd02.corp.haier.com ipfail: [23133]: info: Checking remote count of ping nodes.
Jul 24 15:55:21 big3hd02.corp.haier.com ipfail: [23133]: info: Telling other node that we have more visible ping nodes.
Jul 24 15:55:26 big3hd02.corp.haier.com heartbeat: [23081]: info: big3hd01.corp.haier.com wants to go standby [all]
Jul 24 15:55:26 big3hd02.corp.haier.com heartbeat: [23081]: info: standby: other_holds_resources: 3
Jul 24 15:55:26 big3hd02.corp.haier.com heartbeat: [23081]: info: New standby state: 2
Jul 24 15:55:26 big3hd02.corp.haier.com heartbeat: [23081]: info: New standby state: 2
Jul 24 15:55:27 big3hd02.corp.haier.com heartbeat: [23081]: info: other_holds_resources: 0
Jul 24 15:55:41 big3hd02.corp.haier.com heartbeat: [23081]: info: Link big3hd01.corp.haier.com:bond0 up.

The bond0 interface on the primary node had become unreachable, so service failed over to the standby. That was puzzling, so I checked the primary node's system log:

Jul 24 15:54:59 big3hd01 kernel: nf_conntrack: table full, dropping packet.
Jul 24 15:54:59 big3hd01 kernel: nf_conntrack: table full, dropping packet.
Jul 24 15:54:59 big3hd01 kernel: nf_conntrack: table full, dropping packet.
Jul 24 15:54:59 big3hd01 kernel: nf_conntrack: table full, dropping packet.
Jul 24 15:54:59 big3hd01 kernel: nf_conntrack: table full, dropping packet.
Jul 24 15:54:59 big3hd01 kernel: nf_conntrack: table full, dropping packet.
Jul 24 15:54:59 big3hd01 kernel: nf_conntrack: table full, dropping packet.
Jul 24 15:54:59 big3hd01 kernel: nf_conntrack: table full, dropping packet.
Jul 24 15:54:59 big3hd01 heartbeat: [3687]: ERROR: glib: Error sending packet: Operation not permitted
Jul 24 15:54:59 big3hd01 heartbeat: [3685]: ERROR: glib: ucast_write: Unable to send HBcomm packet bond0 10.135.24.2:694 len=210 [-1]: Operation not permitted
Jul 24 15:54:59 big3hd01 heartbeat: [3687]: info: glib: euid=0 egid=0
Jul 24 15:54:59 big3hd01 heartbeat: [3687]: ERROR: write_child: write failure on ping 10.135.25.254.: Operation not permitted
Jul 24 15:54:59 big3hd01 heartbeat: [3685]: ERROR: write_child: write failure on ucast bond0.: Operation not permitted
Jul 24 15:54:59 big3hd01 heartbeat: [3685]: ERROR: glib: ucast_write: Unable to send HBcomm packet bond0 10.135.24.2:694 len=198 [-1]: Operation not permitted
Jul 24 15:54:59 big3hd01 heartbeat: [3685]: ERROR: write_child: write failure on ucast bond0.: Operation not permitted
Jul 24 15:55:01 big3hd01 heartbeat: [3687]: ERROR: glib: Error sending packet: Operation not permitted
Jul 24 15:55:01 big3hd01 heartbeat: [3685]: ERROR: glib: ucast_write: Unable to send HBcomm packet bond0 10.135.24.2:694 len=198 [-1]: Operation not permitted
Jul 24 15:55:01 big3hd01 heartbeat: [3687]: info: glib: euid=0 egid=0
Jul 24 15:55:01 big3hd01 heartbeat: [3687]: ERROR: write_child: write failure on ping 10.135.25.254.: Operation not permitted
Jul 24 15:55:01 big3hd01 heartbeat: [3685]: ERROR: write_child: write failure on ucast bond0.: Operation not permitted
Jul 24 15:55:03 big3hd01 heartbeat: [3687]: ERROR: glib: Error sending packet: Operation not permitted
Jul 24 15:55:03 big3hd01 heartbeat: [3685]: ERROR: glib: ucast_write: Unable to send HBcomm packet bond0 10.135.24.2:694 len=197 [-1]: Operation not permitted
Jul 24 15:55:03 big3hd01 heartbeat: [3687]: info: glib: euid=0 egid=0
Jul 24 15:55:03 big3hd01 heartbeat: [3687]: ERROR: write_child: write failure on ping 10.135.25.254.: Operation not permitted
Jul 24 15:55:03 big3hd01 heartbeat: [3685]: ERROR: write_child: write failure on ucast bond0.: Operation not permitted
Jul 24 15:55:04 big3hd01 kernel: __ratelimit: 169 callbacks suppressed
Jul 24 15:55:04 big3hd01 kernel: nf_conntrack: table full, dropping packet.

The log was flooded with kernel: nf_conntrack: table full, dropping packet. messages.

Check the netfilter setting:

sysctl net.nf_conntrack_max

net.nf_conntrack_max = 65536

Check the current connection count:

wc -l /proc/net/nf_conntrack showed more than 50,000 entries.

Raise net.nf_conntrack_max to 200000:

sysctl -w net.nf_conntrack_max=200000

While monitoring the concurrency test, wc -l /proc/net/nf_conntrack reached more than 70,000 entries. No wonder the log was full of kernel: nf_conntrack: table full, dropping packet. messages.


The next test run went smoothly; the concurrency-test problem was solved.
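
For choosing a value, one rule of thumb from older netfilter documentation sizes the table from available RAM: nf_conntrack_max = RAM_bytes / 16384 / (arch_bits / 32), with the hash table conventionally at one eighth of that. Treat the constants as assumptions, since exact defaults vary across kernel versions. A quick sketch of the arithmetic:

```python
# Sizing heuristic for nf_conntrack_max (assumption: the
# RAMSIZE / 16384 / (ARCH / 32) rule from netfilter docs;
# exact defaults vary by kernel version).
def conntrack_max(ram_bytes: int, arch_bits: int = 64) -> int:
    return ram_bytes // 16384 // (arch_bits // 32)

def hashsize(nf_max: int) -> int:
    # the conntrack hash table is conventionally conntrack_max / 8
    return nf_max // 8

ram = 16 * 1024 ** 3  # e.g. a server with 16 GB of RAM
nf_max = conntrack_max(ram)
print(nf_max, hashsize(nf_max))  # 524288 65536
```

Also note that raising net.nf_conntrack_max via sysctl does not survive a reboot; it is worth persisting the value (e.g. in /etc/sysctl.conf) and checking the nf_conntrack module's hashsize parameter alongside the maximum.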


Distributed Algorithms in NoSQL Databases

Scalability is one of the main drivers of the NoSQL movement. As such, it encompasses distributed system coordination, failover, resource management and many other capabilities. It sounds like a big umbrella, and it is. Although it can hardly be said that NoSQL movement brought fundamentally new techniques into distributed data processing, it triggered an avalanche of practical studies and real-life trials of different combinations of protocols and algorithms. These developments gradually highlight a system of relevant database building blocks with proven practical efficiency. In this article I’m trying to provide more or less systematic description of techniques related to distributed operations in NoSQL databases.

In the rest of this article we study a number of distributed activities, like replication or failure detection, that could happen in a database. These activities, highlighted in bold below, are grouped into three major sections:

  • Data Consistency. Historically, NoSQL paid a lot of attention to tradeoffs between consistency, fault-tolerance and performance to serve geographically distributed systems, low-latency or highly available applications. Fundamentally, these tradeoffs spin around data consistency, so this section is devoted to data replication and data repair.
  • Data Placement. A database should accommodate itself to different data distributions, cluster topologies and hardware configurations. In this section we discuss how to distribute or rebalance data in such a way that failures are handled rapidly, persistence guarantees are maintained, queries are efficient, and system resources like RAM or disk space are used evenly throughout the cluster.
  • System Coordination. Coordination techniques like leader election are used in many databases to implement fault-tolerance and strong data consistency. However, even decentralized databases typically track their global state, detect failures and topology changes. This section describes several important techniques that are used to keep the system in a coherent state.

Continue reading "Distributed Algorithms in NoSQL Databases"