Information Silos

"Information silo" sounds like a fairly technical term, and a few days ago I ran into a live example. While looking something up I stumbled across a product that seemed worth borrowing ideas from for a similar product in our own department, so after a quick look I emailed it to the team. A few minutes after the mail went out, someone replied:

Er... that is the product we are using right now. It's XX's product...

Not recognizing your own product is a little embarrassing. It makes for good lunchtime banter, but it is also a textbook case of an information silo. Each person's knowledge and ability are limited; every individual is a "knowledge island". Through communication and exchange a team can grow into a "knowledge archipelago", and with better methods of knowledge collaboration and knowledge management it might even become a "knowledge continent".

There are many knowledge management systems (KMS) out there. For about a year I have been trying various KMS products in search of one that fits. But many of them, SharePoint included, emphasize management, whereas I believe the essence of a KMS is sharing. The wonderful thing about sharing knowledge is that the more it is shared, the more valuable it becomes.

My ideal KMS would have the following characteristics:

  1. Everyone must be able to maintain, update, and publish content; that is what keeps a KMS alive. Anyone should be able to fix an error in an article directly and save the change, with multiple saved versions guarding against malicious edits. Think Baidu Baike, Hudong Baike, or Wikipedia.
  2. Editing must be easy, ideally WYSIWYG, without having to learn a new markup syntax the way Wikipedia requires. Like a blog.
  3. Pasting from Word and similar tools directly into the editor while keeping the original formatting. Like DedeCMS.
  4. Uploaded PDF, PPT, Word, and Excel attachments should be viewable right in the browser, without having to download and open them. Like SlideShare.
  5. Full-text search is a must. Like Google.
  6. Solid upload and download support. Like Dropbox or Baidu Cloud.
  7. Points and levels, so that people who share a lot get a sense of achievement.

With that many requirements, I never found a single product that satisfies them all for knowledge management and sharing. Yet in my experience, missing any one of these features makes effective knowledge sharing impossible.

Writing this, I wonder whether I should build this ideal product myself. There is no such product on the market, and the need is clearly strong. When you build it, your own needs are the best requirements: no interviews, no canvassing for opinions. A product that you and your team cannot put down is a product that genuinely attracts people.

Musings: The Road Ahead

Life is short, just a few tens of thousands of days, so time is our most precious asset. To achieve maximization of utility within the same amount of time, the road we choose and the things we do deserve careful consideration.

Have we maximized utility? A moment's honesty says no, neither I nor many of the colleagues around me have. We could all do better, much better. Like many talented people around me, I have been buried in trivia, wasting time and talent day after day, and looking back that feels like a real waste. Working hard, carefully, and wholeheartedly matters, but as Peter Drucker, the father of modern management, put it in On the Profession of Management:

There is nothing so useless as doing efficiently that which should not be done at all.

So what matters more is choosing to do the right things. Only the right things let you maximize utility.

Does that mean we should quit right away and switch platforms? On why employees resign, Jack Ma put it well:

Employees leave for all sorts of stated reasons, but only two are real: the money was not enough, or their heart was wronged. It all boils down to one thing: the job stopped being satisfying. A departing employee who still takes the trouble to find a plausible excuse is just saving your face, not wanting to spell out how bad your management is or how thoroughly disappointed they are. Think about it: human nature really is kind. As a manager, you must be willing to reflect.

According to Maslow's hierarchy of needs, human needs fall into five levels that rise like a staircase: physiological needs, safety, love and belonging, esteem, and self-actualization.

Painting a grand vision is fine; it addresses the need for self-actualization. But life is ultimately practical, and without the lower levels of the pyramid it will not last. That is why the companies people admire share one key trait: they satisfy both of Jack Ma's points, the pay is sufficient and the heart is not wronged. If only one of the two is lacking, people can hold on for a while; if both are, they can only leave.

So what road should we take next? We like to say that failure is the mother of success, or "learn to fail". But learning from failure only tells us that one particular road fails; it still does not reveal the right one. What we should really study is success, learning from the very best in the most successful industries. In the software implementation and outsourcing business we are in now, I honestly do not know who the most successful player is, nationally or globally. Perhaps you can grow to tens of thousands of people like Pactera, but what does a company that size get? Thin margins and meager salaries and benefits. Everyone in implementation and outsourcing has a hard time, so when an implementation shop wants to retain people but cannot afford raises, painting grand visions is all that is left. People say the bosses love painting them; I truly understand why.

So I once decided I would not stay in implementation. Where does an implementation career lead? I honestly could not see it, and the life my bosses lead is not the life I want.

So, as I said, learn from those who succeed. Look at what today's successful companies are doing: they build products. Microsoft does not custom-build software for each customer, and Google is not doing one-off data pulls. Building products means sustainable, repeatable success. A mature internet and an increasingly complete online payment system mean you no longer have to pour huge resources into channel deployment the way traditional products did; you may not even need traditional channels at all.

I want to switch to building products rather than staying in the labor-intensive grind of the implementation business. And even if building products is out of reach for now, given how exhausting this is and how heavy the financial pressure, I might as well join a multinational: at least the pay is decent, business trips mean four- or five-star hotels, and you get home every two weeks, so family tensions would not run so high.

After all this talk, it comes down to one sentence: as long as you keep the dream in your heart, you will surely succeed!

My 2012

2012 slipped quietly by, with growth as well as disappointment. I was lost for a while, but I think I now know what I need.

What road do I want to take, and what do I need? I have asked myself many times.

Lesson of 2012: having heard Jack Ma's startup story, age does not matter; never give up the dream in your heart, and remember the road you mean to take.

Most rewarding books of 2012: Rework and The Lean Startup.

Integrating Red Hat Enterprise Linux 6 Update 3 (RHEL 6.3) with Windows Server 2008 Active Directory

I had not touched Windows for a long time. This week I was working on Hadoop security and planned to switch Hadoop to Kerberos for user authentication. There are plenty of Kerberos options on Linux, but since Microsoft Active Directory is widely used in enterprises, I decided to set up a Windows Server domain controller to provide both LDAP directory lookups and Kerberos authentication. I felt quite at home with AD, having worked with it years ago, so I set to work full of enthusiasm.

On Monday I installed Windows Server 2003, which is where the sad story began. I configured AD and DNS, then struggled for two days; I will spare you the details. In hindsight, the LDAP attribute names on that server did not match what the Linux side expected, yet I kept mapping them using the Windows Server 2008 attribute names, so of course it never worked. Although that attempt failed, it gave me a much deeper understanding of AD/Linux integration.

Details used throughout this article:

Domain controller: BIDC.BI.HYLANDTEC.COM

Domain: BI.HYLANDTEC.COM

LDAP bind user for directory lookups: binduser

1. Windows Configuration

After installing AD on Windows Server 2008, be sure to also install the NIS extension (Identity Management for UNIX); a new "UNIX Attributes" tab then appears in the user and group properties. Only after setting the uid, gid, home directory, and so on can AD users and groups be used from Linux.

With NIS installed, create a new group:

Then set the GID of linuxgrp:

Then create the user binduser:

And set the user's UNIX-related attributes:

When done, inspect the user's attributes with the built-in ADSI Edit (ADSIEDIT.MSC), Mark Russinovich's Active Directory Explorer, ldapsearch, or a similar tool. The screenshot below is from Active Directory Explorer; for the complete set of attributes, ADSI Edit is still the better choice:

The loginShell, uid, gid, and other attributes that Linux users require are now present. The AD side is done at this point; the rest of the work happens on Linux.

2. Linux Configuration

On Linux there are several options: Winbind, SSSD, or nslcd. Since nslcd was introduced in RHEL 6, this article uses the nslcd approach on RHEL 6.3.

yum install pam_krb5 pam_ldap nss-pam-ldapd nscd openldap-clients

Once installed, first run the following command to check connectivity to the AD server; if it cannot connect, check the firewall and network settings.

ldapsearch -x -LLL -H 'ldap://bidc.bi.hylandtec.com' -b 'DC=BI,DC=HYLANDTEC,DC=COM' -E pr=200/noprompt -D "[email protected]" -W -s sub "(cn=binduser)"

The command should return something like the result below; note that the UNIX-related attributes are read successfully.
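
An abridged, illustrative result (the values are assumed from the setup described above) might look like this:

dn: CN=binduser,CN=Users,DC=BI,DC=HYLANDTEC,DC=COM
cn: binduser
sAMAccountName: binduser
uidNumber: 10000
gidNumber: 10000
unixHomeDirectory: /home/binduser
loginShell: /bin/sh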

Once the connection to the LDAP server works, run the following as root:

 authconfig-tui

Configure the authentication settings as follows:

Configure LDAP:

Configure Kerberos:

After saving with OK, you will see the nslcd service being started.
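
For reference, the same settings can also be applied non-interactively; a rough sketch of the equivalent authconfig command for the servers in this article would be:

authconfig --enableldap --enableldapauth \
    --ldapserver=ldap://bidc.bi.hylandtec.com \
    --ldapbasedn="DC=BI,DC=HYLANDTEC,DC=COM" \
    --enablekrb5 --krb5realm=BI.HYLANDTEC.COM \
    --krb5kdc=bidc.bi.hylandtec.com --krb5adminserver=bidc.bi.hylandtec.com \
    --update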

Check that the Name Service Switch is now configured to use LDAP: the following three lines in /etc/nsswitch.conf should include ldap.

passwd:     files ldap

shadow:     files ldap

group:      files ldap

Because AD does not allow anonymous access to the directory, and because AD attributes have to be mapped to their UNIX counterparts, nslcd still needs some manual configuration.

Edit /etc/nslcd.conf (note: with the NIS component of SFU installed, Windows Server 2003 uses LDAP attribute names that differ from 2008's; it can still be made to work, but the configuration needs quite a few more map entries).

binddn CN=binduser,CN=Users,DC=BI,DC=HYLANDTEC,DC=COM

bindpw bind..321

# Mappings for Active Directory

pagesize 1000

referrals off

filter passwd (&(objectClass=user)(!(objectClass=computer))(uidNumber=*)(unixHomeDirectory=*))

map    passwd uid              sAMAccountName

map    passwd homeDirectory    unixHomeDirectory

map    passwd gecos            displayName

filter shadow (&(objectClass=user)(!(objectClass=computer))(uidNumber=*)(unixHomeDirectory=*))

map    shadow uid              sAMAccountName

map    shadow shadowLastChange pwdLastSet

filter group  (objectClass=group)

map    group  uniqueMember     member
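
The lines above are only the manual additions; authconfig should already have written the connection settings near the top of /etc/nslcd.conf. If they are missing, for the hosts in this article they would look roughly like:

uri ldap://bidc.bi.hylandtec.com/
base DC=BI,DC=HYLANDTEC,DC=COM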

After editing, restart nslcd with /etc/init.d/nslcd restart.

Once it has restarted, run getent passwd:

nscd:x:28:28:NSCD Daemon:/:/sbin/nologin

nslcd:x:65:55:LDAP Client User:/:/sbin/nologin

binduser:*:10000:10000:binduser:/home/binduser:/bin/sh

If AD users such as binduser show up, nslcd is configured correctly.
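
To check a single account instead of listing the whole database, a quick query would be:

getent passwd binduser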

Users coming from LDAP have no home directory yet, so home directories must be created automatically. Add the following line to /etc/pam.d/sshd and /etc/pam.d/login:

session required pam_mkhomedir.so skel=/etc/skel umask=0022

Kerberos requires the clocks of all servers involved to be in sync (the default tolerance is only a few minutes), so synchronize time with the domain controller.

ntpdate bidc.bi.hylandtec.com
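
ntpdate only corrects the clock once. To keep it synchronized, one option (a sketch, assuming the stock ntpd on RHEL 6) is to point ntpd at the domain controller:

echo "server bidc.bi.hylandtec.com" >> /etc/ntp.conf
chkconfig ntpd on
service ntpd start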

Finally, check that the hostname is set correctly in /etc/hosts:

172.16.130.227 rhel227.bi.hylandtec.com rhel227

Set the hostname with the hostname command, and check that it is also set correctly in /etc/sysconfig/network.

hostname rhel227.bi.hylandtec.com

Edit /etc/krb5.conf and add the three settings default_tgs_enctypes, default_tkt_enctypes, and permitted_enctypes to the [libdefaults] section, for example:

[libdefaults]

default_realm = BI.HYLANDTEC.COM

dns_lookup_realm = false

dns_lookup_kdc = false

ticket_lifetime = 24h

renew_lifetime = 7d

forwardable = true

default_tgs_enctypes = rc4-hmac

default_tkt_enctypes = rc4-hmac

permitted_enctypes = rc4-hmac
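
Before joining the domain, it is worth confirming that Kerberos itself works against the DC. A quick check, using the binduser account created earlier, might be:

kinit [email protected]
klist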

Edit /etc/samba/smb.conf as follows; workgroup is the first component of the domain name:

workgroup = BI

server string = Samba Server Version %v

netbios name = RHEL227

security = ads

realm = BI.HYLANDTEC.COM

dedicated keytab file = /etc/krb5.keytab

kerberos method = system keytab

password server = BIDC.BI.HYLANDTEC.COM
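
The net command used in the next step ships with the Samba client packages; if it is not present, something like the following should provide it (the exact package split on RHEL 6 is an assumption):

yum install samba-common samba-client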

Join the machine to the domain with a domain administrator account:

[root@rhel227 ~]# net ads join OSNAME=RHEL OSVer=6 -U Administrator

Enter Administrator’s password:

Using short domain name -- BI

Joined 'RHEL227' to realm 'BI.HYLANDTEC.COM'

Check the keytab; if the host principals appear, the join was successful.

[root@rhel227 security]# klist -ke

Keytab name: WRFILE:/etc/krb5.keytab

KVNO Principal
---- --------------------------------------------------------------------------

3 host/[email protected] (des-cbc-crc)

3 host/[email protected] (des-cbc-md5)

3 host/[email protected] (arcfour-hmac)

3 host/[email protected] (des-cbc-crc)

3 host/[email protected] (des-cbc-md5)

3 host/[email protected] (arcfour-hmac)

3 [email protected] (des-cbc-crc)

3 [email protected] (des-cbc-md5)

3 [email protected] (arcfour-hmac)

Now we can try logging in to the Linux host with a user from AD.

[root@rhel232 ~]# ssh [email protected]

[email protected]’s password:

Creating directory ‘/home/binduser’.

-sh-4.1$ hostname

rhel227.bi.hylandtec.com

-sh-4.1$ id binduser

uid=10000(binduser) gid=10000(linuxgrp) groups=10000(linuxgrp)

The login succeeds. Mission accomplished.

Configuring the Hadoop 0.20.2 Capacity Scheduler

This article uses CDH4 with MapReduce 0.20 (MRv1).

The cluster has 6 tasktracker machines, each configured with 2 map slots and 2 reduce slots.

Queue     Capacity    Maximum Capacity
default   10%         (unset, -1 = no limit)
edp       50%         90%
hive      40%         80%
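
With 6 tasktrackers at 2 slots each, the cluster has 12 map slots and 12 reduce slots, so these percentages work out to roughly 1 slot for default (10%), 6 slots for edp (50%, bursting to about 10 at 90%), and 4 slots for hive (40%, bursting to about 9 at 80%). This matches the scheduling information shown later in this article.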

After allocating capacity, set ACLs on the queues so that only users with the right permission can submit jobs to a given queue.

Queue     Who may submit
default   all users
edp       xhdeng
hive      root and hive

Log in to the jobtracker machine.

Copy hadoop-capacity-scheduler-2.0.0-mr1-cdh4.0.1.jar from /usr/lib/hadoop-0.20-mapreduce/contrib/capacity-scheduler/ into /usr/lib/hadoop/lib/, as shown below.
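
A minimal sketch of that copy, using the paths given above:

cp /usr/lib/hadoop-0.20-mapreduce/contrib/capacity-scheduler/hadoop-capacity-scheduler-2.0.0-mr1-cdh4.0.1.jar /usr/lib/hadoop/lib/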

Edit mapred-site.xml under /etc/hadoop/conf and add:

<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>

<property>
<name>mapred.queue.names</name>
<value>default,edp,hive</value>
</property>

<property>
<name>mapred.acls.enabled</name>
<value>true</value>
<description> Specifies whether ACLs should be checked
for authorization of users for doing various queue and job level operations.
ACLs are disabled by default. If enabled, access control checks are made by
JobTracker and TaskTracker when requests are made by users for queue
operations like submit job to a queue and kill a job in the queue and job
operations like viewing the job-details (See mapreduce.job.acl-view-job)
or for modifying the job (See mapreduce.job.acl-modify-job) using
Map/Reduce APIs, RPCs or via the console and web user interfaces.
</description>
</property>

Create capacity-scheduler.xml under /etc/hadoop/conf:

<?xml version="1.0"?>

<!-- This is the configuration file for the resource manager in Hadoop. -->
<!-- You can configure various scheduling parameters related to queues. -->
<!-- The properties for a queue follow a naming convention,such as, -->
<!-- mapred.capacity-scheduler.queue.<queue-name>.property-name. -->

<configuration>

<property>
<name>mapred.capacity-scheduler.maximum-system-jobs</name>
<value>6</value>
<description>Maximum number of jobs in the system which can be initialized,
concurrently, by the CapacityScheduler.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.default.capacity</name>
<value>10</value>
<description>Percentage of the number of slots in the cluster that are
to be available for jobs in this queue.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.default.maximum-capacity</name>
<value>-1</value>
<description>
maximum-capacity defines a limit beyond which a queue cannot use the capacity of the cluster.
This provides a means to limit how much excess capacity a queue can use. By default, there is no limit.
The maximum-capacity of a queue can only be greater than or equal to its minimum capacity.
Default value of -1 implies a queue can use complete capacity of the cluster.

This property could be to curtail certain jobs which are long running in nature from occupying more than a
certain percentage of the cluster, which in the absence of pre-emption, could lead to capacity guarantees of
other queues being affected.

One important thing to note is that maximum-capacity is a percentage , so based on the cluster’s capacity
the max capacity would change. So if large no of nodes or racks get added to the cluster , max Capacity in
absolute terms would increase accordingly.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.default.supports-priority</name>
<value>false</value>
<description>If true, priorities of jobs will be taken into
account in scheduling decisions.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
<value>100</value>
<description> Each queue enforces a limit on the percentage of resources
allocated to a user at any given time, if there is competition for them.
This user limit can vary between a minimum and maximum value. The former
depends on the number of users who have submitted jobs, and the latter is
set to this property value. For example, suppose the value of this
property is 25. If two users have submitted jobs to a queue, no single
user can use more than 50% of the queue resources. If a third user submits
a job, no single user can use more than 33% of the queue resources. With 4
or more users, no user can use more than 25% of the queue’s resources. A
value of 100 implies no user limits are imposed.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.default.user-limit-factor</name>
<value>1</value>
<description>The multiple of the queue capacity which can be configured to
allow a single user to acquire more slots.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name>
<value>200000</value>
<description>The maximum number of tasks, across all jobs in the queue,
which can be initialized concurrently. Once the queue’s jobs exceed this
limit they will be queued on disk.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
<description>The maximum number of tasks per-user, across all the of the
user’s jobs in the queue, which can be initialized concurrently. Once the
user’s jobs exceed this limit they will be queued on disk.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name>
<value>10</value>
<description>The multipe of (maximum-system-jobs * queue-capacity) used to
determine the number of jobs which are accepted by the scheduler.
</description>
</property>

<!-- The default configuration settings for the capacity task scheduler -->
<!-- The default values would be applied to all the queues which don't have -->
<!-- the appropriate property for the particular queue -->
<property>
<name>mapred.capacity-scheduler.default-supports-priority</name>
<value>false</value>
<description>If true, priorities of jobs will be taken into
account in scheduling decisions by default in a job queue.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.default-minimum-user-limit-percent</name>
<value>100</value>
<description>The percentage of the resources limited to a particular user
for the job queue at any given point of time by default.
</description>
</property>
<property>
<name>mapred.capacity-scheduler.default-user-limit-factor</name>
<value>1</value>
<description>The default multiple of queue-capacity which is used to
determine the amount of slots a single user can consume concurrently.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.default-maximum-active-tasks-per-queue</name>
<value>200000</value>
<description>The default maximum number of tasks, across all jobs in the
queue, which can be initialized concurrently. Once the queue’s jobs exceed
this limit they will be queued on disk.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.default-maximum-active-tasks-per-user</name>
<value>100000</value>
<description>The default maximum number of tasks per-user, across all the of
the user’s jobs in the queue, which can be initialized concurrently. Once
the user’s jobs exceed this limit they will be queued on disk.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.default-init-accept-jobs-factor</name>
<value>10</value>
<description>The default multipe of (maximum-system-jobs * queue-capacity)
used to determine the number of jobs which are accepted by the scheduler.
</description>
</property>

<!-- Capacity scheduler Job Initialization configuration parameters -->
<property>
<name>mapred.capacity-scheduler.init-poll-interval</name>
<value>5000</value>
<description>The amount of time in miliseconds which is used to poll
the job queues for jobs to initialize.
</description>
</property>
<property>
<name>mapred.capacity-scheduler.init-worker-threads</name>
<value>5</value>
<description>Number of worker threads which would be used by
Initialization poller to initialize jobs in a set of queue.
If number mentioned in property is equal to number of job queues
then a single thread would initialize jobs in a queue. If lesser
then a thread would get a set of queues assigned. If the number
is greater then number of threads would be equal to number of
job queues.
</description>
</property>

<property>
<name>mapred.capacity-scheduler.queue.hive.capacity</name>
<value>40</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.maximum-capacity</name>
<value>80</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.supports-priority</name>
<value>false</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.minimum-user-limit-percent</name>
<value>20</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.user-limit-factor</name>
<value>10</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.init-accept-jobs-factor</name>
<value>100</value>
</property>

<!-- queue: edp -->
<property>
<name>mapred.capacity-scheduler.queue.edp.capacity</name>
<value>50</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.edp.maximum-capacity</name>
<value>90</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.edp.supports-priority</name>
<value>false</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.edp.minimum-user-limit-percent</name>
<value>100</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.edp.user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.edp.maximum-initialized-active-tasks</name>
<value>200000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.edp.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
</property>
<property>
<name>mapred.capacity-scheduler.queue.edp.init-accept-jobs-factor</name>
<value>10</value>
</property>

</configuration>

Create mapred-queue-acls.xml under /etc/hadoop/conf:

<configuration>
<property>
<name>mapred.queue.edp.acl-submit-job</name>
<value>xhdeng</value>
<description> Comma separated list of user and group names that are allowed
to submit jobs to the ‘default’ queue. The user list and the group list
are separated by a blank. For e.g. user1,user2 group1,group2.
If set to the special value ‘*’, it means all users are allowed to
submit jobs. If set to ‘ ‘(i.e. space), no user will be allowed to submit
jobs.

It is only used if authorization is enabled in Map/Reduce by setting the
configuration property mapred.acls.enabled to true.
Irrespective of this ACL configuration, the user who started the cluster and
cluster administrators configured via
mapreduce.cluster.administrators can submit jobs.
</description>
</property>
<property>
<name>mapred.queue.hive.acl-submit-job</name>
<value>hive,root</value>
<description> Comma separated list of user and group names that are allowed
to submit jobs to the ‘default’ queue. The user list and the group list
are separated by a blank. For e.g. user1,user2 group1,group2.
If set to the special value ‘*’, it means all users are allowed to
submit jobs. If set to ‘ ‘(i.e. space), no user will be allowed to submit
jobs.

It is only used if authorization is enabled in Map/Reduce by setting the
configuration property mapred.acls.enabled to true.
Irrespective of this ACL configuration, the user who started the cluster and
cluster administrators configured via
mapreduce.cluster.administrators can submit jobs.
</description>
</property>
<property>
<name>mapred.queue.default.acl-submit-job</name>
<value>*</value>
<description> Comma separated list of user and group names that are allowed
to submit jobs to the ‘default’ queue. The user list and the group list
are separated by a blank. For e.g. user1,user2 group1,group2.
If set to the special value ‘*’, it means all users are allowed to
submit jobs. If set to ‘ ‘(i.e. space), no user will be allowed to submit
jobs.

It is only used if authorization is enabled in Map/Reduce by setting the
configuration property mapred.acls.enabled to true.
Irrespective of this ACL configuration, the user who started the cluster and
cluster administrators configured via
mapreduce.cluster.administrators can submit jobs.
</description>
</property>
</configuration>

Restart the jobtracker, then open Hadoop's Map/Reduce administration page and you will see that the new scheduler has taken effect.
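
On CDH4 with MRv1 the restart would typically look something like the following (the exact service name is an assumption; adjust to however your jobtracker is managed):

service hadoop-0.20-mapreduce-jobtracker restart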

Scheduling Information

Queue Name State Scheduling Information
hive running Queue configuration
Capacity Percentage: 40.0%
User Limit: 20%
Priority Supported: NO
————-
Map tasks
Capacity: 4 slots
Maximum capacity: 9 slots
Used capacity: 0 (0.0% of Capacity)
Running tasks: 0
————-
Reduce tasks
Capacity: 4 slots
Maximum capacity: 9 slots
Used capacity: 0 (0.0% of Capacity)
Running tasks: 0
————-
Job info
Number of Waiting Jobs: 0
Number of Initializing Jobs: 0
Number of users who have submitted jobs: 0
default running Queue configuration
Capacity Percentage: 10.0%
User Limit: 100%
Priority Supported: NO
————-
Map tasks
Capacity: 1 slots
Used capacity: 0 (0.0% of Capacity)
Running tasks: 0
————-
Reduce tasks
Capacity: 1 slots
Used capacity: 0 (0.0% of Capacity)
Running tasks: 0
————-
Job info
Number of Waiting Jobs: 0
Number of Initializing Jobs: 0
Number of users who have submitted jobs: 0
edp running Queue configuration
Capacity Percentage: 50.0%
User Limit: 100%
Priority Supported: NO
————-
Map tasks
Capacity: 6 slots
Maximum capacity: 10 slots
Used capacity: 0 (0.0% of Capacity)
Running tasks: 0
————-
Reduce tasks
Capacity: 6 slots
Maximum capacity: 10 slots
Used capacity: 0 (0.0% of Capacity)
Running tasks: 0
————-
Job info
Number of Waiting Jobs: 0
Number of Initializing Jobs: 0
Number of users who have submitted jobs: 0

If you later change the scheduler's configuration file, there is no need to restart the whole jobtracker; refreshing with the following command is enough.

hadoop mradmin -refreshQueues

To see the queue ACLs of the current user, run:

mapred queue -showacls

Queue acls for user : root

Queue Operations
=====================
hive submit-job,administer-jobs
default submit-job,administer-jobs
edp administer-jobs

As root, submit a test job to the edp queue:

sudo  hadoop jar    hadoop-mapreduce-client-jobclient-2.0.0-cdh4.0.1-tests.jar TestDFSIO -D mapred.job.queue.name=edp  -write -nrFiles 6 -fileSize 1000

Then, predictably, it fails:

12/11/23 16:16:19 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: User root cannot perform operation SUBMIT_JOB on queue edp.
Please run “hadoop queue -showacls” command to find the queues you have access to .
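
This is expected: the edp queue's ACL only grants submit-job to xhdeng (see mapred-queue-acls.xml above), and root is not on the list. Submitting the same job as the permitted user should go through; a sketch, assuming xhdeng exists as a local account:

sudo -u xhdeng hadoop jar hadoop-mapreduce-client-jobclient-2.0.0-cdh4.0.1-tests.jar TestDFSIO -D mapred.job.queue.name=edp -write -nrFiles 6 -fileSize 1000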

Next, as root, submit a test job without specifying a queue, so that it lands in the default queue:

sudo  hadoop jar    hadoop-mapreduce-client-jobclient-2.0.0-cdh4.0.1-tests.jar TestDFSIO  -write -nrFiles 6 -fileSize 1000

This time the job is accepted and runs in the default queue:

default running Queue configuration
Capacity Percentage: 10.0%
User Limit: 100%
Priority Supported: NO
————-
Map tasks
Capacity: 1 slots
Used capacity: 1 (100.0% of Capacity)
Running tasks: 1
Active users:
User ‘root’: 1 (100.0% of used capacity)
————-
Reduce tasks
Capacity: 1 slots
Used capacity: 0 (0.0% of Capacity)
Running tasks: 0
————-
Job info
Number of Waiting Jobs: 0
Number of Initializing Jobs: 0
Number of users who have submitted jobs: 1

It runs, but unfortunately it can only use a single slot...

So let's submit it once more, this time to the hive queue (which root is allowed to use), as sketched below.
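
The only change needed is the queue name; a sketch based on the command used earlier:

sudo hadoop jar hadoop-mapreduce-client-jobclient-2.0.0-cdh4.0.1-tests.jar TestDFSIO -D mapred.job.queue.name=hive -write -nrFiles 6 -fileSize 1000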

Queue Name State Scheduling Information
hive running Queue configuration
Capacity Percentage: 40.0%
User Limit: 20%
Priority Supported: NO
————-
Map tasks
Capacity: 4 slots
Maximum capacity: 9 slots
Used capacity: 6 (150.0% of Capacity)
Running tasks: 6
Active users:
User ‘root’: 6 (100.0% of used capacity)
————-
Reduce tasks
Capacity: 4 slots
Maximum capacity: 9 slots
Used capacity: 0 (0.0% of Capacity)
Running tasks: 0
————-
Job info
Number of Waiting Jobs: 0
Number of Initializing Jobs: 0
Number of users who have submitted jobs: 1

Note that the hive queue's guaranteed capacity is 4 slots, yet the used capacity is 6 (150.0% of capacity). Because hive's maximum-capacity is 9 slots, the queue can borrow idle slots beyond its guarantee, which is exactly what we expect.

With the Capacity Scheduler, cluster resources are managed and controlled effectively: no single user can bring the whole cluster to its knees, yet the controls are not so rigid that resources sit idle.