Suzf Blog

[faq] Couchbase can't start because of the IP address nightmare

Jeffrey Aug 15, 2016 faq

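After a long break, I started up a Couchbase instance that I had created on a local VM and found that the service would not run properly. The log showed the following: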

# /opt/couchbase/var/lib/couchbase/logs/info.log
[ns_server:warn,2016-08-15T13:47:59.536+08:00,nonode@nohost:dist_manager<0.129.0>:dist_manager:wait_for_address:121]Cannot listen on address `cb1.suzf.net`: eaddrnotavail
[ns_server:info,2016-08-15T13:47:59.536+08:00,nonode@nohost:dist_manager<0.129.0>:dist_manager:wait_for_address:125]Configured address `cb1.suzf.net` seems to be invalid. Giving OS a chance to bring it up.
[ns_server:error,2016-08-15T13:48:00.537+08:00,nonode@nohost:dist_manager<0.129.0>:dist_manager:init:178]Configured address `cb1.suzf.net` seems to be invalid. Will refuse to start for safety reasons.

The log says that the hostname cb1.suzf.net is invalid, so the service refuses to start. Oh, now I remember: I had edited the /etc/hosts entries at some point.

#grep cb /etc/hosts
172.16.9.10     cb1.suzf.net
172.16.9.20     cb2.suzf.net

But the current IP address is:

#ifconfig eth1|grep 'inet addr'
          inet addr:172.16.9.11  Bcast:172.16.9.255  Mask:255.255.255.0

So let's correct the hosts entry:

#grep cb /etc/hosts
172.16.9.11     cb1.suzf.net
172.16.9.20     cb2.suzf.net

It worked.
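
The failure above comes down to two checks: does the configured hostname resolve, and is the resulting address actually assigned to a local interface? A minimal Python sketch of the same two checks follows. The function name `check_hostname` and its return strings are illustrative assumptions, not part of Couchbase.

```python
import socket


def check_hostname(hostname):
    """Return 'ok', 'unresolvable', or 'not-local' for the given hostname."""
    try:
        # Resolve via /etc/hosts or DNS, as any other client would.
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        return "unresolvable"

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the OS pick a free port; bind fails (typically with
        # EADDRNOTAVAIL) when no local interface carries this address.
        sock.bind((ip, 0))
    except OSError:
        return "not-local"
    finally:
        sock.close()
    return "ok"


if __name__ == "__main__":
    print("127.0.0.1 ->", check_hostname("127.0.0.1"))
```

With the stale /etc/hosts entry (172.16.9.10 on a machine that holds 172.16.9.11), a check like this would presumably report `not-local` for cb1.suzf.net, which matches the `eaddrnotavail` in the log.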

There are several ways you can provide hostnames:

When installing a Couchbase Server on a machine.
When adding a node to an existing cluster for an online upgrade.
Using a REST API call.

Hostname Errors

Provide the hostname and port for the node and administrative credentials for the cluster. The value you provide for hostname must be a valid hostname for the node. Possible errors that can occur:

Could not resolve the host name. The host name you provide as a parameter does not resolve to an IP address.
Could not listen. The host name resolves to an IP address, but no network connection exists for the address.
Could not rename the node because name was fixed at server start-up.
Could not save address after rename.
Requested name host name is not allowed. Invalid host name provided.
Renaming is disallowed for nodes that are already part of a cluster.

Excerpted from: http://developer.couchbase.com/documentation/server/4.1/install/hostnames.html

pip-faq: Error -5 while decompressing data: incomplete or truncated stream

Jeffrey Dec 30, 2015 faq

When I ran `pip install flask-bootstrap`, I got this error:
-- error: Error -5 while decompressing data: incomplete or truncated stream

Installing and uninstalling other packages worked fine; only flask-bootstrap triggered this error.

Version information:
#pip --version
pip 7.1.2 from /usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg (python 2.7)
#python --version
Python 2.7.3

The full error output:

^_^[15:36:31][root@master01 ~]#pip install flask-bootstrap
Collecting flask-bootstrap
Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/basecommand.py", line 211, in main
    status = self.run(options, args)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/commands/install.py", line 294, in run
    requirement_set.prepare_files(finder)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/req/req_set.py", line 334, in prepare_files
    functools.partial(self._prepare_file, finder))
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/req/req_set.py", line 321, in _walk_req_to_install
    more_reqs = handler(req_to_install)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/req/req_set.py", line 461, in _prepare_file
    req_to_install.populate_link(finder, self.upgrade)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/req/req_install.py", line 250, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/index.py", line 486, in find_requirement
    all_versions = self._find_all_versions(req.name)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/index.py", line 404, in _find_all_versions
    index_locations = self._get_index_urls_locations(project_name)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/index.py", line 378, in _get_index_urls_locations
    page = self._get_page(main_index_url)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/index.py", line 818, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/index.py", line 928, in get_page
    "Cache-Control": "max-age=600",
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/requests/sessions.py", line 477, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/download.py", line 373, in request
    return super(PipSession, self).request(method, url, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/cachecontrol/adapter.py", line 36, in send
    cached_response = self.controller.cached_request(request)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/cachecontrol/controller.py", line 102, in cached_request
    resp = self.serializer.loads(request, self.cache.get(cache_url))
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/cachecontrol/serialize.py", line 108, in loads
    return getattr(self, "_loads_v{0}".format(ver))(request, data)
  File "/usr/local/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/cachecontrol/serialize.py", line 164, in _loads_v2
    cached = json.loads(zlib.decompress(data).decode("utf8"))
error: Error -5 while decompressing data: incomplete or truncated stream

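It turned out that pip's local cache was corrupted (in my environment it lives in ~/.cache/pip by default).
As a test, I ran `pip install --no-cache-dir flask-bootstrap`, and it worked.
To confirm that the cache was the culprit, I ran: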

pip uninstall flask-bootstrap
rm -rf ~/.cache/pip/*
pip install flask-bootstrap

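Or: pip install --no-cache-dir flask-bootstrap

This time it succeeded, whereas before it had always failed.
I can't say for certain that the cache was the root cause, but my guess is that an interrupted pip download left corrupted data in the cache.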
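
The last frame of the traceback shows pip's cache layer calling `zlib.decompress` on a cached entry. A minimal sketch, assuming a cache entry is simply zlib-compressed JSON, shows how a truncated entry produces the same kind of error. The helper `load_cached` and the sample payload are made up for illustration.

```python
import json
import zlib


def load_cached(blob):
    """Decode a zlib-compressed JSON blob; return None if it is corrupt."""
    try:
        return json.loads(zlib.decompress(blob).decode("utf8"))
    except zlib.error as exc:
        # A truncated stream raises zlib.error (Error -5).
        print("corrupt cache entry:", exc)
        return None


payload = {"url": "https://example.invalid/simple/"}  # made-up sample data
entry = zlib.compress(json.dumps(payload).encode("utf8"))
truncated = entry[: len(entry) // 2]  # simulate an interrupted write

print(load_cached(entry))
print(load_cached(truncated))
```

Treating a corrupt entry as a cache miss, as this sketch does, is one possible design; pip 7.1.2 instead lets the exception propagate, which is why the whole install failed.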

Yum_Faq: unpacking of archive failed on file XXX: cpio: rename

Jeffrey Aug 10, 2015 faq

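Today I ran a large system update that upgraded more than a hundred packages, but one package failed to install.
The error message was: error: unpacking of archive failed on file /etc/inittab: cpio: rename
Taken literally, the error means that copying or replacing a file failed, possibly because of permission controls.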

[10:31:12][root@XXX ~]#yum upgrade -y
Loaded plugins: fastestmirror
Setting up Upgrade Process
...

Running Transaction
Updating : initscripts-9.03.49-1.el6.centos.x86_64 1/2
Error unpacking rpm package initscripts-9.03.49-1.el6.centos.x86_64
error: unpacking of archive failed on file /etc/inittab: cpio: rename
Verifying : initscripts-9.03.49-1.el6.centos.x86_64 1/2
initscripts-9.03.46-1.el6.centos.1.x86_64 was supposed to be removed but is not!
Verifying : initscripts-9.03.46-1.el6.centos.1.x86_64 2/2

Failed:
initscripts.x86_64 0:9.03.46-1.el6.centos.1 initscripts.x86_64 0:9.03.49-1.el6.centos

Complete!

So the problem lies with /etc/inittab: the update cannot modify the file.
Perhaps the immutable attribute has been set on it; let's check first.

[10:32:22][root@XXX ~]#lsattr /etc/inittab
----i---------- /etc/inittab

Let's remove the attribute:
[10:32:55][root@XXX ~]#chattr -i /etc/inittab

Now let's run the update again to see whether the problem is solved:
[10:33:07][root@XXX ~]#yum upgrade -y
Loaded plugins: fastestmirror
Setting up Upgrade Process
...

Downloading Packages:
initscripts-9.03.49-1.el6.centos.x86_64.rpm | 945 kB 00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Updating : initscripts-9.03.49-1.el6.centos.x86_64 1/2
warning: /etc/sysctl.conf created as /etc/sysctl.conf.rpmnew
Cleanup : initscripts-9.03.46-1.el6.centos.1.x86_64 2/2
Verifying : initscripts-9.03.49-1.el6.centos.x86_64 1/2
Verifying : initscripts-9.03.46-1.el6.centos.1.x86_64 2/2

Updated:
initscripts.x86_64 0:9.03.49-1.el6.centos

Complete!

With that, the problem is solved.
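
Before a large upgrade it can help to look for immutable files up front. Below is a small sketch of how an `lsattr` line like the one shown earlier can be interpreted in Python. The helper name `has_immutable_flag` is hypothetical, and the sketch assumes the flags field comes first and is separated from the path by a space, as in that output.

```python
def has_immutable_flag(lsattr_line):
    """Return True if an `lsattr` output line carries the 'i' attribute."""
    flags, _, _path = lsattr_line.strip().partition(" ")
    return "i" in flags


print(has_immutable_flag("----i---------- /etc/inittab"))  # state before chattr -i
print(has_immutable_flag("--------------- /etc/inittab"))  # state after chattr -i
```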

FAQ - "Package Power limit" and "Core power limit" notifications.

Jeffrey Aug 03, 2015 faq

Article Summary:

This article provides information on Linux - "Package Power limit" and "Core power limit" notifications.

Environment:

Linux - Various distributions - It seems to affect systems with kernel 2.6.18 and above.
Intel based CPU

Issue:

“Core power limit” and “Package power limit” notifications are continuously logged in /var/log/messages.

Diagnostic Steps:

/var/log/messages will show notifications similar to:
Jun 21 09:34:55 compute-0-15 kernel: CPU7: Core power limit notification (total events = 1)
Jun 21 09:34:55 compute-0-15 kernel: CPU21: Core power limit notification (total events = 1)
Jun 21 09:34:55 compute-0-15 kernel: CPU25: Package power limit notification (total events = 1)
Jun 21 09:34:55 compute-0-15 kernel: CPU3: Package power limit notification (total events = 1)
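
To gauge how noisy these notifications are, the log lines can be tallied by kind. This is a minimal sketch that assumes the message format shown above; the function name `count_notifications` is made up for illustration.

```python
import re
from collections import Counter

# Matches the kernel message format shown in the log excerpt.
PATTERN = re.compile(r"kernel: CPU(\d+): (Core|Package) power limit notification")


def count_notifications(lines):
    """Count power limit notifications per kind ('Core' or 'Package')."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


sample = [
    "Jun 21 09:34:55 compute-0-15 kernel: CPU7: Core power limit notification (total events = 1)",
    "Jun 21 09:34:55 compute-0-15 kernel: CPU25: Package power limit notification (total events = 1)",
]
print(count_notifications(sample))
```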

Resolution:

This issue is still being worked on by some software vendors.
In the meantime, there are two possible workarounds:
1. Clear the PLN flag by adding clearcpuid=229 to grub.conf; or
2. Set the machine's Performance Profile to Max Performance in the BIOS, if that setting is available (be aware that power consumption may increase if this option is chosen).

Note:

Power Limit Notification is a feature that was added to the Sandy Bridge processors from Intel. The notification does not indicate a problem, and should not be reported as a Critical-level message.
The occurrence of the power limit notification is routine, and the system should not be reporting the messages. They can be ignored.

How To Ask Questions The Smart Way

Jeffrey Jul 07, 2015 faq

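Original: How To Ask Questions The Smart Way
Authors: Eric Steven Raymond, Rick Moen
Translator: Wang Gang (王刚)
Date: September 28, 2010

If you want to copy, mirror, translate, or excerpt this document, please see my copying policy.
Disclaimer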

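Many project websites link to this document in their sections on how to get help. That is fine, and it is the use we intended. But if you are the webmaster creating such a link for your project, please state prominently near the link that we do not provide support for that project!

We have learned the hard way what happens without such a notice: we get pestered endlessly by idiots who think that, because we published this document, it is our job to solve all the world's technical problems.

If you are reading this document because you need help, and you walk away with the impression that you can get it directly from the authors, you have unfortunately become one of the idiots we are talking about. Don't ask us questions; we will ignore them. We are only here to show you how to get help from the people who actually know about your software or hardware problem, and 99.9% of the time that will not be us. Unless you are very sure that one of the authors is an expert on the problem you are facing, please leave us alone, and everyone will be happier.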

Introduction

Flask_Faq : AttributeError: 'module' object has no attribute 'autoescape'

Jeffrey Jun 12, 2015 faq

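A few days ago I found an article on the "spider_net" about building a simple monitoring system with highcharts + flask + mysql.
After quite a struggle I finally got it working, so here is a summary to share with everyone. o_O

Part of the error output: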

* Detected change in 'flask_web.py', reloading
* Restarting with reloader
X.X.X.X - - [11/Jun/2015 15:24:14] "GET / HTTP/1.1" 500 -
Traceback (most recent call last):
...
...
File "/usr/lib64/python2.6/site-packages/jinja2/utils.py", line 195, in import_string
return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'autoescape'
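
The last frame shows Jinja2 resolving a dotted name with `getattr(__import__(...))`. If the installed module lacks the requested attribute (presumably because the installed Jinja2 predates it), the lookup fails exactly like this. Here is a minimal sketch of the mechanism, using standard-library names as stand-ins; `os.path.autoescape` is deliberately nonexistent.

```python
def import_string(import_name):
    """Import 'package.module.attr' the way the traceback line does."""
    module, obj = import_name.rsplit(".", 1)
    # A non-empty fromlist makes __import__ return the leaf module.
    return getattr(__import__(module, None, None, [obj]), obj)


print(import_string("os.path.join"))  # the attribute exists

try:
    import_string("os.path.autoescape")  # the attribute does not exist
except AttributeError as exc:
    print("AttributeError:", exc)
```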

Solution:
yum install python-pip -y
pip install --upgrade jinja2

Red Hat has officially acknowledged this as a bug. For more, see:
https://bugzilla.redhat.com/show_bug.cgi?id=799087

Web_Faq: TCP: time wait bucket table overflow

Jeffrey May 16, 2015 faq

A few days ago I checked my VPS and found the following errors:
kernel: TCP: time wait bucket table overflow
kernel: TCP: time wait bucket table overflow
kernel: TCP: time wait bucket table overflow
(In other words, the kernel's TCP TIME_WAIT bucket table has overflowed.)

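According to the error, the kernel parameter net.ipv4.tcp_max_tw_buckets needs to be changed. This parameter is the maximum number of TIME_WAIT sockets the system holds at the same time. If that number is exceeded, TIME_WAIT sockets are destroyed immediately and a warning is printed. The limit exists only to fend off simple DoS attacks: do not rely on it too heavily and do not lower it artificially. If network conditions require more than the default, raise it instead (possibly after adding memory).

Solution: increase tcp_max_tw_buckets; a smaller value is not better. On my system most of the TIME_WAIT sockets are produced by php-fpm, which is normal behaviour.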

To see the state distribution of connections on ports 9000 and 80:
netstat -an | grep ':9000 ' | awk '{print $6}' | sort | uniq -c | sort -rn
netstat -an | grep ':80 ' | awk '{print $6}' | sort | uniq -c | sort -rn

Troubleshooting steps:

1. Check the server's network connections:

# netstat -pant |awk '/^tcp/ {++state[$6]} END {for(key in state) printf("%-10s\t%d\n",key,state[key]) }'
TIME_WAIT 6798
CLOSE_WAIT 1
ESTABLISHED 592
SYN_RECV 69
CLOSING 39
LAST_ACK 19
LISTEN 107

2. Check the kernel parameter:
#sysctl -a | grep net.ipv4.tcp_max_tw_buckets
net.ipv4.tcp_max_tw_buckets = 8192

Change it in /etc/sysctl.conf to: net.ipv4.tcp_max_tw_buckets = 10000

3. Apply the changed kernel parameter:
sysctl -p

4. Check the server's network connections again:

# netstat -pant |awk '/^tcp/ {++state[$6]} END {for(key in state) printf("%-10s\t%d\n",key,state[key]) }'
TIME_WAIT 6644
...

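5. Check /var/log/messages and dmesg again; the error is no longer reported.
It looks like net.ipv4.tcp_max_tw_buckets=10000 is enough for now.

6. Cause

The number of TIME_WAIT sockets on the server exceeded the maximum defined in the kernel.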
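
The awk one-liner from steps 1 and 4 can also be written in Python, which makes it easy to compare the TIME_WAIT count with the configured limit. This is a minimal sketch: the sample netstat lines (addresses, PIDs, program names) are made up, and `count_tcp_states` is a hypothetical helper name.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count connections per TCP state, like the awk one-liner above."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # netstat -pant columns: proto recv-q send-q local foreign state pid/program
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Made-up sample in the shape of `netstat -pant` output.
sample = """\
tcp   0  0 10.0.0.1:80    10.0.0.9:51000  TIME_WAIT    -
tcp   0  0 10.0.0.1:80    10.0.0.9:51001  TIME_WAIT    -
tcp   0  0 10.0.0.1:9000  10.0.0.1:40000  ESTABLISHED  1234/php-fpm
tcp   0  0 0.0.0.0:80     0.0.0.0:*       LISTEN       900/nginx
"""
MAX_TW_BUCKETS = 8192  # value reported by sysctl in step 2

states = count_tcp_states(sample)
print(states)
print("TIME_WAIT headroom:", MAX_TW_BUCKETS - states["TIME_WAIT"])
```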

[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.

Jeffrey May 16, 2015 faq

Problem description

I noticed that the website was responding slowly, so I took a look at the MySQL error log and found a large number of warnings:
150527 21:15:28 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. INSERT... ON DUPLICATE KEY UPDATE on a table with more than one UNIQUE KEY is unsafe Statement: INSERT INTO 'xxx'

Cause

I looked into it: table xxx has two unique keys. When INSERT ... ON DUPLICATE KEY UPDATE is used on such a table and the database's binlog_format is STATEMENT, the statement is flagged as unsafe.

From the manual:

INSERT ... ON DUPLICATE KEY UPDATE statements on tables with multiple primary or unique keys. When executed against a table that contains more than one primary or unique key, this statement is considered unsafe, being sensitive to the order in which the storage engine checks the keys, which is not deterministic, and on which the choice of rows updated by the MySQL Server depends.

INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is marked as unsafe for statement-based replication beginning with MySQL 5.6.6. (Bug #11765650, Bug #58637)

link : http://dev.mysql.com/doc/refman/5.6/en/replication-rbr-safe-unsafe.html
binlog format : http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_format

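According to the official explanation, the server layer hands the row to the storage engine (InnoDB here), and the result is sensitive to the order in which the engine checks the keys, which is not deterministic.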

Solution
Two options:
1. Change binlog_format to MIXED:
Log in to MySQL and run: set global binlog_format=MIXED; FLUSH LOGS;

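Note:
In a master->slave architecture where the slave has log_slave_updates enabled, changing the binlog format on the master first will break replication on that slave.
So set binlog_format=mixed on the slave first, and only then on the master. The slave reports the following error: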
Last_SQL_Errno: 1666
Last_SQL_Error: Error executing row event: 'Cannot execute statement: impossible to write to binary log since statement is in row format and BINLOG_FORMAT = STATEMENT.'

2. Avoid this kind of SQL statement.
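
To see why the statement is unsafe for statement-based replication, consider a toy model in which the incoming row collides with two different existing rows, one per unique key. Which row gets updated then depends on the order in which the keys are checked. This is a simulation with made-up table contents and a hypothetical `upsert` helper, not MySQL's actual implementation.

```python
import copy


def upsert(rows, new_row, key_order, update):
    """Simulate INSERT ... ON DUPLICATE KEY UPDATE on an in-memory table."""
    for key in key_order:
        for row in rows:
            if row[key] == new_row[key]:
                # The first conflicting row found is the one that gets updated.
                row.update(update)
                return row
    rows.append(dict(new_row))
    return rows[-1]


# Columns "a" and "b" each act as a unique key.
table = [
    {"a": 1, "b": 10, "hits": 0},
    {"a": 2, "b": 20, "hits": 0},
]
incoming = {"a": 1, "b": 20, "hits": 0}  # conflicts with row 1 on a, row 2 on b

master = copy.deepcopy(table)
replica = copy.deepcopy(table)
upsert(master, incoming, ("a", "b"), {"hits": 1})   # checks key a first
upsert(replica, incoming, ("b", "a"), {"hits": 1})  # checks key b first

print(master)
print(replica)
```

Replaying the same statement text on the replica can therefore touch a different row than on the master. Row-based logging records the changed row itself, which is why switching to MIXED removes the warning.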

Mysql_Faq: ERROR 1396 (HY000): Operation CREATE USER failed for 'username'@'hostname'

Jeffrey Dec 21, 2014 faq

While managing MySQL privileges, I ran into the following error:
ERROR 1396 (HY000): Operation CREATE USER failed for 'username'@'hostname'

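But the user really did exist. Thinking back over what I had done: I first created the user with a GRANT statement, and later, when the privileges changed, I modified the data in mysql.user with an UPDATE. That is what led to the error above.
Solution: delete the invalid or conflicting user grants, then grant again as required.
The lesson is that MySQL privileges are best managed in one consistent way.

FLUSH PRIVILEGES does not delete users; it reloads the privileges from the grant tables in the mysql database.

The server caches information in memory as a result of GRANT, CREATE USER, CREATE SERVER, and INSTALL PLUGIN statements. This memory is not released by the corresponding REVOKE, DROP USER, DROP SERVER, and UNINSTALL PLUGIN statements, so a server that executes many such statements will see its memory use grow. The cached memory can be freed with FLUSH PRIVILEGES.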

DROP USER
DROP USER user [, user] ...
http://dev.mysql.com/doc/refman/5.1/en/drop-user.html
DROP USER 'username'@HOSTNAME;
CREATE USER 'username'@HOSTNAME [IDENTIFIED BY 'password'];
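You may need to flush privileges if you removed the user with DELETE.
Remember: this does not necessarily revoke all the privileges the user may have (such as table privileges); you have to do that yourself.
If you don't, you may not be able to re-create the user.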
REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'username'@HOSTNAME;
DELETE FROM mysql.user WHERE user='username';
FLUSH PRIVILEGES;
CREATE USER 'username'@HOSTNAME [IDENTIFIED BY 'password'];

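An account name that specifies only a user name is equivalent to one with a host of '%'. For example, 'user_name' is equivalent to 'user_name'@'%'.

Further reading: http://dev.mysql.com/doc/refman/5.1/en/account-names.html
For more background, see these bug reports:
http://bugs.mysql.com/bug.php?id=28331
http://bugs.mysql.com/bug.php?id=62255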