License: Attribution-NonCommercial-ShareAlike 4.0 International
This article comes from Suzf Blog. Unless otherwise noted, all content is original to SUZF.NET.
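What is Consul?

Consul has multiple components, but as a whole it is a tool for discovering and configuring services in your infrastructure. It provides several key features:

Service discovery: Clients of Consul can provide a service, such as api or mysql, and other clients can use Consul to discover the providers of a given service. Using either DNS or HTTP, applications can easily find the services they depend on.

Health checking: Consul clients can provide any number of health checks, associated either with a given service ("is the web server returning 200 OK") or with the local node ("is memory utilization below 90%"). Operators can use this information to monitor cluster health, and the service discovery components use it to route traffic away from unhealthy hosts.

Key/value store: Applications can use Consul's hierarchical key/value store for any purpose, including dynamic configuration, feature flagging, coordination, leader election, and more. A simple HTTP API makes it easy to use.

Multi-datacenter: Consul supports multiple datacenters out of the box. Users of Consul do not have to build additional layers of abstraction to grow to multiple regions.

Consul is designed to be friendly to both the DevOps community and application developers, which makes it a good fit for modern, elastic infrastructures.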
Basic Concepts of Consul
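agent: Every member of a Consul cluster runs an agent, which is started with the consul agent command. An agent runs in either server mode or client mode. Naturally, a node running in server mode is called a server node, and a node running in client mode is called a client node.

client node: Forwards all RPCs to the server nodes. A client is stateless and lightweight, so a large number of client nodes can be deployed.

server node: Does the heavy work of forming the cluster (elections, state maintenance, forwarding requests to the leader) and serves Consul's functionality (responding to RPC requests). For fault tolerance and convergence time, 3 to 5 servers are generally appropriate.

datacenter: The unit within which data is shared; Consul can span multiple datacenters (sites).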
Basic Architecture Diagram
Prerequisites and Goals
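In this series, we will set up a system of servers that can communicate with each other and maintain service information, key/value storage pools, and other details for client machines. As we install the software and automate some of our configuration, this guide covers the first steps toward getting the system production ready.

The Consul documentation recommends running either 3 or 5 Consul servers in each datacenter to avoid data loss in the event of a server failure. The Consul servers are the components that do the heavy lifting: they store information about services as well as key/value data. An odd number of servers is recommended to avoid stalemate issues during elections.

Apart from the servers, other machines run the Consul agent in client mode. These agents are very lightweight and simply forward requests to the servers. They provide a way to insulate your servers and offload the responsibility of knowing the servers' addresses onto the agents themselves.

To implement some security mechanisms in later guides, we need to name all of our machines within a single domain, so that we can issue a wildcard SSL certificate later.

The details of our machines are as follows: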
    # IP           HOSTNAME          ROLE
    172.16.9.86    lucy.suzf.net     bootstrap consul server
    172.16.9.10    eva.suzf.net      consul server
    172.16.9.20    cali.suzf.net     consul server
    172.16.9.100   web01.suzf.net    consul agent
The test environment for this article is Ubuntu 16.04.1 LTS (64-bit).
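The recommendation of 3 or 5 servers follows from majority-quorum arithmetic: a cluster of N servers needs floor(N/2)+1 of them to agree, so an even-sized cluster tolerates no more failures than the odd size just below it. The following is a small sketch of that arithmetic; the function names quorum and failure_tolerance are illustrative only and are not part of Consul.

```shell
# quorum N: smallest majority of N servers.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# failure_tolerance N: how many servers may fail while a majority remains.
failure_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print a small table for cluster sizes 1 through 5.
for n in 1 2 3 4 5; do
    printf 'servers=%d quorum=%d tolerated_failures=%d\n' \
        "$n" "$(quorum "$n")" "$(failure_tolerance "$n")"
done
```

With three servers the cluster survives the loss of one; with five it survives the loss of two, while four servers still only survive one failure.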
Download and Install Consul
Update the local package cache, then use apt to install the following packages:
    apt-get update
    apt-get install -y curl unzip
Now we can go to the Consul download page and download the Consul binary.
    # On all nodes
    cd /usr/local/src
    curl -OL https://releases.hashicorp.com/consul/0.7.2/consul_0.7.2_linux_amd64.zip
    unzip consul_0.7.2_linux_amd64.zip
    mv consul /usr/bin
    adduser consul
    mkdir -p /etc/consul /opt/data/consul
    chown consul:consul /opt/data/consul
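Before unpacking a downloaded archive it is worth checking its checksum. The sketch below assumes that a checksum file in the usual `sha256sum` format (hash, two spaces, file name) has been downloaded next to the archive; HashiCorp publishes such a file alongside each release, and the file name used in the usage comment (consul_0.7.2_SHA256SUMS) is an assumption to verify on the download page. The helper name verify_release is hypothetical.

```shell
# verify_release DIR ZIP SUMS
# Succeeds only if ZIP (inside DIR) matches its entry in the checksum file SUMS.
verify_release() {
    dir="$1"; zip="$2"; sums="$3"
    # Keep only the checksum line for our archive and let sha256sum compare it.
    # If no line matches, sha256sum receives empty input and fails as well.
    ( cd "$dir" && grep " ${zip}\$" "$sums" | sha256sum -c - >/dev/null 2>&1 )
}

# Usage (not executed here):
#   cd /usr/local/src
#   curl -OL https://releases.hashicorp.com/consul/0.7.2/consul_0.7.2_SHA256SUMS
#   verify_release /usr/local/src consul_0.7.2_linux_amd64.zip consul_0.7.2_SHA256SUMS \
#       && unzip consul_0.7.2_linux_amd64.zip
```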
Creating the Bootstrap Configuration
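The first configuration we need to create is the one that bootstraps the cluster. This is not a very common event, because it is only needed when the cluster is first created. However, we will create the configuration file so that we can get started again quickly if the cluster ever goes down completely.

You can place this configuration file on only one of your servers, or on all of them to give yourself more options for bootstrapping. For this demonstration we will only put it on the lucy node.

The configuration files are stored as simple JSON, so they are very easy to manage. In this file we specify that, when this configuration is used, Consul should start as a server in bootstrap mode: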
    @lucy:~# cat /etc/consul/config.json
    {
      "bootstrap": true,
      "server": true
    }
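We should also specify the datacenter in which the cluster will live. This can be any name that helps you identify the physical location of the cluster. Consul is multi-datacenter aware, and these designations help you organize your clusters by datacenter.

We can also pass in the `/opt/data/consul` directory we created earlier as the data directory. Consul uses it to store information about the cluster state: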
    @lucy:~# cat /etc/consul/config.json
    {
      "bootstrap": true,
      "server": true,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul"
    }
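Next, we add basic encryption: the encrypt key protects Consul's gossip traffic between cluster members. Generate a key: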
    # consul keygen
    2qpBV1BtjAGZKYmqsDqXxA==
Add the generated key to the configuration file:
    @lucy:~# cat /etc/consul/config.json
    {
      "bootstrap": true,
      "server": true,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul",
      "encrypt": "2qpBV1BtjAGZKYmqsDqXxA=="
    }
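A copy-and-paste slip in the encrypt value is easy to make. The key printed by consul keygen above is base64 text that decodes to 16 bytes, so a quick length check catches truncated keys before the agent is started. This is only a sketch: the helper names key_bytes and valid_key are hypothetical, and the 16-byte length is what this release generates (other releases may use a different key size).

```shell
# key_bytes KEY: print how many bytes KEY decodes to from base64.
key_bytes() {
    printf '%s' "$1" | base64 -d 2>/dev/null | wc -c | tr -d ' '
}

# valid_key KEY: succeed if KEY decodes to exactly 16 bytes.
valid_key() {
    [ "$(key_bytes "$1")" = "16" ]
}

# Usage: check the key from this article.
if valid_key '2qpBV1BtjAGZKYmqsDqXxA=='; then
    echo "encrypt key looks well-formed"
fi
```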
Finally, we add some additional settings to specify the log level and to indicate that we want to use syslog for logging:
    @lucy:~# cat /etc/consul/config.json
    {
      "bootstrap": true,
      "server": true,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul",
      "encrypt": "2qpBV1BtjAGZKYmqsDqXxA==",
      "log_level": "INFO",
      "enable_syslog": true
    }
Test the configuration file
    # No output is the best result: it means the configuration file is valid.
    # Any output indicates a problem.
    consul configtest -config-dir /etc/consul
At this point we can do a test start to see whether the configuration works:
    @lucy:~# /usr/bin/consul agent -config-dir /etc/consul
    ==> Starting Consul agent...
    ==> Error starting agent: Failed to get advertise address: Multiple private IPs found. Please configure one.
The message above shows that multiple private IPs were found on this host, so we need to specify one explicitly with bind_addr:
    @lucy:~# cat /etc/consul/config.json
    {
      "bootstrap": true,
      "server": true,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul",
      "encrypt": "2qpBV1BtjAGZKYmqsDqXxA==",
      "log_level": "INFO",
      "enable_syslog": true,
      "bind_addr": "172.16.9.86"
    }
Ref:
https://www.consul.io/docs/agent/options.html#_bind
https://www.consul.io/docs/agent/options.html#_advertise
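To see which addresses might trigger the "Multiple private IPs found" error, it can help to list the private IPv4 addresses on the host before choosing a bind_addr. The sketch below filters a list of addresses down to the RFC 1918 ranges only; Consul's own detection may consider additional ranges, and the helper name private_ipv4s is hypothetical.

```shell
# private_ipv4s: read one IPv4 address per line on stdin and print those in
# 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
private_ipv4s() {
    awk '/^10\./ || /^192\.168\./ || /^172\.(1[6-9]|2[0-9]|3[01])\./'
}

# Usage on a Linux host (not executed here):
#   hostname -I | tr ' ' '\n' | private_ipv4s
```

If this prints more than one address, pick the one the other cluster members can reach and set it as bind_addr.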
Creating Cluster Configuration
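Now that the bootstrap configuration is complete, we can use it as the basis for our general server configuration. The server configuration is what will be used once the cluster has been bootstrapped. Below are the configurations for the three server nodes: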
    @lucy:~# cat /etc/consul/config.json
    {
      "bootstrap": false,
      "server": true,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul",
      "encrypt": "2qpBV1BtjAGZKYmqsDqXxA==",
      "log_level": "INFO",
      "enable_syslog": true,
      "bind_addr": "172.16.9.86",
      "node_name": "lucy.suzf.net",
      "bootstrap_expect": 3,
      "rejoin_after_leave": true,
      "retry_interval": "30s",
      "retry_join": [
        "lucy.suzf.net",
        "eva.suzf.net",
        "cali.suzf.net"
      ]
    }

    @eva:~# cat /etc/consul/config.json
    {
      "bootstrap": false,
      "server": true,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul",
      "encrypt": "2qpBV1BtjAGZKYmqsDqXxA==",
      "log_level": "INFO",
      "enable_syslog": true,
      "bind_addr": "172.16.9.10",
      "node_name": "eva.suzf.net",
      "bootstrap_expect": 3,
      "rejoin_after_leave": true,
      "retry_interval": "30s",
      "retry_join": [
        "lucy.suzf.net",
        "eva.suzf.net",
        "cali.suzf.net"
      ]
    }

    @cali:~# cat /etc/consul/config.json
    {
      "bootstrap": false,
      "server": true,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul",
      "encrypt": "2qpBV1BtjAGZKYmqsDqXxA==",
      "log_level": "INFO",
      "enable_syslog": true,
      "bind_addr": "172.16.9.20",
      "node_name": "cali.suzf.net",
      "bootstrap_expect": 3,
      "rejoin_after_leave": true,
      "retry_interval": "30s",
      "retry_join": [
        "lucy.suzf.net",
        "eva.suzf.net",
        "cali.suzf.net"
      ]
    }
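Note: the -bootstrap-expect option (bootstrap_expect in the configuration file) tells Consul how many server nodes we expect to join. Its effect is to delay the start of log replication until the expected number of servers have successfully joined. See the bootstrapping guide for more details.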
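The encrypt parameter must be identical for every participant in the system, so copying the file already satisfies that requirement for us. Keep this in mind when creating new configurations.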
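Because the three server configurations differ only in bind_addr and node_name, they can be rendered from one template instead of being edited by hand. The following is a minimal sketch using the values from this article; the function name render_server_config and the ENCRYPT_KEY variable are assumptions introduced for illustration.

```shell
# render_server_config NODE_NAME BIND_ADDR
# Print a server config.json on stdout; only node_name and bind_addr vary.
render_server_config() {
    node_name="$1"; bind_addr="$2"
    cat <<EOF
{
  "bootstrap": false,
  "server": true,
  "datacenter": "sdc",
  "data_dir": "/opt/data/consul",
  "encrypt": "${ENCRYPT_KEY:-2qpBV1BtjAGZKYmqsDqXxA==}",
  "log_level": "INFO",
  "enable_syslog": true,
  "bind_addr": "${bind_addr}",
  "node_name": "${node_name}",
  "bootstrap_expect": 3,
  "rejoin_after_leave": true,
  "retry_interval": "30s",
  "retry_join": [
    "lucy.suzf.net",
    "eva.suzf.net",
    "cali.suzf.net"
  ]
}
EOF
}

# Usage (not executed here):
#   render_server_config eva.suzf.net 172.16.9.10 > /etc/consul/config.json
```

Generating the files this way also keeps the encrypt key identical on every node, which is the requirement described above.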
Creating the Client Configuration
The configuration of all the Consul servers is now complete, so we can focus on getting our client machine configured and running correctly.
    @happycode:~# cat /etc/consul/config.json
    {
      "server": false,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul",
      "encrypt": "2qpBV1BtjAGZKYmqsDqXxA==",
      "log_level": "INFO",
      "enable_syslog": true,
      "bind_addr": "172.16.9.110",
      "node_name": "happycode.suzf.net",
      "rejoin_after_leave": true,
      "retry_join": [
        "lucy.suzf.net",
        "eva.suzf.net",
        "cali.suzf.net"
      ]
    }
Create an Upstart Script
Install the daemon package on all nodes:
    # On all nodes
    apt-get install -y daemon
Consul init script
    #! /bin/bash
    ### BEGIN INIT INFO
    # Provides:          consul
    # Required-Start:    $local_fs $remote_fs $syslog $named $network
    # Required-Stop:     $local_fs $remote_fs $syslog $named $network
    # Default-Start:     2 3 4 5
    # Default-Stop:      S 0 1 6
    # Short-Description: Consul service discovery framework
    # Description:       Healthchecks local services and registers
    #                    them in a central consul database.
    ### END INIT INFO

    # Do NOT "set -e"

    PATH=/usr/sbin:/usr/bin:/sbin:/bin
    DESC="Consul service discovery framework"
    NAME=consul
    DAEMON=/usr/bin/$NAME
    PIDFILE=/var/run/$NAME/$NAME.pid
    DAEMON_ARGS="agent -config-dir /etc/consul"
    USER=consul
    SCRIPTNAME=/etc/init.d/$NAME
    RPC_ADDR=-rpc-addr=127.0.0.1:8400

    # Exit if the package is not installed
    [ -x "$DAEMON" ] || exit 0

    # Read configuration variable file if it is present
    [ -r /etc/default/$NAME ] && . /etc/default/$NAME

    # Load the VERBOSE setting and other rcS variables
    [ -f /etc/default/rcS ] && . /etc/default/rcS

    # Define LSB log_* functions.
    # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
    . /lib/lsb/init-functions

    #
    # Function to create run directory
    #
    mkrundir() {
        [ ! -d /var/run/consul ] && mkdir -p /var/run/consul
        chown $USER /var/run/consul
    }

    #
    # Function that starts the daemon/service
    #
    do_start()
    {
        # Return
        #   0 if daemon has been started
        #   1 if daemon was already running
        #   2 if daemon could not be started
        mkrundir
        start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --chuid $USER --background --make-pidfile --test > /dev/null \
            || return 1
        start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --chuid $USER --background --make-pidfile -- \
            $DAEMON_ARGS \
            || return 2

        for i in `seq 1 30`; do
            if ! start-stop-daemon --quiet --stop --test --pidfile $PIDFILE --exec $DAEMON --user $USER; then
                RETVAL=2
                sleep 1
                continue
            fi
            if "$DAEMON" info ${RPC_ADDR} >/dev/null; then
                return 0
            fi
        done
        return "$RETVAL"
    }

    #
    # Function that stops the daemon/service
    #
    do_stop()
    {
        # If consul is not acting as a server, exit gracefully
        if ("${DAEMON}" info ${RPC_ADDR} 2>/dev/null | grep -q 'server = false' 2>/dev/null) ; then
            "$DAEMON" leave ${RPC_ADDR}
        fi
        # Return
        #   0 if daemon has been stopped
        #   1 if daemon was already stopped
        #   2 if daemon could not be stopped
        #   other if a failure occurred
        start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $PIDFILE --name $NAME
        RETVAL="$?"
        [ "$RETVAL" = 2 ] && return 2
        # Wait for children to finish too if this is a daemon that forks
        # and if the daemon is only ever run from this initscript.
        # If the above conditions are not satisfied then add some other code
        # that waits for the process to drop all resources that could be
        # needed by services started subsequently.  A last resort is to
        # sleep for some time.
        start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --exec $DAEMON
        [ "$?" = 2 ] && return 2
        # Many daemons don't delete their pidfiles when they exit.
        rm -f $PIDFILE
        return "$RETVAL"
    }

    #
    # Function that sends a SIGHUP to the daemon/service
    #
    do_reload() {
        #
        # If the daemon can reload its configuration without
        # restarting (for example, when it is sent a SIGHUP),
        # then implement that here.
        #
        start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --name $NAME
        return 0
    }

    case "$1" in
      start)
        [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
        do_start
        case "$?" in
            0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
            2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
        esac
        ;;
      stop)
        [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
        do_stop
        case "$?" in
            0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
            2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
        esac
        ;;
      #reload|force-reload)
        #
        # If do_reload() is not implemented then leave this commented out
        # and leave 'force-reload' as an alias for 'restart'.
        #
        #log_daemon_msg "Reloading $DESC" "$NAME"
        #do_reload
        #log_end_msg $?
        #;;
      restart|force-reload)
        #
        # If the "reload" option is implemented then remove the
        # 'force-reload' alias
        #
        log_daemon_msg "Restarting $DESC" "$NAME"
        do_stop
        case "$?" in
          0|1)
            do_start
            case "$?" in
                0) log_end_msg 0 ;;
                1) log_end_msg 1 ;; # Old process is still running
                *) log_end_msg 1 ;; # Failed to start
            esac
            ;;
          *)
            # Failed to stop
            log_end_msg 1
            ;;
        esac
        ;;
      status)
        status_of_proc -p $PIDFILE $DAEMON $NAME && exit 0 || exit $?
        ;;
      *)
        #echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
        echo "Usage: $SCRIPTNAME {start|stop|status|restart|force-reload}" >&2
        exit 3
        ;;
    esac

    :
Consul systemd script
    @lucy:~# cat /etc/systemd/system/consul.service
    [Unit]
    Description=Consul Agent
    After=network.target

    [Service]
    User=consul
    Group=consul
    ExecStart=/usr/bin/consul agent \
        -config-dir /etc/consul
    ExecReload=/bin/kill -HUP $MAINPID
    Restart=on-failure
    LimitNOFILE=131072

    [Install]
    WantedBy=multi-user.target
Systemd reference: ArchLinux Systemd Wiki

Note: the startup scripts come from https://github.com/solarkennedy/puppet-consul/tree/master/templates

Start the Consul server and agent services in turn, then check the status:
    # List the members
    @lucy:~# consul members
    Node                Address            Status  Type    Build  Protocol  DC
    cali.suzf.net       172.16.9.20:8301   alive   server  0.7.2  2         sdc
    eva.suzf.net        172.16.9.10:8301   alive   server  0.7.2  2         sdc
    happycode.suzf.net  172.16.9.110:8301  alive   client  0.7.2  2         sdc
    lucy.suzf.net       172.16.9.86:8301   alive   server  0.7.2  2         sdc

    # Show the leader
    @lucy:~# consul info | grep leader
        leader = true
        leader_addr = 172.16.9.86:8300
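The members table can also be checked from a script, for example to confirm that all three servers are alive before moving on. The sketch below counts rows whose Status column is alive and whose Type column is server; it assumes the column layout shown above, and the helper name alive_servers is hypothetical.

```shell
# alive_servers: read `consul members` output on stdin and print the number
# of rows with Status "alive" (column 3) and Type "server" (column 4).
alive_servers() {
    awk 'NR > 1 && $3 == "alive" && $4 == "server" { n++ } END { print n + 0 }'
}

# Usage (not executed here):
#   consul members | alive_servers
```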
Register services
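A service can be registered either by providing a service definition or by calling the HTTP API. A service definition file is the most common way to register a service, and it is the approach used in this article. Define an HTTP service on the Consul agent node: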
    @happycode:~# cat /etc/consul/service_nginx.json
    {
      "service": {
        "name": "Nginx",
        "tags": ["HTTP"],
        "port": 80,
        "check": {
          "script": "curl localhost >/dev/null 2>&1",
          "interval": "10s"
        }
      }
    }
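If you want to register multiple services, create multiple service definition files in the Consul configuration directory. Then reload the configuration: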
    systemctl reload consul.service
    # or
    service consul reload
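For comparison, the other registration path mentioned above, the HTTP API, can be sketched as follows. The agent exposes PUT /v1/agent/service/register on its HTTP port; the helper names service_payload and register_service and the CONSUL_URL variable are assumptions made for this example, and the payload omits the health check for brevity.

```shell
# Base URL of the local agent's HTTP API (assumed default port 8500).
CONSUL_URL="${CONSUL_URL:-http://localhost:8500}"

# service_payload NAME PORT TAG: print a minimal JSON registration body.
service_payload() {
    printf '{"Name":"%s","Tags":["%s"],"Port":%d}\n' "$1" "$3" "$2"
}

# register_service NAME PORT TAG: send the body to the local agent.
register_service() {
    curl -s -X PUT --data "$(service_payload "$1" "$2" "$3")" \
        "${CONSUL_URL}/v1/agent/service/register"
}

# Usage (needs a running agent, so it is not executed here):
#   register_service Nginx 80 HTTP
```

Services registered this way take effect immediately, without reloading the agent.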
Query service
We can query services through either the DNS API or the HTTP API.
DNS API
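Let's first query using the DNS API. In the DNS API, the DNS name of a service is NAME.service.consul. By default all DNS names live under the consul namespace, although this is configurable. The service subdomain tells Consul that we are querying services, and NAME is the name of the service.

For the web service we registered above, the name is nginx.service.consul: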
    @happycode:~# dig @127.0.0.1 -p 8600 nginx.service.consul

    ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 -p 8600 nginx.service.consul
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4149
    ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
    ;; WARNING: recursion requested but not available

    ;; QUESTION SECTION:
    ;nginx.service.consul.          IN      A

    ;; ANSWER SECTION:
    nginx.service.consul.   0       IN      A       172.16.9.110

    ;; Query time: 8 msec
    ;; SERVER: 127.0.0.1#8600(127.0.0.1)
    ;; WHEN: Thu Jan 19 16:17:34 CST 2017
    ;; MSG SIZE  rcvd: 54
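As you can see, an A record was returned with the IP address of the node on which the service is available. An A record can only hold an IP address. You can also use the DNS API to retrieve an SRV record, which contains both the address and the port: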
    @happycode:~# dig @127.0.0.1 -p 8600 nginx.service.consul SRV

    ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 -p 8600 nginx.service.consul SRV
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63951
    ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    ;; WARNING: recursion requested but not available

    ;; QUESTION SECTION:
    ;nginx.service.consul.          IN      SRV

    ;; ANSWER SECTION:
    nginx.service.consul.   0       IN      SRV     1 1 80 happycode.suzf.net.node.sdc.consul.

    ;; ADDITIONAL SECTION:
    happycode.suzf.net.node.sdc.consul. 0 IN A      172.16.9.110

    ;; Query time: 1 msec
    ;; SERVER: 127.0.0.1#8600(127.0.0.1)
    ;; WHEN: Thu Jan 19 16:21:40 CST 2017
    ;; MSG SIZE  rcvd: 102
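The SRV record tells us that the nginx service is running on port 80 of the node happycode.suzf.net.node.sdc.consul. The DNS response also returns the A record for that node in the additional section.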
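A client script that needs a host:port pair can extract it from the SRV answer. The sketch below assumes the four-field lines that `dig +short ... SRV` prints (priority, weight, port, target); the helper name srv_to_endpoint is hypothetical.

```shell
# srv_to_endpoint: read SRV answers in `dig +short` form on stdin
# ("priority weight port target.") and print "target:port" per line,
# with the trailing dot of the target removed.
srv_to_endpoint() {
    awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# Usage (needs a running agent, so it is not executed here):
#   dig +short @127.0.0.1 -p 8600 nginx.service.consul SRV | srv_to_endpoint
```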
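Finally, we can also use the DNS API to filter services by tag. The format for tag-based service queries is TAG.NAME.service.consul. In the example below, we ask Consul for the Nginx service instances that carry the HTTP tag, and we get back the service we registered with that tag: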
    @happycode:~# dig @127.0.0.1 -p 8600 http.nginx.service.consul SRV

    ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 -p 8600 http.nginx.service.consul SRV
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34361
    ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    ;; WARNING: recursion requested but not available

    ;; QUESTION SECTION:
    ;http.nginx.service.consul.     IN      SRV

    ;; ANSWER SECTION:
    http.nginx.service.consul. 0    IN      SRV     1 1 80 happycode.suzf.net.node.sdc.consul.

    ;; ADDITIONAL SECTION:
    happycode.suzf.net.node.sdc.consul. 0 IN A      172.16.9.110

    ;; Query time: 165 msec
    ;; SERVER: 127.0.0.1#8600(127.0.0.1)
    ;; WHEN: Thu Jan 19 16:25:11 CST 2017
    ;; MSG SIZE  rcvd: 107
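The two name formats used in this section, NAME.service.consul and TAG.NAME.service.consul, can be built with a small helper. This is only an illustrative sketch: the function name consul_dns_name is hypothetical, and the name is lower-cased simply to match the queries above (DNS names are case-insensitive, which is why the service registered as "Nginx" answers to "nginx").

```shell
# consul_dns_name NAME [TAG] [DOMAIN]
# Print the DNS name for a service lookup, optionally filtered by tag.
# DOMAIN defaults to "consul".
consul_dns_name() {
    name="$1"; tag="${2:-}"; domain="${3:-consul}"
    fqdn="${name}.service.${domain}"
    if [ -n "$tag" ]; then
        fqdn="${tag}.${fqdn}"
    fi
    printf '%s\n' "$fqdn" | tr '[:upper:]' '[:lower:]'
}

# Usage (needs a running agent, so the dig call is not executed here):
#   dig @127.0.0.1 -p 8600 "$(consul_dns_name Nginx HTTP)" SRV
```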
HTTP REST API
Query the nginx service:
    @happycode:~# curl http://localhost:8500/v1/catalog/service/nginx
    [{"Node":"happycode.suzf.net","Address":"172.16.9.110","TaggedAddresses":{"lan":"172.16.9.110","wan":"172.16.9.110"},"ServiceID":"Nginx","ServiceName":"Nginx","ServiceTags":["HTTP"],"ServiceAddress":"","ServicePort":80,"ServiceEnableTagOverride":false,"CreateIndex":488,"ModifyIndex":580}]
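To list only healthy instances, add the passing parameter. Note that this filter is implemented by the health endpoint (/v1/health/service/nginx?passing); the catalog endpoint used below does not filter by health, so its output is identical to the previous query: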
    @happycode:~# curl http://localhost:8500/v1/catalog/service/nginx?passing
    [{"Node":"happycode.suzf.net","Address":"172.16.9.110","TaggedAddresses":{"lan":"172.16.9.110","wan":"172.16.9.110"},"ServiceID":"Nginx","ServiceName":"Nginx","ServiceTags":["HTTP"],"ServiceAddress":"","ServicePort":80,"ServiceEnableTagOverride":false,"CreateIndex":488,"ModifyIndex":580}]
For more of the API, see https://www.consul.io/docs/agent/http.html
Health Check
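As with a service, a check can be registered either by providing a check definition or by making a request to the HTTP API.

We will use a check definition here because, as with services, it is the most common way to set up checks.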
    @happycode:~# echo '{
    >   "check": {
    >     "name": "ping",
    >     "script": "ping -c1 google.com >/dev/null",
    >     "interval": "30s"
    >   }
    > }' > /etc/consul/check_ping.json
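This definition adds a host-level check named "ping". The check runs every 30 seconds and invokes ping -c1 google.com. In a script-based health check, the script runs as the same user as the Consul process. If the command exits with a non-zero code, the node is flagged as unhealthy; more precisely, exit code 1 is reported as warning and any other non-zero code as critical. This is the contract for every script-based health check.

Check the monitoring state: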
# curl http://localhost:8500/v1/health/state/critical
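In addition, if a service's check is failing, querying that service through DNS returns no results, because Consul leaves unhealthy instances out of DNS answers.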
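The exit-code contract makes it easy to write checks with more than two states. As a sketch, the function below maps a memory-usage percentage to a check status: 0 (passing), 1 (warning), or 2 (critical). The function name mem_check_status and the default thresholds are assumptions for illustration; the 90% figure echoes the example in the introduction, and 80% is an arbitrary warning level.

```shell
# mem_check_status USED_PERCENT [WARN] [CRIT]
# Return 0 (passing), 1 (warning) or 2 (critical) for a memory usage
# percentage. WARN defaults to 80 and CRIT to 90 (illustrative values).
mem_check_status() {
    used="$1"; warn="${2:-80}"; crit="${3:-90}"
    if [ "$used" -ge "$crit" ]; then
        return 2
    fi
    if [ "$used" -ge "$warn" ]; then
        return 1
    fi
    return 0
}

# Usage inside a check script on Linux (not executed here):
#   used=$(free | awk '/^Mem:/ { printf "%d", $3 * 100 / $2 }')
#   mem_check_status "$used"
#   exit $?
```

A script like this would be referenced from the "script" field of a check definition in the same way as the ping command above.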
Download and Install Web UI
    @lucy # download Consul Web UI
    mkdir -p /opt/data/consul/ui
    cd /opt/data/consul/ui
    curl -OL https://releases.hashicorp.com/consul/0.7.2/consul_0.7.2_web_ui.zip
    unzip consul_0.7.2_web_ui.zip
    rm consul_0.7.2_web_ui.zip

    # Add ui setting on consul config
    @lucy:~# cat /etc/consul/config.json
    {
      "bootstrap": false,
      "server": true,
      "datacenter": "sdc",
      "data_dir": "/opt/data/consul",
      "encrypt": "2qpBV1BtjAGZKYmqsDqXxA==",
      "log_level": "INFO",
      "enable_syslog": true,
      "bind_addr": "172.16.9.86",
      "node_name": "lucy.suzf.net",
      "bootstrap_expect": 3,
      "rejoin_after_leave": true,
      "retry_interval": "30s",
      "retry_join": [
        "lucy.suzf.net",
        "eva.suzf.net",
        "cali.suzf.net"
      ],
      "ui": true,
      "ui_dir": "/opt/data/consul/ui",
      "client_addr": "0.0.0.0"
    }

    # reload service
    systemctl reload consul.service
Open http://172.16.9.86:8500 in a browser to see all of the information Consul holds.
Key Value Store
https://www.consul.io/docs/agent/http/kv.html
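The key/value store is reached through the /v1/kv/ prefix of the same HTTP API. The following is a minimal sketch of writing, reading, and deleting a key with curl; the helper names (kv_url, kv_put, kv_get, kv_delete), the CONSUL_URL variable, and the key web/config/port are all made up for illustration. The raw query parameter asks the agent to return the stored value itself rather than the default JSON document, in which the value is base64-encoded.

```shell
# Base URL of the local agent's HTTP API (assumed default port 8500).
CONSUL_URL="${CONSUL_URL:-http://localhost:8500}"

# kv_url KEY: print the API URL for KEY.
kv_url() {
    printf '%s/v1/kv/%s\n' "$CONSUL_URL" "$1"
}

# kv_put KEY VALUE: store VALUE under KEY.
kv_put() {
    curl -s -X PUT --data "$2" "$(kv_url "$1")"
}

# kv_get KEY: print the raw value stored under KEY.
kv_get() {
    curl -s "$(kv_url "$1")?raw"
}

# kv_delete KEY: remove KEY.
kv_delete() {
    curl -s -X DELETE "$(kv_url "$1")"
}

# Usage (needs a running agent, so it is not executed here):
#   kv_put web/config/port 8080
#   kv_get web/config/port
#   kv_delete web/config/port
```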
Reference
[0] https://www.consul.io/docs/index.html
[1] https://imaginea.gitbooks.io/consul-devops-handbook/
[2] https://www.digitalocean.com/community/tutorials/an-introduction-to-using-consul-a-service-discovery-system-on-ubuntu-14-04