Suzf Blog

Tag: Ceph

How to deal with Ceph Monitor DB compaction?

Issue

Ceph health warns that the monitor stores are getting too big:
mon.ceph1 store is getting too big! 48031 MB >= 15360 MB -- 62% avail
mon.ceph2 store is getting too big! 47424 MB >= 15360 MB -- 63% avail
mon.ceph3 store is getting too big! 46524 MB >= 15360 MB -- 63% avail

On our three monitor nodes, each one has ~50 GB of store.db:
du -sch /var/lib/ceph/mon/ceph-ceph1/store.db/
47G     /var/lib/ceph/mon/ceph-ceph1/store.db/
47G     total
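To confirm this on every monitor at once, something like the following works (a sketch: the hostnames ceph1/ceph2/ceph3 match the nodes above, the mon id is assumed to equal the hostname, and passwordless ssh is assumed):

# check on-disk store size on each monitor host
for h in ceph1 ceph2 ceph3; do ssh $h du -sh /var/lib/ceph/mon/ceph-$h/store.db/; done
# cross-check with the cluster health warnings
ceph health detail | grep 'store is getting too big'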

We've set the following in our ceph.conf:
[mon]
mon compact on start = true
Then we restarted one of the monitors to trigger the compaction.
We noticed that the size of store.db increased further (and is still increasing), when it should have decreased.
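For reference, this is roughly how we triggered it (a sketch: the ceph-mon@<id> unit name assumes a systemd-based release, and ceph1 is one of our mon ids):

# restart a single monitor so that "mon compact on start" takes effect
sudo systemctl restart ceph-mon@ceph1
# watch the store size while compaction runs
watch -n 60 du -sh /var/lib/ceph/mon/ceph-ceph1/store.db/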


However

If mon compact on start is set to true, the larger the database, the longer the compaction takes, thereby increasing the time for a node to join the cluster and form quorum. (In production I restarted one mon service and it took more than an hour; that is far too long!)

This probably needs a review alongside any other existing cluster-level heartbeat/failover processes for safety, if this approach is selected.

Clearly we don't want this on by default, but having the option to turn it on via auto-manage-soft might be nice.

Note that you can also tell a monitor to run compaction on the fly with

sudo ceph tell mon.{id} compact
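For example, to compact the first monitor in the cluster above (ceph1 is the mon id here):

sudo ceph tell mon.ceph1 compact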


Ceph Single/Multi-Node Installation Notes

Overview

Docs : http://docs.ceph.com/docs

Ceph is a distributed file system that adds replication and fault tolerance while maintaining POSIX compatibility. Ceph's most distinctive feature is its distributed metadata service, which uses the CRUSH (Controlled Replication Under Scalable Hashing) pseudo-random algorithm to determine where files are placed. At the core of Ceph is RADOS (Reliable Autonomic Distributed Object Store), a clustered object store that itself provides high availability, error detection, and repair for objects.

The Ceph ecosystem architecture can be divided into four parts:

client: the client (data user). The client exports a POSIX file system interface for applications to call, and connects to mon/mds/osd to exchange metadata and data. The earliest client was implemented with FUSE; it has since been moved into the kernel, where a ceph.ko kernel module must be compiled before it can be used.
mon: the cluster monitor, whose daemon is cmon (Ceph Monitor). The mon monitors and manages the whole cluster and exports a network file system to clients, which can mount the Ceph file system with mount -t ceph monitor_ip:/ mount_point (see the example after this list). According to the official documentation, three mons are enough to guarantee the reliability of the cluster.
mds: the metadata server, whose daemon is cmds (Ceph Metadata Server). Ceph can run several MDS daemons as a distributed metadata server cluster, which brings in Ceph's dynamic directory partitioning for load balancing.
osd: the object storage cluster, whose daemon is cosd (Ceph Object Storage Device). The osd wraps the local file system and exposes an object storage interface, storing both data and metadata as objects. The local file system can be ext2/3, but Ceph considers these file systems ill-suited to the osd's particular access patterns; the developers previously implemented their own ebofs, and Ceph has since switched to btrfs.
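As a concrete example of the mount command mentioned in the mon entry above (a sketch: the monitor address, mount point, and key file are placeholders, and the -o options are only needed when cephx authentication is enabled):

sudo mkdir -p /mnt/cephfs
sudo mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret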

Ceph scales to hundreds, thousands, or even more nodes, and the four parts above are best spread across different nodes. Of course, for basic testing, the mon and mds can be installed on one node, or all four parts can be deployed on the same node; a sketch of the latter follows.
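A minimal single-node sketch, assuming the ceph-deploy tool, a host named node1 with spare disks /dev/sdb and /dev/sdc (all names are placeholders, and the osd create syntax varies between ceph-deploy versions):

ceph-deploy new node1
# on a single node, let CRUSH replicate across OSDs instead of hosts
echo "osd crush chooseleaf type = 0" >> ceph.conf
ceph-deploy install node1
ceph-deploy mon create-initial
ceph-deploy osd create node1:sdb node1:sdc
ceph-deploy admin node1
ceph -s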