I previously deployed rabbitmq, redis, mongodb, and zookeeper clusters on Kubernetes. Since I didn't fully understand their clustering mechanisms, I was worried about possible data loss, so I've now taken a closer look at how each cluster works and simulated failures to verify whether data gets lost. The test steps are below.

rabbitmq

My initial idea was to persist the data to a PVC and restore it when a node failed. In practice, however, when the failed node tried to rejoin the cluster it reported the error below: rejoining requires a reset, and after the reset the data is gone.
So persistence alone does not guarantee against data loss, and I went with mirrored queues instead.
If the /data/rabbitmq/mnesia directory already contains data, the pod must be deleted or it cannot rejoin the cluster; alternatively, create the pod without any volumeMount at all.

"init terminating in do_boot",{error,{inconsistent_cluster,"Node 'rabbit@12.240.3.7' thinks it's clustered with node 'rabbit@12.240.4.190', but 'rabbit@12.240.4.190' disagrees"}}}
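The conclusion above (a PVC alone cannot prevent loss, because a rejoining node must reset and wipe its local store, while mirroring keeps a live copy elsewhere) can be illustrated with a toy simulation. This is not RabbitMQ code; the classes and names below are invented for illustration only.

```python
# Toy model of why queue mirroring survives a node reset while
# single-node persistence does not. Hypothetical; not RabbitMQ APIs.

class Node:
    def __init__(self, name):
        self.name = name
        self.queue = []          # local copy of the queue contents

    def reset(self):
        # rejoining the cluster requires a reset, which wipes local data
        self.queue = []

def publish(master, mirrors, msg):
    # with mirroring enabled, every message is replicated to all mirrors
    master.queue.append(msg)
    for m in mirrors:
        m.queue.append(msg)

def fail_and_rejoin(failed, survivors):
    # the failed node resets, then re-syncs the queue from a survivor
    failed.reset()
    failed.queue = list(survivors[0].queue)

def demo():
    master = Node("rabbit@node-a")
    mirror = Node("rabbit@node-b")
    publish(master, [mirror], "order-1")
    publish(master, [mirror], "order-2")
    # master's host dies; the mirror is promoted and keeps serving
    promoted = mirror
    # the old master rejoins: reset wipes it, then it syncs from the promoted node
    fail_and_rejoin(master, [promoted])
    return promoted.queue, master.queue
```

Without the mirror, the reset in `fail_and_rejoin` would leave the queue empty even though the data had been "persisted" locally before the failure.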

redis

Master/slave mode: the master serves reads and writes, the slaves are read-only.
The directories persisted to ceph rbd are /opt and /data for redis, and /opt for sentinel. The example below uses 4 sentinels and 4 redis servers:
default redis-sentinel-0 2/2 Running 0 1h 12.240.1.122 k8snode4
default redis-sentinel-1 2/2 Running 0 6h 12.240.3.63 k8snode2
default redis-sentinel-2 2/2 Running 2 6h 12.240.2.72 k8snode3
default redis-sentinel-3
default redis-server-0 1/1 Running 1 6h 12.240.2.71 k8snode3
default redis-server-1 1/1 Running 0 1h 12.240.1.121 k8snode4
default redis-server-2 1/1 Running 0 6h 12.240.3.64 k8snode2
default redis-server-3
By default redis-server-0 is initialized as the master (read/write); the slaves are read-only.
1. Restart the node hosting redis-server-0, then check the status: redis-server-1 has become the master. Commands:
kubectl exec -it redis-sentinel-0 -c redis-sword -- /bin/sh
# redis-cli -p 26379
127.0.0.1:26379> info
127.0.0.1:26379> SENTINEL masters
127.0.0.1:26379> SENTINEL slaves mymaster
2. Test how data recovers after a node goes down. Steps:
1) Under normal conditions:
kubectl exec -it redis-server-1 -- /bin/sh
# redis-cli
127.0.0.1:6379> auth abcd
127.0.0.1:6379> set ding ding
OK
127.0.0.1:6379>
127.0.0.1:6379> get ding
"ding"
2) The slave nodes can also get the data.
3) Restart the host of any slave node, then keep inserting data on the master:
127.0.0.1:6379> set x asdfasdfads
OK
127.0.0.1:6379> get x
"asdfasdfads"
The data can also be queried on the other healthy slave nodes.
4) Once the failed server starts back up, queries work normally:
127.0.0.1:6379> get x
"asdfasdfads"
5) If the data is deleted while the node is down, the query on the failed node also returns nothing after it restarts.

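The failover observed in step 1 (redis-server-1 taking over after redis-server-0's host was restarted) is decided by the sentinels: a failover only starts once enough of them agree the master is down. A heavily simplified sketch of that quorum logic follows; the quorum value and the slave ranking are assumptions for illustration, not taken from the deployment above.

```python
# Simplified model of sentinel failover: once a quorum of sentinels
# agree the master is down, one slave is promoted. Real sentinel
# (SDOWN/ODOWN, epochs, replication-offset ranking) is more involved.

def elect_new_master(down_votes, quorum, slaves):
    """Return the slave to promote, or None if the quorum is not met."""
    if down_votes < quorum:
        return None                  # not enough sentinels agree: no failover
    # real sentinel ranks slaves by priority and replication offset;
    # here we simply promote the first healthy slave
    return slaves[0] if slaves else None

slaves = ["redis-server-1", "redis-server-2", "redis-server-3"]
# a single sentinel alone cannot trigger a failover (assumed quorum of 2)
assert elect_new_master(1, 2, slaves) is None
# three of the four sentinels see the master down: redis-server-1 is promoted
assert elect_new_master(3, 2, slaves) == "redis-server-1"
```

This is also why step 5 behaves the way it does: the surviving replicas, not the failed node's disk, are the source of truth after a failover.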

mongodb

Primary/secondary mode: the primary serves reads and writes; secondaries can serve reads, but reads and writes are rejected by default. Run db.getMongo().setSlaveOk() to make a secondary readable.
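The read rules just described can be sketched as a toy model. The class and method names below are invented; this is not the mongo shell or a real driver API.

```python
# Toy model of replica-set read permissions: secondaries reject reads
# until slaveOk is enabled, and reject writes always. Invented API.

class ReplicaNode:
    def __init__(self, role):
        self.role = role                 # "PRIMARY" or "SECONDARY"
        self.slave_ok = False
        self.docs = {"fname": "jeff"}    # pretend replicated data

    def set_slave_ok(self):
        # mirrors the effect of db.getMongo().setSlaveOk() in the mongo shell
        self.slave_ok = True

    def read(self, key):
        if self.role != "PRIMARY" and not self.slave_ok:
            raise PermissionError("not master and slaveOk=false")
        return self.docs.get(key)

    def write(self, key, value):
        if self.role != "PRIMARY":
            raise PermissionError("not master")  # writes go to the primary only
        self.docs[key] = value
```

A secondary raises until `set_slave_ok()` is called, after which reads succeed but writes still fail, matching the behaviour described above.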
mongo-0 1/1 Running 0 8h 12.240.3.61 k8snode2
mongo-1 1/1 Running 0 3h 12.240.1.120 k8snode4
mongo-2 1/1 Running 1 8h 12.240.2.78 k8snode3
mongo-3 1/1 Running 4 8h 12.240.4.76 k8snode1
1. Check the status:
mongo:PRIMARY> rs.status()
{
  "set" : "mongo",
  "date" : ISODate("2017-08-31T09:30:32.700Z"),
  "myState" : 1,
  "term" : NumberLong(3),
  "heartbeatIntervalMillis" : NumberLong(2000),
  "optimes" : {
    "lastCommittedOpTime" : {
      "ts" : Timestamp(1504171825, 1),
      "t" : NumberLong(3)
    },
    "appliedOpTime" : {
      "ts" : Timestamp(1504171825, 1),
      "t" : NumberLong(3)
    },
    "durableOpTime" : {
      "ts" : Timestamp(1504171825, 1),
      "t" : NumberLong(3)
    }
  },
  "members" : [
    {
      "_id" : 0,
      "name" : "mongo-0.mongo.default.svc.cluster.local:27017",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 28350,
      "optime" : {
        "ts" : Timestamp(1504171825, 1),
        "t" : NumberLong(3)
      },
      "optimeDate" : ISODate("2017-08-31T09:30:25Z"),
      "electionTime" : Timestamp(1504159230, 1),
      "electionDate" : ISODate("2017-08-31T06:00:30Z"),
      "configVersion" : 12,
      "self" : true
    },
    {
      "_id" : 1,
      "name" : "mongo-1.mongo.default.svc.cluster.local:27017",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 12541,
      "optime" : {
        "ts" : Timestamp(1504171825, 1),
        "t" : NumberLong(3)
      },
      "optimeDurable" : {
        "ts" : Timestamp(1504171825, 1),
        "t" : NumberLong(3)
      },
      "optimeDate" : ISODate("2017-08-31T09:30:25Z"),
      "optimeDurableDate" : ISODate("2017-08-31T09:30:25Z"),
      "lastHeartbeat" : ISODate("2017-08-31T09:30:31.156Z"),
      "lastHeartbeatRecv" : ISODate("2017-08-31T09:30:30.179Z"),
      "pingMs" : NumberLong(0),
      "configVersion" : 12
    },
    {
      "_id" : 2,
      "name" : "mongo-2.mongo.default.svc.cluster.local:27017",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 6152,
      "optime" : {
        "ts" : Timestamp(1504171825, 1),
        "t" : NumberLong(3)
      },
      "optimeDurable" : {
        "ts" : Timestamp(1504171825, 1),
        "t" : NumberLong(3)
      },
      "optimeDate" : ISODate("2017-08-31T09:30:25Z"),
      "optimeDurableDate" : ISODate("2017-08-31T09:30:25Z"),
      "lastHeartbeat" : ISODate("2017-08-31T09:30:31.156Z"),
      "lastHeartbeatRecv" : ISODate("2017-08-31T09:30:30.189Z"),
      "pingMs" : NumberLong(0),
      "configVersion" : 12
    },
    {
      "_id" : 3,
      "name" : "mongo-3.mongo.default.svc.cluster.local:27017",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 5,
      "optime" : {
        "ts" : Timestamp(1504171825, 1),
        "t" : NumberLong(3)
      },
      "optimeDurable" : {
        "ts" : Timestamp(1504171825, 1),
        "t" : NumberLong(3)
      },
      "optimeDate" : ISODate("2017-08-31T09:30:25Z"),
      "optimeDurableDate" : ISODate("2017-08-31T09:30:25Z"),
      "lastHeartbeat" : ISODate("2017-08-31T09:30:31.180Z"),
      "lastHeartbeatRecv" : ISODate("2017-08-31T09:30:30.331Z"),
      "pingMs" : NumberLong(0),
      "configVersion" : 12
    }
  ],
  "ok" : 1
}

2. Insert data; it can then be queried on any node.
mongo:PRIMARY> db.user.insert({fname:"jeff",lname:"jiang"})
WriteResult({ "nInserted" : 1 })
mongo:PRIMARY>
mongo:PRIMARY> db.user.find()
{ "_id" : ObjectId("59a7d7f573edcefd94f5bd9e"), "fname" : "jeff", "lname" : "jiang" }
mongo:PRIMARY>
mongo:PRIMARY> i={fname:"chengcheng",lname:"zhang"}
{ "fname" : "chengcheng", "lname" : "zhang" }
mongo:PRIMARY>
mongo:PRIMARY> j={fname:"dengdeng",lname:"pan"}
{ "fname" : "dengdeng", "lname" : "pan" }
mongo:PRIMARY> db.user.insert(i)
WriteResult({ "nInserted" : 1 })
mongo:PRIMARY> db.user.insert(j)
WriteResult({ "nInserted" : 1 })
mongo:PRIMARY>
mongo:PRIMARY> db.user.find()
{ "_id" : ObjectId("59a7d7f573edcefd94f5bd9e"), "fname" : "jeff", "lname" : "jiang" }
{ "_id" : ObjectId("59a7d9c373edcefd94f5bd9f"), "fname" : "chengcheng", "lname" : "zhang" }
{ "_id" : ObjectId("59a7d9c673edcefd94f5bda0"), "fname" : "dengdeng", "lname" : "pan" }
3. Restart one of the servers to simulate a failure. While it is rebooting, delete one document and add a new one; after the failed node recovers, query it to check whether the data has been synced.
1) Simulate a failure of mongo-3 by restarting it.
2) Perform the data operations, adding one document and deleting one:
mongo:PRIMARY> db.user.find()
{ "_id" : ObjectId("59a7d7f573edcefd94f5bd9e"), "fname" : "jeff", "lname" : "jiang" }
{ "_id" : ObjectId("59a7d9c373edcefd94f5bd9f"), "fname" : "chengcheng", "lname" : "zhang" }
{ "_id" : ObjectId("59a7d9c673edcefd94f5bda0"), "fname" : "dengdeng", "lname" : "pan" }
mongo:PRIMARY>
mongo:PRIMARY> x={fname:"bbbbb",lname:"xxxxxxxxxx"}
{ "fname" : "bbbbb", "lname" : "xxxxxxxxxx" }
mongo:PRIMARY> db.user.insert(x)
WriteResult({ "nInserted" : 1 })
mongo:PRIMARY>
mongo:PRIMARY> db.user.remove({fname:"chengcheng",lname:"zhang"})
WriteResult({ "nRemoved" : 1 })
mongo:PRIMARY>
mongo:PRIMARY> db.user.find()
{ "_id" : ObjectId("59a7d7f573edcefd94f5bd9e"), "fname" : "jeff", "lname" : "jiang" }
{ "_id" : ObjectId("59a7d9c673edcefd94f5bda0"), "fname" : "dengdeng", "lname" : "pan" }
{ "_id" : ObjectId("59a7dafc73edcefd94f5bda1"), "fname" : "bbbbb", "lname" : "xxxxxxxxxx" }
After the failed node restarted and rejoined the cluster, logging into it and querying returned the same data.
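Step 3's result (the restarted mongo-3 catching up on both the insert and the delete) is the work of oplog replay: a recovering secondary pulls the operations it missed from the primary and applies them in order. A minimal sketch of that idea, with an invented op format rather than MongoDB's real oplog schema:

```python
# Minimal model of oplog catch-up: a secondary that was down replays
# the operations it missed, in order, and converges with the primary.
# The (kind, doc_id, doc) op format here is invented for illustration.

def apply_op(data, op):
    kind, doc_id, doc = op
    if kind == "insert":
        data[doc_id] = doc
    elif kind == "remove":
        data.pop(doc_id, None)
    return data

def catch_up(secondary_data, oplog, last_applied):
    # replay every op after the secondary's last applied position
    for op in oplog[last_applied:]:
        apply_op(secondary_data, op)
    return secondary_data

oplog = [
    ("insert", 1, {"fname": "jeff"}),
    ("insert", 2, {"fname": "chengcheng"}),
    # mongo-3 goes down here, having applied the first two ops
    ("insert", 3, {"fname": "bbbbb"}),
    ("remove", 2, None),
]
secondary = catch_up({1: {"fname": "jeff"}, 2: {"fname": "chengcheng"}}, oplog, 2)
```

After replay the secondary holds jeff and bbbbb but not chengcheng, matching the find() results seen on the recovered node.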

zookeeper


Clients can send read and write requests to any node; followers and the leader exchange request, proposal, ack, and commit messages between them.
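The request/proposal/ack/commit exchange can be sketched as a toy two-phase broadcast: the leader proposes a write, followers ack it, and the leader commits once a majority of the ensemble (itself included) has acked. This is a simplification of ZAB for illustration, not ZooKeeper code.

```python
# Toy two-phase broadcast in the spirit of ZAB: the leader commits a
# proposal once a strict majority of the ensemble has acknowledged it.

def commit_decision(ensemble_size, follower_acks):
    """Leader counts itself plus follower acks; commit on strict majority."""
    votes = 1 + follower_acks            # the leader always acks its own proposal
    return votes > ensemble_size // 2

# 3-node ensemble: one follower ack suffices (leader + 1 = 2 of 3)
assert commit_decision(3, 1)
# 5-node ensemble: one follower ack is not enough (2 of 5)
assert not commit_decision(5, 1)
```

The majority rule is why a write accepted by the cluster survives any minority of node failures, which is what the tests below set out to confirm.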
1. Create a znode on any node:
[zk: localhost:2181(CONNECTED) 0] create /apptest "this is test data"
Created /apptest
[zk: localhost:2181(CONNECTED) 1] set /apptest 123123123
cZxid = 0x30000000b
ctime = Fri Sep 01 03:40:02 GMT 2017
mZxid = 0x30000000c
mtime = Fri Sep 01 03:40:18 GMT 2017
pZxid = 0x30000000b
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 9
numChildren = 0
[zk: localhost:2181(CONNECTED) 2]
2. Query it on any node:
[zk: localhost:2181(CONNECTED) 5] get /apptest
123123123
cZxid = 0x30000000b
ctime = Fri Sep 01 03:40:02 GMT 2017
mZxid = 0x30000000c
mtime = Fri Sep 01 03:40:18 GMT 2017
pZxid = 0x30000000b
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 9
numChildren = 0
3. Simulate a node crash. While it is down, perform deletes and creates; after the failed node starts back up, run the corresponding queries to check whether the data has been synced.
On a healthy node:
[zk: localhost:2181(CONNECTED) 2] create /app2test "sss"
Created /app2test
[zk: localhost:2181(CONNECTED) 3] set /app2test 4444444
cZxid = 0x400000002
ctime = Fri Sep 01 03:47:16 GMT 2017
mZxid = 0x400000003
mtime = Fri Sep 01 03:47:31 GMT 2017
pZxid = 0x400000002
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0
[zk: localhost:2181(CONNECTED) 4] delete /apptest
After the failed node recovered, queries show the data is in sync:
[zk: localhost:2181(CONNECTED) 0] get /app2test
4444444
cZxid = 0x400000002
ctime = Fri Sep 01 03:47:16 GMT 2017
mZxid = 0x400000003
mtime = Fri Sep 01 03:47:31 GMT 2017
pZxid = 0x400000002
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0
[zk: localhost:2181(CONNECTED) 1] get /apptest
Node does not exist: /apptest
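The behaviour above (the restarted node returning the new znode and reporting the deleted one as gone) comes from the follower syncing with the leader before serving clients: on restart it compares its last seen zxid with the leader's log and applies the transactions it missed. A rough sketch with a simplified transaction log; the log format and the final delete's zxid are invented, and real ZooKeeper also uses snapshots and TRUNC/SNAP syncs.

```python
# Simplified follower resync: on restart, a follower replays the
# leader's committed transactions newer than its own last zxid.

def resync(follower_tree, follower_zxid, leader_log):
    for zxid, op, path, value in leader_log:
        if zxid <= follower_zxid:
            continue                     # already applied before the crash
        if op in ("create", "set"):
            follower_tree[path] = value
        elif op == "delete":
            follower_tree.pop(path, None)
        follower_zxid = zxid             # advance the last-applied position
    return follower_tree, follower_zxid

leader_log = [
    (0x30000000b, "create", "/apptest", "this is test data"),
    (0x30000000c, "set", "/apptest", "123123123"),
    # the follower crashes here, having applied up to 0x30000000c
    (0x400000002, "create", "/app2test", "sss"),
    (0x400000003, "set", "/app2test", "4444444"),
    (0x400000004, "delete", "/apptest", None),   # zxid assumed for illustration
]
tree, zxid = resync({"/apptest": "123123123"}, 0x30000000c, leader_log)
```

After the replay the follower's tree contains only /app2test with the new value, which is exactly what the CLI session above shows on the recovered node.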
