#ceph IRC Log


IRC Log for 2011-09-10

Timestamps are in GMT/BST.

[0:00] <greglap> I'm not real solid on how marking things lost impacts this, though, I'll poke sjust or sagewk when they're back at their computers
[0:01] <greglap> but I think it's possible you don't have anybody left who is responsible for them, and so the OSDs that are supposed to hold them don't know anything about them
[0:03] <slang> greglap: well, none of the osds that are listed for the down+peering pgs are any of the ones that were lost
[0:03] <greglap> slang: yes, at this point nothing is mapped to those OSDs
[0:04] <greglap> but PGs are proactively pushed around as the map changes, so if all the replicas for a PG were lost then strange things can happen
[0:04] <greglap> I expect there's a magic incantation to override the strict consistency that OSDs normally follow, but I'm not sure what it is
[0:13] <sjust> slang: try ceph pg force_create_pg <pgid> on one of the pgs
[0:13] <sjust> if it was mapped only to those lost osds, it won't recover on its own
[0:13] <greglap> probably want to check the existing OSDs to make sure that none of the down PGs have any on-disk state, right?
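The command sjust suggests here would be run against the monitors from any admin node; a minimal sketch, assuming a running cluster with the `ceph` CLI configured (the pg id `2.1f` is a hypothetical placeholder, not one from this log):

```shell
# Force the monitors to recreate a PG whose replicas were all lost.
# As greglap cautions, first confirm no surviving OSD holds on-disk
# state for the PG, since force-creating it discards any such data.
ceph pg force_create_pg 2.1f
```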
[0:15] <slang> that seemed to work for that pg
[0:15] <slang> but a pg shouldn't have been mapped to only lost osds
[0:15] <slang> let me explain though...
[0:15] <slang> I have 5 nodes, each with 7-8 drives
[0:15] <slang> I'm running an osd for each drive
[0:16] <slang> and have created a crushmap that prevents replicas being placed on osds on the same node
[0:17] <slang> although, I increased the replication factor to 3 for my data and metadata pool
[0:18] <slang> and it looks like all the down+peering pgs are on the rbd pool
[0:18] <slang> which (I assume) doesn't actually have any data
[0:18] <slang> so maybe setting the crushmap after the osds are created/running doesn't change the pgs that don't have any data?
[0:19] <greglap> slang: no, it still changes the PGs
[0:19] <sjust> slang: yeah
[0:19] <slang> yes? no?
[0:19] <sjust> slang: did you specifically set the crush rules for the rbd pool?
[0:20] <greglap> and were all your lost OSDs on one machine, is that why you think they couldn't have lost all replicas?
[0:20] <sjust> slang: sorry, to clarify, the crushmap should affect empty pgs
[0:21] <sjust> however, you have to set the crush rule for each pool separately
[0:21] <slang> sjust: I think I did set the crush rules for the rbd pool:
[0:21] <slang> rule rbd {
[0:21] <slang> ruleset 2
[0:21] <slang> type replicated
[0:21] <slang> min_size 1
[0:21] <slang> max_size 10
[0:21] <slang> step take root
[0:21] <slang> step chooseleaf firstn 0 type domain
[0:21] <slang> step emit
[0:21] <slang> }
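As sjust notes above, defining a CRUSH rule is not enough; each pool must also be pointed at the ruleset. A sketch of how that was done with the `ceph` CLI of this era, assuming a running cluster (ruleset 2 matches the `rule rbd` definition slang pasted):

```shell
# Point the rbd pool at ruleset 2 so the chooseleaf-by-domain
# placement applies to its (mostly empty) PGs as well.
ceph osd pool set rbd crush_ruleset 2
```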
[0:22] <slang> greglap: yes all the lost osds were on one machine
[0:22] <slang> (sorry should I have mentioned that too)
[0:22] <greglap> slang: is that the same ruleset that you used for your other pools?
[0:22] <slang> similar, but for data and metadata I used different min_size
[0:23] <slang> http://fpaste.org/3t6p/raw/
[0:24] <slang> I lost the node that maps to dom0
[0:24] <slang> it had 6 active osds
[0:24] <slang> all the down+peering pgs are 2.*
[0:25] <greglap> slang: we're looking
[0:26] <greglap> none of us is terribly familiar with CRUSH, we'll have to talk with Sage about how you got into this situation
[0:29] <slang> it's also strange to me that file system operations seem to hang, even though the only pgs that are in down+peering are in the rbd pool
[0:29] <slang> file system stuff should just be accessing objects in the metadata and data pools I would think
[0:29] <greglap> yeah, it should
[0:30] <greglap> what kinds of hangs are you seeing?
[0:30] <greglap> it might be bugs in the MDS rather than bugs related to data access
[0:31] <slang> cfuse seems to be stuck in _open -> make_request
[0:32] <slang> but maybe this is a separate issue
[0:32] <slang> as the cfuse client on a different node is working fine
[0:33] <greglap> slang: probably a separate issue
[0:34] <greglap> sage changed some of the mds locking last week which could have introduced bugs in client handling code, or there might have been some that we still hadn't found and squashed
[0:35] <slang> greglap: I haven't pulled changes that recently
[0:35] <greglap> oh, they'd have to be ones we didn't find and squash yet then
[0:35] <greglap> still definitely possible, though annoying :/
[0:43] * greglap (~Adium@aon.hq.newdream.net) Quit (Quit: Leaving.)
[0:55] <slang> I force_create_pg on all the pgs that were down+peering, and they seem to now just be stuck in a creating state...
[0:55] <slang> pg v156958: 7920 pgs: 66 creating, 19 active, 7816 active+clean, 19 active+clean+degraded
[1:11] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) Quit (Quit: jojy)
[1:13] <sjust> slang: hmm
[1:13] <sjust> can you post the output of ceph pg dump -o -
[1:13] <sjust> ?
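The dump sjust asks for writes the full per-PG state table to stdout (`-o -` directs output to stdout). A sketch of collecting just the stuck entries, assuming a running cluster; the `creating` filter matches the state slang reports below:

```shell
# Dump all PG state to stdout, then keep only the PGs stuck creating.
ceph pg dump -o - | grep creating
```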
[1:25] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[1:27] <Tv> new documentation: http://ceph.newdream.net/docs/latest/man/ (just converted manpages to web-friendly form)
[1:36] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[1:42] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Quit: Tv)
[1:53] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[1:54] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit ()
[2:05] * greglap (~Adium@aon.hq.newdream.net) Quit (Quit: Leaving.)
[2:06] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[2:07] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) Quit ()
[3:14] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:26] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[3:43] * greglap (~Adium@mobile-166-205-141-172.mycingular.net) has joined #ceph
[3:43] * greglap (~Adium@mobile-166-205-141-172.mycingular.net) Quit ()
[4:16] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[4:30] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[4:36] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[4:54] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:20] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[5:24] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[5:25] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit ()
[5:25] * MKFG (~MK_FG@ has joined #ceph
[5:28] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Ping timeout: 480 seconds)
[5:28] * MKFG is now known as MK_FG
[5:30] * The_Bishop (~bishop@port-92-206-251-64.dynamic.qsc.de) has joined #ceph
[7:38] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[7:38] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit ()
[7:51] * yehuda_hm (~yehuda@bzq-79-183-192-55.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[9:04] * niemeyer (~niemeyer@200-102-220-181.pltce701.dsl.brasiltelecom.net.br) Quit (Remote host closed the connection)
[12:52] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) has joined #ceph
[13:42] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Quit: Ex-Chat)
[14:08] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[16:11] * The_Bishop (~bishop@port-92-206-251-64.dynamic.qsc.de) Quit (Remote host closed the connection)
[17:17] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[17:20] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[18:15] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Read error: No route to host)
[18:42] * niemeyer (~niemeyer@200-102-220-181.pltce701.dsl.brasiltelecom.net.br) has joined #ceph
[20:54] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[20:59] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) has joined #ceph
[21:35] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[21:52] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) has joined #ceph
[22:19] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[22:44] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) has joined #ceph
[22:58] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[23:00] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.