#ceph IRC Log


IRC Log for 2012-07-12

Timestamps are in GMT/BST.

[0:10] * theron (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[0:15] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:16] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:16] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Read error: Connection reset by peer)
[0:25] <sjust> is it just me or are the autobuilders inaccessible?
[0:29] <mgalkiewicz> looks like the osd is constantly timing out on something
[0:31] <elder> sjust, it's not just you.
[0:31] <mgalkiewicz> sjust: is it expected?
[0:31] <elder> it's been that way for a few hours.
[0:32] <mgalkiewicz> sjust: it crashed
[0:32] <sjust> uh
[0:32] <sjust> hmm
[0:32] <sjust> can you post the log again?
[0:33] <mgalkiewicz> sure
[0:39] <mgalkiewicz> sjust: another.osd.0.log.gz
[0:39] <sjust> thanks
[0:40] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: No route to host)
[0:40] * theron (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[0:47] * sabcio (~sabcio@77-253-65-155.adsl.inetia.pl) has joined #ceph
[0:50] * theron_ (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[0:50] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[0:50] * theron_ is now known as theron
[0:53] * sabcio (~sabcio@77-253-65-155.adsl.inetia.pl) Quit (Remote host closed the connection)
[0:54] * joshd (~jdurgin@2602:306:c5db:310:1e6f:65ff:feaa:beb7) has joined #ceph
[1:01] <sjust> mgalkiewicz: what filesystem are you using
[1:01] <sjust> there are remove operations taking longer than a second
[1:01] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:02] <sjust> mgalkiewicz: it appears that it timed out at 20 minutes about half way through the transaction
[1:02] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) has joined #ceph
[1:04] <mgalkiewicz> sjust: btrfs
[1:05] <mgalkiewicz> if it makes things better I can perform the upgrade
[1:05] <sjust> mgalkiewicz: something odd is going on, set the timeouts above to 6000 and 12000 (basically infinite)
[1:05] <sjust> mgalkiewicz: we need to get the osd up first
[1:05] <mgalkiewicz> ok
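The exact option names being raised here refer to an earlier paste that is not part of this log, so the following ceph.conf sketch only illustrates the kind of change sjust is asking for, using the filestore thread timeout options Ceph shipped in this era; treat the keys as an assumption and confirm them against the running version:

    [osd]
        # assumed option names; effectively disable the op thread timeouts
        # while the large transaction replays (values from the conversation)
        filestore op thread timeout = 6000
        filestore op thread suicide timeout = 12000

After editing ceph.conf the OSD has to be restarted for the new timeouts to take effect, which is what happens at 1:08 below.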
[1:06] <sjust> mgalkiewicz: as long as you continue to see lines like:
[1:06] <sjust> 2012-07-12 00:19:35.895441 7f2967ea8700 filestore(/srv/ceph/osd.0) collection_setattr /srv/ceph/osd.0/current/159.7_head 'ondisklog' len 30
[1:06] <sjust> 2012-07-12 00:19:36.218846 7f2967ea8700 filestore(/srv/ceph/osd.0) collection_setattr /srv/ceph/osd.0/current/159.7_head 'ondisklog' len 30 = 30
[1:06] <sjust> 2012-07-12 00:19:36.372166 7f2967ea8700 filestore(/srv/ceph/osd.0) collection_setattr /srv/ceph/osd.0/current/159.7_head 'info' len 432
[1:06] <sjust> 2012-07-12 00:19:36.385453 7f2967ea8700 filestore(/srv/ceph/osd.0) collection_setattr /srv/ceph/osd.0/current/159.7_head 'info' len 432 = 432
[1:06] <sjust> 2012-07-12 00:19:36.389012 7f2967ea8700 filestore(/srv/ceph/osd.0) truncate meta/a0831fa6/pginfo_159.7/0 size 0
[1:06] <sjust> 2012-07-12 00:19:36.409094 7f2967ea8700 filestore(/srv/ceph/osd.0) truncate meta/a0831fa6/pginfo_159.7/0 size 0 = 0
[1:06] <sjust> 2012-07-12 00:19:36.409121 7f2967ea8700 filestore(/srv/ceph/osd.0) write meta/a0831fa6/pginfo_159.7/0 0~8
[1:06] <sjust> 2012-07-12 00:19:36.424289 7f2967ea8700 filestore(/srv/ceph/osd.0) queue_flusher ep 1 fd 30 0~8 qlen 1
[1:06] <sjust> 2012-07-12 00:19:36.424308 7f2967ea8700 filestore(/srv/ceph/osd.0) write meta/a0831fa6/pginfo_159.7/0 0~8 = 8
[1:06] <sjust> 2012-07-12 00:19:36.424323 7f2967ea8700 filestore(/srv/ceph/osd.0) collection_setattr /srv/ceph/osd.0/current/151.0_head 'info' len 432
[1:06] <sjust> 2012-07-12 00:19:38.539727 7f2967ea8700 filestore(/srv/ceph/osd.0) collection_setattr /srv/ceph/osd.0/current/151.0_head 'info' len 432 = 432
[1:06] <sjust> 2012-07-12 00:19:38.543945 7f2967ea8700 filestore(/srv/ceph/osd.0) truncate meta/a08dfc76/pginfo_151.0/0 size 0
[1:06] <sjust> 2012-07-12 00:19:38.624769 7f2967ea8700 filestore(/srv/ceph/osd.0) truncate meta/a08dfc76/pginfo_151.0/0 size 0 = 0
[1:06] <sjust> 2012-07-12 00:19:38.624805 7f2967ea8700 filestore(/srv/ceph/osd.0) write meta/a08dfc76/pginfo_151.0/0 0~504
[1:06] <sjust> every few minutes, it's still making progress
[1:07] <mgalkiewicz> does the whole process start from the beginning after restarting it?
[1:07] <sjust> no
[1:07] <sjust> just that transaction
[1:07] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Quit: theron)
[1:08] <mgalkiewicz> ok timeouts increased osd started
[1:10] * joshd (~jdurgin@2602:306:c5db:310:1e6f:65ff:feaa:beb7) Quit (Remote host closed the connection)
[1:46] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[1:50] <mgalkiewicz> sjust: last lines like these were 15 minutes ago
[1:50] <sjust> nothing starting with filestore(
[1:50] <sjust> ?
[1:51] <sjust> sorry, including filestore(
[1:51] <mgalkiewicz> right now only:
[1:51] <mgalkiewicz> 2012-07-12 01:51:23.867368 7faed675b700 filestore(/srv/ceph/osd.0) sync_entry woke after 5.003755
[1:51] <mgalkiewicz> 2012-07-12 01:51:23.867429 7faed675b700 filestore(/srv/ceph/osd.0) sync_entry waiting for max_interval 5.000000
[1:51] <sjust> is the osd up?
[1:52] <mgalkiewicz> yes
[1:52] <sjust> serving requests?
[1:52] <mgalkiewicz> peering
[1:56] <sjust> ceph -s output?
[1:56] <sjust> it's the only osd right?
[1:56] <sjust> and there haven't been more osds in the past?
[1:57] <mgalkiewicz> yes
[1:57] <mgalkiewicz> there were two and the other one was removed
[1:57] <sjust> when?
[1:58] <mgalkiewicz> month ago I guess
[1:58] <sjust> and the cluster has been healthy and functional since then?
[1:58] <sjust> ok
[1:58] <mgalkiewicz> yes
[1:58] <sjust> in that case, it should eventually complete peering
[1:58] <sjust> ceph -s output?
[1:59] <mgalkiewicz> https://gist.github.com/3094588
[1:59] <mgalkiewicz> wait a sec
[1:59] <mgalkiewicz> and now it is stale+peering
[1:59] <sjust> yeah
[2:00] <mgalkiewicz> and it is not up
[2:00] <sjust> did it crash?
[2:00] <mgalkiewicz> no just osd e1021: 1 osds: 0 up, 1 in
[2:01] <sjust> is the osd spewing a lot of output?
[2:01] <mgalkiewicz> it was stale+peering at the beginning, then it was peering for some time, and now it is stale+peering once again. is that ok?
[2:01] <sjust> maybe, depends on what the osd is actually doing
[2:01] <sjust> can you gzip up the log as it currently exists and push it?
[2:01] <sjust> without killing the process
[2:01] <mgalkiewicz> k
[2:03] * joshd (~jdurgin@2602:306:c5db:310:1e6f:65ff:feaa:beb7) has joined #ceph
[2:04] <mgalkiewicz> osd e1022: 1 osds: 0 up, 0 in
[2:04] <mgalkiewicz> the log is still copying so we might miss something
[2:06] <sjust> as long as the osd process is alive, it's probably making progress
[2:06] <mgalkiewicz> I have crashed it a couple of times today so I will probably end up creating a new cluster from scratch
[2:08] <sjust> the thing is, it should recover
[2:13] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[2:15] <mgalkiewicz> sjust: log.gz and https://gist.github.com/3094635
[2:17] <dmick> it appears that teuthology is back
[2:19] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) Quit (Quit: LarsFronius)
[2:19] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) has joined #ceph
[2:22] <mgalkiewicz> sjust: now there are some filestore lines like those above
[2:22] <sjust> hmm
[2:22] <mgalkiewicz> looks like they appear during stale+peering state
[2:23] <mgalkiewicz> and not during peering
[2:23] <mgalkiewicz> maybe we should check whether the osd is repeatedly trying to recover the same pg?
[2:26] <mgalkiewicz> I have also updated the latest gist; there were three "1" in the first line
[2:26] <mgalkiewicz> there should be three "1"
[2:27] <sjust> it's acting really very odd
[2:27] * Tv_ (~tv@2607:f298:a:607:c4fb:49d5:841d:f90) Quit (Quit: Tv_)
[2:27] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[2:28] <sjust> ok
[2:28] <sjust> upgrading the osd probably isn't any higher risk than continuing down this path
[2:29] <mgalkiewicz> ok I will start upgrade
[2:29] <sjust> you will want to leave those timeouts as is though
[2:30] <mgalkiewicz> ok
[2:31] * loicd (~loic@151.216.22.26) has joined #ceph
[2:33] <sjust> mgalkiewicz: you are upgrading to argonaut right?
[2:34] * hufman (~hufman@CPE-72-128-65-189.wi.res.rr.com) Quit (Quit: leaving)
[2:36] * Cube (~Adium@12.248.40.138) has joined #ceph
[2:38] <mgalkiewicz> yes
[2:40] * loicd (~loic@151.216.22.26) Quit (Ping timeout: 480 seconds)
[2:41] <mgalkiewicz> sjust: https://gist.github.com/3094762
[2:42] <mgalkiewicz> what is interesting is that ceph -s only works intermittently
[2:42] <mgalkiewicz> the osd logs show something like this:
[2:42] <mgalkiewicz> 2012-07-12 02:42:49.194956 7f680aa7e780 -1 Updating collection 166.5_head current version is 2
[2:42] <mgalkiewicz> 2012-07-12 02:42:50.184528 7f680aa7e780 -1 collection 166.5_head updated
[2:43] <mgalkiewicz> so I guess this is data reorganization
[2:48] <mgalkiewicz> it has finished and right now the status is: osdmap e1024: 1 osds: 0 up, 0 in, and the osd logs look the same as with the older version
[2:49] <mgalkiewicz> sjust: any ideas?
[2:56] * loicd (~loic@151.216.22.26) has joined #ceph
[2:58] * loicd1 (~loic@151.216.22.26) has joined #ceph
[3:02] * loicd (~loic@151.216.22.26) Quit (Read error: Connection reset by peer)
[3:05] * Cube (~Adium@12.248.40.138) Quit (Ping timeout: 480 seconds)
[3:05] * loicd1 (~loic@151.216.22.26) Quit (Read error: Connection reset by peer)
[3:06] <dmick> mgalkiewicz: he's in a meeting at the moment again; shouldn't be too long
[3:06] <mgalkiewicz> ok
[3:07] * loicd (~loic@2001:67c:28dc:850:412e:1426:df7a:be35) has joined #ceph
[3:15] * loicd (~loic@2001:67c:28dc:850:412e:1426:df7a:be35) Quit (Ping timeout: 480 seconds)
[3:25] <sjust> mgalkiewicz: it finished the upgrade?
[3:29] <mgalkiewicz> yes
[3:29] <mgalkiewicz> but the situation looks the same
[3:29] <sjust> can you post the log again?
[3:29] <mgalkiewicz> sure
[3:37] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[3:38] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[3:41] <mgalkiewicz> sjust: ceph-osd.0.log.log.gz
[3:46] <sjust> it appears to have been writing a 17 MB file for 2 minutes
[3:47] <sjust> this appears to be pathological btrfs behavior
[3:47] <mgalkiewicz> what do you suggest
[3:50] <sjust> what kernel version are you running?
[3:51] <mgalkiewicz> 3.2.12-1
[3:52] <mgalkiewicz> I can easily perform upgrade to 3.2.20
[3:52] <sjust> when was the btrfs volume created?
[3:52] <mgalkiewicz> when in time or with which kernel?
[3:52] <sjust> which kernel is probably more pertinent
[3:53] <mgalkiewicz> 2.6.32
[3:54] <sjust> can you find the file containing pglog_166.2 in it's name in the osd data directory?
[3:54] <sjust> *its
[3:55] <mgalkiewicz> checking
[3:55] <elder> Anyone know why gitbuilder.ceph.com is still not responsive?
[3:55] <sjust> networking something something
[3:56] <elder> Wait, sorry, wrong machine.
[3:56] <sjust> i've stopped asking
[3:56] <sjust> it's up?
[3:56] <dmick> *some* things are back
[3:56] <sjust> oh
[3:56] <sjust> cool
[3:56] <elder> ceph.com/guiltbuilder-precise64/
[3:56] <mgalkiewicz> not sure where to look for this file
[3:56] <elder> Simon says all is well.
[3:56] <dmick> gitbuilder.cgi can't connect out to the gitbuilder VMs
[3:56] <mgalkiewicz> ceph_path/osd.0/??
[3:57] <elder> I think it lost a few of my keystrokes. gitbuilder-precise-kernel-amd64
[3:57] <dmick> I love guiltbuilder tho
[3:57] <elder> My teuthology lock problem cleared up.
[3:57] <sjust> I suggest find
[3:57] <dmick> yes: (05:17:02 PM) dmick: it appears that teuthology is back
[3:57] <sjust> there are a lot of subdirs
[3:58] <dmick> so ceph.com can't get to gitbuilder output, but
[3:58] <dmick> the gitbuilder VMs are up
[3:58] <mgalkiewicz> something like this: snap_37813585/meta/DIR_2/DIR_0/pglog\u106.5__0_1FA84802__none ?
[3:58] <elder> I may be OK.
[3:58] <sjust> yeah, that's it
[3:58] <elder> Nope.
[3:58] <sjust> oh, no it's not
[3:59] <elder> I can get to the packages on gitbuilder.ceph.com.
[3:59] <elder> But running the teuthology task it tells me ERROR 403: Forbidden.
[4:00] <elder> Pretty foreboding.
[4:00] <dmick> VERBOTEN
[4:00] <sjust> find . -name 'pglog\\u166.2*'
[4:00] <sjust> that should do it
[4:00] <mgalkiewicz> https://gist.github.com/3095180
[4:00] <mgalkiewicz> which one
[4:01] <sjust> ./current/meta/DIR_2/DIR_7/pglog\u166.2__0_1FD33872__none
[4:01] <sjust> that one
[4:02] <mgalkiewicz> and what about it? upload?
[4:02] <sjust> hang on
[4:03] <sjust> there is a way to get btrfs to tell you how fragmented the file is
[4:03] * chutzpah (~chutz@100.42.98.5) Quit (Quit: Leaving)
[4:04] <sjust> try filefrag ./current/meta/DIR_2/DIR_7/pglog\u166.2__0_1FD33872__none
[4:05] <mgalkiewicz> 129 extents found
[4:05] <sjust> ok
[4:06] <sjust> people in the room are suggesting that that is a silly number of fragments
[4:06] <mgalkiewicz> by silly you mean that it is not an issue?
[4:06] <dmick> no, a *lot*
[4:06] <sjust> I am going to go now for the night, but you should try defragging the volume and trying again
[4:07] <mgalkiewicz> ok
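Pulling the steps from this exchange together as a shell sketch; the paths are taken from the log itself, while the final defragment invocation is the generic btrfs command and is only an assumption about what "defragging the volume" means here:

    cd /srv/ceph/osd.0
    # locate the pg log object sjust asked about
    find . -name 'pglog\\u166.2*'
    # count how many extents the file is split into
    filefrag './current/meta/DIR_2/DIR_7/pglog\u166.2__0_1FD33872__none'
    # defragment the OSD data volume (the -r recursive flag may not exist
    # in older btrfs-progs; running the command per file is the fallback)
    btrfs filesystem defragment -r /srv/ceph/osd.0

As mgalkiewicz reports a few lines further down, the defragmentation did not help in this case.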
[4:07] * nhmlap (~Adium@2607:f298:a:607:99d3:118:e217:e55a) Quit (Quit: Leaving.)
[4:07] <mgalkiewicz> thx for help
[4:07] <sjust> not at all, thanks for the feedback
[4:07] <sjust> I'll be back on in 14 hours
[4:08] <mgalkiewicz> hmm till then I will probably install ceph once again:)
[4:08] <mgalkiewicz> but defragmentation first
[4:18] <mgalkiewicz> did not help
[4:31] * loicd (~loic@151.216.22.26) has joined #ceph
[4:32] * nhmlap (~Adium@12.238.188.253) has joined #ceph
[4:48] * loicd (~loic@151.216.22.26) Quit (Ping timeout: 480 seconds)
[5:19] * nhmlap1 (~Adium@12.238.188.253) has joined #ceph
[5:19] * nhmlap (~Adium@12.238.188.253) Quit (Read error: Connection reset by peer)
[5:44] * renzhi (~renzhi@180.169.73.90) has joined #ceph
[5:50] * deepsa (~deepsa@122.172.6.109) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[6:22] * deepsa (~deepsa@122.172.6.109) has joined #ceph
[6:38] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[6:38] <mgalkiewicz> anybody there?
[6:41] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[7:10] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:25] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[7:48] * dmick (~dmick@38.122.20.226) Quit (Quit: Leaving.)
[7:58] * loicd (~loic@151.216.22.26) has joined #ceph
[8:07] * loicd (~loic@151.216.22.26) Quit (Ping timeout: 480 seconds)
[8:11] * nhmlap1 (~Adium@12.238.188.253) Quit (Quit: Leaving.)
[8:15] * joshd (~jdurgin@2602:306:c5db:310:1e6f:65ff:feaa:beb7) Quit (Quit: Leaving.)
[8:36] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:41] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:47] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Ping timeout: 480 seconds)
[9:01] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[9:10] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[9:18] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:19] * loicd (~loic@2001:67c:28dc:850:9c85:f353:f799:710b) has joined #ceph
[9:28] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:40] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:45] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) has joined #ceph
[9:59] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) Quit (Quit: LarsFronius)
[10:16] * renzhi (~renzhi@180.169.73.90) Quit (Quit: Leaving)
[10:24] * loicd (~loic@2001:67c:28dc:850:9c85:f353:f799:710b) Quit (Quit: Leaving.)
[10:26] * renzhi (~renzhi@180.169.73.90) has joined #ceph
[10:32] * tnt_ (~tnt@212-166-48-236.win.be) has joined #ceph
[10:33] <tnt_> Hi. I'm wondering if anyone is running radosgw behind lighttpd? My current issue is that the radosgw executable does a fork() and goes into the background, while lighttpd expects it to stay in the foreground ... (and so lighttpd aborts with an error ...)
[10:35] <Fruit> if radosgw can listen on a socket (perhaps with spawn-fcgi?) you could run it externally
[10:36] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:37] <Fruit> ah yes, it can. use rgw socket path
[10:37] <tnt_> yes, in both cases it talks via the socket path, but I would have liked it to start/stop with lighttpd ...
[10:38] <tnt_> the config in http://ceph.com/wiki/RADOS_Gateway suggests it can ...
[10:38] <tnt_> but apparently it doesn't work. Maybe the behavior of radosgw changed recently?
[10:39] <Fruit> I'm afraid I don't know, never having used rgw (yet)
[10:40] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:43] <Fruit> one advantage of running it externally is that you can use separate user accounts
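A minimal sketch of the external setup Fruit describes, assuming the section and option names used by the RADOS Gateway documentation of this period (a client.radosgw.* section with rgw socket path); radosgw then owns the FastCGI socket, and lighttpd only needs a fastcgi.server entry whose "socket" points at the same path with "check-local" disabled:

    # /etc/ceph/ceph.conf (assumed section name and paths)
    [client.radosgw.gateway]
        rgw socket path = /var/run/ceph/radosgw.sock
        keyring = /etc/ceph/keyring.radosgw.gateway

    # start radosgw outside the web server, e.g. from an init script,
    # optionally under its own user account
    radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway

The trade-off tnt_ points out remains: radosgw no longer starts and stops with lighttpd, so its lifetime has to be managed separately.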
[10:46] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:48] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[10:48] * LarsFronius_ is now known as LarsFronius
[10:50] <renzhi> hijacker, how can I configure Ceph to run in a VLAN, meaning that the synchronization between ceph nodes is done on its own network segment?
[10:50] <renzhi> i meant "hi", sorry
[10:51] <tnt_> just configure a vlan with different IP range and use addresses in that range for all ceph config ...
[11:06] * loicd (~loic@151.216.22.26) has joined #ceph
[11:12] <darkfader> reminds me of the multihoming feature request
[11:12] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[11:19] <renzhi> sorry, just walked away
[11:20] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:20] <renzhi> what I meant is, the servers have 2 network ports, and I'd like ceph internal synchronization performed only on that VLAN, while applications outside of the VLAN do read/write
[11:21] <renzhi> I remember there was a sample config on the list, but just couldn't find it now
[11:21] <renzhi> I was scratching my head on how to separate that in ceph.conf
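For reference, this split is normally expressed in ceph.conf with the public/cluster network options, or their per-daemon addr equivalents; whether the network-wide form is available depends on the Ceph version, so treat this as a sketch with assumed example subnets rather than a 0.48-specific recipe:

    [global]
        # client-facing traffic (example subnet)
        public network = 192.168.1.0/24
        # OSD replication and heartbeat traffic on the dedicated VLAN (example subnet)
        cluster network = 10.0.0.0/24

    [osd.0]
        # per-daemon equivalents, if you prefer to pin addresses explicitly
        public addr = 192.168.1.10
        cluster addr = 10.0.0.10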
[11:27] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[11:27] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:41] * renzhi (~renzhi@180.169.73.90) Quit (Quit: Leaving)
[11:54] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:05] * jamespage (~jamespage@tobermory.gromper.net) has joined #ceph
[12:08] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[12:14] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[12:16] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[12:52] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[12:59] * loicd (~loic@151.216.22.26) Quit (Quit: Leaving.)
[13:07] <todin> hi, is it possible to name the device when an rbd is mapped via the kernel client?
[13:13] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[13:27] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[13:39] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[13:53] * The_Bishop (~bishop@2a01:198:2ee:0:5855:d51d:11e8:430a) Quit (Remote host closed the connection)
[13:56] * The_Bishop (~bishop@2a01:198:2ee:0:4da1:7b7b:4d6e:2594) has joined #ceph
[14:23] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[14:26] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Ping timeout: 480 seconds)
[14:29] * s[X] (~sX]@49.176.100.142) has joined #ceph
[14:33] * s[X]_ (~sX]@49.176.100.142) has joined #ceph
[14:33] * s[X] (~sX]@49.176.100.142) Quit (Read error: Connection reset by peer)
[14:40] * s[X]__ (~sX]@49.176.100.142) has joined #ceph
[14:40] * s[X]_ (~sX]@49.176.100.142) Quit (Read error: Connection reset by peer)
[15:05] * nhmlap (~Adium@12.238.188.253) has joined #ceph
[15:08] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[15:10] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[15:10] * s[X]__ (~sX]@49.176.100.142) Quit (Read error: Connection reset by peer)
[15:28] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Ping timeout: 480 seconds)
[15:48] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Ping timeout: 480 seconds)
[15:51] * s[X] (~sX]@49.176.67.206) has joined #ceph
[16:21] * s[X] (~sX]@49.176.67.206) Quit (Read error: Connection reset by peer)
[16:39] * theron (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[16:42] * theron_ (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[16:42] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[16:42] * theron_ is now known as theron
[16:49] * s[X] (~sX]@49.176.97.15) has joined #ceph
[16:51] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[16:51] * s[X] (~sX]@49.176.97.15) Quit (Read error: Connection reset by peer)
[16:51] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[16:57] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[17:19] * theron_ (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[17:19] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[17:19] * theron_ is now known as theron
[17:20] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:25] * theron_ (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[17:25] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[17:25] * theron_ is now known as theron
[17:49] * tnt_ (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:50] * Tv_ (~tv@2607:f298:a:607:394a:5e1a:feb6:b166) has joined #ceph
[17:58] * tnt_ (~tnt@150.189-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:05] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[18:12] * nhmlap (~Adium@12.238.188.253) Quit (Quit: Leaving.)
[18:17] * Cube (~Adium@12.248.40.138) has joined #ceph
[18:20] * theron_ (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[18:20] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[18:20] * theron_ is now known as theron
[18:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:32] * deepsa_ (~deepsa@115.241.86.134) has joined #ceph
[18:33] * deepsa (~deepsa@122.172.6.109) Quit (Ping timeout: 480 seconds)
[18:33] * deepsa_ is now known as deepsa
[18:33] * aliguori (~anthony@32.97.110.59) has joined #ceph
[18:35] * nhmlap (~Adium@38.122.20.226) has joined #ceph
[18:49] * chutzpah (~chutz@100.42.98.5) has joined #ceph
[18:50] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:59] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:01] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:02] * dmick (~dmick@38.122.20.226) has joined #ceph
[19:04] * JohnS50 (~quassel@71-86-129-2.static.stls.mo.charter.com) has joined #ceph
[19:06] * deepsa (~deepsa@115.241.86.134) Quit (Ping timeout: 480 seconds)
[19:06] * deepsa (~deepsa@122.172.7.127) has joined #ceph
[19:07] <JohnS50> I am testing ceph and having a problem getting it running. Someone suggested I ask for help here. Is that appropriate?
[19:08] <joao> it is
[19:11] <JohnS50> I installed it without error. When I try to start things with 'service ceph start', the osd part gets an error 'unable to open OSD superblock on /data/ceph/osd0 no such file or directory'
[19:12] <JohnS50> I submitted it in the bug tracking system - it is id 2618 (my ceph.conf file should be in there too)
[19:13] <joao> does /data/ceph/osd0 exist?
[19:13] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has left #ceph
[19:15] <JohnS50> yeah, that's the weird part
[19:16] <JohnS50> /dev/sdb1 was mounted as /data when I started the mkcephfs part
[19:16] <JohnS50> after the install, /dev/sdb1 is mounted as /data and /data/ceph/osd0
[19:17] <JohnS50> so /data/ceph/osd0 has a directory in it, ceph, which has osd0
[19:18] <JohnS50> so it looks like files are in /data/ceph/osd0/ceph/osd0/
[19:19] <JohnS50> I think I'm confused over where things should be and what needs to be mounted when starting the installation
[19:21] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: No route to host)
[19:21] * theron (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[19:23] * gregaf (~Adium@38.122.20.226) Quit (Quit: Leaving.)
[19:24] <joshd> JohnS50: you don't need to mount /dev/sdb1 in two places like that
[19:25] <joshd> if you just keep it mounted only as /data it should work
[19:25] <JohnS50> I didn't. I mounted it as /data. the mkcephfs (or whatever the command is) mounted it
[19:26] <joshd> ah, ok. this sort of thing is why mkcephfs will be replaced
[19:27] <JohnS50> I tried unmounting it and just having /data mounted - no luck
[19:27] <joshd> to make mkcephfs not try to mount anything itself, remove the btrfs devs lines from your ceph.conf
[19:28] <joshd> 'btrfs devs' is only used by mkcephfs, and was really just a convenience thing for sage. it'll be removed in the future
[19:29] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Quit: theron)
[19:29] <joshd> so if you keep /dev/sdb1 mounted only as /data/ceph/osd0, remove the btrfs devs line from your ceph.conf, and re-run mkcephfs, it should work
[19:30] <joshd> oh, and remove the existing contents of /data/ceph/osd0
[19:32] <JohnS50> I just tried removing the btrfs devs line in ceph.conf and removing the extra mount (so /dev/sdb1 was mounted as /data only). the osd.0 seemed to start ok. THANKS!!!
[19:32] <joshd> you're welcome :)
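A sketch of the layout joshd describes, using the paths from the conversation and an assumed hostname; the point is that ceph.conf only names the osd data directory and the administrator mounts the filesystem before running mkcephfs, instead of letting a btrfs devs line do the mounting. The mkcephfs flags are the commonly documented ones of this era:

    # mount the OSD filesystem yourself, once
    mount /dev/sdb1 /data
    mkdir -p /data/ceph/osd0

    # ceph.conf: no "btrfs devs" line, just the data path
    [osd.0]
        host = storage1                 # assumed hostname
        osd data = /data/ceph/osd0

    # then (re)create the cluster
    mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring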
[19:34] * JohnS50 (~quassel@71-86-129-2.static.stls.mo.charter.com) has left #ceph
[19:35] <elder> I'll be offline for a couple of hours.
[19:41] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[19:41] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[19:42] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit ()
[19:43] * gregaf (~Adium@2607:f298:a:607:b98c:614a:1d58:34a1) has joined #ceph
[20:01] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) has joined #ceph
[20:01] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Remote host closed the connection)
[20:06] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[20:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:08] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:16] * theron (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[20:19] * theron_ (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[20:19] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[20:19] * theron_ is now known as theron
[20:26] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:33] * theron_ (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[20:33] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[20:33] * theron_ is now known as theron
[20:35] <Tv_> "gcc: internal compiler error: Killed (program cc1)"
[20:35] <Tv_> one of those days, huh
[20:36] <Tv_> oh the vm oom'ed
[20:37] <dmick> eh, you didn't need that process anywya
[20:38] <Tv_> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
[20:38] <Tv_> 911 whoopsie 20 0 351m 169m 812 S 0.0 34.6 0:56.69 whoopsie
[20:38] <Tv_> nice, Canonical, nice...
[20:39] <Tv_> bwahaha https://bugs.launchpad.net/whoopsie/+bug/1022435
[20:41] <dmick> holy ... what the... I don't even...
[20:41] <Cube> that's amazing
[20:42] <dmick> subscribed to updates :)
[20:51] * benpol (~benp@garage.reed.edu) has joined #ceph
[21:01] <joao> rofl
[21:13] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) Quit (Quit: LarsFronius)
[21:45] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[22:01] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:04] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:06] * nearbeer (~nik@mobile-198-228-199-241.mycingular.net) has joined #ceph
[22:09] * markl (~mark@tpsit.com) has joined #ceph
[22:10] <nearbeer> Hello. I've got 3 mons running on a 2 osd/host test setup. The 3rd mon is on a separate machine. Repl = 2. What happens if I lose a mon, as I've read that even numbers of mons are a no-no?
[22:11] <nearbeer> It's not likely that I'll lose mons in pairs :P
[22:11] <markl> good morning. trying to make the ext4 vs. btrfs decision here; anyone have experience with btrfs & ceph here?
[22:11] <joshd> nearbeer: as long as you have a majority of monitors available, it will continue working
[22:12] <markl> i have used btrfs with vm images but i haven't found the right tweaks to make it perform acceptably; i'm wondering if there are any recommended settings for using with ceph
[22:12] <The_Bishop> you may always set up a fourth mon and take out the broken one, but you have to distribute the new monmap to all nodes
[22:13] <joshd> nearbeer: the recommendation against an even number is that it reduces your availability - i.e. if you have 2 monitors and lose 1, a majority isn't available, or if you have 4 and lose 2, you're again below a majority
[22:14] <nearbeer> joshd: The_Bishop : I get it ... now. Thanks.
[22:14] <The_Bishop> no problem :)
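For reference, the majority arithmetic behind joshd's explanation: a cluster of N monitors stays available while at least floor(N/2) + 1 of them are up.

    N = 2 -> need 2 up (no failures tolerated)
    N = 3 -> need 2 up (tolerates 1 failure)
    N = 4 -> need 3 up (still only tolerates 1 failure)
    N = 5 -> need 3 up (tolerates 2 failures)

This is why adding a mon to reach an even count costs a machine without buying any extra failure tolerance.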
[22:16] <elder> sagewk, are you aware of any type of problem that might show up only in a "real" kernel, *not* in a UML kernel?
[22:16] <elder> I'm getting a hang when setting up RBD devices, but not under uml.
[22:17] <sagewk> i think uml is less liekly to race cpus
[22:17] <elder> I have a suspicion it could be related to GFP flags but I haven't started looking into it much yet.
[22:17] <dmick> yeah, uml is surely more synchronous
[22:18] <elder> I wish it didn't take so long to get a kernel built.
[22:18] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[22:18] <elder> Maybe I should start looking at building right on my target machine.
[22:19] <sagewk> elder: an abbreviated .config would go a long way.. it spends most of its time building modules for sound cards and weird networking crap
[22:19] <elder> I know, and I intended to get to that but still have not.
[22:19] <sagewk> elder: that too. would be nice to merge that branch
[22:19] <sagewk> er, make it work, i guess it's already in there
[22:19] <elder> I wanted to talk with Tv and Dan while I was out there last but didn't find the time.
[22:20] <joshd> markl: for btrfs, metadata fragmentation can be reduced by using mkfs.btrfs -l 64k -n 64k
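A sketch of that invocation for a freshly created OSD volume; the device name and mount options are assumptions, and as noted a little further down, the larger metadata block sizes want a reasonably recent kernel on the btrfs side:

    # WARNING: mkfs destroys the device contents; double-check the device name
    mkfs.btrfs -l 64k -n 64k /dev/sdb1
    mount -o noatime /dev/sdb1 /srv/ceph/osd.0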
[22:20] <elder> I'll see what I can do for now without changing my process, but tomorrow morning I may switch gears and get set up for on-target builds.
[22:22] <joshd> todin: I looked into it a little more, and it seems you do need to set the option on the command line. does http://tracker.newdream.net/issues/2777#note-3 work for you?
[22:22] * Ryan_Lane (~Adium@128.164.5.67) has joined #ceph
[22:22] <sagewk> elder: you can also plow ahead with functionality with uml and defer the non-uml debugging
[22:22] <sagewk> that might come back to bite you tho :)
[22:22] <elder> I suppose so.
[22:22] <elder> Right.
[22:23] <dmick> elder, have you ever by chance looked at cscope in the ceph (userland) repo?
[22:23] <elder> I'm going to do some bisects, and will forge ahead with uml while I wait for builds.
[22:23] <elder> dmick, yes
[22:23] <dmick> automake is supposed to create a target set, but it just doesn't (at least not with the automake I have on precise)
[22:23] <elder> You mean using cscope on those files?
[22:23] <dmick> and I don't understand why now
[22:23] <dmick> *not
[22:24] <dmick> support was added in 2009; I haven't correlated to a version but I'm sure that's before the version I have
[22:24] <elder> What do you mean a "target set?" You mean the list of files in the database?
[22:24] <dmick> no, a target named cscope, one named cscopefiles, etc. (make target)
[22:24] <elder> Oh.
[22:24] <dmick> same way it does for tags/ctags/etc.
[22:24] <elder> I haven't done that, I do it using my own "csbuild" script.
[22:25] <dmick> http://www.gnu.org/software/automake/manual/automake.html#Tags
[22:25] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[22:25] * JJ (~JJ@12.248.40.138) has joined #ceph
[22:25] <dmick> and yeah, I've done it by hand too, but I tire of such nonsense :)
[22:25] <elder> What a great idea to put it in the makefile!!!
[22:26] <elder> Sort of like adding support for "make depend"
[22:26] <dmick> if only it actually worked
[22:26] <elder> I still have my own rules for that, from decades ago.
[22:26] <elder> Anyway, sorry, I probably won't be of much help.
[22:27] <dmick> yeah. it was a hope against hope
[22:27] <elder> Automake always makes me feel uneasy.
[22:27] <elder> Like I may be able to fix things, but I'm also sure I don't know what I'm doing no matter how long I look at it.
[22:28] <dmick> it's pretty complex
[22:28] * nearbeer (~nik@mobile-198-228-199-241.mycingular.net) Quit (Quit: Colloquy for iPad - http://colloquy.mobi)
[22:32] <markl> joshd: it's 128k by default isn't it?
[22:33] <markl> my btrfs test is currently running with auto_defrag and ubuntu's 3.2.0 kernel
[22:33] <markl> but i think the sync problems are too hideous
[22:34] <markl> i was running an oracle standby vm for awhile but the only way it was usable was with a tgtd exported target with the file on btrfs, and the sync operations commented out in the tgtd code
[22:35] <markl> kind of gross
[22:35] <markl> s/kind of/extremely/
[22:35] <joshd> I think it's smaller in 3.2. iirc those options were added in 3.4
[22:35] <markl> so to make a short story long :) i was wondering if anyone here had any better luck
[22:44] <markl> looks like there is a PPA for 3.5, great
[22:57] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[22:57] * theron (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[23:01] <dmick> ok, cscope mystery solved: that really wasn't in the version in precise
[23:06] * aliguori (~anthony@32.97.110.59) Quit (Remote host closed the connection)
[23:06] * theron (~Theron@ip66-43-220-25.static.ishsi.com) Quit (Read error: Connection reset by peer)
[23:06] * theron (~Theron@ip66-43-220-25.static.ishsi.com) has joined #ceph
[23:06] * Ryan_Lane (~Adium@128.164.5.67) Quit (Quit: Leaving.)
[23:09] <dmick> precise has automake 1.11.3; most-current is 1.12.2; cscope was apparently added in v1.11b, which is after the 1.11.x series.
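A hand-rolled equivalent of the missing automake cscope target, roughly what "doing it by hand" amounts to for the userland tree; the file patterns are an assumption about which sources matter:

    # build a cscope database for the C/C++ sources
    find src -name '*.c' -o -name '*.cc' -o -name '*.h' -o -name '*.hpp' > cscope.files
    cscope -b -q -k -i cscope.files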
[23:11] * Ryan_Lane (~Adium@128.164.5.67) has joined #ceph
[23:13] <elder> sagewk, nevermind, I think I was running an old version of my code on the last test that hung.
[23:13] <elder> Pretty sure my problem was I had GFP_KERNEL where I needed GFP_NOIO in the extract_encoded_string() function
[23:14] <elder> The old version of my code (from yesterday) had it wrong.
[23:25] * Ryan_Lane (~Adium@128.164.5.67) Quit (Quit: Leaving.)
[23:26] <Tv_> whee dns auto-updater is working
[23:26] <Tv_> crow-test-crowbar4-node-3.front.sepia.ceph.com etc now show up in dns
[23:27] <Tv_> still fiddling with details, but this is good
[23:27] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[23:28] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Read error: Connection reset by peer)
[23:28] <sagewk> tv_: yay!
[23:29] <dmick> using "DDNS" as in RFC2136, Tv_?
[23:29] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[23:29] <Tv_> nope
[23:30] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[23:30] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Read error: Connection reset by peer)
[23:30] <Tv_> dmick: the only way isc dhcpd really supports that is just passing whatever the dhcp client says straight through, and that was too much of a pain; i'm following the dhcp leases db and libvirt-defined vms and changing the powerdns sql db accordingly
[23:31] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[23:31] <dmick> some libvirt hook to kick it off?
[23:31] <Tv_> it supports a event notification mechanism
[23:32] <Tv_> (that's how the virt-manager gui updates when things happen)
[23:33] <dmick> thru dbus or something generic?
[23:33] <Tv_> through libvirt-the-protocol-and-client-lib
[23:33] <dmick> k
[23:34] <dmick> anyway, neat; just peering under the hood because it's neat
[23:34] <Tv_> dmick: details got a bit ugly because i got frustrated and just forced my way through, but https://github.com/ceph/propernoun
[23:36] <Tv_> it's churning through a lot of unnecessary work, but for a deployment of this size, that's fine
[23:36] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[23:36] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[23:38] <Tv_> frankly, all my worries ended up being about managing concurrency.. there's an event loop welded to a blocking thread, both feeding events into a single queue that's emptied by a single consumer for ordering, and then the parent thread just waits for an "i crashed" message from anyone and god kill me now
[23:40] <elder> joshd, sagewk, trying to document the pool id. Is it true that the id is a permanent and unchanging attribute of a rados storage pool?
[23:40] <sagewk> yeah
[23:40] <mgalkiewicz> hi, I have three mons in my configuration, but when I start the third one it constantly forces a new leader election https://gist.github.com/3101220
[23:40] <elder> The doc now says "the pool-name pair is unique per rados system"
[23:40] <mgalkiewicz> it is 0.48
[23:41] <dmick> mgalkiewicz: that's probably normal?
[23:41] <Tv_> elder: "pool-name pair"? does that mean (pool, object_name)?
[23:41] <elder> But I gather a pool name can change. Is there anything more we should be saying about the pool?
[23:41] <elder> Yes, I think so.
[23:41] <elder> But in this case the name is the rbd image name, so, well, not quite.
[23:41] <Tv_> elder: my understanding: even if the pool name can change, that pair is still unique at all times
[23:41] <elder> I'd like to rename that "image name"
[23:41] <elder> But not today.
[23:41] <Tv_> elder: is this a doc just about rbd, or about rados?
[23:42] <elder> RBD
[23:42] <mgalkiewicz> dmick: why?
[23:42] <Tv_> elder: also, i'm not sure why you're implying rbd image names are not unique
[23:43] <elder> I don't mean to imply that. Actually, I don't even know whether that's true or not, now that you say that.
[23:43] <Tv_> elder: has to be, or we're screwed
[23:43] <dmick> mgalkiewicz: with one monitor there's no consensus to gain. With two no consensus is possible. With three, you have a group that can reach consensus, but they have to elect a leader
[23:44] <dmick> well, unique within a pool
[23:44] <Tv_> dmick: of course two mons can reach consensus
[23:44] <dmick> sorry: rbd image names are unique within a pool
[23:44] <dmick> Tv_: yeah, I misworded that. They can't resolve disagreement
[23:44] <Tv_> dmick: and also, be careful with your language and say "2 of 2" etc, otherwise reader might assume a cluster size
[23:45] <Tv_> dmick: sure they can!
[23:45] <dmick> ok, then, I don't understand
[23:45] <Tv_> dmick: 2 mon cluster just needs n/2+1==2 mons healthy
[23:45] <Tv_> dmick: 1 of 2 cannot reach quorum
[23:45] <mgalkiewicz> dmick: yeah, but after choosing the leader they should not elect one again, and that is what it looks like from the logs
[23:45] <Tv_> much like 2 of 4 cannot reach quorum
[23:45] <Tv_> or 2 of 5
[23:45] <gregaf> assuming they follow a cooperative algorithm (which they do), they can agree
[23:46] <dmick> ok. sloppy wording destroyed the concept, yes.
[23:46] <gregaf> they can't agree if all they do is elections, and they each propose a different outcome ... maybe that's the confusion
[23:46] <gregaf> (but nobody does that IRL, because... dur, it means 2 doesn't work AT ALL)
[23:46] <Tv_> that's just saying "there is a way of programming paxos wrong"
[23:47] <Tv_> (that may, in fact be the most common way ;)
[23:47] <gregaf> if you aren't using a leader, doing it right is actually pretty hard
[23:47] <gregaf> anyway, for mgalkiewicz: do you mean they're looping on the leader election?
[23:47] <gregaf> what output is leading to this conclusion?
[23:47] <Tv_> gregaf: see the link
[23:48] <gregaf> see, you guys went on so long I didn't even notice there was a link :p
[23:48] <mgalkiewicz> gregaf: yes
[23:48] <Tv_> it seems to be calling for a leader election every 6 seconds
[23:48] <mgalkiewicz> https://gist.github.com/3101220
[23:49] <dmick> mgalkiewicz: I also did not read your log carefully, sorry
[23:50] <elder> joshd, how does one change the name of a pool? Using the rbd command line or something?
[23:51] <gregaf> mgalkiewicz: which monitors did you have on before (when it was working, right?)
[23:52] * benpol (~benp@garage.reed.edu) has left #ceph
[23:52] <joshd> elder: ceph osd pool rename <src> <dest>
[23:53] <elder> OK. I'm going to add a note about how the pool name can change out from under you while I'm at it.
[23:53] <elder> That doesn't affect access to the storage though, right?
[23:54] <gregaf> I'm not sure that's helpful ... pool name changes are manual and there aren't any good reasons to do them ... (not even sure why we support them, unless we busted something at some point)
[23:54] <elder> Is it true though?
[23:54] <mgalkiewicz> gregaf: n11c1 and n12c1 and then I added cc2 which causes troubles
[23:54] <joshd> yeah
[23:54] <elder> Can someone else change a pool name while it's in use?
[23:54] <dmick> what do you mean by "aren't any good reasons"? I can certainly imagine creating a pool and then changing my mind about its name
[23:55] <joshd> elder: once we've opened a pool and gotten its id, we don't care about name changes
[23:55] <elder> OK.
[23:56] <joshd> if you're doing something that requires opening a new device, like live migration, you'd have a problem if the name changed while you were running
[23:56] <gregaf> mgalkiewicz: I could tell you more with a high-debug-level log from cc2, but from what I see here it or your other machines are just overloaded
[23:56] <elder> So the pool name is a convenient moniker for puny humans to be able to attach some meaning beyond the unique identifier used internally.
[23:56] <joshd> right
[23:56] <dmick> ....as all good names should be
[23:57] <lurbs> I am not a number! I am a free man!
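A usage sketch of the rename command joshd gives above, with hypothetical pool names; per the discussion, clients that have already opened the pool keep working because they hold the numeric pool id, but anything that resolves the pool by name afterwards (for example a fresh rbd map, or the live-migration case joshd mentions) has to use the new name:

    ceph osd pool rename images images-old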
[23:57] <mgalkiewicz> gregaf: well they are not overloaded in terms of load, ram usage or cpu usage
[23:58] <gregaf> well, if you grab me a log with "--debug_ms 1 --debug-mon 20" I'll take a look
[23:58] <elder> lurbs, you are Number 6.
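A sketch of how those debug flags are typically applied, assuming the sysvinit wrapper and the monitor name cc2 from the gist; the same settings can instead be made persistent in ceph.conf:

    # one-off restart with verbose monitor logging
    service ceph stop mon.cc2
    ceph-mon -i cc2 --debug_ms 1 --debug-mon 20

    # or persistently, in ceph.conf
    [mon]
        debug ms = 1
        debug mon = 20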

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.