#ceph IRC Log


IRC Log for 2013-05-07

Timestamps are in GMT/BST.

[0:05] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[0:06] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit ()
[0:07] * BillK (~BillK@58-7-104-61.dyn.iinet.net.au) has joined #ceph
[0:08] * gucki (~smuxi@84-73-204-178.dclient.hispeed.ch) Quit (Remote host closed the connection)
[0:08] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[0:10] * sleinen2 (~Adium@2001:620:0:26:a4a1:8f93:366b:c5e6) Quit (Ping timeout: 480 seconds)
[0:13] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[0:17] <elder> joshd, dmick I have a question about flattening images.
[0:17] * dmick listens
[0:18] <elder> Wait, nevermind. Give me a few minutes, I want to verify something. Thanks for listening.
[0:18] * kfox1111 (bob@leary.csoft.net) Quit (Read error: Connection reset by peer)
[0:18] * dmick storms off in a huff
[0:21] * loicd trying to figure out how to register to https://plus.google.com/events/cn7iv958atff3e5a4tci1va36bg
[0:24] * Hefeweizen (~oftc-webi@dyn160091089133.dz.ornl.gov) Quit (Remote host closed the connection)
[0:24] <dontalton2> hi, I am working on using ceph with cinder. Is it imperative that env CEPH_ARGS="--id volumes" be at the TOP of /etc/init/cinder-volume.conf, or can it be elsewhere?
[0:25] <elder> dmick, nevermind. I was hitting an "image busy" but it was because I fired off a flatten in the background and didn't wait for it to complete.
[0:26] * yasu` (~yasu`@dhcp-59-157.cse.ucsc.edu) Quit (Remote host closed the connection)
[0:26] * gmason_ (~gmason@hpcc-fw.net.msu.edu) Quit (Quit: Computer has gone to sleep.)
[0:26] * kyle_ (~kyle@ has joined #ceph
[0:26] <elder> That is, I was trying to remove an image, but I didn't wait for the flatten to complete, hence it was busy.
[0:29] <Elbandi_> gregaf: I'll open a new pull req for https://github.com/ceph/ceph/pull/253 with "clean" commits
[0:31] * rustam (~rustam@ has joined #ceph
[0:36] <gregaf> Elbandi_: I don't think I see it?
[0:36] <dmick> elder: gotcha
[0:38] <Tamil> loicd: it's a public event, anyone can view the video stream and join IRC
[0:39] <loicd> Tamil: ok. I got confused because I saw a "Are you going?" followed by 'yes' or 'maybe' or 'no' . But when clicking 'yes' nothing happens, the page just reloads.
[0:40] <Elbandi_> gregaf: i will do... :)
[0:40] <gregaf> sounds good, thanks
[0:41] * tnt_ (~tnt@ Quit (Ping timeout: 480 seconds)
[0:41] * LeaChim (~LeaChim@ Quit (Read error: Connection reset by peer)
[0:44] <Tamil> loicd: oh ok, if you are looking to speak, please let us know in IRC and we'll invite you
[0:45] * Havre (~Havre@2a01:e35:8a2c:b230:dcc3:7504:611c:64fc) has joined #ceph
[0:45] <loicd> Tamil: I'm scheduled to speak at 9am PDT http://wiki.ceph.com/01Planning/Ceph_Developer_Summit#Schedule
[0:45] <loicd> my id on g+ is louis.lavile@gmail.com
[0:45] <loicd> my real name is loic dachary
[0:46] <loicd> I did not realize invitations were necessary, I understand why it did not work ;-)
[0:46] <Tamil> loicd: i see that :)
[0:47] <dmick> I don't think you'll need to be invited to the session you're presenting loicd
[0:47] <dmick> :)
[0:47] <dmick> it's more about divvying up the 10 Hangout "can video" slots
[0:48] <loicd> as long as I don't need to do anything special, that's fine ;-)
[0:52] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:52] <gregaf> loicd: I suspect we'll get invites sometime today or tomorrow :)
[0:52] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:53] <loicd> gregaf: ok :-)
[0:54] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has left #ceph
[0:55] <Tamil> dontalton2: from what I have seen, all the env variables seem to be set at the top of the config file
[0:55] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) Quit (Ping timeout: 480 seconds)
[0:57] <Fetch> How do I go about generating rbd_secret_uuid for using Ceph in a Cinder config?
[0:59] <Tamil> fetch: http://ceph.com/docs/master/rbd/rbd-openstack/?highlight=uuid
[0:59] <Fetch> weird, so the compute (non-cinder) hosts get that too? sweet
[1:02] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Remote host closed the connection)
[1:03] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[1:03] <Tamil> dontalton2: env CEPH_ARGS="--id volumes" has to be at the TOP of /etc/init/cinder-volume.conf
[1:04] * rturk-away is now known as rturk
[1:05] <rturk> loicd: Hi ;) We have created the hangouts for tomorrow, but we can't turn them on until tomorrow morning
[1:05] <rturk> they'll start recording once we turn them on, I think
[1:06] <gregaf> rturk: so will everybody who put their name down be invited at that time?
[1:06] <rturk> we're going to send an email to session owners in a bit to ask for their google IDs - then we plan to send out invitations shortly before the sessions
[1:06] <rturk> gregaf: correct
[1:06] <loicd> rturk: thanks for the information
[1:06] <rturk> and those who didn't ask to be invited can ping me or scuttlemonkey_ on IRC, it's easy to invite people
[1:06] <rturk> we'll have one hangout for track 1 and one hangout for track 2
[1:08] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) has joined #ceph
[1:14] <Fetch> tamil: ugh, have you ever done this (ceph on cinder)? because those directions don't make sense - libvirt/compute nodes and virsh are orthogonal to Cinder
[1:14] <Fetch> (as in, on my controller node I don't even have virsh)
[1:16] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:16] <joshd> Fetch: the uuid is arbitrary - you can create one with the uuid command, then put it in the secret.xml when you add the secret to each compute node
[1:17] <Fetch> joshd: cool. Do the compute nodes necessarily need that if I'm not having nova-compute use rbd for instance backing?
[1:18] <joshd> they still need it if they're going to attach volumes
[1:18] <Fetch> thanks
[1:18] <Fetch> can the nodes share the uuid?
[1:19] <joshd> yeah, they need to use the same one (since it's sent to them by cinder) unless they override it with their own in nova.conf.
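
A minimal sketch of what joshd describes above, assuming Python is available on the node: generate one arbitrary UUID and render the libvirt secret definition that every compute node would load. The secret name "client.volumes secret" follows the rbd-openstack doc linked earlier in the channel; the function name render_secret_xml is made up for illustration and is not part of Ceph, Cinder, or libvirt.

```python
import uuid
from xml.sax.saxutils import escape


def render_secret_xml(secret_uuid, name="client.volumes secret"):
    """Return a libvirt secret definition carrying the given UUID."""
    return (
        "<secret ephemeral='no' private='no'>\n"
        f"  <uuid>{escape(secret_uuid)}</uuid>\n"
        "  <usage type='ceph'>\n"
        f"    <name>{escape(name)}</name>\n"
        "  </usage>\n"
        "</secret>\n"
    )


if __name__ == "__main__":
    # The value itself is arbitrary: generate it once, then reuse the same
    # string on every compute node and as rbd_secret_uuid in the Cinder config.
    shared_uuid = str(uuid.uuid4())
    print(render_secret_xml(shared_uuid))
```

The point of generating the UUID once is the one joshd makes: Cinder sends that value to the compute nodes, so they all need a secret defined under the same UUID unless they override it in nova.conf.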
[1:20] <Fetch> I wish this rbd/openstack page was a wiki, I'd go add all that
[1:20] <Fetch> hopefully I'm now clear on the concept
[1:22] <joshd> you can edit and send a pull request from the browser on github https://github.com/ceph/ceph/blob/master/doc/rbd/rbd-openstack.rst
[1:23] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Read error: Connection reset by peer)
[1:23] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[1:25] * dontalton2 (~dwt@rtp-isp-nat1.cisco.com) Quit (Quit: Leaving)
[1:25] * brady (~brady@rrcs-64-183-4-86.west.biz.rr.com) Quit (Quit: Konversation terminated!)
[1:32] * cjh_ (~cjh@ps123903.dreamhost.com) Quit (Remote host closed the connection)
[1:33] * cjh_ (~cjh@ps123903.dreamhost.com) has joined #ceph
[1:33] <cjh_> if i have a hotspot on ceph is there a way i can mitigate that? replicate it on to more hosts maybe?
[1:36] * cjh_ (~cjh@ps123903.dreamhost.com) Quit (Remote host closed the connection)
[1:37] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) Quit (Ping timeout: 480 seconds)
[1:38] <Elbandi_> gregaf: hmm, github is clever :) the new commits are in the old pull request
[1:45] <gregaf> cool
[1:45] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) has joined #ceph
[1:46] <Elbandi_> anyway, do you have any idea, why is mds journal so slow?
[1:46] <gregaf> I think I just got whiplash
[1:47] <gregaf> not without a lot more info than that, no
[1:47] <Elbandi_> i deleted all files ~6 hours ago
[1:47] <gregaf> oh, you mean the deletes, not the journal
[1:48] <gregaf> yeah, my suspicion is that the clients are still holding capabilities on the inodes; have you tried unmounting and remounting?
[1:48] <sagewk> paravoid: ping!
[1:48] <Elbandi_> hmm
[1:48] <sagewk> paravoid: wip-suppress has a patch for ceph-disk that should, i think, resolve the 'prepare but don't activate' problem... what do you think?
[1:48] <Elbandi_> i try it now
[1:48] <benner> how to avoid/lower these spikes: http://p.defau.lt/?RL_2mhI88nEJ9kCs3_66Aw ?
[1:49] <gregaf> Elbandi_: do you have a need for ceph_get_path_object_size() et al when there's a ceph_get_path_layout()
[1:50] <sjust> benner: hmm, that is not a good thing
[1:50] <gregaf> it seems like keeping it smaller is simpler
[1:50] <sjust> benner: version? number of osds?
[1:51] <benner> sjust: 30, 10 in one host
[1:51] <sjust> benner: you probably need to adjust the osd grace period way down
[1:51] <sjust> it takes a few seconds to detect an osd as dead, basically
[1:52] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[1:52] <sjust> actually, do you mean in the case like this where the osd is deliberately killed with warning, or a case where the node loses power abruptly?
[1:53] <benner> sjust: i just rebooted host (os reboot)
[1:53] * cjh_ (~cjh@ps123903.dreamhost.com) has joined #ceph
[1:54] * cjh_ (~cjh@ps123903.dreamhost.com) Quit (Remote host closed the connection)
[1:55] <sjust> benner: so cuttlefish osds will warn the mons prior to dying, that greatly speeds up the hand off
[1:55] <sjust> but that won't help in the case of an abrupt network partition or power failure
[1:55] <sjust> in either case, you are at the mercy of failure detection
[1:57] <Elbandi_> gregaf: actually, get_layout = get stripe_unit + get stripe_count + get object_size + get pool_id
[1:58] <Elbandi_> but it's the same thing as setattr = chmod + chown + utime
[1:59] * BManojlovic (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:00] <Elbandi_> so if someone wants to get the stripe unit only, he can call the get_path_stripe_unit func, but if he wants all data at the same time, he should call the get_layout func
[2:00] <benner> sjust: ceph -w shows that the mon on this host went missing first, then the osds. might that be related?
[2:01] <sjust> oh, you had a monitor on the dead node?
[2:01] <sjust> hmm
[2:01] <sjust> not sure that they mark themselves out on shutdown
[2:02] <sjust> they probably don't need to, actually
[2:02] <benner> sjust: in my setup i have 3 hosts with 1 mon each and 10 osd each
[2:02] <sjust> anyway, you need to mark the osds 'down' prior to killing them if you want to avoid the timeout
[2:02] <sjust> cuttlefish will do that automatically if you do an orderly shutdown on the osd daemons
[2:02] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:03] <benner> so the graceful way is this: ceph osd set noout; ceph osd down <x..y> and then reboot?
[2:03] * rustam (~rustam@ Quit (Remote host closed the connection)
[2:04] <sjust> yeah, except that they will attempt to mark themselves back up, so probably for i in osds; do ceph osd down $i; kill $i; done
[2:04] <sjust> etc
[2:08] <benner> sjust: tried and it worked. thank you
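
The procedure sjust and benner settle on above can be sketched as a small command planner. This is an illustration under stated assumptions, not a tested operations script: the function name reboot_plan is made up, and the daemon stop command is a caller-supplied placeholder because the log only says "kill". The two ceph commands it emits are `ceph osd set noout` and `ceph osd down <id>`.

```python
def reboot_plan(osd_ids, stop_cmd):
    """Build the ordered commands for taking one host's OSDs down cleanly.

    stop_cmd is a caller-supplied template (hypothetical here) for stopping
    a single OSD daemon; "{id}" is replaced with the OSD number.
    """
    # noout keeps the cluster from marking the OSDs out and re-replicating
    # their data while the host is away.
    plan = ["ceph osd set noout"]
    for osd_id in osd_ids:
        # Mark the OSD down first so peers do not wait for the failure
        # timeout, then stop it right away before it marks itself back up.
        plan.append(f"ceph osd down {osd_id}")
        plan.append(stop_cmd.format(id=osd_id))
    return plan


if __name__ == "__main__":
    # The stop command below is an assumption; substitute whatever your
    # init system uses to stop one OSD daemon.
    for line in reboot_plan(range(3), "service ceph stop osd.{id}"):
        print(line)
```

After the host is back and its OSDs have rejoined, `ceph osd unset noout` would normally follow. As sjust notes, cuttlefish OSDs report their own shutdown when stopped in an orderly way, so the explicit "down" step mainly matters on older releases.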
[2:14] * pconnelly (~pconnelly@ has joined #ceph
[2:14] * pconnelly (~pconnelly@ Quit ()
[2:15] <Fetch> does the cinder rbddriver support the filterscheduler?
[2:15] * pconnelly (~vmware-pc@ has joined #ceph
[2:16] <joshd> yes
[2:16] <joshd> e.g. http://www.sebastien-han.fr/blog/2013/04/25/ceph-and-cinder-multi-backend/, and then filter on volume_type
[2:19] <Fetch> getting ERROR [cinder.scheduler.manager] Failed to schedule_create_volume: No valid host was found. but, it's a brand new problem so I'll whack on it a bit unless it's something you've seen a bit
[2:19] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) Quit (Ping timeout: 480 seconds)
[2:19] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:21] <joshd> might be https://bugs.launchpad.net/cinder/+bug/1172286
[2:22] <gregaf> Elbandi_: but with the interface you've set up, anybody who's not interested in the extra layout data can just set NULL on those pointers, and then we cut out on a whole mess of cod
[2:22] <gregaf> *code
[2:29] <Elbandi_> this is also true :D
[2:30] <Elbandi_> so I should remove all the get_* funcs, and keep only get_layout?
[2:30] <Fetch> joshd: that patch did the trick
[2:31] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) has joined #ceph
[2:31] <Fetch> (although my volume name was volumes, so weird huh)
[2:31] <joshd> cool, it should be merged soon
[2:44] <gregaf> Elbandi_: sorry, I keep moving away
[2:44] <gregaf> that would be my preference, just to keep it smaller and have less duplication
[2:44] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: my troubles seem so far away, now yours are too...)
[2:44] <gregaf> if you've got a good argument for keeping them separate though, I'd love to hear it :)
[2:44] <gregaf> heading out now but I'll check back in later
[2:44] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[2:44] * ChanServ sets mode +o scuttlemonkey
[2:50] <Fetch> joshd: -1 librbd: error removing img from new-style directory: (2) No such file or directory on rbd rm, but only --format 2 images. Confirmed libcls_rbd.so present all around
[2:50] <Fetch> Created test case with simple filename for image, created from cli
[2:50] <Fetch> sound familiar?
[2:50] <Fetch> there are hits on google, but it sounds like they're from people making typos
[2:52] <joshd> yeah, that's what you'd get when the image already doesn't exist
[2:52] * tkensiski (~tkensiski@2600:1010:b022:6eb2:6978:a72c:2087:7de5) has joined #ceph
[2:52] <Fetch> but it shows up with ls?
[2:52] <joshd> like if you tried to remove it from the wrong pool
[2:52] * tkensiski (~tkensiski@2600:1010:b022:6eb2:6978:a72c:2087:7de5) has left #ceph
[2:53] <Fetch> I mean, I get that I could typo it pretty easy, but I can do rbd -p volumes -s 1 --format 2 create test;rbd -p test rm create
[2:53] <Fetch> and get the error
[2:53] <Fetch> (tracking down similar error being thrown by cinder deleting volume)
[2:54] <Fetch> and derp, typo in my example
[2:54] <Fetch> but seriously, using command history and everything
[2:54] <joshd> I don't think it would be another cls_rbd.so problem
[2:55] <Fetch> I only mentioned it because it was one of the google hits
[2:55] <joshd> you'd get a different error in that case
[2:55] <joshd> -ENOENT is a really strange error to get when it's there
[2:55] <Fetch> http://pastebin.com/LFWDa2Jx
[2:56] <Fetch> and it's only format 2
[2:56] <joshd> my next guess would be client permissions, but -ENOENT doesn't make sense for that
[2:56] <Fetch> format 1 works fine
[2:57] * tkensiski (~tkensiski@2600:1010:b022:6eb2:6978:a72c:2087:7de5) has joined #ceph
[2:57] <Fetch> just tried with client.admin, same result
[2:57] <joshd> maybe you have an older librbd or ceph package on the client?
[2:57] * tkensiski (~tkensiski@2600:1010:b022:6eb2:6978:a72c:2087:7de5) has left #ceph
[2:58] <joshd> I feel like there was a bug that could result in this error, even when something else was the cause, several months ago
[2:59] <joshd> try rbd -p volumes rm test --debug-ms 1
[3:00] <Fetch> getting pastebin ready
[3:01] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[3:01] <Fetch> http://pastebin.com/2C5CpRWJ
[3:01] * Cube (~Cube@ Quit (Quit: Leaving.)
[3:03] <Fetch> I'm at 0.56.6 across the cluster
[3:04] <joshd> osds restarted after the upgrade?
[3:04] <Fetch> Possibly not, restarting
[3:07] <Fetch> Looks like they were. But I restarted all osd, mon, command still fails the same
[3:08] <joshd> ok, next step is dumping the omap values on the rbd directory object
[3:08] <Fetch> health HEALTH_WARN 226 pgs degraded; 91 pgs stuck inactive; 317 pgs stuck unclean; recovery 11/46 degraded (23.913%)
[3:08] <Fetch> (current cluster health, if that matters)
[3:10] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) Quit (Ping timeout: 480 seconds)
[3:10] <joshd> rados listomapvals -p volumes rbd_directory
[3:12] <Fetch> http://pastebin.com/FVtmEXGQ
[3:13] <joshd> that looks like a normal rbd_directory
[3:15] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) has joined #ceph
[3:15] <Fetch> so what I don't know about this stuff would fill a book: if dirty PGs were previously assigned to a pool, but that pool was nuked and then later recreated, could something like this happen?
[3:15] <joshd> I guess it may be an osd problem or cls issue then
[3:16] <joshd> no, pool ids aren't reused
[3:18] <joshd> once your cluster's healthy, you can find which osd is primary (the first in the list) for the rbd_directory object with: ceph osd map volumes rbd_directory
[3:18] <joshd> then enable logging on that osd via: ceph osd tell N injectargs -- "--debug-osd 20"
[3:18] <Fetch> this sounds dumb, but I don't think the thing is converging to healthy
[3:18] <joshd> where N is the number of the primary osd
[3:19] <Fetch> it's on a fairly unused network, but I haven't seen clean pg progress in a day
[3:19] <Fetch> and it's only 15TB across 3 osd
[3:19] <joshd> with only 3 osds, it might be due to the legacy crush tunables
[3:20] <joshd> http://ceph.com/docs/master/rados/operations/crush-map/#tuning-crush
[3:20] <joshd> legacy is still the default so old kernel clients work
[3:21] * sagelap (~sage@2600:1012:b02e:9e54:6c88:ecfd:9e2d:5f39) Quit (Ping timeout: 480 seconds)
[3:21] <joshd> if you can get an osd log of the rbd rm failing like I described above, that'd be great
[3:21] <joshd> I've got to run though
[3:21] <Fetch> will do
[3:21] <Fetch> and yeah, same here
[3:22] <Fetch> thanks again for all the help
[3:22] <joshd> no problem
[3:23] * Tamil (~tamil@ Quit (Quit: Leaving.)
[3:29] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[3:37] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[3:38] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[3:48] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[3:54] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[3:55] * rturk is now known as rturk-away
[4:03] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) Quit (Ping timeout: 480 seconds)
[4:05] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[4:05] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) has joined #ceph
[4:06] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[4:16] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Read error: Connection reset by peer)
[4:26] * kylehutson (~kylehutso@dhcp231-11.user.cis.ksu.edu) Quit (Ping timeout: 480 seconds)
[4:29] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[4:29] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[4:33] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[4:56] <sage> glowell just pushed out v0.61
[5:00] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[5:01] * pconnelly (~vmware-pc@ has left #ceph
[5:10] * lightspeed (~lightspee@ Quit (Ping timeout: 480 seconds)
[5:11] <lurbs> What're the 0.61 highlights?
[5:13] <jmlowe> sage: get my note about the docs?
[5:13] <sage> yep fixing now
[5:13] <sage> thanks!
[5:13] <jmlowe> it'll save everybody lots of where is it questions
[5:16] <sage> :)
[5:17] <jmlowe> wish me luck, here goes my mon upgrade
[5:17] <mikedawson> jmlowe: if you give me ~5 I'll let you know how mine goes (mind you I've been on next for a while)
[5:18] <jmlowe> just doing the first one now
[5:18] <jmlowe> 1/3
[5:21] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[5:25] * DarkAce-Z (~BillyMays@ has joined #ceph
[5:26] <mikedawson> sage, jmlowe: I am upgraded and back to HEALTH_OK after a rolling update.
[5:26] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[5:26] <mikedawson> Nice work Inktank / Ceph Community!
[5:26] <jmlowe> I have my quorum
[5:27] <jmlowe> whew, I believe it is a new record: 2775op/s
[5:29] * Dark-Ace-Z (~BillyMays@ has joined #ceph
[5:30] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[5:30] * sage (~sage@ Quit (Ping timeout: 480 seconds)
[5:34] * DarkAce-Z (~BillyMays@ Quit (Ping timeout: 480 seconds)
[5:48] * sagelap (~sage@2600:1012:b01c:4b82:cc35:76de:35:104b) has joined #ceph
[5:53] * sage (~sage@ has joined #ceph
[5:54] * sagelap (~sage@2600:1012:b01c:4b82:cc35:76de:35:104b) has left #ceph
[5:59] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) Quit (Ping timeout: 480 seconds)
[6:02] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) has joined #ceph
[6:12] * Dark-Ace-Z is now known as DarkAceZ
[6:13] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[6:13] <jmlowe> well if I believe the op/s reported, cuttlefish is 2x - 5x faster
[6:16] * yehuda_hm (~yehuda@2602:306:330b:1410:882e:275c:f33f:7cd1) Quit (Ping timeout: 480 seconds)
[6:21] <mikedawson> jmlowe: keep an eye on the size growth of /var/lib/ceph/mon/ceph-*/store.db
[6:21] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[6:22] <jmlowe> 50MB
[6:23] <mikedawson> os?
[6:23] <paravoid> congrats on v0.61!
[6:24] <nigwil> +1
[6:24] <dmick> tnx paravoid
[6:24] <jmlowe> mikedawson: ubuntu 12.10
[6:25] <jmlowe> mikedawson: now down to 43MB
[6:25] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[6:26] <mikedawson> yeah, it compacts the leveldb store periodically. My stores have been large (growing ~35GB a day with compact disabled). No one is quite sure why yet.
[6:28] <mikedawson> jmlowe: I'm on 13.04, which may be the problem. Let me know if yours stays in that range or grows tomorrow if you think about it
[6:28] <jmlowe> mikedawson: I've seen some of the discussion here
[6:44] <coredumb> just to know, is ceph compatible with ZFS?
[6:44] <wogri_risc> coredumb, this is a weird question. you mean as the underlying FS for an OSD or as a FS in RBD?
[6:44] <dmick> "compatible with" is not a well-defined question, but, some have been doing work to allow ceph to use ZFS filesystems for the daemon storage
[6:45] <dmick> it's not quite right last I heard; there are problems with xattrs (there were a few zfs-on-linux bugs, but I think there's still a problem or two outstanding)
[6:46] <coredumb> wogri_risc: indeed as an underlying FS for an OSD
[6:46] <coredumb> dmick: ok
[6:46] <wogri_risc> coredumb: i would not use this yet. as dmick said, work has been done.
[6:46] <wogri_risc> but it's neither complete nor very well supported.
[6:47] <wogri_risc> and zfs on linux - isn't that in a weird state anyways?
[6:47] <dmick> we'd like it to work, and I suspect work will happen soon again now that cuttlefish is out. Really there's not much to support to get it rolling, but in theory zfs's snapshotting can be used to optimize some things for the filestore
[6:48] <coredumb> wogri_risc: it's definitely not ;)
[6:49] <coredumb> dmick: ok
[6:49] <wogri_risc> Ah, I just read: "Solaris Porting Layer"
[6:51] <coredumb> i'll stay tuned on that :)
[7:08] * Rorik (~rorik@ Quit (Read error: Connection reset by peer)
[7:08] * Rorik (~rorik@ has joined #ceph
[7:22] * coyo (~unf@00017955.user.oftc.net) Quit (Quit: F*ck you, I'm a daemon.)
[7:40] * wogri (~wolf@nix.wogri.at) Quit (Quit: Lost terminal)
[7:40] * wogri (~wolf@nix.wogri.at) has joined #ceph
[8:15] * tnt (~tnt@ has joined #ceph
[8:17] * yehuda_hm (~yehuda@2602:306:330b:1410:5ce0:683c:121a:f749) has joined #ceph
[8:34] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:35] * fridad (~fridad@b.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[8:36] * fridad (~fridad@b.clients.kiwiirc.com) has joined #ceph
[8:44] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:45] * lightspeed (~lightspee@ has joined #ceph
[8:48] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[8:56] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[8:56] * BManojlovic (~steki@ has joined #ceph
[8:57] <Kioob`Taff> Hi
[8:57] <Kioob`Taff> (the channel topic is not "up to date", there is no mention of cuttlefish)
[8:57] * uli (~uli@mail1.ksfh-bb.de) Quit (Remote host closed the connection)
[8:58] <Kioob`Taff> simple question : is it safe to upgrade from bobtail 0.56.4 to cuttlefish, without upgrading to 0.56.5 before ?
[8:59] <nigwil> on a related question, will the binaries (http://ceph.com/debian/dists/precise/) be updated automagically?
[9:00] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit ()
[9:00] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[9:01] <Zethrok> Kioob`Taff: I did on a test-cluster with no problems (from 0.56.3). First upgraded mons, made sure they were up then upgraded all osd. Since all osd need to upgrade PG you might want to only upgrade 1 osd/node at a time. I just restarted all upgraded nodes at once and everything was stalled for ~10-15min
[9:02] <tnt> Kioob`Taff: from what I hear, it depends when your cluster was created. If it was created with argonaut and upgraded to bobtail, you need to go through latest bobtail first (0.56.6)
[9:02] <tnt> if it was first created using bobtail, then it's fine, no need to go through the latest bobtail.
[9:03] <Kioob`Taff> ok thanks Zethrok and tnt !
[9:03] * danieagle (~Daniel@ has joined #ceph
[9:03] * eschnou (~eschnou@ has joined #ceph
[9:03] <Kioob`Taff> And yes, my cluster was setup from argonaut
[9:16] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[9:20] * ScOut3R (~ScOut3R@ has joined #ceph
[9:21] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (Read error: No route to host)
[9:22] <matt_> hmm.. crap. I seem to have hit bug #4793 when using cuttlefish - http://tracker.ceph.com/issues/4793
[9:23] * lofejndif (~lsqavnbok@rainbowwarrior.torservers.net) has joined #ceph
[9:25] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:25] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[9:26] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[9:31] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:43] * tziOm (~bjornar@ has joined #ceph
[9:45] <brother> Hi
[9:46] <brother> Is there any architectural documentation about how the object store is used by cephfs, rbd and the RADOS Gateway?
[9:47] <nigwil> brother: you need more detail than this? http://ceph.com/docs/master/architecture/
[9:47] <brother> I found a bit about striping somewhere,
[9:47] <nigwil> I suspect you might need to browse the source-code if you do
[9:48] * LeaChim (~LeaChim@ has joined #ceph
[9:49] <brother> That is my Plan B. It was just if I missed some secret stash of documentation somewhere...
[9:58] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[10:00] * tziOm (~bjornar@ Quit (Ping timeout: 480 seconds)
[10:03] <paravoid> rturk-away, scuttlemonkey, yehuda_hm: I'm interested in the RGW Geo-replication discussion today. not sure what you mean by "interested parties" and what you expect from hangout attendees, but I can say for sure that we're interested!
[10:08] <tnt> I think "interested parties" is the people that want to be among the 10 possible "active" people (i.e. talking) in the hangout and not just watching the public stream.
[10:10] * tziOm (~bjornar@ has joined #ceph
[10:10] <loicd> tnt: that's also my understanding
[10:14] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[10:21] <paravoid> I think I'll be fine with irc
[10:21] * loicd trying to understand the semantics of OSDService https://github.com/ceph/ceph/blob/master/src/osd/OSD.h
[10:22] <tnt> Interesting ... I left a test cluster doing continuous fio benchmark over the last few days and I found it crashed this morning ...
[10:23] * BManojlovic (~steki@ has joined #ceph
[10:23] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:23] <tnt> "2013-05-05 03:03:04.533803 7f5195912700 0 -- submit_message pg_stats_ack(31 pgs tid 99558) v1 remote,, failed lossy con, dropping message 0x2acfc40"
[10:23] <tnt> these started appearing in the mon logs.
[10:24] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[10:24] * ScOut3R_ (~ScOut3R@ has joined #ceph
[10:25] * BManojlovic (~steki@ has joined #ceph
[10:29] <tnt> And restarting the daemons doesn't help ... there are 912 pgs that stay "peering" ...
[10:31] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[10:31] * mistur_ is now known as mistur
[10:31] * sage (~sage@ Quit (Ping timeout: 480 seconds)
[10:36] <lxo> oh my... the session I was most interested in (crush rules) didn't make it, and the other (erasure encoding) overlaps with the only meeting I have at work today :-(
[10:36] <lxo> aah, I see the crush language blueprint was combined with two others into a single session. phew!
[10:42] * sage (~sage@ has joined #ceph
[10:45] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[10:45] * Havre (~Havre@2a01:e35:8a2c:b230:dcc3:7504:611c:64fc) Quit (Remote host closed the connection)
[10:45] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[10:47] * BManojlovic (~steki@ has joined #ceph
[10:49] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[10:50] * tziOm (~bjornar@ Quit (Ping timeout: 480 seconds)
[10:50] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[10:50] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[10:51] * Havre (~Havre@2a01:e35:8a2c:b230:94b:4c37:1cf6:c28d) has joined #ceph
[10:52] * bergerx_ (~bekir@ has joined #ceph
[10:54] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[10:55] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[10:55] * BManojlovic (~steki@ has joined #ceph
[10:57] * SubOracle (~quassel@00019f1e.user.oftc.net) Quit (Remote host closed the connection)
[10:59] * SubOracle (~quassel@coda-6.gbr.ln.cloud.data-mesh.net) has joined #ceph
[10:59] * goodbytes (~kennetho@2a00:9080:f000::58) has joined #ceph
[11:02] * tziOm (~bjornar@ has joined #ceph
[11:06] * dxd828 (~dxd828@ has joined #ceph
[11:26] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[11:28] <goodbytes> I'm attempting to create a cluster completely manually. I have created a monitor keyring and a ceph.conf and run ceph-mon --mkfs; the monitor is running and the cluster is up (only with this one monitor). How do I create a client.admin key? The documentation seems to assume that I have this key, but I can't run "ceph" commands without it.
[11:30] <Zethrok> goodbytes: it should be created when you call mkcephfs - at least that's the only way I've ever created one
[11:35] <goodbytes> I'm trying to deploy my cluster without cephfs. I want to deploy it using Salt Stack, but first I need to fully understand how it works, the manual way
[11:38] <tnt> the client.admin key is the first one in the mon keyring.
[11:38] <goodbytes> oh ok. I found a tool named "ceph-create-keys" to generate the initial keys for me
[11:38] * schlitzer (~schlitzer@p5DCA3735.dip0.t-ipconnect.de) has joined #ceph
[11:38] <schlitzer> hey all
[11:39] <schlitzer> how can i move an osd in the crush map?
[11:39] <schlitzer> i tried ceph osd crush move osd.3 ceph1
[11:39] <schlitzer> where ceph1 is the name of one of my hosts
[11:40] <schlitzer> but i simply get "invalid argument" back
[11:41] <Zethrok> schlitzer: I think you need to use the normal way like to add an osd ( http://ceph.com/docs/master/rados/operations/crush-map/#add-move-an-osd )
[11:41] <Zethrok> schlitzer: or you should be able to compile a new crushmap
[11:43] <schlitzer> so it would be "ceph osd crush set osd.3 1.0 root=ceph1"
[11:43] <Zethrok> schlitzer: yea, seems like it. I think it will re-distribute everything again, but I'm not sure
[11:44] <schlitzer> nope: (22) Invalid argument
[11:44] <schlitzer> not working either
[11:44] <schlitzer> btw, this is bobtail 0.56 running
[11:45] <Zethrok> schlitzer: I think you need the entirety of the bucket-data. Like root=lklkjl rack=kjlk host=lkjlkj. You should be able to find those values with ceph osd tree
[11:45] <schlitzer> ahhh ok
[11:45] <schlitzer> i try this
[11:46] <schlitzer> this works
[11:46] <schlitzer> thx
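Zethrok's point above, that `ceph osd crush set` wants the whole bucket path rather than a single bucket, can be sketched as a small command builder. This is only an illustration: the helper name `crush_set_command` and the bucket names (`default`, `rack1`, `ceph1`) are made up, and the real values would come from `ceph osd tree`.

```python
def crush_set_command(osd_id, weight, location):
    """Build a 'ceph osd crush set' command line with the full bucket path.

    location is an ordered list of (bucket_type, bucket_name) pairs,
    root first, then rack, then host. All names used below are hypothetical.
    """
    if not location:
        raise ValueError("need the full location, e.g. root=... rack=... host=...")
    parts = ["ceph", "osd", "crush", "set", f"osd.{osd_id}", str(weight)]
    parts += [f"{btype}={name}" for btype, name in location]
    return " ".join(parts)


# Hypothetical placement for osd.3; substitute the buckets shown by 'ceph osd tree'.
print(crush_set_command(3, 1.0, [("root", "default"), ("rack", "rack1"), ("host", "ceph1")]))
```

The builder only formats the string; it does not check that the buckets exist in the crush map.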
[11:48] <Azrael> so
[11:49] <Azrael> i've just started upgrading from testing (0.60) to cuttlefish (0.61)
[11:49] <Azrael> i've upgraded the mon and one of my data nodes
[11:49] <Azrael> each data node runs 12 osd's
[11:49] <Azrael> anywho
[11:49] <Azrael> now osd's are falling out of the cluster on other nodes
[11:49] <Azrael> i'll restart the osds and they come back in, but eventually fall out again
[11:49] <Azrael> its like playing whack-a-mole
[11:50] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[11:56] * BManojlovic (~steki@ has joined #ceph
[11:59] <matt_> Azrael, is this during peering?
[12:00] <tnt> ARGH ... HEALTH_ERR 9 pgs inconsistent; 10 scrub errors
[12:00] <matt_> I saw the same thing this morning using next, Ceph was kind of DDOS'ing itself and OSD's come and go but eventually they end up back online
[12:04] * ScOut3R (~ScOut3R@ has joined #ceph
[12:06] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (Remote host closed the connection)
[12:06] <loicd> Hi, I'm not sure my question is in the scope of http://ceph.com/help/community/ but here it is anyway ;-) I have troubles figuring out the difference between OSDService and OSD https://github.com/ceph/ceph/blob/master/src/osd/OSD.h . I can see that they do different kind of things. I also understand that PG should ( although it does not always do so, but it will get there eventually ) only use OSDService and never OSD. But I don't understand why O
[12:07] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:10] * ScOut3R_ (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[12:11] <Azrael> matt_: looks like osd's are actually crashing
[12:11] <Azrael> matt_: both on cuttlefish and 0.60 testing
[12:11] <Azrael> 0> 2013-05-07 10:06:03.853800 7f5536086700 -1 *** Caught signal (Aborted) **
[12:12] <matt_> Azrael, hmm... I haven't seen that one before
[12:14] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:16] <nigwil> this page has a typo on the shell commands: http://ceph.com/docs/master/install/upgrading-ceph/#upgrading-from-bobtail-to-cuttlefish
[12:17] <nigwil> "sudo rm /etc/apt/sources.sources.list.d/ceph.list" should be "sudo rm /etc/apt/sources.list.d/ceph.list"
[12:18] <nigwil> the typo occurs twice on that page (argonaut and bobtail)
[12:23] <Azrael> sigh
[12:23] <Azrael> osd's keep crashing
[12:23] <wogri_risc> Azrael, always the same stuff in the logs?
[12:24] <wogri_risc> did you upgrade the mon's first?
[12:24] * vipr (~vipr@78-23-113-244.access.telenet.be) Quit (Quit: leaving)
[12:24] * vipr (~vipr@78-23-113-244.access.telenet.be) has joined #ceph
[12:27] * fabioFVZ (~fabiofvz@ has joined #ceph
[12:27] <fabioFVZ> Hello
[12:28] <Azrael> wogri_risc: yep same crash on same osd's
[12:28] <Azrael> wogri_risc: and yep upgraded mon first
[12:28] <wogri_risc> damn it, Azrael.
[12:28] <Azrael> wogri_risc: this is a sort-of-beta cluster for us. only one mon
[12:28] <wogri_risc> so this would even be better for you in this case.
[12:29] <Azrael> we are writing live data to the cluster, but our software isn't dependent on ceph yet. its using nfs too. this is just for transition.
[12:29] <fabioFVZ> i updated from bobtail to cuttlefish... on mon-A i received this error: "found errors while attempting to convert the monitor store: (17) File exists"
[12:29] <fabioFVZ> does anyone know why?
[12:29] <matt_> fabioFVZ, do you have a store.db directory in your monitor?
[12:30] <fabioFVZ> wait
[12:30] <fabioFVZ> yes
[12:30] <nigwil> I think the bobtail-->cuttlefish instructions could be tweaked in the MON upgrade case: with two MONs when you upgrade the first one then they start complaining about "connect protocol version mismatch", and doing a ceph mon stat on the just upgraded MON just appears to lockup. Everything comes good once you upgrade the second MON and they can start talking happily again.
[12:31] <Gugge-47527> any documentation saying 2 mon's is a good idea is broken :)
[12:31] <fabioFVZ> i update the mon-b..
[12:31] <matt_> nigwil, you would lose quorum in a 2 mon setup as soon as you upgrade one of them which would be the reason for the lockup
[12:31] <fabioFVZ> but nothing ...
[12:31] <nigwil> On that page referenced above: where it says "Ensure the monitor has rejoined the quorum." should be for the second and subsequent MONs only
[12:31] * schlitzer (~schlitzer@p5DCA3735.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[12:32] <nigwil> matt_: agree, I am just clarifying the documentation for the base (somewhat broken case) :-)
[12:32] <matt_> nigwil, I think that's referring to a 3 mon setup. after the first upgrade the bobtail mons keep quorum. after the second upgrade the cuttlefish mons have quorum (because there are two of them)
[12:33] <fabioFVZ> matt_: wait the error is before update the second mon-b
[12:33] <fabioFVZ> and now?
[12:34] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[12:34] <matt_> fabioFVZ, I don't believe you should have a store.db directory before you upgrade because that's where leveldb is stored
[12:34] <nigwil> matt_: it would help if that page makes it clear about the minimum number of MONs
[12:34] <matt_> nigwil, true that. Although they do discourage running 2 mon's because it's no better than running a single mon
[12:34] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[12:35] <nigwil> ok
[12:35] <Gugge-47527> the minimum number is 1 :)
[12:35] <Gugge-47527> Anything less than that is not good :P
[12:35] <Gugge-47527> 2 is not better than 1 though :)
[12:35] <nigwil> :-)
[12:35] * diegows (~diegows@ has joined #ceph
[12:36] <nigwil> it would seem to be a good opportunity to remind people, particularly on test-setups where they are sub-optimal
[12:36] <nigwil> like mine...
[12:36] <Gugge-47527> any even number is crazy actually :)
[12:37] * schlitzer (~schlitzer@p5DCA3735.dip0.t-ipconnect.de) has joined #ceph
[12:37] <nigwil> although it is going to happen in the failed-case, and people will try to upgrade a broken cluster with an even number of MONs (they might lack the chance to add another MON)
[12:37] <Gugge-47527> the only thing they have to do is upgrade the rest :)
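The "2 is not better than 1" and "any even number is crazy" remarks follow from majority quorum arithmetic. A minimal sketch, assuming the monitors need a strict majority to form a quorum (the function names are illustrative, not part of any Ceph API):

```python
def quorum_size(num_mons):
    """Smallest strict majority of num_mons monitors."""
    return num_mons // 2 + 1


def tolerated_failures(num_mons):
    """How many monitors can be down while a majority still remains."""
    return num_mons - quorum_size(num_mons)


# Columns: monitors, quorum size, monitors that may fail.
for n in range(1, 6):
    print(n, quorum_size(n), tolerated_failures(n))
```

With 2 monitors the quorum size is 2, so losing (or upgrading) either one stalls the cluster just as losing a single monitor would. Going from 3 to 4 monitors likewise adds no extra failure tolerance.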
[12:39] <fabioFVZ> thanks
[12:39] <fabioFVZ> i lost all
[12:39] <matt_> fabioFVZ, ??
[12:39] <fabioFVZ> i follow the update page...but there is some problem
[12:40] <fabioFVZ> for me ..is broken during update
[12:40] <fabioFVZ> but it should not be possible that if 1 mon crashes ... all of ceph crashes
[12:42] <fabioFVZ> mon-A now is without osd daemon and mon-B is update and mon-b is ok
[12:43] <fabioFVZ> but ceph -W 2013-05-07 12:43:10.449023 7f0903f45700 0 -- :/2341 >> pipe(0x18b8490 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[12:44] <nigwil> fabioFVZ: how many MONs did you have originally?
[12:45] <nigwil> I saw the same message when I upgraded the first MON to cuttlefish
[12:47] <tnt> So .. anyone knows what I'm supposed to do when I have inconsistent PGs ?
[12:47] <scuttlemonkey> paravoid: "interested parties" are folks that really feel the need to be on video and have the ability to speak
[12:47] <paravoid> ok, I don't think that'd be needed
[12:48] <scuttlemonkey> anyone that wants it will be able to see the "live" video stream and talk in the irc channel
[12:49] <fabioFVZ> any idea?
[12:49] <fabioFVZ> at the moment ceph is dead
[12:49] <nigwil> fabioFVZ: what are you seeing in /var/log/ceph/{most-recent-log-file}
[12:50] <fabioFVZ> 2013-05-07 12:38:48.542677 7f7918f76780 0 ceph version 0.61 (237f3f1e8d8c3b85666529860285dcdffdeda4c5), process ceph-mon, pid 964
[12:50] <fabioFVZ> 2013-05-07 12:38:48.740594 7f7918f76780 -1 there is an on-going (maybe aborted?) conversion.
[12:50] <fabioFVZ> 2013-05-07 12:38:48.740701 7f7918f76780 -1 you should check what happened
[12:50] <fabioFVZ> 2013-05-07 12:38:48.740797 7f7918f76780 -1 found errors while attempting to convert the monitor store: (17) File exists
[12:51] <fabioFVZ> how i force a monitor store conversion?
[12:51] <matt_> fabioFVZ, try renaming the store.db directory in your mon to store.db.old and then try again
[12:51] <fabioFVZ> try
[12:52] <fabioFVZ> ... start conversion
[12:53] <matt_> fabioFVZ, rename the folder then start the mon. The mon will do the conversion when it starts
[12:54] <fabioFVZ> yes... done..i'm waiting the end of conversion
[12:54] <fabioFVZ> yessss. :)
[12:54] <fabioFVZ> matt_: many thankssss
[12:54] <nigwil> whew! glad to hear that fabioFVZ :-)
[12:54] <matt_> fabioFVZ, you are welcome :)
[12:54] <fabioFVZ> :)
[12:55] <fabioFVZ> update the osd node? ... :)
[12:56] <matt_> fabioFVZ, if your monitors are now working then yes
[12:56] <fabioFVZ> ok ...osd-00 ...
[12:57] <nigwil> between each OSD upgrade I waited until they settled down and showed up again in ceph osd stat
[12:57] <fabioFVZ> ok
[12:58] <fabioFVZ> i delete the store.db.old dir?
[12:59] <matt_> fabioFVZ, yes that should be safe if your monitors are all working again
[12:59] <fabioFVZ> ok many thanks
[12:59] <matt_> maybe keep it for a few days just in case
[12:59] <fabioFVZ> osd-00 updated...
[13:00] <fabioFVZ> ok no problem..is little
[13:00] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[13:01] <goodbytes> where can i report problems in docs?
[13:01] <fabioFVZ> i think the program died during the conversion..
[13:02] <fabioFVZ> for me yes..
[13:02] <tnt> Is there a way to force a deep scrub on a PG ? ceph pg scrub <xxx> only forces a normal scrub.
[13:03] <matt_> tnt, repair maybe? I'm sure there is a deep scrub command though
[13:05] <fabioFVZ> update all ... now i have the "cuttlefish" :) many thank
[13:06] * SubOracle (~quassel@00019f1e.user.oftc.net) Quit (Remote host closed the connection)
[13:06] <tnt> paravoid: I think I'm having the same issue as you in http://tracker.ceph.com/issues/4743 How did you manually trigger a deep scrub to check all PGs ?
[13:07] <paravoid> tnt: ceph pg deep-scrub $pg or ceph osd deep-scrub $osd
[13:07] <tnt> damn, I was missing the '-' ...
[13:08] <tnt> did you get any more error since your upgrade to 0.56.4 ?
[13:08] <tnt> I just switched from 0.56.3 to 0.56.6 yesterday and they appeared ...
[13:09] <nigwil> another successful cuttlefish upgrade, subjectively feels quicker than bobtail, well done Inktank+friends
[13:13] <mikedawson> matt_: did you get past your issue with Cuttlefish?
[13:14] <matt_> mikedawson, my monitor issue? It did resolve itself eventually but it just recently spazzed out and I'm down to two mons again
[13:14] <mikedawson> matt_: I have a new theory... What kind of workload do you have? Read heavy or write heavy? Average size of writes?
[13:16] <matt_> mikedawson, it's all VM's hosted on RBD. A lot of small writes and most of the reads get filled by the kernel disk cache so not many hit the disks
[13:18] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:18] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[13:19] <mikedawson> matt_: yep, that is consistent with my theory. What are the sizes of your mon store.db's ? If you watch them over time, you'll see them grow then shrink (caused by the compact on trim). But are they getting bigger in general despite the frequent compactions?
[13:20] <matt_> mikedawson, about 1gb at the moment but they were recently compacted
[13:20] <matt_> they were getting massive before the compact fixes. Joao was using my store to test the compact fix I think
[13:22] <matt_> oh yay, new IOPS record from within a VM. 7,800 :D
[13:26] * ScOut3R_ (~ScOut3R@ has joined #ceph
[13:27] <mikedawson> matt_: I don't doubt they were getting big. What I'm seeing is the 'compact on trim' is working, but under load, my store.db is still growing at ~1.5GB/hour. Could you watch yours to see if leveldb grows over time?
[13:27] <goodbytes> I have created my initial mon keyring as written in the docs: http://ceph.com/docs/master/dev/mon-bootstrap/
[13:27] <matt_> mikedawson, no problem. I will keep a lookout for you.
[13:27] <goodbytes> but I get an "access denied" when attempting to authenticate with those keys, either mon. or client.admin
[13:28] <matt_> mikedawson, whilst we're on the subject... should I have compact on trim = true in my ceph.conf or is it on by default in cuttlefish?
[13:28] <goodbytes> i think it has something to do with missing capabilities
[13:29] <mikedawson> matt_: Cuttlefish has the default set to 'mon compact on trim = true'
[13:30] * aliguori (~anthony@ has joined #ceph
[13:31] <mikedawson> matt_: To me, it seems small writes (possibly only small writes from RBD) spam the mon leveldb. With compact on trim I was able to keep up last week, but after ramping up workload, I'm growing faster than I'm compacting.
[13:32] <fridad> mikedawson: good to know so i'll def. wait until this is fixed and then start upgrading from bobtail
[13:33] <mikedawson> fridad: good call
[13:34] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[13:35] <fridad> mikedawson: yes i've only small writes from rbd
[13:36] <fridad> mikedawson: zero reads and nearly zero seq. i/o
[13:36] <matt_> mikedawson, are you creating a lot of new objects constantly? I wouldn't have thought updating an existing object would require the mon to write to leveldb but I'm not entirely sure of its architecture
[13:36] <mikedawson> fridad: I bet you'd see the issue
[13:37] * scuttlemonkey changes topic to 'Ceph Developer Summit Today! || Main page: http://ow.ly/kMufY || Track 1: http://ow.ly/kMuhN || Track 2: http://ow.ly/kMujg || Starts 8a PDT / 11a EDT / 4p BST / 5p CEST'
[13:37] <mikedawson> matt_: yes, constant new small writes (~16KB)
[13:37] <tnt> btw, when I do 'apt-get install ceph' it doesn't automatically pull the new packages for the other ceph packages like ceph-common ceph-fs-common ceph-fuse libcephfs1 librados-dev librados2 librbd-dev librbd1 radosgw
[13:38] <matt_> mikedawson, my workload might be a little different. For the most part, all of my small writes should be to existing objects as they're all SQL and Exchange DB's and such
[13:39] <matt_> mikedawson, there are new objects created but probably not on the same scale as yourself
[13:39] * SubOracle (~quassel@coda-6.gbr.ln.cloud.data-mesh.net) has joined #ceph
[13:39] <matt_> heh, actually I might be wrong. My mon store is up 300mb since 10 minutes ago
[13:41] <mikedawson> matt_: keep an eye on it today, it'll go down when compact is triggered, but if it trends up over time...
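Watching whether the mon store trends upward despite compaction comes down to sampling the directory size over time and comparing the first and last samples. A rough sketch, assuming the store is an ordinary directory of files; the helper names are hypothetical and nothing here talks to Ceph itself:

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may vanish between listing and stat (e.g. during compaction).
                pass
    return total


def growth_gb_per_hour(samples):
    """samples: list of (unix_time_seconds, size_bytes), oldest first."""
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need two samples taken at different times")
    return (s1 - s0) / 1e9 / ((t1 - t0) / 3600.0)
```

Calling `dir_size_bytes` on the mon's store.db directory every few minutes and feeding the pairs to `growth_gb_per_hour` gives a figure comparable to the ~1.5GB/hour quoted above. Because compaction makes the size sawtooth, samples spanning several compactions are more meaningful than two close together.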
[13:41] <tnt> wtf ... why would /usr/sbin/ceph-create-keys get executed after a cuttlefish upgrade ... it's an upgrade, I have keys already !?!
[13:42] <tnt> it seems to have replaced the keys from the mon I just restarted ... and obviously it can't join the cluster now.
[13:44] <matt_> mikedawson, will do. I'll drop you a message tomorrow
[13:45] <mikedawson> matt_: thanks
[13:49] <wido> tnt: It's something new with those keys for the mons
[13:49] <wido> on start-up they generate a mon. key
[13:49] <goodbytes> i'm trying to build a cluster from scratch. I have deployed and started a single mon daemon, but I can't use the monitor key to authenticate against the mon daemon, i get an "access" denied, anybody knows how to troubleshoot that?
[13:50] <tnt> wido: but what's weird is that now that mon can't join the cluster and has plenty of cephx errors in the logs
[13:50] <jluis> <goodbytes> where can i report problems in docs? <- open a ticket on the tracker or email ceph-devel
[13:50] <goodbytes> jluis, thanks
[13:50] <wido> tnt: That is indeed weird. It's working fine with my mons which went from bobtail to cuttlefish
[13:50] <tnt> yes, I'm coming from 0.56.6 to 0.61
[13:51] <wido> tnt: To be sure: Updated the package, stopped the mon and started it again
[13:51] <tnt> yup.
[13:51] <wido> it does the leveldb migration and joins again
[13:51] <jluis> mikedawson, how big are your stores now?
[13:51] * jluis is now known as joao
[13:52] <mikedawson> jluis: 14GB, growing ~1.5GB/hour
[13:52] <tnt> wido: http://pastebin.com/2cy54HGx This is the log after restart.
[13:53] <joao> that's not supposed to happen :\
[13:53] <mikedawson> joao: email sent with more details on my new theory
[13:53] <joao> thanks
[13:53] <wido> tnt: So it's still in the probing state?
[13:54] <tnt> wido: AFAICT, yes
[13:55] <joao> mikedawson, 'ceph -s' ?
[13:57] <wido> tnt: That is weird. Those "failed to authorize" messages are what I see sometimes as well, but that could be because the monitor didn't join the quorum
[13:58] <tnt> I'll try to restart a second mon ...
[13:59] <tnt> wido: huh ... that actually worked.
[14:00] <tnt> So all those cephx error were just because it wasn't in the quorum because it was the first mon (of 3) to be restarted in a bobtail to cuttlefish upgrade
[14:00] <tnt> A note about that in the release notes might be useful. I expected it to not join the quorum because of the mon changes, but I didn't expect cephx errors in the logs.
[14:03] <mikedawson> joao: http://pastebin.com/raw.php?i=WByX0s8b
[14:04] <tnt> Another thing "create-or-move updated item id 0 name 'osd.0' weight 0.02 at location {host=ceph,root=default} to crush map" should it alter the crushmap each time I restart a service ? I'm kind of worried that all that 'automatic' stuff messes up my crush map.
[14:05] <mikedawson> tnt: I have seen that, too. So far no changes to my crush map or weights
[14:07] <joao> mikedawson, fwiw, those ops/s are stats reported by the osds
[14:08] <joao> mikedawson, any idea how much is actually going into leveldb? I would think that iotop or something would give you that info, but not sure
[14:09] <mikedawson> joao: Not sure. What gets written to the mon leveldb? My hosts, osds, pools, crush rules aren't changing...
[14:11] <joao> mikedawson, pgmap and osdmap updates are mostly responsible for store writes; some log messages too (those reported on 'ceph -w')
[14:12] <Azrael> 2013-05-07 14:12:03.795544 mon.0 [INF] pgmap v159328: 17408 pgs: 17402 active+clean, 4 down+peering, 2 active+clean+scrubbing; 1331 GB data, 4119 GB used, 253 TB / 257 TB avail
[14:12] <mikedawson> joao: do I have pgmap or osdmap changes if my hosts, osds, pool, and crush rules aren't changing?
[14:12] <Azrael> how in the world do i handle the 4 down+peering?
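To pick the problem PGs out of a pgmap summary like the one Azrael pasted, the state counts can be parsed from the line. A small sketch that assumes the "N pgs: count state, count state; ..." layout seen in this log (the function name is made up, and other Ceph versions may format the line differently):

```python
import re


def pg_state_counts(pgmap_line):
    """Return {state: count} from the 'N pgs: ...;' part of a pgmap line."""
    match = re.search(r"\d+ pgs: ([^;]+)", pgmap_line)
    if not match:
        return {}
    counts = {}
    for item in match.group(1).split(","):
        count, state = item.strip().split(" ", 1)
        counts[state] = int(count)
    return counts


line = ("pgmap v159328: 17408 pgs: 17402 active+clean, 4 down+peering, "
        "2 active+clean+scrubbing; 1331 GB data, 4119 GB used, 253 TB / 257 TB avail")
counts = pg_state_counts(line)
# Anything that is not some flavour of active+clean deserves a closer look.
unhealthy = {state: n for state, n in counts.items() if "active+clean" not in state}
print(unhealthy)
```

This only summarises the counts. Finding out why a given PG is down still needs a per-PG query.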
[14:12] <joao> mikedawson, if you look at map epochs on 'ceph -s', you'll see what's being updated (map epochs are increased)
[14:12] <joao> mikedawson, you'll have pgmap updates if pgs are being changed
[14:13] <joao> osdmaps will also change from time to time (osds report new stats every now and then)
[14:14] <mikedawson> joao: sdd1 is my os partition (which includes /var/lib/ceph/mon/osd) http://pastebin.com/raw.php?i=Z4eFMtH6
[14:17] <joao> the monitor doesn't seem to be doing a lot of writes
[14:17] * ScOut3R (~ScOut3R@ has joined #ceph
[14:17] <joao> we could also force compact on every N commits
[14:18] <joao> but this feels like going around whatever is the problem
[14:18] <joao> :\
[14:18] <joao> besides, I wonder what's the real impact on compacting frequently
[14:19] <Zethrok> grrr.. after cuttlefish my crushmap changes every time I restart any osd and that osd will be placed in the wrong bucket
[14:19] <joao> fwiw, now that I think of it, compacting on trim may not be as frequent as one would think
[14:19] <Zethrok> Anyone else experienced that?
[14:20] <joao> Zethrok, I believe tnt was just concerned about that
[14:20] <tnt> Zethrok: I've just seen the "create-or-move updated item id 0 name 'osd.0' weight 0.02 at location {host=ceph,root=default} to crush map" message as well on restart.
[14:21] <tnt> on my test cluster, it doesn't affect anything, but I was worried on prod, it would (because prod has a more complex crushmap)
[14:21] <joao> is that really changing the crushmap? have you tried obtaining a crush map before and after the restart, decompiling and comparing them both?
[14:23] <Zethrok> Yea, that was the first I tried. Whenever I restart any osd it goes from one of my custom made root-rack-hosts to default one
[14:23] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[14:23] <joao> mikedawson, still have debug on in the monitor?
[14:23] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:24] <Zethrok> I'm guessing it is because I use different hostnames for the same server (like osd101-ssd, osd101-sata) because it creates a new entry in the crushmap with the default hostname in the default root-rack-host
[14:24] <joao> would love to see how frequently 'compacting on trim' appears on your logs, but that's only going to show up at 'debug mon = 20'
[14:24] * ScOut3R_ (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[14:24] <joao> err
[14:24] <joao> 'compacting prefix' I mean
[14:25] <Azrael> rawwwwrrrr osd's keep crashing
[14:25] <joao> well, figuring out lunch; brb
[14:25] <Azrael> guess thats what we get for upgrading to a .0 heh
[14:26] <tnt> Zethrok: you can probably disable osd_crush_update_on_start option.
[14:26] <tnt> or manually set osd_crush_location for each osd
[14:26] <Zethrok> hehe, yea :) - on the good side cuttlefish and qemu-kvm with the async patch seems really awesome
[14:27] <tnt> if you look in /etc/init/ceph-osd.conf you can see what it executes.
[14:27] <jmlowe> Zethrok: you build your own qemu or did you get it somewhere with the patch applied?
[14:27] <tnt> Zethrok: do you have a pointer to the async patch ?
[14:28] * aliguori (~anthony@ Quit (Remote host closed the connection)
[14:28] <Zethrok> built it just to test - but seems like it was committed, so I guess it will be out before too long
[14:29] <jmlowe> it was committed, came after 1.4.1
[14:30] <tnt> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=dc7588c1eb3008bda53dde1d6b890cd299758155 ?
[14:30] <Zethrok> before I had io stalls of ~10 sec when I shutdown a node and approx. 1min. once it started to replicate. Now I'm seeing 1 sec and 3 sec respectively.
[14:31] <tnt> That might be due to the fact that osd shutdowns are now sent to the mon to 'warn it' and avoid IO stall, more than the qemu flush patch ?
[14:32] <tnt> "osd: notify mon on clean shutdown to avoid IO stall" from the changelog.
[14:32] <tnt> that's something I'm looking forward to :p
[14:32] <Zethrok> Maybe - I didn't test without the patch, so can't say :)
[14:33] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[14:34] <wido> Anyone upgraded their cluster to Cuttlefish yet?
[14:34] <wido> I'm rebooting nodes one by one, suddenly see strange things
[14:34] <wido> I reboot a node with 4 OSDs and suddenly I have 20 OSDs being marked down
[14:35] <wido> and they all start complaining "wrongly marked me down"
[14:35] <wido> After a couple of min the cluster is happy again with everything active+clean
[14:35] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[14:36] <tnt> wido: I've seen a couple of osd up/down that seemed weird when I restarted .. but that was a test cluster with 4 osd so I didn't pay much attention at the time.
[14:37] * BManojlovic (~steki@ has joined #ceph
[14:37] <wido> tnt: Ok, weird
[14:37] <wido> I suddenly see this: "osdmap e580: 36 osds: 4 up, 36 in"
[14:37] <wido> "map e582 wrongly marked me down"
[14:37] <Zethrok> wido: On the first test-cluster I upgraded I didn't wait for all OSD on a node to finish upgrading PG's (or whatever they're doing) before restarting the next.
[14:37] <tnt> yikes, that's indeed pretty bad.
[14:37] <wido> When I reboot one of the nodes
[14:38] <wido> Zethrok: No, I'm going one by one and waiting for active+clean
[14:38] <Zethrok> yea, I'm doing that as well now - takes ~5-10min pr. node though
[14:39] <wido> Ok, weird. I rebooted another node and it does it again
[14:39] <wido> osdmap e592: 36 osds: 4 up, 36 in
[14:39] <wido> Seems like the OSDs which go down, marked everybody else down
[14:39] <wido> and pollute the OSDMap
[14:40] <tnt> set the threshold higher as a work around ?
[14:41] <wido> tnt: Yes, that might actually be a good default anyway
[14:41] <wido> the min down reporters should be more than the OSDs on one host
[14:42] <wido> I think it is 3 by default
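wido's reasoning can be shown with a toy model. It assumes (and this is only an assumption about the semantics) that a peer is marked down once at least "min down reporters" distinct OSDs have reported it, and that every OSD on a rebooting host may file such a report. The function and OSD names are illustrative:

```python
def peer_marked_down(reporters, min_down_reporters):
    """Toy rule: mark a peer down once enough distinct OSDs report it."""
    return len(set(reporters)) >= min_down_reporters


# A host with 4 OSDs goes away and each of them reports a healthy peer as down.
reports_from_one_host = ["osd.0", "osd.1", "osd.2", "osd.3"]

print(peer_marked_down(reports_from_one_host, 3))  # threshold below the per-host OSD count
print(peer_marked_down(reports_from_one_host, 5))  # threshold above the per-host OSD count
```

In this model a threshold of 3 lets a single 4-OSD host mark healthy peers down, while any threshold above 4 requires reports from at least two hosts.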
[14:50] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[14:54] * wido (~wido@rockbox.widodh.nl) has left #ceph
[14:54] * wido (~wido@rockbox.widodh.nl) has joined #ceph
[15:02] * aliguori (~anthony@ has joined #ceph
[15:18] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[15:19] * madkiss (~madkiss@p5DCA3735.dip0.t-ipconnect.de) has joined #ceph
[15:19] * allsystemsarego (~allsystem@ has joined #ceph
[15:23] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[15:26] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[15:30] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Remote host closed the connection)
[15:30] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[15:31] * BManojlovic (~steki@ has joined #ceph
[15:34] <Azrael> hey folks, i'm getting osd crashes with ceph cuttlefish like so: http://pastebin.com/Bt3yWvfV
[15:34] <Azrael> any advice?
[15:37] * yanzheng (~zhyan@ has joined #ceph
[15:43] <tnt> does it happen with all the osd
[15:43] <tnt> ?
[15:43] <tnt> Is there anything before the crash in the logs ?
[15:52] <tnt> Wow, cuttle fish really boosted small IO perf.
[15:53] <Kdecherf> tnt: ah? need to test on our production cluster :D
[15:55] <via> has the status of nfs re-export changed with cuttlefish?
[15:56] <Azrael> happens with a few osd's
[15:56] <Azrael> we *think* its a corrupt pg
[15:57] <Azrael> i'm removing the corrupt pg manually and starting the osd's
[15:57] <Azrael> seems to fix things
[15:57] * drokita (~drokita@ has joined #ceph
[15:57] <Azrael> funny thing is.... this corrupt pg doesn't even belong to the osd's that are crashtastic
[15:58] * PerlStalker (~PerlStalk@ has joined #ceph
[15:58] * tkensiski (~tkensiski@2600:1010:b012:a87e:a5bf:3ca6:19ba:7a76) has joined #ceph
[15:58] * tkensiski (~tkensiski@2600:1010:b012:a87e:a5bf:3ca6:19ba:7a76) has left #ceph
[16:00] <mikedawson> Azrael: I remember some threads with similar crashes from people running 0.56.x. Can't find any word of a resolution.
[16:01] <Azrael> http://tracker.ceph.com/issues/3615
[16:01] <Azrael> mikedawson: ^ was our hint
[16:03] <mikedawson> Azrael: does it sound like this? http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg10174.html
[16:05] <Azrael> yarrrr 1 pg incomplete
[16:05] * eschnou (~eschnou@ has joined #ceph
[16:06] * eschnou (~eschnou@ Quit ()
[16:06] <Azrael> mikedawson: it does indeed good sir
[16:06] <Azrael> mikedawson: the fs seems ok
[16:06] <Azrael> mikedawson: but will remount and fsck
[16:07] * drokita1 (~drokita@ has joined #ceph
[16:07] <tnt> Kdecherf: write speed of 44k blocks (yes, 44k not 4k) seems to have doubled. Read speed has a small regression though.
[16:07] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[16:09] <Azrael> ugh lost a pg it looks like
[16:09] <Azrael> its incomplete
[16:09] <mikedawson> tnt: if you are running a lot of small random writes, could you watch the size of /var/lib/ceph/mon/ceph-*/store.db? With enough writes, I believe it will grow over time. Trying to track down the bug.
[16:11] * sagelap1 (~sage@2600:1012:b014:8bdc:6c88:ecfd:9e2d:5f39) has joined #ceph
[16:12] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has left #ceph
[16:12] <jmlowe> mikedawson: I'm still holding steady right around the 50MB mark
[16:12] <tnt> mikedawson: well atm I only have my 2 test clusters on it running benchmarks.
[16:12] <joao> sagelap1, morning; around?
[16:12] <sagelap1> good morning!
[16:12] <Azrael> ok
[16:12] <Azrael> so if i have an incomplete pg
[16:13] <Azrael> (lord knows how that happened)
[16:13] <Azrael> what should i do about it
[16:13] <sagelap1> 'ceph pg <pgid> query' should give some clues
[16:13] <sagelap1> it will tell you which down osds it would like to see come up
[16:14] <Azrael> right
[16:14] <tnt> mikedawson: but I sure hope that doesn't grow too much because my mons in prod only have 10G ...
[16:14] <Azrael> the 3 osd's are back online
[16:14] <Azrael> but
[16:14] <Azrael> we removed those pg directories from those osds
[16:14] <Azrael> because the osd daemon kept crashing, otherwise
[16:15] <Azrael> its no surprise (to us) the pg is incomplete however, because there's no files there :-)
[16:15] <Azrael> hmm
[16:15] <Azrael> but now we have to figure out how to make this pg no longer incomplete
[16:15] <Azrael> we have backups of what was in the pg's directory under /var/lib/ceph/osd/ceph-N/current/<pg>
[16:16] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[16:16] * saras (~kvirc@74-61-8-52.war.clearwire-wmx.net) has joined #ceph
[16:16] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[16:19] <sagelap1> the simplest option is probably to make the mon recreate the pg, and then manually put those objects back into rados
[16:19] <sagelap1> non-trivial to repair the underlying directory such that the osd will accept it
[16:19] <Azrael> right
[16:20] <Azrael> can you elaborate on the non-trivial repair process, just for giggles?
[16:20] <sagelap1> ceph pg force_create_pg <pgid>
[16:20] <sagelap1> but first you should remove all traces of that pg
[16:20] <Azrael> booo thats not in ceph -h output :-D
[16:20] <sagelap1> yeah it is intentionally not documented
[16:21] <mikedawson> http://wiki.ceph.com/01Planning/Developer_Summit
[16:21] <Azrael> wonder why... heh heh
[16:21] <Azrael> ok so that will recreate the pg correct... thus losing all data
[16:21] <sagelap1> what kind of data was stored in the cluster?
[16:21] <Azrael> thats the trivial process, right?
[16:21] <sagelap1> in this pool?
[16:21] <Azrael> copy of production data
[16:21] * markbby (~Adium@ has joined #ceph
[16:21] <Azrael> we haven't cutover to rados fully yet
[16:22] <PerlStalker> Heads up: http://ceph.com/docs/master/install/upgrading-ceph/rados/deployment/ceph-deploy-transition linked to from http://ceph.com/docs/master/install/upgrading-ceph/ 404s.
[16:22] <sagelap1> rbd? fs? rgw?
[16:22] <Azrael> running in dual mode
[16:22] <Azrael> just rados objects
[16:22] <sagelap1> no omap?
[16:22] * sagelap1 is now known as sagelap
[16:22] <Azrael> its written just using librados
[16:22] <sagelap> with bytes in objects, but not the keys/values?
[16:23] <sagelap> if that's the case, then just find the remaining instances of the pg on your osds (if there are any)
[16:23] <sagelap> if the pg status is 'stale' from the mon then there probably aren't any left
[16:23] <sagelap> and then its safe to recreate the pg
[16:23] <Azrael> yeah we moved all copies away from the 3 osds (have set 3 replicas for this pool)
[16:23] <sagelap> then go through your backup directory and manually do a rados put on those objects back into the cluster
[16:23] <sagelap> ok cool
[16:24] <Azrael> ok cool i think i follow
[16:25] <Azrael> so make sure that borked pg is actually empty on the 3 osd's
[16:25] <Azrael> and then ceph pg force_create_pg 8.40 (our busted pg is 8.40)
[16:25] <Azrael> then manually add the objects back
[16:25] <sagelap> yeah
[16:26] <sagelap> if you have crash dumps from when it was messed up before that would be interesting to see on ceph-devel, too :)
[16:27] * rturk-away is now known as rturk
[16:27] <Azrael> sagelap: http://pastebin.com/Bt3yWvfV
[16:28] <sagelap> do you have a core?
[16:28] <Azrael> only if ceph automagically writes one somewhere
[16:28] <Azrael> know where that would be?
[16:28] <Azrael> we *might* have an strace output if that would help
[16:29] <tnt> Azrael: they appear in / for me ...
[16:29] <sagelap> usually ends up in /, if you did ulimit -c unlimited
[16:29] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[16:29] <Azrael> says its creating the pg
[16:30] <Azrael> there's a 1.6GB core file
[16:30] <Azrael> pg create y u slow?
[16:31] <Azrael> should i ceph pg force_rm or something first?
[16:33] * saras (~kvirc@74-61-8-52.war.clearwire-wmx.net) has left #ceph
[16:33] <Azrael> health HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean
[16:33] <Azrael> osdmap e3392: 72 osds: 72 up, 72 in
[16:33] <Azrael> pgmap v165893: 17408 pgs: 1 creating, 17407 active+clean; 1334 GB data, 4124 GB used, 257 TB / 261 TB avail
[16:36] <Azrael> sagelap: should the cluster take a long while to create the pg?
[16:36] <scheuk> Does anyone know why the latest version of bobtail is 0.56.5 according to the blog post, but when I install it from ceph.com/debian-bobtail I'm getting 0.56.6? Was there an accidental version bump in the apt package build?
[16:36] <mikedawson> sagelap: did you have a chance to read my email?
[16:37] <sagelap> mikedawson: yeah just saw it
[16:37] <sagelap> i think we should rebase the patches that build a trace of the leveldb workload and have you run that for a while so we can (hopefully) reproduce the growth here
[16:37] <mikedawson> sagelap: wish I would have turned up my workload sooner
[16:37] <sagelap> brb
[16:38] * aliguori (~anthony@ Quit (Read error: Operation timed out)
[16:40] * tziOm (~bjornar@ Quit (Remote host closed the connection)
[16:42] <Azrael> dumdeedum
[16:42] <jmlowe> scheuk: there was a bug in the /usr/lib/ceph/ceph_common.sh, wouldn't parse ceph.conf correctly to get the hostname for starting and stopping the daemons
[16:43] <Azrael> ceph create pg, i command thee!
[16:44] <mikedawson> jmlowe: how many op/s and especially KB/s wr are you seeing on your cluster?
[16:45] * sagelap (~sage@2600:1012:b014:8bdc:6c88:ecfd:9e2d:5f39) Quit (Ping timeout: 480 seconds)
[16:46] <scheuk> jmlowe: I see, thank you :)
[16:46] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[16:46] <jmlowe> mikedawson: http://pastebin.com/VBMJdgmJ
[16:49] <mikedawson> jmlowe: I'm about an order of magnitude higher right now, seemingly beyond 'compact on trim's ability to keep up
[16:50] <mikedawson> jmlowe: do you have a dev/test cluster?
[16:50] <mikedawson> jmlowe: something you could do 'rados -p <testing-pool-with-3x> bench 3600 write -b 16384 -t 64'
[16:52] * yasu` (~yasu`@dhcp-59-157.cse.ucsc.edu) has joined #ceph
[16:52] <mikedawson> basically, I need someone with cuttlefish and enough disks (in a non-production setup) to hammer small writes and watch to see if they can reproduce the monitor's leveldb growth
[16:52] <Azrael> yay we fixed it
[16:53] <jmlowe> I was doing about 10x higher earlier this morning, there must be a bit of a lull
[16:55] * nibon7 (~nibon7@ has joined #ceph
[16:57] <sagewk> azrael: awesome
[16:58] * yasu` (~yasu`@dhcp-59-157.cse.ucsc.edu) Quit (Remote host closed the connection)
[16:58] * yasu` (~yasu`@dhcp-59-157.cse.ucsc.edu) has joined #ceph
[17:01] <Azrael> sagewk: thanks man!
[17:03] * alram (~alram@ has joined #ceph
[17:06] <tnt> mikedawson: I started a bench in a loop ... but it's only a very small (1mon / 2disks) cluster.
[17:06] <Azrael> sagewk: ok we recovered our objects
[17:06] * scuttlemonkey changes topic to 'Ceph Developer Summit Today! || Main page: http://ow.ly/kMufY || Track 1: http://ow.ly/kMuhN || Track 2: http://ow.ly/kMujg || Starts 8a PDT / 11a EDT / 4p BST / 5p CEST'
[17:06] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[17:07] <mikedawson> tnt: thanks, but I doubt two spinners will be able to provide the iops needed to exceed the effect of compacting the mon leveldb
[17:07] <scuttlemonkey> sage is currently doing his opening remarks for the online Ceph Developer Summit
[17:08] <tnt> mikedawson: yeah, I see it growing then going back in size in sort of a triangle wave pattern. But I want to test that io load anyway because when I left it run last week end, it crashed the OSD ...
[17:09] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[17:10] <mikedawson> tnt: growth is the problem and 'mon compact on trim' is the current workaround. When you see the size go down, a leveldb compact has been triggered. That works ok, until your workload exceeds its abilities
[17:10] * winston-d (~zhiteng@ has joined #ceph
[17:11] <tnt> mikedawson: what kind of iops do you need to see that ?
[17:11] * yanzheng (~zhyan@ has joined #ceph
[17:11] <mikedawson> tnt: not sure
[17:14] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Quit: Leaving.)
[17:14] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[17:20] * gmason (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[17:21] * tkensiski (~tkensiski@ has joined #ceph
[17:23] * tkensiski (~tkensiski@ has left #ceph
[17:24] * John (~john@astound-64-85-225-33.ca.astound.net) has joined #ceph
[17:25] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Quit: Leaving)
[17:25] * John (~john@astound-64-85-225-33.ca.astound.net) Quit ()
[17:25] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[17:26] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[17:28] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[17:28] * John (~john@astound-64-85-225-33.ca.astound.net) has joined #ceph
[17:28] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:29] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[17:32] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[17:33] * schlitzer (~schlitzer@p5DCA3735.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[17:35] * aliguori (~anthony@ has joined #ceph
[17:40] * brady (~brady@rrcs-64-183-4-86.west.biz.rr.com) has joined #ceph
[17:40] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:40] <tnt> mikedawson: Not sure if it matters but the size the mon goes down to after a compact seems to become progressively larger.
[17:42] <mikedawson> tnt: that is my problem, if you run it long enough, with enough load...
[17:42] * madkiss (~madkiss@p5DCA3735.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[17:43] <mikedawson> tnt: despite the compaction, how quickly is the store growing over time?
[17:45] <tnt> Maybe a bit less than 10 MB / hour or so.
[17:45] <tnt> I'll see how it went overnight. Gotta go now.
[17:46] <mikedawson> tnt: ok. thanks!
[17:47] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:48] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[17:56] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:57] * bergerx_ (~bekir@ Quit (Remote host closed the connection)
[17:57] * winston-d (~zhiteng@ Quit (Quit: Leaving)
[17:58] * winston-d (~zhiteng@pgdmzpr01-ext.png.intel.com) has joined #ceph
[17:58] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[18:01] * rturk is now known as rturk-away
[18:01] * rturk-away is now known as rturk
[18:03] <ron-slc> I have a 3-mon setup, one of them was inadvertently upgraded to v0.61. I have stopped this mon, and wish to downgrade it to v0.56.6. If I remove all /var/lib/ceph/mon.b/ content, except the keyfile, will it properly repopulate?
[18:03] <ron-slc> this mon, also never joined the cluster. so there wasn't any cluster communications with it.
[18:06] <mikedawson> ron-slc: there is no going back iiuc. If you still have a 2 mon quorum with 0.56.6, you may be able to remove the upgraded monitor, then re-add it with the old software
[18:07] <ron-slc> kk. Sadly my osd's did join cluster... so I'm going to format osds, and rebuild them....
[18:07] <ron-slc> stupid "do-release-upgrade"!!!
[18:08] <ron-slc> mikedawson: thanks!
[18:14] * tziOm (~bjornar@ti0099a340-dhcp0870.bb.online.no) has joined #ceph
[18:16] * fabioFVZ (~fabiofvz@ Quit (Quit: see you )
[18:16] * dxd828 (~dxd828@ Quit (Ping timeout: 480 seconds)
[18:16] * fabioFVZ (~fabiofvz@ has joined #ceph
[18:17] * fabioFVZ (~fabiofvz@ Quit (Remote host closed the connection)
[18:22] * nibon7 (~nibon7@ Quit (Quit: 离开)
[18:26] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[18:27] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[18:32] * rturk is now known as rturk-away
[18:32] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:33] * rturk-away is now known as rturk
[18:34] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:39] * yehuda_hm (~yehuda@2602:306:330b:1410:5ce0:683c:121a:f749) Quit (Ping timeout: 480 seconds)
[18:40] * yehuda_hm (~yehuda@2602:306:330b:1410:5ce0:683c:121a:f749) has joined #ceph
[18:40] <dosaboy> hi, I am trying to debug an issue with rbd but do not have /sys/kernel/debug/ceph
[18:40] <dosaboy> do I need to install something to have this?
[18:41] <dosaboy> ah just found out how ;)
[18:41] <dosaboy> mount -t debugfs none /sys/kernel/debug
[18:42] <dosaboy> hmm that did not work
[18:43] <dosaboy> already mounted but does not add ceph entry
[18:43] <dmick> dosaboy: needs a kernel compiled with the right options
[18:45] <dosaboy> dmick: so debugfs is mounted successfully and there is a bunch of stuff in there
[18:46] <dosaboy> just no ceph entry
[18:46] <dmick> yes. needs a kernel....
[18:46] <dosaboy> ack
[18:46] <dmick> DYNAMIC_DEBUG or something like that
[18:47] <dosaboy> so, I have just set up a fresh cluster and it looks in good health
[18:48] <dosaboy> I created an rbd of 100GB and mapped, mounted, unmounted etc and when I delete it
[18:48] <dosaboy> it hangs
[18:48] <dosaboy> any clues as to where I could look to see what is going on?
[18:48] <dmick> "it hangs" means...
[18:48] <dosaboy> ok so, I was able to mount and write to it,
[18:49] <dosaboy> i unmounted and unmapped it
[18:49] <dosaboy> then I did rbd rm <vol>
[18:49] <dosaboy> and that command hangs
[18:49] <dosaboy> i.e. it never completes the rm
[18:49] <dosaboy> logs are not showing much atm
[18:49] <dosaboy> and there is hardly any disk activity
[18:50] <dmick> "the rbd rm command hangs" is what I was after. Hm. Did it show any progress messages before hanging (% completed type things)?
[18:50] <dosaboy> nope
[18:50] <dmick> does rbd ls still work and show it?
[18:51] <dmick> how about rbd info?
[18:51] <dosaboy> yes
[18:51] * dosaboy tries rbd info
[18:51] <dosaboy> yup that works too
[18:52] <dmick> well that's odd. Is this a normal (non-snapshot) image, and are you sure no one else has it open?
[18:53] <dosaboy> yep just me. not a snapshot either
[18:53] <dosaboy> so I just mapped the vol
[18:53] <dosaboy> which worked
[18:54] <dosaboy> and now I did rm again (on mapped vol) to see what happens
[18:54] <dosaboy> it hangs
[18:54] <dosaboy> should it complain if you try to do rm on a mapped rbd vol?
[18:54] <dmick> it should, I'm not sure if it does :)
[18:54] * leo (~leo@ has joined #ceph
[18:55] <dmick> I think it does, because I think a mapped volume has an outstanding watch. sounds like it's not even getting that far though
[18:55] <dosaboy> hm I think something is screwed cause it won't allow me to unmap the device now
[18:55] <dmick> does it even put up "Removing image"?
[18:56] <dosaboy> nope
[18:56] <dmick> so it must be hanging in the open
[18:56] * dosaboy tries to dig deeper
[18:57] <dmick> librbd debug might be helpful
[18:57] <dosaboy> thanks dmick!
[18:58] <dmick> which you enable with debug rbd = 20, conf or inject
[18:58] <dosaboy> i've got 'debug default = 20' atm
[18:59] <dosaboy> should cover all of them right
[18:59] <dosaboy> ok so just realised that I had created a volume much bigger than the size of my cluster
[18:59] <dmick> I don't think so
[18:59] <dosaboy> TB instead of GB
[19:00] <janos> dosaboy, you can certainly do that. but it will take a long time to remove ;)
[19:00] * Tamil (~tamil@ has joined #ceph
[19:01] <dmick> but you should still see activity when you attempt removal
[19:01] <dosaboy> janos: when I first deleted the image I heard the disks crunching
[19:01] * leo (~leo@ Quit (Quit: Leaving)
[19:01] <dosaboy> presumably deleting objects
[19:01] <dosaboy> but then they went quiet
[19:01] <dosaboy> for ages
[19:01] <janos> i had something bungled long ago, and it would error right after 99% rm
[19:01] <dosaboy> hmm
[19:01] <janos> each attempt took forever, so I would issue a resize on the rbd to shrink it
[19:02] <janos> THEN try to remove again
[19:02] <dosaboy> janos: trying...
[19:02] <janos> i don't know if it will actually help you rm, but once shrunken you can certainly attempt it with much faster iteration
[19:03] <dmick> it's conceivable that it wasn't actually hung, but again, would expect to see progress msgs
[19:03] <dmick> once you're past the objects that actually exist, there's a pile of "look up object, fail, look up object, fail" calls that basically just hit the net/daemons
[19:05] * nlopes (~nlopes@a95-92-0-12.cpe.netcabo.pt) has joined #ceph
[19:05] * BManojlovic (~steki@fo-d- has joined #ceph
[19:05] <dosaboy> dmick: that sounds about right
[19:06] <dmick> but the very first thing it does is open the image and then print "Removing image", so, I got nothing on why that didn't succeed
[19:07] * jtang1 (~jtang@ has joined #ceph
[19:08] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Remote host closed the connection)
[19:14] * bergerx_ (~bekir@ has joined #ceph
[19:14] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Read error: Connection reset by peer)
[19:14] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:15] * gaveen (~gaveen@ has joined #ceph
[19:19] * cjh_ (~cjh@ps123903.dreamhost.com) has joined #ceph
[19:20] <cjh_> congrats on cuttlefish !
[19:22] <dosaboy> does ceph have any support for client quotas?
[19:23] <dosaboy> web search seems to indicate no
[19:28] <cjh_> what kind of client? there are many
[19:31] <cjh_> looks like the ceph-deploy transition page is down
[19:32] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[19:32] <rturk> transition page?
[19:39] <joshd> fghaas: yes, it should fit in well with the cinder backup service (although that's still pretty basic, doesn't do any incrementals yet)
[19:42] <fghaas> joshd: true, but people will be increasingly interested in native rbd diff shipping (outside Cinder, that is)
[19:46] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[20:07] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[20:10] * glowell (~glowell@ has joined #ceph
[20:11] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[20:20] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:20] <cjh_> fghaas: how does the diff shipping work in rbd? is it like btrfs snap diffs?
[20:21] * BillK (~BillK@58-7-104-61.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[20:21] <fghaas> cjh_: as joshd explained earlier, shipping is up to the user at this point
[20:21] <cjh_> fghaas: sorry I just jumped in. missed that
[20:21] <fghaas> http://ceph.com/docs/master/man/8/rbd/ explains rbd export-diff and import-diff
[20:22] <cjh_> thanks :). i'll experiment with this
[20:23] <fghaas> so the idea, at this point, would be that the user exports the diff, moves it to a different cluster where there is an rbd device that is exactly identical to the snapshot the diff is against, and applies it there
[20:23] <cjh_> i wonder if you could use this to export diff's between remote clusters
[20:23] <fghaas> see above
[20:23] <cjh_> right
[20:23] * BManojlovic (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[20:23] <cjh_> that takes away netapp's advantage :D
[20:26] * paravoid_ (~paravoid@scrooge.tty.gr) has joined #ceph
[20:27] * paravoid is now known as Guest4609
[20:27] * paravoid_ is now known as paravoid
[20:27] * bergerx_ (~bekir@ Quit (Remote host closed the connection)
[20:27] <darkfader> cjh_: netapps advantage as per 1999-ish? let's revisit that once one of us backs up ceph via ndmp
[20:27] * Guest4609 (~paravoid@scrooge.tty.gr) Quit (Quit: Reconnecting)
[20:27] <darkfader> (which i would definitely love)
[20:28] * bergerx_ (~bekir@ has joined #ceph
[20:29] <cjh_> darkfader: lol true
[20:30] * BManojlovic (~steki@fo-d- has joined #ceph
[20:44] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:47] <fghaas> joshd: one other question about snapshot diff shipping. are there currently any safeguards in place for rbd import-diff? i.e. if I were to apply a snapshot diff on an rbd that is not in a state that matches the pre-diff state, would anything notice?
[20:49] <joshd> fghaas: it only checks that the source snapshot name exists in the image - nothing fancy or content-checking
[20:50] <cjh_> i'm not sure i understand how rbd-fuse and the mds fuse mount differ.
[20:50] <cjh_> they appear to be very similar
[20:51] <fghaas> joshd, ok so if I do rbd snap create on one image, named foo, and I have a completely different rbd that also happens to have a snapshot named foo, and I take a snap diff between foo and bar on one of them it will happily apply on the other?
[20:52] <joshd> fghaas: yes
[20:53] <fghaas> alright. but an automated snapshot shipper could work around that by using uuids for snapshot names. that could at least prevent _accidental_ rbd wreckage
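The UUID idea could look something like the sketch below: snapshot names are generated rather than chosen by hand, so a diff taken against one image is very unlikely to find a same-named snapshot on an unrelated image. The helper names, the "ship" prefix, and the pool/image names are hypothetical; the functions only build argument lists for rbd export-diff and rbd import-diff and do not run anything.

```python
import uuid


def new_snapshot_name(prefix="ship"):
    """Return a snapshot name that is effectively unique across images."""
    return f"{prefix}-{uuid.uuid4()}"


def export_diff_argv(pool, image, from_snap, to_snap, out_path):
    """argv for exporting the changes between from_snap and to_snap."""
    return [
        "rbd", "export-diff",
        "--from-snap", from_snap,
        f"{pool}/{image}@{to_snap}",
        out_path,
    ]


def import_diff_argv(pool, image, diff_path):
    """argv for applying a previously exported diff to the target image."""
    return ["rbd", "import-diff", diff_path, f"{pool}/{image}"]


base = new_snapshot_name()
latest = new_snapshot_name()
print(export_diff_argv("rbd", "vm-disk", base, latest, "/tmp/vm-disk.diff"))
print(import_diff_argv("backup", "vm-disk", "/tmp/vm-disk.diff"))
```

Since import-diff only checks that the source snapshot name exists on the target, a generated name guards against accidents, not against a target image whose contents have drifted.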
[20:55] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[20:57] <cjh_> fghaas: i imagine you could write some quick bash to get that done
[20:57] <joshd> fghaas: yeah, or we could add some basic check like md5sum of the first object or something
[20:58] * portante|afk is now known as portante
[20:59] <fghaas> cjh_: anything that someone thinks is less than 10 lines in shell, eventually ends up in a GitHub repo of its own
[20:59] <fghaas> and frequently that repo is then full of Haskell
[20:59] <cjh_> lol
[20:59] <cjh_> oh god
[21:00] <cjh_> python to the rescue :D
[21:01] <fghaas> joshd: if the diff is sufficiently small, wouldn't it even be conceivable to store a hash of the full image? could just as well be a fast, nonsecure one like crc32c if the user so desires
[21:01] <cjh_> i'm doing an upgrade on my testing cluster now to cuttlefish
[21:01] <fghaas> joshd: sorry, hash of the full diff of course (not the full image)
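A hash of the exported diff is something a shipping script could compute itself, outside of rbd. The sketch below streams a diff file once and returns both a SHA-256 digest and a CRC32. Assumptions: the function names are made up for illustration, and zlib's plain CRC32 stands in for crc32c, which the Python standard library does not provide. Note that this only detects a diff file damaged or swapped in transit; it says nothing about whether the target image matches the pre-diff state.

```python
import hashlib
import zlib


def diff_checksums(path, chunk_size=1 << 20):
    """Stream the file at path and return its SHA-256 and CRC32."""
    sha = hashlib.sha256()
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return {"sha256": sha.hexdigest(), "crc32": crc & 0xFFFFFFFF}


def diff_matches(path, expected):
    """Compare a received diff file against checksums recorded at export."""
    return diff_checksums(path) == expected
```

The sender would record the result of diff_checksums() next to the diff, and the receiver would call diff_matches() before running import-diff.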
[21:03] <drokita1> Congratulations on the Cuttlefish release. Does this mean that PG splitting is prime time?
[21:04] * winston-d (~zhiteng@pgdmzpr01-ext.png.intel.com) Quit (Quit: Leaving)
[21:05] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:05] * loicd (~loic@magenta.dachary.org) Quit ()
[21:20] * Cube (~Cube@ has joined #ceph
[21:24] <joshd> fghaas: you mean reading the old version of data where the diff exists, and checksumming that? that's certainly possible too. much better than the full image, like you said
[21:24] <benner> in http://ceph.com/docs/master/install/upgrading-ceph/#upgrading-from-bobtail-to-cuttlefish
[21:25] <benner> shouldn't the line sudo rm /etc/apt/sources.sources.list.d/ceph.list be sudo rm /etc/apt/sources.list.d/ceph.list ?
[21:25] * Tamil (~tamil@ Quit (Quit: Leaving.)
[21:25] <joshd> benner: yeah, good point
[21:26] * diegows (~diegows@ has joined #ceph
[21:27] * psomas_ (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[21:28] * paravoid (~paravoid@scrooge.tty.gr) Quit (Remote host closed the connection)
[21:28] * eschnou (~eschnou@54.120-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:29] * The_Bishop (~bishop@ has joined #ceph
[21:29] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[21:30] <kyle_> hello all. i'm having some trouble with my mds and its standby crashing instantly...
[21:31] <kyle_> i see something like this in its log: 0> 2013-05-07 12:29:10.047193 7fb61bc38700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)' thread 7fb61bc38700 time 2013-05-07 12:29:10.046543
[21:31] <kyle_> mds/journal.cc: 741: FAILED assert(i == used_preallocated_ino)
[21:31] * Cube1 (~Cube@ has joined #ceph
[21:31] * Cube (~Cube@ Quit (Read error: Connection reset by peer)
[21:32] <kyle_> if anyone has a moment to help me troubleshoot, that would be awesome. Thanks!
[21:33] * paravoid (~paravoid@scrooge.tty.gr) has joined #ceph
[21:34] * Tamil (~tamil@ has joined #ceph
[21:36] * paravoid (~paravoid@scrooge.tty.gr) Quit ()
[21:37] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[21:38] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[21:38] * psomas_ (~psomas@inferno.cc.ece.ntua.gr) Quit (Remote host closed the connection)
[21:38] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[21:39] * paravoid (~paravoid@scrooge.tty.gr) has joined #ceph
[21:44] * The_Bishop (~bishop@ Quit (Ping timeout: 480 seconds)
[21:49] * saaby (~Adium@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[21:49] * loicd (~loic@2a01:e35:2eba:db10:dd35:eadd:f15f:fa75) has joined #ceph
[21:55] * The_Bishop (~bishop@2001:470:50b6:0:40e7:824f:8945:bf7f) has joined #ceph
[21:59] <lx0> kyle_, this sounds familiar. is this the first time you start the mds after an upgrade to >= 0.60 (maybe 0.59)?
[22:00] * lx0 is now known as lxo
[22:00] <kyle_> 1xo: it started happening before the upgrade but continues after
[22:00] <kyle_> i just upgraded today
[22:00] <kyle_> i was using bobtail
[22:01] <lxo> ok, I think I know what it is. let me confirm and locate the patch that fixes it. if it's what I'm thinking, it's a one-time error
[22:01] <kyle_> oh okay. thank you so much
[22:02] <lxo> if you still have the old mds, it should still work while I find the patch and you build the new mds with it
[22:02] <kyle_> i'm not in production yet. so i have some flexibility
[22:04] * nyerup (irc@jespernyerup.dk) has joined #ceph
[22:04] <lxo> d777b8e6 is the patch id for the problem I'm thinking of
[22:04] <lxo> git commit id, I mean
[22:06] <lxo> yeah, confirmed, that one should get your mds back up
[22:06] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:07] <kyle_> okay cool. i'll give it a try. thanks again
[22:07] * saaby (~Adium@1009ds5-oebr.1.fullrate.dk) has left #ceph
[22:07] * as (~as@mail.saaby.com) has joined #ceph
[22:10] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:10] <lxo> np
[22:11] * as (~as@mail.saaby.com) has left #ceph
[22:16] <kyle_> 1xo: i'm guessing to apply that patch there would be no way to use the package anymore for those servers right?
[22:20] * tnt (~tnt@ has joined #ceph
[22:20] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[22:20] * saaby (~as@mail.saaby.com) has joined #ceph
[22:21] <benner> found another bug in documentation ( http://ceph.com/docs/master/install/upgrading-ceph/#upgrading-from-bobtail-to-cuttlefish ): broken link in ceph-deploy section "Transitioning to ceph-deploy" link
[22:23] * saaby (~as@mail.saaby.com) has left #ceph
[22:24] * saaby (~as@mail.saaby.com) has joined #ceph
[22:28] * fidadud (~oftc-webi@p4FC2C668.dip0.t-ipconnect.de) has joined #ceph
[22:31] <fidadud> i read today that there are still people having problems with the mon leveldb compaction in small I/O.
[22:32] <mikedawson> fidadud: yes
[22:32] <fidadud> Does anybody already know if we'll see an additional fix to compact leveldb for 0.61.1?
[22:33] <mikedawson> fidadud: the issue hasn't been solved yet
[22:33] <fidadud> is it clear that it is a ceph and not a leveldb problem?
[22:34] <mikedawson> fidadud: not yet
[22:34] <jmlowe> did anybody ever answer drokita, is pg splitting out of experimental status?
[22:34] * prudhvi (~prudhvi@tau.supr.io) has left #ceph
[22:35] <fidadud> jmlowe: yes for cuttlefish
[22:35] <lxo> kyle_, if you build just the mds with the patch and let it run for a while, once it flushes the journal entries with the old encoding that it misdecodes the packaged mds will work fine
[22:35] <jmlowe> fidadud: has it made its way to the docs yet?
[22:36] <kyle_> 1xo: should i be cherry picking that patch after i checkout v0.61?
[22:36] <fidadud> jmlowe: sadly, I personally need pg merging instead of splitting...
[22:37] <dmick> fidadud: are you seeing performance issues that you believe are because of too many pgs?
[22:37] <dmick> I don't think I've heard of that yet
[22:37] <lxo> kyle_, hmm, I'd expect 0.61 to already have that patch, but I didn't check myself. maybe it missed the cut
[22:38] <fidadud> jmlowe: no idea if it is in the docs
[22:40] <fidadud> dmick: i don't have any problems but i know my pgs are far too high. 4096 pgs on 24 osds with replication 3
[22:41] <kyle_> 1xo: initially i tired purging and removing the ceph pakage, then reinstalled, hoping that would fix but they still crash as soon as i start them.
[22:41] <kyle_> trie*
[22:41] <dmick> fidadud: I wouldn't worry about it. Too many probably won't cause problems
[22:41] <dmick> just makes pg dump longer :)
[22:42] <dmick> jmlowe: I know the "--enable-experimental-feature" flag was removed. I filed the bug to get it removed :)
[22:42] <fidadud> dmick: wasn't sure about it - i've read a lot that too many pgs causes slow downs. And the doc says 100*24 / 3 => 800 pgs for my constellation
[22:43] <jmlowe> dmick: so what's the rest of the command?
[22:43] <fidadud> dmick: what's pg dump?
[22:43] <dmick> fidadud: yeah, but those are fuzzy rules of thumb. I wouldn't sweat it.
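The arithmetic fidadud quotes (100 pgs per OSD, divided by the replica count) is easy to write down. The sketch below uses the numbers from this conversation; the function names are hypothetical, and rounding up to a power of two is a common convention added here as an assumption, not something stated in the log.

```python
def rule_of_thumb_pgs(num_osds, replicas, pgs_per_osd=100):
    """Rule-of-thumb total pg count: (OSDs * pgs per OSD) / replicas."""
    return (num_osds * pgs_per_osd) // replicas


def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    power = 1
    while power < n:
        power *= 2
    return power


suggested = rule_of_thumb_pgs(num_osds=24, replicas=3)
print(suggested)                     # 800
print(next_power_of_two(suggested))  # 1024
print(4096 / suggested)              # 5.12, i.e. about 5x the rule of thumb
```

As dmick says, the result is a fuzzy guideline rather than a limit, so being a few times over it is not by itself a reason to rebuild a pool.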
[22:43] <drokita1> nice work, pg splitting will save me a lot of work :)
[22:43] <dmick> jmlowe: it's just 'changing the pool pg_num setting' (that is, making it greater)
[22:43] <dmick> ceph osd pool set <pool> pg_num <num>, IIRC
[22:44] <dmick> yeah
[22:44] <fidadud> dmick: yeah but i need some rules i could use ;-) no rules isn't an alternative
[22:44] <dmick> fidadud: sure. *all I'm saying* is that, although yours is a bit high, I doubt it will cause problems, and you don't need to merge pgs
[22:44] <fidadud> dmick: OK thanks ;-)
[22:44] <dmick> fidadud: ceph osd pg dump, for instance
[22:45] <dmick> sorr
[22:45] <dmick> ceph pg dump
[22:45] <fidadud> dmick: yes but what's the use case?
[22:45] <dmick> why would you want to dump pgs?
[22:45] <dmick> it's just another cluster examination command
[22:45] <dmick> nothing magic
[22:45] <dmick> the point was there's a line per pg, so having too many means you have too many lines. just joking.
[22:46] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:46] <jmlowe> dmick: any danger, what happens next?
[22:47] <fidadud> dmick: ;-) i know what ceph pg dump does but i don't know a use case but maybe i'm just using rbd and not rados
[22:47] <dmick> you have more pgs
[22:47] <dmick> fidadud: useful for analyzing cluster health problems
[22:48] <fidadud> dmick: ok i didn't have one except dying OSDs on v0.50..
[22:48] <jmlowe> dmick: no balancing of objects or downtime?
[22:48] <fidadud> jmlowe: sure status pg splitting
[22:48] <fidadud> jmlowe: but no downtime
[22:48] <dmick> jmlowe: I don't remember if it causes object motion.
[22:48] <dmick> sjust: ?
[22:49] <fidadud> dmick: jmlowe: not sure about object motion BUT at least creating new objects and new placement of them
[22:49] <dmick> fidadud: definitely
[22:50] <lxo> kyle_, reinstalling the same faulty binary won't help. forcing the mds to clean its journal would fix the problem, but I couldn't get that ceph-mds option to work since 0.3*, if not older releases
[22:50] <sjust> jmlowe: there are two numbers
[22:51] <sjust> jmlowe: increasing pg_num causes split
[22:51] <sjust> jmlowe: increasing pgp_num causes motion
[22:51] <sjust> so if you have pg_num = pgp_num = 4
[22:51] <sjust> and you increase pg_num to 8
[22:51] <sjust> each pg splits in half
[22:51] <sjust> but both halves stay put
[22:51] <sjust> they don't move until you increase pgp_num to 8 as well
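sjust's pg_num / pgp_num distinction can be illustrated with a toy model. This is not Ceph's actual mapping (which uses a stable-mod function and CRUSH); it assumes power-of-two counts and plain modulo, and the function names are made up. It shows that going from pg_num 4 to 8 splits each pg into itself and one child, and that the child keeps its parent's placement seed until pgp_num is raised too.

```python
def pg_of(object_hash, pg_num):
    """Toy model: an object lands in pg (hash mod pg_num)."""
    return object_hash % pg_num


def placement_seed(pg, pgp_num):
    """Toy model: placement is derived from (pg mod pgp_num)."""
    return pg % pgp_num


# Start with pg_num = pgp_num = 4, then raise pg_num to 8.
for object_hash in range(16):
    old_pg = pg_of(object_hash, 4)
    new_pg = pg_of(object_hash, 8)
    # Each old pg splits into itself and (itself + 4).
    assert new_pg in (old_pg, old_pg + 4)
    # With pgp_num still 4, both halves share the parent's placement seed.
    assert placement_seed(new_pg, 4) == placement_seed(old_pg, 4)

# Only after pgp_num is raised to 8 does child pg 5 get its own seed.
print(placement_seed(5, 4), placement_seed(5, 8))  # 1 5
```

In the toy model, data only has a reason to move once the seed changes, which matches the "both halves stay put" behaviour described above.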
[22:51] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[22:51] <lxo> sjust, so that's how cephalopods reproduce, eh? :-D
[22:54] <sjust> lxo: heh
[22:55] <kyle_> 1xo: gotcha. Yeah i'm not entirely sure what to do yet then. I would like to be able to continue to use the packages. But if i cannot get the mds to work with them i'm up for building from the clone. Just not sure how to apply the patch. I tried cherry picking but got errors from git.
[22:55] <lxo> what does the second p in pgp_num stand for, BTW?
[22:55] <sjust> hmm
[22:55] <sjust> probably placement
[22:56] <lxo> placement group placement? that requires deduplication :-)
[22:56] <sjust> it's certainly overloaded
[22:56] * eschnou (~eschnou@54.120-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:57] <lxo> this calls for intervention by the association for deduplication association
[22:58] <lxo> kyle_, so you're running 0.61, eh?
[22:58] <kyle_> yeah i just upgraded today
[22:58] <lxo> 0.61 seems to already have this patch
[22:59] * tziOm (~bjornar@ti0099a340-dhcp0870.bb.online.no) Quit (Quit: Leaving)
[22:59] <jmlowe> do all the clients need to be running cuttlefish, will there be any problem changing pg_num and pgp_num with old clients being active?
[22:59] <jmlowe> old == 0.56.4
[22:59] <lxo> which explains why it didn't apply
[22:59] <lxo> but doesn't explain why you get an error with a very similar symptom
[23:00] <lxo> kyle_, what were you running before?
[23:01] * John (~john@astound-64-85-225-33.ca.astound.net) has left #ceph
[23:02] * markbby (~Adium@ Quit (Quit: Leaving.)
[23:02] * saras (~kvirc@74-61-8-52.war.clearwire-wmx.net) has joined #ceph
[23:03] <saras> that was really kool
[23:04] <lxo> rats, I thought deep scrub would pick up from the point it was at the last snap rather than starting over, after an osd restart :-(
[23:04] * aliguori (~anthony@ Quit (Remote host closed the connection)
[23:07] * scuttlemonkey changes topic to 'Latest stable -- http://ceph.com/get || v0.61 "Cuttlefish" available -- http://goo.gl/A1y0b || http://wiki.ceph.com Live! || "Geek on Duty" program -- http://goo.gl/f02Dt'
[23:14] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[23:15] * BManojlovic (~steki@fo-d- Quit (Remote host closed the connection)
[23:16] * fidadud (~oftc-webi@p4FC2C668.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[23:18] * BManojlovic (~steki@fo-d- has joined #ceph
[23:18] * saaby (~as@mail.saaby.com) Quit (Quit: leaving)
[23:18] * saaby (~as@mail.saaby.com) has joined #ceph
[23:19] <cjh_> rbd-fuse doesn't seem to do anything on centos6
[23:19] <cjh_> i say rbd-fuse -p bench bench/ and it seems to do something but nothing is in the folder. i know there's about a terabyte of data in that pool
[23:20] * BManojlovic (~steki@fo-d- Quit ()
[23:22] <kyle_> 1xo: bobtail
[23:23] <kyle_> 1xo: hmm i guess i have some other issue going on, if the patch is already in 0.61.
[23:23] * LeaChim (~LeaChim@ has joined #ceph
[23:25] * Vjarjadian (~IceChat77@ has joined #ceph
[23:35] * Cube (~Cube@ has joined #ceph
[23:35] * Cube1 (~Cube@ Quit (Write error: connection closed)
[23:39] <dmick> cjh_: try with -d
[23:40] <cjh_> dmick: hidden switch ? :D
[23:40] <dmick> fuse-standard switch
[23:42] <cjh_> when i do touch test in that dir it makes a 1GB file
[23:43] <dmick> uint64_t imagesize = 1024ULL * 1024 * 1024;
[23:43] <dmick> creation attributes are in xattrs
[23:44] <dmick> see commit msg on 2a6dcabf7f1b7550a0fa4fd223970ffc24ad7870
[23:52] * danieagle (~Daniel@ has joined #ceph
[23:53] <lxo> kyle_, 'fraid so. odds are it's a similar decoding issue. I'll know if/when I hit it myself, but it's unlikely I'll be able to upgrade to 0.61 before Friday :-(
[23:55] <dmick> cjh_: so it's running now?
[23:57] * rustam (~rustam@ has joined #ceph
[23:59] * portante is now known as portante|afk

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.