#ceph IRC Log


IRC Log for 2012-07-26

Timestamps are in GMT/BST.

[0:10] * cephalobot (~ceph@ps94005.dreamhost.com) Quit (Remote host closed the connection)
[0:10] * rturk (~rturk@ps94005.dreamhost.com) Quit (Remote host closed the connection)
[0:11] * rturk (~rturk@ps94005.dreamhost.com) has joined #ceph
[0:11] * cephalobot (~ceph@ps94005.dreamhost.com) has joined #ceph
[0:13] <elder> I just spent three hours trying to figure out the source of a stupid crash. Turned out a freed pointer was not being reset to NULL, and was subsequently reused.
[0:13] <elder> Nothing wrong with my changes. Just a latent problem that my changes tickled.
[0:17] <nhm> elder: at least you figured it out!
[0:19] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:21] <elder> That's true. But I kept looking and looking at my patches and it just didn't add up. I almost came upon the fix by mistake.
[0:27] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[0:27] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) Quit (Quit: LarsFronius)
[0:49] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:52] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:02] * eternaleye_ (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[1:02] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (Read error: Connection reset by peer)
[1:08] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:14] * Tv_ (~tv@2607:f298:a:607:9cb9:bfb8:f8d0:361c) Quit (Quit: Tv_)
[1:34] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:37] * jluis (~JL@ has joined #ceph
[1:40] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:43] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[1:45] * jluis (~JL@ Quit (Remote host closed the connection)
[1:46] * joao (~JL@ has joined #ceph
[1:48] * ryann (~chatzilla@ has left #ceph
[1:53] * jluis (~JL@89-181-146-184.net.novis.pt) has joined #ceph
[1:59] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[2:39] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[2:51] * jluis is now known as joao
[3:02] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:02] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:19] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[3:42] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[3:46] * chutzpah (~chutz@ Quit (Remote host closed the connection)
[4:00] * andrewbogott (~andrewbog@c-75-72-240-208.hsd1.mn.comcast.net) has joined #ceph
[4:17] * andrewbogott (~andrewbog@c-75-72-240-208.hsd1.mn.comcast.net) Quit (Quit: andrewbogott)
[4:36] * eternaleye_ is now known as eternaleye
[4:46] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:02] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:02] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:26] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[5:27] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[5:39] * deepsa (~deepsa@ has joined #ceph
[6:00] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[6:00] * loicd (~loic@magenta.dachary.org) has joined #ceph
[6:10] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[6:14] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[6:14] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[6:32] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[7:06] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[7:06] * loicd (~loic@magenta.dachary.org) has joined #ceph
[7:07] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[7:09] * gregaf1 (~Adium@2607:f298:a:607:a9c4:6ffb:55e6:752d) has joined #ceph
[7:10] * gregaf (~Adium@2607:f298:a:607:ed43:f4ec:1f6c:43d8) Quit (Ping timeout: 480 seconds)
[7:11] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) Quit (Ping timeout: 480 seconds)
[7:11] * sjust (~sam@ Quit (Ping timeout: 480 seconds)
[7:11] * yehudasa (~yehudasa@2607:f298:a:607:1815:c873:3856:4a95) Quit (Ping timeout: 480 seconds)
[7:11] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) Quit (Ping timeout: 480 seconds)
[7:18] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[7:24] * sjust (~sam@ has joined #ceph
[7:25] * yehudasa (~yehudasa@ has joined #ceph
[7:25] * gregaf1 (~Adium@2607:f298:a:607:a9c4:6ffb:55e6:752d) Quit (Ping timeout: 480 seconds)
[7:26] * gregaf (~Adium@ has joined #ceph
[7:27] * mkampe (~markk@ has joined #ceph
[7:30] * sagewk (~sage@ has joined #ceph
[7:48] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[7:51] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:13] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[8:13] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[8:16] * gregaf1 (~Adium@ has joined #ceph
[8:18] * gregaf (~Adium@ Quit (Ping timeout: 480 seconds)
[8:19] * mkampe (~markk@ Quit (Ping timeout: 480 seconds)
[8:19] * sjust (~sam@ Quit (Ping timeout: 480 seconds)
[8:19] * sagewk (~sage@ Quit (Ping timeout: 480 seconds)
[8:19] * yehudasa (~yehudasa@ Quit (Ping timeout: 480 seconds)
[8:29] * yehudasa (~yehudasa@ has joined #ceph
[8:29] * sjust (~sam@ has joined #ceph
[8:29] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) has joined #ceph
[8:32] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:33] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:33] * loicd (~loic@magenta.dachary.org) Quit ()
[8:50] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:57] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:03] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[9:07] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:08] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:13] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[9:19] * BManojlovic (~steki@ has joined #ceph
[9:31] * Leseb (~Leseb@ has joined #ceph
[9:32] * Leseb_ (~Leseb@ has joined #ceph
[9:32] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:33] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:36] * markl (~mark@tpsit.com) Quit (Remote host closed the connection)
[9:36] * markl (~mark@tpsit.com) has joined #ceph
[9:39] * Leseb (~Leseb@ Quit (Ping timeout: 480 seconds)
[9:39] * Leseb_ is now known as Leseb
[9:50] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) has joined #ceph
[10:19] <tnt> Is there an explanation somewhere of how to create a cluster "manually" ? (machines can't ssh to one another for eg, and certainly can't ssh to user root anyway)
[10:20] <fghaas> take a look at what mkcephfs does and replicate that manually
[10:21] <fghaas> or use chef and the ceph recipes ... I hear there's people working on puppet modules as well. not that that's manual, but it removes the need to ssh into boxes
[10:22] <fghaas> also, do consider "PermitRootLogin without-password", deploying your ssh key in /root/.ssh/authorized_keys, and use "ssh -A" to shell into the box where you're running mkcephfs
[10:28] <gadago> morning
[10:29] <tnt> fghaas: thanks. I'll do it manually as a test and to understand how it works and then integrate it in our own management scripts.
[10:29] <gadago> I seem to have managed to set up a small ceph cluster with a rados gateway and managed to place a file into a bucket and list it back out again
[10:29] <gadago> my question is, where does the data go? i.e. where would I find it on the file system?
[10:30] * Leseb_ (~Leseb@ has joined #ceph
[10:30] * Leseb (~Leseb@ Quit (Read error: Connection reset by peer)
[10:30] * Leseb_ is now known as Leseb
[10:30] <tnt> gadago: you're not supposed to :)
[10:30] <gadago> tnt: oh okay
[10:30] <tnt> gadago: but it will be on the osd nodes ... but not necessarily as a single file ...
[10:31] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:31] <gadago> tnt: what am I supposed to be able to do?
[10:31] <tnt> access it via rgw ... that's it.
[10:31] <gadago> I managed to get it to list out the url to connect to
[10:31] <gadago> and it is there
[10:31] <gadago> is that it?
[10:33] <tnt> well you can do whatever you could do on a S3 gateway ...
[10:35] <fghaas> gadago: you can also fetch the object via rados directly, do "rados lspools", and then examine the .rgw* pools with "rados -p <pool> ls"
[10:44] <gadago> fghaas: thanks
[10:46] <fghaas> gadago: you can also find out which osd copies of an object are located on, and then examine the data on the osd, but as tnt said, you're really not supposed to do that. interact with rados via rados client tools
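
Editor's sketch of the "rados lspools" / "rados -p <pool> ls" suggestion above. It only assumes that radosgw pool names start with ".rgw", as fghaas's ".rgw*" implies. The helper name filter_rgw_pools and the sample pool names are hypothetical, and the demo runs on canned input so no cluster is needed.

```shell
#!/bin/sh
# Filter a list of pool names (one per line, as printed by "rados lspools")
# down to the radosgw pools, whose names start with ".rgw".
filter_rgw_pools() {
    grep '^\.rgw' || true   # "|| true": finding no match is not an error here
}

# Demo on canned input so the sketch runs without a cluster.
# The pool names below are illustrative, not taken from the log.
printf '%s\n' data metadata rbd .rgw .rgw.buckets | filter_rgw_pools

# Against a live cluster (needs a readable client key), one might run:
#   rados lspools | filter_rgw_pools | while read -r pool; do
#       echo "== $pool"; rados -p "$pool" ls
#   done
```

As both tnt and fghaas say, this goes through the rados client tools rather than poking at files on the OSDs.
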
[10:48] <tnt> Mmm, it's a bit inconvenient that mkcephfs tries to rm the mon directory during setup. If it happens to be a dedicated mounted directory, it fails of course.
[10:49] <fghaas> well there's really no reason to have a separate mon directory, at least I've never found one. I use one device (and hence, filesystem) per osd, but not per mon
[10:50] <fghaas> s/separate mon directory/separate mon filesystem/ of course
[10:50] <joao> I believe Sage pushed a patch recently to avoid removal of the mon directories
[10:50] <tnt> Well it was just because the osd were on separate disks and the root was ext4 and I wanted xfs so I created a dedicated partition for it ...
[10:51] <fghaas> for a mon, ext4 vs btrfs vs xfs should really not have any significant impact, to the best of my knowledge
[10:53] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:56] * loicd (~loic@ has joined #ceph
[10:59] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[10:59] * BManojlovic (~steki@ has joined #ceph
[11:10] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:11] <tnt> Mmm, the mkcephfs script doesn't seem to copy the keyring at the right place. I had to copy them from /tmp/foo to the appropriate /var/lib/ceph/osd/ceph-X/keyring location.
[11:12] <fghaas> yeah, I ran into that too
[11:13] <fghaas> you can also do "ceph auth get-or-create osd.0 > /var/lib/ceph/osd/ceph-0/keyring"
[11:14] <fghaas> the change that introduced new default keyring locations in 0.48 was botched a bit usability-wise, imnsho, and not fixing up mkcephfs is one example
[11:16] <tnt> Ok, at least it's a known issue :)
[11:20] <fghaas> actually, I think I still need to file a bug for this
[11:34] <fghaas> tnt: http://tracker.newdream.net/issues/2845
[11:38] <Azrael> hey fghaas !
[11:38] <Azrael> were there videos recorded of your OSCON Ceph presentation? and the openstack HA presentation?
[11:40] <tnt> fghaas: great thanks.
[11:42] <fghaas> Azrael: nope, OSCON only records keynotes and interviews.
[11:42] <fghaas> I'll do the openstack HA talk again at cloudopen; not sure if there will be recordings there
[11:42] <Azrael> ahh ok
[11:43] <Azrael> somebody was asking in #openstack about high-availability with openstack
[11:43] <Azrael> i pointed them to your slides from oscon
[11:43] <Azrael> but the slides don't say much without your talking heh
[11:43] * Azrael is Josh btw
[11:45] <fghaas> well, yeah my slides aren't exactly complete without my babbling :)
[11:45] <Azrael> heh heh
[11:45] <Azrael> it was good babbling though
[11:47] <fghaas> jenkins builds of the openstack ha guide should be popping up on http://docs.openstack.org any day now, watch my blog for updates on that
[11:47] <Azrael> nice
[11:47] <Azrael> i'm looking forward to folsom
[11:47] <fghaas> apologies everyone, we're going OT here. :) back to ceph
[11:47] <Azrael> boot from volume, quantum, and then some
[11:47] <Azrael> haha
[11:49] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[11:50] <tnt> Should bug in the doc also be reported on the redmine ?
[11:51] <fghaas> tnt: what bug in the doc?
[11:52] <fghaas> http://ceph.com/docs/master/config-cluster/authentication/ looks quite good to me, really
[11:52] <fghaas> see section "daemon keyrings"
[11:52] <tnt> No I moved on :) In http://ceph.com/docs/master/radosgw/config/ sometimes it uses client.radosgw.gateway and sometimes client.rados.gateway
[11:53] <tnt> and it seems you need to choose one or the other but be consistent
[11:53] <fghaas> um, yeah. john and I kind of mis-coordinated on that one, he and I wrote essentially the same content in parallel, without coordinating
[11:53] <fghaas> and we still need to consolidate
[11:54] <fghaas> does http://ceph.com/docs/master/ops/radosgw/ work better for you? you still get to pick & choose :)
[11:56] <tnt> yes that seems more consistent.
[11:57] <tnt> Ah and you really need client.radosgw.X and not client.rados.X because the startup script relies on that prefix
[11:58] <fghaas> yes it does
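
Editor's sketch of a quick consistency check for the naming issue above: it lists section headers in a ceph.conf-style file that use "client.rados." where the startup script expects "client.radosgw.". The function name, the sample section names, and the host values are hypothetical; the demo works on a throwaway file.

```shell
#!/bin/sh
# Print (with line numbers) any section headers that use the
# "client.rados." prefix instead of "client.radosgw.".
find_misnamed_rgw_sections() {
    grep -n '^\[client\.rados\.' "$1" || true
}

# Demo on a throwaway file; the section names are made up for illustration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[client.radosgw.gateway]
    host = gw1
[client.rados.gateway]
    host = gw2
EOF
find_misnamed_rgw_sections "$conf"
rm -f "$conf"
```

The pattern requires a dot right after "rados", so correctly named "client.radosgw.*" sections are not reported.
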
[12:03] <tnt> Ok, all works fine now.
[12:06] <fghaas> tnt: which client are you testing with?
[12:07] <tnt> the 's3' executable
[12:07] <tnt> btw, I use lighttpd and I don't need a fastcgi wrapper since the daemon is managed externally ...
[12:09] <fghaas> tnt: is that from libs3?
[12:10] <tnt> Yes IIRC.
[12:11] <tnt> but a forked version that accepts different hostname https://github.com/wido/libs3
[12:25] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[12:25] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[12:30] <tnt> Is mounting a RBD on a machine that has an OSD supposed to kernel panic ?
[12:33] <fghaas> I'd say kernel panics are never supposed to happen
[12:33] <fghaas> however are you sure it's a panic and not just a stack trace?
[12:33] <tnt> http://pastebin.com/RgnapSvk
[12:33] <tnt> Well the machine is dead
[12:34] <tnt> and it's 100% reproducible on that setup ... (I rebooted it and retried a rbd map and bam ... dead again)
[12:34] <fghaas> is this right after "rbd map" or after actually opening your /dev/rbdX?
[12:35] <tnt> right after rbd map. I don't even see it returning, the ssh connection dies before that.
[12:35] <fghaas> 0.48, ubuntu 3.2.0 kernel? does this also occur when you're mapping from a different box?
[12:37] <tnt> It's 0.48 and a up to date ubuntu 12.04 ( so yes, 3.2.0-24-generic kernel ). I'll try from another box.
[12:41] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[12:42] <tnt> fghaas: seems to work fine from another box
[12:42] <fghaas> ok, that's what I thought. I'd be surprised if it didn't, it's been working fine for me that way
[12:42] <fghaas> never tried mounting from an osd though
[12:43] <fghaas> now it does say somewhere that you shouldn't be mounting cephfs from ceph cluster nodes, but afair nothing about rbd
[12:43] <fghaas> but the problem seems to be in libceph, which both use
[12:44] <tnt> strangely I used the manual "echo ..." method and ... it worked fine.
[12:44] <fghaas> you mean via the sysfs file?
[12:45] <tnt> Yup ...
[12:45] <fghaas> what's your exact "rbd map" command line?
[12:45] <tnt> rbd map es-data-0 --pool rbd --name client.admin --secret /etc/ceph/keyring
[12:46] <tnt> and for the echo I do:
[12:46] <tnt> echo " name=admin,secret=AQDgARFQ8MzbLxAA5bouqkjb26sZk5eGqfkKcQ== rbd es-data-0" > /sys/bus/rbd/add
[12:47] <fghaas> um, wait. it's not supposed to doesn't work with --secret pointing to the standard keyring file
[12:47] <fghaas> s/doesn't//
[12:47] <fghaas> do echo AQDgARFQ8MzbLxAA5bouqkjb26sZk5eGqfkKcQ== > /tmp/secret, and then rbd map es-data-0 --pool rbd --name client.admin --secret /tmp/secret
[12:48] <tnt> Ah yes indeed, this way it works !
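
Editor's sketch of the workaround above: "--secret" wants a file holding only the bare key, not a full keyring. This assumes the usual keyring layout of an "[entity]" header followed by a "key = ..." line. The function name extract_key and the placeholder key are hypothetical; the demo uses a temporary keyring rather than /etc/ceph/keyring.

```shell
#!/bin/sh
# Print the bare key for one entity from a keyring-style file, i.e. the
# value of the "key = ..." line inside the "[entity]" section.
extract_key() {
    keyring=$1
    entity=$2
    awk -v section="[$entity]" '
        $1 == section                          { in_section = 1; next }
        /^[[:space:]]*\[/                      { in_section = 0 }
        in_section && $1 == "key" && $2 == "=" { print $3; exit }
    ' "$keyring"
}

# Demo with a throwaway keyring; the key below is a placeholder, not a real one.
demo_dir=$(mktemp -d)
cat > "$demo_dir/keyring" <<'EOF'
[client.admin]
    key = AQBplaceholderplaceholderplaceholderAAAA==
EOF
# Write the secret file with owner-only permissions.
( umask 077; extract_key "$demo_dir/keyring" client.admin > "$demo_dir/secret" )
cat "$demo_dir/secret"
rm -rf "$demo_dir"
```

The resulting file could then be passed to the command from the log, e.g. "rbd map es-data-0 --pool rbd --name client.admin --secret <secret file>".
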
[12:48] <fghaas> still a silly bug for this to cause a null pointer deref, but yeah -- yet another case of unsanitized user input
[12:49] * tnt goes to read the source ...
[12:49] <fghaas> tnt, do so at your own risk :)
[12:50] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:53] <fghaas> tnt: http://tracker.newdream.net/issues/2846
[12:57] <tnt> fghaas: thanks. I would have reported it myself but I usually wait until I found the real cause and have a patch :p
[12:57] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Read error: Connection reset by peer)
[12:58] <fghaas> feel free to add the patch to that bug
[12:58] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[13:00] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[13:01] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[13:09] * gregorg_taf (~Greg@ has joined #ceph
[13:09] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[13:10] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[13:12] * gregaf (~Adium@2607:f298:a:607:a9c4:6ffb:55e6:752d) has joined #ceph
[13:14] * ajm (~ajm@adam.gs) has joined #ceph
[13:15] * iggy2 (~iggy@theiggy.com) has joined #ceph
[13:15] * steki-BLAH (~steki@ has joined #ceph
[13:15] * cephalobot` (~ceph@ps94005.dreamhost.com) has joined #ceph
[13:15] * asadpand- (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[13:17] * __jt___ (~james@jamestaylor.org) has joined #ceph
[13:17] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * BManojlovic (~steki@ Quit (resistance.oftc.net charm.oftc.net)
[13:17] * gregaf1 (~Adium@ Quit (resistance.oftc.net charm.oftc.net)
[13:17] * cephalobot (~ceph@ps94005.dreamhost.com) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * rturk (~rturk@ps94005.dreamhost.com) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * jeffp (~jplaisanc@net66-219-41-161.static-customer.corenap.com) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * iggy (~iggy@theiggy.com) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * __jt__ (~james@jamestaylor.org) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * ajm- (~ajm@adam.gs) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * ivan` (~ivan`@li125-242.members.linode.com) Quit (resistance.oftc.net charm.oftc.net)
[13:17] * asadpand- is now known as asadpanda
[13:20] * ivan` (~ivan`@li125-242.members.linode.com) has joined #ceph
[13:20] * rturk (~rturk@ps94005.dreamhost.com) has joined #ceph
[13:23] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:23] * MarkN (~nathan@ has joined #ceph
[13:24] * jeffp (~jplaisanc@net66-219-41-161.static-customer.corenap.com) has joined #ceph
[13:35] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[13:35] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[13:39] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[13:44] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[13:47] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:52] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[13:52] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[13:52] * LarsFronius_ is now known as LarsFronius
[13:53] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[13:55] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) has joined #ceph
[14:01] <fghaas> tnt: thanks for digging into that issue :)
[14:02] <tnt> np. That's how opensource works (or is supposed to imho :p)
[14:03] <fghaas> couldn't possibly agree more
[14:11] * al (quassel@niel.cx) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * __jt___ (~james@jamestaylor.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * deepsa (~deepsa@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * stass (~stas@ssh.deglitch.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * glowell (~glowell@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * rosco (~r.nap@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * nhm (~nh@65-128-130-177.mpls.qwest.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * nolan (~nolan@phong.sigbus.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * fc (~fc@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * jantje (~jan@paranoid.nl) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Solver (~robert@atlas.opentrend.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * iggy2 (~iggy@theiggy.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * yehudasa (~yehudasa@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Meths (rift@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Dr_O (~owen@heppc049.ph.qmul.ac.uk) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * cephalobot` (~ceph@ps94005.dreamhost.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * sjust (~sam@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * mikeryan (mikeryan@lacklustre.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * sileht (~sileht@sileht.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * brambles (xymox@grip.espace-win.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * MK_FG (~MK_FG@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * wido (~wido@rockbox.widodh.nl) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Azrael (~azrael@terra.negativeblue.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * todin (tuxadero@kudu.in-berlin.de) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * eightyeight (~atoponce@pinyin.ae7.st) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * steki-BLAH (~steki@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * gregaf (~Adium@2607:f298:a:607:a9c4:6ffb:55e6:752d) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * morse (~morse@supercomputing.univpm.it) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * newtontm (~jsfrerot@charlie.mdc.gameloft.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * ninkotech (~duplo@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * ogelbukh (~weechat@nat3.4c.ru) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * jamespage (~jamespage@tobermory.gromper.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * darkfaded (~floh@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * andret (~andre@pcandre.nine.ch) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * gohko (~gohko@natter.interq.or.jp) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Fruit (wsl@2001:980:3300:2:216:3eff:fe10:122b) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Anticimex (anticimex@netforce.csbnet.se) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Ludo_ (~Ludo@falbala.zoxx.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * MarkN (~nathan@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * ajm (~ajm@adam.gs) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * markl (~mark@tpsit.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * izdubar (~MT@c-50-137-1-13.hsd1.wa.comcast.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * gadago (~gavin@2001:9d8::223:54ff:fee2:f41d) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * jefferai (~quassel@quassel.jefferai.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * laevar (~jochen@laevar.de) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * vhasi (martin@vha.si) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * lurbs (user@uber.geek.nz) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Meyer__ (meyer@c64.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * DLange (~DLange@dlange.user.oftc.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * benner (~benner@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * hijacker (~hijacker@ Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * raso (~raso@deb-multimedia.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * liiwi (liiwi@idle.fi) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * MarkS (~mark@irssi.mscholten.eu) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * wonko_be (bernard@november.openminds.be) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * Ormod (~valtha@ohmu.fi) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * morpheus (~morpheus@foo.morphhome.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * _are_ (~quassel@vs01.lug-s.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[14:11] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[14:11] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) has joined #ceph
[14:11] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[14:11] * MarkN (~nathan@ has joined #ceph
[14:11] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:11] * __jt___ (~james@jamestaylor.org) has joined #ceph
[14:11] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[14:11] * cephalobot` (~ceph@ps94005.dreamhost.com) has joined #ceph
[14:11] * steki-BLAH (~steki@ has joined #ceph
[14:11] * iggy2 (~iggy@theiggy.com) has joined #ceph
[14:11] * ajm (~ajm@adam.gs) has joined #ceph
[14:11] * gregaf (~Adium@2607:f298:a:607:a9c4:6ffb:55e6:752d) has joined #ceph
[14:11] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[14:11] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[14:11] * markl (~mark@tpsit.com) has joined #ceph
[14:11] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[14:11] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[14:11] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) has joined #ceph
[14:11] * sjust (~sam@ has joined #ceph
[14:11] * yehudasa (~yehudasa@ has joined #ceph
[14:11] * deepsa (~deepsa@ has joined #ceph
[14:11] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[14:11] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[14:11] * izdubar (~MT@c-50-137-1-13.hsd1.wa.comcast.net) has joined #ceph
[14:11] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[14:11] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:11] * Meths (rift@ has joined #ceph
[14:11] * benner (~benner@ has joined #ceph
[14:11] * stass (~stas@ssh.deglitch.com) has joined #ceph
[14:11] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[14:11] * mikeryan (mikeryan@lacklustre.net) has joined #ceph
[14:11] * glowell (~glowell@ has joined #ceph
[14:11] * gadago (~gavin@2001:9d8::223:54ff:fee2:f41d) has joined #ceph
[14:11] * newtontm (~jsfrerot@charlie.mdc.gameloft.com) has joined #ceph
[14:11] * rosco (~r.nap@ has joined #ceph
[14:11] * sileht (~sileht@sileht.net) has joined #ceph
[14:11] * ninkotech (~duplo@ has joined #ceph
[14:11] * nhm (~nh@65-128-130-177.mpls.qwest.net) has joined #ceph
[14:11] * ogelbukh (~weechat@nat3.4c.ru) has joined #ceph
[14:11] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) has joined #ceph
[14:11] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[14:11] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[14:11] * Dr_O (~owen@heppc049.ph.qmul.ac.uk) has joined #ceph
[14:11] * jamespage (~jamespage@tobermory.gromper.net) has joined #ceph
[14:11] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[14:11] * darkfaded (~floh@ has joined #ceph
[14:11] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[14:11] * fc (~fc@ has joined #ceph
[14:11] * brambles (xymox@grip.espace-win.org) has joined #ceph
[14:11] * andret (~andre@pcandre.nine.ch) has joined #ceph
[14:11] * MK_FG (~MK_FG@ has joined #ceph
[14:11] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[14:11] * jantje (~jan@paranoid.nl) has joined #ceph
[14:11] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[14:11] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[14:11] * Solver (~robert@atlas.opentrend.net) has joined #ceph
[14:11] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[14:11] * wido (~wido@rockbox.widodh.nl) has joined #ceph
[14:11] * wonko_be (bernard@november.openminds.be) has joined #ceph
[14:11] * Ormod (~valtha@ohmu.fi) has joined #ceph
[14:11] * al (quassel@niel.cx) has joined #ceph
[14:11] * morpheus (~morpheus@foo.morphhome.net) has joined #ceph
[14:11] * _are_ (~quassel@vs01.lug-s.org) has joined #ceph
[14:11] * Ludo_ (~Ludo@falbala.zoxx.net) has joined #ceph
[14:11] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[14:11] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[14:11] * Meyer__ (meyer@c64.org) has joined #ceph
[14:11] * MarkS (~mark@irssi.mscholten.eu) has joined #ceph
[14:11] * lurbs (user@uber.geek.nz) has joined #ceph
[14:11] * liiwi (liiwi@idle.fi) has joined #ceph
[14:11] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[14:11] * vhasi (martin@vha.si) has joined #ceph
[14:11] * laevar (~jochen@laevar.de) has joined #ceph
[14:11] * Fruit (wsl@2001:980:3300:2:216:3eff:fe10:122b) has joined #ceph
[14:11] * raso (~raso@deb-multimedia.org) has joined #ceph
[14:11] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[14:11] * hijacker (~hijacker@ has joined #ceph
[14:11] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[14:11] * jefferai (~quassel@quassel.jefferai.org) has joined #ceph
[14:11] * eightyeight (~atoponce@pinyin.ae7.st) has joined #ceph
[14:11] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[14:11] * Azrael (~azrael@terra.negativeblue.com) has joined #ceph
[14:16] * deepsa (~deepsa@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[14:17] * deepsa (~deepsa@ has joined #ceph
[14:19] <tnt> Mmm, I'm a bit surprised. I changed rbd.cc to write to /tmp/foo instead of /sys/bus/rbd/add and then I cat /tmp/foo > /sys/bus/rbd/add ... and that doesn't crash (the kernel refuses to parse the option like it should).
[14:32] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[14:46] <newtontm> Hi,
[14:47] <newtontm> I'm currently writing a puppet class for ceph, and I'm going through the chef cookbook for ceph and trying to figure out 1 specific part
[14:48] <newtontm> I've been able to install and configure mon so far, now I'm doing the osd part. In my understanding the chef cookbook is done for btrfs partitions for osd, but in my case I'm configuring osd over xfs
[14:48] <newtontm> can someone help me to prepare my osd folder to be usable on xfs?
[14:49] <fghaas> yes, hang on
[14:49] <fghaas> I can give you the manual commands for initializing an OSD, and then you can puppetize those :)
[14:49] <newtontm> Here is what I did so far: ceph-disk-prepare --cluster-uuid={fsid} /opt/data/ceph/osd/ceph-0
[14:50] <fghaas> # ceph mon getmap -o /tmp/monmap
[14:50] <fghaas> got latest monmap
[14:50] <fghaas> # ceph-osd -i 3 --mkjournal \
[14:50] <fghaas>   --mkfs --monmap /tmp/monmap
[14:51] <fghaas> that's essentially what you always need to do for a new osd
[14:51] * loicd (~loic@194.201-14-84.ripe.coltfrance.com) has joined #ceph
[14:51] <tnt> The thing is you need to execute some commands on the mon (or admin node) ... and then some commands on the osd.
[14:51] <fghaas> so: mkfs.xfs, mount it to whatever location you prefer, then do the above
[14:52] <fghaas> you can run both of the above from the osd as long as you have access to the client.admin key
[14:53] <tnt> and if you use cephx you'll need --mkkey I think. And register the key to the mon
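For anyone automating the steps fghaas and tnt describe above (mkfs.xfs, mount, fetch the monmap, then ceph-osd --mkfs), here is a minimal sketch that only assembles the command lines; it does not run them. The device path /dev/sdb1 and the function name are assumptions for illustration, and the flags are the ones quoted in the channel.

```python
def osd_init_commands(osd_id, device, data_dir, monmap="/tmp/monmap", cephx=False):
    """Return the argv lists for initializing one XFS-backed OSD, in order."""
    mkfs = ["ceph-osd", "-i", str(osd_id), "--mkjournal", "--mkfs", "--monmap", monmap]
    if cephx:
        # tnt's suggestion: also generate a key when cephx is enabled
        mkfs.append("--mkkey")
    return [
        ["mkfs.xfs", device],                      # create the filesystem
        ["mount", device, data_dir],               # mount it at the osd data dir
        ["ceph", "mon", "getmap", "-o", monmap],   # fetch the latest monmap
        mkfs,                                      # initialize the osd data dir and journal
    ]


if __name__ == "__main__":
    # /dev/sdb1 is a hypothetical device; substitute your own.
    for argv in osd_init_commands(0, "/dev/sdb1", "/opt/data/ceph/osd/ceph-0", cephx=True):
        print(" ".join(argv))
```

With cephx the generated key still has to be registered with the mon, as tnt notes; that step is deliberately left out here.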
[14:53] <newtontm> i do have cephx
[14:53] <tnt> Have a look at http://ceph.com/wiki/OSD_cluster_expansion/contraction
[14:54] <fghaas> tnt, that's somewhat outdated as it hasn't been updated for the new way of creating keys
[14:54] <tnt> well, I followed it not so long ago and it mostly worked AFAIR
[14:54] <fghaas> (11:52:07) fghaas: http://ceph.com/docs/master/config-cluster/authentication/ looks quite good to me, really
[14:54] <fghaas> (11:52:33) fghaas: see section "daemon keyrings"
[14:55] <fghaas> if "not so long ago" was before argonaut then you would run into a problem following it today :)
[14:55] <newtontm> ok let me try that, and come back to you if I encounter problems ;)
[14:55] <newtontm> thx
[14:56] <newtontm> oh, and was that part necessary before doing what you recommended ? ceph-disk-prepare --cluster-uuid={fsid} /opt/data/ceph/osd/ceph-0 ?
[14:57] <fghaas> I've never used that, to be honest... but then all my clusters up to this point have been initialized with mkcephfs
[14:59] <fghaas> tnt, joao: any input on that for newtontm?
[15:02] <tnt> fghaas: I used ceph-mon --mkfs ....
[15:02] <fghaas> tnt: um, thats for initializing a mon, not an osd
[15:02] <tnt> ceph-osd --mkfs sorry
[15:03] <fghaas> yeah, that's what I quoted above
[15:03] <newtontm> I think I don't need to run that command "ceph-disk-prepare", i've used the steps as provided here: http://ceph.com/wiki/OSD_cluster_expansion/contraction and it seems to work properly
[15:03] <tnt> exactly. I don't think there is a need to go lower level than this because you can run ceph-osd --mkfs on an empty machine that has no config yet.
[15:03] <gadago> does anyone have an explanation or a good article that explain what object storage is and why you would use it?
[15:05] <Azrael> gadago: object storage is good for storing static items such as movies, photos, etc. items that you don't change over time.
[15:05] <fghaas> gadago: this is not object storage specific, but does it help anyhow? http://www.hastexo.com/blogs/florian/2012/03/08/ceph-tickling-my-geek-genes
[15:10] * tnt is having trouble tracing that kernel bug :(
[15:10] <fghaas> gadago: also, sage's 2007 paper on RADOS will help: http://ceph.com/papers/weil-rados-pdsw07.pdf
[15:12] <newtontm> should I worry about this error message when running: ceph-osd -c /etc/ceph/ceph.conf -i 0 --mkfs --monmap /tmp/monmap --mkkey
[15:12] <newtontm> 2012-07-26 12:59:13.178980 7f0fdbc50780 -1 filestore(/opt/data/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[15:12] <fghaas> you forgot --mkjournal
[15:13] <gadago> thanks for the links guys
[15:13] <newtontm> ok, but it's not in the wiki ;)
[15:13] <fghaas> oh, indeed it's not
[15:13] <newtontm> i'll try again following what you gave me earlire ;)
[15:13] <newtontm> earlier*
[15:14] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) has joined #ceph
[15:15] <newtontm> however, I still get the error after running: ceph-osd -c /etc/ceph/ceph.conf -i 0 --mkjournal --mkfs --monmap /tmp/monmap --mkkey
[15:15] <newtontm> 2012-07-26 13:15:19.909528 7f2e0d148780 -1 filestore(/opt/data/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[15:16] <fghaas> well did you set "osd journal" correctly in your ceph.conf? because you're definitely using a non-default "osd data"
[15:17] <newtontm> [osd]
[15:17] <newtontm> osd journal = /opt/data/ceph/osd/ceph-$id/journal
[15:17] <newtontm> osd journal size = 1000
[15:17] <newtontm> osd data = /opt/data/ceph/osd/ceph-$id
[15:17] <newtontm> keyring = /etc/ceph/ceph.keyring.$name
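The [osd] section pasted above relies on ceph.conf metavariables: for osd.0, $id becomes 0 and $name becomes osd.0. A small sketch of that expansion follows, handling only the two variables used in the paste; the helper name is made up for illustration and this is not how Ceph itself implements it.

```python
def expand(value, daemon_type, daemon_id):
    """Expand the $name and $id metavariables in one ceph.conf value."""
    name = f"{daemon_type}.{daemon_id}"
    return value.replace("$name", name).replace("$id", str(daemon_id))


if __name__ == "__main__":
    settings = {
        "osd journal": "/opt/data/ceph/osd/ceph-$id/journal",
        "osd data": "/opt/data/ceph/osd/ceph-$id",
        "keyring": "/etc/ceph/ceph.keyring.$name",
    }
    for key, value in settings.items():
        print(key, "=", expand(value, "osd", 0))
```

This can be handy in a puppet or test harness to check that the journal really ends up inside the non-default data directory.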
[15:17] <fghaas> um, ever heard of pastebin?
[15:17] <newtontm> yeah
[15:17] <newtontm> but that's not too long, or sorry, if I should have used paste bin anyways
[15:18] <fghaas> I had thought there were more lines coming :)
[15:19] <newtontm> here is the full config file: http://pastebin.com/VN9QBpz3
[15:20] <newtontm> so am I doing something wrong ?
[15:20] <joao> newtontm, any other messages before that?
[15:20] <newtontm> joao: no only after
[15:20] <joao> also, try adding --debug-filestore 20
[15:20] <newtontm> k
[15:20] <joao> should provide further insight into what's happening
[15:21] <newtontm> btw, i'm deleting the content of /opt/data/ceph/osd/ceph-0 before running the cmd again
[15:21] <joao> I think the --mkfs option no longer removes contents, so I suppose that's a good idea
[15:22] <fghaas> what's your rationale for setting the fsid explicitly?
[15:22] * deepsa_ (~deepsa@ has joined #ceph
[15:22] <newtontm> ok, so where should I see the debug info ?
[15:22] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[15:22] * deepsa_ is now known as deepsa
[15:22] <joao> oh
[15:23] <joao> newtontm, try running it with -d and it should output to the terminal
[15:23] <joao> would be fastest that way
[15:24] <newtontm> result: http://pastebin.com/K0W93sr2
[15:24] <newtontm> fghaas: I thought that fsid was to be the same throughout my ceph cluster, so i provide it in puppet and put it there when needed.
[15:24] <joao> newtontm, what seems to be the problem then?
[15:25] <joao> everything appears to work just fine
[15:25] <newtontm> BTRFS_IOC_SUBVOL_CREATE ioctl failed, trying mkdir /opt/data/ceph/osd/ceph-0/current ?
[15:26] <joao> are you using btrfs?
[15:26] <newtontm> no
[15:26] <newtontm> xfs
[15:26] <joao> then that is expected
[15:27] <joao> we cannot create btrfs subvolumes if the underlying fs is not btrfs ;)
[15:27] <newtontm> ok but I still get this error: 2012-07-26 13:23:16.914198 7f3f8686c780 -1 filestore(/opt/data/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[15:28] <newtontm> just before you have: read meta/23c2fcde/osd_superblock/0//-1 0~0
[15:29] <joao> newtontm, that's the daemon looking for some metadata, not finding it and continuing gracefully
[15:29] <fghaas> newtontm: I don't set fsid, and my ceph clusters seem to work just dandy :)
[15:30] <joao> newtontm, are you noticing any problem getting the osd up and running? what led you to look into the mkfs process? (I'm a bit out of context here)
[15:30] <fghaas> joao: he's just following the "expansion and contraction" wiki page
[15:31] <joao> oh, okay then
[15:31] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[15:32] <newtontm> I saw this error and thought that maybe i'm missing something... even though my osd is up and running after running the commands
[15:33] <joao> nope, I suppose that's just a function being too verbose
[15:33] <joao> :)
[15:33] <newtontm> k thx
[15:35] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[15:37] <Azrael> what (minimum) kernel version is recommended for running the ceph and rbd clients?
[15:37] <tnt> If you use btrfs ... 3.10 maybe :)
[15:38] <Azrael> heh
[15:38] <fghaas> he was asking about clients
[15:38] <Azrael> i was, but i figured just saying heh was easier than explaining
[15:38] <fghaas> 3.2.0 (ubuntu precise) works fine for me, with the usual caveats of ceph fs not being production ready
[15:38] <joao> not sure; elder?
[15:38] <Azrael> fghaas: ok thanks
[15:38] <tnt> I'm using 3.2.0 for rbd client as well
[15:39] <tnt> the lack of caching sucks though
[15:39] <Azrael> fghaas: i'm not a fan of ubuntu for servers, but all the cool stuff seems to be in ubuntu and not centos/rhel... so........ i best get used to it.
[15:39] <Azrael> tnt: ahh performance is less than ideal?
[15:40] <fghaas> Azrael: debian squeeze has a 3.2.0 backports kernel at this time
[15:41] <fghaas> if you're looking for a less, what's the word, fluid platform :)
[15:41] <Azrael> haha
[15:41] <tnt> Azrael: I'm quite early in my testing but yes, so far write perf is a bit on the low side. As far as I can gather it's because a write is not acknowledged until it has been written to all osds (even if the caller has not called 'sync').
[15:41] <Azrael> no no i like debian even less
[15:41] <Azrael> i think i just prefer systemd and rpm over upstart and deb
[15:41] <elder> joao, I don't think I can answer that question. At the moment, the newest kernels have the best kernel code. We have a linux-3.4.6-ceph branch that has newer bug fixes back-ported to 3.4.
[15:42] <Azrael> tnt: interesting
[15:42] <elder> But that isn't a complete answer to the broader question.
[15:42] <fghaas> Azrael: well for rhel/centos, you're currently stuck with a 2.6.32 red hat frankenkernel
[15:42] <tnt> Azrael: the QEMU-RBD backend doesn't have the issue because it uses the userspace library and this one has write-back caching support.
[15:42] <joao> elder, I assumed you would know at least something that I didn't :p
[15:42] <joao> elder, and I was right!
[15:42] <Azrael> fghaas: haha, frankenkernel
[15:43] <Azrael> tnt: iiiiiiiiinteresting. i wasn't aware of the difference. good to know.
[15:43] * izdubar (~MT@c-50-137-1-13.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[15:43] * steki-BLAH (~steki@ Quit (Ping timeout: 480 seconds)
[15:43] <Azrael> hmm i wonder what [franken]kernel rhel 7 will supply
[15:44] <fghaas> but Azrael happens to be a Xen guy, no? so no qemu-rbd goodness for you :)
[15:44] <Azrael> yeah :-(
[15:44] <fghaas> Azrael: I just use the frankenkernel term because what ships is really way off a real 2.6.32, in ways good and bad
[15:44] <Azrael> fghaas: yup
[15:45] <Azrael> so far removed from vanilla that even folks at oracle scratch their heads
[15:45] <fghaas> I'm sure the bearers of red headgear are not too sad about _that_
[15:46] <tnt> Azrael: yeah, me neither until yesterday. That was disappointing :(
[15:46] <Azrael> fghaas: heh
[15:47] <tnt> Ah finally progress ! it's the add_key syscall that crashes ...
[15:47] <fghaas> Azrael: I'm beginning to think that there's no rbd goodness for you at all, because rbd client support hit vanilla in 2.6.37, and I'm damn sure no-one at red hat bothered to backport that
[15:47] <Azrael> i'm looking forward to xen getting ceph love. from what i was told, its in the works.
[15:47] <Azrael> fghaas: ahh
[15:47] <Azrael> fghaas: hmm i usually have no qualms about deviating from rhel provided kernels
[15:48] <Azrael> i'm betting that rhel 7 will be based on 3.x
[15:48] <fghaas> well that's a safe bet
[15:48] <Azrael> which should have a somewhat decent rbd, albeit without caching
[15:48] <fghaas> but that's what, 3 years down the road?
[15:48] <Azrael> i am a xen guy after all... and for years i had to run my own patched version of the kernel on rhel :-)
[15:49] * Azrael shudders at the Xen support that came with rhel. oh god.
[15:49] <Azrael> yeah its a long time from now definitely
[15:52] <Azrael> http://www.google.com/appsstatus <--- wow google talk/chat has been down for ~3hrs now
[15:52] * steki-BLAH (~steki@79-101-189-33.dynamic.isp.telekom.rs) has joined #ceph
[15:53] <fghaas> so it says, but it happens to work for us over her
[15:53] <fghaas> here
[15:54] <Azrael> seems to not be a full outage
[15:54] <joao> Azrael, and I only figured that was the problem with empathy moments ago
[15:54] <joao> in the meantime, ended up reinstalling all the packages and updates thinking that would solve the problem :p
[15:54] <Azrael> i was getting error messages about timeouts from the xmpp servers. i made the mistake of disconnecting and haven't been able to reconnect since.
[15:54] <Azrael> haha joao
[15:55] <fghaas> Azrael: if you know someone over there, tell them to give us a call and let them know we know a thing or two about HA :)
[15:56] <joao> lol
[15:57] <fghaas> joao: now you're laughing, but just wait...
[15:58] <Azrael> fghaas: haha
[15:58] <Azrael> fghaas: oh wait i do know somebody at google
[15:58] <Azrael> two people
[15:58] <Azrael> if only i could ping them on google talk about the issue.... ;-)
[15:58] <fghaas> in fact I do too. but yeah, they're probably having that same problem :D
[15:58] <Azrael> hehe
[16:03] <ninkotech> google is not (not evil)
[16:08] * andrewbogott (~andrewbog@ has joined #ceph
[16:13] <nhm> good morning #ceph
[16:14] * loicd (~loic@194.201-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[16:15] * loicd (~loic@194.201-14-84.ripe.coltfrance.com) has joined #ceph
[16:16] * loicd1 (~loic@ has joined #ceph
[16:17] <newtontm> ok some more questions for osd. Once I started my first OSD process, I do I need to the crush map ?
[16:17] <newtontm> do I need to modify the crush map?
[16:17] * loicd1 (~loic@ Quit ()
[16:18] * steki-BLAH (~steki@79-101-189-33.dynamic.isp.telekom.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[16:21] <tnt> newtontm: well it depends on your current crush map ... but yeah in general you will.
[16:22] <tnt> I think what you do is similar to cluster expansion one node at a time.
[16:23] * izdubar (~MT@c-50-137-1-13.hsd1.wa.comcast.net) has joined #ceph
[16:23] <newtontm> context: I'm trying to puppetize the ceph installation, I was able to do it for mon, now I'm doing it for osd. I was able to prepare the osd folder and the process is now running. How can I automate the crush map?
[16:23] <fghaas> newtontm: here's a bit of an easier way to do that than the manual process outlined in the wiki:
[16:23] <tnt> otoh, if the conf file already contained all the osd you were going to build when you created the mon, maybe the original crushmap is good
[16:23] <newtontm> i see here some commands that I think could be useful: http://ceph.com/docs/master/ops/manage/crush/
[16:23] <fghaas> ceph osd crush set 3 osd.3 1.0 pool=default rack=rackbar host=hostfoo
[16:23] * loicd (~loic@194.201-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[16:23] <fghaas> yeah, exactly those
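Since the goal is to drive this from puppet, the ceph osd crush set line fghaas quotes above can be generated per OSD. A minimal sketch that builds the argument list without executing anything; the function name is an assumption, and the argument order simply mirrors the command as quoted in the channel.

```python
def crush_set_command(osd_id, weight, **location):
    """Build: ceph osd crush set <id> osd.<id> <weight> key=value ..."""
    argv = ["ceph", "osd", "crush", "set", str(osd_id), f"osd.{osd_id}", str(weight)]
    # keyword arguments keep their call order, so pool/rack/host come out as given
    argv += [f"{key}={value}" for key, value in location.items()]
    return argv


if __name__ == "__main__":
    print(" ".join(crush_set_command(3, 1.0, pool="default", rack="rackbar", host="hostfoo")))
```

The location keys (pool, rack, host) are the ones from the example above; which ones apply depends on the buckets in your own crush map.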
[16:24] <newtontm> so even though I didn't *create* the original crush map, could I simply use those commands and it would do the trick ?
[16:34] <nhm> joao: ping
[16:35] * BManojlovic (~steki@ has joined #ceph
[16:35] <joao> nhm, pong
[16:35] <joao> sup?
[16:36] <nhm> joao: Do you know much about the threading model in the filestore? IE what causes threads to sleep and wake up, how long they should sleep for, etc?
[16:37] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:37] <joao> have sort of an idea
[16:37] <dspano> Good morning everyone.
[16:37] <joao> gotta look back into the code to refresh my memory ;)
[16:37] <joao> give me a couple of minutes
[16:37] <nhm> joao: ok. I might have some questions for you at some point.
[16:37] <joao> nhm, do you have some now, so I can focus on what you're looking for?
[16:38] <dspano> I've got a ton of these old messages from 7/18 and 7/23 on one of my OSDs that seems to be blocking them from peering after it's restarted.
[16:38] <dspano> 2012-07-26 10:28:04.855866 7f3d1ba3f780 20 osd.0 1592 pg[0.f( v 1578'2725 (125'1724,1578'2725] n=6 ec=1 les/c 1587/1587 1591/1591/1591) [] r=0 lpr=0 pi=1586-1590/2 (info mismatch, log(125'1724,0'0]) (log bound mismatch, actual=[125'1725,130'1746]) lcod 0'0 mlcod 0'0 inactive] read_log 2992 130'1747 (130'1746) delete 7d14da8f/1000001b9ed.00000000/head//0 by mds.0.10:4153 2012-07-18 12:26:35.162831
[16:38] <dspano> Is there a semi-easy way to fix this?
[16:40] <nhm> joao: Unfortunately not really. I see our performance tank over time and we spend a lot of time in interruptible sleep.
[16:40] <joao> nhm, may it be syncs?
[16:41] <nhm> joao: Maybe, but it doesn't seem like we spend a lot of time in sync.
[16:41] <nhm> joao: I need to narrow it down to only the threads that actually matter I think.
[16:41] <joao> ok
[16:41] <joao> will look for potential culprits :)
[16:43] <nhm> joao: ok, cool. I don't want to take up all of your time, but I'm both inexperienced with that code and my c++ isn't exactly world-class. ;)
[16:44] <joao> no problem; I'm glad to put the time into this :)
[16:45] <nhm> joao: one of the things I was wondering a bit about is how waitinterval works. It looks like a Threadpool worker will try to grab some work to do, then reset the heartbeat timer and wait for 2 seconds?
[16:45] <joao> don't think I ever looked into that stuff
[16:46] <joao> any idea where that happens?
[16:46] <nhm> common/Workqueue.cc
[16:46] <nhm> sorry, WorkQueue.cc
[16:48] <nhm> I've fiddled with the amount of time to wait, but it looks like we are still spending a fair amount of time in pthread_cond_timedwait, so I'm wondering if the thread pool isn't getting fed with enough stuff to do.
[16:49] <joao> nhm, WorkQueues are used all over for all sorts of things
[16:49] <joao> I think the filestore alone has a couple of them
[16:49] <joao> I'm looking at the WorkQueue code now, but haven't figured out yet what the heartbeat is all about
[16:50] <nhm> joao: I think the heartbeat is just there to tell you when a thread isn't responding fast enough.
[16:50] <joao> yeah, that would make sense
[16:51] <nhm> joao: I see messages from the workload generator complaining about the io writer threads pretty regularly.
[16:51] <joao> looks like it's used to set up a timeout each time we start processing some work
[16:53] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[16:54] <nhm> joao: anyway, don't feel like you need to spend all morning on this. I'm just trying to get a better feeling for what an optimal flow would look like so I am more likely to spot when it's not behaving right.
[16:55] <joao> nhm, good thing my morning is over for 4 hours now :p
[16:55] <nhm> lol
[16:55] <nhm> good point
[16:56] <joao> well, looks like we only have a WorkQueue on the filestore
[16:56] <joao> and it is the one handling OpSequencers
[16:57] <joao> basically, we queue each collection transaction onto an OpSequencer
[16:57] <joao> the point being increasing the parallelization of work, allowing us to apply multiple transactions as long as they are on different collections
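As a toy illustration of what joao describes above (one sequencer per collection, so ordering is kept within a collection while different collections can proceed in parallel), here is a sketch. It is a simplified model with made-up names, not the filestore's actual OpSequencer code.

```python
from collections import OrderedDict, deque


def assign_to_sequencers(transactions):
    """Group (collection, op) pairs into one FIFO queue per collection."""
    sequencers = OrderedDict()
    for collection, op in transactions:
        sequencers.setdefault(collection, deque()).append(op)
    return sequencers


def runnable_now(sequencers):
    """The head of each non-empty queue may be applied concurrently."""
    return [(coll, queue[0]) for coll, queue in sequencers.items() if queue]


if __name__ == "__main__":
    txs = [("pg0.1", "w1"), ("pg0.2", "w2"), ("pg0.1", "w3")]
    seqs = assign_to_sequencers(txs)
    print(runnable_now(seqs))  # w3 has to wait behind w1 on the same collection
```

In this model a workload that hits a single collection gets no parallelism at all, which is one thing to keep in mind when reading thread-pool wait times.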
[16:58] <iggy2> most of the xen bits were merged in to upstream qemu, so they should be pretty close to using upstream qemu now (which should get you qemu's rbd support at some level)
[16:58] <iggy2> and.... I was scrolled way up, whoops
[16:58] * iggy2 is now known as iggy
[17:00] <joao> nhm, have you found any way to look more closely where things are happening? For instance, obtaining stats of how long we spend inside one particular function?
[17:01] <nhm> joao: yes, I can do that both for cpu time and wall clock time.
[17:01] <nhm> joao: that thing I sent you the other day was a very low-sample wall clock breakdown.
[17:02] <nhm> joao: but of all threads mashed together
[17:03] <nhm> joao: just sent you another one.
[17:05] <joao> nhm, if you're noticing those complaints about the IO writer thread, maybe it would be worth taking a look into what's happening in ThreadPool::WorkQueue::_void_process()
[17:05] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[17:05] <joao> or, in filestore's case, OpWq::_process()
[17:05] <joao> looking at the email now :)
[17:13] <nhm> joao: part of the problem still seems to be the controller. I see the writev's to the journal taking more time and having higher IOwait when the OSD *data* disk is switched from an SSD to a SATA disk.
[17:15] <nhm> It's like the presence of slower writes to the osd data disk causes stalls in writevs to the journal disk.
[17:15] * LarsFronius (~LarsFroni@2a02:8108:380:90:70cb:d462:7ce2:4b09) has joined #ceph
[17:15] <joao> unless we are writing once in journal, then in the data disk, then in journal again, then in the data disk, ..., I don't see why that would happen
[17:16] <joao> but I guess that's the behavior that would make sense
[17:16] <joao> write a transaction to journal, then apply it, then write another to journal, then apply it...
[17:17] <joao> I wonder if bundling a couple of transactions and writing them all into the journal, and then into the data disk, would make it better
[17:17] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:17] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) has joined #ceph
[17:17] <nhm> joao: yeah, that's what I was just wondering too.
[17:19] <nhm> I can do concurrent sequential writes to a given file on multiple drives at the same time without issue though. I wonder if it has to do with seeks/metadata traffic on one drive slowing down sequential writes to the other.
[17:19] <joao> nhm, in the email, the number on the left is the number of times we hit that particular backtrace, right?
[17:19] <nhm> yep
[17:19] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:20] <joao> nhm, I think the problem would lie with the filestore waiting on completion of the journal write in order to proceed with applying the next transaction, and vice-versa
[17:20] <joao> not sure though, have to confirm that
[17:21] <nhm> joao: That would be very very good to confirm.
[17:22] <nhm> Seems like all FS writes should be overlapped with journal writes and those should both be overlapped with network IO.
[17:22] <joao> that is probably what happens
[17:22] <joao> I just want to confirm
[17:24] <joao> btw, nhm, do you see anything in the logs regarding "queue_transactions" ?
[17:24] <nhm> joao: I was wondering if maybe some how the thread(s) that deal with writing out data end up sleeping and don't get properly poked and woken up in some corner case or something.
[17:24] <joao> it should say "parallel" or "writeahead"
[17:24] <nhm> what log level do I need for that?
[17:24] <joao> 5
[17:25] <joao> just trying to decide which branch to follow on an if :p
[17:25] <nhm> joao: let me try it out and I'll see.
[17:26] <nhm> writeahead
[17:26] <joao> cool, thanks
[17:27] * andrewbogott (~andrewbog@ Quit (Quit: andrewbogott)
[17:27] * LarsFronius (~LarsFroni@2a02:8108:380:90:70cb:d462:7ce2:4b09) Quit (Ping timeout: 480 seconds)
[17:27] * deepsa (~deepsa@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[17:27] <Azrael> iggy: still -- thats good to hear :-)
[17:28] <tnt> I don't think you can use qemu-rbd with xen ... unless you use HVM maybe.
[17:29] <Azrael> tnt: you're most likely correct
[17:29] <Azrael> you can use just a regular kernel rbd as a block device backend for a vm
[17:30] <Azrael> but like you were talking about earlier ... no writeback cache
[17:30] <iggy> oh, yeah, I would assume that is correct (re: hvm required)
[17:31] <Azrael> its like 80F outside here in copenhagen. thats gotta be a record hah. i'm gonna go throw some steaks on the grill. bbl.
[17:31] <iggy> must be nice... it's been over 100F quite a bit lately
[17:31] <Azrael> where at
[17:33] <tnt> Azrael: yes precisely. I was thinking about implementing an NBD server using librbd as a backend to "work around" the issue.
[17:33] <iggy> Houston, TX
[17:34] <iggy> tnt: but that basically kills the point of rbd because you are funneling everything through 1 spof
[17:35] <tnt> iggy: Well I would put that server on each Dom0 for its own usage.
[17:43] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[17:44] * BManojlovic (~steki@ has joined #ceph
[17:47] <nhm> joao: find anything interesting?
[17:48] <joao> nhm, looks like in writeahead it will first make sure everything is in the journal before processing the transaction, but haven't seen anything that would not allow writing to the journal while the last transaction is being applied
[17:48] <nhm> joao: I think when the journal queue fills up, you end up in a situation where every time a write to the FS happens, another journal write happens until the next filestore write happens.
[17:48] <joao> there is, however, a queue on FileJournal.{cc,h} that appears to be responsible for processing the journal writes
[17:48] <nhm> And it does end up waiting because the journal queue is full.
[17:49] <joao> nhm, but I think we do perform multiple writes to the log in one go
[17:51] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[17:53] <joao> nhm, as far as I can see, we do attempt to write as many journal entries as possible, up to 'journal_max_write_entries' or 'journal_max_write_bytes'
[17:53] <joao> but we will be holding a lock while doing these writes
[17:54] <joao> but then again, I think that only the FileJournal threads will try to acquire that lock anyway
[17:54] <joao> (looking into that now)
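The batching joao describes above (write as many queued journal entries as possible in one go, bounded by 'journal_max_write_entries' or 'journal_max_write_bytes') can be modelled roughly as below. This is a sketch of the idea only; the rule that a single oversized entry still goes out alone is an assumption, and the real FileJournal logic may differ.

```python
def take_batch(queue, max_entries, max_bytes):
    """Pick leading entry sizes from queue to write together, within both limits."""
    batch, total = [], 0
    for size in queue:
        if len(batch) >= max_entries:
            break  # entry-count limit reached
        if batch and total + size > max_bytes:
            break  # byte limit reached (but never return an empty batch)
        batch.append(size)
        total += size
    return batch


if __name__ == "__main__":
    pending = [4096, 4096, 8192, 4096]  # entry sizes in bytes
    print(take_batch(pending, max_entries=100, max_bytes=10000))
```

If the journal queue is usually short, each batch holds a single entry and the two limits never come into play, which would fit the near-constant journal write pattern nhm reports.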
[17:57] <nhm> what I find interesting is that the journal writes are nearly constant, but the filestore writes are inconsistent.
[17:59] <joao> well, the journal is basically appending onto the disk
[17:59] <joao> the filestore writes are bound to be more scattered throughout the disk
[18:00] <joao> anyway, from the looks of it, that lock is only acquired in the FileJournal, and I doubt any other thread besides the FileJournal's write thread has any heavy dependence on it
[18:01] <joao> and I should get back to my unit tests for a bit, but feel free to poke me at any given time ;)
[18:02] <joao> (actually, coffee run first)
[18:02] <nhm> Ok, sounds good. Thanks for looking at it!
[18:04] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[18:05] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[18:05] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[18:07] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit ()
[18:13] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Ping timeout: 480 seconds)
[18:16] <sileht> Hi all, I have read this in the documentation: "The Ceph community provides a slightly optimized version of the apache2 and fastcgi packages."
[18:16] <sileht> But where I can find them ?
[18:20] * SuperSonicSound (~SuperSoni@9KCAAAAZE.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:23] <joao> sileht, https://github.com/ceph
[18:23] <joao> more specifically, https://github.com/ceph/apache2 and https://github.com/ceph/mod_fastcgi
[18:25] <dspano> I have a ton of log mismatches from a week ago showing up in my OSD logs, and one OSD won't peer with the other, does anyone know how I can go about fixing this?
[18:25] <sileht> joao, ok thanks, I'm looking for the package but the source will be ok too :), so it's compile time
[18:25] <dspano> The logs look like this: 2012-07-26 12:23:53.633847 7fe3ca51f780 20 osd.1 1672 pg[0.3b( v 1578'2870 (282'1869,1578'2870] n=14 ec=1 les/c 1669/1669 1671/1671/1671) [] r=0 lpr=0 pi=1668-1670/1 (info mismatch, log(282'1869,0'0]) (log bound mismatch, actual=[282'1870,1138'2281]) lcod 0'0 mlcod 0'0 inactive] read_log 56032 1138'2282 (1138'2281) delete 27cd9dbb/10000023932.00000000/head//0 by mds.0.23:96512 2012-07-21 10:18:07.458921
[18:26] <sileht> joao, does this modified version improve the performance ?
[18:27] <joao> sileht, I'm not into those details
[18:27] <sileht> joao, oki thanks a lot
[18:31] <joao> yey, gtalk is back
[18:32] * andrewbogott_ (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[18:32] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Read error: Connection reset by peer)
[18:32] * andrewbogott_ is now known as andrewbogott
[18:38] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[18:45] * Tamil (~Adium@2607:f298:a:607:983a:797d:9725:9b84) has joined #ceph
[18:46] * Tamil (~Adium@2607:f298:a:607:983a:797d:9725:9b84) Quit (Quit: Leaving.)
[18:54] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:55] <gregaf> dspano: I think that might be a bug we're currently seeing; I'd normally point you at sjust but he's out so maybe sagewk will have more details when he gets in
[18:58] <dspano> gregaf: It looks like one of the OSDs thinks it's by itself.
[18:58] <dspano> I'm seeing the error message from this if block a lot. : // hmm.. am i all alone?
[18:58] <dspano> dout(30) << "heartbeat lonely?" << dendl;
[18:58] <dspano> if (heartbeat_peers.empty()) {
[18:58] <dspano>   if (now - last_mon_heartbeat > g_conf->osd_mon_heartbeat_interval && is_active()) {
[18:58] <dspano>     last_mon_heartbeat = now;
[18:58] <dspano>     dout(10) << "i have no heartbeat peers; checking mon for new map" << dendl;
[18:58] <dspano>     monc->sub_want("osdmap", osdmap->get_epoch() + 1, CEPH_SUBSCRIBE_ONETIME);
[18:58] <dspano>     monc->renew_subs();
[18:58] <dspano>   }
[18:58] <dspano> }
[19:02] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:02] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[19:02] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[19:02] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:03] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) has joined #ceph
[19:03] <dspano> gregaf: What's bizarre is the active OSD has the affected OSD listed as a peer.
[19:04] <dspano> gregaf: Could my osdmap be all jacked up?
[19:05] <gregaf> the error you're seeing is in part of the OSD code I'm not familiar with; I think that the PG log got messed up and it's preventing the OSD from starting correctly, but I'm not sure
[19:05] <gregaf> it's not about your osdmap, though
[19:09] * Leseb (~Leseb@ Quit (Quit: Leseb)
[19:09] <dspano> gregaf: That makes sense. So it's loading the pgmap when it gives all the info mismatch errors then?
[19:10] <gregaf> not the pgmap (which is basically just a reporting structure; I don't think the OSDs even look at it) but the "pg log" for a PG which it stores on disk
[19:11] <dspano> gregaf: Sorry, I was looking at the osdmap stuff right before I wrote that. I misspoke.
[19:12] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[19:15] <dspano> gregaf: The entries it's complaining about are from over a week ago. Is there any way I can clear it out, or tell the OSD to just listen to the active OSD?
[19:15] <gregaf> I'll poke Sage; he's in now
[19:16] * Dr_O (~owen@heppc049.ph.qmul.ac.uk) Quit (Quit: Ex-Chat)
[19:16] <gregaf> but we're doing standup now so let us finish that
[19:17] <dspano> gregaf: Sure. Lol.
[19:20] * Ryan_Lane (~Adium@ has joined #ceph
[19:31] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:32] <gregaf> dspano: hmm, Sage says on replay that's normal output if your log level is high enough (*sigh*)
[19:33] <gregaf> so I'm sure yehudasa will be happy to walk you through diagnosing whatever's actually gone wrong
[19:37] * chutzpah (~chutz@ has joined #ceph
[19:38] * aliguori (~anthony@ has joined #ceph
[19:39] <Azrael> joao: why does ceph have its own / optimized versions of apache and fastcgi?
[19:47] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Quit: andrewbogott)
[19:53] * deepsa (~deepsa@ has joined #ceph
[19:56] * chutzpah (~chutz@ Quit (Quit: Leaving)
[20:03] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Remote host closed the connection)
[20:13] <dspano> gregaf: I thought I was on to something.
[20:16] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[20:20] * deepsa (~deepsa@ Quit (Quit: Computer has gone to sleep.)
[20:21] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[20:33] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[20:33] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[20:33] * Leseb_ is now known as Leseb
[20:36] * yehudasa (~yehudasa@ Quit (Quit: Ex-Chat)
[20:44] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has left #ceph
[20:49] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[20:51] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[20:52] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) has joined #ceph
[20:57] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[21:06] * EmilienM (~EmilienM@ has joined #ceph
[21:07] * EmilienM (~EmilienM@ has left #ceph
[21:08] * EmilienM (~EmilienM@ has joined #ceph
[21:14] * fc (~fc@ Quit (Ping timeout: 480 seconds)
[21:21] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:22] * newtontm (~jsfrerot@charlie.mdc.gameloft.com) Quit (Quit: leaving)
[21:22] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:59] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:04] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[22:15] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:29] * chutzpah (~chutz@ has joined #ceph
[22:36] * gregaf (~Adium@2607:f298:a:607:a9c4:6ffb:55e6:752d) Quit (Quit: Leaving.)
[22:52] * gregaf (~Adium@ has joined #ceph
[22:52] * gregaf (~Adium@ has left #ceph
[22:52] * gregaf (~Adium@ has joined #ceph
[22:57] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[23:02] * Meths (rift@ has joined #ceph
[23:04] * gregaf (~Adium@ Quit (Quit: Leaving.)
[23:05] * gregaf (~Adium@2607:f298:a:607:706a:894f:625f:b0e1) has joined #ceph
[23:14] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[23:15] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[23:15] * aliguori (~anthony@ Quit (Remote host closed the connection)
[23:22] * SuperSonicSound (~SuperSoni@9KCAAAAZE.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[23:24] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:24] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[23:24] * Leseb_ is now known as Leseb
[23:29] * Leseb_ (~Leseb@ has joined #ceph
[23:31] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[23:34] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[23:34] * Leseb_ is now known as Leseb
[23:34] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[23:39] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:39] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[23:41] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:43] * tnt (~tnt@93.56-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[23:44] * danieagle (~Daniel@ has joined #ceph
[23:47] * Leseb (~Leseb@ Quit (Ping timeout: 480 seconds)
[23:47] * Leseb_ is now known as Leseb
[23:55] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (Read error: No route to host)
[23:56] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[23:56] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.