#ceph IRC Log


IRC Log for 2013-06-03

Timestamps are in GMT/BST.

[0:00] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:01] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:03] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[0:04] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[0:08] * Steki (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:08] * Steki (~steki@fo-d- has joined #ceph
[0:09] <phantomcircuit> i have way too few placement groups
[0:10] <phantomcircuit> what is the best way to fix that?
[0:10] <phantomcircuit> i assume it's creating another pool with more pgs and then migrating from one to the other
[0:14] * BillK (~BillK@124-148-124-185.dyn.iinet.net.au) Quit (Quit: Leaving)
[0:14] * BillK (~BillK@124-148-124-185.dyn.iinet.net.au) has joined #ceph
[0:23] * Steki (~steki@fo-d- Quit (Read error: Connection reset by peer)
[0:23] * Steki (~steki@fo-d- has joined #ceph
[0:28] <mrjack> phantomcircuit: don't know but sounds like one way to do it
[0:28] <mrjack> phantomcircuit: http://www.sebastien-han.fr/blog/2013/03/12/ceph-change-pg-number-on-the-fly/
[0:37] * jahkeup (~jahkeup@ Quit (Ping timeout: 480 seconds)
[0:41] <phantomcircuit> mrjack, the bold warnings there kind of tell me not to do that
[0:41] <phantomcircuit> anyways yeah rados cppool
[0:41] <phantomcircuit> only problem is my pool will be active while this is happening
[0:41] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[0:42] <phantomcircuit> so to make that safe i assume i would have to freeze activity or lose everything from the start of cppool onwards
[0:46] * sh_t (~sht@NL.privatevpn.com) Quit (Ping timeout: 480 seconds)
[0:57] * Steki (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:57] * Steki (~steki@fo-d- has joined #ceph
[1:11] * joelio (~Joel@ Quit (Ping timeout: 480 seconds)
[1:11] * dosaboy (~dosaboy@ Quit (Quit: leaving)
[1:22] <mrjack> phantomcircuit depends on how you use ceph
[1:22] <mrjack> phantomcircuit: for rbd images export import might work better and more painless
[1:24] <phantomcircuit> well
[1:24] <phantomcircuit> for now it's not too bad
[1:25] <phantomcircuit> i think i'll just wait until the "right" way to do it is production ready
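The workaround phantomcircuit sketches above (create a pool with more PGs, then copy) looks roughly like this. Pool names and the PG count are hypothetical, and `rados cppool` does not carry over writes made while the copy runs, so clients have to be stopped first:

```shell
# Hypothetical pool names; stop all clients before copying, since
# writes that land in oldpool during the copy are lost.
ceph osd pool create newpool 512            # new pool with more PGs
rados cppool oldpool newpool                # copy every object across
ceph osd pool rename oldpool oldpool.bak    # keep the original around
ceph osd pool rename newpool oldpool        # swap the new pool into place
```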
[1:59] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[2:12] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[2:12] * ChanServ sets mode +o scuttlemonkey
[2:21] <mrjack> sage sagewk: CXX libmon_a-Paxos.o
[2:21] <mrjack> CXX libmon_a-PaxosService.o
[2:21] <mrjack> mon/PaxosService.cc: In member function ‘bool PaxosService::dispatch(PaxosServiceMessage*)’:
[2:21] <mrjack> mon/PaxosService.cc:52: error: ‘struct Connection’ has no member named ‘get_messenger’
[2:21] <sage> mrjack: what commit?
[2:21] <mrjack> tried current cuttlefish
[2:21] <mrjack> ffb87918fa7b829a5199eec08804dc540a819bf2
[2:28] <sage> mrjack: just pushed fix. missed a hunk in the backport. be aware that this particular combination of patches isn't yet well tested. in fact, let me make it cuttlefish-next until i can hammer on it more. getting everything queued up for 0.61.3.
[2:37] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[2:38] * Steki (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:50] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[2:51] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[3:07] * noob2 (~cjh@pool-96-249-205-19.snfcca.dsl-w.verizon.net) Quit (Quit: Leaving.)
[3:15] * The_Bishop (~bishop@e179016206.adsl.alicedsl.de) has joined #ceph
[3:25] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[3:35] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[3:43] * john_barbee (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[3:53] * The_Bishop_ (~bishop@e177088019.adsl.alicedsl.de) has joined #ceph
[3:58] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[3:59] * The_Bishop (~bishop@e179016206.adsl.alicedsl.de) Quit (Read error: Operation timed out)
[4:24] * john_barbee (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130521223249])
[5:01] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[5:03] * Vanony (~vovo@i59F7A4DD.versanet.de) has joined #ceph
[5:10] * Vanony_ (~vovo@i59F7A034.versanet.de) Quit (Ping timeout: 480 seconds)
[5:54] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:54] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:56] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[6:12] * niklas (~niklas@2001:7c0:409:8001::32:115) Quit (Read error: Connection reset by peer)
[6:25] * niklas (~niklas@2001:7c0:409:8001::32:115) has joined #ceph
[7:02] * Volture (~quassel@office.meganet.ru) Quit (Remote host closed the connection)
[7:12] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) Quit (Remote host closed the connection)
[7:20] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[7:23] <nigwil> http://www.sebastien-han.fr/images/ceph-openstack-update-roadmap-havana.jpg
[7:23] <nigwil> :-)
[7:38] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[7:39] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[7:42] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[7:47] * tziOm (~bjornar@ti0099a340-dhcp0745.bb.online.no) Quit (Remote host closed the connection)
[7:59] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[8:00] * sh_t (~sht@NL.privatevpn.com) has joined #ceph
[8:06] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:07] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[8:08] * kyle__ (~kyle@ Quit (Read error: Connection reset by peer)
[8:08] * kyle__ (~kyle@ has joined #ceph
[8:13] * LeaChim (~LeaChim@ has joined #ceph
[8:15] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[8:16] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:19] * Volture (~quassel@office.meganet.ru) has joined #ceph
[8:21] * blue (~blue@irc.mmh.dk) has left #ceph
[8:35] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[8:35] * dcasier (~dcasier@ has joined #ceph
[8:37] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit ()
[8:39] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[8:39] * pixel (~pixel@ has joined #ceph
[8:40] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit ()
[8:43] <pixel> Hi, Everybody! Is it possible to associate a certain rbd device with the appropriate rbd image (for example: image1 -> /dev/rbd0)?
[8:44] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[8:45] <pixel> We always need to map image1 to /dev/rbd0, image2 to /dev/rbd1 etc. Because I'm using /dev/rbd* in some configuration files.
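One approach, assuming the kernel rbd client and the udev rules shipped with recent ceph packages: udev creates stable `/dev/rbd/<pool>/<image>` symlinks, so config files never have to reference the order-dependent `/dev/rbdN` names. Pool and image names below are examples:

```shell
# The /dev/rbdN numbers depend on map order, but udev also creates
# stable per-image symlinks that can be used in config files instead.
rbd map rbd/image1
rbd map rbd/image2
ls -l /dev/rbd/rbd/image1     # symlink to whichever rbdN was assigned
ls -l /dev/rbd/rbd/image2
```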
[8:50] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[8:58] * fridudad (~oftc-webi@fw-office.allied-internet.ag) has joined #ceph
[9:04] * LeaChim (~LeaChim@ Quit (Remote host closed the connection)
[9:16] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[9:16] * tnt (~tnt@ has joined #ceph
[9:16] <loicd> tnt: good morning :-)
[9:17] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:17] <fridudad> tnt: http://tracker.ceph.com/issues/5232
[9:19] <tnt> fridudad: thanks, I'll watch that closely.
[9:19] <tnt> loicd: morning.
[9:21] * BManojlovic (~steki@ has joined #ceph
[9:24] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:24] * ChanServ sets mode +v andreask
[9:26] <ofu> can I raise the pgnum of a pool after filling data into it? Will this only create empty placement groups? Rebalancing all objects among all pgs is not possible? (I know that this would push around some stuff and is probably not a good idea)
[9:29] <tnt> PG splitting is an experimental feature in cuttlefish.
[9:31] * jgallard (~jgallard@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[9:37] * madkiss (~madkiss@p5DCA3D5C.dip0.t-ipconnect.de) has joined #ceph
[9:39] <ofu> when I add OSDs to a pool, the PGs will be distributed evenly among all OSDs?
[9:45] * madkiss1 (~madkiss@p5DCA3D5C.dip0.t-ipconnect.de) has joined #ceph
[9:45] <tnt> yes
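For reference, the in-place PG split tnt mentions is driven by the pool settings below; the pool name and target count are examples, and on cuttlefish this path was still considered experimental:

```shell
# Experimental on cuttlefish: raise pg_num on a live pool.
ceph osd pool set data pg_num 256
# pgp_num must be raised too, or objects are not rebalanced
# into the newly created PGs.
ceph osd pool set data pgp_num 256
ceph osd pool get data pg_num      # verify the new value
```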
[9:46] * madkiss (~madkiss@p5DCA3D5C.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[9:51] * sha (~kvirc@ Quit (Read error: Connection reset by peer)
[9:56] * san (~san@ Quit (Ping timeout: 480 seconds)
[9:57] * bergerx_ (~bekir@ has joined #ceph
[10:02] * gucki (~smuxi@77-56-36-164.dclient.hispeed.ch) has joined #ceph
[10:05] <gucki> hi guys
[10:12] * ScOut3R (~ScOut3R@ has joined #ceph
[10:12] <Volture> gucki: Hi
[10:20] * eschnou (~eschnou@ has joined #ceph
[10:20] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[10:20] * jgallard (~jgallard@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving)
[10:20] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:31] <gucki> i'd like to upgrade my bobtail to cuttlefish, but afaik the latest cuttlefish (61.2) has some serious bugs. any eta for 61.3? :)
[10:36] <Volture> gucki: 0.61.2 has been working for 2 weeks on my cluster. no bugs seen. ))
[10:37] <gucki> Volture: don't you have any growth of your mon store? there's an open issue in the tracker and someone told me i should wait before upgrading...
[10:37] <gucki> Volture: http://tracker.ceph.com/issues/4895
[10:37] <tnt> yeah, you definitely should wait.
[10:40] <tnt> 0.61.3 is scheduled for this week
[10:40] <Volture> gucki: No ))) 180M for the whole mon store
[10:41] <fridudad> you should wait until 0.61.3 which will def. fix the mon problems. I hope that sage will find a solution for http://tracker.ceph.com/issues/5232 too
[10:41] <Volture> gucki: I do not think that's a lot ))
[10:43] <saaby_> yeah, wait for 0.61.3 - we hit the mon bugs in 0.62.2 too.
[10:44] <saaby_> 0.61.2, sorry ^^
[10:44] <saaby_> speaking of which - have any of you experienced osd segfaults on 0.61.0 - 0.61.2 ?
[10:45] <saaby_> we see them popping up from time to time when the cluster is under heavy write I/O (benchmarks).
[10:46] * eegiks_ (~quassel@2a01:e35:8a2c:b230:50b6:5de:170f:95f2) has joined #ceph
[10:47] * eegiks is now known as Guest783
[10:47] * eegiks_ is now known as eegiks
[10:48] * Guest783 (~quassel@2a01:e35:8a2c:b230:566:484c:c010:7ca6) Quit (Ping timeout: 480 seconds)
[10:52] * DarkAce-Z (~BillyMays@ Quit (Read error: Connection reset by peer)
[10:54] * DarkAce-Z (~BillyMays@ has joined #ceph
[10:54] * saaby_ is now known as saaby
[11:01] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[11:02] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[11:08] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[11:09] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[11:13] <fridudad> saaby_: no. open up a tracker bug with the stack trace, i'm sure somebody will look at them. So we all can benefit from it.
[11:15] <saaby> fridudad: we reported it a few weeks ago here: http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg14498.html
[11:16] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[11:16] <saaby> but that thread died, maybe because we were not sure whether it was caused by excessively large RADOS objects (+1GB).
[11:17] <saaby> turns out though, that the large objects may have exaggerated the problem, but we have now been able to reproduce it with 4MB objects too. (which is the default in e.g. RBD too)
[11:17] <saaby> we will update the mailinglist today. - but should we also open a tracker bug..?
[11:20] <fridudad> saaby: use the tracker; from what i've seen the ceph team works on bugs more through the tracker than the ml
[11:20] <saaby> ok, we'll do that
[11:20] <saaby> thanks
[11:21] <fridudad> saaby: my feeling is... i've not seen it or made any statistic ;-)
[11:21] <saaby> :)
[11:32] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[11:33] <jluis> fridudad, imo, it's easier to follow a bug's status via the tracker than it is with the ml
[11:33] <jluis> I tend to rely on it far more than on the ml, although the ml also serves its purpose
[11:37] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[11:38] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[11:38] <andrei> hello guys
[11:38] <andrei> I was wondering if someone could help me with ceph + kvm?
[11:38] <andrei> i am having some issues with writes
[11:39] <andrei> reading seems to work okay, but writing causes task hang and all sorts if issues with unmounting ceph disk
[11:39] <andrei> and my feeling is that it has something to do with sync/direct writes
[11:42] <absynth> are you using rbd_cache?
[11:42] <absynth> which kvm and ceph version, which kernel?
[11:47] <andreask> yeah ... rbd_cache helps a lot
[11:49] <fridudad> jluis: sure i think the same
[12:02] <ofu> is there some kind of straightforward documentation how to configure ceph as rbd for qemu/kvm with libvirt?
[12:06] <andrei> absynth: I am not using anything which is not default. My setup is as follows: qemu - 1.5.0, libvirt - 1.0.5 compiled from sources. ceph is from ubuntu ppa version 0.61.2
[12:06] <BillK> absynth: I though rbd_cache caused pauses in network, or was that fixed?
[12:07] <andrei> absynth: i've been doing some dd and fio testing yesterday
[12:07] <andrei> the rbd disk has been attached to vm and xfs partitioned
[12:07] <andrei> when I was running the tests without iflag=direct or --iodirect=1 everything was working okay
[12:07] <absynth> BillK: i have no idea if it's been fixed, but the "network pauses" are actually IO stalls
[12:08] <andrei> i was able to carry the tests and remount partition
[12:08] <andrei> however, as soon as i started using the direct flags, the problems happened
[12:08] <BillK> absynth: couldnt remember the detail except it was cache and rbd
[12:08] <andrei> after 2 mins i've started getting kernel messages about blocked tasks
[12:09] <absynth> andrei: stick around for about 2-3 hours and nhm will be awake
[12:09] <absynth> he might have more insight
[12:09] <andreask> ofu: you already found that? http://ceph.com/docs/master/rbd/qemu-rbd/?highlight=kvm
[12:11] <absynth> http://tracker.ceph.com/issues/3737
[12:11] <absynth> this is the issue you were talking about, BillK, right?
[12:22] <andrei> absynth: thanks for the link.
[12:22] <andrei> my vms are started automatically for me from cloudstack and I do believe they use different cache options
[12:22] <andrei> i think it automatically sets cache=none
[12:24] <andrei> yes, indeed, it uses format=raw,cache=none
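A hedged sketch of the distinction being discussed: with qemu 1.2 or later the drive's `cache=` mode is mapped onto rbd_cache, so `cache=none` keeps the rbd cache off and `cache=writeback` turns it on. The image name is an example:

```shell
# cache=none (what cloudstack sets) leaves the rbd cache disabled;
# cache=writeback lets qemu enable it. Image name is hypothetical.
qemu-system-x86_64 \
  -drive format=raw,cache=writeback,file=rbd:rbd/vm-disk1
```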
[12:24] <andrei> wido: hello
[12:24] <andrei> wido: are you online by any chance?
[12:34] * jluis is now known as joao
[12:34] * ChanServ sets mode +o joao
[12:36] * pixel (~pixel@ Quit (Quit: Ухожу я от вас (xchat 2.4.5 или старше))
[12:58] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[13:02] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[13:02] * pressureman (~pressurem@ has joined #ceph
[13:04] <pressureman> I have a question regarding RBD support in KVM. I have set up a 3-node ceph cluster (3 OSDs, 3 monitors) and installed a debian VM to an RBD that I created in the cluster. This all works fine.
[13:05] <pressureman> I want to test the failover capability of this setup. So far it reacts ok to cleanly removing a monitor or OSD with the ceph commands
[13:06] <pressureman> However, if a try to simulate a sudden failure of one of the nodes (e.g. lost network connectivity or kernel panic), the KVM guest hangs until all monitors are back online
[13:07] <pressureman> The KVM / virsh domain XML defines all three ceph monitors
[13:08] <ofu> andreask: yes, read that. But I did not understand what kind of privileges the kvm host needs to authenticate against the ceph cluster. Also: how do I keep the kvm host from talking to only one monitor? What if this monitor is down?
[13:08] <tnt> pressureman: how did you "simulate" ? and are you sure it's the monitor it's waiting for or the OSD ?
[13:09] <pressureman> tnt, i logged into one of the ceph nodes and shut down the eth0
[13:10] <andreask> ofu: you read on on this? http://ceph.com/docs/master/rbd/libvirt/
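On ofu's two questions, a sketch under the conventions in the linked docs: the client key only needs mon read plus osd access to its pool, and the qemu `rbd:` filename can list every monitor so the client is not tied to a single one. Key name, pool, and addresses are hypothetical, and the capability string shown in the docs is more restrictive than this one:

```shell
# Hypothetical client key for the kvm host; the linked libvirt doc
# shows a tighter capability set.
ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow rwx pool=rbd'

# List all monitors in the drive string (';' separated, escaped as '\;')
# so the guest can still reach the cluster if one mon is down:
qemu-system-x86_64 \
  -drive 'format=raw,file=rbd:rbd/vm-disk1:id=libvirt:conf=/etc/ceph/ceph.conf:mon_host=\;\;'
```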
[13:10] * san (~san@ has joined #ceph
[13:10] <tnt> pressureman: so... that shutdown both OSD and MON ... hence your "until all monitors are back online" is just unfounded
[13:11] <pressureman> if a server fails catastrophically, we won't really get a choice whether it's hosting a monitor or osd
[13:12] <pressureman> tnt, my setup is basically 3 ceph nodes, each running one monitor and one OSD
[13:12] * bergerx_ (~bekir@ Quit (Remote host closed the connection)
[13:13] <pressureman> i can seamlessly stop/start monitors/OSDs using the ceph cli tools, all transparent to the KVM guest
[13:13] * bergerx_ (~bekir@ has joined #ceph
[13:13] <pressureman> but if i "pull the plug" on one of the nodes, the KVM guest clearly doesn't like it
[13:16] <tnt> I didn't say it wasn't a problem. I just said your conclusion that the problem was the monitor was wrong.
[13:16] <tnt> I'm pretty sure that if you killed -9 the monitor before replugging eth0, it would work just fine because the VM is waiting on OSD IO and not on the monitor.
[13:16] <pressureman> ok... forgive me, i've only been using ceph for about 12 hours so far
[13:17] <pressureman> so if one of the ceph nodes (monitor + OSD) _were_ to fail catastrophically.... i should expect to see the KVM guest hang?
[13:18] <tnt> If they try to do IO, I'm pretty sure they would, at least for a short time, because it will take some time for ceph to actually detect the failure and mark the OSD as down.
[13:19] <pressureman> it hung for a pretty long time... i mean, i was expecting some kind of timeout to kick in, and things get back to normal... but they didn't
[13:19] <tnt> what does ceph -w said ?
[13:21] <pressureman> well the cluster is in HEALTH_OK state at the moment
[13:21] * n3c8-35575 (~mhattersl@pix.office.vaioni.com) has joined #ceph
[13:21] <pressureman> how should i best try to simulate complete and sudden loss of one of the nodes?
[13:21] <n3c8-35575> hi, was just wondering if someone can help me with a 2013-06-03 11:25:32.108853 b2d9bb40 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
[13:22] <n3c8-35575> I've tried removing and adding the mon with a freshly exported mon key and mon map
[13:23] <tnt> n3c8-35575: my guess is that you don't have the same mon version on all your nodes.
[13:23] <tnt> pressureman: the way you simulate it is fine.
[13:24] <n3c8-35575> tnt: thanks, ill have a looksie
[13:25] <tnt> n3c8-35575: if you use debian/ubuntu apt-cache policy ceph ceph-common and both package need matching version #
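The check tnt describes, spelled out; run it on every node and compare:

```shell
# A mon/client package mismatch can surface as cephx decrypt errors
# like the one pasted above.
apt-cache policy ceph ceph-common   # installed versions must match
ceph --version                      # compare across all nodes
```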
[13:26] <pressureman> tnt, a colleague just pointed me to this http://git.qemu.org/?p=qemu.git;a=commit;h=dc7588c1eb3008bda53dde1d6b890cd299758155
[13:26] <pressureman> is it possible that i'm running into that?
[13:26] <pressureman> ceph nodes are debian wheezy + cuttlefish. the KVM guest is running on my ubuntu raring workstation (qemu 1.4)
[13:27] <n3c8-35575> i am indeed running ubuntu and the stupidity of my actions are becoming clearer by the second
[13:27] <tnt> pressureman: yes and no ... it might be why the entire vm freezes and not just the process doing IO, and it makes the issue worse, but it's not the root cause.
[13:28] <saaby> joao: ping
[13:28] <tnt> n3c8-35575: heh ... how do you think I know what the exact problem is :p I was hit by the exact same thing.
[13:30] <joao> saaby, pong
[13:30] <saaby> joao: hey - did you need anything more from me on those mon crashes?
[13:31] <n3c8-35575> tnt, yup ceph-common was still 0.56
[13:32] <joao> saaby, yeah, your ceph.conf if possible
[13:32] <saaby> sure, hold on
[13:32] <joao> assuming you didn't change it
[13:32] <saaby> didn't
[13:33] <saaby> joao: http://pastebin.com/q5DTE4sQ
[13:33] <joao> saaby, thanks
[13:34] <joao> saaby, have you had any other issues since?
[13:34] <pressureman> tnt, just killed eth0 on my ceph02 node, and the guest didn't notice a thing.... :-|
[13:35] <pressureman> ceph -w shows one monitor down, one OSD down
[13:35] <pressureman> active+degraded
[13:36] <saaby> joao: I have seen a monitor falling out of quorum, and then rejoining on load a few times.
[13:36] <pressureman> i guess i'll try bringing the failed mon/osd back up, and killing the other two, one at a time
[13:36] <joao> saaby, what branch are you on?
[13:36] <saaby> other than that only segmentation faults on osd's during heavy load - but thats another story.. :)
[13:37] <saaby> hold on
[13:37] <saaby> Esmil: you can answer that better than me. :)
[13:38] <joao> well, I have to go prepare something for lunch; should be back in an hour or so
[13:38] <Esmil> joao: We're on the cuttlefish branch, commit 0e127dc4ef16d19..
[13:43] <andrei> I was wondering if anyone uses cephfs or ceph-fuse to mount their ceph partitions?
[13:43] <andrei> what are the mount options that you use?
[13:43] <andrei> i am having some performance issues with ceph-fuse mounting
[13:44] <andrei> in particular if I am reading multiple files at the same time
[13:44] <andrei> performance drops to like 2mb/s per dd process
[13:44] <andrei> on the other hand if I am reading the same file with multiple threads, I am seeing around 100mb/s throughput
[13:45] <jerker> for my test clusters I have been using the kernel client and never looked back.
[13:46] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[13:47] <jerker> What do you mean by "mb/s"? Mbit/s? MByte/s? Reading same file with multiple threads is just reading from RAM somewhere I guess.
[13:48] * san (~san@ Quit (Quit: Ex-Chat)
[13:48] <jerker> Need more input... Write a file from one node, read it from the other... Or read a large chunk from one machine (much larger than RAM) and write it to CephFS.. etc.
[13:49] <jerker> For my testing performance is usually what I expect, currently limited by interconnect bandwidth and rotating drive IOPS. I hope to be able to test with some SSD for cache. Has not had time to yet. :(
[13:51] <andrei> jerker: megabytes / s
[13:51] <jerker> Another thing: Has anyone tried to run Ceph OSD nodes in HP Proliant Microserver? I was looking around for storage nodes in around 90 USD/disk (70 EUR/disk) and it seems like they should be acceptable. Looking for rackmonted one from some other manufacturer... I have not really found agood one yet.
[13:52] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[13:52] <andrei> jerker: i am clearing cache on the storage servers and the client before running my tests
[13:52] <andrei> but as you've said
[13:53] <andrei> are there any mountpoint options that would help with performance?
[13:53] <jerker> andrei: I usually check drive utilization with "iostat -x 2" on the OSD nodes. It seems like Ceph is quite heavy in IOPS even when the files are large.
[13:54] <jerker> andrei: do you use large files? is the OSD-nodes limited by IO-utilization?
[13:54] <jerker> andrei: or is the network the limit (check with "dstat" for example)
[13:55] <jerker> andrei: I do not use any special options.
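jerker's diagnostics above, put together as one session; the mount point and file names are examples:

```shell
# On the client: several concurrent direct readers against a
# ceph-fuse mount (the slow case andrei describes).
dd if=/mnt/ceph/big1 of=/dev/null bs=4M iflag=direct &
dd if=/mnt/ceph/big2 of=/dev/null bs=4M iflag=direct &
wait
# Meanwhile, on each OSD node:
#   iostat -x 2    -> per-drive utilization / IOPS saturation
#   dstat          -> whether the network or the disks are the limit
```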
[13:55] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:55] <n3c8-35575> tnt, hey just wanted to say thanks.... it was that, plus a couple of induced issues (caused by me trying to fix it by add/removing lots of mons)
[13:56] <andrei> jerker: there is no issue with the osds or the network as I am happily reaching 600mb/s speeds when running benchmarks with rbd drives
[13:56] <andrei> and also with rados benchmarks as well
[13:57] <andrei> i have performance issues when using ceph-fuse
[13:57] <jerker> andrei: you need to check while the cluster is slow, to see how it behaves. the CephFS has not been tuned as much as RBD as far as I understand.
[13:59] <jerker> andrei: one obvious thing that helped performance while writing, limited by drive IOPS, is to set the replicas to two instead of three. But that is obvious.
[13:59] <jerker> i am the department of redundancy department
[13:59] <andrei> ))
[14:00] <andrei> in my PoC I am currently using 2 replica
[14:00] <jerker> yay
[14:00] <andrei> and I also have ssd for journaling
[14:00] <jerker> cool
[14:00] <andrei> so, my write speed is limited to the speed of the ssd disk
[14:00] <andrei> my reads are pretty fast using rbd
[14:00] <andrei> I am seeing around 600mb/s from the client
[14:01] <andrei> which seems to be the limit at the moment
[14:01] <andrei> coz from the server I am seeing speeds in excess of 1.5gb/s
[14:01] <andrei> i guess i need to look at the ipoib bottleneck
[14:02] <andrei> however, the ceph-fuse seems to be rather slow in places
[14:02] <jerker> you have tried the kernel cephfs client?
[14:02] <andrei> no, i have not
[14:02] <andrei> that's another thing i wanted to check
[14:02] <pressureman> ipoib? someone is using that?
[14:02] <darkfaded> welcome to 2013 :)
[14:02] <andrei> pressureman: yes, I am
[14:02] <jerker> do that first. I have not tried the fuse-client in a long time.
[14:02] <pressureman> RDMA support is planned, is it not?
[14:03] <andrei> jerker: how do I enable this? coz when i try to mount with mount -t ceph i get an error message that it's not supported by kernel
[14:03] <pressureman> we have a substantial investment in infiniband (mellanox) gear, and would be quite excited to see RDMA support added
[14:03] <andrei> do I need to install something in addition
[14:03] <andrei> pressureman: IB is very nice, but at the moment, as far as I know rdma is not supported
[14:03] <andrei> there has been some testing around rsockets with ceph
[14:03] <jerker> andrei: I downloaded and installed more or less manually a new Linux kernel on top of Scientific Linux 6.4 x64...
[14:03] <darkfaded> pressureman: i haven't heard about plans for rdma support that are like, on the roadmap
[14:04] <darkfaded> i am asking about it for almost 2 years now i think
[14:04] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[14:04] <andrei> jerker: is ceph a part of a kernel now, or do I need to compile the module?
[14:04] <jerker> andrei: it is part of the kernel
[14:04] <andrei> does anyone know how to get ceph kernel driver working so that I can mount with mount -t ceph ?
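For completeness, a typical kernel-client mount, assuming a kernel built with CONFIG_CEPH_FS (the stock 2.6.32 centos kernel predates the client, hence the error). Monitor addresses and the secret file path are examples:

```shell
modprobe ceph                      # fails if the kernel lacks CephFS support
mount -t ceph, /mnt/ceph \
  -o name=admin,secretfile=/etc/ceph/admin.secret
```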
[14:04] <pressureman> certainly our IB QDR network is faster than a 10 GigE.... but using a (relatively) slow IP stack on infiniband makes me sad
[14:04] <darkfaded> yep
[14:04] <darkfaded> especially for latency stuff like mon-mon
[14:05] <andrei> pressureman: yeah, me too
[14:05] <andrei> i am on qdr and have to live with ipoib for the time being
[14:05] <pressureman> what sort of throughput are you getting?
[14:06] <andrei> does anyone know how to compile ceph kernel module for centos 6.4 kernel?
[14:06] <andrei> pressureman: i am getting (using rbd) around 600mb/s
[14:06] <jerker> andrei: I got a whole new kernel... I wanted the latest and greatest btrfs etc.
[14:06] <pressureman> hmm... i guess that's acceptable... but slower than what we currently get with SRP
[14:07] <pressureman> i guess it could also be that our disks are faster than yours ;-)
[14:07] <ofu> i tried centos and noticed wrong paths in ceph-deploy, is there a bug for this?
[14:07] <andrei> pressureman: are you using centos or ubuntu?
[14:07] <pressureman> our SRP storage servers are gentoo
[14:07] <andrei> pressureman: the 600mb/s is coming directly from ram
[14:07] <jerker> andrei: I do not have my .config (cluster shut down currently) but just find ceph in the config, activate it with whatever else you need in the kernel.
[14:08] <pressureman> andrei, we mostly use debian elsewhere though
[14:08] <andrei> pressureman: what drivers do you use with debian? I was having lods of issues getting infiniband to work with ubuntu
[14:09] <pressureman> afaik, just the mainline drivers... all our HCAs and switches are mellanox
[14:10] <andrei> does anyone know how to backport ceph kernel module to centos 6.4 with 2.6.32 kernel?
[14:10] <andrei> pressureman: lucky you )))
[14:10] <andrei> i am using qlogic kit and it's been a nightmare
[14:10] <andrei> i can't make it work with ubuntu
[14:11] * Vjarjadian (~IceChat77@ Quit (Quit: If at first you don't succeed, skydiving is not for you)
[14:11] <andrei> the f*kers don't get the link even though the card is recognised by the os
[14:12] <pressureman> andrei, we used to have some qlogic switches, but they were fairly troublesome, so we replaced them all with mellanox
[14:12] <pressureman> it's a bit pricier, but rock solid
[14:16] <jerker> andrei: no. Sorry. i feel the pain. that's one of the reasons I try to keep to stock kernels. I was quite irritated when my sweet areca cards were not included in the CentOS kernel. Now they are.
[14:24] * tnt (~tnt@ has joined #ceph
[14:36] <ofu> jerker: I thought, ceph would be fine with cheap jbods and ssds for logging, no hw raid required?
[14:38] <jerker> ofu: yes. (the areca cards are on my current production file servers running ext4 and NFS)
[14:46] * Maskul (~Maskul@host-89-241-171-211.as13285.net) has joined #ceph
[14:46] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) has joined #ceph
[14:47] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[14:48] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:49] <Maskul> hey guys, imagine a scenario where you have two identical images, image A and image B
[14:49] <Maskul> you create a cow file based on image A, afterwards you delete image A
[14:49] <Maskul> would it be possible to use the cow file using image B?
[14:55] <jerker> Maskul: I have only used it in ZFS; there you cannot delete an original file system with active clones. You need to promote the clone first. I don't know in Ceph.
[14:57] * gucki_ (~smuxi@77-56-36-164.dclient.hispeed.ch) has joined #ceph
[14:57] * gucki_ (~smuxi@77-56-36-164.dclient.hispeed.ch) Quit (Remote host closed the connection)
[14:57] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) Quit (Read error: Operation timed out)
[14:57] <Maskul> yeah sorry, i accidentally posted it on this channel, it wasn't meant for this one. silly me, i need more sleep :(
[14:58] <jerker> no probs. What topic was it intended for?
[14:58] <Maskul> for qemu
[14:58] <jerker> ah
[14:59] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:05] * Engur (~root@ has joined #ceph
[15:05] <Engur> hello.. i have a question
[15:05] <Engur> any1 online?
[15:06] <jerker> shoot
[15:06] * jerker trying best effort
[15:07] <Engur> i lost my metadata disks.. i have only OSD's... is there any way to rebuild my ceph from only OSD's???
[15:07] <pressureman> jerker, interesting, i never knew you could promote a clone... i haven't done any serious ZFS for at least 2 years.... last time i used it was around opensolaris 134
[15:09] <jerker> pressureman: I started to use ZFS two years ago when it became mountable in Linux. :)
[15:10] <jerker> pressureman: still probably is not as good as it was in Solaris or is in BSD, but good enough for my disk backups.
[15:10] <Engur> any possible answers?
[15:11] <pressureman> i was recently tempted to use zfs on a 26TB array on linux... but wasn't quite convinced of its stability
[15:11] <jerker> Engur: sorry. Hmm. I have not done that. I am no developer. But the idea is that the MDS serves as a fast cache and then stores data into the OSDs. So my guess is that it is possible.
[15:12] <jerker> Engur: not only cache, take care of locking, metadata, balancing stuff like that.
[15:12] <pressureman> ended up just going hw raid6 and lvm2 (storing a mixture of ext4 and raw block devs exported over iscsi for VMs)
[15:12] <tnt> Engur: what do you mean "metadata disk"
[15:12] <tnt> you mean the MDS servers for cephfs ?
[15:12] <Engur> in other words, i lost all my MDS servers
[15:12] <Engur> yes..
[15:12] <Engur> now, all i have OSD servers
[15:12] <tnt> well ... you should have MON servers as well
[15:12] <Engur> so can i rebuild my ceph from only them?
[15:13] <Engur> i lost them too..
[15:14] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[15:14] <jerker> pressureman: it is still stabilizing, i was finding bugs when rsyncing in 70 million files. A couple of months later patches popped up fixing some of the deadlocks but I had to run Ext4 instead. At least that works. :) But i like ZFS.
[15:14] <joao> Engur, if you lost all your monitors, then I don't think there's anything you can do
[15:15] <tnt> Engur: ask on the ml, but I'm not sure you can reconstruct the cluster.
[15:15] <joao> but yeah, what tnt said
[15:15] <joao> ask on the ml
[15:15] <tnt> Engur: and this is cephfs data you want to recover ? or RBD or RGW ?
[15:16] <pressureman> jerker, yeah i'm also a big fan of zfs (and a few other bits of solaris tech)... it's bittersweet satisfaction to see a lot of zfs features ending up in btrfs... i look forward to the day we no longer need lvm for snapshotting, and no longer have to think about fixed partition sizes
[15:16] <Engur> cephfs
[15:16] <Engur> i was using only cephfs
[15:17] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[15:17] <tnt> Engur: it might be possible to write some custom tool to scan the osd dir and extract data manually, but even if this is possible, I'm pretty sure it doesn't exist ATM so you'd have to write it and that looks like a pretty big job.
[15:18] <tnt> How did you manage to lose all your monitors ...
[15:18] <Engur> tnt: fatal power supply error :(
[15:19] <Engur> btw, thanks for your answers..
[15:19] <janos> single monitor?
[15:19] <jerker> pressureman: lvm has bitten me enough, even though it was a long time ago, that I still do not trust it. It is probably not very wise of me. With my limited experience of ZFS it has not lost data on me yet.
[15:19] <joao> I suppose it would be possible to grab all objects from the osds, but I doubt that is likely to recover the cluster though
[15:19] <joao> no maps, no joy
[15:20] <joao> I think I've heard something about osds keeping a cached version of the map, but I'm not sure that's even accurate; asking on the ml or hanging around until later might be best
[15:20] <tnt> joao: yeah, maybe recover all objects and re-inject them in a new cluster
[15:20] <Engur> janos: 3 of them, on the same powerline
[15:20] <joao> Engur, and all disks just fried?
[15:20] <janos> ah. that hurts. gotta separate the failure domains
[15:20] <jerker> joao: This is a stupid question of me, but why cannot the cluster map be recreated with the same topology?
[15:21] <joao> jerker, it can, but the info kept in the maps is what lets you map the cluster's contents
[15:21] <Engur> joao: after power failure, mons and mds were dead, only osds were alive (another powerline)
[15:21] <joao> without those maps, you have raw data sitting in the cluster without much to make sense of it
[15:22] <joao> pretty much like losing the superblock of a traditional filesystem, if you allow me the rough analogy
[15:22] <jerker> joao: i still do not understand. Isn't the input to CRUSH the topology of the cluster, and the output the map?
[15:22] <tnt> Engur: and you can't read anything out of the hdd ?
[15:23] <darkfaded> joao: worse, that would be easier to fix :)
[15:23] <joao> Engur, you mean dead as in down, or dead for good?
[15:23] <Engur> tnt: yeah, they're gone. (at least from normal operations... maybe, a high-tech company could recover from those hard disks)
[15:23] <joao> jerker, you have to give CRUSH a hash of the object's name (iirc) and a map
[15:24] <joao> that map allows crush to pinpoint where that object is supposed to live in that given epoch
[15:24] <joao> maps have multiple epochs
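joao's point can be sketched with a toy placement function: locating an object takes both a hash of its name and a map from placement groups to OSDs, so raw OSD data without the maps is hard to make sense of. This is only an illustration; real CRUSH uses rjenkins hashing and a weighted device hierarchy, not cksum, and the PG-to-OSD table below is made up.

```shell
# toy placement: object name -> hash -> pg, then pg -> osds via a "map"
pg_num=8
pg_of() { printf '%s' "$1" | cksum | awk -v n="$pg_num" '{ print $1 % n }'; }

# the map: which OSDs serve each PG (invented assignment for illustration)
osds_for_pg() {
  case "$1" in
    0|4) echo "osd.0 osd.1" ;;
    1|5) echo "osd.1 osd.2" ;;
    2|6) echo "osd.2 osd.0" ;;
    *)   echo "osd.0 osd.2" ;;
  esac
}

pg=$(pg_of "rbd_data.1234")
echo "object -> pg $pg -> $(osds_for_pg "$pg")"
```

Without `osds_for_pg` (the map), the hash alone tells you nothing about where the object lives, which is why losing every monitor is so painful.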
[15:26] <jerker> too complicated for me. (exits to meeting) :)
[15:27] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[15:28] <Engur> thanks anyway
[15:29] <saaby> joao: did you get everything you needed?
[15:29] <joao> saaby, I suppose so
[15:30] <saaby> ok
[15:30] <joao> you hit a weird crash that I've been trying to figure out on-and-off
[15:30] <joao> not critical, just weird
[15:30] <saaby> :)
[15:30] <saaby> ok
[15:30] <saaby> can you reproduce?
[15:30] <joao> nope
[15:30] <joao> lol
[15:30] <saaby> hah
[15:31] <joao> this one: http://tracker.ceph.com/issues/5205
[15:31] <saaby> ok
[15:33] * PerlStalker (~PerlStalk@ has joined #ceph
[15:36] <joao> saaby, you sure you didn't change anything in your ceph.conf in-between triggering those crashes and getting the current cluster running?
[15:37] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[15:38] <saaby> joao: we might have changed an osd timeout or two, but nothing else.
[15:38] <saaby> is something fishy?
[15:38] <joao> naa, ceph.conf is right
[15:38] <maximilian> hi folks, how can degraded mds cluster repaired?? ceph health returns "HEALTH_WARN mds cluster is degraded" on 2 node cluster
[15:38] <joao> I'm just annoyed I can't seem to see what could have caused this
[15:39] <saaby> ok
[15:39] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit ()
[15:39] <saaby> I couldn't restart or even rebuild that mon at the time.. eventually I deleted the test pool, and recreated that. - after that I could recreate the mon without problems
[15:41] <joao> well, I'll sit on this bug a little while longer
[15:41] <saaby> venturing a wild guess; maybe that pool somehow got hosed (which triggered the mon fail) by the fairly frequent osd segfaults we had during that time
[15:41] <joao> maybe use it as an excuse to add some debug output to this function in case of error
[15:41] <saaby> right
[15:42] <joao> saaby, this particular assert is some issue with network-related configuration options
[15:42] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[15:42] <joao> how, why or what I don't know
[15:42] <andrei> hello guys
[15:42] <saaby> we haven't seen those segfaults in a while now, but today they resurfaced after 3-4 days of constant rados bench loops.
[15:42] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[15:43] <andrei> does anyone know if i can run ceph on two different networks at the same time?
[15:44] <joao> andrei, ceph allows you to have a public and a cluster network (for osd communication, such as replication), if that's what you mean...
[15:45] <saaby> andrei: and your servers don't need to be in the same layer two domain, if that what you mean..
[15:45] * loicd1 is now known as loicd
[15:45] <andrei> joao: that's not exactly what I am after
[15:45] <andrei> i've got two lan networks
[15:45] <andrei> and i would like ceph to be able to work for both lans
[15:46] <andrei> each server has two nics
[15:46] <andrei> and so do the clients
[15:46] <andrei> i've got an infiniband network as well as dual 1gbit/s nics
[15:47] <saaby> joao: network-related <- wow.. that is really strange, as deleting the data pool "solved" the problem.
[15:47] <andrei> what i would like to achieve in the end is that if there is an issue with the infiniband network, the clients should be able to use ethernet to get to the data
[15:48] <pressureman> andrei, what about configuring the two eth nics as 802.3ad (LACP) trunk? that's what we do here
[15:48] <pressureman> you get load balancing and high availability at the same time
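pressureman's 802.3ad suggestion looks roughly like this on Debian-style systems with the ifenslave package. Interface names and the address are placeholders, and `bond-mode 802.3ad` also requires LACP to be configured on the switch side:

```
# /etc/network/interfaces fragment (sketch; names/addresses are assumptions)
auto bond0
iface bond0 inet static
    address
    netmask
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate fast
```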
[15:48] <andrei> pressureman: that is what we currently use, but in ALB mode
[15:49] <andrei> however, i would like our ib network to serve data most of the times
[15:49] <andrei> so, at the moment ceph runs over our ipoib net
[15:50] <pressureman> hmm.... i see... and bridging ib and eth interfaces probably isn't a good idea (assuming it's even possible)
[15:50] <andrei> i am not sure you can bond and eth and ib interfaces ))
[15:50] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[15:51] <andrei> so, it would be good having ceph running on both eth and ib interfaces at the same time
[15:51] <pressureman> no, you cannot bond them (at least not 802.3ad)... but you can perhaps bridge them
[15:51] <andrei> at the moment ib can't be bridged
[15:51] <darkfaded> you can make an active/passive bond but it would be one of the most ugly things ever done
[15:51] <andrei> yeah, i do not want to do that
[15:52] <andrei> does anyone know if ceph could listen on multiple network interfaces?
[15:52] * jahkeup (~jahkeup@ has joined #ceph
[15:53] * drokita (~drokita@ has joined #ceph
[15:54] * yehudasa_ (~yehudasa@2602:306:330b:1410:acb2:c167:19bd:1176) has joined #ceph
[15:56] <joao> saaby, I meant something with regard to cluster and public network and/or addresses
[15:56] <joao> hence why I asked so many times if you had changed anything in your ceph.conf :)
[15:56] <joao> anyway, I'll keep an eye out for similar issues
[15:56] <saaby> right :)
[15:56] <saaby> ok
[15:56] <joao> doesn't appear critical as no one else seems to be hitting that
[15:57] <saaby> I will let you know if it resurfaces
[15:57] * portante (~user@ has joined #ceph
[15:57] <joao> thanks
[15:57] <saaby> agreed.
[16:03] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) has joined #ceph
[16:11] * tziOm (~bjornar@ has joined #ceph
[16:13] <andrei> i can see in global options that you can specify multiple cluster network ips
[16:13] <andrei> however, i can't find any references to how i specify multiple ips for mon and osd
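For reference, the public/cluster split joao described earlier is set in ceph.conf like this (subnets are placeholders). Note it separates client traffic from OSD replication traffic; it does not make a daemon answer clients on two LANs, which is what andrei is actually after:

```
[global]
    public network =
    cluster network =
```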
[16:15] * jgallard (~jgallard@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[16:17] <via> the new ceph mons seem to require connecting to a quorum for the init script to get them to start
[16:17] <via> which, when i have 3 and all 3 went down, i can't get two of them to start up since there is no quorum
[16:17] <via> has anyone else experienced this?
[16:19] <joao> that seems way too odd
[16:19] <joao> never heard of it
[16:20] * joao looks for the init script in the sources
[16:20] <loicd> in https://github.com/ceph/ceph/blob/master/src/osd/osd_types.h#L435 there is epoch and version. I'm under the impression that version is not reset to zero when epoch increases
[16:20] <loicd> because of
[16:20] <via> well, i don't know exactly where it is, but i know before the init script just started the daemons, and now the init script sits scrolling faults being unable to connect
[16:20] <loicd> https://github.com/ceph/ceph/blob/master/src/osd/PGLog.cc#L482
[16:21] <loicd> which only compares the version and not the epoch
[16:23] <joao> loicd, afaict, the version is supposed to be incremented whenever the epoch increases (see eversion_t::inc())
[16:23] <joao> then again, not sure what I'm looking at
[16:23] <joao> so no idea whether the logic makes sense or not
[16:23] <loicd> :-) thanks for the hint anyway
[16:24] * joao (~JL@ has left #ceph
[16:24] * joao (~JL@ has joined #ceph
[16:24] * ChanServ sets mode +o joao
[16:24] <joao> yeah, right ctrl-w, wrong window
[16:25] <loicd> :-)
[16:25] <loicd> I noticed inc() but I'm not sure it's used consistently. And the comparison operators would work even if .version was reset when epoch increases.
[16:25] <loicd> https://github.com/ceph/ceph/blob/master/src/osd/osd_types.h#L491
[16:25] <ccourtaut> :)
[16:26] <joao> loicd, can you give me a bit of background on eversion_t? :)
[16:26] * loicd trying to make sense of https://github.com/ceph/ceph/blob/master/src/osd/PGLog.cc#L483 assuming eversion_t::version is only increased thru ::inc()
[16:27] <via> so, https://pastee.org/jyvas -- but the problem is this mon was converted weeks ago and was working until it crashed and now gets that
[16:27] <loicd> joao: I'm trying to understand this specific line of code joao :-)
[16:28] * fridudad (~oftc-webi@fw-office.allied-internet.ag) Quit (Remote host closed the connection)
[16:28] <loicd> https://github.com/ceph/ceph/blob/master/src/osd/PGLog.cc#L452 sets the lower bound
[16:28] * doubleg (~doubleg@ has joined #ceph
[16:29] <via> note thats with debug mon=20, doesn't seem to get much more than that
[16:29] <ccourtaut> loicd: i see your point: why is the version inside the eversion_t compared, instead of the eversion_t itself via its comparison operator?
[16:29] <joao> loicd, any idea whether an epoch has multiple versions, or if it's versions that are supposed to have multiple epochs? or if they are unrelated altogether?
[16:30] <joao> via, how did it crash?
[16:31] * fridudad (~oftc-webi@fw-office.allied-internet.ag) has joined #ceph
[16:31] <via> honestly, this was all 2 weeks ago and i just haven't had time to mess with it -- my mons were crashing hourly. i'll see if i can find logs
[16:31] <loicd> joao: I don't know when epochs increase.
[16:31] * loicd looking
[16:33] <joao> from eversion_t::inc() I'd say that they're somewhat independent (inc() always increases version, and only updates epoch if e > epoch), which would make that operator<() have a bit more of a sense to it
[16:33] <loicd> right
[16:34] * drokita1 (~drokita@ has joined #ceph
[16:34] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[16:36] <via> is it possible to, with just one working mon out of 3, recreate the other two?
[16:37] <joao> loicd, granted I didn't look through the code enough to be sure I'm right, I'd say that it is assumed that the epoch is the same for all the entries in 'log'; only way that makes sense, but sjust would probably know better (as he usually does)
[16:37] <joao> via, yeah
[16:38] <loicd> hum interesting
[16:38] <joao> if you have a working monitor, it is possible
[16:38] <joao> my suggestion though is to back them all up first
[16:38] <via> yeah, definitely
[16:38] <joao> if you're hitting some nasty assert, better be safe than sorry
[16:38] <via> although keep in mind i can't get the one 'working' one to start
[16:38] <via> because of no quorum
[16:38] <joao> yeah
[16:38] <joao> that's easily fixed by injecting a new monmap
[16:39] <joao> shut them down, backup your mons, generate a new monmap with just that one monitor, inject it to the one monitor that works, restart it
[16:39] <joao> then just add new monitors to the cluster
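joao's recipe can be sketched as below. It is a dry run that only prints the commands (drop the `run` wrapper to execute for real); the monitor name `a`, its address, and the paths are assumptions, and the fsid is the one visible in via's log. Back up all mon data directories first, as joao says.

```shell
# dry-run sketch of single-surviving-monitor monmap recovery; prints only
run() { echo "$@"; }

fsid="4011ed30-6181-4e94-a807-c5f7b181a254"   # fsid from via's log

# 1. build a fresh monmap containing only the surviving monitor "a"
run monmaptool --create --clobber --fsid "$fsid" --add a /tmp/monmap
# 2. inject it into the surviving monitor's store, then restart it
run ceph-mon -i a --inject-monmap /tmp/monmap
run service ceph start mon.a
# 3. once it has quorum of one, add replacement monitors as usual
```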
[16:39] <via> ok
[16:40] <via> this won't result in data loss?
[16:40] <via> i guess i don't fully understand the monmap, i'll read up
[16:40] <joao> if that monitor is fine, then no
[16:40] <joao> the monmap basically tells the clients, the osds and sorts, and even the monitors where the monitors are
[16:42] <via> should i be using generate with --fsid specified?
[16:43] <joao> yeah, my guess would be 4011ed30-6181-4e94-a807-c5f7b181a254
[16:43] <via> yeah
[16:43] <joao> (from your log)
[16:43] <via> okay, i'll give it a shot in a few hours, thank you
[16:43] <joao> hope it works
[16:44] <joao> let us know how it goes :)
[16:45] * scheuk (~scheuk@ has joined #ceph
[16:55] * Wolff_John (~jwolff@vpn.monarch-beverage.com) has joined #ceph
[16:58] * dignus (~dignus@bastion.jkit.nl) has joined #ceph
[17:00] * tziOm (~bjornar@ Quit (Remote host closed the connection)
[17:01] <joao> hey nlopes, noticed that your part message from a couple days ago was from .pt ; where are you from? :)
[17:02] * madkiss1 (~madkiss@p5DCA3D5C.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[17:15] * xiaoxi1 (~xiaoxi@shzdmzpr02-ext.sh.intel.com) has joined #ceph
[17:17] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:18] * yehudasa_ (~yehudasa@2602:306:330b:1410:acb2:c167:19bd:1176) Quit (Ping timeout: 480 seconds)
[17:21] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:23] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[17:32] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:33] * drokita1 (~drokita@ Quit (Quit: Leaving.)
[17:37] * drokita (~drokita@ has joined #ceph
[17:42] * eschnou (~eschnou@ Quit (Ping timeout: 480 seconds)
[17:47] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[17:48] <Jakdaw> What are tdump files in /var/log/ceph/ ?
[17:49] <joao> leveldb transaction dumps from the monitors
[17:49] <joao> I presume
[17:49] <Jakdaw> ah /me finds bug #5024
[17:50] * danieagle (~Daniel@ has joined #ceph
[17:50] * gregaf (~Adium@ Quit (Quit: Leaving.)
[17:54] * gregaf (~Adium@2607:f298:a:607:3028:f346:9be8:3220) has joined #ceph
[17:58] * jgallard (~jgallard@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving)
[17:59] <mikedawson> Jakdaw: upgrade to 0.61.2 http://ceph.com/releases/v0-61-2-released/ and manually delete the tdump
[17:59] * gregaf (~Adium@2607:f298:a:607:3028:f346:9be8:3220) Quit (Quit: Leaving.)
[18:00] <mikedawson> Jakdaw: or set 'mon debug dump transactions = false' under [mon] in ceph.conf if you want to keep 0.61.1 for a bit then restart the ceph-mon process and manually delete the tdump
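mikedawson's workaround as a ceph.conf fragment (then restart the ceph-mon process and delete the old tdump files by hand):

```
[mon]
    mon debug dump transactions = false
```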
[18:04] <Jakdaw> actually I've already updated, so seems I can just remove the old tdumps
[18:04] <Jakdaw> thanks
[18:05] * rturk-away is now known as rturk
[18:06] <jamespage> mikedawson, hey - was it you who was asking about qemu rbd improvements week before last?
[18:06] <mikedawson> jamespage: yes
[18:07] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[18:08] <jamespage> mikedawson, did you manage to raise a bug report? have some qemu packages for raring for testing
[18:08] * BillK (~BillK@124-148-124-185.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:08] <jamespage> http://people.canonical.com/~serge/qemu-rbd-async/
[18:09] <mikedawson> jamespage: I never did get the bug report in. Couldn't figure out how to go through the process via CLI.
[18:09] <mikedawson> jamespage: is there a list of backports included in these packages somewhere?
[18:09] <jamespage> mikedawson: http://people.canonical.com/~serge/qemu-rbd-async/qemu.debdiff
[18:10] <mikedawson> jamespage: great! I'll test these soon.
[18:12] <mikedawson> jamespage: are these wrapped as a PPA somewhere?
[18:16] <jamespage> mikedawson, knowing serge probably not
[18:16] <jamespage> is that a problem?
[18:16] <jamespage> mikedawson, also raised https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1187046
[18:16] <jamespage> could you add some details from an end user perspective? it's going to need some sort of test case for verification as well
[18:18] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[18:18] * xiaoxi1 (~xiaoxi@shzdmzpr02-ext.sh.intel.com) Quit (Remote host closed the connection)
[18:20] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[18:20] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) has joined #ceph
[18:20] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[18:21] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:22] * gucki (~smuxi@77-56-36-164.dclient.hispeed.ch) Quit (Remote host closed the connection)
[18:22] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[18:27] * joshd1 (~jdurgin@2602:306:c5db:310:95e0:cb03:6afc:780d) has joined #ceph
[18:32] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:33] * KindTwo (KindOne@h150.51.186.173.dynamic.ip.windstream.net) has joined #ceph
[18:35] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:35] * KindTwo is now known as KindOne
[18:38] * yehudasa_ (~yehudasa@2607:f298:a:697:acb2:c167:19bd:1176) has joined #ceph
[18:38] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[18:40] * BManojlovic (~steki@fo-d- has joined #ceph
[18:41] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[18:45] * pressureman (~pressurem@ Quit (Quit: Ex-Chat)
[18:56] * Tamil (~tamil@ has joined #ceph
[19:00] * Wolff_John (~jwolff@vpn.monarch-beverage.com) Quit (Ping timeout: 480 seconds)
[19:00] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:01] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[19:11] * noob2 (~cjh@ has joined #ceph
[19:16] * noob2 (~cjh@ has left #ceph
[19:18] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:25] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[19:26] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[19:36] * tkensiski (~tkensiski@27.sub-70-197-10.myvzw.com) has joined #ceph
[19:36] * tkensiski (~tkensiski@27.sub-70-197-10.myvzw.com) has left #ceph
[19:40] * tziOm (~bjornar@ti0099a340-dhcp0745.bb.online.no) has joined #ceph
[19:46] * bergerx_ (~bekir@ has joined #ceph
[19:47] * Tamil (~tamil@ Quit (Quit: Leaving.)
[19:54] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[19:57] * mschiff (~mschiff@port-27667.pppoe.wtnet.de) has joined #ceph
[20:12] * Maskul (~Maskul@host-89-241-171-211.as13285.net) Quit (Quit: Maskul)
[20:16] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[20:18] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[20:20] * Wolff_John (~jwolff@ftp.monarch-beverage.com) has joined #ceph
[20:23] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[20:28] * tkensiski (~tkensiski@ has joined #ceph
[20:28] * tkensiski (~tkensiski@ has left #ceph
[20:32] * yehudasa_ (~yehudasa@2607:f298:a:697:acb2:c167:19bd:1176) Quit (Ping timeout: 480 seconds)
[20:33] * yehudasa_ (~yehudasa@2607:f298:a:697:acb2:c167:19bd:1176) has joined #ceph
[20:42] * Tamil (~tamil@ has joined #ceph
[20:43] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[21:02] * fridudad_ (~oftc-webi@p5B09D25D.dip0.t-ipconnect.de) has joined #ceph
[21:05] * eschnou (~eschnou@168.176-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:13] * eschnou (~eschnou@168.176-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[21:23] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) has joined #ceph
[21:23] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[21:25] * Tamil (~tamil@ Quit (Quit: Leaving.)
[21:28] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) Quit (Quit: Lost terminal)
[21:30] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) has joined #ceph
[21:33] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[21:33] * eschnou (~eschnou@168.176-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:33] * mschiff_ (~mschiff@port-27667.pppoe.wtnet.de) has joined #ceph
[21:34] * mschiff (~mschiff@port-27667.pppoe.wtnet.de) Quit (Remote host closed the connection)
[21:34] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[21:36] <sjust> loicd: you there?
[21:38] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[21:38] * Vjarjadian (~IceChat77@ has joined #ceph
[21:39] <fridudad_> sjust just wanted to let you know that i opened a bug in tracker and sage had already looked at it http://tracker.ceph.com/issues/5232
[21:39] <sjust> yeah
[21:39] * Tamil (~tamil@ has joined #ceph
[21:39] <sjust> thanks
[21:40] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:40] * ChanServ sets mode +v andreask
[21:40] <fridudad_> sjust i thank you guys ;-)
[21:40] <paravoid> that doesn't sound very nice
[21:42] * eschnou (~eschnou@168.176-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[21:43] * The_Bishop_ (~bishop@e177088019.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[21:46] * diegows (~diegows@ has joined #ceph
[21:47] * joshd1 (~jdurgin@2602:306:c5db:310:95e0:cb03:6afc:780d) Quit (Quit: Leaving.)
[21:53] * LeaChim (~LeaChim@ has joined #ceph
[21:58] * joshd1 (~jdurgin@2602:306:c5db:310:2d8f:668d:89ff:fbad) has joined #ceph
[22:04] * eschnou (~eschnou@168.176-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[22:10] <loicd> sjust: I'm here
[22:12] <sjust> loicd: that case I think is impossible, forming email
[22:19] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:22] <loicd> sjust thanks. For a moment I thought it could not be reproduced. And then I found a way. But it seemed so unlikely and contorted that I had doubts :-
[22:22] <sjust> that specific one can't happen since if the most recent entry were a delete, the missing set wouldn't contain anything for that object
[22:22] <sjust> the more general case is more confusing
[22:26] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[22:28] * mjeanson (~mjeanson@00012705.user.oftc.net) has joined #ceph
[22:28] <sjust> loicd: sent an email
[22:28] * loicd reading
[22:30] <loicd> sjust: on a related topic, it is my understanding that in eversion_t the version is guaranteed to always grow and that epoch *could* be ignored when comparing eversion_t ( although it does not hurt to compare with epoch ). Is that correct ?
[22:31] <loicd> I'm asking in the context of https://github.com/ceph/ceph/blob/master/src/osd/PGLog.cc#L483
[22:39] * sjust1 (~sam@ has joined #ceph
[22:39] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Read error: Connection reset by peer)
[22:41] * yehudasa_ (~yehudasa@2607:f298:a:697:acb2:c167:19bd:1176) Quit (Ping timeout: 480 seconds)
[22:46] * sjustlaptop (~sam@2607:f298:a:607:6d1c:c3a8:c2ea:4b6) has joined #ceph
[22:46] <sjustlaptop> loicd: no, you need the epoch as well
[22:47] <sjustlaptop> the version is always increasing, but there might be a (4, 20) and a (5, 20) if the first osd died, the second didn't find out about update 20 and recreated it
[22:47] <sjustlaptop> (4,20) would become divergent
[22:48] <loicd> sjustlaptop: ok, thanks. I'll read again with this in mind. I'm not sure I understand why only the version is compared in this specific case and not the eversion_t object. But it will help me figure it out :-)
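sjustlaptop's example can be made concrete with a toy comparison that, like eversion_t's operator<, orders by epoch first and falls back to version. Comparing versions alone cannot separate his divergent (4,20) from the recreated (5,20). A sketch only; the real type lives in src/osd/osd_types.h.

```shell
# toy eversion: (epoch, version) pairs compared epoch-first
ev_lt() {  # usage: ev_lt e1 v1 e2 v2  -> true if (e1,v1) < (e2,v2)
  [ "$1" -lt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -lt "$4" ]; }
}

ev_lt 4 20 5 20 && echo "(4,20) sorts before (5,20), though versions tie"
ev_lt 4 20 4 21 && echo "same epoch: version breaks the tie"
```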
[22:48] * fridudad_ (~oftc-webi@p5B09D25D.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[22:50] * sjust1 (~sam@ Quit (Ping timeout: 480 seconds)
[22:53] * Esmil (esmil@horus.0x90.dk) Quit (Remote host closed the connection)
[22:55] * espeer (~espeer@105-236-45-136.access.mtnbusiness.co.za) has joined #ceph
[22:55] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[22:55] <espeer> hello, can anyone help me with a cephfs/mds issue? parts of the FS tree seem to deadlock ceph-fuse if I try access them...
[22:58] * _are__ is now known as _are_
[23:01] <davidz> espeer: what version of ceph and ceph-fuse/kernel are you running?
[23:01] * sjustlaptop (~sam@2607:f298:a:607:6d1c:c3a8:c2ea:4b6) Quit (Quit: Leaving.)
[23:02] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:02] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:03] <ccourtaut> yehudasa: Hi
[23:04] * dcasier (~dcasier@ Quit (Ping timeout: 480 seconds)
[23:04] * sjustlaptop (~sam@2607:f298:a:607:6d1c:c3a8:c2ea:4b6) has joined #ceph
[23:05] <ccourtaut> yehudasa: i'm still working on rgw disaster recovery and i would have some questions to ask
[23:05] * sjustlaptop (~sam@2607:f298:a:607:6d1c:c3a8:c2ea:4b6) Quit ()
[23:08] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[23:08] * sjustlaptop (~sam@2607:f298:a:607:6d1c:c3a8:c2ea:4b6) has joined #ceph
[23:11] <jks> after compiling qemu 1.4.2 on CentOS 6.4, a simple qemu-img convert -O rbd fails with a segmentation fault... any ideas?
[23:12] <jks> the resulting rbd is created (can be seen with rbd ls), but it fails before data has been written to it
[23:15] <jks> apparently it crashes in: librbd::close_image(librbd::ImageCtx*) ()
[23:15] <jks> (Ceph 0.56.6)
[23:17] * via (~via@smtp2.matthewvia.info) Quit (Ping timeout: 480 seconds)
[23:18] * cjh_ (~cjh@ps123903.dreamhost.com) has joined #ceph
[23:18] * eschnou (~eschnou@168.176-201-80.adsl-dyn.isp.belgacom.be) Quit (Quit: Leaving)
[23:18] <espeer> qemu live migration also stopped working with the async flush patch
[23:19] <espeer> qemuMigrationCancelDriveMirror:1386 : Unable to stop block job on drive-virtio-disk0
[23:21] <cjh_> does anyone know what caps are required on the radosgw restful api to modify user attributes?
[23:21] <cjh_> i have write=* and read=* and it still denies me
[23:25] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[23:26] <ccourtaut> yehudasa: what is your test setup? and how do you deploy it?
[23:26] <ccourtaut> yehudasa: and is there a repository for the replication agents?
[23:26] * BillK (~BillK@124-148-124-185.dyn.iinet.net.au) has joined #ceph
[23:28] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:29] <jks> figured out my qemu-img segfault... when given -O rbd it crashes, substituting -O raw makes it work
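For the record, the shape jks landed on: rbd output is addressed through the raw format driver plus an rbd: target URI, not `-O rbd`. Pool and image names below are assumptions, and the command is printed as a dry run rather than executed.

```shell
run() { echo "$@"; }   # dry run: print the qemu-img invocation only
run qemu-img convert -O raw disk.img rbd:rbd/disk
```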
[23:35] * Wolff_John (~jwolff@ftp.monarch-beverage.com) Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130511120803])
[23:37] * redeemed_ (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[23:40] <davidz> cjh_: what about looking at radosgw-admin user info output for that user?
[23:41] <cjh_> davidz: yeah i did that and it seems like i have every cap possible.
[23:42] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Remote host closed the connection)
[23:43] <cjh_> davidz: my user has usage=*, user=*
[23:43] <cjh_> is there anything else I can set for caps?
[23:46] <yehuda_hm> cjh_: try 'users=read, write'
[23:47] <cjh_> ok
[23:49] <cjh_> yehuda_hm: are these caps documented somewhere? i had trouble finding them over the weekend
[23:51] <ccourtaut> yehuda_hm: hi
[23:51] <ccourtaut> yehuda_hm: i would have a few questions about rgw disaster recovery
[23:51] * The_Bishop (~bishop@e177088019.adsl.alicedsl.de) has joined #ceph
[23:52] <davidz> espeer: Our guess is that the qemu live migration issue is a bug in qemu; you should contact qemu-devel / file a bug report.
[23:52] <ccourtaut> first of all, how do you test your code? In fact how do you deploy a test setup with master/slave region/zone
[23:54] * jahkeup (~jahkeup@ Quit (Ping timeout: 480 seconds)
[23:55] <ccourtaut> and is there a repository with replication agents code?
[23:56] * sjustlaptop (~sam@2607:f298:a:607:6d1c:c3a8:c2ea:4b6) Quit (Ping timeout: 480 seconds)
[23:57] * Tamil (~tamil@ Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.