#ceph IRC Log


IRC Log for 2013-03-26

Timestamps are in GMT/BST.

[0:17] <dmick> you could compare that upstream with github.com/ceph/ceph-client and see how far off it is
[0:17] * jjgalvez1 (~jjgalvez@ has joined #ceph
[0:18] * jjgalvez2 (~jjgalvez@ has joined #ceph
[0:23] * leseb (~leseb@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Remote host closed the connection)
[0:23] * jjgalvez (~jjgalvez@ Quit (Ping timeout: 480 seconds)
[0:25] * jjgalvez1 (~jjgalvez@ Quit (Ping timeout: 480 seconds)
[0:28] * xiaoxi (~xiaoxiche@ has joined #ceph
[0:30] <dmick> elder: (or anyone): are you aware of a bug like "Mapping an invalid image causes a kernel bug in __kmalloc from ceph_parse_options()"? mailing list user
[0:30] <elder> I'll look at it. Just a minute.
[0:31] <dmick> subj "kernel BUG when mapping unexisting rbd device" on ceph-users; he doesn't say what kernel version
[0:41] <elder> The "rbd map ..." means the rbd CLI is packaging up the arguments into something passed to the kernel.
[0:41] <elder> I have to figure out precisely what was done there.
[0:43] <elder> Hmm. is there an easy way to find out what exactly gets built up by do_kernel_add() in rbd.cc?
[0:43] <dmick> yeah, it's supposed to write pool, image, and optionally snap to /sys/bus/rbd/add, of course
[0:44] <elder> Well yes, but I want to know precisely what happens when provided "rbd add afs254-vicepa"
[0:44] <elder> I mean map
[0:44] <dmick> yeah
[0:44] <dmick> I don't think it's logged. one could strace of course but
[0:45] <dmick> and yeah, there's mon addrs, and name=, and key=, and other stuff there
[0:46] <elder> The kernel parses the input sent to /sys/bus/rbd/add into tokens separated by whitespace.
[0:46] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[0:47] * BManojlovic (~steki@fo-d- Quit (Quit: I'm off, you do what you want...)
[0:49] <dmick> ok so for "rbd map image", I get
[0:49] <elder> Duh. I should have just tried that.
[0:50] <dmick> name=admin,key-=client.admin rbd image
[0:50] <dmick> where I'm sure 'rbd' there is the poolname
[0:50] <elder> key-=
[0:50] <elder> ?
[0:50] <elder> Or did you type that?
[0:50] <dmick> sorry, key=
[0:50] <dmick> I typed; strace gave me a partial or a hex dump
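The string dmick pasted can be pulled apart the way elder describes, by splitting on whitespace. The following is a minimal Python sketch, not the kernel's parser: the function name `parse_rbd_add` is made up for illustration, and it models only the three fields shown in the paste (options, pool, image) plus an optional snapshot name. Per dmick's earlier remark, the real string also carries monitor addresses, which the paste did not include.

```python
def parse_rbd_add(line):
    """Split an rbd 'add' string into options and positional fields.

    Hypothetical helper for illustration; field order follows the
    pasted example: <options> <pool> <image> [<snap>].
    """
    tokens = line.split()  # whitespace-separated tokens
    if len(tokens) < 3:
        raise ValueError("expected at least: <options> <pool> <image>")
    options_token, pool, image = tokens[0], tokens[1], tokens[2]
    snap = tokens[3] if len(tokens) > 3 else None

    # Options are comma-separated key=value pairs, e.g. name=admin
    options = {}
    for item in options_token.split(","):
        key, _, value = item.partition("=")
        options[key] = value
    return {"options": options, "pool": pool, "image": image, "snap": snap}


parsed = parse_rbd_add("name=admin,key=client.admin rbd image")
print(parsed["options"], parsed["pool"], parsed["image"], parsed["snap"])
```

Running it on the pasted line yields `rbd` as the pool and `image` as the image name, which matches dmick's reading of the output.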
[0:50] <elder> It's possible the IP is a little funny.
[0:50] <elder> I'll be back in a bit, dinner is served.
[0:50] <dmick> l
[0:50] <dmick> k
[0:51] * rustam (~rustam@5e0f5b1e.bb.sky.com) Quit (Remote host closed the connection)
[0:55] * xiaoxi (~xiaoxiche@ Quit (Ping timeout: 480 seconds)
[1:04] * rustam (~rustam@5e0f5b1e.bb.sky.com) has joined #ceph
[1:06] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[1:07] <dmick> does anyone happen to know what Danny was using for checking for multiply-included files?
[1:07] <dmick> I have a merge where I'm likely to screw that up and want to doublecheck if it's easy
[1:10] <houkouonchi-home> gregaf: he did? I don't see them
[1:10] <houkouonchi-home> and this proves I don't check IRC very often =P
[1:11] <dmick> https://github.com/tv42/downburst
[1:11] <gregaf> haha, yeah
[1:11] <gregaf> I follow him on twitter and he put it there
[1:11] <houkouonchi-home> oh its his own fork
[1:21] * alram (~alram@ Quit (Quit: leaving)
[1:25] * tore_ (~tore@ Quit (Remote host closed the connection)
[1:26] * jjgalvez2 (~jjgalvez@ Quit (Ping timeout: 480 seconds)
[1:27] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[1:28] * xiaoxi (~xiaoxiche@ has joined #ceph
[1:28] <LeaChim> Does anyone know if there's any documentation on how to get ceph working as a storage backend for hadoop? i.e. in terms of which version of hadoop, where the ceph/hadoop library is and so on.
[1:31] * vata (~vata@ Quit (Ping timeout: 480 seconds)
[1:31] <dmick> there's this, which doesn't really answer either of those: http://ceph.com/docs/master/cephfs/hadoop/
[1:32] <LeaChim> Mhm, indeed, I'd found that page
[1:33] * sagelap (~sage@2600:1012:b013:bce1:c946:d3aa:5e5f:e561) has joined #ceph
[1:35] <dmick> Noah Watkins or Joe Buck are the two Inktankers I would ask, but it appears they've both gone for the day; I suggest email to ceph-users
[1:36] <dmick> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/13339 may be helpful as well
[1:36] <LeaChim> ok, cheers
[1:37] * sagelap1 (~sage@ Quit (Ping timeout: 480 seconds)
[1:37] <mauilion> LeaChim: this might get you started. http://static.usenix.org/publications/login/2010-08/openpdfs/maltzahn.pdf
[1:37] <dmick> that's quite old tho
[1:38] <mauilion> true
[1:43] * sagelap (~sage@2600:1012:b013:bce1:c946:d3aa:5e5f:e561) Quit (Ping timeout: 480 seconds)
[1:45] <LeaChim> I'll have a good look at it tomorrow. Thanks guys.
[1:52] * Cube1 (~Cube@ has joined #ceph
[1:53] * Cube1 (~Cube@ Quit ()
[1:53] * sagelap (~sage@2600:1012:b013:bce1:fc07:6fbc:5271:15d) has joined #ceph
[1:55] * esammy (~esamuels@host-2-103-103-175.as13285.net) Quit (Quit: esammy)
[1:55] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[1:59] * Cube (~Cube@ Quit (Ping timeout: 480 seconds)
[2:03] * LeaChim (~LeaChim@b0fae63d.bb.sky.com) Quit (Ping timeout: 480 seconds)
[2:09] * sagelap (~sage@2600:1012:b013:bce1:fc07:6fbc:5271:15d) Quit (Quit: Leaving.)
[2:13] * rturk is now known as rturk-away
[2:30] * sagelap (~sage@ has joined #ceph
[2:30] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[2:30] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:31] * loicd (~loic@74-94-156-210-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[2:36] * rustam (~rustam@5e0f5b1e.bb.sky.com) Quit (Remote host closed the connection)
[2:39] <elder> dmick, can you help me interpret an osd log file?
[2:39] <elder> Or a piece of one?
[2:40] <elder> Or sagelap maybe?
[2:40] * vata (~vata@216-13-56-3.dedicated.allstream.net) has joined #ceph
[2:40] * vata (~vata@216-13-56-3.dedicated.allstream.net) Quit ()
[2:44] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 19.0.2/20130307023931])
[2:55] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:00] * chutzpah (~chutz@ Quit (Quit: Leaving)
[3:02] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[3:04] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[3:05] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[3:15] * xiaoxi (~xiaoxiche@ Quit (Ping timeout: 480 seconds)
[3:16] * sagelap (~sage@ has joined #ceph
[3:17] * JohansGlock (~quassel@kantoor.transip.nl) Quit (Read error: Connection reset by peer)
[3:18] * xiaoxi (~xiaoxiche@ has joined #ceph
[3:20] * loicd (~loic@c-76-119-91-36.hsd1.ma.comcast.net) has joined #ceph
[3:48] <Psi-jack> heh
[3:49] <Psi-jack> Welp, I'll be doing a presentation on Ceph April 3rd. :)
[3:51] * Romeo (~romeo@ Quit (Read error: Connection reset by peer)
[3:51] * Romeo (~romeo@ has joined #ceph
[4:05] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 19.0.2/20130307023931])
[4:20] * jlogan1 (~Thunderbi@2600:c00:3010:1:51a1:82c7:4e19:4fa) Quit (Ping timeout: 480 seconds)
[5:10] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:11] <xiaoxi> hi
[5:12] <xiaoxi> I have seen 0.56.4 released with "fix potential deadlock when 'journal aio = true' "
[5:12] <xiaoxi> I want to confirm if this fix is included in 0.58?
[5:16] * The_Bishop (~bishop@f052103222.adsl.alicedsl.de) has joined #ceph
[5:30] <davidz1> xiaoxi: The git log shows that this was back-ported to 0.56.4, but it is NOT in 0.58. The tracker item #4079 "osd: journal aio deadlock"
[5:37] <pioto> hi; it seems that, unlike most other commands, `cephfs /mnt/mycephfs/some/path set_layout -p poolname` doesn't work, and you need to give a number... how can you find that pool number?
[5:37] <pioto> rados lspools doesn't show one, for example
[5:37] <pioto> also... is it possible to ask a pool its current 'size' setting? i see you can set it, but... not get it?
[5:37] <dmick> hey elder, you still around?
[5:37] <dmick> I wouldn't expect it
[5:38] <dmick> pioto: you can get it in the current source, but that might be recent
[5:38] <dmick> but ceph -s will show it
[5:38] <dmick> that will also show pool numbers
[5:39] <pioto> ah. `ceph osd lspools` has numbers, too
[5:39] <pioto> i don't see pool numbers in ceph -s
[5:39] <pioto> nor the size setting.. just the number of osds
[5:39] <dmick> sorry. ceph osd dump
[5:40] <dmick> (I had to unbreak my ceph command to verify and I shot off an answer before testing. that'll learn me)
[5:41] <dmick> anyway, in later versions you can ceph osd pool get <poolname> <datum> . Don't quite remember when that went in
[5:41] <elder> I am, but about to go to bed.
[5:41] <dmick> still need help with that osd log?
[5:41] <elder> No, Sage showed up and explained what I could look for.
[5:42] <dmick> ah ok
[5:42] <dmick> he's everywhere, all the time
[5:42] <elder> I did neglect to get back to that parseargs crash. I'll send a message reminding myself to look at that... Is there a bug open at all?
[5:42] <dmick> not AFAIK
[5:42] <dmick> pioto: get <stuff> went in as bug #3869
[5:43] <dmick> in v0.57, looks like
[5:43] <elder> dmick, I liked this from Greg about reviewing patches: "either they're short and simple enough that Sage merged them in the car on the way in"
[5:43] <dmick> that's not hyperbole
[5:45] <elder> Anyway, talk to you tomorrow.
[5:45] <dmick> k gnight
[5:57] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[5:59] <pioto> dmick: hm, ok. so, i got further, following stuff from this thread: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/6148
[6:00] <pioto> but... i don't seem to see any evidence that my writes are going to the pool i chose
[6:00] <pioto> rados df shows 0 bytes on that pool still, even after i wrote some 90M of data to that path
[6:01] <dmick> unfortunately I don't know a lot about cephfs
[6:02] <pioto> well. all this makes me wonder... is it supposed to be "production ready"? i'm not so sure..
[6:02] <pioto> but i didn't see any clear warnings against it in the docs
[6:03] <dmick> http://ceph.com/dev-notes/cephfs-mds-status-discussion/
[6:04] <dmick> but I say I don't know a lot just because I haven't been working with it a lot
[6:04] <dmick> I know many people are using it daily;
[6:05] <dmick> did you try verifying that show_layout shows what you think you set?
[6:06] <pioto> yes. and now all my cephfs commands i try are just hanging :)
[6:07] <dmick> hm. cluster still healthy?
[6:07] <pioto> yes
[6:07] <dmick> well that's annoying
[6:07] <pioto> pioto@ceph-test:~$ sudo ceph -s
[6:07] <pioto> health HEALTH_OK
[6:07] <pioto> monmap e1: 1 mons at {a=}, election epoch 2, quorum 0 a
[6:07] <pioto> osdmap e6: 2 osds: 2 up, 2 in
[6:07] <pioto> pgmap v527: 584 pgs: 584 active+clean; 197 MB data, 1397 MB used, 13302 MB / 16384 MB avail
[6:07] <pioto> mdsmap e5: 1/1/1 up {0=a=up:active}
[6:08] <pioto> note: i'm running in virtualbox
[6:08] <pioto> but... that seems unlikely to be a root cause
[6:08] <pioto> unless maybe time skew messes things up
[6:08] <pioto> (forgot ntpd, it seems)
[6:08] <dmick> you're not mounting the cephfs on the same (virtual)box that's running the cluster, are you?
[6:09] <pioto> no, 2 separate guests
[6:09] <pioto> but, both on the same physical host
[6:09] <dmick> that should be OK, as long as only one VM is running Ceph daemons
[6:09] <pioto> both bridged to my real ethernet device
[6:10] <pioto> yeah
[6:11] <pioto> hm. load is high on the client
[6:11] <pioto> and other io just seems slow on it too
[6:11] <pioto> server seems fine
[6:11] <pioto> lemme just reboot it...
[6:12] <pioto> i also seem to have hit this, but i think that's (probably) unrelated? http://tracker.ceph.com/issues/2754
[6:13] <dmick> hum. that's supposed to have been fixed
[6:14] <pioto> apparently not in whatever i get from a stock ubuntu 12.04 install, plus the ceph-bobtail repo
[6:14] <pioto> ceph -v
[6:14] <pioto> ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
[6:15] <pioto> hm
[6:15] <pioto> hmm.
[6:15] <pioto> server is still 0.56.3? hmm
[6:15] <pioto> lemme check for updates...
[6:15] <dmick> 56.4 was just today
[6:16] <pioto> rebooting the client...
[6:16] <pioto> that 90M file is 0 bytes
[6:16] <pioto> hm
[6:17] <pioto> maybe it just got stuck trying to write all the data there, and failed
[6:17] <pioto> or... i dunno
[6:17] <pioto> i just used 'dd' to make some random file
[6:17] <dmick> hm
[6:18] <pioto> hm. well. at least the client survives a `service ceph restart` on the server
[6:22] <pioto> hrm... Mar 26 01:22:16 ceph-client kernel: [ 562.826058] libceph: wrong peer, want, got
[6:22] <pioto> Mar 26 01:22:16 ceph-client kernel: [ 562.827704] libceph: osd1 wrong peer at address
[6:23] <dmick> yeah, an osd died and came back, probably
[6:23] <pioto> yeah, i did a 'restart' of the whole server, which has the mds, mon, and 2 osds, to pick up an upgrade
[6:24] <pioto> and, now things seem to be grinding to a halt again, io-wise
[6:24] <dmick> so that's the client whinging that the cluster is all new. grinding to a halt, not so much
[6:24] <pioto> anyways, that blog post basically seems to say "this isn't production ready yet", in terms of how i interpret it...
[6:26] <pioto> now, i also notice my test vm's only have a half-GB of ram each... any chance ram exhaustion is an issue here?... swap doesn't seem to be touched at all, though
[6:26] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[6:27] <dmick> 1/2G isn't huge for two OSDs
[6:27] <dmick> but if you're not swapping it shouldn't be the issue
[6:28] * sleinen1 (~Adium@2001:620:0:25:5db7:2ab6:b825:5065) has joined #ceph
[6:30] * Cube1 (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[6:31] <pioto> well. it at least mounts, which is further than i got before
[6:31] <pioto> and i also haven't managed to cause an outright kernel panic yet
[6:32] <pioto> (i did that playing w/ ceph a while ago, i think, with some 'mknod', and then an unmount)
[6:33] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:34] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[6:49] <xiaoxi> davidz1:Thanks
[6:51] <xiaoxi> but reading the code, I don't quite understand why this deadlock doesn't occur every time the OSD starts? in that situation, aio_num is always 0
[7:07] * esammy (~esamuels@host-2-103-103-175.as13285.net) has joined #ceph
[7:14] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:50] * tnt (~tnt@ has joined #ceph
[7:59] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[8:03] * Cube1 (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:04] * ScOut3R (~ScOut3R@5401D8E4.dsl.pool.telekom.hu) has joined #ceph
[8:21] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: A fine is a tax for doing wrong. A tax is a fine for doing well)
[8:27] * ScOut3R (~ScOut3R@5401D8E4.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[8:52] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (Read error: Connection reset by peer)
[9:01] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:08] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[9:13] * verwilst (~verwilst@d5152D6B9.static.telenet.be) has joined #ceph
[9:13] * gerard_dethier (~Thunderbi@ has joined #ceph
[9:15] * jtang1 (~jtang@ has joined #ceph
[9:17] * BManojlovic (~steki@ has joined #ceph
[9:20] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[9:26] * l0nk (~alex@ has joined #ceph
[9:29] * eschnou (~eschnou@ has joined #ceph
[9:39] * leseb (~leseb@ has joined #ceph
[9:44] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:45] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[9:50] * rustam (~rustam@5e0f5b1e.bb.sky.com) has joined #ceph
[9:51] * rustam (~rustam@5e0f5b1e.bb.sky.com) Quit (Remote host closed the connection)
[9:51] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[9:51] * rustam (~rustam@5e0f5b1e.bb.sky.com) has joined #ceph
[9:59] * jantje (~jan@paranoid.nl) Quit (Ping timeout: 480 seconds)
[9:59] * jantje (~jan@paranoid.nl) has joined #ceph
[10:01] * sleinen1 (~Adium@2001:620:0:25:5db7:2ab6:b825:5065) Quit (Quit: Leaving.)
[10:01] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[10:03] * xiaoxi (~xiaoxiche@ Quit (Ping timeout: 480 seconds)
[10:05] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Read error: Operation timed out)
[10:06] * rustam (~rustam@5e0f5b1e.bb.sky.com) Quit (Remote host closed the connection)
[10:07] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[10:09] * Rocky (~r.nap@ has joined #ceph
[10:31] * BillK (~BillK@58-7-172-123.dyn.iinet.net.au) has joined #ceph
[10:33] * sleinen (~Adium@2001:620:0:25:f8c4:a63c:ed67:3a57) has joined #ceph
[10:35] * LeaChim (~LeaChim@b0fae63d.bb.sky.com) has joined #ceph
[10:43] * LeaChim (~LeaChim@b0fae63d.bb.sky.com) Quit (Remote host closed the connection)
[10:51] * stacker666 (~stacker66@206.pool85-61-191.dynamic.orange.es) has joined #ceph
[10:51] <stacker666> hi all
[10:52] * jtang1 (~jtang@2001:770:10:500:fdfb:e8b9:65c7:690c) has joined #ceph
[10:52] * jtang2 (~jtang@2001:770:10:500:3179:5aa9:1fe8:f6e) has joined #ceph
[10:53] <stacker666> has anyone tried ceph as datastore for vmware?
[10:53] <stacker666> im exporting that using iscsi
[10:54] <stacker666> but performance is horrible :(
[10:55] * leseb1 (~Adium@ has joined #ceph
[10:56] * sleinen (~Adium@2001:620:0:25:f8c4:a63c:ed67:3a57) Quit (Quit: Leaving.)
[10:57] * leseb (~leseb@ Quit (Remote host closed the connection)
[10:57] * sleinen (~Adium@ has joined #ceph
[10:57] * leseb1 (~Adium@ Quit (Quit: Leaving.)
[10:57] * leseb (~Adium@ has joined #ceph
[10:58] * sleinen1 (~Adium@2001:620:0:25:b884:6f87:8d87:2882) has joined #ceph
[10:58] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[11:00] * jtang1 (~jtang@2001:770:10:500:fdfb:e8b9:65c7:690c) Quit (Ping timeout: 480 seconds)
[11:00] * jlogan (~Thunderbi@2600:c00:3010:1:51a1:82c7:4e19:4fa) has joined #ceph
[11:01] * BillK (~BillK@58-7-172-123.dyn.iinet.net.au) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * jantje (~jan@paranoid.nl) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * dosaboy (~gizmo@faun.canonical.com) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * sagelap (~sage@ Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * pioto (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * wer_ (~wer@206-248-239-142.unassigned.ntelos.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * denken (~denken@dione.pixelchaos.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * psieklFH (psiekl@wombat.eu.org) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * janos (janos@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * nwl (~levine@atticus.yoyo.org) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * Karcaw (~evan@68-186-68-219.dhcp.knwc.wa.charter.com) Quit (synthon.oftc.net oxygen.oftc.net)
[11:01] * darkfader (~floh@ Quit (synthon.oftc.net oxygen.oftc.net)
[11:02] * sleinen1 (~Adium@2001:620:0:25:b884:6f87:8d87:2882) Quit ()
[11:03] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (Read error: Connection reset by peer)
[11:05] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[11:05] * BillK (~BillK@58-7-172-123.dyn.iinet.net.au) has joined #ceph
[11:05] * jantje (~jan@paranoid.nl) has joined #ceph
[11:05] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[11:05] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[11:05] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[11:05] * sagelap (~sage@ has joined #ceph
[11:05] * pioto (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) has joined #ceph
[11:05] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[11:05] * wer_ (~wer@206-248-239-142.unassigned.ntelos.net) has joined #ceph
[11:05] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[11:05] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[11:05] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[11:05] * denken (~denken@dione.pixelchaos.net) has joined #ceph
[11:05] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[11:05] * psieklFH (psiekl@wombat.eu.org) has joined #ceph
[11:05] * janos (janos@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[11:05] * nwl (~levine@atticus.yoyo.org) has joined #ceph
[11:05] * Karcaw (~evan@68-186-68-219.dhcp.knwc.wa.charter.com) has joined #ceph
[11:05] * darkfader (~floh@ has joined #ceph
[11:11] * sleinen (~Adium@ has joined #ceph
[11:12] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (Read error: Operation timed out)
[11:12] * sleinen1 (~Adium@2001:620:0:26:9927:fc10:5b72:2919) has joined #ceph
[11:13] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[11:17] * jlogan (~Thunderbi@2600:c00:3010:1:51a1:82c7:4e19:4fa) Quit (Read error: Connection reset by peer)
[11:19] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[11:34] * jks (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[11:37] * norbi (~nonline@buerogw01.ispgateway.de) has joined #ceph
[11:38] <norbi> hi #ceph
[11:38] <norbi> upgrading ceph from 0.58 to 0.59 shows me "Existing store has not been converted to 0.52 format" if i start the converted mon, and then the start fails
[11:38] <norbi> how can i convert the format ?
[11:39] <joao> if you're coming from 0.58 that should not have happened, assuming your monitors actually formed a quorum at some point in time
[11:40] <norbi> coming from 0.58 :) now only two monitors up with version 0.58
[11:41] <norbi> must i stop all mons to convert to the new store schema ?
[11:42] <joao> norbi, could you please run an 'ls' on the data dir of the monitor that error'ed out and pastebin it?
[11:42] <joao> norbi, the 0.59 monitor should convert the store automatically on startup, if the store has the right format
[11:43] <norbi> ceph was upgrading from 0.52 to 0.53 ....0.58, thats the problem ? pastebin... takes some seconds :)
[11:44] <joao> what do you mean by 'was upgrading from 0.52 to 0.53... 0.58'?
[11:44] <joao> were you running the monitors in-between and waiting for them to stabilize?
[11:45] <norbi> http://pastebin.com/Zib83dgf
[11:46] <norbi> upgrading from 0.52 ...0.58 was over a long time :) the last days/weeks ceph was running fine
[11:46] <norbi> ceph was in status HEALTH_OK before upgrading
[11:46] <joao> right
[11:46] <joao> do you have logs for this monitor?
[11:47] <norbi> yes
[11:47] <norbi> pastebin ? :)
[11:47] <joao> if they're not too big, yeah; otherwise I can arrange a place for you to drop them if needed
[11:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[11:49] <norbi> here we go
[11:49] <norbi> http://pastebin.com/ZcEqPW8A
[11:50] * __jt__ (~james@rhyolite.bx.mathcs.emory.edu) has joined #ceph
[11:51] * __jt___ (~james@rhyolite.bx.mathcs.emory.edu) Quit (Ping timeout: 480 seconds)
[11:52] <joao> norbi, would you happen to still have logs prior to 0.59? and if so, could you run a grep 'recovered_' on the log ?
[11:52] <joao> although nothing will pop if you haven't run the monitors with debug mon = 10 :\
[11:53] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has left #ceph
[11:54] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[11:54] <norbi> grep find nothing, will start with debug mon = 10
[11:54] <joao> don't bother
[11:55] <joao> well, you could, but that's not an issue right now
[11:55] <joao> that message should have been outputted long before 0.59
[11:55] <joao> most likely on 0.52
[11:56] <norbi> ok that's too old, last logfile is from 20130320
[11:56] <joao> norbi, could you just tar the 'feature_set' file from your mon store and send it to my email?
[11:56] <joao> joao.luis@inktank.com
[11:57] <norbi> ok no problem
[12:00] * sleinen1 (~Adium@2001:620:0:26:9927:fc10:5b72:2919) Quit (Quit: Leaving.)
[12:00] <norbi> hope u got it, it was just sent via swaks
[12:01] <joao> got it
[12:01] <joao> thanks
[12:03] * sleinen (~Adium@ has joined #ceph
[12:04] * sleinen1 (~Adium@2001:620:0:25:35c9:b533:2d8e:ae28) has joined #ceph
[12:07] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[12:07] * leseb (~Adium@ Quit (Quit: Leaving.)
[12:07] <joao> norbi, can you please run 'strings /mon/data/feature_set' on your remaining monitors and show me the output?
[12:08] <norbi> ok
[12:08] * diegows (~diegows@ has joined #ceph
[12:08] <norbi> initial feature set (~v.18)
[12:08] <norbi> global version sequencing (v0.52)
[12:09] <joao> that's on the remaining monitors, right?
[12:09] <norbi> yes
[12:09] <joao> so, what's different between the monitor you're having troubles with and the remaining monitors?
[12:10] <joao> the monitor you're having troubles with should also have those two entries; but if you run the same command on mon.a, it will only output the first
[12:10] <norbi> the remaining monitors are running 0.58 :)
[12:10] <norbi> yes
[12:10] <joao> the only reason I can tell is that mon.a never belonged to the quorum, or something weird as such
[12:11] <norbi> so , u think if i upgrade mon.c or mon.b, there will be no problems ?
[12:11] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[12:11] <joao> norbi, I'm quite sure of it
[12:12] <joao> might incur in some other issue maybe, but not this one
[12:12] <norbi> if not, im happy that this is just a testsystem :)
[12:12] <norbi> i try
[12:12] <joao> norbi, if you happen to be able to downgrade mon.a to 0.58 and rerun it for a bit, it should set that flag eventually
[12:13] <joao> I can only wonder why in hell it isn't set though
[12:13] <joao> my guess is that mon.a spent all its time not belonging to the quorum, but that seems rather farfetched
[12:14] <norbi> ok u are right, mon.c was converted
[12:14] <joao> norbi, would you be kind enough to point me to the last logs prior to mon.a's upgrade?
[12:14] <joao> norbi, there's on-wire incompatibility between 0.58 and 0.59; you'll have to upgrade a majority in order to obtain a working 0.59 quorum
[12:16] <norbi> ok mon.b has converted too, strange :)
[12:16] <norbi> logfiles from mon.a before the upgrade ?
[12:17] <joao> yeah, whatever you have
[12:17] <joao> just need to go through with it, hoping to find a reason for this to have happened
[12:17] <norbi> yes, upgraded mon.b -> will crash mon.c :)
[12:17] <joao> err, *go through it*
[12:18] <joao> wow, crashed mon.c?
[12:18] <joao> log please?
[12:18] <norbi> ok
[12:19] <norbi> http://pastebin.com/VLNmfKag
[12:20] <joao> aah
[12:20] <joao> does that trace start with a 'FAILED assert(0)' on auth/none/AuthNoneServiceHandler.h by any chance?
[12:21] <joao> and by that I mean this: http://tracker.ceph.com/issues/4519
[12:21] <norbi> yes
[12:21] <joao> yeah, it is fixed on master or next, not sure which, but not yet on 0.59 :\
[12:22] <joao> I'm assuming you're not using cephx?
[12:22] <norbi> its disabled
[12:24] <joao> yeah, that's the one
[12:24] <norbi> hm
[12:24] <joao> it's indeed fixed on both master and next
[12:24] <joao> was fixed just last week, xiaoxi hit it first
[12:25] <norbi> have downloaded the tgz today ?
[12:25] <joao> are you running on master?
[12:26] <norbi> no, have downloaded from http://ceph.com/download/ceph-0.59.tar.gz
[12:27] <norbi> seems that is not the newest ?
[12:27] <joao> that's the newest release, but the fix is still on git's master/next branches
[12:28] <joao> not sure if we have nightly tar.gz's
[12:28] * thorus (~jonas@pf01.intranet.centron.de) has joined #ceph
[12:28] <joao> let me check
[12:28] <joao> norbi, you have compiled that from source, right?
[12:28] <norbi> yes
[12:29] <norbi> btw, mon.a is running ceph 0.58, mon was starting, but he cant get into quorum
[12:30] <joao> norbi, for the monitors to form a quorum, a majority either need to be <= 0.58 or >= 0.59
[12:30] <norbi> oh ok
[12:30] <joao> in your case, I'm assuming you have only 3 monitors, right?
[12:30] <norbi> now only one :D
[12:31] <norbi> yes 3 ;)
[12:31] <norbi> mon.c = upgraded and crashing, mon.a = not upgraded, mon.b = upgraded
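joao's rule (a majority of monitors must sit on the same side of the 0.58/0.59 boundary) can be checked against norbi's three monitors. This is a rough Python sketch under stated assumptions: the names `version_tuple` and `quorum_possible` are invented here, a crashing monitor is modelled as `None`, and "majority" is taken to mean more than half of all configured monitors.

```python
def version_tuple(version):
    """Turn '0.58' or '0.56.4' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def quorum_possible(mon_versions):
    """mon_versions maps monitor name -> version string, or None if it is down.

    Returns True if a strict majority of all monitors is running on the
    same side of the 0.59 wire-format boundary.
    """
    cutoff = version_tuple("0.59")
    running = [version_tuple(v) for v in mon_versions.values() if v is not None]
    old = sum(1 for v in running if v < cutoff)   # 0.58 and earlier
    new = len(running) - old                      # 0.59 and later
    needed = len(mon_versions) // 2 + 1           # strict majority
    return old >= needed or new >= needed


# The state norbi describes: a on 0.58, b on 0.59, c upgraded but crashing.
print(quorum_possible({"a": "0.58", "b": "0.59", "c": None}))  # False
# With a second working 0.59 monitor, a quorum becomes possible.
print(quorum_possible({"a": "0.59", "b": "0.59", "c": None}))  # True
```

In the state norbi lists, neither side reaches two of three monitors, which is consistent with mon.a starting but not getting into quorum.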
[12:31] <joao> okay, if I were to check what had happened, I'd do the following
[12:32] <joao> shutdown all monitors; downgrade all monitors to v0.58; rm 'store.db' from all three monitors data stores; start v0.58 monitors
[12:32] <joao> let them run for a while
[12:32] <norbi> OK !
[12:33] <joao> check if all monitors 'strings /mon/data/feature_set' contains this
[12:33] <joao> <norbi> initial feature set (~v.18)
[12:33] <joao> <norbi> global version sequencing (v0.52)
[12:33] <joao> then, grab git's master; compile it; then upgrade the monitors and restart
[12:34] <joao> if you could go through with this and let me know how it goes I'd really, *really* appreciate it ;)
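The feature_set check in joao's steps can be scripted. Below is a small Python sketch that mimics what `strings` does (extract runs of printable ASCII) and reports which of the two expected entries are missing. The function names are hypothetical, the on-disk layout of `feature_set` is not known from this log, and the sample bytes are made up purely to exercise the code.

```python
import re

# The two entries norbi's healthy monitors printed.
REQUIRED = [
    "initial feature set (~v.18)",
    "global version sequencing (v0.52)",
]


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII, roughly what `strings` prints."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def missing_features(data):
    """List the required feature names that do not appear in the data."""
    found = printable_strings(data)
    return [name for name in REQUIRED if not any(name in s for s in found)]


# Made-up sample resembling mon.a's case: only the first entry is present.
sample = b"\x01\x00initial feature set (~v.18)\x00\x02"
print(missing_features(sample))
```

To run it against a real monitor, the bytes would come from something like `open(path, "rb").read()`, where `path` points at that monitor's `feature_set` file.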
[12:34] <tnt> yehudasa: ping
[12:35] * dosaboy (~gizmo@faun.canonical.com) Quit (Read error: Connection reset by peer)
[12:36] <norbi> i have one server, that only uses OSD with 0.59
[12:36] <norbi> must i downgrade him too ?
[12:37] <norbi> because, i have no crashing OSDs :(
[12:37] <norbi> no = now
[12:40] <joao> I don't think there's an incompatibility between 0.58 and 0.59 osds
[12:40] <joao> what does the osd crash look like?
[12:42] <norbi> http://pastebin.com/mxpL2N77
[12:46] <norbi> mon.a and mon.b are in quorum now (mon.c is still compiling), but strings feature_set from mon.a shows only one line "initial feature set (~v.18)"
[12:49] * ScOut3R (~ScOut3R@rock.adverticum.com) has joined #ceph
[12:49] * leseb (~Adium@ has joined #ceph
[12:57] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:57] <joao> norbi, that osd crash might be worth to send to the list, or rather maybe wait for sjust to confirm whether or not there was some incompatibility between 0.58 and 0.59 introduced
[12:57] <joao> I'll check with him when he checks-in
[12:58] <joao> I'm not putting aside that the monitor might still be to blame on that one though
[13:00] <norbi> sounds not so good, will the osds working again if i downgrade the other osds to 0.58 ?
[13:00] <joao> that I don't know
[13:00] <joao> I suppose, but am not sure
[13:01] <joao> and by not sure in this case I really mean 'I don't know' :\
[13:01] <norbi> ok, before the last gigabyte get crashed, i will wait
[13:06] <joao> norbi, does mon.a still only have 'initial feature set' on the feature_set file?
[13:07] <norbi> yes :(
[13:07] <norbi> now with 3 mons in quorum
[13:07] <norbi> quorum 0,1,2 a,b,c
[13:07] <joao> if so, can you please run a 'for i in /mon/data/*_gv ; do echo "$i : `ls -l $i | wc -l`" ; done' ?
[13:08] <norbi> pasted output to you
[13:08] <joao> saw it, thanks
[13:08] <norbi> the output is the same on all mons
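For comparing the counts across monitors, joao's shell loop can also be written in Python. This is only a sketch: it assumes the `*_gv` entries are directories, the function name `count_gv_entries` is invented, and `/mon/data` is the same placeholder path used in the conversation. Note that `ls -l <dir> | wc -l` also counts the leading "total" line, so the shell loop reports one more than the number of entries for each directory.

```python
import glob
import os


def count_gv_entries(mon_data):
    """Map each <mon_data>/*_gv directory to the number of entries in it."""
    counts = {}
    for path in sorted(glob.glob(os.path.join(mon_data, "*_gv"))):
        if os.path.isdir(path):
            counts[path] = len(os.listdir(path))
    return counts


# "/mon/data" is a placeholder; substitute the monitor's real data dir.
for path, count in count_gv_entries("/mon/data").items():
    print(f"{path} : {count}")
```

Running it on each monitor and diffing the output gives the same comparison norbi made by hand.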
[13:09] <joao> as expected; the only thing I really wasn't expecting was the missing feature set
[13:09] <joao> we should set debug mon = 10 on mon.a and restart it
[13:09] <joao> and then I would love to take a look at the log
[13:09] <norbi> i can cscp the file ? :)
[13:10] <norbi> scp
[13:10] * markbby (~Adium@ has joined #ceph
[13:10] <joao> sftp to cephdrop@ceph.com
[13:19] <norbi> could the problem with the crashing osds be the wrong output from the mons? the mon says 21 osds: 18 up, 21 in, but only 10 OSDs are actually "in"; the others crashed
[13:21] <norbi> what about copying the "feature_set" file from mon.c or mon.b to mon.a ? md5sum says it's the same file.
[13:22] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[13:22] <joao> norbi, we really want to know what's happening with mon.a for it not to set the on-disk feature
[13:22] <joao> and regarding the osds question, can't really say
[13:24] <joao> we fixed a bug just in time for 0.59 I think, that in certain cases would lead an osd boot message to be ignored
[13:25] <joao> don't know if that's what is happening here; it was only introduced post-0.58
[13:25] <norbi> :(
[13:25] <joao> I need more coffee and/or food
[13:25] <joao> brb
[13:25] <norbi> ;)
[13:41] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[13:53] * jskinner (~jskinner@ has joined #ceph
[14:07] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[14:09] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[14:35] * drokita (~drokita@ has joined #ceph
[14:37] * xiaoxi (~xiaoxiche@jfdmzpr04-ext.jf.intel.com) has joined #ceph
[14:44] <xiaoxi> hi,could anyone pls tell me how ms_dispatch_throttle_bytes/ops works?
[14:47] <xiaoxi> sorry ,only ms_dispatch_throttle_bytes
[14:49] * Romeo (~romeo@ Quit (Read error: Connection reset by peer)
[14:49] * Romeo (~romeo@ has joined #ceph
[14:49] * sivanov (~sivanov@gw2.maxtelecom.bg) has joined #ceph
[14:49] <sivanov> Hello all. What does "ceph osd create 131, error: (22) Invalid argument" mean?
[14:50] <tnt> 131 is not a valid UUID
[14:51] * portante|afk is now known as portante
[14:57] <sivanov> I'm not sure I understood correctly
[14:58] <janos> the "131" in that command - it's an optional parameter and it's expecting a UUID format
[14:59] <sivanov> okay, thanks
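As a rough illustration of the point tnt and janos make, the sketch below checks whether a string has the canonical 8-4-4-4-12 hexadecimal UUID shape before it is handed to `ceph osd create`. The helper name `is_uuid` is hypothetical, and the `uuidgen` usage in the comments assumes that tool is installed; neither comes from the chat.

```shell
#!/bin/sh
# is_uuid: hypothetical helper; succeeds if $1 looks like a canonical
# 8-4-4-4-12 hexadecimal UUID, fails otherwise.
is_uuid() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Usage (not executed here): either omit the optional argument,
# or pass a freshly generated UUID instead of a bare number.
#   ceph osd create
#   ceph osd create "$(uuidgen)"
```

With this check, `is_uuid 131` fails, which matches the "Invalid argument" sivanov saw.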
[15:00] * PerlStalker (~PerlStalk@ has joined #ceph
[15:02] * ScOut3R (~ScOut3R@rock.adverticum.com) Quit (Ping timeout: 480 seconds)
[15:11] * aliguori (~anthony@ has joined #ceph
[15:18] * maxiz (~pfliu@ has joined #ceph
[15:26] * sagelap1 (~sage@2600:1012:b02f:719d:fc07:6fbc:5271:15d) has joined #ceph
[15:31] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[15:36] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) has joined #ceph
[15:37] * ScOut3R (~ScOut3R@ has joined #ceph
[15:40] * markbby (~Adium@ Quit (Quit: Leaving.)
[15:41] * markbby (~Adium@ has joined #ceph
[15:41] * markbby (~Adium@ Quit ()
[15:41] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[15:41] * markbby (~Adium@ has joined #ceph
[15:44] <joao> norbi, still around?
[15:45] <norbi> yes
[15:45] <norbi> 15min to go :)
[15:46] <joao> still have those monitors on 0.58?
[15:46] <norbi> yes
[15:46] <joao> and does mon.a's feature_set still only show the v0.18 entry?
[15:47] <norbi> yes
[15:47] <norbi> ceph -s
[15:47] <joao> okay
[15:47] <norbi> quorum 0,1,2 a,b,c
[15:47] <joao> cool, can you please restart mon.b ?
[15:48] <norbi> done
[15:48] <joao> wait for HEALTH_OK, then check mon.a's feature_set again please
[15:50] <norbi> hm that won't go, because i have lost 10 OSDs ;)
[15:50] <joao> oh, okay, just wait for a formed quorum
[15:50] <norbi> but now im in quorum 408 (before it was 406)
[15:51] <norbi> but feature_set is still the same
[15:51] <joao> :\
[15:51] <joao> yeah, I'm going to need your other two monitors' logs if you don't mind
[15:51] <norbi> just the mon.c.log and mon.b.log, riht ?
[15:51] <norbi> right
[15:51] <joao> yes
[15:52] <joao> already got mon.a's :)
[15:53] <norbi> mon.b.log and mon.c.log are there
[15:53] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[15:56] * The_Bishop_ (~bishop@e179001229.adsl.alicedsl.de) has joined #ceph
[15:57] * l0nk (~alex@ Quit (Quit: Leaving.)
[15:58] <joao> thanks norbi
[15:59] * loicd (~loic@c-76-119-91-36.hsd1.ma.comcast.net) Quit (Quit: Leaving.)
[16:00] <norbi> so time to go home, i'm here tomorrow from about 8am to 12am, we can try to fix it then?
[16:00] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (Quit: Leaving.)
[16:01] <joao> norbi, I might not be able to get back to this until the end of the week, but if you could back up your monitor stores for then, you should be able to get a working monitor cluster by removing mon.a and re-adding it
[16:01] <norbi> ok
[16:01] * The_Bishop (~bishop@f052103222.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[16:01] <norbi> and about the crashing OSDs ?
[16:01] * The_Bishop__ (~bishop@e177089197.adsl.alicedsl.de) has joined #ceph
[16:01] <norbi> need some logfiles too ?
[16:02] <joao> I'll mention that to sjust when he arrives, and he should be the best person to further get into it
[16:02] <joao> but yeah, log files would be nice
[16:03] <norbi> is it ok to upload using the naming scheme osd.x.log ? i won't overwrite other logfiles ?
[16:03] <joao> norbi, better to create a 'norbi' directory and upload your logs there :)
[16:03] <norbi> ok
[16:04] <xiaoxi> nhm:are you around?
[16:05] <norbi> ok logfiles are up in directory norbi, hope you can find the problem
[16:05] <norbi> thx for help !
[16:06] <stacker666> has anyone tried ceph as datastore for vmware? im using iscsi but the performance is very poor :(
[16:07] * norbi (~nonline@buerogw01.ispgateway.de) Quit (Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org)
[16:07] * The_Bishop_ (~bishop@e179001229.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[16:07] <stacker666> 36MB/s
[16:07] * sagelap1 (~sage@2600:1012:b02f:719d:fc07:6fbc:5271:15d) Quit (Ping timeout: 480 seconds)
[16:08] * sagelap (~sage@2607:f298:a:607:c562:1214:14a0:5eae) has joined #ceph
[16:09] <xiaoxi> how do you bridge RBD to ISCSI?
[16:10] <xiaoxi> by the new tgtd?or you just mount rbd to kernel and re-export it to iscsi
[16:14] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[16:16] * loicd (~loic@74-94-156-210-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[16:16] * jlogan (~Thunderbi@2600:c00:3010:1:3914:d1f5:fab4:e336) has joined #ceph
[16:27] * Cube (~Cube@ has joined #ceph
[16:29] <stacker666> <xiaoxi> i map the image of rbd in /dev/rbd0
[16:30] <stacker666> <xiaoxi> and export that with iscsitarget
[16:30] <stacker666> <xiaoxi> as blockdevice
[16:32] <xiaoxi> ok, so what does your backend ceph setup look like?
[16:32] <stacker666> <xiaoxi> with a linux i can get 100MB/s write speed but with ESX i have 20 or 30MB/s
[16:32] <xiaoxi> you mean when you are doing io on the /dev/rbd0, you can get 100MB/s, but with the iscsi exported dev, you can only get 20~30?
[16:32] * baz_ (~baz@2001:610:110:6e1:986f:a364:188a:e0ce) has joined #ceph
[16:33] <stacker666> <xiaoxi> 2 servers R515 running ubuntu 12LTS with ceph 0.56.3
[16:33] <baz_> hi, any news on bug #4065
[16:33] <baz_> I'm doing the quickstart and I'm running into the same issue
[16:34] <stacker666> <xiaoxi> if i use openiscsi from a linux machine i can get this 100MB/s
[16:35] * sagelap1 (~sage@ has joined #ceph
[16:35] <stacker666> <xiaoxi> i connect my laptop to run tests. I have multipath with roundrobin
[16:35] * sagelap (~sage@2607:f298:a:607:c562:1214:14a0:5eae) Quit (Quit: Leaving.)
[16:36] <stacker666> <xiaoxi> when i connect an R610 server with ESX installed, also configured with multipath, i only get 20-30MB/s
[16:37] <baz_> so I'm testing it with two VMs, both having an osd
[16:37] <stacker666> <xiaoxi> i dont know why
[16:37] <baz_> when I start ceph it will heal and one of the osd goes down
[16:38] <baz_> the log shows the client aborts with: FAILED assert("join on thread that was never started" == 0)
[16:39] <baz_> It fails with XFS, but the same setup works fine with btrfs
[16:39] <baz_> that is the weird part
[16:39] <xiaoxi> stacker666:well, you can hardly blame ceph for this...
[16:40] <xiaoxi> maybe the ISCSI driver of ESX is not that good?
[16:40] <loicd> Hi, I'm looking for test cases that would help me understand the logic of https://github.com/ceph/ceph/blob/8befbca77aa50a1188969892aabedaf11d8f8ce7/src/os/FlatIndex.cc#L103
[16:41] <stacker666> <xiaoxi> yes, you are right. Probably the vmware software adapter is crap :(
[16:42] <stacker666> <xiaoxi> im happy with ceph not with vmware. you are right, this is not the place to ask this question
[16:42] <baz_> btw, I'm trying with 0.56.4
[16:43] <stacker666> <xiaoxi> im going to make a support request with these guys. Thanks for everything
[16:47] * gerard_dethier (~Thunderbi@ Quit (Quit: gerard_dethier)
[16:49] <paravoid> so, I've got a very interesting problem
[16:49] <paravoid> sagelap1 or sjust around?
[16:49] * sagelap1 is now known as sagelap
[16:49] <sagelap> yeah
[16:49] <paravoid> hey :)
[16:50] <paravoid> 2013-03-26 14:44:27.816436 7fe8a079d700 0 log [INF] : osdmap e170991: 144 osds: 136 up, 136 in
[16:50] <paravoid> 2013-03-26 15:47:08.672963 7fe8a079d700 0 log [INF] : osdmap e170992: 144 osds: 24 up, 136 in
[16:51] <sagelap> did the osds mark each other down or did the mon mark them down?
[16:51] <sagelap> should be something in the mon log..
[16:51] * Rocky (~r.nap@ Quit (Quit: Lost terminal)
[16:51] <paravoid> and I know what triggered this
[16:51] <paravoid> so, each of our boxes has 12 disks
[16:51] <paravoid> one of the boxes has a bunch of disks broken
[16:51] <paravoid> so the tech was running megacli to try to find what's going on
[16:52] <paravoid> when I run e.g. megacli PDList, megacli gets stuck and presumably all the I/O on that box gets stuck too
[16:52] <paravoid> for several seconds
[16:52] <paravoid> so marking those 12 OSDs down would be a reasonable thing to do
[16:53] <paravoid> marking 112 of them on all of the boxes on the other hand... :)
[16:53] <paravoid> the answer to your question is
[16:53] <paravoid> 2013-03-26 15:23:09.407529 7fe8a079d700 1 mon.ms-fe1001@0(leader).osd e170991 prepare_failure osd.134 from osd.5 is reporting failure:1
[16:53] <paravoid> 2013-03-26 15:23:09.407545 7fe8a079d700 0 log [DBG] : osd.134 reported failed by osd.5
[16:53] <paravoid> 2013-03-26 15:23:10.258905 7fe8a079d700 1 mon.ms-fe1001@0(leader).osd e170991 prepare_failure osd.134 from osd.80 is reporting failure:1
[16:53] <paravoid> 2013-03-26 15:23:10.258924 7fe8a079d700 0 log [DBG] : osd.134 reported failed by osd.80
[16:53] <paravoid> .134 being on the box in question
[16:54] <sagelap> so that part makes sense (that .134 would be failed). who reported the other osds on other hosts as failed?
[16:54] <paravoid> those 12 osds apparently!
[16:55] <paravoid> 2013-03-26 15:47:05.537976 7fe8a079d700 0 log [DBG] : osd.1 reported failed by osd.135
[16:55] <paravoid> 2013-03-26 15:47:05.536522 7fe8a079d700 0 log [DBG] : osd.2 reported failed by osd.141
[16:55] <paravoid> 2013-03-26 15:47:05.533538 7fe8a079d700 0 log [DBG] : osd.2 reported failed by osd.140
[16:55] <paravoid> 2013-03-26 15:47:05.533659 7fe8a079d700 0 log [DBG] : osd.0 reported failed by osd.134
[16:55] <paravoid> 2013-03-26 15:47:05.533719 7fe8a079d700 0 log [DBG] : osd.1 reported failed by osd.142
[16:55] <paravoid> 132-143 is the problematic box
[16:56] <sagelap> oh, i see
[16:57] <sagelap> is there any osd logging enabled to show the timing of the failure messages?
[16:57] <sagelap> or on the mon?
[16:57] <paravoid> the mon log shows the timing quite well
[16:58] <sagelap> can you attach details to http://tracker.ceph.com/issues/4552 please?
[16:58] <sagelap> i suspect it's a pretty straightforward fix
[16:59] <paravoid> ha, I could have filed the bug
[16:59] <paravoid> thanks :)
[16:59] <paravoid> you're too kind
[17:00] <paravoid> I'm looking at the log to see if there's anything private before I put it up
[17:01] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[17:04] * BillK (~BillK@58-7-172-123.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[17:05] <paravoid> sagelap: I just attached osd-tree and mon log there
[17:10] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[17:10] * bithin (~bithin@ has joined #ceph
[17:11] <absynth> sagelap: do you have any idea how dangerous an update to .56.4 might be?
[17:14] * verwilst (~verwilst@d5152D6B9.static.telenet.be) Quit (Quit: Ex-Chat)
[17:31] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[17:33] * alram (~alram@ has joined #ceph
[17:36] * bithin (~bithin@ Quit (Quit: Leaving)
[17:37] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:43] * jtangwk (~Adium@2001:770:10:500:499c:2ad8:78dd:2edb) Quit (Quit: Leaving.)
[17:50] * Rocky (~r.nap@ has joined #ceph
[17:51] * rzerres (~ralf.zerr@static-87-79-239-211.netcologne.de) has joined #ceph
[17:52] <sstan> I did ceph osd crush tunables optimal; ceph health shows OK but rbd map is hanging :/
[17:52] <sstan> any ideas?
[17:52] <sstan> actually, it just takes forever
[17:53] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[17:54] * jtangwk (~Adium@2001:770:10:500:d51d:cd0d:57d1:de8e) has joined #ceph
[17:54] * sivanov (~sivanov@gw2.maxtelecom.bg) Quit (Read error: Operation timed out)
[17:57] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:57] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[17:58] <rzerres> hello joao, are you busy for today? I'd like to continue our session from friday
[18:02] * chutzpah (~chutz@ has joined #ceph
[18:06] * jtang2 (~jtang@2001:770:10:500:3179:5aa9:1fe8:f6e) Quit (Ping timeout: 480 seconds)
[18:06] * sleinen1 (~Adium@2001:620:0:25:35c9:b533:2d8e:ae28) Quit (Quit: Leaving.)
[18:06] * sleinen (~Adium@ has joined #ceph
[18:13] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[18:14] <rzerres> is someone available to help me out with a crashed cluster osd? I'm using v0.59.
[18:14] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[18:15] <joao> rzerres, sorry, can't in the next 20-30 minutes, but we'll chat a bit after
[18:15] <joao> if you're still around that is
[18:16] <rzerres> joao: that's ok. we all are a bit busy
[18:16] <rzerres> joao, meanwhile i got a running mon on the last mon-server .....
[18:18] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:21] * jtang1 (~jtang@2001:770:10:500:853:5cbd:5217:c456) has joined #ceph
[18:22] <sstan> I've updated to .59. But it's giving me : Invalid argument: /var/lib/ceph/mon/ceph-c/store.db: does not exist (create_if_missing is false)
[18:24] <rzerres> @sstan: had that problem on last friday
[18:25] <sstan> oh were you able to solve it ?
[18:25] <rzerres> mon protocol has changed from 0.58 to 0.59
[18:25] <rzerres> the monitor will check the format and will upgrade to the new one
[18:26] <rzerres> now the monitors need to get a quorum to make the cluster manageable
[18:26] <joao> sstan, that might not really be a problem, just yet another message that is way too verbose and all around stupid to output (must fix that soon)
[18:26] <rzerres> and of course the election needs to succeed with a majority of the given mons
[18:26] <joao> sstan, that's only an issue if the monitor store conversion fails
[18:27] <sstan> ERROR: on disk data includes unsupported features: compat={},rocompat={},incompat={4=}
[18:27] <joao> also, all that rzerres said is accurate
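The majority rule rzerres describes can be sketched as simple integer arithmetic: a quorum needs more than half of the configured monitors. The helper name `majority` is hypothetical and only illustrates the rule; it does not query a cluster.

```shell
#!/bin/sh
# majority: hypothetical helper; prints the smallest number of monitors
# that is strictly more than half of the $1 configured monitors.
majority() {
    echo $(( $1 / 2 + 1 ))
}

majority 3    # three monitors configured -> prints 2
majority 5    # five monitors configured  -> prints 3
```

So with the three monitors discussed here, any two of them that can talk to each other are enough to form a quorum.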
[18:27] <joao> ah, yes, that's pretty much what rzerres bumped into iirc
[18:27] <joao> sstan, what version are you coming from?
[18:28] <sstan> 58 I think
[18:29] <joao> sstan, can you please run 'strings /var/lib/ceph/mon/ceph-c/feature_set' and share the output with us?
[18:30] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:30] <sstan> initial feature set (~v.18)
[18:30] <sstan> global version sequencing (v0.52)
[18:35] <sstan> brb
[18:35] <sstan> .. /etc/init.d/ceph start mon returned
[18:35] * tnt (~tnt@ has joined #ceph
[18:37] * jtang1 (~jtang@2001:770:10:500:853:5cbd:5217:c456) Quit (Quit: Leaving.)
[18:38] <rzerres> @sstan: how many mons are you running?
[18:40] * rzerres (~ralf.zerr@static-87-79-239-211.netcologne.de) has left #ceph
[18:40] * markbby (~Adium@ has left #ceph
[18:41] * rzerres (~ralf.zerr@static-87-79-239-211.netcologne.de) has joined #ceph
[18:41] * Rocky (~r.nap@ Quit (Quit: Lost terminal)
[18:42] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:43] <joao> sstan, can you please run 'ceph_test_store_tool /var/lib/ceph/mon/ceph-c/store.db get monitor feature_set' and pastebin it?
[18:44] * sagelap (~sage@ Quit (Read error: Connection reset by peer)
[18:47] <rzerres> @sjust: joao told me you might be the right guy to get my broken osd's up and running again.
[18:47] <rzerres> @sjust: my i borrow a bit of your time?
[18:50] <rzerres> @sjust: s/my/may/
[18:54] * markbby (~Adium@ has joined #ceph
[18:54] * sleinen (~Adium@2001:620:0:25:b9e6:e848:a649:797a) has joined #ceph
[18:58] * The_Bishop_ (~bishop@e179010009.adsl.alicedsl.de) has joined #ceph
[18:59] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[18:59] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[19:03] * The_Bishop__ (~bishop@e177089197.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[19:05] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:06] <sjust> rzerres: what's up?
[19:07] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[19:07] <rzerres> sjust: hey. i have converted from 0.58 to 0.59 and overcome the initial problems while talking to joao
[19:08] <sjust> ok
[19:08] <rzerres> sjust: now i have my testing cluster (2 cluster-servers, 3 monitors, 8 osd's) running with ceph -w
[19:08] <matt_> rzerres: Is that to do with the monitor crashing? I was just about to ask about this
[19:09] <sjust> ok
[19:09] <rzerres> @sjust: no, it is not. It is probably from my playing around with moving osd's and disks and converting from a 1Gig to a 10Gig network on the networking device
[19:10] <sjust> rzerres: what is the current problem behavior?
[19:10] <rzerres> sjust: redundancy is set to "2", so my data should have been handled correctly with the given crushmap
[19:11] <rzerres> sjust: right now, 4 of the osd's on 2nd clusterserver are not starting. the logs of these osd's show coredumps on startup. osdmap is needed ....
[19:11] <sjust> clarify: "osdmap is needed ..."
[19:12] <rzerres> i have set up a tmux where you can log in via ssh to have a look yourself. this might be the fastest way to get the overview.
[19:14] * BManojlovic (~steki@fo-d- has joined #ceph
[19:14] <matt_> Joao: Basically I'm just having the exact same issue as reported in http://tracker.ceph.com/issues/4521 after upgrading to 0.59
[19:15] <rzerres> sjust: /usr/bin/ceph-osd() .... NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[19:15] <joao> matt_, any chance you can also attach a log with the output from the commands I posted on one of the comments of that ticket?
[19:15] <sjust> rzerres: I can't log into your cluster, but if you'd like to post a log from a crashing osd, I can take a look
[19:15] <rzerres> sjust: pastebin?
[19:15] <sjust> sure
[19:16] <rzerres> i'll get prepared
[19:16] <dmick> or if it's too big, you can sftp
[19:16] <sjust> to cephdrop@ceph.com
[19:16] <joao> matt_, I've looked into that bug this morning, and although I have an understanding of what's happening I am yet unsure about the whole picture
[19:16] <matt_> Joao: I seem to be missing ceph_test_store_tool , would this be in the Ubuntu 12.10 packages or another location?
[19:16] <joao> but a fix might come out still this week
[19:17] <dmick> matt_: do you have ceph-test installed?
[19:17] <joao> matt_, it should be part of the 'ceph-test' package iirc
[19:17] <dmick> ^
[19:17] * dosaboy (~gizmo@faun.canonical.com) Quit (Quit: Leaving.)
[19:17] <matt_> ah, that would be it
[19:18] <matt_> Is it best to attach the logs to bug report or paste here?
[19:19] <rzerres> sjust: have a look at pastebin.com/nAbF0qzi
[19:20] <joao> matt_, attach to the ticket, to be used as a reference, please
[19:20] <sjust> rzerres: did you use split?
[19:20] <rzerres> sjust: grabbed out the last meaningful portion
[19:20] <sjust> did you increase pgnum for a pool?
[19:20] <sjust> you'll need to upgrade to current master to fix that one
[19:20] <sjust> back in a few
[19:21] <matt_> Whilst we're waiting on a fix, is it possible to downgrade the monitors after the conversion?
[19:21] <Kdecherf> n/B 6
[19:21] <Kdecherf> hm, sorry.
[19:22] <joao> matt_, as long as you didn't remove anything from the monitor data directory, and assuming you never formed a quorum, running a prior version should work just fine
[19:23] <joao> the thing is, if your monitors reached a quorum at some point, and given that downgrading means using older data, you may end up losing stuff
[19:23] <matt_> hmm, it has formed a quorum
[19:23] <rzerres> sjust: no, just copy 'n paste . increase pgnum: yes i did that back in v0.58 and it did work out, before i was moving some osd's stuff
[19:23] * stacker666 (~stacker66@206.pool85-61-191.dynamic.orange.es) Quit (Ping timeout: 480 seconds)
[19:24] <sstan> still getting pipe(...).fault ... even though everything seems to be running
[19:25] <joao> sstan, pipe(...).fault is not an error; just means a connection was dropped, closed or something like that (can never recall the exact meaning)
[19:25] <matt_> it's currently running with the OSD's that were up before the upgrade (on 0.58 monitors) and is stable but it's a bit scary if anything needs restarting
[19:27] <joao> matt_, if you are back on 0.58, once you're done attaching the log to the ticket, you may very well remove 'store.db' from your monitor's data directory, if that makes you feel more comfortable
[19:27] * markbby (~Adium@ has left #ceph
[19:27] <joao> actually, you'll eventually have to remove it before upgrading again
[19:28] <joao> but I really mean only the 'store.db' directory: nothing else
[19:28] <rzerres> @joao: is store.db newly created and only used by the 0.59 monitor?
[19:29] <matt_> joao, sorry I probably didn't explain that correctly. I was on 0.58 and upgraded the monitors all at once so I'm on 0.59 currently. The OSD's were upgraded prior and any OSD that was up before the monitor upgrade is still running correctly
[19:29] * leseb (~Adium@ Quit (Quit: Leaving.)
[19:29] <matt_> joao, the crash is just for any OSD that starts/restarts now that I've upgraded the monitors to 0.59
[19:29] * sleinen1 (~Adium@2001:620:0:26:f5fe:f848:a00e:5c53) has joined #ceph
[19:30] <matt_> joao, there are 3 monitors all in quorum so I'm probably stuck for a while until there's a fix
[19:30] <sstan> same problem here ... mon starts ... then stops
[19:32] <joao> sorry, I have to run; I'll be back in some 2 or 3 hours
[19:33] <rzerres> @joao: my ceph mon stat shows 'e4: 3 mons at {0=,1=,2=}, election epoch 280, quorum 0,2 1,2'
[19:33] <rzerres> @joao: in 0.58 the quorum status was different: it listed all online monitors separated by commas, like 0,1,2
[19:35] * sleinen (~Adium@2001:620:0:25:b9e6:e848:a649:797a) Quit (Ping timeout: 480 seconds)
[19:35] <rzerres> @joao: is the posted status ok? I'm too stupid to figure that out (0 is in quorum with 2 and 1 is in quorum with 2; how about 0 talking to 1?)
[19:37] <matt_> joao, Thanks for the help
[19:41] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[19:49] * jtang1 (~jtang@ has joined #ceph
[19:49] <rzerres> @sjust: can't follow on "upgrade the current master ....", all machines are running v0.59 binaries
[19:51] <sstan> Monitors still not working :/ As soon as one communicates with them, they die
[19:57] * diegows (~diegows@200-081-038-029.wireless.movistar.net.ar) has joined #ceph
[20:00] <sjust> rzerres: yeah, that assert is simply wrong with split
[20:00] <mjblw> Does anyone know if the ability to change the number of pgs in a pool is committed into master yet? I had heard that this feature is slated for the cuttlefish release.
[20:00] <sjust> it's been removed in current master/.60 whenever .60 comes out
[20:00] <sjust> rzerres: good to hear that split worked though
[20:03] <sjust> mjblw: sorry, "it's been removed in current master/.60 whenever .60 comes out" refers to rzerres' assert
[20:03] <sjust> mjblw: it's in master, but still experimental
[20:03] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:03] <sjust> we'll see what the status is after cuttlefish stabilizes
[20:04] <sjust> mjblw: note, you can only increase pgnum
[20:04] <sjust> decreasing is a whole different level of pain
[20:04] <sjust> which we haven't even started on
[20:05] <mjblw> sjust.. is decreasing pg num going to be part of cuttlefish, do you know? I have a pool that I need to reduce the pg num on in order to be able to use the kernel driver (the pool have of 65k pgs)
[20:05] <sjust> mjblw: definitely not
[20:05] <mjblw> *over 65k pgs
[20:05] <sjust> mjblw: and probably not in the medium future
[20:06] <rzerres> sjust: i was just happy that i could increase pg_num for a given pool
[20:06] <sjust> rzerres: yep
[20:07] <rzerres> sjust: and did i get it right that you revoked that possibility for 0.60?
[20:07] <mjblw> well, I guess I won't be able to use the rbd kernel driver unless I migrate the rbd volumes into another pool.
[20:07] <sjust> rzerres: no, the feature is still there and stabilizing
[20:07] <rzerres> all right
[20:07] <sjust> rzerres: but the assert you hit was removed
[20:07] <sjust> so if you upgrade to current master, the osd should start right up
[20:08] <sjust> mjblw: I think the kernel driver is being fixed to support 32bit pg ids though
[20:09] <sjust> or was already fixed
[20:09] <rzerres> sjust: sorry, i cant follow on that. how do i know what is the current master? i thought all nodes aar equivalent and the monitors will take care of the connection
[20:09] <rzerres> s/aar/are/
[20:09] <sjust> rzerres: sorry, I mean our current development branch of the code base
[20:09] <sjust> you are running 0.59
[20:09] <sjust> current master is the current dev branch
[20:10] <rzerres> ah. so i need to grab 0.60
[20:10] <sjust> if there is one, yeah
[20:10] <sjust> not sure what's been released
[20:10] <rzerres> now, got that. being a bit lazy
[20:11] <sjust> ok, 0.60 is current next I think
[20:11] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[20:13] <mjblw> sjust, the kernel driver is one of two problems we have with the high number of pgs. The rbd client memory usage is too high with the amount of pgs I have. I hope the kernel driver (if and when it supports >65k pgs) will use less memory than running multiple instances of the qemu driver per host machine.
[20:13] <sjust> rzerres: be95af7bf8bca651ff0d28bf488ab3cb149708a5 fixed the bug
[20:13] <sjust> rzerres: and you want current next
[20:13] <sjust> or 0.60 when it's released
[20:14] <rzerres> since i'm using ubuntu: ceph-deb-precise-x86-64-basic/ref/next will point me to builds of current next, right?
[20:14] <sstan> sjust : what is that fixing ?
[20:14] <sjust> I think so
[20:14] <sjust> sstan: a mistaken assert on OSD startup
[20:14] <sjust> sstan: it only affects you if you have been increasing pgnum
[20:14] <sjust> on a pool
[20:14] <sstan> ok
[20:14] <sjust> you'd know, you have to specify --allow-experimental-feature to do it :)
[20:15] <sstan> is it possible to rm -r all monitor folders and start over?
[20:15] <sjust> mjblw: hmm, that's interesting...
[20:16] <sjust> mjblw: can't think of a reason off of the top of my head why that would happen
[20:17] <sjust> sstan: you mean without losing the data in the OSDs?
[20:17] <sstan> yes
[20:17] <sjust> sstan: probably not
[20:17] <sjust> sstan: why?
[20:18] <sjust> sstan: or not without manually reconstructing a bunch of mon state
[20:18] <sstan> to see if it solves problems : /
[20:19] * jeffv (~jeffv@2607:fad0:32:a02:990f:d0f4:34e5:a977) has joined #ceph
[20:24] * nwat (~Adium@eduroam-233-33.ucsc.edu) has joined #ceph
[20:27] <rzerres> sjust: digging around at gitbuilder.com, found ref/master with 0.59-503 and ref/next with 0.59-401
[20:27] <rzerres> sjust: will ref/next be the correct version to test?
[20:27] <sjust> you want ref/next I think
[20:27] <sjust> yes
[20:28] <rzerres> ok. did that. pulled all needed packages and started the osd. still same behaviour.
[20:29] <rzerres> updating at pastebin ....
[20:32] <rzerres> sjust: ok. done. you will find the new output
[20:34] * eschnou (~eschnou@227.159-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:34] <sjust> from first crash:  ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
[20:34] <sjust> from second crash:  ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
[20:34] <sjust> you did not upgrade
[20:35] <rzerres> sjust: oh, i'm so stupid. wrong machine .... hold on
[20:35] <sjust> no worries
[20:36] <rzerres> now 0.59-401-g23faa9f-1precise on all cluster-hardware
[20:36] <sjust> that should be right
[20:37] <rzerres> yes, it's syncing now .... i might change the debug level back from 20, too much output ....
[20:37] <sjust> yeah, it's a bit of a fire hose
[20:37] <rzerres> looks a bit like artwork :)
[20:38] <rzerres> at least i do see the correct journal and osd-path so ...
[20:39] <sstan> rzerres : did you get your mons to boot properly ?
[20:40] <rzerres> right now, the machine is too busy to get any ssh feedback :(
[20:40] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:41] <rzerres> last mon stat was: 'e4: 3 mons at {0=,1=,2=}, election epoch 280, quorum 0,2 1,2'
[20:55] <rzerres> sjust: thanks a lot! cluster is resolving its degraded state now. Data will be available again, yeah!
[20:55] <sjust> rzerres: cool
[20:56] <rzerres> sjust: one more question now
[20:57] <rzerres> sjust: i have configured the cluster to use a dedicated 1gb eth for the cluster addr
[20:57] * eschnou (~eschnou@227.159-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[20:58] <rzerres> this will give me max ~110Mbyte throughput for the sync, right?
[20:58] <rzerres> since this is the physical limit of the interface
[20:58] <sjust> rzerres: first, you may not be able to actually get that from the link
[20:59] <sjust> rzerres: second, ceph is going to squander some of your bandwidth on latency
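The back-of-the-envelope figure rzerres quotes can be checked with plain shell arithmetic: a link's raw ceiling in MB/s is its bit rate divided by 8. The helper name `raw_mbytes_per_sec` is hypothetical, and the result is only the raw line rate in decimal units. Protocol overhead and, as sjust notes, latency push the usable number lower, which is consistent with the roughly 110 MB/s mentioned above.

```shell
#!/bin/sh
# raw_mbytes_per_sec: hypothetical helper; converts a link speed in Gbit/s
# to its raw ceiling in MB/s (decimal units, 8 bits per byte).
raw_mbytes_per_sec() {
    echo $(( $1 * 1000 / 8 ))
}

raw_mbytes_per_sec 1     # 1 Gbit/s link  -> prints 125
raw_mbytes_per_sec 10    # 10 Gbit/s link -> prints 1250
```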
[20:59] <sjust> rzerres: are you doing client IO at the same time?
[20:59] <rzerres> sjust: i have put in a 10Gbit card in both cluster systems, doing peer to peer linking
[21:00] <sjust> rzerres: so you aren't using the 1gb connection?
[21:00] <rzerres> sjust: no no IO now while the cluster is rebuilding
[21:00] <rzerres> sjust: right now, yes
[21:00] <sjust> rzerres: just to clarify, at the moment, you have configured the cluster addr on the OSDs to be the 10g link?
[21:01] <sjust> and you only have two OSD machines?
[21:01] <rzerres> sjust: if i would like to rebind to a new ip for the cluster addr, pointing to the new ibg cards i have to stop the cluster first?
[21:01] <sjust> rzerres: yeah
[21:01] <rzerres> sjust: yes just 2 osd machines
[21:02] <rzerres> sjust: is it safe to do it right now, stopping the rebuild with a safe service ceph stop; refine the ip addresses in the ceph.conf; restarting service ceph start?
[21:03] <sjust> I think so
[21:04] <rzerres> that should give a way better sync rate. Then the sata-disks will be the bottleneck, not the nic, right?
[21:04] <sjust> rzerres: hopefully
[21:04] <sjust> you will probably need to tweak the OSD settings
[21:04] <rzerres> sjust: i will let you know ....
[21:04] <sjust> k
[21:05] * DJF5 (~dennisdeg@backend0.link0.net) Quit (Ping timeout: 480 seconds)
[21:05] <rzerres> sjust: which osd settings beside the 'cluster addr = xx.xx.xx.xx'?
[21:06] <rzerres> :q
[21:06] <sjust> that should do it for switching the interface (I think)
[21:06] <sjust> you may need to tune the recovery settings to maximize throughput
[21:06] <sjust> we'll see
[21:06] <rzerres> ok. will tweak the conf now ....
[21:13] <rzerres> sjust: working!
[21:14] <rzerres> sjust : output from ceph -w : 18588 GB avail; 1104720/2312859 degraded (47.764%); 3857/1155732 unfound (0.334%); recovering 13 o/s, 56710KB/s
[21:14] <sjust> cool
[21:14] <sstan> rzerres : how did you fix your mons ?
[21:14] <sjust> though it's only recovering at 56 MB/s
[21:14] <sjust> which is not great
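To put the numbers from that ceph -w line in perspective, here is a small sketch that parses the status text exactly as pasted above and estimates how long recovery would take at the reported object rate. It assumes the rate stays constant, which it will not in practice, and the regular expressions are written only against this one sample line; other Ceph versions may format the line differently.

```python
import re

# Status text copied from the ceph -w output quoted in the log.
STATUS = ("18588 GB avail; 1104720/2312859 degraded (47.764%); "
          "3857/1155732 unfound (0.334%); recovering 13 o/s, 56710KB/s")


def parse_recovery(line):
    """Extract degraded counts and recovery rate from one status line."""
    degraded = re.search(r"(\d+)/(\d+) degraded", line)
    rate = re.search(r"recovering (\d+) o/s, (\d+)KB/s", line)
    if not degraded or not rate:
        raise ValueError("unrecognised status line")
    degraded_objs, total_objs = int(degraded.group(1)), int(degraded.group(2))
    objs_per_s, kb_per_s = int(rate.group(1)), int(rate.group(2))
    return {
        "degraded_pct": 100.0 * degraded_objs / total_objs,
        "kb_per_s": kb_per_s,
        # Naive estimate: remaining degraded objects / current object rate.
        "eta_hours": degraded_objs / objs_per_s / 3600.0,
    }


info = parse_recovery(STATUS)
print(info)
```

At 13 objects per second the estimate comes out at roughly a day, which is why raising the recovery rate is worth trying here.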
[21:15] <rzerres> sstan: with the help of joao ...
[21:15] <sstan> last Friday?
[21:15] * eschnou (~eschnou@227.159-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:15] <sstan> or in a private conversation
[21:15] <sjust> rzerres: what is the output of ceph -w?
[21:15] <sjust> oops
[21:15] <sjust> ceph -s
[21:15] <rzerres> sjust: it is between 55 MB/s and 200Mb/s
[21:15] <sjust> rzerres: that's more reasonable
[21:16] <rzerres> sjust: depends on the pool, i guess
[21:16] <sjust> you may be able to kick up the rate by setting osd_recovery_max_active to 20
[21:16] <rzerres> sjust: using 2 osd's on an underlying raid 5, 2osd's bound to jbod sata controller
[21:17] <rzerres> ok, will try that
[21:17] <sjust> ceph osd tell \* -- injectargs 'osd-recovery-max-active=20'
[21:17] <sjust> I think
[21:17] <sjust> wait
[21:17] <rzerres> waiting ....
[21:17] <sjust> ceph osd tell \* -- injectargs '--osd-recovery-max-active=20'
[21:17] <sjust> that
[21:17] <rzerres> why \* ?
[21:18] <sjust> you don't need it unless your shell would otherwise expand the *
[21:18] <sjust> which just about all of them do
[21:18] <sjust> hmm, so you have exactly two osds?
[21:18] <sjust> 1 osd per machine?
[21:18] <rzerres> ah, a wildcard for all running osds
[21:18] <sjust> right
[21:20] <sjust> rzerres: note, that setting will probably increase the impact of recovery on client IO
[21:21] <rzerres> sjust: no, like this
[21:21] <rzerres> -1   16.8   room server-raum-keller
[21:21] <rzerres> -2   16.8     rack rack-daywalker
[21:21] <rzerres> -4   14.8       storage daywalker-data
[21:21] <rzerres> -6   7.4          host dwssrv1
[21:21] <rzerres> 0    3.7            osd.0   up   1
[21:21] <rzerres> 1    3.7            osd.1   up   1
[21:21] <rzerres> -7   7.4          host dwssrv2
[21:21] <rzerres> 3    3.7            osd.3   up   1
[21:21] <rzerres> 4    3.7            osd.4   up   1
[21:21] <rzerres> -5   2          storage daywalker-archive
[21:21] <rzerres> -8   2            host dwssrv1-archive
[21:21] <rzerres> 7    1              osd.7   up   1
[21:21] <rzerres> 8    1              osd.8   up   1
[21:21] <rzerres> -9   2            host dwssrv2-archive
[21:21] <rzerres> 5    1              osd.5   up   1
[21:21] <rzerres> 6    1              osd.6   up   1
[21:21] <sjust> interesting
[21:21] <sjust> how many disks behind each osd?
[21:22] <rzerres> osd.0 and osd.1 -> raid with 3x 2TB
[21:22] <rzerres> and osd.3 and osd.4 - alike
[21:22] <rzerres> osd 5 to osd.6 -> jbod with 1TB each connected to an LSI SATA
[21:23] <rzerres> i do use 10GB partitions on ssd as a journal.
[21:23] <rzerres> so each machine has an extra 120 GB ssd
[21:23] <sjust> hmm, our defaults really assume one osd per disk, you would probably benefit from some tuning
[21:23] <sjust> in particular
[21:23] * terje (~Adium@c-67-176-69-220.hsd1.co.comcast.net) has joined #ceph
[21:24] <rzerres> sjust: i'm just the newbee kiddy getting his hands on the tools
[21:24] <sjust> osd_op_threads defaults to 2 but with fewer osds per host you might try kicking that up to 3 or 4 (or higher, I don't really have a good idea of what would be optimal)
[21:24] <rzerres> sjust: tuning is next step, before going into production
[21:24] <sjust> osd_recovery_threads defaults to 1 but you might benefit from having it set to 2 or more
[21:25] <sjust> filestore_op_threads defaults to 2 but you might want 4 or more
[21:25] <sjust> just some settings to play with
[21:25] <rzerres> sjust: very interesting
[21:27] <rzerres> sjust: so i can play with online injection like -> - ceph osd tell \* -- injectargs '<cmd>' ?
[21:27] <sjust> osd_recovery_threads -> <newval> would be '--osd-recovery-threads=<newval>'
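The naming rule sjust describes (underscores become hyphens, with a leading "--") can be sketched as a small helper that builds the command lines without running them. The command form is the one quoted in this log and may differ in other Ceph versions; the helper names and the example values (taken from the suggestions above) are illustrative only, and the log does not confirm that every option takes effect when injected at runtime.

```python
def injectarg(option, value):
    """Turn e.g. ('osd_recovery_threads', 2) into '--osd-recovery-threads=2'."""
    return "--{}={}".format(option.replace("_", "-"), value)


def tell_all_osds(settings):
    """Build one 'ceph osd tell * -- injectargs ...' argv list per setting."""
    cmds = []
    for option, value in settings.items():
        # Passed as an argv list (no shell), so "*" needs no backslash here.
        cmds.append(["ceph", "osd", "tell", "*", "--",
                     "injectargs", injectarg(option, value)])
    return cmds


# Example values from the discussion above; these are starting points to
# experiment with, not recommendations.
cmds = tell_all_osds({
    "osd_recovery_max_active": 20,
    "osd_op_threads": 4,
    "osd_recovery_threads": 2,
    "filestore_op_threads": 4,
})
for cmd in cmds:
    print(" ".join(cmd))
```

Each list could be handed to subprocess.run() on a node with admin credentials; when typing the command in a shell instead, the "*" should be escaped as sjust explains.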
[21:27] <nhm> rzerres: you may find this useful: http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/
[21:28] <rzerres> sjust: yes, thanks. I already found these very helpful stress test reports .... well done.
[21:31] <rzerres> sstan: will contact you a bit later, ok?
[21:32] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Remote host closed the connection)
[21:43] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[21:46] * sivanov (~sivanov@ has joined #ceph
[21:49] <sstan> ok
[21:50] * The_Bishop_ (~bishop@e179010009.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[21:50] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[21:51] <mjblw> is it possible, to move rbd volumes from one pool to another, either using rbd or rados?
[21:51] <rzerres> sjust: thanks so much. You saved my day :)
[21:52] <sjust> certainly, let us know if you find anything interesting roaming the config space
[21:57] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Quit: Leaving.)
[21:57] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[22:05] * houkouonchi-work (~linux@ Quit (Read error: Connection reset by peer)
[22:06] * houkouonchi-work (~linux@ has joined #ceph
[22:11] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[22:13] * portante is now known as portante|afk
[22:23] <joao> sstan, rzerres monitors didn't have a problem per se; rzerres did hit an issue with one of the monitors, but that wasn't the source of his main problem
[22:23] <joao> in fact, rzerres' main issue was that he shutdown/killed/interrupted a monitor mid-conversion
[22:23] <joao> and the converted store got borked
[22:24] <joao> the other issue was just that his store was taking too long to convert, and it appeared as if the monitor was stalled
[22:24] <sstan> In my case, all mons seem to be up, but every ceph command hangs
[22:24] <joao> sstan, we'd be able to know more granted you have logs
[22:25] <joao> you *having* logs
[22:25] <joao> or something of the sorts, am way too hungry to figure out the correct form of that phrase
[22:26] <sstan> not much logs right now .. except
[22:26] <sstan> pipe(0x1610230 sd=20 :6789 s=0 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state connecting
[22:27] <joao> sstan, 'debug mon = 20' would provide further insights on what's happening
[22:27] <sstan> ok I'll try that
[22:29] <loicd> sjust: Hi ! I see you authored many commits from https://github.com/ceph/ceph/blob/master/src/os/LFNIndex.cc . Would you have any advice regarding unit tests ? It is fine if you don't I'll just go ahead and write them ;-)
[22:30] <sstan> joao : http://pastebin.com/aJh7qzR6
[22:32] <sstan> oh now ceph health returns .. 1 mons down even though /etc/init.d/ceph status shows they're running right now
[22:35] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:35] <joao> sstan, that other monitor that is out of the quorum is in fact in the middle of synchronizing its store
[22:36] <joao> that should be the monitor on
[22:36] <joao> depending on how big the store, that might take a while
[22:37] <joao> well, dinner is ready; brb
[22:37] <sstan> hmm that's possible
[22:37] <sstan> thanks joao !
[22:37] <sstan> top shows 100% cpu usage for ceph-mon
[22:39] * The_Bishop (~bishop@2001:470:50b6:0:1475:dc56:2e23:e501) has joined #ceph
[22:41] <sjust> loicd: hi
[22:43] <sjust> loicd: the pure functions in LFNIndex.h are good candidates for unit testing
[22:43] <sjust> unfortunately, that's mostly just the methods in the manglers/demanglers sectino
[22:43] <sjust> *section
[22:44] <sjust> at the bottom
[22:44] <sjust> the rest are thin wrappers over IO
[22:45] * loicd looking
[22:45] <sjust> lfn_get_name along with lfn_unlink are the least trivial pieces of machinery
[22:45] <sjust> and I don't think they have explicit test coverage
[22:47] <loicd> I'll start with them. It's a little tricky because they are private.
[22:48] <sjust> loicd: urgh, would prefer they not be public, but there's probably no real harm
[22:48] <sjust> ideally, you could make a public debug_runtests() method
[22:48] <sjust> not sure
[22:49] <sjust> test/filestore/store_test.cc has some testing for the collection index machinery
[22:49] <loicd> I'll try to do with the protected/public methods. If I'm stuck I'll figure out a minimal way to create the conditions for a good unit test.
[22:49] <sjust> though I wouldn't classify it as a unit test
[22:49] <sjust> loicd: k
[22:50] <loicd> sjust: thanks for the advice :-)
[22:50] <sjust> sure, test coverage is a good thing
[22:56] * LeaChim (~LeaChim@b0fae63d.bb.sky.com) has joined #ceph
[22:56] * sleinen1 (~Adium@2001:620:0:26:f5fe:f848:a00e:5c53) Quit (Quit: Leaving.)
[23:06] * aliguori (~anthony@ Quit (Remote host closed the connection)
[23:06] * sivanov (~sivanov@ Quit (Ping timeout: 480 seconds)
[23:08] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[23:15] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[23:15] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[23:16] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[23:16] * BillK (~BillK@58-7-172-123.dyn.iinet.net.au) has joined #ceph
[23:17] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit ()
[23:18] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[23:19] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit ()
[23:19] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[23:20] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[23:23] * eschnou (~eschnou@227.159-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:33] <mjblw> does anyone know, is it possible, to move rbd volumes from one pool to another, either using rbd or rados?
[23:36] <joshd> mjblw: yes, you can copy or clone them from one pool to another with the rbd cli
[23:36] <joshd> i.e. rbd cp pool1/image1 pool2/image2
[23:36] <Kioob> joshd: and is it really faster than just doing a dd ?
[23:37] <Kioob> oh... dd doesn't handle the "thin provisioning"
[23:38] <joshd> if you just want it available quickly, cloning + flattening is even better
[23:39] <Kioob> ok thanks
[23:39] <mjblw> cloning and flattening?
[23:39] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[23:39] <mjblw> i am looking for the least downtime option
[23:40] <mjblw> i'm assuming these operations cannot be done while the rbd volume is actively being used.
[23:42] <joshd> well, cloning is done from a snapshot, and requires a format 2 image (which kernel rbd doesn't understand yet) http://ceph.com/docs/master/rbd/rbd-snapshot/#layering
[23:44] <joshd> if your image is format 1 or you're using the kernel client, it won't matter much whether you use 'rbd cp' or a large block size dd from one volume to another
[23:45] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[23:50] <mjblw> Ok. I'm on format 1, so the choice is made.
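The two options joshd describes can be written out as command sequences. The sketch below only builds the argv lists and does not execute anything. The pool and image names are the placeholders from joshd's example, the snapshot name "migrate" is made up, and the clone path assumes a format 2 image as noted above.

```python
def copy_cmd(src_pool, src_image, dst_pool, dst_image):
    """Plain copy: works for format 1 images, takes a full pass over the data."""
    return ["rbd", "cp",
            "{}/{}".format(src_pool, src_image),
            "{}/{}".format(dst_pool, dst_image)]


def clone_and_flatten_cmds(src_pool, src_image, dst_pool, dst_image,
                           snap="migrate"):
    """Clone from a protected snapshot, then flatten to detach from the parent."""
    src_snap = "{}/{}@{}".format(src_pool, src_image, snap)
    dst = "{}/{}".format(dst_pool, dst_image)
    return [
        ["rbd", "snap", "create", src_snap],
        ["rbd", "snap", "protect", src_snap],   # clones require a protected snapshot
        ["rbd", "clone", src_snap, dst],        # clone is usable right away
        ["rbd", "flatten", dst],                # copies parent data into the clone
    ]


print(" ".join(copy_cmd("pool1", "image1", "pool2", "image2")))
for cmd in clone_and_flatten_cmds("pool1", "image1", "pool2", "image2"):
    print(" ".join(cmd))
```

With the clone route the new image is available as soon as the clone step finishes, and the flatten can run afterwards, which is the "available quickly" property joshd refers to.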
[23:50] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[23:51] * jtang1 (~jtang@ has joined #ceph
[23:57] * loicd (~loic@74-94-156-210-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.