#ceph IRC Log

IRC Log for 2013-07-28

Timestamps are in GMT/BST.

[0:03] * via (~via@smtp2.matthewvia.info) has joined #ceph
[0:47] * via (~via@smtp2.matthewvia.info) Quit (Ping timeout: 480 seconds)
[0:47] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[0:51] * lautriv (~lautriv@f050081222.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[0:56] * bitblt (~don@ip68-226-26-79.tc.ph.cox.net) has joined #ceph
[1:01] * lautriv (~lautriv@f050081250.adsl.alicedsl.de) has joined #ceph
[1:15] * via (~via@smtp2.matthewvia.info) has joined #ceph
[1:32] * LeaChim (~LeaChim@0540adc6.skybroadband.com) Quit (Ping timeout: 480 seconds)
[1:49] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:55] * bitblt (~don@ip68-226-26-79.tc.ph.cox.net) Quit (Ping timeout: 480 seconds)
[1:55] * nwat (~nwat@eduroam-251-132.ucsc.edu) Quit (Ping timeout: 480 seconds)
[1:59] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:11] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:26] * bitblot (~don@rtp-isp-nat1.cisco.com) has joined #ceph
[2:35] <bitblot> hi, could someone explain this error to me please?
[2:35] <bitblot> http://codepad.org/HPpyDAhN
[2:43] <sage> bitblot: what version is that?
[2:43] <bitblot> 0.61.7-1precise
[2:43] <sage> can you try with --debug-mon 20 --log-to-stderr ?
[2:44] <- *davidz* 2Mfa827!
[2:44] <sage> also, it looks like you are overriding the default mon data directory in ceph.conf.. was that intentional?
[2:45] <bitblot> hmm, not intentionally, I am trying to use enovance's ceph puppet module. It seemed to work okay for .61.4, but "breaks" with .7
[2:45] <bitblot> yes, it is being overridden
[2:45] <joao> bitblot, does /var/lib/ceph/mon/mon.0 exist?
[2:45] <bitblot> I created it manually, so yes, now
[2:45] <joao> ok
[2:46] <bitblot> I think the issue might be the puppet code is not creating the mon directory
[2:46] <bitblot> which is odd, because it works completely using bobtail
[2:46] <joao> ah
[2:46] <joao> well, the mon needs the directory to exist in cuttlefish
[2:46] <bitblot> oh, did it not in bobtail?
[2:47] <joao> nope, back then it would create the dir
[2:47] <bitblot> aha, ok, I owe you and sage a drink then
[2:47] <bitblot> thank you
[2:47] <joao> yw
[2:48] <sage> we should probably suggest that the puppet module be updated to use default paths
[2:48] <sage> less confusing all around
[2:48] <bitblot> I'll see if I can send a pull request for that. Francois does not seem to like setting explicit paths though..
[2:49] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[2:53] <sage> all puppet should need to do is mkdir /var/lib/ceph/mon/ceph-$hostname and remove the mon data line from ceph.conf
[2:54] <sage> or even get that path by doing ceph-mon -i `hostname` --show-config-value mon_data
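
A minimal sketch of the step sage describes (create the default monitor data directory before starting the mon), assuming the conventional layout /var/lib/ceph/mon/<cluster>-<hostname>. The helper names `default_mon_dir` and `ensure_mon_dir` are hypothetical and are not part of Ceph or the puppet module.

```python
import os
import socket


def default_mon_dir(hostname, cluster="ceph", base="/var/lib/ceph/mon"):
    """Return the conventional monitor data path: <base>/<cluster>-<hostname>."""
    return os.path.join(base, f"{cluster}-{hostname}")


def ensure_mon_dir(hostname, cluster="ceph", base="/var/lib/ceph/mon"):
    """Create the monitor data directory if it is missing and return its path.

    Hypothetical helper: it only mirrors the `mkdir` mentioned in the log.
    """
    path = default_mon_dir(hostname, cluster, base)
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path


if __name__ == "__main__":
    # Only prints the path that would be used on this host; creates nothing.
    print(default_mon_dir(socket.gethostname()))
```

Querying the path with the `ceph-mon ... --show-config-value mon_data` command quoted above would avoid hard-coding the base directory; the sketch hard-codes it only to stay self-contained.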
[3:01] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Ping timeout: 480 seconds)
[3:04] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[3:09] * BillK (~BillK-OFT@124-169-67-32.dyn.iinet.net.au) has joined #ceph
[3:22] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[3:46] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:49] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) Quit (Remote host closed the connection)
[3:53] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[3:58] * huangjun (~kvirc@60.55.8.156) has joined #ceph
[4:38] <huangjun> if i want to debug the ceph,like gdb /usr/bin/zbkc-osd /core.XXXX
[4:38] <huangjun> which package should i install
[4:58] * KindTwo (KindOne@h86.35.186.173.dynamic.ip.windstream.net) has joined #ceph
[5:00] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:00] * KindTwo is now known as KindOne
[5:05] * fireD_ (~fireD@93-139-141-122.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD (~fireD@93-142-249-252.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:58] * bitblot (~don@rtp-isp-nat1.cisco.com) Quit (Read error: Connection reset by peer)
[6:22] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[6:22] * yy-nm (~chatzilla@115.198.96.222) has joined #ceph
[6:44] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[6:45] * DaChun (~quassel@222.76.56.254) has joined #ceph
[6:57] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:11] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[7:12] * yy-nm (~chatzilla@115.198.96.222) Quit (Read error: Connection reset by peer)
[7:13] * yy-nm (~chatzilla@115.198.96.222) has joined #ceph
[7:26] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:49] * DaChun (~quassel@222.76.56.254) Quit (Ping timeout: 480 seconds)
[8:34] * DaChun (~quassel@222.76.56.254) has joined #ceph
[9:09] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley)
[9:36] * yy-nm (~chatzilla@115.198.96.222) Quit (Read error: Connection reset by peer)
[9:36] * yy-nm (~chatzilla@115.198.96.222) has joined #ceph
[9:51] * AfC (~andrew@2001:44b8:31cb:d400:bc9c:a858:863c:7398) has joined #ceph
[9:52] * via (~via@smtp2.matthewvia.info) Quit (Ping timeout: 480 seconds)
[9:52] * saaby (~as@mail.saaby.com) Quit (Ping timeout: 480 seconds)
[10:11] * huangjun|2 (~kvirc@113.107.222.53) has joined #ceph
[10:18] * huangjun (~kvirc@60.55.8.156) Quit (Ping timeout: 480 seconds)
[11:09] * Cybertinus (~Cybertinu@2001:828:405:30:83:96:177:42) has left #ceph
[11:50] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[11:52] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[11:52] * yy-nm (~chatzilla@115.198.96.222) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[12:20] * LeaChim (~LeaChim@0540adc6.skybroadband.com) has joined #ceph
[12:46] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[12:53] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[13:25] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[13:36] * DaChun (~quassel@222.76.56.254) Quit (Ping timeout: 480 seconds)
[14:19] <lautriv> anyone around who is responsible for the sgdisk calls used to prepare drives in ceph ?
[14:23] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[14:31] <lautriv> ok, maybe someone will catch up on this later: since my osd-prepare failed on some drives with a messed-up disklabel, i had a closer look and it appears the problem is that the sgdisk "--largest-new=1" must be used in conjunction with "-a 2048" to respect proper boundaries. even if it is a bit strange that other disks just work, it seems to put the first partition's start too early and truncate the label there.
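
For readers unfamiliar with what an alignment of 2048 means, here is a small sketch of the arithmetic only. It assumes 512-byte logical sectors and a standard GPT whose first usable sector is 34; the function name `align_up` is illustrative. It does not reproduce sgdisk's internals, and the log does not establish that alignment is the cause of the truncation.

```python
SECTOR_SIZE = 512  # assumed logical sector size in bytes


def align_up(sector, alignment=2048):
    """Round a sector number up to the next multiple of `alignment`."""
    return -(-sector // alignment) * alignment  # ceiling division, then scale


# Assumption: first usable LBA on a 512-byte-sector disk with a default GPT.
first_usable = 34
start = align_up(first_usable)
print(start, start * SECTOR_SIZE)  # 2048 1048576 (a 1 MiB boundary)
```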
[14:32] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[14:47] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[14:52] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) has joined #ceph
[14:52] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:03] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[15:10] * waxzce (~waxzce@2a01:e35:2e1e:260:f8bf:c51e:303f:a475) Quit (Remote host closed the connection)
[15:12] * waxzce (~waxzce@glo44-2-82-225-224-38.fbx.proxad.net) has joined #ceph
[15:13] * DaChun (~quassel@222.76.56.254) has joined #ceph
[15:18] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[15:23] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[15:26] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit ()
[15:40] * huangjun|2 (~kvirc@113.107.222.53) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[15:42] <Psi-jack> heh, I kinda wish cephfs itself had a means to set a quota of some sort on the whole mount, kind of like a subvolume on zfs or btrfs.
[15:48] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[15:55] * KindTwo (KindOne@h170.45.28.71.dynamic.ip.windstream.net) has joined #ceph
[15:57] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:57] * KindTwo is now known as KindOne
[16:10] <lautriv> ok, i inserted '--set-alignment=2048', in /usr/sbin/ceph-disk right after --largest-new=1 and it corrupted the labels anyway :(
[16:19] * LeaChim (~LeaChim@0540adc6.skybroadband.com) Quit (Ping timeout: 480 seconds)
[16:23] * BillK (~BillK-OFT@124-169-67-32.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:27] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Remote host closed the connection)
[16:27] * mjeanson (~mjeanson@bell.multivax.ca) has joined #ceph
[16:44] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[17:12] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[17:19] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[17:23] * infernix (nix@cl-1404.ams-04.nl.sixxs.net) Quit (Read error: Connection reset by peer)
[17:31] * LeaChim (~LeaChim@0540adc6.skybroadband.com) has joined #ceph
[17:40] * LeaChim (~LeaChim@0540adc6.skybroadband.com) Quit (Ping timeout: 480 seconds)
[17:49] * AfC (~andrew@2001:44b8:31cb:d400:bc9c:a858:863c:7398) Quit (Quit: Leaving.)
[17:56] * DaChun (~quassel@222.76.56.254) Quit (Ping timeout: 480 seconds)
[18:12] * lautriv notes down :" #ceph is dead each sunday"
[18:13] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[18:25] <sage> Psi-jack: that just came up on the list yesterday. the accounting half is there, just need enforcement on the mds
[18:31] <lautriv> hey sage, any bright idea or hint what else i could check about my disklabel-truncation issue ?
[18:32] * loicd looking for the command to notify the kernel that a RBD volume has changed size. So that resize2fs live works after a rbd resize.
[18:35] <lautriv> loicd, partprobe or some udev-trigger should do the trick.
[18:35] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:35] <loicd> hum
[18:35] * loicd trying
[18:35] <loicd> randomly :-)
[18:37] <loicd> sudo partprobe /dev/rbd0
[18:37] <loicd> sudo resize2fs /dev/rbd0 30000M
[18:37] <loicd> The containing partition (or device) is only 5242880 (4k) blocks.
[18:37] <loicd> You requested a new size of 7680000 blocks.
[18:37] <loicd> I guess there should be an IOCTL of some kind. /dev/rbd0 does not contain a partition table. It's ext4 formatted.
[18:37] <loicd> I did a
[18:37] <loicd> sudo rbd resize --size 30000 nfs
[18:38] <loicd> and got
[18:38] <loicd> sudo rbd info nfs
[18:38] <loicd> rbd image 'nfs':
[18:38] <loicd> size 30000 MB in 7500 objects
[18:38] <loicd> order 22 (4096 KB objects)
[18:38] <loicd> block_name_prefix: rb.0.1209.238e1f29
[18:38] <loicd> format: 1
[18:38] <loicd> so all seems fine
[18:38] <loicd> and if I umount the file system and mount it back, then it works
[18:38] <loicd> but I'm trying to have resize2fs work on a live file system because it's more fun ;-)
[18:39] <loicd> and it's sunday
[18:40] <loicd> maybe there is some hack with /proc ...
[18:53] <lautriv> ok, i found a really hacky workaround for the freaking damaged disklabels : did a raid0 on those drives and symlinked md0 to sde, md01 and md0p1 to sde1 and got a working osd. _should_ still be fixed.
[18:54] * loicd reading http://www.spinics.net/lists/ceph-users/msg02887.html
[18:56] * waxzce (~waxzce@glo44-2-82-225-224-38.fbx.proxad.net) Quit (Remote host closed the connection)
[18:56] <loicd> wido: are you around ?
[18:56] <loicd> I'm trying to get rid of the
[18:56] <loicd> $ umount /tmp/rbd1
[18:56] <loicd> $ mount -o rw,noatime /dev/rbd/rbd/image1 /tmp/rbd1
[18:56] <loicd> part
[18:57] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[18:58] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[18:58] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[19:02] <lautriv> to mount a client with cephx , i use the /etc/ceph/ceph.client.admin.keyring from the monitor, no ?
[19:12] * infernix (nix@5ED33947.cm-7-4a.dynamic.ziggo.nl) has joined #ceph
[19:13] <Psi-jack> sage: Hmm, interesting. But, you know what I mean by quota in this example? The way zfs and btrfs do it is different than conventional per-user type quota. I'm referring to disk quota, as in only providing X size for available maximum usage of the disk itself.
[19:13] <Psi-jack> So, a 4TB CephFS, I would only want a particular mount to allot only 500 GB of.
[19:38] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[19:42] <loicd> strace says resize2fs tries
[19:42] <loicd> ioctl(4, BLKGETSIZE64, 0x7ffffc825080) = 0
[19:46] <loicd> sudo blockdev --getsize64 /dev/rbd0
[19:46] <loicd> 21474836480
[19:46] <loicd> instead of 30G
[19:48] <loicd> sudo umount /mnt
[19:48] <loicd> sudo blockdev --getsize64 /dev/rbd0
[19:48] <loicd> 31457280000
[19:49] <loicd> that narrows the problem : how to make it so blockdev --getsize64 /dev/rbd0 returns the actual size of the underlying devices after a resize
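
The numbers loicd pasted are mutually consistent, which a short calculation makes visible: the mounted device still reports the old 20 GiB size, while the image itself is already 30000 MB. The sketch below only redoes that arithmetic from the values in the log; the variable names are illustrative.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
FS_BLOCK = 4096          # resize2fs reported sizes in 4k blocks
OBJECT_SIZE = 1 << 22    # "order 22 (4096 KB objects)" from rbd info

requested_bytes = 30000 * MIB        # rbd resize --size 30000
stale_bytes = 5242880 * FS_BLOCK     # device size resize2fs saw while mounted

print(requested_bytes)                 # 31457280000, blockdev value after umount
print(stale_bytes)                     # 21474836480, blockdev value while mounted
print(stale_bytes // GIB)              # 20, i.e. the old size in GiB
print(requested_bytes // FS_BLOCK)     # 7680000, the block count resize2fs asked for
print(requested_bytes // OBJECT_SIZE)  # 7500, the object count in rbd info
```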
[19:57] * jluis (~JL@89.181.148.68) has joined #ceph
[19:58] <jluis> sage, around?
[20:00] <wido> loicd: around now
[20:07] <sage> jluis: for a few minutes
[20:07] <sage> whats up?
[20:07] <jluis> have you ever seen this:
[20:08] <jluis> common/buffer.cc: 336: FAILED assert(_raw)
[20:08] <jluis> ceph version 0.67-rc2-35-g6b505ec (6b505ec7a2fbe20c88ae6da7243d2085f2128c26)
[20:08] <jluis> 1: ./ceph-mon() [0x7b188f]
[20:08] <jluis> 2: (ceph::buffer::list::crc32c(unsigned int)+0x38) [0x6109d8]
[20:08] <jluis> 3: (Message::encode(unsigned long, bool)+0x25) [0x726a75]
[20:08] <jluis> 4: (encode_message(Message*, unsigned long, ceph::buffer::list&)+0x187) [0x726e17]
[20:08] <jluis> 5: (MForward::encode_payload(unsigned long)+0x97) [0x611317]
[20:08] <jluis> ?
[20:08] <sage> nope
[20:08] <sage> do you have a log?
[20:08] <jluis> yeah
[20:08] <sage> i bet there is a forward_messge() that isn't followed by a return;
[20:08] <jluis> I'll look into it; might be my own doing
[20:08] <sage> so the caller does m->put on the message
[20:09] <jluis> might be, although I didn't touch that portion of the code
[20:09] <jluis> something to keep me entertained :p
[20:11] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Remote host closed the connection)
[20:12] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[20:13] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[20:14] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[20:18] <jluis> sage, yeah, "oops"
[20:18] <jluis> forgot to return when !capable
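
The class of bug sage guessed at (a forward that is not followed by a return, so the message is released a second time) can be shown with a toy reference-counting model. This is not Ceph code: `Message`, `forward_request`, and the `must_forward` flag are hypothetical stand-ins, and the assumption that forwarding consumes the caller's reference belongs to this sketch only.

```python
class Message:
    """Toy reference-counted message; starts with one reference."""

    def __init__(self):
        self.refs = 1

    def put(self):
        """Drop one reference; dropping below zero models use after release."""
        if self.refs <= 0:
            raise RuntimeError("put() on a message with no references left")
        self.refs -= 1


def forward_request(msg):
    """Assumption of this model: forwarding takes over the caller's reference."""
    msg.put()


def handle_buggy(msg, must_forward):
    if must_forward:
        forward_request(msg)
        # BUG: no `return` here, so control falls through to the put() below.
    msg.put()


def handle_fixed(msg, must_forward):
    if must_forward:
        forward_request(msg)
        return  # the message is no longer ours to release
    msg.put()
```

Calling `handle_buggy(Message(), must_forward=True)` raises in this model, whereas `handle_fixed` releases the message exactly once on either path.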
[20:20] * xmltok (~xmltok@pool101.bizrate.com) Quit (Ping timeout: 480 seconds)
[20:21] * LeaChim (~LeaChim@0540adc6.skybroadband.com) has joined #ceph
[21:00] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[21:04] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[21:08] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[21:22] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[21:30] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[21:44] <loicd> ls -l /sys/devices/virtual/block/rbd0/ shows a number of entries ( http://pastebin.com/TxeWSBdD ) I wonder if one of them could be used to refresh the cache
[21:52] * jluis (~JL@89.181.148.68) Quit (Ping timeout: 480 seconds)
[22:00] * jluis (~JL@89.181.148.68) has joined #ceph
[22:02] * Tamil (~tamil@38.122.20.226) has joined #ceph
[22:05] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Bye!)
[22:06] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[22:07] * Tamil1 (~tamil@38.122.20.226) Quit (Read error: Operation timed out)
[22:21] * scalability-junk (uid6422@id-6422.ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[22:21] * scalability-junk (uid6422@ealing.irccloud.com) has joined #ceph
[22:43] * s2r2 (uid322@id-322.ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[22:44] * Tribaal (uid3081@id-3081.ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[23:05] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[23:06] * BillK (~BillK-OFT@124-169-67-32.dyn.iinet.net.au) has joined #ceph
[23:14] <lautriv> if a disklabel from a formerly working osd got truncated, what's the best way to recover ? ( had a read on gpt fdisk but is not in my repo )
[23:21] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[23:28] * via (~via@smtp2.matthewvia.info) has joined #ceph
[23:31] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[23:34] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[23:35] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[23:35] * ChanServ sets mode +o scuttlemonkey
[23:40] * danieagle (~Daniel@186.214.77.206) has joined #ceph
[23:40] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[23:41] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit ()
[23:44] * waxzce (~waxzce@glo44-2-82-225-224-38.fbx.proxad.net) has joined #ceph
[23:56] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[23:58] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.