#ceph IRC Log

IRC Log for 2012-07-05

Timestamps are in GMT/BST.

[0:55] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[0:57] * The_Bishop (~bishop@2a01:198:2ee:0:acf0:de6e:7db:80e9) Quit (Quit: Who the hell is this peer? When I catch him, I'll reset his connection!)
[1:05] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[1:05] * stass (stas@ssh.deglitch.com) has joined #ceph
[1:15] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[1:32] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[2:15] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[2:41] * danieagle (~Daniel@177.43.213.15) Quit (Quit: See you later :-) and Many Thanks For Everything!!! ^^)
[2:54] * lofejndif (~lsqavnbok@82VAAEWDG.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[3:27] * renzhi (~renzhi@69.163.36.54) has joined #ceph
[3:50] <renzhi> Hello, has anyone gotten errors upgrading from 0.47 to 0.48 on Debian?
[3:50] <renzhi> Unpacking ceph-fs-common (from .../ceph-fs-common_0.48argonaut-1~bpo70+1_amd64.deb) ...
[3:50] <renzhi> dpkg: error processing /var/cache/apt/archives/ceph-fs-common_0.48argonaut-1~bpo70+1_amd64.deb (--unpack):
[3:50] <renzhi> trying to overwrite '/sbin/mount.ceph', which is also in package ceph-common 0.47.2-1~bpo70+1
[3:50] <renzhi> Selecting previously unselected package ceph-mds.
[3:50] <renzhi> Unpacking ceph-mds (from .../ceph-mds_0.48argonaut-1~bpo70+1_amd64.deb) ...
[3:50] <renzhi> Processing triggers for man-db ...
[3:50] <renzhi> Errors were encountered while processing:
[3:50] <renzhi> /var/cache/apt/archives/ceph-fs-common_0.48argonaut-1~bpo70+1_amd64.deb
[3:50] <renzhi> E: Sub-process /usr/bin/dpkg returned an error code (1)
[3:51] <renzhi> I shut down ceph on that node and ran apt-get install ceph
[3:51] <renzhi> the system is debian wheezy
[4:05] <renzhi> I reran apt-get install ceph, and it went through. Not sure if this is good or not. Still in the recovery phase.
[4:28] <renzhi> I'm doing this upgrade on a very small test cluster, not a lot of data. I'm wondering how long the conversion and recovery could take.
[5:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:59] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:44] * Qten (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: Connection reset by peer)
[7:44] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[7:44] * Qten (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[8:52] <pmjdebruijn> renzhi: I'm guessing that's a minor packaging oversight, where one package should have obsoleted another
[8:53] <pmjdebruijn> http://www.debian.org/doc/debian-policy/ch-relationships.html
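The conflict renzhi hit is the case Debian's package relationships are meant to handle: once /sbin/mount.ceph moved from ceph-common to ceph-fs-common, a versioned `Replaces` (normally paired with `Breaks`) on the older ceph-common would let dpkg take over the file instead of aborting. Whether such a bound covers the installed 0.47.2-1~bpo70+1 comes down to dpkg's version ordering, where `~` sorts before everything, even the end of the string. The Python sketch below reimplements that ordering from the rules in Debian policy; it is an illustration, not dpkg's own code, and the `0.48` bound in the usage note is a hypothetical example rather than what the ceph packages actually declare.

```python
import string


def _order(c: str) -> int:
    """Sort weight of one non-digit character, per Debian policy:
    '~' first (even before end of string), then end of string,
    then ASCII letters, then every other character."""
    if c == "~":
        return -1
    if c == "" or c.isdigit():
        return 0
    if c in string.ascii_letters:
        return ord(c)
    return ord(c) + 256


def _cmp_fragment(a: str, b: str) -> int:
    """Compare two upstream-version (or revision) strings by alternating
    non-digit runs (compared with _order) and digit runs (compared numerically)."""
    def at(s: str, k: int) -> str:
        return s[k] if k < len(s) else ""

    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character by character.
        while (at(a, i) and not at(a, i).isdigit()) or (at(b, j) and not at(b, j).isdigit()):
            diff = _order(at(a, i)) - _order(at(b, j))
            if diff:
                return diff
            i += 1
            j += 1
        # Digit run: skip leading zeros; the longer number wins,
        # otherwise the first differing digit decides.
        while at(a, i) == "0":
            i += 1
        while at(b, j) == "0":
            j += 1
        first_diff = 0
        while at(a, i).isdigit() and at(b, j).isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if at(a, i).isdigit():
            return 1
        if at(b, j).isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch = 0
    if ":" in version:
        head, version = version.split(":", 1)
        epoch = int(head)
    upstream, _, revision = version.rpartition("-")
    if not upstream:  # no '-' present: the whole string is the upstream part
        upstream, revision = revision, ""
    return epoch, upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as Debian version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    result = (ea > eb) - (ea < eb)
    if not result:
        result = _cmp_fragment(ua, ub) or _cmp_fragment(ra, rb)
    return (result > 0) - (result < 0)
```

For example, `compare_versions("0.47.2-1~bpo70+1", "0.48")` returns -1, so a hypothetical `Replaces: ceph-common (<< 0.48)` would cover the installed package, and `compare_versions("1.0-1~bpo70+1", "1.0-1")` also returns -1, which is why backport builds sort below the matching regular upload.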
[8:58] <renzhi> ok, I'll take a look
[8:59] <renzhi> It took several hours to upgrade and convert, but it seems to have succeeded.
[9:00] <renzhi> However, one of the mons seems to be behaving strangely.
[9:05] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:11] <Qten> any ideas on how rbd boot in essex/folsom is coming along?
[9:19] * renzhi (~renzhi@69.163.36.54) Quit (Ping timeout: 480 seconds)
[9:27] * renzhi (~renzhi@69.163.36.54) has joined #ceph
[9:36] <pmjdebruijn> renzhi: our osd's are still converting... 20+ hours :s
[9:38] * johnl (~johnl@2a02:1348:14c:1720:c0ae:8290:f70c:9d25) Quit (Remote host closed the connection)
[9:38] * johnl (~johnl@2a02:1348:14c:1720:6d1c:ccf:522a:ba70) has joined #ceph
[9:39] <renzhi> pmjdebruijn: :D
[9:39] <renzhi> ours is just a small test cluster, but that was slow, to be sure.
[9:39] <renzhi> not sure how fast it could convert with btrfs, we are using xfs
[9:48] <NaioN> we too
[9:48] <NaioN> (same company as pmjdebruijn)
[9:50] * deepsa (~deepsa@122.172.22.108) Quit (Ping timeout: 480 seconds)
[10:00] <renzhi> there seems to be a lot of development going on with btrfs, though; hope it will be production-ready soon
[10:02] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[10:31] * deepsa (~deepsa@115.242.216.119) has joined #ceph
[11:07] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:31] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[11:38] * The_Bishop (~bishop@e179020007.adsl.alicedsl.de) has joined #ceph
[12:13] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[12:29] * deepsa (~deepsa@115.242.216.119) Quit (Ping timeout: 480 seconds)
[12:30] * deepsa (~deepsa@122.172.174.250) has joined #ceph
[13:44] * neerbeer (~Adium@c-75-75-33-53.hsd1.va.comcast.net) has joined #ceph
[13:45] * neerbeer (~Adium@c-75-75-33-53.hsd1.va.comcast.net) Quit ()
[14:51] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:04] * goedi (goedi@195.26.5.166) Quit (Read error: Connection reset by peer)
[16:04] * goedi (goedi@195.26.5.166) has joined #ceph
[16:33] * The_Bishop (~bishop@e179020007.adsl.alicedsl.de) Quit (Remote host closed the connection)
[16:40] * goedi (goedi@195.26.5.166) Quit (Read error: Connection reset by peer)
[16:40] * goedi (goedi@195.26.5.166) has joined #ceph
[16:53] <nhm> ugh, installing asciidoc takes up 750MB of space with all the depends.
[16:53] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[16:54] * fghaas (~florian@194.158.199.28) has joined #ceph
[17:09] * deepsa_ (~deepsa@122.167.169.82) has joined #ceph
[17:11] * deepsa (~deepsa@122.172.174.250) Quit (Ping timeout: 480 seconds)
[17:11] * deepsa_ is now known as deepsa
[17:19] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:28] * ceph-test (~Runner@mail.lexinter-sa.COM) Quit ()
[17:34] * mgalkiewicz (~mgalkiewi@staticline58611.toya.net.pl) has joined #ceph
[17:35] <mgalkiewicz> hi, is there any replacement for the ceph -s command in 0.48, other than the status for each subsystem?
[17:39] <joao> doesn't 'ceph status' return the same thing as '-s'?
[17:43] <mgalkiewicz> joao: nope I got "unrecognized subsystem"
[17:43] <joao> weird
[17:43] <joao> wasn't aware we changed that
[17:43] <joao> but I've been away from master for some time now, maybe I missed that change
[17:44] <joao> well, iirc, the changelog for 0.48 did state that there were some format changes on the ceph tool status report
[17:45] <mgalkiewicz> yep, but the documentation is still outdated
[17:47] <fghaas> mgalkiewicz: are you saying "ceph -s" isn't working for you in 0.48?
[17:47] <fghaas> I ask because it is for me
[17:48] <joao> mgalkiewicz, it's still in master though
[17:49] <mgalkiewicz> yes it is not working
[17:49] <mgalkiewicz> I have just installed 0.48 from your debian wheezy repo
[17:49] <mgalkiewicz> ceph osd stat works
[17:49] <fghaas> works just fine for me, albeit using the ceph.com debian repo
[17:49] <fghaas> on squeeze
[17:50] <mgalkiewicz> my version 0.48argonaut-1~bpo70+1
[17:52] <joao> just to confirm, the error message was "unrecognized subsystem"?
[17:52] <joao> not "unrecognized command" or something else?
[17:53] <mgalkiewicz> no
[17:53] <fghaas> on that 0.48 note, I must confess that the following error message is exceedingly helpful:
[17:53] <fghaas> # ceph-authtool -l /etc/ceph/keyring
[17:53] <fghaas> error reading file /etc/ceph/keyring
[17:54] <mgalkiewicz> joao: it is definitely "unrecognized subsystem"
[17:54] <joao> gotta stash my stuff and checkout master then
[17:54] <joao> just a moment
[17:54] <joao> maybe that's an error message introduced in recent patches
[18:01] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[18:03] <joao> I gotta say, this is weird; I just don't seem to be able to find any error message putting "unrecognized" and "subsystem" together in some way
[18:05] <mgalkiewicz> so the package might have been built from different source code
[18:10] * lofejndif (~lsqavnbok@82VAAEXA5.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:10] <joao> honestly, that sounds unlikely, but it's a possibility given the error message; I think sagewk (or whoever builds the packages) would know best :)
[18:11] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[18:15] <mgalkiewicz> joao: ok should I report a bug?
[18:17] <joao> maybe it would be easier to wait a bit until they get to the office, but if you'd rather file a bug report that would be cool too :)
[18:17] <joao> I can just point them to the bug once they show up
[18:17] <joao> *bug report
[18:22] <mgalkiewicz> ok
[18:29] * loicd (~loic@83.167.43.235) Quit (Quit: Leaving.)
[18:49] <fghaas> looking at the newly recommended way to create client keys, is there a way to add/remove/modify capabilities with "ceph auth get-or-create", rather than only set them on key creation?
[18:51] <fghaas> the naive way of doing "ceph auth get-or-create <name> <caps>" yields "key for <name> exists but cap <cap> does not match"
[18:52] * tremon (~aschuring@d594e6a3.dsl.concepts.nl) has joined #ceph
[18:55] <tremon> hi all, small question: what's the default location for the osd keyring? Documentation suggests it's $osd_data/keyring but that doesn't work for me
[18:55] <fghaas> that's in 0.48, are you using that?
[18:55] <tremon> ah, no 0.43 still. Thx, that explains it
[18:56] <fghaas> prior to that it was in /etc/ceph/keyring, /etc/ceph/keyring.bin and (I think) /etc/ceph/$cluster-keyring
[18:56] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:57] <tremon> ok thx, I will simply keep the keyring= in my config for now
[18:57] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:58] <fghaas> and for anyone else looking into this, the documentation for defining capabilities appears to be wrong for 0.48... at least the example :)
[19:05] <mgalkiewicz> joao: http://tracker.newdream.net/issues/2721
[19:06] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:06] <joao> mgalkiewicz, for what it's worth, I don't think 'ceph -s osd' ever worked
[19:07] <joao> 'ceph -s osd' would simply run a 'ceph -s'
[19:07] <joao> but then again, the unrecognized subsystem error is what's throwing me off
[19:10] <joao> mgalkiewicz, do you see anything in the mon logs when you issue this command?
[19:10] <joao> grepping the log for 'handle_command' could provide some insight
[19:10] <dmick> ceph -s osd doesn't complain to the console for me (joined late), FWIW
[19:11] <joao> dmick, sure, but it should output the same as ceph -s
[19:11] <dmick> and it does
[19:12] <joao> yeah, the ceph tool will just run the equivalent of a 'ceph status' whenever the '-s' is specified
[19:17] <mgalkiewicz> when I execute ceph -s, something like this appears in the mon log:
[19:17] <mgalkiewicz> 7f712d7c8700 0 mon.cc@0(leader) e2 handle_command mon_command(status v 0) v1
[19:18] <mgalkiewicz> I also have another problem
[19:18] <mgalkiewicz> when I try to map rbd volume the command hangs forever
[19:18] <mgalkiewicz> rbd map postgresql -p foo-test-staging --user admin --secret /tmp/secret
[19:19] <mgalkiewicz> it only works intermittently
[19:20] <dmick> 2012-07-05 10:10:29.198906 7f213415d700 0 mon.c@2(peon) e1 handle_command mon_command(status v 0) v1 same for me
[19:21] <dmick> that seems normal
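joao's suggestion to grep the monitor log for `handle_command` can be taken one step further by pulling the fields out of each match. The Python sketch below assumes the line layout seen in the two samples pasted above (`mon.<name>@<rank>(<state>) e<epoch>`, then `handle_command mon_command(<cmd> v <n>)`); that layout is inferred from those two lines only and may differ in other Ceph versions.

```python
import re

# Layout inferred from the two sample lines in this log. re.search() is used,
# so any prefix (date/time, thread id, log level) before "mon." is skipped.
HANDLE_COMMAND = re.compile(
    r"mon\.(?P<name>[^@\s]+)@(?P<rank>\d+)\((?P<state>\w+)\)\s+"
    r"e(?P<epoch>\d+)\s+handle_command\s+"
    r"mon_command\((?P<cmd>.*?) v (?P<version>\d+)\)"
)


def parse_handle_command(line: str):
    """Return a dict describing one handle_command log line, or None."""
    m = HANDLE_COMMAND.search(line)
    if m is None:
        return None
    return {
        "mon": m.group("name"),
        "rank": int(m.group("rank")),
        "state": m.group("state"),
        "epoch": int(m.group("epoch")),
        "command": m.group("cmd"),
    }


def commands_in_log(lines):
    """Yield parsed entries for matching lines, roughly `grep handle_command`."""
    for line in lines:
        entry = parse_handle_command(line)
        if entry is not None:
            yield entry
```

Applied to mgalkiewicz's line, this yields mon `cc`, rank 0, state `leader`, epoch 2 and command `status`; dmick's line gives the same command from `mon.c` as a `peon`, which is why both were judged normal.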
[19:22] * Ryan_Lane (~Adium@c-98-210-205-93.hsd1.ca.comcast.net) has joined #ceph
[19:22] * chutzpah (~chutz@100.42.98.5) has joined #ceph
[19:23] <fghaas> mgalkiewicz: stuck sub ops in osd log and/or ceph -w?
[19:25] <mgalkiewicz> fghaas: not sure what you mean
[19:26] <fghaas> mgalkiewicz: is one of your OSDs reporting that it is waiting for a "sub op" to complete, indicating slow I/O on one of your boxes, or potentially extreme btrfs fragmentation?
[19:28] <mgalkiewicz> https://gist.github.com/3055052
[19:29] <mgalkiewicz> nothing interesting in the other osd's log
[19:30] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) has joined #ceph
[19:39] <fghaas> issue report bomb incoming on the mailing list, heads up -- don't shoot the messenger :)
[19:40] <nhm> fghaas: :D
[19:40] <gregaf1> mgalkiewicz: it sounds like maybe you didn't upgrade all your stuff at the same time
[19:42] <fghaas> nhm: feel free to set me straight if I did anything stupid, but those three do seem rather painful :)
[19:43] <nhm> fghaas: all seem to be reasonable to me, but some of the other guys might have comments.
[19:43] <fghaas> good, then my WTF moments served a purpose
[19:44] <gregaf1> yeah; I'm writing emails now...
[19:45] <nhm> fghaas: the FS stuff is probably just due to lack of time spent on that right now. The maxosd setting killing all the MONs is a bit scary though. :D
[19:45] <fghaas> ahem, yes
[19:45] <yehudasa> nhm: are you still seeing rest-bench issues?
[19:46] <nhm> yehudasa: I haven't installed 0.48 proper yet. Should I upgrade?
[19:46] <yehudasa> no
[19:47] <mgalkiewicz> gregaf1: apt-get dist-upgrade should do the trick?
[19:47] <yehudasa> nhm: don't stop your current testing for that
[19:47] <gregaf1> mgalkiewicz: well, dist-upgrade is a big hammer; just the ceph packages need upgrades
[19:48] <gregaf1> maybe you have hit an actual bug, I just bring this up because it sounds similar to what somebody reported on the mailing list, and that was the issue
[19:48] <yehudasa> nhm: however, I'd like to have a better understanding later on the rados bench vs rest-bench differences
[19:48] <yehudasa> nhm: in terms of performance results
[19:49] <nhm> yehudasa: still having problems with it, definitely. Fairly often it doesn't finish the test.
[19:49] <nhm> yehudasa: performance for small transfer sizes is definitely much lower than rados bench when it works.
[19:50] <mgalkiewicz> gregaf1: ok could you help me with my rbd map issue? do you need osd logs?
[19:50] <yehudasa> nhm: I fixed some issues with it recently, are you running a version that was before that?
[19:51] <gregaf1> mgalkiewicz: can you describe it a little more?
[19:51] * mkampe (~markk@aon.hq.newdream.net) has joined #ceph
[19:52] <nhm> yehudasa: yeah, I haven't upgraded it recently.
[19:52] <nhm> yehudasa: it's still the version from next as of about 2 weeks ago.
[19:52] <mgalkiewicz> gregaf1: I have tried to map an rbd volume with the command rbd map postgresql -p foo-test-staging --user admin --secret /tmp/secret
[19:53] <mgalkiewicz> but it hangs forever
[19:53] <mgalkiewicz> sometimes it works
[19:57] <fghaas> hmm. yehudasa, that fix for the rgw subuser lost-permissions-on-key-creation issue hasn't made it into 0.48, has it?
[19:59] <gregaf1> mgalkiewicz: can you check dmesg and make sure there's nothing in there?
[19:59] <gregaf1> and make sure that you can connect to the monitors from that node using other tools?
[19:59] <gregaf1> I'm afraid I have to run to a meeting, but I'll go through the logs, or somebody else might be able to help
[20:00] <gregaf1> fghaas: apparently you're out of the office :p
[20:01] <fghaas> well I am, but is the mailing list not smart enough to discard those?
[20:01] <fghaas> or did you cc me personally?
[20:01] <mgalkiewicz> dmesg is empty
[20:01] <fghaas> ah, you did
[20:02] <mgalkiewicz> not sure how to connect to monitors using other tools. what tools?
[20:03] <fghaas> I guess gregaf1 is referring to ceph -s, for example -- any client other than rbd
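A lower-level complement to gregaf1's suggestion is to confirm that the client node can open TCP connections to every monitor at all. The Python sketch below does only that: a successful connect rules out routing or firewall problems but says nothing about whether the monitor is healthy, which is why a real client such as `ceph -s` remains the better test. The addresses in the usage note are hypothetical placeholders; 6789 is the monitors' default port, but use the `mon addr` entries from your own ceph.conf.

```python
import socket


def check_monitors(addrs, timeout=3.0):
    """Try a plain TCP connect to each (host, port) pair.

    Returns a dict mapping each address to True if the connection opened
    within `timeout` seconds and False otherwise (refused, unreachable,
    timed out). Nothing is sent over the socket; it is closed immediately.
    """
    results = {}
    for host, port in addrs:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:  # also covers socket.timeout
            results[(host, port)] = False
    return results
```

For example, `check_monitors([("192.0.2.10", 6789), ("192.0.2.11", 6789)])` returns one True/False entry per monitor; any False points at the network rather than at rbd.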
[20:04] <mgalkiewicz> ceph -s is now fixed:)
[20:04] <mgalkiewicz> gotta go
[20:04] <mgalkiewicz> thx for help
[20:04] * mgalkiewicz (~mgalkiewi@staticline58611.toya.net.pl) Quit (Quit: Ex-Chat)
[20:09] <fghaas> hm, guys, it seems like openid login for the issue tracker is shot; at least it's failing for my launchpad openid
[20:10] <fghaas> scratch that; I actually can't log in at all, neither with openid nor username/password
[20:12] * chutzpah (~chutz@100.42.98.5) Quit (Quit: Leaving)
[20:17] * adjohn (~adjohn@69.170.166.146) has joined #ceph
[20:28] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:28] <dmick> just checked fghaas; I can log in
[20:28] <dmick> possible forgotten password, or?...
[20:29] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[20:31] <fghaas> dmick: no, I just created the account :)
[20:31] <dmick> hm. ok. what happens when you attempt to log in?
[20:43] <dmick> gregaf1's on fghaas's login issue, apparently
[20:52] * LarsFronius (~LarsFroni@2a02:8108:380:90:8c61:ae23:fa5f:b913) has joined #ceph
[21:17] * fghaas (~florian@194.158.199.28) has left #ceph
[21:41] * MapspaM is now known as SpamapS
[22:00] * NashTrash (~Adium@mobile-166-147-116-066.mycingular.net) has joined #ceph
[22:01] * NashTrash (~Adium@mobile-166-147-116-066.mycingular.net) Quit ()
[22:05] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:06] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[22:09] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:10] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[22:13] * tremon (~aschuring@d594e6a3.dsl.concepts.nl) has left #ceph
[22:14] * ssedov (stas@ssh.deglitch.com) has joined #ceph
[22:16] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[22:21] <sagewk> elder: any luck?
[22:21] <elder> Not yet.
[22:22] <elder> Kind of absorbing info at this point. I also took a little break to go find out which of the computers in my house is infected with the DNS bot.
[22:23] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[22:49] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) Quit (Quit: Leaving)
[22:49] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) has joined #ceph
[22:51] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) Quit ()
[23:00] <elder> Back in a little while.
[23:16] <nhm> wow, it's only 96F today. It was supposed to be 100F.
[23:47] * neerbeer (~Adium@65-125-22-154.dia.static.qwest.net) has joined #ceph
[23:50] <neerbeer> Hello. I've got repl=2 w/ 2 osds. Should there be a noticeable wait/hang when writing to the osd that is still up? The osd that is still up is running the mon and mds.
[23:55] <gregaf1> neerbeer: what kind of wait/hang?
[23:57] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[23:57] <neerbeer> during my testing w/ just two hosts, I was doing a dd if=/dev/zero of=outfile bs=1024 count=1000 to ceph and shut down one of the osds to test the availability of the storage.
[23:58] <neerbeer> I added rbd via modprobe and added the rbd device via an echo to the proc filesystem and was able to mkfs.ext4 just fine.
[23:59] <neerbeer> dd when both osds were running was good. I shut down the osd not running the mon (obviously) to test failure of an osd, and my writes via dd just hung.
[23:59] <neerbeer> The storage came back after I restarted the 2nd osd and I did see the san rebuild via ceph -w
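neerbeer's dd run writes 1000 blocks of 1 KiB (1,024,000 bytes). Without a sync flag those writes land in the page cache, so a stall while one OSD is down may only surface later, at flush time. The Python sketch below repeats the same write pattern but fsyncs after every block (closer to `dd ... oflag=sync` than to the plain command) and records per-block latency, so the length of the hang shows up as a measurable spike. The output path and the 1-second stall threshold are hypothetical choices, not anything stated in the log.

```python
import os
import time


def timed_write(path, block_size=1024, count=1000, stall_threshold=1.0):
    """Write `count` zero-filled blocks of `block_size` bytes to `path`,
    fsync-ing after each block.

    Returns (latencies, stalls): latencies holds the seconds spent on each
    write+fsync, stalls lists (block_index, seconds) for blocks slower
    than `stall_threshold`.
    """
    block = b"\0" * block_size
    latencies, stalls = [], []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(count):
            start = time.monotonic()
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)  # push the block through the page cache to the device
            elapsed = time.monotonic() - start
            latencies.append(elapsed)
            if elapsed > stall_threshold:
                stalls.append((i, elapsed))
    finally:
        os.close(fd)
    return latencies, stalls
```

Pointed at a file on the ext4 filesystem built on the rbd device (for example a hypothetical `/mnt/rbd-test/outfile`) while one OSD is stopped, the `stalls` list shows which block blocked and for how long.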

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.