#ceph IRC Log


IRC Log for 2012-08-27

Timestamps are in GMT/BST.

[0:27] * Qu310 (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) Quit ()
[1:02] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:59] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[2:00] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[2:00] * BManojlovic (~steki@212.200.243.134) Quit (Quit: I'm off, and you do whatever you want...)
[2:27] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:39] * nhm (~nhm@174-20-15-49.mpls.qwest.net) has joined #ceph
[3:04] * thingee is now known as thingee_zz
[3:18] * markl (~mark@tpsit.com) Quit (Ping timeout: 480 seconds)
[3:54] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[4:21] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[4:22] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[4:25] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:46] * deepsa (~deepsa@117.203.8.12) has joined #ceph
[4:54] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[5:05] * lightspeed (~lightspee@2001:8b0:16e:1:216:eaff:fe59:4a3c) Quit (Ping timeout: 480 seconds)
[5:06] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[5:06] * renzhi_2 (~renzhi@180.169.73.90) has joined #ceph
[5:08] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:09] * renzhi_2 (~renzhi@180.169.73.90) Quit ()
[5:28] * nhm (~nhm@174-20-15-49.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[6:25] * Tobarja1 (~athompson@cpe-071-075-064-255.carolina.res.rr.com) Quit (Quit: Leaving.)
[6:28] * Tobarja (~athompson@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[6:31] * danieagle (~Daniel@177.43.213.15) Quit (Quit: See you later :-) and thanks a lot for everything!!! ^^)
[6:32] * deepsa (~deepsa@117.203.8.12) Quit (Ping timeout: 480 seconds)
[6:42] * deepsa (~deepsa@117.203.18.70) has joined #ceph
[6:50] * deepsa (~deepsa@117.203.18.70) Quit (Ping timeout: 480 seconds)
[6:52] * deepsa (~deepsa@117.203.11.220) has joined #ceph
[7:56] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[7:56] * deepsa_ (~deepsa@115.241.142.60) has joined #ceph
[7:57] * deepsa (~deepsa@117.203.11.220) Quit (Ping timeout: 480 seconds)
[7:57] * deepsa_ is now known as deepsa
[8:23] * darkfaded (~floh@188.40.175.2) has joined #ceph
[8:28] * darkfader (~floh@188.40.175.2) Quit (Ping timeout: 480 seconds)
[9:05] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:05] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[9:08] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:28] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:31] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[10:01] * EmilienM (~EmilienM@98.49.119.80.rev.sfr.net) has joined #ceph
[10:12] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[10:58] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[11:00] * renzhi_2 (~renzhi@180.169.73.90) has joined #ceph
[11:02] * andret (~andre@pcandre.nine.ch) has joined #ceph
[11:16] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:27] * ihwtl (~ihwtl@odm-mucoffice-02.odmedia.net) has joined #ceph
[11:43] <mrjack> someone here?
[11:45] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:55] <ihwtl> yes :-)
[12:18] <mrjack> hi
[12:18] <mrjack> can you help me? - i have problems with my ceph cluster
[12:19] <mrjack> the mailing list isn't answering my mails
[12:19] <mrjack> i have a ceph cluster with two osds
[12:19] <mrjack> i could not get ceph healthy again so i decided to put one osd out
[12:19] <mrjack> did ceph osd crush remove osd.1 and ceph osd rm 1
[12:20] <mrjack> then i wanted to create new ceph filesystem for osd.1
[12:20] <mrjack> ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkjournal --mkfs --monmap /tmp/monmap --mkkey
[12:20] <mrjack> 2012-08-27 12:20:12.047650 f7296710 -1 filestore(/data/ceph_backend/osd) mkfs: BTRFS_IOC_SUBVOL_CREATE failed with error (22) Invalid argument
[12:20] <mrjack> but it is ext4
[12:20] <mrjack> not BTRFS
[12:20] <mrjack> how can i tell ceph-osd to make ext4 filestore?
[12:21] <mrjack> it worked with my ceph.conf when i used mkcephfs initially
[12:25] <mrjack> HMPF
[12:25] <mrjack> i think i'll just kill ceph and start all over AGAIN!
[12:25] <mrjack> this is annoying - the 3rd time i lose data with ceph
[12:26] <mrjack> support from community or ML sucks ;( sorry to say that
[12:41] * renzhi_2 (~renzhi@180.169.73.90) Quit (Quit: Leaving)
[12:42] <todin_> Hi, how far is the rbd layering testable? Is there some howto somewhere?
[12:42] * todin_ is now known as todin
[12:44] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:04] <mrjack> ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkjournal --mkfs --monmap /tmp/monmap --mkkey
[13:04] <mrjack> 2012-08-27 12:20:12.047650 f7296710 -1 filestore(/data/ceph_backend/osd) mkfs: BTRFS_IOC_SUBVOL_CREATE failed with error (22) Invalid argument
[13:04] <mrjack> but it is ext4
[13:04] <mrjack> any ideas?
[13:34] <mrjack> hm
[13:42] * nhm (~nhm@174-20-15-49.mpls.qwest.net) has joined #ceph
[13:52] <andreask> mrjack: tried this? http://ceph.com/wiki/Replacing_a_failed_disk/OSD
[14:04] <ihwtl> Every time I produce high IO on the ceph filesystem, I get the following messages in ceph -w
[14:04] <ihwtl> [WRN] 17 slow requests, 2 included below
[14:05] <ihwtl> when these messages appear, it is nearly impossible to access the ceph filesystem - does anybody have an idea?
[15:09] * stan_theman (~stan_them@173.208.221.221) has joined #ceph
[15:11] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) has joined #ceph
[15:14] * renzhi (~renzhi@180.172.165.213) Quit (Quit: Leaving)
[15:54] * markl (~mark@tpsit.com) has joined #ceph
[16:03] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[16:03] * deepsa (~deepsa@115.241.142.60) Quit (Ping timeout: 480 seconds)
[16:07] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[16:11] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[16:11] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[16:17] <mrjack> 2012-08-27 12:20:12.047650 f7296710 -1 filestore(/data/ceph_backend/osd) mkfs: BTRFS_IOC_SUBVOL_CREATE failed with error (22) Invalid argument
[16:17] <mrjack> ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkjournal --mkfs --monmap /tmp/monmap --mkkey
[16:17] <mrjack> any ideas?
[16:21] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:32] * deepsa (~deepsa@117.203.12.160) has joined #ceph
[16:40] * deepsa (~deepsa@117.203.12.160) Quit (Ping timeout: 480 seconds)
[16:43] * deepsa (~deepsa@117.203.18.63) has joined #ceph
[16:46] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[16:47] * deepsa_ (~deepsa@115.242.22.105) has joined #ceph
[16:51] * deepsa (~deepsa@117.203.18.63) Quit (Ping timeout: 480 seconds)
[16:51] * deepsa_ is now known as deepsa
[16:58] <mrjack> [16:17] <mrjack> 2012-08-27 12:20:12.047650 f7296710 -1 filestore(/data/ceph_backend/osd) mkfs: BTRFS_IOC_SUBVOL_CREATE failed with error (22) Invalid argument
[16:58] <mrjack> [16:17] <mrjack> ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkjournal --mkfs --monmap /tmp/monmap --mkkey
[16:58] <mrjack> [16:17] <mrjack> any ideas?
[17:03] * Fruit (wsl@2001:980:3300:2:216:3eff:fe10:122b) has left #ceph
[17:22] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:23] * Cube (~Adium@12.248.40.138) has joined #ceph
[17:27] * loicd (~loic@brln-4dbac51f.pool.mediaWays.net) has joined #ceph
[17:32] * BManojlovic (~steki@91.195.39.5) Quit (Quit: I'm off, and you do whatever you want...)
[17:33] * loicd1 (~loic@brln-4d0cec6b.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[17:39] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:44] * lxo (~aoliva@82VAAFYFQ.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:45] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:58] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[18:08] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[18:12] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:13] * masterpe_ (~masterpe@87.233.7.43) has joined #ceph
[18:14] <masterpe_> I have a little problem, I get the following error:
[18:14] <masterpe_> 2012-08-27 18:14:46.960706 7f0de4f0a700 0 mds.-1.0 ms_handle_connect on 192.168.1.12:6789/0
[18:15] * masterpe_ is now known as masterpe
[18:18] <gregaf> ihwtl: that error means the OSDs are backed up in dealing with IO requests from clients; if everything is uniformly slow it might just mean that you're putting too much load on your cluster
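The warning ihwtl quotes has a regular shape, so its counts can be tracked over time to see whether the backlog keeps growing under load. This is a minimal sketch: the helper name `parse_slow_requests` is hypothetical, and the line format is assumed from the single example quoted above (other Ceph versions may word it differently).

```python
import re

# Matches lines like "[WRN] 17 slow requests, 2 included below".
# Format assumed from the one example in this log.
SLOW_RE = re.compile(r"\[WRN\] (\d+) slow requests?, (\d+) included below")


def parse_slow_requests(line):
    """Return (total_slow, shown_in_log) for a slow-request warning, else None."""
    m = SLOW_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


if __name__ == "__main__":
    # Feed it lines from "ceph -w" and watch whether the first number climbs.
    print(parse_slow_requests("[WRN] 17 slow requests, 2 included below"))
```

A steadily rising total during heavy IO fits gregaf's reading that the OSDs are simply overloaded.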
[18:21] <gregaf> masterpe: that's... not an error; is there something broken?
[18:24] <mrjack> [16:17] <mrjack> 2012-08-27 12:20:12.047650 f7296710 -1 filestore(/data/ceph_backend/osd) mkfs: BTRFS_IOC_SUBVOL_CREATE failed with error (22) Invalid argument
[18:24] <mrjack> [16:17] <mrjack> ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkjournal --mkfs --monmap /tmp/monmap --mkkey
[18:24] <mrjack> [16:17] <mrjack> any ideas?
[18:25] <gregaf> I'm getting to you. You should have more patience over the weekend; people are away from their computers :)
[18:25] <mrjack> ah
[18:25] <mrjack> a lifesign ;)
[18:25] <mrjack> well, i don't know if my mails reached the list or got marked as spam
[18:26] <mrjack> and i did not receive any response in here, too ;)
[18:26] <mrjack> all i am trying to do is to get my osd.1 back up online
[18:26] <gregaf> yeah
[18:26] <mrjack> but i cannot use ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkjournal --mkfs --monmap /tmp/monmap --mkkey
[18:26] <gregaf> generally speaking you probably don't want to remove components when things break
[18:26] <mrjack> well
[18:26] <mrjack> i had ceph 50% degraded and did backups
[18:26] <gregaf> I suspect that you are confusing Ceph by including the "btrfs devs" line in your config file
[18:26] <gregaf> have you tried removing that and running again?
[18:27] <mrjack> yes
[18:27] <mrjack> i added these lines
[18:27] <mrjack> does not matter
[18:27] <mrjack> before i did not have these lines in the conf
[18:27] <gregaf> so you've run the mkfs both with and without btrfs devs?
[18:27] <mrjack> yes
[18:27] <mrjack> and always same error
[18:27] <gregaf> okay, take those out then because they're just going to confuse your readers :)
[18:28] <mrjack> well i thought when i do not define btrfs devs, it will use ext4
[18:28] <gregaf> what does "df" output on the node in question?
[18:28] <mrjack> everything all right, user_xattr is supported..
[18:28] <gregaf> please paste it :)
[18:28] <mrjack> tmpfs 7,9G 0 7,9G 0% /lib/init/rw
[18:28] <mrjack> udev 7,8G 232K 7,8G 1% /dev
[18:28] <mrjack> tmpfs 7,9G 0 7,9G 0% /dev/shm
[18:29] <mrjack> . /dev/md6 on /data type ext3 (rw,noatime,nodiratime,errors=remount-ro,user_xattr)
[18:29] <gregaf> also, you haven't specified a journal location; that is almost certainly your problem
[18:29] <mrjack> i have
[18:29] <mrjack> in the global section..
[18:29] <mrjack> osd journal = /data/ceph_backend/ceph.journal
[18:29] <mrjack> osd_journal_size = 512
[18:29] <mrjack> i was running with that config
[18:30] <mrjack> so i wonder why it does not want to recreate the osd-data dir again
[18:30] <gregaf> ah, and there you are specifying extra filestore options; you should clear those out too
[18:31] <gregaf> does the /data/ceph_backend/osd folder exist?
[18:31] <mrjack> YES
[18:31] <mrjack> it would give a different error if it didn't
[18:31] * nhm (~nhm@174-20-15-49.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[18:31] <gregaf> I would certainly expect so, but since you haven't provided me many details I need to walk through all of them
[18:31] <mrjack> well
[18:31] <mrjack> i mailed to the devel list
[18:31] <mrjack> but got no reply
[18:32] <mrjack> file /data/ceph_backend/osd
[18:32] <gregaf> this was you 4 and 18 hours ago?
[18:32] <mrjack> . /data/ceph_backend/osd: directory
[18:32] <mrjack> yes
[18:33] <gregaf> again; it was a weekend; have some patience :)
[18:33] <mrjack> and 24.08
[18:33] <mrjack> no reply
[18:33] <mrjack> ok
[18:33] <gregaf> can you re-run the command, adding --debug_filestore 20 --debug_osd 20
[18:34] <gregaf> then put the output in pastebin
[18:34] <mrjack> ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkjournal --mkfs --monmap /tmp/monmap --mkkey --debug_filestore 20 --debug_osd 20
[18:34] <mrjack> 2012-08-27 18:34:28.718221 f7363710 -1 filestore(/data/ceph_backend/osd) mkfs: BTRFS_IOC_SUBVOL_CREATE failed with error (22) Invalid argument
[18:34] <mrjack> 2012-08-27 18:34:28.718236 f7363710 -1 OSD::mkfs: FileStore::mkfs failed with error -22
[18:34] <mrjack> 2012-08-27 18:34:28.718249 f7363710 -1 ** ERROR: error creating empty object store in /data/ceph_backend/osd: (22) Invalid argument
[18:34] <mrjack> mkfs: BTRFS_IOC_SUBVOL_CREATE failed with error (22) Invalid argument
[18:34] <mrjack> this is the problem
[18:34] <mrjack> it is not BTRFS
[18:35] <mrjack> so this can never be successful..
[18:35] <gregaf> I realize; I thought there would be more output before that
[18:35] * glowell (~Adium@c-98-210-226-131.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:35] <mrjack> nope
[18:35] <gregaf> let me go look at the code
[18:35] <mrjack> ok
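gregaf's request amounts to appending two debug options to the same mkfs command. As a small sketch, the argument list can be built programmatically so the base command stays identical between runs; `with_debug` is a hypothetical helper that only builds the argv and does not run anything.

```python
import shlex

# Base command exactly as mrjack ran it in this log.
BASE_MKFS = ("ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkjournal --mkfs "
             "--monmap /tmp/monmap --mkkey")


def with_debug(cmd, **levels):
    """Append --debug_<subsys> <level> pairs to a command line (build only)."""
    argv = shlex.split(cmd)
    for subsys, level in levels.items():
        argv += [f"--debug_{subsys}", str(level)]
    return argv


if __name__ == "__main__":
    argv = with_debug(BASE_MKFS, filestore=20, osd=20)
    # Pass argv to subprocess.run(argv) to execute it.
    print(shlex.join(argv))
```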
[18:36] * deepsa_ (~deepsa@115.242.60.50) has joined #ceph
[18:37] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) Quit (Quit: Leaving.)
[18:37] * deepsa (~deepsa@115.242.22.105) Quit (Ping timeout: 480 seconds)
[18:37] * deepsa_ is now known as deepsa
[18:42] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:43] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:43] <gregaf> mrjack: hmm, it should be returning EOPNOTSUPP rather than EINVAL...
[18:43] <mrjack> i also noticed that in 0.48.1argonaut release there is ceph osd repair in the help
[18:43] <mrjack> ceph osd repair <osd-id>
[18:43] <gregaf> can you cat /proc/mounts for me?
[18:43] <mrjack> ceph osd repair 1
[18:43] <mrjack> unknown command repair
[18:44] <mrjack> gregaf: well, i tested this --mkfs with 0.48.1 and ceph version 0.51 (commit:c03ca95d235c9a072dcd8a77ad5274a52e93ae30)
[18:44] <mrjack> it does not work with both versions
[18:44] <mrjack> ceph was created with 0.48
[18:45] <gregaf> I'm not worried about Ceph right now; I'm worried about the output of "cat /proc/mounts", and then I will ask you what kernel version you're running :)
[18:45] <mrjack> node02:/usr/src# uname -a
[18:45] <mrjack> Linux node02 3.2.28-vs2.3.2.13 #1 SMP PREEMPT Tue Aug 21 13:24:48 CEST 2012 x86_64 GNU/Linux
[18:48] <gregaf> well, yep, that appears to be a problem
[18:48] * gregaf (~Adium@2607:f298:a:607:4990:c1e3:3fe9:b77f) has left #ceph
[18:48] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[18:48] * gregaf (~Adium@2607:f298:a:607:4990:c1e3:3fe9:b77f) has joined #ceph
[18:48] <gregaf> ah, wrong commands
[18:48] <mrjack> what is the problem?
[18:49] <gregaf> mrjack: try ext4 instead; it appears that ext3 is returning the wrong error code for non-existent ioctls
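gregaf's diagnosis depends on how the filestore decides whether its backing filesystem is btrfs: it attempts a btrfs-only ioctl and inspects the error code. The real Ceph code and the patch behind the pastebin link are not reproduced here. This is only a hypothetical sketch of that decision, showing why an unexpected EINVAL turns "not btrfs" into a hard mkfs failure. The set of accepted codes (ENOTTY, EOPNOTSUPP) and the EINVAL workaround flag are assumptions.

```python
import errno

# Codes this sketch treats as "the filesystem does not know the btrfs ioctl".
# EOPNOTSUPP is what gregaf says Ceph expected; ENOTTY is the usual kernel
# answer for an unknown ioctl. Both choices are assumptions of the sketch.
NOT_BTRFS = {errno.ENOTTY, errno.EOPNOTSUPP}


def subvol_create_outcome(err, treat_einval_as_unsupported=False):
    """Classify the errno from a BTRFS_IOC_SUBVOL_CREATE attempt.

    Returns "btrfs" on success (err == 0), "fallback" when mkfs should carry
    on with a plain directory, and "fail" when mkfs should abort.
    """
    if err == 0:
        return "btrfs"
    unsupported = set(NOT_BTRFS)
    if treat_einval_as_unsupported:
        # Hypothetical workaround of the kind gregaf describes: also accept
        # the EINVAL that ext3 returned on mrjack's kernel.
        unsupported.add(errno.EINVAL)
    return "fallback" if err in unsupported else "fail"
```

Without the workaround, `subvol_create_outcome(errno.EINVAL)` returns `"fail"`, which matches the "(22) Invalid argument" abort in the log.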
[18:49] <mrjack> well, it was set up with ext3
[18:49] <mrjack> i cannot change it to ext4
[18:49] <gregaf> if you're building from source I can give you a patch
[18:50] <mrjack> i can
[18:50] <mrjack> that would be great
[18:50] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) Quit (Read error: Operation timed out)
[18:50] <mrjack> why not have some option --filestore-fstype= ext4|xfs|btrfs|whatever?
[18:51] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[18:51] <gregaf> people would get it wrong ;)
[18:51] <mrjack> hm
[18:51] <gregaf> that's why everybody likes autodetect
[18:51] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[18:51] <mrjack> welll
[18:51] <mrjack> i hate it
[18:51] <mrjack> :)
[18:51] <gregaf> http://pastebin.com/77zQF3ui
[18:52] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:52] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:52] <masterpe> gregaf: I think so, going to make a new cluster
[18:52] <mrjack> thanks for the patch - for which version is it?
[18:52] <gregaf> it should apply to anything recentish
[18:53] <gregaf> so whatever you're running your cluster on now
[18:53] <mrjack> hm
[18:53] <gregaf> if it doesn't, you can make that one-line change yourself or I can go grab an older checkout
[18:53] <mrjack> one node is ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
[18:53] <mrjack> one is ceph version 0.51 (commit:c03ca95d235c9a072dcd8a77ad5274a52e93ae30)
[18:53] <mrjack> i think i'll update the other node to 0.51
[18:54] <gregaf> you should probably get both of them on the same version; we don't test inter-version compatibility very well
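Following gregaf's advice, a quick check compares the `ceph version` line from every node and flags a mismatch like mrjack's. This is a sketch only: how the lines are collected is left out, and the node name `node01` is a placeholder (only `node02` appears in this log).

```python
import re

# "ceph version 0.51 (commit:c03c...)" -> ("0.51", "c03c...")
VERSION_RE = re.compile(r"ceph version (\S+) \(commit:([0-9a-f]+)\)")


def parse_version(line):
    """Return (version, commit) from a 'ceph version' output line."""
    m = VERSION_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised version line: {line!r}")
    return m.group(1), m.group(2)


def mixed_versions(lines_by_node):
    """Return the distinct versions if nodes disagree, else an empty set."""
    versions = {parse_version(line)[0] for line in lines_by_node.values()}
    return versions if len(versions) > 1 else set()


if __name__ == "__main__":
    print(mixed_versions({
        "node01": "ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)",
        "node02": "ceph version 0.51 (commit:c03ca95d235c9a072dcd8a77ad5274a52e93ae30)",
    }))
```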
[18:54] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[18:54] <mrjack> gregaf: that was a result from the mails from 24.08.2012
[18:54] <mrjack> gregaf: i did not know what i could do so i decided maybe an upgrade could fix it
[18:55] <mrjack> i'll give it a try
[18:55] <mrjack> bbl
[18:56] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[18:56] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[19:02] * glowell (~Adium@c-98-210-226-131.hsd1.ca.comcast.net) has joined #ceph
[19:03] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[19:03] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[19:04] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:07] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[19:08] * mrjack_ (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[19:10] * deepsa (~deepsa@115.242.60.50) Quit (Ping timeout: 480 seconds)
[19:11] * deepsa (~deepsa@115.241.149.17) has joined #ceph
[19:11] * chutzpah (~chutz@100.42.98.5) has joined #ceph
[19:15] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit (Quit: leaving)
[19:16] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[19:22] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[19:23] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit ()
[19:27] * maelfius (~mdrnstm@66.209.104.107) has joined #ceph
[19:28] * joao (~JL@89.181.149.181) has joined #ceph
[19:28] <joao> I seriously hate reboots
[19:28] <joao> totally forgot to run xchat again
[19:29] <mikeryan> joao: i run irssi on a VPS in a screen session
[19:31] <joao> mikeryan, I would probably end up forgetting all about that session :p
[19:33] <gregaf> can't you set startup items or something?
[19:34] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:41] <elder> I seriously hate it when my display system freezes entirely, only fixable by a reboot.
[19:41] <elder> It's happened several times this morning already.
[19:42] <mikeryan> elder: you try switching to a virtual terminal and killing X?
[19:42] <mikeryan> most distros disable ctrl alt backspace these days
[19:42] <elder> My keyboard doesn't respond either. I did try killing X at one time, but no help.
[19:42] <elder> I go to another machine and ssh in.
[19:43] <elder> I may be wrong about the display driver, but I'm fairly sure it's the root of the problem.
[19:43] <mikeryan> yeah, it's not entirely uncommon
[19:43] <mikeryan> especially with binary drivers
[19:44] <elder> The whole setup is so fragile though that I decided against trying to fix it last week. Too scary, and each time I go down that road it seems to guarantee several hours of hell.
[19:44] <elder> So instead, I'm gambling on avoiding the problem, and suffering only when I have to reboot and set everything up again.
[19:48] * BManojlovic (~steki@212.200.243.134) has joined #ceph
[19:51] * deepsa (~deepsa@115.241.149.17) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[19:53] * ihwtl (~ihwtl@odm-mucoffice-02.odmedia.net) Quit (Ping timeout: 480 seconds)
[20:04] * dmick (~dmick@2607:f298:a:607:1a03:73ff:fedd:c856) has joined #ceph
[20:15] * sagelap (~sage@2600:1012:b00a:5efc:c685:8ff:fe59:d486) has joined #ceph
[20:23] * sagelap (~sage@2600:1012:b00a:5efc:c685:8ff:fe59:d486) has left #ceph
[20:27] <ninkotech_> elder: which card/driver?
[21:03] * mrjack_ (mrjack@office.smart-weblications.net) has joined #ceph
[21:10] <elder> ninkotech_, ATI FirePro v4800
[21:10] <elder> Using the binary driver, since it works (mostly) and switching is as I said, scary and time-consuming and not often very satisfying in my experience.
[21:10] <ninkotech_> i was having similar hell with some intel drivers... they were becoming worse each version...
[21:10] <ninkotech_> for 2-3 years
[21:10] <ninkotech_> now they work well
[21:11] <ninkotech_> elder: use free software only :) no switching
[21:11] <ninkotech_> integrity!
[21:11] <elder> My problem is I have 3 displays active. The card hardware supports it and the software does for the most part, but 3 displays is not a formally supported configuration.
[21:11] <joao> nvidia's official linux driver was also breaking everything around here when using a dual screen
[21:11] <joao> had to resort to nouveau
[21:13] <elder> Well, I'll gladly switch to something that works better. I've just wasted so much time on video drivers and configuration I am now prone to stick with what (mostly) works.
[21:25] * peanuts (~ada1c731@2600:3c00::2:2424) has joined #ceph
[21:25] * nhm (~nhm@67-220-20-222.usiwireless.com) has joined #ceph
[21:34] <dmick> my Radeon HD 5450 has yet to work correctly with screenlock
[21:34] <dmick> about half the time, it resumes from screenlock/blank with one black screen and one half-blue screen, so i need to type blind to unlock
[21:36] <mikeryan> my laptop loses its mind sometimes when i remove an external display
[21:36] <mikeryan> the only real solution is to use the hotkey to open a terminal and sudo reboot
[21:36] <mikeryan> has a SysRq key too, but i haven't had to use it... yet
[21:37] * peanuts (~ada1c731@2600:3c00::2:2424) Quit (Quit: TheGrebs.com CGI:IRC (EOF))
[21:39] <masterpe> As I said earlier, I get a failure: 2012-08-27 21:38:36.604132 7f12d4c4b700 0 mds.-1.0 ms_handle_connect on 192.168.1.13:6789/0
[21:40] <gregaf> masterpe: that's not a failure; what makes you think it is?
[21:41] <masterpe> I saw that http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/6969
[21:42] <masterpe> I can fix it by removing the pool metadata and data
[21:43] <masterpe> But how do I export my data from rbd?
[21:44] <masterpe> gregaf: I can't access my data any more?
[21:44] <Cube> nhm: Are you around?
[21:45] <gregaf> masterpe: okay, but that message isn't the cause at all :)
[21:45] <gregaf> you'll want to gather up some generic debugging info; what's the output of ceph -s?
[21:51] <masterpe> ceph -s hangs
[21:52] <gregaf> then you have a problem with your monitors; make sure they're running
[21:52] <masterpe> root 14110 0.0 0.6 133664 13888 ? Ssl 17:55 0:03 /usr/bin/ceph-mon -i g --pid-file /var/run/ceph/mon.g.pid -c /tmp/ceph.conf.25453
[21:53] <gregaf> you have 7 monitors?
[21:53] <gregaf> (and check all the others too)
[21:54] <masterpe> I had 3 monitors but wanted to move to 5 new systems
[21:55] <masterpe> so i added the 5 systems and after that i removed the 3 systems
[21:55] <gregaf> is that when things stopped working?
[21:55] <masterpe> yes
[21:57] <gregaf> okay, so it sounds like you didn't do that correctly
[21:57] <gregaf> you should bring up the original three monitors and see if you can get anything out of ceph -s
[21:57] <gregaf> and see how many monitors they think are in the system
[21:57] <masterpe> That is possible
[21:58] <gregaf> I suppose if you're certain you did it right, you might also have just forgotten to add the new monitors to your ceph.conf so that your OSD and MDS daemons and clients can find them
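gregaf's second suggestion is that the OSD, MDS and client side locate monitors through ceph.conf, so every new monitor needs its own entry there. As a sketch, assuming the classic layout with one `[mon.X]` section per monitor carrying a `mon addr` option (Ceph treats `mon addr` and `mon_addr` alike), this lists each monitor and flags any section without an address. The `host = node-e` value is a placeholder.

```python
import configparser


def mon_addrs(conf_text):
    """Map monitor id -> address from the [mon.X] sections of a ceph.conf.

    Sections without an address map to None so they stand out.
    """
    # interpolation=None: ceph.conf values may contain '%' or '$' literally.
    cp = configparser.ConfigParser(interpolation=None, strict=False)
    cp.read_string(conf_text)
    result = {}
    for section in cp.sections():
        if not section.startswith("mon."):
            continue
        opts = cp[section]
        result[section[len("mon."):]] = opts.get("mon addr", opts.get("mon_addr"))
    return result


if __name__ == "__main__":
    sample = (
        "[global]\n"
        "osd journal = /data/ceph_backend/ceph.journal\n"
        "\n[mon.a]\nmon addr = 192.168.1.3:6789\n"
        "\n[mon.e]\nhost = node-e\n"
    )
    print(mon_addrs(sample))
```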
[22:00] <joao> gregaf, sagewk, are we doing it now? if so, vidyo or should we stick with irc?
[22:00] <gregaf> he was in a meeting; I was about to check...
[22:01] <joao> ok
[22:01] <sagewk> joao: skype!
[22:01] <joao> okay, firing up the laptop :)
[22:02] <masterpe> http://pastebin.com/6w0pJ9Y4
[22:03] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[22:05] <sjust> masterpe: so bringing up the original monitors resolved the problem?
[22:06] <sjust> how did you attempt to add the additional 4 monitors
[22:06] <sjust> ?
[22:07] <masterpe> By following: http://ceph.com/docs/master/ops/manage/grow/mon/
[22:08] <masterpe> 49 ceph mon getmap -o /srv/mon.h/monmap
[22:08] <masterpe> 50 ceph auth get mon. -o /srv/mon.h/monkey
[22:08] <masterpe> 51 ceph-mon -i h --mkfs --monmap /srv/mon.h/monmap --keyring /srv/mon.h/monkey
[22:08] <sjust> how did you then start the ceph-mon process?
[22:08] <sjust> (or does --mkfs also start the daemon, anyone?)
[22:09] <masterpe> I also added it to ceph.conf
[22:09] <sjust> k
[22:09] <sjust> can you paste your ceph.conf to pastebin or something?
[22:09] <masterpe> 43 ceph-mon -i h --public-addr 192.168.1.14:6789
[22:10] <masterpe> but I also shut down the whole cluster and restarted it
[22:10] <sjust> You started with three and want to transition to having 5?
[22:10] <masterpe> (it is a test environment)
[22:10] <sjust> where the new 5 and the old 3 do not overlap?
[22:11] * adjohn (~adjohn@69.170.166.146) has joined #ceph
[22:14] <masterpe> http://pastebin.com/aLAwFJcM
[22:14] <masterpe> Those are the ceph.conf files from the three stages.
[22:15] <sjust> ok, try starting the rest of the monitors
[22:16] <sjust> ceph -s should show all of the monitors except d, I think
[22:18] <nhm> Cube: heya, sorry about that. My primary internet connection went down AGAIN, so now I've purchased a second slower connection just so that I have a backup. ;(
[22:18] <nhm> Cube: anyway, I'm here now...
[22:19] <Cube> nhm: Hey, I wrote a script to run the fio tests concurrently and drop the caches on the osd's between tests
[22:20] <nhm> Cube: cool! Tests looking any different?
[22:20] <Cube> way different
[22:20] <Cube> 70mb/s vs the 300 we were seeing
[22:20] <Cube> 4gb tests running now, almost done.
[22:22] <nhm> That sounds a lot more like what I'd expect to see...
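Cube's script is not shown in the log. The gap between 300 and 70 is consistent with earlier runs reading from the OSDs' page cache rather than disk. A hypothetical sketch of the cache-drop step between fio runs: `sync` flushes dirty pages, then writing 3 to `/proc/sys/vm/drop_caches` asks the Linux kernel to free the page cache plus dentries and inodes. Host names are placeholders, and passwordless sudo over ssh is assumed.

```python
import shlex
import subprocess

# Flush dirty pages, then drop page cache + dentries + inodes (needs root).
DROP_CACHES = "sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null"


def drop_cache_cmds(hosts):
    """Build one ssh argv per OSD host; host names are placeholders."""
    return [["ssh", host, DROP_CACHES] for host in hosts]


def drop_caches(hosts, dry_run=True):
    """Print (dry run) or execute the cache drop on every host, in order."""
    for argv in drop_cache_cmds(hosts):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Call between fio runs so each test starts with cold caches.
    drop_caches(["osd-host-1", "osd-host-2"])
```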
[22:22] <masterpe> sjust: http://pastebin.com/dHm0tkCf
[22:22] <nhm> bbiab
[22:34] <masterpe> Still the mount point find /var/lib/vzdata-ceph
[22:34] <masterpe> s/mount point/find
[22:36] <sjust> you started the new mons?
[22:36] <sjust> this indicates that only 4 are up
[22:37] * EmilienM (~EmilienM@98.49.119.80.rev.sfr.net) has left #ceph
[22:41] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:42] <sagewk> joao: http://ceph.com/presentations
[22:42] <masterpe> Yes, ps aux | grep ceph says that the mon services are active
[22:42] <masterpe> on all nodes
[22:42] <joao> sagewk, thanks :)
[22:43] <sjust> masterpe: previously, did you confirm that all mons were in the quorum?
[22:43] <masterpe> and ceph.conf are all the same
[22:43] <masterpe> sjust: How do i do that?
[22:44] <sjust> ah, it's in the ceph -s output: monmap e9: 4 mons at {a=192.168.1.3:6789/0,b=192.168.1.5:6789/0,c=192.168.1.4:6789/0,d=192.168.1.6:6789/0}, election epoch 4, quorum 0,1,2,3 a,b,c,d
[22:44] <sjust> it lists 4 mons, a,b,c,d
[22:44] <sjust> so the first thing is to get it to recognize all of them
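sjust reads two facts off that monmap line: which monitors the map contains and which are in quorum. As a sketch, with the regex shape taken from the `ceph -s` lines pasted in this log (other versions may differ), this extracts both and reports the expected monitors (for example, those listed in ceph.conf) that the map does not include.

```python
import re

# Shape taken from the "ceph -s" monmap lines pasted in this log.
MONMAP_RE = re.compile(
    r"monmap e(\d+): (\d+) mons at \{([^}]*)\}.*?quorum [\d,]+ ([\w,]+)"
)


def monmap_summary(line):
    """Extract the map epoch, the mon names in the map, and the quorum names."""
    m = MONMAP_RE.search(line)
    if m is None:
        raise ValueError("no monmap line found")
    epoch, count, addrs, quorum = m.groups()
    names = [entry.split("=", 1)[0] for entry in addrs.split(",")]
    return {
        "epoch": int(epoch),
        "count": int(count),
        "mons": names,
        "quorum": quorum.split(","),
    }


def missing_from_map(line, expected):
    """Monitors you expect (e.g. from ceph.conf) that the monmap does not list."""
    return sorted(set(expected) - set(monmap_summary(line)["mons"]))
```

For the e9 line above and the eight monitors `a` to `h`, `missing_from_map` reports `e`, `f`, `g` and `h`.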
[22:45] <masterpe> OK, and how?
[22:45] <sjust> yeah, working that out now :)
[22:46] <masterpe> That is what I tried
[22:47] <sjust> ok, restart mon.e with --debug-mon=20
[22:47] <sjust> that is, add --debug-mon=20 after the other command line options
[22:47] <masterpe> to do before: I removed all of the new mons and reactivated them, but I still got the same problem
[22:47] <sjust> that'll give us some debugging
[22:48] <masterpe> sjust: thanks, I will do that, one second
[22:48] <sjust> k
[23:16] <masterpe> sjust: thanks, monmap e13: 8 mons at {a=192.168.1.3:6789/0,b=192.168.1.5:6789/0,c=192.168.1.4:6789/0,d=192.168.1.6:6789/0,e=192.168.1.11:6789/0,f=192.168.1.12:6789/0,g=192.168.1.13:6789/0,h=192.168.1.14:6789/0}, election epoch 20, quorum 0,1,2,3,4,5,6,7 a,b,c,d,e,f,g,h
[23:17] <masterpe> After enabling the debug, I saw that it couldn't find /srv/mon.*/keyring
[23:18] <masterpe> I copied it from a and after that it worked
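The root cause turned out to be a missing `/srv/mon.*/keyring` on the new monitors. As a sketch, assuming the per-monitor layout seen in this log (`<root>/mon.<id>/keyring`, e.g. `/srv/mon.h/keyring`), a pre-flight check can list the monitor data directories that lack a keyring before the daemons are started. The helper name is hypothetical.

```python
from pathlib import Path


def mons_missing_keyring(root, mon_ids):
    """Return the monitor ids whose data dir has no 'keyring' file.

    Layout assumed from this log: <root>/mon.<id>/keyring.
    """
    base = Path(root)
    return [m for m in mon_ids if not (base / f"mon.{m}" / "keyring").is_file()]


if __name__ == "__main__":
    print(mons_missing_keyring("/srv", list("abcdefgh")))
```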
[23:18] <nhm> Cube: ping
[23:19] <masterpe> The find command still hangs on: find /var/lib/vzdata-ceph
[23:19] <masterpe> But that is a problem for tomorrow
[23:19] <Cube> nhm: Hey, just parsing the results, will have them up in a second.
[23:19] <nhm> Cube: cool, looking forward to it. :)
[23:22] <masterpe> sjust and gregaf thanks for your help
[23:22] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[23:22] <sjust> masterpe: happy to help, let us know if you hit any more trouble!
[23:25] * masterpe (~masterpe@87.233.7.43) Quit (Quit: Reconnecting)
[23:25] * masterpe_ (~masterpe@87.233.7.43) has joined #ceph
[23:26] * masterpe_ is now known as masterpe
[23:28] * masterpe (~masterpe@87.233.7.43) Quit ()
[23:28] * masterpe (~masterpe@2001:990:0:1674::1:82) has joined #ceph
[23:32] <Cube> nhm: still have the link to the spreadsheet?
[23:33] <Cube> you're on jabber now :) that will be easier
[23:34] * bitsweat_ (~bitsweat@ip68-106-243-245.ph.ph.cox.net) Quit (Remote host closed the connection)
[23:48] * womble (~mjp16@2001:470:1f0e:9e6::2) has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.