#ceph IRC Log


IRC Log for 2012-07-11

Timestamps are in GMT/BST.

[0:08] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:08] * CristianDM (~CristianD@host92.201-252-27.telecom.net.ar) has joined #ceph
[0:16] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Ping timeout: 480 seconds)
[0:16] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:20] * Dr_O is now known as Dr_O__
[0:21] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[0:22] * loicd (~loic@ has joined #ceph
[0:23] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) Quit (Quit: LarsFronius)
[0:27] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[0:34] <Tv_> typo of the moment: pip install --upgrage
[0:34] <Tv_> yes i was a little bit furious
[0:36] * yehudasa (~yehudasa@2607:f298:a:607:18ee:8529:6607:79ec) has joined #ceph
[0:41] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:47] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has left #ceph
[0:53] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:08] * adjohn (~adjohn@ Quit (Quit: adjohn)
[1:20] * loicd (~loic@ Quit (Quit: Leaving.)
[1:20] * Tv_ (~tv@2607:f298:a:607:c4fb:49d5:841d:f90) Quit (Quit: Tv_)
[1:24] * CristianDM (~CristianD@host92.201-252-27.telecom.net.ar) Quit ()
[1:35] <dmick> rage++
[1:48] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Quit: Ex-Chat)
[1:54] * lofejndif (~lsqavnbok@04ZAAEFMZ.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:57] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[1:59] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit ()
[2:43] <iggy> i used to type updegrade a lot for some reason
[2:47] * macan (~macan@ has joined #ceph
[3:09] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (Quit: Ex-Chat)
[3:11] * izdubar (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[3:12] * izdubar (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit ()
[3:17] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:59] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[4:19] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:26] * legume (~legumeleg@c-98-235-160-244.hsd1.pa.comcast.net) has joined #ceph
[4:26] * legume (~legumeleg@c-98-235-160-244.hsd1.pa.comcast.net) has left #ceph
[4:52] * nhmlap (~Adium@2607:f298:a:607:6d53:e1b:eb1c:6ec3) Quit (Quit: Leaving.)
[5:05] * widodh (~widodh@minotaur.apache.org) has joined #ceph
[5:05] * widodh_ (~widodh@minotaur.apache.org) Quit (Ping timeout: 480 seconds)
[5:17] * nhmlap (~Adium@ has joined #ceph
[5:32] <nhmlap> sjust: so looks like I'm still seeing gaps with the new code and with the filestore flusher off. Very similar to the other run where they occur every 15-20 seconds for 3-4 seconds. Bursts in between are ~600MB/s though, which is still way faster than it used to be.
[5:33] <nhmlap> I'll try to get some movies up.
[6:27] * loicd (~loic@ has joined #ceph
[6:43] * dmick (~dmick@ Quit (Quit: Leaving.)
[6:43] * nhmlap (~Adium@ Quit (Quit: Leaving.)
[6:46] * RupS| (~rups@panoramix.m0z.net) has joined #ceph
[6:49] * RupS (~rups@panoramix.m0z.net) Quit (Read error: Connection reset by peer)
[6:49] * rz (~root@ns1.waib.com) Quit (Remote host closed the connection)
[6:49] * rz (~root@ has joined #ceph
[7:55] <chuanyu> hi, has this bug been resolved? http://tracker.newdream.net/issues/2595
[7:55] <chuanyu> I still got the "provided osd id 1 != superblock's -1" :(
[7:56] <chuanyu> ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
[8:15] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:44] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:44] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[9:07] * Dr_O__ (~owen@host-2-96-191-87.as13285.net) Quit (Remote host closed the connection)
[9:08] * macan (~macan@ has left #ceph
[9:35] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[9:46] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:56] * fghaas (~florian@ has joined #ceph
[10:01] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[10:02] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[10:13] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:17] <chuanyu> sorry, my mistake: when I added the new osd, I ran "mkcephfs --init-daemon osd.1" before "ceph-osd --monmap monmap -i 1 --mkfs"
[10:17] <chuanyu> so I got the above error
[10:17] <chuanyu> after rechecking the wiki "Replacing a failed disk/OSD", I ran just "ceph-osd --mkfs"
[10:18] <chuanyu> and it works :)
[10:29] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:37] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:51] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[10:53] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:58] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[10:58] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:03] * keret (ca037809@ircip2.mibbit.com) has joined #ceph
[11:06] <keret> hi! any1 here?
[11:06] <joao> there's a lot of people here, although most are just idling
[11:07] <keret> yeah, I noticed that.
[11:07] * loicd (~loic@ Quit (Quit: Leaving.)
[11:12] <mgalkiewicz> hi guys
[11:13] <joao> morning
[11:13] <mgalkiewicz> I have a problem with osd
[11:13] <mgalkiewicz> pgmap v3087455: 792 pgs: 792 stale+peering; 22742 MB data, 0 KB used, 0 KB / 0 KB avail
[11:13] <mgalkiewicz> osd is started
[11:13] <mgalkiewicz> I will paste the log
[11:14] <mgalkiewicz> https://gist.github.com/3089209
[11:15] <mgalkiewicz> joao: could you help me with debugging?
[11:15] <joao> let me take a look
[11:16] <mgalkiewicz> it is very important for me because production server is affected
[11:16] <joao> there is nothing apparently wrong with the log, aside from the fact that the OSD should have continued its work from then on
[11:16] <joao> could you please provide the 'ceph status' output?
[11:17] <mgalkiewicz> yes
[11:18] <joao> and is that gist all there is regarding the log?
[11:18] <mgalkiewicz> I have updated the gist
[11:18] <mgalkiewicz> joao: yes
[11:19] <keret> can any1 help me study ceph quickly? some urls might be helpful.
[11:19] <joao> that's a bummer; I'd guess that's something that isn't reaching the monitor for some reason
[11:19] <joao> but that's just a guess at this moment
[11:19] <fghaas> keret: http://www.ceph.com/docs/master
[11:19] <fghaas> should get you started fairly well
[11:19] <mgalkiewicz> joao: so osd cannot communicate with mon?
[11:20] <joao> mgalkiewicz, is the osd up in 'ceph status' ?
[11:20] <joao> and 'in' ?
[11:20] <mgalkiewicz> https://gist.github.com/3089228
[11:20] <joao> if not, that might be it
[11:20] <keret> I was unable to find how exactly the low-level file storage is carried out.
[11:20] <joao> "osdmap e991: 1 osds: 0 up, 0 in"
[11:20] <mgalkiewicz> one mon is down but the 2 others work fine
[11:21] <fghaas> mgalkiewicz: did you just upgrade to 0.48? maybe you're getting bitten by the new default keyring locations?
[11:21] <joao> yeah, it's not in, nor informing the leader that it's up
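Editor's note: the "N up, N in" check joao applies to the status line can be automated. The snippet below is a minimal sketch, assuming the summary line keeps the shape quoted in the channel ("osdmap e991: 1 osds: 0 up, 0 in", or the shorter "osd e1002: ..." spelling that appears later in this log). The function names are hypothetical helpers, not part of any Ceph tool.

```python
import re

# Matches the summary line as quoted in the channel, for example
# "osdmap e991: 1 osds: 0 up, 0 in". The "osd e1002: ..." spelling
# is accepted as well. This format is an assumption taken from the log.
OSDMAP_RE = re.compile(
    r"osd(?:map)?\s+e(?P<epoch>\d+):\s+(?P<total>\d+)\s+osds:\s+"
    r"(?P<up>\d+)\s+up,\s+(?P<in_>\d+)\s+in"
)


def parse_osdmap_summary(line):
    """Return epoch/total/up/in counts from a summary line, or None."""
    m = OSDMAP_RE.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m.group("epoch")),
        "total": int(m.group("total")),
        "up": int(m.group("up")),
        "in": int(m.group("in_")),
    }


def all_osds_serving(summary):
    """True only when every known OSD is both up and in."""
    return (
        summary["total"] > 0
        and summary["up"] == summary["total"]
        and summary["in"] == summary["total"]
    )
```

With the line from this session, `parse_osdmap_summary` reports zero OSDs up and zero in, so `all_osds_serving` returns False.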
[11:21] <mgalkiewicz> fghaas: no osd is 0.44 but mons are 0.48
[11:21] <joao> mgalkiewicz, the last time I had this problem it was indeed related to auth
[11:22] <mgalkiewicz> ok but osd should report in logs that it cannot authenticate
[11:22] <joao> if there's any chance, you should try running the osd with --debug-auth 20
[11:22] <mgalkiewicz> sure
[11:22] <joao> and maybe --debug-monc 20 as well
[11:23] <joao> just to make sure the osd is finding someone to talk to
[11:24] <mgalkiewicz> adding debug to osd did not show anything more in logs
[11:25] <mgalkiewicz> /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf --debug-auth 20
[11:25] <mgalkiewicz> i did sth like this
[11:25] <joao> add the --debug-monc 20 then
[11:25] <joao> and --debug-ms 10
[11:26] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[11:27] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:27] <joao> I'm having flashbacks of two weeks ago, when I had that very same problem; but back then I was tinkering with the mon code :)
[11:27] <joao> (no, it didn't go upstream)
[11:27] * loicd (~loic@ has joined #ceph
[11:29] <mgalkiewicz> joao: https://gist.github.com/3089281
[11:29] <mgalkiewicz> the command was /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf --debug-auth 20 --debug-monc 20 --debug-ms 10
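Editor's note: the debug flags were added one at a time over several messages. As a small sketch, the full command line can be assembled from a mapping of subsystem to level. Only the subsystems named in this conversation (auth, monc, ms) are used; the helper name is hypothetical.

```python
def ceph_osd_debug_argv(osd_id, conf_path, debug_levels):
    """Build an argv list for ceph-osd with one --debug-<subsystem> flag per entry."""
    argv = ["/usr/bin/ceph-osd", "-i", str(osd_id), "-c", conf_path]
    for subsystem, level in debug_levels.items():
        argv += ["--debug-" + subsystem, str(level)]
    return argv


# Reproduces the command quoted in the channel.
argv = ceph_osd_debug_argv(
    0, "/etc/ceph/ceph.conf", {"auth": 20, "monc": 20, "ms": 10}
)
print(" ".join(argv))
```

Keeping the levels in a mapping makes it easy to drop or add a subsystem between restarts without retyping the whole command.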
[11:30] <mgalkiewicz> and some strace data: https://gist.github.com/3089286
[11:33] <mgalkiewicz> joao: any idea?
[11:34] <mgalkiewicz> do you need some logs from mon?
[11:34] <joao> I'm inclined to say this is the osd waiting to receive a message, but just not sure
[11:34] * raso (~raso@deb-multimedia.org) Quit (Quit: WeeChat 0.3.7)
[11:34] <joao> yeah, having the --debug-auth 20 and --debug-ms 10 from the leader would be nice
[11:35] <mgalkiewicz> how to check which one is the leader?
[11:35] <fghaas> check logs for "won leader election"
[11:36] <fghaas> normally the mon with the lowest IP address
[11:36] <joao> in your setup it would be n8c1
[11:36] <joao> or n4c1, if that one is back up
[11:36] <mgalkiewicz> n4c1 is down
[11:36] <mgalkiewicz> 1 mon.n9c1@2(peon).log v413041 check_sub sub osdmap not log type
[11:37] <mgalkiewicz> looks like osd problem
[11:37] <fghaas> joao: is that 0.44 osd with 0.48 mon combination expected to work?
[11:37] <joao> fghaas, I wish I knew
[11:38] <fghaas> :)
[11:38] <mgalkiewicz> I can perform the upgrade
[11:38] * raso (~raso@deb-multimedia.org) has joined #ceph
[11:38] <joao> mgalkiewicz, that message should not happen
[11:38] <mgalkiewicz> osd takes about 29GB of data and I am not sure how long it will take to reorganize data
[11:39] <mgalkiewicz> however this combination worked
[11:39] <fghaas> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/6085 is not too encouraging in that regard, but it's from april
[11:39] <mgalkiewicz> mon.n8c1@0(leader).log v413041 check_sub sub osdmap not log type
[11:40] <joao> yeah... w8 a sec
[11:40] <joao> let me check something
[11:41] <mgalkiewicz> recently I tried to add another osd
[11:42] <mgalkiewicz> but I failed on https://gist.github.com/3078131
[11:42] <mgalkiewicz> do you think that it might have broken sth?
[11:43] <fghaas> is that a new OSD?
[11:43] <fghaas> shouldn't you add --mkjournal there?
[11:43] <fghaas> (unsure how that could possibly cause your apparent global OSD failure though)
[11:45] <joao> erm
[11:45] <mgalkiewicz> ok do you need some more logs?
[11:45] <joao> mgalkiewicz, I'm starting to believe that a 0.44 osd and a 0.48 monitor won't play well with each other
[11:45] <mgalkiewicz> do you think that upgrade is a good idea?
[11:46] <joao> there were some changes to the map subscription mechanism back in May that may be the cause of your problem
[11:46] <mgalkiewicz> how long might it take to convert 29GB of data (version 0.48 does such thing)?
[11:46] <mgalkiewicz> joao: or maybe just downgrade mons?
[11:47] <joao> mgalkiewicz, I have no idea when it comes to the time to convert the data
[11:48] <joao> and after upgrading the mons, I'm not sure if it is a good idea to downgrade, unless you can cope with a ceph-mon --mkfs if something goes wrong
[11:48] <joao> but I'm out of my depth when it comes to knowing what the best approach in this case might be
[11:49] <mgalkiewicz> what does coping with ceph-mon --mkfs mean?
[11:49] <joao> well, if you have no problem with doing that
[11:50] <joao> and I mean, starting the mons with a fresh store
[11:51] <joao> mgalkiewicz, I don't want to give you wrong infos here, so take that with a grain of salt
[11:52] <mgalkiewicz> joao: ok I will try with mon downgrade
[11:53] <joao> hope it goes well
[11:54] * ninkotech (~duplo@ has joined #ceph
[12:06] * loicd (~loic@ Quit (Quit: Leaving.)
[12:15] <mgalkiewicz> joao: there is progress
[12:15] <joao> great!
[12:15] <mgalkiewicz> osd is up
[12:15] <mgalkiewicz> but can map any rbd volume
[12:15] <joao> I was holding my breath here
[12:15] <mgalkiewicz> cannot
[12:15] <joao> oh man, rbd is so out of my league :\
[12:15] * keret (ca037809@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[12:16] <joao> elder, around?
[12:16] <joao> I don't think he'll be around for another hour or so
[12:16] <fghaas> I *may* be able to help with that, any more details available?
[12:17] <joao> mgalkiewicz, is 'ceph status' reporting the osd up and in ?
[12:17] <mgalkiewicz> https://gist.github.com/3089457
[12:17] <mgalkiewicz> status looks fine for me
[12:18] <fghaas> nope, they're all still peering
[12:18] <mgalkiewicz> what does it mean
[12:18] <fghaas> hangon, lemme dig up the doc link for you
[12:19] <joao> yeah, there's still an election in progress
[12:19] <fghaas> (in short, it means you can't use that data yet though :) so rbd wouldn't be expected to work)
[12:19] <mgalkiewicz> election of what?
[12:19] <mgalkiewicz> mon is elected
[12:19] <joao> mgalkiewicz, the monitors will have to elect a leader in order to properly work
[12:19] <mgalkiewicz> 2012-07-11 12:12:05.271947 7f1429e76700 log [INF] : mon.n8c1@0 won leader election with quorum 0,2
[12:19] <joao> okay then
[12:19] <joao> so that's not it
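Editor's note: fghaas's earlier tip (check the logs for "won leader election") can be scripted. This is a minimal sketch, assuming the monitor log line has the shape pasted above; the function name is a hypothetical helper.

```python
import re

# Shape taken from the line pasted in the channel:
# "... log [INF] : mon.n8c1@0 won leader election with quorum 0,2"
LEADER_RE = re.compile(
    r"(?P<mon>mon\.\S+)@(?P<rank>\d+) won leader election "
    r"with quorum (?P<quorum>\d+(?:,\d+)*)"
)


def find_leader(log_lines):
    """Return (mon_name, rank, quorum_ranks) for the last election win, or None."""
    result = None
    for line in log_lines:
        m = LEADER_RE.search(line)
        if m:
            quorum = [int(r) for r in m.group("quorum").split(",")]
            result = (m.group("mon"), int(m.group("rank")), quorum)
    return result
```

The last match wins on purpose: a monitor log may contain several elections, and the most recent one reflects the current leader.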
[12:20] <joao> oh, nevermind
[12:20] <fghaas> mgalkiewicz: http://ceph.com/docs/master/ops/manage/failures/osd/ ... scroll down for "peering failure"
[12:20] <joao> but fghaas has a point
[12:20] <fghaas> you sure you only want to have one osd in that whole setup?
[12:20] <mgalkiewicz> no I wanted to add another one but I failed
[12:21] <mgalkiewicz> right now i want to have one working
[12:22] <fghaas> well what are the PGs waiting to peer with then? is "ceph pg <pgid> query" saying anything in peering_blocked_by?
[12:22] <mgalkiewicz> https://gist.github.com/3089499
[12:23] <fghaas> yeah, so "ceph pg 220.2 query" for example, what is that saying?
[12:23] <mgalkiewicz> checking
[12:23] <mgalkiewicz> https://gist.github.com/3089499
[12:24] <mgalkiewicz> it hangs after those lines
[12:24] <fghaas> can you try as client.admin instead please?
[12:24] <mgalkiewicz> ok
[12:24] <mgalkiewicz> the same
[12:25] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[12:25] <fghaas> wtf
[12:25] <mgalkiewicz> what is probably weird is that nothing happens in the osd log
[12:26] <fghaas> could your OSD just be waiting for I/O forever?
[12:26] <fghaas> it seems like you're running a pretty old kernel there, for putting your OSD on btrfs
[12:26] <mgalkiewicz> 15637 ? Dsl 0:26 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf
[12:26] <mgalkiewicz> D suggests iowait
[12:27] <fghaas> yup
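Editor's note: the "D" in the STAT column is the uninterruptible-sleep state, which usually means the process is blocked on I/O. The sketch below checks for it, assuming the five-column `ps ax` layout (PID, TTY, STAT, TIME, COMMAND) seen in the pasted line; the function name is hypothetical.

```python
def is_uninterruptible(ps_line):
    """True if the STAT column of a `ps ax`-style line starts with 'D'."""
    # Assumed columns: PID TTY STAT TIME COMMAND
    fields = ps_line.split(None, 4)
    if len(fields) < 3:
        return False
    return fields[2].startswith("D")


sample = "15637 ? Dsl 0:26 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf"
print(is_uninterruptible(sample))
```

A single sample only shows the state at one instant, so polling this a few times gives a better picture of whether the daemon is stuck or just busy.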
[12:27] <mgalkiewicz> kernel 3.2.12-1
[12:27] <fghaas> huh? then why did that other gist complain about SNAP_CREATE_V2 being unavailable?
[12:27] <joao> does dmesg report anything useful?
[12:28] <fghaas> fair question, but for an I/O tarpit, I wouldn't be surprised if it doesn't
[12:28] <mgalkiewicz> joao: no
[12:28] <joao> damn, I was hoping we could blame it on btrfs :p
[12:28] <fghaas> probably no easy way for you to get this unstuck other than hard-rebooting the box, at your own risk... is that an option at all?
[12:29] <mgalkiewicz> the server was rebooted
[12:29] <fghaas> and ceph-osd is getting stuck in D immediately after?
[12:30] <mgalkiewicz> https://gist.github.com/3089526
[12:30] <fghaas> nonono, I meant reboot the box
[12:30] <mgalkiewicz> whats the difference?
[12:30] <fghaas> if osd.0 is stuck in D, it's unkillable
[12:31] <mgalkiewicz> but the old process is gone
[12:31] <fghaas> oh, true, so it doesn't get permanently stuck in D then?
[12:31] <mgalkiewicz> strace shows that sth is going so probably not
[12:32] <fghaas> slow filestore, or slow journal device?
[12:32] <mgalkiewicz> ?
[12:33] <fghaas> well if your OSD is getting repeatedly stuck in D, it may be blocking on I/O in your filestore, or on your journal
[12:33] <fghaas> or both
[12:33] <mgalkiewicz> so what do you suggest?
[12:33] * lofejndif (~lsqavnbok@9KCAAGO86.tor-irc.dnsbl.oftc.net) has joined #ceph
[12:34] <fghaas> mgalkiewicz: not sure what to suggest at this point, except try to figure out if and where your I/O is blocking
[12:34] <joao> mgalkiewicz, from what I can tell, the PG's go into the peering state when they are recovering
[12:35] <mgalkiewicz> hmm this is interesting
[12:35] <mgalkiewicz> https://gist.github.com/3089545
[12:35] <joao> although I'm not used to dealing with more than just a couple of MB worth of osds for testing purposes, maybe 29GB takes its time to recover
[12:36] <joao> only reads? let me take a look at how this recovery works
[12:36] <mgalkiewicz> but gist shows that 99% of time osd is in iowait
[12:37] <fghaas> my best guess would be dead slow hardware
[12:38] <mgalkiewicz> I will perform smart test on my harddisk
[12:40] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[12:40] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[12:40] <mgalkiewicz> is it normal that osd is only reading the disk?
[12:47] <mgalkiewicz> fghaas: ?
[12:47] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[12:48] <joao> mgalkiewicz, I'm looking into that
[12:48] <mgalkiewicz> joao: thx I really appreciate your help guys
[12:48] <joao> if we were replaying the journal, the answer would be no; but I'm not sure how the pg recovery works, so...
[12:51] * widodh_ (~widodh@minotaur.apache.org) has joined #ceph
[12:52] * widodh (~widodh@minotaur.apache.org) Quit (Ping timeout: 480 seconds)
[12:53] <mgalkiewicz> is it ok to run btrfsck on this partition?
[12:54] <joao> is btrfsck stable now?
[12:55] <mgalkiewicz> no
[12:55] <mgalkiewicz> but i did it anyway
[12:55] <mgalkiewicz> found 29961011200 bytes used err is 0
[12:55] <mgalkiewicz> looks like there are no errors
[12:56] <mgalkiewicz> smart short test did not detect any problems with disks
[12:56] <mgalkiewicz> so the hardware looks fine
[12:57] <mgalkiewicz> other processes working on this server are not in iowait
[12:57] <mgalkiewicz> so it looks like osd problem for me
[12:58] <joao> mgalkiewicz, try running the osd with --debug-osd 20 --debug-filestore 20 and we'll take another look at what's happening
[12:58] <mgalkiewicz> ok
[12:59] <joao> brb; brewing some more coffee
[13:02] <mgalkiewicz> https://gist.github.com/3089696
[13:02] <mgalkiewicz> there is a lot of data, I pasted the most recent
[13:06] <joao> taking a look
[13:07] <mgalkiewicz> and here is some more https://gist.github.com/3089714
[13:13] <joao> I have no idea if those mismatches are the root of all problems, but from reading the source it doesn't seem to be
[13:14] <joao> I have a feeling that everything should be okay when the osd finishes going through all the pg's
[13:16] <joao> and apparently, the read-only behavior is to be expected; the osd is going through the metadata, like this:
[13:16] <joao> 2012-07-11 12:59:54.488708 7fabd1980780 filestore(/srv/ceph/osd.0) collection_getattr /srv/ceph/osd.0/current/45.1_head 'info'
[13:16] <joao> 2012-07-11 12:59:54.488745 7fabd1980780 filestore(/srv/ceph/osd.0) collection_getattr /srv/ceph/osd.0/current/45.1_head 'info' = 432
[13:16] <joao> 2012-07-11 12:59:54.488755 7fabd1980780 filestore(/srv/ceph/osd.0) read meta/28c9bd01/pginfo_45.1/0 0~0
[13:16] <joao> 2012-07-11 12:59:54.488816 7fabd1980780 filestore(/srv/ceph/osd.0) FileStore::read meta/28c9bd01/pginfo_45.1/0 0~256/256
[13:16] <joao> 2012-07-11 12:59:54.488836 7fabd1980780 filestore(/srv/ceph/osd.0) collection_getattr /srv/ceph/osd.0/current/45.1_head 'ondisklog'
[13:16] <joao> 2012-07-11 12:59:54.488855 7fabd1980780 filestore(/srv/ceph/osd.0) collection_getattr /srv/ceph/osd.0/current/45.1_head 'ondisklog' = 30
[13:21] <mgalkiewicz> hmm, those log mismatches appear in the log periodically
[13:23] <mgalkiewicz> but the pg should not be in peering state after this?
[13:23] <mgalkiewicz> wait a sec, sth new is in health detail
[13:24] <mgalkiewicz> pg 1.5 is stuck stale+peering, last acting [0]
[13:24] <mgalkiewicz> joao: last time it was peering
[13:24] <mgalkiewicz> and now the status is osd e1002: 1 osds: 0 up, 0 in
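Editor's note: lines like the "is stuck" message above can be parsed to track which placement groups are stuck and on which OSDs. This is a rough sketch, assuming the health detail line keeps the shape quoted in the channel; the helper name and the returned dictionary layout are hypothetical.

```python
import re

# Shape taken from the quoted line:
# "pg 1.5 is stuck stale+peering, last acting [0]"
STUCK_RE = re.compile(
    r"pg (?P<pgid>\d+\.[0-9a-f]+) is stuck (?P<states>[a-z+]+), "
    r"last acting \[(?P<acting>[\d,]*)\]"
)


def parse_stuck_pg(line):
    """Return the pg id, its state list, and the last acting OSD ids, or None."""
    m = STUCK_RE.search(line)
    if m is None:
        return None
    acting = [int(x) for x in m.group("acting").split(",") if x]
    return {
        "pgid": m.group("pgid"),
        "states": m.group("states").split("+"),
        "acting": acting,
    }
```

Splitting the state on "+" makes it simple to tell "peering" apart from "stale+peering", which is the distinction mgalkiewicz points out here.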
[13:24] <joao> well, at least there's something changing
[13:25] <joao> although it bums me out that the osd is no longer up and in
[13:25] <mgalkiewicz> I was restarting it
[13:25] <joao> oh
[13:25] <mgalkiewicz> when I added debug options
[13:26] <mgalkiewicz> but it produces too big a logfile
[13:26] <mgalkiewicz> so I have removed it
[13:26] <joao> yeah, debugging with 20 on several fronts will produce a huge log file
[13:26] <joao> but I often believe I'd rather have huge log files than be lacking debug info ;)
[13:28] <mgalkiewicz> what is more https://gist.github.com/3089772
[13:28] <mgalkiewicz> joao: what now?
[13:29] <joao> oh man...
[13:30] <joao> wasn't expecting anything so strange
[13:30] <mgalkiewicz> what is so strange?
[13:30] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[13:31] <joao> mgalkiewicz, I was expecting something more verbose, that's all
[13:31] <joao> never seen that, gotta take a closer look at where that came from
[13:32] <mgalkiewicz> ok
[13:34] <joao> oh
[13:34] <joao> wait, is your osd up?
[13:34] <joao> and in the cluster?
[13:35] <mgalkiewicz> no
[13:36] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:37] <joao> yeah, that's probably why it does not map to an osd
[13:38] <joao> I was trying to pinpoint that in the source code, but it surely looks like a plausible explanation
[13:38] <joao> so we're back to "why isn't the osd up and in the cluster"
[13:38] <mgalkiewicz> I will enable debugging
[13:39] <joao> cool; drop the --debug-filestore
[13:39] <joao> I don't think we'll need it
[13:39] <joao> --debug-ms 10 --debug-monc 10 --debug-osd 20
[13:39] <joao> we may not need it all, but that should produce a whole lot of useful output
[13:40] <mgalkiewicz> this is weird
[13:40] <mgalkiewicz> I did nothing and osd is now up
[13:40] <joao> it could be recovering
[13:40] <joao> is it in?
[13:41] <mgalkiewicz> now it is peering
[13:41] <mgalkiewicz> it is in
[13:41] <mgalkiewicz> so we are back on peering
[13:41] <mgalkiewicz> maybe I should wait some time
[13:42] <mgalkiewicz> what do you think?
[13:42] <mgalkiewicz> but debugging is off
[13:42] <joao> yeah, wait on it
[13:42] <mgalkiewicz> how long?
[13:42] <joao> I think that's a safe bet, and it is probably checking the pg's metadata
[13:43] <joao> stopping it will probably restart the process the next time
[13:43] <mgalkiewicz> yeah
[13:43] <mgalkiewicz> ok
[13:44] <joao> I don't know for how long; have no idea how long it could take to bring an osd with a couple dozen GB worth of data
[13:45] <mgalkiewicz> ok I will let you know
[13:45] <joao> cool
[13:45] <joao> I'm going to have lunch, and will be back in a bit to check on how's it going :)
[13:46] <mgalkiewicz> ok
[13:48] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (Ping timeout: 480 seconds)
[13:49] <mgalkiewicz> joao: one weird thing osd is in iowait all the time but it looks like he started to write data
[13:50] <joao> how's ceph status going?
[13:50] <mgalkiewicz> and consumes a lot of ram about 11GB
[13:50] <mgalkiewicz> and a lot of cpu time
[13:50] <joao> that is a lot indeed
[13:51] <mgalkiewicz> status is the same
[13:51] <mgalkiewicz> my machine is swapping
[13:52] <joao> yeah, swapping could make things worse :\
[13:52] <mgalkiewicz> at least we know that it does sth
[13:53] <mgalkiewicz> do you have any idea how my rbd clients will react when osd is up and running?
[13:53] <joao> unfortunately, no
[13:53] <mgalkiewicz> will be up and running
[13:53] <mgalkiewicz> ok
[13:54] <joao> well, going to grab lunch; promise to be back shortly
[13:54] <mgalkiewicz> I am not sure if we should see 100% usage of cpu
[13:54] <mgalkiewicz> ok
[13:57] <mgalkiewicz> joao: https://gist.github.com/3089928
[13:58] <mgalkiewicz> it is once again stale+peering
[13:58] <mgalkiewicz> and the status
[13:58] <mgalkiewicz> osd e1005: 1 osds: 0 up, 1 in
[14:00] <mgalkiewicz> and now it is osd e1006: 1 osds: 0 up, 0 in
[14:04] <mgalkiewicz> heartbeat_map reset_timeout 'OSD::command_tp thread 0x7fdb1fdb9700' had timed out after 4
[14:08] <mgalkiewicz> and out of memory kernel killed osd
[14:09] * widodh_ (~widodh@minotaur.apache.org) Quit (Read error: Connection reset by peer)
[14:09] * widodh (~widodh@minotaur.apache.org) has joined #ceph
[14:11] <mgalkiewicz> and here are some logs with debugging options --debug-ms 10 --debug-monc 10 --debug-osd 20 https://gist.github.com/3089983
[14:11] <mgalkiewicz> nothing after the last line was logged
[14:11] <mgalkiewicz> well now it is log mismatch
[14:13] <mgalkiewicz> osd is now stale+peering and 0up 0in
[14:15] * loicd (~loic@2001:67c:28dc:850:412e:1426:df7a:be35) has joined #ceph
[14:29] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[14:30] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[14:32] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[14:33] <todin> Hi, I found a commit from josh, which adds discard to the rbd blocklayer in qemu, but somehow it does not work, any ideas?
[14:35] <pmjdebruijn> hi guys
[14:35] <pmjdebruijn> I have a small patch, I assume the preferred way to submitting a patch is via the mailing list? Do you prefer attached patches? Or inlined?
[14:36] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Remote host closed the connection)
[14:36] <fghaas> pmjdebruijn: my experience has been that git send-email is fine for everyone, and so is forking ceph on github and issuing a pull request there
[14:37] <fghaas> just make sure that you add your "Signed-off-by:" line (hint: "git commit -s")
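Editor's note: `git commit -s` appends a "Signed-off-by:" trailer to the commit message. When a patch is composed by hand, a quick check for that trailer can catch an omission before mailing. The sketch below is only an illustration; the helper name and the example author are hypothetical.

```python
import re

# One trailer line of the form "Signed-off-by: Name <user@host>".
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE
)


def has_signoff(commit_message):
    """True if the message contains at least one Signed-off-by trailer line."""
    return SIGNOFF_RE.search(commit_message) is not None


# Hypothetical commit message, for illustration only.
message = (
    "doc: fix typo in example\n"
    "\n"
    "Signed-off-by: Jane Doe <jane@example.com>\n"
)
print(has_signoff(message))
```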
[14:41] <pmjdebruijn> oh
[14:41] <pmjdebruijn> I'm manually composing an email now
[14:42] <pmjdebruijn> http://elinux.org/Patch_Submission_HOWTO using that as a guideline
[14:42] <pmjdebruijn> there's an example
[14:43] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[14:43] <fghaas> pmjdebruijn: oh goodness, how complicated is that? :)
[14:44] <fghaas> (that wiki page, I mean)
[14:54] <pmjdebruijn> well I'm just following the example :)
[14:54] * nhmlap (~Adium@ has joined #ceph
[14:55] <joao> mgalkiewicz, can't think of anything else
[14:56] <fghaas> pmjdebruijn: is https://github.com/ceph/ceph/blob/master/SubmittingPatches not useful?
[14:57] <joao> mgalkiewicz, I'm not even sure if that mismatch is concerning
[14:57] <joao> I don't think so, but am not 100% sure
[14:58] <joao> and about the memory consumption, I've never seen it behave like that, so I think I'm in a bit over my head here
[14:59] <joao> hey nhmlap, are you back at home? :)
[14:59] <pmjdebruijn> fghaas: It depends on using git
[14:59] <pmjdebruijn> fghaas: I don't have mail configured on my local desktop
[15:00] <pmjdebruijn> so I'd rather mail manually for now
[15:00] <fghaas> good luck. hope your mailer doesn't mangle your patch too badly :)
[15:01] <pmjdebruijn> Evo has a "preformatted" setting
[15:01] <pmjdebruijn> so that should be fine
[15:01] * pmjdebruijn *crosses-fingers*
[15:02] <nhmlap> joao: nope, I'm still in LA
[15:03] <joao> you're waking up awfully early then :)
[15:03] <mgalkiewicz> joao: I have increased swap space and I have a bunch of new logs can you take a look?
[15:03] <joao> sure
[15:03] <joao> let's have another go at it :)
[15:03] * chuanyu (chuanyu@linux3.cs.nctu.edu.tw) Quit (Ping timeout: 480 seconds)
[15:04] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[15:04] <nhmlap> joao: and this is sleeping in!
[15:04] <joao> eheh :)
[15:05] <elder> I don't know if you know this, joao, but the early bird catches the worm.
[15:05] <elder> (Do you have an idiom like that?)
[15:05] <joao> yeah, we do have something similarly annoying :p
[15:05] <elder> Do translate, please.
[15:05] <darkfader> elder: but better safe than sorry, and when in doubt, don't
[15:05] <darkfader> *sings*
[15:06] <joao> elder, it would roughly translate to "going to bed early, and early rising, gives you health and makes you grow"
[15:06] <elder> That's a different one.
[15:07] <joao> the principle is the same: making you wake up early
[15:07] <elder> Early to bed, early to rise, makes a man healthy, wealthy and wise.
[15:07] <mgalkiewicz> joao: https://gist.github.com/3090290
[15:07] <joao> elder, had no idea about that one :)
[15:07] <mgalkiewicz> joao: osd e1010: 1 osds: 0 up, 0 in
[15:08] <nhmlap> Apparently we should all get raises then. ;)
[15:08] <elder> I've heard that "A bird in the hand is worth two in the bush" is an idiom that exists all over the world, indicating it is a very very old one.
[15:08] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit ()
[15:08] <fghaas> elder: "I always told you, sleep in kids, remember Joe" -- Joe the worm's mother, to his surviving siblings
[15:08] <joao> elder, we have that one, but instead of "is worth two in the bush" we say "is worth two flying"
[15:09] <joao> but that's just because it rhymes, I suppose :)
[15:09] <pmjdebruijn> fghaas: thanks for the hints though
[15:09] <mgalkiewicz> joao: https://gist.github.com/3090305
[15:09] <joao> mgalkiewicz, looking :)
[15:09] <mgalkiewicz> joao: two gists
[15:09] <elder> fghaas, worms are lazy.
[15:10] <joao> I notices; one of them is the osd committing suicide
[15:10] <joao> *noticed
[15:10] <mgalkiewicz> yep so just waiting and consuming memory was not the way
[15:11] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[15:12] <joao> mgalkiewicz, did that huge memory consumption happen while you had the logging cranked up?
[15:13] <mgalkiewicz> yes
[15:14] <mgalkiewicz> there are a lot of logs anything you need
[15:15] <mgalkiewicz> last two logs from gists are from the time when osd took about 15GB of memory
[15:15] <mgalkiewicz> about 9 was swapped
[15:17] <joao> mgalkiewicz, what's prior to https://gist.github.com/3090305 ?
[15:19] <mgalkiewicz> joao: last 10k lines https://gist.github.com/3090350
[15:20] <joao> mgalkiewicz, I think that the osd committed suicide because the filestore spent way too much time trying to write something to disk
[15:21] <joao> we just can't see it in those logs because we turned off the filestore debugging :)
[15:21] <mgalkiewicz> omg
[15:21] <joao> but in any case, we shouldn't be having that problem, and we probably did due to all the swapping
[15:22] <joao> and I can only wonder if the swapping is a side effect of all the logging
[15:22] <mgalkiewicz> first of all osd should not consume so much memory
[15:22] <joao> because the pg stuff does produce *a lot* of logging
[15:22] <joao> mgalkiewicz, yeah, that's why I'm wondering if that memory consumption has anything to do with the logging
[15:23] <mgalkiewicz> joao: when debugging was off it also took that amount of ram
[15:23] <joao> oh
[15:23] <joao> that's weird then
[15:24] <mgalkiewicz> so what now?
[15:25] <joao> give me a second, make sure there's nothing fishy in the logs
[15:25] <mgalkiewicz> ok
[15:25] <joao> *making
[15:25] <mgalkiewicz> I can provide you entire log from different osd states
[15:25] * chuanyu (chuanyu@linux3.cs.nctu.edu.tw) has joined #ceph
[15:26] <mgalkiewicz> when it was consuming much memory, when it was up and in
[15:26] <mgalkiewicz> everything from a single osd start
[15:26] <mgalkiewicz> until it died
[15:29] <joao> I'd actually be interested in the interval in which it was up and in, and then it wasn't
[15:29] <joao> I'm assumind that would have happened when it died
[15:29] <joao> *assuming
[15:30] <joao> until then, it seems like it was communicating with the monitor
[15:30] <mgalkiewicz> it all happened and it is in logs
[15:30] <joao> so it should have been at least up
[15:31] <mgalkiewicz> but there are a lot of data so I should probably filter them first
[15:31] <mgalkiewicz> or you are able to do this on your own?
[15:31] <joao> I can do it myself, no worries :)
[15:32] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[15:32] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[15:32] <mgalkiewicz> so how to upload a file ~7GB
[15:32] <joao> oh
[15:32] <mgalkiewicz> lets see how much after gzip
[15:33] <joao> 7GB worth of log files; that's gonna be a new personal record
[15:33] <mgalkiewicz> :)
[15:34] <mgalkiewicz> 465M
[15:35] <mgalkiewicz> is it possible to scp it to you or sth?
[15:36] <joao> I don't have anywhere to scp it to though :\
[15:36] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[15:36] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[15:36] <mgalkiewicz> ok I will put it on apache or sth
[15:37] <joao> thanks
[15:44] <pmjdebruijn> oh crap
[15:44] <pmjdebruijn> apparently the ceph-devel mailing list accepts mails from unsubscribed addresses?
[15:44] <pmjdebruijn> so I'm afraid I might have double-sent something
[15:44] <pmjdebruijn> apologies
[15:44] <mgalkiewicz> joao: when do you finish?
[15:45] <joao> when do I finish what?
[15:46] <mgalkiewicz> work
[15:47] <elder> joao never sleeps
[15:47] <joao> lol
[15:47] <joao> mgalkiewicz, I'm usually around until 21-22h GMT
[15:48] <joao> or later if I manage being unable to sleep :)
[15:48] <mgalkiewicz> great so we still have some time for debugging:)
[15:49] <joao> yeah :)
[15:50] <joao> and in the meantime, the rest of the team should become available; maybe they can give some insight into this if I am unable to crack your problem
[15:57] <joao> opening a 7GB log file on gvim: worst idea ever
[15:58] <mgalkiewicz> yep:)
[15:59] <mgalkiewicz> yeah I thought that all of you are from LA and wasn't expecting anybody before 18-19 utc
[16:02] <joao> I'm in the west-most european coast :p
[16:05] <mgalkiewicz> so portugal
[16:06] <joao> yep
[16:14] * loicd (~loic@2001:67c:28dc:850:412e:1426:df7a:be35) Quit (Quit: Leaving.)
[16:18] * loicd (~loic@ has joined #ceph
[16:25] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[16:42] * lofejndif (~lsqavnbok@9KCAAGO86.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[16:52] * Ryan_Lane (~Adium@ has joined #ceph
[16:59] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[17:02] <mgalkiewicz> joao: is it possible to find anything in these logs?
[17:02] <joao> mgalkiewicz, haven't been able to find anything relevant yet
[17:02] <joao> am now parsing the last run and going to look at it next
[17:04] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:09] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[17:11] <joao> yeah, I should have done this whole thing on a remote machine
[17:12] <joao> making the desktop unusable due to greping a 7GB file is completely unproductive
[17:25] <mgalkiewicz> :/
[17:30] * loicd (~loic@2001:67c:28dc:850:412e:1426:df7a:be35) has joined #ceph
[17:34] <joao> well, I haven't been able to figure out what's wrong
[17:35] <joao> but the guys should be around in the next hour, so maybe they can take a look :)
[17:35] * nhmlap (~Adium@ Quit (Quit: Leaving.)
[17:36] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:37] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:39] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[17:43] * hufman (~hufman@CPE-72-128-65-189.wi.res.rr.com) has joined #ceph
[17:43] <hufman> Hello!
[17:46] <mgalkiewicz> joao: ok thx for your help, could you please let them shortly know what is probably wrong?
[17:46] <mgalkiewicz> joao: it will take hours for me to explain them the problem i guess
[17:47] <joao> sure
[17:48] <joao> once they arrive I'll make sure to tell them :)
[17:48] <mgalkiewicz> ok thx really appreciate your help
[17:50] <joao> np
[17:51] * loicd (~loic@2001:67c:28dc:850:412e:1426:df7a:be35) Quit (Quit: Leaving.)
[17:52] <hufman> i have some ceph_fuse mounts, and i was filling them up to see what would happen when i try to expand past my smaller node of my 2-node osd setup
[17:52] <hufman> and it nicely told me that it was full, right where i expected it to
[17:52] <hufman> however, now my mountpoint is frozen and i can't delete the data
[17:53] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[17:53] <hufman> what should i do?
[17:54] * lofejndif (~lsqavnbok@19NAAAW1T.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:55] * Tv_ (~tv@2607:f298:a:607:c4fb:49d5:841d:f90) has joined #ceph
[17:59] * nhmlap (~Adium@2607:f298:a:607:99d3:118:e217:e55a) has joined #ceph
[18:03] <joshd> hufman: if you can't add more space, you can increase the full threshold with something like 'ceph pg set_full_ratio 0.98'
[18:03] <joshd> hufman: the current ratios are displayed at the top of 'ceph pg dump'
[18:08] <hufman> ooo
[18:10] <hufman> do i have to do anything to reclaim the space after deleting data?
[18:12] <joshd> for cephfs, the space reclamation is asynchronous, so you might have to wait a little while
[18:14] <sagewk> gregaf: simplified the librados cct refcount for you :)
[18:14] <sagewk> much nicer
[18:17] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:19] * eightyeight (~atoponce@ Quit (Quit: wwjd? jwrtfm.)
[18:20] <gregaf> aww, so sweet! (<-- just finished reading logs)
[18:22] * eightyeight (~atoponce@pinyin.ae7.st) has joined #ceph
[18:33] <gregaf> sagewk: mon-log-noise looks good to me
[18:35] <sagewk> thanks
[18:39] * loicd (~loic@2001:67c:28dc:850:412e:1426:df7a:be35) has joined #ceph
[18:40] <gregaf> sagewk: hmm, I don't think I got my question across about wip-cct
[18:40] <gregaf> from a brief skim it looks like we could just pass around a shared_ptr instead of doing explicit puts and gets?
[18:41] <gregaf> or is it shared externally?
[18:41] <sagewk> we could do that too, it just means changing a ton of code
[18:42] * eightyeight (~atoponce@pinyin.ae7.st) Quit (Quit: wwjd? jwrtfm.)
[18:50] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[18:50] * eightyeight (~atoponce@pinyin.ae7.st) has joined #ceph
[18:51] * eightyeight (~atoponce@pinyin.ae7.st) Quit ()
[18:51] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[18:52] * fghaas (~florian@ Quit (Ping timeout: 480 seconds)
[18:54] * bchrisman (~Adium@ has joined #ceph
[18:55] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[18:56] * eightyeight (~atoponce@pinyin.ae7.st) has joined #ceph
[18:56] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[18:56] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[18:56] * Ryan_Lane (~Adium@ has joined #ceph
[19:07] * dmick (~dmick@ has joined #ceph
[19:09] * rosco_ (~r.nap@ Quit (Ping timeout: 480 seconds)
[19:10] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[19:13] <mgalkiewicz> joao: who will be the best to ask?
[19:13] <joao> mgalkiewicz, sorry, hadn't the chance to talk to anyone about that; maybe sjust if he's around?
[19:14] * fghaas (~florian@ has joined #ceph
[19:14] * fghaas (~florian@ Quit ()
[19:29] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) has joined #ceph
[19:35] * chutzpah (~chutz@ has joined #ceph
[19:44] <sjust> if anyone's interested i just pushed wip_osd_internal_doc
[19:44] <sjust> some feedback would be good
[19:47] <mgalkiewicz> sjust: do you have some time?
[19:47] <sjust> mgalkiewicz: yeah, joao is getting me your logs
[19:47] <mgalkiewicz> sjust: great
[19:51] <dmick> sjust: awesome! (and I've only just opened the overview)
[19:51] * loicd (~loic@2001:67c:28dc:850:412e:1426:df7a:be35) Quit (Quit: Leaving.)
[19:51] <sjust> dmick: I've included 1 factual error per 10 lines to keep readers on their toes
[19:52] <dmick> and a few spelling errors :)
[19:52] <sjust> yeah, that'll happen
[19:52] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[19:53] <sjust> mgalkiewicz: what is the current output of ceph -s?
[19:58] <mgalkiewicz> osd is shut down so nothing interesting
[19:59] <sjust> ok
[19:59] * RupS| (~rups@panoramix.m0z.net) Quit (Remote host closed the connection)
[20:00] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[20:03] <sjust> mgalkiewicz: when did you upgrade the monitor?
[20:04] * RupS (~rups@panoramix.m0z.net) has joined #ceph
[20:05] * Ryan_Lane (~Adium@ has joined #ceph
[20:06] <mgalkiewicz> 2 days ago sth like this
[20:06] <mgalkiewicz> but they are downgraded
[20:06] <sjust> sth?
[20:06] <mgalkiewicz> no sry I will check
[20:07] <dmick> sjust: "something"
[20:07] <sjust> ok, thanks
[20:10] <mgalkiewicz> sjust: 2012-07-09 11:59:14 UTC
[20:10] <mgalkiewicz> sjust: the first one
[20:10] <sjust> and it worked for two days?
[20:11] <mgalkiewicz> yep
[20:11] <mgalkiewicz> I did not restart osd
[20:12] <sjust> oh, but it got messed up when you restarted the osd?
[20:13] <mgalkiewicz> 11.07 around 23:00
[20:13] <mgalkiewicz> not sure exactly I can check if you need more precise time
[20:13] <sjust> the time isn't important, just the event
[20:13] <sjust> you restarted the osd?
[20:13] <sjust> or it just happened
[20:13] <sjust> ?
[20:14] <mgalkiewicz> my monitoring system detected the problem so osd probably crashed on its own and then was started by monitoring
[20:14] <sjust> oh
[20:22] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:22] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Connection reset by peer)
[20:23] <sjust> mgalkiewicz: would you mind restarting the OSD with debug filestore = 20 in addition to the previous debugging?
[20:23] <sjust> it appears that a disk operation took several minutes to complete resulting in the death of the osd
[20:24] <sjust> if we can reproduce it with debugging, I should be able to track it down
[20:24] <mgalkiewicz> ok it will take about an hour maybe less for osd to crash
[20:24] <sjust> yeah
[20:26] <mgalkiewicz> --debug-ms 10 --debug-monc 10 --debug-osd 20 --debug-filestore 20
[20:26] <mgalkiewicz> is it what we want/
[20:26] <mgalkiewicz> ?
[20:26] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:27] <mgalkiewicz> sjust: ?
[20:38] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[20:41] <nhmlap> mgalkiewicz: He had to run to a meeting for a bit
[20:46] <mgalkiewicz> nhmlap: ok I have figured this out
[20:47] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) Quit (Quit: LarsFronius)
[20:53] <elder> 503 Service Temporarily Unavailable on http://ceph.com/gitbuilder-precise-kernel-amd64/
[20:56] <dmick> elder: there are some internal network hiccups, could be related, but I'll look at the VM
[20:56] <elder> Thanks.
[20:57] <elder> Meanwhile, gitbuilder.ceph.com/... works.
[20:57] <elder> Different machine?
[20:57] <dmick> yes
[20:57] <dmick> the former is a specific VM
[21:03] * danieagle (~Daniel@ has joined #ceph
[21:03] <elder> I am also having trouble power cycling plana70.
[21:03] <elder> I can ping the ipmi address though.
[21:17] <todin> Hi, has anyone here some experience with the discard patch for qemu/rbd which josh posted in may?
[21:19] <todin> the patch went upstream, but discard does not work, is there a config option for it?
[21:20] <joshd> todin: you need to use ide or scsi buses, discard hasn't been added to virtio in qemu
[21:21] <todin> joshd: I tried ide but it did not work, do you have qemu command line example at hand?
[21:22] <todin> does it work in combination with the rbd_cache?
[21:22] <elder> dmick, any info? I'm still not progressing.
[21:22] <joshd> yeah, it works with the cache
[21:22] <joshd> how are you enabling it in the guest, and how are you verifying whether it works?
[21:23] <joshd> elder: the vms are all inaccessible right now (including teuthology) due to the network issues. that may be why ipmi isn't working for plana as well
[21:23] <elder> OK.
[21:23] <elder> I'll be patient.
[21:23] <themgt> anyone using something like https://github.com/ha/doozerd/ to maintain a ceph.conf file between cluster nodes? the files are always supposed to be identical, right?
[21:24] <elder> I've got some other work to do now anyway. (Reviews)
[21:24] <joshd> themgt: the files don't have to be identical, and in the future may not need much node-specific info at all
[21:24] <todin> joshd: I tried fstrim in the guest, it says op not supported, according to google you could check the block device via /sys/block/xxx/queue/discard_gran_max
[21:25] <themgt> joshd - how would that work, would you just give each node some way to connect to the cluster, and the cluster would maintain each nodes config itself?
[21:26] * danieagle (~Daniel@ Quit (Read error: No route to host)
[21:27] <joshd> todin: does hdparm --trim-sector-ranges work?
[21:29] <todin> joshd: didn't try, but I will check, as I know that it should work, I am going to try it a little harder
[21:30] <joshd> themgt: using smarter chef in the future, all the configuration necessary can be passed as arguments to daemons
[21:31] <todin> joshd: btw, do you know a nice tutorial how to manage IPs via chef?
[21:32] <joshd> themgt: the main configuration state (like monitor addresses and auth info) is maintained by the monitor cluster anyway (which uses a variant of paxos), but you could maintain extra configuration in doozerd or zookeeper
[21:33] <joshd> todin: no, I'm not very up to date on chef myself, I just know bits and pieces
[21:33] <themgt> ok .. I've basically got a custom wrapper on chef-ish based deployment now, just trying to get a hang on the right way to maintain a cluster
[21:34] <themgt> it seems like there's automatic tools for a lot of the management stuff, if SSH keys are setup and such
[21:34] <themgt> which is a bit at odds with the chef-philosophy, but if it works I'm not too concerned ;)
[21:34] <joshd> yeah, the goal is for a lot of it to be automatable with chef
[21:34] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Ping timeout: 480 seconds)
[21:35] <joshd> tv could tell you more about the future plans there
[21:36] <themgt> cool, thx
[21:43] <sjust> mgalkiewicz: yeah, that's right
[21:43] <sjust> sorry, was in a meeting
[21:44] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[21:45] <dmick> elder: ipmi seems to work, but the VMs and teuthology, not so much
[21:46] <elder> ipmi doesn't seem to be working for me.
[21:46] <elder> That's allright, I can wait until things settle down.
[21:47] <dmick> do you need 70 cycled? I can do that
[21:47] <elder> Only if I'll then be able to do a teuthology run on it. :)
[21:48] <dmick> if it's already locked...well, no, it'll still need to check I bet
[21:48] <elder> Stuck on "Checking locks..."
[21:49] <dmick> but I'm gonna guess you'd prefer it be cycled if it's wedged anyway, even without teuthology, yes?
[21:50] <elder> Sure, might as well. Please do.
[21:50] <dmick> done
[21:50] <dmick> someone, btw, apparently has a console session (you?)
[21:51] * lofejndif (~lsqavnbok@19NAAAW1T.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[21:57] <todin> joshd: discard via hdparm works, but not via fstrim or mkfs.ext4 -E discard
[21:59] <joshd> todin: interesting... maybe we (or qemu) needs to report extra info to the guest (like /sys/block/xxx/queue/discard_gran_max you mentioned)
[21:59] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:01] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[22:01] <Tv_> ooh i see my nick mentioned
[22:01] <todin> joshd: I think so, or the device must report its trim capability
[22:01] <Tv_> themgt: reading backscroll
[22:02] <Tv_> themgt: the current chef cookbooks write out a ceph.conf based on attributes in the chef environment; there's not that much going on in the config file anymore, after osd hotplug features etc made the [osd.42] sections unnecessary etc
[22:03] <Tv_> themgt: i don't expect ceph.conf to need to change on those chef setups except to add things like debug levels, and that does not have to happen everywhere at once
[22:04] <Tv_> themgt: the chef layer is intentionally kept very thing, most of the functionality belongs in the core product, and i fully intend to implement a "this machine can ssh everywhere" mode that uses the same underlying features, but not chef -- and then call that mkcephfs v2.0 ;)
[22:04] <Tv_> s/thing/thin/
[22:05] <Tv_> (the biggest difference being, it'll be fully incremental, which the current mkcephfs very much isn't)
[22:09] <mgalkiewicz> sjust: osd already crashed do you need entire log?
[22:11] <elder> sagewk, yehudasa: Would providing a ceph_decode_string_safe() macro in addition to the ceph_decode_string() function satisfy what you wanted in your review comments?
[22:16] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:21] <sagewk> elder: then who would use ceph_decode_string()? it seems like they're all broken by definition
[22:22] <elder> If you knew the length ahead of time you wouldn't need it.
[22:22] <elder> Anyway, I've already written the macro.
[22:22] <elder> And it leverages the function.
[22:22] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[22:24] * Ryan_Lane (~Adium@ has joined #ceph
[22:24] <elder> sagewk, I'll send out an updated patch, you can see what I'm talking about.
[22:27] <sjust> mgalkiewicz: yeah, if possible
[22:27] <sjust> how big is it?
[22:28] <sagewk> elder:K
[22:29] <mgalkiewicz> sjust: gzipping
[22:29] <elder> sagewk, it follows the pattern of the rest of the _safe() macros, where you provide a "bad" label to goto in the event there's a problem.
[22:30] <sagewk> yeah sounds good
[22:31] <sagewk> i think the key difference is that there is a legitimate user for decode_u64 etc. because you can check the bounds ahead of time, but there is no possible way anyone can call decode_string and not risk overrunning a buffer
[22:31] <sagewk> the bounds check has to happen between reading the length and allocating/copying it or else it is fundamentally unsafe
[22:33] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[22:33] <elder> Except if you know, for example, that you only supplied X bytes for a receive message, then you know there's an upper bound.
[22:33] <elder> But I'm being a devil's advocate.
[22:34] <mgalkiewicz> sjust: 366MB
[22:34] <sjust> ok
[22:34] <mgalkiewicz> sjust: is it possible to scp it to u/
[22:34] <mgalkiewicz> ?
[22:35] <sagewk> true, but then you have to preallocate a max-sized buffer?
[22:37] <sagewk> seems like you end up with
[22:37] <sagewk> macro -> ceph_extract_encoded_string -> ceph_decode_string (*2)
[22:37] <sagewk> where the macro user is probably only valid user, and you are only replacing about ~8 lines of open-coded functionality
[22:37] <sagewk> a simple int ceph_decode_string_safe(void **p, void *end, char **str) could do it all in one go
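[editor's sketch] sagewk's point above is that the bounds check has to sit between reading the length and allocating/copying the bytes. Below is a minimal userspace sketch of a decoder with the shape he suggests. It is not the kernel implementation: the name decode_string_safe, the 32-bit little-endian length prefix, and the error codes are assumptions made for illustration only.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode a length-prefixed string: a 32-bit little-endian length
 * followed by that many bytes (wire format assumed for this sketch).
 * On success, *str is a newly allocated NUL-terminated copy and *p is
 * advanced past the string. On failure, *p and *str are left untouched.
 * Returns 0, -ERANGE (buffer too short) or -ENOMEM.
 */
int decode_string_safe(void **p, void *end, char **str)
{
    unsigned char *cur = *p;
    unsigned char *lim = end;
    uint32_t len;
    char *buf;

    /* First check: is there room for the length field itself? */
    if (cur > lim || (size_t)(lim - cur) < sizeof(uint32_t))
        return -ERANGE;
    len = (uint32_t)cur[0] | (uint32_t)cur[1] << 8 |
          (uint32_t)cur[2] << 16 | (uint32_t)cur[3] << 24;
    cur += sizeof(uint32_t);

    /* Second check: after reading the length, before allocating or copying. */
    if ((size_t)(lim - cur) < len)
        return -ERANGE;

    buf = malloc((size_t)len + 1);
    if (!buf)
        return -ENOMEM;
    memcpy(buf, cur, len);
    buf[len] = '\0';

    *str = buf;
    *p = cur + len;
    return 0;
}
```

A caller would pass the address of its cursor and the end of the received message, then free() the returned string when done. A goto-"bad"-label macro in the style elder describes could wrap this call and jump on a nonzero return.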
[22:45] <mgalkiewicz> sjust: are you going to investigate it shortly?
[22:45] <sjust> yeah, looking now
[22:45] <mgalkiewicz> k
[22:54] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Ping timeout: 480 seconds)
[23:03] * rosco (~r.nap@ has joined #ceph
[23:03] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[23:06] <sjust> ugh, that version generates large transactions during handle_osd_map, it's actually working fine, one sec
[23:07] <sjust> mgalkiewicz: add to your settings for that osd:
[23:07] <sjust> filestore op thread timeout = 600
[23:07] <sjust> filestore op thread suicide timeout = 1200
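[editor's sketch] The two settings sjust gives would go into ceph.conf roughly as below. The section name is an assumption: [osd] applies to every OSD reading this file, while a per-daemon section such as [osd.N] (N being the id of the affected OSD, which the log does not state) would limit the change to that one daemon.

```ini
; ceph.conf fragment (sketch; section name is an assumption, values are from the log)
[osd]
    filestore op thread timeout = 600
    filestore op thread suicide timeout = 1200
```

The OSD would need a restart to pick the values up from the file.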
[23:08] <mgalkiewicz> ok so run again and provide logs?
[23:08] <sjust> yeah, but I don't expect it to crash
[23:08] <sjust> might as well leave on logging anyway though
[23:09] <sjust> once it starts serving requests you can turn the logging back off
[23:09] <mgalkiewicz> ok do you know that the server is heavily consuming memory and cpu?
[23:09] <mgalkiewicz> and waits in io a lot
[23:09] <sjust> it's a consequence of processing the map backlog
[23:09] <mgalkiewicz> ok
[23:14] <sjust> mgalkiewicz: newer versions don't actually have that particular problem
[23:14] <mgalkiewicz> i thought about upgrade
[23:16] <mgalkiewicz> but I was not sure if it is a good idea when osd is broken and I have no idea how much time will take to reorganize data in 0.48
[23:17] <mgalkiewicz> I have 29GB do you have any idea how long will it take?
[23:23] <sjust> mgalkiewicz: not really, depends on the condition of the underlying filesystem
[23:24] <sjust> it's proportional to the number of objects rather than the total size
[23:27] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[23:30] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[23:40] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[23:41] * Meths (rift@ has joined #ceph
[23:42] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.