#ceph IRC Log

IRC Log for 2012-04-25

Timestamps are in GMT/BST.

[0:01] * loicd (~loic@173.231.115.58) Quit (Quit: Leaving.)
[0:01] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[0:06] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[0:10] <elder> mgalkiewicz, sorry, I just returned from what should have been a half hour trip.
[0:10] <mgalkiewicz> np:)
[0:10] <elder> My question before I left was trying to clarify your statement about the crash of the client after the server was restored.
[0:11] <elder> Do you mean that all three of your osd's were operational, yet the client still crashed in a similar way after you started things up again?
[0:12] <elder> Sorry, I meant both osd's, all three hosts.
[0:12] <mgalkiewicz> yes
[0:13] <elder> OK. I'll make a note of that too in the bug report. Unfortunately I don't expect to make any more progress on this today.
[0:14] <mgalkiewicz> And I am not sure if it's obvious, but the server with the ceph client is completely unresponsive after the crash. It does not even respond to ICMP echo requests.
[0:14] * BManojlovic (~steki@212.200.243.246) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:15] * Tv_ (~tv@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[0:16] <mgalkiewicz> elder: ok. Can I catch u tomorrow about the same time?
[0:29] * morpheusx (~morpheus@foo.morphhome.net) Quit (Ping timeout: 480 seconds)
[0:35] <elder> I'm pretty much on all day... But I do get away from my computer occasionally :)
[0:40] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:47] <elder> Can we try putting a bunch of osd's on this? https://drive.google.com/start#home
[0:49] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:53] <mgalkiewicz> elder: ok and thx for your help!
[0:53] <mgalkiewicz> cu
[0:54] * mgalkiewicz (~mgalkiewi@85.89.186.247) Quit (Quit: Leaving)
[0:56] * andresambrois (~aa@217.115.112.241) has joined #ceph
[0:59] <nhm> wow, cray is selling their interconnect development program to intel.
[0:59] <nhm> with intel buying qlogic's IB division and now this, they are making a pretty big move into that sector.
[1:05] * andresambrois (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[1:05] * andresambrois (~aa@217.115.112.241) has joined #ceph
[1:06] * andresambrois (~aa@217.115.112.241) Quit ()
[1:06] * andresambrois (~aa@217.115.112.241) has joined #ceph
[1:11] <elder> nhm, that's interesting, but my gut reaction about Cray/Intel is that it says more about Cray than Intel.
[1:12] * andresambrois (~aa@217.115.112.241) Quit (Read error: Connection reset by peer)
[1:12] * andresambrois (~aa@217.115.112.241) has joined #ceph
[1:14] <elder> And maybe Qlogic too, for that matter. Both those companies are shedding technology and probably more pressing for them, employees.
[1:15] * andresambrois (~aa@217.115.112.241) Quit (Read error: Connection reset by peer)
[1:15] * andresambrois (~aa@217.115.112.241) has joined #ceph
[1:15] <nhm> elder: I was bearish on cray, but they've won some big contracts lately which will help them out.
[1:16] <elder> How recently?
[1:16] <nhm> Getting blue waters after IBM gave up may turn out well for them if they can just replicate some of the other big machines they've been putting together.
[1:17] * andresambrois (~aa@217.115.112.241) Quit ()
[1:17] * andresambrois (~aa@217.115.112.241) has joined #ceph
[1:18] * andresambrois (~aa@217.115.112.241) Quit ()
[1:18] * andresambrois (~aa@217.115.112.241) has joined #ceph
[1:18] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[1:18] <nhm> titan and blue waters are the two big ones right now. I think there are a couple of other decently big ones too.
[1:19] <nhm> ok, time to eat dinner
[1:19] <elder> I don't know, these lab machines have a long history of hurting HPC companies. Decades long.
[1:19] <nhm> yeah, I think they may be ok given that it's basically just the kraken design.
[1:19] <nhm> might make it a bit more profitable.
[1:19] <elder> Go eat.
[1:20] <joao> if Mark's already having dinner, that's my cue to go to sleep :p
[1:21] <elder> Go sleep.
[1:22] * andresambrois (~aa@217.115.112.241) Quit ()
[1:22] * andresambrois (~aa@217.115.112.241) has joined #ceph
[1:24] * andresambrois (~aa@217.115.112.241) Quit ()
[1:24] * andresambrois (~aa@217.115.112.241) has joined #ceph
[1:27] * andresambrois (~aa@217.115.112.241) Quit ()
[1:27] * aa_ (~aa@217.115.112.241) has joined #ceph
[1:28] * aa_ (~aa@217.115.112.241) Quit ()
[1:28] * aa_ (~aa@217.115.112.241) has joined #ceph
[1:29] * aa_ (~aa@217.115.112.241) Quit ()
[1:29] * aa_ (~aa@217.115.112.241) has joined #ceph
[1:30] <joao> I just hate when the mouse's battery dies as I'm about to put the computer to sleep
[1:31] <joao> 'night
[1:31] * joao (~JL@89-181-154-158.net.novis.pt) Quit (Quit: Leaving)
[1:40] <sagewk> gregaf: can you review wip-mon for me?
[1:42] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[1:45] * lofejndif (~lsqavnbok@28IAAD6YM.tor-irc.dnsbl.oftc.net) has joined #ceph
[1:50] <nhm> ok, very first supernode tests are in. Good news: At peak we can get 424MB/s+. Bad news: Average throughput for the run was ~60MB/s.
[1:53] <sjust> :(
[1:53] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) Quit (Quit: ERC Version 5.3 (IRC client for Emacs))
[1:53] <sjust> rados bench 4MB writes?
[1:54] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) has joined #ceph
[1:55] <nhm> sjust: yep
[1:55] <sjust> xfs?
[1:55] <nhm> sjust: 1 client, 2 OSDs btrfs
[1:55] <nhm> sorry, 2 OSD nodes
[1:55] <sjust> how many disks?
[1:56] <nhm> 14 OSDs, 14 disks, 8 SSDs for journals (4 per node)
[1:56] <nhm> 10GE for network
[1:56] <sjust> 10G?
[1:56] <nhm> yep.
[1:57] <sjust> how long was the run?
[1:57] <nhm> behavior was like 350-400MB/s for 5-6s, and then lots of inactivity.
[1:57] <nhm> 5mins
[1:57] <sjust> yeah, that's a pattern I'm familiar with
[1:58] <sjust> xfs should perform better if it follows the pattern I've seen in the past
[1:59] <nhm> ok, I'll give it a try.
[2:00] <nhm> No debugging on this run, but at some point I'll enable it and we can try to figure out what was taking so long.
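The gap between peak and average throughput reported above can be turned into a rough duty cycle. The following is a minimal sketch, assuming the cluster is either writing at the burst rate or completely idle (a simplification; the log only says "lots of inactivity"). The 375 MB/s figure is the midpoint of the quoted 350-400 MB/s range, and the function name is made up for illustration.

```python
def active_fraction(average_mb_s: float, burst_mb_s: float) -> float:
    """Fraction of wall-clock time spent writing, assuming throughput
    is either burst_mb_s or zero (an idealised on/off pattern)."""
    if burst_mb_s <= 0:
        raise ValueError("burst rate must be positive")
    return average_mb_s / burst_mb_s


run_seconds = 5 * 60   # the run lasted 5 minutes
burst = 375.0          # assumed midpoint of the 350-400 MB/s bursts
average = 60.0         # reported average for the btrfs run

frac = active_fraction(average, burst)
print(f"active share of the run: {frac:.0%}")
print(f"busy time: about {frac * run_seconds:.0f}s of {run_seconds}s")
print(f"data written: about {average * run_seconds / 1000:.0f} GB")
```

Under these assumptions the OSDs were moving data for only a small share of the run, which is consistent with the stalls being the thing to investigate rather than the peak rate.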
[2:14] * aa_ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[2:14] * aa__ (~aa@217.115.112.241) has joined #ceph
[2:15] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:21] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[2:24] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:27] * aa__ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[2:27] * aa__ (~aa@217.115.112.241) has joined #ceph
[2:28] * aa__ (~aa@217.115.112.241) Quit ()
[2:28] * aa__ (~aa@217.115.112.241) has joined #ceph
[2:28] * aa__ (~aa@217.115.112.241) Quit ()
[2:28] * aa__ (~aa@217.115.112.241) has joined #ceph
[2:29] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:44] * aa__ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[2:44] * aa__ (~aa@217.115.112.241) has joined #ceph
[3:05] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit (Quit: LarsFronius)
[3:07] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[3:15] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[3:44] * lofejndif (~lsqavnbok@28IAAD6YM.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[3:53] <The_Bishop> after the things i saw in this channel, i use the fuse client on the box running OSDs. this works out very well and is much more stable than the kernel client. ref: http://www.spinics.net/lists/ceph-devel/msg01425.html
[3:55] <The_Bishop> thx nhm :)
[4:07] * aa__ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[4:07] * aa__ (~aa@217.115.112.241) has joined #ceph
[4:08] * aa__ (~aa@217.115.112.241) Quit ()
[4:08] * aa__ (~aa@217.115.112.241) has joined #ceph
[4:08] * vikasap_ (~vikasap@ool-4353bee9.dyn.optonline.net) has joined #ceph
[4:15] * aa__ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[4:15] * aa__ (~aa@217.115.112.241) has joined #ceph
[4:16] * aa__ (~aa@217.115.112.241) Quit ()
[4:16] * aa__ (~aa@217.115.112.241) has joined #ceph
[4:17] <nhm> ok, supernode test with xfs averages about 156MB/s, with 540MB/s peaks.
[4:21] <iggy> 14 disks per node or total?
[4:26] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:28] <nhm> iggy: total
[4:28] <nhm> iggy: 7 OSD disks and 4 SSDs for journals per node.
[4:28] <iggy> what kind of osd disks?
[4:29] <nhm> iggy: whatever dell is using for their 1TB 7200rpm sata disks. Probably ES.2s
[4:30] * loicd (~loic@modemcable075.145-176-173.mc.videotron.ca) has joined #ceph
[4:30] <iggy> cool. I'm trying to get access to some nodes at work to test ceph... no SSDs, but 15k SAS
[4:30] * chutzpah (~chutz@216.174.109.254) Quit (Quit: Leaving)
[4:31] <iggy> they are testing fhgfs on them now
[4:32] <nhm> iggy: ah, I never actually ran fhgfs, though I saw a report on it at the lustre conference last year.
[4:32] <nhm> Apparently it's quite fast.
[4:32] <iggy> I think it depends heavily on the workload
[4:32] <iggy> doing stuff like dd runs seems to do good, but running some of our actual apps on it sucks
[4:33] <iggy> and iozone did good
[4:33] <nhm> that tends to be the case with a lot of those kinds of filesystems.
[5:09] * aa__ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[5:09] * aa__ (~aa@217.115.112.241) has joined #ceph
[5:10] * aa__ (~aa@217.115.112.241) Quit ()
[5:10] * aa__ (~aa@217.115.112.241) has joined #ceph
[5:11] * aa__ (~aa@217.115.112.241) Quit ()
[5:11] * aa__ (~aa@217.115.112.241) has joined #ceph
[5:20] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[5:26] * aa__ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[5:26] * aa__ (~aa@217.115.112.241) has joined #ceph
[5:32] * aa__ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[5:32] * aa (~aa@217.115.112.241) has joined #ceph
[5:37] * vikasap_ (~vikasap@ool-4353bee9.dyn.optonline.net) Quit (Quit: leaving)
[5:42] * aa (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[5:42] * aa (~aa@217.115.112.241) has joined #ceph
[5:46] * aa (~aa@217.115.112.241) Quit ()
[5:46] * aa (~aa@217.115.112.241) has joined #ceph
[5:49] * aa (~aa@217.115.112.241) Quit ()
[5:49] * aa (~aa@217.115.112.241) has joined #ceph
[5:50] * aa (~aa@217.115.112.241) Quit ()
[5:50] * aa (~aa@217.115.112.241) has joined #ceph
[5:54] * aa (~aa@217.115.112.241) Quit ()
[5:54] * aa (~aa@217.115.112.241) has joined #ceph
[5:54] * aa (~aa@217.115.112.241) Quit ()
[5:54] * aa (~aa@217.115.112.241) has joined #ceph
[5:54] * aa (~aa@217.115.112.241) Quit ()
[5:54] * aa (~aa@217.115.112.241) has joined #ceph
[6:04] * aa (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[6:04] * aa (~aa@217.115.112.241) has joined #ceph
[6:04] * aa (~aa@217.115.112.241) Quit ()
[6:04] * aa (~aa@217.115.112.241) has joined #ceph
[6:22] * aa (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[6:22] * aa (~aa@217.115.112.241) has joined #ceph
[6:24] * aa (~aa@217.115.112.241) Quit ()
[6:24] * aa (~aa@217.115.112.241) has joined #ceph
[6:42] * f4m8_ is now known as f4m8
[7:38] * cattelan is now known as cattelan_away
[7:39] * aa (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[7:39] * aa (~aa@217.115.112.241) has joined #ceph
[7:40] * aa (~aa@217.115.112.241) Quit ()
[7:40] * The_Bishop (~bishop@158.181.82.102) Quit (Read error: Connection reset by peer)
[7:40] * The_Bishop (~bishop@158.181.82.102) has joined #ceph
[7:40] * aa (~aa@217.115.112.241) has joined #ceph
[7:54] * aa (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[7:54] * aa (~aa@217.115.112.241) has joined #ceph
[8:08] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[8:46] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[8:48] * aa (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[8:49] * aa (~aa@217.115.112.241) has joined #ceph
[8:49] * Theuni (~Theuni@195.62.106.91) has joined #ceph
[8:50] * aa (~aa@217.115.112.241) Quit ()
[8:50] * aa (~aa@217.115.112.241) has joined #ceph
[8:58] * The_Bishop_ (~bishop@158.181.82.102) has joined #ceph
[8:58] * The_Bishop (~bishop@158.181.82.102) Quit (Read error: Connection reset by peer)
[9:07] * aa (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[9:07] * andresambrois (~aa@217.115.112.241) has joined #ceph
[9:08] * andresambrois (~aa@217.115.112.241) Quit ()
[9:08] * andresambrois (~aa@217.115.112.241) has joined #ceph
[9:24] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:24] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit ()
[9:29] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:32] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit (Quit: LarsFronius)
[9:36] * andresambrois (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[9:36] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[9:36] * andresambrois (~aa@217.115.112.241) has joined #ceph
[9:36] * andresambrois (~aa@217.115.112.241) Quit ()
[9:36] * andresambrois (~aa@217.115.112.241) has joined #ceph
[9:39] * andresambrois (~aa@217.115.112.241) Quit ()
[9:39] * andresambrois (~aa@217.115.112.241) has joined #ceph
[9:39] * andresambrois (~aa@217.115.112.241) Quit ()
[9:39] * aa_ (~aa@217.115.112.241) has joined #ceph
[9:44] * aa_ (~aa@217.115.112.241) Quit ()
[9:44] * aa_ (~aa@217.115.112.241) has joined #ceph
[9:44] * aa_ (~aa@217.115.112.241) Quit ()
[9:44] * aa_ (~aa@217.115.112.241) has joined #ceph
[9:44] * The_Bishop_ (~bishop@158.181.82.102) Quit (Read error: Connection reset by peer)
[9:51] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[9:57] * aa_ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[9:57] * aa__ (~aa@217.115.112.241) has joined #ceph
[9:57] * aa__ (~aa@217.115.112.241) Quit ()
[9:57] * aa__ (~aa@217.115.112.241) has joined #ceph
[10:01] * The_Bishop_ (~bishop@158.181.82.102) has joined #ceph
[10:05] * morpheus (~morpheus@foo.morphhome.net) has joined #ceph
[10:16] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:25] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[10:26] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:46] * aa__ (~aa@217.115.112.241) Quit (Read error: Operation timed out)
[10:50] * aa__ (~aa@217.115.112.241) has joined #ceph
[10:59] * aa__ (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[10:59] * aa (~aa@217.115.112.241) has joined #ceph
[10:59] * aa (~aa@217.115.112.241) Quit ()
[10:59] * aa (~aa@217.115.112.241) has joined #ceph
[11:01] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[11:02] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[11:03] * Theuni (~Theuni@195.62.106.91) Quit (Ping timeout: 480 seconds)
[11:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[11:14] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[11:16] * Theuni (~Theuni@195.62.106.110) has joined #ceph
[11:16] * aa (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[11:16] * andresambrois (~aa@217.115.112.241) has joined #ceph
[11:17] * andresambrois (~aa@217.115.112.241) Quit ()
[11:17] * andresambrois (~aa@217.115.112.241) has joined #ceph
[11:20] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[11:28] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[11:28] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:29] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit ()
[11:32] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:34] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit ()
[11:35] * andresambrois (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[11:35] * andresambrois (~aa@217.115.112.241) has joined #ceph
[11:35] * andresambrois (~aa@217.115.112.241) Quit ()
[11:35] * andresambrois (~aa@217.115.112.241) has joined #ceph
[11:37] * andresambrois (~aa@217.115.112.241) Quit ()
[11:37] * andresambrois (~aa@217.115.112.241) has joined #ceph
[11:37] * andresambrois (~aa@217.115.112.241) Quit (Read error: Connection reset by peer)
[11:37] * andresambrois (~aa@217.115.112.241) has joined #ceph
[11:50] * eightyeight (~atoponce@pthree.org) Quit (Ping timeout: 480 seconds)
[11:53] * andresambrois (~aa@217.115.112.241) Quit (Ping timeout: 480 seconds)
[12:08] * andresambrois (~aa@217.115.112.241) has joined #ceph
[12:11] * Theuni (~Theuni@195.62.106.110) Quit (Quit: Leaving.)
[12:25] * andresambrois (~aa@217.115.112.241) Quit (Quit: Konversation terminated!)
[12:25] * aa_ (~aa@217.115.112.241) has joined #ceph
[12:26] * aa_ (~aa@217.115.112.241) Quit ()
[12:26] * aa_ (~aa@217.115.112.241) has joined #ceph
[12:27] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[12:27] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit ()
[12:37] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[12:46] <nhm> good morning #ceph
[12:48] * aa_ (~aa@217.115.112.241) Quit (Remote host closed the connection)
[13:09] * Theuni (~Theuni@195.62.106.91) has joined #ceph
[13:12] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[13:36] * ss7pro (~ss7pro@static.nk-net.pl) has joined #ceph
[13:36] <ss7pro> anyone from dreamhost here?
[13:37] <elder> What do you need ss7pro?
[13:39] <ss7pro> I have reported a problem with ceph-osd on the ceph-devel
[13:39] <ss7pro> mailing list
[13:39] <ss7pro> and I'm wondering
[13:40] <ss7pro> if there's anybody
[13:40] <ss7pro> who can help me diagnose it
[13:40] <ss7pro> I have one idea
[13:40] <ss7pro> how to fix this
[13:40] <ss7pro> but I'm afraid
[13:40] <ss7pro> because I'm not sure
[13:40] <ss7pro> whether it will cause any data damage ;(
[13:40] <nhm> ss7pro: I'm about half here right now. :)
[13:41] <nhm> what was the title of your mailing list post?
[13:42] <nhm> oh, nm, I see it
[13:42] <ss7pro> I was playing with pool snapshots
[13:42] <ss7pro> and after removing the pool snapshot all of the osds crashed
[13:43] <ss7pro> problem is in osd/OSD.cc:3475
[13:43] <ss7pro> I'm wondering if I can just remove the information about pool snapshots from the osdmap
[13:45] <nhm> ss7pro: not sure. I don't know enough about how that part of the code works to advise...
[13:47] <nhm> ss7pro: The other guys should be around in about 4-5 hours.
[13:49] <ss7pro> thanks
[13:50] <nhm> ss7pro: np, sorry I couldn't be of more help.
[13:50] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[13:54] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[14:07] * joao (~JL@89-181-154-158.net.novis.pt) has joined #ceph
[14:08] <joao> hi all
[14:09] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:12] * aa (~aa@217.115.112.241) has joined #ceph
[14:12] * aa (~aa@217.115.112.241) Quit (Remote host closed the connection)
[14:15] <nhm> good morning joao
[14:16] <joao> hi Mark
[14:33] * loicd (~loic@modemcable075.145-176-173.mc.videotron.ca) Quit (Quit: Leaving.)
[14:39] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[14:43] * loicd (~loic@173.231.115.58) has joined #ceph
[14:46] * Theuni1 (~Theuni@195.62.106.110) has joined #ceph
[14:47] * mgalkiewicz (~mgalkiewi@staticline58611.toya.net.pl) has joined #ceph
[14:49] * brambles (brambles@79.133.200.49) Quit (Remote host closed the connection)
[14:49] * brambles (brambles@79.133.200.49) has joined #ceph
[14:51] * asadpanda (~asadpanda@67.231.236.80) Quit (Remote host closed the connection)
[14:52] * Theuni (~Theuni@195.62.106.91) Quit (Ping timeout: 480 seconds)
[14:52] * asadpanda (~asadpanda@67.231.236.80) has joined #ceph
[15:08] * eightyeight (~atoponce@pthree.org) has joined #ceph
[15:17] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) has joined #ceph
[15:36] * eightyeight (~atoponce@pthree.org) Quit (Ping timeout: 480 seconds)
[15:41] * eightyeight (~atoponce@pthree.org) has joined #ceph
[15:43] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[15:43] * f4m8 is now known as f4m8_
[15:57] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[16:21] * mgalkiewicz (~mgalkiewi@staticline58611.toya.net.pl) has left #ceph
[16:21] * Theuni1 (~Theuni@195.62.106.110) Quit (Ping timeout: 480 seconds)
[16:36] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) Quit (Remote host closed the connection)
[16:53] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[17:04] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[17:04] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[17:13] * ss7pro (~ss7pro@static.nk-net.pl) Quit (Remote host closed the connection)
[17:20] * ss7pro (~ss7pro@static.nk-net.pl) has joined #ceph
[17:29] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[17:29] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit ()
[18:08] * asadpanda (~asadpanda@67.231.236.80) Quit (Ping timeout: 480 seconds)
[18:10] * asadpanda (~asadpanda@67.231.236.80) has joined #ceph
[18:22] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[18:23] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[18:40] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[18:41] <ss7pro> anybody from newdream here ?
[18:41] <gregaf> *looks at sjust*
[18:42] <sjust> ss7pro: I was just looking at your email
[18:43] <ss7pro> and ? any idea ?
[18:43] <ss7pro> Can I just comment out the lines in OSD.cc related to subtract?
[18:44] <ss7pro> or do I have to modify snap_seq in some way in the osdmap?
[18:45] <sjust> one sec, you removed a self managed snapshot?
[18:45] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[18:45] <ss7pro> yes
[18:45] <sjust> but you had been creating pool snapshots
[18:46] <ss7pro> at the beginning I created a pool snapshot
[18:46] <ss7pro> removed it a few seconds later
[18:46] <ss7pro> and after this I was trying to make a self managed snapshot
[18:46] <ss7pro> and then the osd died
[18:47] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Read error: Operation timed out)
[18:47] <sjust> how did you attempt to create the self managed snapshot?
[18:48] <ss7pro> using rbd command
[18:48] <ss7pro> it was something like rbd -p nova snap create XXXX
[18:49] <sjust> ah
[18:49] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[18:49] <ss7pro> qman
[18:53] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:54] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:57] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[19:02] <sjust> was that pool snap the first that you created?
[19:03] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[19:06] <ss7pro> yes poll snap was the first one
[19:07] <ss7pro> * pool
[19:07] <sjust> and you hadn't done a self-managed snap prior to that?
[19:08] <ss7pro> I'm not sure :-)
[19:08] <sjust> ok
[19:10] <ss7pro> From what I have seen in gdb, cached_removed_snaps had two elements and newly_removed_snaps had just one element
[19:11] <sjust> yeah, essentially, you can't mix pool snaps and self-managed snaps in the same pool
[19:11] <sjust> the monitor should have prevented this from happening
[19:11] <ss7pro> now i know this :-)
[19:11] <ss7pro> I have read log from this channel
[19:11] <ss7pro> and someone was mentioning this ;)
[19:11] <sjust> it's our fault, the monitor is not supposed to allow this to happen
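The rule sjust describes (a pool uses either pool snapshots or self-managed snapshots, never both) can be sketched as a simple mode guard. This is a toy model, not Ceph monitor code: the class, enum, and method names are hypothetical, and it assumes the mode stays fixed once the first snapshot is requested, even after that snapshot is removed.

```python
from enum import Enum


class SnapMode(Enum):
    UNSET = "unset"
    POOL = "pool"
    SELF_MANAGED = "self-managed"


class PoolSnapGuard:
    """Hypothetical guard that locks a pool into one snapshot mode."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.mode = SnapMode.UNSET

    def request(self, wanted: SnapMode) -> None:
        """Accept the request if it matches the pool's mode, else refuse."""
        if wanted is SnapMode.UNSET:
            raise ValueError("a concrete snapshot mode is required")
        if self.mode is SnapMode.UNSET:
            # First snapshot request decides the mode for this pool.
            self.mode = wanted
        elif self.mode is not wanted:
            raise PermissionError(
                f"pool {self.name!r} already uses {self.mode.value} snapshots; "
                f"refusing {wanted.value} snapshot"
            )


guard = PoolSnapGuard("nova")
guard.request(SnapMode.POOL)              # pool snapshot: accepted
try:
    guard.request(SnapMode.SELF_MANAGED)  # e.g. an rbd snapshot: refused
except PermissionError as err:
    print(err)
```

In this sketch the sequence from the log (pool snapshot first, then an rbd snapshot in the same pool) is rejected up front instead of reaching the OSDs.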
[19:12] <ss7pro> but how can we fix this now?
[19:12] <ss7pro> do I need to prepare an osdmap?
[19:12] <sjust> I'm working out precisely what happened now
[19:12] <sjust> then I'll tell you how to work around it
[19:12] <ss7pro> maybe you need some more detailed data from gdb ?
[19:14] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[19:20] * bchrisman (~Adium@108.60.121.114) Quit (Read error: Connection reset by peer)
[19:21] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:22] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[19:33] * Theuni (~Theuni@46.253.59.219) has joined #ceph
[19:35] <sjust> ss7pro: I've got to work on something else for a while, I might be able to get back to your situation later today
[19:36] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[19:37] <ss7pro> ok thanks
[19:47] * Theuni (~Theuni@46.253.59.219) Quit (Quit: Leaving.)
[19:53] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:56] * Theuni (~Theuni@46.253.59.219) has joined #ceph
[20:08] * ss7pro (~ss7pro@static.nk-net.pl) Quit (Quit: IRC webchat at http://irc2go.com/)
[20:10] * Theuni (~Theuni@46.253.59.219) Quit (Quit: Leaving.)
[20:11] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[20:12] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[20:12] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:16] * MoXx (~Spooky@fb.rognant.fr) Quit (Ping timeout: 480 seconds)
[20:18] * Theuni (~Theuni@46.253.59.219) has joined #ceph
[20:41] * Theuni (~Theuni@46.253.59.219) Quit (Quit: Leaving.)
[21:11] * mgalkiewicz (~mgalkiewi@85.89.186.247) has joined #ceph
[21:13] <mgalkiewicz> elder: hi, any progress on #2267?
[21:13] * cattelan is now known as cattelan_away
[21:13] <elder> No, sorry. Progress on the other(s) that are related, but nothing conclusive.
[21:13] * cattelan_away is now known as cattelan
[21:14] <elder> This is going to be a hard one I think so I expect something on the order of days, not hours.
[21:14] <mgalkiewicz> ok let me know if you need me to reproduce it for you
[21:15] <mgalkiewicz> I have one question for you, just a sec
[21:17] <morpheus> so we're running ceph stable (0.45) for ~2 weeks now in production. no problems so far, great work
[21:19] <mgalkiewicz> elder: do you think the problem may be caused by ceph? https://gist.github.com/2492443
[21:19] <mgalkiewicz> this is rbd volume with xfs filesystem
[21:20] <elder> I don't think so. Note the line at the bottom: XFS (rbd0): Corruption detected. Unmount and run xfs_repair
[21:20] <mgalkiewicz> ceph does not report any problems (ceph -s) but my clients are randomly affected
[21:21] <elder> Your XFS filesystem needs to be repaired. You can run "xfs_repair -n" to do a trial-run without changing anything--it will tell you what it would do.
[21:21] <elder> If your underlying XFS filesystem is corrupted you want to repair it.
[21:21] <mgalkiewicz> yes I do it but it reappears after a while
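The trial run elder suggests can be scripted so that it is never confused with a real repair. This is a minimal sketch that only builds and prints the command line: the helper name is made up, and `/dev/rbd0` is an assumed device path based on the `XFS (rbd0)` kernel message quoted above. As that message says, the filesystem should be unmounted before `xfs_repair` is run.

```python
import shlex
from typing import List


def xfs_repair_dry_run_cmd(device: str) -> List[str]:
    """Build the argv for a no-modify xfs_repair pass.

    The -n flag makes xfs_repair report what it would do
    without changing the filesystem.
    """
    if not device.startswith("/dev/"):
        raise ValueError(f"expected a block device path, got {device!r}")
    return ["xfs_repair", "-n", device]


cmd = xfs_repair_dry_run_cmd("/dev/rbd0")  # assumed device path
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` by whoever operates the host; the sketch deliberately stops short of executing anything.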
[21:21] <joshd> morpheus: good to hear - what parts are you using?
[21:22] <elder> mgalkiewicz, is this XFS filesystem something that's providing storage for a ceph osd?
[21:22] <mgalkiewicz> no it is filesystem on rbd volume, my osd is stored on btrfs
[21:23] <elder> Ahh, then it is possible that rbd/rados (which may be what you meant by "ceph") is connected to the problem, yes.
[21:23] <morpheus> 2 active, 3 standby mds, ~16 osd, 3 mon running on xfs. we're running about 100 kvm hosts on it so far
[21:24] <morpheus> looking forward to add some more hosts in the next days
[21:24] <mgalkiewicz> elder: bad luck
[21:24] <elder> OK, well it would be helpful if you file another bug on that so we keep track of whatever information you have.
[21:24] <joshd> mgalkiewicz: there was a bug in sparse read handling that might have caused that
[21:24] <joshd> mgalkiewicz: oh, but not on the kernel side
[21:24] <elder> Make sure you mention that it is XFS on top of RBD, and whatever somewhat detailed information you can about your configuration.
[21:25] <mgalkiewicz> elder: will do
[21:26] <mgalkiewicz> joshd: any ideas on how to fix this, or even a temporary workaround, are welcome
[21:32] <joshd> mgalkiewicz: not really sure what the cause is, the only bugs I know of wouldn't affect your configuration
[21:33] <joshd> morpheus: is that on rbd, or the filesystem? you don't need mdses for rbd
[21:34] <morpheus> joshd: rbd, interesting, didn't know this
[21:34] <morpheus> so mon + osd should be enough
[21:35] <joshd> yeah, mds is just for the distributed filesystem (and more than one active mds isn't recommended yet)
[21:36] <mgalkiewicz> joshd: ok I have reported a bug
[21:37] <morpheus> i'll try to disable mds tomorrow (already 21:36 here) so i can use the nodes for other stuff
[21:42] <joshd> mgalkiewicz: have any of your osds been going down and coming back up (i.e. crashing and being restarted?)
[21:43] <joshd> mgalkiewicz: since you have 'filestore btrfs snaps = 0', you could be hitting a non-idempotent journal replay problem that we fixed recently
[21:46] <mgalkiewicz> joshd: how recently? is it fixed in 0.44? I have disabled snapshots because the kernel on the machines with mons, mds, osds was old and did not support it (2.6.32)
[21:46] <mgalkiewicz> joshd: osds were restarted
[21:46] <joshd> oh, that's really old for btrfs
[21:47] <mgalkiewicz> now it is upgraded to 3.2 so I will delete this option from config file
[21:49] <joshd> the bug is fixed in the next branch (which should be 0.46 in a few days)
[21:51] <mgalkiewicz> I thought that 0.46 is finished
[21:54] <joshd> we usually wait a week and add any important bug fixes before doing the release
[21:54] <mgalkiewicz> ok
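Why a non-idempotent journal replay can corrupt data is easier to see with a toy journal. The sketch below is a generic illustration, not the actual filestore bug (the log does not describe it): it assumes a crash causes the whole journal to be applied a second time on top of state that already contains its effects. All names are hypothetical.

```python
def apply_op(state: bytearray, op: tuple) -> None:
    """Apply one journal entry to an in-memory object."""
    kind = op[0]
    if kind == "append":
        # Non-idempotent: every replay adds the bytes again.
        state.extend(op[1])
    elif kind == "write_at":
        # Idempotent: writing the same bytes at the same offset
        # any number of times leaves the same result.
        _, offset, data = op
        end = offset + len(data)
        if len(state) < end:
            state.extend(b"\0" * (end - len(state)))
        state[offset:end] = data
    else:
        raise ValueError(f"unknown op {kind!r}")


def run(journal: list, replays: int) -> bytes:
    """Apply the whole journal `replays` times to an empty object."""
    state = bytearray()
    for _ in range(replays):
        for op in journal:
            apply_op(state, op)
    return bytes(state)


append_journal = [("append", b"abc"), ("append", b"def")]
write_journal = [("write_at", 0, b"abc"), ("write_at", 3, b"def")]

print(run(append_journal, 1), run(append_journal, 2))  # second replay diverges
print(run(write_journal, 1), run(write_journal, 2))    # second replay is harmless
```

A journal made only of idempotent entries can be replayed safely after a restart; one containing non-idempotent entries needs some way to know which entries were already applied.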
[22:10] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[22:15] * cattelan is now known as cattelan_away
[22:15] * cattelan_away is now known as cattelan
[22:19] * Theuni (~Theuni@46.253.59.219) has joined #ceph
[22:27] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:33] * lofejndif (~lsqavnbok@9YYAAFMM5.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:46] * Theuni (~Theuni@46.253.59.219) Quit (Ping timeout: 480 seconds)
[23:05] <chutzpah> #/j #debian
[23:05] <chutzpah> er
[23:08] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) has joined #ceph
[23:13] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) Quit (Remote host closed the connection)
[23:42] * s[X] (~sX]@ppp59-167-154-113.static.internode.on.net) has joined #ceph
[23:45] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[23:45] * mgalkiewicz (~mgalkiewi@85.89.186.247) Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.