#ceph IRC Log

IRC Log for 2012-07-22

Timestamps are in GMT/BST.

[0:52] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) has joined #ceph
[0:59] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[1:03] * EmilienM (~EmilienM@191.223.101.84.rev.sfr.net) Quit (Quit: Leaving...)
[1:43] * danieagle (~Daniel@177.43.213.15) Quit (Read error: Connection reset by peer)
[1:59] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:59] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[2:35] * BManojlovic (~steki@212.200.241.106) Quit (Quit: Ja odoh a vi sta 'ocete...)
[3:25] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[3:30] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) Quit (Quit: LarsFronius)
[4:59] * tjikkun (~tjikkun@82-169-255-84.ip.telfort.nl) Quit (Read error: Connection reset by peer)
[4:59] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[5:31] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[5:31] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[5:47] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[5:59] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[8:40] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[9:00] * nymous (~darthampe@95-106-165-238.pppoe.yaroslavl.ru) has joined #ceph
[9:19] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:26] * deepsa (~deepsa@122.172.212.52) has joined #ceph
[10:58] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[11:11] * EmilienM (~EmilienM@191.223.101.84.rev.sfr.net) has joined #ceph
[11:23] * EmilienM (~EmilienM@191.223.101.84.rev.sfr.net) Quit (Quit: Leaving...)
[11:33] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[12:10] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) has joined #ceph
[12:23] <nymous> i think i'm done with ceph
[12:23] <nymous> my cluster stuck and won't return my data
[12:24] <nymous> too unstable for production
[12:39] * joao (~JL@89.181.148.137) Quit (Quit: Leaving)
[13:13] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[13:33] <darkfader> nymous: but thats exactly what the warnings mean :)
[13:34] <nymous> i've read so many enthusiastic articles, so i thought it is stable enough
[13:49] <darkfader> did you try just using /dev/rbd with a normal (haha, normal) cluster filesystem like gpfs or ocfs?
[13:50] <darkfader> the devs are really methodically working through each part of ceph to make it stable, but the fs layer is further down the roadmap
[13:51] <darkfader> if you need something right now, i've seen people using xtreemfs and moosefs for real-world things. they aren't ceph, so there's less HA and scalability if you compare the designs, but they're out of experimental stages it seems
[13:52] <darkfader> i'd just plan to not stick with them forever after ceph has all parts stable
[13:52] <darkfader> since it'll just handle more io/s than anything else (methinks)
[13:52] <nymous> i'm mainly using gpfs
[13:53] <nymous> so cephfs is no match for it right now
[13:53] <nymous> i thought that at least it is fault tolerant, but i've totally lost my data on ceph
[13:55] <darkfader> you could try gpfs on rbd, but no, cephfs is not meant to be run in prod now
[13:56] <darkfader> it is no good to bend things ;) but it's very good to give it a test run every month :)
[13:57] <darkfader> lol, and i gtg, i abused my raidcontroller a little too much
[13:58] <nymous> i see no benefits on running gpfs on rbd
[13:59] <darkfader> i can't tell, never used it
[14:00] <darkfader> what might be one is adding more replicas or funky replication topologies
[14:00] <darkfader> anyway. if they say experimental, they mean it. experiment and file bugs :)
[14:00] <darkfader> but not put prod data on it while you have something that works
[14:01] <darkfader> as for the enthusiastic reports, most people will say "wow this is the new shit, use it", and never have more data to put on it than their iso archive and movies
[14:01] <darkfader> and thats most of the reports
[14:01] <darkfader> no matter which filesystem
[14:02] <darkfader> people tell you ext4 is great but most big shops end up running NOT ext4 since it's not stable...
[14:02] <darkfader> people tell you btrfs is great even when it didn't have a fsck
[14:04] <nymous> afaik btrfs got fsck with new kernel
[14:05] <nymous> anyway, it was interesting to explore, but didn't do the trick(
[14:06] <darkfader> nymous: yes it has fsck now
[14:06] <darkfader> just meant you need to take all enthusiastic stories about filesystems with a lot of salt
[14:07] <darkfader> and re-test cephfs in a few months
[14:07] <darkfader> i think there's a roadmap on the website
[14:07] <nymous> i wished to try it with openstack
[14:07] <nymous> and it worked for glance and volume
[14:08] <nymous> but things got totally messed up with live migration
[14:08] <nymous> so bad that the whole cluster died )
[14:08] <nymous> it was starting, but says several pgs are stuck... no rbd ops, no fs mounts, nothing
[14:08] <nymous> data has been lost
[14:09] <darkfader> yeah i hate it when filesystems just get stuck
[14:09] <darkfader> did you see if there was still traffic between the osds?
[14:10] <nymous> i had several entries on logs about it... not sure actually
[14:11] <nymous> it says one osd had slow request with ~9000 secs waiting
[14:11] <nymous> no matter what i did, it didn't "unstuck"
[15:15] * nymous (~darthampe@95-106-165-238.pppoe.yaroslavl.ru) Quit (Ping timeout: 480 seconds)
[15:53] * glowell (~glowell@c-98-210-226-131.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[16:30] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:08] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[17:08] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Read error: Connection reset by peer)
[17:25] * andreask (~andreas@213.162.68.36) has joined #ceph
[17:33] * andreask (~andreas@213.162.68.36) has left #ceph
[18:03] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[18:03] * andreask (~andreas@213.162.68.118) has joined #ceph
[18:04] * andreask (~andreas@213.162.68.118) Quit ()
[18:09] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Ping timeout: 480 seconds)
[18:20] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[18:20] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Read error: Connection reset by peer)
[18:20] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[18:28] * EmilienM (~EmilienM@191.223.101.84.rev.sfr.net) has joined #ceph
[18:29] * loicd (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) has joined #ceph
[18:31] * EmilienM (~EmilienM@191.223.101.84.rev.sfr.net) Quit (Remote host closed the connection)
[18:32] * EmilienM (~EmilienM@191.223.101.84.rev.sfr.net) has joined #ceph
[18:37] * loicd (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) Quit (Quit: Leaving.)
[18:37] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:23] <lxo> interesting... I just found out, by accident, a way to fix some "blocked directories" that I've been experiencing every now and again
[19:24] <lxo> the issue has to do with "incomplete" hardlinks (presumably client or mds restart): the directory holding one of the links, and the link itself, become inaccessible (client blocks on stat) until the other link is accessed, presumably triggering recovery that fixes the problem
[19:24] * loicd (~loic@magenta.dachary.org) Quit (Read error: Connection reset by peer)
[19:24] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:26] <lxo> this was not just a matter of waiting for the mds to expire a dead client; the problem remained overnight
[19:26] <lxo> it wasn't a matter of waiting for the log segment to be processed and expired either; lots of filesystem activity occurred during that period
[19:27] <lxo> (like, hundreds of thousands of small files were created)
[19:28] <lxo> now, is there any way to trigger this kind of hardlink recovery that doesn't involve knowing the name of such an "incomplete" link?
[19:28] <lxo> because, you know, it may get tricky if both are in the same directory and you can't getdents from the directory ;-)
[19:30] <lxo> or even if they're in different dirs, and each dir is locked by a separate pending hardlink (e.g. a/c => b/c and b/d => a/d)
[20:13] * EmilienM (~EmilienM@191.223.101.84.rev.sfr.net) Quit (Quit: Leaving...)
[20:55] * darkfaded (~floh@188.40.175.2) has joined #ceph
[20:56] * nolan_ (~nolan@phong.sigbus.net) has joined #ceph
[20:59] * benner_ (~benner@193.200.124.63) has joined #ceph
[20:59] * benner (~benner@193.200.124.63) Quit (resistance.oftc.net oxygen.oftc.net)
[20:59] * Dr_O (~owen@heppc049.ph.qmul.ac.uk) Quit (resistance.oftc.net oxygen.oftc.net)
[20:59] * darkfader (~floh@188.40.175.2) Quit (resistance.oftc.net oxygen.oftc.net)
[20:59] * jamespage (~jamespage@tobermory.gromper.net) Quit (resistance.oftc.net oxygen.oftc.net)
[20:59] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (resistance.oftc.net oxygen.oftc.net)
[20:59] * nolan_ is now known as nolan
[21:02] * jamespage (~jamespage@tobermory.gromper.net) has joined #ceph
[21:07] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:08] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[21:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:10] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:10] * Dr_O (~owen@heppc049.ph.qmul.ac.uk) has joined #ceph
[21:19] * deepsa (~deepsa@122.172.212.52) Quit (Quit: Computer has gone to sleep.)
[21:26] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[21:30] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[21:41] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[21:50] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[21:55] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[22:13] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[23:20] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) Quit (Quit: LarsFronius)
[23:54] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[23:54] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.