#ceph IRC Log


IRC Log for 2012-03-10

Timestamps are in GMT/BST.

[0:09] * perplexed (~ncampbell@ has joined #ceph
[0:12] <sagewk> mrjack: not for the file system. the rados cluster scrubs itself to detect inconsistencies, and repairs some of them.
[0:13] <mrjack> sagewk: i have this problem:
[0:13] <mrjack> created a nfsroot for fai
[0:13] <mrjack> then decided to delete it
[0:13] <mrjack> but
[0:13] <mrjack> rm: Entfernen von „nfsroot/live/filesystem.dir/etc/dpkg/origins“ nicht möglich: Das Verzeichnis ist nicht leer
[0:14] <mrjack> which means directory is not empty
[0:14] <mrjack> but it is empty
[0:14] <mrjack> and
[0:14] <mrjack> :2012-03-09 23:23:43.265313 log 2012-03-09 23:23:33.790283 mds.0 xxx.xxx.xxx.xxx:6800/27320 24 : [ERR] loaded dup inode 10000004802 [2,head] v18 at /srv/fai/nfsroot/live/filesystem.dir/etc/dpkg/origins/debian.dpkg-new, but inode 10000004802.head v13009 already exists at ~mds0/stray4/10000004802
[0:15] <mrjack> well i don't need the data
[0:15] <mrjack> but i want to delete it
[0:15] <mrjack> but this fails so the files will be there forever?!
[0:15] <mrjack> can i reformat only cephfs without losing rbd images?
[0:17] <sagewk> mrjack: not easily, altho it can be done. easier to rename that weird directory somewhere where it isn't in the way and we'll fix/repair it later
[0:19] <mrjack> i once in a while coded fsck for mysqlfs which is a fuse-fs, and would contribute code to ceph if i could, but i did not get a clue on how ceph really works internally yet :)
[0:20] <mrjack> sagewk: is fsck planned?
[0:21] <gregaf> mrjack: it definitely is, but we aren't actively working on the filesystem component until after we've stabilized the simpler pieces :)
[0:22] <mrjack> ok, i understand your point :)
[0:24] <mrjack> but what exactly do you mean by simpler pieces?
[0:26] <sagewk> tv__: there?
[0:26] <Tv__> yeah
[0:26] <sagewk> tv__: i need to set up br0 on plana01 to get the vm access to the network..
[0:26] <sagewk> i should just do that statically in /etc/network/interfaces?
[0:26] <gregaf> mrjack: we're focusing on the RADOS object store, the RADOS Gateway proxy (that speaks S3 and Swift, and is WAY smaller than the Ceph Filesystem), and the RADOS Block Device (which is WAY WAY smaller than the filesystem)
[0:27] <Tv__> sagewk: sure -- consider the base install there currently a throw-away
[0:27] <gregaf> we need RADOS stable before CephFS can be, and the demand for the other two interfaces seems to be about as high as the demand for CephFS and can be satisfied with a lot less engineering time
[0:28] <mrjack> i c
[0:29] <mrjack> though i never had any problems with my rbd images :)
[0:29] <mrjack> but the fs itself f*cksup alot.. :(
[0:31] <gregaf> yeah, but there are some failure scenarios which RBD is known to have issues with :( and it's better for us to have some components solid and properly QAed than for them all to be halfway there… :(
[0:31] <gregaf> even though it's not as fun as working on everything whenever we feel like it
[0:32] <mrjack> yeah
[0:32] <mrjack> :)
[0:33] <darkfader> mrjack: and the market for distributed scaleout filesystems with design flaws and random blowups at *any* layer is saturated
[0:33] <mrjack> :))
[0:33] <nhm> darkfader: lol
[0:33] <mrjack> darkfader: sad but true
[0:34] <nhm> darkfader: I use to admin 900TB of lustre storage. :)
[0:34] <darkfader> 3-4 months ago i was in the "when is ceph done when is ceph done when is ceph done" phase. i think i get that once a year
[0:34] <nhm> ugh s/use/used
[0:35] <darkfader> nhm: we haz autocorrection
[0:35] <mrjack> darkfader: i believe i'm also stuck in this phase :)
[0:36] <darkfader> mrjack: are you running live systems in there already since you mentioned "images"?
[0:37] <darkfader> i recently fixed my home fileserver and it's really robust now, i could start moving it to ceph so i don't feel idle
[0:37] <darkfader> (it's a VM anyway)
[0:37] <mrjack> darkfader: yes, i have some brave customers testing kvm with rbd
[0:38] <darkfader> cool
[0:38] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[0:38] <mrjack> i like the live migration feature ... it saves you trouble when you are forced to reboot nodes because of ceph-bugs ;)))
[0:40] <mrjack> my production system is still glusterfs though.. and that is because of the many troubles i still have with ceph-fuse or the kernel-client..
[0:44] * perplexed (~ncampbell@ Quit (Quit: perplexed)
[0:44] <darkfader> wow, so gluster has been stable for you?
[0:45] <mrjack> yes, i use it for webhosting and mta/imap servers
[0:45] <darkfader> has it improved a lot lately or do you simply "won" and learned how to keep it stable?
[0:45] <darkfader> did you win, even
[0:45] <darkfader> it's late :)
[0:45] <mrjack> well
[0:45] <mrjack> that question is not that easy to answer
[0:46] <mrjack> i'm stuck at v3.0.5
[0:46] <mrjack> because the upgrade is not backwards compatible..
[0:46] <mrjack> and it takes forever to back up about 550GB of IMAP mails..
[0:47] <mrjack> but it works well..
[0:48] <mrjack> so it's a mixture of luck and learning to get it stable i guess :)
[0:48] <darkfader> congrats, not many people mastered it
[0:48] <mrjack> :)
[0:49] <mrjack> i also deployed gluster for customers.. they are still happy..
[0:50] <mrjack> but i think ceph has the better design
[0:55] <gregaf> Tv__: you want us to skype you in for journal club?
[0:55] <Tv__> gregaf: in what free time? grumble grumble
[0:55] <gregaf> heh, okay
[0:55] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[0:59] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[1:00] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[1:00] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) has joined #ceph
[1:16] <sagewk> btw, i set up a tunnel from metropolis, so anybody who wants can run their teuth jobs from there
[1:40] <mrjack> i can reproduce the bug with another fai nfsroot install .. when i want to delete it, it can't be deleted..
[1:40] <mrjack> how could i trace / debug that?
[1:40] <sjust1> sagewk: looks good
[1:41] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) has joined #ceph
[1:41] * ivan` (~ivan`@li125-242.members.linode.com) has joined #ceph
[1:42] <mrjack> as a consequence now all mds are crashed..
[1:44] <sagewk> woot, teuthology-lock is working!
[1:45] <mrjack> ?
[1:48] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[1:52] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:56] <sagewk> tv__: still there?
[1:56] <Tv__> yeah
[1:57] <sagewk> tv__: some of the nodes don't have our keys...
[1:58] <sagewk> do they have yours? can you do a big chef run on them to get everyone in there?
[1:58] <sagewk> (teuthology-lock is up and running now)
[1:58] <Tv__> dmick: can you work your newly-found solo-from-scratch magic? let me know if there are stragglers
[1:59] <sagewk> dan's key is on all the nodes?
[1:59] <Tv__> i don't know why it wouldn't be
[1:59] <sagewk> mine isn't..
[1:59] <Tv__> but then again, i don't really know what's up, and my head ain't on right today
[2:00] <sagewk> :) k
[2:00] <Tv__> if nothing else, i understand dan knows the dh-admin way
[2:00] <Tv__> i'm seriously afraid of clusterssh and friends today
[2:00] <Tv__> you don't want me to take down dozens of boxes ;)
[2:00] <Tv__> i already messed up the vpn routing a little bit, you just can't tell...
[2:01] <sagewk> hehe ok. no worries, have a good weekend
[2:01] <sagewk> it looks like just nodes < 10 ...
[2:02] <Tv__> oh right
[2:02] <sagewk> mostly
[2:02] <Tv__> the ones not allocated to anyone probably didn't get the chef run in the first place
[2:03] <Tv__> you could just leave them out of the db / lock them for now
[2:04] <sagewk> yeah
[2:07] <gregaf> mrjack: sorry, most of us are focused on some local infrastructure stuff right now...
[2:07] <gregaf> anyway, if it's easy to reproduce your issue, you should turn on mds debugging and create an issue in the tracker and attach the full mds log!
[2:08] <gregaf> http://ceph.newdream.net/wiki/Debugging
[2:08] <gregaf> just debug ms and debug mds
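The debugging advice above ("just debug ms and debug mds") could be applied with a ceph.conf fragment along these lines. This is only a sketch under stated assumptions: the [mds] section placement and the levels 1 and 20 are not given in the log, they are commonly used values for messenger and MDS debugging. The wiki page linked above remains the authority for the version in use.

```ini
[mds]
    ; assumed level: log messenger (network) traffic
    debug ms = 1
    ; assumed level: verbose MDS logging
    debug mds = 20
```

The MDS daemons would likely need a restart for the settings to take effect, after which the full MDS log can be attached to the tracker issue.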
[2:44] <mrjack> hm...
[2:44] <mrjack> 2012-03-10 02:43:45.311118 7fd04537d780 mon.1@1(probing) e1 init fsid 5edcee7e-1fc8-4edb-8ae0-6d29e1306b0e
[2:44] <mrjack> 2012-03-10 02:43:45.313401 7fd04537d780 mon.1@1(probing).pg v714911 update_from_paxos: error parsing incremental update: buffer::end_of_buffer
[2:44] <mrjack> mon/PGMonitor.cc: In function 'virtual void PGMonitor::update_from_paxos()' thread 7fd04537d780 time 2012-03-10 02:43:45.313413
[2:44] <mrjack> mon/PGMonitor.cc: 253: FAILED assert(0 == "update_from_paxos: error parsing incremental update")
[2:44] <mrjack> ceph version 0.43 (commit:9fa8781c0147d66fcef7c2dd0e09cd3c69747d37)
[2:44] <mrjack> 1: (PGMonitor::update_from_paxos()+0xc38) [0x4e7868]
[2:44] <mrjack> 2: (Monitor::init()+0x4b3) [0x47c2d3]
[2:44] <mrjack> 3: (main()+0x3070) [0x46a2c0]
[2:44] <mrjack> 4: (__libc_start_main()+0xfd) [0x7fd043735c8d]
[2:44] <mrjack> 5: /usr/bin/ceph-mon() [0x466fe9]
[2:44] <mrjack> ceph version 0.43 (commit:9fa8781c0147d66fcef7c2dd0e09cd3c69747d37)
[2:44] <mrjack> 1: (PGMonitor::update_from_paxos()+0xc38) [0x4e7868]
[2:44] <mrjack> 2: (Monitor::init()+0x4b3) [0x47c2d3]
[2:44] <mrjack> 3: (main()+0x3070) [0x46a2c0]
[2:44] <mrjack> 4: (__libc_start_main()+0xfd) [0x7fd043735c8d]
[2:44] <mrjack> 5: /usr/bin/ceph-mon() [0x466fe9]
[2:44] <mrjack> terminate called after throwing an instance of 'ceph::FailedAssertion'
[2:44] <mrjack> *** Caught signal (Aborted) **
[2:44] <mrjack> in thread 7fd04537d780
[2:44] <mrjack> ceph version 0.43 (commit:9fa8781c0147d66fcef7c2dd0e09cd3c69747d37)
[2:44] <mrjack> 1: /usr/bin/ceph-mon() [0x51ec69]
[2:44] <mrjack> 2: (()+0xeff0) [0x7fd044f66ff0]
[2:44] <mrjack> 3: (gsignal()+0x35) [0x7fd0437491b5]
[2:44] <mrjack> 4: (abort()+0x180) [0x7fd04374bfc0]
[2:44] <mrjack> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd043fdddc5]
[2:44] <mrjack> 6: (()+0xcb166) [0x7fd043fdc166]
[2:44] <mrjack> 7: (()+0xcb193) [0x7fd043fdc193]
[2:44] <mrjack> 8: (()+0xcb28e) [0x7fd043fdc28e]
[2:44] <mrjack> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80e) [0x54aece]
[2:44] <mrjack> 10: (PGMonitor::update_from_paxos()+0xc38) [0x4e7868]
[2:44] <mrjack> 11: (Monitor::init()+0x4b3) [0x47c2d3]
[2:44] <mrjack> 12: (main()+0x3070) [0x46a2c0]
[2:44] <mrjack> 13: (__libc_start_main()+0xfd) [0x7fd043735c8d]
[2:44] <mrjack> 14: /usr/bin/ceph-mon() [0x466fe9]
[2:44] <mrjack> i can't get the mds back up to reproduce it
[2:45] <mrjack> eh
[2:45] <mrjack> mon
[2:45] <mrjack> ..
[3:06] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:06] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[3:16] * joao (~JL@ace.ops.newdream.net) Quit (Read error: Operation timed out)
[3:16] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:23] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[3:29] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:36] <gregaf> mrjack: that's odd…create an issue in the tracker and attach the contents of the "latest" file in the mon_dir/pgmap directory?
[3:36] <gregaf> anyway, I'm off for the weekend
[3:38] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:50] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:22] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[4:25] * chutzpah (~chutz@ Quit (Quit: Leaving)
[5:03] <Qten> any devs still around?
[5:03] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[7:16] * softcrack (ca55d12f@ircip4.mibbit.com) has joined #ceph
[8:38] * softcrack (ca55d12f@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[8:41] * self (~mbritsch@p5DCE9A4A.dip.t-dialin.net) has joined #ceph
[8:42] <self> test
[8:52] * self (~mbritsch@p5DCE9A4A.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[9:06] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[9:12] * Tv__ (~tv@cpe-24-24-131-250.socal.res.rr.com) Quit (Read error: Operation timed out)
[10:30] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[10:41] * The_Bishop (~bishop@178-17-163-220.static-host.net) Quit (Quit: Who the hell is this Peer? If I catch him, I'll reset his connection!)
[11:16] * Enorian (~Enoria@albaldah.dreamhost.com) Quit (Quit: Leaving)
[11:33] * The_Bishop (~bishop@178-17-163-220.static-host.net) has joined #ceph
[11:46] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[12:02] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[12:13] * alexxy (~alexxy@ Quit (Remote host closed the connection)
[12:43] * alexxy (~alexxy@ has joined #ceph
[12:45] * alexxy (~alexxy@ Quit (Remote host closed the connection)
[12:51] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[12:57] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[13:42] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) has joined #ceph
[14:23] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[15:28] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:28] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[16:16] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) Quit (Read error: Operation timed out)
[16:16] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[16:17] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit ()
[16:17] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[16:25] * LarsFronius (~LarsFroni@f054097134.adsl.alicedsl.de) has joined #ceph
[16:41] * LarsFronius (~LarsFroni@f054097134.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[16:51] * tnt_ (~tnt@122.25-240-81.adsl-dyn.isp.belgacom.be) has joined #ceph
[17:36] * tnt__ (~tnt@215.27-240-81.adsl-dyn.isp.belgacom.be) has joined #ceph
[17:37] * tnt_ (~tnt@122.25-240-81.adsl-dyn.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[17:41] * tnt_ (~tnt@152.59-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[17:45] * tnt__ (~tnt@215.27-240-81.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[18:24] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:56] * Tv__ (~tv@cpe-24-24-131-250.socal.res.rr.com) has joined #ceph
[19:09] * BManojlovic (~steki@ has joined #ceph
[19:17] * LarsFronius_ (~LarsFroni@d219187.adsl.hansenet.de) has joined #ceph
[19:30] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[19:42] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) has joined #ceph
[19:42] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) Quit ()
[19:48] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) has joined #ceph
[19:48] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) Quit ()
[19:49] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[19:52] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:02] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[20:23] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Ping timeout: 480 seconds)
[20:32] * tjikkun (~tjikkun@82-169-255-84.ip.telfort.nl) has joined #ceph
[22:09] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[22:37] * Kathor (~Kathor@ Quit (Ping timeout: 480 seconds)
[23:40] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.