#ceph IRC Log

IRC Log for 2012-01-10

Timestamps are in GMT/BST.

[0:06] <dwm_> Okay, definitely sounds like it's relevant to that bug, will collect some log samples together and attach them to the bug report.
[0:07] <sjust> dwm_: thanks
[0:07] <dwm_> Not at all; given how damn useful I anticipate Ceph is going to be, it seems only fair to help. :)
[0:34] * failbaitr (~innerheig@85.17.0.131) Quit (Remote host closed the connection)
[0:44] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:54] * adjohn is now known as Guest23515
[0:54] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[0:54] * Guest23515 (~adjohn@208.90.214.43) Quit (Read error: Connection reset by peer)
[0:59] * adjohn (~adjohn@208.90.214.43) Quit (Remote host closed the connection)
[0:59] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[1:01] * spadaccio (~spadaccio@213-155-151-233.customer.teliacarrier.com) Quit (Quit: WeeChat 0.3.7-dev)
[1:02] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[1:11] * BManojlovic (~steki@212.200.243.100) Quit (Remote host closed the connection)
[1:17] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[2:11] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:13] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[2:29] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[3:08] * MarkDud (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[3:25] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[3:29] * linuxuser1357 (~linuxuser@50-82-41-66.client.mchsi.com) has joined #ceph
[3:30] * linuxuser1357 (~linuxuser@50-82-41-66.client.mchsi.com) Quit ()
[3:41] * adjohn is now known as Guest23528
[3:41] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[3:42] * Guest23528 (~adjohn@208.90.214.43) Quit (Read error: Operation timed out)
[3:46] * adjohn is now known as Guest23531
[3:46] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[3:47] * Guest23531 (~adjohn@208.90.214.43) Quit (Read error: Operation timed out)
[3:54] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[3:54] * adjohn (~adjohn@208.90.214.43) Quit (Ping timeout: 480 seconds)
[4:07] * elder (~elder@aon.hq.newdream.net) Quit (Quit: Leaving)
[4:13] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:35] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[4:54] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:18] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[6:19] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[7:55] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[8:03] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[8:04] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit ()
[8:05] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[8:33] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:51] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[9:59] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:06] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[10:17] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[10:21] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[10:37] * bugoff_ (bram@november.openminds.be) has joined #ceph
[10:37] * bugoff (bram@november.openminds.be) Quit (Read error: Connection reset by peer)
[11:07] * spadaccio (~spadaccio@213-155-151-233.customer.teliacarrier.com) has joined #ceph
[12:40] * dwm_ (~dwm@vm-shell4.doc.ic.ac.uk) Quit (Quit: leaving)
[12:42] * dwm_ (~dwm@kalimdor.tastycake.net) has joined #ceph
[12:42] * dwm_ (~dwm@kalimdor.tastycake.net) Quit ()
[12:42] * dwm_ (~dwm@2001:ba8:0:1c0:225:90ff:fe08:9150) has joined #ceph
[13:05] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[13:05] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[13:05] * fronlius_ is now known as fronlius
[13:27] <wonko_be> to do a bit of a real-life test with ceph, should I use btrfs as the osd storage, or ext4 with xattrs?
[14:13] <darkfaded> ext4 if you do want to be able to use fsck, btrfs if you wanna see next year's real life :)
[14:14] <darkfaded> i.e. you can't do much snapshotting without btrfs
[14:14] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:14] <darkfaded> but if it's "real" data then imo btrfs is out of question
[14:20] <wonko_be> it is real, but I don't mind losing it
[14:20] <wonko_be> (it is the timemachine backups of some of our spare macbooks we have here)
[14:26] * xarthisius (~xarth@hum.astri.uni.torun.pl) has left #ceph
[15:19] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:24] <spadaccio> hi, I create an image successfully with "rbd create", but then I can't use "rbd map". I get "add failed: error 2: No such file or directory". Any hints?
[15:48] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[15:48] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[15:58] <spadaccio> so I believe I found a bug
[15:58] <spadaccio> my image filename was something a.b.c
[15:58] <spadaccio> and when running rbd map
[15:58] <spadaccio> I saw in osd.0.log it was looking for a.b
[15:59] <spadaccio> thus the No such file or directory
[15:59] <spadaccio> changing the image name to "test" worked, it has been mapped correctly
[16:04] <spadaccio> hmm.. I can't file a bug even if I got an account on redmine
[16:15] <spadaccio> uhm..
[16:15] <spadaccio> no
[16:15] <spadaccio> the bug is different.. it seems to read only the first 40 characters of the name
[16:15] <spadaccio> that coincidentally, on my first attempt, was the length of the image name without the last .something portion
[16:20] * todin_ is now known as todin
[16:22] <todin> does anyone use the ceph barclamps for crowbar?
[16:23] <spadaccio> oh, it's a known issue: http://tracker.newdream.net/issues/1704
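
The symptom spadaccio describes above can be worked through on paper: if only the first 40 characters of an image name are read, a longer name silently turns into a different name that does not exist. A minimal Python sketch, assuming the 40-character figure reported in the chat; `MAX_NAME_LEN`, `truncated_name`, `survives_truncation`, and the sample names are hypothetical and not part of rbd:

```python
# Illustration only: the 40-character limit is the figure reported in the
# chat above, not a value taken from the rbd source.
MAX_NAME_LEN = 40


def truncated_name(image_name: str, limit: int = MAX_NAME_LEN) -> str:
    """Return the name as it would look if only `limit` characters were read."""
    return image_name[:limit]


def survives_truncation(image_name: str, limit: int = MAX_NAME_LEN) -> bool:
    """True if the name is short enough to be read back unchanged."""
    return truncated_name(image_name, limit) == image_name


if __name__ == "__main__":
    # Hypothetical names: a short one, and one that is 44 characters long.
    for name in ("test", "x" * 40 + ".img"):
        print(name, "->", truncated_name(name), survives_truncation(name))
```

A name that fails this check would be looked up under its truncated form, which matches the "No such file or directory" error seen with `rbd map`.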
[16:55] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[16:58] <dwm_> Have updated bug #1759.
[16:59] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:02] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:06] * fronlius_ (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[17:07] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[17:33] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[17:33] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:34] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[17:37] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:42] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:51] * fronlius (~fronlius@e176052115.adsl.alicedsl.de) has joined #ceph
[17:58] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:58] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[18:01] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:02] * lx0 is now known as lxo
[18:09] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:52] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:03] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:06] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[19:16] * jojy (~jvarghese@108.60.121.114) has joined #ceph
[19:27] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[19:38] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:50] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[19:53] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:08] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Remote host closed the connection)
[20:11] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[20:18] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[20:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:54] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[21:02] * adjohn (~adjohn@208.90.214.43) Quit (Quit: adjohn)
[21:10] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:14] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[21:16] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:36] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[21:36] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[21:36] <jmlowe> So I've managed to deadlock some osd's with a snapshot operation again and this time I'm sure the btrfs underneath was healthy
[21:37] <jmlowe> What information would be helpful?
[21:42] * _Tassadar (~tassadar@tassadar.xs4all.nl) Quit (Remote host closed the connection)
[21:42] * _Tassadar (~tassadar@tassadar.xs4all.nl) has joined #ceph
[21:43] <joshd> jmlowe: an osd log of this happening with debug_filestore=20 debug_osd=25 debug_ms=1 should tell us where the problem is
[21:43] <joshd> jmlowe: but if the osd is uninterruptible, it is probably a problem with btrfs
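
The debug levels joshd lists could be kept in the configuration file so they are active the next time the problem occurs. A minimal Python sketch that renders them as an `[osd]` section; placing them in `[osd]` and writing the option names with spaces are assumptions, and `render_osd_debug_section` is a hypothetical helper, not a Ceph tool:

```python
import configparser
import io

# Debug levels exactly as suggested by joshd above.
DEBUG_LEVELS = {
    "debug filestore": "20",
    "debug osd": "25",
    "debug ms": "1",
}


def render_osd_debug_section(levels: dict) -> str:
    """Render an [osd] section holding the given debug settings."""
    parser = configparser.ConfigParser()
    parser["osd"] = levels
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before being merged by hand.
    print(render_osd_debug_section(DEBUG_LEVELS))
```

The output is only a fragment to review and merge by hand; logging at these levels is verbose, so it is probably worth reverting once the log has been captured.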
[21:44] <jmlowe> root@gwboss2:~# ps -Alf |grep osd
[21:44] <jmlowe> 0 S root 20467 20269 0 80 0 - 2311 pipe_w 15:37 pts/1 00:00:00 grep --color=auto osd
[21:44] <jmlowe> 5 D root 25564 1 3 80 0 - 119268 exit_m Jan05 ? 04:31:46 /usr/bin/ceph-osd -i 10 -c /tmp/ceph.conf.23393
[21:44] <jmlowe> 5 S root 26383 1 5 80 0 - 140232 futex_ Jan05 ? 06:05:38 /usr/bin/ceph-osd -i 11 -c /tmp/ceph.conf.23393
[21:44] <jmlowe> 5 D root 27533 1 4 80 0 - 124637 exit_m Jan05 ? 05:50:52 /usr/bin/ceph-osd -i 6 -c /tmp/ceph.conf.23393
[21:44] <jmlowe> 5 S root 28409 1 2 80 0 - 153765 futex_ Jan05 ? 02:59:16 /usr/bin/ceph-osd -i 7 -c /tmp/ceph.conf.23393
[21:44] <jmlowe> 5 S root 29303 1 4 80 0 - 146309 futex_ Jan05 ? 05:32:45 /usr/bin/ceph-osd -i 8 -c /tmp/ceph.conf.23393
[21:44] <jmlowe> 5 S root 30213 1 3 80 0 - 140267 futex_ Jan05 ? 04:10:13 /usr/bin/ceph-osd -i 9 -c /tmp/ceph.conf.23393
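
In the `ps -Alf` listing above, the second column is the process state, and two of the ceph-osd processes show `D` (uninterruptible sleep). A minimal Python sketch that picks those out of pasted `ps -Alf` output; `blocked_osds` is a hypothetical helper, and it assumes the 15-column layout seen in this paste with the OSD id passed as `-i N`:

```python
# Sample input copied from the listing above (chat prefixes removed).
SAMPLE = """\
0 S root 20467 20269 0 80 0 - 2311 pipe_w 15:37 pts/1 00:00:00 grep --color=auto osd
5 D root 25564 1 3 80 0 - 119268 exit_m Jan05 ? 04:31:46 /usr/bin/ceph-osd -i 10 -c /tmp/ceph.conf.23393
5 S root 26383 1 5 80 0 - 140232 futex_ Jan05 ? 06:05:38 /usr/bin/ceph-osd -i 11 -c /tmp/ceph.conf.23393
5 D root 27533 1 4 80 0 - 124637 exit_m Jan05 ? 05:50:52 /usr/bin/ceph-osd -i 6 -c /tmp/ceph.conf.23393
5 S root 28409 1 2 80 0 - 153765 futex_ Jan05 ? 02:59:16 /usr/bin/ceph-osd -i 7 -c /tmp/ceph.conf.23393
"""


def blocked_osds(ps_output: str) -> list:
    """Return (pid, osd_id) pairs for ceph-osd processes in state D."""
    blocked = []
    for line in ps_output.splitlines():
        # Columns: F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
        fields = line.split(None, 14)
        if len(fields) < 15:
            continue  # not a full ps -Alf row
        state, pid, command = fields[1], fields[3], fields[14]
        args = command.split()
        if not args or not args[0].endswith("ceph-osd"):
            continue  # skips the grep itself and any header row
        if state.startswith("D") and "-i" in args:
            position = args.index("-i")
            if position + 1 < len(args):
                blocked.append((int(pid), args[position + 1]))
    return blocked


if __name__ == "__main__":
    print(blocked_osds(SAMPLE))  # [(25564, '10'), (27533, '6')]
```

For this paste the result points at osd 10 and osd 6, the two processes shown waiting in state `D`.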
[21:48] <jmlowe> Joshd: are you sure, I just created a fresh btrfs filesystem 5 days ago and it hasn't been down since
[21:48] <NaioN> jmlowe: which kernel version do you use?
[21:48] <jmlowe> Linux version 3.0.0-14-server (buildd@allspice) (gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) ) #23-Ubuntu SMP Mon Nov 21 20:49:05 UTC 2011
[21:49] <NaioN> oh
[21:49] <NaioN> I have a lot of trouble with btrfs
[21:49] <jmlowe> let me guess, btrfs bug in that version?
[21:49] <NaioN> well I tried 3.1rcX and still having trouble
[21:50] <NaioN> I'm going to try 3.2
[21:50] <NaioN> jmlowe: do you see any errors/warnings in dmesg?
[21:50] <NaioN> that's what I saw
[21:51] <NaioN> And the OSDs stall in IO... (status D in ps)
[21:53] <jmlowe> how about this one: Jan 10 10:30:02 gwboss2 kernel: [504007.999398] kernel BUG at /build/buildd/linux-3.0.0/fs/btrfs/delayed-inode.c:1693!
[21:53] <NaioN> I got others
[21:54] <NaioN> but well that's definitely a btrfs bug :)
[21:55] <jmlowe> so my best chance for recovery is to umount everything that is still up and create a new btrfs for the affected osd's after a powercycle?
[21:56] <NaioN> yeps
[21:56] <jmlowe> and then wait for a kernel rev or three before I do a snapshot again
[21:56] <NaioN> well the 3.2 is released and I saw a lot of BTRFS fixes
[21:56] <NaioN> also in 3.1
[21:56] <NaioN> so it's worth trying
[21:57] <jmlowe> wonder if ubuntu devs will be kind enough to backport them
[21:57] <NaioN> search for kernel ppa
[21:58] <jmlowe> I loathe the idea of maintaining my own, it's very tedious
[21:59] <NaioN> http://linux-software-news-tutorials.blogspot.com/2012/01/linux-kernel-32-is-with-us-news-and-how.html
[22:00] <NaioN> don't find a ppa for it
[22:07] <wido> hi
[22:07] <wido> I'm still seeing my monitors OOM, even with 4GB of memory
[22:07] <jmlowe> well that is considerably easier, thanks NaioN
[22:07] <wido> I've seen this before, but then the profiling didn't work, has there been any change in the profiling for memory leaks?
[22:07] <wido> Or is the 'old' wiki still the way to go?
[22:08] <wido> with that I mean: http://ceph.newdream.net/wiki/Memory_Profiling
[22:08] <wido> ceph mon tell 0 start_profiling
[22:08] <wido> start_profiler*
[22:23] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has left #ceph
[22:50] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[22:55] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[22:56] <sjust> wido: as far as I know, that works
[22:56] <sjust> what problem were you running into?
[23:08] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[23:08] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:15] * fronlius (~fronlius@e176052115.adsl.alicedsl.de) Quit (Quit: fronlius)
[23:52] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.