#ceph IRC Log


IRC Log for 2011-06-23

Timestamps are in GMT/BST.

[0:00] <Tv> alright this might be the fix, waiting for gitbuilder
[0:06] <stingray> this message represents the official view of the voices in my head
[0:06] <stingray> sorry, wrong window again
[0:09] <Tv> stingray: it's almost as if your hands were controlled by some external entity
[0:10] <Tv> sepia is still broken for me :(
[0:12] <sagewk> tv: which nodes? the ones in that bug?
[0:13] <Tv> sagewk: #10301
[0:14] <stingray> Tv: bash.org classic, the keys are next to each other
[0:16] <Tv> mds/AnchorServer.cc: In member function ‘bool AnchorServer::add(inodeno_t, inodeno_t, __u32, bool)’:
[0:16] <Tv> warning: mds/AnchorServer.cc:73: control reaches end of non-void function
[0:17] <sagewk> yeah that's greg, will bug him when he comes back in
[0:17] <sagewk> tv: 72 is down but others around it are ok.. so not a switch issue at least.
[0:18] <Tv> sagewk: oh huh
[0:18] <Tv> sagewk: thanks!
[0:18] <sagewk> tv: 66 is free and up
[0:24] <Tv> alright 02a2518efa224266db68a6bc5b5eaee1127c6ef9 should fix the gitbuilder issues, waiting..
[0:30] <Tv> yup, that worked
[0:30] <Tv> just the unrelated warning i pasted above left
[0:35] <gregaf1> Tv: huh, I must've missed the warnings, lemme look at it
[0:37] <Tv> sagewk: fyi the sepia outage still continues as far as serial console and powerc are concerned.. ticket updated.. i'll take sepia66 thankyouverymuch
[0:37] <Tv> umm sepia66 has a peon2963 hostname, i fear it's not properly installed
[0:38] <gregaf1> heh, nobody even looks at that return value
[0:39] <Tv> yup, sepia66 is not like good sepia boxes should be
[0:40] * `gregorg` (~Greg@ has joined #ceph
[0:40] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[0:44] <Tv> alright sepia13 looks good, is unlocked in autotest, and nobody has logged in in 5 days -- taking it
[0:46] * sugoruyo (~george@athedsl-408632.home.otenet.gr) Quit (Quit: sugoruyo)
[0:52] <Tv> joshd: hey is the device path really meant to be /dev/rbd/rbd/testimage.client.0
[0:52] <Tv> seems a bit repetitive
[0:52] <joshd> Tv: yeah, the second rbd is the pool
[0:52] <Tv> joshd: also what's the difference between .0 and .0:0 ?
[0:54] <joshd> Tv: .0 comes from the image name, :0 comes from the kernel and varies based on which images are mapped by the kernel
[0:54] <Tv> joshd: so :0 is just to make it unique, or something?
[0:54] <Tv> sorry for being confused
[0:55] <joshd> Tv: that's right
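The naming convention joshd describes (/dev/rbd/<pool>/<image>, plus a kernel-assigned :<n> suffix to keep mapped images unique) can be sketched as follows; the helper name and structure are illustrative only, not a real Ceph API:

```python
def rbd_dev_path(pool, image, kernel_index=None):
    # udev-style path: /dev/rbd/<pool>/<image>; the kernel appends :<n>
    # per mapped image (this helper is a sketch, not Ceph code)
    path = "/dev/rbd/%s/%s" % (pool, image)
    if kernel_index is not None:
        path += ":%d" % kernel_index
    return path

print(rbd_dev_path("rbd", "testimage.client.0"))     # /dev/rbd/rbd/testimage.client.0
print(rbd_dev_path("rbd", "testimage.client.0", 0))  # /dev/rbd/rbd/testimage.client.0:0
```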
[0:55] <Tv> anyway, the code looks good enough, i rebased it and will push teuthology master soon
[0:55] <joshd> Tv: cool, thanks
[0:55] <Tv> joshd: there's some stuff where error handling could be better etc, but i'll just slap those on to the todo list, they don't seem too bad
[0:56] <Tv> oh except now i made it fail ;)
[0:56] <Tv> NameError: free variable 'role_images' referenced before assignment in enclosing scope
[0:56] <Tv> in def task
[0:56] <Tv> oh that looks halfbaked
[0:56] <Tv> fs_types is abandoned, etc
[0:58] <joshd> Tv: I don't see how the NameError is possible... maybe I forgot to push the most recent version
[0:58] <Tv> 79c77cc19a41be8b4deb8fcb76556640eead4c6b on the server
[0:59] <joshd> yeah, I thought I'd pushed this already, sorry
[1:01] <Tv> yeah that looks better
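The NameError Tv hit is what Python raises when a nested function closes over a variable before the enclosing function has assigned it; a minimal reproduction, with names borrowed from the traceback (the exact wording of the message varies by Python version):

```python
def task():
    def use_images():
        return role_images  # free variable, bound in task()'s scope

    use_images()        # called before role_images is assigned: NameError
    role_images = {}    # this assignment makes role_images a local of task()

try:
    task()
except NameError as e:
    print("NameError:", e)
```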
[1:08] <Tv> INFO:orchestra.run.err:Could not stat /dev/rbd/rbd/testimage.client.0 --- No such file or directory
[1:08] <Tv> INFO:orchestra.run.err:
[1:08] <Tv> INFO:orchestra.run.err:The device apparently does not exist; did you specify it correctly?
[1:09] <Tv> sadface
[1:09] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[1:11] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[1:37] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[1:37] <djlee> [42865.406937] Call Trace:
[1:37] <djlee> [42865.406952] [<ffffffffa01f9be2>] __mark_caps_flushing+0xa2/0x310 [ceph]
[1:37] <djlee> [42865.406981] [<ffffffffa01fd39e>] ceph_check_caps+0x7ee/0xdb0 [ceph]
[1:37] <djlee> [42865.407012] [<ffffffffa01ffac3>] ceph_check_delayed_caps+0x93/0x150 [ceph]
[1:37] <djlee> [42865.407044] [<ffffffffa0207ce5>] delayed_work+0x35/0x280 [ceph]
[1:37] <djlee> [42865.407077] [<ffffffff8107237a>] process_one_work+0x10a/0x420
[1:37] <djlee> [42865.407083] [<ffffffff81072e95>] worker_thread+0x165/0x340
[1:37] <djlee> [42865.407089] [<ffffffff810774a6>] kthread+0x96/0xa0
[1:37] <djlee> [42865.407094] [<ffffffff8151f484>] kernel_thread_helper+0x4/0x10
[1:37] <djlee> [42865.407098] Code: 74 07 f3 90 0f b7 13 eb f5 5b c3 66 66 2e 0f 1f 84 00 00 00 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 <0f> b7 17 eb f5 c3 0f 1f 44 00 00 f0 81 07 00 00 00 01 48 89 f7
[1:38] <djlee> anyone know what the above problem is ?..
[1:38] <sagewk> djlee: what is the error above the stack trace?
[1:39] <djlee> just above the call trace: it is..
[1:39] <djlee> [42949.176591] Pid: 27374, comm: ffsb Not tainted #1 Supermicro X8DTN+-F/X8DTN+-F
[1:39] <djlee> [42949.176596] RIP: 0010:[<ffffffff81516695>] [<ffffffff81516695>] _raw_spin_lock+0x15/0x20
[1:39] <djlee> [42949.176604] RSP: 0018:ffff88038992de80 EFLAGS: 00000297
[1:39] <djlee> [42949.176607] RAX: 000000000000003a RBX: ffffffff81513cdf RCX: 00000000c0000100
[1:39] <djlee> [42949.176609] RDX: 0000000000000039 RSI: 0000000000000000 RDI: ffff8803d2cc3418
[1:39] <djlee> [42949.176612] RBP: ffff8803d2cc3760 R08: 0000000000000001 R09: 00000000007eb404
[1:39] <djlee> [42949.176615] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8151ed2e
[1:39] <djlee> [42949.176617] R13: 7fffffffffffffff R14: ffff88038992de88 R15: ffff88038992dec8
[1:39] <djlee> [42949.176620] FS: 00007f74752f5700(0000) GS:ffff8804bf220000(0000) knlGS:0000000000000000
[1:39] <djlee> [42949.176624] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1:39] <djlee> [42949.176626] CR2: 00007f7474a24390 CR3: 00000002983c1000 CR4: 00000000000006e0
[1:39] <djlee> [42949.176629] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1:39] <djlee> [42949.176632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1:39] <djlee> [42949.176635] Process ffsb (pid: 27374, threadinfo ffff88038992c000, task ffff8804a11c26c0)
[1:39] <djlee> [42949.176637] Stack:
[1:39] <djlee> [42949.176639] ffffffff8116625d 0000000000000000 ffffffffa01ffc98 0000000000009919
[1:40] <djlee> [42949.176645] ffff8802cad87000 ffff8803d2cc30f8 ffff88049f88cc00 ffff88038992deb0
[1:40] <djlee> [42949.176649] ffff8804a087f000 0000000000000000 ffff8802cad87068 ffff88038992df64
[1:40] <djlee> v0.29.1 with
[1:41] <djlee> im getting lots of it, every several secs
[1:41] <djlee> the mesg looks the same
[1:41] <sagewk> i think there should be one more line above that with the actual error?
[1:42] <djlee> oh, wait
[1:42] <djlee> sorry for flooding the screen in advance,
[1:42] <djlee> [42949.176786] Call Trace:
[1:42] <djlee> [42949.176791] [<ffffffff8116625d>] igrab+0xd/0x40
[1:42] <djlee> [42949.176803] [<ffffffffa01ffc98>] ceph_flush_dirty_caps+0x118/0x270 [ceph]
[1:42] <djlee> [42949.176830] [<ffffffffa01e6a82>] ceph_sync_fs+0x22/0x160 [ceph]
[1:42] <djlee> [42949.176835] [<ffffffff81177f43>] __sync_filesystem+0x53/0x90
[1:42] <djlee> [42949.176840] [<ffffffff81151762>] iterate_supers+0x62/0xc0
[1:42] <djlee> [42949.176846] [<ffffffff81177eb9>] sync_filesystems+0x19/0x20
[1:42] <djlee> [42949.176850] [<ffffffff81178002>] sys_sync+0x12/0x40
[1:42] <djlee> [42949.176856] [<ffffffff8151e352>] system_call_fastpath+0x16/0x1b
[1:42] <djlee> [42949.176861] [<00007f74749e4137>] 0x7f74749e4136
[1:42] <djlee> [42949.360444] BUG: soft lockup - CPU#3 stuck for 67s! [kworker/3:0:25103]
[1:42] <djlee> [42949.360446] Modules linked in: ceph fuse edd bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf loop dm_mod ses enclosure ioatdma igb ftdi_sio i7core_edac iTCO_wdt usbserial usbhid edac_core dca sg i2c_i801 iTCO_vendor_support pcspkr button uhci_hcd ehci_hcd usbcore megaraid_sas fan thermal processor thermal_sys
[1:43] <djlee> [42949.360470] CPU 3
[1:43] <djlee> [42949.360472] Modules linked in: ceph fuse edd bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf loop dm_mod ses enclosure ioatdma igb ftdi_sio i7core_edac iTCO_wdt usbserial usbhid edac_core dca sg i2c_i801 iTCO_vendor_support pcspkr button uhci_hcd ehci_hcd usbcore megaraid_sas fan thermal processor thermal_sys
[1:43] <djlee> [42949.360493]
[1:43] <djlee> [42949.360496] Pid: 25103, comm: kworker/3:0 Not tainted #1 Supermicro X8DTN+-F/X8DTN+-F
[1:43] <djlee> [42949.360501] RIP: 0010:[<ffffffff81516695>] [<ffffffff81516695>] _raw_spin_lock+0x15/0x20
[1:43] <djlee> [42949.360507] RSP: 0018:ffff88028e539b98 EFLAGS: 00000287
[1:43] <djlee> [42949.360510] RAX: 0000000000009a20 RBX: ffff880200000000 RCX: 0000000000003fdd
[1:43] <djlee> [42949.360512] RDX: 0000000000009a1f RSI: ffff8804a0aa9800 RDI: ffff88049f88cd9c
[1:43] <djlee> [42949.360515] RBP: ffff88049f88cc00 R08: 0000000000000000 R09: 0000000000000000
[1:43] <djlee> [42949.360518] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8151ed2e
[1:43] <djlee> [42949.360520] R13: ffff88028e539bd0 R14: 0000000000001000 R15: 0000000000004000
[1:43] <djlee> [42949.360523] FS: 0000000000000000(0000) GS:ffff8804bf260000(0000) knlGS:0000000000000000
[1:43] <djlee> [42949.360526] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1:43] <djlee> [42949.360529] CR2: 00007f9fca515000 CR3: 0000000001a03000 CR4: 00000000000006e0
[1:43] <djlee> [42949.360532] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1:43] <djlee> [42949.360535] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1:43] <djlee> [42949.360538] Process kworker/3:0 (pid: 25103, threadinfo ffff88028e538000, task ffff8802cac62080)
[1:43] <djlee> [42949.360540] Stack:
[1:43] <djlee> [42949.360542] ffffffffa01f9be2 ffff88028e539d4c 0000000000000000 ffff8803d2cc33f8
[1:43] <djlee> [42949.360547] ffffffffa01fa691 ffff8803d2cc4690 ffffffffa01f8294 000000004e026be0
[1:43] <djlee> [42949.360551] 000000001f3be2e7 0000000000000001 0000000000000000 ffff880144c501a8
[1:43] <djlee> [42949.360556] Call Trace:
[1:43] <djlee> [42949.360571] [<ffffffffa01f9be2>] __mark_caps_flushing+0xa2/0x310 [ceph]
[1:43] <djlee> [42949.360601] [<ffffffffa01fd39e>] ceph_check_caps+0x7ee/0xdb0 [ceph]
[1:43] <djlee> [42949.360632] [<ffffffffa01ffac3>] ceph_check_delayed_caps+0x93/0x150 [ceph]
[1:43] <djlee> [42949.360663] [<ffffffffa0207ce5>] delayed_work+0x35/0x280 [ceph]
[1:43] <djlee> [42949.360696] [<ffffffff8107237a>] process_one_work+0x10a/0x420
[1:44] <djlee> [42949.360702] [<ffffffff81072e95>] worker_thread+0x165/0x340
[1:44] <djlee> [42949.360708] [<ffffffff810774a6>] kthread+0x96/0xa0
[1:44] <djlee> [42949.360713] [<ffffffff8151f484>] kernel_thread_helper+0x4/0x10
[1:44] <djlee> [42949.360716] Code: 74 07 f3 90 0f b7 13 eb f5 5b c3 66 66 2e 0f 1f 84 00 00 00 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 <0f> b7 17 eb f5 c3 0f 1f 44 00 00 f0 81 07 00 00 00 01 48 89 f7
[1:44] <djlee> [42949.360747] Call Trace:
[1:44] <djlee> [42949.360759] [<ffffffffa01f9be2>] __mark_caps_flushing+0xa2/0x310 [ceph]
[1:44] <djlee> [42949.360785] [<ffffffffa01fd39e>] ceph_check_caps+0x7ee/0xdb0 [ceph]
[1:44] <djlee> [42949.360815] [<ffffffffa01ffac3>] ceph_check_delayed_caps+0x93/0x150 [ceph]
[1:44] <djlee> [42949.360846] [<ffffffffa0207ce5>] delayed_work+0x35/0x280 [ceph]
[1:44] <djlee> [42949.360877] [<ffffffff8107237a>] process_one_work+0x10a/0x420
[1:44] <djlee> [42949.360882] [<ffffffff81072e95>] worker_thread+0x165/0x340
[1:44] <djlee> [42949.360887] [<ffffffff810774a6>] kthread+0x96/0xa0
[1:44] <djlee> [42949.360892] [<ffffffff8151f484>] kernel_thread_helper+0x4/0x10
[1:44] <sagewk> djlee: i suspect 70b666c3b4cb2b96098d80e6f515e4bc6d37db5a in ceph-client.git will fix this.. didn't make it into .39 unfortunately.
[1:45] <sagewk> can you give that a go and let us know if you still get a lockup?
[1:46] <djlee> how do i git to that 70b666 ?
[1:48] <djlee> was this problem specific to having more nodes..? I didn't see this before in 0.27 and/or with 4 nodes,
[1:48] <djlee> i was trying on 6 nodes
[1:48] <djlee> and that happened
[2:03] <bchrisman> sagewk: so that file creation latency isn't an NFS issue, but fuse client gives < 0.01s creates while kernel client ~0.3s file creates
[2:05] <gregaf1> bchrisman: I was going to say, we had half-second creates for a while but after a bit of work that shouldn't be happening anymore
[2:09] <gregaf1> bchrisman: also, added you as a developer on the linux kernel project; you've got perms now :)
[2:10] <bchrisman> gregaf1: those were from kernel client only as well?
[2:10] <gregaf1> I haven't tested it lately, booting up a UML instance now
[2:10] <gregaf1> but i'm pretty sure it worked properly at one point
[2:11] <bchrisman> gregaf1: that kernel client I'm using is compiled from just a couple days ago ceph-client, master branch
[2:11] <bchrisman> well.. might be just over a week old...
[2:11] <gregaf1> hmm, on current master and an all-local setup I get:
[2:11] <gregaf1> uml:~# time touch mnt/file1
[2:11] <gregaf1> real 0m0.056s
[2:11] <gregaf1> user 0m0.000s
[2:11] <gregaf1> sys 0m0.000s
[2:11] <gregaf1> uml:~# time touch mnt/file2
[2:11] <gregaf1> real 0m0.016s
[2:11] <gregaf1> user 0m0.000s
[2:11] <gregaf1> sys 0m0.000s
[2:11] <gregaf1> uml:~# time touch mnt/file3
[2:11] <gregaf1> real 0m0.013s
[2:11] <gregaf1> user 0m0.000s
[2:11] <gregaf1> sys 0m0.000s
[2:11] <gregaf1> uml:~# time touch mnt/file4
[2:11] <gregaf1> real 0m0.012s
[2:11] <gregaf1> user 0m0.000s
[2:11] <gregaf1> sys 0m0.000s
[2:11] <gregaf1> uml:~# time touch mnt/file5
[2:11] <bchrisman> that's on kernel client too?
[2:12] <gregaf1> real 0m0.013s
[2:12] <gregaf1> user 0m0.000s
[2:12] <gregaf1> sys 0m0.000s
[2:12] <gregaf1> uml:~# time touch mnt/file6
[2:12] <gregaf1> real 0m0.014s
[2:12] <gregaf1> user 0m0.000s
[2:12] <gregaf1> sys 0m0.000s
[2:12] <gregaf1> uml, yeah
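gregaf1's per-file `time touch` runs can be scripted; this sketch times a batch of creates, using a temp directory as a stand-in for a mounted Ceph client (point it at the real mount to reproduce the measurements above):

```python
import os
import tempfile
import time

# Time file creates, like the `time touch mnt/fileN` runs above.
# tempfile stands in for a mounted Ceph client directory.
mnt = tempfile.mkdtemp()
for i in range(1, 6):
    start = time.time()
    open(os.path.join(mnt, "file%d" % i), "w").close()
    elapsed = time.time() - start
    print("file%d: %.3fs" % (i, elapsed))
```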
[2:13] <gregaf1> I think it does still require going to the MDS, how's your latency there?
[2:13] <bchrisman> works with cfuse with < 0.01s file create time.. so I don't think it's an issue outside the code.
[2:14] <gregaf1> hmm
[2:14] <bchrisman> http://pastebin.com/LFqAyZN2
[2:15] <bchrisman> when you were seeing slow creates, was it in the kernel client only, or both cfuse and kernel client?
[2:15] <bchrisman> (have to say, that's a nice thing about having two separate code bases for clients...)
[2:15] <gregaf1> it used to be they were really slow because the MDS had to go to disk for each create
[2:16] <bchrisman> any cephfs mount options that I need to specify?
[2:16] <gregaf1> sage finally got rid of that by doing ino preallocation, and in principle we can extend it so that file creates are an entirely local affair if the client has enough dir caps
[2:16] <gregaf1> no, not any mount options
[2:16] <bchrisman> trying to figure out why the kernel's behaving differently from cfuse
[2:17] <gregaf1> I'm surprised that you're seeing the issue and I'm not
[2:17] <gregaf1> I'm not real good with the kernel though, and I doubt it's a protocol bug or it would show up for me too
[2:18] <gregaf1> you want to create a bug?
[2:18] <bchrisman> will do..
[2:19] <bchrisman> hmm… is there a commit id or something in the kernel module that I can drag out of somewhere to verify the kernel module source? I know what I put on there, but it'd be nice to validate that on-box.
[2:19] <gregaf1> probably? that's outside my expertise, sorry
[2:19] <gregaf1> if you want to at least start checking it
[2:19] <gregaf1> you can dump out the ceph message traffic debug logs
[2:20] <gregaf1> and see whether the time lag is between when the client sends out the mknod and the MDS responds
[2:20] <bchrisman> yeah.. I suppose I could do that for one fuse request and one kernel request and make some comparison.
[2:20] <gregaf1> or if it's not
[2:20] <bchrisman> yeah
[2:20] <gregaf1> my initial thought was that the uclient and the kclient are dropping their caps differently so the MDS has to go a few rounds with the kclient
[2:20] <gregaf1> but I should be seeing that too, in that case
[2:21] <bchrisman> that would be in the messenger?
[2:21] <gregaf1> although if you have other active clients that might do something too
[2:21] <bchrisman> or show up in the mds logs too?
[2:21] <bchrisman> only one client at a time right now
[2:21] <gregaf1> yeah, either one, you just want to look at the message passing traffic
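The message-traffic logging gregaf1 suggests is controlled by the debug options, the same switches stingray passes on the command line later in this log; as a ceph.conf fragment it would look roughly like this (levels chosen as an illustration):

```ini
[global]
        ; messenger-level logging: every message sent/received, with timestamps
        debug ms = 20
[mds]
        debug mds = 20
```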
[2:22] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:42] * gregaf1 (~Adium@ip-66-33-206-8.dreamhost.com) has left #ceph
[2:42] * gregaf1 (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[2:48] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Quit: http://daniel.friesen.name or ELSE!)
[2:48] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:50] * yoshi (~yoshi@p24092-ipngn1301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:51] * Tv (~Tv@cpe-76-168-227-45.socal.res.rr.com) has joined #ceph
[2:52] * Tv (~Tv@cpe-76-168-227-45.socal.res.rr.com) has left #ceph
[2:58] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[5:36] * nolan (~nolan@phong.sigbus.net) Quit (Ping timeout: 480 seconds)
[5:45] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[6:01] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:53] * cmccabe1 (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:11] * `gregorg` (~Greg@ Quit (Quit: Quitte)
[8:11] * gregorg (~Greg@ has joined #ceph
[8:35] * johnl (~johnl@johnl.ipq.co) Quit (Remote host closed the connection)
[8:35] * johnl (~johnl@johnl.ipq.co) has joined #ceph
[8:43] * johnl_ (~johnl@johnl.ipq.co) has joined #ceph
[8:43] * johnl (~johnl@johnl.ipq.co) Quit (Remote host closed the connection)
[10:26] * sugoruyo (~george@athedsl-408632.home.otenet.gr) has joined #ceph
[11:12] * allsystemsarego (~allsystem@ has joined #ceph
[11:26] * yoshi (~yoshi@p24092-ipngn1301marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:35] * stefanha (~stefanha@yuzuki.vmsplice.net) Quit (Quit: leaving)
[12:01] <sugoruyo> hey folks, out of curiosity can someone tell me - other than the regular stuff - what does Ceph store in an inode?
[12:58] <sugoruyo> also can someone tell me how Ceph goes from filename + inode number to oid and pgid so CRUSH can determine the OSDs to ask i/o from?
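The question goes unanswered in the log; in rough terms, Ceph derives an object id from the inode number plus stripe index, hashes the object id into a placement group, and CRUSH then maps the pgid to a list of OSDs. A toy sketch of the first two steps (the name format is an approximation and the hash is a stand-in, not Ceph's actual hashing):

```python
import zlib

def object_id(ino, stripe_no):
    # Ceph-style object names look like "<hex ino>.<hex stripe index>";
    # the exact formatting here is an approximation
    return "%x.%08x" % (ino, stripe_no)

def pg_for(oid, pg_num):
    # stand-in hash; real Ceph uses its own hash functions, not crc32
    return zlib.crc32(oid.encode("ascii")) % pg_num

oid = object_id(0x10000000123, 2)
pgid = pg_for(oid, 128)  # CRUSH would then map pgid to an ordered list of OSDs
```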
[13:19] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[13:19] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[13:20] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit ()
[13:20] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[14:18] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[14:30] * hijacker (~hijacker@ has joined #ceph
[14:32] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[14:53] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[14:54] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[16:06] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[16:37] * lxo (~aoliva@19NAABZZM.tor-irc.dnsbl.oftc.net) Quit (Read error: Connection reset by peer)
[16:42] * lxo (~aoliva@83TAAB0X0.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:02] <stingray> yehudasa: you fixed it!
[17:37] <stingray> now my mds journal is broken and reset-journal doesn't work
[17:38] <stingray> 0xffffu...
[17:41] <stingray> 2011-06-23 19:41:03.986181 7fa4dc666700 cephx server mds.vuvuzela: unexpected key: req.key=8473da971ab65827 expected_key=e493dfc641be2bec
[17:52] <stingray> assert(p.end());
[17:53] <stingray> funny thing, journal check works
[17:53] <stingray> but replay fails on this assert in LogEvent.cc
[17:55] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:55] * greglap (~Adium@ has joined #ceph
[17:59] <greglap> stingray: if you're running master, upgrade to the latest version, that'll probably clear up the assert
[18:01] <stingray> greglap: and if I'm on stable?
[18:02] <greglap> hmm, not sure… somebody else saw that assert and it went away by upgrading
[18:02] <stingray> shall I just drop this idea of running stable and switch to master?
[18:02] <greglap> no, stable should be working :x
[18:03] <stingray> I think that Dumper etc. auth code is broken
[18:03] <greglap> let me look at what's been changing lately
[18:03] <greglap> authentication was temporarily broken on master, I imagine it's related to that… but it shouldn't have gotten into stable!
[18:04] <stingray> thanks :)
[18:05] <stingray> I ws just about to reset the journal, but it doesn't work because of auth
[18:05] <stingray> and journal replay authenticates but fails on this assert :)
[18:06] <greglap> hmm, do you have any patches on top of stable?
[18:09] <greglap> I think i know what's breaking the journaling but have no idea why auth wouldn't be working on stable
[18:11] <greglap> sagewk: you didn't version the fullbit encode/decode :(
[18:11] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:15] * Yulya_ (~Yu1ya_@ip-95-220-242-20.bb.netbynet.ru) has joined #ceph
[18:17] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[18:36] <greglap> actually, that encoding should be good
[18:36] <greglap> but there's literally nothing else that's changed on the mds, blech
[18:39] <greglap> stingray: can you post the last few lines of the mds journal somewhere?
[18:39] <greglap> and are you sure that all your daemons are running the same version of the code?
[18:40] * aliguori (~anthony@ has joined #ceph
[18:45] <stingray> greglap: sorry, I'm in an unrelated meeting. Will reappear with everything you requested in an hour or so
[18:45] <greglap> np
[18:49] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[18:51] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[18:52] <stingray> gregaf1: my diffs from stable are only in librbd (stuff that yehudasa fixed)
[18:52] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:53] <stingray> gregaf1: I am running the same version on all hosts (all 3 of them)
[18:54] <stingray> gregaf1: http://pastebin.com/afsCKiSy
[19:05] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:08] <stingray> greglap:
[19:08] <stingray> is it paste useful to you?
[19:09] <gregaf1> a little
[19:09] <gregaf1> are you using cephx?
[19:09] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:13] <stingray> yep
[19:14] <gregaf1> ah, did you make sure that your journal reset was using it?
[19:14] <gregaf1> that might be why the auth errors
[19:14] <gregaf1> still not sure why the replay bug, though
[19:14] <stingray> it gets the same config file
[19:14] <stingray> which is -c /etc/ceph/ceph.conf, always
[19:14] <gregaf1> how recent a stable are you on? (what commit id?)
[19:15] <stingray> b34e195a46e8fc6eba0099b517685a205ce86061
[19:16] <gregaf1> what command are you using for journal reset?
[19:16] <gregaf1> bbiab, team meeting
[19:17] <stingray> cmds -c /etc/ceph/ceph.conf -i vuvuzela --reset-journal 0 --debug_ms 20 --debug_mds 20 -d --debug_objecter 20 --debug_journal 20 --debug_auth 20 --debug_osd 20
[19:28] <Tv> i fully expect mds.vuvuzela to spam the whole cluster with lots of boring messages
[19:28] <gregaf1> bah, that should be working
[19:28] <cmccabe> tv: just because it's named vuvuzela?
[19:28] <Tv> yes
[19:28] <stingray> so far, it fails authx when you do dump or reset
[19:28] <cmccabe> haha
[19:28] <stingray> and fails assert if you try to replay journal
[19:29] <gregaf1> yeah
[19:29] <gregaf1> let me see if I can reproduce the auth problem here, at least
[19:34] <stingray> I just reproduced it on my second cluster.
[19:38] <gregaf1> yep, I get it too
[19:44] <stingray> gregaf1: any suggestions on how to fix it?
[19:44] <gregaf1> I've got sage looking at it :p (I don't know the auth subsystem too well)
[19:45] <gregaf1> I'll see if I can get the journal problem too, were you running on 0.29.1 before?
[19:48] <stingray> yep.
[19:49] <stingray> I'm generally tracking stable
[19:49] <stingray> so I was on the commit I've posted + some librbd stuff which is hardly relevant to this problem :)
[19:49] <stingray> I am almost happy though - my vm on rbd survived bonnie++
[20:18] <gregaf1> hmm, when I try upgrading I can't even get ceph tool to authenticate
[20:18] <gregaf1> and my MDS replay crashed on an rstat bug, not that assert
[20:18] <stingray> upgrading to master?
[20:18] <gregaf1> this qualifies as "not helpful"
[20:18] <gregaf1> no, from .29.1 to current stable
[20:23] <stingray> wtf.
[20:24] <gregaf1> ?
[20:24] <stingray> the diffstat between 0.29.1 and stable is, like, tiny
[20:24] * Yulya_ (~Yu1ya_@ip-95-220-242-20.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[20:24] <stingray> some encoding/decoding stuff, something with pointers
[20:24] <gregaf1> yes
[20:24] <stingray> I haven't had any problems with upgrade itself
[20:24] <stingray> this journal corruption just happened
[20:25] <stingray> as it usually happens, client got stuck, I restarted mds, boom
[20:25] <stingray> now it's crashlooping
[20:25] <gregaf1> wait, it wasn't over the upgrade?
[20:27] <stingray> no
[20:27] <stingray> it just happened.
[20:28] <gregaf1> oh
[20:28] <stingray> this doesn't take away the inability to do journal dump or reset
[20:28] <stingray> :)
[20:28] <gregaf1> all right, guess I misunderstood that
[20:29] <gregaf1> so can you dump the journal and post it somewhere for me to look at?
[20:29] <gregaf1> and remind me
[20:30] <gregaf1> this is on current stable?
[20:30] <gregaf1> and happened after a client hang so you restarted the mds
[20:30] <stingray> no I can not dump the journal
[20:30] <stingray> as I said above
[20:30] <stingray> dump journal doesn't work because of some bug in auth
[20:30] <stingray> which only affects dumper or resetter
[20:30] <stingray> doesn't affect the regular replay
[20:31] <gregaf1> but you upgraded from .29.1 to current stable without problems
[20:31] <gregaf1> it was during a later restart that replay stopped working
[20:32] <gregaf1> that correct?
[20:32] <stingray> I may have created this cluster using later commit
[20:32] <gregaf1> okay
[20:32] <stingray> not exactly 0.29.1 but closer to current stable
[20:32] <stingray> so no problems with upgrades
[20:32] <stingray> I was writing files to the system
[20:32] <stingray> the client hung
[20:32] <stingray> so I restarted the client and the mds
[20:33] <stingray> mds never returned because it started failing on journal replay
[20:33] <stingray> I tried dumping journal it didn't work because of auth fail
[20:33] <stingray> same with reset
[20:33] <gregaf1> right
[20:33] <stingray> I tried check journal and it worked - replayed and crashed
[20:33] <stingray> now, to give you the journal dump, I need working dumper
[20:33] <gregaf1> all right
[20:34] <gregaf1> so can you restart your cluster with cephx turned off and get me a journal dump?
[20:34] <gregaf1> I think that should work :)
[20:34] <stingray> well, I guess I can do that
[20:35] <gregaf1> I'm making a bug about the problem with cephx and journal dump, I'll talk to Yehuda about it later today
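Since the Dumper's cephx path is what's failing, the restart with cephx turned off that gregaf1 suggests would, in the ceph.conf syntax of this era, look roughly like the fragment below (the option name is an assumption; verify it against your version):

```ini
[global]
        ; "none" disables cephx authentication cluster-wide
        auth supported = none
```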
[20:43] <stingray> http://stingr.net/d/stuff/dump1.xz
[20:52] <stingray> do you need my cluster in broken state, or can I reset-journal it and continue my exercise?
[21:01] * sayotte (~ircuser@ has left #ceph
[21:18] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[21:18] * aliguori (~anthony@ has joined #ceph
[21:27] <yehudasa> stingray: I pushed a fix yesterday to the aio_sparse_read branch, should be better now, I'm having trouble testing it myself, my backend runs on ext4 and it seems to trigger a bug there
[21:32] <stingray> yehudasa: well, I cherrypicked it from there and it worked
[21:32] <stingray> that's why I said you fixed it
[21:32] <yehudasa> oh.. missed that message :)
[21:33] <stingray> now my vm dies dd if=... and also it survives bonnie++
[21:33] <stingray> which is a win
[21:33] <stingray> what failed though is this mds stuff I was annoying gregaf1 with
[21:34] <stingray> I only have 2 servers in a cluster now, as soon as I'll fix metadata I'll expand it and do some performance tests
[21:35] <yehudasa> when you say 'vm dies', you really mean 'vm does', right?
[21:36] <stingray> yeah
[21:36] <stingray> i is next to o
[21:37] <stingray> and those 3 hours I spent in the immigration bureau this morning are not helping
[21:39] <yehudasa> heh.. that's never fun
[21:53] * DavidDreezer (~Adium@fw0hq.hr01.groupee-inc.net) has joined #ceph
[21:54] <DavidDreezer> Hello. We're thinking about using Ceph and were wondering if the backports will be fixed to work again with the Squeeze kernel so that we can do some rudimentary testing.
[21:54] <stingray> what's the squeeze kernel?
[21:55] <stingray> I am using it on ext4 without journal
[21:55] <cmccabe> stingray: it's an old debian kernel
[21:55] <stingray> and ext4 works on pretty much every kernel since 2008
[21:55] <stingray> ah
[21:55] <DavidDreezer> current squeeze is 2.6.32-5
[21:55] <cmccabe> DavidDreezer: I am not the expert on this topic, but I think there were some btrfs bugs in the older kernels that cause problems
[21:55] <DavidDreezer> which is the current debian release
[21:55] <stingray> nah, btrfs is totally unusable on 2.6.32
[21:56] <stingray> just use ext4
[21:56] <stingray> it'll be grand
[21:56] <cmccabe> DavidDreezer: you might be able to avoid them by using ext4 on the old kernels... probably ask sage if he knows of any outstanding issues with that config
[21:56] <gregaf1> stingray: sorry, went off to lunch
[21:56] <DavidDreezer> so, use ceph with ext4 instead of btrfs?
[21:56] <gregaf1> I think you're okay resetting your cluster now if you don't want to wait
[21:57] <cmccabe> gregaf: do you know of any issues with ceph+ext4+linux 2.6.32
[21:58] <gregaf1> DavidDreezer here is talking about the ceph client backport; I'm not sure what its shape is right now
[21:58] <DavidDreezer> problem i have is that I have not been able to get ceph to compile on a debian squeeze machine since 0.27.1
[21:58] <cmccabe> oh, that alternate kernel-client tree
[21:58] <gregaf1> yeah, that works on older kernels like .32
[21:58] <gregaf1> the userspace stuff should still be good
[21:59] <gregaf1> you could do rudimentary testing if you grabbed an older backports branch checkout, it wouldn't be as good as current kclient but it would work for a lot of stuff
[21:59] <DavidDreezer> we really are interested in using ceph when it becomes production ready. I'd like to start doing some rudimentary testing on it to prepare.
[22:00] <DavidDreezer> are you still on target to be ready for production this calendar year, more or less?
[22:00] <gregaf1> unfortunately the best way to keep track of the current state is to use a current kernel… we just don't have the manpower to backport
[22:00] <gregaf1> it wouldn't surprise me if we got it working on the newest red hat or something once we were set, but we're not there yet
[22:00] <DavidDreezer> i'm limited to whatever comes in debian's latest distro
[22:01] <gregaf1> DavidDreezer: depends on which bits you want to be working
[22:01] <gregaf1> RADOS is pretty stable already and is getting a ton of testing now since we're rolling at an S3 clone which it backs
[22:02] <stingray> gregaf1: np. Do you need anything else besides that dump?
[22:02] <gregaf1> single-MDS Ceph configs are also doing pretty well at this point, they haven't seen as much testing but I don't run into bugs routinely and I don't think that anybody else is either; multi-MDS configs still have a lot of work to do
[22:02] <DavidDreezer> ah… that was good information
[22:02] <DavidDreezer> thank you
[22:03] <gregaf1> stingray: a full log of the replay might be useful too
[22:03] <gregaf1> yep :)
[22:03] <bchrisman> hmm.. is rbd also backported to 2.6.32?
[22:04] <DavidDreezer> i'm not actually sure we need multiple MDS, but if we're trying to remove all single points of failure multiple MDS would be in order, no?
[22:05] <gregaf1> not really, you'd just set up a standby for it… those are different from multi-MDS configs
[22:05] <yehudasa> bchrisman: don't think so
[22:06] <bchrisman> ok.. thx
[22:06] <yehudasa> bchrisman: shouldn't be too hard to backport.. not doing anything fancy
[22:07] <bchrisman> yeah.. a lot less going on in terms of interoperating with the kernel than a filesystem...
[22:07] <DavidDreezer> is there any way to call someone associated with the team and discuss our needs and get some pointers for moving in the right direction?
[22:09] * Yulya_ (~Yu1ya_@ip-95-220-161-118.bb.netbynet.ru) has joined #ceph
[22:10] <sagewk> DavidDreezer: sure, can you send contact info to sage@newdream.net and i'll get you connected
[22:10] <DavidDreezer> that would be so very helpful. I'll do that.
[22:11] <DavidDreezer> thank you all for your time, I appreciate it greatly.
[22:12] * DavidDreezer (~Adium@fw0hq.hr01.groupee-inc.net) has left #ceph
[22:17] <stingray> gregaf1: mds.vuvuzela.log.xz in the same dir
[22:17] <gregaf1> hmm, hit a 403 forbidden
[22:21] <gregaf1> stingray: I can't pull that down, looks like you need to give the internet read perms :)
[22:23] <cmccabe> yehudasa: looks like ede3a0a broke the build
[22:24] <stingray> chmod a+r mds.vuvuzela.log.xz
[22:24] <stingray> done
[22:24] <stingray> sorry
[22:24] <cmccabe> bad merge perhaps?
[22:25] <gregaf1> stingray: got it, thanks!
[22:25] <yehudasa> cmccabe: will be fixed in a second
[22:26] <cmccabe> yehudasa: thx
[22:30] <stingray> good
[22:30] <stingray> I'll go home, then
[22:56] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[23:10] <cmccabe> yehudasa: testradospp is broken in master
[23:11] <cmccabe> not sure what's going on with that
[23:11] <cmccabe> whatever it is, it's not affecting testrados (non-pp)
[23:56] <bchrisman> losing my mind… can't recreate the create-lag now… but I pastebin'ed my results yesterday, so I know I did it.
[23:58] <Tv> bchrisman: universe was laggy, you moved to a less laggy shard of the game server
[23:58] <bchrisman> that explains it… need the non-laggy membership option..
[23:58] <sjust> I'm going to start explaining all of my bugs that way

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.