#ceph IRC Log


IRC Log for 2010-11-18

Timestamps are in GMT/BST.

[0:00] <ajnelson> The limits are whatever `./vstart.sh -d -n -l` set.
[0:00] <gregaf> with ioctls?
[0:00] <ajnelson> Aye.
[0:00] <ajnelson> The kernel and userspace clients both export the interface for the two relevant ioctls for this work.
[0:01] <gregaf> huh, I didn't realize that
[0:01] <gregaf> in any case, if you're using the userspace client I bet that's what is crashing
[0:01] <gregaf> unfortunately it's not going to have any logs unless you explicitly enabled them
[0:02] <ajnelson> Oh.
[0:02] * Jiaju (~jjzhang@222.126.194.154) has joined #ceph
[0:02] <gregaf> are the other daemons still running?
[0:02] <ajnelson> They appear to be.
[0:02] <gregaf> yeah, probably cfuse crashed horribly then
[0:03] <ajnelson> Ok. I should be using the kernel client, then?
[0:03] <gregaf> so I'd try re-running the tests with debug turned up and a log file specified
[0:03] <ajnelson> Ok.
[0:03] <gregaf> well it'll definitely be more stable
[0:03] <ajnelson> Ok.
[0:03] <gregaf> but I am trying to firm up cfuse so if you can get me logs and maybe a core dump that'd be helpful
[0:03] <ajnelson> What is the flag for cfuse logging?
[0:04] <gregaf> like mount it with extra options --debug_ms 1 --debug_client 20 --log-file=path
[0:04] <gregaf> you can specify the logging separately for the different components but that'll probably catch it
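Put together, a cfuse invocation with those options might look like the following; the monitor address, mount point, and log path are placeholders rather than values from this session:

    cfuse --debug_ms 1 --debug_client 20 --log-file=/path/to/client.log -m 127.0.0.1:6789 /mnt/ceph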
[0:11] <ajnelson> greag: Hrm. Technical assistance request: I need a super-duper kill -9. Can't get my mount point back from an unkillable cfuse, and I forgot how I've done this before.
[0:11] <ajnelson> gregaf: (Last message)
[0:12] <gregaf> umm, kill -9 should work on cfuse
[0:12] <gregaf> and then fusermount -u mount_point will get back your mount point
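A minimal recovery sequence along those lines, assuming /mnt/ceph is the stuck mount point:

    kill -9 $(pidof cfuse)     # force-kill the wedged cfuse process
    fusermount -u /mnt/ceph    # release the stale FUSE mount point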
[0:12] <ajnelson> Ah got it. Forgot the -9, sorry.
[0:12] <ajnelson> (Thus the head scratching.)
[0:12] <gregaf> :)
[0:13] <ajnelson> Oh, an aside: Is there any chart that shows how well particular Ceph kernel clients work with particular server releases?
[0:14] <gregaf> don't think so, no
[0:15] <gregaf> it "should" work as long as the server version is newer than the kclient, but those scenarios aren't tested too well since most of our current users/testers run very recent versions of both
[0:15] <ajnelson> Reasonable.
[0:21] <ajnelson> Ok. Well, I have the Hadoop test suite running again, working out of a Ceph directory (Ceph via cfuse). When that falls over, I'll check in again.
[0:21] <gregaf> cool, thanks
[0:22] <ajnelson> This is on a Fedora Core 12 machine, need to see if I can update its kernel and/or ceph-kclient without breaking things too horribly...
[0:23] <ajnelson> Via http://fedoraproject.org/wiki/YumUpgradeFaq: "Although upgrades with yum works, it is not explicitly tested as part of the release process by the Fedora Project. If you are not prepared to resolve issues on your own if things break, you should probably use the recommended installation methods instead."
[0:23] <ajnelson> Sigh.
[0:23] <gregaf> ajnelson: Hmmm, Sage is pretty sure that FUSE doesn't have any way to make use of ioctls
[0:23] <ajnelson> ...Huh. In that case, I probably have another unit test to write.
[0:23] <ajnelson> =/
[0:23] <gregaf> :)
[0:25] <ajnelson> ...Oh wait, I think I went down this road on paper before. I got blocked at not knowing how to explicitly set a block's location...or more realistically, not having time in the course project constraints.
[0:25] <sagewk> fwiw i'm most interested in seeing how well the shim that uses the kernel client and ioctls works
[0:25] <ajnelson> I may have to punt on that for the list.
[0:25] <ajnelson> Oh
[0:25] <ajnelson> sagewk: The shim actually worked pretty well when I ran it on the issdm cluster. I made a bad judgement call in scaling it out, though, so there aren't many interesting results. It was functional, at least.
[0:26] <sagewk> and it used the ioctl to extract the layout from ceph?
[0:26] <ajnelson> Aye.
[0:26] <sagewk> cool
[0:27] <ajnelson> I'm combing through the patch right now to make sure I don't have any spectacularly dumb comments. There are plenty of debug printouts, so I'm not sure how well that would fly going to the mailing list.
[0:31] <ajnelson> gregaf: Oh. Hadoop's died again. Failed on the same test: Running org.apache.hadoop.fs.TestCopyFiles
[0:32] <gregaf> ajnelson: Hadoop issue or Ceph issue?
[0:33] <ajnelson> I'm going to call this a Ceph issue. Hadoop didn't print any output. I just stopped the services.
[0:33] <ajnelson> *Ceph services.
[0:33] <ajnelson> Eeeeeum. client log is 554 MB long. Apparently this was dead awhile.
[0:34] <sagewk> looping?
[0:34] <ajnelson> Yeah, looks like it:
[0:34] <ajnelson> [ajnelson@bass log]$ tail client.log
[0:34] <ajnelson> 2010-11-17 15:33:51.343733 7f2ecdbe4710 -- 127.0.0.1:0/4480 >> 127.0.0.1:6789/0 pipe(0x7f2eb80264d0 sd=-1 pgs=0 cs=0 l=0).fault first fault
[0:34] <ajnelson> 2010-11-17 15:33:54.343765 7f2eddae2710 -- 127.0.0.1:0/4480 mark_down 127.0.0.1:6789/0 -- 0x7f2eb80264d0
[0:34] <ajnelson> 2010-11-17 15:33:54.343842 7f2eddae2710 -- 127.0.0.1:0/4480 --> mon2 127.0.0.1:6791/0 -- auth(proto 0 30 bytes) v1 -- ?+0 0x7f2eb802a050
[0:34] <ajnelson> 2010-11-17 15:33:54.344009 7f2ecdae3710 -- 127.0.0.1:0/4480 >> 127.0.0.1:6791/0 pipe(0x7f2eb8026d00 sd=-1 pgs=0 cs=0 l=0).fault first fault
[0:34] <ajnelson> 2010-11-17 15:33:57.344081 7f2eddae2710 -- 127.0.0.1:0/4480 mark_down 127.0.0.1:6791/0 -- 0x7f2eb8026d00
[0:34] <ajnelson> 2010-11-17 15:33:57.344193 7f2eddae2710 -- 127.0.0.1:0/4480 --> mon1 127.0.0.1:6790/0 -- auth(proto 0 30 bytes) v1 -- ?+0 0x7f2eb802a870
[0:34] <ajnelson> 2010-11-17 15:33:57.344399 7f2ecd9e2710 -- 127.0.0.1:0/4480 >> 127.0.0.1:6790/0 pipe(0x7f2eb802a4d0 sd=-1 pgs=0 cs=0 l=0).fault first fault
[0:34] <ajnelson> 2010-11-17 15:34:00.344430 7f2eddae2710 -- 127.0.0.1:0/4480 mark_down 127.0.0.1:6790/0 -- 0x7f2eb802a4d0
[0:34] <ajnelson> 2010-11-17 15:34:00.344536 7f2eddae2710 -- 127.0.0.1:0/4480 --> mon2 127.0.0.1:6791/0 -- auth(proto 0 30 bytes) v1 -- ?+0 0x7f2eb802e050
[0:34] <ajnelson> 2010-11-17 15:34:00.344730 7f2ecd8e1710 -- 127.0.0.1:0/4480 >> 127.0.0.1:6791/0 pipe(0x7f2eb802ad00 sd=-1 pgs=0 cs=0 l=0).fault first fault
[0:35] <sagewk> looks like it's the monitors that are down
[0:36] <ajnelson> Did I capture sufficient details? Here was my cfuse call: sudo cfuse --debug_ms 1 --debug_client 20 --log-file=log/client.log -m 127.0.0.1:6789 /mnt/ceph -o nonempty
[0:38] <gregaf> yeah, that should be fine
[0:38] <sagewk> the client is just complaining it can't talk to the monitors (at least in that log fragment). you stopped them explicitly though? that's probably why. unless they crashed, in which case the monitor logs should have a stack trace
[0:38] <ajnelson> I stopped the client, and then the services.
[0:38] <sagewk> check the mon logs then?
[0:38] <ajnelson> Ok.
[0:39] <ajnelson> Should the monitors have put output in log/ ? (I ran `./vstart.sh -d -n -l`.)
[0:40] <sagewk> yeah, log/mon.a.log probably
[0:40] <ajnelson> Oh. Well, there are no mon logs:
[0:40] <ajnelson> [ajnelson@bass log]$ ls
[0:40] <ajnelson> client.log mds.a.log mds.a.mem.log mds.a.server.log mds.b.log mds.b.mem.log mds.b.server.log mds.c.log mds.c.mem.log mds.c.server.log osd.0.log
[0:40] <sagewk> out/mon.a.log
[0:41] <ajnelson> Thought so.
[0:41] <ajnelson> wait
[0:41] <ajnelson> There is no out/mon.a.log either:
[0:41] <ajnelson> bass.soe.ucsc.edu.3755 bass.soe.ucsc.edu.3807 bass.soe.ucsc.edu.3816 bass.soe.ucsc.edu.3901 bass.soe.ucsc.edu.3965 bass.soe.ucsc.edu.4012 mds1 mds.a mds.c mon.a.0 mon.b.0 mon.c.0 osd.0.0
[0:41] <ajnelson> bass.soe.ucsc.edu.3783 bass.soe.ucsc.edu.3812 bass.soe.ucsc.edu.3820 bass.soe.ucsc.edu.3928 bass.soe.ucsc.edu.3990 mds0 mds2 mds.b mon.a mon.b mon.c osd.0
[0:42] <sagewk> /log/mon.a.0?
[0:42] <sagewk> or mon.a
[0:42] <ajnelson> Sorry, these are in out/, not log/.
[0:42] <sagewk> er right, out/. ignore log/ :)
[0:42] <ajnelson> k. =)
[0:43] <ajnelson> Hmm, I see these lines about 40 or so from the end:
[0:43] <ajnelson> max_mds 1
[0:43] <ajnelson> in
[0:43] <ajnelson> up {}
[0:43] <ajnelson> failed
[0:43] <ajnelson> stopped
[0:43] <ajnelson> Oh, nevermind, I think I get it. Not actually failed or stopped.
[0:44] <ajnelson> Hrm. 2.2G of logs. What would be the best way to help you out? I'm kind of fumbling around looking for crash indicators.
[0:44] <jantje> fyi, right before I left at work, I think I trashed my cluster by doing a single osd bench when it was running at full gbit speed, the last message was something with the journal being full, I'm not 100% sure, since I still have to reproduce. My journal is on a memory device, dio=false and parallel filestore is on
[0:45] <jantje> so I'll let you know tomorrow
[0:45] <jantje> sagewk: found anything with those traces?
[0:46] <sagewk> looking at it right now, actually. trying to figure out where EOVERFLOW comes from.
[0:47] <gregaf> ajnelson: if something crashed it ought to have left a backtrace at the end of its log
[0:47] <sagewk> ajnelson: just look at the end of mon.a and see if there is a stack trace?
[0:47] <gregaf> if there aren't any anywhere, is there any chance your local machine ran out of memory and stuff got OOM-killed?
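One quick way to check for both is to scan the tail of each daemon log in the vstart.sh out/ directory for crash markers; the grep pattern below is only a guess at the usual indicators:

    for f in out/mon.* out/mds.* out/osd.*; do
        echo "== $f"
        tail -n 40 "$f" | grep -iE 'assert|abort|signal|backtrace'
    done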
[0:48] <sagewk> oh, jantje: this is a 32bit machine right?
[0:48] <ajnelson> mon.a,b,c don't have stack traces at the end.
[0:49] <ajnelson> Last two lines of mon.a look ok to my untrained eyes:
[0:49] <ajnelson> 2010-11-17 15:32:39.391979 7f4f3e698710 mon.a@0(leader).mds e15 _note_beacon mdsbeacon(4098/a up:active seq 252 v15) v2 noting time
[0:49] <ajnelson> 2010-11-17 15:32:39.392005 7f4f3e698710 -- 127.0.0.1:6789/0 --> 127.0.0.1:6802/3964 -- mdsbeacon(4098/a up:active seq 252 v15) v1 -- ?+0 0x7f4f300922b0
[0:49] <jantje> sagewk: yes
[0:50] <ajnelson> mon.{abc}.0 don't have anything interesting. All 5734 bytes.
[0:50] <jantje> it must be related to that I guess, since bonnie is working on a 64bit machine -on- a cephfs
[0:51] <sagewk> jantje: ok cool, i see the problem
[0:52] <jantje> lovely
[0:53] <jantje> i'm off to bed, nite
[0:54] <ajnelson> gregaf: sagewk: Looked at the ends of all the logs, didn't see any stack traces. I kind of doubt an OOM killing, the box has 8GB of memory.
[0:54] <jantje> core dumps?
[0:55] * jantje &
[0:56] <ajnelson> jantje: Not seeing one. Looked in /var/log, and I'm not sure what it'd be named otherwise.
[0:56] <sagewk> jantje: please try http://fpaste.org/Q4i8/ when you get a chance. 'night!
[0:57] <sagewk> usually /core, but only if you did a 'ulimit -c unlimited' beforehand.
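For a later run, that would look roughly like the following, started from the same shell so the daemons inherit the limit; where a core actually lands depends on the kernel's core_pattern setting, with /core and the daemon's working directory being the usual places:

    ulimit -c unlimited    # remove the core file size cap for this shell
    ./vstart.sh -d -n -l   # restart the test cluster from that shell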
[0:57] <ajnelson> I didn't do that. So, no dump visible.
[0:58] <ajnelson> I think I'll try this again, see what remains active.
[0:58] <gregaf> yeah, we can't even begin to diagnose it if we don't know what died! :)
[0:59] <ajnelson> =)
[1:05] <ajnelson> Y'know, I'm starting to think I misdiagnosed this.
[1:06] <ajnelson> This test just seems to take a long time to run.
[1:06] <gregaf> it's possible, cfuse is pretty slow at some tasks
[1:06] <gregaf> painfully slow, in fact
[1:06] <gregaf> anybody running Hadoop is probably going to need to use the kernel client
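For reference, mounting with the kernel client would look roughly like this; it assumes a vstart-style cluster with a monitor on 127.0.0.1:6789 and authentication disabled, otherwise name= and secret= mount options would be needed:

    sudo mount -t ceph 127.0.0.1:6789:/ /mnt/ceph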
[1:10] <ajnelson> Well, 2 minutes running using the local file system. I may have to just let this run overnight.
[1:10] <ajnelson> =|
[1:32] <ajnelson> gregaf: Shall I put the cfuse debug calls on the Ceph wiki?
[1:32] <gregaf> you mean the command-line options?
[1:32] <ajnelson> Aye.
[1:32] <gregaf> they're already there somewhere, but if you think they should be added to somewhere that's more readily apparent feel free to do so :)
[1:33] <ajnelson> They weren't on this page:
[1:33] <ajnelson> http://ceph.newdream.net/wiki/Command_line_options
[1:33] <gregaf> go for it!
[1:33] <ajnelson> Shall do.
[1:41] <ajnelson> gregaf: Well, TestCopyFiles timed out.
[1:42] <ajnelson> Ceph is still usable.
[1:42] <ajnelson> I think I need to update the kernel to run this suite.
[2:18] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[2:18] * greglap1 (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[2:33] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:52] * greglap (~Adium@166.205.139.148) has joined #ceph
[3:28] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:35] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[3:38] * greglap1 (~Adium@166.205.139.148) has joined #ceph
[3:41] * greglap (~Adium@166.205.139.148) Quit (Ping timeout: 480 seconds)
[3:52] * greglap1 (~Adium@166.205.139.148) Quit (Read error: Connection reset by peer)
[4:03] * greglap (~Adium@76.90.74.194) has joined #ceph
[5:12] * alexxy[home] (~alexxy@79.173.81.171) has joined #ceph
[5:12] * alexxy (~alexxy@79.173.81.171) Quit (Read error: Connection reset by peer)
[6:24] * ajnelson (~ajnelson@soenat3.cse.ucsc.edu) Quit (Quit: ajnelson)
[6:42] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[7:38] * sentinel_e86 (~sentinel_@188.226.51.71) Quit (Quit: sh** happened)
[7:40] * sentinel_e86 (~sentinel_@188.226.51.71) has joined #ceph
[7:46] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[7:57] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:42] * allsystemsarego (~allsystem@188.26.33.21) has joined #ceph
[9:21] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:37] * failboat (~stingray@stingr.net) Quit (Ping timeout: 480 seconds)
[9:37] * failboat (~stingray@stingr.net) has joined #ceph
[10:18] * Yoric (~David@213.144.210.93) has joined #ceph
[10:30] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[11:02] <jantje> sagewk: it works! ceph: fix readdir EOVERFLOW on 32-bit archs fixes #549
[12:34] * Yoric (~David@213.144.210.93) has joined #ceph
[14:24] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:30] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[15:39] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[15:48] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:44] * fred_ (~fred@80-219-183-100.dclient.hispeed.ch) has joined #ceph
[16:44] <fred_> hi
[17:39] * greglap (~Adium@76.90.74.194) Quit (Quit: Leaving.)
[17:52] * greglap (~Adium@166.205.138.179) has joined #ceph
[17:54] <greglap> hi fred_
[17:54] <fred_> hi
[17:55] <fred_> I've got about 30 minutes left if you want me to help with #590
[17:56] <greglap> hmm, that's new code I don't know a lot about
[17:57] <greglap> did you have any logs from the OSD that crashed?
[17:58] <greglap> fred_: if you don't then what you posted in the bug is probably all we can get out of the core file, but I'll ask around when people get in to work today :)
[17:58] <fred_> I have them... but it's a bunch of journal read_entry 1197391872 : seq 252404254 33 bytes
[17:59] <fred_> and a few throttle: waited for ops
[17:59] <greglap> how large is it?
[17:59] <fred_> not a lot of debug turned on these days, 0.21, 0.22 was working so well...
[17:59] <greglap> heh
[18:00] <fred_> 144k
[18:00] <greglap> can you just attach it to the issue on the tracker, then?
[18:02] <fred_> done
[18:02] <greglap> k, thanks!
[18:03] <greglap> I'll make sure somebody at least skims it later today
[18:03] <fred_> and one other thing: what to do with "JOURNAL FULL" in the logs?
[18:03] <fred_> ok thanks
[18:04] <greglap> hmm, that means that your OSD journal has filled up
[18:04] <greglap> it's not a bug or anything, but it means your journal isn't keeping up with your filesystem
[18:04] <fred_> journal full happens to me because 1 of 3 OSDs is out, so the other 2 get all the data
[18:05] <fred_> this happens during the recovery process (that I stopped after 4 hours approx)
[18:05] <greglap> oh, sorry, got that backwards
[18:05] <fred_> I find it strange that the journal is not flushed to the btrfs device...
[18:05] <greglap> recovery is putting a heavy load on your disks I guess
[18:05] <fred_> yep
[18:05] <greglap> so it's streaming into the journal and then it's getting stuck waiting for the storage to catch up
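If the journal keeps filling, one knob to look at is its size in ceph.conf; this snippet is only a sketch, and the option names and values are assumptions based on typical Ceph configurations of this era, not anything stated in the log:

    [osd]
        osd journal = /path/to/osd$id/journal   ; journal file or device (placeholder path)
        osd journal size = 1000                 ; journal size in MB, sized to what the filestore disk can absorb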
[18:06] <fred_> I saw it too late (ceph already stopped) but I had no more free memory (4gb ram + 4gb swap)
[18:06] <greglap> well it won't break anything
[18:07] <fred_> ok, so I think the high swapping was the problem
[18:07] <greglap> it'll just hold up ops and, indeed, add a bit to the memory load
[18:07] <fred_> too bad I don't know which process ate 8gb of memory
[18:07] <greglap> I'm not sure how careful the recovery is of that
[18:08] <greglap> recovery isn't too worried about memory, but I didn't think it would indiscriminately send data off
[18:08] <greglap> did something get killed by the OOM-killer?
[18:08] <fred_> ok, I'll wait tomorrow for a fix for #590, if I see none, I'll restart ceph and observe the swapfest more closely
[18:09] <greglap> sweet
[18:13] <fred_> it's really strange, I fear a hardware problem, maybe not ceph's fault...
[18:14] <greglap> because of 590?
[18:14] <greglap> or your full journal?
[18:14] <fred_> swap related mess...
[18:15] <fred_> Free swap = 3967856kB
[18:15] <greglap> you might have a slow disk or something but that still shouldn't be killing daemons!
[18:15] <fred_> SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[18:15] <fred_> btw, I didn't answer, I saw no trace of oom killed process
[18:16] <greglap> hmm, okay
[18:20] * cmccabe (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) has joined #ceph
[18:21] <sagewk> fred_ this is with the v0.23 release
[18:21] <sagewk> ?
[18:21] <sagewk> or something newer from git?
[18:21] <fred_> v0.23
[18:23] <sagewk> it may be a bogus assert. :/
[18:23] <fred_> I would love that when it means no data loss :)
[18:24] <sagewk> i'm pretty sure it does in this case. still looking.
[18:24] <fred_> thanks, take your time. I've got to go now.
[18:24] <sagewk> k
[18:25] <fred_> is the testing branch usually safe?
[18:25] <sagewk> yeah
[18:25] <sagewk> bugfixes only
[18:26] <fred_> ok, so that is what I'll try tomorrow
[18:26] <fred_> good day
[18:26] * fred_ (~fred@80-219-183-100.dclient.hispeed.ch) Quit (Quit: Leaving)
[18:34] * failboat resumed his experiments, now on 0.23
[18:47] * greglap (~Adium@166.205.138.179) Quit (Ping timeout: 480 seconds)
[18:51] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:53] * ajnelson (~ajnelson@dhcp-106-34.cruznet.ucsc.edu) has joined #ceph
[18:54] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[19:00] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:06] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:38] * ajnelson (~ajnelson@dhcp-106-34.cruznet.ucsc.edu) Quit (Quit: ajnelson)
[20:00] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[20:34] <wido> About RBD, what happens if I have /dev/rbd0 and I remove the backing rbd image?
[20:38] <yehudasa> wido: any operations on the /dev/rbd0 will fail
[20:39] <wido> Ok. I ask since I saw the libvirt discussion about going for the RBD kernel module before implementing the qemu driver
[20:40] <wido> Before booting your VM, you have to make sure your backing RBD device is present, which causes extra administration
[20:40] <wido> and also cleaning up the backing device when the VM is shut down or moved to another physical host
[20:42] <wido> I don't see how libvirt could do that for me
[21:54] <wido> sagewk: did you get a message about issue #585? It has the status "Closed", and I don't know if Redmine notifies about closed issues
[21:56] <cmccabe> wido: sage is at a meeting now
[21:57] <cmccabe> wido: so you saw 585 again with the latest unstable
[21:58] <wido> cmccabe: Indeed, but I wasn't sure if Redmine notifies when I reply on a Closed issue
[21:58] <cmccabe> wido: I've been working on that area of the code a little bit in connection with some other things, so I'm curious about the details
[21:59] <cmccabe> wido: so when did it happen
[22:00] <cmccabe> wido: was this right after you started all the OSDs?
[22:03] <wido> cmccabe: yes, I compiled the packages and then distributed them over the OSDs
[22:03] <cmccabe> wido: could you upload the OSD log to somewhere
[22:03] <wido> that takes some time, about 10 minutes or so; after it had finished, 4 of the 12 OSDs were up
[22:03] <wido> do you have a public key? I have a machine where I dump all my logs, cores, etc
[22:04] <wido> logger.ceph.widodh.nl
[22:04] <cmccabe> I do have an ssh public key
[22:04] <cmccabe> ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAxT/EmlXe8YO4mJHpa8zMd4yibsO7ygg25n+8lIfkUeU2ugAn+Xt05IJKbofZ+6gok1dRO+sIUp4QMolCs2Sf9AuOJrvMbgZj398VQMmGyOc/3m9nUPiwzEXalrppn7TU5QLIHx0XccOuand2km/r3Bcoc3olc7VrIVBpJ8jBxOhaABoPtTYp6QiVYbeAYGpUqY+OyVpHVe23h5LFupMNr5EOgWDnA/8RViMHO/TO4Gw2Dkf7//o3r8BRY/HZHSQTRMA02Oq1D2kZK6Q1o3eQX528CaZkfVpd8RSSxIh9fiqVRJhXVZX/DkHoZbTOchFQtBpO9PnhjTkV84XTj3uH7Q== cmccabe@flab
[22:04] <wido> rsa I assume?
[22:05] <cmccabe> yes
[22:05] <wido> Ok, you've got root access to logger.ceph.widodh.nl
[22:06] <wido> cmccabe: check out /srv/ceph/issues/osd_crash_replicated_pull, there is "replicated_pull_osd0.tar.gz"
[22:06] <wido> it's a cdebugpack file from this morning (my local time)
[22:07] <wido> from the logger machine you also have root access to all my other machines, just do "ssh root@node01" and you're in
[22:07] <cmccabe> k
[22:07] <wido> feel free to check out the cluster, it's a dev env :-)
[22:08] <wido> cmccabe: To get back to my original question, does Redmine send a notification when an issue is "Closed"? For future reference
[22:08] <cmccabe> I'm not sure
[22:09] <wido> Ok, tnx
[22:09] <cmccabe> It sends emails when you post a comment to something in a bunch of other states, so I suspect it probably does. But I'd have to check.
[22:10] <wido> I have to go, hope the dump helps finding the cause, if you need anything else, let me know or feel free to grab it from one of the machines
[22:10] <wido> Ok, I'll assume it does send e-mails when an issue is closed
[22:10] <cmccabe> I didn't get one, but I wasn't on the original bug
[22:11] <wido> cmccabe: ok, tnx! Got to go, ttyl
[22:12] <cmccabe> wido: ok, see you
[22:12] <cmccabe> wido: are you going to be online later today?
[22:12] <cmccabe> wido: we may want some additional information about this
[22:49] <wido> cmccabe: going afk now for today, almost 23:00 over here
[23:19] <jantje> :)
[23:55] * allsystemsarego (~allsystem@188.26.33.21) Quit (Quit: Leaving)
[23:55] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.