#ceph IRC Log

IRC Log for 2010-08-06

Timestamps are in GMT/BST.

[0:00] * MarkN (~nathan@59.167.240.178) has joined #ceph
[1:39] * MarkN (~nathan@59.167.240.178) has left #ceph
[1:39] * MarkN (~nathan@59.167.240.178) has joined #ceph
[2:47] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[2:49] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[6:18] * bbigras (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) has joined #ceph
[6:18] * bbigras is now known as Guest1043
[6:22] * Guest993 (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) Quit (Ping timeout: 480 seconds)
[7:13] * f4m8_ is now known as f4m8
[7:21] * Osso (osso@AMontsouris-755-1-10-232.w90-46.abo.wanadoo.fr) Quit (Quit: Osso)
[7:35] * mtg (~mtg@port-87-193-189-26.static.qsc.de) has joined #ceph
[7:40] * allsystemsarego (~allsystem@188.26.32.97) has joined #ceph
[9:11] <jantje> morning
[9:27] * jantje created a new feature request: #336: journal (and/or data) on a tmpfs (memory filesystem) for (speed) testing.
[9:56] * eternaleye (eternaleye@bach.exherbo.org) has joined #ceph
[10:16] * akhurana (~ak2@c-98-232-30-233.hsd1.wa.comcast.net) has joined #ceph
[10:50] * akhurana (~ak2@c-98-232-30-233.hsd1.wa.comcast.net) Quit (Quit: akhurana)
[11:33] * vvk (~vvk@77.242.104.225) has joined #ceph
[11:34] * vvk (~vvk@77.242.104.225) Quit ()
[11:38] * MarkN (~nathan@59.167.240.178) Quit (Ping timeout: 480 seconds)
[13:25] * MarkN (~nathan@59.167.240.178) has joined #ceph
[14:05] * f4m8 is now known as f4m8_
[15:02] * Osso (osso@AMontsouris-755-1-10-232.w90-46.abo.wanadoo.fr) has joined #ceph
[15:25] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[16:28] * jantje yawns
[17:01] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[18:24] * mtg (~mtg@port-87-193-189-26.static.qsc.de) Quit (Quit: Verlassend)
[18:42] <jantje> 10.08.06_18:42:34.679693 7f08232f9710 -- 138.203.10.98:6800/7116 >> 138.203.8.142:0/4223772423 pipe(0x17cde30 sd=17 pgs=0 cs=0 l=0).accept peer addr is really 138.203.8.142:0/4223772423 (socket is 138.203.8.142:51309/0)
[18:43] <jantje> 10.08.06_18:42:34.679746 7f08232f9710 -- 138.203.10.98:6800/7116 >> 138.203.8.142:0/4223772423 pipe(0x17cde30 sd=17 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 1 state 3
[18:43] <jantje> 10.08.06_18:42:34.679757 7f08232f9710 -- 138.203.10.98:6800/7116 >> 138.203.8.142:0/4223772423 pipe(0x17cde30 sd=17 pgs=0 cs=0 l=0).accept peer reset, then tried to connect to us, replacing
[18:43] <jantje> 10.08.06_18:42:34.679776 7f082a6fc710 mds0.1 ms_handle_remote_reset on 138.203.8.142:0/4223772423
[18:43] <jantje> 10.08.06_18:42:34.680640 7f08232f9710 -- 138.203.10.98:6800/7116 >> 138.203.8.142:0/4223772423 pipe(0x17cde30 sd=-1 pgs=6860 cs=1 l=0).fault with nothing to send, going to standby
[18:48] <yehudasa> jantje: about 336, can you turn on debug journal to 20, so that we can see why it's not loading?
[18:49] <yehudasa> like, add 'debug journal = 20' to your ceph.conf
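    A rough sketch of the change being suggested; putting it under [osd] is an assumption at this point in the conversation (sagewk confirms the section later in this log):

        ; hypothetical ceph.conf excerpt -- raise journal debug verbosity
        [osd]
            debug journal = 20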
[19:01] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[19:13] <jantje> somehow I managed to get it working
[19:13] <jantje> I think it has something to do with turning replication off
[19:13] <jantje> I now have osd pool default size = 1
[19:13] <jantje> under [mds]
[19:14] <jantje> (is that even correct?)
[19:18] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[19:19] <gregaf> that option will set the default size (# of replicas) for all newly created pools to 1, but I think it only works under the monitor section
[19:20] <sagewk> yeah, it goes in [mon]
[19:23] <jantje> Oh, so 0 is 'none'
[19:23] <jantje> and 1 is 'still' one copy?
[19:27] <gregaf> jantje: yes
[19:27] <gregaf> default is 2
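    Putting that exchange together, a sketch of the option in context; the [mon] placement follows sagewk above, and the value semantics follow gregaf:

        ; hypothetical ceph.conf excerpt
        [mon]
            ; number of copies for newly created pools: 1 = a single copy
            ; (no replication); the built-in default is 2
            osd pool default size = 1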
[19:32] <jantje> I'm getting no speed improvements
[19:33] <jantje> I have 3 servers with each 2 disks (doing journal + data)
[19:33] <jantje> I have 4gbit links bonded (round robin) with an effective iperf throughput of 2.6gbit
[19:34] <jantje> (4 links on each server)
[19:34] <gregaf> speed improvements over what?
[19:35] <jantje> I get an average of 150MB/s on writes with the default config, as well as with pool size=0
[19:36] <jantje> (and a rebooting box for free.)
[19:37] <gregaf> yes, but the 150MB/s isn't better compared to what?
[19:38] <jantje> nothing. I was hoping to get 50MB/s write speed from each disk
[19:40] <gregaf> well you are, basically: the journal has to write changes first and then they get copied to the data disk, so you've really only got 3 disks' worth of bandwidth
[19:40] <jantje> (well, not exactly 50 for each disk, but still 'more', but probably the journal is messing things up)
[19:40] <gregaf> although it does seem a little slow for a setup without replication and independent journaling disks
[19:40] <gregaf> can you run ceph osd tell [0,1,2] bench and look for the results with ceph -w?
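    Spelled out, those commands might look like this, assuming the OSD ids are 0, 1 and 2:

        # run the built-in write benchmark on each osd
        ceph osd tell 0 bench
        ceph osd tell 1 bench
        ceph osd tell 2 bench
        # watch the cluster log; results arrive as '[INF] bench: wrote ...' lines
        ceph -w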
[19:42] <jantje> ok, sec
[19:42] <jantje> 10.08.06_19:41:36.247232 osd e15: 6 osds: 4 up, 6 in -- 3 blacklisted MDSes
[19:42] <jantje> Hmm
[19:42] <gregaf> oh, 6 OSDs with each disk doing journal and data, I misunderstood you there
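    Both readings of the setup happen to predict the same number, taking the ~50 MB/s per-disk figure from above:

        3 dedicated data disks x 50 MB/s  = 150 MB/s  (journals on separate disks)
        6 shared disks x 50 MB/s / 2      = 150 MB/s  (journal + data per disk, so
                                                       every byte is written twice)

    Either way, the observed 150 MB/s is roughly what the disks allow.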
[19:43] <gregaf> do you have any core files or logs for those dead OSDs?
[19:44] <jantje> 10.08.06_19:40:56.399884 log 10.08.06_19:40:55.487668 mon0 138.203.10.98:6789/0 1043 : [WRN] lease_ack from mon2 was sent from future time 10.08.06_19:40:56.006752, clocks not synchronized.
[19:44] <jantje> I think that's the reason, I'll set my lease wiggle room higher
[19:47] <jantje> core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/bin/cosd -i 0 -c /etc/ceph/ceph.conf'
[19:47] <jantje> you want that?
[19:51] <jantje> I think that one was made when my box rebooted
[19:51] <gregaf> can you give us the backtrace from it?
[19:51] <jantje> http://jan.sin.khk.be/core in 2 min
[19:51] <jantje> its there
[19:51] <jantje> how does that work? :)
[19:52] <jantje> (I also mailed a backtrace yesterday on the mailing list)
[19:53] <jantje> if my OSDs are marked down, do I have to bring them back up manually?
[19:53] <gregaf> if they're still running they'll come back on their own
[19:54] <gregaf> but the processes won't be restarted or anything like that
[19:54] <gregaf> hmm, I'm not seeing a backtrace email, just your kernel oops?
[19:54] <gregaf> to use the core file I'll need your executable too
[19:56] <jantje> http://jan.sin.khk.be/cosd
[19:56] <jantje> (email subject from yesterday: Kernel oops on df : offline ceph still mounted.)
[19:58] <gregaf> that doesn't have any OSD backtrace in it, just a kernel oops in btrfs?
[19:58] <jantje> oh, it's probably something totally different
[19:59] <jantje> I'll see if I can find something in my logs from this crash
[20:00] <jantje> Hmm
[20:00] <jantje> Aug 6 19:46:17 ceph0 kernel: [ 750.415836] ata1.00: failed command: READ DMA
[20:00] <jantje> Aug 6 19:46:17 ceph0 kernel: [ 750.422117] ata1.00: cmd c8/00:00:b8:09:8c/00:00:00:00:00/e0 tag 0 dma 131072 in
[20:00] <jantje> Aug 6 19:46:17 ceph0 kernel: [ 750.422118] res 51/40:00:59:0a:8c/00:00:00:00:00/00 Emask 0x9 (media error)
[20:01] <jantje> fucking buggy disks
[20:02] <jantje> 10.08.06_20:02:27.646418 log 10.08.06_20:02:26.470100 osd2 138.203.10.99:6801/7148 2 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 21.816785 sec at 48062 KB/sec
[20:02] <jantje> 10.08.06_20:02:27.646418 log 10.08.06_20:02:27.232937 osd5 138.203.10.100:6803/6960 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 22.540679 sec at 46519 KB/sec
[20:02] <jantje> osd0 and osd1 are down
[20:03] <jantje> 10.08.06_20:02:27.646418 log 10.08.06_20:02:27.420835 osd3 138.203.10.99:6803/7182 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 22.765286 sec at 46060 KB/sec
[20:03] <jantje> 10.08.06_20:02:31.586766 log 10.08.06_20:02:30.411966 osd4 138.203.10.100:6801/6923 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 25.726924 sec at 40757 KB/sec
[20:03] <gregaf> oh, there aren't any symbols and my environment isn't the same, I can't do anything with that core :(
[20:03] <jantje> ok, I should go now
[20:03] <jantje> is there something easy I can do for you?
[20:04] <gregaf> it looks like your disks are only good for about 40MB/s each when you're running journal and datastore on them
[20:04] <gregaf> if you have gdb then running gdb cosd core
[20:04] <gregaf> and then getting a backtrace (type "bt")
[20:04] <gregaf> and pasting that in
[20:04] <gregaf> will give us a clue :)
[20:05] <jantje> Core was generated by `/usr/bin/cosd -i 0 -c /etc/ceph/ceph.conf'.
[20:05] <jantje> Program terminated with signal 6, Aborted.
[20:05] <jantje> #0 0x00007fa704519175 in raise () from /lib/libc.so.6
[20:05] <jantje> (gdb) bt
[20:05] <jantje> #0 0x00007fa704519175 in raise () from /lib/libc.so.6
[20:05] <jantje> #1 0x00007fa70451bf80 in abort () from /lib/libc.so.6
[20:05] <jantje> #2 0x00007fa704dacdc5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
[20:05] <jantje> #3 0x00007fa704dab166 in ?? () from /usr/lib/libstdc++.so.6
[20:05] <sagewk> jantje: replied to your email from yesterday
[20:05] <jantje> #4 0x00007fa704dab193 in std::terminate() () from /usr/lib/libstdc++.so.6
[20:05] <jantje> #5 0x00007fa704dab28e in __cxa_throw () from /usr/lib/libstdc++.so.6
[20:05] <jantje> #6 0x00000000005c3740 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
[20:05] <jantje> #7 0x000000000056d2c4 in FileJournal::wrap_read_bl(long&, long, ceph::buffer::list&) ()
[20:06] <jantje> #8 0x000000000056e228 in FileJournal::read_entry(ceph::buffer::list&, unsigned long&) ()
[20:06] <jantje> #9 0x000000000056bba0 in JournalingObjectStore::journal_replay(unsigned long) ()
[20:06] <jantje> #10 0x00000000005575ce in FileStore::mount() ()
[20:06] <jantje> #11 0x00000000004e9f16 in OSD::init() ()
[20:06] <jantje> #12 0x000000000045a4da in main ()
[20:07] <jantje> my journal is pointing to a block device
[20:07] <jantje> (if that's any different from a btrfs fs)
[20:07] <gregaf> all right, thanks!
[20:09] <jantje> gregaf: OSDs aren't coming back up automatically...
[20:09] <gregaf> are the processes still running?
[20:10] <jantje> Hmm, obvious: No
[20:10] <jantje> damn
[20:11] <jantje> it's getting late
[20:11] <jantje> time for me to go home
[20:16] <jantje> sagewk: /dev/shm/journal is created by mkcephfs, but only on the local system I think
[20:17] <jantje> OH
[20:18] <jantje> I thought I should make the filename unique, so I put $id in it
[20:18] <jantje> but it still fails :P
[20:19] <jantje> mkcephfs touches the journal file, but then fails
[20:24] <sagewk> jantje: can you add 'debug journal = 20' to the
[20:24] <sagewk> er hold on
[20:25] <sagewk> yeah, can you add that to the [osd] section, run mkcephfs, and then send along the /var/log/ceph/osd.whatever.log file?
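    A sketch of the requested steps; the mkcephfs flags are an assumption, and the log path follows sagewk's message above:

        # after adding 'debug journal = 20' to the [osd] section of ceph.conf,
        # recreate the filesystem and collect the osd log
        mkcephfs -c /etc/ceph/ceph.conf --allhosts    # flags assumed
        cat /var/log/ceph/osd.0.log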
[20:31] <jantje> ceph version 0.21~rc (3ed08a33)
[20:31] <jantje> 10.08.06_20:30:07.188343 7f79f358b720 7f79f358b720 strange, pid file /var/run/ceph/osd.0.pid has 11605, not expected 12973
[20:31] <jantje> 10.08.06_20:30:07.188401 7f79f358b720 filestore(/data/osd0) mkfs in /data/osd0
[20:31] <jantje> 10.08.06_20:30:07.188478 7f79f358b720 filestore(/data/osd0) mkfs removing old file fsid
[20:31] <jantje> 10.08.06_20:30:07.220593 7f79f358b720 journal create /dev/shm/journal0
[20:31] <jantje> 10.08.06_20:30:07.220636 7f79f358b720 journal _open failed 22 Invalid argument
[20:31] <jantje> 10.08.06_20:30:07.220643 7f79f358b720 filestore(/data/osd0) mkjournal error creating journal on /dev/shm/journal0
[20:31] <jantje> 10.08.06_20:30:07.220653 7f79f358b720 filestore(/data/osd0) mkfs done in /data/osd0
[20:31] <jantje> 10.08.06_20:30:07.220694 7f79f358b720 filestore(/data/osd0) mount detected btrfs
[20:31] <jantje> 10.08.06_20:30:07.220703 7f79f358b720 filestore(/data/osd0) mount btrfs CLONE_RANGE ioctl is supported
[20:31] <jantje> 10.08.06_20:30:07.253838 7f79f358b720 filestore(/data/osd0) mount btrfs SNAP_CREATE is supported
[20:32] <jantje> 10.08.06_20:30:07.340049 7f79f358b720 filestore(/data/osd0) mount btrfs SNAP_DESTROY is supported
[20:32] <jantje> 10.08.06_20:30:07.340088 7f79f358b720 filestore(/data/osd0) mount found snaps <>
[20:32] <jantje> 10.08.06_20:30:07.340136 7f79f358b720 journal journal_replay fs op_seq 0
[20:32] <jantje> 10.08.06_20:30:07.340143 7f79f358b720 journal open /dev/shm/journal0 next_seq 1
[20:32] <jantje> 10.08.06_20:30:07.340157 7f79f358b720 journal journal_replay open failed with Inappropriate ioctl for device
[20:32] <jantje> 10.08.06_20:30:07.340166 7f79f358b720 filestore(/data/osd0) mount failed to open journal /dev/shm/journal0: Inappropriate ioctl for device
[20:32] <jantje> that's all there is in it
[20:32] <jantje> I really have to go now
[20:33] <darkfade1> go sleep!
[20:33] <sagewk> oh, i bet direct io doesn't work on tmpfs. add 'journal dio = false' to your [osd] section.
[20:34] <sagewk> thanks! ;)
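    Pulling sagewk's suggestions together, a sketch of an [osd] section for a tmpfs-backed journal; the 'osd journal' path with $id follows jantje's comment above and is an assumption:

        ; hypothetical ceph.conf excerpt -- journal on tmpfs for speed testing
        [osd]
            osd journal = /dev/shm/journal$id   ; per-osd journal file on tmpfs
            journal dio = false                 ; tmpfs rejects O_DIRECT (the
                                                ; 'Invalid argument' above), so
                                                ; direct journal i/o must be off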
[21:35] * ghaskins_mobile (~ghaskins_@pool-70-19-184-7.bos.east.verizon.net) has joined #ceph
[21:49] * ghaskins_mobile (~ghaskins_@pool-70-19-184-7.bos.east.verizon.net) Quit (Ping timeout: 480 seconds)
[21:51] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[22:00] <jantje> 10.08.06_21:56:42.914802 log 10.08.06_21:56:41.537344 osd2 138.203.10.99:6801/10604 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 7.288518 sec at 140 MB/sec
[22:00] <jantje> now we're talking :)
[22:03] <jantje> and now the box runs into real trouble .. :)
[22:05] <jantje> in fact, they're all dead, unpingable even
[22:05] <jantje> mental note: connect apc next week
[23:42] <todinini> hi, my mds dies with terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'
[23:42] <sagewk> do you have a stack trace?
[23:42] <todinini> sagewk: how do I get one?
[23:43] <todinini> i only have the log file
[23:44] <gregaf> what version are you using?
[23:44] <sagewk> do you have a core file?
[23:44] <todinini> ceph version 0.22~rc (2cac166825ac080e091789e659f4cb6ae250cc7a)
[23:46] <todinini> I'll upload the core
[23:46] <sagewk> better yet, can you gdb cmds core and pastebin the output from 'bt'?
[23:50] <todinini> I hope I did it right http://pastebin.com/i6Y7XbB1
[23:52] <sagewk> hmm, it should be reproducible (i.e., crash every time you restart cmds?)
[23:53] <todinini> yep, on both mds
[23:55] <todinini> today we updated from 0.21 to 0.22; before that, the mds was fine
[23:56] <sagewk> greg just pushed a fix
[23:57] <todinini> ok, building

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.