#ceph IRC Log


IRC Log for 2011-12-03

Timestamps are in GMT/BST.

[0:45] <Tv> mwahahah
[0:45] <Tv> GPTHeader(signature='EFI PART', revision='\x00\x00\x01\x00', header_size=92, crc32='LBr\xd2', current_lba=1, backup_lba=20000001, first_usable_lba=34, last_usable_lba=19999968, disk_guid='\x11\xaau\x9fv*bA\xa1\x06U\xe3v\x08\xde\x06', part_entry_start_lba=2, num_part_entries=128, part_entry_size=128, crc32_part_array='\x7f\xa7\xff\xb1\x00\x00\x00\x00')
[0:45] <Tv> GPTPartition(type='\xa2\xa0\xd0\xeb\xe5\xb93D\x87\xc0h\xb6\xb7&\x99\xc7', unique='\xfb\xf9%\xb5I\xceKD\x93\x93s\xc4\x93v\xa7\xbd', first_lba=2048, last_lba=19998719, flags=0, name='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0
[0:45] <Tv> 0\x00\x00\x00')
[0:45] <Tv> need to pretty-print guids and add a couple of safety checks, but then it's done
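A minimal sketch of the GUID pretty-printing Tv mentions, assuming the 16 raw bytes are in the mixed-endian layout GPT uses on disk (first three fields little-endian), which Python's `uuid.UUID(bytes_le=...)` decodes. The helper name `format_gpt_guid` is hypothetical and is not taken from Tv's code; the byte strings are copied from the `type` and `disk_guid` fields pasted above.

```python
import uuid


def format_gpt_guid(raw):
    """Render 16 raw on-disk GPT GUID bytes in the usual dashed hex form.

    Hypothetical helper: assumes `raw` uses the mixed-endian GPT layout.
    """
    if len(raw) != 16:
        raise ValueError("a GPT GUID is exactly 16 bytes, got %d" % len(raw))
    return str(uuid.UUID(bytes_le=raw)).upper()


# Partition type GUID from the GPTPartition line above.
type_guid = b'\xa2\xa0\xd0\xeb\xe5\xb93D\x87\xc0h\xb6\xb7&\x99\xc7'
# Disk GUID from the GPTHeader line above.
disk_guid = b'\x11\xaau\x9fv*bA\xa1\x06U\xe3v\x08\xde\x06'

print(format_gpt_guid(type_guid))  # EBD0A0A2-B9E5-4433-87C0-68B6B72699C7
print(format_gpt_guid(disk_guid))  # 9F75AA11-2A76-4162-A106-55E37608DE06
```

The first value is the well-known "basic data partition" type GUID, which is a quick sanity check that the byte order is being interpreted correctly.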
[1:12] <nwatkins> I'm getting the following speeds from the ceph bench on the troubleshooting page. 2011-12-02 16:09:01.682124 log 2011-12-02 16:09:00.661823 osd.0 193 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 79.555075 sec at 13180 KB/sec
[1:13] <nwatkins> This speed is going through the user-space client, but I can achieve full line rate (~100MB/s) using the kernel client
[1:14] <gregaf> nwatkins: the bench command tells each OSD to benchmark the disk it uses for storing data
[1:14] <gregaf> it's not related to the clients at all
[1:14] <gregaf> but you've got a hell of a slow disk on osd 0 it looks like
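As a quick check of the figure in nwatkins' bench line, the reported rate follows directly from the size and elapsed time in that log message. This is only arithmetic on the numbers quoted above; the function name is made up for illustration.

```python
def bench_rate_kb_per_sec(total_mb, seconds):
    """Convert a total size in MB and an elapsed time into KB/sec."""
    return total_mb * 1024 / seconds


# Values from the osd.0 log line: 1024 MB written in 79.555075 sec.
rate = bench_rate_kb_per_sec(1024, 79.555075)
print(int(rate))               # 13180 (KB/sec, matching the log line)
print(round(rate / 1024, 2))   # 12.87 (MB/sec)
```

That is roughly an eighth of the ~100 MB/s nwatkins reports from the kernel client and from dd.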
[1:15] <nwatkins> hmm
[1:17] <nwatkins> gregaf: on osd.0 i'm seeing 100 MB/s writes to the ceph disk with dd
[1:18] <gregaf> hrm, were you doing other things with it at the time you ran bench?
[1:18] <nwatkins> no
[1:18] <gregaf> huh
[1:18] <gregaf> let me check the code again, but as I recall it just tells the OSDs to write 1GB of data to disk and report back how long it took...sagekw?
[1:18] <gregaf> sagewk?
[1:19] <sagewk> yeah
[1:20] <sagewk> the bench is doing 4k ios.. probably change that to something larger and you'll see the 100mb/sec
[1:20] <sagewk> -b <bytes> to bench command, iirc
[1:21] <gregaf> looks like it's just by order, so bench <bytes_per> <total_bytes>
[1:24] <gregaf> and then it loops through dispatching transactions that the OSD processes normally, then does a sync_and_flush() at the end
[1:24] <gregaf> it should be pretty close to what you pull off the disk normally
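The shape of the benchmark gregaf describes (write a total amount in fixed-size blocks, sync once at the end, report the elapsed time) can be sketched roughly as below. This is not the OSD code: it writes to an ordinary file, `bench_write` is a hypothetical name, and `os.fsync` stands in for the `sync_and_flush()` step. The argument order mirrors the `bench <bytes_per> <total_bytes>` form mentioned above.

```python
import os
import tempfile
import time


def bench_write(path, bytes_per, total_bytes):
    """Write total_bytes to path in blocks of bytes_per, sync once at the end.

    Returns (bytes_written, elapsed_seconds). Illustrative only.
    """
    block = b"\0" * bytes_per
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            # The last block may be shorter if the sizes do not divide evenly.
            chunk = block[: total_bytes - written]
            f.write(chunk)
            written += len(chunk)
        # Single flush and sync at the end, rather than after every block.
        f.flush()
        os.fsync(f.fileno())
    return written, time.monotonic() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bench.dat")
        # 4 KB blocks versus 4 MB blocks, 64 MB total each.
        for bytes_per in (4 * 1024, 4 * 1024 * 1024):
            written, elapsed = bench_write(target, bytes_per, 64 * 1024 * 1024)
            rate_kb = written / 1024 / max(elapsed, 1e-9)
            print("%d-byte blocks: %.0f KB/sec" % (bytes_per, rate_kb))
```

Running it with a small and a large block size against the same disk gives a rough feel for how much the block size alone affects the reported rate.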
[1:26] <nwatkins> gregaf: kinda weird behavior... simple hadoop job writing a couple files, after several minutes only a few KB have made it out to the file system.
[1:31] <nwatkins> gregaf: here's the client log. basically some files are being open, but the client just gets stuck indefinitely. http://pastebin.com/jnh7xrRc
[1:33] <gregaf> nwatkins: can you give me a little more context?
[1:34] <nwatkins> gregaf: sure
[1:34] <gregaf> the job is writing files but when you look at ceph -s it's only got a few KB of stuff added?
[1:36] <nwatkins> the client trace i just posted is from a job that basically creates a directory and writes a few KB into a couple files. i stopped the client after several minutes, and the directory had not even been created, but the behavior isn't consistent. earlier the same setup began writing its data files, but an ls revealed only a few KB had been written.
[1:36] <gregaf> in that log it does look like there are only 4 writes of a few hundred k going out to the OSDs
[1:37] <nwatkins> it seems like something is hanging
[1:39] <nwatkins> gregaf: that's about all i know. i have to run, but i'll try to narrow this down later
[1:39] <gregaf> okay
[1:39] <gregaf> I'll see if there's something I can get out of this log
[1:39] <gregaf> unfortunately I might be less accessible than usual later — my power's out at home :(
[1:39] <nwatkins> gregaf: btw, i may just try to revert back to ceph version from a month ago when things were working fine. is that localized reads patch expected to be easily cherrypicked that far back?
[1:40] <gregaf> nwatkins: yeah, the localized reads stuff is just a few lines in the Objecter
[1:40] <gregaf> as long as it's not what's causing the problems :/
[1:40] <nwatkins> hmm
[1:40] <nwatkins> i'll test that first
[1:57] <gregaf> nwatkins: hmm, I'm not getting much out of this log — it sends out 4 OSD requests, which are replied to; it sends out a bunch of MDS requests which are all replied to; it's not waiting for anything that I can find...
[5:26] <darkfader> Tv: wanna dig through that in a query tomorrow?
[9:28] * CephLogBot (~PircBot@rockbox.widodh.nl) has joined #ceph
[9:28] * wido (~wido@rockbox.widodh.nl) has joined #ceph
[13:00] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:53] * fronlius (~fronlius@g231136124.adsl.alicedsl.de) has joined #ceph
[16:15] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[16:21] * aa (~aa@r186-51-129-4.static.adinet.com.uy) has joined #ceph
[17:04] * andresambrois (~aa@r190-64-71-154.dialup.adsl.anteldata.net.uy) has joined #ceph
[17:05] * aa (~aa@r186-51-129-4.static.adinet.com.uy) Quit (Ping timeout: 480 seconds)
[18:07] * aa (~aa@r186-48-202-160.dialup.adsl.anteldata.net.uy) has joined #ceph
[20:38] * andresambrois (~aa@r186-48-210-168.dialup.adsl.anteldata.net.uy) has joined #ceph
[20:45] * aa (~aa@r186-48-202-160.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.