#ceph IRC Log


IRC Log for 2011-05-25

Timestamps are in GMT/BST.

[0:28] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[0:33] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[0:42] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[0:50] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[0:50] <Tv> yehudasa: rgw changes broke make check
[1:07] <cmccabe> I'm having some trouble connecting to a ceph cluster I'm running on rgw-cmccabe
[1:08] <cmccabe> nmap says that the only open ports are 22, 80, 111... is there some kind of firewall or something in effect on the virtual machines
[1:09] <cmccabe> actually, I have mon addr set to perhaps that is causing problems?
[1:16] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[1:18] * verwilst_ (~verwilst@dD576F05B.access.telenet.be) Quit (Quit: Ex-Chat)
[1:32] * sjustlaptop (~sam@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:58] * bchrisman (~Adium@sjs-cc-wifi-1-1-lc-int.sjsu.edu) Quit (Quit: Leaving.)
[2:08] * djlee_ (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[2:13] <yehudasa> cmccabe: 9a660ac910 broke rgw
[2:14] <cmccabe> I'm looking at it
[2:15] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[2:17] <yehudasa> cmccabe: you changed RGWRados::initialize(md_config_t *conf) to something else, but that's a virtual function
[2:18] <yehudasa> so you need to change RGWAccess::initialize too
[2:18] <cmccabe> yep
[2:18] <cmccabe> I'm not sure why the base class had an implementation
[2:19] <yehudasa> because I did it so
[2:19] <cmccabe> I think I left it that way because it was like that when I got there
[2:19] <yehudasa> I gtg now, but it completely broke radosgw, radosgw_admin
[2:19] <cmccabe> I'll retest with this change
[2:24] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:26] <djlee_> cmccabe: for the sequential writing to ceph mount, say 2 osd (2 disk), using say dd, where and how (which patterns) does the objects get written?
[2:26] <cmccabe> djlee_: the mds handles the mapping between file data and objects
[2:27] <djlee_> cmccabe: surely the way disk writes, it doesn't appear sequential....(when it should?)
[2:27] <cmccabe> djlee_: I'm not sure what you're asking.
[2:27] <cmccabe> djlee_: even on a local FS, the data won't be sequential in terms of disk sectors unless you're using extents
[2:28] <djlee_> cmccabe: extents? you mean xattr?
[2:28] <cmccabe> djlee_: I'm definitely not really an MDS expert; greg and sage know the most about that code
[2:29] <cmccabe> djlee_: https://sort.symantec.com/public/documents/sfha/5.1sp1/hp-ux/productguides/html/vxfs_admin/ch01s02s01.htm
[2:29] <cmccabe> or maybe better: http://chrismiles.info/unix/sun/performance/disks.html
[2:30] <cmccabe> "Extent-based filesystems allocate disk blocks in large groups at a single time, which forces sequential allocation. As a file is written, a large number of blocks are allocated, after which writes can occur in large groups or clusters of sequential blocks. Filesystem metadata is written when the file is first created. Subsequent writes within the first allocation extent of blocks do not require additional metadata writes (until the next extent is allocated)."
[2:30] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:31] <djlee_> OK
[2:32] <djlee_> for write-journal size, e.g., set to 1gb
[2:33] <djlee_> what exactly happens after say dd writes more than 1gb to it?
[2:33] <djlee_> does the osd flushes the 1gb of journal back to object-data, while still receiving the remaining dd writes (to journal) ?
[2:33] <cmccabe> djlee_: basically we block until the journal gets written out
[2:34] <djlee_> argh.
[2:35] <cmccabe> djlee_: I mean in general you'll be adding new data to the journal continuously
[2:35] <djlee_> so then any write process with size less than the journal-size 1GB, say 200mb write, is (not) instant?
[2:36] <cmccabe> djlee_: you can reuse journal space once that data is safely on disk
[2:36] <cmccabe> djlee_ I don't understand the question really
[2:36] <djlee_> i.e., how fast is it journal or no-journal, if I stay under 1gb, then I get really fast-performance, but after 1gb, really slow-performance?
[2:36] <cmccabe> djlee_: no
[2:37] <cmccabe> djlee_: the data in the journal is continuously being written out to disk
[2:37] <cmccabe> djlee_: once it's committed to disk it doesn't need to be in the journal any more
[2:37] <djlee_> i see, when you meant 'reuse journal space', right, I suppose ceph will automatically 'continuously write to disk'
[2:37] <cmccabe> djlee_: http://ceph.newdream.net/wiki/OSD_journal
[2:39] <djlee_> heh; its updated im reading it thanks :)
[2:39] <cmccabe> Do you understand the basic concept of journalling?
[2:39] <djlee_> i thought i did for ext4
[2:39] <djlee_> but ceph's yet-another one on top of ext4's journal is confusing me
[2:39] <cmccabe> the basic idea is that you want to make sure that your data is safely committed to disk before sending back an ack
[2:40] <cmccabe> but putting data into its final destination and then waiting for sync() is slow
[2:40] <cmccabe> so you add the data to the journal instead
[2:41] <cmccabe> basically it's taking advantage of the fact that it's more efficient to commit a lot of things at once
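The write-ahead idea cmccabe outlines above (append to the journal, ack, then batch-commit to the final destination) can be sketched as a toy model in Python. This is an illustration of the general technique only, not the cosd implementation; all names here are hypothetical:

```python
import os

class Journal:
    """Toy write-ahead journal: append a record, fsync, ack, and flush to the
    final store later in a batch. Journal space is reusable after the flush."""

    def __init__(self, path):
        self.path = path
        self.pending = []  # records acked but not yet at their final destination

    def write(self, key, data):
        # 1. One sequential append to the journal file, forced to disk.
        with open(self.path, "ab") as j:
            j.write(len(data).to_bytes(4, "big") + data)
            j.flush()
            os.fsync(j.fileno())
        self.pending.append((key, data))
        return "ack"  # safe to ack: a crash can be recovered by replaying the journal

    def flush(self, store):
        # 2. Later, commit many records to their final location at once --
        #    this batching is what makes the journal a win.
        for key, data in self.pending:
            store[key] = data
        self.pending.clear()  # this journal space can now be reused
```

The data is written twice (journal, then final location), which is the "we write all data twice" trade-off discussed a few lines below.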
[2:43] <cmccabe> actually, journalling can be turned off on ext4
[2:43] <djlee_> in the pros/cons, when you say 'we write all data twice', so the first goes to journal, and second (backend) disk's will copy off from the journal to it's storage.
[2:43] <cmccabe> yep
[2:43] <djlee_> are you saying that there's no-need for double-journal?
[2:44] <cmccabe> I'm not sure-- I'd have to think about that
[2:44] <djlee_> if i remember ext4 with or without journal, the performance diff wasn't big
[2:45] <cmccabe> in theory it seems like you would want to run ext4 without the journal, since you should be getting your safety provided by ceph's replication
[2:46] <djlee_> also the ram helps none for the write operation as far as I see
[2:46] <cmccabe> so I don't think the journal is really buying you anything at that point
[2:46] <cmccabe> we generally encourage people to run btrfs
[2:46] <cmccabe> so I guess we haven't really discussed that aspect of ext4
[2:47] <cmccabe> RAM is not permanent storage so it can't be used for journalling-- unless it's NVRAM
[2:48] <djlee_> right, what im saying is that there's no cache-hit for write, but cache-hit for reading (randomly, etc)
[2:48] <cmccabe> in caching terms we are doing a write-through
[2:48] <djlee_> basically, ive been building this rather a large matrix of node vs osd, etc relationship for write and reading, of a bunch of filesets
[2:49] <cmccabe> http://en.wikipedia.org/wiki/Cache
[2:49] <djlee_> to see a scalability of per-disk
[2:49] <cmccabe> "In a write-through cache, every write to the cache causes a synchronous write to the backing store"
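The write-through behavior quoted above can be sketched in a few lines of Python (a generic illustration with hypothetical names, not Ceph code): every write hits the backing store synchronously, so the cache only ever accelerates reads, matching djlee_'s observation that RAM helps reads but not writes.

```python
class WriteThroughCache:
    """Write-through cache: every write goes synchronously to the backing
    store, so caching buys nothing on the write path, only on reads."""

    def __init__(self, backing):
        self.backing = backing  # e.g. a dict standing in for slow storage
        self.cache = {}

    def write(self, key, value):
        self.backing[key] = value  # synchronous write to the backing store
        self.cache[key] = value    # cache updated as a side effect

    def read(self, key):
        if key in self.cache:      # cache hit: no backing-store access
            return self.cache[key]
        value = self.backing[key]  # cache miss: fetch and remember
        self.cache[key] = value
        return value
```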
[2:50] <cmccabe> I guess the three big variables are network bandwidth, I/O bandwidth, and CPU
[2:50] <cmccabe> broadly speaking
[2:51] <cmccabe> I guess RAM matters, but only for reads...as you mentioned
[2:51] <djlee_> yeah
[2:51] <djlee_> cpu+mobo is the biggest one
[2:53] <cmccabe> I think we originally were hoping CPU wouldn't have much impact on the OSD performance
[2:53] <djlee_> yeah but when running multiple cosds, i think there's some hit
[2:53] <djlee_> for crappy PCs i mean
[2:53] <cmccabe> but I guess maybe cosd turns out to take more CPU than we hoped
[2:54] <djlee_> good PC, still suffers from the bottleneck of the file-size though,
[2:55] <djlee_> e.g., Im testing a large fileset of, variable sized, e.g., 2kb all the way to 2GB+ etc, with lots of small sizes
[2:55] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:55] <djlee_> and so it seems that disks or ceph (or over-the-network), can't seem to increase performance of small sized files
[2:56] <djlee_> well there is an increase with adding more disk, etc, but i meant the actual increase per-disk is about 10MB/s (and this is using the fileset with many small sizes)
[2:56] <djlee_> big files no problem, good mb/s, iirc
[2:59] <djlee_> i think its pretty common to see this small vs big size difference, but i suppose its not ceph specific, as it applies to almost all file systems
[3:00] <cmccabe> that is true
[3:00] <cmccabe> small files cause a lot more metadata to be generated, relative to the size of data
[3:01] <djlee_> right, does that mean that small-size file, e.g., 4kb file will be separated by 4mb or 4kb (fixed-)object?
[3:02] <cmccabe> I don't think cosd imposes any minimum object size
[3:02] <djlee_> i hear about the 4mb chunking
[3:03] <cmccabe> that is RBD
[3:04] <djlee_> arg
[3:04] <cmccabe> there isn't actually any limit on the size of rados objects as far as I know
[3:04] <cmccabe> I don't know what the maximum object size that the MDS uses is
[3:05] <cmccabe> it might be 4 MB, but I never saw any documentation about that
[3:06] <djlee_> right, so then i presume 4kb and 8kb file to be stored in ceph mount, will be about 4kb and 8kb object size plus the metadata size
[3:06] <djlee_> sorry for the noob question!
[3:08] <cmccabe> I'm not sure exactly what the MDS' strategy is for chunking
[3:09] <djlee_> whats the big deal with rbd? i mean as far as a real production setups are concerned. after all the object talks, block-device?
[3:09] <cmccabe> RBD is a network block device similar to GNBD
[3:10] <cmccabe> or sheepdog
[3:10] <cmccabe> the big deal is mostly focused on storing virtual machine images, at the moment
[3:11] <cmccabe> I have to revise something I said earlier:
[3:11] <cmccabe> the MDS does stripe the file over 4 MB objects by default
[3:11] <cmccabe> it's controlled by stripe_unit and stripe_count, but the default is 4MB
[3:12] <djlee_> so rbd is really for a single-node for, say, testing i guess (for now), and not for (real) distributed environment; for this we need the proper ceph mon/mds/osd,
[3:12] <cmccabe> I would guess that for files less than 4MB, they would fit in a single object all the time. However, I'm not 100% sure.
[3:13] <djlee_> i see, so that default 4mb, means its capped/maxed at 4MB? so if I have a 8mb , then 2 chunks of 4MB. If I have 4kb, then 1 chunk of.. 4kb (or4MB) ?
[3:13] <cmccabe> I don't think RBD uses the metadata servers at all
[3:13] <cmccabe> RBD is not "a single node"; it is distributed
[3:13] <cmccabe> RGW is a single-node, centralized system, however
[3:13] <cmccabe> I think so
[3:14] <cmccabe> I never dealt with striping directly but everything I read seems to confirm that
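The default layout described above (4 MB objects, controlled by stripe_unit and stripe_count) implies a simple mapping from file size to object count in the simplest case of stripe_count = 1, where stripe unit and object size coincide. A sketch under that assumption, which also answers djlee_'s 4 kB / 8 MB question:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # default 4 MB stripe/object size

def objects_for_file(size, object_size=OBJECT_SIZE):
    """Number of objects a file of `size` bytes maps to, assuming the simple
    default layout (stripe_count = 1, stripe unit = object size).
    Only the final object is smaller than object_size."""
    if size == 0:
        return 0
    return -(-size // object_size)  # ceiling division

# An 8 MB file spans two 4 MB objects; a 4 kB file fits in a single
# object that only occupies 4 kB, since no minimum object size is imposed.
```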
[3:23] <cmccabe> have a good night
[3:23] * cmccabe (~cmccabe@ has left #ceph
[4:38] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: Been here. Done that.)
[4:39] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[4:55] * Nadir_Seen_Fire (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[5:01] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[5:13] * sjustlaptop (~sam@adsl-76-208-176-239.dsl.lsan03.sbcglobal.net) has joined #ceph
[5:27] * sjustlaptop (~sam@adsl-76-208-176-239.dsl.lsan03.sbcglobal.net) Quit (Quit: Leaving.)
[5:31] * sjustlaptop (~sam@adsl-76-208-176-239.dsl.lsan03.sbcglobal.net) has joined #ceph
[5:33] * sjustlaptop (~sam@adsl-76-208-176-239.dsl.lsan03.sbcglobal.net) Quit ()
[5:34] * sjustlaptop (~sam@adsl-76-208-176-239.dsl.lsan03.sbcglobal.net) has joined #ceph
[6:37] * MarkN (~nathan@ has joined #ceph
[7:30] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[7:40] * lxo (~aoliva@ has joined #ceph
[7:57] * alexxy[home] (~alexxy@ has joined #ceph
[8:00] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[8:09] * alexxy[home] (~alexxy@ Quit (Ping timeout: 480 seconds)
[8:11] * alexxy (~alexxy@ has joined #ceph
[8:16] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[8:20] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[8:20] * wonko_be (bernard@november.openminds.be) Quit (Read error: Connection reset by peer)
[8:21] * sjustlaptop (~sam@adsl-76-208-176-239.dsl.lsan03.sbcglobal.net) Quit (Quit: Leaving.)
[8:22] * wonko_be (bernard@november.openminds.be) has joined #ceph
[8:26] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[8:52] * andret (~andre@pcandre.nine.ch) has joined #ceph
[8:53] * allsystemsarego (~allsystem@ has joined #ceph
[8:54] * darktim (~andre@ticket1.nine.ch) has joined #ceph
[11:09] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[11:18] * Yulya_ (~Yu1ya_@ip-95-220-133-71.bb.netbynet.ru) has joined #ceph
[11:25] * Yulya___ (~Yu1ya_@ip-95-220-166-177.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[11:42] * mihu (~mihu@praha.web4u.cz) has joined #ceph
[11:44] * mihu (~mihu@praha.web4u.cz) has left #ceph
[11:44] * Yulya__ (~Yu1ya_@ip-95-220-187-248.bb.netbynet.ru) has joined #ceph
[11:50] * Yulya_ (~Yu1ya_@ip-95-220-133-71.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[12:25] * Yulya__ (~Yu1ya_@ip-95-220-187-248.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[12:27] * Yulya_ (~Yu1ya_@ip-95-220-255-70.bb.netbynet.ru) has joined #ceph
[13:08] * Yulya__ (~Yu1ya_@ip-95-220-136-240.bb.netbynet.ru) has joined #ceph
[13:15] * Yulya_ (~Yu1ya_@ip-95-220-255-70.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[13:52] * philipgian (~philipgia@athedsl-4504336.home.otenet.gr) Quit (Ping timeout: 480 seconds)
[13:59] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:09] * Yulya_ (~Yu1ya_@ip-95-220-187-68.bb.netbynet.ru) has joined #ceph
[14:15] * Yulya__ (~Yu1ya_@ip-95-220-136-240.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[15:40] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[15:53] * slang (~slang@chml01.drwholdings.com) has joined #ceph
[16:04] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[16:06] * Yulya__ (~Yu1ya_@ip-95-220-129-123.bb.netbynet.ru) has joined #ceph
[16:08] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[16:13] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[16:13] * Yulya_ (~Yu1ya_@ip-95-220-187-68.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[16:23] * tjikkun (~tjikkun@195-240-187-63.ip.telfort.nl) Quit (Quit: Ex-Chat)
[16:26] * Yulya__ (~Yu1ya_@ip-95-220-129-123.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[16:28] * Yulya_ (~Yu1ya_@ip-95-220-244-12.bb.netbynet.ru) has joined #ceph
[16:49] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[16:54] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[16:59] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[17:08] * sjustlaptop (~sam@adsl-76-208-176-239.dsl.lsan03.sbcglobal.net) has joined #ceph
[17:10] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[17:30] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:34] * aliguori (~anthony@ has joined #ceph
[17:36] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:44] * sjustlaptop (~sam@adsl-76-208-176-239.dsl.lsan03.sbcglobal.net) Quit (Quit: Leaving.)
[17:55] * Nadir_Seen_Fire (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[17:56] * MK_FG (~MK_FG@ has joined #ceph
[18:05] * MK_FG (~MK_FG@ Quit (Quit: o//)
[18:06] * aliguori (~anthony@ Quit (Read error: Operation timed out)
[18:07] * MK_FG (~MK_FG@ has joined #ceph
[18:08] * MK_FG (~MK_FG@ Quit ()
[18:12] * MK_FG (~MK_FG@ has joined #ceph
[18:13] * MK_FG (~MK_FG@ Quit ()
[18:15] * MK_FG (~MK_FG@ has joined #ceph
[18:15] * Yulya__ (~Yu1ya_@ip-95-220-238-119.bb.netbynet.ru) has joined #ceph
[18:20] * Yulya_ (~Yu1ya_@ip-95-220-244-12.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[18:23] * aliguori (~anthony@ has joined #ceph
[18:28] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:33] * Yulya_ (~Yu1ya_@ip-95-220-241-134.bb.netbynet.ru) has joined #ceph
[18:40] * Yulya__ (~Yu1ya_@ip-95-220-238-119.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[18:44] * aliguori (~anthony@ Quit (Remote host closed the connection)
[18:49] * Yulya__ (~Yu1ya_@ip-95-220-253-227.bb.netbynet.ru) has joined #ceph
[18:56] * Yulya_ (~Yu1ya_@ip-95-220-241-134.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[18:57] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:58] <bchrisman> sagewk: From first estimation of logs, it looks like when a process with a posix lock on one node dies/gets killed, the kernel client doesn't notify the mds to cleanup that process's locks (or the mds gets notified but doesn't clean things up). We're seeing mds log messages with references to pids that are no longer running. I wanted to check whether that functionality is currently implemented.
[18:59] <bchrisman> Once we tell the ping_pong test to unlock on sigint, the test runs as it's supposed to...
[19:06] * cmccabe (~cmccabe@ has joined #ceph
[19:10] <bchrisman> sagewk: also.. this only happens when there's contention for locks from multiple nodes... so I imagine the basic functionality is there
[19:14] * Yulya_ (~Yu1ya_@ip-95-220-176-197.bb.netbynet.ru) has joined #ceph
[19:20] * alexxy (~alexxy@ has joined #ceph
[19:21] * Yulya__ (~Yu1ya_@ip-95-220-253-227.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[19:32] <sagewk> bchrisman: hmm! will take a look in a bit
[19:32] <Tv> fuse: unknown option `-osubtype=cfuse'
[19:32] <Tv> anyone know what that's all about?
[19:32] <sagewk> no idea
[19:33] <Tv> there's no real matches for "subtype" in ceph.git :(
[19:33] <bchrisman> I think we have more info on that.
[19:33] <bchrisman> we can talk about it on conf
[19:33] <bchrisman> we're also producing a test case.
[19:48] <Tv> my fuse problem seems to be a version thing.. i don't know why it would have started to trigger now, though :(
[19:52] <sagewk> libfuse2 version or something?
[19:52] <Tv> all at 2.8.4-something as far as i can see
[19:52] <Tv> redoing that part manually
[19:54] <sagewk> bchrisman: can you reproduce the lock hang with mds logs cranked up?
[19:54] <bchrisman> yeah.. we're going to do that now
[19:55] <bchrisman> we were just guessing that remove_lock doesn't get called because the client doesn't actually have a lock, it is just waiting for a lock... so the question is where it's supposed to clean up processes waiting for a lock after they exit.
[19:56] <Tv> ah bah cfuse doesn't understand -- to terminate flags, that's what was different
[19:58] <sagewk> bchrisman: oh i see.
[19:58] * Dantman (~dantman@ has joined #ceph
[20:01] <bchrisman> sagewk: okay.. think we tracked it down.. process1 gets lock... process2 gets on wait list... process2 exits... process1 unlocks... now mds shows process2 holding lock
[20:01] <sagewk> bchrisman: k, looking
[20:14] <sagewk> bchrisman: did you get a log?
[20:16] <bchrisman> yeah.. one sec
[20:18] <cmccabe> seems like DELETE no longer works on rgw
[20:19] <cmccabe> yehuda, are you there?
[20:19] <yehudasa> cmccabe: yes
[20:20] <cmccabe> I haven't managed to get a backtrace or even anything in the logs yet
[20:20] <cmccabe> but I get a 500 internal error trying to delete objects
[20:20] <cmccabe> hmm, weird; it worked that time
[20:20] <yehudasa> hmm.. just tested it and it works for me
[20:21] <sagewk> bchrisman: is there some command line tool you're using to do the locking?
[20:21] <yehudasa> probably crashed for some reason.. on which machine were you trying that?
[20:21] <cmccabe> rgw-cmccabe
[20:21] <yehudasa> do you have latest rgw there"?
[20:21] <cmccabe> yes
[20:22] <cmccabe> I am feeling kind of frustrated because I told apache to launch only one radosgw and it launched 2
[20:22] <cmccabe> also I can't find anything in the logs
[20:22] <bchrisman> sagewk: just a simple c-prog... it can be reproduced via ping_pong.c as well.
[20:22] <yehudasa> did that happen just after restarting apache?
[20:22] <bchrisman> still getting logs
[20:23] <cmccabe> ok, here are some lines
[20:23] <cmccabe> - - [25/May/2011:14:23:15 -0400] "HEAD /cmccabe2/aaa HTTP/1.1" 500 - "-" "Boto/2.0b5 (linux2)"
[20:23] <sagewk> bchrisman: do you mind pastebinning that?
[20:23] <bchrisman> yup.. will do.
[20:25] <cmccabe> usually this sort of thing means core dump
[20:25] <cmccabe> but the pids are staying the same, and the core directory is staying empty
[20:26] <yehudasa> cmccabe: did that happen just after apache restarted?
[20:26] <cmccabe> I'm not sure what you mean
[20:26] <cmccabe> it doesn't seem timing-dependent, if that's what you mean
[20:26] <yehudasa> was it the first operation after apache restarted?
[20:26] <cmccabe> no
[20:28] <cmccabe> I guess I'm going to turn up rgw log and see what happens
[20:31] <bchrisman> logs are > 500k, I'm going to open an issue in bug tracker
[20:31] <cmccabe> it can be reproduced every time by running obsync with -d
[20:31] * aliguori (~anthony@ has joined #ceph
[20:33] <yehudasa> cmccabe: do you have logs?
[20:33] <cmccabe> ok, I finally found the logs
[20:33] <cmccabe> yehudasa: Wed, 25 May 2011 18:29:28 GMT
[20:33] <cmccabe> /cmccabe2/aaa
[20:33] <cmccabe> ...
[20:33] <cmccabe> 2011-05-25 14:33:18.969670 7f7cbfdd7740 flush(): buf='<?xml version="1.0" encoding="UTF-8"?><Error><Code>UnknownError</Code></Error>' strlen(buf)=78
[20:34] <cmccabe> read_permissions on cmccabe2:aaa only_bucket=0 ret=-61
[20:34] <yehudasa> 61 == ENODATA
[20:35] <yehudasa> cmccabe: were you creating this bucket using obsync?
[20:36] <cmccabe> no... but maybe it got altered
[20:37] <bchrisman> sagewk: submitted with issue... testcode still coming.
[20:37] <yehudasa> cmccabe: it's missing the acl xattr
[20:38] <cmccabe> cmccabe@metropolis:~/ceph/src$ ./rados -c ~/s3examples/rgw.ceph.conf -p .rgw getxattr cmccabe2 user.rgw.acl
[20:38] <cmccabe> cmccabecmccabecmccabecmccabecmccabe
[20:39] <cmccabe> it doesn't seem to be missing
[20:40] <cmccabe> also, boto_tool can make s3 requests to cmccabe2 just fine
[20:40] <yehudasa> maybe it's missing the acl on the 'aaa' object?
[20:41] <cmccabe> yep
[20:42] <cmccabe> removing that lets it work
[20:42] <cmccabe> must have been created in a previous run of the unit test
[20:43] <cmccabe> and since there's no way of checking python besides running unit tests... because there's no static typing... grr
[20:48] * Dantman (~dantman@ Quit (Ping timeout: 480 seconds)
[20:53] <cmccabe> I might almost have to reinitialize rados/rgw each time
[20:57] * sjustlaptop (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:57] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[21:29] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[21:40] * aliguori (~anthony@ has joined #ceph
[22:03] * Yulya__ (~Yu1ya_@ip-95-220-181-18.bb.netbynet.ru) has joined #ceph
[22:10] * Yulya_ (~Yu1ya_@ip-95-220-176-197.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[22:19] * Yulya_ (~Yu1ya_@ip-95-220-189-189.bb.netbynet.ru) has joined #ceph
[22:25] * Yulya__ (~Yu1ya_@ip-95-220-181-18.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[22:25] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[22:25] * Yulya__ (~Yu1ya_@ip-95-220-128-19.bb.netbynet.ru) has joined #ceph
[22:33] * Yulya_ (~Yu1ya_@ip-95-220-189-189.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[22:33] * Yulya_ (~Yu1ya_@ip-95-220-233-225.bb.netbynet.ru) has joined #ceph
[22:40] * Yulya__ (~Yu1ya_@ip-95-220-128-19.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[23:01] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:07] * Yulya__ (~Yu1ya_@ip-95-220-241-12.bb.netbynet.ru) has joined #ceph
[23:14] * Yulya_ (~Yu1ya_@ip-95-220-233-225.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[23:20] * Yulya_ (~Yu1ya_@ip-95-220-235-151.bb.netbynet.ru) has joined #ceph
[23:24] * Yulya__ (~Yu1ya_@ip-95-220-241-12.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[23:25] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[23:34] * aliguori (~anthony@ has joined #ceph
[23:34] * verwilst (~verwilst@dD57693C8.access.telenet.be) has joined #ceph
[23:50] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[23:52] * MK_FG (~MK_FG@ has joined #ceph
[23:53] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[23:55] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[23:56] * darkfader (~floh@ has joined #ceph
[23:57] <sagewk> bchrisman: just pushed kclient and mds fixes for the locking thing
[23:58] <bchrisman> sagewk: will require both?
[23:58] <bchrisman> I'm guessing it would..
[23:58] <bchrisman> we've been putting off kernel upgrade automation for too long now :)
[23:58] <sagewk> yeah
[23:58] <Tv> bchrisman: i hear ya..
[23:58] <Tv> though now i have a clear idea of how to do it, for once
[23:59] <bchrisman> Tv: what's yer plan? ;)
[23:59] * darkfaded (~floh@ Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.