#ceph IRC Log


IRC Log for 2011-04-04

Timestamps are in GMT/BST.

[0:03] * MarkN (~nathan@ has left #ceph
[0:36] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[0:38] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[0:57] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[1:29] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[1:34] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[2:03] * jeffhung_ (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[2:05] * Meths (rift@customer9880.pool1.unallocated-106-192.orangehomedsl.co.uk) Quit (synthon.oftc.net larich.oftc.net)
[2:05] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (synthon.oftc.net larich.oftc.net)
[2:05] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (synthon.oftc.net larich.oftc.net)
[2:05] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) Quit (synthon.oftc.net larich.oftc.net)
[2:05] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:06] * Meths (rift@customer9880.pool1.unallocated-106-192.orangehomedsl.co.uk) has joined #ceph
[2:06] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[2:06] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) has joined #ceph
[2:16] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[2:18] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:29] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[2:47] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[2:58] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) Quit (synthon.oftc.net larich.oftc.net)
[2:58] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (synthon.oftc.net larich.oftc.net)
[2:58] * Meths (rift@customer9880.pool1.unallocated-106-192.orangehomedsl.co.uk) Quit (synthon.oftc.net larich.oftc.net)
[2:59] * Meths (rift@customer9880.pool1.unallocated-106-192.orangehomedsl.co.uk) has joined #ceph
[2:59] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[2:59] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) has joined #ceph
[3:02] * DJLee (82d8d198@ircip2.mibbit.com) has joined #ceph
[6:21] <DJLee> gregaf, here?
[6:23] <DJLee> about bug#970, i wasn't seeing that when the file sizes are large (and smaller number), so it could've been the queuing..
[6:23] <DJLee> the client only got 4gb ram;
[6:33] <greglap> DJLee: I'm not terribly familiar with the kernel, but what I see here is that it's trying to allocate memory for a new message and failing to do so
[6:33] <greglap> which makes me think that the client must be under memory pressure for some reason
[6:34] <greglap> it shouldn't break anyway so there's definitely some bug but I just want to get a better idea of the circumstances so we know what to look for
[7:05] <DJLee> thanks greg
[7:12] <DJLee> in a properly configured lacp environment for bonding, e.g., dual channel, where the transmit policy is done by src-dst-ip, in this case, a machine with a single IP address, can't push more than a single channel
[7:13] <DJLee> so the machine needs at least 2 separate IP address (2 hostnames) for 2 channels properly
[7:58] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:18] <darkfader> DJLee: the most current switches can hash in the tcp/udp port numbers
[8:18] <darkfader> (not that I got to own anything that new)
[8:46] * gregorg (~Greg@ Quit (Quit: Quitte)
[8:46] * gregorg (~Greg@ has joined #ceph
[8:49] <DJLee> darkfader, sadly, even the latest cisco 3560-x catalyst doesn't have src-dst-port (layer4)
[8:50] <DJLee> it only has src-dst-ip, what a ripoff..
[8:50] * allsystemsarego (~allsystem@ has joined #ceph
[8:52] <DJLee> shouldn't matter too much in real environment where client(s) have unique IP addresses, but for benchmarking, either client (single) or osd machines need multiple IP addresses,
[8:53] <DJLee> and with that multiple IPs, i gotta run simultaneous benchmark and add/merge the results back to get the final performance, etc.
[8:55] <DJLee> just adds one extra annoyance, you know..
[10:34] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[10:42] <maswan> DJLee: Or fast enough NICs that you don't need bonding.
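
The bonding limitation discussed above can be illustrated with a toy hash. This is a minimal sketch, not the algorithm of any particular switch or bonding driver: the XOR-and-modulo hash, the function names, the addresses, and the port numbers are all assumptions chosen for illustration. It shows why a policy keyed only on the source and destination IP pins every flow between two hosts to one link, while a policy that also mixes in the TCP/UDP ports can spread those flows.

```python
import ipaddress


def link_for_ip_pair(src_ip, dst_ip, n_links):
    """Toy src-dst-ip policy: XOR both addresses, reduce modulo the link count."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % n_links


def link_for_flow(src_ip, src_port, dst_ip, dst_port, n_links):
    """Toy layer-4 policy: the port numbers are mixed into the same hash."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst ^ src_port ^ dst_port) % n_links


# Hypothetical benchmark: one client opening four connections to one OSD host.
client, osd = "10.0.0.1", "10.0.0.2"
client_ports = [50000, 50001, 50002, 50003]

# The IP-only hash ignores the ports, so every connection lands on the same link.
ip_only_links = {link_for_ip_pair(client, osd, 2) for _ in client_ports}
# With the ports hashed in, the connections can land on different links.
l4_links = {link_for_flow(client, p, osd, 6800, 2) for p in client_ports}

print("links used with src-dst-ip:", sorted(ip_only_links))
print("links used with src-dst-port:", sorted(l4_links))
```

Under these assumptions the IP-only policy uses a single link no matter how many connections are opened, which matches the need for extra IP addresses when benchmarking from one machine.
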
[11:06] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[11:06] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[11:11] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[12:45] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[12:53] * morse (~morse@supercomputing.univpm.it) Quit (Quit: Bye, see you soon)
[13:03] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[13:09] * DJLee (82d8d198@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:04] * st-7138 (~st-7138@a89-154-147-132.cpe.netcabo.pt) has joined #ceph
[14:06] * st-7138 (~st-7138@a89-154-147-132.cpe.netcabo.pt) Quit (Remote host closed the connection)
[14:08] * st-7236 (~st-7236@a89-154-147-132.cpe.netcabo.pt) has joined #ceph
[14:11] * st-7236 (~st-7236@a89-154-147-132.cpe.netcabo.pt) Quit ()
[14:13] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:15] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[15:01] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[15:18] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:39] * Yoric_ (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[16:39] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[16:39] * Yoric_ is now known as Yoric
[16:53] * MarkN (~nathan@ has joined #ceph
[17:06] * alexxy[home] (~alexxy@ has joined #ceph
[17:07] * cclien_ (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) has joined #ceph
[17:07] * todin_ (tuxadero@kudu.in-berlin.de) has joined #ceph
[17:09] * Anticime1 (anticimex@netforce.csbnet.se) has joined #ceph
[17:10] * Anticimex (anticimex@netforce.csbnet.se) Quit (reticulum.oftc.net kilo.oftc.net)
[17:10] * alexxy (~alexxy@ Quit (reticulum.oftc.net kilo.oftc.net)
[17:10] * todin (tuxadero@kudu.in-berlin.de) Quit (reticulum.oftc.net kilo.oftc.net)
[17:10] * cclien (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) Quit (reticulum.oftc.net kilo.oftc.net)
[17:12] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[17:47] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:51] * greglap (~Adium@ has joined #ceph
[17:53] * morse (~morse@supercomputing.univpm.it) Quit (Quit: Bye, see you soon)
[18:03] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:11] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[18:18] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[18:24] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:24] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[18:41] <cmccabe> wido: you there?
[18:41] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[18:42] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:57] <Tv> anyone know if radosgw provides read-your-write consistency or not? for S3, they're upgrading to that, not all availability zones do it yet but that seems to be the future
[18:57] <Tv> oh i think S3 actually did read-after-write, as in anyone's write, that's even stronger
[18:57] <Tv> http://aws.amazon.com/s3/faqs/#What_data_consistency_model_does_Amazon_S3_employ
[18:57] <Tv> yup
[18:57] <cmccabe> tv: only in norcal, and they charge extra
[18:58] <gregaf> Tv: pretty sure rgw does
[18:58] <gregaf> the write to OSD happens before a response goes back to the client
[18:58] <Tv> cmccabe: if by "only in norcal" you mean "in all but one availability zone"
[18:58] <gregaf> and all reads are serviced without rgw caching right now
[18:58] <cmccabe> tv: hmm, it used to be only northern california, did that change?
[18:58] <Tv> cmccabe: as i said, they're upgrading to that
[18:58] <cmccabe> tv: wow, it did
[19:00] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:00] <cmccabe> tv: but not for us-east, the region that was cheapest and therefore most used :P
[19:01] <Tv> cmccabe: yeah but it sounds like we want to test against what their service will be
[19:02] <cmccabe> tv: yeah
[19:06] * joshd1 (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:13] * alexxy[home] (~alexxy@ Quit (Ping timeout: 480 seconds)
[19:17] * alexxy (~alexxy@ has joined #ceph
[19:24] * alexxy[home] (~alexxy@ has joined #ceph
[19:24] <wido> cmccabe: yes, here
[19:24] <cmccabe> wido: I can't reproduce the bug you filed about osd conf
[19:24] <wido> I've checked my config, I have no changes to the "pid file" directive
[19:24] <cmccabe> wido: have you tried with head of line?
[19:25] <cmccabe> wido: so you're using the default value
[19:25] <wido> just got back home, I'm compiling atm
[19:25] <Tv> def readKeyWithRetry(key):
[19:25] <Tv> # We have this function, since the key's not always picked up the first time after a write.
[19:25] <Tv> # 1:49:14 PM Yehuda S: so when you're reading it it doesn't necessarily being read from the node that you written it into it, and it takes time for it to propagate
[19:25] <Tv> ... that doesn't smell like read-after-write
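
The function quoted above is only a signature and comments, so its real body is not known from this log. The following is a minimal sketch of what a retry-on-miss read generally looks like; the `fetch` callable, the `attempts` and `delay` parameters, and the `LaggyStore` stand-in are all hypothetical and are not taken from the rgw test code.

```python
import time


def read_key_with_retry(fetch, key, attempts=5, delay=0.1):
    """Call fetch(key) until it returns a value or the attempts run out.

    `fetch` is a hypothetical callable that returns None while the key
    is not yet visible to the reader.
    """
    for attempt in range(attempts):
        value = fetch(key)
        if value is not None:
            return value
        if attempt < attempts - 1:
            time.sleep(delay)  # give the write time to become visible
    raise KeyError(key)


class LaggyStore:
    """Stand-in store whose first `lag` reads miss, to simulate propagation delay."""

    def __init__(self, data, lag):
        self.data = data
        self.lag = lag
        self.calls = 0

    def get(self, key):
        self.calls += 1
        if self.calls <= self.lag:
            return None
        return self.data.get(key)


store = LaggyStore({"k": b"v"}, lag=2)
result = read_key_with_retry(store.get, "k", delay=0.0)
print(result, "after", store.calls, "reads")
```

A test suite that needs such a helper is effectively assuming eventual consistency, which is why its presence is at odds with a read-after-write guarantee.
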
[19:25] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[19:25] <wido> cmccabe: head of line, you mean the latest master, correct?
[19:26] <wido> cmccabe: my ceph.conf: http://zooi.widodh.nl/ceph/ceph.conf
[19:26] <cmccabe> wido: yes
[19:28] <wido> cmccabe: Just tried with d941422, still the same result, I'll update the issue
[19:28] <cmccabe> wido: I think I know what's going on here
[19:30] * alexxy (~alexxy@ has joined #ceph
[19:30] <cmccabe> wido: it's a problem with the way that default is implemented
[19:31] <Tv> cmccabe: skype time!
[19:31] <cmccabe> I'm here
[19:32] * alexxy[home] (~alexxy@ Quit (Ping timeout: 480 seconds)
[19:33] <wido> cmccabe: You mean that $type and $id are never replaced by their correct values? I've been searching through the code and couldn't find that
[19:34] * alexxy[home] (~alexxy@ has joined #ceph
[19:36] <cmccabe> wido: it happens for everything else, just not that default that is set in common_init
[19:36] <cmccabe> actually common_preinit
[19:37] <cmccabe> it can't be expanded there because we don't yet have the correct values for $type and $id
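
The ordering problem described above, a default being set before `$type` and `$id` are known, can be sketched as follows. This is an illustration only and not the Ceph config code or the later fix: the `Config` class, its method names, and the pid file path are assumptions. The idea shown is to store the default unexpanded and substitute the metavariables only when the value is read.

```python
from string import Template


class Config:
    """Hypothetical config store that defers metavariable expansion to read time."""

    def __init__(self):
        self._raw = {}   # option name -> unexpanded value
        self._meta = {}  # metavariables such as type and id

    def set_default(self, key, raw_value):
        # Stored as-is: at this point $type and $id may not be known yet.
        self._raw[key] = raw_value

    def set_meta(self, **meta):
        self._meta.update(meta)

    def get(self, key):
        # Expand only now, when the metavariables have their final values.
        return Template(self._raw[key]).substitute(self._meta)


conf = Config()
# Early init: a default containing placeholders (the path is illustrative).
conf.set_default("pid file", "/var/run/ceph/$type.$id.pid")

# Expanding too early leaves the placeholders in the value.
too_early = Template("/var/run/ceph/$type.$id.pid").safe_substitute({})

# Later init: the daemon's type and id are finally known.
conf.set_meta(type="osd", id="0")
pid_path = conf.get("pid file")

print(too_early)
print(pid_path)
```
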
[19:38] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[19:53] <gregaf> Tv: hmmm, does rgw use read-from-any-osd or something?
[19:53] <Tv> gregaf: no clue yet
[19:54] <gregaf> or it might just be that he wrote that prepared for caching gateways or something
[19:54] <Tv> gregaf: i want to get the tests in shape where anyone can run them, then i'll focus on rgw
[19:54] <Tv> well, in this case, caches need invalidation..
[19:54] <gregaf> Tv: yeah, but I don't think there actually are any caches yet
[19:55] <Tv> yeah there aren't afaik
[19:57] * Juul (~Juul@ has joined #ceph
[20:00] <wido> For caching I wouldn't build that into RGW
[20:01] <wido> I'd use Varnish for that, it's a great reverse-proxy with caching, it could intercept PUTs and invalidate its own cache
[20:01] <Tv> depends on what the invalidation etc look like
[20:01] <wido> I'm using it in front of RGW, works fine
[20:01] <wido> there is some configuration example in the wiki of the RGW
[20:02] <Tv> but yeah i've used varnish extensively, i'm a big fan of it (apart from a few nagging issues :( )
[20:02] <Tv> but its cache invalidation might not be good enough, at least without careful work
[20:03] <wido> You could purge (called ban nowadays)
[20:03] <cmccabe> tv: I guess squid is often used too
[20:03] <cmccabe> tv: I've never used that one though
[20:04] <cmccabe> tv: but it seems more appropriate for ceph, right? (joke)
[20:04] <wido> Varnish really rocks, I'm using it for a few big websites and even in shared webhosting, great piece of software
[20:05] <Tv> squid : varnish :: sendmail : postfix
[20:06] <Tv> wido: except for the part where it can e.g. corrupt its log ringbuffer and start spewing garbage and/or refuse to restart :(
[20:07] <cmccabe> tv: I don't really know that much about sendmail, except for the fact that it got a bad security rep, and has very complex configuration
[20:07] <Tv> cmccabe: and it's slow -- and now you know a lot about squid, too!
[20:07] <cmccabe> tv: I ran sendmail once locally but never was a sysadmin using it
[20:08] <cmccabe> tv: MTAs seem to be a flamewar-provoking topic, but there was this thing called qmail that half the people hated, and the other half loved
[20:08] <wido> Tv: when is the last time you used Varnish?
[20:08] <Tv> djb has good ideas and bad communication skills
[20:08] <Tv> wido: a few months ago
[20:08] <wido> Was that version 2.1? That one fixed a lot
[20:08] <cmccabe> tv: I'm not too familiar with postfix, is it good?
[20:09] <Tv> cmccabe: postfix is the best thing out there in that space
[20:09] <Tv> wido: probably 2.1.3, not sure anymore
[20:10] <cmccabe> tv: I'll keep that in mind if I ever want to try running a mail server again
[20:10] <cmccabe> tv: which seems unlikely given how cloud-y everything is getting these days
[20:11] <wido> Tv: You should try 2.1.5, works much better
[20:12] <Tv> wido: no longer at that gig
[20:15] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[20:19] * RickB17 (~rbreidens@pat.recoverynetworks.com) has joined #ceph
[20:33] <bchrisman> if a node reboots, (cosd goes down and comes back up), will it auto-rejoin, or do I need to mark those osds up/in?
[20:33] <bchrisman> oops
[20:33] <bchrisman> answered my own question.. ;)
[20:49] * imcsk8 (~ichavero@nat.ti.uach.mx) has joined #ceph
[20:54] * cmccabe1 (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[20:54] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[20:59] <cmccabe1> wido: the config bug should be fixed by 0e26ece4e366972cbcbaf76db75df8d4512e361e
[20:59] <cmccabe1> wido: let me know if it works for you
[21:15] <wido> cmccabe1: Yes, it did
[21:15] <cmccabe1> wido: great.
[21:23] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[21:27] * stefanha (~stefanha@yuzuki.vmsplice.net) has joined #ceph
[21:34] <stefanha> joshd1: In qemu rbd when the guest does a 4k write is it possible to overwrite the object in-place at the ceph osd level? (I know btrfs may not overwrite data on disk but that's a different story)
[21:34] <stefanha> Or is there some kind of read-modify-write going on?
[21:36] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[21:41] <Tv> stefanha: the operation on the wire should look like "write this data at this offset"
[21:41] <Tv> stefanha: so no read-modify-write cycle on the network level, at least
[21:45] <stefanha> Tv: Okay. I'm wondering if cosd will look up the object in its store and perform the write inside the object.
[21:45] <joshd1> stefanha: there's no read-modify-write in the rbd layer
[21:45] <stefanha> Tv: I saw there was some journalling thing inside cosd. Not sure how that comes into play but I think it journals a copy of the data to be written.
[21:45] <stefanha> joshd1: rbd layer on the client side or on the server side?
[21:46] <joshd1> stefanha: I think both
[21:46] <iggy> cosd used to have its own storage layer (vs using the fs), could be related
[21:47] <gregaf> stefanha: depending on the underlying FS in use, it does journal the data to be written before it applies it to the store
[21:48] <cmccabe1> the journalling layer in the OSD is still used
[21:48] <gregaf> but it doesn't read-modify-write at any point and once the data is journaled any subsequent reads will block until the in-journal data is readable
[21:49] <stefanha> What's the nature of the journal? Do operations get appended to the journal and then acked to the client or does the operation also have to update the actual object in the store?
[21:50] <stefanha> i.e. is the journal a buffer to queue up operations and apply them later or is it purely for the crash/powerfailure case where ceph doesn't want to corrupt data?
[21:51] <gregaf> I'm not sure your question quite makes sense?
[21:51] <gregaf> the ultimate purpose of course is for data safety
[21:51] <gregaf> so with most FSes operations go into the journal first, and then are applied to the store
[21:52] <gregaf> with btrfs we can take a snapshot and put the data in-store and in-journal simultaneously, since if there's a failure we can just play back the journal on top of the snapshot
[21:53] <gregaf> but in most cases the client will get a commit notification as soon as the operation is journaled
[21:54] <stefanha> gregaf: thanks, that's what I was looking for.
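
The journal behaviour described above (journal first, acknowledge, apply to the store, no read-modify-write, reads wait for journaled data) can be modelled with a small in-memory toy. This is a sketch under stated assumptions, not cosd's implementation: `JournaledStore` and its methods are hypothetical, and the "blocking" read is modelled by simply applying pending journal entries before reading.

```python
class JournaledStore:
    """Toy write-ahead journal in front of an in-memory object store."""

    def __init__(self):
        self.journal = []  # pending (name, offset, data) entries, in order
        self.objects = {}  # object name -> bytearray contents

    def write(self, name, offset, data):
        # The write is appended to the journal and acknowledged right away;
        # the object itself has not been touched yet.
        self.journal.append((name, offset, bytes(data)))
        return "commit"

    def apply_pending(self):
        # Replay journal entries into the store as in-place writes at an
        # offset, so the existing object contents are never read back first.
        for name, offset, data in self.journal:
            obj = self.objects.setdefault(name, bytearray())
            end = offset + len(data)
            if len(obj) < end:
                obj.extend(b"\0" * (end - len(obj)))
            obj[offset:end] = data
        self.journal.clear()

    def read(self, name, offset, length):
        # A read must see journaled data, so pending entries are applied first.
        self.apply_pending()
        return bytes(self.objects[name][offset:offset + length])


store = JournaledStore()
ack = store.write("obj", 0, b"aaaaaaaa")
store.write("obj", 2, b"BB")       # small overwrite inside the object
pending = len(store.journal)       # both writes acked but not yet applied
data = store.read("obj", 0, 8)

print(ack, pending, data)
```

The same replay step is what would run after a crash in this model: anything acknowledged is in the journal, so applying it again on top of the last known store state recovers the data.
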
[22:21] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[23:07] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[23:35] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[23:52] * MarkN (~nathan@ has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.