#ceph IRC Log

IRC Log for 2011-05-10

Timestamps are in GMT/BST.

[0:26] * verwilst (~verwilst@dD57695CA.access.telenet.be) has joined #ceph
[0:36] <wido> sagewk: are you there?
[0:42] <wido> Not sure if my e-mail got out. But I'm in LA right now, let's say I'll drop by at 10am tomorrow?
[0:42] * Meduka_M1guca (~Yulya@ip-95-220-151-67.bb.netbynet.ru) has joined #ceph
[0:49] * Meduka_Meguca (~Yulya@ip-95-220-174-242.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[0:51] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[0:53] <Tv> wido: everyone is just getting back from a field trip to our other office in LA area
[0:53] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[0:55] <wido> Tv: ah, ok :) Internet here in the hotel is not working that great, I keep getting errors when sending e-mails
[0:58] <greglap> bchrisman: sorry we didn't let you know, had a company meeting in the other office today like Tv said
[0:59] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:01] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[1:07] <Tv> oh hey who gets support watch this week? joshd?
[1:07] <joshd> yeah, I think so
[1:07] <Tv> tag, you're it!
[1:07] * Tv runs away
[1:08] <Tv> joshd: the one unanswered email from last week is Raible's thing about qcow2 file r/w speeds via ceph
[1:09] <Tv> nobody really knew what to say, when i went around asking for ideas
[1:09] <Tv> so best we can do is try to reproduce, i guess
[1:09] <Tv> but nobody got to that yet
[1:09] <joshd> ah, well, another thing to do then
[1:10] <Tv> oh also the doki74216@gmail "Change the weight of osd" remains unclear
[1:10] <Tv> that's all about crush maps
[1:13] <sage> wido: 10 sounds good!
[1:16] * verwilst (~verwilst@dD57695CA.access.telenet.be) Quit (Quit: Ex-Chat)
[1:17] * greglap (~Adium@198.228.210.38) has joined #ceph
[1:18] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[1:23] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[1:38] <greglap> sage: can you check out the doki74216 email?
[1:38] <greglap> it's been sitting unanswered since Thursday and looks to be a CRUSH map problem
[1:39] <greglap> at first I thought maybe it was just not enough data for the pseudo-randomness to be very accurate but then I noticed both nodes have similar proportions across their 3 OSD daemons so probably something's wrong with the rule setup
[1:39] <sage> np
[1:39] <sage> uniform buckets don't support per-item weighting
[1:39] <sage> hense "uniform"
[1:39] <sage> hence
[1:41] <greglap> haha, well there's something weird going on then since the ratio is roughly 1:6:8 in terms of how much cosd 1,2,3 on each machine node stores?
[1:42] <greglap> which is not uniform
[1:42] <sage> hmm indeed!
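sage's point that uniform buckets ignore per-item weights can be illustrated with a toy model. This is a minimal sketch, not CRUSH's code: the SHA-256 hash, the hypothetical `uniform_choose`/`weighted_choose` helpers, and the weights are assumptions, and the weighted draw is an exponential-race variant of the "straw" idea. It also shows why greglap found the 1:6:8 split suspicious: a uniform choice over thousands of objects should land close to 1:1:1.

```python
import hashlib
import math
from collections import Counter


def hash64(*parts: str) -> int:
    """Deterministic 64-bit hash (a stand-in for CRUSH's own hash function)."""
    data = "/".join(parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def uniform_choose(key: str, items: list[str]) -> str:
    # Every item is equally likely; there is simply no place for a weight.
    return items[hash64(key) % len(items)]


def weighted_choose(key: str, items: list[str], weights: list[float]) -> str:
    # Exponential race: each item draws -ln(u)/w from a per-(key, item) hash;
    # the smallest draw wins, which picks item i with probability w_i / sum(w).
    best_item, best_draw = None, math.inf
    for item, w in zip(items, weights):
        u = (hash64(key, item) + 1) / 2**64  # u in (0, 1]
        draw = -math.log(u) / w
        if draw < best_draw:
            best_item, best_draw = item, draw
    return best_item


if __name__ == "__main__":
    osds = ["osd.1", "osd.2", "osd.3"]
    keys = [f"obj{i}" for i in range(30000)]
    print("uniform :", Counter(uniform_choose(k, osds) for k in keys))
    print("weighted:", Counter(weighted_choose(k, osds, [1.0, 2.0, 3.0]) for k in keys))
```

With weights 1:2:3 the weighted counts come out near 5000/10000/15000, while the uniform counts stay near 10000 each whatever weights you intended.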
[1:43] <joshd> rageguy: still a zombie?
[1:52] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[1:58] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[2:00] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: Been here. Done that.)
[2:00] * greglap (~Adium@198.228.210.38) Quit (Quit: Leaving.)
[2:16] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:16] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[2:20] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[2:23] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[2:28] * zwu (~root@202.108.130.138) has joined #ceph
[2:36] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:36] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[2:50] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[2:56] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[3:03] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[3:08] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[3:21] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:27] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[3:31] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[3:39] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[3:43] * djlee2 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[3:45] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[3:47] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[3:47] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[3:52] * djlee2 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[3:56] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[3:58] * zwu (~root@202.108.130.138) has joined #ceph
[4:02] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[4:05] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[4:11] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[4:12] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[4:13] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[4:18] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[4:18] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[4:24] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[4:25] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:25] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[4:25] * zwu (~root@202.108.130.138) has joined #ceph
[4:29] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[4:35] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[4:38] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[4:40] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[4:42] * zwu (~root@202.108.130.138) has joined #ceph
[5:02] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[5:06] * zwu (~root@202.108.130.138) has joined #ceph
[5:21] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[5:25] * zwu (~root@202.108.130.138) has joined #ceph
[5:26] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:27] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[5:32] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[6:17] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[6:22] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[6:33] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[6:35] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[6:36] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[6:39] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[6:39] * djlee1 (~dlee064@des152.esc.auckland.ac.nz) has left #ceph
[6:40] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[6:45] * zwu (~root@202.108.130.138) has joined #ceph
[7:21] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[7:24] * eternaleye (~eternaley@195.215.30.181) Quit (Remote host closed the connection)
[7:24] * eternaleye (~eternaley@195.215.30.181) has joined #ceph
[7:35] * zwu (~root@202.108.130.138) has joined #ceph
[7:52] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[7:54] * zwu (~root@202.108.130.138) has joined #ceph
[8:08] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[8:30] * allsystemsarego (~allsystem@188.27.166.92) has joined #ceph
[8:33] * zwu (~root@202.108.130.138) has joined #ceph
[8:42] * bbigras (quasselcor@bas11-montreal02-1128536388.dsl.bell.ca) has joined #ceph
[8:42] * bbigras is now known as Guest356
[8:46] * bbigras__ (quasselcor@bas11-montreal02-1128536388.dsl.bell.ca) Quit (Ping timeout: 480 seconds)
[9:00] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[9:16] * zwu (~root@202.108.130.138) has joined #ceph
[9:43] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[9:44] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[9:48] * zwu (~root@202.108.130.138) has joined #ceph
[10:06] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Remote host closed the connection)
[10:18] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[10:25] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[10:35] * alexxy[home] (~alexxy@79.173.81.171) has joined #ceph
[10:38] * alexxy (~alexxy@79.173.81.171) Quit (Ping timeout: 480 seconds)
[10:42] * Jiaju (~jjzhang@222.126.194.154) Quit (Ping timeout: 480 seconds)
[10:48] * lidongyang (~lidongyan@222.126.194.154) Quit (Remote host closed the connection)
[10:52] * lidongyang (~lidongyan@222.126.194.154) has joined #ceph
[10:52] * Jiaju (~jjzhang@222.126.194.154) has joined #ceph
[11:01] * lidongyang (~lidongyan@222.126.194.154) Quit (Remote host closed the connection)
[11:04] * lidongyang (~lidongyan@222.126.194.154) has joined #ceph
[11:05] * Jiaju (~jjzhang@222.126.194.154) Quit (Ping timeout: 480 seconds)
[12:05] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[12:10] * zwu (~root@202.108.130.138) has joined #ceph
[12:24] * Jiaju (~jjzhang@222.126.194.154) has joined #ceph
[12:32] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[14:22] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[15:00] <jrosser> i see that there are config settings for public_addr and cluster_addr in cosd.cc
[15:02] <jrosser> any detail on what these are for? i needed to set them on osd nodes that have multiple interfaces.
[15:02] <rageguy> well, as I understand it, public addr is what clients will use to connect to osds
[15:03] <rageguy> while cluster_addr is what other cluster members will use
[15:03] <rageguy> although I don't know if mds is treated as a cluster member or a client in this context
[15:03] <rageguy> other osds should connect to cluster_addr
[15:04] <jrosser> ok - i could not find similar settings elsewhere
[15:05] <jrosser> it was all broken until i set public_addr and cluster_addr, as cosd picked the incorrect ip by default
[15:05] <jrosser> where "incorrect" is not the one i wanted - i guess :)
[15:06] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:06] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[15:06] <rageguy> well, if all of them are equally reachable there shall be no problem omitting them
[15:13] <jrosser> i think i have had problems relating to specifying particular node addresses in ceph.conf, but stuff binding to 0.0.0.0 and using interfaces that do not correspond to those addresses
[13:13] <jrosser> i suspect i have a wider question about the proper way to go about constraining ceph to use a particular physical interface on the nodes
[16:49] * chraible (~chraible@blackhole.science-computing.de) Quit (Quit: Verlassend)
[16:58] * josef (~seven@nat-pool-rdu.redhat.com) Quit (Quit: leaving)
[17:25] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[17:51] * greglap (~Adium@198.228.208.228) has joined #ceph
[17:58] <sagewk> jrosser: when it binds to 0.0.0.0 (the default) the address is chosen based on the source used when connecting to the monitor
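To see which local address the behaviour sagewk describes would likely pick on a multi-homed node, you can ask the kernel which source address it uses to reach the monitor. This is a generic sketch of that technique, not the cosd code; the helper name is hypothetical, it handles IPv4 only, and 6789 is assumed as the monitor port. If the result is not the interface you want, setting `public_addr`/`cluster_addr` explicitly, as jrosser did, avoids relying on the route.

```python
import socket


def source_addr_for(mon_host: str, mon_port: int = 6789) -> str:
    """Return the local IPv4 address the kernel would use to reach mon_host.

    connect() on a UDP socket sends no packets; it only selects a route, after
    which getsockname() reports the source address chosen for that route.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((mon_host, mon_port))
        return s.getsockname()[0]


if __name__ == "__main__":
    # Placeholder monitor address; substitute your monitor's real address.
    print(source_addr_for("127.0.0.1"))
```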
[18:11] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Remote host closed the connection)
[18:40] <sagewk> wido: let me know when you head over and i can meet you downstairs
[18:40] <rageguy> sagewk: you're all in LA, right
[18:40] <sagewk> rageguy: yeah
[18:41] <rageguy> do you plan to participate in any of the linux-related conferences this year?
[18:42] <greglap> rageguy: sage and others from our company have been to a lot of them...
[18:42] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:43] <rageguy> yeah, I know
[18:43] <rageguy> I was just thinking where to go instead of OLS this year...
[18:44] <sagewk> let me think what's coming up...
[18:44] <sagewk> i'll be speaking at SDC (SNIA storage developers conf) in september. not necessarily linux specific
[18:45] <sagewk> that's all i currently have planned. not sure about plumbers.
[18:47] <rageguy> mmm santa clara
[18:48] <sagewk> mmm indeed
[18:48] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:49] <rageguy> and plumbers is close too
[18:49] <rageguy> much better than canada or oregon fwiw, except for the september part
[18:49] <rageguy> joshd: do you have any fresh brains?
[18:50] * greglap (~Adium@198.228.208.228) Quit (Ping timeout: 480 seconds)
[18:50] <rageguy> joshd: actually I don't have much to tell you. I cherrypicked your rbd stuff on top of fedora's qemu-kvm 0.14.0, and it compiled but then it goes haywire, essentially write-only
[18:51] <rageguy> and I haven't set up everything to properly debug rbd, from the error qemu-convert from rbd to a file throws at me
[18:52] <rageguy> convert from file to rbd works, rbd --export works, I haven't tried comparing source to destination yet.
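Comparing the source image with what came back out of rbd is a quick way to tell whether the data is corrupted on the write or the read path. The sketch below assumes both sides are raw images on local disk (for example a raw source and the file produced by `rbd export`); the helper name, chunk size, and demo paths are illustrative only. A qcow2 source would first need converting to raw for a byte-wise comparison to be meaningful.

```python
import os
import tempfile


def first_difference(path_a: str, path_b: str, chunk_size: int = 4 << 20):
    """Return the byte offset of the first difference, or None if identical.

    If one file is a prefix of the other, the offset where the shorter one
    ends is returned.
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            ca, cb = a.read(chunk_size), b.read(chunk_size)
            if ca != cb:
                # Locate the first mismatching byte within this chunk.
                for i, (x, y) in enumerate(zip(ca, cb)):
                    if x != y:
                        return offset + i
                return offset + min(len(ca), len(cb))  # one file is shorter
            if not ca:
                return None  # both files ended together with no mismatch
            offset += len(ca)


if __name__ == "__main__":
    # Self-contained demo with throwaway files standing in for real images.
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src.img"), os.path.join(d, "dst.img")
        data = bytearray(1 << 20)
        with open(src, "wb") as f:
            f.write(data)
        data[12345] = 0xFF
        with open(dst, "wb") as f:
            f.write(data)
        print(first_difference(src, dst))  # 12345
```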
[18:52] <wido> sagewk: I'm leaving the hotel now
[18:53] <joshd> rageguy: where is the fedora qemu source?
[18:53] <wido> be there in a few minutes
[18:53] <sagewk> wido: meet you at the top of the escalator
[18:54] <rageguy> joshd: I'll give you my srpms
[18:56] <rageguy> http://stingr.net/d/stuff/ceph/qemu-0.14.0-8.fc16.src.rpm
[18:57] <rageguy> Patch50 is the forward port
[18:57] <joshd> rageguy: thanks, hopefully there's something obvious like other missing patches
[18:58] <rageguy> http://stingr.net/d/stuff/ceph/ceph-0.27.1-3.fc15.src.rpm is what I'm building it against
[18:58] <rageguy> (well, binary stuff)
[18:58] <rageguy> all on fedora-14
[18:58] <joshd> if it's compiling, you've got the right ceph version
[18:59] <rageguy> yeah it does
[18:59] <rageguy> I already tried unpatched 0.14.0 and ran into rados_initialize vs rados_create
[18:59] <rageguy> :)
[18:59] <joshd> ah, right
[19:00] <rageguy> and triggering is easy - qemu-img fails fast
[19:00] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:00] <rageguy> what's surprising is that I actually was able to install fedora on it (but not boot)
[19:00] <rageguy> what is also funny is that if I set cache=writeback qemu fails at some point
[19:01] <rageguy> either it segfaults, or it just gives lots of read errors to the guest
[19:01] <joshd> did it fail with output like: common/Mutex.h: 118: FAILED assert(r == 0) ?
[19:02] <rageguy> yeah
[19:02] <rageguy> it did
[19:02] <rageguy> common/Mutex.h: In function 'void Mutex::Lock(bool)', in thread '0x7f419b5fc700'
[19:02] <rageguy> common/Mutex.h: 118: FAILED assert(r == 0)
[19:02] <joshd> ok, that seems to be a race condition in librbd we're trying to find
[19:02] <rageguy> and then I was like - where's my debuginfo?
[19:02] <rageguy> 0xffffuuuu~
[19:02] <joshd> haha
[19:03] <joshd> do you have the core file?
[19:03] <rageguy> I think I can try reproducing it, with debuginfo this time
[19:03] <rageguy> no
[19:04] <rageguy> it's another "challenge" - to make this libvirt-controlled thing actually drop something
[19:06] <joshd> if you can run 'ulimit -c unlimited' before you start qemu you can get a core file, and we could figure out which mutex is triggering this
[19:06] <rageguy> yeah
[19:06] <rageguy> let me add some debuginfo first :)
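joshd's `ulimit -c unlimited` advice applies to the shell that starts qemu. If the process is launched from a Python wrapper instead, the same limit can be raised in the child just before it execs. This is a sketch with hypothetical helper names; it is Unix-only (the `resource` module), it can only raise the soft limit up to the existing hard limit, and where the core file lands still depends on the system's core pattern setting.

```python
import resource
import subprocess


def allow_core_dumps() -> tuple[int, int]:
    """Raise this process's core-file size limit as far as permitted.

    Equivalent to `ulimit -c unlimited` when the hard limit is already
    unlimited; an unprivileged process cannot go past the hard limit.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv: list[str]) -> int:
    # preexec_fn runs in the child between fork and exec, so only the
    # launched program (e.g. qemu) gets the raised limit.
    return subprocess.run(argv, preexec_fn=allow_core_dumps).returncode
```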
[19:09] <yehuda_hm> joshd: not related to this race.. there's the rbd-async-convert branch that makes qemu-img conversion run much faster
[19:09] <yehuda_hm> we should try to get it upstream
[19:09] <joshd> yehuda_hm: cool, that's a good idea
[19:10] <rageguy> send it all to me, I'll test it
[19:10] <rageguy> :)
[19:10] <yehuda_hm> joshd: the patch is not very big, but one thing that we need to verify is that we wait for all writes to complete in the end
[19:11] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[19:11] <yehuda_hm> rageguy: it's on the ceph qemu-kvm git tree
[19:25] <rageguy> okay
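yehuda_hm's caveat about the async conversion (wait for all writes to complete at the end) is a general pattern: issue writes concurrently, but do not report success until every one has finished and any error has surfaced. The sketch below models that with a thread pool rather than qemu's block layer or librbd's async API; `copy_async` and the `write_chunk(offset, data)` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def copy_async(chunks, write_chunk, max_in_flight=8):
    """Issue writes concurrently, then block until every one has finished.

    `chunks` is an iterable of (offset, data) pairs and `write_chunk` is a
    hypothetical stand-in for an asynchronous image write.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(write_chunk, off, data) for off, data in chunks]
        wait(futures)  # block until *all* writes are done, not just the last one issued
        for f in futures:
            f.result()  # re-raise a failed write instead of silently dropping it
    return len(futures)
```

Returning before `wait()` would let the conversion exit while writes are still in flight, which is exactly the case the caveat is about.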
[19:29] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[19:29] * cmccabe (~cmccabe@208.80.64.174) has joined #ceph
[19:31] <sagewk> let's meet!
[19:39] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Read error: No route to host)
[20:09] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[20:10] <cmccabe> librados remove doesn't seem to return an error code if the object isn't found
[20:10] <cmccabe> I guess that could be intentional, but I doubt it
[20:24] * alexxy[home] (~alexxy@79.173.81.171) Quit (Ping timeout: 480 seconds)
[20:25] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:35] * Meduka_M1guca (~Yulya@ip-95-220-151-67.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[21:09] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[21:17] * alexxy (~alexxy@79.173.81.171) Quit (Read error: Connection reset by peer)
[21:25] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[21:29] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[21:29] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[21:42] <bchrisman> sent out patch.. hopefully not mangled..
[22:12] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:14] * allsystemsarego (~allsystem@188.27.166.92) Quit (Quit: Leaving)
[22:16] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:23] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:25] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[22:31] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:33] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:33] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[23:14] <sagewk> yehuda_hm: the remove ENOENT thing is an oversight, let's add it in (to master branch)
[23:14] <yehuda_hm> sagewk: ok
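The fix agreed on here follows the librados C convention of returning negative errno values on failure, so removing a missing object should yield `-ENOENT` instead of 0. The class below is a hypothetical in-memory stand-in used only to show that return convention; it is not the librados API.

```python
import errno


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a pool (illustration only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def write(self, oid: str, data: bytes) -> int:
        self._objects[oid] = data
        return 0

    def remove(self, oid: str) -> int:
        # Report a missing object as -ENOENT rather than silently succeeding.
        if oid not in self._objects:
            return -errno.ENOENT
        del self._objects[oid]
        return 0
```

A caller that wants idempotent deletes can still treat `-ENOENT` as success, but the choice now belongs to the caller rather than being hidden by the library.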
[23:26] * bchrisman (~Adium@sjs-cc-wifi-1-1-lc-int.sjsu.edu) has joined #ceph
[23:48] * bchrisman (~Adium@sjs-cc-wifi-1-1-lc-int.sjsu.edu) Quit (Quit: Leaving.)
[23:54] <rageguy> joshd: did you spot a fail?
[23:54] <joshd> rageguy: haven't been able to reproduce it yet
[23:55] <rageguy> joshd: so, qemu-img is able to read from rbd?
[23:55] <joshd> and haven't found the race condition by inspection yet either :(
[23:56] <joshd> yeah, qemu-img works fine for me
[23:57] <rageguy> ugh
[23:57] <rageguy> ceph version?
[23:59] <joshd> 0.27 or higher, not sure if librbd was fully in ones previous to that

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.