#ceph IRC Log


IRC Log for 2012-03-06

Timestamps are in GMT/BST.

[0:03] <Tv__> hrrmph irc on 2nd laptop is annoying when the screensaver tries to lock this screen all the time
[0:04] <Tv__> i need to be more engaged on this keyboard ;)
[0:04] <Tv__> let's try 30 minutes
[0:07] * joao (~JL@89-181-150-46.net.novis.pt) has joined #ceph
[0:50] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[0:51] <nhm> ooh, I can get to sepia again.
[0:55] * tnt_ (~tnt@215.189-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:11] <elder> Me too.
[1:11] <elder> Not cool.
[1:12] * lofejndif (~lsqavnbok@9KCAAEG6L.tor-irc.dnsbl.oftc.net) Quit (Quit: Leaving)
[1:33] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:46] <nhm> elder: not cool that we can get back in and work? ;)
[1:46] <elder> Oh, sorry, I mis-read it.
[1:46] <elder> At about that moment, my connection disappeared for a bit again.
[1:46] <nhm> oh, you're right
[1:46] <elder> It had come back then went away, but it seems better now.
[1:46] <nhm> now mine is busted again
[1:47] <elder> I don't think anyone out there is having a nice afternoon.
[1:47] <nhm> yeah, I'm not even getting email replies from HR. :)
[1:49] <nhm> hopefully they can get it figured out soon.
[1:58] <dmick> hm. net is up for most of us here
[2:02] <joao> I still can't access sepia
[2:02] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[2:03] <joao> I'll give it another shot in the morning
[2:03] <joao> you guys have a great day
[2:03] <joao> ttyt
[2:03] <nhm> I could access sepia briefly, but not now
[2:03] <nhm> ttyl, have a good evening joao
[2:04] * joao (~JL@89-181-150-46.net.novis.pt) Quit (Quit: Leaving)
[2:04] <elder> I'm still getting timeouts when I attempt to query my teuthology locks.
[2:04] <elder> Headed out for a few hours, will check back laster.
[2:04] <elder> later even.
[2:05] <nhm> yeah, seems like it's pretty much all of the ceph stuff.
[2:05] <nhm> hrm, I guess I can get to flak
[2:40] * imjustmatthew (~imjustmat@pool-96-228-59-130.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[3:12] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[3:28] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:37] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:50] <The_Bishop> should i care about this?
[3:50] <The_Bishop> 2012-03-06 03:49:08.313090 log 2012-03-06 03:48:58.712920 osd.6 1676 : [WRN] old request osd_sub_op(client.4109.1:18081 0.1c3 81a03dc3/100000026a1.00000004/head [] v 31'39 snapset=0=[]:[] snapc=0=[]) v5 received at 2012-03-06 03:48:27.259555 currently started
[4:40] <Tv__> The_Bishop: are you perhaps upgrading your cluster?
[4:40] <The_Bishop> nope
[4:41] <The_Bishop> it is a fresh setup with 9 disks on 4 computers
[4:42] <The_Bishop> the disks have 35G..466G of storage totalling 1.7T
[4:42] <The_Bishop> ah, and it runs on aged machines
[4:43] <Tv__> sorry the machine i'd have the source tree on is getting *killed* by my browser
[4:43] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:43] <Tv__> i'll check the sources after it is done with swapping ;)
[4:50] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:55] <Tv__> The_Bishop: ah yes.. that's a debug mechanism that warns of requests coming to OSD that took >30 sec to get serviced
[4:55] <Tv__> The_Bishop: which tends to be a symptom of an unhappy filesystem / block device
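The warning pasted at [3:50] carries two timestamps: the time the OSD logged the message and the time the request was received. A minimal sketch, assuming the line layout matches the samples pasted in this channel (the regular expression and the function name `parse_old_request` are illustrative and not part of Ceph), that works out how long a request had been outstanding:

```python
import re
from datetime import datetime

# Assumed layout, inferred only from the lines pasted in this log:
# "<print time> log <osd time> osd.N <seq> : [WRN] old request ...
#  received at <time> currently <state>"
OLD_REQUEST = re.compile(
    r"log (?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<osd>osd\.\d+) \d+ : \[WRN\] old request .* "
    r"received at (?P<received>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"currently (?P<state>\w+)"
)
FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_old_request(line):
    """Return (osd, age_in_seconds, state), or None if the line does not match."""
    m = OLD_REQUEST.search(line)
    if m is None:
        return None
    logged = datetime.strptime(m.group("logged"), FMT)
    received = datetime.strptime(m.group("received"), FMT)
    return m.group("osd"), (logged - received).total_seconds(), m.group("state")


# The line pasted by The_Bishop at [3:50].
SAMPLE = (
    "2012-03-06 03:49:08.313090 log 2012-03-06 03:48:58.712920 osd.6 1676 : "
    "[WRN] old request osd_sub_op(client.4109.1:18081 0.1c3 "
    "81a03dc3/100000026a1.00000004/head [] v 31'39 snapset=0=[]:[] snapc=0=[]) "
    "v5 received at 2012-03-06 03:48:27.259555 currently started"
)

if __name__ == "__main__":
    print(parse_old_request(SAMPLE))
```

The age for the sample comes out just over the 30 second threshold Tv__ mentions. The trailing state word (`started` here) is kept because it says where the request was when the warning fired.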
[5:00] <The_Bishop> it's more like old and slow hardware in my case ;)
[5:00] <Tv__> yeah
[5:01] <The_Bishop> and only 100mbit switched network
[5:02] <The_Bishop> i've to say i'm impressed by what i see
[5:03] <The_Bishop> nice, the directory sizes are the sizes of the contents therein
[5:03] <Tv__> if you like that, wait till you discover .snap ;)
[5:03] <The_Bishop> whats this?
[5:03] <Tv__> subtree snapshots
[5:04] <The_Bishop> well i already have (and use) this with btrfs
[5:04] <Tv__> http://ceph.newdream.net/wiki/Snapshots
[5:04] <Tv__> yeah just a cool feature
[5:05] <The_Bishop> for the moment i evict a 2.5T harddisk to integrate it into the cluster
[5:06] <The_Bishop> is it enough to disable one of the osd-processes to call for a rebalance?
[5:06] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[5:08] <Tv__> The_Bishop: there is no "rebalance" as such... tell me more
[5:08] <Tv__> (as rados has no allocation tables, it is essentially "perfectly balanced" as soon as the osds have finished copying data around)
[5:09] <The_Bishop> isn't it so that ceph duplicates data across devices to be failsafe when disks die?
[5:09] <Tv__> yeah
[5:09] <Tv__> oh you mean to ensure you have enough copies
[5:09] <Tv__> that's called taking the osd "out"
[5:09] <Tv__> once it's out, the remaining copies will get re-replicated to ensure enough copies are around
[5:10] <The_Bishop> so that i can take a disk out of the cluster without losing data
[5:10] <Tv__> http://ceph.newdream.net/docs/latest/control/#osd-subsystem
[5:10] <Tv__> ceph osd out N
[5:11] <The_Bishop> ok, will try this when the cluster is more filled
[5:14] <The_Bishop> does ceph benefit from "compress-force" on the btrfs?
[5:22] <The_Bishop> may i set the replication size to one?
[5:31] <Tv__> The_Bishop: sure, if you don't care about the data
[5:31] <Tv__> as for compress, a benchmark is the only real way to know
[5:31] <Tv__> compression, by its nature, is a tradeoff between cpu, bandwidth and latency -- there is no "better"
[5:32] <The_Bishop> for me it means better in storage capacity
[5:33] <Tv__> yeah, if your cpu and memory can take it, and the increase in latency is tolerable, etc
[5:34] <The_Bishop> i fear that i need rep size 1 for a short time to get the big disk empty
[5:34] <The_Bishop> ... and to look how the cluster re-replicates afterward ;)
[5:35] <Tv__> ohh so
[5:35] <Tv__> i'm not actually sure what's gonna happen if you drop rep size to 1, then ceph osd out the big disk
[5:35] <Tv__> because worst case, the only copy of many objects is on that big disk, and it might go "out" before it hands them off
[5:35] <Tv__> the right way to do that is to continually decrease the weight of that osd until it has no objects
[5:36] <Tv__> and *then* take it "out"
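The drain-then-out order Tv__ describes can be written down as a dry-run plan. This is a sketch under stated assumptions: only `ceph osd out N` is named in this log; `ceph osd reweight` is assumed here as the way to lower the weight, and the step size and the helper name `drain_plan` are made up for illustration. The function only builds command strings and runs nothing; between steps an operator would wait for the cluster to finish moving data.

```python
def drain_plan(osd_id, start_weight=1.0, step=0.25):
    """Return command strings for a stepwise drain; nothing is executed.

    Assumption: `ceph osd reweight <id> <weight>` is used to lower the
    weight. Only `ceph osd out <id>` is confirmed by the chat log.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    commands = []
    steps = int(round(start_weight / step))
    # Walk the weight down to zero so the osd hands off its objects first.
    for i in range(steps - 1, -1, -1):
        commands.append("ceph osd reweight %d %.2f" % (osd_id, i * step))
    # Only once the osd holds no objects is it taken out.
    commands.append("ceph osd out %d" % osd_id)
    return commands


if __name__ == "__main__":
    # osd id 8 is a hypothetical example id.
    for command in drain_plan(8):
        print(command)
```

Keeping the plan as data makes it easy to review before anything touches the cluster, which matters in the replication-size-1 case where the only copy of an object may live on the disk being drained.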
[5:36] <The_Bishop> dont get me wrong, i have 1.5T on the big disk and the cluster so far is only 1.7T
[5:37] <The_Bishop> when the big disk is empty i will integrate it as OSD
[5:38] <The_Bishop> and then i will set the rep size back to 2
[9:04] -coulomb.oftc.net- *** Looking up your hostname...
[9:04] -coulomb.oftc.net- *** Checking Ident
[9:04] -coulomb.oftc.net- *** No Ident response
[9:04] -coulomb.oftc.net- *** Found your hostname
[9:04] * CephLogBot (~PircBot@rockbox.widodh.nl) has joined #ceph
[9:18] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:24] * tnt_ (~tnt@215.189-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:42] * tnt_ (~tnt@212-166-48-236.win.be) has joined #ceph
[9:54] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[10:07] * Oejet (~Oejet@x1-6-74-44-01-fe-cf-20.k1024.webspeed.dk) has joined #ceph
[10:09] * Oejet (~Oejet@x1-6-74-44-01-fe-cf-20.k1024.webspeed.dk) has left #ceph
[11:05] * Volley (~worf@hexen.cgv.tu-graz.ac.at) has joined #ceph
[11:26] <Volley> can one recommend some (up to date) reading that allows me to get a feeling of the state of ceph?
[11:33] <Volley> i'm playing with the thought of doing a test setup ... but i am not sure if that could be a "test to really use it" with not that important data and a backup strategy, or if that is way too early
[11:41] * fronlius is now known as LarsFronius
[11:41] * LarsFronius (~fronlius@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[11:41] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:56] <Volley> found http://ceph.newdream.net/docs/latest/#status ... sounds ... like time to plan some tests
[11:56] * joao (~JL@89-181-145-13.net.novis.pt) has joined #ceph
[12:06] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:06] * Volley (~worf@hexen.cgv.tu-graz.ac.at) Quit (Quit: Leaving)
[12:08] * joao (~JL@89-181-145-13.net.novis.pt) Quit (Ping timeout: 480 seconds)
[12:14] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Quit: Ex-Chat)
[12:17] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[12:18] * joao (~JL@ace.ops.newdream.net) has joined #ceph
[12:18] * Volley (~worf@chello080109200187.3.sc-graz.chello.at) has joined #ceph
[12:32] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[13:16] * The_Bishop (~bishop@ Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[13:19] <nhm> Volley: definitely the rados and radosgw pieces will be production ready first.
[13:46] <joao> nhm, what is your timezone? You being active whenever I'm around makes me wonder if you sleep at all :p
[13:47] <nhm> joao: hehe, I'm central time in the US, but get up earlier than most. :)
[13:48] <nhm> joao: You are in Brazil right?
[13:48] <joao> same language, other side of the atlantic and further north :)
[13:49] * The_Bishop (~bishop@178-17-163-220.static-host.net) has joined #ceph
[13:49] <nhm> Ah, Portugal! I'd like to go there some day.
[13:49] <joao> then come; we have good food
[13:50] <nhm> joao: Excellent. My wife speaks a bit of Portuguse.
[13:51] <nhm> blah, Portuguese.
[13:51] * nhm needs coffee
[13:51] <joao> even if she didn't, people around here will try their best to talk with virtually anyone, no matter what the language is
[13:52] <joao> and as a last resort, they will point semi-randomly in the air in order to communicate
[13:54] <joao> nhm, can you access any of the sepia machines?
[13:54] <nhm> That's great. It's interesting how different countries are regarding how they treat visitors. I imagine the US is probably not particularly friendly.
[13:54] <nhm> joao: Nope. I sent a message to one of the network folks yesterday after they said everything was working again but never got a response.
[13:55] <joao> nhm, although I haven't been to the US yet (therefore I wouldn't know), the only country where I've had issues with talking with the locals was in France
[13:56] <joao> but then again, I'm sure it was just bad luck
[13:56] <nhm> joao: The US is pretty diverse so I imagine a lot of it depends on the region you go to.
[13:57] <nhm> joao: Around here people are fairly nice but quiet and mostly keep to themselves.
[13:57] <joao> nhm, where are you from?
[13:58] <nhm> joao: Minnesota
[13:59] <nhm> joao: We're one of the coldest states in the country outside of Alaska. I figure the long winters probably make people more isolated.
[13:59] <joao> I've never been there and I already dread your state
[13:59] <nhm> joao: hehehe
[14:00] * root____ (~root@ns1.waib.com) has joined #ceph
[14:00] * root____ is now known as rz
[14:00] <rz> hi
[14:00] <nhm> joao: It takes some getting used to certainly.
[14:00] <joao> I bet it would take me a whole lot more than getting used to
[14:00] <nhm> rz: hello!
[14:01] <joao> I can't even tolerate the occasional 0°C that happens in Lisbon every once in a couple of years
[14:01] <nhm> joao: The thing we like to say around here is that the winter makes us appreciate the other seasons all the more. ;)
[14:01] <joao> that seems a good saying
[14:03] <nhm> joao: this year was unusual. Other years it can get down to -20 to -30C on a fairly regular basis. This year it was much warmer. 0C was pretty common.
[14:03] <joao> lol
[14:04] <joao> on the upside, snow
[14:05] <joao> and ice fishing
[14:05] <nhm> Have you gone ice fishing?
[14:06] <joao> no, but I would love to try it
[14:07] <joao> although I'm yet to find the kind of fishing I'm good at
[14:08] <nhm> Yeah, I've never been really good at fishing. I don't think I have the patience for it. :)
[14:09] <joao> my problem usually is the surrounding rocks or the sea sickness
[14:09] <joao> that's why I think ice fishing would be great
[14:10] <joao> being unable to access sepia is driving me nuts
[14:10] <nhm> Lake fishing is generally much calmer than fishing in the ocean too. You might like that better.
[14:11] <nhm> joao: Yeah, I'm trying to work on some parallel job runner scripts while I wait. I think now I'm going to go have breakfast though.
[14:12] <joao> and I'll try to have lunch, or I believe I'll just go insane while trying to ssh to sepia
[14:26] <rz> wanna know if ceph can do some scalable heterogeneous storage with multiples servers without same free space and sort of aggregate setup
[14:28] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:29] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[14:43] <iggy> heterogeneous environments is ceph's middle name (actually it's steve)
[14:45] <iggy> I think the trick in that case is just getting the maps right
[14:50] * andret (~andre@pcandre.nine.ch) has joined #ceph
[14:50] <elder> nhm have you tried doing teuthology runs from a machine inside the VPN?
[14:53] <joao> elder, looks like it works from inside
[14:55] <joao> but access through the vpn is busted
[15:00] <elder> OK. I guess I'll get myself set up on an internal machine for now.
[15:02] <joao> but looks like the sepia I'm on does not reach the internet
[15:02] <joao> which is a problem when it comes to installing debs using apt
[15:22] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[15:46] <elder> joao, OK I now see what you mean. I can't build ceph because I'm missing a package on flak. How is that?
[15:48] <elder> I can't build teuthology because sage owns a file in /tmp (and on metropolis Dan owns the same file preventing the build there).
[16:04] <joao> lol
[16:05] <joao> can't you use the sepiaX machines?
[16:07] <nhm> elder: yeah, that's what makes running from *any* of the dev machines hard.
[16:07] <elder> I can't get into them from my house. I also can't get in from flak (for example)
[16:07] <nhm> elder: I noticed that when I first setup teuthology. I had to do it from my machine.
[16:08] <joao> elder, I can ssh into flak from home though
[16:08] <nhm> You can fudge it by copying sage's install over to your home directory and it works for some stuff, but not for others.
[16:08] <elder> I can get into flak.
[16:08] <nhm> there's a bug in virtualenv that doesn't clean up the distribute tgz files in /tmp
[16:08] <sage> elder, nhm: sudo rm /tmp/distribute*. not sure why virtualenv leaves it laying around
[16:09] <elder> D'oh! Good idea!
[16:09] <nhm> sage: I don't think I have permission to do that?
[16:09] <sage> which machine(s)?
[16:09] <nhm> pretty much all the dev machines. I don't think I'm in the admin group.
[16:10] <nhm> On flak I think it's owned by you though.
[16:11] <elder> Not any more!
[16:11] <nhm> heh
[16:11] <sage> nhm: try now?
[16:12] <elder> sage, any idea what needs to be done to restore our previous access?
[16:12] <sage> looking at it now
[16:13] <nhm> sage: Still doesn't think I'm in the sudoers file...
[16:14] <sage> you should be in the sudo group tho
[16:14] <sage> log out and in?
[16:14] <nhm> did that.. :/
[16:15] <nhm> tried again, still no go.
[16:17] <sage> which machine?
[16:17] <sage> der, wrong username
[16:18] <nhm> hehe
[16:18] <nhm> virtualenv should really create a random prefix or directory name or something in tmp for those files.
[16:18] <sage> seriously
[16:19] <nhm> even if they clean up it's still crappy.
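nhm's suggestion of a random directory name is what Python's standard `tempfile.mkdtemp` provides. A minimal sketch of the idea, not virtualenv's actual code: the helper names and the file name below are hypothetical, and the point is only that each run gets its own private directory instead of a fixed path in /tmp that another user may already own.

```python
import os
import shutil
import tempfile


def fetch_into_private_dir(filename, payload):
    """Write payload into a freshly created, uniquely named directory."""
    # mkdtemp picks a random suffix and creates the directory so that
    # only the calling user can read or write it.
    workdir = tempfile.mkdtemp(prefix="distribute-")
    path = os.path.join(workdir, filename)
    with open(path, "wb") as handle:
        handle.write(payload)
    return workdir, path


def cleanup(workdir):
    """Remove the private directory and everything in it."""
    shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    # "distribute.tar.gz" is a stand-in file name for illustration.
    workdir, path = fetch_into_private_dir("distribute.tar.gz", b"example")
    print(path)
    cleanup(workdir)
```

With this shape two users on the same dev machine never contend for the same file, and a leftover directory from a crashed run cannot block the next one.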
[16:20] <elder> OK, I'm hitting the same problem from flak as from home. Can't ssh to sepia*
[16:20] <nhm> elder: huh, that worked last night
[16:20] <nhm> actually scratch that. teuthology worked last night.
[16:20] <nhm> never actually tried sshing to sepia.
[16:22] * stxShadow (~jens@p4FD0617B.dip.t-dialin.net) has joined #ceph
[16:22] <elder> I'm giving up for a bit. I'm assuming somehow the networking issues will get resolved soon...
[16:22] <nhm> oh well. I'll continue to work on scripts to run the benchmark in parallel.
[16:23] <nhm> Any of you pyhton goes play with fabric much?
[16:24] <nhm> ugh, python guys, not python goes
[16:24] <sage> the sepia nodes let me in, but they can't reach the outside internet, so it's slow (sshd reverse dns lookup).
[16:24] <sage> i'm prodding the network guys
[16:24] <elder> Thanks sage
[16:24] <sage> nhm, you should be in the sudo group now
[16:24] <nhm> sage: cool, thanks!
[16:30] <wido> sage: I'm running the wip-2116 branch right now, nothing going on
[16:31] <wido> ran multiple I/O loads on it today, it's all stable
[16:31] * wido is now known as widodh
[16:31] * widodh (~wido@rockbox.widodh.nl) Quit (Quit: Changing server)
[16:33] <nhm> sage: btw, you are up early. :)
[16:35] * wido (~wido@rockbox.widodh.nl) has joined #ceph
[16:36] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[16:53] <stxShadow> can someone tell me the meaning of the following message please :
[16:53] <stxShadow> 2012-03-06 16:52:40.787873 log 2012-03-06 16:52:32.877678 osd.3 3385 : [WRN] old request osd_op(client.97909.0:2605037 rb.0.3.000000000e2c [write 638976~4096] 11.cd942f85) v4 received at 2012-03-06 16:52:02.399849 currently delayed
[16:58] <The_Bishop> yes
[16:58] <The_Bishop> it means that osd.3 is slow and has data older than 30s in flight
[16:59] <stxShadow> hmmm ..... so i have to check osd3 why it is slow
[16:59] <The_Bishop> yes
[16:59] <The_Bishop> this is only a warning, but it is a sign that your cluster does not perform well
[17:00] <The_Bishop> is osd.3 on a slow disk, or slow machine?
[17:00] <stxShadow> i use very recent hardware (raid > 400 MB / sec + ssd for journaling)
[17:00] <The_Bishop> maybe the filesystem is fragmented heavily
[17:01] <stxShadow> that can be possible .... the cluster is under heavy load for nearly 2 months now
[17:01] <The_Bishop> or the network connection to this box is congested
[17:03] <stxShadow> IO Subsystem Usage is 10-15%
[17:04] <The_Bishop> well, then you should investigate more
[17:06] <stxShadow> fragmentation seems to be a little bit high ....
[17:06] <The_Bishop> btrfs as filesystem?
[17:06] <stxShadow> no ... xfs
[17:06] <The_Bishop> do you have a gauge vor io-wait?
[17:07] <stxShadow> the message was shown when ceph was doing the daily scrubbing
[17:07] <The_Bishop> s/vor/for/
[17:07] <stxShadow> german ;)
[17:07] <stxShadow> hmmm iowait is 0,1%
[17:10] <nhm> stxShadow: did we look at the extent fragmentation on your nodes a little while back?
[17:10] <stxShadow> yes we did :)
[17:11] <stxShadow> so i know how to look for the fragmentation ;)
[17:11] <nhm> I thought so. :) There's some work coming down the pipe that may help that.
[17:12] <stxShadow> in terms of xfs ?
[17:14] * jmlowe (~Adium@129-79-134-204.dhcp-bl.indiana.edu) has joined #ceph
[17:15] <nhm> stxShadow: Yeah. See: http://tracker.newdream.net/issues/2003
[17:16] <sagewk> stxshadow: it will hopefully be in v0.44, but we'll see how things progress this week.
[17:17] <stxShadow> nhm, sagewk : thx ....
[17:18] <stxShadow> i hope the update will be possible without destroying our data :)
[17:19] <sagewk> stxshadow: it will be
[17:19] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[17:19] <stxShadow> 2003 seems to be exactly my problem .... i see 8 extents too
[17:23] <nhm> stxShadow: yeah, it's why I asked you to take a look. I wanted to verify it was happening to other people as well.
[17:31] <Volley> i'm just trying to do a test setup, currently at http://ceph.newdream.net/docs/latest/ops/install/mkcephfs/#running-mkcephfs and ... how long is mkcephfs supposed to run when i just used 2 nodes yet? i think i have an unkillable ceph-osd now ... :)
[17:36] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:37] <stxShadow> it runs only a few seconds here
[17:37] <stxShadow> 1 mon / mds + 2 osd nodes
[17:39] <Volley> maybe i should have tried all that in a VM first ...
[17:40] <stxShadow> do you have passwordless ssh access to your nodes?
[17:40] <Volley> aye
[17:41] <Volley> and i don't yet know what the mkcephfs did, but it managed to not work at first, asking for passwords
[17:42] <Volley> and running it with verbose flag on it didn't ask for passwords anymore, but then it did stop and left me with an unkillable 100% cpu load ceph-osd
[17:44] <stxShadow> what is your mkcephfs command line
[17:44] <stxShadow> ?
[17:44] * aliguori (~anthony@ has joined #ceph
[17:45] <Volley> it was mkcephfs -v -a -c mycluster.conf -k mycluster.keyring being in /etc/ceph with the conf file being there
[17:46] <Volley> i tried to follow the above linked install docu. i'm on debian wheezy 64bit
[17:46] <stxShadow> so you have initialized a filesystem (ext4 or xfs) and mounted it before?
[17:48] <Volley> /srv/osd.$id exists ( btrfs empty subvol ) - or did i miss some point? *reading docu again*
[17:50] <stxShadow> erm ... what is the filename of your conf ?
[17:51] <Volley> you mean mycluster.conf ?
[17:51] <sagewk> tv__: also, the problem with old sepia right now seems to be that they can't reach outside network. have an issue open for that.. let me know if you see anything else
[17:52] <Volley> hmm ... i think i localized possible totally different cause for my problems
[17:53] * Tv|work (~Tv_@aon.hq.newdream.net) has joined #ceph
[17:54] <stxShadow> Volley, -> if i move my ceph.conf to mycluster.conf
[17:54] <stxShadow> ceph is not starting anymore
[17:54] <stxShadow> root@cephnode1:/etc/ceph# /etc/init.d/ceph restart
[17:54] <stxShadow> /etc/init.d/ceph: ceph conf /etc/ceph/ceph.conf not found; system is not configured.
[17:55] <stxShadow> -> its debian squeeze
[17:55] <stxShadow> maybe that is one of your problems ?
[17:59] <Volley> stxShadow: nah, ceph.conf is created by mkcephfs from mycluster.conf ... but i think i have some problem with either my storage, or kernel, or root filesystem ...
[18:00] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:04] * joao (~JL@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[18:06] * Kathor (~Kathor@ Quit (Read error: Connection reset by peer)
[18:06] * Kathor (~Kathor@ has joined #ceph
[18:16] * joao (~JL@89-181-145-13.net.novis.pt) has joined #ceph
[18:17] * jmlowe (~Adium@129-79-134-204.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)
[18:18] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (Remote host closed the connection)
[18:19] * stxShadow (~jens@p4FD0617B.dip.t-dialin.net) Quit (Remote host closed the connection)
[18:20] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[18:25] * Volley (~worf@chello080109200187.3.sc-graz.chello.at) Quit (Read error: Connection reset by peer)
[18:25] * Volley (~worf@chello080109200187.3.sc-graz.chello.at) has joined #ceph
[18:29] * tnt_ (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:37] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:38] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[18:39] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[18:46] * joao (~JL@89-181-145-13.net.novis.pt) Quit (Read error: Connection reset by peer)
[18:57] * joao (~JL@89-181-145-13.net.novis.pt) has joined #ceph
[19:01] * chutzpah (~chutz@ has joined #ceph
[19:01] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:02] <NaioN> nhm: the pastebin in http://tracker.newdream.net/issues/2003 doesn't work anymore :(
[19:04] <NaioN> that's our cluster, we formatted the filesystems without any options (mkfs.xfs /dev/X), if I'm correct the inode size will be 256 bytes
[19:05] <yehudasa> wido: kudos on the three phase..
[19:05] <NaioN> Christoph suggests 512 bytes, but will that be sufficient, we only use rbd's
[19:10] <sagewk> naion: good question... i don't think we've checked that. do you know how much room that gives us for xattrs?
[19:14] * bchrisman (~Adium@ has joined #ceph
[19:14] * bchrisman (~Adium@ Quit ()
[19:20] <rz> is it possible to install ceph on non empty nodes ?
[19:21] <NaioN> sagewk: not much
[19:22] <NaioN> I'll check with Pascal (pmjdebruijn) tomorrow
[19:22] <NaioN> We looked into it together
[19:23] <NaioN> On our next cluster we want to use bigger inodes, say 1k
[19:23] <NaioN> But I want to have some feeling about how much space we need
[19:27] * bchrisman (~Adium@ has joined #ceph
[19:28] <NaioN> The OpenStack documentation also suggest a 1k inode size for xfs
[19:38] <NaioN> sagewk: what will 'filestore flusher = false' do? you suggested it in the tracker...
[19:39] <NaioN> I could enable it on the cluster
[19:40] <sagewk> ceph-osd won't try to flush writes to disk.. it'll let the (eventual) sync(2) call do it
[19:40] <sagewk> if you do that, you should also tune the vm write threshold to be small (few mb or something)
[19:42] <sagewk> naion: normal, non-snapshotted rbd objects will have ~250-300 bytes of xattrs.
[19:42] <sagewk> once you add in a snap, that may jump to 400 bytes
[19:43] <sagewk> or more, in certain cases.
[19:43] <sagewk> i expect 1k inodes would be good
[19:43] <NaioN> ok thx
[19:44] <NaioN> we don't have any snaps so I have to count with about 250-300 bytes of xattrs
[19:44] <NaioN> well 1k would give us enough room
[19:45] <NaioN> I can't find any info on how much space by default (without xattrs) gets used in a xfs inode
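The sizing question can be framed as simple arithmetic once the fixed per-inode overhead is known. A rough sketch only: the xattr figures (about 300 bytes without snapshots, about 400 with one) and the inode sizes come from this conversation, but the fixed overhead is exactly the number NaioN says is hard to find, so `ASSUMED_FIXED_OVERHEAD` below is a placeholder to be replaced with a measured value. The model also ignores per-attribute headers and whatever the data fork needs.

```python
def xattr_headroom(inode_size, xattr_bytes, fixed_overhead):
    """Bytes left in the inode after the fixed part and the xattrs.

    A negative result means the xattrs would not fit inline under this
    simplified model and would spill out of the inode.
    """
    return inode_size - fixed_overhead - xattr_bytes


# Placeholder, NOT taken from the log: substitute the real fixed inode
# overhead for your filesystem before trusting any of the output.
ASSUMED_FIXED_OVERHEAD = 100

if __name__ == "__main__":
    for inode_size in (256, 512, 1024):
        for xattr_bytes in (300, 400):
            left = xattr_headroom(inode_size, xattr_bytes, ASSUMED_FIXED_OVERHEAD)
            verdict = "fits" if left >= 0 else "spills"
            print(inode_size, xattr_bytes, left, verdict)
```

Whatever the true overhead turns out to be, the default 256-byte inode cannot hold 300 bytes of xattrs, which is consistent with the move toward 512-byte or 1k inodes discussed above.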
[19:46] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[19:47] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has left #ceph
[19:49] <NaioN> sagewk: but besides that the cache function for rbd is something we are really looking forward to
[19:51] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[19:52] <gregaf> The_Bishop: stxShadow: actually that old request warning from this morning says "currently delayed", indicating that there's something blocking the request from completing and it's in a wait queue somewhere
[19:52] <gregaf> in this case, probably the scrub
[19:58] <sagewk> sjust: aside from the (near?) dup blocks in that tmap upgrade patch, looks ok to me
[20:09] <sagewk> gregaf: pushed fixed/rebased wip-1796
[20:10] <gregaf> sagewk: cool
[20:10] <gregaf> the other thing is the new functions are simple enough to document nicely… :)
[20:10] <sagewk> sigh.. yeah
[20:11] <gregaf> oh, you just switched it to handle_write_error()
[20:11] <gregaf> I was going to ask, that does make more sense
[20:11] <gregaf> other than docs, looks good
[20:13] <sagewk> also made it on_write_error = NULL after deleting.
[20:13] <sagewk> thanks
[20:37] <Tv|work> sagewk: you added a role gitbuilder_ceph_deb_ndn_native reference to ceph-autobuild.git fabfile.py, but that role is not defined
[20:37] <sagewk> blarg
[20:38] <sagewk> tv|work, sjust: btw, turning off ccache didn't help the leveldb unit test failure. :(
[20:38] <Tv|work> sagewk: i feel vindicated ;)
[20:38] <sagewk> :) any other ideas?
[20:39] <Tv|work> google for the failures? /me ducks
[20:42] <yehudasa> gregaf: can you also take a look at the new-atomic branch?
[20:42] <gregaf> yeah, sage asked me to and I'm checking it now
[20:43] <gregaf> we're punching more holes in our abstraction layers with the manifests :(
[20:43] <gregaf> but I haven't seen anything else yet
[20:44] <yehudasa> well.. the abstraction layer is kinda doomed anyway
[20:44] <gregaf> yeah :/
[20:44] <yehudasa> librgw is the future
[20:46] <gregaf> yehudasa: woah, dout(0) on every sub-object on every manifest decode?
[20:46] <yehudasa> gregaf: oh, that needs to be cleaned up
[20:47] <gregaf> yes please!
[20:47] <yehudasa> noted
[20:54] <sjust> looking at the leveldb check failure now
[20:55] <yehudasa> sjust: I think that in general, for consistency, we should pass bufferlist reference instead of pointer. Unless we allocate the bufferlist, which I don't think there's a good reason to do it anyway
[20:56] <sjust> yehudasa: I've been using pointers to indicate return parameters
[20:56] <yehudasa> sjust: the thing is that it's not consistent with the rest of librados
[20:57] <sagewk> for librados, we've used pointers consistently for return values
[20:57] <sagewk> as far as i know at least
[20:57] <sjust> it's consistent with at least the ops
[20:57] <yehudasa> int librados::IoCtx::getxattr(const std::string& oid, const char *name, bufferlist& bl)
[20:57] <sjust> void read(size_t off, uint64_t len, bufferlist *pbl, int *prval);
[20:58] <sagewk> first is a write, second is a read.
[20:58] <sjust> getxattr is a read
[21:00] <wido> yehudasa: thanks for the kudos! It took some work.. Pff
[21:05] <yehudasa> wido: well deserved
[21:09] <wido> yehudasa: Now I still have to wait another year :)
[21:09] <wido> sagewk: Did you expect wip-2116 to fix the issue or did you just want extra debugging?
[21:09] <sagewk> originally i did, but when it didn't, i added extra debugging.
[21:09] <sagewk> are you sure you ran wip-2116^ the last time around?
[21:09] <sagewk> oh i can check, nm
[21:10] <wido> Yes, I'm sure of that. I always verify with 'ceph-osd -v'
[21:10] <wido> for the right commit
[21:10] <sagewk> yeah. weird!
[21:10] <wido> indeed. I've been hammering on it for a long time and it's idle now, but nothing. It's normal, like it should be
[21:12] <sagewk> anything change with the hardware? reboots or anything?
[21:12] <sagewk> i think the original bug was triggered by an untimely socket disconnect
[21:12] <nhm> Aha, all sepia nodes are locked.
[21:13] <sagewk> nhm: probably the qa stuff
[21:13] <sagewk> still fixing the routing, almost there
[21:13] <wido> sagewk: Nope, no reboots or any changes. I just stopped all the OSD's, replaced the binary and started them again
[21:13] <nhm> sagewk: yeah, putting in a feature request to have teuthology-lock explain why it didn't do anything. :)
[21:15] <sagewk> nhm: even better if you wanna do it.. just make sure the error goes to stderr not stdout
[21:15] <nhm> sagewk: sure
[21:23] <nhm> sagewk: would it make sense to send all log.error() messages to stderr?
[21:23] <sagewk> i suspect that's where they already go, unless a different location is specified
[21:28] <nhm> hrm.
[21:31] <yehudasa> sagewk: on a single op operation, is the return value of the op the same as the return value of the entire operation?
[21:31] <sagewk> yeah
[21:32] <sagewk> unless its a write, in which case you either get 0 or error code (no positive values allowed)
[21:32] <yehudasa> sagewk: otoh, the operation may fail before it executed
[21:32] <sagewk> right
[21:32] <sagewk> so you get an informative error code, or uninformative success
[21:32] <yehudasa> so in general I should return the operate() ret code
[21:33] <sagewk> yeah..
[21:34] <sagewk> the osd will normally stop on error, so you get either (first) error or 0 (no errors).
[21:34] <yehudasa> oh, I see
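The return-code rule sagewk describes (stop at the first failing op, report that error, otherwise report 0) can be modelled in a few lines. This is a plain simulation for illustration, not librados code: `run_compound` and the callables passed to it are hypothetical stand-ins for the ops inside one operation.

```python
import errno


def run_compound(ops):
    """Run ops in order; return the first negative code, or 0 if none fail.

    Each op is a zero-argument callable returning 0 or a negative errno,
    mirroring the "informative error or uninformative success" rule for
    writes described in the chat.
    """
    for op in ops:
        rc = op()
        if rc < 0:
            # Stop here: later ops in the same operation are not executed.
            return rc
    return 0


if __name__ == "__main__":
    executed = []

    def ok():
        executed.append("ok")
        return 0

    def missing():
        executed.append("missing")
        return -errno.ENOENT

    def never_reached():
        executed.append("never_reached")
        return 0

    print(run_compound([ok, missing, never_reached]), executed)
```

Under this model the caller only needs the overall code, which matches yehudasa's conclusion that returning the operate() result is enough.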
[21:56] <sagewk> i wonder if we should make 'ceph health' have the first line be the quick summary (as before), but also include after that detailed information. like, which osds are down. which pgs are stale. what osds previously managed the stale pgs. etc.
[21:57] <sagewk> otherwise it's 'ceph osd dump | grep down', 'ceph pg dump | grep stale', etc.
[21:57] <jmlowe> I'd sure like a ceph health with moderate detail
[21:59] * yehudasa_ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[22:00] * gregaf1 (~Adium@aon.hq.newdream.net) has joined #ceph
[22:00] * sagewk1 (~sage@aon.hq.newdream.net) has joined #ceph
[22:01] * dmick1 (~dmick@aon.hq.newdream.net) has joined #ceph
[22:01] * dmick (~dmick@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[22:01] * sjust1 (~sam@aon.hq.newdream.net) has joined #ceph
[22:02] * joshd1 (~joshd@aon.hq.newdream.net) has joined #ceph
[22:04] <yehudasa_> sjust1: renaming get_vals_by_key to get_vals_by_keys
[22:04] <sjust1> yehudasa_: ok
[22:04] * Tv|work (~Tv_@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:06] * gregaf (~Adium@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:06] * joshd (~joshd@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:06] * sjust (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:06] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:07] * sagewk (~sage@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:09] <darkfader> sagewk1: http://mathias-kettner.de/checkmk_writing_checks.html if you look in there at "The agent plugin" you'd see how our monitoring software likes to be fed data, normally we'd rather have multiline output because that's much easier to parse
[22:09] <darkfader> (whereas the same is more annoying to parse in normal scripts hehe)
[22:10] <darkfader> i guess the "standard" way for that dilemma is an option that outputs machine-loved output, and default output being optimized for humans
[22:10] * lofejndif (~lsqavnbok@exit2.ipredator.se) has joined #ceph
[22:16] * MarkN (~nathan@ has joined #ceph
[22:16] * MarkN (~nathan@ has left #ceph
[22:19] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:20] * Tv|work (~Tv_@aon.hq.newdream.net) has joined #ceph
[22:20] <Tv|work> sagewk: braindump sent.. phew
[22:21] <sagewk1> tv|work: thanks
[22:21] <sagewk1> darkfader: i'm thinking:
[22:21] <sagewk1> HEALTH_{OK,WARN,ERROR} short description
[22:21] <sagewk1> long multiline description
[22:21] <sagewk1> more lines
[22:21] <sagewk1> ...
[22:22] <sagewk1> and where the error code is 0 for OK, and nonzero for warn/error
[22:22] <Tv|work> sagewk1: you could probably come up with a readable structured format that'd work as yaml, and then a --format=json would be trivial too...
[22:22] <sagewk1> sigh... yeah, that'd be better.
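A rough sketch of the output shape proposed above: a status line with a short summary, detail lines after it, an exit code that is 0 only for OK, and a JSON variant for machine consumers. Everything here is an assumption for illustration: `render_health` is a hypothetical helper, not the `ceph health` implementation, the specific nonzero codes 1 and 2 are invented (the discussion only says "nonzero"), and the field names in the JSON form are made up.

```python
import json

# Assumed mapping: the discussion only requires 0 for OK and nonzero otherwise.
EXIT_CODES = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERROR": 2}


def render_health(status, summary, details, fmt="plain"):
    """Return (text, exit_code) for a health report (hypothetical helper)."""
    if status not in EXIT_CODES:
        raise ValueError("unknown status: %r" % (status,))
    details = list(details)
    if fmt == "json":
        # Machine-oriented form: one structured document.
        text = json.dumps(
            {"status": status, "summary": summary, "details": details}
        )
    else:
        # Human-oriented form: quick summary first, detail lines after.
        first = ("%s %s" % (status, summary)).rstrip()
        text = "\n".join([first] + details)
    return text, EXIT_CODES[status]
```

Keeping the first line in the old one-line format means scripts that read only that line would keep working, while a `--format=json` style option serves monitoring tools.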
[22:30] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:33] * BManojlovic (~steki@ has joined #ceph
[22:35] <nhm> sagewk1: regarding ceph health, that's a great idea.
[22:36] <nhm> sagewk1: Also, the other day I was thinking a clear way to figure out all of the places where a file lives would be really nice.
[22:42] * lofejndif (~lsqavnbok@1RDAADL0T.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[22:47] <gregaf1> yehudasa_: with those updates you can put my Reviewed-by on wip-rgw-new-atomic and wip-2139
[22:48] <yehudasa_> gregaf1: cool, thanks
[23:01] * BManojlovic (~steki@ Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * Volley (~worf@chello080109200187.3.sc-graz.chello.at) Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * Kathor (~Kathor@ Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * morse (~morse@supercomputing.univpm.it) Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * jks2 (jks@ Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * raso (~raso@debian-multimedia.org) Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * chaos_ (~chaos@hybris.inf.ug.edu.pl) Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * df_ (davidf@dog.thdo.woaf.net) Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * stingray (~stingray@stingr.net) Quit (reticulum.oftc.net kilo.oftc.net)
[23:01] * df_ (davidf@dog.thdo.woaf.net) has joined #ceph
[23:01] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[23:01] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[23:01] * chaos (~chaos@hybris.inf.ug.edu.pl) has joined #ceph
[23:01] * jks (jks@ has joined #ceph
[23:01] * Kathor (~Kathor@ has joined #ceph
[23:01] * stingray (~stingray@stingr.net) has joined #ceph
[23:01] * raso (~raso@debian-multimedia.org) has joined #ceph
[23:01] * Volley (~worf@chello080109200187.3.sc-graz.chello.at) has joined #ceph
[23:01] * BManojlovic (~steki@ has joined #ceph
[23:01] * chaos is now known as Guest5293
[23:04] <yehudasa_> sjust1: how do we init omap?
[23:04] <yehudasa_> omap_clear()?
[23:07] <sjust1> doing any omap operation will implicitly create the omap
[23:07] <sjust1> as with xattrs
[23:08] <yehudasa_> sjust1: well, any write omap operation will do that I guess?
[23:08] <sjust1> yup
[23:08] <yehudasa_> sjust1: in that case omap_clear() will serve as omap_init(), should we rename it?
[23:19] <sjust1> yehudasa_: there is no init
[23:20] <sjust1> reading from an omap to which you haven't written will simply return nothing
[23:20] <yehudasa_> sjust1: but what if it doesn't exist..
[23:20] <sjust1> touch will create the file
[23:21] <yehudasa_> sjust1: that's kind of implicit api
[23:21] <sjust1> it's supposed to act somewhat like an xattr or the contents of the file
[23:22] <yehudasa_> can you have both omap and data content on a single object?
[23:22] <sjust1> yes
[23:22] <yehudasa_> oh, cool
[23:22] <sjust1> yep, entirely independent
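The omap semantics described above can be modelled with a small in-memory toy: there is no init call, any write creates the object implicitly, reading an omap that was never written returns nothing, and omap, xattrs and data are independent of each other. `ToyStore` and its method names are hypothetical and are not the librados API. How a read of a nonexistent object behaves is not settled in the conversation, so the `KeyError` here is purely an assumption.

```python
class ToyObject:
    """One stored object: byte data, xattrs and omap are kept separately."""

    def __init__(self):
        self.data = b""
        self.xattrs = {}
        self.omap = {}


class ToyStore:
    """Hypothetical in-memory model of the semantics discussed above."""

    def __init__(self):
        self.objects = {}

    def touch(self, name):
        # Create the object if needed; never clears existing contents.
        return self.objects.setdefault(name, ToyObject())

    def write(self, name, payload):
        # Writing data implicitly creates the object.
        self.touch(name).data = payload

    def omap_set(self, name, key, value):
        # Any omap write implicitly creates the object and its omap.
        self.touch(name).omap[key] = value

    def omap_get_all(self, name):
        # Assumption: reading a missing object is an error (KeyError here).
        return dict(self.objects[name].omap)
```

In this model `omap_clear()` would not need to double as an init, because an untouched omap already reads as empty.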
[23:27] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)
[23:28] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[23:33] <sagewk1> sjust1, yehudasa_: should omap_get() on a non-existent key return ENODATA instead of ENOENT perhaps, like an xattr?
[23:34] <sagewk1> to distinguish between a missing object and a missing key
[23:34] <yehudasa_> sagewk1: yes
[23:34] * sagewk1 is now known as sagewk
[23:34] <yehudasa_> though it's going to break old behavior..
[23:35] <sagewk> this is a new api, there is no old behavior, right?
[23:35] <yehudasa_> technically speaking it's a new api
[23:35] <yehudasa_> but converting from tmap to omap would be more painful
[23:35] <sagewk> yeah, a bit. seems worthwhile tho... sjust1?
[23:35] <yehudasa_> as we'd need to translate ENODATA to ENOENT in certain cases
[23:36] <sagewk> yeah
[23:37] <yehudasa_> though there's probably only one place to change as it's abstracted
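A sketch of the distinction being proposed, plus the single translation point mentioned for code converted from tmap. Both functions are hypothetical stand-ins operating on a plain dict of dicts. They are not the real omap_get() signature, and returning a `(ret, value)` pair is just a convenience for the example.

```python
import errno


def omap_get(objects, name, key):
    """Toy lookup: `objects` maps object name -> {key: value}."""
    if name not in objects:
        return -errno.ENOENT, None   # the object itself is missing
    omap = objects[name]
    if key not in omap:
        return -errno.ENODATA, None  # object exists, key does not
    return 0, omap[key]


def tmap_compat_get(objects, name, key):
    """Shim for callers that expect the older ENOENT-for-missing-key result."""
    ret, value = omap_get(objects, name, key)
    if ret == -errno.ENODATA:
        ret = -errno.ENOENT
    return ret, value
```

If the lookup is already abstracted behind one helper, a shim like `tmap_compat_get` would be the only place that needs to change.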
[23:46] <yehudasa_> sjust1: I think I need an exclusive set_key
[23:47] * joao (~JL@89-181-145-13.net.novis.pt) Quit (Quit: Leaving)
[23:47] <yehudasa_> sjust1: wait, otoh maybe not
[23:47] <yehudasa_> sjust1: but thinking about it a bit more, it will be useful anyway
[23:47] <sjust1> yeah, I was just putting it off until necessary
[23:47] <yehudasa_> sjust1: will allow doing atomic set and test
[23:48] <yehudasa_> I mean test and set
[23:48] <yehudasa_> anyway, there's a place where we relied on it, but I'm not sure it was actually needed there
[23:48] <Tv|work> leveldb set-if-not-exists is a read-write cycle :(
[23:48] <Tv|work> oh well
[23:49] <Tv|work> probably a lot more lower hanging fruit for performance all over the place
[23:49] <yehudasa_> yeah
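To illustrate why an exclusive set costs a read-write cycle on a plain key-value store, here is a toy version that counts the underlying operations. `ToyKV` and `set_exclusive` are hypothetical names, the store is an ordinary dict rather than leveldb, and the sketch is single-threaded: it assumes the caller already serializes operations on the key, which is what would make the test-and-set atomic in practice.

```python
import errno


class ToyKV:
    """Dict-backed store that counts reads and writes (illustration only)."""

    def __init__(self):
        self._items = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self._items.get(key)

    def put(self, key, value):
        self.writes += 1
        self._items[key] = value


def set_exclusive(kv, key, value):
    """Set `key` only if it is absent; return 0 or -EEXIST."""
    # The existence check is a read that a plain set would not need.
    if kv.get(key) is not None:
        return -errno.EEXIST
    kv.put(key, value)
    return 0
```

A successful exclusive set therefore costs one read plus one write, and a rejected one costs a read alone.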
[23:52] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.