#ceph IRC Log


IRC Log for 2013-05-10

Timestamps are in GMT/BST.

[0:04] * portante is now known as portante|afk
[0:04] <Fetch> Kioob: your tribulations are encouraging me to wait for dumpling to upgrade :)
[0:05] <Kioob> well... on the SSD node, it was really fast
[0:05] <Kioob> node(s)
[0:08] <Kioob> or it's because of snapshots :/
[0:08] <Kioob> (I don't have any snapshots on SSD nodes)
[0:09] * athrift (~nz_monkey@222.47.255.123.static.snap.net.nz) Quit (Quit: No Ping reply in 180 seconds.)
[0:11] <Kioob> yeah ! 30 minutes, it's now ok.
[0:12] <Kioob> there are 32 OSDs again
[0:12] <Kioob> but I probably can upgrade per host :/
[0:13] * athrift (~nz_monkey@123.255.47.222) has joined #ceph
[0:13] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) Quit (Quit: Leaving.)
[0:14] <saras> does anyone know if upgrading works with landscape
[0:23] <saras> http://jujucharms.com/search?search_text=ceph has anyone tested with these
[0:28] <dmick> scuttlemonkey has worked with them a lot
[0:30] * mip (~chatzilla@cpc15-thor5-2-0-cust102.14-2.cable.virginmedia.com) Quit (Remote host closed the connection)
[0:31] <cjh_> how slow would ceph get if you put one monitor on every osd server?
[0:31] <cjh_> you'd prob have to lock across a lot of machines creating a lot of extra network traffic
[0:31] <saras> dmick: anyone playing with salt?
[0:31] <cjh_> saras: i have a friend that uses it and he likes it
[0:32] * ScOut3R (~ScOut3R@BC065755.dsl.pool.telekom.hu) has joined #ceph
[0:32] <dmick> cjh_: 3 or 5 mons is enough, so no value
[0:33] <saras> cjh_: my plan is to start playing with it soon; try to get ceph running as the storage for the salt-master to solve failover with ceph
[0:33] <cjh_> dmick: but what if i wanted to make it really easy to deploy across many hosts. you have to have additional logic to keep track of which hosts are monitors and fail over that process to other nodes if it needs to
[0:34] <saras> have fun guys
[0:34] <dmick> cjh_: yes
[0:35] <cjh_> i guess what i'm getting at is that if the osd process and monitor process were packaged together it makes it really easy to just deploy it and go
[0:36] <Fetch> cjh_: they are packaged together
[0:36] <Fetch> whether a node runs osd or mon or both is a configuration/startup script option
[0:36] <dmick> well, for some use of "package"
[0:37] <Fetch> they're in the same rpm ;)
[0:37] <cjh_> oh yeah i understand that :)
[0:38] <cjh_> if you turned on a monitor on every node then every node is exactly the same and it makes it super easy to expand
[0:39] <lurbs> An even number of monitors is pretty bad, though. Quorum fail.
[0:40] <cjh_> true, you'd at least need some logic built in to take into account even numbers
[0:40] <cjh_> but that's easy to do compared to keeping 3-5 monitors running on the cluster at all times when nodes are dying
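A quick note on the quorum math behind lurbs' point: the monitors form a Paxos quorum, which needs a strict majority (floor(N/2)+1), so an even count adds a failure point without adding failure tolerance:

    mons   majority needed   mon failures tolerated
      3           2                    1
      4           3                    1
      5           3                    2

That is why 3 or 5 monitors is the usual recommendation rather than one per node.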
[0:41] <Fetch> cjh_: not belittling your idea, but it seems like that's something, if you were running into it as an issue, that you'd really really want addressed through a SCM system of some sort
[0:41] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has joined #ceph
[0:41] <Fetch> vice relying on whatever the Ceph guys coded as default
[0:41] <Fetch> my two cents, anyway
[0:42] * tnt (~tnt@91.177.240.165) Quit (Ping timeout: 480 seconds)
[0:42] <cjh_> i'm not sure what SCM is
[0:46] <Fetch> puppet, chef, ansible, whatnot
[0:46] <cjh_> ah, ok
[0:47] <cjh_> yeah you could have chef manage that to an extent
[0:47] * loicd (~loic@2a01:e35:2eba:db10:ec20:da10:d403:13b6) has joined #ceph
[0:47] <cjh_> does chef understand having an odd number of monitors in the cluster though?
[0:47] <cjh_> or would you have to manually set hosts to be monitors
[0:59] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[1:00] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Read error: Connection reset by peer)
[1:00] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[1:04] * ScOut3R (~ScOut3R@BC065755.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[1:06] * rustam (~rustam@94.15.91.30) has joined #ceph
[1:09] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[1:10] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[1:11] <cjh_> loicd: you around?
[1:16] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[1:17] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[1:19] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:26] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Remote host closed the connection)
[1:26] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:33] <scuttlemonkey> saras: still around?
[1:35] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[1:42] * rustam (~rustam@94.15.91.30) has joined #ceph
[1:50] * diegows (~diegows@190.190.2.126) has joined #ceph
[1:56] * LeaChim (~LeaChim@94.15.192.184) Quit (Ping timeout: 480 seconds)
[2:00] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[2:04] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[2:15] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[2:17] * gmason (~gmason@hpcc-fw.net.msu.edu) Quit (Ping timeout: 480 seconds)
[2:27] * Tamil (~tamil@38.122.20.226) has joined #ceph
[2:29] * saras (~kvirc@74-61-8-52.war.clearwire-wmx.net) Quit (Ping timeout: 480 seconds)
[2:31] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:31] * KindTwo (~KindOne@h113.172.17.98.dynamic.ip.windstream.net) has joined #ceph
[2:32] * KindTwo is now known as KindOne
[2:36] * sagelap (~sage@2600:1012:b000:21fe:6cec:8658:b13d:9ee8) has joined #ceph
[2:48] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[2:49] * rustam (~rustam@94.15.91.30) has joined #ceph
[2:54] * saras (~kvirc@74-61-8-52.war.clearwire-wmx.net) has joined #ceph
[2:55] <saras> scuttlemonkey: what up m8
[3:00] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[3:03] <scuttlemonkey> hey
[3:03] <scuttlemonkey> you were the salt fan, right?
[3:03] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) Quit (Quit: Leaving.)
[3:03] <saras> scuttlemonkey: yes
[3:03] <scuttlemonkey> I spoke briefly w/ the guy I met at openstack
[3:03] <scuttlemonkey> https://github.com/saltstack/salt-states/tree/master/ceph
[3:04] <scuttlemonkey> that is the only one that is public atm
[3:04] <scuttlemonkey> he has another that he put together for internal use that he wants to polish and share
[3:04] <scuttlemonkey> maybe even a guest blog on ceph.com if I can talk him into it :)
[3:04] <scuttlemonkey> dunno if that's helpful (I haven't had time to sit down and play w/ salt yet)
[3:05] <scuttlemonkey> but I knew you were looking
[3:05] <saras> thanks very much
[3:06] <saras> after I get through all the tasks from the Pi build, setting up salt on ceph will be the next thing on my todo list
[3:06] <scuttlemonkey> sweet
[3:07] <scuttlemonkey> if you come up with something you feel like sharing I'd be happy to feature it on ceph.com
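For anyone following the salt thread, a minimal sketch of applying the ceph state from the salt-states repo linked above, assuming its ceph/ directory has been copied under the master's file_roots (the target glob and path here are illustrative, not from the repo itself):

    # on the salt master, with the state placed under /srv/salt/ceph
    salt 'ceph-node*' state.sls ceph        # apply just the ceph state to matching minions
    salt 'ceph-node*' state.highstate       # or apply whatever top.sls assigns them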
[3:07] <saras> you can set up ceph to not have any single points of failure
[3:07] <scuttlemonkey> I really like to not be the only one with the microphone for open source blogs :)
[3:07] <scuttlemonkey> but barring that, I'll eventually have time to sit down and do what I did with juju (I hope)
[3:08] <scuttlemonkey> http://ceph.com/dev-notes/deploying-ceph-with-juju/
[3:08] <scuttlemonkey> would love to be able to do that ^^ for all the orchestration tools
[3:08] <saras> scuttlemonkey: openshift is closer to juju than salt is
[3:09] <scuttlemonkey> heh, one more to add to the pile of stuff I wanna learn and play with then :P
[3:09] <saras> lol
[3:09] <scuttlemonkey> I've done chef, juju, ceph-deploy, crowbar, and a bit of puppet
[3:09] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[3:10] <saras> chef and puppet are closer to what salt is, or at least that's what I got from watching all the Salt Air videos
[3:10] * rustam (~rustam@94.15.91.30) has joined #ceph
[3:10] <scuttlemonkey> yeah
[3:11] <scuttlemonkey> they are all in the same family of things in my head though
[3:11] <scuttlemonkey> even though they can have quite a bit of difference in use case and approach
[3:11] <saras> yes
[3:11] <saras> salt has a better story when it comes to monitoring
[3:12] <saras> and reacting to it
[3:12] <scuttlemonkey> yeah, I'm really looking forward to playing with it
[3:12] <scuttlemonkey> sounds very cool
[3:12] <saras> if they do another salt shaker you should try to join
[3:13] <scuttlemonkey> is that an online event of some sort?
[3:14] <saras> https://www.youtube.com/watch?v=XmNTNICNe1k
[3:14] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[3:14] <saras> the salt and rackspace guys did hangout and talked about salt
[3:14] <scuttlemonkey> oh nice
[3:15] <saras> I think it's 10-20 min into that video
[3:16] <saras> i would have loved to do google summer of code with you guys or the salt guys
[3:16] <saras> but I really missed the deadline
[3:16] <Kioob> since 0.61.1 files like /var/log/ceph/ceph-mon.*.tdump are filling my disks
[3:16] <Kioob> can I disable that ?
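Those .tdump files are most likely the monitor transaction dumps discussed further down this log (the 'mon debug dump transactions' option); a sketch of switching them off, assuming that option is what is generating them:

    [mon]
        mon debug dump transactions = false   ; stop writing ceph-mon.*.tdump
    ; restart the monitors afterwards; the existing .tdump files can then be deleted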
[3:17] <scuttlemonkey> saras: we applied but didn't get accepted
[3:17] <scuttlemonkey> their complaint was we didn't have enough project ideas
[3:18] <saras> you're doing better than me then
[3:18] <scuttlemonkey> http://ceph.com/gsoc2013/
[3:18] <scuttlemonkey> I want to get a giant "FAILED" rubber stamp graphic I can overlay on top of that page or something :P
[3:21] <saras> scuttlemonkey: lol
[3:22] <saras> i was thinking of a cool training/marketing idea
[3:22] <saras> something like what 10gen is doing
[3:23] <saras> https://education.10gen.com/ scuttlemonkey: have you seen this
[3:24] <scuttlemonkey> hmmm, don't think so
[3:24] <saras> the mongo guys
[3:24] <scuttlemonkey> yeah, interesting
[3:24] <scuttlemonkey> self-paced learning
[3:24] <saras> not really
[3:25] <saras> it's week by week
[3:25] <scuttlemonkey> yeah
[3:25] <scuttlemonkey> but not scheduled time of day at least
[3:25] <scuttlemonkey> pretty neat
[3:26] <saras> yes
[3:26] <saras> https://github.com/Stanford-Online/class2go/
[3:27] <saras> that's the platform that stanford is using for online courses
[3:33] <scuttlemonkey> cool
[3:33] <scuttlemonkey> ok, time to call it a night
[3:33] <scuttlemonkey> find a bit of irish whisky and enjoy the evening out on the deck
[3:34] <saras> scuttlemonkey: night
[3:36] <saras> scuttlemonkey: i put my phone and email in the private channel if you need them
[3:42] * rustam (~rustam@94.15.91.30) has joined #ceph
[3:47] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[3:54] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:07] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[4:08] * rustam (~rustam@94.15.91.30) has joined #ceph
[4:11] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[4:18] * rustam (~rustam@94.15.91.30) has joined #ceph
[4:19] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[4:25] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[4:26] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Ping timeout: 480 seconds)
[4:33] * rustam (~rustam@94.15.91.30) has joined #ceph
[4:35] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[5:00] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[5:10] * rustam (~rustam@94.15.91.30) has joined #ceph
[5:10] * NyanDog (~q@103.29.151.3) Quit (Read error: Connection reset by peer)
[5:11] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[5:12] * NyanDog (~q@103.29.151.3) has joined #ceph
[5:21] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[5:57] * rustam (~rustam@94.15.91.30) has joined #ceph
[5:59] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[6:08] * rustam (~rustam@94.15.91.30) has joined #ceph
[6:10] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[6:34] * rustam (~rustam@94.15.91.30) has joined #ceph
[6:35] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[6:48] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[6:54] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Quit: Leaving.)
[6:59] * rustam (~rustam@94.15.91.30) has joined #ceph
[7:00] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[7:29] * rustam (~rustam@94.15.91.30) has joined #ceph
[7:31] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[7:52] * sagelap (~sage@2600:1012:b000:21fe:6cec:8658:b13d:9ee8) Quit (Ping timeout: 480 seconds)
[7:57] * tnt (~tnt@91.177.240.165) has joined #ceph
[8:01] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[8:01] * rustam (~rustam@94.15.91.30) has joined #ceph
[8:02] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[8:10] * fridadud (~oftc-webi@fw-office.allied-internet.ag) has joined #ceph
[8:22] * rustam (~rustam@94.15.91.30) has joined #ceph
[8:23] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:24] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[8:27] * eternaleye (~eternaley@cl-43.lax-02.us.sixxs.net) Quit (Ping timeout: 480 seconds)
[8:28] * eternaleye (~eternaley@cl-43.lax-02.us.sixxs.net) has joined #ceph
[8:38] * rustam (~rustam@94.15.91.30) has joined #ceph
[8:38] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:39] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[8:40] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: IceChat - Its what Cool People use)
[8:49] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[8:51] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Remote host closed the connection)
[8:52] * rustam (~rustam@94.15.91.30) has joined #ceph
[8:53] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[9:01] * fabioFVZ (~fabiofvz@213.187.20.119) has joined #ceph
[9:06] * jimyeh (~Adium@60-250-129-63.HINET-IP.hinet.net) has joined #ceph
[9:06] <jimyeh> Hi
[9:07] <jimyeh> Is there any graceful shutdown mechanism in a ceph cluster?
[9:08] <jimyeh> Or just do "service ceph -a stop"?
[9:09] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[9:14] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[9:31] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has joined #ceph
[9:33] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:33] * rustam (~rustam@94.15.91.30) has joined #ceph
[9:35] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[9:35] * LeaChim (~LeaChim@94.15.192.184) has joined #ceph
[9:38] * tnt (~tnt@91.177.240.165) Quit (Ping timeout: 480 seconds)
[9:46] * rustam (~rustam@94.15.91.30) has joined #ceph
[9:47] * loicd (~loic@2a01:e35:2eba:db10:ec20:da10:d403:13b6) Quit (Remote host closed the connection)
[9:56] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[9:58] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:59] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[10:14] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[10:17] * BillK (~BillK@58-7-220-225.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[10:19] * rustam (~rustam@94.15.91.30) has joined #ceph
[10:21] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[10:25] * BillK (~BillK@124-169-231-135.dyn.iinet.net.au) has joined #ceph
[10:32] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[10:36] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Remote host closed the connection)
[10:39] * kyann (~oftc-webi@tui75-3-88-168-236-26.fbx.proxad.net) has joined #ceph
[10:39] <kyann> Hi
[10:41] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[10:44] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[10:47] <kyann> I have a seg fault with the 0.61.1 rgw on a 0.61 cluster :/
[10:52] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[10:52] <leseb> kyann: seg fault on the rgw daemon?
[10:53] <matt_> are there any docs around for librados c++?
[10:54] <tnt> there is the C api doc http://ceph.com/docs/master/rados/api/librados/ and matching with c++ is trivial.
[10:55] <matt_> maybe not so trivial when you're learning c++ :)
[10:56] <kyann> leseb: yes
[10:56] <kyann> i'm doing more tests
[10:57] <kyann> but basically i have two radosgw
[10:57] <kyann> one is 0.61 and runs fine
[10:57] <kyann> one is 0.61.1 and segfaults on start
[10:57] <kyann> i downgraded to 0.60-1precise but it still segfaults
[10:59] <Kioob`Taff> Hi
[10:59] <Kioob`Taff> I have an OSD under 0.61.1 which doesn't work, I obtain message "heartbeat_map is_healthy 'OSD::op_tp thread 0x7f1ec9976700' had timed out after 15"
[11:12] * jjgalvez1 (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[11:12] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[11:12] <Kioob`Taff> other question : slow request 33.681700 seconds old, received at 2013-05-10 11:11:03.285419: osd_op(client.4236483.1:470712312 niman-root.rbd [watch add cookie 1 ver 0] 3.d2c98a17 e67723) currently waiting for missing object
[11:12] <Kioob`Taff> how can I find where is that "missing object" ?
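One way to chase that kind of "waiting for missing object" slow request on a cuttlefish-era CLI (the pg id below is a placeholder, not taken from the log):

    ceph health detail          # lists stuck pgs and the oldest slow requests
    ceph pg dump_stuck unclean  # pgs that are not active+clean, with their acting OSDs
    ceph pg 3.17 query          # ask the pg's primary why it is stuck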
[11:20] <leseb> KindOne: is the object "niman-root.rbd"?
[11:20] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[11:20] <leseb> Kioob`Taff: is the object "niman-root.rbd"?
[11:23] <Kioob`Taff> niman-root.rbd is an RBD image of 2GB, with 3 replica
[11:25] <tnt> what does ceph -s say ?
[11:27] <Kioob`Taff> a lot of errors
[11:27] <Kioob`Taff> health HEALTH_ERR 232 pgs backfill; 10 pgs backfilling; 4 pgs down; 14 pgs incomplete; 4 pgs inconsistent; 4 pgs peering; 9 pgs recovery_wait; 18 pgs stuck inactive; 269 pgs stuck unclean; recovery 159018/8853385 degraded (1.796%); recovering 31 o/s, 82037KB/s; 23 scrub errors; noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
[11:27] <Kioob`Taff> monmap e7: 5 mons at {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0}, election epoch 2456, quorum 0,1,2,3 a,b,c,e
[11:27] <Kioob`Taff> osdmap e67961: 50 osds: 47 up, 47 in
[11:27] <Kioob`Taff> pgmap v12241028: 7824 pgs: 7549 active+clean, 232 active+remapped+wait_backfill, 7 active+recovery_wait, 10 active+remapped+backfilling, 4 down+peering, 2 active+clean+scrubbing, 2 active+recovery_wait+remapped, 14 incomplete, 4 active+clean+inconsistent; 5503 GB data, 17524 GB used, 18812 GB / 36337 GB avail; 159018/8853385 degraded (1.796%); recovering 31 o/s, 82037KB/s
[11:27] <Kioob`Taff> mdsmap e1: 0/0/1 up
[11:28] <Kioob`Taff> (I lost one OSD)
[11:29] <Kioob`Taff> I don't understand pgs in down and incomplete state
[11:29] <absynth> for your first question (timed out after 15), did you try restarting the OSD? we had that yesterday on .56.6 and a restart seemed to fix the problem
[11:30] <absynth> you really shouldn't have incomplete PGs with 1/50 OSDs down
[11:30] <absynth> but your osdmap shows that only 47 are up?
[11:30] <Kioob`Taff> absynth: oh yes I tried... in fact that OSD was crashing, and I was thinking it was related, but now it seem to stay online
[11:30] <absynth> does it consume LOTS of cpu?
[11:30] <absynth> like... 1000% and up?
[11:32] <Kioob`Taff> yes, one OSD has been "out" for several weeks, and the other one is really missing. But with 3 replicas, this should be ok, no ?
[11:32] <Kioob`Taff> no, it doesn't use a lot of CPU
[11:33] <tnt> Well you are missing 3 OSDs.
[11:33] <Kioob`Taff> tnt: yes, but for an OSD that's been out for several weeks, the data was already rebalanced onto the other OSDs
[11:33] <Kioob`Taff> no ?
[11:34] <tnt> if the cluster recovered to HEALTH_OK in the mean time, yes it should.
[11:34] <Kioob`Taff> so yes
[11:36] <absynth> that's really weird... you should neither have inconsistent, nor incomplete PGs
[11:36] <tnt> but you also have scrubs errors ...
[11:36] <absynth> down is correct, IMHO, since the osd is down and all of its primary PGs with it
[11:37] <tnt> I'd wait for the cluster to stabilize, then look at the scrub errors and try to repair those
[11:37] <absynth> yeah
[11:37] <absynth> did the slow requests resolve or are they still there?
[11:37] <tnt> well, they seemed to be waiting for some incomplete or down PG.
[11:39] <kyann> Kioob`Taff: you should query stuck pg
[11:39] <kyann> and see why they are stuck
[11:41] <Kioob`Taff> kyann: that's what I did, but... I didn't understand it any better :p
[11:41] <Kioob`Taff> one note : "ceph pg 8.23e query" doesn't respond... after 2 minutes I still haven't got any answer
[11:42] <absynth> is it possible that pg 8.23e is on your "hanging" osd?
[11:42] <Kioob`Taff> no, in "pg dump" I see that it is on [32,13]
[11:42] <Kioob`Taff> which are 2 working OSD
[11:43] <kyann> maybe osd 32 or 13 only seems to "work"
[11:43] <Kioob`Taff> but... I see only 2 OSD here...
[11:44] <Kioob`Taff> in "ceph pg dump | grep incomplete", I don't see a specific OSD which could be the source of the problem
[11:44] <tnt> well, maybe pool 8 is not with replication size 3 ...
[11:44] <Kioob`Taff> I can check that
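A sketch of the check tnt is suggesting (pool names here are placeholders):

    ceph osd dump | grep 'rep size'   # cuttlefish prints each pool's replication size here
    ceph osd pool get rbd size        # or query one pool directly
    ceph osd pool set rbd size 3      # and raise it later if 2 copies isn't enough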
[11:45] * bergerx_ (~bekir@78.188.204.182) Quit (Quit: Leaving.)
[11:46] <Kioob`Taff> ohhhh you're right tnt !
[11:46] <Kioob`Taff> "rep size 2" → game over
[11:47] <Kioob`Taff> and so. All 8.* PG are "dead"
[11:48] <Kioob`Taff> I also have pg in "down" state, on pools 0, 1 and 2
[11:48] <Kioob`Taff> (the defaults pool)
[11:49] <Kioob`Taff> the exact status is "down+peering"
[11:50] <Kioob`Taff> "blocked": "peering is blocked due to down osds",
[11:50] <Kioob`Taff> "peering_blocked_by": [
[11:50] <Kioob`Taff> { "osd": 19,
[11:50] <Kioob`Taff> "current_lost_at": 0,
[11:50] <Kioob`Taff> "comment": "starting or marking this osd lost may let us proceed"}]},
[11:50] <Kioob`Taff> ...
[11:51] <Kioob`Taff> so I just need to mark that OSD as lost ???
[11:51] <absynth> well, is it lost?
[11:51] <absynth> lost == will never, ever come back again, as far as i remember
[11:51] <tnt> yup. so only mark lost if you give up on any data on it.
[11:52] <Kioob`Taff> oh yes... disk unreadable (kernel errors), and when I mount the partition I see a lot of errors in "ls"
[11:52] <absynth> then marking it as lost might be a viable alternative
[11:52] <Kioob`Taff> but yes... I will lose all data from pool 8
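For reference, a rough sketch of the "mark it lost" step being discussed, using osd.19 from the pg query output above:

    ceph osd out 19                            # if it isn't already out
    ceph osd lost 19 --yes-i-really-mean-it    # let peering give up on osd.19's copies
    # the down+peering pgs should then proceed; anything that only lived on osd.19 is gone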
[11:52] * KindTwo (~KindOne@h211.35.28.71.dynamic.ip.windstream.net) has joined #ceph
[11:55] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:55] * KindTwo is now known as KindOne
[12:02] <Azrael> hey folks. our entire ceph cluster went down and won't come back online. osds are going OOM.
[12:02] <absynth> what version?
[12:02] <Azrael> 0.61.1
[12:03] <absynth> phew. not my problem yet.
[12:03] <absynth> ;)
[12:03] <Azrael> first crash was 0.61.0
[12:03] <absynth> so, did any change prompt the outage?
[12:03] <Azrael> can't bring anything back online
[12:03] <Azrael> nope
[12:03] <Azrael> 12 osd's per data node, 16GB ram per data node
[12:03] <absynth> are the OSDs going oom very quickly after starting or during normal operation?
[12:03] <Azrael> we were told on the mailing list that 1GB ram per OSD is good
[12:03] <Azrael> quickly
[12:04] <absynth> 1gb is usually OK
[12:04] <Azrael> i'm seeing an OSD use 6GB now
[12:04] <absynth> are you scrubbing?
[12:04] <Azrael> the whole cluster is off
[12:04] <Azrael> i'm doing lots of things
[12:04] <Azrael> i'm trying to bring things back online
[12:04] <absynth> maybe you want to disable scrubbing for the time being, although i thought the scrubbing memleaks were fixed
[12:04] <Azrael> but if one osd uses 6GB and i have 16GB total... can't bring the remaining 11 back up on the boxes
[12:04] <Azrael> ok
[12:04] <Azrael> ceph osd set noscrub?
[12:05] <absynth> that should work
[12:05] <absynth> and set nodeepscrub
[12:05] <Azrael> nodeepscrub == unknown cmd
[12:05] <absynth> if that doesn't work, osd scrub min interval = 30000000 or something
[12:05] <Azrael> but noscrub works
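A sketch of the flag juggling above on a cuttlefish cluster (as noted, this version does not recognise a nodeep-scrub flag):

    ceph osd set noscrub      # pause regular scrubbing cluster-wide
    ceph osd set noout        # optional: keep down OSDs from being marked out mid-recovery
    ceph osd unset noscrub    # put things back once the cluster settles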
[12:05] <kyann> Azrael: when in recovery an OSD needs a lot more RAM
[12:05] <Azrael> orly
[12:06] <absynth> whats your replica count?
[12:06] <Azrael> 3
[12:07] <kyann> how many pg ?
[12:07] <Azrael> 17000+
[12:07] <kyann> wow, that's a lot
[12:08] <Azrael> ok. we were told should be fine.
[12:08] <kyann> it's too much
[12:08] <Azrael> howso?
[12:09] <kyann> based on this page http://ceph.com/docs/master/rados/operations/placement-groups/
[12:09] <kyann> you should have around 400 pgs
[12:09] <absynth> yeah
[12:09] <absynth> oh wait, how many OSDs do you have?
[12:10] <Azrael> right so following that plan
[12:10] <Azrael> we should have 24000+ pg's
[12:10] <Azrael> we have 60 osds
[12:10] <Azrael> sorry
[12:10] <Azrael> right now we have 72 osd's
[12:10] <kyann> oh ok
[12:10] <absynth> so 6 nodes
[12:10] <Azrael> in the future we will have 720 osd's
[12:10] <Azrael> yes
[12:10] <tnt> I have ~12k pgs ... unfortunately back then the formula I saw was just # OSDs * 100 and didn't mention replicas, and there were no notes that when using multiple pools you should also divide by that ...
[12:10] <Azrael> but we can't change pg allocations either
[12:11] <Azrael> so we have to allocate for what we're planning
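Working the formula from that placement-groups page through these numbers (roughly OSDs x 100 / replicas, summed across pools):

    72 OSDs  x 100 / 3 replicas ≈ 2400 PGs   (current cluster)
    720 OSDs x 100 / 3 replicas ≈ 24000 PGs  (planned build-out)

so 17000+ PGs is far above what 72 OSDs call for, but in the right ballpark for the 720-OSD target, which is the bind Azrael describes given that PG counts couldn't be shrunk at the time.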
[12:11] <kyann> you can change pg allocation but it's "experimental" :D
[12:11] <tnt> you can only split AFAIK. Is merging implemented at all ?
[12:11] <absynth> and anything marked experimental in a ceph context is a sure recipe for disaster
[12:11] <Azrael> agreed
[12:12] <Azrael> so........
[12:12] <Azrael> any tips on how to get our system back online?
[12:12] <Azrael> or dead in the water?
[12:12] <Azrael> switch to XtreemFS? ;-)
[12:12] <absynth> that's actually a problematic situation you're in
[12:12] <absynth> at least from what i can tell
[12:13] <Azrael> yeah heh
[12:13] <Azrael> even getting more ram wont help
[12:13] <Azrael> if we bump to 64GB
[12:13] <Azrael> atm i see OSD's using 7.4GB
[12:13] <absynth> the RAM usage is really weird, it should never be that high
[12:13] <Azrael> it keeps climbing
[12:13] <absynth> not even on a massive recovery
[12:13] <kyann> I did have that kind of memory usage on 10k pg cluster
[12:13] <kyann> while recovering
[12:14] <kyann> but I had 48GB of RAM so everything was fine
[12:14] <absynth> we have about 10k PGs and we are currently backfilling, but no OSD has memory usage that high
[12:14] <tnt> Does crush use the pool number as input ?
[12:16] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[12:17] <Azrael> ok
[12:17] <Azrael> should we throw away all data and start over?
[12:17] <Azrael> 1 OSD per node instead of 1 OSD per disk?
[12:17] <absynth> i think 1 osd per disk is still the way to go
[12:17] <absynth> but maybe 12 disks per node are not such a good idea
[12:18] <Azrael> 128GB RAM per box then?
[12:19] <absynth> less disks
[12:21] <Azrael> strange. other folks on the mailing lists were running with this setup.
[12:33] * diegows (~diegows@190.190.2.126) has joined #ceph
[12:54] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[12:54] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[12:54] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[12:55] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[13:02] * Romeo (~romeo@198.144.195.85) Quit (Read error: Connection reset by peer)
[13:02] * Romeo (~romeo@198.144.195.85) has joined #ceph
[13:06] * jimyeh (~Adium@60-250-129-63.HINET-IP.hinet.net) Quit (Quit: Leaving.)
[13:32] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:34] <Kioob`Taff> 2013-05-10 13:34:07.284907 7f6a60fdc700 0 log [WRN] : slow request 8716.620331 seconds old, received at 2013-05-10 11:08:50.664509: osd_op(client.4236483.1:470500821 rb.0.2d784.238e1f29.000000000e24 [read 880640~4096] 4.9fd6f600 RETRY=-1 e67691) currently reached pg
[13:35] <Kioob`Taff> what can be the problem ?
[13:35] <Kioob`Taff> 8716 seconds is very old...
[13:37] <absynth> try marking the OSD as down
[13:37] <absynth> the one that has the slow request
[13:40] <Kioob`Taff> thanks !
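A sketch of absynth's suggestion, assuming the slow requests all point at one OSD (osd.12 is a placeholder id):

    ceph osd down 12               # mark it down; it re-joins and re-peers, which often clears stuck requests
    service ceph restart osd.12    # or simply restart the daemon on its host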
[13:40] <loicd> cjh_: yes
[13:41] * loicd lagging behind 12h00... ;-)
[13:44] <absynth> loicd: the good news is - it's friday now!
[13:44] <loicd> absynth: :-)
[13:46] <Kioob`Taff> absynth: in fact it was the source of all my problems.... all is now working !
[13:47] <absynth> what did you do to fix it? mark the OSD down?
[13:51] * BillK (~BillK@124-169-231-135.dyn.iinet.net.au) Quit (Remote host closed the connection)
[13:54] <Kioob`Taff> yes absynth
[13:55] <Kioob`Taff> but in fact... it fixed the problem (VMs are unfrozen and working), but after some minutes the problem is back
[13:56] <Kioob`Taff> 2013-05-10 13:55:08.125691 7f1938dd2700 0 log [WRN] : slow request 120.232696 seconds old, received at 2013-05-10 13:53:07.892940: osd_op(client.4154191.1:291838319 rb.0.1c3cb3.2ae8944a.000000000ab7 [write 610304~4096] 8.b9407a3e RETRY=-1 snapc 49=[49,41,39,31,29,21,19,11,9,4] e68552) currently reached pg
[13:56] <Kioob`Taff> 8.b9407a3e refers to the PG ?
[13:58] <absynth> that's a new slow request
[13:58] <Kioob`Taff> yes
[13:58] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:59] <absynth> well, in this case our usual approach was to stop reweighting completely
[13:59] <absynth> and wait until all slow requests are either gone or killed
[13:59] <Kioob`Taff> stop reweighting ?
[13:59] <absynth> reweighting or repairing
[14:00] <absynth> set osd_recovery_max_active = 0
[14:00] <Kioob`Taff> mmm scrubbing maybe ?
[14:00] <absynth> oh yeah, and stop scrubbing too
[14:00] <absynth> on 0.61.1 set osd noscrub
[14:00] <absynth> earlier: osd_scrub_min_interval = 10000000 or something like that
[14:01] <Kioob`Taff> I have 0.61.1 but what is the syntax ? :$
[14:02] <Kioob`Taff> "ceph osd set noscrub" ;)
[14:02] <absynth> i have no idea how it works yet, i just read it in the changelog
[14:03] <absynth> but it should be like you wrote, yes
[14:03] <absynth> (followed by the osdnum)
[14:04] <Kioob`Taff> no, it's like the noout flag, it's common to all osd
[14:04] <absynth> oh okay
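A sketch of throttling recovery as described above, either live via injectargs (per OSD) or persistently in ceph.conf:

    ceph tell osd.0 injectargs '--osd-recovery-max-active 0'   # the value suggested above
    # ceph.conf equivalent, applied on restart:
    [osd]
        osd recovery max active = 0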
[14:06] * Muhlemmer (~kvirc@cable-88-137.zeelandnet.nl) has joined #ceph
[14:06] <Kioob`Taff> I still have slow requests.... but they seem to concern only 3 OSDs
[14:07] <Kioob`Taff> I tried to restart those OSD too, without any success
[14:10] * jimyeh (~Adium@112.104.142.211) has joined #ceph
[14:12] <joao> hey absynth, haven't seen you in a while... did something break? :)
[14:12] <Muhlemmer> g'day
[14:13] <Muhlemmer> question on the network configuration docs
[14:14] <Muhlemmer> I found some mailing list post pointing out it would be possible (but not necessary) to use a redundant cluster network
[14:15] <Muhlemmer> anybody know how to set the address / interface of the secondary network?
[14:16] * BillK (~BillK@124-169-231-135.dyn.iinet.net.au) has joined #ceph
[14:17] <janos> Muhlemmer: i haven't seen that, but i typically build networks with physical redundancy in them already
[14:19] <Muhlemmer> ok, I was imagining that my cluster side switch would fail and bring practically everything down
[14:21] <Muhlemmer> janos: any example configuration of a network with built-in redundancy?
[14:22] <janos> i always double up switches, use bonding to have multiple physical links
[14:25] <Muhlemmer> janos: aah, I never looked into that before, but some googling already put me on the right track. Thanks!
[14:26] <tnt> I also use LACP over two switches of a stack.
[14:26] <janos> yep
[14:26] <tnt> and as a bonus you get twice the bandwidth.
[14:30] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:30] <Muhlemmer> tnt: that is what I just read as well :) Go bonding!
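For the network questions above, a minimal sketch: the cluster-facing addresses come from the usual ceph.conf options, and link redundancy sits below ceph, e.g. an 802.3ad bond on Debian/Ubuntu (all addresses and interface names are placeholders):

    [global]
        public network  = 10.0.0.0/24    ; client and monitor traffic
        cluster network = 10.1.0.0/24    ; OSD replication and heartbeat traffic

    # /etc/network/interfaces fragment (needs ifenslave and LACP-capable switches)
    auto bond0
    iface bond0 inet static
        address 10.1.0.11
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode 802.3ad
        bond-miimon 100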
[14:43] * bergerx_ (~bekir@78.188.101.175) has joined #ceph
[14:44] * john_barbee__ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:47] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[14:48] * stxShadow (~jens@p4FECE7C9.dip0.t-ipconnect.de) has joined #ceph
[14:50] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:52] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:52] * john_barbee__ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[14:55] * john_barbee__ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:55] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[14:57] <Kioob`Taff> So.... since 0.61.1 (or since I have two dead OSDs ?), I have a lot of slow requests
[14:57] <tnt> with HEALTH_OK as status ?
[14:58] <Kioob`Taff> no
[14:58] <Kioob`Taff> but there is no recovery and no scrub running
[14:58] <stxShadow> Hi all .... we have problems with instantly crashing mds's after upgrade to 0.56.6 .... anything known about that?
[15:01] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[15:01] <stxShadow> we don't use cephfs yet .... so no real problem .... but maybe interesting for bug hunting ?
[15:02] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[15:02] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:06] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:10] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[15:11] * markbby (~Adium@168.94.245.4) has joined #ceph
[15:14] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:17] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Quit: Ex-Chat)
[15:19] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:25] * kyann (~oftc-webi@tui75-3-88-168-236-26.fbx.proxad.net) Quit (Quit: Page closed)
[15:27] <via> joao: ping
[15:27] <joao> pong
[15:28] <via> you replied to my post in ticket 4974 asking about logs
[15:28] <via> what do you need specifically?
[15:30] <joao> ideally, the logs prior to the 'duplicate gv' assert with debug mon 20
[15:30] <via> prior?
[15:30] <via> brb, i'll respond soon
[15:30] <joao> if that's not possible, whatever you have before that assert being triggered
[15:35] <Kioob`Taff> tnt: so, if I understand correctly, a problem on a pool (not enough replicas) can hang all OSDs ?
[15:35] <tnt> in theory ... no.
[15:35] <tnt> but heh ... version number is not 0.xxx for nothing :p
[15:37] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[15:39] <Kioob`Taff> ok ;)
[15:39] <Kioob`Taff> and so... what can I do to fix that ?
[15:40] <Kioob`Taff> the missing OSD is definitly lost
[15:42] * john_barbee__ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:45] * portante|afk is now known as portante
[15:58] * fridadud (~oftc-webi@fw-office.allied-internet.ag) Quit (Remote host closed the connection)
[15:58] * rustam (~rustam@94.15.91.30) has joined #ceph
[15:58] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:59] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[15:59] * jeffv (~jeffv@2607:fad0:32:a02:5b2:3634:4079:d171) has joined #ceph
[16:07] * Husky (~sam@huskeh.net) has joined #ceph
[16:08] <Husky> Afternoon. Quick question. What sort of horsepower do we need for the monitor nodes of a ceph cluster?
[16:09] <stxShadow> we use E3-1220 + 8 GB
[16:09] <matt_> Husky, a single core and a GB of ram will usually do it. I think the problem with memory usage getting out of control is all sorted
[16:09] <stxShadow> works without problems
[16:09] <matt_> mine is sitting at ~15% of one core and 70M with 100 OSDs
[16:10] <Husky> We were thinking about using atom boxes for the monitors. We were also going to be using dual e5s for the OSDs as we're intending on running QEMU/KVM on top of them
[16:10] <stxShadow> ok ;) we only use 28 OSDs
[16:10] <stxShadow> 33 M -> 3 %
[16:11] * KindTwo (KindOne@50.96.231.63) has joined #ceph
[16:11] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:11] <stxShadow> OSD + KVM on the same host works nice
[16:11] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:11] <Husky> that's the plan :)
[16:11] <stxShadow> but you need lot of ram
[16:11] * KindTwo is now known as KindOne
[16:11] <Husky> dual e5 + 128GB :)
[16:11] <janos> thankfully ram is cheeeeeeap
[16:12] <stxShadow> we are on amd -> 32 Cores at 2,4 Ghz each
[16:12] <stxShadow> 256 GB Ram
[16:12] <matt_> Husky, I've got the same setup and each osd is currently at 350-500MB of ram, with 4 osds per kvm host
[16:13] <stxShadow> 2093 root 20 0 2700m 776m 4772 S 82 0.6 2768:13 ceph-osd
[16:13] <stxShadow> 2287 root 20 0 2314m 581m 4964 S 77 0.5 1940:13 ceph-osd
[16:13] <stxShadow> 1896 root 20 0 2382m 623m 5016 S 35 0.5 1575:34 ceph-osd
[16:13] <Husky> we were going with 4 disks per "node" with an SSD for the OS, does that count as 4 OSDs?
[16:13] <stxShadow> little more here
[16:13] <Husky> or is that one
[16:13] <Husky> Just trying to understand the structure of ceph
[16:13] <matt_> Husky, depends if you have an OSD per disk or not. Most people generally do
[16:13] <stxShadow> you will need something fast for the journaling
[16:13] <Husky> I have no idea, still wrapping my head around the docs
[16:14] <matt_> stxShadow, how many PG's do you have?
[16:14] <Husky> we were going to use the RBD block devices for KVM usage
[16:14] <stxShadow> 2013-05-10 16:14:13.382731 mon.0 [INF] pgmap v23840311: 9032 pgs: 9032 active+clean;
[16:14] <Husky> Just trying to understand how that works out
[16:15] <matt_> stxShadow, thanks! I'm only using 5500 pg's which explains the memory difference
[16:16] <Husky> we're also planning on using infiniband as our backend for the OSD nodes using IPoIB, would we need to hook the monitors up to infiniband too or does data shift between the data nodes themselves?
[16:16] <stxShadow> one issue is: reweighting affects VM speed
[16:16] <stxShadow> Husky: no need for that ... we use 10 GE for OSDs
[16:16] <stxShadow> and 1 GE for Mon
[16:17] <Husky> ok so just to recap for my benefit, we'd be fine using 1gbe for the monitors
[16:17] <stxShadow> we also tried 10 GE for the mons .... but no performance increase
[16:17] <Husky> we'd still use infiniband for our OSDs but that's because we have shit loads of IB equipment in our cage and 10GbE is damn expensive
[16:17] <janos> Husky: a config option is to set up a "cluster" network for the OSD's to talk to each other
[16:19] <stxShadow> peak bandwidth for our mons: 15 Mbit .... normal usage: less than 5 Mbit
[16:19] <joao> via, still around?
[16:26] <via> joao: yeah, i'm back
[16:27] <via> so, i don't keep mon debug at 20
[16:27] <via> typically
[16:27] <via> so nothing prior to the update i have
[16:27] <joao> via, do you still have the original store around?
[16:27] <via> yes
[16:27] <joao> mind going through some quick debugging?
[16:27] <via> sure
[16:28] <via> i can also make available to you the store directory
[16:28] <joao> oh, that would be nice
[16:28] <via> does it contain private data?
[16:28] <joao> if you're okay with it, that would be much appreciated
[16:29] <joao> via, mostly ips and sorts
[16:29] <via> ok
[16:29] <joao> on the maps
[16:29] <joao> some stats, but I can't recall any private data being kept there
[16:29] * gmason (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[16:30] <joao> oh, there's the keyring and auth stuff too
[16:31] <joao> but if you're more comfortable just going through some interactive debugging, that's fine with me :)
[16:31] <Husky> If I want to do KVM + OSD do I still need to use the network rbd thing in the KVM configs? or can I just point the KVM config at the block device?
[16:32] <via> janos: http://mirror.ece.vt.edu/pub/oldstore.db.tar.gz
[16:33] <via> i can do both
[16:33] <via> but i'm at work so i'll be in and out
[16:33] <janos> i think you pinged the wrong person ;)
[16:33] <janos> though i like vt.edu
[16:33] <via> oh, yes i did
[16:33] <via> <_<
[16:33] <via> joao: ^
[16:34] <janos> i hung out there a number of years years ago
[16:34] <janos> after i graduated
[16:34] <via> from vt?
[16:34] <janos> graduated elsewhere, but lived in bburg for a while
[16:34] <via> cool
[16:34] <janos> had a ton of friends who were there
[16:34] <via> yeah, i graduated but still live in bburg
[16:34] <janos> wife graduated from there
[16:34] <via> its a great town
[16:34] <joao> err
[16:34] <janos> haha i bet my pullup bar is still there
[16:34] <joao> via, it's giving me 403 Forbidden :\
[16:34] <via> oops
[16:35] <matt_> Husky, you want to use the native KVM rbd driver. There's a patch in the latest dev version which makes it nice and quick
[16:35] <janos> an apartment on main across from taco bell (if it's still there)
[16:35] <via> now try
[16:35] <Husky> matt_: Is it usable with libvirt?
[16:35] <janos> put up a pull-up bar and it's been there a decade even though they told me to remove it
[16:35] <via> janos: yeah, its still there
[16:35] <janos> hahaha
[16:35] <via> the taco bell
[16:35] <via> i don't know about the bar
[16:35] <janos> ok
[16:35] <Husky> We're going to be automating this with a control panel
[16:35] <joao> via, worked thanks :)
[16:35] <via> sure, thank you
[16:35] <janos> first floor apartment. check it out ;)
[16:35] <via> i will sometime
[16:35] <joao> aww
[16:35] <matt_> Husky, yep. RBD support has been around for a while and works perfectly
[16:36] <joao> via, I meant the old-format store :\
[16:36] <joao> sorry about that
[16:36] <Husky> would you mind pointing me towards an example config for using RBD with libvirt/KVM please
[16:36] <via> janos: er... is that not that?
[16:36] <joao> any chance you can grab that one for me instead?
[16:36] <joao> via, whatever is on mon data's directory
[16:36] <via> when i first started ceph after updating, it failed to convert
[16:36] <joao> those pesky 'osdmap/', 'mdsmap/' directories
[16:36] <via> oh
[16:36] <via> the whole thing
[16:36] <joao> yeah
[16:37] <matt_> Husky, It's all in the ceph wiki - http://ceph.com/docs/master/rbd/libvirt/#configuring-the-vm
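A rough sketch of the kind of disk definition that page walks through (pool/image name, monitor address and secret UUID are placeholders; the auth block only applies with cephx enabled):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='rbd' name='libvirt-pool/my-vm-disk'>
        <host name='10.0.0.1' port='6789'/>
      </source>
      <auth username='libvirt'>
        <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
      </auth>
      <target dev='vda' bus='virtio'/>
    </disk>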
[16:37] <joao> sorry, we call the file/dir-based store the 'old-format' vs the 'new leveldb format' :x
[16:37] <via> ok
[16:37] <via> this will take a while, its larger
[16:37] <joao> yeah, I was astonished you had only a 420KB store :p
[16:38] <Husky> matt_: Thanks
[16:39] <matt_> Husky, no worries. I have the exact same IB, Ceph, qemu setup that you're thinking of so if you want any tips just send me a msg. It's a friday night for me so I'm going back to watching tv :)
[16:39] <Husky> Will do. Your help is very much appreciated
[16:40] <via> joao: http://mirror.ece.vt.edu/pub/mon.0.tar.gz
[16:41] <joao> via, awesome, thanks :)
[16:42] <via> hopefully its more right this time
[16:42] <joao> just perfect
[16:42] <joao> thanks
[16:43] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[16:43] <via> joao: if you fetched it, can i safely remove it?
[16:44] <joao> via, yep
[16:44] <joao> go ahead
[16:44] <via> cool
[16:47] <Husky> How does CEPH handle a disk failure? is there replication of the data across other OSDs? or are pool replicas stored on the same OSD
[16:47] <via> it stores replicas on different osd's
[16:48] * via just lost a drive the other day and it recovered flawlessly
[16:48] <Husky> so if we set a pool to have 2 replicas it would guarantee that the 2nd replica would be on a different disk
[16:48] <via> i believe it depends on your crush map
[16:48] <via> but by default i believe so
[16:49] <Husky> ok, thanks
[16:56] <tnt> you just can't have two replicas on the same OSD.
[16:57] <tnt> but if you have several OSD on each server you can tell crush to distribute the replicas across servers to make sure you don't end up with both copies on the same server.
[16:57] <janos> Husky - it not only stores replicas on different OSDs, it stores them on different *failure domains*
[16:57] <janos> the default failure domain is host, but can be changed
[16:57] <Husky> thats awesome
[16:57] <janos> if i'm not mistaken
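That host-level separation comes from the CRUSH rule: the default rule of that era picks each replica under a different host bucket, via a step like the one below (excerpt from a typical decompiled crushmap):

    rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host   # one leaf (OSD) per distinct host
        step emit
    }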
[16:57] <Husky> we're going with 2 hosts to start with
[16:58] <Husky> each with 4 disks
[16:58] <janos> party
[16:58] <Husky> and 3 atom nodes for the monitors
[16:58] <janos> i need to check out low power but powerful-enough monitors
[16:58] <janos> i'd love to get some small machines in enclosures that can handle 2 physical machines
[16:59] <janos> for home at least
[16:59] <janos> yes, i have a cluster at home ;)
[17:00] <joao> eh, been trying to figure out an on-the-budget way to get one as well, but so far the biggest problem has been finding a location for it :p
[17:01] <janos> i work at home, so home-office helps
[17:01] <joao> last thing I want is having people complaining about noise at home
[17:01] <janos> still would love to dig out crawlspace more and rack up under the house ;)
[17:02] * yehuda__ (~yehuda@2602:306:330b:1410:18e9:d53a:acd8:6c3e) has joined #ceph
[17:05] <joao> one of the criteria I've been using when searching for a new place to rent has been having a big enough room to act as a server room/place to dump junk, far away enough from any bedroom so that nobody can complain about noise
[17:05] <joao> needless to say, this has been a futile search so far; must relax my requirements if I am to find an apartment in Lisbon :p
[17:06] <janos> all my hand-built machines are quiet. any time i rack up something like a Dell though (have a rack in my office) it gets LOUD
[17:06] <janos> thankfully i only do that when configuring things before taking them to the DC
[17:08] * yehuda_hm (~yehuda@2602:306:330b:1410:8595:8c8e:abfc:6738) Quit (Ping timeout: 480 seconds)
[17:08] * yehuda_ (~yehuda@2602:306:330b:1410:57:9565:bd48:71cb) Quit (Ping timeout: 480 seconds)
[17:10] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:21] * tnt (~tnt@91.177.240.165) has joined #ceph
[17:22] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[17:23] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:24] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Ping timeout: 480 seconds)
[17:28] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit ()
[17:29] * ShaunR (ShaunR@ip68-96-89-159.oc.oc.cox.net) has joined #ceph
[17:34] * stxShadow (~jens@p4FECE7C9.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[17:35] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[17:35] * fabioFVZ (~fabiofvz@213.187.20.119) Quit (Remote host closed the connection)
[17:47] * jimyeh (~Adium@112.104.142.211) has left #ceph
[17:48] <pioto> hm, i think maybe `ceph df` is showing the wrong units, at least for the GLOBAL stats
[17:48] <pioto> it seems to be showing MB instead of GB, in my case...
[17:49] <pioto> so, it seems like it's getting some number in KB, but treating it like it's B
[17:49] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:50] * Matt (matt@matt.netop.oftc.net) has left #ceph
[17:58] <kyle_> hello all. I'm having a couple small issues I'm hoping someone would have a moment to help me troubleshoot? Thanks in advance.
[18:03] <kyle_> ...i'm willing to send beer money. heh
[18:03] <joao> did I just read something about beer?
[18:04] <joao> kyle_, what's up?
[18:05] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) Quit (Quit: Leaving.)
[18:05] * jeffv (~jeffv@2607:fad0:32:a02:5b2:3634:4079:d171) Quit (Quit: Leaving.)
[18:05] <kyle_> joao: i use ubuntu 12.04 with the upgraded kernel. Two of my monitors will not update to 0.61. I'm just using the packages here. The two mds and osds were able to update. However at first a couple would not, then they just finally seemed to work after a day or two.
[18:05] * jeffv (~jeffv@2607:fad0:32:a02:5b2:3634:4079:d171) has joined #ceph
[18:06] <joao> what do you mean by "will not update"?
[18:06] <kyle_> joao: actually i have 3 osd. Also one of my monitors was able to upgrade.
[18:06] <kyle_> just using apt-get update && apt-get upgrade
[18:07] <kyle_> going from bobtail
[18:07] <joao> do the packages get installed?
[18:07] <joao> do you have the new versions on the system?
[18:07] <kyle_> i get something like this towards the end.
[18:07] <kyle_> Errors were encountered while processing:
[18:07] <kyle_> /var/cache/apt/archives/ceph-mds_0.56.6-1precise_amd64.deb
[18:08] <kyle_> i have bobtail already running on those two that will not upgrade
[18:08] <joao> hmm, this might be a question for glowell
[18:08] <kyle_> okay. yeah like i said i had six of the daemons upgrade already.
[18:08] <kyle_> seems weird two wont
[18:09] <kyle_> the other issue i am having is that my mds crash almost as soon as i start them
[18:09] <kyle_> but otherwise the cluster seems healthy
[18:10] <joao> 0.61 mds?
[18:10] <kyle_> i'm thinking that may be a symptom of the upgrade problem
[18:10] <kyle_> yeah
[18:10] <kyle_> the mds were upgraded no prob
[18:10] <kyle_> now running 0.61
[18:10] <joao> yeah, might be I guess
[18:10] <joao> you should probably look in the tracker for similar issues
[18:10] <kyle_> okay
[18:11] <joao> but you should definitely try getting the upgrade complete
[18:11] <kyle_> the weird part was that a couple had the same issue, but just worked the next day.
[18:12] <kyle_> RE: upgrading
[18:12] <glowell> kyle_ was there any additional info about the failure available ?
[18:12] <kyle_> dpkg: error processing /var/cache/apt/archives/ceph-mds_0.56.6-1precise_amd64.deb (--unpack):
[18:12] <kyle_> trying to overwrite '/etc/init/ceph-mds.conf', which is also in package ceph 0.56.4-1precise
[18:12] <kyle_> dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
[18:12] <kyle_> Processing triggers for ureadahead ...
[18:12] <kyle_> Errors were encountered while processing:
[18:12] <kyle_> /var/cache/apt/archives/ceph-mds_0.56.6-1precise_amd64.deb
[18:12] <kyle_> E: Sub-process /usr/bin/dpkg returned an error code (1)
[18:15] <glowell> I haven't seen this. I'll try a 56.4 -> 56.6 upgrade on my test machine and see if I can reproduce it.
[18:16] <tnt> kyle_: you need to do a apt-get install ceph and then apt-get upgrade
[18:16] <tnt> kyle_: I had the same thing because one file was moved from the 'ceph' package to the 'ceph-mds' package.
[18:17] <tnt> and since the 'ceph' package has new dependencies, it won't be upgraded by default using 'apt-get upgrade' only. And because the 'ceph' package doesn't force to upgrade the other 'ceph-*' packages to the same version, 'apt-get install ceph' isn't enough.
[18:18] <kyle_> yeah that worked, thank you. i guess it never crossed my mind to try that since it's currently running.
[18:19] <tnt> glowell: that's the issue I had and why I raised the 'ceph' dependency issue on the ML.
[18:20] <kyle_> actually it doesn't error out anymore but it also now says the package is newest version. However ceph -v says still 0.56.6. Even after the daemon was restarted
[18:21] <tnt> and you did both an apt-get install ceph _and_ apt-get upgrade ?
[18:22] <kyle_> yes i did apt-get install ceph then after complete i did apt-get update && apt-get upgrade
[18:22] <tnt> wait ... 0.56.6 is the version you were trying to install wasn't it ?
[18:22] <tnt> you pasted above '/var/cache/apt/archives/ceph-mds_0.56.6-1precise_amd64.deb'
[18:22] <kyle_> no it's what was already running. the package should up it to .61 now, which it has for the rest of the servers
[18:22] <kyle_> yeah that's the error it was giving. not sure why though
[18:23] <tnt> you probably were at 0.56.4 ?
[18:23] <glowell> tnt: There is an open bug, #4944, in tracker that is related to this problem. I'll be updating the status as I work on it.
[18:23] <kyle_> tnt: it's possible i suppose. I am pretty sure i was at bobtail from the gate though.
[18:23] <tnt> kyle_: check the /etc/apt/source.list.d/ceph (or similar) to see if you point to the right one.
[18:23] <kyle_> okay
[18:24] <tnt> all 0.56.x are 'bobtail'.
[18:24] <kyle_> i see
[18:24] <kyle_> my third monitor upgraded to .61 no prob though
[18:24] <kyle_> with the apt
[18:27] <kyle_> okay looks like that was the issue. was pointing to deb http://ceph.com/debian-bobtail/ precise main
[18:27] <kyle_> to where the others were deb http://ceph.com/debian/ precise main
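So the fix was simply pointing the repo at the intended release and re-running the upgrade; roughly (cuttlefish also had its own named repo at the time, and the exact package list may vary):

    # /etc/apt/sources.list.d/ceph.list
    deb http://ceph.com/debian-cuttlefish/ precise main
    # then
    apt-get update && apt-get install ceph ceph-common ceph-mds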
[18:27] <joao> via, still around?
[18:28] * alram (~alram@38.122.20.226) has joined #ceph
[18:29] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[18:31] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[18:32] * bergerx_ (~bekir@78.188.101.175) Quit (Quit: Leaving.)
[18:33] <via> janos: yeah
[18:33] <via> uh
[18:34] <via> joao: sorry
[18:34] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[18:34] <joao> via, do you happen to still have logs from mon.0 prior to upgrade, with debug mon >= 5 ?
[18:35] <via> if the debug level defaults to <5, no
[18:35] <joao> pity; thanks anyway :)
[18:36] <via> 1/ 5 mon
[18:36] <via> what does that mean?
[18:36] <via> 1?
[18:36] * jeffv (~jeffv@2607:fad0:32:a02:5b2:3634:4079:d171) Quit (Quit: Leaving.)
[18:37] <kyle_> joao, i still seem to have the issue where my mds instances crash as soon as i start them. Any idea where to begin looking for that?
[18:38] <kyle_> I have this in the log:
[18:38] <kyle_> 0> 2013-05-10 09:33:58.417669 7fb0f1035700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7fb0f1035700 time 2013-05-10 09:33:58.416955
[18:38] <kyle_> mds/journal.cc: 1408: FAILED assert(i == used_preallocated_ino)
[18:39] <joao> via, I think the 1 means the log level set; I'm not sure about the 5 though
[18:39] <via> ok
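For what it's worth, the two numbers in a setting like "1/5" are the level written to the log file and the level kept in memory (dumped on a crash); a sketch of raising monitor logging ahead of a problem, via ceph.conf:

    [mon]
        debug mon = 20      ; or "20/20" to set the file and in-memory levels explicitly
        debug paxos = 10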
[18:40] * jeffv (~jeffv@2607:fad0:32:a02:5b2:3634:4079:d171) has joined #ceph
[18:40] <joao> kyle_, never seen that one, and searching on tracker is bearing no results
[18:41] <joao> gregaf, any ideas about that assert?
[18:41] * jeffv (~jeffv@2607:fad0:32:a02:5b2:3634:4079:d171) Quit (Remote host closed the connection)
[18:41] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:41] <joao> kyle_, my advice would be to open a new bug report on the tracker
[18:41] * jeffv (~jeffv@2607:fad0:32:a02:5b2:3634:4079:d171) has joined #ceph
[18:41] * jeffv (~jeffv@2607:fad0:32:a02:5b2:3634:4079:d171) Quit (Remote host closed the connection)
[18:42] <kyle_> does this config look correct to have two mds, one being a hot standby? http://pastebin.com/aGA90vn9
[18:42] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[18:42] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:44] <matt_> joao, I tried to get sage a trace for the mon growth issue but my store is no longer growing. Only problem is now I can't turn the trace off... it keeps going after a mon restart
[18:44] <matt_> any ideas?
[18:45] <joao> matt_, let me take a quick look at the patches for the tracing and I'll get back to you
[18:45] <matt_> Thankyou! I appreciate it
[18:47] * jeffv (~jeffv@2607:fad0:32:a02:e158:566a:36c:ef60) has joined #ceph
[18:53] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:54] <joao> matt_, are you using the 'mon debug dump transactions' option?
[18:54] * Tamil (~tamil@38.122.20.226) has joined #ceph
[18:54] <matt_> I initially just used the command line option as per the bug report at - http://tracker.ceph.com/issues/4895
[18:54] <matt_> nothing in my conf file
[18:55] * tkensiski (~tkensiski@2600:1010:b018:4ca1:14ba:64b3:c86d:11a9) has joined #ceph
[18:55] <joao> matt_, afaict, there is absolutely no reason for it to continue logging after removing that option
[18:56] <joao> I'll take another look just in case I missed something
[18:56] <matt_> I've just set it to false on the conf file and I think that's fixed it. I'll wait a minute to make sure the trace timestamp doesn't change
[18:57] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[18:57] <matt_> joao, all fixed! Thank you for the tip :)
[18:58] <joao> ah, cool :)
[18:59] <matt_> oh wtf, now my store decides to grow?
[19:00] <joao> huh?
[19:00] <matt_> what is this witchcraft
[19:00] <joao> lol
[19:00] <matt_> it was a stable 800M-ish all day, now that the trace is gone it's growing again
[19:00] <joao> maybe all the logging is giving leveldb time to catch up with compaction and/or having time to trigger it by itself?
[19:01] <matt_> could very well be the case
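If leveldb compaction is indeed the issue, cuttlefish also grew knobs to force it; treat the exact names below as best-effort recollection rather than gospel:

    [mon]
        mon compact on start = true    ; compact the mon store each time the monitor starts

    # or, on a running monitor, if the command is present in your build:
    ceph tell mon.a compact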
[19:02] <n1md4> Hi. Very new to Ceph. I have 2 boxes for testing, with a single drive in each. I'm following this guide http://wiki.debian.org/OpenStackCephHowto and I now have an error which testing RBD backend, here http://pastie.org/pastes/7827588/text
[19:02] <n1md4> Could any one assist please?
[19:02] <matt_> n1md4, you appear to be using a very old version of ceph?
[19:03] <cjh_> loicd: i'd be happy to test your erasure coded ceph when you have something working
[19:03] * tkensiski (~tkensiski@2600:1010:b018:4ca1:14ba:64b3:c86d:11a9) Quit (Ping timeout: 480 seconds)
[19:03] <n1md4> matt_: likely; the ceph.com/debian/ repository is what the guide was written against at the time.
[19:03] <loicd> cjh_: great :-) I'm afraid this will take a while, don't hold your breath ;-)
[19:03] <matt_> n1md4, have a look see here - http://ceph.com/docs/master/install/debian/
[19:04] <matt_> n1md4, if it's just a test cluster it's probably better you start fresh with the latest version (0.61.1 cuttlefish)
[19:04] <n1md4> matt_: I think you're right.
[19:05] <n1md4> Thanks.
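For reference, the install docs for cuttlefish on Debian/Ubuntu boil down to roughly the following (key URL and repo line as documented at the time):

    wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
    echo deb http://ceph.com/debian-cuttlefish/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
    sudo apt-get update && sudo apt-get install ceph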
[19:05] <cjh_> loicd: understood :) yeah this is a huge coding effort
[19:06] * tkensiski (~tkensiski@22.sub-70-197-12.myvzw.com) has joined #ceph
[19:07] * tkensiski (~tkensiski@22.sub-70-197-12.myvzw.com) has left #ceph
[19:08] * gmason (~gmason@hpcc-fw.net.msu.edu) Quit (Ping timeout: 480 seconds)
[19:11] <joao> matt_, we could probably avoid the overhead by simply writing the size of the transaction's ops to the log, instead of writing the whole buffer
[19:12] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has left #ceph
[19:12] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[19:12] * ChanServ sets mode +o scuttlemonkey
[19:13] <matt_> joao, for the sake of helping you replicate the problem would my trace help anyway? It still should be logging all the transactions that cause the issue
[19:14] <joao> matt_, I think the problem may lie in the rate of operations and their size, so if we are impairing that in some way the trace *may* not be that useful, but it's certainly worth a try
[19:15] <matt_> joao, hmm... I'll update the bug anyway to keep Sage up to date and I'll have another go tomorrow
[19:16] <joao> matt_, thanks; drop us the log anyway, if you still have it, and we'll give it a try :)
[19:17] <matt_> joao, I deleted it thinking it was useless :( sorry
[19:18] <joao> np :)
[19:18] <matt_> I'll make another over the weekend so you can have a play on Monday
[19:19] <sagewk> matt_: you can change the path that it writes the dump to... maybe send it to another disk?
[19:20] <matt_> sagewk, it was logging to a different drive than the store. Both are on SSD's
[19:23] <sagewk> hrm :(
[19:25] * rustam (~rustam@94.15.91.30) has joined #ceph
[19:25] * rturk-away is now known as rturk
[19:26] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[19:30] * markbby (~Adium@168.94.245.4) Quit (Quit: Leaving.)
[19:31] * markbby (~Adium@168.94.245.4) has joined #ceph
[19:37] <matt_> I'm going to get some sleep. If you want me to try anything over the weekend just drop me a message, I have a bit of work to do so it's no trouble
[19:38] * LeaChim (~LeaChim@94.15.192.184) Quit (Ping timeout: 480 seconds)
[19:41] <joao> matt_, kay, thanks :)
[19:44] * diegows (~diegows@190.190.2.126) has joined #ceph
[19:46] * LeaChim (~LeaChim@176.250.188.136) has joined #ceph
[19:52] * lx0 is now known as lxo
[19:59] <paravoid> so the fact that restarting an OSD destabilized the cluster for 6-7 minutes is not expected, right?
[20:01] <tnt> paravoid: under 0.61 ?
[20:01] <paravoid> no
[20:01] <paravoid> bobtail
[20:02] <tnt> then, it's "normal". I had that trouble with bobtail and it got fixed in cuttlefish.
[20:02] <tnt> (at least in the test cluster, I didn't deploy to prod yet)
[20:02] <tnt> Now when you restart an OSD, it notifies the MON in advance so that it can redirect IO properly.
[20:03] * jeffv (~jeffv@2607:fad0:32:a02:e158:566a:36c:ef60) Quit (Ping timeout: 480 seconds)
[20:05] * markbby (~Adium@168.94.245.4) Quit (Quit: Leaving.)
[20:06] * markbby (~Adium@168.94.245.4) has joined #ceph
[20:08] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[20:10] <paravoid> that's not it
[20:10] <paravoid> peering just takes a long time
[20:10] <paravoid> esp. when the osd is started
[20:10] <paravoid> the stop phase is okayish
[20:11] <paravoid> I've always had that problem I guess :)
[20:15] <paravoid> 6 minutes from osd up -> finished peering
[20:15] <tnt> yes, the peering takes a while but that shouldn't have any impact.
[20:15] <tnt> (i.e. IO should continue)
[20:16] * esammy (~esamuels@host-2-103-103-135.as13285.net) has joined #ceph
[20:17] * jeffv (~jeffv@2607:fad0:32:a02:7520:90d4:d7a0:f06b) has joined #ceph
[20:17] <paravoid> 2013-05-10 18:14:32.230985 osd.131 [WRN] 6 slow requests, 1 included below; oldest blocked for > 356.710990 secs
[20:17] <paravoid> 2013-05-10 18:14:32.231002 osd.131 [WRN] slow request 240.348148 seconds old, received at 2013-05-10 18:10:31.882745: osd_op(client.27455.0:463652 .dir.10267.521 [call rgw.bucket_prepare_op] 3.82c464b) v4 currently waiting for degraded object
[20:18] <paravoid> so, no :)
[20:18] <tnt> yes, that's exactly the kind of stuff I had in bobtail and I don't have in cuttlefish.
[20:18] <paravoid> that's unrelated to the osd notifying mon on stop though?
[20:18] <tnt> that's because it's IO requests that were sent to the restarted osd while the mon still thought it was up even though it was restarted.
[20:19] <paravoid> nope, that's not the case here
[20:19] <paravoid> I've been wondering on whether cuttlefish is mature enough yet
[20:20] <paravoid> migrating is going to be a pain with that behavior :)
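A common way to soften the impact of a planned OSD restart, and to see where requests are stuck while it peers, looks roughly like this (the osd id and socket path are illustrative):

    ceph osd set noout                      # don't trigger recovery for a brief, planned outage
    service ceph restart osd.12             # restart just the one OSD
    ceph -w                                 # watch peering/recovery progress
    ceph health detail                      # lists any slow/blocked requests
    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_ops_in_flight
    ceph osd unset noout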
[20:20] <Elbandi_> where is ceph_0.61.1-1.dsc? i could not find at http://ceph.com/debian-cuttlefish/pool/main/c/ceph/
[20:21] <paravoid> indeed, not there
[20:24] <Elbandi_> good, then there's nothing wrong with my eyes
[20:24] <Elbandi_> :)
[20:26] * tnt_ (~tnt@91.177.214.32) has joined #ceph
[20:27] * tnt (~tnt@91.177.240.165) Quit (Ping timeout: 480 seconds)
[20:38] * BillK (~BillK@124-169-231-135.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[20:42] * Archie (~tra26@tux64-13.cs.drexel.edu) has joined #ceph
[20:44] <Archie> Question about the recommended kernels
[20:45] <Archie> If I am using libvirt to access ceph, does the host machine need an updated kernel? (currently on 3.2)
[20:48] <pioto> Archie: if you're using libvirt with the ceph config for qemu, instead of pointing it to /dev/rbd/..., then no, i don't think you need a newer kernel
[20:48] <pioto> (at least, i hope that's true :)
[20:49] <Archie> pioto: thanks, I really didn't want to debug an odd libvirt problem with the 3.4 kernel on precise
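That is the qemu/librbd path, which talks to the cluster from userspace and so doesn't depend on the host's rbd kernel module. A typical libvirt disk stanza for it looks like this (pool, image, and monitor address are placeholders; with cephx enabled an <auth> element is also needed):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='rbd' name='rbd/vm-disk-1'>
        <host name='10.0.0.1' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>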
[20:50] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[20:50] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[20:54] * Archie (~tra26@tux64-13.cs.drexel.edu) has left #ceph
[20:55] <saras> scuttlemonkey: syn
[20:59] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[21:07] * Tamil (~tamil@38.122.20.226) has joined #ceph
[21:09] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:14] * eschnou (~eschnou@175.93-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:20] <scuttlemonkey> wha?
[21:22] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[21:22] <dmick> TCP VIOLATION: syn sent, expected ack, got 'wha?'
[21:23] <elder> Stop violating
[21:23] <scuttlemonkey> I'm just that kinda app
[21:23] <scuttlemonkey> poorly written and non-conformist
[21:25] <saras> lol
[21:26] <saras> syn: the right reply is ack
[21:26] <saras> will syn ack
[21:27] <scuttlemonkey> http://goo.gl/83JoR
[21:27] <saras> scuttlemonkey: i need that
[21:27] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[21:28] <dmick> ack thttpd
[21:30] <saras> wow, it's been a while since I laughed that hard
[21:30] <saras> me spell good
[21:30] <scuttlemonkey> :)
[21:30] <dmick> "be generous in what you accept, snarky in what you send"
[21:30] <janos> aww man, Bill the Cat
[21:32] <saras> scuttlemonkey: anyway, what do you think about doing a Google Hangout On Air about devops?
[21:32] <scuttlemonkey> anything is possible
[21:33] <scuttlemonkey> although our marketing team has been putting together a bunch of webinars
[21:33] <scuttlemonkey> and that might fit into that slot a bit easier
[21:33] <saras> you have a better overview of all the devops tools than anyone I know of
[21:35] <saras> for the love of all that is good, don't use webex
[21:35] <scuttlemonkey> heheh
[21:36] <scuttlemonkey> http://www.brighttalk.com/webcasts/?q=Ceph
[21:38] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Remote host closed the connection)
[21:38] <saras> why brighttalk no do oauth arrrrrrg
[21:38] <saras> or open id
[21:39] <saras> thanks
[21:40] <scuttlemonkey> hehe
[21:40] <saras> scuttlemonkey: I've got to be honest, these should be on youtube
[21:41] <scuttlemonkey> as far as I know...they are
[21:41] <rturk> (http://www.youtube.com/inktankstorage)
[21:41] <scuttlemonkey> pretty sure they got ripped and added to the inktank channel
[21:41] <rturk> scuttlemonkey: sssh!
[21:41] <rturk> :)
[21:41] <scuttlemonkey> hehe
[21:44] <saras> crap, more stuff keeping me from getting my homework done
[21:49] <saras> scuttlemonkey: I would love to kick off a podcast using Mumble or whatever you call one of those Hangouts On Air things
[21:49] <saras> i don't think that fits the RFC for a podcast
[21:54] * wschulze (~wschulze@38.105.245.211) has joined #ceph
[21:54] <saras> scuttlemonkey: would you be up for something like this?
[21:55] <scuttlemonkey> saras: don't know what Mumble is
[21:56] <scuttlemonkey> but if you want to start a podcast/hangout of some sort you are more than welcome
[21:57] <saras> scuttlemonkey: so what is a good time for you?
[21:58] <saras> http://mumble.sourceforge.net/
[22:00] <scuttlemonkey> oh, just a mumble server?
[22:00] <scuttlemonkey> I'm probably not the best guy for a podcast interview honestly
[22:00] <saras> scuttlemonkey: i was thinking more co host
[22:00] <scuttlemonkey> oh, hehe
[22:00] <scuttlemonkey> dunno if I'd have the time
[22:01] <scuttlemonkey> lemme mull it over a bit
[22:01] <saras> thinking kinda like Going Linux, but for devops
[22:01] <saras> you have a clue and I'm the dummy who asks the stupid questions
[22:02] <scuttlemonkey> hah
[22:02] <saras> you got my email, right?
[22:03] <scuttlemonkey> nod
[22:03] <saras> kevin.cloinger@gmail.com
[22:03] <saras> or saras@itch.com
[22:04] <scuttlemonkey> yeah
[22:04] <saras> scuttlemonkey: if you like, give me a call
[22:05] <saras> after you have had time to do your mulling
[22:05] <scuttlemonkey> k
[22:05] <saras> rst
[22:05] <saras> oops, fin
[22:06] * rustam (~rustam@94.15.91.30) has joined #ceph
[22:06] <saras> wow it nice tcp drives i would screw up all the time
[22:07] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[22:08] * themgt (~themgt@24-177-232-33.dhcp.gnvl.sc.charter.com) has joined #ceph
[22:14] * rustam (~rustam@94.15.91.30) has joined #ceph
[22:15] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Read error: Connection reset by peer)
[22:16] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[22:17] * brian_appscale (~brian@wsip-72-215-161-77.sb.sd.cox.net) Quit (Quit: brian_appscale)
[22:22] * wschulze (~wschulze@38.105.245.211) Quit (Quit: Leaving.)
[22:22] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:27] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:33] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[22:49] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[22:49] * wschulze (~wschulze@38.98.115.249) has joined #ceph
[23:01] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:01] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:04] * markbby (~Adium@168.94.245.4) Quit (Quit: Leaving.)
[23:18] * eschnou (~eschnou@175.93-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:24] * yehuda__ (~yehuda@2602:306:330b:1410:18e9:d53a:acd8:6c3e) Quit (Ping timeout: 480 seconds)
[23:25] * yehuda__ (~yehuda@2602:306:330b:1410:18e9:d53a:acd8:6c3e) has joined #ceph
[23:28] * wschulze (~wschulze@38.98.115.249) Quit (Quit: Leaving.)
[23:35] * jeffv (~jeffv@2607:fad0:32:a02:7520:90d4:d7a0:f06b) has left #ceph
[23:43] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:43] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:49] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Ping timeout: 480 seconds)
[23:56] * davidzlap1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[23:56] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Read error: Connection reset by peer)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.