#ceph IRC Log

IRC Log for 2013-01-21

Timestamps are in GMT/BST.

[0:00] * xiaoxi (~xiaoxiche@jfdmzpr06-ext.jf.intel.com) Quit (Ping timeout: 480 seconds)
[0:09] * dosaboy (~gizmo@12.231.120.253) Quit (Quit: Leaving.)
[0:15] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[0:24] * ebo^ (~ebo@233.195.116.85.in-addr.arpa.manitu.net) Quit (Ping timeout: 480 seconds)
[0:38] <sage> noob21: not io
[0:39] <sage> noob21: it would very slightly slow down the rate of updating cluster state (marking osds up/down), but the mon cluster has to get pretty big before that's measurable.
[0:40] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[0:46] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[0:49] * sjustlaptop (~sam@mbb0436d0.tmodns.net) has joined #ceph
[0:49] * Cube (~Cube@pool-71-108-128-153.lsanca.dsl-w.verizon.net) has joined #ceph
[0:50] <noob21> thanks sage :)
[0:51] <noob21> ok i think i'll throw a few more monitors in just for safety
[0:51] * Cube1 (~Cube@184.253.155.121) has joined #ceph
[0:54] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[0:55] <noob21> i'm on 0.56-1precise and i haven't noticed any problems with it. should i bother upgrading to 0.56.1-1precise?
[0:57] * Cube (~Cube@pool-71-108-128-153.lsanca.dsl-w.verizon.net) Quit (Ping timeout: 480 seconds)
[0:58] <noob21> i'm assuming i can upgrade the cluster while it's under load right?
[1:02] * sjustlaptop (~sam@mbb0436d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[1:05] * ssedov (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[1:06] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[1:07] * noob21 (~noob2@ext.cscinfo.com) Quit (Quit: Leaving.)
[1:08] <sage> noob21: 0.56 has broken compatibility with prior and newer versions, so you should definitely upgrade.. but because of that version-specific bug the clients (librados, librbd, radosgw) need to be upgraded at the same time
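
For reference, the in-place upgrade sage describes would look something like the following on each precise node (a sketch assuming the ceph.com apt repository is already configured; package names match the Ubuntu packaging of the time):

    sudo apt-get update
    sudo apt-get install ceph ceph-common librados2 librbd1 radosgw
    sudo service ceph restart    # restarts the mon/osd/mds daemons on this host
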
[1:08] <sage> jksm: do you have the core file? the 'thread apply all bt' output would be helpful!
[1:09] * stass (stas@ssh.deglitch.com) has joined #ceph
[1:11] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[1:15] * Cube (~Cube@184.253.219.166) has joined #ceph
[1:22] * Cube2 (~Cube@184.251.208.136) has joined #ceph
[1:22] * Cube1 (~Cube@184.253.155.121) Quit (Ping timeout: 480 seconds)
[1:23] * dosaboy (~gizmo@12.231.120.253) has joined #ceph
[1:24] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[1:24] * ScOut3R (~ScOut3R@dsl51B61EED.pool.t-online.hu) Quit (Remote host closed the connection)
[1:25] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Operation timed out)
[1:28] * Cube (~Cube@184.253.219.166) Quit (Ping timeout: 480 seconds)
[1:32] * webbber (~me@219.85.215.162) Quit (Remote host closed the connection)
[1:33] * xiaoxi (~xiaoxiche@134.134.137.71) has joined #ceph
[1:36] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[1:54] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[1:56] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[1:58] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:05] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[2:06] * gohko (~gohko@natter.interq.or.jp) Quit (Quit: Leaving...)
[2:08] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[2:08] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:09] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[2:09] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:10] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:16] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:16] * loicd (~loic@2a01:e35:2eba:db10:f91f:e105:e759:9a2d) has joined #ceph
[2:20] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[2:22] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[2:23] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[2:26] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[2:26] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:38] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[2:38] * ChanServ sets mode +o scuttlemonkey
[2:39] * Cube (~Cube@184.251.84.141) has joined #ceph
[2:39] * Cube2 (~Cube@184.251.208.136) Quit (Ping timeout: 480 seconds)
[2:40] * dosaboy (~gizmo@12.231.120.253) Quit (Quit: Leaving.)
[2:45] * loicd (~loic@2a01:e35:2eba:db10:f91f:e105:e759:9a2d) Quit (Quit: Leaving.)
[3:53] * mattbenjamin (~matt@75.45.228.196) Quit (Quit: Leaving.)
[4:54] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) Quit (Quit: dty)
[6:06] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:24] * Cube (~Cube@184.251.84.141) Quit (Ping timeout: 480 seconds)
[6:57] * Cube (~Cube@184.251.243.43) has joined #ceph
[7:26] * The_Bishop (~bishop@2001:470:50b6:0:9424:9c99:2f7:1449) Quit (Read error: Connection reset by peer)
[7:29] * The_Bishop (~bishop@cable-89-16-157-34.cust.telecolumbus.net) has joined #ceph
[7:41] <jksM> sage, it crashed again with full debug in the log a few hours later yesterday... I have both the log and the core dump
[7:42] <sage> jksm: nice.. can you post the thread apply all bt output?
[7:43] <jksM> I'll do that
[7:50] <sage> fpaste.org or similar works..
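
The backtrace sage is asking for comes straight out of gdb; a non-interactive invocation over the core file might look like this (binary and core paths are illustrative):

    gdb -batch -ex 'thread apply all bt' /usr/bin/ceph-osd /path/to/core > bt.txt
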
[7:59] <jksM> sage, I have sent you logs and thread stack traces per mail now.. both on dropbox I'm afraid
[8:00] <sage> thanks
[8:00] <sage> lookin..
[8:00] <jksM> I'll not be on irc for the next ~10 hours, so if you need me to find or test something, please send me an email!
[8:02] <sage> jksm: still there?
[8:02] <jksM> yep
[8:02] <sage> i think the problem may just be that the transaction size is too big, and your objects are big, and unlink is slow.
[8:03] <jksM> hmm
[8:03] <sage> can you try doing 'osd target transaction size = 50' in your ceph.conf?
[8:03] <sage> default is currently 300, but that might just be too much.
[8:03] <jksM> how would my objects have become too big? - I haven't changed any defaults?
[8:03] * Cube1 (~Cube@184.251.186.218) has joined #ceph
[8:04] <jksM> I'll try that
[8:04] <sage> thanks!
[8:04] <jksM> do I set that on each individual osd that I want to try it on? - or must it go in the [osd] section?
[8:04] <sage> [osd]
[8:05] <jksM> do I need to restart other things than the osd.2 that crashed in order to test it?
[8:05] <sage> nope
[8:06] <jksM> okay, restarting now - let's hope :)
[8:06] <sage> :)
[8:09] <jksM> the previous restart ran for 4 hours before crashing... so I guess we won't know until it has run for a full day or similar
[8:09] <sage> is there data rebalancing going on?
[8:09] <jksM> searching for "osd target transaction size" doesn't give me anything on google... is 50 in MB?
[8:09] * Cube (~Cube@184.251.243.43) Quit (Ping timeout: 480 seconds)
[8:10] <sage> operations
[8:10] <jksM> sage, yes, I added a new osd (which was the initial cause of problems)
[8:10] <sage> in this case, # of unlinks to put in a transaction
[8:10] <jksM> pgmap v969701: 1018 pgs: 902 active+clean, 4 active+remapped+wait_backfill, 10 active+degraded+wait_backfill, 28 active+recovery_wait, 7 active+remapped+backfilling, 5 active+degraded+backfilling, 37 active+degraded+remapped+wait_backfill, 16 incomplete, 7 active+degraded+remapped+backfilling, 1 active+clean+scrubbing+deep, 1 active+recovering; 2479 GB data, 4681 GB used, 6397 GB / 11652 GB avail; 171175/1304335 degraded (13.124%)
[8:10] <sage> got it. yeah try this and let me know how it goes
[8:10] <sage> cool
[8:11] <sage> ttyl
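
Put together, the change sage suggests is a ceph.conf entry plus a restart of the affected daemon (a sketch for the sysvinit packaging of the time; osd.2 is the daemon jksM was restarting):

    [osd]
        osd target transaction size = 50

    # then, on the host running the crashed daemon:
    sudo service ceph restart osd.2
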
[8:18] * BClarkIndy (~BClarkInd@c-67-162-124-225.hsd1.in.comcast.net) has joined #ceph
[8:20] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[8:21] * BClarkIndy (~BClarkInd@c-67-162-124-225.hsd1.in.comcast.net) Quit ()
[8:22] * Cube1 (~Cube@184.251.186.218) Quit (Ping timeout: 480 seconds)
[8:23] <jksM> ttyl, thanks again for the help!
[8:31] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[8:37] * andret (~andre@pcandre.nine.ch) has joined #ceph
[8:50] <xiaoxi> well...I am still trying to log something from kernel for the system auto-reset issue.....
[8:52] <xiaoxi> sage: any explanation about osd_target_transaction_size? How does this work?
[8:54] * LeaChim (~LeaChim@b0faf18a.bb.sky.com) has joined #ceph
[8:56] * ebo^ (~ebo@233.195.116.85.in-addr.arpa.manitu.net) has joined #ceph
[8:59] * sleinen (~Adium@2001:620:0:25:e4d2:2a12:cee2:dfd8) has joined #ceph
[9:02] * gregorg (~Greg@78.155.152.6) Quit (Quit: Quitte)
[9:07] * low (~low@188.165.111.2) has joined #ceph
[9:19] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:23] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:23] * ebo^ (~ebo@233.195.116.85.in-addr.arpa.manitu.net) Quit (Ping timeout: 480 seconds)
[9:28] <DJF5> hello
[9:29] <DJF5> is it a known issue that the documentation seems to be offline? http://ceph.com/docs/master/ > THIS PAGE DOESN'T EXIST. (404)
[9:32] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Never put off till tomorrow, what you can do the day after tomorrow)
[9:41] * leseb (~leseb@193.172.124.196) has joined #ceph
[9:44] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[9:45] <Kioob`Taff> Hi
[9:45] <Kioob`Taff> I have IO errors on an RBD volume (end_request: I/O error, dev xvda, sector 318939096)
[9:45] <Kioob`Taff> how can I find where it's stored?
[9:53] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[9:53] * verwilst (~verwilst@d528F423B.access.telenet.be) has joined #ceph
[9:54] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:54] <tnt> when ceph says " 61/429242 degraded" what is the unit exactly ? It's definitely not PGs because I don't have that many ...
[10:02] * Active2 (~matthijs@callisto.vps.ar-ix.net) has joined #ceph
[10:09] * ebo^ (~ebo@icg1104.icg.kfa-juelich.de) has joined #ceph
[10:14] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[10:23] <Active2> the documentation link on ceph.com seems to be broken...
[10:23] <Active2> i'll get a 404
[10:23] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[10:24] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:27] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:28] <DJF5> Active2: you can read the docs @ https://github.com/ceph/ceph/tree/master/doc as a quickfix :)
[10:29] <Active2> DJF5: thanks!
[10:30] <DJF5> it's a bit more cumbersome, but it works...
[10:37] * Morg (d4438402@ircip2.mibbit.com) has joined #ceph
[10:40] * pixel (~pixel@188.72.65.190) has joined #ceph
[10:45] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[10:47] * benner_ (~benner@193.200.124.63) has joined #ceph
[10:48] * benner (~benner@193.200.124.63) Quit (Read error: Connection reset by peer)
[10:49] <topro> hi there, ceph docs down?
[10:51] <DJF5> 10:28:40 < DJF5> Active2: you can read the docs @ https://github.com/ceph/ceph/tree/master/doc as a quickfix :)
[10:51] <topro> DJF5: thanks
[10:51] <DJF5> 10:30:40 < DJF5> it's a bit more cumbersome, but it works...
[10:51] <DJF5> ;-)
[10:51] <topro> oh, i see ;)
[10:52] <topro> should be fine
[10:53] <topro> anyway I'm trying to figure out whether only osds use the cluster network, because in my test setup with three hosts only osds use the cluster network; mons and mds use the public network to connect with each other
[10:55] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[10:57] * sleinen (~Adium@2001:620:0:25:e4d2:2a12:cee2:dfd8) Quit (Quit: Leaving.)
[10:57] * sleinen (~Adium@130.59.94.220) has joined #ceph
[11:01] <topro> http://paste.debian.net/227005/ thats what my ceph.conf is looking like
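
topro's observation matches how the two options behave: only osds bind to the cluster network (for replication and heartbeat traffic); mons and the mds talk over the public network. A minimal [global] split might look like this, with illustrative subnets:

    [global]
        public network  = 192.168.1.0/24
        cluster network = 10.10.10.0/24
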
[11:02] * sleinen1 (~Adium@130.59.94.220) has joined #ceph
[11:03] * joao (~JL@89-181-156-120.net.novis.pt) has joined #ceph
[11:03] * ChanServ sets mode +o joao
[11:03] <joao> good morning #ceph
[11:05] * sleinen2 (~Adium@2001:620:0:26:bd81:6bd1:2185:257) has joined #ceph
[11:05] * sleinen (~Adium@130.59.94.220) Quit (Ping timeout: 480 seconds)
[11:07] * sleinen2 (~Adium@2001:620:0:26:bd81:6bd1:2185:257) Quit ()
[11:07] * sleinen (~Adium@130.59.94.220) has joined #ceph
[11:07] * sleinen1 (~Adium@130.59.94.220) Quit (Read error: Connection reset by peer)
[11:15] * sleinen (~Adium@130.59.94.220) Quit (Ping timeout: 480 seconds)
[11:17] * jantje_ (~jan@paranoid.nl) has joined #ceph
[11:22] * jantje (~jan@paranoid.nl) Quit (Ping timeout: 480 seconds)
[11:30] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:35] <DJF5> is anyone here using the admin part of radosgw?
[11:36] * tnt isn't yet, but plans to :p
[11:37] <DJF5> according to the documentation, you can use the S3 api, and provide more custom calls to the admin functionality, if you configure /admin/
[11:37] <tnt> yes
[11:37] <DJF5> but i can find no documentation (maybe i'm blind, it's monday morning for me) on where you can set that
[11:38] <tnt> well, it should just be there I think. No ?
[11:38] <tnt> what version are you using ?
[11:38] <DJF5> I think that i don't have the proper privileges to make those calls with my user, but i have no idea where to set them
[11:38] <DJF5> ceph version 0.56.1
[11:39] <DJF5> I can make calls to the S3 part fine though
[11:40] <tnt> mm, the docs on the website is broken atm ...
[11:41] <tnt> yes, when I query /admin/usage I get {"Code":"AccessDenied"}
[11:41] <DJF5> I am experiencing the same results
[11:42] <DJF5> the docs are available on https://github.com/ceph/ceph/tree/master/doc
[11:42] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[11:43] <DJF5> that's why i suspected authorization problems
[11:45] * MooingLemur (~troy@phx-pnap.pinchaser.com) Quit (Ping timeout: 480 seconds)
[11:55] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[11:59] * stiller (~Adium@095-097-060-070.static.chello.nl) has joined #ceph
[12:01] <stiller> Have a test server running with 24x3TB disks. Ceph reports 67TB available space, with 2x replication. Should available space not be about half of raw storage?
[12:02] <tnt> since you can set different replication for different pools, there is no way to take replication into account when computing available space.
[12:02] * sleinen (~Adium@130.59.94.220) has joined #ceph
[12:02] <tnt> so it just reports the raw space ... but if you store a 10G file on a 2x pool, the available will decrease by 20G.
[12:03] * cobz (~coby@p579D31F1.dip.t-dialin.net) has joined #ceph
[12:04] * sleinen1 (~Adium@2001:620:0:26:f9ad:d31c:bd79:339d) has joined #ceph
[12:04] <Kioob`Taff> the RAID feature of btrfs works like that too
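
The arithmetic behind stiller's numbers, spelled out (the overhead figure is inferred from the reported total):

    24 disks x 3 TB                  = 72 TB raw
    less filesystem overhead         ~ 67 TB reported as available
    usable on a 2x-replicated pool   ~ 67 / 2 = 33.5 TB of user data
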
[12:05] * sleinen (~Adium@130.59.94.220) Quit (Read error: Operation timed out)
[12:08] <stiller> Awesome, thanks!
[12:09] <cobz> Hi, i can't umount my ceph mount point - "device is busy". ceph cluster is up and running. I can't access the whole folder. Even lsof or fuser hang. Any hint? This happens during an iozone run.
[12:10] <pixel> Hi everybody, does anybody know where is main documentation page ?
[12:11] <cobz> by the way the system load is > 13 without any "real" usage by processes
[12:11] <stiller> Documentation is available on the EU mirror: http://eu.ceph.com/docs/master/start/
[12:12] <pixel> stiller Thanks!
[12:12] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:13] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:14] * stiller1 (~Adium@095-097-060-070.static.chello.nl) has joined #ceph
[12:16] * stiller2 (~Adium@095-097-060-070.static.chello.nl) has joined #ceph
[12:18] * rj175 (~richard@host109-154-219-154.range109-154.btcentralplus.com) has joined #ceph
[12:18] <rj175> hey, anyone else having an issue with the ceph documentation page?
[12:19] * stiller3 (~Adium@095-097-060-070.static.chello.nl) has joined #ceph
[12:19] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[12:19] <pixel> yep, use this one http://eu.ceph.com/docs/master/start/
[12:19] * stiller (~Adium@095-097-060-070.static.chello.nl) Quit (Ping timeout: 480 seconds)
[12:22] * stiller1 (~Adium@095-097-060-070.static.chello.nl) Quit (Ping timeout: 480 seconds)
[12:22] <rj175> magic cheers!
[12:24] * stiller2 (~Adium@095-097-060-070.static.chello.nl) Quit (Ping timeout: 480 seconds)
[12:27] * stiller3 (~Adium@095-097-060-070.static.chello.nl) Quit (Ping timeout: 480 seconds)
[12:28] <rj175> Has anyone ever used swift-bench to benchmark Ceph?
[12:30] * stiller (~Adium@2001:980:87b9:1:f5de:c125:3e0b:a4fc) has joined #ceph
[12:31] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[12:32] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:47] * michalefty (2504590a@ircip2.mibbit.com) has joined #ceph
[12:48] <Kioob`Taff> in 0.56.1 announce, I see : « osd: fix large io requests when journal is in (non-default) aio mode »
[12:48] <Kioob`Taff> but I didn't find more info about that
[12:49] <Kioob`Taff> oh. I suppose it's this one : https://github.com/ceph/ceph/commit/de61932793c5791c770855e470e3b5b9ebb53dba
[12:49] * michalefty (2504590a@ircip2.mibbit.com) Quit ()
[12:51] <ebo^> is there any documentation on the meaning of the osd perfcounters?
[12:54] * michalefty (~micha@37-4-89-10-dynip.superkabel.de) has joined #ceph
[13:03] * loicd (~loic@2a01:e35:2eba:db10:f91f:e105:e759:9a2d) has joined #ceph
[13:17] * michalefty (~micha@37-4-89-10-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[13:22] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[13:35] * MooingLemur (~troy@phx-pnap.pinchaser.com) has joined #ceph
[13:45] * MooingLemur (~troy@phx-pnap.pinchaser.com) Quit (Ping timeout: 480 seconds)
[13:48] <tnt> DJF5: I managed to get /admin/usage working
[13:49] <tnt> DJF5: you have to give the user the permissions using something like: radosgw-admin caps add --uid=<user-uid> --caps="usage=read"
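
A quick way to confirm the grant took effect, with a hypothetical uid, is to dump the user record and check its "caps" list:

    radosgw-admin caps add --uid=johndoe --caps="usage=read"
    radosgw-admin user info --uid=johndoe   # the caps array should now list usage=read
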
[13:49] * xdeller (~xdeller@62.173.129.210) has joined #ceph
[13:54] * MooingLemur (~troy@phx-pnap.pinchaser.com) has joined #ceph
[13:54] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:55] <DJF5> tnt: ah thanks! I was thinking the same, but couldn't really find what caps were available
[13:55] <DJF5> o
[13:55] <DJF5> *i'll give it a shot
[13:55] <tnt> well, I went and read the source code :p
[13:55] <tnt> but in 0.56.1 it seems only /admin/usage is implemented, the rest is not there.
[13:58] <DJF5> hmm
[13:58] <DJF5> too bad
[14:00] <tnt> Yup, I would have liked a call to see the current bucket size ... usage isn't really that.
[14:00] <DJF5> I'm not too familiar with S3 itself though, but shouldn't that be in the S3 api itself?
[14:02] * MooingLemur (~troy@phx-pnap.pinchaser.com) Quit (Ping timeout: 480 seconds)
[14:03] <tnt> DJF5: unfortunately not ...
[14:04] <tnt> well, you can list the whole content meta-data and add them up yourself, but it's time consuming
[14:05] <DJF5> hmm, too bad... I just noticed the same
[14:06] * Cybje (~cybje@2a02:348:56:53bb::1) has joined #ceph
[14:08] * rj175 (~richard@host109-154-219-154.range109-154.btcentralplus.com) Quit (Quit: rj175)
[14:09] * rj175 (~richard@host109-154-219-154.range109-154.btcentralplus.com) has joined #ceph
[14:11] <DJF5> indeed a {"Code":"MethodNotAllowed"} on other calls
[14:15] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[14:15] * stiller1 (~Adium@2001:980:87b9:1:4123:6057:c45a:7531) has joined #ceph
[14:17] * cobz (~coby@p579D31F1.dip.t-dialin.net) Quit (Remote host closed the connection)
[14:21] * stiller (~Adium@2001:980:87b9:1:f5de:c125:3e0b:a4fc) Quit (Ping timeout: 480 seconds)
[14:27] <Cybje> where is the online Ceph documentation? everything is linking to http://ceph.com/docs/master/ but that doesn't work
[14:28] <pixel> http://eu.ceph.com/docs/master/start/
[14:28] <Cybje> thanks :)
[14:29] <pixel> np
[14:45] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) has joined #ceph
[14:49] * leseb_ (~leseb@2001:980:759b:1:c5f1:5d5d:82bd:a3e0) has joined #ceph
[14:49] * leseb (~leseb@193.172.124.196) Quit (Read error: Connection reset by peer)
[14:49] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:51] * MooingLemur (~troy@phx-pnap.pinchaser.com) has joined #ceph
[14:52] * pixel (~pixel@188.72.65.190) Quit (Quit: Ухожу я от вас (xchat 2.4.5 или старше))
[14:55] * michalefty (~micha@37-4-89-10-dynip.superkabel.de) has joined #ceph
[14:55] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:58] * Morg (d4438402@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:58] <rj175> Hello, I am using the S3 API with Ceph, I have a lot of images that I need to get into the system. I was going to write a script to do this for me. Does anyone have any other recommendations?
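
A script along the lines rj175 describes, using the boto library that was current at the time against a radosgw endpoint; credentials, hostname, bucket name, and the source directory are all placeholders:

    import os
    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='radosgw.example.com',         # the radosgw frontend
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    bucket = conn.create_bucket('images')   # returns the existing bucket if already created
    for name in os.listdir('/data/images'):
        key = bucket.new_key(name)
        key.set_contents_from_filename(os.path.join('/data/images', name))
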
[15:00] * Mark_ (~Mark@cpc5-mort6-2-0-cust902.croy.cable.virginmedia.com) has joined #ceph
[15:05] * gregorg (~Greg@78.155.152.6) has joined #ceph
[15:07] * sleinen1 (~Adium@2001:620:0:26:f9ad:d31c:bd79:339d) Quit (Ping timeout: 480 seconds)
[15:09] <nhm> good morning #ceph
[15:13] <absynth_47215> morning, nhm
[15:13] <absynth_47215> enjoying your holiday?
[15:16] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[15:20] * michalefty (~micha@37-4-89-10-dynip.superkabel.de) Quit (Remote host closed the connection)
[15:21] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:27] * michalefty (~micha@37-4-89-10-dynip.superkabel.de) has joined #ceph
[15:36] <Mark_> Hi, wondering if someone can help me with ceph installation on Ubuntu…..got radosgw working and can store docs but unable to then see those docs using cephFS - the directory is blank when I mount?? Thanks
[15:36] <paravoid> radosgw writes to different pools than cephfs I think
[15:36] <paravoid> radosgw writes into .rgw.buckets by default
[15:37] <paravoid> and has its own structure of storing files (object names, extended attributes)
[15:42] <fghaas> Mark_, what paravoid says. don't expect to access objects you stored via radosgw with cephfs. if you need a solution like that, glusterfs (with ufo) is the only I know of at this point
[15:59] * dosaboy (~gizmo@12.231.120.253) has joined #ceph
[16:00] * Cube (~Cube@184.255.198.88) has joined #ceph
[16:00] * dxd828 (~dxd828@host-78-151-102-80.as13285.net) has joined #ceph
[16:01] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:03] * michalefty (~micha@37-4-89-10-dynip.superkabel.de) Quit (Remote host closed the connection)
[16:10] * rj175 (~richard@host109-154-219-154.range109-154.btcentralplus.com) Quit (Quit: rj175)
[16:11] * vata (~vata@2607:fad8:4:6:9002:b4b0:f356:6335) has joined #ceph
[16:11] * sleinen (~Adium@130.59.94.220) has joined #ceph
[16:12] * sleinen1 (~Adium@2001:620:0:25:8412:e845:8258:ba61) has joined #ceph
[16:13] * rj175 (~richard@host109-154-219-154.range109-154.btcentralplus.com) has joined #ceph
[16:19] * sleinen (~Adium@130.59.94.220) Quit (Ping timeout: 480 seconds)
[16:34] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Quit: Leaving.)
[16:34] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[16:35] * loicd (~loic@2a01:e35:2eba:db10:f91f:e105:e759:9a2d) Quit (Quit: Leaving.)
[16:39] * sleinen1 (~Adium@2001:620:0:25:8412:e845:8258:ba61) Quit (Quit: Leaving.)
[16:39] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Read error: Connection reset by peer)
[16:40] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[16:42] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[16:43] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[16:45] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[16:48] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[16:50] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[16:55] * low (~low@188.165.111.2) Quit (Quit: Leaving)
[16:55] * rj175 (~richard@host109-154-219-154.range109-154.btcentralplus.com) Quit (Quit: rj175)
[16:57] * loicd (~loic@pha75-15-88-165-135-72.fbx.proxad.net) has joined #ceph
[17:01] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:05] * Cube (~Cube@184.255.198.88) Quit (Ping timeout: 480 seconds)
[17:05] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[17:06] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[17:09] * verwilst (~verwilst@d528F423B.access.telenet.be) Quit (Quit: Ex-Chat)
[17:17] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[17:18] * sunsetjunks (~sunsetjun@d213-102-30-23.cust.tele2.lv) has joined #ceph
[17:18] <sunsetjunks> hello! is it just me or documentation and quick start are broken on www.ceph.com? http://ceph.com/docs/master/
[17:22] <Cybje> sunsetjunks: I asked the same question today, it's available at http://eu.ceph.com/docs/master/start/
[17:23] <sunsetjunks> Cybje: great, thanks!
[17:24] * sunsetjunks (~sunsetjun@d213-102-30-23.cust.tele2.lv) Quit (Quit: sunsetjunks)
[17:28] * Cube (~Cube@184-231-59-87.pools.spcsdns.net) has joined #ceph
[17:29] * Mark_ (~Mark@cpc5-mort6-2-0-cust902.croy.cable.virginmedia.com) has left #ceph
[17:29] <sage> docs are back up
[17:31] <paravoid> hey sage
[17:31] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[17:31] <paravoid> added more PGs to my cluster, lots of them stuck in either peering, remapped or degraded
[17:31] <paravoid> no visible pattern
[17:31] <paravoid> all kinds of OSDs involved in those
[17:32] <sage> meaning you added a new pool?
[17:32] <paravoid> no
[17:32] <paravoid> sorry, added more OSDs
[17:32] <paravoid> typo
[17:32] * sleinen1 (~Adium@2001:620:0:26:a1a3:91ef:e078:a9ce) has joined #ceph
[17:33] <sage> 'ceph pg <pgid> query' on a peering one to see what is going on
[17:33] <paravoid> this is similar to http://tracker.newdream.net/issues/3747 that I previously reported
[17:35] <sage> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/ if you haven't seen it already
[17:35] <sage> for help diagnosing why individual pgs aren't peering etc
[17:35] <sage> the query command shoudl tell us what is going on
[17:37] * mattbenjamin (~matt@aa2.linuxbox.com) has left #ceph
[17:37] <paravoid> the peering pgs seem to all be with one specific OSD
[17:37] <paravoid> (54)
[17:38] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[17:38] <paravoid> anything specific you'd like from query?
[17:39] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[17:40] <paravoid> lots of
[17:40] <paravoid> 2013-01-21 16:39:54.201019 7f8456b65700 0 -- 10.64.32.10:6813/6827 >> 10.64.0.176:6817/22038 pipe(0x1db47900 sd=29 :32883 pgs=327007 cs=4 l=1).connect claims to be 10.64.0.176:6817/2447 not 10.64.0.176:6817/22038 - wrong node!
[17:40] <paravoid> 2013-01-21 16:39:59.476063 7f8448f8a700 0 -- 10.64.32.10:6813/6827 >> 10.64.0.176:6814/22672 pipe(0x294a1000 sd=4 :44777 pgs=61251 cs=4 l=1).connect claims to be 10.64.0.176:6814/20781 not 10.64.0.176:6814/22672 - wrong node!
[17:40] <paravoid> in the logs
[17:40] <paravoid> but I've been seeing this all over
[17:40] * tnt (~tnt@212-166-48-236.win.be) Quit (Read error: Operation timed out)
[17:40] * Zethrok (~martin@95.154.26.34) has joined #ceph
[17:41] * Cube1 (~Cube@184.255.25.118) has joined #ceph
[17:41] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[17:42] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[17:45] * Cube (~Cube@184-231-59-87.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[17:45] * sleinen2 (~Adium@2001:620:0:26:adc8:a586:f2ea:1672) has joined #ceph
[17:45] <nhm> absynth_47215: yes indeed. I got my morning exercise in and got to help my son play his curious george video game. :)
[17:46] <sage> paravoid: try stopping osd.54
[17:46] <paravoid> I did
[17:46] <paravoid> I restarted it
[17:46] <paravoid> now it's fine
[17:47] * Cube (~Cube@184.253.6.206) has joined #ceph
[17:47] <sage> the osdmap just prior to that restart would be interesting to compare to the error messages you saw
[17:47] <paravoid> I have one OSD out and resyncing now, so I can't really debug the active+degraded issue now
[17:47] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Read error: Operation timed out)
[17:49] * sleinen1 (~Adium@2001:620:0:26:a1a3:91ef:e078:a9ce) Quit (Ping timeout: 480 seconds)
[17:50] * Cube2 (~Cube@184.253.225.236) has joined #ceph
[17:51] * Cube1 (~Cube@184.255.25.118) Quit (Read error: Operation timed out)
[17:52] * loicd (~loic@pha75-15-88-165-135-72.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[17:53] * xdeller (~xdeller@62.173.129.210) Quit (Quit: Leaving)
[17:56] * leseb_ is now known as leseb
[17:56] * Cube (~Cube@184.253.6.206) Quit (Ping timeout: 480 seconds)
[17:59] <sage> paravoid: do you have that info? if you do 'ceph osd dump | grep osd.54' and look at the "down_at N" epoch, we want epoch N-1.. 'ceph osd dump (N-1)'
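
Spelled out with a made-up epoch number:

    ceph osd dump | grep osd.54     # suppose this shows "... down_at 1234 ..."
    ceph osd dump 1233              # then dump the map from epoch N-1 = 1233
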
[18:00] * tnt (~tnt@204.203-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:02] * Cube2 (~Cube@184.253.225.236) Quit (Ping timeout: 480 seconds)
[18:03] * ScOut3R (~ScOut3R@dsl51B61EED.pool.t-online.hu) has joined #ceph
[18:03] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[18:04] * leseb (~leseb@2001:980:759b:1:c5f1:5d5d:82bd:a3e0) Quit (Remote host closed the connection)
[18:05] * leseb (~leseb@2001:980:759b:1:c5f1:5d5d:82bd:a3e0) has joined #ceph
[18:06] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:08] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[18:13] * leseb (~leseb@2001:980:759b:1:c5f1:5d5d:82bd:a3e0) Quit (Ping timeout: 480 seconds)
[18:20] * ebo^ (~ebo@icg1104.icg.kfa-juelich.de) Quit (Quit: Verlassend)
[18:29] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[18:29] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[18:35] * jlogan (~Thunderbi@2600:c00:3010:1:ecc0:67c9:f071:2eb0) has joined #ceph
[18:38] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[18:38] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[18:47] <DJF5> Does anyone know if the admin API (as an extension of the S3 api in radosgw) only implements the 'admin/usage' endpoint? My testing and reading the source code (just interpreting a language i don't write) https://github.com/ceph/ceph/blob/master/src/rgw/rgw_main.cc:479 makes me think that i am correct. Can someone confirm this? Or is any progress planned?
[19:03] * Cube (~Cube@184-192-116-16.pools.spcsdns.net) has joined #ceph
[19:03] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) has joined #ceph
[19:03] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) Quit ()
[19:05] * ScOut3R (~ScOut3R@dsl51B61EED.pool.t-online.hu) Quit (Remote host closed the connection)
[19:07] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Why is the alphabet in that order? Is it because of that song?)
[19:11] <nhm> -29 degree windchill outside.
[19:12] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:13] * gaveen (~gaveen@112.135.148.237) has joined #ceph
[19:13] * gaveen (~gaveen@112.135.148.237) Quit ()
[19:13] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[19:15] <joao> nhm, -29C?
[19:15] * gaveen (~gaveen@112.135.148.237) has joined #ceph
[19:15] <joao> well, according to google it better be
[19:15] <joao> apparently -29F = -33.8C
[19:15] <elder> Sounds about right.
[19:16] <elder> Tomorrow's colder I think.
[19:17] <elder> Real temp tomorrow morning here is supposed to be -25 F
[19:17] <joao> and to think I've been complaining all week about how cold it is outside
[19:18] <elder> With the 16KM/hr wind that's supposed to feel like -33
[19:18] <joao> yikes
[19:18] <elder> Whoops, I meant -25 C and -33 C
[19:19] <elder> I have to boil some water and toss it in the air.
[19:20] <joao> regardless of the unit, that's about 35 degrees lower than my concept of 'cold'
[19:20] <elder> Well, we just stay inside...
[19:20] <joao> oh, that makes sense
[19:24] <dwm37> I recall that -40°C and -40°F are about the same..
[19:25] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[19:26] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[19:26] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[19:28] <jksM> sagelap, good news! - with the configuration change you suggested, my ceph system has returned to HEALTH_OK for the first time in several days ;-)
[19:28] <jksM> and all osds have been running without crashing for 12+ hours
[19:28] * noob21 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has joined #ceph
[19:29] * dosaboy (~gizmo@12.231.120.253) Quit (Quit: Leaving.)
[19:31] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[19:33] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:36] * noob2 (~noob2@ext.cscinfo.com) Quit (Ping timeout: 480 seconds)
[19:42] * ScOut3R (~ScOut3R@dsl51B61EED.pool.t-online.hu) has joined #ceph
[19:43] <janos> dang, fedora 18 out. man i'm behind
[19:50] <absynth_47215> fedora? i think i installed the first preview of fedora1...
[19:50] <noob21> do i need to use the rados gw api to figure out how much is being used by the gateway users or is there a command for that? i couldn't find it when i was looking
[19:51] <janos> absynth_47215: i've had pretty good fortunes with it since fedora 8'ish
[19:51] <absynth_47215> janos: i'm a debian user since 3 or so, and never really regretted it
[19:51] <janos> i started with debian. i have no idea why i left it
[19:52] <janos> just plain old don't recall
[19:52] <janos> but i get along with fedora really well
[19:52] * sjust-phone (~sjust@206.29.182.197) has joined #ceph
[19:52] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[19:52] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[19:53] * mikey (~mikey@catv-213-222-190-74.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[19:54] * Cube1 (~Cube@184.255.200.110) has joined #ceph
[19:57] <absynth_47215> debian has actuality issues (does that word exist?)
[19:57] <absynth_47215> debian squeeze has an 2.6.35 kernel or something
[19:57] <absynth_47215> even on wheezy (testing), i only have 3.2.0
[19:59] * Cube2 (~Cube@184.253.131.180) has joined #ceph
[19:59] <paravoid> squeeze has 2.6.32, wheezy has latest 3.2 (the 3.2 upstream maintainer is also the Debian maintainer)
[20:00] <absynth_47215> i have a wheezy box that has 3.2.0-4 and it is recent, afaict
[20:00] <paravoid> yes.
[20:00] <paravoid> so?
[20:01] * Cube (~Cube@184-192-116-16.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[20:02] <absynth_47215> that's not the latest 3.2 kernel, as far as i can tell from kernel.org
[20:02] * Cube (~Cube@184-231-96-12.pools.spcsdns.net) has joined #ceph
[20:03] <absynth_47215> kernel.org knows about 3.2.37, so either there's some seriously intransparent numbering on debian's side or their kernel is just old ;)
[20:04] * Cube1 (~Cube@184.255.200.110) Quit (Ping timeout: 480 seconds)
[20:04] <absynth_47215> (i rephrase intransparent to counterintuitive)
[20:05] * dxd828 (~dxd828@host-78-151-102-80.as13285.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[20:05] * dosaboy (~gizmo@12.231.120.253) has joined #ceph
[20:06] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:07] * Cube2 (~Cube@184.253.131.180) Quit (Ping timeout: 480 seconds)
[20:07] <joao> sage, around?
[20:07] * cdblack (86868b4c@ircip3.mibbit.com) has joined #ceph
[20:07] <cdblack> Elooo peeps
[20:07] <sstan> hi
[20:09] * Cube1 (~Cube@184.251.151.36) has joined #ceph
[20:09] <cdblack> Quick question: How can I get rid of "16 pgs degraded; 16 pgs stale; 16 pgs stuck stale; 22 pgs stuck unclean" in my 48 OSD cluster. I had a bunch of OSD failures over the holidays, it's OK if we lost some data just testing here. :-)
[20:09] <absynth_47215> joao: query
[20:09] <cdblack> I've tried to force scrubs to no avail
[20:09] <cdblack> still sitting at 4.78% degraded
[20:10] <absynth_47215> all OSDs up and in?
[20:10] <cdblack> yep, all up and all in
[20:10] <absynth_47215> hm, did you do something like "ceph osd pause" on one or more OSDs?
[20:10] <cdblack> most PGs were associated with OSD 45, which was a total disk failure
[20:11] <cdblack> no pauses issued from what I know
[20:12] <absynth_47215> so osd.45 is basically empty now?
[20:12] <cdblack> yep, replaced, formatted, re-initialized and re-joined
[20:13] * sagelap (~sage@76.89.177.113) Quit (Quit: Leaving.)
[20:14] <phantomcircuit> cdblack, you need to remove osd 45 from the cluster reinitializing like that probably made it worse
[20:14] * sleinen (~Adium@2001:620:0:26:adc8:a586:f2ea:1672) has joined #ceph
[20:14] <absynth_47215> i'd guess so too
[20:14] <absynth_47215> try taking it down
[20:14] * sleinen2 (~Adium@2001:620:0:26:adc8:a586:f2ea:1672) Quit (Read error: No route to host)
[20:14] <absynth_47215> just to see what happens
[20:14] <absynth_47215> (if it is an experimental cluster, nobody will be impacted, anyway)
[20:14] * Cube (~Cube@184-231-96-12.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[20:15] * ScOut3R (~ScOut3R@dsl51B61EED.pool.t-online.hu) Quit (Remote host closed the connection)
[20:16] <cdblack> so it was out and down, then we formatted, ran a 'ceph-osd --mkfs -i 45 --monmap /tmp/monmap', then started it up at which point it did its thing and went up & in... recovered from 15% down to 4.7 and then stopped
[20:16] <janos> did you ever remove the old from the crushmap?
[20:17] <cdblack> I took it down, and rebooted the box even with no luck, didn't touch the crushmap.... thinking about just dumping all the RBDs and reformatting with mkcephfs to start clean
[20:18] <janos> would be worth the time to mess with this scenario
[20:19] <cdblack> janos, you're suggesting I take 45 down and remove it from the map to see if that fixes us?
[20:19] <absynth_47215> first i would check which PGs are affected
[20:19] <absynth_47215> ceph pg dump |grep degraded
[20:19] <absynth_47215> should work
[20:20] <cdblack> got the list
[20:20] <absynth_47215> third but last column shows the OSDs that have this pg and the replica
[20:20] <cdblack> ceph pg dump_stuck stale - brings back a few more
[20:20] <noob21> any idea why the rados admin tool spits out json? i'm not sure what to use on the cmdline to parse that
[20:21] <absynth_47215> noob21: there's surely a CPAN module for that
[20:21] <absynth_47215> or you can use PHP ;)
[20:21] * dosaboy (~gizmo@12.231.120.253) Quit (Quit: Leaving.)
[20:21] <noob21> lol true
[20:21] * Cube (~Cube@184.255.250.218) has joined #ceph
[20:21] <absynth_47215> cdblack: are they all on osd.45?
[20:21] <noob21> i was going to write some python
[20:21] <phantomcircuit> cdblack, ceph osd lost 45
[20:21] <noob21> i just find it odd that it doesn't print out space delimited like other cmd line tools
[20:21] <phantomcircuit> tells ceph that osd 45 is a total loss
[20:21] <joao> noob21, 'import json' should do the trick then
[20:22] <noob21> haha
[20:22] <noob21> import easy :D
[20:22] <phantomcircuit> which should at least help get the placement groups to just accept their loss
[20:22] <joao> noob21, I'm serious :p
[20:22] <noob21> oh i know
[20:22] <noob21> i will
[20:22] <joao> we use it extensively on teuthology
[20:22] <noob21> 2 mins and i'll have a quicky tool to parse this
[20:23] <absynth_47215> and remember, denial, anger, bargaining, depression, _acceptance_
[20:23] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[20:23] * cdblack (86868b4c@ircip3.mibbit.com) has left #ceph
[20:24] * cdblack (86868b4c@ircip3.mibbit.com) has joined #ceph
[20:24] <cdblack> whoops, lost me chat
[20:24] <absynth_47215> 20:21:43 < phantomcircuit> cdblack, ceph osd lost 45
[20:25] <absynth_47215> 20:21:55 < phantomcircuit> tells ceph that osd 45 is a total loss
[20:25] <absynth_47215> did you get that?
[20:25] * Cube2 (~Cube@184-225-51-36.pools.spcsdns.net) has joined #ceph
[20:25] <cdblack> Looks like ceph pg dump_stuck stale lists that 45 has all of those, checking.....
[20:26] <cdblack> up and acting columns on ceph pg dump_stuck show all 45, time for that osd lost then eh?
[20:27] * Cube1 (~Cube@184.251.151.36) Quit (Ping timeout: 480 seconds)
[20:28] <cdblack> --yes-i-really-mean-it .... lol
[20:29] <cdblack> guess I have to down the osd 1st
[20:29] * Cube (~Cube@184.255.250.218) Quit (Ping timeout: 480 seconds)
[20:30] * Cube (~Cube@184-225-111-69.pools.spcsdns.net) has joined #ceph
[20:32] <joao> cdblack, if I had my way, we'd be putting math solving problems in front of the user before we'd let them be accepted by the monitors :p
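
For the record, the sequence converged on above, for an osd whose disk is gone for good (osd.45 as in the discussion; a sketch worth double-checking against the docs before running on data you care about):

    # stop the daemon, then tell the cluster its data is unrecoverable
    sudo service ceph stop osd.45
    ceph osd lost 45 --yes-i-really-mean-it

    # and, if the osd is being retired rather than rebuilt:
    ceph osd crush remove osd.45
    ceph auth del osd.45
    ceph osd rm 45
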
[20:32] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[20:37] * Cube2 (~Cube@184-225-51-36.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[20:41] * jlogan2 (~Thunderbi@72.5.59.176) has joined #ceph
[20:44] * Cube1 (~Cube@184.253.200.37) has joined #ceph
[20:45] * jlogan (~Thunderbi@2600:c00:3010:1:ecc0:67c9:f071:2eb0) Quit (Ping timeout: 480 seconds)
[20:49] <noob21> joao: how did you enumerate the json dictionary?
[20:49] <joao> for x in j: ? is that what you mean?
[20:50] <noob21> yeah but that only gives you summary and entries
[20:50] * Cube (~Cube@184-225-111-69.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[20:51] <joao> I'm a python newbie, but I would do just the same for j['summary'] and j['entries'] or something of the sort
[20:51] <noob21> ah i see, it's a nested dictionary
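
The sort of quick tool noob21 has in mind, assuming output shaped like the usage dump discussed above (a top-level dict with 'summary' and 'entries' keys; the exact fields inside each entry vary by command, so this sketch just walks whatever is there):

    import json
    import subprocess

    raw = subprocess.check_output(['radosgw-admin', 'usage', 'show'])
    usage = json.loads(raw)

    for entry in usage.get('entries', []):
        print(entry)                 # each entry is itself a nested dict
    print(usage.get('summary'))
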
[20:53] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[21:02] * dosaboy (~gizmo@12.231.120.253) has joined #ceph
[21:07] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[21:07] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has left #ceph
[21:14] * Cube1 is now known as Cube
[21:14] * nyeates (~nyeates@180.sub-70-192-195.myvzw.com) has joined #ceph
[21:17] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:17] * ScOut3R (~ScOut3R@dsl51B61EED.pool.t-online.hu) has joined #ceph
[21:17] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:19] * sjust-phone (~sjust@206.29.182.197) Quit (Ping timeout: 480 seconds)
[21:19] * loicd (~loic@magenta.dachary.org) Quit ()
[21:20] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[21:28] * Cube1 (~Cube@184-192-48-175.pools.spcsdns.net) has joined #ceph
[21:31] * Cube (~Cube@184.253.200.37) Quit (Ping timeout: 480 seconds)
[21:37] * dosaboy (~gizmo@12.231.120.253) Quit (Read error: Operation timed out)
[21:38] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[21:39] * Cube1 (~Cube@184-192-48-175.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[21:40] <sage> joao: here now
[21:40] * dosaboy (~gizmo@12.231.120.253) has joined #ceph
[21:40] <joao> hi there
[21:41] <joao> sage, I have the feeling we've gone through this a million times now, but is it the 'HEAD_VERSION' on the messages that bumps the message version and breaks compatibility?
[21:41] <sage> COMPAT_VERSION
[21:41] <sage> as long as you add new stuff ot the end, you can bump HEAD_VERSION and old code will ignore it.
[21:42] <sage> bump COMPAT_VERSION when that doesn't work, which makes old code complain (instead of crash)
[21:42] <joao> so to remove the gv stuff out of MMonPaxos and the likes, should it be compat?
[21:42] <sage> you can't remove anything without breaking compatibility
[21:42] <sage> only adding to the end is allowed
[21:43] <sage> iirc the plan is: wait until quorum all has the new feature. at that point, locally convert the store (using the gv info), and mark in the on-disk CompatSet that the feature is now required. from then on, old mons can't participate.
[21:44] <sage> when they upgrade, they should probably notice during probe that the quorum has the feature, do the local conversion, and then re-bootstrap()
[21:44] <joao> sage, given the single-paxos patches to MMonPaxos and MMonProbe, we're going to mess with those messages anyway, so I was expecting to just clean up the messages of gv stuff, bump a version and make sure this version does not work with previous versions
[21:44] <joao> oh, yeah, not sure that is going to work
[21:45] <joao> what I recall about the plan is to forget compatibility of the new code with the existing stores, and upgrade on the first run; if things don't work out, cleanly revert to an older version but requiring user intervention
[21:45] <sage> hmm. is that what our previous plan was, or did we have something else in mind?
[21:46] <sage> oh right. if you upgrade mon-by-mon, once you flip over to a majority it will continue on as before.
[21:46] <joao> yeah
[21:46] <sage> so yeah, screw compatibility. bump HEAD_VERSION, and set COMPAT_VERSION to the same value
[21:47] <joao> okay, cool
[21:47] <joao> thanks
[21:47] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[21:47] <joao> we'd probably be able to manage making the monitor compatible with an older version by feigning multiple paxos machines, but that would require some intervention
[21:48] <joao> it might be feasible though
[21:48] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:48] <joao> iff you guys think that's a good idea that is
[21:48] <sage> no, i think this is the way to go, at least for now.
[21:48] <joao> okay
[21:49] <joao> cool
[21:49] <sage> if we change our minds, we can just give the new MMonPaxos a new message id and it won't interfere with the old one at all.
[21:49] <sage> which might not be a bad idea anyhow.
[21:49] <joao> yeah
[21:50] <joao> well, going to have dinner now; bbiab
[21:53] * sleinen1 (~Adium@2001:620:0:26:cd05:b8a:f0bc:fb6b) has joined #ceph
[21:54] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[21:57] * Cube (~Cube@184-231-224-123.pools.spcsdns.net) has joined #ceph
[21:59] * dosaboy (~gizmo@12.231.120.253) Quit (Quit: Leaving.)
[21:59] * sleinen (~Adium@2001:620:0:26:adc8:a586:f2ea:1672) Quit (Ping timeout: 480 seconds)
[22:02] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[22:03] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[22:10] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[22:10] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[22:10] * nyeates (~nyeates@180.sub-70-192-195.myvzw.com) Quit (Quit: Zzzzzz)
[22:10] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[22:14] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[22:14] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[22:14] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[22:15] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[22:18] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:22] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[22:27] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:28] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[22:39] * BManojlovic (~steki@85.222.185.201) has joined #ceph
[22:39] * doubleg (~doubleg@69.167.130.11) Quit (Quit: Lost terminal)
[22:41] <paravoid> where is the last index for "ceph osd create" stored?
[22:41] <paravoid> i.e. how can I alter that?
[22:45] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:21] * sjust-phone (~sjust@206.29.182.148) has joined #ceph
[23:25] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[23:27] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit ()
[23:28] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[23:34] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:34] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[23:34] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:40] * sleinen1 (~Adium@2001:620:0:26:cd05:b8a:f0bc:fb6b) Quit (Quit: Leaving.)
[23:42] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[23:51] * sjust-phone (~sjust@206.29.182.148) Quit (Quit: Yaaic - Yet another Android IRC client - http://www.yaaic.org)
[23:56] * BManojlovic (~steki@85.222.185.201) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:58] * BManojlovic (~steki@85.222.185.201) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.