#ceph IRC Log

IRC Log for 2013-01-16

Timestamps are in GMT/BST.

[0:02] * scalability-junk (~stp@188-193-202-99-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[0:05] * sander (~chatzilla@c-174-62-162-253.hsd1.ct.comcast.net) Quit (Ping timeout: 480 seconds)
[0:06] <phantomcircuit> ok well high iops does the same thing
[0:07] <phantomcircuit> 1k block size and it quickly stalls
[0:07] * tnt (~tnt@216.186-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[0:07] <phantomcircuit> clearly the journal isn't full from 1k writes
[0:12] <gregaf> I think maybe one of those knobs prevents the journal from getting too many operations ahead of the backing store as well, but I'd have to check with sjust to be sure
[0:13] <sjust> gregaf, phantomcircuit: yeah, the journal doesn't necessarily fill
[0:13] <sjust> if you allow the journal to get too far ahead of the backing fs, you just set yourself up for a horribly long sync
[0:16] <sjust> phantomcircuit: it's not supposed to go faster than the backing fs for long
[0:21] * nwat (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[0:23] <phantomcircuit> sjust, well im sort of hoping that the os will merge a significant number of the smaller ops into larger ones
[0:23] <sjust> fair enough, what is the workload?
[0:24] <phantomcircuit> it's vms that are idle the vast majority of the time and then go nuts
[0:24] <sjust> ok, but if the io is truly random, there won't be any aggregation
[0:25] <phantomcircuit> yeah but that's the thing it doesn't last for very long
[0:25] <phantomcircuit> talking tens of seconds at worst
[0:28] <phantomcircuit> and actually the rados bench write test should be the absolute best case for that
[0:29] <phantomcircuit> as it's all new objects it should result in essentially all sequential writes
[0:32] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[0:34] * aliguori (~anthony@32.97.110.59) Quit (Quit: Ex-Chat)
[0:35] * sjustlaptop (~sam@2607:f298:a:607:7079:1b6:ea66:b636) has joined #ceph
[0:39] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[0:40] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has joined #ceph
[0:46] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[0:48] * The_Bishop (~bishop@2001:470:50b6:0:9b6:b9a7:942f:f769) Quit (Ping timeout: 480 seconds)
[0:48] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Ping timeout: 480 seconds)
[0:49] <phantomcircuit> ya know what i just realized
[0:49] <phantomcircuit> i dont think the flusher sets committing
[0:49] <phantomcircuit> committing is set but there is definitely no disk activity
[0:50] <phantomcircuit> something is actually deadlocked here
[0:50] * agh (~agh@www.nowhere-else.org) has joined #ceph
[0:52] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[0:53] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[0:53] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has left #ceph
[0:54] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[0:55] * The_Bishop (~bishop@2001:470:50b6:0:352d:126e:6cd2:9040) has joined #ceph
[0:56] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[0:56] * jjgalvez1 (~jjgalvez@ec2-54-235-219-17.compute-1.amazonaws.com) Quit (Quit: Leaving.)
[0:56] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[0:56] * xdeller (~xdeller@broadband-77-37-224-84.nationalcablenetworks.ru) Quit (Quit: Leaving)
[0:58] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) has joined #ceph
[0:58] * sjustlaptop (~sam@2607:f298:a:607:7079:1b6:ea66:b636) Quit (Quit: Leaving.)
[0:59] * sjustlaptop (~sam@2607:f298:a:607:3946:5844:1b2e:9df4) has joined #ceph
[1:01] <phantomcircuit> it looks like turning on the synchronization flusher helps with small io
[1:01] <phantomcircuit> probably not with random writes though
[1:08] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[1:11] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[1:13] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[1:16] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[1:16] * sjustlaptop (~sam@2607:f298:a:607:3946:5844:1b2e:9df4) Quit (Quit: Leaving.)
[1:17] * sjustlaptop (~sam@2607:f298:a:607:848a:8c3e:9d2c:d1b6) has joined #ceph
[1:17] * mattbenjamin (~matt@adsl-75-45-230-94.dsl.sfldmi.sbcglobal.net) has joined #ceph
[1:21] <phantomcircuit> neat rbd caching is making this even faster
[1:21] <phantomcircuit> i dub thee production ready
[1:21] <phantomcircuit> after i add another two monitors...
[1:22] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has left #ceph
[1:25] * jjgalvez (~jjgalvez@ec2-54-235-219-17.compute-1.amazonaws.com) has joined #ceph
[1:29] * jlogan (~Thunderbi@2600:c00:3010:1:a9fc:bead:751e:61d9) Quit (Ping timeout: 480 seconds)
[1:30] <phantomcircuit> gregaf, im still curious what limits i was hitting since i wasn't even close to any of the ones in perf dump
[1:30] <phantomcircuit> thenhm^
[1:30] <phantomcircuit> er
[1:30] <phantomcircuit> nhm^
[1:31] <phantomcircuit> i guess theoretically i could rebuild by looking for rbd headers
[1:31] <phantomcircuit> that sounds horrible though...
[1:32] * xiaoxi (~xiaoxiche@134.134.139.74) has joined #ceph
[1:32] <gregaf> dunno, I've exhausted my performance analysis mojo elsewhere today; ask sjust
[1:32] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving.)
[1:32] <phantomcircuit> hehe
[1:33] <phantomcircuit> sorry i know i kind of get going and just uh dont stop
[1:34] <xmltok> im looking to build out an object store with ceph, my understanding is that radosgw is production ready. with swift I can put containers/accounts on SSDs but leave my main storage on spinning disks. Can ceph take similar advantage of nodes with SSD and disks?
[1:35] <gregaf> xmltok: it's not specifically documented, but definitely
[1:35] <gregaf> the gateway creates a number of pools that for a real system you should create on your own (to get the right sizes and things)
[1:36] <gregaf> then you'd have a custom CRUSH map with an SSD group and an HDD group and CRUSH rules to go with each, and place the gateway's metadata pools with the SSD rule, and the data storage with the HDD rule
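(A minimal sketch of what gregaf describes, for later readers. It assumes the crushmap already defines two top-level buckets named ssd and hdd containing the respective devices; the ruleset numbers and pool names are illustrative assumptions, not taken from an actual deployment.)

    rule ssd {
            ruleset 3
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type host
            step emit
    }
    rule hdd {
            ruleset 4
            type replicated
            min_size 1
            max_size 10
            step take hdd
            step chooseleaf firstn 0 type host
            step emit
    }
    # point the gateway's small metadata pools at the ssd rule and the bulk data pool at the hdd rule:
    ceph osd pool set .rgw crush_ruleset 3
    ceph osd pool set .rgw.buckets crush_ruleset 4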
[1:36] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) has joined #ceph
[1:36] * psieklFH (psiekl@wombat.eu.org) has joined #ceph
[1:37] <jmlowe> sjust: you around?
[1:38] * psiekl (psiekl@wombat.eu.org) Quit (Read error: Connection reset by peer)
[1:40] <jmlowe> I have managed to reproduce an inconsistent cluster with debugging on, any of the devs want to take a look?
[1:41] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:41] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:47] <xmltok> is 0.56.1 the recommended version for building a cluster for radosgw?
[1:47] * sjustlaptop (~sam@2607:f298:a:607:848a:8c3e:9d2c:d1b6) Quit (Ping timeout: 480 seconds)
[1:47] <ircolle> jmlowe: could you open up a ticket and attach the debug output?
[1:48] <jmlowe> the debugging is very large, 100's of gigs
[1:48] <jmlowe> make that megs
[1:50] <dmick> jmlowe: even compressed?
[1:50] <dmick> if so, sftp might be better
[1:51] <jmlowe> root@gwboss1:/var/log/ceph# du -ch .
[1:51] <jmlowe> 1.2G .
[1:51] <jmlowe> 1.2G total
[1:51] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:51] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:51] <dmick> jmlowe: is that compressed?
[1:52] <jmlowe> mostly
[1:52] <dmick> ok, then sftp is best
[1:52] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[1:55] <jmlowe> *sigh, password recovery
[1:57] * xmltok (~xmltok@pool101.bizrate.com) Quit (Ping timeout: 480 seconds)
[2:00] * ninkotech_ (~duplo@89.177.137.236) has joined #ceph
[2:02] <jmlowe> dmick: tell me about sftp, where do I send it?
[2:04] <xiaoxi> Hi, has anyone suffered from OSDs going down?
[2:04] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[2:04] <xiaoxi> My osd goes down when I have continuous high load on it
[2:04] <jmlowe> xiaoxi, they will want to know the version
[2:05] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:05] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[2:10] <xiaoxi> v0.56.1 on top of ubuntu 12.04 (and the newest dev version 13.10)
[2:11] <jmlowe> xiaoxi: and a pastebin of any relevant logs would be good too
[2:11] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[2:12] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[2:12] <dmick> jmlowe: crap, now I gotta look
[2:13] <xiaoxi> there are a lot of slow request warnings in the ceph log
[2:13] <xiaoxi> 2013-01-15 19:14:23.770086 7f20a2d57700 0 log [WRN] : slow request 53.216404 seconds old, received at 2013-01-15 19:13:30.553616: osd_op(client.10671.1:1066860 rb.0.282c.6b8b4567.000000001057 [write 2621440~524288] 2.ea7acebc) currently waiting for sub ops
[2:13] <xiaoxi> 2013-01-15 19:14:23.770096 7f20a2d57700 0 log [WRN] : slow request 51.442032 seconds old, received at 2013-01-15 19:13:32.327988: osd_op(client.10674.1:1002418
[2:13] <xiaoxi> and in the dmesg, I can see:[21199.036476] INFO: task ceph-osd:7788 blocked for more than 120 seconds.
[2:13] <xiaoxi> [21199.037493] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2:13] <xiaoxi> [21199.038841] ceph-osd D 0000000000000006 0 7788 1 0x00000000
[2:13] <xiaoxi> [21199.038844] ffff880fefdafcc8 0000000000000086 0000000000000000 ffffffffffffffe0 [21199.038848] ffff880fefdaffd8 ffff880fefdaffd8 ffff880fefdaffd8 0000000000013780 [21199.038852] ffff88081aa58000 ffff880f68f52de0 ffff880f68f52de0 ffff882017556200 [21199.038856] Call Trace:
[2:15] <jmlowe> xiaoxi: you have exceeded my expertise, somebody here should be able to help though
[2:17] <joshd> xiaoxi: that suggests the osds are extremely overloaded, or there's an issue with the filesystem or disk underneath them causing operations to stall for a long time
[2:19] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[2:19] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[2:19] <phantomcircuit> joshd, i was seeing the same thing until i enabled the flusher / sync flusher
[2:20] <phantomcircuit> xiaoxi, try turning on the flusher
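(For later readers: the "flusher" being discussed is the filestore flusher. A sketch of what enabling it in ceph.conf might look like; treat the option names as an assumption about what phantomcircuit meant, and note the defaults for these changed between versions.)

    [osd]
            filestore flusher = true
            filestore sync flush = true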
[2:21] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:24] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[2:25] * LeaChim (~LeaChim@b0fadd12.bb.sky.com) Quit (Read error: Operation timed out)
[2:26] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:29] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[2:34] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:35] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:35] * loicd (~loic@magenta.dachary.org) has joined #ceph
[2:37] <jmlowe> logs away, all 1G of them
[2:42] * psieklFH (psiekl@wombat.eu.org) Quit (Read error: Connection reset by peer)
[2:43] * psiekl (psiekl@87.118.124.36) has joined #ceph
[2:48] * psieklFH (psiekl@87.118.124.36) has joined #ceph
[2:48] * psiekl (psiekl@87.118.124.36) Quit (Read error: Connection reset by peer)
[2:56] <dmick> jmlowe: tnx
[2:56] <dmick> what was your bug number?
[2:56] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[2:57] <dmick> 3810 probably
[2:57] <jmlowe> oh, 3810
[2:57] * psieklFH (psiekl@87.118.124.36) Quit (Ping timeout: 480 seconds)
[2:58] <jmlowe> I hope it's enough to find and fix
[2:58] <dmick> logs arrived successfully. Hopefully sjust will get to them tomorrow
[2:59] <jmlowe> I had these on debug ms = 1
[2:59] <jmlowe> debug osd = 20
[2:59] <jmlowe> debug filestore = 20
[2:59] <jmlowe> debug journal = 20
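(Gathered into ceph.conf form, the logging jmlowe used would look roughly like this; putting it under [osd] is an assumption, and the same values can also be injected at runtime.)

    [osd]
            debug ms = 1
            debug osd = 20
            debug filestore = 20
            debug journal = 20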
[3:01] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[3:02] <dmick> jmlowe: should be fine settings
[3:09] * psiekl (psiekl@wombat.eu.org) has joined #ceph
[3:09] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[3:09] * psiekl (psiekl@wombat.eu.org) Quit ()
[3:10] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[3:11] <xiaoxi> joshd: How can an OSD become overloaded? I mean, although the journal is faster, it will certainly hit filestore_queue_max_bytes or filestore_queue_max_ops
[3:12] <xiaoxi> phantomcircuit: Thanks a lot, but I didn't tune that part (i.e. it's at the default settings)
[3:21] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[3:24] * dmick (~dmick@2607:f298:a:607:2de2:fa1:2d80:89f8) Quit (Quit: Leaving.)
[3:29] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:29] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:46] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[3:53] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[3:55] * jjgalvez1 (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[3:55] * psiekl (psiekl@wombat.eu.org) has joined #ceph
[4:00] * agh (~agh@www.nowhere-else.org) has joined #ceph
[4:02] * jjgalvez (~jjgalvez@ec2-54-235-219-17.compute-1.amazonaws.com) Quit (Ping timeout: 480 seconds)
[4:08] * yehudasa_ (~yehudasa@static-66-14-234-139.bdsl.verizon.net) has joined #ceph
[4:25] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:28] * Cube (~Cube@184.253.68.18) has joined #ceph
[4:29] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:29] * Cube (~Cube@184.253.68.18) Quit ()
[4:31] * mattbenjamin (~matt@adsl-75-45-230-94.dsl.sfldmi.sbcglobal.net) Quit (Quit: Leaving.)
[4:42] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[4:48] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[4:48] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:49] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:50] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:50] * yehudasa_ (~yehudasa@static-66-14-234-139.bdsl.verizon.net) Quit (Ping timeout: 480 seconds)
[4:51] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:59] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:59] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:18] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[5:26] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:02] * eternaleye_ (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[6:02] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (Read error: Connection reset by peer)
[6:12] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[6:16] * yoshi (~yoshi@p29244-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[6:16] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[6:17] * loicd (~loic@magenta.dachary.org) has joined #ceph
[6:20] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Connection reset by peer)
[6:21] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[6:21] * ChanServ sets mode +o scuttlemonkey
[6:30] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:33] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit ()
[6:43] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[6:44] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[6:55] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:57] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:03] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[7:04] * agh (~agh@www.nowhere-else.org) has joined #ceph
[7:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[7:08] * loicd (~loic@magenta.dachary.org) has joined #ceph
[7:15] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[7:22] * phantomcircuit (~phantomci@173-45-240-7.static.cloud-ips.com) Quit (Quit: quit)
[7:23] * phantomcircuit (~phantomci@covertinferno.org) has joined #ceph
[7:25] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[7:36] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:44] * gaveen (~gaveen@112.135.142.156) has joined #ceph
[7:46] * gaveen (~gaveen@112.135.142.156) Quit ()
[7:53] * mmgaggle_ (~kyle@alc-nat.dreamhost.com) Quit (Ping timeout: 480 seconds)
[8:02] * gaveen (~gaveen@112.135.151.6) has joined #ceph
[8:02] * Pagefaulted (~AndChat73@c-67-168-132-228.hsd1.wa.comcast.net) has joined #ceph
[8:11] * yoshi (~yoshi@EM117-55-68-26.emobile.ad.jp) has joined #ceph
[8:16] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[8:26] * mmgaggle (~kyle@alc-nat.dreamhost.com) has joined #ceph
[8:30] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[8:32] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[8:36] * low (~low@188.165.111.2) has joined #ceph
[8:36] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:36] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:38] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:52] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[8:54] * Morg (d4438402@ircip2.mibbit.com) has joined #ceph
[8:59] * itamar_ (~itamar@82.166.185.149) has joined #ceph
[9:02] * itamar_ is now known as itamar_L
[9:11] * yoshi_ (~yoshi@EM117-55-68-130.emobile.ad.jp) has joined #ceph
[9:14] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:17] * yoshi (~yoshi@EM117-55-68-26.emobile.ad.jp) Quit (Ping timeout: 480 seconds)
[9:18] * tnt (~tnt@216.186-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[9:19] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:24] * itamar_L (~itamar@82.166.185.149) Quit (Remote host closed the connection)
[9:24] * itamar_L (~itamar@82.166.185.149) has joined #ceph
[9:30] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[9:36] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Read error: Connection reset by peer)
[9:36] * tnt (~tnt@216.186-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:37] * jjgalvez1 (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[9:38] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:39] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:39] * John (~john@astound-66-234-218-187.ca.astound.net) Quit (Quit: Leaving)
[9:45] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:47] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Ping timeout: 480 seconds)
[9:47] * mattbenjamin (~matt@75.45.230.94) has joined #ceph
[9:58] * dosaboy (~user1@host86-164-232-154.range86-164.btcentralplus.com) has joined #ceph
[9:59] * itamar_L (~itamar@82.166.185.149) Quit (Quit: Ex-Chat)
[9:59] * itamar_ (~itamar@82.166.185.149) has joined #ceph
[10:00] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[10:01] * dosaboy (~user1@host86-164-232-154.range86-164.btcentralplus.com) Quit ()
[10:01] * dosaboy (~user1@host86-164-232-154.range86-164.btcentralplus.com) has joined #ceph
[10:05] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:10] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[10:10] * DJF5 (~dennisdeg@backend0.link0.net) has joined #ceph
[10:13] * jrisch (~Adium@94.191.187.122.mobile.3.dk) has joined #ceph
[10:13] <scalability-junk> darkfaded, not quite the discussion I hoped for :D
[10:17] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:17] <darkfaded> scalability-junk: ima go and see
[10:18] <darkfaded> but i'll also tell you my first thought at reading what you just said
[10:18] * itamar_ (~itamar@82.166.185.149) Quit (Quit: Ex-Chat)
[10:18] * itamar_ (~itamar@82.166.185.149) has joined #ceph
[10:19] <darkfaded> "if the discussion goes the wrong way then the problem is generally not yet understood and you need to retry in 2 years"
[10:19] <scalability-junk> darkfaded, yeah thought the same except of the 2 years part :P
[10:19] <darkfaded> because that's kinda my experience in OSS, if the majority of the user base / devs doesn't grasp the problem you're simply too far ahead
[10:19] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[10:19] * darkfaded hands over your personal windmill
[10:21] <darkfaded> oh i see, no replies at all
[10:21] <darkfaded> thats better than a lot of missing-the-point ones
[10:21] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[10:23] * jrisch1 (~Adium@94.191.187.240.mobile.3.dk) has joined #ceph
[10:24] * LeaChim (~LeaChim@b0fadd12.bb.sky.com) has joined #ceph
[10:24] <scalability-junk> darkfaded, depends :)
[10:26] <darkfaded> scalability-junk: anyway, right now there was nothing that really helps you. the question is, how much time can you put into building some workaround for yourself, how exact / consistent should your backup be, and how much money can you put in
[10:26] * itamar_ (~itamar@82.166.185.149) Quit (Quit: Ex-Chat)
[10:26] * itamar_ (~itamar@82.166.185.149) has joined #ceph
[10:27] <scalability-junk> darkfaded, I'll try to get some concepts together with the replication idea and then we'll see
[10:28] <darkfaded> the last thing i had thought about yesterday was that ideally one could get consistency by simply stopping updates to all the "backup" OSDs at once
[10:28] * jrisch (~Adium@94.191.187.122.mobile.3.dk) Quit (Ping timeout: 480 seconds)
[10:29] <scalability-junk> darkfaded, true
[10:29] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[10:29] <darkfaded> so, i figure that would just mean to remove them from some map? i don't have any experience in those tasks, but you can test-drive those steps
[10:29] <darkfaded> then you have a consistent "backup", yet inaccessible since you took it out
[10:29] <darkfaded> so at the same time you'd need to split off the metadata, and thats tricky i guess
[10:30] <scalability-junk> yeah the special replication idea is a bit better, cause you can just use it in a new cluster.
[10:30] <scalability-junk> I will start with rsync and then work from there to get a real solution.
[10:31] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[10:31] <darkfaded> start by identifying how much consistency you need
[10:31] <darkfaded> (i mean, point-in-time)
[10:32] <darkfaded> why i'm going on about that is:
[10:33] <darkfaded> you have much bigger consistency issues than if you only backed up one server
[10:33] <darkfaded> there could be 5 databases that interact stored around your rbd pool
[10:34] <darkfaded> if you need a clean same point in time backup of all then it's tricky and then "cease writing to all backup OSDs" is the best way to keep it sane
[10:34] <scalability-junk> I'll keep that in mind
[10:34] <darkfaded> and if this is all not as big an issue then you can just put the OSD dirs on lvm, run a snapshot per day and rsync the delta to another place
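(A rough shell sketch of that LVM-snapshot-plus-rsync idea; the volume group, LV, mountpoint, and destination names are invented for illustration, and this is a workaround being brainstormed in the channel, not a recommended backup method.)

    # snapshot the LV backing the OSD dir, mount it read-only, ship the delta, clean up
    lvcreate --snapshot --size 10G --name osd0-snap /dev/vg0/osd0
    mount -o ro /dev/vg0/osd0-snap /mnt/osd0-snap   # XFS may additionally need -o nouuid,norecovery
    rsync -a --delete /mnt/osd0-snap/ backupbox:/backups/osd0/
    umount /mnt/osd0-snap
    lvremove -f /dev/vg0/osd0-snap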
[10:35] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[10:35] <scalability-junk> darkfaded, the issue with osd dir snapshots is the recovery would not be that easy
[10:36] <scalability-junk> but snapshots would make different points in time easier
[10:36] <darkfaded> i totally dislike backing up the osd dir to be honest
[10:36] <darkfaded> since you're running a backup and have no way to know if it's valid etc
[10:37] <scalability-junk> yeah true
[10:37] <darkfaded> hmm, but you'll generally store stuff to /dev/rbd devices?
[10:38] <scalability-junk> yeah
[10:38] <darkfaded> raid on top of them, attach, detach an iscsi target from "backup box"
[10:38] <darkfaded> gives you change bitmask and no syncing if you don't need it
[10:38] <darkfaded> and it's a clean interface
[10:39] <darkfaded> (from OS perspective, mdadm is all things but a clean interface)
[10:39] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[10:39] * itamar_ (~itamar@82.166.185.149) Quit (Quit: Ex-Chat)
[10:39] <scalability-junk> that would work till I start using it as object storage
[10:39] <darkfaded> thats why i asked :)
[10:40] <scalability-junk> yeah, but some more general ceph worthy solution would be nice.
[10:40] <scalability-junk> perhaps the discussion gets inflamed in a few days :)
[10:40] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[10:41] <scalability-junk> if not I'll look into it myself and start for the sake of complexity with rsync or lvm snapshots
[10:41] <darkfaded> it helps if you first identify how much time / money to put down for the hack
[10:42] <darkfaded> if it's supposed to be really tested and all i'd not plan less than a week for implementing it
[10:42] <darkfaded> if you can talk your way outta a data loss then it's just a day maybe hehehe
[10:42] <scalability-junk> darkfaded, depends on the stuff I learn from it and the fun it gives me :)
[10:43] <darkfaded> agree :>
[10:43] <darkfaded> i think it'll be a lot of fun
[10:43] <scalability-junk> as the production setup of openstack + ceph on 1 server is crazy on its own, I stopped thinking about cost efficiency :)
[10:44] <darkfaded> hrhr
[10:44] <darkfaded> at $oldjob they had some financial analysis software that was supposed to run as a grid
[10:44] <darkfaded> it was on 3 nodes: DB, master, worker
[10:45] <darkfaded> worker often had a load of 100 or so
[10:45] <darkfaded> but duuuuh :)
[10:45] <scalability-junk> sounds like my "cloud" solution with 1 server :D
[10:45] * jrisch1 (~Adium@94.191.187.240.mobile.3.dk) Quit (Read error: Connection reset by peer)
[10:45] <scalability-junk> totally not done right
[10:46] <scalability-junk> why not db + master and 2 workers?
[10:48] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[10:59] * jrisch (~Adium@83-95-19-94-static.dk.customer.tdc.net) has joined #ceph
[11:01] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[11:03] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[11:03] * yoshi_ (~yoshi@EM117-55-68-130.emobile.ad.jp) Quit (Read error: Connection reset by peer)
[11:04] * yoshi (~yoshi@EM117-55-68-130.emobile.ad.jp) has joined #ceph
[11:04] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) has joined #ceph
[11:04] * yoshi_ (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[11:09] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:12] * yoshi (~yoshi@EM117-55-68-130.emobile.ad.jp) Quit (Ping timeout: 480 seconds)
[11:17] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[11:31] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[11:33] * xdeller (~xdeller@broadband-77-37-224-84.nationalcablenetworks.ru) has joined #ceph
[11:38] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[11:39] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[11:42] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:43] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:50] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[11:50] * tryggvil_ is now known as tryggvil
[12:07] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Remote host closed the connection)
[12:08] * yoshi_ (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:08] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[12:14] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:31] * Leseb (~Leseb@193.172.124.196) Quit (Ping timeout: 480 seconds)
[12:31] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[12:36] * Morg (d4438402@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[12:39] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[12:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[12:49] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[12:50] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[12:50] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:56] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[13:05] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[13:06] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[13:06] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Ping timeout: 480 seconds)
[13:18] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[13:18] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:19] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:30] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[13:32] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[13:40] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[13:54] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[14:00] * verwilst (~verwilst@76.171-136-217.adsl-static.isp.belgacom.be) has joined #ceph
[14:02] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[14:05] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:05] * loicd (~loic@soleillescowork-4p-55-10.cnt.nerim.net) has joined #ceph
[14:12] <nhm> morning all
[14:17] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[14:24] <sstan> morning
[14:24] * verwilst (~verwilst@76.171-136-217.adsl-static.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[14:29] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) Quit (Quit: Leaving.)
[14:32] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[14:40] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 481 seconds)
[14:41] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:45] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:50] <jtang> morning!
[15:14] <liiwi> good afternoon
[15:25] * tryggvil (~tryggvil@157-157-199-57.dsl.dynamic.simnet.is) has joined #ceph
[15:29] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) has joined #ceph
[15:30] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[15:33] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[15:36] * dosaboy1 (~user1@host86-164-221-235.range86-164.btcentralplus.com) has joined #ceph
[15:37] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[15:39] * mattbenjamin1 (~matt@adsl-75-45-226-110.dsl.sfldmi.sbcglobal.net) has joined #ceph
[15:40] * dosaboy (~user1@host86-164-232-154.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[15:41] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[15:42] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[15:45] * mattbenjamin (~matt@75.45.230.94) Quit (Ping timeout: 480 seconds)
[15:47] * xiaoxi_1 (~sun@118.133.162.42) has joined #ceph
[15:48] * vata (~vata@2607:fad8:4:6:221:5aff:fe2a:d1dd) has joined #ceph
[15:48] <xiaoxi_1> hi
[15:49] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[15:50] <xiaoxi_1> What's the recommended version of linux for ceph osd?
[15:50] <noob2> ubuntu 12.04LTS
[15:50] <xiaoxi_1> with kernel 3.2?
[15:50] * jlogan (~Thunderbi@2600:c00:3010:1:a9fc:bead:751e:61d9) has joined #ceph
[15:51] <noob2> yup
[15:52] <xiaoxi_1> I will try to install a clean 12.04 tomorrow... I accidentally upgraded ubuntu to 13.04 with kernel 3.7, then rolled the kernel back to 3.2, but ceph is still unstable
[15:52] <noob2> what kind of instability?
[15:53] <noob2> ah, you wrote to the mailing list right? i'm reading them now
[15:53] <xiaoxi_1> yes
[15:53] <xiaoxi_1> I wrote the mail
[15:53] <xiaoxi_1> when the load is high, the OSD daemon is likely to go down, or even cause the system to reboot
[15:56] * gucki (~smuxi@80-218-125-247.dclient.hispeed.ch) has joined #ceph
[15:58] <noob2> i'm going to try and recreate this today with the load
[15:59] <xiaoxi_1> noob2:Thanks a lot...
[16:00] * tryggvil (~tryggvil@157-157-199-57.dsl.dynamic.simnet.is) Quit (Quit: tryggvil)
[16:00] <noob2> i've been wondering how much load the cluster can take before it falls over
[16:01] <madkiss> from the ceph viewpoint, cluster failover is *so* 1997. ;-)
[16:01] <noob2> haha
[16:01] <xiaoxi_1> well, why do you expect the cluster to fall over with high load?
[16:01] <noob2> i thought your problem was load based?
[16:01] <xiaoxi_1> yes, I think so
[16:02] <liiwi> 13.04 now has the 3.8 kernel
[16:03] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:04] <xiaoxi_1> I know, but I cannot try out all the combinations of kernel & ubuntu versions; I have to provide a stable enough setup for my boss :)
[16:04] <noob2> yeah i hear ya
[16:05] <noob2> my boss is untrusting of new things also
[16:05] * jrisch (~Adium@83-95-19-94-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[16:06] <xiaoxi_1> We are also doing some tests: 1. With 4 nodes and 60 OSDs, doing dd on 240 RBDs, osd kernel ver 3.2, client kernel ver 3.6, one of the nodes auto-rebooted...
[16:06] <liiwi> watchdog?
[16:07] <xiaoxi_1> 2. With 1 node and 20 OSDs, replica set to 1, doing dd on 120 RBDs, stable
[16:07] * via (~via@smtp2.matthewvia.info) has joined #ceph
[16:07] <xiaoxi_1> watchdog?
[16:08] <via> with .56.1 is it still unwise to re-export ceph with nfs, assuming up to date kernel?
[16:09] <via> i've been using unfsd with lots of success but its lacking support for some things
[16:10] <xdeller> <xiaoxi_1> is reset caused by panic?
[16:12] <jtang> *sigh*
[16:12] <jtang> our pods exploded
[16:12] <jtang> btrfs on the lts kernel from elrepo isn't good
[16:12] <xiaoxi_1> Not sure, but it seems not; with a panic the machine is likely to hang there until I restart it with the power button. This time it didn't, it just automatically restarted
[16:12] <janos> jtang - what version of btrfs? just curious
[16:12] <tnt> xiaoxi_1: wait, which nodes reboot ? the osd or client ?
[16:13] <jtang> janos: the one that ships with rhel6 (if you enable the elrepo and use the lts mainline kernel)
[16:13] <jtang> its from linux 3.0.5x
[16:13] <jtang> somewhere around that
[16:13] <xdeller> <xiaoxi_1>: if that`s osd node, would you mind to set up netconsole on it?
[16:13] <janos> i've been using fedora 17. it's btrfs 0.19 iirc
[16:13] <jtang> we experienced a one failed disk in each pod :P
[16:14] <jtang> janos: thats the userland btrfs-progs
[16:14] <janos> ah
[16:14] <jtang> im having issues with the kernel module
[16:14] <jtang> its not really doing raid10 as we expected
[16:15] <xiaoxi_1> tnt:osd
[16:16] <xiaoxi_1> xdeller: I would like to try it tomorrow once I go to lab~
[16:18] <xiaoxi_1> I tried to read the syslog after the reboot, but there was no valuable info...
[16:19] <xdeller> so you have rolled back to 3.2; on what exact release? or is it current 12.04?
[16:20] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[16:20] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[16:23] <xiaoxi_1> 11.04 release, just rolled kernel back
[16:23] <jtang> heh the rpm repo has development releases of ceph :P
[16:23] <xiaoxi_1> I would like to clean the OS and reinstall 12.04 tomorrow, if suggested
[16:25] * tryggvil (~tryggvil@157-157-199-57.dsl.dynamic.simnet.is) has joined #ceph
[16:26] <xdeller> as for me, it seems using the 3.2 'stable' kernel is enough; the osds are doing fine on it during today's tests
[16:26] * tryggvil (~tryggvil@157-157-199-57.dsl.dynamic.simnet.is) Quit (Read error: Connection reset by peer)
[16:27] <xiaoxi_1> how about you put high load on it?
[16:27] * drafter (~drafter@62.173.129.210) has joined #ceph
[16:27] * tryggvil (~tryggvil@157-157-199-57.dsl.dynamic.simnet.is) has joined #ceph
[16:27] <xiaoxi_1> say 4 times more RBD Volumes than OSDs
[16:29] <drafter> hello, could you please help with a bit silly question ) the question is
[16:29] <drafter> Is it allowed to reuse a single instance of RadosClient for sequential connection attempts (imagine that we are trying to connect to a shut-off cluster and reconnect every 5 secs), or is it required to create a new instance for each connection attempt?
[16:30] <xdeller> not sure if this count really matters, I have started ~5 heavy vms on each osd node, doing both fio and memory benchmarks at same time, raising 24-ht-cores vm la to 20
[16:30] <tnt> xiaoxi_1: what's the backing FS ? xfs or btrfs or ext4 ?
[16:30] <xdeller> *vm to host node, I have mistyped above
[16:32] <xiaoxi_1> tnt:xfs
[16:32] <- *nwat* hi
[16:32] <xiaoxi_1> xdeller:have you ever seen "slow request" in osd's log?
[16:34] <xdeller> hmm, I may raise the I/O pressure several times higher and see it, but that's not a general matter; just limiting the clients' overall throughput will fix it :)
[16:35] <xdeller> I'm aware of a kernel issue on recent kernels, 3.6 and up
[16:35] * tryggvil (~tryggvil@157-157-199-57.dsl.dynamic.simnet.is) Quit (Quit: tryggvil)
[16:35] <xdeller> in my case, that's just a bunch of deadlocks on 3.6/3.7 and a panic on 3.8; seems the same on your 3.7 ubuntu
[16:36] <tnt> I've seen xfs 'deadlocks' (causing ceph-osd process to 'hang') when running with xfs filestore on 3.2 kernel from ubuntu.
[16:36] <rlr219> Any crushmap experts?
[16:37] <xdeller> tnt: how it was looked? as large iowait or userspace or sys time?
[16:38] <xiaoxi_1> tnt: well, so bad news again, 3.2 with XFS still seems unstable?
[16:39] <tnt> xdeller: like this in the 'dmesg': http://pastebin.com/raw.php?i=AtAawLRp from the cluster PoV, the OSD was marked down and out.
[16:40] <xdeller> hmm, just a plain i/o lockup, at first sight
[16:40] <gucki> hi there
[16:41] <gucki> any ceph dev here who would like to test issue 3797 with me?
[16:41] <xdeller> tnt: I mean, the disk throughput is simply not enough for the osd above
[16:42] <tnt> xdeller: there was almost no disk activity, and 5 hours later, with the osd marked down & out and really doing no IO at all, the task was still hung.
[16:44] <tnt> so something locked it up ... no idea what. Restarting the osd was enough to unlock it (no need to reboot the machine or remount the fs or anything)
[16:45] <xdeller> tnt: what about processor activity on the node at this time?
[16:47] <tnt> xdeller: I can't really tell unfortunately.
[16:47] <tnt> The OSD do run as VM under xen (but have dedicated memory and disk spindles).
[16:49] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[16:49] <xdeller> uh-oh, xen does fine until it hits a lot of I/O, judging from my experience; maybe the xen devs have fixed the situation with the hypervisor scheduler
[16:50] <tnt> I did do a bunch of dd tests both on the VM and 'natively' to see the impact of xen and it was insignificant when you had the whole physical hdd attributed to a VM. (it's of course different with VM image files).
[16:51] <tnt> Our cluster is also not exactly "stressed" or under high load .. especially at that time of the day.
[16:52] <xdeller> yes, on raw devices there is almost no difference; i'm pointing at the very poor default scheduler policy from long ago, in the 4.0 times
[16:52] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[16:54] <xiaoxi_1> tnt:have you reported this to sage or someone...
[16:54] <xiaoxi_1> is there any solution?
[16:54] * low (~low@188.165.111.2) Quit (Quit: Leaving)
[16:55] <tnt> xiaoxi_1: no I didn't.
[16:57] <xiaoxi_1> ok then, so the conclusion is: tnt has also suffered with XFS on the 3.2 kernel, while xdeller suffered deadlocks on 3.6/3.7 and panics on 3.8
[16:59] <rlr219> i have 10 servers with 2 OSDs each. I have replication of 2, but ideally, I would like to make it so no PG is written to the 2 OSDs on the same machine? how would I do that, or does ceph handle that by default?
[16:59] <rlr219> sorry, replication is 3 not 2
[17:00] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[17:00] <janos> rlr219: that can be defined through the crushmap
[17:01] <janos> rlr219: i *think* it defaults to host, but don't quote me on that
[17:02] <xiaoxi_1> rlr219:it's default to host
[17:03] * tryggvil (~tryggvil@nova106-255.cust.nova.is) has joined #ceph
[17:03] * xiaoxi_1 (~sun@118.133.162.42) Quit ()
[17:03] * joshd1 (~jdurgin@2602:306:c5db:310:28f6:eba8:a4e6:3810) has joined #ceph
[17:05] * tryggvil (~tryggvil@nova106-255.cust.nova.is) Quit ()
[17:14] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:16] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[17:17] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[17:25] <rlr219> so if the 2 OSDs are defined under the same host, then it will not write a PG to both OSDs?
[17:26] <jmlowe> with the defaults, yes that's correct rlr219
[17:28] <tnt> I would still check the crushmap to be sure and also check ceph osd tree to make sure ceph is aware of the 'geography' of your OSDs.
[17:28] * mattbenjamin1 (~matt@adsl-75-45-226-110.dsl.sfldmi.sbcglobal.net) Quit (Quit: Leaving.)
[17:28] <sstan> what if that host has 2 hard drives?
[17:32] <rlr219> tnt: it is the default crush map generated when i made created the cluster. But in ceph.conf, I have the osd.0 & osd.1 under svr1, osd.2 & osd.3 under svr2, etc.
[17:34] <jmlowe> sstan: doesn't matter, the rules generated by default say start with a rack and all hosts are in the same rack, then pick a host in the rack, then pick an osd on that host, then repeat for as many replicas as are required (2 by default) each time taking the already used hosts and osd's out of consideration
[17:35] * sander (~chatzilla@c-174-62-162-253.hsd1.ct.comcast.net) has joined #ceph
[17:35] <rlr219> Outstanding. Thanks. now to get cephx working. ;)
[17:37] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:38] <jmlowe> sstan: for example rack-> host2 (randomly selected) ->osd3 (randomly selected) then rack -> host1 (host2 already used so not considered leaving only host1) -> osd2 (randomly selected)
[17:39] <sstan> hmm I didn't know that was the default crushmap, thanks for the info
[17:40] <jmlowe> rule data {
[17:40] <jmlowe> ruleset 0
[17:40] <jmlowe> type replicated
[17:40] <jmlowe> min_size 1
[17:40] <jmlowe> max_size 10
[17:40] <jmlowe> step take default
[17:40] <jmlowe> step chooseleaf firstn 0 type host
[17:40] <jmlowe> step emit
[17:40] <jmlowe> }
[17:40] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) Quit (Ping timeout: 480 seconds)
[17:40] <jmlowe> is one of the rules auto generated for my cluster
[17:42] <sstan> so firstn 0 type host translates to : take the first OSD of some host
[17:42] * Pagefaulted (~AndChat73@c-67-168-132-228.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[17:43] <sstan> and that defines how a PG maps to OSDs ... if I am correct
[17:43] <rlr219> jmlowe: eight of my servers are in one rack and 2 are in a separate rack. I would like to split them up that way. do I just create a second rack and pool and then duplicate the rules for that rack/pool also?
[17:44] <rlr219> Or I could have them both under the same DC and use the DC in the rule definitions?
[17:45] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:47] <jmlowe> rlr219: just an end user here, so I'm not a definitive source and this isn't something I've done more than once or twice: I think what you want to do is add a rack-type object (called rack2 maybe; I think the default is unknownrack), then add that rack to the default pool, then add a 'step chooseleaf firstn 0 type rack' line between 'step take default' and 'step chooseleaf firstn 0 type host'
[17:47] <janos> oh you can stack the chooseleaf rules?
[17:47] <janos> neat
[17:47] <janos> i haven't gotten into crushmap rules as much as i would like
[17:48] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[17:48] <jmlowe> that may be incorrect; it may be that you stack the choose steps and make sure there is only one chooseleaf at the end
[17:49] <jmlowe> so 'step choose firstn 0 type rack'
[17:52] <jmlowe> you should really get confirmation as changing the rules is slightly dangerous
[17:56] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:01] * drafter (~drafter@62.173.129.210) Quit (Quit: drafter)
[18:02] <rlr219> jmlowe: I understand. I do not want to break my cluster. Thanks
[18:04] * tnt (~tnt@120.194-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:08] * gaveen (~gaveen@112.135.151.6) Quit (Ping timeout: 480 seconds)
[18:16] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[18:17] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:17] * gaveen (~gaveen@112.135.132.110) has joined #ceph
[18:18] * terje (~terje@71-218-10-180.hlrn.qwest.net) has joined #ceph
[18:18] * terje_ (~joey@71-218-10-180.hlrn.qwest.net) has joined #ceph
[18:19] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Remote host closed the connection)
[18:19] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[18:24] * portante (~user@66.187.233.206) Quit (Remote host closed the connection)
[18:25] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:26] * sstan (~chatzilla@dmzgw2.cbnco.com) Quit (Remote host closed the connection)
[18:29] * nwat (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[18:29] * gucki (~smuxi@80-218-125-247.dclient.hispeed.ch) Quit (Remote host closed the connection)
[18:32] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[18:42] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:44] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[18:51] <noob2> jmlowe: yes i found that out also :). changing rules caused my cluster to thrash about for about 5hrs :D
[18:51] <noob2> it eventually settled down
[18:52] * portante (~user@66.187.233.206) has joined #ceph
[18:52] * The_Bishop (~bishop@2001:470:50b6:0:352d:126e:6cd2:9040) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[18:53] * zK4k7g (~zK4k7g@digilicious.com) has joined #ceph
[18:54] * rturk-away is now known as rturk
[18:54] * drafter (~drafter@nat2-gw04-62.wwwcom.ru) has joined #ceph
[18:56] <rlr219> is there a crushmap expert, from inktank, willing to look at my "original" and "revised" crushmaps to see if my thought process is correct?
[18:57] <noob2> yeah i changed mine from firstn 0 type host
[18:57] <noob2> to firstn 0 type rack
[18:59] <noob2> that way i have one replica per rack
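(In crushmap terms, noob2's change is just the chooseleaf type in the rule; a sketch based on the default rule jmlowe pasted above, assuming the hosts are actually placed under rack buckets in the hierarchy.)

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type rack
            step emit
    }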
[18:59] * loicd (~loic@soleillescowork-4p-55-10.cnt.nerim.net) Quit (Quit: Leaving.)
[19:01] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[19:01] <gregaf> rlr219: if you pastebin it I'll glance through at some point today
[19:05] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:05] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:08] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:11] * Oliver1 (~oliver1@ip-178-203-175-61.unitymediagroup.de) has joined #ceph
[19:13] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[19:16] * sleinen1 (~Adium@2001:620:0:46:a1d0:36b2:c4ae:f6fd) has joined #ceph
[19:19] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[19:22] * sleinen (~Adium@2001:620:0:25:8473:a0ee:2ded:7ebc) Quit (Ping timeout: 480 seconds)
[19:23] <rlr219> gregaf: http://pastebin.com/vrUi861z Thanks!!
[19:24] * drafter (~drafter@nat2-gw04-62.wwwcom.ru) Quit (Quit: drafter)
[19:26] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[19:34] * alram (~alram@38.122.20.226) has joined #ceph
[19:40] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[19:46] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:48] <jks> first time adding a new osd to the system here... ceph went to state HEALTH_WARN and pgs are backfilling and recovering... however it has been that way for hours, and the % degraded is not moving at all
[19:48] <jks> nothing seems to be written to the drive on the new osd really
[19:48] * BManojlovic (~steki@85.222.177.75) has joined #ceph
[19:49] <jks> is there a way to monitor what is happening? - or can I increase the speed by which it will work the new osd into the system?
[19:49] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:49] <jks> I was considering increasing osd_max_backfills (currently 10), or osd_recovery_max_active (currently 5).... but not sure if that is the right way to go
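(For later readers: those two knobs live in ceph.conf; a sketch with arbitrary values, and whether raising them is the right fix here is exactly what's being questioned.)

    [osd]
            osd max backfills = 20
            osd recovery max active = 10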
[19:50] * dosaboy1 (~user1@host86-164-221-235.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[19:50] <jks> network is not utilized at all, cpus are idle, etc.
[19:50] * dosaboy (~gizmo@host86-164-221-235.range86-164.btcentralplus.com) has joined #ceph
[19:51] <darkfaded> jks: i think you may have put it in the wrong pool
[19:51] <darkfaded> but i've not mastered that so i'm not sure
[19:51] <jks> darkfaded, I don't think so... I ran "ceph osd tree" and it seems correct there
[19:52] <jks> I'm using only pool default, under which I have the default "unknownrack" rack... and then I have my hosts and the osd's are listed under each host appropriately
[19:52] <darkfaded> ok
[19:53] <jks> the mon started logging stuff like this: mon.b@1(peon).log v1026865 check_sub sending message to client.? 10.0.0.1:0/8934 with 1 entries (version 1026865)
[19:53] <jks> it didn't do that before... but not sure if that is of any importance... trying google the message, but didn't find anything
[19:53] <darkfaded> i'm not sure if that relates to the osd really
[19:53] <darkfaded> another thing i sucessfully get wrong quite often is simply mixing up the osd mountpoint
[19:54] <darkfaded> just as a thing to double check
[19:54] <rlr219> jks: what version are you on?
[19:54] <jks> rlr219, 0.56.1
[19:55] <jks> darkfaded, I think the mount point is correct... it loaded about 11 GB of data on to the disk during the first minutes after I added the osd... but for several hours it is still only showing 11 GB of data on the drive
[19:55] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:56] <jks> 2.6 TB of data on the system in total (out of 18 TB avail).... added an extra 2 TB osd
[19:58] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[20:00] <rlr219> are you showing any delayed write messages in ceph -w?
[20:01] * xmltok_ (~xmltok@pool101.bizrate.com) has joined #ceph
[20:01] <jks> rlr219, nope... I had a few "slow requests"... but only about 1 of them per 5-10 minutes
[20:01] <mikedawson> jks: have you tried setting the tunables? I have had 0.56.1 get stuck while rebalancing similar to what you are describing. Setting the tunables has worked for me. http://ceph.com/docs/master/rados/operations/crush-map/#tunables
[20:01] * xmltok (~xmltok@pool101.bizrate.com) Quit (Read error: Operation timed out)
[20:01] <jks> perhaps I'm just impatient and this really is intended to take some days to complete?
[20:02] <jks> mikedawson, no, haven't tried that... wasn't aware of them! I'll read the docs, thanks!
[20:02] <rlr219> jks: I dont think so. I have about 1.8 TB of data and I moved my replication level from 2 to 3 and it only took about 1.5 hours to recover.
[20:03] <jks> rlr219, on how many osds? - I only have a small number
[20:03] <mikedawson> jks: if this is production data, beware that these settings are considered a bit unsafe
[20:03] <jks> mikedawson, I'm still just testing, so not a huge problem if it burns the data
[20:03] <rlr219> 20 OSDs (3TB disk each)
[20:03] <mikedawson> jks: the process I used is documented in the first comment here http://tracker.newdream.net/issues/3720
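(The bobtail-era procedure for applying those tunables was to decompile the crushmap, add the tunable lines, and inject it back; a sketch, with the values taken from the "optimal" settings in the docs linked above rather than anything specific to this cluster.)

    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
    # add near the top of /tmp/crushmap.txt:
    #   tunable choose_local_tries 0
    #   tunable choose_local_fallback_tries 0
    #   tunable choose_total_tries 50
    crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
    ceph osd setcrushmap -i /tmp/crushmap.new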
[20:05] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[20:05] * dmick (~dmick@38.122.20.226) has joined #ceph
[20:05] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[20:06] <rlr219> fwiw, i would restart the OSD first just to see if it shakes loose. then maybe play with tunables
[20:06] <rlr219> if needed
[20:06] <jks> sounds like good advice... I'll try that
[20:07] <jks> I think I'll let it run and go watch the match, and then check back in 2 hours time to see if it has progressed any... and if not, try the tunables
[20:07] <rlr219> what is your setup like?
[20:08] <jks> rlr219, I'm currently testing and getting to know ceph... so I have 3 servers with 4 x 2 TB SATA drives each... each has 3 x 1 GigE bonded connection
[20:08] <jks> I have 2 additional servers for use as test clients
[20:08] <mikedawson> jks: bouncing the OSD processes or nodes never got mine going again. If it works for yours don't bother with the tunables
[20:09] <jks> rlr219, I made the mistake (I think) of setting up the servers with RAID-5 in my first attempt
[20:09] <jks> rlr219, so now I'm trying to add a non-RAIDed osd to test how that performs
[20:11] <rlr219> But if your cluster is RAID and you add an OSD that is not part of the RAID, it wont write data to the non-RAID OSD (I think) because it has to be brought into the RAID to get any data. But i could be wrong
[20:12] <jks> what I meant was that I had an RAID-5 on which I created the filesystem for the osd to use
[20:12] <jks> i.e. osd layered on top of raid-5... not the other way around
[20:12] <jks> now I have a server with a non-RAIDed disk, which I'm adding as an osd to the ceph system
[20:12] <rlr219> ok never mind. ;)
[20:13] <rlr219> not sure then. sorry.
[20:14] <jks> no problem! .... I'll report back in a few hours if I had any success! - thanks for the help!
[20:15] <paravoid> there's definitely still something wrong with 0.56.1
[20:16] <paravoid> I had 48 osds, added another 12 of them
[20:16] <paravoid> just added them, nothing else
[20:16] <paravoid> and now the state is
[20:16] <paravoid> 2013-01-16 19:14:43.122272 mon.0 [INF] osdmap e34033: 60 osds: 23 up, 47 in
[20:16] <paravoid> random osds dying all over the place
[20:16] <nhm> paravoid: no good. :(
[20:17] <nhm> paravoid: Seems like a couple of people are seeing that sort of thing. I wonder why we aren't encountering it.
[20:17] <paravoid> I have to hit the road now
[20:17] <paravoid> this isn't production yet, so I'll just leave it coalesce
[20:17] <paravoid> but I have to add two more boxes (12 osds each) later today
[20:18] <paravoid> so I'll definitely grab logs
[20:18] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[20:18] <paravoid> but this sucks :)
[20:19] <nhm> paravoid: indeed, it makes me cringe!
[20:19] <nhm> paravoid: hopefully between you and a couple of the other folks we can track it down.
[20:19] <paravoid> it went up to 43, then 40 again
[20:20] * rturk is now known as rturk-away
[20:20] <paravoid> and 21 now
[20:21] <dmick> dying to see a log
[20:22] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[20:27] <paravoid> still unstable as hell
[20:27] <paravoid> osds still flapping
[20:27] <paravoid> damn
[20:35] <jmlowe> ok this is weird, my inconsistent pg's have gone consistent by themselves
[20:36] <jmlowe> only thing I can think of is I restarted all of my osd's to force them to write their debugging logs
[20:36] <jmlowe> running deep-scrub across the board to confirm
[20:37] <jmlowe> if they stay clean then it would seem that sometimes my primaries don't complete their transactions until they are restarted and replay their journals while the secondaries work as expected
[20:37] <jmlowe> is that possible?
[20:39] * dosaboy (~gizmo@host86-164-221-235.range86-164.btcentralplus.com) Quit (Remote host closed the connection)
[20:39] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:39] * dosaboy (~gizmo@host86-164-221-235.range86-164.btcentralplus.com) has joined #ceph
[20:47] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[20:48] * itamar_ (~itamar@IGLD-84-228-8-119.inter.net.il) has joined #ceph
[20:49] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[20:49] * xmltok_ (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[20:51] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[20:54] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[20:55] * SkyEye (~gaveen@112.135.135.76) has joined #ceph
[20:56] * SkyEye (~gaveen@112.135.135.76) Quit ()
[20:56] * itamar_ (~itamar@IGLD-84-228-8-119.inter.net.il) Quit (Quit: Ex-Chat)
[20:59] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[20:59] * gaveen (~gaveen@112.135.132.110) Quit (Ping timeout: 480 seconds)
[21:13] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[21:13] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[21:23] * eternaleye_ is now known as eternaleye
[21:24] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[21:24] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[21:29] * The_Bishop (~bishop@e179002005.adsl.alicedsl.de) has joined #ceph
[21:30] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:31] * jrisch (~Adium@4505ds2-hi.0.fullrate.dk) has joined #ceph
[21:41] * Oliver1 (~oliver1@ip-178-203-175-61.unitymediagroup.de) Quit (Quit: Leaving.)
[21:43] * Oliver1 (~oliver1@ip-178-203-175-61.unitymediagroup.de) has joined #ceph
[21:47] <loicd> Leseb: nice user story ;-) I think there is a word missing around "instantly’ available on every web virtual."
[21:48] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[21:55] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) has joined #ceph
[21:57] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[21:57] * The_Bishop_ (~bishop@e179012071.adsl.alicedsl.de) has joined #ceph
[22:01] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:01] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[22:03] * amichel (~amichel@salty.uits.arizona.edu) has joined #ceph
[22:05] * The_Bishop (~bishop@e179002005.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[22:06] <gregaf> rlr219: I didn't proofread it, but the addition of enclosures makes sense and your rule changes to use them as the top-level divider (instead of hosts) look fine
[22:06] <gregaf> I do notice that enclosure 3 has half as many disks as the others, so base your performance expectations on its throughput and not the throughput of the others
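A rough sketch of what such an enclosure layer can look like in a decompiled CRUSH map; the type numbering and the rule step below are illustrative, not rlr219's actual map:

    # declare an extra bucket type between host and root
    type 0 osd
    type 1 host
    type 2 enclosure
    type 3 root
    # enclosure buckets then contain the host buckets, and the data rule
    # picks replicas across enclosures instead of hosts:
    step chooseleaf firstn 0 type enclosure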
[22:07] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[22:12] * rturk-away is now known as rturk
[22:13] <elder> Back in an hour or so.
[22:16] <rlr219> understood gregaf. Thanks!
[22:16] * sstan (~chatzilla@dmzgw2.cbnco.com) has joined #ceph
[22:17] <jmlowe> never mind, my pg's are still inconsistent following deep scrub
[22:18] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:21] <amichel> I was in a couple days ago getting some help with my crushmap, but we never got through with it. Is there anyone around that could continue helping me debug why my crushmap won't compile?
[22:22] * Oliver1 (~oliver1@ip-178-203-175-61.unitymediagroup.de) Quit (Quit: Leaving.)
[22:27] <dmick> amichel: I spent a while on that and didn't get anywhere other than "there's some simple syntax error" (i.e. I don't think it's a semantic error). I'll exert a little more effort on it now
[22:30] <amichel> dmick: Awesome, thanks for taking another crack at it
[22:32] <amichel> I think my pastebin expired, I threw it up on dpaste as well if you need another copy: http://dpaste.org/0pfnt/
[22:32] <amichel> I believe this revision includes the minor fixes we and others talked about last time
[22:33] <dmick> ok I'll start from there
[22:33] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Read error: Connection reset by peer)
[22:33] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[22:34] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[22:34] * Leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[22:35] * jrisch (~Adium@4505ds2-hi.0.fullrate.dk) Quit (Ping timeout: 480 seconds)
[22:36] <dmick> amichel: well alram found the error straightaway
[22:36] <dmick> rule data has step chooseleaf 0 firstn type host
[22:36] <dmick> 0 and firstn need to swap positions
[22:37] <dmick> then it compiles
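Spelled out, the broken step and its corrected form look like this (only the one line is shown; the rest of the rule stays as it was):

    # as posted: count and chooser mode swapped, which the parser rejects
    step chooseleaf 0 firstn type host
    # corrected: mode first, then the count (0 = as many replicas as the pool asks for)
    step chooseleaf firstn 0 type host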
[22:37] <amichel> wow
[22:37] <amichel> How did I miss that
[22:37] <dmick> the completely misleading error messages :)
[22:37] <dmick> 225 error: parse error at '# rules' is what I got this time
[22:37] <dmick> which is before the error :(
[22:37] <dmick> but anyway thanks alram
[22:37] <gregaf> I believe last time you posted it the error was that it was missing "firstn" completely, so apparently you put it in and said "didn't help!" :p
[22:38] <dmick> well I have one where that's fixed but it still failed
[22:39] <gregaf> I think there were like 3 and probably they all spit back the same parse error, so if you were trying one fix at a time...
[22:39] <dmick> yeah, I dunno.
[22:40] <amichel> I'm just not sure how I did the other two rules right and kept overlooking the order on that one
[22:40] <amichel> Yep, that compiles
[22:41] <amichel> And just as a point of procedure, I'm going into test with a single storage host, but I intend to add another before I go prod. Do I just add the appropriate lines to the crushmap after I add the new host and just set it again?
[22:41] <amichel> Doesn't destroy data or anything awful?
[22:43] <jks> hmm, I added a new osd to the system and let it run for some hours... now "ceph status" shows unfound pgs... I have never seen that before... how could that be?
[22:44] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[22:44] <dmick> gregaf: I've never modified crush on an existing cluster personally; can you verify ?
[22:45] <dmick> jks: all the OSDs still up and happy?
[22:45] <jks> dmick, yes, the status before was HEALTH_OK and nothing has happened to any of the existing osds
[22:46] <jks> sorry - that is not right... damn, I missed that
[22:46] <gregaf> dmick: amichel: I would use the CLI rather than manual decode/modify/inject
[22:46] <jks> hmmm, one of the osds crashed...
[22:47] <jks> the other osds on the same machine run without problems, and nothing happened to the disk that osd is serving
[22:47] <gregaf> the standard OSD addition (see eg http://ceph.com/docs/master/rados/operations/add-or-rm-osds/) includes updating the CRUSH map to include the OSD in data placement
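A hedged sketch of that CLI-driven CRUSH update; the id, weight, and bucket names are placeholders, and the exact argument order has shifted a little between releases:

    # place the new OSD in the CRUSH hierarchy without hand-editing the map
    ceph osd crush set osd.4 1.0 root=default host=newhost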
[22:47] <jks> the logs says: common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
[22:47] <jks> and then a stack trace
[22:48] <jks> perhaps that is connected with what I complained about earlier, that I added the osd and waited hours, but nothing seemed to happen... it wasn't filling up data on the new osd
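To dig further into the unfound PGs jks saw in "ceph status", the usual first steps are these (the pg id is a placeholder):

    ceph health detail          # lists which PGs report unfound objects
    ceph pg 2.1f query          # peering/recovery state of a single PG
    ceph pg 2.1f list_missing   # shows which objects are missing/unfound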
[22:48] * benpol (~benp@garage.reed.edu) has left #ceph
[22:49] * vata (~vata@2607:fad8:4:6:221:5aff:fe2a:d1dd) Quit (Quit: Leaving.)
[22:50] <dmick> jks: yeah, something's wrong. How did you add the new OSD?
[22:51] <jks> dmick, I followed this guide: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[22:52] <jks> so basically osd create, then added the auth, added the osd to the crush map and started the osd program
[22:52] <jks> the guide says to use "ceph osd in" to switch from out to in state... but it was automatically in the "in" state
[22:53] <jks> apart from that I followed the guide to the letter I think
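For completeness, the sequence jks describes from that guide looks roughly like this; the ids, keyring path, and hostname are placeholders, and the CRUSH step is the one sketched above:

    ceph osd create                      # allocates the next free osd id
    # (the guide also prepares the data dir and key with ceph-osd --mkfs --mkkey)
    ceph auth add osd.4 osd 'allow *' mon 'allow rwx' \
        -i /var/lib/ceph/osd/ceph-4/keyring
    ceph osd crush set osd.4 1.0 root=default host=newhost
    service ceph start osd.4             # or /etc/init.d/ceph start osd.4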
[22:54] <jks> this is not good... osd's on various servers are just crashing now
[22:55] <jks> and all I did was add an extra osd to an otherwise functioning cluster
[22:55] <jks> osd on a different server went: "7f605d21c700 -1 *** Caught signal (Aborted) **" and a stack trace :-(
[22:57] <amichel> Hm, I applied the new crushmap but it seems that all my existing pgs are now "stuck unclean"
[23:00] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[23:01] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[23:04] <dmick> jks: pastebin the stack trace?
[23:05] <dmick> jks: what version are you running?
[23:05] <jks> dmick, 0.56.1
[23:06] <dmick> amichel: what had you been using as a crushmap before?
[23:06] <amichel> Just the default
[23:06] <jks> dmick: http://pastebin.com/GnLr9XwV <-- stack trace
[23:06] <amichel> I think the problem is I set the ruleset to diversify on hosts, but I only have one host?
[23:07] <amichel> Though they're all min_size 1 and I do have one host, so that should be kosher I guess?
[23:08] <amichel> I'm still putting all this together in my head, apologies if I'm getting the concepts wrong
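If that guess is right, a common workaround for a one-host test cluster (an assumption here, not something taken from amichel's map) is to let the rules pick leaves across OSDs instead of hosts until the second host is added:

    # in each rule, spread replicas over osds rather than hosts
    step chooseleaf firstn 0 type osd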
[23:08] <dmick> jks: that's the original trace from the new OSD, right? Didn't you say a second, preexisting, OSD had died?
[23:08] <jks> dmick: that is the trace from the second, preexisting osd that died
[23:09] <jks> actually two pre-existing osds died
[23:09] <jks> the new OSD didn't die
[23:09] <dmick> ok
[23:09] <jks> I added the osd and it started filling the drive... when it reached about 11 GB of data on the drive, nothing more seemed to happen... I let it run for several hours, and still only 11 GB of data on the drive of the new osd
[23:10] <dmick> is your network connectivity still kosher? It really seems like the OSDs can't talk to each other
[23:10] <jks> after my questions here I decided to let it run for a few hours more before doing anything... and when I got back to check, now 2 other osds crashed with that stack trace
[23:10] <jks> dmick, yes, nothing is wrong with the network connectivity at all
[23:10] <jks> it is a test setup with a dedicated network - and no errors have been logged on the switch or anything like that
[23:11] <dmick> ceph osd dump shows?..
[23:11] <jks> it is behaving oddly now... something crashed again... it shows:
[23:11] <jks> osdmap e237: 4 osds: 2 up, 3 in
[23:12] <jks> dmick, here's the ceph osd dump: http://pastebin.com/4RWzy49t
[23:12] <jks> hmm, the same osds crashed again
[23:13] <jks> same "hit suicide timeout" message hmm
[23:13] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[23:13] <jks> I have been running this test system unchanged for a few months - and never had any problems like this before
[23:13] * Leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[23:13] * Leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:29] <Psi-jack> Oooooh, Ceph 0.56.1 is already out, and I still haven't even upgraded from 0.55-git! Grrr. ;)
[23:30] <dmick> jks: can you add, if you don't have them set, debug osd = 20 and debug filestore = 20 and debug ms = 1 to ceph.conf, and try starting one of the dying OSDs again?
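In ceph.conf those settings would look roughly like this, either under the global [osd] section or under a specific [osd.N] section to confine the extra logging to one daemon:

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1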
[23:31] <jks> dmick, I already restarted both of them some time ago... none of them have come "up" yet... so I don't dare stop them again in the middle of starting up
[23:31] <jks> but if they crash again, I'll do it of course
[23:31] <jks> I have checked the kernel logs, etc. - nothing seems to indicate anything wrong with the underlying filesystem or disks
[23:34] <dmick> jks: it should always be safe to kill an OSD; it's designed to handle failure
[23:34] <xdeller> jks: may I guess symptoms - eating one core, almost no disk activity and no appearing as 'up'?
[23:34] <dmick> but waiting is OK too
[23:34] <dmick> strace'ing the process might give a clue
[23:34] <dmick> is it logging anything in its current state?
[23:34] <jks> xdeller, it's not really using any cpu time, no
[23:34] <jks> it usually takes a while for an osd to restart
[23:35] <jks> I straced it earlier where it was doing a lot of stat() and unlink() on files named "FORREMOVAL...."
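For the record, attaching to a running ceph-osd the way jks describes might look like this (the pid is a placeholder; -f follows threads, -tt timestamps each call, and -e trace=file limits output to file operations like the stat()/unlink() calls mentioned):

    strace -f -tt -e trace=file -p 12345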
[23:36] <Psi-jack> When I upgrade to Ceph 0.56.1, I should be able to just take a full node offline, upgrade ceph's packages, and bring it back up, having mixed 0.55-git and 0.56.1 servers running in the same network cluster?
[23:37] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:37] <phantomcircuit> Psi-jack, i believe so
[23:37] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:37] <jks> sorry, gotta go... will be back later with debug logs
[23:37] <phantomcircuit> you should wait for 0.56.2 though
[23:37] <Psi-jack> I should?
[23:38] <phantomcircuit> 0.56 seems to have caused some problems for people
[23:38] <Psi-jack> There some serious issues in 0.56.1 that's not present in some random 0.55 git checkout from sometime in December? ;)
[23:38] <phantomcircuit> probably not lol
[23:38] <Psi-jack> heh
[23:38] <Psi-jack> Yeah, I'm not upgrading from Argonaut. I'm upgrading from a dev build. :)
[23:47] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:47] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:52] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:52] * mattbenjamin (~matt@aa2.linuxbox.com) Quit (Quit: Leaving.)
[23:58] * Cube1 is now known as Cube

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.