#ceph IRC Log


IRC Log for 2013-06-12

Timestamps are in GMT/BST.

[0:02] * darkfaded (~floh@88.79.251.60) has joined #ceph
[0:02] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Quit: jlogan1)
[0:03] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[0:06] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:07] * darkfader (~floh@88.79.251.60) Quit (Read error: Connection reset by peer)
[0:07] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[0:08] <andrei> sjust: i've stopped the slow server and just left the fast one running
[0:08] <andrei> dropped cache on the fast server
[0:09] <andrei> and will now test some disk performance
[0:22] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:27] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:28] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:34] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Remote host closed the connection)
[0:40] <andrei> sjust: when reading from osd's spinning disks I do not really see speeds over 40mb/s from the fast server
[0:40] <andrei> in general they are in the 20-30mb/s range
[0:40] <andrei> occasionally going to around 40mb/s
[0:40] <andrei> that is using 4 dd processes with bs=8M
[0:42] * tnt_ (~tnt@91.176.51.54) has joined #ceph
[0:42] <andrei> the picture is about the same when using 8 dds
[0:43] <andrei> the cumulative bandwidth of 8 dds with bs=4M is about 350mb/s
[0:44] <andrei> that is from one server with 9 osds
[0:44] <andrei> each capable of doing around 160MB/s
[0:44] * tnt (~tnt@228.199-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[0:46] <sjust> andrei: you dropped cache once at the beginning of the test?
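The read test andrei describes boils down to dropping the page cache once and then running several dd readers against the OSD data disks in parallel; a minimal sketch, with purely illustrative device names:

    sync; echo 3 > /proc/sys/vm/drop_caches         # drop the page cache once, before the run
    for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
        dd if=$dev of=/dev/null bs=8M count=1024 &  # one sequential reader per OSD disk
    done
    wait                                            # the cumulative rate is the sum of the dd reports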
[0:56] * mschiff (~mschiff@81.92.22.210) Quit (Remote host closed the connection)
[0:58] * Maskul (~Maskul@host-92-25-200-200.as13285.net) Quit (Quit: Maskul)
[1:02] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:04] * sagelap (~sage@63-237-196-66.dia.static.qwest.net) has joined #ceph
[1:09] <tnt_> Mmm, those slow requests on OSD restart again. And as someone pointed out, it's not when you shut down the OSD. It's when you start it again that it happens.
[1:09] <paravoid> tnt_: do you have pgs in a peering state?
[1:10] * aliguori (~anthony@32.97.110.51) Quit (Quit: Ex-Chat)
[1:11] <paravoid> I have both http://tracker.ceph.com/issues/5084 & http://tracker.ceph.com/issues/5297
[1:11] <paravoid> s/have/am experiencing/
[1:12] <tnt_> paravoid: yes, that's during the peering process. It seems as soon as it boots, it starts receiving IO requests for the PGs where it's "master" and it can't serve them yet. (AFAICT)
[1:12] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:12] <tnt_> it resolves but it can take a couple of mins where all VMs are stalled
[1:12] <paravoid> so you have pgs in a peering state that stay there for more than a few seconds?
[1:12] <tnt_> yup
[1:12] <paravoid> right
[1:12] <paravoid> that's #5084 I think
[1:13] <paravoid> which unlike its title, is not bobtail specific
[1:13] <paravoid> I was troubleshooting this with sjust last week, he's working on a fix
[1:13] <tnt_> Yup, on 0.61.3
[1:13] <paravoid> right, same here
[1:13] <sjust> paravoid, tnt_: I've got a branch I'm working on for master
[1:13] <sjust> cuttlefish may require a different approach
[1:14] <paravoid> tnt_: in my case, it happens just on the first restart after a while
[1:15] <tnt_> that makes the cluster update a bit tedious because I have to restart each OSD on each server sequentially by hand and wait for recovery between each (without using parallel automation or something).
[1:15] <paravoid> tnt_: i.e. restart the OSD, you wait until it fully peers (takes a while), then immediately restart again and it works properly
[1:15] <paravoid> yep, it was a PITA to upgrade bobtail->cuttlefish
[1:23] <paravoid> I'm glad someone else is experiencing it too
[1:23] <paravoid> sorry for you, but still :)
[1:23] <tnt_> :)
[1:24] <tnt_> I think I have both bugs as well, because I have slow reqs even after peering is done as well.
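A quick way to confirm whether PGs are stuck peering while an OSD comes back is to watch the cluster state; a sketch (dump_stuck accepts inactive/unclean/stale plus an optional threshold in seconds):

    ceph -s                        # look for "peering" in the pgmap summary
    ceph health detail             # lists the affected pgs when health is not OK
    ceph pg dump_stuck inactive    # pgs that have not gone active within the threshold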
[1:37] <andrei> sjust: yeah, that's right. before the tests i am dropping cache
[1:50] * sagelap1 (~sage@2600:1010:b005:ac97:58a9:62e4:7de2:95a9) has joined #ceph
[1:52] * sagelap (~sage@63-237-196-66.dia.static.qwest.net) Quit (Read error: Operation timed out)
[1:53] * LeaChim (~LeaChim@2.217.202.28) Quit (Ping timeout: 480 seconds)
[1:54] <tnt_> sagelap1: wrt to 5176, when you say 250/500, it's for paxos_trim_{min,max} or paxos_service_trim_{min,max} ?
[1:54] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[1:57] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[1:58] * redeemed (~quassel@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[1:58] * sagelap1 is now known as sagelap
[1:59] <sagelap> tnt_ i did both the same
[1:59] <sagelap> is that not what you meant?
[2:01] <tnt_> no I set 100/300 for paxos_trim_{min,max} and 250/500 for paxos_service_trim_{min,max}
[2:01] <tnt_> but I didn't really test other values, I just tried something between the old and the new.
[2:02] <tnt_> just setting 100/300 for paxos_trim_{min,max} didn't fix the issue by itself. So I then changed 250/500 for paxos_service_trim_{min,max} and that did the trick.
[2:03] <sagelap> sounds like having both smaller meant that larger bits weren't being compacted at once.
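For reference, the values tnt_ describes would live in the [mon] section of ceph.conf, roughly like this; the monitors need a restart to pick them up, and the numbers are tnt_'s experiment rather than a recommendation:

    [mon]
        paxos trim min = 100
        paxos trim max = 300
        paxos service trim min = 250
        paxos service trim max = 500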
[2:03] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[2:11] * sagelap (~sage@2600:1010:b005:ac97:58a9:62e4:7de2:95a9) Quit (Ping timeout: 480 seconds)
[2:22] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[2:23] * mrjack (mrjack@pD95F2849.dip0.t-ipconnect.de) has joined #ceph
[2:24] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:32] * wdk (~wdk@124-169-216-2.dyn.iinet.net.au) Quit (Quit: Leaving)
[2:34] * tnt_ (~tnt@91.176.51.54) Quit (Ping timeout: 480 seconds)
[2:38] * rturk is now known as rturk-away
[2:55] * xmltok_ (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[3:09] * sagelap (~sage@184.169.3.233) has joined #ceph
[3:22] <sagelap> tnt_: if you don't mind trying 250/500 for both that would help a bit. if it doesn't, a log would let us confirm we understand the problem
[3:25] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[3:28] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:34] * diegows (~diegows@190.190.2.126) has joined #ceph
[3:39] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[3:41] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Remote host closed the connection)
[3:41] * xmltok (~xmltok@relay.els4.ticketmaster.com) has joined #ceph
[3:42] * sagelap (~sage@184.169.3.233) Quit (Read error: Connection reset by peer)
[3:45] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[3:51] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has left #ceph
[4:02] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[4:07] * sagelap (~sage@2600:1012:b020:4257:58a9:62e4:7de2:95a9) has joined #ceph
[4:10] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:21] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:21] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:26] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[4:37] * sagelap (~sage@2600:1012:b020:4257:58a9:62e4:7de2:95a9) Quit (Quit: Leaving.)
[4:37] * sagelap (~sage@2600:1012:b020:4257:58a9:62e4:7de2:95a9) has joined #ceph
[4:43] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[4:45] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:46] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[4:51] * sagelap (~sage@2600:1012:b020:4257:58a9:62e4:7de2:95a9) Quit (Ping timeout: 480 seconds)
[4:51] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[4:55] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[4:55] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[5:02] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:04] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[5:06] * mrjack (mrjack@pD95F2849.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[5:06] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[5:07] * rongze1 (~zhu@173-252-252-212.genericreverse.com) has joined #ceph
[5:08] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:09] * rongze (~zhu@173-252-252-212.genericreverse.com) Quit (Ping timeout: 480 seconds)
[5:09] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[5:14] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[5:29] * haomaiwang (~haomaiwan@112.193.130.25) has joined #ceph
[5:31] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[5:42] * haomaiwang (~haomaiwan@112.193.130.25) Quit (Ping timeout: 480 seconds)
[5:43] * haomaiwang (~haomaiwan@119.6.74.149) has joined #ceph
[5:59] * RH-fred (~fred@95.130.8.50) Quit (Read error: Operation timed out)
[6:08] * Coyo (~coyo@00017955.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:12] * RH-fred (~fred@95.130.8.50) has joined #ceph
[6:15] * redeemed (~quassel@cpe-192-136-224-78.tx.res.rr.com) Quit (Read error: Connection reset by peer)
[6:17] * Coyo (~coyo@thinks.outside.theb0x.org) has joined #ceph
[6:18] * Coyo is now known as Guest47
[6:22] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[6:39] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[6:40] * alexk (~alexk@of2-nat1.sat6.rackspace.com) Quit (Ping timeout: 480 seconds)
[6:53] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[7:01] * xmltok (~xmltok@relay.els4.ticketmaster.com) Quit (Ping timeout: 480 seconds)
[7:14] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[7:15] <grepory> Does anyone happen to know what package the rbm kernel module comes from on CentOS/RHEL/and friends?
[7:15] <grepory> rbd. so late.
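A quick way to answer that on a RHEL/CentOS box is to ask the running kernel which file provides the module and which package owns it; note that stock EL6 kernels may not ship rbd at all:

    modinfo rbd                    # fails if the running kernel has no rbd module
    rpm -qf "$(modinfo -n rbd)"    # which package installed it (usually a kernel package)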
[7:52] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[8:05] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[8:06] * Almaty (~san@81.17.168.194) has joined #ceph
[8:12] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:15] * todin (tuxadero@kudu.in-berlin.de) Quit (Quit: leaving)
[8:15] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[8:37] * BManojlovic (~steki@178-222-75-71.dynamic.isp.telekom.rs) Quit (Ping timeout: 480 seconds)
[8:37] * tnt (~tnt@91.176.51.54) has joined #ceph
[8:45] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[8:46] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:55] * Machske (~Bram@d5152D87C.static.telenet.be) Quit ()
[8:55] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[8:57] * Cube (~Cube@38.80.203.93) has joined #ceph
[9:10] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:10] * ChanServ sets mode +v andreask
[9:16] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[9:23] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:24] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:35] * bergerx_ (~bekir@78.188.101.175) has joined #ceph
[9:35] * tnt (~tnt@91.176.51.54) Quit (Ping timeout: 480 seconds)
[9:55] * stocki78 (~paul@cerberus.einsurance.de) has joined #ceph
[9:57] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[10:06] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Ping timeout: 480 seconds)
[10:08] * LeaChim (~LeaChim@2.217.202.28) has joined #ceph
[10:10] * leseb (~Adium@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[10:11] * Machske (~Bram@d5152D8A3.static.telenet.be) has joined #ceph
[10:15] <loicd> sudo apt-get install linux-image-3.8.0-19-generic
[10:15] <loicd> ccourtaut, ^
[10:16] <ccourtaut> loicd: thanks :)
[10:28] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[10:29] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[10:31] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[10:46] * jgallard (~jgallard@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[10:46] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[10:51] * stocki78 (~paul@cerberus.einsurance.de) Quit (Quit: WeeChat 0.4.0)
[11:02] * KindTwo (~KindOne@h27.57.186.173.dynamic.ip.windstream.net) has joined #ceph
[11:02] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[11:04] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:04] * KindTwo is now known as KindOne
[11:09] * stocki78 (~paul@cerberus.einsurance.de) has joined #ceph
[11:09] * stocki78 (~paul@cerberus.einsurance.de) Quit ()
[11:23] <tnt> Am I the only one where "radosgw-admin bucket list" isn't working anymore ?
[11:24] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[11:25] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[11:29] <ofu_> so I created an rbd image for use with libvirt and qemu/kvm according to http://www.hastexo.com/resources/hints-and-kinks/migrating-virtual-machines-block-based-storage-radosceph
[11:29] <ofu_> virsh secret list still says unused ?
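The usual libvirt flow for this is to define a ceph secret and then load the actual key into it; a minimal sketch, where the UUID is whatever secret-define prints and client.admin is just the example user:

    # contents of secret.xml:
    #   <secret ephemeral='no' private='no'>
    #     <usage type='ceph'>
    #       <name>client.admin secret</name>
    #     </usage>
    #   </secret>
    virsh secret-define --file secret.xml          # prints the new secret's UUID
    virsh secret-set-value --secret <UUID> --base64 "$(ceph auth get-key client.admin)"
    # the domain's <disk> definition must then reference that UUID in its <auth>/<secret> element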
[11:55] * haomaiwang (~haomaiwan@119.6.74.149) Quit (Ping timeout: 480 seconds)
[12:01] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[12:03] * TMM_ (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[12:05] * TMM_ is now known as TMM
[12:05] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit ()
[12:05] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[12:13] * Cube (~Cube@38.80.203.93) Quit (Quit: Leaving.)
[12:14] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:19] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[12:21] * mschiff (~mschiff@tmo-110-41.customers.d1-online.com) has joined #ceph
[12:27] <jks> anyone got an idea of whether the new HP Microservers will be powerful enough to run Ceph osds?
[12:28] <jks> (Celeron G1610T 2-core 2.3 Ghz or Pentium G2020T 2-core 2.5 Ghz.... 4 disks)
[12:34] * Gugge-47527 (gugge@kriminel.dk) Quit (Read error: Connection reset by peer)
[12:34] * Gugge-47527 (gugge@kriminel.dk) has joined #ceph
[12:38] * leseb (~Adium@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[12:38] * mschiff (~mschiff@tmo-110-41.customers.d1-online.com) Quit (Read error: No route to host)
[12:39] * __jt___ (~james@rhyolite.bx.mathcs.emory.edu) has joined #ceph
[12:39] * julian (~julianwa@125.70.132.9) has joined #ceph
[12:40] * __jt__ (~james@rhyolite.bx.mathcs.emory.edu) Quit (Ping timeout: 480 seconds)
[12:44] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[12:56] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Ping timeout: 480 seconds)
[13:02] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[13:03] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:03] * ChanServ sets mode +v andreask
[13:07] * goldfish (~goldfish@91.215.166.4) Quit (Remote host closed the connection)
[13:08] * leseb (~Adium@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[13:12] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[13:19] * leseb (~Adium@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[13:22] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[13:23] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:24] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:36] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[13:42] <mxmln> first benchmark test on my two node cluster using tmpfs [INF] bench: wrote 1024 MB in blocks of 4096 KB in 7.600207 sec at 134 MB/sec
[13:42] <mxmln> journal on tmpfs
[13:44] * leseb (~Adium@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[13:45] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[13:45] <mxmln> both nodes are connected via crossover to 10G interface for public and private network
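That [INF] line looks like the output of the built-in OSD bench, which by default writes 1 GB in 4 MB blocks and reports the result to the cluster log; depending on the release the invocation is one of:

    ceph osd tell 0 bench          # older CLI form
    ceph tell osd.0 bench          # newer CLI form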
[13:46] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[13:52] <tnt> mxmln: you're aware that if you lose a journal, you can throw away the complete OSD, right?
[13:54] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:57] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) Quit (Ping timeout: 480 seconds)
[13:58] <mxmln> yes but with the journal on a hard disk I get almost half of the performance... so I thought maybe an hourly backup from tmpfs is cheaper
[13:59] <tnt> if you don't have the latest version, it's useless (at least with the osd on xfs). With btrfs it's a bit better, but if you lose the two osds containing the two replicas, you're f**** as well AFAIK.
[14:02] * diegows (~diegows@190.190.2.126) has joined #ceph
[14:04] <andreask> add a nice ssd disk for your journal
[14:06] * jgallard (~jgallard@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[14:06] * tnt wants a battery-backed ram disk on SATA ...
[14:07] <mxmln> I see, just found 2 hotplug ssd's on the box I give a try
[14:07] <andreask> "found" .. nice ;-)
[14:07] <tnt> I wish I just 'found' SSD plugged on the box.
[14:08] <andreask> me too
[14:14] <Kdecherf> For those who are familiar with fs benchmarking, which tool do you use (not for cephfs)?
[14:14] <tnt> bonnie++ / fio / dd
[14:15] <tnt> although the 'raw' numbers might not mean much, at least you can compare several options / tweaks to find the best performer.
[14:16] <Kdecherf> my need is to benchmark an fs with a lot of small files, but thanks
[14:27] <andreask> Kdecherf: you can give filebench a try http://sourceforge.net/apps/mediawiki/filebench/index.php?title=Filebench#Personalities
[14:29] <Kdecherf> andreask: thx
[14:29] <andreask> Kdecherf: they have quite some predefined tests for various use-cases
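If fio ends up being used for the small-file case anyway, a hedged sketch of such a job; every count, size and path here is made up for illustration:

    fio --name=smallfiles --directory=/mnt/test --ioengine=libaio --direct=1 \
        --rw=randwrite --bs=4k --nrfiles=10000 --filesize=16k --numjobs=4 \
        --runtime=60 --time_based --group_reporting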
[14:36] * haomaiwang (~haomaiwan@124.161.74.128) has joined #ceph
[14:42] * mschiff (~mschiff@tmo-110-41.customers.d1-online.com) has joined #ceph
[14:45] * hufman (~hufman@rrcs-67-52-43-146.west.biz.rr.com) has left #ceph
[14:54] * markbby (~Adium@168.94.245.1) has joined #ceph
[14:55] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:04] * lx0 is now known as lxo
[15:04] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Remote host closed the connection)
[15:10] <andrei> wido: hello, are you online?
[15:10] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[15:12] * Muhlemmer (~kvirc@cable-88-137.zeelandnet.nl) Quit (Quit: KVIrc 4.3.1 Aria http://www.kvirc.net/)
[15:15] * mschiff (~mschiff@tmo-110-41.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[15:16] <andrei> is anyone using kvm with ceph?
[15:17] <andrei> i am having some write issues using direct flag
[15:17] <andrei> i was hoping someone could help me
[15:22] * TiCPU (~jeromepou@190-130.cgocable.ca) Quit (Read error: No route to host)
[15:24] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[15:26] <andreask> whats the problem?
[15:29] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:35] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[15:42] * mschiff (~mschiff@tmo-110-41.customers.d1-online.com) has joined #ceph
[15:50] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) has joined #ceph
[15:55] * haomaiwang (~haomaiwan@124.161.74.128) Quit (Ping timeout: 480 seconds)
[15:55] * haomaiwang (~haomaiwan@124.161.74.128) has joined #ceph
[15:56] <andrei> andreask: the problem that I have relates to 4k block size random writes
[15:56] <andrei> i am using fio with 4 jobs and 16 iothreads as a test
[15:56] <andrei> direct=1
[15:57] <andrei> after about 10 mins i either get a kernel panic
[15:57] <andrei> or hang task notifications
[15:57] <andrei> or cpu softlocks
[15:57] <andrei> this doesn't happen when I do not use direct=1 flag
[15:57] <andrei> or when I am using tests with large block sizes, like 1M or 4M
[15:58] <andrei> this is on ceph 0.61.3
[15:58] <andrei> with qemu 1.5.0 + libvirt 1.0.6
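For reference, the test andrei describes corresponds roughly to this fio invocation run inside the guest; the file path is illustrative, while the 200G size and the remaining flags are taken from his description:

    fio --name=randwrite-4k --filename=/data/fio.test --size=200G \
        --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
        --numjobs=4 --iodepth=16 --group_reporting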
[15:59] <andreask> kernel panic?
[16:02] * diegows (~diegows@190.190.2.126) has joined #ceph
[16:02] * TMM is now known as TmT
[16:03] * TmT is now known as Tmm
[16:03] * Tmm is now known as tmM
[16:03] <tmM> whoop
[16:03] <tmM> sorry guys, joke for another channel
[16:03] <tmM> forget I was in more channekls
[16:03] * tmM is now known as TMM
[16:14] * Steve_ (~Steve@c-75-68-52-7.hsd1.nh.comcast.net) has joined #ceph
[16:18] <andrei> andreask: yeah, kernel panic also happens sometimes
[16:19] <andrei> not all the time, but i've seen this happening
[16:25] <andreask> so your system is completely overloaded?
[16:26] * alexk (~alexk@of2-nat1.sat6.rackspace.com) has joined #ceph
[16:27] * mschiff (~mschiff@tmo-110-41.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[16:38] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) Quit (Remote host closed the connection)
[16:40] <andrei> andreask: what do you mean?
[16:40] <andrei> i am just running performance tests with fio
[16:40] <andrei> and it causes stability issues
[16:41] <andrei> so, i am trying to figure out what is causing these issues and how to resolve them
[16:44] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[16:44] <andrei> and these issues tend to happen only with 4K block size and direct=1 flag
[16:49] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[16:50] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[16:57] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[16:58] * drokita (~drokita@199.255.228.128) has joined #ceph
[16:59] <drokita> I have some monitor processes that won't. The log doesn't grow and provides no detail. What are some possible causes? Filesystems are not at 100%.
[16:59] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[16:59] <drokita> monitor processes that won't 'start' :)
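When a mon dies silently like that, running it in the foreground with verbose logging usually shows why; a sketch, with 'a' standing in for the mon id:

    ceph-mon -i a -d --debug-mon 20 --debug-ms 1    # -d: stay in the foreground and log to stderr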
[16:59] <nhm> andrei: fyi, I've done quite a bit of performance testing with fio and qemu/kvm on 0.61.2, but I was using stock 12.04 qemu. I also used direct=1 and libaio engine, but only 1 job (with various iodepth values). I didn't run into those panics.
[16:59] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[17:00] <nhm> I did start having some issues with iodepth=16 on 1 host with 16 guests, but I suspect that was because I was *really* overloading the host.
[17:01] <andrei> nhm: how many concurrent tests did you run? from how many client hosts and guest vms?
[17:01] <andrei> nhm: have you tried with bs=4k option?
[17:01] <andrei> coz I do not have any issues at all with large block sizes
[17:01] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:02] <nhm> andrei: I ran with bs=4k, bs=128k, and bs=4m. 1 OSD server with 24 OSDs, and 1 client host with 1-16 guests.
[17:02] <nhm> with iodepth from 1-16 (doubling for each test)
[17:02] <nhm> so 1,2,4,8,16 guests and 1,2,4,8,16 io depth
[17:03] <andrei> nhm: let me check with jobsnum=1
[17:03] <nhm> though I stopped the 16 guest tests because it was having issues (probably due to limited memory)
[17:03] <andrei> coz i was using 4 jobs with 16 depth
[17:04] * mikedawson (~chatzilla@23-25-2-142-static.hfc.comcastbusiness.net) has joined #ceph
[17:05] <andrei> what is the size of the original file you are testing with?
[17:05] * portante (~user@64.251.112.55) has joined #ceph
[17:05] <nhm> andrei: 64G file on a 100G volume
[17:07] <andrei> well, i guess it shouldn't matter, but i was testing with 200G files on 1TB volume formated as ext4
[17:07] <nhm> If I recall the volumes I was using were formatted with xfs
[17:08] <andrei> i think i was having the same issues with xfs as well
[17:08] * julian (~julianwa@125.70.132.9) Quit (Quit: afk)
[17:08] <andrei> i've just tried with 128k and 32k block sizes and i do not have any issues
[17:08] <andrei> i will try with 8k now
[17:09] <andrei> and switch to 1 job instead
[17:09] <andrei> nhm: how long did you run the tests for?
[17:10] <nhm> andrei: I ran probably a couple thousand tests for 60s each
[17:10] <andrei> i see
[17:10] <andrei> by the way, what performance were you gettign with 16 ios on 4k?
[17:11] <andrei> by the way, were these --rw=randwrite tests ?
[17:11] <nhm> andrei: read, write, randread, randwrite
[17:12] <andrei> did you use rbd caching on the ceph side and cache=writethrough on the kvm side?
[17:12] * joshd1 (~jdurgin@2602:306:c5db:310:21b4:f1a4:b7a8:80fa) Quit (Ping timeout: 480 seconds)
[17:15] * RH-fred (~fred@95.130.8.50) Quit (Quit: Quitte)
[17:16] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) has joined #ceph
[17:18] * sagelap (~sage@2600:1012:b02b:14d5:58a9:62e4:7de2:95a9) has joined #ceph
[17:20] * Machske (~Bram@d5152D8A3.static.telenet.be) Quit ()
[17:21] <andrei> nhm: did you notice any performance issues while you were testing? I am seeing a lot of bandwidth stalls and peaks and lows
[17:22] * joshd1 (~jdurgin@2602:306:c5db:310:ec5c:63c2:e9dc:929c) has joined #ceph
[17:25] <andrei> it seems that 4k and 8k block sizes cause the problem with cpu softlocks
[17:25] <andrei> and I've just had a kernel panic with 8k
[17:25] <andrei> after about 15 minutes of testing
[17:38] * elder_ (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[17:39] * portante (~user@64.251.112.55) Quit (Ping timeout: 480 seconds)
[17:39] * elder_ (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[17:40] * mikedawson (~chatzilla@23-25-2-142-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[17:43] <L2SHO> can someone point me to some information about how to retrieve radosgw usage stats and logs?
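Once usage logging is enabled (it comes up again later in this log), the stats come out of radosgw-admin; a sketch with placeholder uid and dates:

    radosgw-admin usage show --uid=johndoe --start-date=2013-06-01 --end-date=2013-06-12
    radosgw-admin usage show --show-log-entries=false    # totals only, across all users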
[17:43] * bergerx_ (~bekir@78.188.101.175) Quit (Quit: Leaving.)
[17:46] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:50] <nhm> andrei: sorry, got distracted. I had rbd caching on and cache=writeback
[17:50] <andrei> nhm: no probs
[17:50] <andrei> doing some testing
[17:51] <andrei> i have a combination of cache=none and cache=writeback vms
[17:51] <andrei> and with 4k size i am seeing far better performance with cache=none
[17:51] <andrei> still performance is not really good (((
[17:52] <andrei> nhm: did you make any changes to the size of rbd cache? or did you leave it as default?
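For reference, the rbd cache knobs live in the [client] section of ceph.conf on the hypervisor side; a sketch with the stock defaults spelled out (values are bytes):

    [client]
        rbd cache = true
        rbd cache size = 33554432            # 32 MB default
        rbd cache max dirty = 25165824       # 24 MB default
        rbd cache target dirty = 16777216    # 16 MB default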
[17:53] * denken (~denken@dione.pixelchaos.net) Quit (Ping timeout: 480 seconds)
[17:54] * sagelap1 (~sage@2600:1012:b02b:14d5:c685:8ff:fe59:d486) has joined #ceph
[17:54] * sagelap (~sage@2600:1012:b02b:14d5:58a9:62e4:7de2:95a9) Quit (Read error: Connection reset by peer)
[17:57] * tnt (~tnt@91.176.51.54) has joined #ceph
[18:01] <Gugge-47527> im playing around with ceph-deploy / cuttlefish / ubuntu 12.04.2 , and see all the ceph services are started using upstart
[18:02] <Gugge-47527> i cant find out what is mounting my osd disks though
[18:02] <Gugge-47527> they are just magically mounted on boot :)
[18:03] * haomaiwang (~haomaiwan@124.161.74.128) Quit (Ping timeout: 480 seconds)
[18:03] * haomaiwang (~haomaiwan@112.193.130.141) has joined #ceph
[18:05] * Almaty (~san@81.17.168.194) Quit (Remote host closed the connection)
[18:05] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:08] * sagelap1 (~sage@2600:1012:b02b:14d5:c685:8ff:fe59:d486) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * joshd1 (~jdurgin@2602:306:c5db:310:ec5c:63c2:e9dc:929c) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * __jt___ (~james@rhyolite.bx.mathcs.emory.edu) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * rongze1 (~zhu@173-252-252-212.genericreverse.com) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * dxd828 (~dxd828@195.191.107.205) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * PodMan99 (~keith@dr-pepper.1stdomains.co.uk) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * gregaf (~Adium@2607:f298:a:607:fda9:e687:6e3c:62d0) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * madkiss1 (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * SWAT (~swat@cyberdyneinc.xs4all.nl) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * sagewk (~sage@38.122.20.226) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * Guest668 (~jeremy@ip23.67-202-99.static.steadfastdns.net) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * `10__ (~10@juke.fm) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * scalability-junk (uid6422@id-6422.hillingdon.irccloud.com) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * wogri (~wolf@nix.wogri.at) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * masterpe (~masterpe@2001:990:0:1674::1:82) Quit (synthon.oftc.net resistance.oftc.net)
[18:08] * [cave] (~quassel@boxacle.net) Quit (synthon.oftc.net resistance.oftc.net)
[18:10] <L2SHO> Gugge-47527, look at line 302 of /etc/init.d/ceph
[18:11] * joshd1 (~jdurgin@2602:306:c5db:310:ec5c:63c2:e9dc:929c) has joined #ceph
[18:11] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[18:11] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) has joined #ceph
[18:11] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[18:11] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[18:11] * __jt___ (~james@rhyolite.bx.mathcs.emory.edu) has joined #ceph
[18:11] * rongze1 (~zhu@173-252-252-212.genericreverse.com) has joined #ceph
[18:11] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[18:11] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[18:11] * PodMan99 (~keith@dr-pepper.1stdomains.co.uk) has joined #ceph
[18:11] * gregaf (~Adium@2607:f298:a:607:fda9:e687:6e3c:62d0) has joined #ceph
[18:11] * madkiss1 (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[18:11] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) has joined #ceph
[18:11] * SWAT (~swat@cyberdyneinc.xs4all.nl) has joined #ceph
[18:11] * eternaleye (~eternaley@2002:3284:29cb::1) has joined #ceph
[18:11] * sagewk (~sage@38.122.20.226) has joined #ceph
[18:11] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[18:11] * Guest668 (~jeremy@ip23.67-202-99.static.steadfastdns.net) has joined #ceph
[18:11] * `10__ (~10@juke.fm) has joined #ceph
[18:11] * scalability-junk (uid6422@id-6422.hillingdon.irccloud.com) has joined #ceph
[18:11] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[18:11] * wogri (~wolf@nix.wogri.at) has joined #ceph
[18:11] * [cave] (~quassel@boxacle.net) has joined #ceph
[18:11] * masterpe (~masterpe@2001:990:0:1674::1:82) has joined #ceph
[18:12] <Gugge-47527> L2SHO: as far as i can tell that is not it, it is mounted even if i delete /etc/init.d/ceph
[18:13] <Gugge-47527> i would expect it to use some upstart script for the mount, but i dont know :)
[18:13] * LeaChim (~LeaChim@2.217.202.28) Quit (Ping timeout: 480 seconds)
[18:15] * DarkAceZ (~BillyMays@50.107.55.63) Quit (Ping timeout: 480 seconds)
[18:16] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[18:17] <Gugge-47527> the documentation says GPT labels are used to automatically mount, but how :)
[18:20] <L2SHO> Gugge-47527, I dunno then, I can't stand this whole automatically do everything architecture. I'd much prefer to just update fstab myself
[18:21] <Gugge-47527> i know, that is how i want to know where it is done :)
[18:22] * LeaChim (~LeaChim@2.217.202.28) has joined #ceph
[18:23] <L2SHO> Gugge-47527, if you do "ceph osd dump" you can see the partition's guid at the end of the osd lines. So maybe the mounting logic is built into ceph-osd
[18:24] <Gugge-47527> well, stopping ceph-all, unmounting, and starting ceph-all does not work
[18:25] <Gugge-47527> so i guess the mounting is not done in the ceph upstart scripts :)
[18:26] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[18:27] * virustrojan (~virustroj@203-213-59-61.tpgi.com.au) has joined #ceph
[18:28] <virustrojan> hiii
[18:29] <mxmln> did someone try the sqlio benchmark inside a win2k8r2 vm using virtio drivers??
[18:29] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[18:30] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[18:31] <andrei> nhm: funny thing. I've installed a centos 6.4 guest vm and ran the same fio test without any issues
[18:32] <andrei> no problems whatsoever with the writes of 4k blocks
[18:32] <nhm> andrei: interesting!
[18:32] <nhm> andrei: different kernel?
[18:32] <andrei> i've noticed ubuntu uses deadline scheduler by default and centos uses cfq
[18:32] <andrei> so i am testing ubuntu with cfq now
[18:32] <andrei> yeah, centos is on 2.6.32
[18:32] <andrei> and ubuntu uses 3.5
[18:33] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:33] <nhm> andrei: was the panic always the same?
[18:33] <andrei> yup
[18:34] <L2SHO> Gugge-47527, check /lib/udev/rules.d/95-ceph-osd.rules, maybe thats how it's done
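That udev rules file is how the automatic mounting happens: a rule matches the Ceph OSD GPT partition type GUID and calls ceph-disk-activate, which mounts the partition and starts the osd. The rule is roughly of this shape; exact contents vary by version:

    ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", \
      ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
      RUN+="/usr/sbin/ceph-disk-activate /dev/$name"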
[18:34] <andrei> relating to swapper process
[18:34] <andrei> i was checking mem usage
[18:34] <andrei> coz i thought vm ran out of mem
[18:34] <virustrojan> kik ya
[18:34] <virustrojan> lol
[18:34] <andrei> and that doesn't seems to be the case
[18:34] <virustrojan> sorry
[18:34] <andrei> vm guest was using around 500mb out of 4gb allocated
[18:35] <nhm> andrei: yeah, that seems suspect. You might try the ubuntu 3.8 kernel?
[18:35] * leseb (~Adium@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[18:35] <nhm> though I'm pretty sure I was using the 3.5 kernel
[18:36] <virustrojan> or debian is okay
[18:37] <andrei> nhm: changing to cfq scheduler didn't help
[18:37] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:37] <andrei> it just panicked
[18:37] <virustrojan> y
[18:39] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 482 seconds)
[18:43] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[18:43] <nhm> andrei: maybe some interaction with the version of qemu?
[18:43] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[18:43] <nhm> are you using virtio?
[18:46] * DarkAceZ (~BillyMays@50.107.55.63) has joined #ceph
[18:50] * denken (~denken@dione.pixelchaos.net) has joined #ceph
[18:56] * Tamil (~tamil@38.122.20.226) has joined #ceph
[18:57] <virustrojan> no
[19:00] <Gugge-47527> L2SHO: that seems to be it :)
[19:01] <virustrojan> no
[19:02] <andrei> nhm: yes
[19:02] <andrei> using libio
[19:03] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:03] <andrei> sorry virtio
[19:03] <andrei> i've upgraded one of the ubuntus to 13.04 and will do some more testing
[19:03] * jluis (~JL@89-181-151-112.net.novis.pt) has joined #ceph
[19:04] * jluis (~JL@89-181-151-112.net.novis.pt) Quit ()
[19:04] * jluis (~JL@89.181.151.112) has joined #ceph
[19:05] * joao (~JL@89-181-151-112.net.novis.pt) Quit (Remote host closed the connection)
[19:05] * jluis (~JL@89.181.151.112) Quit (Remote host closed the connection)
[19:05] * joao (~JL@89.181.151.112) has joined #ceph
[19:05] * ChanServ sets mode +o joao
[19:06] <virustrojan> awwwwwwww
[19:06] <virustrojan> im not a operator
[19:06] <virustrojan> fuqq
[19:06] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[19:08] * joao sets mode +b *!*virustroj@*.tpgi.com.au
[19:09] * virustrojan was kicked from #ceph by joao
[19:10] <Kdecherf> Anybody knows when ceph 0.64 will be released?
[19:11] * elder_ (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[19:12] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Read error: Operation timed out)
[19:18] * mikedawson (~chatzilla@23-25-2-142-static.hfc.comcastbusiness.net) has joined #ceph
[19:21] * rturk-away is now known as rturk
[19:26] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:27] * jasdeepH (~jasdeepH@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[19:31] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[19:32] <cjh_> kdecherf: my guess would be about 2 weeks to a month from now?
[19:33] <Kdecherf> hm
[19:35] <jasdeepH> question about ceph read consistency: I was reading the Rados paper and saw that the OSD responsible for reads for a PG will block if it is not receiving heartbeats from the other OSDs in the acting set. How many OSDs in the acting set does it need to receive a heartbeat from in the 2 second interval before it decides to serve the read?
[19:35] * simplicityAgent (~hackman@76.74.153.36) has joined #ceph
[19:36] <simplicityAgent> Good morning. Anyone active in here?
[19:38] * jeffhung_ (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (Server closed connection)
[19:38] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[19:39] * diegows (~diegows@190.190.2.126) has joined #ceph
[19:41] <simplicityAgent> Anyone have time for some beginner questions?
[19:42] <joshd1> jasdeepH: I think the heartbeating has changed a bit since that paper, but sjusthm would know specifics
[19:42] * Azrael (~azrael@terra.negativeblue.com) Quit (Server closed connection)
[19:42] * Azrael (~azrael@terra.negativeblue.com) has joined #ceph
[19:43] * Azrael is now known as Guest124
[19:43] <sjusthm> jasdeepH: that part doesn't quite work as described in the paper
[19:44] <sjusthm> jasdeepH: the osd responsible for reads will continue to serve reads until it receives a map removing responsibility
[19:44] <jasdeepH> how does that OSD know it has the latest copy of the object?
[19:45] <sjusthm> jasdeepH: however, prior to starting to serve reads and writes, a new osd will try to contact the old one to apprise it of the new situation
[19:46] <sjusthm> jasdeepH: the osd responsible for reads for a pg also coordinates writes for that pg
[19:46] <sjusthm> before a new osd takes over, it will try to contact the old one
[19:46] <jasdeepH> sjusthm: thanks, thought that might not be the case with chain/splay replication strategies
[19:47] <sjusthm> jasdeepH: those aren't implemented currently
[19:47] * simplicityAgent (~hackman@76.74.153.36) Quit (Quit: simplicityAgent)
[19:51] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[19:52] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[19:58] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:02] * portseven (~fuckoff@port7.vm.bytemark.co.uk) Quit (Server closed connection)
[20:02] * portseven (~fuckoff@port7.vm.bytemark.co.uk) has joined #ceph
[20:04] * rturk is now known as rturk-away
[20:08] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[20:10] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[20:14] * jtang (~jtang@sgenomics.org) Quit (Server closed connection)
[20:15] * jtang (~jtang@sgenomics.org) has joined #ceph
[20:20] * haomaiwang (~haomaiwan@112.193.130.141) Quit (Ping timeout: 480 seconds)
[20:21] * haomaiwang (~haomaiwan@119.4.173.3) has joined #ceph
[20:40] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[20:46] * haomaiwang (~haomaiwan@119.4.173.3) Quit (Read error: Connection reset by peer)
[20:53] * Faron (~leo@dslb-188-096-219-128.pools.arcor-ip.net) has joined #ceph
[20:55] <Faron> hi, we have set up a cluster of 5 machines. After it worked yesterday, we had to change the ip of some nodes and are trying to repair it now
[20:55] <Faron> i always get: 2013-06-12 20:48:56.630466 7f931057d760 -1 unable to authenticate as client.admin
[20:55] <Faron> 2013-06-12 20:48:56.630997 7f931057d760 -1 ceph_tool_common_init failed.
[20:55] <Faron> the usual google says to check all keys, which we did
[20:55] <Faron> could someone help us?
[20:56] <Faron> and i do not get an osd up
[21:00] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[21:01] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[21:14] <L2SHO> is there a way to list all radosgw users?
[21:22] * diegows (~diegows@190.190.2.126) has joined #ceph
[21:23] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[21:23] <Faron> L2SHO: you mean admin.client?
[21:23] <Faron> L2SHO: sorry, i thought you were answering me …
[21:25] <L2SHO> Faron, I mean, I added some S3 users with "radosgw-admin user create ...", and now I want to get a list of all the users in there
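There was no dedicated 'list users' command at this point; one common workaround is to list the user metadata objects directly, assuming radosgw is using the default pool names:

    rados -p .users.uid ls        # object names correspond to radosgw user ids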
[21:25] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[21:27] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[21:28] <Faron> L2SHO: sorry, no idea, working on ceph now for 4 days
[21:29] * Tamil (~tamil@38.122.20.226) has joined #ceph
[21:31] <Tamil> Faron: when and where do you see this error?
[21:31] <Faron> ceph health
[21:32] <Tamil> Faron: ceph auth list displays your keyring files?
[21:34] <Faron> root@marvin03:~# ceph auth list
[21:34] <Faron> no installed auth entries!
[21:34] <Faron> no it does not
[21:35] <Faron> Tamil: we had just set:
[21:35] <Faron> auth cluster required = none
[21:35] <Faron> auth service required = none
[21:35] <Faron> auth client required = none
[21:35] <Faron> in global section
[21:36] <Faron> when do the mons use port 6800 and when 6789?
[21:42] <dmick> 6789 is the default listening port; more mons will build up from there. 6800 is the default listening port for OSDs, and again, build up from there. IIRC.
[21:43] <Tamil> Faron: what version of ceph are you using?
[21:43] <Faron> Tamil: 0.61.3-1~bpo70+1 from debian packages from ceph.com for debian wheezy
[21:45] <Tamil> Faron: how did you deploy your cluster?
[21:47] <Faron> ceph deploy
[21:47] <Faron> from http://ceph.com/docs/next/start/
[21:48] <Faron> we had the problem to had to replace the first 'mon initial members'
[21:50] <Faron> now i have inserted every mon into my ceph.conf and all 3 talk now, but i cannot get the osds up
[21:50] <Tamil> Faron: how did you verify the keys are there?
[21:51] <Faron> i have looked into them
[21:51] <Faron> ooops now there are 5 osds up, one is still missing
[21:53] * mikedawson (~chatzilla@23-25-2-142-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[21:55] <Faron> Tamil: http://pastebin.com/aziBp5CS
[21:56] <Faron> strange error, and some time later the osd is up
[21:59] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[22:01] <Tamil> Faron: what did you do to repair your cluster after the ip addresses were changed?
[22:02] <Faron> a monitor started on port 6800 or 6789, no matter what i set up in ceph.conf
[22:02] <Faron> then i put in ceph.conf every monitor with his 'chosen' port
[22:03] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[22:04] <Faron> without auth, they talked a little bit. After that i tried to start the osds, got the error in the pastebin for every osd, but after some time they were up
[22:04] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[22:04] * mschiff (~mschiff@port-34432.pppoe.wtnet.de) has joined #ceph
[22:05] <Faron> i removed the crazy monitor now
[22:05] <Faron> good to have 3 of them
[22:07] <Tamil> Faron: ceph-deploy has cephx enabled by default in ceph.conf, when did you modify ceph.conf to include "auth cluster required=none,..."?
[22:09] <Faron> Tamil: when the monitors always told me 'unable to authenticate as client.admin'
[22:10] * rturk-away is now known as rturk
[22:10] <Faron> Tamil: after the rebuild is done, i will try to reenable auth. my colleague is setting up a new monitor as a replacement for 'the crazy one'
[22:10] <Tamil> Faron: after disabling cephx, did you restart the cluster?
[22:12] <Faron> yes
[22:12] <Faron> (multiple times)
[22:13] <Tamil> Faron: "unable to authenticate as client.admin" means your client.admin keyring is not right
[22:13] <Tamil> Faron: I also hope you pushed your ceph.conf changes to other nodes in the cluster
[22:16] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) has joined #ceph
[22:17] <Faron> Tamil: i did
[22:17] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[22:18] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:18] * ChanServ sets mode +v andreask
[22:18] <Faron> Tamil: what could be wrong with that?
[22:18] <Faron> root@marvin03:~# cat /etc/ceph/ceph.client.admin.keyring
[22:18] <Faron> [client.admin]
[22:18] <Faron> key = AQDEz7RRkIZ0CxAAEjO7FKOchGbZi9HVTdHipw==
[22:18] <dmick> Faron: with ceph-deploy, you need not mention monitor addrs in ceph.conf
[22:18] <Faron> i found hints with problems with keys with / and \
[22:19] <dmick> Faron: have you set any 'keyring' in your ceph.conf? and is that keyring file readable by the user you're running commands as?
[22:20] <Faron> dmick: the problem with ceph-deploy was that if the 'initial' monitor is down, nothing works
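dmick's two checks boil down to something like this, assuming the default paths:

    ls -l /etc/ceph/ceph.client.admin.keyring    # readable by the user running the ceph command?
    ceph --id admin --keyring /etc/ceph/ceph.client.admin.keyring health    # bypass any keyring= set in ceph.conf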
[22:22] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:22] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[22:22] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[22:25] <dmick> it's possible to set 'mon initial members' and/or 'mon host' to multiple values
[22:25] <dmick> but ceph-deploy does that at 'new' time
[22:26] <dmick> so if that's what you updated by hand, I understand
[22:27] * alexbligh (~alexbligh@89-16-176-215.no-reverse-dns-set.bytemark.co.uk) has joined #ceph
[22:29] * alexbligh is now known as alexbligh1
[22:31] * alexbligh1 is now known as alexbligh_away
[22:31] * alexbligh_away is now known as alexbligh1_away
[22:31] <alexbligh1_away> HELP IDENTIFY
[22:33] * alexbligh1_away is now known as alexbligh1_away_away
[22:33] * alexbligh1_away_away is now known as alexbligh1_away
[22:34] * alexbligh1_away is now known as alexbligh1_away_away
[22:34] * alexbligh1_away_away is now known as alexbligh1_away
[22:34] * alexbligh1_away is now known as alexbligh
[22:34] * alexbligh is now known as alexbligh_away
[22:34] * alexbligh_away is now known as alexbligh
[22:36] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:37] <L2SHO> ahhhhhh, http://ceph.com/docs/next/radosgw/config-ref/ says that ops_log and usage_log are enabled by default, but that seems to not be the case since 0.56
[22:37] <L2SHO> no wonder I couldn't get any usage information
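For anyone hitting the same thing: both logs have to be switched on explicitly in the radosgw section of ceph.conf; a sketch, where the section name is whatever the gateway instance is called:

    [client.radosgw.gateway]
        rgw enable usage log = true
        rgw enable ops log = true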
[22:37] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:37] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[22:39] * leseb2 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:39] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[22:39] * wdk (~wdk@124-169-216-2.dyn.iinet.net.au) has joined #ceph
[22:40] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:40] * leseb2 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[22:43] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[22:47] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[22:51] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:51] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[22:54] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[22:55] <mtanski> Does the hadoop cephfs share the same namespace as the posix filesystem?
[22:58] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: my troubles seem so far away, now yours are too...)
[22:59] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[22:59] * ChanServ sets mode +o scuttlemonkey
[23:16] <andrei> nhm: are you online?
[23:16] <andrei> wido: are you here?
[23:28] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[23:30] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[23:30] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[23:33] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[23:40] <nhm> andrei: heya, what's up?
[23:40] <cjh_> is leveldb being used on the osd's for storage as well as the monitors? i see on the cuttlefish changes that leveldb is the new backend but it doesn't mention it for the osd's
[23:41] * alexbligh is now known as alexbligh_away
[23:41] <nhm> cjh_: leveldb has been used on the OSDs for some things for a while.
[23:42] <cjh_> nhm: ok i was confused and thought everything was being stored in there
[23:42] <nhm> cjh_: nope. There are some parameters you can tweak too.
[23:42] <cjh_> are they documented on the site? i didn't have any luck finding them
[23:45] <nhm> cjh_: hrm, not sure how comprehensively. Look for the omap stuff
[23:45] <cjh_> ok
[23:46] <nhm> cjh_: also, if you do some google searches, you can see some of the commits, such as for the pg_info and pg_log stuff going into leveldb earlier this year.
[23:48] * Disconnected.
[23:52] -magnet.oftc.net- *** Looking up your hostname...
[23:52] -magnet.oftc.net- *** Checking Ident
[23:52] -magnet.oftc.net- *** No Ident response
[23:52] -magnet.oftc.net- *** Found your hostname

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.