#ceph IRC Log

IRC Log for 2013-05-29

Timestamps are in GMT/BST.

[0:02] * sagewk (~sage@2607:f298:a:607:5835:d7f1:dcc4:ed57) Quit (Ping timeout: 480 seconds)
[0:02] <dmick> kyle_: hm, hang on tho, if there's no /var/lib/ceph/osd/*, then ceph-osd.conf won't start it
[0:03] <dmick> which should be done by udev
[0:03] <kyle_> dmick: okay. what step creates that directory?
[0:04] <dmick> you have /lib/udev/rules.d/95-ceph-osd.rules installed?
[0:05] <kyle_> no. don't see it
[0:05] <dmick> !
[0:05] <dmick> dpkg -l | grep ceph?
[0:05] <kyle_> well just a sec sorry, just purged to try from scratch
[0:05] <dmick> (on data0)
[0:05] <dmick> oh
[0:07] * aliguori (~anthony@32.97.110.51) Quit (Quit: Ex-Chat)
[0:10] * sagewk (~sage@2607:f298:a:607:b8b1:2d0c:7124:f5a7) has joined #ceph
[0:11] * loicd (~loic@2a01:e35:2eba:db10:8c4:2874:b63f:2a7e) has joined #ceph
[0:13] <kyle_> okay so started from scratch for good measure. yes, the /lib/udev/rules.d/95-ceph-osd.rules is there.
[0:16] <dmick> ok
[0:17] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[0:17] * vata (~vata@2607:fad8:4:6:9541:d2f3:3800:4fec) Quit (Quit: Leaving.)
[0:17] <dmick> in same place, nothing in /var/lib/ceph/osd, no useful logs in /var/log/upstart/ceph-osd*?
[0:18] <dmick> can you mount sda6 temporarily somewhere and observe that it has osd stuff in it?
[0:18] <kyle_> sure just a minute
[0:21] * tnt (~tnt@91.176.25.109) Quit (Ping timeout: 480 seconds)
[0:25] <kyle_> yeah here is what's there when i mount to /test
[0:25] <kyle_> -rw-r--r-- 1 root root 37 May 28 15:23 ceph_fsid
[0:25] <kyle_> -rw-r--r-- 1 root root 37 May 28 15:23 fsid
[0:25] <kyle_> lrwxrwxrwx 1 root root 9 May 28 15:23 journal -> /dev/sda6
[0:25] <kyle_> -rw-r--r-- 1 root root 21 May 28 15:23 magic
[0:26] <kyle_> /dev/sda7 is the correct osd partition i mislabeled above (/dev/sda6 is the journal).
[0:28] <dmick> ah
[0:29] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Quit: bia)
[0:29] <dmick> ceph_fsid is a file; does it contain the UUID of your cluster (as in your ceph.conf on the ceph-deploy host)?
[0:30] <kyle_> yes they match
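
    (Aside: a minimal sketch of the check being discussed, assuming the temporary /test mount used above and the usual /etc/ceph/ceph.conf location; on a ceph-deploy admin host the conf may sit in the deploy directory instead:)
        cat /test/ceph_fsid             # cluster fsid recorded on the OSD data partition
        grep fsid /etc/ceph/ceph.conf   # fsid in the cluster config; the two should match
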
[0:32] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:34] <dmick> so I'd probably try udevadm test --action=add --subsystem-block /dev/sda
[0:35] <kyle_> test: unrecognized option '--subsystem-block'
[0:37] <dmick> that should be =
[0:37] <dmick> sorry
[0:37] <dmick> --subsystem=block
[0:38] <kyle_> test: unrecognized option '--subsystem=block'
[0:40] * [fred] (fred@konfuzi.us) Quit (Remote host closed the connection)
[0:44] <dmick> that's awesome. I get that too, and the manpage isn't ambiguous about it
[0:44] <dmick> sigh
[0:46] <dmick> how about udevadm trigger --dry-run --action=add --subsystem-match=block --paernt-match=/dev/sda
[0:46] <dmick> s/paernt/parent/
[0:46] * [fred] (fred@konfuzi.us) has joined #ceph
[0:47] <kyle_> unable to open the device '/dev/sda'
[0:47] <dmick> yeah, sorry, me too, that's a syspath
[0:48] <dmick> drop --parent-match, add --verboase
[0:48] <dmick> man
[0:48] <dmick> --verbose
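
    (Aside: the command being assembled over the last few lines, consolidated as a sketch; option names as documented in udevadm(8), verify on your system:)
        udevadm trigger --verbose --dry-run --action=add --subsystem-match=block
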
[0:48] * portante (~user@66.187.233.206) Quit (Ping timeout: 480 seconds)
[0:49] <kyle_> http://pastebin.com/fbGF5Mqd
[0:52] <dmick> so it ought to be triggering for sda
[0:52] <dmick> if you remove the --dry-run, does that dir get mounted?
[0:56] <kyle_> removed dry run but did not mount
[0:56] <kyle_> mount:
[0:56] <kyle_> /dev/sda1 on / type xfs (rw)
[0:56] <kyle_> proc on /proc type proc (rw,noexec,nosuid,nodev)
[0:56] <kyle_> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
[0:56] <kyle_> none on /sys/fs/cgroup type tmpfs (rw)
[0:56] <kyle_> none on /sys/fs/fuse/connections type fusectl (rw)
[0:56] <kyle_> none on /sys/kernel/debug type debugfs (rw)
[0:56] <kyle_> none on /sys/kernel/security type securityfs (rw)
[0:56] <kyle_> udev on /dev type devtmpfs (rw,mode=0755)
[0:56] <kyle_> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
[0:56] <kyle_> tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
[0:56] <kyle_> none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
[0:56] <kyle_> none on /run/shm type tmpfs (rw,nosuid,nodev)
[0:56] <kyle_> none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
[0:58] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:58] * jshen (~jshen@209.133.73.98) Quit (Ping timeout: 480 seconds)
[1:01] * rturk is now known as rturk-away
[1:01] <dmick> kyle_: in which case I'd probably run what udev runs, which is /usr/sbin/ceph-disk-activate --mount /dev/sda7 (if that's the OSD dir)
[1:01] <dmick> perhaps adding -v
[1:03] <kyle_> got latest monmap
[1:03] <kyle_> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0b 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[1:03] <kyle_> 2013-05-28 16:02:22.835280 7fc204c507c0 -1 journal read_header error decoding journal header
[1:03] <kyle_> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0b 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[1:03] <kyle_> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0b 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[1:03] <kyle_> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0b 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[1:03] <kyle_> 2013-05-28 16:02:22.852574 7fc204c507c0 -1 filestore(/var/lib/ceph/tmp/mnt.4yfFym) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[1:03] <kyle_> 2013-05-28 16:02:22.859568 7fc204c507c0 -1 created object store /var/lib/ceph/tmp/mnt.4yfFym journal /var/lib/ceph/tmp/mnt.4yfFym/journal for osd.0 fsid 37a9ba70-1eaa-4036-9554-c4a747396c3d
[1:03] <kyle_> 2013-05-28 16:02:22.859613 7fc204c507c0 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.4yfFym/keyring: can't open /var/lib/ceph/tmp/mnt.4yfFym/keyring: (2) No such file or directory
[1:04] <kyle_> 2013-05-28 16:02:22.859692 7fc204c507c0 -1 created new key in keyring /var/lib/ceph/tmp/mnt.4yfFym/keyring
[1:04] <kyle_> 2013-05-28 16:02:22.881966 7f2054c50780 -1 read 56 bytes from /var/lib/ceph/tmp/mnt.4yfFym/keyring
[1:05] <kyle_> it did mount though
[1:05] <kyle_> hmm yeah that brought it up and in the cluster ow
[1:05] <kyle_> now*
[1:07] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[1:08] <dmick> hm
[1:08] <dmick> so I wonder why udev is failing you
[1:08] <dmick> wonder if the partition type is right?
[1:09] <dmick> sgdisk could show you that detail
[1:09] <dmick> I think with 'info'
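
    (Aside: a sketch of the sgdisk check being suggested, using partition 7 from the discussion above:)
        sgdisk --info=7 /dev/sda    # prints the partition type GUID and unique GUID for /dev/sda7
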
[1:11] <kyle_> i think that revealed the issue.
[1:11] <kyle_> ***************************************************************
[1:11] <kyle_> Found invalid GPT and valid MBR; converting MBR to GPT format.
[1:11] <kyle_> ***************************************************************
[1:11] <kyle_> Warning! Secondary partition table overlaps the last partition by
[1:11] <kyle_> 33 blocks!
[1:11] <kyle_> You will need to delete this partition or resize it in another utility.
[1:11] <kyle_> Partition GUID code: 0FC63DAF-8483-4772-8E79-3D69D8477DE4 (Linux filesystem)
[1:11] <kyle_> Partition unique GUID: B8424BA7-57AA-4020-B34E-1FDDC44098AC
[1:11] <kyle_> First sector: 97654847 (at 46.6 GiB)
[1:11] <kyle_> Last sector: 136724863 (at 65.2 GiB)
[1:11] <kyle_> Partition size: 39070017 sectors (18.6 GiB)
[1:11] <kyle_> Attribute flags: 0000000000000000
[1:11] <kyle_> Partition name: 'Linux filesystem'
[1:12] <kyle_> i guess i'm sizing my partitions incorrectly?
[1:18] <dmick> oh, right, you said you made your own partition
[1:18] <dmick> duh
[1:18] <dmick> so it wouldn't have the type Ceph expects. That's the problem, I'd guess.
[1:18] <dmick> (note the udev rule only takes effect for specific partition GUIDs)
[1:19] <kyle_> okay so should i simply not create partitions myself and try to have a raw device available?
[1:23] <dmick> depends on what you mean by "should". We should probably document/have a helper command to set up an existing partition for an OSD deployment
[1:23] <dmick> but as the code is written now, it either creates it with the right UUID, or uses what's there
[1:23] <dmick> which of course breaks udev and the ultimate activation
[1:24] <dmick> hm. it seems as though ceph-disk has code to address this, but perhaps it's buggy
[1:25] <kyle_> hmm okay. i prefer to do partitioning myself since i can have all six disks in the server be in a RAID10. going from 4 to 6 disks in RAID10 gave me a huge performance boost.
[1:25] <dmick> I'm also concerned by the MBR vs GPT tho
[1:25] <dmick> I suspect that without a GPT, this all falls apart
[1:26] * LeaChim (~LeaChim@176.250.167.111) Quit (Ping timeout: 480 seconds)
[1:26] <kyle_> i might have done something weird when messing with partitions. i'll reinstall ubuntu and do partitioning in the setup now that i know i need separate partitions for the journal/osd.
[1:27] <dmick> ok
[1:27] <dmick> so the story is
[1:28] <dmick> automatic == GPT labels
[1:28] <dmick> but if not,
[1:28] <dmick> if you arrange for the partition to be mounted, the upstart jobs should find it and start an OSD on it
[1:28] <dmick> it's just the automount only works with GPT-labels-and-a-specific-type
[1:29] <dmick> further, if you'd had GPT labels, it looks like the code to stamp the partition with the OSD GUID is broken
[1:29] <dmick> but you didn't, so that wasn't the problem
[1:29] <dmick> yes, this should probably be clearer
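
    (Aside: a hedged sketch of what tagging a hand-made partition with the expected type GUID could look like. The GUID below is quoted from memory of the ceph-disk/udev rules of that era, so confirm it against the 95-ceph-osd.rules file mentioned earlier before relying on it; it also only matters once the disk actually carries a GPT label:)
        sgdisk --typecode=7:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sda   # tag partition 7 as a ceph OSD data partition
        partprobe /dev/sda                                                  # re-read the table so udev sees the change
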
[1:30] <kyle_> yeah on github i saw this: After that, the hosts will be running OSDs for the given data disks. If you specify a raw disk (e.g., /dev/sdb), partitions will be created and GPT labels will be used to mark and automatically activate OSD volumes. If an existing partition is specified, the partition table will not be modified.
[1:30] <kyle_> but not much more info
[1:31] <dmick> yeah. and there are multiple pieces
[1:31] <dmick> I'll file a doc bug
[1:33] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:34] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[1:37] <dmick> kyle_: http://tracker.ceph.com/issues/5181
[1:39] <kyle_> dmick: good deal. thanks for all your help, much appreciated!
[1:40] <dmick> sorry it took so long to puzzle out
[1:40] <dmick> http://tracker.ceph.com/issues/5182 as well
[1:43] <sagewk> sjust: review wip-5176 ?
[1:43] * Tv (~tv@pool-108-38-55-169.lsanca.fios.verizon.net) has joined #ceph
[1:43] <Tv> dmick: http://osrc.dfm.io/liewegas
[1:44] <Tv> oh http://osrc.dfm.io/dmick is even better
[1:46] <dmick> yes, I noticed that last week (someone posted Linus's)
[1:46] <dmick> I wonder what trends I'm setting, and I wonder how mick is filthy, exactly (at least in this century)
[1:47] <Tv> you're setting the trend on triggering silly rails apps to say funny things
[1:47] <dmick> achievement unlocked!
[1:48] <dmick> at least I'm no subterranean rodent :)
[1:50] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[2:01] * newbie (~kvirc@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[2:03] * newbie (~kvirc@pool-71-164-242-68.dllstx.fios.verizon.net) Quit ()
[2:08] * danieagle (~Daniel@177.99.132.159) has joined #ceph
[2:15] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:15] <joao> Tv == Tommi ?
[2:16] * Tv waves at joao
[2:16] <joao> hey o/
[2:16] <joao> how's it going? :)
[2:17] <Tv> life is good
[2:30] * jahkeup (~jahkeup@ip-8-20-191-5.twdx.net) has joined #ceph
[2:33] * danieagle (~Daniel@177.99.132.159) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[2:34] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:38] * sh_t (~sht@NL.privatevpn.com) has joined #ceph
[2:39] <mrjack> hi joao
[2:39] <joao> hey
[2:39] <mrjack> joao: will there be a new cuttlefish version that fixes the mon issue shortly?
[2:39] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:39] <joao> don't know for when it is scheduled
[2:40] <mrjack> joao: tnt told me that it seems fixed
[2:40] <joao> sage might know best?
[2:40] <mrjack> joao: because it is annoying to have to restart mons every day..
[2:40] <joao> mrjack, according to tnt's tests it does look that way
[2:50] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[2:51] * tkensiski (~tkensiski@71.sub-70-197-5.myvzw.com) has joined #ceph
[2:52] * tkensiski (~tkensiski@71.sub-70-197-5.myvzw.com) has left #ceph
[2:57] <mrjack> joao: why then not release a new fixed version?
[2:57] <mrjack> joao: or do new ceph installations not see this bug?
[3:13] <joao> mrjack, it is planned, and it should be soon; I just don't know when it is supposed to come out
[3:14] * jahkeup (~jahkeup@ip-8-20-191-5.twdx.net) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[3:23] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[3:29] * loicd (~loic@2a01:e35:2eba:db10:8c4:2874:b63f:2a7e) Quit (Quit: Leaving.)
[3:29] * loicd (~loic@2a01:e35:2eba:db10:8c4:2874:b63f:2a7e) has joined #ceph
[3:33] * jahkeup (~jahkeup@ip-8-20-191-5.twdx.net) has joined #ceph
[3:40] * jahkeup (~jahkeup@ip-8-20-191-5.twdx.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[3:46] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:51] * bstillwell (~bryan@bokeoa.com) Quit (Quit: Lost terminal)
[3:53] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[3:57] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:02] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:13] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[4:19] * yehuda_hm (~yehuda@2602:306:330b:1410:d90a:7f97:da86:902a) Quit (Ping timeout: 480 seconds)
[4:20] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) has joined #ceph
[4:20] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[4:23] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:24] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[4:25] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[4:41] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[4:58] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[5:09] * Vanony_ (~vovo@88.130.215.162) has joined #ceph
[5:10] * doubleg (~doubleg@69.167.130.11) Quit (Ping timeout: 480 seconds)
[5:11] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[5:11] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) Quit (Read error: Connection timed out)
[5:13] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[5:16] * Vanony (~vovo@i59F79DAF.versanet.de) Quit (Ping timeout: 480 seconds)
[5:31] * nunoes (~oftc-webi@213.63.190.66) Quit (Remote host closed the connection)
[5:34] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) has joined #ceph
[5:34] * Husky (~sam@huskeh.net) Quit (Ping timeout: 480 seconds)
[5:47] * loicd (~loic@2a01:e35:2eba:db10:8c4:2874:b63f:2a7e) Quit (Quit: Leaving.)
[5:47] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:49] * The_Bishop (~bishop@2001:470:50b6:0:d14c:a623:a4fd:1381) Quit (Ping timeout: 480 seconds)
[5:51] * The_Bishop (~bishop@2001:470:50b6:0:3d36:7a16:7b29:4d94) has joined #ceph
[5:56] * scheuk_ (~scheuk@204.246.67.78) Quit (Read error: Connection reset by peer)
[5:58] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (Ping timeout: 480 seconds)
[6:10] * The_Bishop (~bishop@2001:470:50b6:0:3d36:7a16:7b29:4d94) Quit (Ping timeout: 480 seconds)
[6:15] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[6:19] * The_Bishop (~bishop@2001:470:50b6:0:d14c:a623:a4fd:1381) has joined #ceph
[6:19] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[6:26] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[6:46] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) Quit (Read error: Connection timed out)
[6:48] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[6:48] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[6:51] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) has joined #ceph
[7:07] * madkiss (~madkiss@91.82.208.7) has joined #ceph
[7:11] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Remote host closed the connection)
[7:12] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) Quit (Read error: Connection timed out)
[7:15] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) has joined #ceph
[7:17] * etr (~etr@cs27053056.pp.htv.fi) Quit (Quit: -)
[7:23] * madkiss (~madkiss@91.82.208.7) Quit (Quit: Leaving.)
[7:27] * Tv (~tv@pool-108-38-55-169.lsanca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:30] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) Quit (Ping timeout: 480 seconds)
[7:38] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) has joined #ceph
[7:42] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:56] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) Quit (Read error: Connection timed out)
[8:01] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[8:01] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) has joined #ceph
[8:05] * sleinen1 (~Adium@2001:620:0:26:496b:e6e7:230c:b2ff) Quit (Quit: Leaving.)
[8:05] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[8:13] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:16] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) Quit (Read error: Connection timed out)
[8:18] * MooingLe1ur (~troy@phx-pnap.pinchaser.com) has joined #ceph
[8:18] * MooingLemur (~troy@phx-pnap.pinchaser.com) Quit (Read error: Connection reset by peer)
[8:18] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) has joined #ceph
[8:26] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Ping timeout: 480 seconds)
[8:29] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:32] * julian (~julianwa@125.70.134.203) has joined #ceph
[8:35] * san (~san@81.17.168.194) has joined #ceph
[8:37] <san> Is there anyone here who speaks Russian?
[8:42] * tnt (~tnt@91.176.25.109) has joined #ceph
[8:44] * madkiss (~madkiss@91.82.208.7) has joined #ceph
[8:48] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:07] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:09] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:10] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[9:10] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[9:13] * loicd (~loic@90.84.144.143) has joined #ceph
[9:18] * tziOm (~bjornar@ti0099a340-dhcp0745.bb.online.no) Quit (Remote host closed the connection)
[9:22] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:22] * ChanServ sets mode +v andreask
[9:28] * loicd1 (~loic@87-231-103-102.rev.numericable.fr) has joined #ceph
[9:32] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:33] * loicd (~loic@90.84.144.143) Quit (Ping timeout: 480 seconds)
[9:34] * madkiss (~madkiss@91.82.208.7) Quit (Quit: Leaving.)
[9:37] * Gugge-47527 (gugge@kriminel.dk) Quit (Quit: Bye)
[9:38] <sakari> ...so I started only the monitor server and none of the osds and ceph -s still shows that all osds are up. wat?
[9:39] * Gugge-47527 (gugge@kriminel.dk) has joined #ceph
[9:44] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:45] * julian (~julianwa@125.70.134.203) Quit (Read error: Connection reset by peer)
[9:45] * julian (~julianwa@125.70.134.203) has joined #ceph
[9:48] * tnt (~tnt@91.176.25.109) Quit (Ping timeout: 480 seconds)
[9:48] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Ping timeout: 480 seconds)
[9:55] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[10:01] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[10:01] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[10:03] <tnt> Damn, enabling leveldb compression seems to eat a lot of CPU.
[10:04] <saaby> on the mons? how much?
[10:04] <tnt> ~ 60% of one core of a Xeon X7460
[10:05] <saaby> all the time?
[10:05] <tnt> it's not the latest gen of CPU but still ...
[10:05] <tnt> yes, all the time.
[10:05] <saaby> damn..
[10:05] <tnt> http://i.imgur.com/Zz3xvsx.png
[10:05] <saaby> leveldb compression, was that enabled in the latest fixes, or did you do that manually?
[10:05] <tnt> You can spot where I enabled it :p
[10:05] <tnt> I enabled it manually.
[10:06] <saaby> wow
[10:06] <saaby> ok
[10:06] <tnt> On the bright side it reduces the IO rate on disk by ~ 60/70%.
[10:06] <saaby> you still see diskusage grow fast with latest fixes?
[10:06] <tnt> no
[10:06] <saaby> ok, so just for reducing IO rate?
[10:07] <tnt> yes, I wanted to see the effect.
[10:07] <saaby> right
[10:07] <tnt> mostly to see why it was disabled in ceph, because the leveldb docs say there is usually no good reason to disable it.
[10:07] <saaby> ok
[10:08] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[10:08] * lx0 is now known as lxo
[10:11] * LeaChim (~LeaChim@176.250.167.111) has joined #ceph
[10:16] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[10:18] <tnt> wido: rados_cluster_stat ?
[10:18] <wido> tnt: Argh, stupid!
[10:18] <wido> indeed
[10:18] <wido> I've been looking for fetching the OSDMap with flags
[10:19] <wido> tnt: But you can't find out to what the near full ratio has been set to
[10:19] <wido> so you have to do your own guessing
[10:19] <tnt> right.
[10:19] <wido> anyway, a setting in RGW would work. rgw_refuse_put_above_ratio
[10:19] <wido> or so
[10:20] <san> hi. we got problem after removing mon ....
[10:20] <san> 2013-05-29 11:10:47.523778 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 26 bytes epoch 9) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:47.523802 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 27 bytes epoch 9) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:52.507711 7f99904b1700 0 mon.c@2(peon).data_health(3736) update_stats avail 90% total 480719568 used 23346248 avail 432954096
[10:20] <san> 2013-05-29 11:10:52.523928 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 27 bytes epoch 9) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:52.523948 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 30 bytes epoch 9) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:52.523958 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 27 bytes epoch 9) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:52.523966 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 27 bytes epoch 10) v1 and sending client elsewhere
[10:20] <Gugge-47527> san: use pastebin
[10:20] <san> 2013-05-29 11:10:52.523979 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 26 bytes epoch 9) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:52.523988 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 26 bytes epoch 10) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:52.523997 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 26 bytes epoch 10) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:52.524005 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 27 bytes epoch 10) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:52.524015 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 27 bytes epoch 10) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:57.524237 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 27 bytes epoch 9) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:57.524279 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 26 bytes epoch 9) v1 and sending client elsewhere
[10:20] <san> 2013-05-29 11:10:57.524287 7f99904b1700 1 mon.c@2(peon) e10 discarding message auth(proto 0 30 bytes epoch 10) v1 and sending client elsewhere
[10:20] <san> and on all mons we cant type any commands
[10:22] <san> sorry for my english
[10:23] <san> any ideas?
[10:25] <wido> san: How many mons are left?
[10:25] <wogri> san: how did you remove the mon, did you follow the docs?
[10:27] <san> 3 mons. 1 left.....on 2 mons logs http://pastebin.com/R5QdDQXE
[10:27] <san> yes we did
[10:29] <wogri> hm. this might be a bug. what does ceph mon stat say (if it is able to say something)?
[10:29] <san> now we have 2 mons. and on all mons we can type any command....
[10:30] <san> we cant?*
[10:32] <san> we can not see the status. it does not respond to the command $ceph -s
[10:32] <absynth> 2 mons = no quorum
[10:32] <wogri> ceph works with 2 mons
[10:32] <wido> absynth: Well, if both are up there is a majority
[10:32] <wido> you don't want one to fail
[10:33] <wogri> but it seems that either the deletion went wrong or the deletion-process is buggy
[10:33] <wogri> did you verify that the mon's were gone when there were only two?
[10:33] <san> apparently there is a quorum. virtual machines are running. the backend is responsive
[10:36] <tnt> raise the debug level and restart a mon and pastebin the log of start.
[10:36] <wogri> don't forget the pastebin :)
[10:36] <tnt> make it debug mon 10 debug paxos 10 and debug ms 1
[10:37] * absynth braces for a 20000 line paste
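
    (Aside: one way to apply the debug levels tnt lists, sketched as a ceph.conf snippet on the monitor host; restart the mon afterwards as suggested:)
        [mon]
            debug mon = 10
            debug paxos = 10
            debug ms = 1
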
[10:38] <san> after the command (ceph mon remove a) the cluster said that there are 2 monitors. next action (ceph -s) - no response.
[10:39] * virsibl (~virsibl@94.231.117.244) has joined #ceph
[10:40] <wogri> darn.
[10:40] <tnt> you did shutdown the 'a' daemon right ? and you removed it from the ceph.conf of the other mons ?
[10:41] <san> it's a working cluster... we are afraid that after restarting the mons the quorum may stop working.
[10:41] <san> yes right
[10:42] <tnt> well, the mons are obviously not processing anything ATM ...
[10:43] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[10:43] <san> what does "not processing anything ATM ..." mean? sorry
[10:45] <tnt> it means they're not working since ceph -s doesn't work.
[10:46] <san> yes it is
[10:47] <tnt> ...
[10:47] <san> maybe the mons must be restarted after removing mon a?
[10:51] <san> ok, later we will try (make it debug mon 10 debug paxos 10 and debug ms 1) and restart the mons, after the second cluster is up and the data (~3Tb) has been copied.
[10:54] <virsibl> Hi! Please help me. My cluster consists of three servers, each with two osds. When one osd is lost the cluster switches to recovery, and as a consequence some virtual machines stop working. How do I configure the cluster so that the loss of one of the machines does not affect it - is that possible? I use proxmox + ceph. ceph osd tree
[10:54] <virsibl> # id weight type name up/down reweight
[10:54] <virsibl> -1 6 root default
[10:54] <virsibl> -3 6 rack unknownrack
[10:54] <virsibl> -2 2 host ceph1
[10:54] <virsibl> 0 1 osd.0 up 1
[10:54] <virsibl> 1 1 osd.1 up 1
[10:54] <virsibl> -4 2 host ceph2
[10:54] <virsibl> 2 1 osd.2 up 1
[10:54] <virsibl> 3 1 osd.3 up 1
[10:54] <virsibl> -5 2 host ceph3
[10:54] <virsibl> 4 1 osd.4 up 1
[10:54] <virsibl> 5 1 osd.5 up 1
[10:55] * loicd1 is now known as loicd
[10:56] <Gugge-47527> virsibl: pastebin
[10:56] <tnt> pastebin !
[10:57] <absynth> and what does "not working" mean?
[10:57] <absynth> paste output of ceph -w / ceph -s to pastebin
[10:58] <virsibl> Ok sorry
[11:00] * madkiss (~madkiss@91.82.208.7) has joined #ceph
[11:01] <wogri> virsibl: you can configure ceph in a way so that it won't backfill (=recover) under-replicated pg's (or pools).
[11:01] * sha (~kvirc@81.17.168.194) has joined #ceph
[11:02] <virsibl> Now everything is working. But when ceph is recovering, some virtual machines are not available.
[11:06] <absynth> what does "not available" mean *exactly*?
[11:08] <saaby> joao: here?
[11:09] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Read error: Operation timed out)
[11:09] <saaby> I just succeeded in crashing a mon again. - packing debug logs and store.db now.
[11:09] <absynth> what version, saaby? cuttlefish?
[11:10] <Gugge-47527> virsibl: how is the io load on the osd disks under recovery?
[11:16] <saaby> absynth: cuttlefish. - and since yesterday with latest mon fixes in the cuttlefish branch
[11:16] <absynth> ok. not scheduling an update.
[11:17] <virsibl> Gugge-47527: I can't give an answer :( If one of the osds fails, should this affect the operation of the virtual machines?
[11:17] <Gugge-47527> If you scale your cluster right, no
[11:17] <absynth> if your i/o subsystems on the osd machines are underpowered, you will encounter slow requests. if these pile up, VMs will be unable to access their local file systems
[11:18] <virsibl> wogri: how to do it?
[11:18] <absynth> so, look at your OSDs with nmon. If the disks peak at 100% usage all the time during reweight, your disks are shit
[11:18] <Gugge-47527> you have to scale the cluster to your io needs + recovery io
[11:18] <absynth> ceph osd set noout will prevent the cluster from reweighting, should an osd go down
[11:21] <Gugge-47527> But you really should have the power to recover while using the cluster :)
[11:25] <virsibl> That is, in theory the loss of one of the osds should not affect the operation of the virtual machines?
[11:25] <wogri_risc> not if your pool size is greater than one.
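
    (Aside: the two knobs touched on in this exchange, sketched with the default "rbd" pool as an example name:)
        ceph osd set noout            # stop the cluster from marking a down OSD out and rebalancing
        ceph osd unset noout          # restore normal behaviour afterwards
        ceph osd pool get rbd size    # check a pool's replica count
        ceph osd pool set rbd size 2  # make sure it is greater than one
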
[11:33] * bergerx_ (~bekir@78.188.101.175) has joined #ceph
[11:40] * madkiss (~madkiss@91.82.208.7) Quit (Quit: Leaving.)
[11:51] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:52] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:56] * Rocky (~r.nap@188.205.52.204) Quit (Quit: **Poof**)
[11:56] * Rocky (~r.nap@188.205.52.204) has joined #ceph
[11:57] <sig_wal1> hello. after migrating test cluster from 0.56 to 0.62.2, monitor's store.db is now 6 GB and growing... is it normal?
[11:59] <sig_wal1> *0.61.2
[12:00] <saaby> sig_wal1: normal: yes. - good: no. - It's a bug in 0.61.2 - fixed in the cuttlefish branch (and in 0.63 released today)
[12:02] <saaby> as long as you are on 0.61.2, depending on your workload, you will probably have to restart all mons at regular intervals.
[12:13] * DarkAce-Z (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[12:28] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 481 seconds)
[12:37] * eternaleye (~eternaley@2607:f878:fe00:802a::1) Quit (Ping timeout: 480 seconds)
[12:38] * diegows (~diegows@190.190.2.126) has joined #ceph
[12:40] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[12:42] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[12:43] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[12:43] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[12:44] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[12:44] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:44] * ChanServ sets mode +v andreask
[12:49] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (Ping timeout: 480 seconds)
[12:59] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[13:01] <saaby> joao: I now have a complete log and dump of a failing mon as we talked about yesterday - let me know when you are ready
[13:05] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[13:09] * madkiss (~madkiss@91.82.208.7) has joined #ceph
[13:09] * Maskul (~Maskul@host-89-241-174-13.as13285.net) has joined #ceph
[13:10] <Maskul> Hey guys, i have a question
[13:10] <Maskul> is it possible to distribute entire vm's across several disks with ceph?
[13:12] * Volture (~quassel@office.meganet.ru) Quit (Remote host closed the connection)
[13:12] <wogri_risc> that's what ceph does.
[13:18] <Maskul> so let's say a situation where i have 3-4 servers which run quite a few VM's, however atm each vm stores its data locally. so atm i have to login on the same server to use my previous vm in order to access my data.
[13:19] <Maskul> so with ceph it's possible that the vm is distributed over the 3-4 servers so i can always login into the same vm?
[13:19] <wogri_risc> what do you mean by login into the same vm?
[13:19] <wogri_risc> the storage is distributed onto as many servers as you want
[13:19] <wogri_risc> but the vm will still need a place for its memory and cpu power, and this will be a single server.
[13:20] <Qten> Maskul: scales the virtual disk onto multiple physical disks not the virtual machine per se
[13:20] * eternaleye (~eternaley@c-50-132-41-203.hsd1.wa.comcast.net) has joined #ceph
[13:21] <Qten> Maskul: Ceph is pretty much a scale out NAS with object support as well and many other features, but its storage not compute :)
[13:23] * dalgaaf (~dalgaaf@nat.nue.novell.com) has joined #ceph
[13:23] * madkiss (~madkiss@91.82.208.7) Quit (Quit: Leaving.)
[13:24] <Maskul> so when i log in to server A to use vm A, i create some data on it afterwards i log off, next day i log in to server B but i want to access vm A again, is that possible?
[13:24] <Maskul> sorry i'm quite new to this area
[13:25] <wogri_risc> what does 'access vm a' mean?
[13:25] <Maskul> just log in to it as a user and use it
[13:25] <Vjarjadian> this is ceph channel :)
[13:25] <wogri_risc> with what? ssh?
[13:26] <Maskul> yes ssh
[13:26] <Maskul> there are other methods aswell but ssh is the main method
[13:26] <wogri_risc> ahm... sorry maskul, I think you don't get it. you could ssh into any virtual machine without ceph, too.
[13:27] <Vjarjadian> maskul, are both server A and B hypervisors? if you have everything on the same network you should be able to access everything from everything...
[13:27] <Maskul> erm yes, i'm kinda explaining the situation bad
[13:28] <jerker> Maskul: Yes, the block storage is available on both servers (hypervisors).. so they can start any machine.
[13:29] <jerker> Maskul: at least it should be, and like Vjarjadian wrote, you need the same network on both servers in order to get stuff to work.
[13:30] <Maskul> the situation is i have a system where users are able to "rent" vm's from several servers and do whatever they want with them, however all the changes/data they create are stored on the server which hosts the vm itself. after a user logs off their vm, it gets automatically powered off to save resources. if they connect to the same server again then they are able to use the same vm they were using before, with all its data.
[13:31] <Vjarjadian> maskul, what hypervisor are you using?
[13:31] <Vjarjadian> and is this production or theory?
[13:31] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) has joined #ceph
[13:31] <Maskul> however if they connect to another server they will be given a complete new vm and they are not able to access their data anymore.
[13:31] <Maskul> hypervisor is kvm normally
[13:31] <Maskul> this is production
[13:32] <Vjarjadian> vmware and hyper-v can have high VM densities... KVM must be able to as well
[13:32] <Vjarjadian> probably not much need to power off the vms...
[13:32] * loicd (~loic@87-231-103-102.rev.numericable.fr) Quit (Quit: Leaving.)
[13:32] <Vjarjadian> you could also motion the VMs to the other hypervisor to give them the same VM each time
[13:33] * nhm (~nhm@65-128-142-169.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[13:34] <jerker> Maskul: The idea is that the RBD (rados block device) should be available on all hypervisors, so it is just a matter of spinning the virtual machine up on another hypervisor. In theory, since I have only used CephFS for other things than virtual machines...
[13:34] <wogri_risc> not only in theory, I'm using this in production, jerker. the physical machine does not matter a lot in ceph.
[13:34] <Vjarjadian> i just wish i could mount ceph on windows machines... make it so much more useful for me
[13:35] <jerker> wogri_risc: Sorry for my wording, I mean for ME it is theory, since I have not tried. :-)
[13:35] <wogri_risc> jerker: got you right. I was just saying, I know that this also really works :)
[13:35] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:36] <jerker> Vjarjadian: CephFS? Well. I Samba-exported CephFS, seemed to work (not ACLs though). Using RBD via KVM is possible I guess.
[13:37] <Maskul> so i just have to make sure that the vm's storage is accessible by the other hypervisor, and this can be done using RBD?
[13:37] <Vjarjadian> thats sort of what i had on my last test cluster...
[13:39] <jerker> Maskul: Yes, the block storage (file system image) for the virtual machines should use RBD and not a local file/partiton/etc. Stop the VM on one server, then start it (using the same RBD) on the other server....
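
    (Aside: a sketch of what "use RBD instead of a local file" looks like with qemu/KVM of that era; the image and pool names are made up and cephx auth options are omitted:)
        rbd create vm-a-disk --size 20480 --pool rbd    # 20 GB image visible from every hypervisor
        qemu-system-x86_64 ... -drive file=rbd:rbd/vm-a-disk,if=virtio,cache=writeback ...
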
[13:40] <Vjarjadian> maskul, do you have a SAN already or just the local storage?
[13:42] <jerker> Vjarjadian: My old execution nodes i ran Ceph on are too old or whatever to accept my SSDs as system drives (grr IBM) so gah currently I have not decided how to proceed with my test cluster... And other projects are taking more and more time, no time for setting up Ceph. I have to do with ZFS for a while. :/
[13:43] * jerker leaving lunch and go back to work :-)
[13:44] <Maskul> yes we have a san already i think (I'm not all too familiar with the system sadly enough, i've just been asked to come up with ideas on how to solve the problem we have)
[13:44] <Vjarjadian> you should be able to use your current SAN for this too
[13:45] <Vjarjadian> and is the problem being able to access VMs on multiple servers or something else?
[13:46] <Maskul> the problem is being able to access vms on multiple servers
[13:47] <Vjarjadian> are your hypervisors in a cluster?
[13:48] * yeled (~yeled@spodder.com) has joined #ceph
[13:49] <Maskul> i would assume so
[13:49] <Vjarjadian> then if the clustering on KVM is any good at all, it should be able to sort out high availability/moving a VM easily
[13:50] <yeled> has anyone experience of migrating from mogilefs ( >1.5PB ) to ceph?
[13:52] <Maskul> alright, ill have a look around how i can solve it, thanks for the patience and help :)
[13:54] <Vjarjadian> and if your san can't do it... hyper-v supports 'shared nothing migrations' :)
[13:56] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Quit: Bye)
[13:57] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[13:59] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[14:04] * fghaas (~florian@91-119-88-140.dynamic.xdsl-line.inode.at) has joined #ceph
[14:04] * fghaas (~florian@91-119-88-140.dynamic.xdsl-line.inode.at) has left #ceph
[14:04] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:05] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[14:05] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[14:11] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[14:11] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[14:13] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: Connection reset by peer)
[14:15] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[14:17] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[14:18] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[14:19] * tnt (~tnt@212-166-48-236.win.be) Quit (Read error: Operation timed out)
[14:20] * schlitzer|work (~schlitzer@109.75.189.45) has joined #ceph
[14:20] <schlitzer|work> hey all
[14:20] <schlitzer|work> how can is list my radosgw users?
[14:25] <topro> you just need to drop by briefly with your notebook. then you'll also get a new version of the 1.6 right away to test the ccp/xcp daqlist reinit ;)
[14:25] <topro> ^^ sorry, wrong channel ;)
[14:26] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[14:26] * morse_ (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[14:31] <tnt> Since cuttle fish it seems that "radosgw-admin bucket list" doesn't work anymore ... anyone else got that issue ?
[14:34] * dosaboy_ (~dosaboy@host86-161-207-152.range86-161.btcentralplus.com) has joined #ceph
[14:34] * dosaboy (~dosaboy@host86-163-34-137.range86-163.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[14:35] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:38] * madkiss (~madkiss@91.82.208.7) has joined #ceph
[14:39] * dosaboy (~dosaboy@host86-164-138-172.range86-164.btcentralplus.com) has joined #ceph
[14:40] * san (~san@81.17.168.194) Quit (Quit: Ex-Chat)
[14:43] * dosaboy_ (~dosaboy@host86-161-207-152.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[14:44] * dosaboy_ (~dosaboy@host86-150-245-200.range86-150.btcentralplus.com) has joined #ceph
[14:45] * dosaboy_ (~dosaboy@host86-150-245-200.range86-150.btcentralplus.com) Quit ()
[14:46] * dosaboy_ (~dosaboy@host86-150-245-200.range86-150.btcentralplus.com) has joined #ceph
[14:48] * dosaboy (~dosaboy@host86-164-138-172.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[14:52] <andreask> schlitzer|work: I think there is no direct way to query ... maybe try to parse "radosgw-admin usage show --show-log-entries=false"?
[14:53] <schlitzer|work> the output of this command is: { "summary": []}
[14:54] <andreask> oh
[14:54] <schlitzer|work> and i have created 6 users (that did not anything until now)
[14:54] <tnt> usage logs are disabled by default now IIRC.
[14:54] <schlitzer|work> i guess these will show up when they create some buckets or whatever
[14:55] <schlitzer|work> but it would be great to have a feature to list what users we have.... otherwise one would have to document this outside of ceph.... which is ugly
[14:55] <schlitzer|work> i guess i should open a bug/feature request
[14:56] <tnt> there is a rgw admin API being built
[14:56] <schlitzer|work> are there any links to this?
[14:58] <tnt> http://ceph.com/docs/master/radosgw/adminops/
[15:01] <schlitzer|work> ahh, this page. i already checked it. it doesn't mention anything like listing users
[15:01] <schlitzer|work> or something that can provide what i am looking for
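
    (Aside: later radosgw-admin releases can list users through the metadata commands; whether this existed at the time of this log is uncertain:)
        radosgw-admin metadata list user
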
[15:02] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) Quit (Quit: Leaving)
[15:02] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:04] * madkiss (~madkiss@91.82.208.7) Quit (Quit: Leaving.)
[15:06] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:08] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[15:09] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[15:09] * pixel (~pixel@81.195.203.34) has joined #ceph
[15:11] <pixel> Hi everybody, Is it possible to get access to data which is located in .rgw.buckets pool directly via console?
[15:12] <absynth> hm, can i colocate two osds and one mon on 1 machine to have the smallest possible ceph setup?
[15:13] <tnt> pixel: you can see the RADOS objects in there but it's non trivial to get the actual RGW "data" out of it because data isn't stored 1:1 ...
[15:13] * BillK (~BillK@124-169-221-201.dyn.iinet.net.au) Quit (Remote host closed the connection)
[15:13] <tnt> absynth: yes, it'll work.
[15:13] <tnt> I use that as a test setup
[15:14] <pixel> tnt: Do you mean this command: rados -p .rgw.buckets ls ?
[15:14] <tnt> yes
[15:15] <pixel> <tnt> thx!
[15:16] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) has joined #ceph
[15:16] <tnt> Anyone know how often rgw is supposed to cleanup deleted objects ?
[15:18] * loicd (~loic@87-231-103-102.rev.numericable.fr) has joined #ceph
[15:21] * madkiss (~madkiss@91.82.208.7) has joined #ceph
[15:23] * madkiss (~madkiss@91.82.208.7) Quit ()
[15:23] * BillK (~BillK@124-169-221-201.dyn.iinet.net.au) has joined #ceph
[15:26] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[15:27] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:27] <schlitzer|work> i'm trying to create a key for a swift subuser with "radosgw-admin key create --subuser=johndoe:swift --key-type=swift"
[15:27] <schlitzer|work> but for some reason, no secret is created :-/
[15:28] <ccourtaut> schlitzer|work: yes same for me, try to add --gen-secret if i remember well
[15:29] <todin> hi, I use openstack with ceph/rbd via cinder, but somehow the cinderclient cannot connect to the rbd store. log. http://pastebin.com/BdKgzJrc
[15:30] <schlitzer|work> ccourtaut, thank you that did it.... the docs should be adjusted i guess
[15:30] <ccourtaut> schlitzer|work: i think too
[15:30] <ccourtaut> schlitzer|work: but i don't know if it is the right way to do so
[15:31] <loicd> ccourtaut: nice post http://blog.kri5.fr/ :-)
[15:32] <ccourtaut> loicd: thanks
[15:33] <schlitzer|work> hmmm, the created key is not working with the swift command line tool
[15:33] <schlitzer|work> i guess i have to escape some characters
[15:35] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:35] <ccourtaut> schlitzer|work: i think the output of radosgw-admin already escapes the / with \
[15:35] <schlitzer|work> ok, then it is simply not working :-/
[15:36] <schlitzer|work> i get a 403
[15:37] <ccourtaut> schlitzer|work: i don't know then, it worked for me yesterday on branch master
[15:37] * pixel (~pixel@81.195.203.34) Quit (Quit: Ухожу я от вас (xchat 2.4.5 или старше))
[15:37] <schlitzer|work> hmmm, i try to recreate the user
[15:40] <ccourtaut> schlitzer|work: you are trying the test with swift present in this doc : http://ceph.com/docs/master/radosgw/config/ ?
[15:41] * leo (~leo@27.106.31.68) has joined #ceph
[15:43] <schlitzer|work> yes
[15:44] * ofu (ofu@dedi3.fuckner.net) has joined #ceph
[15:44] <ofu> hi
[15:44] <ccourtaut> schlitzer|work: well i don't know then, it worked for me, you should ask to some inktank guys :)
[15:44] <ofu> hi, i am new to ceph and i have trouble getting my osds in up state
[15:45] <schlitzer|work> ccourtaut, it works now, had put the key in "", this was the error
[15:47] <ccourtaut> schlitzer|work: might be an escaping issue, though
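
    (Aside: the working sequence consolidated as a sketch; the user/subuser names follow the ceph docs example and the gateway URL is a placeholder. Note that radosgw-admin prints the secret with "/" escaped as "\/"; use the un-escaped value with the swift client:)
        radosgw-admin key create --subuser=johndoe:swift --key-type=swift --gen-secret
        swift -A http://radosgw.example.com/auth/v1.0 -U johndoe:swift -K '<generated secret>' list
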
[15:48] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[15:49] * leo (~leo@27.106.31.68) Quit (Quit: Leaving)
[15:52] * oliver1 (~oliver@p4FD071D2.dip0.t-ipconnect.de) has joined #ceph
[15:59] * virsibl (~virsibl@94.231.117.244) has left #ceph
[16:05] * jmlowe1 (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[16:05] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[16:08] * tchmnkyz (~jeremy@0001638b.user.oftc.net) has joined #ceph
[16:09] <tchmnkyz> hey guys i am seeing a small issue with my ceph cluster. it seems that most of the IO load is being directed at osd0 rather than spreading the load across the cluster evenly
[16:09] * tkensiski (~tkensiski@243.sub-70-197-16.myvzw.com) has joined #ceph
[16:09] * tkensiski (~tkensiski@243.sub-70-197-16.myvzw.com) has left #ceph
[16:10] * dosaboy (~dosaboy@host86-163-9-169.range86-163.btcentralplus.com) has joined #ceph
[16:12] * DarkAce-Z (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[16:15] <ofu> my osds are using btrfs on ubuntu and the osds are waiting on futex(0x7f39f13659d0, FUTEX_WAIT, 5169, NULL
[16:16] * dosaboy_ (~dosaboy@host86-150-245-200.range86-150.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[16:19] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[16:19] * madkiss (~madkiss@91.82.208.7) has joined #ceph
[16:23] * KippiX (~kippix@coquelicot-a.easter-eggs.com) has joined #ceph
[16:23] <ofu> http://ceph.com/releases/v0-63-released/
[16:24] <KippiX> hi, i have thought up an architecture with 2 physical nodes (dell R515) with 6 osds per node. on each node i have one "mon". And for ceph's requirements i plan to put a third "mon" on another server. This server is outside the dedicated cluster network, same as the other "mon"s in fact. So the question is whether i will get a network bottleneck? the dedicated cluster network is in lacp, so 2ge, and the other network is 1ge..
[16:25] <tnt> mon don't have much network traffic.
[16:25] <ofu> lacp will balance on mac addresses? So between 2 servers, you will only achieve 1gbit
[16:33] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[16:36] <KippiX> ofu: s/lacp/802.3ad/ and balance-rr
[16:36] * jmlowe1 (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has left #ceph
[16:37] <mrjack> i currently have 0.61.2 running - should i upgrade to 0.63 or will there be a new cuttlefish version? 0.61.3?
[16:38] <saaby> ofu: you can configure lacp to use ip/port
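
    (Aside: a sketch of what "use ip/port" looks like with Linux bonding on Debian/Ubuntu of that era; interface names and addresses are examples:)
        auto bond0
        iface bond0 inet static
            bond-slaves eth0 eth1
            bond-mode 802.3ad
            bond-xmit-hash-policy layer3+4   # hash on IP+port so two hosts can spread flows over both links
            address 192.168.0.10
            netmask 255.255.255.0
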
[16:38] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[16:40] <KippiX> tnt: but ceph clients talk to the "mon" first, no? and the traffic goes across the "mon" nodes, no?
[16:42] * tkensiski (~tkensiski@243.sub-70-197-16.myvzw.com) has joined #ceph
[16:43] * tkensiski (~tkensiski@243.sub-70-197-16.myvzw.com) has left #ceph
[16:43] <tnt> they talk to the mon only for "meta data".
[16:43] <tnt> any IO traffic goes directly to the OSD.
[16:43] <tnt> KippiX: ^^
[16:44] <KippiX> tnt: ok :)
[16:44] * TiCPU (~jeromepou@190-130.cgocable.ca) has joined #ceph
[16:45] <KippiX> tnt: do you have a link to the ceph documentation that explains this?
[16:46] <schlitzer|work> hmm, enabling rgw has created 8 new pools as far as i can tell (.rgw, .rgw.buckets, .rgw.control, and so on). if i do "ceph osd pool get .rgw.buckets pg_num" there are only 8 pgs. isn't this a somewhat small number?
[16:46] <schlitzer|work> okay, at the moment i have only 3 osds, but if i will have 9 or more osds, how can i make use of them?
[16:47] <schlitzer|work> as i understand, one pg maps exactly to one osd
[16:47] <KippiX> tnt but "metadata" work just for cephfs and radosgw no ?
[16:47] <schlitzer|work> as far as i know there is also no way to split one pg into multiple pgs, or anything like this
[16:48] <TiCPU> I currently have a setup of 6 OSDs, 3 monitors on 6 different physical servers, all the RADOS stuff works pretty well, benchmark write shows ~80MB/s and seq shows around 140MB/s. I'm using the RBD backend for Qemu and I'm encountering major slowdowns in guests (Win7 and WinXP right now), multi-second freezes. I'm using cuttlefish v0.61.2 and upgraded to Qemu 1.4.0 with the async patch, the mouse doesn't freeze now but the rest of the system does, on Ubuntu Raring, any idea?
[16:49] * jahkeup (~jahkeup@199.232.79.41) has joined #ceph
[16:49] <TiCPU> oh, and ceph -w shows almost no activity
[16:50] <TiCPU> perfmon on guests shows disk queue through the roof when frozen, I'm using VirtIO too, both writeback and none cache
[16:50] * loicd (~loic@87-231-103-102.rev.numericable.fr) Quit (Quit: Leaving.)
[16:51] <tnt> KippiX: no, it's more "cluster meta data" like pgmap / osdmap / ... and they're used by everyone.
[16:51] * loicd (~loic@87-231-103-102.rev.numericable.fr) has joined #ceph
[16:52] <KippiX> tnt: ok thx
[16:52] <tnt> schlitzer|work: yes, that's not enough pg. The best is to delete the pool and recreate it manually with more PGs.
[16:52] <tnt> usually you want pg_num ~= (100 * #OSD) / replicas
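
    (Aside: worked with the numbers from this conversation - 9 future OSDs and an assumed 2 replicas - and rounding up to a power of two as is customary: 100 * 9 / 2 = 450, so pg_num 512. A sketch of the recreate step, after deleting the existing pool as suggested below; the pool delete syntax differs between releases:)
        ceph osd pool create .rgw.buckets 512 512    # pg_num and pgp_num
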
[16:52] * loicd (~loic@87-231-103-102.rev.numericable.fr) Quit ()
[16:53] <schlitzer|work> yes i know, that was the question
[16:53] <schlitzer|work> is this something where i should open a bug report?
[16:53] * loicd (~loic@87-231-103-102.rev.numericable.fr) has joined #ceph
[16:54] <schlitzer|work> this "problem" is not mentioned in the docs directly (at least i did not find anything)
[16:54] <schlitzer|work> and do you know if splitting pg's is a planned feature?
[16:54] * loicd (~loic@87-231-103-102.rev.numericable.fr) Quit ()
[16:54] <tnt> it's planned. It's actually in cuttlefish, but experimental.
[16:54] <tnt> Although if there are no objects in the pool, it should work just fine.
[16:55] <schlitzer|work> ahh ok
[16:55] <schlitzer|work> thanks, so i will just delete all the rgw pools and recreate them with a higher PG count
[16:57] * gucki (~smuxi@77-56-36-164.dclient.hispeed.ch) has joined #ceph
[16:57] <tnt> only .rgw.buckets will receive real data
[16:57] <tnt> the rest is mostly bookkeeping and so it doesn't matter all that much IMHO
[16:59] <schlitzer|work> ok, thank you
[16:59] * drokita (~drokita@199.255.228.128) has joined #ceph
[17:00] <drokita> It looks like I am falling victim to Bug #4974 after upgrading from Bobtail to Cuttlefish. Got a down MON. What can I do to fix this?
[17:01] <tchmnkyz> tnt: any reason you can think of why my first osd would get more IO than the rest of the osds?
[17:02] <tchmnkyz> there are more reads/writes to it than to the other 5
[17:03] * madkiss (~madkiss@91.82.208.7) Quit (Quit: Leaving.)
[17:04] <tchmnkyz> tnt: iostat from all 6 nodes http://pastebin.com/055LDSD7
[17:05] <joao> drokita, which version did you upgrade to?
[17:05] <drokita> 0.61.2
[17:06] <joao> drokita, will need mon logs with 'debug mon = 20' and, if possible, a copy of the store you are trying to convert
[17:07] <joao> that bug should have been fixed for 0.61.2 iirc
[17:07] * portante (~user@66.187.233.207) has joined #ceph
[17:08] * lennox (lennox@addaitech.broker.freenet6.net) has joined #ceph
[17:08] * lennox (lennox@addaitech.broker.freenet6.net) has left #ceph
[17:08] <saaby> joao: I have saved logs and store.db from a mon crash earlier today. Interested?
[17:09] <joao> saaby, indeed I am
[17:09] <saaby> logs with debug 20
[17:09] <saaby> ok
[17:09] <saaby> here are the files: http://www.saaby.com/files/
[17:09] <joao> saaby, best kind of logs :)
[17:09] <saaby> story is, "crash_original" is the first crash since rebuild yesterday.
[17:10] <saaby> "crash_after_rebuild" contains store and logs for the crashes after I tried rebuilding the mon from scratch
[17:11] <saaby> so, I wasn't actually able to rebuild/restore the mon (which I could after the crash yesterday).
[17:11] <saaby> later I tried deleting the rather large test pool we have been using, after which I was able to rebuild and restart the mon without any problems.
[17:12] <saaby> in both cases the mon crashed continuously 2-3 secs after startup.
[17:13] <saaby> let me know if you need anything
[17:13] <joao> saaby, will do, thanks
[17:14] <saaby> cool, thanks
[17:15] <drokita> joao: http://pastebin.com/U8GBeK7B
[17:17] <saaby> btw, I am wondering, is having reached pgmap version +1mio in a few weeks normal? : "pgmap v1128547"
[17:19] <joao> drokita, you must have aborted a conversion while it was on-going, or the monitor crashed while doing so
[17:19] <drokita> That was what I read from the log, but not sure how to go about fixing now
[17:19] <joao> drokita, that error you just pastebined will go away if you 'mv /srv/mon.a/store.db /srv/mon.a/store.db.old' and re-run the monitor
[17:19] <joao> btw
[17:20] <drokita> ok
[17:20] <joao> drokita, if the monitor asserts out while converting, please redo what I just said (moving store.db out of the way) and rerun the monitor with 'debug mon = 20' and then send the logs my way :)
[17:21] <joao> also, conversion may take a while
[17:21] <joao> depending on the size of the original store
[17:21] <drokita> It is running right now
[17:21] <joao> okay, let me know how it goes
[17:22] <drokita> Thanks for the help!!! Worked. You are both a gentleman and a scholar.
[17:23] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[17:24] <joao> great; I worried a bit when you mentioned #4974 :)
[17:24] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:25] <drokita> I apologize, the path I took through the interwebs showed that error and a link to 4974.
[17:27] * diegows (~diegows@190.190.2.126) Quit (Read error: Operation timed out)
[17:28] * The_Bishop (~bishop@2001:470:50b6:0:d14c:a623:a4fd:1381) Quit (Ping timeout: 480 seconds)
[17:31] <drokita> joao: So after fixing the one monitor, and moving through the second monitor successfully.... the third monitor failed with something different.
[17:31] <drokita> Going to pastebin it
[17:31] <joao> drokita, k thanks
[17:33] <drokita> http://pastebin.com/jivSiXbD
[17:33] <drokita> It says something to the effect that the existing monitor store has not been converted to .52 bobtail
[17:34] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[17:35] <joao> drokita, http://tracker.ceph.com/issues/4747
[17:35] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:36] * The_Bishop (~bishop@2001:470:50b6:0:6dd1:495c:667:a5e6) has joined #ceph
[17:37] <tchmnkyz> anyone else able to help with the issue i am seeing?
[17:37] <tnt> tchmnkyz: maybe the data repartition is uneven
[17:38] <tchmnkyz> is there a easy way to check that?
[17:39] <tchmnkyz> sorry if i am newbish on this
[17:39] <tnt> check the used disk space on each osd :)
[17:39] <drokita> joao: I read through the issue report, but did not see a resolution. Did I miss it?
[17:41] <tchmnkyz> tnt: looks pretty even
[17:41] <tchmnkyz> anywhere from 900GB to 1.2 tb
[17:44] <tnt> tchmnkyz: well, then I'm not sure. And I gotta go right now.
[17:45] <tchmnkyz> k
[17:45] <tchmnkyz> have a good one
[17:46] <joao> drokita, can you 'ls -d /srv/mon.c/*_gv' ?
[17:50] <drokita> auth_gv logm_gv osdmap_gv pgmap_gv
[17:50] <joao> drokita, is that all?
[17:51] <drokita> yes
[17:51] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:52] <joao> drokita, okay, can you check if that matches the other monitor you successfully converted earlier?
[17:52] <drokita> It does
[17:52] <joao> cool
[17:52] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:52] <joao> just to make sure, do you have a quorum already?
[17:54] <drokita> e1: 3 mons at {a=10.32.12.80:6789/0,b=10.32.12.81:6789/0,c=10.32.12.82:6789/0}, election epoch 13470, quorum 0,1 a,b
[17:55] * joshd (~jdurgin@2602:306:c5db:310:459d:83f:2de:a3f7) has joined #ceph
[17:55] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[17:55] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[17:56] <joao> alright then, here's the thing: I'm not sure how Greg went about recovering a monitor from that state, but given the cause it's likely that the solution is just to copy over the 'feature_set' file from one of the already successfully converted monitors to the monitor you are having trouble converting
[17:56] <joao> an alternative would be to recreate that monitor
[17:56] <joao> and let it sync from the existing quorum
[17:57] <joao> gregaf, want to chime in?
[17:57] <drokita> Well... copying the feature_set file seems less painful than recreating, so let's try that first
[17:57] <joao> okay
[17:57] <joao> drokita, make sure to backup your monitor store first though
[17:57] <joao> just in case
[17:58] <joao> conversion is a read-only process
[17:58] <drokita> store.db?
[17:58] * agh (~oftc-webi@gw-to-666.outscale.net) has joined #ceph
[17:58] <agh> Hello to all
[17:58] <agh> I've an issue with Openstack
[17:58] <joao> drokita, no, whatever you have under /srv/mon.foo
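[Note: a hedged sketch of the feature_set copy joao suggests, assuming mon.a converted cleanly, mon.c is the one refusing to convert, the file sits at the top of each mon data directory under /srv/mon.<id>, and the two monitors live on different hosts. Back up the whole store first, as stated above.]

    cp -a /srv/mon.c /srv/mon.c.backup                      # full backup of the troubled mon's data dir
    scp converted-host:/srv/mon.a/feature_set /srv/mon.c/   # borrow feature_set from an already-converted mon
    ceph-mon -i c                                           # restart mon.c and let it retry the conversion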
[17:58] <agh> I can't attach a RBD volume to a running instance
[17:59] <agh> because Libvirt uses the IDE disk bus
[17:59] <agh> (which does not support hot plug)
[17:59] <TiCPU> agh, you absolutely need virtio !
[17:59] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[17:59] <agh> TiCPU: Yes, sure, but how do I tell OpenStack to use virtio?
[17:59] <TiCPU> performance-wise and recommended for hotplug
[18:00] <TiCPU> I do not use openstack though :/
[18:00] <agh> TiCPU: mm... :'(
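[Note: the channel never answers agh's question; one common approach (an assumption, not from this log) is to tag the Glance image so Nova's libvirt driver builds the guest with virtio devices, which in turn allows hot-plugging RBD volumes. The property names and client syntax should be checked against your OpenStack release.]

    glance image-update <image-id> --property hw_disk_bus=virtio --property hw_vif_model=virtio
    # instances booted from the updated image get virtio disks/NICs; existing IDE-bus guests keep their old bus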
[18:01] <drokita> joao: Seems to be working.... at least not failing outright
[18:02] <joao> drokita, the conversion?
[18:02] <joao> or is the monitor already in the quorum?
[18:02] <drokita> monitor is up
[18:02] <drokita> quorum is set
[18:02] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[18:02] <joao> oh, cool
[18:03] <drokita> e1: 3 mons at {a=10.32.12.80:6789/0,b=10.32.12.81:6789/0,c=10.32.12.82:6789/0}, election epoch 13472, quorum 0,1,2 a,b,c
[18:03] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:03] <joao> let me know if something goes awry
[18:03] <joao> glad it worked :)
[18:03] <drokita> Wow... the OSDs will be easier, right?
[18:04] <joao> drokita, you managed to trigger a not-that-nasty-but-annoying bug introduced sometime around bobtail; I don't think the osds suffer from the same sort of problems :)
[18:05] <drokita> Thanks again for the help!!!
[18:05] <joao> just glad it wasn't something weirder and harder to assess
[18:06] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Connection reset by peer)
[18:06] <Maskul> hey guys, quick question, is it possible with a normal laptop to set up a basic ceph test lab? or do i need more powerful hardware?
[18:07] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[18:08] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[18:08] * BillK (~BillK@124-169-221-201.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:09] <joao> Maskul, depends on what you want to test, but yeah, it's perfectly fine to set up a simple test cluster on a laptop
[18:09] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[18:09] * oliver1 (~oliver@p4FD071D2.dip0.t-ipconnect.de) has left #ceph
[18:09] <Maskul> joao, i just want to test it out, to see how it works, to get myself familiar with it :)
[18:10] <joao> Maskul, fwiw, under src/ there's a 'vstart.sh' script that is quite awesome for setting up a simple test cluster
[18:11] <joao> 'MON=3 OSD=2 MDS=1 ./vstart.sh -n' will set up a 3-monitor, 2-osd, 1-mds cluster and run the appropriate daemons
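[Note: a small usage sketch of the vstart.sh workflow joao describes, run from the src/ directory of a built source tree; the ./ceph wrapper and ./stop.sh teardown script are assumptions based on the usual tree layout.]

    cd ceph/src
    MON=3 OSD=2 MDS=1 ./vstart.sh -n -d    # -n: create a brand new toy cluster, -d: debug output
    ./ceph -s                              # talk to the toy cluster via the local wrapper
    ./stop.sh                              # tear all the local daemons down when finished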
[18:11] <Maskul> joao sorry for this newb question, can i just install 2 VMs running debian to test out ceph or do they need to run on a hypervisor?
[18:12] <joao> Maskul, I never tried it, but I guess it's fine too
[18:12] * portante (~user@66.187.233.207) Quit (Ping timeout: 480 seconds)
[18:12] * tnt (~tnt@91.176.25.109) has joined #ceph
[18:13] <joao> oh, full disclaimer: I don't use a client for ceph either via fuse or the kernel client
[18:13] <joao> so I'm not sure if that would work
[18:13] * gucki (~smuxi@77-56-36-164.dclient.hispeed.ch) Quit (Remote host closed the connection)
[18:13] <Maskul> alright thank you :)
[18:13] <joao> I suppose it would though
[18:13] <Maskul> ill have a look into setting up a home lab with ceph
[18:21] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[18:21] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[18:23] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:27] * nhm (~nhm@66.155.246.141) has joined #ceph
[18:29] * vata (~vata@2607:fad8:4:6:5572:f623:24ae:63cb) has joined #ceph
[18:33] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:34] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[18:52] * diegows (~diegows@200.68.116.185) has joined #ceph
[18:54] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[19:06] * dalgaaf (~dalgaaf@nat.nue.novell.com) Quit (Quit: Konversation terminated!)
[19:11] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:12] * davidzlap1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[19:14] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:14] * portante (~user@66.187.233.206) has joined #ceph
[19:23] <mrjack> joao: health HEALTH_WARN 1 mons down, quorum 1,2 0,1 - is there a reason why ceph "named" my monitors with numbers other than their ids? it is confusing ;) is the first output after quorum the id or the second? ;)
[19:24] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Read error: Operation timed out)
[19:25] <joao> mrjack, names are '1' and '2', ranks are '0' and '1'
[19:25] <mrjack> joao: ranks == id?
[19:25] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[19:26] <joao> mrjack, names are ids; ranks are to assess who's to be the leader
[19:26] <mrjack> joao: then i think there is something messed up? ;) because ceph-mon with id 2 is down currently..
[19:27] <mrjack> joao: 0 and 1 are up
[19:27] <mrjack> joao: on my other ceph clusters, i see quorum 0,1,2,3,4 0,1,2,3,4
[19:28] <joao> mrjack, I have to run, but if you pastebin the output from 'ceph mon_status', I'll be happy to take a look when I get back (in an hour or so)
[19:29] <mrjack> joao: ok
[19:37] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[19:43] * partner (joonas@ajaton.net) has joined #ceph
[19:43] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[19:46] <tnt> sagewk: I'll try out wip-5176-cuttlefish tomorrow first thing.
[19:46] <sagewk> tnt: awesome, thanks
[19:46] <sagewk> mikedawson: around?
[19:47] <cjh_> librbd is just exposing functionality from librados to python right?
[19:47] <cjh_> i saw in .63 librbd can now read from local replicas which is awesome. I'm assuming librados has that ability also right?
[19:48] <tnt> cjh_: no. librbd is for RBD. it has bindings in several languages.
[19:49] <cjh_> oh sorry i have mixed them up again :)
[19:49] <tnt> librados is for raw RADOS access and also has bindings for several languages.
[19:49] <cjh_> gotcha
[19:49] <cjh_> so if i wanted to roll my own client i would use librados
[19:50] <cjh_> if i wanted to just get rbd access i could use librbd python
[19:50] <tnt> pretty much
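[Note: a rough command-line analogy for the split tnt describes, offered as an illustration rather than something from this log: the rados tool speaks raw RADOS the way librados does, while the rbd tool works at the image layer like librbd. Pool and image names are placeholders.]

    rados -p rbd ls             # raw object listing of a pool: librados territory
    rbd ls rbd                  # image listing in the same pool: librbd territory
    rbd info rbd/myimage        # per-image metadata maintained by the rbd layer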
[19:50] * MapspaM (~clint@xencbyrum2.srihosting.com) has joined #ceph
[19:50] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (resistance.oftc.net osmotic.oftc.net)
[19:50] * eternaleye (~eternaley@c-50-132-41-203.hsd1.wa.comcast.net) Quit (resistance.oftc.net osmotic.oftc.net)
[19:50] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (resistance.oftc.net osmotic.oftc.net)
[19:50] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (resistance.oftc.net osmotic.oftc.net)
[19:50] * Meths (rift@2.25.189.26) Quit (resistance.oftc.net osmotic.oftc.net)
[19:50] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit (resistance.oftc.net osmotic.oftc.net)
[19:50] * Meths (rift@2.25.189.26) has joined #ceph
[19:50] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[19:50] * eternaleye_ (~eternaley@c-50-132-41-203.hsd1.wa.comcast.net) has joined #ceph
[19:51] * eternaleye_ is now known as eternaleye
[19:51] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[19:52] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[19:55] * Volture (~quassel@office.meganet.ru) has joined #ceph
[19:55] * Volture (~quassel@office.meganet.ru) Quit ()
[19:55] * Volture (~quassel@office.meganet.ru) has joined #ceph
[19:56] <cjh_> tnt: i'd be interested to check out the patch that makes librbd read from local replicas
[19:56] <cjh_> that seems interesting
[19:56] <cjh_> i wonder if that could be extended to prefer local rack over some foreign rack
[19:58] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[20:00] <mrjack> joao: http://pastebin.com/vbtHz1hw
[20:01] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:05] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) has joined #ceph
[20:07] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:07] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:11] * joshd1 (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[20:13] * yasu` (~yasu`@dhcp-59-219.cse.ucsc.edu) has joined #ceph
[20:14] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:14] * MrNPP (~MrNPP@216.152.240.194) Quit (Read error: Operation timed out)
[20:14] <cjh_> is the librbd choose local automatic or can it be set in librbd?
[20:17] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[20:19] * MrNPP (~MrNPP@216.152.240.194) has joined #ceph
[20:21] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:21] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[20:23] * madkiss (~madkiss@91.82.208.7) has joined #ceph
[20:26] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Remote host closed the connection)
[20:28] * jahkeup (~jahkeup@199.232.79.41) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[20:28] * tziOm (~bjornar@ti0099a340-dhcp0745.bb.online.no) has joined #ceph
[20:29] * jahkeup (~jahkeup@69.43.65.180) has joined #ceph
[20:30] * bergerx_ (~bekir@78.188.101.175) Quit (Quit: Leaving.)
[20:31] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[20:34] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[20:37] <tnt> joao: I had the mon "freeze" a couple of times just saying "2013-05-29 18:33:52.596505 7fcdf146a700 1 mon.a@0(synchronizing sync( requester state start )) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere"
[20:37] <tnt> can that be related to any of the fixes?
[20:37] <tnt> (basically it seems it's not processing any messages but the other 2 mon also don't realize that and they don't take over ...)
[20:45] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[20:45] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:48] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:49] <mikedawson> glowell: saw the update on issue 4834. Are you planning to build debs, too?
[20:55] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[20:55] <glowell> At the moment the plan is just CentOS/RHEL rpms. Is there a particular distro you were thinking of? I thought most already had qemu+rbd support.
[20:55] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[20:56] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Operation timed out)
[20:58] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[20:58] * ChanServ sets mode +o scuttlemonkey
[21:04] * Wolff_John (~jwolff@ftp.monarch-beverage.com) has joined #ceph
[21:07] <mikedawson> glowell: I'm looking for Raring debs for qemu with joshd's patch backported
[21:07] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[21:13] <joao> tnt, whenever the monitor is synchronizing (or, basically, out of the quorum), to guarantee correctness, it won't take client messages
[21:14] <joao> but it should be marked out of the quorum at some point (tbh, it shouldn't even have gotten into the quorum)
[21:14] <joao> tnt, is that monitor in the quorum?
[21:18] <joao> mrjack, I really wish nobody used numbers as monitor names :(
[21:18] <joao> it's just so damn confusing :p
[21:20] <joao> mrjack, from that pastebin, all monitors appear to be up and running
[21:21] <dmick> joao: we could change status prints to indicate rank (if that's what that number is) differently
[21:21] <dmick> R1 vs 1 or something
[21:21] <joao> despite the confusion that may arise with names and ranks, the cluster seems fine
[21:21] <dmick> of course then someone will name their monitors R1, R2, R3...
[21:21] <joao> dmick, yeah :p
[21:22] * Maskul (~Maskul@host-89-241-174-13.as13285.net) Quit (Quit: Maskul)
[21:22] <darkfader> R1 is really stupid since that's an SRDF term
[21:22] <joao> but yeah, we could do something about it on 'ceph -s' at least
[21:22] <darkfader> R1 is the primary storage and R2 secondary in a realtime mirror
[21:23] <joao> something that would simply differentiate between the names and the ranks, but I worry that will break scripts relying on 'ceph -s'
[21:24] <joao> then again, not sure why we should care for scripts still relying on 'ceph -s' plain text format when we have json
[21:24] <tnt> joao: well, it shouldn't have left it ... the 3 were in quorum, then I added two new OSDs and in the very beginning somehow the master got kicked into that weird state and the two peons stayed peons and didn't take over.
[21:24] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:24] <joao> tnt, so the leader just started synchronizing out of the blue?
[21:24] <joao> tnt, I would love to look into that log
[21:26] * MapspaM is now known as SpamapS
[21:26] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:35] * alop (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) has joined #ceph
[21:35] <tnt> joao: http://pastebin.com/raw.php?i=4ig4LP9B
[21:36] <tnt> joao: basically it started when I started osd.12 (which I had just created) and it did a "osd crush create-or-move 12 0.27 root=default host=angelstit-00 v 0"
[21:36] <alop> anyone know what the clock skew threshold for ceph is? I have three nodes within the same second using the same ntp servers, and I'm getting a warning
[21:37] <tnt> 0.5s IIRC
[21:37] <joao> used to be half a sec
[21:37] <alop> hehe, thanks for the info
[21:38] <alop> touchy, I guess
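[Note: the threshold tnt and joao quote corresponds to the 'mon clock drift allowed' option; a hedged ceph.conf sketch follows. Raising it only hides skew, so tightening NTP is usually the better fix, and the exact default has varied between releases.]

    [mon]
        mon clock drift allowed = 0.5    # seconds of skew tolerated before the HEALTH_WARN (example value)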
[21:39] * joshd1 (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[21:43] * fridudad (~oftc-webi@p5B09D334.dip0.t-ipconnect.de) has joined #ceph
[21:45] * joshd1 (~joshd@2607:f298:a:607:ac93:ff05:d54d:d7b4) has joined #ceph
[21:45] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[21:45] * ChanServ sets mode +o elder
[21:47] * eschenal (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:50] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[21:53] <TiCPU> is it possible you can't map a version 2 image using rbd map?
[21:53] <tnt> yes
[21:53] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[21:53] <tnt> depends on your kernel version
[21:53] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[21:53] * ChanServ sets mode +o elder
[21:53] <tnt> version 2 images are only recently supported.
[21:56] <TiCPU> Linux Ceph1 3.9.4-030904-generic #201305241545 SMP
[21:56] <TiCPU> rbd: add failed: (6) No such device or address
[21:58] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:59] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:59] * alop (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) Quit (Quit: alop)
[21:59] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[22:02] * alop (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) has joined #ceph
[22:03] <TiCPU> mapping a brand new v1 image works, v2 fails
[22:03] <mrjack> joao: the cluster is not running fine
[22:04] <mrjack> joao: well there is io and i can access it... but there is definitely one monitor not running (mon.2)
[22:04] <mrjack> joao: the one with name "2" is not running
[22:05] <joao> and 'ceph mon_status' still shows it in the quorum?
[22:05] <mrjack> joao: no it is missing..
[22:05] <mrjack> but
[22:05] <mrjack> even when i start it, it does not join
[22:05] <tnt> TiCPU: if your image 2 uses layering, you need 3.10 ...
[22:05] <joao> mrjack, is it running the same version as the others?
[22:06] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[22:06] * ChanServ sets mode +o elder
[22:07] <mrjack> joao: yes, 0.61.2
[22:08] <TiCPU> hahaha.. it just seems I'm constantly installing new kernels
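[Note: a hedged sketch of the format 1 vs format 2 distinction being discussed, using the cuttlefish-era rbd CLI flags; pool and image names are placeholders, and as tnt notes above, format 2 images (especially with layering) need a recent kernel rbd client.]

    rbd create rbd/v1img --size 1024 --format 1    # format 1: mappable by older kernels
    rbd create rbd/v2img --size 1024 --format 2    # format 2: newer features, newer krbd required
    rbd map rbd/v2img                              # fails with (6) No such device or address on old kernels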
[22:09] <joao> mrjack, sorry, dinner is ready; bbl
[22:10] <jerker> regarding the mail on the mailing list: the Wikipedia articles on distributed file systems and related topics have become even messier since I tried to clean them up a few years ago... People not sharing my point of view (read: idiots) have removed concepts like distributed file system and merged them into clustered file system. My head hurts.
[22:11] <jerker> At least the Ceph page is okayish.
[22:14] <scuttlemonkey> jerker: which page is hosed?
[22:14] <scuttlemonkey> I have poked the ceph page a few times, could add another to my watchlist
[22:14] <janos> jerker: are you named after the desk?
[22:15] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[22:15] <cjh_> does that rados command put/get objects in parallel?
[22:15] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Remote host closed the connection)
[22:16] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[22:18] <TiCPU> cjh_, what command?
[22:18] * alop_ (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) has joined #ceph
[22:18] <cjh_> TiCPU: rados put for example
[22:18] <jerker> janos: the desk must have been named after me. Jerker is an old Swedish name that was a nickname of Erik -> Eriker -> Jeriker -> Jerker
[22:19] <cjh_> rados bench seems to run really quickly but when i use put or get it's slow
[22:19] <cjh_> about 1/10th the speed
[22:19] <TiCPU> cjh_, bench uses 16 streams
[22:19] <TiCPU> (by default, or use -t 1 to go single-threaded)
[22:19] <janos> jerker: ah. it's a rather esteemed (but no longer produced) ikea desk. i'm still at a jerker right now
[22:20] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[22:20] <jerker> scuttlemonkey: now that I read it more closely it is not as bad as I first thought, http://en.wikipedia.org/wiki/Clustered_file_system but still I would not have made it like that.
[22:20] <cjh_> TiCPU: ok i see. I didn't know i had to do rados get -t 16 -p some_pool object outfile
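[Note: a sketch contrasting the two commands being discussed; -t is a bench option, so 'rados bench' keeps many operations in flight (16 by default) while a plain put or get moves a single object, which accounts for the speed gap cjh_ sees. Pool, object, and file names are placeholders.]

    rados -p testpool bench 30 write -t 16    # 30 seconds of writes with 16 concurrent ops
    rados -p testpool put myobj ./infile      # single object, one op at a time
    rados -p testpool get myobj ./outfile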
[22:20] <jerker> scuttlemonkey: what i miss is the good old http://en.wikipedia.org/wiki/Distributed_file_system
[22:20] <scuttlemonkey> jerker: ahh, hadn't looked at this one
[22:21] <jerker> janos: im in a jerker right now!
[22:21] <janos> haha
[22:21] <gregaf> I assume we're talking about people thinking ext4 in two places is a good idea
[22:21] * alop (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) Quit (Ping timeout: 480 seconds)
[22:21] * alop_ is now known as alop
[22:22] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:22] <cjh_> does ceph always fill up a PG before moving on to the next one?
[22:23] * vata (~vata@2607:fad8:4:6:5572:f623:24ae:63cb) Quit (Quit: Leaving.)
[22:24] <jerker> gregaf: yup, but wikipedia is not really a good place to learn about the concepts. it could be a lot clearer. I tried. Got tangled in arguments. Got fed up.
[22:24] <gregaf> yeah
[22:24] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:24] * ChanServ sets mode +v andreask
[22:29] <mrjack> joao: it seems the mon is running, but not in quorum... from the logs i see mon.2@0(probing).data_health(0), mon.2@0(synchronizing sync( requester state chunks )).data_health(0), and the store is now 2.5GB and still growing...
[22:36] <jlogan1> Trying to upgrade to 0.61.2 and have "journal is corrupt" on some drives on one host.
[22:36] <jlogan1> What is the best way to get these OSD back into service?
[22:38] * schwarzenegro (~schwarzen@190.179.55.33) has joined #ceph
[22:38] <jlogan1> http://pastebin.com/tXaGMqrd
[22:42] <sjusthm> journal_ignore_corruption will disable the check
[22:42] <sjusthm> but it probably indicates that your journals are corrupt
[22:42] <sjusthm> in which case the OSDs are toast
[22:44] <jlogan1> sjusthm: what is the best way to rebuild the OSD?
[22:44] <sjusthm> kill it and recreate it
[22:44] <sjusthm> I think
[22:44] <sjusthm> I think the details are in the docs
[22:44] <jlogan1> ok. Do you want to see the ceph -s output?
[22:44] <sjusthm> do you have a question about it?
[22:45] <jlogan1> Just trying to make sure the degraded state will not get worse if I kill the osd.
[22:45] <sjusthm> what's the ceph -s output?
[22:45] <jlogan1> health HEALTH_WARN 1489 pgs backfill; 6 pgs backfilling; 1486 pgs degraded; 3 pgs recovering; 71 pgs recovery_wait; 1569 pgs stuck unclean; recovery 569523/2067626 degraded (27.545%);
[22:46] <sjusthm> are the osds with the problem running?
[22:46] <jlogan1> no, they die on startup
[22:46] <sjusthm> then that cluster state already accounts for the dead osds
[22:46] <sjusthm> you probably want to wait for it to recover though
[22:46] <sjusthm> if you have enough slack space
[22:47] <sjusthm> in a pinch, you can resurrect the dead osds with the config I mentioned before
[22:47] * yehuda_hm (~yehuda@2602:306:330b:1410:28a0:55bc:d720:f9ef) Quit (Ping timeout: 480 seconds)
[22:49] * eschenal (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) Quit (Quit: Leaving)
[22:50] * yehuda_hm (~yehuda@2602:306:330b:1410:6885:1334:8c26:70e9) has joined #ceph
[22:52] <jlogan1> sjusthm: Thanks. We will let it run over night and see how it looks in the morning.
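[Note: a hedged outline of the "kill it and recreate it" route sjusthm points at; the osd id is a placeholder, and the exact recreation steps depend on the release and on whether ceph-deploy/ceph-disk manages the OSD, so follow the add/remove-OSD section of the docs for the authoritative sequence.]

    ceph osd out 3                # drain the dead osd (the cluster is already rebalancing around it)
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3
    ceph osd create               # hands back a fresh id for the replacement
    ceph-osd -i 3 --mkfs --mkkey  # after mounting a clean data dir at /var/lib/ceph/osd/ceph-3
    # then register the new key with 'ceph auth add' and start the daemon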
[22:53] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:53] * ChanServ sets mode +v leseb
[23:05] * yehuda_hm (~yehuda@2602:306:330b:1410:6885:1334:8c26:70e9) Quit (Ping timeout: 480 seconds)
[23:05] * Wolff_John (~jwolff@ftp.monarch-beverage.com) Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130511120803])
[23:07] <fridudad> rbd snap rollback showed progress in bobtail; in cuttlefish I see no output...
[23:07] <fridudad> is this correct?
[23:14] * yehuda_hm (~yehuda@2602:306:330b:1410:6885:1334:8c26:70e9) has joined #ceph
[23:14] * eegiks (~quassel@2a01:e35:8a2c:b230:3c6d:e42b:7cbf:28db) Quit (Remote host closed the connection)
[23:15] * eegiks (~quassel@2a01:e35:8a2c:b230:b593:6630:8f9d:771) has joined #ceph
[23:16] * madkiss (~madkiss@91.82.208.7) Quit (Quit: Leaving.)
[23:29] * jahkeup (~jahkeup@69.43.65.180) Quit (Ping timeout: 480 seconds)
[23:35] <mikedawson> sage, sagewk: does the wip-5176-cuttlefish branch include the changes in wip-4895-cuttlefish?
[23:35] <sagewk> mikedawson: yeah, those are in the cuttlefish branch now.
[23:36] <mikedawson> sagewk: Excellent. They have been stable for me. I'll give wip-5176-cuttlefish a try soon.
[23:37] <sagewk> great. the two questions are if (1) it still prevents growth, and (2) the io load is lower
[23:38] * portante (~user@66.187.233.206) Quit (Quit: bye)
[23:39] <mikedawson> sagewk: I have had no growth since wip-4895-cuttlefish with 'mon compact on trim = true'. Do you want to test any specific config settings against wip-5176-cuttlefish?
[23:39] <sagewk> nope, same config
[23:39] <sagewk> this just makes the trim cheaper
[23:39] <sagewk> er, compaction
[23:40] <mikedawson> sagewk: Great. I have stats polling / graphing on a 10s interval to see what it does to io load
[23:44] <nhm> sagewk: going to try to test tonight
[23:44] <sagewk>
[23:45] <sagewk> awesome, thanks guys!
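[Note: the setting mikedawson has been running with, shown as a ceph.conf sketch; the [mon] section placement is an assumption, and the option only matters on monitor hosts.]

    [mon]
        mon compact on trim = true    # compact the mon's leveldb store whenever it trims, to keep growth bounded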
[23:56] * tziOm (~bjornar@ti0099a340-dhcp0745.bb.online.no) Quit (Remote host closed the connection)
[23:58] <tnt> Damn, data movement after adding a couple of OSDs is much more intensive than I would have thought. 4h to move 200G ... and surprisingly it seems to stress the mons as well.

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.