#ceph IRC Log


IRC Log for 2012-11-22

Timestamps are in GMT/BST.

[0:00] <gregaf> the mds cache size setting should, but if clients do mean things it can get overruled
[0:04] <lurbs> plut0: Depends on your cluster, network speed, and workload. Enough to soak up a short burst of writes, which I believe will all be sequential onto whichever device(s) are doing the journalling.
[0:04] <dmick> lurbs: yes sequential
[0:04] <plut0> lurbs: it gets flushed to disk soon after right?
[0:04] <dmick> plut0: yes, and tunable
[0:04] <dmick> http://ceph.com/docs/master/rados/configuration/journal-ref/
[0:04] <plut0> so doesn't need to be larger than size of RAM
[0:05] <dmick> plut0: not necessarily related to RAM size
[0:05] <fmarchand2> nhm : apparently it's not yet fixed. Gregory is still waiting for an output from the user.
[0:06] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:06] <fmarchand2> gregaf : oki :)
[0:06] <joao> <dmick> oh man that was a weekend
[0:06] <dmick> Gregory == gregaf btw
[0:06] <joao> +1
[0:06] <fmarchand2> hi joao !
[0:06] <joao> hello there
[0:07] <fmarchand2> dmick : ahah thx ! :)
[0:07] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[0:08] * s_parlane (~scott@gate1.alliedtelesyn.co.nz) has joined #ceph
[0:09] <plut0> is a daemon the same as a node?
[0:09] <dmick> plut0: in what context?
[0:09] <dmick> people may use them for the same thing, or may not
[0:10] <plut0> "Our rule of thumb estimate is that you should have 1GHz of CPU and 1GB of RAM per daemon"
[0:10] * Tamil (~Adium@ Quit (Quit: Leaving.)
[0:10] <s_parlane> plut0: thats from the osd section right ?
[0:10] <fmarchand2> gregaf : sorry I didn't realize it was you ! it's late here !
[0:10] <dmick> yeah, that means daemon, or process
[0:10] <plut0> s_parlane: yes
[0:11] <s_parlane> then it's per osd instance, so if you have 4 disks and run one instance per disk, then 4x (1GB + 1GHz)
[0:11] <gregaf> :)
[0:11] <plut0> gotcha
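
The rule of thumb quoted above (1GHz of CPU and 1GB of RAM per daemon) scales linearly with the number of OSD instances. A minimal sketch of that arithmetic, assuming one OSD daemon per disk; the function name and its defaults are illustrative and not part of Ceph:

```python
def osd_host_requirements(num_osds, ghz_per_daemon=1.0, gb_ram_per_daemon=1.0):
    """Estimate total CPU (GHz) and RAM (GB) for a host running num_osds OSD daemons."""
    if num_osds < 0:
        raise ValueError("num_osds must be non-negative")
    return {
        "cpu_ghz": num_osds * ghz_per_daemon,
        "ram_gb": num_osds * gb_ram_per_daemon,
    }


# Four disks, one OSD instance per disk, as in s_parlane's example.
print(osd_host_requirements(4))
```

This is only a sizing estimate; as plut0 notes, a GHz figure does not map cleanly onto cores.
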
[0:12] <fmarchand2> gregaf : so you're still waiting for an output from this user ?
[0:12] <plut0> 1ghz is ambiguous though, would be better if they could say X daemons per CPU core
[0:13] <gregaf> fmarchand2: not sure which output you're referring to :)
[0:14] <dmick> from http://www.spinics.net/lists/ceph-devel/msg09173.html
[0:14] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[0:14] <fmarchand2> gregaf : ceph mds tell 0 dumpcache
[0:15] <gregaf> oh, Tren and I had a long discussion
[0:15] <gregaf> it ended when the nodes got taken away from him for more profitable uses before we'd found the problem :(
[0:15] <gregaf> but pretty sure it was related to the VFS holding a bunch of inodes in cache and preventing the MDS from dumping them
[0:16] <fmarchand2> gregaf : got taken away from him ? what do you mean ?
[0:16] <gregaf> I wait hopefully for the point when I have time to investigate it more deeply
[0:16] <gregaf> he was working at a company and he had them for a while for test purposes but the company wanted them for production
[0:16] <gregaf> they were like 192GB RAM boxes
[0:17] <fmarchand2> gregaf : oh so 50gb of cache was not a so big deal :)
[0:17] <gregaf> as it turns out, yeah
[0:18] <fmarchand2> gregaf : that's huge ... they could almost have whole osd's in RAM !!!
[0:18] <nhm> gregaf: interesting problem
[0:19] <fmarchand2> gregaf : I think I have the same pb but on another scale ... only 7GB of ram ... I mean it looks like ...
[0:22] <fmarchand2> gregaf : I restarted my mds today ... but I can run the dumpcache command when I will see the mds growing in memory, if you want.
[0:22] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Quit: Leaving.)
[0:23] * tnt (~tnt@162.63-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[0:24] <gregaf> fmarchand2: I'm not sure I'll have time to do much about it right now, but more info is always helpful
[0:26] <fmarchand2> gregaf : You know I will do it maybe next week and I'll ask you where to send it so you'll have the file in case you have time
[0:28] <fmarchand2> gregaf : I just have to run the command you put in the last mail of you're discussion with tren ?
[0:29] <fmarchand2> your
[0:31] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[0:39] * wer (~wer@wer.youfarted.net) Quit (Read error: Connection reset by peer)
[0:40] * loicd (~loic@pat35-5-78-226-56-155.fbx.proxad.net) Quit (Quit: Leaving.)
[0:40] * jlogan (~Thunderbi@2600:c00:3010:1:1ccf:467e:284:aea8) Quit (Ping timeout: 480 seconds)
[0:41] * Tamil (~Adium@ has joined #ceph
[0:43] <gregaf> fmarchand2: I believe so, yep
[0:43] * calebamiles (~caleb@65-183-137-95-dhcp.burlingtontelecom.net) Quit (Quit: Leaving.)
[0:44] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Quit: Leaving)
[0:48] * wer (~wer@wer.youfarted.net) has joined #ceph
[0:48] * maxiz (~pfliu@ has joined #ceph
[0:52] * xiaoxi (~xiaoxiche@ has joined #ceph
[0:53] <xiaoxi> joshd:sorry,I just fall asleep last night ;) actually I am using rbd pool now
[1:10] * fmarchand2 (~fmarchand@ Quit (Ping timeout: 480 seconds)
[1:17] * MarkS (~mark@irssi.mscholten.eu) Quit (Ping timeout: 480 seconds)
[1:25] * Guest6374 (~CristianD@host165.186-108-123.telecom.net.ar) Quit ()
[1:27] * benpol (~benp@garage.reed.edu) has left #ceph
[1:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:36] * LeaChim (~LeaChim@b0fa82fd.bb.sky.com) Quit (Remote host closed the connection)
[1:37] <xiaoxi> got error when trying to attach rbd to qemu instance
[1:37] <xiaoxi> root@compute01:/var/lib/nova/instances/instance-0000008a# virsh attach-device --domain instance-0000008a --file temp.xml
[1:37] <xiaoxi> error: Failed to attach device from temp.xml
[1:37] <xiaoxi> error: operation failed: open disk image file failed
[1:37] * weber (~he@61-64-87-236-adsl-tai.dynamic.so-net.net.tw) Quit (Read error: Connection reset by peer)
[1:38] <gregaf> I don't have much experience with virsh, but isn't that command asking to attach the file temp.xml to the instance? rather than the volume described by temp.xml?
[1:39] <xiaoxi> a volume described by temp.xml
[1:39] <gregaf> dmick, do you know? or is joshd back?
[1:42] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[1:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:44] <dmick> sorry
[1:45] <dmick> afaik the only way to add rbd images to a libvirt vm is to edit the domain xml
[1:45] <dmick> and add a <disk> section
[1:45] <dmick> http://ceph.com/wiki/QEMU-RBD#Virtual_disks is the only documentation I know of for it so far
[1:46] <dmick> http://ceph.com/deprecated/QEMU-RBD#Virtual_disks, sorry
[1:47] * yanzheng (~zhyan@ has joined #ceph
[1:47] * yoshi (~yoshi@p11251-ipngn4301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:49] <xiaoxi> added and rebooted,doesn't work.
[1:49] <xiaoxi> rbd volume can online attach, I have tried with 0.48(by using nova)
[1:50] <dmick> xiaoxi: what's the failure message?
[1:52] <dmick> and does the emulator named in <emulator> show output for "--drive format=? | grep rbd"?
[1:52] <dmick> and can you show your <disk> stanza?
[1:52] <xiaoxi> dmick:how can I see the failure message?I cannot find any error info in qemu/instancexxxxxx.log
[1:53] <xiaoxi> <disk type="network" device="disk">
[1:53] <xiaoxi> <driver name="qemu" type="raw" cache="none"/>
[1:53] <xiaoxi> <source protocol="rbd" name="nova/volume-3452f4a8-0a04-4ce5-bfe8-7a1ec481165b"/>
[1:53] <xiaoxi> <target bus="virtio" dev="vdb"/>
[1:53] <xiaoxi> <serial>3452f4a8-0a04-4ce5-bfe8-7a1ec481165b</serial>
[1:53] <xiaoxi> </disk>
[1:53] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[1:53] <xiaoxi> here is the rbd info:
[1:53] <xiaoxi> root@compute01:/var/lib/nova/instances/instance-0000008a# rbd info nova/volume-3452f4a8-0a04-4ce5-bfe8-7a1ec481165b
[1:53] <xiaoxi> rbd image 'volume-3452f4a8-0a04-4ce5-bfe8-7a1ec481165b':
[1:53] <xiaoxi> size 5120 MB in 1280 objects
[1:53] <xiaoxi> order 22 (4096 KB objects)
[1:53] <xiaoxi> block_name_prefix: rbd_data.242a415bd260
[1:53] <xiaoxi> format: 2
[1:53] <xiaoxi> features: layering
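
The figures in the `rbd info` output above are consistent with each other: order 22 means objects of 2^22 bytes (4096 KB), and a 5120 MB image split into objects of that size gives 1280 objects. A small sketch of that arithmetic; the helper name is illustrative only:

```python
def rbd_object_layout(size_mb, order):
    """Return (object_count, object_size_kb) for an image of size_mb with 2**order byte objects."""
    object_bytes = 1 << order
    size_bytes = size_mb * 1024 * 1024
    # Round up so a partially filled final object is still counted.
    object_count = -(-size_bytes // object_bytes)
    return object_count, object_bytes // 1024


print(rbd_object_layout(5120, 22))  # (1280, 4096)
```
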
[1:54] <dmick> well, how does it "not work"?
[1:54] <dmick> you mean the disk just doesn't show up?
[1:54] <xiaoxi> yes
[1:55] <dmick> hm. that's not the usual failure mode
[1:55] <dmick> are you sure it's just not where you're not looking? dev is only a hint
[1:55] <dmick> it could be /dev/sdb, or /dev/hdb
[1:56] <xiaoxi> sure, since I use virtio, it should be vdb, but actually there is no sdb or hdb under /dev
[1:57] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has left #ceph
[1:58] <joao> any of you guys still around?
[1:58] <dmick> xiaoxi: did you try "kvm --drive format=? | grep rbd"
[1:58] <xiaoxi> well, I tried removing my ceph.conf, and I got an error message saying "need mon addr", but this time there isn't one. So I assume the xml and libvirt are fine and can talk to the mon
[1:58] <dmick> (if kvm is your <emulator>)
[1:59] <xiaoxi> root@compute01:/var/lib/nova/instances/instance-0000008a# kvm --drive format=? | grep rbd
[1:59] <xiaoxi> Supported formats: vvfat vpc vmdk vdi sheepdog rbd raw host_cdrom host_floppy host_device file qed qcow2 qcow parallels nbd dmg tftp ftps ftp https http cow cloop bochs blkverify blkdebug
[2:00] <joao> dmick, have a moment? need a second pair of eyes to make sure I'm not the only one misinterpreting what's on the docs :)
[2:00] <dmick> joao, sure
[2:00] <joao> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[2:00] <joao> step 8 of "adding an osd (manual)"
[2:01] <dmick> xiaoxi: I really don't know then; if the vm starts with no errors I would expect it to be working
[2:01] <joao> would you say that the 'pool={pool-name}' is the only argument after the {weight} that is mandatory?
[2:01] <dmick> you can turn on libvirt debugging with /etc/libvirt/libvirtd.conf, although you may have to futz with app-armor
[2:02] <dmick> again from that wiki
[2:02] <dmick> Note: With Ubuntu apparmor blocks access to /etc/ceph/ceph.conf by default causing a permission denied error, the quick fix is to change /etc/apparmor.d/abstractions/libvirt-qemu to allow access.
[2:02] <dmick> same applies to debug log files
[2:04] <lurbs> The libvirt-bin package from the Ubuntu cloud archive repository fixes at least the access to ceph.conf
[2:04] <lurbs> https://wiki.ubuntu.com/ServerTeam/CloudArchive#How_to_Enable_and_Use
[2:05] <dmick> yes, quantal's has that, at least
[2:11] <xiaoxi> dmick:after I reboot libvirt....it works well.even with attach-device it works too
[2:13] <dmick> hmm
[2:14] <dmick> well isn't that interesting :)
[2:14] <dmick> and joao, sorry, looking
[2:14] <dmick> do you mean step 9, I assume?
[2:15] <dmick> and the crush map is probably my least understood part of the software, so I'm not sure
[2:16] <xiaoxi> not interesting at all :( This made me work until midnight last night. aha, but it's daytime in China now
[2:17] <dmick> xiaoxi: sorry. Did you upgrade libvirt or qemu or some related package since rebooting /restarting libvirtd, maybe?...or perhaps it's just a libvirtd bug
[2:18] <gregaf> joao: dmick: joshd changed that recently to match the rest of the docs
[2:18] <gregaf> the problem is that you need to put the OSD in some bucket CRUSH knows about, and can specify extras/others if you like
[2:19] <gregaf> so if all your OSDs are rooted at the pool=, then yes, that's required
[2:19] <gregaf> if they're all rooted in a host which the system already knows about, that's required and nothing else is
[2:19] <gregaf> it doesn't lend itself to friendly parsing without convincing John and the readers to deal in generic CRUSH buckets instead of the defined default ones
[2:21] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[2:21] * Tamil (~Adium@ has left #ceph
[2:22] <joao> gregaf, I found that a bit odd, and tried it a couple of times this morning assuming the remaining buckets as optional, and just specifying a 'pool=data' for instance would return -EINVAL
[2:22] * The_Bishop (~bishop@2001:470:50b6:0:1d2f:9f6:a2bb:df4d) Quit (Ping timeout: 480 seconds)
[2:22] <gregaf> joao: it's not talking about RADOS pools...
[2:22] <joao> oh
[2:22] <joao> oops
[2:22] <gregaf> this is a CRUSH bucket, "pool" as in "pool of storage", which we renamed to "root" in Bobtail in order to prevent this confusion
[2:23] <dmick> I think we should schedule a CRUSH map webinar :)
[2:23] <joao> well, carry on then ;)
[2:23] <joao> I'm just glad I didn't spend that much time trying to figure that one out
[2:23] <joao> :p
[2:24] <joao> would have been a poor way to spend time assuming it was some sort of error
[2:25] <joao> gregaf, thanks btw
[2:25] <gregaf> :)
[2:25] <joao> heading off to bed as soon as I guarantee the new task options are working properly
[2:26] <joao> damned python and its implicit data types drive me nuts
[2:27] <dmick> relax. strong typing is so...anal-retentive. :)
[2:27] <joao> lol
[2:28] <lurbs> Just had a bunch of slow requests crop up after the cluster did what looks to be a scrub.
[2:29] <lurbs> http://paste.nothing.net.nz/47498a
[2:30] <lurbs> And a few PGs got marked as active+clean+scrubbing
[2:30] <gregaf> lurbs: oooh, do you have any OSD logging?
[2:30] * The_Bishop (~bishop@2001:470:50b6:0:7021:1145:970c:6795) has joined #ceph
[2:30] <gregaf> and what version?
[2:30] <lurbs> 0.54
[2:30] <gregaf> I think this is a bug that sjust is trying to diagnose
[2:31] <lurbs> Just standard logging, debugging levels aren't high.
[2:31] <gregaf> okay, bummer :(
[2:31] <lurbs> What would you want?
[2:31] <gregaf> in that case, I recommend restarting the OSDs in question
[2:31] <gregaf> I'm not sure exactly; I think he's trying to reproduce it but he needs some pretty high debug levels in order to get enough information out
[2:32] <lurbs> I can do that now. All of the debug stuff goes into the [global] section?
[2:33] <gregaf> well, it's not helpful unless it was running when the requests got stuck, unfortunately
[2:34] <gregaf> and the logs grow very quickly with that much debugging enabled
[2:34] <gregaf> but if you want to try it, then put it into the OSD or global section
[2:34] <gregaf> I think "debug osd = 20" "debug ms = 1" captures everything he's interested in
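
The two settings gregaf mentions can be written into the OSD (or global) section of ceph.conf. A minimal sketch that renders such a fragment with Python's standard `configparser`; the function name is hypothetical, and the section choice follows gregaf's "OSD or global" remark:

```python
import configparser
import io


def osd_debug_fragment(debug_osd=20, debug_ms=1, section="osd"):
    """Render a ceph.conf-style fragment carrying the two debug settings."""
    parser = configparser.ConfigParser()
    parser[section] = {"debug osd": str(debug_osd), "debug ms": str(debug_ms)}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


print(osd_debug_fragment())
```

As noted above, logs grow very quickly at these levels, so this is something to enable only while trying to capture the problem.
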
[2:35] * maxiz_ (~pfliu@ has joined #ceph
[2:36] <lurbs> Enabled now.
[2:37] <lurbs> About the third time it's triggered on this particular cluster, so hopefully I'll catch it again.
[2:38] <gregaf> ah, that would be convenient then :)
[2:38] <gregaf> okay, I'm heading out for the night, later all!
[2:38] <gregaf> and happy Thanksgiving!
[2:38] <lurbs> Is it likely to recur if I force a scrub of every PG?
[2:39] <gregaf> lurbs: I'm not sure about the likelihood, but yes, it has something to do with scrubs and client requests intersecting badly
[2:39] <gregaf> so that should increase the changes
[2:39] <gregaf> *chances
[2:39] <lurbs> I'll loop it, and try to capture something useful.
[2:41] <gregaf> thanks
[2:42] * maxiz (~pfliu@ Quit (Ping timeout: 480 seconds)
[2:48] * timmclaughlin (~timmclaug@173-25-192-164.client.mchsi.com) has joined #ceph
[3:07] <lurbs> gregaf: Just triggered it again, I think.
[3:07] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:08] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:08] <dmick> he has gone for the day, sadly lurbs
[3:08] <lurbs> Yeah, figure he'll read scrollback or something.
[3:08] <lurbs> Stupid time zones. :)
[3:25] <xiaoxi> Happy thanksgiving~ I am having turkey now
[3:30] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[3:34] * MarkS (~mark@irssi.mscholten.eu) has joined #ceph
[3:37] * lotia (~lotia@l.monkey.org) Quit (Ping timeout: 480 seconds)
[3:42] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[3:43] * renzhi (~renzhi@ has joined #ceph
[3:43] <renzhi> morning
[3:44] <renzhi> is there a way to undelete an object in ceph, if the object was deleted by accident?
[3:46] <dmick> renzhi: nothing I'm aware of
[3:47] * yoshi (~yoshi@p11251-ipngn4301marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[3:47] <lurbs> 2012-11-22 15:05:12.550586 7f4f67cc4700 20 osd.16 216 scrub_should_schedule loadavg 0.14 < max 0.5 = yes
[3:47] <lurbs> The auto scrub won't run at all if the load average is > 0.5?
[3:48] <dmick> http://ceph.com/docs/master/rados/configuration/osd-config-ref
[3:49] <dmick> scrub load threshold
[3:49] <lurbs> Got it, thanks.
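
The log line lurbs pasted shows a plain comparison of the load average against a maximum ("loadavg 0.14 < max 0.5 = yes"). A sketch that mirrors only that comparison; the real OSD scheduling logic may take other factors into account, and the function below is an illustration rather than Ceph code:

```python
def scrub_should_schedule(loadavg, max_load=0.5):
    """Mirror the comparison in the log line: scrub only when load is below the threshold."""
    return loadavg < max_load


print(scrub_should_schedule(0.14))  # the logged case, below the 0.5 threshold
print(scrub_should_schedule(0.80))  # a busier host would skip the automatic scrub
```
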
[3:50] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[3:55] * stass (stas@ssh.deglitch.com) has joined #ceph
[4:07] * timmclaughlin (~timmclaug@173-25-192-164.client.mchsi.com) Quit (Remote host closed the connection)
[4:07] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[4:20] <buck> I'm trying to use the java bindings on an amd-64 platform and the jni call is erroring off, saying it cannot find libcrypto++.so.8 ,but googling a bit leaves me thinking that that is a 32-bit library. Has anyone run across something like this before?
[4:20] <xiaoxi> Hi, some days ago I complained about rbd's sequential write performance, and the feedback was that my SSD (130MB/s for sequential write) is the bottleneck. But today I swapped all the SSDs for Intel 520 Series, which advertise 480MB/s for sequential write, and when I run dd, ~200MB/s is measured.
[4:21] <xiaoxi> So if the SSD is the bottleneck, I would expect doubled performance, but the result is sad, almost the same for 4M sequential write
[4:21] <dmick> buck: $ file /usr/lib/libcrypto++.so.9.0.0
[4:21] <dmick> /usr/lib/libcrypto++.so.9.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x5ec93acfc636c3935da0fd23b210206a6b342e7a, stripped
[4:21] <dmick> I don't have 8
[4:22] <dmick> (this is quantal)
[4:22] <buck> I'm on the same release and I'm seeing the same thing. I have 9, not 8
[4:22] <buck> but the error message is very specifically saying it cannot find 8.....hmmm.... I'm wondering if the jni code needs tweaking.
[4:22] <buck> meh, I'll dig into this next week. It's time to make some pies. Thanks for chiming in dmick (and your help earlier today)
[4:23] <dmick> jni...is binary lib linked, right?
[4:23] <dmick> if it were linked against a specific version, that would make it require that version, yah?
[4:23] <dmick> one wonders why it's not linked against the non-versioned; don't know what gnu ld will do for "version >= x"
[4:23] <buck> Um.....it picks up some libs from the installed JDK, so it's doing at least that bit as a shared library
[4:23] <dmick> precise apparently also has 9
[4:24] <buck> alright, now you've got me interested. I'll go take a peek at the make files
[4:25] <dmick> oneiric *also* has 9
[4:25] <dmick> confused
[4:27] <buck> yeah....this is odd. You looked at all 64-bit machines, yeah?
[4:27] <dmick> yeah
[4:28] <buck> even i386 precise has .9
[4:28] <buck> goofy
[4:29] <dmick> you just built this jni?
[4:31] <dmick> (do_autogen.sh confuses me; how am I supposed to supply --enable-java-cephfs?)
[4:31] <buck> no. It's been part of the ceph build for a while. I'm just trying to run it on some different hosts than I normally do
[4:31] <buck> ./configure --enable-java-cephfs
[4:31] <buck> I'm sure this is a config thing on my side
[4:31] <dmick> yes, but that misses all the stuff do_autogen.sh adds
[4:31] <buck> oh, sorry
[4:32] <renzhi> dmick: thanks, but bad for me though :(
[4:33] <dmick> @@ -30,7 +30,6 @@ die() {
[4:33] <cephalobot> dmick: Error: "@" is not a valid command.
[4:33] <dmick> debug_level=0
[4:33] <dmick> verbose=0
[4:33] <dmick> profile=0
[4:33] <dmick> -CONFIGURE_FLAGS=""
[4:33] <dmick> while getopts "d:e:hHTPjpnvO:" flag
[4:33] <dmick> do
[4:33] <dmick> case $flag in
[4:33] <dmick> renzhi: snapshots of precious data ?...
[4:35] <dmick> xiaoxi: if the SSD were the *only* bottleneck.
[4:35] <dmick> performance studies always amount to "finding the next problem"
[4:36] <renzhi> dmick: well, some objects, and someone has made a mistake by deleting them
[4:37] <dmick> renzhi: sure. sadly undelete is often not really an option
[4:37] <renzhi> dmick: ok, I guess all objects should be considered precious anyway, that's why in our app, we never delete any
[4:37] <dmick> rm -rf / is also painful
[4:38] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[4:38] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[4:39] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:41] <dmick> buck: too dumb to figure out how to make it I guess
[4:43] <dmick> ./configure --prefix=/usr --sbindir=/sbin --localstatedir=/var --sysconfdir=/etc --with-debug --with-cryptopp --with-radosgw --enable-java-cephfs, but still no makey makey
[4:50] * deepsa (~deepsa@ has joined #ceph
[4:50] <renzhi> dmick: really like to have some capabilities on the pool level, e.g. that pool is read/write only, no delete, or something like that.
[4:51] <dmick> you can certainly make it readonly
[4:52] <dmick> read/write/nodelete is a little weird; does anything let you do that?
[4:53] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[5:02] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) has joined #ceph
[5:02] * ChanServ sets mode +o scuttlemonkey
[5:15] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) Quit (Quit: This computer has gone to sleep)
[5:22] <elder> dmick, my code just worked
[5:22] * maxiz_ (~pfliu@ Quit (Ping timeout: 480 seconds)
[5:23] <elder> Yip skip!
[5:23] <dmick> uh, good?
[5:23] <elder> It's my new rbd request code.
[5:23] <elder> The whole thing.
[5:23] <dmick> oh cool
[5:24] <elder> I could be leaking a bit, I haven't checked all that yet. But my short test ran to completion without a problem for the first time just now.
[5:28] <dmick> great news.
[5:28] <elder> I think so...
[5:32] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[5:32] <dmick> I'm picking apart python exceptions
[5:33] <dmick> and exploring the wonders of shell background procs and wait statements
[5:33] <dmick> it's pretty awesome
[5:34] * maxiz (~pfliu@ has joined #ceph
[5:36] <dmick> ok. I have a lock/blacklist test. cool. enough for tonight
[5:36] <dmick> happy t-day everyone
[5:36] * dmick (~dmick@2607:f298:a:607:c116:87f8:cf74:fa68) Quit (Quit: Leaving.)
[5:47] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:47] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:48] * yoshi (~yoshi@p11251-ipngn4301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[5:48] <nhm> elder: which rbd request code is this?
[5:49] <elder> I'm basically rewriting some of the core code that handles requests.
[5:49] <nhm> nice!
[5:49] <nhm> huh, slashdot is down.
[5:50] <elder> Well I tried not to, but the way some of the stuff was structured just made it very hard and unnatural to try to resubmit requests to parent images for layered rbd images.
[5:50] <nhm> there it goes.
[5:50] <elder> David Porter says "hi" to you, by the way.
[5:50] <nhm> elder: Oh good! I've been meaning to catch up with him. Where did you see him?
[5:51] <nhm> elder: I can tell him I finally got an account at Oak Ridge.
[5:51] <elder> We went to a concert together on Sunday night, along with Tom Ruwart, Russell Cattelan and another friend of Tom's.
[5:52] * plut0 (~cory@pool-96-236-43-69.albyny.fios.verizon.net) Quit (Quit: Leaving.)
[5:52] <nhm> Elder: That's great.
[5:53] <elder> He was asking about where I worked and it occurred to me he might know you and he said he did.
[5:53] <elder> He seems very happy at what his job has become.
[5:54] <elder> I think the new director over there has been good for his role.
[5:54] * maxiz_ (~pfliu@ has joined #ceph
[5:54] <nhm> elder I'm glad they appreciate him, he's one of the best they have.
[5:54] <nhm> elder: I learned about overlapping IO from him.
[5:55] <elder> Overlapping I/O and computation?
[5:55] <phantomcircuit> im trying to setup a "cluster" on a single machine
[5:55] <phantomcircuit> running gentoo
[5:55] <nhm> elder: Yeah, and how they do it effectively in Paul's lab.
[5:55] <phantomcircuit> it seems like mkcephfs is working
[5:55] <phantomcircuit> but when i run /etc/init.d/ceph start
[5:55] <phantomcircuit> nothing happens
[5:55] <phantomcircuit> :(
[5:56] <elder> nhm, we were working on that when I was there. I gave a presentation on it called "Extreme Shared Memory" at ANL I think.
[5:56] <nhm> phantomcircuit: any logs?
[5:56] * calebamiles (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) has joined #ceph
[5:57] <elder> I thought it was very interesting and that they could benefit from some of the scheduling algorithms used for CPU functional units. But nobody really got that... Tomasulo's algorithm was what I was thinking about at the time.
[5:57] <nhm> elder: Yeah, I think we spoke about that a bit before. I should go look up your presentation.
[5:57] * maxiz (~pfliu@ Quit (Ping timeout: 480 seconds)
[5:57] <elder> You may not find it.
[5:57] <elder> It was before you could find much useful on the web I think.
[5:58] <nhm> elder: So what has David's position become?
[5:58] <phantomcircuit> nhm, that's the weird thing
[5:58] <phantomcircuit> afaict there are no logs
[5:59] <nhm> phantomcircuit: strange. Do the ceph.conf entries properly point to that machine? Maybe it doesn't think any of the daemons are supposed to run there?
[5:59] <elder> He's still a "technical consultant" or something. But apparently he had been making suggestions in vain about how to make MSI operate more collaboratively. The new director showed up and sent David a long list of all his suggestions, and wanted to talk about getting them implemented.
[6:00] <elder> In any case it seems like he's spending his time doing things he wants to.
[6:00] <phantomcircuit> nhm, the machines hostname is a full domain name, it's in /etc/hosts but dns resolves to another machine
[6:00] <phantomcircuit> would that cause problems
[6:00] <nhm> elder: David and I had long discussions about how to fix MSI while I was there. David and I were probably the two people most stuck between the two warring factions.
[6:01] <elder> Well maybe it's getting fixed. You should get in touch with him again to see.
[6:01] <nhm> elder: Many of the key players left either right before I did or a couple of months afterward. It's good that David was able to help put things back together.
[6:01] <elder> It was nice seeing him, I hadn't talked with him in years.
[6:02] <nhm> phantomcircuit: I think /etc/hosts should override it, but this really sounds like a dns/hostname/ip issue.
[6:03] <phantomcircuit> hmm
[6:03] <nhm> elder: speaking of which, we need to get together for lunch again so I can give you your raspberry pi.
[6:03] <elder> Yes we do. Maybe next week.
[6:03] <phantomcircuit> mkcephfs is setting up properly so i would assume it's the same check but i guess maybe not
[6:04] <nhm> phantomcircuit: I could be wrong too. :)
[6:04] <phantomcircuit> yeah that's what it was
[6:05] <nhm> bad dns?
[6:06] <phantomcircuit> hostname command returned the hostname without the domain portion
[6:06] <phantomcircuit> so it decided it's a different host
[6:06] <nhm> ah!
[6:06] <phantomcircuit> which doesn't seem right
[6:06] <phantomcircuit> but of well
[6:06] <phantomcircuit> oh*
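
The mismatch phantomcircuit describes (a fully qualified name in the configuration versus the short name reported by `hostname`) can be illustrated with a simple comparison. This sketch does not reproduce the init script's actual logic; the function names and the example hostnames are hypothetical:

```python
def short_hostname(name):
    """Strip the domain portion, e.g. 'node1.example.com' -> 'node1'."""
    return name.split(".", 1)[0]


def hosts_match(configured, reported, compare_short=False):
    """Compare a configured host entry with the name the machine reports."""
    if compare_short:
        return short_hostname(configured) == short_hostname(reported)
    return configured == reported


# Exact comparison treats the FQDN and the short name as different hosts.
print(hosts_match("node1.example.com", "node1"))
# Comparing only the short names treats them as the same host.
print(hosts_match("node1.example.com", "node1", compare_short=True))
```
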
[6:14] * The_Bishop (~bishop@2001:470:50b6:0:7021:1145:970c:6795) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[6:18] <tore_> BTW, I went back through logs and found some proof that ZFS does use SMART data to assess drive health
[6:18] <tore_> Check out this chunk of the log
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted fmd: [ID 377184 daemon.error] SUNW-MSG-ID: DISK-8000-0X, TYPE: Fault, VER: 1, SEVERITY: Major
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted EVENT-TIME: Wed Mar 14 17:03:33 JST 2012
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted PLATFORM: X8DTS, CSN: 1234567890, HOSTNAME: hostname_omitted
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted SOURCE: eft, REV: 1.16
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted EVENT-ID: 11fb6e3e-e9d2-cd4a-cb25-ca3fa258e4e5
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted DESC: SMART health-monitoring firmware reported that a disk
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted failure is imminent.
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted Refer to http://sun.com/msg/DISK-8000-0X for more information.
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted AUTO-RESPONSE: None.
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted IMPACT: It is likely that the continued operation of
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted this disk will result in data loss.
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted REC-ACTION: Schedule a repair procedure to replace the affected disk.
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted Use fmdump -v -u <EVENT_ID> to identify the disk.
[6:20] <tore_> Mar 14 17:03:33 hostname_omitted genunix: [ID 846333 kern.warning] WARNING: constraints forbid retire: /scsi_vhci/disk@g5000c500261cfcd3
[6:20] <tore_> admin@hostname_omitted:/var/log$ fmdump -v -u 11fb6e3e-e9d2-cd4a-cb25-ca3fa258e4e5
[6:20] <tore_> TIME UUID SUNW-MSG-ID
[6:20] <tore_> Mar 14 17:03:33.1891 11fb6e3e-e9d2-cd4a-cb25-ca3fa258e4e5 DISK-8000-0X
[6:20] <tore_> 100% fault.io.disk.predictive-failure
[6:20] <tore_> basically the disk was racking up grown defects
[6:27] <tore_> eventually the system allowed ejection of the drive once a spare became available
[6:28] <tore_> Another good snippet to look at is:
[6:28] <tore_> extended device statistics ---- errors ---
[6:28] <tore_> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
[6:28] <tore_> 9.9 9.4 526.0 85.1 0.1 0.1 4.3 4.2 2 5 20 0 20 40 c2d0
[6:28] <tore_> 1.0 1.0 1.0 1.0 0.0 0.0 0.0 3.1 0 1 0 0 0 0 c0t5000C500261D0853d0
[6:28] <tore_> 12.1 24.6 20.6 120.3 0.0 0.6 0.0 16.7 0 8 0 0 0 0 c0t5000C500261C5DF3d0
[6:29] <tore_> 0.2 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0 0 4812 0 0 4812 c0t50015179594E0E10d0
[6:29] <tore_> 2.1 6.0 5.3 9.9 0.0 0.1 0.0 15.6 0 2 0 0 0 0 c0t5000C500261D3E67d0
[6:29] <tore_> 0.8 0.5 4.5 40.4 0.0 0.0 0.0 1.7 0 0 4812 0 0 4812 c0t50015179594E6795d0
[6:29] <tore_> 0.2 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0 0 4812 0 0 4812 c0t50015179594E1129d0
[6:29] <tore_> 0.8 0.5 4.5 40.6 0.0 0.0 0.0 1.7 0 0 4812 0 0 4812 c0t50015179594E677Dd0
[6:29] <tore_> 0.8 0.5 4.5 40.5 0.0 0.0 0.0 1.7 0 0 4812 0 0 4812 c0t50015179594E67D9d0
[6:29] <tore_> 12.1 24.3 20.7 120.3 0.0 0.7 0.0 18.1 0 9 0 0 0 0 c0t5000C500261BD7CBd0
[6:29] <tore_> 11.9 24.4 20.5 120.0 0.0 0.6 0.0 16.9 0 9 0 36 83 119 c0t5000C500261C701Bd0
[6:29] <tore_> 0.8 0.5 4.5 40.4 0.0 0.0 0.0 1.7 0 0 4812 0 0 4812 c0t50015179594E66EDd0
[6:29] <tore_> 10.4 23.5 17.5 115.3 0.0 0.6 0.0 16.4 0 8 0 0 0 0 c0t5000C500261CF49Fd0
[6:29] <tore_> 11.6 24.6 20.1 120.3 0.0 0.6 0.0 17.2 0 8 0 0 0 0 c0t5000C500261C153Fd0
[6:29] <tore_> 0.2 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0 0 4812 0 0 4812 c0t50015179594E0C9Cd0
[6:29] <tore_> 2.6 9.4 4.7 41.9 0.0 0.2 0.0 13.0 0 2 0 0 0 0 sd15
[6:29] <tore_> 0.2 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0 0 4812 0 0 4812 c0t50015179594E0CE0d0
[6:29] <tore_> 7.9 20.1 12.5 117.4 0.0 1.4 1.2 48.4 1 37 0 51460 75449 126909 c0t5000C500261C85B3d0
[6:29] <tore_> 10.4 23.4 17.5 115.4 0.0 0.6 0.0 16.6 0 8 0 0 0 0 c0t5000C500261C8333d0
[6:29] <tore_> 12.1 25.0 20.9 121.0 0.0 0.6 0.0 16.5 0 8 0 0 0 0 c0t5000C500261C50A3d0
[6:29] <tore_> 12.0 24.6 20.6 120.3 0.0 0.6 0.0 16.8 0 8 0 0 0 0 c0t5000C500261CFCD3d0
[6:29] <tore_> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c0t5000C500261C0E37d0
[6:29] <tore_> 12.1 25.0 21.0 121.0 0.0 0.6 0.0 17.6 0 9 0 0 0 0 c0t5000C5002619EC77d0
[6:29] <tore_> 12.0 24.5 20.5 120.0 0.0 0.6 0.0 16.5 0 8 0 0 0 0 c0t5000C500261CEE07d0
[6:29] <tore_> 10.4 23.3 17.5 115.3 0.0 0.6 0.0 16.6 0 8 0 0 0 0 c0t5000C500261C1D17d0
[6:29] <tore_> 10.4 23.5 17.4 115.3 0.0 0.5 0.0 16.1 0 8 0 0 0 0 c0t5000C500261D34B7d0
[6:29] <tore_> 12.1 25.1 20.9 121.0 0.0 0.6 0.0 16.0 0 8 0 0 0 0 c0t5000C500261CFC47d0
[6:29] <tore_> 11.8 24.6 20.0 119.9 0.0 0.6 0.0 15.9 0 8 0 0 0 0 c0t5000C500261C1AB7d0
[6:29] <tore_> 11.8 25.1 20.4 121.0 0.0 0.6 0.0 16.3 0 8 0 0 0 0 c0t5000C500261C0F57d0
[6:29] <tore_> 12.0 24.4 20.6 120.3 0.0 0.6 0.0 17.7 0 9 0 0 0 0 c0t5000C500261CC18Bd0
[6:29] <tore_> 12.0 24.2 20.6 120.0 0.0 0.6 0.0 17.6 0 9 0 0 0 0 c0t5000C500261D16EBd0
[6:29] <tore_> 3.9 6.7 5.3 29.8 0.0 0.2 0.0 14.3 0 2 0 0 0 0 sd31
[6:29] <tore_> 12.1 24.2 20.5 119.9 0.0 0.6 0.0 16.9 0 8 0 0 0 0 c0t5000C500261BF45Fd0
[6:30] <tore_> 3.0 7.8 7.5 13.7 0.0 0.1 0.0 10.0 0 2 0 0 0 0 c0t5000C50033E908DFd0
[6:30] <tore_> 0.2 0.0 0.3 0.0 0.0 0.0 0.0 4.0 0 0 0 1 0 1 c0t5000C500344C1D07d0
[6:32] <tore_> If you will, notice the drive with the average service time of 48.4. In this case the drive was taking 4 times longer than other drives to perform IO operations, but ZFS did not kick this drive out as one would have expected with all those hardware and transmission errors. Instead, it left it in and the IO just kept dropping off for the entire disk set. Striped sets are only as fast as their slowest performing drive with ZFS. This is one of the main
[6:32] <tore_> reasons why I'm moving away from ZFS and gravitating towards software like CEPH.
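The slow drive tore_ points out can be spotted mechanically. Below is a minimal sketch, assuming the pasted rows follow the Solaris `iostat -xne` column order (so the 8th field is the average service time and the last field is the device); that layout is an assumption, as are the function names and the 2x-median threshold.

```python
from statistics import median

# A few rows copied from the paste above, with the IRC prefix removed.
# Assumed column order: r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b
# s/w h/w trn tot device.
SAMPLE = """\
12.1 24.6 20.6 120.3 0.0 0.6 0.0 16.7 0 8 0 0 0 0 c0t5000C500261C5DF3d0
11.9 24.4 20.5 120.0 0.0 0.6 0.0 16.9 0 9 0 36 83 119 c0t5000C500261C701Bd0
7.9 20.1 12.5 117.4 0.0 1.4 1.2 48.4 1 37 0 51460 75449 126909 c0t5000C500261C85B3d0
10.4 23.4 17.5 115.4 0.0 0.6 0.0 16.6 0 8 0 0 0 0 c0t5000C500261C8333d0
12.1 25.0 20.9 121.0 0.0 0.6 0.0 16.5 0 8 0 0 0 0 c0t5000C500261C50A3d0
"""


def parse_rows(text):
    """Turn whitespace-separated iostat rows into small dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 15:
            continue  # skip anything that does not match the assumed layout
        rows.append({
            "device": fields[14],
            "asvc_t": float(fields[7]),   # average service time
            "errors": int(fields[13]),    # total error count
        })
    return rows


def slow_devices(rows, factor=2.0):
    """Return devices whose service time exceeds factor * the median."""
    baseline = median(row["asvc_t"] for row in rows)
    return [row["device"] for row in rows if row["asvc_t"] > factor * baseline]


if __name__ == "__main__":
    print(slow_devices(parse_rows(SAMPLE)))
```

Comparing against the median rather than the mean keeps one bad drive from dragging the baseline up with it.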
[6:48] * s_parlane (~scott@gate1.alliedtelesyn.co.nz) Quit (Ping timeout: 480 seconds)
[6:51] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[6:58] * `gregorg` (~Greg@ has joined #ceph
[6:59] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[6:59] <phantomcircuit> http://pastebin.com/raw.php?i=PSthnBAv
[6:59] <phantomcircuit> trying to setup a qemu instance using ceph/rbd as disk backing
[6:59] <phantomcircuit> getting
[6:59] <phantomcircuit> could not open disk image rbd:email:auth_supported=none:mon_host=\:6789: Invalid argument
[7:09] <phantomcircuit> nvm i see my mistake
[7:09] <phantomcircuit> hooray
[7:10] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[7:24] * yanzheng (~zhyan@ has joined #ceph
[7:27] * loicd (~loic@pat35-5-78-226-56-155.fbx.proxad.net) has joined #ceph
[7:28] * The_Bishop (~bishop@f052100089.adsl.alicedsl.de) has joined #ceph
[7:37] * loicd (~loic@pat35-5-78-226-56-155.fbx.proxad.net) Quit (Quit: Leaving.)
[7:52] * tnt (~tnt@162.63-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:55] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[8:09] * yanzheng (~zhyan@ has joined #ceph
[8:11] <renzhi> does anyone know how much time it would take to take a snapshot on a live system with over 15 million objects?
[8:14] * loicd (~loic@pat35-5-78-226-56-155.fbx.proxad.net) has joined #ceph
[8:23] * loicd (~loic@pat35-5-78-226-56-155.fbx.proxad.net) Quit (Quit: Leaving.)
[8:49] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[8:51] * loicd (~loic@ has joined #ceph
[8:58] * fc (~fc@home.ploup.net) has joined #ceph
[9:04] * SIN (~SIN@ has joined #ceph
[9:27] * tnt (~tnt@162.63-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:34] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:40] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:45] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) has joined #ceph
[10:00] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[10:07] * deepsa_ (~deepsa@ has joined #ceph
[10:12] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[10:12] * deepsa_ is now known as deepsa
[10:34] * masterpe (~masterpe@ Quit (Quit: leaving)
[10:38] * Leseb (~Leseb@ has joined #ceph
[10:39] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[10:44] * loicd (~loic@ Quit (Quit: Leaving.)
[10:48] * nosebleedkt (~kostas@kotama.dataways.gr) has joined #ceph
[10:49] <nosebleedkt> joao, hi :D
[10:49] <joao> good morning :)
[10:49] * loicd (~loic@ has joined #ceph
[10:51] * renzhi is now known as _renzhi_away_
[10:58] * yoshi (~yoshi@p11251-ipngn4301marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:58] * jtangwk (~Adium@2001:770:10:500:24b3:c9e0:700a:125c) Quit (Read error: Connection reset by peer)
[10:58] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[10:58] * jtangwk (~Adium@2001:770:10:500:6482:6af9:bdc4:912c) has joined #ceph
[11:00] <nosebleedkt> joao, ready for a question storm ? :P
[11:03] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[11:04] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[11:07] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:07] <phantomcircuit> lol derp
[11:07] <phantomcircuit> i just realized both the osd's are on the same RAID1
[11:08] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:08] <phantomcircuit> so i have 4 copies of everything instead of 2
[11:08] <phantomcircuit> lol
[11:16] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[11:16] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:16] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:19] <joao> phantomcircuit, ehe
[11:19] <joao> redundancy!
[11:19] <joao> nosebleedkt, as long as I can help, sure...
[11:19] <joao> may leave mid-storm to grab more coffee though ;)
[11:20] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[11:21] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit ()
[11:23] * s_parlane (~scott@ has joined #ceph
[11:23] <nosebleedkt> :P
[11:25] <nosebleedkt> joao, why the need of multiple monitors for one cluster?
[11:26] <joao> fail tolerance
[11:26] <joao> *failure
[11:26] <joao> redundancy too
[11:27] <nosebleedkt> joao, and if one monitor is down, which one is taking charge?
[11:28] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[11:28] <joao> they elect the leader among themselves using an algorithm called Paxos (the articles are pretty awesome btw)
[11:28] <nosebleedkt> oh
[11:29] <joao> in fact, what really happens, is that the one with the highest rank (lower numerical rank value though) will be elected as the leader
[11:29] <joao> and the rank is calculated taking into consideration the lowest numerical value amongst the ip:port of the monitors
[11:30] <joao> so say that you have 3 monitors, a =, b =, c =, the leader will be mon.c
[11:31] <joao> I mean, mon.c will have rank = 0, mon.a will have rank = 1, and mon.b will have rank = 2
[11:31] <nosebleedkt> definitely you can be a teacher :D
[11:32] <phantomcircuit> joao, lol it's lvm mirrors with mirrored mirror log so every write is resulting in 2*(2+2) writes
[11:32] <joao> I'd make any student of mine's life a living hell though
[11:32] <phantomcircuit> disks are going to melt
[11:32] <joao> phantomcircuit, but what are the chances of all of them melting at the same time? :p
[11:34] <phantomcircuit> well they're identical disks under perfectly identical load purchased at the same time
[11:34] <phantomcircuit> so uh pretty high?
[11:34] <joao> oh
[11:34] <joao> well, that's a bummer then
[11:35] <joao> I'd hope that with all the redundancy flying around there you'd end up with at least one good disk just in time to replace the burnt disks
[11:35] <phantomcircuit> artificial redundancy
[11:35] <phantomcircuit> i've got two copies of the same data on the same disk
[11:36] <phantomcircuit> it's a lie :/
[11:36] <joao> yeah, far from optimal disk usage too
[11:37] <phantomcircuit> i can hear the drives from about 10ft away
[11:37] <phantomcircuit> that's probably not good for longevity
[11:37] * loicd (~loic@ Quit (Quit: Leaving.)
[11:37] <xiaoxi> joao: sorry, I am still a bit confused about the rank, why does b have rank 2?
[11:39] <joao> xiaoxi, ip on 'a' is the same as on 'b', but port on 'a' is lower than 'b'
[11:39] * maxiz_ (~pfliu@ Quit (Ping timeout: 480 seconds)
[11:40] <xiaoxi> ok, so paxos uses IP as the key for the sorting (or election, exactly)
[11:40] <joao> ip:port
[11:40] <joao> but yeah
[11:40] <xiaoxi> yeah
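The ranking joao describes can be sketched as a sort on ip:port. This is a minimal illustration of that description only, not the monitor code; the addresses in the log were stripped, so the ones below are made-up documentation addresses chosen to match the stated outcome (c has the lowest IP, a and b share an IP, a has the lower port).

```python
import ipaddress


def rank_monitors(mons):
    """Map monitor name -> rank, where rank 0 is the lowest ip:port.

    `mons` maps a monitor name to an (ip, port) tuple.
    """
    ordered = sorted(
        mons,
        key=lambda name: (ipaddress.ip_address(mons[name][0]), mons[name][1]),
    )
    return {name: rank for rank, name in enumerate(ordered)}


# Hypothetical addresses; the real ones are not in the log.
MONS = {
    "a": ("192.0.2.20", 6789),
    "b": ("192.0.2.20", 6790),
    "c": ("192.0.2.10", 6789),
}

if __name__ == "__main__":
    ranks = rank_monitors(MONS)
    leader = min(ranks, key=ranks.get)
    print(ranks, "leader:", leader)
```

Parsing the IP with `ipaddress` avoids the string-sort trap where "192.0.2.9" would sort after "192.0.2.10".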
[11:42] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:42] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:44] <xiaoxi> joao: I have another question, how does ceph remain consistent on failure? Say I have 2 copies, OSD1 as primary and OSD2 as replica. An update goes to OSD1, and OSD1 forwards the update to OSD2; OSD2 applies the update successfully but OSD1 fails and goes down.
[11:44] <dweazle> can i join in the question flood? :)
[11:44] <joao> xiaoxi, what then?
[11:44] <xiaoxi> I remember that OSD2 will become the primary then, but if OSD1 comes back again, will it align with OSD2?
[11:45] <dweazle> what happens if you restart an osd, will it resync all pg's on that osd's or just the ones that are out of sync? (thinking about rolling updates of storage nodes here)
[11:45] <joao> osd2 becomes the primary if it has the same pgs as those that were lost with osd1; when osd1 comes back it will sync with osd2 yes
[11:45] <joao> dweazle, no idea
[11:45] <dweazle> ok, that's comforting
[11:45] <dweazle> guess i'll have to try and see for myself :)
[11:46] <xiaoxi> so the sync sometimes means updates while sometimes means rollback,right?
[11:46] <joao> I would say that it would only need to sync the ones that were changed, but I have no idea if I'm right
[11:47] <xiaoxi> well, the first question for the sync should be: determine which objects are different
[11:47] <joao> I have little insight on how the recovery process between osds happens
[11:47] <xiaoxi> When you have a large number of objects, calculating hashes and comparing seems to take a long time?
[11:48] <joao> xiaoxi, it probably relies on the pgmap
[11:48] <joao> I would dig that out for you guys if I had the time to do it
[11:48] <xiaoxi> OK then,but will OSD1 grab the primary role from OSD2?
[11:49] <joao> xiaoxi, assuming that osd1 has a replica of the pg, then yes
[11:49] <joao> if there is no replica of the pg, no one will become primary and you'll end up with stuck pgs
[11:50] <xiaoxi> will such snatch happen before or after the sync?
[11:50] <joao> it will happen as soon as other osds notice that osd2 failed, they will let the monitors know of that, the monitor will update the osdmap and redistribute it
[11:51] <joao> as soon as the osdmap mentions that osd1 is down, other osds will take primary on its pgs
[11:52] <xiaoxi> yeah, OSD1 is down-> OSD2 as primary -> OSD1 back(but OSD2 still alive), you mean OSD1 will become primary as soon as it come back?
[11:52] <joao> don't know
[11:53] <joao> if it becomes primary, I would assume that the only thing that makes sense is for it to become the primary only after recovering
[11:54] <xiaoxi> thanks, it looks to me that there are still a lot of things to learn in CRUSH and recovery
[11:54] <xiaoxi> for me to learn
[11:55] <dweazle> same here
[12:00] * yanzheng (~zhyan@ has joined #ceph
[12:00] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[12:00] <xiaoxi> And from the Ceph paper (in OSDI), it says the primary will apply the update only after it collects acks from all other replicas. But some guys told me these happen in parallel, meaning the primary will also apply the update while waiting for acks from the others.
[12:01] <xiaoxi> which is true?
[12:07] <joao> don't know, but I'd say that the latter one would make sense
[12:07] <joao> given that everybody with an updated map knows who's the primary, and taking into account that it is the primary that will always handle those writes anyway, that write is valid
[12:08] <joao> but sjust is probably the right person to answer these questions
[12:08] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[12:13] <Robe> how pesky is it to move the journal in a running cluster?
[12:25] <tnt> shutdown the OSD, move the journal, start the OSD
[12:26] * MK_FG (~MK_FG@ has joined #ceph
[12:29] <Robe> ok, that sounds doable
[12:29] <Robe> just wondering if I need to care about correct journal placement upfront or if this is something I can add later on
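tnt's three steps can be written out as an ordered plan. The sketch below only builds the command list and runs nothing. Assumptions to check against your own release: the `service ceph stop/start osd.N` form of the init script, and the extra `--flush-journal` / `--mkjournal` steps, which tnt did not mention but which are the usual way to drain the old journal and initialise the new one. The function name and the example device path are hypothetical.

```python
def journal_move_plan(osd_id, new_journal):
    """Return (description, argv) steps for relocating one OSD's journal.

    Nothing is executed here; argv is None for the manual step.
    """
    osd = f"osd.{osd_id}"
    return [
        ("stop the OSD", ["service", "ceph", "stop", osd]),
        ("drain the old journal into the object store",
         ["ceph-osd", "-i", str(osd_id), "--flush-journal"]),
        # Manual step: how the journal location is configured differs per
        # deployment (ceph.conf `osd journal` or a symlink), so no command
        # is guessed here.
        (f"point the OSD's journal setting at {new_journal}", None),
        ("initialise the new journal",
         ["ceph-osd", "-i", str(osd_id), "--mkjournal"]),
        ("start the OSD", ["service", "ceph", "start", osd]),
    ]


if __name__ == "__main__":
    for description, argv in journal_move_plan(3, "/dev/sdb1"):
        print(f"{description}: {' '.join(argv) if argv else '(manual)'}")
```

Doing one OSD at a time and waiting for the cluster to settle between moves keeps the amount of data that has to resync small.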
[12:29] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[12:38] * loicd (~loic@ has joined #ceph
[12:43] <yanzheng> I think updates happen in parallel
[12:46] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[12:47] * stass (stas@ssh.deglitch.com) has joined #ceph
[12:48] <xiaoxi> I cannot find enough reasons to persuade myself these cannot happen in parallel..but the paper actually say it is sequential
[12:53] * ssedov (stas@ssh.deglitch.com) has joined #ceph
[12:54] * masterpe (~masterpe@2001:990:0:1674::1:82) has joined #ceph
[12:54] * masterpe (~masterpe@2001:990:0:1674::1:82) Quit ()
[12:54] <xiaoxi> joao:take a bit from the paper,about our discussion
[12:54] <xiaoxi> For example, suppose osd1 crashes and is marked
[12:54] <xiaoxi> down, and osd2 takes over as primary for pgA. If osd1
[12:54] <xiaoxi> recovers, it will request the latest map on boot, and a
[12:54] <xiaoxi> monitor will mark it as up. When osd2 receives the resulting
[12:54] <xiaoxi> map update, it will realize it is no longer primary
[12:54] <xiaoxi> for pgA and send the pgA version number to osd1.
[12:54] <xiaoxi> osd1 will retrieve recent pgA log entries from osd2,
[12:54] <xiaoxi> tell osd2 its contents are current, and then begin processing
[12:54] <xiaoxi> requests while any updated objects are recovered
[12:54] <xiaoxi> in the background.
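The log-based catch-up in the quoted passage can be modelled in a few lines. This is a toy model of the idea in the paper excerpt, not Ceph's actual data structures: the PG log is assumed to be a list of (version, object name) pairs, and the returning OSD is assumed to know the last version it had applied before it crashed.

```python
def entries_to_replay(pg_log, last_applied_version):
    """Log entries the returning OSD has not seen yet, oldest first."""
    return [entry for entry in pg_log if entry[0] > last_applied_version]


def objects_to_recover(pg_log, last_applied_version):
    """Distinct objects touched since the returning OSD went down.

    Each object appears once, in order of first modification, because
    only its latest contents need to be copied over.
    """
    recovered = []
    for _version, obj in entries_to_replay(pg_log, last_applied_version):
        if obj not in recovered:
            recovered.append(obj)
    return recovered


# Hypothetical log held by osd2 for pgA while osd1 was down after version 2.
PG_LOG = [(1, "obj_a"), (2, "obj_b"), (3, "obj_a"), (4, "obj_c"), (5, "obj_a")]

if __name__ == "__main__":
    print(objects_to_recover(PG_LOG, last_applied_version=2))
```

This also bears on xiaoxi's earlier worry about hashing every object: with a log, only the objects named in the missed entries need attention.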
[12:57] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[12:57] <joao> xiaoxi, fwiw, one of the first things I was told when I joined inktank is that the ceph paper and ceph itself do diverge in some stuff
[12:58] <joao> I suppose that may be one of them
[12:58] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[12:58] <joao> but I don't want to induce you in error, so I'll stick to my "no idea" ;)
[13:01] <xiaoxi> joao:Thanks :) Paper is for common ideas but code is the truth :)
[13:01] * masterpe (~masterpe@2001:990:0:1674::1:82) has joined #ceph
[13:02] * loicd (~loic@ has joined #ceph
[13:05] * yanzheng (~zhyan@ has joined #ceph
[13:05] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:14] <joao> xiaoxi, the papers are from sage's thesis, and they go back four or five years
[13:15] <joao> a lot of development has been done since then :)
[13:17] <nosebleedkt> joao, what was the command to see where an object is mapped in PG or OSD
[13:19] <yanzheng> ceph osd map <pool-name> <object-name>
[13:22] <xiaoxi> joao:a simple question,when using ceph as object store,will ceph still do some stripe?
[13:24] <yanzheng> I guess no
[13:29] <tnt> Mmm, two of my OSDs were OOM-killed last night ...
[13:34] <nosebleedkt> joao, can I add an OSD to a pool, but the OSD's disk to be remote?
[13:35] <tnt> And I can't restart them ... during the backfill process they start taking more memory again and get killed.
[13:46] * loicd (~loic@ Quit (Quit: Leaving.)
[13:46] <xiaoxi> can anyone send mail to ceph-devel@vger.kernel.org ? I can receive mail from it but I tried to send(I don't know if it is really sent out) but i cannot receive the mail I just sent
[13:47] <yanzheng> no
[13:48] <yanzheng> looks like vger.kernel.org is down
[13:49] <joao> autoanswer@vger.kernel.org didn't reply back
[13:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:02] * loicd (~loic@ has joined #ceph
[14:10] * deepsa (~deepsa@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[14:11] * deepsa (~deepsa@ has joined #ceph
[14:19] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[14:19] <tnt> ARGHHH ... damn fucking NTP failed on that OSD and apparently ceph reacts _VERY_ badly to timestamp differences ... I think a warning or safety checks could be useful instead of uncontrollably rising memory usage on MONs and OSDs.
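The kind of safety check tnt asks for can be approximated outside Ceph. Below is a minimal sketch, assuming you have already collected one epoch timestamp per host at roughly the same instant (how they are collected is left out); the host names, the readings, and the 0.05 second threshold are all illustrative assumptions.

```python
def skewed_hosts(clock_readings, reference, max_skew=0.05):
    """Return {host: offset_seconds} for hosts too far from the reference.

    `clock_readings` maps host name -> epoch seconds sampled together.
    A positive offset means the host's clock is ahead of the reference.
    """
    ref_time = clock_readings[reference]
    return {
        host: round(reading - ref_time, 3)
        for host, reading in clock_readings.items()
        if abs(reading - ref_time) > max_skew
    }


# Hypothetical readings: osd.2 is 3.5 seconds ahead of mon.a.
READINGS = {"mon.a": 1000.0, "osd.1": 1000.01, "osd.2": 1003.5}

if __name__ == "__main__":
    print(skewed_hosts(READINGS, reference="mon.a"))
```

Run from cron or a monitoring agent, a check like this would flag a dead NTP daemon before the cluster starts misbehaving.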
[14:28] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:29] * yanzheng (~zhyan@jfdmzpr04-ext.jf.intel.com) has joined #ceph
[14:30] <nosebleedkt> joao, can I add an OSD to a pool, but the OSD's disk to be remote?
[14:30] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:30] <joao> didn't we discuss that yesterday?
[14:31] <joao> ceph doesn't handle latency very well
[14:31] <nosebleedkt> mhhh
[14:31] <nosebleedkt> wait
[14:32] <nosebleedkt> if that disk is on the same LAN but on different rack ?
[14:32] <joao> you can add a remote osd to a pool, granted the remaining cluster is able to communicate with the remote osd, but for now it's advised to keep everything "under the same roof"
[14:32] <joao> oh, sure
[14:32] <nosebleedkt> OK
[14:33] <joao> you might even consider replicating across different failure domains for that matter
[14:33] <joao> say, replicate over different racks with independent power supplies, for instance
[14:33] <nosebleedkt> yeah
[14:34] <joao> in order to maximize failure tolerance
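The rack-level failure domains joao suggests boil down to one rule: no two replicas of an object in the same rack. The sketch below is a toy stand-in for that rule, not CRUSH; the rack and OSD names are hypothetical, and SHA-1 is used only to get a deterministic pseudo-random choice per object.

```python
import hashlib


def _score(object_name, item):
    """Deterministic pseudo-random score for an (object, item) pair."""
    digest = hashlib.sha1(f"{object_name}:{item}".encode()).hexdigest()
    return int(digest, 16)


def pick_replicas(osds_by_rack, object_name, replicas=2):
    """Choose `replicas` OSDs for an object, each from a different rack."""
    if replicas > len(osds_by_rack):
        raise ValueError("not enough racks for the requested replica count")
    # Rank racks per object, keep the top ones, then pick one OSD in each.
    racks = sorted(osds_by_rack, key=lambda r: _score(object_name, r), reverse=True)
    chosen = []
    for rack in racks[:replicas]:
        osd = max(osds_by_rack[rack], key=lambda o: _score(object_name, o))
        chosen.append((rack, osd))
    return chosen


# Hypothetical layout: three racks with independent power, two OSDs each.
LAYOUT = {
    "rack1": ["osd.0", "osd.1"],
    "rack2": ["osd.2", "osd.3"],
    "rack3": ["osd.4", "osd.5"],
}

if __name__ == "__main__":
    print(pick_replicas(LAYOUT, "my-object", replicas=2))
```

With this rule, losing a whole rack costs at most one copy of any object. In a real cluster the same constraint is expressed in the CRUSH map rather than in application code.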
[14:35] <joao> well, back to my stuff; still have an open mon bug to tackle
[14:38] <nosebleedkt> thank you joao
[14:38] <joao> yw
[14:38] <joao> glad to help when I can :)
[14:38] * MarkS (~mark@irssi.mscholten.eu) Quit (Quit: leaving)
[14:43] <nosebleedkt> :D
[15:13] * sukiyaki (~Tecca@ has joined #ceph
[15:13] * loicd (~loic@ Quit (Quit: Leaving.)
[15:14] <sukiyaki> are there any known (or unknown but rumored) issues of 0.48.2 silently losing data?
[15:15] <sukiyaki> ceph health warn only shows: pg 1.599 is stuck active+remapped, last acting [63,43,7,8] which would not cause any files to be available afaik
[15:15] <sukiyaki> *unavailable
[15:16] <sukiyaki> the only thing that seems somewhat related on the mailing list is http://www.spinics.net/lists/ceph-devel/msg09777.html
[15:17] <sukiyaki> however we use librados directly
[15:22] <sukiyaki> I noticed doing ceph osd map <pool> <object> on about 10 of the resources yields a common set, but I'm not sure if that could mean anything
[15:26] * The_Bishop_ (~bishop@e179009022.adsl.alicedsl.de) has joined #ceph
[15:27] * loicd (~loic@ has joined #ceph
[15:33] * The_Bishop (~bishop@f052100089.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[15:37] * guigouz1 (~guigouz@ has joined #ceph
[15:37] * xiaoxi (~xiaoxiche@ Quit (Remote host closed the connection)
[15:38] * xiaoxi (~xiaoxiche@ has joined #ceph
[15:43] * yanzheng (~zhyan@jfdmzpr04-ext.jf.intel.com) Quit (Remote host closed the connection)
[15:47] * vata (~vata@ Quit (Read error: Connection reset by peer)
[15:55] * yanzheng (~zhyan@ has joined #ceph
[15:55] * xiaoxi (~xiaoxiche@ Quit (Remote host closed the connection)
[15:56] * xiaoxi (~xiaoxiche@ has joined #ceph
[15:56] * xiaoxi (~xiaoxiche@ Quit ()
[15:56] * xiaoxi (~xiaoxiche@ has joined #ceph
[16:01] * nosebleedkt (~kostas@kotama.dataways.gr) Quit (Quit: Leaving)
[16:01] * loicd (~loic@ Quit (Quit: Leaving.)
[16:17] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[16:27] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) has joined #ceph
[16:27] * ChanServ sets mode +o scuttlemonkey
[16:38] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) Quit (Quit: This computer has gone to sleep)
[16:42] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[17:12] * lotia (~lotia@l.monkey.org) has joined #ceph
[17:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[17:23] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:23] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[17:25] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[17:26] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) Quit (Remote host closed the connection)
[17:27] * loicd (~loic@magenta.dachary.org) has joined #ceph
[17:32] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:42] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:54] * tnt (~tnt@162.63-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:01] <SIN> Hello!
[18:05] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Quit: Leaving.)
[18:30] * plut0 (~cory@pool-96-236-43-69.albyny.fios.verizon.net) has joined #ceph
[18:30] <plut0> happy thanksgiving all
[18:36] * The_Bishop_ (~bishop@e179009022.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[18:36] <nhm> gobble gobble
[18:38] <nhm> tnt: can you submit a bug report for that? sounds like something we should try to take care of.
[18:45] * sagewk (~sage@2607:f298:a:607:91b3:f084:6ac7:d6a4) Quit (Ping timeout: 480 seconds)
[18:56] * sagewk (~sage@2607:f298:a:607:e116:e786:b94f:5586) has joined #ceph
[19:03] * BManojlovic (~steki@ has joined #ceph
[19:13] * Leseb (~Leseb@ Quit (Quit: Leseb)
[19:29] * vata (~vata@ has joined #ceph
[19:34] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:49] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[19:51] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:52] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:52] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[19:57] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:11] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) Quit (Remote host closed the connection)
[20:31] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:33] * The_Bishop (~bishop@2001:470:50b6:0:d863:2ddf:b91f:ba88) has joined #ceph
[20:38] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[20:51] * plut0 (~cory@pool-96-236-43-69.albyny.fios.verizon.net) has left #ceph
[21:00] * s_parlane (~scott@ Quit (Ping timeout: 480 seconds)
[21:09] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[21:32] * scalability-junk (~stp@188-193-202-99-dynip.superkabel.de) has joined #ceph
[21:38] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:38] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:43] * s_parlane (~scott@gate1.alliedtelesyn.co.nz) has joined #ceph
[22:08] * guigouz1 (~guigouz@ Quit (Quit: Computer has gone to sleep.)
[22:20] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[22:34] * s_parlane (~scott@gate1.alliedtelesyn.co.nz) Quit (Ping timeout: 480 seconds)
[22:40] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:40] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:51] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[23:24] * s_parlane (~scott@ has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.