#ceph IRC Log

IRC Log for 2013-10-24

Timestamps are in GMT/BST.

[0:00] <lurbs> Which seems possible given that it's going to be under pretty heavy I/O load, and potentially writing to previously unused sectors of the disk.
[0:01] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) Quit (Ping timeout: 480 seconds)
[0:01] * dmsimard (~Adium@2607:f748:9:1666:908c:55c9:a8b4:5272) Quit (Quit: Leaving.)
[0:02] <loicd> lurbs: you should figure out X because it depends on your target architecture. Then, by combining all the factors in your environment that are likely to end in the destruction of data (this also depends heavily on your environment), you will need to translate this into probabilities over the period of X seconds (no clue how, but someone should be able to answer in theory).
[0:03] <loicd> now what's really interesting is how the result changes when you go from 2 to 3
[0:04] <lurbs> I feel a spreadsheet coming on.
[0:05] <lurbs> I just hope I can find disk failure rate probabilities that aren't complete lies.
[0:05] <loicd> (disk destruction)% over a period of (recovery time)sec = odds of losing data when you have two replicas
[0:06] <loicd> (disk destruction)% * 2 over a period of (recovery time)sec = odds of losing data when you have three replicas ?
[0:06] <pmatulis_> how does one "Add the user and secret to the CEPH_ARGS environment variable so that you don't need to enter them each time"?
[0:06] * loicd is way above his paygrade :-)
[0:07] <lurbs> My problem is that the (disk destruction)% depends on a whole bunch of things. Disk age, temperature, previous errors, etc. I guess I work it out for a worst case scenario, because that's what we have to be prepared for.
[0:08] <loicd> but do you need to be concerned by this to convince the powers that be ?
[0:08] <loicd> or do you just need to show the difference ?
[0:08] <lurbs> Both, I think.
[0:09] <lurbs> Anyway, thanks. I'm off to do the math.
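A rough sketch of the arithmetic discussed above, in case it helps with the spreadsheet. It assumes disks fail independently at a constant rate, and that after a first disk has already failed, data is lost only if every remaining replica also fails before recovery finishes. Under that assumption the three-replica case comes out as p squared rather than p * 2. The function names and the annual-failure-rate input are illustrative, not taken from the discussion.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def window_failure_probability(annual_failure_rate, window_seconds):
    """Probability that one disk fails within the recovery window.

    Assumes a constant failure rate over the year (an assumption, real
    disks depend on age, temperature, previous errors, and so on).
    """
    rate_per_second = -math.log(1.0 - annual_failure_rate) / SECONDS_PER_YEAR
    return 1.0 - math.exp(-rate_per_second * window_seconds)


def loss_probability_after_first_failure(p_window, replicas):
    """Odds of losing data once one replica's disk is already gone.

    All of the remaining (replicas - 1) disks must fail inside the
    recovery window, assuming the failures are independent.
    """
    if replicas < 2:
        raise ValueError("need at least two replicas")
    return p_window ** (replicas - 1)


if __name__ == "__main__":
    # Hypothetical worst-case inputs: 5% annual failure rate, 1 hour recovery.
    p = window_failure_probability(0.05, 3600)
    for n in (2, 3):
        print(n, "replicas:", loss_probability_after_first_failure(p, n))
```

Going from 2 to 3 replicas multiplies the loss odds by another factor of p, which is why the result changes so sharply when p is small.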
[0:10] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:15] * rongze (~rongze@117.79.232.203) has joined #ceph
[0:15] * thomnico (~thomnico@70.35.39.20) Quit (Ping timeout: 480 seconds)
[0:16] * carif (~mcarifio@wrls-67-134-204-24.wrls.harvard.edu) has joined #ceph
[0:17] * markbby (~Adium@168.94.245.3) Quit (Quit: Leaving.)
[0:23] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) Quit (Quit: Computer has gone to sleep.)
[0:23] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) has joined #ceph
[0:24] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[0:24] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) Quit ()
[0:25] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[0:25] * mschiff (~mschiff@85.182.236.82) Quit (Remote host closed the connection)
[0:27] * ircolle (~Adium@2601:1:8380:2d9:c2c:8633:24c3:23b9) has joined #ceph
[0:27] * JoeGruher (~JoeGruher@134.134.139.74) has joined #ceph
[0:27] * danieagle (~Daniel@179.186.126.49.dynamic.adsl.gvt.net.br) has joined #ceph
[0:30] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[0:32] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[0:33] * ircolle (~Adium@2601:1:8380:2d9:c2c:8633:24c3:23b9) Quit (Quit: Leaving.)
[0:34] * carif (~mcarifio@wrls-67-134-204-24.wrls.harvard.edu) Quit (Ping timeout: 480 seconds)
[0:35] * nhm (~nhm@172.56.7.171) Quit (Ping timeout: 480 seconds)
[0:36] * alram (~alram@216.103.134.250) Quit (Quit: leaving)
[0:39] * JoeGruher (~JoeGruher@134.134.139.74) Quit (Remote host closed the connection)
[0:41] * alram (~alram@216.103.134.250) has joined #ceph
[1:02] * smiley__ (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[1:03] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[1:05] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[1:06] * mtanski (~mtanski@69.193.178.202) Quit (Read error: Operation timed out)
[1:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:15] * iii8 (~Miranda@91.207.132.71) Quit (Read error: Connection reset by peer)
[1:17] * rongze (~rongze@211.155.113.161) has joined #ceph
[1:17] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[1:19] * dty (~derek@pool-71-114-104-38.washdc.fios.verizon.net) has joined #ceph
[1:20] * danieagle (~Daniel@179.186.126.49.dynamic.adsl.gvt.net.br) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[1:24] * cfreak201 (~cfreak200@p4FF3EB0C.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[1:25] * rongze (~rongze@211.155.113.161) Quit (Ping timeout: 480 seconds)
[1:28] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:32] * cfreak200 (~cfreak200@p4FF3F14F.dip0.t-ipconnect.de) has joined #ceph
[1:32] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[1:34] * glanzi (~glanzi@201.75.202.207) has left #ceph
[1:41] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[1:41] * nigwil (~chatzilla@2001:44b8:5144:7b00:39ff:fd0b:6dee:4268) has joined #ceph
[1:46] * nigwil_ (~chatzilla@2001:44b8:5144:7b00:39ff:fd0b:6dee:4268) Quit (Ping timeout: 480 seconds)
[1:50] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[1:52] * nhm (~nhm@216.9.110.13) has joined #ceph
[1:56] * iii8 (~Miranda@91.207.132.71) has joined #ceph
[2:03] * alram (~alram@216.103.134.250) Quit (Quit: leaving)
[2:03] * freedomhui (~freedomhu@117.79.232.203) has joined #ceph
[2:03] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[2:06] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[2:10] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[2:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[2:12] * gregsfortytwo (~Adium@2607:f298:a:607:54e9:7eed:805:8fc0) has joined #ceph
[2:12] * nhm (~nhm@216.9.110.13) Quit (Ping timeout: 480 seconds)
[2:13] * nhm (~nhm@172.56.7.131) has joined #ceph
[2:14] * angdraug (~angdraug@64-79-127-122.static.wiline.com) Quit (Quit: Leaving)
[2:14] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit (Quit: Leaving.)
[2:17] * rongze (~rongze@117.79.232.203) has joined #ceph
[2:25] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[2:27] * Pedras (~Adium@216.207.42.132) Quit (Ping timeout: 480 seconds)
[2:28] * BillK (~BillK-OFT@58-7-61-12.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[2:29] * BillK (~BillK-OFT@106-68-81-108.dyn.iinet.net.au) has joined #ceph
[2:32] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Ping timeout: 480 seconds)
[2:33] * nhm (~nhm@172.56.7.131) Quit (Ping timeout: 480 seconds)
[2:44] * malcolm (~malcolm@silico24.lnk.telstra.net) has joined #ceph
[2:44] <malcolm> Hi all.
[2:44] <malcolm> I'm building a cluster with ceph-deploy and it just never settles
[2:44] <malcolm> I get the mon created. I add my 10 OSDs
[2:45] * yy-nm (~Thunderbi@122.224.154.38) has joined #ceph
[2:45] <malcolm> and it just never settles.. I see OSDs going in and out like yo-yos
[2:46] <malcolm> I see HEAPS of things marked for rebuild
[2:46] <malcolm> It's just all nuts
[2:46] <malcolm> It never looked like this doing mkcephfs
[2:47] * gpmidi (~gpmidi@gen-public.gpmidi.net) has joined #ceph
[2:53] <sagelap> malcolm: usually this happens when there is some network issue, like when the osds can't talk to each other.
[2:53] <sagelap> did you configure a cluster network by chance? or is there anything else odd with the network that would prevent osds from pinging each other?
[2:54] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[2:56] <gpmidi> I've been having problems with the OSD processes on one of my servers core dumping anywhere from a few seconds after starting to 30+ min after the process was started. There are two OSD processes that both will crash (but not at the exact same time). The monitor daemon running on the same box doesn't have any problems. The OSDs on two other boxes with similar configs and part of the same cluster don't crash. The OSD logs show that it's an assert failing.
[2:57] <gpmidi> I think this is also relevant:
[2:57] <gpmidi> 0> 2013-10-21 17:52:44.370682 7f3b582e6700 -1 os/FileStore.cc: In function 'virtual void SyncEntryTimeout::finish(int)' thread 7f3b582e6700 time 2013-10-21 17:52:44.351186
[2:57] <gpmidi> os/FileStore.cc: 3379: FAILED assert(0)
[2:57] * BillK (~BillK-OFT@106-68-81-108.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[3:00] <sagelap> gpmidi: that means it timed out waiting for sync(2)... look at dmesg output to see if one of your local fs mount points is hung
[3:01] <gpmidi> sagelap: Nothing in dmesg. Plus there are other processes using the same array that aren't having any problems
[3:02] <gpmidi> I've also tried moving the journal to a raw device based on a mid-end consumer SSD. There was no change in when the crashes occurred.
[3:03] * BillK (~BillK-OFT@106-68-105-211.dyn.iinet.net.au) has joined #ceph
[3:03] <gpmidi> Also tried tweaking some config options for the two OSDs on the host that are having issues: journal_block_align = false | filestore_max_sync_interval=10 | filestore_min_sync_interval=0.5 | osd_disk_threads=1 | osd_op_thread_timeout=300
[3:03] * gregsfortytwo (~Adium@2607:f298:a:607:54e9:7eed:805:8fc0) Quit (Quit: Leaving.)
[3:03] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[3:04] * gregsfortytwo (~Adium@2607:f298:a:607:54e9:7eed:805:8fc0) has joined #ceph
[3:07] <gpmidi> As a side note, I've disabled saving of core dumps for the OSD to keep it from filling up one of the file systems
[3:08] * yanzheng (~zhyan@134.134.139.74) has joined #ceph
[3:08] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:09] <gpmidi> Which device is it trying to sync? The journal? The data storage location? Somewhere else?
[3:11] * Icefyre (~Icefyre@75-163-215-182.clsp.qwest.net) has joined #ceph
[3:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[3:12] * gregsfortytwo (~Adium@2607:f298:a:607:54e9:7eed:805:8fc0) Quit (Ping timeout: 480 seconds)
[3:13] <gpmidi> As for the platform: CentOS 6 fully updated | Kernel: 2.6.32-358.18.1.el6.x86_64 | An `rpm -Va` doesn't show any changes other than ceph.conf for Ceph and related libraries/packages
[3:13] * Icefyre (~Icefyre@75-163-215-182.clsp.qwest.net) Quit ()
[3:17] * huangjun (~kvirc@111.172.153.78) has joined #ceph
[3:26] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[3:28] <malcolm> sagelap: The osds do have multiple NICs but are all configured so that hostname -s resolves to the same IP as hostname
[3:28] <malcolm> sagelap: I didn't set up a cluster network
[3:29] <malcolm> any easy way to look at logs or something and see what the osd's are seeing?
[3:31] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[3:31] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Read error: Connection reset by peer)
[3:32] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Remote host closed the connection)
[3:32] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[3:32] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[3:33] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[3:34] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[3:42] * sjm (~sjm@pool-96-234-124-66.nwrknj.fios.verizon.net) has joined #ceph
[3:52] <sagelap> malcolm: you can look at /var/log/ceph/ceph.log on the mon to see which osd is marking which other osd down
[3:53] * The_Bishop (~bishop@2001:470:50b6:0:5d8:43e2:642c:5286) Quit (Ping timeout: 480 seconds)
[3:53] <sagelap> or you can look at /var/log/ceph/ceph-osd*.log on the osd nodes to see what they are doing (set debug ms = 1 in the [osd] section of ceph.conf to see the messages they send). You should see osd_ping go out via --> and come back via <== , but will probably see no reply and an osd_failure message going to the mon as a result
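A small sketch of how the check described above could be tallied once debug ms = 1 is on. It relies only on the three tokens mentioned in the message (`osd_ping`, `-->` and `<==`); the exact layout of the rest of each log line is not shown in this log, so nothing else is assumed about it. The function name is hypothetical.

```python
def summarize_osd_pings(lines):
    """Count osd_ping messages by direction.

    Returns (sent, received): lines mentioning osd_ping with an outgoing
    arrow (-->) versus an incoming arrow (<==). Everything else on the
    line is ignored, because the full log format is an unknown here.
    """
    sent = 0
    received = 0
    for line in lines:
        if "osd_ping" not in line:
            continue
        if "-->" in line:
            sent += 1
        elif "<==" in line:
            received += 1
    return sent, received


def summarize_log_file(path):
    """Convenience wrapper that reads an OSD log file from disk."""
    with open(path, "r", errors="replace") as handle:
        return summarize_osd_pings(handle)
```

Many pings sent and few or none received on a given OSD would match the "no reply" symptom described above and point at the network path between the OSDs.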
[3:57] * rongze (~rongze@117.79.232.220) has joined #ceph
[3:57] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[3:58] * rongze (~rongze@117.79.232.220) Quit (Remote host closed the connection)
[3:59] * rongze (~rongze@117.79.232.252) has joined #ceph
[4:03] * The_Bishop (~bishop@2001:470:50b6:0:154c:c2d4:4e19:7dd6) has joined #ceph
[4:04] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[4:09] * KevinPerks1 (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[4:10] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:12] <malcolm> sagelap: thanks I'll have a squiz
[4:12] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[4:29] <gpmidi> sagelap: Any ideas as to what else I can do? Currently I'm trying rolling back to the same kernel version that the other OSDs are running.
[4:34] * haomaiwang (~haomaiwan@183.220.31.90) has joined #ceph
[4:37] <sagelap> gpmidi: while ceph-osd is hung but before it crashes, you could do echo w > /proc/sysrq-trigger to see what syscalls are blocked.
[4:37] <sagelap> you should see one stuck in sys_syncfs
[4:38] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[4:39] <gpmidi> sagelap: Cool, thanks. I'll give this a shot. Although I think it'll be a bit before I can give it a shot; The current config takes around 10m to 30m to crash
[4:40] <gpmidi> sagelap: If Murphy's Law keeps up then the kernel downgrade will fix it.
[4:44] * haomaiwa_ (~haomaiwan@183.220.19.60) has joined #ceph
[4:44] * haomaiwang (~haomaiwan@183.220.31.90) Quit (Read error: Connection reset by peer)
[4:45] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[4:54] * TiCPU (jerome@p4.i.ticpu.net) has joined #ceph
[5:04] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[5:06] * fireD (~fireD@93-139-181-79.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD_ (~fireD@93-139-135-235.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:08] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[5:11] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) has joined #ceph
[5:12] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[5:12] * lupine (~lupine@lupine.me.uk) Quit (Ping timeout: 480 seconds)
[5:15] * houkouonchi-home (~linux@2001:470:c:c69::2) Quit (Read error: Connection reset by peer)
[5:37] <gpmidi> sagelap: Are you still around? If so, I've got a list of what's running (or rather blocked) when it crashes. Could you help me review the logs?
[5:42] * julian (~julianwa@125.70.133.130) has joined #ceph
[5:45] <gpmidi> sagelap: Best I can tell there are 175 ceph-osd calls showing as "kernel: ceph-osd D 0000000000000003 0 23895 1 0x00000080" and similar
[5:50] * yy-nm (~Thunderbi@122.224.154.38) Quit (Quit: yy-nm)
[5:51] <gpmidi> sagelap: I can PM you a link to the logs if you'd like to see for yourself.
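For counting the blocked tasks in the sysrq output rather than eyeballing it, something along these lines may help. The pattern is modelled only on the single task line quoted at 5:45 (name, state D, a hex value, three integers with the PID in the middle, and a hex flags field); other kernels may print the header differently, so treat the regex as an assumption.

```python
import re
from collections import Counter

# Modelled on: "kernel: ceph-osd D 0000000000000003 0 23895 1 0x00000080"
BLOCKED_TASK_RE = re.compile(
    r"(?P<comm>\S+)\s+D\s+[0-9a-f]+\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+"
)


def count_blocked_tasks(lines):
    """Return a Counter of process names seen in uninterruptible (D) state."""
    counts = Counter()
    for line in lines:
        match = BLOCKED_TASK_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


def blocked_pids(lines, comm):
    """Return the PIDs of blocked tasks whose name equals `comm`."""
    pids = []
    for line in lines:
        match = BLOCKED_TASK_RE.search(line)
        if match and match.group("comm") == comm:
            pids.append(int(match.group("pid")))
    return pids
```

Feeding it the saved dmesg or syslog lines gives a per-process count, which makes it easier to see whether ceph-osd is the only thing stuck.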
[6:04] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[6:12] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[6:13] * KevinPerks1 (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[6:36] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[6:42] * rongze (~rongze@117.79.232.252) Quit (Remote host closed the connection)
[6:48] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:48] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[6:50] * nhm (~nhm@184-97-129-163.mpls.qwest.net) has joined #ceph
[6:53] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Ping timeout: 480 seconds)
[6:58] * smiley__ (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley__)
[7:00] * sagelap (~sage@2600:1001:b12e:b7b4:2467:e849:938f:f70e) Quit (Ping timeout: 480 seconds)
[7:11] * sagelap (~sage@2600:1001:b11c:40d0:215:ffff:fe36:60) has joined #ceph
[7:16] * nhm (~nhm@184-97-129-163.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[7:21] * glzhao (~glzhao@118.195.65.67) Quit (Ping timeout: 480 seconds)
[7:43] * rongze (~rongze@117.79.232.220) has joined #ceph
[7:44] * gregsfortytwo (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[7:50] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:51] * rongze (~rongze@117.79.232.220) Quit (Ping timeout: 480 seconds)
[7:52] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[7:53] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[8:04] * capri (~capri@212.218.127.222) Quit (Quit: Verlassend)
[8:07] * capri (~capri@212.218.127.222) has joined #ceph
[8:10] * Cube1 (~Cube@66-87-66-139.pools.spcsdns.net) has joined #ceph
[8:10] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[8:14] * gregsfortytwo (~Adium@cpe-172-250-69-138.socal.res.rr.com) Quit (Quit: Leaving.)
[8:17] * Cube (~Cube@66-87-66-139.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[8:18] * RuediR (~Adium@macrr.switch.ch) has joined #ceph
[8:19] * haomaiwa_ (~haomaiwan@183.220.19.60) Quit (Remote host closed the connection)
[8:19] * haomaiwang (~haomaiwan@183.220.19.60) has joined #ceph
[8:27] * haomaiwang (~haomaiwan@183.220.19.60) Quit (Ping timeout: 480 seconds)
[8:27] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:28] * rongze (~rongze@123.151.28.74) has joined #ceph
[8:29] * sleinen (~Adium@130.59.94.141) has joined #ceph
[8:31] * sleinen1 (~Adium@2001:620:0:25:e010:b3f2:f255:483a) has joined #ceph
[8:36] * sjm (~sjm@pool-96-234-124-66.nwrknj.fios.verizon.net) has left #ceph
[8:38] * sleinen (~Adium@130.59.94.141) Quit (Ping timeout: 480 seconds)
[8:46] * steki (~steki@91.195.39.5) has joined #ceph
[8:46] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:48] <malcolm> Ok so my cluster is up. But showing HEALTH_WARN
[8:48] <malcolm> how do you find the cause? I can'
[8:48] <malcolm> ^can't find any good reason for it to be in warning state
[8:48] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Remote host closed the connection)
[8:49] * steki is now known as BManojlovic
[8:50] <malcolm> NVM
[8:50] <malcolm> I found it...
[8:50] * mattt_ (~textual@94.236.7.190) has joined #ceph
[8:54] <aarontc> I think I may have encountered a bug in 0.67.4 - I have a VM running under qemu-kvm using rbd for the storage device, and that VM mounts CephFS internally. If I shutdown that VM (init scripts are not unmounting cephFS before shutting down network interface, which I am going to fix), the machine hangs forever at "unmounting /mnt/ceph" as expected, and if I try to force reset or force shutdown the qemu process hangs indefinitely on
[8:54] <aarontc> the host
[8:55] <aarontc> shortly thereafter, the host becomes nonresponsive, can't even login on the physical console :(
[8:56] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[8:56] * dxd828 (~dxd828@212.183.128.229) has joined #ceph
[8:57] <aarontc> not useful but if I had to guess, the qemu process is in some kind of deadlock and eating all the RAM on the host
[9:00] * jcfischer (~fischer@macjcf.switch.ch) has joined #ceph
[9:08] <yanzheng> aarontc, no idea who cephfs in the guest can cause qemu hang
[9:09] <yanzheng> s/who/how
[9:10] <aarontc> yanzheng: I'm thinking it may be more related to a problem of the CephFS client vanishing, causing some weird behavior. I read in the mailing list that right now, CephFS is vulnerable to problems with clients causing slowdowns or hangs for the whole cluster
[9:10] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:12] <yanzheng> vanished client can cause cephfs hang for a while, but the client's session will eventually timeout
[9:12] <aarontc> yanzheng: could that be affecting the rbd also?
[9:12] <yanzheng> no
[9:13] <aarontc> hmmm
[9:13] <aarontc> I will have to see if I can reproduce it without destroying all my vms :)
[9:14] * malcolm (~malcolm@silico24.lnk.telstra.net) Quit (Ping timeout: 480 seconds)
[9:33] * dxd828 (~dxd828@212.183.128.229) Quit (Quit: Computer has gone to sleep.)
[9:34] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[9:36] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) has joined #ceph
[9:37] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[9:40] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[9:44] * mschiff_ (~mschiff@tmo-108-147.customers.d1-online.com) has joined #ceph
[9:50] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[9:52] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Quit: Leaving.)
[9:52] * jtlebigot (~jlebigot@proxy.ovh.net) has joined #ceph
[9:53] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[9:57] * RuediR1 (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) has joined #ceph
[9:58] * jcfischer_ (~fischer@130.59.94.238) has joined #ceph
[9:58] * jcfischer_ (~fischer@130.59.94.238) Quit ()
[9:58] * jcfischer_ (~fischer@130.59.94.238) has joined #ceph
[9:59] * jcfischer (~fischer@macjcf.switch.ch) Quit (Read error: Operation timed out)
[10:00] * sleinen1 (~Adium@2001:620:0:25:e010:b3f2:f255:483a) Quit (Quit: Leaving.)
[10:00] * sleinen (~Adium@130.59.94.141) has joined #ceph
[10:01] * jcfischer (~fischer@user-23-10.vpn.switch.ch) has joined #ceph
[10:01] * sleinen1 (~Adium@130.59.94.141) has joined #ceph
[10:01] * sleinen (~Adium@130.59.94.141) Quit (Read error: Connection reset by peer)
[10:01] * sleinen (~Adium@2001:620:0:26:f024:a5c0:a962:ae5f) has joined #ceph
[10:02] * RuediR (~Adium@macrr.switch.ch) Quit (Ping timeout: 480 seconds)
[10:03] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[10:05] * RuediR1 (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) Quit (Ping timeout: 480 seconds)
[10:07] * jcfischer_ (~fischer@130.59.94.238) Quit (Ping timeout: 480 seconds)
[10:07] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[10:09] * sleinen1 (~Adium@130.59.94.141) Quit (Ping timeout: 480 seconds)
[10:11] * RuediR (~Adium@2001:620:0:25:645f:e0ff:fe5c:9ba2) has joined #ceph
[10:15] * rongze (~rongze@123.151.28.74) Quit (Read error: Connection reset by peer)
[10:15] * rongze (~rongze@123.151.28.74) has joined #ceph
[10:17] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Computer has gone to sleep.)
[10:23] * renzhi (~renzhi@116.226.38.214) Quit (Read error: Connection reset by peer)
[10:25] * jcfischer (~fischer@user-23-10.vpn.switch.ch) Quit (Quit: jcfischer)
[10:30] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[10:35] * mschiff__ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[10:35] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[10:36] * jcfischer (~fischer@130.59.94.238) has joined #ceph
[10:37] * mschiff_ (~mschiff@tmo-108-147.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[10:37] * lupine (~lupine@lupine.me.uk) has joined #ceph
[10:39] * jcfischer (~fischer@130.59.94.238) Quit ()
[10:42] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) has joined #ceph
[10:43] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[10:43] * mschiff__ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[10:49] * jcfischer (~fischer@130.59.94.238) has joined #ceph
[10:51] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[10:51] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[10:52] * jcfischer_ (~fischer@user-28-12.vpn.switch.ch) has joined #ceph
[10:57] * jcfischer (~fischer@130.59.94.238) Quit (Ping timeout: 480 seconds)
[10:57] * jcfischer_ is now known as jcfischer
[10:58] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[11:01] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[11:01] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[11:01] * yanzheng (~zhyan@134.134.139.74) Quit (Quit: Leaving)
[11:03] <tnt_> Is there an explanation of the difference between "log to syslog" and "clog to syslog" somewhere ?
[11:06] * raipin (raipin@a.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[11:09] * ana_ (~quassel@167.196.23.95.dynamic.jazztel.es) has joined #ceph
[11:10] * ana_ (~quassel@167.196.23.95.dynamic.jazztel.es) has left #ceph
[11:12] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[11:15] <BillK> Is there a bug in 67.4 re hitting this assert? - ... mds/MDCache.cc: 215: FAILED assert(inode_map.count(in->vino()) == 0)
[11:15] <BillK> it's now happening again after recreating the data and metadata :(
[11:15] <BillK> Can't get the MDS to stay up
[11:17] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[11:18] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[11:18] * sleinen (~Adium@2001:620:0:26:f024:a5c0:a962:ae5f) Quit (Quit: Leaving.)
[11:19] * sleinen (~Adium@130.59.94.141) has joined #ceph
[11:19] * RuediR (~Adium@2001:620:0:25:645f:e0ff:fe5c:9ba2) Quit (Quit: Leaving.)
[11:19] * RuediR (~Adium@130.59.94.249) has joined #ceph
[11:19] * jcfischer (~fischer@user-28-12.vpn.switch.ch) Quit (Quit: jcfischer)
[11:22] * jcfischer (~fischer@130.59.94.238) has joined #ceph
[11:23] * sleinen (~Adium@130.59.94.141) Quit (Read error: Connection reset by peer)
[11:23] * sleinen (~Adium@2001:620:0:25:dc77:9c67:4597:d746) has joined #ceph
[11:23] * RuediR1 (~Adium@130.59.94.249) has joined #ceph
[11:23] * RuediR (~Adium@130.59.94.249) Quit (Read error: Connection reset by peer)
[11:26] * rongze (~rongze@123.151.28.74) Quit (Remote host closed the connection)
[11:27] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[11:28] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[11:29] * claenjoy (~leggenda@37.157.33.36) has joined #ceph
[11:29] * Cube1 (~Cube@66-87-66-139.pools.spcsdns.net) Quit (Quit: Leaving.)
[11:30] * jcfischer_ (~fischer@macjcf.switch.ch) has joined #ceph
[11:30] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[11:30] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[11:33] * jcfischer (~fischer@130.59.94.238) Quit (Ping timeout: 480 seconds)
[11:33] * jcfischer_ is now known as jcfischer
[11:40] * rongze (~rongze@106.120.176.90) has joined #ceph
[11:41] * mschiff (~mschiff@tmo-108-147.customers.d1-online.com) has joined #ceph
[11:45] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[11:59] * allsystemsarego (~allsystem@188.27.166.164) has joined #ceph
[12:04] * Cube (~Cube@12.248.40.138) has joined #ceph
[12:06] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) Quit (Read error: Operation timed out)
[12:09] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[12:09] <mozg> wido, hello
[12:09] <mozg> do you have a couple of minutes please?
[12:09] <mozg> i wanted to pick your brain on the radosgw for the secondary storage
[12:10] <mozg> i've got the nfs secondary storage at the moment and i would like to move away from it to ceph
[12:10] <mozg> as far as i understood from the CS mailing list i can't add S3 storage while I have the NFS one
[12:10] <mozg> is it safe for me to delete the NFS storage?
[12:11] <mozg> would my systemvm templates load automatically, or do I need to follow some steps to manually load the systemvm templates?
[12:11] <mattt_> were 0.69 packages accidentally put in the dumpling FC RPM repo recently ?
[12:11] <mattt_> i'm running 0.69 even tho my yum config is pointing to dumpling :-/
[12:13] * capri_on (~capri@212.218.127.222) has joined #ceph
[12:15] * capri (~capri@212.218.127.222) Quit (Read error: Operation timed out)
[12:16] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) has joined #ceph
[12:20] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[12:20] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[12:28] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[12:31] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Read error: Operation timed out)
[12:35] * mschiff (~mschiff@tmo-108-147.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[12:36] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[12:37] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[12:47] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[12:53] * huangjun (~kvirc@111.172.153.78) Quit (Read error: Connection reset by peer)
[12:55] * rongze (~rongze@106.120.176.90) Quit (Remote host closed the connection)
[12:55] * hflai (hflai@alumni.cs.nctu.edu.tw) has joined #ceph
[12:56] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) has joined #ceph
[12:57] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[12:57] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[13:00] * RuediR1 (~Adium@130.59.94.249) Quit (Quit: Leaving.)
[13:00] * RuediR (~Adium@130.59.94.249) has joined #ceph
[13:01] * RuediR1 (~Adium@130.59.94.249) has joined #ceph
[13:01] * RuediR (~Adium@130.59.94.249) Quit (Read error: Connection reset by peer)
[13:02] * sleinen (~Adium@2001:620:0:25:dc77:9c67:4597:d746) Quit (Quit: Leaving.)
[13:02] * sleinen (~Adium@130.59.94.141) has joined #ceph
[13:02] * DarkAce-Z (~BillyMays@50.107.53.200) has joined #ceph
[13:02] * jcfischer_ (~fischer@user-23-16.vpn.switch.ch) has joined #ceph
[13:02] * RuediR (~Adium@2001:620:0:25:2413:ebff:fea2:5898) has joined #ceph
[13:03] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[13:03] <mozg> does anyone know how to remove all rados objects from the pool without deleting the pool itself?
[13:03] <mozg> something like rados -p rbd rm benchmark*
[13:04] <mozg> the wildcards don't work
[13:04] <mozg> i've got around 30k objects there that i would like to remove
[13:04] <mozg> i can't really do it one by one
[13:04] <mozg> any tips?
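Nobody answered this in the channel. One possible approach, sketched here with the python-rados bindings, is to list the pool and remove the objects whose names start with the wanted prefix. `Ioctx.list_objects()` and `Ioctx.remove_object()` are part of those bindings; the helper names, the default config path and the `benchmark` prefix (taken from the question) are assumptions for illustration. Removal is irreversible, so try it on a scratch pool first.

```python
def remove_objects_with_prefix(ioctx, prefix):
    """Remove every object whose name starts with `prefix`.

    `ioctx` is expected to behave like a rados.Ioctx: list_objects()
    yields objects with a `.key` attribute, remove_object(key) deletes one.
    Returns the number of objects removed.
    """
    # Collect the names first so the pool is not modified while listing it.
    keys = [obj.key for obj in ioctx.list_objects() if obj.key.startswith(prefix)]
    for key in keys:
        ioctx.remove_object(key)
    return len(keys)


def remove_from_pool(pool="rbd", prefix="benchmark",
                     conffile="/etc/ceph/ceph.conf"):
    """Connect to the cluster and clean one pool (paths are assumptions)."""
    import rados  # python-rados bindings, imported lazily

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            return remove_objects_with_prefix(ioctx, prefix)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Passing the I/O context in as a parameter keeps the filtering logic separate from the cluster connection, so it can be dry-run against a stand-in object before touching a real pool.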
[13:07] * jcfischer (~fischer@macjcf.switch.ch) Quit (Ping timeout: 480 seconds)
[13:07] * jcfischer_ is now known as jcfischer
[13:09] * RuediR1 (~Adium@130.59.94.249) Quit (Ping timeout: 480 seconds)
[13:10] * sleinen (~Adium@130.59.94.141) Quit (Ping timeout: 480 seconds)
[13:16] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[13:16] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[13:16] * rongze (~rongze@106.120.176.65) has joined #ceph
[13:18] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) Quit (Read error: Operation timed out)
[13:18] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) has joined #ceph
[13:20] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Read error: Connection timed out)
[13:21] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[13:25] * sleinen (~Adium@130.59.94.141) has joined #ceph
[13:26] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[13:26] * jcfischer (~fischer@user-23-16.vpn.switch.ch) Quit (Quit: jcfischer)
[13:26] * sleinen1 (~Adium@2001:620:0:25:ddc8:7de9:4ffc:8067) has joined #ceph
[13:26] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[13:27] * capri (~capri@212.218.127.222) has joined #ceph
[13:27] * jcfischer (~fischer@130.59.94.238) has joined #ceph
[13:29] * ana_ (~quassel@167.196.23.95.dynamic.jazztel.es) has joined #ceph
[13:32] * jcfischer_ (~fischer@user-28-17.vpn.switch.ch) has joined #ceph
[13:33] * sleinen (~Adium@130.59.94.141) Quit (Ping timeout: 480 seconds)
[13:33] * capri_on (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[13:35] * yanzheng (~zhyan@134.134.137.71) has joined #ceph
[13:35] * jcfischer (~fischer@130.59.94.238) Quit (Ping timeout: 480 seconds)
[13:35] * jcfischer_ is now known as jcfischer
[13:40] * smiley__ (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[13:40] * capri_on (~capri@212.218.127.222) has joined #ceph
[13:41] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Quit: Leaving.)
[13:42] * sagelap (~sage@2600:1001:b11c:40d0:215:ffff:fe36:60) Quit (Read error: Connection reset by peer)
[13:46] * capri (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[13:48] * BillK (~BillK-OFT@106-68-105-211.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[13:49] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) has joined #ceph
[13:49] * mschiff (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[13:51] * BillK (~BillK-OFT@124-169-100-214.dyn.iinet.net.au) has joined #ceph
[13:54] * jcfischer (~fischer@user-28-17.vpn.switch.ch) Quit (Quit: jcfischer)
[13:55] * sleinen1 (~Adium@2001:620:0:25:ddc8:7de9:4ffc:8067) Quit (Quit: Leaving.)
[13:55] * sleinen (~Adium@130.59.94.141) has joined #ceph
[13:57] * jcfischer (~fischer@130.59.94.238) has joined #ceph
[13:58] * sleinen1 (~Adium@2001:620:0:26:80ca:ca24:4594:db5e) has joined #ceph
[13:59] * capri_on (~capri@212.218.127.222) Quit (Quit: Verlassend)
[13:59] * mschiff (~mschiff@tmo-108-147.customers.d1-online.com) has joined #ceph
[13:59] * yanzheng (~zhyan@134.134.137.71) Quit (Remote host closed the connection)
[14:01] * jcfischer_ (~fischer@user-23-12.vpn.switch.ch) has joined #ceph
[14:01] * bandrus (~Adium@208.69.66.118) has joined #ceph
[14:04] * sleinen (~Adium@130.59.94.141) Quit (Ping timeout: 480 seconds)
[14:04] * bandrus (~Adium@208.69.66.118) Quit ()
[14:05] * mschiff_ (~mschiff@p4FD7C39B.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[14:05] * jcfischer (~fischer@130.59.94.238) Quit (Ping timeout: 480 seconds)
[14:05] * jcfischer_ is now known as jcfischer
[14:10] * haomaiwang (~haomaiwan@218.201.76.31) has joined #ceph
[14:12] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[14:13] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[14:13] * jcfischer (~fischer@user-23-12.vpn.switch.ch) Quit (Quit: jcfischer)
[14:21] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has left #ceph
[14:22] * jcfischer_ (~fischer@user-28-20.vpn.switch.ch) has joined #ceph
[14:24] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[14:25] * BillK (~BillK-OFT@124-169-100-214.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[14:27] * RuediR (~Adium@2001:620:0:25:2413:ebff:fea2:5898) Quit (Quit: Leaving.)
[14:27] * RuediR (~Adium@130.59.94.249) has joined #ceph
[14:27] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Read error: Connection timed out)
[14:28] * sleinen (~Adium@2001:620:0:2d:6d1e:8:c952:54d0) has joined #ceph
[14:29] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[14:30] * xtra (~oftc-webi@193.53.188.200) has joined #ceph
[14:33] * dty (~derek@pool-71-114-104-38.washdc.fios.verizon.net) Quit (Quit: dty)
[14:35] * RuediR (~Adium@130.59.94.249) Quit (Ping timeout: 480 seconds)
[14:35] * sleinen1 (~Adium@2001:620:0:26:80ca:ca24:4594:db5e) Quit (Ping timeout: 480 seconds)
[14:37] * rongze (~rongze@106.120.176.65) Quit (Remote host closed the connection)
[14:42] * jcfischer (~fischer@macjcf.switch.ch) has joined #ceph
[14:42] * RuediR (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) has joined #ceph
[14:43] * sleinen1 (~Adium@2001:620:0:25:212c:521d:e58:7139) has joined #ceph
[14:46] * yanzheng (~zhyan@101.83.44.149) has joined #ceph
[14:46] * Nikhar (~nikhar@14.139.82.6) has joined #ceph
[14:47] * jcfischer_ (~fischer@user-28-20.vpn.switch.ch) Quit (Ping timeout: 480 seconds)
[14:47] <Nikhar> Hi, when I type -> rbd create foo --size 4096, absolutely nothing happens, my terminal's stuck on the command...I'm trying to follow http://ceph.com/docs/next/start/quick-rbd/
[14:48] <Nikhar> I believe I may be missing some very basic thing
[14:48] <Nikhar> any pointers?
[14:50] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Read error: Connection timed out)
[14:50] * sleinen (~Adium@2001:620:0:2d:6d1e:8:c952:54d0) Quit (Ping timeout: 480 seconds)
[14:52] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[14:56] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Read error: Connection reset by peer)
[14:56] * yanzheng (~zhyan@101.83.44.149) Quit (Ping timeout: 480 seconds)
[14:57] * nhm (~nhm@184-97-129-163.mpls.qwest.net) has joined #ceph
[14:58] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) Quit (Quit: Leaving)
[14:59] * ana_ (~quassel@167.196.23.95.dynamic.jazztel.es) Quit (Read error: Operation timed out)
[15:01] <peetaur> Nikhar: do you have quorum? are all monitors up?
[15:02] <peetaur> Nikhar: also check dmesg for stack traces, like: process blah has been stuck for 120 seconds -.... echo 1 > /sys/.... ... to disable this warning
[15:02] <peetaur> I got such problems with btrfs with kernel 3.2 and Ceph, but not with 3.8 or 3.11
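The dmesg check peetaur describes could be automated roughly like this. It is a minimal sketch that assumes the kernel's usual hung-task wording, `INFO: task NAME:PID blocked for more than N seconds`; `find_hung_tasks` is a hypothetical helper and the sample line in the usage note is made up for illustration.

```python
import re
import subprocess

# Matches e.g. "INFO: task rbd:4321 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return (command, pid, seconds) for every hung-task warning in the text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(dmesg_text)
    ]


def hung_tasks_now():
    """Run dmesg (may need root on some systems) and scan its output."""
    out = subprocess.run(["dmesg"], check=True, capture_output=True, text=True).stdout
    return find_hung_tasks(out)
```

For example, `find_hung_tasks("INFO: task rbd:4321 blocked for more than 120 seconds.")` yields one entry for the `rbd` process. Monitor quorum is a separate check, for instance with `ceph quorum_status`.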
[15:07] * yanzheng (~zhyan@101.83.202.249) has joined #ceph
[15:08] * rongze (~rongze@117.79.232.203) has joined #ceph
[15:09] <Nikhar> peetaur, Thanks :) It worked... I had apparently missed a step :(
[15:10] <Nikhar> when I try to attach a block device to a vm, I get the following error: libvir: QEMU Driver error : operation failed: open disk image file failed. Any idea what it might mean?
[15:12] * dty (~derek@129-2-129-155.wireless.umd.edu) has joined #ceph
[15:15] <tnt_> Is it just me or are the mon much more verbose in 0.67 ?
[15:15] <tnt_> They keep saying "2013-10-24 06:25:04.184279 7f588bc0a700 1 mon.a@0(leader).paxos(paxos active c 46502635..46503181) is_readable now=2013-10-24 06:25:04.184281 lease_expire=2013-10-24 06:25:08.394332 has v0 lc 46503181" ...
[15:15] * mschiff (~mschiff@tmo-108-147.customers.d1-online.com) Quit (Remote host closed the connection)
[15:19] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:19] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[15:22] * Nikhar (~nikhar@14.139.82.6) Quit (Quit: Leaving)
[15:25] * glzhao (~glzhao@118.195.65.67) Quit (Quit: Lost terminal)
[15:25] * bandrus (~Adium@208.95.30.82) has joined #ceph
[15:29] * markbby (~Adium@168.94.245.3) has joined #ceph
[15:35] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Computer has gone to sleep.)
[15:39] * dmsimard (~Adium@70.38.0.248) has joined #ceph
[15:41] * BillK (~BillK-OFT@124-169-100-214.dyn.iinet.net.au) has joined #ceph
[15:42] * gregorg (~Greg@78.155.152.6) has joined #ceph
[15:43] * dmsimard1 (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) has joined #ceph
[15:46] * dmsimard (~Adium@70.38.0.248) Quit (Read error: Connection reset by peer)
[15:47] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[15:47] * doxavore (~doug@99-7-52-88.lightspeed.rcsntx.sbcglobal.net) has joined #ceph
[15:53] * zhyan_ (~zhyan@101.83.70.83) has joined #ceph
[15:54] * rongze (~rongze@117.79.232.235) has joined #ceph
[15:56] * yanzheng (~zhyan@101.83.202.249) Quit (Ping timeout: 480 seconds)
[15:56] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:57] <dmsimard1> Anyone here using ceph for object store with Openstack ? Want to know if you're also using ceilometer/metering.
[15:57] * sleinen1 (~Adium@2001:620:0:25:212c:521d:e58:7139) Quit (Quit: Leaving.)
[15:57] * dmsimard1 is now known as dmsimard
[15:59] * sleinen (~Adium@2001:620:0:2d:7859:8703:7759:2a12) has joined #ceph
[16:00] * sleinen1 (~Adium@2001:620:0:25:1533:66e3:8486:ea2c) has joined #ceph
[16:02] <loicd> dmsimard: \o
[16:03] <loicd> sage: http://tracker.ceph.com/issues/6592#note-8 blkid -o udev /dev/cciss/c0d1p2 returns nothing. I have physical access to the machines for another 2 hours.
[16:04] <dmsimard> loicd: You do? :)
[16:07] * sleinen (~Adium@2001:620:0:2d:7859:8703:7759:2a12) Quit (Ping timeout: 480 seconds)
[16:07] <loicd> dmsimard: no I don't, I just said hi, sorry for the confusion ;-)
[16:07] <dmsimard> loicd: Oh, hi !
[16:11] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[16:11] * rongze (~rongze@117.79.232.235) Quit (Remote host closed the connection)
[16:12] * rongze (~rongze@117.79.232.203) has joined #ceph
[16:13] * zhyan_ (~zhyan@101.83.70.83) Quit (Ping timeout: 480 seconds)
[16:14] <mikedawson> nhm: ping
[16:15] * cronix (~cronix@5.199.139.166) has joined #ceph
[16:18] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Read error: Connection reset by peer)
[16:19] * rongze_ (~rongze@117.79.232.203) has joined #ceph
[16:20] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[16:23] <cronix> can i run ceph-authtool without installing the whole ceph package?
[16:23] <cronix> i just want to generate keyring configs locally to then copy them to the remote machines
[16:26] * sleinen1 (~Adium@2001:620:0:25:1533:66e3:8486:ea2c) Quit (Quit: Leaving.)
[16:26] * sleinen (~Adium@130.59.94.141) has joined #ceph
[16:26] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[16:28] * BillK (~BillK-OFT@124-169-100-214.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:31] * huangjun (~kvirc@60.55.9.26) has joined #ceph
[16:34] * sleinen (~Adium@130.59.94.141) Quit (Ping timeout: 480 seconds)
[16:35] * haomaiwang (~haomaiwan@218.201.76.31) Quit (Remote host closed the connection)
[16:35] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) has joined #ceph
[16:36] * rongze_ (~rongze@117.79.232.203) Quit (Remote host closed the connection)
[16:37] * rongze (~rongze@14.18.250.170) has joined #ceph
[16:39] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[16:40] * xtra (~oftc-webi@193.53.188.200) Quit (Quit: Page closed)
[16:40] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[16:41] * sarob (~sarob@183.sub-70-199-0.myvzw.com) has joined #ceph
[16:42] <topro> is cephfs kernel client in vanilla linux-3.11 considered recent enough to be used instead of 0.67.4 fuse client (to mount cephfs from a 0.67.4 ceph cluster)?
[16:43] <topro> i.o.w. is it as "stable" as the 0.67.4 fuse client or does it lag behind?
[16:45] * rongze (~rongze@14.18.250.170) Quit (Ping timeout: 480 seconds)
[16:45] <huangjun> does ceph support vmware?
[16:46] <huangjun> can we run vmware on the virtual disk exported from rbd ?
[16:47] <huangjun> has anybody tested it ?
[16:48] <janos> i could be wrong but that relationship sounds reversed
[16:48] * rongze (~rongze@117.79.232.235) has joined #ceph
[16:48] <janos> not "does ceph support vmware" but more "does vmware support ceph?"
[16:48] <janos> but i could be mistaken
[16:49] <cronix> janos: even if you are wrong that later question interests me
[16:49] <cronix> does vmware support ceph as datastore for vm's
[16:49] <cronix> :D
[16:49] <cronix> vsphere to be exact
[16:49] <cronix> i think ill research that
[16:49] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) Quit (Remote host closed the connection)
[16:49] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) has joined #ceph
[16:49] <janos> my main point is - if that's the case, you need to ask vmware ;)
[16:50] <cronix> jep^^
[16:52] * ana_ (~quassel@109.46.77.188.dynamic.jazztel.es) has joined #ceph
[16:52] * freedomhui (~freedomhu@117.79.232.203) Quit (Quit: Leaving...)
[16:52] <cronix> seems that there is someone on the ceph mailing list who re-exported the RBD as a vmware compatible device and managed to get it to work that way
[16:52] <cronix> sounds a bit ugly to me though
[16:53] <huangjun> yes, but that was about 10 months ago
[16:56] * gregsfortytwo (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[16:59] <cronix> my approach would be
[16:59] <cronix> ceph -> http://linux-iscsi.org/wiki/Main_Page -> vmware
[17:00] <nwl> cronix: no formal vsphere support but we (inktank) are in discussions with vmware
[17:01] * bandrus (~Adium@208.95.30.82) Quit (Quit: Leaving.)
[17:01] <nwl> cronix: you can use the iSCSI re-export of an RBD device but obviously you don't get the full vSphere 'experience'
[17:02] <huangjun> nwl: so ceph doesn't support VAAI
[17:02] <huangjun> ?
[17:02] <cronix> okay, in discussion as in there are no plans YET but we MIGHT get to a conclusion which leads to planning if vmware MIGHT support it SOMEDAY in the future?
[17:06] * beardo (~sma310@beardo.cc.lehigh.edu) has joined #ceph
[17:06] * RuediR (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) Quit (Quit: Leaving.)
[17:06] * sleinen (~Adium@130.59.94.141) has joined #ceph
[17:06] * RuediR (~Adium@130.59.94.249) has joined #ceph
[17:07] * markbby1 (~Adium@168.94.245.2) has joined #ceph
[17:08] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[17:12] * markbby (~Adium@168.94.245.3) Quit (Remote host closed the connection)
[17:12] * markbby1 (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[17:13] * jcfischer (~fischer@macjcf.switch.ch) Quit (Ping timeout: 480 seconds)
[17:13] * markbby (~Adium@168.94.245.4) has joined #ceph
[17:15] * RuediR (~Adium@130.59.94.249) Quit (Ping timeout: 480 seconds)
[17:17] * sleinen (~Adium@130.59.94.141) Quit (Ping timeout: 480 seconds)
[17:17] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[17:17] * rongze (~rongze@117.79.232.235) Quit (Remote host closed the connection)
[17:18] * markbby (~Adium@168.94.245.1) has joined #ceph
[17:22] * sarob (~sarob@183.sub-70-199-0.myvzw.com) Quit (Read error: Connection reset by peer)
[17:24] * freedomhui (~freedomhu@117.79.232.203) has joined #ceph
[17:25] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) Quit (Quit: Leaving.)
[17:25] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) has joined #ceph
[17:26] <nwl> cronix: we have a technical architecture, suggested by vmware, which requires validation and some software development.
[17:26] <nwl> huangjun: correct
[17:27] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) Quit ()
[17:27] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) has joined #ceph
[17:29] * markbby (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[17:29] * markbby (~Adium@168.94.245.1) has joined #ceph
[17:29] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Read error: Connection reset by peer)
[17:30] * ircolle (~Adium@2601:1:8380:2d9:c2c:8633:24c3:23b9) has joined #ceph
[17:33] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[17:34] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[17:34] * markbby (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[17:35] * markbby (~Adium@168.94.245.2) has joined #ceph
[17:36] * huangjun (~kvirc@60.55.9.26) Quit (Ping timeout: 480 seconds)
[17:37] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:38] * claenjoy (~leggenda@37.157.33.36) Quit (Quit: Leaving.)
[17:40] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:40] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[17:40] * markbby (~Adium@168.94.245.3) has joined #ceph
[17:44] * sarob (~sarob@173-13-44-225-Pennsylvania.hfc.comcastbusiness.net) has joined #ceph
[17:44] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[17:49] * sleinen (~Adium@dhcp-wlan-eduroam-192-41-134-198.uzh.ch) has joined #ceph
[17:49] * jcfischer (~fischer@dhcp-wlan-eduroam-192-41-134-78.uzh.ch) has joined #ceph
[17:50] * sleinen1 (~Adium@2001:620:0:25:3c77:9248:b2a1:c4e2) has joined #ceph
[17:51] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Read error: Connection timed out)
[17:51] * thomnico (~thomnico@70.35.39.20) Quit (Read error: Operation timed out)
[17:53] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[17:56] * gregsfortytwo (~Adium@cpe-172-250-69-138.socal.res.rr.com) Quit (Quit: Leaving.)
[17:56] * jcfischer_ (~fischer@user-23-11.vpn.switch.ch) has joined #ceph
[17:57] * jcfischer_ (~fischer@user-23-11.vpn.switch.ch) Quit ()
[17:57] * sleinen (~Adium@dhcp-wlan-eduroam-192-41-134-198.uzh.ch) Quit (Ping timeout: 480 seconds)
[17:57] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[17:59] * mattt_ (~textual@94.236.7.190) Quit (Quit: Computer has gone to sleep.)
[18:01] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-218.uzh.ch) has joined #ceph
[18:01] * jcfischer (~fischer@dhcp-wlan-eduroam-192-41-134-78.uzh.ch) Quit (Ping timeout: 480 seconds)
[18:02] * haomaiwang (~haomaiwan@218.201.76.184) has joined #ceph
[18:05] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[18:05] * swills (~swills@mouf.net) has joined #ceph
[18:09] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-218.uzh.ch) Quit (Quit: Leaving.)
[18:09] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[18:19] * Meths (~meths@2.25.214.231) Quit (Read error: Connection reset by peer)
[18:19] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[18:20] * Meths (~meths@2.25.214.231) has joined #ceph
[18:21] * JoeGruher (~JoeGruher@134.134.139.76) has joined #ceph
[18:26] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[18:27] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[18:28] * robert (~robert@77.95.99.166) Quit (Ping timeout: 480 seconds)
[18:28] * robert_ (~robert@77.95.99.166) Quit (Ping timeout: 480 seconds)
[18:32] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-119.uzh.ch) has joined #ceph
[18:33] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-119.uzh.ch) Quit ()
[18:33] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-119.uzh.ch) has joined #ceph
[18:34] * dxd828 (~dxd828@217.39.7.254) has joined #ceph
[18:36] * yehudasa (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) has joined #ceph
[18:36] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) Quit (Read error: Connection timed out)
[18:38] <Gdub> hi everyone
[18:38] * ircolle1 (~Adium@c-67-172-132-222.hsd1.co.comcast.net) has joined #ceph
[18:38] <Gdub> am trying to map an image to a block device
[18:38] <Gdub> and i have a very small error :)
[18:39] <Gdub> rbd: add failed: (2) No such file or directory
[18:39] * jakku (~jakku@ad044124.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[18:39] <Gdub> sudo rbd map backup --pool my_pool_name --name client.admin -m my_mon_ip -k /etc/ceph/ceph.client.admin.keyring
[18:39] * haomaiwang (~haomaiwan@218.201.76.184) Quit (Ping timeout: 480 seconds)
[18:40] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[18:41] <smiley> does ceph mark the osd's in any way so that if at some point sdb became sdc for example....ceph would be able to handle that change?
[18:42] <smiley> I'm thinking about times where a drive or 2 failed...the server got rebooted and the drives ended up getting assigned a diff drive path
[18:43] * sarob (~sarob@173-13-44-225-Pennsylvania.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[18:43] * sarob (~sarob@173-13-44-225-Pennsylvania.hfc.comcastbusiness.net) has joined #ceph
[18:44] * ircolle (~Adium@2601:1:8380:2d9:c2c:8633:24c3:23b9) Quit (Ping timeout: 480 seconds)
[18:49] * sarob (~sarob@173-13-44-225-Pennsylvania.hfc.comcastbusiness.net) Quit (Read error: Operation timed out)
[18:52] * RuediR1 (~Adium@2001:620:0:25:9028:4cff:fea0:84c1) has joined #ceph
[18:53] * lyncos (~chatzilla@208.71.184.41) has joined #ceph
[18:53] <lyncos> Hi everyone...
[18:54] <lyncos> I'm trying to create a new mon with ceph deploy.. and I'm facing all sort of problems
[18:54] <lyncos> anyone willing to help ...
[18:54] <lyncos> [ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[18:55] <topro> hi, is there a git branch I can pull into vanilla 3.9.6 to build 3.9 kernel including latest cephfs kernel client code?
[18:56] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-119.uzh.ch) Quit (Ping timeout: 480 seconds)
[18:57] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[19:01] * freedomhui (~freedomhu@117.79.232.203) Quit (Ping timeout: 480 seconds)
[19:05] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:08] * RuediR1 (~Adium@2001:620:0:25:9028:4cff:fea0:84c1) Quit (Quit: Leaving.)
[19:08] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-119.uzh.ch) has joined #ceph
[19:10] * shang (~ShangWu@70.35.39.20) has joined #ceph
[19:12] * lyncos (~chatzilla@208.71.184.41) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20130917154415])
[19:15] * dxd828 (~dxd828@217.39.7.254) Quit (Quit: Computer has gone to sleep.)
[19:16] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-119.uzh.ch) Quit (Ping timeout: 480 seconds)
[19:16] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[19:17] <gregsfortytwo1> topro: I don't think we maintain any such branches, but if you can play with git your chances aren't too bad if you just pull all the patches prefixed with "ceph", or by doing a diff on the relevant directories...
[19:18] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[19:18] * rongze (~rongze@117.79.232.237) has joined #ceph
[19:20] <topro> gregsfortytwo1: thanks for that hint. which are the relevant directories containing ceph code? fs/ceph is obvious but I think there are others too, right?
[19:21] <topro> like libceph, rbd, ...
[19:22] <gregsfortytwo1> yeah, I believe they're net/ceph and lib/ceph
[19:23] <gregsfortytwo1> if you check out what the ceph commits touch it'll be easy enough to trace
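Tracing which commits touch the Ceph client code, as suggested above, might look like the following sketch. It assumes a local clone of the Linux kernel tree with the `v3.9` and `v3.11` tags present; the default paths are `fs/ceph` and `net/ceph` from the conversation, and further paths (for example the rbd block driver) can be passed in. `ceph_log_argv` and `ceph_commits` are hypothetical helper names.

```python
import subprocess

# Directories named in the discussion; extend as needed for your tree.
CEPH_PATHS = ("fs/ceph", "net/ceph")


def ceph_log_argv(old_rev, new_rev, paths=CEPH_PATHS):
    """Build a git command listing commits in old_rev..new_rev that touch paths."""
    return ["git", "log", "--oneline", f"{old_rev}..{new_rev}", "--", *paths]


def ceph_commits(repo_dir, old_rev, new_rev, paths=CEPH_PATHS):
    """Run the command inside repo_dir and return one line per commit."""
    out = subprocess.run(
        ceph_log_argv(old_rev, new_rev, paths),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()
```

The resulting commit list is only a starting point for cherry-picking; whether the patches apply cleanly to an older kernel still has to be checked by hand.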
[19:24] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Read error: No route to host)
[19:24] <gregsfortytwo1> lyncos: I'm not an expert with the ceph-deploy stuff but you'll need to provide more context; that error just means that one of the admin sockets isn't available and that ranges from the daemon not running to the config being wrong to insufficient permissions
[19:24] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:24] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[19:24] <topro> ok, I'll have a look. what makes me worry a bit is that in the kernel client repo the last commit is from August 9, but in fuse client there have been lots of changes since then. is the kernel client lagging behind?
[19:24] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Quit: Leaving.)
[19:26] <gregsfortytwo1> topro: they're sufficiently independent that you shouldn't take that as meaning much
[19:26] <gregsfortytwo1> but there have been commits since August 9 so you should pull again :)
[19:27] <topro> on which branch? 'for-linus' seems to be the master branch of ceph-client repo, isn't it?
[19:28] * shang (~ShangWu@70.35.39.20) Quit (Remote host closed the connection)
[19:28] <gregsfortytwo1> oh, right, the new stuff might still be in testing
[19:28] * dxd828 (~dxd828@212.183.128.230) has joined #ceph
[19:29] <peetaur> smiley: excellent question... did you find an answer?
[19:29] <topro> do you know if ceph code in any vanilla kernel (i.e. 3.11) is recent enough to work as reliable as the fuse client supplied with 0.67.4?
[19:29] <peetaur> smiley: if you have a GPT partition table, you can create a partition with a label, and then use /dev/disk/by-partlabel/yourlabelhere and maybe that would do it ... but no idea if it works
[19:32] <topro> gregsfortytwo1: I'm asking because the last kernel client i tried (linux-3.9) was too buggy and the fuse client gives me very poor performance (i.e. laggy, consuming a lot of CPU on the client)
[19:35] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) Quit (Quit: Leaving.)
[19:35] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) has joined #ceph
[19:36] <gregsfortytwo1> topro: I haven't made sustained use of any of them; I think that generally ceph-fuse gets fixes faster and is less trouble when it breaks
[19:36] <gregsfortytwo1> but I've also not seen performance issues with it myself
[19:36] * rongze (~rongze@117.79.232.237) Quit (Ping timeout: 480 seconds)
[19:38] <topro> over here it looks like before doing the actual IO ceph-fuse's CPU load on the client sky-rockets for about 2, then ceph-fuse cpu load diminishes and the io takes place
[19:38] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[19:39] <topro> ^^ 2 seconds
[19:39] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[19:41] <gregsfortytwo1> hrm, what are you doing with CephFS?
[19:42] <topro> btw. there is another phenomenon I'm observing. it's about obvious memory hole on both, OSD and MDS. After running for about two days OSD processes tend to occupy up to 10GB, MDS up to 12GB
[19:42] <topro> I'm exporting /home of about 10 clients
[19:43] <topro> ~ 800GB of data
[19:43] <gregsfortytwo1> on the mds that's definitely possible; there are a number of ways for things to get stuck
[19:43] <topro> i.e. lots of small files
[19:43] <gregsfortytwo1> on the OSDs you've probably got something misconfigured if usage is getting that high
[19:43] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) Quit (Quit: Leaving.)
[19:43] * jdmason (~jon@jfdmzpr03-ext.jf.intel.com) Quit (Read error: Connection reset by peer)
[19:43] * bandrus (~Adium@208.95.30.82) has joined #ceph
[19:43] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) has joined #ceph
[19:43] * shang (~ShangWu@70.35.39.20) has joined #ceph
[19:44] * jdmason (~jon@134.134.137.71) has joined #ceph
[19:44] <gregsfortytwo1> anyway, have you tried looking at what the clients are doing when you see IO pause? and does it continue or is it just on first file access after an idle period?
[19:44] <topro> for MDS I know (and expected that) but anyone I'm talking to about that won't believe me that OSDs are memory-hungry like that as well (with 0.67.4 that is, was the same since I began with bobtail)
[19:44] <gregsfortytwo1> 2 seconds is a long time, but it could be that they're working on getting the file caps and that's taking a while for some reason
[19:45] <topro> I'm not quite sure on how to find out what causes the lag. any hint?
[19:45] <gregsfortytwo1> topro: every instance we've seen recently of osd mem usage being so high was just because of a very large PG count
[19:45] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) has joined #ceph
[19:45] <gregsfortytwo1> topro: you could turn on messenger debugging on the client and see what communications it's sending
[19:45] <topro> 576 PGs over here
[19:45] * RuediR1 (~Adium@2001:620:0:26:3494:21ff:fe07:f6) has joined #ceph
[19:45] <gregsfortytwo1> total?
[19:45] <gregsfortytwo1> you're using our packages?
[19:46] <topro> yes, total. packages from ceph.com/debian-dumpling
[19:46] <topro> it's three wheezy boxes, each with one mon and three osd, one with a mds
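To put the numbers above in perspective (576 PGs in total, three hosts with three OSDs each), here is a small worked calculation of how many PG copies each OSD carries on average. The replica count is not stated in the conversation, so the values 2 and 3 below are assumptions; `pg_copies_per_osd` is a hypothetical helper.

```python
def pg_copies_per_osd(total_pgs, num_osds, replicas):
    """Average number of PG copies hosted by each OSD."""
    return total_pgs * replicas / num_osds


TOTAL_PGS = 576   # from the conversation
NUM_OSDS = 3 * 3  # three hosts, three OSDs each

for replicas in (2, 3):  # assumed pool sizes, not stated in the log
    print(replicas, "replicas:", pg_copies_per_osd(TOTAL_PGS, NUM_OSDS, replicas))
```

With 2 replicas this gives 128 PG copies per OSD and with 3 replicas 192, which puts a concrete number on the per-OSD PG load for this cluster.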
[19:47] <gregsfortytwo1> I believe there are some instructions for troubleshooting OSD memory use in the docs; you should dig those up and follow them and see what comes out of i
[19:47] <topro> so three mons, 9 osds and one mds in total
[19:47] <gregsfortytwo1> *it
[19:47] <topro> I'll see if i can find it
[19:47] * sarob (~sarob@173-13-44-225-Pennsylvania.hfc.comcastbusiness.net) has joined #ceph
[19:47] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Read error: No route to host)
[19:47] * sarob (~sarob@173-13-44-225-Pennsylvania.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[19:48] * sarob (~sarob@173-13-44-225-Pennsylvania.hfc.comcastbusiness.net) has joined #ceph
[19:48] <topro> there is some information on memory profiling. http://ceph.com/docs/master/rados/troubleshooting/memory-profiling/ is that what you were referring to?
[19:49] <gregsfortytwo1> yeah
[19:49] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[19:49] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit ()
[19:51] <topro> would the output of "ceph tell osd.2 heap stats" be of any help?
[19:53] <gregsfortytwo1> maybe
[19:53] <gregsfortytwo1> what's it say?
[19:53] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[19:54] <topro> I just restarted all of the OSDs half an hour ago, so biggest process right now is only about 850MB
[19:54] <gregsfortytwo1> what's that one say?
[19:55] <topro> http://pastebin.com/giSL9vvP
[19:56] <gregsfortytwo1> *sigh*
[19:56] * sarob (~sarob@173-13-44-225-Pennsylvania.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[19:56] <gregsfortytwo1> try running the heap release command and see what happens
[19:56] <gregsfortytwo1> for some reason on a few systems the memory doesn't seem to get released back to the OS even though the memory allocator has it marked as free
[19:57] <gregsfortytwo1> (notice the part about "in use by application" and "in page heap freelist")
[19:58] * dxd828 (~dxd828@212.183.128.230) Quit (Quit: Computer has gone to sleep.)
[19:58] <topro> http://pastebin.com/9RVf0w3N not too bad I think. I will have another try when process is running for about two days (i.e. close to 10GB)
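The two fields gregsfortytwo1 points at can be pulled out of the `ceph tell osd.N heap stats` output with a short parser. This is a sketch that assumes the tcmalloc-style `MALLOC:` line layout; the sample text is illustrative and is not the content of the pastebins above. `parse_heap_stats` and `freelist_share` are hypothetical names.

```python
import re

# Matches lines such as:
# "MALLOC: +    100000000 (   95.4 MiB) Bytes in page heap freelist"
MALLOC_LINE = re.compile(
    r"^MALLOC:\s*[+=]?\s*(\d+)\s*\([^)]*\)\s*(.+?)\s*$", re.MULTILINE
)


def parse_heap_stats(text):
    """Map each MALLOC label to its byte count."""
    return {label: int(value) for value, label in MALLOC_LINE.findall(text)}


def freelist_share(stats):
    """Fraction of (in use + freelist) memory that sits unused in the freelist."""
    in_use = stats["Bytes in use by application"]
    free = stats["Bytes in page heap freelist"]
    return free / (in_use + free)


# Illustrative sample only; real output has more lines.
SAMPLE = """\
MALLOC:      300000000 (  286.1 MiB) Bytes in use by application
MALLOC: +    100000000 (   95.4 MiB) Bytes in page heap freelist
"""

print(freelist_share(parse_heap_stats(SAMPLE)))
```

A large share suggests the allocator is holding memory it could hand back, which is the case where `ceph tell osd.N heap release` is worth trying.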
[20:00] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[20:03] <topro> back to the more important issue, would you recommend ceph kernel client coming with linux-3.11 or is that known to be buggy?
[20:03] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[20:05] * angdraug (~angdraug@64-79-127-122.static.wiline.com) has joined #ceph
[20:06] <gregsfortytwo1> I think there have been a number of fixes for 3.12, but I don't keep close track of it
[20:08] <topro> ok, so as soon as I can get my hands on a 3.11 kernel I'll just give it a try. thank you so far. last question: for debugging ceph-fuse laggyness, how to activate messenger debugging on ceph-fuse client?
[20:09] <gregsfortytwo1> "debug ms = 1", and make sure it's got a log file it's using
[20:09] * alram (~alram@216.103.134.250) has joined #ceph
[20:09] <gregsfortytwo1> you can put that in the config file or inject it via the admin socket if there is one
[20:10] <topro> well, I don't have a ceph.conf on the clients, just explicitly giving the monitor ip on mount command
[20:11] <gregsfortytwo1> http://ceph.com/docs/master/rados/troubleshooting/log-and-debug
[20:12] <gregsfortytwo1> oh wait, that doesn't include daemon startup, odd
[20:12] <gregsfortytwo1> anyway
[20:12] <gregsfortytwo1> "--debug_ms = 1"
[20:14] <topro> specifying on the ceph-fuse mount command on the client or would that have to be configured on the server side?
[20:14] * glzhao (~glzhao@118.195.65.67) Quit (Quit: leaving)
[20:16] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Ping timeout: 480 seconds)
[20:18] <gregsfortytwo1> on the mount command
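For a client without a ceph.conf, passing the debug setting on the mount command might be assembled like this. The sketch assumes that `ceph-fuse` accepts the generic Ceph option overrides `--debug-ms` and `--log-file` on its command line; the monitor address, mount point, and log path are placeholders, and `ceph_fuse_argv` is a hypothetical helper.

```python
import shlex


def ceph_fuse_argv(mon_addr, mountpoint, log_file):
    """Build a ceph-fuse command line with messenger debugging enabled."""
    return [
        "ceph-fuse",
        "-m", mon_addr,          # monitor given explicitly, no ceph.conf needed
        mountpoint,
        "--debug-ms", "1",       # same setting as "debug ms = 1" in a config file
        "--log-file", log_file,  # make sure the client actually writes a log
    ]


# Placeholder values for illustration only.
argv = ceph_fuse_argv("192.0.2.10:6789", "/mnt/cephfs", "/var/log/ceph/ceph-fuse.log")
print(" ".join(shlex.quote(part) for part in argv))
```

The printed line can be pasted into a root shell on the client. The log grows quickly at this debug level, so it is worth switching it off again after reproducing the lag.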
[20:20] * sarob (~sarob@183.sub-70-199-0.myvzw.com) has joined #ceph
[20:21] * Pedras (~Adium@64.191.206.83) has joined #ceph
[20:21] <topro> ok, I'll see what I can get. will report back tomorrow. thank you so much
[20:23] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[20:25] * ana_ (~quassel@109.46.77.188.dynamic.jazztel.es) Quit (Ping timeout: 480 seconds)
[20:27] * alram (~alram@216.103.134.250) Quit (Read error: Connection reset by peer)
[20:30] * rongze (~rongze@117.79.232.237) has joined #ceph
[20:31] <JoeGruher> durr. what's the best way to match up physical device to OSD number? like if I know /dev/sdc on host 4 has an issue, how do I match that to an OSD number so I can remove the OSD?
[20:32] * jtlebigot (~jlebigot@proxy.ovh.net) Quit (Quit: Leaving.)
[20:32] * julian (~julianwa@125.70.133.130) Quit (Quit: afk)
[20:38] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[20:38] * rongze (~rongze@117.79.232.237) Quit (Ping timeout: 480 seconds)
[20:40] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) has joined #ceph
[20:42] * thomnico_ (~thomnico@70.35.39.20) has joined #ceph
[20:43] * thomnico_ (~thomnico@70.35.39.20) Quit (Remote host closed the connection)
[20:43] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[20:49] <xarses> JoeGruher, by default it's mounted into osd-number under the ceph data directory
[20:50] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[20:50] * ChanServ sets mode +o scuttlemonkey
[20:55] <pmatulis_> JoeGruher: iterate through all your nodes and keep the list handy?: ceph-deploy disk list <node>
[20:56] * nhm (~nhm@184-97-129-163.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[20:56] * alram (~alram@216.103.134.250) has joined #ceph
[20:58] * RuediR1 (~Adium@2001:620:0:26:3494:21ff:fe07:f6) Quit (Quit: Leaving.)
[20:58] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-205.uzh.ch) has joined #ceph
[21:00] * sleinen1 (~Adium@2001:620:0:25:3c77:9248:b2a1:c4e2) Quit (Quit: Leaving.)
[21:01] <JoeGruher> thanks xarses, pmatulis_, will try that
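A minimal sketch of xarses's suggestion, assuming the default OSD data layout /var/lib/ceph/osd/<cluster>-<id> and that each OSD data partition shows up in /proc/mounts on the host. The function names are hypothetical, not part of any Ceph tool, and the partition matching (sdc1, nvme0n1p1 style) is a simplification.

```python
import re

# Assumption: OSD data dirs follow the default /var/lib/ceph/osd/<cluster>-<id>.
OSD_MOUNT = re.compile(r"^/var/lib/ceph/osd/[^/]+-(\d+)$")


def osd_ids_by_device(mounts_text):
    """Map each mounted device to the OSD id found in its mount point."""
    mapping = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, mount_point = fields[0], fields[1]
        match = OSD_MOUNT.match(mount_point)
        if match:
            mapping[device] = int(match.group(1))
    return mapping


def osds_on_disk(mapping, disk):
    """Return the OSD ids whose device is the disk itself or one of its partitions."""
    pattern = re.compile(re.escape(disk) + r"(p?\d+)?")
    return sorted(osd for dev, osd in mapping.items() if pattern.fullmatch(dev))


if __name__ == "__main__":
    # Run on the OSD host in question (host 4 in the example above).
    with open("/proc/mounts") as handle:
        print(osds_on_disk(osd_ids_by_device(handle.read()), "/dev/sdc"))
```

Run on the host that owns the suspect disk, this prints the OSD numbers backed by /dev/sdc. If the cluster uses a non-default "osd data" path, the pattern needs adjusting.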
[21:02] * allsystemsarego (~allsystem@188.27.166.164) Quit (Quit: Leaving)
[21:04] * cephalobot (~ceph@ds2390.dreamservers.com) has joined #ceph
[21:04] * cephalobot (~ceph@ds2390.dreamservers.com) Quit ()
[21:05] * cephalobot (~ceph@ds2390.dreamservers.com) has joined #ceph
[21:06] * RuediR (~Adium@dhcp-wlan-eduroam-192-41-134-205.uzh.ch) Quit (Ping timeout: 480 seconds)
[21:09] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[21:10] * KindTwo (KindOne@h48.51.186.173.dynamic.ip.windstream.net) has joined #ceph
[21:10] * sarob (~sarob@183.sub-70-199-0.myvzw.com) Quit (Remote host closed the connection)
[21:10] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit ()
[21:11] * sarob (~sarob@183.sub-70-199-0.myvzw.com) has joined #ceph
[21:12] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:12] * KindTwo is now known as KindOne
[21:12] * sarob (~sarob@183.sub-70-199-0.myvzw.com) Quit (Read error: No route to host)
[21:18] * sagelap (~sage@32.165.133.122) has joined #ceph
[21:19] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[21:24] * amospalla (~amospalla@0001a39c.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:25] * robert (~robert@77.95.99.166) has joined #ceph
[21:25] * robert_ (~robert@77.95.99.166) has joined #ceph
[21:27] <dmsimard> alram: ping
[21:30] * rongze (~rongze@117.79.232.205) has joined #ceph
[21:36] * dmsimard1 (~Adium@108.163.152.66) has joined #ceph
[21:42] * rongze (~rongze@117.79.232.205) Quit (Ping timeout: 480 seconds)
[21:42] * dmsimard (~Adium@2607:f748:9:1666:dd22:a0c5:167b:33d8) Quit (Ping timeout: 480 seconds)
[21:44] * thomnico (~thomnico@70.35.39.20) Quit (Quit: Ex-Chat)
[21:48] * dmsimard1 (~Adium@108.163.152.66) Quit (Read error: Connection reset by peer)
[21:48] * dmsimard (~Adium@2607:f748:9:1666:8517:2c40:7bed:d2e3) has joined #ceph
[21:48] * shang (~ShangWu@70.35.39.20) Quit (Ping timeout: 480 seconds)
[21:50] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[21:55] * dmsimard (~Adium@2607:f748:9:1666:8517:2c40:7bed:d2e3) Quit (Quit: Leaving.)
[21:59] * dmsimard (~Adium@2607:f748:9:1666:240c:f2b6:e0d4:7455) has joined #ceph
[22:11] * JoeGruher (~JoeGruher@134.134.139.76) Quit (Remote host closed the connection)
[22:13] * sarob (~sarob@neitherpuppy-lm.wv.cc.cmu.edu) has joined #ceph
[22:26] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) has joined #ceph
[22:33] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) Quit (Quit: Computer has gone to sleep.)
[22:35] * rongze (~rongze@14.18.250.170) has joined #ceph
[22:41] * amospalla (~amospalla@0001a39c.user.oftc.net) has joined #ceph
[22:41] * sagelap (~sage@32.165.133.122) Quit (Ping timeout: 480 seconds)
[22:41] * rongze (~rongze@14.18.250.170) Quit (Read error: Operation timed out)
[22:50] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:53] * shang (~ShangWu@70.35.39.20) has joined #ceph
[22:57] <athrift> Hrmm, is it normal on Dumpling that when a recovery starts IO to clients drops and causes freezes on the clients ? After a few minutes it appears to return to normal
[22:57] * bandrus1 (~Adium@208.95.30.82) has joined #ceph
[22:57] * bandrus (~Adium@208.95.30.82) Quit (Read error: Connection reset by peer)
[23:02] * ana_ (~quassel@188.199.23.95.dynamic.jazztel.es) has joined #ceph
[23:05] * bandrus (~Adium@208.95.30.82) has joined #ceph
[23:05] * bandrus1 (~Adium@208.95.30.82) Quit (Read error: Connection reset by peer)
[23:22] * Pedras1 (~Adium@64.191.206.83) has joined #ceph
[23:22] * sagelap (~sage@90.sub-70-208-86.myvzw.com) has joined #ceph
[23:25] * Pedras (~Adium@64.191.206.83) Quit (Ping timeout: 480 seconds)
[23:25] * Pedras1 (~Adium@64.191.206.83) Quit (Read error: Connection reset by peer)
[23:27] * sarob (~sarob@neitherpuppy-lm.wv.cc.cmu.edu) Quit (Remote host closed the connection)
[23:27] * sarob (~sarob@CMU-881111.WV.CC.CMU.EDU) has joined #ceph
[23:27] * dty (~derek@129-2-129-155.wireless.umd.edu) Quit (Quit: dty)
[23:29] * Pedras1 (~Adium@64.191.206.83) has joined #ceph
[23:31] * mschiff (~mschiff@port-7321.pppoe.wtnet.de) has joined #ceph
[23:34] * Pedras (~Adium@nat.kosmix.com) has joined #ceph
[23:35] * sarob (~sarob@CMU-881111.WV.CC.CMU.EDU) Quit (Ping timeout: 480 seconds)
[23:36] * Pedras (~Adium@nat.kosmix.com) Quit ()
[23:37] * Pedras1 (~Adium@64.191.206.83) Quit (Ping timeout: 480 seconds)
[23:38] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) has joined #ceph
[23:44] * BillK (~BillK-OFT@58-7-60-85.dyn.iinet.net.au) has joined #ceph
[23:45] * The_Bishop (~bishop@2001:470:50b6:0:154c:c2d4:4e19:7dd6) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[23:48] <cmdrk> what is the status of quotas in cephfs ? i haven't seen much traffic regarding this on the list or the "future of cephfs" talk. (i would be happy with subtree quotas fwiw..)
[23:49] * yanzheng (~zhyan@101.83.121.184) has joined #ceph
[23:49] <cjh_> cmdrk: I'd like that also
[23:49] <cjh_> ceph: are there any benchmarks of the cephfs/hadoop setup vs hdfs/hadoop?
[23:57] * yanzheng (~zhyan@101.83.121.184) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.