#ceph IRC Log

IRC Log for 2014-03-24

Timestamps are in GMT/BST.

[0:03] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[0:03] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[0:08] * zack_dolby (~textual@p852cae.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[0:11] * dmsimard (~Adium@108.163.152.66) Quit (Quit: Leaving.)
[0:11] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[0:12] * nrs_ (~nrs@ool-435376d0.dyn.optonline.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[0:23] * tserong (~tserong@203-57-208-132.dyn.iinet.net.au) has joined #ceph
[0:41] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[0:43] * `jpg (~josephgla@ppp121-44-151-43.lns20.syd7.internode.on.net) has joined #ceph
[0:54] * xarses (~andreww@67.139.65.163) Quit (Ping timeout: 480 seconds)
[0:57] * sputnik13 (~sputnik13@64.134.221.62) has joined #ceph
[0:58] * madkiss1 (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) has joined #ceph
[1:03] * sputnik1_ (~sputnik13@172.56.31.191) has joined #ceph
[1:03] * madkiss (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) Quit (Ping timeout: 480 seconds)
[1:03] * eternaleye (~eternaley@50.245.141.73) Quit (Ping timeout: 480 seconds)
[1:05] * eternaleye (~eternaley@50.245.141.73) has joined #ceph
[1:09] * sputnik13 (~sputnik13@64.134.221.62) Quit (Ping timeout: 480 seconds)
[1:10] * zack_dolby (~textual@e0109-114-22-14-183.uqwimax.jp) has joined #ceph
[1:28] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[1:32] * wschulze (~wschulze@p54BEDDB2.dip0.t-ipconnect.de) has joined #ceph
[1:37] * wschulze (~wschulze@p54BEDDB2.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[1:42] * ghartz (~ghartz@ip-68.net-80-236-84.joinville.rev.numericable.fr) Quit (Remote host closed the connection)
[1:45] * sputnik13 (~sputnik13@64.134.221.62) has joined #ceph
[1:48] * sputnik13 (~sputnik13@64.134.221.62) Quit ()
[1:50] * sputnik13 (~sputnik13@64.134.221.62) has joined #ceph
[1:52] * sputnik1_ (~sputnik13@172.56.31.191) Quit (Ping timeout: 480 seconds)
[1:53] * sputnik13 (~sputnik13@64.134.221.62) Quit ()
[1:56] <classicsnail> I have 100 pools on a ceph cluster, and one pool has about 12 times the cluster average, so I get the health warning of too few pgs
[1:56] <classicsnail> it has no more data overall than the other pools, only more files which are smaller on average
[1:56] <classicsnail> is this safe to ignore?
[1:58] * aarcane (~aarcane@99-42-64-118.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[2:02] * yanzheng (~zhyan@134.134.137.73) has joined #ceph
[2:12] * tserong (~tserong@203-57-208-132.dyn.iinet.net.au) Quit (Quit: Leaving)
[2:12] * tserong (~tserong@203-57-208-132.dyn.iinet.net.au) has joined #ceph
[2:13] * madkiss1 (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) Quit (Ping timeout: 480 seconds)
[2:16] <sage> classicsnail: yeah. it indicates that the pg counts per pool may not be optimal, but it's a performance thing
[2:20] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[2:28] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[2:29] <classicsnail> it's a mirror of the gutenberg project
[2:29] <classicsnail> at least from a client perspective, I'm not sure I care about performance all that much in that direction
[2:29] <classicsnail> what's a decent real world upper bound on a number of pgs for the whole system?
[2:30] <classicsnail> I've got around 6,500 pgs in the cluster at the moment, and things recover fine if I shut down or crash a node
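
As an aside on that exchange: the per-pool placement-group counts behind the warning can be inspected and, if need be, raised (pg_num only ever goes up). A minimal sketch, with <pool> as a placeholder name:

    ceph osd dump | grep pg_num          # pg_num/pgp_num for every pool
    ceph osd pool get <pool> pg_num
    ceph osd pool set <pool> pg_num 256  # raise, then bump pgp_num to match
    ceph osd pool set <pool> pgp_num 256
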
[2:31] * madkiss (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) has joined #ceph
[2:33] * dlan (~dennis@116.228.88.131) has joined #ceph
[2:36] * LeaChim (~LeaChim@host86-159-235-225.range86-159.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:38] * diegows_ (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[2:40] * h6w (~tudor@254.86.96.58.static.exetel.com.au) has joined #ceph
[2:41] <h6w> If I attempt to mount my ceph using ceph-fuse /mnt/point I get "starting ceph client" and then it hangs. Nothing shows up in my logs.
[2:41] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[2:42] <h6w> The monitor appears to be running, and I nmap and the 6789 port is open.
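
For what it's worth, a ceph-fuse mount usually hangs like this when no MDS is active or the client key cannot be read, so two checks worth running are cluster health and a foreground mount with client debugging; a sketch, with <mon-host> as a placeholder:

    ceph -s -m <mon-host>:6789                                    # is there an active mds in the mdsmap?
    ceph-fuse -m <mon-host>:6789 -f --debug-client 20 /mnt/point  # stay in the foreground and log what the client does
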
[2:47] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[2:49] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[3:01] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[3:03] * andreask (~andreask@h081217016175.dyn.cm.kabsi.at) has joined #ceph
[3:03] * ChanServ sets mode +v andreask
[3:04] * andreask (~andreask@h081217016175.dyn.cm.kabsi.at) has left #ceph
[3:11] * sputnik13 (~sputnik13@64.134.221.62) has joined #ceph
[3:12] * sputnik13 (~sputnik13@64.134.221.62) Quit ()
[3:20] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[3:23] * Cube (~Cube@66-87-67-113.pools.spcsdns.net) has joined #ceph
[3:41] * tiger (~textual@58.213.102.114) has joined #ceph
[3:42] * tiger (~textual@58.213.102.114) Quit ()
[3:43] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[3:45] * thuc_ (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[3:45] * markbby (~Adium@168.94.245.1) has joined #ceph
[3:47] * sputnik13 (~sputnik13@64.134.221.62) has joined #ceph
[3:48] * sputnik13 (~sputnik13@64.134.221.62) Quit ()
[3:48] * Cube (~Cube@66-87-67-113.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[3:50] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[3:51] * madkiss (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) Quit (Read error: Operation timed out)
[3:51] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:53] * thuc_ (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:58] * madkiss (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) has joined #ceph
[4:02] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:03] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:05] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[4:05] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:17] * shang (~ShangWu@175.41.48.77) has joined #ceph
[4:17] * shang (~ShangWu@175.41.48.77) Quit ()
[4:18] * shang (~ShangWu@175.41.48.77) has joined #ceph
[4:22] * Cube (~Cube@66-87-67-113.pools.spcsdns.net) has joined #ceph
[4:29] * markbby (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[4:36] * Zethrok_ (~martin@95.154.26.34) has joined #ceph
[4:38] * Zethrok (~martin@95.154.26.34) Quit (Ping timeout: 480 seconds)
[4:42] * Cube (~Cube@66-87-67-113.pools.spcsdns.net) Quit (Quit: Leaving.)
[4:42] * yanzheng (~zhyan@134.134.137.73) Quit (Remote host closed the connection)
[4:47] * glambert (~glambert@37.157.50.80) Quit (Ping timeout: 480 seconds)
[4:48] * danieagle (~Daniel@177.205.176.97.dynamic.adsl.gvt.net.br) Quit (Quit: Thank you very much for everything! :-))
[4:48] * glambert (~glambert@37.157.50.80) has joined #ceph
[4:54] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[4:54] * BillK (~BillK-OFT@106-69-69-86.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[4:57] * Cube (~Cube@66-87-67-113.pools.spcsdns.net) has joined #ceph
[4:57] * BillK (~BillK-OFT@106-69-238-222.dyn.iinet.net.au) has joined #ceph
[5:02] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:06] * Vacum_ (~vovo@88.130.219.77) has joined #ceph
[5:10] * madkiss (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) Quit (Ping timeout: 480 seconds)
[5:13] * BillK (~BillK-OFT@106-69-238-222.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[5:13] * Vacum (~vovo@i59F797B7.versanet.de) Quit (Ping timeout: 480 seconds)
[5:15] * BillK (~BillK-OFT@124-149-91-147.dyn.iinet.net.au) has joined #ceph
[5:16] * yanzheng (~zhyan@134.134.137.75) has joined #ceph
[5:20] * madkiss (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) has joined #ceph
[5:25] * shang (~ShangWu@175.41.48.77) Quit (Quit: Ex-Chat)
[5:25] * shang (~ShangWu@175.41.48.77) has joined #ceph
[5:39] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[5:40] * nrs_ (~nrs@ool-435376d0.dyn.optonline.net) has joined #ceph
[5:47] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:50] * madkiss1 (~madkiss@aftr-88-217-180-152.dynamic.mnet-online.de) has joined #ceph
[5:52] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[5:52] * madkiss (~madkiss@aftr-88-217-181-123.dynamic.mnet-online.de) Quit (Read error: Operation timed out)
[6:07] <nrs_> lurbs: you around?
[6:09] * princeholla (~princehol@p5DE95CEB.dip0.t-ipconnect.de) has joined #ceph
[6:10] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Quit: Leaving.)
[6:41] * zack_dolby (~textual@e0109-114-22-14-183.uqwimax.jp) Quit (Ping timeout: 480 seconds)
[6:47] * zack_dolby (~textual@e0109-114-22-14-183.uqwimax.jp) has joined #ceph
[7:00] * nrs_ (~nrs@ool-435376d0.dyn.optonline.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[7:12] * wrale (~wrale@cpe-107-9-20-3.woh.res.rr.com) Quit (Quit: Leaving...)
[7:16] * renzhi (~renzhi@192.241.193.44) has joined #ceph
[7:18] * princeholla (~princehol@p5DE95CEB.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[7:19] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) has joined #ceph
[7:33] * madkiss1 (~madkiss@aftr-88-217-180-152.dynamic.mnet-online.de) Quit (Quit: Leaving.)
[7:52] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[7:54] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) has joined #ceph
[7:54] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[8:02] * dlan_ (~dennis@116.228.88.131) has joined #ceph
[8:04] * madkiss (~madkiss@host-82-135-29-140.customer.m-online.net) has joined #ceph
[8:04] * dlan (~dennis@116.228.88.131) Quit (Ping timeout: 480 seconds)
[8:10] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) Quit (Ping timeout: 480 seconds)
[8:19] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[8:28] * Cube1 (~Cube@66-87-67-113.pools.spcsdns.net) has joined #ceph
[8:32] * bboris (~boris@78.90.142.146) has joined #ceph
[8:32] * Cube (~Cube@66-87-67-113.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[8:38] * JCL (~JCL@2601:9:5980:39b:fe:9942:389a:6d0) Quit (Quit: Leaving.)
[8:39] * jks (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[8:43] * ircolle (~Adium@2601:1:8380:2d9:9849:3a6a:3d7d:69f6) has joined #ceph
[8:44] * thomnico (~thomnico@2a01:e35:8b41:120:50dd:ed31:f46e:2f55) has joined #ceph
[8:56] * shang (~ShangWu@175.41.48.77) Quit (Ping timeout: 480 seconds)
[8:56] * analbeard (~shw@141.0.32.124) has joined #ceph
[9:00] * glambert (~glambert@37.157.50.80) Quit (Read error: Connection reset by peer)
[9:02] * BillK (~BillK-OFT@124-149-91-147.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[9:05] * BillK (~BillK-OFT@106-69-136-223.dyn.iinet.net.au) has joined #ceph
[9:06] * glambert (~glambert@37.157.50.80) has joined #ceph
[9:06] * Cataglottism (~Cataglott@dsl-087-195-030-170.solcon.nl) has joined #ceph
[9:07] * LeaChim (~LeaChim@host86-159-235-225.range86-159.btcentralplus.com) has joined #ceph
[9:10] * kraken (~kraken@gw.sepia.ceph.com) Quit (Read error: No route to host)
[9:12] * kraken (~kraken@gw.sepia.ceph.com) has joined #ceph
[9:12] * shang (~ShangWu@175.41.48.77) has joined #ceph
[9:14] * Sysadmin88 (~IceChat77@176.254.32.31) Quit (Quit: Oops. My brain just hit a bad sector)
[9:24] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:39] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[9:43] * b0e (~aledermue@juniper1.netways.de) has joined #ceph
[9:43] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[9:46] * hybrid512 (~walid@195.200.167.70) Quit (Quit: Leaving.)
[9:47] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:47] * hybrid512 (~walid@195.200.167.70) has joined #ceph
[9:50] * hybrid512 (~walid@195.200.167.70) Quit ()
[9:50] * hybrid512 (~walid@195.200.167.70) has joined #ceph
[9:52] * syed_ (~chatzilla@180.151.91.102) has joined #ceph
[9:55] * leseb (~leseb@185.21.172.77) Quit (Killed (NickServ (Too many failed password attempts.)))
[10:00] * leseb (~leseb@185.21.172.77) has joined #ceph
[10:00] * BillK (~BillK-OFT@106-69-136-223.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[10:03] * BillK (~BillK-OFT@106-69-52-136.dyn.iinet.net.au) has joined #ceph
[10:03] * zack_dolby (~textual@e0109-114-22-14-183.uqwimax.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[10:04] * Vacum_ is now known as Vacum
[10:17] * bboris (~boris@78.90.142.146) Quit (Ping timeout: 480 seconds)
[10:17] * oro (~oro@2001:620:20:16:c901:fc01:9cf9:26e) has joined #ceph
[10:21] * glambert (~glambert@37.157.50.80) Quit (Quit: <?php exit(); ?>)
[10:28] * yanzheng (~zhyan@134.134.137.75) Quit (Quit: Leaving)
[10:31] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[10:32] * oro (~oro@2001:620:20:16:c901:fc01:9cf9:26e) Quit (Ping timeout: 480 seconds)
[10:34] * Keksior (~oftc-webi@109.232.242.2) has joined #ceph
[10:34] <Keksior> Hello. Anyone online ?
[10:36] <syed_> Keksior: Hello
[10:36] <Keksior> syed_: I've got problem with ceph rbd performance
[10:37] <Keksior> reads are very slow - 30-50MB/s with 3 hosts and 14 OSD's with SSD journaling
[10:37] <Keksior> can you help me tracking down my problem root ?
[10:39] <dwm> Keksior: What have you looked at so far?
[10:40] <Keksior> tried to rebuild ceph with 4 bonded network interfaces for cluster network, and 1 interface for public
[10:41] * oro (~oro@2001:620:20:222:c0c7:3647:c369:d16b) has joined #ceph
[10:42] <dwm> Keksior: Right, so you're currently seeing 240-400Mbit/sec reads. I presume you have gigabit links for the public-side of the network?
[10:42] <Keksior> yes i've got
[10:42] <Keksior> i tried to bench my ceph so its like
[10:42] <dwm> So, assuming no contention, that's about 24-40% of possible bandwidth.
[10:43] <Keksior> rados bench write gave me this - Bandwidth (MB/sec): 185.857
[10:43] <Keksior> so still very poor
[10:43] <Keksior> i'm using pcie SSD cards with 10GB journal per osd
[10:44] <dwm> Are you using partitions on the SSD cards, or files in a filesystem?
[10:44] <Keksior> partition
[10:44] <dwm> Good, that should be optimal, then.
[10:44] <Keksior> rados bench seq gave me this: Bandwidth (MB/sec): 279.724
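
For reference, those numbers come from invocations along these lines (a hedged sketch; exact flags vary a little between releases, and <pool> is a placeholder):

    rados -p <pool> bench 30 write --no-cleanup   # keep the objects so a read pass has something to read
    rados -p <pool> bench 30 seq                  # sequential read bench against those objects
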
[10:45] <Keksior> but when i map rbd device to the system and do some dd on it
[10:45] <dwm> That's about 2.2Gbit/sec.
[10:45] <Keksior> yes, but it's only benchmarks
[10:45] <Keksior> when i map rbd device i've got
[10:46] <dwm> How are you testing RBD performance from your clients?
[10:46] <Keksior> i use openstack
[10:47] <Keksior> whenever i run a VM on the rbd device it takes about 10 minutes to start (ubuntu)
[10:47] <Keksior> so now i try to mount rbd device and do some dd on it
[10:47] <Gugge-47527> single treaded reads from rbd without readahead will always be slower than a local disk
[10:48] <jerker> Keksior: It takes 10 minutes to start a VM machine on RBD using KVM?
[10:48] <Keksior> Gugge-47527: yes
[10:48] <Gugge-47527> but 10 minutes to start a vm is way to slow :)
[10:48] <Keksior> and logging to VM through SSH takes about 2 mins
[10:49] <dwm> I have little experience with this type of issue, but you might need to enable RBD caching: http://marc.info/?l=ceph-devel&m=133758599712768&w=2
[10:49] <Keksior> hdparm from VM on the vda is like this:
[10:49] <Keksior> Timing buffered disk reads: 106 MB in 5.50 seconds = 19.26 MB/sec
[10:49] <Keksior> Timing cached reads: 1348 MB in 2.00 seconds = 674.98 MB/sec
[10:49] * thb (~me@0001bd58.user.oftc.net) has joined #ceph
[10:49] * amospalla (~amospalla@0001a39c.user.oftc.net) Quit (Read error: Connection reset by peer)
[10:50] <dwm> Depending on your consistency requirements, writethrough caching mode may be what you require.
[10:50] <syed_> Keksior: You can try setting rbd_cache=true and cache=writeback to enable rbd caching
[10:50] <Keksior> i've got rbd cache = true
[10:50] <Keksior> should it have _ in the config file ?
[10:51] <jerker> What do you get when you do "dd if=/dev/sda of=/dev/null count=10000 bs=1M"? I get 98.6 MB/s in my small test cluster.
[10:52] <syed_> Keksior: No, rbd cache is correct
[10:53] <Keksior> syed_: so i should add cache=writeback on the ceph.conf yes :>?
[10:53] <Keksior> jerker: still waiting till dd is over ;/
[10:53] <syed_> Keksior: did you also set disk_cachemodes="network=writeback" in nova.conf
[10:54] <Keksior> i had writethrough set
[10:55] <syed_> Keksior: ok
[10:55] <Keksior> jerker: 64,1 MB/s
[10:57] <Keksior> syed_: rebooting VM's with writeback enabled now
[10:57] * amospalla (~amospalla@0001a39c.user.oftc.net) has joined #ceph
[10:58] <Keksior> and what about rbd cache writethrough until flush parameter ?
[10:58] <Keksior> logging to VM still take about 1 minute :/
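
Pulling the suggestions above together: the caching knobs are client-side, so a sketch of where they usually land on a Havana compute node (assumed layout, not a verified config) looks like this:

    # ceph.conf on the client/compute nodes
    [client]
        rbd cache = true
        rbd cache writethrough until flush = true   ; behave as writethrough until the guest issues its first flush

    # nova.conf
    disk_cachemodes = "network=writeback"

As noted further down in the log, rbd_cache is read by librbd on the client, so what the OSD daemons report for it does not matter.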
[10:59] <jerker> Keksior: You have no other problem like wrong DNS-resolver etc? Just guessing...
[11:00] <Keksior> with writeback enabled i've got now 50MB/s, before i've had only 20 MB/s
[11:00] * bboris (~boris@router14.mail.bg) has joined #ceph
[11:00] <jerker> Keksior: 64.1 MB/s is not great I guess (my old Core 2 Duo and Atom boxes are doing better with total 4 drives) But good enough I guess.
[11:01] * Cataglottism (~Cataglott@dsl-087-195-030-170.solcon.nl) Quit (Ping timeout: 480 seconds)
[11:01] * allsystemsarego (~allsystem@5-12-37-194.residential.rdsnet.ro) has joined #ceph
[11:01] <jerker> Keksior: Good enough for not making the OS totally horribly slow.
[11:01] <Keksior> jerker: the network is ok i checked it even through tcpdump for lost packets
[11:01] <jerker> You are mounting in KVM and not inside Linux kernel client RBD? /just checking
[11:02] <Keksior> jerker: yes in kvm, but the problem is that my project needs a minimum of 80MB/s :(
[11:02] <jerker> How many OSD nodes? What kind of drives? SSD journal?
[11:02] <Keksior> SSD journal on the OCZ revo drives, 4osd/4osd/6osd
[11:03] <Keksior> the ceph osds are on the same physical machine as nova-compute
[11:03] <Keksior> you think it could be the bottleneck ?
[11:03] <jerker> My KVM is also running on one of the OSD nodes. It should not matter in this case I guess.
[11:04] <jerker> actually the ethernet usage should go down by some since some of the reads are from the local node.
[11:04] * Cataglottism (~Cataglott@dsl-087-195-030-184.solcon.nl) has joined #ceph
[11:04] <syed_> Keksior: you can look here https://ceph.com/docs/master/rbd/rbd-config-ref/ for all the available options
[11:04] <Keksior> i even thought about flashcache
[11:04] <jerker> What kind of physical drives is it? How fast are they?
[11:05] <Keksior> physical drives are SAS 10k rpm
[11:05] <Keksior> new servers
[11:05] <jerker> Mine are two 7k2rpm SATA and two 5k4rpm/8MB cache. Hmm. Cool. You have better hardware than I have...
[11:05] <jerker> (testcluster)=
[11:05] <Keksior> jerker: but you have better performance :/
[11:06] <jerker> If you do "iostat -x 2" (%util) on the client (runnning the RBD) when doing something heavy, what do you get? And if you check the same on the OSD-server?
[11:07] <jerker> I have plenty of room left on the OSD-nodes (only using 15% or something) when reading 98% IO-util on the RBD.
[11:08] <Keksior> when dd 8 GB file from vm to dev/null
[11:08] <jerker> I dd 10 GB file /dev/sda to /dev/null
[11:08] <jerker> The file must be larger than RAM on the VM
[11:09] <jerker> Also check "dstat -N eth0" to see how much bandwidth is being used. It should not be much at all.
[11:09] <Keksior> 8 gb file read to /dev/null gave me 26,5 MB/s
[11:09] <Keksior> on the VM
[11:10] <Keksior> so the reads are terrible :(
[11:10] <jerker> How much IO-utilization? The most right column (%util)
[11:11] <jerker> "dstat -N eth0" or whatever should be run on the OSD-nodes and the VM-host not the KVM/RBD-client since that traffic is just block device.
[11:11] <Keksior> 30-40 %
[11:11] <Keksior> IO-util on the VM is max 40%
[11:11] <jerker> How large blocks are you reading? 1M?
[11:11] <Keksior> yes
[11:12] <Keksior> with BS=1MB i've got 90% util
[11:12] <jerker> I do not understand. How much CPU have you left in VM and in VM-host when doing this? Check with "dstat" again while reading full.
[11:12] <jerker> Keksior: What speed do you get with bs=1M ?
[11:12] <jerker> Keksior: bandwidth
[11:13] <Keksior> jerker: on the VM cpu util is 60-70%
[11:13] <Keksior> with disk 50-60%
[11:13] <Keksior> dstat gave me this
[11:13] <Keksior> waiting for dd to end to get speed
[11:13] <Keksior> meantime i'll generate more powerfull VM
[11:14] <jerker> What about not the VM CPU util but the VM-host (the host for the VM) is there enough CPU there?
[11:14] <jerker> It should not be a problem I guess since mine are Core 2 Duo which are running fine. ...
[11:14] * Cataglottism (~Cataglott@dsl-087-195-030-184.solcon.nl) Quit (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[11:15] <Keksior> it's 16 cores machine with HT
[11:15] <Keksior> but dstat says it's 93 % used
[11:15] <syed_> Keksior: have you considered increasing the read ahead size ?
[11:15] <Keksior> only one core are used
[11:15] <jerker> mmmm
[11:15] <Keksior> on the VM ? or compute node ?
[11:16] * stewiem2000 (~stewiem20@195.10.250.233) Quit (Quit: Leaving.)
[11:16] <Keksior> dd with BS=1M gave me 64MB/s
[11:17] <Keksior> but still far from 100MB/s
[11:17] <syed_> readahead in ceph.conf
[11:17] <Keksior> i haven't considered it by now, you think i should ?
[11:18] <syed_> Keksior: sure
[11:18] * stewiem2000 (~stewiem20@195.10.250.233) has joined #ceph
[11:18] <Keksior> syed_: looking for it in documentation ceph and can't find ;/
[11:19] <jerker> Keksior: I am curious why you have 93% CPU utilization on the compute node. I get a qemu-kvm process eating 80% CPU (one core?) when reading at 91 MB/s from disk(RBD). This is on a Core 2 Duo.
[11:19] <jerker> Keksior: I am checking with "top".
[11:19] <Keksior> jerker: mine is new Dell R720 machine with small VM (1 core with 2 GB ram)
[11:19] <Keksior> now trying the same with 4 core VM with 8 GB ram
[11:20] <jerker> Keksior: My VM is 1 core 512 MB RAM.
[11:20] <jerker> Sorry 2 cores.
[11:20] <Keksior> it's very weird ;/
[11:20] <jerker> I have only 4 GB RAM total on the combined 2xOSD + 1xKVM box.
[11:21] <Keksior> so i should have 90MB/s without special configuration i think
[11:21] <jerker> Can you see with what arguments the KVM process was started? "ps axuw | grep kvm"
[11:21] <Keksior> syed_: could you help me find it in documentation from ceph, i cant find it
[11:22] <jerker> Keksior: What version of Ceph and KVM are you using? I run SL6 (RHEL6) with the KVM supplied by Inktank/Ceph.
[11:23] <Keksior> ceph -v 0.72.2, with openstack havana on the Ubuntu 12.04 LTS
[11:23] <jerker> Ceph is version 0.72, Qemu-kvm is version qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64
[11:23] <Keksior> http://paste.openstack.org/show/74119/
[11:23] <jerker> 0.72.2 same for me
[11:24] <jerker> Can you check CPU usage with the single KVM-process while reading? Do you get 93% CPU utilization on a single core when checking with "top"?
[11:25] <Keksior> qemu-system-x86 129% cpu
[11:25] <Keksior> it's 7% of my machine cpu utilization
[11:27] <jerker> I do not know how well the drivers for QEMU scale to more cores than one. I have no idea. I am not even using the "virtio"-driver that you are, I am just using the "driver=ide-drive..."
[11:28] <jerker> Those should be worse for performance..
[11:30] <Keksior> virtio shoud be faster than ide-drive
[11:30] <jerker> Keksior: indeed
[11:30] <Keksior> but in my cluster it isn't :/
[11:30] <Keksior> still 60MB/s is better from 20MB/s
[11:30] <Keksior> but it's still far from 90MB/s i need to have
[11:30] <jerker> I am still curious about the 129% CPU utilization on the compute node. Wtf.
[11:31] <jerker> What processor?
[11:31] <jerker> I have a Intel Core 2 Duo E8400 at 3.0 GHz
[11:32] <jerker> And I am using 58% of one core of that one when reading at 93.3 MB/s.
[11:33] <syed_> Keksior: have a look at rsize option https://ceph.com/docs/master/man/8/mount.ceph/, default is 512 kb, you can change it 2048 kb
[11:33] <Keksior> Xeon E5620 @ 2,40 Ghz
[11:34] <Keksior> syed_: so here it is, can't i add something to ceph.conf file to get all old and new osds have 2 MB ?
[11:34] <Gugge-47527> why would a mount.ceph option have anything to do with rbd performance?
[11:35] <Keksior> also what's weird, with ceph fs i've got 100-125MB/s
[11:35] <Keksior> so maximum utilization of the public interface, and that's the performance i'm trying to achieve with rbd
[11:37] <Keksior> hmm now i see somethign strange
[11:38] <Keksior> when i do sudo ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok config show | grep 'rbd'
[11:38] <Keksior> i'll see that on my osds rbd_cache is false
[11:38] <Keksior> even though it's enabled in ceph.conf and the machines were rebooted after applying this configuration
[11:38] <Gugge-47527> rbd_cache is a client setting
[11:38] <Gugge-47527> what its set to on the servers doesnt matter
[11:39] <Keksior> ah ok, now it's clear for me
[11:40] <syed_> Keksior: you can also try something like: echo "2048" > /sys/block/{sdx}/queue/readahead_kb
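
One caveat on that command: on most kernels the sysfs attribute is spelled read_ahead_kb, and blockdev can set the same thing; a sketch for a guest's virtio disk vda (the device name is an assumption):

    echo 2048 > /sys/block/vda/queue/read_ahead_kb
    blockdev --setra 4096 /dev/vda   # equivalent, but expressed in 512-byte sectors
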
[11:41] * jerker leaving for lunch. good luck.
[11:42] <Keksior> syed_: now i've got 100-125 MB/s !
[11:43] <syed_> Keksior: ;)
[11:43] <Keksior> thank you very much guys for helping me with this problem !
[11:43] <jerker> Keksior: cool
[11:43] <Keksior> you're the best ! :)
[11:43] <Gugge-47527> what blocksize do you use in your test?
[11:43] <Gugge-47527> remember that readahead can damage random io performance :)
[11:44] <Keksior> Gugge-47527: 2MB blocksize
[11:44] <jerker> Hey, now I get 161 MB/s. Thanks guys :)
[11:44] * jerker NOW I am heading for lunch
[11:45] <Keksior> Gugge-47527: for now on i need to get this cluster working smoothly "in testing environment" so i'll get more time to get this working :)
[11:45] <Keksior> jerker: still better than mine :(
[11:47] <Keksior> but writes still poor ;/
[11:47] <Keksior> 43 MB/s
[11:47] * shang (~ShangWu@175.41.48.77) Quit (Quit: Ex-Chat)
[11:54] <syed_> Keksior: can you share your ceph.conf ?
[11:55] * ksingh (~Adium@2001:708:10:10:c29:3034:c200:4ad) has joined #ceph
[11:56] <Keksior> http://paste.openstack.org/show/74123/
[11:57] <Keksior> i created my cluster with ceph-deploy
[11:57] * The_Bishop (~bishop@2001:470:50b6:0:6dd0:23f:9159:4ddf) has joined #ceph
[12:04] <bboris> how do i mark stale pgs as lost and force replication?
[12:05] <bboris> i deleted and recreated the osds on one host but forgot to mark them as lost
[12:05] <bboris> now i have all of the pgs stale+active+clean
[12:06] <bboris> i think the mon is waiting for the old osds ?
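
For reference, the commands usually involved in that situation look roughly like this; an untested sketch, and destructive, so only for PGs whose data really is gone (IDs are placeholders):

    ceph pg <pg-id> query                           # see which OSDs the stale PG is still waiting for
    ceph osd lost <osd-id> --yes-i-really-mean-it   # declare the removed OSD's data unrecoverable
    ceph pg force_create_pg <pg-id>                 # recreate a PG as empty if nothing of it survives
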
[12:06] <syed_> Keksior: how are journals implemented in your setup /
[12:07] <Keksior> 5 GB partition on ssd per osd
[12:07] <Keksior> i've got 500 GB SSD, so you think i should make 100GB journal per osd ?
[12:07] <Keksior> to get whole ssd used and more speed
[12:08] <Keksior> i even thought about getting flashcache on the half of ssd to put on the osds
[12:08] <Keksior> but i don't need so much speed i only need to get 100-125 MB of read AND write :(
[12:14] <syed_> Keksior: you may want to read this http://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals
[12:15] * oro (~oro@2001:620:20:222:c0c7:3647:c369:d16b) Quit (Ping timeout: 480 seconds)
[12:16] <classicsnail> well, that was a mistake, gave a intro to ceph to two of my junior guys today
[12:16] <classicsnail> said "watch this"
[12:16] <classicsnail> "watch this" is something that you should never say... I'm now recovering a corrupt mds journal
[12:16] <classicsnail> it wasn't production
[12:28] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[12:28] <alphe> hello everyone
[12:29] <alphe> I have one question buzzing me!
[12:29] <alphe> It's about disk-filling projection
[12:30] <alphe> so I have 38 TB of global RBD capacity; I made an 18TB rbd-image drive and I would like to know how the data will be handled
[12:33] <alphe> let's say after formatting I write 17 TB to the disk, then I free 5TB so I get 12TB of "free" space on the rbd-image (client side), but I still see 34TB used on the ceph cluster (17TB of data + 17TB of replicas)
[12:34] <alphe> now if I write back 5TB, how will the replicas be handled? will the os first write to the 1 TB of remaining free space on the rbd-image and then start using the freed space? will the replicas be overwritten?
[12:34] <alphe> or will the replicas get created alongside?
[12:35] * fatih (~fatih@78.186.36.182) has joined #ceph
[12:37] * yanzheng (~zhyan@134.134.137.75) has joined #ceph
[12:41] * glambert (~glambert@37.157.50.80) has joined #ceph
[12:46] * jcsp (~Adium@0001bf3a.user.oftc.net) has joined #ceph
[12:55] * sprachgenerator (~sprachgen@c-67-167-211-254.hsd1.il.comcast.net) has joined #ceph
[12:56] * syed_ (~chatzilla@180.151.91.102) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 27.0.1/20140212131424])
[12:59] <jerker> Hmmmm. In order to free space in the pool the client as far as I understand must discard / TRIM the data. If that has not been done then data erased and then written on the client RBD will just be handled as overwritten on the object layer.
[12:59] * oro (~oro@2001:620:20:222:c0c7:3647:c369:d16b) has joined #ceph
[13:00] <alphe> jerker ok, but on a regular filesystem data is not actually removed from the hard drive; at most it is tagged as "deleted" and available for overwrite
[13:01] <alphe> so in my case that means I will eventually top out at 18TB of disk space plus 18TB of replicated data, meaning a global use of my disks of 36TB/38TB
[13:02] <alphe> and this is as far I understand how ceph backend works I don t really know If I will have 4TC
[13:02] * BillK (~BillK-OFT@106-69-52-136.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[13:02] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[13:02] <alphe> and this is as far I understand how ceph backend works I don t really know If I will have 4TB of untouched space or if replicas will fill that space because well ... there is free space ...
[13:03] * valeech (~valeech@pool-71-171-123-210.clppva.fios.verizon.net) has joined #ceph
[13:04] <jerker> If you have a RBD of 18 TB then only 36 TB total should be used for it with 2 replicas. AFAIK. But you can use less if the RBD is not filled and the file system supports TRIM (or may be called discard). I have not tried it my self though so that is the theory.
[13:04] * BillK (~BillK-OFT@58-7-168-173.dyn.iinet.net.au) has joined #ceph
[13:04] * wschulze (~wschulze@p54BEDDB2.dip0.t-ipconnect.de) has joined #ceph
[13:08] <alphe> hum
[13:08] * Keksior slaps jerker around a bit with a large fishbot
[13:08] <Keksior> wrr
[13:08] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[13:08] * jerker shutsup
[13:09] <Keksior> sorry ;D misscliced
[13:09] <alphe> yeah that was my calculation, that sooner or later I will top out the rbd-image max size and the replicas will follow, then I will have a bunch of untouched space
[13:09] <Keksior> jerker: will you have some time later to help me solving write performance problem :>?
[13:10] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[13:10] * trotofdoom (~trotofdoo@BSN-143-34-158.dial-up.dsl.siol.net) has joined #ceph
[13:10] <alphe> jerker fitrim fs seems to do the job
[13:11] <trotofdoom> hello, i need general info about ceph (glusterfs)... am i at the right place? :)
[13:11] <alphe> I will try it in my virtual ceph-cluster mini
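
A minimal sketch of that test, assuming the rbd-image's filesystem is mounted at /mnt/rbd and that the whole stack (guest filesystem, disk driver, librbd) passes discards through; as noted later in the log, the kernel rbd client of this era does not:

    fstrim -v /mnt/rbd   # issue FITRIM over the free space; -v reports how much was trimmed
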
[13:11] <jerker> trotofdoom: I have tried Ceph and GlusterFS, but is most current in Ceph.
[13:11] * humbolt (~elias@91-114-136-138.adsl.highway.telekom.at) has joined #ceph
[13:12] <jerker> Keksior: I Do work with other stuff, but if I happens to be around fiddling with Ceph I will be happy to answer.
[13:12] <alphe> trotofdoom I think so ... if you have noticed a bug in the implementation and want to submit a patch then you should go to developer chat but that would be all
[13:12] <trotofdoom> no no, i am a long way away from finding bugs i think ;)
[13:14] <trotofdoom> what i need is distributed file system for static files (images for web server, backups, logs...)
[13:14] <jerker> How many files and how large?
[13:16] <trotofdoom> and secondly we are trying xenserver and we want to be able to store virtual machines on this file system
[13:16] <jerker> CephFS is not ready for general production use. I have not followed the mailinglists for a while for GlusterFS.
[13:17] <trotofdoom> not really large... lets say 1 million files ... large aroind 150kb
[13:17] <jerker> But VM-images (RBD) seem to work fine in Ceph.
[13:17] <jerker> How much data in total?
[13:18] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[13:18] <jerker> trotofdoom: even NFS is a distributed file system, but I guess you want some kind if fault-tolerance / high-availability ?
[13:18] <trotofdoom> 1 tb of data for starters
[13:18] <trotofdoom> yes, that is my main concern
[13:18] <trotofdoom> fault tolerane / HA
[13:19] <trotofdoom> not a lot of writes, mostly reades
[13:20] <trotofdoom> i have currently 8 physical servers each running around 8 virtual ones
[13:21] <trotofdoom> i want to use this crappy hard drives that are attached to this physical machines for distributed storage
[13:22] <trotofdoom> i dont have money for SAN or something
[13:23] <trotofdoom> first question... xenserver can access storage over nfs or iscsi... what should i use?
[13:24] <jerker> Do you have to use xenserver?
[13:25] <trotofdoom> not really, we were using esxi until now, but with the newest version they introduced new admin console which you only get if you pay
[13:25] <jerker> I would advice you to set up both GlusterFS and Ceph and see how it works. Do some testing and see what works for you.
[13:25] <trotofdoom> i want my whole system to be open source, or as cheap as possible (dont we all :))
[13:26] * fghaas (~florian@213.17.226.11) has joined #ceph
[13:26] <jerker> CephFS (POSIX-filesystem) is not really production use yet. GlusterFS is maybe, depending what you do.
[13:26] <trotofdoom> then what is production ready?
[13:27] <jerker> Both are slower than normal NFS, at least a few years ago. A old-school DRBD could also be considered. But everything HA makes most stuff a lot mor complicated.
[13:27] <trotofdoom> yes HA is my main goal, and simplicity
[13:27] <trotofdoom> one man army ... :)
[13:28] <jerker> trotofdoom: Storing VM (RBD) in Ceph is production ready as far as I can tell. (But I am just a user not developer.)
[13:29] <jerker> regarding glustefs, better talk to the glusterfs folks that is using it now.
[13:29] <trotofdoom> so lets say i will use ceph for that... what are ceph main components... i read about monitor and such.. i would like a system without master metadata server, i think less components the better
[13:30] <jerker> trotofdoom: you have failover between MONs. And between the OSD.
[13:30] <jerker> trotofdoom: built in as designed.
[13:31] <trotofdoom> so lets say i will have three virtual machines running this distributed file system... if one of them goes down, everything should still be working
[13:31] <trotofdoom> after i restart that machine it should resync automatically
[13:32] <trotofdoom> possible with ceph?
[13:32] <jerker> If one of the physical machines goes down (the OSDs) then the running virtual machines will keep running and storing the data on the other physical machines (OSDs). But if the virtual machine was running on that physical machine then the current state of the VM will be lost. (of course the "hard drive" of the VM will still be there).
[13:33] * b0e1 (~aledermue@213.95.15.4) has joined #ceph
[13:33] <jerker> So you can start up the VM on another physical machine. But I guess that is more of some other layer job to handle the VMs. I have not my self been running OpenStack and the like so I do not know what they can do.
[13:34] <jerker> Ceph takes care of the storage and does not do any live migration or similar of VMs.
[13:34] <Keksior> jerker: can you help me with writes slow performance now ?
[13:34] <Keksior> the reads working great
[13:34] <jerker> Keksior: If possible.
[13:35] <Keksior> but the writes on the rbd volume is now 40MB/s with 200MB file and 4 MB/s with 1 GB file
[13:36] <jerker> How do you test?
[13:36] <Keksior> jerker: the reads working great with read_ahead
[13:36] <jerker> How large are your journals?
[13:36] <Keksior> dd from /dev/zero to file
[13:36] <Keksior> 5 GB for now
[13:36] <Keksior> i can make them up to 30-50 GB
[13:36] <Keksior> per osd
[13:36] <Keksior> but is it worth of having so big journals ?
[13:37] * b0e (~aledermue@juniper1.netways.de) Quit (Ping timeout: 480 seconds)
[13:37] <jerker> Keksior: I have only 10 GB journal per OSD.
[13:37] <jerker> I try this now on my cluster: dd if=/dev/sda of=/escapemachine/test bs=1M count=10000
[13:38] <jerker> /dev/sdb1 is /escapemachine (its my time machine).
[13:38] <Keksior> do you have ssd journal :>?
[13:38] <jerker> Keksior: yes 10 GB SSD journal per OSD
[13:38] <Keksior> i see now
[13:39] <Keksior> that when i run dd to write file form /dev/zero to vda on VM the proc is unused
[13:39] <jerker> Keksior: I do not know how large journals are useful. I have considered bcache but I would like to keep it simple.
[13:39] <Keksior> with reads it's 90% used
[13:39] <Keksior> jerker: i would also like have it simple, but the 4 MB/s writes are way too slow :(
[13:39] <jerker> Indeed.
[13:39] <Keksior> i even removed ceph mon from nova-compute node
[13:40] <Keksior> to check does this affect performance
[13:40] <Keksior> and it isnt
[13:40] <jerker> I get 100% IO-util when reading from /dev/sda and 58% IO-util when writing to file on /dev/sdb, both are on RBD.... Strange
[13:41] <jerker> Ah now it is more even with 80-90% writing too.. Waiting for numbers.
[13:42] * mattt (~textual@CPE68b6fcfafe43-CM68b6fcfafe40.cpe.net.cable.rogers.com) has joined #ceph
[13:43] <Keksior> could you look on my ceph.conf file, maybe you will find anything that i need to add to it - http://paste.openstack.org/show/74126/
[13:44] <Keksior> also i've created my cluster on the btrfs partition to get more speed, but it seems to be otherwise ;/
[13:44] <jerker> Keksior: I get 31.0 MB/s when reading from RBD and writing to another RBD. with "dd bs=1M count=10k"
[13:45] * b0e1 (~aledermue@213.95.15.4) Quit (Quit: Leaving.)
[13:45] <jerker> Keksior: I run XFS, using standard ceph-deploy
[13:45] <jerker> Keksior: do you use compression in Btrfs?
[13:45] <Keksior> jerker: if default it's off then not, i've used ceph-deploy with --fs-type btrfs option only
[13:46] * fghaas (~florian@213.17.226.11) Quit (Quit: Leaving.)
[13:48] <jerker> Keksior: when doing "dd if=/dev/zero bs=1M count=1000" I get 56.9 MB/s.
[13:49] <jerker> My ceph.conf is shorter and does not contain anything about "filestore ...." otherwise almost the same as far as I can tell.
[13:49] <Keksior> jerker: this are lines syed_ told me to try to add
[13:50] <jerker> Keksior: ok
[13:50] <Keksior> but that does nothing to my performance :(
[13:50] <jerker> Keksior: try XFS instead of Btrfs? I have not used Btrfs myself. Without compression I have personally no reason to run Btrfs instead of XFS.
[13:51] * alfredodeza (~alfredode@198.206.133.89) has left #ceph
[13:51] <fedgoat> Can anyone help me with stale buckets that I cant remove with rados. How do I edit a users omap index. I've gone over pretty much everything conventional in the ways of removing buckets. I ended up with Full OSD's and after adding new osd's and rebalancing, trying to do bucket rm, seems to have purged the data, but 2 buckets remain that won't DIE
[13:51] <fedgoat> I also believe this could be related to an unresolved issue opened http://tracker.ceph.com/issues/5197
[13:53] <Keksior> jerker: the problem is i must recreate to xfs one by one, because there is data that cannot disappear
[13:53] <Keksior> but i'll try
[13:53] <jerker> Keksior: I do not know, but that is at least a major thing that is different from my setup.
[13:55] <jerker> Keksior: I run on an old Netgear gigabit switch by the way. My next step is to get better throughput. Bond up interfaces ala old school Beowulf clusters maybe. Or get 10g switch. We have a few 10g connectors available but they are all over the building. :(
[13:55] <Keksior> jerker: now recreating osds with xfs, had you add extra options to tune xfs?
[13:55] <jerker> No.
[13:55] * Cataglottism (~Cataglott@dsl-087-195-030-170.solcon.nl) has joined #ceph
[13:56] * fatih_ (~fatih@static.39.43.46.78.clients.your-server.de) has joined #ceph
[13:56] <jerker> I ran "ceph-deploy prepare node1:sdb:sda3 node2:sdc:sda4" as plain as possible.
[13:56] <jerker> etc
[13:57] <jerker> Sorry "ceph-deploy prepare node1:sdb:/dev/sda3 node1:sdc:/dev/sda4" if I remember correctly
[13:57] <jerker> (of course check your devices yourself) :)
[13:57] * fatih_ (~fatih@static.39.43.46.78.clients.your-server.de) Quit ()
[13:57] <Keksior> yes of course :)
[13:57] * fatih (~fatih@78.186.36.182) Quit (Read error: Operation timed out)
[14:00] * BurgerEat3r (~oftc-webi@185.13.89.187) has joined #ceph
[14:01] * wschulze (~wschulze@p54BEDDB2.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[14:02] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[14:03] * sjm (~Adium@pool-108-53-56-179.nwrknj.fios.verizon.net) has joined #ceph
[14:04] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Ping timeout: 480 seconds)
[14:05] <trotofdoom> so to wrap things up, maybe i should wait a little bit before using ceph in production... what about glusterFS? any thoughts?
[14:06] <trotofdoom> and what would you recommend using instead of xenserver 6.2 ?
[14:06] <jerker> trotofdoom: for running VMs with RBD that works fine. But not the POSIX-file system yet.
[14:06] <jerker> trotofdoom: That is out of my area, I'm sorry. I am just playing around with individual VMs starting plain KVM.
[14:07] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[14:07] * sroy (~sroy@207.96.182.162) has joined #ceph
[14:08] <jerker> trotofdoom: regarding glusterfs, better talk to them i guess. I am happy both GlusterFS and Ceph has improved since I played with them the first time 2008 :-)
[14:09] <trotofdoom> are there any good alternatives...something i missed?
[14:09] <trotofdoom> lustre i think is overkill for such small scale
[14:10] <jerker> trotofdoom: For plain availability check out DRBD. Much more simple in a sense but works with two servers. May be exported via SMB and NFS.
[14:10] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[14:11] <trotofdoom> i dont like drbd beacuse of two server limit... i want a solution which allows adding nodes when the need comes
[14:11] * sprachgenerator (~sprachgen@c-67-167-211-254.hsd1.il.comcast.net) Quit (Quit: sprachgenerator)
[14:12] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[14:12] * sroy (~sroy@207.96.182.162) Quit (Read error: Operation timed out)
[14:15] <trotofdoom> guys thanks a lot for help i will probably test glusterFS myself and see what it can do
[14:19] * sroy (~sroy@207.96.182.162) has joined #ceph
[14:19] * japuzzo (~japuzzo@pok2.bluebird.ibm.com) has joined #ceph
[14:22] * fghaas (~florian@213.17.226.11) has joined #ceph
[14:23] * mattt (~textual@CPE68b6fcfafe43-CM68b6fcfafe40.cpe.net.cable.rogers.com) Quit (Quit: Computer has gone to sleep.)
[14:25] * diegows_ (~diegows@190.190.5.238) has joined #ceph
[14:29] * ninkotech_ (~duplo@217-112-170-132.adsl.avonet.cz) has joined #ceph
[14:31] * thb (~me@0001bd58.user.oftc.net) Quit (Quit: Leaving.)
[14:33] * ninkotech_ (~duplo@217-112-170-132.adsl.avonet.cz) Quit (Remote host closed the connection)
[14:33] * ninkotech_ (~duplo@217-112-170-132.adsl.avonet.cz) has joined #ceph
[14:34] * fdmanana (~fdmanana@bl5-172-157.dsl.telepac.pt) has joined #ceph
[14:35] * ninkotech__ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[14:37] * jcsp (~Adium@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[14:38] * jcsp (~Adium@0001bf3a.user.oftc.net) has joined #ceph
[14:40] * b0e (~aledermue@juniper1.netways.de) has joined #ceph
[14:40] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:41] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[14:41] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) Quit ()
[14:42] * ninkotech_ (~duplo@217-112-170-132.adsl.avonet.cz) Quit (Ping timeout: 480 seconds)
[14:44] * xdeller (~xdeller@109.188.124.66) has joined #ceph
[14:45] <Keksior> jerker: wrr i can't migrate from btrfs to xfs, i need copy-on-write :(
[14:46] <Keksior> so i need to get this working on my existing configuration
[14:46] * jtaguinerd (~Adium@121.54.32.134) has joined #ceph
[14:47] <jtaguinerd> hi guys
[14:47] <Gugge-47527> Keksior: why do you need cow on the osd filesystem?
[14:49] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[14:50] <Keksior> Gugge-47527: openstack VM os volumes
[14:53] * rahatm1 (~rahatm1@CPE602ad089ce64-CM602ad089ce61.cpe.net.cable.rogers.com) has joined #ceph
[14:53] * rahatm1 (~rahatm1@CPE602ad089ce64-CM602ad089ce61.cpe.net.cable.rogers.com) Quit ()
[14:54] <jtaguinerd> I have a situation here that I am trying to understand. I am using Openstack and have integrated Ceph in Cinder. I did df -h from all my VMs and the sum of all is 30TB, however from Ceph cli it shows raw data consumption is 60TB while real usage including replica count is 118TB. My question is how come the total of df -h from the VM is different from the raw data consumption in ceph cli?
[14:54] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[14:55] * yanzheng (~zhyan@134.134.137.75) Quit (Remote host closed the connection)
[14:55] <jtaguinerd> Thanks in advance to whoever can explain the reason behind the two different result. :)
[14:56] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[14:56] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[14:57] * valeech (~valeech@pool-71-171-123-210.clppva.fios.verizon.net) Quit (Quit: valeech)
[14:57] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[14:58] <alphe> jtaguinerd the difference comes from data+replica
[14:58] <alphe> jtaguinerd the os only shows the data written but not the replicas, which are ceph internal sauce
[14:59] <alphe> and there is no need for the os level to know about it
[14:59] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[14:59] <alphe> but at ceph client level then you obviously need to know what the overall use is
[15:01] <jtaguinerd> hi alphe, thanks for the reply..
[15:01] <Keksior> what rbd_cache_size should i use?
[15:01] <alphe> Keksior depends you disk no ?
[15:02] <Keksior> i use 4 osd +1ssd for journal
[15:02] <Keksior> also if i have big SSD (500 GB SSD + 4x 300 GB SAS) shoud i use 120 GB for journal per osd ?
[15:02] <jtaguinerd> Alphe, my ceph cli result looks like this 57488 GB data, 114 TB used, 61438 GB / 174 TB avail;
[15:03] <Keksior> or is it too much,
[15:03] <jtaguinerd> i understand that 57488 is the raw data stored, while 114 TB is the actual space used including the replica count
[15:03] <alphe> jtaguinerd the first value is the data written and available as seen by the os, the second value 114tb is data + replicas
[15:03] <alphe> the third values are the available space and the overall possible space
[15:04] <alphe> you lose 2 tb per osd just from the initial formatting
[15:04] * zack_dolby (~textual@e0109-114-22-14-183.uqwimax.jp) has joined #ceph
[15:04] <jtaguinerd> aphe, ok but why is there so much difference with the result of df -h from inside the VM?
[15:05] <alphe> jtaguinerd hum seems odd indeed ...
[15:05] <alphe> let me think about it
[15:05] <Gugge-47527> Keksior: you dont need cow on the osd filesystem to do _any_ of the features rbd support
[15:05] <alphe> the virtual disk you did what are they real space ?
[15:06] <alphe> are they with fixed size ?
[15:06] <jtaguinerd> alphe, what do you mean by real space? :)
[15:06] <alphe> jtaguinerd ...
[15:06] <jtaguinerd> it's boot from volume from Openstack
[15:07] <Gugge-47527> jtaguinerd: you see 30TB used inside the VM, and 60TB used on the cluster ... because when you delete files from a vm, the data is not deleted from the virtual disks
[15:07] <alphe> jtaguinerd when your create a virtual drive you have 2 ways to do it or with fixed allocated size or with dynamic allocated size
[15:07] <Keksior> Gugge-47527: ok, thanks for info. I'm trying everything to get at least 100MB/s write performance in my VM :(
[15:07] <Gugge-47527> you need your guests to support trim, and you need the stack all the way down to the osd's to support trim
[15:07] <alphe> Gugge-47527 right
[15:07] * rahatm1 (~rahatm1@CPE602ad089ce64-CM602ad089ce61.cpe.net.cable.rogers.com) has joined #ceph
[15:07] <jtaguinerd> Gugge-47527, how can i delete them if that is the case?
[15:07] * rahatm1 (~rahatm1@CPE602ad089ce64-CM602ad089ce61.cpe.net.cable.rogers.com) Quit ()
[15:08] <Gugge-47527> jtaguinerd: with trim
[15:08] * zack_dol_ (~textual@p852cae.tokynt01.ap.so-net.ne.jp) has joined #ceph
[15:08] <Gugge-47527> or create a new rbd, with a new filesystem, and copy the data over, and delete the old rbd
[15:08] <Keksior> also, in the ceph.conf file should i use "options_with_underline" or "options without underline"?
[15:08] <Gugge-47527> Keksior: both work
[15:08] <alphe> jtaguinerd you need to install an intermediate fs that is FITRIM for example
[15:08] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Ping timeout: 480 seconds)
[15:09] <Keksior> ok
[15:09] <Keksior> Gugge-47527: can you help me track down my performance issue with rbd volumes ?
[15:09] * linuxkidd (~linuxkidd@rtp-isp-nat1.cisco.com) has joined #ceph
[15:09] <alphe> Keksior don t forget that your vm is an emulated machine ...
[15:09] <Keksior> if i mount rbd to OS i'll get 1GB/s write, but from VM i've got only 4 MB/s
[15:10] <alphe> so don t expect emulated hardware to be as cool than real hardware
[15:10] <Gugge-47527> Keksior: do you have 10Gbit network?
[15:10] <alphe> if you reach 60MB/s steady write that is a great achievement already
[15:10] <Keksior> i just expect to get them working normally, now i have to wait 1-2 minutes to login
[15:11] <Keksior> Gugge-47527: 1 Gbit network only i've got, but i use ceph mon physical server as ceph client (nova-compute on the same host as ceph mon)
[15:11] <Keksior> + cluster network bonded with 4x1Gbit
[15:11] <Gugge-47527> Keksior: then this is a lie: "if i mount rbd to OS i'll get 1GB/s write"
[15:11] * BillK (~BillK-OFT@58-7-168-173.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:11] <alphe> Keksior you can t have more data pumped in than the tube allows
[15:12] <Keksior> i'm using RevoDrive 3 for journal so if i write data on the same physical server i think it's possible
[15:12] <jtaguinerd> alphe, Gugge-47527, thanks for the reply, atleast now i know why my ceph consumption is getting bloated. But seems like a dangerous thing to do?
[15:12] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[15:12] * The_Bishop (~bishop@2001:470:50b6:0:6dd0:23f:9159:4ddf) Quit (Ping timeout: 480 seconds)
[15:12] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[15:12] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[15:12] * zack_dolby (~textual@e0109-114-22-14-183.uqwimax.jp) Quit (Ping timeout: 480 seconds)
[15:12] <Gugge-47527> jtaguinerd: that is just how filesystems work, when you delete data it stays on the disk.
[15:12] <Keksior> even if the client and ceph-mon is the same machine it goes throught network stack ?
[15:12] <alphe> jtaguinerd yes data are never realy deleted until you install a trim
[15:12] <Gugge-47527> jtaguinerd: until you overwrite that part of the disk with new data
[15:13] <Gugge-47527> Keksior: the mon is not involved with the data transfer
[15:13] <alphe> then trim will have a cost in perf since each delete = a real removal of the deleted data
[15:13] <Gugge-47527> Keksior: and mounting rbd on the same server as osd can give deadlocks
[15:13] * zack_dolby (~textual@p852cae.tokynt01.ap.so-net.ne.jp) has joined #ceph
[15:13] * zack_dol_ (~textual@p852cae.tokynt01.ap.so-net.ne.jp) Quit (Read error: Connection reset by peer)
[15:13] <Keksior> Gugge-47527: ok now i see
[15:13] <Gugge-47527> s/mounting/mapping/
[15:13] <kraken> Gugge-47527 meant to say: Keksior: and mapping rbd on the same server as osd can give deadlocks
[15:13] <alphe> Gugge-47527 the trim is to be installed at osd level right ?
[15:14] <Gugge-47527> trim has to be enabled in the guest
[15:14] <alphe> and if you use a rbd-image at the client rbd machine level 2 since there is 2 xfs layers
[15:14] <Gugge-47527> as far as i remember librbd / qemu supports trim
[15:14] <Gugge-47527> but kernel rbd does not
[15:14] <alphe> Gugge-47527 but the guest in the case of openstack radosgw doesn t mount a device
[15:14] <Gugge-47527> trim does not need to be enabled on the osd fs
[15:15] <Gugge-47527> openstack uses radosgw and not rbd?
[15:15] <alphe> ok so since I use kernel rbd I would be unable tu use trim
[15:15] <Keksior> Gugge-47527: but so far i cannot win with performance better than 10-30MB/s, i think if the network could handle 1 Gbit i should have at least 80MB/s
[15:16] <Gugge-47527> Keksior: try a size=1 pool, to see if that gives you better performance
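
A throwaway pool is the quickest way to try that; a sketch with made-up names (size 1 means no replication, so benchmarking only):

    ceph osd pool create bench1 128
    ceph osd pool set bench1 size 1
    rados -p bench1 bench 30 write
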
[15:16] <alphe> Gugge-47527 normally openstack and s3-amazon cloud style storage using ceph as backend are using rbd in a native way
[15:16] <Gugge-47527> a write right now has to go from vm host to one osd, and from that osd to an osd on another host
[15:16] <Gugge-47527> check the network usage on all hosts while you write
[15:17] <Gugge-47527> and check iostat on the osd filesystems while you write
[15:17] <Gugge-47527> see if something reaches a limit
[15:17] * BurgerEat3r (~oftc-webi@185.13.89.187) Quit (Quit: Page closed)
[15:17] <Keksior> it's only 300-400Mbit/s with 4x1Gbit Bonded
[15:17] <alphe> Keksior depends the service you use
[15:17] <Keksior> but i'll try with size=1
[15:17] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[15:17] <alphe> with samba and a ton of small files don t expect going more than 10 Mbps
[15:18] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[15:18] <Keksior> alphe: i'm trying to use rbd via openstack cinder
[15:18] <Keksior> alphe: and the VM takes about 8-10 minutes to boot up, but logging in to it via ssh takes about 1-2 minutes
[15:18] <jtaguinerd> Gugge-47527, my setup (openstack+ceph) uses rbd
[15:18] <alphe> Keksior seems like your vms are super slow
[15:18] * Sommarnatt (~Sommarnat@kungsbacka.oderland.com) has joined #ceph
[15:19] <alphe> ssh login should be less than 1 sec
[15:19] <Gugge-47527> jtaguinerd: then you should be able to use trim if you use the right storage driver
[15:19] <Gugge-47527> http://ceph.com/docs/master/rbd/qemu-rbd/#enabling-discard-trim
[15:19] <jtaguinerd> ok, Gugge-47527 i'll figure it out. Thanks for pointing to the right direction :)
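
For reference, on the libvirt side that document boils down to a discard attribute on the disk's driver element, roughly as below; a sketch only, and at this vintage the guest generally needs an IDE or virtio-scsi bus rather than plain virtio-blk for the discards to reach the rbd image (pool/image names are placeholders):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <source protocol='rbd' name='<pool>/<image>'/>
      <target dev='sda' bus='scsi'/>
    </disk>
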
[15:20] <Keksior> alphe: i'm trying to sort out the speed of those VM's and i'm not even close :(
[15:20] * The_Bishop (~bishop@2001:470:50b6:0:289e:e599:e26:df61) has joined #ceph
[15:20] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[15:20] <Gugge-47527> Keksior: maybe try to run fio with the librbd backend on one of your openstack hosts
[15:21] <alphe> Keksior like I said virtual machines are emulations and emulation really use 1 single hardware resources ... dont know if that is ok
[15:21] * hybrid512 (~walid@195.200.167.70) Quit (Quit: Leaving.)
[15:21] <alphe> Keksior there is bench tools in ceph too
[15:21] <alphe> like ceph osd stat for example
[15:21] <alphe> like ceph osd perf for example
[15:22] * hybrid512 (~walid@195.200.167.70) has joined #ceph
[15:22] <Keksior> ok i'll try this and give you a results
[15:22] * hybrid512 (~walid@195.200.167.70) Quit ()
[15:22] * hybrid512 (~walid@195.200.167.70) has joined #ceph
[15:22] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[15:24] <alphe> you can use jnettop to monitor your network
[15:25] <alphe> and see accurately what the ceph nodes are doing
[15:27] <alphe> alfredoza are you around ?
[15:28] <alphe> with ceph-deploy I have a tiny problem: ceph-deploy install is broken when trying to install emperor on saucy (ubuntu 13.10) because there is no ceph.com/debian-emperor/saucy directory
[15:28] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Remote host closed the connection)
[15:29] <alphe> alfredoza with ceph-deploy I have a tiny problem: ceph-deploy install is broken when trying to install emperor on saucy (ubuntu 13.10) because there is no ceph.com/debian-emperor/saucy directory
[15:30] * mjeanson (~mjeanson@bell.multivax.ca) has joined #ceph
[15:30] * valeech (~valeech@pool-71-171-123-210.clppva.fios.verizon.net) has joined #ceph
[15:30] * sprachgenerator (~sprachgen@130.202.135.185) has joined #ceph
[15:31] <fedgoat> Can anyone help me with stale buckets that I can't remove with rados? How do I edit a user's omap index? I've gone over pretty much everything conventional in the way of removing buckets. I ended up with full OSDs, and after adding new OSDs and rebalancing, trying to do bucket rm seems to have purged the data, but 2 buckets remain that won't DIE
[15:31] <fedgoat> I also believe this could be related to an unresolved issue opened: http://tracker.ceph.com/issues/5197
[15:31] <pmatulis> alphe: that should be fixed in a few days
[15:35] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[15:35] * renzhi (~renzhi@192.241.193.44) Quit (Ping timeout: 480 seconds)
[15:36] <alphe> pmatulis ok :)
[15:37] <alphe> pmatulis I manually changed the way the os version is checked: if saucy, then we use raring ...
[15:37] <alphe> but that is not the proper way ...
[15:37] <alphe> the proper way is to have a directory for every ubuntu distro supported ...
[15:38] <alphe> if between raring and saucy there is no real change, then it can be a link to the other
[15:38] <alphe> saucy --> raring
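    A sketch of that manual workaround (pointing a saucy box at the raring packages until the saucy directory exists); this is the admittedly "not proper" stopgap, assuming the repo layout the emperor packages used at the time:

        echo "deb http://ceph.com/debian-emperor/ raring main" | sudo tee /etc/apt/sources.list.d/ceph.list
        sudo apt-get update && sudo apt-get install ceph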
[15:38] * wrale (~wrale@cpe-107-9-20-3.woh.res.rr.com) has joined #ceph
[15:39] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[15:40] * valeech (~valeech@pool-71-171-123-210.clppva.fios.verizon.net) Quit (Quit: valeech)
[15:41] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:47] <bboris> what is the difference between "mon initial members" and "mon_initial_members" ?
[15:47] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[15:47] <bboris> ceph-deploy writes it with underscores in the config
[15:51] * oro (~oro@2001:620:20:222:c0c7:3647:c369:d16b) Quit (Ping timeout: 480 seconds)
[15:51] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[15:52] <joao> no difference at all
[15:52] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has joined #ceph
[15:52] * jtaguinerd (~Adium@121.54.32.134) Quit (Ping timeout: 480 seconds)
[15:52] * jtaguinerd (~Adium@121.54.44.128) has joined #ceph
[15:52] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has left #ceph
[15:52] <joao> 'mon initial members', 'mon-initial-members' or 'mon_initial_members' all translate to the same thing internally
[15:52] * mattt (~textual@92.52.76.140) has joined #ceph
[15:53] <joao> well, at least the one with spaces and underscores does; pretty sure about the one with the dashes but not 100%
[15:53] <joao> yeah, dashes too
[15:53] <joao> that also applies to all other config options
[15:54] <joao> e.g., '--debug-mon 10' is the same as '--debug_mon 10' and 'debug mon = 10' in the config file
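    A small illustration of the equivalence joao describes; any of these spellings selects the same option:

        # ceph.conf
        [mon]
            debug mon = 10
        # or:  debug_mon = 10

        # command line, likewise
        ceph-mon -i a --debug-mon 10
        ceph-mon -i a --debug_mon 10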
[15:54] * jmlowe1 (~Adium@2601:d:a800:511:8d26:9807:cedc:7975) has joined #ceph
[15:55] * madkiss (~madkiss@host-82-135-29-140.customer.m-online.net) Quit (Quit: Leaving.)
[15:55] * mattt (~textual@92.52.76.140) Quit (Read error: Connection reset by peer)
[15:55] * thomnico (~thomnico@2a01:e35:8b41:120:50dd:ed31:f46e:2f55) Quit (Quit: Ex-Chat)
[15:57] <jmlowe1> I have a quick question about tiered caching: the cache pool is just a regular pool that is designated as a higher tier?
[15:59] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[15:59] * Cataglottism (~Cataglott@dsl-087-195-030-170.solcon.nl) Quit (Quit: My Mac Pro has gone to sleep. ZZZzzz???)
[16:03] * Keksior (~oftc-webi@109.232.242.2) Quit (Quit: Page closed)
[16:03] * thomnico (~thomnico@2a01:e35:8b41:120:51bd:4913:9399:150b) has joined #ceph
[16:04] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[16:04] * bitblt (~don@128-107-239-233.cisco.com) has joined #ceph
[16:09] * Sommarnatt (~Sommarnat@kungsbacka.oderland.com) Quit (Remote host closed the connection)
[16:11] * cmdrk (~lincoln@c-24-12-206-91.hsd1.il.comcast.net) has joined #ceph
[16:12] * gregsfortytwo (~Adium@2607:f298:a:607:14a9:6586:fbe1:902e) Quit (Quit: Leaving.)
[16:12] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:13] * gregsfortytwo (~Adium@2607:f298:a:607:5c6d:7d97:be10:c78c) has joined #ceph
[16:14] * humbolt1 (~elias@178-190-244-65.adsl.highway.telekom.at) has joined #ceph
[16:15] <cmdrk> Hi all, has anyone ever seen messages of this type: libceph: osd66 192.168.1.227:6875 socket closed (con state OPEN)
[16:16] <cmdrk> I see dmesg spammed with messages like this on one of my hosts that has an rbd volume mapped and mounted
[16:16] <cmdrk> I tried restarting osd.66 to no effect
[16:16] <gregsfortytwo> cmdrk: that usually just means that there was an idle session to an OSD that got closed
[16:16] <gregsfortytwo> it's not evidence of any issues on its own
[16:16] <gregsfortytwo> jmlowe1: yeah, the cache pool is a normal pool
[16:17] <gregsfortytwo> there's a page about it in the dev section of ceph.com/docs
[16:18] <jmlowe1> ok, so normal crush rules and replication levels; I can take down an osd in the cache pool and everything will work like I would expect
[16:19] <cmdrk> gregsfortytwo: thanks
[16:19] * humbolt (~elias@91-114-136-138.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[16:21] * thomnico (~thomnico@2a01:e35:8b41:120:51bd:4913:9399:150b) Quit (Quit: Ex-Chat)
[16:26] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) has joined #ceph
[16:28] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[16:31] <mikedawson> jmlowe1: right
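    Roughly, attaching a normal pool as a cache tier looks like this in the firefly-era CLI (pool names and PG count are placeholders; the dev docs gregsfortytwo mentions have the exact syntax for a given release):

        ceph osd pool create cachepool 512
        ceph osd tier add basepool cachepool
        ceph osd tier cache-mode cachepool writeback
        ceph osd tier set-overlay basepool cachepool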
[16:31] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[16:34] * thuc (~thuc@c-71-198-202-49.hsd1.ca.comcast.net) Quit (Ping timeout: 481 seconds)
[16:39] * sroy (~sroy@207.96.182.162) Quit (Ping timeout: 480 seconds)
[16:43] * jtaguinerd1 (~Adium@121.54.32.134) has joined #ceph
[16:43] * jtaguinerd (~Adium@121.54.44.128) Quit (Read error: Connection reset by peer)
[16:48] * `jpg (~josephgla@ppp121-44-151-43.lns20.syd7.internode.on.net) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz???)
[16:48] * sroy (~sroy@207.96.182.162) has joined #ceph
[16:49] * sroy (~sroy@207.96.182.162) Quit ()
[16:49] * sroy (~sroy@207.96.182.162) has joined #ceph
[16:53] <loicd> houkouonchi-work: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-precise-amd64-basic/log.cgi?log=236a505d81382aae2a29d616ffd471b380f580af claims it does not find https://github.com/ceph/jerasure/commit/4da55343bc2f93391db6a8cfecf90bbba11108b2 . Could it be because the local mirror updates are delayed ?
[16:54] * thomnico (~thomnico@2a01:e35:8b41:120:51bd:4913:9399:150b) has joined #ceph
[16:55] * loicd noticed http://tracker.ceph.com/issues/7826#change-33791 thanks sage :-)
[17:02] * analbeard (~shw@141.0.32.124) Quit (Quit: Leaving.)
[17:03] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) Quit (Remote host closed the connection)
[17:04] * reed (~reed@75-101-54-131.dsl.static.sonic.net) has joined #ceph
[17:06] * trotofdoom (~trotofdoo@BSN-143-34-158.dial-up.dsl.siol.net) Quit ()
[17:07] * valeech (~valeech@ip72-205-7-86.dc.dc.cox.net) has joined #ceph
[17:08] * oro (~oro@77-59-135-139.dclient.hispeed.ch) has joined #ceph
[17:12] * fatih (~fatih@78.186.36.182) has joined #ceph
[17:16] * thomnico (~thomnico@2a01:e35:8b41:120:51bd:4913:9399:150b) Quit (Quit: Ex-Chat)
[17:16] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[17:17] * zerick (~eocrospom@190.187.21.53) Quit (Read error: Connection reset by peer)
[17:21] * BManojlovic (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[17:29] * Lea (~LeaChim@host86-159-235-225.range86-159.btcentralplus.com) has joined #ceph
[17:29] * LeaChim (~LeaChim@host86-159-235-225.range86-159.btcentralplus.com) Quit (Read error: Connection reset by peer)
[17:31] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:31] <alphe> how do I change that ceph-nodo03=0.0.0.0:6800/0 in my mon map? I have a monitor that is not properly set
[17:31] * jmlowe1 (~Adium@2601:d:a800:511:8d26:9807:cedc:7975) has left #ceph
[17:31] * jtaguinerd1 (~Adium@121.54.32.134) Quit (Quit: Leaving.)
[17:31] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Quit: Ex-Chat)
[17:32] * wschulze (~wschulze@p54BEDDB2.dip0.t-ipconnect.de) has joined #ceph
[17:33] <alphe> selfsolution: do a ceph-deploy mon destroy <node>
[17:33] <alphe> then do a ceph-deploy mon add <node>
[17:34] <alphe> that will update the monmap and set everything working smoothly
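    A sketch of that sequence, with a monmap check before and after, using the node name from the map above:

        ceph mon dump                         # shows the mon stuck at 0.0.0.0:6800
        ceph-deploy mon destroy ceph-nodo03
        ceph-deploy mon add ceph-nodo03
        ceph mon dump                         # the address should now be correct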
[17:35] * Shmouel (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) has joined #ceph
[17:40] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:41] * Shmouel1 (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[17:44] * sjusthm (~sam@24-205-43-60.dhcp.gldl.ca.charter.com) has joined #ceph
[17:47] * Nats (~Nats@telstr575.lnk.telstra.net) Quit (Ping timeout: 480 seconds)
[17:47] * renzhi (~renzhi@192.241.193.44) has joined #ceph
[17:55] * oro (~oro@77-59-135-139.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[17:58] * b0e (~aledermue@juniper1.netways.de) Quit (Remote host closed the connection)
[18:00] * xmltok (~xmltok@216.103.134.250) Quit (Quit: Leaving...)
[18:02] <saturnine> ls
[18:03] * jjgalvez (~jjgalvez@ip98-167-16-160.lv.lv.cox.net) has joined #ceph
[18:07] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[18:07] <bens> any inktank people in here
[18:07] <bens> (support)
[18:08] * sputnik13 (~sputnik13@64.134.221.62) has joined #ceph
[18:09] * Haksoldier (~islamatta@88.234.60.197) has joined #ceph
[18:09] <Haksoldier> EUZUBILLAHIMINE??EYTANIRRACIM BISMILLAHIRRAHMANIRRAHIM
[18:09] <Haksoldier> ALLAHU EKBERRRRR! LA ?LAHE ?LLALLAH MUHAMMEDEN RESULULLAH!
[18:09] <Haksoldier> I did the obligatory prayers five times a day to the nation. And I promised myself that, who (beside me) taking care not to make the five daily prayers comes ahead of time, I'll put it to heaven. Who says prayer does not show attention to me I do not have a word for it.! Prophet Muhammad (s.a.v.)
[18:09] <Haksoldier> hell if you did until the needle tip could not remove your head from prostration Prophet Muhammad pbuh
[18:09] * Haksoldier (~islamatta@88.234.60.197) has left #ceph
[18:09] <sage> bens: many
[18:11] <bens> I need second eyes on a ticket - I am in danger of losing ceph because of performance
[18:12] <bens> I am getting awesome help so far, but we are stuck.
[18:13] * sputnik13 (~sputnik13@64.134.221.62) Quit ()
[18:14] * fghaas (~florian@213.17.226.11) Quit (Ping timeout: 480 seconds)
[18:18] <dwm> bens: I'm not an Inktank person, but I could look at it if it's public?
[18:19] * sroy (~sroy@207.96.182.162) Quit (Quit: Quitte)
[18:20] * sroy (~sroy@207.96.182.162) has joined #ceph
[18:25] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[18:26] * princeholla (~princehol@p5DE95CEB.dip0.t-ipconnect.de) has joined #ceph
[18:26] <bboris> i'm trying to test the mon_osd_(near)?full_ratio options
[18:27] <bboris> currently i have "mon_osd_full_ratio": "0.05", "mon_osd_nearfull_ratio": "0.1"
[18:27] * xarses (~andreww@12.164.168.117) has joined #ceph
[18:27] <bboris> and the system's df reports /var/lib/ceph/osd/ceph-0 is at 20%
[18:28] <bboris> "ceph df" shows 19.32%
[18:29] * Cube1 (~Cube@66-87-67-113.pools.spcsdns.net) Quit (Quit: Leaving.)
[18:29] <bboris> that is raw usage, the non-empty pool is 9.62%
[18:29] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Remote host closed the connection)
[18:29] <bboris> still, "ceph -s" doesn't report near full
[18:29] <bboris> what am i missing? restarted all osds and monitors
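    One thing worth checking: in releases of that era the full/nearfull ratios live in the pg map, so values set in ceph.conf after the cluster is created may not take effect; they can be changed at runtime, for example:

        ceph pg set_nearfull_ratio 0.1
        ceph pg set_full_ratio 0.15
        ceph pg dump | head -3        # the pg map header shows full_ratio / nearfull_ratio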
[18:30] * princeholla (~princehol@p5DE95CEB.dip0.t-ipconnect.de) Quit ()
[18:31] * wrale_ (~wrale@cpe-107-9-20-3.woh.res.rr.com) has joined #ceph
[18:32] <bboris> that is ceph v0.77
[18:32] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[18:38] * wrale (~wrale@cpe-107-9-20-3.woh.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:39] <bens> dwm: It isn't public, but thanks for the offer.
[18:40] <bens> The gist of the problem is that when a new pool is created, new PGs go into peering or degraded instantly and take a LONG time (15+ minutes) to recover
[18:40] * xmltok (~xmltok@216.103.134.250) has joined #ceph
[18:42] <bboris> if anyone can help me you can write here, i'll read it later in the archive. going back home, later :)
[18:42] * bboris (~boris@router14.mail.bg) Quit (Quit: leaving)
[18:43] * dpippenger (~riven@66-192-9-78.static.twtelecom.net) has joined #ceph
[18:45] * dis (~dis@109.110.66.239) Quit (Ping timeout: 480 seconds)
[18:46] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:47] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[18:47] * reed (~reed@75-101-54-131.dsl.static.sonic.net) Quit (Quit: Ex-Chat)
[18:53] <alphe> what can be done against backfill_toofull PGs?
[18:53] <alphe> 3 active+remapped+wait_backfill+backfill_toofull
[18:55] <alphe> I was playing with a virtual ceph cluster to test some things around disk space usage and ceph is really neat
[18:58] * japuzzo (~japuzzo@pok2.bluebird.ibm.com) Quit (Quit: Leaving)
[19:00] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:05] <Gugge-47527> alphe: get more space :)
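    Besides adding capacity, a common stopgap in that era is to bump the backfill full threshold and/or reweight the fullest OSDs slightly so backfill can proceed (the ratio and osd id here are placeholders):

        ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'
        ceph osd reweight 12 0.8      # nudge a nearly full osd down a bit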
[19:14] * theactualwarrenusui (~Warren@2607:f298:a:607:4c3f:82e:add5:b567) has joined #ceph
[19:18] * angdraug (~angdraug@12.164.168.117) Quit (Remote host closed the connection)
[19:20] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[19:21] * theonceandfuturewarrenusui (~Warren@2607:f298:a:607:b8db:c806:85f2:3ae7) Quit (Ping timeout: 480 seconds)
[19:23] * Boltsky (~textual@cpe-198-72-138-106.socal.res.rr.com) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz???)
[19:25] * bboris (~boris@78.90.142.146) has joined #ceph
[19:27] * The_Bishop (~bishop@2001:470:50b6:0:289e:e599:e26:df61) Quit (Ping timeout: 480 seconds)
[19:31] * haomaiwa_ (~haomaiwan@117.79.232.254) Quit (Ping timeout: 480 seconds)
[19:31] * The_Bishop (~bishop@2001:470:50b6:0:289e:e599:e26:df61) has joined #ceph
[19:34] * ksingh (~Adium@2001:708:10:10:c29:3034:c200:4ad) Quit (Quit: Leaving.)
[19:40] * meeh (~meeh@193.150.121.66) has joined #ceph
[19:42] * alfredodeza (~alfredode@198.206.133.89) has joined #ceph
[19:51] * thomnico (~thomnico@2a01:e35:8b41:120:51bd:4913:9399:150b) has joined #ceph
[19:52] * thomnico (~thomnico@2a01:e35:8b41:120:51bd:4913:9399:150b) Quit ()
[19:57] * Boltsky (~textual@office.deviantart.net) has joined #ceph
[20:02] <wrencsok> We have a problem getting usage statistics out of the rados gateways for object storage. The resolution in the source code implies that we can get hourly results, but our testing shows that usage data for a user/bucket comes in 6 hour intervals. We see 4 reports covering a 24 hour period, instead of 24 reports. Each hour we query gets us a 6 hour window. Is there some tunable or is that a bug with Emperor (0.72.2)
[20:02] <wrencsok> ?
[20:03] <wrencsok> who could I ping to talk to about that directly?
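    For reference, the usage queries being discussed look roughly like this (user id and dates are placeholders):

        radosgw-admin usage show --uid=johndoe --start-date=2014-03-23 --end-date=2014-03-24
        radosgw-admin usage show --uid=johndoe --show-log-entries=false   # summary only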
[20:09] * andreask (~andreask@h081217016175.dyn.cm.kabsi.at) has joined #ceph
[20:09] * ChanServ sets mode +v andreask
[20:12] * andreask (~andreask@h081217016175.dyn.cm.kabsi.at) has left #ceph
[20:14] * sjm (~Adium@pool-108-53-56-179.nwrknj.fios.verizon.net) Quit (Quit: Leaving.)
[20:15] * sputnik13 (~sputnik13@206.29.182.180) has joined #ceph
[20:19] * sjm (~Adium@pool-108-53-56-179.nwrknj.fios.verizon.net) has joined #ceph
[20:24] * JoeGruher (~JoeGruher@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[20:25] <JoeGruher> is there a rule of thumb for number of PGs with erasure coded pools?
[20:27] * sputnik13 (~sputnik13@206.29.182.180) Quit (Ping timeout: 480 seconds)
[20:29] * japuzzo (~japuzzo@rrcs-24-39-154-34.nyc.biz.rr.com) has joined #ceph
[20:33] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:46] * dis (~dis@109.110.66.165) has joined #ceph
[20:49] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:49] <JoeGruher> this command doesn't seem to work - "ceph osd erasure-code-profile get myprofile" - from https://ceph.com/docs/master/dev/erasure-coded-pool/ - anyone used this?
[20:50] <jks> anyone knows if ceph dumpling RPMs are available somewhere for Fedora Core 17? (the official repo seems to include only for fc18 and fc19)
[20:54] * sage (~quassel@2607:f298:a:607:20ea:13bf:5af:2725) Quit (Remote host closed the connection)
[20:54] * sage (~quassel@2607:f298:a:607:3c97:85a6:2361:88b1) has joined #ceph
[20:54] * ChanServ sets mode +o sage
[20:57] * sjm (~Adium@pool-108-53-56-179.nwrknj.fios.verizon.net) has left #ceph
[20:57] * alphe (~alphe@0001ac6f.user.oftc.net) Quit (Quit: Leaving)
[20:59] * japuzzo (~japuzzo@rrcs-24-39-154-34.nyc.biz.rr.com) Quit (Ping timeout: 480 seconds)
[20:59] <JoeGruher> as far as I can tell "ceph osd erasure-code-profile" is not valid syntax for any commands?
[21:00] * yanzheng (~zhyan@134.134.137.71) has joined #ceph
[21:02] * ksingh (~Adium@a91-156-75-252.elisa-laajakaista.fi) has joined #ceph
[21:02] * ksingh (~Adium@a91-156-75-252.elisa-laajakaista.fi) has left #ceph
[21:04] * The_Bishop (~bishop@2001:470:50b6:0:289e:e599:e26:df61) Quit (Ping timeout: 480 seconds)
[21:11] * valeech (~valeech@ip72-205-7-86.dc.dc.cox.net) Quit (Quit: valeech)
[21:14] <dmick> JoeGruher: it should be, if you're running a late enough version
[21:14] <dmick> docs/master is for the master branch; what version are you running?
[21:17] <JoeGruher> i have 0.78
[21:17] <JoeGruher> dmick: ^
[21:19] <dmick> yeah. not in there.
[21:20] <JoeGruher> dmick: hmm. so, if one is using erasure coding in 0.78, how do you make those changes, specifically i'm interested in the one about changing the ruleset failure domain?
[21:20] * mattt (~textual@CPE0026f326e530-CM0026f326e52d.cpe.net.cable.rogers.com) has joined #ceph
[21:22] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[21:23] <dmick> don't know that one
[21:24] * danieagle (~Daniel@177.205.176.97.dynamic.adsl.gvt.net.br) has joined #ceph
[21:24] <JoeGruher> ok - thanks
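    For anyone on a release that does have it, the profile workflow from the master docs looks roughly like this, with ruleset-failure-domain being the knob JoeGruher is after (names, PG counts and k/m values are placeholders):

        ceph osd erasure-code-profile set myprofile k=3 m=2 ruleset-failure-domain=osd
        ceph osd erasure-code-profile get myprofile
        ceph osd pool create ecpool 128 128 erasure myprofile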
[21:26] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has joined #ceph
[21:27] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has left #ceph
[21:29] * sroy (~sroy@207.96.182.162) Quit (Quit: Quitte)
[21:30] * joao|lap (~JL@a95-92-33-54.cpe.netcabo.pt) has joined #ceph
[21:30] * ChanServ sets mode +o joao|lap
[21:30] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Ping timeout: 480 seconds)
[21:31] * diegows_ (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[21:34] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[21:36] * toutour (~toutour@causses.idest.org) Quit (Remote host closed the connection)
[21:47] * toutour (~toutour@causses.idest.org) has joined #ceph
[21:49] * mattt (~textual@CPE0026f326e530-CM0026f326e52d.cpe.net.cable.rogers.com) Quit (Quit: Computer has gone to sleep.)
[21:51] * allsystemsarego (~allsystem@50c25c2.test.dnsbl.oftc.net) Quit (Quit: Leaving)
[21:52] * Cataglottism (~Cataglott@dsl-087-195-030-170.solcon.nl) has joined #ceph
[21:57] * analbeard (~shw@host31-53-108-38.range31-53.btcentralplus.com) has joined #ceph
[21:59] * The_Bishop (~bishop@g229164051.adsl.alicedsl.de) has joined #ceph
[22:02] * sleinen1 (~Adium@2001:620:0:26:3497:f885:eb0d:6a97) has joined #ceph
[22:03] * JoeGruher (~JoeGruher@jfdmzpr01-ext.jf.intel.com) Quit (Remote host closed the connection)
[22:04] * yanzheng (~zhyan@134.134.137.71) Quit (Remote host closed the connection)
[22:11] * danieagle (~Daniel@177.205.176.97.dynamic.adsl.gvt.net.br) Quit (Quit: Muito Obrigado por Tudo! :-))
[22:13] <jdmason> Is there a ceph command to show the version running or should I just use the package manager's version?
[22:15] <dmick> ceph tell osd.* version, ceph tell mon.* version
[22:15] <dmick> among others
[22:15] * MarkN1 (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[22:15] <fedgoat> Can anyone help me with stale buckets that I can't remove with rados? How do I edit a user's omap index? I've gone over pretty much everything conventional in the way of removing buckets. I ended up with full OSDs, and after adding new OSDs and rebalancing, trying to do bucket rm seems to have purged the data, but 2 buckets remain that won't DIE
[22:15] <fedgoat> I also believe this could be related to an unresolved issue opened: http://tracker.ceph.com/issues/5197
[22:16] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 28.0/20140314220517])
[22:16] <jdmason> dmick: great, thanks
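    For example, the tell forms dmick mentions plus the local client version:

        ceph --version                # version of the local binaries
        ceph tell osd.* version       # ask each osd daemon
        ceph tell mon.* version       # ask each mon daemon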
[22:18] * MarkN2 (~nathan@197.204.233.220.static.exetel.com.au) has joined #ceph
[22:19] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Remote host closed the connection)
[22:19] * MarkN1 (~nathan@142.208.70.115.static.exetel.com.au) Quit (Read error: Connection reset by peer)
[22:21] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[22:22] * Sysadmin88 (~IceChat77@176.254.32.31) has joined #ceph
[22:22] * perfectsine (~perfectsi@if01-gn01.dal05.softlayer.com) Quit (Remote host closed the connection)
[22:26] * MarkN2 (~nathan@197.204.233.220.static.exetel.com.au) Quit (Ping timeout: 480 seconds)
[22:29] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:32] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[22:43] * perfectsine (~perfectsi@if01-gn01.dal05.softlayer.com) has joined #ceph
[22:46] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[22:46] * Cube (~Cube@12.248.40.138) Quit (Read error: Connection reset by peer)
[22:46] * Nats (~Nats@telstr575.lnk.telstra.net) has joined #ceph
[22:49] * wschulze (~wschulze@p54BEDDB2.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[22:50] <Kioob> Hi
[22:50] * Cube1 (~Cube@12.248.40.138) Quit (Read error: Connection reset by peer)
[22:50] * Cube (~Cube@12.248.40.138) has joined #ceph
[22:50] * analbeard (~shw@host31-53-108-38.range31-53.btcentralplus.com) Quit (Quit: Leaving.)
[22:51] <Kioob> about bug #5876 (kernel crash on Assertion failure in rbd_img_obj_callback), what kind of information can I give to help fix that ?
[22:51] <kraken> Kioob might be talking about http://tracker.ceph.com/issues/5876 [Assertion failure in rbd_img_obj_callback() : rbd_assert(which >= img_request->next_completion);]
[22:51] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[22:51] * Cube (~Cube@12.248.40.138) Quit (Read error: Connection reset by peer)
[22:51] * fatih_ (~fatih@78.186.36.182) has joined #ceph
[22:51] <Kioob> kraken: good boy :)
[22:51] * Cube1 is now known as Cube
[22:52] <Kioob> I had a crash tonight, once again.
[22:53] * JoeGruher (~JoeGruher@134.134.137.75) has joined #ceph
[22:53] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[22:54] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[22:55] * fatih_ (~fatih@78.186.36.182) Quit ()
[22:58] * fatih (~fatih@78.186.36.182) Quit (Ping timeout: 480 seconds)
[23:02] <Midnightmyth> is Calamari open source or enterprise stuff?
[23:03] <darkfader> I would
[23:03] <dmick> http://www.inktank.com/enterprise/
[23:03] <darkfader> ww
[23:03] * Cataglottism (~Cataglott@dsl-087-195-030-170.solcon.nl) Quit (Quit: My Mac Pro has gone to sleep. ZZZzzz???)
[23:04] * dmsimard (~Adium@108.163.152.2) Quit (Ping timeout: 480 seconds)
[23:05] * bandrus (~Adium@66-87-119-121.pools.spcsdns.net) has joined #ceph
[23:05] <loicd> houkouonchi-work: FYI mira052 cannot be nuked ( ERROR:teuthology.nuke:Could not nuke the following targets:
[23:05] <loicd> targets:
[23:05] <loicd> ubuntu@mira052 )
[23:08] <loicd> houkouonchi-work: should I file a bug for this ?
[23:09] <houkouonchi-work> loicd: it happens sometimes for various reasons
[23:09] <houkouonchi-work> you can open a bug or i can check it out in a few min
[23:09] <loicd> I locked another machine, there is no hurry ;-)
[23:09] * diegows_ (~diegows@186.61.17.101) has joined #ceph
[23:10] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[23:11] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[23:12] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:13] * BillK (~BillK-OFT@124-148-70-238.dyn.iinet.net.au) has joined #ceph
[23:14] * fatih (~fatih@78.186.36.182) has joined #ceph
[23:18] * wschulze (~wschulze@p54BEDDB2.dip0.t-ipconnect.de) has joined #ceph
[23:18] * bandrus (~Adium@66-87-119-121.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[23:19] * fdmanana (~fdmanana@bl5-172-157.dsl.telepac.pt) Quit (Quit: Leaving)
[23:31] * markbby (~Adium@168.94.245.2) has joined #ceph
[23:33] * leseb (~leseb@185.21.172.77) Quit (Killed (NickServ (Too many failed password attempts.)))
[23:33] <mjevans> Since it doesn't appear to have been asked about recently, given that newer versions (post 3.12) of Linux have a probably stable BTRFS, has anyone deployed Ceph on btrfs recently and if so was the experience positive?
[23:33] * leseb (~leseb@185.21.172.77) has joined #ceph
[23:36] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) has joined #ceph
[23:36] * h6w (~tudor@254.86.96.58.static.exetel.com.au) has left #ceph
[23:38] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) Quit ()
[23:38] <nhm> mjevans: I haven't tested on 3.12. The big problem I've run into is if you do RBD, COW with RBD images can cause pretty extreme filesystem fragmentation which really hurts sequential read performance after a while. Theoretically auto defrag might help, but at least currently I think it can make linux go OOM if you try it on a FS with lots of snapshots.
[23:39] <mjevans> That's a scary thing; is it possible for that OOM to leave the filesystem in a corrupted state, or is it something you could do during a maintenance window and 'try again' until it succeeds if it locks up?
[23:39] <nhm> josef told me maybe a month or two ago that they still need to fix their defrag stuff not to do that.
[23:40] <nhm> mjevans: I honestly don't know. I haven't tried to invoke it after josef told me it wasn't really safe to use.
[23:40] * sprachgenerator (~sprachgen@130.202.135.185) Quit (Quit: sprachgenerator)
[23:41] <mjevans> Yeah... I'd avoid that too, and hopefully they'll get that not locking up soon...
[23:41] <mjevans> On the other hand, you can still defrag individual files right? In /theory/ you could lazily defrag each file one at a time.
[23:42] <nhm> mjevans: otherwise, btrfs is quite fast. On some hardware significantly faster than ext4/xfs, at least on a fresh file system.
[23:42] <nhm> (with Ceph that is)
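    A sketch of the lazy per-file approach mjevans suggests, assuming the default osd data dir layout; note that on kernels of that era defragmenting files that share extents with snapshots can unshare them and inflate space usage:

        # defragment osd object files one at a time, instead of enabling autodefrag
        find /var/lib/ceph/osd/ceph-0/current -type f -print0 \
            | xargs -0 -n1 btrfs filesystem defragment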
[23:43] * wschulze (~wschulze@p54BEDDB2.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[23:43] <mjevans> It seems that I'd want the improvements of the 3.14 kernel if I'm going to be using Ceph... lots of little things: http://lkml.iu.edu//hypermail/linux/kernel/1401.3/03045.html
[23:46] * ircolle (~Adium@2601:1:8380:2d9:9849:3a6a:3d7d:69f6) Quit (Quit: Leaving.)
[23:46] <nhm> mjevans: in any event, I'd suggest lots of testing. :D
[23:47] * diegows_ (~diegows@186.61.17.101) Quit (Read error: Operation timed out)
[23:48] * diegows_ (~diegows@186.61.17.101) has joined #ceph
[23:49] <mjevans> Yeah, I had wanted to rush into this a bit, but it's clear that I need to do a proper burn-in test given just how many unknowns and 'just released' things are going to be in use.
[23:52] <mjevans> Is there any tool you'd recommend for simulating 'light database' + small file manipulation loads?
[23:54] * mattt (~textual@CPE0026f326e530-CM0026f326e52d.cpe.net.cable.rogers.com) has joined #ceph
[23:55] * `jpg (~josephgla@ppp121-44-151-43.lns20.syd7.internode.on.net) has joined #ceph
[23:57] * mattt (~textual@CPE0026f326e530-CM0026f326e52d.cpe.net.cable.rogers.com) Quit ()
[23:59] * sleinen1 (~Adium@2001:620:0:26:3497:f885:eb0d:6a97) Quit (Quit: Leaving.)
[23:59] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:59] * mattt (~textual@CPE0026f326e530-CM0026f326e52d.cpe.net.cable.rogers.com) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.