#ceph IRC Log

IRC Log for 2014-08-04

Timestamps are in GMT/BST.

[0:03] * flaxy (~afx@78.130.171.69) Quit (Quit: WeeChat 0.4.2)
[0:03] * BManojlovic (~steki@178-221-116-161.dynamic.isp.telekom.rs) Quit (Ping timeout: 480 seconds)
[0:06] * lupu (~lupu@86.107.101.214) Quit (Ping timeout: 480 seconds)
[0:07] * flaxy (~afx@78.130.171.69) has joined #ceph
[0:10] * lightspeed (~lightspee@2001:8b0:16e:1:216:eaff:fe59:4a3c) Quit (Ping timeout: 480 seconds)
[0:20] * lightspeed (~lightspee@2001:8b0:16e:1:216:eaff:fe59:4a3c) has joined #ceph
[0:24] * rendar (~I@host17-179-dynamic.56-79-r.retail.telecomitalia.it) Quit ()
[0:27] * capri_on (~capri@212.218.127.222) Quit (Read error: Connection reset by peer)
[0:27] * capri_on (~capri@212.218.127.222) has joined #ceph
[0:30] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) Quit (charon.oftc.net coulomb.oftc.net)
[0:30] * Vacum_ (~vovo@i59F79AAC.versanet.de) Quit (charon.oftc.net coulomb.oftc.net)
[0:30] * nljmo (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) Quit (charon.oftc.net coulomb.oftc.net)
[0:30] * flaf (~flaf@2001:41d0:1:7044::1) Quit (charon.oftc.net coulomb.oftc.net)
[0:30] * steveeJ (~junky@HSI-KBW-085-216-022-246.hsi.kabelbw.de) Quit (charon.oftc.net coulomb.oftc.net)
[0:30] * topro (~prousa@host-62-245-142-50.customer.m-online.net) Quit (charon.oftc.net coulomb.oftc.net)
[0:30] * Cybert1nus (~Cybertinu@cybertinus.customer.cloud.nl) Quit (charon.oftc.net coulomb.oftc.net)
[0:31] * topro (~prousa@host-62-245-142-50.customer.m-online.net) has joined #ceph
[0:31] * MaZ- (~maz@00016955.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:32] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) has joined #ceph
[0:32] * Vacum_ (~vovo@i59F79AAC.versanet.de) has joined #ceph
[0:32] * nljmo (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) has joined #ceph
[0:32] * flaf (~flaf@2001:41d0:1:7044::1) has joined #ceph
[0:32] * steveeJ (~junky@HSI-KBW-085-216-022-246.hsi.kabelbw.de) has joined #ceph
[0:32] * Cybert1nus (~Cybertinu@cybertinus.customer.cloud.nl) has joined #ceph
[0:34] * flaf (~flaf@2001:41d0:1:7044::1) Quit (Ping timeout: 480 seconds)
[0:35] * Vacum (~vovo@i59F79AAC.versanet.de) has joined #ceph
[0:36] * Vacum_ (~vovo@i59F79AAC.versanet.de) Quit (Remote host closed the connection)
[0:36] * nljmo (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[0:38] * sputnik13 (~sputnik13@ip-64-134-226-51.public.wayport.net) has joined #ceph
[0:39] * nljmo (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) has joined #ceph
[0:39] * flaf (~flaf@2001:41d0:1:7044::1) has joined #ceph
[0:43] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[0:44] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[0:46] * sputnik13 (~sputnik13@ip-64-134-226-51.public.wayport.net) Quit (Ping timeout: 480 seconds)
[0:49] * jksM (~jks@3e6b5724.rev.stofanet.dk) Quit (Ping timeout: 480 seconds)
[0:49] * DV (~veillard@veillard.com) Quit (Ping timeout: 480 seconds)
[0:51] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[0:53] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) has joined #ceph
[0:54] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) Quit ()
[0:54] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) has joined #ceph
[0:55] * JC1 (~JC@AMontpellier-651-1-420-97.w92-133.abo.wanadoo.fr) Quit (Quit: Leaving.)
[0:55] * MaZ- (~maz@00016955.user.oftc.net) has joined #ceph
[1:05] * markbby (~Adium@168.94.245.3) has joined #ceph
[1:05] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[1:14] * oms101 (~oms101@p20030057EA5C9F00EEF4BBFFFE0F7062.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:23] * oms101 (~oms101@p20030057EA023800EEF4BBFFFE0F7062.dip0.t-ipconnect.de) has joined #ceph
[1:42] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[2:11] * Nacer_ (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) Quit (Remote host closed the connection)
[2:24] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[2:42] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[2:44] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[3:01] * hflai (~hflai@alumni.cs.nctu.edu.tw) has joined #ceph
[3:06] * davidz (~Adium@cpe-23-242-12-23.socal.res.rr.com) has joined #ceph
[3:07] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[3:09] * LeaChim (~LeaChim@host86-161-89-237.range86-161.btcentralplus.com) Quit (Read error: Operation timed out)
[3:18] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[3:28] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[3:28] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) has joined #ceph
[3:32] * joao (~joao@a79-168-5-220.cpe.netcabo.pt) Quit (Ping timeout: 480 seconds)
[3:34] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[3:36] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) Quit (Ping timeout: 480 seconds)
[3:41] * joao (~joao@a79-168-5-220.cpe.netcabo.pt) has joined #ceph
[3:41] * ChanServ sets mode +o joao
[3:44] * zhaochao (~zhaochao@106.39.255.170) has joined #ceph
[3:50] * haomaiwang (~haomaiwan@223.223.183.114) Quit (Remote host closed the connection)
[3:51] * haomaiwang (~haomaiwan@203.69.59.199) has joined #ceph
[3:56] * markbby (~Adium@168.94.245.3) Quit (Ping timeout: 480 seconds)
[3:57] * haomaiwa_ (~haomaiwan@223.223.183.114) has joined #ceph
[4:04] * haomaiwang (~haomaiwan@203.69.59.199) Quit (Ping timeout: 480 seconds)
[4:07] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) has joined #ceph
[4:07] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[4:12] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[4:19] * RameshN (~rnachimu@101.222.246.68) has joined #ceph
[4:22] * diegows (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[4:25] * zack_dolby (~textual@e0109-114-22-0-42.uqwimax.jp) has joined #ceph
[4:30] * shang (~ShangWu@27.100.16.219) has joined #ceph
[4:39] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[4:40] * bkopilov (~bkopilov@213.57.16.16) Quit (Read error: Operation timed out)
[4:40] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) has joined #ceph
[4:42] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[4:42] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[5:02] * madkiss1 (~madkiss@2001:6f8:12c3:f00f:d75:c896:bee8:da30) Quit (Quit: Leaving.)
[5:04] * zerick (~Erick@190.118.43.55) Quit (Read error: Connection reset by peer)
[5:04] * zerick (~Erick@190.118.43.55) has joined #ceph
[5:05] * vz (~vz@122.167.123.39) has joined #ceph
[5:08] * lucas1 (~Thunderbi@222.247.57.50) has joined #ceph
[5:11] <longguang> for committed_thru , what is thru?
[5:17] * athrift (~nz_monkey@203.86.205.13) Quit (Quit: No Ping reply in 180 seconds.)
[5:17] * athrift (~nz_monkey@203.86.205.13) has joined #ceph
[5:26] * Vacum_ (~vovo@88.130.222.42) has joined #ceph
[5:33] * Vacum (~vovo@i59F79AAC.versanet.de) Quit (Ping timeout: 480 seconds)
[5:35] * debian (~oftc-webi@116.212.137.13) has joined #ceph
[5:35] * debian is now known as Guest4739
[5:37] <Guest4739> Dear everyone, i fail to test tmpfs for Ceph journal
[5:37] <Guest4739> root@ceph04-vm:~# ceph-osd -i 4 --osd-journal=/mnt/ramdisk/journal --mkjournal 2014-08-03 20:18:26.844953 7fb7c6b9c780 -1 journal FileJournal::_open: aio not supported without directio; disabling aio 2014-08-03 20:18:26.846006 7fb7c6b9c780 -1 journal FileJournal::_open_file : unable to preallocation journal to 5368709120 bytes: (28) No space left on device 2014-08-03 20:18:26.846048 7fb7c6b9c780 -1 filestore(/var/lib/ceph/osd/ceph-4) mkjournal error cre
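The error above says the tmpfs is smaller than the 5368709120-byte (5 GiB) journal the OSD tries to preallocate. A minimal sketch of the two usual fixes, assuming the ramdisk is mounted at /mnt/ramdisk as in the paste:

    # either remount the tmpfs large enough to hold the journal...
    mount -o remount,size=6G /mnt/ramdisk
    # ...or shrink the journal in ceph.conf before re-running --mkjournal:
    #   [osd]
    #   osd journal size = 1024     ; in MB
    ceph-osd -i 4 --osd-journal=/mnt/ramdisk/journal --mkjournal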
[5:37] * shang (~ShangWu@27.100.16.219) Quit (Ping timeout: 480 seconds)
[5:38] * lupu (~lupu@86.107.101.214) has joined #ceph
[5:42] * kfei (~root@114-27-93-71.dynamic.hinet.net) has joined #ceph
[5:50] * shang (~ShangWu@175.41.48.77) has joined #ceph
[5:53] * Guest4739 (~oftc-webi@116.212.137.13) Quit (Remote host closed the connection)
[5:58] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[6:04] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) has joined #ceph
[6:12] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Ping timeout: 480 seconds)
[6:14] * vbellur (~vijay@122.167.220.189) Quit (Ping timeout: 480 seconds)
[6:19] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[6:28] * debian_ (~oftc-webi@116.212.137.13) has joined #ceph
[6:32] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[6:39] * vbellur (~vijay@121.244.87.117) has joined #ceph
[6:43] * rdas (~rdas@121.244.87.115) has joined #ceph
[7:16] <chowmeined> Guest3837, Is your ramdisk battery backed?
[7:17] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[7:22] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[7:24] * shimo (~A13032@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:29] * zack_dolby (~textual@e0109-114-22-0-42.uqwimax.jp) Quit (Read error: Connection reset by peer)
[7:29] * zack_dolby (~textual@e0109-114-22-0-42.uqwimax.jp) has joined #ceph
[7:34] * joao (~joao@a79-168-5-220.cpe.netcabo.pt) Quit (Ping timeout: 480 seconds)
[7:43] * joao (~joao@a79-168-5-220.cpe.netcabo.pt) has joined #ceph
[7:43] * ChanServ sets mode +o joao
[7:43] * haomaiwa_ (~haomaiwan@223.223.183.114) Quit (Remote host closed the connection)
[7:43] * haomaiwang (~haomaiwan@203.69.59.199) has joined #ceph
[7:43] * lucas1 (~Thunderbi@222.247.57.50) Quit (Quit: lucas1)
[7:45] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[7:46] * haomaiwa_ (~haomaiwan@223.223.183.114) has joined #ceph
[7:50] * djh-work is now known as Guest4752
[7:50] * djh-work (~daniel@141.52.73.152) has joined #ceph
[7:52] * RameshN (~rnachimu@101.222.246.68) Quit (Quit: Quit)
[7:53] * michalefty (~micha@p5DDCD2D3.dip0.t-ipconnect.de) has joined #ceph
[7:53] * haomaiwang (~haomaiwan@203.69.59.199) Quit (Ping timeout: 480 seconds)
[7:54] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Quit: Leaving.)
[7:54] * RameshN (~rnachimu@101.222.246.68) has joined #ceph
[7:55] * Guest4752 (~daniel@141.52.73.152) Quit (Ping timeout: 480 seconds)
[7:56] * lucas1 (~Thunderbi@218.76.25.66) has joined #ceph
[7:56] * lucas1 (~Thunderbi@218.76.25.66) Quit ()
[7:59] * RameshN_ (~rnachimu@101.222.246.146) has joined #ceph
[8:05] * RameshN (~rnachimu@101.222.246.68) Quit (Ping timeout: 480 seconds)
[8:07] * lalatenduM (~lalatendu@121.244.87.117) has joined #ceph
[8:09] * shimo (~A13032@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Quit: shimo)
[8:10] * andreask (~andreask@nat-pool-brq-u.redhat.com) has joined #ceph
[8:10] * ChanServ sets mode +v andreask
[8:12] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[8:12] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[8:16] * zerick (~Erick@190.118.43.55) Quit (Read error: Connection reset by peer)
[8:18] * andreask (~andreask@nat-pool-brq-u.redhat.com) Quit (Ping timeout: 480 seconds)
[8:26] * vbellur (~vijay@121.244.87.117) has joined #ceph
[8:29] * dis (~dis@109.110.66.143) has joined #ceph
[8:31] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[8:37] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[8:38] * fsimonce (~simon@host225-92-dynamic.21-87-r.retail.telecomitalia.it) has joined #ceph
[8:38] * rendar (~I@host45-177-dynamic.20-87-r.retail.telecomitalia.it) has joined #ceph
[8:39] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[8:40] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[8:42] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Quit: shimo)
[8:42] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[8:43] * michalefty (~micha@p5DDCD2D3.dip0.t-ipconnect.de) has left #ceph
[8:45] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[8:55] * mabj (~SilverWol@130.226.133.111) Quit (Read error: Connection reset by peer)
[8:56] <Clabbe> If I have an SSD as journal and a mechanical drive as osd data storage, what would be the best to get the speed up? introduce another mech drive in raid 0? together with the old one
[8:57] <Clabbe> or is it possible to tell ceph to utilize both drives in any other way? I dont want another OSD as I dont have and SSD left to spare :)
[8:57] <Clabbe> any even
[8:57] * MACscr (~Adium@c-50-158-183-38.hsd1.il.comcast.net) has joined #ceph
[9:03] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[9:05] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Ping timeout: 480 seconds)
[9:05] <Sysadmin88> you have an OSD on your SSD or the journal?
[9:05] <Sysadmin88> SSD can be journal for multiple HDDs
[9:06] <Sysadmin88> iirc youtube video had 5-1 ratio for good performance
[9:06] <Clabbe> Sysadmin88: journal on ssd
[9:06] * i_m (~ivan.miro@gbibp9ph1--blueice2n1.emea.ibm.com) has joined #ceph
[9:07] <Clabbe> okay so partitioning the ssd ?
[9:07] <Sysadmin88> from what i remember you just point things at a directory
[9:07] <Clabbe> its pointed at a partition atm
[9:07] <Clabbe> :|
[9:07] <Sysadmin88> so the directory is at the root of that partition
[9:08] <Clabbe> okay
[9:08] * b0e (~aledermue@juniper1.netways.de) has joined #ceph
[9:11] * drankis (~drankis__@89.111.13.198) has joined #ceph
[9:12] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) Quit (Quit: Leaving.)
[9:12] * lcavassa (~lcavassa@89.184.114.246) has joined #ceph
[9:14] * drankis (~drankis__@89.111.13.198) Quit ()
[9:14] * drankis (~drankis__@89.111.13.198) has joined #ceph
[9:14] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:16] * kalleh (~kalleh@37-46-175-162.customers.ownit.se) has joined #ceph
[9:18] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[9:24] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[9:24] <Clabbe> Sysadmin88: how large partition is sufficient for the journal?
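For sizing, the rule of thumb from the Ceph docs ties the journal to disk throughput and the filestore sync interval; a sketch with illustrative numbers:

    # osd journal size >= 2 * expected throughput * filestore max sync interval
    # e.g. a ~120 MB/s spinner with the default 5 s sync interval:
    #   2 * 120 MB/s * 5 s = 1200 MB, so a 1-2 GB partition per journal suffices
    [osd]
    osd journal size = 2048     ; in MB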
[9:26] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[9:33] * danieagle_ (~Daniel@179.184.165.184.static.gvt.net.br) has joined #ceph
[9:34] * Sysadmin88 (~IceChat77@2.218.9.98) Quit (Quit: Some folks are wise, and some otherwise.)
[9:36] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[9:39] * danieagle (~Daniel@179.184.165.184.static.gvt.net.br) Quit (Ping timeout: 480 seconds)
[9:43] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[9:49] * DV (~veillard@veillard.com) has joined #ceph
[9:50] * cok (~chk@2a02:2350:18:1012:d9d7:2edd:b223:da6a) has joined #ceph
[9:50] * haomaiwa_ (~haomaiwan@223.223.183.114) Quit (Remote host closed the connection)
[9:50] * haomaiwang (~haomaiwan@203.69.59.199) has joined #ceph
[9:52] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[9:52] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[9:53] * vmx (~vmx@dslb-084-056-003-098.084.056.pools.vodafone-ip.de) has joined #ceph
[9:56] * zack_dol_ (~textual@e0109-114-22-0-42.uqwimax.jp) has joined #ceph
[9:56] * zack_dolby (~textual@e0109-114-22-0-42.uqwimax.jp) Quit (Read error: Connection reset by peer)
[10:06] * ikrstic (~ikrstic@109-93-162-27.dynamic.isp.telekom.rs) has joined #ceph
[10:06] * haomaiwa_ (~haomaiwan@223.223.183.114) has joined #ceph
[10:07] * jordanP (~jordan@185.23.92.11) has joined #ceph
[10:08] * fghaas (~florian@91-119-223-7.dynamic.xdsl-line.inode.at) has joined #ceph
[10:12] * haomaiwang (~haomaiwan@203.69.59.199) Quit (Ping timeout: 480 seconds)
[10:18] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[10:19] * rdas (~rdas@121.244.87.115) Quit (Quit: Leaving)
[10:23] * kalleeh (~kalleh@37-46-175-162.customers.ownit.se) has joined #ceph
[10:23] * kalleh (~kalleh@37-46-175-162.customers.ownit.se) Quit (Read error: Connection reset by peer)
[10:24] * LeaChim (~LeaChim@host86-161-89-237.range86-161.btcentralplus.com) has joined #ceph
[10:30] * ksingh (~Adium@2001:708:10:10:68cc:5f3b:861d:1de9) has joined #ceph
[10:31] <ksingh> loicd: Error EINVAL: cannot determine the erasure code plugin because there is no 'plugin' entry in the erasure_code_profile {}failed to load plugin using profile default , Firefly 0.80.5
[10:33] * lucas1 (~Thunderbi@218.76.25.66) has joined #ceph
[10:38] * sm1ly (~sm1ly@broadband-77-37-240-109.nationalcablenetworks.ru) has joined #ceph
[10:39] * lucas1 (~Thunderbi@218.76.25.66) Quit (Quit: lucas1)
[10:42] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[10:47] <loicd> ksingh: \o
[10:48] <loicd> ksingh: it probably is http://tracker.ceph.com/issues/8601 for which there is a workaround http://tracker.ceph.com/issues/8601#Workaround . It will be in 0.80.6
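The tracker page above is the authoritative workaround; it roughly amounts to recreating the default erasure-code profile with an explicit plugin entry, something like the sketch below (k/m values are illustrative):

    ceph osd erasure-code-profile set default plugin=jerasure technique=reed_sol_van k=2 m=1 --force
    ceph osd erasure-code-profile get default    # verify the profile now has a plugin entry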
[10:50] <debian_> why dd hung on RBD mount folder
[10:50] <debian_> echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync
[10:50] <debian_> dd if=/dev/zero of=here bs=1G count=1 oflag=direct
[10:51] <debian_> rados bench -p pool-B 20 write , runs OK. when dd hung, ceph -s , health ok
[10:52] * sm1ly (~sm1ly@broadband-77-37-240-109.nationalcablenetworks.ru) Quit (Quit: Leaving)
[11:03] * zack_dol_ (~textual@e0109-114-22-0-42.uqwimax.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[11:11] <steveeJ> debian_: how long have you waited for dd to complete? is it possible it was just very slow?
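If the hang is really just slow synchronous I/O, a smaller block size avoids the single 1 GiB direct-I/O buffer and makes progress visible; a sketch of an equivalent test:

    dd if=/dev/zero of=here bs=4M count=256 oflag=direct
    ceph -w     # watch for slow/blocked request warnings while it runs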
[11:18] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[11:21] * vbellur (~vijay@121.244.87.117) has joined #ceph
[11:23] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) has joined #ceph
[11:45] * tdb (~tdb@myrtle.kent.ac.uk) Quit (Ping timeout: 480 seconds)
[11:47] * tdb (~tdb@myrtle.kent.ac.uk) has joined #ceph
[11:49] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[11:52] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[11:53] * capri_oner (~capri@212.218.127.222) has joined #ceph
[11:57] <ksingh> loicd: Thanks it worked
[11:58] <loicd> ksingh: cool ;-)
[12:00] * capri_on (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[12:02] * sbadia (~sbadia@yasaw.net) has joined #ceph
[12:04] * jksM (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[12:15] * RameshN_ (~rnachimu@101.222.246.146) Quit (Ping timeout: 480 seconds)
[12:20] * RameshN_ (~rnachimu@101.222.246.146) has joined #ceph
[12:23] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) has joined #ceph
[12:24] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Ping timeout: 480 seconds)
[12:26] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[12:35] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Ping timeout: 480 seconds)
[12:36] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[12:39] <Clabbe> is it better to startup a new OSD than or setup a striped hdd for existing osd? This using journal on SSD
[12:40] * kalleeh (~kalleh@37-46-175-162.customers.ownit.se) Quit (Ping timeout: 480 seconds)
[12:40] <Clabbe> disk is near 100% utilized when recovering
[12:41] <Clabbe> setting up a second OSD on same host wont solve the throughput issue right? Better would be to setup another disk and use raid 0?
[12:41] * vz_ (~vz@122.167.123.39) has joined #ceph
[12:42] * zhaochao (~zhaochao@106.39.255.170) has left #ceph
[12:44] * vz (~vz@122.167.123.39) Quit (Ping timeout: 480 seconds)
[12:50] * vz_ (~vz@122.167.123.39) Quit (Ping timeout: 480 seconds)
[12:51] * lucas1 (~Thunderbi@222.240.148.130) has joined #ceph
[12:52] * vz (~vz@122.167.209.38) has joined #ceph
[12:53] * vz (~vz@122.167.209.38) Quit ()
[12:54] * vz (~vz@122.167.209.38) has joined #ceph
[12:54] <Clabbe> loicd: or chowmeined here? :)
[12:56] * kalleh (~kalleh@37-46-175-162.customers.ownit.se) has joined #ceph
[13:10] * shang (~ShangWu@175.41.48.77) Quit (Ping timeout: 480 seconds)
[13:11] * lucas1 (~Thunderbi@222.240.148.130) Quit (Quit: lucas1)
[13:12] * vz_ (~vz@122.167.215.192) has joined #ceph
[13:14] * vz (~vz@122.167.209.38) Quit (Read error: Connection reset by peer)
[13:18] * funnel (~funnel@0001c7d4.user.oftc.net) has joined #ceph
[13:34] * vmx_ (~vmx@dslb-084-056-052-102.084.056.pools.vodafone-ip.de) has joined #ceph
[13:37] * vmx is now known as Guest4777
[13:37] * vmx_ is now known as vmx
[13:38] * kalleh (~kalleh@37-46-175-162.customers.ownit.se) Quit (Ping timeout: 480 seconds)
[13:39] * rdas (~rdas@110.227.45.202) has joined #ceph
[13:39] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[13:39] * primechuck (~primechuc@173-17-128-36.client.mchsi.com) Quit (Remote host closed the connection)
[13:40] * primechuck (~primechuc@173-17-128-36.client.mchsi.com) has joined #ceph
[13:41] * Guest4777 (~vmx@dslb-084-056-003-098.084.056.pools.vodafone-ip.de) Quit (Ping timeout: 480 seconds)
[13:41] <tnt_> Is there a command to see the really used space by a RBD volume ? (i.e. count how many slices are used).
[13:42] * Vacum_ is now known as Vacum
[13:42] <Vacum> if a bunch of osds in one host go down and out, shouldn't the crushmap reflect the changed weight of the osd then? (when I do ceph osd crush dump )
[13:43] * diegows (~diegows@190.190.5.238) has joined #ceph
[13:44] <tnt_> No
[13:44] <tnt_> Vacum: ^^
[13:44] <Vacum> sorry, I meant: shouldn't the crushmap reflect the changed weight of the *host* then
[13:44] <tnt_> Still no.
[13:44] <Vacum> mh. why not?
[13:45] <Vacum> without changing the weight of the host, that host will get the same amount of PGs as before, but on less osds than before
[13:48] * primechuck (~primechuc@173-17-128-36.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[13:50] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) Quit (Remote host closed the connection)
[13:50] * vz_ (~vz@122.167.215.192) Quit (Ping timeout: 480 seconds)
[13:50] <tnt_> Vacum: the crushmap weight is for when everything is up.
[13:51] <Vacum> tnt_: so where/how can I see what ceph thinks about the weight while something is down/out ?
[13:52] <tnt_> I'm not sure if there is a way to do that ...
[13:54] <Vacum> because currently it looks as if ceph does use the host weight in the crushmap to decide where to put the pgs, resulting in in/up "sibling" osds being used overproportional on hosts that have "down/out" osds - compared to osds on 100% working hosts of the same weight.
[13:58] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[14:03] * mtl1 (~Adium@c-67-174-109-212.hsd1.co.comcast.net) has joined #ceph
[14:03] * mtl2 (~Adium@c-67-174-109-212.hsd1.co.comcast.net) Quit (Read error: Connection reset by peer)
[14:07] <tnt_> Vacum: don't know anything about that, but the crushmap is a 'static' thing AFAIK. It doesn't even include who's up/down/in/out. The CRUSH algo will use the crushmap + current osdmap to figure out pg placement.
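To see both views at once, the usual commands are (a sketch; output columns as of Firefly):

    ceph osd tree                # static CRUSH weight next to up/down and reweight
    ceph osd dump | grep ^osd    # per-OSD up/down and in/out state from the osdmap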
[14:09] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[14:10] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) has joined #ceph
[14:11] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) Quit (Remote host closed the connection)
[14:12] * aknapp (~aknapp@64.202.160.233) has joined #ceph
[14:16] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Quit: Leaving)
[14:18] * rwheeler (~rwheeler@173.48.207.57) Quit (Quit: Leaving)
[14:19] * burley (~khemicals@9.sub-70-194-196.myvzw.com) has joined #ceph
[14:19] <theanalyst> hi is `bucket location' supported in the S3 API for radosgw... since multi regions are supported?
[14:20] <theanalyst> the support chart mentions no, but it is out of date
[14:22] * cok (~chk@2a02:2350:18:1012:d9d7:2edd:b223:da6a) Quit (Quit: Leaving.)
[14:30] * vz (~vz@122.167.215.192) has joined #ceph
[14:33] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) Quit (Remote host closed the connection)
[14:36] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[14:41] * RameshN_ (~rnachimu@101.222.246.146) Quit (Ping timeout: 480 seconds)
[14:45] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[14:46] <Vacum> tnt_: ah! thank you
[14:50] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Ping timeout: 480 seconds)
[15:02] * ganders (~root@200-127-158-54.net.prima.net.ar) has joined #ceph
[15:14] * markbby (~Adium@168.94.245.3) has joined #ceph
[15:18] * primechuck (~primechuc@host-95-2-129.infobunker.com) has joined #ceph
[15:21] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) has joined #ceph
[15:22] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Ping timeout: 480 seconds)
[15:23] * cok (~chk@46.30.211.29) has joined #ceph
[15:26] * hufman (~hufman@cpe-184-58-235-28.wi.res.rr.com) has joined #ceph
[15:27] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[15:34] <tnt_> Does anyone know if any kernel supports RBD striping ?
[15:35] <beardo> tnt_, no, but I believe it's on the list for Giant
[15:37] * brad_mssw (~brad@shop.monetra.com) has joined #ceph
[15:37] <tnt_> Did any one make any perf test to see if striping helped ?
[15:39] <beardo> I haven't seen any tests, but it wouldn't surprise me
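Striping parameters are set per-image at creation time with librbd; a sketch with illustrative values (per the discussion above, the kernel client at this point only handles the default stripe count of 1):

    rbd create mypool/myimage --size 10240 --image-format 2 --stripe-unit 65536 --stripe-count 8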
[15:40] * rdas (~rdas@110.227.45.202) Quit (Ping timeout: 480 seconds)
[15:42] * ganders (~root@200-127-158-54.net.prima.net.ar) Quit (Quit: WeeChat 0.4.1)
[15:43] * ganders (~root@200-127-158-54.net.prima.net.ar) has joined #ceph
[15:47] <Clabbe> chowmeined: there?
[15:48] <beardo> is there a procedure for complete shutdown/startup a cluster? I imagine setting noout, shutting off the OSD hosts, then the mons, and reversing that to start up again
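A sketch of that order, assuming client I/O is stopped first; noout keeps CRUSH from rebalancing while hosts are down:

    ceph osd set noout
    # stop OSDs on each host, then MDS/RGW daemons, then the mons; do the power work;
    # on the way back: start mons first, then OSDs, then MDS/RGW
    ceph osd unset noout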
[15:51] * vz (~vz@122.167.215.192) Quit (Remote host closed the connection)
[15:51] * gregmark (~Adium@68.87.42.115) has joined #ceph
[15:52] * aknapp (~aknapp@64.202.160.233) Quit (Remote host closed the connection)
[15:52] * aknapp (~aknapp@64.202.160.233) has joined #ceph
[15:56] * dmsimard_away is now known as dmsimard
[16:00] * aknapp (~aknapp@64.202.160.233) Quit (Ping timeout: 480 seconds)
[16:03] * fdmanana (~fdmanana@bl5-173-238.dsl.telepac.pt) has joined #ceph
[16:03] * markl (~mark@knm.org) has joined #ceph
[16:07] * cok1 (~chk@94.191.187.17.mobile.3.dk) has joined #ceph
[16:07] * cok (~chk@46.30.211.29) Quit (Read error: Connection reset by peer)
[16:08] * sjm (~sjm@108.53.250.33) has joined #ceph
[16:09] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[16:13] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) has joined #ceph
[16:15] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) Quit (Remote host closed the connection)
[16:15] * cok1 (~chk@94.191.187.17.mobile.3.dk) Quit (Ping timeout: 480 seconds)
[16:19] <hufman> heeey guys
[16:19] <hufman> how can i avoid having incomplete pgs when one of my osd servers goes down?
[16:23] <chowmeined> Clabbe, The SSD can act as the journal for multiple OSDs. Use raw partitions directly, this is supported and avoids filesystem overhead for the journal. Adding more spinners as OSDs will help with throughput. RAID-0 is not necessary, Ceph effectively stripes the OSDs at a higher level.
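A minimal ceph.conf sketch of journals on raw SSD partitions, with hypothetical device names (a symlink named 'journal' inside the OSD data directory works as well):

    [osd.0]
    osd journal = /dev/sdb1
    [osd.1]
    osd journal = /dev/sdb2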
[16:23] * tracphil (~tracphil@130.14.71.217) has joined #ceph
[16:23] <Clabbe> hmm okay
[16:23] <Clabbe> :o
[16:24] <Clabbe> chowmeined: will adding an osd improve the throughput really?
[16:24] <Clabbe> Isnt the drive the bottleneck?
[16:24] <chowmeined> wait, are you saying adding an OSD to the same drive?
[16:24] <Clabbe> chowmeined: no
[16:24] <Clabbe> I mean the recovery I/O performance
[16:25] <Clabbe> Will it be better with more OSDs ? ^^
[16:25] <Clabbe> if the mechanical drive I/O is a bottleneck
[16:25] <chowmeined> yes, it should improve
[16:25] <Clabbe> hmm but how does the syncing work then
[16:26] <Clabbe> as it would still trottle at the line speed of the disk
[16:26] <chowmeined> The larger the cluster, the higher the aggregate throughput AND the smaller impact an outage has
[16:26] <Clabbe> yeah thats true
[16:27] <chowmeined> because the dataset gets split up into smaller pieces as you increase PGs
[16:27] <chowmeined> so as it reads from secondary OSDs to re-replicate its spread across more and more disks
[16:27] <chowmeined> the more PGs and OSDs you have, the more stripes
[16:28] <Clabbe> yeah, maybe set the replication to 3 also would be an improvement
[16:28] * bkopilov (~bkopilov@213.57.18.102) has joined #ceph
[16:28] <chowmeined> yeah
[16:28] <Clabbe> okay :) I will add 3 new OSDs and up the repl level to 3
[16:28] <chowmeined> then it has 2 copies to recover from
[16:28] <Clabbe> chowmeined: thx for the input
[16:28] <chowmeined> just, about the SSD journal
[16:28] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) has joined #ceph
[16:28] <chowmeined> make sure the throughput on the SSD is enough for the disks behind it
[16:29] <Clabbe> yeah I made som partitioning on the SSD
[16:29] <Clabbe> it should be sufficient
[16:29] <chowmeined> say a mechanical disk can get 120MB/s. An SSD might be able to get 500MB/s
[16:29] <chowmeined> so an SSD shouldnt journal for more than say 4-6 disks
[16:29] <Clabbe> yeah I only set it to 3 slices
[16:29] <chowmeined> ok
[16:29] <chowmeined> that could also be a bottleneck otherwise
[16:29] <Clabbe> =)
[16:29] <chowmeined> cool
[16:30] <Clabbe> now Im going hooome :) see you some other troublesome day, lets say tomorrow ;)
[16:35] * vbellur (~vijay@122.166.184.88) has joined #ceph
[16:38] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[16:43] * vmx (~vmx@dslb-084-056-052-102.084.056.pools.vodafone-ip.de) Quit (Quit: Leaving)
[16:46] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) has joined #ceph
[16:47] * markbby1 (~Adium@168.94.245.4) has joined #ceph
[16:47] * i_m (~ivan.miro@gbibp9ph1--blueice2n1.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[16:51] * vz (~vz@122.167.215.192) has joined #ceph
[16:51] * markbby (~Adium@168.94.245.3) Quit (Remote host closed the connection)
[16:52] * vbellur (~vijay@122.166.184.88) Quit (Ping timeout: 480 seconds)
[16:54] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) Quit (Read error: Connection reset by peer)
[16:54] * neurodrone (~neurodron@static-108-30-171-7.nycmny.fios.verizon.net) Quit (Quit: neurodrone)
[16:59] * vz (~vz@122.167.215.192) Quit (Ping timeout: 480 seconds)
[17:03] * vbellur (~vijay@122.172.246.69) has joined #ceph
[17:08] * sputnik13 (~sputnik13@207.8.121.241) has joined #ceph
[17:09] * RameshN_ (~rnachimu@101.222.246.146) has joined #ceph
[17:11] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[17:12] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit ()
[17:12] * nhm (~nhm@65-128-141-191.mpls.qwest.net) has joined #ceph
[17:12] * ChanServ sets mode +o nhm
[17:13] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[17:13] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[17:15] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:15] * cok (~chk@2a02:2350:18:1012:c0b8:53e:d526:49f2) has joined #ceph
[17:16] * baylight (~tbayly@74-220-196-40.unifiedlayer.com) has joined #ceph
[17:17] * KevinPerks (~Adium@2606:a000:80a1:1b00:5d91:1d6d:9895:f502) has joined #ceph
[17:19] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[17:19] * RameshN_ (~rnachimu@101.222.246.146) Quit (Ping timeout: 480 seconds)
[17:19] * kapil (~ksharma@2620:113:80c0:5::2222) has joined #ceph
[17:25] <jiffe> so I'm testing my setup, I have 4 osd's with pool size 2, 3 mons and 2 mds's accessed via cephfs, if I reboot one of the osd's and then touch a new file and ls -la on the directory the ls stalls until the osd comes back up which took about 90 seconds
[17:25] <chowmeined> jiffe, how many OSD hosts?
[17:26] <jiffe> chowmeined: 4, one osd per host
[17:28] * markbby (~Adium@168.94.245.1) has joined #ceph
[17:30] * vz (~vz@122.167.215.192) has joined #ceph
[17:31] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:32] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:32] * markbby1 (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[17:32] <jiffe> I'm guessing the write is blocking: slow request 30.021957 seconds old, received at 2014-08-04 09:29:36.399177: osd_op(mds.0.6:106 200.000000af [write 882244~1559] 1.32db241a ondisk+write e103) v4 currently reached pg
[17:34] * b0e (~aledermue@juniper1.netways.de) Quit (Quit: Leaving.)
[17:34] <tnt_> Yeah, unfortunately it looks like when a write happens after the OSD is down but before it's been marked down by the cluster, then that write will basically hang.
[17:34] * adamcrume (~quassel@50.247.81.99) has joined #ceph
[17:35] <jiffe> tnt_: it is marked as down though, the first log I see after I reboot is osd.0 marked itself down
[17:35] <tnt_> mmm, what's the min_size of the pool ?
[17:36] <jiffe> min_size is 2, maybe that needs to be set to 1?
[17:36] <tnt_> Ah yes.
[17:36] <tnt_> if min_size is 2, then it will refuse IO unless at least 2 OSD per PG are up.
[17:37] <tnt_> So that IO would have blocked until the OSD would have been marked 'out' and some other OSD taken over it's job. (which takes several minutes).
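The knob in question is per pool; a sketch with a placeholder pool name:

    ceph osd pool get mypool min_size
    ceph osd pool set mypool min_size 1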
[17:41] <chowmeined> I heard there was talk about having an option to send requests to multiple OSDs and then one wins
[17:41] <chowmeined> like on reads, whichever one returns the fastest
[17:42] <jiffe> yup that was it, with min_size 1 there's no blocking
[17:45] * alram (~alram@38.122.20.226) has joined #ceph
[17:47] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) Quit (Ping timeout: 480 seconds)
[17:48] * seapasulli (~seapasull@95.85.33.150) has joined #ceph
[17:49] * xarses (~andreww@12.164.168.117) has joined #ceph
[17:52] <seapasulli> I have a few unfound objects. I did ceph pg dump_stuck unclean and then tried to mark them as lost with mark_unfound_lost revert about a week ago and it is still trying to find them
[17:53] <seapasulli> I'm on ceph 0.82-524-gbf04897
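The unfound-object commands operate per PG rather than per pool; a sketch of the usual sequence, with a placeholder pgid:

    ceph health detail                     # lists the PGs with unfound objects
    ceph pg 1.32 list_missing              # shows which objects are unfound and why
    ceph pg 1.32 mark_unfound_lost revert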
[17:57] * rmoe (~quassel@12.164.168.117) has joined #ceph
[18:00] * joef (~Adium@2620:79:0:131:d061:a542:af79:d3b9) has joined #ceph
[18:01] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[18:03] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[18:04] * markbby (~Adium@168.94.245.1) has joined #ceph
[18:07] * Sysadmin88 (~IceChat77@2.218.9.98) has joined #ceph
[18:11] * vz (~vz@122.167.215.192) Quit (Ping timeout: 480 seconds)
[18:11] * lalatenduM (~lalatendu@121.244.87.117) Quit (Quit: Leaving)
[18:15] * vz (~vz@122.167.215.192) has joined #ceph
[18:20] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[18:27] * rturk|afk is now known as rturk
[18:27] * ircolle (~Adium@2601:1:a580:145a:824:35ed:6d6:bb2d) has joined #ceph
[18:32] * jordanP (~jordan@185.23.92.11) Quit (Quit: Leaving)
[18:37] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[18:41] * sigsegv (~sigsegv@188.25.123.201) has joined #ceph
[18:43] <seapasulli> how do I mark the objects as lost. I am trying with mark_unfound_lost revert but it is still reporting that objects are missing.
[18:43] * Nats__ (~Nats@2001:8000:200c:0:f4b4:821:1f5a:23a8) has joined #ceph
[18:44] * scuttle|afk is now known as scuttlemonkey
[18:44] * joshd1 (~jdurgin@2602:306:c5db:310:6cc7:8f68:673b:92f8) has joined #ceph
[18:46] * lcavassa (~lcavassa@89.184.114.246) Quit (Quit: Leaving)
[18:47] * reed (~reed@75-101-54-131.dsl.static.sonic.net) has joined #ceph
[18:47] * topro (~prousa@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[18:47] * Nats (~natscogs@2001:8000:200c:0:c11d:117a:c167:16df) Quit (Read error: Connection reset by peer)
[18:47] * Nats (~natscogs@2001:8000:200c:0:c11d:117a:c167:16df) has joined #ceph
[18:48] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[18:48] * rweeks (~rweeks@pat.hitachigst.com) has joined #ceph
[18:50] * sarob (~sarob@cpe-75-82-233-45.socal.res.rr.com) has joined #ceph
[18:50] * Nats_ (~Nats@2001:8000:200c:0:f4b4:821:1f5a:23a8) Quit (Ping timeout: 480 seconds)
[18:59] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[19:00] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[19:00] * burley (~khemicals@9.sub-70-194-196.myvzw.com) Quit (Quit: burley)
[19:01] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[19:02] * jeff-YF (~jeffyf@67.23.117.122) Quit (Quit: jeff-YF)
[19:03] * adamcrume (~quassel@50.247.81.99) Quit (Remote host closed the connection)
[19:04] * ksingh (~Adium@2001:708:10:10:68cc:5f3b:861d:1de9) Quit (Quit: Leaving.)
[19:06] * rturk is now known as rturk|afk
[19:06] * vmx (~vmx@dslb-084-056-052-102.084.056.pools.vodafone-ip.de) has joined #ceph
[19:07] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[19:07] <loicd> Vacum: since you ask about the Ceph User Committee meeting in August, I guess it did not happen. Let me ping Eric ;-)
[19:08] <rweeks> cephalopod humor: http://www.dieselsweeties.com/archive/3613
[19:10] * Nacer (~Nacer@2a01:e35:2e29:9800:f967:8cc4:d61a:2e9b) has joined #ceph
[19:14] * Sysadmin88 (~IceChat77@2.218.9.98) Quit (Quit: Relax, its only ONES and ZEROS!)
[19:15] * markbby (~Adium@168.94.245.1) has joined #ceph
[19:17] * bandrus (~oddo@216.57.72.205) has joined #ceph
[19:20] * vz (~vz@122.167.215.192) Quit (Ping timeout: 480 seconds)
[19:26] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[19:27] * rturk|afk is now known as rturk
[19:29] * Cube (~Cube@66-87-64-203.pools.spcsdns.net) has joined #ceph
[19:32] * jakes (~oftc-webi@128-107-239-235.cisco.com) has joined #ceph
[19:33] <jakes> Can weights of OSD's used to correlate the latency of the network?
[19:34] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[19:36] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[19:36] <seapasulli> jakes: I don't see why not but wouldn't latency change +/- depending?
[19:37] <jakes> Yeah. I was planning to configure it dynamically
[19:37] <seapasulli> The usable size should pretty much be the same for the life of the OSD so I think it makes more sense. I think size + avg latency would be nice though. You could adjust it via crush maps and adjust.
[19:38] * cok (~chk@2a02:2350:18:1012:c0b8:53e:d526:49f2) has left #ceph
[19:40] <jakes> it is given in docs to configure weight of 1 for 1TB and consider having small variation based on the i/o rate.(1.2 for high i/o disks). So, if we have a mix of weight configurations(one based on the size, while the other on latency), will there be an issue?
[19:41] <steveeJ> is the primary OSD of one PG the same on every host?
[19:41] * adamcrume (~quassel@c-71-204-162-10.hsd1.ca.comcast.net) has joined #ceph
[19:42] <jakes> How can it be the same?
[19:45] <nizedk> jiffe; do you see the same behaviour with size=3 and min_size=2 ?
[19:47] <steveeJ> jakes: since the PG belongs to the pool, which should be the same on every host.
[19:47] * fghaas (~florian@91-119-223-7.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[19:47] <steveeJ> ceph pg map `id` gives the same result on every host. that should answer my question
[19:48] <jakes> seapasulli: How can I have a configuration with both size and latency together?. how can we configure weight a High storage disk with high latency?.
[19:49] <jiffe> nizedk: I'm curious how size 3 would work with my setup, I have 2 datacenters defined and a crush rule to chooseleaf_firstn on datacenter
[19:50] <jiffe> I was thinking that it shouldn't work but it seems to so I'm not sure what crush is doing really
[19:51] <steveeJ> jiffe: how exactly does the rule look like?
[19:52] * DP (~oftc-webi@zccy01cs105.houston.hp.com) has joined #ceph
[19:53] <seapasulli> jakes: I am trying to find it now (I kept my ceph cluster mostly default) but I think you can call custom scripts to help with this::
[19:53] <seapasulli> A customize location hook can be used in place of the generic hook for OSD daemon placement in the hierarchy. (On startup, each OSD ensure its position is correct.):
[19:53] <seapasulli> osd crush location hook = /path/to/script
[19:53] <seapasulli> http://ceph.com/docs/master/rados/operations/crush-map/
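Worth noting: the location hook only places an OSD in the hierarchy; the weight itself would still have to be pushed by a script or cron job, e.g. something like the sketch below with an illustrative id and value:

    ceph osd crush reweight osd.3 1.2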
[19:53] * Nacer (~Nacer@2a01:e35:2e29:9800:f967:8cc4:d61a:2e9b) Quit (Remote host closed the connection)
[19:53] * burley (~khemicals@165.sub-70-194-193.myvzw.com) has joined #ceph
[19:53] <seapasulli> brb sick
[19:56] <steveeJ> jiffe: according to the the docs {num} defines how many buckets will be chosen. in case you chose 0 and there's >= {your pool's min_size} OSDs in that datacenter your requests will not be blocked. what does ceph health say?
[19:57] <jiffe> steveeJ: I didn't have the pools using that rule, fixed that and now health says HEALTH_WARN 64 pgs stuck unclean; recovery 9/153 objects degraded
[19:57] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) has joined #ceph
[19:57] <jakes> seapasulli: I am sorry. I couldn't understand, how it would help this situation
[19:58] <jiffe> I set size back to 2 but its still not healthy so I broke something :)
[20:00] <steveeJ> jiffe: is it still moving/copying?
[20:01] * Nacer (~Nacer@2a01:e35:2e29:9800:9c4e:3fc3:af65:c16e) has joined #ceph
[20:01] <jiffe> steveeJ: if it is I'm not seeing anything indicating that, ceph -w is quiet
[20:02] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[20:02] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[20:02] <steveeJ> jiffe: i still haven't seen your rule so i can only guess here
[20:02] <seapasulli> jakes: You're looking to deploy osds and recalculate weight based on size + latency right?
[20:02] <jakes> yes
[20:03] * Nacer (~Nacer@2a01:e35:2e29:9800:9c4e:3fc3:af65:c16e) Quit (Remote host closed the connection)
[20:03] * Nacer (~Nacer@2a01:e35:2e29:9800:9c4e:3fc3:af65:c16e) has joined #ceph
[20:05] * ircolle is now known as ircolle-afk
[20:05] <seapasulli> So could you not make a hook to determine the latency of a drive quickly + usable size? I think I copy-pasted from the wrong location but the hook part exists. Can you not re-weight your osds based on a quick hdparm + existing weight or usable size?
[20:07] <nizedk> if I have two datacenters, and say, want to buy 6-10 nodes of 24-36 spinners, I'll still be somewhat limited in regards to space efficiency/storage density. Put them in 1 datacenter, and it improves drastically. What do you other "two datacenter" owners do, if the target is 'slow and big'?
[20:07] * stewiem2000 (~stewiem20@195.10.250.233) Quit (Ping timeout: 480 seconds)
[20:08] <seapasulli> or just edit the existing crushmap to have latency float + size float?
[20:08] <jiffe> steveeJ: this is my crushdump http://nsab.us/public/ceph
[20:08] <jiffe> my new rule is the georeplicate rule
[20:10] <jiffe> my num is 0 so it would try to grab buckets if my pool size is 3
[20:11] <jiffe> I can change to 2 since I have 2 datacenters but then the question is will it choose 3 leaf nodes and from where
[20:11] * Nacer (~Nacer@2a01:e35:2e29:9800:9c4e:3fc3:af65:c16e) Quit (Ping timeout: 480 seconds)
[20:11] * JC (~JC@AMontpellier-651-1-420-97.w92-133.abo.wanadoo.fr) has joined #ceph
[20:12] <jiffe> or can you have multiple chooseleaf firstn lines...
[20:13] <jiffe> then I could set num = 2 for the first and maybe -2 for the second
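The usual two-step form for spreading four replicas across two datacenters, two hosts each, looks roughly like this (bucket names and sizes are illustrative, to be adapted to jiffe's map):

    rule georeplicate {
        ruleset 1
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type datacenter
        step chooseleaf firstn 2 type host
        step emit
    }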
[20:13] * codice (~toodles@97-94-175-73.static.mtpk.ca.charter.com) Quit (Read error: Operation timed out)
[20:16] * codice (~toodles@97-94-175-73.static.mtpk.ca.charter.com) has joined #ceph
[20:16] <nizedk> is it supported to place an rbd image on top of an erasure code pool, when a write-back cache pool is set in front of the EC pool? (without the cache pool, the rbd image cannot be created on the EC pool).
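A sketch of the arrangement nizedk describes, with hypothetical pool names; the image lives logically in the EC pool while client I/O goes through the replicated cache pool:

    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    rbd create --pool ecpool --size 10240 myimage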
[20:19] <jakes> seapasulli: It would be great if you can give an example or point out some page where example to hook feature exists. My question is:. As OSD's are mapped dynamicaly onto PG's, isn't required to change weights of all OSD's in the CRUSH MAP, or else, some OSD's would have weights based on storage, while other based on storage +latency
[20:22] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[20:23] <steveeJ> jiffe: let's take it slow since i'm also a beginner with these rules ;) first the rule wants to select hosts. you try to select as many hosts as your size is. that will not work with num > 2, right?
[20:23] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[20:24] <jiffe> steveeJ: well the purpose of the rule is to make sure that replicas are placed in each datacenter
[20:24] <jiffe> I have 4 hosts, 4 osds, 2 datacenters
[20:24] <seapasulli> I know I shot the ceph.com page earlier. I am trying to find an example. If I can't find one I will try to add it into my map and see how it goes.
[20:26] * burley (~khemicals@165.sub-70-194-193.myvzw.com) Quit (Ping timeout: 480 seconds)
[20:27] <jiffe> I had the rule set to 2, I changed it to 3 just to see what would happen
[20:27] <jiffe> I had to pool size set to 2 that is
[20:27] * sarob (~sarob@cpe-75-82-233-45.socal.res.rr.com) Quit (Remote host closed the connection)
[20:27] * sarob (~sarob@cpe-75-82-233-45.socal.res.rr.com) has joined #ceph
[20:28] * cookednoodles (~eoin@eoin.clanslots.com) Quit ()
[20:28] <steveeJ> jiffe: and with a size=2 it's healthy again?
[20:28] <jiffe> steveeJ: its not, I tried setting it back to 2 but that had no effect it still complains about being unclean
[20:28] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[20:29] * neurodrone (~neurodron@static-108-30-171-7.nycmny.fios.verizon.net) has joined #ceph
[20:29] <jiffe> it looks like there's still an extra 64 pgs
[20:29] <jakes> seapasulli: Thanks. This was my question: eg: say pool1{osd1,osd2,osd3,osd4} . Say pool 1 has two pgs's. PG1= [osd1,osd2,osd3] and PG2-[osd2,osd3,osd4] . In this case, if I change weights for osd3 based on the latency, it will change the data write ratio for both of the PGS'- PG1 and PG2. So, In a live cluster with thousands of PG's, how can we keep add latency value to OSD wieghts if we need to deterministically calculate the output?.
[20:30] <jiffe> maybe not thats a guess
[20:30] <jakes> hope , you understood the question.
[20:31] <steveeJ> jiffe: i'm kind of in a hurry right now. i'd try sketching the tree you've created and see if your rule can possibly get the osd's you're requesting
[20:33] <steveeJ> may be try setting min_size=1 to see if that helps as a temporary workaround
[20:33] * cookednoodles (~eoin@eoin.clanslots.com) Quit ()
[20:33] <jiffe> steveeJ: that is a question I'd like to answer but right now I'm more curious if its possible to fix the pool size too large problem I've created without having the osds to back it up
[20:33] * burley (~khemicals@63.sub-70-194-200.myvzw.com) has joined #ceph
[20:34] <steveeJ> jiffe: you mean i want to decrease the size to 2 again?
[20:34] <jiffe> steveeJ: indeed
[20:34] <jiffe> I have the size = 2, min_size = 1 again
[20:35] <steveeJ> and stuck unclean pgs?
[20:35] <jiffe> yup still HEALTH_WARN 192 pgs stuck unclean; recovery 9/153 objects degraded (5.882%)
[20:35] * BManojlovic (~steki@178-221-116-161.dynamic.isp.telekom.rs) has joined #ceph
[20:35] <steveeJ> i'd make sure the OSDs are not frozen by restarting them
[20:35] * sarob (~sarob@cpe-75-82-233-45.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:36] <steveeJ> i've gotta go now. i'll be around later again
[20:36] <seapasulli> jakes: I can't see any way that this can be done. I can only see you getting a nice average by testing the OSD prior to adding it into the cluster via iozone or some other utility.
[20:36] * BManojlovic (~steki@178-221-116-161.dynamic.isp.telekom.rs) Quit (Read error: Connection reset by peer)
[20:37] <jiffe> steveeJ: that was it, restarted two of the osds and its healthy again
[20:38] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[20:39] * rendar (~I@host45-177-dynamic.20-87-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[20:39] <jakes> seapasulli: This should be static configuration. correct?. I somehow wished for a dynamic configuration based on the ongoing latency of the osd
[20:41] * rendar (~I@host45-177-dynamic.20-87-r.retail.telecomitalia.it) has joined #ceph
[20:42] * ingard_ (~cake@tu.rd.vc) Quit (Ping timeout: 480 seconds)
[20:50] * rturk is now known as rturk|afk
[20:55] * jakes (~oftc-webi@128-107-239-235.cisco.com) Quit (Remote host closed the connection)
[20:55] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[21:04] * burley (~khemicals@63.sub-70-194-200.myvzw.com) Quit (Quit: burley)
[21:04] <ganders> hi to all, i've some issues with my ceph cluster (4 osd servers, and 3 mon servers), each osd server has 3 SSD journals devs each with 3 OSD daemons running.
[21:06] <ganders> I've map a rbd on a client with Ubuntu 12.04.2LST with Kern 3.15.7
[21:06] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:07] <ganders> once the rbd was mounted, i've issue a fio command on the rbd.. and then the perf of the ceph cluster goes really down, and freeze my client prompt, and if i try a ceph -w on one of the monitors.. then nothing is going on
[21:08] * rturk|afk is now known as rturk
[21:10] <ganders> any idea?
[21:13] <ganders> i run a dstat on the linux client, and the transfer on the net is really bad...almost 436B, and if i run a nmon on one of the osd servers... nothing is going on at all
[21:14] * BManojlovic (~steki@178-221-116-161.dynamic.isp.telekom.rs) has joined #ceph
[21:20] <cookednoodles> generally the kernel part is really buggy
[21:23] <ganders> yeah but also i try with another kernel 3.13 and i'm getting the same results :(
[21:27] * rturk is now known as rturk|afk
[21:28] <ganders> the cluster is doing nothing since it's new
[21:31] * andreask (~andreask@91.224.48.154) has joined #ceph
[21:31] * ChanServ sets mode +v andreask
[21:35] * BManojlovic (~steki@178-221-116-161.dynamic.isp.telekom.rs) Quit (Ping timeout: 480 seconds)
[21:36] * ircolle-afk is now known as ircolle
[21:39] * ralphte (ralphte@d.clients.kiwiirc.com) has joined #ceph
[21:39] * Cybert1nus is now known as Cybertinus
[21:39] <ralphte> Can you install ceph on just two physical servers. Or do you need at least three?
[21:42] * andreask (~andreask@91.224.48.154) has left #ceph
[21:44] * fdmanana (~fdmanana@bl5-173-238.dsl.telepac.pt) Quit (Quit: Leaving)
[21:47] <kitz> ralphte: two is fine just make sure you set your pool's size (i.e. replication factor) to 2 as well.
[21:48] <kitz> ralphte: the gotcha is that if either of the mon processes goes down (assuming you put mons on both) all I/O will halt until it comes back up. With 3 mons you can sustain a quorum during the outage of a single node and I/O will continue.
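The quorum arithmetic behind that: with 2 mons a majority is 2, so one outage halts I/O; with 3 mons a majority of 2 survives a single outage. The pool-side setting kitz mentions is simply:

    ceph osd pool set rbd size 2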
[21:50] * lofejndif (~lsqavnbok@freeciv.nmte.ch) has joined #ceph
[21:51] <ralphte> Got ya if I run a a MON on a third box would this fix the problem
[21:52] <ralphte> Also am I correct in my math that with a replication factor of 2 assuming I have identical nodes that my avalable space would be x/2
[21:58] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[21:59] <dmick> ralphte: yes
[22:02] <ralphte> for both questions
[22:02] <dmick> I was talking about space; what was the first questoin?
[22:04] <ralphte> it was about just running mon on a third box
[22:05] * BManojlovic (~steki@212.200.65.138) has joined #ceph
[22:08] <ralphte> Also does your storage have to be identical for every node
[22:12] * erice (~erice@50.245.231.209) has joined #ceph
[22:12] <seapasulli> ralphte: apparently not but it's preferred.
[22:13] <seapasulli> Of course things will be easier if they were. I don't use uniform hardware in ours. I have 8 disk fat twins and 36 disk servers both hosting osds
[22:13] <seapasulli> of different sizes.
[22:13] * Gorazd (~Venturi@93-103-91-169.dynamic.t-2.net) has joined #ceph
[22:13] <mtl1> Hi. Are there any plans to allow ceph to resize down an rbd device?
[22:13] <ralphte> I am assuming the trade off would be be your total available space since if one server is has more space then another?
[22:14] <darkfader> mtl1: it is possible already
[22:14] <mtl1> Just not with --resize?
[22:14] <darkfader> yes
[22:14] <mtl1> directions somewhere?
[22:14] <darkfader> --resize 10240 --allow-shrink iirc
[22:14] <seapasulli> http://ceph.com/docs/master/man/8/rbd/
[22:14] <mtl1> Hmm, OK. I'm assuming openstack has no idea about that option though.
[22:14] <darkfader> safety: you need to resize any filesystem / lvm devices on it first :)
[22:14] <seapasulli> resize [image-name] [--allow-shrink]
[22:14] <seapasulli> Resizes rbd image. The size parameter also needs to be specified. The --allow-shrink option lets the size be reduced.
[22:15] <darkfader> mtl1: that's normal, any layer like libvirt or such loses some features :)
[22:15] <mtl1> darkfader: thanks.
[22:16] <mtl1> My question was specific to openstack, but I also do some sorta-manual stuff, and that may help.
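A sketch of the order darkfader describes, with illustrative names and sizes: shrink whatever sits inside the image first, then the image itself:

    resize2fs /dev/rbd0 8G                                  # shrink the filesystem first
    rbd resize --size 10240 --allow-shrink mypool/myimage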
[22:16] <Gorazd> Does CEPH know WSGI middleware swift-like functionality within RGW - is that possible through uWSGI?
[22:18] <seapasulli> Gorazd: Can you be more specific? Sorry I'm dumb.
[22:19] <Gorazd> I would like to understand, in case RGW is used insted of Swift as a object storage backend, do I still could use middleware, which is written for Swift, together with Swift...
[22:19] * jakes (~oftc-webi@128-107-239-234.cisco.com) has joined #ceph
[22:20] <Gorazd> There was a nice post comparing ceph/swift where it says, in case using RGW, you could no longer use Swift pipline/middleware http://www.gossamer-threads.com/lists/openstack/dev/39720
[22:25] <seapasulli> We currently use Ceph RGW as our swift store. It accepts a few keystone roles as authenticated users and generates a new user using the keystone tenant ID.
[22:25] * Cube (~Cube@66-87-64-203.pools.spcsdns.net) Quit (Quit: Leaving.)
[22:27] <seapasulli> We use the swift cli written in python and just have it point to RGW. It seems to work mostly the same. Some ACL stuff doesn't seem to work or it's just I don't know how to fine tune it further. IE if we add an accepted role IE "swift_dl" and I want them to just download data with this role. I can't seem to get that working. Instead we set their tenant up with a 0byte quota so they can not upload anything new just read existing.
[22:27] <seapasulli> Gorazd:
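The ceph.conf side of what seapasulli describes looks roughly like this; the option names are the Firefly-era radosgw keystone settings, values are placeholders:

    [client.radosgw.gateway]
    rgw keystone url = http://keystone:35357
    rgw keystone admin token = {keystone-admin-token}
    rgw keystone accepted roles = Member, admin
    rgw s3 auth use keystone = true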
[22:28] * jakes (~oftc-webi@128-107-239-234.cisco.com) Quit (Remote host closed the connection)
[22:29] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[22:32] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[22:35] <seapasulli> I can't get this ceph cluster to just drop these degraded placement groups and continue on. I had 2/3rds of the ceph cluster fail and mostly recovered unscathed (out of 1.5PB of space I lost a good 1PB).
[22:36] <seapasulli> health HEALTH_WARN 5 pgs recovering; 5 pgs stuck unclean; recovery 216/13918161 objects degraded (0.002%); 25/4639387 unfound (0.001%)
[22:36] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Read error: Connection reset by peer)
[22:36] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[22:36] <Gorazd> Thank you seapasulli. So CEPH API is not 100% compatible (http://ceph.com/docs/master/radosgw/swift/), so some one should make adoption to already written Swift-middleware..
[22:37] <seapasulli> I did mark_unfound_lost revert but can't get it to work.
[22:37] <seapasulli> yup it's not 100% yet Gorazd.
[22:38] <seapasulli> Apparently WOS is but when they tried to sell to us we thought the rest of the product wasn't developed enough yet to integrat with keystone yet.
[22:44] <Gorazd> Does CEPH has any functionality to roll-back SW installation. Let say if I find out new version of sw does not work for me, can I go back with older installation
[22:47] * bandrus (~oddo@216.57.72.205) Quit (Read error: Connection reset by peer)
[22:51] <seapasulli> I have so far. but I did ceph-deploy uninstall ${node} then ceph-deploy install ${node}.. (I installed latest git to try to address odd bugs). So far nothing too nuts. Just be careful about python version 2.7.7 seems cause ceph to just hang on most client commands.
[22:51] * bandrus (~oddo@216.57.72.205) has joined #ceph
[22:58] * burley (~khemicals@cpe-98-28-233-158.woh.res.rr.com) has joined #ceph
[23:04] * lpabon (~lpabon@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[23:04] * ikrstic (~ikrstic@109-93-162-27.dynamic.isp.telekom.rs) Quit (Quit: Konversation terminated!)
[23:04] <Gorazd> Also the question, why is recommencded for OSDs to be the same size?
[23:06] * brad_mssw (~brad@shop.monetra.com) Quit (Quit: Leaving)
[23:07] <dmick> even distribution of data across the cluster
[23:07] <dmick> you can tweak this with OSD weighting
[23:09] * BManojlovic (~steki@212.200.65.138) Quit (Ping timeout: 480 seconds)
[23:19] <Gorazd> Thank you dmick
[23:22] * BManojlovic (~steki@178-222-41-152.dynamic.isp.telekom.rs) has joined #ceph
[23:23] * ganders (~root@200-127-158-54.net.prima.net.ar) Quit (Quit: WeeChat 0.4.1)
[23:30] * rturk|afk is now known as rturk
[23:30] <steveeJ> are there any prepared settings to make sure a client uses a specific host's osds for I/O on rados images?
[23:31] * dmsimard is now known as dmsimard_away
[23:32] <steveeJ> i can only think of the following: create pool:ruleset pairs for each specific host choosing the first OSD of the corresponding host
[23:37] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[23:39] <Gorazd> How does CAP theorem works out for CEPH, since it is primary build for consistency and partition tolerance. How does this work at scale for RADOS GW, what's the workaround if any, to satitsfy also availability, if this possible??
[23:40] <gleam> with rgw the workaround is to establish multiple zones and replicate data between them using radosgw-agent, i believe
[23:40] <gleam> see http://ceph.com/docs/master/radosgw/federated-config/
[23:44] * BManojlovic (~steki@178-222-41-152.dynamic.isp.telekom.rs) Quit (Ping timeout: 480 seconds)
[23:50] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[23:50] * hufman (~hufman@cpe-184-58-235-28.wi.res.rr.com) Quit (Quit: leaving)
[23:58] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) has joined #ceph
[23:59] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[23:59] * bkopilov (~bkopilov@213.57.18.102) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.