#ceph IRC Log


IRC Log for 2013-05-24

Timestamps are in GMT/BST.

[0:01] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[0:02] * BillK (~BillK@124-169-236-155.dyn.iinet.net.au) has joined #ceph
[0:08] * drokita1 (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[0:08] * aliguori (~anthony@32.97.110.51) Quit (Quit: Ex-Chat)
[0:11] * jeff-YF_ (~jeffyf@67.23.117.122) has joined #ceph
[0:12] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[0:13] * Tamil1 (~tamil@38.122.20.226) has joined #ceph
[0:14] * xmltok (~xmltok@pool101.bizrate.com) Quit (Ping timeout: 480 seconds)
[0:14] * jeff-YF_ (~jeffyf@67.23.117.122) Quit ()
[0:14] * jeff-YF (~jeffyf@67.23.117.122) Quit (Ping timeout: 480 seconds)
[0:15] * tnt (~tnt@91.177.204.242) Quit (Ping timeout: 480 seconds)
[0:17] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[0:19] * loicd (~loic@2a01:e35:2eba:db10:88c8:c260:79c1:5bae) has joined #ceph
[0:20] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[0:24] * madkiss (~madkiss@ds80-237-216-40.dedicated.hosteurope.de) Quit (Quit: Leaving.)
[0:31] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[0:37] * portante (~user@66.187.233.206) Quit (Ping timeout: 480 seconds)
[0:38] * loicd (~loic@2a01:e35:2eba:db10:88c8:c260:79c1:5bae) Quit (Quit: Leaving.)
[0:39] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:40] * MooingLemur (~troy@phx-pnap.pinchaser.com) Quit (Ping timeout: 480 seconds)
[0:42] * jshen (~jshen@209.133.73.98) Quit (Ping timeout: 480 seconds)
[0:44] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:55] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[0:56] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:57] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:59] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[1:03] * MooingLemur (~troy@phx-pnap.pinchaser.com) has joined #ceph
[1:07] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Quit: bia)
[1:08] * dwt (~dwt@128-107-239-235.cisco.com) Quit (Quit: Leaving)
[1:14] * john_barbee (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[1:17] * rturk is now known as rturk-away
[1:24] * rturk-away is now known as rturk
[1:26] * rturk is now known as rturk-away
[1:31] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:44] * rturk-away is now known as rturk
[1:46] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[1:46] * rturk is now known as rturk-away
[1:47] * rturk-away is now known as rturk
[1:49] * rturk is now known as rturk-away
[1:49] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[2:10] * dpippenger1 (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[2:10] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Read error: Connection reset by peer)
[2:13] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:20] * LeaChim (~LeaChim@2.127.72.50) Quit (Ping timeout: 480 seconds)
[2:23] * alram (~alram@38.122.20.226) Quit (Quit: Lost terminal)
[2:39] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[3:07] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[3:07] * DarkAce-Z (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[3:28] * julian (~julianwa@125.70.132.111) has joined #ceph
[3:28] * Tamil1 (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[3:29] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[3:37] * redeemed_ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[3:37] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Read error: Connection reset by peer)
[3:45] * redeemed_ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Ping timeout: 480 seconds)
[3:46] * redeemed_ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[3:49] * julian (~julianwa@125.70.132.111) Quit (Read error: Connection reset by peer)
[3:52] * julian (~julian@125.70.132.111) has joined #ceph
[3:58] * mormonitor (~mormonito@116.250.217.65) has joined #ceph
[4:02] * treaki__ (4fb3078d52@p4FDF76DD.dip0.t-ipconnect.de) has joined #ceph
[4:03] <mormonitor> hello, i am evaluating ceph to be used with hypertable. i have 3 VirtualBox VMs with ubuntu 13.04 and bootstrapped ceph using ceph-deploy. I am getting "mount error 5 = Input/output error" trying to mount the filesystem with "mount -t ceph 192.168.56.42:/ /mnt/". What have I done wrong?
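A minimal sketch of the mount being attempted, assuming cephx is enabled and the client has the admin key; the monitor address, key name and paths are placeholders. "mount error 5 = Input/output error" often means no MDS is running or authentication failed, so "ceph -s" and "ceph mds stat" are worth checking first:

    sudo mkdir -p /mnt/ceph
    sudo mount -t ceph 192.168.56.42:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret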
[4:06] * treaki_ (88156254f0@p4FDF7CCD.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:12] * The_Bishop__ (~bishop@e177091074.adsl.alicedsl.de) has joined #ceph
[4:19] * The_Bishop_ (~bishop@f052096225.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[4:19] * redeemed__ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[4:19] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:22] * redeemed_ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:22] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[4:30] * redeemed___ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[4:30] * redeemed__ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Read error: Connection reset by peer)
[4:30] * redeemed___ is now known as redeemed
[4:35] * tserong (~tserong@124-171-115-108.dyn.iinet.net.au) has joined #ceph
[4:53] * redeemed_ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[4:53] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Read error: Connection reset by peer)
[4:53] * redeemed__ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[5:01] * redeemed_ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Ping timeout: 481 seconds)
[5:05] * The_Bishop__ (~bishop@e177091074.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[5:05] * redeemed__ (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Remote host closed the connection)
[5:06] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[5:11] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[5:12] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Remote host closed the connection)
[5:13] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[5:16] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit ()
[5:17] * The_Bishop (~bishop@e177091074.adsl.alicedsl.de) has joined #ceph
[5:47] * NXCZ (~chatzilla@ip72-199-155-185.sd.sd.cox.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[5:53] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[5:57] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[5:58] * Vanony_ (~vovo@i59F7AD7A.versanet.de) has joined #ceph
[6:04] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[6:04] * Vanony (~vovo@88.130.192.131) Quit (Ping timeout: 480 seconds)
[6:05] * Cube1 (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[6:08] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:11] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[6:20] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[6:30] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[6:33] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[6:49] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[6:58] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[6:58] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[7:03] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[7:05] * sachindesai (~sachin@triband-mum-59.184.170.212.mtnl.net.in) has joined #ceph
[7:05] <sachindesai> hiee
[7:07] * sachindesai (~sachin@triband-mum-59.184.170.212.mtnl.net.in) Quit ()
[7:08] * dpippenger1 (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[7:11] * tkensiski1 (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[7:14] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[7:48] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[7:52] * NightDog (~Karl@38.179.202.84.customer.cdi.no) has joined #ceph
[7:54] * tkensiski1 (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[7:55] * NightDog (~Karl@38.179.202.84.customer.cdi.no) Quit ()
[8:05] * tnt (~tnt@91.177.204.242) has joined #ceph
[8:07] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[8:07] * ChanServ sets mode +v andreask
[8:32] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Pull the pin and count to what?)
[8:32] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:41] * mormonitor (~mormonito@116.250.217.65) has left #ceph
[9:06] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[9:06] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[9:06] * kyle_ (~kyle@216.183.64.10) has joined #ceph
[9:06] * loicd (~loic@185.10.252.15) has joined #ceph
[9:08] * kyle__ (~kyle@216.183.64.10) Quit (Ping timeout: 480 seconds)
[9:13] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:16] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:32] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[9:32] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[9:33] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:33] * jjgalvez1 (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[9:39] * rturk-away is now known as rturk
[9:40] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[9:40] * rturk is now known as rturk-away
[9:43] * leseb (~Adium@83.167.43.235) has joined #ceph
[9:46] * humbolt (~elias@178-190-244-123.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[9:47] <leseb> hi all
[9:47] * LeaChim (~LeaChim@2.127.72.50) has joined #ceph
[9:56] * humbolt (~elias@91-113-98-248.adsl.highway.telekom.at) has joined #ceph
[10:01] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) has joined #ceph
[10:01] * espeer (~espeer@105-236-45-136.access.mtnbusiness.co.za) has joined #ceph
[10:02] <espeer> hallo, can anybody tell me what to do to improve single threaded rados performance, rados bench reports 100MB/s write performance by default, but -t 1 drops to about 18MB/s?
[10:02] <loicd> ccourtaut: I wrote this, trying to understand backfilling http://pastebin.com/XLidGQ5i
[10:02] <loicd> I'm not sure I got it right though :-)
[10:02] <espeer> this seems to also have implications for librbd backed VMs, which max out at about 20MB/s write performance
[10:03] <loicd> It would be great if you have time to take a look. I'll cross check before publishing. It may very well be completely wrong :-)
[10:08] * yanzheng (~zhyan@134.134.139.74) has joined #ceph
[10:10] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[10:10] * leseb (~Adium@83.167.43.235) has joined #ceph
[10:16] <espeer> or i could be completely misunderstanding things :)
[10:19] <loicd> espeer: could you paste the command you're using ?
[10:22] <espeer> rados bench -p rbd 120 write -t 1
[10:22] <espeer> without the -t 1, it defaults to 16 threads, right?
[10:24] <tnt> espeer: can you try this utility : https://gist.github.com/smunaut/5433222
[10:26] <tnt> you need to create a temporary rbd image of a given size (like 2G or something), then call it with the arguments "user_name pool_name image_name"
[10:32] <loicd> espeer: yes
[10:32] <loicd> bench seconds mode [ -b objsize ] [ -t threads ] Benchmark for seconds. The mode can be write or read. The default object size is 4 MB, and the default number of simulated threads (parallel writes) is 16.
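A sketch of the comparison being discussed, reusing the pool from espeer's command; the -b value is just the default 4 MB object size written out in bytes:

    rados bench -p rbd 120 write -t 16              # default: 16 concurrent operations
    rados bench -p rbd 120 write -t 1               # a single operation in flight
    rados bench -p rbd 120 write -t 1 -b 4194304    # same, with the object size made explicit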
[10:33] <loicd> espeer: why is it important for you to measure the performances of a single thread ?
[10:37] <espeer> well, i'm concerned about single VM performance
[10:37] <espeer> which also seems to be similarly low at around 20MB/s
[10:37] <espeer> don't know if it's related
[10:37] <espeer> but radosbench gets 100MB/s while librbd backed qemu does 20MB/s
[10:38] <espeer> so, was thinking it may be a single threaded issue
[10:38] <loicd> good thinking
[10:38] <loicd> I don't know if the qemu rbd driver uses multiple threads or not
[10:39] <loicd> espeer: now I'm curious to know the answer :-)
[10:44] <espeer> otherwise, there's something else wrong
[10:45] <espeer> rados bench gets awesome performance, and qemu/rbd is not so great
[10:45] <tnt> espeer: check the bench I posted above. It uses librbd in single thread mode.
[10:50] <espeer> tnt, running
[10:51] <espeer> ganymede ~ # ./rados_bench admin rbd test_image
[10:51] <espeer> Read: 1221.23 Mb/s (2147483648 bytes in 1677 ms)
[10:51] <espeer> Write: 75.57 Mb/s (2147483648 bytes in 27100 ms)
[10:51] <espeer> so, that's not too terrible then
[10:51] <espeer> why is qemu/rbd so lousy then?
[10:51] <tnt> 1221.23 Mb ?!? ...
[10:52] <tnt> 10G network ?
[10:52] <espeer> nope 1G
[10:52] <espeer> so, that doesn't make sense
[10:52] <tnt> Oh wait ...
[10:52] <espeer> unless it's some cache effect on read?
[10:52] <tnt> you need to re-run it a second time.
[10:52] <tnt> if the image is new, it's sparse ...
[10:52] <espeer> ah, right
[10:52] <espeer> write before read in the test :)
[10:54] <espeer> ganymede ~ # ./rados_bench admin rbd test_image
[10:54] <espeer> Read: 128.41 Mb/s (2147483648 bytes in 15949 ms)
[10:54] <espeer> Write: 93.82 Mb/s (2147483648 bytes in 21829 ms)
[10:54] <espeer> right, so ceph and rados are happy, right?
[10:55] <tnt> that's more reasonable.
[10:55] <tnt> so yes, I'd say the issue is more in the way requests are passed from the VM to RBD; that's where perf might be lost.
[10:56] <tnt> what do you use to bench on the VM ?
[10:56] * yanzheng (~zhyan@134.134.139.74) Quit (Remote host closed the connection)
[10:58] <topro> tnt: wouldn't Mb refer to Mbit, as opposed to MB for MByte?
[10:58] <espeer> tnt, just plain old dd a large file
[10:59] <tnt> espeer: can you try to dd directly on a block device ? (you obviously need two RBD devices attached, or use partitions on the block device)
[10:59] <espeer> ganymede ~ # dd if=/dev/zero of=test bs=4k count=100000
[10:59] <espeer> 100000+0 records in
[10:59] <espeer> 100000+0 records out
[10:59] <espeer> 409600000 bytes (410 MB) copied, 12.8061 s, 32.0 MB/s
[10:59] <tnt> espeer: well ... try bs=1M
[11:00] <tnt> topro: huh ... yeah ... but in this case it really mean Mbytes :p
[11:01] <espeer> doh, bs=1M count=100000 is a lot of data
[11:01] <topro> tnt: meanwhile I realised myself by doing the math of provided figures (2147483648 bytes in 15949 ms)
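Working that math out: 2147483648 bytes is exactly 2048 MiB, and 2048 MiB / 15.949 s ≈ 128.4 MiB/s, which matches the reported "128.41 Mb/s" - so the tool's "Mb" really is megabytes (mebibytes, strictly), not megabits.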
[11:01] * espeer picks more reasonble count
[11:01] <tnt> :)
[11:02] <espeer> ganymede ~ # dd if=/dev/zero of=test bs=1M count=1000
[11:02] <espeer> 1000+0 records in
[11:02] <espeer> 1000+0 records out
[11:02] <espeer> 1048576000 bytes (1.0 GB) copied, 34.1779 s, 30.7 MB/s
[11:02] <espeer> still not great
[11:02] <tnt> try on the block device directly and adding oflag=direct
[11:02] <espeer> will have to set that up
[11:02] <espeer> give me a minute
[11:03] <tnt> gotta go take care of something for a few min anyway :p
[11:14] <espeer> ouch
[11:14] <espeer> ganymede ~ # dd if=/dev/zero of=/dev/vdb bs=1M count=1000 oflag=direct
[11:14] <espeer> 1000+0 records in
[11:14] <espeer> 1000+0 records out
[11:14] <espeer> 1048576000 bytes (1.0 GB) copied, 90.1984 s, 11.6 MB/s
[11:14] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[11:15] <tnt> indeed, not great.
[11:16] <espeer> second run was simular
[11:17] <espeer> similar
[11:17] <tnt> what if you remove the oflag ?
[11:18] <tnt> what io scheduler are you using btw ? (on the vm)
[11:18] <espeer> deadline on the ceph hosts
[11:18] <espeer> i think the VM is cfq
[11:19] <tnt> try no-op on the VM
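One way to make the scheduler change tnt suggests inside the guest, assuming the virtio disk is /dev/vdb; the sysfs change lasts only until reboot (elevator=noop on the kernel command line makes it permanent):

    cat /sys/block/vdb/queue/scheduler            # shows e.g. "noop deadline [cfq]"
    echo noop > /sys/block/vdb/queue/scheduler    # switch this disk to noop (as root)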
[11:19] <espeer> ganymede ~ # dd if=/dev/zero of=/dev/vdb bs=1M count=1000
[11:19] <espeer> 1000+0 records in
[11:19] <espeer> 1000+0 records out
[11:19] <espeer> 1048576000 bytes (1.0 GB) copied, 33.5054 s, 31.3 MB/s
[11:19] <espeer> without o_direct, it's similar to the file on fs performance
[11:20] <espeer> yays
[11:20] <espeer> ganymede ~ # dd if=/dev/zero of=/dev/vdb bs=1M count=1000
[11:20] <espeer> 1000+0 records in
[11:20] <espeer> 1000+0 records out
[11:20] <espeer> 1048576000 bytes (1.0 GB) copied, 12.7559 s, 82.2 MB/s
[11:20] <espeer> that made a massive difference
[11:21] <tnt> :)
[11:22] <espeer> must just communicate that to my customers ;)
[11:22] <espeer> ganymede ~ # dd if=/dev/zero of=/dev/vdb bs=1M count=1000 oflag=direct
[11:23] <espeer> 1000+0 records in
[11:23] <espeer> 1000+0 records out
[11:23] <espeer> 1048576000 bytes (1.0 GB) copied, 99.4744 s, 10.5 MB/s
[11:23] <espeer> o_direct is also much much slower
[11:24] <tnt> yup, I still don't quite get why ...
[11:25] <espeer> have a customer using o_direct mysql inside a VM, so that's still going to hurt him
[11:25] <mrjack> re
[11:26] <mrjack> tnt: how is your cluster going? ;)
[11:26] <tnt> it's alive and well :)
[11:26] <mrjack> mon problem solved?
[11:28] <tnt> it didn't occur since I upgraded to the patched version ~ 40 hours ago.
[11:30] <espeer> thanks tnt, you've helped a lot
[11:30] <espeer> ;)
[11:31] <tnt> espeer: you can also enable the writeback cache ... but of course that comes with it own set of caveats
[11:44] <espeer> tnt, i've just turned off the writeback cache
[11:44] <espeer> was triggering the qemu latency problem
[11:44] <espeer> where the whole VM blocks
[11:44] <espeer> i believe there is a fix for that though
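For reference, the writeback cache tnt mentions is typically enabled with a client-side ceph.conf setting plus cache=writeback on the QEMU drive; this is only a sketch (pool/image name is a placeholder), and as espeer notes it can trigger the guest-stall problem on the versions in use here:

    # ceph.conf on the hypervisor
    [client]
        rbd cache = true
    # QEMU drive option (or cache='writeback' in the libvirt disk XML)
    -drive format=raw,file=rbd:rbd/myimage,cache=writeback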
[11:51] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:52] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:53] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[12:02] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[12:02] <joao> tnt, woohoo!
[12:03] <joao> tnt, did that solve the store growth by any chance?
[12:04] <tnt> joao: yes, AFAICT.
[12:04] * espeer (~espeer@105-236-45-136.access.mtnbusiness.co.za) Quit (Quit: Konversation terminated!)
[12:05] <tnt> joao: that will be much more conclusive if it goes through the weekend without issues though.
[12:05] <joao> tnt, yeah, give it time
[12:05] <joao> :)
[12:11] <loicd> :-)
[12:15] * leseb (~Adium@83.167.43.235) has joined #ceph
[12:21] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[12:25] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[12:28] * tnt (~tnt@91.177.204.242) Quit (Ping timeout: 480 seconds)
[12:32] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[12:36] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[12:40] * diegows (~diegows@190.190.2.126) has joined #ceph
[12:53] * eegiks (~quassel@2a01:e35:8a2c:b230:951a:1e33:5e5e:5519) has joined #ceph
[13:07] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:11] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[13:15] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[13:19] * b1tbkt_ (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) has joined #ceph
[13:23] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[13:31] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[13:41] * KindTwo (KindOne@h10.172.17.98.dynamic.ip.windstream.net) has joined #ceph
[13:43] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:43] * KindTwo is now known as KindOne
[13:46] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:46] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Remote host closed the connection)
[13:50] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[14:03] * leseb (~Adium@83.167.43.235) has joined #ceph
[14:06] * ChanServ sets mode +v leseb
[14:09] * erwan_taf (~erwan@lns-bzn-48f-62-147-157-222.adsl.proxad.net) has joined #ceph
[14:09] <erwan_taf> hi there
[14:09] <erwan_taf> I'm discovering the project and using it for the first couple of times
[14:10] <erwan_taf> I do use the packages you provide
[14:10] <erwan_taf> I was wondering if there is a particular reason not providing some mandatory directories inside your packages
[14:11] <erwan_taf> /etc/ceph /var/lib/ceph/{osd|mon|mds} etc..
[14:15] <erwan_taf> I mean, from my point of view, that would make sense to have /etc/ceph in ceph-common
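A quick way to see what the installed packages actually ship, and to create the directories by hand in the meantime; the list is just the one erwan_taf mentions:

    dpkg -L ceph-common | grep '^/etc'
    dpkg -L ceph | grep '^/var/lib/ceph'
    sudo mkdir -p /etc/ceph /var/lib/ceph/osd /var/lib/ceph/mon /var/lib/ceph/mds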
[14:15] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[14:22] * john_barbee (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:24] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[14:32] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:39] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[14:41] * jjgalvez1 (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[14:41] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[14:47] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:49] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:50] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[14:51] * The_Bishop (~bishop@e177091074.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[14:51] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[14:51] * dcasier (~dcasier@LVelizy-156-44-40-164.w217-128.abo.wanadoo.fr) has joined #ceph
[14:51] <gucki> mikedawson, tnt
[14:52] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit ()
[14:53] <gucki> mikedawson, tnt: thanks for the information, so i'll wait for next cuttlefish release. i really hope it fixes many of the memory leaks i'm seeing with latest bobtail. better rbd/ qemu performance sounds also great, it's my only usecase :)
[14:53] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[14:54] <gucki> tnt: btw, the best way to upgrade is to shutdown all (so all my 3) mons, then start the new mons and then restart all osds...right? :)
[14:57] <tnt> gucki: that's how I did it. you can restart the osd first AFAIK, it doesn't change much.
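With the sysvinit script shipped at the time, the rolling restart gucki describes looks roughly like this on each node after upgrading the packages (daemon ids are placeholders):

    service ceph restart mon.a    # first, on each monitor host in turn
    service ceph restart osd.0    # then each osd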
[14:58] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[14:59] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[15:02] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:06] * dosaboy (~dosaboy@host86-163-34-137.range86-163.btcentralplus.com) has joined #ceph
[15:08] * leseb (~Adium@83.167.43.235) has joined #ceph
[15:09] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[15:10] * ChanServ sets mode +v leseb
[15:10] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[15:11] * leseb (~Adium@83.167.43.235) Quit ()
[15:11] * leseb (~Adium@83.167.43.235) has joined #ceph
[15:12] * ChanServ sets mode +v leseb
[15:13] * dosaboy_ (~dosaboy@host86-161-206-107.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[15:14] <niklas> wido: I finally got to the point where I would like to use your rados-java code. Do you have any pre-built debian packages?
[15:14] <wido> niklas: Ha! Cool :)
[15:14] <wido> niklas: I wrote the debian packaging stuff two days ago
[15:14] <wido> let me build a package for you
[15:15] <niklas> yes, I saw that but I did not find a package, and I'm not too familiar with debian packaging so I could not figure out what to do with the files you pushed two days ago…
[15:15] <wido> niklas: you can also fetch the jar file: http://ceph.com/maven/com/ceph/rados/0.1.1/
[15:15] <wido> and use that in your classpath
[15:16] <niklas> ok, thanks
[15:16] * julian (~julian@125.70.132.111) Quit (Quit: Leaving)
[15:16] <wido> niklas: http://zooi.widodh.nl/ceph/rados-java/librados-java_0.1.1_all.deb
[15:16] <wido> that will place the same jar in /usr/share/java
[15:17] <wido> You can use /usr/share/java/rados.jar
[15:18] <niklas> wido: thank you, I think that wont be necessary - I guess I got confused by the fact you prepared to build a debian package. I assumed it would do more complex things ;-)
[15:18] <wido> niklas: No, not at all. It just puts the jar file in a location
[15:19] <niklas> ok
[15:20] <niklas> I'll report back to you as soon as I get any results
[15:22] <wido> niklas: Thanks. I've ran my UnitTests and they all work out, but you never know
[15:23] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[15:24] * oliver1 (~oliver@p4FD06843.dip0.t-ipconnect.de) has joined #ceph
[15:33] <ccourtaut> Hi folks, any idea why some builds here http://ceph.com/gitbuilder.cgi are not exposed here http://gitbuilder.ceph.com/ ?
[15:34] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:39] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: IceChat - Keeping PC's cool since 2000)
[15:50] * dcasier (~dcasier@LVelizy-156-44-40-164.w217-128.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[15:50] * jshen (~jshen@209.133.73.98) has joined #ceph
[15:57] * portante (~user@66.187.233.206) has joined #ceph
[15:58] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[16:02] * yanzheng (~zhyan@jfdmzpr04-ext.jf.intel.com) has joined #ceph
[16:07] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[16:07] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[16:07] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[16:07] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:08] * Codora (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) Quit (Quit: F*ck you, I'm a daemon.)
[16:11] <niklas> wido: How would I start to debug a connection failure with error code -2 ?
[16:11] <wido> niklas: That's the problem, I can't read the error codes in Java
[16:11] <niklas> Or otherwise: I do give my normal ceph.conf file, don't I?
[16:11] <wido> -2 means "No Such File or Directory"
[16:11] <wido> trying confReadFile?
[16:12] <niklas> yep
[16:12] <wido> niklas: No typo in the path?
[16:12] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[16:13] <niklas> no, but the file does not contain any authentication options, is that correct?
[16:13] <niklas> it kinda looks just likt the example file
[16:13] <niklas> http://ceph.com/docs/master/start/quick-start/#add-a-configuration-file
[16:13] <wido> niklas: It should contain a monitor section
[16:14] <wido> I might want to add something there
[16:14] <wido> then file.exists() fails it throws an exception
[16:14] <wido> "Error opening file or such"
[16:14] <wido> niklas: Can you post the code on Pastebin?
[16:15] <niklas> which code?
[16:15] <niklas> I am running your test case, just changed the monitor ip, and the path to the config file
[16:16] <niklas> err, just changed the path to the config file
[16:16] <wido> ahhh, let me check my test :)
[16:17] <niklas> Connection fails, reading the config file seems to be no problem
[16:17] <wido> niklas: Does the Java proces have read permission?
[16:18] <niklas> wido: TestRados line 92 fails
[16:19] <wido> niklas: So, to what is RADOS_JAVA_CONFIG_FILE pointing?
[16:19] <niklas> config file is 644
[16:20] * madkiss (~madkiss@217.237.167.132) has joined #ceph
[16:20] * madkiss (~madkiss@217.237.167.132) Quit ()
[16:21] <niklas> Replacing line 92 with r.confReadFile(new File("/home/me/ceph/files/ceph.conf")); does not help
[16:21] <niklas> also if I remove the try catch part, its the r.connect() call that fails
[16:22] <niklas> Shouldn't I have to authenticate against librados?
[16:22] <wido> niklas: It needs the confReadFile
[16:22] <wido> niklas: It reads the id, eg "admin" from the environment and then reads the config file
[16:22] <wido> the keyring should be in the config file
[16:22] <niklas> I cant't find any place where I pass a secretkey or password or anything
[16:23] <wido> niklas: In the config file
[16:23] <wido> but if you want to use it yourself
[16:23] <wido> confSet('mon_host', ''
[16:23] <wido> confSet('key', 'AJFKAHAXXXXX');
[16:24] <niklas> wido: confSet('key', 'AJFKAHAXXXXX'); works
[16:24] <niklas> the keyring is not in the ceph.conf file
[16:25] <niklas> it is referenced once in the [client.radosgw.gateway] part, but the path is incorrect because i'm on a different system
[16:26] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[16:26] * vata (~vata@2607:fad8:4:6:1d1e:f894:89dc:6567) has joined #ceph
[16:27] <niklas> wido: I only have a test cluster on VMs running. It is set up from the 5-Minute-Guide where the key is not put in the ceph.conf file but into a separate ceph.keyring file
[16:27] <wido> niklas: I haven't tried every use case there
[16:28] <wido> but the confReadFile method will work, as long as the config file has valid options :)
[16:28] <wido> 'ceph -s' should work with that file
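A sketch of the minimal client-side ceph.conf this converges on; the monitor address and keyring path are placeholders, and pointing "keyring" at the client's keyring file is an alternative to pasting the key with confSet:

    [global]
        mon host = 192.168.56.42:6789
        keyring = /home/me/ceph/files/ceph.keyring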
[16:28] <niklas> wido: the confReadFile does work, it just couldn't read the key, because I didn't put it in the ceph.conf
[16:28] <wido> get it
[16:29] <niklas> thank you very much resolving it
[16:29] <jmlowe> Good morning everybody,I had an osd crash, is this known or should a new bug be filed? http://pastebin.com/AK7wUcDq
[16:30] <wido> jmlowe: Doesn't ring a bell, but it seems like a corrupt file or so
[16:30] <wido> no filesystem errors under the OSD?
[16:31] <niklas> wido: I should not have stuck to your test cases that closely, but they make it easy to find the right calls to rados-java ;-)
[16:31] <jmlowe> wido: dmesg is clean
[16:31] <wido> wido: The tests also function as examples ;)
[16:31] <wido> jmlowe: Hmm, odd, can't say for sure
[16:32] <jmlowe> I did trigger the 0.61.1 release by having empty directories that weren't removed like they should have been by earlier releases when I split my pg's
[16:33] <jmlowe> let me rephrase that, when I split my pg's with 0.61.0 my osd's all asserted on empty directories that should have been removed by an earlier release
[16:34] <jmlowe> that bug triggered the 0.61.1 release
[16:35] <nhm> jmlowe: if you don't mind, just file it and we'll figure out if it's a duplicate.
[16:36] <nhm> jmlowe: maybe just mention that you hit the other bug too.
[16:38] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[16:39] * dcasier (~dcasier@80.215.0.54) has joined #ceph
[16:45] * ccourtaut should have known about vstart sooner :/
[16:50] * dcasier (~dcasier@80.215.0.54) Quit (Ping timeout: 480 seconds)
[16:51] <jamespage> sagewk, gregaf: hey - we are seeing what I think is http://tracker.ceph.com/issues/4282 in one of our test environments - but it's unclear from the issue what the resolution was - any pointers?
[16:59] * yanzheng (~zhyan@jfdmzpr04-ext.jf.intel.com) Quit (Remote host closed the connection)
[16:59] * rturk-away is now known as rturk
[17:00] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[17:01] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[17:01] * rturk (~rturk@ds2390.dreamservers.com) Quit (Quit: Coyote finally caught me)
[17:01] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[17:01] * rturk-away is now known as rturk
[17:02] <elder> jamespage, what version of the kernel are you running?
[17:02] <jamespage> elder, its the Ubuntu quantal 3.5 kernel
[17:02] <jmlowe> filed 5163
[17:03] <jamespage> elder, I'm testing with the raring 3.8 kernel as well - but I'd like to understand if it is a bug in the 3.5 kernel, as we can get that fixed if so
[17:04] <elder> There were some fixes that went into 3.9 that I think fixed that problem. But I don't know if the problem was a regression to begin with or what.
[17:04] <elder> Sage is on vacation today though.
[17:05] * Volture (~quassel@office.meganet.ru) Quit (Remote host closed the connection)
[17:05] <jamespage> elder, right
[17:05] <elder> The 5 commits I'm thinking of are:
[17:06] <elder> e996607 libceph: wrap auth methods in a mutex
[17:06] <elder> 27859f9 libceph: wrap auth ops in wrapper functions
[17:06] <elder> 0bed9b5 libceph: add update_authorizer auth method
[17:06] <elder> 4b8e8b5 libceph: fix authorizer invalidation
[17:06] <elder> 20e55c4 libceph: clear messenger auth_retry flag when we authenticate
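One way to check whether a given kernel tree already carries those fixes, assuming the usual net/ceph location for the libceph code (the hashes above are abbreviated, so this is just a sketch):

    git log --oneline v3.5..HEAD -- net/ceph/ | grep -i auth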
[17:08] * Volture (~quassel@office.meganet.ru) has joined #ceph
[17:09] * rturk is now known as rturk-away
[17:15] * leo (~leo@27.106.31.32) has joined #ceph
[17:16] <jamespage> elder, thanks for the pointers
[17:16] * jamespage takes a look
[17:17] * oliver1 (~oliver@p4FD06843.dip0.t-ipconnect.de) has left #ceph
[17:20] * leo (~leo@27.106.31.32) Quit (Remote host closed the connection)
[17:20] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:20] * leo (~leo@27.106.31.32) has joined #ceph
[17:26] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:26] * tkensiski (~tkensiski@49.sub-70-197-6.myvzw.com) has joined #ceph
[17:26] * tkensiski (~tkensiski@49.sub-70-197-6.myvzw.com) Quit (Read error: Connection reset by peer)
[17:31] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) has joined #ceph
[17:35] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:35] * alop (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) has joined #ceph
[17:36] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[17:36] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:36] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[17:37] <alop> howdy, trying to use ceph-deploy on ubuntu 12.04.2
[17:37] <alop> running into same issue as http://tracker.ceph.com/issues/4924
[17:38] <alop> can't find the branches mentioned in the ticket on github
[17:38] <alop> anyone have any insight?
[17:39] <kyle_> I had a similar issue. I ended up just purging everything and starting over. Which worked. I think i just missed a step somewhere.
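The "purge everything and start over" route kyle_ describes, roughly, using ceph-deploy subcommands of that era; hostnames are placeholders and the exact subcommand set varies between ceph-deploy versions:

    ceph-deploy purgedata node1 node2 node3
    ceph-deploy forgetkeys
    ceph-deploy new node1 node2 node3
    ceph-deploy install node1 node2 node3
    ceph-deploy mon create node1 node2 node3
    ceph-deploy gatherkeys node1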
[17:40] * joelio (~Joel@88.198.107.214) has joined #ceph
[17:40] <alop> worth a shot
[17:44] <PerlStalker> Is it possible to upgrade an rbd from version 1 to version 2?
[17:48] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[17:48] * tnt (~tnt@91.177.204.242) has joined #ceph
[17:56] * yanzheng (~zhyan@134.134.137.73) has joined #ceph
[18:00] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[18:01] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) Quit ()
[18:01] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[18:02] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[18:04] * leo (~leo@27.106.31.32) Quit (Ping timeout: 480 seconds)
[18:04] * leo (~leo@27.106.31.254) has joined #ceph
[18:08] * BillK (~BillK@124-169-236-155.dyn.iinet.net.au) Quit (Ping timeout: 481 seconds)
[18:11] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[18:11] * yanzheng (~zhyan@134.134.137.73) Quit (Remote host closed the connection)
[18:11] <mtanski> Who would be the right person to talk to about the ceph kernel client. I sent a patch in, but I didn't get anything feedback.
[18:11] * loicd (~loic@185.10.252.15) Quit (Ping timeout: 480 seconds)
[18:14] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[18:15] * joset_ (~joset@59.92.165.142) has joined #ceph
[18:17] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[18:17] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[18:18] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[18:20] * bergerx_ (~bekir@78.188.204.182) Quit (Quit: Leaving.)
[18:23] <imjustmatthew> PerlStalker: I think the only way to switch formats is to export it and then import it with --image_format 2
[18:24] * KindTwo (~KindOne@h177.170.17.98.dynamic.ip.windstream.net) has joined #ceph
[18:25] <imjustmatthew> mtanski: try alexelder https://github.com/alexelder
[18:25] <mtanski> Okay, thank you. I'll reach out to him
[18:26] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:26] * KindTwo is now known as KindOne
[18:26] * xmltok (~xmltok@pool101.bizrate.com) Quit (Ping timeout: 480 seconds)
[18:27] <elder> mtanski, I'm here.
[18:27] <mtanski> Oh, even better. Was looking for your email online
[18:27] <elder> Which patch?
[18:28] <elder> Ahh, fscache.
[18:28] <mtanski> I sent in a patch (really two patches) for fscache support in the Ceph kernel client.
[18:28] <PerlStalker> imjustmatthew: That's disappointing but not surprising.
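A sketch of the export/re-import conversion imjustmatthew describes; the flag is spelled --image-format in the rbd tool, and the pool/image names are placeholders:

    rbd export rbd/myimage /tmp/myimage.raw
    rbd rm rbd/myimage
    rbd import --image-format 2 /tmp/myimage.raw rbd/myimage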
[18:29] <elder> mtanski, I didn't look at them because I haven't (yet) gone very deeply into the ceph file system code. Mainly I've been working with libceph (messenger and osd client) and rbd.
[18:29] <elder> I was sort of hoping Sage would take a look at your patches.
[18:30] <mtanski> I think for our workload, which is dominated by reads (large analytics database), there's frequently used data like indexes and the block index that greatly benefits from it (esp. on SSD drives).
[18:30] <mtanski> I think other folks would benefit from it greatly
[18:30] <elder> So you've tested it and used it and observed an improvement?
[18:31] <mtanski> Plus it'd bring the support on par with NFS, CIFS and AFS.
[18:32] <mtanski> I've been testing it on non-production hardware right now (which is not an SSD drive) where the major improvement came from lower traffic to the OSDs.
[18:33] * rturk-away is now known as rturk
[18:33] <elder> OK. I'll try to at least take an initial look at your two patches today. I'll also make sure Sage knows they should get a little attention when he returns from vacation.
[18:33] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[18:34] <mtanski> I'm going to spend more time testing it on our production hardware as well.
[18:34] <elder> Bottom line, I did see your patch and I'm sorry nobody acknowledged that. It may take some days to get them looked at closely though.
[18:34] <mtanski> I'm sure the both of you guys w
[18:34] <elder> But we're definitely interested.
[18:35] <mtanski> ill have some feedback and they'll be back and forth because I'm just learning about the Ceph internals.
[18:35] <elder> OK, that's fine. I'll be learning about fscache also.
[18:35] <elder> Thanks for posting it.
[18:37] <mtanski> That's fine time wise; just want to get feedback when you guys have time (and not be annoying on the mailing list)
[18:37] <mtanski> Thanks
[18:37] * yehudasa (~yehudasa@2607:f298:a:607:fc7b:7397:97da:230e) Quit (Ping timeout: 480 seconds)
[18:38] <mtanski> And I guess I would also like to know what the canonical branch I should be using for building this / testing.
[18:38] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[18:39] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[18:39] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit ()
[18:40] <dosaboy> can someone tell me if using --new-format with rbd creates/imports actually does anything?
[18:40] <dosaboy> it is not in the doc
[18:40] <dosaboy> and looks like --format [1|2] is the right way to do it
[18:40] <dosaboy> but that is not what is used in the cinder rbd driver so just want to clarify
[18:41] <elder> There are two rbd "formats," 1 and 2.
[18:42] <elder> The "new" rbd format 2 supports some new features that were not possible before.
[18:42] <elder> To support these features, some parts of how rbd data is stored and organized had to be changed.
[18:42] <elder> You won't notice as a user of rbd those changes, they're more internal.
[18:42] <alop> anyone using the ceph-cookbooks here?
[18:42] <elder> Sadly, the syntax for creating images has changed, twice.
[18:43] <elder> When the new format was first introduced, it was "--new-format"
[18:43] <alop> I'm trying to follow the wiki doc, chef-client is getting stuck at [2013-05-24T16:34:49+00:00] INFO: Processing ruby_block[get osd-bootstrap keyring] action create (ceph::mon line 96)
[18:43] <elder> Then, it was decided that "--format 1" (default) or "--format 2" to select which type you wanted to create.
[18:43] <elder> Now it's "--image-format 1" or "--image-format 2"
[18:44] <elder> Does that answer your questions dosaboy?
[18:46] * yehudasa (~yehudasa@2607:f298:a:607:dc9c:c9bf:9554:d22d) has joined #ceph
[18:46] <dosaboy> elder: what I need to know is what effect using --new-format has with Bobtail and up
[18:46] <dosaboy> since that is what the cinder rbd dirver does regardless
[18:46] <dosaboy> and Grizzly tends to be used with Bobtail
[18:46] <elder> I don't know about what cinder does with it. If you are using basic rbd functionality, there will be no difference.
[18:47] <elder> If you are going to use some of the advanced features offered by the new format, you must be using format 2.
[18:47] <dosaboy> elder: so you are saying that --new-format is not equivalent to --format 2
[18:47] <elder> (I can tell you about rbd. I'm not the right one to answer questions about OpenStack)
[18:48] <elder> These are all roughly equivalent:
[18:48] <elder> --new-format
[18:48] <elder> --format 2
[18:48] <elder> --image-format 2
[18:48] <elder> They are just different ways of expressing the same thing, as they've evolved.
[18:48] <dosaboy> ok, just want to be sure because rbd seems to ignore flags it does not know about
[18:49] <dosaboy> instead of raising an errpr
[18:49] <dosaboy> error
[18:49] <dosaboy> so unless I dig deep I do not know if it has acknowledged --new-format or not
[18:50] <dosaboy> is there a way to check what 'format' an rbd volume has?
[18:50] <dosaboy> the reason I am asking is cause I am trying out cloning
[18:50] <dosaboy> and it seems really slow
[18:50] <elder> Cloning won't work at all unless it's new format.
[18:51] <dosaboy> right
[18:51] <dosaboy> basically I am trying to test what is stated here http://ceph.com/docs/master/rbd/rbd-openstack/
[18:51] <elder> "rados ls -p rbd" will show you a list of objects, and that can tell you.
[18:52] <elder> Actually,
[18:52] <elder> cat /sys/bus/rbd/devices/*/image_id
[18:53] <elder> A mapped rbd image that's format one will have an empty "image_id" file.
[18:53] <elder> A mapped rbd image that's format 2 will contain a hexadecimal string in that file.
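Pulling the format checks together; pool and image names are placeholders, and "rbd info" is the easiest check when the image isn't mapped:

    rbd info rbd/myimage | grep format       # prints "format: 1" or "format: 2"
    rados ls -p rbd | head                   # format 2 images keep rbd_header.*/rbd_data.* objects
    cat /sys/bus/rbd/devices/*/image_id      # mapped images only: empty for format 1, hex id for format 2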
[18:53] <dosaboy> hmm, rbd/devices is empty
[18:53] <elder> Then you don't have it mapped.
[18:53] <dosaboy> I am using -p volumes
[18:54] * alram (~alram@38.122.20.226) has joined #ceph
[18:54] <dosaboy> are we talking on the client or server?
[18:54] <elder> I am talking about running that command on the client.
[18:54] <dosaboy> ah ;)
[18:54] <dosaboy> 1 sec
[18:55] * scalability-junk (uid6422@tooting.irccloud.com) Quit (Ping timeout: 480 seconds)
[18:56] <dosaboy> hmm no /sys/bus/rbd on cient
[18:56] <dosaboy> I guess Glance does not map rbds
[18:56] <dosaboy> so right now nothing is ever mapped
[18:56] <elder> rados ls -p rbd
[18:56] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[18:57] <dosaboy> and right now Glance is the source
[18:57] <dosaboy> I need to check that glance is creating images with format 2 though
[18:57] <dosaboy> since if not that would be a bug
[18:58] <dosaboy> so just to clarify, both source and dest need to be format 2 for this to work?
[18:59] <elder> Hmm.
[18:59] <elder> Technically only the parent image needs to be format 2, but I think in fact they both need to be.
[19:00] <dosaboy> ok so at least the parent
[19:00] <dosaboy> which in this case is glance
[19:01] <dosaboy> don't suppose jdurgin hangs around these parts?
[19:01] * jmlowe (~Adium@2001:18e8:2:28cf:f000::dab8) has joined #ceph
[19:02] * KindTwo (KindOne@h205.168.17.98.dynamic.ip.windstream.net) has joined #ceph
[19:02] <elder> Josh is the right guy to ask.
[19:02] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:02] <elder> He is apparently not around now. Check back in the next hour or two. It's possible he's taking today off.
[19:02] * KindTwo is now known as KindOne
[19:02] <elder> joshd ^^
[19:03] <dosaboy> looks liek the rbd driver in glance queries librbd to determine whether "layering" is possible
[19:03] <dosaboy> i'll need to investigate further here
[19:03] <dosaboy> may be a configration issue on my part
[19:03] <dosaboy> elder: thanks for your help
[19:06] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[19:06] <elder> No problem dosaboy.
[19:07] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:07] * jmlowe (~Adium@2001:18e8:2:28cf:f000::dab8) Quit (Quit: Leaving.)
[19:09] * leo (~leo@27.106.31.254) Quit (Quit: Leaving)
[19:10] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:11] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[19:15] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[19:17] * rturk is now known as rturk-away
[19:19] * nooky_ (~nooky@190.221.31.66) Quit (Quit: Lost terminal)
[19:20] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:32] * lyncos (~chatzilla@208.71.184.41) has joined #ceph
[19:33] <lyncos> Hi .. me again.. I still have unclean PGs everytime I build/rebuild my cluster from scratch
[19:34] <lyncos> Anyone can help plz .. I did search on google and can't find anything useful
[19:34] <lyncos> Here is my ceph -w output
[19:34] <lyncos> http://pastebin.com/mwKCDsGQ
[19:35] * joao (~JL@89.181.159.84) Quit (Remote host closed the connection)
[19:35] <lyncos> It seems no one can help me
[19:36] <lyncos> And I know I only have 8 Pgs .. I did try with 50, 100 and 1000 and they always unclean
[19:36] <lyncos> I have 2x OSD on 2 different nodes
[19:37] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:39] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:40] <lyncos> When I delete all the pools ... it's fixing up... but as soon I create a pool I get stuck unclean pgs
[19:41] * kyle_ (~kyle@216.183.64.10) Quit (Quit: Leaving)
[19:43] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[19:48] <tnt> lyncos: try "ceph osd crush tunables optimal"
[19:50] <tchmnkyz> hey guys is there a easy way to get a total disk space usage per rbd pool/
[19:53] <lyncos> rados df ?
[19:53] <lyncos> tnt: trying
[19:53] <lyncos> it fixed
[19:54] <tchmnkyz> ok but human radable
[19:54] <tchmnkyz> lol
[19:54] <via_> i haven't been able to keep my monitors up for even 24 hours since upgrading to .61.2, and now one of my mons won't start at all, giving this: http://pastebin.mlsrvr.com/m785f9a8
[19:54] * via_ is now known as via
[19:55] <lyncos> tnt: it fixed the uncleans.. but I still have 1000 active+degraded pgs ....
[19:58] <tchmnkyz> via: anything in the logs?
[19:59] <sjust> lyncos: ceph osd getmap -o /tmp/map
[19:59] <sjust> and then upload /tmp/map to cephdrop@ceph.com (sftp)
[20:03] * b1tbkt_ (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[20:03] <sjust> lyncos: did you create the crush map manually?
[20:04] <lyncos> I did try to adjust it manully .. with no success I think
[20:04] <sjust> yeah, you have the same osd under both hosts
[20:04] <lyncos> ...
[20:05] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:05] <sjust> assuming I grabbed the right osdmap from cephdrop
[20:06] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[20:06] <lyncos> it makes sense
[20:06] <lyncos> I'll change it
[20:10] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[20:11] <lyncos> Ok I did change it
[20:11] <lyncos> I still have the unclean stuff
[20:13] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[20:15] <sjust> doesn't look like it changed
[20:15] <sjust> maybe upload as lyncos-map?
[20:15] <lyncos> k
[20:15] <lyncos> lyncos-map
[20:15] <lyncos> done
[20:18] * joset_ (~joset@59.92.165.142) Quit (Quit: Leaving)
[20:19] <lyncos> HEALTH_WARN 1000 pgs degraded; 1000 pgs stuck unclean
[20:20] <sjust> that file seems to have been truncated
[20:20] <lyncos> ok let me try again
[20:20] <lyncos> i'll re do it
[20:21] <lyncos> Try again
[20:21] <lyncos> I did do the whole process again
[20:21] <sjust> # rules
[20:21] <sjust> rule tests {
[20:21] <sjust> ruleset 0
[20:21] <sjust> type replicated
[20:21] <sjust> min_size 1
[20:21] <sjust> max_size 10
[20:21] <sjust> step take ceph01-gns
[20:21] <sjust> step choose firstn 0 type osd
[20:21] <sjust> step emit
[20:21] <sjust> }
[20:22] <sjust> set take ceph01-gns
[20:22] <sjust> you are starting at ceph01-gns
[20:22] <sjust> which only has one osd
[20:22] <sjust> you want to start at tests
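The change sjust is pointing at, applied to the pasted rule, plus the usual round-trip to install an edited map; the commented chooseleaf line is a common variant (not part of the advice above) that also keeps the two replicas on different hosts:

    rule tests {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take tests
            step choose firstn 0 type osd
            # step chooseleaf firstn 0 type host
            step emit
    }

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt      # edit the rule, then recompile and inject:
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new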
[20:22] <lyncos> ok let me try
[20:23] <lyncos> it's scrubbing now
[20:25] <lyncos> should be pretty quick
[20:27] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[20:27] * dpippenger1 (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[20:28] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[20:29] <lyncos> fixed ! thanks a lot
[20:29] <lyncos> I did learn how to do a crush map correctly
[20:29] * jane_hg (~G25@94.20.34.244) has joined #ceph
[20:30] * jane_hg (~G25@94.20.34.244) Quit (Remote host closed the connection)
[20:30] <sjust> lyncos: glad to help.
[20:33] * diegows (~diegows@host195.190-224-143.telecom.net.ar) has joined #ceph
[20:34] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:37] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[20:38] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[20:42] * diegows (~diegows@host195.190-224-143.telecom.net.ar) Quit (Ping timeout: 480 seconds)
[20:42] * dpippenger2 (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[20:43] * dpippenger1 (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:44] * dpippenger1 (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[20:48] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:49] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:50] * ChanServ sets mode +v leseb
[20:50] * dpippenger2 (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:51] <via> tchmnkyz: that paste was the log
[20:53] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:54] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[20:56] * scalability-junk (uid6422@id-6422.tooting.irccloud.com) has joined #ceph
[21:06] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[21:06] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[21:09] * jshen (~jshen@209.133.73.98) Quit (Ping timeout: 480 seconds)
[21:10] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[21:10] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[21:10] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:21] * alop_ (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) has joined #ceph
[21:22] * alop (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) Quit (Ping timeout: 480 seconds)
[21:22] * alop_ is now known as alop
[21:30] * dpippenger1 (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:31] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[21:32] * alop_ (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) has joined #ceph
[21:34] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[21:34] * alop (~al592b@71-80-139-200.dhcp.rvsd.ca.charter.com) Quit (Ping timeout: 480 seconds)
[21:34] * alop_ is now known as alop
[21:36] * rturk-away is now known as rturk
[21:37] * rturk is now known as rturk-away
[21:39] * lyncos (~chatzilla@208.71.184.41) Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130512193848])
[21:47] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:47] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[21:48] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:48] * ChanServ sets mode +v andreask
[21:55] * portante (~user@66.187.233.206) Quit (Quit: outahere)
[21:56] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[22:02] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[22:08] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[22:09] * dpippenger1 (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[22:15] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[22:27] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[22:42] * mormonitor (~mormonito@116.250.217.65) has joined #ceph
[22:43] <mormonitor> hello, i am evaluating ceph. after creating the monitor, status is "probing" and ceph-create-keys is not able to create keyrings. what am I doing wrong?
[22:44] <gregaf> ceph-create-keys won't work until your monitors have formed a quorum, and "probing" means they haven't
[22:44] <gregaf> you probably don't have them all running, or else there's something blocking their communication
[22:45] <mormonitor> i can telnet 6789 between all monitors
[22:45] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[22:46] <mormonitor> but monitors keep trying to connect inbetween
[22:46] <dmick> logs
[22:47] <mormonitor> mon.node2@-1(probing) e0 adding peer 192.168.56.42:6789/0 to list of hints
[22:47] <mormonitor> mon.node3@-1(probing) e0 adding peer 192.168.56.41:6789/0 to list of hints
[22:48] * ghartz|2 (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[22:49] * alram_ (~alram@38.122.20.226) has joined #ceph
[22:50] * morse_ (~morse@supercomputing.univpm.it) has joined #ceph
[22:50] * morse (~morse@supercomputing.univpm.it) Quit (Read error: Connection reset by peer)
[22:51] * gregaf1 (~Adium@2607:f298:a:607:91c6:a098:e1e7:319f) has joined #ceph
[22:51] * ivan` (~ivan`@000130ca.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:51] * vata (~vata@2607:fad8:4:6:1d1e:f894:89dc:6567) Quit (Ping timeout: 480 seconds)
[22:52] * ivan` (~ivan`@000130ca.user.oftc.net) has joined #ceph
[22:52] * gregaf (~Adium@2607:f298:a:607:94dc:a315:cc72:795f) Quit (Ping timeout: 480 seconds)
[22:53] <mormonitor> any ideas on where to proceed?
[22:53] * vata (~vata@2607:fad8:4:6:1d1e:f894:89dc:6567) has joined #ceph
[22:54] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[22:54] <joshd> dosaboy: did you figure out your format 2 issue? if librbd and ceph-common are updated cinder will use format 2, and glance needs an up-to-date python-ceph
[22:55] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[22:58] * gucki (~smuxi@84-73-201-95.dclient.hispeed.ch) Quit (Remote host closed the connection)
[23:01] <dmick> mormonitor: if those are the only messages in the mon logs, I'd increase mon logging to try to figure out why they're not connecting
[23:04] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:04] <mormonitor> thats what i've got with debug_ms 1:
[23:04] <mormonitor> 013-05-25 07:03:34.930350 7f666f766700 1 -- 192.168.56.40:6789/0 --> mon.0 0.0.0.0:0/1 -- mon_probe(probe 6703288d-7fbb-4764-9486-b83fdccd80d5 name node1 new) v4 -- ?+0 0x1ec1b00
[23:04] <mormonitor> 2013-05-25 07:03:34.930902 7f666f766700 1 -- 192.168.56.40:6789/0 --> mon.1 0.0.0.0:0/2 -- mon_probe(probe 6703288d-7fbb-4764-9486-b83fdccd80d5 name node1 new) v4 -- ?+0 0x1ec1800
[23:05] <mormonitor> 2013-05-25 07:03:34.931138 7f666f766700 1 -- 192.168.56.40:6789/0 --> mon.2 0.0.0.0:0/3 -- mon_probe(probe 6703288d-7fbb-4764-9486-b83fdccd80d5 name node1 new) v4 -- ?+0 0x1ec1500
[23:05] <mormonitor> 2013-05-25 07:03:34.931245 7f666f766700 1 -- 192.168.56.40:6789/0 --> mon.? 192.168.56.41:6789/0 -- mon_probe(probe 6703288d-7fbb-4764-9486-b83fdccd80d5 name node1 new) v4 -- ?+0 0x1ec1200
[23:05] <mormonitor> 2013-05-25 07:03:34.932286 7f666ef65700 1 -- 192.168.56.40:6789/0 <== mon.1 192.168.56.41:6789/0 1726317517 ==== mon_probe(reply 6703288d-7fbb-4764-9486-b83fdccd80d5 name node2 paxos( fc 0 lc 0 ) new) v4 ==== 574+0+0 (2325886047 0 0) 0x1ec1200 con 0x1e3b4a0
[23:07] <dmick> debug mon = 20
[23:08] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[23:08] <mormonitor> the most significant message is:
[23:08] <mormonitor> 192.168.56.41:6789/0 pipe(0x139ca00 sd=22 :46988 s=4 pgs=1029 cs=1 l=0).reader couldn't read tag, Success
[23:09] <mormonitor> 192.168.56.40:6789/0 >> 0.0.0.0:0/3 pipe(0x139cc80 sd=21 :0 s=4 pgs=0 cs=0 l=0).unregister_pipe - not registered
[23:11] <dmick> mormonitor: even with mon debug = 20?
[23:14] * ScOut3R (~ScOut3R@dsl51B614D7.pool.t-online.hu) has joined #ceph
[23:14] * dpippenger1 (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[23:14] <mormonitor> dmick: https://gist.github.com/llonchj/5646557
[23:15] <dosaboy> joshd: so my concern was that I noticed the cinder driver uses --new-format and I was not sure if that would imply format 2
[23:15] <dmick> that looks like debug_ms = 20
[23:15] <dosaboy> joshd: elder indicated that it did so I think there is no issue there
[23:16] <mormonitor> does it matter that the hostname is different from the node name?
[23:16] <joshd> dosaboy: ok, cool
[23:16] <dmick> mormonitor: I don't know, but I'm wondering if we can get a listing with debug mon set to 20
[23:17] <mormonitor> dmick: have you seen the gist url? https://gist.github.com/llonchj/5646557
[23:17] <dosaboy> joshd: my other concern was that the thin cloning that is mentioned here - http://ceph.com/docs/master/rbd/rbd-openstack/
[23:17] <dosaboy> did not appear to be happening in my tests
[23:18] <joshd> dosaboy: it's possible you're hitting http://www.gossamer-threads.com/lists/openstack/dev/26064#26064
[23:18] <dosaboy> so when I looked into the rbd driver, it appears that for an image-to-volume create,
[23:18] <dmick> mormonitor: yes. and I responded to that posting
[23:18] <dosaboy> the driver downloads the whole image from glance, then makes a copy
[23:18] <dmick> (02:15:17 PM) dmick: that looks like debug_ms = 20
[23:18] <dmick> ms != mon
[23:18] <dosaboy> regardless of whether it is qcow2 or raw
[23:18] <dosaboy> joshd: then when it creates it, it is uploading the entire raw image
[23:19] <dosaboy> so it takes ages
[23:19] <dosaboy> does that sound familiar
[23:19] * dosaboy looks at link
[23:19] <joshd> dosaboy: yeah, that's the fallback if cinder can't get the backend location from glance
[23:19] <joshd> or cloning isn't working for some other reason
[23:19] <dosaboy> backend location?
[23:20] <mormonitor> sorry, i am pretty new to ceph and got lost. what do you want me to do?
[23:20] <dosaboy> joshd ^^
[23:21] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[23:21] <joshd> dosaboy: glance's v2 api will give cinder the location where it has stored the image (i.e. rbd://cluster_fsid/pool/image/snapshot)
[23:21] <dosaboy> ah ok
[23:21] <joshd> dosaboy: then cinder uses that to determine whether it can clone, or if it has to do the full copy
[23:21] <dosaboy> so this is for the copy_image_to_volume() operation right?
[23:21] <dmick> mormonitor: this is not complex.
[23:21] <dmick> where you typed the two letters 'ms'
[23:22] <dmick> replace those with the three letters 'mon'
[23:22] <mormonitor> did you mean: /usr/bin/ceph-mon --cluster=ceph -i node1 -d --debug_mon 20
[23:22] <joshd> dosaboy: yeah. another possibility is that 'rbd info' with the rados user cinder has is failing since it doesn't have permission to read the image
[23:22] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[23:23] <joshd> dosaboy: but the backend location being unavailable is more likely (glance-api also has to have the show_direct_url=true setting)
[23:23] <dosaboy> ah I need to try glance_api_version=2
[23:23] * ScOut3R (~ScOut3R@dsl51B614D7.pool.t-online.hu) Quit (Ping timeout: 480 seconds)
[23:23] <dosaboy> was not aware of that ;)
[23:23] <dosaboy> let me try it out
[23:23] <dosaboy> thanks joshd
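[Note: a minimal sketch of the settings discussed above, assuming a Grizzly-era setup; option names can vary by release, and the Glance option is usually spelled show_image_direct_url rather than show_direct_url:
    # /etc/cinder/cinder.conf: use the glance v2 API so cinder can see rbd:// backend locations
    glance_api_version = 2
    # /etc/glance/glance-api.conf: expose the backend location (rbd://fsid/pool/image/snap) to clients
    show_image_direct_url = True
With both in place, cinder can clone an rbd-backed glance image instead of downloading and re-uploading the full raw image.]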
[23:24] <tnt> Is there a way to see what objects are being accessed at the moment?
[23:24] <joshd> dosaboy: you're welcome
[23:25] <tnt> like what RBD image is currently generating a lot of IO
[23:25] <dmick> mormonitor: that's what I get when I substitute ms with mon, yes
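[Note: the same monitor debug level can also be set in ceph.conf rather than on the command line; a sketch, assuming the settings go in the [mon] section of /etc/ceph/ceph.conf on each monitor host and the daemons are restarted afterwards:
    [mon]
        # verbose monitor state-machine logging plus messenger traffic
        debug mon = 20
        debug ms = 1
Either way, the extra output lands in /var/log/ceph/ceph-mon.<name>.log by default.]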
[23:25] <joshd> tnt: you can check via osd admin socket command dump_ops_in_flight
[23:26] <joshd> tnt: nothing cluster-level though
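[Note: a sketch of the admin socket call joshd refers to, assuming osd.0 and the default socket path:
    # list client operations currently being processed by this OSD
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
Each in-flight op includes the object name, so a busy RBD image shows up as repeated hits on its rb.0.* (format 1) or rbd_data.* (format 2) objects.]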
[23:26] <mormonitor> thanks, i'm building a gist
[23:26] * diegows (~diegows@190.190.2.126) has joined #ceph
[23:28] <tnt> joshd: thanks. But it shows no ops in flight at all ... despite having lots of disk IO.
[23:28] <joshd> scrubbing or snap trimming I'd guess then
[23:28] <nhm> tnt: anything interesting going on if you do a ceph -w?
[23:29] <mormonitor> dmick: sorry, i didn't understand what you said because debug_mon is not shown as an argument in ceph-mon -h ;)
[23:29] <mormonitor> https://gist.github.com/llonchj/5646637
[23:29] <tnt> nhm: 2013-05-24 21:29:07.134355 mon.0 [INF] pgmap v14258008: 12808 pgs: 12808 active+clean; 649 GB data, 1567 GB used, 1673 GB / 3241 GB avail; 2335KB/s wr, 293op/s
[23:29] <tnt> things like that.
[23:29] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[23:30] <tnt> no scrub in progress. no idea what 'snap trimming' is.
[23:31] <joshd> deleting snapshots happens in the background
[23:31] <tnt> I never took any snapshots.
[23:34] <dmick> 2013-05-25 07:26:42.851727 7f000398f700 10 mon.node1@-1(probing) e0 mostly ignoring mon.node3, not part of monmap (and one for node2) look suspicious to me
[23:34] <dmick> ^^ mormonitor
[23:35] <dmick> how were the mons set up?
[23:35] <mormonitor> ceph-deploy
[23:35] <mormonitor> i have them now working
[23:35] <mormonitor> root@node1:~# ceph -s health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
[23:36] <mormonitor> it was some host naming problem
[23:36] <joshd> tnt: are you seeing a lot of io by looking at iostat on the osds, or in ceph -w reporting?
[23:36] * dpippenger1 (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[23:37] <tnt> joshd: iostat on the osds. "Lots" might be a strong word ... let's say it's much higher than the IDLE levels I would expect for this time of day, given nothing should really be happening.
[23:38] <mormonitor> dmick: health HEALTH_OK thank you for your support
[23:38] <dmick> mormonitor: what about the naming was wrong? hosts not reverse-mapping to their shortnames?
[23:39] <mormonitor> i am using virtualboxes with vagrant to set up an evaluation environment
[23:39] <joshd> tnt: since it showed no ops in flight, it's not from a client (unless there just happened to be none when the command was run). another non-client thing is deleting extra copies after recovery, if recovery occurred recently
[23:40] <dmick> mormonitor: ok, but what specifically did you fix about the names?
[23:40] <mormonitor> i am not used to ubuntu; this linux flavor maps the hostname to a loopback address (127.0.1.1) instead of only mapping localhost as redhat does
[23:41] <mormonitor> then on every node i mapped only the other nodes; before, i had also redefined nodeX (itself) as 192.168.56.X
[23:41] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[23:41] <mormonitor> is that making sense?
[23:42] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Connection reset by peer)
[23:42] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[23:42] <dmick> no. can you phrase it in terms of "I edited <file> to say <x> instead of <y>"?
[23:43] * vata (~vata@2607:fad8:4:6:1d1e:f894:89dc:6567) Quit (Quit: Leaving.)
[23:44] <mormonitor> /etc/hosts had 127.0.1.1 node1.nitidum.devel node1 vagrant. and i edited /etc/hosts with 192.168.56.40 node1.niditum.devel node1, 192.168.56.41 for node2 and so on
[23:45] <mormonitor> then i commented out the line #192.168.56.40 node1.niditum.devel node1 on node1, #192.168.56.41 node2.niditum.devel node2 on node2 and #192.168.56.42 node3.niditum.devel node3 on node3
[23:45] <mormonitor> then i rebuilt the cluster: ceph-deploy mon create root@node1 root@node2 root@node3
[23:45] <mormonitor> and worked
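[Note: a rough reconstruction of the /etc/hosts layout mormonitor describes above, shown for node1 with the domain spelling kept as in the log:
    127.0.0.1       localhost
    #192.168.56.40  node1.niditum.devel node1    <- the node's own entry, commented out on each node
    192.168.56.41   node2.niditum.devel node2
    192.168.56.42   node3.niditum.devel node3
after which the monitors were redeployed with:
    ceph-deploy mon create root@node1 root@node2 root@node3]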
[23:46] <tnt> joshd: well, it turned out it was client IO ... not sure why ops in flight didn't show anything, I ran it several times on different OSDs ...
[23:46] <tnt> in the end, I just enabled OSD debug, looked at the requests and figured out which image was being accessed.
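[Note: one way to do what tnt describes, sketched assuming osd.0 is the busy OSD and the default log path; injectargs raises the level at runtime without a restart:
    # log messenger traffic on the OSD so incoming requests (and their object names) show up
    ceph tell osd.0 injectargs '--debug-ms 1'
    # inspect /var/log/ceph/ceph-osd.0.log, then turn it back down
    ceph tell osd.0 injectargs '--debug-ms 0'
Object names of the form rb.0.* or rbd_data.* identify which RBD image the traffic belongs to.]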
[23:47] <dmick> ah. the hosts files needed to have short names in them. ok. that should probably be stated more prominently in the preflight checklist
[23:47] <dmick> it's in the ceph-deploy README I believe
[23:48] <joshd> hmm, strange that no ops in flight showed up
[23:50] <mormonitor> dmick: that would be appreciated; it would also be helpful to have something in the README like: "To deploy a cluster with 3 nodes, follow these steps"...
[23:50] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:51] <mormonitor> sometimes a complete step-by-step example helps you build a better picture
[23:52] <tnt> joshd: is there a recommended way to track IO per RBD image ?
[23:53] <joshd> tnt: if you're using qemu it can tell you how much i/o is going on
[23:53] <tnt> nope, xen.
[23:53] <joshd> libvirt and monitoring tools can hook into that
[23:53] <dmick> mormonitor: good idea, tnx
[23:53] <joshd> xen probably has a way to do it too, but I'm not that familiar with it
[23:54] <joshd> if you're using kernel rbd though, I'd guess iostat would be useful
[23:54] <mormonitor> dmick: thank you for your support and patience with me ;) i am strong-headed
[23:54] <tnt> joshd: yes, and I track IO on some of them using hooks, but as it happens that IO was on one of the "unmonitored" disks, and I was just wondering if ceph itself has instrumentation hooks.
[23:55] <joshd> nothing at a per-image level like that, just per-osd stats
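[Note: the per-OSD stats joshd mentions can be read from the admin socket, and on a kernel rbd client ordinary block-device tools give per-image numbers; a sketch assuming osd.0 and an image mapped at /dev/rbd0:
    # cumulative per-OSD performance counters (ops, bytes, latencies)
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
    # per-device (and therefore per-mapped-image) I/O rates on the client
    iostat -x /dev/rbd0 5
Neither gives a cluster-wide per-image breakdown; that still has to be correlated by hand.]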
[23:55] <dmick> mormonitor: thank you as well, we're all working toward an easier process
[23:57] <mormonitor> dmick: today there are a few messages on the ceph-users mailing list from several people having the same issues. it would be helpful to fix the docs and point them to a possible solution

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.