#ceph IRC Log

Index

IRC Log for 2012-01-20

Timestamps are in GMT/BST.

[0:03] * The_Bishop (~bishop@e179009011.adsl.alicedsl.de) has joined #ceph
[0:20] * aliguori (~anthony@32.97.110.59) Quit (Quit: Ex-Chat)
[0:37] * vodka (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) has joined #ceph
[0:50] * fronlius (~fronlius@f054101151.adsl.alicedsl.de) Quit (Quit: fronlius)
[0:52] * BManojlovic (~steki@212.200.243.100) Quit (Remote host closed the connection)
[1:00] * adjohn is now known as Guest24555
[1:00] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[1:01] * Guest24555 (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Read error: Operation timed out)
[1:03] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[1:03] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[1:25] * Tv|work (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[1:36] * vodka (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[1:46] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:48] <iggy> how big of a journal device does one need (let's say per TB)
[2:18] <dwm_> iggy: From memory, I think it's more a function of write throughput rather than raw capacity.
[2:18] <iggy> so it should be sized based on expected usage patterns... that could be tough
[2:20] <dwm_> Possibly more based on the bandwidth of your disk devices.
[2:21] <iggy> any kind of metric to go on though?
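The metric dwm_ is gesturing at was later written down as a rule of thumb: the journal should absorb roughly twice what the disk can write in one filestore sync interval. A back-of-the-envelope sketch (the figures below are hypothetical, not a recommendation):

```shell
# Hedged rule of thumb: journal size ≈ 2 * expected throughput * sync interval.
# Hypothetical figures for a single spinning disk:
throughput_mb_s=120      # sustained write throughput of the backing disk (MB/s)
sync_interval_s=5        # filestore max sync interval (seconds)
journal_mb=$((2 * throughput_mb_s * sync_interval_s))
echo "${journal_mb} MB"  # prints "1200 MB"
```

So the sizing really does follow bandwidth, not raw capacity: a 1 TB and a 3 TB disk with the same throughput would want the same journal.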
[2:32] * vodka (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) has joined #ceph
[2:36] * vodka (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) Quit ()
[3:04] <ajm> hrm, if i try to run >1 OSD on a machine
[3:04] <ajm> ./common/Mutex.h: In function 'void Mutex::Unlock()', in thread '7f7a7ed34780'
[3:04] <ajm> ./common/Mutex.h: 117: FAILED assert(nlock > 0)
[3:17] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:25] * jojy (~jvarghese@108.60.121.114) Quit (Quit: jojy)
[4:00] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Quit: adjohn)
[4:25] * The_Bishop (~bishop@e179009011.adsl.alicedsl.de) Quit (Quit: Who the hell is this Peer? If I ever catch him I'll reset his connection!)
[4:48] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:53] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:03] * adjohn (~adjohn@173-164-152-85-SFBA.hfc.comcastbusiness.net) has joined #ceph
[5:16] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[5:20] * MarkDude (~MT@70.42.240.21) has joined #ceph
[5:38] * izdubar (~MT@70.42.240.21) has joined #ceph
[5:38] * MarkDude (~MT@70.42.240.21) Quit (Read error: Connection reset by peer)
[5:44] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[6:07] * WalterWoden (~wodencafe@19NAAF1OS.tor-irc.dnsbl.oftc.net) has joined #ceph
[6:09] * WalterWoden (~wodencafe@19NAAF1OS.tor-irc.dnsbl.oftc.net) Quit ()
[6:45] * MarkDud (~MT@70.42.240.21) has joined #ceph
[6:45] * izdubar (~MT@70.42.240.21) Quit (Read error: Connection reset by peer)
[6:48] * Sandra2012 (~sandraX@pool-71-168-103-20.cncdnh.east.myfairpoint.net) has joined #ceph
[6:48] <Sandra2012> Come chat with me guys.. http://bit.ly/wkbzel
[6:48] * Sandra2012 (~sandraX@pool-71-168-103-20.cncdnh.east.myfairpoint.net) has left #ceph
[7:58] * izdubar (~MT@70.42.240.21) has joined #ceph
[7:58] * MarkDud (~MT@70.42.240.21) Quit (Read error: Connection reset by peer)
[8:16] * adjohn (~adjohn@173-164-152-85-SFBA.hfc.comcastbusiness.net) Quit (Quit: adjohn)
[8:20] * ssedov (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[8:23] * stass (stas@ssh.deglitch.com) has joined #ceph
[8:44] * izdubar (~MT@70.42.240.21) Quit (Ping timeout: 480 seconds)
[8:49] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[8:56] * Kioob`Taff1 (~plug-oliv@89-156-116-126.rev.numericable.fr) Quit (Quit: Leaving.)
[8:56] * s15y1 (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[8:57] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[8:57] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[9:02] * Kioob`Taff (~plug-oliv@89-156-116-126.rev.numericable.fr) has joined #ceph
[9:10] * verwilst (~verwilst@dD576F4A9.access.telenet.be) has joined #ceph
[9:14] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:15] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Connection reset by peer)
[9:23] * Kioob`Taff2 (~plug-oliv@89-156-116-126.rev.numericable.fr) has joined #ceph
[9:23] * Kioob`Taff (~plug-oliv@89-156-116-126.rev.numericable.fr) Quit (Read error: Connection reset by peer)
[9:33] * Kioob`Taff (~plug-oliv@neu69-1-82-232-160-30.fbx.proxad.net) has joined #ceph
[9:40] * Kioob`Taff2 (~plug-oliv@89-156-116-126.rev.numericable.fr) Quit (Ping timeout: 480 seconds)
[9:47] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[10:19] * henrycc (~chatzilla@59-124-35-221.HINET-IP.hinet.net) has joined #ceph
[10:23] <henrycc> Hi all, how can I change the osd full/nearfull ratio with the current version of ceph? It seems there is no mon command to do this....
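For reference, the ratios are exposed as monitor config options that can be set in ceph.conf before the mons start (option names taken from later Ceph documentation; whether a runtime mon command existed in the release henrycc is running is exactly his open question):

```ini
; sketch: set at monitor start-up via ceph.conf (names from later docs)
[mon]
    mon osd full ratio = .95
    mon osd nearfull ratio = .85
```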
[10:30] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:28] * spadaccio (~spadaccio@213-155-151-233.customer.teliacarrier.com) has joined #ceph
[11:54] * henrycc (~chatzilla@59-124-35-221.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[12:30] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) has joined #ceph
[12:46] * Kioob`Taff (~plug-oliv@neu69-1-82-232-160-30.fbx.proxad.net) Quit (Quit: Leaving.)
[12:56] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[13:00] * Kioob`Taff1 (~plug-oliv@89-156-116-126.rev.numericable.fr) has joined #ceph
[13:10] * Kioob`Taff2 (~plug-oliv@neu69-1-82-232-160-30.fbx.proxad.net) has joined #ceph
[13:12] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) has joined #ceph
[13:16] * Kioob`Taff1 (~plug-oliv@89-156-116-126.rev.numericable.fr) Quit (Ping timeout: 480 seconds)
[13:17] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[13:28] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) has joined #ceph
[13:36] * gohko (~gohko@natter.interq.or.jp) Quit (Quit: Leaving...)
[13:36] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[13:37] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:41] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[13:54] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[14:06] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:37] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) has joined #ceph
[14:58] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[15:14] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Ping timeout: 480 seconds)
[15:35] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[15:42] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) has joined #ceph
[15:45] * Kioob`Taff2 (~plug-oliv@neu69-1-82-232-160-30.fbx.proxad.net) Quit (Quit: Leaving.)
[16:43] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[16:49] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) Quit (Ping timeout: 480 seconds)
[16:57] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) has joined #ceph
[17:26] <fred_> wow, Christian Brunner did something fantastic: http://article.gmane.org/gmane.comp.file-systems.btrfs/15413
[17:27] <fred_> If this problem gets fixed, my last roadblock for using ceph everyday will be no more
[17:29] * adjohn (~adjohn@70-36-197-222.dsl.dynamic.sonic.net) has joined #ceph
[17:34] * Tv|work (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:35] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:57] * adjohn (~adjohn@70-36-197-222.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:11] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:16] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:34] * spadaccio (~spadaccio@213-155-151-233.customer.teliacarrier.com) Quit (Quit: WeeChat 0.3.7-dev)
[18:49] * fred_ (~fred@80-219-180-134.dclient.hispeed.ch) Quit (Quit: Leaving)
[18:53] * joshd1 (~joshd@aon.hq.newdream.net) has joined #ceph
[19:01] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[19:29] * tjikkun (~tjikkun@82-169-255-84.ip.telfort.nl) has joined #ceph
[19:32] * adjohn is now known as Guest24632
[19:32] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[19:33] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Remote host closed the connection)
[19:33] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[19:33] * Guest24632 (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Read error: Connection reset by peer)
[19:34] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[19:52] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[20:01] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:25] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[20:53] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Quit: adjohn)
[20:57] * lollercaust (~paper@154.Red-83-43-125.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[21:16] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[21:17] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit (Quit: jojy)
[21:31] <NaioN> fred_: true, I'm also having that problem...
[21:31] <NaioN> well, before, btrfs crashed, but with 3.2.1 it doesn't crash anymore; now I see some orphans truncated/unlinked in dmesg
[21:32] <NaioN> but I also notice an extreme slowdown after a while and an increase in writes; I assume it's metadata that's being written
[21:35] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) Quit (Quit: Who the hell is this Peer? If I ever catch him I'll reset his connection!)
[21:40] <nhm> fred_: ah, those tests are very interesting!
[21:43] <NaioN> does anybody know how I can set mkfs and mount options for btrfs in the ceph.conf?
[21:45] <Tv|work> NaioN: the line that does the mkfs is just mkfs.btrfs $btrfs_devs
[21:46] <Tv|work> NaioN: and all the words in $btrfs_devs are expected to be block devices, so don't put options there
[21:46] <Tv|work> NaioN: so, please mkfs it yourself; btrfs_devs is more a quick-n-dirty thing
[21:50] <NaioN> hmmm alright :)
[21:50] <NaioN> just being lazy :)
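Tv's suggestion boils down to running the mkfs and mount by hand, with whatever options you want, before pointing the OSD at the directory. A sketch (device, mount options, and mount point are all hypothetical; DRYRUN defaults to echo so the commands are only previewed, set DRYRUN= to run them for real):

```shell
# Manual mkfs/mount instead of listing the device in btrfs_devs.
# DRYRUN defaults to echo (preview only); set DRYRUN= to actually execute.
DRYRUN=${DRYRUN:-echo}
dev=/dev/sdb            # hypothetical OSD disk
mnt=/srv/osd.0          # hypothetical osd data directory
$DRYRUN mkfs.btrfs "$dev"
$DRYN mkdir -p "$mnt"
$DRYRUN mount -o noatime "$dev" "$mnt"
```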
[21:56] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[21:59] * adjohn is now known as Guest0
[21:59] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[21:59] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Remote host closed the connection)
[21:59] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[22:05] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit (Quit: jojy)
[22:06] * Guest0 (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Ping timeout: 480 seconds)
[22:13] * verwilst (~verwilst@dD576F4A9.access.telenet.be) Quit (Quit: Ex-Chat)
[22:23] * adjohn is now known as Guest2
[22:23] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[22:24] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Remote host closed the connection)
[22:24] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[22:24] * Guest2 (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Read error: Connection reset by peer)
[22:37] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[22:38] * lollercaust (~paper@174.Red-83-34-192.dynamicIP.rima-tde.net) has joined #ceph
[22:52] * adjohn is now known as Guest3
[22:52] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[22:54] * lollercaust (~paper@174.Red-83-34-192.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[22:54] * adjohn is now known as Guest4
[22:54] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[22:57] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) Quit (Remote host closed the connection)
[23:00] * Guest3 (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Ping timeout: 480 seconds)
[23:00] * lollercaust (~paper@174.Red-83-34-192.dynamicIP.rima-tde.net) has joined #ceph
[23:00] * Guest4 (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Ping timeout: 480 seconds)
[23:13] <todin> Tv|work: you wrote the chef recipes for ceph? in the ceph.conf the recipe doesn't write the osd section, or does the barclamp do that?
[23:14] <Tv|work> todin: the ceph.conf is not intended to contain any dynamic sections
[23:14] <Tv|work> todin: osd come and go as disks fail and new ones are plugged in
[23:14] <Tv|work> todin: so it's all made to work with just the [osd] section
[23:14] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[23:15] <Tv|work> todin: that is, no [osd.ID] sections
[23:16] <todin> Tv|work: hmm, I don't get how that should work, I thought every osd needs a conf file with all the other osds in it
[23:16] <Tv|work> todin: nope
[23:16] <Tv|work> todin: all an osd needs is how to contact the monitors
[23:16] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit ()
[23:16] <Tv|work> and well data dir location etc, but that's easily templated
[23:17] <Tv|work> todin: an osd will get an "osdmap" from the monitors, and that tells what osds are alive; it doesn't use ceph.conf for that
[23:18] <todin> Tv|work: ohh, I did not know that. then it should be quite easy to write in chef
[23:18] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Read error: Connection reset by peer)
[23:18] <Tv|work> todin: i've been pushing to get it easier and easier
[23:19] <Tv|work> todin: current cookbook has two major limitations: 1) only one osd per host, fixed directory and not a mount point 2) one monitor total
[23:19] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[23:19] <todin> and the data dir could be the same on every osd, or should it be osd.$id?
[23:19] <Tv|work> todin: both are easy to fix, i'm just working on better QA tools for ceph as a whole first, then i'll get back to that
[23:19] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[23:20] <Tv|work> todin: see ceph-cookbooks.git/ceph/templates/default/ceph.conf.erb
[23:20] <Tv|work> todin: it needs monitor addresses passed in, everything else can be fixed in the config file; osd data = /srv/osd.$id
[23:20] <todin> Tv|work: yep. that's what I am looking atm
[23:21] <Tv|work> todin: i have plans for making the osd data dir always be /var/lib/ceph/osd/$id or something like that, and using EFI partition type uuids to detect all the osd data disks in the system
[23:21] <Tv|work> todin: but i don't currently have the time to implement that
[23:22] <Tv|work> todin: with that, once the disk is mounted, a symlink will be made there, so the resulting path is always fixed
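Putting Tv|work's points together, the static per-host ceph.conf he describes would look roughly like this (addresses are placeholders; the monitor addresses are the only per-cluster input, and there are deliberately no [osd.ID] sections):

```ini
; sketch of the static conf Tv|work describes -- not the cookbook's actual output
[mon.a]
    mon addr = 10.0.0.1:6789       ; placeholder monitor address
[osd]
    osd data = /srv/osd.$id        ; fixed template, expanded per daemon id
    osd journal = /srv/osd.$id/journal
```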
[23:22] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[23:22] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit ()
[23:22] <Tv|work> todin: and then using upstart etc new-fangled init systems to do job instances, e.g. "sudo initctl start ceph-osd OSD_ID=42"
[23:23] <Tv|work> todin: that's in there already, just.. hardcoded at setting up one osd per host, because i don't have the disk detection logic yet
[23:23] <Tv|work> heh, i called it /srv/ceph-fake-osd
[23:23] <Tv|work> explicitly labeled not ready ;)
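The upstart instance-job pattern Tv|work mentions would look something like this (file name and exec path are assumptions, a sketch rather than the cookbook's actual job file):

```conf
# /etc/init/ceph-osd.conf -- hypothetical upstart instance job
# one instance per osd id: sudo initctl start ceph-osd OSD_ID=42
instance $OSD_ID
respawn
exec /usr/bin/ceph-osd -i $OSD_ID -c /etc/ceph/ceph.conf
```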
[23:24] <todin> ok, I think I got it, I'll give it a try.
[23:25] <todin> and I did not get the barclamp to work, do you know if it works with 1.2?
[23:25] <Tv|work> templates/default/upstart-ceph-mon-osd.conf.erb is the loop that starts all the osds on the system
[23:25] <Tv|work> todin: last time i ran it was in November :(
[23:26] <Tv|work> todin: this is what i'm working on right now: http://newdreamnetwork.github.com/ferrocumulus/architecture/
[23:26] <Tv|work> we need to revamp our qa machine pool, and make reinstalls faster
[23:26] <Tv|work> so i'm working on things that improve everyone's productivity first, and product features only second
[23:26] <todin> Tv|work: that's what I need to do as well.
[23:26] <iggy> my boss is all bonerific about xcat
[23:27] <Tv|work> iggy: test machines -> several reimagings per day -> can't sit through preseeded installs all the time
[23:28] <Tv|work> i do recall looking at xcat, i don't recall why i didn't like it
[23:28] <nhm> Tv|work: Do you guys have automatic provisioning going on the test cluster?
[23:28] <Tv|work> nhm: that's what i'm doing
[23:28] <nhm> Tv: excellent
[23:29] <iggy> pxe+nfsroot?
[23:29] <Tv|work> iggy: oh god please no
[23:29] <Tv|work> nfsroot was the source of several annoyances with the old cluster a year ago, before i made them use local disks; now pxe is the source of the majority of the rest
[23:29] <todin> Tv|work: what is the timeframe for the reimaging? how fast does it have to be?
[23:30] <iggy> pxe+cephroot?
[23:30] <Tv|work> todin: "as fast as fast can be"
[23:30] <nhm> we use xcat on some stuff here, though I'm partial to pxe+kickstart+puppet
[23:30] <Tv|work> iggy: ipmi+local disk
[23:30] <todin> Tv|work: why not use crowbar?
[23:30] <Tv|work> todin: i'm looking at boot+copy <300MB from 10gig network+boot
[23:30] <nhm> SystemImager is pretty speedy.
[23:31] <Tv|work> todin: crowbar is most definitely not meant for this use case, and I wrote the ceph barclamp..
[23:31] <iggy> hell, for what you guys are doing, you could probably pack it all in an initramfs
[23:31] <Tv|work> iggy: but we need to test what the customers use -- i need to install actual distros
[23:32] <todin> but wasn't crowbar developed by dell to reimage their openstack systems?
[23:32] <Tv|work> todin: yes, buy me a beer some day and hear the story ;)
[23:32] <nhm> hehe
[23:32] <iggy> well...
[23:32] <Tv|work> crowbar is nice for what it does; crowbar does not do this
[23:33] <Tv|work> for one, i need to run lots of different distros, as unmodified as possible
[23:33] <Tv|work> how else will i do compatibility testing
[23:33] <iggy> testing what customers use seems contrary to fast (re)provisioning of an entire cluster for testing
[23:33] <Tv|work> oh right, "xCAT imaging with Partimage" "NOTE: This is not quite finished yet and I don't know when I'll get time to fix it in xCAT." "It is occasionally required to copy a hard drive for installation purposes ..."
[23:34] <Tv|work> completely wrong mindset for this
[23:34] <Tv|work> iggy: and *that* is where my trickery comes to play
[23:34] <nhm> iggy: depends on the testing. At least on the HPC side it's all automatic provisioning anyway.
[23:34] <todin> Tv|work: I see it. You need something more like a provisioning system for an ISP, which just installs many different distros, as unchanged as possible
[23:35] <Tv|work> frankly, when i looked at the requirements for a while, i realized i'm building a bare-metal "cloud"
[23:35] <Tv|work> sjust: feel free to hit me now
[23:35] <Tv|work> i can simplify it a lot, because i know the use cases are more limited
[23:36] <Tv|work> but it comes down to ec2-style "give me 7 machines running this! stat!"
[23:36] <nhm> Tv|work: how are you scheduling the provisioning?
[23:36] <Tv|work> nhm: sort of unclear.. i spent a large chunk of yesterday fighting mesos, it's just not good enough currently
[23:36] <Tv|work> nhm: or are you asking how it actually happens, as opposed to what are the allocation rules?
[23:36] <iggy> you'd think they'd have something like that in house already
[23:37] <Tv|work> pre-existing stuff is kinda heavyweight or simplistic
[23:37] <Tv|work> our current test cluster suffers from nightly runs DoSing it, etc
[23:38] <todin> Tv|work: is there anything out there that can do that? we wrote all the provisioning in house
[23:38] <nhm> Tv|work: I am curious if you are using anything like torque/sge/slurm to allocate the nodes for provisioning.
[23:38] <Tv|work> todin: a bug-free fork of mesos?-)
[23:38] <Tv|work> nhm: ah. not directly.
[23:38] <iggy> it'd be interesting to see your descent into insanity documented
[23:39] <iggy> i.e. we looked at xcat, but it had X drawbacks
[23:39] <Tv|work> nhm: a lot of the cluster schedulers assume they run a slave on every box; that's not true, these boxes get wiped
[23:40] <Tv|work> a lot of those are just stuck in an era they don't have to be anymore
[23:41] <nhm> gotta run, but we should talk more...
[23:43] <todin> Tv|work: and the machine you want to provision are bare-metal or virtual?
[23:43] <Tv|work> todin: bare-metal for this case, virtual is so much easier
[23:44] <Tv|work> todin: but we don't want to benchmark vms
[23:44] <Tv|work> todin: or worry about the origin of the funny IO latency
[23:45] <todin> you should look at isp tools, every isp with dedicated servers is doing it
[23:45] <Tv|work> todin: my employer is an isp with a dedicated server product ;)
[23:46] <todin> Tv|work: mine as well
[23:46] <Tv|work> todin: they get like 1 reinstall / year / machine or something -- don't know the numbers, but it's so much less
[23:46] <iggy> yeah, i said that a few mins ago
[23:46] <Tv|work> more than anything, this'll feed back their way
[23:46] <todin> Tv|work: you have the right customer, we get much more reinstalls
[23:47] <Tv|work> people love dreamhost, they have lots of customers who've been with them for 10 years
[23:48] <todin> hopefully those customers got new servers after a while ;-)
[23:48] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[23:48] <Tv|work> hah. most of that is shared hosting ;)
[23:49] <todin> a small provider has certainly some advantages
[23:50] <todin> but with the new input to the chef recipes I will try it again
[23:52] <todin> Tv|work: for the journal, is it better to use a file or a partition?
[23:52] <Tv|work> todin: that shouldn't matter, the question is what block device will it be on
[23:53] <Tv|work> these days swapfile and swap partition perform just as well; same should go for the journal
[23:53] <todin> you set the device via an attribute
[23:53] <Tv|work> pre-allocating the file may provide a benefit
[23:53] <Tv|work> i'm saying, the real difference is whether journal fights with actual data for IO, or not
[23:54] <todin> ok. so the journal is easy to do as well
[23:55] <todin> iirc you wrote in your email that only one mon is supported atm
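Tv's point about pre-allocating the journal file is a one-time setup step; a sketch (the path is hypothetical, and 1 MB is used here only to keep the example cheap to run, a real journal would be sized in GB):

```shell
# Pre-allocate the journal file so its blocks are laid out contiguously,
# instead of growing it on demand and fragmenting it.
journal=$(mktemp)            # stand-in for a path like /srv/osd.0/journal
fallocate -l 1M "$journal"   # real deployments would use, say, a few GB
stat -c %s "$journal"        # prints 1048576
rm -f "$journal"
```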
[23:56] * adjohn is now known as Guest9
[23:56] * Guest9 (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Read error: Connection reset by peer)
[23:56] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[23:58] <Tv|work> todin: yes, the cookbook currently makes simplifying assumptions
[23:59] <todin> where is the prob? couldn't you just get the mon addrs via search(:node, "recipes:ceph::mon") and just iterate through them?
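todin's idea, sketched as an erb template fragment (the search escaping and attribute names are assumptions, untested against the actual ceph-cookbooks repository):

```erb
<%# hypothetical: one [mon.N] section per node running the ceph::mon recipe %>
<% search(:node, 'recipes:ceph\:\:mon').each_with_index do |mon, i| %>
[mon.<%= i %>]
    mon addr = <%= mon['ipaddress'] %>:6789
<% end %>
```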

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.