#ceph IRC Log


IRC Log for 2011-06-16

Timestamps are in GMT/BST.

[0:10] <Tv> so what exactly does "ceph auth add" do? how is it different from having the key in mon.0's ceph.keyring in the first place?
[0:13] * alexxy (~alexxy@79.173.81.171) Quit (Ping timeout: 480 seconds)
[0:14] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[0:14] * dr_bibble (5138106c@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:21] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[0:21] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[0:23] <yehudasa> Tv: ceph auth add is the equivalent of modifying /etc/passwd
[0:23] <Tv> that's a bit too abstract ;)
[0:23] <yehudasa> the mons keep track of all the users/keys in the system, and it generates a new one
[0:24] <Tv> does it do more than having the same key with the right name in ceph.keyring on the mon before cmon is started?
[0:24] <yehudasa> the mon.0 keyring is just the monitors shared key
[0:25] <yehudasa> in the ceph auth add you can add client keys, osd keys, etc.
[0:25] <Tv> i can add those with vi ceph.keyring
[0:25] <Tv> does it do more?
[0:25] <yehudasa> it distributes the keys between all the monitors
[0:26] <Tv> assume i put the key in ceph.keyring on every monitor node -- does ceph auth add do more than that?
[0:27] <yehudasa> I'm not sure if the static keyring file is being read more than once on initialization
[0:27] <Tv> i'm fine with that
[0:27] <cmccabe> I'm pretty sure that file is only read once in common_init_finish
[0:27] <yehudasa> but in order to add users you'll need to restart the monitors
[0:27] <Tv> but there's no extra logic in "ceph auth add", having a key by the same name in ceph.keyring when cmon starts is equivalent?
[0:27] <cmccabe> for what it's worth
[0:30] <yehudasa> Tv: according to sage the user/keys creation happens only when doing mkfs
[0:30] <Tv> i'm fine with that too ;)
[0:31] <Tv> so at cmon mkfs time, it reads ceph.keyring and creates some persistent "user" state for every key in there; and ceph auth add adds to this state at runtime ?
[0:31] <yehudasa> Tv: as long as you're happy
[0:31] <yehudasa> Tv: yes
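
To ground the exchange above: the entry that cmon reads from the keyring at mkfs time and the entity created by "ceph auth add" at runtime have the same shape. A minimal sketch, assuming the standard keyring file format; the entity name, key value, and file path are placeholders, and exact command syntax may differ between versions:

    # keyring entry present when cmon runs mkfs -- becomes persistent auth state:
    [client.foo]
            key = AQBexampleexampleexampleexample==

    # roughly equivalent operation at runtime; the monitors also propagate it among themselves:
    ceph auth add client.foo -i /etc/ceph/client.foo.keyring
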
[0:33] * macana (~ml.macana@159.226.41.129) Quit (Read error: Connection reset by peer)
[0:33] * macana (~ml.macana@159.226.41.129) has joined #ceph
[0:44] * verwilst (~verwilst@dD5769416.access.telenet.be) Quit (Quit: Ex-Chat)
[1:11] * fred_ (~fred@2-113.79-83.cust.bluewin.ch) has joined #ceph
[2:03] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[2:03] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[2:06] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Read error: Operation timed out)
[2:27] * yoshi (~yoshi@p24092-ipngn1301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:33] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:38] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[2:48] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[2:48] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:55] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has left #ceph
[3:17] * todin (tuxadero@kudu.in-berlin.de) Quit (Read error: Connection reset by peer)
[3:26] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[3:31] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[3:42] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[3:52] * fred_ (~fred@2-113.79-83.cust.bluewin.ch) Quit (Quit: Leaving)
[4:23] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:43] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[6:29] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[8:04] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[8:10] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[8:29] * gregaf1 (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[8:36] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[8:52] * allsystemsarego (~allsystem@188.27.164.204) has joined #ceph
[9:39] * bhem (~bhem@1GLAAB9Q7.tor-irc.dnsbl.oftc.net) has joined #ceph
[11:31] * zenon (591e7c93@ircip2.mibbit.com) has joined #ceph
[12:13] * bhem (~bhem@1GLAAB9Q7.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[12:40] * yoshi (~yoshi@p24092-ipngn1301marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:54] * zenon (591e7c93@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[13:24] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:57] * sayotte (~ircuser@208.89.100.110) Quit (Remote host closed the connection)
[13:57] * sayotte (~ircuser@208.89.100.110) has joined #ceph
[14:14] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[14:35] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[15:00] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[15:18] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[15:30] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[15:37] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[15:41] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[15:53] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[15:55] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[16:24] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[16:30] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[16:31] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[16:38] * fred_ (~fred@2-113.79-83.cust.bluewin.ch) has joined #ceph
[16:38] <fred_> hi
[16:38] <fred_> wido, are you around?
[16:38] * lx0 (~aoliva@83TAABT1T.tor-irc.dnsbl.oftc.net) Quit (Read error: Connection reset by peer)
[16:39] * lx0 (~aoliva@1GLAAB9ZF.tor-irc.dnsbl.oftc.net) has joined #ceph
[16:45] <damoxc> does anyone know how to get around mds stuck in up:resolve
[16:45] <damoxc> all my pgs are active+clean
[16:47] <damoxc> appears to be something with subtree bounds being bad
[16:47] * greglap (~Adium@166.205.138.104) has joined #ceph
[16:57] * MattCampbell (~matt@ppp-70-130-44-76.dsl.wchtks.swbell.net) has joined #ceph
[16:57] <greglap> damoxc: so your MDS is crashing?
[16:57] <MattCampbell> Are there any large-scale deployments of Ceph yet, or is it still too early in the development process for that?
[16:58] <damoxc> greglap: yes, i've just attached some logs to #1170, although that may be the wrong issue
[16:58] <damoxc> greglap: it's the same stack-trace though
[16:58] <greglap> MattCampbell: I don't think there are any that are publicly known
[16:59] <MattCampbell> Has anyone written a comparison of Ceph and GlusterFS? They seem to be quite similar in their goals.
[16:59] <greglap> we at Dreamhost have a pretty big RADOS cluster that we're going to open up an S3-lookalike with, I think it's 40*5 or something
[17:00] <greglap> damoxc: yeah, probably the same issue
[17:00] <greglap> it's in my queue and I can bump it up in priority but I'm afraid I don't have a timeframe yet :(
[17:01] <greglap> multi-MDS systems just aren't stable yet although hopefully we're down to our last handful of bugs; I've been doing a lot of testing and sage and I took some time on it together last week
[17:01] <damoxc> greglap: no worries :-)
[17:01] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[17:02] <damoxc> greglap: is there any way to get the cluster to come back up, short of re-creating the whole thing?
[17:02] <greglap> not that will guarantee your data's good :/
[17:02] <greglap> you can wipe out the MDS journal for the MDS that's breaking, though
[17:03] <greglap> that will let it start up but you might lose some metadata/data
[17:03] <damoxc> hmm 6 / 13 are dead currently
[17:03] <greglap> MattCampbell: I'm not aware of any explicit comparisons
[17:03] <greglap> you've got 13 MDSes?
[17:03] <greglap> Oy vey
[17:04] <damoxc> greglap: yes, 8 active, 5 standby
[17:04] <MattCampbell> Even 8 MDSes sounds like a very high number unless you've got petabytes of storage.
[17:04] <greglap> ah, that makes moderately more sense
[17:04] <greglap> still, you can't possibly have 8 MDSes worth of metadata activity?
[17:05] <damoxc> I wanted to see how many files it could handle in a directory
[17:05] <greglap> so do you actually have 6 dead ones, or just one bad journal that all the standbys keep running through and crashing on? ;)
[17:05] <greglap> oh, that's not really a function of the number of MDSes...
[17:05] <damoxc> ah, sage said in a ticket that it might improve things
[17:06] <greglap> with fragmentation enabled (which I think it is in 0.29?) it doesn't really care how many files in the dir
[17:06] <greglap> more MDSes will allow more active files
[17:06] <damoxc> I was testing with another cluster (that's also stuck in up:resolve) and with a single mds that crawled to a snail's pace with 300k files in a single directory
[17:06] <greglap> but if it's just total number then the dir will get fragmented and it shouldn't matter much
[17:07] <greglap> did you have fragmentation enabled on that one?
[17:07] <greglap> MattCampbell: was there anything specific you were looking for in Ceph vs Gluster?
[17:08] <greglap> I'm not real familiar with Gluster but I've read a little about it...
[17:08] <damoxc> I can't remember if it crashed before or after I enabled the fragmentation stuff
[17:08] <damoxc> I have a feeling it may be after
[17:08] <damoxc> there's only 3 on that cluster
[17:10] <MattCampbell> Well, I do have a specific question about Ceph: How complete is Ceph's POSIX-compliant fs layer? For example, could I run an IMAP server like Dovecot on top of a Ceph filesystem?
[17:10] <greglap> "Ceph" is the posix-compliant layer, and it's...very complete?
[17:11] <MattCampbell> ah, thanks for the clarification
[17:11] <greglap> more than NFS, anyway
[17:11] <MattCampbell> so Ceph is the filesystem and RADOS is the object storage?
[17:11] <greglap> yeah
[17:11] <MattCampbell> Oh wow, better than NFS?
[17:11] <greglap> yeah
[17:12] <greglap> the only difference I can recall off the top of my head is that sparse files only take up the amount of space required, but they show up in df as if they were non-sparse
[17:13] <greglap> I'm sure there are others, but I don't think any of them are involved in consistency semantics
[17:13] <MattCampbell> that one is no big deal
[17:13] <MattCampbell> I gather, then, that Ceph is also much better than any of the attempts to implement a FUSE filesystem on top of S3
[17:14] <greglap> I haven't looked at them, but that's a safe bet
[17:14] <MattCampbell> They look pretty shoddy to me.
[17:14] <greglap> I mean, its whole purpose in life is to be a real filesystem :)
[17:15] <MattCampbell> Given that Amazon implemented S3 and Rackspace Cloud did something very similar, I started to wonder if a distributed POSIX-compliant filesystem is even possible.
[17:16] <greglap> well, it's a lot more complicated than a simple object store
[17:16] <MattCampbell> Complicated for the system admin(s), or merely complicated to implement?
[17:17] <damoxc> the latter
[17:17] <greglap> mmm, kinda both
[17:17] <damoxc> posix has acl etc.
[17:17] <greglap> very complicated to implement
[17:17] <greglap> once implemented, complicated to design a good topology
[17:17] <greglap> once designed, shouldn't be hard to maintain
[17:18] <greglap> the OSD code we have right now is ~15k lines, the monitor code (some of which is for Ceph, but most of which is for RADOS) is ~10k lines, and the MDS code is ~50k lines
[17:18] <greglap> (the MDS code is all for Ceph)
[17:18] <greglap> there's a similar difference in complexity between the clients for RADOS and Ceph
[17:20] <greglap> and of course this code originated in a research lab, it's not the kind of thing a company like Amazon or Rackspace is going to develop on their own
[17:20] <MattCampbell> You said DreamHost is about to deploy an S3-like service on top of RADOS. So is DreamHost using Ceph as well, or just RADOS?
[17:20] <MattCampbell> (Or is that not public knowledge?)
[17:20] <greglap> nah, we're open about it
[17:20] <greglap> just RADOS right now
[17:21] <greglap> there's a closed semi-public playground that's on top of Ceph so we could try out a long-term deployment, but it's not for live data
[17:22] <MattCampbell> Seems like Ceph could be really useful for Dreamhost, since lots of web apps need a POSIX filesystem.
[17:22] <greglap> it's a long-term goal :)
[17:23] <damoxc> MattCampbell: it would be best for dovecot if it used rados directly I think
[17:23] <MattCampbell> Is it safe to say that one should only use a POSIX filesystem for existing apps, and an object store for new apps?
[17:23] <greglap> our company has gotten very good at running shared hosting the traditional way, and we don't want to migrate to something that we aren't 100% sure will work!
[17:24] <MattCampbell> since a POSIX filesystem is way more complicated
[17:24] <greglap> well, it depends on what you're doing and what you need your storage to do
[17:25] <greglap> if you actually want POSIX semantics you're just shifting the complexity from the well-tested filesystem to your less-tested application ;)
[17:25] <damoxc> greglap: does rados support locking?
[17:25] <greglap> but if you need it to scale to millions of simultaneous users you're going to have a way easier time finding object stores that can do that than posix filesystems
[17:26] <greglap> damoxc: it doesn't have a built-in locking system, no
[17:31] <MattCampbell> At the company I work for, we're using Amazon Web Services. A large proportion of our storage is in S3 and will probably stay there (especially given the integration with AWS's CDN), but some of our most important data is on a single EBS (block storage) volume.
[17:31] <MattCampbell> So I'm looking to eliminate the single point of failure.
[17:32] <MattCampbell> And among that data is mailboxes for our customers. So if Dovecot can run on top of Ceph, then Ceph might be a good choice.
[17:33] <MattCampbell> No, we're not a shared web hosting company, so no competition. :)
[17:33] <greglap> heh
[17:33] <greglap> I can't imagine why it wouldn't run on top of Ceph
[17:33] <greglap> you don't just want to RAID-1 the EBS or something, though?
[17:33] <greglap> (or are you planning to move it out of the Amazon cloud?)
[17:34] <greglap> my train's in, be back in 15!
[17:35] <MattCampbell> EBS includes mirroring, and I also take snapshots to S3, but I was thinking more about the fact that the EBS volume can only be attached to a single EC2 instance, and if that EC2 instance goes down...
[17:35] * greglap (~Adium@166.205.138.104) Quit (Read error: Connection reset by peer)
[17:36] <MattCampbell> I might also consider using RADOS instead of S3, just to avoid the vendor lock-in should we ever want to leave AWS.
[17:38] <damoxc> MattCampbell: there's rbd too
[17:41] <MattCampbell> I know about RBD. But even with a virtual block device like RBD that's backed by a distributed object store, the block device itself can only be attached to one machine at a time. The distributed filesystem is more interesting to me.
[17:43] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:43] <MattCampbell> damoxc: You suggested the possibility of modifying Dovecot to use RADOS. Do you know of any IMAP server that uses an object store rather than a POSIX filesystem?
[17:43] <damoxc> MattCampbell: I just remember seeing the plans to support cassandra, which is effectively a key/value store also
[17:44] * aliguori (~anthony@32.97.110.59) has joined #ceph
[17:44] <damoxc> MattCampbell: http://www.dovecot.org/list/dovecot/2009-August/041983.html
[17:44] <damoxc> MattCampbell: a quick search doesn't yield much activity, although there may have been some; I haven't checked the source
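
For a feel of the object-store interface such a mail backend would sit on (librados in an actual implementation), the rados command-line tool exposes the same basic put/get/list operations. A rough sketch, assuming a pool called "mail" and hypothetical object and file names; subcommand syntax has shifted between releases:

    rados mkpool mail
    rados -p mail put user1234.INBOX.00001 /tmp/00001.eml
    rados -p mail ls
    rados -p mail get user1234.INBOX.00001 /tmp/out.eml
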
[17:47] <MattCampbell> On a different subject, is it better to run an object storage node on a RAID array, or directly on a physical disk?
[17:48] <damoxc> having SSDs to store the journals on is nice
[17:48] <MattCampbell> Put a different way, if I have a box with, say, 3 disks in it, would it be better to RAID them and run object storage on the RAID array, or to run 3 object storage nodes on the disks themselves?
[17:49] <damoxc> aside from the journals I don't really know, may be best to wait for greglap to get back
[17:53] <gregaf1> MattCampbell: that's something that we and various other people are still exploring
[17:54] <gregaf1> for now we're generally trying to run an OSD per disk because it minimizes recovery time, but there is a cost in processing power
[17:55] <damoxc> would running the disks in raid-0 be a good or bad idea?
[17:55] <damoxc> as a rule of thumb
[17:56] <gregaf1> you could, I wouldn't
[17:56] <damoxc> bad then
[17:56] <gregaf1> you'd increase the likelihood of a node going down and paying the reconstruction cost for all the disks if only one disk failed
[17:57] <gregaf1> but it would cost less in terms of hardware and compute power
[17:58] * damoxc nods
[17:59] <gregaf1> these kinds of questions are ones that we don't really have enough data for yet -- we can approximate it using what we know about the system, but we might just be completely wrong
[18:01] <MattCampbell> So running an OSD on a physical disk yields faster recovery time than any of the varieties of RAID?
[18:02] <gregaf1> I just mean that if you run an OSD per disk then if one disk fails you're only recovering one disk's worth of data -- if you're in RAID0 then you're recovering all of them
[18:02] <gregaf1> if you run a real RAID then losing one disk is all local recovery, but losing two disks (and correlated failures suck and are pretty typical) means you're recovering the entire physical node's worth of data instead of two disks' worth of data
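
As an illustration of the one-OSD-per-disk layout gregaf1 describes, with journals on an SSD as damoxc mentioned earlier, a ceph.conf sketch; hostnames, mount points, and device paths are hypothetical and option spellings may vary by version:

    [osd]
            osd data = /srv/osd.$id            ; one disk mounted per OSD
            osd journal = /srv/ssd/osd.$id.journal
            osd journal size = 1000            ; MB

    [osd.0]
            host = node1
    [osd.1]
            host = node1
    [osd.2]
            host = node1

Each [osd.N] then owns exactly one disk's worth of data, so a single disk failure only triggers re-replication of that OSD's share rather than the whole node's.
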
[18:04] <damoxc> gregaf1: how do you wipe a metadata server's journal?
[18:04] <gregaf1> there's a flag you pass to the cmds binary
[18:05] <gregaf1> if you run ./cmds --help it should tell you? (or the man page?)
[18:05] <damoxc> --reset-journal by the looks of things
[18:05] <gregaf1> sounds right
[18:06] <damoxc> what is the rank parameter?
[18:06] <gregaf1> MDS num
[18:06] <fred_> gregaf1, do you know which of you guys will be able to help me with #1191?
[18:06] <damoxc> gregaf1: thanks
[18:07] <gregaf1> fred_: probably Josh, but let me check it out
[18:07] <gregaf1> we should get to it today since it's on the stable branch :)
[18:08] <fred_> gregaf1, oh that is a good reason to follow the stable branch :)
[18:09] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:16] <damoxc> gregaf1: what would I lose by clearing all the mds journals in a cluster?
[18:16] <gregaf1> damoxc: the mds writes metadata changes to the journal and then later flushes them out to the appropriate on-disk objects
[18:16] <damoxc> gregaf1: so any recent data will be lost?
[18:17] <gregaf1> you're going to lose most of the metadata changes that are in the journal
[18:17] <gregaf1> recent metadata, yes
[18:17] <damoxc> and rank, is that the name of the metadata server or the number that ceph assigns it?
[18:17] <gregaf1> so new files won't be linked in to the tree, old files might have wrong atimes or recorded sizes, etc
[18:17] <gregaf1> rank is the number that it's assigned
[18:18] <damoxc> how do you know that if you haven't started the server yet?
[18:18] <gregaf1> MDS daemons have a name which associates them with the physical server they occupy, and they're given a logical ID which is the rank
[18:19] <gregaf1> the journals and directory authorities belong to a rank, which is how they're moved between different daemons if one crashes
[18:19] <gregaf1> I'd look at crash logs and see which rank they were, it should be in the debug output of most lines
[18:19] <damoxc> mds7.journal
[18:19] <damoxc> so that's 7
[18:19] <gregaf1> yeah
[18:20] <damoxc> okay thanks
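
Putting the pieces of this exchange together, resetting the journal for the crashing rank looks roughly like the following; the daemon name passed with -i is hypothetical, rank 7 comes from the mds7.journal example above, and the exact invocation should be checked against cmds --help, with greglap's caveat that recent metadata changes are lost:

    # stop the affected cmds daemon, then:
    cmds -i alpha --reset-journal 7
    # restart it (or let a standby claim rank 7) afterwards
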
[18:32] <damoxc> cmds crashes with a stack trace if you pass an invalid rank
[18:33] <damoxc> should I file a bug report or is that a known issue?
[18:34] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:34] <gregaf1> oh, make a bug
[18:36] <damoxc> looks like clearing the mds journals has fixed things
[18:39] <damoxc> thanks for all your help
[18:39] <gregaf1> any time
[18:41] <MattCampbell> Can multiple rgw instances run concurrently?
[18:42] <gregaf1> yes, though I don't remember the requirements/rules....yehudasa?
[18:43] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[18:43] <yehudasa> yes
[18:44] <gregaf1> okay, so it's just all on-disk, nothing's cached in-memory
[18:46] <gregaf1> wido: you around?
[18:46] <gregaf1> wanted to ask you about your monitor OOM
[18:46] <gregaf1> analyzing the heap dump requires a matching binary
[18:47] <gregaf1> and I wasn't sure if you had any newer ones or if it was currently running or what
[18:53] * Yulya_ (~Yu1ya_@ip-95-220-224-191.bb.netbynet.ru) has joined #ceph
[18:58] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:03] <wido> gregaf1: I'm around
[19:03] <gregaf1> ah, good
[19:03] <wido> The binary should match, I might have upgraded, don't know for sure
[19:04] <gregaf1> the binary on the machine?
[19:04] <gregaf1> I didn't seen one in the tar and I couldn't find any heap dumps on monitor, but I might not have looked in the right place
[19:04] <wido> I removed the heap dumps on monitor, my bad
[19:04] <wido> the binary is from june 14th
[19:05] <wido> So I guess I upgraded to try another fix
[19:05] <wido> I know why
[19:05] <wido> gregaf1: due to: 52e9e5ec339a4aa996731604bcc0ff95e6659d3b
[19:06] <gregaf1> ah, right
[19:06] <gregaf1> so there aren't any newer heap dumps?
[19:06] <wido> I could go back to another revision and build a new binary, that should match, right?
[19:06] <gregaf1> if you know which one it was running before :)
[19:07] <wido> No, weird thing is, it didn't crash since then
[19:07] <Tv> compiles aren't reproducible, in general
[19:07] <Tv> so re-compiling the same source does not result in the same output
[19:07] <gregaf1> yeah, it looks like it's sitting at ~140MB right now?
[19:07] <gregaf1> Tv: if you build it in the same environment it will, odds are good on the same machine
[19:08] <wido> Yes, it looks like it. It might be related to the OSD's which were hammering the mon due to a bug there
[19:08] <gregaf1> hmm, wonder if that means it's leaking in the pg_stats path or something
[19:09] <gregaf1> maybe you can turn on the heap profiling now and if it gets killed again we'll have data leading up to that point (and keep the binary!)
[19:09] <wido> Since the heap dumps are from june 10th, my guess is that it was with c2de9e6a
[19:09] <gregaf1> if you want to get me a copy I can try analyzing with that too
[19:09] <wido> I always build in the morning
[19:10] <wido> I'll start the profiler, we'll see if it goes OOM
[19:11] <cmccabe> tv: perhaps we should set -frandom-seed when compiling
[19:11] <cmccabe> tv: and, of course, get rid of automake for CMake (ooh, I went there)
[19:12] <wido> gregaf1: I'm building a cmon binary with c2de9e6a now, might work
[19:12] <wido> where do you want it?
[19:15] <wido> gregaf1: got to go, I've placed the new cmon binary on monitor in /root
[19:17] <sagewk> standup!
[19:21] * Yulya_ (~Yu1ya_@ip-95-220-224-191.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[19:24] <MattCampbell> What does RADOS stand for?
[19:34] <gregaf1> wido: thanks!
[19:34] <bchrisman> heh... sorry.. was in mtg
[19:34] <gregaf1> MattCampbell: Reliable Autonomous Distributed Object Store
[19:35] <sagewk> np
[19:37] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:39] <fred_> joshd, around?
[19:40] <joshd> yup
[19:40] <gregaf1> oh, I was wrong about that -- sjust is taking a look at your bug, fred_
[19:40] <fred_> thanks
[20:16] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has left #ceph
[20:48] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[20:49] <MattCampbell> Does RADOS itself provide multi-user support (authentication, permissions), or is that implemented in rgw?
[21:01] * fred_ (~fred@2-113.79-83.cust.bluewin.ch) Quit (Quit: Leaving)
[21:02] <Tv> MattCampbell: rados' level of access control is "can you read/write this pool of objects"
[21:02] <yehudasa> MattCambell: rados does provide that, but it's also implemented above it in rgw
[21:02] <yehudasa> basically what Tv said
[21:02] <Tv> rgw does something more finegrained on top of that
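
A sketch of what that pool-level access control looks like as capabilities attached to a key; the client name, key value, and pool name are placeholders, and the caps grammar has changed somewhat across versions:

    [client.rgw]
            key = AQBexampleexampleexampleexample==
            caps mon = "allow r"
            caps osd = "allow rw pool=rgw"

rgw then layers its own per-user and per-bucket permissions on top of this coarse, pool-wide grant.
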
[21:09] <MattCampbell> Incidentally, I'm surprised anyone would implement a web service in C++. Personally, I would have written rgw in Python.
[21:36] * jantje_ (~jan@paranoid.nl) has joined #ceph
[21:38] * jantje (~jan@paranoid.nl) Quit (Ping timeout: 480 seconds)
[21:59] * gregorg_taf (~Greg@78.155.152.6) has joined #ceph
[21:59] * gregorg (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[22:01] * aliguori (~anthony@32.97.110.59) Quit (Quit: Ex-Chat)
[22:01] * gregorg_taf (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[22:01] * aliguori (~anthony@32.97.110.65) has joined #ceph
[22:04] * gregorg (~Greg@78.155.152.6) has joined #ceph
[22:04] * allsystemsarego (~allsystem@188.27.164.204) Quit (Quit: Leaving)
[22:10] * mib_765z2b (5138106c@ircip1.mibbit.com) has joined #ceph
[22:11] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:11] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:11] * aliguori (~anthony@32.97.110.65) Quit (Ping timeout: 480 seconds)
[22:20] * aliguori (~anthony@32.97.110.64) has joined #ceph
[22:22] * verwilst (~verwilst@dD5769475.access.telenet.be) has joined #ceph
[22:56] * verwilst (~verwilst@dD5769475.access.telenet.be) Quit (Quit: Ex-Chat)
[22:56] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[23:06] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[23:15] * cmccabe1 (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[23:15] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[23:19] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[23:26] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[23:45] <sagewk> uhh, just lost network for sepia nodes
[23:45] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[23:47] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[23:49] <darkfaded> rebuild them to use json and xmlrpc
[23:49] <darkfaded> (scnr)
[23:50] <Tv> sagewk: me too
[23:50] <Tv> sagewk: i'm afraid you asked NOC to touch the machines ;)
[23:51] <Tv> this has happened before
[23:52] <sagewk> nobody touching them today, apparently. looks like something else
[23:57] * aliguori (~anthony@32.97.110.64) Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.