#ceph IRC Log


IRC Log for 2012-01-09

Timestamps are in GMT/BST.

[0:54] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:58] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:10] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[1:23] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[1:29] * BManojlovic (~steki@212.200.243.100) Quit (Remote host closed the connection)
[2:46] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:51] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[2:56] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:15] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[9:05] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[9:16] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[9:25] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:28] * izdubar (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[9:33] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[9:55] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Read error: No route to host)
[9:55] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[10:11] * BManojlovic (~steki@93-87-148-183.dynamic.isp.telekom.rs) has joined #ceph
[10:50] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[10:50] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[10:50] * fronlius_ is now known as fronlius
[11:50] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[12:41] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[13:11] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:59] <dwm_> Does a log entry of this form look normal?
[13:59] <dwm_> osd.3[32132]: 7fbed36a1700 osd.3 367 pg[0.a1( v 59'1425 (59'1423,59'1425] n=1425 ec=1 les/c 367/367 342/342/342) [3,0,1] r=0 lpr=342 lcod 0'0 mlcod 0'0 !hml active+clean] truncate_seq 1 > current 0, truncating to 18446744073709551615
[14:00] <dwm_> (I ask, because that large truncation value is the same large truncation value causing OSD start-up crashes in bug #1759)
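For context: that truncation value is 2^64 - 1, i.e. a 64-bit unsigned integer holding -1, which is why it looks like a wrapped or uninitialized truncate size (it is the same value discussed under bug #1759 and again near the end of this log). A quick check of the arithmetic:

    python -c 'print(2**64 - 1)'
    # 18446744073709551615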
[14:38] * lollercaust (~paper@41.Red-88-15-116.dynamicIP.rima-tde.net) has joined #ceph
[15:12] * spadaccio (~spadaccio@213-155-151-233.customer.teliacarrier.com) has joined #ceph
[15:15] <spadaccio> hi, I'm trying ceph now, but my OSD is stuck in the "creating" state. Here is my ceph.conf: http://pastebin.com/qH73xTS0
[15:15] <spadaccio> can anybody spot the problem?
[15:16] <spadaccio> how much time does it usually take to create the osd?
[15:16] * lollercaust (~paper@41.Red-88-15-116.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[15:17] <spadaccio> also, do I have to mount the btrfs partition, or does ceph take care of mounting it?
[15:18] <guido> spadaccio: both will work
[15:20] <spadaccio> guido: thanks. unfortunately, it is still stuck in "creating"
[15:21] <spadaccio> I looked into dmesg as per http://tracker.newdream.net/issues/1098, but didn't find any stack trace
[15:34] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:34] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[15:35] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[15:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:31] <spadaccio> well, I progressed a bit, there was an error in the cfg file (s/dev/devs/)
[16:31] <spadaccio> now ceph -w hangs after giving me the information
[16:31] <spadaccio> and the status is always up:creating
[16:32] <spadaccio> the btrfs volume is mounted and there is data in it (ceph_fsid current fsid magic snap_1 snap_2 store_version whoami)
[16:47] <spadaccio> how much should an osd stay in "up:creating" ?
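The s/dev/devs/ fix mentioned just above refers to the mkcephfs-era option naming the block device to format and mount for an OSD. A rough sketch of the relevant ceph.conf stanza, assuming that configuration style (the hostname matches the shell prompts quoted later in this log; the device path is a placeholder):

    [osd.0]
        host = ugcn1
        ; the option is "btrfs devs" (plural), not "btrfs dev"
        btrfs devs = /dev/vg0/ceph-osd0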
[16:54] * aneesh (~aneesh@122.248.163.4) Quit (Remote host closed the connection)
[16:55] * aneesh (~aneesh@122.248.163.3) has joined #ceph
[16:55] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[17:17] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:19] * aliguori_ is now known as aliguori
[17:19] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[17:20] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[17:26] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[17:26] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[17:35] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:42] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[17:43] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[17:47] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit ()
[18:03] * DLange (~DLange@dlange.user.oftc.net) Quit (Quit: reboot .. kernel upgrade)
[18:03] * BManojlovic (~steki@93-87-148-183.dynamic.isp.telekom.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[18:04] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:05] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[18:08] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[18:21] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[18:24] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[18:33] * fronlius (~fronlius@f054113122.adsl.alicedsl.de) has joined #ceph
[18:34] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[18:37] <sjust> spadaccio: did you mean an mds?
[18:37] <sjust> in up:creating, that is
[18:38] <spadaccio> the object that stays in that state :) I started using ceph today so I still don't know the exact terms
[18:38] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:38] <guido> spadaccio: btw, which version are you using? How many PGs does ceph -s report?
[18:39] <spadaccio> I was using 0.38
[18:39] <guido> I had a similar problem with 0.38, caused by mkcephfs not creating enough pgs, but 0.39 fixed that
[18:39] <spadaccio> I just compiled 0.39
[18:40] <spadaccio> and was about to test it, but I had an interrupt
[18:40] <spadaccio> will let you know if 0.39 will fix it :)
[18:40] * MarkDud (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[18:41] <guido> spadaccio: note, just upgrading the software won't fix this problem, you will have to repeat the mkcephfs step
[18:42] <spadaccio> yes, I removed the LV completely and was about to recreate it all
[18:43] * elder (~elder@aon.hq.newdream.net) Quit (Quit: Leaving)
[18:46] <Tv> please confirm: we're doing a sprint planning meeting today, so we skip the daily status meeting, right?
[18:46] * izdubar (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[18:50] <sjust> spadaccio: could you post the output of ceph pg dump and ceph -s once your cluster is back up?
[18:50] <spadaccio> sjust: will do that, thanks
[18:53] <spadaccio> sjust: it's a 1-node cluster, with a 5GB LV partition used for storage: how much should ceph take to initialize it and move out of up:creating?
[18:55] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:55] <sjust> spadaccio: not long
[18:55] <sjust> spadaccio: I suspect that something is wrong with the osd, ceph -s and ceph pg dump should give me an idea of what is going on
[18:56] <spadaccio> so less than the 4 minutes I've been waiting so far :)
[18:56] <spadaccio> will put them into a pastebin for you now
[18:57] <spadaccio> http://pastebin.com/Cxfi8YgN
[18:57] <spadaccio> sjust: ^
[18:58] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[19:00] <Tv> please confirm: we're doing a sprint planning meeting today, so we skip the daily status meeting, right? (sagewk?)
[19:03] <guido> spadaccio: that really looks like the same problem that I had - with only 6 pgs and such. In my case, it was fixed in 0.39, though...
[19:03] <sjust> spadaccio: I think you hit a bug in the default crushmap creation in 0.38
[19:03] <sjust> it should be fixed in 0.39, but you are still using the crushmap created in 0.38, one sec
[19:03] <spadaccio> yay
[19:04] <spadaccio> yes, I installed 0.39 after apt-get removing 0.38
[19:04] <spadaccio> I removed the LV and recreated it, but I don't know if I left something to clean behind=
[19:05] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[19:05] <sagewk> tv: oh, let's do a standup, since it's this afternoon
[19:05] <Tv> sagewk: ok, just wanted a clarification
[19:07] * adjohn (~adjohn@208.90.214.43) Quit (Remote host closed the connection)
[19:07] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[19:15] <sjust> spadaccio: what is the output of 'ceph osd pool get data pg_num'?
[19:17] <spadaccio> 2012-01-09 18:36:03.207930 mon <- [osd,pool,get,data,pg_num]
[19:17] <spadaccio> 2012-01-09 18:36:03.209054 mon.0 -> 'PG_NUM: 0' (0)
[19:17] <spadaccio> sjust: ^ (sorry, was AFK)
[19:30] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:32] <sjust> spadaccio: for the pool_names data, metadata, and rbd, run 'ceph osd pool set <pool_name> pg_num 8' and 'ceph osd pool set <pool_name> pgp_num 8'
[19:32] <sjust> and then restart the daemons
[19:32] <sjust> sorry for the delay, was in a meeting
[19:33] <spadaccio> sjust: np. Will run the two commands and let you know
[19:33] <sjust> run those two commands for each of those pools
[19:33] <spadaccio> the pool name is osd.0 ?
[19:38] <sjust> the pool names are data, metadata, and rbd
[19:38] <sjust> osd.0 is an osd
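A compact way to run what sjust lists above, as a sketch (pool names and the count of 8 come straight from the conversation; as the rest of the exchange shows, the pgp_num step can be refused with 'creating pgs, wait' until the new pgs actually exist):

    for pool in data metadata rbd; do
        ceph osd pool set $pool pg_num 8
        ceph osd pool set $pool pgp_num 8
    done
    # then restart the ceph daemons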
[19:39] <spadaccio> thanks :) sorry but I'm very new :)
[19:39] <spadaccio> root@ugcn1:~/ceph-0.39# ceph osd pool set metadata pg_num 8
[19:39] <spadaccio> 2012-01-09 18:57:54.189565 mon <- [osd,pool,set,metadata,pg_num,8]
[19:39] <spadaccio> 2012-01-09 18:57:54.190033 mon.0 -> 'currently creating pgs, wait' (-11)
[19:39] <spadaccio> the same for rbd (but not for data)
[19:44] <sjust> what was the output for data?
[19:45] <spadaccio> root@ugcn1:~/ceph-0.39# ceph osd pool set data pg_num 8
[19:45] <spadaccio> 2012-01-09 18:57:51.288125 mon <- [osd,pool,set,data,pg_num,8]
[19:45] <spadaccio> ^[[A2012-01-09 18:57:51.626072 mon.0 -> 'set pool 0 pg_num to 8' (0)
[19:46] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:46] <spadaccio> (sorry for the spurious escape sequence)
[19:46] <sjust> did you also increase pgp_num (second command from above)?
[19:46] <spadaccio> not yet
[19:46] <sjust> ok
[19:46] <spadaccio> I just issued these 3 commadns
[19:46] <spadaccio> commands*
[19:46] <sjust> go ahead and do the 3 for pgp_num as well
[19:46] <sjust> then restart the cluster
[19:47] <spadaccio> ok
[19:47] <spadaccio> done
[19:48] <spadaccio> got an error on metadata and rbd
[19:48] <spadaccio> 2012-01-09 19:06:02.381337 mon.0 -> 'specified pgp_num 8 > pg_num 0' (-22)
[19:49] <sjust> spadaccio: could you pastebin the output of ceph pg dump again?
[19:49] <spadaccio> yep, 2 seconds
[19:50] <spadaccio> http://pastebin.com/urSB62Kt
[19:50] <spadaccio> sjust: ^
[19:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:53] <spadaccio> still up:creating, BTW
[19:55] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Read error: Connection timed out)
[19:56] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[20:02] <sjust> could you try ceph osd pool set metadata pg_num 8 again?
[20:04] <spadaccio> worked.. should I also do it for rbd?
[20:04] <sjust> yeah, and do pgp_num for metadata and rbd as well
[20:04] <sjust> odd, I don't know why it didn't work the first time
[20:04] <spadaccio> rbd doesn't work
[20:04] <spadaccio> root@ugcn1:~/ceph-0.39# ceph osd pool set metadata pg_num 8
[20:05] <spadaccio> 2012-01-09 19:22:26.084406 mon <- [osd,pool,set,metadata,pg_num,8]
[20:05] <spadaccio> 2012-01-09 19:22:26.422138 mon.0 -> 'set pool 1 pg_num to 8' (0)
[20:05] <spadaccio> root@ugcn1:~/ceph-0.39# ceph osd pool set rbd pg_num 8
[20:05] <spadaccio> 2012-01-09 19:23:22.609350 mon <- [osd,pool,set,rbd,pg_num,8]
[20:05] <spadaccio> 2012-01-09 19:23:22.609951 mon.0 -> 'currently creating pgs, wait' (-11)
[20:05] <sjust> oh, try setting pgp_num for metadata first
[20:05] <fghaas> Tv, sagewk, gregaf: since it's been mentioned in the email thread about the OCF agents: has anyone tackled systemd integration for osd/mds/mon yet?
[20:05] <spadaccio> root@ugcn1:~/ceph-0.39# ceph osd pool set metadata pgp_num 8
[20:05] <spadaccio> 2012-01-09 19:24:00.832734 mon <- [osd,pool,set,metadata,pgp_num,8]
[20:05] <spadaccio> 2012-01-09 19:24:00.833307 mon.0 -> 'still creating pgs, wait' (-11)
[20:05] <spadaccio> root@ugcn1:~/ceph-0.39# ceph osd pool set rbd pgp_num 8
[20:06] <spadaccio> 2012-01-09 19:24:06.643872 mon <- [osd,pool,set,rbd,pgp_num,8]
[20:06] <spadaccio> 2012-01-09 19:24:06.644283 mon.0 -> 'specified pgp_num 8 > pg_num 0' (-22)
[20:06] <spadaccio> sjust: didn't work either
[20:06] <gregaf> fghaas: don't think so!
[20:06] <Tv> fghaas: not systemd, i'm making headway on upstart
[20:06] <sjust> spadaccio: hmm, could you post ceph pg dump one more time?
[20:06] <spadaccio> yes
[20:06] <Tv> fghaas: btw i'm really worried the ocf work is heading into a deadend
[20:06] <fghaas> Tv: does upstart have a facility for automatically recovering daemons on failure?
[20:07] <Tv> fghaas: yes
[20:07] <fghaas> Tv: how so?
[20:07] <Tv> fghaas: there's no "one osd" daemon
[20:07] <spadaccio> sjust: http://pastebin.com/xrdc2nSr
[20:07] <fghaas> Tv: elaborate please?
[20:07] <Tv> fghaas: with upstart, i handle that with parametrized jobs, e.g. job "ceph-osd" takes parameter OSD_ID=42
[20:08] <fghaas> oh, well adding that is a piece of cake in the OCF ra
[20:08] <fghaas> "primitive osd_42 ocf:ceph:osd params instance=42", like that?
[20:08] <Tv> fghaas: never used ocf, i wouldn't know
[20:09] <fghaas> we can define as many parameters as we want
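Spelled out as a full crm shell snippet, the primitive fghaas quotes might look like the following (the ocf:ceph:osd agent and its instance parameter are from his message; the monitor operation is an added assumption):

    crm configure primitive osd_42 ocf:ceph:osd \
        params instance=42 \
        op monitor interval=30s timeout=30s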
[20:09] <Tv> fghaas: but the actual osd ids served by a single host will come & go based on hard disks currently operational
[20:10] <sjust> spadaccio: try 'ceph osd pool set metadata pgp_num 8' again
[20:10] <Tv> fghaas: i know it's quite hard to figure out right now, but what work i've had time to put it in on that is at https://github.com/NewDreamNetwork/ceph-cookbooks/blob/master/ceph/recipes/bootstrap_osd.rb
[20:10] <spadaccio> sjust: mon.0 -> 'set pool 1 pgp_num to 8' (0)
[20:11] <Tv> fghaas: but a brief summary goes like this: suitable new block device is detected; osd id is allocated; a new osd daemon is started for that id
[20:11] <Tv> fghaas: a block device goes faulty and is removed from the system -> the relevant osd id is shut down
[20:11] <sjust> spadaccio: ok, looks like we just weren't waiting long enough. Try both commands for rbd
[20:12] <Tv> fghaas: but honestly that'll take some months to clear out fully.. but any assumption of a "boot time /etc/init.d/ceph-osd start is all it'll need" is wrong
[20:12] <spadaccio> sjust: so, pg_num worked, while pgp_num didn't
[20:13] <fghaas> Tv: yeah, thanks, but that boot time thingy is something the init script already does, so what's your point?
[20:13] <sjust> spadaccio: sorry, I should have explained. pg_num increases the number of pgs. However, the osd must actually create the pgs. Until that happens, you can't increase pgp_num.
[20:13] <sjust> ceph -s should indicate that some pgs are 'creating' right?
[20:13] <spadaccio> yes.. 7 creating
[20:13] <spadaccio> out of 30
[20:14] <sjust> once there are no longer any creating, you can do the pgp_num command and then restart the mds daemon
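So the ordering constraint is: raise pg_num, wait for the OSD to finish instantiating the new pgs, then raise pgp_num. A sketch of that waiting step, assuming (as in the pastes above) that ceph -s lists pgs as 'creating' until they exist:

    # after 'ceph osd pool set <pool> pg_num 8' for each pool:
    while ceph -s | grep -q creating; do sleep 5; done
    # now pgp_num can be raised without the 'creating pgs, wait' error
    for pool in data metadata rbd; do ceph osd pool set $pool pgp_num 8; done
    # then restart the mds (or the whole cluster)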
[20:14] <fghaas> also, forgive the stupid question (/me is clueless about chef), but can you point me to the upstart job?
[20:14] <spadaccio> is it the same if I just restart everything?
[20:15] <spadaccio> /etc/init.d/ceph restart
[20:15] <spadaccio> the pgp_num command succeeded nwo
[20:15] <spadaccio> now
[20:15] <Tv> fghaas: for example https://github.com/NewDreamNetwork/ceph-cookbooks/blob/master/ceph/templates/default/upstart-ceph-osd.conf.erb
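The linked template isn't reproduced here, but a minimal instance-per-OSD upstart job of the shape Tv describes might look like this sketch (job layout and daemon flags are assumptions, not the actual ceph-cookbooks template):

    # /etc/init/ceph-osd.conf (sketch)
    description "ceph osd"
    instance $OSD_ID
    respawn
    exec /usr/bin/ceph-osd -i $OSD_ID -f
    # started per OSD, e.g.:  start ceph-osd OSD_ID=42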
[20:16] <sjust> spadaccio: restarting everything should be ok
[20:16] <Tv> fghaas: i'm just saying you really shouldn't build much code around the existing init.d scripts; they're already known to not be good enough
[20:17] <spadaccio> sjust: finally! that worked! up:active
[20:18] <fghaas> oh that I'm fine with, and I'm also entirely fine with something else providing the monitoring and recovery features that pacemaker provides, whether it's upstart or systemd I don't mind much. btw: that upstart job could already plug into pacemaker as it is; it supports those natively
[20:18] <spadaccio> (even if I didn't really understand "what" worked... but it's not important I guess)
[20:19] <Tv> fghaas: i'm still unclear on what you really see pacemaker's role as being
[20:19] <gregaf> Tv: restart of crashed daemons mostly, I think
[20:19] <fghaas> daemon monitoring and recovery, with the added benefit that cluster nodes are aware of the status of others
[20:19] <sjust> spadaccio: thanks for sticking with me. The metadata server (mds) stores data on the osd cluster. That data is broken into pools (data for file data and metadata for file metadata and hierarchy information). Those pools are broken into placement groups (pgs) for distribution across osds. In this case, your cluster was created with no pgs due to a bug in the default osdmap creation.
[20:19] <Tv> gregaf: upstart, systemd and runit all do that...
[20:20] <Tv> fghaas: ceph has the "awareness" already inside of the application, for everything that it cares about
[20:21] <spadaccio> sjust: thanks to you for helping me. But I don't understand why I didn't clear it up by using the new version
[20:21] <fghaas> ya know, guys, I did ask here whether someone was already working on a recovery facility the daemons before I started on the ocf scripts... :)
[20:21] <fghaas> "recovery facility *for* the daemons"
[20:22] <sjust> spadaccio: the old version created the 'osdmap'. When you loaded 0.39, it continued to use the old 'osdmap' so we had to manually force pg creation.
[20:22] <fghaas> so, like I said, if I can contribute to the upstart or systemd work instead, I'm all fine with that
[20:23] <spadaccio> sjust: ok, got it. Thanks again! :)
[20:23] <sjust> spadaccio: sure, happy to help!
[20:23] <Tv> fghaas: sorry, this is my first day back after a vacation
[20:23] <Tv> fghaas: i'm catching up on all fronts
[21:09] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit (Quit: jojy)
[21:25] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[21:52] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[21:53] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit (Quit: jojy)
[22:16] <dwm_> Does the following log entry look wrong?
[22:16] <dwm_> osd.3[32132]: 7fbed36a1700 osd.3 367 pg[0.a1( v 59'1425 (59'1423,59'1425] n=1425 ec=1 les/c 367/367 342/342/342) [3,0,1] r=0 lpr=342 lcod 0'0 mlcod 0'0 !hml active+clean] truncate_seq 1 > current 0, truncating to 18446744073709551615
[22:16] <dwm_> (Context: bug #1759)
[22:56] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[22:57] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[22:59] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[22:59] * fronlius (~fronlius@f054113122.adsl.alicedsl.de) Quit (Quit: fronlius)
[22:59] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:08] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[23:22] * BManojlovic (~steki@212.200.243.100) has joined #ceph
[23:46] <sjust> dwm_: what version are you running?
[23:46] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[23:46] <dwm_> sjust: master as of a few days ago, let me check the commit rev..
[23:46] <sjust> dwm_: yeah, truncating to -1 is probably incorrect :)
[23:47] <dwm_> Commit: a1252463055e2d6816407bd6465e74dea87a0955
[23:48] <dwm_> sjust: I have a -lot- of those messages.
[23:48] <sjust> on the same object/pg ?
[23:48] <dwm_> Also, quite a few like: osd.1[11779]: 7f3663cfd700 osd.1 369 OSD::ms_verify_authorizer name=mds.terra15 auid=18446744073709551615
[23:49] <dwm_> sjust: On many pgs. (Which entry is the object number?)
[23:49] <sjust> the object name doesn't appear to be in that message
[23:50] <dwm_> Right, presumably need more context. (Grepped my logs for 2^64-1)
[23:50] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[23:51] <sjust> the ms_verify_authorizer message is harmless
[23:51] <dwm_> sjust: Ok, just noticing a non-zero number of entries with the same wrapped -1 value for auid.
[23:52] <sjust> oh, did not notice that
[23:52] <sjust> ok
[23:52] <sjust> nvm, -1 is default for auid
[23:52] <dwm_> Would an object ID look something like: 10000062024.00000000/head/68a014a1
[23:52] <sjust> yes
[23:53] <sjust> the first number portion is the object name
[23:53] <sjust> head is the snapid
[23:53] <sjust> the last bit is the hash
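Putting that together, a tiny bash sketch splitting the example ID quoted above into the three fields sjust names:

    oid="10000062024.00000000/head/68a014a1"
    IFS=/ read -r name snapid hash <<< "$oid"
    echo "object=$name snap=$snapid hash=$hash"
    # object=10000062024.00000000 snap=head hash=68a014a1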
[23:53] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[23:54] <dwm_> Okay, the object names -- at least in this section of log I'm looking at -- appear to be sequential.
[23:54] <sjust> are the truncate operations with -1 offset being applied to objects like that one?
[23:54] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:54] <sjust> ok
[23:54] <dwm_> Also seeing: osd.5[24122]: 7f8257a9c700 filestore(/data/osd) truncate 0.3b_head/10000062026.00000000/head/6cb18c3b size 18446744073709551615 = -2
[23:57] <dwm_> (Which, when these operations are presumably replayed later from the OSD journal, the -ENOENT changes to an -EINVAL and the OSD aborts.)
[23:57] <dwm_> Or something of that approximate shape.
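For reference, the negative return codes quoted in this log line up with standard Linux errno values (the mapping below is generic errno knowledge, not ceph-specific):

    # -2  = -ENOENT  "No such file or directory"  (the truncate return just above)
    # -11 = -EAGAIN                                (the 'creating pgs, wait' replies earlier)
    # -22 = -EINVAL  "Invalid argument"            (pgp_num > pg_num, and the replay abort dwm_ describes)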

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.