#ceph IRC Log


IRC Log for 2011-04-29

Timestamps are in GMT/BST.

[0:01] * verwilst (~verwilst@dD576FAAE.access.telenet.be) has joined #ceph
[0:06] <lxo> ugh. just lost one of my 3 osds; cosd died, machine rebooted and the underlying btrfs wouldn't mount any more. /me crosses fingers for a successful recovery
[0:10] <bchrisman> any recommendation on compatibility/casting between ceph_dir_result & the standard DIR type?
[0:17] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[0:18] <Tv> bchrisman: DIR is going away
[0:19] <Tv> bchrisman: i mean, libceph returning DIRs is going away
[0:19] <Tv> bchrisman: hence, no direct cast; you need to map field by field, if you so wish, and take care of any incompatibility
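A minimal sketch of the field-by-field mapping Tv describes, written against the standard POSIX `struct dirent`. The target type `app_dirent` and the function `sketch_map_dirent` are hypothetical names invented for illustration; they stand in for whatever application-side structure (such as an SMB one) is being filled, and are not part of libceph.

```c
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application-side entry; not a libceph or Samba type. */
struct app_dirent {
    uint64_t ino;        /* always 64-bit, whatever d_ino's width is */
    char     name[256];  /* NUL-terminated entry name */
};

/* Copy the fields one by one instead of casting between struct types.
 * Returns 0 on success, -1 if the name does not fit in dst->name. */
int sketch_map_dirent(const struct dirent *src, struct app_dirent *dst)
{
    size_t len;

    assert(src != NULL && dst != NULL);

    len = strlen(src->d_name);
    if (len >= sizeof(dst->name))
        return -1;

    dst->ino = (uint64_t)src->d_ino;
    memcpy(dst->name, src->d_name, len + 1);  /* include the NUL */
    return 0;
}
```

Copying explicitly keeps the two layouts independent, so a change in the width of `d_ino` or the size of `d_name` on one side shows up as a conversion or a length check rather than as silent corruption.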
[0:21] <bchrisman> Tv: yeah... there's a definition of ceph_dir_result_t somewhere? Somehow I'm not finding it.. intelligence failure on my part no doubt.
[0:22] <bchrisman> I guess I should be able to track it down in Client
[0:22] * alexxy (~alexxy@ Quit (Remote host closed the connection)
[0:22] <Tv> src/client/Client.h
[0:22] <sagewk> bchrisman: ceph_dir_result_t was renamed dir_result_t. but what do you need it for? the libceph readdir interface should be all you need?
[0:22] <Tv> which is wrong
[0:23] <Tv> it needs to be visible to C
[0:24] <Tv> sagewk: but ceph_readdir_r just fills the struct, he needs to access the fields
[0:24] <Tv> the struct definition needs to move from Client.h to libceph.h
[0:24] <bchrisman> sagewk: I was going to check/map ceph structures to SMB_STRUCT_* structures.
[0:24] <Tv> bleh it's a "c++ struct" with functions in it
[0:24] <sagewk> it fills in a struct dirent for that
[0:25] <sagewk> the dir_result is the internal handle for internal readdir state
[0:25] <Tv> oh the *dirp is a cursor not an output arg
[0:25] <sagewk> bchrisman: what structures do you want to map?
[0:26] <lxo> hey, is it known that there's some inconsistency between cfuse and the kernel client WRT dev nodes?
[0:26] <sagewk> lxo: what kind of inconsistency?
[0:26] <lxo> dev numbers show up differently depending on whether they're rsynced into a cfuse or (k)ceph mount
[0:26] <bchrisman> sagewk: SMB_STRUCT_DIR, SMB_STRUCT_DIRENT right now.
[0:26] * alexxy (~alexxy@ has joined #ceph
[0:26] <Tv> sagewk: btw -- dirent not dirent64?
[0:27] <Tv> lxo: are you running a 32-bit machine?
[0:27] <lxo> nope, all 64-bit
[0:27] <Tv> lxo: then differently how?
[0:27] <sagewk> tv: dirent... how does dirent64 vary?
[0:28] <Tv> sagewk: dirent is 32/64-bit depending on -D__USE_FILE_OFFSET64, dirent64 is always 64-bit
[0:28] <Tv> and only available with -D__USE_LARGEFILE64
[0:28] <Tv> historical junk
[0:29] <sagewk> tv: hrm.. dirent sounds better, although it means libceph users must -D__USE_FILE_OFFSET64
[0:29] <sagewk> (ala libfuse)
[0:29] <lxo> Tv, I can't look at the numbers right now, filesystem is recovering, but if I alternate rsync into cfuse and kceph mounts of the same filesystem, each rsync will modify the dev nodes, whereas if I don't alternate, nothing changes
[0:29] <Tv> sagewk: and if they don't, they'll get silent corruption :(
[0:29] <bchrisman> hmmm
[0:29] <Tv> sagewk: because your struct dirent * is not their struct dirent *
[0:29] <lxo> I checked with ls -l some time ago, and the numbers were correct when I looked in the last-rsynced mount point, but wrong in the other
[0:30] <lxo> but I forgot looking for a bug report or filing one. this discussion of dirent for some reason reminded me of it
[0:30] <Tv> lxo: you might have found a bug
[0:30] <sagewk> lxo: if you can let us know which is updating it wrong (cfuse or kclient) and how when you get a chance it should be an easy fix
[0:31] <lxo> not known, then. ok, I'll file it as soon as I can access the nodes again, to report the exact changes
[0:31] <lxo> I can't tell which one is wrong. both are self-consistent, but they're not consistent with each other. I don't know how to tell what the "internal" representation should look like
[0:31] <sagewk> tv: #ifndef __USE_FILE_OFFSET64 #warning bad bad libceph user #endif
[0:32] <Tv> sagewk: in libceph.h etc, and make that an #error, and i'll agree
[0:32] <bchrisman> yeah.. SMB_STRUCT_DIRENT is dirent or dirent64... based on that same large file flag.
[0:32] <Tv> i'm reading the specs to see which one is more widespread
[0:38] <Tv> ah the confusing LFS apis.. http://www.suse.de/~aj/linux_lfs.html "Using LFS"
[0:39] <Tv> and http://www.gnu.org/s/libc/manual/html_node/Feature-Test-Macros.html#Feature-Test-Macros
[0:40] <Tv> ok so LARGEFILE64 allows using either at runtime, OFFSET64 makes the old api switch to 64-bits
[0:40] <Tv> which is what dirent.h kinda told me already
[0:41] <Tv> so i guess
[0:41] <Tv> #if _FILE_OFFSET_BITS != 64
[0:41] <Tv> # error "libceph only works with 64-bit file operations, use -D_FILE_OFFSET_BITS=64"
[0:41] <Tv> #endif
[0:42] <Tv> the __USE_ stuff is internal but fairly nicely documented, perhaps it can be relied on, but the above is guaranteed to work
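A sketch of how the guard above might sit in a header, together with a compile-time check that `off_t` really is 64 bits wide. The function name `sketch_off_t_bits` is hypothetical, and the `#define` at the top exists only so the fragment compiles on its own; a real build would pass `-D_FILE_OFFSET_BITS=64` on the command line instead.

```c
/* Normally supplied as -D_FILE_OFFSET_BITS=64 on the compiler command
 * line; defined here only so this sketch is self-contained. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <assert.h>     /* static_assert (C11) */
#include <limits.h>     /* CHAR_BIT */
#include <sys/types.h>  /* off_t */

/* The guard from the discussion: refuse to compile with 32-bit offsets. */
#if _FILE_OFFSET_BITS != 64
# error "libceph only works with 64-bit file operations, use -D_FILE_OFFSET_BITS=64"
#endif

/* Belt and braces: check the type itself, not just the macro. */
static_assert(sizeof(off_t) == 8, "off_t must be 64 bits wide");

/* Hypothetical helper: report the width of off_t in bits. */
int sketch_off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

Checking the public `_FILE_OFFSET_BITS` macro rather than the internal `__USE_` names follows Tv's point that only the former is documented as an interface.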
[0:54] * Juul (~Juul@slim.dhcp.lbl.gov) has joined #ceph
[1:07] <bchrisman> Tv: yeah.. roughly: /usr/include/ceph/libceph.h:78: note: expected 'struct dirent *' but argument is of type 'struct dirent64 **'
[1:09] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[1:14] * Juul (~Juul@slim.dhcp.lbl.gov) Quit (Quit: Leaving)
[1:17] * verwilst (~verwilst@dD576FAAE.access.telenet.be) Quit (Quit: Ex-Chat)
[1:19] * benpol (~benp@garage.reed.edu) has left #ceph
[1:29] * bchrisman (~Adium@sjs-cc-wifi-1-1-lc-int.sjsu.edu) Quit (Quit: Leaving.)
[1:49] * MarkN (~nathan@ has joined #ceph
[1:50] * MarkN (~nathan@ has left #ceph
[1:56] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:59] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Read error: Operation timed out)
[2:02] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:52] * darkfader (~floh@ Quit (Remote host closed the connection)
[2:53] * darkfader (~floh@ has joined #ceph
[3:06] * lxo (~aoliva@ Quit (Read error: Connection timed out)
[3:06] * lxo (~aoliva@ has joined #ceph
[3:24] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Quit: http://daniel.friesen.name or ELSE!)
[3:54] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[4:39] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[5:40] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[5:40] * lxo (~aoliva@ has joined #ceph
[7:33] * joshd (~jdurgin@ has joined #ceph
[8:00] * neurodrone_ (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[8:00] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[8:00] * neurodrone_ is now known as neurodrone
[8:24] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[8:25] * lxo (~aoliva@ has joined #ceph
[8:30] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:36] * joshd (~jdurgin@ Quit (Quit: Leaving.)
[8:39] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[8:42] * lxo (~aoliva@ has joined #ceph
[8:57] * allsystemsarego (~allsystem@ has joined #ceph
[9:01] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[9:07] * lxo (~aoliva@ has joined #ceph
[9:08] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[10:54] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[10:56] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[11:24] * Yoric_ (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[11:24] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Read error: Connection reset by peer)
[11:24] * Yoric_ is now known as Yoric
[12:10] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Quit: Yoric)
[12:12] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[12:36] * Yoric_ (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[12:36] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Read error: Connection reset by peer)
[12:36] * Yoric_ is now known as Yoric
[12:51] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Quit: Yoric)
[13:14] * trollface (~stingray@stingr.net) has joined #ceph
[13:21] * Yulya is now known as Yulya_the_drama_queen
[13:24] <chraible> hi
[13:25] <chraible> how can I add a new osd to my cluster? I tried the way described here http://ceph.newdream.net/wiki/OSD_cluster_expansion/contraction but nothing happened ...
[13:49] <wido> chraible: how do you mean? Nothing happened?
[13:49] <wido> no data movement? Or did you number of OSD's not increate?
[13:49] <wido> increase*
[13:50] <chraible> number of osd's do not increase
[13:50] <wido> did you change max_osd?
[13:50] <chraible> yes from 3 to 4
[13:51] <wido> ceph -s, what does the 'osds' line show?
[13:51] <chraible> osd e105: 3 osds: 3 up, 3 in
[13:52] <wido> chraible: ceph osd getmaxosd
[13:54] * Yulya (~Yulya@ip-95-220-174-246.bb.netbynet.ru) has joined #ceph
[13:54] <chraible> ok done...
[13:54] * Yulya_the_drama_queen (~Yulya@ip-95-220-190-12.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[13:54] <chraible> I will test ... :)
[13:54] <wido> chraible: What does it say? What is your max_osd?
[13:55] <wido> since that osds line is showing it's at 3
[13:55] <wido> max_osd isn't the number of OSDs, it's the number of your 'highest' OSD
[14:48] <chraible> @wido sorry i had a meeting :(
[14:49] <chraible> 2011-04-29 14:48:27.822254 mon <- [osd,getmaxosd]
[14:49] <chraible> 2011-04-29 14:48:27.822691 mon1 -> 'max_osd = 4 in epoch 119' (0)
[14:49] <chraible> this is shown when i do ceph osd getmaxosd
[15:00] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[15:02] <wido> chraible: Is that 4th OSD running?
[15:04] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[15:07] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit ()
[15:07] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[15:23] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:32] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[15:35] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:37] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[15:40] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:59] * cephnewbie (~cephnewbi@173-24-225-53.client.mchsi.com) has joined #ceph
[16:02] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:03] * cephnewbie2 (~cephnewbi@173-24-225-53.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[16:04] <trollface> hahaha
[16:04] <trollface> this is in another channel but this is relevant to this one I think
[16:06] <trollface> <@user> multi-million dollar SAN, petabytes of storage mirrored and fully redundant, 4 paths to each disk through the FC fabric.....one faulty controller crashes the SAN switches and we loose access to all drives through the enterprise
[16:06] <trollface> <@user> left work last night at 1AM
[16:06] <trollface> <@user> redundancy my @$$
[16:06] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:14] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[16:19] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:30] * Yulya is now known as Yulya_the_drama_queen
[16:35] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[16:52] <Yulya_the_drama_queen> hello guys
[16:54] <Yulya_the_drama_queen> some machines in the cluster become unavailable via ssh, after reboot i see http://paste2.org/p/1390701 in log
[16:54] <Yulya_the_drama_queen> what may cause this problem?
[17:01] * cephnewbie2 (~cephnewbi@173-24-225-53.client.mchsi.com) has joined #ceph
[17:01] <trollface> Julia the drama queen.
[17:02] <trollface> this does seem to me like a kernel bug, unrelated to ceph :)
[17:03] * cephnewbie3 (~cephnewbi@173-24-225-53.client.mchsi.com) has joined #ceph
[17:07] * cephnewbie (~cephnewbi@173-24-225-53.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[17:09] * cephnewbie2 (~cephnewbi@173-24-225-53.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[17:11] * cephnewbie (~cephnewbi@173-24-225-53.client.mchsi.com) has joined #ceph
[17:14] * cephnewbie3 (~cephnewbi@173-24-225-53.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[17:29] * Yoric_ (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[17:29] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Read error: Connection reset by peer)
[17:29] * Yoric_ is now known as Yoric
[17:41] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[17:42] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[17:42] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:43] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:44] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[18:07] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:29] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Quit: Yoric)
[18:34] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:38] * joshd (~jdurgin@ has joined #ceph
[18:49] * joshd (~jdurgin@ Quit (Ping timeout: 480 seconds)
[18:54] <wido> sagewk: I tried the memory profiling of the MON today
[18:54] <wido> ceph mon tell 0 head dump
[18:54] <sagewk> wido: anything interesting?
[18:54] <wido> is a unknown command
[18:54] <wido> It's linked to tcmalloc now, but this command isn't in the subsystem it seems
[18:54] <sagewk> oh, yeah, tell doesn't work on monitors the same way.
[18:55] <Tv> // go!
[18:55] <Tv> if (did_bind)
[18:55] <Tv> accepter.start();
[18:55] <Tv> reaper_started = true;
[18:55] <Tv> reaper_thread.create();
[18:55] <Tv> that's the part in messenger->start after daemonize
[18:55] <sagewk> tv: that's basically the only important bit, too.
[18:55] <Tv> the lock & started stuff before the daemonize looks a bit funny.. like, do people really call ->start multiple times?
[18:56] <Tv> that sounds like it could be just an assert
[18:56] <sagewk> not any more
[18:57] <sagewk> pushed a small patch to clean it up a bit
[18:57] <sagewk> the daemonize part can safely be pulled out now.. as long as it happens before start is called.
[18:58] <sagewk> wido: i have meeting this morning.. hopefully greg can help you out there when he gets in in a few minutes
[18:58] <Tv> sagewk: ok i'll do that
[18:59] <sagewk> tv: common_daemonize() or something?
[18:59] <Tv> yup
[18:59] <sagewk> basically just replace the bool daemonize arg to .start()
[18:59] <sagewk> yep
[18:59] <sagewk> cool
[19:08] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * lxo (~aoliva@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * eternaleye (~eternaley@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * maswan (~maswan@kennedy.acc.umu.se) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * yehudasa (~quassel@ip-66-33-206-8.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * cclien_ (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * MK_FG (~MK_FG@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * mwodrich (~Terminus@ip-66-33-206-8.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * __jt___ (~james@jamestaylor.org) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * nolan (~nolan@phong.sigbus.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * allsystemsarego (~allsystem@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * tjikkun_ (~tjikkun@195-240-187-63.ip.telfort.nl) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * Meths (rift@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * atg (~atg@please.dont.hacktheinter.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * iggy (~iggy@theiggy.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * cephnewbie (~cephnewbi@173-24-225-53.client.mchsi.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * macana (~ml.macana@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * midnightmagic (~midnightm@S0106000102ec26fe.gv.shawcable.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * darkfader (~floh@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * lidongyang (~lidongyan@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * josef (~seven@nat-pool-rdu.redhat.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * [ack]_ (ANONYMOUS@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * badari (~badari@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * Jiaju (~jjzhang@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * Guest3488 (quasselcor@bas11-montreal02-1128536388.dsl.bell.ca) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * todin (tuxadero@kudu.in-berlin.de) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * pruby (~tim@leibniz.catalyst.net.nz) Quit (reticulum.oftc.net resistance.oftc.net)
[19:08] * zoobab (zoobab@vic.ffii.org) Quit (reticulum.oftc.net solenoid.oftc.net)
[19:08] * frank_ (frank@november.openminds.be) Quit (reticulum.oftc.net solenoid.oftc.net)
[19:08] * jeffhung_ (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (reticulum.oftc.net solenoid.oftc.net)
[19:08] * DLange (~DLange@dlange.user.oftc.net) Quit (reticulum.oftc.net solenoid.oftc.net)
[19:08] * morse (~morse@supercomputing.univpm.it) Quit (reticulum.oftc.net kinetic.oftc.net)
[19:08] * trollface (~stingray@stingr.net) Quit (reticulum.oftc.net kinetic.oftc.net)
[19:08] * alexxy (~alexxy@ Quit (reticulum.oftc.net kinetic.oftc.net)
[19:08] * chraible (~chraible@blackhole.science-computing.de) Quit (reticulum.oftc.net kinetic.oftc.net)
[19:08] * wonko_be_ (bernard@november.openminds.be) Quit (reticulum.oftc.net kinetic.oftc.net)
[19:08] * jantje (~jan@paranoid.nl) Quit (reticulum.oftc.net kinetic.oftc.net)
[19:08] * Anticimex (anticimex@netforce.csbnet.se) Quit (reticulum.oftc.net kinetic.oftc.net)
[19:08] * stefanha (~stefanha@yuzuki.vmsplice.net) Quit (reticulum.oftc.net kinetic.oftc.net)
[19:08] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (reticulum.oftc.net kinetic.oftc.net)
[19:09] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[19:09] * stefanha (~stefanha@yuzuki.vmsplice.net) has joined #ceph
[19:09] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[19:09] * jantje (~jan@paranoid.nl) has joined #ceph
[19:09] * wonko_be_ (bernard@november.openminds.be) has joined #ceph
[19:09] * chraible (~chraible@blackhole.science-computing.de) has joined #ceph
[19:09] * alexxy (~alexxy@ has joined #ceph
[19:09] * trollface (~stingray@stingr.net) has joined #ceph
[19:09] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[19:09] * zoobab (zoobab@vic.ffii.org) has joined #ceph
[19:09] * frank_ (frank@november.openminds.be) has joined #ceph
[19:09] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[19:09] * jeffhung_ (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[19:09] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[19:09] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:09] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[19:09] * cephnewbie (~cephnewbi@173-24-225-53.client.mchsi.com) has joined #ceph
[19:09] * lxo (~aoliva@ has joined #ceph
[19:09] * allsystemsarego (~allsystem@ has joined #ceph
[19:09] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[19:09] * darkfader (~floh@ has joined #ceph
[19:09] * tjikkun_ (~tjikkun@195-240-187-63.ip.telfort.nl) has joined #ceph
[19:09] * Guest3488 (quasselcor@bas11-montreal02-1128536388.dsl.bell.ca) has joined #ceph
[19:09] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[19:09] * eternaleye (~eternaley@ has joined #ceph
[19:09] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[19:09] * lidongyang (~lidongyan@ has joined #ceph
[19:09] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[19:09] * josef (~seven@nat-pool-rdu.redhat.com) has joined #ceph
[19:09] * [ack]_ (ANONYMOUS@ has joined #ceph
[19:09] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[19:09] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) has joined #ceph
[19:09] * badari (~badari@ has joined #ceph
[19:09] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:09] * iggy (~iggy@theiggy.com) has joined #ceph
[19:09] * mwodrich (~Terminus@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:09] * yehudasa (~quassel@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:09] * cclien_ (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) has joined #ceph
[19:09] * midnightmagic (~midnightm@S0106000102ec26fe.gv.shawcable.net) has joined #ceph
[19:09] * __jt___ (~james@jamestaylor.org) has joined #ceph
[19:09] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:09] * Meths (rift@ has joined #ceph
[19:09] * Jiaju (~jjzhang@ has joined #ceph
[19:09] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[19:09] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[19:09] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[19:09] * MK_FG (~MK_FG@ has joined #ceph
[19:09] * maswan (~maswan@kennedy.acc.umu.se) has joined #ceph
[19:09] * macana (~ml.macana@ has joined #ceph
[19:09] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[19:10] <wido> sagewk: tnx! I have to go now anyway, np
[19:10] <wido> I see the memory usage climbing now slowly, atm the mon is at 600M RSS and climbing
[19:10] <wido> but, got to go
[19:12] * cmccabe (~cmccabe@ has joined #ceph
[19:17] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:18] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:19] * joshd (~jdurgin@ has joined #ceph
[19:28] * benpol (~benp@garage.reed.edu) has joined #ceph
[19:31] <Tv> sage seems to be in another meeting
[19:31] <Tv> so guessing daily is postponed
[19:32] <greglap> yeah, 11 every Friday, he said that last week I think
[19:32] <Tv> oh right
[19:32] * Tv fiddles with alarms
[19:37] * joshd (~jdurgin@ Quit (Ping timeout: 480 seconds)
[20:34] * joshd (~jdurgin@ has joined #ceph
[20:38] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:38] <sagelap> tv, cmccabe: are you guys coordinating? tv was looking at the daemonize stuff this morning
[20:39] <Tv> cmccabe: oh yeah i'm rewriting it right now
[20:39] <cmccabe> sagelap: I looked at the branch
[20:39] <Tv> cmccabe: what are you up to?
[20:39] <sagelap> also, don't forget cfuse calls fork() directly, so whatever pre/post daemonize stuff there is needs to be broken into separate stubs for cfuse to use
[20:39] <Tv> sagelap: yup, doing that
[20:39] <sagelap> (for the nss crap)
[20:39] <cmccabe> tv: we should coordinate a little bit better because what I'm doing now will affect it
[20:40] <cmccabe> tv: in fact, you might want to wait for just a little bit before starting on your NSS tweaks
[20:40] <Tv> bleh
[20:40] <cmccabe> tv: basically, there will be a "bool postfork" in md_config_t, and you can register a configuration observer to see when it changes from false->true
[20:41] <Tv> you're making that quite complex..
[20:41] <sagelap> cmccabe: why not just call common_postdaemonize at the fork callsites?
[20:41] <Tv> and that's not even about configuration
[20:41] <cmccabe> tv: not at all. what is complex is trying to manage that yourself
[20:41] <cmccabe> tv: that's the first step. The second step is to move daemonize out of the messenger
[20:41] <Tv> cmccabe: dude
[20:41] <cmccabe> tv: it never belonged there
[20:41] <sagelap> cmccabe: not sure i agree. there are only a few call sites
[20:41] <Tv> already done that
[20:42] <Tv> it was very easy
[20:42] <Tv> no futzing with configuration
[20:42] <Tv> no callbacks, observers, anything complex
[20:42] <sagelap> all we're changing is messenger->start(true) to messenger->start(); common_daemonize();
[20:42] <sagelap> with the option to call common_{pre,post}_daemonize() and fork() yourself
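A rough sketch of the shape being discussed: a daemonize helper built from separate pre-fork and post-fork stubs, so that a caller that forks on its own (as cfuse is said to) can call the stubs directly. Every name here (`sketch_crypto_init`, `sketch_pre_daemonize`, `sketch_post_daemonize`, `sketch_daemonize`) is hypothetical, and the `crypto_ready` flag is only a stand-in for real library state such as NSS; this is not the code from the branch.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for library state that must not be live across fork(). */
static int crypto_ready = 0;

void sketch_crypto_init(void)  { crypto_ready = 1; }
int  sketch_crypto_ready(void) { return crypto_ready; }

/* Shut down fork-unsafe state; call while still single-threaded. */
void sketch_pre_daemonize(void)
{
    assert(crypto_ready);   /* expected to be initialized before fork */
    crypto_ready = 0;
}

/* Bring the state back up in the process that keeps running. */
void sketch_post_daemonize(void)
{
    assert(!crypto_ready);  /* pre stub must have run first */
    crypto_ready = 1;
}

/* Convenience wrapper for the common case. Returns 0 in the detached
 * child and -1 on error; the parent exits and never returns. */
int sketch_daemonize(void)
{
    pid_t pid;
    int rc = 0;

    sketch_pre_daemonize();
    pid = fork();
    if (pid < 0) {
        sketch_post_daemonize();  /* still one process: restore state */
        return -1;
    }
    if (pid > 0)
        _exit(0);                 /* parent is done */

    if (setsid() < 0)             /* child: detach from the terminal */
        rc = -1;
    sketch_post_daemonize();
    return rc;
}
```

A caller with its own fork logic would call `sketch_pre_daemonize()`, do its fork, and then call `sketch_post_daemonize()` in the surviving process, which is the split sagelap asks for.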
[20:42] <cmccabe> then we're on the same page
[20:43] <sagelap> ok, except configuration observers have nothing to do with that
[20:43] <cmccabe> sagelap: the point is that certain things need to be adjusted when fork happens
[20:43] <Tv> see branch wip-nss-vs-fork-3
[20:43] <Tv> it's all done there
[20:43] <cmccabe> such as the pidfile
[20:43] <Tv> all i have left is a unit test for crypto
[20:43] <sagelap> ..which is why there's a post_daemonize stub...
[20:43] <cmccabe> and dout needs to know that daemonize happens so that it can stop trying to write to stdout, stderr
[20:44] <cmccabe> those things are best handled as observers, a framework we already have
[20:44] <Tv> but "i forked" is not configuration
[20:44] <Tv> and there's exactly one non-standard caller, cfuse
[20:44] <sagelap> and those can go in the pre/post fork stubs
[20:44] <Tv> the explicit calls there are really simple
[20:44] <cmccabe> it's messy because remember, there is no global dout any more
[20:44] <cmccabe> so which douts get the callback?
[20:45] <cmccabe> are you going to make a list
[20:45] <sagelap> keep in mind only the daemons and cfuse fork, and those all have exactly one
[20:45] <cmccabe> or you can start referencing g_conf in common/common_init.cc, but I've been trying to remove references to that for a while
[20:45] <sagelap> this never comes up for libs
[20:45] <cmccabe> sagelap: I am aware of that, but the compiler isn't.
[20:45] <sagelap> pass in g_conf or something
[20:46] <cmccabe> common_init_daemonize does take a md_config_t* argument.
[20:46] <cmccabe> see master branch.
[20:49] <cmccabe> another reason to use an observer is because there are things like the keyring
[20:49] <cmccabe> that can't be initialized until postfork is true
[20:49] <sagelap> why not?
[20:49] <cmccabe> because it relies on the crypto lib
[20:49] <cmccabe> and the crypto lib can't be initialized until postfork is true
[20:50] <Tv> there's no permanent state in keyring related to the crypto lib
[20:50] <sagelap> the wip-nss-vs-fork-3 is what i was thinking.
[20:50] <Tv> some day there might be, i'd like to go that way, to avoid redoing operations all the time
[20:50] <sagelap> it sounds like the observer stuff you're talking about is orthogonal to that simple refactoring
[20:51] <sagelap> (altho i would make the start() not have a default param value so the rename isn't needed)
[20:51] <Tv> sagelap: yeah i had no criteria to decide which way to go
[20:51] <sagelap> or just make it start() and start(nonce).
[20:51] <sagelap> or whatever, as long as they're both changed.
[20:51] <Tv> sagelap: oh i can go *back* to that
[20:52] <Tv> sagelap: but temporarily, that meant start(false) looked like a nonce
[20:52] <Tv> sagelap: i prefer to get compiler errors when i forget to change a caller
[20:52] <sagelap> are there actually any start() callers?
[20:52] <Tv> yes
[20:52] <Tv> but it seems start() is just start(0)
[20:52] <cmccabe> ok. so you guys are saying that you can initialize the crypto keyring without having the crypto lib initialized
[20:52] <Tv> so maybe that's the nicer way to go
[20:52] <sagelap> yeah. ok that's fine, more explicit anyway.
[20:52] <cmccabe> strange, but I guess I will check up on it.
[20:52] * adjohn (~adjohn@ has joined #ceph
[20:53] <cmccabe> but it seems like in the long term, this won't be true, because we'll need to keep some crypto state around, to improve performance.
[20:53] <cmccabe> also it's kind of weird to have the keyring, but not be able to use it. That seems like it will lead to bugs.
[20:53] <sagelap> cmccabe: you can init crypto, deinit prior to fork, and then reinit. that's what the pre/post fork stuff is for... bc lots of stuff needs the crypto stuff prior to fork (and obviously after too)
[20:53] <Tv> cmccabe: the crypto lib is initialized before and after the fork, it's just temporarily shut down
[20:54] <cmccabe> what a mess
[20:54] <Tv> that's NSS for you
[20:54] <Tv> Mozilla Quality(tm)
[20:54] <sagelap> there is, by design, nothing going on when we fork, so i don't think we need to worry too much about it
[20:54] <Tv> DoD Approved(tm)
[20:55] <Tv> it makes sense only if you really plan to run your crypto ops on a special hardware, plugged into PCIe/USB/whatever; then you need to re-establish your communication channel after a fork
[20:55] * adjohn (~adjohn@ Quit ()
[20:55] <Tv> NSS is horrible for software crypto (/ merely accelerated crypto)
[20:55] <cmccabe> that doesn't really make sense either
[20:55] <cmccabe> a file descriptor should be enough to talk to your hardware
[20:56] <Tv> this is the kind of hardware that has requirements like "there cannot be two users at once"
[20:56] <Tv> for maximum trust
[20:56] <cmccabe> then it should be up to the library user to ensure that
[20:56] <Tv> umm, FIPS don't work quite like that
[20:57] <Tv> each unit needs to guarantee its own security
[20:57] <sagelap> ANYway...
[20:57] <Tv> anyway
[20:57] <cmccabe> that would be easy to add
[20:57] <cmccabe> you can't spin this kind of breakage as a feature
[20:57] <cmccabe> even if I wanted breakage, I know how to do it
[20:58] <Tv> cmccabe: bitching about NSS is unhelpful
[20:58] <Tv> trust me, i was there first ;)
[20:58] <Tv> anyway
[20:59] <Tv> the changes needed to make NSS work again with daemons are in the branch; i'll do the ->start(nonce) cleanup and push that as -4
[20:59] <sagelap> k
[20:59] <cmccabe> can I please review this before you push it
[20:59] <cmccabe> it is the config code I wrote after all
[21:00] <cmccabe> now I am hearing about how you're adding all this stuff to it, potentially ugly, because you believe you absolutely have to.
[21:00] <Tv> cmccabe: it's effectively the -3 branch, with ->start_with_nonce() as ->start() and the non-arg version removed
[21:00] <cmccabe> which may be true, but I want to check it
[21:01] <cmccabe> ok. first of all
[21:01] <cmccabe> if it's not idempotent, it shouldn't go in common_preinit
[21:01] <cmccabe> common_preinit is called every time we create a library rados_cluster / ceph_mount_info
[21:02] <Tv> if (!libceph_initialized) {
[21:03] <Tv> cmccabe: it's already guarded against
[21:03] <cmccabe> that is going away
[21:03] <cmccabe> preinit's function is to create a new struct md_config_t
[21:03] <Tv> so it just shifts to inside preinit
[21:03] <cmccabe> that's fine
[21:04] <cmccabe> please be sure to use a mutex if you're using shared state
[21:04] <Tv> same thing as libceph/librados are already doing
[21:05] <Tv> just move the mutex inside preinit when it's time to refactor that
[21:05] <cmccabe> remember, that mutex is going away
[21:05] <Tv> s/going away/migrating inside preinit/
[21:05] <cmccabe> can you please just create a mutex inside preinit
[21:05] <cmccabe> I want to get as much done now as possible
[21:05] <cmccabe> rather than deferring the work until later
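A minimal sketch of the "mutex inside preinit" idea from the exchange above, in C with pthreads. The names `preinit_sketch`, `init_shared_state`, and `shared_init_count` are hypothetical and not taken from the Ceph tree; the counter exists only so the once-only behaviour can be observed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names; this is not the actual common_preinit code. */
static pthread_mutex_t preinit_lock = PTHREAD_MUTEX_INITIALIZER;
static bool shared_state_ready = false;
static int shared_init_count = 0; /* only here to make the effect observable */

/* Stand-in for non-idempotent, process-wide setup (e.g. a crypto library). */
static void init_shared_state(void)
{
    shared_init_count++;
}

/* May be called once per library handle; shared setup runs only once. */
void preinit_sketch(void)
{
    int rc = pthread_mutex_lock(&preinit_lock);
    assert(rc == 0);
    if (!shared_state_ready) {
        init_shared_state();
        shared_state_ready = true;
    }
    rc = pthread_mutex_unlock(&preinit_lock);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is defined */
    /* Per-handle work, such as allocating a fresh config struct, goes here. */
}
```

Keeping the lock and the flag private to the function's translation unit means callers such as libceph or librados would not need their own guard.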
[21:06] <Tv> sagewk: what do you want to do about the assert(did_bind) in messenger->start()
[21:06] <Tv> sagewk: now that i look at it again, that's why i did it the way i did
[21:07] <Tv> sagewk: if (nonce!=0) assert(did_bind) seemed ugly
[21:07] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[21:07] <cmccabe> I don't understand
[21:08] <cmccabe> I'm on wip-nss-vs-fork-3 and I don't see any assert(did_bind) ?
[21:08] <Tv> cmccabe: that's been in SimpleMessenger for a while
[21:09] <cmccabe> cmccabe@metropolis:~/src/ceph3/src$ grep 'assert.*bind' msg/SimpleMessenger.cc
[21:09] <cmccabe> cmccabe@metropolis:~/src/ceph3/src$
[21:10] <cmccabe> anyway, I like what you've done with SimpleMessenger::start_with_nonce
[21:11] <cmccabe> oh, this is more silly code in the header file
[21:12] <cmccabe> can't you just do something reasonable if you have no nonce?
[21:12] <cmccabe> like call srand48 or something
[21:12] * Yulya_th1_drama_queen (~Yulya@ip-95-220-143-70.bb.netbynet.ru) has joined #ceph
[21:12] <cmccabe> or call getpid
[21:13] <Tv> cmccabe: i think the 0 means things, it's not just "pick a random number"
[21:14] <Tv> because it's connected to the whole accepter bind mechanism
[21:14] <Tv> note that the existing nonces tend to all be pids, so never 0
[21:15] <Tv> it doesn't really look like a nonce in the crypto sense
[21:15] <Tv> more like endpoint id
[21:16] <cmccabe> the whole thing would make sense if the first call to bind somehow established some global "nonce" state
[21:16] <cmccabe> then subsequent calls would be allowed to not include anything for the nonce argument
[21:16] <cmccabe> however, I can't find (in a few seconds of searching), any such global state
[21:17] <Tv> huh there should be only one start call per instance of messenger
[21:18] <Tv> so i have no idea what you mean by subsequent calls
[21:18] <cmccabe> I think bind can establish a nonce
[21:18] * benpol (~benp@garage.reed.edu) has left #ceph
[21:19] <cmccabe> so you can do
[21:19] <cmccabe> bind(nonce=FOO), start(None)
[21:19] <cmccabe> or you can do
[21:19] <cmccabe> start(nonce=FOO)
[21:19] <cmccabe> but you can't just do
[21:19] <cmccabe> start(None)
[21:19] * Yulya_the_drama_queen (~Yulya@ip-95-220-174-246.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[21:19] <cmccabe> so overall, I would recommend just making the nonce a constructor parameter to the messenger, to end this silliness
[21:20] <Tv> anyway, that's unrelated to my commit ripping out the daemonize
[21:20] <Tv> we can keep cleaning it up, but that's just further cleanup
[21:21] <cmccabe> it seems fine overall
[21:21] <cmccabe> now that I understand your strategy
[21:21] <Tv> the only reason i touched the prototype of start was because i wanted to avoid ->start(daemonize) from looking like ->start(nonce) temporarily
[21:21] <cmccabe> of initializing the library, de-initializing it, calling daemon, and re-initializing
[21:22] <cmccabe> I don't really like that strategy philosophically, but assuming that's our choice
[21:22] <cmccabe> the code in the branch is all right
[21:22] * joshd (~jdurgin@ Quit (Ping timeout: 480 seconds)
[21:23] <Tv> can't do much about it
[21:23] <Tv> NSS is braindead
[21:23] <cmccabe> I would argue that we shouldn't be braindead though
[21:23] <cmccabe> we should only initialize the library after fork
[21:23] <cmccabe> or some point when we've decided fork isn't going to happen
[21:23] <cmccabe> but anyway
[21:23] <cmccabe> that is more work to do
[21:24] <cmccabe> and it sounds like the decision has been made
[21:24] <cmccabe> I already implemented daemonize inside common_init with change 4107e2966aac7ff68ec30ef6d759ca7b390efda2
[21:24] <Tv> cmccabe: sage wanted to have it available before fork
[21:24] <cmccabe> s/change/commit
[21:25] <cmccabe> so we'll have to resolve that in the merge
[21:25] <Tv> initing only after fork was my first approach
[21:25] <cmccabe> but that seems like the only complication
[21:25] <Tv> yeah should be easy
[21:25] <cmccabe> tv: I understand.
[21:25] <Tv> i'm waiting to hear back on the did_bind stuff, and trying to write a unit test, then this'll go in
[21:26] <cmccabe> my change adds an atexit to remove the pid file
[21:26] <cmccabe> rather than relying on messenger to do that
[21:26] <cmccabe> again, messenger shouldn't be fooling with pid files.
[21:26] <cmccabe> also, the signal handlers, which are separate from messenger, also handle removing the pid_file
[21:27] <cmccabe> so basically the pid file can be removed normally, atexit, or by a signal handler.
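The three removal paths described above (normal removal, atexit, signal handler) could look roughly like this in C. This is a sketch under assumptions, not the code from the commit mentioned above; `pid_file_write`, `pid_file_remove`, and `on_term_signal` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper names; not the code from the commit mentioned above. */
static char pid_file_path[4096]; /* empty string means no pid file is held */

/* Uses only unlink(), which is async-signal-safe. */
static void pid_file_remove(void)
{
    if (pid_file_path[0] != '\0') {
        unlink(pid_file_path);
        pid_file_path[0] = '\0';
    }
}

/* Remove the pid file, then re-raise the signal with its default action. */
static void on_term_signal(int signo)
{
    pid_file_remove();
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Write the pid file and arrange for cleanup on exit() and on signals. */
int pid_file_write(const char *path)
{
    assert(path != NULL);
    if (strlen(path) >= sizeof(pid_file_path))
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%ld\n", (long)getpid());
    if (fclose(f) != 0)
        return -1;
    strcpy(pid_file_path, path);
    atexit(pid_file_remove);
    signal(SIGTERM, on_term_signal);
    signal(SIGINT, on_term_signal);
    return 0;
}
```

Because `pid_file_remove` clears the stored path after unlinking, calling it explicitly and then again via atexit is harmless.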
[21:27] <cmccabe> anyway, looks good
[21:28] <Tv> saw that, will merge right
[21:30] <cmccabe> the official AWS "uh oh" summary is up
[21:32] <cmccabe> I'm having trouble plowing through the entire thing, but it seems like their recovery process for EBS was the root cause
[21:33] <cmccabe> basically a cascading failure-- the recovery process sent out a tidal wave of traffic causing more EBS nodes to fail or become unreachable
[21:34] <cmccabe> key phrase: "re-mirroring storm"
[21:35] <cmccabe> bbl, lunch
[21:42] <bchrisman> was there a call made on dirent vs dirent64? Seems dirent is directly used in the ceph code (at least client)... I can compile samba sans LARGE for now, but if it's changing soon, I'd wait for that.
[21:47] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[21:52] <Tv> bchrisman: yeah dirent with 64-bit contents seems to be the way to go
[21:52] <Tv> not sure if that'll work for samba vfs
[21:56] <bchrisman> is that implemented?
[21:56] <bchrisman> err will that make a dirent identical to a dirent64?
[21:56] <Tv> bchrisman: apparently, has been from day one, recently got an extra preprocessor stuff to ensure callers are doing it right
[21:57] <Tv> sort of yeah but it's still called struct dirent
[21:57] <bchrisman> ahh okay.. so it should cast fine then...
[21:57] <Tv> the story goes something like this: dirent64 is meant for when you need both 32-bit and 64-bit files in the same process
[21:57] <Tv> struct dirent itself will contain 64-bit offsets if you just compile with the right flag
[21:58] <bchrisman> ahh okay.. makes sense
[22:06] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has left #ceph
[22:18] * joshd (~jdurgin@ has joined #ceph
[22:21] <cmccabe> bchrisman: I would imagine that everyone with any sanity is using O_LARGEFILE now
[22:21] <Tv> cmccabe: rebased, fixed, and pushed as wip-nss-vs-fork-4, please check it out
[22:21] <cmccabe> bchrisman: if we are assuming that in libceph, maybe we should have a static_assert to ensure it
[22:22] <Tv> cmccabe: sage added a pragma check for that
[22:22] <cmccabe> tv: cool
[22:22] <Tv> which reminds me to improve the message..
[22:22] <cmccabe> tv: I wonder what happens if a non-largefile user links with a largefile library?
[22:22] <Tv> cmccabe: *kaboom*
[22:22] <Tv> hence the pragma
[22:23] <cmccabe> tv: aren't pragmas all compile-time though?
[22:23] <Tv> well not always kaboom, but kaboom in our case
[22:23] <Tv> cmccabe: sure they are
[22:23] <cmccabe> tv: oh, but the original includes of our headers would fail at compile-time
[22:23] <Tv> but that means you can't #include <libceph.h> without the right stuff
[22:23] <cmccabe> tv: k
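A compile-time guard of the kind discussed above can be sketched in C11 as follows. This is not the pragma check that was actually added to libceph.h; `off_t_bits` is a hypothetical helper, and the static assertion is a stand-in that rejects any build where `off_t` is narrower than 64 bits.

```c
/* The caller asks for 64-bit offsets; this must precede every #include. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for a check in a public header: refuse to compile
 * when off_t is narrower than 64 bits. Requires C11 for _Static_assert.
 */
_Static_assert(sizeof(off_t) * CHAR_BIT >= 64,
               "build with -D_FILE_OFFSET_BITS=64 before including this header");

/* Report the width the compiler actually chose for off_t. */
int off_t_bits(void)
{
    int bits = (int)(sizeof(off_t) * CHAR_BIT);
    assert(bits >= 64); /* redundant with the static check; a runtime reminder */
    return bits;
}
```

Since the check runs wherever the header is included, a caller built without large-file support fails at its own compile step rather than misbehaving after linking.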
[22:26] <cmccabe> tv: looks ok, except again, common_preinit should be idempotent
[22:26] <cmccabe> tv: anyway, I can add that later if you want
[22:27] <Tv> cmccabe: yeah, that comes afterwards
[22:29] <bchrisman> yeah... not certain how to get type checking to accept that if libceph is using a dirent that's 64-bit and samba is looking for explicit dirent64 when compiled with LARGEFILE.
[22:30] <bchrisman> maybe there's something I'm missing that makes samba compile 64-only and thus use dirent rather than dirent64 like ceph?
[22:30] <Tv> bchrisman: not sure.. samba people might have specific needs to do it the way they are doing it
[22:30] <Tv> bchrisman: i'll take a look in ~10 minutes
[22:31] <cmccabe> bchrisman: I guess we could replace the "struct dirent" types with a DIRENT_T macro
[22:31] <cmccabe> bchrisman: and if DIRENT_T was not defined, simply define it to "struct dirent"
[22:31] <Tv> eww ;)
[22:31] <bchrisman> The header involved is doing http://pastebin.com/460tDTnW
[22:32] <cmccabe> bchrisman: that would allow you to do #define DIRENT_T "struct dirent64"
[22:32] <bchrisman> I'm guessing I can turn off HAVE_EXPLICIT_LARGEFILE_SUPPORT
[22:32] <bchrisman> which would make it implicit... which should work..
[22:32] <bchrisman> or at least.. should compile the same as ceph...
[22:32] <Tv> bchrisman: yes that looks right
[22:33] <bchrisman> alright... thanks guys..
[22:34] <Tv> if we can't solve this samba-side, we could do
[22:34] <Tv> #ifdef _LARGEFILE64_SOURCE
[22:34] <Tv> int ceph_readdir64_r(struct ceph_mount_info *cmount, struct ceph_dir_result *dirp, struct dirent64 *de);
[22:34] <Tv> #endif
[22:34] <cmccabe> well, it looks like sys_fseek will use fseek unless HAVE_EXPLICIT_LARGEFILE_SUPPORT is defined
[22:34] <Tv> but that leads to even more crap about matching compilation env of ceph with compilation env of callers
[22:34] <cmccabe> fseek has this prototype: int fseek(FILE *stream, long offset, int whence);
[22:34] <cmccabe> so, not going to work... on 32-bit systems at least
[22:35] <Tv> fseeko
[22:36] <cmccabe> gesundheit
[22:36] <bchrisman> we don't call all the way down to sys_fseek in samba/vfs/ceph.. the vfs layer for ceph is kinda 'final destination/backstop'
[22:36] <cmccabe> I guess we can just provide a second function that has a struct dirent64 in libceph.h
[22:36] <cmccabe> and do the typecast inside libceph.cc
[22:37] <cmccabe> I mean, we are guaranteeing that the typecast will work, basically
[22:37] <cmccabe> so we might as well make that a formal part of the API
[22:37] * verwilst (~verwilst@dD576FAAE.access.telenet.be) has joined #ceph
[22:37] <bchrisman> there are a couple other types as well probably.
[22:37] <Tv> cmccabe: except that's a pain to support; how do you cope with all combinations of _LARGEFILE64_SOURCE defined/not defined when compiling libceph and the caller
[22:37] <cmccabe> bchrisman: yeah, but that sys_fseek thing was just one example
[22:38] <cmccabe> tv: we don't need an ifdef there
[22:38] <Tv> cmccabe: you don't have dirent64 with it
[22:38] <Tv> err, *without
[22:38] <bchrisman> so far, I'm mapping/wrapping all vfs operations to libceph (or dropping certain ones for now with ENOTSUP)
[22:39] <Tv> i guess force ceph itself to only compile with the define (-> makes it less portable) etc
[22:39] <Tv> i think we should stay away from struct dirent64
[22:39] <cmccabe> tv: so #ifdef _USE_LARGFILE
[22:39] <cmccabe> <prototype>
[22:39] <cmccabe> #else
[22:39] <cmccabe> #error "compile with OLARGEFILE"
[22:39] <cmccabe> #endif
[22:39] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:40] <Tv> cmccabe: 1) __USE_* are glibc internal 2) that makes it less portable
[22:40] <Tv> it's like requiring both bsd and sysv at once everywhere
[22:40] <Tv> usually you pick just one
[22:40] <cmccabe> tv: it seems to me that the whole pragma discussion makes this a moot point
[22:41] <cmccabe> tv: we can't ever be compiled without 64-bit dirent support
[22:41] <Tv> well now we're requiring the struct dirent is 64 bits flavor, which is more common
[22:41] <Tv> there's two different ways of doing it
[22:41] <cmccabe> tv: so we just support both ways, and expose whatever prototypes are appropriate
[22:41] <cmccabe> tv: otherwise the complexity is pushed into the library users, which seems unfair
[22:41] <Tv> which makes it less portable
[22:42] <cmccabe> tv: so if you're running this on your S/390 running iphone OS, you just compile with different flags, and the library doesn't have any prototypes with struct dirent64
[22:42] <cmccabe> tv: what's the big deal
[22:43] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[22:43] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:44] * alexxy[home] (~alexxy@ has joined #ceph
[22:44] <Tv> if you make it conditional, you need to cover more ground
[22:45] <Tv> like compiling ceph without that, but client with it
[22:47] <cmccabe> having some extra, unused type names doesn't break the ABI
[22:47] <cmccabe> your user program that doesn't know what "struct dirent64" means can link against your libceph that does have a function using that somewhere in the symbol table
[22:48] <Tv> i was not talking about that case
[22:48] <cmccabe> what's the case that would not work then
[22:49] <Tv> Tv: like compiling ceph without that, but client with it
[22:50] <cmccabe> tv: ok, so you get a linker error
[22:50] <cmccabe> tv: then you examine the source, see the ifdef, and run configure + make again
[22:52] <cmccabe> anyway, I'm going back to obsync
[22:52] <cmccabe> I'm sure you guys will figure out a way of fitting that round peg in the square hole
[22:53] <cmccabe> :)
[22:59] <Tv> bchrisman: where's the code in the pastebin from?
[23:00] * joshd (~jdurgin@ Quit (Ping timeout: 480 seconds)
[23:00] <Tv> ah i guess you're working on samba 4
[23:01] <bchrisman> Tv: 3.5.6 source 3
[23:01] <bchrisman> err source3/include/includes.h
[23:02] * joshd (~jdurgin@ has joined #ceph
[23:02] <Tv> ah i just apt-get source'd and that's just 3.5.4, it didn't have that in it
[23:02] <Tv> grabbing upstream git repo..
[23:06] <bchrisman> I think what confused me was the configure option there: --disable-largefile omit support for large files
[23:06] <bchrisman> which in retrospect must have an 'explicit' on it..
[23:06] <bchrisman> I don't think they're going to be compiling 32-bit file support on an otherwise 64-bit machine...
[23:09] <Tv> i don't see that configurability
[23:09] <Tv> i'm probably still looking at a different version of samba
[23:10] <Tv> i looked at both the 3.5 and 3.6 stable branches in their git
[23:10] <bchrisman> hmm.. our version has hacks from sernet which people here contracted to perform some gpfs-related samba stuff on our code.
[23:11] <bchrisman> might've been something they distribute by default, even though I don't see how it'd have anything to do with much.
[23:11] <Tv> checked out their tag release-3-5-6, just one instance of lowercase "largefile" in the tree and that's in docs
[23:11] <bchrisman> it's in 3.6.0 git
[23:12] <bchrisman> I haven't backtracked it from there though.
[23:13] <bchrisman> yeah.. 3.6.0 samba source... source3/include/includes.h
[23:13] <Tv> looking at tag release-3-6-0pre3 which is the latest in branch v3-6-stable
[23:14] <Tv> i see the part you pastebinned, but not the --disable-largefile
[23:14] <Tv> unless the option is created by some magic that doesn't have the word "largefile" in it
[23:14] <Tv> not in ./configure --help output either
[23:16] <Tv> HAVE_EXPLICIT_LARGEFILE_SUPPORT seems to be set if off_t is large enough, or off64_t is available, either one is good enough
[23:17] <bchrisman> http://pastebin.com/cRX4nztw
[23:17] <bchrisman> yeah.. maybe they're removing that..
[23:18] <bchrisman> in which case.. latest would work and this problem becomes moot.
[23:18] <Tv> well
[23:18] <Tv> the way i read it
[23:18] <Tv> they're using struct dirent64
[23:19] <Tv> whenever that is possible, yup, that's what it looks like
[23:20] <Tv> they'll only fall back to 64-bit struct dirent if the OS doesn't have struct dirent64
[23:20] <Tv> which means that libceph isn't being friendly to you
[23:21] <Tv> the simplest thing you could do is probably something like....
[23:22] <Tv> well, if you wanna play it risky, just cast it
[23:22] <bchrisman> feh.. yer right.. I would have to hide dirent64 completely (HAVE_STRUCT_DIRENT64)
[23:22] <Tv> if you wanna be safe
[23:22] <Tv> void copy_dirent(struct dirent *in, struct dirent64 *out) {
[23:22] <Tv> out->d_ino = in->d_ino;
[23:22] <Tv> out->d_off = in->d_off;
[23:22] <Tv> ...
[23:22] <Tv> }
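Filling in the elided part of the sketch above, a complete field-by-field copy might look like this in C. It assumes a glibc-style `struct dirent` and `struct dirent64` with the five fields d_ino, d_off, d_reclen, d_type and d_name; the `const` qualifier and the name handling are additions for illustration, not part of the original sketch.

```c
/* struct dirent64 is only declared by glibc when this is defined first. */
#define _LARGEFILE64_SOURCE

#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Field-by-field copy; assumes a glibc-style layout with d_off and d_type. */
void copy_dirent(const struct dirent *in, struct dirent64 *out)
{
    assert(in != NULL && out != NULL);
    out->d_ino = in->d_ino;
    out->d_off = in->d_off;
    out->d_reclen = in->d_reclen;
    out->d_type = in->d_type;
    /* snprintf truncates if needed and always NUL-terminates */
    snprintf(out->d_name, sizeof(out->d_name), "%s", in->d_name);
}
```

The copy makes no assumption that the two structs share a layout, which is the safety argument for preferring it over a cast.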
[23:22] <cmccabe> tv: I don't see how the cast is risky
[23:23] <cmccabe> tv: the compile should fail if the cast fails
[23:23] <cmccabe> or rather if the cast would fail
[23:23] <Tv> bchrisman: i don't see anything that would let you configure HAVE_STRUCT_DIRENT64, it's just detected
[23:23] <bchrisman> hmm.. may not even know what 'dirent' is itself.
[23:23] <Tv> cmccabe: eh? they're two independent structs that have their fields of identical type & order by luck
[23:23] <bchrisman> cast it inside libceph?
[23:23] <Tv> cmccabe: nothing guarantees that
[23:24] <Tv> bchrisman: why wouldn't you know what dirent is..
[23:24] <cmccabe> I'm having trouble finding dirent64 in the man pages
[23:25] <Tv> bchrisman: especially, how would you not know what to expect of dirent, but still know what to expect of dirent64..
[23:26] <Tv> dirent.h defines them pretty clearly (see bits/dirent.h for the actual definition)
[23:26] <Tv> it has d_ino, d_off, d_reclen, d_type and d_name
[23:27] <cmccabe> I see the header, but no documentation about when/whether it is safe to assume they're identical
[23:27] <Tv> cmccabe: never
[23:29] <bchrisman> yeah... defined easily enough in dirent.h... no samba macro for 'the other dirent'..
[23:29] <cmccabe> is there a separate 64-bit version of the readdir syscall?
[23:31] <bchrisman> afaict, samba operations table only has one version of each call.
[23:31] <bchrisman> yeah
[23:31] <cmccabe> looks like there is a struct linux_dirent and a struct linux_dirent64 in the kernel
[23:32] <Tv> cmccabe: readdir64
[23:32] <Tv> or just readdir with -D_FILE_OFFSET_BITS=64
[23:33] <Tv> bchrisman: so you might have to convert; the function a sketched earlier is 8 lines total
[23:33] <Tv> *i sketched
[23:34] <cmccabe> we're not concerned about the number of lines, but about the inefficiency
[23:34] <cmccabe> honestly, I think the best solution is to add some static asserts plus a typecast
[23:34] <Tv> make it work first
[23:34] <Tv> benchmark
[23:35] <cmccabe> you can add a static assert mandating that the same field appears at the same offset
[23:35] <Tv> never bother to fix it because it was way faster than a lot of the crap in ceph messenger
[23:35] <cmccabe> STATIC_ASSERT(offsetof(struct foo, a) == offsetof(struct bar, a))
[23:35] <cmccabe> tv: sigh
[23:35] <Tv> most of all
[23:35] <Tv> don't put that in ceph
[23:36] <Tv> or you'll need to support it ;)
[23:36] <cmccabe> #define STATIC_ASSERT(x) (sizeof(int[((x)==0) ? -1 : 1]))
[23:37] <cmccabe> I don't think the fields of struct dirent are changing any time soon.
[23:37] <cmccabe> you can add a static assert that each field in dirent matches each field in dirent64
[23:37] <cmccabe> it's work, but just work for the compiler
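The "static asserts plus a typecast" proposal could be sketched in C11 as below, assuming glibc-style definitions of both structs. `SAME_FIELD` and `as_dirent64` are hypothetical names. The assertions only prove that offsets and sizes agree on the build platform; the compiler still treats the two as distinct types, so the strict-aliasing concern behind the "never" answer above is not addressed by them.

```c
/* Both must precede every #include; the first exposes struct dirent64 on glibc. */
#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Hypothetical macro: compilation fails if the structs disagree on a field. */
#define SAME_FIELD(f) \
    _Static_assert(offsetof(struct dirent, f) == offsetof(struct dirent64, f) && \
                   sizeof(((struct dirent *)0)->f) == sizeof(((struct dirent64 *)0)->f), \
                   "struct dirent and struct dirent64 differ in " #f)

SAME_FIELD(d_ino);
SAME_FIELD(d_off);
SAME_FIELD(d_reclen);
SAME_FIELD(d_type);
SAME_FIELD(d_name);
_Static_assert(sizeof(struct dirent) == sizeof(struct dirent64),
               "struct dirent and struct dirent64 differ in size");

/* Reinterpret without copying; only the layout has been checked above. */
struct dirent64 *as_dirent64(struct dirent *de)
{
    assert(de != NULL);
    return (struct dirent64 *)de;
}
```

On a platform where the layouts diverge, the build stops at the failing `SAME_FIELD` line instead of producing a silently wrong cast.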
[23:37] <bchrisman> hmm.. maybe something I have to include other than <dirent.h>? (modules/vfs_ceph.c:228: error: 'dirent' undeclared (first use in this function))
[23:38] <bchrisman> I gotta get my git repo up to github...
[23:38] <cmccabe> I'm confused how that could occur... bits/dirent.h defines struct dirent unconditionally
[23:38] <Tv> bchrisman: that's odd
[23:39] <bchrisman> yeah.. something else must be wrong...
[23:40] <Tv> this compiles just fine:
[23:40] <Tv> #include <dirent.h>
[23:40] <Tv> struct dirent foo;
[23:40] <Tv> perhaps you said dirent not struct dirent...
[23:40] <Tv> bchrisman: ^
[23:40] <bchrisman> aye... yer a mindreader Tv :)
[23:40] <bchrisman> sloppy...
[23:41] <bchrisman> (on my part)..
[23:41] <cmccabe> really, BSD handled the 64-bit offset transition a little better than linux
[23:42] <cmccabe> they just had a flag day when off_t changed
[23:42] <Tv> if you call that "better"..
[23:42] <cmccabe> off64_t and struct dirent64 never existed for them
[23:42] <Tv> oh don't blame that on linux
[23:42] <Tv> that was a POSIX subcommittee
[23:42] <cmccabe> I'm sure it made it easier for old buggy software to limp along
[23:43] <Tv> much like with many things SysV popularized or POSIX standardized, Just Don't Use It
[23:43] <cmccabe> and maybe it was necessary
[23:43] <cmccabe> but I still feel like if a compatibility mode was going to be created, it should be something that needs to be changed in the buggy software
[23:43] <cmccabe> so why not create off32_t
[23:43] <cmccabe> etc
[23:44] <cmccabe> or D_FILE_OFFSET_32
[23:44] <cmccabe> well, whatever. It's done now. I'm sure there were many flames about it over the years :)
[23:46] <Tv> cmccabe: must be nice living in the ideal world
[23:46] <Tv> the whole point of the LFS(ummit) was to agree on a way to transition nicely
[23:47] <Tv> yes, everything should be -D_FILE_OFFSET_BITS=64 these days; yes, please stop using the old apis; yes, in a decade or so they might (just might!) be ripped out

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.