Thursday, January 31, 2013
Notes on FastCGI and webservers
This post is a distillation of what I've learned over the past couple of months. There's both new information here, and links to everything else on the FastCGI topic that I've written so far.
Wednesday, January 30, 2013
Minimal, Working Perl FastCGI Example, version 2
This is an update to a previous post. File layout remains the same: "site" is a placeholder for the actual site name, and /home/site/web is the actual repository of the project. Static files then appear under public, and Perl modules specific to the site in lib/Site (i.e. visible in Perl as Site::Modname when lib is put in @INC). I am still using mod_fcgid as the FastCGI process manager.
The major improvement: This version handles FCGI-only scripts which have no corresponding CGI URL. I discovered that limitation of the previous version when I tried to write some new code, where Apache or mod_fcgid realized that the CGI version didn't exist, and returned a 404 instead of passing it through the wrapper. As a consequence of solving that problem, FcgidWrapper is no longer necessary, which gives the dispatch.fcgi code a much cleaner environment to work in.
Everything I liked about the previous version is preserved here: I can create Site/Entry/login.pm to transparently handle /login.pl as FastCGI, without requiring every other URL to be available in FastCGI form. It also stacks properly with earlier RewriteRules that turn pretty URLs into ones ending in ".pl".
Apache configuration:
    # Values set via SetEnv will be passed in the request;
    # to affect Perl startup, it must be FcgidInitialEnv
    FcgidInitialEnv PERL5LIB /home/site/web/lib
    RewriteCond /home/site/web/lib/Site/Entry/$1.pm -f
    RewriteRule ^/+(.+)\.pl$ /home/site/web/dispatch.fcgi [L,QSA,H=fcgid-script,E=SITE_HANDLER:$1]
    <Directory /home/site/web/fcgi>
        Options ExecCGI FollowSymLinks
        # ...
    </Directory>
Again, the regular expression of the RewriteRule is matched before RewriteCond is evaluated, so the backreference $1 is available to test whether the file exists. This time, I also use the environment flag of the RewriteRule to pass the handler to the dispatch.fcgi script. Since I paid to capture it and strip the leading slashes and extension already, I may as well use it.
That means the new dispatch.fcgi script doesn't have to do as much cleanup to produce the module name:
    #!/home/site/bin/perl
    use warnings;
    use strict;
    use FindBin qw($Bin);
    use CGI::Fast;  # provides CGI::Fast->new; harmless if Site::Preloader already loads it
    use Site::Response;
    use Site::Preloader ();
    while (my $q = CGI::Fast->new) {
        # SITE_HANDLER is set by the E= flag of the RewriteRule
        my ($base, $mod) = ($ENV{SITE_HANDLER});
        $base =~ s#/+#::#g;      # path separators become package separators
        $base =~ s#[^\w:]##g;    # drop everything except word characters and colons
        $base ||= 'index';
        $mod = "Site::Entry::$base";
        my $r = Site::Response->new($base, "$Bin/templates");
        eval {
            eval "require $mod;"
                and $mod->invoke($q, $r);
        } or warn "$mod => $@";
        $r->send($q);
    }
I remembered to include the $r->send call this time. I pass the CGI query object so the response can call $q->header. That's not strictly necessary: FCGI children process one request at a time and copy $q to the default CGI object, meaning header should work fine alone, but I didn't know that yet.
I also remove non-{word characters or colons} from the inbound request for security, since my site uses URLs like /path/somereport.pl. You may need to carefully adjust that for your site.
Site::Response is initialized as a generic error so that if the module dies, the response written to the client is a complete generic error. Otherwise, the template is selected and data set, so the send call ships the completed page instead.
The only thing left that I'd like to do is make this configuration more portable between web servers instead of dependent on Apache's mod_rewrite and mod_fcgid, but since Apache isn't killing us at work, it probably won't happen very soon.
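If you want to check how the handler cleanup treats a given URL before adjusting it for your own site, the substitutions can be pulled out and run on their own. This is only a sketch: handler_to_module is a hypothetical helper name that does not exist in the site code, and the sample inputs are made up. It applies the same three steps as the dispatch loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original dispatch.fcgi): turns the
# SITE_HANDLER value captured by the RewriteRule into a module name, using
# the same substitutions as the dispatch loop.
sub handler_to_module {
    my ($base) = @_;
    $base = '' unless defined $base;   # assumption: treat a missing handler as empty
    $base =~ s#/+#::#g;                # path separators become package separators
    $base =~ s#[^\w:]##g;              # drop everything except word characters and colons
    $base ||= 'index';                 # empty handler falls back to the index module
    return "Site::Entry::$base";
}

# Made-up sample inputs, including one with characters that get stripped.
for my $handler ('login', 'path/somereport', 'a//b;rm', '') {
    printf "%-20s => %s\n", "'$handler'", handler_to_module($handler);
}
```

Running it for the URLs your site actually serves is a quick way to see whether the character filter is too strict (or too loose) before touching the live configuration.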
Monday, January 28, 2013
mod_fcgid and graceful restarts
I see plenty of this in my logs when the server needs to be reloaded to pick up fresh Perl:
(43)Identifier removed: mod_fcgid: can't lock process table in pid 3218
tl;dr: this appears to be harmless in practice.
The leading portion corresponds to EIDRM (see errno(3)) which comes back out of pthread_mutex_lock and cheerfully gets logged as the failure code of proctable_lock_internal. The proctable is in turn locked during request handling.
My best guess for the order of events is that the Apache parent receives a graceful restart, unloads and reloads mod_fcgid, which destroys the mutex as a side effect. After old-generation children tie up their requests, they try to notify their parent that they're available again, only to discover that the mutex is gone. The child then exits, but it doesn't hurt any clients because they've already been served at this point.
This problem is not fixable in Apache 2.2 because there aren't any hooks for graceful-restart. It just unloads DSOs without warning, and their first clue anything happened is that they start receiving config events. By then, the mutex and process table are gone, so the newly-loaded master can't communicate with old-generation children. Someone did make an attempt to fix this for 2.4 (along with modifying mod_cgid to test their infrastructure) but AFAICT nobody has made this available in mod_fcgid for 2.4 yet.
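For anyone who wants to decode the numeric prefix of a log line like the one above on their own box, here is a small Perl sketch. errno_info is a hypothetical helper name of mine, and the value 43 is simply copied from the log line. Errno numbering and message text vary by platform, so treat whatever it prints as specific to the machine it runs on.

```perl
use strict;
use warnings;
use Errno ();   # loading Errno enables the %! hash

# Hypothetical helper: given a numeric errno, return the symbolic names
# that match it on this platform plus the system's message text.
sub errno_info {
    my ($code) = @_;
    local $! = $code;            # $! is a dualvar: number and message
    my $message = "$!";
    # $!{NAME} is true only when $! currently equals that errno; re-set $!
    # on each pass in case anything in between disturbed it.
    my @names = sort grep { $! = $code; $!{$_} } keys %!;
    return (\@names, $message);
}

# 43 is the code from the "(43)Identifier removed" log line.
my ($names, $message) = errno_info(43);
printf "errno %d: %s (%s)\n", 43, $message,
    join(', ', @$names) || 'no symbolic name on this platform';
```

On a Linux box like the one that produced the log, I would expect the symbolic name to come back as EIDRM, matching the explanation above, but the code does not assume that.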
Friday, January 11, 2013
Fun, Work, Puzzles, and Programming
Some programming tasks are just more fun than others. The same thing extends to languages—why are Perl and Ruby so much more fun to work with than Python?
I suspect that the answer lies in the scope of the solution space, in a sweet spot between “too straightforward” and “too complex.”
Wednesday, January 9, 2013
PHP's debug_backtrace: a compact guide
Every time I need to use this function, I can't remember how it works.
- The array includes all call sites leading up to the current stack frame, but not actually the current one. (Everything in the current frame is still in scope to you, so you can use __FILE__ and __LINE__ or your current variables directly.)
- The array is indexed with 0=innermost to N=outermost frame.
- Each array index gives you information related to the call site of the next frame inward / earlier in the array. That is, $bt[0] gives you the immediate caller of your current point of execution. $bt[0]['function'] refers to the function or method invocation that called you, e.g. if the main code executes foo(1), then inside function foo, $bt[0]['function'] is foo. The file and line point to the file/line containing the call.
- When a 'class' key is present, it is the class whose code actually runs for the call, i.e. what __CLASS__ evaluates to inside the called function (not at the call site given by 'file' and 'line').
- When an 'object' key is present, it has the actual object being used for dispatch; i.e. get_class($bt[$i]['object']) may return either the same value as 'class', or any descendant of that class.
- The 'type' key, when present, is either -> or :: for dynamic or static calls, respectively. The latter means that the 'object' key won't be set.
- There is no way in my PHP (5.3.3-14_el6.3 from CentOS updates) to view the invoked class of a static call, e.g. if SubThing::foo is called but Thing::foo is executed because SubThing didn't override foo. Per the rules above, 'class' will still report Thing.
    <?php
    function carp () {
        $msg = func_get_args();
        if (empty($msg)) $msg = array('warned');
        $bt = debug_backtrace();
        // find nearest site not in our caller's file
        $first_file = $bt[0]['file'];
        $end = count($bt);
        for ($i = 1; $i < $end; ++$i) {
            if ($bt[$i]['file'] != $first_file)
                break;
        }
        if ($i == $end) {
            // not found; try the caller's caller.
            // otherwise we're stuck with our caller.
            $i = ($end > 1 ? 1 : 0);
        }
        error_log(implode(' ', $msg) .
            " at {$bt[$i]['file']}:{$bt[$i]['line']}");
    }
Obviously this is a bare-bones approach, and could be adapted to pick different (or report more) stack frames, etc. But, it Works For Me.™