I often am frustrated by the crappiness of UI APIs in the scripting languages I have available. But I have found writing web-apps in PHP to be mostly quite pleasant, despite the fact I hate HTML. Obviously I can't reasonably run all my scripts on Dreamhost's systems, some things only make sense here. But to run CGI programs on my own system needs a web server. I'm fairly sure I tried webfsd for this in the past (which I still sometimes use for quickly transferring files across our LAN, though it has... issues...), but IIRC its CGI support didn't work for me. Meh. I think I successfully used CGI on Boa years ago, but I uninstalled that because it ran as a system-wide webserver, which I didn't really want.
Lots of people have made "small, lightweight fast simple web server written entirely in bla bla bla", with few obvious benefits over the 5000 other ones. I may or may not join them. However if I do, what I basically want is an httpd that:
- Limits where clients connect from, maybe to just localhost, possibly configurable to include an IP address range (eg for a LAN). It should not be intended for the public internet, at all.
- Is intended to run CGIs, and supports at least ordinary CGI, maybe also FCGI and/or SCGI as well. This is simply the raison-d'etre of the damn thing.
- Does little if anything that is clever in terms of webserving. It probably shouldn't bother with content negotiation, nor virtual hosts, etc. Although ok I guess virtual hosts could be handy- I could edit the /etc/hosts file and get localhost to masquerade as some remote site, configure the server to give that hostname special treatment, and log requests or somesuch from mystery flash apps. But maybe that would be better handled with a different server anyway I imagine, and it would probably fall foul of the server's other limitations. So this is probably superfluous, although it would probably be easy too, if only supporting a small set of hosts.
- Supports files in the document root ONLY, to avoid the security issues of directory traversal, which I don't really know enough about to be comfortable with. Although if it's only for localhost it should be of limited importance.
- No tilde-expansion, it's not meant for running privileged system-wide, it's for individual users to run their own copies. So per-user homepage support is irrelevant.
- No SSL, TLS, or IPv6. Not interested. I imagine for making sufficiently secure web-apps, https would be very helpful, but then there's that crap about certificates. Meh. There's local CAs, but then the certs generally need to be installed on the browser to make things better. Umm.. I dunno. *undecided*
- Don't support Basic Auth, probably not even Digest Auth. They are broken by design pretty much. Possibly they'd be almost useful in conjunction with https, but so would be a CGI-based auth system, and it's much better to keep the server simple.
- MUST support GET, POST, and COOKIES. PUT is sort of irrelevant I think.
- Binds to a non-special port, or maybe even works via socket?! Yes, in fact it could AFAICT do that quite easily. I wonder how that differs to using inetd, apart from that inetd is system-wide. Anyway, the point is it'd use anonymous ports like 89123 or something (except that's not a real port IIRC, as port numbers stop at 65535), to neither be obvious nor need privileges. (addendum: no, can't use socket for this, for the simple reason that socket doesn't pass on what IP is connecting, it only prints that to stderr. Possibly that would be redirectable, but sod it, not worth the effort I suspect)
- Undecided what it'd be written in. Obvious choices include C, Perl, PHP (except I don't have it at home), or... SED?! Ok not really SED, I would almost certainly need it to support *bidirectional* IPC, and it only does unidirectional. And really it would be quite painful I'm sure. What about... Awk? ;) Gawk apparently does do bidirectional IPC, but I don't really know it. Ahem. M4 is out of the question, not gonna touch that again with a bargepole after the pain of MMSS. Don't fancy Lua. Don't know Icon well enough, and had problems with it before, but it might be fun to do CGI itself with. I think I should limit myself to obvious choices unless something else springs to mind. See what's involved.
- chroot might be a good thing, although I imagine that's rather incompatible with running CGI???
- Have I mentioned don't bother doing directory-listing? That should be handled by an external app, if desired, eg an actual CGI. However the facility for mapping / onto either specific filenames or scripts or whatever, is very worthwhile, and would be the main basis for achieving directory listings if wanted. Furthermore the same mechanism as silently mapping / onto other URLs (rather than redirecting) could be expanded usefully to support general URL rewriting like in Apache's htaccess files with mod_rewrite. I think there's some good applications for this like making php-generated media look like ordinary media files rather than php files, which some browsers might prefer.
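Two of the requirements above (only accepting clients from localhost or a LAN range, and only serving files under the document root) are easy to sketch. This is a minimal illustration in Python rather than a decision about the implementation language; the allow-list contents and the function names `client_allowed` and `resolve_in_docroot` are made up for the example.

```python
import ipaddress
import os

# Hypothetical allow-list: loopback plus one example LAN range.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/24"),
]

def client_allowed(peer_ip, networks=ALLOWED_NETWORKS):
    """Return True if the connecting address is inside an allowed network."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in networks)

def resolve_in_docroot(docroot, url_path):
    """Map a URL path to a filesystem path, or None if it escapes docroot.

    The URL path is assumed to be percent-decoded already.
    """
    root = os.path.realpath(docroot)
    # realpath collapses ".." components and follows symlinks, so the
    # comparison below is made on the path that would really be opened.
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

The peer address would come from whatever `accept()` returns on the listening socket, which is exactly the information the socket program doesn't pass on.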
existing small webservers
There's tons of the buggers. But frankly many of them look comparable in size to Apache, which I'm presuming is the canonical "big" webserver. (Actually "Apache common" package seems to be fairly big, but it's still not gargantuan).
Actually, on further inspection it seems much of the reported "installed size" of Debian packages is probably from directories and subdirectories- even when those directories are shared with other packages, eg /usr/ etc, because they still have to declare those. And of course there's still things like documentation (I should bloody hope). So many of these may be a lot smaller than that figure would suggest. Anyway, a list of ones I found, by no means exhaustive:
- thttpd: Tiny/Turbo/Throttling Httpd. Seems a bit fancy maybe.
- BozoHttpd: Purports to be the sort of thing I'm after, very limited feature-set. It doesn't look limited enough to me, although it does seem pretty small. And I *am* still undecided about whether https might be desirable after all.
- MiniHttpd: Purports to be a small webserver, it seems moderately so. Wanted features it has: CGI, GET/POST/HEAD (do I want the latter? well maybe), and trailing-slash redirection. Undecided features: https and virtual hosting. Unwanted features: directory listing (should be done by external handlers), basic auth, IPv6
- Mathopd(?!): "Very small, yet very fast HTTP server". It may be very fast, but it doesn't seem noticeably smaller than many of these other packages. It supports CGI, and some fancyish features I don't much care about. It's probably meant as a system-wide server. (addendum, it seems that the Debian "installed size" for packages can be a little inflated, I think it takes directories into account, even though many of the directories would be shared; anyway, the "installed size" may be around 200K, but the binary is around 60K. That is rather small, but it's far from the smallest)
- Fnord is "yet another small httpd", and yes it is. "Installed size" is around 170k, but the binary, as it claims, is 15k. However, there are 2 other binaries- "fnord-cgi" and "fnord-idx"; I imagine the latter is for directory listings, I'm sorta ok with that as it's done as a separate program :D Really it should be a user-supplied program though. Seems rather feature-rich, but runs from an inetd type superservery thing. Doesn't do https AFAICT.
- Monkey, is a "fast, efficient, small and easy to configure web server". It is pretty small, 43k binary and 170k installed-size. It may be easy to configure, but probably not for our purposes of running as normal users only. It seems to make use of config files in /etc, not sure if that'd be overrideable. It also appears to be fairly new, it's not in Debian/Stable yet.
- Lighttpd is a well known "light" webserver, but the description seems to claim it as small *memory footprint*. The installed size is something like 700k, and I suspect it's only designed for running system-wide. Supports CGI, FastCGI, SSI, URL rewriting, and various features I don't need.
- Micro-httpd, "really small HTTP server"; Yes, it really is. Installed size of just 60k, binary is 9k. Runs from inetd, but also is associated with Micro-inetd, which apparently is designed for ordinary-user one-off type use! (Even if micro-httpd may not be; I think the point is that it'd be adaptable to that). Apart from being tiny, it features directory/ redirection and some other things... I'm not clear if it does CGI though?! It also claims it can do https "by wrapping it in stunnel", of which I'm a little skeptical. Has code spent on directory listings, and if it's via an inetd, I guess I wouldn't be able to block traffic by IP, except through the inetd's mechanisms. Or would it? I'm not very clear on inetd's interface... This is also associated with Micro-proxy, an http/https proxy. I'm fairly sure this doesn't give the https support to micro-httpd though.
- sh-httpd; even smaller httpd, written in bash. *b0rk* I think it is meant to run from inetd, though it's not very clear. It needs some program I don't have, "getpeername", which I presume is part of a toolset for writing inetd-based servers?? Mostly I guess it'd serve as an example, but I dunno.
- xs-httpd; another small httpd that supports CGI and SSL. Haven't really looked at it though.
- Anti-Web Httpd, seems to be a simple server that can be told to be complex, but it sounds like you have to put it into "complex" mode to get CGI support, or something. I didn't understand it too well (very tired when writing this one up). It also sounded as though you get to register one CGI extension and one CGI handler/interpreter, so you can't do mixed language scripting (unless it's via bash :P ); OTOH cgi scripts can be in any directory at least, not that that's unheard of.
Note some of my "requirements" when applied to pre-existing servers, are more like preferences/prejudices. The main thing is simply that I want something to run only on a per-user basis as needed, rather than permanently system-wide on port 80, I want it secure, and I want to be able to run my own bespoke web-apps on it. Smallness is a virtue but it needn't be the absolute smallest. Minimal featureset is a virtue for avoiding configuration complexity and worries about whether X Y and Z work.
Other things
- Cherokee is meant to be fast, and flexible/modular, which is good, but it isn't small, and it's probably only designed to be site-wide.
- "thy" is described as "tiny" but has an installed-size of almost half a meg. Not huge, but not tiny. I get the impression it's been abandoned too maybe??
Increasingly random stuff
TODO: move some of these project links elsewhere
- Libcgi-ssi-perl: process SSI stuff from in Perl CGI scripts. I'm not very clear on this. However, it would be neat to have a primitive SSI-like language for the server, for quick+dirty stuff. Maybe add some extensions of my own to it. That however, probably would not be done through this module, unless perhaps I were writing the server in Perl.
- ptunnel, tunnel TCP connections over ICMP (ping) packets. WTF?! But cool, unless you're a corporate firewall admin.
- SoCat ("Socket Cat"); a bit like socket or netcat, but I'm not very clear on how it is supposed to be better. Hmm actually it can do not only SSL but UDP? That's rather something, it rather covers that program I wanted to write, if so. *interested* (Update, this is really very cool, I could probably write a page on it, if only to get the documentation in a more digestible form than the huge long man page)
- libwrap0, aka TCP Wrappers Library. Already have this (but not the -dev library nor docs), it is sort of part of TCPD, which is in turn effectively like part of inetd (in that you tend to use the two things together, but I may be confused). I get the impression this would be useful in writing servers, especially when you want to limit access.
- libmatrixssl, a small SSL library optimised for embedded systems. Might be handy for making an https server! OTOH OpenSSL and libgcrypt etc etc are commonly available too on non-embedded systems.
- "dammit", an "Unbelievably inflexible build tool". Their words, not mine!! Something to look into in my contemplations of Make replacements.
- Jam, another Make replacement, as used by some projects. I think the FreeType project have their own version that they maintain. Sounds moderately powerful although to be honest the name puts me off.
- lolcat *ahem*
- Mowyw Writes Your Websites, an offline site-management type system similar to MMSS, but newer.
- "important alert" *ahem*
- STunnel, that encryption thing proposed for wrapping some servers to give effective https support. (Still not sure if that is true, but there it is to investigate)
- Zzuf, a "multi-purpose fuzzer", handy for testing programs (eg webservers!) against iffy/malformed input. Probably not an adequate substitute for methodical testing, but a very good supplement to it I'm sure.
TODO: Find out how SCGI works as that sounds good. And how inetd interfacing works, that confuses the crap out of me.
TODO: decide if that todo above should be using todo tags. ouch
more options found since
- libMicrohttpd, a Gnu project that appears different to Microhttpd, it's a little C library intended to enable you to embed simple web servers into other programs. Might be ok for these purposes, I haven't looked especially closely at it. Hopefully it's secure :P
- qshttpd, "Quick+Simple Http Daemon", unfortunately it looks like it only supports static content.
- Shttp, a small httpd using a library(?) called "Serverkit", which sounds interesting but I've not looked into it. By the sounds of things, shttp might be most worthwhile as a piece of example code for how to make a more suitable server with serverkit perhaps.
- chttpd, "the little webserver that could"; simple little HTTP/0.9 server, derived from dhttpd but in C instead and with some tweaks. Is offered partly as intelligible example code for a server.
- nhttpd, aka Nostromo, a small webserver that supports CGI/1.1; has a lot of other features though, gives the impression of being made with only system-wide use in mind, and I don't think much of the way they implement user directories (shows the users' actual home directories, not the public_html/ subdirs), not that I'd be using the latter anyway. Looks otherwise quite good, and is running that site.
TODO: also add those ones I have in my bookmarks from ages ago.
SSI
Looks like it's pretty much a commodity standard language rather than something very server-specific.
The Server Side Includes article at Wikipedia summarises it; it looks very simple, so it's probably manageable to support.
I think the main thing needed to support it, would be a lexer that can detect what part of a document (content, tag, quoted string in a tag, character following a \ char, or character entity) different characters are in, in order to determine correctly what is an SSI tag. The rest is probably very easy.
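A rough sketch of that lexer idea, in Python with regular expressions rather than a hand-written state machine. It assumes the Apache-style `<!--#name attr="value" -->` tag shape, double-quoted values only, and backslash as the escape character inside quotes; the names `SSI_TAG`, `SSI_ATTR` and `tokenize` are invented for the example. The point it tries to show is that the tag pattern has to know about quoted strings, so a `-->` inside a value doesn't end the tag early.

```python
import re

# One SSI tag: <!--#name, then zero or more attr="value" pairs, then -->.
# The quoted-value part skips over backslash-escaped characters, so a
# quote or "-->" inside a value does not terminate the tag.
SSI_TAG = re.compile(r'<!--#(\w+)((?:\s+\w+\s*=\s*"(?:\\.|[^"\\])*")*)\s*-->')
SSI_ATTR = re.compile(r'(\w+)\s*=\s*"((?:\\.|[^"\\])*)"')

def tokenize(document):
    """Yield ("text", str) and ("ssi", name, attrs) tokens in document order."""
    pos = 0
    for match in SSI_TAG.finditer(document):
        if match.start() > pos:
            yield ("text", document[pos:match.start()])
        # Drop the backslash from each escaped character in the values.
        attrs = {key: re.sub(r'\\(.)', r'\1', value)
                 for key, value in SSI_ATTR.findall(match.group(2))}
        yield ("ssi", match.group(1), attrs)
        pos = match.end()
    if pos < len(document):
        yield ("text", document[pos:])

tokens = list(tokenize('Hi <!--#echo var="DATE_LOCAL" --> bye'))
```

Ordinary HTML comments pass through as text, since they lack the `#`. Character entities and unquoted values are not handled here.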
If possible, I would extend it to be able to specify $_GET and $_POST elements as parameters to pass to programs it calls (as an alternative to using the exec cgi directive; why would you put exec cgi in an SSI when you could just use a CGI to begin with?! Actually I can think of reasons yes...). This would make it pretty damn useful for quick+dirty coding. BUT those parameters would have to be escaped for the shell, which the interpreter really should do automatically.
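The automatic escaping could look something like this in Python, where `shlex.quote` from the standard library does the shell quoting. The function name `build_command` and the idea of listing the wanted field names are assumptions for the sake of the example, not part of any SSI spec.

```python
import shlex
from urllib.parse import parse_qs

def build_command(program, query_string, wanted):
    """Build a shell command line that passes selected GET fields as arguments."""
    fields = parse_qs(query_string, keep_blank_values=True)
    # Missing fields become empty arguments so positions stay stable.
    args = [fields.get(name, [""])[0] for name in wanted]
    return " ".join([shlex.quote(program)] + [shlex.quote(a) for a in args])

# A hostile-looking value ends up as one harmless quoted argument.
command = build_command("/usr/bin/printf", "msg=hello%3B+rm+-rf+%7E", ["msg"])
```

If the interpreter ran programs from an argument list instead of via a shell string, the quoting step wouldn't be needed at all, which is probably the safer design.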
Meh. By the looks of things, the Wikipedia page is only good for a vague summary, and a better reference would be the one from Apache, which also lists the ability to set variables. And I think the features I'd want from a mini server-embedded language would be different to what SSI gives probably. Mostly those GET and POST elements, and simple database access, apart from some of the same things it does give, but also a spec that I can be more sure of (seems a little vague in places, and there's questions I don't see answered). I think it shouldn't have to do anything too obscure, if I want that I'll use CGI.
Other CGI ideas
See the CGI/1.1 standard doc, though it is quite old. It doesn't seem especially hard although it looks like it shifts some of the burden of parsing GET and POST requests to the CGI script.
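To show what that burden amounts to on the script side, here is a minimal sketch in Python. Under CGI/1.1 the server hands over the query string and body details in environment variables (`QUERY_STRING`, `REQUEST_METHOD`, `CONTENT_TYPE`, `CONTENT_LENGTH`) and the request body on standard input; the function name `read_form` and the example values are made up.

```python
import io
from urllib.parse import parse_qs

def read_form(environ, body_stream):
    """Return (get_fields, post_fields) from a CGI environment and body stream."""
    get_fields = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    post_fields = {}
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = body_stream.read(length)  # never read past CONTENT_LENGTH
        ctype = environ.get("CONTENT_TYPE", "").split(";")[0].strip().lower()
        if ctype == "application/x-www-form-urlencoded":
            post_fields = parse_qs(body.decode("ascii"), keep_blank_values=True)
    return get_fields, post_fields

# Simulate what a server would set up for a POST to /pants.cgi?a=1&b=two+words
example_env = {
    "REQUEST_METHOD": "POST",
    "QUERY_STRING": "a=1&b=two+words",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": "19",
}
get_fields, post_fields = read_form(example_env, io.BytesIO(b"field=put+text+here"))
```

In a real script the arguments would be `os.environ` and `sys.stdin.buffer`, and the script would then print its own headers, a blank line, and the response body.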
Could make a simple CGI-programming language using Flex or Ragel, and CGILib. It could be like a tiny little alternative to PHP, with similar access to $_GET and $_POST etc. Built-in functions for accessing SQLite databases, Berkeley DB databases, some other kinds too maybe, and plain fixed-width record flatfile databases. Oh and Ming I guess, as SWF is so handy. Perhaps it could make use of GNU Lightning for doing meta-programming, particularly as this might make it practical to have a set of extensions written in itself.
As mentioned above, an SSI type implementation in the server itself might be a worthwhile supplement to full-on CGI programming, to let it do simple stuff quite fast (no forking). It'd presumably need a parser written (again) with Ragel or something.
RECENT BRAINWAVE: A very nice simple way of handling a new bookmarky thingy webapp, would be that Firefox supports bookmarklets and keymarks, which can be like little snippets of Javascript embedded in bookmarks, and callable by doing things like entering "gi potato" to do a google images search for "potato", when "gi" is a keyword given to a bookmarklet. Except the "gi" example substitutes text into a normal URL, and a more bookmarkletty bookmarklet substitutes it into a piece of javascript code. (actually here I got keymarks and bookmarklets a bit mixed up. A situation exacerbated by the fact the stupid bastards don't really say much about them, let alone use the term "bookmarklet" in their knowledge base)
Maybe a bookmarklet or similar could be made to send the current URL, encoded, to a script on the locally-running CGI app server thingy, in order to do external bookmarks. That should probably be quite easy Javascript, although how to call it is an open question (typing into the address bar sounds cumbersome, probably no easier than copy-pasting it elsewhere by hand). (idea has since been explored in bookmarklets and keymarks)
Web-server side of it
A web server primarily for CGI still has to implement a reasonable amount of HTTPish stuff, even if it's not going to do much of what a traditional server would.
HTTP/1.0 is defined in RFC 1945.
HTTP/1.1 is defined in RFC 2616.
I should probably aim somewhere in the middle, and maybe send error messages for features that aren't supported?? I'm not sure. I should read the blasted specs first.
Also important to see RFC 2119 which is about requirement definitions in RFCs in general, and RFC 2145, "Use and Interpretation of HTTP Version Numbers".
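As a first guess at what "aiming somewhere in the middle" might mean, this Python sketch parses the request line and picks an error status for things the server won't do. The set of supported methods is an assumption taken from the wish-list above, and `parse_request_line` is an invented name. The status codes themselves are real: 400 Bad Request and 501 Not Implemented exist in both HTTP/1.0 and HTTP/1.1, and 505 HTTP Version Not Supported comes from HTTP/1.1.

```python
import re

SUPPORTED_METHODS = {"GET", "HEAD", "POST"}  # assumption: all this server handles

# Method SP Request-URI SP HTTP-Version, with major and minor as integers.
REQUEST_LINE = re.compile(r'^(\S+) (\S+) HTTP/(\d+)\.(\d+)$')

def parse_request_line(line):
    """Return (error_status, method, target, version); error_status is None if OK."""
    match = REQUEST_LINE.match(line.rstrip("\r\n"))
    if not match:
        return (400, None, None, None)
    method, target = match.group(1), match.group(2)
    # Major and minor are separate integers, so 1.10 would be newer than 1.9.
    version = (int(match.group(3)), int(match.group(4)))
    if version[0] != 1:
        return (505, method, target, version)
    if method not in SUPPORTED_METHODS:
        return (501, method, target, version)
    return (None, method, target, version)
```

Old HTTP/0.9 style requests with no version at all get a 400 here, which may or may not be what I'd want once I've actually read the specs.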
HTTPS
HTTPS is not something I can use on Dreamhost, because it's not really compatible with virtual hosting and they need you to pay for a dedicated IP server before you can get it. However it might be usable on a local system like this one I'm describing. There's various standards it can be built upon, such as SSLv2, SSLv3, and TLS (which has various versions too). SSLv2 is apparently quite obsolete, so TLS is probably the appropriate way to go. Or that "stunnel" program maybe??? I'm not sure about whether the latter would get in the way.
TLS/1.0 is covered by RFC 2246.
I don't have the link for 1.1 etc etc, 1.0 is old.
Also see CAcert who do free certificates, would be useful with this (but still not with Dreamhost).
POST data
See RFC 1867, "Form-based File Upload in HTML", which as far as I can see, explains how the POST method is actually done a lot more clearly than the HTTP specs themselves :P
Apart from that it's sorta handy to see how uploads are implemented too, although I doubt I'd add that feature to the server really. As it's meant to run locally, and only really has the network socket to communicate with a web browser. Files can be sent either as URLs or names in the filesystem. However, it might wind up used on a LAN somewhere, and the protocol doesn't look so hard to receive. Question of how to deal with the files sent remains meh, and do I resend the rest of the items to CGIs as normal POSTdata?
Anyway: From both that RFC, and the PHP docs on file uploads, uploads use an encoding type of "multipart/form-data". However, according to both that RFC and my experiments with Dillo talking to a socket, normal POST data from forms uses "application/x-www-form-urlencoded". Example, I got:
POST /pants.cgi HTTP/1.0
Host: localhost
User-Agent: Dillo/0.8.0-pre
Cookie2: $Version="1"
Content-type: application/x-www-form-urlencoded
Content-length: 19

field=put+text+here
NOTE, that there was NOT a trailing newline after "here", it just ended. That is part of the spec, there should only be as many characters as stated in Content-length, no extra newlines or crap added to it.
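The server side of that can be sketched in Python as below: read the request line, read headers up to the blank line, then read exactly Content-length bytes and stop. The function name `read_request` is made up, and the stream is assumed to be a buffered binary file object (such as the one `socket.makefile("rb")` gives); the sample request is a trimmed copy of the Dillo one above.

```python
import io
from urllib.parse import parse_qs

def read_request(stream):
    """Read one request from a binary stream; return (request_line, headers, body)."""
    request_line = stream.readline().decode("latin-1").rstrip("\r\n")
    headers = {}
    while True:
        line = stream.readline().decode("latin-1").rstrip("\r\n")
        if line == "":
            break  # blank line (or end of stream) ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    body = stream.read(length)  # exactly Content-length bytes, no more
    return request_line, headers, body

raw = (b"POST /pants.cgi HTTP/1.0\r\n"
       b"Host: localhost\r\n"
       b"Content-type: application/x-www-form-urlencoded\r\n"
       b"Content-length: 19\r\n"
       b"\r\n"
       b"field=put+text+here")
request_line, headers, body = read_request(io.BytesIO(raw))
fields = parse_qs(body.decode("ascii"))
```

Waiting for a newline after the body would hang on a live connection, since the browser never sends one.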
See also, networking stuff, lcgid implementation