A Short Guide to writing server-side CGI programs in C++
When the web server runs a C++ program in response to a request from a browser, it's called a "CGI" program. CGI stands for "Common Gateway Interface"...which frankly means as little to me as it probably does to you! You can write CGI programs in almost any language - PHP being a common choice. However, this article is about CGI programs written in C++.
Starting a CGI program
Running a CGI program is just like accessing a URL for an HTML page.
<a href="http://www.sjbaker.org/cgi-bin/cgitest.cgi"> Run my program </a>
...when you click on "Run my program", the Apache server runs the program. Easy as that!
The HREF field is set to what looks a lot like a regular URL. This URL is a little special:
- Firstly, the path is in the "cgi-bin" directory. It's technically possible to put CGI executables anywhere on the web site - but there is a severe security issue here. If the client could give the file name of any executable program anywhere - then any idiot on the web could run any bit of software on your server!! To protect against that, the "Apache" HTTP server can restrict CGI programs to particular directories...and typically, that's a directory called "cgi-bin". Whatever program(s) you put there can (in principle) be run by any idiot on the web with any data going into it - so be super-mega-careful.
- Secondly, the file has the ".cgi" extension. On a Linux server, executables typically have no extension and on Windows it's ".exe" - but CGI programs use ".cgi" as a way to remind users that this URL isn't a simple web page! You can use other extensions for scripts in other languages - but stick with ".cgi" for C++ executables.
However, this approach doesn't allow you to pass input into the program - for that we need a <form>...
A typical HTML snippet to invoke a CGI program would be:
<form method="POST" action="http://www.sjbaker.org/cgi-bin/cgitest.cgi">
  <input type="text" name="myTextField" value="" size="20" />
  <input type="submit" name="mySubmitButton" value="Run CGI!" />
</form>
The form sets the 'method' to "POST". It could alternatively be set to "GET". The difference is that "GET" tacks the data onto the end of the URL, and the server hands it to the CGI program in the QUERY_STRING environment variable, whereas "POST" passes it into the 'standard input' of the C++ program. "POST" is generally better than "GET" for form data (the data doesn't show up in the URL and isn't limited by how long a URL can be) - so we'll stick to "POST" for this tutorial. The 'action' field of the form is set to the URL just as in the last example.
- The two <input> sections both have a 'type' - which identifies what kind of widget you get - there are lots of choices.
- They both have a 'name' - which will be sent to the CGI program to say which data belongs to which input widget.
- They both have a 'value' field - which is what will be sent to the CGI as data. In the case of the "text" widget, the value is overwritten by whatever the user types. In the 'submit' widget, it's the label displayed on the widget - AND it's sent up to the server.
- The text input widget also has a 'size' field which sets how wide the widget is in characters.
The second input section says that the type is a 'submit' button - it's just like a type="button" widget except that it can be invoked by hitting the return key - and it has the magical property of causing the form data to be shipped off to the server to be stuffed into the CGI program when you click on it.
What happens next?
When you hit the submit button, the server will start the CGI program and bundle up the 'name' and 'value' fields from all of the input widgets into one gigantic string which your C++ CGI program can read like this:
char formInput [ 1024 ] ;
cin.getline ( formInput, 1024 ) ;  // reads at most 1023 characters, then adds the terminating '\0'
(Assuming all of the user input fits into 1023 characters).
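The resulting string is a list of "name=value" terms. For the form above, with "Hello World" typed into the text field, it would look like this:
myTextField=Hello+World&mySubmitButton=Run+CGI%21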
- The "name=value" terms are separated by '&' characters.
- The 'value' parts aren't allowed to have spaces in them by the protocol - so the spaces are replaced by '+' characters.
- Characters that are 'special' to URLs such as '&', '+' and most other punctuation are replaced by a '%' sign followed by the character's two-digit hex ASCII code.
Hello World ==> Hello+World (Space replaced by '+')
HelloWorld! ==> HelloWorld%21 ('!' replaced by '%21' because hex 21 is ASCII for '!')
Hello + World ==> Hello+%2B+World ('+' replaced by '%2B' and spaces replaced by '+')
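To get the user's original text back, the CGI program has to undo all of that. The following is a minimal sketch of one way to do it, assuming the input follows the rules just described. The names urlDecode and parseForm are hypothetical helpers of my own, not functions from any library or from the example program linked at the end of this article.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: turn '+' back into a space and "%XX" back into
// the character whose hex code is XX. Malformed '%' sequences are
// copied through unchanged.
std::string urlDecode(const std::string &text)
{
  std::string out;
  for (std::size_t i = 0; i < text.size(); ++i)
  {
    if (text[i] == '+')
    {
      out += ' ';
    }
    else if (text[i] == '%' && i + 2 < text.size() &&
             std::isxdigit(static_cast<unsigned char>(text[i + 1])) &&
             std::isxdigit(static_cast<unsigned char>(text[i + 2])))
    {
      out += static_cast<char>(std::stoi(text.substr(i + 1, 2), nullptr, 16));
      i += 2;  // skip the two hex digits we just consumed
    }
    else
    {
      out += text[i];
    }
  }
  return out;
}

// Hypothetical helper: split "name=value&name=value" at the '&'
// characters, then split each term at its first '=' and decode both halves.
// Terms without an '=' are ignored.
std::map<std::string, std::string> parseForm(const std::string &formInput)
{
  std::map<std::string, std::string> fields;
  std::size_t start = 0;
  while (start < formInput.size())
  {
    std::size_t end = formInput.find('&', start);
    if (end == std::string::npos)
      end = formInput.size();

    const std::string term = formInput.substr(start, end - start);
    const std::size_t eq = term.find('=');
    if (eq != std::string::npos)
      fields[urlDecode(term.substr(0, eq))] = urlDecode(term.substr(eq + 1));

    start = end + 1;
  }
  return fields;
}
```

With the form shown earlier, parseForm ( formInput ) [ "myTextField" ] would then hold whatever the user typed. Note that the splitting has to happen before the decoding - otherwise a '&' or '=' that the user typed (and which arrived as '%26' or '%3D') would be mistaken for a separator.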
Outputting from the CGI program to the browser
The CGI program sends a new HTML page to the browser to replace the one that the user just left. It does this by the simple method of printing it out to cout. The only 'gotcha' here is that the first thing you output from the program absolutely must be:
cout << "Content-type: text/html\n\n" ;
...no variation whatever here! Two newlines!
After that, you just print whatever you want to generate...
cout << "<html><head><title>CGI Demo.</title></head>\n" ;
cout << "<body>\n" ;
cout << "<h1>CGI Demo</h1>\n" ;
cout << "This web page was generated by the C++ code!\n" ;
cout << "</body></html>\n" ;
...whatever. Don't forget that when you need to insert double quotes into the output, you have to escape each one with a backslash:
cout << "<img src=\"myimage.png\"/>\n" ;
...it's all too easy to get lost in that maze of '<'s, slashes and backslashes!
A complete example
- C++ code is in http://www.sjbaker.org/cgiTest/cgiTest.cpp
- Sample HTML to launch it is in http://www.sjbaker.org/cgiTest/index.html
- There is a Makefile in http://www.sjbaker.org/cgiTest/Makefile
...and you can run it by clicking HERE.