TUCoPS :: Web :: Apps :: cgi_me~1.txt

How To Remove Meta-characters From User-Supplied Data In CGI Scripts

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


   How To Remove Meta-characters From User-Supplied Data In CGI Scripts

	Please Note:

		(1) The examples here are written in C and Perl, since
			these are two popular languages that most readers
			will be familiar with.  Developers who work in
			other languages are encouraged to adapt these
			examples accordingly.

		(2) The examples presented in this document are simplified
			examples to illustrate the problem and the general
			solution.  They are not intended to be directly
			inserted into applications without modification.
			It is the responsibility of the programmer and/or
			system administrator that the general concepts
			presented here are adapted appropriately for each
			application.


1. Definition of the Problem

We have noticed several reports to us and to public mailing lists about CGI
scripts that allow an attacker to execute arbitrary commands on a WWW
server under the effective user-id of the server process.

In many of these cases, the author of the script has not sufficiently
sanitized user-supplied input.


2. Definition of "Sanitize"

Consider an example where a CGI script accepts user-supplied data. In
practice, this data may come from any number of sources of user-supplied
data; but for this example, we will say that the data is taken from an
environment variable $QUERY_STRING. The manner in which data was inserted
into the variable is not important - the important point here is that the
programmer needs to gain control over the contents of the data in
$QUERY_STRING before further processing can occur. The act of gaining this
control is called "sanitizing" the data.


3. A Common But Inadvisable Approach

A script writer who is aware of the need to sanitize data may decide to
remove a number of well-known meta-characters from the script and replace
them with underscores. A common but inadvisable way to do this is by
removing particular characters.

For instance, in Perl:

	#!/usr/local/bin/perl
	$user_data = $ENV{'QUERY_STRING'};	# Get the data
	print "$user_data\n";
	$user_data =~ s/[\/ ;\[\]\<\>&\t]/_/g;	# Remove bad characters. WRONG!
	print "$user_data\n";
	exit(0);

In C:

	#include <stdio.h>
	#include <string.h>
	#include <stdlib.h>

	int
	main(int argc, char *argv[], char **envp)
	{
	    static char bad_chars[] = "/ ;[]<>&\t";

	    char * user_data;	/* our pointer to the environment string */
	    char * cp;		/* cursor into example string */

	    /* Get the data */
	    user_data = getenv("QUERY_STRING");
	    printf("%s\n", user_data);

	    /* Remove bad characters. WRONG! */
	    for (cp = user_data; *(cp += strcspn(cp, bad_chars)); /* */)
	        *cp = '_';
	    printf("%s\n", user_data);
	    exit(0);
	}

In this method, the programmer determines which characters should NOT be
present in the user-supplied data and removes them. The problem with this
approach is that it requires the programmer to predict all possible inputs
that could possibly be misused.  If the user uses input not predicted by
the programmer, then there is the possibility that the script may be used
in a manner not intended by the programmer.


4. A Recommended Approach

A better approach is to define a list of acceptable characters and replace any
character that is NOT acceptable with an underscore. The list of valid input
values is typically a predictable, well-defined set of manageable size. For
example, consider the tcp_wrappers package written by Wietse Venema. In the
percent_x.c module, Wietse has defined the following:

        char   *percent_x(...)
        {
                {...}
            static char ok_chars[] = "1234567890!@%-_=+:,./\
        abcdefghijklmnopqrstuvwxyz\
        ABCDEFGHIJKLMNOPQRSTUVWXYZ";

                {...}

        for (cp = expansion; *(cp += strspn(cp, ok_chars)); /* */ )
                *cp = '_';

                {...}


The benefit of this approach is that the programmer is certain that
whatever string is returned, it contains only characters now under his or her
control.

This approach contrasts with the approach we discussed earlier. In the earlier
approach, which we do not recommend, the programmer must ensure that he or she
traps all characters that are unacceptable, leaving no margin for error. In
the recommended approach, the programmer errs on the side of caution and only
needs to ensure that acceptable characters are identified; thus the programmer
can be less concerned about what characters an attacker may try in an attempt
to bypass security checks.

Building on this philosophy, the Perl program we presented above could be
thus sanitized to contain ONLY those characters allowed. For example:

	#!/usr/local/bin/perl
	$_ = $user_data = $ENV{'QUERY_STRING'};	# Get the data
	print "$user_data\n";
	$OK_CHARS='-a-zA-Z0-9_.@';	# A restrictive list, which
					# should be modified to match
					# an appropriate RFC, for example.
	s/[^$OK_CHARS]/_/go;
	$user_data = $_;
	print "$user_data\n";
	exit(0);

Likewise, the same updated example in C:

	#include <stdio.h>
	#include <string.h>
	#include <stdlib.h>

	int
	main(int argc, char *argv[], char **envp)
	{
	    static char ok_chars[] = "abcdefghijklmnopqrstuvwxyz\
	ABCDEFGHIJKLMNOPQRSTUVWXYZ\
	1234567890_-.@";

	    char * user_data;	/* our pointer to the environment string */
	    char * cp;		/* cursor into example string */

	    user_data = getenv("QUERY_STRING");
	    printf("%s\n", user_data);
	    for (cp = user_data; *(cp += strspn(cp, ok_chars)); /* */)
	        *cp = '_';
	    printf("%s\n", user_data);
	    exit(0);
	}


Some questions that we have received from sites indicate the mistaken belief
that this sanitization technique only needs to be applied to user data that
is passed to the environment in which the application is executing. This
is not strictly true.

For instance, many Perl scripts accept arbitrary filenames from users.
While the script should obviously check the filename to ensure that it
represents a file that the user should have access to, the first step in
any filename processing should be sanitization (as discussed above). The
reason for this is that metacharacters (such as ">" and "|") have special
meaning in file oriented functions in Perl.

Another example is Perl scripts which call the eval function, using
user-supplied arguments. A call to eval essentially represents the
execution of a mini-program within the Perl script being executed.
Programmers are encouraged to ensure that control is maintained over the
content of the user-supplied data with the intent of preventing the user
executing uncontrolled instructions within that environment.


5. Recommendation

We strongly encourage you to review all CGI scripts available via your web
server to ensure that any user-supplied data is sanitized using the approach
described in Section 4, adapting the example to meet whatever specification
you are using (such as the appropriate RFC).


6. Additional Tips

The following comments appeared in CERT Advisory CA-97.12 "Vulnerability in
webdist.cgi" and AUSCERT Advisory AA-97.14, "SGI IRIX webdist.cgi
Vulnerability."

    We strongly encourage all sites should consider taking this opportunity
    to examine their entire httpd configuration. In particular, all CGI
    programs that are not required should be removed, and all those
    remaining should be examined for possible security vulnerabilities.

    It is also important to ensure that all child processes of httpd are
    running as a non-privileged user. This is often a configurable option.
    See the documentation for your httpd distribution for more details.

    Numerous resources relating to WWW security are available. The
    following pages may provide a useful starting point. They include
    links describing general WWW security, secure httpd setup, and secure
    CGI programming.

        The World Wide Web Security FAQ:

            http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

    The following book contains useful information including sections on
    secure programming techniques.

        _Practical Unix & Internet Security_, Simson Garfinkel and
        Gene Spafford, 2nd edition, O'Reilly and Associates, 1996.

    Please note that the CERT/CC and AUSCERT do not endorse the URL that
    appears above. If you have any problem with the sites, please contact
    the site administrator.

Wall, et al, discusses techniques and resources that can be used for
handling user-supplied data within Perl in this book:

        _Programming Perl_, Larry Wall, Tom Christiansen and Randall
        L. Schwartz, 2nd edition, O'Reilly and Associates, 1996.

Readers are referred to Chapter 6, pages 336 and 355-363.

Another resource that sites can consider is the CGI.pm module. Details
about this module are available from:

    http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html

This module provides mechanisms for creating forms and other web-based
applications. Be aware, however, that it does not absolve the programmer
from the safe-coding responsibilities discussed above.



Copyright 1997, 1998 Carnegie Mellon University. Conditions for use,
disclaimers, 
and sponsorship information can be found in
http://www.cert.org/legal_stuff.html and ftp://info.cert.org/pub/legal_stuff .
If you do not have FTP or web access, send mail to cert@cert.org with
"copyright" in the subject line.

CERT is registered in the U.S. Patent and Trademark Office.



This file: ftp://ftp.cert.org/pub/tech_tips/cgi_metacharacters

Last revised February 13, 1998
Version 1.4

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBOBTCi1r9kb5qlZHQEQJwnQCguu+n2ikYz6+9vlHWWzRbSDXdc0wAoJeR
YYiutvo3q9u+5LqYTkU8Mp8q
=cGNs
-----END PGP SIGNATURE-----

TUCoPS is optimized to look best in Firefox® on a widescreen monitor (1440x900 or better).
Site design & layout copyright © 1986-2025 AOH