
Re: Canonicalization Of User Input In PHP

Subject: Re: Canonicalization Of User Input In PHP
Date: Wed, 19 Jan 2005 14:16:54 +0000
Hi,

In general I feel that trying to develop a generic "sanitize_input" function is not fruitful. The set of dangerous characters depends on where the string is used. For example, I recently audited some code that had such a function, "safe_io", which called the MySQL and HTML escaping functions. It was rigorously applied to inputs; however, I found some places where variables protected in this way were passed to the shell, where those escapes offer no protection. Such functions can also very easily corrupt data.

For escaping dangerous characters, I advocate escaping very close to where the string will be reparsed, e.g. system("program " + escape_shell(args)). Applying this principle to cross-site scripting means escaping HTML as it is generated. A consequence of this is that your database may contain HTML special characters. However, as you follow this principle, you become more and more encouraged to just make everything binary safe and sidestep the dangerous characters problem. For SQL queries, most interfaces support some kind of parameterised query that is binary safe. As for passing to the shell, it usually turns out the only good policy is to avoid this at all costs. I haven't mentioned string length; most of the time a long string is not dangerous.
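
A minimal PHP sketch of escaping at the point of use. The helper names, the "users" table and the grep command are hypothetical and only illustrate the idea; htmlspecialchars(), escapeshellarg() and PDO prepared statements are standard PHP facilities, but whether they suit a given application is an assumption.

```php
<?php
// Hypothetical helpers: the names, table and command are illustrative only.

// HTML: escape at the moment the markup is generated.
function render_comment(string $comment): string
{
    return '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';
}

// SQL: a parameterised query keeps the value out of the SQL text entirely.
function find_user(PDO $db, string $name): array
{
    $stmt = $db->prepare('SELECT id, name FROM users WHERE name = :name');
    $stmt->execute([':name' => $name]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Shell: best avoided; if unavoidable, quote each argument as the
// command line is built.
function build_grep_command(string $pattern, string $file): string
{
    return 'grep -- ' . escapeshellarg($pattern) . ' ' . escapeshellarg($file);
}
```

With this arrangement the stored data stays unmodified, and each output context applies its own escaping.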

One thing to note: in many situations the programmer should be able to get a complete list of dangerous characters, because they control the code that reparses the string. The most notable exception is HTML. Here the client's browser does the parsing, and the programmer has no control over it. Various browser-specific features require protecting more characters.

Now, escaping bad characters is just one part of the puzzle. As a major second line of defence, every input value should be whitelist validated as early as possible. Any input encoding (e.g. URL encoding) must be decoded before this validation, and handling UTF-8 requires some consideration here. This validation is a major defence against the possibility that you've missed a character from your dangerous character list. It is also a good place to put sensible length limits. However, for many inputs quite permissive validation is the only acceptable option. A regex I often use is ^[\x20-\x7e]*$ which allows only printable ASCII. While helpful, this does nothing to protect against HTML or SQL special characters, so it is not a defence by itself.
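
A minimal sketch of that permissive whitelist check in PHP. The function name and the default length limit are assumptions. The D modifier is also an addition to the regex as quoted: without it, PCRE lets $ match just before a trailing newline, so "abc\n" would pass.

```php
<?php
// Hypothetical validator: the name and the 255-byte default are illustrative.
function validate_printable_ascii(string $value, int $maxLength = 255): bool
{
    // Sensible length limit, checked in bytes.
    if (strlen($value) > $maxLength) {
        return false;
    }
    // Printable ASCII only; D stops $ from matching before a final newline.
    return preg_match('/^[\x20-\x7e]*$/D', $value) === 1;
}
```

This check would run on the decoded value and still leaves HTML and SQL special characters in place, so escaping at the point of use remains necessary.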

So, the overall sequence is:

   input -> unescape -> validate -> [do stuff] -> escape -> output
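
The sequence above could be sketched in PHP roughly as follows. The function name, the 64-byte limit and the greeting logic are hypothetical. The sketch takes a raw, still-encoded query-string value so that the decode step is visible; note that PHP has already URL-decoded the values in $_GET, so they should not be decoded a second time.

```php
<?php
// Hypothetical end-to-end sketch: the name, limit and logic are illustrative.
function greet_from_raw_param(string $rawEncoded): ?string
{
    // unescape: decode the transport encoding exactly once.
    $name = urldecode($rawEncoded);

    // validate: whitelist printable ASCII, with a length limit.
    if (strlen($name) > 64 || preg_match('/^[\x20-\x7e]*$/D', $name) !== 1) {
        return null; // rejected; the caller decides how to respond
    }

    // do stuff: application logic works on the decoded, validated value.
    $greeting = 'Hello, ' . $name . '!';

    // escape: encode for the output context as the HTML is generated.
    return '<p>' . htmlspecialchars($greeting, ENT_QUOTES, 'UTF-8') . '</p>';
}
```

For example, greet_from_raw_param('Tom+%26+Jerry') returns '<p>Hello, Tom &amp;amp; Jerry!</p>', while an input containing an encoded newline such as 'a%0Ab' is rejected with null.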

I haven't mentioned "canonicalisation" but that is implicit in the above sequence. Of course, all this only protects you against one class of attacks. While the inputs are now validated, they remain untrusted. You still have to design your logic correctly. You also need to consider carefully what your inputs are. For example, it would be easy to protect all form input fields but forget to apply the same validation to cookies. For network applications it is usually clear what the inputs are (but take care with things like reverse DNS lookups). A traditional Unix situation where the inputs are many and varied is a setuid executable, which is very difficult to secure.
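
One way to avoid forgetting an input source is to run the same check over every source in a single place. A minimal sketch, assuming the printable-ASCII whitelist is acceptable for all fields; the function name and the label format are hypothetical.

```php
<?php
// Hypothetical helper: applies one whitelist to every listed input source
// and returns labels such as "cookie[sid]" for entries that fail.
function find_invalid_inputs(array $sources): array
{
    $pattern = '/^[\x20-\x7e]*$/D';
    $invalid = [];
    foreach ($sources as $sourceName => $values) {
        foreach ($values as $key => $value) {
            $key = (string) $key;
            // Non-string values (e.g. arrays from name[]=x) are treated as invalid here.
            $ok = is_string($value)
                && preg_match($pattern, $key) === 1
                && preg_match($pattern, $value) === 1;
            if (!$ok) {
                // The label may contain hostile bytes; escape it before logging or display.
                $invalid[] = $sourceName . '[' . $key . ']';
            }
        }
    }
    return $invalid;
}
```

A call such as find_invalid_inputs(['get' => $_GET, 'post' => $_POST, 'cookie' => $_COOKIE]) would then cover cookies together with the form fields.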

Regards,

Paul



warnings@envisagement.com wrote:

I am working on implementing a basic PHP user input validation scheme
and have come across several references to canonicalizing input before
performing validation. After researching this topic on the net I have finally
reached a point where I feel okay asking for help.


At this point I have found a few basic functions related to this subject, but
I am getting lost in alphabet soup (UTF-8, RFC 2279, ISO 10646, ...) and
I am reaching a momentary saturation point where I am finding the learning
curve is only getting steeper the more I learn.


For the basic validation I have found the following set of PHP filters via the
owasp.org site.


http://www.owasp.org/software/labs/phpfilters.html
// sanitize.inc.php
// Sanitization functions for PHP
// by: Gavin Zuchlinski, Jamie Pratt, Hokkaido
// webpage: http://libox.net
// Last modified: December 21, 2003

Now these functions are fairly clear and easy to understand, and have
generally validated what I have come to understand as best practices,
as I have experience with fault tolerant coding, just not security. But the
issue I am having trouble coming to terms with is canonicalization of the data.
Beyond the above routines, I have also found the urldecode() function in
the PHP manual.


At this point I feel (weakly, not securely) that one should use the following
to canonicalize the data prior to validating any input.


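reset($_GET);
foreach($_GET as $key => $value){
   // Transform to canonical form.
   // Note: PHP has already URL-decoded $_GET, so urldecode() here decodes a second time.
   $ckey = my_utf8_decode(urldecode($key));
   $cvalue = my_utf8_decode(urldecode($value));
   // Reject the request if sanitizing would change the key or the value.
   // Strict comparison avoids PHP's loose numeric-string matching.
   if( $ckey !== sanitize_paranoid_string($ckey) ||
           $cvalue !== sanitize_paranoid_string($cvalue) ){
       // A Location header needs an absolute URL, and the script must stop after sending it.
       header('Location: http://www.somesight.net/index.php');
       exit;
   }
}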

I understand this example is simplistic, but is this a proper way
to canonicalize the input values?  Or am I missing something here?

Should I be looking at the following too?

$_SERVER['CONTENT_TYPE'] == 'application/x-www-form-urlencoded'

Is this data even trustworthy? My first guess is that it could be forged in
the header data.


Any input would be appreciated.

thanks,

Sean


--
Paul Johnston, GSEC
Internet Security Specialist
Westpoint Limited
Albion Wharf, 19 Albion Street, Manchester, M1 5LN England
Tel: +44 (0)161 237 1028
Fax: +44 (0)161 237 1031
email: paul@westpoint.ltd.uk
web: www.westpoint.ltd.uk
