Exchanging Information with Users: Validating Data in PHP

Data validation is one of the most important parts of a web application. Weird, wrong, and damaging data shows up where you least expect it. Users can be careless, malicious, and fabulously more creative (often accidentally) than you may ever imagine when you are designing your application. Even a Clockwork Orange-style forced viewing of a filmstrip on the dangers of unvalidated data would not overemphasize how crucial it is that you stringently validate any piece of data coming into your application from an external source. Some of these external sources are obvious: most of the input to your application is probably coming from a web form. But there are lots of other ways data can flow into your programs as well: databases that you share with other people or applications, web services and remote servers, even URLs and their parameters.

As mentioned earlier, Example 7-7 doesn’t indicate what’s wrong with the form if the check in validate_form() fails. Example 7-8 alters validate_form() and show_form() to manipulate and print an array of possible error messages.

Example 7-8. Displaying error messages with the form

// Logic to do the right thing based on
// the request method
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    // If validate_form() returns errors, pass them to show_form()
    if ($form_errors = validate_form()) {
        show_form($form_errors);
    } else {
        process_form();
    }
} else {
    show_form();
}

// Do something when the form is submitted
function process_form() {
    print "Hello, ". $_POST['my_name'];
}

// Display the form
function show_form($errors = array()) {
    // If some errors were passed in, print them out
    if ($errors) {
        print 'Please correct these errors: <ul><li>';
        print implode('</li><li>', $errors);
        print '</li></ul>';
    }

    print<<<_HTML_
<form method="POST" action="$_SERVER[PHP_SELF]">
Your name: <input type="text" name="my_name">
<br/>
<input type="submit" value="Say Hello">
</form>
_HTML_;
}

// Check the form data
function validate_form() {
    // Start with an empty array of error messages
    $errors = array();

    // Add an error message if the name is too short
    if (strlen($_POST['my_name']) < 3) {
        $errors[] = 'Your name must be at least 3 letters long.';
    }

    // Return the (possibly empty) array of error messages
    return $errors;
}

The code in Example 7-8 takes advantage of the fact that an empty array evaluates to false. The line if ($form_errors = validate_form()) decides whether to call show_form() again and pass it the error array, or to call process_form(). The array that validate_form() returns is assigned to $form_errors. The truth value of the if() test expression is the result of that assignment, which, as you saw in “Understanding true and false” on page 40, is the value being assigned. So, the if() test expression is true if $form_errors has some elements in it, and false if $form_errors is empty. If validate_form() encounters no errors, then the array it returns is empty.
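The assignment-inside-if() idiom can be tried on its own. The following is a minimal sketch, not part of the original examples: check_name() is a hypothetical stand-in for validate_form() that takes the value as an argument instead of reading $_POST, and branch_for() simply reports which branch the if() test would take.

```php
<?php
// Hypothetical stand-in for validate_form(): returns an array of error
// messages, which is empty when the name is acceptable.
function check_name(string $name): array {
    $errors = array();
    if (strlen($name) < 3) {
        $errors[] = 'Your name must be at least 3 letters long.';
    }
    return $errors;
}

// Reports which branch the if() test from Example 7-8 would take.
function branch_for(string $name): string {
    if ($form_errors = check_name($name)) {
        // Non-empty array: the assignment evaluates to true
        return 'show_form';
    }
    // Empty array: the assignment evaluates to false
    return 'process_form';
}
```

Calling branch_for('Al') takes the error branch, while branch_for('Alice') takes the processing branch.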

It is a good idea to do validation checks on all of the form elements in one pass, instead of redisplaying the form immediately when you find a single element that isn’t valid. A user should find out all of his errors when he submits a form instead of having to submit the form over and over again, with a new error message revealed on each submission. The validate_form() function in Example 7-8 does this by adding an element to $errors for each problem with a form element. Then, show_form() prints out a list of the error messages.

The validation methods shown here all go inside the validate_form() function. If a form element doesn’t pass the test, then a message is added to the $errors array.

1. Required Elements

To make sure something has been entered into a required element, check the element’s length with strlen(), as in Example 7-9.

Example 7-9. Verifying a required element

if (strlen($_POST['email']) == 0) {
    $errors[] = "You must enter an email address.";
}

It is important to use strlen() when checking a required element instead of testing the value itself in an if() statement. A test such as if (! $_POST['quantity']) treats a value that evaluates to false as an error. Using strlen() lets users enter a value such as 0 into a required element.
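The difference between the two tests shows up with the submitted string "0". This sketch uses two hypothetical helper functions (the names are illustrative, not from the original examples) that take the value as an argument:

```php
<?php
// Hypothetical helper: the "test the value itself" approach.
function is_missing_by_truthiness(string $value): bool {
    return ! $value;            // treats "0" and "" alike
}

// Hypothetical helper: the strlen() approach recommended above.
function is_missing_by_length(string $value): bool {
    return strlen($value) == 0; // only "" counts as missing
}
```

With these helpers, a quantity of "0" is wrongly reported as missing by the first function but accepted by the second. Both report the empty string as missing.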

2. Numeric or String Elements

To ensure that a submitted value is an integer or floating-point number, use the filter_input() function with an appropriate filter. With filter_input(), you tell PHP what kind of input to operate on, the name of the submitted value in the input, and what rule you want the value to conform to. The FILTER_VALIDATE_INT and FILTER_VALIDATE_FLOAT filters check for integers and floating-point numbers, respectively.

Example 7-10 shows the integer filter in use.

Example 7-10. Filtering integer input

$ok = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT);
if (is_null($ok) || ($ok === false)) {
    $errors[] = 'Please enter a valid age.';
}

In Example 7-10, filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT) tells the PHP engine to examine submitted form data (INPUT_POST), specifically the form field named age, and check it against the integer validation filter (FILTER_VALIDATE_INT). The filter_input() function gets told where to look (INPUT_POST) and what field to check (age) rather than being given an entry in an array such as $_POST['age'] so that it can properly handle missing values and avoid being confused if your PHP program changes values in $_POST.

If filter_input() sees that the specified input element is valid, it returns the value. If the specified input element is missing, it returns null. If the specified input element is present but not valid according to the filter, the function returns false. In the if() test expression in Example 7-10, $ok is compared to false with === (three equals signs). This is called the identity operator. It compares values and evaluates to true if the two values are the same and have the same type. As you saw in Example 3-11, when you compare two values of different types (such as string and integer, or integer and boolean), the PHP engine may change the type of the values to compare them. In this case, if the value of the submitted input was 0, which is a valid integer, $ok would be 0. Then the regular equality comparison between $ok and false would be true, since 0 evaluates to false. With the identity operator, the comparison is false, because the types don’t match.

This means that the $errors array gets an error message added to it if the age form element is either not present (is_null($ok)) or not an integer ($ok === false).
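Because filter_input() reads only from the actual request, it is awkward to experiment with outside a web server. The sketch below swaps in filter_var(), which applies the same validation filters to a value you pass in directly. Unlike filter_input(), it has no “missing” case that returns null. The classify_int() helper is hypothetical and exists only for illustration.

```php
<?php
// Hypothetical helper: describe how the integer filter treats a value.
function classify_int($submitted): string {
    $ok = filter_var($submitted, FILTER_VALIDATE_INT);
    // Identity comparison: only a real false means "not valid"
    if ($ok === false) {
        return 'invalid';
    }
    // var_export() shows the converted integer, including 0
    return 'valid: ' . var_export($ok, true);
}
```

With the identity operator, classify_int('0') reports a valid value of 0. Had the test been written with ==, that same 0 would have been mistaken for a failure.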

Filtering floating-point numbers works similarly, as shown in Example 7-11.

Example 7-11. Filtering floating-point input

$ok = filter_input(INPUT_POST, 'price', FILTER_VALIDATE_FLOAT);
if (is_null($ok) || ($ok === false)) {
    $errors[] = 'Please enter a valid price.';
}

When validating elements (particularly string elements), it is often helpful to remove leading and trailing whitespace with the trim() function. You can combine this with the strlen() test for required elements to disallow an entry of just whitespace characters. The combination of trim() and strlen() is shown in Example 7-12.

Example 7-12. Combining trim() and strlen()

if (strlen(trim($_POST['name'])) == 0) {
    $errors[] = "Your name is required.";
}

All URL and submitted form data arrives at the PHP engine as strings. The filter_input() function, if given a numeric filter (and a valid value), returns the value converted to an integer or floating-point number. Like working with a whitespace-trimmed string, using these converted values rather than $_POST directly is often convenient in your program. A good way to accomplish that is to have your validation function build an array of converted values to work with. This is shown in Example 7-13.

Example 7-13. Building an array of modified input data

function validate_form() {
    $errors = array();
    $input = array();

    $input['age'] = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT);
    if (is_null($input['age']) || ($input['age'] === false)) {
        $errors[] = 'Please enter a valid age.';
    }

    $input['price'] = filter_input(INPUT_POST, 'price', FILTER_VALIDATE_FLOAT);
    if (is_null($input['price']) || ($input['price'] === false)) {
        $errors[] = 'Please enter a valid price.';
    }

    // Use the null coalesce operator in case $_POST['name'] isn't set
    $input['name'] = trim($_POST['name'] ?? '');
    if (strlen($input['name']) == 0) {
        $errors[] = "Your name is required.";
    }

    return array($errors, $input);
}

The validate_form() function in Example 7-13 builds up the $input array, putting values into it as they are checked. It also builds up the $errors array if there are any problems. Having created both arrays, it needs to return both so that the rest of the program can use $input, not just $errors. To do that, it bundles them up into a two-element array and returns that.

If validate_form() is returning both input and errors, the code calling it must be modified to take that into account. Example 7-14 shows a modified version of the beginning of Example 7-8 that handles both arrays returned from validate_form().

Example 7-14. Handling errors and modified input data

// Logic to do the right thing based on the request method
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    // If validate_form() returns errors, pass them to show_form()
    list($form_errors, $input) = validate_form();
    if ($form_errors) {
        show_form($form_errors);
    } else {
        process_form($input);
    }
} else {
    show_form();
}

In Example 7-14, the list() construct is used to destructure the return value from validate_form(). Because we know that validate_form() will always return an array with two elements (the first element is the possibly empty array of error messages and the second element is the array of modified input data), list($form_errors, $input) tells the PHP engine to put the first element of that returned array into the $form_errors variable and the second element into $input. Having those separate arrays in separate variables makes the code easier to read.

Once the returned arrays are properly handled, the logic is similar. If the $form_errors array is not empty, then show_form() is called with the $form_errors array as an argument. Otherwise, the form processing function is called. One slight difference is that now the form processing function is passed the array of modified input values to use. This means that process_form() should now refer to $input['my_name'] rather than $_POST['my_name'] to find values to print.
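The errors-plus-input pattern can be exercised without a real form submission. The sketch below is an illustration under assumptions, not the chapter's own code: validate_values() is a hypothetical variant of validate_form() that receives the submitted values as an array argument and uses filter_var() in place of filter_input(), so a missing age simply fails the filter instead of producing null.

```php
<?php
// Hypothetical variant of validate_form() that takes the submitted
// values as an argument, so it can run without a web request.
function validate_values(array $post): array {
    $errors = array();
    $input = array();

    // filter_var() returns false for a missing (null) or non-integer value
    $input['age'] = filter_var($post['age'] ?? null, FILTER_VALIDATE_INT);
    if ($input['age'] === false) {
        $errors[] = 'Please enter a valid age.';
    }

    // Trim first, then require at least one remaining character
    $input['name'] = trim($post['name'] ?? '');
    if (strlen($input['name']) == 0) {
        $errors[] = 'Your name is required.';
    }

    return array($errors, $input);
}

// Destructure the two-element return value, as in Example 7-14
list($form_errors, $input) = validate_values(
    array('age' => '30', 'name' => '  Ann ')
);
```

After this call, $form_errors is empty and $input holds the converted values: the integer 30 and the trimmed string Ann.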

3. Number Ranges

To check whether an integer falls within a certain range, use the min_range and max_range options of the FILTER_VALIDATE_INT filter. The options get passed as a fourth argument to filter_input(), as shown in Example 7-15.

Example 7-15. Checking an integer range

$input['age'] = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT,
                             array('options' => array('min_range' => 18,
                                                      'max_range' => 65)));
if (is_null($input['age']) || ($input['age'] === false)) {
    $errors[] = 'Please enter a valid age between 18 and 65.';
}

Notice that the array of options and their values are not themselves the fourth argument to filter_input(). That argument is a one-element array with a key of options and a value of the actual array of options and their values.
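The nesting of the options array is easier to see in isolation. This is a small sketch using filter_var(), which accepts the same options structure as filter_input() but works on a value passed in directly. The validate_int_range() wrapper is a hypothetical name introduced here for illustration.

```php
<?php
// Hypothetical wrapper: returns the integer if it lies in [$min, $max],
// or false otherwise.
function validate_int_range($submitted, int $min, int $max) {
    // The range goes inside an 'options' key, not directly in the array
    $options = array('options' => array('min_range' => $min,
                                        'max_range' => $max));
    return filter_var($submitted, FILTER_VALIDATE_INT, $options);
}
```

For example, validate_int_range('18', 18, 65) gives back the integer 18, while '17', '66', and '40.5' are all rejected.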

In PHP versions before 7.4, the FILTER_VALIDATE_FLOAT filter doesn’t support the min_range and max_range options, so you need to do the comparisons yourself:

$input['price'] = filter_input(INPUT_POST, 'price', FILTER_VALIDATE_FLOAT);
if (is_null($input['price']) || ($input['price'] === false) ||
    ($input['price'] < 10.00) || ($input['price'] > 50.00)) {
    $errors[] = 'Please enter a valid price between $10 and $50.';
}
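If several fields need a floating-point range check, the comparison can be wrapped in a function. The sketch below is one possible shape, assuming a hypothetical helper named validate_float_range() and using filter_var() so that the value is passed in directly.

```php
<?php
// Hypothetical helper: returns the float if it is valid and within
// [$min, $max], or false otherwise.
function validate_float_range($submitted, float $min, float $max) {
    $value = filter_var($submitted, FILTER_VALIDATE_FLOAT);
    // Check validity first, then do the range comparison by hand
    if ($value === false || $value < $min || $value > $max) {
        return false;
    }
    return $value;
}
```

A call such as validate_float_range('19.99', 10.00, 50.00) returns the float 19.99, while '9.99' and 'abc' both yield false.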

To test a date range, convert the submitted date value into a DateTime object and then check that its value is appropriate (for more information on DateTime objects and the checkdate() function used in Example 7-16, see Chapter 15). Because DateTime objects encapsulate all the bits of information necessary to represent a point in time, you don’t have to do anything special when using a range that spans a month or year boundary. Example 7-16 checks to see whether a supplied date is less than six months old.

Example 7-16. Checking a date range

// Make a DateTime object for 6 months ago
$range_start = new DateTime('6 months ago');
// Make a DateTime object for right now
$range_end = new DateTime();

// 4-digit year is in $_POST['year']
// 2-digit month is in $_POST['month']
// 2-digit day is in $_POST['day']
$input['year'] = filter_input(INPUT_POST, 'year', FILTER_VALIDATE_INT,
                              array('options' => array('min_range' => 1900,
                                                       'max_range' => 2100)));
$input['month'] = filter_input(INPUT_POST, 'month', FILTER_VALIDATE_INT,
                               array('options' => array('min_range' => 1,
                                                        'max_range' => 12)));
$input['day'] = filter_input(INPUT_POST, 'day', FILTER_VALIDATE_INT,
                             array('options' => array('min_range' => 1,
                                                      'max_range' => 31)));

// No need to use === to compare to false since 0 is not a valid
// choice for year, month, or day. checkdate() makes sure that
// the number of days is valid for the given month and year.
if ($input['year'] && $input['month'] && $input['day'] &&
    checkdate($input['month'], $input['day'], $input['year'])) {
    // Build the DateTime from a year-month-day string
    $submitted_date = new DateTime($input['year'] . '-' .
                                   $input['month'] . '-' .
                                   $input['day']);
    if (($range_start > $submitted_date) || ($range_end < $submitted_date)) {
        $errors[] = 'Please choose a date less than six months old.';
    }
} else {
    // This happens if someone omits one of the form parameters or submits
    // something like February 31.
    $errors[] = 'Please enter a valid date.';
}
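Date checks that depend on “right now” are hard to verify by hand. One way to make the logic repeatable, sketched here under assumptions, is to pass the current time in as an argument. The function name is_within_last_six_months() and its $now parameter are hypothetical, and DateTimeImmutable is used so that computing the start of the range does not modify $now.

```php
<?php
// Hypothetical helper: true if the date is valid and falls between
// six months before $now and $now itself.
function is_within_last_six_months(int $year, int $month, int $day,
                                   DateTimeImmutable $now): bool {
    // Reject impossible dates such as February 31
    if (! checkdate($month, $day, $year)) {
        return false;
    }
    // Midnight at the start of the submitted day
    $submitted = $now->setDate($year, $month, $day)->setTime(0, 0, 0);
    $range_start = $now->modify('-6 months');
    return ($submitted >= $range_start) && ($submitted <= $now);
}

// Example with a fixed "now" so the result does not change over time
$now = new DateTimeImmutable('2016-07-15 12:00:00');
$recent = is_within_last_six_months(2016, 3, 1, $now);
$too_old = is_within_last_six_months(2015, 12, 31, $now);
```

With the fixed date above, $recent is true and $too_old is false. In a real request you would pass new DateTimeImmutable() as $now.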

4. Email Addresses

Checking an email address is arguably the most common form validation task. There is, however, no perfect one-step way to make sure an email address is valid, since “valid” could mean different things depending on your goal. If you truly want to make sure that someone is giving you a working email address, and that the person providing it controls that address, you need to do two things. First, when the email address is submitted, send a message containing a random string to that address. In the message, tell the user to submit the random string in a form on your site. Or, you can include a URL in the message that the user can just click on, which has the code embedded into it. If the code is submitted (or the URL is clicked on), then you know that the person who received the message and controls the email address submitted it to your site (or at least is aware of and approves of the submission).
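The random string at the heart of this approach can be produced and checked with built-in functions. The following is only a sketch of those two pieces. The function names are hypothetical, and storing the code alongside the address and sending the message are left out.

```php
<?php
// Hypothetical helper: make an unguessable confirmation code.
function make_confirmation_code(): string {
    // 16 random bytes, hex-encoded into a 32-character string
    return bin2hex(random_bytes(16));
}

// Hypothetical helper: compare the stored code with the submitted one.
function code_matches(string $expected, string $submitted): bool {
    // hash_equals() compares the strings in constant time
    return hash_equals($expected, $submitted);
}
```

The code returned by make_confirmation_code() would be saved with the pending address and placed in the message or URL. When it comes back, code_matches() decides whether to mark the address as confirmed.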

If you don’t want to go to all the trouble of verifying the email address with a separate message, there is still an easy syntax check you can do in your form validation code to weed out mistyped addresses. The FILTER_VALIDATE_EMAIL filter checks strings against the rules for valid email addresses, as shown in Example 7-17.

Example 7-17. Checking the syntax of an email address

$input['email'] = filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL);
if (! $input['email']) {
    $errors[] = 'Please enter a valid email address';
}

In Example 7-17, the simpler validity check if (! $input['email']) is fine because any submitted strings that would evaluate to false (such as the empty string or 0) are also invalid email addresses.
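To see the filter's behavior on a few sample strings without a form, filter_var() can stand in for filter_input(). The looks_like_email() wrapper below is a hypothetical name used only for this sketch.

```php
<?php
// Hypothetical wrapper: true if the string passes the email syntax filter.
function looks_like_email(string $submitted): bool {
    return filter_var($submitted, FILTER_VALIDATE_EMAIL) !== false;
}
```

An address such as user@example.com passes, while the empty string, 0, and a string with no @ sign are all rejected. Keep in mind that this checks syntax only and says nothing about whether the mailbox exists.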

5. <select> Menus

When you use a <select> menu in a form, you need to ensure that the submitted value for the menu element is one of the permitted choices in the menu. Although a user can’t submit an off-menu value using a mainstream, well-behaved browser such as Firefox or Chrome, an attacker can construct a request containing any arbitrary value without using a browser.

To simplify display and validation of <select> menus, put the menu choices in an array. Then, iterate through that array to display the <select> menu inside the show_form() function. Use the same array in validate_form() to check the submitted value. Example 7-18 shows how to display a <select> menu with this technique.

Example 7-18. Displaying a <select> menu

$sweets = array('Sesame Seed Puff','Coconut Milk Gelatin Square',
                'Brown Sugar Cake','Sweet Rice and Meat');

function generate_options($options) {
    $html = '';
    foreach ($options as $option) {
        $html .= "<option>$option</option>\n";
    }
    return $html;
}

// Display the form
function show_form() {
    $sweets = generate_options($GLOBALS['sweets']);
    print<<<_HTML_
<form method="post" action="$_SERVER[PHP_SELF]">
Your Order: <select name="order">
$sweets
</select>
<br/>
<input type="submit" value="Order">
</form>
_HTML_;
}

The HTML that show_form() in Example 7-18 prints is:

<form method="post" action="order.php">
Your Order: <select name="order">
<option>Sesame Seed Puff</option>
<option>Coconut Milk Gelatin Square</option>
<option>Brown Sugar Cake</option>
<option>Sweet Rice and Meat</option>
</select>
<br/>
<input type="submit" value="Order">
</form>

Inside validate_form(), use the array of <select> menu options like this:

$input['order'] = $_POST['order'];
if (! in_array($input['order'], $GLOBALS['sweets'])) {
    $errors[] = 'Please choose a valid order.';
}

If you want a <select> menu with different displayed choices and option values, you need to use a more complicated array. Each array element key is a value attribute for one option. The corresponding array element value is the displayed choice for that option. In Example 7-19, the option values are puff, square, cake, and ricemeat. The displayed choices are Sesame Seed Puff, Coconut Milk Gelatin Square, Brown Sugar Cake, and Sweet Rice and Meat.

Example 7-19. A <select> menu with different choices and values

$sweets = array('puff' => 'Sesame Seed Puff',
                'square' => 'Coconut Milk Gelatin Square',
                'cake' => 'Brown Sugar Cake',
                'ricemeat' => 'Sweet Rice and Meat');

function generate_options_with_value($options) {
    $html = '';
    foreach ($options as $value => $option) {
        $html .= "<option value=\"$value\">$option</option>\n";
    }
    return $html;
}

// Display the form
function show_form() {
    $sweets = generate_options_with_value($GLOBALS['sweets']);
    print<<<_HTML_
<form method="post" action="$_SERVER[PHP_SELF]">
Your Order: <select name="order">
$sweets
</select>
<br/>
<input type="submit" value="Order">
</form>
_HTML_;
}

The form displayed by Example 7-19 is as follows:

<form method="post" action="order.php">
Your Order: <select name="order">
<option value="puff">Sesame Seed Puff</option>
<option value="square">Coconut Milk Gelatin Square</option>
<option value="cake">Brown Sugar Cake</option>
<option value="ricemeat">Sweet Rice and Meat</option>
</select>
<br/>
<input type="submit" value="Order">
</form>

The submitted value for the <select> menu in Example 7-19 should be puff, square, cake, or ricemeat. Example 7-20 shows how to verify this in validate_form().

Example 7-20. Checking a <select> menu submission value

$input['order'] = $_POST['order'];
if (! array_key_exists($input['order'], $GLOBALS['sweets'])) {
    $errors[] = 'Please choose a valid order.';
}
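A hand-built request can omit the order field entirely or send it as an array rather than a string, and array_key_exists() is not meant to receive an array as its key. The sketch below shows one defensive variation, assuming a hypothetical helper named is_valid_choice() that takes the submitted value and the choices as arguments.

```php
<?php
$sweets = array('puff' => 'Sesame Seed Puff',
                'square' => 'Coconut Milk Gelatin Square',
                'cake' => 'Brown Sugar Cake',
                'ricemeat' => 'Sweet Rice and Meat');

// Hypothetical helper: accept only a string that is one of the keys.
function is_valid_choice($submitted, array $choices): bool {
    // Rule out missing values and arrays before looking up the key
    return is_string($submitted) && array_key_exists($submitted, $choices);
}
```

Inside validate_form(), a call such as is_valid_choice($_POST['order'] ?? null, $GLOBALS['sweets']) could then replace the direct array_key_exists() test.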

6. HTML and JavaScript

Submitted form data that contains HTML or JavaScript can cause big problems. Consider a simple blog application that lets users submit comments on a blog post page and then displays a list of those comments below the blog post. If users behave nicely and enter only comments containing plain text, the page remains benign. One user submits Cool page! I like how you list the different ways to cook fish.

When you come along to browse the page, that’s what you see.

The situation is more complicated when the submissions are not just plain text. If an enthusiastic user submits This page <b>rules!!!!</b> as a comment, and it is redisplayed verbatim by the application, then you see rules!!!! in bold when you browse the page. Your web browser can’t tell the difference between HTML tags that come from the application itself (perhaps laying out the comments in a table or a list) and HTML tags that happen to be embedded in the comments that the application is printing.

Although seeing bold text instead of plain text is a minor annoyance, displaying unfiltered user input leaves the application open to giving you a much larger headache. Instead of <b></b> tags, one user’s submission could contain a malformed or unclosed tag (such as <a href=" with no ending " or >) that prevents your browser from displaying the page properly. Even worse, that submission could contain JavaScript code that, when executed by your web browser as you look at the page, does nasty stuff such as send a copy of your cookies to a stranger’s email box or surreptitiously redirect you to another web page.

The application acts as a facilitator, letting a malicious user upload some HTML or JavaScript that is later run by an unwitting user’s browser. This kind of problem is called a cross-site scripting attack because the poorly written blog application allows code from one source (the malicious user) to masquerade as coming from another place (the application hosting the comments).

To prevent cross-site scripting attacks in your programs, never display unmodified external input. Either remove suspicious parts (such as HTML tags) or encode special characters so that browsers don’t act on embedded HTML or JavaScript. PHP gives you two functions that make these tasks simple. The strip_tags() function removes HTML tags from a string, and the htmlentities() function encodes special HTML characters.

Example 7-21 demonstrates strip_tags().

Example 7-21. Stripping HTML tags from a string

// Remove HTML from comments
$comments = strip_tags($_POST['comments']);
// Now it's OK to print $comments
print $comments;

If $_POST['comments'] contains

I <b>love</b> sweet <div class="fancy">rice</div> & tea.

then Example 7-21 prints:

I love sweet rice & tea.

All HTML tags and their attributes are removed, but the plain text between the tags is left intact. The strip_tags() function is very convenient, but it behaves poorly with mismatched < and > characters. For example, it turns I <3 Monkeys into I . It starts stripping once it sees that < and never stops because there’s no corresponding >.

Encoding instead of stripping the tags often gives better results. Example 7-22 demonstrates encoding with htmlentities().

Example 7-22. Encoding HTML entities in a string

$comments = htmlentities($_POST['comments']);
// Now it's OK to print $comments
print $comments;

If $_POST['comments'] contains

I <b>love</b> sweet <div class="fancy">rice</div> & tea.

then Example 7-22 prints:

I &lt;b&gt;love&lt;/b&gt; sweet &lt;div class=&quot;fancy&quot;&gt;rice&lt;/div&gt; &amp; tea.

The characters that have a special meaning in HTML (<, >, &, and ") have been changed into their entity equivalents:

  • < to &lt;
  • > to &gt;
  • & to &amp;
  • " to &quot;

When a browser sees &lt;, it prints out a < character instead of thinking “OK, here comes an HTML tag.” This is the same idea (but with a different syntax) as escaping a " or $ character inside a double-quoted string, as you saw in “Text” on page 19. Figure 7-4 shows what the output of Example 7-22 looks like in a web browser.

In most applications, you should use htmlentities() to sanitize external input. This function doesn’t throw away any content, and it also protects against cross-site scripting attacks. A discussion board where users post messages, for example, about HTML (“What does the <div> tag do?”) or algebra (“If x<y, is 2x>z?”) wouldn’t be very useful if those posts were run through strip_tags(). The questions would be printed as “What does the tag do?” and “If xz?”
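The contrast between the two functions can be tried directly on strings. This sketch wraps htmlentities() in a hypothetical helper named escape_for_html(). Passing ENT_QUOTES and an explicit UTF-8 charset is a choice made for this illustration and goes beyond the one-argument calls in the examples above.

```php
<?php
// Hypothetical wrapper: encode text for safe display inside an HTML page.
function escape_for_html(string $text): string {
    // ENT_QUOTES also encodes single quotes; the charset is stated explicitly
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// Encoding keeps every character of the question, in entity form
$question = 'If x<y, is 2x>z?';
$encoded = escape_for_html($question);

// Stripping removes well-formed tags and keeps the text between them
$stripped = strip_tags('I <b>love</b> sweet rice.');
```

Here $encoded is If x&lt;y, is 2x&gt;z?, which a browser displays as the original question, and $stripped is I love sweet rice.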

7. Beyond Syntax

Most of the validation strategies discussed in this chapter so far check the syntax of a submitted value. They make sure that what’s submitted matches a certain format. However, sometimes you want to make sure that a submitted value has not just the correct syntax, but an acceptable meaning as well. The <select> menu validation does this. Instead of just assuring that the submitted value is a string, it matches it against a specific array of values. The confirmation-message strategy for checking email addresses is another example of checking for more than syntax. If you ensure only that a submitted email address has the correct form, a mischievous user can provide an address such as president@whitehouse.gov that almost certainly doesn’t belong to her. The confirmation message makes sure that the meaning of the address (i.e., “this email address belongs to the user providing it”) is correct.

Source: Sklar David (2016), Learning PHP: A Gentle Introduction to the Web’s Most Popular Language, O’Reilly Media; 1st edition.
