Author Topic: Data validation - a draft  (Read 1125 times)

Offline Marek

Data validation - a draft
« on: August 24, 2010, 07:11:46 PM »
I'm working on a little data validation system, and I'd like comments on this draft specification. Don't worry, it's not too long. :)

The context: My web apps use a lot of JavaScript along with Ajax requests. The server side mostly acts as a simple JSON interface for POST requests. The server validates the parameters it gets, and ideally the JavaScript should also validate the data before sending it to the server. In the spirit of DRY, the validation rules should only be expressed once. I've seen existing things like form validators and JSON Schema, but I have yet to see something simple and portable enough. I want the implementation to be really small and clean.

The approach: My choice is to use JSON to describe validation rules. For every POST parameter, there is a rule list which describes the restrictions that the value must conform to. These rules only concern the value itself; other data, such as whether it is required or optional, isn't included here.

I'm implementing this in JavaScript as well as in PHP and Python. Because of these parallel implementations, one of the goals is to keep things really simple.

Specification:
In order to be considered valid, a value has to conform to a set of rules. We'll call this set of rules the "rule schema". For example, a username string can have rules that dictate its minimum and maximum length, as well as its allowed characters. It can also be required to match a regular expression.

A "rule schema" is expressed as a list, which in JSON is an array. In JavaScript this is called an array, in PHP it's called a non-associative array, and in Python it's called a list. An element of the rule schema is called a "rule" and it is also expressed as a list. Each rule's first element is a string representing the name of the rule, and the remaining elements are the parameters for that rule.

Here's an example: a rule schema for validating an integer between 0 and 10 inclusive.
Code: [Select]
var exampleSchema = [
    ['integer'],
    ['range', 0, 10]
];

var value = 11;
validator(value, exampleSchema);  // Example usage; 11 is out of range, so this should raise an exception
In this example, the 'range' rule takes two parameters (a lower and upper bound). The 'integer' rule takes no parameters.
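
Because a rule schema is nothing more than nested lists of strings and numbers, it can travel between client and server as plain JSON text. A small sketch of that round trip, assuming the standard JSON.stringify and JSON.parse functions are available:

```javascript
var exampleSchema = [
    ['integer'],
    ['range', 0, 10]
];

// Serialise the schema to JSON text, e.g. to ship it from the server to the browser.
var wireFormat = JSON.stringify(exampleSchema);

// Parse it back; the result has the same structure as the original.
var received = JSON.parse(wireFormat);

console.log(wireFormat);      // [["integer"],["range",0,10]]
console.log(received[1][0]);  // range
```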

Every rule represents a callback to a "processing function". It's possible to add rules by adding new processing functions.

A processing function accepts a value to process, and some number of rule parameters. It has two outcomes: if it decides that the value conforms to the rule, it returns the value. (It doesn't have to return the same value it received. Therefore processing functions can serve as filters.) If it decides that the value doesn't conform, it raises an exception which signals failure.
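
To illustrate the filter idea, here is a sketch of what an 'integer' processing function might do. The name validate_integer and the decision to accept digit strings such as "42" are assumptions made for illustration, not part of the draft:

```javascript
// Hypothetical processing function for the 'integer' rule.
// Accepts integer numbers and strings of digits; returns the value as a number.
function validate_integer(value) {
    if (typeof value === 'string' && /^-?\d+$/.test(value)) {
        return parseInt(value, 10);   // filter: "42" becomes 42
    }
    if (typeof value === 'number' && isFinite(value) && Math.floor(value) === value) {
        return value;                 // already an integer, passed through unchanged
    }
    throw new Error('not an integer: ' + value);
}

console.log(validate_integer('42'));  // 42
console.log(validate_integer(7));     // 7
```

Anything else, such as 1.5 or 'abc', ends in the thrown Error, which is the failure signal described above.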

The order of rules is significant, because some processing functions can potentially modify the value before passing it down the chain.
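
The chaining described above could be sketched roughly as follows. The processors registry, its two entries, and the body of validator are assumptions for illustration; only the validator(value, schema) call shape comes from the earlier example.

```javascript
// Hypothetical registry mapping rule names to processing functions.
var processors = {
    'integer': function (value) {
        if (!/^-?\d+$/.test(String(value))) {
            throw new Error('integer: not an integer');
        }
        return parseInt(value, 10);
    },
    'range': function (value, min, max) {
        if (value < min || value > max) {
            throw new Error('range: out of range');
        }
        return value;
    }
};

// Run every rule in order; each rule receives the value returned by the previous one.
function validator(value, schema) {
    for (var i = 0; i < schema.length; i++) {
        var name = schema[i][0];
        var params = schema[i].slice(1);
        if (!Object.prototype.hasOwnProperty.call(processors, name)) {
            throw new Error('unknown rule: ' + name);
        }
        // Call the processing function with the value followed by the rule parameters.
        value = processors[name].apply(null, [value].concat(params));
    }
    return value;
}

var schema = [['integer'], ['range', 0, 10]];
console.log(validator('7', schema));  // 7
try {
    validator(11, schema);
} catch (e) {
    console.log(e.message);           // range: out of range
}
```

Because 'integer' runs first, 'range' receives a number rather than the original string. Adding a rule in this sketch means adding one more entry to processors.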

In the above example, the 'range' rule would call the range processing function which might look like this:
Code: [Select]
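function validate_range(value, min_value, max_value) {
   // Return value if it lies between min_value and max_value (inclusive), otherwise throw an exception
   if (value < min_value || value > max_value) throw new Error('value out of range');
   return value;
}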

Here's another example for a hypothetical username:
Code: [Select]
var usernameSchema = [
    ['string'],
    ['length', 3, 32],
    ['allowed', alphanum + "_-"] // where the alphanum variable contains all the alphanumeric characters
];
Here the 'allowed' processing function takes one string argument listing the characters that are permitted in the value.
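
As a rough sketch, the 'length' and 'allowed' processing functions might look like the following. The names validate_length and validate_allowed simply follow the validate_range pattern, and the alphanum definition is one possible choice; all three are assumptions.

```javascript
// Hypothetical processing function for the 'length' rule (bounds inclusive).
function validate_length(value, min_length, max_length) {
    if (value.length < min_length || value.length > max_length) {
        throw new Error('length must be between ' + min_length + ' and ' + max_length);
    }
    return value;
}

// Hypothetical processing function for the 'allowed' rule.
function validate_allowed(value, chars) {
    for (var i = 0; i < value.length; i++) {
        // Every character of the value has to appear somewhere in chars.
        if (chars.indexOf(value.charAt(i)) === -1) {
            throw new Error('character not allowed: ' + value.charAt(i));
        }
    }
    return value;
}

// One possible definition of the alphanum variable.
var alphanum = 'abcdefghijklmnopqrstuvwxyz' +
               'ABCDEFGHIJKLMNOPQRSTUVWXYZ' +
               '0123456789';

console.log(validate_allowed(validate_length('marek_01', 3, 32), alphanum + '_-'));  // marek_01
```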

This is the list of rules I'm implementing so far.
  • string
  • integer
  • float
  • bool
  • range <min> <max>
  • length <min> <max>
  • allowed <chars>
  • forbidden <chars>
  • match <regexp>
  • replace <search> <replace>
  • replace_regexp <regex search> <replace>

Extra example: a regular expression match
Code: [Select]
var emailSchema = [
    ['string'],
    ['match',  '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}$']  // backslash doubled so the string still contains \.
];
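
A 'match' processing function could be sketched like this, assuming the pattern is stored as a string and compiled with the RegExp constructor. The name validate_match is hypothetical.

```javascript
// Hypothetical processing function for the 'match' rule.
function validate_match(value, pattern) {
    // The pattern arrives as a string (so it can live in JSON) and is compiled here.
    if (!new RegExp(pattern).test(value)) {
        throw new Error('value does not match ' + pattern);
    }
    return value;
}

var emailPattern = '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}$';
console.log(validate_match('marek@example.com', emailPattern));  // marek@example.com
```

Inside a JavaScript string literal, and inside JSON text, the backslash has to be written twice for the regular expression engine to see a literal-dot escape.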

Request for comments
  • What else do I need to make this an adequate validator system?
  • Should I split up the range and length rules into individual max and min rules? It might be more flexible.
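
To make the second question concrete, here is one way the split could look. Separate 'min' and 'max' rules allow one-sided bounds, and 'range' can stay as a shorthand built on top of them. The names validate_min and validate_max are hypothetical.

```javascript
function validate_min(value, min_value) {
    if (value < min_value) {
        throw new Error('value is below ' + min_value);
    }
    return value;
}

function validate_max(value, max_value) {
    if (value > max_value) {
        throw new Error('value is above ' + max_value);
    }
    return value;
}

// 'range' remains available as a convenience wrapper around the two.
function validate_range(value, min_value, max_value) {
    return validate_max(validate_min(value, min_value), max_value);
}

// With separate rules, the schema for an integer in 0..10 would read:
var splitSchema = [
    ['integer'],
    ['min', 0],
    ['max', 10]
];

console.log(validate_range(5, 0, 10));  // 5
```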
« Last Edit: August 24, 2010, 07:15:54 PM by nano »

Offline aerosuidae

Re: Data validation - a draft
« Reply #1 on: October 23, 2010, 08:52:58 AM »
Looks interesting.


This is the list of rules I'm implementing so far.
  • string
  • integer
  • float
  • bool
  • range <min> <max>
  • length <min> <max>
  • allowed <chars>
  • forbidden <chars>
  • match <regexp>
  • replace <search> <replace>
  • replace_regexp <regex search> <replace>

I have a method that does much the same thing as you're creating here, but it is not extensible like yours.  I find it useful to accept the following additional types:

unsigned int  (0 or greater)
positive int  (1 or greater)
csv  (comma delimited values, returned as a 1D array)
json  (a json string to parse and return as an object)

The first two can probably be done with your range rule.
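
In the rule-list style from the first post, these four types might look like the sketch below. The one-sided bounds are written with a hypothetical 'min' rule, since 'range' as drafted takes both a lower and an upper bound; validate_csv and validate_json are likewise illustrative names.

```javascript
// Unsigned and positive integers as schemas (the 'min' rule is an assumption).
var unsignedIntSchema = [['integer'], ['min', 0]];
var positiveIntSchema = [['integer'], ['min', 1]];

// 'csv': split a comma-delimited string into a 1D array.
function validate_csv(value) {
    if (typeof value !== 'string') {
        throw new Error('csv: expected a string');
    }
    return value.split(',');
}

// 'json': parse a JSON string and return the resulting object.
function validate_json(value) {
    try {
        return JSON.parse(value);
    } catch (e) {
        throw new Error('json: could not parse value');
    }
}

console.log(validate_csv('a,b,c').length);   // 3
console.log(validate_json('{"id": 5}').id);  // 5
```

Both functions return something other than the string they were given, so they rely on the draft's allowance for processing functions that act as filters.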

My method is roughly (pseudo code):

Code: [Select]
function expect($source_array, $source_key, $data_type, $default_value) { ... }

$id = expect($_POST, 'id', 'positive int', null);

assert(!is_null($id) && ... , ...);



 

