Data validation best practices for web applications

Malformed data costs organizations millions. In 1999, NASA lost the $125 million Mars Climate Orbiter because one team's software produced values in imperial units while another team's software expected metric. In 2018, Samsung Securities suffered a $300 million loss when an employee entered 1,000 shares per share instead of 1,000 won per share for a dividend payment. Both failures trace back to the same root cause: inadequate data validation.

For web developers, data validation is one of the most important skills to master. It protects your application from security vulnerabilities, prevents corrupted data from reaching your database, and gives users clear feedback when something goes wrong.

What is data validation?

Data validation is the process of ensuring that only properly formed data enters your application. According to OWASP, "Input validation is performed to ensure only properly formed data is entering the workflow in an information system, preventing malformed data from persisting in the database and triggering malfunction of various downstream components."

In simpler terms, data validation checks that the data your application receives meets your expectations before you do anything with it.

Types of data validation

Understanding the different types of validation helps you build comprehensive validation logic.

1. Data type validation

Verifies data is the correct type: string, number, boolean, or date. A quantity field should contain a number, not text.
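As a minimal sketch of a type check, the function below turns an untrusted value into a whole-number quantity or rejects it. The name parseQuantity and the choice to accept only non-negative integers are assumptions made for illustration.

```typescript
// Hypothetical helper: convert an untrusted value into a non-negative
// whole-number quantity, or return null when it is not one.
function parseQuantity(raw: unknown): number | null {
  if (typeof raw === "number") {
    return Number.isSafeInteger(raw) && raw >= 0 ? raw : null;
  }
  if (typeof raw === "string" && /^\d+$/.test(raw.trim())) {
    const parsed = Number(raw.trim());
    return Number.isSafeInteger(parsed) ? parsed : null;
  }
  // Booleans, objects, null, undefined, "abc", "1.5", and so on.
  return null;
}

console.log(parseQuantity("42"));  // 42
console.log(parseQuantity("abc")); // null
```

Returning null instead of throwing keeps the caller in control of how the error is reported to the user.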

2. Format validation

Ensures data matches an expected format. Dates should follow a pattern such as YYYY-MM-DD. Phone numbers should follow an international standard such as E.164. Postal codes should match country-specific formats.

3. Range validation

Checks that numerical values fall within acceptable bounds. An age field might accept values from 0 to 150. Latitude values must be between -90 and 90.
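A range check can be sketched in a few lines. The helper names below are illustrative, and the bounds are the ones mentioned above.

```typescript
// True when value is a finite number within [min, max], inclusive.
function isInRange(value: number, min: number, max: number): boolean {
  return Number.isFinite(value) && value >= min && value <= max;
}

// Ages are whole numbers from 0 to 150.
function isValidAge(age: number): boolean {
  return Number.isInteger(age) && isInRange(age, 0, 150);
}

// Latitudes are any finite number from -90 to 90.
function isValidLatitude(latitude: number): boolean {
  return isInRange(latitude, -90, 90);
}
```

The Number.isFinite guard matters because NaN and Infinity are numbers as far as the type system is concerned, yet neither belongs in a bounded field.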

4. Length validation

Ensures string length meets requirements. Passwords might require a minimum of 8 characters. Usernames might have a maximum of 30 characters.

5. Presence validation

Confirms required fields are not empty or null. An email field marked as required must contain a value before form submission.

6. Uniqueness validation

Ensures values like IDs, emails, and usernames are unique in the system. Prevents duplicate accounts and maintains data integrity.

7. Consistency validation

Checks relationships between fields, which is why it is often called cross-field validation. An end date should come after a start date. A confirm password field should match the password field.
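Here is a rough sketch of a cross-field check. The EventForm shape and the error wording are hypothetical, and the dates are assumed to have already passed a YYYY-MM-DD format check.

```typescript
interface EventForm {
  startDate: string; // assumed to be a format-checked YYYY-MM-DD string
  endDate: string;   // assumed to be a format-checked YYYY-MM-DD string
  password: string;
  confirmPassword: string;
}

// Returns one message per broken relationship between fields.
function checkConsistency(form: EventForm): string[] {
  const errors: string[] = [];
  // Zero-padded ISO dates sort chronologically when compared as strings.
  if (form.endDate <= form.startDate) {
    errors.push("End date must come after the start date.");
  }
  if (form.password !== form.confirmPassword) {
    errors.push("Passwords do not match.");
  }
  return errors;
}
```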

8. Pattern validation

Validates data against regular expressions. Email addresses, phone numbers, and URLs all have expected patterns that regex can verify.

Client-side vs server-side validation

One of the most common questions developers ask is where to implement validation. The answer from OWASP is clear: you need both.

Client-side validation

Client-side validation runs in the browser and provides immediate feedback. Users see errors as they type, which improves the experience and reduces unnecessary server requests.

However, client-side validation has a critical limitation. OWASP states: "Input validation must be implemented on the server-side before any data is processed by an application's functions, as any JavaScript-based input validation performed on the client-side can be circumvented by an attacker who disables JavaScript or uses a web proxy."

Server-side validation

Server-side validation is your security boundary. Never trust data from the client, even if client-side validation passed. An attacker can bypass your frontend entirely by sending requests directly to your API.

The recommendation

OWASP recommends implementing both: client-side for user experience, server-side for security. Think of client-side validation as a convenience feature and server-side validation as your security requirement.
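To make the server-side half concrete, here is a framework-free sketch of a handler that treats the request body as unknown and re-checks every field, whatever the browser did beforehand. The function name, field limits, and status codes are assumptions for illustration, not a prescribed API.

```typescript
interface HandlerResult {
  status: number;   // HTTP-style status code
  errors: string[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical signup handler: nothing about the body is trusted.
function handleSignup(body: unknown): HandlerResult {
  if (!isRecord(body)) {
    return { status: 400, errors: ["Request body must be a JSON object."] };
  }
  const errors: string[] = [];
  const username = body["username"];
  const password = body["password"];
  if (typeof username !== "string" || username.length < 3 || username.length > 30) {
    errors.push("username must be a string of 3 to 30 characters.");
  }
  if (typeof password !== "string" || password.length < 8) {
    errors.push("password must be a string of at least 8 characters.");
  }
  return errors.length === 0 ? { status: 201, errors } : { status: 422, errors };
}
```

Typing the body as unknown forces every field through a check before use, which is the point: a request crafted with curl never ran your client-side code.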

Allowlist vs denylist validation

A fundamental security decision is whether to use allowlist or denylist validation.

The denylist approach (avoid this)

Denylist validation tries to block known dangerous inputs. You might block the apostrophe character to prevent SQL injection, or block <script> tags to prevent XSS.

OWASP explicitly warns against this: "It is a common mistake to use denylist validation in order to try to detect possibly dangerous characters and patterns like the apostrophe ' character, the string 1=1, or the <script> tag, but this is a massively flawed approach as it is trivial for an attacker to bypass such filters."

The allowlist approach (use this)

Allowlist validation defines exactly what is allowed. OWASP recommends this approach: "Allowlist validation involves defining exactly what IS authorized, and by definition, everything else is not authorized."

For example, if a username should only contain letters, numbers, and underscores, validate that it contains only those characters rather than trying to block dangerous characters.
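The username rule above can be sketched as a single anchored pattern. The 3 to 30 character limit is an assumption, and the denylist function is included only to show how easily that style of filter is sidestepped.

```typescript
// Allowlist: the entire string must be letters, digits, or underscores.
const USERNAME_ALLOWLIST = /^[A-Za-z0-9_]{3,30}$/;

function isValidUsername(username: string): boolean {
  return USERNAME_ALLOWLIST.test(username);
}

// Denylist, shown for contrast only: blocks two known-bad patterns.
function passesNaiveDenylist(input: string): boolean {
  const lowered = input.toLowerCase();
  return lowered.indexOf("'") === -1 && lowered.indexOf("<script>") === -1;
}

const payload = "<img src=x onerror=alert(1)>";
console.log(passesNaiveDenylist(payload)); // true: the filter misses it
console.log(isValidUsername(payload));     // false: not on the allowlist
```

The allowlist never needed to know about the img tag. Anything outside the permitted character set fails by default.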

Syntactic vs semantic validation

OWASP distinguishes between two levels of validation:

Syntactic validation

Enforces correct syntax of structured fields. An email address should contain an @ symbol and a domain. A date should follow the YYYY-MM-DD format. A currency value should match expected decimal patterns.

Semantic validation

Enforces correctness of values in specific business contexts. A start date should come before an end date. An order quantity should not exceed available inventory. A shipping country should be valid for the selected shipping method.

Both levels matter. Syntactic validation catches malformed input. Semantic validation catches input that is well-formed but invalid for your business logic.
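A small sketch of the two levels working in sequence, using the order-quantity example. The function name, the six-digit cap, and the result shape are all hypothetical.

```typescript
type OrderResult =
  | { ok: true; quantity: number }
  | { ok: false; stage: "syntactic" | "semantic"; message: string };

function validateOrder(rawQuantity: string, availableInventory: number): OrderResult {
  // Syntactic: a positive whole number, no leading zero, at most six digits.
  if (!/^[1-9]\d{0,5}$/.test(rawQuantity)) {
    return { ok: false, stage: "syntactic", message: "Quantity must be a whole number between 1 and 999999." };
  }
  const quantity = Number(rawQuantity);
  // Semantic: well-formed, but is it valid for the business right now?
  if (quantity > availableInventory) {
    return { ok: false, stage: "semantic", message: "Only " + availableInventory + " items are in stock." };
  }
  return { ok: true, quantity };
}
```

Running the syntactic check first means the semantic check can assume it is working with a real number rather than arbitrary text.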

Common validation patterns

Here are practical patterns for validating common data types.

Email validation

A practical pattern for most applications:

^[^\s@]+@[^\s@]+\.[^\s@]+$

Per RFC 5321 and OWASP guidelines, thorough email validation should check:

Contains two parts separated by @ symbol
No dangerous characters (backticks, quotes, null bytes)
Domain part contains only letters, numbers, hyphens, and periods
Local part max 63 characters (OWASP's figure; RFC 5321 itself allows up to 64 octets)
Total length max 254 characters

OWASP recommends performing this basic validation and then passing the address to the mail server, catching the exception if the server rejects it. The mail server is the ultimate authority on whether an address is valid.
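The checklist above might be sketched as follows. The function name and error wording are illustrative, and this covers only the basic checks, not the step of handing the address to a mail server.

```typescript
// Returns a list of problems; an empty list means the basic checks passed.
function validateEmail(address: string): string[] {
  const errors: string[] = [];
  if (address.length > 254) {
    errors.push("Total length must not exceed 254 characters.");
  }
  // Backticks, single quotes, double quotes, and null bytes.
  if (/[`'"\0]/.test(address)) {
    errors.push("Address contains a disallowed character.");
  }
  const at = address.indexOf("@");
  if (at <= 0 || at !== address.lastIndexOf("@") || at === address.length - 1) {
    errors.push("Address must have a local part and a domain separated by one @.");
    return errors; // the remaining checks need both parts
  }
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  if (local.length > 63) {
    errors.push("Local part must not exceed 63 characters.");
  }
  if (!/^[A-Za-z0-9.-]+$/.test(domain)) {
    errors.push("Domain may contain only letters, numbers, hyphens, and periods.");
  }
  return errors;
}
```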

Phone number validation

For international phone numbers, use E.164 format:

^\+?[1-9]\d{1,14}$

This allows an optional plus sign followed by 2 to 15 digits, the first of which must be non-zero.
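A short sketch of the pattern in use. The normalizePhone helper is an assumption: it strips the spaces, parentheses, periods, and hyphens that people commonly type, so the strict pattern can then be applied.

```typescript
const E164_PATTERN = /^\+?[1-9]\d{1,14}$/;

function isE164(phone: string): boolean {
  return E164_PATTERN.test(phone);
}

// Hypothetical pre-processing step: remove common visual separators.
function normalizePhone(input: string): string {
  return input.replace(/[\s().-]/g, "");
}

console.log(isE164("+14155552671"));                      // true
console.log(isE164("+1 (415) 555-2671"));                 // false
console.log(isE164(normalizePhone("+1 (415) 555-2671"))); // true
```

Keep in mind that the pattern checks shape only. It cannot tell you whether the number is assigned or reachable.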

Date validation

ISO 8601 format (YYYY-MM-DD):

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Note: This validates format but not actual date validity. February 30 would pass the regex but is not a real date. Use a date library for full validation.
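If pulling in a date library is not an option, the gap can be closed with a small calendar check on top of the same pattern. This is a sketch with hypothetical helper names, using Gregorian leap-year rules.

```typescript
const ISO_DATE = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

function isLeapYear(year: number): boolean {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// True only when the string is well-formed AND names a real calendar day.
function isRealDate(value: string): boolean {
  const match = ISO_DATE.exec(value);
  if (match === null) {
    return false;
  }
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  const daysInMonth = [31, isLeapYear(year) ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return day <= daysInMonth[month - 1];
}

console.log(isRealDate("2024-02-29")); // true
console.log(isRealDate("2024-02-30")); // false
```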

URL validation

Basic pattern:

^https?:\/\/[^\s/$.?#].[^\s]*$

For production use, consider using the URL constructor and catching exceptions.
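A minimal sketch of that approach, assuming a runtime that provides the standard URL class (browsers and Node.js both do). The function name and the decision to accept only http and https are assumptions.

```typescript
// Returns the parsed URL for http(s) addresses, or null for anything else.
function parseHttpUrl(input: string): URL | null {
  try {
    const url = new URL(input); // throws on input it cannot parse
    return url.protocol === "http:" || url.protocol === "https:" ? url : null;
  } catch {
    return null;
  }
}

console.log(parseHttpUrl("https://example.com/docs") !== null); // true
console.log(parseHttpUrl("javascript:alert(1)"));               // null
```

The protocol check is worth keeping even though the constructor already rejects garbage: strings such as javascript:alert(1) parse successfully as URLs, so parsing alone does not make a value safe to use as a link.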

JavaScript validation libraries

Modern JavaScript/TypeScript projects benefit from validation libraries that handle common patterns and provide type safety.

Zod (TypeScript-first)

Zod has become one of the most widely used validation libraries for TypeScript projects. It infers static types directly from your schemas, so the validation rules and the TypeScript types cannot drift apart.

import { z } from 'zod';

const userSchema = z.object({
  username: z.string().min(3).max(30),
  email: z.string().email(),
  password: z.string().min(8),
});

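// Sample input; in a real application this comes from a form or request body.
const userData: unknown = {
  username: 'alice',
  email: 'alice@example.com',
  password: 'correct-horse',
};

try {
  // parse() returns the typed data or throws a ZodError.
  const validData = userSchema.parse(userData);
  console.log('Valid data:', validData);
} catch (e) {
  if (e instanceof z.ZodError) {
    // issues lists each failed rule with its path and message.
    console.error('Validation error:', e.issues);
  } else {
    throw e;
  }
}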

Best for: TypeScript projects where you want type inference from your validation schemas.

Yup (React forms)

Yup integrates well with React form libraries like Formik and React Hook Form.

import * as Yup from 'yup';

const schema = Yup.object().shape({
  username: Yup.string().min(3).max(30).required(),
  email: Yup.string().email().required(),
  password: Yup.string().min(8).required(),
});

Best for: React applications using Formik or similar form libraries.

Joi (Server-side)

Joi excels at server-side validation with complex schema definitions.

import Joi from 'joi';

const schema = Joi.object({
  username: Joi.string().alphanum().min(3).max(30).required(),
  email: Joi.string().email({ minDomainSegments: 2 }).required(),
  // Length rule only: restricting passwords to alphanumerics would weaken them.
  password: Joi.string().min(8).required(),
});

Best for: Node.js server-side validation with complex business rules.

Library comparison

Library	Best For	Key Feature	TypeScript Support
Zod	TypeScript projects	Type inference from schemas	Native
Yup	React forms (with Formik/RHF)	Chainable API	Good
Joi	Node.js/server-side	Complex schemas	Via types
Validator.js	Simple string validation	Lightweight	Via types
class-validator	NestJS projects	Decorator-based	Native

Best practices

Here are the practices that prevent validation failures:

Validate on both client and server: Client-side for UX, server-side for security. Never rely on client-side validation alone.
Use allowlist over denylist: Define what is allowed rather than trying to block what is dangerous. Attackers will find ways around denylists.
Validate early in the data flow: OWASP advises, "Input validation should happen as early as possible in the data flow, preferably as soon as the data is received from the external party."
Provide clear error messages: Tell users exactly what is wrong and how to fix it. "Email must include @ symbol" is more helpful than "Invalid email."
Use established validation libraries: Zod, Yup, and Joi have been battle-tested. They handle edge cases you might not consider.
Implement both syntactic and semantic validation: Check format first, then check business logic. An end date must be a valid date (syntactic) that comes after the start date (semantic).
Log validation failures: Track what validation fails and how often. Patterns in failures can reveal UX problems or attack attempts.
Handle international formats: Phone numbers, addresses, names, and dates vary by country. Account for Unicode characters and international conventions.
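Two of these practices, clear error messages and logging failures, can be sketched together. The FieldError shape and the in-memory tally are assumptions; a real system would send the counts to its logging or metrics service.

```typescript
interface FieldError {
  field: string;
  message: string; // says what is wrong and how to fix it
}

function validateSignup(input: { email: string; password: string }): FieldError[] {
  const errors: FieldError[] = [];
  if (input.email.indexOf("@") === -1) {
    errors.push({ field: "email", message: "Email must include an @ symbol." });
  }
  if (input.password.length < 8) {
    errors.push({ field: "password", message: "Password must be at least 8 characters long." });
  }
  return errors;
}

// Tally failures per field so recurring problems stand out.
const failureCounts: Record<string, number> = {};

function recordFailures(errors: FieldError[]): void {
  for (const error of errors) {
    failureCounts[error.field] = (failureCounts[error.field] || 0) + 1;
  }
}
```

A field that fails far more often than the others usually points to a confusing form. A sudden spike across many fields is more likely to be automated probing.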

Common mistakes

Using only client-side validation

The most dangerous mistake is trusting client-side validation for security. Any validation in the browser can be bypassed. Always validate on the server, even if you also validate on the client.

Relying on denylist validation for security

Trying to block known dangerous patterns leaves you vulnerable to patterns you did not consider. Use allowlist validation that specifies exactly what is permitted.

Skipping validation on trusted data

Internal APIs, background jobs, and database triggers all handle data that should be validated. Trust boundaries exist everywhere in your system, not only at the user interface.

File upload validation

File uploads require special attention. OWASP recommends:

Validate the filename uses an expected extension
Ensure the file is not larger than a defined maximum size
For ZIP files, validate contents before unzipping
Use a new filename for storage (not the user-provided filename)
Analyze file content for malicious payloads

Never trust the file extension alone. Attackers can upload executable code with innocent-looking extensions.
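Three of the checks above (extension, size, and storage filename) can be sketched as one function. The allowed extensions, the 5 MB limit, and the idea of passing in a server-generated identifier are assumptions for illustration. Content analysis and ZIP inspection need dedicated tooling and are not covered here.

```typescript
interface UploadedFile {
  originalName: string; // user-provided, never used for storage
  sizeBytes: number;
}

interface UploadCheck {
  ok: boolean;
  errors: string[];
  storageName: string | null;
}

const ALLOWED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".pdf"];
const MAX_SIZE_BYTES = 5 * 1024 * 1024;

// newId is assumed to be a server-generated identifier such as a UUID.
function checkUpload(file: UploadedFile, newId: string): UploadCheck {
  const errors: string[] = [];
  const dot = file.originalName.lastIndexOf(".");
  const extension = dot === -1 ? "" : file.originalName.slice(dot).toLowerCase();
  if (ALLOWED_EXTENSIONS.indexOf(extension) === -1) {
    errors.push("File type is not allowed. Upload a PNG, JPG, or PDF.");
  }
  if (file.sizeBytes <= 0 || file.sizeBytes > MAX_SIZE_BYTES) {
    errors.push("File must be between 1 byte and 5 MB.");
  }
  const ok = errors.length === 0;
  // Only the allowlisted extension survives from the original name.
  return { ok, errors, storageName: ok ? newId + extension : null };
}
```

Because the stored name is built from the generated identifier, a hostile original name such as ../../etc/passwd.png never touches the filesystem path.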

How ImportCSV handles data validation

When importing CSV data, validation becomes critical. A single malformed row can corrupt an entire import, and manual validation of thousands of rows is impractical.

ImportCSV provides built-in validation during the import process:

Automatic format detection for dates, emails, and phone numbers
Real-time validation feedback as users map columns
Custom validation rules for specific business requirements
Clear error displays before import completes

import { ImportCSV } from '@importcsv/react';

function DataImporter() {
  return (
    <ImportCSV
      onComplete={(data) => {
        // Data is already validated based on column mappings
        console.log('Validated data:', data);
      }}
      columns={[
        {
          name: 'email',
          type: 'email',  // Built-in email validation
          required: true
        },
        {
          name: 'birthDate',
          type: 'date',   // Built-in date validation
          required: false
        },
        {
          name: 'phone',
          type: 'phone',  // Built-in phone validation
          required: false
        },
      ]}
    />
  );
}

Validation errors are displayed in context, so users can fix issues before completing the import rather than discovering problems after the fact.

Conclusion

Data validation protects your application, your data, and your users. The principles are straightforward:

Validate on both client and server
Use allowlist validation for security
Validate early in the data flow
Use established libraries that handle edge cases
Provide clear error messages that help users fix problems

The cost of skipping validation can be severe, from corrupted databases to security breaches to $125 million spacecraft losses. Implementing validation thoughtfully from the start is far easier than fixing problems later.