Data validation best practices for web applications

Malformed data costs organizations millions. In 1999, NASA lost the $125 million Mars Climate Orbiter because one team used imperial units while another used metric. In 2018, Samsung Securities suffered a $300 million loss when an employee entered 1,000 shares instead of 1,000 won per share for a dividend payment. Both failures trace back to the same root cause: inadequate data validation.
For web developers, data validation is one of the most important skills to master. It protects your application from security vulnerabilities, prevents corrupted data from reaching your database, and gives users clear feedback when something goes wrong.
What is data validation?
Data validation is the process of ensuring that only properly formed data enters your application. According to OWASP, "Input validation is performed to ensure only properly formed data is entering the workflow in an information system, preventing malformed data from persisting in the database and triggering malfunction of various downstream components."
In simpler terms, data validation checks that the data your application receives meets your expectations before you do anything with it.
Types of data validation
Understanding the different types of validation helps you build comprehensive validation logic.
1. Data type validation
Verifies data is the correct type: string, number, boolean, or date. A quantity field should contain a number, not text.
2. Format validation
Ensures data matches the expected format. Dates should follow patterns like YYYY-MM-DD. Phone numbers should match international standards. Postal codes should match country-specific formats.
3. Range validation
Checks that numerical values fall within acceptable bounds. An age field might accept values from 0 to 150. Latitude values must be between -90 and 90.
4. Length validation
Ensures string length meets requirements. Passwords might require a minimum of 8 characters. Usernames might have a maximum of 30 characters.
5. Presence validation
Confirms required fields are not empty or null. An email field marked as required must contain a value before form submission.
6. Uniqueness validation
Ensures values like IDs, emails, and usernames are unique in the system. Prevents duplicate accounts and maintains data integrity.
7. Consistency validation
Cross-field validation that checks relationships between fields. An end date should come after a start date. A confirm password field should match the password field.
8. Pattern validation
Validates data against regular expressions. Email addresses, phone numbers, and URLs all have expected patterns that regex can verify.
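Several of these checks usually run together on a single payload. The following is a minimal sketch, assuming a hypothetical order-line payload with `sku` and `quantity` fields. The field names and the limits of 30 characters and 1 to 1000 units are illustrative assumptions, not rules from any standard.

```typescript
// Hypothetical order-line payload; field names and limits are illustrative only.
interface OrderLine {
  sku: string;
  quantity: number;
}

type OrderLineResult =
  | { ok: true; value: OrderLine }
  | { ok: false; errors: string[] };

function validateOrderLine(input: unknown): OrderLineResult {
  // The payload itself must be a non-null object.
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["Payload must be an object."] };
  }
  const fields = input as Record<string, unknown>;
  const sku = fields.sku;
  const quantity = fields.quantity;
  const errors: string[] = [];

  // Presence, data type, and length validation for the SKU.
  if (typeof sku !== "string" || sku.length === 0) {
    errors.push("sku is required and must be a string.");
  } else if (sku.length > 30) {
    errors.push("sku must be at most 30 characters.");
  }

  // Data type and range validation for the quantity.
  if (typeof quantity !== "number" || !Number.isInteger(quantity)) {
    errors.push("quantity must be a whole number.");
  } else if (quantity < 1 || quantity > 1000) {
    errors.push("quantity must be between 1 and 1000.");
  }

  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, value: { sku: sku as string, quantity: quantity as number } };
}
```

Accepting `unknown` rather than a typed object forces every field to be checked before it is used, which matters when the payload comes from outside the application.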
Client-side vs server-side validation
One of the most common questions developers ask is where to implement validation. The answer from OWASP is clear: you need both.
Client-side validation
Client-side validation runs in the browser and provides immediate feedback. Users see errors as they type, which improves the experience and reduces unnecessary server requests.
However, client-side validation has a critical limitation. OWASP states: "Input validation must be implemented on the server-side before any data is processed by an application's functions, as any JavaScript-based input validation performed on the client-side can be circumvented by an attacker who disables JavaScript or uses a web proxy."
Server-side validation
Server-side validation is your security boundary. Never trust data from the client, even if client-side validation passed. An attacker can bypass your frontend entirely by sending requests directly to your API.
The recommendation
OWASP recommends implementing both: client-side for user experience, server-side for security. Think of client-side validation as a convenience feature and server-side validation as your security requirement.
Allowlist vs denylist validation
A fundamental security decision is whether to use allowlist or denylist validation.
The denylist approach (avoid this)
Denylist validation tries to block known dangerous inputs. You might block the apostrophe character to prevent SQL injection, or block <script> tags to prevent XSS.
OWASP explicitly warns against this: "It is a common mistake to use denylist validation in order to try to detect possibly dangerous characters and patterns like the apostrophe ' character, the string 1=1, or the <script> tag, but this is a massively flawed approach as it is trivial for an attacker to bypass such filters."
The allowlist approach (use this)
Allowlist validation defines exactly what is allowed. OWASP recommends this approach: "Allowlist validation involves defining exactly what IS authorized, and by definition, everything else is not authorized."
For example, if a username should only contain letters, numbers, and underscores, validate that it contains only those characters rather than trying to block dangerous characters.
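The username rule described above could be sketched as a single allowlist pattern. The length bounds of 3 to 30 characters are an assumption added for illustration, and the function name is hypothetical.

```typescript
// Allowlist: only ASCII letters, digits, and underscores are permitted.
// The 3 to 30 length bounds are an illustrative assumption.
const USERNAME_ALLOWLIST = /^[A-Za-z0-9_]{3,30}$/;

function isAllowedUsername(username: string): boolean {
  // Anything not explicitly allowed fails, including characters
  // nobody thought to put on a denylist.
  return USERNAME_ALLOWLIST.test(username);
}
```

Inputs such as `<script>` or `admin'--` are rejected without the pattern having to name any dangerous character, which is the point of the allowlist approach.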
Syntactic vs semantic validation
OWASP distinguishes between two levels of validation:
Syntactic validation
Enforces correct syntax of structured fields. An email address should contain an @ symbol and a domain. A date should follow the YYYY-MM-DD format. A currency value should match expected decimal patterns.
Semantic validation
Enforces correctness of values in specific business contexts. A start date should come before an end date. An order quantity should not exceed available inventory. A shipping country should be valid for the selected shipping method.
Both levels matter. Syntactic validation catches malformed input. Semantic validation catches input that is well-formed but invalid for your business logic.
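As a rough illustration of the two levels working in sequence, the sketch below checks a hypothetical booking with `startDate` and `endDate` fields. The syntactic step only checks the YYYY-MM-DD shape here; it does not confirm that the day exists on the calendar.

```typescript
// Hypothetical booking payload used only for illustration.
interface Booking {
  startDate: string;
  endDate: string;
}

const ISO_DATE_SHAPE = /^\d{4}-\d{2}-\d{2}$/;

function validateBooking(booking: Booking): string[] {
  const errors: string[] = [];

  // Syntactic: each field must have the YYYY-MM-DD shape.
  if (!ISO_DATE_SHAPE.test(booking.startDate)) {
    errors.push("startDate must use the YYYY-MM-DD format.");
  }
  if (!ISO_DATE_SHAPE.test(booking.endDate)) {
    errors.push("endDate must use the YYYY-MM-DD format.");
  }
  // Semantic checks are meaningless on malformed input, so stop here.
  if (errors.length > 0) {
    return errors;
  }

  // Semantic: zero-padded ISO dates sort chronologically as plain strings,
  // so a string comparison is enough to check the ordering.
  if (booking.startDate >= booking.endDate) {
    errors.push("endDate must come after startDate.");
  }
  return errors;
}
```

Running the syntactic check first keeps the semantic rule simple, because it can assume both values are already in a comparable form.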
Common validation patterns
Here are practical patterns for validating common data types.
Email validation
A practical pattern for most applications:
^[^\s@]+@[^\s@]+\.[^\s@]+$
Per RFC 5321 and OWASP guidelines, thorough email validation should check:
- Contains two parts separated by @ symbol
- No dangerous characters (backticks, quotes, null bytes)
- Domain part contains only letters, numbers, hyphens, and periods
- Local part max 63 characters (OWASP's limit; RFC 5321 itself allows up to 64 octets)
- Total length max 254 characters
OWASP recommends performing basic validation first, then passing the address to the mail server and handling the error if the server rejects it. The mail server is the ultimate authority on whether an address is valid.
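The checklist above can be sketched as one function that returns an error message or `null`. The function name and messages are hypothetical, and the character rules follow the list above, which is stricter than what the RFC permits (an apostrophe in the local part is rejected, for example).

```typescript
const BASIC_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
// Domain allowlist: letters, digits, hyphens, and periods only.
const DOMAIN_ALLOWLIST = /^[A-Za-z0-9.-]+$/;
// Backtick (\u0060), single quote, double quote, and null byte (\u0000).
const DANGEROUS_CHARS = /[\u0060'"\u0000]/;

// Returns null when the address passes, otherwise a message for the user.
function checkEmail(address: string): string | null {
  if (address.length > 254) {
    return "Email must be at most 254 characters.";
  }
  if (!BASIC_EMAIL.test(address)) {
    return "Email must look like name@example.com.";
  }
  // BASIC_EMAIL guarantees exactly one @, so the split is unambiguous.
  const atIndex = address.indexOf("@");
  const localPart = address.slice(0, atIndex);
  const domainPart = address.slice(atIndex + 1);

  if (localPart.length > 63) {
    return "The part before @ must be at most 63 characters.";
  }
  if (DANGEROUS_CHARS.test(localPart)) {
    return "Email must not contain quotes, backticks, or null bytes.";
  }
  if (!DOMAIN_ALLOWLIST.test(domainPart)) {
    return "The domain may contain only letters, numbers, hyphens, and periods.";
  }
  return null;
}
```

A `null` result only means the address is plausibly formed. Whether it can receive mail is still decided by the mail server.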
Phone number validation
For international phone numbers, use E.164 format:
^\+?[1-9]\d{1,14}$
This allows an optional plus sign followed by 2 to 15 digits, the first of which must be non-zero.
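A small sketch of how the pattern might be applied, assuming user input is first stripped of common separators. The helper names are hypothetical, and the normalization step is an added convenience rather than part of E.164 itself.

```typescript
// Optional +, one non-zero digit, then 1 to 14 more digits.
const E164_PATTERN = /^\+?[1-9]\d{1,14}$/;

// Removes spaces, parentheses, periods, and hyphens that people
// commonly type inside phone numbers.
function normalizePhone(raw: string): string {
  return raw.replace(/[\s().-]/g, "");
}

function isE164(phone: string): boolean {
  return E164_PATTERN.test(phone);
}
```

For example, `normalizePhone("+1 (415) 555-2671")` yields `"+14155552671"`, which `isE164` accepts. The pattern checks shape only, so a number can pass and still not be in service.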
Date validation
ISO 8601 format (YYYY-MM-DD):
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Note: This validates format but not actual date validity. February 30 would pass the regex but is not a real date. Use a date library for full validation.
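For cases where pulling in a date library is not an option, the calendar check can be sketched with the built-in `Date` object. This is one possible approach rather than the library-based one suggested above, and the function name is hypothetical.

```typescript
const ISO_DATE_PARTS = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

function isRealIsoDate(value: string): boolean {
  const match = ISO_DATE_PARTS.exec(value);
  if (match === null) {
    return false;
  }
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);

  // setUTCFullYear rolls impossible days into the following month
  // (February 30 becomes March 1), so a round trip that changes any
  // component means the date does not exist.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}
```

Working in UTC avoids results that depend on the server's local time zone.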
URL validation
Basic pattern:
^https?:\/\/[^\s/$.?#].[^\s]*$
For production use, consider using the URL constructor and catching exceptions.
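A minimal sketch of the constructor-based approach, assuming only `http` and `https` links should be accepted. The function name is hypothetical.

```typescript
// Returns the parsed URL, or null if the input is not a valid http(s) URL.
function parseHttpUrl(value: string): URL | null {
  try {
    const url = new URL(value);
    // The constructor also accepts schemes such as javascript: and ftp:,
    // so the protocol is checked against an allowlist.
    if (url.protocol === "http:" || url.protocol === "https:") {
      return url;
    }
    return null;
  } catch {
    // new URL() throws a TypeError on input it cannot parse.
    return null;
  }
}
```

Returning the parsed `URL` object rather than a boolean lets the caller inspect `hostname` or `pathname` without parsing the string a second time.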
JavaScript validation libraries
Modern JavaScript/TypeScript projects benefit from validation libraries that handle common patterns and provide type safety.
Zod (TypeScript-first)
Zod has become one of the most popular validation libraries for TypeScript projects. It provides native type inference from schemas.
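    import { z } from 'zod';

    // Each field declares its type plus length or format rules.
    const userSchema = z.object({
      username: z.string().min(3).max(30),
      email: z.string().email(),
      password: z.string().min(8),
    });

    // Sample input; in a real app this would come from a form or request body.
    const userData: unknown = { username: 'ada', email: 'ada@example.com', password: 'long-enough-secret' };

    try {
      // parse() returns the typed data or throws a ZodError.
      const validData = userSchema.parse(userData);
      console.log('Valid data:', validData);
    } catch (e) {
      if (e instanceof z.ZodError) {
        // issues lists every failed rule with its path and message.
        console.error('Validation error:', e.issues);
      } else {
        throw e;
      }
    }
Best for: TypeScript projects where you want type inference from your validation schemas.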
Yup (React forms)
Yup integrates well with React form libraries like Formik and React Hook Form.
    import * as Yup from 'yup';

    // Fields are optional in Yup unless marked with required().
    const schema = Yup.object().shape({
      username: Yup.string().min(3).max(30).required(),
      email: Yup.string().email().required(),
      password: Yup.string().min(8).required(),
    });
Best for: React applications using Formik or similar form libraries.
Joi (Server-side)
Joi excels at server-side validation with complex schema definitions.
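    import Joi from 'joi';

    const schema = Joi.object({
      // alphanum() acts as an allowlist: letters and digits only.
      username: Joi.string().alphanum().min(3).max(30).required(),
      // Require at least two domain segments, such as example.com.
      email: Joi.string().email({ minDomainSegments: 2 }).required(),
      // Enforce a minimum length instead of restricting which characters a password may use.
      password: Joi.string().min(8).required(),
    });
Best for: Node.js server-side validation with complex business rules.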
Library comparison
| Library | Best For | Key Feature | TypeScript Support |
|---|---|---|---|
| Zod | TypeScript projects | Type inference from schemas | Native |
| Yup | React forms (with Formik/RHF) | Chainable API | Good |
| Joi | Node.js/server-side | Complex schemas | Via types |
| Validator.js | Simple string validation | Lightweight | Via types |
| class-validator | NestJS projects | Decorator-based | Native |
Best practices
Here are the practices that prevent validation failures:
- Validate on both client and server: Client-side for UX, server-side for security. Never rely on client-side validation alone.
- Use allowlist over denylist: Define what is allowed rather than trying to block what is dangerous. Attackers will find ways around denylists.
- Validate early in the data flow: OWASP advises, "Input validation should happen as early as possible in the data flow, preferably as soon as the data is received from the external party."
- Provide clear error messages: Tell users exactly what is wrong and how to fix it. "Email must include @ symbol" is more helpful than "Invalid email."
- Use established validation libraries: Zod, Yup, and Joi have been battle-tested. They handle edge cases you might not consider.
- Implement both syntactic and semantic validation: Check format first, then check business logic. An end date must be a valid date (syntactic) that comes after the start date (semantic).
- Log validation failures: Track what validation fails and how often. Patterns in failures can reveal UX problems or attack attempts.
- Handle international formats: Phone numbers, addresses, names, and dates vary by country. Account for Unicode characters and international conventions.
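The logging practice in the list above can be as simple as counting failures per field and rule. The class below is a hypothetical in-memory sketch; a real system would more likely send these counts to an existing logging or metrics service.

```typescript
// Counts validation failures by field and rule. It deliberately records
// rule names only, never the submitted values, so passwords and personal
// data stay out of the log.
class ValidationFailureLog {
  private readonly counts = new Map<string, number>();

  record(field: string, rule: string): void {
    const key = `${field}:${rule}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Returns the most frequent failures first, as [key, count] pairs.
  mostFrequent(limit: number): Array<[string, number]> {
    return Array.from(this.counts.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit);
  }
}
```

A rule that fails far more often than the others may point to a confusing form field, while a sudden spike on one rule may point to automated probing.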
Common mistakes
Using only client-side validation
The most dangerous mistake is trusting client-side validation for security. Any validation in the browser can be bypassed. Always validate on the server, even if you also validate on the client.
Relying on denylist validation for security
Trying to block known dangerous patterns leaves you vulnerable to patterns you did not consider. Use allowlist validation that specifies exactly what is permitted.
Skipping validation on trusted data
Internal APIs, background jobs, and database triggers all handle data that should be validated. Trust boundaries exist everywhere in your system, not only at the user interface.
File upload validation
File uploads require special attention. OWASP recommends:
- Validate the filename uses an expected extension
- Ensure the file is not larger than a defined maximum size
- For ZIP files, validate contents before unzipping
- Use a new filename for storage (not the user-provided filename)
- Analyze file content for malicious payloads
Never trust the file extension alone. Attackers can upload executable code with innocent-looking extensions.
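The extension, size, and renaming checks could be sketched as follows. The allowed extensions, the 5 MiB limit, and the `UploadMeta` shape are all illustrative assumptions, and content analysis and ZIP inspection are outside the scope of this sketch.

```typescript
// Hypothetical metadata about an uploaded file.
interface UploadMeta {
  originalName: string;
  sizeBytes: number;
}

type UploadCheck =
  | { ok: true; storedName: string }
  | { ok: false; error: string };

// Illustrative limits; choose values that fit the application.
const ALLOWED_EXTENSIONS = new Set(["csv", "png", "jpg", "pdf"]);
const MAX_SIZE_BYTES = 5 * 1024 * 1024;

// storageId should be generated by the server (for example a UUID),
// never derived from user input.
function checkUpload(file: UploadMeta, storageId: string): UploadCheck {
  // Only the final extension counts, so "report.csv.exe" is treated as "exe".
  const dotIndex = file.originalName.lastIndexOf(".");
  const extension =
    dotIndex > 0 ? file.originalName.slice(dotIndex + 1).toLowerCase() : "";

  if (!ALLOWED_EXTENSIONS.has(extension)) {
    return { ok: false, error: "File type is not allowed." };
  }
  if (file.sizeBytes <= 0 || file.sizeBytes > MAX_SIZE_BYTES) {
    return { ok: false, error: "File must be between 1 byte and 5 MiB." };
  }
  // The user-supplied name is discarded; only the checked extension is kept.
  return { ok: true, storedName: `${storageId}.${extension}` };
}
```

Because the stored name is built from a server-generated ID, path tricks in the original filename such as `../../evil.csv` never reach the file system.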
How ImportCSV handles data validation
When importing CSV data, validation becomes critical. A single malformed row can corrupt an entire import, and manual validation of thousands of rows is impractical.
ImportCSV provides built-in validation during the import process:
- Automatic format detection for dates, emails, and phone numbers
- Real-time validation feedback as users map columns
- Custom validation rules for specific business requirements
- Clear error displays before import completes
    import { ImportCSV } from '@importcsv/react';

    function DataImporter() {
      return (
        <ImportCSV
          onComplete={(data) => {
            // Data is already validated based on column mappings
            console.log('Validated data:', data);
          }}
          columns={[
            {
              name: 'email',
              type: 'email', // Built-in email validation
              required: true
            },
            {
              name: 'birthDate',
              type: 'date', // Built-in date validation
              required: false
            },
            {
              name: 'phone',
              type: 'phone', // Built-in phone validation
              required: false
            },
          ]}
        />
      );
    }
Validation errors are displayed in context, so users can fix issues before completing the import rather than discovering problems after the fact.
Conclusion
Data validation protects your application, your data, and your users. The principles are straightforward:
- Validate on both client and server
- Use allowlist validation for security
- Validate early in the data flow
- Use established libraries that handle edge cases
- Provide clear error messages that help users fix problems
The cost of skipping validation can be severe, from corrupted databases to security breaches to $125 million spacecraft losses. Implementing validation thoughtfully from the start is far easier than fixing problems later.
Wrap-up
CSV imports shouldn't slow you down. ImportCSV aims to fit into your workflow, whether you're building data import flows, handling customer uploads, or processing large datasets.
If that sounds like the kind of tooling you want to use, try ImportCSV.