Blog
January 16, 2026

How to Build a CSV Upload API (Node.js & Express)

Build a production-ready CSV upload API with Express.js. Complete tutorial with file handling, parsing, validation, and database integration.

10 min read

Build a CSV Upload API with Node.js and Express

A CSV upload endpoint sounds simple until you're debugging why row 47,832 silently failed at 3am. The file parsed fine locally. The validation passed. But somewhere between multipart form data and your database, things went wrong.

This tutorial shows you how to build a CSV upload API that actually handles edge cases. We'll cover file uploads with Multer, streaming CSV parsing with PapaParse, row-level validation, and batch database inserts. By the end, you'll have a production-ready endpoint—not a demo that breaks on the first real file.

What We're Building

A REST API endpoint that:

  • Accepts CSV files via multipart form-data
  • Validates file type and size before processing
  • Parses CSV data with proper error handling
  • Validates each row against a schema
  • Inserts records to PostgreSQL in batches
  • Returns detailed success/error responses

The full working code is at the end. But understanding why each piece exists matters more than copying it.

Prerequisites

You'll need Node.js 18+ and a PostgreSQL database. We'll use these packages:

npm install express multer papaparse pg zod
npm install -D @types/node @types/express @types/multer typescript

Why these choices:

  • Multer handles multipart uploads without reinventing file handling
  • PapaParse streams CSV parsing (memory-efficient for large files)
  • Zod validates rows with TypeScript inference
  • pg for PostgreSQL (swap for your database)

Step 1: Basic Express Setup with File Upload

Start with a minimal server that accepts file uploads:

// server.ts
import express from 'express';
import multer from 'multer';

const app = express();

// Configure multer for memory storage
const upload = multer({
  storage: multer.memoryStorage(),
  limits: {
    fileSize: 10 * 1024 * 1024, // 10MB limit
  },
  fileFilter: (req, file, cb) => {
    if (file.mimetype !== 'text/csv' && !file.originalname.endsWith('.csv')) {
      cb(new Error('Only CSV files allowed'));
      return;
    }
    cb(null, true);
  },
});

app.post('/api/upload', upload.single('file'), (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file uploaded' });
  }

  // File buffer is in req.file.buffer
  res.json({
    message: 'File received',
    size: req.file.size,
    originalName: req.file.originalname
  });
});

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

The memoryStorage option keeps the file in memory as a Buffer. For files up to a few tens of megabytes (raise the fileSize limit above accordingly), this is fine. For larger files, you'd use disk storage and stream from there—but that's a different tutorial.

Test it:

curl -X POST http://localhost:3000/api/upload \
  -F "file=@./users.csv"
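
If you don't have a users.csv handy, a minimal test file looks like this (contents assumed, matching the schema used in later steps):

```csv
email,name,role
ada@example.com,Ada Lovelace,admin
grace@example.com,Grace Hopper,user
```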

Step 2: Parse the CSV Data

Now we parse the uploaded buffer. PapaParse handles the CSV quirks you don't want to think about—quoted fields, escaped commas, different line endings.

import Papa from 'papaparse';

interface ParseResult<T> {
  data: T[];
  errors: Array<{
    row: number;
    message: string;
  }>;
}

function parseCSV<T>(buffer: Buffer): Promise<ParseResult<T>> {
  // Papa.parse on a string is synchronous; the Promise wrapper keeps the
  // signature stable if you later switch to stream-based parsing.
  return new Promise((resolve) => {
    const csvString = buffer.toString('utf-8');

    const result = Papa.parse(csvString, {
      header: true,           // First row is headers
      skipEmptyLines: true,   // Ignore blank rows
      transformHeader: (h) => h.trim().toLowerCase().replace(/\s+/g, '_'),
    });

    const errors = result.errors.map((e) => ({
      row: e.row ?? 0,
      message: e.message,
    }));

    resolve({
      data: result.data as T[],
      errors,
    });
  });
}

That transformHeader function normalizes headers. "First Name" becomes "first_name". "EMAIL" becomes "email". Saves headaches later.
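
As a quick sanity check, the same transform can be run standalone (a sketch, not part of the server code):

```typescript
// Same normalization rule as the transformHeader option above.
const normalizeHeader = (h: string): string =>
  h.trim().toLowerCase().replace(/\s+/g, '_');

console.log(normalizeHeader('First Name'));  // first_name
console.log(normalizeHeader('  EMAIL '));    // email
console.log(normalizeHeader('Created  At')); // created_at
```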

Step 3: Validate Each Row

Parsing succeeded. But is the data actually valid? Here's where most CSV importers fail—they assume parsed means correct.

Use Zod to define what a valid row looks like:

import { z } from 'zod';

// Define your row schema
const UserRowSchema = z.object({
  email: z.string().email('Invalid email format'),
  name: z.string().min(1, 'Name is required'),
  role: z.enum(['admin', 'user', 'viewer']).optional().default('user'),
  created_at: z.string().optional(),
});

type UserRow = z.infer<typeof UserRowSchema>;

interface ValidationResult {
  valid: UserRow[];
  invalid: Array<{
    row: number;
    data: Record<string, unknown>;
    errors: string[];
  }>;
}

function validateRows(data: Record<string, unknown>[]): ValidationResult {
  const valid: UserRow[] = [];
  const invalid: ValidationResult['invalid'] = [];

  data.forEach((row, index) => {
    const result = UserRowSchema.safeParse(row);

    if (result.success) {
      valid.push(result.data);
    } else {
      invalid.push({
        row: index + 2, // +2 for header row and 0-indexing
        data: row,
        errors: result.error.errors.map((e) => `${e.path.join('.')}: ${e.message}`),
      });
    }
  });

  return { valid, invalid };
}

This approach gives you:

  • Type-safe validated data
  • Row-level error messages
  • The actual row number (accounting for the header)

The user sees "Row 15: email - Invalid email format" instead of a generic failure.

Step 4: Batch Database Inserts

Inserting one row at a time is slow. Really slow. With 10,000 rows, you're making 10,000 round trips to the database.

Batch inserts fix this:

import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

async function batchInsert(users: UserRow[], batchSize = 500): Promise<number> {
  let inserted = 0;

  for (let i = 0; i < users.length; i += batchSize) {
    const batch = users.slice(i, i + batchSize);

    // Build parameterized query
    const values: unknown[] = [];
    const placeholders = batch.map((user, index) => {
      const offset = index * 3;
      values.push(user.email, user.name, user.role);
      return `($${offset + 1}, $${offset + 2}, $${offset + 3})`;
    });

    const query = `
      INSERT INTO users (email, name, role)
      VALUES ${placeholders.join(', ')}
      ON CONFLICT (email) DO UPDATE SET
        name = EXCLUDED.name,
        role = EXCLUDED.role
    `;

    await pool.query(query, values);
    inserted += batch.length;
  }

  return inserted;
}

The ON CONFLICT clause handles duplicates gracefully—update instead of fail. Adjust based on your needs.
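
Note that ON CONFLICT (email) only works if the email column has a unique constraint. A table definition along these lines (assumed here; adjust to your real schema) supports it:

```sql
CREATE TABLE IF NOT EXISTS users (
  id    SERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,  -- UNIQUE is what ON CONFLICT (email) relies on
  name  TEXT NOT NULL,
  role  TEXT NOT NULL DEFAULT 'user'
);
```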

Performance note: 500 rows per batch is a reasonable default. We've tested this across a few projects and it balances memory usage with query efficiency. Your mileage may vary depending on row size and database config.
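
The offset arithmetic is the easiest part to get wrong, so here it is extracted as a pure function (a sketch; batchInsert above inlines the same logic):

```typescript
// Builds the "($1, $2, $3), ($4, $5, $6), ..." fragment for a multi-row
// INSERT, numbering parameters continuously across rows.
function buildPlaceholders(rowCount: number, colsPerRow: number): string {
  const rows: string[] = [];
  for (let r = 0; r < rowCount; r++) {
    const offset = r * colsPerRow;
    const cols = Array.from({ length: colsPerRow }, (_, c) => `$${offset + c + 1}`);
    rows.push(`(${cols.join(', ')})`);
  }
  return rows.join(', ');
}

console.log(buildPlaceholders(2, 3)); // ($1, $2, $3), ($4, $5, $6)
```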

Step 5: Put It All Together

Here's the complete endpoint:

import express, { Request, Response } from 'express';
import multer from 'multer';
import Papa from 'papaparse';
import { z } from 'zod';
import { Pool } from 'pg';

const app = express();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 },
  fileFilter: (req, file, cb) => {
    if (file.mimetype !== 'text/csv' && !file.originalname.endsWith('.csv')) {
      cb(new Error('Only CSV files allowed'));
      return;
    }
    cb(null, true);
  },
});

const UserRowSchema = z.object({
  email: z.string().email(),
  name: z.string().min(1),
  role: z.enum(['admin', 'user', 'viewer']).default('user'),
});

type UserRow = z.infer<typeof UserRowSchema>;

app.post('/api/upload', upload.single('file'), async (req: Request, res: Response) => {
  try {
    if (!req.file) {
      return res.status(400).json({ error: 'No file uploaded' });
    }

    // Parse CSV
    const csvString = req.file.buffer.toString('utf-8');
    const parseResult = Papa.parse(csvString, {
      header: true,
      skipEmptyLines: true,
      transformHeader: (h) => h.trim().toLowerCase().replace(/\s+/g, '_'),
    });

    if (parseResult.errors.length > 0) {
      return res.status(400).json({
        error: 'CSV parsing failed',
        details: parseResult.errors.slice(0, 10), // First 10 errors
      });
    }

    // Validate rows
    const valid: UserRow[] = [];
    const invalid: Array<{ row: number; errors: string[] }> = [];

    parseResult.data.forEach((row: unknown, index: number) => {
      const result = UserRowSchema.safeParse(row);
      if (result.success) {
        valid.push(result.data);
      } else {
        invalid.push({
          row: index + 2,
          errors: result.error.errors.map((e) => `${e.path.join('.')}: ${e.message}`),
        });
      }
    });

    // Reject if too many errors
    if (invalid.length > parseResult.data.length * 0.1) {
      return res.status(400).json({
        error: 'Too many validation errors (>10%)',
        totalRows: parseResult.data.length,
        invalidCount: invalid.length,
        sampleErrors: invalid.slice(0, 20),
      });
    }

    // Batch insert valid rows
    let inserted = 0;
    const batchSize = 500;

    for (let i = 0; i < valid.length; i += batchSize) {
      const batch = valid.slice(i, i + batchSize);
      const values: unknown[] = [];
      const placeholders = batch.map((user, idx) => {
        const offset = idx * 3;
        values.push(user.email, user.name, user.role);
        return `($${offset + 1}, $${offset + 2}, $${offset + 3})`;
      });

      await pool.query(
        `INSERT INTO users (email, name, role)
         VALUES ${placeholders.join(', ')}
         ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name, role = EXCLUDED.role`,
        values
      );
      inserted += batch.length;
    }

    res.json({
      success: true,
      imported: inserted,
      skipped: invalid.length,
      errors: invalid.slice(0, 20), // First 20 errors for debugging
    });
  } catch (error) {
    console.error('Upload failed:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
});

app.listen(3000, () => console.log('Server on port 3000'));
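
A successful upload returns a body shaped like the object built at the end of the handler (the numbers here are illustrative):

```json
{
  "success": true,
  "imported": 4980,
  "skipped": 20,
  "errors": [
    { "row": 15, "errors": ["email: Invalid email"] }
  ]
}
```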

Handling Common Edge Cases

After building a few of these, here's what breaks in production:

Wrong Encoding

Files from Excel sometimes use Windows-1252 encoding. UTF-8 parsing mangles special characters.

import iconv from 'iconv-lite';

function detectAndConvert(buffer: Buffer): string {
  // Strip the UTF-8 BOM if present; left in place, it ends up glued to the
  // first header ("\ufeffemail" instead of "email").
  if (buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf) {
    return buffer.subarray(3).toString('utf-8');
  }

  // Node's toString never throws on invalid UTF-8—it substitutes U+FFFD.
  // Treat any replacement character as a sign the file isn't UTF-8.
  const utf8 = buffer.toString('utf-8');
  if (!utf8.includes('\ufffd')) return utf8;

  return iconv.decode(buffer, 'win1252');
}

Empty Rows in the Middle

PapaParse's skipEmptyLines handles trailing empty rows. But what about rows that look empty but contain whitespace? Add a filter:

const data = parseResult.data.filter((row: Record<string, string>) =>
  Object.values(row).some((v) => v && v.trim())
);
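
The same filter as a standalone helper, if you prefer to test it in isolation:

```typescript
// Keep only rows where at least one cell contains non-whitespace content.
function dropBlankRows(rows: Record<string, string>[]): Record<string, string>[] {
  return rows.filter((row) => Object.values(row).some((v) => v && v.trim()));
}

console.log(dropBlankRows([{ a: ' ', b: '' }, { a: 'x', b: '' }]).length); // 1
```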

Massive Files

For files over 50MB, streaming is better than loading everything into memory:

import fs from 'fs';

// Use disk storage for large files
const upload = multer({
  dest: '/tmp/uploads/',
  limits: { fileSize: 100 * 1024 * 1024 }, // 100MB
});

// Then stream-parse from disk
const stream = fs.createReadStream(req.file.path);
Papa.parse(stream, {
  header: true,
  step: (row) => {
    // Process one row at a time—only the current row is in memory
  },
  complete: () => {
    fs.unlinkSync(req.file.path); // Clean up the temp file
  },
  error: () => {
    fs.unlinkSync(req.file.path); // Clean up on parse failure too
  },
});

Production Checklist

Before deploying, add these:

Rate limiting - Don't let one user hammer your import endpoint:

import rateLimit from 'express-rate-limit';

const uploadLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 10, // 10 uploads per window
});

app.post('/api/upload', uploadLimiter, upload.single('file'), ...);

Authentication - Obviously. Don't skip this.

Logging - Track who uploaded what, when, and how it went:

console.log({
  event: 'csv_upload',
  userId: req.user.id,
  filename: req.file.originalname,
  rows: parseResult.data.length,
  imported: inserted,
  errors: invalid.length,
});

Cleanup - If using disk storage, delete files after processing.

When to Skip Building This

Building your own CSV upload API makes sense when:

  • You need deep integration with your existing backend
  • The CSV structure is simple and stable
  • You have engineering time to maintain it

It doesn't make sense when:

  • Users need to map columns (their "Email" to your "email_address")
  • Multiple CSV formats hit the same endpoint
  • Data validation is complex or frequently changes
  • You'd rather ship features than debug import edge cases

If you're in that second group, ImportCSV handles the messy parts—column mapping UI, validation rules, error handling—so you just receive clean, validated data via webhook.

FAQ

How large of a CSV file can this handle?

With memory storage, you're limited by Node.js heap size—practically around 100-200MB. For larger files, use disk storage with streaming parsing. We've processed 500MB files this way, though it takes a few minutes.

Should I validate on the client or server?

Both. Client-side validation (file type, size) gives instant feedback. Server-side validation (row schema) is mandatory for security. Never trust client-side validation alone.
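
A client-side pre-check might look like this sketch (the PickedFile shape and the helper name are assumptions; the limits mirror the server's 10MB/.csv rules):

```typescript
// Shape of what a file picker gives you (a browser File has these fields too).
interface PickedFile {
  name: string;
  size: number;
}

// Returns an error message, or null if the file looks uploadable.
// The server still re-validates everything—this only improves feedback.
function preflightCheck(file: PickedFile): string | null {
  if (!file.name.toLowerCase().endsWith('.csv')) return 'Only CSV files allowed';
  if (file.size > 10 * 1024 * 1024) return 'File exceeds the 10MB limit';
  return null;
}

console.log(preflightCheck({ name: 'users.csv', size: 1024 })); // null
console.log(preflightCheck({ name: 'users.txt', size: 1024 })); // Only CSV files allowed
```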

Why batch inserts instead of bulk copy?

PostgreSQL's COPY command is faster for massive datasets. But it requires direct file system access or special client libraries. Batch inserts work with any PostgreSQL client and give you per-row error handling. For imports under 100,000 rows, the performance difference rarely matters.

How do I handle CSVs with different column names?

Either require exact column names (simplest) or build a column mapping layer. The mapping approach needs a UI for users to match their columns to your schema. That's non-trivial to build well—it's one of the things ImportCSV handles for you.
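
The simplest version of that mapping layer is a static lookup table (all header names here are assumptions for illustration; a real product would populate the table from a mapping UI):

```typescript
// Hypothetical mapping from headers users actually send to canonical fields.
const columnMap: Record<string, string> = {
  'e-mail': 'email',
  'email address': 'email',
  'full name': 'name',
};

// Rename known columns; pass unknown ones through untouched.
function remapRow(row: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(row)) {
    out[columnMap[key.trim().toLowerCase()] ?? key] = value;
  }
  return out;
}

console.log(remapRow({ 'Email Address': 'a@b.com', 'Full Name': 'Ada' }));
// { email: 'a@b.com', name: 'Ada' }
```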


Building a CSV upload API isn't complicated. Building one that handles real-world CSVs—with weird encodings, inconsistent formats, and users who don't read instructions—takes more thought. This tutorial gives you the foundation. The edge cases will teach you the rest.

Wrap-up

CSV imports shouldn't slow you down. ImportCSV fits into your workflow — whether you're building data import flows, handling customer uploads, or processing large datasets.

If that sounds like the kind of tooling you want to use, try ImportCSV.