BigQuery Upload CSV: Console, CLI & API Methods

How to Upload CSV Files to BigQuery
Loading CSV data into BigQuery is one of those tasks you'll do constantly as a data engineer. BigQuery gives you four different ways to do it, which is great for flexibility but can make it hard to know where to start.
I'll walk through the Console GUI, the bq command-line tool, and both Python and Node.js APIs. Each has its place. The Console is fastest for one-off uploads. The CLI is my go-to for anything scripted. The APIs give you full control when you're building pipelines.
Quick Method Comparison
Here's the cheat sheet:
| Method | Best For | Max File Size | Complexity |
|---|---|---|---|
| Console (GUI) | Quick uploads, small local files | 100 MB (local), unlimited (GCS) | Easy |
| bq CLI | Scripting, automation, larger files | 5 TB per load job | Medium |
| Python API | Custom pipelines, programmatic control | 5 TB per load job | Medium |
| Node.js API | JavaScript/TypeScript pipelines | 5 TB per load job | Medium |
| Cloud Storage Staging | Large files, parallel loading | 5 TB per load job | Medium |
If your file is under 100 MB and you're just exploring, use the Console. For anything larger or automated, grab the CLI or an API. All methods support files up to 5 TB when loading from Cloud Storage (only local Console uploads have that 100 MB cap).
Method 1: Console Upload (GUI)
The Console is the quickest way to get a CSV into BigQuery. No code, no setup. Just click through a form. I use this when I'm exploring a dataset or need to test something fast.
Step-by-Step Instructions
1. Navigate to the BigQuery Console in your browser
2. In the Explorer pane, expand your project and select the target dataset
3. Click Create table in the Dataset info section
4. Configure the Source section:
   - Create table from: Select Upload for local files or Google Cloud Storage for GCS
   - Select your CSV file or enter the GCS URI (e.g., gs://mybucket/myfile.csv)
   - File format: CSV
5. Configure the Destination section:
   - Select your Project and Dataset
   - Enter a Table name
   - Table type: Native table
6. Configure the Schema section:
   - Enable Auto detect for automatic schema inference, OR
   - Manually define each column with name and type
7. Expand Advanced options to configure:
   - Write preference: Write if empty, Append to table, or Overwrite table
   - Number of errors allowed: How many bad rows to tolerate (default: 0)
   - Header rows to skip: Usually 1 if your CSV has headers
   - Field delimiter: Comma (default), Tab, Pipe, or Custom
   - Allow quoted newlines: Enable for multi-line field values
   - Allow jagged rows: Enable to allow rows with missing trailing columns
8. Click Create table to start the load job
Console Upload Limitations
The Console caps local uploads at 100 MB. That sounds small, but it's enough for most exploratory work. If you need more, you have options:
- Upload to Cloud Storage first, then load from there
- Switch to the bq CLI or API
- Split the file (annoying, but it works)
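That last option is easy to script. Here's a minimal sketch in Python (file path and chunk size are placeholders, not anything BigQuery prescribes) that repeats the header row in every chunk, so each piece loads as a standalone CSV:

```python
import csv


def split_csv(path, rows_per_chunk=100_000):
    """Split a CSV into numbered chunks, repeating the header in each chunk."""
    chunk_paths = []
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, index = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                chunk_paths.append(_write_chunk(path, index, header, chunk))
                chunk, index = [], index + 1
        if chunk:  # flush the final partial chunk
            chunk_paths.append(_write_chunk(path, index, header, chunk))
    return chunk_paths


def _write_chunk(path, index, header, rows):
    out_path = f"{path}.part{index:03d}.csv"
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```

Tune rows_per_chunk so each output file stays under the 100 MB Console limit for your row width.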
Don't use the Console for production workflows. Anything that needs to run more than once should be scripted.
Method 2: bq Command-Line Tool
The bq CLI comes with the Google Cloud SDK. I think it's the best option for most use cases. You get full control without writing actual code, and it's easy to wrap in shell scripts or CI pipelines.
Basic Syntax
The core command structure for loading CSV files is:
```shell
bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  PROJECT_ID:DATASET.TABLE \
  PATH_TO_SOURCE \
  SCHEMA
```

Load from Local File with Auto-Detection
When you just want to get data in and let BigQuery figure out the types:
```shell
bq load \
  --autodetect \
  --source_format=CSV \
  mydataset.mytable \
  ./mydata.csv
```

The --autodetect flag makes BigQuery sample your data and guess the column types. Works surprisingly well most of the time.
Load from Cloud Storage with Explicit Schema
When you need to control exactly what type each column gets (production pipelines, usually):
```shell
bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  mydataset.mytable \
  gs://mybucket/mydata.csv \
  name:STRING,email:STRING,created_at:TIMESTAMP
```

Load Using a Schema File
If you have more than a few columns, putting the schema in a JSON file is easier to maintain:
```shell
bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  mydataset.mytable \
  gs://mybucket/mydata.csv \
  ./schema.json
```

Your schema.json file should contain an array of field definitions:
```json
[
  {"name": "name", "type": "STRING", "mode": "REQUIRED"},
  {"name": "email", "type": "STRING", "mode": "NULLABLE"},
  {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
```

Loading Multiple Files with Wildcards
To load all CSV files matching a pattern:
```shell
bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  mydataset.mytable \
  'gs://mybucket/data/*.csv' \
  schema.json
```

One gotcha: BigQuery doesn't support multiple asterisks in paths. So gs://bucket/*/data/*.csv won't work. Bit annoying if you have a complex folder structure.
Common bq load Flags Reference
| Flag | Description | Default |
|---|---|---|
| --source_format=CSV | Specify CSV format | Required |
| --autodetect | Auto-detect schema from data | false |
| --skip_leading_rows=N | Skip N header rows | 0 |
| --field_delimiter=',' | Column delimiter character | comma |
| --allow_quoted_newlines | Allow newlines inside quoted fields | false |
| --allow_jagged_rows | Allow rows with missing trailing columns | false |
| --max_bad_records=N | Max errors before job fails | 0 |
| --replace | Overwrite existing table | false |
| --noreplace | Append to existing table | true |
| --encoding=UTF-8 | Character encoding | UTF-8 |
Method 3: Python API
The Python client library is what you want when you're building actual data pipelines. You get error handling, retries, and the ability to integrate with whatever else your pipeline does.
Installation
Install the BigQuery client library:
```shell
pip install google-cloud-bigquery
```

Load from Local CSV File
Uploading a local file with auto-detection:
```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "your-project.your_dataset.your_table"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

with open("mydata.csv", "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

job.result()  # Wait for job to complete

table = client.get_table(table_id)
print(f"Loaded {table.num_rows} rows to {table_id}")
```

Load from Cloud Storage with Explicit Schema
For production, define your schema explicitly. Auto-detection will eventually bite you when some file has slightly different formatting:
```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "your-project.your_dataset.your_table"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("email", "STRING"),
        bigquery.SchemaField("created_at", "TIMESTAMP"),
    ],
    skip_leading_rows=1,
    source_format=bigquery.SourceFormat.CSV,
)

uri = "gs://mybucket/mydata.csv"
load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Wait for completion

destination_table = client.get_table(table_id)
print(f"Loaded {destination_table.num_rows} rows.")
```

Controlling Write Behavior
The write_disposition setting determines what happens when the table already exists:
```python
job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
)
```

Your options:
- WRITE_TRUNCATE: Wipes the table and replaces it with your new data
- WRITE_APPEND: Adds rows to whatever's already there (this is the default)
- WRITE_EMPTY: Only writes if the table is empty, fails otherwise
Error Handling
Always wrap load operations in try-except. Loads fail for all sorts of reasons (schema mismatches, permission issues, malformed data):
```python
from google.cloud import bigquery
from google.cloud.exceptions import GoogleCloudError

client = bigquery.Client()

try:
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    job.result()
    print(f"Loaded {client.get_table(table_id).num_rows} rows")
except GoogleCloudError as e:
    print(f"Load job failed: {e}")
```

Method 4: Node.js API
If you're in a JavaScript or TypeScript environment, the Node.js client works well. I've used it in Cloud Functions and Express apps where Python wasn't an option.
Installation
Install the BigQuery client library:
```shell
npm install @google-cloud/bigquery @google-cloud/storage
```

Load from Local CSV File
Local upload with auto-detection:
```javascript
const { BigQuery } = require("@google-cloud/bigquery");

const bigquery = new BigQuery();

async function loadLocalCSV() {
  const datasetId = "my_dataset";
  const tableId = "my_table";
  const filename = "./mydata.csv";

  const metadata = {
    sourceFormat: "CSV",
    skipLeadingRows: 1,
    autodetect: true,
  };

  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(filename, metadata);

  console.log(`Job ${job.id} completed.`);

  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

loadLocalCSV();
```

Load from Cloud Storage with Explicit Schema
Loading from GCS with a defined schema:
```javascript
const { BigQuery } = require("@google-cloud/bigquery");
const { Storage } = require("@google-cloud/storage");

const bigquery = new BigQuery();
const storage = new Storage();

async function loadCSVFromGCS() {
  const datasetId = "my_dataset";
  const tableId = "my_table";
  const bucketName = "my-bucket";
  const filename = "mydata.csv";

  const metadata = {
    sourceFormat: "CSV",
    skipLeadingRows: 1,
    schema: {
      fields: [
        { name: "name", type: "STRING" },
        { name: "email", type: "STRING" },
        { name: "created_at", type: "TIMESTAMP" },
      ],
    },
    location: "US",
  };

  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  console.log(`Job ${job.id} completed.`);

  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

loadCSVFromGCS();
```

Controlling Write Behavior
Set the writeDisposition option to control how data is written:
```javascript
const metadata = {
  sourceFormat: "CSV",
  skipLeadingRows: 1,
  writeDisposition: "WRITE_TRUNCATE", // Overwrite table
  // writeDisposition: "WRITE_APPEND", // Append (default)
  // writeDisposition: "WRITE_EMPTY",  // Write only if empty
};
```

Async/Await Pattern with Error Handling
A fuller example with error handling (you'll want something like this in production):
```javascript
const { BigQuery } = require("@google-cloud/bigquery");

const bigquery = new BigQuery();

async function loadCSVWithErrorHandling() {
  const datasetId = "my_dataset";
  const tableId = "my_table";
  const filename = "./mydata.csv";

  try {
    const [job] = await bigquery
      .dataset(datasetId)
      .table(tableId)
      .load(filename, {
        sourceFormat: "CSV",
        skipLeadingRows: 1,
        autodetect: true,
      });

    if (job.status.errors && job.status.errors.length > 0) {
      console.error("Errors encountered:", job.status.errors);
      return;
    }

    const [table] = await bigquery.dataset(datasetId).table(tableId).get();
    console.log(`Loaded ${table.metadata.numRows} rows`);
  } catch (error) {
    console.error("Load job failed:", error.message);
  }
}

loadCSVWithErrorHandling();
```

Cloud Storage Staging for Large Files
Once your files get bigger than 100 MB (or you want better performance), stage them in Cloud Storage first. It's an extra step, but worth it.
Why Bother with Cloud Storage?
A few reasons I reach for GCS staging:
- Gets around the Console's 100 MB limit
- BigQuery can read multiple GCS files in parallel, which speeds things up
- Wildcard patterns let you load a bunch of files at once
- If the load fails, your data is still sitting in the bucket for a retry
Staging Workflow
Pretty simple:
- Upload your CSV file(s) to a Cloud Storage bucket
- Load from the GCS URI into BigQuery using any method (Console, CLI, or API)
GCS URI Formats
BigQuery supports several URI patterns for loading from GCS:
| Pattern | Example | Description |
|---|---|---|
| Single file | gs://mybucket/data/myfile.csv | Load one specific file |
| Wildcard (all CSVs) | gs://mybucket/data/*.csv | Load all CSV files in folder |
| Wildcard (prefix) | gs://mybucket/data/users*.csv | Load files matching prefix |
Again, no multiple wildcards. gs://bucket/*/data/*.csv won't work.
Region Requirements
Your GCS bucket has to be in a location compatible with your BigQuery dataset: the same region, or a multi-region that contains it. Mismatched locations mean a failed load, and this trips people up more often than you'd expect. Check your dataset location in the Console and make sure your bucket matches.
Schema: Auto-Detection vs Explicit
Auto-detection is convenient. Explicit schemas are reliable. You're always trading one against the other.
How Auto-Detection Works
BigQuery looks at the first 500 rows and guesses:
- Column names (from the header row)
- Data types for each column
- The delimiter you're using
- Whether there are quoted newlines
When Auto-Detection Works Well
Auto-detection does fine when your data is clean, you're using standard types (strings, numbers, ISO dates), and the first 500 rows look like the rest of the file.
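Since only the leading rows are sampled, a cheap local sanity check is to compare the types you'd infer from the first 500 rows against the whole file. A rough sketch with a deliberately simplified notion of "type" (this is not BigQuery's actual inference logic, just a smoke test for surprises deeper in the file):

```python
import csv


def column_kinds(rows):
    """Classify each column as 'int', 'float', or 'str' across a batch of rows."""
    def kind(value):
        try:
            int(value)
            return "int"
        except ValueError:
            pass
        try:
            float(value)
            return "float"
        except ValueError:
            return "str"

    kinds = None
    order = {"int": 0, "float": 1, "str": 2}
    for row in rows:
        row_kinds = [kind(v) for v in row]
        if kinds is None:
            kinds = row_kinds
        else:
            # Widen int -> float -> str, mirroring how type detectors relax guesses
            kinds = [a if order[a] >= order[b] else b for a, b in zip(kinds, row_kinds)]
    return kinds


def sample_matches_rest(path, sample_size=500):
    """True if types inferred from the first sample_size rows hold for the whole file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        rows = list(reader)
    return column_kinds(rows[:sample_size]) == column_kinds(rows)
```

If this returns False, go explicit: whatever surprised the sketch would likely surprise auto-detection too.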
When to Use Explicit Schema
Go explicit if:
- Your columns are all strings (auto-detect can't find the header)
- You want INTEGER for IDs instead of STRING
- You have INTERVAL types (auto-detect doesn't support these)
- You're running in production and can't afford surprises
- Your files might have slightly different formatting
Schema Definition Formats
With the bq CLI, you can define schemas inline:
```
name:STRING,email:STRING,age:INTEGER,created_at:TIMESTAMP
```

Or use JSON for more columns and finer control:
```json
[
  {"name": "name", "type": "STRING", "mode": "REQUIRED"},
  {"name": "email", "type": "STRING", "mode": "NULLABLE"},
  {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
```

Supported Data Types
BigQuery handles most common types in CSV:
- Text: STRING, BYTES
- Numeric: INTEGER, INT64, FLOAT, FLOAT64, NUMERIC, BIGNUMERIC
- Boolean: BOOLEAN, BOOL
- Date/Time: TIMESTAMP, DATE, TIME, DATETIME
- Other: GEOGRAPHY, JSON
No ARRAY or STRUCT support in CSV, though. If you need nested data, you'll have to switch to JSON or Avro.
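One route to nested data is converting the CSV to newline-delimited JSON, which BigQuery loads natively. A minimal sketch, assuming (hypothetically) that a "tags" column holds semicolon-separated values you want as an array; the column name and separator are placeholders for your own data:

```python
import csv
import json


def csv_to_ndjson(src_path, dst_path, list_columns=("tags",), list_sep=";"):
    """Convert a CSV to newline-delimited JSON, splitting chosen columns into arrays."""
    with open(src_path, newline="") as src, open(dst_path, "w") as dst:
        for record in csv.DictReader(src):
            for column in list_columns:
                if column in record:
                    # Empty cells become empty arrays rather than [""]
                    record[column] = record[column].split(list_sep) if record[column] else []
            dst.write(json.dumps(record) + "\n")
```

Then load the output with --source_format=NEWLINE_DELIMITED_JSON and a schema that declares the column as REPEATED.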
Quotas and Limits
BigQuery has limits. You probably won't hit most of them, but a few can sneak up on you.
Load Job Limits
| Limit | Value |
|---|---|
| Load jobs per table per day | 1,500 (including failures) |
| Load jobs per project per day | 100,000 (including failures) |
| Maximum file size (uncompressed) | 5 TB |
| Maximum file size (gzip compressed) | 4 GB |
| Maximum row size | 100 MB |
| Maximum columns per table | 10,000 |
| Maximum cell size | 100 MB |
CSV-Specific Limitations
A few things to watch for:
- No nested data: CSV just can't do it
- BOMs cause problems: Strip byte order marks from your files
- Compression trade-off: gzip saves space but BigQuery can't parallelize reads
- Don't mix: All files in one load job need to be either compressed or not (no mixing)
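Stripping a BOM is a one-liner's worth of work. A minimal sketch (alternatively, reading with Python's utf-8-sig codec skips a BOM transparently):

```python
def strip_bom(path):
    """Rewrite a file without its leading UTF-8 BOM, if one is present."""
    bom = b"\xef\xbb\xbf"
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(bom):
        with open(path, "wb") as f:
            f.write(data[len(bom):])
        return True
    return False
```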
Date and Time Format Requirements
BigQuery expects specific formats for temporal data:
- DATE: Must use YYYY-MM-DD format (e.g., 2026-02-15)
- TIMESTAMP: Use YYYY-MM-DD HH:MM:SS or YYYY/MM/DD HH:MM:SS
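If your source system emits dates in some other layout, normalize them before loading. A tiny sketch, assuming (hypothetically) US-style MM/DD/YYYY input; swap in whatever format string your data actually uses:

```python
from datetime import datetime


def to_bigquery_date(value, source_format="%m/%d/%Y"):
    """Convert a date string from source_format to BigQuery's YYYY-MM-DD."""
    return datetime.strptime(value, source_format).strftime("%Y-%m-%d")
```

strptime raises ValueError on anything that doesn't match, which doubles as a cheap validity check.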
Encoding Support
BigQuery handles UTF-8 (recommended), ISO-8859-1, UTF-16BE/LE, and UTF-32BE/LE. Stick with UTF-8 unless you have a specific reason not to.
Pricing Note
Good news: batch loading is free. You only pay for storage once the data lands.
Troubleshooting Common Errors
You'll hit these at some point. Here's what they mean and how to fix them.
"Input CSV files are not splittable"
This means your gzip file is over 4 GB, or you're using --allow_quoted_newlines on a big file.
Fix it by splitting into smaller chunks, keeping gzip files under 4 GB, or dropping --allow_quoted_newlines if you don't actually need it.
"Error detected while parsing row"
Almost always a quoting issue. Look for unescaped quotes inside fields, mismatched quote pairs, or a delimiter that doesn't match what you specified.
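Before re-running the load, a quick local scan for rows whose column count doesn't match the header usually pinpoints the offender. A rough sketch (find_bad_rows is a hypothetical helper; it assumes no quoted newlines, so row numbers line up with file lines):

```python
import csv


def find_bad_rows(path, delimiter=","):
    """Return (row_number, column_count) for rows that don't match the header width."""
    bad = []
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        expected = len(header)
        # Row numbers are 1-based, counting the header as row 1
        for row_number, row in enumerate(reader, start=2):
            if len(row) != expected:
                bad.append((row_number, len(row)))
    return bad
```

An unexpected column count is often the visible symptom of an unescaped quote or wrong delimiter a few fields earlier in the row.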
Schema Mismatch Errors
Your data doesn't match the schema. Check that column order is right, date formats follow BigQuery's requirements, and types align. While debugging, --max_bad_records=N lets you see how many rows are failing.
Permission Errors
You need the bigquery.dataEditor role on the target dataset, plus permission to create load jobs (bigquery.jobUser on the project covers it). For GCS loads, you also need storage.objectViewer on the bucket.
Alternative: ImportCSV for User-Facing Uploads
Everything above assumes you're the one uploading data. But what if you need end users to upload CSVs through your app?
That's a different problem. You need validation with friendly error messages, column mapping when headers don't match, data cleaning, progress indicators. Building all of that yourself takes a while.
ImportCSV is an embeddable component that handles the user-facing parts. Your users get a polished upload experience, you get clean data ready for BigQuery.
Summary
Four ways to get CSV data into BigQuery:
- Console: Quick and easy, but limited to 100 MB for local files
- bq CLI: My recommendation for anything scripted
- Python/Node.js APIs: When you need programmatic control
For production, use explicit schemas and load from Cloud Storage. Auto-detection is fine for exploring data but will eventually cause problems if you rely on it.
Watch out for the 1,500 load jobs per table per day limit. And make sure your date formats match what BigQuery expects.
Wrap-up
CSV imports shouldn't slow you down. ImportCSV is built to fit into your workflow, whether you're building data import flows, handling customer uploads, or processing large datasets.
If that sounds like the kind of tooling you want to use, try ImportCSV.