# Data Loading Guide
This guide covers how to load data from various sources into Arc. Arc uses DuckDB as its data engine, which handles loading from CSV, Parquet, and JSON files as well as from remote sources such as S3 and Snowflake.
## Overview
Arc can load data from:

- Local files: CSV, Parquet, JSON, Excel
- Remote URLs: HTTPS endpoints
- AWS S3: Public and private buckets
- Snowflake: Data warehouse tables
- Databases: PostgreSQL, MySQL (via DuckDB extensions)
## Quick Start
The easiest way to load data is to ask Arc in natural language:
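A prompt along these lines should work (the file path and table name here are illustrative):

```
load data/sales.csv into a table called sales
```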
Arc will automatically:

1. Detect the file format
2. Infer the schema
3. Create the table
4. Load the data
You can verify with:
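For example, assuming the `sales` table created above:

```
/sql SHOW TABLES
/sql SELECT * FROM sales LIMIT 5
```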
## Loading CSV Files
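In natural language, a request like this should do it (path and table name are illustrative):

```
load data/customers.csv into a table called customers
```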
Or use SQL directly:
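A sketch using DuckDB's standard `read_csv_auto` function (the path is illustrative):

```
/sql CREATE TABLE customers AS
     SELECT * FROM read_csv_auto('data/customers.csv')
```

`read_csv_auto` samples the file to infer column names and types, so no explicit schema is needed.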
## Loading Parquet Files
Parquet is recommended for large datasets because of its columnar compression and fast reads.
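For example (path and table name are illustrative):

```
load data/events.parquet into a table called events
```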
Or with SQL:
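A sketch with DuckDB's `read_parquet` function (the path is illustrative):

```
/sql CREATE TABLE events AS
     SELECT * FROM read_parquet('data/events.parquet')
```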
## Loading JSON Files
Arc supports JSON and JSONL formats:
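A sketch using DuckDB's `read_json_auto`, which detects both regular JSON and newline-delimited JSONL (the path is illustrative):

```
/sql CREATE TABLE logs AS
     SELECT * FROM read_json_auto('data/logs.jsonl')
```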
## Loading from URLs
Load data directly from HTTPS URLs:
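For example (the URL and table name are illustrative):

```
load https://example.com/data/metrics.csv into a table called metrics
```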
Arc will download and load the data automatically.
## Loading from S3
Arc supports loading data from AWS S3 buckets. See the S3 Integration Guide for setup details.
### Public S3 Buckets
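For example (the bucket and key here are illustrative):

```
load s3://example-public-bucket/data/trips.parquet into a table called trips
```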
Or with SQL:
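A sketch that relies on DuckDB's `httpfs` support for `s3://` URLs (bucket and key are illustrative):

```
/sql CREATE TABLE trips AS
     SELECT * FROM read_parquet('s3://example-public-bucket/data/trips.parquet')
```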
### Private S3 Buckets
After configuring S3 credentials:
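Loading should then look the same as for public buckets; something along these lines (bucket and key are illustrative):

```
load s3://my-private-bucket/exports/orders.parquet into a table called orders
```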
## Loading from Snowflake
Arc can query Snowflake data warehouses. See the Snowflake Integration Guide for setup.
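For example, using the table referenced in the SQL below (the exact phrasing is flexible):

```
load the snowflake table PUBLIC.CUSTOMERS into a local table called customers
```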
Or with SQL:
```
/sql CREATE TABLE local_customers AS
     SELECT * FROM snowflake.PUBLIC.CUSTOMERS
     WHERE signup_date >= '2024-01-01'
```
## Checking Loaded Data
Verify loaded data with `/sql SHOW TABLES` and `/sql SELECT * FROM table_name LIMIT 10`.
## Next Steps
- Feature Engineering Guide - Transform your loaded data
- Model Training Guide - Train models with your data
- S3 Integration - Set up S3 data loading
- Snowflake Integration - Set up Snowflake access
## Related Documentation
- Arc-Pipeline Specification - Declarative data processing
- Arc-Knowledge - Built-in ML knowledge system