# The Three Pillars of Arc
Arc is built on three foundational concepts that work together to enable AI-native machine learning. Understanding these pillars is key to mastering Arc.
## Overview
When you describe what you want in natural language, Arc's AI agent:

1. Consults Arc-Knowledge for ML best practices
2. Generates Arc-Graph (model architecture) and Arc-Pipeline (data processing) specifications
3. Executes the workflow to train your model and generate predictions
## The Three Pillars
### 1. Arc-Graph: Model Architecture
Arc-Graph is a declarative YAML specification for ML model architecture and training configuration.
Purpose: Define what your model looks like and how it should be trained.
Key Components:

- `inputs`: Input shapes, dtypes, and feature names
- `graph`: Model layers and their connections (using PyTorch layers)
- `outputs`: Model outputs
- `trainer`: Training configuration (optimizer, loss, epochs, batch size)
- `evaluator`: Evaluation metrics
Example:

```yaml
inputs:
  user_features:
    dtype: float32
    shape: [null, 10]

graph:
  - name: hidden
    type: torch.nn.Linear
    params:
      in_features: 10
      out_features: 64

outputs:
  prediction: hidden.output
```
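The example above omits the `trainer` and `evaluator` sections. A minimal sketch of what they might contain, based on the components listed earlier; the exact keys and values here are illustrative assumptions, so check the Arc-Graph specification for the real schema:

```yaml
# Illustrative only -- exact field names come from the Arc-Graph spec.
trainer:
  optimizer:
    type: torch.optim.Adam
    params:
      lr: 0.001
  loss: torch.nn.MSELoss
  epochs: 10
  batch_size: 256

evaluator:
  metrics: [mse, mae]
```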
### 2. Arc-Pipeline: Feature Engineering
Arc-Pipeline is a declarative YAML specification for data processing and feature engineering workflows.
Purpose: Transform raw data into ML-ready features.
Key Capabilities:

- Load data from multiple sources (CSV, Parquet, S3, Snowflake, databases)
- Transform and normalize features
- Create train/validation/test splits
- Handle missing values and outliers
- Generate derived features
Example:

```yaml
name: user_features_pipeline

sources:
  - name: raw_users
    type: duckdb
    params:
      query: "SELECT * FROM users"

transforms:
  - name: normalize_age
    type: min_max_scaler
    input: raw_users.age

outputs:
  - name: processed_users
```
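To illustrate the splitting and missing-value capabilities listed above, here is a hedged sketch in the same style. The transform types `fill_missing` and `train_test_split` are assumed names for illustration, not confirmed Arc-Pipeline transforms:

```yaml
transforms:
  # Hypothetical transform names -- see the Arc-Pipeline spec for the real ones.
  - name: impute_income
    type: fill_missing        # assumed type; fills nulls with a statistic
    input: raw_users.income
    params:
      strategy: median
  - name: split_users
    type: train_test_split    # assumed type; creates train/val/test splits
    input: raw_users
    params:
      ratios: [0.8, 0.1, 0.1]
```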
Learn more about Arc-Pipeline →
### 3. Arc-Knowledge: ML Best Practices
Arc-Knowledge is a curated collection of ML best practices, patterns, and domain expertise that guides Arc's AI agent.
Purpose: Provide the AI with expert knowledge to generate optimal specifications.
What's Included:

- Data loading patterns (CSV, Parquet, S3, Snowflake)
- Feature engineering techniques (normalization, encoding, splits)
- Model architectures (MLP, DCN, MMOE, Transformers)
- Training best practices
- Evaluation strategies

Extensibility: You can add your own knowledge to `~/.arc/knowledge/` to customize Arc's behavior for your domain.
Example Knowledge Structure:

```
~/.arc/knowledge/
├── metadata.yaml        # Knowledge catalog
├── custom_models.md     # Your model architectures
└── domain_patterns.md   # Your domain-specific patterns
```
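As a sketch of how the catalog might describe those files, here is a hypothetical `metadata.yaml`; the schema shown is an assumption, not the documented format:

```yaml
# metadata.yaml -- illustrative schema, not the documented format
knowledge:
  - file: custom_models.md
    description: In-house model architectures the agent may propose
  - file: domain_patterns.md
    description: Domain-specific feature engineering patterns
```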
Learn more about Arc-Knowledge →
## How They Work Together

### The Workflow
1. **User Describes Goal**

2. **AI Consults Arc-Knowledge**
    - What architecture works best for recommendations? (MMOE, DCN)
    - How to process user demographics? (normalization, encoding)
    - How to handle purchase history? (sequence features, embeddings)

3. **AI Generates Specifications**
    - Arc-Pipeline: Load data, normalize features, create embeddings
    - Arc-Graph: MMOE architecture with appropriate layer sizes

4. **Arc Executes**
    - Runs the Arc-Pipeline to process data
    - Builds the model from Arc-Graph
    - Trains the model
    - Evaluates performance

5. **Results**
    - Trained model ready for predictions
    - Portable YAML specifications for reproducibility
    - TensorBoard logs for analysis
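The two generated specifications meet at the data hand-off: the pipeline's named output becomes the graph's named input. A minimal sketch of that pairing, assuming wiring-by-name (the exact mechanism is an assumption, not confirmed by the spec):

```yaml
# user_features_pipeline (Arc-Pipeline) -- produces the processed dataset
outputs:
  - name: processed_users
---
# Arc-Graph -- consumes it as a named input (wiring by name is an assumption)
inputs:
  processed_users:
    dtype: float32
    shape: [null, 10]
```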
## Getting Started with the Three Pillars
Ready to dive deeper? Explore each pillar:
- Arc-Graph - Learn to read and customize model architectures
- Arc-Pipeline - Master feature engineering workflows
- Arc-Knowledge - Extend Arc with your own expertise
Or jump straight to the Quick Start Tutorial to see them in action!
## Common Questions
### Do I need to write Arc-Graph/Arc-Pipeline myself?
No! Arc's AI generates these specifications for you from natural language. You only need to review and approve them. However, understanding the format helps you customize and debug.
### Can I edit the generated specifications?
Yes! The YAML specifications are human-readable and editable. You can modify them directly if you want fine-grained control.
### What if I need a custom architecture?
You can:

1. Describe it in natural language and let the AI generate it
2. Add it to Arc-Knowledge so the AI knows about it
3. Edit the Arc-Graph YAML directly (see the sketch after this list)
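For the third option, here is a hedged sketch that extends the earlier Arc-Graph example with an activation and an output layer. The `input` key used to wire layers together is an assumption borrowed from the pipeline syntax, not a confirmed Arc-Graph field:

```yaml
graph:
  - name: hidden
    type: torch.nn.Linear
    params:
      in_features: 10
      out_features: 64
  - name: activation
    type: torch.nn.ReLU
    input: hidden.output       # wiring syntax assumed, mirrors the pipeline example
  - name: head
    type: torch.nn.Linear
    params:
      in_features: 64
      out_features: 1
    input: activation.output
```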
### How is this different from writing PyTorch?
Arc uses declarative YAML specifications instead of Python code; for example, the `torch.nn.Linear` layer in the Arc-Graph spec above corresponds to `torch.nn.Linear(10, 64)` in PyTorch. The AI generates these specifications from natural language, making ML accessible without deep coding knowledge. Arc generates and uses PyTorch models internally, but you work at a higher level of abstraction.
## Next Steps
- Arc-Graph Specification - Deep dive into model architecture specs
- Arc-Pipeline Specification - Deep dive into data processing specs
- Arc-Knowledge System - Learn to extend Arc's knowledge