Claude Code for AI/ML — PyTorch, LangChain & HuggingFace

AI and machine learning projects have unique Claude Code requirements: GPU-aware code generation, large dependency environments, experiment reproducibility, and frameworks like PyTorch, LangChain, and HuggingFace Transformers that evolve rapidly. Without detailed CLAUDE.md guidance, Claude defaults to outdated API patterns. This guide covers the CLAUDE.md template for ML projects, PyTorch training workflows, LangChain RAG pipeline generation, HuggingFace fine-tuning with PEFT/LoRA, Jupyter integration, and experiment tracking patterns.

ML Project CLAUDE.md Template

# Project: [Your ML Project Name]

## Environment
- Python 3.12
- CUDA 12.4 / cuDNN 9.x (or CPU-only — specify)
- GPU: NVIDIA A100 80GB (or RTX 4090, or CPU — specify)
- Package manager: uv (uv sync to install; uv run python script.py)
- Virtual env: .venv/ (managed by uv)

## Key commands
```bash
uv run pytest tests/ -x -q          # unit tests (fast)
uv run ruff check . --fix            # lint + autofix
uv run mypy src/ --ignore-missing-imports  # type checking
uv run python -m src.train           # training entry point
uv run jupyter lab                   # Jupyter Lab server
```

## ML Stack
- Deep learning: PyTorch 2.5 (torch, torchvision, torchaudio)
- Training framework: PyTorch Lightning 2.x OR vanilla PyTorch (specify)
- LLM orchestration: LangChain 0.3 / LangGraph (if applicable)
- Transformers: HuggingFace transformers 4.x, peft, trl, datasets, tokenizers
- Vector DB: ChromaDB (local) / Pinecone (production)
- Experiment tracking: Weights & Biases (wandb) OR MLflow (specify)
- Data: pandas 2.x, polars, numpy 2.x, scikit-learn
- Visualization: matplotlib, seaborn, plotly

## Conventions
- Device placement: device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
- Reproducibility: set seed via torch.manual_seed + numpy.random.seed + random.seed
- Data loading: torch.utils.data.DataLoader with num_workers=4, pin_memory=True (GPU)
- Mixed precision: torch.autocast("cuda", dtype=torch.bfloat16) on A100+
- Gradient accumulation: specify accumulation_steps if batch doesn't fit in VRAM
- Model checkpointing: save best val_loss checkpoint; load with strict=False for fine-tuning
- No global state: pass config dataclass/dict to functions; never rely on globals

## Project structure
- src/
  - data/        — dataset classes, preprocessing, augmentation
  - models/      — model architectures
  - training/    — training loops, evaluation, loss functions
  - inference/   — inference pipelines, serving
  - utils/       — metrics, logging, visualization helpers
- configs/       — YAML experiment configs (Hydra or plain YAML)
- notebooks/     — exploratory analysis (Jupytext-synced to scripts/)
- tests/         — unit tests for data processing and model components

Automated Test & Lint Hooks

// .claude/settings.json
{
  "allowedTools": ["Edit", "Write", "Bash", "Read", "Glob", "Grep"],
  "hooks": [
    {
      "event": "PostToolUse",
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "cd $PROJECT_ROOT && uv run ruff check . 2>&1 | tail -15 && uv run pytest tests/ -x -q --tb=short 2>&1 | tail -20"
      }]
    }
  ]
}
For GPU code, test the forward pass and loss computation in unit tests with a tiny synthetic batch (batch_size=2, seq_len=16) so tests run on CPU in CI without needing a GPU.

PyTorch Training Workflows

PatternCLAUDE.md instruction
Training loop"Training loop in src/training/trainer.py; separate train_epoch() and eval_epoch() functions; return metrics dict"
Mixed precision"Use torch.autocast('cuda', dtype=torch.bfloat16) context manager + GradScaler for fp16; never wrap the entire model"
Gradient clipping"torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) before optimizer.step()"
Checkpointing"Save checkpoint dict: {'model': state_dict, 'optimizer': state_dict, 'epoch': n, 'val_loss': loss, 'config': cfg}"
Multi-GPU"Wrap model in torch.nn.parallel.DistributedDataParallel; use torchrun for launch; DistributedSampler for DataLoader"
Dataset"Extend torch.utils.data.Dataset with __len__ and __getitem__; apply augmentations only in training split"

New model training script

claude "add a training script for a text classification model.
Model: BERT-base-uncased fine-tuned for 3-class sentiment (positive/neutral/negative).
Dataset: CSV with 'text' and 'label' columns; split 80/10/10 train/val/test.
Training: AdamW optimizer, lr=2e-5, linear warmup 10%, cosine decay.
Mixed precision: bfloat16 on CUDA.
Log to W&B: epoch, train_loss, val_loss, val_accuracy, val_f1.
Save best checkpoint by val_f1.
Entry point: uv run python -m src.train --config configs/sentiment.yaml"

LangChain RAG Pipeline

# CLAUDE.md addition for LangChain/RAG

## LangChain / RAG stack
- LangChain version: 0.3 (use langchain-core, langchain-community packages)
- LLM: Anthropic Claude (langchain-anthropic) — model: claude-sonnet-4-6
- Embeddings: OpenAI text-embedding-3-small OR HuggingFace BAAI/bge-small-en-v1.5
- Vector DB: Chroma (local dev) / Pinecone (production)
- Chunking: RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
- Retrieval: MMR (search_type="mmr") with k=6, fetch_k=20
- Chain type: ConversationalRetrievalChain with chat history buffer
- Async: use ainvoke() for all chain calls in production (FastAPI context)
claude "add a RAG pipeline for a PDF Q&A system.
Ingestion: load PDFs from data/documents/ using PyPDFLoader; chunk with RecursiveCharacterTextSplitter.
Embedding: HuggingFace BAAI/bge-small-en-v1.5 (local, no API cost).
Vector store: Chroma persisted to .chroma_db/.
Retrieval: MMR with k=6.
Chain: ConversationalRetrievalChain with Claude claude-sonnet-4-6, system prompt in prompts/qa_system.txt.
Expose as FastAPI endpoint POST /chat with {question: str, chat_history: list}.
Include ingestion script: uv run python -m src.ingest --docs-dir data/documents/"

HuggingFace Fine-tuning with PEFT/LoRA

# CLAUDE.md addition for HuggingFace fine-tuning

## Fine-tuning conventions
- Framework: trl.SFTTrainer (supervised fine-tuning)
- PEFT method: LoRA via peft library
- Base model: specify (e.g. mistralai/Mistral-7B-Instruct-v0.3)
- Dataset format: HuggingFace Dataset with 'text' column (formatted instruction + response)
- LoRA config: r=16, lora_alpha=32, lora_dropout=0.05, bias="none"
  target_modules: ["q_proj", "k_proj", "v_proj", "o_proj"] for most models
- Training: bf16=True (A100), per_device_train_batch_size=4, gradient_accumulation_steps=4
- Hub: push adapter to HuggingFace Hub after training (push_to_hub=True)
- Merge: optionally merge adapter into base model for deployment
claude "add a fine-tuning script for instruction following.
Base model: mistralai/Mistral-7B-Instruct-v0.3 (load in 4-bit with bitsandbytes).
Dataset: JSONL in data/train.jsonl with 'instruction' and 'response' fields.
Format as Alpaca-style prompt template; tokenize to max_length=2048.
LoRA: r=16, alpha=32, target q_proj + v_proj.
Train 3 epochs, save checkpoint every 500 steps.
Evaluate: ROUGE-L score on data/eval.jsonl every epoch.
Push final adapter to HuggingFace Hub as {username}/{project}-lora."

Jupyter & Notebook Workflows

WorkflowBest practice with Claude Code
Exploratory analysis"Write exploration code in notebooks/explore.ipynb; extract reusable functions to src/utils/ — tell Claude to write to the .py file, not the notebook"
Jupytext sync"Use Jupytext to sync notebook.ipynb ↔ scripts/notebook.py; Claude edits the .py file; Jupyter auto-syncs the notebook"
Papermill execution"Parametrize notebooks with papermill; run with: papermill template.ipynb output.ipynb -p param value"
nbconvert for CI"Convert notebook to script for CI: jupyter nbconvert --to script --execute notebook.ipynb —catches execution errors"

Experiment Tracking

claude "add W&B experiment tracking to the training script.
Init run: wandb.init(project=cfg.project, config=cfg, tags=[cfg.model_name, cfg.dataset]).
Log each epoch: wandb.log({'train/loss': ..., 'val/loss': ..., 'val/accuracy': ..., 'epoch': n}).
Log best model as W&B artifact: artifact.add_file(checkpoint_path).
Add a --no-wandb CLI flag for offline runs.
Log gradient norms and learning rate every 100 steps for debugging."
claude "add MLflow tracking to the training script (alternative to W&B).
Set experiment: mlflow.set_experiment(cfg.experiment_name).
Log params at start: mlflow.log_params(cfg.__dict__).
Log metrics each epoch with step number.
Log model artifact using mlflow.pytorch.log_model.
Add MLFLOW_TRACKING_URI env var support (default: http://localhost:5000).
Include mlflow ui --port 5000 in CLAUDE.md commands."

5 Tips for AI/ML + Claude Code

  1. Always specify your GPU model and VRAM in CLAUDE.md: "GPU: RTX 4090 24GB". Claude will generate appropriate batch sizes, gradient accumulation settings, and precision flags (bf16 for A100+, fp16 for older GPUs) instead of using defaults that may cause OOM errors.
  2. Paste your pyproject.toml or requirements.txt into CLAUDE.md. ML libraries have frequent breaking changes between versions (transformers 4.x vs 3.x, LangChain 0.1 vs 0.3). Specifying exact versions prevents Claude from using deprecated APIs.
  3. For LLM API integrations in your ML project, use the Anthropic SDK directly instead of LangChain when the chain complexity is low — simpler code, better error messages, and lower latency. Specify in CLAUDE.md: "Use anthropic SDK directly for single-turn calls; LangChain only for multi-step chains."
  4. Add a tests/test_shapes.py that verifies model forward pass output shapes with a tiny synthetic batch. Shape errors are the #1 ML bug, and catching them in a 10ms unit test beats discovering them after 2 hours of GPU training.
  5. Seed everything in a single set_seed(42) utility function and call it at the top of every training script. Tell Claude in CLAUDE.md: "Call utils.set_seed(cfg.seed) at the start of every training and evaluation script." Reproducibility failures waste days of debugging.
📘 Free: 5 sample Claude Code prompts · plus the £3 pack with 25 moreSee the free 5 →