Motivation¶
Why build yet another data science environment? This page explains the problems this project solves and the philosophy behind its design.
The Problem with Local Installations¶
If you've done data science work for any length of time, you've probably experienced these pain points:
Dependency Hell¶
Your operating system ships one version of R. Homebrew installs another. RStudio wants a third. Python's situation is even worse — system Python, Homebrew Python, pyenv, conda, virtualenv... the combinations are endless.
When something breaks, you're left debugging your environment instead of doing actual work.
The "It Works on My Machine" Problem¶
You write a brilliant analysis. You share it with a colleague. They can't run it because:
- They have a different OS
- They're missing a system library
- Their package versions don't match
- Something is configured differently
Reproducing your environment on another machine often takes longer than writing the code did.
Polluting Your Main System¶
Every data science project brings its own dependencies. Over time, your laptop accumulates:
- Multiple R versions
- Conflicting Python environments
- System libraries you installed once for one package
- Configuration files scattered across your home directory
Eventually, something breaks in a way you can't fix without nuking everything and starting over.
Remote Access Limitations¶
RStudio Desktop and JupyterLab are designed for local use. If you want to:
- Work from a different computer
- Access your environment from a tablet
- Let a colleague look at your code
- Run long jobs on a more powerful machine
...you need to set up RStudio Server or JupyterHub, which is its own project.
The Container Solution¶
Docker containers solve these problems elegantly:
| Problem | Container Solution |
|---|---|
| Dependency conflicts | Each container is isolated |
| Reproducibility | Same image runs identically everywhere |
| System pollution | Nothing installed on host |
| Remote access | Built-in web interface |
But running RStudio Server or JupyterLab in Docker has traditionally been painful:
- Separate containers — Most setups run RStudio and Jupyter in different containers, complicating shared data and context switching
- Lost packages — By default, packages are lost when containers restart
- Configuration complexity — You need to understand Docker volumes, networking, and permissions
- Architecture issues — Many images don't support ARM64 (Apple Silicon)
What This Project Does Differently¶
One Container, Two IDEs¶
Instead of orchestrating multiple containers, DataSci Homelab runs both RStudio Server and JupyterLab in a single container. They share:
- The same filesystem
- The same user account
- The same installed packages (R and Python)
- The same `/data` directory
Switch between them by changing browser tabs.
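As a rough sketch of what that looks like in practice (the image name, ports, and volume name below are placeholders, not the project's published values; 8787 and 8888 are simply the conventional RStudio Server and Jupyter ports):

```bash
# One container, two web UIs. Image name, ports, and volume name are
# illustrative assumptions -- check the project README for the real ones.
docker run -d \
  --name datasci-homelab \
  -p 8787:8787 \
  -p 8888:8888 \
  -v datasci-data:/data \
  yourname/datasci-homelab:latest

# Then open both UIs in separate browser tabs:
#   http://localhost:8787   (RStudio Server)
#   http://localhost:8888   (JupyterLab)
```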
True Package Persistence¶
Packages are stored in Docker volumes that persist across:
- Container restarts
- Container rebuilds
- Image updates
Install a package once; it's there until you explicitly remove the volume.
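The mechanism is ordinary Docker named volumes mounted over the package library directories. A minimal sketch, assuming illustrative volume names and mount paths (the real image may keep its R and Python libraries elsewhere):

```bash
# Named volumes survive `docker rm` and image upgrades; only an explicit
# `docker volume rm` deletes them. Library paths here are assumptions.
docker volume create r-site-library
docker volume create py-site-packages

docker run -d \
  --name datasci-homelab \
  -v r-site-library:/usr/local/lib/R/site-library \
  -v py-site-packages:/usr/local/lib/python3/site-packages \
  yourname/datasci-homelab:latest

# Packages installed from inside the container land in the volumes, so
# recreating the container does not wipe them:
docker rm -f datasci-homelab
docker run -d --name datasci-homelab \
  -v r-site-library:/usr/local/lib/R/site-library \
  -v py-site-packages:/usr/local/lib/python3/site-packages \
  yourname/datasci-homelab:latest
```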
Pre-Configured Everything¶
The image ships with:
- All common data science packages pre-installed
- Sensible default configurations
- Health checks for reliability
- Authentication options for security
Pull and run. No configuration required.
Native Multi-Architecture¶
Built natively for both AMD64 and ARM64. If you're on Apple Silicon, you get a native image — no Rosetta emulation, no performance penalty.
Remote-Ready by Default¶
RStudio Server and JupyterLab are web applications. Access them from:
- Your laptop
- Your phone
- A different computer
- Anywhere in the world (with Cloudflare Tunnel)
Design Philosophy¶
Batteries Included, But Swappable¶
The base image includes everything you need for common data science work. But nothing prevents you from:
- Installing additional packages at runtime
- Mounting your own configuration files
- Building on top of this image for specialized needs
Persistence Over Reproducibility (For Packages)¶
Typical Docker wisdom says: "put everything in the image." But data scientists install packages constantly. Rebuilding an image every time you need a new package is impractical.
DataSci Homelab separates:
- The base environment (in the image) — R, Python, RStudio, Jupyter
- Your packages (in volumes) — Install once, persist forever
Local-First, Remote-Optional¶
The primary use case is running on your own machine. But the same setup works for:
- A home server
- A cloud VM
- A Kubernetes pod
The Cloudflare Tunnel integration is documented but not required.
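For the remote case, a quick tunnel is one way to expose the web UI without opening any ports; the port below is an assumption (RStudio Server's conventional 8787), and the project's own tunnel documentation may describe a different setup:

```bash
# Quick tunnel: cloudflared prints a public *.trycloudflare.com URL that
# forwards to the local port. No Cloudflare account configuration needed.
cloudflared tunnel --url http://localhost:8787
```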
Opinionated Defaults, Easy Overrides¶
The default configuration reflects best practices for data science work:
- UTF-8 encoding everywhere
- POSIX line endings
- CRAN mirror configured
- Common packages pre-installed
But every default can be overridden via environment variables or mounted config files — no image rebuild required.
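For example, overrides happen at `docker run` time rather than at build time. The variable names and config path below are hypothetical placeholders, not the image's documented options; consult the configuration reference for what the image actually reads:

```bash
# Hypothetical overrides -- environment variables and a mounted R config file
# replace defaults without rebuilding the image.
docker run -d \
  --name datasci-homelab \
  -e PASSWORD=changeme \
  -e TZ=Europe/Berlin \
  -v "$PWD/Rprofile.site:/usr/local/lib/R/etc/Rprofile.site:ro" \
  yourname/datasci-homelab:latest
```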
Who Built This and Why¶
This project was built by a data scientist who got tired of:
- Reinstalling packages every time macOS updated
- Explaining environment setup to new team members
- Debugging R/Python version conflicts
- Not being able to access work from different machines
The goal was simple: define the environment once, use it everywhere, never think about it again.
If that resonates with you, welcome aboard.
Ready to try it?