Skip to content

Architecture

Technical details of how DataSci Homelab is built and operates.


System Architecture

flowchart TB
    subgraph Host["Host Machine"]
        subgraph Container["Docker Container"]
            RS["RStudio Server<br>Port 8787"]
            JL["JupyterLab<br>Port 8888"]
            RS & JL --> ENV["Shared User Environment<br>R + Python"]
        end

        subgraph Volumes["Docker Volumes"]
            V1["home"]
            V2["r-library"]
            V3["py-packages"]
            V4["data"]
        end

        ENV --> Volumes
    end

    style Container fill:#e1f5fe
    style Volumes fill:#fff3e0
    style Host fill:#f5f5f5

Container Structure

Base Image

Built on Ubuntu 22.04 LTS for:

  • Long-term support (until 2027)
  • Wide package availability
  • Compatibility with R and Python ecosystems

Layer Breakdown

flowchart LR
    subgraph Image["Docker Image (~8GB)"]
        direction TB
        L1["System packages<br>~1.2GB"]
        L2["R installation<br>~600MB"]
        L3["RStudio Server<br>~1.5GB"]
        L4["Python<br>~400MB"]
        L5["Quarto + TinyTeX<br>~800MB"]
        L6["R packages<br>~2GB"]
        L7["Python packages<br>~1.7GB"]

        L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7
    end

    style L1 fill:#ffccbc
    style L2 fill:#c5cae9
    style L3 fill:#c5cae9
    style L4 fill:#c8e6c9
    style L5 fill:#fff9c4
    style L6 fill:#c5cae9
    style L7 fill:#c8e6c9

Multi-Architecture Support

The image is built for both architectures:

Architecture Platforms
linux/amd64 Intel/AMD x86_64
linux/arm64 Apple Silicon, AWS Graviton

Built using Docker Buildx with QEMU emulation for cross-compilation.


Service Architecture

Startup Flow

flowchart TD
    A[Container Start] --> B[entrypoint.sh]
    B -->|Run as root| C[Configure User]

    subgraph Config["Configuration Phase"]
        C --> C1[Create/rename user]
        C --> C2[Set RSTUDIO_PASSWORD]
        C --> C3[Copy RStudio prefs]
    end

    C1 & C2 & C3 --> D[start.sh]
    D -->|Switch to user| E{Service Mode}

    E -->|both| F[RStudio Server]
    E -->|both| G[JupyterLab]
    E -->|rstudio| F
    E -->|jupyter| G

    F & G --> H[Wait for processes]

    style B fill:#ffcc80
    style D fill:#81d4fa
    style F fill:#c5cae9
    style G fill:#ffcc80

Process Management

Both services run as background processes under the same user:

# RStudio Server
sudo /usr/lib/rstudio-server/bin/rserver --server-daemonize=0 &

# JupyterLab
jupyter lab --config=/etc/jupyter/jupyter_server_config.py &

# Parent process waits
wait

The container exits only when both services stop.


Volume Architecture

Volume Mapping

flowchart LR
    subgraph Host["Host (./volumes/)"]
        H1[home]
        H2[r-library]
        H3[python-packages]
        H4[shared-data]
        H5[config-overrides]
    end

    subgraph Container["Container"]
        C1["/home/rstudio"]
        C2["/usr/local/lib/R/site-library"]
        C3["/home/rstudio/.local"]
        C4["/data"]
        C5["/config-overrides"]
    end

    H1 -.->|mount| C1
    H2 -.->|mount| C2
    H3 -.->|mount| C3
    H4 -.->|mount| C4
    H5 -.->|mount readonly| C5

    style Host fill:#e8f5e9
    style Container fill:#e3f2fd

Why These Locations?

R Packages (/usr/local/lib/R/site-library):

  • R checks this path by default
  • Writable without root
  • Separate from system packages

Python Packages (~/.local):

  • Default --user install location
  • Pip automatically uses this
  • Persists user-installed packages

Home Directory (/home/rstudio):

  • Contains user preferences
  • Contains project files
  • Contains shell configuration

Network Architecture

Port Mapping

flowchart LR
    subgraph External["External Access"]
        B1[Browser :8787]
        B2[Browser :8888]
    end

    subgraph Host["Host"]
        H1[localhost:8787]
        H2[localhost:8888]
    end

    subgraph Container["Container"]
        C1["RStudio Server<br>0.0.0.0:8787"]
        C2["JupyterLab<br>0.0.0.0:8888"]
    end

    B1 --> H1 --> C1
    B2 --> H2 --> C2

    style External fill:#fff3e0
    style Host fill:#f5f5f5
    style Container fill:#e3f2fd

With Cloudflare Tunnel

flowchart TD
    A[Internet] --> B[Cloudflare Network]
    B --> C["cloudflared<br>Running on host"]
    C --> D["Docker Container<br>localhost:8787/8888"]

    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#c8e6c9
    style D fill:#e3f2fd

Configuration Architecture

Configuration Priority

flowchart TD
    subgraph Priority["Configuration Priority (highest → lowest)"]
        direction TB
        P1["1. Environment variables (.env)"]
        P2["2. Mounted config files (config-overrides/)"]
        P3["3. User preferences (in volumes)"]
        P4["4. Image defaults (baked in)"]

        P1 --> P2 --> P3 --> P4
    end

    style P1 fill:#c8e6c9
    style P2 fill:#dcedc8
    style P3 fill:#f0f4c3
    style P4 fill:#fff9c4

RStudio Configuration Flow

flowchart TD
    A[Container Start] --> B{rserver.conf exists?}
    B -->|Yes| C[Mount to /etc/rstudio/rserver.conf]
    B -->|No| D[Use image default]

    C & D --> E{rstudio-prefs.json exists?}
    E -->|Yes| F[Copy to ~/.config/rstudio/]
    E -->|No| G[Use default preferences]

    F & G --> H[RStudio Server starts]
    H --> I[User preferences take effect]

    style A fill:#e3f2fd
    style H fill:#c5cae9
    style I fill:#c8e6c9

Security Architecture

User Permissions

flowchart TD
    subgraph Root["root (UID 0)"]
        R1[Runs entrypoint script]
        R2[Sets user password]
        R3[Configures system files]
        R4[Drops privileges]
    end

    subgraph User["rstudio (UID 1000)"]
        U1[Runs RStudio Server via sudo]
        U2[Runs JupyterLab]
        U3[Owns home directory]
        U4[Cannot modify system files]
    end

    R4 --> User

    style Root fill:#ffcdd2
    style User fill:#c8e6c9

Sudo Access

The user has limited sudo access:

rstudio ALL=(ALL) NOPASSWD: /usr/lib/rstudio-server/bin/rserver

Only the RStudio Server binary can be run as root.

Authentication Flow

flowchart LR
    subgraph RStudio["RStudio Authentication"]
        direction LR
        R1[User] --> R2[Browser]
        R2 --> R3[RStudio Login]
        R3 --> R4[PAM]
        R4 --> R5["/etc/passwd"]
    end

    subgraph Jupyter["Jupyter Authentication"]
        direction LR
        J1[User] --> J2[Browser]
        J2 --> J3[Token Check]
        J3 --> J4[Jupyter Server]
    end

    style RStudio fill:#e3f2fd
    style Jupyter fill:#fff3e0

Build Architecture

GitHub Actions Workflow

flowchart TD
    A[Push to main] --> B[GitHub Actions]

    B --> C[Build AMD64]
    B --> D[Build ARM64]

    C & D --> E[Create Multi-arch Manifest]

    E --> F[Push to GHCR]
    E --> G[Push to Docker Hub]

    style A fill:#c8e6c9
    style B fill:#fff3e0
    style C fill:#e3f2fd
    style D fill:#e3f2fd
    style E fill:#f3e5f5
    style F fill:#e8eaf6
    style G fill:#e0f2f1

Image Tags

Tag Pattern When Created Purpose
latest Every push to main Default tag
v1.2.3 Git tags Version releases
sha-abc123 Every commit Specific builds
main Pushes to main Branch tracking

Health Check Architecture

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8787"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s

Health States

stateDiagram-v2
    [*] --> Starting
    Starting --> Healthy: start_period (60s)

    Healthy --> Checking: every 30s
    Checking --> Healthy: Success
    Checking --> Retry: Failure

    Retry --> Healthy: Success
    Retry --> Unhealthy: 3 failures

    Unhealthy --> Restarting: restart policy
    Restarting --> Starting

Package Installation Flow

R Package Installation

flowchart TD
    A["install.packages('pkg')"] --> B[Check .libPaths]

    subgraph Paths["Library Paths (checked in order)"]
        P1["1. ~/R/library (if exists)"]
        P2["2. /usr/local/lib/R/site-library"]
        P3["3. /usr/lib/R/library"]
    end

    B --> Paths
    P2 -->|"Volume mounted ✓"| C[Install to site-library]
    C --> D[Package persists in Docker volume]

    style P2 fill:#c8e6c9
    style D fill:#a5d6a7

Python Package Installation

flowchart TD
    A["pip install pkg"] --> B{--user flag?}
    B -->|"Auto in container"| C["Install to ~/.local/lib/python3.x/site-packages/"]
    C --> D[Package persists in Docker volume]

    style C fill:#c8e6c9
    style D fill:#a5d6a7

File System Layout

flowchart TD
    subgraph Root["/"]
        subgraph etc["/etc"]
            etc_rs["/etc/rstudio/rserver.conf"]
            etc_jup["/etc/jupyter/jupyter_server_config.py"]
        end

        subgraph usr["/usr"]
            usr_lib["System R packages"]
            usr_local["User R packages (volume)"]
        end

        subgraph home["/home/rstudio"]
            home_config["rstudio-prefs.json"]
            home_local["Python packages (volume)"]
            home_work["Default workdir"]
        end

        data["Shared data (volume)"]
        config["Mounted configs"]
    end

    style usr_local fill:#c8e6c9
    style home fill:#c8e6c9
    style home_local fill:#c8e6c9
    style data fill:#c8e6c9
    style config fill:#fff3e0

📦 = Volume Mounted

Directories marked with 📦 are mounted as Docker volumes and persist across container restarts.