Daily Workflow¶
How to use DataSci Homelab in your daily data science work.
Starting Your Day¶
Start the containers, then open:
- http://localhost:8787 for RStudio
- http://localhost:8888 for JupyterLab
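If a page does not load, it can help to check whether anything is answering on those ports before digging into logs. The following is a minimal sketch, not part of DataSci Homelab itself: the `is_up` helper and the 3-second timeout are assumptions made for illustration, and only the two URLs come from the list above.

```python
import urllib.error
import urllib.request

# URLs taken from the list above.
SERVICES = {
    "RStudio": "http://localhost:8787",
    "JupyterLab": "http://localhost:8888",
}

# Bypass any configured HTTP proxy, since both services are local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example 401/403 on a login page), so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        state = "up" if is_up(url) else "not reachable"
        print(f"{name}: {state} ({url})")
```

Run it on the host. A "not reachable" result usually means the container is stopped or still starting.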
Working with Projects¶
Creating a New Project¶
In RStudio:
- File → New Project → New Directory
- Choose project type
- Set location to /home/rstudio/projects/
- Create Project
In JupyterLab:
- Navigate to your projects folder
- Click the New Folder button
- Create notebooks and files
Recommended Structure¶
/home/rstudio/
├── projects/
│ ├── project-1/
│ │ ├── data/
│ │ ├── scripts/
│ │ ├── output/
│ │ └── project-1.Rproj
│ └── project-2/
├── notebooks/
│ └── exploration.ipynb
└── shared/
└── common-scripts/
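The per-project part of this layout can also be created from a notebook or terminal. The sketch below is one possible way to do it: `scaffold_project` is a hypothetical helper, not something shipped with DataSci Homelab, and it creates only the data/, scripts/ and output/ directories shown in the tree. The .Rproj file is left for RStudio to create.

```python
from pathlib import Path

# Subdirectories from the recommended layout above.
PROJECT_SUBDIRS = ("data", "scripts", "output")


def scaffold_project(root: Path, name: str) -> Path:
    """Create root/projects/<name>/ with data/, scripts/ and output/ inside it."""
    project = root / "projects" / name
    for sub in PROJECT_SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


# Example, inside the container (the project name is a placeholder):
# scaffold_project(Path("/home/rstudio"), "project-1")
```

Because existing directories are left untouched, running it again on a project that already has files does not overwrite anything.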
Installing Packages¶
R Packages¶
# Single package
install.packages("packagename")
# Multiple packages
install.packages(c("pkg1", "pkg2", "pkg3"))
# From GitHub
devtools::install_github("user/repo")
# From Bioconductor
BiocManager::install("packagename")
Python Packages¶
# In terminal or notebook
pip install packagename
# Specific version
pip install packagename==1.2.3
# From requirements file
pip install -r requirements.txt
Packages Persist
All packages are stored in Docker volumes and persist across container restarts.
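To confirm that a Python package survived a restart, you can query the installed version from a notebook. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Example: installed_version("pandas") gives a version string if pandas is installed.
```

A `None` result after a restart would suggest the package was installed somewhere that is not backed by a volume.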
Working with Data¶
The Shared Data Directory¶
Both RStudio and JupyterLab can access /data:
# Python
import pandas as pd
df = pd.read_csv("/data/myfile.csv")
df.to_csv("/data/output.csv", index=False)
Adding Data from Host¶
Data placed in ./volumes/shared-data/ on your host is available at /data in the container. Copy files into that directory on the host, then read them from /data inside the container.
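A quick way to confirm that files copied on the host have arrived is to list them from the container side. The sketch below assumes the /data mount point described above; `list_data_files` and its `*.csv` default pattern are illustrative choices, not part of the project.

```python
from pathlib import Path
from typing import List


def list_data_files(data_dir: str = "/data", pattern: str = "*.csv") -> List[str]:
    """Return matching file names in data_dir, sorted; empty if the directory is missing."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob(pattern) if p.is_file())


if __name__ == "__main__":
    for name in list_data_files():
        print(name)
```

Any file listed here can then be loaded with `pd.read_csv("/data/<name>")` as in the earlier example.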
Switching Between R and Python¶
Same Project, Different Languages¶
Your home directory is shared between RStudio and JupyterLab:
- Create project files in RStudio
- Open the same directory in JupyterLab
- Work with .ipynb and .R files interchangeably
Using R in Jupyter¶
- Create a new notebook
- Select "R" kernel
- Write R code as normal
Using Python in RStudio¶
Open the Terminal in RStudio and run Python from there, or call Python from R with the reticulate package.
Version Control¶
Git Setup (First Time)¶
Daily Git Workflow¶
Rendering Documents¶
Quarto Documents¶
R Markdown¶
Jupyter to HTML/PDF¶
Long-Running Jobs¶
Background Execution in R¶
# Using callr for background jobs
library(callr)
job <- r_bg(function() {
# Long running code
Sys.sleep(3600)
return("Done!")
})
# Check status
job$is_alive()
# Get result when done
job$get_result()
Background Execution in Python¶
import subprocess
# Run in background
process = subprocess.Popen(['python', 'long_script.py'])
# Check later
process.poll() # Returns None if still running
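The `Popen` call above discards nothing but also records nothing: output goes to wherever the notebook's own output goes. A variation that writes the job's output to a log file, similar to the `nohup ... > output.log` pattern, might look like the sketch below. `start_logged` is a hypothetical helper name, and `long_script.py` is the same placeholder script as above.

```python
import subprocess
import sys


def start_logged(args, log_path):
    """Start args in the background with stdout and stderr appended to log_path."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(args, stdout=log, stderr=subprocess.STDOUT)
    finally:
        # The child process holds its own copy of the file handle; close ours.
        log.close()


# Example usage:
# proc = start_logged([sys.executable, "long_script.py"], "output.log")
# proc.poll()             # None while running, the exit code once finished
# proc.wait(timeout=60)   # raises subprocess.TimeoutExpired if still running
```

Using `sys.executable` instead of the bare string `'python'` runs the script with the same interpreter as the notebook kernel, which avoids surprises when several Python installations are present.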
Using Terminal¶
# Run with nohup to continue after disconnect
nohup Rscript long_script.R > output.log 2>&1 &
# Check progress
tail -f output.log
Backing Up Your Work¶
Manual Backup¶
Package Lists¶
This creates:
- r_packages_TIMESTAMP.csv - R packages
- python_packages_TIMESTAMP.txt - Python packages
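For the Python side, a list in the same spirit can be produced with the standard library alone. The sketch below is not the project's own backup script. It is an illustration under stated assumptions: `export_python_packages` is a hypothetical helper, and the timestamp format is a guess, since the actual format behind TIMESTAMP is not documented here.

```python
from datetime import datetime
from importlib import metadata
from pathlib import Path


def export_python_packages(out_dir: str = ".") -> Path:
    """Write name==version lines for all installed distributions to a timestamped file."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    out_path = Path(out_dir) / f"python_packages_{stamp}.txt"
    entries = {
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip distributions with broken metadata
    }
    lines = sorted(entries, key=str.lower)
    out_path.write_text("\n".join(lines) + "\n")
    return out_path


# Example: export_python_packages("/data") writes the list to the shared data directory.
```

The output uses the same `name==version` form as a requirements file, so it can be fed back to `pip install -r` when rebuilding an environment.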
Ending Your Day¶
# Stop containers (data persists)
docker-compose down
# Or just stop without removing
docker-compose stop
Data Safety
Using down or stop preserves all your data in volumes. Only down -v removes volumes.
Common Tasks¶
Update System Packages¶
# Enter container as root
docker-compose exec -u root homelab bash
# Update
apt-get update && apt-get upgrade -y
# Exit
exit
Restart Services¶
# Restart everything
docker-compose restart
# Restart just the container
docker restart datasci-homelab
Check Logs¶
# All logs
docker-compose logs
# Follow logs
docker-compose logs -f
# Last 100 lines
docker-compose logs --tail=100
Clean Up¶
# Remove stopped containers
docker container prune
# Remove unused images
docker image prune
# Full cleanup (careful!)
docker system prune
Tips for Efficiency¶
1. Use Keyboard Shortcuts¶
Both RStudio and JupyterLab have extensive keyboard shortcuts. Learn the common ones:
- Ctrl+Enter - Run current line/cell
- Ctrl+Shift+Enter - Run all
- Ctrl+S - Save
2. Split Your Screen¶
Open RStudio on one half, JupyterLab on the other for polyglot workflows.
3. Use the Terminal¶
Both IDEs have integrated terminals. Use them for:
- Git operations
- File management
- Running scripts
4. Keep Data in /data¶
Store datasets in the shared data directory for easy access from both environments.
5. Commit Often¶
With the integrated Git tools, there's no excuse not to: make small, frequent commits.