Writing Notebooks¶

Notebooks are the primary way to explore, analyze, and work with your data in MSD-LIVE. They combine code, visualizations, and narrative text in a single, interactive environment.

This guide walks you through the basics of writing notebooks, including how to import packages and access your dataset.

Once you've imported the libraries you need and loaded your data, you're ready to:

Visualize and explore your data
Create workflows for subsetting data in space and time
Develop analysis pipelines specific to your research questions
Use the AI Assistant to help you write your code

As you work, remember that notebooks are meant to be iterative—start simple, explore your data, and build up more complex analyses step by step.

Before you start:

You should have basic familiarity with Jupyter notebooks

Importing Packages¶

Your notebook environment comes with many common data science libraries pre-installed. Start by importing the packages you need:

Python¶

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Julia¶

For Julia users, use the using statement:

using DataFrames
using Plots
using StatsPlots

R¶

For R users, use the library() function:

library(tidyverse)
library(ggplot2)

Refer to the language-specific documentation for detailed examples and additional libraries available in your environment.

Accessing Your Data¶

Dataset files are automatically available in your notebook environment via the DATA_DIR environment variable.

DATA_DIR is the preferred way to access dataset files
It points to the mounted dataset location in your environment
Avoid hardcoding paths like /data, as they may change

Python¶

import os
from pathlib import Path

data_dir = Path(os.environ["DATA_DIR"])
print("DATA_DIR:", data_dir)

# List files
for p in data_dir.iterdir():
    print("-", p.name)

# Load a CSV file (if present)
csvs = sorted(data_dir.glob("*.csv"))
if csvs:
    df = pd.read_csv(csvs[0])
    df.head()

Julia¶

data_dir = ENV["DATA_DIR"]
println("DATA_DIR = ", data_dir)

# Load data files
df = CSV.read(joinpath(data_dir, "your_data.csv"), DataFrame)

R¶

data_dir <- Sys.getenv("DATA_DIR")
print(paste("DATA_DIR =", data_dir))

# Load data files
df <- read.csv(file.path(data_dir, "your_data.csv"))

Accessing Multiple Datasets¶

You can access multiple datasets from within your notebook environment. The OTHER_DATASETS_DIR environment variable points to all public datasets with file exploration enabled. You can access another dataset by its Record ID:

import os
from pathlib import Path

# OTHER_DATASETS_DIR points to all public datasets with file exploration enabled
public_dir = Path(os.environ['OTHER_DATASETS_DIR'])
print("OTHER_DATASETS_DIR =", public_dir)

# Access another dataset by its Record ID
other_dataset_id = "6yawb-zyx60"
other_data_path = public_dir / other_dataset_id

print(f"Files available in dataset {other_dataset_id}:")
for f in other_data_path.iterdir():
    print(" -", f.name)

This allows you to combine data from multiple datasets in your analysis.

Using the MSD-LIVE AI Assistant¶

You can open the MSD-LIVE AI Assistant from the right sidebar while working in your notebook environment.

This built-in chatbot is designed to help you create high-quality dataset notebooks. It can assist with:

MSD-LIVE features: Using the scratch directory, referencing DATA_DIR and OTHER_DATASETS_DIR, and working in the environment.
JupyterLab help: Running cells, managing notebooks, and navigating the interface.
Code assistance: Writing Python, R, or Julia code to load, inspect, analyze, and visualize your data.
Notebook best practices: Organizing workflow, debugging, and improving notebook quality.

Use the AI Assistant whenever you need quick guidance or examples while developing your notebooks.

Best Practices¶

Keep notebooks focused — Create one notebook per analysis or workflow

Write clear explanations — Use markdown cells and comments to explain each section

Include practical examples — Show users how to subset, filter, and transform data

Test thoroughly — Run notebooks against real data before publishing

Document dependencies — List required packages and any external data

Use descriptive filenames — Make notebook purpose clear at a glance

Update your README — Briefly describe each example notebook in the repository

Save frequently — If using Notebook Lab, save often (3-hour session limit)