The Happiest Notebooks on Earth

Hello.

Alison Hill

Director of Knowledge

Voltron Data

@apreshill
fosstodon.org/@apreshill
apreshill.com

Once upon a time…

There was a human named Joel Grus (@joelgrus), who did not like notebooks.

Select complaints

  1. Hidden state and out-of-order execution
  2. Notebooks are difficult for beginners
  3. Notebooks encourage bad habits
  4. Notebooks discourage modularity and testing
  5. Jupyter’s autocomplete, linting, and way of looking up the help are awkward
  6. Notebooks encourage bad processes
  7. Notebooks hinder reproducible + extensible science
  8. Notebooks make it hard to copy and paste into Slack/GitHub issues
  9. Errors will always halt execution
  10. Notebooks make it easy to teach poorly
  11. Notebooks make it hard to teach well

In a land far far away…

There was another human named Jeremy Howard (@jeremyphoward), who actually did like notebooks.

Meanwhile 🍿…

A fair post-academic data scientist turned product manager was knitting notebooks to her heart’s desire.

GitHub search showing 607 notebook results for the query: user:apreshill extension:.Rmd extension:.qmd extension:.Rmarkdown

The problem

We are still just talking about tools and features.

And…we are still (mainly) talking with engineers.

Imagineers

A creative team of artists and engineers who design the theme parks

“We keep moving forward, opening up new doors and doing new things, because we’re curious. And curiosity keeps leading us down new paths. We’re always exploring and experimenting. We call Imagineering the blending of creative imagination with technical know-how.”
— Walt Disney

What is a notebook?

Mike Bostock:

“an interactive, editable document defined by code. It’s a computer program, but one that’s designed to be easier to read and write by humans.”

Martin Fowler:

“A computational notebook is an environment for writing a prose document that allows the author to embed code which can be easily executed with the results also incorporated into the document.”

One kind of notebook

---
title: "ggplot2 demo"
author: "Norah Jones"
date: "5/22/2021"
format: 
  html:
    fig-width: 8
    fig-height: 4
    code-fold: true
---

## Air Quality

@fig-airquality further explores the impact of temperature on ozone level.

```{r}
#| label: fig-airquality
#| fig-cap: Temperature and ozone level.
#| warning: false
library(ggplot2)
ggplot(airquality, aes(Temp, Ozone)) +
  geom_point() +
  geom_smooth(method = "loess")
```

Why make a notebook?

  1. For communicating to decision makers, who want to focus on the conclusions, not the code behind the analysis.

  2. For collaborating with other data scientists (including future you!), who are interested in both your conclusions, and how you reached them (i.e. the code).

  3. As an environment in which to do data science, as a modern day lab notebook where you can capture not only what you did, but also what you were thinking.

Making of a notebook

Mice and birds sing while making the dress for Cinderella to attend the royal ball

  • Authoring framework: how you write code + text (Jupyter, Quarto, Observable)

  • Language engine & markdown flavor: what you write

  • File format: what you save

  • Local editor: where you write locally 💻 (source + UI)

  • Platform editor: where the magic happens ☁️ (sharing + UI)

A notebook by any other name…

  1. Jupyter notebooks are notebooks. Jupyter is the product.
  2. Observable notebooks are notebooks. Observable is the product.
  3. Quarto and R Markdown notebooks are plain text documents. Two modes:
    • Notebook mode

    • Batch mode

An enchanted rose trapped inside a glass jar

Four key principles

  1. Suspended reality
  2. Multisensory experience
  3. Details matter
  4. Make it shareable

Principle #1: Suspended reality

Problem: Walt Disney found it jarring to see characters out of place. It breaks the suspension of reality.

Solution: Disney built the utilidor system, a network of some of the world’s largest utility tunnels.

An intricate web of tunnels lies underneath the park, letting characters reach their respective “worlds” without ever appearing out of place or in two places at once.

Mickey Mouse walking underground in the Disney utilidors

Notebook utilidors

Less magic, more plain text logic

  1. Configuration settings, often YAML in the document header or an external file, preferably with a way to define your own variables
  2. Outsource a script or document
  3. Reuse code chunks in different places
  4. Conditional evaluation of code
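
One way these four utilidors might look in a Quarto document that uses the knitr engine. This is a minimal sketch, not a recommended setup: the parameter `show_smooth` and the files `R/helpers.R` and `_methods.qmd` are hypothetical placeholders, while `params`, the `include` shortcode, `ref.label`, and `eval: !expr` are existing Quarto and knitr features.

````markdown
---
title: "Air quality report"
params:
  show_smooth: true   # 1. configuration: a variable you define yourself
---

```{r}
#| label: setup
#| message: false
library(ggplot2)
source("R/helpers.R")   # 2. outsource a script (hypothetical file)
```

{{< include _methods.qmd >}}

```{r}
#| label: scatter
p <- ggplot(airquality, aes(Temp, Ozone)) + geom_point()
p
```

```{r}
#| label: smooth
#| eval: !expr params$show_smooth
#| warning: false
# 4. conditional evaluation: runs only when the parameter is true
p + geom_smooth(method = "loess")
```

```{r}
#| label: scatter-again
#| ref.label: scatter
# 3. reuse: knitr fills this chunk with the code from the "scatter" chunk
```
````

Rendering with a different parameter value (for example `quarto render report.qmd -P show_smooth:false`, where `report.qmd` is again a placeholder name) changes the output without touching the code.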

Principle #2: Multisensory experience

Problem: Being at Disney World needs to be an immersive, memorable experience.

Solution: Disney built the smellitizer system — a system that blows air across scented substances to make the air smell a certain (ideally good) way.

Smellitizers in Disney World waft scents around the park in key strategic areas

Notebook smells

Have you ever seen a notebook with…

  1. No meaningful headings (or worse, cute or peppy ones)
  2. No meaningful heading hierarchy
  3. No navigation
  4. No description of the data
  5. No logical order

Notebooks do have a scent

“Information foraging explains how users behave on the web and why they click certain links and not others.

Information scent can be used to analyze how people assess a link and the page context surrounding the link to judge what’s on the other end of the link.”

Smells like a good notebook 🥞

  1. Add a clickable table of contents 📌
  2. Add useful headings
  3. Limit to five or six H2 sections max if you can
  4. Have a topic sentence
  5. Describe your data (5W+1H: who, what, when, where, why, and how)
  6. Lay things out in a logical order
    • Beginning > middle > end
    • Logical != chronological
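
In Quarto, for example, the table of contents is a document option rather than extra work for the author. A minimal sketch for HTML output; the title is a placeholder and the specific values are illustrative choices, not recommendations from the talk.

```yaml
---
title: "Air quality report"   # hypothetical title
format:
  html:
    toc: true              # clickable table of contents
    toc-depth: 2           # include headings down to level 2
    toc-location: left     # keep navigation visible beside the text
    number-sections: true  # numbered headings make the hierarchy explicit
---
```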

Principle #3: Details matter

Problem: When a single detail is wrong, you lose the experience and the magic.

Solution: Reduce the chances of losing the magic.

Examples:

1. You cannot buy gum

2. Official no-fly zone

3. Always within 30 steps of a trash can

Easy things should be easy

  1. Easy styling
    • Authors want fonts & colors!
    • Readers want fonts & colors!
  2. Easy layouts
    • Basic grid with rows/columns
  3. Easy show/hide code and results in output
  4. Easy “run from the top” mode
  5. Easy way to skip or freeze a code chunk
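
As one illustration, several of these are single options in a Quarto document with the knitr engine. A minimal sketch; the chunk labels and the choice of the `cosmo` theme are arbitrary, and "run from the top" is an editor feature that has no option here.

````markdown
---
format:
  html:
    theme: cosmo      # easy styling: one of Quarto's built-in themes
    code-fold: true   # readers can show or hide the code
---

```{r}
#| label: two-plots
#| layout-ncol: 2
# easy layout: two plots side by side
plot(cars)
plot(pressure)
```

```{r}
#| label: skipped-fit
#| eval: false
# skipped on render; the code is still shown
fit <- loess(Ozone ~ Temp, data = airquality)
```

```{r}
#| label: cached-fit
#| cache: true
# results are reused until the chunk's code changes
fit <- loess(Ozone ~ Temp, data = airquality)
```
````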

Hard things should be possible

  1. Version control
  2. Export/download source file
  3. Save all plots as image files
  4. Extract all of my code into a script
  5. Reproducible environments

Principle #4: Make it shareable

Red concrete sidewalks at Disney World

Problem: Making something shareable makes it more memorable.

Solution: Make it easier to share.

Examples:

“The concrete—a brainchild of Disney and Kodak—creates more vivid photographs…and also makes the green grass appear greener.”

“[Cinderella’s] castle faces south, meaning the sun is never directly behind it. It’s (almost) impossible to take a bad photo of yourself in front of Cinderella’s Castle.”

A notebook is a snapshot in time

Outtakes, bloopers, and in-between things

Authors should be able to:

  1. Quickly share incomplete thoughts (with errors in code!)

  2. Time travel to different versions (with output!)

  3. Stash bits of code out of sight (with utilidors!)

  4. Annotate easily what to look at and what not to look at 🚥

“We’re on the brink of adventure, children. Don’t spoil it with questions.”
– Mary Poppins

I think notebooks are useful

Don’t touch that

it’s my emotional support notebook

  1. Useful to record my code + thought process

  2. Useful to get feedback from others

  3. Useful to get help getting unstuck

  4. Useful to help others get unstuck

  5. Useful to teach with

  6. Useful to learn with

  7. Useful to collaborate with

  8. Useful to publish work for my career

  9. Useful to debug with (see reprex)

  10. Useful to iterate on code with

Summary

Notebook users should be able to follow all four of these key principles with tools we make for them:

  1. Suspended reality: use utilidors to save their sanity and their code
  2. Multisensory experience: use navigation and organization to make notebooks smell better to everyone
  3. Details matter: “Easy things should be easy, and hard things should be possible” (Larry Wall)
  4. Make it shareable: especially things that don’t work and things that aren’t done

Sincere thanks to

  • Tom Mock

  • Allison Horst

  • Rich Iannone

  • JJ Allaire

  • Charles Teague

  • Observable Insight 2022 conference organizers

  • AGU Notebooks Now! Initiative and support from the Sloan Foundation

  • CANSSI and Rohan Alexander for this opportunity

Thank you!

What questions do you have?

@apreshill
fosstodon.org/@apreshill
apreshill.com