Dr. Kristen Gorman, University of Alaska Fairbanks
Dr. Allison Horst, UC Santa Barbara
Dr. Alison Hill, Voltron Data
An R package featuring the penguins dataset
344 penguins
3 penguin species (Adélie, chinstrap, and gentoo)
Inf fun
> 476,400 CRAN downloads since 2020-07-23
Used globally in courses, workshops, blog posts, and other learning materials
penguins now in Python, Julia, and TensorFlow
An integrative study of the breeding ecology and population structure of Pygoscelis penguins along the western Antarctic Peninsula as part of the Palmer LTER Program (US NSF)
The data were originally published in PLoS ONE in 2014 (Gorman et al.)
All data were made available through the Environmental Data Initiative
Collected by botanist Edgar Anderson in 1935
Used everywhere in data science teaching & resources
Size measurements for 150 flowers from 3 species of iris
No missing values
Lacks metadata
Variables like Sepal.Width
Published in The Annals of Eugenics (RA Fisher, 1936)
Keep using iris and use it as an opportunity to learn/teach about its problematic aspects.
Find a better dataset to replace iris.
Allison stumbles upon Gorman et al. and shares it with Alison
Alison writes a blog post with penguins
Meanwhile, Allison keeps playing with the penguins
And plotting with the penguins
And looking at more penguin pictures
Aligns eerily well with iris data