| Title: | Datasets from the Hello Data Science Book |
|---|---|
| Description: | Provides datasets used for analysis and visualizations in the open-access Hello Data Science book. |
| Authors: | Mine Dogucu [aut, cre] (ORCID: <https://orcid.org/0000-0002-8007-934X>), Catalina Medina [aut] (ORCID: <https://orcid.org/0000-0003-2847-8180>), Alma Castro [aut] |
| Maintainer: | Mine Dogucu |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-05-20 08:35:36 UTC |
| Source: | https://github.com/hellodata-science/hellodatascience |
The 2024 American Time Use Survey (ATUS) data was downloaded from the U.S. Bureau of Labor Statistics' website https://www.bls.gov/tus/data/datafiles-2024.htm and subset to include only respondents who are enrolled in a college or university. This dataset is intended for educational purposes only. Those conducting actual research should download the data from its original source. BLS.gov cannot vouch for the data or analyses derived from these data after the data have been retrieved from BLS.gov.
atus_college
A data frame with 312 rows and 13 variables. Each row represents a college student.
full-time or part-time employment status of the respondent
age
whether the respondent is enrolled as a full-time or part-time student
weekly earnings at main job
number of people living in respondent's household
total nonwork-related time respondent spent alone (in minutes)
time spent sleeping
time spent working at main job
time spent taking class for degree, certification, or licensure
time spent shopping (store, telephone, internet)
time spent taking a lunch break
time spent participating in sports, exercise, or recreation
time spent attending or participating in religious services
U.S. Bureau of Labor Statistics (2025). https://www.bls.gov/tus/data/datafiles-2024.htm.
The data was downloaded from https://www.rug.nl/ggdc/productivity/pwt/ and contains information about different economic measures of countries around the world. The dataset has been subset and variable names have been modified for exercise purposes.
penn_world
A data frame with 12810 rows and 14 variables. Each row represents a country in a specific year.
3-letter ISO country code
country name
currency unit
year
expenditure-side real GDP at chained PPPs (in millions of 2017 US$)
output-side real GDP at chained PPPs (in millions of 2017 US$)
population (in millions)
number of persons engaged (in millions)
average annual hours worked by persons engaged
price level of household consumption (price level of USA GDPo in 2017 = 1)
price level of capital formation (price level of USA GDPo in 2017 = 1)
price level of government consumption (price level of USA GDPo in 2017 = 1)
price level of exports (price level of USA GDPo in 2017 = 1)
price level of imports (price level of USA GDPo in 2017 = 1)
Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), "The Next Generation of the Penn World Table," American Economic Review, 105(10), 3150-3182, available for download at http://www.ggdc.net/pwt/.
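Several of the variables above are meant to be combined, and their units determine how. As a hedged worked example, the ratios below show two common derived quantities. The symbols $Y^{e}$, $Y^{o}$, $P$, $E$, and $H$ are our own shorthand for expenditure-side real GDP, output-side real GDP, population, persons engaged, and average annual hours worked; they are not the dataset's column names, which were modified for exercise purposes and are not listed in this entry.

$$
\text{real GDP per capita} = \frac{Y^{e}\ [\text{millions of 2017 US\$}]}{P\ [\text{millions of people}]} \quad [\text{2017 US\$ per person}]
$$

$$
\text{real output per hour worked} = \frac{Y^{o}\ [\text{millions of 2017 US\$}]}{E\ [\text{millions of persons}] \times H\ [\text{hours per person per year}]} \quad [\text{2017 US\$ per hour}]
$$

Because GDP, population, and persons engaged are all reported in millions, the factor of $10^{6}$ cancels in both ratios. Pairing expenditure-side GDP with population and output-side GDP with hours worked is one common choice, not a requirement of the data.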
The data was scraped from NASA's website https://nssdc.gsfc.nasa.gov/planetary/factsheet/index.html and contains information on the planets of our Solar System.
planets
A data frame with 8 rows and 7 variables. Each row represents a planet.
name of the planet
mass in 10^24 kg
length of day in hours
whether the mean temperature in °C is positive or not, either 'negative' or 'positive'
number of moons
whether the planet has a set of rings around it, either TRUE or FALSE
surface pressure in bars
David R. Williams (2024). https://nssdc.gsfc.nasa.gov/planetary/factsheet/index.html.
How much do fruits and vegetables cost? The United States Department of Agriculture (USDA) Economic Research Service (ERS) estimated average prices for 153 commonly consumed fresh and processed fruits and vegetables. USDA ERS calculated average prices at retail stores using 2022 retail scanner data from Circana (formerly Information Resources Inc. (IRI)). A selection of retail establishments (grocery stores, supermarkets, supercenters, convenience stores, drug stores, and liquor stores) across the United States provides Circana with weekly retail sales data (revenue and quantity).
produce_prices
A data frame with 155 rows and 10 variables:
ID of item
name of produce
form of produce, either 'Canned', 'Dried', 'Fresh', 'Frozen', or 'Juice'
average retail price per pound or per pint
unit for the 'retail_price', either 'per pint' or 'per pound'
For most fruits and vegetables, a cup equivalent is the edible portion that will fit into a 1-cup measuring cup; for raisins and other dried fruit, it is the edible portion that will fit into a 1/2-cup measuring cup; and for leafy vegetables, it is 2 cups. An edible cup equivalent is the unit of measurement used by the U.S. Department of Agriculture and the Department of Health and Human Services to report fruit and vegetable consumption recommendations.
unit for 'cup_equivalent_size'
average retail price per 'cup_equivalent_unit' of produce
type of produce, either 'fruit' or 'vegetables'
year
U.S. Department of Agriculture, Economic Research Service. (2024). Fruit and vegetable prices. https://www.ers.usda.gov/data-products/fruit-and-vegetable-prices
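The cup-equivalent rule described for the produce data maps each kind of produce to the volume of edible portion that counts as one cup equivalent. A minimal sketch of that rule in Python follows, assuming hypothetical category labels 'dried fruit' and 'leafy vegetable' and hypothetical helper functions `cups_per_cup_equivalent()` and `cup_equivalents()`; none of these are part of the dataset, which has no such category column.

```python
# Hypothetical category labels: the produce_prices data does not contain them.
MEASURING_CUP_VOLUME = {
    "dried fruit": 0.5,      # raisins and other dried fruit: a 1/2-cup measure
    "leafy vegetable": 2.0,  # leafy vegetables: 2 cups
}
DEFAULT_CUPS = 1.0  # most fruits and vegetables: a 1-cup measure


def cups_per_cup_equivalent(category: str) -> float:
    """Return how many cups of edible portion make up one cup equivalent."""
    return MEASURING_CUP_VOLUME.get(category, DEFAULT_CUPS)


def cup_equivalents(category: str, cups_of_edible_portion: float) -> float:
    """Convert a volume of edible produce (in cups) into cup equivalents."""
    return cups_of_edible_portion / cups_per_cup_equivalent(category)


print(cup_equivalents("leafy vegetable", 4.0))  # 2.0
print(cup_equivalents("dried fruit", 1.0))      # 2.0
print(cup_equivalents("fresh fruit", 1.5))      # 1.5
```

Falling back to one cup for any unlisted category mirrors the "for most fruits and vegetables" default in the definition.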