--- title: "`bluebike`: A Data Package for Bluebike users" author: "Ziyue Yang and Tianshu Zhang" date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{`bluebike`: A Data Package for Bluebike users} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} tags: - R - Rstats - tidyverse - leaflet - bluebike authors: - name: Ziyue Yang orcid: 0000-0002-9299-8327 - name: Tianshu Zhang orcid: 0000-0002-3004-4472 output: rmarkdown::html_vignette: df_print: default number_sections: no --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 5 ) ``` ## Summary Our package includes data from the Boston Blue Bike trip history data acquired from the [Blue Bikes System Data](https://www.bluebikes.com/system-data). The users can import all monthly trip history data from 2020 to 2022 into a cleaned data set that can easily be used for data analysis. \ The package also includes a sample data set that includes 1000 sampled trip history from Feb. 2022, and a full data set that contains information about all available stations. Functions inside the package:\ - `import_month_data()`: takes in numeric year/month values and imports data for the specified time\ - `station_distance()`: returns stations with distance in ascending order given the user's current location\ - `station_radius()`: plots the position of the stations within walking distance (500 m), and present the basic information about the stations via leaflet\ - `trip_distance()`: computes the geographical distance between the start and end stations\ The package would be a useful tool for the Blue Bike operations to analyze the trip data and help improve the shared bike service based on user data. It is also an easy-to-use tool for data analysis and visualization for anyone interested in the Blue Bike trip data. ## Data Sets Included - `trip_history_sample`: a sample of 1000 trip data entries from February 2022. - `station_data`: A dataset that includes identification, position, and other basic information about bluebike stations ## Basic Usage ```{r, message=FALSE, warning=FALSE} library(bluebike) library(dplyr) ``` ### Retrieve data online `import_month_data` enables users to retrieve monthly data from Bluebike System Data website. ```{r} jan2015 <- import_month_data(2015, 1) ``` ### Data Wrangling - Using the cleaned dataset `trip_history_sample` included in the package, the user can easily find out the most popular station in Feb. 2022: ```{r example, message=FALSE, warning=FALSE} stations <- trip_history_sample %>% group_by(start_station_name) %>% summarize(trips_from = n()) head(stations) ``` - Via `trip_distance`, the user can compute the the average distance that user traveled in Jan. 2015 ```{r} jan_distance <- jan2015 %>% sample_n(1000) %>% trip_distance() mean_jan_distance <- mean(jan_distance$distance) mean_jan_distance ``` - The function `station_distance()` helps the user to find the closest stations nearby. ```{r} top_5_station <- station_distance(-71.13, 42.36) %>% head(5) top_5_station ``` ### Data Visualization via Leaflet - Incorporated with the interactive map package `leaflet`, the position of the stations can be displayed: ```{r} library(leaflet) leaflet(data = station_data) %>% addTiles() %>% addCircleMarkers( lng = station_data$longitude, lat = station_data$latitude, radius = 0.1, color = "blue" ) ``` - The function `station_radius()` plots the positions of stations within a certain user defined radius and display basic information about stations available. ```{r} station_radius(-71.13, 42.36, r = 500) ``` ## Contributors - [Ziyue Yang](https://github.com/zyang2k) - [Tianshu Zhang](https://github.com/tianshu-zhang)