Title: | Blue Bike Comprehensive Data |
---|---|
Description: | Facilitates the importation of the Boston Blue Bike trip data since 2015. Functions include the computation of trip distances of given trip data. It can also map the location of stations within a given radius and calculate the distance to nearby stations. Data is from <https://www.bluebikes.com/system-data>. |
Authors: | Ziyue Yang [aut, cre] , Tianshu Zhang [aut] |
Maintainer: | Ziyue Yang <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.3 |
Built: | 2024-11-04 06:05:43 UTC |
Source: | https://github.com/zyang2k/bluebike |
bluebike
Includes functions and datasets that help Blue Bikes users retrieve data and perform data wrangling and visualization.
This package draws on the Boston Blue Bikes trip history published as Blue Bikes System Data. Users can import all monthly trip history data from 2020 to 2022 into a cleaned data set that is ready for analysis.
The package also includes a sample data set of 1,000 trips sampled from February 2022, and a full data set with information about all available stations.
The package also serves as a visualization tool: users can browse for the closest stations and plan trips by computing trip distances.
Available functions are:
import_month_data
Takes in numeric year/month values and imports data from Blue Bikes System Data for the specified time
station_distance
Returns stations sorted by ascending distance from the user's current location
station_radius
Plots the positions of the stations within walking distance of a location (radius `r` in meters) and presents basic information about each station via leaflet
trip_distance
Computes the geographical distance between the start and end stations
Available datasets are:
trip_history_sample
A sample of 1000 trip data entries from February 2022
station_data
A dataset that includes identification, position, and other basic information about bluebike stations
    library(bluebike)
    library(dplyr)

    # Find most used stations: count trips per start station,
    # then sort so the busiest stations come first
    stations <- trip_history_sample %>%
      group_by(`start_station_name`) %>%
      summarize(trips_from = n()) %>%
      arrange(desc(trips_from))
    head(stations)
This function takes in numeric year/month values and imports data for the specified time
import_month_data(year, month)
year | numeric value of year |
month | numeric value of month |
A spec_tbl_df object
    library(dplyr)

    # Pull January 2015 data from the web
    jan_2015 <- import_month_data(2015, 1)

    # Pull first-quarter 2015 data from the web
    spring2015 <- c(1, 2, 3)
    quarter_1_2015 <- lapply(spring2015, import_month_data, year = 2015)
    quarter_1_2015 <- bind_rows(quarter_1_2015)
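The quarter example above extends naturally to ranges that span several years. The following is a minimal sketch of that idea, not part of the package: `import_range` and `stub_importer` are hypothetical names, and the stub stands in for `import_month_data` so the sketch runs without network access. It does not check whether a given month is actually published upstream.

```r
# Build every (year, month) pair in a range and import them one by one.
# `importer` is any function of (year, month) that returns a data frame;
# with the package attached, that could be import_month_data.
import_range <- function(years, months, importer) {
  stopifnot(all(months %in% 1:12))
  grid <- expand.grid(month = months, year = years)
  pieces <- Map(importer, grid$year, grid$month)
  do.call(rbind, pieces)
}

# Hypothetical stub so the sketch runs offline; it returns one row per call.
stub_importer <- function(year, month) {
  data.frame(year = year, month = month)
}

combined <- import_range(2015:2016, 1:3, stub_importer)
nrow(combined)
```

Base `rbind` requires every piece to have the same columns. If the monthly files differ in their columns, `dplyr::bind_rows`, as used in the example above, is the more forgiving choice.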
A dataset that includes identification, position, and other basic information about bluebike stations
station_data
A data frame of 423 rows and 8 columns
Station ID
Station name
Latitude of the station
Longitude of the station
District of the station
Character vector showing whether a station is public
The number of docks at each station
The year that the station was put into service
The data come from the Blue Bikes system data, retrieved from https://www.bluebikes.com/system-data
This function returns stations sorted by ascending distance from the user's current location
station_distance(long, lat)
long | longitude of user location |
lat | latitude of user location |
A tbl_df object showing the distance between the user and the top five closest stations, with ID, name, number of docks, and position
    # Calculate distances for a user at (-71.11467361, 42.34414899)
    # and show the closest five stations
    top_5_station <- head(station_distance(-71.11467361, 42.34414899), 5)
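This page does not show how `station_distance` measures distance. As an illustration of the general idea only, here is a minimal base R sketch that ranks stations by great-circle (haversine) distance from a point. `haversine_m`, `rank_stations`, and the `toy_stations` table (names, columns, and coordinates) are all hypothetical and are not part of the package.

```r
# Great-circle (haversine) distance in meters between points given in
# decimal degrees. The default radius is the commonly used mean Earth radius.
haversine_m <- function(long1, lat1, long2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical stand-in for a station table; names and coordinates are made up.
toy_stations <- data.frame(
  name = c("Station A", "Station B", "Station C"),
  long = c(-71.1200, -71.1147, -71.1000),
  lat = c(42.3500, 42.3441, 42.3600),
  stringsAsFactors = FALSE
)

# Attach a distance column and sort so the nearest station comes first.
rank_stations <- function(stations, long, lat) {
  stations$distance <- haversine_m(long, lat, stations$long, stations$lat)
  stations[order(stations$distance), ]
}

ranked <- rank_stations(toy_stations, long = -71.11467, lat = 42.34415)
head(ranked, 2)
```

Sorting the full table and trimming it with `head()` mirrors the usage in the example above, where the caller decides how many stations to keep.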
This function plots the position of the stations within walking distance
station_radius(long, lat, r = 1000)
long | numeric value of longitude |
lat | numeric value of latitude |
r | numeric value of the radius in meters |
A leaflet map
    # Show user at (-71.11467, 42.34415) and set the radius to 2000 m
    station_radius(long = -71.11467, lat = 42.34415, r = 2000)
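The selection step behind a radius map can be sketched without leaflet. The block below is an assumption-laden illustration, not the package's implementation: it keeps the stations whose approximate distance from a point is at most `r` meters, using an equirectangular approximation that is reasonable at city scale. `approx_distance_m`, `stations_within`, and `toy_stations` are hypothetical.

```r
# Equirectangular approximation of distance in meters. Adequate for the
# short distances involved here; less accurate over very long distances.
approx_distance_m <- function(long1, lat1, long2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  x <- (long2 - long1) * to_rad * cos((lat1 + lat2) / 2 * to_rad)
  y <- (lat2 - lat1) * to_rad
  radius * sqrt(x^2 + y^2)
}

# Hypothetical stand-in for a station table; names and coordinates are made up.
toy_stations <- data.frame(
  name = c("Station A", "Station B", "Station C"),
  long = c(-71.1200, -71.1147, -71.1000),
  lat = c(42.3500, 42.3441, 42.3600),
  stringsAsFactors = FALSE
)

# Keep only the stations within r meters of the given point.
stations_within <- function(stations, long, lat, r = 1000) {
  d <- approx_distance_m(long, lat, stations$long, stations$lat)
  stations[d <= r, , drop = FALSE]
}

nearby <- stations_within(toy_stations, long = -71.11467, lat = 42.34415)
nearby$name
```

The default of `r = 1000` simply echoes the usage line above. The filtered table is what a mapping step would then draw as markers.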
This function computes the geographical distance between the start and end stations for trips in a given dataset
trip_distance(data)
data | trip data pulled from the Blue Bikes System Data |
A tbl_df object with an additional distance column
    # Calculate distance for sample trip data
    sample_distance <- trip_distance(trip_history_sample)$distance
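To make "geographical distance between the start and end stations" concrete, here is a minimal sketch that adds a straight-line distance column to a trip table. It assumes a haversine formula and hypothetical column names (`start_long`, `start_lat`, `end_long`, `end_lat`). The package's own formula and the real column names may differ, and `add_distance` and `toy_trips` are illustrative only.

```r
# Great-circle (haversine) distance in meters; inputs in decimal degrees.
haversine_m <- function(long1, lat1, long2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  a <- sin((lat2 - lat1) * to_rad / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) *
    sin((long2 - long1) * to_rad / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical trip table; column names and coordinates are made up.
# The second trip starts and ends at the same point.
toy_trips <- data.frame(
  start_long = c(-71.1200, -71.1147),
  start_lat = c(42.3500, 42.3441),
  end_long = c(-71.1147, -71.1147),
  end_lat = c(42.3441, 42.3441)
)

# Return the trips with an extra `distance` column, in meters.
add_distance <- function(trips) {
  trips$distance <- haversine_m(
    trips$start_long, trips$start_lat,
    trips$end_long, trips$end_lat
  )
  trips
}

with_distance <- add_distance(toy_trips)
with_distance$distance
```

A station-to-station distance of this kind is a lower bound on the distance actually ridden. A trip that starts and ends at the same station gets a distance of zero regardless of its duration.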
A random sample of Blue Bikes trip history data from February 2022
trip_history_sample
A data frame of 1,000 rows, each representing one sampled trip
Trip duration of each trip measured in seconds
Start time and date of each trip
Stop time and date of each trip
The identification variable of the start station
The name of the start station
The latitude of the start station
The longitude of the start station
The identification variable of the end station
The name of the end station
The latitude of the end station
The longitude of the end station
The identification variable of the bike corresponding to each trip
Type of user in each trip (Casual = Single Trip or Day Pass user; Member = Annual or Monthly Member)
Postal code of the user
The data come from the Blue Bikes system data, retrieved from https://www.bluebikes.com/system-data