# bluebike: A Data Package for Bluebike users

Our package provides Boston Blue Bike trip history data acquired from the Blue Bikes System Data. Users can import all monthly trip history data from 2020 to 2022 into a cleaned data set that can easily be used for data analysis.

The package also includes a sample data set of 1000 sampled trips from Feb. 2022, and a full data set that contains information about all available stations.

Functions inside the package:

- `import_month_data()`: takes numeric year and month values and imports the trip data for that month
- `station_distance()`: returns the stations sorted by distance in ascending order, given the user's current location
- `station_radius()`: plots the positions of the stations within walking distance (500 m) and presents basic information about them via leaflet
- `trip_distance()`: computes the geographical distance between the start and end stations of each trip

The package is a useful tool for Blue Bike operations to analyze trip data and improve the shared bike service based on user data. It is also an easy-to-use tool for data analysis and visualization for anyone interested in Blue Bike trip data.

## Data Sets Included

- `trip_history_sample`: a sample of 1000 trip records from February 2022.
- `station_data`: a data set that includes identification, position, and other basic information about Bluebike stations.

`import_month_data` enables users to retrieve monthly data from the Bluebike System Data website.


    library(bluebike)

    jan2015 <- import_month_data(2015, 1)
    #> Rows: 7840 Columns: 15
    #> ── Column specification ────────────────────────────────────────────────────────
    #> Delimiter: ","
    #> chr  (4): start station name, end station name, usertype, birth year
    #> dbl  (9): tripduration, start station id, start station latitude, start stat...
    #> dttm (2): starttime, stoptime
    #>
    #> ℹ Use `spec()` to retrieve the full column specification for this data.
    #> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

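Monthly trip files are usually identified by a zero-padded year-month key. The sketch below shows one way a numeric year/month pair could be validated and turned into such a key. `month_key()` is a hypothetical helper written for illustration; it is not part of the package, and the `YYYYMM` naming is an assumption about how the monthly files are labelled.

```r
# Hypothetical helper (not part of the bluebike package): validate a numeric
# year/month pair and format it as a zero-padded "YYYYMM" key.
month_key <- function(year, month) {
  stopifnot(
    is.numeric(year), length(year) == 1,
    is.numeric(month), length(month) == 1,
    month >= 1, month <= 12
  )
  sprintf("%04d%02d", as.integer(year), as.integer(month))
}

month_key(2015, 1)
#> [1] "201501"
```

Failing early on an out-of-range month avoids requesting a file that cannot exist.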
With the `trip_history_sample` included in the package, the user can easily count the trips that started at each station, the first step toward finding the most popular station in Feb. 2022:

    library(dplyr)

    stations <- trip_history_sample %>%
      group_by(start_station_name) %>%
      summarize(trips_from = n())
    head(stations)
    #> # A tibble: 6 × 2
    #>   start_station_name                        trips_from
    #>   <chr>                                          <int>
    #> 1 175 N Harvard St                                   8
    #> 2 191 Beacon St                                      3
    #> 3 30 Dane St                                         7
    #> 4 359 Broadway - Broadway at Fayette Street          4
    #> 5 606 American Legion Hwy at Canterbury St           1
    #> 6 699 Mt Auburn St                                   5

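The counts can then be sorted to surface the busiest station. The sketch below is a minimal illustration that uses only the six rows printed above, copied into a small data frame named `stations_head` (a name chosen here), so the result is not the true Feb. 2022 ranking. With the package loaded, the same `arrange()` call could be applied to the full `stations` table.

```r
library(dplyr)

# Counts copied from the six rows printed above; the full `stations`
# table produced from trip_history_sample has many more rows.
stations_head <- data.frame(
  start_station_name = c(
    "175 N Harvard St",
    "191 Beacon St",
    "30 Dane St",
    "359 Broadway - Broadway at Fayette Street",
    "606 American Legion Hwy at Canterbury St",
    "699 Mt Auburn St"
  ),
  trips_from = c(8L, 3L, 7L, 4L, 1L, 5L),
  stringsAsFactors = FALSE
)

# Sort in descending order of trip count; the first row is the busiest station.
busiest <- stations_head %>%
  arrange(desc(trips_from))

busiest$start_station_name[1]
#> [1] "175 N Harvard St"
```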
With `trip_distance`, the user can compute the average distance traveled, here for a random sample of 1000 trips from Jan. 2015:

    jan_distance <- jan2015 %>%
      sample_n(1000) %>%
      trip_distance()
    mean_jan_distance <- mean(jan_distance$distance)
    mean_jan_distance
    #> 3215.401 [m]

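To make "geographical distance between the start and end stations" concrete, here is a minimal base R sketch of a great-circle (haversine) distance. It is an illustration under stated assumptions, not the package's implementation: `haversine_m()` is a hypothetical helper, the Earth is treated as a sphere with a mean radius of 6371008.8 m, and the result is a plain number rather than the units-tagged `[m]` values the package prints. The example trip is made up, using the coordinates of two stations listed elsewhere in this README.

```r
# Great-circle (haversine) distance in metres between two points given in
# decimal degrees. Hypothetical helper for illustration only.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Made-up trip from "175 N Harvard St" to "Honan Library"
d <- haversine_m(-71.12916, 42.3638, -71.12852, 42.36027)
d
```

Because the function is vectorised, it can be applied to whole columns of start and end coordinates and then averaged with `mean()`.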
`station_distance()` helps the user find the closest stations nearby. Here the location is given as longitude -71.13 and latitude 42.36:

    top_5_station <- station_distance(-71.13, 42.36) %>%
      head(5)
    top_5_station
    #>         distance station_ID                                   station_name
    #> 210 124.9942 [m]     A32040                                  Honan Library
    #> 3   427.6489 [m]     A32019                               175 N Harvard St
    #> 221 606.1752 [m]     A32011 Innovation Lab - 125 Western Ave at Batten Way
    #> 74  660.5163 [m]     A32005               Brighton Mills - 370 Western Ave
    #> 380 954.2026 [m]     A32001    Union Square - Brighton Ave at Cambridge St
    #>               station_position docks
    #> 210 POINT (-71.12852 42.36027)    15
    #> 3    POINT (-71.12916 42.3638)    18
    #> 221  POINT (-71.1246 42.36371)    19
    #> 74  POINT (-71.13776 42.36155)    15
    #> 380 POINT (-71.13731 42.35333)    19

With `leaflet`, the positions of the stations can be displayed:

    library(leaflet)

    leaflet(data = station_data) %>%
      addTiles() %>%
      addCircleMarkers(
        lng = station_data$longitude,
        lat = station_data$latitude,
        radius = 0.1,
        color = "blue"
      )

`station_radius()` plots the positions of the stations within a user-defined radius and displays basic information about them.
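
A map of this kind can be approximated directly with `leaflet`. The sketch below is not how `station_radius()` is implemented. It assumes a 500 m radius around the location (-71.13, 42.36) and reuses the distances, coordinates, and dock counts printed in the `station_distance()` output above, copied by hand into a data frame named `nearby` (a name chosen here).

```r
library(leaflet)

# Values copied from the station_distance() output shown earlier
nearby <- data.frame(
  station_name = c(
    "Honan Library",
    "175 N Harvard St",
    "Innovation Lab - 125 Western Ave at Batten Way"
  ),
  longitude = c(-71.12852, -71.12916, -71.1246),
  latitude = c(42.36027, 42.3638, 42.36371),
  distance_m = c(124.9942, 427.6489, 606.1752),
  docks = c(15, 18, 19),
  stringsAsFactors = FALSE
)

# Keep only the stations inside the assumed walking radius
radius_m <- 500
within_radius <- nearby[nearby$distance_m <= radius_m, ]

# addCircles() takes its radius in metres, whereas addCircleMarkers()
# takes it in pixels, so addCircles() is used to draw the walking radius.
station_map <- leaflet(data = within_radius) %>%
  addTiles() %>%
  addCircles(lng = -71.13, lat = 42.36, radius = radius_m, color = "grey") %>%
  addMarkers(
    lng = ~longitude,
    lat = ~latitude,
    popup = ~paste0(station_name, " (", docks, " docks)")
  )

station_map
```

Clicking a marker shows the station name and dock count, which mirrors the "basic information" described above.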