6 Data Visualizations
6.1 Simple plot
Practice pushing the calculations to the database
Load the
connections
,dplyr
,dbplyr
, andconfig
librariesUse
connection_open()
to open a Database connectionUse
tbl()
to create a pointer to the v_orders tableUse
collect()
bring back the aggregated results into a “pass-through” variable calledby_year
Preview the
by_year
variable## # A tibble: 3 x 2 ## date_year n ## <int> <int> ## 1 2017 364317 ## 2 2016 366796 ## 3 2018 268934
Load the
ggplot2
libraryPlot results using
ggplot2
Using the code in this section, create a single piped code set which also creates the plot
6.2 Plot in one code segment
Practice going from dplyr
to ggplot2
without using pass-through variable, great for EDA
Summarize the order totals in a new variable called
sales
## Warning: Missing values are always removed in SQL. ## Use `SUM(x, na.rm = TRUE)` to silence this warning ## This warning is displayed only once per session.
## # Source: lazy query [?? x 1] ## # Database: postgres [rstudio_admin@localhost:5432/postgres] ## sales ## <dbl> ## 1 38014000
Summarize the order totals grouped by
date_year
in a new variable calledsales
## # Source: lazy query [?? x 2] ## # Database: postgres [rstudio_admin@localhost:5432/postgres] ## date_year sales ## <int> <dbl> ## 1 2017 13911500 ## 2 2016 13998800 ## 3 2018 10103600
Summarize the order totals grouped by
date_year
in a new variable calledsales
and plot the resultsSwitch the calculation to reflect the average of the order sale total
orders %>% group_by(date_year) %>% summarise(sales = mean(order_total)) %>% ggplot() + geom_col(aes(date_year, sales))
## Warning: Missing values are always removed in SQL. ## Use `mean(x, na.rm = TRUE)` to silence this warning ## This warning is displayed only once per session.
6.3 Create a histogram
Use the dbplot
package to easily create a histogram
Load the
dbplot
packageUse the
dbplot_histogram()
to build the histogramAdjust the
binwidth
to 10
6.4 Raster plot
Use dbplot
’s raster graph
Use a
dbplot_raster()
to visualizeorder_qty
versusorder_total
Change the plot’s resolution to 10
6.5 Using the compute
functions
Use the
db_compute_raster()
function to get the underlying results that feed the plotPreview the
locations
variable## # A tibble: 58 x 5 ## customer_lon customer_lat `n()` customer_lon_2 customer_lat_2 ## <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 -122. 37.8 10819 -122. 37.8 ## 2 -122. 37.7 22034 -122. 37.8 ## 3 -122. 37.8 10906 -122. 37.8 ## 4 -122. 37.8 11574 -122. 37.8 ## 5 -122. 37.8 33725 -122. 37.8 ## 6 -122. 37.8 20083 -122. 37.8 ## 7 -122. 37.7 11475 -122. 37.7 ## 8 -122. 37.7 23571 -122. 37.7 ## 9 -122. 37.8 11416 -122. 37.8 ## 10 -122. 37.8 11089 -122. 37.8 ## # … with 48 more rows
Load the
leaflet
libraryPipe
location
into theleaflet()
function, and then pipe that into theaddTiles()
functionAdd the
addRectangles()
function using the longitude and latitude variables
locations %>%
leaflet() %>%
addTiles() %>%
addRectangles(
~customer_lon,
~customer_lat,
~customer_lon_2,
~customer_lat_2
)
- Add the
fillOpacity
argument to theaddRectangles()
step, usen()
as the value for it
locations %>%
leaflet() %>%
addTiles() %>%
addRectangles(
~customer_lon,
~customer_lat,
~customer_lon_2,
~customer_lat_2,
fillOpacity = ~`n()`
)
- Modify
fillOpacity
to be calculated as a percentage against the maximum number of orders
locations %>%
leaflet() %>%
addTiles() %>%
addRectangles(
~customer_lon,
~customer_lat,
~customer_lon_2,
~customer_lat_2,
fillOpacity = ~(`n()` / max(`n()`))
)
- Add the
popup
argument with the following instruction as its value:~paste0("<p>No of orders: ",
n(),"</p>")
locations %>%
leaflet() %>%
addTiles() %>%
addRectangles(
~customer_lon,
~customer_lat,
~customer_lon_2,
~customer_lat_2,
fillOpacity = ~(`n()` / max(`n()`)),
popup = ~paste0("<p>No of orders: ", `n()`,"</p>")
)
Disconnect from the database using
connection_close