6 Data Visualizations

6.1 Simple plot

Practice pushing the calculations to the database

  1. Load the connections, dplyr, dbplyr, and config libraries

  2. Use connection_open() to open a Database connection

  3. Use tbl() to create a pointer to the v_orders table

  4. Use collect() bring back the aggregated results into a “pass-through” variable called by_year

  5. Preview the by_year variable

    ## # A tibble: 3 x 2
    ##   date_year      n
    ##       <int>  <int>
    ## 1      2017 364317
    ## 2      2016 366796
    ## 3      2018 268934
  6. Load the ggplot2 library

  7. Plot results using ggplot2

  8. Using the code in this section, create a single piped code set which also creates the plot

6.2 Plot in one code segment

Practice going from dplyr to ggplot2 without using pass-through variable, great for EDA

  1. Summarize the order totals in a new variable called sales

    ## Warning: Missing values are always removed in SQL.
    ## Use `SUM(x, na.rm = TRUE)` to silence this warning
    ## This warning is displayed only once per session.
    ## # Source:   lazy query [?? x 1]
    ## # Database: postgres [rstudio_admin@localhost:5432/postgres]
    ##      sales
    ##      <dbl>
    ## 1 38014000
  2. Summarize the order totals grouped by date_year in a new variable called sales

    ## # Source:   lazy query [?? x 2]
    ## # Database: postgres [rstudio_admin@localhost:5432/postgres]
    ##   date_year    sales
    ##       <int>    <dbl>
    ## 1      2017 13911500
    ## 2      2016 13998800
    ## 3      2018 10103600
  3. Summarize the order totals grouped by date_year in a new variable called sales and plot the results

  4. Switch the calculation to reflect the average of the order sale total

    ## Warning: Missing values are always removed in SQL.
    ## Use `mean(x, na.rm = TRUE)` to silence this warning
    ## This warning is displayed only once per session.

6.3 Create a histogram

Use the dbplot package to easily create a histogram

  1. Load the dbplot package

  2. Use the dbplot_histogram() to build the histogram

  3. Adjust the binwidth to 10

6.4 Raster plot

Use dbplot’s raster graph

  1. Use a dbplot_raster() to visualize order_qty versus order_total

  2. Change the plot’s resolution to 10

6.5 Using the compute functions

  1. Use the db_compute_raster() function to get the underlying results that feed the plot

  2. Preview the locations variable

    ## # A tibble: 58 x 5
    ##    customer_lon customer_lat `n()` customer_lon_2 customer_lat_2
    ##           <dbl>        <dbl> <dbl>          <dbl>          <dbl>
    ##  1        -122.         37.8 10819          -122.           37.8
    ##  2        -122.         37.7 22034          -122.           37.8
    ##  3        -122.         37.8 10906          -122.           37.8
    ##  4        -122.         37.8 11574          -122.           37.8
    ##  5        -122.         37.8 33725          -122.           37.8
    ##  6        -122.         37.8 20083          -122.           37.8
    ##  7        -122.         37.7 11475          -122.           37.7
    ##  8        -122.         37.7 23571          -122.           37.7
    ##  9        -122.         37.8 11416          -122.           37.8
    ## 10        -122.         37.8 11089          -122.           37.8
    ## # … with 48 more rows
  3. Load the leaflet library

  4. Pipe location into the leaflet() function, and then pipe that into the addTiles() function

  5. Add the addRectangles() function using the longitude and latitude variables

  1. Add the fillOpacity argument to the addRectangles() step, use n() as the value for it
  1. Modify fillOpacity to be calculated as a percentage against the maximum number of orders
  1. Add the popup argument with the following instruction as its value: ~paste0("<p>No of orders: ",n(),"</p>")
  1. Disconnect from the database using connection_close