Connecting R to MySQL: How to Connect and Work with Database Objects

Estimated time:
time
min

If you're a data scientist or a data engineer, chances are you're working with databases daily. For a newcomer, it's essential to know how to establish connections with various database vendors and how to run basic SQL commands. That's where this article comes in. After reading, you'll know how to approach connecting R to MySQL database, how to create tables from R, how to insert data to the database and get it back to R. You'll also get a practical guide on dumping R aggregations to MySQL and visualizing them with ggplot2. <blockquote>Are you new to SQL? <a href="https://appsilon.com/introduction-to-sql/">Here are 5 key concepts every data professional must know</a>.</blockquote> Table of contents: <ul><li><a href="#database-setup">MySQL Database Setup on AWS - A Complete Guide</a></li><li><a href="#connect">How to Connect R to MySQL with RMySql</a></li><li><a href="#create-table">Connecting R to MySQL - Creating Tables</a></li><li><a href="#insert-data">Inserting and Selecting Data into MySQL from R</a></li><li><a href="#aggregate-data">Connecting R to MySQL - Data Manipulation and Aggregation</a></li><li><a href="#visualize-data">Visualizing R MySQL Data with ggplot2</a></li><li><a href="#summary">Connecting R to MySQL - Summary</a></li></ul> <hr /> <h2 id="database-setup">MySQL Database Setup on AWS - A Complete Guide</h2> Before connecting to a MySQL database, we first have to provision it. We could install the database locally to simplify things, but we prefer replicating real-world scenarios. For that reason, we'll set up an instance on Amazon AWS. It's assumed you have an AWS account configured already. <h3>Creating a New Database Instance</h3> To begin, log into your AWS account and navigate to <i>RDS - Databases</i>. You'll see an empty list if this is your first time here: <img class="size-full wp-image-20285" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d730d2619e892f98a862_7f482d0e_1-1.webp" alt="Image 1 - Creating an Amazon RDS instance" width="2318" height="1322" /> Image 1 - Creating an Amazon RDS instance Click on the big orange <i>Create database</i> button to start the new database provisioning process. It will redirect you to the following page. We'll use the standard database creation mode and will choose MySQL as the engine of choice: <img class="size-full wp-image-20287" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73080c147fd384bd3b1_4ea70c2f_2-1.webp" alt="Image 2 - Standard MySQL engine" width="2318" height="1718" /> Image 2 - Standard MySQL engine Go with the default MySQL version 8 and don't forget to use the <b>Free tier</b> template. Otherwise, things could get expensive: <img class="size-full wp-image-20289" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d731c77436744522f3e8_27d9ca7c_3-1.webp" alt="Image 3 - Free Tier MySQL engine version" width="2318" height="1948" /> Image 3 - Free Tier MySQL engine version Scroll down to the <i>Settings</i> section. This is the area in which you'll specify the database identifier (irrelevant if you only have one database instance), username, and password. Make sure to remember these because they'll be essential for establishing a connection from R: <img class="size-full wp-image-20291" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7328e4ed268cf598c66_a487aee9_4-1.webp" alt="Image 4 - MySQL database credentials" width="2318" height="1948" /> Image 4 - MySQL database credentials Further down, you can leave most of the settings as default. Choose the lightest compute instance you can with the least amount of storage: <img class="size-full wp-image-20293" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d732ef40262c8e94d245_6f6ac339_5-1.webp" alt="Image 5 - Instance configuration and storage" width="2318" height="1948" /> Image 5 - Instance configuration and storage Also, it's a good practice to disable storage autoscaling if you're just testing things out: <img class="size-full wp-image-20295" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7337a511f178f8e25fa_dfda37dc_6-1.webp" alt="Image 6 - Storage autoscaling" width="2318" height="1948" /> Image 6 - Storage autoscaling In the <i>Security</i> section, make sure to allow public access to the instance. This won't be enough to actually connect to it, but more about that in a bit: <img class="size-full wp-image-20297" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d733c92cb2a06ec25d44_d310d1fa_7-1.webp" alt="Image 7 - Instance connectivity" width="2318" height="1948" /> Image 7 - Instance connectivity And finally, click on the <i>Create database</i> button to trigger the provisioning process: <img class="size-full wp-image-20299" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73480c147fd384bd657_d4e50e62_8-1.webp" alt="Image 8 - Creating a database instance" width="2318" height="2114" /> Image 8 - Creating a database instance And that's it! You'll be redirected back to the <i>Databases</i> screen in which you'll be able to monitor the provisioning process. Wait until the status goes to <i>Available</i> and then continue reading. <h3>Obtaining Database Connection Parameters</h3> Here's what you'll see on the screen once your database instance becomes available: <img class="size-full wp-image-20301" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d735dbea942dfb09249d_6dafa1c5_9-1.webp" alt="Image 9 - RDS databases" width="2318" height="1292" /> Image 9 - RDS databases You can now click on the database identifier to obtain the connection parameters. These will be located in the second box under <i>Connectivity &amp; Security:</i> <img class="size-full wp-image-20303" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73636826c908435f2e3_4929a993_10-1.webp" alt="Image 10 - MySQL connection parameters" width="2318" height="1292" /> Image 10 - MySQL connection parameters Note down the <i>Endpoint</i> and <i>Port</i> values - you'll need both in a bit. <h3>Modifying Database Security Rules</h3> So, you've provisioned the database and obtained the connection parameters, but that's not enough to establish a connection. By default, the VPC security group assigned to the database instance won't allow traffic to this particular port (3306). To change that, click on the assigned VPC security group visible in <i>Image 10</i>. You'll get redirected to the following screen: <img class="size-full wp-image-20305" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d736537e1bbc53969027_af5fb0a2_11-1.webp" alt="Image 11 - Inbound rules for the instance" width="2318" height="2028" /> Image 11 - Inbound rules for the instance Under the <i>Inbound rules</i> section, click on <i>Edit inbound rules</i>. Add a new rule to allow traffic from anywhere to the 3306 port, as shown below: <img class="size-full wp-image-20307" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d737d48dce80f00f7de9_2adfcfc4_12-1.webp" alt="Image 12 - New inbound rule" width="2318" height="2028" /> Image 12 - New inbound rule Scroll down and save your edited inbound rules. That's all you need to modify in order to establish a connection. Let's go over that next. <h3>Establishing a Connection and Creating a Database</h3> We won't use R just yet since we only want to verify the database was provisioned successfully and that you can access it from anywhere. We're using a free GUI tool named <a href="https://tableplus.com/" target="_blank" rel="noopener">TablePlus</a>. You can also download it, enter the connection parameters, and click o <i>Test</i> to test the connection. If the fields get a green overlay, it means the connection to your database can be established: <img class="size-full wp-image-20309" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7382c764ec47711f8e5_20462bbc_13-1.webp" alt="Image 13 - Establishing a instance connection" width="1000" height="864" /> Image 13 - Establishing a instance connection <i>In addition, you'll also need to create a new database to store our tables. You can do so directly from TablePlus. We've named ours <code>data_db</code>:</i> <img class="size-full wp-image-20311" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7399006b587a562ac85_67cda027_14-1.webp" alt="Image 14 - Creating a new database" width="2230" height="1720" /> Image 14 - Creating a new database And that's it! The following sections will go over connecting R to MySQL. <h2 id="connect">How to Connect R to MySQL with RMySql</h2> This section will explain how to establish an R MySQL connection and what are the best practices for doing so. You'll also learn how to hide sensitive information from R scripts. <h3>Setting up the .env File</h3> Think of the <code>.env</code> file as a file that holds environment variables for your project. You can separate sensitive info and auth credentials from your R code, and simply add <code>.env</code> file to <code>.gitignore</code>. Everyone will still be able to see your code, but no one will know your security credentials. To start, create a file named <code>.env</code>. There's no actual name for the file, only the extension. Once created, paste the following inside it: <pre><code class="langauge-text">DB_HOST= DB_PORT= DB_USER= DB_PASS= DB_NAME=</code></pre> And, of course, fill it with your database information. There's no need to surround strings with quotes, so keep that in mind. <h3>Establishing R MySQL Connection</h3> We'll use the <code>dotenv</code> R package to load the environment variables, and <code>RMySQL</code> to establish a connection with a MySQL database. Install any of these if needed. At the top of the script, we're loading the environment variables from the <code>.env</code> file, and then calling <code>Sys.getenv("variable-name")</code> to read them. As for the connection, we're using the <code>dbConnect()</code> function and passing in all the sensitive information stored inside the <code>.env</code> file: <pre><code class="language-r">library(RMySQL) dotenv::load_dot_env()</code></pre> <pre><code class="language-r">conn &lt;- dbConnect(  MySQL(),  user = Sys.getenv("DB_USER"),  password = Sys.getenv("DB_PASS"),  host = Sys.getenv("DB_HOST"),  port = as.integer(Sys.getenv("DB_PORT")),  dbname = Sys.getenv("DB_NAME") )</code></pre> If you don't get any errors running this code, it means you've successfully established a connection to the MySQL database from R! Up next, let's see how to create tables. <h2 id="create-table">Connecting R to MySQL - Creating Tables</h2> We're now connected to a remote MySQL database instance. To be more precise, we're connected to a separate <code>data_db</code> database inside it. We'll create a new table <code>inventory</code> to keep track of dummy warehouse items. The table will have a couple of fields, such as item identifier, name, quantity, and stock date. Here's the command you need to run to create the table: <pre><code class="language-r">dbSendQuery(  conn = conn,  statement = paste(    "CREATE TABLE inventory (",    "inv_id INT NOT NULL AUTO_INCREMENT,",    "inv_item_id INT,",    "inv_item_name VARCHAR(255),",    "inv_item_qty INT,",    "inv_item_stock_date DATE,",    "PRIMARY KEY (inv_id)",    ")"  ) )</code></pre> As you can see, we're using the <code>dbSendQuery()</code> function to run a SQL statement on the database. The statement itself is just one lengthy string split into multiple smaller ones with the R's <code>paste()</code> function. These are the results you'll see in the R console: <img class="size-full wp-image-20313" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73a374a4f9b8ed44b7d_699295c9_15-1.webp" alt="Image 15 - Create table statement execution" width="1176" height="842" /> Image 15 - Create table statement execution Not a lot of info, and we don't know if the table was created. Luckily, there's a handy function that lists all of the tables in the database - <code>dbLisTables()</code>: <pre><code class="language-r">dbListTables(conn = conn)</code></pre> Here's the output: <img class="size-full wp-image-20315" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73b9e4d047ca486e2fd_eb29bb20_16-1.webp" alt="Image 16 - Listing tables in the database" width="638" height="118" /> Image 16 - Listing tables in the database It looks like the table was created! Let's proceed with inserting data from R. <h2 id="insert-data">Inserting and Selecting Data into MySQL from R</h2> This section of the article will show you both how to insert data from R to MySQL and how to get the data back to R. Let's start by inserting a couple of rows. <h3>How to Insert Data to MySQL from R</h3> The most obvious way to insert a row into a table is by running a SQL <code>INSERT</code> statement. It allows you to specify the columns for which you want to insert the value and the actual values you want to insert. We'll keep things simple and run multiple <code>INSERT</code> statement to simulate inventory changes across two days: <pre><code class="language-r">dbSendQuery(conn = conn, statement = "INSERT INTO inventory (inv_item_id, inv_item_name, inv_item_qty, inv_item_stock_date) VALUES (1, 'Banana', 12, '2023-01-01')") dbSendQuery(conn = conn, statement = "INSERT INTO inventory (inv_item_id, inv_item_name, inv_item_qty, inv_item_stock_date) VALUES (2, 'Apple', 21, '2023-01-01')") dbSendQuery(conn = conn, statement = "INSERT INTO inventory (inv_item_id, inv_item_name, inv_item_qty, inv_item_stock_date) VALUES (3, 'Pear', 6, '2023-01-01')") dbSendQuery(conn = conn, statement = "INSERT INTO inventory (inv_item_id, inv_item_name, inv_item_qty, inv_item_stock_date) VALUES (1, 'Banana', 24, '2023-01-02')") dbSendQuery(conn = conn, statement = "INSERT INTO inventory (inv_item_id, inv_item_name, inv_item_qty, inv_item_stock_date) VALUES (3, 'Pear', 13, '2023-01-02')") dbCommit(conn = conn)</code></pre> These five rows are now inserted, so let's verify that by fetching them to R. <h3>How to Select Data from MySQL to R</h3> To retrieve records from a MySQL table in R, you'll have to combine the <code>dbSendQuery()</code> and <code>dbFetch()</code> functions. The first one is used to run a SQL <code>SELECT</code> statement, and the second one fetches the results and stores them into a variable. Here's how you can combine these two functions: <pre><code class="language-r">inventory_items &lt;- dbSendQuery(conn = conn, statement = "SELECT * FROM inventory") inventory_items_df &lt;- dbFetch(inventory_items) inventory_items_df</code></pre> And this is what the <code>inventory_items_df</code> variable holds: <img class="size-full wp-image-20317" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73b019bb567ab86078c_94bb203d_17-1.webp" alt="Image 17 - R data frame from MySQL" width="1484" height="368" /> Image 17 - R data frame from MySQL In a nutshell, we now have a full R data frame containing all records from a MySQL table. Neat! <h2 id="aggregate-data">Connecting R to MySQL - Data Manipulation and Aggregation</h2> In this section, you'll learn how to aggregate and summarize our inventory data, and also how to store it in a MySQL directory from an R data frame format. <h3>Total Item Quantities</h3> The goal of this section is to calculate the number of items available for each inventory item. The idea is to use these for oversimplified inventory tracking. Anyhow, R's <code>dplyr</code> package provides everything you need to group and summarize the dataset: <pre><code class="language-r">library(dplyr) <br>item_qty &lt;- inventory_items_df %&gt;%  group_by(inv_item_name) %&gt;%  summarize(    items = sum(inv_item_qty)  ) item_qty</code></pre> This is what it looks like: <img class="size-full wp-image-20319" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73ce2130d9f9d22373c_60dadce3_18-1.webp" alt="Image 18 - Item quantities" width="634" height="364" /> Image 18 - Item quantities Let's do another data frame aggregation before dumping both into the database. <h3>Inventory Stock Statistics</h3> Now the goal is to calculate the number of items in stock per day, both in per-day volumes and in cumulative terms. These could potentially give the inventory manager an idea of how much storage is occupied. Once again, R's <code>dplyr</code> has everything you need for the calculations: <pre><code class="language-r">in_stock_stats &lt;- inventory_items_df %&gt;%  group_by(inv_item_stock_date) %&gt;%  summarize(    n_items_stocked = sum(inv_item_qty)  ) %&gt;%  mutate(    n_items_in_stock = cumsum(n_items_stocked)  ) %&gt;%  ungroup() in_stock_stats</code></pre> And this is the resulting data frame: <img class="size-full wp-image-20321" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73c6b3c80c04fb85809_a7edc58a_19-1.webp" alt="Image 19 - Stock statistics" width="1192" height="314" /> Image 19 - Stock statistics Let's now see how to dump both to MySQL. <h3>Save Aggregated R data.frame to MySQL</h3> A typical programming language and database workflow implies creating a table first, and then iterating over a data structure to insert rows. Luckily, that's not the case with R. The <code>dbWriteTable()</code> function allows you to dump the data frame directly to a database table. You simply have to provide the table name (the table doesn't have to exist) and your data frame data structure - R takes care of the rest: <pre><code class="language-r">dbWriteTable(conn = conn, name = "inventory_qty", value = item_qty, row.names = FALSE) dbWriteTable(conn = conn, name = "inventory_stats", value = in_stock_stats, row.names = FALSE)</code></pre> The <code>row.names</code> parameter ensures column names aren't treated as a single entry for the database table. This is the output you should see after running the above two lines of code: <img class="size-full wp-image-20323" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a833b68120d1319ffdd_20-1.webp" alt="Image 20 - Writing data.frames to MySQL" width="2120" height="206" /> Image 20 - Writing data.frames to MySQL We can now use the <code>dbListTables()</code> function once again to verify if the tables were created: <pre><code class="language-r">dbListTables(conn = conn)</code></pre> And it looks like they were: <img class="size-full wp-image-20325" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a8455ffbe81ba27f75c_21-1.webp" alt="Image 21 - List of database tables" width="1268" height="106" /> Image 21 - List of database tables Up next, we'll fetch data from these tables and visualize it. <h2 id="visualize-data">Visualizing R MySQL Data with ggplot2</h2> The goal of this section is to retrieve the previously aggregated and stored data and visualize it with <code>ggplot2</code>. <h3>Total Item Quantities</h3> Let's start with the inventory item quantities. You can retrieve table data the same way as before - by combining the <code>dbSendQuery()</code> and <code>dbFetch()</code> functions: <pre><code class="language-r">library(ggplot2) <br>db_inv_qty &lt;- dbSendQuery(conn = conn, statement = "SELECT * FROM inventory_qty") db_inv_qty &lt;- dbFetch(db_inv_qty) db_inv_qty</code></pre> Here's what the data frame looks like: <img class="size-full wp-image-20327" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73e36826c908435f632_53e8a1c5_22.webp" alt="Image 22 - R data frame from database" width="494" height="260" /> Image 22 - R data frame from database And now we'll create a simple bar chart showcasing how many items do we have per item type. We'll also put bar counts on top of the bars for easier reading: <pre><code class="language-r">ggplot(db_inv_qty, aes(x = inv_item_name, y = items)) +  geom_bar(stat = "identity", fill = "skyblue") +  geom_text(aes(label = items), vjust = -0.5, color = "black", size = 4) +  labs(title = "Items Stocked", x = "Item Name", y = "Number of Items") +  theme_minimal()</code></pre> <img class="size-full wp-image-20329" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d73f09973e5f1b9817de_76fe8dc5_23.webp" alt="Image 23 - Visualizing total item quantities" width="2202" height="1538" /> Image 23 - Visualizing total item quantities That was simple, wasn't it? Don't worry if you don't feel at home with <code>gggplot2</code> - we'll provide a handful of useful links in a bit. <h3>Inventory Stock Statistics</h3> As for the inventory stock statistics, we can retrieve the data the same way as before. The only thing that changes is the table name: <pre><code class="language-r">db_inv_stats &lt;- dbSendQuery(conn = conn, statement = "SELECT * FROM inventory_stats") db_inv_stats &lt;- dbFetch(db_inv_stats) db_inv_stats</code></pre> This is what the data frame looks like: <img class="size-full wp-image-20331" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74057c12c8e1dd8bdd8_94a5d302_24.webp" alt="Image 24 - R data frame from database" width="1218" height="208" /> Image 24 - R data frame from database And for the visualization, we'll create a two-layer chart. The first layer will show the number of inventory items stocked each day, and the second layer will show the same thing but in a cumulative manner: <pre><code class="language-r">ggplot(db_inv_stats, aes(x = inv_item_stock_date)) +  geom_bar(aes(y = n_items_stocked), stat = "identity", fill = "skyblue") +  geom_line(aes(y = n_items_in_stock, group = 1), color = "red", linewidth = 1.5) +  labs(    title = "Items Stocked and In Stock",    x = "Stock Date",    y = "Number of Items"  ) +  theme_minimal()</code></pre> <img class="size-full wp-image-20333" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01b8d81e2982979e04e66_25.webp" alt="Image 25 - Visualizing stock statistics" width="2202" height="1538" /> Image 25 - Visualizing stock statistics And that's how you can connect to a MySQL database from R and make amazing data visualizations - all in a couple of lines of code. Do you need more assistance in understanding <code>ggplot2</code> code? We have plenty of articles for you, covering <a href="https://appsilon.com/ggplot2-bar-charts/" target="_blank" rel="noopener">bar charts</a>, <a href="https://appsilon.com/ggplot2-line-charts/" target="_blank" rel="noopener">line charts</a>, <a href="https://appsilon.com/ggplot-scatter-plots/" target="_blank" rel="noopener">scatter plots</a>, <a href="https://appsilon.com/ggplot2-boxplots/" target="_blank" rel="noopener">box plots</a>, and <a href="https://appsilon.com/ggplot2-histograms/" target="_blank" rel="noopener">histograms</a>. <hr /> <h2 id="summary">Connecting R to MySQL - Summary</h2> To conclude, MySQL is the first database you've probably heard of. It's everywhere and can be installed for free locally and on your remote servers. Also, it's really simple to connect to it from R. Today you've learned how to approach connecting R to MySQL and how to run basic database operations. You've also seen how R can be used to store data frames directly in MySQL, which is not something a lot of programming languages offer. <i>What's your favorite way of connecting to a MySQL database from R? Also, where do you host your databases?</i> Let us know in the comment section below. <blockquote>R Shiny Engineer sure sounds like an interesting and promising career, but where do you start? <a href="https://appsilon.com/how-to-start-a-career-as-an-r-shiny-developer/" target="_blank" rel="noopener">Here's our detailed guide to go from zero to hired</a>.</blockquote>

Contact us!
Damian's Avatar
Damian Rodziewicz
Head of Sales
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Have questions or insights?
Engage with experts, share ideas and take your data journey to the next level!
Join Slack

Take Your Business Further with Custom
Data Solutions

Unlock the full potential of your enterprise with our data services, tailored to the unique needs of Fortune 500 companies. Elevate your strategy—connect with us today!

r
database