Connecting R to Postgres: From Database Creation to Data Manipulation and Charts

Estimated time:
time
min

Data science and statistics revolve around one key term - <i>data</i>. If there's no data, or you can't access it, there are no insights. It makes sense to invest some time into mastering databases and how to work with each straight from R. That's where this article comes in. If you're wondering how to go about connecting R to Postgres, wonder no more. Today you'll learn how to provision a PostgreSQL database on Amazon AWS, and also how to connect to it from R, create tables (2 ways), insert data, select data, and use database data as a basis for data visualization. <blockquote>Working with string data in R can seem tough - but it doesn't have to be. <a href="https://appsilon.com/stringr-examples/" target="_blank" rel="noopener">Read our comprehensive guide to R stringr</a>.</blockquote> Table of contents: <ul><li><a href="#database-setup">Postgres Database Setup on AWS - A Complete Guide</a></li><li><a href="#connect">How to Connect R to Postgres with DBI</a></li><li><a href="#create-table">Connecting R to Postgres - Creating Tables</a></li><li><a href="#insert-data">Inserting and Selecting Data into Postgres from R</a></li><li><a href="#aggregate-data">Connecting R to Postgres - Data Manipulation and Aggregation</a></li><li><a href="#visualize-data">Visualizing R Postgres Data with ggplot2</a></li><li><a href="#summary">Connecting R to Postgres - Summary</a></li></ul> <hr /> <h2 id="database-setup">Postgres Database Setup on AWS - A Complete Guide</h2> First things first, we need a database. This section will walk you through setting one up on Amazon AWS, and you'll also see how to test the connection. It's assumed you have an AWS account configured already. <h3>Creating a New Database Instance</h3> To start, log into your AWS account and navigate to <i>RDS</i>. Once there, click on <i>Databases</i>. You won't see anything there if this is your first time setting up a database on AWS: <img class="size-full wp-image-20200" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d741dbea942dfb0930ae_28228769_1-3.webp" alt="Image 1 - Creating an Amazon RDS database" width="1908" height="1054" /> Image 1 - Creating an Amazon RDS database To start provisioning a database, click on the big orange <i>Create database</i> button. It will redirect you to a configuration page. We'll use the standard database creation mode and will choose PostgreSQL as an engine of choice: <img class="size-full wp-image-20202" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7429add9e4da7ad3a84_c038d77e_2-3.webp" alt="Image 2 - Standard PostgreSQL database creation" width="1908" height="1558" /> Image 2 - Standard PostgreSQL database creation Below the engine selection, make sure to use the <i>Free Tier</i> template. Contradictory, it's not entirely free and will cost you around $0.5 per day: <img class="size-full wp-image-20204" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7422b08605d95aa1697_061b297e_3-3.webp" alt="Image 3 - Using the Free Tier RDS database" width="1908" height="1558" /> Image 3 - Using the Free Tier RDS database As for the database settings, make sure to remember the database instance identifier, username, and password. You'll need these later when establishing a connection, so write them somewhere safe: <img class="size-full wp-image-20206" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d743c24a1b8139c5c501_18910165_4-3.webp" alt="Image 4 - PostgreSQL database identifier, username, and password" width="1908" height="1769" /> Image 4 - PostgreSQL database identifier, username, and password The rest of the settings can be pretty much left on default. We need the cheapest CPU/RAM configuration and the least amount of storage (20 GB). You can also disable storage autoscaling: <img class="size-full wp-image-20208" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74478149a2ea79e9d14_e109d7eb_5-3.webp" alt="Image 5 - Instance class and storage" width="1908" height="1769" /> Image 5 - Instance class and storage Instance connectivity is where you want to pay attention. It's crucial to <b>enable public access</b>, so you can actually access your database from R. This will assign a public IP address to the database: <img class="size-full wp-image-20210" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d744374a4f9b8ed45164_3dda8739_6-3.webp" alt="Image 6 - Instance connectivity and security group" width="1908" height="1857" /> Image 6 - Instance connectivity and security group Almost there! Regarding database authentication, leave it to the default password option: <img class="size-full wp-image-20212" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74587994b7930bbee33_efb9c6a8_7-2.webp" alt="Image 7 - Authentication method" width="1908" height="1730" /> Image 7 - Authentication method And now finally click on the <i>Create database</i> button: <img class="size-full wp-image-20214" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d746c77436744523039c_c53f3842_8-2.webp" alt="Image 8 - Creating the database" width="1908" height="1730" /> Image 8 - Creating the database Provisioning the database might take a couple of minutes. It took between 5 to 10 minutes on our end, so just wait until you see that the database is available: <img class="size-full wp-image-20216" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74716b3ff510e39900c_6de2c645_9-2.webp" alt="Image 9 - Database provisioned successfully" width="1908" height="1024" /> Image 9 - Database provisioned successfully And that's it! You've successfully provisioned a PostgreSQL database on Amazon AWS. Let's see how to connect to it next. <h3>Establishing a Connection to a Postgres Database</h3> Before you can establish a connection, you need to get the connection parameters. Simply click on the database identifier and you'll be redirected to the following screen: <img class="size-full wp-image-20218" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a76a7c6956a1696937e_10-2.webp" alt="Image 10 - Postgres connection parameters" width="1908" height="1615" /> Image 10 - Postgres connection parameters From here, make sure to copy the <i>Endpoint</i> and <i>Port</i> values. You'll also need a username and password, but you already know these (specified when provisioning the database). We won't use R in this section since we only want to verify the database was provisioned successfully and that you can access it from anywhere. We're using a free GUI tool named <a href="https://tableplus.com/" target="_blank" rel="noopener">TablePlus</a>. You can also download it, enter the connection parameters, and click o <i>Test</i> to test the connection. If the fields get a green overlay, it means the connection to your database can be established: <img class="size-full wp-image-20220" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a78307dbcdd40acf78e_11-2.webp" alt="Image 11 - Postgres connection test" width="612" height="504" /> Image 11 - Postgres connection test On the other hand, if the fields get a red overlay, it means you've probably entered one or more of the connection parameters wrong. Let's assume yours is green like ours and proceed to the following section. <h2 id="connect">How to Connect R to Postgres with DBI</h2> This section will walk you through connecting R to Postgres. You'll see the best practice for doing so, and the best practice to remove database credentials from your R scripts. <h3>Setting up the .env File</h3> Okay, so, what is and <code>.env</code> file? It's a file that holds environment variables for your project. You can separate sensitive info and auth credentials from your R code, and simply add <code>.env</code> file to <code>.gitignore</code>. That way, everyone will be able to see your code, but no one will know your security credentials. Neat! To start, create a <code>.env</code> file. It's not a typo - the file has no name, just the extension. Once created, paste the following inside it: <pre><code class="language-text">DB_HOST= DB_PORT= DB_USER= DB_PASS=</code></pre> And, of course, populate it with your database connection values. There's no need to wrap string data with quotes. <h3>Establishing R Postgres Connection</h3> You can now use the <code>dotenv</code> R package to load in the environment variables. Also, load the <code>DBI</code> package for establishing a database connection: <pre><code class="language-r">library(DBI) dotenv::load_dot_env()</code></pre> Of course, install any of these packages if you don't have them already. Finally, to establish an R Postgres connection, use the <code>dbConnect()</code> function from <code>DBI</code>. It requires you to enter connection parameters, which can be obtained from the loaded environment variables: <pre><code class="language-r">pg_conn &lt;- dbConnect(  RPostgres::Postgres(),  dbname = "postgres",  host = Sys.getenv("DB_HOST"),  port = Sys.getenv("DB_PORT"),  user = Sys.getenv("DB_USER"),  password = Sys.getenv("DB_PASS") )</code></pre> And that's the database connection for you. You're good to proceed to the following section if this code block doesn't throw an error. If it does, make sure you've entered the connection parameters correctly and also make sure R can access your environment variables. <h2 id="create-table">Connecting R to Postgres - Creating Tables</h2> The good thing about the R Postgres connection is that you can directly run SQL queries. We'll run one in this section to create a table. There's an alternative table creation method, but more on that in a bit. We'll store some dummy employee data, and the database table will hold information on the employee's name, email, department, salary, and hire date. The <code>dbSendQuery()</code> function is used to run a SQL statement from R: <pre><code class="language-r">dbSendQuery(  conn = pg_conn,  statement = paste(    "CREATE TABLE employees (",    "emp_id SERIAL PRIMARY KEY,",    "emp_first_name VARCHAR(32),",    "emp_last_name VARCHAR(32),",    "emp_email VARCHAR(32),",    "emp_department VARCHAR(16),",    "emp_salary NUMERIC(7, 2),",    "emp_hire_date DATE",    ")"  ) )</code></pre> Provided your SQL statement is written correctly, you should see the following output: <img class="size-full wp-image-20222" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a79e87229504498ecdc_12-2.webp" alt="Image 12 - Table creation output" width="1126" height="213" /> Image 12 - Table creation output <code>DBI</code> also provides a convenient function for listing all database tables - <code>dbListTables()</code>. We'll use it to verify our <code>employees</code> table was created: <pre><code class="language-r">dbListTables(pg_conn)</code></pre> Here's the output: <img class="size-full wp-image-20224" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a7bfa26a4961d77cd8f_13-2.webp" alt="Image 13 - List of all available tables" width="360" height="75" /> Image 13 - List of all available tables Long story short, the table was created successfully. Up next, we'll show you how to insert data into Postgres through R. <h2 id="insert-data">Inserting and Selecting Data into Postgres from R</h2> This section will show you one way of inserting data into a Postgres table, and also how to get the data back from Postgres to R. <h3>How to Insert Data into Postgres from R</h3> The most intuitive way to insert data into a Postgres table is by running multiple SQL <code>INSERT</code> statements. That's exactly what we'll do here. You already know how the <code>dbSendQuery()</code> function works, so there's no need to explain it further. We'll use it here to insert a couple of employees into our table: <pre><code class="language-r">dbSendQuery(conn = pg_conn, statement = "INSERT INTO employees (emp_first_name, emp_last_name, emp_email, emp_department, emp_salary, emp_hire_date) VALUES('John', 'Doe', 'johndoe@company.com', 'IT', 5500, '2020-01-01')") dbSendQuery(conn = pg_conn, statement = "INSERT INTO employees (emp_first_name, emp_last_name, emp_email, emp_department, emp_salary, emp_hire_date) VALUES('Mark', 'Smith', 'marksmith@company.com', 'IT', 6600, '2017-04-12')") dbSendQuery(conn = pg_conn, statement = "INSERT INTO employees (emp_first_name, emp_last_name, emp_email, emp_department, emp_salary, emp_hire_date) VALUES('Jake', 'Dean', 'jakedean@company.com', 'Sales', 4800, '2019-07-01')") dbSendQuery(conn = pg_conn, statement = "INSERT INTO employees (emp_first_name, emp_last_name, emp_email, emp_department, emp_salary, emp_hire_date) VALUES('Meredith', 'Jackson', 'meredithjackson@company.com', 'Sales', 4900, '2019-01-21')") dbSendQuery(conn = pg_conn, statement = "INSERT INTO employees (emp_first_name, emp_last_name, emp_email, emp_department, emp_salary, emp_hire_date) VALUES('Susan', 'Dell', 'susandell@company.com', 'Marketing', 4500, '20179-09-19')") </code></pre> The question now is - how can you know if the rows were inserted? Let's answer that next. <h3>How to Select Postgres Data from R</h3> In order to retrieve records from a Postgres table in R, you'll have to combine the <code>dbSendQuery()</code> and <code>dbFetch()</code> functions. The prior is used to run a SQL <code>SELECT</code> statement, while the latter fetches the results and stores them into a variable. If you prefer code over text, here's the snippet for you: <pre><code class="language-r">employees &lt;- dbSendQuery(conn = pg_conn, statement = "SELECT * FROM employees") employees_df &lt;- dbFetch(employees) employees_df</code></pre> And this is what the resulting employee data frame looks like: <img class="size-full wp-image-20226" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a7b816ab5aa30956418_14-2.webp" alt="Image 14 - Employees data" width="1535" height="249" /> Image 14 - Employees data Up next, we'll aggregate this data and show you an alternative way to store records in a database. <h2 id="aggregate-data">Connecting R to Postgres - Data Manipulation and Aggregation</h2> The data retrieved in the previous section is in a familiar R data frame format. This means you can use packages such as <code>dplyr</code> to perform any sort of data manipulation and aggregation. For example, we'll group the dataset by department and calculate a couple of summary statistics - number of employees per department, total monthly salary, and total yearly salary. We'll also rename one column to get rid of the <code>emp</code> prefix: <pre><code class="language-r">library(dplyr) <br>emp_summary &lt;- employees_df %&gt;%  group_by(emp_department) %&gt;%  summarize(    n_employees = n(),    monthly_salary = sum(emp_salary),    yearly_salary = sum(emp_salary) * 12  ) %&gt;%  rename(    department = emp_department  ) emp_summary</code></pre> Here's what the resulting data frame looks like: <img class="size-full wp-image-20228" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a7c028dd6ffa4ad797f_15-2.webp" alt="Image 15 - Aggregating an R data.frame" width="1364" height="378" /> Image 15 - Aggregating an R data.frame <h3>Save Aggregated R data.frame to Postgres</h3> It would be superb if we could somehow dump the entire R data frame into a database table, without actually needing to create the table first. Well, today's your lucky day. <code>DBI</code> packs a <code>dbWriteTable()</code> function you can use to dump a data frame into a table. It will create the table for you if it doesn't already exist: <pre><code class="language-r">dbWriteTable(conn = pg_conn, name = "employee_summary", value = emp_summary, row.names = FALSE)</code></pre> To verify, we can list the tables to ensure the table was created: <pre><code class="language-r">dbListTables(pg_conn)</code></pre> And it looks like it did: <img class="size-full wp-image-20230" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01a7d700c6e7f89a864ad_16-2.webp" alt="Image 16 - List of all available tables" width="1056" height="132" /> Image 16 - List of all available tables Up next, let's use this aggregated table to make a data visualization. <h2 id="visualize-data">Visualizing R Postgres Data with ggplot2</h2> You can retrieve table data the same way as before - by combining the <code>dbSendQuery()</code> and <code>dbFetch()</code> functions: <pre><code class="language-r">db_emp_summary &lt;- dbSendQuery(conn = pg_conn, statement = "SELECT * FROM employee_summary") emp_summary_df &lt;- dbFetch(db_emp_summary) emp_summary_df</code></pre> Here's what the data frame looks like: <img class="size-full wp-image-20232" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74de4ddb620d508d742_16f72468_17-2.webp" alt="Image 17 - Summarized data" width="1360" height="306" /> Image 17 - Summarized data The idea now is to use <code>ggplot2</code> to make a bar chart showcasing yearly salaries by department, colored by the number of employees working in that department. We'll also add some quick styles and axis labels: <pre><code class="language-r">library(ggplot2) <br>ggplot(emp_summary_df, aes(x = department, y = yearly_salary, fill = n_employees)) +  geom_bar(stat = "identity", position = "dodge") +  labs(    title = "Yearly Salaries Across Departments",    x = "Department",    y = "Yearly Salary",    fill = "# of Employees"  ) +  theme_minimal()</code></pre> This is what the chart looks like: <img class="size-full wp-image-20234" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74e6b3c80c04fb862e2_5107f9a0_18-1.webp" alt="Image 18 - Department statistics chart" width="2002" height="1562" /> Image 18 - Department statistics chart And that's how you can connect to a Postgres database from R and make a good-looking data visualization - all in a couple of lines of code. Do you need more assistance in understanding <code>ggplot2</code> code? We have plenty of articles for you, covering <a href="https://appsilon.com/ggplot2-bar-charts/" target="_blank" rel="noopener">bar charts</a>, <a href="https://appsilon.com/ggplot2-line-charts/" target="_blank" rel="noopener">line charts</a>, <a href="https://appsilon.com/ggplot-scatter-plots/" target="_blank" rel="noopener">scatter plots</a>, <a href="https://appsilon.com/ggplot2-boxplots/" target="_blank" rel="noopener">box plots</a>, and <a href="https://appsilon.com/ggplot2-histograms/" target="_blank" rel="noopener">histograms</a>. <hr /> <h2 id="summmary">Connecting R to Postgres - Summary</h2> To summarize, PostgreSQL is probably one of the easiest databases to connect with from R. You only need to know the connection parameters, and there's no need to install any additional client/server software. Probably the best thing about R is that it allows you to manipulate table data just like normal R data frames, and then dump the aggregated results back to a new table (or append to the existing one). There's no need to create the table first and then write a bunch or insert statements. <i>What's your preferred way of connecting R to Postgres? Are you using an alternative package?</i> Let us know in the comment section below. <blockquote>Do you think your code is production-ready? <a href="https://appsilon.com/package-tests-in-rstudio-python-r/" target="_blank" rel="noopener">Leave nothing to chance by implementing package tests</a>.</blockquote>

Contact us!
Damian's Avatar
Damian Rodziewicz
Head of Sales
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Have questions or insights?
Engage with experts, share ideas and take your data journey to the next level!
Join Slack

Take Your Business Further with Custom
Data Solutions

Unlock the full potential of your enterprise with our data services, tailored to the unique needs of Fortune 500 companies. Elevate your strategy—connect with us today!

r
postgres
tutorials