The goal of the odbc package is to provide a DBI-compliant interface to Open Database Connectivity (ODBC) drivers. This allows for an efficient, easy to setup connection to any database with ODBC drivers available, including SQL Server, Oracle, MySQL, PostgreSQL, SQLite and others. The implementation builds on the nanodbc C++ library.
For Unix and MacOS ODBC drivers should be compiled against unixODBC. Drivers compiled against iODBC may also work, but are not fully supported.
After installation of the driver manager and driver, you will have to
register the driver in a odbcinst.ini file
for it to appear in odbc::odbcListDrivers()
.
odbc and it’s dependencies use C++11 features. Therefore you need gcc 4.8, clang 3.3 or Rtools 3.3 or later.
Windows is bundled with ODBC libraries however drivers for each database need to be installed separately. Windows ODBC drivers typically include an installer that needs to be run and will install the driver to the proper locations.
homebrew can be used to easily install database drivers on MacOS.
# Install the unixODBC library
brew install unixodbc
# Microsoft SQL Server ODBC Drivers (Recommended)
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
brew install msodbcsql17 mssql-tools
# SQL Server ODBC Drivers (Free TDS)
brew install freetds
# PostgreSQL ODBC ODBC Drivers
brew install psqlodbc
# MySQL ODBC Drivers (and database)
brew install mysql
# SQLite ODBC Drivers
brew install sqliteodbc
apt-get can be used to easily install database drivers on Linux distributions which support it, such as Debian and Ubuntu.
# Install the unixODBC library
apt-get install unixodbc unixodbc-dev
For SQL Server on Linux it is recommended you use the Microsoft Drivers.
# SQL Server ODBC Drivers (Free TDS)
apt-get install tdsodbc
# PostgreSQL ODBC ODBC Drivers
apt-get install odbc-postgresql
# MySQL ODBC Drivers
apt-get install libmyodbc
# SQLite ODBC Drivers
apt-get install libsqliteodbc
# Install the latest odbc release from CRAN:
install.packages("odbc")
# Or the the development version from GitHub:
# install.packages(devtools)
devtools::install_github("r-dbi/odbc")
Databases can be connected by specifying a connection string directly, or with DSN configuration files.
Specify a connection string as named arguments directly in the
dbConnect()
method.
library(DBI)
con <- dbConnect(odbc::odbc(),
driver = "PostgreSQL Driver",
database = "test_db",
uid = "postgres",
pwd = "password",
host = "localhost",
port = 5432)
Alternatively you can pass a complete connection string as the
.connection_string
argument. The Connection Strings
Reference is a useful resource that
has example connection strings for a large variety of databases.
library(DBI)
con <- dbConnect(odbc::odbc(),
.connection_string = "Driver={PostgreSQL Driver};Uid=postgres;Pwd=password;Host=localhost;Port=5432;Database=test_db;")
ODBC configuration files are another option to specify connection parameters and allow one to use a Data Source Name (DSN) to make it easier to connect to a database.
con <- dbConnect(odbc::odbc(), "PostgreSQL")
The ODBC Data Source Administrator application is used to manage ODBC data sources on Windows.
On MacOS and Linux there are two separate text files that need to be
edited. UnixODBC includes a command line executable odbcinst
which can
be used to query and modify the DSN files. However these are plain text
files you can also edit by hand if desired.
There are two different files used to setup the DSN information.
odbcinst.ini
- which defines driver optionsodbc.ini
- which defines connection options
The DSN configuration files can be defined globally for all users of the
system, often at /etc/odbc.ini
, /opt/local/etc/odbc.ini
or
/opt/homebrew/etc/odbc.ini
, the exact location depends on what option
was used when compiling unixODBC. odbcinst -j
can be used to find the
exact location. Alternatively the ODBCSYSINI
environment variable can
be used to specify the location of the configuration files. Ex.
ODBCSYSINI=~/ODBC
A local DSN file can also be used with the files ~/.odbc.ini
and
~/.odbcinst.ini
.
If ODBC has trouble locating your system data source names, you may need to override the default location where ODBC looks for your configuration files. You can use either of the options below to specify the location of your DSN configuration files.
- Create or open the
~/.Renviron
file. - Add
ODBCSYSINI=/opt/homebrew/etc
to your~/.Renviron
file and save your changes. - Restart any open R sessions before connecting to a database.
- Set the
ODBCSYSINI
environment variable, eg.ODBCSYSINI=/opt/homebrew/etc
. - Restart any open R sessions before connecting to a database.
Contains driver information, particularly the name of the driver library. Multiple drivers can be specified in the same file.
[PostgreSQL Driver]
Driver = /usr/local/lib/psqlodbcw.so
[SQLite Driver]
Driver = /usr/local/lib/libsqlite3odbc.dylib
On MacOS aarch64 machines, drivers installed via homebrew are in a different location, as seen below.
[PostgreSQL Driver]
Driver = /opt/homebrew/lib/psqlodbcw.so
[SQLite Driver]
Driver = /opt/homebrew/lib/libsqlite3odbc.dylib
Contains connection information, particularly the username, password,
database and host information. The Driver line corresponds to the driver
defined in odbcinst.ini
.
[PostgreSQL]
Driver = PostgreSQL Driver
Database = test_db
Servername = localhost
UserName = postgres
Password = password
Port = 5432
[SQLite]
Driver = SQLite Driver
Database = /tmp/testing
See also: unixODBC without the GUI for more information and examples.
All of the following examples assume you have already created a
connection con
. See Connecting to a
database for more information on
establishing a connection.
dbListTables()
is used for listing all existing tables in a database.
dbListTables(con)
# List tables beginning with f
dbListTables(con, table_name = "f%")
# List all fields in the 'flights' database
dbListFields(con, "flights")
dbReadTable()
will read a full table into an R data.frame()
.
data <- dbReadTable(con, "flights")
dbWriteTable()
will write an R data.frame()
to an SQL table.
data <- dbWriteTable(con, "iris", iris)
dbGetQuery()
will submit a query and fetch the results. It is also
possible to submit the query and fetch separately with dbSendQuery()
and dbFetch()
. The n=
argument to dbFetch()
can be used to fetch
only the part of a query result (the next n rows).
result <- dbSendQuery(con, "SELECT flight, tailnum, origin FROM flights ORDER BY origin")
# Retrieve the first 100 results
first_100 <- dbFetch(result, n = 100)
# Retrieve the rest of the results
rest <- dbFetch(result)
The odbc package is often much faster than the existing RODBC and DBI compatible RODBCDBI packages.
Reading a table from a SQL Server database with the ‘flights’ dataset (336,776 rows, 19 columns) of the package nytflights13.
# First using RODBC / RODBCDBI
library(DBI)
rodbc <- dbConnect(RODBCDBI::ODBC(), dsn = "MicrosoftSQLServer", user = Sys.getenv("SQLSERVER_UID"), password = Sys.getenv("SQLSERVER_PWD"))
system.time(rodbc_result <- dbReadTable(rodbc, "flights"))
#> user system elapsed
#> 13.986 1.173 15.192
# Now using odbc
odbc <- dbConnect(odbc::odbc(), dsn = "MicrosoftSQLServer", UID = Sys.getenv("SQLSERVER_UID"), PWD = Sys.getenv("SQLSERVER_PWD"))
system.time(odbc_result <- dbReadTable(odbc, "flights"))
#> user system elapsed
#> 1.334 0.065 1.406
tibble::as_tibble(odbc_result)
#> # A tibble: 336,776 x 19
#> year month day dep_time sched_dep_time dep_delay arr_time
#> <int> <int> <int> <int> <int> <dbl> <int>
#> 1 2013 1 1 517 515 2 830
#> 2 2013 1 1 533 529 4 850
#> 3 2013 1 1 542 540 2 923
#> 4 2013 1 1 544 545 -1 1004
#> 5 2013 1 1 554 600 -6 812
#> 6 2013 1 1 554 558 -4 740
#> 7 2013 1 1 555 600 -5 913
#> 8 2013 1 1 557 600 -3 709
#> 9 2013 1 1 557 600 -3 838
#> 10 2013 1 1 558 600 -2 753
#> # … with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> # minute <dbl>, time_hour <dttm>
identical(dim(rodbc_result), dim(odbc_result))
#> [1] TRUE
rm(rodbc_result, odbc_result, odbc, rodbc)
gc(verbose = FALSE)
Writing the same dataset to the database.
library(nycflights13)
# rodbc does not support writing timestamps natively, so we remove that column
flights2 <- as.data.frame(flights[, names(flights) != "time_hour"])
rodbc <- dbConnect(RODBCDBI::ODBC(), dsn = "MicrosoftSQLServer", user = Sys.getenv("SQLSERVER_UID"), password = Sys.getenv("SQLSERVER_PWD"))
system.time(dbWriteTable(rodbc, "flights2", flights2))
#> user system elapsed
#> 11.891 6.269 765.269
# Now using odbc
odbc <- dbConnect(odbc::odbc(), dsn = "MicrosoftSQLServer", UID = Sys.getenv("SQLSERVER_UID"), PWD = Sys.getenv("SQLSERVER_PWD"))
flights3 <- as.data.frame(flights)
system.time(dbWriteTable(odbc, "flights3", flights3))
#> user system elapsed
#> 2.617 0.275 9.649