Read an SQLite File
Let's find out how to use SQL with pandas DataFrame.
We'll cover the following
Try it yourself
Try executing the code below to see the result.
import sqlite3import pandas as pdconn = sqlite3.connect(':memory:')conn.executescript('''CREATE TABLE visits (day DATE, hits INTEGER);INSERT INTO visits VALUES('2020-07-01', 300),('2020-07-02', 500),('2020-07-03', 900);''')df = pd.read_sql('SELECT * FROM visits', conn)print('time span:', df['day'].max() - df['day'].min())
Explanation
SQLite3 is a great single-file database that is used many times to transfer data. It is widely used and heavily tested and can handle vast amounts of data up to 140 terabytes.
However, SQL does require some training.
In the teaser code, we create a hits
table that has the following two columns:
day
column with SQLDATE
type.hits
column with SQLINTEGER
type.
The mapping from SQL types to pandas types is defined in the SQL driver that is used to access the database. SQLite3 is a bit different from other
databases in that it only has numbers and strings as types. However, it does support declaring a column with a DATE
, TIME
, or TIMESTAMP
type.
We can see that for ourselves when we look at the .dtypes
below:
In [1]: df.dtypes
Out[1]:
day object
hits int64
dtype: object
The day
column has an object
dtype, which in most cases, means it’s an str
. When we do df['day'].max() - df['day'].min()
, we’re subtracting two strings, which is not
a legal operation in Python.
We can convert data in a column to a pandas Timestamp
either by using the to_datetime
function or by passing the column names to convert in the parse_dates
parameter of read_sql
. However, we somehow need to know what columns are time related.
The better option is to use the detect_types
parameter in sqlite3.connect
.
When we pass PARSE_DECLTYPES
to sqlite3.connect
, it’ll convert DATE, TIME
, and TIMESTAMP
columns to Python’s datetime
types. The parameter read_sql
will convert these
pandas.Timestamp
columns.
Solution
import sqlite3import pandas as pdconn = sqlite3.connect(':memory:',detect_types=sqlite3.PARSE_DECLTYPES,)conn.executescript('''CREATE TABLE visits (day DATE, hits INTEGER);INSERT INTO visits VALUES('2020-07-01', 300),('2020-07-02', 500),('2020-07-03', 900);''')df = pd.read_sql('SELECT * FROM visits', conn)print('time span:', df['day'].max() - df['day'].min())
Get hands-on with 1300+ tech skills courses.