SqlAlchemy: Connecting to pre-existing databases

Accessing databases with Python is a simple process. Python even provides a sqlite database library that’s built into the main distribution (since 2.5). My favorite way to access databases with Python is to use the 3rd party package, SqlAlchemy. SqlAlchemy is an object-relational mapper (ORM), which means that it takes SQL constructs and makes them more like the target language. In this case, you end up using Python syntax to execute SQL rather than straight SQL and you can use the same code to access multiple database backends (if you’re careful).

In this article, we’re going to look at how to use SqlAlchemy to connect to pre-existing databases. If my experience is any indication, you’ll probably be spending more time working with databases that you didn’t create than with ones that you did. This article will show you how to connect to them.

SqlAlchemy’s autoload

SqlAlchemy has two ways to define a databases columns. The first way is the long way, in which you define each and every field along with its type. The easy (or short) way is to use SqlAlchemy’s autoload functionality, which will introspect the table and pull out the field names in a rather magical way. We’ll start off with the autoload method and then show the long way in the next section.

Before we begin though, it needs to be noted that there are two methods for configuring SqlAlchemy: a long-hand way and a declarative (or “shorthand” way. We will go over both ways. Let’s start out with the long version and then do it “declaratively”.

from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import mapper, sessionmaker

class Bookmarks(object):
    pass

#----------------------------------------------------------------------
def loadSession():
    """"""    
    dbPath = 'places.sqlite'
    engine = create_engine('sqlite:///%s' % dbPath, echo=True)
    
    metadata = MetaData(engine)
    moz_bookmarks = Table('moz_bookmarks', metadata, autoload=True)
    mapper(Bookmarks, moz_bookmarks)
    
    Session = sessionmaker(bind=engine)
    session = Session()
    return session

if __name__ == "__main__":
    session = loadSession()
    res = session.query(Bookmarks).all()
    res[1].title

In this snippet, we import a few handy classes and utilities from sqlalchemy that allow us to define such things as an engine (a type of connection/interface to the database), metadata (a table catalog) and a session (a “handle” to the database that allows us to query it). Note that we have a “places.sqlite” file. This database comes from Mozilla Firefox. If you have it installed, go looking for it as it makes an excellent testbed for this sort of thing. On my Windows XP machine, it’s in the following location: “C:\Documents and Settings\Mike\Application Data\Mozilla\Firefox\Profiles\f7csnzvk.default”. Just copy that file into the location that you’ll be using for the scripts in this article. If you try to use it in place and you have Firefox open, you may have issues as your code may interrupt Firefox or vice-versa.

In the create_engine call, we set echo to True. This causes SqlAlchemy to send all the SQL it generates to stdout, which is quite handy for debugging purposes. I would recommend setting it to False when you put your code in production. For our purposes, the most interesting line is as follows:

moz_bookmarks = Table('moz_bookmarks', metadata, autoload=True)

This tells SqlAlchemy to attempt to load the “moz_bookmarks” table automatically. If it’s a proper database with a primary key, this will work great. If the table doesn’t have a primary key, then you can do this to hack it:

from sqlalchemy import Column, Integer
moz_bookmarks = Table('moz_bookmarks', metadata, 
                      Column("id", Integer, primary_key=True),
                      autoload=True)

This adds an extra column called “id”. It’s basically monkey-patching the database. I’ve used this method successfully when I have to use poorly designed databases from vendors. SqlAlchemy will automatically increment it too, if the database supports it. The mapper will map the table object to the Bookmarks class. Then we create a session that’s bound to our engine so we can make queries. The res = session.query(Bookmarks).all() basically means SELECT * FROM moz_bookmarks and will return the result as a list of Bookmark objects.

Sometimes the bad database will have a field that should obviously be the primary key. If so, you can just use that name instead of “id”. Other times, you can create a unique key by setting the primary key to two columns (i.e. fields). You do that by creating two columns and setting “primary_key” to True on both of them. This is a whole lot better than having to define an entire table if the table has several dozen fields.

Let’s move on to the declarative autoload method:

from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///places.sqlite', echo=True)
Base = declarative_base(engine)
########################################################################
class Bookmarks(Base):
    """"""
    __tablename__ = 'moz_bookmarks'
    __table_args__ = {'autoload':True}

#----------------------------------------------------------------------
def loadSession():
    """"""
    metadata = Base.metadata
    Session = sessionmaker(bind=engine)
    session = Session()
    return session
    
if __name__ == "__main__":
    session = loadSession()
    res = session.query(Bookmarks).all()
    print res[1].title

The declarative method looks a little different, huh? Let’s try to unpack the new stuff. First off, we have a new import: from sqlalchemy.ext.declarative import declarative_base. We use “declarative_base” to create a class using our engine that we then create the Bookmarks subclass from. To specify the table name, we use the magic method, __tablename__ and to tell it to autoload, we create the __table_args__ dict. Then in the loadSession function, we get the metadata object like this: metadata = Base.metadata. The rest of the code is the same. Declarative syntax usually simplifies the code because it puts it all in one class rather than creating a class and a Table object separately. If you like the declarative style, you might also want to look at Elixir, a SqlAlchemy extension that did declarative before they added this style to the main package.

Defining Your Databases Explicitly

There are times when you can’t use autoload or you just want to have complete control over the table definition. SqlAlchemy allows you to do this almost as easily. We’ll look at the longhand version first and then look at the declarative style.

from sqlalchemy import create_engine, Column, MetaData, Table
from sqlalchemy import Integer, String, Text
from sqlalchemy.orm import mapper, sessionmaker

class Bookmarks(object):
    pass

#----------------------------------------------------------------------
def loadSession():
    """"""
    dbPath = 'places.sqlite'
    
    engine = create_engine('sqlite:///%s' % dbPath, echo=True)
    
    metadata = MetaData(engine)    
    moz_bookmarks = Table('moz_bookmarks', metadata, 
                          Column('id', Integer, primary_key=True),
                          Column('type', Integer),
                          Column('fk', Integer),
                          Column('parent', Integer),
                          Column('position', Integer),
                          Column('title', String),
                          Column('keyword_id', Integer),
                          Column('folder_type', Text),
                          Column('dateAdded', Integer),
                          Column('lastModified', Integer)
                          )
    
    mapper(Bookmarks, moz_bookmarks)
    
    Session = sessionmaker(bind=engine)
    session = Session()
    
if __name__ == "__main__":
    session = loadSession()
    res = session.query(Bookmarks).all()
    print res[1].title

This code really isn’t all that different from the code we’ve seen. The part we care about most is the Table definition. Here we use the keyword, Column, to define each column. The Column class accepts a name of the column, the type, whether to set it as the primary key and whether or not the column is nullable. It may accept a few other arguments, but those are the ones you’ll see the most. This example doesn’t show the nullable bit because I couldn’t figure out if the Firefox table had those constraints.

Anyway, once you’ve got the columns defined, the rest of the code is the same as before. Let’s move on to the declarative style then:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///places.sqlite', echo=True)
Base = declarative_base(engine)
########################################################################
class Places(Base):
    """"""
    __tablename__ = 'moz_places'
    
    id = Column(Integer, primary_key=True)
    url = Column(String)
    title = Column(String)
    rev_host = Column(String)
    visit_count = Column(Integer)
    hidden = Column(Integer)
    typed = Column(Integer)
    favicon_id = Column(Integer)
    frecency = Column(Integer)
    last_visit_date = Column(Integer)
    
    #----------------------------------------------------------------------
    def __init__(self, id, url, title, rev_host, visit_count,
                 hidden, typed, favicon_id, frecency, last_visit_date):
        """"""
        self.id = id
        self.url = url
        self.title = title
        self.rev_host = rev_host
        self.visit_count = visit_count
        self.hidden = hidden
        self.typed = typed
        self.favicon_id = favicon_id
        self.frecency = frecency
        self.last_visit_date = last_visit_date
        
    #----------------------------------------------------------------------
    def __repr__(self):
        """"""
        return "" % (self.id, self.title,
                                                 self.url)
    
#----------------------------------------------------------------------
def loadSession():
    """"""
    metadata = Base.metadata
    Session = sessionmaker(bind=engine)
    session = Session()
    return session
    
if __name__ == "__main__":
    session = loadSession()
    res = session.query(Places).all()
    print res[1].title

We switched things up a bit with this example by setting it up for a different table: “moz_places”. You’ll notice that you set up the columns here by creating class variables. If you want to access the instance’s variables, you need to redefine them in the __init__. If you don’t do this, you’ll end up with some pretty confusing issues. The __repr__ is there as a “pretty print” method. When you print one of the “Places” objects, you’ll get whatever __repr__ returns. Other than that, the code is pretty similar to the other pieces that we’ve seen.

Wrapping Up

As you can see, using SqlAlchemy to connect to databases is a breeze. Most of the time, you will know ahead of time what the database’s design is, either from documentation, because you built it yourself, or because you have some utility that can tell you. Having that information is helpful when it comes to defining the tables in SqlAlchemy, but now you know what to try even if you don’t know the table’s configuration beforehand! Isn’t learning Python tricks fun? See you next time!

Further Reading

Downloads

1 thought on “SqlAlchemy: Connecting to pre-existing databases”

Comments are closed.