Faking Data with the Faker Package

If you are a software developer or engineer, you know how helpful sample data can be. The data doesn’t have to be real; it can be fake. For example, if you are writing a program that will process HIPAA-protected health data, you can’t test with actual records, because that would violate privacy laws.

Good programmers know that they should test their code, but how do you test it when the data is protected or unavailable? That is where fake data comes in. You can use fake data to populate your database, create XML or JSON files, or anonymize real data. There is a Python package called Faker that you can use to generate fake data.

Faker is heavily inspired by PHP Faker, Perl Faker, and Ruby Faker.

The first step in using Faker is to get it installed!

Installation

Faker is easy to install if you know how to use pip. Here is the command you can run:

python3 -m pip install Faker

Now that Faker is installed, you are ready to start creating fake data!

Creating Fake Data with Faker

Faker makes creating fake data surprisingly easy. Open up your terminal (or cmd.exe / PowerShell) and run Python. Then you can try out the following code in your REPL:

>>> from faker import Faker
>>> fake = Faker()
>>> fake.name()
'Paul Lynn'
>>> fake.name()
'Keith Soto'
>>> fake.address()
'Unit 6944 Box 5854\nDPO AA 14829'
>>> fake.address()
'44817 Wallace Way Apt. 376\nSouth Ashleymouth, GA 03737'

Here you import the Faker class from the faker package and create an instance of it. Next, you call the name() and address() methods a couple of times. Each call returns a new, randomly generated fake name or address.

You can see this by creating a loop and calling name() ten times:

>>> for _ in range(10):
...     print(fake.name())
... 
Tiffany Mueller
Zachary Burgess
Clinton Castillo
Yvonne Scott
Randy Gilbert
Christina Frazier
Samantha Rodriguez
Billy May
Joel Ross
Lee Morales

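This code demonstrates that each call to name() returns a new, randomly generated name. Because the values are random, repeats are possible, but they are unlikely in a short run like this one.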

Create Fake International Data

Faker supports setting the locale when you instantiate the Faker() class. What that means is that you can fake data in other languages.

For example, try setting the locale to Italian and then print out some names:

>>> fake_italian = Faker(locale="it_IT")
>>> for _ in range(10):
...     print(fake_italian.name())
... 
Virgilio Cignaroli
Annibale Tutino
Alessandra Iannucci
Flavio Bianchi
Pier Peruzzi
Marcello Mancini-Saragat
Marina Sismondi
Rolando Comolli
Dott. Benvenuto Luria
Giancarlo Folliero-Dallapé

These names are now all Italian. If you want more variety, you can pass a list of locales to Faker() instead:

Note: This example is an image rather than text because the syntax highlighter tool couldn’t represent the Asian characters correctly.

Create Fake Python Data

The Faker package can even fake Python data. If you don’t want to come up with your own Python lists, integers, dictionaries, and so on, you can ask Faker to do it for you.

Here are a few examples:

>>> fake.pylist()
['http://www.torres.com/category/', -956214947820.653, 'bPpdDhlEBEbhbQETwXOZ', Decimal('256.347612040523'),
'dPypmKDRlQqxpdkhOfmP', 5848, 'PGyduoxaLewOUdTEdeBs', Decimal('-43.8777257283172'), 'oxqvWiDyWaOErUBrkhIa',
'hkJbiRnTaPqZpEnuJoFF', 8471, 'scottjason@yahoo.com', 'rXQBeNIKEiGcQpLZKBvR']
>>> fake.pydict()
{'eight': 'http://nielsen.com/posts/about/', 'walk': Decimal('-2945142151233.25'), 'wide': 'mary80@yahoo.com', 
'sell': 5165, 'information': 2947, 'fire': 'http://www.mitchell.com/author.html', 'sea': 4662, 
'claim': 'xhogan@jackson.com'}
>>> fake.pyint()
493
>>> fake.pyint()
6280

Isn’t that great?

Wrapping Up

Faker allows you to set up custom seeds or create your own providers of fake data. For full details, read Faker’s documentation. You should check out the package today and see how useful it can be for your own code.
