Python 101 - How to Work with CSV files

There are many common file types that you will need to work with as a software developer. One such format is the CSV file. CSV stands for “Comma-Separated Values” and is a text file format that uses a comma as a delimiter to separate values from one another. Each row is its own record and each value is its own field. Most CSV files have records that are all the same length.

Unfortunately, CSV has no single, strictly followed standard, which makes working with these files directly more complicated, especially when an individual field itself contains commas or line breaks. Many programs wrap such fields in quotation marks, but that only shifts the problem: what happens when the field itself needs to contain quotation marks?

Two benefits of CSV files are that they are human readable and that most spreadsheet software can use them. For example, Microsoft Excel and LibreOffice will happily open CSV files for you and format them into rows and columns.

Python has made creating and reading CSV files much easier via its csv library. It works with most CSV files out of the box and allows some customization of its readers and writers. A reader is what the csv module uses to parse a CSV file, while a writer is used to create or update CSV files.
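
As a quick illustration of the quoting problem described earlier, the following sketch writes one record whose fields contain a comma, quotation marks, and a line break, and then reads it back. It uses io.StringIO as an in-memory stand-in for a real file, and the sample values are made up for this example.

```python
import csv
import io

# Hypothetical record: one field has a comma, one has double quotes,
# and one has a line break.
row = ['Python 101', 'Driscoll, Mike', 'He said "hi"', 'line one\nline two']

# Write the record to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()
print(encoded)

# Reading the text back recovers the original fields unchanged.
decoded = next(csv.reader(io.StringIO(encoded, newline='')))
print(decoded == row)
```

With the default settings, the writer wraps the troublesome fields in double quotes and doubles any quotation marks inside them, and the reader reverses both steps.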

In this article, you will learn about the following:

Reading a CSV File
Reading a CSV File with DictReader
Writing a CSV File
Writing a CSV File with DictWriter

If you need more information about the csv module, be sure to check out the documentation.

Let’s start learning how to work with CSV files!

Reading a CSV File

Reading CSV files with Python is pretty straightforward once you know how to do so. The first piece of the puzzle is to have a CSV file that you want to read. For the purposes of this section, you can create one named books.csv and copy the following text into it:

book_title,author,publisher,pub_date,isbn
Python 101,Mike Driscoll, Mike Driscoll,2020,123456789
wxPython Recipes,Mike Driscoll,Apress,2018,978-1-4842-3237-8
Python Interviews,Mike Driscoll,Packt Publishing,2018,9781788399081

The first row of data is known as the header record. It explains what each field of data represents. Let’s write some code to read this CSV file into Python so you can work with its content. Go ahead and create a file named csv_reader.py and enter the following code into it:

# csv_reader.py

import csv

def process_csv(path):
    # newline='' leaves line-ending handling to the csv module
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            print(row)

if __name__ == '__main__':
    process_csv('books.csv')

Here you import csv and create a function called process_csv(), which accepts the path to the CSV file as its sole argument. Then you open that file and pass it to csv.reader() to create a reader object. You can then iterate over this object row by row and print each row, which the reader returns as a list of strings.

Here is the output you will receive when you run the code:

['book_title', 'author', 'publisher', 'pub_date', 'isbn']
['Python 101', 'Mike Driscoll', ' Mike Driscoll', '2020', '123456789']
['wxPython Recipes', 'Mike Driscoll', 'Apress', '2018', '978-1-4842-3237-8']
['Python Interviews', 'Mike Driscoll', 'Packt Publishing', '2018', '9781788399081']

Most of the time, you probably won’t need to process the header row. You can skip that row by updating your code like this:

# csv_reader_no_header.py

import csv

def process_csv(path):
    # newline='' leaves line-ending handling to the csv module
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        # Skip the header
        next(reader, None)
        for row in reader:
            print(row)

if __name__ == '__main__':
    process_csv('books.csv')

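Python’s next() function takes an iterator, such as reader, and returns its next item. Calling it once before the loop consumes the header row, so the loop starts at the first data record. The second argument, None, is a default value that is returned instead of raising StopIteration if the file is empty. If you run this code, you will see that the output is now missing the header row: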

['Python 101', 'Mike Driscoll', ' Mike Driscoll', '2020', '123456789']
['wxPython Recipes', 'Mike Driscoll', 'Apress', '2018', '978-1-4842-3237-8']
['Python Interviews', 'Mike Driscoll', 'Packt Publishing', '2018', '9781788399081']
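
If you want to keep the header instead of discarding it, you can store the value that next() returns. The sketch below is one way to do that; it reads from an io.StringIO buffer holding a shortened copy of the sample data so that it runs without books.csv, and it uses the header to look up the pub_date column by name.

```python
import csv
import io

# Shortened copy of the sample data, held in memory for this sketch.
sample = """book_title,author,publisher,pub_date,isbn
wxPython Recipes,Mike Driscoll,Apress,2018,978-1-4842-3237-8
Python Interviews,Mike Driscoll,Packt Publishing,2018,9781788399081
"""

reader = csv.reader(io.StringIO(sample, newline=''))
header = next(reader, None)  # would be None if the data were empty
rows = list(reader)          # the remaining rows, without the header

# The reader returns every value as a string, so convert where needed.
year_column = header.index('pub_date')
years = [int(row[year_column]) for row in rows]

print(header)
print(years)
```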

The csv.reader() function takes in some other optional arguments that are quite useful. For example, you might have a file that uses a delimiter other than a comma. You can use the delimiter argument to tell the csv module to parse the file based on that information.

Here is an example of how you might parse a file that uses a colon as its delimiter:

reader = csv.reader(csvfile, delimiter=':')

You should try creating a few variations of the original data file and then read them in using the delimiter argument.
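
Here is a small sketch of that kind of experiment. The colon-separated text is made up for this example and is read from an io.StringIO buffer rather than a file. The second half uses skipinitialspace, another optional argument, to drop the space that follows a comma in the Python 101 row of the sample data.

```python
import csv
import io

# Hypothetical colon-delimited data.
colon_data = "book_title:author:pub_date\nPython 101:Mike Driscoll:2020\n"
colon_rows = list(
    csv.reader(io.StringIO(colon_data, newline=''), delimiter=':')
)
print(colon_rows)

# The third field of this row starts with a space.
spaced_data = "Python 101,Mike Driscoll, Mike Driscoll,2020,123456789\n"

# By default the space is kept as part of the field.
default_row = next(csv.reader(io.StringIO(spaced_data, newline='')))

# skipinitialspace=True ignores whitespace right after each delimiter.
trimmed_row = next(
    csv.reader(io.StringIO(spaced_data, newline=''), skipinitialspace=True)
)
print(repr(default_row[2]), repr(trimmed_row[2]))
```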

Let’s learn about another way to read CSV files!

Reading a CSV File with DictReader

The csv module provides a second “reader” object you can use called the DictReader class. The nice thing about the DictReader is that when you iterate over it, each row is returned as a Python dictionary. Go ahead and create a new file named csv_dict_reader.py and enter the following code:

# csv_dict_reader.py

import csv

def process_csv_dict_reader(file_obj):
    reader = csv.DictReader(file_obj)
    for line in reader:
        print(f'{line["book_title"]} by {line["author"]}')

if __name__ == '__main__':
    with open('books.csv', newline='') as csvfile:
        process_csv_dict_reader(csvfile)

In this code you create a process_csv_dict_reader() function that takes in a file object rather than a file path. Then you pass the file object to DictReader(), which uses the first row as the field names and returns each remaining row as a dictionary keyed by those names. Next, you loop over the reader object and print out a couple of fields from each record using Python’s dictionary access syntax.

You can see the output from running this code below:

Python 101 by Mike Driscoll
wxPython Recipes by Mike Driscoll
Python Interviews by Mike Driscoll

csv.DictReader() makes accessing fields within records much more intuitive than the regular csv.reader object. Try using it on one of your own CSV files to gain additional practice.
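
Because each record is a dictionary, grouping or filtering by column name takes very little code. The following sketch groups book titles by publication year; it works on a shortened in-memory copy of the sample data, and the titles_by_year name is made up for this example.

```python
import csv
import io

# Shortened copy of the sample data, held in memory for this sketch.
sample = """book_title,author,publisher,pub_date,isbn
Python 101,Mike Driscoll, Mike Driscoll,2020,123456789
wxPython Recipes,Mike Driscoll,Apress,2018,978-1-4842-3237-8
"""

reader = csv.DictReader(io.StringIO(sample, newline=''))

# fieldnames holds the column names taken from the first row.
fieldnames = reader.fieldnames
print(fieldnames)

# Group titles by year; values arrive as strings, so convert the year.
titles_by_year = {}
for record in reader:
    year = int(record['pub_date'])
    titles_by_year.setdefault(year, []).append(record['book_title'])

print(titles_by_year)
```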

Now, you will learn how to write a CSV file using Python’s csv module!

Writing a CSV File

Python’s csv module wouldn’t be complete without some way to create a CSV file. In fact, Python has two ways. Let’s start by looking at the first method below. Go ahead and create a new file named csv_writer.py and enter the following code:

# csv_writer.py

import csv

def csv_writer(path, data):
    # newline='' prevents blank lines between rows on Windows
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        for row in data:
            writer.writerow(row)

if __name__ == '__main__':
    data = '''book_title,author,publisher,pub_date,isbn
    Python 101,Mike Driscoll, Mike Driscoll,2020,123456789
    wxPython Recipes,Mike Driscoll,Apress,2018,978-1-4842-3237-8
    Python Interviews,Mike Driscoll,Packt Publishing,2018,9781788399081'''
    records = []
    for line in data.splitlines():
        records.append(line.strip().split(','))
    csv_writer('output.csv', records)

In this code, you create a csv_writer() function that takes two arguments:

The path to the CSV file that you want to create
The data that you want to write to the file

To write data to a file, you need to create a writer object by calling csv.writer(). You can set the delimiter to something other than a comma if you want to. A comma is already the default, but this example sets it explicitly to keep things consistent. When you are ready to write data, you call the writer’s writerow() method, which takes in a row as a list (or any other iterable) of strings or numbers.

The code that is outside of the csv_writer() function takes a multiline string and transforms it into a list of lists for you.
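
To see the effect of a different delimiter without creating a file, you can write to an io.StringIO buffer, as in this minimal sketch. The semicolon delimiter and the two-column rows are assumptions chosen only for illustration.

```python
import csv
import io

# In-memory text buffer standing in for a file opened with newline=''.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter=';')

writer.writerow(['book_title', 'pub_date'])
# Numbers are accepted too; the writer converts them to text.
writer.writerow(['Python 101', 2020])

output = buffer.getvalue()
print(repr(output))
```

Note that each row ends with '\r\n', which is the writer’s default line terminator.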

If you would like to write all the rows in the list at once, you can use the writerows() method. Here is an example of that:

# csv_writer_rows.py

import csv

def csv_writer(path, data):
    # newline='' prevents blank lines between rows on Windows
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerows(data)

if __name__ == '__main__':
    data = '''book_title,author,publisher,pub_date,isbn
    Python 101,Mike Driscoll, Mike Driscoll,2020,123456789
    wxPython Recipes,Mike Driscoll,Apress,2018,978-1-4842-3237-8
    Python Interviews,Mike Driscoll,Packt Publishing,2018,9781788399081'''
    records = []
    for line in data.splitlines():
        records.append(line.strip().split(','))
    csv_writer('output2.csv', records)

Instead of looping over the data row by row, you can write the entire list of lists to the file all at once.
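
One way to check the result is to read the file back and compare it with the original list of lists. The sketch below does this in a temporary directory so that it leaves nothing behind; the shortened records list is an assumption made for this example.

```python
import csv
import tempfile
from pathlib import Path

# Shortened version of the records used above.
records = [
    ['book_title', 'author', 'pub_date'],
    ['Python 101', 'Mike Driscoll', '2020'],
    ['wxPython Recipes', 'Mike Driscoll', '2018'],
]

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'output2.csv'

    # Write every row with a single call.
    with open(path, 'w', newline='') as csvfile:
        csv.writer(csvfile).writerows(records)

    # Read the file back into a list of lists.
    with open(path, newline='') as csvfile:
        round_trip = list(csv.reader(csvfile))

print(round_trip == records)
```

The comparison only holds here because every value in records is already a string; numbers would come back as text.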

This was the first method of creating a CSV file. Now let’s learn about the second method: the DictWriter!

Writing a CSV File with DictWriter

The DictWriter class is the counterpart of the DictReader and works in a similar manner. To learn how to use it, create a file named csv_dict_writer.py and enter the following:

# csv_dict_writer.py

import csv

def csv_dict_writer(path, headers, data):
    # newline='' prevents blank lines between rows on Windows
    with open(path, 'w', newline='') as csvfile:
        writer = csv.DictWriter(
            csvfile,
            delimiter=',',
            fieldnames=headers,
        )
        writer.writeheader()
        for record in data:
            writer.writerow(record)

if __name__ == '__main__':
    data = '''book_title,author,publisher,pub_date,isbn
    Python 101,Mike Driscoll, Mike Driscoll,2020,123456789
    wxPython Recipes,Mike Driscoll,Apress,2018,978-1-4842-3237-8
    Python Interviews,Mike Driscoll,Packt Publishing,2018,9781788399081'''
    records = []
    for line in data.splitlines():
        records.append(line.strip().split(','))
    headers = records.pop(0)
    
    list_of_dicts = []
    for row in records:
        my_dict = dict(zip(headers, row))
        list_of_dicts.append(my_dict)
    
    csv_dict_writer('output_dict.csv', headers, list_of_dicts)

In this example, you pass in three arguments to csv_dict_writer():

The path to the file that you are creating
The header row (a list of strings)
The data argument as a Python list of dictionaries

When you instantiate DictWriter(), you give it a file object, set the delimiter, and use the fieldnames parameter to tell it which keys to write and in what order; here you pass in headers. Next, you call writeheader() to write that header to the file. Finally, you loop over the data as you did before and use writerow() to write each record to the file. However, the record is now a dictionary instead of a list.

The code outside the csv_dict_writer() function is used to create the pieces you need to feed to the function. Once again, you create a list of lists, but this time you extract the first row and save it off in headers. Then you loop over the rest of the records and turn them into a list of dictionaries.
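
Real data does not always match the header exactly. As a hedged sketch of how DictWriter can cope with that, the example below uses its restval and extrasaction parameters; the record with a missing pub_date and an extra rating key is made up for illustration, and the output goes to an io.StringIO buffer rather than a file.

```python
import csv
import io

buffer = io.StringIO(newline='')
writer = csv.DictWriter(
    buffer,
    fieldnames=['book_title', 'author', 'pub_date'],
    restval='unknown',       # written when a record lacks a field
    extrasaction='ignore',   # skip keys that are not in fieldnames
)
writer.writeheader()

# Hypothetical record: no pub_date, plus an unexpected rating key.
writer.writerow({'book_title': 'Python 101',
                 'author': 'Mike Driscoll',
                 'rating': 5})

output = buffer.getvalue()
print(output)
```

Without extrasaction='ignore', the default behavior is to raise a ValueError when a record contains a key that is not listed in fieldnames.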

Wrapping Up

Python’s csv module is great! You can read and write CSV files with very few lines of code. In this article you learned how to do that in the following sections:

Reading a CSV File
Reading a CSV File with DictReader
Writing a CSV File
Writing a CSV File with DictWriter

There are other ways to work with CSV files in Python. One popular method is to use the pandas package. pandas is primarily used for data analysis and data science, so using it only to read or write a CSV file can feel like using a sledgehammer on a nail. Python’s csv module is quite capable all on its own. But you are welcome to check out pandas and see how it might work for this use case.

If you don’t do data analysis, you may never need pandas for this kind of task. In that case, Python’s csv module works fine. Go ahead and put in some more practice with Python’s csv module to see how nice it is to work with!