:mod:`csv` --- CSV File Reading and Writing
Source code: :source:`Lib/csv.py`
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.
The :mod:`csv` module implements classes to read and write tabular data in CSV format. It allows programmers to say, "write this data in the format preferred by Excel," or "read data from this file which was generated by Excel," without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.
The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and write sequences. Programmers can also read and write data in dictionary form using the :class:`DictReader` and :class:`DictWriter` classes.
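As a minimal sketch of the two styles, the following round trip writes a few rows to an in-memory buffer and reads them back, first as lists and then as dictionaries. The sample rows and the use of :class:`io.StringIO` in place of a real file are assumptions made for illustration only.

```python
import csv
import io

# Illustrative data; the first row serves as the header.
rows = [["name", "qty"], ["apple", 3], ["pear", 10]]

# Write the rows as sequences to an in-memory text buffer.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Read the same text back as lists, then as dictionaries keyed by the header.
as_lists = list(csv.reader(io.StringIO(text, newline="")))
as_dicts = list(csv.DictReader(io.StringIO(text, newline="")))

print(as_lists)
print(as_dicts)
```

Note that the numbers come back as strings: unless a quoting mode requests otherwise, the reader performs no type conversion.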
Module Contents
The :mod:`csv` module defines the following functions:
The :mod:`csv` module defines the following classes:
The :class:`Dialect` class is a container class relied on primarily for its attributes, which are used to define the parameters for a specific :class:`reader` or :class:`writer` instance.
The :class:`excel` class defines the usual properties of an Excel-generated CSV file. It is registered with the dialect name ``'excel'``.
The :class:`excel_tab` class defines the usual properties of an Excel-generated TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.
The :class:`unix_dialect` class defines the usual properties of a CSV file generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting all fields. It is registered with the dialect name ``'unix'``.
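To make the differences between the three registered dialects concrete, the sketch below writes the same row once per dialect and compares the output. The ``render`` helper and the sample row are hypothetical names introduced here for illustration.

```python
import csv
import io

# One field contains a comma, so quoting behaviour becomes visible.
row = ["a", "b c", "d,e"]


def render(dialect_name):
    """Write the sample row with the named dialect and return the raw text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, dialect=dialect_name).writerow(row)
    return buffer.getvalue()


excel_line = render("excel")      # comma-delimited, minimal quoting, \r\n
tab_line = render("excel-tab")    # TAB-delimited, minimal quoting, \r\n
unix_line = render("unix")        # comma-delimited, every field quoted, \n

for line in (excel_line, tab_line, unix_line):
    print(repr(line))
```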
The :class:`Sniffer` class is used to deduce the format of a CSV file.
The :class:`Sniffer` class provides two methods:
An example for :class:`Sniffer` use:
import csv

with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
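The same idea can be tried without a file on disk. The sketch below sniffs a small in-memory sample; the sample text is made up, and the optional ``delimiters`` argument is used to restrict the candidates, since sniffing is a heuristic and short samples can be ambiguous.

```python
import csv
import io

# Made-up sample using semicolons as the field separator.
sample = "name;qty\napple;3\npear;10\n"

# Restrict the guess to two plausible delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")

# The sniffed dialect can be passed straight to a reader.
rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))

print(repr(dialect.delimiter))
print(rows)
```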
The :mod:`csv` module defines the following constants:
The :mod:`csv` module defines the following exception:
Dialects and Formatting Parameters
To make it easier to specify the format of input and output records, specific formatting parameters are grouped together into dialects. A dialect is a subclass of the :class:`Dialect` class having a set of specific methods and a single :meth:`validate` method. When creating :class:`reader` or :class:`writer` objects, the programmer can specify a string or a subclass of the :class:`Dialect` class as the dialect parameter. In addition to, or instead of, the dialect parameter, the programmer can also specify individual formatting parameters, which have the same names as the attributes defined below for the :class:`Dialect` class.
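The three ways of choosing a format described above can be sketched as follows. ``PipeDialect`` and ``render`` are hypothetical names used only for this illustration; the example passes a :class:`Dialect` subclass, then a dialect name plus an individual parameter, then both together.

```python
import csv
import io


class PipeDialect(csv.excel):
    """Hypothetical dialect: Excel conventions, but pipe-delimited."""
    delimiter = "|"


def render(row, **fmtparams):
    """Write one row with the given dialect/formatting parameters."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, **fmtparams).writerow(row)
    return buffer.getvalue()


row = ["x", "y|z"]

# A Dialect subclass as the dialect parameter.
by_class = render(row, dialect=PipeDialect)
# A registered dialect name, with one formatting parameter overridden.
by_override = render(row, dialect="excel", delimiter="|")
# A dialect and an individual parameter combined.
combined = render(row, dialect=PipeDialect, lineterminator="\n")

print(repr(by_class), repr(by_override), repr(combined))
```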
Dialects support the following attributes:
Reader Objects
Reader objects (:class:`DictReader` instances and objects returned by the :func:`reader` function) have the following public methods:
Reader objects have the following public attributes:
DictReader objects have the following public attribute:
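A short sketch of these attributes in use, assuming a made-up input whose second record spans two physical lines. It illustrates that ``line_num`` counts lines read from the source rather than records returned, and that ``fieldnames`` is taken from the first row when none are supplied.

```python
import csv
import io

# The quoted field in the first record contains an embedded newline.
text = 'id,comment\r\n1,"first line\nsecond line"\r\n2,short\r\n'

reader = csv.DictReader(io.StringIO(text, newline=""))

seen = []
for record in reader:
    # line_num is the number of source lines consumed so far.
    seen.append((reader.line_num, record["id"]))

print(reader.fieldnames)
print(seen)
```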
Writer Objects
:class:`Writer` objects (:class:`DictWriter` instances and objects returned by the :func:`writer` function) have the following public methods. For :class:`Writer` objects, a row must be an iterable of strings or numbers; for :class:`DictWriter` objects, it must be a dictionary mapping fieldnames to strings or numbers (which are passed through :func:`str` first). Note that complex numbers are written out surrounded by parentheses. This may cause problems for other programs that read CSV files (assuming they support complex numbers at all).
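The following sketch shows how a few non-string values end up in the output, including the parenthesized complex number mentioned above. The row contents are arbitrary, and the buffer stands in for a real file.

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

# A string, an int, a float, None and a complex number in one row.
writer.writerow(["label", 1, 2.5, None, 3 + 4j])

line = buffer.getvalue()
print(repr(line))
```

``None`` is written as an empty field, and the complex value appears as ``(3+4j)``, which many CSV consumers will treat as plain text.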
Writer objects have the following public attribute:
DictWriter objects have the following public method:
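A minimal sketch of a :class:`DictWriter` session, assuming two hypothetical columns named ``name`` and ``qty``. It writes the header row first and then the records; the column order comes from ``fieldnames``, not from the order of keys in each dictionary.

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "qty"])

writer.writeheader()                             # emits "name,qty"
writer.writerow({"name": "apple", "qty": 3})
writer.writerows([{"qty": 10, "name": "pear"}])  # key order does not matter

text = buffer.getvalue()
print(text)
```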
Examples
The simplest example of reading a CSV file:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Reading a file with an alternate format:
import csv
with open('passwd', newline='') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)
The corresponding simplest possible writing example is:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
Since :func:`open` is used to open a CSV file for reading, the file will by default be decoded into Unicode using the system default encoding (see :func:`locale.getpreferredencoding`). To decode a file using a different encoding, use the ``encoding`` argument of :func:`open`:
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
The same applies to writing in something other than the system default encoding: specify the encoding argument when opening the output file.
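For the writing side, a sketch along the following lines may help. It writes non-ASCII text as UTF-8 to a temporary file and reads it back; the file name ``cities.csv`` and the sample rows are invented for the example.

```python
import csv
import os
import tempfile

rows = [["city", "country"], ["Zürich", "Schweiz"], ["São Paulo", "Brasil"]]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "cities.csv")

    # Write with an explicit encoding instead of the system default.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    # Read back with the same encoding.
    with open(path, newline="", encoding="utf-8") as f:
        round_tripped = list(csv.reader(f))

    # Inspect the raw bytes to confirm what was stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(round_tripped)
```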
Registering a new dialect:
import csv
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
with open('passwd', newline='') as f:
    reader = csv.reader(f, 'unixpwd')
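The registry can also be inspected and cleaned up. The sketch below registers the same ``unixpwd`` dialect, uses it on a single made-up ``passwd``-style line held in memory, looks it up again, and finally unregisters it.

```python
import csv
import io

csv.register_dialect("unixpwd", delimiter=":", quoting=csv.QUOTE_NONE)

# A made-up passwd-style line; no real file is needed for the sketch.
line = "root:x:0:0:root:/root:/bin/sh\n"
fields = next(csv.reader(io.StringIO(line, newline=""), "unixpwd"))

# The dialect is now visible in the registry and can be looked up by name.
registered = "unixpwd" in csv.list_dialects()
delimiter = csv.get_dialect("unixpwd").delimiter

# Remove it again so the registration does not leak into other code.
csv.unregister_dialect("unixpwd")

print(fields)
```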
A slightly more advanced use of the reader --- catching and reporting errors:
import csv, sys
filename = 'some.csv'
with open(filename, newline='') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            print(row)
    except csv.Error as e:
        sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
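To see the error path actually trigger, one option is to enable the ``strict`` formatting parameter and feed the reader a malformed line. In this sketch the input lines are invented, and the problem is recorded rather than passed to :func:`sys.exit` so that the script keeps running.

```python
import csv

# The second line has stray text after a closing quote.
lines = ['ok,"quoted"', 'bad,"quoted"tail']

# strict=True makes the reader raise csv.Error on malformed input.
reader = csv.reader(lines, strict=True)

good_rows = []
problem = None
try:
    for row in reader:
        good_rows.append(row)
except csv.Error as exc:
    # line_num identifies the source line that was being parsed.
    problem = (reader.line_num, str(exc))

print(good_rows)
print(problem)
```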
And while the module doesn't directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
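For a longer string, especially one with newlines inside quoted fields, wrapping it in :class:`io.StringIO` is one possible approach. The sketch below contrasts that with splitting the string into lines first, which discards the line break inside the quoted field; the sample text is made up.

```python
import csv
import io

# The second record contains a newline inside a quoted field.
text = 'title,body\r\nnote,"line one\nline two"\r\n'

# Splitting first strips the line endings the reader would need.
naive = list(csv.reader(text.splitlines()))

# A StringIO opened with newline='' hands the lines over untouched.
correct = list(csv.reader(io.StringIO(text, newline="")))

print(naive)
print(correct)
```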
Footnotes
[1] If ``newline=''`` is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use ``\r\n`` line endings on write an extra ``\r`` will be added. It should always be safe to specify ``newline=''``, since the csv module does its own (:term:`universal <universal newlines>`) newline handling.