"""Helper class to quickly write a loop over all standard input files.

Typical use is:

    import fileinput
    for line in fileinput.input():
        process(line)

This iterates over the lines of all files listed in sys.argv[1:],
defaulting to sys.stdin if the list is empty.  If a filename is '-' it
is also replaced by sys.stdin.  To specify an alternative list of
filenames, pass it as the argument to input().  A single file name is
also allowed.

Functions filename(), lineno() return the filename and cumulative line
number of the line that has just been read; filelineno() returns its
line number in the current file; isfirstline() returns true iff the
line just read is the first line of its file; isstdin() returns true
iff the line was read from sys.stdin.  Function nextfile() closes the
current file so that the next iteration will read the first line from
the next file (if any); lines not read from the file will not count
towards the cumulative line count; the filename is not changed until
after the first line of the next file has been read.  Function close()
closes the sequence.

Before any lines have been read, filename() returns None and both line
numbers are zero; nextfile() has no effect.  After all lines have been
read, filename() and the line number functions return the values
pertaining to the last line read; nextfile() has no effect.
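
As an illustrative sketch of the state functions described above (not
part of the module): the file names a.txt and b.txt and the helpers
write_lines() and trace() are hypothetical, and the files are assumed
to live in a freshly created temporary directory.

```python
import fileinput
import os
import tempfile

NEWLINE = chr(10)  # the newline character


def write_lines(path, words):
    # Hypothetical helper: write each word on its own line.
    with open(path, "w") as f:
        for word in words:
            f.write(word + NEWLINE)


def trace(paths):
    # Collect (filename, lineno, filelineno, isfirstline) after each read.
    rows = []
    for line in fileinput.input(paths):
        rows.append((os.path.basename(fileinput.filename()),
                     fileinput.lineno(),
                     fileinput.filelineno(),
                     fileinput.isfirstline()))
    fileinput.close()
    return rows


tmpdir = tempfile.mkdtemp()
first = os.path.join(tmpdir, "a.txt")
second = os.path.join(tmpdir, "b.txt")
write_lines(first, ["one", "two"])
write_lines(second, ["three"])
rows = trace([first, second])
```

In this sketch rows holds one tuple per line read: the cumulative
lineno keeps counting across files, while filelineno restarts at 1 for
each new file.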

All files are opened in text mode by default; you can override this by
setting the mode parameter to input() or FileInput.__init__().
If an I/O error occurs while opening or reading a file, an IOError
exception is raised.

If sys.stdin is used more than once, the second and further use will
return no lines, except perhaps for interactive use, or if it has been
explicitly reset (e.g. using sys.stdin.seek(0)).

Empty files are opened and immediately closed; the only time their
presence in the list of filenames is noticeable at all is when the
last file opened is empty.

It is possible that the last line of a file doesn't end in a newline
character; otherwise lines are returned including the trailing
newline.

Class FileInput is the implementation; its methods filename(),
lineno(), filelineno(), fileno(), isfirstline(), isstdin(), nextfile()
and close() correspond to the functions in the module.  In addition it has a
readline() method which returns the next input line, and a
__getitem__() method which implements the sequence behavior.  The
sequence must be accessed in strictly sequential order; sequence
access and readline() cannot be mixed.
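
A minimal sketch of driving a FileInput instance directly through
readline(), assuming a small two-line file; the file name data.txt and
the variable names are hypothetical and chosen only for illustration.

```python
import fileinput
import os
import tempfile

NEWLINE = chr(10)  # the newline character

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.txt")
with open(path, "w") as f:
    f.write("first" + NEWLINE + "second" + NEWLINE)

# A single file name is accepted in place of a list of names.
reader = fileinput.FileInput(path)
head = reader.readline()
# The state methods describe the line that has just been read.
position = (reader.lineno(), reader.filelineno(), reader.isfirstline())
rest = reader.readline()
end = reader.readline()  # readline() returns an empty string at end of input
reader.close()
```

Using a private instance like this leaves the module-level state set up
by input() untouched.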

Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file.
This makes it possible to write a filter that rewrites its input file
in place.  If the keyword argument backup=".<some extension>" is also
given, it specifies the extension for the backup file, and the backup
file remains around; by default, the extension is ".bak" and it is
deleted when the output file is closed.  In-place filtering is
disabled when standard input is read.  XXX The current implementation
does not work for MS-DOS 8+3 filesystems.
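
A minimal sketch of an in-place filter, assuming a small scratch file in
a temporary directory; the helper uppercase_in_place(), the file name
notes.txt and the ".orig" backup extension are hypothetical choices made
for illustration.

```python
import fileinput
import os
import sys
import tempfile

NEWLINE = chr(10)  # the newline character


def uppercase_in_place(path):
    # While inplace is set, text written to sys.stdout lands in the file
    # at path; the original content is kept under path + ".orig".
    for line in fileinput.input(path, inplace=1, backup=".orig"):
        sys.stdout.write(line.upper())
    fileinput.close()


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "notes.txt")
with open(path, "w") as f:
    f.write("alpha" + NEWLINE + "beta" + NEWLINE)

uppercase_in_place(path)

with open(path) as f:
    rewritten = f.read()
with open(path + ".orig") as f:
    original = f.read()
```

Because an explicit backup extension is given, the backup file remains
after the loop; without it the ".bak" copy would be deleted when the
output file is closed.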

Performance: this module is unfortunately one of the slower ways of
processing large numbers of input lines.  Nevertheless, a significant
speed-up has been obtained by using readlines(bufsize) instead of
readline().  A new keyword argument, bufsize=N, is present on the
input() function and the FileInput() class to override the default
buffer size.

XXX Possible additions:

- optional getopt argument processing
- isatty()
- read(), read(size), even readlines()

"""

import sys, os

__all__ = ["input","close","nextfile","filename","lineno","filelineno",
           "isfirstline","isstdin","FileInput"]

_state = None

DEFAULT_BUFSIZE = 8*1024

def input(files=None, inplace=0, backup="", bufsize=0,
          mode="r", openhook=None):
    """input([files[, inplace[, backup[, bufsize[, mode[, openhook]]]]]])

    Create an instance of the FileInput class. The instance will be used
    as global state for the functions of this module, and is also returned
    to use during iteration. The parameters to this function will be passed
    along to the constructor of the FileInput class.
    """
    global _state
    if _state and _state._file:
        raise RuntimeError, "input() already active"
    _state = FileInput(files, inplace, backup, bufsize, mode, openhook)
    return _state

def close():
    """Close the sequence."""
    global _state
    state = _state
    _state = None
    if state:
        state.close()

def nextfile():
    """
    Close the current file so that the next iteration will read the first
    line from the next file (if any); lines not read from the file will
    not count towards the cumulative line count. The filename is not
    changed until after the first line of the next file has been read.
    Before the first line has been read, this function has no effect;
    it cannot be used to skip the first file. After the last line of the
    last file has been read, this function has no effect.
    """
    if not _state:
        raise RuntimeError, "no active input()"
    return _state.nextfile()

def filename():
    """
    Return the name of the file currently being read.
    Before the first line has been read, returns None.
    """
    if not _state:
        raise RuntimeError, "no active input()"
    return _state.filename()

def lineno():
    """
    Return the cumulative line number of the line that has just been read.
    Before the first line has been read, returns 0. After the last line
    of the last file has been read, returns the line number of that line.
    """
    if not _state:
        raise RuntimeError, "no active input()"
    return _state.lineno()

def filelineno():
    """
    Return the line number in the current file. Before the first line
    has been read, returns 0. After the last line of the last file has
    been read, returns the line number of that line within the file.
    """
    if not _state:
        raise RuntimeError, "no active input()"
    return _state.filelineno()

def fileno():
    """
    Return the file number of the current file. When no file is currently
    opened, returns -1.
    """
    if not _state:
        raise RuntimeError, "no active input()"
    return _state.fileno()

def isfirstline():
    """
    Returns true if the line just read is the first line of its file,
    otherwise returns false.
    """
    if not _state:
        raise RuntimeError, "no active input()"
    return _state.isfirstline()

def isstdin():
    """
    Returns true if the last line was read from sys.stdin,
    otherwise returns false.
    """
    if not _state:
        raise RuntimeError, "no active input()"
    return _state.isstdin()

class FileInput:
    """class FileInput([files[, inplace[, backup[, bufsize[, mode[, openhook]]]]]])

    Class FileInput is the implementation of the module; its methods
    filename(), lineno(), filelineno(), isfirstline(), isstdin(), fileno(),
    nextfile() and close() correspond to the functions of the same name
    in the module.
    In addition it has a readline() method which returns the next
    input line, and a __getitem__() method which implements the
    sequence behavior. The sequence must be accessed in strictly
    sequential order; random access and readline() cannot be mixed.
    """

    def __init__(self, files=None, inplace=0, backup="", bufsize=0,
                 mode="r", openhook=None):
        if isinstance(files, basestring):
            files = (files,)
        else:
            if files is None:
                files = sys.argv[1:]
            if not files:
                files = ('-',)
            else:
                files = tuple(files)
        self._files = files
        self._inplace = inplace
        self._backup = backup
        self._bufsize = bufsize or DEFAULT_BUFSIZE
        self._savestdout = None
        self._output = None
        self._filename = None
        self._lineno = 0
        self._filelineno = 0
        self._file = None
        self._isstdin = False
        self._backupfilename = None
        self._buffer = []
        self._bufindex = 0
        # restrict mode argument to reading modes
        if mode not in ('r', 'rU', 'U', 'rb'):
            raise ValueError("FileInput opening mode must be one of "
                             "'r', 'rU', 'U' and 'rb'")
        self._mode = mode
        if inplace and openhook:
            raise ValueError("FileInput cannot use an opening hook in inplace mode")
        elif openhook and not hasattr(openhook, '__call__'):
            raise ValueError("FileInput openhook must be callable")
        self._openhook = openhook

    def __del__(self):
        self.close()

    def close(self):
        self.nextfile()
        self._files = ()

    def __iter__(self):
        return self

    def next(self):
        try:
            line = self._buffer[self._bufindex]
        except IndexError:
            pass
        else:
            self._bufindex += 1
            self._lineno += 1
            self._filelineno += 1
            return line
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    def __getitem__(self, i):
        if i != self._lineno:
            raise RuntimeError, "accessing lines out of order"
        try:
            return self.next()
        except StopIteration:
            raise IndexError, "end of input reached"

    def nextfile(self):
        savestdout = self._savestdout
        self._savestdout = 0
        if savestdout:
            sys.stdout = savestdout

        output = self._output
        self._output = 0
        if output:
            output.close()

        file = self._file
        self._file = 0
        if file and not self._isstdin:
            file.close()

        backupfilename = self._backupfilename
        self._backupfilename = 0
        if backupfilename and not self._backup:
            try: os.unlink(backupfilename)
            except OSError: pass

        self._isstdin = False
        self._buffer = []
        self._bufindex = 0

    def readline(self):
        try:
            line = self._buffer[self._bufindex]
        except IndexError:
            pass
        else:
            self._bufindex += 1
            self._lineno += 1
            self._filelineno += 1
            return line
        if not self._file:
            if not self._files:
                return ""
            self._filename = self._files[0]
            self._files = self._files[1:]
            self._filelineno = 0
            self._file = None
            self._isstdin = False
            self._backupfilename = 0
            if self._filename == '-':
                self._filename = '<stdin>'
                self._file = sys.stdin
                self._isstdin = True
            else:
                if self._inplace:
                    self._backupfilename = (
                        self._filename + (self._backup or os.extsep+"bak"))
                    try: os.unlink(self._backupfilename)
                    except os.error: pass
                    # The next few lines may raise IOError
                    os.rename(self._filename, self._backupfilename)
                    self._file = open(self._backupfilename, self._mode)
                    try:
                        perm = os.fstat(self._file.fileno()).st_mode
                    except OSError:
                        self._output = open(self._filename, "w")
                    else:
                        fd = os.open(self._filename,
                                     os.O_CREAT | os.O_WRONLY | os.O_TRUNC,
                                     perm)
                        self._output = os.fdopen(fd, "w")
                        try:
                            if hasattr(os, 'chmod'):
                                os.chmod(self._filename, perm)
                        except OSError:
                            pass
                    self._savestdout = sys.stdout
                    sys.stdout = self._output
                else:
                    # This may raise IOError
                    if self._openhook:
                        self._file = self._openhook(self._filename, self._mode)
                    else:
                        self._file = open(self._filename, self._mode)
        self._buffer = self._file.readlines(self._bufsize)
        self._bufindex = 0
        if not self._buffer:
            self.nextfile()
        # Recursive call
        return self.readline()

    def filename(self):
        return self._filename

    def lineno(self):
        return self._lineno

    def filelineno(self):
        return self._filelineno

    def fileno(self):
        if self._file:
            try:
                return self._file.fileno()
            except ValueError:
                return -1
        else:
            return -1

    def isfirstline(self):
        return self._filelineno == 1

    def isstdin(self):
        return self._isstdin


def hook_compressed(filename, mode):
    ext = os.path.splitext(filename)[1]
    if ext == '.gz':
        import gzip
        return gzip.open(filename, mode)
    elif ext == '.bz2':
        import bz2
        return bz2.BZ2File(filename, mode)
    else:
        return open(filename, mode)


def hook_encoded(encoding):
    import codecs
    def openhook(filename, mode):
        return codecs.open(filename, mode, encoding)
    return openhook


def _test():
    import getopt
    inplace = 0
    backup = 0
    opts, args = getopt.getopt(sys.argv[1:], "ib:")
    for o, a in opts:
        if o == '-i': inplace = 1
        if o == '-b': backup = a
    for line in input(args, inplace=inplace, backup=backup):
        if line[-1:] == '\n': line = line[:-1]
        if line[-1:] == '\r': line = line[:-1]
        print "%d: %s[%d]%s %s" % (lineno(), filename(), filelineno(),
                                   isfirstline() and "*" or "", line)
    print "%d: %s[%d]" % (lineno(), filename(), filelineno())

if __name__ == '__main__':
    _test()