"""Helper class to quickly write a loop over all standard input files.

Typical use is:

    import fileinput
    for line in fileinput.input():
        process(line)

This iterates over the lines of all files listed in sys.argv[1:],
defaulting to sys.stdin if the list is empty.  If a filename is '-' it
is also replaced by sys.stdin.  To specify an alternative list of
filenames, pass it as the argument to input().  A single file name is
also allowed.

Functions filename(), lineno() return the filename and cumulative line
number of the line that has just been read; filelineno() returns its
line number in the current file; isfirstline() returns true iff the
line just read is the first line of its file; isstdin() returns true
iff the line was read from sys.stdin.  Function nextfile() closes the
current file so that the next iteration will read the first line from
the next file (if any); lines not read from the file will not count
towards the cumulative line count; the filename is not changed until
after the first line of the next file has been read.  Function close()
closes the sequence.

Before any lines have been read, filename() returns None and both line
numbers are zero; nextfile() has no effect.  After all lines have been
read, filename() and the line number functions return the values
pertaining to the last line read; nextfile() has no effect.
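
As an illustration of the functions described above (a sketch, not part
of the module itself), the following collects the values they report
while reading two small temporary files.  The helper name
number_lines() and the file names a.txt and b.txt are hypothetical.

```python
import fileinput
import os
import tempfile

def number_lines(paths):
    # Record (cumulative line number, line number within the file,
    # first-line flag, base name of the file) for every line read.
    records = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            records.append((fileinput.lineno(),
                            fileinput.filelineno(),
                            fileinput.isfirstline(),
                            os.path.basename(fileinput.filename())))
    return records

# Hypothetical demo data: two small temporary files.
workdir = tempfile.mkdtemp()
paths = []
for name, lines in (("a.txt", ["one", "two"]), ("b.txt", ["three"])):
    path = os.path.join(workdir, name)
    with open(path, "w") as fh:
        for text in lines:
            print(text, file=fh)
    paths.append(path)

records = number_lines(paths)
```

The cumulative number keeps growing across files, while the per-file
number and the first-line flag start over with each new file.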

All files are opened in text mode by default; you can override this by
setting the mode parameter of input() or FileInput.__init__().
If an I/O error occurs during opening or reading a file, the IOError
exception is raised.

If sys.stdin is used more than once, the second and further use will
return no lines, except perhaps for interactive use, or if it has been
explicitly reset (e.g. using sys.stdin.seek(0)).

Empty files are opened and immediately closed; the only time their
presence in the list of filenames is noticeable at all is when the
last file opened is empty.

It is possible that the last line of a file doesn't end in a newline
character; otherwise lines are returned including the trailing
newline.

Class FileInput is the implementation; its methods filename(),
lineno(), filelineno(), isfirstline(), isstdin(), nextfile() and close()
correspond to the functions in the module.  In addition it has a
readline() method which returns the next input line, and a
__getitem__() method which implements the sequence behavior.  The
sequence must be accessed in strictly sequential order; sequence
access and readline() cannot be mixed.

Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file.
This makes it possible to write a filter that rewrites its input file
in place.  If the keyword argument backup=".<some extension>" is also
given, it specifies the extension for the backup file, and the backup
file remains around; by default, the extension is ".bak" and it is
deleted when the output file is closed.  In-place filtering is
disabled when standard input is read.  XXX The current implementation
does not work for MS-DOS 8+3 filesystems.
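
A minimal sketch of an in-place filter, assuming a writable temporary
file: everything printed inside the loop replaces the file's contents,
and the original is kept under the backup extension.  The helper name
uppercase_in_place(), the file name notes.txt and the ".orig" extension
are hypothetical choices for this illustration.

```python
import fileinput
import os
import tempfile

def uppercase_in_place(path, backup_ext=".orig"):
    # While in-place filtering is active, standard output is directed
    # to the file being rewritten, so print() writes into that file.
    with fileinput.input(files=[path], inplace=True,
                         backup=backup_ext) as stream:
        for line in stream:
            print(line.upper(), end="")

# Hypothetical demo data: one small temporary file.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "notes.txt")
with open(target, "w") as fh:
    print("hello", file=fh)
    print("world", file=fh)

uppercase_in_place(target)

# Read back the rewritten file and the backup that was kept.
with open(target) as fh:
    rewritten = fh.read().splitlines()
with open(target + ".orig") as fh:
    original = fh.read().splitlines()
```

Because a backup extension was given, the backup file remains after the
filter finishes; without one, the temporary ".bak" file is deleted.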

Performance: this module is unfortunately one of the slower ways of
processing large numbers of input lines.  Nevertheless, a significant
speed-up has been obtained by using readlines(bufsize) instead of
readline().  A new keyword argument, bufsize=N, is present on the
input() function and the FileInput() class to override the default
buffer size.

XXX Possible additions:

- optional getopt argument processing
- isatty()
- read(), read(size), even readlines()

"""

import sys, os

__all__ = ["input", "close", "nextfile", "filename", "lineno", "filelineno",
           "fileno", "isfirstline", "isstdin", "FileInput",
           "hook_compressed", "hook_encoded"]

_state = None

DEFAULT_BUFSIZE = 8*1024

def input(files=None, inplace=False, backup="", bufsize=0,
          mode="r", openhook=None):
    """input(files=None, inplace=False, backup="", bufsize=0, \
mode="r", openhook=None)

    Create an instance of the FileInput class. The instance will be used
    as global state for the functions of this module, and is also returned
    to use during iteration. The parameters to this function will be passed
    along to the constructor of the FileInput class.
    """
    global _state
    if _state and _state._file:
        raise RuntimeError("input() already active")
    _state = FileInput(files, inplace, backup, bufsize, mode, openhook)
    return _state

def close():
    """Close the sequence."""
    global _state
    state = _state
    _state = None
    if state:
        state.close()

def nextfile():
    """
    Close the current file so that the next iteration will read the first
    line from the next file (if any); lines not read from the file will
    not count towards the cumulative line count. The filename is not
    changed until after the first line of the next file has been read.
    Before the first line has been read, this function has no effect;
    it cannot be used to skip the first file. After the last line of the
    last file has been read, this function has no effect.
    """
    if not _state:
        raise RuntimeError("no active input()")
    return _state.nextfile()

def filename():
    """
    Return the name of the file currently being read.
    Before the first line has been read, returns None.
    """
    if not _state:
        raise RuntimeError("no active input()")
    return _state.filename()

def lineno():
    """
    Return the cumulative line number of the line that has just been read.
    Before the first line has been read, returns 0. After the last line
    of the last file has been read, returns the line number of that line.
    """
    if not _state:
        raise RuntimeError("no active input()")
    return _state.lineno()

def filelineno():
    """
    Return the line number in the current file. Before the first line
    has been read, returns 0. After the last line of the last file has
    been read, returns the line number of that line within the file.
    """
    if not _state:
        raise RuntimeError("no active input()")
    return _state.filelineno()

def fileno():
    """
    Return the file number of the current file. When no file is currently
    opened, returns -1.
    """
    if not _state:
        raise RuntimeError("no active input()")
    return _state.fileno()

def isfirstline():
    """
    Returns true if the line just read is the first line of its file,
    otherwise returns false.
    """
    if not _state:
        raise RuntimeError("no active input()")
    return _state.isfirstline()

def isstdin():
    """
    Returns true if the last line was read from sys.stdin,
    otherwise returns false.
    """
    if not _state:
        raise RuntimeError("no active input()")
    return _state.isstdin()

class FileInput:
    """class FileInput([files[, inplace[, backup[, bufsize[, mode[, openhook]]]]]])

    Class FileInput is the implementation of the module; its methods
    filename(), lineno(), filelineno(), isfirstline(), isstdin(), fileno(),
    nextfile() and close() correspond to the functions of the same name
    in the module.
    In addition it has a readline() method which returns the next
    input line, and a __getitem__() method which implements the
    sequence behavior. The sequence must be accessed in strictly
    sequential order; random access and readline() cannot be mixed.
    """

    def __init__(self, files=None, inplace=False, backup="", bufsize=0,
                 mode="r", openhook=None):
        if isinstance(files, str):
            files = (files,)
        else:
            if files is None:
                files = sys.argv[1:]
            if not files:
                files = ('-',)
            else:
                files = tuple(files)
        self._files = files
        self._inplace = inplace
        self._backup = backup
        self._bufsize = bufsize or DEFAULT_BUFSIZE
        self._savestdout = None
        self._output = None
        self._filename = None
        self._lineno = 0
        self._filelineno = 0
        self._file = None
        self._isstdin = False
        self._backupfilename = None
        self._buffer = []
        self._bufindex = 0
        # restrict mode argument to reading modes
        if mode not in ('r', 'rU', 'U', 'rb'):
            raise ValueError("FileInput opening mode must be one of "
                             "'r', 'rU', 'U' and 'rb'")
        self._mode = mode
        if inplace and openhook:
            raise ValueError("FileInput cannot use an opening hook in inplace mode")
        elif openhook and not hasattr(openhook, '__call__'):
            raise ValueError("FileInput openhook must be callable")
        self._openhook = openhook

    def __del__(self):
        self.close()

    def close(self):
        self.nextfile()
        self._files = ()

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    def __iter__(self):
        return self

    def __next__(self):
        try:
            line = self._buffer[self._bufindex]
        except IndexError:
            pass
        else:
            self._bufindex += 1
            self._lineno += 1
            self._filelineno += 1
            return line
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    def __getitem__(self, i):
        if i != self._lineno:
            raise RuntimeError("accessing lines out of order")
        try:
            return self.__next__()
        except StopIteration:
            raise IndexError("end of input reached")

    def nextfile(self):
        savestdout = self._savestdout
        self._savestdout = 0
        if savestdout:
            sys.stdout = savestdout

        output = self._output
        self._output = 0
        if output:
            output.close()

        file = self._file
        self._file = 0
        if file and not self._isstdin:
            file.close()

        backupfilename = self._backupfilename
        self._backupfilename = 0
        if backupfilename and not self._backup:
            try: os.unlink(backupfilename)
            except OSError: pass

        self._isstdin = False
        self._buffer = []
        self._bufindex = 0

    def readline(self):
        try:
            line = self._buffer[self._bufindex]
        except IndexError:
            pass
        else:
            self._bufindex += 1
            self._lineno += 1
            self._filelineno += 1
            return line
        if not self._file:
            if not self._files:
                return ""
            self._filename = self._files[0]
            self._files = self._files[1:]
            self._filelineno = 0
            self._file = None
            self._isstdin = False
            self._backupfilename = 0
            if self._filename == '-':
                self._filename = '<stdin>'
                self._file = sys.stdin
                self._isstdin = True
            else:
                if self._inplace:
                    self._backupfilename = (
                        self._filename + (self._backup or ".bak"))
                    try: os.unlink(self._backupfilename)
                    except os.error: pass
                    # The next few lines may raise IOError
                    os.rename(self._filename, self._backupfilename)
                    self._file = open(self._backupfilename, self._mode)
                    try:
                        perm = os.fstat(self._file.fileno()).st_mode
                    except OSError:
                        self._output = open(self._filename, "w")
                    else:
                        mode = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
                        if hasattr(os, 'O_BINARY'):
                            mode |= os.O_BINARY

                        fd = os.open(self._filename, mode, perm)
                        self._output = os.fdopen(fd, "w")
                        try:
                            if hasattr(os, 'chmod'):
                                os.chmod(self._filename, perm)
                        except OSError:
                            pass
                    self._savestdout = sys.stdout
                    sys.stdout = self._output
                else:
                    # This may raise IOError
                    if self._openhook:
                        self._file = self._openhook(self._filename, self._mode)
                    else:
                        self._file = open(self._filename, self._mode)
        self._buffer = self._file.readlines(self._bufsize)
        self._bufindex = 0
        if not self._buffer:
            self.nextfile()
        # Recursive call
        return self.readline()

    def filename(self):
        return self._filename

    def lineno(self):
        return self._lineno

    def filelineno(self):
        return self._filelineno

    def fileno(self):
        if self._file:
            try:
                return self._file.fileno()
            except ValueError:
                return -1
        else:
            return -1

    def isfirstline(self):
        return self._filelineno == 1

    def isstdin(self):
        return self._isstdin


def hook_compressed(filename, mode):
    ext = os.path.splitext(filename)[1]
    if ext == '.gz':
        import gzip
        return gzip.open(filename, mode)
    elif ext == '.bz2':
        import bz2
        return bz2.BZ2File(filename, mode)
    else:
        return open(filename, mode)


def hook_encoded(encoding):
    import codecs
    def openhook(filename, mode):
        return codecs.open(filename, mode, encoding)
    return openhook


def _test():
    import getopt
    inplace = False
    backup = False
    opts, args = getopt.getopt(sys.argv[1:], "ib:")
    for o, a in opts:
        if o == '-i': inplace = True
        if o == '-b': backup = a
    for line in input(args, inplace=inplace, backup=backup):
        if line[-1:] == '\n': line = line[:-1]
        if line[-1:] == '\r': line = line[:-1]
        print("%d: %s[%d]%s %s" % (lineno(), filename(), filelineno(),
                                   isfirstline() and "*" or "", line))
    print("%d: %s[%d]" % (lineno(), filename(), filelineno()))

if __name__ == '__main__':
    _test()