Fix crashing bug in tokenizer when tokenizing files with non-ASCII bytes · 7eaf2aaf
    Thomas Wouters authored
    but without a specified encoding: decoding_fgets() (and decoding_feof()) can
    return NULL and fiddle with the 'tok' struct, making tok->buf NULL. That
    seems to be okay at the other decoding_*() call sites, but not at this one.
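    
    As a self-contained reduction of the failure mode (the struct and the
    decoding_fgets() stand-in below are hypothetical simplifications, not
    code from tokenizer.c or from the actual diff):
    
        #include <stdio.h>
        #include <stdlib.h>
    
        /* Hypothetical stand-ins for the tokenizer state and decoding_fgets();
           the names follow tokenizer.c, but the logic is reduced to the bug. */
        struct tok_state {
            char *buf;   /* input buffer; the decoding path may free and NULL it */
            char *cur;   /* current position within buf */
        };
    
        /* Simplified decoding_fgets(): on a decoding error it returns NULL
           *and* leaves tok->buf NULL, which is what the caller tripped over. */
        static char *decoding_fgets(char *s, int size, struct tok_state *tok)
        {
            (void)s; (void)size;
            free(tok->buf);
            tok->buf = NULL;
            return NULL;
        }
    
        int main(void)
        {
            struct tok_state tok = { malloc(64), NULL };
            char line[64];
    
            if (decoding_fgets(line, (int)sizeof(line), &tok) == NULL) {
                /* The fix, in spirit: bail out on the NULL return before
                   computing a new position from the now-NULL tok.buf. */
                fprintf(stderr, "decoding error: tok->buf is NULL, aborting\n");
                return 1;
            }
            tok.cur = tok.buf;   /* only safe when decoding_fgets() succeeded */
            return 0;
        }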
    
    This should get a test added somewhere, but the test suite doesn't seem to
    test encoding handling anywhere (although plenty of tests use it).
    
    It seems to me that decoding errors in other places in the code (at the
    start of a token, say, instead of in the middle of one) make the code end
    up adding small integers to NULL pointers, but those places happen to check
    for error states before using the calculated pointers (see the sketch
    below). In any case, I haven't been able to trigger any other crashes.
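    
    A hypothetical reduction of that pattern, with made-up error-code values
    standing in for the errcode.h constants:
    
        #include <stdio.h>
    
        #define E_OK     10   /* placeholder values, not from errcode.h */
        #define E_DECODE 22
    
        struct tok_state { char *buf; char *cur; int done; };
    
        static int next_char(struct tok_state *tok)
        {
            tok->cur = tok->buf + 1;   /* NULL + 1 after a decoding failure:
                                          already undefined behavior in C,
                                          even if the result is never used */
            if (tok->done != E_OK)     /* ...but this check keeps the bad
                                          pointer from being dereferenced */
                return EOF;
            return *tok->cur;
        }
    
        int main(void)
        {
            struct tok_state tok = { NULL, NULL, E_DECODE };
            printf("%d\n", next_char(&tok));   /* prints EOF (-1), no crash */
            return 0;
        }
    
    Strictly speaking, the NULL + 1 is itself undefined behavior, so those
    sites only avoid crashing by accident; one more argument for the rewrite
    suggested below.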
    
    I would nominate this file for a complete rewrite for Py3k. The whole
    decoding trick is too bolted-on for my tastes.