Commit c6b607d4 authored by Benjamin Peterson

port simplejson upgrade from the trunk #4136

json also now works only with unicode strings

Patch by Antoine Pitrou; updated by me
parent 7255f185
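As a rough illustration of the commit message's point that json now works with text strings only, the sketch below checks that serialization returns `str` and that a `bytes` value is rejected by the encoder. The variable names are illustrative and not part of the patch. How `bytes` input is treated when decoding differs between Python versions, so only the encoding side is shown.

```python
import json

# Serializing yields text (str), never bytes.
encoded = json.dumps({"key": "value"})
print(type(encoded).__name__)

# bytes objects are not serializable by the default encoder.
try:
    json.dumps(b"hi")
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
print(bytes_rejected)
```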
@@ -112,7 +112,7 @@ Using json.tool from the shell to validate and pretty-print::
Basic Usage
-----------
.. function:: dump(obj, fp[, skipkeys[, ensure_ascii[, check_circular[, allow_nan[, cls[, indent[, separators[, encoding[, default[, **kw]]]]]]]]]])
.. function:: dump(obj, fp[, skipkeys[, ensure_ascii[, check_circular[, allow_nan[, cls[, indent[, separators[, default[, **kw]]]]]]]]]])
Serialize *obj* as a JSON formatted stream to *fp* (a ``.write()``-supporting
file-like object).
@@ -122,11 +122,10 @@ Basic Usage
:class:`float`, :class:`bool`, ``None``) will be skipped instead of raising a
:exc:`TypeError`.
If *ensure_ascii* is ``False`` (default: ``True``), then some chunks written
to *fp* may be :class:`unicode` instances, subject to normal Python
:class:`str` to :class:`unicode` coercion rules. Unless ``fp.write()``
explicitly understands :class:`unicode` (as in :func:`codecs.getwriter`) this
is likely to cause an error.
The :mod:`json` module always produces :class:`str` objects, not
:class:`bytes` objects. Therefore, ``fp.write()`` must support :class:`str`
input.
If *check_circular* is ``False`` (default: ``True``), then the circular
reference check for container types will be skipped and a circular reference
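The requirement in the hunk above, that ``fp.write()`` must support :class:`str` input, can be sketched with the standard in-memory streams. This is a minimal sketch and the stream variable names are made up for the example: a text stream accepts the output, while a binary stream is expected to fail with :exc:`TypeError`.

```python
import io
import json

# A text stream accepts the str chunks that json.dump() writes.
text_stream = io.StringIO()
json.dump({"a": [1, 2]}, text_stream)
text_output = text_stream.getvalue()

# A binary stream only accepts bytes, so writing str chunks fails.
binary_stream = io.BytesIO()
try:
    json.dump({"a": [1, 2]}, binary_stream)
    binary_failed = False
except TypeError:
    binary_failed = True

print(text_output, binary_failed)
```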
@@ -146,8 +145,6 @@ Basic Usage
will be used instead of the default ``(', ', ': ')`` separators. ``(',',
':')`` is the most compact JSON representation.
*encoding* is the character encoding for str instances, default is UTF-8.
*default(obj)* is a function that should return a serializable version of
*obj* or raise :exc:`TypeError`. The default simply raises :exc:`TypeError`.
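The *separators* and *default* arguments described in the hunk above can be sketched together. ``encode_extra`` is a hypothetical fallback written for this example, not something the patch provides: it converts sets to sorted lists and raises :exc:`TypeError` for anything else, as the documentation asks of a *default* function.

```python
import json

def encode_extra(obj):
    """Hypothetical fallback: serialize sets as sorted lists, reject the rest."""
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError("cannot serialize {0!r}".format(obj))

# (',', ':') drops the spaces that the default separators insert.
compact = json.dumps({"tags": {"b", "a"}},
                     default=encode_extra,
                     separators=(",", ":"))
print(compact)

# Objects the fallback does not handle still raise TypeError.
try:
    json.dumps(object(), default=encode_extra)
    unsupported_rejected = False
except TypeError:
    unsupported_rejected = True
```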
@@ -156,26 +153,17 @@ Basic Usage
*cls* kwarg.
.. function:: dumps(obj[, skipkeys[, ensure_ascii[, check_circular[, allow_nan[, cls[, indent[, separators[, encoding[, default[, **kw]]]]]]]]]])
.. function:: dumps(obj[, skipkeys[, ensure_ascii[, check_circular[, allow_nan[, cls[, indent[, separators[, default[, **kw]]]]]]]]]])
Serialize *obj* to a JSON formatted :class:`str`.
Serialize *obj* to a JSON formatted :class:`str`. The arguments have the
same meaning as in :func:`dump`.
If *ensure_ascii* is ``False``, then the return value will be a
:class:`unicode` instance. The other arguments have the same meaning as in
:func:`dump`.
.. function:: load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
.. function:: load(fp[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
Deserialize *fp* (a ``.read()``-supporting file-like object containing a JSON
document) to a Python object.
If the contents of *fp* are encoded with an ASCII based encoding other than
UTF-8 (e.g. latin-1), then an appropriate *encoding* name must be specified.
Encodings that are not ASCII based (such as UCS-2) are not allowed, and
should be wrapped with ``codecs.getreader(encoding)(fp)``, or simply decoded
to a :class:`unicode` object and passed to :func:`loads`.
*object_hook* is an optional function that will be called with the result of
any object literal decode (a :class:`dict`). The return value of
*object_hook* will be used instead of the :class:`dict`. This feature can be used
@@ -241,7 +229,7 @@ Encoders and decoders
+---------------+-------------------+
| array | list |
+---------------+-------------------+
| string | unicode |
| string | str |
+---------------+-------------------+
| number (int) | int |
+---------------+-------------------+
@@ -257,13 +245,6 @@ Encoders and decoders
It also understands ``NaN``, ``Infinity``, and ``-Infinity`` as their
corresponding ``float`` values, which is outside the JSON spec.
*encoding* determines the encoding used to interpret any :class:`str` objects
decoded by this instance (UTF-8 by default). It has no effect when decoding
:class:`unicode` objects.
Note that currently only encodings that are a superset of ASCII work, strings
of other encodings should be passed in as :class:`unicode`.
*object_hook*, if specified, will be called with the result of every JSON
object decoded and its return value will be used in place of the given
:class:`dict`. This can be used to provide custom deserializations (e.g. to
@@ -298,20 +279,20 @@ Encoders and decoders
.. method:: decode(s)
Return the Python representation of *s* (a :class:`str` or
:class:`unicode` instance containing a JSON document)
Return the Python representation of *s* (a :class:`str` instance
containing a JSON document)
.. method:: raw_decode(s)
Decode a JSON document from *s* (a :class:`str` or :class:`unicode`
beginning with a JSON document) and return a 2-tuple of the Python
representation and the index in *s* where the document ended.
Decode a JSON document from *s* (a :class:`str` beginning with a
JSON document) and return a 2-tuple of the Python representation
and the index in *s* where the document ended.
This can be used to decode a JSON document from a string that may have
extraneous data at the end.
.. class:: JSONEncoder([skipkeys[, ensure_ascii[, check_circular[, allow_nan[, sort_keys[, indent[, separators[, encoding[, default]]]]]]]]])
.. class:: JSONEncoder([skipkeys[, ensure_ascii[, check_circular[, allow_nan[, sort_keys[, indent[, separators[, default]]]]]]]])
Extensible JSON encoder for Python data structures.
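The ``raw_decode`` behaviour described in the hunk above (a 2-tuple of the decoded value and the index where the document ended) can be sketched as follows. The sample text and variable names are made up for illustration.

```python
import json

decoder = json.JSONDecoder()
text = '{"a": 1} trailing text'

# raw_decode stops at the end of the first JSON document and reports
# the index where it stopped, so the remainder can be handled separately.
obj, end = decoder.raw_decode(text)
rest = text[end:]
print(obj, end, repr(rest))
```

Note that the input is expected to begin with the JSON document itself; this sketch does not cover leading whitespace.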
@@ -324,7 +305,7 @@ Encoders and decoders
+-------------------+---------------+
| list, tuple | array |
+-------------------+---------------+
| str, unicode | string |
| str | string |
+-------------------+---------------+
| int, float | number |
+-------------------+---------------+
@@ -344,9 +325,9 @@ Encoders and decoders
attempt encoding of keys that are not str, int, float or None. If
*skipkeys* is ``True``, such items are simply skipped.
If *ensure_ascii* is ``True`` (the default), the output is guaranteed to be
:class:`str` objects with all incoming unicode characters escaped. If
*ensure_ascii* is ``False``, the output will be a unicode object.
If *ensure_ascii* is ``True`` (the default), the output is guaranteed to
have all incoming non-ASCII characters escaped. If *ensure_ascii* is
``False``, these characters will be output as-is.
If *check_circular* is ``True`` (the default), then lists, dicts, and custom
encoded objects will be checked for circular references during encoding to
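A small sketch of the *ensure_ascii* behaviour described in the hunk above, reusing strings that appear in this commit's test cases. The variable names are illustrative.

```python
import json

text = "\u03b1\u03a9"  # Greek small alpha, Greek capital omega

# Default (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(text)

# ensure_ascii=False: the characters are output as-is.
raw = json.dumps(text, ensure_ascii=False)

# A character outside the Basic Multilingual Plane is escaped
# as a UTF-16 surrogate pair.
astral = json.dumps("\U0001d120")

print(escaped, raw, astral)
```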
@@ -376,10 +357,6 @@ Encoders and decoders
otherwise be serialized. It should return a JSON encodable version of the
object or raise a :exc:`TypeError`.
If *encoding* is not ``None``, then all input strings will be transformed
into unicode using that encoding prior to JSON-encoding. The default is
UTF-8.
.. method:: default(o)
[Three file diffs are collapsed in the source view.]
"""Iterator based sre token scanner
"""JSON token scanner
"""
import re
import sre_parse
import sre_compile
import sre_constants
from re import VERBOSE, MULTILINE, DOTALL
from sre_constants import BRANCH, SUBPATTERN
try:
from _json import make_scanner as c_make_scanner
except ImportError:
c_make_scanner = None
__all__ = ['Scanner', 'pattern']
__all__ = ['make_scanner']
FLAGS = (VERBOSE | MULTILINE | DOTALL)
NUMBER_RE = re.compile(
r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
(re.VERBOSE | re.MULTILINE | re.DOTALL))
class Scanner(object):
def __init__(self, lexicon, flags=FLAGS):
self.actions = [None]
# Combine phrases into a compound pattern
s = sre_parse.Pattern()
s.flags = flags
p = []
for idx, token in enumerate(lexicon):
phrase = token.pattern
try:
subpattern = sre_parse.SubPattern(s,
[(SUBPATTERN, (idx + 1, sre_parse.parse(phrase, flags)))])
except sre_constants.error:
raise
p.append(subpattern)
self.actions.append(token)
def py_make_scanner(context):
parse_object = context.parse_object
parse_array = context.parse_array
parse_string = context.parse_string
match_number = NUMBER_RE.match
strict = context.strict
parse_float = context.parse_float
parse_int = context.parse_int
parse_constant = context.parse_constant
object_hook = context.object_hook
s.groups = len(p) + 1 # NOTE(guido): Added to make SRE validation work
p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
self.scanner = sre_compile.compile(p)
def _scan_once(string, idx):
try:
nextchar = string[idx]
except IndexError:
raise StopIteration
def iterscan(self, string, idx=0, context=None):
"""Yield match, end_idx for each match
if nextchar == '"':
return parse_string(string, idx + 1, strict)
elif nextchar == '{':
return parse_object((string, idx + 1), strict,
_scan_once, object_hook, object_pairs_hook)
elif nextchar == '[':
return parse_array((string, idx + 1), _scan_once)
elif nextchar == 'n' and string[idx:idx + 4] == 'null':
return None, idx + 4
elif nextchar == 't' and string[idx:idx + 4] == 'true':
return True, idx + 4
elif nextchar == 'f' and string[idx:idx + 5] == 'false':
return False, idx + 5
"""
match = self.scanner.scanner(string, idx).match
actions = self.actions
lastend = idx
end = len(string)
while True:
m = match()
if m is None:
break
matchbegin, matchend = m.span()
if lastend == matchend:
break
action = actions[m.lastindex]
if action is not None:
rval, next_pos = action(m, context)
if next_pos is not None and next_pos != matchend:
# "fast forward" the scanner
matchend = next_pos
match = self.scanner.scanner(string, matchend).match
yield rval, matchend
lastend = matchend
m = match_number(string, idx)
if m is not None:
integer, frac, exp = m.groups()
if frac or exp:
res = parse_float(integer + (frac or '') + (exp or ''))
else:
res = parse_int(integer)
return res, m.end()
elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
return parse_constant('NaN'), idx + 3
elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
return parse_constant('Infinity'), idx + 8
elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
return parse_constant('-Infinity'), idx + 9
else:
raise StopIteration
return _scan_once
def pattern(pattern, flags=FLAGS):
def decorator(fn):
fn.pattern = pattern
fn.regex = re.compile(pattern, flags)
return fn
return decorator
make_scanner = c_make_scanner or py_make_scanner
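To make the number branch of the new ``_scan_once`` easier to follow, here is a standalone sketch that reuses the ``NUMBER_RE`` pattern from the diff. ``scan_number`` is a hypothetical helper written for this example; it hard-codes ``float`` and ``int`` where the real scanner uses ``context.parse_float`` and ``context.parse_int``, and it raises ``ValueError`` instead of falling through to the other branches.

```python
import re

# Same pattern as in the diff: integer part, optional fraction, optional exponent.
NUMBER_RE = re.compile(
    r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))

def scan_number(string, idx=0):
    """Hypothetical helper mirroring the number branch of _scan_once."""
    m = NUMBER_RE.match(string, idx)
    if m is None:
        raise ValueError("no number at index {0}".format(idx))
    integer, frac, exp = m.groups()
    if frac or exp:
        # A fraction or exponent makes the value a float.
        res = float(integer + (frac or '') + (exp or ''))
    else:
        res = int(integer)
    # Return the value and the index just past the matched text.
    return res, m.end()

print(scan_number('42,'))
print(scan_number('-1.5e3]'))
print(scan_number('013'))
```

The last call shows why a leading zero is a problem for the caller: the pattern matches only the ``0``, leaving ``13`` behind as unparsed input.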
@@ -32,3 +32,10 @@ class TestDecode(TestCase):
object_pairs_hook = OrderedDict,
object_hook = lambda x: None),
OrderedDict(p))
def test_decoder_optimizations(self):
# Several optimizations were made that skip over calls to
# the whitespace regex, so this test is designed to try and
# exercise the uncommon cases. The array cases are already covered.
rval = json.loads('{ "key" : "value" , "k":"v" }')
self.assertEquals(rval, {"key":"value", "k":"v"})
@@ -11,3 +11,11 @@ class TestDump(TestCase):
def test_dumps(self):
self.assertEquals(json.dumps({}), '{}')
def test_encode_truefalse(self):
self.assertEquals(json.dumps(
{True: False, False: True}, sort_keys=True),
'{"false": true, "true": false}')
self.assertEquals(json.dumps(
{2: 3.0, 4.0: 5, False: 1, 6: True}, sort_keys=True),
'{"false": 1, "2": 3.0, "4.0": 5, "6": true}')
@@ -3,22 +3,20 @@ from unittest import TestCase
import json.encoder
CASES = [
('/\\"\ucafe\ubabe\uab98\ufcde\ubcda\uef4a\x08\x0c\n\r\t`1~!@#$%^&*()_+-=[]{}|;:\',./<>?', b'"/\\\\\\"\\ucafe\\ubabe\\uab98\\ufcde\\ubcda\\uef4a\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:\',./<>?"'),
('\u0123\u4567\u89ab\ucdef\uabcd\uef4a', b'"\\u0123\\u4567\\u89ab\\ucdef\\uabcd\\uef4a"'),
('controls', b'"controls"'),
('\x08\x0c\n\r\t', b'"\\b\\f\\n\\r\\t"'),
('{"object with 1 member":["array with 1 element"]}', b'"{\\"object with 1 member\\":[\\"array with 1 element\\"]}"'),
(' s p a c e d ', b'" s p a c e d "'),
('\U0001d120', b'"\\ud834\\udd20"'),
('\u03b1\u03a9', b'"\\u03b1\\u03a9"'),
(b'\xce\xb1\xce\xa9', b'"\\u03b1\\u03a9"'),
('\u03b1\u03a9', b'"\\u03b1\\u03a9"'),
(b'\xce\xb1\xce\xa9', b'"\\u03b1\\u03a9"'),
('\u03b1\u03a9', b'"\\u03b1\\u03a9"'),
('\u03b1\u03a9', b'"\\u03b1\\u03a9"'),
("`1~!@#$%^&*()_+-={':[,]}|;.</>?", b'"`1~!@#$%^&*()_+-={\':[,]}|;.</>?"'),
('\x08\x0c\n\r\t', b'"\\b\\f\\n\\r\\t"'),
('\u0123\u4567\u89ab\ucdef\uabcd\uef4a', b'"\\u0123\\u4567\\u89ab\\ucdef\\uabcd\\uef4a"'),
('/\\"\ucafe\ubabe\uab98\ufcde\ubcda\uef4a\x08\x0c\n\r\t`1~!@#$%^&*()_+-=[]{}|;:\',./<>?', '"/\\\\\\"\\ucafe\\ubabe\\uab98\\ufcde\\ubcda\\uef4a\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:\',./<>?"'),
('\u0123\u4567\u89ab\ucdef\uabcd\uef4a', '"\\u0123\\u4567\\u89ab\\ucdef\\uabcd\\uef4a"'),
('controls', '"controls"'),
('\x08\x0c\n\r\t', '"\\b\\f\\n\\r\\t"'),
('{"object with 1 member":["array with 1 element"]}', '"{\\"object with 1 member\\":[\\"array with 1 element\\"]}"'),
(' s p a c e d ', '" s p a c e d "'),
('\U0001d120', '"\\ud834\\udd20"'),
('\u03b1\u03a9', '"\\u03b1\\u03a9"'),
('\u03b1\u03a9', '"\\u03b1\\u03a9"'),
('\u03b1\u03a9', '"\\u03b1\\u03a9"'),
('\u03b1\u03a9', '"\\u03b1\\u03a9"'),
("`1~!@#$%^&*()_+-={':[,]}|;.</>?", '"`1~!@#$%^&*()_+-={\':[,]}|;.</>?"'),
('\x08\x0c\n\r\t', '"\\b\\f\\n\\r\\t"'),
('\u0123\u4567\u89ab\ucdef\uabcd\uef4a', '"\\u0123\\u4567\\u89ab\\ucdef\\uabcd\\uef4a"'),
]
class TestEncodeBaseStringAscii(TestCase):
@@ -26,12 +24,14 @@ class TestEncodeBaseStringAscii(TestCase):
self._test_encode_basestring_ascii(json.encoder.py_encode_basestring_ascii)
def test_c_encode_basestring_ascii(self):
if json.encoder.c_encode_basestring_ascii is not None:
self._test_encode_basestring_ascii(json.encoder.c_encode_basestring_ascii)
if not json.encoder.c_encode_basestring_ascii:
return
self._test_encode_basestring_ascii(json.encoder.c_encode_basestring_ascii)
def _test_encode_basestring_ascii(self, encode_basestring_ascii):
fname = encode_basestring_ascii.__name__
for input_string, expect in CASES:
result = encode_basestring_ascii(input_string)
result = result.encode("ascii")
self.assertEquals(result, expect)
self.assertEquals(result, expect,
'{0!r} != {1!r} for {2}({3!r})'.format(
result, expect, fname, input_string))
@@ -73,4 +73,4 @@ class TestFail(TestCase):
except ValueError:
pass
else:
self.fail("Expected failure for fail%d.json: %r" % (idx, doc))
self.fail("Expected failure for fail{0}.json: {1!r}".format(idx, doc))
@@ -5,5 +5,11 @@ import json
class TestFloat(TestCase):
def test_floats(self):
for num in [1617161771.7650001, math.pi, math.pi**100, math.pi**-100]:
for num in [1617161771.7650001, math.pi, math.pi**100, math.pi**-100, 3.1]:
self.assertEquals(float(json.dumps(num)), num)
self.assertEquals(json.loads(json.dumps(num)), num)
def test_ints(self):
for num in [1, 1<<32, 1<<64]:
self.assertEquals(json.dumps(num), str(num))
self.assertEquals(int(json.dumps(num)), num)
@@ -15,96 +15,90 @@ class TestScanString(TestCase):
def _test_scanstring(self, scanstring):
self.assertEquals(
scanstring('"z\\ud834\\udd20x"', 1, None, True),
scanstring('"z\\ud834\\udd20x"', 1, True),
('z\U0001d120x', 16))
if sys.maxunicode == 65535:
self.assertEquals(
scanstring('"z\U0001d120x"', 1, None, True),
scanstring('"z\U0001d120x"', 1, True),
('z\U0001d120x', 6))
else:
self.assertEquals(
scanstring('"z\U0001d120x"', 1, None, True),
scanstring('"z\U0001d120x"', 1, True),
('z\U0001d120x', 5))
self.assertEquals(
scanstring('"\\u007b"', 1, None, True),
scanstring('"\\u007b"', 1, True),
('{', 8))
self.assertEquals(
scanstring('"A JSON payload should be an object or array, not a string."', 1, None, True),
scanstring('"A JSON payload should be an object or array, not a string."', 1, True),
('A JSON payload should be an object or array, not a string.', 60))
self.assertEquals(
scanstring('["Unclosed array"', 2, None, True),
scanstring('["Unclosed array"', 2, True),
('Unclosed array', 17))
self.assertEquals(
scanstring('["extra comma",]', 2, None, True),
scanstring('["extra comma",]', 2, True),
('extra comma', 14))
self.assertEquals(
scanstring('["double extra comma",,]', 2, None, True),
scanstring('["double extra comma",,]', 2, True),
('double extra comma', 21))
self.assertEquals(
scanstring('["Comma after the close"],', 2, None, True),
scanstring('["Comma after the close"],', 2, True),
('Comma after the close', 24))
self.assertEquals(
scanstring('["Extra close"]]', 2, None, True),
scanstring('["Extra close"]]', 2, True),
('Extra close', 14))
self.assertEquals(
scanstring('{"Extra comma": true,}', 2, None, True),
scanstring('{"Extra comma": true,}', 2, True),
('Extra comma', 14))
self.assertEquals(
scanstring('{"Extra value after close": true} "misplaced quoted value"', 2, None, True),
scanstring('{"Extra value after close": true} "misplaced quoted value"', 2, True),
('Extra value after close', 26))
self.assertEquals(
scanstring('{"Illegal expression": 1 + 2}', 2, None, True),
scanstring('{"Illegal expression": 1 + 2}', 2, True),
('Illegal expression', 21))
self.assertEquals(
scanstring('{"Illegal invocation": alert()}', 2, None, True),
scanstring('{"Illegal invocation": alert()}', 2, True),
('Illegal invocation', 21))
self.assertEquals(
scanstring('{"Numbers cannot have leading zeroes": 013}', 2, None, True),
scanstring('{"Numbers cannot have leading zeroes": 013}', 2, True),
('Numbers cannot have leading zeroes', 37))
self.assertEquals(
scanstring('{"Numbers cannot be hex": 0x14}', 2, None, True),
scanstring('{"Numbers cannot be hex": 0x14}', 2, True),
('Numbers cannot be hex', 24))
self.assertEquals(
scanstring('[[[[[[[[[[[[[[[[[[[["Too deep"]]]]]]]]]]]]]]]]]]]]', 21, None, True),
scanstring('[[[[[[[[[[[[[[[[[[[["Too deep"]]]]]]]]]]]]]]]]]]]]', 21, True),
('Too deep', 30))
self.assertEquals(
scanstring('{"Missing colon" null}', 2, None, True),
scanstring('{"Missing colon" null}', 2, True),
('Missing colon', 16))
self.assertEquals(
scanstring('{"Double colon":: null}', 2, None, True),
scanstring('{"Double colon":: null}', 2, True),
('Double colon', 15))
self.assertEquals(
scanstring('{"Comma instead of colon", null}', 2, None, True),
scanstring('{"Comma instead of colon", null}', 2, True),
('Comma instead of colon', 25))
self.assertEquals(
scanstring('["Colon instead of comma": false]', 2, None, True),
scanstring('["Colon instead of comma": false]', 2, True),
('Colon instead of comma', 25))
self.assertEquals(
scanstring('["Bad value", truth]', 2, None, True),
scanstring('["Bad value", truth]', 2, True),
('Bad value', 12))
def test_issue3623(self):
self.assertRaises(ValueError, json.decoder.scanstring, b"xxx", 1,
"xxx")
self.assertRaises(UnicodeDecodeError,
json.encoder.encode_basestring_ascii, b"xx\xff")
@@ -4,20 +4,8 @@ import json
from collections import OrderedDict
class TestUnicode(TestCase):
def test_encoding1(self):
encoder = json.JSONEncoder(encoding='utf-8')
u = '\N{GREEK SMALL LETTER ALPHA}\N{GREEK CAPITAL LETTER OMEGA}'
s = u.encode('utf-8')
ju = encoder.encode(u)
js = encoder.encode(s)
self.assertEquals(ju, js)
def test_encoding2(self):
u = '\N{GREEK SMALL LETTER ALPHA}\N{GREEK CAPITAL LETTER OMEGA}'
s = u.encode('utf-8')
ju = json.dumps(u, encoding='utf-8')
js = json.dumps(s, encoding='utf-8')
self.assertEquals(ju, js)
# test_encoding1 and test_encoding2 from 2.x are irrelevant (only str
# is supported as input, not bytes).
def test_encoding3(self):
u = '\N{GREEK SMALL LETTER ALPHA}\N{GREEK CAPITAL LETTER OMEGA}'
@@ -52,8 +40,22 @@ class TestUnicode(TestCase):
def test_unicode_decode(self):
for i in range(0, 0xd7ff):
u = chr(i)
js = '"\\u{0:04x}"'.format(i)
self.assertEquals(json.loads(js), u)
s = '"\\u{0:04x}"'.format(i)
self.assertEquals(json.loads(s), u)
def test_unicode_preservation(self):
self.assertEquals(type(json.loads('""')), str)
self.assertEquals(type(json.loads('"a"')), str)
self.assertEquals(type(json.loads('["a"]')[0]), str)
def test_bytes_encode(self):
self.assertRaises(TypeError, json.dumps, b"hi")
self.assertRaises(TypeError, json.dumps, [b"hi"])
def test_bytes_decode(self):
self.assertRaises(TypeError, json.loads, b'"hi"')
self.assertRaises(TypeError, json.loads, b'["hi"]')
def test_object_pairs_hook_with_unicode(self):
s = '{"xkd":1, "kcw":2, "art":3, "hxm":4, "qrt":5, "pad":6, "hoy":7}'
@@ -2,11 +2,11 @@ r"""Command-line tool to validate and pretty-print JSON
Usage::
$ echo '{"json":"obj"}' | python -mjson.tool
$ echo '{"json":"obj"}' | python -m json.tool
{
"json": "obj"
}
$ echo '{ 1.2:3.4}' | python -mjson.tool
$ echo '{ 1.2:3.4}' | python -m json.tool
Expecting property name: line 1 column 2 (char 2)
"""
@@ -24,7 +24,7 @@ def main():
infile = open(sys.argv[1], 'rb')
outfile = open(sys.argv[2], 'wb')
else:
raise SystemExit("{0} [infile [outfile]]".format(sys.argv[0]))
raise SystemExit(sys.argv[0] + " [infile [outfile]]")
try:
obj = json.load(infile)
except ValueError as e:
@@ -107,6 +107,8 @@ Installation
Library
-------
- The json module now works exclusively with str and not bytes.
- Issue #3959: The ipaddr module has been added to the standard library.
Contributed by Google.