Commit 0a560a11 authored by R David Murray

#23088: Clarify null termination of bytes and strings in C API.

Patch by Martin Panter, reviewed by Serhiy Storchaka and R. David Murray.
Parent 3afdb287
@@ -64,7 +64,8 @@ Direct API functions
 .. c:function:: char* PyByteArray_AsString(PyObject *bytearray)
 Return the contents of *bytearray* as a char array after checking for a
-*NULL* pointer.
+*NULL* pointer. The returned array always has an extra
+null byte appended.
 .. c:function:: int PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len)
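This sketch shows the extra null byte behind ``PyByteArray_AsString``. It calls the function through ``ctypes.pythonapi``, which is only a convenience so the example runs without a compiled C extension. It then reads ``len(ba) + 1`` bytes from the returned pointer, so the last byte shown is the appended terminator.

```python
import ctypes

# ctypes.pythonapi calls into the running interpreter with the GIL held.
api = ctypes.pythonapi
api.PyByteArray_AsString.argtypes = [ctypes.py_object]
api.PyByteArray_AsString.restype = ctypes.c_void_p  # keep the raw address

ba = bytearray(b"spam")
addr = api.PyByteArray_AsString(ba)  # points into ba's own storage

# Read the contents plus one more byte: the appended null byte.
raw = ctypes.string_at(addr, len(ba) + 1)
print(raw)  # b'spam\x00'
```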
......
@@ -69,8 +69,8 @@ called with a non-bytes parameter.
 +===================+===============+================================+
 | :attr:`%%`        | *n/a*         | The literal % character.       |
 +-------------------+---------------+--------------------------------+
-| :attr:`%c`        | int           | A single character,            |
-|                   |               | represented as an C int.       |
+| :attr:`%c`        | int           | A single byte,                 |
+|                   |               | represented as a C int.        |
 +-------------------+---------------+--------------------------------+
 | :attr:`%d`        | int           | Exactly equivalent to          |
 |                   |               | ``printf("%d")``.              |
@@ -109,7 +109,7 @@ called with a non-bytes parameter.
 +-------------------+---------------+--------------------------------+
 An unrecognized format character causes all the rest of the format string to be
-copied as-is to the result string, and any extra arguments discarded.
+copied as-is to the result object, and any extra arguments discarded.
 .. c:function:: PyObject* PyBytes_FromFormatV(const char *format, va_list vargs)
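The ``%c`` row of the format table says the argument is a single byte passed as a C int. The sketch below calls ``PyBytes_FromFormat`` through ``ctypes.pythonapi``, a convenience chosen so the example runs. Because the function is variadic, only the fixed format argument is declared, and the extra arguments are passed as explicit ``c_int`` values.

```python
import ctypes

api = ctypes.pythonapi
# Variadic function: declare only the fixed format argument.
api.PyBytes_FromFormat.argtypes = [ctypes.c_char_p]
api.PyBytes_FromFormat.restype = ctypes.py_object  # new reference, handed to Python

# %c consumes a C int and emits one byte; %d behaves like printf("%d").
result = api.PyBytes_FromFormat(
    b"%c%c=%d", ctypes.c_int(72), ctypes.c_int(105), ctypes.c_int(42)
)
print(result)  # b'Hi=42'
```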
@@ -136,11 +136,13 @@ called with a non-bytes parameter.
 .. c:function:: char* PyBytes_AsString(PyObject *o)
-Return a NUL-terminated representation of the contents of *o*. The pointer
-refers to the internal buffer of *o*, not a copy. The data must not be
-modified in any way, unless the string was just created using
+Return a pointer to the contents of *o*. The pointer
+refers to the internal buffer of *o*, which consists of ``len(o) + 1``
+bytes. The last byte in the buffer is always null, regardless of
+whether there are any other null bytes. The data must not be
+modified in any way, unless the object was just created using
 ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
-*o* is not a string object at all, :c:func:`PyBytes_AsString` returns *NULL*
+*o* is not a bytes object at all, :c:func:`PyBytes_AsString` returns *NULL*
 and raises :exc:`TypeError`.
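This sketch illustrates the ``len(o) + 1`` layout described for ``PyBytes_AsString``. As before, it goes through ``ctypes.pythonapi`` only so that it runs. The example value contains an embedded null byte. Reading the buffer as a C string therefore stops early, while reading ``len(o) + 1`` bytes shows the full contents plus the trailing null. The last call triggers the documented ``TypeError`` for a non-bytes argument.

```python
import ctypes

api = ctypes.pythonapi
api.PyBytes_AsString.argtypes = [ctypes.py_object]
api.PyBytes_AsString.restype = ctypes.c_void_p

data = b"ab\x00cd"                 # embedded null; data must stay alive while addr is used
addr = api.PyBytes_AsString(data)  # internal buffer, not a copy

full = ctypes.string_at(addr, len(data) + 1)  # len(o) + 1 bytes
as_c_string = ctypes.string_at(addr)          # strlen()-style read stops at the first null
print(full, as_c_string)  # b'ab\x00cd\x00' b'ab'

# A non-bytes argument makes the function return NULL with TypeError set,
# which ctypes.pythonapi re-raises as a Python exception.
error = None
try:
    api.PyBytes_AsString("not bytes")
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```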
@@ -151,16 +153,18 @@ called with a non-bytes parameter.
 .. c:function:: int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
-Return a NUL-terminated representation of the contents of the object *obj*
+Return the null-terminated contents of the object *obj*
 through the output variables *buffer* and *length*.
-If *length* is *NULL*, the resulting buffer may not contain NUL characters;
+If *length* is *NULL*, the bytes object
+may not contain embedded null bytes;
 if it does, the function returns ``-1`` and a :exc:`TypeError` is raised.
-The buffer refers to an internal string buffer of *obj*, not a copy. The data
-must not be modified in any way, unless the string was just created using
+The buffer refers to an internal buffer of *obj*, which includes an
+additional null byte at the end (not counted in *length*). The data
+must not be modified in any way, unless the object was just created using
 ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
-*string* is not a string object at all, :c:func:`PyBytes_AsStringAndSize`
+*obj* is not a bytes object at all, :c:func:`PyBytes_AsStringAndSize`
 returns ``-1`` and raises :exc:`TypeError`.
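This sketch covers ``PyBytes_AsStringAndSize`` and again uses ``ctypes.pythonapi`` so that it runs. It shows that *length* excludes the trailing null byte. It also shows that passing *NULL* for *length* makes an object with an embedded null byte fail with ``-1``. The exception is caught broadly because its type has varied across CPython releases: this hunk documents ``TypeError``, while later documentation lists ``ValueError``.

```python
import ctypes

api = ctypes.pythonapi
api.PyBytes_AsStringAndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_void_p),   # char **buffer
    ctypes.POINTER(ctypes.c_ssize_t),  # Py_ssize_t *length, may be NULL (None)
]
api.PyBytes_AsStringAndSize.restype = ctypes.c_int

data = b"ab\x00cd"
buf = ctypes.c_void_p()
length = ctypes.c_ssize_t()
rc = api.PyBytes_AsStringAndSize(data, ctypes.byref(buf), ctypes.byref(length))
raw = ctypes.string_at(buf.value, length.value + 1)  # one byte past *length*
print(rc, length.value, raw)  # 0 5 b'ab\x00cd\x00'

# With length == NULL, an embedded null byte is an error (-1 plus an exception).
try:
    api.PyBytes_AsStringAndSize(data, ctypes.byref(buf), None)
    rejected = False
except (TypeError, ValueError):  # exact type depends on the CPython version
    rejected = True
print(rejected)  # True
```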
@@ -168,14 +172,14 @@ called with a non-bytes parameter.
 Create a new bytes object in *\*bytes* containing the contents of *newpart*
 appended to *bytes*; the caller will own the new reference. The reference to
-the old value of *bytes* will be stolen. If the new string cannot be
+the old value of *bytes* will be stolen. If the new object cannot be
 created, the old reference to *bytes* will still be discarded and the value
 of *\*bytes* will be set to *NULL*; the appropriate exception will be set.
 .. c:function:: void PyBytes_ConcatAndDel(PyObject **bytes, PyObject *newpart)
-Create a new string object in *\*bytes* containing the contents of *newpart*
+Create a new bytes object in *\*bytes* containing the contents of *newpart*
 appended to *bytes*. This version decrements the reference count of
 *newpart*.
......
@@ -227,7 +227,10 @@ access internal read-only data of Unicode objects:
 const char* PyUnicode_AS_DATA(PyObject *o)
 Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
-``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be
+returned buffer is always terminated with an extra null code point. It
+may also contain embedded null code points, which would cause the string
+to be truncated when used in most C functions. The ``AS_DATA`` form
+casts the pointer to :c:type:`const char *`. The *o* argument has to be
 a Unicode object (not checked).
 .. versionchanged:: 3.3
@@ -650,7 +653,8 @@ APIs:
 Copy the string *u* into a new UCS4 buffer that is allocated using
 :c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a
-:exc:`MemoryError` set.
+:exc:`MemoryError` set. The returned buffer always has an extra
+null code point appended.
 .. versionadded:: 3.3
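In the CPython documentation, the paragraph in this hunk belongs to ``PyUnicode_AsUCS4Copy``. The function name itself falls outside the hunk's context. This sketch, which uses ``ctypes.pythonapi`` so that it runs, copies a three-code-point string. It then reads one extra ``Py_UCS4`` unit to expose the null code point. Finally, it releases the buffer with ``PyMem_Free``, because the buffer was allocated with ``PyMem_Malloc``.

```python
import ctypes

api = ctypes.pythonapi
api.PyUnicode_AsUCS4Copy.argtypes = [ctypes.py_object]
api.PyUnicode_AsUCS4Copy.restype = ctypes.c_void_p  # Py_UCS4 *, owned by the caller
api.PyMem_Free.argtypes = [ctypes.c_void_p]
api.PyMem_Free.restype = None

text = "a\u00e9\U0001F600"  # three code points
addr = api.PyUnicode_AsUCS4Copy(text)
try:
    # Py_UCS4 is a 32-bit unsigned type; read one unit past the text.
    units = list((ctypes.c_uint32 * (len(text) + 1)).from_address(addr))
finally:
    api.PyMem_Free(addr)  # allocated with PyMem_Malloc, so free with PyMem_Free
print(units)  # [97, 233, 128512, 0]
```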
@@ -689,8 +693,9 @@ Extension modules can continue using them, as they will not be removed in Python
 Return a read-only pointer to the Unicode object's internal
 :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
 :c:type:`Py_UNICODE*` representation of the object if it is not yet
-available. Note that the resulting :c:type:`Py_UNICODE` string may contain
-embedded null characters, which would cause the string to be truncated when
+available. The buffer is always terminated with an extra null code point.
+Note that the resulting :c:type:`Py_UNICODE` string may also contain
+embedded null code points, which would cause the string to be truncated when
 used in most C functions.
 Please migrate to using :c:func:`PyUnicode_AsUCS4`,
@@ -708,8 +713,9 @@ Extension modules can continue using them, as they will not be removed in Python
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
 Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
-may contain embedded null characters, which would cause the string to be
+array length (excluding the extra null terminator) in *size*.
+Note that the resulting :c:type:`Py_UNICODE*` string
+may contain embedded null code points, which would cause the string to be
 truncated when used in most C functions.
 .. versionadded:: 3.3
@@ -717,11 +723,11 @@ Extension modules can continue using them, as they will not be removed in Python
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
-Create a copy of a Unicode string ending with a nul character. Return *NULL*
+Create a copy of a Unicode string ending with a null code point. Return *NULL*
 and raise a :exc:`MemoryError` exception on memory allocation failure,
 otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
 the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
-contain embedded null characters, which would cause the string to be
+contain embedded null code points, which would cause the string to be
 truncated when used in most C functions.
 .. versionadded:: 3.2
@@ -895,10 +901,10 @@ wchar_t Support
 Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
 *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
-0-termination character). Return the number of :c:type:`wchar_t` characters
+null termination character). Return the number of :c:type:`wchar_t` characters
 copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
-string may or may not be 0-terminated. It is the responsibility of the caller
-to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
+string may or may not be null-terminated. It is the responsibility of the caller
+to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
 required by the application. Also, note that the :c:type:`wchar_t*` string
 might contain null characters, which would cause the string to be truncated
 when used with most C functions.
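In the CPython documentation, this hunk comes from the ``PyUnicode_AsWideChar`` entry. The sketch below uses ``ctypes.pythonapi`` so that it runs. It pre-fills each destination buffer with ``'X'`` so you can see whether a terminator was written. When *size* is smaller than the string, exactly *size* characters are copied and no null follows them. When the buffer has room, the null is copied as well. That second case reflects how CPython behaves in practice rather than anything the text promises, which is why callers are told to ensure termination themselves.

```python
import ctypes

api = ctypes.pythonapi
api.PyUnicode_AsWideChar.argtypes = [ctypes.py_object, ctypes.c_wchar_p, ctypes.c_ssize_t]
api.PyUnicode_AsWideChar.restype = ctypes.c_ssize_t

# Buffer too small: 3 characters copied, no null written.
small = ctypes.create_unicode_buffer("XXXXXXXX")
n_small = api.PyUnicode_AsWideChar("hello", small, 3)
print(n_small, small.value)  # 3 helXXXXX

# Buffer with room to spare (9 wchar_t slots): the text and a null are copied.
large = ctypes.create_unicode_buffer("XXXXXXXX")
n_large = api.PyUnicode_AsWideChar("hello", large, len(large))
print(n_large, large.value)  # 5 hello
```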
@@ -907,8 +913,8 @@ wchar_t Support
 .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
 Convert the Unicode object to a wide character string. The output string
-always ends with a nul character. If *size* is not *NULL*, write the number
-of wide characters (excluding the trailing 0-termination character) into
+always ends with a null character. If *size* is not *NULL*, write the number
+of wide characters (excluding the trailing null termination character) into
 *\*size*.
 Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
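This sketch covers ``PyUnicode_AsWideCharString`` and uses ``ctypes.pythonapi`` so that it runs. It checks three things: the value written to ``*size`` excludes the terminator, a null character follows the text, and the caller releases the buffer with ``PyMem_Free``.

```python
import ctypes

api = ctypes.pythonapi
api.PyUnicode_AsWideCharString.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),  # Py_ssize_t *size, may be NULL (None)
]
api.PyUnicode_AsWideCharString.restype = ctypes.c_void_p  # wchar_t *, owned by the caller
api.PyMem_Free.argtypes = [ctypes.c_void_p]
api.PyMem_Free.restype = None

size = ctypes.c_ssize_t()
addr = api.PyUnicode_AsWideCharString("hello", ctypes.byref(size))
try:
    text = ctypes.wstring_at(addr, size.value)  # size excludes the terminator
    after = (ctypes.c_wchar * (size.value + 1)).from_address(addr)[size.value]
finally:
    api.PyMem_Free(addr)
print(size.value, text, repr(after))  # 5 hello '\x00'
```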
@@ -1038,9 +1044,11 @@ These are the UTF-8 codec APIs:
 .. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
-Return a pointer to the default encoding (UTF-8) of the Unicode object, and
-store the size of the encoded representation (in bytes) in *size*. *size*
-can be *NULL*, in this case no size will be stored.
+Return a pointer to the UTF-8 encoding of the Unicode object, and
+store the size of the encoded representation (in bytes) in *size*. The
+*size* argument can be *NULL*; in this case no size will be stored. The
+returned buffer always has an extra null byte appended (not included in
+*size*), regardless of whether there are any other null code points.
 In the case of an error, *NULL* is returned with an exception set and no
 *size* is stored.
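This sketch covers ``PyUnicode_AsUTF8AndSize`` and uses ``ctypes.pythonapi`` so that it runs. The test string contains one non-ASCII code point and one embedded null code point. The reported size counts only the encoded bytes, and the byte just past them is the appended null. The string object owns the UTF-8 buffer, so the sketch does not free it.

```python
import ctypes

api = ctypes.pythonapi
api.PyUnicode_AsUTF8AndSize.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
api.PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p  # const char *, owned by the str

s = "h\u00e9\x00!"  # non-ASCII code point plus an embedded null code point
size = ctypes.c_ssize_t()
addr = api.PyUnicode_AsUTF8AndSize(s, ctypes.byref(size))
raw = ctypes.string_at(addr, size.value + 1)  # encoded bytes plus the extra null
print(size.value, raw)  # 5 b'h\xc3\xa9\x00!\x00'
```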
......