@@ -1123,6 +1123,10 @@ Connection objects
11231123 f.write('%s\n' % line)
11241124 con.close()
11251125
1126+ .. seealso::
1127+
1128+ :ref:`sqlite3-howto-encoding`
1129+
11261130
11271131 .. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)
11281132
@@ -1189,6 +1193,10 @@ Connection objects
11891193
11901194 .. versionadded:: 3.7
11911195
1196+ .. seealso::
1197+
1198+ :ref:`sqlite3-howto-encoding`
1199+
11921200 .. method:: getlimit(category, /)
11931201
11941202 Get a connection runtime limit.
@@ -1410,39 +1418,8 @@ Connection objects
14101418 and returns a text representation of it.
14111419 The callable is invoked for SQLite values with the ``TEXT`` data type.
14121420 By default, this attribute is set to :class:`str`.
1413- If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.
14141421
1415- Example:
1416-
1417- .. testcode::
1418-
1419- con = sqlite3.connect(":memory:")
1420- cur = con.cursor()
1421-
1422- AUSTRIA = "Österreich"
1423-
1424- # by default, rows are returned as str
1425- cur.execute("SELECT ?", (AUSTRIA,))
1426- row = cur.fetchone()
1427- assert row[0] == AUSTRIA
1428-
1429- # but we can make sqlite3 always return bytestrings ...
1430- con.text_factory = bytes
1431- cur.execute("SELECT ?", (AUSTRIA,))
1432- row = cur.fetchone()
1433- assert type(row[0]) is bytes
1434- # the bytestrings will be encoded in UTF-8, unless you stored garbage in the
1435- # database ...
1436- assert row[0] == AUSTRIA.encode("utf-8")
1437-
1438- # we can also implement a custom text_factory ...
1439- # here we implement one that appends "foo" to all strings
1440- con.text_factory = lambda x: x.decode("utf-8") + "foo"
1441- cur.execute("SELECT ?", ("bar",))
1442- row = cur.fetchone()
1443- assert row[0] == "barfoo"
1444-
1445- con.close()
1422+ See :ref:`sqlite3-howto-encoding` for more details.
14461423
14471424 .. attribute:: total_changes
14481425
@@ -1601,7 +1578,6 @@ Cursor objects
16011578 COMMIT;
16021579 """)
16031580
1604-
16051581 .. method:: fetchone()
16061582
16071583 If :attr:`~Cursor.row_factory` is ``None``,
@@ -2580,6 +2556,47 @@ With some adjustments, the above recipe can be adapted to use a
25802556 instead of a :class:`~collections.namedtuple`.
25812557
25822558
2559+ .. _sqlite3-howto-encoding:
2560+
2561+ How to handle non-UTF-8 text encodings
2562+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2563+
2564+ By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
2565+ with the ``TEXT`` data type.
2566+ This works well for UTF-8 encoded text, but it might fail for other encodings
2567+ and invalid UTF-8.
2568+ You can use a custom :attr:`~Connection.text_factory` to handle such cases.
2569+
2570+ Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
2571+ columns with the ``TEXT`` data type containing non-UTF-8 encodings,
2572+ or even arbitrary data.
2573+ To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
2574+ encoded text, for example a table of Czech-English dictionary entries.
2575+ Assuming we now have a :class:`Connection` instance :py:data:`!con`
2576+ connected to this database,
2577+ we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:
2578+
2579+ .. testcode::
2580+
2581+ con.text_factory = lambda data: str(data, encoding="latin2")
2582+
2583+ For invalid UTF-8 or arbitrary data stored in ``TEXT`` table columns,
2584+ you can use the following technique, borrowed from the :ref:`unicode-howto`:
2585+
2586+ .. testcode::
2587+
2588+ con.text_factory = lambda data: str(data, errors="surrogateescape")
2589+
2590+ .. note::
2591+
2592+ The :mod:`!sqlite3` module API does not support strings
2593+ containing surrogates.
2594+
2595+ .. seealso::
2596+
2597+ :ref:`unicode-howto`
2598+
2599+
25832600 .. _sqlite3-explanation:
25842601
25852602 Explanation
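
The two ``text_factory`` snippets added in the how-to assume an existing connection ``con``. A minimal self-contained sketch of how they might be exercised end to end follows. It is not part of the diff. The ``dictionary`` table, the sample word, and the ``CAST(? AS TEXT)`` trick for placing raw Latin-2 bytes in a ``TEXT`` column are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dictionary(czech TEXT, english TEXT)")

# Hypothetical sample row: put raw Latin-2 bytes into a TEXT column.
# CAST(? AS TEXT) relabels the blob's bytes as text without re-encoding them.
word = "žlutý"
raw = word.encode("latin2")
con.execute(
    "INSERT INTO dictionary VALUES (CAST(? AS TEXT), ?)",
    (raw, "yellow"),
)

query = "SELECT czech, english FROM dictionary"

# text_factory = bytes hands back the stored bytes untouched.
con.text_factory = bytes
as_bytes = con.execute(query).fetchone()

# Decode every TEXT value as Latin-2, as in the how-to.
con.text_factory = lambda data: str(data, encoding="latin2")
decoded = con.execute(query).fetchone()

# Decode as UTF-8, mapping undecodable bytes to lone surrogates.
con.text_factory = lambda data: str(data, errors="surrogateescape")
escaped = con.execute(query).fetchone()

# The original bytes can be recovered in Python from the escaped string.
round_tripped = escaped[0].encode("utf-8", errors="surrogateescape")

con.close()
```

Note that the round trip back to bytes happens in Python only. Per the note in the how-to, strings containing surrogates should not be passed back through the ``sqlite3`` API.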