@@ -1154,6 +1154,10 @@ Connection objects
                  f.write('%s\n' % line)
          con.close()

+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+

    .. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)

@@ -1220,6 +1224,10 @@ Connection objects

       .. versionadded:: 3.7

+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+
    .. method:: getlimit(category, /)

       Get a connection runtime limit.
@@ -1441,39 +1449,8 @@ Connection objects
       and returns a text representation of it.
       The callable is invoked for SQLite values with the ``TEXT`` data type.
       By default, this attribute is set to :class:`str`.
-      If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.

-      Example:
-
-      .. testcode::
-
-         con = sqlite3.connect(":memory:")
-         cur = con.cursor()
-
-         AUSTRIA = "Österreich"
-
-         # by default, rows are returned as str
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert row[0] == AUSTRIA
-
-         # but we can make sqlite3 always return bytestrings ...
-         con.text_factory = bytes
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert type(row[0]) is bytes
-         # the bytestrings will be encoded in UTF-8, unless you stored garbage in the
-         # database ...
-         assert row[0] == AUSTRIA.encode("utf-8")
-
-         # we can also implement a custom text_factory ...
-         # here we implement one that appends "foo" to all strings
-         con.text_factory = lambda x: x.decode("utf-8") + "foo"
-         cur.execute("SELECT ?", ("bar",))
-         row = cur.fetchone()
-         assert row[0] == "barfoo"
-
-         con.close()
+      See :ref:`sqlite3-howto-encoding` for more details.

    .. attribute:: total_changes

@@ -1632,7 +1609,6 @@ Cursor objects
              COMMIT;
          """)

-
    .. method:: fetchone()

       If :attr:`~Cursor.row_factory` is ``None``,
@@ -2611,6 +2587,47 @@ With some adjustments, the above recipe can be adapted to use a
 instead of a :class:`~collections.namedtuple`.


+.. _sqlite3-howto-encoding:
+
+How to handle non-UTF-8 text encodings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
+with the ``TEXT`` data type.
+This works well for UTF-8 encoded text, but it might fail for other encodings
+and invalid UTF-8.
+You can use a custom :attr:`~Connection.text_factory` to handle such cases.
+
+Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
+columns with the ``TEXT`` data type containing non-UTF-8 encodings,
+or even arbitrary data.
+To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
+encoded text, for example a table of Czech-English dictionary entries.
+Assuming we now have a :class:`Connection` instance :py:data:`!con`
+connected to this database,
+we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, encoding="latin2")
+
+For invalid UTF-8 or arbitrary data stored in ``TEXT`` table columns,
+you can use the following technique, borrowed from the :ref:`unicode-howto`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, errors="surrogateescape")
+
+.. note::
+
+   The :mod:`!sqlite3` module API does not support strings
+   containing surrogates.
+
+.. seealso::
+
+   :ref:`unicode-howto`
+
+
 .. _sqlite3-explanation:

 Explanation
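
The two `text_factory` recipes added in the how-to can be tried end to end with a small, self-contained sketch. Everything below that is not in the diff is an assumption made for illustration: the in-memory database, the hypothetical `dictionary` table and `entry` column, the sample word, and the `CAST(? AS TEXT)` trick used to imitate a legacy database that already holds Latin-2 bytes in a `TEXT` column.

```python
import sqlite3

# Hypothetical sample data: a Czech word whose Latin-2 bytes are not valid UTF-8.
WORD = "žluťoučký"
LATIN2_BYTES = WORD.encode("latin2")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dictionary(entry TEXT)")
# Assumption: casting a blob to TEXT keeps the bytes unchanged, which imitates
# a database that was populated with non-UTF-8 text.
con.execute("INSERT INTO dictionary VALUES (CAST(? AS TEXT))", (LATIN2_BYTES,))

query = "SELECT entry FROM dictionary"

# With the default text_factory (str), decoding the value as UTF-8 fails.
try:
    con.execute(query).fetchone()
    default_failed = False
except sqlite3.OperationalError:
    default_failed = True

# Recipe 1: decode every TEXT value as Latin-2.
con.text_factory = lambda data: str(data, encoding="latin2")
decoded = con.execute(query).fetchone()[0]

# Recipe 2: keep undecodable bytes as lone surrogates instead of failing.
con.text_factory = lambda data: str(data, errors="surrogateescape")
escaped = con.execute(query).fetchone()[0]

# Strings containing surrogates cannot be passed back to sqlite3 as parameters,
# so convert them back to the original bytes before reusing them.
restored = escaped.encode("utf-8", errors="surrogateescape")

con.close()
```

The last step reflects the note in the how-to: a value decoded with `surrogateescape` is useful for inspection or for recovering the raw bytes, but it should be re-encoded before it is bound to another statement.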