@@ -1029,6 +1029,10 @@ Connection objects
 f.write('%s\n' % line)
 con.close()

+.. seealso::
+
+:ref:`sqlite3-howto-encoding`
+

 .. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)

@@ -1095,6 +1099,10 @@ Connection objects

 .. versionadded:: 3.7

+.. seealso::
+
+:ref:`sqlite3-howto-encoding`
+
 .. method:: getlimit(category, /)

 Get a connection runtime limit.
@@ -1253,39 +1261,8 @@ Connection objects
 and returns a text representation of it.
 The callable is invoked for SQLite values with the ``TEXT`` data type.
 By default, this attribute is set to :class:`str`.
-If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.

-Example:
-
-.. testcode::
-
-con = sqlite3.connect(":memory:")
-cur = con.cursor()
-
-AUSTRIA = "Österreich"
-
-# by default, rows are returned as str
-cur.execute("SELECT ?", (AUSTRIA,))
-row = cur.fetchone()
-assert row[0] == AUSTRIA
-
-# but we can make sqlite3 always return bytestrings ...
-con.text_factory = bytes
-cur.execute("SELECT ?", (AUSTRIA,))
-row = cur.fetchone()
-assert type(row[0]) is bytes
-# the bytestrings will be encoded in UTF-8, unless you stored garbage in the
-# database ...
-assert row[0] == AUSTRIA.encode("utf-8")
-
-# we can also implement a custom text_factory ...
-# here we implement one that appends "foo" to all strings
-con.text_factory = lambda x: x.decode("utf-8") + "foo"
-cur.execute("SELECT ?", ("bar",))
-row = cur.fetchone()
-assert row[0] == "barfoo"
-
-con.close()
+See :ref:`sqlite3-howto-encoding` for more details.

 .. attribute:: total_changes

@@ -1423,7 +1400,6 @@ Cursor objects
 COMMIT;
 """)

-
 .. method:: fetchone()

 If :attr:`~Cursor.row_factory` is ``None``,
@@ -2369,6 +2345,47 @@ With some adjustments, the above recipe can be adapted to use a
 instead of a :class:`~collections.namedtuple`.


+.. _sqlite3-howto-encoding:
+
+How to handle non-UTF-8 text encodings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
+with the ``TEXT`` data type.
+This works well for UTF-8 encoded text, but it might fail for other encodings
+and invalid UTF-8.
+You can use a custom :attr:`~Connection.text_factory` to handle such cases.
+
+Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
+columns with the ``TEXT`` data type containing non-UTF-8 encodings,
+or even arbitrary data.
+To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
+encoded text, for example a table of Czech-English dictionary entries.
+Given a :class:`Connection` instance :py:data:`!con`
+connected to this database,
+we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:
+
+.. testcode::
+
+con.text_factory = lambda data: str(data, encoding="latin2")
+
+For invalid UTF-8 or arbitrary data stored in ``TEXT`` table columns,
+you can use the following technique, borrowed from the :ref:`unicode-howto`:
+
+.. testcode::
+
+con.text_factory = lambda data: str(data, errors="surrogateescape")
+
+.. note::
+
+The :mod:`!sqlite3` module API does not support strings
+containing surrogates.
+
+.. seealso::
+
+:ref:`unicode-howto`
+
+
 .. _sqlite3-explanation:

 Explanation
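The new how-to sets `text_factory` on a connection `con` that it only assumes exists. Below is a minimal, self-contained sketch of both techniques, under these assumptions:

- An in-memory database stands in for the Czech-English dictionary.
- The `entries` table, its columns, the sample word, and the `read_czech` helper are hypothetical.
- Non-UTF-8 bytes are put into a `TEXT` column by applying `CAST(? AS TEXT)` to a bound `bytes` value. SQLite keeps those bytes unchanged but types them as `TEXT`.

```python
import sqlite3

# Hypothetical stand-in for the Czech-English dictionary database:
# an in-memory table whose TEXT column holds Latin-2 encoded bytes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entries(czech TEXT, english TEXT)")

# Binding bytes alone would store a BLOB; CAST(? AS TEXT) keeps the raw
# bytes but gives them the TEXT type, mimicking a non-UTF-8 writer.
con.execute(
    "INSERT INTO entries VALUES(CAST(? AS TEXT), ?)",
    ("kůň".encode("latin2"), "horse"),
)

def read_czech(text_factory):
    """Hypothetical helper: fetch the Czech word using text_factory."""
    con.text_factory = text_factory
    return con.execute("SELECT czech FROM entries").fetchone()[0]

# The default text_factory (str) decodes as UTF-8, which fails for these
# bytes; sqlite3 reports the failure as OperationalError.
try:
    read_czech(str)
    default_failed = False
except sqlite3.OperationalError:
    default_failed = True

# Known encoding: decode with the codec the data was written in.
latin2_word = read_czech(lambda data: str(data, encoding="latin2"))

# Unknown encoding: keep undecodable bytes as lone surrogates, so the
# original bytes can be recovered by re-encoding with the same handler.
escaped_word = read_czech(lambda data: str(data, errors="surrogateescape"))
raw_bytes = escaped_word.encode("utf-8", errors="surrogateescape")

print(default_failed)  # True
print(raw_bytes)       # b'k\xf9\xf2'
con.close()
```

As the note in the how-to says, a surrogate-escaped string such as `escaped_word` cannot be passed back to `sqlite3` as a query parameter. Re-encoding it with `errors="surrogateescape"` returns the original bytes, as `raw_bytes` shows.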