2 changes: 1 addition & 1 deletion python/docs/source/index.rst
@@ -42,7 +42,7 @@ SQL query engine.

 Running on top of Spark, the streaming feature in Apache Spark enables powerful
 interactive and analytical applications across both streaming and historical data,
-while inheriting Sparks ease of use and fault tolerance characteristics.
+while inheriting Spark's ease of use and fault tolerance characteristics.
 
 **MLlib**
 
2 changes: 1 addition & 1 deletion python/docs/source/migration_guide/pyspark_2.4_to_3.0.rst
@@ -22,7 +22,7 @@ Upgrading from PySpark 2.4 to 3.0

 * In Spark 3.0, PySpark requires a pandas version of 0.23.2 or higher to use pandas related functionality, such as ``toPandas``, ``createDataFrame`` from pandas DataFrame, and so on.
 
-* In Spark 3.0, PySpark requires a PyArrow version of 0.12.1 or higher to use PyArrow related functionality, such as ``pandas_udf``, ``toPandas`` and ``createDataFrame`` with spark.sql.execution.arrow.enabled=true, etc.
+* In Spark 3.0, PySpark requires a PyArrow version of 0.12.1 or higher to use PyArrow related functionality, such as ``pandas_udf``, ``toPandas`` and ``createDataFrame`` with "spark.sql.execution.arrow.enabled=true", etc.
 
 * In PySpark, when creating a ``SparkSession`` with ``SparkSession.builder.getOrCreate()``, if there is an existing ``SparkContext``, the builder was trying to update the ``SparkConf`` of the existing ``SparkContext`` with configurations specified to the builder, but the ``SparkContext`` is shared by all ``SparkSession`` s, so we should not update them. In 3.0, the builder comes to not update the configurations. This is the same behavior as Java/Scala API in 2.3 and above. If you want to update them, you need to update them prior to creating a ``SparkSession``.

2 changes: 1 addition & 1 deletion python/docs/source/user_guide/python_packaging.rst
@@ -107,7 +107,7 @@ In the case of a ``spark-submit`` script, you can use it as follows:

 Note that ``PYSPARK_DRIVER_PYTHON`` above should not be set for cluster modes in YARN or Kubernetes.
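For context, the ``spark-submit`` usage this hunk refers to is roughly the following sketch; the archive and script names (``pyspark_conda_env.tar.gz``, ``app.py``) are placeholders, not from this diff:

```shell
# Sketch: ship a packed Conda environment with the job.
# PYSPARK_DRIVER_PYTHON selects the driver-side interpreter; per the note
# above, leave it unset for YARN or Kubernetes cluster modes.
export PYSPARK_DRIVER_PYTHON=python
export PYSPARK_PYTHON=./environment/bin/python
spark-submit --archives pyspark_conda_env.tar.gz#environment app.py
```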

-If youre on a regular Python shell or notebook, you can try it as shown below:
+If you're on a regular Python shell or notebook, you can try it as shown below:
 
 .. code-block:: python

4 changes: 2 additions & 2 deletions python/pyspark/ml/fpm.py
@@ -161,11 +161,11 @@ class FPGrowth(JavaEstimator, _FPGrowthParams, JavaMLWritable, JavaMLReadable):
 .. [1] Haoyuan Li, Yi Wang, Dong Zhang, Ming Zhang, and Edward Y. Chang. 2008.
 Pfp: parallel fp-growth for query recommendation.
 In Proceedings of the 2008 ACM conference on Recommender systems (RecSys '08).
-Association for Computing Machinery, New York, NY, USA, 107114.
+Association for Computing Machinery, New York, NY, USA, 107-114.
 DOI: https://doi.org/10.1145/1454008.1454027
 .. [2] Jiawei Han, Jian Pei, and Yiwen Yin. 2000.
 Mining frequent patterns without candidate generation.
-SIGMOD Rec. 29, 2 (June 2000), 112.
+SIGMOD Rec. 29, 2 (June 2000), 1-12.
 DOI: https://doi.org/10.1145/335191.335372


2 changes: 1 addition & 1 deletion python/pyspark/mllib/clustering.py
@@ -143,7 +143,7 @@ class BisectingKMeans(object):
 -----
 See the original paper [1]_
 
-.. [1] Steinbach, M. et al. A Comparison of Document Clustering Techniques. (2000).
+.. [1] Steinbach, M. et al. "A Comparison of Document Clustering Techniques." (2000).
 KDD Workshop on Text Mining, 2000
 http://glaros.dtc.umn.edu/gkhome/fetch/papers/docclusterKDDTMW00.pdf
 """