[SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC #49528

EnricoMi · 2025-01-16T14:36:33Z

What changes were proposed in this pull request?

This is a follow-up on #16685 and #16692.

Implements upsert mode for SaveMode.Append of the MySql, MsSql, and Postgres JDBC source.

See #41611 for an alternative using the MERGE INTO command (not supported by MySql).
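For orientation, the native upsert syntax of two of these databases can be sketched as below. This is a minimal illustration and not necessarily the statements this PR generates. The helper names `postgresUpsert` and `mysqlUpsert` and the table and column names in the usage note are hypothetical. Identifier quoting is omitted and at least one non-key column is assumed.

```scala
object UpsertSqlSketch {
  // Hypothetical helper: PostgreSQL upsert with JDBC placeholders.
  // Assumes keyColumns are covered by a primary key or unique index.
  def postgresUpsert(table: String, columns: Seq[String], keyColumns: Seq[String]): String = {
    val placeholders = columns.map(_ => "?").mkString(", ")
    // Only non-key columns are updated when a conflicting row exists.
    val updates = columns.filterNot(keyColumns.contains)
      .map(c => s"$c = EXCLUDED.$c").mkString(", ")
    s"INSERT INTO $table (${columns.mkString(", ")}) VALUES ($placeholders) " +
      s"ON CONFLICT (${keyColumns.mkString(", ")}) DO UPDATE SET $updates"
  }

  // Hypothetical helper: MySQL upsert. MySQL infers the conflict target from
  // the table's unique indexes, so keyColumns only decides what gets updated.
  // VALUES(col) is deprecated in recent MySQL 8.0 releases but still accepted.
  def mysqlUpsert(table: String, columns: Seq[String], keyColumns: Seq[String]): String = {
    val placeholders = columns.map(_ => "?").mkString(", ")
    val updates = columns.filterNot(keyColumns.contains)
      .map(c => s"$c = VALUES($c)").mkString(", ")
    s"INSERT INTO $table (${columns.mkString(", ")}) VALUES ($placeholders) " +
      s"ON DUPLICATE KEY UPDATE $updates"
  }
}
```

For example, `UpsertSqlSketch.postgresUpsert("target", Seq("id", "name"), Seq("id"))` builds an `INSERT ... ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name` statement.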

Why are the changes needed?

The JDBC writer only supports either truncating the existing table or inserting. Duplicates, i.e. rows with identical values in the primary or unique index columns, cause an exception, which prevents updating existing rows and inserting new ones in the same write.

Re-evaluating a partition after executor loss re-inserts rows that an earlier attempt already wrote; the resulting duplicate-key exception kills the entire Spark job.
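The retry problem can be illustrated with a small in-memory stand-in for a table with a unique key. This is a sketch only: `Table`, `Row` and `writeWithRetry` are hypothetical names, and no database or Spark is involved. It assumes the first attempt's rows were already persisted when the partition is re-evaluated.

```scala
import scala.collection.mutable

object RetrySketch {
  final case class Row(id: Int, value: String)

  // Stand-in for a table with a unique index on `id`.
  final class Table {
    private val rows = mutable.Map.empty[Int, String]

    // Plain insert: fails on a duplicate key, like a unique-index violation.
    def insert(r: Row): Unit = {
      if (rows.contains(r.id)) throw new IllegalStateException(s"duplicate key ${r.id}")
      rows.update(r.id, r.value)
    }

    // Upsert: inserts a new row or overwrites the existing one.
    def upsert(r: Row): Unit = rows.update(r.id, r.value)

    def snapshot: Map[Int, String] = rows.toMap
  }

  // Writes the same partition twice, as if the first attempt had written
  // its rows before the executor was lost and the partition was re-evaluated.
  def writeWithRetry(table: Table, partition: Seq[Row], write: (Table, Row) => Unit): Unit = {
    partition.foreach(r => write(table, r))
    partition.foreach(r => write(table, r))
  }
}
```

With `insert`, the second pass throws on the first row; with `upsert`, the second pass leaves the table unchanged, so the retry is idempotent.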

Does this PR introduce any user-facing change?

This adds upsert and upsertKeyColumns options for SaveMode.Append of the JDBC source.
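A minimal sketch of how these options might be set, assuming they are passed like other JDBC writer options. The option names come from this PR; the values shown and their exact format are assumptions, and `id`, `jdbcUrl` and `target` are hypothetical. The writer call appears only as a comment because it needs a Spark build that includes this change.

```scala
object UpsertOptionsSketch {
  // Option names are taken from the PR description; value formats are assumptions.
  val upsertOptions: Map[String, String] = Map(
    "upsert" -> "true",
    "upsertKeyColumns" -> "id" // hypothetical key column
  )

  // With a DataFrame `df` and a Spark build containing this change,
  // the write might look like:
  //   df.write.format("jdbc").mode(SaveMode.Append)
  //     .option("url", jdbcUrl).option("dbtable", "target")
  //     .options(upsertOptions).save()
}
```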

How was this patch tested?

Tests in JdbcSuite and integration suites.

sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/UpsertTests.scala

EnricoMi · 2025-01-20T15:42:02Z

@MaxGekk thanks for the comments, all addressed in a03345c0.

LuciferYang · 2025-03-06T07:53:18Z

cc @beliefer and @yaooqinn FYI

beliefer · 2025-03-06T11:51:43Z

...re/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala

              truncateTable(conn, options)
              val tableSchema = JdbcUtils.getSchemaOption(conn, options)
-              saveTable(df, tableSchema, isCaseSensitive, options)
+              saveTable(df, tableSchema, isCaseSensitive, upsert, options)


It looks strange to apply upsert when SaveMode is Overwrite here.

beliefer · 2025-03-06T11:52:52Z

...re/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala

              dropTable(conn, options.table, options)
              createTable(conn, options.table, df.schema, isCaseSensitive, options)
-              saveTable(df, Some(df.schema), isCaseSensitive, options)
+              saveTable(df, Some(df.schema), isCaseSensitive, upsert, options)


beliefer

I have no idea. It seems we should add a new SaveMode Upsert.

github-actions · 2025-08-07T00:31:53Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added SQL DOCS labels Jan 16, 2025

EnricoMi mentioned this pull request Jan 16, 2025

[SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC #41518

Closed

MaxGekk reviewed Jan 17, 2025

EnricoMi force-pushed the jdbc-upsert-2 branch from a03345c to 8e3c043 Compare March 5, 2025 13:15

beliefer reviewed Mar 6, 2025

EnricoMi added 14 commits April 28, 2025 06:40

Implement upsert for MySQL

49adfe3

Code cleanup

94afa86

Move upsert tests into trait

1bb0457

Implement upsert for MsSqlServer

ba196a4

Fix non-existing upsert test for Postgres

a78cc0a

Add upsert concurrency integration test

c826ce8

Add tests with varying column order

feea1dd

Add test with varying column order, sketch more tests

7b77e4a

Revert empty line removal, fix scalastyle error

f551323

Refactor tableDoesNotSupportError to reuse in tableDoesNotSupportUpsertsError

1a045c5

Fix after merge master

48fbb3b

Fix after merge master

8d5bf9f

Fix after merge master

a6f8ed8

Apply code review comments

d9b33ea

- Use SparkUnsupportedOperationException
- Remove unused string interpolation
- Fix indentation

EnricoMi force-pushed the jdbc-upsert-2 branch from 8e3c043 to d9b33ea Compare April 28, 2025 04:40

github-actions bot added the Stale label Aug 7, 2025

github-actions bot closed this Aug 8, 2025
