From 6daa8852d024de505cc77332ed76b233ea06c7b8 Mon Sep 17 00:00:00 2001
From: domel Dominik Tomaszuk
Date: Mon, 25 Aug 2025 13:40:56 +0200
Subject: [PATCH 1/2] sep-0010

---
 SEP/README.md            |   1 +
 SEP/SEP-0010/sep-0010.md | 133 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 134 insertions(+)
 create mode 100644 SEP/SEP-0010/sep-0010.md

diff --git a/SEP/README.md b/SEP/README.md
index 0e0a392..69ae6ea 100644
--- a/SEP/README.md
+++ b/SEP/README.md
@@ -14,4 +14,5 @@ This area contains the proposals (SEPs).
 * [SEP-0007](SEP-0007/sep-0007.md): Variable Substitution
 * [SEP-0008](SEP-0008/sep-0008.md): SHA-3
 * [SEP-0009](SEP-0009/sep-0009.md): SPARQL CDTs: extensions for composite datatypes (lists and maps)
+* [SEP-0010](SEP-0010/sep-0010.md): Alignment of SPARQL Built-in Functions with ISO SQL Standard Functions
 
diff --git a/SEP/SEP-0010/sep-0010.md b/SEP/SEP-0010/sep-0010.md
new file mode 100644
index 0000000..e714164
--- /dev/null
+++ b/SEP/SEP-0010/sep-0010.md
@@ -0,0 +1,133 @@
+## Alignment of SPARQL Built-in Functions with ISO SQL Standard Functions
+
+## Short name
+SPARQL-SQL-FUNCTIONS
+
+## SEP number
+SEP-XX
+
+## Authors
+Dominik Tomaszuk (University of Bialystok)
+[Add others who may contribute]
+
+## Abstract
+SPARQL 1.1 defines a limited set of built-in functions for string manipulation, numeric operations, date/time handling, and conditional logic. However, many commonly used functions standardized in ISO/IEC 9075:2023 (SQL:2023) are not currently available in SPARQL. This SEP proposes extending SPARQL with additional non-aggregate functions from the SQL standard to improve interoperability, completeness, and usability. Functions such as `TRIM`, `LPAD`, `RPAD`, `MOD`, `POWER`, `SQRT`, `EXP`, `LOG`, `DATE_ADD`, `TIMESTAMPDIFF`, `CASE`, `NULLIF`, `GREATEST`, and `LEAST` are widely used in database query processing but lack equivalents in SPARQL. By introducing these functions, SPARQL can align better with existing standards, reduce the learning curve for developers, and provide richer query expressivity for RDF data.
+
+## Motivation
+SPARQL 1.1 (2013) provides only a minimal set of built-in functions compared to SQL.
+Key limitations include:
+- Missing string manipulation functions (`TRIM`, `LPAD`, `RPAD`, `POSITION`).
+- Missing numeric/math functions (`MOD`, `POWER`, `SQRT`, `EXP`, `LN`, `LOG10`, `SIN`, `COS`, `TAN`).
+- Limited date/time support (no `DATE_ADD`, `TIMESTAMPDIFF`, or `INTERVAL` arithmetic).
+- Limited conditional functions: only `IF` and `COALESCE` are available, with no multi-branch `CASE` and no `NULLIF`.
+- No generalized comparative functions (`GREATEST`, `LEAST`).
+
+These gaps limit SPARQL’s usability in data integration and analytics scenarios where users expect SQL-like functionality. They also complicate interoperability in hybrid systems where RDF data is queried alongside relational databases.
+
+Scope: This change primarily affects the **SPARQL functions and operators specification** rather than the core query language semantics. Most proposals are ordinary function calls, but `CASE ... END`, `POSITION(substring IN string)`, and `INTERVAL` literals use SQL-specific syntax and would also require extensions to the SPARQL grammar.
+
+## Rationale and Alternatives
+Rationale:
+- **Interoperability**: SQL (ISO/IEC 9075:2023) is the most widely deployed query language. Aligning SPARQL functions with SQL reduces friction in adopting SPARQL.
+- **Developer familiarity**: Many practitioners know SQL but not SPARQL. Familiar function names and semantics ease adoption.
+- **Expressivity**: Without these functions, current SPARQL queries need complex workarounds or external processing.
+
+Alternatives considered:
+1. Keep SPARQL minimal and rely on external application logic.
+2. Define SPARQL-only extensions with new function names.
+3. Adopt ISO SQL function names directly to ensure compatibility.
+
+This SEP recommends option (3) for consistency with established standards.
+
+## Evidence of consensus
+- Multiple research works and developer reports highlight frustration with missing SPARQL functions.
+- W3C Community Group discussions on SPARQL 1.2 already acknowledge gaps in function support.
+- SQL alignment (ISO/IEC 9075:2023) has been proposed informally in workshops and mailing lists.
+
+## Specification
+The following new functions are proposed to be added to SPARQL:
+
+### String functions
+- `TRIM(string)`, `LTRIM(string)`, `RTRIM(string)`
+- `LPAD(string, length, padchar)`
+- `RPAD(string, length, padchar)`
+- `POSITION(substring IN string)`
+
+### Numeric functions
+- `MOD(numeric, numeric)`
+- `POWER(x, y)`
+- `SQRT(x)`
+- `EXP(x)`
+- `LN(x)`, `LOG10(x)`
+- `SIN(x)`, `COS(x)`, `TAN(x)`
+
+### Date/Time functions
+- `DATE_ADD(date, interval)`
+- `TIMESTAMPDIFF(unit, t1, t2)`
+- Support for `INTERVAL` literals (e.g., `INTERVAL '7' DAY`)
+
+### Conditional and logical functions
+- `CASE WHEN ... THEN ... ELSE ... END`
+- `NULLIF(x, y)`
+
+### Comparative functions
+- `GREATEST(x1, x2, …)`
+- `LEAST(x1, x2, …)`
+
+Each function should follow ISO/IEC 9075:2023 semantics, adapted for RDF datatypes (notably `xsd:dateTime`, `xsd:decimal`, etc.). Note that `DATE_ADD` and `TIMESTAMPDIFF` are vendor functions (e.g., in MySQL) rather than ISO SQL functions; the standard expresses the same operations through datetime and interval arithmetic (e.g., `d + INTERVAL '7' DAY`).
+
+## Backwards Compatibility
+- No impact on existing queries: all proposed functions are new additions.
+- Existing SPARQL functions (`STRLEN`, `UCASE`, `LCASE`, etc.) remain valid.
+- Functions already present in both languages (e.g., `CONCAT`) keep their current SPARQL semantics.
+
+## Tests and Implementations
+- Test cases must cover typical inputs, edge cases (e.g., empty strings, NaN, and the unbound variables or expression errors that play the role of SQL NULL), and datatype conversions.
+- Prototype implementations could be built on top of Apache Jena ARQ and RDF4J.
+- Alignment tests should compare outputs against equivalent SQL queries on relational backends.
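The alignment tests above need expected results that do not depend on any single backend. One option is a small reference model of the SQL semantics. The Python sketch below covers `LPAD`/`RPAD`, `POSITION`, `MOD`, `NULLIF`, and `GREATEST`/`LEAST`. It is a hypothetical illustration, not part of the proposal: the `sql_*` names are invented, `None` stands in for SQL NULL (how NULL maps to SPARQL unbound variables or errors is left open), and the truncation and NULL-handling choices follow common dialects (PostgreSQL, MySQL, Oracle) rather than a line-by-line check against the ISO text.

```python
from decimal import Decimal
from typing import Any, Callable, Union

# Hypothetical reference model of SQL semantics for a few proposed functions.
# None stands in for SQL NULL; how NULL should map onto SPARQL unbound
# variables or expression errors is deliberately left open.
Number = Union[int, Decimal]


def _pad(s: str, length: int, pad: str, left: bool) -> str:
    if not pad:
        # Assumption: reject an empty pad string; SQL dialects disagree here.
        raise ValueError("pad must be non-empty")
    if length <= len(s):
        # Target shorter than input: keep the leftmost `length` characters.
        return s[: max(length, 0)]
    # Repeat the pad string and cut it to exactly the missing width.
    fill = (pad * length)[: length - len(s)]
    return fill + s if left else s + fill


def sql_lpad(s: str, length: int, pad: str = " ") -> str:
    """LPAD(s, length, pad)."""
    return _pad(s, length, pad, left=True)


def sql_rpad(s: str, length: int, pad: str = " ") -> str:
    """RPAD(s, length, pad)."""
    return _pad(s, length, pad, left=False)


def sql_position(sub: str, s: str) -> int:
    """POSITION(sub IN s): 1-based index of the first occurrence, 0 if absent."""
    return s.find(sub) + 1


def sql_mod(a: Number, b: Number) -> Number:
    """MOD(a, b): the remainder carries the sign of the dividend a."""
    if b == 0:
        raise ZeroDivisionError("MOD by zero")
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def sql_nullif(x: Any, y: Any) -> Any:
    """NULLIF(x, y): NULL if x = y is true, otherwise x (x = NULL is never true)."""
    if x is not None and y is not None and x == y:
        return None
    return x


def _extreme(pick: Callable[[list], Any], args: tuple, ignore_nulls: bool) -> Any:
    values = [v for v in args if v is not None]
    # Default policy: any NULL argument yields NULL (as in MySQL and Oracle).
    # ignore_nulls=True skips NULL arguments instead (as in PostgreSQL).
    if not values or (len(values) < len(args) and not ignore_nulls):
        return None
    return pick(values)


def sql_greatest(*args: Any, ignore_nulls: bool = False) -> Any:
    """GREATEST(x1, x2, ...)."""
    return _extreme(max, args, ignore_nulls)


def sql_least(*args: Any, ignore_nulls: bool = False) -> Any:
    """LEAST(x1, x2, ...)."""
    return _extreme(min, args, ignore_nulls)
```

For example, `sql_mod(-7, 3)` returns `-1`, because ISO SQL gives the remainder the sign of the dividend (as XPath's `op:numeric-mod` also does), whereas Python's built-in `-7 % 3` is `2`. Host-language differences of this kind are exactly what the proposed alignment tests need to catch.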
+
+---
+
+## Appendix A: Function Mapping between SQL and SPARQL 1.1
+
+| SQL Function    | SPARQL 1.1 Equivalent  |
+|-----------------|------------------------|
+| CHAR_LENGTH / LENGTH | STRLEN            |
+| TRIM            |                        |
+| LTRIM           |                        |
+| RTRIM           |                        |
+| LPAD            |                        |
+| RPAD            |                        |
+| POSITION        |                        |
+| UPPER           | UCASE                  |
+| LOWER           | LCASE                  |
+| SUBSTRING       | SUBSTR                 |
+| CONCAT          | CONCAT                 |
+| REPLACE         | REPLACE                |
+| LIKE_REGEX      | REGEX                  |
+| ABS             | ABS                    |
+| MOD             |                        |
+| CEIL / CEILING  | CEIL                   |
+| FLOOR           | FLOOR                  |
+| ROUND           | ROUND                  |
+| EXP             |                        |
+| LN              |                        |
+| LOG10           |                        |
+| POWER           |                        |
+| SQRT            |                        |
+| SIN             |                        |
+| COS             |                        |
+| TAN             |                        |
+| CURRENT_TIMESTAMP | NOW                  |
+| EXTRACT         | YEAR, MONTH, DAY, HOURS, MINUTES, SECONDS |
+| INTERVAL        |                        |
+| DATE_ADD        |                        |
+| TIMESTAMPDIFF   |                        |
+| CASE            | IF (two branches only) |
+| COALESCE        | COALESCE               |
+| NULLIF          |                        |
+| GREATEST        |                        |
+| LEAST           |                        |
+| CAST            | STR(), XSD constructor functions (e.g., xsd:integer(...)) |
+| CURRENT_USER    |                        |
+

From 898b0ce495c42e8153cf8885812dfb0a8219dccb Mon Sep 17 00:00:00 2001
From: domel Dominik Tomaszuk
Date: Mon, 25 Aug 2025 13:42:45 +0200
Subject: [PATCH 2/2] sep-0010 update

---
 SEP/SEP-0010/sep-0010.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/SEP/SEP-0010/sep-0010.md b/SEP/SEP-0010/sep-0010.md
index e714164..75ab6da 100644
--- a/SEP/SEP-0010/sep-0010.md
+++ b/SEP/SEP-0010/sep-0010.md
@@ -4,11 +4,10 @@
 SPARQL-SQL-FUNCTIONS
 
 ## SEP number
-SEP-XX
+SEP-10
 
 ## Authors
 Dominik Tomaszuk (University of Bialystok)
-[Add others who may contribute]
 
 ## Abstract
 SPARQL 1.1 defines a limited set of built-in functions for string manipulation, numeric operations, date/time handling, and conditional logic. However, many commonly used functions standardized in ISO/IEC 9075:2023 (SQL:2023) are not currently available in SPARQL. This SEP proposes extending SPARQL with additional non-aggregate functions from the SQL standard to improve interoperability, completeness, and usability. Functions such as `TRIM`, `LPAD`, `RPAD`, `MOD`, `POWER`, `SQRT`, `EXP`, `LOG`, `DATE_ADD`, `TIMESTAMPDIFF`, `CASE`, `NULLIF`, `GREATEST`, and `LEAST` are widely used in database query processing but lack equivalents in SPARQL. By introducing these functions, SPARQL can align better with existing standards, reduce the learning curve for developers, and provide richer query expressivity for RDF data.