`SEP/README.md` (1 addition, 0 deletions):
* [SEP-0007](SEP-0007/sep-0007.md): Variable Substitution
* [SEP-0008](SEP-0008/sep-0008.md): SHA-3
* [SEP-0009](SEP-0009/sep-0009.md): SPARQL CDTs: extensions for composite datatypes (lists and maps)
* [SEP-0010](SEP-0010/sep-0010.md): Alignment of SPARQL Built-in Functions with ISO SQL Standard Functions

`SEP/SEP-0010/sep-0010.md` (new file, 132 additions):
## Alignment of SPARQL Built-in Functions with ISO SQL Standard Functions

## Short name
SPARQL-SQL-FUNCTIONS

## SEP number
SEP-10

## Authors
Dominik Tomaszuk (University of Bialystok)

## Abstract
SPARQL 1.1 defines a limited set of built-in functions for string manipulation, numeric operations, date/time handling, and conditional logic. However, many commonly used functions standardized in ISO/IEC 9075:2023 (SQL:2023) are not available in SPARQL. This SEP proposes extending SPARQL with additional non-aggregate functions from the SQL standard to improve interoperability, completeness, and usability. Functions and expressions such as `TRIM`, `LPAD`, `RPAD`, `MOD`, `POWER`, `SQRT`, `EXP`, `LN`, `LOG10`, `DATE_ADD`, `TIMESTAMPDIFF`, `CASE`, `NULLIF`, `GREATEST`, and `LEAST` are widely used in database query processing but have no equivalent in SPARQL. Introducing them would align SPARQL more closely with existing standards, flatten the learning curve for developers who already know SQL, and make queries over RDF data more expressive.

## Motivation
SPARQL 1.1 (2013) provides only a minimal set of built-in functions compared to SQL.
Key limitations include:
- Missing string manipulation functions (`TRIM`, `LPAD`, `RPAD`, `POSITION`).
- Missing numeric/math functions (`MOD`, `POWER`, `SQRT`, `EXP`, `LN`, `LOG10`, `SIN`, `COS`, `TAN`).
- Limited date/time support (no `DATE_ADD`, `TIMESTAMPDIFF`, or `INTERVAL` arithmetic).
- Missing conditional/logical functions (`CASE`, `NULLIF`); SPARQL 1.1 offers only `IF` and `COALESCE`.
- No generalized comparative functions (`GREATEST`, `LEAST`).

These gaps limit SPARQL’s usability in data integration and analytics scenarios where users expect SQL-like functionality. They also complicate interoperability in hybrid systems where RDF data is queried alongside relational databases.

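Scope: This change mainly affects the **SPARQL functions and operators specification**, not the core query language semantics. The SQL-specific syntactic forms (`CASE ... END`, `POSITION(... IN ...)`, and `INTERVAL` literals) would, however, also require additions to the SPARQL grammar.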

## Rationale and Alternatives
Rationale:
- **Interoperability**: SQL (ISO/IEC 9075:2023) is the most widely deployed query language. Aligning SPARQL functions with SQL reduces friction in adopting SPARQL.
- **Developer familiarity**: Many practitioners know SQL but not SPARQL. Familiar function names and semantics ease adoption.
- **Expressivity**: Without these functions, users must resort to complex workarounds or external processing.

Alternatives considered:
1. Keep SPARQL minimal and rely on external application logic.
2. Define SPARQL-only extensions with new function names.
3. Adopt ISO SQL function names directly to ensure compatibility.

This SEP recommends option (3) for consistency with established standards.

## Evidence of consensus
- Multiple research works and developer reports highlight frustration with missing SPARQL functions.
- W3C Community Group discussions on SPARQL 1.2 already acknowledge gaps in function support.
- SQL alignment (ISO/IEC 9075:2023) has been proposed informally in workshops and mailing lists.

## Specification
The following new functions are proposed to be added to SPARQL:

### String functions
- `TRIM(string)`, `LTRIM(string)`, `RTRIM(string)`
- `LPAD(string, length, padchar)`
- `RPAD(string, length, padchar)`
- `POSITION(substring IN string)`
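The string functions above could be pinned down with a small reference model. The following is a minimal Python sketch, not part of the proposal: it assumes `xsd:string` maps to Python `str`, keeps SQL's default trim character (a single space), and borrows the PostgreSQL convention that `LPAD`/`RPAD` truncate an input that is already longer than `length`. The helper `_pad` and the handling of negative lengths are hypothetical choices.

```python
def trim(s: str) -> str:
    """TRIM(s): strip leading and trailing spaces (SQL's default trim character)."""
    return s.strip(" ")


def ltrim(s: str) -> str:
    """LTRIM(s): strip leading spaces only."""
    return s.lstrip(" ")


def rtrim(s: str) -> str:
    """RTRIM(s): strip trailing spaces only."""
    return s.rstrip(" ")


def _pad(s: str, length: int, pad: str, left: bool) -> str:
    """Hypothetical shared helper for LPAD and RPAD."""
    length = max(length, 0)  # assumption: a negative length yields ""
    if length <= len(s):
        return s[:length]  # assumption: an over-long input is cut on the right
    if not pad:
        raise ValueError("pad string must be non-empty")  # would be a SPARQL error
    fill = (pad * length)[: length - len(s)]  # repeat the pad, then cut to size
    return fill + s if left else s + fill


def lpad(s: str, length: int, pad: str = " ") -> str:
    """LPAD(s, length, pad): left-pad s to `length` characters."""
    return _pad(s, length, pad, left=True)


def rpad(s: str, length: int, pad: str = " ") -> str:
    """RPAD(s, length, pad): right-pad s to `length` characters."""
    return _pad(s, length, pad, left=False)


def position(sub: str, s: str) -> int:
    """POSITION(sub IN s): 1-based index of the first match, 0 if absent."""
    return s.find(sub) + 1
```

For example, `lpad("7", 3, "0")` gives `"007"` and `position("lo", "hello")` gives `4`. Python indexes strings by Unicode code point, which matches how SPARQL's `STRLEN` and `SUBSTR` count characters.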

### Numeric functions
- `MOD(numeric, numeric)`
- `POWER(x, y)`
- `SQRT(x)`
- `EXP(x)`
- `LN(x)`, `LOG10(x)`
- `SIN(x)`, `COS(x)`, `TAN(x)`
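`MOD` is the numeric function whose SQL semantics differ most visibly from common programming-language operators: SQL's remainder takes the sign of the dividend, whereas Python's `%` floors (so `-7 % 3` is `2`). The sketch below is a hypothetical reference model, assuming `xsd:integer`, `xsd:decimal`, and `xsd:double` map to Python `int`, `Decimal`, and `float`; it also shows one possible domain check for `SQRT`.

```python
import math
from decimal import Decimal
from typing import Union

# Assumed stand-ins for xsd:integer, xsd:decimal and xsd:double.
Number = Union[int, Decimal, float]


def sql_mod(x: Number, y: Number) -> Number:
    """MOD(x, y): remainder of truncated division; the result takes the sign of x."""
    if y == 0:
        raise ZeroDivisionError("MOD by zero")  # would surface as a SPARQL error
    if isinstance(x, int) and isinstance(y, int):
        r = abs(x) % abs(y)
        return r if x >= 0 else -r
    if isinstance(x, Decimal) or isinstance(y, Decimal):
        # Decimal's % already keeps the sign of the dividend.
        return Decimal(x) % Decimal(y)
    return math.fmod(x, y)  # fmod also keeps the sign of the dividend


def sql_sqrt(x: float) -> float:
    """SQRT(x) with an explicit domain check (error on negative input)."""
    if x < 0:
        raise ValueError("SQRT of a negative number")
    return math.sqrt(x)
```

Raising an error for the square root of a negative number matches common SQL engines such as PostgreSQL, whereas XPath 3.1's `math:sqrt` returns `NaN` for negative input; the SEP would need to choose one behaviour.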

### Date/Time functions
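- `DATE_ADD(date, interval)`
- `TIMESTAMPDIFF(unit, t1, t2)`
- Support for `INTERVAL` literals (e.g., `INTERVAL '7' DAY`)

Note that `DATE_ADD` and `TIMESTAMPDIFF` are dialect names (e.g., MySQL) rather than functions defined in ISO/IEC 9075 itself, which expresses the same operations as datetime and interval arithmetic (e.g., `d + INTERVAL '7' DAY`).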
> **Review comment (Contributor):** Isn't this already covered by `xsd:duration`, which is supported by a few implementations?
>
> **Reply (Contributor Author):** `xsd:duration` is indeed supported in some implementations, but it only provides the datatype. What is missing are standardized functions and operators (e.g., `DATE_ADD`, `TIMESTAMPDIFF`) that make such durations practically usable within queries across engines.
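The intended behaviour of `DATE_ADD` and `TIMESTAMPDIFF` can be sketched for fixed-length units, where every interval maps to an exact duration (the same kind of value that `xsd:dayTimeDuration` holds). The following Python model is hypothetical: it maps `xsd:dateTime` to `datetime`, supports only `DAY`, `HOUR`, `MINUTE`, and `SECOND`, and truncates partial units toward zero, an assumption modelled on MySQL's `TIMESTAMPDIFF`. Calendar units such as `MONTH` and `YEAR` need separate rules (e.g., for adding a month to 31 January) and are left out.

```python
from datetime import datetime, timedelta

# Hypothetical unit table: fixed-length units only, so each maps to an exact timedelta.
# An unsupported unit raises KeyError, which would correspond to a SPARQL error.
UNITS = {
    "DAY": timedelta(days=1),
    "HOUR": timedelta(hours=1),
    "MINUTE": timedelta(minutes=1),
    "SECOND": timedelta(seconds=1),
}


def date_add(t: datetime, amount: int, unit: str) -> datetime:
    """DATE_ADD(t, INTERVAL 'amount' unit) for fixed-length units."""
    return t + amount * UNITS[unit]


def timestampdiff(unit: str, t1: datetime, t2: datetime) -> int:
    """TIMESTAMPDIFF(unit, t1, t2): whole units from t1 to t2, truncated toward zero."""
    delta = t2 - t1
    whole = abs(delta) // UNITS[unit]  # floor division on a non-negative span
    return whole if delta >= timedelta(0) else -whole
```

With this model, adding `INTERVAL '7' DAY` to 27 February 2024 lands on 5 March 2024, because the underlying `datetime` arithmetic already accounts for the leap day.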


### Conditional and logical functions
- `CASE WHEN ... THEN ... ELSE ... END`
- `NULLIF(x, y)`
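A reference model for the conditional expressions mainly has to capture two points: SQL `NULL` corresponds most closely to an unbound SPARQL value (modelled here as `None`), and a searched `CASE` tests its conditions in order and conventionally evaluates only the selected result. In the hypothetical sketch below, conditions and results are passed as zero-argument functions so that this laziness is explicit; a condition that yields `None` (SQL *unknown*) is treated as not satisfied.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Thunk = Callable[[], Any]  # a deferred expression


def nullif(x: Any, y: Any) -> Optional[Any]:
    """NULLIF(x, y): None (i.e., unbound) when x equals y, otherwise x."""
    return None if x == y else x


def case(branches: Sequence[Tuple[Thunk, Thunk]], default: Optional[Thunk] = None) -> Any:
    """Searched CASE WHEN c1 THEN v1 ... ELSE d END.

    Conditions are tested in order and only the chosen result is evaluated,
    so a later branch that would fail is never run.
    """
    for condition, value in branches:
        if condition():  # None (unknown) and False both skip the branch
            return value()
    return default() if default is not None else None  # no ELSE means NULL
```

In SPARQL 1.1 the same effect as `case` can be approximated with nested `IF` calls; `NULLIF` has no direct counterpart.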

### Comparative functions
- `GREATEST(x1, x2, …)`
- `LEAST(x1, x2, …)`
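SQL dialects disagree on how `GREATEST` and `LEAST` treat `NULL` arguments: PostgreSQL ignores them and returns `NULL` only when all arguments are `NULL`, while MySQL returns `NULL` as soon as any argument is `NULL`. The hypothetical sketch below exposes both behaviours through an `ignore_nulls` flag, again modelling `NULL` as `None`; the proposal would need to fix one of them for SPARQL.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def _extreme(pick: Callable[[Iterable[Any]], Any], args: Tuple[Any, ...], ignore_nulls: bool) -> Optional[Any]:
    """Shared logic for GREATEST/LEAST; `pick` is the built-in max or min."""
    if not args:
        raise TypeError("at least one argument is required")
    values = [a for a in args if a is not None]
    if len(values) < len(args) and not ignore_nulls:
        return None  # MySQL-style: any NULL argument makes the result NULL
    return pick(values) if values else None  # all arguments NULL gives NULL


def greatest(*args: Any, ignore_nulls: bool = False) -> Optional[Any]:
    """GREATEST(x1, x2, ...): the largest argument."""
    return _extreme(max, args, ignore_nulls)


def least(*args: Any, ignore_nulls: bool = False) -> Optional[Any]:
    """LEAST(x1, x2, ...): the smallest argument."""
    return _extreme(min, args, ignore_nulls)
```

Comparing incompatible values (e.g., a string with a number) raises `TypeError` in Python, which would correspond to a SPARQL type error.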

Each function should follow ISO/IEC 9075:2023 semantics, adapted for RDF datatypes (e.g., `xsd:dateTime`, `xsd:decimal`).

## Backwards Compatibility
- No impact on existing queries: all proposed functions are new additions.
- Existing SPARQL functions (`STRLEN`, `UCASE`, `LCASE`, etc.) remain valid.
- Functions that already exist in SPARQL 1.1 and also appear in SQL (e.g., `CONCAT`) keep their existing SPARQL semantics.

## Tests and Implementations
- Test cases must cover typical inputs, edge cases (e.g., empty strings, NaN, and unbound arguments, the closest SPARQL counterpart of SQL `NULL`), and datatype conversions.
- Prototype implementations could be built on top of Apache Jena ARQ and RDF4J.
- Alignment tests should compare outputs against equivalent SQL queries on relational backends.
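Alignment tests could be driven by a small table of cases that pairs an SQL expression with the value expected on the SPARQL side. The sketch below uses Python's standard `sqlite3` module as a convenient relational backend; SQLite is a dialect rather than a conforming ISO SQL implementation and lacks several of the proposed functions (e.g., `LPAD`), so it is only a stand-in. The lambdas are hypothetical placeholders for results that a SPARQL engine (such as Jena ARQ or RDF4J) would return; wiring in real SPARQL queries is not shown.

```python
import sqlite3
from typing import Any, Callable, List, Sequence, Tuple

# (SQL expression, bound arguments, stand-in for the SPARQL-side result).
Case = Tuple[str, Tuple[Any, ...], Callable[..., Any]]

CASES: List[Case] = [
    ("SELECT trim(?)", ("  ab  ",), lambda s: s.strip(" ")),
    ("SELECT ltrim(?)", ("  ab  ",), lambda s: s.lstrip(" ")),
    ("SELECT rtrim(?)", ("  ab  ",), lambda s: s.rstrip(" ")),
    ("SELECT nullif(?, ?)", (1, 1), lambda x, y: None if x == y else x),
    ("SELECT nullif(?, ?)", (1, 2), lambda x, y: None if x == y else x),
    # SQLite's multi-argument scalar max() plays the role of GREATEST here.
    ("SELECT max(?, ?, ?)", (3, 7, 5), lambda *xs: max(xs)),
]


def run_alignment(cases: Sequence[Case]) -> List[Tuple[str, Tuple[Any, ...], Any, Any]]:
    """Return (sql, args, sql_result, reference_result) for every mismatch."""
    conn = sqlite3.connect(":memory:")
    try:
        failures = []
        for sql, args, reference in cases:
            expected = conn.execute(sql, args).fetchone()[0]
            actual = reference(*args)
            if actual != expected:
                failures.append((sql, args, expected, actual))
        return failures
    finally:
        conn.close()
```

`run_alignment(CASES)` returns an empty list when every stand-in result matches SQLite; each mismatch is reported with both values so that the diverging semantics can be inspected.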

---

## Appendix A: Function Mapping between SQL and SPARQL 1.1

| SQL Function | SPARQL 1.1 Equivalent |
|-------------------|------------------------|
| LENGTH | STRLEN |
| TRIM | (none) |
| LTRIM | (none) |
| RTRIM | (none) |
| LPAD | (none) |
| RPAD | (none) |
| POSITION | (none) |
| UPPER | UCASE |
| LOWER | LCASE |
| SUBSTRING | SUBSTR |
| CONCAT | CONCAT |
| REPLACE | REPLACE |
| REGEXP_MATCHES | REGEX |
| ABS | ABS |
| MOD | (none) |
| CEIL / CEILING | CEIL |
| FLOOR | FLOOR |
| ROUND | ROUND |
| EXP | (none) |
| LN | (none) |
| LOG10 | (none) |
| POWER | (none) |
| SQRT | (none) |
| SIN | (none) |
| COS | (none) |
| TAN | (none) |
| CURRENT_TIMESTAMP | NOW |
| EXTRACT | YEAR, MONTH, DAY, HOURS, MINUTES, SECONDS |
| INTERVAL | (none) |
| DATE_ADD | (none) |
| TIMESTAMPDIFF | (none) |
| CASE | IF (two-way only; nest for more branches) |
| COALESCE | COALESCE |
| NULLIF | (none) |
| GREATEST | (none) |
| LEAST | (none) |
| CAST | STR(), XSD constructor functions (e.g., xsd:integer(...)) |
| CURRENT_USER | (none) |