Description
- Neo4j Mode: Single instance
- Python version: 3.10
- Driver version: Python driver 4.4.5 (also applies to the current state of the repository)
- Operating system: Windows 10
TLDR at the bottom.
Steps to reproduce
The following is a minimal working example to showcase the issue. It merges the same relationship between the same two nodes over and over again. The relationship has 3 properties of type neo4j.time.DateTime.
from datetime import datetime
import cProfile

from neo4j import GraphDatabase
from neo4j.time import DateTime

property_date = datetime.strptime("2021-12-01", "%Y-%m-%d")
neo4j_date = DateTime.from_native(property_date)

NUM_RELATIONSHIPS = 100000
BATCH_SIZE = 1000
DBMS_IP = "127.0.0.1"
DBMS_PORT = 7687
DBMS_AUTH_TUP = ("neo4j", "admin")

# Every relationship carries three properties of type neo4j.time.DateTime.
relationships = [{"start_node_labels": ["Node1"], "start_node_properties": {"prop": "yes"},
                  "end_node_labels": ["Node2"], "end_node_properties": {"prop": "yes"},
                  "label": "relationship",
                  "properties": {f"date{i}": neo4j_date for i in range(3)}}
                 for _ in range(NUM_RELATIONSHIPS)]
def create_relationships(tx, batch):
    tx.run("""WITH $batch AS batch
              UNWIND batch AS ind
              CALL apoc.merge.node(ind.start_node_labels,
                                   ind.start_node_properties)
              YIELD node AS startNode
              CALL apoc.merge.node(ind.end_node_labels,
                                   ind.end_node_properties)
              YIELD node AS endNode
              CALL apoc.merge.relationship(startNode, ind.label,
                                           ind.properties, {}, endNode, {})
              YIELD rel RETURN null""", batch=batch).consume()
connect_string = f"bolt://{DBMS_IP}:{DBMS_PORT}"
driver = GraphDatabase.driver(connect_string, auth=DBMS_AUTH_TUP)
session = driver.session()

# Profile the entire write workload.
with cProfile.Profile() as pr:
    for i in range(0, len(relationships), BATCH_SIZE):
        batch = relationships[i:i + BATCH_SIZE]
        session.write_transaction(create_relationships, batch)
    session.close()
    driver.close()
pr.print_stats("time")
From the first line of the profiler output, it is evident that the vast majority of the time is spent in to_clock_time.
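(Not part of the original script, but for completeness: the sorted report can also be narrowed to the suspected hotspot using the standard pstats module, e.g.:

import pstats

# Print only the entries whose name matches "to_clock_time".
pstats.Stats(pr).sort_stats("time").print_stats("to_clock_time")

)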
Expected behavior
Having multiple properties of type neo4j.time.DateTime per relationship does not cause a significant overhead when compared to other types such as str, int, or float.
Actual behavior
The function to_clock_time has a severe impact on the running time, to the point that it constitutes a bottleneck, i.e. most of the total running time is spent in to_clock_time.
Suggested Improvement
Since the slow running time is caused by the two for-loops in to_clock_time, I propose precomputing the cumulative days for each year/month, similar to what is already done with the global variables DAYS_IN_YEAR and DAYS_IN_MONTH. More specifically, I added the following two global variables:
# CUM_DAYS_IN_YEAR[y] holds the total number of days in years MIN_YEAR..y.
CUM_DAYS_IN_YEAR = [0]
for year in range(MIN_YEAR, MAX_YEAR + 1):
    CUM_DAYS_IN_YEAR.append(DAYS_IN_YEAR[year] + CUM_DAYS_IN_YEAR[year - 1])

# YEAR_TO_CUM_DAYS_IN_MONTH[y][m] holds the number of days in months 1..m of year y.
YEAR_TO_CUM_DAYS_IN_MONTH = {}
for year in range(MIN_YEAR, MAX_YEAR + 1):
    cum_days_in_month = [0]
    for month in range(1, 13):
        cum_days_in_month.append(DAYS_IN_MONTH[year, month] + cum_days_in_month[month - 1])
    YEAR_TO_CUM_DAYS_IN_MONTH[year] = cum_days_in_month
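As a quick, illustrative sanity check (not part of the patch, and assuming DAYS_IN_YEAR and DAYS_IN_MONTH keep their current dict layout), the tables can be compared against the sums the original loops would compute:

# Illustrative check: the precomputed entries must match direct summation.
year, month = 2021, 12
assert CUM_DAYS_IN_YEAR[year - 1] == sum(DAYS_IN_YEAR[y] for y in range(MIN_YEAR, year))
assert YEAR_TO_CUM_DAYS_IN_MONTH[year][month - 1] == sum(DAYS_IN_MONTH[year, m] for m in range(1, month))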
Subsequently, to_clock_time is changed to:
def to_clock_time(self):
    """Convert to :class:`.ClockTime`.

    :rtype: ClockTime
    """
    # Look up precomputed day counts instead of looping over every year/month.
    total_seconds = (CUM_DAYS_IN_YEAR[self.year - 1]
                     + YEAR_TO_CUM_DAYS_IN_MONTH[self.year][self.month - 1]) * 86400
    total_seconds += 86400 * (self.day - 1)
    seconds, nanoseconds = divmod(self.__time.ticks_ns, NANO_SECONDS)
    return ClockTime(total_seconds + seconds, nanoseconds)
This change improves the worst-case running time (i.e., a date with year == 9999 and month == 12) by a factor of roughly 2671. More importantly, after this improvement, to_clock_time is no longer a performance bottleneck when writing datetime data to the DBMS.
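For reference, the worst case can be timed with a sketch along these lines (the exact date and iteration count are illustrative):

import timeit

from neo4j.time import DateTime

# Year 9999 / month 12 makes the original year and month loops run their
# maximum number of iterations.
dt = DateTime(9999, 12, 31)
print(timeit.timeit(dt.to_clock_time, number=10_000))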
The two global variables require an additional 3.9 MB of memory. If that overhead is a concern, note that you could reduce the memory overhead by about 10.3 MB by converting DAYS_IN_YEAR to a simple list and DAYS_IN_MONTH to a dictionary of lists. Because these lists are relatively short, this will likely also be faster than retrieving the value from a dictionary. I have tested this hypothesis with the year 2022 and found a simple list faster than DAYS_IN_YEAR.
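A sketch of that optional change could look as follows (the names are placeholders, and it assumes MIN_YEAR == 1 so that year values can be used as list indices directly):

# Hypothetical list-based replacements, built here from the existing dicts.
# Index 0 is a padding entry so that e.g. DAYS_IN_YEAR_LIST[2022] works.
DAYS_IN_YEAR_LIST = [0] + [DAYS_IN_YEAR[y] for y in range(MIN_YEAR, MAX_YEAR + 1)]
DAYS_IN_MONTH_LISTS = {
    y: [0] + [DAYS_IN_MONTH[y, m] for m in range(1, 13)]
    for y in range(MIN_YEAR, MAX_YEAR + 1)
}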
TLDR:
- The function to_clock_time is so slow that it becomes a bottleneck when writing moderate amounts of relationships with neo4j.time.DateTime properties.
- I propose precomputing the cumulative days for each year/month.
- The proposed change yields a hundred- to thousandfold decrease in the running time of to_clock_time while requiring an additional 3.9 MB of memory.
- I propose an additional and optional change to reduce the overall memory overhead by 10.3 MB by replacing the global variables DAYS_IN_YEAR and DAYS_IN_MONTH with lists, which in most realistic cases (year <= 2022) also improves running time.