Skip to content

Conversation

muzarski
Copy link
Contributor

@muzarski muzarski commented Apr 16, 2025

Ref: #132
Depends on: #254. Start review from cass_error: remove misleading comment.

To enable the tests I firstly needed to do some adjustments:

  • most importantly, the adjustments to error conversions - it turns out we misused the UNABLE_TO_CONNECT error in places where LIB_NO_HOSTS_AVAILABLE should be returned. This is adjusted.
  • Integration tests logic:
    • one test expected cass_cluster_set_local_address to return LIB_HOST_RESOLUTION when unparsable ip address is provided. We decided that it should return LIB_BAD_PARAMS in such case - I adjusted the integration test case to this.
    • After error conversion adjustments, remaining tests only required to adjust the log criteria in the test - logs emitted by rust-driver are slightly different than the ones emitted by cpp-driver.

Tests

Tests enabled:

  • ConnectUsingInvalidIpAddress
  • ConnectUsingInvalidPort
  • ConnectUsingUnresolvableLocalIpAddress
  • ConnectUsingUnbindableLocalIpAddress
  • ConnectUsingValidLocalIpAddressButInvalidRemote

Tests that we keep disabled:

  • TopologyChange <- some error during ccm add command (need to investigate)
  • FullOutage <- requires cass_cluster_set_constant_reconnect
  • TerminatedUsingMultipleIoThreadsWithError <- requires cass_cluster_set_num_threads_io

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have implemented Rust unit tests for the features/changes introduced.
  • I have enabled appropriate tests in .github/workflows/build.yml in gtest_filter.
  • I have enabled appropriate tests in .github/workflows/cassandra.yml in gtest_filter.

@muzarski muzarski added this to the 0.5 milestone Apr 16, 2025
@muzarski muzarski self-assigned this Apr 16, 2025
@muzarski muzarski added CI Related to continuous integration area/testing Related to unit/integration testing labels Apr 16, 2025
@muzarski muzarski requested review from Lorak-mmk and wprzytula April 16, 2025 11:58
Copy link
Contributor

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: As we can see, rust-driver does not provide the node address
either in the log message or in the error type.
This is something we should consider adding to rust-driver.

Please open an issue about it

  • TopologyChange <- some error during ccm add command (need to investigate)

In a follow up or before merging this?

@muzarski
Copy link
Contributor Author

Note: As we can see, rust-driver does not provide the node address
either in the log message or in the error type.
This is something we should consider adding to rust-driver.

Please open an issue about it

Ok.

  • TopologyChange <- some error during ccm add command (need to investigate)

In a follow up or before merging this?

I'll open separate issue regarding this. I'm not sure how much effort this would require - is it just a fix on cpp-rust-driver side? Or maybe there is some bug in ccm. Who knows...

At the time of writing this, I confused the address translation with
DNS address resolution.

In addition, the `CASS_ERROR_LIB_HOST_RESOLUTION` in cpp-driver is only
returned from `cass_cluster_set_local_address()` when unparsable ip address
is provided.
Motivation (based on cpp-driver):
1. UNABLE_TO_CONNECT should only be returned when user tries to call
   `cass_session_connect` on a session that is already connecting, connected or closing.
   We already handle this case in `session.rs`.

```
        let mut session_guard = session_opt.write().await;
        if session_guard.is_some() {
            return Err((
                CassError::CASS_ERROR_LIB_UNABLE_TO_CONNECT,
                "Already connecting, closing, or connected".msg(),
            ));
        }
```

2. NO_HOSTS_AVAILABLE is returned in all places where driver cannot reach
   one of the nodes. This includes:
   - pool is broken
   - connection is broken
   - connection is closed
   - connection is refused
   - empty LB plan
   - user provided empty list of contact points
   ... etc.

Thanks to this change, we will also be able to enable new integration tests
that expect `NO_HOSTS_AVAILABLE` error.

As a side note: as you can see, the NO_HOST_AVAILABLE is mapped from a lot of
Rust error types/variants. I believe we should increase the granularity of
CassError enum and add more error variants. This way we will be able to
define more direct mapping between Rust and cpp-rust-driver errors.
@muzarski muzarski force-pushed the control-connection-it branch from 96463e6 to b39fa51 Compare April 16, 2025 12:37
@muzarski
Copy link
Contributor Author

Rebased on master.

Comment on lines 239 to 241
logger_.add_critera("Could not fetch metadata, error: "
"Control connection pool error: The pool is broken; "
"Last connection failed with: Invalid argument (os error 22)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: portability

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do we target Windows? If so, then we should run unit & integration tests in the CI on Windows, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we want to support it eventually: #130

@muzarski muzarski force-pushed the control-connection-it branch from b39fa51 to 92e2f7c Compare April 19, 2025 10:13
@muzarski
Copy link
Contributor Author

v1.1: Addressed @wprzytula comments.

@muzarski muzarski requested a review from wprzytula April 19, 2025 10:36
I tried to map the underlying errors as precisely as possible (as of now).

The most important one is the `MetadataError::ConnectionPoolError`
which should be mapped to `LIB_NO_HOSTS_AVAILABLE`. This is required by the test.
We decided that `cass_cluster_set_local_address()` should return
LIB_BAD_PARAMS in case user provides unparsable ip address.
Note: As we can see, rust-driver does not provide the node address
either in the log message or in the error type.

This is something we should consider adding to rust-driver.
Tests enabled:
- ConnectUsingInvalidIpAddress
- ConnectUsingInvalidPort
- ConnectUsingUnresolvableLocalIpAddress
- ConnectUsingUnbindableLocalIpAddress
- ConnectUsingValidLocalIpAddressButInvalidRemote

Tests that we keep disabled:
- TopologyChange <- some error during `ccm add` command (need to investigate)
- FullOutage <- requires cass_cluster_set_constant_reconnect
- TerminatedUsingMultipleIoThreadsWithError <- requires cass_cluster_set_num_threads_io
@muzarski muzarski force-pushed the control-connection-it branch from 92e2f7c to 5baa5d3 Compare April 19, 2025 11:25
@muzarski muzarski merged commit 7916cc0 into scylladb:master Apr 19, 2025
12 checks passed
@muzarski muzarski deleted the control-connection-it branch April 19, 2025 12:26
@wprzytula wprzytula mentioned this pull request Jun 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/testing Related to unit/integration testing CI Related to continuous integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants