Skip to content

Conversation

@dazuma
Copy link
Contributor

@dazuma dazuma commented Jun 26, 2025

This is a often-requested step due to httpclient being largely unmaintained. We've been putting it off for a long time because it induces breaking changes for any downstream dependents (such as the Google Cloud Storage client) that rely on httpclient types. However, we're now in a situation where Google Cloud Storage client feature roadmap requires HTTP capabilities not available in httpclient. So we're going to bite the bullet and do this change.

This pull request rips out httpclient and replaces it with faraday and faraday-follow_redirects, updating usage accordingly. A few public interfaces that previously exposed httpclient types now expose equivalent Faraday types, which is technically a breaking change. We will therefore release this as a semver-major (v1.0) release. Customers who depend on the httpclient types can pin the google-apis-core gem to v0.x. Note that the current direct dependents (the generated apiary clients) already declare dependencies as < 2.a so will work with either the older 0.x or newer 1.x cores.

Note: the google-cloud-storage gem is a primary dependent, and it has code that depends on an exposed httpclient object. We have therefore included in this gem a minimal patch to Faraday to provide just the interface needed by google-cloud-storage, so that it will work against both v0.x and v1.x releases.

This change has been verified against the integration tests for two major dependents: google-cloud-storage and google-cloud-bigquery.

Closes #2348
Closes #21381

@dazuma dazuma force-pushed the pr/faraday branch 6 times, most recently from e835eb8 to 12ea071 Compare July 2, 2025 18:17
@dazuma dazuma marked this pull request as ready for review July 2, 2025 18:17
@dazuma dazuma requested a review from a team as a code owner July 2, 2025 18:17
Copy link
Contributor

@bajajneha27 bajajneha27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Daniel.
I did first round of review and it looks good to me. I'm going to take a look at it again.

@viacheslav-rostovtsev
Copy link
Member

Can we avoid doing Faraday patch? Either by modifying usages in Storage to check both header and headers or by updating the usage in the same PR that pins the 1.0 version?

Btw quick search found 1(?) response.header usage in Storage...

@dazuma
Copy link
Contributor Author

dazuma commented Jul 15, 2025

Can we avoid doing Faraday patch? Either by modifying usages in Storage to check both header and headers or by updating the usage in the same PR that pins the 1.0 version?

@viacheslav-rostovtsev
It is possible, but it's a big deal because google-cloud-storage depends on google-apis-core only transitively through the google-apis-storage_v1 apiary library. So if you want different google-cloud-storage releases for the old and new core libraries, it requires embracing a full diamond dependency mitigation: You release a semver-major version of google-apis-core, and you release semver-major versions of EVERY Apiary client, depending on the new core, then you release a semver-major update to google-cloud-storage that depends on the new storage_v1 Apiary. Now we have two completely independent version trains of our entire Apiary stack: one based on the older core with httpclient, and one based on the newer core with faraday, and users must be all-in on one or the other. (We might also need to maintain the old version train for a year, per deprecation policy.)

That approach is maybe a bit safer, but is a hassle for our customers and a big hassle for us, and I've been trying not to do that sort of thing for the Ruby SDK. The current strategy tries to avoid the diamond dependency hell altogether by crafting the Core change so that downstream dependencies like google-cloud-storage are compatible with both old and new versions without modification.

That said, if we're concerned about patching Faraday, we might be able to avoid it by making our own subclass of Faraday::Response and returning that instead of Faraday::Response itself. I've been trying not to do that sort of thing in general because in the limit it suggests we wrap all of Faraday so it reproduces the old httpclient interfaces, which I think is overkill. But maybe it makes sense for this one case given that google-cloud-storage is known to use it. What do you think?

Copy link
Member

@viacheslav-rostovtsev viacheslav-rostovtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable, left couple questions

options[:headers] = { 'User-Agent' => user_agent }

Faraday.new options do |faraday|
faraday.response :logger, Google::Apis.logger if client_options.log_http_requests

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to understand what this will do and I got lost somewhere around the point where a (memoized by ConnectionOptions) RackBuilder's use_symbol method was calling lookup_middleware on the MiddlewareRegistry (via FaradayResponse) ... are we sure this works?

It really reminds me about the good OLE days.

"""
The OLE Component Object Model of the file and stream handlers uses the OLE IClassFactory interface to create instances of their object class.
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just Faraday's DSL for configuring the middleware stack. (It's similar to how other Ruby middleware systems, such as Rack itself, are configured.) This line registers the logger middleware (which is categorized as a "response" middleware) and passes it the apiary logger. The next line registers the follow_redirects_google_apis_core middleware which is a customized version of the follow_redirects middleware (see the faraday_integration.rb file).

# @param [String] response
# the response to parse.
# @return [Array<(HTTP::Message::Headers, String)>]
# @return [Array<(Hash, String)>]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array<(Hash{String, Array<(String)>},String)>, am I right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sure.


download_offset ||= (status == 206 ? @offset : 0)
download_offset += chunk.bytesize
download_offset ||= (status == 206 ? @offset : 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# HTTP 206 Partial content

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

end

@download_io.flush if @download_io.respond_to?(:flush)
@download_io.flush if @download_io.respond_to?(:flush)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And people say alignment is difficult.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on what is being aligned.

# HTTP Client
# @return [HTTPClient]
# Faraday connection
# @return [Faraday::Connection]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"docs!: before we were calling Client a connection, now we will call Connection a client"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷🏻

request_header[CONTENT_LENGTH_HEADER] = '0'
# Initiating call
response = client.put(@upload_url, header: request_header, follow_redirect: true)
response = client.put(@upload_url, "", request_header)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the params parameter is empty string here but nil below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second parameter here, is the body, which must exist for an HTTP PUT, but could be empty (as in this case where the PUT is simply a signal that content is about to start.) The second parameter in the case below, an HTTP DELETE, is params, where nil indicates no params.

@close_io_on_finish = true
true # method returns true if upload is successfully cancelled
else
logger.debug { sprintf("Failed to cancel upload session. Response: #{response.code} - #{response.body}") }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the status is already either int or nil? and in case of nil this will be an empty string which is somewhat suboptimal for logs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added .to_i so nil will render as 0.

request_header[UPLOAD_COMMAND_HEADER] = UPLOAD_COMMAND
request_header[UPLOAD_OFFSET_HEADER] = @offset.to_s
request_header[CONTENT_TYPE_HEADER] = upload_content_type
request_header[CONTENT_LENGTH_HEADER] = (upload_io.size - @offset).to_s

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content-Length is required by the Cloud Storage backend. httpclient added it automatically, but Faraday doesn't. Thank @blowmage and @quartzmo for a good set of Cloud Storage acceptance tests.


it 'should form encode parameters when method is POST and no body present' do
stub_request(:post, 'https://www.googleapis.com/zoo/animals?a=1&a=2&a=3&b=hello&c&d=0')
stub_request(:post, 'https://www.googleapis.com/zoo/animals?a=1&a=2&a=3&b=hello&c=true&d=0')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is Faraday::FlatParamsEncoder doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the FlatParamsEncoder behavior. It is technically a behavior change, but won't impact Apiary use cases.

header: request_header,
follow_redirect: true) do |res, chunk|
status = res.http_header.status_code.to_i
next unless OK_STATUS.include?(status)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OK_STATUS is unused now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. However, it wasn't private, so technically removing it would be another breaking change. I added a deprecation comment to it.

Copy link
Member

@viacheslav-rostovtsev viacheslav-rostovtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dazuma dazuma merged commit cdf6281 into googleapis:main Jul 31, 2025
10 checks passed
@dazuma dazuma deleted the pr/faraday branch August 1, 2025 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants