-
Notifications
You must be signed in to change notification settings - Fork 186
Optimize connector client configuration #4508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Optimize connector client configuration #4508
Conversation
WalkthroughTimeout fields were renamed to explicit units (connectionTimeoutMillis, readTimeoutSeconds) with updated serialization and tests. AbstractConnectorExecutor now lazily creates, caches, and closes an HTTP client; several concrete executors removed their own client caching. RemoteConnectorExecutor extends SdkAutoCloseable. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Exec as AbstractConnectorExecutor
participant Config as ConnectorClientConfig
participant Factory as MLHttpClientFactory
participant Http as SdkAsyncHttpClient
Exec->>Config: getConnectorClientConfig()
Exec->>Exec: read maxConnections, connectionTimeoutMillis, readTimeoutSeconds
alt no cached client
Exec->>Factory: buildClient(maxConnections, connectionTimeoutMillis, readTimeoutSeconds, connectorPrivateIpEnabled)
Factory-->>Http: returns SdkAsyncHttpClient
Exec->>Exec: store Http in AtomicReference
end
Exec-->>Http: return cached SdkAsyncHttpClient
Note right of Exec: close() stops and clears cached Http
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.gitignore(1 hunks)common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java(1 hunks)common/src/test/java/org/opensearch/ml/common/connector/ConnectorClientConfigTest.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: spotless
🔇 Additional comments (1)
common/src/test/java/org/opensearch/ml/common/connector/ConnectorClientConfigTest.java (1)
162-168: Test default value assertions match the current implementation constants but appear inconsistent with unit expectations.The test assertions expect
connectionTimeout=1andreadTimeout=10, which match the implementation'sCONNECTION_TIMEOUT_DEFAULT_VALUEandREAD_TIMEOUT_DEFAULT_VALUEconstants. However, all other test cases in this file explicitly use5000and3000for these fields, strongly suggesting millisecond units. If these fields are indeed millisecond-based (as the explicit test values andMLHttpClientFactory'sDurationparameters indicate), the default constants should be1000and10000respectively, and these assertions should be updated accordingly:- Assert.assertEquals(Integer.valueOf(1), config.getConnectionTimeout()); - Assert.assertEquals(Integer.valueOf(10), config.getReadTimeout()); + Assert.assertEquals(Integer.valueOf(1000), config.getConnectionTimeout()); + Assert.assertEquals(Integer.valueOf(10000), config.getReadTimeout());Verify whether the implementation defaults are being corrected as part of this change.
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
common/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInputTests.java (1)
92-92: Verify test data reflects intended timeout values.The test constructs
ConnectorClientConfig(20, 10000, 10000, 10, 10, -1, RetryBackoffPolicy.CONSTANT), which translates to:
connectionTimeoutMillis = 10000(10 seconds)readTimeoutSeconds = 10000(10,000 seconds ≈ 2.78 hours)A read timeout of 2.78 hours seems unrealistic for testing. This may have been copy-pasted from previous test data where both values were in milliseconds.
Consider updating to more realistic test values:
- ConnectorClientConfig connectorClientConfig = new ConnectorClientConfig(20, 10000, 10000, 10, 10, -1, RetryBackoffPolicy.CONSTANT); + ConnectorClientConfig connectorClientConfig = new ConnectorClientConfig(20, 10000, 30, 10, 10, -1, RetryBackoffPolicy.CONSTANT);This would set:
- Connection timeout: 10000ms (10 seconds)
- Read timeout: 30 seconds (more reasonable for testing)
🧹 Nitpick comments (1)
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java (1)
31-32: Consider adding unit suffixes to JSON field name constants for clarity.The Java field names now include unit suffixes (
connectionTimeoutMillis,readTimeoutSeconds), but the JSON field name constants (CONNECTION_TIMEOUT_FIELD,READ_TIMEOUT_FIELD) do not. This creates potential confusion for API consumers who need to know thatconnection_timeoutexpects milliseconds whileread_timeoutexpects seconds.Consider renaming to make units explicit:
- public static final String CONNECTION_TIMEOUT_FIELD = "connection_timeout"; - public static final String READ_TIMEOUT_FIELD = "read_timeout"; + public static final String CONNECTION_TIMEOUT_FIELD = "connection_timeout_millis"; + public static final String READ_TIMEOUT_FIELD = "read_timeout_seconds";Alternatively, document the expected units in comments if changing field names would break API compatibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java(7 hunks)common/src/test/java/org/opensearch/ml/common/connector/ConnectorClientConfigTest.java(9 hunks)common/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInputTests.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java(2 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AwsConnectorExecutor.java(0 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/HttpJsonConnectorExecutor.java(0 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpConnectorExecutor.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpStreamableHttpConnectorExecutor.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/streaming/HttpStreamingHandler.java(1 hunks)ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutorTest.java(1 hunks)
💤 Files with no reviewable changes (2)
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AwsConnectorExecutor.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/HttpJsonConnectorExecutor.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: spotless
🔇 Additional comments (8)
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java (2)
38-40: Verify timeout defaults align with intended behavior.The new defaults set connection timeout to 1000ms (1 second) and read timeout to 10 seconds. While this aligns with the PR objectives to prevent connection pool exhaustion, a 1-second connection timeout may be too aggressive for some network conditions.
Please verify these values have been tested against:
- Various network latency conditions (high-latency networks, VPCs across regions)
- Cold-start scenarios where remote endpoints may take longer to establish connections
- Existing deployments to ensure backward compatibility
Based on the PR objectives and past review comments, this addresses the unit ambiguity issue raised previously.
47-48: LGTM! Field renaming consistently applied.The renaming from
connectionTimeout/readTimeouttoconnectionTimeoutMillis/readTimeoutSecondsis consistently applied across:
- Field declarations
- Constructor parameters
- Builder methods
- Serialization/deserialization
- JSON output (toXContent)
- Parsing logic
This resolves the unit ambiguity identified in previous reviews by making units explicit in the field names.
Also applies to: 57-58, 65-66, 76-77, 90-91, 102-103, 123-127, 193-194
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpStreamableHttpConnectorExecutor.java (1)
77-78: LGTM! Timeout configuration correctly updated.The timeout configuration properly uses:
Duration.ofMillis()forconnectionTimeoutMillis(milliseconds)Duration.ofSeconds()forreadTimeoutSeconds(seconds)This correctly implements the new API from ConnectorClientConfig.
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpConnectorExecutor.java (1)
74-75: LGTM! Timeout configuration correctly updated.The timeout configuration properly uses:
Duration.ofMillis()forconnectionTimeoutMillis(milliseconds)Duration.ofSeconds()forreadTimeoutSeconds(seconds)This correctly implements the new API from ConnectorClientConfig.
ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutorTest.java (1)
33-34: LGTM! Tests updated to use new API.Test assertions correctly updated to validate:
getConnectionTimeoutMillis()returnsCONNECTION_TIMEOUT_DEFAULT_VALUEgetReadTimeoutSeconds()returnsREAD_TIMEOUT_DEFAULT_VALUEThis ensures the new API is properly tested.
Also applies to: 42-43
common/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInputTests.java (1)
327-328: LGTM! Assertions updated to use new API.Test assertions correctly updated to use
getReadTimeoutSeconds()andgetConnectionTimeoutMillis().ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (1)
38-50: LGTM! Thread-safe lazy HTTP client initialization.The new
getHttpClient()method properly implements lazy initialization with:
- Thread-safe
AtomicReference.compareAndSet()for concurrent access- Correct timeout conversion:
Duration.ofMillis()for connection timeout,Duration.ofSeconds()for read timeout- Integration with
MLHttpClientFactoryfor standardized client creationThe addition of
connectorPrivateIpEnabledprovides useful configuration flexibility.common/src/test/java/org/opensearch/ml/common/connector/ConnectorClientConfigTest.java (1)
27-28: LGTM! Comprehensive test coverage for refactored API.Tests thoroughly cover:
- Builder API with new method names (
connectionTimeoutMillis(),readTimeoutSeconds())- Serialization/deserialization with new getters
- JSON output validation
- Parsing with correct unit semantics
- Default value initialization
All assertions correctly validate the new timeout field semantics.
Also applies to: 58-59, 74-75, 87-88, 120-121, 150-151, 163-164
...src/main/java/org/opensearch/ml/engine/algorithms/remote/streaming/HttpStreamingHandler.java
Outdated
Show resolved
Hide resolved
86c4a29 to
9427c33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (1)
50-55: Use atomic get-and-set to prevent double-close.The current implementation allows concurrent
close()calls to retrieve the same client reference and attempt to close it multiple times.public void close() { - SdkAsyncHttpClient httpClient = httpClientRef.get(); + SdkAsyncHttpClient httpClient = httpClientRef.getAndSet(null); if (httpClient != null) { httpClient.close(); } }common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java (1)
117-141: Consider documenting the unit semantics for JSON field values.The JSON field names (
connection_timeout,read_timeout) don't indicate units, but they now have different expected units (milliseconds vs seconds). While this maintains backward compatibility, users configuring these values may expect consistent units.Consider adding Javadoc or updating user-facing documentation to clarify that
connection_timeoutexpects milliseconds andread_timeoutexpects seconds.OpenSearch ml-commons ConnectorClientConfig connection_timeout read_timeout documentation
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java(7 hunks)common/src/test/java/org/opensearch/ml/common/connector/ConnectorClientConfigTest.java(9 hunks)common/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInputTests.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AwsConnectorExecutor.java(0 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/HttpJsonConnectorExecutor.java(0 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpConnectorExecutor.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpStreamableHttpConnectorExecutor.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteConnectorExecutor.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteModel.java(1 hunks)ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/streaming/HttpStreamingHandler.java(1 hunks)ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutorTest.java(1 hunks)plugin/src/test/java/org/opensearch/ml/rest/RestMLRAGSearchProcessorIT.java(1 hunks)
💤 Files with no reviewable changes (2)
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AwsConnectorExecutor.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/HttpJsonConnectorExecutor.java
🚧 Files skipped from review as they are similar to previous changes (5)
- plugin/src/test/java/org/opensearch/ml/rest/RestMLRAGSearchProcessorIT.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/streaming/HttpStreamingHandler.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpStreamableHttpConnectorExecutor.java
- common/src/test/java/org/opensearch/ml/common/connector/ConnectorClientConfigTest.java
- common/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInputTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test MLCommons Plugin on linux (25)
- GitHub Check: Build and Test MLCommons Plugin on linux (21)
🔇 Additional comments (4)
ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutorTest.java (1)
28-44: LGTM!Test assertions correctly updated to use the renamed getters (
getConnectionTimeoutMillis()andgetReadTimeoutSeconds()), and they properly validate that default values are applied when the connector client config is null or has null field values.ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpConnectorExecutor.java (1)
74-75: LGTM!Timeout durations correctly use the new getter methods with appropriate units:
Duration.ofMillis()for connection timeout andDuration.ofSeconds()for read timeout, aligning with the renamed field semantics.ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteConnectorExecutor.java (1)
64-71: LGTM!Extending
SdkAutoCloseableis appropriate for enforcing resource lifecycle management. The Javadoc clearly documents the responsibility for closing the internal HTTP client when the executor is closed.common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java (1)
38-40: LGTM! Default values align with PR objectives.Connection timeout defaults to 1000ms (1 second) and read timeout to 10 seconds, which addresses the pool exhaustion concern when remote endpoints are slow.
...thms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java
Show resolved
Hide resolved
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteModel.java
Show resolved
Hide resolved
| private Integer connectionTimeoutMillis; | ||
| private Integer readTimeoutSeconds; | ||
| private Integer retryBackoffMillis; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use milliseconds for both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, sounds very confusing to use different way to configure these timeouts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In documentation, we specified that connection timeout is in millisecond and read timeout in second: https://docs.opensearch.org/latest/ml-commons-plugin/remote-models/blueprints/#configuration-parameters, changing this could break existing customer configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opensearch-project/documentation-website#9460
based on this issue, the documentation was changed. Let's make the values consistent. I don't have preference about seconds vs miliseconds. But keep this consistent please. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can change this to milliseconds but this could be a breaking change for existing customer connector configurations, e.g. user configuring to 3s now become 3 milliseconds which basically means all remote model calls fails, @ylwu-amzn , please also confirm on this.
| public static final Integer CONNECTION_TIMEOUT_DEFAULT_VALUE = 1000; | ||
| public static final Integer READ_TIMEOUT_DEFAULT_VALUE = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally we had 30 seconds for both of these cases. Can we get back to the original state? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide some evidence that connection/read timeout both setting to 30s is a good option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have evidence handy. But my point is this was 30s before until the bug we introduced. And to fix the bug if we change the value drastically that seems a breaking change to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not targeting for reverting code back, its purpose is to address some issues.
This is a general timeout setting given by Claude, basically we shouldn't set connection timeout 30s that big in our case.
A good user experience of the E2E latency better less than 10s based on this article: https://www.nngroup.com/articles/response-times-3-important-limits. Note that this is only an internal http call instead of E2E latency.
From retry perspective, if retry mechanism is on and if a connection can't established and we wait for 30s, let's say we have 3 retries and that's 90s, I don't think user would like wait that long.
From serverless case, SGW's read timeout is 15s so if this number is bigger than 15s then SGW timed out first.
So all above I prefer to keep the current settings, if you still have concern please give some proof that we have to do that way, thanks!
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
…e IT Signed-off-by: zane-neo <zaniu@amazon.com>
a4ef7cc to
3046388
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
common/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInputTests.java (1)
70-71: Test data inconsistency:read_timeoutvalue of 10000 is interpreted as seconds, not milliseconds.This issue was flagged in a previous review and appears unresolved. With the new API where
readTimeoutSecondsis in seconds:
- Line 70:
"read_timeout":10000means 10,000 seconds (~2.7 hours), not the intended 10 seconds- Line 92: Constructor param
10000forreadTimeoutSecondsshould be10- Line 327: Assertion expects
10000forgetReadTimeoutSeconds()which contradicts the PR goal of 10-second read timeout🔎 Proposed fix
private final String expectedInputStr = "{\"name\":\"test_connector_name\"," ... + "\"access_mode\":\"PUBLIC\",\"client_config\":{\"max_connection\":20," - + "\"connection_timeout\":10000,\"read_timeout\":10000," + + "\"connection_timeout\":10000,\"read_timeout\":10," + "\"retry_backoff_millis\":10,\"retry_timeout_seconds\":10,\"max_retry_times\":-1,\"retry_backoff_policy\":\"constant\"}}";-ConnectorClientConfig connectorClientConfig = new ConnectorClientConfig(20, 10000, 10000, 10, 10, -1, RetryBackoffPolicy.CONSTANT); +ConnectorClientConfig connectorClientConfig = new ConnectorClientConfig(20, 10000, 10, 10, 10, -1, RetryBackoffPolicy.CONSTANT);- assertEquals(10000, parsedInput.getConnectorClientConfig().getReadTimeoutSeconds().intValue()); + assertEquals(10, parsedInput.getConnectorClientConfig().getReadTimeoutSeconds().intValue());Also applies to: 92-92, 327-328
🧹 Nitpick comments (2)
plugin/src/test/java/org/opensearch/ml/rest/RestMLInferenceSearchRequestProcessorIT.java (1)
54-56: Theread_timeoutfield name is correct and 60 seconds is appropriate for the OpenAI connector.The field name aligns with the ConnectorClientConfig schema where "read_timeout" is the correct JSON serialization name. The 60-second timeout is appropriate for external API integration tests, allowing adequate time for OpenAI API responses.
Optionally, consider adding explicit
client_configtimeouts to the Bedrock connectors (bedrockEmbeddingModelConnectorEntityandbedrockMultiModalEmbeddingModelConnectorEntity) for consistency, as they also invoke external services. However, this is not required if the PR scope is limited to the OpenAI connector.ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (1)
26-26: Redundantvolatilemodifier onAtomicReference.
AtomicReferencealready provides thread-safe visibility guarantees through its internal volatile field. Thevolatilemodifier on the reference variable itself is unnecessary since the reference is never reassigned after initialization.🔎 Suggested fix
- private volatile AtomicReference<SdkAsyncHttpClient> httpClientRef = new AtomicReference<>(); + private final AtomicReference<SdkAsyncHttpClient> httpClientRef = new AtomicReference<>();
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
.gitignorecommon/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.javacommon/src/test/java/org/opensearch/ml/common/connector/ConnectorClientConfigTest.javacommon/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInputTests.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AwsConnectorExecutor.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/HttpJsonConnectorExecutor.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpConnectorExecutor.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpStreamableHttpConnectorExecutor.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteConnectorExecutor.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteModel.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/streaming/HttpStreamingHandler.javaml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutorTest.javaplugin/src/test/java/org/opensearch/ml/rest/RestMLInferenceSearchRequestProcessorIT.javaplugin/src/test/java/org/opensearch/ml/rest/RestMLRAGSearchProcessorIT.java
💤 Files with no reviewable changes (2)
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AwsConnectorExecutor.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/HttpJsonConnectorExecutor.java
🚧 Files skipped from review as they are similar to previous changes (7)
- ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutorTest.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/streaming/HttpStreamingHandler.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpConnectorExecutor.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/McpStreamableHttpConnectorExecutor.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteModel.java
- plugin/src/test/java/org/opensearch/ml/rest/RestMLRAGSearchProcessorIT.java
- ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteConnectorExecutor.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-18T16:02:36.322Z
Learnt from: akolarkunnu
Repo: opensearch-project/ml-commons PR: 3919
File: plugin/src/test/java/org/opensearch/ml/cluster/MLSyncUpCronTests.java:184-191
Timestamp: 2025-12-18T16:02:36.322Z
Learning: In tests that exercise initialization logic (e.g., initMLConfig), verify idempotence by calling the init method twice and asserting that the master key is initialized once and cached for subsequent calls. The test should confirm that repeated initializations yield the same result and no unnecessary recomputation. Apply this pattern to similar test files under plugin/src/test/java/org/opensearch/ml/cluster and other modules that initialize shared state.
Applied to files:
plugin/src/test/java/org/opensearch/ml/rest/RestMLInferenceSearchRequestProcessorIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test MLCommons Plugin on linux (21)
- GitHub Check: Build and Test MLCommons Plugin on Windows (21)
- GitHub Check: Build and Test MLCommons Plugin on Windows (25)
- GitHub Check: Build and Test MLCommons Plugin on linux (25)
🔇 Additional comments (4)
.gitignore (1)
14-14: Already addressed in prior review.This line was flagged in a previous review and marked as addressed in commits 1eb22e6 to 3046388. No new concerns identified.
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java (1)
38-48: Field renaming improves internal clarity; JSON API unit ambiguity remains a documentation concern.The renaming of
connectionTimeout→connectionTimeoutMillisandreadTimeout→readTimeoutSecondswith corresponding default values (1000ms and 10s) makes the internal API explicit about units. This aligns with the PR objective of reducing long-held connections.Note that the JSON serialization still uses
connection_timeoutandread_timeoutwithout unit suffixes (lines 31-32, 123-127), which relies on documentation to clarify the mixed units. Per the author's earlier comment, this is intentional for backward compatibility with existing configurations.ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (1)
36-56: Double-checked locking pattern correctly addresses the previous race condition.The implementation properly uses synchronization to prevent multiple HTTP client instances from being created, with a fast path for already-initialized clients. This addresses the race condition flagged in previous reviews.
common/src/test/java/org/opensearch/ml/common/connector/ConnectorClientConfigTest.java (1)
27-28: Test expectations correctly aligned with new timeout unit semantics.The tests properly use:
connectionTimeoutMillis()with millisecond values (1000, 5000)readTimeoutSeconds()with second values (3, 10)- Assertions via
getConnectionTimeoutMillis()andgetReadTimeoutSeconds()This correctly validates the renamed API and default values (1000ms connection timeout, 10s read timeout).
Also applies to: 87-88, 120-121, 163-164
...thms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java
Outdated
Show resolved
Hide resolved
...thms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (1)
27-27: Remove redundantvolatilemodifier on AtomicReference.The
volatilekeyword onAtomicReferenceis redundant becauseAtomicReferencealready provides volatile semantics for the reference it holds. The atomic operations (get, set, compareAndSet) ensure proper visibility across threads.🔎 Proposed fix
- private volatile AtomicReference<SdkAsyncHttpClient> httpClientRef = new AtomicReference<>(); + private AtomicReference<SdkAsyncHttpClient> httpClientRef = new AtomicReference<>();
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test MLCommons Plugin on Windows (21)
- GitHub Check: Build and Test MLCommons Plugin on linux (25)
- GitHub Check: Build and Test MLCommons Plugin on linux (21)
- GitHub Check: Build and Test MLCommons Plugin on Windows (25)
🔇 Additional comments (12)
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (4)
8-18: LGTM! Appropriate imports for HTTP client management.The new imports support the lazy HTTP client initialization pattern and AWS SDK v2 integration.
31-35: LGTM! Simplified initialization logic.The method now relies on the default
ConnectorClientConfiginstance initialized at declaration, which is cleaner than the previous else-branch approach.
37-69: LGTM! Thread-safe lazy initialization implemented correctly.The double-checked locking pattern is properly implemented:
- Fast path avoids synchronization overhead after initialization
- Synchronized block ensures only one thread creates the HTTP client
Optional.ofNullablewith defaults prevents NPE on null configuration values- Atomic reference setting ensures visibility
Previous concerns about race conditions and NPE have been effectively addressed.
71-76: LGTM! Proper resource cleanup with use-after-close prevention.The
getAndSet(null)approach atomically clears the reference while retrieving the client for cleanup, preventing any thread from obtaining a closed client afterclose()is invoked.common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java (8)
38-40: LGTM! Default values optimized per PR objectives.The updated defaults align with the PR's goal to prevent connection pool exhaustion:
- Connection timeout: 1000ms (1 second) - down from previous unclear default
- Read timeout: 10 seconds - maintains reasonable duration for remote calls
- Type change to
intprimitives is appropriate for constantsThese shorter timeouts reduce the risk of long-held pooled connections when remote endpoints perform poorly.
47-48: LGTM! Explicit field naming eliminates unit ambiguity.Renaming the fields to include unit suffixes (
connectionTimeoutMillis,readTimeoutSeconds) clarifies the expected units and aligns with the existing pattern used byretryBackoffMillisandretryTimeoutSeconds. This addresses the unit confusion flagged in previous reviews while maintaining backward compatibility through unchanged JSON field names.
54-71: LGTM! Builder updated consistently with field renaming.The builder parameters and assignments reflect the new field names, providing clear unit semantics at the API level.
73-86: LGTM! Deserialization updated consistently.The StreamInput constructor reads timeout values into the renamed fields, maintaining proper binary serialization compatibility.
88-96: LGTM! Default constructor initializes with optimized values.The no-arg constructor ensures all fields have sensible defaults, preventing null values and providing the optimized timeout configuration by default.
98-115: LGTM! Serialization updated consistently.The
writeTomethod serializes the renamed fields in the same order, maintaining binary protocol compatibility with the deserialization path.
117-142: LGTM! JSON serialization maintains backward compatibility.The
toXContentmethod outputs using the original JSON field names (connection_timeout,read_timeout), ensuring existing configurations and API clients continue to work. The internal field renaming doesn't break the external contract.
149-200: LGTM! Parser maintains backward compatibility while mapping to renamed fields.The
parsemethod preserves backward compatibility by:
- Reading from unchanged JSON field names (
connection_timeout,read_timeout)- Mapping parsed values to the renamed builder methods (
connectionTimeoutMillis,readTimeoutSeconds)- Relying on documented unit semantics (milliseconds for connection timeout, seconds for read timeout)
Existing configurations will continue to parse correctly, as the external JSON contract remains stable while internal naming improves clarity.
Signed-off-by: zane-neo <zaniu@amazon.com>
bd31362 to
1712802
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (1)
27-27: Optional: Remove redundantvolatilemodifier.The
volatilekeyword onhttpClientRefis unnecessary—AtomicReferencealready provides the required memory visibility and happens-before guarantees through its atomic operations. The field reference itself is never reassigned (it's effectively final), sovolatilehas no effect.🔎 Suggested simplification
- private volatile AtomicReference<SdkAsyncHttpClient> httpClientRef = new AtomicReference<>(); + private final AtomicReference<SdkAsyncHttpClient> httpClientRef = new AtomicReference<>();
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.javaml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteModel.javaml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutorTest.javaplugin/src/test/java/org/opensearch/ml/rest/RestMLInferenceSearchRequestProcessorIT.java
🚧 Files skipped from review as they are similar to previous changes (2)
- ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutorTest.java
- plugin/src/test/java/org/opensearch/ml/rest/RestMLInferenceSearchRequestProcessorIT.java
🧰 Additional context used
🧬 Code graph analysis (1)
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (2)
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java (1)
Getter(26-201)common/src/main/java/org/opensearch/ml/common/connector/AbstractConnector.java (1)
Getter(30-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test MLCommons Plugin on Windows (21)
- GitHub Check: Build and Test MLCommons Plugin on linux (21)
- GitHub Check: Build and Test MLCommons Plugin on linux (25)
- GitHub Check: Build and Test MLCommons Plugin on Windows (25)
🔇 Additional comments (5)
common/src/main/java/org/opensearch/ml/common/connector/ConnectorClientConfig.java (2)
38-43: LGTM! Timeout field renaming and defaults clarify units.The renaming from
connectionTimeout/readTimeouttoconnectionTimeoutMillis/readTimeoutSecondsexplicitly communicates units, resolving the critical ambiguity flagged in past reviews. The updated defaults (1000ms for connection establishment, 10s for read) are sensible for pooled HTTP clients and align with the PR's goal of preventing connection pool exhaustion when endpoints are slow.Also applies to: 47-48
31-32: Backwards-compatible serialization maintained.The serialization field names (
CONNECTION_TIMEOUT_FIELD,READ_TIMEOUT_FIELD) remain unchanged while internal field names and getters have been updated with unit suffixes. This preserves wire compatibility with existing configurations while improving code clarity.Also applies to: 123-127, 167-171, 193-194
ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteModel.java (1)
99-104: LGTM! Null-check prevents NPE in close().The guard ensures
close()is safe to call even if the executor was never initialized, addressing the issue flagged in previous reviews.ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AbstractConnectorExecutor.java (2)
37-69: LGTM! Thread-safe lazy initialization with proper null-safety.The double-checked locking pattern is correctly implemented:
- Fast path (lines 39-42) avoids synchronization after initialization
- Synchronized slow path (lines 44-68) ensures only one thread creates the client
Optional.ofNullable().orElse()pattern prevents NPEs when config values are null, addressing the concern from previous reviewsThe Duration conversions correctly interpret
connectionTimeoutMillisas milliseconds andreadTimeoutSecondsas seconds, aligning with the renamed getters in ConnectorClientConfig.
71-76: LGTM! Atomic close prevents use-after-close.Using
getAndSet(null)atomically clears the reference while retrieving the old client for cleanup, preventing subsequentgetHttpClient()calls from returning a closed client. This addresses the use-after-close issue flagged in previous reviews.
Description
Current connector client configuration are not optimal, the value are too big for a httpclient, if the remote endpoint performance is not good and holding the request for long time without response, then the
connectionswill be exhausted in the connection pool. We need to set a better value for them. There isn't a silver bullet configuration for all scenarios, but since we're using pooling connections, we don't want the connection waiting too long(connection usually completed in milliseconds if a server is up), so we set the default connection timeout to 1s, we can't guarantee all LLM can respond quickly so we set the read timeout to 10s.Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
Bug Fixes
New Features
Breaking Changes
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.