Support tracing RAG retrieval in Spring AI 1.x plugin by peachisai · Pull Request #808 · apache/skywalking-java

peachisai · 2026-05-31T02:57:38Z

If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.
Update the CHANGES log.

# Conflicts: # apm-sniffer/apm-sdk-plugin/spring-plugins/spring-ai-1.x-plugin/src/main/java/org/apache/skywalking/apm/plugin/spring/ai/v1/AbstractObservationVectorStoreInterceptor.java

# Conflicts: # apm-sniffer/apm-sdk-plugin/spring-plugins/spring-ai-1.x-plugin/src/main/java/org/apache/skywalking/apm/plugin/spring/ai/v1/AbstractObservationVectorStoreConstructorInterceptor.java # apm-sniffer/apm-sdk-plugin/spring-plugins/spring-ai-1.x-plugin/src/main/java/org/apache/skywalking/apm/plugin/spring/ai/v1/define/AbstractObservationVectorStoreInstrumentation.java

wu-sheng

Thanks for the contribution, and nice work — CI is green and the vector-store retrieval tracing works end-to-end (I confirmed the embedding model name is captured through the enhanced abstract-base constructor, so gen_ai.request.model is populated as expected). A few things I'd like addressed before merge:

1. Avoid reflection on private fields

AbstractObservationVectorStoreConstructorInterceptor#resolveModelFromOptionsField() reads the options/defaultOptions fields via getDeclaredField(...) + setAccessible(true). We avoid reflective access to private fields of third-party classes in plugins — it silently breaks across Spring AI versions if a field is renamed (the catch (Throwable) swallows it and the model tag just disappears, with no signal). Please resolve the model name via public API instead, e.g. EmbeddingRequest.getOptions().getModel() captured at the embed call, or via the builder's public getEmbeddingModel(). If neither is clean for a given impl, prefer leaving gen_ai.request.model unset over private-field reflection.

2. `ErrorTypeResolver` value scheme is inconsistent

resolve() mixes a fully-qualified class name (UnknownHostException.class.getName()) with custom low-cardinality tokens ("timeout", "server_certificate_invalid") and an "_OTHER" catch-all. The _OTHER default discards classification for everything that isn't one of the three special cases (connection refused, 4xx/5xx from the provider, rate-limit, auth failures, …). Please pick one scheme: keep the tokens for the cases you explicitly handle, but for the fallback use throwable.getClass().getName() instead of _OTHER so unrecognized errors stay diagnosable. (Also, UnknownHostException is currently the odd one out emitting an FQN while the rest are tokens.)

3. Span lifecycle robustness for non-`SimpleVectorStore` stores

The instrumentation matches the whole AbstractObservationVectorStore hierarchy, so the interceptor runs for every subclass. In beforeMethod, createObservationContextBuilder(...).build() and resolveDataSourceId() run before createExitSpan(...). If some store's builder throws, the agent wrapper swallows it but afterMethod still executes — and because the parent span is active, it would stopSpan() / tag the parent span instead of the (never-created) retrieval span, corrupting the parent segment. SimpleVectorStore is safe (its builder is a pure factory), but please make it robust for other stores: create the exit span first (compute dataSourceId afterward), or wrap the pre-span work in try/catch and fall back to objInst.getClass().getSimpleName().

4. Update CHANGES.md and docs

Please add a CHANGES.md entry for the new RAG / vector-store retrieval tracing, the three new GenAI tags (gen_ai.data_source.id, gen_ai.retrieval.documents, gen_ai.retrieval.query.text), and the error.type resolution, plus a short note in the spring-ai plugin docs. (No new component id or UI logo is needed — reusing the spring-ai component / id 178 is correct.)

wu-sheng

A few notes on the new vector-store retrieval tracing (inline below).

The main one is the retrieval query text being captured by default — it bypasses the plugin's existing default-off privacy posture for prompt/completion content, so a user who relies on the defaults to avoid logging user input would still get the RAG query recorded. The rest are small robustness/consistency cleanups.

Otherwise the change looks good — the abstract-base-class constructor interception for resolving the embedding model name is a nice touch.

wu-sheng · 2026-06-02T13:06:34Z

+        }
+        try {
+            if (ret instanceof List<?>) {
+                Tags.GEN_AI_RETRIEVAL_DOCUMENTS.set(ContextManager.activeSpan(), toDocumentsJson((List<?>) ret));


gen_ai.retrieval.documents is likewise emitted unconditionally, unlike COLLECT_OUTPUT_MESSAGES which gates analogous response content (default false).

Lower risk than the query text since only id + score are serialized (not document bodies), but for consistency consider a COLLECT_RETRIEVAL_DOCUMENTS = false flag. Note the JSON grows with topK and has no size cap.

wu-sheng · 2026-06-02T13:06:34Z

+
+    @Override
+    public Object afterMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, Object ret) {
+        EmbeddingResponseMetadata metadata = ((EmbeddingResponse) ret).getMetadata();


((EmbeddingResponse) ret).getMetadata() runs with no guard on ret. The agent calls afterMethod from a finally, so it also runs when call(EmbeddingRequest) threw — and then ret is null, NPE-ing on every embedding failure (caught + logged by the framework, but it spams ERROR logs).

ChatModelCallInterceptor.afterMethod guards with if (!(ret instanceof ChatResponse)) { return ret; } — suggest the same pattern here. The metadata == null check below then becomes dead, since getMetadata() never returns null.

wu-sheng · 2026-06-02T15:59:57Z

A few notes on the latest fix (commit 2219fe7e8d):

1. apm-sniffer/config/agent.config:371 — the query-length limit can't actually be configured.
ConfigInitializer derives the property key from the field name (Plugin.SpringAi.RETRIEVAL_QUERY_LENGTH_LIMIT → plugin.springai.retrieval_query_length_limit) and only binds when that exact key is present. The key here is plugin.springai.retrieval_query, which matches no field, so RETRIEVAL_QUERY_LENGTH_LIMIT stays at the hard-coded 1024 and can't be changed via agent.config or the SW_... env var. Suggest renaming to:

plugin.springai.retrieval_query_length_limit=${SW_PLUGIN_SPRINGAI_RETRIEVAL_QUERY_LENGTH_LIMIT:1024}

(The sibling collect_retrieval_query / collect_retrieval_documents keys are fine.)

2. SpringAiPluginConfig.java:88 — nit: double space before = in COLLECT_RETRIEVAL_DOCUMENTS = false.

3. AbstractObservationVectorStoreInterceptor.java:82-86 — nit: the truncation guard limit > 0 means a value of 0 also collects the full query, i.e. 0 behaves the same as the documented negative = no limit. Fine if intentional (it's consistent with the chat interceptors), but 0 reads like it should mean "collect nothing".

peachisai added 16 commits May 30, 2026 08:59

Support tracing RAG retrieval in Spring AI 1.x plugin

dd0fc35

Add Spring AI error type resolver

1347195

Add Spring AI vector store instrumentation

81c3b2f

Add Spring AI vector store instrumentation

75d09e6

Fix

f33622d

Merge branch 'dev' of github.com:peachisai/skywalking-java into dev

8768a9d

# Conflicts: # apm-sniffer/apm-sdk-plugin/spring-plugins/spring-ai-1.x-plugin/src/main/java/org/apache/skywalking/apm/plugin/spring/ai/v1/AbstractObservationVectorStoreInterceptor.java

Merge branch 'dev' of github.com:peachisai/skywalking-java into dev

3cc5e13

# Conflicts: # apm-sniffer/apm-sdk-plugin/spring-plugins/spring-ai-1.x-plugin/src/main/java/org/apache/skywalking/apm/plugin/spring/ai/v1/AbstractObservationVectorStoreInterceptor.java

Use fallback error type for Spring AI errors

7268543

Cache Spring AI vector store model name

fa23baf

Fix

591427b

Merge branch 'dev' of github.com:peachisai/skywalking-java into dev

4682bd7

# Conflicts: # apm-sniffer/apm-sdk-plugin/spring-plugins/spring-ai-1.x-plugin/src/main/java/org/apache/skywalking/apm/plugin/spring/ai/v1/AbstractObservationVectorStoreInterceptor.java

Fix

77b4e81

Fix

d1dd31d

Support tracing RAG retrieval in Spring AI 1.x plugin

a9843a7

Fix

d772141

wu-sheng added this to the 9.7.0 milestone May 31, 2026

wu-sheng added the enhancement New feature or request label May 31, 2026

Fix e2e test

3b1acdc

wu-sheng reviewed Jun 1, 2026

View reviewed changes

peachisai added 5 commits June 1, 2026 21:59

Fix suggestions

e7923be

Fix suggestions

c852d8e

add plugin definition

a453e4e

Fix EmbeddingModelInterceptor

aeaf35a

remove redundant codes

e928170

peachisai requested a review from wu-sheng June 2, 2026 12:33

wu-sheng reviewed Jun 2, 2026

View reviewed changes

Fix suggestions

2219fe7

remove space

9874bd4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support tracing RAG retrieval in Spring AI 1.x plugin#808

Support tracing RAG retrieval in Spring AI 1.x plugin#808
peachisai wants to merge 24 commits into
apache:mainfrom
peachisai:uat

peachisai commented May 31, 2026 •

edited

Loading

Uh oh!

wu-sheng left a comment

Uh oh!

wu-sheng left a comment

Uh oh!

Uh oh!

wu-sheng Jun 2, 2026

Uh oh!

wu-sheng Jun 2, 2026

Uh oh!

Uh oh!

Uh oh!

wu-sheng commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

peachisai commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wu-sheng left a comment

Choose a reason for hiding this comment

1. Avoid reflection on private fields

2. ErrorTypeResolver value scheme is inconsistent

3. Span lifecycle robustness for non-SimpleVectorStore stores

4. Update CHANGES.md and docs

Uh oh!

wu-sheng left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wu-sheng Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

wu-sheng Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wu-sheng commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

peachisai commented May 31, 2026 •

edited

Loading

2. `ErrorTypeResolver` value scheme is inconsistent

3. Span lifecycle robustness for non-`SimpleVectorStore` stores