REL-1207540-telemetry-volume-reduction & REL-1224050-Retension-Policy-Guidelines #44
base: REL-1238635-cumulative-cum-folder-hierarchy-change
…. Collapsed multiple tables into a single Elastic Stack infrastructure table to describe the infrastructure recommendations using the Server 2025 GA EW production certification results.
| | Kibana | 1 | 4 | | ||
| | APM Server | 1 | 4 | | ||
|
|
||
| | Environment Size | Web Servers | Agent Servers | Worker Servers | SQL Distributed Servers | |
@scott-parillo These numbers do not match the ones mentioned in the Slack thread (https://kcura-pd.slack.com/archives/C0616SVFYBU/p1764962920595329?thread_ts=1764959199.337929&cid=C0616SVFYBU).
They also do not match our initial table here.
May I know whether these numbers are intentional, or do I have to make some changes?
| **A few other key notes and reminders:** | ||
|
|
||
| - **Tuning for speed** – Review Elastic’s guidance on how to tune the environment for speed [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html). | ||
| - **Hosting Elastic** – While the guidance below recommends installing the Elastic components on many dedicated servers, there are no hard requirements to isolate Elasticsearch, Kibana, or APM Server on dedicated hosts. As evident with the Development environment specifications, the full Elastic stack can be deployed on a single host if that server can meet the storage needs. |
Removed as per the feedback: https://kcura-pd.slack.com/archives/C0616SVFYBU/p1764987926224259?thread_ts=1764987546.162489&cid=C0616SVFYBU
| | **Processing** | **+450% faster** | Processing performance has improved dramatically, delivering a 450% speed increase that will noticeably accelerate end-to-end workflows. | | ||
| | **Review (Conversion)** | **+5% faster** | Review operations saw a modest 5% improvement, providing slightly faster document conversion without any workflow disruption. | | ||
| | **Imaging & Production** | **Stable (±4%)** | Imaging and production performance remained stable, with changes within a ±4% range, resulting in no meaningful impact to customer workflows. | | ||
| | **Data Transfer** | **Mixed results** | Native file operations improved by 4–38%, offering smoother import/export performance. Image-based workflows saw some declines—most notably a 157% slowdown in RIP image export—which may impact image-heavy projects. | |
Note: Here we are disclosing a 157% slowdown, which might give clients a negative impression.
I believe the large performance increases and decreases have yet to be vetted by the teams. Honestly, they defy logic, but I don't want to discount them completely. I would advise removing any such results (positive or negative) until this has been thoroughly investigated and a clear conclusion drawn. As currently worded, this sounds concerning, but the results indicate otherwise.
Removed the info for now, until the other team reaches a conclusion.
966d848
|
|
||
| ## Conclusion | ||
|
|
||
| Environment Watch delivers significant performance improvements for processing workloads while maintaining stable performance for most other Relativity operations. Organizations with heavy image-based data transfer workflows should evaluate their specific use cases to ensure alignment with their performance requirements. |
Note: Depending on the decision about including the RIP image slowdown info, the last line needs to be modified or removed.
Removed the info for now, until the other team reaches a conclusion.
966d848
Rahiman-Nadaf
left a comment
Reviewed the retention policy guidelines; they look good to me.
docs/environment-watch/post-install-verification/retention-policy.md (three outdated, resolved threads)
DaRealRahul1
left a comment
I have completed reviewing the document. Could you please check the document below and make the necessary changes accordingly.
https://jira.kcura.com/secure/attachment/758132/EW%20Review.docx
Thanks @DaRealRahul1, I have addressed all your feedback. Please find below the list of items and the action taken for each.
Please review and approve the PR.
As discussed with @KarunaDhawan, I removed the performance impact info. This PR will remain blocked until the teams reach a conclusion on that information.
|
|
||
| These guidelines define retention policies for logs, metrics, and traces collected in Elasticsearch and viewed through Kibana. Proper retention management is critical for: | ||
|
|
||
| - **Storage Optimization** – Prevents excessive disk usage by automatically removing outdated data |
Storage and cost are tied together, so it would be best to combine them here:
Storage Optimization & Cost Control – Prevents excessive disk usage and reduces infrastructure costs by automatically removing outdated data
Updated.
| ``` | ||
| Docs/Day (Daily Documents) = 6M + (Web_Server_Count × 2M) + (Agent_Server_Count × 2M) + (Worker_Server_Count × 400k) + (SQL_Distributed_Server_Count × 500k) | ||
| GiB/Day (Daily Storage) = Docs/Day × 380 / 1024³ |
Where does this 380 come from? Perhaps we should explain what that number is.
Sure, updated the same.
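For anyone checking the numbers, the sizing formula quoted in the diff above can be sketched in Python as follows. This is an illustration only, not part of the PR: the server counts in the usage example are made up, and reading 380 as an average document size in bytes is an assumption inferred from the division by 1024³.

```python
# Sketch of the Docs/Day and GiB/Day formula quoted in the diff.
# Assumption: 380 is the average size of one document in bytes
# (the thread asks for this constant to be explained; it is not confirmed here).
BYTES_PER_DOC = 380
BYTES_PER_GIB = 1024 ** 3


def docs_per_day(web: int, agent: int, worker: int, sql_distributed: int) -> int:
    """Estimated documents ingested per day for the given server counts."""
    return (
        6_000_000                    # fixed baseline
        + web * 2_000_000            # per web server
        + agent * 2_000_000          # per agent server
        + worker * 400_000           # per worker server
        + sql_distributed * 500_000  # per SQL distributed server
    )


def gib_per_day(docs: int) -> float:
    """Estimated daily storage in GiB for a given daily document count."""
    return docs * BYTES_PER_DOC / BYTES_PER_GIB


if __name__ == "__main__":
    # Hypothetical environment: 2 web, 2 agent, 4 worker, 1 SQL distributed server.
    docs = docs_per_day(web=2, agent=2, worker=4, sql_distributed=1)
    print(f"{docs:,} docs/day, {gib_per_day(docs):.2f} GiB/day")
```

Keeping the constant in a named variable also gives the documentation a natural place to state what the number means once that is confirmed.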
| ### Step 3: Delete Existing Data Streams (Setup Time Only) | ||
|
|
||
| > [!WARNING] | ||
| > This step should only be performed once during initial setup. Deleting data streams will permanently remove all data and indices under those data streams. |
The data stream deletion step is destructive. Can we make the warning more prominent or clarify when it is safe to run?
Sure, Updated the same.
| **Sample Request:** | ||
|
|
||
| ``` | ||
| # Here logs-apm.app@template is the name of the index template |
The Logs, Metrics, and Traces sections repeat the same workflow. Can we describe the pattern once and only show full JSON once?
Initially, I had kept it the same way. However, during SQE verification, I received feedback that they felt stuck or confused and requested that all three be shown separately, so users can follow more smoothly. However, please let me know if you still prefer combining them into a single section.
@dinesh1010101 If compressing into one JSON causes confusion, perhaps we can keep it how it is set up currently, thanks
| ### Purpose | ||
|
|
||
| These guidelines define retention policies for logs, metrics, and traces collected in Elasticsearch and viewed through Kibana. Proper retention management is critical for: | ||
|
|
@dinesh1010101 Please add EW context clarifying that configuring Elasticsearch retention policies is optional. Environment Watch operates correctly using default retention settings, and this configuration should be applied only when customers need to customize retention behavior. This helps prevent readers from assuming the configuration is required.
Note text could be something like this: Configuring Elasticsearch retention policies is optional. Environment Watch works out of the box using default retention settings. The configurations described here should be applied only if you need to customize how long data is retained to align with your organization’s storage, performance, or compliance requirements.
| > - You are performing **initial setup** and no production data exists yet | ||
| > | ||
| > **Do NOT run this on production systems with active data.** | ||
Thanks for adding the note, but it still reads as though this step is required of users. Please update it to explicitly mark the step as optional and intended only for initial setup or controlled scenarios. For example:
Step 3: Delete Existing Data Streams (Setup Time Only)
This step is optional and is not required for most Environment Watch deployments. It should only be performed during initial setup or in controlled, non-production scenarios.
This step will permanently delete all data and indices in the specified data streams. There is no recovery. Only proceed if:
- You are in a development or non-production environment, OR
- You have backed up all critical data from these data streams, OR
- You are performing initial setup and no production data exists yet
Do NOT run this on production systems with active data.
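If this step is ever scripted, the same conditions could be enforced in code rather than only in prose. Below is a minimal sketch, assuming a hypothetical helper name and a hypothetical data stream name; it only builds the request and does not send it. The `DELETE /_data_stream/<name>` path is Elasticsearch's delete data stream API.

```python
from urllib.parse import quote


def build_delete_request(
    data_stream: str,
    *,
    non_production: bool = False,
    data_backed_up: bool = False,
    initial_setup_no_data: bool = False,
) -> tuple[str, str]:
    """Return (method, path) for deleting a data stream, or raise if unsafe."""
    # Mirrors the three "only proceed if" conditions from the suggested warning.
    if not (non_production or data_backed_up or initial_setup_no_data):
        raise PermissionError(
            f"Refusing to delete {data_stream!r}: "
            "none of the safe conditions is confirmed."
        )
    # DELETE /_data_stream/<name> removes the stream and its backing indices.
    return "DELETE", f"/_data_stream/{quote(data_stream, safe='')}"


if __name__ == "__main__":
    # "logs-apm.app-default" is a hypothetical data stream name.
    print(build_delete_request("logs-apm.app-default", initial_setup_no_data=True))
```

Requiring an explicit keyword argument makes the operator state which safe condition applies before anything destructive is constructed.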
| } | ||
| ``` | ||
|
|
||
| > [!IMPORTANT] |
This note is great, but it would be more helpful to place it right before the index template update steps.
|
|
||
| ## Overview | ||
|
|
||
| This document provides transparent information about the performance overhead Environment Watch introduces to standard Relativity workloads, based on comprehensive testing in a production-like environment. |
Please update this to: This document provides transparent information about the performance overhead Environment Watch introduces to standard Relativity workloads, based on testing conducted in a controlled, production-like environment. Actual performance may vary depending on workload characteristics, environment size, infrastructure configuration, and usage patterns.
| | **Review (Conversion)** | **+5% faster** | Review operations saw a modest 5% improvement, providing slightly faster document conversion without any workflow disruption. | | ||
| | **Imaging & Production** | **Stable (±4%)** | Imaging and production performance remained stable, with changes within a ±4% range, resulting in no meaningful impact to customer workflows. | | ||
| | **Data Transfer** | **~5-6% faster on average** | Data transfer operations showed performance improvements with Imports demonstrating ~10% faster performance on average, while Exports (excluding RIP Images Export) were ~1% faster on average, resulting in an approximate 5-6% overall improvement. | | ||
|
|
Please add a note before the table: The results below reflect observed outcomes from internal testing and are provided for transparency. These results should not be interpreted as guaranteed performance improvements for all Environment Watch deployments.
|
|
||
| ## Conclusion | ||
|
|
||
| Environment Watch has demonstrated minimal to positive impact on Relativity workloads across comprehensive testing. Most operations showed performance improvements, with Processing, Data Transfer, and Review all performing faster. Imaging and Production workflows remained stable. These results confirm that Environment Watch provides valuable observability and monitoring capabilities without compromising your Relativity system's performance. |
Please update this to something like the following, since the results may vary across environments: Environment Watch has demonstrated minimal to positive impact on Relativity workloads based on comprehensive testing in a controlled, production-like environment. Most operations showed performance improvements, with Processing, Data Transfer, and Review performing faster, while Imaging and Production workflows remained stable. Environment Watch is designed to deliver observability and monitoring capabilities with minimal overhead; however, actual performance results may vary based on customer-specific configurations, environment size, and workload characteristics.
Summary
Test Evidence for Retention Policy Guidelines - SQE: #44 (review)
Test Evidence for Retention Policy Guidelines - DEV: 12_16_2025_Retention_Guideline_Dev_Evidence.docx