Streamlining Jaeger Data Management in OpenSearch: A DevSecOps How-To

8085-01-08

Introduction: Taming the Data Tsunami

Hey there,

In the spirit of making our lives a bit easier, let’s tackle a common challenge: efficiently managing Jaeger indices in OpenSearch. We all love Jaeger for its robust tracing capabilities, but as our applications grow, so does the volume of data. I’m here to share a practical, step-by-step guide, complete with code snippets and API requests, to help you streamline this process.

The Challenge: Daily Indices Overload

Jaeger’s default setup writes to a fresh set of indices every day, which quickly clutters an OpenSearch environment. It’s like having a closet packed with clothes you never wear - it just doesn’t make sense.

Our Goal: Efficient Index Lifecycle Management

We aim to implement a lifecycle policy for Jaeger indices in OpenSearch, setting clear rules for data retention and index rollover. Think of it as giving your data a well-organized home with a clear decluttering strategy.

Creating a Lifecycle Policy

Important Note: Jaeger’s support for OpenSearch as a backend storage option was introduced in version 1.53. Earlier versions of Jaeger do not support OpenSearch as a backend.

Why a Lifecycle Policy?

A lifecycle policy helps us manage data growth and storage cost effectively. It’s like setting up rules in a game - it keeps everything fair and orderly.

Implementation Steps

Creating the policy takes two steps: define the policy as a JSON document, then send it to the ISM policy API (covered under Applying the Policy). The JSON below is the policy used in the rest of this guide.

Example:

{
  "policy": {
    "description": "Jaeger index management policy",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "50gb",
              "min_index_age": "30d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": {
              "min_index_age": "30d"
            }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [],
        "transitions": [
          {
            "state_name": "cold",
            "conditions": {
              "min_index_age": "60d"
            }
          }
        ]
      },
      {
        "name": "cold",
        "actions": [],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "90d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ]
      }
    ],
    "ism_template": [
      {
        "index_patterns": [
          "*jaeger-span-*"
        ],
        "priority": 30
      },
      {
        "index_patterns": [
          "*jaeger-service-*"
        ],
        "priority": 30
      }
    ]
  }
}

The policy in this example manages the full lifecycle of the Jaeger indices matching *jaeger-span-* and *jaeger-service-*. Let’s break down its key components:

Policy Overview

Description: The policy is described as “Jaeger index management policy”, clearly indicating its purpose.
Default State: The default state is set to “hot”, which is typically the starting point for new indices in a lifecycle policy.

States in the Policy

Hot State:
- Rollover Action: The index rolls over once its primary shards reach 50 GB or it is 30 days old, whichever comes first.
- Transition to Warm State: Once the rollover has completed and the index is at least 30 days old, it moves from the hot state to the warm state.
Warm State:
- This state doesn’t specify any actions, but it serves as an intermediate phase between the hot and cold states.
- Transition to Cold State: The transition to the cold state is set to occur when the index reaches 60 days of age.
Cold State:
- Similar to the warm state, no specific actions are defined in the cold state.
- Transition to Delete State: The policy transitions the index to the delete state when it becomes 90 days old.
Delete State:
- Deletion Action: In this final state, the policy executes a deletion action, effectively removing the index once it reaches the specified age.
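
To make the age thresholds concrete, here is a small shell sketch that maps an index age in days to the state this policy would place it in. The function name state_for_age is made up for illustration. The sketch looks only at the min_index_age conditions: it ignores the rollover action, which ISM has to complete before it evaluates the hot-to-warm transition, and it ignores the delay between ISM's periodic job runs.

```shell
#!/bin/sh
# Hypothetical helper: maps an index age (whole days since creation) to the
# state the example policy would reach, using only its min_index_age values.
state_for_age() {
  age_days=$1
  if [ "$age_days" -ge 90 ]; then
    echo "delete"
  elif [ "$age_days" -ge 60 ]; then
    echo "cold"
  elif [ "$age_days" -ge 30 ]; then
    echo "warm"
  else
    echo "hot"
  fi
}

# Print the expected state for a few sample ages.
for age in 0 29 30 75 90; do
  echo "$age days: $(state_for_age "$age")"
done
```

Changing a threshold in the policy means changing the matching number here, which makes it easy to sanity-check a retention plan before touching the cluster.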

ISM Templates

The policy includes ISM templates for *jaeger-span-* and *jaeger-service-* index patterns, both with a priority of 30. These templates ensure that the policy is automatically applied to any new indices matching these patterns.

Applying the Policy

To apply this policy in OpenSearch, you would typically send a PUT request to the ISM policy API endpoint. Here’s an example of how you can do it using a command line tool like curl:

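# Assumes the policy JSON above is saved as jaeger-lifecycle-policy.json (the file name is illustrative)
curl -X PUT "http://your-opensearch-cluster:9200/_plugins/_ism/policies/jaeger-lifecycle-policy" -H 'Content-Type: application/json' -d @jaeger-lifecycle-policy.json

Replace your-opensearch-cluster with your OpenSearch cluster’s address (use https and credentials if security is enabled) and make sure the file holds the policy shown above. Clusters still running Open Distro expose the same API under _opendistro/_ism instead of _plugins/_ism.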
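
Once the PUT returns, it is worth reading the policy back to confirm what the cluster actually stored. The sketch below wraps that check in a function; it uses the _plugins/_ism path of the ISM API, and both the OS_URL variable and the get_policy name are assumptions made for this example.

```shell
#!/bin/sh
# Assumption: OS_URL points at your cluster; adjust scheme, host and auth.
OS_URL="${OS_URL:-http://your-opensearch-cluster:9200}"

# Read the stored policy back from the ISM policy API.
# $1 is the policy ID, for example "jaeger-lifecycle-policy".
get_policy() {
  curl -s -X GET "$OS_URL/_plugins/_ism/policies/$1?pretty"
}

# Usage (against a real cluster):
#   get_policy jaeger-lifecycle-policy
```

The response should include the states and the ism_template patterns from the policy you submitted.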

Crafting Index Templates

The Role of Index Templates

Index templates in OpenSearch are like our data management blueprints. They ensure that each new index created for our Jaeger data adheres to our predefined settings and mappings. More importantly, they automatically apply our lifecycle policy to these indices, keeping our data shipshape.

Setting Up Templates

Getting our index templates right is crucial for smooth sailing. Think of it as custom-tailoring your suit – it needs to fit just right. We’re going to create two templates: one for our jaeger-span-* indices and another for jaeger-service-*. These templates will include all the necessary settings, mappings, and a direct line to our lifecycle policy.

Let’s roll up our sleeves and dive into the setup:

# Create Jaeger Span Index Template
# The template name comes from the URL path, so the body carries no "name" field.
curl -X PUT "http://your-opensearch-cluster:9200/_index_template/jaeger-span-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": [
    "*jaeger-span-*"
  ],
  "template": {
    "aliases": {
      "jaeger-span-read": {}
    },
    "settings": {
      "index.mapping.nested_fields.limit": "50",
      "index.plugins.index_state_management.policy_id": "jaeger-lifecycle-policy",
      "index.plugins.index_state_management.rollover_alias": "jaeger-span-write",
      "index.requests.cache.enable": "true",
      "index.number_of_shards": "2",
      "index.number_of_replicas": "2"
    },
    "mappings": {
      "dynamic_templates": [
        {
          "span_tags_map": {
            "path_match": "tag.*",
            "mapping": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        {
          "process_tags_map": {
            "path_match": "process.tag.*",
            "mapping": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        }
      ],
      "properties": {
        "traceID": {
          "ignore_above": 256,
          "type": "keyword"
        },
        "process": {
          "properties": {
            "tag": {
              "type": "object"
            },
            "serviceName": {
              "ignore_above": 256,
              "type": "keyword"
            },
            "tags": {
              "dynamic": false,
              "type": "nested",
              "properties": {
                "tagType": {
                  "ignore_above": 256,
                  "type": "keyword"
                },
                "value": {
                  "ignore_above": 256,
                  "type": "keyword"
                },
                "key": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              }
            }
          }
        },
        "startTimeMillis": {
          "format": "epoch_millis",
          "type": "date"
        },
        "references": {
          "dynamic": false,
          "type": "nested",
          "properties": {
            "traceID": {
              "ignore_above": 256,
              "type": "keyword"
            },
            "spanID": {
              "ignore_above": 256,
              "type": "keyword"
            },
            "refType": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "flags": {
          "type": "integer"
        },
        "operationName": {
          "ignore_above": 256,
          "type": "keyword"
        },
        "parentSpanID": {
          "ignore_above": 256,
          "type": "keyword"
        },
        "tags": {
          "dynamic": false,
          "type": "nested",
          "properties": {
            "tagType": {
              "ignore_above": 256,
              "type": "keyword"
            },
            "value": {
              "ignore_above": 256,
              "type": "keyword"
            },
            "key": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "spanID": {
          "ignore_above": 256,
          "type": "keyword"
        },
        "duration": {
          "type": "long"
        },
        "startTime": {
          "type": "long"
        },
        "tag": {
          "type": "object"
        },
        "logs": {
          "dynamic": false,
          "type": "nested",
          "properties": {
            "fields": {
              "dynamic": false,
              "type": "nested",
              "properties": {
                "tagType": {
                  "ignore_above": 256,
                  "type": "keyword"
                },
                "value": {
                  "ignore_above": 256,
                  "type": "keyword"
                },
                "key": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              }
            },
            "timestamp": {
              "type": "long"
            }
          }
        }
      }
    }
  },
  "composed_of": [],
  "priority": 30
}'

And for the jaeger-service-* indices, the process is similar. Just tweak the pattern and alias accordingly:

# Create Jaeger Service Index Template
# The template name comes from the URL path, so the body carries no "name" field.
curl -X PUT "http://your-opensearch-cluster:9200/_index_template/jaeger-service-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": [
    "*jaeger-service-*"
  ],
  "template": {
    "aliases": {
      "jaeger-service-read": {}
    },
    "settings": {
      "index.mapping.nested_fields.limit": "50",
      "index.plugins.index_state_management.policy_id": "jaeger-lifecycle-policy",
      "index.plugins.index_state_management.rollover_alias": "jaeger-service-write",
      "index.requests.cache.enable": "true",
      "index.number_of_shards": "2",
      "index.number_of_replicas": "2"
    },
    "mappings": {
      "dynamic_templates": [
        {
          "span_tags_map": {
            "path_match": "tag.*",
            "mapping": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        {
          "process_tags_map": {
            "path_match": "process.tag.*",
            "mapping": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        }
      ],
      "properties": {
        "operationName": {
          "ignore_above": 256,
          "type": "keyword"
        },
        "serviceName": {
          "ignore_above": 256,
          "type": "keyword"
        }
      }
    }
  },
  "composed_of": [],
  "priority": 31
}'

In these commands, we’re telling OpenSearch:

Which Indices to Apply To: Our patterns *jaeger-span-* and *jaeger-service-* make sure the right indices get the right rules.
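How to Handle the Data: By linking our jaeger-lifecycle-policy, we’re automating the data’s lifecycle management. The policy’s own ism_template block already attaches it to new indices that match, and as far as I can tell the OpenSearch documentation lists the policy_id setting as deprecated, so treat that setting as a backup reference and check how your version handles it.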
The Rollover Process: The rollover_alias sets up our indices for the seamless transition during rollovers.

And there we have it – our index templates are set, ready to keep our Jaeger data organized and under control.
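
Before any data arrives, you can check that the templates resolve the way you expect. The sketch below defines two helpers: one reads a stored template back, the other asks the simulate-index endpoint what a new index with a given name would receive. The OS_URL variable and both function names are assumptions for this example, and the simulate endpoint is assumed to be available in your OpenSearch version.

```shell
#!/bin/sh
# Assumption: OS_URL points at your cluster; adjust scheme, host and auth.
OS_URL="${OS_URL:-http://your-opensearch-cluster:9200}"

# Fetch a stored index template by name to confirm the PUT succeeded.
get_template() {
  curl -s -X GET "$OS_URL/_index_template/$1?pretty"
}

# Ask which settings, mappings and aliases a new index with this name
# would receive from the matching templates, without creating it.
simulate_index() {
  curl -s -X POST "$OS_URL/_index_template/_simulate_index/$1?pretty"
}

# Usage (against a real cluster):
#   get_template jaeger-span-template
#   simulate_index jaeger-span-000002
```

In the simulated output, look for the rollover_alias setting and the read alias; if either is missing, the index pattern or template priority is the first thing to check.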

Please note: To access the latest mappings for jaeger-span and jaeger-service, refer to the following link.

Additional Considerations

Ensure the mappings in your template are correctly defined for the Jaeger data structure.
Set appropriate index settings, such as the number of shards and replicas, based on your cluster’s capacity and performance requirements.
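
A quick bit of arithmetic helps when choosing those numbers. The sketch below uses the values from this guide (2 primary shards, 2 replicas, rollover at 50 GB) and assumes, as I read the ISM documentation, that the rollover min_size counts primary shard storage only. The function names are made up for illustration.

```shell
#!/bin/sh
# Values taken from the templates and policy in this guide.
PRIMARY_SHARDS=2
REPLICAS=2
ROLLOVER_PRIMARY_GB=50

# Total shard copies per index: each primary plus its replicas.
shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

# Approximate on-disk size in GB when an index rolls over on size:
# primary data plus one full copy per replica.
disk_at_rollover_gb() {
  echo $(( $1 * (1 + $2) ))
}

echo "shard copies per index: $(shard_copies "$PRIMARY_SHARDS" "$REPLICAS")"
echo "disk per full index (GB): $(disk_at_rollover_gb "$ROLLOVER_PRIMARY_GB" "$REPLICAS")"
```

With these settings each index carries 6 shard copies and roughly 150 GB on disk when it rolls over on size. Two replicas also need at least three data nodes, because a replica is never allocated to the node that holds its primary.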

The First Index and Rollover Magic

Starting Point: The First Index

Creating the first index manually is like laying the foundation of a building. It’s essential for the whole structure.

Rollover in Action

Now, let’s get our hands dirty and set the stage for the rollover magic. The first step is to manually create the initial index. This isn’t just any index – it’s the cornerstone upon which our rollover strategy is built. We’ll name it in a way that fits into our sequential naming pattern, like jaeger-span-000001.

The key here is to set this index up with a rollover alias. This alias acts as a pointer, telling OpenSearch where to write data and which index to replace when the current one fills up or ages out. Jaeger also has to write through that alias instead of creating daily indices; in Jaeger 1.x that is what the --es.use-aliases=true option is for (check the option name against your Jaeger version). Let’s walk through the command to make this happen:

# Add any additional settings you need to the JSON body
curl -X PUT "http://your-opensearch-cluster:9200/jaeger-span-000001" -H 'Content-Type: application/json' -d'
{
  "aliases": {
    "jaeger-span-write": {
      "is_write_index": true
    }
  }
}'

In this command, we’re doing a few critical things:

Creating the Index: jaeger-span-000001 is our first index in the series.
Setting the Alias: We assign jaeger-span-write as our rollover alias.
Marking as Write Index: By setting "is_write_index": true, we’re telling OpenSearch that this is our current writing destination.

Once this index is in place and starts collecting data, OpenSearch will keep an eye on it. When our index hits the size or age limits defined in our lifecycle policy, OpenSearch will seamlessly roll over to a new index, say jaeger-span-000002, and update the alias accordingly. It’s like an efficient assembly line, but for data!
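
The jaeger-service-* indices need the same starting point, and it helps to see what ISM is doing with an index once it exists. The sketch below generalizes the first-index request to both index families and adds a call to the ISM explain API. The OS_URL variable and the function names are assumptions for this example.

```shell
#!/bin/sh
# Assumption: OS_URL points at your cluster; adjust scheme, host and auth.
OS_URL="${OS_URL:-http://your-opensearch-cluster:9200}"

# Create the first index of a series and mark it as the write index.
# $1 is the index family: "span" or "service".
create_first_index() {
  curl -s -X PUT "$OS_URL/jaeger-$1-000001" \
    -H 'Content-Type: application/json' \
    -d "{\"aliases\": {\"jaeger-$1-write\": {\"is_write_index\": true}}}"
}

# Show the policy, current state and last action ISM reports for an index.
# $1 is an index name or pattern.
explain_index() {
  curl -s -X GET "$OS_URL/_plugins/_ism/explain/$1?pretty"
}

# Usage (against a real cluster):
#   create_first_index span
#   create_first_index service
#   explain_index "jaeger-span-*"
```

If the explain output shows the index sitting in the hot state and attempting rollover, the policy, template and alias are wired together correctly.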

Conclusion: The Perks of Being Organized

Wrapping up, this method not only streamlines data management but also aligns with the best practices of storage optimization in DevSecOps. It’s like having a self-cleaning kitchen - less mess, more efficiency.

Final Thoughts

Implementing this lifecycle management strategy for Jaeger data in OpenSearch is a game-changer. It’s about making our systems more efficient, cost-effective, and manageable.