Tanzu Observability (formerly known as VMware Aria Operations for Applications) collects internal metrics that are used extensively in the different dashboards of the Tanzu Observability Usage integration.

You can:

  • Clone and modify one of the Tanzu Observability Usage integration dashboards.
  • Create your own dashboard, query these metrics in charts, and create alerts for some of these metrics.
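
For example, a minimal sketch of a chart query over one of these internal metrics (the metric name comes from the list of persistent metrics below; substitute the metric you care about):

    ts(~collector.points.reported)

The same expression combined with a threshold, such as sum(ts(~collector.points.reported)) > 1000000, can serve as an alert condition; the threshold here is purely illustrative.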

Most internal metrics are ephemeral and cannot be converted to persistent metrics. The exceptions are the following internal metrics, which are persistent (a query sketch follows the list):

  • ~collector.*points.reported
  • ~externalservices.*.points
  • ~derived-metrics.points.reported
  • ~collector.*histograms.reported
  • ~derived-histograms.histograms.reported
  • ~collector.*spans.reported
  • ~query.metrics_scanned
  • ~proxy.points.*.received
  • ~proxy.histograms.*.received
  • ~proxy.spans.*.received
  • ~proxy.spanLogs.*.received
  • ~proxy.build.version
  • ~metric.global.namespace.*
  • ~histogram.global.namespace.*
  • ~counter.global.namespace.*
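
Because these metrics are persistent, you can query them over long time windows. A minimal sketch, assuming you want the hourly mean of the points the collector reports:

    align(1h, mean, ts(~collector.points.reported))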

Internal Metrics Overview

We collect the following sets of metrics.

  • ~alert* – a set of metrics that allows you to examine the effect of alerts on your service instance.
  • ~collector – metrics processed at the collector gateway to the service instance. Includes spans.
  • ~metric – total unique sources and metrics. You can compute the rate of metric creation from each source.
  • ~proxy – metric rate received and sent from each Wavefront proxy, blocked and rejected metric rates, buffer metrics, and JVM stats of the proxy. Also includes counts of metrics affected by the proxy preprocessor. See Monitor Wavefront Proxies and the query sketch after this list.
  • ~wavefront – set of gauges that track metrics about your use of the Tanzu Observability service.
  • ~http.api – namespace for looking at API request metrics.
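
For example, a minimal sketch that charts the per-second rate at which the proxies receive points, using the persistent ~proxy received counters listed earlier on this page (rate() is applied because these are counters):

    rate(ts(~proxy.points.*.received))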

If you have an AWS integration, metrics with the following prefix are available:

  • ~externalservices – metric rates, API requests, and events from AWS CloudWatch, AWS CloudTrail, and AWS Metrics+.

There’s also a metric you can use to monitor ongoing events and make sure the number does not exceed 1000:

  • ~events.num-ongoing-events – returns the number of ongoing events.
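
A minimal alert-condition sketch for watching this limit (the 950 threshold is illustrative; choose a value that gives you enough warning before you reach 1000 ongoing events):

    ts(~events.num-ongoing-events) > 950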

Useful Internal Metrics for Optimizing Performance

A small set of internal metrics can help you optimize performance and monitor your costs. This section highlights some things to look for; the exact steps depend on how you’re using the Tanzu Observability service and on the characteristics of your environment.

Our customer support engineers have found the following metrics especially useful.

~alert

  • ~alert.query_time.<alert_id> – Tracks the average time, in ms, that a specified alert took to run in the past hour.
  • ~alert.query_points.<alert_id> – Tracks the average number of points that a specified alert scanned in the past hour.
  • ~alert.checking_frequency.<alert_id> – Tracks how often a specified alert performs a check. See Alert States for details.
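
For example, a minimal sketch that charts how long a specific alert takes to run (1234567890 is a hypothetical alert ID; substitute the ID of your own alert):

    ts(~alert.query_time.1234567890)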

~collector

  • ~collector.points.reported, ~collector.histograms.reported, ~collector.tracing.spans.reported, ~collector.tracing.span_logs.reported, and ~collector.tracing.span_logs.bytes_reported – Valid metric points, histogram points, trace data (spans), or span logs that the collector reports to Tanzu Observability. These are billing metrics that you can look up on the Tanzu Observability Usage dashboard.
    Note: We have a corresponding direct ingestion metric for each metric. For example, corresponding to ~collector.points.reported we have ~collector.direct-ingestion.points.reported.
  • ~collector.points.batches, ~collector.histograms.batches, ~collector.tracing.spans.batches, and ~collector.tracing.span_logs.batches – Number of batches of points, histogram points, spans, or span logs received by the collector, either via the proxy or via the direct ingestion API. In the histogram context, the number of batches is the number of HTTP POST requests.
    Note: We have a corresponding direct ingestion metric for each metric. For example, corresponding to ~collector.spans.batches we have ~collector.direct-ingestion.spans.batches.
  • ~collector.points.undecodable, ~collector.histograms.undecodable, ~collector.tracing.spans.undecodable, and ~collector.tracing.span_logs.undecodable – Points, histogram points, spans, or span logs that the collector receives but cannot report to Tanzu Observability because the input is not in the right format.
    Note: We have a corresponding direct ingestion metric for each metric. For example, corresponding to ~collector.points.undecodable we have ~collector.direct-ingestion.points.undecodable.
  • ~collector.delta_points.tracing_red.reported, ~collector.histograms.tracing_red.reported, and ~collector.points.tracing_red.reported – Delta counters, histograms, and points derived as Tracing RED metrics that the collector receives.
    Note: We have a corresponding direct ingestion metric for each metric. For example, corresponding to ~collector.delta_points.tracing_red.reported we have ~collector.direct-ingestion.delta_points.tracing_red.reported.
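
For example, a minimal sketch of two chart queries that let you compare what the collector reports with what it cannot decode, which can help you spot malformed input (both metric names come from the rows above):

    sum(ts(~collector.points.reported))
    sum(ts(~collector.points.undecodable))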

~metric

  • ~metric.new_host_ids – Counter that increments when a new source= or host= is sent to Tanzu Observability.
  • ~metric.new_metric_ids – Counter that increments when a new metric name is sent to Tanzu Observability.
  • ~metric.new_string_ids – Counter that increments when a new point tag value is sent to Tanzu Observability.
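
For example, a minimal sketch of the per-second rate at which new metric names are created (rate() is applied because ~metric.new_metric_ids is a counter):

    rate(ts(~metric.new_metric_ids))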

~query

  • ~query.requests – Counter that tracks the number of queries a user made.

~http.api

  • ~http.api.v2.* – Monotonic counters, without tags, that align with the API endpoints and allow you to examine API request metrics. For example, ts(~http.api.v2.alert.{id}.GET.200.count) aligns with the GET /api/v2/alert/{id} API endpoint. Examine the ~http.api.v2. namespace to see the counters for specific API endpoints.
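
For example, a minimal sketch that turns the counter from the example above into a request rate (this reuses the counter name from that example; adjust the {id} segment to match how the counter is named for your endpoint):

    rate(ts(~http.api.v2.alert.{id}.GET.200.count))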

If several slow queries are executed within the selected time window, the Slow Query page can become long. Section links at the top left allow you to select a section. The links display only after you have scrolled down the page.