Solve problems with the Tanzu Observability tile and the TAS integration.

This doc page looks at possible causes for problems you might encounter with your Tanzu Application Service (TAS) to Tanzu Observability integration and explains how to address them.

Sizing and Scaling for Large TAS Foundations

Larger TAS foundations are more demanding to monitor than smaller foundations.

  • If more application instances are running on a foundation, then more container-level metrics have to be collected and forwarded to Tanzu Observability.
  • If more virtual machines are in a foundation, then more VM-level metrics are reported.

If your foundation is large, tune the following parameters, in this order:

  1. Increase the size of your Telegraf Agent Virtual Machine. The Telegraf agent is responsible for collecting metrics and transforming them into the Wavefront data format. The agent is typically CPU and memory bound, so increasing virtual machine size can increase performance.
  2. Increase the scrape interval. If collection times for some scrape targets are greater than 12 seconds, consider increasing the scrape interval for your environment so that metrics are collected less frequently. Typically, an interval of 120% of the longest observed collection time is safe.
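
As a rough illustration of the 120% guideline, the following shell sketch converts a longest observed collection time to a suggested scrape interval in whole seconds, rounded up. It assumes the collection time is given in nanoseconds, as the name of the gather_time_ns metric mentioned later on this page suggests. The function name suggest_scrape_interval and the sample value are hypothetical and are not part of the tile or the Telegraf agent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a scrape interval (whole seconds, rounded up)
# as 120% of the longest observed collection time, given in nanoseconds.
suggest_scrape_interval() {
  local longest_ns="$1"
  # Scale to 120% using integer arithmetic.
  local scaled_ns=$(( longest_ns * 12 / 10 ))
  # Ceiling division converts nanoseconds to whole seconds.
  echo $(( (scaled_ns + 999999999) / 1000000000 ))
}

# Sample value: a longest collection time of 14.5 seconds.
suggest_scrape_interval 14500000000   # prints 18
```

Treat the result as a starting point and confirm it against the collection times you observe after the change.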

Using the Nozzle Successfully with Service Broker Bindings

Support for service broker bindings differs between versions of the Tanzu Observability by Wavefront Nozzle:

  • The Tanzu Observability by Wavefront Nozzle v4.1.1 supports Service Broker Bindings. When you configure Nozzle 4.1.1, select Enable legacy service broker bindings in the Wavefront Proxy Config tab. See Install Nozzle 4.1.1 and Enable Service Broker Bindings.
  • The Tanzu Observability by Wavefront Nozzle v4.1.0 DOES NOT support Service Broker Bindings. If you upgraded to Nozzle 4.1.0, you have to:
    1. Downgrade from Tanzu Observability by Wavefront Nozzle v4.1.0 to Tanzu Observability by Wavefront Nozzle v3.
    2. Upgrade from Tanzu Observability by Wavefront Nozzle v3 to Tanzu Observability by Wavefront Nozzle v4.1.1. That version of the nozzle includes a checkbox that supports retaining Service Broker Bindings. The process is discussed in this section.

Downgrade from Nozzle 4.1.0 to Nozzle 3.0

This section explains how to downgrade. For clarity, the section uses explicit version numbers.

Step 1. Uninstall v4 of the Tanzu Observability by Wavefront Nozzle.
  1. Log in to Ops Manager.
  2. In the installation dashboard, find the Tanzu Observability tile and click the delete icon to stage the deletion.
  3. Click Review Pending Changes and uncheck boxes for any products that you don't want redeployed.
  4. Click Apply Changes to complete the deletion process.
Ops Manager installation dashboard shows 3 tiles, trash can highlighted.
Step 2. In the bottom left of the Ops Manager installation dashboard, click Delete all unused products and confirm.

Note: If you don't delete all unused products, the import of the v3 nozzle might fail later with an error like the following: "Metadata already exists for name: wavefront-nozzle and version: 3.0.5".
Zoom in on Delete Unused Products, with arrow pointing to trash icon.
Step 3. Download v3 of the Tanzu Observability by Wavefront Nozzle.
  1. Log in to Tanzu Network and go to https://network.pivotal.io/products/wavefront-nozzle.
  2. Select v3 of the nozzle and download it.

Step 4. Import and install v3 of the nozzle.
  1. In the Ops Manager Installation Dashboard, click Import a Product.
  2. Select the v3 nozzle that you just downloaded.
 
Step 5. Configure and deploy the v3 nozzle:
  1. Follow the configuration steps in Ops Manager: Install, Configure, and Deploy the Nozzle.
  2. To deploy the nozzle, click Review Pending Changes and uncheck boxes for products that don't need to be redeployed. Click Apply Changes to complete the process.
  3. When installation is complete, click Change Log and verify that the older version shows Added.
Change log, arrow points to Added text in third column.

Install Nozzle 4.1.1 and Enable Service Broker Bindings

You enable service broker bindings as part of the Wavefront Proxy Config step of nozzle configuration.

To enable service broker bindings:
  1. Follow the installation steps in Ops Manager: Install, Configure, and Deploy the Nozzle.
  2. In the Wavefront Proxy Config tab, select the Enable legacy service broker bindings check box.
Proxy Config tab, with arrow pointing to Enable legacy service broker bindings check box.

Symptom: No Data Flowing In and Certificate Error

No data are flowing in from one or more of your foundations. When you check the proxy log, you see an error like the following:

2022-06-05T08:17:37Z E! [outputs.wavefront::wavefront-pipeline-2] wavefront flushing error: error reporting wavefront format data to Wavefront: "Post \"https://wavefront-proxy.service.internal:4443/report?f=wavefront\": x509: certificate signed by unknown authority"

Cause

This error results if the TLS connection between the Telegraf VM and the Proxy VM fails because the Tanzu Ops Manager root CA was not included during setup.
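
One way to confirm this cause before changing any settings is to check, from the Telegraf VM, whether the proxy's certificate chain verifies against the VM's trusted certificates. The sketch below is a minimal, hypothetical helper (chain_trusted is not part of the tile) that reads openssl s_client output and reports the verification result. It assumes that openssl is available on the VM and consults the VM's trusted certificate store.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `openssl s_client` output on stdin and report
# whether the certificate chain verified successfully.
chain_trusted() {
  if grep -q 'Verify return code: 0 (ok)'; then
    echo "trusted"
  else
    echo "untrusted"
  fi
}

# Intended use on the Telegraf VM (endpoint taken from the log line above):
#   openssl s_client -connect wavefront-proxy.service.internal:4443 \
#     </dev/null 2>/dev/null | chain_trusted

# Self-contained demonstration with a sample line of s_client output:
printf '%s\n' 'Verify return code: 20 (unable to get local issuer certificate)' \
  | chain_trusted   # prints untrusted
```

An "untrusted" result is consistent with the root CA missing from the VM's trusted certificates, but it does not rule out other TLS problems.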

Solution

Include the root CA by selecting the Include Tanzu Ops Manager Root CA in Trusted Certs check box on the Security tab of the BOSH Director tile. The following screenshot shows a BOSH Director for GCP setup with the check box selected.

Screenshot of Security tab shows Include Tanzu Ops Manager Root CA in Trusted Certs

Symptom: No Data Flowing or Dashboards Show No Data

You have successfully set up the nozzle and the integration. However, you don’t see any data for the out-of-the-box dashboards. The most common cause is a problem with sending data to Tanzu Observability.

Potential Solutions:

  • Ensure that the installation of the Wavefront Nozzle has completed.
  • Verify that the proxy uses the correct API token and Wavefront instance URL. You specify that information in Ops Manager in the Proxy Config page.
  • In your Tanzu Application Service environment, verify that the BOSH jobs for the Wavefront proxy and for the Telegraf agent are running.
    • With the BOSH CLI, run the bosh deps command to identify your wavefront-nozzle deployment, then use bosh ssh to connect to the VM and tail the logs.
% bosh deps

% bosh ssh -d wavefront-nozzle-d62c653f58184da09b1d telegraf_agent
% sudo -i
% bpm logs -fa telegraf_agent

If you see errors in the output here, this may help pinpoint a specific issue in the environment. Otherwise, contact support.

If there are no errors in Telegraf, the next step is to check the wavefront_proxy logs:

% bosh ssh -d wavefront-nozzle-d62c653f58184da09b1d wavefront_proxy
% sudo -i
% bpm logs -fa wavefront_proxy
  • Verify that data are flowing from the Wavefront proxy to your Wavefront instance. See Proxy Troubleshooting.
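
When the Telegraf log output is long, it can help to show only the error-level lines. Telegraf marks those lines with E!, as in the certificate error shown earlier on this page. The following sketch defines a hypothetical filter, telegraf_errors, for that purpose. The two sample log lines are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only error-level lines from Telegraf log output
# read on stdin. Telegraf marks error lines with "E!".
telegraf_errors() {
  grep -F ' E! '
}

# Intended use on the telegraf_agent VM, after `sudo -i`:
#   bpm logs -fa telegraf_agent | telegraf_errors

# Self-contained demonstration with two sample lines:
printf '%s\n' \
  '2022-06-05T08:17:30Z I! sample informational message' \
  '2022-06-05T08:17:37Z E! [outputs.wavefront] sample error message' \
  | telegraf_errors
```

Only the second sample line is printed, because it is the only one that carries the E! marker.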

Symptom: Higher than Expected PPS Rate

The PPS (points-per-second) rate can affect performance and potentially the cost of using Tanzu Observability.

  • 4.x: The PPS generated by the Tanzu Observability by Wavefront Nozzle version 4.x should be predictable and relatively consistent for any given foundation, because metrics are scraped at a fixed interval.
  • 3.x: Version 3.x of the Nozzle follows a push-based model. PPS varies based on factors such as HTTP requests being served by the gorouter, so PPS is less predictable.

However, it can be difficult to predict the average PPS of a TAS foundation ahead of time because several factors affect the total number of metrics that are generated:

  • The TAS version
  • The size of the foundation
  • Other TAS components running on the foundation

PPS might increase or decrease when individual TAS components are installed, upgraded or removed. Each individual component contributes its own metrics.

Solution:

  • Increase the Telegraf agent’s scrape interval. Metrics will be collected less frequently, and average PPS decreases.

Future releases will allow more targeted approaches to reducing PPS, for example, by filtering out unwanted metrics.
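
To see why a longer scrape interval lowers PPS, consider a simplified model: if every scrape yields roughly the same number of points, average PPS is approximately the points per scrape divided by the scrape interval in seconds. The sketch below applies that approximation. The function name estimate_pps and the sample figures are hypothetical and do not describe any particular foundation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate average PPS, assuming each scrape yields
# about the same number of points.
#   $1 = points collected per scrape, $2 = scrape interval in seconds
estimate_pps() {
  local points_per_scrape="$1"
  local interval_s="$2"
  echo $(( points_per_scrape / interval_s ))
}

# Sample figures: 600,000 points per scrape.
estimate_pps 600000 15   # prints 40000
estimate_pps 600000 60   # prints 10000
```

In this model, quadrupling the interval cuts average PPS to a quarter, at the cost of coarser time resolution in charts.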

Symptom: Incomplete Data in Tanzu Observability

Data from your TAS foundation are visible in Tanzu Observability dashboards and charts, but seem incomplete.

Potential Cause:

Incomplete data is most likely caused by one or more components failing to keep up with the volume of metrics that TAS generates. Typically this happens when the gauge exporter emits large numbers of metrics, and the Telegraf agent cannot ingest these metrics and forward them to the Wavefront proxy before the next collection cycle begins. Errors might result, and metrics might be dropped as the Telegraf agent tries to catch up.

Investigation:

To investigate, check the following:

  • Look for errors in bpm logs on the Telegraf agent or in the Wavefront proxy logs. See Proxy Troubleshooting and Telegraf Troubleshooting for details.
  • Look for collection errors from Telegraf (tas.observability.telegraf.internal_gather.errors).
  • Look for long collection times from Telegraf (tas.observability.telegraf.internal_gather.gather_time_ns).

Potential Solutions: In the Ops Manager tile:

  • Increase the size of the Telegraf Agent Virtual Machine.
  • Increase the Telegraf scrape interval.