diff --git a/docs/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md b/docs/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md index 75bef8a43..35603cac3 100644 --- a/docs/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md +++ b/docs/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md @@ -193,7 +193,7 @@ http://etcd-2.etcd-headless.etcd-cluster.svc.cluster.local:2379 is healthy: succ ## Setup `values.yaml` The `values.yaml` file contains parameters and configurations for GreptimeDB and is the key to defining the Helm chart. -For example, a minimal GreptimeDB cluster with self-monitoring configuration is as follows: +For example, a minimal GreptimeDB cluster configuration is as follows: ```yaml image: @@ -212,15 +212,6 @@ initializer: registry: docker.io repository: greptime/greptimedb-initializer -monitoring: - # Enable monitoring - enabled: true - -grafana: - # Enable grafana deployment. - # It needs to enable monitoring `monitoring.enabled: true` first. - enabled: true - frontend: replicas: 1 @@ -239,11 +230,11 @@ You should adjust the configuration according to your requirements. You can refer to the [configuration documentation](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) for the complete `values.yaml` configuration options. -## Install the GreptimeDB cluster with self-monitoring +## Install the GreptimeDB cluster Now that the GreptimeDB Operator and etcd cluster are installed, and `values.yaml` is configured, -you can deploy a minimal GreptimeDB cluster with self-monitoring and Flow enabled: +you can deploy a minimal GreptimeDB cluster: ```bash helm upgrade --install mycluster \ @@ -277,51 +268,6 @@ The greptimedb-cluster is starting, use `kubectl get pods -n default` to check i ``` -When both `monitoring` and `grafana` options are enabled, we will enable **self-monitoring** for the GreptimeDB cluster: a GreptimeDB standalone instance will be deployed to monitor the GreptimeDB cluster, and the monitoring data will be visualized using Grafana, making it easier to troubleshoot issues in the GreptimeDB cluster. - -We will deploy a GreptimeDB standalone instance named `${cluster}-monitor` in the same namespace as the cluster to store monitoring data such as metrics and logs from the cluster. Additionally, we will deploy a [Vector](https://github.com/vectordotdev/vector) sidecar for each pod in the cluster to collect metrics and logs and send them to the GreptimeDB standalone instance. - -We will deploy a [Grafana](https://grafana.com/) instance and configure it to use the GreptimeDB standalone instance as a data source (using both Prometheus and MySQL protocols), allowing us to visualize the GreptimeDB cluster's monitoring data out of the box. By default, Grafana will use `mycluster` and `default` as the cluster name and namespace to create data sources. If you want to monitor clusters with different names or namespaces, you'll need to create different data source configurations based on the cluster names and namespaces. 
You can create a `values.yaml` file like this: - -```yaml -monitoring: - enabled: true - -grafana: - enabled: true - datasources: - datasources.yaml: - datasources: - - name: greptimedb-metrics - type: prometheus - url: http://${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus - access: proxy - isDefault: true - - - name: greptimedb-logs - type: mysql - url: ${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4002 - access: proxy - database: public -``` - -The above configuration will create the default datasources for the GreptimeDB cluster metrics and logs in the Grafana dashboard: - -- `greptimedb-metrics`: The metrics of the cluster are stored in the standalone monitoring database and exposed in Prometheus protocol (`type: prometheus`); - -- `greptimedb-logs`: The logs of the cluster are stored in the standalone monitoring database and exposed in MySQL protocol (`type: mysql`). It uses the `public` database by default; - -Then replace `{cluster}` and `${namespace}` with your desired values and install the GreptimeDB cluster using the following command (please note that `{cluster}` and `${namespace}` in the command also need to be replaced): - -```bash -helm install {cluster} \ - --set monitoring.enabled=true \ - --set grafana.enabled=true \ - greptime/greptimedb-cluster \ - -f values.yaml \ - -n ${namespace} -``` - When starting the cluster installation, we can check the status of the GreptimeDB cluster with the following command. If you use a different cluster name and namespace, you can replace `mycluster` and `default` with your configuration: ```bash @@ -350,13 +296,11 @@ kubectl -n default get pods NAME READY STATUS RESTARTS AGE mycluster-datanode-0 2/2 Running 0 77s mycluster-frontend-6ffdd549b-9s7gx 2/2 Running 0 66s -mycluster-grafana-675b64786-ktqps 1/1 Running 0 6m35s mycluster-meta-58bc88b597-ppzvj 2/2 Running 0 86s -mycluster-monitor-standalone-0 1/1 Running 0 6m35s ``` -As you can see, we have created a minimal GreptimeDB cluster consisting of 1 frontend, 1 datanode, and 1 metasrv by default. For information about the components of a complete GreptimeDB cluster, you can refer to [architecture](/user-guide/concepts/architecture.md). Additionally, we have deployed a standalone GreptimeDB instance (`mycluster-monitor-standalone-0`) for storing monitoring data and a Grafana instance (`mycluster-grafana-675b64786-ktqps`) for visualizing the cluster's monitoring data. +As you can see, we have created a minimal GreptimeDB cluster consisting of 1 frontend, 1 datanode, and 1 metasrv by default. For information about the components of a complete GreptimeDB cluster, you can refer to [architecture](/user-guide/concepts/architecture.md). ## Explore the GreptimeDB cluster @@ -406,33 +350,6 @@ Open the browser and navigate to `http://localhost:4000/dashboard` to access by If you want to use other tools like `mysql` or `psql` to connect to the GreptimeDB cluster, you can refer to the [Quick Start](/getting-started/quick-start.md). 
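+
+For example, assuming the default `mycluster` name and `default` namespace used throughout this guide, you can port-forward the frontend's MySQL port (4002) and connect with the `mysql` client (a sketch; adjust the service name and namespace to your deployment):
+
+```bash
+# Forward the MySQL protocol port of the frontend service
+kubectl -n default port-forward svc/mycluster-frontend 4002:4002
+
+# In another terminal, connect with the MySQL client
+mysql -h 127.0.0.1 -P 4002
+```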
-### Access the Grafana dashboard - -You can access the Grafana dashboard by using `kubctl port-forward` the Grafana service: - -```bash -kubectl -n default port-forward svc/mycluster-grafana 18080:80 -``` - -Please note that when you use a different cluster name and namespace, you can use the following command, and replace `${cluster}` and `${namespace}` with your configuration: - -```bash -kubectl -n ${namespace} port-forward svc/${cluster}-grafana 18080:80 -``` - -Then open your browser and navigate to `http://localhost:18080` to access the Grafana dashboard. The default username and password are `admin` and `gt-operator`: - -![Grafana Dashboard](/kubernetes-cluster-grafana-dashboard.jpg) - -There are three dashboards available: - -- **GreptimeDB**: Displays the metrics of the GreptimeDB cluster. -- **GreptimeDB Logs**: Displays the logs of the GreptimeDB cluster. - -## Next Steps - -- If you want to deploy a GreptimeDB cluster with Remote WAL, you can refer to [Configure Remote WAL](/user-guide/deployments-administration/deploy-on-kubernetes/configure-remote-wal.md) for more details. - ## Cleanup :::danger @@ -461,7 +378,6 @@ The PVCs wouldn't be deleted by default for safety reasons. If you want to delet ```bash kubectl -n default delete pvc -l app.greptime.io/component=mycluster-datanode -kubectl -n default delete pvc -l app.greptime.io/component=mycluster-monitor-standalone ``` ### Cleanup the etcd cluster @@ -479,3 +395,8 @@ If you are using `kind` to create the Kubernetes cluster, you can use the follow ```bash kind delete cluster ``` + +## Next Steps + +If you want to deploy a GreptimeDB cluster with Remote WAL, you can refer to [Configure Remote WAL](/user-guide/deployments-administration/deploy-on-kubernetes/configure-remote-wal.md) for more details. + diff --git a/docs/user-guide/deployments-administration/monitoring/cluster-monitoring-deployment.md b/docs/user-guide/deployments-administration/monitoring/cluster-monitoring-deployment.md index 24592d3d6..825f31a0f 100644 --- a/docs/user-guide/deployments-administration/monitoring/cluster-monitoring-deployment.md +++ b/docs/user-guide/deployments-administration/monitoring/cluster-monitoring-deployment.md @@ -1,39 +1,126 @@ --- keywords: [Kubernetes deployment, cluster, monitoring] -description: Guide to deploying monitoring for GreptimeDB clusters on Kubernetes, including self-monitoring and Prometheus monitoring steps. +description: Complete guide to deploying self-monitoring for GreptimeDB clusters on Kubernetes, including Grafana dashboard setup and configuration options --- -# Cluster Monitoring Deployment +# Self-Monitoring GreptimeDB Clusters -After deploying a GreptimeDB cluster using GreptimeDB Operator, by default, its components (Metasrv / Datanode / Frontend) expose a `/metrics` endpoint on their HTTP port (default `4000`) for [Prometheus metrics](/reference/http-endpoints.md#metrics). +Before reading this document, ensure you understand how to [deploy a GreptimeDB cluster on Kubernetes](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md). +This guide will walk you through configuring monitoring when deploying a GreptimeDB cluster. -We provide two approaches to monitor the GreptimeDB cluster: +## Quick Start -1. **Enable GreptimeDB Self-Monitoring**: The GreptimeDB Operator will launch an additional GreptimeDB Standalone instance and Vector Sidecar container to collect and store metrics and logs from the GreptimeDB cluster. -2. 
**Use Prometheus Operator to Configure Prometheus Metrics Monitoring**: Users need first to deploy Prometheus Operator and create Prometheus instance, then use Prometheus Operator's `PodMonitor` to write GreptimeDB cluster metrics into Prometheus. +You can enable monitoring and Grafana by adding configurations to the `values.yaml` file when deploying the GreptimeDB cluster using Helm Chart. +Here's a complete example of `values.yaml` for deploying a minimal GreptimeDB cluster with monitoring and Grafana: -Users can choose the appropriate monitoring approach based on their needs. +```yaml +image: + registry: docker.io + # Image repository: + # Use `greptime/greptimedb` for OSS GreptimeDB + # Consult staff for Enterprise GreptimeDB + repository: + # Image tag: + # Use database version `VAR::greptimedbVersion` for OSS GreptimeDB + # Consult staff for Enterprise GreptimeDB + tag: + pullSecrets: [ regcred ] + +initializer: + registry: docker.io + repository: greptime/greptimedb-initializer + +monitoring: + # Enable monitoring + enabled: true + +grafana: + # Enable Grafana deployment + # Requires monitoring to be enabled first (monitoring.enabled: true) + enabled: true + +frontend: + replicas: 1 + +meta: + replicas: 1 + backendStorage: + etcd: + endpoints: "etcd.etcd-cluster.svc.cluster.local:2379" + +datanode: + replicas: 1 +``` + +When monitoring is enabled, GreptimeDB Operator launches an additional GreptimeDB Standalone instance to collect metrics and logs from the GreptimeDB cluster. +To collect log data, GreptimeDB Operator starts a [Vector](https://vector.dev/) sidecar container in each Pod. + +When Grafana is enabled, a Grafana instance is deployed that uses the GreptimeDB Standalone instance configured for cluster monitoring as its data source. +This enables visualization of the GreptimeDB cluster's monitoring data out of the box using both Prometheus and MySQL protocols. + +Then install the GreptimeDB cluster with the above `values.yaml` file: + +```bash +helm upgrade --install mycluster \ + greptime/greptimedb-cluster \ + --values /path/to/values.yaml \ + -n default +``` + +After installation, you can check the Pod status of the GreptimeDB cluster: + +```bash +kubectl -n default get pods +``` -## Enable GreptimeDB Self-Monitoring +
+ Expected Output +```bash +NAME READY STATUS RESTARTS AGE +mycluster-datanode-0 2/2 Running 0 77s +mycluster-frontend-6ffdd549b-9s7gx 2/2 Running 0 66s +mycluster-grafana-675b64786-ktqps 1/1 Running 0 6m35s +mycluster-meta-58bc88b597-ppzvj 2/2 Running 0 86s +mycluster-monitor-standalone-0 1/1 Running 0 6m35s +``` +
+
+You can then access the Grafana dashboard by port-forwarding the Grafana service to your local machine:
+
+```bash
+kubectl -n default port-forward svc/mycluster-grafana 18080:80
+```
-

-In self-monitoring mode, GreptimeDB Operator will launch an additional GreptimeDB Standalone instance to collect metrics and logs from the GreptimeDB cluster, including cluster logs. To collect log data, GreptimeDB Operator will start a [Vector](https://vector.dev/) sidecar container in each Pod. When this mode is enabled, JSON format logging will be automatically enabled for the cluster.
+Then refer to the [Access Grafana Dashboard](#access-grafana-dashboard) section below for details on accessing Grafana.

-If you deploy the GreptimeDB cluster using Helm Chart (refer to [Getting Started](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md)), you can configure the `values.yaml` file as follows:
+## Monitoring Configuration
+
+This section covers the details of monitoring configurations.
+
+### Enable Monitoring
+
+Add the following configuration to [`values.yaml`](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md#setup-valuesyaml) to enable monitoring when deploying the GreptimeDB cluster:

```yaml
monitoring:
  enabled: true
```

-This will deploy a GreptimeDB Standalone instance named `${cluster}-monitoring` to collect metrics and logs. You can check it with:
+This deploys a GreptimeDB Standalone instance named `${cluster-name}-monitor` to collect metrics and logs. You can verify the deployment with:

-```
-kubectl get greptimedbstandalones.greptime.io ${cluster}-monitoring -n ${namespace}
+```bash
+kubectl get greptimedbstandalones.greptime.io ${cluster-name}-monitor -n ${namespace}
```

-By default, this GreptimeDB Standalone instance will store monitoring data using the Kubernetes default StorageClass in local storage. You can adjust this based on your needs.
+The GreptimeDB Standalone instance exposes services using `${cluster-name}-monitor-standalone` as the Kubernetes Service name. You can use the following addresses to access monitoring data:

-The GreptimeDB Standalone instance can be configured via the `monitoring.standalone` field in `values.yaml`, for example:
+- **Prometheus metrics**: `http://${cluster-name}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus`
+- **SQL logs**: `${cluster-name}-monitor-standalone.${namespace}.svc.cluster.local:4002`. By default, cluster logs are stored in the `public._gt_logs` table.
+
+### Customize Monitoring Storage
+
+By default, the GreptimeDB Standalone instance stores monitoring data using the Kubernetes default StorageClass in local storage.
+You can configure the GreptimeDB Standalone instance through the `monitoring.standalone` field in `values.yaml`. For example, the following configuration uses S3 object storage to store monitoring data:

```yaml
monitoring:
  standalone:
@@ -66,12 +153,10 @@ monitoring:
      root: "standalone-with-s3-data"
```

-The GreptimeDB Standalone instance will expose services using `${cluster}-monitoring-standalone` as the Kubernetes Service name. You can use the following addresses to read monitoring data:
-
-- **Prometheus metrics**: `http://${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus`
-- **SQL logs**: `${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4002`. By default, cluster logs are stored in `public._gt_logs` table. 
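+
+As a quick sanity check that monitoring data is flowing into the standalone instance, you can port-forward its MySQL port and count the collected log rows (a sketch; it assumes the default `mycluster` name and `default` namespace, and the `_gt_logs` table mentioned above):
+
+```bash
+# Forward the monitoring instance's MySQL port
+kubectl -n default port-forward svc/mycluster-monitor-standalone 4002:4002
+
+# In another terminal, count the collected cluster logs
+mysql -h 127.0.0.1 -P 4002 -e "SELECT COUNT(*) FROM public._gt_logs;"
+```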
+### Customize Vector Sidecar -The Vector sidecar configuration for log collection can be customized via the `monitoring.vector` field: +The Vector sidecar configuration for log collection can be customized via the `monitoring.vector` field. +For example, you can adjust the Vector image and resource limits as follows: ```yaml monitoring: @@ -94,139 +179,91 @@ monitoring: memory: "64Mi" ``` -:::tip NOTE -The configuration structure has changed between chart versions: +### YAML Configuration with `kubectl` Deployment -- In older version: `meta.etcdEndpoints` -- In newer version: `meta.backendStorage.etcd.endpoints` - -Always refer to the latest [values.yaml](https://github.com/GreptimeTeam/helm-charts/blob/main/charts/greptimedb-cluster/values.yaml) in the Helm chart repository for the most up-to-date configuration structure. -::: - -:::note -If you're not using Helm Chart, you can manually configure self-monitoring mode in the `GreptimeDBCluster` YAML: +If you're not using Helm Chart, you can also use the `monitoring` field to manually configure self-monitoring mode in the `GreptimeDBCluster` YAML: ```yaml -apiVersion: greptime.io/v1alpha1 -kind: GreptimeDBCluster -metadata: - name: basic -spec: - base: - main: - image: greptime/greptimedb:latest - frontend: - replicas: 1 - meta: - replicas: 1 - backendStorage: - etcd: - endpoints: - - "etcd.etcd-cluster.svc.cluster.local:2379" - datanode: - replicas: 1 - monitoring: - enabled: true +monitoring: + enabled: true ``` -The `monitoring` field configures self-monitoring mode. See [`GreptimeDBCluster` API docs](https://github.com/GreptimeTeam/greptimedb-operator/blob/main/docs/api-references/docs.md#monitoringspec) for details. -::: +For detailed configuration options, refer to the [`GreptimeDBCluster` API documentation](https://github.com/GreptimeTeam/greptimedb-operator/blob/main/docs/api-references/docs.md#monitoringspec). -## Use Prometheus Operator to Configure Prometheus Metrics Monitoring -Users need to first deploy Prometheus Operator and create Prometheus instance. For example, you can use [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) to deploy the Prometheus stack. You can refer to its [official documentation](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) for more details. +## Grafana Configuration -After deploying Prometheus Operator and instances, you can configure Prometheus monitoring via the `prometheusMonitor` field in `values.yaml`: +### Enable Grafana + +To enable Grafana deployment, add the following configuration to `values.yaml`. +Note that monitoring must be enabled first [(`monitoring.enabled: true`)](#enable-monitoring): ```yaml -prometheusMonitor: - # Enable Prometheus monitoring - this will create PodMonitor resources +grafana: enabled: true - # Configure scrape interval - interval: "30s" - # Configure labels - labels: - release: prometheus ``` -:::note -The `labels` field must match the `matchLabels` field used to create the Prometheus instance, otherwise metrics collection won't work properly. -::: - -After configuring `prometheusMonitor`, GreptimeDB Operator will automatically create `PodMonitor` resources and import metrics into Prometheus. 
You can check the `PodMonitor` resources with: +### Customize Grafana Data Sources -``` -kubectl get podmonitors.monitoring.coreos.com -n ${namespace} -``` - -:::note -If not using Helm Chart, you can manually configure Prometheus monitoring in the `GreptimeDBCluster` YAML: +By default, Grafana uses `mycluster` and `default` as the cluster name and namespace to create data sources. +To monitor clusters with different names or namespaces, you need to create custom data source configurations based on the actual cluster names and namespaces. +Here's an example `values.yaml` configuration: ```yaml -apiVersion: greptime.io/v1alpha1 -kind: GreptimeDBCluster -metadata: - name: basic -spec: - base: - main: - image: greptime/greptimedb:latest - frontend: - replicas: 1 - meta: - replicas: 1 - backendStorage: - etcd: - endpoints: - - "etcd.etcd-cluster.svc.cluster.local:2379" - datanode: - replicas: 1 - prometheusMonitor: - enabled: true - interval: "30s" - labels: - release: prometheus +monitoring: + enabled: true + +grafana: + enabled: true + datasources: + datasources.yaml: + datasources: + - name: greptimedb-metrics + type: prometheus + url: http://${cluster-name}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus + access: proxy + isDefault: true + + - name: greptimedb-logs + type: mysql + url: ${cluster-name}-monitor-standalone.${namespace}.svc.cluster.local:4002 + access: proxy + database: public ``` -The `prometheusMonitor` field configures Prometheus monitoring. -::: +This configuration creates the following data sources for GreptimeDB cluster monitoring in Grafana: -## Import Grafana Dashboards +- **`greptimedb-metrics`**: Cluster metrics stored in the standalone monitoring database, exposed via Prometheus protocol (`type: prometheus`) +- **`greptimedb-logs`**: Cluster logs stored in the standalone monitoring database, exposed via MySQL protocol (`type: mysql`). Uses the `public` database by default -GreptimeDB cluster currently provides 2 Grafana dashboards: +### Access Grafana Dashboard -- [Cluster Metrics Dashboard](https://github.com/GreptimeTeam/greptimedb/tree/VAR::greptimedbVersion/grafana/dashboards/metrics/cluster) -- [Cluster Logs Dashboard](https://github.com/GreptimeTeam/greptimedb/tree/VAR::greptimedbVersion/grafana/dashboards/logs) +You can access the Grafana dashboard by port-forwarding the Grafana service to your local machine: -:::note -The Cluster Logs Dashboard is only for self-monitoring mode, while the Cluster Metrics Dashboard works for both self-monitoring and Prometheus monitoring modes. -::: +```bash +kubectl -n ${namespace} port-forward svc/${cluster-name}-grafana 18080:80 +``` -If using Helm Chart, you can enable `grafana.enabled` to deploy Grafana and import dashboards automatically (see [Getting Started](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md)): +Then open `http://localhost:18080` to access the Grafana dashboard. +The default login credentials for Grafana are: -```yaml -grafana: - enabled: true -``` +- **Username**: `admin` +- **Password**: `gt-operator` -If you already have Grafana deployed, follow these steps to import the dashboards: +Navigate to the `Dashboards` section to explore the pre-configured dashboards for monitoring your GreptimeDB cluster. -1. 
**Add Data Sources**
-
-   You can refer to Grafana's [datasources](https://grafana.com/docs/grafana/latest/datasources/) docs to add the following 3 data sources:
-
-   - **`metrics` data source**
-
-     For importing Prometheus metrics, works with both monitoring modes. For self-monitoring mode, use `http://${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus` as the URL. For your own Prometheus instance, use your Prometheus instance URL.
-
-   - **`information-schema` data source**
-
-     For importing cluster metadata via SQL, works with both monitoring modes. Use `${cluster}-frontend.${namespace}.svc.cluster.local:4002` as the SQL address with database `information_schema`.
-
-   - **`logs` data source**
-
-     For importing cluster logs via SQL, **only works with self-monitoring mode**. Use `${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4002` as the SQL address with database `public`.
-
-2. **Import Dashboards**
-
-   You can refer to Grafana's [Import dashboards](https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/import-dashboards/) docs.
+
+## Cleanup the PVCs
+
+:::danger
+The cleanup operation will remove the metadata and data of the GreptimeDB cluster. Please make sure you have backed up the data before proceeding.
+:::
+
+To uninstall the GreptimeDB cluster, please refer to the [Cleanup GreptimeDB Cluster](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md#cleanup) documentation.
+
+To clean up the Persistent Volume Claims (PVCs) used by the GreptimeDB standalone monitoring instance, delete the PVCs using the following command:
+
+```bash
+kubectl -n ${namespace} delete pvc -l app.greptime.io/component=${cluster-name}-monitor-standalone
+```
diff --git a/docs/user-guide/deployments-administration/monitoring/monitor-cluster-with-prometheus.md b/docs/user-guide/deployments-administration/monitoring/monitor-cluster-with-prometheus.md
new file mode 100644
index 000000000..1ad30f86f
--- /dev/null
+++ b/docs/user-guide/deployments-administration/monitoring/monitor-cluster-with-prometheus.md
@@ -0,0 +1,116 @@
+---
+keywords: [cluster, monitoring, Prometheus]
+description: Learn how to monitor a GreptimeDB cluster using an existing Prometheus instance in Kubernetes, including configuration steps and Grafana dashboard setup.
+---
+
+# Prometheus-Monitoring GreptimeDB Cluster
+
+Before reading this document, ensure you understand how to [deploy a GreptimeDB cluster on Kubernetes](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md).
+
+It is recommended to use [self-monitoring mode](cluster-monitoring-deployment.md) to monitor your GreptimeDB cluster,
+as it's simple to set up and provides out-of-the-box Grafana dashboards.
+However, if you already have a Prometheus instance deployed in your Kubernetes cluster and want to integrate
+GreptimeDB cluster metrics into it, follow the steps below.
+
+## Check the Prometheus Instance Configuration
+
+Ensure you have deployed the Prometheus Operator and created a Prometheus instance. For example, you can use [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) to deploy the Prometheus stack. Refer to its [official documentation](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) for more details. 
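+
+For reference, a typical kube-prometheus-stack installation might look like the following (a sketch; the release name `prometheus` is an assumption, chosen here because it produces the `release: prometheus` label used in the examples below):
+
+```bash
+# Add the community Helm repository and install the Prometheus stack
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm repo update
+
+# The release name becomes the `release` label on the stack's resources
+helm install prometheus prometheus-community/kube-prometheus-stack -n default
+```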
+
+When deploying the Prometheus instance, ensure you set the labels used for scraping GreptimeDB cluster metrics.
+For example, your existing Prometheus instance may contain the following configuration:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: greptime-podmonitor
+  namespace: default
+spec:
+  selector:
+    matchLabels:
+      release: prometheus
+  # other configurations...
+```
+
+When the `PodMonitor` is deployed,
+the Prometheus Operator continuously watches for pods in the `default` namespace that match all labels defined in `spec.selector.matchLabels` (in this example, `release: prometheus`).
+
+## Enable `prometheusMonitor` for GreptimeDB Cluster
+
+When deploying a GreptimeDB cluster using a Helm Chart,
+enable the `prometheusMonitor` field in your `values.yaml` file. For example:
+
+```yaml
+prometheusMonitor:
+  # Enable Prometheus monitoring - this will create PodMonitor resources
+  enabled: true
+  # Configure scrape interval
+  interval: "30s"
+  # Configure labels
+  labels:
+    release: prometheus
+```
+
+**Important:** The `labels` field value (`release: prometheus`)
+must match the `matchLabels` field used to create the Prometheus instance,
+otherwise metrics collection won't work properly.
+
+After configuring `prometheusMonitor`,
+the GreptimeDB Operator will automatically create `PodMonitor` resources and import metrics into Prometheus at the specified `interval`.
+You can check the `PodMonitor` resources with:
+
+```bash
+kubectl get podmonitors.monitoring.coreos.com -n ${namespace}
+```
+
+:::note
+If you're not using a Helm Chart, you can manually configure Prometheus monitoring in the `GreptimeDBCluster` YAML:
+
+```yaml
+apiVersion: greptime.io/v1alpha1
+kind: GreptimeDBCluster
+metadata:
+  name: basic
+spec:
+  base:
+    main:
+      image: greptime/greptimedb:latest
+  frontend:
+    replicas: 1
+  meta:
+    replicas: 1
+    backendStorage:
+      etcd:
+        endpoints:
+          - "etcd.etcd-cluster.svc.cluster.local:2379"
+  datanode:
+    replicas: 1
+  prometheusMonitor:
+    enabled: true
+    interval: "30s"
+    labels:
+      release: prometheus
+```
+
+:::
+
+## Grafana Dashboards
+
+You need to deploy Grafana yourself,
+then import the dashboards.
+
+### Add Data Sources
+
+After deploying Grafana,
+refer to Grafana's [data sources](https://grafana.com/docs/grafana/latest/datasources/) documentation to add the following two types of data sources:
+
+- **Prometheus**: Name it `metrics`. This data source connects to your Prometheus instance, which collects GreptimeDB cluster monitoring metrics. Use your Prometheus instance URL as the connection URL.
+- **MySQL**: Name it `information-schema`. This data source connects to your GreptimeDB cluster to access cluster metadata via the SQL protocol. If you have deployed GreptimeDB following the [Deploy a GreptimeDB Cluster on Kubernetes](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md) guide, use `${cluster-name}-frontend.${namespace}.svc.cluster.local:4002` as the server address with database `information_schema`.
+
+### Import Dashboards
+
+The [GreptimeDB Cluster Metrics Dashboard](https://github.com/GreptimeTeam/greptimedb/tree/VAR::greptimedbVersion/grafana/dashboards/metrics/cluster) uses the `metrics` and `information-schema` data sources to display GreptimeDB cluster metrics.
+
+Refer to Grafana's [Import dashboards](https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/import-dashboards/) documentation to learn how to import dashboards. 
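+
+To verify that GreptimeDB metrics are actually reaching Prometheus, you can port-forward the Prometheus web UI and run a quick query (a sketch; `prometheus-operated` is the Service created by the Prometheus Operator, and the `greptime_` metric prefix is an assumption about the exported metric names):
+
+```bash
+# Forward the Prometheus web UI to localhost
+kubectl -n default port-forward svc/prometheus-operated 9090:9090
+
+# Then open http://localhost:9090 and run a query such as:
+# {__name__=~"greptime_.*"}
+```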
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md index 35dceff5e..146aff32b 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md @@ -225,7 +225,7 @@ http://etcd-2.etcd-headless.etcd-cluster.svc.cluster.local:2379 is healthy: succ ## 配置 `values.yaml` `values.yaml` 文件设置了 GreptimeDB 的一些参数和配置,是定义 helm chart 的关键。 -例如一个带有自监控的最小规模 GreptimeDB 集群定义如下: +例如一个最小规模 GreptimeDB 集群定义如下: ```yaml image: @@ -244,15 +244,6 @@ initializer: registry: docker.io repository: greptime/greptimedb-initializer -monitoring: - # 启用监控 - enabled: true - -grafana: - # 用于监控面板 - # 需要先启用监控 `monitoring.enabled: true` 选项 - enabled: true - frontend: replicas: 1 @@ -266,7 +257,7 @@ datanode: replicas: 1 ``` -:::note +:::note 备注 中国大陆用户如有网络访问问题,可直接使用阿里云 OCI 镜像仓库: ```yaml @@ -286,20 +277,6 @@ initializer: registry: greptime-registry.cn-hangzhou.cr.aliyuncs.com repository: greptime/greptimedb-initializer -monitoring: - # 启用监控 - enabled: true - vector: - # 监控需要使用 Vector - registry: greptime-registry.cn-hangzhou.cr.aliyuncs.com - -grafana: - # 用于监控面板 - # 需要先启用监控 `monitoring.enabled: true` 选项 - enabled: true - image: - registry: greptime-registry.cn-hangzhou.cr.aliyuncs.com - frontend: replicas: 1 @@ -318,10 +295,10 @@ datanode: 可参考[配置文档](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md)获取完整的 `values.yaml` 的配置项。 -## 安装带有自监控的 GreptimeDB 集群 +## 安装 GreptimeDB 集群 在上述步骤中我们已经准备好了 GreptimeDB Operator,etcd 集群以及 GreptimeDB 集群相应的配置, -现在部署一个带自监控并启用 Flow 功能的最小 GreptimeDB 集群: +现在部署一个最小 GreptimeDB 集群: ```bash helm upgrade --install mycluster \ @@ -357,49 +334,6 @@ The greptimedb-cluster is starting, use `kubectl get pods -n default` to check i ``` -当同时启用 `monitoring` 和 `grafana` 选项时,我们将对 GreptimeDB 集群启动**自监控**:启动一个 GreptimeDB standalone 实例来监控 GreptimeDB 集群,并将相应的监控数据用 Grafana 进行渲染,从而更方便地排查 GreptimeDB 集群使用中的问题。 - -我们将会在 cluster 所属的命名空间下部署一个名为 `${cluster}-monitor` 的 GreptimeDB standalone 实例,用于存储集群的 metrics 和 logs 这类监控数据。同时,我们也会为集群内的每一个 Pod 部署一个 [Vector](https://github.com/vectordotdev/vector) sidecar 来收集集群的 metrics 和 logs,并发送给 GreptimeDB standalone 实例。 - -我们也将会部署一个 Grafana 实例,并配置 [Grafana](https://grafana.com/) 使用 GreptimeDB standalone 实例作为数据源(分别使用 Prometheus 和 MySQL 协议),从而我们开箱即可使用 Grafana 来可视化 GreptimeDB 集群的监控数据。默认地,Grafana 将会使用 `mycluster` 和 `default` 作为集群名称和命名空间来创建数据源。如果你想要监控具有不同名称或不同命名空间的集群,那就需要基于不同的集群名称和命名空间来创建不同的数据源配置。你可以创建一个如下所示的 `values.yaml` 文件: - -```yaml -monitoring: - enabled: true - -grafana: - enabled: true - datasources: - datasources.yaml: - datasources: - - name: greptimedb-metrics - type: prometheus - url: http://${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus - access: proxy - isDefault: true - - - name: greptimedb-logs - type: mysql - url: ${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4002 - access: proxy - database: public -``` - -上述配置将在 Grafana dashboard 中为 GreptimeDB 集群的指标和日志创建默认的数据源: - -- `greptimedb-metrics`:集群的指标存储在独立的监控数据库中,并对外暴露为 Prometheus 协议(`type: prometheus`); - -- `greptimedb-logs`:集群的日志存储在独立的监控数据库中,并对外暴露为 MySQL 协议(`type: mysql`)。默认使用 `public` 数据库; - 
-然后将上面的 `values.yaml` 中的 `${cluster}` 和 `${namespace}` 替换为你想要的值,并使用以下命令安装 GreptimeDB 集群: - -```bash -helm install ${cluster} \ - greptime/greptimedb-cluster \ - -f values.yaml \ - -n ${namespace} -``` - 当启动集群安装之后,我们可以用如下命令检查 GreptimeDB 集群的状态。若你使用了不同的集群名和命名空间,可将 `default` 和 `mycluster` 替换为你的配置: ```bash @@ -428,13 +362,11 @@ kubectl -n default get pods NAME READY STATUS RESTARTS AGE mycluster-datanode-0 2/2 Running 0 77s mycluster-frontend-6ffdd549b-9s7gx 2/2 Running 0 66s -mycluster-grafana-675b64786-ktqps 1/1 Running 0 6m35s mycluster-meta-58bc88b597-ppzvj 2/2 Running 0 86s -mycluster-monitor-standalone-0 1/1 Running 0 6m35s ``` -正如你所看到的,我们默认创建了一个最小的 GreptimeDB 集群,包括 1 个 frontend、1 个 datanode 和 1 个 metasrv。关于一个完整的 GreptimeDB 集群的组成,你可以参考 [architecture](/user-guide/concepts/architecture.md)。除此之外,我们还部署了一个独立的 GreptimeDB standalone 实例(`mycluster-monitor-standalone-0`)用以存储监控数据和一个 Grafana 实例(`mycluster-grafana-675b64786-ktqps`)用以可视化集群的监控数据。 +正如你所看到的,我们默认创建了一个最小的 GreptimeDB 集群,包括 1 个 frontend、1 个 datanode 和 1 个 metasrv。关于一个完整的 GreptimeDB 集群的组成,你可以参考 [architecture](/user-guide/concepts/architecture.md)。 ## 探索 GreptimeDB 集群 @@ -480,30 +412,7 @@ kubectl -n default port-forward --address 0.0.0.0 svc/mycluster-frontend 4000:40 如果你想使用其他工具如 `mysql` 或 `psql` 来连接 GreptimeDB 集群,你可以参考 [快速入门](/getting-started/quick-start.md)。 -### 访问 Grafana dashboard - -你可以使用 `kubectl port-forward` 命令转发 Grafana 服务: - -```bash -kubectl -n default port-forward svc/mycluster-grafana 18080:80 -``` - -请注意,当你使用了其他集群名和命名空间时,你可以使用如下命令,并将 `${cluster}` 和 `${namespace}` 替换为你的配置: - -```bash -kubectl -n ${namespace} port-forward svc/${cluster}-grafana 18080:80 -``` - -接着打开浏览器并访问 `http://localhost:18080` 来访问 Grafana dashboard。默认的用户名和密码是 `admin` 和 `gt-operator`: - -![Grafana Dashboard](/kubernetes-cluster-grafana-dashboard.jpg) - -目前有三个可用的 Dashboard: - -- **GreptimeDB**: 用于显示 GreptimeDB 集群的 Metrics; -- **GreptimeDB Logs**: 用于显示 GreptimeDB 集群的日志; - -## 清理 +## 删除集群 :::danger 清理操作将会删除 GreptimeDB 集群的元数据和数据。请确保在继续操作之前已经备份了数据。 @@ -531,7 +440,6 @@ helm -n default uninstall mycluster ```bash kubectl -n default delete pvc -l app.greptime.io/component=mycluster-datanode -kubectl -n default delete pvc -l app.greptime.io/component=mycluster-monitor-standalone ``` ### 清理 etcd 数据 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/monitoring/cluster-monitoring-deployment.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/monitoring/cluster-monitoring-deployment.md index 342e4d002..01a51acf8 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/monitoring/cluster-monitoring-deployment.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/monitoring/cluster-monitoring-deployment.md @@ -1,39 +1,174 @@ --- keywords: [Kubernetes 部署, 集群, 监控] -description: 在 Kubernetes 上部署 GreptimeDB 集群的监控指南,包括自监控和 Prometheus 监控的详细步骤。 +description: 在 Kubernetes 上部署 GreptimeDB 集群的自监控完整指南,包括 Grafana 仪表盘设置和配置项。 --- -# 集群监控部署 +# 自监控 GreptimeDB 集群 -当你使用 GreptimeDB Operator 部署 GreptimeDB 集群后,默认其对应组件(如 Metasrv / Datanode / Frontend)的 HTTP 端口(默认为 `4000`)将会暴露 `/metrics` 端点用于暴露 [Prometheus 指标](/reference/http-endpoints.md#指标)。 +在阅读本文档前,请确保你已经了解如何[在 Kubernetes 上部署 GreptimeDB 集群](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md)。 +本文将介绍在部署 GreptimeDB 集群时如何配置监控。 -我们将提供两种方式来监控 GreptimeDB 集群: +## 快速开始 -1. 
**启用 GreptimeDB 自监控**:GreptimeDB Operator 将额外启动一个 GreptimeDB Standalone 实例和 Vector Sidecar 容器,分别用于收集和存储 GreptimeDB 集群的指标和日志数据; -2. **使用 Prometheus Operator 配置 Prometheus 指标监控**:用户需先部署 Prometheus Operator,并创建相应的 Prometheus 实例,然后通过 Prometheus Operator 的 `PodMonitor` 来将 GreptimeDB 集群的 Metrics 数据写入到相应的 Prometheus 中; +你可以在使用 Helm Chart 部署 GreptimeDB 集群时,通过对 `values.yaml` 文件进行配置来启用监控和 Grafana。下面是一个完整的 `values.yaml` 示例,用于部署一个最小化的带有监控和 Grafana 的 GreptimeDB 集群: -用户可根据自身需求选择合适的监控方式。 +```yaml +image: + registry: docker.io + # 镜像仓库: + # OSS GreptimeDB 使用 `greptime/greptimedb`, + # Enterprise GreptimeDB 请咨询工作人员 + repository: + # 镜像标签: + # OSS GreptimeDB 使用数据库版本,例如 `v0.17.1` + # Enterprise GreptimeDB 请咨询工作人员 + tag: + pullSecrets: [ regcred ] + +initializer: + registry: docker.io + repository: greptime/greptimedb-initializer + +monitoring: + # 启用监控 + enabled: true + +grafana: + # 用于监控面板 + # 需要先启用监控 `monitoring.enabled: true` 选项 + enabled: true -## 启用 GreptimeDB 自监控 +frontend: + replicas: 1 -自监控模式下 GreptimeDB Operator 将会额外启动一个 GreptimeDB Standalone 实例,用于收集 GreptimeDB 集群的指标和日志数据,其中日志数据将包括集群日志和慢查询日志。为了收集日志数据,GreptimeDB Operator 会在每一个 Pod 中启动一个 [Vector](https://vector.dev/) 的 Sidecar 容器,用于收集 Pod 的日志数据。启用该模式后,集群将自动开启 JSON 格式的日志输出。 +meta: + replicas: 1 + backendStorage: + etcd: + endpoints: "etcd.etcd-cluster.svc.cluster.local:2379" -如果你使用 Helm Chart 部署 GreptimeDB 集群(可参考[立即开始](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md)),可对 Helm Chart 的 `values.yaml` 文件进行如下配置: +datanode: + replicas: 1 +``` + +:::note 备注 +如果你在中国大陆遇到网络访问问题,可直接使用阿里云 OCI 镜像仓库: ```yaml +image: + registry: greptime-registry.cn-hangzhou.cr.aliyuncs.com + # 镜像仓库: + # OSS GreptimeDB 使用 `greptime/greptimedb`, + # Enterprise GreptimeDB 请咨询工作人员 + repository: + # 镜像标签: + # OSS GreptimeDB 使用数据库版本,例如 `v0.17.1` + # Enterprise GreptimeDB 请咨询工作人员 + tag: + pullSecrets: [ regcred ] + +initializer: + registry: greptime-registry.cn-hangzhou.cr.aliyuncs.com + repository: greptime/greptimedb-initializer + monitoring: + # 启用监控 enabled: true + vector: + # 监控需要使用 Vector + registry: greptime-registry.cn-hangzhou.cr.aliyuncs.com + +grafana: + # 用于监控面板 + # 需要先启用监控 `monitoring.enabled: true` 选项 + enabled: true + image: + registry: greptime-registry.cn-hangzhou.cr.aliyuncs.com + +frontend: + replicas: 1 + +meta: + replicas: 1 + backendStorage: + etcd: + endpoints: "etcd.etcd-cluster.svc.cluster.local:2379" + +datanode: + replicas: 1 ``` +::: + +当启用监控后,GreptimeDB Operator 会额外启动一个 GreptimeDB Standalone 实例用于收集 GreptimeDB 集群的指标和日志数据。 +为了收集日志数据,GreptimeDB Operator 会在每一个 Pod 中启动一个 [Vector](https://vector.dev/) 的 Sidecar 容器。 + +当启用 Grafana 后,会部署一个 Grafana 实例,并将用于集群监控的 GreptimeDB Standalone 实例作为其数据源。 +这样就可以开箱即用地通过 Prometheus 和 MySQL 协议来可视化 GreptimeDB 集群的监控数据。 + +接下来使用上述配置的 `values.yaml` 文件来部署 GreptimeDB 集群: + +```bash +helm upgrade --install mycluster \ + greptime/greptimedb-cluster \ + --values /path/to/values.yaml \ + -n default +``` + +部署完成后,你可以用如下命令来查看 GreptimeDB 集群的 Pod 状态: -此时 Helm Chart 将会部署一个名为 `${cluster}-monitoring` 的 GreptimeDB Standalone 实例,用于收集 GreptimeDB 集群的指标和日志数据,你可以用如下命令进行查看: +```bash +kubectl -n default get pods +``` +
+ Expected Output +```bash +NAME READY STATUS RESTARTS AGE +mycluster-datanode-0 2/2 Running 0 77s +mycluster-frontend-6ffdd549b-9s7gx 2/2 Running 0 66s +mycluster-grafana-675b64786-ktqps 1/1 Running 0 6m35s +mycluster-meta-58bc88b597-ppzvj 2/2 Running 0 86s +mycluster-monitor-standalone-0 1/1 Running 0 6m35s ``` -kubectl get greptimedbstandalones.greptime.io ${cluster}-monitoring -n ${namespace} +
+ +你可以转发 Grafana 的端口到本地来访问 Grafana 仪表盘: + +```bash +kubectl -n default port-forward svc/mycluster-grafana 18080:80 ``` -默认该 GreptimeDB Standalone 实例会将监控数据使用 Kubernetes 当前默认的 StorageClass 将数据保存于本地存储,你可以根据实际情况进行调整。 +请参考[访问 Grafana 仪表盘](#访问-grafana仪表盘)章节来查看相应的数据面板。 -GreptimeDB Standalone 实例的配置可以通过 Helm Chart 的 `values.yaml` 中的 `monitoring.standalone` 字段进行调整,如下例子所示: + +## 配置监控数据的收集 + +本节将介绍监控配置的细节。 + +### 启用监控 + +在使用 Helm Chart 部署 GreptimeDB 集群时,在 [`values.yaml`](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md#setup-valuesyaml) 中添加以下配置来启用监控: + +```yaml +monitoring: + enabled: true +``` + +这将部署一个名为 `${cluster-name}-monitoring` 的 GreptimeDB Standalone 实例来收集指标和日志。你可以使用以下命令验证部署: + +```bash +kubectl get greptimedbstandalones.greptime.io ${cluster-name}-monitoring -n ${namespace} +``` + +GreptimeDB Standalone 实例使用 `${cluster-name}-monitoring-standalone` 作为 Kubernetes Service 名称来暴露服务。你可以使用以下地址访问监控数据: + +- **Prometheus 指标**:`http://${cluster-name}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus` +- **SQL 日志**:`${cluster-name}-monitor-standalone.${namespace}.svc.cluster.local:4002`。默认情况下,集群日志存储在 `public._gt_logs` 表中。 + +### 自定义监控数据存储 + +默认情况下,GreptimeDB Standalone 实例使用 Kubernetes 默认的 StorageClass 将监控数据存储在本地存储中。 +你可以通过 `values.yaml` 中的 `monitoring.standalone` 字段来配置 GreptimeDB Standalone 实例。例如,以下配置使用 S3 对象存储来存储监控数据: ```yaml monitoring: @@ -65,13 +200,10 @@ monitoring: # 用于配置 GreptimeDB Standalone 实例的对象存储的 root root: "standalone-with-s3-data" ``` +### 自定义 Vector Sidecar -GreptimeDB Standalone 实例将会使用 `${cluster}-monitoring-standalone` 作为 Kubernetes Service 的名称来暴露相应的服务,你可以使用如下地址来用于监控数据的读取: - -- **Prometheus 协议的指标监控**:`http://${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus`。 -- **SQL 协议的日志监控**:`${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4002`。默认集群日志会存储于 `public._gt_logs` 表。 - -GreptimeDB 自监控模式将使用 Vector Sidecar 来收集日志数据,你可以通过 `monitoring.vector` 字段来配置 Vector 的配置,如下所示: +用于日志收集的 Vector Sidecar 配置可以通过 `monitoring.vector` 字段进行自定义。 +例如,你可以按如下方式调整 Vector 的镜像和资源: ```yaml monitoring: @@ -94,140 +226,94 @@ monitoring: memory: "64Mi" ``` -:::tip NOTE -chart 版本之间的配置结构已发生变化: +### 使用 `kubectl` 部署的 YAML 配置 -- 旧版本: `meta.etcdEndpoints` -- 新版本: `meta.backendStorage.etcd.endpoints` - -请参考 chart 仓库中配置 [values.yaml](https://github.com/GreptimeTeam/helm-charts/blob/main/charts/greptimedb-cluster/values.yaml) 以获取最新的结构。 -::: - -:::note -如果你没有使用 Helm Chart 进行部署,你也可以通过如下 `GreptimeDBCluster` 的 YAML 来手动配置自监控模式,如下所示: +如果你没有使用 Helm Chart 部署 GreptimeDB 集群, +可以在 `GreptimeDBCluster` 的 YAML 中使用 `monitoring` 字段来手动配置自监控模式: ```yaml -apiVersion: greptime.io/v1alpha1 -kind: GreptimeDBCluster -metadata: - name: basic -spec: - base: - main: - image: greptime/greptimedb:latest - frontend: - replicas: 1 - meta: - replicas: 1 - backendStorage: - etcd: - endpoints: - - "etcd.etcd-cluster.svc.cluster.local:2379" - datanode: - replicas: 1 - monitoring: - enabled: true +monitoring: + enabled: true ``` -其中 `monitoring` 字段用于配置自监控模式,具体可参考 [`GreptimeDBCluster` API 文档](https://github.com/GreptimeTeam/greptimedb-operator/blob/main/docs/api-references/docs.md#monitoringspec)。 -::: +详细的配置选项请参考 [`GreptimeDBCluster` API 文档](https://github.com/GreptimeTeam/greptimedb-operator/blob/main/docs/api-references/docs.md#monitoringspec)。 -## 使用 Prometheus Operator 配置 Prometheus 指标监控 -用户需先部署 Prometheus Operator 并创建相应的 Prometheus 实例,例如可以使用 
[kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) 来部署相应的 Prometheus 技术栈,具体过程可参考其对应的[官方文档](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)。 +## Grafana 配置 -当部署完 Prometheus Operator 和 Prometheus 实例后,用户可通过 Helm Chart 的 `values.yaml` 的 `prometheusMonitor` 字段来配置 Prometheus 监控,如下所示: +### 启用 Grafana + +在 `values.yaml` 中添加以下配置启用 Grafana 部署, +注意该功能必须先启用[监控(`monitoring.enabled: true`)配置](#启用监控): ```yaml -prometheusMonitor: - # 用于配置是否启用 Prometheus 监控,此时 GreptimeDB Operator 将会自动创建 Prometheus Operator 的 `PodMonitor` 资源 +grafana: enabled: true - # 用于配置 Prometheus 监控的抓取间隔 - interval: "30s" - # 用于配置 Prometheus 监控的标签 - labels: - release: prometheus ``` -:::note -`labels` 字段需要与相应用于创建 Prometheus 实例的 `matchLabels` 字段保持一致,否则将无法正常抓取到 GreptimeDB 集群的 Metrics 数据。 -::: +### 自定义 Grafana 数据源 -当我们配置完 `prometheusMonitor` 字段后,GreptimeDB Operator 将会自动创建 Prometheus Operator 的 `PodMonitor` 资源,并将 GreptimeDB 集群的 Metrics 数据导入到 Prometheus 中,比如我们可以用如下命令来查看创建的 `PodMonitor` 资源: - -``` -kubectl get podmonitors.monitoring.coreos.com -n ${namespace} -``` - -:::note -如果你没有使用 Helm Chart 进行部署,你也可以通过如下 `GreptimeDBCluster` 的 YAML 来手动配置 Prometheus 监控,如下所示: +默认情况下,Grafana 使用 `mycluster` 和 `default` 作为集群名称和命名空间来创建数据源。 +要监控其他名称或命名空间的集群,请根据实际的集群名称和命名空间自定义配置。 +以下是 `values.yaml` 配置示例: ```yaml -apiVersion: greptime.io/v1alpha1 -kind: GreptimeDBCluster -metadata: - name: basic -spec: - base: - main: - image: greptime/greptimedb:latest - frontend: - replicas: 1 - meta: - replicas: 1 - backendStorage: - etcd: - endpoints: - - "etcd.etcd-cluster.svc.cluster.local:2379" - datanode: - replicas: 1 - prometheusMonitor: - enabled: true - interval: "30s" - labels: - release: prometheus +monitoring: + enabled: true + +grafana: + enabled: true + datasources: + datasources.yaml: + datasources: + - name: greptimedb-metrics + type: prometheus + url: http://${cluster-name}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus + access: proxy + isDefault: true + + - name: greptimedb-logs + type: mysql + url: ${cluster-name}-monitor-standalone.${namespace}.svc.cluster.local:4002 + access: proxy + database: public ``` -其中 `prometheusMonitor` 字段用于配置 Prometheus 监控。 -::: +此配置会在 Grafana 中为 GreptimeDB 集群的监控创建以下数据源: -## 导入 Grafana Dashboard +- **`greptimedb-metrics`**:用于存储监控数据的单机数据库中的集群指标,通过 Prometheus 协议提供服务(`type: prometheus`) +- **`greptimedb-logs`**:用于存储监控数据的单机数据库中的集群日志,通过 MySQL 协议提供服务(`type: mysql`),默认使用 `public` 数据库。 -目前 GreptimeDB 集群可使用如下 2 个 Grafana Dashboard 来配置监控面板: +### 访问 Grafana 仪表盘 -- [集群指标 Dashboard](https://github.com/GreptimeTeam/greptimedb/tree/VAR::greptimedbVersion/grafana/dashboards/metrics/cluster) -- [集群日志 Dashboard](https://github.com/GreptimeTeam/greptimedb/tree/VAR::greptimedbVersion/grafana/dashboards/logs) +你可以通过将 Grafana 服务端口转发到本地来访问 Grafana 仪表盘: -:::note -其中 **集群日志 Dashboard** 仅适用于自监控模式,而 **集群指标 Dashboard** 则适用于自监控模式和 Prometheus 监控模式。 -::: +```bash +kubectl -n ${namespace} port-forward svc/${cluster-name}-grafana 18080:80 +``` -如果你使用 Helm Chart 部署 GreptimeDB 集群,你可以通过启用 `grafana.enabled` 来一键部署 Grafana 实例,并导入相应的 Dashboard(可参考[立即开始](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md)),如下所示: +然后打开 `http://localhost:18080` 来访问 Grafana 仪表盘。 +默认登录凭据为: -```yaml -grafana: - enabled: true -``` +- **用户名**:`admin` +- **密码**:`gt-operator` + +接着进入到 `Dashboards` 部分来查看用于监控 GreptimeDB 集群而预配置的仪表盘。 -如果你是已经部署了 Grafana 实例,你可以参考如下步骤来导入相应的 Dashboard: +![Grafana 
Dashboard](/kubernetes-cluster-grafana-dashboard.jpg)

-1. **添加相应的 Data Sources**
-
-   你可以参考 Grafana 官方文档的 [datasources](https://grafana.com/docs/grafana/latest/datasources/) 来添加如下 3 个数据源:
-
-   - **`metrics` 数据源**
-
-     用于导入集群的 Prometheus 监控数据,适用于自监控模式和 Prometheus 监控模式。如上文所述,当使用自监控模式时,此时可使使用 `http://${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4000/v1/prometheus` 作为数据源的 URL。如果使用 Prometheus 监控模式,用户可根据具体 Prometheus 实例的 URL 来配置数据源。
-
-   - **`information-schema` 数据源**
-
-     这部分数据源用于使用 SQL 协议导入集群内部的元数据信息,适用于自监控模式和 Prometheus 监控模式。此时我们可以用 `${cluster}-frontend.${namespace}.svc.cluster.local:4002` 作为 SQL 协议的地址,并使用 `information_schema` 作为数据库名称进行连接。
-
-   - **`logs` 数据源**
-
-     这部分数据源用于使用 SQL 协议导入集群的日志,**仅适用于自监控模式**。此时我们可以用 `${cluster}-monitor-standalone.${namespace}.svc.cluster.local:4002` 作为 SQL 协议的地址,并使用 `public` 作为数据库名称进行连接。
-
-2. **导入相应的 Dashboard**
-
-   你可以参考 Grafana 官方文档的 [Import dashboards](https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/import-dashboards/) 来导入相应的 Dashboard。
+
+## 清理 PVC
+
+:::danger
+清理操作将移除 GreptimeDB 集群的元数据和数据,请确保在操作前已备份数据。
+:::
+
+请参考[清理 GreptimeDB 集群](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md#cleanup)文档
+查看如何卸载 GreptimeDB 集群。
+
+要清理 GreptimeDB 用于监控的单机数据库的 PVC,请使用以下命令:
+
+```bash
+kubectl -n ${namespace} delete pvc -l app.greptime.io/component=${cluster-name}-monitor-standalone
+```
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/monitoring/monitor-cluster-with-prometheus.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/monitoring/monitor-cluster-with-prometheus.md
new file mode 100644
index 000000000..040bba786
--- /dev/null
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/monitoring/monitor-cluster-with-prometheus.md
@@ -0,0 +1,115 @@
+---
+keywords: [cluster, monitoring, Prometheus]
+description: 本文介绍了如何在 Kubernetes 环境中使用现有的 Prometheus 实例监控 GreptimeDB 集群,包括配置 PodMonitor、启用指标收集和设置 Grafana 仪表板。
+---
+
+# 使用 Prometheus 监控 GreptimeDB Cluster
+
+在阅读本文档之前,请确保你已经了解如何[在 Kubernetes 上部署 GreptimeDB 集群](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md)。
+
+我们推荐使用[自监控方法](cluster-monitoring-deployment.md)来监控 GreptimeDB 集群,
+这种模式配置简单且提供了开箱即用的 Grafana 仪表板。
+但如果你已经在 Kubernetes 集群中部署了 Prometheus 实例,并希望将
+GreptimeDB 集群的监控指标写入该实例,请按照本文档中的步骤操作。
+
+## 检查 Prometheus 实例配置
+
+请先确保你已经部署 Prometheus Operator 并创建了 Prometheus 实例。例如,你可以使用 [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) 来部署 Prometheus 技术栈。请参考其[官方文档](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)了解更多详情。
+
+在部署 Prometheus 实例时,确保你设置了用于抓取 GreptimeDB 集群指标的标签。
+例如,你现有的 Prometheus 实例可能包含下面的配置:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: greptime-podmonitor
+  namespace: default
+spec:
+  selector:
+    matchLabels:
+      release: prometheus
+  # 其他配置... 
+``` + +当 `PodMonitor` 被部署后, +Prometheus Operator 会持续监视 `default` 命名空间中匹配 `spec.selector.matchLabels` 中定义的所有标签(在此示例中为 `release: prometheus`)的 Pod。 + +## 为 GreptimeDB 集群启用 `prometheusMonitor` + +使用 Helm Chart 部署 GreptimeDB 集群时, +在你的 `values.yaml` 文件中启用 `prometheusMonitor` 字段。例如: + +```yaml +prometheusMonitor: + # 启用 Prometheus 监控 - 这将创建 PodMonitor 资源 + enabled: true + # 配置抓取间隔 + interval: "30s" + # 配置标签 + labels: + release: prometheus +``` + +**重要:** `labels` 字段的值(`release: prometheus`) +必须与创建 Prometheus 实例时使用的 `matchLabels` 的值匹配, +否则指标收集将无法正常工作。 + +配置 `prometheusMonitor` 后, +GreptimeDB Operator 将自动创建 `PodMonitor` 资源并按指定的时间间隔 `interval` 将指标导入到 Prometheus。 +你可以使用以下命令检查 `PodMonitor` 资源: + +``` +kubectl get podmonitors.monitoring.coreos.com -n ${namespace} +``` + +:::note +如果你没有使用 Helm Chart 部署 GreptimeDB 集群, +可以在 `GreptimeDBCluster` YAML 中手动配置 Prometheus 监控: + +```yaml +apiVersion: greptime.io/v1alpha1 +kind: GreptimeDBCluster +metadata: + name: basic +spec: + base: + main: + image: greptime/greptimedb:latest + frontend: + replicas: 1 + meta: + replicas: 1 + backendStorage: + etcd: + endpoints: + - "etcd.etcd-cluster.svc.cluster.local:2379" + datanode: + replicas: 1 + prometheusMonitor: + enabled: true + interval: "30s" + labels: + release: prometheus +``` + +::: + +## Grafana 仪表板 + +你需要自己部署 Grafana, +然后导入仪表板。 + +### 添加数据源 + +部署 Grafana 后, +参考 Grafana 的[数据源](https://grafana.com/docs/grafana/latest/datasources/)文档添加以下两种类型的数据源: + +- **Prometheus**:将其命名为 `metrics`。此数据源连接到你的收集 GreptimeDB 集群监控指标的 Prometheus 实例,因此请使用你的 Prometheus 实例 URL 作为连接 URL。 +- **MySQL**:将其命名为 `information-schema`。此数据源连接到你的 GreptimeDB 集群,通过 SQL 协议访问集群元数据。如果你已经按照[在 Kubernetes 上部署 GreptimeDB 集群](/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster.md)指南部署了 GreptimeDB,服务器地址为 `${cluster-name}-frontend.${namespace}.svc.cluster.local:4002`,数据库为 `information_schema`。 + +### 导入仪表板 + +[GreptimeDB 集群指标仪表板](https://github.com/GreptimeTeam/greptimedb/tree/VAR::greptimedbVersion/grafana/dashboards/metrics/cluster)使用 `metrics` 和 `information-schema` 数据源来显示 GreptimeDB 集群指标。 + +请参考 Grafana 的[导入仪表板](https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/import-dashboards/)文档了解如何导入仪表板。 diff --git a/sidebars.ts b/sidebars.ts index 042b5a0e2..a26b8a55f 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -338,8 +338,9 @@ const sidebars: SidebarsConfig = { label: 'Overview', }, 'user-guide/deployments-administration/monitoring/check-db-status', - 'user-guide/deployments-administration/monitoring/cluster-monitoring-deployment', 'user-guide/deployments-administration/monitoring/standalone-monitoring', + 'user-guide/deployments-administration/monitoring/cluster-monitoring-deployment', + 'user-guide/deployments-administration/monitoring/monitor-cluster-with-prometheus', 'user-guide/deployments-administration/monitoring/key-logs', 'user-guide/deployments-administration/monitoring/tracing', 'user-guide/deployments-administration/monitoring/slow-query',