88 changes: 86 additions & 2 deletions pipeline/outputs/s3.md
@@ -6,7 +6,7 @@ description: Send logs, data, and metrics to Amazon S3

![AWS logo](../../.gitbook/assets/image%20(9).png)

- The _Amazon S3_ output plugin lets you ingest records into the [S3](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) cloud object store.
+ The _Amazon S3_ output plugin lets you ingest records into the [S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) cloud object store.

The plugin can upload data to S3 using the [multipart upload API](https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html) or [`PutObject`](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html). Multipart is the default and is recommended. Fluent Bit will stream data in a series of _parts_. This limits the amount of data buffered on disk at any point in time. By default, every time 5 MiB of data have been received, a new part will be uploaded. The plugin can create files up to gigabytes in size from many small chunks or parts using the multipart API. All aspects of the upload process are configurable.
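Both the part size and the final object size described above are tunable. A minimal sketch, assuming the plugin's documented `upload_chunk_size` and `total_file_size` options (bucket name and values here are illustrative):

```yaml
pipeline:
  outputs:
    - name: s3
      match: '*'
      bucket: my-example-bucket      # illustrative bucket name
      region: us-east-1
      upload_chunk_size: 10M         # size of each multipart part (default 5M)
      total_file_size: 100M          # target size of each completed S3 object
```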

@@ -36,7 +36,7 @@ The [Prometheus success/retry/error metrics values](../../administration/monitor
| `blob_database_file` | Absolute path to a database file to be used to store blob files contexts. | _none_ |
| `bucket` | S3 Bucket name | _none_ |
| `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
- | `compression` | Compression type for S3 objects. `gzip`, `arrow`, `parquet` and `zstd` are the supported values, `arrow` and `parquet` are only available if Apache Arrow was enabled at compile time. Defaults to no compression. | _none_ |
+ | `compression` | Compression/format for S3 objects. Supported: `gzip` (always available) and `parquet` (requires Arrow build). For `gzip`, the `Content-Encoding` header is set to `gzip`. `parquet` is available **only when Fluent Bit is built with `-DFLB_ARROW=On`** and Arrow GLib/Parquet GLib are installed. Parquet is typically used with `use_put_object On`. | _none_ |
| `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
| `endpoint` | Custom endpoint for the S3 API. Endpoints can contain scheme and port. | _none_ |
| `external_id` | Specify an external ID for the STS API. Can be used with the `role_arn` parameter if your role requires an external ID. | _none_ |
@@ -639,6 +639,7 @@ After being compiled, Fluent Bit can upload incoming data to S3 in Apache Arrow
For example:

{% tabs %}

⚠️ Potential issue | 🟡 Minor

Remove extraneous blank line in tab markup.

Line 642 contains a blank line immediately after the opening {% tabs %} tag, which appears to be a formatting artifact. This extra whitespace can affect the rendered tab display.

  {% tabs %}
-
  {% tab title="fluent-bit.yaml" %}

{% tab title="fluent-bit.yaml" %}

```yaml
@@ -695,3 +696,86 @@ The following example uses `pyarrow` to analyze the uploaded data:
3 2021-04-27T09:33:56.539430Z 0.0 0.0 0.0 0.0 0.0 0.0
4 2021-04-27T09:33:57.539803Z 0.0 0.0 0.0 0.0 0.0 0.0
```

## Enable Parquet support

### Build requirements for Parquet

To enable Parquet, build Fluent Bit with Apache Arrow support and install Arrow GLib/Parquet GLib:

```bash
# Ubuntu/Debian example
sudo apt-get update
sudo apt-get install -y -V ca-certificates lsb-release wget
wget https://packages.apache.org/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt-get install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt-get update
sudo apt-get install -y -V libarrow-glib-dev libparquet-glib-dev

# Build Fluent Bit with Arrow:
cd build/
cmake -DFLB_ARROW=On ..
cmake --build .
```

For other Linux distributions, see the [Apache Arrow installation instructions](https://arrow.apache.org/install/); Apache Parquet GLib is part of the Apache Arrow project.

### Testing Parquet support

Example configuration:

{% tabs %}

{% tab title="fluent-bit.yaml" %}

```yaml
service:
  flush: 5
  daemon: Off
  log_level: debug
  http_server: Off

pipeline:
  inputs:
    - name: dummy
      tag: dummy.local
      dummy: '{"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}'

  outputs:
    - name: s3
      match: dummy*
      region: us-east-2
      bucket: <your_testing_bucket>
      use_put_object: On
      compression: parquet
      # other parameters
```

{% endtab %}
{% tab title="fluent-bit.conf" %}

```text
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    HTTP_Server  Off

[INPUT]
    Name   dummy
    Tag    dummy.local
    Dummy  {"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}

[OUTPUT]
    Name            s3
    Match           dummy*
    Region          us-east-2
    Bucket          <your_testing_bucket>
    Use_Put_Object  On
    Compression     parquet
    # other parameters
```

{% endtab %}
{% endtabs %}