# Fill

Use this API to populate an existing dataset with new data. It lets you insert or append rows into a pre-defined dataset, so you can refresh data during a project and keep the ML models in Graphite Note supplied with up-to-date datasets.

```python
fill_url = "https://app.graphite-note.com/api/dataset-fill"
```

### **Fill a Dataset**

To populate a dataset, follow these steps:

1. Make a POST request to `/dataset-fill` with the required parameters in the request body.
2. Include the `user-code` and `dataset-code` to identify the dataset and the user making the request.
3. Define the structure of the data by specifying the `columns` parameter, which includes details like column names, aliases, types, subtypes, and optional formats.
4. Provide the data to be inserted via the `insert-data` parameter.

* If `compressed` is `false`, the data should be formatted as a **JSON-escaped string**.
* If `compressed` is `true`, the data should be **base64-encoded** after being gzipped.

{% hint style="info" %}
If you have a large dataset, consider calling `dataset-fill` in batches, for example 10,000 rows per request. The sample Python code at the end of this page is a working example.
{% endhint %}
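
The batching idea in the hint can be sketched for rows that are already held in memory. This is a minimal illustration, not part of the Graphite Note API: `iter_batches` is a hypothetical helper name, and 10,000 is simply the batch size suggested above.

```python
from itertools import islice


def iter_batches(rows, batch_size=10_000):
    """Yield successive lists containing at most batch_size rows each."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Illustrative data only: 25,000 two-column rows.
rows = [[str(i), i * 2] for i in range(25_000)]
batch_sizes = [len(batch) for batch in iter_batches(rows)]
print(batch_sizes)  # [10000, 10000, 5000]
```

Each yielded batch would then be sent in its own `dataset-fill` request.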

**Example of `insert-data` with JSON (when `compressed: false`):**

```json
{
  "insert-data": "[[\"536365\",\"85123A\",\"WHITE HANGING HEART T-LIGHT HOLDER\",6,\"2010-12-01 08:26:00\",2.55,17850], [\"536365\",\"71053\",\"WHITE METAL LANTERN\",6,\"2010-12-01 08:26:00\",3.39,17850]]",
  "compressed": false
}

```

5. Use the `append` parameter to indicate whether to append the data to the existing dataset (`true`) or truncate the dataset before inserting the new data (`false`).
6. Use the `compressed` parameter to specify whether the data is **gzip-compressed** (`true`) or not (`false`).
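
The steps above can be sketched as a small payload builder. This is an illustration under assumptions rather than an official client: the function names `encode_insert_data` and `build_fill_payload` are hypothetical, and the only request fields used are the ones described on this page.

```python
import base64
import gzip
import json


def encode_insert_data(rows, compressed):
    """Serialize rows to the string expected in `insert-data`."""
    json_string = json.dumps(rows)
    if not compressed:
        # Plain JSON string; it is escaped when the whole payload is serialized.
        return json_string
    # Gzip the JSON text first, then Base64-encode the compressed bytes.
    gzipped = gzip.compress(json_string.encode("utf-8"))
    return base64.b64encode(gzipped).decode("ascii")


def build_fill_payload(user_code, dataset_code, columns, rows,
                       compressed=True, append=True):
    """Assemble the request body for a POST to /dataset-fill."""
    return {
        "user-code": user_code,
        "dataset-code": dataset_code,
        "columns": columns,
        "insert-data": encode_insert_data(rows, compressed),
        "compressed": compressed,
        "append": append,
    }


# Placeholder codes and a single-column dataset, for illustration only.
columns = [{"name": "InvoiceNo", "alias": "InvoiceNo",
            "type": "dimension", "subtype": "text"}]
payload = build_fill_payload("YOUR-USER-CODE", "YOUR-DATASET-CODE",
                             columns, [["536365"], ["536366"]],
                             compressed=False, append=True)
print(json.dumps(payload, indent=2))
```

Setting `append=False` in this sketch corresponds to truncating the dataset before the new rows are inserted.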

### **Example Usage**

**Filling a Dataset with Base64-encoded Data**

For example, making a POST request to the following URL with the provided JSON body would result in the response below:

```json
POST /dataset-fill
Authorization: Bearer YOUR-TENANT-TOKEN

{
  "user-code": "0f02b4d4f9ae",
  "dataset-code": "eca2ad3940e3",
  "columns": [
    {
      "name": "InvoiceNo",
      "alias": "InvoiceNo",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "StockCode",
      "alias": "StockCode",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "Description",
      "alias": "Description",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "Quantity",
      "alias": "Quantity",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    },
    {
      "name": "InvoiceDate",
      "alias": "InvoiceDate",
      "type": "dimension",
      "subtype": "datetime",
      "format": "Y-m-d H:i:s"
    },
    {
      "name": "UnitPrice",
      "alias": "UnitPrice",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    },
    {
      "name": "CustomerID",
      "alias": "CustomerID",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    }
  ],
  "insert-data": "H4sIAAAAAAAACtT9W4/...NDtGHlBlNUP4lMXrO9OfaUk2XMReE/t///68rEPhOT+AC",
  "compressed": true,
  "append": true
}

```

**Response**

```json
{
  "data": {
    "status": "success",
    "details": {
      "dataset-code": "eca2ad3940e3",
      "rows-count": 518125
    }
  }
}

```

The response will confirm the status of the operation, including the dataset code and the number of rows inserted.
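
A caller might check the response body along these lines. This is a minimal sketch that relies only on the success shape shown above; the format of error responses is not documented on this page, so the sketch assumes that anything other than `"status": "success"` is a failure. `rows_inserted` is a hypothetical helper name.

```python
def rows_inserted(response_body):
    """Return `rows-count` from a parsed dataset-fill response, or raise."""
    data = response_body.get("data", {})
    if data.get("status") != "success":
        # Assumption: any other status means the fill did not succeed.
        raise RuntimeError(f"dataset-fill did not report success: {response_body}")
    return data["details"]["rows-count"]


# The example response from this page, as a parsed dictionary.
example = {
    "data": {
        "status": "success",
        "details": {"dataset-code": "eca2ad3940e3", "rows-count": 518125},
    }
}
print(rows_inserted(example))  # 518125
```

With the `requests` library, the parsed body would typically come from `response.json()`.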

***

### **Sample Python Code**

This Python script reads data from a PostgreSQL database, converts it to JSON, compresses it with gzip, and encodes it as a Base64 string, ready to be sent via API requests.

It then sends the data to Graphite Note in batches of 10,000 rows and finally calls `dataset-complete`.

```python
import psycopg2
import json
import gzip
import base64
import requests
from io import BytesIO
import time

db_config = {
    "host": "localhost",
    "user": "",
    "password": "",
    "port": "",
    "database": "dwh"
}

tenant_token = 'YOUR-TENANT-TOKEN'
dataset_code = "GN-DATASET-CODE"
user_code = "GN-USER-CODE"
fill_url = "https://app.graphite-note.com/api/dataset-fill"
complete_url = "https://app.graphite-note.com/api/dataset-complete"
batch_size = 10000

columns = [
    {"name": "order", "alias": "order_uuid", "type": "dimension", "subtype": "text"},
    {"name": "customer_id", "alias": "customer_id", "type": "dimension", "subtype": "text"},
    {"name": "number_of_items", "alias": "number_of_items", "type": "measure", "subtype": "numeric", "format": "#,###.##"},
]

headers = {
    "Authorization": f"Bearer {tenant_token}",
    "Content-Type": "application/json"
}

def compress_data(data):
    json_string = json.dumps(data)
    buffer = BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="w") as f:
        f.write(json_string.encode("utf-8"))
    return base64.b64encode(buffer.getvalue()).decode("utf-8")

def send_batch(batch, batch_number):
    print(f"\nSending batch #{batch_number} with {len(batch)} rows...")
    start_time = time.perf_counter()

    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code,
        "columns": columns,
        "insert-data": compress_data(batch),
        "compressed": True,
        "append": True
    }
    response = requests.post(fill_url, headers=headers, json=payload)

    end_time = time.perf_counter()
    elapsed = round(end_time - start_time, 2)

    print(f"Batch #{batch_number} response: {response.status_code} (Time: {elapsed} seconds)")
    if response.status_code != 200:
        print(response.text)

def send_completion():
    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code
    }
    response = requests.post(complete_url, headers=headers, json=payload)
    print(f"\nComplete response: {response.status_code}")
    if response.status_code != 200:
        print(response.text)

# === MAIN EXECUTION ===
try:
    conn = psycopg2.connect(**db_config)
    cursor = conn.cursor(name='stream_cursor')

    cursor.itersize = batch_size
    # "order" is a reserved word in PostgreSQL, so the column name must be double-quoted.
    cursor.execute('SELECT "order", customer_id, number_of_items FROM public.my_dwh_db')

    batch = []
    batch_number = 1

    for row in cursor:
        clean_row = [str(val) if not isinstance(val, (int, float)) else val for val in row]
        batch.append(clean_row)
        if len(batch) >= batch_size:
            send_batch(batch, batch_number)
            batch = []
            batch_number += 1

    if batch:
        send_batch(batch, batch_number)

    send_completion()

except Exception as e:
    print("Error:", e)

finally:
    if 'cursor' in locals():
        cursor.close()
    if 'conn' in locals():
        conn.close()

```
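
Before sending real data, it can help to confirm locally that a compressed batch decodes back to the original rows. The sketch below assumes the same encoding as `compress_data` in the script (JSON, then gzip, then Base64). `compress_rows` and `decompress_rows` are hypothetical helper names; this is a local sanity check and does not describe how Graphite Note decodes the data on its side.

```python
import base64
import gzip
import json


def compress_rows(rows):
    """JSON-encode, gzip, and Base64-encode a batch of rows."""
    raw = json.dumps(rows).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("utf-8")


def decompress_rows(encoded):
    """Reverse compress_rows: Base64-decode, gunzip, and parse the JSON."""
    raw = gzip.decompress(base64.b64decode(encoded))
    return json.loads(raw.decode("utf-8"))


# Illustrative rows only; the values are made up.
batch = [["a1b2", "c-001", 3], ["d4e5", "c-002", 1]]
encoded = compress_rows(batch)
round_trip_ok = decompress_rows(encoded) == batch
print(round_trip_ok)  # True
```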


