Fill

Use this API to populate an existing dataset with new data. It lets you insert or append rows into a pre-defined dataset, which is useful for keeping the datasets behind your Graphite Note ML models up to date as your projects evolve.

fill_url = "https://app.graphite-note.com/api/dataset-fill"

Fill a Dataset

To populate a dataset, follow these steps:

  1. Make a POST request to /dataset-fill with the required parameters in the request body.

  2. Include the user-code and dataset-code to identify the dataset and the user making the request.

  3. Define the structure of the data by specifying the columns parameter, which includes details like column names, aliases, types, subtypes, and optional formats.

  4. Provide the data to be inserted via the insert-data parameter.

  • If compressed is false, the data should be formatted as a JSON-escaped string.

  • If compressed is true, the data should be gzipped and then base64-encoded (a short encoding sketch follows this list).

If you have a large dataset, it is a good idea to call dataset-fill in batches of, for example, 10,000 rows. A working example is provided below.

Example of insert-data with JSON (when compressed: false):

{
  "insert-data": "[[\"536365\",\"85123A\",\"WHITE HANGING HEART T-LIGHT HOLDER\",6,\"2010-12-01 08:26:00\",2.55,17850], [\"536365\",\"71053\",\"WHITE METAL LANTERN\",6,\"2010-12-01 08:26:00\",3.39,17850]]",
  "compressed": false
}
  5. Use the append parameter to indicate whether to append the data to the existing dataset (true) or truncate the dataset before inserting the new data (false).

  6. Use the compressed parameter to specify whether the data is gzip-compressed (true) or not (false).
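
As a reference for step 6, here is a minimal Python sketch of how the insert-data value can be built when compressed is true. The rows list is hypothetical and mirrors the columns defined for the dataset; only the standard library is used:

import base64
import gzip
import json

# Hypothetical rows matching the dataset's column definitions
rows = [
    ["536365", "85123A", "WHITE HANGING HEART T-LIGHT HOLDER", 6, "2010-12-01 08:26:00", 2.55, 17850],
    ["536365", "71053", "WHITE METAL LANTERN", 6, "2010-12-01 08:26:00", 3.39, 17850]
]

# Serialize to JSON, gzip the bytes, then Base64-encode the compressed payload
insert_data = base64.b64encode(gzip.compress(json.dumps(rows).encode("utf-8"))).decode("utf-8")

payload_fragment = {
    "insert-data": insert_data,
    "compressed": True
}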

Example Usage

Filling a Dataset with Base64-encoded Data

For example, making a POST request to /dataset-fill with the JSON body shown here would produce the response below:

POST /dataset-fill
Authorization: Bearer YOUR-TENANT-TOKEN

{
  "user-code": "0f02b4d4f9ae",
  "dataset-code": "eca2ad3940e3",
  "columns": [
    {
      "name": "InvoiceNo",
      "alias": "InvoiceNo",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "StockCode",
      "alias": "StockCode",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "Description",
      "alias": "Description",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "Quantity",
      "alias": "Quantity",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    },
    {
      "name": "InvoiceDate",
      "alias": "InvoiceDate",
      "type": "dimension",
      "subtype": "datetime",
      "format": "Y-d-m H:i:s"
    },
    {
      "name": "UnitPrice",
      "alias": "UnitPrice",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    },
    {
      "name": "CustomerID",
      "alias": "CustomerID",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    }
  ],
  "insert-data": "H4sIAAAAAAAACtT9W4/...NDtGHlBlNUP4lMXrO9OfaUk2XMReE/t///68rEPhOT+AC"
  "compressed": true,
  "append": true
}

Response

{
  "data": {
    "status": "success",
    "details": {
      "dataset-code": "eca2ad3940e3",
      "rows-count": 518125
    }
  }
}

The response will confirm the status of the operation, including the dataset code and the number of rows inserted.
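
For illustration, assuming the request was sent with the requests library, the result could be checked like this (response is the object returned by requests.post):

# `response` is the requests.Response returned by the POST to /dataset-fill
result = response.json()["data"]
if result["status"] == "success":
    print("Dataset:", result["details"]["dataset-code"])
    print("Rows inserted:", result["details"]["rows-count"])
else:
    print("Fill failed:", result)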

Sample Python Code

This Python script reads data from a PostgreSQL database, converts it to JSON, compresses it with gzip, and encodes it as a Base64 string ready to be sent via API requests.

It then sends the data to Graphite Note in batches of 10,000 rows and calls the dataset-complete endpoint once all batches have been loaded.

import psycopg2
import json
import gzip
import base64
import requests
from io import BytesIO
import time

db_config = {
    "host": "localhost",
    "user": "",
    "password": "",
    "port": "",
    "database": "dwh"
}

tenant_token = 'YOUR-TENANT-TOKEN'
dataset_code = "GN-DATASET-CODE"
user_code = "GN-USER-CODE"
fill_url = "https://app.graphite-note.com/api/dataset-fill"
complete_url = "https://app.graphite-note.com/api/dataset-complete"
batch_size = 10000

columns = [
    {"name": "order", "alias": "order_uuid", "type": "dimension", "subtype": "text"},
    {"name": "customer_id", "alias": "customer_id", "type": "dimension", "subtype": "text"},
    {"name": "number_of_items", "alias": "number_of_items", "type": "measure", "subtype": "numeric", "format": "#,###.##"},
]

headers = {
    "Authorization": f"Bearer {tenant_token}",
    "Content-Type": "application/json"
}

# Gzip-compress the JSON-serialized batch and return it as a Base64 string
def compress_data(data):
    json_string = json.dumps(data)
    buffer = BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="w") as f:
        f.write(json_string.encode("utf-8"))
    return base64.b64encode(buffer.getvalue()).decode("utf-8")

# POST one batch of rows to the dataset-fill endpoint
def send_batch(batch, batch_number):
    print(f"\nSending batch #{batch_number} with {len(batch)} rows...")
    start_time = time.perf_counter()

    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code,
        "columns": columns,
        "insert-data": compress_data(batch),
        "compressed": True,
        "append": True
    }
    response = requests.post(fill_url, headers=headers, json=payload)

    end_time = time.perf_counter()
    elapsed = round(end_time - start_time, 2)

    print(f"Batch #{batch_number} response: {response.status_code} (Time: {elapsed} seconds)")
    if response.status_code != 200:
        print(response.text)

# Call the dataset-complete endpoint once all batches have been sent
def send_completion():
    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code
    }
    response = requests.post(complete_url, headers=headers, json=payload)
    print(f"\nComplete response: {response.status_code}")
    if response.status_code != 200:
        print(response.text)

# === MAIN EXECUTION ===
try:
    conn = psycopg2.connect(**db_config)
    # Named (server-side) cursor so rows are streamed instead of loaded all at once
    cursor = conn.cursor(name='stream_cursor')

    cursor.itersize = batch_size
    # "order" is a reserved word in SQL, so it must be quoted in the query
    cursor.execute('SELECT "order", customer_id, number_of_items FROM public.my_dwh_db')

    batch = []
    batch_number = 1

    for row in cursor:
        # Convert non-numeric values to strings so each row is JSON-serializable
        clean_row = [str(val) if not isinstance(val, (int, float)) else val for val in row]
        batch.append(clean_row)
        if len(batch) >= batch_size:
            send_batch(batch, batch_number)
            batch = []
            batch_number += 1

    if batch:
        send_batch(batch, batch_number)

    send_completion()

except Exception as e:
    print("Error:", e)

finally:
    if 'cursor' in locals():
        cursor.close()
    if 'conn' in locals():
        conn.close()
