Fill
Use this API to populate an existing dataset with new data. It lets you insert or append rows into a pre-defined dataset, which is useful for keeping the datasets behind your Graphite Note ML models up to date as your projects evolve.
fill_url = "https://app.graphite-note.com/api/dataset-fill"
Fill a Dataset
To populate a dataset, follow these steps:

1. Make a POST request to /dataset-fill with the required parameters in the request body.
2. Include the user-code and dataset-code to identify the dataset and the user making the request.
3. Define the structure of the data by specifying the columns parameter, which includes details such as column names, aliases, types, subtypes, and optional formats.
4. Provide the data to be inserted via the insert-data parameter (a minimal payload sketch follows this list):
   - If compressed is false, the data should be formatted as a JSON-escaped string.
   - If compressed is true, the data should be gzipped and then base64-encoded.

   Example of insert-data with JSON (when compressed: false):

   {
     "insert-data": "[[\"536365\",\"85123A\",\"WHITE HANGING HEART T-LIGHT HOLDER\",6,\"2010-12-01 08:26:00\",2.55,17850], [\"536365\",\"71053\",\"WHITE METAL LANTERN\",6,\"2010-12-01 08:26:00\",3.39,17850]]",
     "compressed": false
   }

5. Use the append parameter to indicate whether to append the data to the existing dataset (true) or truncate the dataset before inserting the new data (false).
6. Use the compressed parameter to specify whether the data is gzip compressed (true) or not (false).
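As a rough illustration, the sketch below shows how the insert-data field can be prepared in Python for both modes, using the two example rows above. This is a minimal sketch, not the full request body: only the fields related to data encoding are shown, and the variable names are illustrative.

import base64
import gzip
import json

# Two example rows, matching the structure shown above
rows = [
    ["536365", "85123A", "WHITE HANGING HEART T-LIGHT HOLDER", 6, "2010-12-01 08:26:00", 2.55, 17850],
    ["536365", "71053", "WHITE METAL LANTERN", 6, "2010-12-01 08:26:00", 3.39, 17850],
]

# compressed: false -- insert-data is simply the rows serialized as a JSON string
plain_fields = {
    "insert-data": json.dumps(rows),
    "compressed": False,
    "append": True
}

# compressed: true -- gzip the JSON string, then Base64-encode the gzipped bytes
gzipped = gzip.compress(json.dumps(rows).encode("utf-8"))
compressed_fields = {
    "insert-data": base64.b64encode(gzipped).decode("utf-8"),
    "compressed": True,
    "append": True
}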
Example Usage
Filling a Dataset with Base64-encoded Data
For example, making a POST request to the following URL with the provided JSON body would result in the response below:
POST /dataset-fill
Authorization: Bearer YOUR-TENANT-TOKEN
{
  "user-code": "0f02b4d4f9ae",
  "dataset-code": "eca2ad3940e3",
  "columns": [
    {
      "name": "InvoiceNo",
      "alias": "InvoiceNo",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "StockCode",
      "alias": "StockCode",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "Description",
      "alias": "Description",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "Quantity",
      "alias": "Quantity",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    },
    {
      "name": "InvoiceDate",
      "alias": "InvoiceDate",
      "type": "dimension",
      "subtype": "datetime",
      "format": "Y-d-m H:i:s"
    },
    {
      "name": "UnitPrice",
      "alias": "UnitPrice",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    },
    {
      "name": "CustomerID",
      "alias": "CustomerID",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    }
  ],
  "insert-data": "H4sIAAAAAAAACtT9W4/...NDtGHlBlNUP4lMXrO9OfaUk2XMReE/t///68rEPhOT+AC",
  "compressed": true,
  "append": true
}
Response
{
  "data": {
    "status": "success",
    "details": {
      "dataset-code": "eca2ad3940e3",
      "rows-count": 518125
    }
  }
}
The response will confirm the status of the operation, including the dataset code and the number of rows inserted.
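For reference, here is a minimal Python sketch of reading those fields, with the example response body above hard-coded as a dict literal (how you parse the response in your own client is up to you):

# A minimal sketch of reading the fill response shown above
fill_response = {
    "data": {
        "status": "success",
        "details": {
            "dataset-code": "eca2ad3940e3",
            "rows-count": 518125
        }
    }
}

details = fill_response["data"]["details"]
if fill_response["data"]["status"] == "success":
    print(f"Filled dataset {details['dataset-code']} with {details['rows-count']} rows")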
Sample Python Code
This Python script reads data from a PostgreSQL database, converts it to JSON, compresses it with gzip, and encodes it as a Base64 string, ready to be sent via API requests.
It then sends the data to Graphite Note in batches of 10,000 rows.
import psycopg2
import json
import gzip
import base64
import requests
from io import BytesIO
import time

# Source database connection settings
db_config = {
    "host": "localhost",
    "user": "",
    "password": "",
    "port": "",
    "database": "dwh"
}

# Graphite Note API settings
tenant_token = 'YOUR-TENANT-TOKEN'
dataset_code = "GN-DATASET-CODE"
user_code = "GN-USER-CODE"
fill_url = "https://app.graphite-note.com/api/dataset-fill"
complete_url = "https://app.graphite-note.com/api/dataset-complete"
batch_size = 10000

# Column definitions matching the dataset structure in Graphite Note
columns = [
    {"name": "order", "alias": "order_uuid", "type": "dimension", "subtype": "text"},
    {"name": "customer_id", "alias": "customer_id", "type": "dimension", "subtype": "text"},
    {"name": "number_of_items", "alias": "number_of_items", "type": "measure", "subtype": "numeric", "format": "#,###.##"},
]

headers = {
    "Authorization": f"Bearer {tenant_token}",
    "Content-Type": "application/json"
}

def compress_data(data):
    # Serialize to JSON, gzip it, and return the result as a Base64 string
    json_string = json.dumps(data)
    buffer = BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="w") as f:
        f.write(json_string.encode("utf-8"))
    return base64.b64encode(buffer.getvalue()).decode("utf-8")

def send_batch(batch, batch_number):
    # Send one batch of rows to the dataset-fill endpoint
    print(f"\nSending batch #{batch_number} with {len(batch)} rows...")
    start_time = time.perf_counter()
    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code,
        "columns": columns,
        "insert-data": compress_data(batch),
        "compressed": True,
        "append": True
    }
    response = requests.post(fill_url, headers=headers, json=payload)
    end_time = time.perf_counter()
    elapsed = round(end_time - start_time, 2)
    print(f"Batch #{batch_number} response: {response.status_code} (Time: {elapsed} seconds)")
    if response.status_code != 200:
        print(response.text)

def send_completion():
    # Tell Graphite Note that all batches have been sent
    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code
    }
    response = requests.post(complete_url, headers=headers, json=payload)
    print(f"\nComplete response: {response.status_code}")
    if response.status_code != 200:
        print(response.text)

# === MAIN EXECUTION ===
try:
    conn = psycopg2.connect(**db_config)
    # Named (server-side) cursor streams rows instead of loading them all into memory
    cursor = conn.cursor(name='stream_cursor')
    cursor.itersize = batch_size
    # "order" is a reserved word in PostgreSQL, so the column name must be quoted
    cursor.execute('SELECT "order", customer_id, number_of_items FROM public.my_dwh_db')
    batch = []
    batch_number = 1
    for row in cursor:
        # Convert non-numeric values to strings so every row is JSON-serializable
        clean_row = [str(val) if not isinstance(val, (int, float)) else val for val in row]
        batch.append(clean_row)
        if len(batch) >= batch_size:
            send_batch(batch, batch_number)
            batch = []
            batch_number += 1
    if batch:
        send_batch(batch, batch_number)
    send_completion()
except Exception as e:
    print("Error:", e)
finally:
    if 'cursor' in locals():
        cursor.close()
    if 'conn' in locals():
        conn.close()