Fill

Use this API to populate an existing dataset with new data. It lets you insert or append rows into a pre-defined dataset, which is useful for keeping the datasets behind your Graphite Note ML models up to date during your projects.

fill_url = "https://app.graphite-note.com/api/dataset-fill"

Fill a Dataset

To populate a dataset, follow these steps:

  1. Make a POST request to /dataset-fill with the required parameters in the request body.

  2. Include the user-code and dataset-code to identify the dataset and the user making the request.

  3. Define the structure of the data by specifying the columns parameter, which includes details like column names, aliases, types, subtypes, and optional formats.

  4. Provide the data to be inserted via the insert-data parameter.

  • If compressed is false, the data should be formatted as a JSON-escaped string.

  • If compressed is true, the data should be gzipped and then base64-encoded (see the encoding sketch after these steps).

If you have a large dataset, it is a good idea to call dataset-fill in batches of, for example, 10,000 rows. A working example is provided below.

Example of insert-data with JSON (when compressed: false):

{
  "insert-data": "[[\"536365\",\"85123A\",\"WHITE HANGING HEART T-LIGHT HOLDER\",6,\"2010-12-01 08:26:00\",2.55,17850], [\"536365\",\"71053\",\"WHITE METAL LANTERN\",6,\"2010-12-01 08:26:00\",3.39,17850]]",
  "compressed": false
}
  5. Use the append parameter to indicate whether to append the data to the existing dataset (true) or truncate the dataset before inserting the new data (false).

  6. Use the compressed parameter to specify whether the data is gzip-compressed (true) or not (false).
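As a minimal illustration of the two insert-data encodings described above, the Python sketch below builds the payload both ways. The user-code, dataset-code, and column list are placeholders; adapt them to your own dataset.

import json
import gzip
import base64

# Rows to insert, in the same order as the columns defined for the dataset
rows = [
    ["536365", "85123A", "WHITE HANGING HEART T-LIGHT HOLDER", 6, "2010-12-01 08:26:00", 2.55, 17850],
    ["536365", "71053", "WHITE METAL LANTERN", 6, "2010-12-01 08:26:00", 3.39, 17850]
]

# compressed: false -- insert-data is simply the rows serialized as a JSON string
insert_data_plain = json.dumps(rows)

# compressed: true -- the same JSON string, gzipped and then base64-encoded
insert_data_compressed = base64.b64encode(
    gzip.compress(json.dumps(rows).encode("utf-8"))
).decode("utf-8")

payload = {
    "user-code": "YOUR-USER-CODE",        # placeholder
    "dataset-code": "YOUR-DATASET-CODE",  # placeholder
    "columns": [],                        # column definitions as described above
    "insert-data": insert_data_compressed,
    "compressed": True,
    "append": True
}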

Example Usage

Filling a Dataset with Base64-encoded Data

For example, making a POST request to the following URL with the provided JSON body would result in the response below:

POST /dataset-fill
Authorization: Bearer YOUR-TENANT-TOKEN

{
  "user-code": "0f02b4d4f9ae",
  "dataset-code": "eca2ad3940e3",
  "columns": [
    {
      "name": "InvoiceNo",
      "alias": "InvoiceNo",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "StockCode",
      "alias": "StockCode",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "Description",
      "alias": "Description",
      "type": "dimension",
      "subtype": "text"
    },
    {
      "name": "Quantity",
      "alias": "Quantity",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    },
    {
      "name": "InvoiceDate",
      "alias": "InvoiceDate",
      "type": "dimension",
      "subtype": "datetime",
      "format": "Y-d-m H:i:s"
    },
    {
      "name": "UnitPrice",
      "alias": "UnitPrice",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    },
    {
      "name": "CustomerID",
      "alias": "CustomerID",
      "type": "measure",
      "subtype": "numeric",
      "format": "#,###.##"
    }
  ],
  "insert-data": "H4sIAAAAAAAACtT9W4/...NDtGHlBlNUP4lMXrO9OfaUk2XMReE/t///68rEPhOT+AC"
  "compressed": true,
  "append": true
}

Response

{
  "data": {
    "status": "success",
    "details": {
      "dataset-code": "eca2ad3940e3",
      "rows-count": 518125
    }
  }
}

The response will confirm the status of the operation, including the dataset code and the number of rows inserted.
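As a minimal sketch, assuming payload and headers have been prepared as in the example above, the response can be checked in Python like this:

import requests

fill_url = "https://app.graphite-note.com/api/dataset-fill"

# 'payload' and 'headers' are assumed to be built as shown in the example above
response = requests.post(fill_url, headers=headers, json=payload)
response.raise_for_status()

details = response.json()["data"]["details"]
print(details["dataset-code"], details["rows-count"])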


Sample Python Code

This Python script reads data from a PostgreSQL database, converts it to JSON, compresses it with gzip, and encodes it as a Base64 string, ready to be sent via the API.

It then sends the data to Graphite Note in batches of 10,000 rows and calls the dataset-complete endpoint once all batches have been uploaded.

import psycopg2
import json
import gzip
import base64
import requests
from io import BytesIO
import time

db_config = {
    "host": "localhost",
    "user": "",
    "password": "",
    "port": "",
    "database": "dwh"
}

tenant_token = 'YOUR-TOKEN'
dataset_code = "GN-DATASET-CODE"
user_code = "GN-USER-CODE"
fill_url = "https://app.graphite-note.com/api/dataset-fill"
complete_url = "https://app.graphite-note.com/api/dataset-complete"
batch_size = 10000

columns = [
    {"name": "order", "alias": "order_uuid", "type": "dimension", "subtype": "text"},
    {"name": "customer_id", "alias": "customer_id", "type": "dimension", "subtype": "text"},
    {"name": "number_of_items", "alias": "number_of_items", "type": "measure", "subtype": "numeric", "format": "#,###.##"},
]

headers = {
    "Authorization": f"Bearer {tenant_token}",
    "Content-Type": "application/json"
}

def compress_data(data):
    json_string = json.dumps(data)
    buffer = BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="w") as f:
        f.write(json_string.encode("utf-8"))
    return base64.b64encode(buffer.getvalue()).decode("utf-8")

def send_batch(batch, batch_number):
    print(f"\nSending batch #{batch_number} with {len(batch)} rows...")
    start_time = time.perf_counter()

    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code,
        "columns": columns,
        "insert-data": compress_data(batch),
        "compressed": True,
        "append": True
    }
    response = requests.post(fill_url, headers=headers, json=payload)

    end_time = time.perf_counter()
    elapsed = round(end_time - start_time, 2)

    print(f"Batch #{batch_number} response: {response.status_code} (Time: {elapsed} seconds)")
    if response.status_code != 200:
        print(response.text)

def send_completion():
    payload = {
        "user-code": user_code,
        "dataset-code": dataset_code
    }
    response = requests.post(complete_url, headers=headers, json=payload)
    print(f"\nComplete response: {response.status_code}")
    if response.status_code != 200:
        print(response.text)

# === MAIN EXECUTION ===
try:
    conn = psycopg2.connect(**db_config)
    # Named (server-side) cursor streams rows instead of loading the whole table into memory
    cursor = conn.cursor(name='stream_cursor')

    cursor.itersize = batch_size
    # "order" is a reserved word in SQL, so it must be double-quoted
    cursor.execute('SELECT "order", customer_id, number_of_items FROM public.my_dwh_db')

    batch = []
    batch_number = 1

    for row in cursor:
        clean_row = [str(val) if not isinstance(val, (int, float)) else val for val in row]
        batch.append(clean_row)
        if len(batch) >= batch_size:
            send_batch(batch, batch_number)
            batch = []
            batch_number += 1

    if batch:
        send_batch(batch, batch_number)

    send_completion()

except Exception as e:
    print("Error:", e)

finally:
    if 'cursor' in locals():
        cursor.close()
    if 'conn' in locals():
        conn.close()