SAP-RPT-1: SAP's Bold Bet on Tabular Models


Key Takeaways

1. SAP-RPT-1, a foundational model, aims to transform the analysis of tabular data with a transformer-inspired architecture.

2. SAP is investing in tabular models to reduce costs and enhance long-term predictive performance.

3. The SAP-RPT-1 suite offers models tailored to various needs, including an open-source version for developers.

💡 Why it matters: SAP-RPT-1 could redefine enterprise tabular data management, optimizing resources and opening new business opportunities.

SAP-RPT-1: A Revolution in Tabular Data Processing


SAP-RPT-1 belongs to the family of foundational models: models pretrained on vast datasets so that a single model can accomplish a variety of tasks. Current foundational models, most of them based on the transformer architecture, have been popularized by companies such as Google and OpenAI. Although they require considerable resources to develop, these models stand out for their ability to predict accurately at scale, to perform tasks they were not specifically trained for, and to handle various types of data, whether textual, visual, or auditory.

Language models like ChatGPT are trained primarily on text, so extending the same principles to other types of data, tabular data in particular, is an increasingly relevant idea. Some companies, aware of the potential of tabular data, are investing in foundational models tailored to it. The goal is to turn a high upfront investment into significant future benefits, such as improved predictive performance, increased productivity, and new revenue streams. Historically, each use case required its own tabular model, which drove up costs for companies with many AI applications. A foundational model, by contrast, could address multiple needs at once through its generalization capabilities, offering a more economical and efficient solution.

SAP's Initiative in Tabular Models

SAP, globally recognized for its Enterprise Resource Planning (ERP) software, has recently garnered attention by launching its own range of Relational Pretrained Transformer (RPT) models. These models are designed to be trained on vast historical datasets covering various business sectors. This article examines how SAP's approach to tabular foundational models could transform the way businesses use these technologies. We will explore the evolution of RPT models, their technical architecture, a practical demonstration in Python, and the potential advantages and disadvantages of these models. Finally, we will discuss future strategies for tabular foundational models at enterprise providers like SAP.

Relational Pretrained Transformers

The Path to RPT at SAP

SAP is a major player in the ERP software space, helping businesses efficiently manage critical processes across various domains such as sales, finance, human resources, and logistics. For several years, SAP has been investing in artificial intelligence, offering its clients two main types of AI models: those optimized for the ABAP language and the S/4HANA database, and narrow AI models on SAP's Business Technology Platform. With the rise of ChatGPT, SAP has expanded its offerings with conversational and generative AI solutions under the Joule brand, tailored for specific use cases like knowledge retrieval and code generation. SAP also provides integrations with third-party pretrained model providers, such as OpenAI and Anthropic, via the Generative AI hub. The launch of the SAP-RPT-1 suite marks a new milestone with tabular foundational models trained on SAP's vast ERP data.

Technical Architecture of SAP-RPT-1

SAP's Relational Pretrained Transformer (RPT) adapts the classical transformer architecture to tabular data. The initial SAP-RPT-1 models use the ConTextTab architecture, described by Spinaci et al. (2025), which is inspired by the TabPFN architecture proposed by Hollmann et al. (2022).

TabPFN is a transformer model pretrained on synthetic tables that are generated to encode a variety of causal relationships between columns. Even though it is trained only on synthetic data, TabPFN can outperform other models on relatively small tables, with fewer than 10,000 rows, including tables that contain missing values and outliers. Through in-context learning (ICL), TabPFN can generalize to various classification tasks without additional hyperparameter optimization. However, relying exclusively on synthetic data prevents the model from capturing the semantics found in real datasets. ConTextTab addresses this issue by training the transformer on real data and using semantic embeddings for categorical and textual data.

The initial SAP-RPT-1 suite includes three models:

sap-rpt-1-small: a lightweight model for rapid inference and prototyping.
sap-rpt-1-large: a larger model for enhanced predictive performance.
sap-rpt-1-oss: an open-source version available on HuggingFace and GitHub.

These models can be used for various classification and regression tasks using in-context learning with few examples. A free trial version of SAP-RPT-1 is available for non-productive evaluations in a sandbox environment.

Practical Implementation: Using SAP-RPT-1

To access the free trial version of SAP-RPT-1, simply log in via a specific link. Once logged in, a personal API token is generated, which should be saved in an access_token.json file for later use.

Next, create a CSV file named sales_data_test.csv with the provided data. This fictional dataset is also accessible from the SAP-RPT-1 sandbox environment.

Table 1: Test Dataset

The task is to predict the values of the SALESGROUP column (indicated by [PREDICT]) from the other columns. Because SAP-RPT-1 relies on few-shot ICL, the input must contain both complete context rows, whose SALESGROUP value is known, and query rows, whose SALESGROUP value is set to [PREDICT].
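To make the distinction between context rows and query rows concrete, here is a minimal sketch that separates the two groups in a list of row dictionaries. The function name split_context_and_query and the sample values are hypothetical and chosen only for illustration; only the [PREDICT] marker and the column names come from the example above.

```python
from typing import Any

PREDICT_MARKER = "[PREDICT]"  # placeholder used in the article's dataset


def split_context_and_query(
    rows: list[dict[str, Any]], target: str
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate rows with a known target (context) from rows to predict (query)."""
    context = [row for row in rows if row.get(target) != PREDICT_MARKER]
    query = [row for row in rows if row.get(target) == PREDICT_MARKER]
    return context, query


# Illustrative rows; the values are made up for this sketch.
rows = [
    {"id": 1, "PRODUCT": "Laptop", "COUNTRY": "USA", "SALESGROUP": "Enterprise"},
    {"id": 2, "PRODUCT": "Monitor", "COUNTRY": "Germany", "SALESGROUP": "Retail"},
    {"id": 3, "PRODUCT": "Laptop", "COUNTRY": "USA", "SALESGROUP": PREDICT_MARKER},
]

context, query = split_context_and_query(rows, target="SALESGROUP")
print(len(context), "context rows;", len(query), "query rows")
```

A check like this is a quick way to confirm that a dataset contains at least one row of each kind before it is sent to the model.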

Constructing and Sending the Prediction Request

The SAP-RPT-1 model requires that the request payload be formatted with two main keys: rows and index_column. The rows must be a list of dictionary objects representing the rows of the input data table, and index_column must be the name of the index column.
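The two keys described above can be assembled and checked locally before anything is sent. The following is a sketch under stated assumptions: build_payload is a hypothetical helper, and the uniqueness check on the index column is an assumption about what a sensible index looks like, not a documented requirement of the service.

```python
from typing import Any


def build_payload(rows: list[dict[str, Any]], index_column: str) -> dict[str, Any]:
    """Assemble the request payload after a few local sanity checks."""
    if not rows:
        raise ValueError("rows must not be empty")
    # Every row needs a value for the index column.
    missing = [i for i, row in enumerate(rows) if index_column not in row]
    if missing:
        raise ValueError(f"rows at positions {missing} lack '{index_column}'")
    # Assumption: index values should identify rows uniquely.
    ids = [row[index_column] for row in rows]
    if len(set(ids)) != len(ids):
        raise ValueError(f"values in '{index_column}' must be unique")
    return {"rows": rows, "index_column": index_column}
```

Catching these problems locally gives a clearer error message than a generic validation failure returned by the server.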

Here’s how to create the request payload from sales_data_test.csv:

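import pandas as pd

df = pd.read_csv("sales_data_test.csv")  # Load the CSV file
rows = df.to_dict(orient="records")  # Convert to a list of dicts, one per row
index_column = "id"  # Name of the index column

# Assemble the payload with the two required keys
payload = {
    "rows": rows,
    "index_column": index_column,
}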

The payload should look like this:

{
  "rows": [
    {
      "PRODUCT": "Laptop", "PRICE": 999.99,
      "CUSTOMER": "Acme Corp", "COUNTRY": "USA",
      "SALESGROUP": "[PREDICT]"
    }
  ],
  "index_column": "id"
}

Next, set the HTTP request headers:

import json

# Load the token
with open("access_token.json", "r") as token_file:
    token_data = json.load(token_file)
AUTH_TOKEN = token_data["access_token"]

# Define the HTTP request headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {AUTH_TOKEN}",
}

Finally, send the POST request to obtain predictions:

import requests

url = "https://rpt.cloud.sap/api/predict"
response = requests.post(url, json=payload, headers=headers)
print(response.json())
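The call above prints the response without first checking whether the request succeeded. A more defensive variant might look like the sketch below. The function request_predictions, its post parameter (which lets the HTTP call be swapped out for testing), and the 30-second timeout are all assumptions made for illustration; the shape of the JSON that the service returns is not covered here.

```python
from typing import Any, Callable, Optional


def request_predictions(
    url: str,
    payload: dict[str, Any],
    headers: dict[str, str],
    post: Optional[Callable[..., Any]] = None,
    timeout: float = 30.0,
) -> Any:
    """POST the payload and return the decoded JSON, or raise a readable error."""
    if post is None:
        import requests  # imported lazily so the function can be tested offline

        post = requests.post
    response = post(url, json=payload, headers=headers, timeout=timeout)
    if response.status_code == 400:
        raise ValueError(f"Bad Request (400): check the payload format. {response.text}")
    if response.status_code == 401:
        raise PermissionError("Unauthorized (401): check the access token.")
    response.raise_for_status()  # raises for any other 4xx/5xx status
    return response.json()
```

With the variables defined earlier, the call would be request_predictions(url, payload, headers), and the result can be printed or processed as before.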

In case the request fails, common errors include:

Bad Request (error code 400): often caused by invalid data format or validation error.
Unauthorized (401).