Training Models with XGBoost in Your Browser

Build and fine-tune XGBoost models entirely online — no installations, just data, tuning, and results inside your browser.



Image by Author | Canva
 

What if you could train powerful machine learning models directly from your browser — no installations, no configurations, just data and code?

In this article, we will do exactly that: train an XGBoost model fully online, end to end, using TrainXGB. We will work with a real-world dataset from Haensel, the Predicting Price dataset, and I will guide you through training, tuning, and evaluating a model without leaving your browser tab.

 

Understanding the Data

 
Let’s take a look at what we have. It is a small but real-life dataset that Haensel put together for its real-world data science hiring rounds. Here’s the link to this project.

Here is the data you are working with:

  • CSV file with seven unnamed attributes
  • Target variable: price
  • Filename: sample.csv

And here is your assignment:

  • Perform data exploration
  • Fit the machine learning model
  • Perform cross-validation and evaluate the performance of your model

 

Train-Test Split

 
Let’s randomly split the dataset into training and test sets. To keep this fully online and code-free, you can upload the dataset to ChatGPT and use this prompt.

Split the attached dataset into train and test (80%-20%) sets and send the datasets back to me.

 

Here is the output.

 
Dataset for Training Models with XGBoost
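If you would rather not route the file through ChatGPT, the same 80/20 split takes only a few lines of Python. This is a minimal local sketch, assuming the file is saved as sample.csv in your working directory:

    # Minimal local equivalent of the 80/20 split (assumes sample.csv)
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("sample.csv")
    train_df, test_df = train_test_split(df, test_size=0.20, random_state=42)

    train_df.to_csv("train.csv", index=False)
    test_df.to_csv("test.csv", index=False)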
 

We're ready. It's time to upload the dataset to TrainXGB. Here is what it looks like:

 
XGBoost Panel App
 

Here, there are four steps visible:

  1. Data
  2. Configuration
  3. Training & Result
  4. Inference

We will explore all of these. Now let’s upload our sample.csv in the Data step, which doubles as our data exploration.

 

Data Exploration (Data)

 
Now, at this step, the platform provides a quick glance at the dataset. Here is the head of the dataset:

 
Data Exploration for Training Models with XGBoost
 

It also reduces the dataset’s memory footprint, which is a nice touch.

 
Data Exploration for Training Models with XGBoost
 

When you click on Show Dataset Description, it runs df.describe() under the hood:

 
Data Exploration for Training Models with XGBoost
 

This part could be improved; a bit of data visualization would go a long way. But it is enough for our purposes.
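If you want the same quick look outside the platform, a few lines of pandas cover it. This is only a sketch, assuming the same sample.csv:

    # Rough local equivalent of the Data tab (assumes sample.csv)
    import pandas as pd

    df = pd.read_csv("sample.csv")
    print(df.head())        # preview of the first rows
    print(df.describe())    # summary statistics, like Show Dataset Description
    print(df.memory_usage(deep=True).sum() / 1024, "KB")  # memory footprint before any downcasting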

 

Model Building (Configuration)

 
Model Building with XGBoost
 

After your dataset is uploaded, the next step is to set up your XGBoost model. Though still in the browser, this is where it starts to feel a bit more “hands-on”. Here is what each part of this setup does:

 

Select Feature Columns

Here, you can select which columns to use as inputs. In this example, you will see the following columns:

  • loc1, loc2: categorical location data
  • para1, para2, para3, para4: probably numerical or engineered features
  • dow: likely the day of the week; could be categorical or ordinal
  • price: the target, so it will not be used as a feature

If you click Select All Columns, every column gets selected, so make sure you uncheck the price column; you do not want the dependent variable to be an input.

 
Model Configuration with XGBoost
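For reference, the same selection in code would simply separate the features from the target. The column names below are taken from the screenshots, so treat them as assumptions:

    # Split features from the target (column names assumed from the screenshots)
    feature_cols = ["loc1", "loc2", "para1", "para2", "para3", "para4", "dow"]
    X = df[feature_cols]   # df as loaded in the earlier exploration sketch
    y = df["price"]        # target column, excluded from the features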
 

Target Column

It is pretty straightforward. Let’s select the target column.

 
Target Column in XGBoost
 

XGBoost Model Type

Here you have two options: choose whether you’re doing regression or classification. Since price is a continuous numeric value, I’ll choose Regressor instead of Classifier.

 
XGBoost Model Type
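In code terms, this choice simply determines which estimator class you would instantiate. A tiny sketch:

    # price is continuous, so regression is the right framing;
    # XGBClassifier would only apply if the target were categorical
    from xgboost import XGBClassifier, XGBRegressor

    ModelClass = XGBRegressor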

 

Evaluation Metrics

Here you tell the system how you want to assess your model. The available metrics change if you select a classifier.

 
XGBoost Evaluation Metrics
 

Train Split Ratio

 
The slider sets the percentage of your data used for training. In this case, it is set to 0.80, so the dataset is split as follows:

  • 80% for training
  • 20% for testing

 
XGBoost Train Split Ratio
 

This is a default split, and it typically works well for small to medium datasets.

 

Hyperparameters

This part controls how our XGBoost trees grow. These settings all affect performance and training speed (a code sketch reproducing them follows the screenshot below):

  • Tree Method: hist - uses histogram-based training, which is faster on larger datasets
  • Max Depth: 6 - limits how deep each tree can grow; deeper trees can capture more complexity but are also more prone to overfitting
  • Number of Trees: 100 - the total number of boosting rounds; more trees can improve performance but make training slower
  • Subsample: 1 - the fraction of rows used for each tree; lowering it helps to avoid overfitting
  • Eta (Learning Rate): 0.30 - controls the step size of each boosting update; smaller values mean slower but more precise training, and 0.3 is on the high side
  • colsample_bytree / bylevel / bynode: 1 - control what fraction of features is sampled randomly while building the trees

 
Hyperparameters in XGBoost
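To make the settings concrete, here is roughly how the same configuration would look with the XGBoost Python API. This is a sketch of the panel’s settings, not TrainXGB’s actual internals:

    # Reproduce the panel's configuration with the scikit-learn style XGBoost API
    from xgboost import XGBRegressor

    model = XGBRegressor(
        tree_method="hist",      # histogram-based tree construction
        max_depth=6,             # cap on tree depth
        n_estimators=100,        # number of boosting rounds
        subsample=1.0,           # fraction of rows sampled per tree
        learning_rate=0.3,       # eta
        colsample_bytree=1.0,    # feature sampling per tree
        colsample_bylevel=1.0,   # feature sampling per level
        colsample_bynode=1.0,    # feature sampling per split
        eval_metric="rmse",      # the metric selected above
    )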
 

Evaluation Metrics (Training Results)

 
When your model is trained, the platform uses the selected metric(s) to automatically evaluate its performance. Here, we chose RMSE (root mean squared error), which is totally reasonable for predicting continuous values such as price.

Now that everything is configured, it is time to click Train XGBoost.

 
Evaluation Metrics in XGBoost
 

Now you can see the process like this.

 
Evaluation Metrics in XGBoost
 

And here is the final graph.
 
Evaluation Metrics in XGBoost
 

This is the output.

 

Evaluation Metrics in XGBoost

 

This gives us a reasonable baseline RMSE; the lower the RMSE, the better the model predicts.
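For comparison, here is roughly how the same train-and-evaluate loop would look in code, reusing the X and y from the earlier sketches. The exact RMSE will differ from the platform’s number, since the internal split is not identical:

    # Train on 80% of the data and compute RMSE on the remaining 20%
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.20, random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_valid)
    rmse = np.sqrt(mean_squared_error(y_valid, preds))
    print(f"RMSE: {rmse:.3f}")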

You can now see the Download Model and Show Feature Importance options, so you can also download the trained model.

 
Evaluation Metrics in XGBoost
 

Here is the final format you will get.

 
Evaluation Metrics in XGBoost
 

Once the model is trained and we click the Show Feature Importance button, we can see how much each feature contributed to the model's predictions. Features are sorted by gain, which indicates how much a feature improved accuracy. Here is the output.

 
Evaluation Metrics in XGBoost
 

Here is the evaluation:

  • Far and away the #1 influencer: para4 is the most dominant feature in terms of predictive power
  • Not far behind: para2 is also quite high
  • Mid-tier importance: para1 and loc2 make moderate contributions
  • Low impact: dow and loc1 did not really move the needle

This breakdown not only shows you what the model is looking at, but also suggests directions for feature engineering; perhaps you dig deeper into para4, or question whether dow and loc1 are features that just add noise.
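If you later reopen the model in Python, the same gain-based ranking is easy to reproduce. A sketch, assuming the fitted model from the earlier snippets:

    # Feature importance by gain, mirroring the platform's chart
    import pandas as pd

    gain = model.get_booster().get_score(importance_type="gain")
    print(pd.Series(gain).sort_values(ascending=False))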

 

Final Prediction (Inference)

 
XGBoost Model Training
 

We now have our model trained and tuned on the sample data. Now let’s try it on the test data to see how it might perform in the wild. Here we will use the test set we split off earlier.

Upload the data and select the features, just as we did previously:

 
XGBoost Model Training
 

Here is the output.

 
XGBoost Model Training
 

All of these predictions rely on the input features (loc1, loc2, para1, dow, etc.) from the test set.

Note that this doesn't provide a row-by-row price comparison; it's a normalized presentation that doesn't display the actual price values. This still allows us to make a relative performance evaluation.
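If you prefer to run inference locally with the downloaded model, a short sketch follows. The model filename and export format are assumptions; use whatever TrainXGB actually gives you:

    # Load the exported model and score the held-out test set
    import pandas as pd
    from xgboost import XGBRegressor

    loaded = XGBRegressor()
    loaded.load_model("model.json")   # filename/format assumed

    test_df = pd.read_csv("test.csv")
    feature_cols = ["loc1", "loc2", "para1", "para2", "para3", "para4", "dow"]  # assumed names
    test_df["predicted_price"] = loaded.predict(test_df[feature_cols])
    print(test_df[["price", "predicted_price"]].head())  # row-by-row comparison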

 

Final Thoughts

 
With TrainXGB, you no longer need to install packages, set up environments, or write endless lines of code to create an XGBoost machine learning model. TrainXGB makes it easy to build, tune, and evaluate real models right inside your browser, more quickly and cleanly than ever.

Even better, you can run real data science projects: download the data, upload it straight into TrainXGB, and see how your models perform within minutes.
 
 

Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.

