Training Models with XGBoost in Your Browser

Build and fine-tune XGBoost models entirely online — no installations, just data, tuning, and results inside your browser.



Image by Author | Canva
 

What if you could train powerful machine learning models directly from your browser — no installations, no configurations, just data and code?

In this article, we will do exactly that: train an XGBoost model fully online, end to end, using TrainXGB. We will work with a real-world dataset from Haensel, the Predicting Price dataset, and I will guide you through training, tuning, and evaluating a model without leaving your browser tab.

 

Understanding the Data

 
Let’s take a look at what we have. It is a small but real-life dataset that Haensel put together for its real-world data science hiring rounds. Here’s the link to this project.

Here is the data you are working with:

  • CSV file with seven unnamed attributes
  • Target variable: price
  • Filename: sample.csv

And here is your assignment:

  • Perform data exploration
  • Fit the machine learning model
  • Perform cross-validation and evaluate the performance of your model

 

Train-Test Split

 
Let’s randomly split the dataset into training and test sets. To keep this fully online and code-free, you can upload the dataset to ChatGPT and use this prompt.

Split the attached dataset into train and test (80%-20%) sets and send the datasets back to me.

 

Here is the output.

 
Dataset for Training Models with XGBoost
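If you would rather not route the file through ChatGPT, the same 80/20 split takes only a few lines of Python. This is a minimal local sketch, assuming the file is saved as sample.csv in your working directory:

    # Minimal local equivalent of the 80/20 split (assumes sample.csv)
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("sample.csv")
    train_df, test_df = train_test_split(df, test_size=0.20, random_state=42)

    train_df.to_csv("train.csv", index=False)
    test_df.to_csv("test.csv", index=False)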
 

We're ready. It's time to upload the dataset to TrainXGB. Here is what it looks like:

 
XGBoost Panel App
 

Here, there are four steps visible:

  1. Data
  2. Configuration
  3. Training & Result
  4. Inference

We will explore all of these. Now let’s upload our sample.csv in the Data step, which doubles as our data exploration.

 

Data Exploration (Data)

 
Now, at this step, the platform provides a quick glance at the dataset. Here is the head of the dataset:

 
Data Exploration for Training Models with XGBoost
 

It also reduces the dataset’s memory footprint, which is a nice touch.

 
Data Exploration for Training Models with XGBoost
 

When you click on Show Dataset Description, it runs df.describe() under the hood:

 
Data Exploration for Training Models with XGBoost
 

This part could be improved; a bit of data visualization would go a long way. But it is enough for our purposes.
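If you want the same quick look outside the platform, a few lines of pandas cover it. This is only a sketch, assuming the same sample.csv:

    # Rough local equivalent of the Data tab (assumes sample.csv)
    import pandas as pd

    df = pd.read_csv("sample.csv")
    print(df.head())        # preview of the first rows
    print(df.describe())    # summary statistics, like Show Dataset Description
    print(df.memory_usage(deep=True).sum() / 1024, "KB")  # memory footprint before any downcasting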

 

Model Building (Configuration)

 
Model Building with XGBoost
 

After your dataset is uploaded, the next step is to set up your XGBoost model. Though still in the browser, this is where it starts to feel a bit more “hands-on”. Here is what each part of this setup does:

 

Select Feature Columns

Here, you can select which columns to use as inputs. In this example, you will see the following columns:

  • loc1, loc2: categorical location data
  • para1, para2, para3, para4: probably numerical or engineered features
  • dow: likely the day of the week; could be categorical or ordinal
  • price: the target, so it will not be used as a feature

If you click Select All Columns, every column gets selected, so make sure you uncheck the price column; you do not want the dependent variable to be an input.

 
Model Configuration with XGBoost
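For reference, the same selection in code would simply separate the features from the target. The column names below are taken from the screenshots, so treat them as assumptions:

    # Split features from the target (column names assumed from the screenshots)
    feature_cols = ["loc1", "loc2", "para1", "para2", "para3", "para4", "dow"]
    X = df[feature_cols]   # df as loaded in the earlier exploration sketch
    y = df["price"]        # target column, excluded from the features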
 

Target Column

It is pretty straightforward. Let’s select the target column.

 
Target Column in XGBoost
 

XGBoost Model Type

Here you have two options: choose whether you’re doing regression or classification. Since price is a continuous numeric value, I’ll choose Regressor instead of Classifier.

 
XGBoost Model Type
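In code terms, this choice simply determines which estimator class you would instantiate. A tiny sketch:

    # price is continuous, so regression is the right framing;
    # XGBClassifier would only apply if the target were categorical
    from xgboost import XGBClassifier, XGBRegressor

    ModelClass = XGBRegressor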

 

Evaluation Metrics

Here you tell the system how you want to assess your model. The available metrics change if you select a classifier.

 
XGBoost Evaluation Metrics
 

Train Split Ratio

 
The slider sets the percentage of your data used for training. In this case, it is set to 0.80, so the dataset is split as follows:

  • 80% for training
  • 20% for testing

 
XGBoost Train Split Ratio
 

This is a default split, and it typically works well for small to medium datasets.

 

Hyperparameters

This part controls how our XGBoost trees grow. These settings all affect performance and training speed (a code sketch reproducing them follows the screenshot below):

  • Tree Method: hist - uses histogram-based training, which is faster on larger datasets
  • Max Depth: 6 - limits how deep each tree can grow; deeper trees can capture more complexity but are also more prone to overfitting
  • Number of Trees: 100 - the total number of boosting rounds; more trees can improve performance but make training slower
  • Subsample: 1 - the fraction of rows used for each tree; lowering it helps to avoid overfitting
  • Eta (Learning Rate): 0.30 - controls the step size of each boosting update; smaller values mean slower but more precise training, and 0.3 is on the high side
  • colsample_bytree / bylevel / bynode: 1 - control what fraction of features is sampled randomly while building the trees

 
Hyperparameters in XGBoost
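To make the settings concrete, here is roughly how the same configuration would look with the XGBoost Python API. This is a sketch of the panel’s settings, not TrainXGB’s actual internals:

    # Reproduce the panel's configuration with the scikit-learn style XGBoost API
    from xgboost import XGBRegressor

    model = XGBRegressor(
        tree_method="hist",      # histogram-based tree construction
        max_depth=6,             # cap on tree depth
        n_estimators=100,        # number of boosting rounds
        subsample=1.0,           # fraction of rows sampled per tree
        learning_rate=0.3,       # eta
        colsample_bytree=1.0,    # feature sampling per tree
        colsample_bylevel=1.0,   # feature sampling per level
        colsample_bynode=1.0,    # feature sampling per split
        eval_metric="rmse",      # the metric selected above
    )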
 

Evaluation Metrics (Training Results)

 
When your model is trained, the platform uses the selected metric(s) to automatically evaluate its performance. Here, we chose RMSE (root mean squared error), which is totally reasonable for predicting continuous values such as price.

Now that everything is configured, it is time to click Train XGBoost.

 
Evaluation Metrics in XGBoost
 

Now you can see the process like this.

 
Evaluation Metrics in XGBoost
 

And here is the final graph.
 
Evaluation Metrics in XGBoost
 

This is the output.

 

Evaluation Metrics in XGBoost

 

This gives us a reasonable baseline RMSE; the lower the RMSE, the better the model predicts.
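For comparison, here is roughly how the same train-and-evaluate loop would look in code, reusing the X and y from the earlier sketches. The exact RMSE will differ from the platform’s number, since the internal split is not identical:

    # Train on 80% of the data and compute RMSE on the remaining 20%
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.20, random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_valid)
    rmse = np.sqrt(mean_squared_error(y_valid, preds))
    print(f"RMSE: {rmse:.3f}")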

You can now see the Download Model and Show Feature Importance options, so you can also download the trained model.

 
Evaluation Metrics in XGBoost
 

Here is the final format you will get.

 
Evaluation Metrics in XGBoost
 

Once the model is trained and we click the Show Feature Importance button, we can see how much each feature contributed to the model's predictions. Features are sorted by gain, which indicates how much a feature improved accuracy. Here is the output.

 
Evaluation Metrics in XGBoost
 

Here is the evaluation:

  • Far and away the #1 influencer: para4 is the most dominant feature in terms of predictive power
  • Not far behind: para2 is also quite high
  • Mid-tier importance: para1 and loc2 make moderate contributions
  • Low impact: dow and loc1 did not really move the needle

This breakdown not only shows you what the model is looking at, but also suggests directions for feature engineering; perhaps you dig deeper into para4, or question whether dow and loc1 are features that just add noise.
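If you later reopen the model in Python, the same gain-based ranking is easy to reproduce. A sketch, assuming the fitted model from the earlier snippets:

    # Feature importance by gain, mirroring the platform's chart
    import pandas as pd

    gain = model.get_booster().get_score(importance_type="gain")
    print(pd.Series(gain).sort_values(ascending=False))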

 

Final Prediction (Inference)

 
XGBoost Model Training
 

We now have our model trained and tuned on the sample data. Now let’s try it on the test data to see how it might perform in the wild. Here we will use the test set we split off earlier.

Upload the data and select the features, just as we did previously:

 
XGBoost Model Training
 

Here is the output.

 
XGBoost Model Training
 

All of these predictions rely on the input features (loc1, loc2, para1, dow, etc.) from the test set.

Note that this doesn't provide a row-by-row price comparison; it's a normalized presentation that doesn't display the actual price values. This still allows us to make a relative performance evaluation.
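If you prefer to run inference locally with the downloaded model, a short sketch follows. The model filename and export format are assumptions; use whatever TrainXGB actually gives you:

    # Load the exported model and score the held-out test set
    import pandas as pd
    from xgboost import XGBRegressor

    loaded = XGBRegressor()
    loaded.load_model("model.json")   # filename/format assumed

    test_df = pd.read_csv("test.csv")
    feature_cols = ["loc1", "loc2", "para1", "para2", "para3", "para4", "dow"]  # assumed names
    test_df["predicted_price"] = loaded.predict(test_df[feature_cols])
    print(test_df[["price", "predicted_price"]].head())  # row-by-row comparison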

 

Final Thoughts

 
With TrainXGB, you no longer need to install packages, set up environments, or write endless lines of code to create an XGBoost machine learning model. TrainXGB makes it easy to build, tune, and evaluate real models right inside your browser, more quickly and cleanly than ever.

Even better, you can run real data science projects: download the data, upload it straight into TrainXGB, and see how your models perform within minutes.
 
 

Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.

