Training Models with XGBoost in Your Browser
Build and fine-tune XGBoost models entirely online — no installations, just data, tuning, and results inside your browser.

Image by Author | Canva
What if you could train powerful machine learning models directly from your browser — no installations, no configurations, just data and code?
In this article, we will do just that: use TrainXGB to train an XGBoost model fully online, end to end. We will work with a real-world dataset from Haensel, the Predicting Price dataset, and I will guide you through training, tuning, and evaluating a model all within your browser tab.
Understanding the Data
Let’s take a look at what we have. It's a small but real-life dataset, created by Haensel for real-world data science hiring rounds. Here’s the link to this project.
Here is the data you are working with:
- CSV file with seven unnamed attributes
- Target variable: price
- Filename: sample.csv
And here is your assignment:
- Perform data exploration
- Fit the machine learning model
- Perform cross-validation and evaluate the performance of your model
Train-Test Split
Let’s randomly split the dataset into training and test sets. To keep this fully online and code-free, you can upload the dataset to ChatGPT and use this prompt:
Split the attached dataset into train and test (80%-20%) sets and send the datasets back to me.
Here is the output.

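If you prefer not to route the file through ChatGPT, the same 80/20 split can be done locally. Here is a minimal sketch using pandas and scikit-learn; the file names train.csv and test.csv are my own choices, not something TrainXGB requires:

```python
# A minimal local alternative to the ChatGPT split.
# Assumes sample.csv sits in your working directory.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("sample.csv")

# 80% train / 20% test, with a fixed seed so the split is reproducible
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

train_df.to_csv("train.csv", index=False)
test_df.to_csv("test.csv", index=False)
print(train_df.shape, test_df.shape)
```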
We're ready. It's time to upload the dataset to TrainXGB. Here is what it looks like:

Here, there are four steps visible:
- Data
- Configuration
- Training & Result
- Inference
We will explore all of these. Let's start by uploading our sample.csv in the Data step, which doubles as data exploration.
Data Exploration (Data)
Now, at this step, the platform provides a quick glance at the dataset. Here is the head of the dataset:

It also reduces the dataset's memory usage, which is a nice touch.

When you click on Show Dataset Description, it runs the equivalent of df.describe():

This part can be improved. A little bit of data visualization would work better. But this will be enough for us now.
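If you want the visualization the platform skips, a quick local sketch like the one below can fill the gap. This is my own addition, not something TrainXGB runs; the column names follow the dataset described earlier.

```python
# Quick extra exploration and a simple plot (not part of TrainXGB).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sample.csv")

print(df.describe())        # same summary the platform shows
print(df.isna().sum())      # quick missing-value check

# Distribution of the target
df["price"].hist(bins=30)
plt.xlabel("price")
plt.ylabel("count")
plt.show()

# How each numeric column correlates with the target
print(df.corr(numeric_only=True)["price"].sort_values(ascending=False))
```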
Model Building (Configuration)

After your dataset is uploaded, the next step is to set up your XGBoost model. Though still in the browser, this is where it starts to feel a bit more “hands-on”. Here is what each part of this setup does:
Select Feature Columns
Here, you can select which columns to use as inputs. In this example, you will see the following columns:
- loc1, loc2: categorical location data
- para1, para2, para3, para4: probably numerical or engineered features
- dow: likely the day of the week; could be categorical or ordinal
- price: your target, so it will not be used as a feature
Clicking Select All Columns selects everything, so make sure to uncheck the price column: you do not want the dependent variable to be an input.

Target Column
It is pretty straightforward. Let’s select the target column.

XGBoost Model Type
Here you have two options. Choose whether you’re doing regression or classification. Since price is a numeric, continuous value, I’ll choose Regressor instead of Classifier.

Evaluation Metrics
Here you tell the system how you want to assess your model. The available metrics change if you select a classifier.

Train Split Ratio
The slider sets the percentage of your data used for training. In this case, it is set to 0.80, so the dataset is split into:
- 80% for training
- 20% for testing

This is a default split, and it typically works well for small to medium datasets.
Hyperparameters
This part controls how our XGBoost trees grow. These settings all affect performance and training speed:
- Tree Method: hist - Employs histogram-based training, which is faster on bigger datasets
- Max Depth: 6 - Limits the depth each tree can reach; deeper trees can capture more complexity but are also more prone to overfitting
- Number of Trees: 100 - The total number of boosting rounds; more trees can improve performance but make training slower
- Subsample: 1 - The fraction of rows sampled for each tree; decreasing this can help avoid overfitting
- Eta (Learning Rate): 0.30 - Controls the step size of each boosting update; smaller values mean slower but more careful learning, and 0.3 is on the higher side
- colsample_bytree / bylevel / bynode: 1 - Control the fraction of features randomly sampled while building each tree, level, and node

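For reference, here is roughly what these settings map to in plain XGBoost code. This is a sketch of an equivalent configuration, not TrainXGB's actual internals:

```python
# An XGBoost regressor configured roughly like the TrainXGB settings above (a sketch).
from xgboost import XGBRegressor

model = XGBRegressor(
    tree_method="hist",      # histogram-based training
    max_depth=6,             # limit on tree depth
    n_estimators=100,        # number of boosting rounds / trees
    subsample=1.0,           # fraction of rows sampled per tree
    learning_rate=0.3,       # eta
    colsample_bytree=1.0,    # fraction of features sampled per tree
    colsample_bylevel=1.0,   # ...per level
    colsample_bynode=1.0,    # ...per node
)
```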
Evaluation Metrics (Training Results)
When your model is trained, the platform uses the selected metric(s) to automatically evaluate its performance. Here, we chose RMSE (root mean squared error), which is totally reasonable for predicting continuous values such as price.
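For context, RMSE is simply the square root of the average squared difference between predicted and actual prices; here is a tiny sketch with made-up numbers:

```python
# Root mean squared error, computed by hand on hypothetical values.
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(rmse([100, 150, 200], [110, 140, 195]))  # ~8.66
```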
Now that we have everything configured, it is time to click on Train XGBoost.

Now you can see the process like this.

And here is the final graph.

This is the output.

This gives us a reasonable baseline RMSE; the lower the RMSE, the better our model will be able to predict.
Now you can see two options, Download Model and Show Feature Importance, so you can download the trained model as well.

Here is the final format for you:

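If you want to reuse the downloaded model outside the browser, loading it back would look roughly like this. I'm assuming the export is a standard XGBoost model file; the filename below is hypothetical:

```python
# Loading a downloaded model back into Python (a sketch).
# "trainxgb_model.json" is a hypothetical filename; use whatever the site gives you.
from xgboost import XGBRegressor

model = XGBRegressor()
model.load_model("trainxgb_model.json")
```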
When we train a model and click the Feature Importance button, we can see how much each feature has contributed to the model's predictions. Features are sorted by gain, which indicates how much a feature improved the accuracy. Here is the output.

Here is the evaluation:
- Far and away the #1 influencer: para4 is the most dominant feature in terms of predictive power
- Not quite as strong: para2 is also quite high
- Mid-tier importance: para1, loc1, para2, and loc2 sit in the middle
- Low impact: dow and loc1 did not really move the needle
This breakdown not only shows you what the model is looking at, but also directions for feature engineering; perhaps you go deeper on para4, or you question if dow and loc1 are features that add noise.
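If you want the same gain-based breakdown outside the browser, XGBoost exposes it directly. A minimal sketch, assuming `model` is the fitted regressor from earlier:

```python
# Gain-based feature importance from a fitted XGBoost model (a sketch).
import pandas as pd

booster = model.get_booster()
gain = booster.get_score(importance_type="gain")
print(pd.Series(gain).sort_values(ascending=False))
```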
Final Prediction (Inference)

We now have our model trained and tuned on the sample data. Let’s now feed it the test data to see how the model may perform in the wild. Here we will use the test split we created earlier.
Upload the data and select the features, just as we did previously:

Here is the output.

All of these predictions rely on the input features (loc1, loc2, para1, dow, etc.) from the test set.
Note that this doesn't provide a row-by-row price comparison; it's a normalized presentation that doesn't display the actual price values. This still allows us to make a relative performance evaluation.
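If you do want the row-by-row comparison, you can produce it locally. A sketch, assuming the test.csv from the earlier split, a fitted `model`, and features already encoded the same way they were for training:

```python
# Row-by-row comparison of actual vs. predicted prices on the test split (a sketch).
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

test_df = pd.read_csv("test.csv")
X_test = test_df.drop(columns=["price"])  # encode loc1/loc2/dow first if they are strings
y_test = test_df["price"]

preds = model.predict(X_test)
print(pd.DataFrame({"actual": y_test, "predicted": preds}).head())
print("RMSE:", np.sqrt(mean_squared_error(y_test, preds)))
```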
Final Thoughts
With TrainXGB, you no longer need to install packages, set up environments, or write endless lines of code to build an XGBoost machine learning model. It lets you build, tune, and evaluate real models right inside your browser, more quickly and cleanly than ever.
Even better, you can run real data science projects: download a dataset, upload it straight into TrainXGB, and see how your models perform within minutes.
Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.