FinPulse is an autonomous finance platform that uses machine learning and statistical programming to enhance the way business owners manage their finances. It offers bespoke machine learning-trained models to predict and forecast sales and income to help plan their business operations.
main.py: A graphical user interface (GUI) application built using Tkinter that allows users to load a dataset, load a pre-trained machine learning model, run predictions on the dataset, and visualize the results.
model.py: A script for training and saving machine learning models using the Walmart store sales dataset. The script uses Random Forest, XGBoost, and Gradient Boosting regressors to predict weekly sales.
The Dataset was split using the train_test_split function within the sklearn.model_selection library and then the RandomForest, XGB, and Gradient Booster models are created. The target variable was selected as the weekly sales, and all others except the full date string were chosen as the features. The full date was parsed into the seperate components for day, month, and year.
After the models are fit and tested, they are exported and saved so other computers may open the models as needed to run the models.
The ML models were created, trained, and fit in the model.py file, while a GUI was created to allow for interativity with the code, housed in the main.py file.
Due to the model being pre-trained, selecting the test_data.csv file selects the dataset to make predictions for the future. Both the dataset and the model must be selected. Once run, a grapgh can then be populated in the window, showing the predictions.
The trained models will be saved as rf_model.joblib, xgb_model.joblib, and gb_model.joblib. The script will print the Root Mean Squared Error (RMSE) for each model on the test set.
Load Dataset: Click the "Load Dataset" button to select a CSV file containing the financial dataset. Load Model: Click the "Load Model" button to select a pre-trained machine learning model (joblib or pkl format). Run Model: Click the "Run Model" button to make predictions on the loaded dataset using the selected model. Display Graph: Click the "Display Graph" button to visualize the predictions on a line plot.
Store: Int
Month: Int
Day: Int
Month: Int
Holiday_Flag: Binary (1 for holiday during the week, 0 for none)
Temprature: Float
Fuel Price: Float
CPI: Float
Unemployment: Float
The performance of the models heavily depends on the quality and relevance of the features in the dataset. If important features are missing or if the data is noisy, it can adversely impact the model's accuracy.
The models assume that the underlying patterns in the data remain relatively stable over time. If the financial data exhibits non-stationary behavior, the model may struggle to make accurate predictions.
The performance of the models can be sensitive to the choice of hyperparameters. The current hyperparameters used in the scripts might not be optimal for all datasets, and fine-tuning may be required for better results.
The models are evaluated based on the Root Mean Squared Error (RMSE) on the test set. While RMSE provides a measure of prediction accuracy, it might not capture all aspects of model performance, especially in scenarios with imbalanced data or outliers.
Random Forest, XGBoost, and Gradient Boosting models assume a certain level of linearity in the relationships between features and the target variable. If the underlying relationships are highly non-linear, these models may not capture them accurately.
Invest time in thorough feature engineering to identify and create relevant features that better capture the underlying patterns in the financial data.
Implement robust data preprocessing techniques to handle missing values, outliers, and scale features appropriately. This can enhance the models' ability to generalize to new data.
If the financial data exhibits time-dependent patterns, consider applying advanced time series analysis techniques or using specialized time series models for more accurate predictions.
Conduct a comprehensive hyperparameter tuning process using techniques like grid search or randomized search to find the optimal hyperparameters for the models.
Explore ensemble methods by combining predictions from multiple models. This can help mitigate the weaknesses of individual models and improve overall predictive performance.
Enhance model interpretability by using techniques such as SHAP (SHapley Additive exPlanations) values or LIME (Local Interpretable Model-agnostic Explanations) to understand and explain the model's predictions.
Experiment with regularization techniques to prevent overfitting, especially if the dataset is limited in size.
Implement a system for continuous monitoring of model performance and update the models as new data becomes available. This ensures that the models stay relevant over time.
If applicable, incorporate user feedback into the model training process to adapt the models to changing business conditions or user requirements.
Implement robust cross-validation strategies to get a better estimate of the model's generalization performance and reduce the risk of overfitting to the training data.
