This project uses machine learning to predict crop yield, optimize input usage, and support agricultural decision-making. It includes analyses and tools for:
- Predicting the effect of annual rainfall on crop yield.
- Recommending optimal fertilizer and pesticide usage.
- Predicting yield based on state-specific conditions.
A user-friendly interface was developed using Tkinter for real-time interaction and predictions.
Rainfall Impact on Yield Prediction
- Inputs: Annual Rainfall, Cultivated Area, Fertilizer, Pesticide, Crop Type.
- Model: Random Forest regressor (RandomForestRegressor).
- Metrics: R², MAE, MSE.
- Insights: Highlights the need for irrigation planning and optimized input usage.
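The workflow above (fit a Random Forest regressor, then report R², MAE, and MSE) can be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the data is synthetic, and the column names, split ratio, and hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the real project uses crop_yield.csv.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Annual_Rainfall": rng.uniform(300, 3000, n),
    "Area": rng.uniform(1, 1000, n),
    "Fertilizer": rng.uniform(10, 500, n),
    "Pesticide": rng.uniform(0.1, 50, n),
    "Crop": rng.choice(["Rice", "Wheat", "Maize"], n),
})
# Hypothetical target: yield rises with rainfall and fertilizer, plus noise.
df["Yield"] = (
    0.002 * df["Annual_Rainfall"]
    + 0.01 * df["Fertilizer"]
    + rng.normal(0, 0.5, n)
)

# One-hot encode the crop type so the forest receives numeric inputs only.
X = pd.get_dummies(df.drop(columns="Yield"), columns=["Crop"])
y = df["Yield"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# The three metrics listed above, computed on the held-out split.
metrics = {
    "r2": r2_score(y_test, pred),
    "mae": mean_absolute_error(y_test, pred),
    "mse": mean_squared_error(y_test, pred),
}
print(metrics)
```

MAE and MSE are expressed in the units of the target (squared, for MSE), so their magnitude depends on the scale of the yield column; R² is scale-free.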
Optimal Fertilizer and Pesticide Recommendation
- Model: Random Forest regressor (RandomForestRegressor) with feature engineering.
- Outputs: Suggested optimal levels for fertilizers and pesticides per crop.
State-based Yield Prediction
- Inputs: State, Area, Annual Rainfall, Fertilizer, Pesticide, Crop, Season.
- Outputs: Region-specific yield predictions and input optimization.
Tkinter GUI
- Real-time predictions with an easy-to-use interface.
- Input fields for rainfall, fertilizers, pesticides, and other features.
- Displays yield predictions and recommendations interactively.
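A Tkinter form of the kind described above might look roughly like this. It is a sketch under assumptions, not the project's GUI: the field list, the function names (parse_inputs, run_prediction, build_app), and dummy_predict are all hypothetical, and dummy_predict stands in for a call to a trained model.

```python
# Hypothetical input fields; the real interface may expose more features.
FIELDS = ["Annual_Rainfall", "Area", "Fertilizer", "Pesticide"]


def parse_inputs(raw):
    """Convert raw entry strings to floats; raise ValueError on bad input."""
    values = {}
    for name in FIELDS:
        text = raw.get(name, "").strip()
        if not text:
            raise ValueError(f"{name} is required")
        values[name] = float(text)
    return values


def run_prediction(raw, predict_fn):
    """Return the message shown to the user for one button press."""
    try:
        values = parse_inputs(raw)
    except ValueError as exc:
        return f"Invalid input: {exc}"
    return f"Predicted yield: {predict_fn(values):.2f}"


def build_app(predict_fn):
    """Build the window; the caller starts it with .mainloop()."""
    # Imported here so the helpers above can be used without a display.
    import tkinter as tk

    root = tk.Tk()
    root.title("Crop Yield Prediction")

    entries = {}
    for row, name in enumerate(FIELDS):
        tk.Label(root, text=name).grid(row=row, column=0, sticky="w", padx=5, pady=2)
        entry = tk.Entry(root)
        entry.grid(row=row, column=1, padx=5, pady=2)
        entries[name] = entry

    result = tk.StringVar(value="Enter values and press Predict")

    def on_click():
        raw = {name: widget.get() for name, widget in entries.items()}
        result.set(run_prediction(raw, predict_fn))

    tk.Button(root, text="Predict", command=on_click).grid(
        row=len(FIELDS), column=0, columnspan=2, pady=5
    )
    tk.Label(root, textvariable=result).grid(
        row=len(FIELDS) + 1, column=0, columnspan=2
    )
    return root


def dummy_predict(values):
    # Hypothetical stand-in for a trained model's predict call.
    return 0.001 * values["Annual_Rainfall"]
```

Calling build_app(dummy_predict).mainloop() opens the window. Keeping input parsing and prediction outside the widget code means they can be checked without starting a GUI.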
Data Preprocessing
- Normalized numerical inputs.
- One-hot encoded categorical variables (e.g., Crop, Season).
- Handled missing values with separate strategies for numerical and categorical columns.
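The three preprocessing steps above can be combined in one scikit-learn transformer. The sketch below is illustrative only: the toy data, the column names, and the imputation strategies (median for numbers, most frequent value for categories) are assumptions, since the source does not state which strategies were used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows with gaps (assumption); the real data comes from crop_yield.csv.
df = pd.DataFrame({
    "Annual_Rainfall": [1200.0, np.nan, 800.0, 1500.0],
    "Fertilizer": [100.0, 250.0, np.nan, 50.0],
    "Crop": ["Rice", "Wheat", np.nan, "Rice"],
    "Season": ["Kharif", "Rabi", "Kharif", np.nan],
})
numeric_cols = ["Annual_Rainfall", "Fertilizer"]
categorical_cols = ["Crop", "Season"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

Fitting the transformer on the training split only, and reusing it on new inputs, keeps the scaling statistics and category lists consistent at prediction time.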
Model Development
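- Chosen Model: Random Forest regressor (RandomForestRegressor), for its accuracy and interpretability.
- Metrics: R², MAE, and MSE were used to validate the models. R² ranged from 90.65% to 97.43% across the regression tasks; MAE and MSE depend on the scale of each target, so they are not comparable between tasks.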
Feature Engineering
- Combined features to model interactions.
- Incorporated regional and seasonal trends.
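The source does not list the engineered features, so the following is only one plausible reading of the two points above: per-area input rates, a rainfall-by-fertilizer interaction term, and group means by state and season. All feature names and the toy values are hypothetical.

```python
import pandas as pd

# Toy rows (assumption) with the kinds of columns the project describes.
df = pd.DataFrame({
    "State": ["Assam", "Assam", "Punjab", "Punjab"],
    "Season": ["Kharif", "Rabi", "Kharif", "Rabi"],
    "Area": [100.0, 200.0, 50.0, 400.0],
    "Annual_Rainfall": [2000.0, 2000.0, 600.0, 600.0],
    "Fertilizer": [5000.0, 8000.0, 4000.0, 20000.0],
    "Pesticide": [30.0, 50.0, 20.0, 120.0],
})


def add_engineered_features(frame):
    """Return a copy of frame with hypothetical interaction and trend features."""
    out = frame.copy()
    # Input intensity: totals divided by cultivated area.
    out["Fertilizer_per_Area"] = out["Fertilizer"] / out["Area"]
    out["Pesticide_per_Area"] = out["Pesticide"] / out["Area"]
    # Interaction term: rainfall combined with fertilizer intensity.
    out["Rainfall_x_Fertilizer"] = out["Annual_Rainfall"] * out["Fertilizer_per_Area"]
    # Regional trend: mean rainfall of the row's state.
    out["State_Mean_Rainfall"] = out.groupby("State")["Annual_Rainfall"].transform("mean")
    # Seasonal trend: mean fertilizer intensity of the row's season.
    out["Season_Mean_Fert_per_Area"] = (
        out.groupby("Season")["Fertilizer_per_Area"].transform("mean")
    )
    return out


features = add_engineered_features(df)
print(features.head())
```

Group means computed this way should be derived from the training split only; computing them over the full dataset leaks information from the test rows.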
RandomForestClassifier:
- Used for Seasonal Crop Yield Comparison (Classification Task).
- Accuracy: 51.65%.
RandomForestRegressor:
- Used for Regression Tasks:
  - Rainfall Impact on Yield Prediction: MSE: 25924.09, R²: 96.76%.
  - Optimal Fertilizer Prediction: MSE: 1.89e+14, R²: 97.43%.
  - State-based Yield Prediction: MSE: 74883.98, R²: 90.65%.
- The Random Forest algorithm was chosen for its robustness, ability to handle non-linear relationships, and support for both classification and regression tasks.
- All trained models have been saved in a single file: all_models.pkl.
- Python 3.8+
- Required libraries: pandas, numpy, scikit-learn, matplotlib, seaborn, tkinter.
- Clone the repository:
git clone https://github.com/Harshad071/AIML-Project.git
cd AIML-Project
The repository includes a Jupyter Notebook for Feature Engineering, which performs the following:
- Handles missing values using appropriate strategies for numerical and categorical columns.
- Encodes categorical variables using LabelEncoder.
- Scales numerical features using StandardScaler for improved model performance.
- Saves the cleaned and preprocessed data for further analysis.
File: feature_engineering.ipynb
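The notebook steps listed above can be sketched as follows. This is not the notebook's code: the toy data, the column names, the fill strategies (median and mode), and the output file name are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy rows with gaps (assumption); the notebook reads crop_yield.csv.
df = pd.DataFrame({
    "Crop": ["Rice", "Wheat", None, "Rice"],
    "Annual_Rainfall": [1200.0, None, 800.0, 1500.0],
    "Fertilizer": [100.0, 250.0, 150.0, None],
})
categorical_cols = ["Crop"]
numeric_cols = ["Annual_Rainfall", "Fertilizer"]

# 1. Missing values: median for numeric columns, mode for categorical ones.
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].median())
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# 2. Encode each categorical column as integers, keeping the fitted encoder
#    so the same mapping can be applied to new inputs later.
encoders = {}
for col in categorical_cols:
    encoder = LabelEncoder()
    df[col] = encoder.fit_transform(df[col])
    encoders[col] = encoder

# 3. Standardize numeric columns to zero mean and unit variance.
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])

# 4. Save the preprocessed data (hypothetical file name, temporary directory).
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "crop_yield_preprocessed_demo.csv"
    df.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)

print(reloaded)
```

scikit-learn documents LabelEncoder as intended for target labels; OrdinalEncoder is its counterpart for feature columns. Integer codes suit tree-based models such as Random Forest, but they impose an arbitrary order that can mislead linear models.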
The Model Training notebook demonstrates:
- Training multiple models for various prediction tasks, including classification and regression.
- Tasks covered:
- Seasonal crop yield comparison (classification).
- Rainfall impact on yield prediction (regression).
- Optimal fertilizer and pesticide recommendation (regression).
- State-based yield prediction (regression).
- Saving all trained models in a single .pkl file for easy reuse.
File: model_training.ipynb
Tip: To replicate the results:
- Ensure the dataset file crop_yield.csv is placed in the correct directory.
- Run the notebooks in sequence:
  - Start with feature_engineering.ipynb to preprocess the data.
  - Proceed with model_training.ipynb to train and save the models.
- Use the saved models (all_models.pkl) for predictions in downstream tasks.
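Storing several models in one pickle file and loading them again for prediction can be sketched like this. The layout of all_models.pkl is not documented in the source, so the dictionary structure, its keys, and the demo file name are assumptions; the models are trained on random data purely so the example runs.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Random stand-in data (assumption) so there is something to fit.
rng = np.random.default_rng(0)
X = rng.random((60, 3))
y_reg = X @ np.array([1.0, 2.0, 3.0])
y_clf = (y_reg > y_reg.mean()).astype(int)

# Hypothetical layout: one dictionary mapping task names to fitted models.
models = {
    "rainfall_yield": RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y_reg),
    "seasonal_comparison": RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y_clf),
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "all_models_demo.pkl"

    # Save every model in a single file.
    with path.open("wb") as f:
        pickle.dump(models, f)

    # Load the file back and pick a model by task name.
    with path.open("rb") as f:
        loaded = pickle.load(f)

pred = loaded["rainfall_yield"].predict(X[:2])
print(pred)
```

Pickle files can execute arbitrary code when loaded, so only unpickle files from a trusted source, and load them with the same scikit-learn version that produced them where possible.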
The notebooks also include:
- Histograms, boxplots, and count plots to explore data distributions and detect outliers.
- A correlation heatmap to analyze relationships between numerical features.
These visualizations provide valuable insights into the dataset and inform feature engineering decisions.