
# Determine Market Segmentation Using the Naive Bayes Method

My college assigned me to write an article about naive Bayes as part of my studies.

Posted by Jagad on July 17, 2021

## Introduction

Data mining is currently popular among informatics students, and some courses are devoted entirely to it. Naive Bayes is one of the methods most commonly taught in these courses.

Naive Bayes is a probabilistic classification method based on Bayes' theorem, which is named after the English statistician Thomas Bayes. It predicts the class of new data from probabilities estimated on past data (the training data).
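The formula below uses standard textbook notation and does not come from the original post. For a class $C$ and features $x_1, \dots, x_n$, Bayes' theorem is combined with the "naive" assumption that the features are independent of one another given the class:

$$
P(C \mid x_1, \dots, x_n) = \frac{P(C) \prod_{i=1}^{n} P(x_i \mid C)}{P(x_1, \dots, x_n)},
\qquad
\hat{y} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
$$

The denominator is the same for every class. The classifier can therefore skip it and simply choose the class $\hat{y}$ with the largest numerator.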

## Import Library

```python
import pandas as pd
import numpy as np
```

The code above imports the pandas and NumPy libraries used in the analysis. pandas handles tabular data in the form of data frames, and NumPy provides fast array manipulation.

```python
# Load the training data (pandas reads .xlsx files through the openpyxl package)
data_training = pd.read_excel('data_training.xlsx')
```

The code above uses pandas to read the training data, which is stored as an .xlsx file, into a data frame.

## Converting Data To Integer

The naive Bayes classifier in scikit-learn cannot work with text (string) values. The text columns of the training data are therefore converted to integers before training.

Here is the code that converts those columns to integers.

```python
from sklearn.preprocessing import LabelEncoder

# Text columns that must become integers before training
categorical_cols = ['Gender', 'Ever_Married', 'Age', 'Profession', 'Spending_Score']
encoders = {}
for col in categorical_cols:
    enc = LabelEncoder()
    data_training[col] = enc.fit_transform(data_training[col].values)
    encoders[col] = enc  # keep the fitted mapping to reuse on the test data
```

After this step, each of these columns contains integer codes instead of text.
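The sketch below shows what `LabelEncoder` actually produces. It is self-contained and runs on a few hypothetical sample values, not on the article's dataset:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows, not the article's dataset
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Spending_Score': ['Low', 'High', 'Average', 'Low'],
})

gender_enc = LabelEncoder()
df['Gender'] = gender_enc.fit_transform(df['Gender'])
print(gender_enc.classes_.tolist())  # ['Female', 'Male'] -> codes 0, 1
print(df['Gender'].tolist())         # [1, 0, 0, 1]

score_enc = LabelEncoder()
df['Spending_Score'] = score_enc.fit_transform(df['Spending_Score'])
# Classes are sorted alphabetically, not by meaning: Average=0, High=1, Low=2
print(df['Spending_Score'].tolist())  # [2, 1, 0, 2]

# inverse_transform recovers the original strings
print(score_enc.inverse_transform([0, 1, 2]).tolist())  # ['Average', 'High', 'Low']
```

The codes follow alphabetical order, so `Low` ends up with a larger code than `High`. scikit-learn documents `LabelEncoder` as intended for target labels. `OrdinalEncoder` is the counterpart designed for feature columns.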

## Checking training data

`.info()` displays the number of rows, the column names, the non-null count and the data type of each column.

```python
data_training.info()
```

The output shows that the training data has 30 rows and 9 columns. Some columns are of type integer and the rest are of type object (text). No column contains null (empty) values.
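The same checks can also be done as values in code rather than by reading printed output. This is a minimal sketch on a small hypothetical data frame that stands in for the training data:

```python
import pandas as pd

# Hypothetical stand-in for the encoded training data
df = pd.DataFrame({
    'Gender': [1, 0, 0, 1],
    'Age': [3, 0, 1, 2],
    'Segmentation': ['A', 'B', 'A', 'D'],
})

df.info()  # prints row count, columns, non-null counts and dtypes

print(df.shape)                       # (4, 3): rows, columns
print(int(df.isnull().sum().sum()))   # 0 -> no missing values anywhere
```

A text target column such as `Segmentation` can stay as it is, because scikit-learn classifiers accept string class labels.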

## Determining Independent And Dependent Variables

```python
# Independent variables: every column except the target 'Segmentation'
x = data_training.drop('Segmentation', axis=1)
```

The code above drops the dependent variable (the `Segmentation` column), so `x` contains only the independent variables. When run, the result looks like the image below:

To get the dependent variable, select the `Segmentation` column from the training data frame by name, as in `data_training['Segmentation']`. Calling `.head()` on the result displays its first rows.
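Here is the split into independent and dependent variables as a self-contained sketch. The small data frame is hypothetical and only stands in for the encoded training file:

```python
import pandas as pd

# Hypothetical encoded training data (not the article's file)
data_training = pd.DataFrame({
    'Gender': [1, 0, 0, 1],
    'Ever_Married': [1, 1, 0, 0],
    'Segmentation': ['A', 'B', 'A', 'D'],
})

# Independent variables: every column except the target
x = data_training.drop('Segmentation', axis=1)
# Dependent variable: the target column, selected by name
y = data_training['Segmentation']

print(list(x.columns))  # ['Gender', 'Ever_Married']
print(y.head())         # first rows of the target column
```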

To load the testing data, use the pandas `read_excel` function again.

```python
data_test = pd.read_excel('data_test.xlsx')
```

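Next, drop the `Segmentation` column from the testing data so that `x_test` contains only the independent variables. If the testing data still contains text values, those columns must be encoded with the same mapping as the training data before prediction.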

```python
x_test = data_test.drop('Segmentation', axis=1)
```
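The test features must get the same integer codes as the training features. In this minimal sketch the encoder is fitted on the training column only and then reused on the test column with `transform`. The `Profession` values here are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw (not yet encoded) training and testing rows
train = pd.DataFrame({'Profession': ['Artist', 'Doctor', 'Engineer', 'Artist'],
                      'Segmentation': ['A', 'B', 'C', 'A']})
test = pd.DataFrame({'Profession': ['Engineer', 'Artist'],
                     'Segmentation': ['C', 'A']})

enc = LabelEncoder()
# Fit the mapping on the training column only...
train['Profession'] = enc.fit_transform(train['Profession'])
# ...then reuse it on the test column, so 'Artist' gets the same code in both
x_test = test.drop('Segmentation', axis=1)
x_test['Profession'] = enc.transform(x_test['Profession'])

print(train['Profession'].tolist())   # [0, 1, 2, 0]
print(x_test['Profession'].tolist())  # [2, 0]
```

Fitting a fresh encoder on the test data could assign different codes to the same category, which is why `transform` is used here. Note that `transform` raises a `ValueError` for categories that never appeared in the training data.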

The `Segmentation` column of the testing data holds the true labels (the dependent variable):

```python
y_test = data_test['Segmentation']
```

## Classification

To perform the classification, run the following code:

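```python
from sklearn.naive_bayes import GaussianNB

# Dependent variable: the Segmentation column of the training data
y = data_training['Segmentation']

modelnb = GaussianNB()
nbtrain = modelnb.fit(x, y)          # fit() returns the trained model
Y_predict = nbtrain.predict(x_test)  # predicted segment for each test row
print(Y_predict)
```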
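`y_test` holds the true labels, so the predictions can be compared against it. The sketch below runs the whole pipeline end to end and scores it with scikit-learn's `accuracy_score`. It uses small hypothetical, already-encoded data frames in place of the Excel files:

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Hypothetical, already-encoded data standing in for the Excel files
data_training = pd.DataFrame({
    'Gender':         [0, 1, 0, 1, 1, 0],
    'Spending_Score': [0, 0, 1, 2, 2, 2],
    'Segmentation':   ['A', 'A', 'A', 'D', 'D', 'D'],
})
data_test = pd.DataFrame({
    'Gender':         [0, 1],
    'Spending_Score': [0, 2],
    'Segmentation':   ['A', 'D'],
})

x = data_training.drop('Segmentation', axis=1)
y = data_training['Segmentation']
x_test = data_test.drop('Segmentation', axis=1)
y_test = data_test['Segmentation']

model = GaussianNB().fit(x, y)
y_predict = model.predict(x_test)

# Fraction of test rows whose predicted segment matches the true one
accuracy = accuracy_score(y_test, y_predict)
print(y_predict.tolist())
print(accuracy)
```

`GaussianNB` models each feature as normally distributed within a class. For integer-coded categorical features, scikit-learn's `CategoricalNB` is an alternative worth comparing.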