Determine Market Segmentation Using the Naive Bayes Method

    My college asked me to write an article about naive Bayes as part of my studies.

    Posted by Jagad on July 17, 2021

    Data mining is currently popular among informatics students, and some courses are dedicated to it. Naive Bayes is one of the most commonly taught methods in data mining courses.

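    Naive Bayes is a classification method based on probability and statistics. It is built on Bayes' theorem, named after the British scientist Thomas Bayes, and predicts future probabilities based on past experience. It is called "naive" because it assumes that the features are independent of each other given the class.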
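
    As a rough illustration of the idea, the sketch below applies Bayes' theorem, P(segment | score) = P(score | segment) * P(segment) / P(score), to a handful of made-up customer records. The records and the helper name posterior are hypothetical and are not taken from the article's data set.

```python
from fractions import Fraction

# Hypothetical past observations: (spending score, segment)
records = [
    ("High", "A"), ("High", "A"), ("Low", "A"),
    ("Low", "B"), ("Low", "B"), ("High", "B"),
    ("Low", "B"), ("Low", "B"),
]

def posterior(segment, score):
    """Return P(segment | score), computed with Bayes' theorem from counts."""
    total = len(records)
    in_segment = [r for r in records if r[1] == segment]
    # P(segment): share of all records that belong to the segment
    prior = Fraction(len(in_segment), total)
    # P(score | segment): share of the segment's records with this score
    likelihood = Fraction(sum(1 for r in in_segment if r[0] == score), len(in_segment))
    # P(score): share of all records with this score
    evidence = Fraction(sum(1 for r in records if r[0] == score), total)
    return likelihood * prior / evidence

print(posterior("A", "High"))  # 2/3
print(posterior("B", "High"))  # 1/3
```

    With these counts, a customer with a high spending score is twice as likely to belong to segment A as to segment B, so a classifier would pick A.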

    In this article, we will learn together how naive Bayes works.

    Import Library

    import pandas as pd
    import numpy as np

    The code above imports the pandas and numpy libraries, which will be used in the analysis phase. Pandas is used for processing data held in data frames, while NumPy is used for easy and fast array manipulation.
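
    To make the difference concrete, here is a minimal sketch with made-up values: pandas holds labeled, tabular data, while numpy operates on plain arrays. The column names and numbers are placeholders.

```python
import numpy as np
import pandas as pd

# A small data frame with labeled columns (placeholder values)
df = pd.DataFrame({"Age": [22, 35, 58], "Family_Size": [1, 3, 2]})

# numpy works on the underlying array, without the column labels
ages = df["Age"].to_numpy()
mean_age = np.mean(ages)

print(df.shape)  # (3, 2): three rows, two columns
print(mean_age)
```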

    Reading Data Using Python


    The training data is read with the read_excel function from the pandas library and stored in a data frame named data_training. The training data is an xlsx file.
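
    The reading step itself is not shown in the text, so the following is only a sketch of what it might look like. The file name data_training.xlsx is a placeholder and load_excel is a hypothetical helper. pd.read_excel is the pandas function for reading Excel files; for xlsx files it needs an Excel engine such as openpyxl to be installed.

```python
import os

import pandas as pd

# Placeholder file name; replace it with the real location of the training data
TRAINING_PATH = "data_training.xlsx"

def load_excel(path):
    """Read the first sheet of an Excel file into a data frame."""
    return pd.read_excel(path)

if os.path.exists(TRAINING_PATH):
    data_training = load_excel(TRAINING_PATH)
    print(data_training.head())  # preview the first five rows
else:
    print(f"{TRAINING_PATH} not found; adjust TRAINING_PATH first")
```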


    Converting Data To Integer

    Because the naive Bayes implementation in scikit-learn cannot work with training data in the form of strings, the string columns are converted to integers.

    Here is the code to convert the training data to integers.

    from sklearn.preprocessing import LabelEncoder

    # Columns to convert to integer codes
    columns = ['Gender', 'Ever_Married', 'Age', 'Graduated', 'Profession', 'Spending_Score']

    # Fit a separate encoder per column and keep it, so that the same
    # mapping can be applied to other data later
    encoders = {}
    for column in columns:
        encoders[column] = LabelEncoder()
        data_training[column] = encoders[column].fit_transform(data_training[column].values)

    After that, the values in these columns are integers.

    Training data converted to integers
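
    To see what the encoder does, the sketch below fits a LabelEncoder on a made-up column and prints the mapping it learned. LabelEncoder sorts the distinct values and numbers them from 0. Note that scikit-learn documents it as intended for target labels, so using it on feature columns is a convenience rather than the only option.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up values for illustration only
spending = pd.Series(["Low", "High", "Average", "Low"])

enc = LabelEncoder()
codes = enc.fit_transform(spending)

# classes_ holds the sorted distinct values; the position of each value is its code
mapping = {label: code for code, label in enumerate(enc.classes_)}

print(mapping)         # {'Average': 0, 'High': 1, 'Low': 2}
print(codes.tolist())  # [2, 1, 0, 2]
```

    The codes follow alphabetical order, not meaning: here "Low" gets a larger number than "High".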

    Checking Training Data

    Calling .info() on the data frame shows the data type of each column.
    Output of data_training.info()

    The output shows that the training data has 30 rows and 9 columns, that the columns are of type integer or object, and that no column contains null (empty) values.
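
    The same facts can also be checked programmatically. Here is a minimal sketch on a made-up data frame, since the article's 30-row training set is not reproduced here:

```python
import pandas as pd

# Made-up stand-in for the training data
df = pd.DataFrame({
    "Gender": [1, 0, 1],
    "Age": [22, 35, 58],
    "Segmentation": ["A", "B", "A"],
})

df.info()                        # prints column names, non-null counts and dtypes
n_rows, n_columns = df.shape     # (number of rows, number of columns)
null_counts = df.isnull().sum()  # number of missing values per column

print(n_rows, n_columns)         # 3 3
print(int(null_counts.sum()))    # 0
```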

    Determining Independent And Dependent Variables

    x = data_training.drop('Segmentation', axis=1)

    The code above drops the dependent variable, the Segmentation column, so that x holds only the independent variables. When run, the result looks like the image below:

    Data without Segmentation

    To select the dependent variable, index the data frame by column name, in this case the "Segmentation" column (for example, y = data_training['Segmentation']). To preview the result, call head() on it.

    Dependent variable only
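
    Put together, the split into independent and dependent variables might look like this on a made-up data frame. The variable names x and y follow the article; the values are placeholders.

```python
import pandas as pd

# Made-up stand-in for the encoded training data
data_training = pd.DataFrame({
    "Gender": [1, 0, 1, 0],
    "Age": [22, 35, 58, 41],
    "Segmentation": ["A", "B", "A", "C"],
})

# Independent variables: every column except Segmentation
x = data_training.drop('Segmentation', axis=1)

# Dependent variable: the Segmentation column only
y = data_training['Segmentation']

print(x.head())
print(y.head())
```

    drop returns a new data frame, so data_training itself still contains the Segmentation column afterwards.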

    Reading Testing Data

    To read the testing data, use the read_excel function from the pandas library and store the result in data_test.

    Reading the testing data

    Then the Segmentation column is dropped from the testing data, leaving only the independent variables.

    x_test = data_test.drop('Segmentation', axis=1)
    Drop dependent variable

    To select the Segmentation column of the testing data as the dependent variable, we can use this command.

    y_test = data_test['Segmentation']
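
    The text does not show it, but the testing data presumably needs the same string-to-integer conversion as the training data before it can be passed to the model. One way to do this, sketched here on made-up frames, is to fit one encoder per column on the training data and reuse it on the testing data. The column values are placeholders and this is not necessarily how the original analysis handled it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up training and testing frames
data_training = pd.DataFrame({"Gender": ["Male", "Female", "Male"],
                              "Graduated": ["Yes", "No", "Yes"]})
data_test = pd.DataFrame({"Gender": ["Female", "Male"],
                          "Graduated": ["Yes", "Yes"]})

encoders = {}
for column in ["Gender", "Graduated"]:
    # Learn the mapping from the training data only
    encoders[column] = LabelEncoder().fit(data_training[column])
    # Apply the same mapping to both data sets
    data_training[column] = encoders[column].transform(data_training[column])
    data_test[column] = encoders[column].transform(data_test[column])

print(data_test)
```

    transform raises a ValueError if the testing data contains a value the encoder did not see during fitting, which is one reason to check both files for unexpected categories first.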


    To perform the classification, you can run the following command:

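    from sklearn.naive_bayes import GaussianNB

    # Train a Gaussian naive Bayes model on the training data
    modelnb = GaussianNB()
    nbtrain = modelnb.fit(x, y)

    # Predict the segment of each row in the testing data
    Y_predict = nbtrain.predict(x_test)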
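
    For a self-contained picture of the whole flow, the sketch below trains GaussianNB on a tiny made-up data set and scores its predictions with accuracy_score from scikit-learn. The numbers are placeholders chosen so that the two segments are clearly separated. They are not the article's data, and the accuracy step is an addition that the article does not describe.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Made-up, already encoded training data
x = pd.DataFrame({"Age": [20, 22, 25, 60, 62, 65],
                  "Spending_Score": [0, 0, 1, 2, 2, 1]})
y = pd.Series(["A", "A", "A", "B", "B", "B"])

# Made-up testing data with known segments
x_test = pd.DataFrame({"Age": [21, 63], "Spending_Score": [0, 2]})
y_test = pd.Series(["A", "B"])

# Train the model and predict the segments of the testing rows
modelnb = GaussianNB()
nbtrain = modelnb.fit(x, y)
Y_predict = nbtrain.predict(x_test)

# Share of testing rows whose predicted segment matches the known one
accuracy = accuracy_score(y_test, Y_predict)

print(Y_predict)
print(accuracy)
```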


    1. 181080200142 Jagad Yudha Awali
    2. 181080200134 Dewi Nur Afidah
    3. 181080200133 Ruri Aditya Pratama
    4. 181080200128 Tutut Anjarsari
    5. 181080200149 Jefry Fernando
