What is Factor Analysis? | Explained Factor Analysis ~ Tech Era News

Data analysis is important for businesses because data-driven choices are the only way to be truly confident in business decisions.

Data analysis is also important in research because it makes the work a lot simpler and more accurate.

One such method is factor analysis.

1 - What is factor analysis

2 - Latent variables

3 - Assumptions in factor analysis

4 - Purpose of factor analysis

5 - Types of factor analysis

6 - Issues with factor analysis

7 - Basic logic of factor analysis

1 - What is Factor Analysis

Factor analysis is a statistical technique that is used to reduce a large number of variables into a smaller number of factors.

For example, it is possible that variations in five observed variables mainly reflect the variation in one unobserved variable.

It is a form of dimension reduction, since it reduces the dimension, or the total number of variables, in the data set.

Factor analysis is a kind of latent variable model.

Consider a job satisfaction questionnaire.

A person's satisfaction with a job can be based on numerous factors, such as satisfaction with the job role and whether it matches the person's qualifications, satisfaction with the supervisor (which can in turn depend on appraisals or communication), satisfaction with co-workers, pay, etc.

As another example of factor analysis, say that you are a foodie and you want to pick a restaurant to go to.

You start checking reviews of different restaurants and find that the reviews are organized around six variables, which are

Waiting time

Cleanliness

Staff behavior

Taste of food

Food freshness and

Food temperature.

Too many variables make it difficult to pick a particular restaurant.

Let's say the two factors you really care about when picking a restaurant are service and food quality.

The variables in the reviews can be broadly grouped under these two factors: waiting time, cleanliness and staff behavior reflect service, while taste of food, food freshness and food temperature reflect food quality.

This is what factor analysis does.

Service and food quality are not really present in the data but are derived from it; these are called latent variables.
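To make the restaurant example concrete, here is a minimal Python sketch that simulates the six review variables from two hidden factors. The number of reviews, the 0.8 weight, the variable names and the use of NumPy are all assumptions made for illustration; nothing here comes from real review data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n_reviews = 2000                # hypothetical number of reviews

# Two latent factors that are never stored in the data set.
service = rng.normal(size=n_reviews)
food_quality = rng.normal(size=n_reviews)

def rating(latent, weight=0.8):
    """An observed rating: part latent factor, part independent noise."""
    noise = rng.normal(size=n_reviews)
    return weight * latent + np.sqrt(1 - weight**2) * noise

names = ["waiting_time", "cleanliness", "staff_behavior",
         "taste", "freshness", "temperature"]
ratings = np.column_stack([
    rating(service), rating(service), rating(service),
    rating(food_quality), rating(food_quality), rating(food_quality),
])

# Correlation matrix of the six observed variables (columns).
corr = np.corrcoef(ratings, rowvar=False)
print(names)
print(np.round(corr, 2))
```

In the printed matrix the three service variables correlate strongly with each other, and so do the three food variables, while correlations across the two groups stay close to zero. That block pattern is what factor analysis picks up on.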

2 - Latent Variables

In statistics, latent variables are variables that are not directly observed.

These are what we call factors, and they are difficult to measure numerically.

Mathematical models that aim to explain observed variables in terms of latent variables are called latent variable models.

Hence factor analysis is a latent variable model.

Examples of latent variables are

Quality of life

Business confidence

Morale

Happiness and

Conservatism among others.

3 - Assumptions in Factor Analysis

Let's have a look at these.

Firstly, we assume that our data is clean: there should be no outliers or missing values.

Secondly, the sample size is expected to be greater than the number of factors.

Thirdly, the variables are expected to be interrelated.

The concept of factor analysis is based on correlation in the data, so that correlated variables can be grouped together.

We can perform Bartlett's test of sphericity to check for this correlation.

Fourthly, metric variables are expected, that is, the variables should be numeric and measured on an interval scale.

Lastly, the data is preferred to be normalized; however, multivariate normality is not necessary.
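As a rough illustration of the interrelation assumption, the sketch below computes the Bartlett's test of sphericity statistic with NumPy. The function name `bartlett_sphericity` and the simulated data are assumptions for this example, not part of any particular library.

```python
import numpy as np

def bartlett_sphericity(data):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test of sphericity.

    data: array of shape (n_observations, n_variables).
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # log-determinant of the correlation matrix (0 when variables are uncorrelated)
    _, logdet = np.linalg.slogdet(corr)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

rng = np.random.default_rng(1)
shared = rng.normal(size=(500, 1))                      # one common source
correlated = shared + 0.5 * rng.normal(size=(500, 4))   # 4 interrelated variables
independent = rng.normal(size=(500, 4))                 # 4 unrelated variables

chi_corr, dof = bartlett_sphericity(correlated)
chi_ind, _ = bartlett_sphericity(independent)
print("correlated data: ", round(chi_corr, 1), "on", dof, "degrees of freedom")
print("independent data:", round(chi_ind, 1), "on", dof, "degrees of freedom")
# A p-value can be obtained from the chi-square distribution,
# for example with scipy.stats.chi2.sf(chi_square, dof) if SciPy is installed.
```

A chi-square value that is large relative to its degrees of freedom suggests the variables are correlated enough for factor analysis to be worthwhile; a small value suggests there is little shared structure to find.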

4 - Purpose of Factor Analysis

The primary purpose of factor analysis is data reduction.

Having too many related fields can make it difficult to analyze the data, so factor analysis reduces the number of variables.

Factor analysis also helps in latent variable discovery, as we saw in the earlier examples.

Some factors, such as empathy, cannot be measured directly but can be formulated using other variables.

Factor analysis supports the simplification of items into a subset of concepts.

Sometimes many fields in our data signify the same thing. In the restaurant example, delay in serving, staff behavior and cleanliness all signify the same factor, which is service.

Moreover, with factor analysis you can assess the dimensionality and homogeneity of the data.

5 - Types of Factor Analysis

Factor analysis can be broadly classified into two types: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA).

Exploratory factor analysis is used to discover the underlying structure in the data, using something like a correlation matrix. It is used for getting insights out of the data.

Confirmatory factor analysis builds on the insights derived in EFA.

CFA is used to test those expectations, and it makes use of equations for modeling the structure.

EFA is further divided into many types: the very popular principal component analysis (PCA); common factor analysis, or just factor analysis; image factoring, which makes use of a correlation matrix derived from OLS regression; the maximum likelihood method, which is again based on the correlation matrix; and other methods such as alpha factoring and weighted least squares.

Out of these, the most commonly used ones are principal component analysis and common factor analysis.

6 - Issues with Factor Analysis

First, you need to understand whether to use principal component analysis or factor analysis.

Next, you should know how to interpret the results of your analysis.

Finally, you need to figure out how many factors to pick.

Let's address these issues one by one.

Principal component analysis tries to find components that are composites of the observed variables.

For example, in a house pricing data set, PCA would identify that the air quality index is closely related to the number of parks in the locality.

In factor analysis, on the other hand, we assume that there are some latent factors, factors that cannot be measured directly and can only be derived from the given numeric variables.

Secondly, PCA takes into account the total variance in the data, that is, the sum of unique variance, error variance and common variance.

Factor analysis, however, considers only the common (shared) variance.

So when you want to find a latent variable behind many variables, use factor analysis; when you simply want to reduce the number of variables while keeping as much of the variance as possible, use PCA.

As a rule of thumb, when the number of variables is more than 30, the results of PCA and FA tend to be very similar.
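The total-versus-common variance distinction can be sketched numerically. The example below assumes a small hypothetical correlation matrix and uses squared multiple correlations as communality estimates, which is one common choice in principal axis factoring rather than the only way to do common factor analysis.

```python
import numpy as np

# Hypothetical correlation matrix: four variables, each pair correlated at 0.5.
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)

# PCA analyses R as-is. The 1s on the diagonal mean that each variable's
# total variance (common + unique + error) enters the analysis.
pca_eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Common factor analysis replaces the diagonal with communality estimates,
# here squared multiple correlations (SMC), so only shared variance is analysed.
smc = 1 - 1 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
fa_eigenvalues = np.sort(np.linalg.eigvalsh(R_reduced))[::-1]

print("variance analysed by PCA:", round(pca_eigenvalues.sum(), 3))
print("variance analysed by FA: ", round(fa_eigenvalues.sum(), 3))
```

The PCA total equals the number of variables, because every standardized variable contributes a variance of 1. The factor analysis total is smaller, because the unique and error parts of each variable have been left out.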

Next, you need to address how to interpret the results of the analysis.

For this we use something called loadings. A factor loading is basically the correlation coefficient between a variable and a factor.

Let's say you have 10 variables that you want to reduce to 3 factors.

For that you make a table of loadings, with one row per variable and one column per factor, to account for how much of the variance of each variable is explained by each factor.

A loading ranges from -1 to 1, and its square, which ranges from 0 to 1, is the share of the variable's variance explained by that factor. If a significant amount of a variable's variance is explained by a factor, the variable can be represented by that factor.

For deeper analysis you can calculate the communality of a variable, which is given by the horizontal (row-wise) sum of squares of its loadings.

For example, for variable 1 it would be 0.7² + 0.2² + 0.1² = 0.54.

Similarly, the vertical (column-wise) sum of squares of the loadings for a factor is called its eigenvalue.

For example, for factor 1 the eigenvalue would be 0.7² + 0.4² + 0.7² + 0.1² and so on down the column.

Sometimes a variable shows a high correlation with more than one factor. This is called cross-loading, and in this scenario factor rotation should be performed.
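The row and column sums described above can be worked through in a few lines of Python. The loading table below is hypothetical: only the first row (0.7, 0.2, 0.1) and the first column (0.7, 0.4, 0.7, 0.1) follow the figures in the text, the table is cut to four variables for brevity, and the 0.4 cross-loading threshold is an assumed rule of thumb.

```python
# Rows are variables, columns are factors (hypothetical loading table).
loadings = [
    [0.7, 0.2, 0.1],  # variable 1
    [0.4, 0.6, 0.2],  # variable 2
    [0.7, 0.1, 0.3],  # variable 3
    [0.1, 0.2, 0.8],  # variable 4
]
n_factors = len(loadings[0])

# Communality: horizontal sum of squared loadings, one value per variable.
communalities = [sum(x**2 for x in row) for row in loadings]

# Eigenvalue: vertical sum of squared loadings, one value per factor.
eigenvalues = [sum(row[j]**2 for row in loadings) for j in range(n_factors)]

# Cross-loading: a variable with a sizeable loading on more than one factor.
threshold = 0.4  # assumed cut-off
cross_loaded = [i for i, row in enumerate(loadings, start=1)
                if sum(abs(x) >= threshold for x in row) > 1]

print("communalities:", [round(c, 2) for c in communalities])
print("eigenvalues:  ", [round(e, 2) for e in eigenvalues])
print("cross-loaded variables:", cross_loaded)
```

With these numbers, variable 1 has a communality of 0.54, factor 1 has an eigenvalue of 1.15 over the four rows shown, and variable 2 is flagged as cross-loading because it loads on both factor 1 and factor 2.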

So we know how to interpret the results of the analysis. Now, how do we know how many factors to select?

When we talk about sample size, the rule of thumb is to have a minimum of 5 observations per variable.

That is, for let's say 5 variables you should have 25 observations, for 10 variables 50 observations, and so on.

But when it comes to deriving factors from, let's say, a hundred variables, how do we know how many factors to take: 5, 8, 10?

For this you can make a scree plot, which plots the eigenvalues in decreasing order, and look for the bend in the plot.

However, this is not very intuitive. You can instead use the latent root criterion, which states that if the eigenvalue of a factor (the vertical sum of squares of all its loadings) is greater than 1, you should include that factor in your analysis.
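A minimal sketch of the latent root criterion, assuming a hypothetical correlation matrix for six variables that form two related groups (similar in spirit to the restaurant example). The sorted eigenvalues are also the values a scree plot would show.

```python
import numpy as np

# Hypothetical correlation matrix: two groups of three variables.
# Variables within a group correlate at 0.6, across groups at 0.1.
R = np.full((6, 6), 0.1)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)

# Eigenvalues in decreasing order (the y-values of a scree plot).
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Latent root criterion: keep factors whose eigenvalue exceeds 1.
n_factors = int(np.sum(eigenvalues > 1.0))

# Share of the total variance carried by each eigenvalue.
explained = eigenvalues / eigenvalues.sum()

print("eigenvalues:", np.round(eigenvalues, 2))
print("factors to keep:", n_factors)
print("variance explained by kept factors:", round(explained[:n_factors].sum(), 2))
```

For this matrix two eigenvalues (2.5 and 1.9) are greater than 1 and the remaining four are 0.4 each, so the criterion keeps two factors, and the bend in a scree plot would sit after the second point.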

7 - Basic Logic of Factor Analysis

Factor analysis starts with the items that you want to reduce.

It creates a mathematical combination of the variables that maximizes the variance you can predict in all of the variables; this is the first principal component or factor.

A new combination of items is then formed from the residual variance, maximizing the variance you can predict in what is left; this is your second component or factor.

Continue this until all the variance is accounted for, and then select the minimal number of factors.
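The extract-then-work-on-what-is-left logic can be sketched as follows. This is a principal-component style extraction on an assumed correlation matrix, and the function name `extract_components` is made up for the example; dedicated factor analysis routines use more refined estimation.

```python
import numpy as np

def extract_components(R, n_components):
    """Extract components one at a time, each from the residual of the last."""
    residual = np.array(R, dtype=float)
    columns = []
    for _ in range(n_components):
        values, vectors = np.linalg.eigh(residual)   # eigenvalues in ascending order
        top_value = max(values[-1], 0.0)             # largest remaining variance
        loading = np.sqrt(top_value) * vectors[:, -1]
        columns.append(loading)
        # Remove what this component explains; the next one sees only the rest.
        residual = residual - np.outer(loading, loading)
    return np.column_stack(columns), residual

# Hypothetical correlation matrix: two groups of three related variables.
R = np.full((6, 6), 0.1)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)

loadings, residual = extract_components(R, 2)
print("variance explained per component:", np.round((loadings**2).sum(axis=0), 2))
print("largest leftover correlation:", round(np.abs(residual - np.diag(np.diag(residual))).max(), 2))
```

Each pass takes the combination that explains the most of whatever variance remains, so the explained variance shrinks from one component to the next, and you stop once the leftover is small enough to ignore.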

With that, you can finally interpret the factors using the rotated matrix and loadings.
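To show what rotation does to a loading matrix, here is a small sketch for the two-factor case. It assumes a hypothetical unrotated loading matrix and finds the rotation angle by brute-force search over the varimax criterion; this is a simplification chosen for readability, since statistical packages use an iterative algorithm that works for any number of factors.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return np.var(loadings**2, axis=0).sum()

def rotation_matrix(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def rotate_two_factors(loadings, n_angles=3600):
    """Try many orthogonal rotations and keep the one with the simplest structure."""
    best_angle, best_value = 0.0, varimax_criterion(loadings)
    for angle in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        value = varimax_criterion(loadings @ rotation_matrix(angle))
        if value > best_value:
            best_angle, best_value = angle, value
    return loadings @ rotation_matrix(best_angle)

# Hypothetical unrotated loadings: every variable loads on both factors.
unrotated = np.array([
    [0.6, 0.5], [0.7, 0.4], [0.6, 0.6],
    [0.6, -0.5], [0.7, -0.4], [0.6, -0.6],
])
rotated = rotate_two_factors(unrotated)
print(np.round(rotated, 2))
```

After rotation each variable has one large loading and one small one, so the first three variables can be read as one factor and the last three as the other. The sign of a rotated column is arbitrary, and the communalities are unchanged because the rotation is orthogonal.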
