THE COMPETITIVE SEMICONDUCTOR MANUFACTURING HUMAN
RESOURCES PROJECT:
Second Interim Report
CSM-32
Clair Brown, Editor
14. Statistical Tools for
Industry Data
Linda Sattler
14.1 Introduction
14.2 The HR Data Sets
14.3 An Overall Picture
14.4 Principal Components Analysis
14.5 Pursuing Interesting Leads
APP Appendix
REF References
14.1 Introduction
Validating general theories is difficult with the kind of data gathered
in a detailed industry study. The number of variables is extremely
large relative to the number of companies. Alternate theories can
always be posed and high statistical significance is difficult to
obtain. The kinds of data and the number of data points are considered
drawbacks by many organizational researchers. However, this opinion
is not shared by all. Much of the "drawback" comes not
from the data, but from the limited tools one is familiar with using.
Standard hypothesis testing tools need all distribution assumptions
to be met with well-measured quantitative data. Highly significant
results are more easily achieved with many data points and few variables.
This is the opposite of what occurs in industry studies. Organizational
data sets are certainly not lacking in information. Most researchers
gain a great deal of knowledge from their long hours spent on a
project. The problem should not be looked at as "How do I prove
with absolute certainty that my theory is correct?", but "How
do I convey to my readers the information I have gathered?"
To guide readers through the industry and have them see with the
researcher's eye what lies there is a great accomplishment
for an industry study. How much easier it is to guide one through
a few companies rather than hundreds! What is needed for industry-wide
studies are the tools to guide, to visualize, and to let the readers
see for themselves (see Zeisel [1] for other descriptive visual
tools).
This chapter discusses the multivariate analysis techniques used
in the Competitive Semiconductor Manufacturing Study to study the
data. These tools not only give a reader a fair idea of the industry,
but also help the researcher pinpoint the promising areas for future
research. As interesting questions are extracted, other tools are
used to delve into the data and help answer or refine these questions.
Problems with the data are discussed and methods used to overcome
the difficulties are explained. The chapter gives an analysis of
some of the semiconductor industry findings as well as suggestions
for further research.
14.2 The HR Data Sets
Before 1995, most of the organizational data collected from
the fabrication plants came through observations and employee interviews.
After each two-day site visit, a large (50-70 page) document on
the particular plant visited was produced. In order to utilize the
site visit data in analyses, over five hundred organizational factors
were created for measurement purposes. These factors tended to be
factual in nature and crudely scored (True/False or High/Medium/Low).
Fifteen fabrication plants have had their organizational data coded
in this manner.
In 1995, a Human Resources Questionnaire was sent to the fabrication
plants. The completed questionnaire gave more detailed information
on the human resource practices and listed how practices changed
over time. At the time this chapter was written, seven completed
questionnaires had been returned.
Because of the smaller number of fabs with the newer data and the
lack of time to do a more complete analysis, the older data (from
the site visits) are used in this chapter. However, several techniques
are used with the newer data to show the consistency of the trends
and the similarities in the two data sets.
14.3 An Overall Picture
To begin the exploratory process of analyzing the organizational
data, it is good to start with an overall picture of the fabrication
plants. The difficulty is in condensing the hundreds of organizational
variables into just a few so that it is possible to graph the fabrication
plants and to identify patterns. There are several techniques that
can be used and the data need to be modified to a usable form. The
data modifications are first explained for the different techniques
and then the analysis of the results is presented.
Overall Picture Techniques and Data Modifications
The organizational data set from the
site visits contains 530 variables from 15 fabrication plants, most
of which are coded either High/Medium/Low or True/False. There is
a small subset of quantitative variables (such as absentee rate
or number of SPC charts on each machine). There are
also a number of missing values for each fab, as shown below:
insert table 1
The fabs are coded for confidentiality
reasons and, throughout this paper, a better explanation of the
data may exist, but cannot be given because of confidentiality.
Multidimensional Scaling
The first technique proposed is a method
that makes use of the crude scoring of most of the variables
and circumvents the issues of missing data and qualitative scoring.
The goal is to see the fabs on a two-dimensional graph
in terms of how far apart they are organizationally.
Given pairwise distances (with fifteen fabs, that is 105 distance
measures), a technique called Multidimensional Scaling finds the
two-dimensional graph that minimizes the discrepancy between
the actual distances and the Euclidean distances on the graph. For
a good description of this technique see [9].
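The embedding step can be illustrated with a minimal sketch. It assumes classical (Torgerson) scaling, which is only one member of the multidimensional scaling family and is not necessarily the variant used in the study; the function name `classical_mds` and the toy unit-square data are invented for illustration. With n fabs there are n(n-1)/2 pairwise distances, which gives 105 for n = 15.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed n points in k dimensions from an n x n distance matrix.

    Assumption: classical (Torgerson) scaling, not the study's exact method.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:k]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy check (invented data): the corners of a unit square.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
toy_dist = np.linalg.norm(square[:, None, :] - square[None, :, :], axis=-1)
coords = classical_mds(toy_dist, k=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered coordinates match the original points only up to rotation and reflection, which is why the axes of such a graph carry no meaning and only the inter-point distances should be read.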
In order to get distances between fabrication plants, a technique
by Sokal and Michener is modified to take into account missing variables
and High/Medium/Low scoring. Sokal and Michener's simple matching
coefficient technique is described in the appendix (for an introduction
to this technique see [9], whose authors also refer to the original
work [13]).
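As a rough illustration of a matching-based distance, the sketch below computes the fraction of jointly scored factors on which two fabs differ, skipping any factor that is missing for either fab. This is an assumption about how missing values might be handled; the actual modification, including its treatment of High/Medium/Low scores, is given in the appendix and may differ. The names `matching_dissimilarity` and `pairwise_dissimilarities` and all data are hypothetical.

```python
from itertools import combinations

MISSING = None  # hypothetical marker for an unscored factor

def matching_dissimilarity(fab_a, fab_b):
    """Fraction of jointly scored factors on which two fabs differ."""
    pairs = [(a, b) for a, b in zip(fab_a, fab_b)
             if a is not MISSING and b is not MISSING]
    if not pairs:
        raise ValueError("no factors scored for both fabs")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)

def pairwise_dissimilarities(fabs):
    """Map each unordered pair of fab labels to its dissimilarity."""
    return {(p, q): matching_dissimilarity(fabs[p], fabs[q])
            for p, q in combinations(sorted(fabs), 2)}

# Invented toy data: three fabs, five factors each.
toy_fabs = {
    "A": ["H", "T", "L", None, "F"],
    "B": ["H", "F", "L", "M", "F"],
    "C": ["M", "T", None, "M", "T"],
}
toy_result = pairwise_dissimilarities(toy_fabs)
```

One minus this value is the percentage of factors alike, the form in which pairwise similarities are quoted later in the chapter.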
14.4 Principal Components Analysis
Another technique that can give a global picture of the organizations
is principal components analysis, which was used earlier with the
performance metrics. First developed by Hotelling [6], principal
components analysis attempts to reduce the number of dimensions
of the data without losing much information. The first principal
component is the linear combination of the variables that gives
the maximum variance between the fabrication plants. The second
principal component is the linear combination of the factors that
gives the second largest variance between the fabs and is also orthogonal
to the first principal component. With fifteen fabs, up to fourteen
principal components are possible, but, hopefully, most of the fab
variance is explained in the first few components. Plotting the
fabs on some of the principal components can give one an idea of
how the fabs may or may not be related to each other. For a good
description of Principal Components Analysis see [9]. A description
of how the data were coded for principal components analysis can
be found in the appendix.
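The mechanics can be sketched as follows, assuming the data have already been converted to numeric form (the study's own coding is in the appendix and is not reproduced here). The function name `principal_components` and the random toy matrix are invented for illustration.

```python
import numpy as np

def principal_components(data):
    """Return (scores, explained_variance_ratio) for a rows-by-variables matrix."""
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    # SVD of the centered data: rows of vt are the component directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s            # coordinates of each row on each component
    variances = s ** 2        # proportional to the variance along each component
    return scores, variances / variances.sum()

# Invented toy data: 15 "fabs" measured on 40 numeric variables.
rng = np.random.default_rng(0)
toy = rng.normal(size=(15, 40))
scores, ratio = principal_components(toy)
n_nonzero = int((ratio > 1e-12).sum())
```

Because centering removes one degree of freedom, fifteen rows yield at most fourteen components with nonzero variance. Plotting `ratio` against the component index gives a scree-style view of how much variance each component explains.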
Analysis of Overall Picture
Multidimensional Scaling
Using the first method of multidimensional
scaling and the percent differences, Figure
14-1 is obtained. The distances between the different fabs are
the key metrics in this graph. A fab's relationship to the
X or Y axis is not meaningful (the axes may be ignored).
The fabs are marked with the following distinguishing characters:
J - Japan
T - Taiwan
U - United States
E - Europe
One of the most striking aspects of this graph is the separation of
the Japanese fabs from the non-Japanese fabs. There is, as is shown,
a general progression from left to right of fabs located east to
west. This east-to-west pattern of organizational practices was
also discovered in the automobile manufacturing study [14].
The seemingly large distances between some of the Japanese fabs
are a product of multidimensional scaling. Two of these distances
are actually the smallest in the CSM study: one pair of Japanese
fabs has 80% of the factors alike and another pair has
73% of the factors alike. The average similarity between any two
fabs is about 60%.
When analyzing the data, several clustering techniques were used
for verification, and it is interesting that the same two pairs
of Japanese fabs always cluster together. There is, however, a "medium"
sized distance between the two pairs of Japanese fabs. This issue
will be discussed further in the next section.
The complexities of analyzing organizations can be appreciated by
noting that not only does geographical location vary from left to
right, but manufacturing performance and levels of participative
management scores tend to follow this same trend. With so few companies,
separating possible cultural factors from purely organizational
ones as a determinant of performance is extremely difficult, if
not impossible. What can be done is to try to discover the underlying
differences and use good judgment as to whether the factors are
relevant to performance or simply coincidental with geographical
location.
Principal Components Analysis
Using principal components analysis,
the first two components can be graphed (see Figure
14-2). These first two components make up about 31% of the total
variation in the organizations as shown in the Scree graph in Figure
14-3. A Scree graph (Cattell [1]) plots the percentage of variance
explained by each eigenvalue. Figure 14-2 is coded in the following
manner:
***: high performer in both manufacturing productivity and quality
**: high performer in one performance dimension and medium performer in the other
*: medium performer in both dimensions, or high performer in one dimension and low performer in the other
-: medium performer in one dimension and low performer in the other
--: low performer in both manufacturing productivity and quality
H, M, L: High PIRK, Medium PIRK and Low PIRK. PIRK is an index
of participative management given to each fab and based on the organizational
data from the site visits (see [12] for an explanation of PIRK).
There is a trend from high performance to low performance and high
PIRK to low PIRK, all in the second principal component. There is
also an east to west trend on this second component, although not
as striking.
The first principal component, which represents 19% of the total
variation, is far more puzzling. At first, no patterns were detected
with the organizations on this dimension. After some futile attempts
at deciphering the differences between the two pairs of Japanese
fabs (which have a large distance between them on this component),
it was noticed that the underlying organizational variance in this
component seems to reflect some "researcher bias": fabs
visited earlier in the study are to the right of the fabs visited
later. There are several reasons for this:
1. An important member of the CSM team
who contributed a great deal to the organization data had gone
on all of the first seven fab visits and none of the remaining
eight. He also coded the organizational factors for these first
fabs. After he stopped visiting fabs, different members of the
HR team went on the visits and the remaining eight fabs were coded
into the organizational factors by different members of the HR
team. This would explain why his fabs were clustered to the right
of the graph while the other fabs were spread around.
2. The organizational H/M/L and T/F variables were created after
the first six fabs were visited. The first report is based on
these six fabs. It is possible that these factors displayed certain
unique qualities of the first six fabs, but special qualities
of the remaining fabs may not have had a variable or factor to
represent them.
The Human Resource Questionnaire is gathering
much of the information that is acquired on fab visits. This should
increase the standardization and significantly cut down on the bias.
It is important in any study to constantly review the methodology
and look for possible biases entering into the process (see [8]).
After exploring the bias a bit further, it was discovered that the
main problem was not in the values that were coded, but in which
data were coded. Fabs visited close together in time tended to have the
same factors scored or missing. The fabs covered in the first
report showed this same bias. Because factors with a number of missing
values tend to be correlated with other factors that also have a number
of missing values, this bias propagated through the data correction
methods. Removing factors that had a large number of missing data
points or factors whose missing values were unevenly distributed
(those, for example, that had four or five missing data points from
the first fabs visited, but few or no missing data points from those
fabs visited later) gave good results. After removing variables
that had missing values from more than half the fabs (8 or more
missing values) and those that had four or more missing data points
from one year of fab visits but fewer than two from the other year
of fab visits, we were left with 238 organizational factors. The
principal components graph was recreated with this subset of data.
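The two removal rules can be written as a short screen. This is a sketch under assumptions: the function name `keep_factor`, the `None` marker for a missing value, and the toy split of seven "early" and eight "late" fabs are all invented for illustration, since the report does not state how many fabs were visited in each year.

```python
def keep_factor(values, visit_year, max_missing=7, lopsided_high=4, lopsided_low=2):
    """Decide whether a factor survives the missing-data screen.

    values: one score per fab, None where unscored (hypothetical encoding).
    visit_year: one label per fab, aligned with values; two labels assumed.
    """
    missing_by_year = {}
    for value, year in zip(values, visit_year):
        missing_by_year.setdefault(year, 0)
        if value is None:
            missing_by_year[year] += 1
    # Rule 1: drop factors missing for more than half of the fifteen fabs.
    if sum(missing_by_year.values()) > max_missing:
        return False
    counts = list(missing_by_year.values())
    # Rule 2: drop factors with many missing values in one visit year
    # and almost none in the other.
    if max(counts) >= lopsided_high and min(counts) < lopsided_low:
        return False
    return True

def toy_factor(early_missing, late_missing):
    """Build an invented 15-fab factor with the given missing counts."""
    early = [None] * early_missing + ["T"] * (7 - early_missing)
    late = [None] * late_missing + ["T"] * (8 - late_missing)
    return early + late

visit_year = ["early"] * 7 + ["late"] * 8   # assumed split, for illustration only
balanced = keep_factor(toy_factor(2, 2), visit_year)
lopsided = keep_factor(toy_factor(4, 0), visit_year)
sparse = keep_factor(toy_factor(4, 4), visit_year)
```

Here `balanced` is kept, while `lopsided` and `sparse` are removed by the second and first rule respectively.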
From Figure 14-4 one can see that much
of the bias has been removed since the "bias" axis is
either gone or relegated to the second principal component. Although
the first six fabs visited are high on the second principal component,
they no longer cluster together as much. As well, one pair of Japanese
fabs is no longer a "pair" and there is now a large distance
between them.
This graph also contains an interesting diagonal trend of high to
low PIRK fabs (or high to low performance).
- First Principal Component: This component
appears to contain many participative management factors, which
is not surprising. They do tend to relate, however, mostly to
leadership roles. The factors involved include operator team
leaders, technicians training other job categories, and all
of upper level management involved in teams. This component
is called the "Employee Leadership" component, where
a high positive number indicates a great deal of self-leadership.
- Second Principal Component: This
component contains many historical factors that are not related
to PIRK. These factors include mass turnover, mass hiring,
another fab opened on-site, and threat of imminent shutdown.
The common thread of these factors appears to be uncertainty
about one's job. This component is called the "Uncertainty
Level" component, where a large negative number indicates
stress.
What is interesting in this graph is
that it appears that only fabs with a great deal of self-leadership
can overcome periods of uncertainty and remain a highly participative,
high-performance fab. As self-leadership increases, more uncertainty
can be handled without altering the high PIRK/performance.
Robustness of Graphs
It is worthwhile to explore the robustness
issues of the overall picture graphs in the previous sections. If the
data set were slightly different, would the results be the
same? One way to check this with the multidimensional scaling graph
is to use the difference measure on only a random half of the data
set. Because the data set is so large, the results should not be
too different. The appendix gives a detailed explanation of the
simulation that is used to generate Figure
14-6. The distinguishing characteristics of Figure 14-1 are
still present in Figure 14-6, so the graph seems to be robust.
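The random-half idea can be sketched as below. This is not the simulation described in the appendix; it only assumes a percent-difference measure recomputed on a random half of the factors, and the function names, the two-group toy data, and the noise level are all invented.

```python
import random
from itertools import combinations

def dissimilarity(fab_a, fab_b, factor_ids):
    """Share of the chosen, jointly scored factors on which two fabs differ."""
    pairs = [(fab_a[i], fab_b[i]) for i in factor_ids
             if fab_a[i] is not None and fab_b[i] is not None]
    return sum(a != b for a, b in pairs) / len(pairs)

def split_half_distances(fabs, rng):
    """Recompute all pairwise dissimilarities on a random half of the factors."""
    n_factors = len(next(iter(fabs.values())))
    half = rng.sample(range(n_factors), n_factors // 2)
    return {(p, q): dissimilarity(fabs[p], fabs[q], half)
            for p, q in combinations(sorted(fabs), 2)}

# Invented toy data: two groups of three fabs, each a noisy copy of a group profile.
rng = random.Random(0)
n_factors = 400

def noisy_copy(profile, flip_prob=0.05):
    """Copy a True/False profile, flipping each entry with a small probability."""
    return [(not v) if rng.random() < flip_prob else v for v in profile]

profiles = {g: [rng.random() < 0.5 for _ in range(n_factors)] for g in "XY"}
toy_fabs = {f"{g}{i}": noisy_copy(profiles[g]) for g in "XY" for i in range(3)}
half_dist = split_half_distances(toy_fabs, rng)
within = [d for (p, q), d in half_dist.items() if p[0] == q[0]]
between = [d for (p, q), d in half_dist.items() if p[0] != q[0]]
```

If the group structure is robust, the within-group distances computed on the half sample should stay clearly below the between-group distances, and the resulting scaling graph should keep its main features.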
Unfortunately, with principal components analysis, a simple simulation
is far more complicated. Not only can the graph be flipped or rotated,
but with similar percents of total variance, the first principal
component can become the second, the second can become the third,
the third can become the first, etc. Each simulation would result
in a totally different graph in the first two principal components.
In this case, simply running a few simulated graphs and checking
for similarities by eye may be the best that can be done.
Using Data from the HR Questionnaire
Multidimensional Scaling and Principal
Components Techniques were used on the new HR questionnaire data
with promising results (see Figure 14-7
through Figure 14-9). These graphs were
created for seven fabs with 418 HR factors, using the same data transformations
as in the previous graphs. Two of the fabs represented
with the new HR questionnaire data set have not been coded into
the 500+ organizational factors using the site visit data (at the
time of this report, 5 fabs are coded with both the site visit data
and the new HR questionnaire data). The following graphs are all
coded with regional codes (J - Japan, T - Taiwan, U - United States,
E - Europe) and the regional effect is clearly apparent with both
techniques.
The Multidimensional Scaling Graph (Figure
14-7) has, once again, Japan far removed from the others. In
this new graph the east to west effect is not as drastic since the
European fab is in the midst of the two Taiwanese fabs. However, with
the Principal Components Graph (Figure 14-8)
each region is clearly represented. Similar to the Principal Components
Graph with the site visits data (Figure 14-4)
there is a diagonal (NE to SW) trend of high performance to low
performance. Because of confidentiality, performance ratings are
not displayed. At this point, the number of fabs is too small in
this graph to clearly discern what the components represent.