

Second Interim Report
Clair Brown, Editor

14. Statistical Tools for Industry Data
Linda Sattler

14.1 Introduction
14.2 The HR Data Sets
14.3 An Overall Picture
14.4 Principal Components Analysis
14.5 Pursuing Interesting Leads
APP Appendix
REF References

14.1 Introduction

Validating general theories is difficult with the kind of data gathered in a detailed industry study. The number of variables is extremely large relative to the number of companies, alternative theories can always be posed, and high statistical significance is difficult to obtain. Many organizational researchers consider the kinds of data and the number of data points to be drawbacks. However, this opinion is not shared by all. Much of the "drawback" comes not from the data, but from the limited set of tools with which one is familiar. Standard hypothesis-testing tools need all distributional assumptions to be met with well-measured quantitative data, and highly significant results are more easily achieved with many data points and few variables. This is the opposite of what occurs in industry studies.

Organizational data sets are certainly not lacking in information. Most researchers gain a great deal of knowledge from their long hours spent on a project. The problem should not be looked at as "How do I prove with absolute certainty that my theory is correct?", but as "How do I convey to my readers the information I have gathered?" To guide readers through the industry and have them see with the researcher’s eye what lies there is a great accomplishment for an industry study. How much easier it is to guide one through a few companies rather than hundreds! What industry-wide studies need are tools to guide, to visualize, and to let the readers see for themselves (see Zeisel [1] for other descriptive visual tools).

This chapter discusses the multivariate analysis techniques used in the Competitive Semiconductor Manufacturing Study to study the data. These tools not only give a reader a fair idea of the industry, but also help the researcher pinpoint the promising areas for future research. As interesting questions are extracted, other tools are used to delve into the data and help answer or refine these questions. Problems with the data are discussed and methods used to overcome the difficulties are explained. The chapter gives an analysis of some of the semiconductor industry findings as well as suggestions for further research.

14.2 The HR Data Sets

Before 1995, most of the organizational data collected from the fabrication plants came through observations and employee interviews. After each two-day site visit, a large (50-70 page) document on the plant visited was produced. In order to use the site visit data in analyses, over five hundred organizational factors were created for measurement purposes. These factors tended to be factual in nature and crudely scored (True/False or High/Medium/Low). Fifteen fabrication plants have had their organizational data coded in this manner.

In 1995, a Human Resources Questionnaire was sent to the fabrication plants. The completed questionnaire gave more detailed information on the human resource practices and listed how practices changed over time. At the time this chapter was written, seven completed questionnaires had been returned.

Because of the smaller number of fabs with the newer data and the lack of time to do a more complete analysis, the older data (from the site visits) are used in this chapter. However, several techniques are used with the newer data to show the consistency of the trends and the similarities in the two data sets.

14.3 An Overall Picture

To begin the exploratory process of analyzing the organizational data, it is good to start with an overall picture of the fabrication plants. The difficulty is in condensing the hundreds of organizational variables into just a few so that it is possible to graph the fabrication plants and to identify patterns. There are several techniques that can be used and the data need to be modified to a usable form. The data modifications are first explained for the different techniques and then the analysis of the results is presented.

Overall Picture Techniques and Data Modifications

The organizational data set from the site visits contains 530 variables from 15 fabrication plants, most of which are coded either High/Medium/Low or True/False. There is a small subset of quantitative variables (such as absentee rate or number of SPC charts on each machine). A number of variables are also missing for each fab, as shown below:

insert table 1

The fabs are identified by codes for confidentiality reasons. Throughout this paper, a fuller explanation of the data may exist but cannot be given for the same reason.

Multidimensional Scaling

The first technique proposed is a method that takes advantage of the crudity of most of the variables and circumvents the issues of missing data and qualitative scoring. What is preferred is to see the fabs on a two-dimensional graph in terms of how far away they are from each other organizationally. Given pairwise distances between the fabs (with fifteen fabs, that is 105 distance measures), a technique called Multidimensional Scaling finds the two-dimensional graph that minimizes the discrepancy between the actual distances and the Euclidean distances on the graph. For a good description of this technique see [9].
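As a rough illustration of the idea, the sketch below implements classical (Torgerson) scaling with NumPy. This is only one variant of multidimensional scaling and is not necessarily the one used in the study; the function name `classical_mds` and the five made-up points are assumptions for the example.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed points in k dimensions from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain an inner-product matrix.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # Keep the k largest eigenvalues and their eigenvectors.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)

# Hypothetical example: five "fabs" with made-up planar positions.
rng = np.random.default_rng(0)
true_xy = rng.normal(size=(5, 2))
dist = np.linalg.norm(true_xy[:, None, :] - true_xy[None, :, :], axis=-1)

coords = classical_mds(dist)
fitted = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Number of distinct pairs among fifteen fabs.
n_pairs = 15 * 14 // 2
```

Because the made-up points really do lie in a plane, the fitted distances reproduce the input distances. With real organizational dissimilarities the fit is only approximate, which is why some plotted distances can look larger or smaller than the underlying measures.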

In order to get distances between fabrication plants, a technique by Sokal and Michener is modified to take into account missing variables and High/Medium/Low scoring. Sokal and Michener’s simple matching coefficient technique is described in the appendix (for an introduction to this technique see [9]; these authors also refer to the original work [13]).
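The exact modification is given in the appendix and is not reproduced here. As a minimal sketch of the general idea, the function below computes the share of mismatching scores between two fabs, under two assumptions that may not match the study: variables missing for either fab are skipped, and any two different codes count as a full mismatch (so High versus Medium is treated the same as High versus Low). The name `percent_different` and the sample scores are hypothetical.

```python
def percent_different(fab_a, fab_b):
    """Share of jointly scored variables on which two fabs disagree."""
    # Assumption: a variable is ignored when it is missing (None) for either fab.
    pairs = [(a, b) for a, b in zip(fab_a, fab_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no variable is scored for both fabs")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)

# Made-up scores for two fabs on six variables (H/M/L or T/F, None = missing).
fab_x = ["H", "T", None, "L", "F", "M"]
fab_y = ["H", "F", "M", "L", None, "L"]

d = percent_different(fab_x, fab_y)  # 2 mismatches out of 4 jointly scored
```

One minus this value is the simple matching similarity, the quantity behind statements such as two fabs having a given percentage of their factors alike.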

14.4 Principal Components Analysis

Another technique that can give a global picture of the organizations is principal components analysis, which was used earlier with the performance metrics. First developed by Hotelling [6], principal components analysis attempts to reduce the number of dimensions of the data without losing much information. The first principal component is the linear combination of the variables that gives the maximum variance between the fabrication plants. The second principal component is the linear combination of the factors that gives the second largest variance between the fabs and is also orthogonal to the first principal component. With fifteen fabs, up to fourteen principal components are possible, but, hopefully, most of the fab variance is explained in the first few components. Plotting the fabs on some of the principal components can give one an idea of how the fabs may or may not be related to each other. For a good description of Principal Components Analysis see [9]. A description of how the data were coded for principal components analysis can be found in the appendix.
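The following sketch shows principal components computed with a singular value decomposition in NumPy, using made-up numeric data rather than the study's coded variables (the coding scheme is described in the appendix and is not assumed here). The function name `pca` and the data dimensions are illustrative only.

```python
import numpy as np

def pca(data):
    """Return fab scores, component directions, and component variances."""
    x = np.asarray(data, dtype=float)
    centred = x - x.mean(axis=0)  # remove each variable's mean
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    variances = s ** 2 / (x.shape[0] - 1)  # variance along each component
    scores = u * s  # coordinates of each fab on each component
    return scores, vt, variances

# Hypothetical data: 15 fabs measured on 40 numeric variables.
rng = np.random.default_rng(1)
data = rng.normal(size=(15, 40))

scores, components, variances = pca(data)

# Centring removes one degree of freedom, so at most 14 components
# carry any variance when there are 15 fabs.
n_nonzero = int(np.sum(variances > 1e-10 * variances[0]))
```

Plotting the first two columns of `scores` against each other gives the kind of two-component graph discussed in this chapter.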

Analysis of Overall Picture

Multidimensional Scaling

Using the first method of multidimensional scaling and the percent differences, Figure 14-1 is obtained. The distances between the different fabs are the key metrics in this graph. A fab’s relationship to the X or Y axis is not meaningful (the axes may be ignored).

The fabs are marked with the following distinguishing characters:
J - Japan
T - Taiwan
U - United States
E - Europe

One of the most striking aspects of this graph is the separation of the Japanese fabs from the non-Japanese fabs. There is, as is shown, a general progression from left to right of fabs located east to west. This pattern of east to west organizational practices was also discovered in the automobile manufacturing study [14].

The seemingly large distances between some of the Japanese fabs are an artifact of multidimensional scaling. Two of these distances are actually the smallest in the CSM study, with two of the Japanese fabs having 80% of the factors alike and the other two fabs having 73% of the factors alike. The average similarity between any two fabs is about 60%.

When analyzing the data, several clustering techniques were used for verification, and it is interesting that the same two pairs of Japanese fabs always cluster together. There is, however, a "medium" sized distance between the two pairs of Japanese fabs. This issue will be discussed further in the next section.

The complexities of analyzing organizations can be appreciated by noting that not only does geographical location vary from left to right, but manufacturing performance and levels of participative management scores tend to follow this same trend. With so few companies, separating possible cultural factors from purely organizational ones as a determinant of performance is extremely difficult, if not impossible. What can be done is to try to discover the underlying differences and use good judgment as to whether the factors are relevant to performance or simply coincidental with geographical location.

Principal Components Analysis

Using principal components analysis, the first two components can be graphed (see Figure 14-2). These first two components make up about 31% of the total variation in the organizations as shown in the Scree graph in Figure 14-3. A Scree graph (Cattell [1]) plots the percentage of variance explained by each eigenvalue. This graph is coded in the following manner:
***, **, *, -, -- (*** high performer in both manufacturing productivity and quality; ** high performer in one performance dimension and medium performer in the other; * medium performer in both dimensions, or high performer in one dimension and low performer in the other; - medium performer in one dimension and low performer in the other; -- low performer in both manufacturing productivity and quality)
H, M, L (High PIRK, Medium PIRK and Low PIRK - PIRK is an index of participative management given to each fab and based on the organizational data from the site visits - see [12] for an explanation of PIRK)
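The quantity plotted in a Scree graph can be computed directly from the eigenvalues. The short sketch below uses made-up eigenvalues, not those of the study, to show the percentage of variance per component and the running total.

```python
def scree_percentages(eigenvalues):
    """Percentage of total variance explained by each eigenvalue."""
    total = sum(eigenvalues)
    return [100.0 * v / total for v in eigenvalues]

def cumulative(values):
    """Running total, e.g. variance explained by the first k components."""
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(running)
    return out

# Made-up eigenvalues, sorted from largest to smallest.
eigenvalues = [4.0, 3.0, 2.0, 1.0]

percent = scree_percentages(eigenvalues)
running_total = cumulative(percent)
```

The second entry of `running_total` corresponds to the share of variation captured by the first two components taken together.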

There is a trend from high performance to low performance and high PIRK to low PIRK, all in the second principal component. There is also an east to west trend on this second component, although not as striking.

The first principal component, which represents 19% of the total variation, is far more puzzling. At first, no patterns were detected among the organizations on this dimension. After some futile attempts at deciphering the differences between the two pairs of Japanese fabs (which have a large distance between them on this component), we came to suspect that the underlying organizational variance in this component reflects some "researcher bias." It was noticed that fabs visited earlier in the study are to the right of the fabs visited later. There are several reasons for this:

1. An important member of the CSM team who contributed a great deal to the organization data had gone on all of the first seven fab visits and none of the remaining eight. He also coded the organizational factors for these first fabs. After he stopped visiting fabs, different members of the HR team went on the visits and the remaining eight fabs were coded into the organizational factors by different members of the HR team. This would explain why his fabs were clustered to the right of the graph while the other fabs were spread around.

2. The organizational H/M/L and T/F variables were created after the first six fabs were visited. The first report is based on these six fabs. It is possible that these factors displayed certain unique qualities of the first six fabs, but special qualities of the remaining fabs may not have had a variable or factor to represent them.

The Human Resource Questionnaire is gathering much of the information that is acquired on fab visits. This should increase the standardization and significantly cut down on the bias.

It is important in any study to constantly review the methodology and look for possible biases entering into the process (see [8]).

After exploring the bias a bit further, it was discovered that the main problem was not in the values that were coded, but in what data were coded. Fabs visited close together in time tended to have the same factors scored or missing. Fabs included in the first report had this same bias. Because factors with a number of missing values tend to be correlated with other factors that also have a number of missing values, this bias propagated through the data correction methods. Removing factors that had a large number of missing data points, or factors whose missing values were unevenly distributed (those, for example, that had four or five missing data points from the first fabs visited, but few or no missing data points from the fabs visited later), gave good results. After removing variables that had missing values from more than half the fabs (8 or more missing values) and those that had four or more missing data points from one year of fab visits but fewer than two from the other year of fab visits, we were left with 238 organizational factors. The principal components graph was recreated with this subset of data.
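The two removal rules can be expressed as a small filter. The sketch below is an illustration only: the function name `keep_variable`, the sample scores, and in particular the split of the fifteen fabs into seven first-year and eight second-year visits are assumptions, since the text does not state how many fabs were visited in each year.

```python
def keep_variable(values, first_year):
    """Apply the two missing-value rules to one organizational variable.

    values     : one score per fab, None when the score is missing
    first_year : one flag per fab, True if visited in the first year
    """
    missing = [v is None for v in values]
    # Rule 1: drop if missing for more than half of the 15 fabs (8 or more).
    if sum(missing) >= 8:
        return False
    early = sum(m for m, f in zip(missing, first_year) if f)
    late = sum(m for m, f in zip(missing, first_year) if not f)
    # Rule 2: drop if four or more are missing in one year of visits
    # but fewer than two in the other year.
    if (early >= 4 and late < 2) or (late >= 4 and early < 2):
        return False
    return True

# Assumed split of visits by year (hypothetical).
first_year = [True] * 7 + [False] * 8

balanced = [None, None, "H", "H", "H", "H", "H",
            None, None, "L", "L", "L", "L", "L", "L"]
lopsided = [None] * 4 + ["T"] * 11
sparse = [None] * 3 + ["T"] * 4 + [None] * 5 + ["F"] * 3

results = [keep_variable(v, first_year) for v in (balanced, lopsided, sparse)]
```

Here `balanced` is kept because its missing values are spread evenly, `lopsided` is dropped by the second rule, and `sparse` is dropped by the first.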

From Figure 14-4 one can see that much of the bias has been removed since the "bias" axis is either gone or relegated to the second principal component. Although the first six fabs visited are high on the second principal component, they no longer cluster together as much. As well, one pair of Japanese fabs is no longer a "pair" and there is now a large distance between them.

This graph also contains an interesting diagonal trend of high to low PIRK fabs (or high to low performance).

  • First Principal Component: This component appears to contain many participative management factors, which is not surprising. They do tend to relate, however, mostly to leadership roles. The factors involved include: Operator team leaders, technicians training other job categories, and all of upper level management involved in teams. This component is called the "Employee Leadership" component, where a high positive number indicates a great deal of self-leadership.

  • Second Principal Component: This component contains many historical factors that are not related to PIRK. These factors include: Mass turnover, mass hiring, another fab opened on-site, and threat of imminent shutdown. The common thread of these factors appears to be uncertainty of one’s job. This component is called the "Uncertainty Level" component, where a large negative number indicates stress.

What is interesting in this graph is that only fabs with a great deal of self-leadership appear able to overcome periods of uncertainty and remain highly participative, high-performance fabs. As self-leadership increases, more uncertainty can be handled without altering the high PIRK/performance.

Robustness of Graphs

It is worthwhile to explore the robustness of the overall picture graphs in the previous sections. If the data set were slightly different, would the results be the same? One way to check this with the multidimensional scaling graph is to use the difference measure on only a random half of the data set. Because the data set is so large, the results should not be too different. The appendix gives a detailed explanation of the simulation that is used to generate Figure 14-6. The distinguishing characteristics of Figure 14-1 are still present in Figure 14-6, so the graph seems to be robust.
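The simulation itself is described in the appendix. The sketch below only illustrates the general idea, assuming that "a random half of the data set" means a random half of the variables and that distances are shares of mismatching codes. All data are synthetic (two made-up groups of fabs), and the names are hypothetical.

```python
import numpy as np

def mismatch_matrix(codes):
    """Pairwise share of differing codes; negative entries mark missing values."""
    n = codes.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = (codes[i] >= 0) & (codes[j] >= 0)  # scored for both fabs
            out[i, j] = out[j, i] = np.mean(codes[i, both] != codes[j, both])
    return out

# Synthetic data: 15 fabs, 530 variables coded 0/1/2, two underlying profiles.
rng = np.random.default_rng(2)
n_fabs, n_vars = 15, 530
profiles = rng.integers(0, 3, size=(2, n_vars))
group = np.array([0] * 7 + [1] * 8)
codes = profiles[group].copy()
noise = rng.random(codes.shape) < 0.3
codes[noise] = rng.integers(0, 3, size=int(noise.sum()))  # perturb 30% of scores
codes[rng.random(codes.shape) < 0.1] = -1                 # mark 10% as missing

full = mismatch_matrix(codes)

# Recompute the distances from a random half of the variables.
half_cols = rng.choice(n_vars, size=n_vars // 2, replace=False)
half = mismatch_matrix(codes[:, half_cols])

# Compare the two sets of pairwise distances.
upper = np.triu_indices(n_fabs, k=1)
agreement = np.corrcoef(full[upper], half[upper])[0, 1]
```

A high value of `agreement` suggests that a graph built from the half sample would show much the same structure as one built from the full data.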

Unfortunately, with principal components analysis, a simple simulation is far more complicated. Not only can the graph be flipped or rotated, but when components explain similar percentages of the total variance, the first principal component can become the second, the second can become the third, the third can become the first, and so on. Each simulation could then result in a totally different graph in the first two principal components. In this case, simply running a few simulated graphs and checking for similarities by eye may be the best that can be done.
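One partial aid, offered here as a suggestion rather than as something done in the study, is to match components between two runs by the absolute cosine of the angle between them, which ignores sign flips and reordering. The sketch below uses made-up component vectors; `match_components` is a hypothetical helper, and the matching is greedy rather than a full assignment.

```python
import numpy as np

def match_components(reference, other):
    """For each reference component, find the most similar row of `other`,
    ignoring sign. Rows are component direction vectors."""
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    oth = other / np.linalg.norm(other, axis=1, keepdims=True)
    similarity = np.abs(ref @ oth.T)  # |cosine| between every pair
    return similarity.argmax(axis=1), similarity.max(axis=1)

# Made-up example: three orthonormal components over six variables.
rng = np.random.default_rng(3)
q, _ = np.linalg.qr(rng.normal(size=(6, 3)))
reference = q.T

# A second "run" in which the components are reordered and two are flipped.
signs = np.array([-1.0, 1.0, -1.0])[:, None]
other = reference[[2, 0, 1]] * signs

best, sims = match_components(reference, other)
```

In this example the first reference component reappears in position 1 of the second run, the second in position 2, and the third in position 0. With real resampled data the similarities fall below one, and low values signal that a component has not been recovered.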

Using Data from the HR Questionnaire

Multidimensional Scaling and Principal Components Techniques were used on the new HR questionnaire data with promising results (see Figure 14-7 through Figure 14-9). These graphs were created for seven fabs from 418 HR factors, using the same data transformations as in the previous graphs. Two of the fabs represented in the new HR questionnaire data set have not been coded into the 500+ organizational factors using the site visit data (at the time of this report, five fabs are coded with both the site visit data and the new HR questionnaire data). The following graphs are all coded with regional codes (J - Japan, T - Taiwan, U - United States, E - Europe) and the regional effect is clearly apparent with both techniques.

The Multidimensional Scaling Graph (Figure 14-7) has, once again, Japan far removed from the others. In this new graph the east to west effect is not as pronounced, since the European fab lies between the two Taiwanese fabs. However, with the Principal Components Graph (Figure 14-8) each region is clearly represented. As in the Principal Components Graph with the site visits data (Figure 14-4), there is a diagonal (NE to SW) trend of high performance to low performance. Because of confidentiality, performance ratings are not displayed. At this point, the number of fabs in this graph is too small to clearly discern what the components represent.


© 2005 Institute for Research on Labor and Employment. 
2521 Channing Way # 5555 
Berkeley, CA 94720-5555