1 Data and Statistics

1.1 Statistics

Statistics is defined as the art and science of collecting, analyzing, presenting, and interpreting data.

Particularly in business and economics, the information provided by collecting, analyzing, presenting, and interpreting data gives managers and decision makers a better understanding of the business and economic environment and thus enables them to make more informed and better decisions.

1.2 Applications in Business and Economics

Accounting: Public accounting firms use statistical sampling procedures when conducting audits for their clients.
Finance: Financial analysts use a variety of statistical information to guide their investment recommendations.
Marketing: Electronic scanners at retail checkout counters collect data for a variety of marketing research applications.
Production: Today’s emphasis on quality makes quality control an important application of statistics in production.
Economics: Economists frequently provide forecasts about the future of the economy or some aspect of it. They use a variety of statistical information in making such forecasts.

1.3 Data

Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation. All the data collected in a particular study are referred to as the data set for the study.

Table 1.1 shows a data set containing information for 25 mutual funds that are part of the Morningstar Funds500 for 2008.

Table 1.1: Data Set for 25 Mutual Funds

Fund Name	Fund Type	Net Asset Value ($)	5-Year Average Return (%)	Expense Ratio (%)	Morningstar Rank
American Century Intl. Disc	IE	14.37	30.53	1.41	3-Star
American Century Tax-Free Bond	FI	10.73	3.34	0.49	4-Star
American Century Ultra	DE	29.84	15.04	0.97	3-Star
Artisan Small Cap	DE	16.52	18.87	1.25	4-Star
Brown Cap Small	DE	33.97	15.53	1.08	3-Star
DFA U.S. Micro Cap	DE	18.33	17.57	0.52	5-Star
Fidelity Contrafund	DE	49.80	12.36	0.89	4-Star
Fidelity Overseas	IE	48.99	23.06	1.06	3-Star
Fidelity Sel Electronics	DE	22.40	17.70	0.89	4-Star
Fidelity Sh-Term Bond	FI	17.46	4.10	0.45	3-Star
Gabelli Asset AAA	DE	48.84	15.70	1.36	4-Star
Kalmar Grwth Sm Cp	DE	40.13	16.20	1.25	3-Star
Mairs & Power Grwth	DE	27.64	12.70	0.69	5-Star
Matthews Pacific Tiger	IE	40.07	19.51	1.05	4-Star
Oakmark I	DE	37.78	9.57	1.06	4-Star
PIMCO Emerg Mkts Bd D	FI	26.39	12.31	1.00	3-Star
RS Value A	DE	22.67	15.14	1.44	3-Star
T. Rowe Price Latin Am.	IE	33.59	32.06	1.24	4-Star
T. Rowe Price Mid Val	DE	26.37	14.40	0.80	4-Star
Thornburg Int’l Val	IE	21.10	23.64	1.40	5-Star
USAA Income	FI	12.10	5.13	0.62	3-Star
Vanguard Sel Val	DE	21.23	16.20	0.44	4-Star
Vanguard Sh-Tm TE	FI	11.20	3.80	0.13	3-Star
Vanguard Sm Cp Idx	DE	25.32	17.01	0.23	5-Star
Wasatch Sm Cp Growth	DE	35.41	13.98	1.19	4-Star

1.4 Elements, Variables, and Observations

Elements are the entities on which data are collected. For the data set in Table 1.1, each individual mutual fund is an element; the element names appear in the first column. With 25 mutual funds, the data set contains 25 elements.

A variable is a characteristic of interest for the elements.

The data set in Table 1.1 includes the following five variables:

Fund Type: The type of mutual fund
Net Asset Value ($): The closing price per share on December 31, 2007
5-Year Average Return (%): The average annual return for the fund over the past 5 years
Expense Ratio (%): The percentage of assets deducted each fiscal year for fund expenses
Morningstar Rank: The overall risk-adjusted star rating for each fund; Morningstar ranks go from a low of 1-Star to a high of 5-Stars

Measurements collected on each variable for every element in a study provide the data. The set of measurements obtained for a particular element is called an observation.

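Referring to Table 1.1, we see that the set of measurements for the first observation (American Century Intl. Disc) is IE, 14.37, 30.53, 1.41, and 3-Star. Because each of the 25 elements has one measurement for each of the five variables, the data set contains 25 × 5 = 125 data values.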
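The relationship between elements, variables, and observations can be sketched in a few lines of Python. This is only an illustration: it copies the first three rows of Table 1.1, and the names `variables`, `data_set`, `elements`, and `first_observation` are hypothetical, chosen for this example.

```python
# Variable names, taken from the column headers of Table 1.1.
variables = [
    "Fund Type",
    "Net Asset Value ($)",
    "5-Year Average Return (%)",
    "Expense Ratio (%)",
    "Morningstar Rank",
]

# First three rows of Table 1.1. Each key is an element (a fund);
# each value is that element's observation: one measurement per variable.
data_set = {
    "American Century Intl. Disc": ("IE", 14.37, 30.53, 1.41, "3-Star"),
    "American Century Tax-Free Bond": ("FI", 10.73, 3.34, 0.49, "4-Star"),
    "American Century Ultra": ("DE", 29.84, 15.04, 0.97, "3-Star"),
}

elements = list(data_set)                  # the element names, in table order
first_observation = data_set[elements[0]]  # measurements for the first fund

# Number of data values = number of elements x number of variables.
n_data_values = len(elements) * len(variables)

print(elements[0], first_observation)
print(n_data_values)
```

With only three of the 25 funds copied in, the sketch holds 3 × 5 = 15 data values.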

1.5 Scales of Measurement

Data collection requires one of the following scales of measurement: nominal, ordinal, interval, or ratio.

When the data for a variable consist of labels or names used to identify an attribute of the element, the scale of measurement is considered a nominal scale (Example: Fund Type).
The scale of measurement for a variable is called an ordinal scale if the data exhibit the properties of nominal data and the order or rank of the data is meaningful (Example: Morningstar Rank).
The scale of measurement for a variable is an interval scale if the data have all the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric (Example: temperature).
The scale of measurement for a variable is a ratio scale if the data have all the properties of interval data and the ratio of two values is meaningful (Examples: distance, height, weight, and time).

A ratio scale requires that a zero value be included to indicate that nothing exists for the variable at the zero point.
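One way to see the difference between an interval scale and a ratio scale is to change the unit of measure and check whether a ratio of two values survives. The sketch below is illustrative only; the helper functions `celsius_to_fahrenheit` and `miles_to_km` and the sample values are assumptions made for this example.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def miles_to_km(miles):
    """Convert a distance from miles to kilometers."""
    return miles * 1.609344


# Interval scale: zero degrees does not mean "no temperature",
# so the ratio of two temperatures depends on the unit chosen.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: zero distance means "no distance" in every unit,
# so the ratio of two distances is the same in miles and in kilometers.
ratio_miles = 20 / 10
ratio_km = miles_to_km(20) / miles_to_km(10)

print(ratio_celsius, ratio_fahrenheit)  # the two ratios differ
print(ratio_miles, ratio_km)            # the two ratios agree (up to rounding)
```

Here 20 °C and 10 °C become 68 °F and 50 °F, so "twice as hot" is not a meaningful statement, while 20 miles is twice as far as 10 miles in any unit.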

1.6 Quantitative and Categorical Data

Data can be classified as either quantitative or categorical.

Quantitative Data (Numerical Data)

Data that represent numerical values.
Example: Heights of people, temperatures, test scores.
Subtypes:
- Discrete Data: Countable values (e.g., number of students in a class).
- Continuous Data: Measurable values that can take any value within a range (e.g., weight, time).

Qualitative Data (Categorical Data)

Data that represent categories or labels.
Example: Colors of cars, types of animals, survey responses (e.g., yes/no).
Subtypes:
- Nominal Data: Categories without a natural order (e.g., gender, blood type).
- Ordinal Data: Categories with a meaningful order (e.g., rankings, education levels).

The statistical analysis appropriate for a particular variable depends upon whether the variable is categorical or quantitative.
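As a rough illustration of this point, the sketch below summarizes a categorical variable by counting categories and a quantitative variable by averaging. The values are the Fund Type and Expense Ratio entries of the first five funds in Table 1.1; the function `summarize` and its `kind` argument are hypothetical names introduced for this example.

```python
from collections import Counter
from statistics import mean

# Fund Type and Expense Ratio (%) for the first five funds in Table 1.1.
fund_type = ["IE", "FI", "DE", "DE", "DE"]
expense_ratio = [1.41, 0.49, 0.97, 1.25, 1.08]


def summarize(values, kind):
    """Return a summary suited to the kind of variable.

    Categorical data are summarized by counting each category;
    quantitative data are summarized by their arithmetic mean.
    """
    if kind == "categorical":
        return dict(Counter(values))
    if kind == "quantitative":
        return mean(values)
    raise ValueError("kind must be 'categorical' or 'quantitative'")


type_counts = summarize(fund_type, "categorical")
mean_expense = summarize(expense_ratio, "quantitative")

print(type_counts)
print(round(mean_expense, 2))
```

Averaging the labels IE, FI, and DE would be meaningless, which is why the categorical branch only counts.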

1.7 Cross-Sectional and Time Series Data

For purposes of statistical analysis, distinguishing between cross-sectional data and time series data is important.

Cross-sectional data are data collected at the same or approximately the same point in time. The data in Table 1.1 are cross-sectional because they describe the five variables for the 25 mutual funds at the same point in time.

Time series data are data collected over several time periods. For example, the time series in Figure 1.1 shows the U.S. average price per gallon of conventional regular gasoline between 2006 and 2009.

Figure 1.1: U.S. average price per gallon for conventional regular gasoline

1.8 Descriptive Statistics

Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy for the reader to understand. Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
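A tabular summary and a numerical summary of the data in Table 1.1 might be computed as sketched below. The two lists are copied from the Fund Type and 5-Year Average Return (%) columns, in table order; the variable names and the particular summaries chosen (frequency, percent frequency, mean, median, minimum, maximum) are assumptions made for this illustration.

```python
from collections import Counter
from statistics import mean, median

# Fund Type column of Table 1.1, in table order (25 funds).
fund_types = [
    "IE", "FI", "DE", "DE", "DE", "DE", "DE", "IE", "DE", "FI",
    "DE", "DE", "DE", "IE", "DE", "FI", "DE", "IE", "DE", "IE",
    "FI", "DE", "FI", "DE", "DE",
]

# 5-Year Average Return (%) column of Table 1.1, in table order.
returns = [
    30.53, 3.34, 15.04, 18.87, 15.53, 17.57, 12.36, 23.06, 17.70, 4.10,
    15.70, 16.20, 12.70, 19.51, 9.57, 12.31, 15.14, 32.06, 14.40, 23.64,
    5.13, 16.20, 3.80, 17.01, 13.98,
]

# Tabular summary: frequency and percent frequency of each fund type.
frequency = Counter(fund_types)
total = len(fund_types)
percent_frequency = {t: 100 * n / total for t, n in frequency.items()}

# Numerical summary of the 5-year average returns.
return_summary = {
    "mean": mean(returns),
    "median": median(returns),
    "min": min(returns),
    "max": max(returns),
}

for fund_type, count in frequency.most_common():
    print(fund_type, count, percent_frequency[fund_type])
print(return_summary)
```

The same counts could also be shown graphically, for example as a bar chart of the three fund types.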

1.9 Inferential Statistics (Statistical Inference)

Many situations require information about a large group of elements (individuals, companies, voters, households, products, customers, and so on). But, because of time, cost, and other considerations, data can be collected from only a small portion of the group. The larger group of elements in a particular study is called the population, and the smaller group is called the sample. Formally, we use the following definitions.

Population: A population is the set of all elements of interest in a particular study.
Sample: A sample is a subset of the population.

The process of conducting a survey to collect data for the entire population is called a census.

The process of conducting a survey to collect data for a sample is called a sample survey.

As one of its major contributions, statistics uses data from a sample to make estimates and test hypotheses about the characteristics of a population through a process referred to as statistical inference.
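The idea of estimating a population characteristic from a sample can be sketched as follows. Purely for illustration, the 25 expense ratios in Table 1.1 are treated here as if they were the whole population, and a simple random sample of 10 is drawn from them; the sample size, the seed, and all variable names are assumptions of this example, not part of the source.

```python
import random
from statistics import mean

# Expense Ratio (%) column of Table 1.1, treated here as the population
# (an assumption made only for this illustration).
population = [
    1.41, 0.49, 0.97, 1.25, 1.08, 0.52, 0.89, 1.06, 0.89, 0.45,
    1.36, 1.25, 0.69, 1.05, 1.06, 1.00, 1.44, 1.24, 0.80, 1.40,
    0.62, 0.44, 0.13, 0.23, 1.19,
]

# Draw a simple random sample of 10 funds without replacement.
# A fixed seed makes the draw repeatable from run to run.
rng = random.Random(0)
sample = rng.sample(population, 10)

# A census would give the population mean directly;
# a sample survey gives only an estimate of it.
population_mean = mean(population)
sample_mean = mean(sample)
estimation_error = sample_mean - population_mean

print(round(population_mean, 3))
print(round(sample_mean, 3))
print(round(estimation_error, 3))
```

In practice the population mean is unknown, so the estimation error cannot be computed this way; it is shown here only because the example pretends the full population is available.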

1.10 Exercise

For each of the following variables, state whether it is categorical (nominal or ordinal) or quantitative (discrete or continuous).

Student IQ ratings.
Distance students travel to class.
The jersey numbers of a sorority soccer team.
A classification of students by state of birth.
A summary of students by academic class—that is, freshman, sophomore, junior, and senior.
Number of hours students study per week.

For each of the following items related to the newspaper business, state whether it is categorical (nominal or ordinal) or quantitative (discrete or continuous).

The number of papers sold each Sunday during 2011.
The departments, such as editorial, advertising, sports, etc.
A summary of the number of papers sold by county.
The number of years with the paper for each employee.

For each of the following items, state whether it is categorical (nominal or ordinal) or quantitative (discrete or continuous).

Salary
Gender
Sales volume of MP3 players
Soft drink preference
Temperature
SAT scores
Student rank in class
Rating of a finance professor
Number of home computers

For each of the following, determine whether the group is a sample or a population.

The participants in a study of a new cholesterol drug.
The drivers who received a speeding ticket in Kansas City last month.
Those on welfare in Cook County (Chicago), Illinois.
The 30 stocks reported as a part of the Dow Jones Industrial Average.