Name: Nihal R. Dalvi
Roll No: 08
Advanced Business Analytics - II
Assignment No. 2
(Pitfalls of Data Analysis)
The Problem with Statistics
There is a pervasive notion that anything can be proved with statistics. This is true only when statistics are used improperly.
"Lies, damned lies, and statistics" is a phrase describing the persuasive power of numbers, particularly the use of statistics to bolster weak arguments. It is also sometimes used colloquially to cast doubt on statistics cited to support an opponent's point.
Sources of Bias
Bias is the tendency of a statistic to overestimate or underestimate a parameter.
Representative Sampling: The ideal scenario is one where the sample is chosen by selecting members of the population at random, with each member having an equal probability of being selected. Any departure from such random selection is thus a source of bias, because the sample may no longer represent the population.
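As a minimal sketch of the idea, the Python snippet below draws a simple random sample, in which every member has the same chance of selection, and compares it with a convenience sample that takes only the first few members. The population of values 1 to 100 and the function name simple_random_sample are assumptions made purely for illustration.

```python
import random
from statistics import mean

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members, each with equal probability of selection."""
    rng = random.Random(seed)  # a seeded generator keeps the example repeatable
    return rng.sample(population, k)

# Hypothetical population: the values 1..100 (illustrative only).
population = list(range(1, 101))

random_sample = simple_random_sample(population, 10, seed=42)
convenience_sample = population[:10]  # "first ten who showed up"

print("population mean: ", mean(population))
print("random sample mean:", mean(random_sample))
print("convenience mean: ", mean(convenience_sample))
```

The convenience sample can only ever see the smallest values, so its mean is far below the population mean no matter how many times the study is repeated. That systematic gap is the bias.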
Statistical Assumptions: Many statistical tests assume that the data are normally distributed. If the sample distribution is non-normal, a common remedy is to apply a transformation. However, this has dangers as well; an ill-considered transformation can do more harm than good in terms of the interpretability of results.
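One way to see the interpretability risk is with a log transformation, a common choice for right-skewed data (the choice of transformation and the data values here are assumptions for illustration). Averaging on the log scale and transforming back gives the geometric mean, not the arithmetic mean, so a result reported as "the mean" changes its meaning.

```python
from math import exp, log
from statistics import mean

# Hypothetical right-skewed data (illustrative values only).
values = [1, 10, 100]

arithmetic_mean = mean(values)

# Analyse on the log scale, then transform the result back.
log_values = [log(v) for v in values]
back_transformed_mean = exp(mean(log_values))  # this is the geometric mean

print("arithmetic mean:      ", arithmetic_mean)
print("back-transformed mean:", back_transformed_mean)
```

The two numbers answer different questions, so any conclusion drawn on the transformed scale has to be stated in terms of that scale.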
Errors in Methodology
Statistical Power: The power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Statistical power is inversely related to beta (β), the probability of making a Type II error. In short, power = 1 - β. Statistical power is affected chiefly by the size of the effect and the size of the sample used to detect it. Bigger effects are easier to detect than smaller effects, while large samples offer greater test sensitivity than small samples.
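The dependence of power on effect size and sample size can be sketched for one simple case: a two-sided one-sample z-test with known standard deviation. The test type, the function name z_test_power and the numbers used are assumptions chosen for illustration. Other tests need their own power formulas.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test.

    effect: true difference between the population mean and the null value
    sigma:  known population standard deviation
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # rejection threshold
    shift = effect * sqrt(n) / sigma            # how far the truth sits from the null
    # Probability the test statistic falls in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for n in (16, 64):
    for effect in (0.2, 0.8):
        print(f"n={n:3d} effect={effect}: power={z_test_power(effect, 1.0, n):.3f}")
```

With no true effect, the function returns alpha, the Type I error rate. Power rises as either the effect or the sample size grows.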
Multiple Comparisons: In statistics, the multiple comparisons problem occurs when one considers a set of statistical inferences simultaneously, or infers a subset of parameters selected based on the observed values. In certain fields it is known as the look-elsewhere effect. Multiple comparisons arise when a statistical analysis involves multiple statistical tests, each of which has the potential to produce a discovery. The more inferences are made, the more likely it is that erroneous inferences will occur.
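The growth in erroneous inferences can be made concrete under one simplifying assumption: that the tests are independent and each is run at the same significance level. The sketch below computes the chance of at least one false positive across m tests, along with the Bonferroni-adjusted level, one common correction that is not named in the text above. The function names are illustrative.

```python
def family_wise_error_rate(alpha, m):
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m

alpha = 0.05
for m in (1, 5, 20):
    adjusted = bonferroni_alpha(alpha, m)
    print(
        f"m={m:2d}  uncorrected FWER={family_wise_error_rate(alpha, m):.3f}  "
        f"Bonferroni per-test alpha={adjusted:.4f}  "
        f"corrected FWER={family_wise_error_rate(adjusted, m):.3f}"
    )
```

At 20 tests the uncorrected chance of at least one spurious "discovery" is roughly 64%, even though each individual test is held to 5%.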
Measurement Error: Measurement error is the difference between a measured quantity and its true value. It includes random error (naturally occurring errors that are to be expected with any experiment) and systematic error (caused, for example, by a mis-calibrated instrument that affects all measurements). Two characteristics of measurement which are particularly important in psychological measurement are reliability and validity. Reliability refers to the ability of a measurement instrument to measure the same thing each time it is used, and validity is the extent to which the indicator measures the thing it was designed to measure. Measurement errors can quickly grow in size when used in formulas. To account for this, we should use a formula for error propagation whenever we use uncertain measures in an experiment to calculate something else.
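As an illustration of error propagation, the sketch below uses the standard first-order rules for independent random errors: absolute uncertainties add in quadrature for a sum, and relative uncertainties add in quadrature for a product. The independence assumption, the function names and the measured values are all assumptions made for this example.

```python
from math import sqrt

def propagate_sum(sigma_a, sigma_b):
    """Uncertainty of a + b (or a - b) for independent errors."""
    return sqrt(sigma_a ** 2 + sigma_b ** 2)

def propagate_product(a, sigma_a, b, sigma_b):
    """Value and uncertainty of a * b for independent errors."""
    value = a * b
    relative = sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return value, abs(value) * relative

# Hypothetical measurements: a 10.0 by 5.0 rectangle, each side known to +/- 0.1.
length, sigma_length = 10.0, 0.1
width, sigma_width = 5.0, 0.1

perimeter_half_sigma = propagate_sum(sigma_length, sigma_width)
area, sigma_area = propagate_product(length, sigma_length, width, sigma_width)

print(f"length + width = {length + width:.1f} +/- {perimeter_half_sigma:.3f}")
print(f"area           = {area:.1f} +/- {sigma_area:.3f}")
```

Each side is uncertain by only 0.1, yet the area is uncertain by more than 1 unit. This is the growth in error that the paragraph above warns about.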
Problems with Interpretation
Confusion over Significance: Statistical significance is often confused with practical importance. A reasonable way to handle this is to cast results in terms of effect sizes, so that the size of the effect is presented in terms that make quantitative sense. A p-value merely indicates the probability of obtaining data at least as extreme as those observed if the null model were true; it has little to say about the size of a deviation from that model.
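One widely used effect size for the difference between two group means is Cohen's d, the difference in means divided by the pooled standard deviation. The text above does not name a specific measure, so the choice of Cohen's d and the sample data below are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardised mean difference (group_b minus group_a) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_variance)

# Hypothetical scores for a control and a treatment group.
control = [1, 2, 3, 4, 5]
treatment = [2, 3, 4, 5, 6]

print(f"Cohen's d = {cohens_d(control, treatment):.3f}")
```

Unlike a p-value, this number does not shrink or grow simply because more observations are collected. It describes how large the difference is relative to the spread of the data.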
Precision and Accuracy: Accuracy refers to the closeness of a measured value to a standard or known value. Precision refers to the closeness of two or more measurements to each other. A measurement system can be accurate but not precise, precise but not accurate, neither, or both, and is considered valid if it is both accurate and precise.
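The distinction can be sketched numerically by treating accuracy as the gap between the average reading and the known value, and precision as the spread of repeated readings. Summarising them with the mean offset and the sample standard deviation is one reasonable choice, not the only one. The readings and the known value of 10.0 are hypothetical.

```python
from statistics import mean, stdev

def accuracy_and_precision(readings, true_value):
    """Return (bias, spread): the mean offset from the true value and the sample SD."""
    bias = mean(readings) - true_value
    spread = stdev(readings)
    return bias, spread

true_value = 10.0  # hypothetical reference standard

accurate_not_precise = [9.0, 11.0, 10.5, 9.5, 10.0]   # centred but scattered
precise_not_accurate = [12.0, 12.1, 11.9, 12.0, 12.0]  # tight but offset

for label, readings in (
    ("accurate, not precise", accurate_not_precise),
    ("precise, not accurate", precise_not_accurate),
):
    bias, spread = accuracy_and_precision(readings, true_value)
    print(f"{label}: bias={bias:+.2f}, spread={spread:.2f}")
```

A bias near zero indicates accuracy and a small spread indicates precision. A valid measurement system needs both.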
Causality: Causality is the natural or worldly agency or efficacy that connects one process (the cause) with another process or state (the effect), where the first is partly responsible for the second, and the second is partly dependent on the first. Statistics and economics usually employ pre-existing data or experimental data to infer causality by regression methods. The bottom line on causal inference is that we must have random assignment: without it, an observed association may be due to confounding rather than to a causal effect.
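Random assignment itself is simple to carry out. The sketch below shuffles a list of units and splits it into a treatment group and a control group, so that nothing about a unit can influence which group it lands in. The function name randomly_assign and the participant labels are hypothetical.

```python
import random

def randomly_assign(units, seed=None):
    """Shuffle the units and split them into (treatment, control) groups."""
    rng = random.Random(seed)  # a seeded generator keeps the example repeatable
    shuffled = list(units)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participants.
participants = [f"P{i:02d}" for i in range(1, 11)]

treatment, control = randomly_assign(participants, seed=7)
print("treatment:", treatment)
print("control:  ", control)
```

Because group membership is decided by chance alone, any confounding characteristic is expected to be balanced across the two groups on average.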
Graphical Representations: The Lie Factor is the ratio of the size of an effect as shown in the graphic to the size of the same effect in the data. The most informative graphics are those with a Lie Factor of 1. Changes in the scale of the graphic should therefore always correspond to changes in the data being represented. Another trouble spot with graphs is multidimensional variation, which occurs where two-dimensional figures are used to represent one-dimensional values.
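A minimal sketch of the calculation, taking "size of effect" to mean the relative change between two values (an assumption about how the effect is measured), is shown below. The data values and bar heights are hypothetical.

```python
def relative_change(first, second):
    """Size of effect: change from first to second, relative to first."""
    return (second - first) / first

def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return relative_change(graphic_first, graphic_second) / relative_change(
        data_first, data_second
    )

# Hypothetical chart: the data rise from 100 to 150 (a 50% increase),
# but the bars are drawn 2 units and 6 units tall (a 200% increase).
print("exaggerated chart:", lie_factor(100, 150, 2, 6))

# Bars drawn in proportion to the data (2 units and 3 units tall).
print("faithful chart:   ", lie_factor(100, 150, 2, 3))
```

The same check exposes multidimensional variation: if a value doubles and a picture is scaled to twice its width and twice its height, its area grows fourfold, so the graphic shows a 300% increase for a 100% increase in the data.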