# Goodness of Fit test (Normal Distribution

in statistiXL

The Goodness of Fit Test against a normal distribution fails for an unspecified error.

Your lower and upper case categories should have zero observations. Please consult the Help file on how to structure your data.

Statistixl ver 1.5 trial version

Data

Bin Frequency

-80000 0

-78075.18322 1

-68612.34886 1

-59149.51451 6

-49686.68015 5

-40223.84579 12

-30761.01144 22

-21298.17708 35

-11835.34272 49

-2372.508364 69

7090.325994 3768

16553.16035 90

26015.99471 52

35478.82907 34

44941.66342 16

54404.49778 7

63867.33214 3

73330.16649 0

82793.00085 0

92255.83521 0

101718.6696 0

111181.5039 0

120644.3383 0

130107.1726 0

139570.007 1

149032.8414 1

158495.6757 0

167958.5101 1

177421.3444 0

186884.1788 0

196347.0131 0

205809.8475 0

215272.6819 0

224735.5162 0

234198.3506 0

243661.1849 0

253124.0193 97

262586.8536 0

272049.688 0

281512.5224 0

290975.3567 0

300438.1911 0

309901.0254 0

319363.8598 0

328826.6941 0

338289.5285 0

347752.3629 0

357215.1972 0

366678.0316 0

376140.8659 0

385603.7003 0

395066.5346 0

404529.369 0

413992.2034 0

423455.0377 0

432917.8721 0

442380.7064 0

451843.5408 0

461306.3751 0

470769.2095 0

480232.0439 0

489694.8782 0

499157.7126 0

508620.5469 0

518083.3813 0

527546.2156 0

where could the error lie? pls advise

Your lower and upper case categories should have zero observations. Please consult the Help file on how to structure your data.

Statistixl ver 1.5 trial version

Data

Bin Frequency

-80000 0

-78075.18322 1

-68612.34886 1

-59149.51451 6

-49686.68015 5

-40223.84579 12

-30761.01144 22

-21298.17708 35

-11835.34272 49

-2372.508364 69

7090.325994 3768

16553.16035 90

26015.99471 52

35478.82907 34

44941.66342 16

54404.49778 7

63867.33214 3

73330.16649 0

82793.00085 0

92255.83521 0

101718.6696 0

111181.5039 0

120644.3383 0

130107.1726 0

139570.007 1

149032.8414 1

158495.6757 0

167958.5101 1

177421.3444 0

186884.1788 0

196347.0131 0

205809.8475 0

215272.6819 0

224735.5162 0

234198.3506 0

243661.1849 0

253124.0193 97

262586.8536 0

272049.688 0

281512.5224 0

290975.3567 0

300438.1911 0

309901.0254 0

319363.8598 0

328826.6941 0

338289.5285 0

347752.3629 0

357215.1972 0

366678.0316 0

376140.8659 0

385603.7003 0

395066.5346 0

404529.369 0

413992.2034 0

423455.0377 0

432917.8721 0

442380.7064 0

451843.5408 0

461306.3751 0

470769.2095 0

480232.0439 0

489694.8782 0

499157.7126 0

508620.5469 0

518083.3813 0

527546.2156 0

where could the error lie? pls advise

## Comments

I think this is down to a bug that has already been addressed in Version 1.6 of statistiXL which will be released soon. You can download a beta version from here. Please uninstall your existing version before installing the new one. Let me know if you still experience problems with your dataset.

Cheers

Alan

however, i'm still having the same problem.

is there something wrong with my dataset? or where could the error lie?

Exactly what error do you get? I have installed the download I linked to on a clean PC and your data runs through it fine (though you will be asked to select the pool data option).

Cheers

Alan

I placed the data for 'bin' in Col A and those of 'Frequency' in Col B. Then i choose Normal Distribution from the Goodness of FIt selections.

in the Category Range: it is Sheet1!$A$2:$A$67

in Freq Range: it is Sheet1!$B$2:$B$67

I chose 'Labels in First Row'

Output Range: Sheet1!$D$2

i chose the bottom 2 tick boxes as well..

still not able to detect any problem... pls advise

If you are wanting to include the labels in the analysis (i.e. as suggested by ticking the "Labels in First Row" option) then you need to include the labels (Row 1) in your range i.e. you should use

Category Range: Sheet1!$A$1:$A$67

Freq Range: Sheet1!$B$1:$B$67

(not Category Range: Sheet1!$A$2:$A$67

Freq Range: Sheet1!$B$2:$B$67)

If you're not bothered about the labels you can keep the ranges the same as you reported but you must untick the "Labels in First Row" option.

When you tick the "Labels in First Row" option statistiXL takes the top row of the range you specify as labels rather than data. With your settings this takes the values in Row 2 as labels and takes rows 3 to 67 as data. Row 3 doesn't contain 0 and so you get the error you mention.

Hope this helps

Alan

from my understanding, goodness of fit tests come in after i plot a curve with my input values. say i plot a CDF curve with my inputs, it gives a 'S' shaped curve, x-axis in Log scale.

now when i wan to apply goodness of fit test to it, i can include into the same chart say a Normal Distribution curve(in CDF as well) and this way i can visually see if the 2 curves fit nicely.

however, if i were to use statistiXL, i will have to input my values, but how do i calculate the frequency in excel? is there any functions within excel that automatically derives the frequency range for me?

In statistiXL, the Goodness of Fit module is intended to compare two frequency distributions, an expected one with observed data.

The observed data can be a typical frequency distribution i.e. a range of X values and a corresponding range of how many times each of those X values was recorded, OR a range of unique X values (non occurs twice or more) so each X has a frequency of 1. But you will probably have to convert these latter data to a frequency histogram anyway, to get a uniform range of bin sizes.

The expected frequency distribution is calculated in statistiXL for some of the common ones (normal, Poisson, binomial, uniform, circular). I am sure you appreciate that there are MANY other possible theoretical distributions, so it is not possible to cover all of them in statistiXL. However, you should be able to calculate the expected frequency for your particular distribution. The first thing you will probably have to do is arrange your X data as a frequency histogram (if it isn't already) so that you have uniform bin categories. You should then from first principles (knowledge of your theoretical distribution) be able to calculate the expected frequency for each bin range. If you sum these expected frequencies for your observed bin range, then you will most likely find that the sum of the expected frequencies is not equal to the sum of the observed frequencies. So, readjust your expected frequencies so that their sum equals the sum of the observed frequencies.

You should now be ready to do the Goodness of Fit test in statistiXL.

I hope this helps.

Phil

"An error has occurred with the analysis. Pls make sure that the dataset meets the requirements specified in the help file."

my input data are as follows:

Bin Frequency

-11000 0

-10800 1

-1819.52377 14

7160.952459 3703

16141.42869 31

25121.90492 10

34102.38115 7

43082.85738 1

52063.33361 1

61043.80984 0

70024.28607 0

79004.7623 0

87985.23852 0

96965.71475 0

105946.191 0

114926.6672 0

123907.1434 0

132887.6197 0

141868.0959 1

150848.5721 1

159829.0484 0

168809.5246 1

177790.0008 0

186770.477 0

195750.9533 0

204731.4295 0

213711.9057 0

222692.382 0

231672.8582 0

240653.3344 0

249633.8107 0

258614.2869 0

267594.7631 0

276575.2393 0

285555.7156 0

294536.1918 0

303516.668 0

312497.1443 0

321477.6205 0

330458.0967 0

339438.573 0

348419.0492 0

357399.5254 0

366380.0016 0

375360.4779 0

384340.9541 0

393321.4303 0

402301.9066 0

411282.3828 0

420262.859 0

429243.3352 0

438223.8115 0

447204.2877 0

456184.7639 0

465165.2402 0

474145.7164 0

483126.1926 0

492106.6689 0

501087.1451 0

510067.6213 0

519048.0975 0

528028.5738 0

any idea where went wrong? or is it the bug mentioned by alan?

I will have a look at this but given that you have more than 98% of your values in a single bin this isn't going to meet any theoretical distribution. Nearly all of your categories have 0 (or almost 0 ) readings. The dataset you have simply isn't really suited to any form of frequency analysis (it has almost no variability - practically everything is in one group).

Cheers

Alan

unless my method of generating the bin and frequency is wrong, otherwise i tink tis is the best i can achieve.

I generated the bin and frequency by using the 'Data Analysis -> Histogram'. If there is any better way of doing it, pls advise.

thanks

This analysis runs with the latest version of statistiXL (1.6 beta) referenced in my first reply at the top of this topic. Can you please download it and give it a go - it may have been updated since you last downloaded it?

Cheers

Alan

anyway was wondering, at what percentage level is the goodness of fit tests conducted? 90%? 95%?

also does the program allows the user to record a macro to perform a whole series of computations with the funcitons provided? tried doing it but nothing is reflected in the macro.

Cheers

Alan

There is currently no way to enable macros. We are hoping to expose Excel Formulae for many of statistiXL's functions in a later version. For now you could write VBA code against the statistiXL DLLs though we are unable to provide technical support for this solution and the DLLs are undocumented (though fairly self explanatory if you browse their functions using Excel's object browser).

Cheers

Alan