# Common Vocabulary Used in Statistics

This is a list of common terms I come across whilst reading papers and that were unclear to me beforehand.

### Degrees of Freedom

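The degrees of freedom are the number of values used in a calculation which are free to vary. For example, if the mean of n values is fixed, only n - 1 of them can vary freely, because the last one is then determined by the others.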
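To make the idea concrete, here is a minimal sketch using the sample variance. The numbers are made up for illustration; the point is only that the deviations from the sample mean must sum to zero, so the last one is forced once the others are known.

```python
import statistics

values = [4.0, 7.0, 9.0, 12.0]  # hypothetical sample, chosen only for illustration
n = len(values)
mean = statistics.mean(values)

# Deviations from the sample mean always sum to zero.
deviations = [v - mean for v in values]

# So once n - 1 deviations are known, the last one is not free to vary.
last_deviation = -sum(deviations[:-1])

# The sample variance therefore divides by n - 1 (the degrees of freedom), not n.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

print("forced last deviation:", last_deviation, "actual:", deviations[-1])
print("sample variance:", sample_variance)
```

The hand-computed value agrees with `statistics.variance`, which uses the same n - 1 divisor.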

### Difference between a statistic & a parameter

A parameter is a definitive value: it is the truth. Take the average age of my class. If you define my class as the population and I ask all 20 of them, then the average I get is a parameter. A parameter is irrefutable.

It is rare that we are able to ask everyone in a population; generally a population is much larger than it is physically or economically possible to survey. This is where a statistic comes in. A statistic is an estimate of a parameter of our system based on a sample. What is important is that, since a statistic is based on a sample, it is disputable: it depends on which sample I happen to draw. This is where the whole realm of sampling techniques comes into play, to try to ensure that our sample is representative. In addition, with a sample's statistics we can develop the notion of a confidence interval, which we will come onto later.
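As a rough sketch of the difference, the snippet below treats a class of 20 as the whole population. The ages, the sample size of 5 and the seed are all assumptions made up for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the draws are repeatable

# Hypothetical ages for a class of 20 people: this is the whole population.
class_ages = [21, 22, 22, 23, 23, 23, 24, 24, 25, 25,
              26, 26, 27, 28, 29, 30, 31, 33, 35, 40]

# Asking everyone gives the parameter.
parameter = statistics.mean(class_ages)

# Asking only 5 people gives a statistic, and it depends on who we pick.
sample_means = [statistics.mean(random.sample(class_ages, 5)) for _ in range(3)]

print("parameter:", parameter)
print("statistics from three samples:", sample_means)
```

The parameter never changes however often the script runs, whereas each sample gives its own estimate of it.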

Getting representative data and constructing your analysis in the right way is exceptionally complicated. For example, a lot of the changes I am working on just now for a company relate to designing new parts of the site. We have a current group of paying customers which we use to test modifications for the new site. The problem we encounter is that the average age of our current customers is in the 40s, whereas we are looking to design a site with mass appeal to attract more customers. So our design changes are validated statistically by our current set of customers, which may not be representative. Perhaps we will have to adapt our strategy and wait longer to have a sufficiently diverse population to make decisions on, but too long a test could be costly. Getting a good statistic is hard at times!

### Endogenous Variable

An endogenous variable is a term we come across often in linear regression, in particular from the point of view of econometrics. It has parallels to a dependent variable, and in essence is a variable whose value can be fully described by a model (for example a set of simultaneous equations). For example, the price of a house may be related to its location, size, number of bedrooms etc. Therefore, the price of the house could be described as an endogenous variable. In reality it is not completely determined by the system of equations, as there will always be variables which the model does not take into account, for example personal feeling towards the house, the type of neighbours etc.

The etymology of the word, from Greek, is apt for describing what it means: ‘endo’ = ‘within’ and ‘genous’ = ‘producing’. Therefore, we can think of it as a variable produced from within the system.

### Exogenous Variables

In contrast to the above, these are variables which are not captured or explained by other variables in the system (‘exo’ = ‘outside’). They can be thought of as similar to independent variables. In the above example of the price of the house, a variable outside the system is, for example, the selling ability of the estate agent. Their personal ability is not affected by the house price: it is independent of the system and is fixed on entering it. That said, the exact separation of exogenous and endogenous variables can be somewhat complicated, as determining the boundary of your system can be quite subjective.
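A toy sketch of the house price example may help to see the distinction. Everything here is hypothetical: the function name `house_price`, the linear form and the coefficients are invented for illustration, and treating size, bedrooms and location as given inputs is an assumption about where the boundary of the system is drawn.

```python
def house_price(size_m2, bedrooms, location_score):
    """Toy linear model: the price is produced within the model from its inputs."""
    # Made-up coefficients, not estimated from any data.
    return 50_000 + 2_000 * size_m2 + 15_000 * bedrooms + 10_000 * location_score


# Treated as exogenous here: fixed on entering the system, not explained by it.
inputs = {"size_m2": 90, "bedrooms": 3, "location_score": 7}

# Endogenous: the price is determined by the model.
price = house_price(**inputs)

# Changing an input changes the price, but the price never feeds back into the inputs.
bigger_house_price = house_price(size_m2=120, bedrooms=3, location_score=7)

print(price, bigger_house_price)
```

If the model were extended so that, say, location desirability itself depended on local prices, that variable would move inside the system and become endogenous too.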

### References

[1] http://www.statisticshowto.com/endogenous-variable/