
Category: Statistics

Here are the assignments and the research done for the Statistics course (class 2018-2019, Fall 2018).

Measuring dependence and the regression line

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data. Formally, random variables are dependent if they do not satisfy the mathematical property of probabilistic independence. When used in a technical sense, correlation refers to any of several specific types of relationship between mean values. There are several correlation coefficients measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables. Other correlation coefficients have been developed to be more robust than the Pearson correlation, that is, more sensitive to nonlinear relationships. Mutual information can also be applied to measure dependence between two variables.

Correlation and linearity. The Pearson correlation coefficient indicates the strength of a linear relationship between two variables, but its value generally does not completely characterize their relationship. In particular, if the conditional mean of Y given X, denoted E(Y | X), is not linear in X, the correlation coefficient will not fully determine the form of E(Y | X).

In statistics, linear regression is a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called "simple linear regression". Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function, as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can also be used to fit models that are not linear.
Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous. A fitted linear regression model can be used to identify the relationship between a single predictor variable Xj and the response variable y when all the other predictor variables in the model are held fixed. Specifically, the interpretation of βj is the expected change in y for a one-unit change in Xj when the other covariates are held fixed; that is, the expected value of the partial derivative of y with respect to Xj. This is sometimes called the unique effect of Xj on y. In contrast, the marginal effect of Xj on y can be assessed using a correlation coefficient or a simple linear regression model relating only Xj to y;…
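As a minimal sketch of the two ideas above (the Pearson coefficient and a simple least-squares line), using only the Python standard library; the variable names and the small data set are illustrative, not from the course material:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of a *linear* relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Least-squares intercept and slope for the model y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]   # roughly y = 2x, with small noise
print(pearson_r(xs, ys))          # close to 1: strong linear relationship
print(fit_line(xs, ys))           # intercept near 0, slope near 2
```

Note that a coefficient near 1 here reflects only the linear part of the relationship; as the text says, it does not fully characterize E(Y | X) in general.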

Most common pseudorandom generators

A random number generator (RNG) is a device that generates a sequence of numbers or symbols that cannot reasonably be predicted better than by random chance. Random number generators can be true hardware random-number generators (HRNG), which generate genuinely random numbers, or pseudo-random number generators (PRNG), which generate numbers that look random but are actually deterministic and can be reproduced if the state of the PRNG is known. A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is thus an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed (which may include truly random values). There exist several computational methods for pseudo-random number generation, but all fall short of the goal of true randomness, although they may meet, with varying success, some of the statistical tests for randomness intended to measure how unpredictable their results are.

The generation of pseudo-random numbers is an important and common task in computer programming. There are a couple of methods to generate a random number that follows a given probability density function. These methods involve transforming a uniform random number in some way; because of this, they work equally well for generating both pseudo-random and truly random numbers. One method, called the inversion method, involves integrating the density up to an area greater than or equal to the random number (which should be generated between 0 and 1 for proper distributions). A second method, called the acceptance-rejection method, involves choosing an x and a y value and testing whether the function evaluated at x is greater than the y value. If it is, the x value is accepted; otherwise, the x value is rejected and the algorithm tries again.
Random numbers uniformly distributed between 0 and 1 can be used to generate random numbers of any desired distribution by passing them through the inverse cumulative distribution function (CDF) of the desired distribution (see inverse transform sampling). Inverse CDFs are also called quantile functions. A PRNG suitable for cryptographic applications is called a cryptographically secure PRNG (CSPRNG). A requirement for a CSPRNG is that an adversary who does not know the seed has only a negligible advantage in distinguishing the generator's output sequence from a random sequence. In other words, while a PRNG is only required to pass certain statistical tests, a…
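A small sketch of the inverse transform method described above, assuming the exponential distribution as target (the rate parameter `lam` and the sample count are illustrative): the inverse CDF of Exp(λ) is −ln(1 − u)/λ, so pushing uniform values through it yields exponential samples.

```python
import math
import random

def exponential_sample(lam, rng=random.random):
    """Inverse transform sampling for the Exp(lam) distribution."""
    u = rng()                          # uniform in [0, 1)
    return -math.log(1.0 - u) / lam    # inverse CDF (quantile function)

random.seed(42)
samples = [exponential_sample(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))     # should be close to 1/lam = 0.5
```

The same recipe works for any distribution whose quantile function can be evaluated; when it cannot, the acceptance-rejection method mentioned above is the usual fallback.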

Central Limit Theorem and LLN

The Central Limit Theorem is one of the greatest results in probability theory because it says that the sum of a large number of variables has approximately a normal distribution. We define a set of i.i.d. random variables X1, X2, X3, …, Xn with mean μ and variance σ². Then, if n is very large, the sum X1 + X2 + X3 + … + Xn is approximately normal with mean nμ and variance nσ². If we also normalize the sum, we can say that (X1 + X2 + X3 + … + Xn − nμ)/(σ√n) is approximately a standard normal, that is, a normal with mean 0 and variance 1.

Look for the most popular distributions in statistics

Distributions can be divided into two categories: discrete and continuous. The most popular discrete distributions are:
1) Bernoulli (Boolean), which takes value 1 with probability p and value 0 with probability q = 1 − p.
2) Binomial, which describes the number of successes in a series of independent Yes/No experiments, all with the same probability of success.
3) Poisson, which describes the count of a very large number of individually unlikely events that happen in a certain time interval.
4) Hypergeometric, which describes the number of successes in the first m of a series of n consecutive Yes/No experiments, if the total number of successes is known. This distribution arises when there is no replacement.

The most popular continuous distributions are:
1) Normal (or Gaussian), often used in the natural and social sciences to represent real-valued random variables.
2) Chi-squared, which is the sum of the squares of n independent standard Gaussian random variables. It is a special case of the Gamma distribution.
3) Gamma, which describes the time until n consecutive rare random events occur in a process with no memory.
4) Beta, a family of two-parameter distributions with one mode, of which the uniform distribution is a special case, and which is useful in estimating success probabilities.
5) Student's t, useful for estimating unknown means of Gaussian populations.
6) F-distribution (Fisher), a continuous probability distribution that arises frequently as the null distribution of a test statistic, most notably in the analysis of variance.
7) Weibull, of which the exponential distribution is a special case; it is used to model the lifetime of technical devices and to describe the particle size distribution of particles generated by grinding, milling and crushing operations.
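The normalization stated above can be checked empirically; a sketch (the choice of uniform(0,1) summands, with μ = 0.5 and σ² = 1/12, and the sample sizes are illustrative):

```python
import math
import random

def normalized_sum(n, rng):
    """(X1 + ... + Xn - n*mu) / (sigma * sqrt(n)) for uniform(0,1) summands."""
    mu, var = 0.5, 1.0 / 12.0
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / math.sqrt(n * var)

rng = random.Random(0)
z = [normalized_sum(30, rng) for _ in range(20_000)]
mean = sum(z) / len(z)
var = sum((x - mean) ** 2 for x in z) / len(z)
print(mean, var)   # both close to the standard normal's 0 and 1
```

A histogram of `z` would show the familiar bell shape even though each summand is uniform, which is exactly the content of the theorem.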

Derivation of Chebyshev's inequality and its application to prove the (weak) LLN

Chebyshev's inequality is a direct derivation of Markov's inequality. It says that if X is a random variable with mean μ and variance σ², then for every r > 0 we have:

P(|X − μ| ≥ r) ≤ σ²/r²

Proof: the events {|X − μ| ≥ r} and {(X − μ)² ≥ r²} are the same, and so their probabilities are the same too. Since (X − μ)² is a non-negative random variable, we can apply Markov's inequality with a = r², so that:

P(|X − μ| ≥ r) = P((X − μ)² ≥ r²) ≤ E[(X − μ)²]/r² = σ²/r²

Chebyshev's inequality is also used to prove the weak law of large numbers. Let X1, X2, …, Xn be a sequence of i.i.d. (independent and identically distributed) random variables with mean μ. Then for every ε > 0,

P(|(X1 + X2 + … + Xn)/n − μ| > ε) → 0 as n → ∞

Proof: we prove this result with the additional hypothesis that the Xi have variance bounded by σ². By the properties of mean and variance we have:

E[(X1 + X2 + … + Xn)/n] = μ and Var((X1 + X2 + … + Xn)/n) = σ²/n

So, by applying Chebyshev's inequality to the random variable R = (X1 + X2 + … + Xn)/n, we have:

P(|(X1 + X2 + … + Xn)/n − μ| > ε) ≤ σ²/(nε²)

Since σ²/(nε²) → 0 as n → ∞, the law is proved.
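The bound can be checked numerically; a Monte Carlo sketch (the uniform(0,1) population, with μ = 0.5 and σ² = 1/12, and the thresholds are illustrative):

```python
import random

rng = random.Random(1)
mu, var = 0.5, 1.0 / 12.0
xs = [rng.random() for _ in range(100_000)]

for r in (0.2, 0.3, 0.4):
    # observed frequency of the event |X - mu| >= r
    freq = sum(abs(x - mu) >= r for x in xs) / len(xs)
    bound = var / r**2          # Chebyshev's upper bound sigma^2 / r^2
    print(r, freq, bound)       # freq stays below the bound
```

For this distribution the bound is quite loose (the true probability of |X − 0.5| ≥ 0.2 is 0.6 against a bound of about 2.08), which is typical: Chebyshev trades tightness for complete generality.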

Boole's inequality and the calculation of the union probability of n arbitrary events. Explain in a simple way the concept of the sampling distribution of the mean (or any other computable statistic on the sample, such as the standard deviation (sigma), mode or median).

Boole's inequality, or union bound, says that for every finite or countable collection of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events. Formally, if we have a finite or countable set of events A1, A2, A3, …, An, we say that:

P(A1 ∪ A2 ∪ … ∪ An) ≤ P(A1) + P(A2) + … + P(An)

It is easily proved for n = 2 events. If, for example, we have two arbitrary events A and B, by inclusion-exclusion we can say that:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Since P(A ∩ B) ≥ 0 and it is subtracted, it follows that P(A ∪ B) ≤ P(A) + P(B). If we then consider the event C = A ∪ B and another arbitrary event D, we can iterate the argument, passing from n to n + 1 events by induction; the result is the formula above. So it is possible to apply Boole's inequality to bound the union probability of n arbitrary events.

Explain in a simple way the concept of the sampling distribution of the mean (or any other computable statistic on the sample, such as the standard deviation (sigma), mode or median). The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled. Therefore, if a population has a mean μ (unknown), then the mean of the sampling distribution of the mean is also μ. The symbol μM (computed from the samples) is used to refer to the mean of the sampling distribution of the mean. Therefore, the formula for the mean of the sampling distribution of the mean can be written as:

μM = μ

It is important to keep in mind that every statistic, not just the mean, has a sampling distribution.
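The identity μM = μ can be illustrated by simulation; a sketch (the Gaussian population with mean 10, the sample size n and the number of trials are all illustrative choices):

```python
import random

rng = random.Random(7)
# A synthetic population; its mean plays the role of the unknown mu.
population = [rng.gauss(10.0, 3.0) for _ in range(100_000)]
mu = sum(population) / len(population)

# Repeatedly draw samples of size n and record each sample mean:
# these values are draws from the sampling distribution of the mean.
n, trials = 25, 5_000
sample_means = [sum(rng.sample(population, n)) / n for _ in range(trials)]
mean_of_means = sum(sample_means) / trials

print(mu, mean_of_means)   # the two values are close: mu_M = mu
```

The spread of `sample_means` (its standard deviation, the standard error) shrinks as n grows, which is the other half of the sampling-distribution story.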

Concept and definition of mean. Relationship between frequency and mean. The Markov inequality.

Concept and definition of mean

In statistics, we define the mean as a single numeric value that synthetically describes a set of data. There are three types of mean: the arithmetic, harmonic and geometric mean. In statistics, we usually take the mean to be the arithmetic mean. If the mean is computed using the whole population it is called the population mean, but if the values used are just a subset of the population the result is called the sample mean. The equation to compute the arithmetic mean is the following:

A := (1/n) Σ(i = 1 to n) aᵢ

This kind of approach is called ex post computation, because we need to collect the whole sample before computing. This isn't always possible, so in some cases we must use the expected mean, or expected value, which is a measure of central tendency where all data are weighted by their probability of occurring and then summed. The expected mean is an ex ante calculation (sometimes referred to as a weighted mean, where the probabilities are the weights).

The mean can be used to extract results and interpretations from the data. For example, if we compute the distance between a value v of the population and the average m, that deviation must be balanced by the deviations of other values v′ on the opposite side of m: the signed deviations from the mean always sum to zero, so the total distance of the values above the mean equals the total distance of the values below it.

The statistical mean is popular because it includes every item in the data set and it can easily be used with other statistical measurements. However, the major disadvantage of using the statistical mean is that it can be affected by extreme values in the data set and therefore be biased. For example, mean income is typically skewed upwards by a small number of people with very large incomes, so that the majority have an income lower than the mean.
To avoid numerical problems, one solution is Knuth's incremental algorithm:

    # Python implementation of Knuth's incremental mean algorithm
    def mean_knuth(data):
        n = 0
        mean = 0.0
        for x in data:
            n = n + 1
            delta = x - mean
            mean = mean + delta / n
        return mean

This algorithm is less subject to the information loss caused by floating-point cancellation and rounding, but it can be less efficient than the naive implementation because of the division inside the loop.

The relationship between frequency and mean

To…
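A self-contained usage sketch of the incremental update described above, compared against the naive ex post computation (the data set is illustrative):

```python
def running_mean(data):
    """Incremental (Knuth-style) mean: no running sum is ever stored."""
    n, mean = 0, 0.0
    for x in data:
        n += 1
        mean += (x - mean) / n   # fold each value into the mean as it arrives
    return mean

data = [2.0, 4.0, 6.0, 8.0]
print(running_mean(data))        # 5.0
print(sum(data) / len(data))     # 5.0, the naive ex post mean agrees
```

The incremental form also works on data streams, where the ex post formula is impossible because the data set is never fully in memory.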

Conditional frequency

Request

Understand and discuss the notion of conditional frequency. Insight 2: research the various approaches to defining the concept of probability, the 'Kolmogorov axioms' and the relationship with the notion of (relative) frequency. Exercise: take a CSV file and read all the lines. Split each line by using the string 'split' method and (if you can) load the data as properties of suitable objects in a list of objects (for example, Student objects).

Execution

To introduce the concept of conditional frequency, we must first discuss the concept of conditional probability. For example, if I have two fair dice and I want to describe all the possible results, I can use this notation:

S = {(i, j) : i = 1, 2, 3, 4, 5, 6, j = 1, 2, 3, 4, 5, 6}

where i is the result of the first die and j is the result of the second one. Since the dice are fair, every combination (i, j) has probability 1/36 of happening. Suppose now that the first die shows 3, so we fix i = 3 and ask ourselves: what is the probability that the sum of the results will be 8? By fixing i = 3 there are 6 possible results: (3,1), (3,2), (3,3), (3,4), (3,5), (3,6). This means that if i = 3, each of these events has probability 1/6, and each of the other 30 initial events has probability 0. Then we can conclude that the probability that i + j = 8 when i = 3 is just 1/6. If we now call the event "i + j = 8" E and the event "i = 3" F, then what we've just calculated is the conditional probability of E given F, denoted P(E|F). We can also say that if F has occurred, then for E to occur the outcome must be part of E ∩ F (the intersection). Since F has actually happened, it becomes the new set of possible outcomes.
So the conditional probability is the ratio of the probability of E ∩ F to the probability of F:

P(E|F) = P(E ∩ F) / P(F)

Given these definitions, the conditional frequency is the number of occurrences of the event E given the event F, and it follows the same formula as the conditional probability.

Insights

Research the various approaches to define the concept of probability, the 'Kolmogorov axioms' and…
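The dice example above can be checked by enumerating the sample space; a sketch using exact fractions (the variable names are illustrative):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes (i, j) of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

F = [(i, j) for i, j in outcomes if i == 3]               # first die shows 3
E_and_F = [(i, j) for i, j in outcomes if i == 3 and i + j == 8]

p_F = Fraction(len(F), len(outcomes))                     # 6/36 = 1/6
p_E_and_F = Fraction(len(E_and_F), len(outcomes))         # 1/36

# P(E|F) = P(E and F) / P(F)
print(p_E_and_F / p_F)   # 1/6, matching the text
```

Counting outcomes inside F exactly mirrors the idea that F becomes the new sample space once it has occurred.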

Statistics and range applications

Request

Definition of statistics and its range of applications. Basic notions and definitions: population, statistical unit, attributes, observations, dataset. The concept of scales (or levels) of measurement.

Execution

Statistics is the science of drawing conclusions from experimental data. A typical statistical scenario is when we want to study a really big set of something that has some measurable values, called attributes, associated with it. This enormous set is called the "population". The statistical approach to these problems consists in selecting a reduced subset of the population, called a sample (in Italian, "campione"); each individual member of the population is a statistical unit. From this sample, with some observations to record the pieces of information we want to study, we can retrieve a dataset, a set of measurable values, and we can study it to draw valid conclusions about the whole population. An implicit hypothesis we have to make is that there is a probability distribution over the population, so that our dataset consists of independent values drawn from that distribution. By also applying the result of the CLT (Central Limit Theorem), we can introduce the concept of "scale of measurement" by noting that one of the main difficult points of statistical research is the size of the sample: the larger the sample, the more accurate and efficient the study.

Insight

Most used programming languages within VS.NET: main similarities, differences, and comparison of languages; online translator. VS.NET is built for the main core and infrastructure, but it can easily be ported virtually everywhere. Its syntax is based on C#, so it really looks like C#, C and C++, and it is object-oriented. Here it is possible to find some comparisons: Java vs .NET explained with cats, Python vs .NET, NodeJS vs .NET vs Spring. Here is the online translator from .NET to C#.

Application

Discuss possible differences between VB and C#: VB has no parentheses, and correct indentation is important. They have different but similar syntax. VB needs .NET to run, while C# needs csc; C# has to be compiled, while VB does not.