LONDON SCHOOL OF COMMERCE
Quantitative Techniques in Business
Submitted to: Safaat Ullah
Submitted by: Fawzia
Statistics is a mathematical science which includes methods of collecting, organizing and analyzing data in such a way that meaningful conclusions can be drawn from them. Data is quantitative if the observations or measurements made on a given variable of a sample or population have numerical values. In general, its investigations and analyses fall into two broad categories called descriptive and inferential statistics.
This report discusses statistics, with a specific focus on the mean and the standard deviation. It calculates the mean and standard deviation of attendance at an Art Gallery in England and presents the data as a scatter plot. It then examines the correlation and regression between the two attendance variables, using the charts and graphs given below. Through these analyses, the author can state whether the relationship between the variables is weak or strong, and positive or negative.
Mean refers to the average, which is used to summarize the central value of the data in question. It is determined by adding together all the data points in a population and dividing the total by the number of points. The resulting number is known as the mean or average. (Jassen, 2008)
Standard deviation is the measure of dispersion of a set of data from its mean. It calculates the absolute variability of a distribution, i.e. the higher the dispersion or variability, the greater is the standard deviation and greater will be the magnitude of the deviation of the value from their mean. (Economic Times, 2018)
The formula used for standard deviation is:
\[
\sigma = \sqrt{\frac{\sum f x^{2}}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^{2}}
\]
Significance of Standard deviation
Standard deviation measures the deviation from the mean, which makes it a very important statistic. It works by squaring the deviations, so that negative values become positive. Squaring makes small numbers relatively smaller and large numbers relatively larger, which has an expanding effect. As a result, small deviations carry little weight and the effect of the larger ones is easier to see. (Pristine, 2008)
Observation #        Attendance on same day previous week (x)   Attendance on day (y)
1                    7882                                       8876
2                    6115                                       7203
3                    5351                                       4370
4                    8546                                       7192
5                    6022                                       6835
6                    7367                                       5469
7                    7871                                       8207
8                    5377                                       7026
9                    5259                                       7592
10                   4915                                       3190
11                   6538                                       7012
12                   6507                                       5517
13                   5118                                       3764
14                   6077                                       7575
Mean                 6353.214                                   6416.2857
Standard deviation   1161.114                                   1697.0849

The table above shows the mean and standard deviation of the attendance on same day previous week (x) and the attendance on day (y). The mean of the attendance on same day previous week (x) is 6353.214 and the mean of the attendance on day (y) is 6416.2857. The standard deviation of the attendance on same day previous week (x) is 1161.114 and that of the attendance on day (y) is 1697.0849. This shows that the attendance on day (y) is, on average, slightly higher than the attendance on same day previous week (x), and that it also varies more.
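As a check on the figures above, the following minimal sketch recomputes the mean and standard deviation with Python's standard `statistics` module. The variable names are illustrative assumptions. The reported standard deviations agree with the sample form (dividing by n - 1, which `stdev` uses); the population form (`pstdev`) is printed alongside only for comparison.

```python
from statistics import mean, stdev, pstdev

# Attendance figures copied from the table in this report.
previous_week = [7882, 6115, 5351, 8546, 6022, 7367, 7871,
                 5377, 5259, 4915, 6538, 6507, 5118, 6077]   # x
on_day = [8876, 7203, 4370, 7192, 6835, 5469, 8207,
          7026, 7592, 3190, 7012, 5517, 3764, 7575]          # y

for label, values in (("x", previous_week), ("y", on_day)):
    print(label,
          "mean =", round(mean(values), 4),
          "sample SD =", round(stdev(values), 4),          # divides by n - 1
          "population SD =", round(pstdev(values), 4))     # divides by n
```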
There are various types of graphs, some of them are shown below:
Figure 1. (Excel Easy, 2010)
A bar chart is a graph that uses rectangular bars to represent changes in the size, value or rate of something, or to compare quantities of data relating to various countries or groups. (Collins, 1819)
Figure 2. Pie Chart (Brilliant, 2012)
A pie chart, also known as a pie graph or circle graph, is a circular chart that is cut by radii into segments which illustrate relative magnitudes or frequencies. (Khim, 2012)
Figure 3. Line Graph (ILETSExam.net,1999)
Line graphs show how information or data changes over time. They can be used to plot any data that has peaks and troughs. In other words, they highlight trends. (ILETSExam.net, 1999)
Figure 4. Best Excel Tutorial (K, 2012)
A histogram is a column chart which presents the number of occurrences of a characteristic within each range of values. Histograms are used in statistics as well as in quality control. (K, 2012)
Figure 5 (Odessa 1993)
Scatter plots are used to plot data points on a horizontal and a vertical axis in an attempt to show how much one variable is affected by another. Each row in the data table is represented by a marker whose position is based on its values in the columns set on the X and Y axes.
A third variable can be set to correspond to the color or size of the markers, which adds yet another dimension to the plot.
The relationship between two variables is known as their correlation. If the markers come close to forming a straight line in the scatter plot, the two variables have a high correlation. If the markers are evenly dispersed across the scatter plot, the correlation is low, or zero. However, even when a correlation appears to be present, it does not always mean that one variable affects the other. Both variables could be related to some third variable which explains their variation, or pure coincidence might cause an apparent correlation. (D. Rode, 2014)
Advantages of scatter plot
The scatter diagram is used to find the correlation between two variables. The graph helps to determine how closely the two variables are connected. After determining the correlation between the variables, the author can interpret the behaviour of the dependent variable based on the values of the independent variable. In statistics, the scatter plot is important because it shows the extent of the correlation in the data. If there is any correlation between the variables, it shows clearly in the pattern of points on the scatter plot. As a result, it enables the author to interpret the data visually. (Usmani, 2018)
Disadvantages of scatter plot
Scatter plots are not used very often in infographics, but they have their place. They make it easy to see correlation and clustering in large amounts of data, and they work with almost any continuous-scale data. As a quick overview, however, they are of limited value. Scatter plots do not always give an accurate picture of the relationship between the variables, and a single plot cannot show the correlation among more than two variables. (Fogarty, 2015)
Scatter plots are also not always well suited to presentation. Various problems can occur, so care is needed whenever a scatter plot is used for analysis or presentation. (Preserve, 2012)
Types of scatter plots
Positive scatter plots:
When one variable increases as the other variable increases, the plot is known as a positive scatter plot. The figure shows a perfectly positive linear relationship, where r = 1.
Figure 6. Relationship of positive scatter plot (Johnson & Wichern 1998)
Negative scatter plot:
When one variable increases as the other variable decreases, the plot is known as a negative scatter plot. The figure shows a perfectly negative linear relationship, where r = -1.
Figure 7. Relationship of negative scatter plot (Endurite, 2018)
If the data points on a scatter graph do not have any kind of linear positive or negative trend, then there is said to be no connection between the two variables that are plotted in the graph.
Figure 8. No relation between the variables (Ridner & Wilson, 2002)
Scatter Plot of the Art Gallery
Figure 9. (own assignment)
Here, the author can see the scatter plot of the Art Gallery which is in England. It illustrates a linear relationship. A relationship is linear if one variable changes by roughly the same amount each time the other variable changes by one unit. The diagram above shows a weak but positive relationship between the variables.
Correlation is a statistical measure that shows the extent to which two or more variables change together. A positive correlation shows the extent to which the variables rise or fall together, whereas a negative correlation shows the extent to which one variable rises as the other falls.
The formula used to find the correlation is given below:
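\[
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
\]
Significance of Correlation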
Correlation is a vital element of statistics as a measure of the relationship between variables, such as test scores and other measures of performance. It helps to establish whether or not a linear relationship exists between two variables. Depending on the scale of the data, the author must choose an appropriate correlation measure. The strength of the correlation shows how closely the two variables are related. The analysis yields quantitative data that can readily be analyzed further. A correlational study can be carried out on variables that can be measured but not controlled, for instance where an experiment would be impractical or unethical to conduct. Because a correlation indicates the presence or absence of a relationship between two variables, it is also useful for identifying areas where experiments could be carried out to obtain more detailed results. (Curwin & Slater, 2008)
Interpretation of Correlation
The correlation coefficient r measures the direction and strength of a linear relationship between two variables on a scatter plot. The value of r always lies between +1 and -1. Some typical values and their interpretation are shown below:
If r is exactly -1, there is a perfect negative relationship. A value of -1 does not signify a bad relationship. It means that the data points lie exactly on a straight line that slopes downhill, which is the strongest possible negative linear relationship between the variables. The minus sign only shows that the line runs downhill.
If r is about -0.7, there is a strong downhill (negative) linear relationship.
If r is about -0.5, there is an average downhill (negative) relationship between the variables.
If r is about -0.3, there is a weak downhill (negative) linear relationship.
If r is 0, there is no linear relationship between the variables.
If r is about +0.3, there is a weak uphill (positive) linear relationship between the variables.
If r is about +0.5, there is an average uphill (positive) relationship between the variables.
If r is about +0.7, there is a strong uphill (positive) linear relationship.
If r is exactly +1, there is a perfect uphill (positive) linear relationship between the variables.
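The list above can be turned into a small lookup, sketched here in Python. The list only gives reference points (0.3, 0.5, 0.7), so treating them as the lower edges of bands is an assumption made for illustration, as are the function name `describe_correlation` and the "very weak" label for values below 0.3.

```python
def describe_correlation(r: float) -> str:
    """Label r using 0.3 / 0.5 / 0.7 as band edges (an assumed banding)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear relationship"
    direction = "uphill (positive)" if r > 0 else "downhill (negative)"
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "average"
    elif size >= 0.3:
        strength = "weak"
    else:
        strength = "very weak"   # label assumed; not part of the list above
    return f"{strength} {direction} linear relationship"


for value in (-1.0, -0.7, -0.3, 0.0, 0.5, 1.0):
    print(value, "->", describe_correlation(value))
```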
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.57111
R Square             0.326167
Adjusted R Square    0.270014
Standard Error       992.0465
Observations         14

ANOVA
               df    SS             MS          F         Significance F
Regression     1     5716540.445    5716540     5.80857   0.032907
Residual       12    11809875.91    984156.3
Total          13    17526416.36

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      3846.095       1073.513246      3.582717   0.003763   1507.11     6185.079    1507.11       6185.079
X Variable 1   0.390743       0.162127538      2.410097   0.032907   0.037498    0.743989    0.037498      0.743989

Analysis of Correlation
The correlation has been calculated to find the relationship between the attendance on the same day of the previous week (x) and the attendance on the day (y). The result from the calculation above is 0.57. The author can interpret this as a positive relationship of only weak-to-moderate strength. This means that the previous week's attendance has some bearing on the forthcoming week's attendance.
However, relying on the previous week's attendance alone might not be the most effective way to predict the forthcoming week's attendance.
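The figure of 0.57 can be reproduced directly from the attendance data. The sketch below applies the sum-based correlation formula given earlier in this report; the function and variable names are illustrative assumptions.

```python
from math import sqrt

# Attendance figures copied from the table in this report.
previous_week = [7882, 6115, 5351, 8546, 6022, 7367, 7871,
                 5377, 5259, 4915, 6538, 6507, 5118, 6077]   # x
on_day = [8876, 7203, 4370, 7192, 6835, 5469, 8207,
          7026, 7592, 3190, 7012, 5517, 3764, 7575]          # y


def pearson_r(x, y):
    """Correlation coefficient from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(previous_week, on_day)
print("r =", round(r, 5))
```

The formula is symmetric in x and y, so swapping the two lists gives the same value of r.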
Regression is a statistical technique that determines the relationship between two or more variables. It describes how a change in a dependent variable is related to a change in one or more independent variables, and expresses that relationship as an equation.
In the summary output above, the author can see that the Multiple R is 0.57, which is the same as the correlation explained in the previous section. Thus, the value of 0.57 again indicates a positive relationship of only weak-to-moderate strength between the variables.
R square is one of the most important numbers in the output. It measures the proportion of the variability of the response data around its mean that the model explains. As a rule of thumb, an R square of about 0.8 (80%) is taken here to indicate a good fit. The calculation gives an R square of 0.326, so the model explains only about 33% of the variation in the data. Hence, the goodness of fit is weak here.
However, despite the weak fit, the author can still draw important conclusions about how changes in the predictor values are associated with changes in the response value. (Williams, 2008)
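As a worked check, R square can be recovered from the ANOVA section of the summary output as the regression sum of squares divided by the total sum of squares, and its square root gives back the Multiple R (the subscript names are chosen here for illustration):

```latex
\[
R^{2} = \frac{SS_{\text{regression}}}{SS_{\text{total}}}
      = \frac{5716540.445}{17526416.36} \approx 0.326,
\qquad
\sqrt{R^{2}} \approx 0.571
\]
```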
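The P-value tests whether the relationship between the variables could have arisen by chance. In this scenario, the P-value for the X variable is 0.0329, which is less than the usual significance level of 0.05. From this result the author can conclude that the relationship between the two variables is statistically significant, even though the low R square shows that it is not a strong one.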
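The significance of the relationship can be checked by hand from the Multiple R and the number of observations. The sketch below computes the t statistic for the slope and compares it with the two-tailed 5% critical value of t for 12 degrees of freedom (2.179, taken from standard t tables). A t statistic above the critical value corresponds to a P-value below 0.05. The variable names are illustrative assumptions.

```python
from math import sqrt

r = 0.57111   # Multiple R from the summary output
n = 14        # number of observations
df = n - 2    # residual degrees of freedom

t_stat = r * sqrt(df) / sqrt(1 - r ** 2)
f_stat = t_stat ** 2   # with a single predictor, F equals t squared

T_CRITICAL_5PCT = 2.179   # two-tailed 5% critical value for 12 df, from t tables

print("t =", round(t_stat, 3), " F =", round(f_stat, 3))
print("significant at the 5% level:", abs(t_stat) > T_CRITICAL_5PCT)
```

The t and F values obtained this way can be compared with the "t Stat" and "F" entries in the summary output.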
\[
y = mx + b
\]
In this regression model, the intercept is equivalent to b. The x variable is equivalent to m. (Levine 2005)
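The coefficients in the summary output can be reproduced with the usual least-squares formulas, as in the minimal sketch below. Working through the numbers suggests that the output treats the attendance on day as the predictor and the previous-week attendance as the response. The report does not state this, so it should be read as an observation from the figures, and the function and variable names are illustrative assumptions.

```python
# Attendance figures copied from the table in this report.
previous_week = [7882, 6115, 5351, 8546, 6022, 7367, 7871,
                 5377, 5259, 4915, 6538, 6507, 5118, 6077]
on_day = [8876, 7203, 4370, 7192, 6835, 5469, 8207,
          7026, 7592, 3190, 7012, 5517, 3764, 7575]


def least_squares(x, y):
    """Return (slope m, intercept b) of the fitted line y = m*x + b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n   # same as mean(y) - m * mean(x)
    return m, b


# Predictor: attendance on day. Response: previous-week attendance.
# This ordering is the one that matches the summary output's coefficients.
m, b = least_squares(on_day, previous_week)
print("slope m =", round(m, 6), " intercept b =", round(b, 3))

# Fitting the other way round gives a different line; the product of the
# two slopes equals R square.
m_alt, b_alt = least_squares(previous_week, on_day)
print("reverse fit: slope =", round(m_alt, 6), " intercept =", round(b_alt, 3))
```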
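The formula for the slope m in calculating regression is shown below:
\[
m = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}
\]
The intercept then follows from the means of the two variables as \( b = \bar{y} - m\bar{x} \).
Conclusion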
In the end, the author can determine from the calculations of mean, standard deviation, scatter plot, correlation and regression that there is a positive relationship between the variables. The mean attendance on the same day of the previous week (x) is 6353.214, which is slightly less than the mean attendance on the day (y) of 6416.2857. The standard deviation of the attendance on the day (y) is 1697.0849, which is greater than that of the previous week (x), 1161.114. The correlation derived from the summary output is 0.57, which indicates a positive relationship of only weak-to-moderate strength. R square is only 0.326, which shows that the regression line is a weak fit to the data. The P-value of 0.0329 is less than 0.05, which indicates that the relationship between the two variables is statistically significant even though it is not strong.
Overall, the results show that the relationship between the two variables is weak but positive.