Tuesday, 8 October 2013

SAHILA MAHAJAN
GROUP-6
ROLL NO-37

T-test for Comparing Group Means



The t-test is used to determine whether the mean of a group differs significantly from a known value or from the mean of another group. There are three types of t-test:

1) One sample t-test                                                                                                 
2) Independent sample t-test                                                                                    
3) Paired samples t-test

1.     One Sample t-test:
A one sample t-test means you have ONE GROUP (e.g., a class of 8th grade students) whose mean you are comparing to A KNOWN MEAN SCORE (say, the national mean on a normed test).

2.     Independent Sample t-test:
An independent sample t-test means that you have TWO GROUPS with no members in common (e.g., this year's class of 8th grade students compared to LAST YEAR'S group of students).

3.     Paired Sample t-test:
A paired sample t-test means that you have TWO GROUPS that you are comparing against one another, but each member of one group is related in some way to a specific member of the other group (e.g., study partners, siblings, married couples, etc.).
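To make the three cases concrete, here is a minimal Python sketch that computes the t-statistic and degrees of freedom for each. The function names and the scores are hypothetical, made up purely for illustration, and the independent-sample version assumes the two groups have equal variances (the pooled form of the test).

```python
import math
import statistics


def one_sample_t(sample, known_mean):
    """t-statistic and degrees of freedom for one group vs. a known mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - known_mean) / standard_error, n - 1


def independent_t(group_a, group_b):
    """Pooled-variance t-statistic for two unrelated groups (assumes equal variances)."""
    na, nb = len(group_a), len(group_b)
    pooled_variance = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_variance * (1 / na + 1 / nb))
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    return difference / standard_error, na + nb - 2


def paired_t(first, second):
    """Paired t-statistic: a one-sample test on the pairwise differences."""
    differences = [a - b for a, b in zip(first, second)]
    return one_sample_t(differences, 0.0)


# Hypothetical test scores, for illustration only.
this_year = [72, 85, 78, 90, 66, 81]
last_year = [70, 79, 74, 88, 69, 75]

print("one sample :", one_sample_t(this_year, known_mean=75))
print("independent:", independent_t(this_year, last_year))
print("paired     :", paired_t(this_year, last_year))
```

The paired version only makes sense when the two lists line up member by member (the first score in one list belongs with the first score in the other), which is exactly the "related in some way" condition described above.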

Unrelated groups
Unrelated groups, also called unpaired groups or independent groups, are groups in which the cases in each group are different. Often we are investigating differences in individuals, which means that when comparing two groups, an individual in one group cannot also be a member of the other group and vice versa. An example would be gender - an individual would have to be classified as either male or female - not both.  

T - TEST (Group 6, Rushi Kapadia - 2013036)


 

 

Definition of 'T-Test'
 
A t-test is a statistical examination of two population means. A two-sample t-test examines whether two samples are different and is commonly used when the variances of two normal distributions are unknown and when an experiment uses a small sample size. For example, a t-test could be used to compare the average floor routine score of the U.S. women's Olympic gymnastics team to the average floor routine score of China's women's team.
The test statistic in the t-test is known as the t-statistic. The t-test looks at the t-statistic, the t-distribution and the degrees of freedom to determine a p-value (probability) that can be used to decide whether the population means differ. The t-test is one of a number of hypothesis tests. To compare the means of three or more groups, statisticians use an analysis of variance (ANOVA). If the sample size is large or the population variance is known, they can use a z-test. Other hypothesis tests include the chi-square test and the F-test.

 

Example


Let’s say you’re interested in whether the average New Yorker spends more than the average Kansan per month on movies.

You ask a sample of 3 people from each state about their movie spending. You might observe a difference in those averages (like $14 for the average Kansan and $18 for the average New Yorker). But with only 3 people per group, that difference is unlikely to be statistically significant; it could easily be the luck of which 3 people you happened to sample that makes one group appear to spend more money than the other. If instead you ask 300 New Yorkers and 300 Kansans and still see a big difference, that difference is less likely to be caused by the sample being unrepresentative.

Note that if you asked 300,000 New Yorkers and 300,000 Kansans, even a very small difference between the groups could turn out to be statistically significant. The effect size complements statistical significance: it describes the magnitude of the difference, whether or not the difference is statistically significant.
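The link between sample size, significance and effect size can be sketched in a few lines of Python. This is only an illustration: the $4 difference comes from the example above, but the $10 standard deviation is an assumption (the example does not give one), and both groups are assumed to be the same size with the same spread.

```python
import math


def two_sample_t(mean_difference, sd, n):
    """t-statistic for two equal-sized groups that share standard deviation sd."""
    standard_error = sd * math.sqrt(2.0 / n)
    return mean_difference / standard_error


def cohens_d(mean_difference, sd):
    """Effect size: the difference expressed in standard deviations."""
    return mean_difference / sd


MEAN_DIFFERENCE = 4.0  # $18 - $14, from the example
ASSUMED_SD = 10.0      # hypothetical spread of monthly movie spending

for n in (3, 300, 300_000):
    t = two_sample_t(MEAN_DIFFERENCE, ASSUMED_SD, n)
    d = cohens_d(MEAN_DIFFERENCE, ASSUMED_SD)
    print(f"n per group = {n:>7}   t = {t:8.2f}   effect size d = {d:.2f}")
```

The t-statistic grows with the square root of the sample size, while the effect size stays the same however many people are asked.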




Monday, 7 October 2013

COUNTIFS function - Saurabh Gopal Agrawal 2013038

Saurabh Gopal Agrawal
2013038
Group - 6


COUNTIFS function
This article describes the formula syntax and usage of the COUNTIFS function in Microsoft Office Excel.
Description
Applies criteria to cells across multiple ranges and counts the number of times all criteria are met.
Syntax
COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2]…)

The COUNTIFS function syntax has the following arguments:
·         criteria_range1    Required. The first range in which to evaluate the associated criteria.
·         criteria1    Required. The criteria in the form of a number, expression, cell reference, or text that define which cells will be counted. For example, criteria can be expressed as 32, ">32", B4, "apples", or "32".
·         criteria_range2, criteria2, ...    Optional. Additional ranges and their associated criteria. Up to 127 range/criteria pairs are allowed.


 IMPORTANT   Each additional range must have the same number of rows and columns as the criteria_range1 argument. The ranges do not have to be adjacent to each other.

Remarks
·         Each range's criteria is applied one cell at a time. If all of the first cells meet their associated criteria, the count increases by 1. If all of the second cells meet their associated criteria, the count increases by 1 again, and so on until all of the cells are evaluated.
·         If the criteria argument is a reference to an empty cell, the COUNTIFS function treats the empty cell as a 0 value.
·         You can use the wildcard characters, the question mark (?) and asterisk (*), in criteria. A question mark matches any single character, and an asterisk matches any sequence of characters. If you want to find an actual question mark or asterisk, type a tilde (~) before the character.
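The wildcard rules in the last remark can be imitated in Python by translating a criteria string into a regular expression. This is a rough sketch of the described behaviour, not Excel's actual implementation; the function names are made up for illustration, and the match is made case-insensitive on the assumption that text criteria ignore case.

```python
import re


def wildcard_to_regex(criteria):
    """Translate ?, * and the ~? / ~* escapes into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(criteria):
        ch = criteria[i]
        if ch == "~" and i + 1 < len(criteria) and criteria[i + 1] in "?*":
            parts.append(re.escape(criteria[i + 1]))  # literal ? or *
            i += 2
            continue
        if ch == "?":
            parts.append(".")    # any single character
        elif ch == "*":
            parts.append(".*")   # any sequence of characters
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def matches(criteria, text):
    """True if the whole text matches the wildcard criteria."""
    return wildcard_to_regex(criteria).fullmatch(text) is not None


print(matches("a?ples", "apples"))   # ? stands for exactly one character
print(matches("app*", "apples"))     # * stands for any run of characters
print(matches("why~?", "why?"))      # ~? looks for a real question mark
```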


Example 1
The example may be easier to understand if you copy it to a blank worksheet.

     A              B                        C                        D
1    Sales Person   Exceeded Widgets Quota   Exceeded Gadgets Quota   Exceeded Doodads Quota
2    Davidoski      Yes                      No                       No
3    Burke          Yes                      Yes                      No
4    Sundaram       Yes                      Yes                      Yes
5    Levitan        No                       Yes                      Yes

Formula: =COUNTIFS(B2:D2,"=Yes")
Description: Counts how many times Davidoski exceeded a sales quota for Widgets, Gadgets, and Doodads.
Result: 1

Formula: =COUNTIFS(B2:B5,"=Yes",C2:C5,"=Yes")
Description: Counts how many sales people exceeded both their Widgets and Gadgets Quota.
Result: 2

Formula: =COUNTIFS(B5:D5,"=Yes",B3:D3,"=Yes")
Description: Counts how many times Levitan and Burke exceeded the same quota for Widgets, Gadgets, and Doodads.
Result: 1
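The three formulas in Example 1 can be mirrored in Python to see the "one cell at a time" rule at work. The countifs helper below is a hypothetical stand-in written for this post, not an Excel or library function: it takes ranges and tests in alternating order and counts the positions where every test passes.

```python
def countifs(*ranges_and_tests):
    """Count the positions at which every range's cell passes its paired test."""
    ranges = ranges_and_tests[0::2]
    tests = ranges_and_tests[1::2]
    if len(ranges) != len(tests):
        raise ValueError("each range needs exactly one test")
    if len({len(r) for r in ranges}) != 1:
        raise ValueError("all ranges must have the same size")
    return sum(
        all(test(cell) for test, cell in zip(tests, cells))
        for cells in zip(*ranges)
    )


def is_yes(cell):
    return cell == "Yes"


# Rows 2 to 5 of the worksheet in Example 1 (columns B, C, D).
quotas = {
    "Davidoski": ["Yes", "No", "No"],
    "Burke": ["Yes", "Yes", "No"],
    "Sundaram": ["Yes", "Yes", "Yes"],
    "Levitan": ["No", "Yes", "Yes"],
}
widgets = [row[0] for row in quotas.values()]  # B2:B5
gadgets = [row[1] for row in quotas.values()]  # C2:C5

print(countifs(quotas["Davidoski"], is_yes))                       # =COUNTIFS(B2:D2,"=Yes")
print(countifs(widgets, is_yes, gadgets, is_yes))                  # =COUNTIFS(B2:B5,"=Yes",C2:C5,"=Yes")
print(countifs(quotas["Levitan"], is_yes, quotas["Burke"], is_yes))  # =COUNTIFS(B5:D5,"=Yes",B3:D3,"=Yes")
```

The size check corresponds to the requirement that every additional range has the same number of rows and columns as the first one.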





Example 2
The example may be easier to understand if you copy it to a blank worksheet.




     A      B
1    Data   Data
2    1      5/1/2008
3    2      5/2/2008
4    3      5/3/2008
5    4      5/4/2008
6    5      5/5/2008
7    6      5/6/2008

Formula: =COUNTIFS(A2:A7,"<6",A2:A7,">1")
Description: Counts how many numbers between 1 and 6 (not including 1 and 6) are contained in cells A2 through A7.
Result: 4

Formula: =COUNTIFS(A2:A7, "<5",B2:B7,"<5/3/2008")
Description: Counts how many rows have numbers that are less than 5 in cells A2 through A7, and also have dates that are earlier than 5/3/2008 in cells B2 through B7.
Result: 2

Formula: =COUNTIFS(A2:A7, "<" & A6,B2:B7,"<" & B4)
Description: Same description as the previous example, but using cell references instead of constants in the criteria.
Result: 2
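For comparison, the same three counts in Example 2 can be written in Python with zip, which pairs up the cells row by row. This is only a sketch; it assumes the dates are in month/day/year order (so 5/3/2008 is 3 May 2008), and the variable names are made up for illustration.

```python
from datetime import date

numbers = [1, 2, 3, 4, 5, 6]                       # A2:A7
dates = [date(2008, 5, day) for day in range(1, 7)]  # B2:B7, assumed month/day/year

# =COUNTIFS(A2:A7,"<6",A2:A7,">1"): two criteria on the same range
between = sum(1 for n in numbers if n < 6 and n > 1)

# =COUNTIFS(A2:A7,"<5",B2:B7,"<5/3/2008"): both criteria must hold on the same row
both = sum(1 for n, d in zip(numbers, dates) if n < 5 and d < date(2008, 5, 3))

# =COUNTIFS(A2:A7,"<" & A6,B2:B7,"<" & B4): criteria built from other cells
a6 = numbers[4]  # cell A6 is the fifth data row
b4 = dates[2]    # cell B4 is the third data row
by_reference = sum(1 for n, d in zip(numbers, dates) if n < a6 and d < b4)

print(between, both, by_reference)
```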

Wednesday, 18 September 2013

BENFORD'S LAW - Saurabh Gopal Agrawal 2013038

Saurabh Gopal Agrawal
2013038
Group - 6



BENFORD'S LAW:
                               
Benford’s Law is one of those mathematical laws that seems to defy common sense but works for most naturally occurring number sets.
It says that in most groups of naturally occurring numbers, numbers with the leading digit 1 occur more often than numbers with the leading digit 2, and so on down to numbers starting with 9, which occur least often.
BENFORD’S LAW IN EXCEL:
Firstly, create a column containing only the leading digits, using the LEFT() function. Despite what Excel documentation sometimes says, LEFT() works with numbers (not just text) and will ignore any currency symbol if defined in the cell formatting. For Benford’s Law use LEFT(<value>,1).
Then use COUNTIF to count the instances of each leading digit from 1 to 9, e.g. COUNTIF(<leading digit range>,"1"). Remember that LEFT() returns a string/text value, so the COUNTIF comparison is "1", not the digit 1.
To get the proportion that Benford’s Law predicts for each leading digit, for comparison with the observed counts, use the formula =LOG10(1+1/<leading digit>).
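The same steps can be tried outside Excel. The Python sketch below is only an illustration: it uses the first 100 powers of 2 as stand-in data (a sequence known to follow Benford's Law closely), and the function names are made up for this post.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Proportion of values expected to start with this digit: log10(1 + 1/d)."""
    return math.log10(1 + 1 / digit)


def leading_digit(number):
    """First digit of a non-zero whole number, like LEFT(<value>,1) in Excel."""
    return int(str(abs(number))[0])


# Stand-in data for illustration: 2^0, 2^1, ..., 2^99.
values = [2 ** k for k in range(100)]
counts = Counter(leading_digit(v) for v in values)

for digit in range(1, 10):
    observed = counts[digit] / len(values)
    expected = benford_expected(digit)
    print(f"{digit}: observed {observed:.3f}   expected {expected:.3f}")
```

Real data would replace the values list; the comparison between the observed and expected columns is what an auditor would look at.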

 EXAMPLE WORKSHEET:


The leading digit values are shown in a separate column.

USE WITH CARE:

In the real world, Benford’s Law is often applied to check if data has been tampered with or outright made up. If someone has faked data or tinkered with the numbers that will affect the Benford’s Law distribution. This makes it a useful tool for auditors or others checking for fraudulent data.
But Benford’s Law needs to be used with care because not all data sets are distributed evenly or widely enough.
An example that would NOT work with Benford’s Law is a list of petty cash receipts, because the petty cash limit might be, say, $40, so most of the amounts will have leading digits between 1 and 3 only, and probably many will be just under the $40 limit. Similarly, a list of large check approvals would not work, because of the arbitrary definition of ‘large’ in any organization. However, if you had a list of all outgoings from small to large, Benford’s Law might apply.
A series of adult human heights or weights also doesn’t obey Benford’s Law, because most people are within a narrow range of heights or weights (i.e. you won’t have adults weighing 10lb or 20kg). Telephone numbers won’t work because they are issued in arbitrary prefixes or blocks. On the other hand, a large list of street numbers from an address list probably will obey Benford’s Law.
In short, Benford’s Law is a useful tool for checking data, but it needs to be used with care and with an understanding of the data source. Large-scale numbers without arbitrary limits work best. The history of Benford’s Law is littered with people who falsely claimed fraud based on a mistaken understanding of the data source.




YESHA SHAH
2013039
GROUP 6

Today, we played a card game in our group. The game we played is Kruskal’s Count.

Kruskal’s Count

This trick may be performed for one individual or for a whole audience, and involves the spectators counting through a pack of cards until they reach a final chosen card. Yet, despite this seemingly random choice of cards, the magician is still able to predict the spectator’s chosen card. The trick is known as ‘Kruskal’s Count’; it was invented by the American mathematician and physicist Martin Kruskal and described by Martin Gardner. Although this trick will not work every time, the probability of success is around 85%.

The Trick
A spectator is invited to shuffle a pack of cards as many times as they like. The spectator is then asked to secretly pick a number between 1 and 10 and to count along as cards from the deck are displayed. The magician may choose to display the cards one at a time, or he may choose to display all 52 cards together. The magician explains that the card in the position of the spectator’s secret number becomes the spectator’s first chosen card. The spectator is then told to use the value of that chosen card as his new number, and to repeat the process until the magician runs out of cards. Here, aces are worth 1; Jack, Queen, King are worth 5; and all other cards take their face value.
Yet, despite this seemingly random path through a shuffled pack of cards, the magician is able to predict the spectator’s last chosen card.

The Secret
How is this done?

Well, unknown to the spectator, the magician also picks an initial number between 1 and 10, and proceeds to go through the same process. And although the magician may not have picked the same number as the spectator, there is a high probability they will land on the same final card. This is because, even though the magician and the spectator begin on different paths, there will come a point, simply by coincidence, when the two players land on the same card. And from that point on the two paths will become synchronized, meaning both players end on the same final card.

If we assume the initial numbers are equally likely to be chosen, then the probability of success is 84%. And we can increase that chance slightly, to 85%, if the magician chooses 1 as his initial number.
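These figures can be checked roughly by simulation. The Python sketch below is not from the source; it plays the trick many times with the card values given in "The Trick" (aces 1, picture cards 5) and counts how often the spectator and the magician finish on the same card. The function names, the number of trials and the random seed are arbitrary choices.

```python
import random

# Values of one suit: ace = 1, 2 to 10 at face value, Jack/Queen/King = 5.
SUIT_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5, 5, 5]
DECK_VALUES = SUIT_VALUES * 4  # 52 cards


def final_position(deck, start):
    """Index of the last card reached when counting begins at the start-th card."""
    position = start - 1
    # Keep stepping by the current card's value while the step stays inside the deck.
    while position + deck[position] < len(deck):
        position += deck[position]
    return position


def estimate_success(trials, magician_start=None, seed=0):
    """Fraction of shuffled decks on which both players end on the same card."""
    rng = random.Random(seed)
    deck = list(DECK_VALUES)
    hits = 0
    for _ in range(trials):
        rng.shuffle(deck)
        spectator = rng.randint(1, 10)
        magician = magician_start if magician_start is not None else rng.randint(1, 10)
        if final_position(deck, spectator) == final_position(deck, magician):
            hits += 1
    return hits / trials


print("magician picks at random:", estimate_success(20_000))
print("magician always picks 1 :", estimate_success(20_000, magician_start=1))
```

Starting on the first card gives the magician the longest possible path through the pack, and so the most chances to land on one of the spectator's cards.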

Furthermore, if N is the number of cards and x is the mean card value, then the probability of success may be approximated with the simple formula
P ≈ 1 − ((x² − 1)/x²)^N
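As a quick numerical check, the Python sketch below evaluates the approximation for a standard pack. It assumes the approximation has the form 1 − ((x² − 1)/x²)^N and uses the card values from "The Trick" (picture cards worth 5), so the mean value x is 70/13.

```python
# Values of one suit: ace = 1, 2 to 10 at face value, Jack/Queen/King = 5.
SUIT_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5, 5, 5]


def approximate_success(num_cards, mean_value):
    """Assumed approximation: 1 - ((x^2 - 1) / x^2) ** N."""
    x_squared = mean_value ** 2
    return 1 - ((x_squared - 1) / x_squared) ** num_cards


mean_card_value = sum(SUIT_VALUES) / len(SUIT_VALUES)  # 70 / 13
probability = approximate_success(52, mean_card_value)
print(f"x = {mean_card_value:.3f}, approximate probability = {probability:.3f}")
```

With these values the result comes out at about 0.84, in line with the 84% quoted earlier.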
This is an illustration of the game, in which two different situations are shown. In the first situation the player starts with the number 1 in mind; this path is marked with yellow dots. In the second situation the player starts with the number 7 in mind; this path is marked with blue dots. In both situations the final card is the 8 of hearts, the kind of match that happens with about 85% probability.


Source: http://www.singingbanana.com