Data mining

1. Which of the following are suitable null hypotheses? If not, explain why.

a. Comparing two groups

Consider comparing the average blood pressure of a group of subjects, both before and after they are placed on a low salt diet. In this case, the null hypothesis is that a low salt diet does reduce blood pressure, i.e., that the average blood pressure of the subjects is the same before and after the change in diet.
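
As a minimal sketch of how a "no change" null hypothesis of this kind can be tested, the Python code below runs an exact two-sided sign-flip permutation test on paired before/after readings. The blood pressure values and the function name `paired_sign_flip_test` are hypothetical. The permutation test is used here in place of the more usual paired t-test only so that the example needs nothing beyond the standard library.

```python
import itertools
import statistics


def paired_sign_flip_test(before, after):
    """Exact two-sided sign-flip test of H0: mean(after - before) == 0.

    Under H0 each paired difference is as likely to be positive as negative,
    so all 2**n sign assignments are equally likely. Enumerating them is only
    practical for small n (here n = 10, i.e., 1024 assignments).
    """
    diffs = [a - b for b, a in zip(before, after)]
    observed = abs(statistics.fmean(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = abs(statistics.fmean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance guards float round-off
            extreme += 1
        total += 1
    return statistics.fmean(diffs), extreme / total


# Hypothetical systolic readings (mmHg) for 10 subjects.
before = [150, 142, 138, 160, 155, 147, 152, 149, 158, 144]
after = [144, 139, 136, 151, 150, 145, 146, 147, 150, 141]
mean_change, p_value = paired_sign_flip_test(before, after)
print(f"mean change = {mean_change:.1f} mmHg, p = {p_value:.4f}")
```

Because every hypothetical difference is negative, only the two assignments in which all signs agree are as extreme as the observed mean change of -4.6 mmHg, so p = 2/1024.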

b. Classification

Assume there are two classes, labeled + and -, where we are most interested in the positive class, e.g., the presence of a disease. H0 is the statement that the class of an object is negative, i.e., that the patient does not have the disease.
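
With H0 read this way, predicting + amounts to rejecting H0, so a false positive is a Type I error and the true positive rate plays the role of power. The sketch below computes both rates for a score-based classifier; the function name `type_i_rate_and_power`, the scores, the labels, and the threshold of 0.5 are all hypothetical.

```python
def type_i_rate_and_power(scores, labels, threshold):
    """Treat predicting '+' (score >= threshold) as rejecting H0: 'object is -'.

    Returns (type I error rate, power):
      - type I error rate = false positives / actual negatives
      - power             = true positives  / actual positives
    """
    rejected = [s >= threshold for s in scores]
    false_pos = sum(r and y == "-" for r, y in zip(rejected, labels))
    true_pos = sum(r and y == "+" for r, y in zip(rejected, labels))
    return false_pos / labels.count("-"), true_pos / labels.count("+")


# Hypothetical classifier scores and true labels.
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20, 0.90, 0.55]
labels = ["-", "-", "-", "+", "+", "-", "+", "-"]
alpha_hat, power = type_i_rate_and_power(scores, labels, threshold=0.5)
print(alpha_hat, power)  # 0.2 1.0
```

Raising the threshold can only lower (or keep) the Type I error rate, at the cost of power.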

c. Association Analysis

For frequent patterns, the null hypothesis is that the items are independent and thus, any pattern that we detect is spurious.
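
One way to make an independence null concrete for a pair of items is to compare the observed co-occurrence count with the count expected if the items were independent. The sketch below computes lift and a Pearson chi-square statistic on the 2x2 contingency table, taking the p-value from the chi-square distribution with one degree of freedom. The item names, basket counts, and the function name `pair_independence_test` are hypothetical, and the code assumes neither item appears in all or none of the transactions (otherwise an expected count is zero).

```python
import math


def pair_independence_test(transactions, a, b):
    """Compare the co-occurrence of items a and b with independence (H0).

    Returns (lift, chi2, p), where p is the chi-square (1 dof) tail probability.
    Assumes 0 < count(a) < n and 0 < count(b) < n, so no expected count is 0.
    """
    n = len(transactions)
    n_a = sum(a in t for t in transactions)
    n_b = sum(b in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    # 2x2 contingency table: rows = a present/absent, columns = b present/absent.
    observed = [[n_ab, n_a - n_ab], [n_b - n_ab, n - n_a - n_b + n_ab]]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 dof > chi2)
    lift = (n_ab / n) / ((n_a / n) * (n_b / n))
    return lift, chi2, p


# Hypothetical baskets: A and B co-occur more often than independence predicts.
baskets = [{"A", "B"}] * 40 + [{"A"}] * 10 + [{"B"}] * 10 + [set()] * 40
lift, chi2, p = pair_independence_test(baskets, "A", "B")
print(f"lift = {lift:.2f}, chi2 = {chi2:.1f}, p = {p:.2g}")
```

With these counts every expected cell count is 25, giving chi2 = 36 and lift = 1.6. If A and B instead co-occurred in 25 of the 100 baskets (each still appearing in 50), the statistic would be 0 and both lift and p would be 1, which is what H0 predicts.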

d. Clustering

The null hypothesis is that there is cluster structure in the data beyond what might occur at random.
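
Claims about cluster structure are usually judged against data that has none. One such check is the Hopkins statistic, which compares nearest-neighbor distances for sampled data points with those for points generated uniformly over the same region. This sketch uses the form H = Σw / (Σu + Σw), under which H is near 0.5 for uniformly spread data and near 0 for highly clustered data; some references swap u and w, so that clustered data gives H near 1. The function name `hopkins`, the sample size of 30, and the synthetic data are assumptions for illustration.

```python
import math
import random


def hopkins(points, p, rng):
    """Hopkins statistic H = sum(w) / (sum(u) + sum(w)) for 2-D points.

    u_i: distance from a point drawn uniformly in the data's bounding box
         to its nearest data point.
    w_i: distance from a randomly sampled data point to its nearest *other*
         data point.
    Uses brute-force nearest neighbors, so it only suits small data sets.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def nearest(q, skip=None):
        return min(math.dist(q, pt) for k, pt in enumerate(points) if k != skip)

    u = [nearest((rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))))
         for _ in range(p)]
    w = [nearest(points[k], skip=k) for k in rng.sample(range(len(points)), p)]
    return sum(w) / (sum(u) + sum(w))


rng = random.Random(0)
# Synthetic data: two tight blobs versus points spread uniformly on the unit square.
clustered = ([(rng.gauss(0.2, 0.02), rng.gauss(0.2, 0.02)) for _ in range(50)]
             + [(rng.gauss(0.8, 0.02), rng.gauss(0.8, 0.02)) for _ in range(50)])
uniform = [(rng.random(), rng.random()) for _ in range(200)]
h_clustered = hopkins(clustered, 30, rng)
h_uniform = hopkins(uniform, 30, rng)
print(f"clustered H = {h_clustered:.3f}, uniform H = {h_uniform:.3f}")
```

Under this convention the two-blob data should give H well below 0.5, while the uniform data should give a value close to 0.5.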

e. Anomaly Detection

Our assumption, H0, is that an object is not anomalous.
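
Under this H0, an object is flagged only when its value would be very unlikely if it were not anomalous. The sketch below assumes a hypothetical reference sample of non-anomalous objects that is roughly normally distributed, estimates its mean and standard deviation, and rejects H0 for a new object when its two-sided tail probability falls below an assumed α = 0.01. The function name `flag_anomalies` and all values are illustrative.

```python
from statistics import NormalDist, fmean, stdev


def flag_anomalies(reference, candidates, alpha=0.01):
    """Test H0 'x is not anomalous' for each candidate x.

    Assumes the reference values are non-anomalous and roughly normal. Returns
    (x, p, rejected) tuples, where p is the two-sided normal tail probability.
    """
    mu, sigma = fmean(reference), stdev(reference)
    std_normal = NormalDist()
    results = []
    for x in candidates:
        z = (x - mu) / sigma
        p = 2 * std_normal.cdf(-abs(z))
        results.append((x, p, p < alpha))
    return results


# Hypothetical readings believed to be normal, plus two new objects to test.
reference = [10, 11, 9, 10, 12, 10, 11, 9, 10, 8]
results = flag_anomalies(reference, [10.5, 30])
for x, p, rejected in results:
    print(x, f"p = {p:.3g}", "anomalous" if rejected else "not anomalous")
```

Here 10.5 lies about 0.43 standard deviations from the reference mean of 10 and is not flagged, while 30 lies more than 17 standard deviations away and H0 is rejected. Estimating μ and σ from a separate reference sample, rather than from data that may contain the anomalies, is a deliberate choice: an extreme value inflates the standard deviation and can mask itself.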

2. Consider the different combinations of effect size and p-value applied to an experiment where we want to determine the efficacy of a new drug.

(i) effect size small, p-value small

(ii) effect size small, p-value large

(iii) effect size large, p-value small

(iv) effect size large, p-value large

Whether an effect size counts as small or large depends on the domain, which in this case is medical. For this problem, consider a small p-value to be less than 0.001 and a large p-value to be above 0.05. Assume that the sample size is relatively large, e.g., thousands of patients with the condition that the drug is intended to treat.
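
To see how sample size interacts with effect size, the sketch below converts a standardized effect size (Cohen's d) and a per-group sample size into a two-sided p-value. It assumes two equal-sized groups, a known common standard deviation, and a normal approximation (a z-test rather than a t-test); the function name `p_value_for_effect` and the particular values of d and n are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_effect(d, n_per_group):
    """Two-sided z-test p-value for a mean difference of d standard deviations.

    With a known common SD and n subjects per group, the standard error of the
    difference in means, measured in SD units, is sqrt(2 / n).
    """
    z = d / sqrt(2 / n_per_group)
    return 2 * NormalDist().cdf(-abs(z))


for d in (0.1, 0.8):          # small and large standardized effects
    for n in (5000, 10):      # large and small groups
        print(f"d = {d}, n = {n}: p = {p_value_for_effect(d, n):.3g}")
```

Under these assumptions, d = 0.1 with 5000 subjects per group gives z = 5 and a p-value below 0.001, while d = 0.8 with only 10 per group gives z ≈ 1.79 and a p-value above 0.05: a large effect need not reach significance when the sample is small.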

(a) Which combination(s) would very likely be of interest?

(b) Which combination(s) would very likely not be of interest?

(c) If the sample size were small, would that change your answers?

Reference:

Please use the video at the URL below to answer the two questions above.

https://s3.us-east-1.amazonaws.com/blackboard.learn.xythos.prod/5a31b16bb2c48/5454504?response-content-disposition=inline%3B%20filename%2A%3DUTF-8%27%27Week15%2520-%2520Ch10%2520-%2520ITS632%2520-%2520Summer%25202019.mp4&response-content-type=video%2Fmp4&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20190812T173619Z&X-Amz-SignedHeaders=host&X-Amz-Expires=21600&X-Amz-Credential=AKIAIL7WQYDOOHAZJGWQ%2F20190812%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=d8e262471311769a00dd5b4c26fcbfda32fad505ccd6527d78f2ec49e2738942