1. What’s an attribute? What’s a data instance?

2. What’s noise? How can noise be reduced in a dataset?

3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.

4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.

5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.

6. Give 2 examples in which aggregation is useful.

7. Given a sample dataset, apply aggregation of data values.

8. What’s sampling?

9. What’s simple random sampling? Is it possible to sample data instances using a distribution different from the uniform distribution? If so, give an example of a probability distribution of the data instances that is different from uniform (i.e., equal probability).

10. What’s stratified sampling?

11. What’s “the curse of dimensionality”?

12. Provide a brief description of what Principal Components Analysis (PCA) does. [Hint: See Appendix A and your lecture notes.] State what the input and the output of PCA are.

13. What’s the difference between dimensionality reduction and feature selection?

14. Describe in detail 2 different techniques for feature selection.

15. Given a sample dataset (represented by a set of attributes, a correlation matrix, a covariance matrix, …), apply feature selection techniques to select the best attributes to keep (or equivalently, the best attributes to remove).

16. What’s the difference between feature selection and feature extraction?

17. Give two examples of data in which feature extraction would be useful.

18. Given a sample dataset, apply feature extraction.

19. What’s data discretization and when is it needed?

20. What’s the difference between supervised and unsupervised discretization?

21. Given a sample dataset, apply unsupervised (e.g., equal width, equal frequency) discretization, or supervised discretization (e.g., using entropy).

22. Describe 2 approaches to handle nominal attributes with too many values.

23. Given a dataset, apply variable transformation: either a simple given function, normalization, or standardization.

24. Define correlation and covariance, and explain how to use them in data pre-processing.
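Several of the items above ask you to apply a technique by hand (unsupervised discretization in item 21, normalization and standardization in item 23, covariance and correlation in item 24). The following is a minimal Python sketch for checking such hand calculations, not a reference solution. The function names and the `ages` values are made up for illustration, and the sketch assumes numeric, non-constant attributes with no missing values.

```python
import math

def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign bin indices so that each bin holds roughly len(values) / k items.

    Note: tied values may end up in different bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Convert values to z-scores (mean 0), using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical attribute values, used only to exercise the functions.
ages = [18, 22, 25, 31, 40, 58]
print(equal_width_bins(ages, 3))
print(equal_frequency_bins(ages, 3))
print(min_max_normalize(ages))
print(standardize(ages))
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

Whether to divide by n or n - 1 in the standard deviation and covariance depends on the convention used in your course, so adjust the sketch to match your lecture notes.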
