
# machine learning coursera quiz answers

## machine learning coursera quiz answers all weeks

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI.

In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI.


## machine learning coursera quiz answers week 1

1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. What would be a reasonable choice for P?
•  The probability of it correctly predicting a future date’s weather.
•  The process of the algorithm examining a large amount of historical weather data.
•  None of these.

Answer: The probability of it correctly predicting a future date’s weather.

1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is T?
•  None of these.
•  The probability of it correctly predicting a future date’s weather.
•  The process of the algorithm examining a large amount of historical weather data.

Answer: None of these — the task T is the weather prediction task itself (the probability of correct prediction is the performance measure P, and examining the historical data is the experience E).

1. Suppose you are working on weather prediction, and use a learning algorithm to predict tomorrow’s temperature (in degrees Centigrade/Fahrenheit).
Would you treat this as a classification or a regression problem?

•  Regression
•  Classification

Answer: Regression (the temperature is a continuous value).

1. Suppose you are working on weather prediction, and your weather station makes one of three predictions for each day’s weather: Sunny, Cloudy or Rainy. You’d like to use a learning algorithm to predict tomorrow’s weather.
Would you treat this as a classification or a regression problem?

•  Regression
•  Classification

Answer: Classification (the output is one of three discrete categories).

1. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this.
Would you treat this as a classification or a regression problem?

•  Regression
•  Classification

Answer: Regression (the stock price is a continuous value).

1. Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy).
Would you treat this as a classification or a regression problem?

•  Regression
•  Classification

Answer: Classification (bankruptcy within 7 days is a yes/no outcome).

1. Suppose you are working on stock market prediction. Typically, tens of millions of shares of Microsoft stock are traded (i.e., bought/sold) each day. You would like to predict the number of Microsoft shares that will be traded tomorrow.
Would you treat this as a classification or a regression problem?

•  Regression
•  Classification

Answer: Regression (the number of shares traded is a continuous quantity).

1. Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
•  Given historical data of children’s ages and heights, predict children’s height as a function of their age.
•  Given 50 articles written by male authors, and 50 articles written by female authors, learn to predict the gender of a new manuscript’s author (when the identity of this author is unknown).
•  Take a collection of 1000 essays written on the US Economy, and find a way to automatically group these essays into a small number of groups of essays that are somehow “similar” or “related”.
•  Examine a large collection of emails that are known to be spam email, to discover if there are sub-types of spam mail.

Answer: the first two (predicting height from age, and predicting an author’s gender from labeled articles) are supervised learning; grouping essays and discovering sub-types of spam are unsupervised.

1. Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
•  Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of the treatment, side effects, etc.), discover whether there are different categories or “types” of patients in terms of how they respond to the drug, and if so what these categories are.
•  Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments.
•  Have a computer examine an audio clip of a piece of music, and classify whether or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and no vocals).
•  Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years.

Answer: the last two (classifying vocals vs. instruments, and predicting diabetes risk) are supervised learning; discovering patient categories or clusters are unsupervised problems.

## Coursera: Machine Learning (Week 1) Quiz – Linear Regression with One Variable

1. Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year. Specifically, let x be equal to the number of “A” grades (including A−, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).
Here each row is one training example. Recall that in linear regression, our hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$, and we use m to denote the number of training examples.

For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of $m$? In the box below, please enter your answer (which should be a number between 0 and 10).

4

1. Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, “kJ/mol” is the unit measuring the amount of energy released.

You would like to use linear regression ($h_\theta(x) = \theta_0 + \theta_1 x$) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for $\theta_0$ and $\theta_1$? You should be able to select the right answer without actually implementing linear regression.

1. For this question, assume that we are using the training set from Q1.
Recall our definition of the cost function was $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$.
What is $J(0,1)$? In the box below, please enter your answer.

0.5

1. Suppose we set $\theta_0 = 0$, $\theta_1 = 1.5$ in the linear regression hypothesis from Q1. What is $h_\theta(2)$?
3

1. Suppose we set $\theta_0 = -2$, $\theta_1 = 0.5$ in the linear regression hypothesis from Q1. What is $h_\theta(6)$?
1
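The answers above can be verified numerically. The training-set table did not survive extraction, so the dataset below is an assumption: it is the one commonly used with this quiz, chosen because it reproduces every stated answer (m = 4, J(0,1) = 0.5, h(2) = 3, h(6) = 1). A sketch in Python (the assignment code itself is in Octave):

```python
# Hypothetical reconstruction of the Q1 training set -- the table was lost,
# so these values are an assumption consistent with the stated answers.
x = [3, 1, 0, 4]  # first-year "A" grades (assumed)
y = [2, 2, 1, 3]  # second-year "A" grades (assumed)

def h(theta0, theta1, xi):
    """Linear-regression hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * xi

def J(theta0, theta1):
    """Cost J = (1/2m) * sum of squared errors over the training set."""
    m = len(x)
    return sum((h(theta0, theta1, xi) - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

print(len(x))         # m -> 4
print(J(0, 1))        # -> 0.5
print(h(0, 1.5, 2))   # -> 3.0
print(h(-2, 0.5, 6))  # -> 1.0
```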

1. Let $f$ be some function so that $f(\theta_0, \theta_1)$ outputs a number. For this problem, $f$ is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so $f$ may have local optima).
Suppose we use gradient descent to try to minimize $f(\theta_0, \theta_1)$ as a function of $\theta_0$ and $\theta_1$.
Which of the following statements are true? (Check all that apply.)

1. In the given figure, the cost function $J(\theta_0, \theta_1)$ has been plotted against $\theta_0$ and $\theta_1$, as shown in ‘Plot 2’. The contour plot for the same cost function is given in ‘Plot 1’. Based on the figure, choose the correct options (check all that apply).

## Coursera: Machine Learning (Week 1) Quiz – Linear Algebra | Andrew NG

1. Let u and v be 3-dimensional vectors, where specifically
$u = \begin{bmatrix} 4 \\ -4 \\ -3 \end{bmatrix}$
and
$v = \begin{bmatrix} 4 \\ 2 \\ 4 \end{bmatrix}$
what is $u^{T}v$?
(Hint: $u^{T}$ is a 1×3 dimensional matrix, and v can also be seen as a 3×1 matrix. The answer you want can be obtained by taking the matrix product of $u^{T}$ and $v$.) Do not add brackets to your answer.

-4
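The arithmetic: $u^{T}v = 4\cdot4 + (-4)\cdot2 + (-3)\cdot4 = 16 - 8 - 12 = -4$. A one-line check in Python:

```python
# u^T v for 3-vectors is just the dot product, computed in plain Python.
u = [4, -4, -3]
v = [4, 2, 4]
utv = sum(ui * vi for ui, vi in zip(u, v))
print(utv)  # -> -4
```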


## Machine learning coursera assignment answers week 2

### warmUpExercise.m :

```matlab
function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix

A = [];
% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix
%               In octave, we return values by defining which variables
%               represent the return values (at the top of the file)
%               and then set them accordingly.

A = eye(5);  % eye is a built-in function that creates an identity matrix

% ===========================================
end
```

### plotData.m :

```matlab
function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure
%   PLOTDATA(x,y) plots the data points and gives the figure axes labels of
%   population and profit.

figure; % open a new figure window

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the training data into a figure using the
%               "figure" and "plot" commands. Set the axes labels using
%               the "xlabel" and "ylabel" commands. Assume the
%               population and revenue data have been passed in
%               as the x and y arguments of this function.
%
% Hint: You can use the 'rx' option with plot to have the markers
%       appear as red crosses. Furthermore, you can make the
%       markers larger by using plot(..., 'rx', 'MarkerSize', 10);

plot(x, y, 'rx', 'MarkerSize', 10); % Plot the data
ylabel('Profit in $10,000s'); % Set the y-axis label
xlabel('Population of City in 10,000s'); % Set the x-axis label

% ============================================================
end
```

### computeCost.m :

```matlab
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

%%%%%%%%%%%%% CORRECT (loop version) %%%%%%%%%
% h = X*theta;
% temp = 0;
% for i=1:m
%   temp = temp + (h(i) - y(i))^2;
% end
% J = (1/(2*m)) * temp;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%% CORRECT: Vectorized Implementation %%%%%%%%%
J = (1/(2*m))*sum(((X*theta)-y).^2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% =========================================================================
end
```

### gradientDescent.m :

```matlab
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
%               theta.
%
% Hint: While debugging, it can be useful to print out the values
%       of the cost function (computeCost) and gradient here.
%

%%%%%%%%% CORRECT (element-wise version) %%%%%%%
%error = (X * theta) - y;
%temp0 = theta(1) - ((alpha/m) * sum(error .* X(:,1)));
%temp1 = theta(2) - ((alpha/m) * sum(error .* X(:,2)));
%theta = [temp0; temp1];
%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%% CORRECT (per-column vectorized) %%%%%%%
%error = (X * theta) - y;
%temp0 = theta(1) - ((alpha/m) * X(:,1)'*error);
%temp1 = theta(2) - ((alpha/m) * X(:,2)'*error);
%theta = [temp0; temp1];
%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%% CORRECT (fully vectorized) %%%%%%%
error = (X * theta) - y;
theta = theta - ((alpha/m) * X'*error);
%%%%%%%%%%%%%%%%%%%%%%%%%

% ============================================================

% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);

end
end
```
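For readers without Octave, the fully vectorized update used above, theta := theta − (alpha/m)·Xᵀ(Xθ − y), can be sketched in Python/NumPy. The four-point dataset is made up for illustration (it satisfies y = 1 + 2x exactly, so gradient descent should recover θ ≈ [1, 2]):

```python
# NumPy sketch of the vectorized gradient-descent step in gradientDescent.m.
# The dataset is invented for illustration only.
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept column + one feature
y = np.array([1.0, 3.0, 5.0, 7.0])                              # y = 1 + 2x exactly
theta = np.zeros(2)
alpha, m = 0.1, len(y)

for _ in range(2000):
    error = X @ theta - y              # (X * theta) - y
    theta = theta - (alpha / m) * (X.T @ error)

print(np.round(theta, 3))  # should approach [1., 2.]
```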

### computeCostMulti.m :

```matlab
function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
%   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

J = (1/(2*m))*(sum(((X*theta)-y).^2));

% =========================================================================

end
```

### gradientDescentMulti.m :

```matlab
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(X, y, theta, alpha, num_iters) updates theta
%   by taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
%               theta.
%
% Hint: While debugging, it can be useful to print out the values
%       of the cost function (computeCostMulti) and gradient here.
%

%%%%%%%% CORRECT (vectorized) %%%%%%%%%%
error = (X * theta) - y;
theta = theta - ((alpha/m) * X'*error);
%%%%%%%%%%%%%%%%%%%%%%%%%%%

% ============================================================

% Save the cost J in every iteration
J_history(iter) = computeCostMulti(X, y, theta);

end
end
```


## Linear regression with multiple variables coursera quiz answers week 2

1. Suppose m=4 students have taken some classes, and the class had a midterm exam and a final exam. You have collected a dataset of their scores on the two exams, which is as follows:

You’d like to use polynomial regression to predict a student’s final exam score from their midterm exam score. Concretely, suppose you want to fit a model of the form $h_{\theta}(x) = \theta_{0} + \theta_{1} x_{1} + \theta_{2} x_{2}$, where $x_1$ is the midterm score and $x_2$ is (midterm score)². Further, you plan to use both feature scaling (dividing by the “max-min”, or range, of a feature) and mean normalization.
What is the normalized feature $x_2^{(4)}$? (Hint: midterm = 69, final = 78 is training example 4.) Please round off your answer to two decimal places and enter it in the text box below.

-0.47
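A numerical check of this answer. The score table was lost in extraction, so the midterm scores below are an assumption — the values commonly used with this quiz, consistent with the hint (example 4 has midterm 69) and with the stated answer:

```python
# Hypothetical reconstruction of the midterm scores -- the original table
# is missing; these assumed values reproduce the stated answer -0.47.
midterm = [89, 72, 94, 69]          # assumed midterm scores; example 4 is 69
x2 = [s ** 2 for s in midterm]      # x2 = (midterm score)^2

mean = sum(x2) / len(x2)            # mean normalization
rng = max(x2) - min(x2)             # "max - min" feature scaling
normalized = (x2[3] - mean) / rng   # normalized x2 for training example 4

print(round(normalized, 2))  # -> -0.47
```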

1. You run gradient descent for 15 iterations with $\alpha = 0.3$ and compute $J(\theta)$ after each iteration. You find that the value of $J(\theta)$ decreases slowly and is still decreasing after 15 iterations. Based on this, which of the following conclusions seems most plausible?

Answer: Rather than using the current value of $\alpha$, it would be more promising to try a larger value of $\alpha$ (say $\alpha = 1.0$).

1. You run gradient descent for 15 iterations with $\alpha = 0.3$ and compute $J(\theta)$ after each iteration. You find that the value of $J(\theta)$ decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible?

Answer: $\alpha = 0.3$ is an effective choice of learningning rate — $J(\theta)$ converging quickly is exactly the behavior we want.

1. Suppose you have m = 23 training examples with n = 5 features (excluding the additional all-ones feature for the intercept term, which you should add). The normal equation is $\theta = (X^{T} X)^{-1}X^{T}y$. For the given values of m and n, what are the dimensions of $\theta$, X, and y in this equation?
•  X is 23 × 5, y is 23 × 1, θ is 5 × 5
•  X is 23 × 6, y is 23 × 6, θ is 6 × 6
•  X is 23 × 6, y is 23 × 1, θ is 6 × 1
•  X is 23 × 5, y is 23 × 1, θ is 5 × 1

Answer: X is 23 × 6, y is 23 × 1, θ is 6 × 1. X has m rows and n+1 columns (+1 because of the $x_0 = 1$ term), y is an m-vector, and $\theta$ is an (n+1)-vector.
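As a sanity check on those dimensions, a NumPy sketch with random data — the numbers themselves are irrelevant, only the shapes matter:

```python
# Shape check for the normal equation theta = (X^T X)^{-1} X^T y with
# m = 23 examples and n = 5 features plus an all-ones intercept column.
import numpy as np

m, n = 23, 5
gen = np.random.default_rng(0)
X = np.hstack([np.ones((m, 1)), gen.standard_normal((m, n))])  # 23 x 6
y = gen.standard_normal((m, 1))                                # 23 x 1

theta = np.linalg.inv(X.T @ X) @ X.T @ y                       # 6 x 1

print(X.shape, y.shape, theta.shape)  # -> (23, 6) (23, 1) (6, 1)
```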

1. Suppose you have a dataset with m = 1,000,000 examples and n = 200,000 features for each example. You want to use multivariate linear regression to fit the parameters $\theta$ to the data. Should you prefer gradient descent or the normal equation?

•  The normal equation, since it provides an efficient way to directly find the solution.
•  The normal equation, since gradient descent might be unable to find the optimal θ.

Answer: Gradient descent. With n = 200,000 features, the normal equation requires inverting a 200,001 × 200,001 matrix, which is computationally expensive, so gradient descent is the better choice.

## Octave/ Matlab tutorial coursera quiz answers week 2

1. Suppose I first execute the following Octave/Matlab commands:
```matlab
A = [1 2; 3 4; 5 6];
B = [1 2 3; 4 5 6];
```

Which of the following are then valid commands? Check all that apply. (Hint: A’ denotes the transpose of A.)

•  C = A * B;
•  C = B’ + A;
•  C = A’ * B;
•  C = B + A;

Answer: C = A * B; and C = B’ + A; are valid. A is 3×2 and B is 2×3, so A*B is defined (3×3) and B’ is 3×2, the same size as A. A’ * B and B + A both have incompatible dimensions.

1. Let
$A = \begin{bmatrix} 16 & 2 & 3 & 13\\ 5 & 11 & 10 & 8\\ 9 & 7 & 6 & 12\\ 4 & 14 & 15 & 1 \end{bmatrix}$
Which of the following indexing expressions gives
$B = \begin{bmatrix} 16 & 2\\ 5 & 11\\ 9 & 7\\ 4 & 14 \end{bmatrix}?$
Check all that apply.

•  B = A(:, 1:2);
•  B = A(1:4, 1:2);
•  B = A(:, 0:2);
•  B = A(0:4, 0:2);

Answer: B = A(:, 1:2); and B = A(1:4, 1:2); — Octave/Matlab indexing is 1-based, so the expressions starting at index 0 are errors.

1. Let A be a 10×10 matrix and x be a 10-element vector. Your friend wants to compute the product Ax and writes the following code:
```matlab
v = zeros(10, 1);
for i = 1:10
  for j = 1:10
    v(i) = v(i) + A(i, j) * x(j);
  end
end
```

How would you vectorize this code to run without any for loops? Check all that apply.

•  v = A * x;
•  v = Ax;
•  v = x’ * A;
•  v = sum (A * x);

Answer: only v = A * x; — Ax is parsed as a (likely undefined) variable name, x’ * A is a 1×10 row vector computing $x^{T}A$ rather than $Ax$, and sum(A * x) collapses the product to a scalar.

1. Say you have two column vectors v and w, each with 7 elements (i.e., they have dimensions 7×1). Consider the following code:
```matlab
z = 0;
for i = 1:7
  z = z + v(i) * w(i)
end
```

Which of the following vectorizations correctly compute z? Check all that apply.

•  z = sum (v .* w);
•  z = w’ * v;
•  z = v * w’;
•  z = w * v’;

Answer: z = sum (v .* w); and z = w’ * v; — both compute the inner product $\sum_i v_i w_i$. In contrast, v * w’ and w * v’ are 7×7 outer products.
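The equivalence of the loop and the two correct vectorizations can be checked in NumPy (the vectors here are made up for illustration):

```python
# Three ways to compute the inner product of two 7-element vectors.
import numpy as np

v = np.arange(1.0, 8.0)        # [1, 2, ..., 7], made up for illustration
w = np.arange(7.0, 0.0, -1.0)  # [7, 6, ..., 1]

z_loop = sum(v[i] * w[i] for i in range(7))  # the for-loop version
z_sum = np.sum(v * w)                        # z = sum(v .* w)
z_mat = w @ v                                # z = w' * v

print(z_loop, z_sum, z_mat)  # all three print 84.0
```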

1. In Octave/Matlab, many functions work on single numbers, vectors, and matrices. For example, the sin function when applied to a matrix will return a new matrix with the sin of each element. But you have to be careful, as certain functions have different behavior. Suppose you have a 7×7 matrix X. You want to compute the log of every element, the square of every element, add 1 to every element, and divide every element by 4. You will store the results in four matrices, A, B, C, D. One way to do so is the following code:
```matlab
for i = 1:7
  for j = 1:7
    A(i, j) = log(X(i, j));
    B(i, j) = X(i, j) ^ 2;
    C(i, j) = X(i, j) + 1;
    D(i, j) = X(i, j) / 4;
  end
end
```

Which of the following correctly compute A, B, C or D? Check all that apply.

•  C = X + 1;
•  D = X / 4;
•  A = log (X);
•  B = X ^ 2;

Answer: C = X + 1;, D = X / 4;, and A = log (X); are correct. B = X ^ 2; is wrong because ^ is matrix power (X * X); the element-wise square is B = X .^ 2;.

## Machine learning coursera assignment week 3 answers

### plotData.m :

```matlab
function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.
%

% Separating positive and negative results
pos = find(y==1); % indices of positive results
neg = find(y==0); % indices of negative results

% Create New Figure
figure;

% Plotting positive results on
%    X_axis: Exam1 Score = X(pos,1)
%    Y_axis: Exam2 Score = X(pos,2)
plot(X(pos,1),X(pos,2),'g+');

% Keep the plotted graph so the next plot is drawn on the same axes.
hold on;

% Plotting negative results on
%    X_axis: Exam1 Score = X(neg,1)
%    Y_axis: Exam2 Score = X(neg,2)
plot(X(neg,1),X(neg,2),'ro');

% =========================================================================

hold off;
end
```

### sigmoid.m :

```matlab
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).
g = 1./(1+exp(-z));

% =============================================================
end
```

### costFunction.m :

```matlab
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
%DIMENSIONS:
%   theta = (n+1) x 1
%   X     = m x (n+1)
%   y     = m x 1
%   grad  = (n+1) x 1
%   J     = Scalar

z = X * theta;      % m x 1
h_x = sigmoid(z);   % m x 1

J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))); % scalar

grad = (1/m)* (X'*(h_x-y));     % (n+1) x 1

% =============================================================

end
```
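A small Python check of the same cost and gradient formulas used in costFunction.m. The tiny dataset is made up for illustration; a handy property to test against is that at θ = 0 the sigmoid outputs 0.5 everywhere, so the cost is log 2 ≈ 0.6931 regardless of the data:

```python
# Python port of the unregularized logistic cost/gradient, for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y):
    """J = (1/m) sum(-y log h - (1-y) log(1-h)); grad = (1/m) X'(h - y)."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = (1 / m) * np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))
    grad = (1 / m) * (X.T @ (h - y))
    return J, grad

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # intercept + one feature (made up)
y = np.array([0.0, 0.0, 1.0])
J, grad = cost_function(np.zeros(2), X, y)
print(round(J, 4))  # at theta = 0, J = log(2) -> 0.6931
```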

### predict.m :

```matlab
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters.
%               You should set p to a vector of 0's and 1's
%
% Dimensions:
% X     =  m x (n+1)
% theta = (n+1) x 1

h_x = sigmoid(X*theta);
p = (h_x >= 0.5);

% Equivalent one-liner:
%p = double(sigmoid(X * theta)>=0.5);
% =========================================================================
end
```

### costFunctionReg.m :

```matlab
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

%DIMENSIONS:
%   theta = (n+1) x 1
%   X     = m x (n+1)
%   y     = m x 1
%   grad  = (n+1) x 1
%   J     = Scalar

z = X * theta;      % m x 1
h_x = sigmoid(z);   % m x 1

% Regularization term; note that theta(1) (i.e., theta_0) is not penalized.
reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);

J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar

grad(1) = (1/m)* (X(:,1)'*(h_x-y));                                  % 1 x 1
grad(2:end) = (1/m)* (X(:,2:end)'*(h_x-y))+(lambda/m)*theta(2:end);  % n x 1

% =============================================================
end
```
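The key detail in costFunctionReg.m is that the penalty term $(\lambda/2m)\sum_{j\ge1}\theta_j^2$ skips $\theta_0$. A Python sketch with made-up numbers makes the penalty easy to verify: with m = 2, λ = 4, and θ = [0, 1], the penalty is (4/(2·2))·1² = 1, so the regularized cost should exceed the unregularized one by exactly 1:

```python
# Python port of the regularized logistic cost; data are made up.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_reg(theta, X, y, lam):
    """Regularized logistic cost; theta[0] is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = (1 / m) * np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))
    J += (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # skip theta_0
    grad = (1 / m) * (X.T @ (h - y))
    grad[1:] += (lam / m) * theta[1:]
    return J, grad

X = np.array([[1.0, 2.0], [1.0, -1.0]])  # intercept + one feature (made up)
y = np.array([1.0, 0.0])
theta = np.array([0.0, 1.0])
J0, _ = cost_reg(theta, X, y, 0.0)
J1, _ = cost_reg(theta, X, y, 4.0)
print(round(J1 - J0, 2))  # penalty = (4/(2*2)) * 1^2 -> 1.0
```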

## LOGISTIC REGRESSION COURSERA QUIZ ANSWERS WEEK 3

1. Suppose that you have trained a logistic regression classifier, and it outputs on a new example a prediction $h_\theta(x) = 0.2$. This means (check all that apply):
•  Our estimate for P(y = 1|x; θ) is 0.8.

h(x) gives P(y=1|x; θ), not 1 – P(y=1|x; θ)

•  Our estimate for P(y = 0|x; θ) is 0.8.

Since we must have P(y=0|x;θ) = 1 – P(y=1|x; θ), the former is
1 – 0.2 = 0.8.

•  Our estimate for P(y = 1|x; θ) is 0.2.

h(x) is precisely P(y=1|x; θ), so each is 0.2.

•  Our estimate for P(y = 0|x; θ) is 0.2.

h(x) is P(y=1|x; θ), not P(y=0|x; θ)

1. Suppose you have the following training set, and fit a logistic regression classifier $h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2)$.

Which of the following are true? Check all that apply.

1. For logistic regression, the gradient is given by $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j$. Which of these is a correct gradient descent update for logistic regression with a learning rate of $\alpha$? Check all that apply.

1. Which of the following statements are true? Check all that apply.

1. Suppose you train a logistic classifier $h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2)$. Suppose $\theta_0 = 6$, $\theta_1 = -1$, $\theta_2 = 0$. Which of the following figures represents the decision boundary found by your classifier?

Answer: the figure with a vertical decision boundary at $x_1 = 6$, positive region on the left. Since $h_\theta(x) = g(6 - x_1)$, the classifier predicts $y = 1$ exactly when $6 - x_1 \geq 0$, i.e., $x_1 \leq 6$. (The four candidate figures are not reproduced here.)

## REGULARIZATION COURSERA QUIZ ANSWERS WEEK 3

1. You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.
•  Introducing regularization to the model always results in equal or better performance on the training set.
•  Introducing regularization to the model always results in equal or better performance on examples not in the training set.
•  Adding a new feature to the model always results in equal or better performance on the training set.
•  Adding many new features to the model helps prevent overfitting on the training set.

Answer: Adding a new feature to the model always results in equal or better performance on the training set — the optimizer can always set the new feature’s weight to zero. Regularization, by contrast, can hurt training-set fit (and does not guarantee better generalization), and adding many features encourages rather than prevents overfitting.

1. Suppose you ran logistic regression twice, once with $\lambda = 0$, and once with $\lambda = 1$. One of the times, you got parameters $\theta = \begin{bmatrix} 74.81\\ 45.05 \end{bmatrix}$, and the other time you got $\theta = \begin{bmatrix} 1.37\\ 0.51 \end{bmatrix}$. However, you forgot which value of $\lambda$ corresponds to which value of $\theta$. Which one do you think corresponds to $\lambda = 1$?

Answer: $\theta = \begin{bmatrix} 1.37\\ 0.51 \end{bmatrix}$. Regularization penalizes large parameter values, so the run with $\lambda = 1$ yields the smaller $\theta$.

1. Suppose you ran logistic regression twice, once with $\lambda = 0$, and once with $\lambda = 1$. One of the times, you got parameters $\theta = \begin{bmatrix} 81.47\\ 12.69 \end{bmatrix}$, and the other time you got $\theta = \begin{bmatrix} 13.01\\ 0.91 \end{bmatrix}$. However, you forgot which value of $\lambda$ corresponds to which value of $\theta$. Which one do you think corresponds to $\lambda = 1$?

Answer: $\theta = \begin{bmatrix} 13.01\\ 0.91 \end{bmatrix}$, for the same reason — the larger $\lambda$ shrinks the parameters.

1. Which of the following statements about regularization are true? Check all that apply.

1. Which of the following statements about regularization are true? Check all that apply.

1. In which one of the following figures do you think the hypothesis has overfit the training set?

(Figure options not reproduced here.)

1. In which one of the following figures do you think the hypothesis has underfit the training set?

(Figure options not reproduced here.)

