machine learning coursera quiz answers
machine learning coursera quiz answers all weeks
Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI.
In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI.
machine learning coursera quiz answers week 1
 A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. What would be a reasonable choice for P?
 The probability of it correctly predicting a future date’s weather.
 The weather prediction task.
 The process of the algorithm examining a large amount of historical weather data.
 None of these.
Answer: the probability of it correctly predicting a future date’s weather.
 A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is T?
 The weather prediction task.
 None of these.
 The probability of it correctly predicting a future date’s weather.
 The process of the algorithm examining a large amount of historical weather data.
Answer: the weather prediction task.
 Suppose you are working on weather prediction, and use a learning algorithm to predict tomorrow’s temperature (in degrees Centigrade/Fahrenheit).
Would you treat this as a classification or a regression problem?
 Regression
 Classification
Answer: Regression
 Suppose you are working on weather prediction, and your weather station makes one of three predictions for each day’s weather: Sunny, Cloudy or Rainy. You’d like to use a learning algorithm to predict tomorrow’s weather.
Would you treat this as a classification or a regression problem?
 Regression
 Classification
Answer: Classification
 Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this.
Would you treat this as a classification or a regression problem?
 Regression
 Classification
Answer: Regression
 Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy).
Would you treat this as a classification or a regression problem?
 Regression
 Classification
Answer: Classification
 Suppose you are working on stock market prediction. Typically tens of millions of shares of Microsoft stock are traded (i.e., bought/sold) each day. You would like to predict the number of Microsoft shares that will be traded tomorrow.
Would you treat this as a classification or a regression problem?
 Regression
 Classification
Answer: Regression
 Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
 Given historical data of children’s ages and heights, predict children’s height as a function of their age.
 Given 50 articles written by male authors, and 50 articles written by female authors, learn to predict the gender of a new manuscript’s author (when the identity of this author is unknown).
 Take a collection of 1000 essays written on the US Economy, and find a way to automatically group these essays into a small number of groups of essays that are somehow “similar” or “related”.
 Examine a large collection of emails that are known to be spam email, to discover if there are subtypes of spam mail.
Answer: the first two (predicting height from age, and predicting author gender) are supervised learning; the essay-grouping and spam-subtype tasks are unsupervised.
 Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
 Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of the treatment, side effects, etc.), discover whether there are different categories or “types” of patients in terms of how they respond to the drug, and if so what these categories are.
 Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments.
 Have a computer examine an audio clip of a piece of music, and classify whether or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and no vocals).
 Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years.
Answer: the last two (vocals classification and diabetes prediction) are supervised learning; the first two are unsupervised clustering.
Coursera: Machine Learning (Week 1) Quiz – Linear Regression with One Variable
 Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year. Specifically, let x be equal to the number of “A” grades (including A−, A and A+ grades) that a student receives in their first year of college (freshmen year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).
Here each row is one training example. Recall that in linear regression, our hypothesis is hθ(x) = θ0 + θ1x, and we use m to denote the number of training examples.
For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).
Answer: 4
 Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, “kJ/mol” is the unit measuring the amount of energy released.
You would like to use linear regression (hθ(x) = θ0 + θ1x) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for θ0 and θ1? You should be able to select the right answer without actually implementing linear regression.
 For this question, assume that we are using the training set from Q1.
Recall our definition of the cost function was J(θ0, θ1) = (1/(2m)) Σ (hθ(x(i)) − y(i))², where the sum runs over the m training examples.
What is J(0, 1)? In the box below, please enter your answer (simplify fractions to decimals when entering your answer, and use ‘.’ as the decimal delimiter, e.g., 1.5).
Answer: 0.5
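This computation can be checked numerically. The quiz’s training set is an image and is not reproduced above, so the four (x, y) pairs below are an assumption based on a commonly cited version of this question; with them, m = 4 and J(0, 1) = 0.5 as quoted. A Python/NumPy sketch:

```python
import numpy as np

# Assumed training set (the quiz shows it as an image); a commonly cited
# version of this question uses these four (x, y) pairs.
x = np.array([3.0, 1.0, 0.0, 4.0])
y = np.array([2.0, 2.0, 1.0, 3.0])

def cost(theta0, theta1):
    """J(theta0, theta1) = 1/(2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(y)
    h = theta0 + theta1 * x  # hypothesis h(x) = theta0 + theta1 * x
    return (1.0 / (2 * m)) * np.sum((h - y) ** 2)

print(cost(0, 1))  # squared errors (1, 1, 1, 1) sum to 4, so J = 4/8 = 0.5
```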
 Let f be some function so that f(θ0, θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima).
Suppose we use gradient descent to try to minimize f(θ0, θ1) as a function of θ0 and θ1.
Which of the following statements are true? (Check all that apply.)

 If θ0 and θ1 are initialized at the global minimum, then one iteration will not change their values.
 Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
 No matter how θ0 and θ1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.
 If the first few iterations of gradient descent cause f(θ0, θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
Answer: the first and fourth statements are true.
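The two true statements can be illustrated on a toy quadratic. The function f below is a hypothetical stand-in (not from the quiz): its global minimum sits at (0, 0), so a step taken there changes nothing, while an oversized learning rate makes f grow:

```python
import numpy as np

# A hypothetical smooth function: f(t0, t1) = t0^2 + t1^2,
# whose global minimum is at (0, 0) and whose gradient is 2 * (t0, t1).
def grad(t):
    return 2 * t

def gd_step(t, alpha):
    return t - alpha * grad(t)

# Initialized at the global minimum, the gradient is zero, so one
# iteration does not change the parameters:
print(gd_step(np.zeros(2), alpha=0.1))  # stays [0. 0.]

# With too large a learning rate the step overshoots and f increases:
t = np.array([1.0, 1.0])
t_next = gd_step(t, alpha=1.5)
print(np.sum(t ** 2), "->", np.sum(t_next ** 2))  # 2.0 -> 8.0
```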
 In the given figure, the cost function J(θ0, θ1) has been plotted against θ0 and θ1, as shown in ‘Plot 2’. The contour plot for the same cost function is given in ‘Plot 1’. Based on the figure, choose the correct options (check all that apply).
 If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of the cost function is maximum at point A.
 If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point C, as the value of the cost function is minimum at point C.
 Point P (the global minimum of Plot 2) corresponds to point A of Plot 1.
 If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of the cost function is minimum at A.
 Point P (the global minimum of Plot 2) corresponds to point C of Plot 1.
Coursera: Machine Learning (Week 1) Quiz – Linear Algebra – Andrew Ng
 Let u and v be 3-dimensional vectors (their specific entries are shown in the quiz as images and are not reproduced here).
What is uᵀv?
(Hint: uᵀ is a 1×3 dimensional matrix, and v can also be seen as a 3×1 matrix. The answer you want can be obtained by taking the matrix product of uᵀ and v.) Do not add brackets to your answer.
Answer: 4
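The inner product can be sketched as follows. Since the quiz’s u and v are not reproduced, the vectors below are hypothetical examples chosen so that uᵀv happens to equal the quoted answer of 4:

```python
import numpy as np

# Hypothetical 3-dimensional vectors (the quiz's u and v are images);
# chosen so that the product happens to equal the quoted answer.
u = np.array([1, 3, -1])
v = np.array([2, 2, 4])

# u' (1x3) times v (3x1) is a 1x1 matrix, i.e. a single number:
print(u @ v)  # 1*2 + 3*2 + (-1)*4 = 4
```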
Machine learning coursera assignment answers week 2
warmUpExercise.m :
function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix

A = [];
% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix
%               In octave, we return values by defining which variables
%               represent the return values (at the top of the file)
%               and then set them accordingly.

A = eye(5); % eye is a built-in function that creates an identity matrix
% ===========================================
end
plotData.m :
function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure
%   PLOTDATA(x,y) plots the data points and gives the figure axes labels of
%   population and profit.

figure; % open a new figure window

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the training data into a figure using the
%               "figure" and "plot" commands. Set the axes labels using
%               the "xlabel" and "ylabel" commands. Assume the
%               population and revenue data have been passed in
%               as the x and y arguments of this function.
%
% Hint: You can use the 'rx' option with plot to have the markers
%       appear as red crosses. Furthermore, you can make the
%       markers larger by using plot(..., 'rx', 'MarkerSize', 10);

plot(x, y, 'rx', 'MarkerSize', 10);      % Plot the data
ylabel('Profit in $10,000s');            % Set the y-axis label
xlabel('Population of City in 10,000s'); % Set the x-axis label
% ============================================================
end
computeCost.m :
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

m = length(y); % number of training examples
J = 0;         % you need to return this variable correctly

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.

% %%%%%%%%% CORRECT: loop implementation %%%%%%%%%
% h = X * theta;
% temp = 0;
% for i = 1:m
%     temp = temp + (h(i) - y(i))^2;
% end
% J = (1/(2*m)) * temp;

%%%%%%%%% CORRECT: vectorized implementation %%%%%%%%%
J = (1/(2*m)) * sum(((X*theta) - y).^2);
% ============================================================
end
gradientDescent.m :
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.

    % %%%%%%%%% CORRECT: element-wise %%%%%%%%%
    % error = (X * theta) - y;
    % temp0 = theta(1) - ((alpha/m) * sum(error .* X(:,1)));
    % temp1 = theta(2) - ((alpha/m) * sum(error .* X(:,2)));
    % theta = [temp0; temp1];

    % %%%%%%%%% CORRECT: with inner products %%%%%%%%%
    % error = (X * theta) - y;
    % temp0 = theta(1) - ((alpha/m) * X(:,1)' * error);
    % temp1 = theta(2) - ((alpha/m) * X(:,2)' * error);
    % theta = [temp0; temp1];

    %%%%%%%%% CORRECT: fully vectorized %%%%%%%%%
    error = (X * theta) - y;
    theta = theta - ((alpha/m) * X' * error);
    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
end
end
computeCostMulti.m :
function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
%   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

m = length(y); % number of training examples
J = 0;         % you need to return this variable correctly

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.

J = (1/(2*m)) * sum(((X*theta) - y).^2);
% ============================================================
end
gradientDescentMulti.m :
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCostMulti) and gradient here.

    %%%%%%%% CORRECT %%%%%%%%%%
    error = (X * theta) - y;
    theta = theta - ((alpha/m) * X' * error);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%
    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCostMulti(X, y, theta);
end
end
machine learning coursera quiz answers week 2
Linear regression with multiple variables coursera quiz answers week 2

 Suppose m=4 students have taken some classes, and the class had a midterm exam and a final exam. You have collected a dataset of their scores on the two exams (the table is shown in the quiz as an image and is not reproduced here).
You’d like to use polynomial regression to predict a student’s final exam score from their midterm exam score. Concretely, suppose you want to fit a model of the form hθ(x) = θ0 + θ1x1 + θ2x2, where x1 is the midterm score and x2 is (midterm score)². Further, you plan to use both feature scaling (dividing by the “max − min”, or range, of a feature) and mean normalization.
What is the normalized feature x2(4)? (Hint: midterm = 69, final = 78 is training example 4.) Please round off your answer to two decimal places and enter it in the text box below.
Answer: -0.47
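The normalization can be reproduced in a few lines. The score table itself is an image; the midterm scores below are an assumption based on a commonly cited version of this dataset (training example 4 has midterm = 69, matching the hint):

```python
import numpy as np

# Assumed midterm scores (the quiz's table is an image); a commonly cited
# version has scores 89, 72, 94, 69, so training example 4 matches the hint.
midterm = np.array([89.0, 72.0, 94.0, 69.0])
x2 = midterm ** 2                  # the (midterm score)^2 feature

mean = x2.mean()                   # mean normalization term
value_range = x2.max() - x2.min()  # feature scaling by the range (max - min)
x2_norm = (x2 - mean) / value_range

print(round(float(x2_norm[3]), 2))  # normalized x2 for training example 4
```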
 You run gradient descent for 15 iterations with some learning rate α and compute J(θ) after each iteration. You find that the value of J(θ) decreases slowly and is still decreasing after 15 iterations. Based on this, which of the following conclusions seems most plausible?
Answer: rather than using the current value of α, it would be more promising to try a larger value of α.
 You run gradient descent for 15 iterations with some learning rate α and compute J(θ) after each iteration. You find that the value of J(θ) decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible?
Answer: the current α is an effective choice of learning rate.
 Suppose you have m = 23 training examples with n = 5 features (excluding the additional all-ones feature for the intercept term, which you should add). The normal equation is θ = (XᵀX)⁻¹Xᵀy. For the given values of m and n, what are the dimensions of θ, X, and y in this equation?

 X is 23 × 5, y is 23 × 1, θ is 5 × 5
 X is 23 × 6, y is 23 × 6, θ is 6 × 6
 X is 23 × 6, y is 23 × 1, θ is 6 × 1
X has m rows and n+1 columns (+1 because of the x0 = 1 intercept term). y is an m-vector. θ is an (n+1)-vector.
 X is 23 × 5, y is 23 × 1, θ is 5 × 1
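The shapes can be verified directly. A minimal sketch with random data (the matrix entries are arbitrary; only the dimensions matter):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 23, 5

# Design matrix with the all-ones intercept column prepended: m x (n+1)
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = rng.normal(size=(m, 1))  # m x 1

theta = np.linalg.inv(X.T @ X) @ X.T @ y  # normal equation
print(X.shape, y.shape, theta.shape)      # (23, 6) (23, 1) (6, 1)
```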

 Suppose you have a dataset with m = 1000000 examples and n = 200000 features for each example. You want to use multivariate linear regression to fit the parameters θ to our data. Should you prefer gradient descent or the normal equation?
 Gradient descent, since (XᵀX)⁻¹ will be very slow to compute in the normal equation.
With n = 200000 features, you will have to invert a 200001 x 200001 matrix to compute the normal equation. Inverting such a large matrix is computationally expensive, so gradient descent is a good choice.
 The normal equation, since it provides an efficient way to directly find the solution.
 The normal equation, since gradient descent might be unable to find the optimal θ.
Octave/Matlab tutorial coursera quiz answers week 2
 Suppose I first execute the following Octave/Matlab commands:
A = [1 2; 3 4; 5 6];
B = [1 2 3; 4 5 6];
Which of the following are then valid commands? Check all that apply. (Hint: A’ denotes the transpose of A.)
 C = A * B;
 C = B’ + A;
 C = A’ * B;
 C = B + A;
Answer: the first two commands (C = A * B; and C = B’ + A;) are valid; the other two have mismatched dimensions.
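The dimension bookkeeping can be checked with an equivalent NumPy sketch (A is 3×2 and B is 2×3, as above):

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # 3 x 2
B = np.array([[1, 2, 3], [4, 5, 6]])    # 2 x 3

print((A @ B).shape)    # A * B: (3x2)(2x3) -> 3 x 3, valid
print((B.T + A).shape)  # B' + A: both are 3 x 2, valid

# A' * B would multiply (2x3) by (2x3), and B + A would add a 2x3 to a
# 3x2; both fail because the dimensions do not line up.
```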
 Let A be a matrix with 4 rows (its entries are shown in the quiz as an image), and let B be the matrix consisting of the first two columns of A.
Which of the following indexing expressions gives B? Check all that apply.
 B = A(:, 1:2);
 B = A(1:4, 1:2);
 B = A(:, 0:2);
 B = A(0:4, 0:2);
Answer: the first two; Octave/Matlab indexing is 1-based, so ranges starting at 0 are invalid.
 Let A be a 10×10 matrix and x be a 10element vector. Your friend wants to compute the product Ax and writes the following code:
v = zeros(10, 1);
for i = 1:10
    for j = 1:10
        v(i) = v(i) + A(i, j) * x(j);
    end
end
How would you vectorize this code to run without any for loops? Check all that apply.
 v = A * x;
 v = Ax;
 v = x’ * A;
 v = sum (A * x);
Answer: v = A * x;
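That the single matrix-vector product reproduces the double loop can be verified with a small NumPy sketch on arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(10, 10))
x = rng.normal(size=(10, 1))

# The double loop from the quiz:
v_loop = np.zeros((10, 1))
for i in range(10):
    for j in range(10):
        v_loop[i] += A[i, j] * x[j]

v_vec = A @ x  # the vectorized v = A * x
print(np.allclose(v_loop, v_vec))  # True
```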
 Say you have two column vectors v and w, each with 7 elements (i.e., they have dimensions 7×1). Consider the following code:
z = 0;
for i = 1:7
    z = z + v(i) * w(i)
end
Which of the following vectorizations correctly compute z? Check all that apply.
 z = sum (v .* w);
 z = w’ * v;
 z = v * w’;
 z = w * v’;
Answer: the first two (z = sum (v .* w); and z = w’ * v;); v * w’ and w * v’ produce 7×7 outer products, not a scalar.
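Both correct vectorizations can be checked against the loop with arbitrary 7×1 vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.normal(size=(7, 1))
w = rng.normal(size=(7, 1))

# The loop from the quiz:
z_loop = 0.0
for i in range(7):
    z_loop += v[i, 0] * w[i, 0]

z1 = np.sum(v * w)     # z = sum(v .* w)
z2 = (w.T @ v).item()  # z = w' * v, a 1x1 matrix holding the same scalar
print(z_loop, z1, z2)

# By contrast, v * w' and w * v' are 7x7 outer products, not the scalar z.
```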
 In Octave/Matlab, many functions work on single numbers, vectors, and matrices. For example, the sin function when applied to a matrix will return a new matrix with the sin of each element. But you have to be careful, as certain functions have different behavior. Suppose you have a 7×7 matrix X. You want to compute the log of every element, the square of every element, add 1 to every element, and divide every element by 4. You will store the results in four matrices, A, B, C, D. One way to do so is the following code:
for i = 1:7
    for j = 1:7
        A(i, j) = log(X(i, j));
        B(i, j) = X(i, j) ^ 2;
        C(i, j) = X(i, j) + 1;
        D(i, j) = X(i, j) / 4;
    end
end
Which of the following correctly compute A, B, C or D? Check all that apply.
 C = X + 1;
 D = X / 4;
 A = log (X);
 B = X ^ 2;
Answer: A, C and D are correct; B = X ^ 2 computes the matrix product X * X, not the element-wise square (that would be B = X .^ 2).
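The element-wise versus matrix-square distinction is easy to see in a NumPy sketch (the 7×7 matrix here is arbitrary):

```python
import numpy as np

X = np.arange(1.0, 50.0).reshape(7, 7)  # an arbitrary 7x7 matrix

A = np.log(X)  # log of every element
C = X + 1      # add 1 to every element
D = X / 4      # divide every element by 4

# Element-wise square (Octave: X .^ 2) versus matrix square (Octave: X ^ 2):
B_elem = X ** 2
B_mat = X @ X
print(np.allclose(B_elem, B_mat))  # False: the two are different operations
```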
MACHINE LEARNING COURSERA WEEK 3 ANSWERS
Machine learning coursera assignment week 3 answers
plotData.m :
function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.

% Separating positive and negative results
pos = find(y == 1); % indices of positive results
neg = find(y == 0); % indices of negative results

% Create New Figure
figure;

% Plotting positive results:
%   x-axis: Exam1 score = X(pos,1)
%   y-axis: Exam2 score = X(pos,2)
plot(X(pos,1), X(pos,2), 'g+');

% Keep the above plotted graph as it is.
hold on;

% Plotting negative results:
%   x-axis: Exam1 score = X(neg,1)
%   y-axis: Exam2 score = X(neg,2)
plot(X(neg,1), X(neg,2), 'ro');
% ============================================================
hold off;
end
sigmoid.m :
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

g = zeros(size(z)); % you need to return this variable correctly

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).

g = 1 ./ (1 + exp(-z));
% ============================================================
end
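As a quick sanity check of the sigmoid above, an equivalent Python/NumPy sketch:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

print(sigmoid(0))                        # 0.5
print(sigmoid(np.array([-10.0, 10.0])))  # close to [0, 1]
```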
costFunction.m :
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. the parameters.

m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta.
%
% Note: grad should have the same dimensions as theta.
%
% DIMENSIONS:
%   theta = (n+1) x 1
%   X     = m x (n+1)
%   y     = m x 1
%   grad  = (n+1) x 1
%   J     = scalar

z = X * theta;    % m x 1
h_x = sigmoid(z); % m x 1

J = (1/m) * sum(-y .* log(h_x) - (1 - y) .* log(1 - h_x)); % scalar
grad = (1/m) * (X' * (h_x - y));                           % (n+1) x 1
% ============================================================
end
predict.m :
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1);  % Number of training examples
p = zeros(m, 1); % you need to return this variable correctly

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters.
%               You should set p to a vector of 0's and 1's.
%
% DIMENSIONS:
%   X     = m x (n+1)
%   theta = (n+1) x 1

h_x = sigmoid(X * theta);
p = (h_x >= 0.5);
% p = double(sigmoid(X * theta) >= 0.5); % equivalent one-liner
% ============================================================
end
costFunctionReg.m :
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta.
%
% DIMENSIONS:
%   theta = (n+1) x 1
%   X     = m x (n+1)
%   y     = m x 1
%   grad  = (n+1) x 1
%   J     = scalar

z = X * theta;    % m x 1
h_x = sigmoid(z); % m x 1

% theta(1), the intercept term, is not regularized:
reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);

J = (1/m) * sum(-y .* log(h_x) - (1 - y) .* log(1 - h_x)) + reg_term; % scalar

grad(1) = (1/m) * (X(:,1)' * (h_x - y));                                     % 1 x 1
grad(2:end) = (1/m) * (X(:,2:end)' * (h_x - y)) + (lambda/m) * theta(2:end); % n x 1
% ============================================================
end
LOGISTIC REGRESSION COURSERA QUIZ ANSWERS WEEK 3
 Suppose that you have trained a logistic regression classifier, and it outputs on a new example x a prediction hθ(x) = 0.2. This means (check all that apply):
 Our estimate for P(y = 1 | x; θ) is 0.8.
 Our estimate for P(y = 0 | x; θ) is 0.8.
Since we must have P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ), the former is 1 − 0.2 = 0.8.
 Our estimate for P(y = 1 | x; θ) is 0.2.
hθ(x) is precisely P(y = 1 | x; θ), so this is 0.2.
 Our estimate for P(y = 0 | x; θ) is 0.2.
hθ(x) is P(y = 1 | x; θ), not P(y = 0 | x; θ), so this is false.
Answer: the second and third options.
 Suppose you have the following training set, and fit a logistic regression classifier hθ(x) = g(θ0 + θ1x1 + θ2x2).
Which of the following are true? Check all that apply.
 Adding polynomial features (e.g., instead using hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x1x2 + θ5x2²)) could increase how well we can fit the training data.
 At the optimal value of θ (e.g., found by fminunc), we will have J(θ) ≥ 0.
 Adding polynomial features (e.g., instead using hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x1x2 + θ5x2²)) would increase J(θ) because we are now summing over more terms.
 If we train gradient descent for enough iterations, for some examples x(i) in the training set it is possible to obtain hθ(x(i)) > 1.
Answer: the first two statements are true; the sigmoid output never exceeds 1, and adding features cannot increase the optimal cost.
 For logistic regression, the gradient is given by ∂J(θ)/∂θj = (1/m) Σᵢ (hθ(x(i)) − y(i)) xj(i). Which of these is a correct gradient descent update for logistic regression with a learning rate of α? Check all that apply. (The candidate update rules are images and are not reproduced here; the correct ones are the simultaneous updates θj := θj − (α/m) Σᵢ (hθ(x(i)) − y(i)) xj(i) with hθ(x) = 1/(1 + e^(−θᵀx)).)
 Which of the following statements are true? Check all that apply.
 The one-vs-all technique allows you to use logistic regression for problems in which each label y(i) comes from a fixed, discrete set of values.
If each y(i) is one of k different values, we can give a label to each class and use one-vs-all as described in the lecture.
 For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/LBFGS/etc).
The cost function for logistic regression is convex, so gradient descent will always converge to the global minimum. We still might use a more advanced optimization algorithm, since they can be faster and don’t require you to select a learning rate.
 The cost function J(θ) for logistic regression trained with m ≥ 1 examples is always greater than or equal to zero.
The cost for any single example is always ≥ 0, since it is the negative log of a quantity between zero and one. The cost function J(θ) is a summation over the cost for each example, so the cost function itself must be greater than or equal to zero.
 Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).
This is false: we will need 3 classifiers, one for each class.
Answer: the first and third statements are true.
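The one-vs-all scheme described above can be sketched end-to-end. Everything below (the toy dataset, learning rate, iteration count) is illustrative, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y01, alpha=0.1, iters=2000):
    """Plain gradient descent on the (unregularized) logistic cost."""
    theta = np.zeros(X.shape[1])
    m = len(y01)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= (alpha / m) * (X.T @ (h - y01))
    return theta

def one_vs_all(X, y, num_classes):
    # One binary classifier per class: class k versus everything else.
    return [train_binary(X, (y == k).astype(float)) for k in range(num_classes)]

def predict(thetas, X):
    scores = np.column_stack([sigmoid(X @ t) for t in thetas])
    return scores.argmax(axis=1)  # pick the most confident classifier

# Toy 3-class problem on a single feature, plus an intercept column:
feature = np.array([-4.0, -3.5, -3.0, 0.0, 0.2, 0.5, 3.0, 3.5, 4.0])
X = np.column_stack([np.ones(9), feature])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

thetas = one_vs_all(X, y, 3)
print(len(thetas))  # 3 classifiers, one per class
print(predict(thetas, X))
```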
Suppose you train a logistic classifier hθ(x) = g(θ0 + θ1x1 + θ2x2). Suppose θ0 = −6, θ1 = 1, θ2 = 0 (the parameter values are shown in the quiz as images; these are inferred from the explanation below). Which of the following figures represents the decision boundary found by your classifier?
 Figure: (the four candidate figures are images and are not reproduced here)
Answer: the figure in which we transition from negative to positive as x1 goes from left of 6 to right of 6, which is true for the given values of θ.
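The stated boundary can be verified numerically. The θ values below are the ones inferred above (the quiz shows them only as images), so treat them as an assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed parameters: theta0 = -6, theta1 = 1, theta2 = 0,
# so h(x) = g(-6 + x1) and the decision boundary sits at x1 = 6.
theta = np.array([-6.0, 1.0, 0.0])

def h(x1, x2):
    return sigmoid(theta @ np.array([1.0, x1, x2]))

print(h(5, 0))  # below 0.5: predict y = 0 left of x1 = 6
print(h(7, 0))  # above 0.5: predict y = 1 right of x1 = 6
```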
REGULARIZATION COURSERA QUIZ ANSWERS WEEK 3
 You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.
 Introducing regularization to the model always results in equal or better performance on the training set.
 Introducing regularization to the model always results in equal or better performance on examples not in the training set.
 Adding a new feature to the model always results in equal or better performance on the training set.
 Adding many new features to the model helps prevent overfitting on the training set.
Answer: only the third statement (adding a new feature always gives equal or better performance on the training set) is true.
 Suppose you ran logistic regression twice, once with λ = 0, and once with λ = 1. One of the times you got one set of parameters θ, and the other time you got another (the specific values are images and are not reproduced here). However, you forgot which value of λ corresponds to which value of θ. Which one do you think corresponds to λ = 1?
Answer: the θ whose entries are closer to zero, since a larger λ shrinks the parameters.
 Which of the following statements about regularization are true? Check all that apply.
 Using a very large value of λ cannot hurt the performance of your hypothesis; the only reason we do not set λ to be too large is to avoid numerical problems.
 Because logistic regression outputs values 0 ≤ hθ(x) ≤ 1, its range of output values can only be “shrunk” slightly by regularization anyway, so regularization is generally not helpful for it.
 Consider a classification problem. Adding regularization may cause your classifier to incorrectly classify some training examples (which it had correctly classified when not using regularization, i.e. when λ = 0).
 Using too large a value of λ can cause your hypothesis to overfit the data; this can be avoided by reducing λ.
Answer: only the third statement (adding regularization may cause some training examples to be misclassified) is true.
 Which of the following statements about regularization are true? Check all that apply.
 Using a very large value of λ cannot hurt the performance of your hypothesis; the only reason we do not set λ to be too large is to avoid numerical problems.
 Because logistic regression outputs values 0 ≤ hθ(x) ≤ 1, its range of output values can only be “shrunk” slightly by regularization anyway, so regularization is generally not helpful for it.
 Because regularization causes J(θ) to no longer be convex, gradient descent may not always converge to the global minimum (when λ > 0, and when using an appropriate learning rate α).
 Using too large a value of λ can cause your hypothesis to underfit the data; this can be avoided by reducing λ.
Answer: only the last statement is true; the regularized logistic regression cost remains convex.
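The underfitting effect of a large λ is easy to see numerically. The sketch below uses the regularized normal equation for linear regression on arbitrary random data (and, for brevity, penalizes the intercept too, which the course's version deliberately does not):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
y = rng.normal(size=(20, 1))

def ridge_theta(lam):
    # Regularized normal equation. (This penalizes every parameter,
    # including the intercept, purely to keep the illustration short.)
    n = X.shape[1]
    return np.linalg.inv(X.T @ X + lam * np.eye(n)) @ X.T @ y

small_lam = np.linalg.norm(ridge_theta(0.01))
huge_lam = np.linalg.norm(ridge_theta(1e6))
print(small_lam, huge_lam)  # a huge lambda drives theta toward zero
```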
 In which one of the following figures do you think the hypothesis has overfit the training set?
(The candidate figures are images and are not reproduced here; the overfit hypothesis is the one whose decision boundary twists to accommodate every single training example.)
In which one of the following figures do you think the hypothesis has underfit the training set?
(The candidate figures are images and are not reproduced here; the underfit hypothesis is the one whose decision boundary is too simple to capture the trend in the data.)