Machine Learning Coursera Quiz Answers (All Weeks)

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI.

In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI.


Coursera: Machine Learning (Week 1) Quiz – Introduction

  1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. What would be a reasonable choice for P? 
    •  The probability of it correctly predicting a future date’s weather.
    •  The weather prediction task.
    •  The process of the algorithm examining a large amount of historical weather data.
    •  None of these.

  1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is T?
    •  The weather prediction task.
    •  None of these.
    •  The probability of it correctly predicting a future date’s weather.
    •  The process of the algorithm examining a large amount of historical weather data.

  1. Suppose you are working on weather prediction, and use a learning algorithm to predict tomorrow’s temperature (in degrees Centigrade/Fahrenheit).
    Would you treat this as a classification or a regression problem?

    •  Regression
    •  Classification

  1. Suppose you are working on weather prediction, and your weather station makes one of three predictions for each day’s weather: Sunny, Cloudy or Rainy. You’d like to use a learning algorithm to predict tomorrow’s weather.
    Would you treat this as a classification or a regression problem?

    •  Regression
    • Classification

 


  1. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this.
    Would you treat this as a classification or a regression problem?

    •  Regression
    •  Classification

  1. Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy).
    Would you treat this as a classification or a regression problem?

    •  Regression
    •  Classification

  1. Suppose you are working on stock market prediction. Typically, tens of millions of shares of Microsoft stock are traded (i.e., bought/sold) each day. You would like to predict the number of Microsoft shares that will be traded tomorrow.
    Would you treat this as a classification or a regression problem?

    •  Regression
    • Classification

 

  1. Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
    •  Given historical data of children’s ages and heights, predict children’s height as a function of their age.
    •  Given 50 articles written by male authors, and 50 articles written by female authors, learn to predict the gender of a new manuscript’s author (when the identity of this author is unknown).
    •  Take a collection of 1000 essays written on the US Economy, and find a way to automatically group these essays into a small number of groups of essays that are somehow “similar” or “related”.
    •  Examine a large collection of emails that are known to be spam email, to discover if there are sub-types of spam mail.

  1. Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
    •  Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of the treatment, side effects, etc.), discover whether there are different categories or “types” of patients in terms of how they respond to the drug, and if so what these categories are.
    •  Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments.
    •  Have a computer examine an audio clip of a piece of music, and classify whether or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and no vocals).
    •  Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years.

 

Coursera: Machine Learning (Week 1) Quiz – Linear Regression with One Variable

 

  1. Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year. Specifically, let x be equal to the number of “A” grades (including A−, A and A+ grades) that a student receives in their first year of college (freshmen year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).
    Here each row is one training example. Recall that in linear regression, our hypothesis is h_θ(x) = θ_0 + θ_1x, and we use m to denote the number of training examples.
    [figure: training set table]
    For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).

    4

 


  1. Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, “kJ/mol” is the unit measuring the amount of energy released.

    [figure: table of number of carbon atoms vs. energy released (kJ/mol)]

    You would like to use linear regression (h_θ(x) = θ_0 + θ_1x) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for θ_0 and θ_1? You should be able to select the right answer without actually implementing linear regression.

    •  θ_0 = −569.6, θ_1 = 530.9
    •  θ_0 = −1780.0, θ_1 = −530.9
    •  θ_0 = −569.6, θ_1 = −530.9
    •  θ_0 = −1780.0, θ_1 = 530.9

 


  1. For this question, assume that we are using the training set from Q1.
    Recall that our definition of the cost function was J(θ_0, θ_1) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2.
    What is J(0, 1)? In the box below,
    please enter your answer (simplify fractions to decimals when entering your answer, and use ‘.’ as the decimal delimiter, e.g., 1.5).

    0.5

  1. Suppose we set  in the linear regression hypothesis from Q1. What is  ?
    3

  1. Suppose we set θ_0 = −2, θ_1 = 0.5 in the linear regression hypothesis from Q1. What is h_θ(6)?
    1
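
You can check the hypothesis and cost arithmetic from the last few questions in Octave. A minimal sketch; the x and y vectors here are stand-ins, since the quiz's training set table is only an image:

  x = [3; 1; 0; 4];                   % stand-in inputs, not necessarily the quiz's table
  y = [2; 2; 1; 3];                   % stand-in targets
  m = length(y);                      % number of training examples
  theta = [0; 1];                     % [theta_0; theta_1]
  X = [ones(m, 1) x];                 % prepend the intercept column
  h = X * theta;                      % h_theta(x) for every example at once
  J = (1 / (2*m)) * sum((h - y).^2)   % cost J(theta_0, theta_1)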

  1. Let f be some function so that f(θ_0, θ_1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima).
    Suppose we use gradient descent to try to minimize f(θ_0, θ_1) as a function of θ_0 and θ_1.
    Which of the following statements are true? (Check all that apply.)
    •  If θ_0 and θ_1 are initialized at the global minimum, then one iteration will not change their values.
    •  Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
    •  No matter how θ_0 and θ_1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.
    •  If the first few iterations of gradient descent cause f(θ_0, θ_1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
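
The last statement is easy to see numerically. A toy illustration (not from the course) that minimizes f(t) = t^2: a small α shrinks f on every step, while an overly large α overshoots and makes f increase:

  f = @(t) t.^2;                 % toy convex function
  grad = @(t) 2*t;               % its derivative
  for alpha = [0.1 1.5]
    t = 1;                       % starting point
    for iter = 1:5
      t = t - alpha * grad(t);   % one gradient descent step
    end
    fprintf('alpha = %.1f -> f = %g\n', alpha, f(t));
  end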

 


  1. In the given figure, the cost function J(θ_0, θ_1) has been plotted against θ_0 and θ_1, as shown in ‘Plot 2’. The contour plot for the same cost function is given in ‘Plot 1’. Based on the figure, choose the correct options (check all that apply).
    [figure: Plot 1 (contour plot) and Plot 2 (surface plot) of J(θ_0, θ_1)]

    •  If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of cost function J(θ_0, θ_1) is maximum at point A.
    •  If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point C, as the value of cost function J(θ_0, θ_1) is minimum at point C.
    •  Point P (the global minimum of Plot 2) corresponds to point A of Plot 1.
    •  If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of cost function J(θ_0, θ_1) is minimum at A.
    •  Point P (the global minimum of Plot 2) corresponds to point C of Plot 1.

Coursera: Machine Learning (Week 1) Quiz – Linear Algebra | Andrew NG

  1. Let two matrices be
     , 
    What is A – B ?

    •  
    •  
    •  
    •  

 


  1. Let two matrices be
     , 
    What is A + B ?

    •  
    •  
    •  
    •  

  1. Let

    What is 2∗x ?

    •   

       Correct
      To multiply the vector x by 2, take each element of x and multiply that element by 2.

    •   
    •   
    •   

 


  1. Let

    What is 2∗x ?

    •   
    •   

       Correct
      To multiply the vector x by 2, take each element of x and multiply that element by 2.

    •   
    •   

  1. Let u be a 3-dimensional vector, where specifically

    What is u^T (the transpose of u)?

    •  
    •  
    •  
    •  

 


  1. Let u and v be 3-dimensional vectors, where specifically

    and

    what is u^T v ?
    (Hint: u^T is a 1×3 dimensional matrix, and v can also be seen as a 3×1 matrix. The answer you want can be obtained by taking the matrix product of u^T and v.) Do not add brackets to your answer.

    -4
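
The hint describes exactly how you would compute this in Octave. A sketch with stand-in vectors (the quiz's actual u and v are in the images above):

  u = [1; 3; -1];      % stand-in column vectors
  v = [2; 2; 2];
  z = u' * v           % (1x3) times (3x1) gives a scalar
  z == sum(u .* v)     % element-wise product and sum agree with it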
    

Coursera: Machine Learning (Week 2) Programming Assignment – Linear Regression

warmUpExercise.m :

function A = warmUpExercise()
  %WARMUPEXERCISE Example function in octave
  %   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix
  
   A = []; 
  % ============= YOUR CODE HERE ==============
  % Instructions: Return the 5x5 identity matrix 
  %               In octave, we return values by defining which variables
  %               represent the return values (at the top of the file)
  %               and then set them accordingly. 
  
   A = eye(5);  % eye(5) is a built-in function that returns the 5x5 identity matrix
  
  % ===========================================
end

 

plotData.m :

function plotData(x, y)
  %PLOTDATA Plots the data points x and y into a new figure 
  %   PLOTDATA(x,y) plots the data points and gives the figure axes labels of
  %   population and profit.
  
  figure; % open a new figure window
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Plot the training data into a figure using the 
  %               "figure" and "plot" commands. Set the axes labels using
  %               the "xlabel" and "ylabel" commands. Assume the 
  %               population and revenue data have been passed in
  %               as the x and y arguments of this function.
  %
  % Hint: You can use the 'rx' option with plot to have the markers
  %       appear as red crosses. Furthermore, you can make the
  %       markers larger by using plot(..., 'rx', 'MarkerSize', 10);
  
  plot(x, y, 'rx', 'MarkerSize', 10); % Plot the data
  ylabel('Profit in $10,000s'); % Set the y-axis label
  xlabel('Population of City in 10,000s'); % Set the x-axis label
  
  % ============================================================
end

computeCost.m :

function J = computeCost(X, y, theta)
  %COMPUTECOST Compute cost for linear regression
  %   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
  %   parameter for linear regression to fit the data points in X and y
  
  % Initialize some useful values
  m = length(y); % number of training examples
  
  % You need to return the following variables correctly 
  J = 0;
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the cost of a particular choice of theta
  %               You should set J to the cost.
  
  %%%%%%%%%%%%% CORRECT %%%%%%%%%
  % h = X*theta;
  % temp = 0; 
  % for i=1:m
  %   temp = temp + (h(i) - y(i))^2;
  % end
  % J = (1/(2*m)) * temp;
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  %%%%%%%%%%%%% CORRECT: Vectorized Implementation %%%%%%%%%
  J = (1/(2*m))*sum(((X*theta)-y).^2);
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  % =========================================================================
end
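
A quick sanity check you can run before submitting (the data here is made up, but a perfect fit must give J = 0):

  X = [1 1; 1 2; 1 3];          % 3 examples, intercept column included
  y = [1; 2; 3];
  computeCost(X, y, [0; 1])     % h(x) = x fits exactly, so J = 0
  computeCost(X, y, [0; 0])     % h(x) = 0, so J = (1+4+9)/(2*3), about 2.33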

 

gradientDescent.m :

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  %GRADIENTDESCENT Performs gradient descent to learn theta
  %   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
  %   taking num_iters gradient steps with learning rate alpha
  
  % Initialize some useful values
  m = length(y); % number of training examples
  J_history = zeros(num_iters, 1);
  
  for iter = 1:num_iters
  
   % ====================== YOUR CODE HERE ======================
   % Instructions: Perform a single gradient step on the parameter vector
   %               theta. 
   %
   % Hint: While debugging, it can be useful to print out the values
   %       of the cost function (computeCost) and gradient here.
   %
   
   %%%%%%%%% CORRECT %%%%%%%
   %error = (X * theta) - y;
   %temp0 = theta(1) - ((alpha/m) * sum(error .* X(:,1)));
   %temp1 = theta(2) - ((alpha/m) * sum(error .* X(:,2)));
   %theta = [temp0; temp1];
   %%%%%%%%%%%%%%%%%%%%%%%%%
  
   %%%%%%%%% CORRECT %%%%%%%  
   %error = (X * theta) - y;
   %temp0 = theta(1) - ((alpha/m) * X(:,1)'*error);
   %temp1 = theta(2) - ((alpha/m) * X(:,2)'*error);
   %theta = [temp0; temp1];
   %%%%%%%%%%%%%%%%%%%%%%%%%
  
   %%%%%%%%% CORRECT %%%%%%%
   error = (X * theta) - y;
   theta = theta - ((alpha/m) * X'*error);
   %%%%%%%%%%%%%%%%%%%%%%%%%
   
   % ============================================================
  
   % Save the cost J in every iteration    
   J_history(iter) = computeCost(X, y, theta);
  
  end
end
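
A hypothetical usage example; ex1's script passes in the food-truck data, and the toy data below just shows the calling convention:

  X = [ones(5, 1) (1:5)'];      % toy design matrix
  y = 1 + 2 * (1:5)';           % exactly linear: y = 1 + 2x
  [theta, J_history] = gradientDescent(X, y, zeros(2, 1), 0.05, 1500);
  theta                         % should approach [1; 2]
  J_history(end)                % should be close to 0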

computeCostMulti.m :

function J = computeCostMulti(X, y, theta)
  %COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
  %   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
  %   parameter for linear regression to fit the data points in X and y
  
  % Initialize some useful values
  m = length(y); % number of training examples
  
  % You need to return the following variables correctly 
  J = 0;
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the cost of a particular choice of theta
  %               You should set J to the cost.
  
  J = (1/(2*m))*(sum(((X*theta)-y).^2));

  % =========================================================================

end

 

gradientDescentMulti.m :

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  %GRADIENTDESCENTMULTI Performs gradient descent to learn theta
  %   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
  %   taking num_iters gradient steps with learning rate alpha
  
  % Initialize some useful values
  m = length(y); % number of training examples
  J_history = zeros(num_iters, 1);
  
  for iter = 1:num_iters
  
   % ====================== YOUR CODE HERE ======================
   % Instructions: Perform a single gradient step on the parameter vector
   %               theta. 
   %
   % Hint: While debugging, it can be useful to print out the values
   %       of the cost function (computeCostMulti) and gradient here.
   %
  
   %%%%%%%% CORRECT %%%%%%%%%%   
   error = (X * theta) - y;
   theta = theta - ((alpha/m) * X'*error);
   %%%%%%%%%%%%%%%%%%%%%%%%%%%
  
   % ============================================================
  
   % Save the cost J in every iteration    
   J_history(iter) = computeCostMulti(X, y, theta);
  
  end
end

Coursera: Machine Learning (Week 2) Quiz – Linear Regression with Multiple Variables

 

    1. Suppose m=4 students have taken some classes, and the class had a midterm exam and a final exam. You have collected a dataset of their scores on the two exams, which is as follows:
      [figure: table of midterm and final exam scores]
      You’d like to use polynomial regression to predict a student’s final exam score from their midterm exam score. Concretely, suppose you want to fit a model of the form h_θ(x) = θ_0 + θ_1x_1 + θ_2x_2, where x_1 is the midterm score and x_2 is (midterm score)^2. Further, you plan to use both feature scaling (dividing by the “max−min”, or range, of a feature) and mean normalization.
      What is the normalized feature x_2^(4)? (Hint: midterm = 69, final = 78 is training example 4.) Please round off your answer to two decimal places and enter it in the text box below.

 -0.47
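
To see where -0.47 comes from, here is the computation in Octave. The midterm scores below are an assumption, since the table is only an image here; a commonly seen version of this question uses 89, 72, 94 and 69:

  midterm = [89; 72; 94; 69];                        % assumed scores from the table
  x2 = midterm .^ 2;                                 % the squared feature
  x2_norm = (x2 - mean(x2)) / (max(x2) - min(x2));   % mean-normalize, scale by the range
  x2_norm(4)                                         % example 4, about -0.47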


  1. You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases slowly and is still decreasing after 15 iterations. Based on this, which of the following conclusions seems most plausible?
    •  Rather than use the current value of α, it’d be more promising to try a larger value of α (say α = 1.0).
    •  Rather than use the current value of α, it’d be more promising to try a smaller value of α (say α = 0.1).
    •  α = 0.3 is an effective choice of learning rate.

  1. You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases quickly then levels off. Based on this, which of the following conclusions seems most plausible?
    •  Rather than use the current value of α, it’d be more promising to try a larger value of α (say α = 1.0).
    •  Rather than use the current value of α, it’d be more promising to try a smaller value of α (say α = 0.1).
    •  α = 0.3 is an effective choice of learning rate.

 


  1. Suppose you have m = 23 training examples with n = 5 features (excluding the additional all-ones feature for the intercept term, which you should add). The normal equation is θ = (X^T X)^(−1) X^T y. For the given values of m and n, what are the dimensions of θ, X, and y in this equation?
      •  X is 23 × 5, y is 23 × 1, θ is 5 × 5
      •  X is 23 × 6, y is 23 × 6, θ is 6 × 6
      •  X is 23 × 6, y is 23 × 1, θ is 6 × 1

     X has m rows and n+1 columns (the +1 is because of the x_0 = 1 intercept term). y is an m-vector, and θ is an (n+1)-vector.

    •  X is 23 × 5, y is 23 × 1, θ is 5 × 1
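
You can confirm the dimensions with a throwaway example in Octave (random data, shaped like the question):

  m = 23; n = 5;
  X = [ones(m, 1) rand(m, n)];    % 23 x 6 once the intercept column is added
  y = rand(m, 1);                 % 23 x 1
  theta = pinv(X' * X) * X' * y;  % the normal equation
  size(theta)                     % 6 x 1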

 

  1. Suppose you have a dataset with m = 1000000 examples and n = 200000 features for each example. You want to use multivariate linear regression to fit the parameters θ to the data. Should you prefer gradient descent or the normal equation?
      •  Gradient descent, since it will always converge to the optimal θ.
      •  Gradient descent, since computing (X^T X)^(−1) will be very slow in the normal equation.

     With n = 200000 features, you would have to invert a 200001 × 200001 matrix to compute the normal equation. Inverting such a large matrix is computationally expensive, so gradient descent is a good choice.

    •  The normal equation, since it provides an efficient way to directly find the solution.
    •  The normal equation, since gradient descent might be unable to find the optimal θ.

 

Coursera: Machine Learning (Week 2) Quiz – Octave/Matlab Tutorial

 

  1. Suppose I first execute the following Octave/Matlab commands:
    A = [1 2; 3 4; 5 6];
    B = [1 2 3; 4 5 6];

    Which of the following are then valid commands? Check all that apply. (Hint: A’ denotes the transpose of A.)

    •  C = A * B;
    •  C = B’ + A;
    •  C = A’ * B;
    • C = B + A;

 


  1. Let

    Which of the following indexing expressions gives

    Check all that apply.

    •  B = A(:, 1:2);
    •  B = A(1:4, 1:2);
    •  B = A(:, 0:2);
    • B = A(0:4, 0:2);

 


  1. Let A be a 10×10 matrix and x be a 10-element vector. Your friend wants to compute the product Ax and writes the following code:
    v = zeros(10, 1);
    for i = 1:10
        for j = 1:10
            v(i) = v(i) + A(i, j) * x(j);
        end
    end

    How would you vectorize this code to run without any for loops? Check all that apply.

    •  v = A * x;
    •  v = Ax;
    •  v = x’ * A;
    •  v = sum (A * x);
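
You can verify that the vectorized form matches the loop:

  A = magic(10); x = (1:10)';     % any 10x10 matrix and 10-vector will do
  v = zeros(10, 1);
  for i = 1:10
    for j = 1:10
      v(i) = v(i) + A(i, j) * x(j);
    end
  end
  max(abs(v - A * x))             % 0: the loop and A*x agree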

  1. Say you have two column vectors v and w, each with 7 elements (i.e., they have dimensions 7×1). Consider the following code:
    z = 0;
    for i = 1:7
        z = z + v(i) * w(i)
    end

    Which of the following vectorizations correctly compute z? Check all that apply.

    •  z = sum (v .* w);
    •  z = w’ * v;
    •  z = v * w’;
    •  z = w * v’;

 

  1. In Octave/Matlab, many functions work on single numbers, vectors, and matrices. For example, the sin function, when applied to a matrix, will return a new matrix with the sin of each element. But you have to be careful, as certain functions have different behavior. Suppose you have a 7×7 matrix X. You want to compute the log of every element, the square of every element, add 1 to every element, and divide every element by 4. You will store the results in four matrices, A, B, C, D. One way to do so is the following code:
    for i = 1:7
        for j = 1:7
            A(i, j) = log(X(i, j));
            B(i, j) = X(i, j) ^ 2;
            C(i, j) = X(i, j) + 1;
            D(i, j) = X(i, j) / 4;
        end
    end

    Which of the following correctly compute A, B, C or D? Check all that apply.

    •  C = X + 1;
    •  D = X / 4;
    •  A = log (X);
    •  B = X ^ 2;
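
The trap here is B: for a matrix, X ^ 2 means X * X (matrix multiplication), while X .^ 2 squares each element. A quick check:

  X = magic(7);                   % any 7x7 matrix
  A = log(X);                     % element-wise log
  B = X .^ 2;                     % element-wise square; X ^ 2 would be X * X
  C = X + 1;                      % adds 1 to every element
  D = X / 4;                      % divides every element by 4
  max(max(abs(X ^ 2 - X .^ 2)))   % nonzero: the two operators differ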

Coursera: Machine Learning (Week 3) Programming Assignment – Logistic Regression

 

plotData.m :

function plotData(X, y)
  %PLOTDATA Plots the data points X and y into a new figure 
  %   PLOTDATA(x,y) plots the data points with + for the positive examples
  %   and o for the negative examples. X is assumed to be a Mx2 matrix.
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Plot the positive and negative examples on a
  %               2D plot, using the option 'k+' for the positive
  %               examples and 'ko' for the negative examples.
  %
  
  %Separating positive and negative results
  pos = find(y==1); %index of positive results
  neg = find(y==0); %index of negative results
  
  % Create New Figure
  figure;
  
  %Plotting Positive Results on 
  %    X_axis: Exam1 Score =  X(pos,1)
  %    Y_axis: Exam2 Score =  X(pos,2)
  plot(X(pos,1),X(pos,2),'g+');
  
  %To keep above plotted graph as it is.
  hold on;  
  
  %Plotting Negative Results on 
  %    X_axis: Exam1 Score =  X(neg,1)
  %    Y_axis: Exam2 Score =  X(neg,2)
  plot(X(neg,1),X(neg,2),'ro');
  
  % =========================================================================
  
  hold off;
end

 

sigmoid.m :

function g = sigmoid(z)
  %SIGMOID Compute sigmoid function
  %   g = SIGMOID(z) computes the sigmoid of z.
  
  % You need to return the following variables correctly 
  g = zeros(size(z));
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the sigmoid of each value of z (z can be a matrix,
  %               vector or scalar).
  g = 1./(1+exp(-z));
  
  % =============================================================
end
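
Two quick checks that the implementation behaves: sigmoid(0) must be 0.5, and it must work element-wise on vectors and matrices:

  sigmoid(0)             % 0.5
  sigmoid([-10 0 10])    % approximately [0  0.5  1]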

costFunction.m :

function [J, grad] = costFunction(theta, X, y)
  %COSTFUNCTION Compute cost and gradient for logistic regression
  %   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
  %   parameter for logistic regression and the gradient of the cost
  %   w.r.t. to the parameters.
  
  % Initialize some useful values
  m = length(y); % number of training examples
  
  % You need to return the following variables correctly 
  J = 0;
  grad = zeros(size(theta));
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the cost of a particular choice of theta.
  %               You should set J to the cost.
  %               Compute the partial derivatives and set grad to the partial
  %               derivatives of the cost w.r.t. each parameter in theta
  %
  % Note: grad should have the same dimensions as theta
  %
  %DIMENSIONS: 
  %   theta = (n+1) x 1
  %   X     = m x (n+1)
  %   y     = m x 1
  %   grad  = (n+1) x 1
  %   J     = Scalar
  
  z = X * theta;      % m x 1
  h_x = sigmoid(z);   % m x 1 
  
  J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))); % scalar
  
  grad = (1/m)* (X'*(h_x-y));     % (n+1) x 1
  
  % =============================================================
  
end

predict.m :

function p = predict(theta, X)
  %PREDICT Predict whether the label is 0 or 1 using learned logistic 
  %regression parameters theta
  %   p = PREDICT(theta, X) computes the predictions for X using a 
  %   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
  
  m = size(X, 1); % Number of training examples
  
  % You need to return the following variables correctly
  p = zeros(m, 1);
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Complete the following code to make predictions using
  %               your learned logistic regression parameters. 
  %               You should set p to a vector of 0's and 1's
  %
  % Dimensions:
  % X     =  m x (n+1)
  % theta = (n+1) x 1
  
  h_x = sigmoid(X*theta);
  p=(h_x>=0.5);
  
  %p = double(sigmoid(X * theta)>=0.5);
  % =========================================================================
end
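
Once predict works, the exercise script reports training accuracy with the usual pattern (this assumes X, y and the learned theta are already in the workspace):

  p = predict(theta, X);
  fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);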

costFunctionReg.m :

function [J, grad] = costFunctionReg(theta, X, y, lambda)
  %COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
  %   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
  %   theta as the parameter for regularized logistic regression and the
  %   gradient of the cost w.r.t. to the parameters. 
  
  % Initialize some useful values
  m = length(y); % number of training examples
  
  % You need to return the following variables correctly 
  J = 0;
  grad = zeros(size(theta));
  
  % ====================== YOUR CODE HERE ======================
  % Instructions: Compute the cost of a particular choice of theta.
  %               You should set J to the cost.
  %               Compute the partial derivatives and set grad to the partial
  %               derivatives of the cost w.r.t. each parameter in theta
  
  %DIMENSIONS: 
  %   theta = (n+1) x 1
  %   X     = m x (n+1)
  %   y     = m x 1
  %   grad  = (n+1) x 1
  %   J     = Scalar
  
  z = X * theta;      % m x 1
  h_x = sigmoid(z);  % m x 1 
  
  reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);
  
  J = (1/m)*sum((-y.*log(h_x))-((1-y).*log(1-h_x))) + reg_term; % scalar
  
  grad(1) = (1/m)* (X(:,1)'*(h_x-y));                                  % 1 x 1
  grad(2:end) = (1/m)* (X(:,2:end)'*(h_x-y))+(lambda/m)*theta(2:end);  % n x 1
  
  % =============================================================
end
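
In ex2 these cost functions are handed to fminunc rather than minimized with a hand-written descent loop. A sketch of the call, assuming X, y and lambda have been set up by the script:

  initial_theta = zeros(size(X, 2), 1);
  options = optimset('GradObj', 'on', 'MaxIter', 400);
  [theta, J] = fminunc(@(t) costFunctionReg(t, X, y, lambda), initial_theta, options);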

Coursera: Machine Learning (Week 3) Quiz – Logistic Regression

  1. Suppose that you have trained a logistic regression classifier, and it outputs on a new example x a prediction h_θ(x) = 0.2. This means (check all that apply):
    •  Our estimate for P(y = 1|x; θ) is 0.8.

       h_θ(x) gives P(y = 1|x; θ), not 1 − P(y = 1|x; θ).

    •  Our estimate for P(y = 0|x; θ) is 0.8.

       Since we must have P(y = 0|x; θ) = 1 − P(y = 1|x; θ), the former is
      1 − 0.2 = 0.8.

    •  Our estimate for P(y = 1|x; θ) is 0.2.

       h_θ(x) is precisely P(y = 1|x; θ), so each is 0.2.

    •  Our estimate for P(y = 0|x; θ) is 0.2.

       h_θ(x) is P(y = 1|x; θ), not P(y = 0|x; θ).

 


  1. Suppose you have the following training set, and fit a logistic regression classifier h_θ(x) = g(θ_0 + θ_1x_1 + θ_2x_2).
    [figure: training set table and scatter plot]
    Which of the following are true? Check all that apply.

    •  Adding polynomial features (e.g., instead using h_θ(x) = g(θ_0 + θ_1x_1 + θ_2x_2 + θ_3x_1^2 + θ_4x_1x_2 + θ_5x_2^2)) could increase how well we can fit the training data.
    •  At the optimal value of θ (e.g., found by fminunc), we will have J(θ) ≥ 0.
    •  Adding polynomial features (e.g., instead using h_θ(x) = g(θ_0 + θ_1x_1 + θ_2x_2 + θ_3x_1^2 + θ_4x_1x_2 + θ_5x_2^2)) would increase J(θ) because we are now summing over more terms.
    •  If we train gradient descent for enough iterations, for some examples x^(i) in the training set it is possible to obtain h_θ(x^(i)) = 1.

 


  1. For logistic regression, the gradient is given by ∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i). Which of these is a correct gradient descent update for logistic regression with a learning rate of α? Check all that apply.
    •   (simultaneously update for all j).
    •  .
    •   (simultaneously update for all j).
    •   (simultaneously update for all j).
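
Vectorized, the correct update is a single line. A self-contained sketch (the data is made up just so it runs):

  m = 4; alpha = 0.1;
  X = [ones(m, 1) (1:m)'];                  % design matrix with intercept
  y = [0; 0; 1; 1];
  theta = zeros(2, 1);
  g = @(z) 1 ./ (1 + exp(-z));              % sigmoid
  grad = (1/m) * X' * (g(X * theta) - y);   % (1/m) times the sum of (h - y) * x
  theta = theta - alpha * grad              % simultaneous update for all j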

 


  1. Which of the following statements are true? Check all that apply.
    •  The one-vs-all technique allows you to use logistic regression for problems in which each y^(i) comes from a fixed, discrete set of values.

       If each y^(i) is one of k different values, we can give a label to each class and use one-vs-all as described in the lecture.

    •  For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).

       The cost function for logistic regression is convex, so gradient descent will always converge to the global minimum. We still might use a more advanced optimization algorithm since they can be faster and don’t require you to select a learning rate.

    •  The cost function J(θ) for logistic regression trained with m ≥ 1 examples is always greater than or equal to zero.

       The cost for any single example is always ≥ 0, since it is the negative log of a quantity less than one. The cost function J(θ) is a summation over the cost for each example, so the cost function itself must be greater than or equal to zero.

    •  Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).

       We will need 3 classifiers, one for each class.

Suppose you train a logistic classifier h_θ(x) = g(θ_0 + θ_1x_1 + θ_2x_2). Suppose θ_0 = −6, θ_1 = 1, θ_2 = 0. Which of the following figures represents the decision boundary found by your classifier?

  •   Figure: [image]

     In this figure, we transition from negative to positive as x_1 goes from left of 6 to right of 6, which is true for the given values of θ.

  •   Figure: [image]
  •   Figure: [image]
  •   Figure: [image]

 

Coursera: Machine Learning (Week 3) Quiz – Regularization

 

  1. You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.
    •  Introducing regularization to the model always results in equal or better performance on the training set.
    •  Introducing regularization to the model always results in equal or better performance on examples not in the training set.
    •  Adding a new feature to the model always results in equal or better performance on the training set.
    • Adding many new features to the model helps prevent overfitting on the training set.

 


  1. Suppose you ran logistic regression twice, once with λ = 0, and once with λ = 1. One of the times you got one set of parameters θ, and the other time you got a different set. However, you forgot which value of λ corresponds to which set of parameters. Which one do you think corresponds to λ = 1?
    •  

       When λ is set to 1, we use regularization to penalize large values of θ. Thus, the parameters θ obtained will in general have smaller values.

    •  

  1. Suppose you ran logistic regression twice, once with λ = 0, and once with λ = 1. One of the times you got one set of parameters θ, and the other time you got a different set. However, you forgot which value of λ corresponds to which set of parameters. Which one do you think corresponds to λ = 1?
    •  
    •  

       When λ is set to 1, we use regularization to penalize large values of θ. Thus, the parameters θ obtained will in general have smaller values.


  1. Which of the following statements about regularization are true? Check all that apply.
    •  Using a very large value of λ cannot hurt the performance of your hypothesis; the only reason we do not set λ to be too large is to avoid numerical problems.
    •  Because logistic regression outputs values 0 ≤ h_θ(x) ≤ 1, its range of output values can only be “shrunk” slightly by regularization anyway, so regularization is generally not helpful for it.
    •  Consider a classification problem. Adding regularization may cause your classifier to incorrectly classify some training examples (which it had correctly classified when not using regularization, i.e. when λ = 0).
    •  Using too large a value of λ can cause your hypothesis to overfit the data; this can be avoided by reducing λ.

  1. Which of the following statements about regularization are true? Check all that apply.
    •  Using a very large value of λ cannot hurt the performance of your hypothesis; the only reason we do not set λ to be too large is to avoid numerical problems.
    •  Because logistic regression outputs values 0 ≤ h_θ(x) ≤ 1, its range of output values can only be “shrunk” slightly by regularization anyway, so regularization is generally not helpful for it.
    •  Because regularization causes J(θ) to no longer be convex, gradient descent may
      not always converge to the global minimum (when λ > 0, and when using an
      appropriate learning rate α).
    •  Using too large a value of λ can cause your hypothesis to underfit the data; this can be avoided by reducing λ.

 


  1. In which one of the following figures do you think the hypothesis has overfit the training set?
    •  Figure: [image]
    •  Figure: [image]
    •  Figure: [image]
    •  Figure: [image]

In which one of the following figures do you think the hypothesis has underfit the training set?

  •  Figure: [image]
  •  Figure: [image]
  •  Figure: [image]
  •  Figure: [image]

 
