901 Matching Annotations
  1. Jul 2020
    1. The tiny and small models seemed to do fairly well, with the tiny model producing the smallest discrepancy

      I agree. Do you think the small model outperforms the medium-sized one?

  2. darrinlilly.github.io
    1. More specifically, the first row depicts the 7 stored at index 0, the second row depicts the 7 stored at index 17, and the 7 stored at index 26 of the testing images.

      excellent investigative design

    2. Interestingly, this filter also produced wavy lines across the sky

      that is really interesting, as that underlying structure was not immediately apparent to me upon first impression

    3. As such, the outer edges of the image are not processed because there must be 8 directly adjacent pixels to the current pixel.

      excellent interpretation of the process

    1. I like how the hand came out through it. I’d probably given time filter the contrast of the image so that Earth would be broken up into pieces more like the arm.

      Nicely done! I'm sure the review committee will be very interested in your submittal.

  3. ashuang2013.github.io
    1. never able to comprehend but captivated by its beauty nonetheless.

      How interesting! I'm certain the review committee will be intrigued by your submittal. I find it captivating myself!

    1. In this case, comparing the 4 models demonstrates the concept: a larger model has more power to generalize more complex amounts of data, but overfits easily if the model is overqualified to generalize the data set.

      excellent

    1. Adding more convolutions would increase the time it took to train the data even more with minimal returns on the increase to accuracy since the accuracy is already at 99.4%

      good

    1. Below, is my image for the Jump Start Data Science T-shirt competition, and also my assignment for Project 2!

      Nice artwork! I'm certain the review committee will be very interested to consider your submittal.

    1. With convolutional and pooling layers, it appears that more and more of the original image is reduced to these basic outlines of features. That is, there appears to be more information lost, but less noise.

      excellent!

    1. This stylized image is supposed to be what our vision looks like after sitting at the computer for hours after our bedtime due to various issues with our coding

      Very interesting! A starry inside night -- great work, I'm sure the review committee will be excited to review your submittal!

    1. This provides a useful tool for investigating the co-relationship amongst the variables, as we can see how they interact independently from all of the other variables

      good

  4. caitlin0806.github.io
    1. the letters to take the color scheme of the microchip that happened to also be W&M colors

      What a novel idea! A great look for the review committee to consider during its selection process.

  5. caitlin0806.github.io
  6. caitlin0806.github.io
    1. thinking about what is going on the background is that the pixels are being manipulated in a way that it uses nearby pixels to guess and change so that the result comes out different

      nice interpretation

  7. sarenaoberoi.github.io
    1. This summer was very busy and I thought that choosing this image would remind me to relax and take a step back to experience nature and all of the beautiful things that surround me

      Arnold is a very serene robot! I'm sure the review committee will be excited to consider your entry. Good job!

  8. sarenaoberoi.github.io
    1. The images the model trained and tested on are imperfect in comparison to the fashion mnist and numbers mnist datasets so the model is fairly inaccurate. We can also tell that the model is overfit because it reached a training score of 100% while the validation score remained at a low ~75%.

      good

    1. I do have questions concerning this method; rescaling an image that potentially does not have the same aspect ratio could distort the image and thus the accuracy

      Perhaps you could suggest a model-based approach that would be more effective?

    1. Although the filter applied was meant to detect edges, our values may have overblown the exposure, thus wiping out a lot of the edges. This filter would probably not be very effective for edge detection.

      I'm wondering: what if you also applied a weight?

  9. caitlin0806.github.io
  10. caitlin0806.github.io
    1. The 4 bedroom house is the best deal and the 3 bedroom house is the worst deal. In my model the predicted value was subtracted from the actual price and the lowest number (most negative tells me that that house had the best deal, which was the 4 bedroom house) and vice versa for the 3 bedroom house.

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.
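      For example, here is a minimal pandas sketch of the kind of table I have in mind (hypothetical house names and illustrative prices; substitute your own actual and predicted values):

      ```python
      import pandas as pd

      # Illustrative values only -- replace with your own actual and model-predicted prices
      houses = pd.DataFrame({
          'house':     ['hudgins', 'church', 'moon'],
          'bedrooms':  [3, 4, 2],
          'actual':    [97000, 399000, 250000],
          'predicted': [235000, 300000, 123000],
      })

      # A negative difference means the asking price is below the model's prediction (a better "deal")
      houses['difference'] = houses['actual'] - houses['predicted']
      print(houses.sort_values('difference').to_string(index=False))
      ```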

    2. Traditional Programming: you have rules and data come in and answers that come out. It is manual and hardcoded input of rules. Machine Learning: you have answers and data aka labeling come in and rules that match one to the other come out. Algorithm is used so it is not manual.

      OK. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. As Maroney explains, “if you show them all the shoes, then there’s no point. You’d have to show them some shoes, then let them train by identifying and picking out new things that they’ve never seen before.”

      good

    1. On the other hand, the worst deal is Moon (2 bd 250k) because the predicted price is two-thirds of the actual price, so the actual price would be “way too high” and thus “not a good deal”!

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    2. No, because the model takes steps towards the right answer in different sizes. The epochs (number of loops/iterations) is also limited to 500 so it will not always be the same as it doesn’t always run until 100% completion, but will get pretty close.

      OK, but do you think probability has anything to do with the two results being almost the same, yet still different?

    3. Simplified, it’s that traditional programming is rules + data –> answers, vs machine learning is answers + data –> rules.

      Very good. Could you also elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

  11. darrinlilly.github.io
  12. darrinlilly.github.io
    1. The Hudgins house was the best deal based on bedrooms because my program says the 3 bedroom house should be worth 245k, the worst house is the moon house because the price of the house is 250k my model said it should be 123k.

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    2. Machine Learning is taking answers and data and trying to find rules, while traditional programming is trying to take rules and data to get answers

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. I think it is safe to say from this exercise that the amount of rooms in a house cannot be the only determining factor for house price.

      Very good - an ordered table would also help to support your argument

    2. The machine (watch) uses data and pre-supplied information (answers) in order to determine when the user is performing a certain task or activity (rules). In a traditional sense, the machine should have been given the rules in order to determine the answers but in a situation like this one, that order to operation does not make sense given how user specific the data is.

      Nice thoughtful and comprehensive response.

  13. sarenaoberoi.github.io
    1. By training the model on more people, it is possible for the model to identify a wider range of individuals and more accurately distinguish social distancing violations.

      Good work!

  14. sarenaoberoi.github.io
  15. sarenaoberoi.github.io
  16. sarenaoberoi.github.io
    1. Because of this, it is possible for the values to be different depending on how much error the loss deems from the guess.

      Yes, due to the guess itself (probability)

    2. This is different from traditional programming, in that with machine learning, the computer/program will be able to define the rules as the output, while traditional programming requires the user to enter the rules as an input.

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. For example, if we were to input customer demographics and transactions, and then have historical customer churning rates as an output. Using these two characteristics, the algorithm will then create the program and will give you predictions (in this case) based on the data you provided.

      Good. I like the applicable example.

    1. Based on my model, the house at Holly Point Rd. presents the best deal as you would be spending $134,365 less than what the model predicted. Meanwhile, the house in Church St. would present the worst deal as you would be paying $98,088 more than the model predicted.

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    2. These are different answers because the model is recompiled and relearning the data set and there was no random seed set.

      excellent! the perfect computer scientist answer

    3. Meanwhile, machine learning involves inputting the answers into a machine and having it figure out the rules for the programmer.

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. It’s an upward trend as people are getting more reliant on the use of technology.

      Perhaps it also is related to the fact that the rate of human population growth is increasing, as well as the complexity associated with larger numbers of people.

    1. What is the shape of the images training set (how many and the dimension of each)? 28x28 & 60,000 What is the length of the labels training set? 60,000 What is the shape of the images test set? 10,000

      perfect!

    2. By making a neural net model, we can estimate what each house should be priced based on bedroom number. Looking at the output of the code, we see that 160 Holly Point Rd, a house with three bedrooms selling for $97,000, is the best deal. Based on the model, we see that 760 New Point Comfort Hwy, a five bedroom house selling for $577,200, is the worst deal.

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    3. This model could have been strengthened by using square footage instead of bedroom count in the input. Thus, it would account for other spaces and bathroom count.

      Yes, very good - adding an additional predictive variable would likely improve the model. What about more observations?
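      As a sketch of what that could look like (the same single-neuron regression as the exercise, but fed two features per house; the square-footage and price values here are made up):

      ```python
      import numpy as np
      import tensorflow as tf

      # Hypothetical training data: [bedrooms, square footage in thousands], price in hundreds of thousands
      xs = np.array([[2, 1.2], [3, 1.65], [3, 1.8], [4, 2.25], [5, 3.0]], dtype=float)
      ys = np.array([1.7, 2.2, 2.4, 3.0, 3.7], dtype=float)

      # One Dense neuron, now with a two-feature input instead of bedrooms alone
      model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[2])])
      model.compile(optimizer='sgd', loss='mean_squared_error')
      model.fit(xs, ys, epochs=500, verbose=0)

      print(model.predict(np.array([[3, 1.5]])))  # predicted price for a 3 bd, 1,500 sq ft house
      ```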

    4. Because it is a stochastic process, the answers will be very close but not often the same

      Very good -- how often do you think it will in fact be the same?

    5. Traditional Programming: One inputs rules and data in order to derive answers Machine Learning: One inputs data and answers in order to derive rules

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. The optimizer and loss function are used when compiling the model and the ones used, “adam” and “sparse_categorical_crossentropy” are useful when classifying multiple categories.

      Could you have provided a bit more explanation of how these two functions serve to improve the prediction from the neural net model?

    1. The Hudgins house presents a great deal, because my neural network estimated that the price of a 3 bedroom house should be around $233,000, but the house is only priced at $97,000. One of the worst deals is the church house, because the network determined that a 4 bedroom house should be priced at around $300,000, but the house is instead priced at $399,000.

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    2. In traditional programming, the programmer will input data along with rules. From the combination of the two, the model will predict the answers. However, machine learning is a reorientation, as the programmer inputs data and answers, and the model instead figures out the rules. For example, it can figure out the relationship and other rules between variables.

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. his wasn’t a question and, consequently, I’m not sure how to answer this, so here’s my code if that helps:

      import tensorflow as tf
      import numpy as np
      import matplotlib.pyplot as plt

      # Early-stopping callback and MNIST digits data, as set up earlier in the notebook
      class myCallback(tf.keras.callbacks.Callback):
          def on_epoch_end(self, epoch, logs={}):
              if logs.get('accuracy') > 0.99:
                  print('\nReached 99% accuracy so cancelling training!')
                  self.model.stop_training = True

      mnist = tf.keras.datasets.mnist
      (x_train, y_train), (x_test, y_test) = mnist.load_data()
      x_train, x_test = x_train / 255.0, x_test / 255.0
      class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

      # Build and train the model
      callbacks = myCallback()
      model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(512, activation=tf.nn.relu),
          tf.keras.layers.Dense(10, activation=tf.nn.softmax)
      ])
      model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
      model.fit(x_train, y_train, epochs=10, callbacks=[callbacks])

      # Make predictions (note: the model already ends in softmax, so this extra Softmax layer is redundant)
      probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
      predictions = probability_model.predict(x_test)
      predictions[1000]
      np.argmax(predictions[1000])
      y_test[1000]

      # Plot a test image, labeled blue if the prediction is correct and red if not
      def plot_image(i, predictions_array, true_label, img):
          true_label, img = true_label[i], img[i]
          plt.grid(False)
          plt.xticks([])
          plt.yticks([])
          plt.imshow(img, cmap=plt.cm.binary)
          predicted_label = np.argmax(predictions_array)
          color = 'blue' if predicted_label == true_label else 'red'
          plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                     100 * np.max(predictions_array), class_names[true_label]), color=color)

      # Plot the predicted probability for each of the 10 classes
      def plot_value_array(i, predictions_array, true_label):
          true_label = true_label[i]
          plt.grid(False)
          plt.xticks(range(10))
          plt.yticks([])
          thisplot = plt.bar(range(10), predictions_array, color="#777777")
          plt.ylim([0, 1])
          predicted_label = np.argmax(predictions_array)
          thisplot[predicted_label].set_color('red')
          thisplot[true_label].set_color('blue')

      i = 1000
      plt.figure(figsize=(6, 3))
      plt.subplot(1, 2, 1)
      plot_image(i, predictions[i], y_test, x_test)
      plt.subplot(1, 2, 2)
      plot_value_array(i, predictions[i], y_test)
      plt.show()

      # Plot the first X test images, their predicted labels, and the true labels.
      # Color correct predictions in blue and incorrect predictions in red.
      num_rows = 5
      num_cols = 3
      num_images = num_rows * num_cols
      plt.figure(figsize=(2 * 2 * num_cols, 2 * num_rows))
      for i in range(num_images):
          plt.subplot(num_rows, 2 * num_cols, 2 * i + 1)
          plot_image(i, predictions[i], y_test, x_test)
          plt.subplot(num_rows, 2 * num_cols, 2 * i + 2)
          plot_value_array(i, predictions[i], y_test)
      plt.tight_layout()
      plt.show()

    1. According to the model (trained on the new house data), the Church St home is the most overvalued (and is therefore the worst deal) at 99k over model price (300k). The best buy would be Holly Point which is undervalued at 138k below model price (235k)

      OK, but could you add a table to more clearly communicate your results. Perhaps order from best "deal" to worst?

    2. Machine learning, however, takes answers and data as an input and the model creates (or, perhaps more appropriately, guesses) the rules.

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. There is not a large impact. The loss is 0.2775 and takes 45s to train. The accuracy is better by about .025 and takes about 15s longer compared to a single Dense layer of 512

      Good assessment of the computational expense associated with this added line of code.

    2. There are 10 neurons to match the 10 expected outputs for the network. I get an InvalidArgumentError when I try training the network with 5 instead of 10 neurons in my last Dense layer

      Good

    3. I get a ShapeError. This is because our data is currently 28x28 pixel images, and we cannot have a 2D network. We need to flatten the 2D array into a 784 1D array for the model to work

      OK, good. I see these are the questions from the notebook. Thank you for providing these answers!

    4. I know this because 2. The 10th element on the list is the biggest, and the ankle boot is labeled 9. It should be noted that the 10th element is actually the digit 9 which represents the ankle boot since the neurons are numbered 0-9 for a length of 10

      OK, I see - I think the class structure from the fashion_MNIST dataset was still present when running the code on the MNIST dataset (digits)

    1. The home that costs $97,000 with 3 bd and 1 ba is the best deal since it has the greatest difference from the predicted price using the model that was fit to the 6 homes. The home that costs $577,200 with 5 bd and 2 ba is the worst deal since it has the greatest difference from the predicted price using the model. I fit a model on the given home prices and then compared each home’s bedroom to the original given model of 50 + 50x where x is the number of bedrooms. Then the model predicted the price for each bedroom. Then this predicted price was subtracted from the original given model with the same number of bedrooms. The result with the highest and lowest price are the worst and best houses respectively

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    2. For machine learning, instead of the programmer writing the rules, the machine will look at both the input and the output and find the rules that govern them

      Could you have further elaborated on this explanation? What makes this new design possible for machines to "learn"?

  17. caitlin0806.github.io
    1. I.D. 1) 60000 images 28 by 28 pixels 2) 60000 3) 10000 images 28 by 28 pixels 4) array output: [[9.6861429e-11 4.3787887e-07 9.9999952e-01 1.0604523e-09 1.2526731e-16 3.6759984e-10 2.3595672e-11 1.7706627e-14 8.6952684e-10 1.8218145e-17]]

      # Same setup as in the code block above: imports (tensorflow as tf, numpy as np,
      # matplotlib.pyplot as plt), the normalized MNIST digits data, class_names, and the
      # myCallback early-stopping callback.

      # Build and train the model
      callbacks = myCallback()
      model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(512, activation=tf.nn.relu),
          tf.keras.layers.Dense(10, activation=tf.nn.softmax)
      ])
      model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
      model.fit(x_train, y_train, epochs=10, callbacks=[callbacks])

      # Make predictions (note: the model already ends in softmax, so this extra Softmax layer is redundant)
      probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
      predictions = probability_model.predict(x_test)
      predictions[1000]
      np.argmax(predictions[1000])
      y_test[1000]

      # Plot a test image, labeled blue if the prediction is correct and red if not
      def plot_image(i, predictions_array, true_label, img):
          true_label, img = true_label[i], img[i]
          plt.grid(False)
          plt.xticks([])
          plt.yticks([])
          plt.imshow(img, cmap=plt.cm.binary)
          predicted_label = np.argmax(predictions_array)
          color = 'blue' if predicted_label == true_label else 'red'
          plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                     100 * np.max(predictions_array), class_names[true_label]), color=color)

      # Plot the predicted probability for each of the 10 classes
      def plot_value_array(i, predictions_array, true_label):
          true_label = true_label[i]
          plt.grid(False)
          plt.xticks(range(10))
          plt.yticks([])
          thisplot = plt.bar(range(10), predictions_array, color="#777777")
          plt.ylim([0, 1])
          predicted_label = np.argmax(predictions_array)
          thisplot[predicted_label].set_color('red')
          thisplot[true_label].set_color('blue')

      i = 1000
      plt.figure(figsize=(6, 3))
      plt.subplot(1, 2, 1)
      plot_image(i, predictions[i], y_test, x_test)
      plt.subplot(1, 2, 2)
      plot_value_array(i, predictions[i], y_test)
      plt.show()

      # Plot the first X test images, their predicted labels, and the true labels.
      # Color correct predictions in blue and incorrect predictions in red.
      num_rows = 5
      num_cols = 3
      num_images = num_rows * num_cols
      plt.figure(figsize=(2 * 2 * num_cols, 2 * num_rows))
      for i in range(num_images):
          plt.subplot(num_rows, 2 * num_cols, 2 * i + 1)
          plot_image(i, predictions[i], y_test, x_test)
          plt.subplot(num_rows, 2 * num_cols, 2 * i + 2)
          plot_value_array(i, predictions[i], y_test)
      plt.tight_layout()
      plt.show()

    1. I have attached a plot depicting the first test image alongside its probability distribution, below. Interestingly, the model was so confident in its prediction that the first test image depicted a 7, that the remaining probabilities don’t even appear on the graph.

      How interesting! I am greatly appreciating your thoughtful and thoroughly comprehensive responses. Please keep up the fantastic work!

    2. Doing so helps keep future positive neuron outputs from being cancelled out by any previous negative neuron outputs

      Excellent - and therefore improving the model's potential predictive power
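      A tiny illustration of that point, with made-up activation values:

      ```python
      import numpy as np

      # Hypothetical raw outputs from three neurons feeding the next layer
      raw = np.array([2.0, -3.0, 1.5])

      print(raw.sum())                 # 0.5 -- the negative output cancels most of the positive signal
      print(np.maximum(raw, 0).sum())  # 3.5 -- ReLU zeroes the negative output instead
      ```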

    3. However, the model then can’t be tested for accuracy on the same training set because it already knows what those images are supposed to depict.

      excellent

    1. As a result, I determined that the Hudgins house has the best value, at a price of $97,000, because it costs approximately $137,567 less than houses with three bedrooms are predicted to cost. The Hudgins house is thus the most undervalued. On the other hand, the Church house has the worst value, at a price of $399,000, because it costs approximately $99,193 more than houses with four bedroom are predicted to cost. The Church house is thus the most overvalued.

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    2. In traditional programming, programmers use rules and data to produce answers. However, machine learning almost reverses this process, as the programmer must know what answer they’re looking to receive and provide the data necessary to reach this answer. Then, the machine/computer will generate the rules necessary to reach that answer. In short, traditional programming yields answers based on rules and data, while machine learning yields rules based on answers and data.

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. D. Using the mnist drawings dataset (the dataset with the hand written numbers with corresponding labels) answer the following questions. 60000, 28, 28; 60000; 10000, 28, 28

      ```python
      import tensorflow as tf
      import numpy as np

      class Callback(tf.keras.callbacks.Callback):
          def on_epoch_end(self, epoch, logs={}):
              if (logs.get('accuracy') > 0.99):
                  print('\nReached 99% accuracy so cancelling training!')
                  self.model.stop_training = True

      mnist = tf.keras.datasets.mnist
      (x_train, y_train), (x_test, y_test) = mnist.load_data()
      x_train, x_test = x_train/255.0, x_test/255.0
      callbacks = Callback()
      model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28,28)),
          tf.keras.layers.Dense(512, activation=tf.nn.relu),
          tf.keras.layers.Dense(10, activation=tf.nn.softmax)
      ])
      model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
      model.fit(x_train, y_train, epochs=10, callbacks=[callbacks])
      probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
      predictions = probability_model.predict(x_test)
      random_image = np.random.randint(0, len(x_test))
      print('Random Image Number:', random_image)
      print(predictions[random_image])
      ```

      5.

      ```python
      print(np.argmax(predictions[random_image]))
      ```

      ```python
      def plot_value_array(i, predictions_array, true_label):
          predictions_array, true_label = predictions_array, true_label[i]
          plt.grid(False)
          plt.xticks(range(10))
          plt.yticks([])
          thisplot = plt.bar(range(10), predictions_array, color="#777777")
          plt.ylim([0, 1])
          predicted_label = np.argmax(predictions_array)
          thisplot[predicted_label].set_color('red')
          thisplot[true_label].set_color('blue')

      numbers = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
      plot_value_array(1, predictions[random_image], y_test)
      _ = plt.xticks(range(10), numbers, rotation=45)
      plt.show()
      ```

      # Same setup as in the code block above: imports (tensorflow as tf, numpy as np,
      # matplotlib.pyplot as plt), the normalized MNIST digits data, class_names, and the
      # myCallback early-stopping callback.

      # Build and train the model
      callbacks = myCallback()
      model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(512, activation=tf.nn.relu),
          tf.keras.layers.Dense(10, activation=tf.nn.softmax)
      ])
      model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
      model.fit(x_train, y_train, epochs=10, callbacks=[callbacks])

      # Make predictions (note: the model already ends in softmax, so this extra Softmax layer is redundant)
      probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
      predictions = probability_model.predict(x_test)
      predictions[1000]
      np.argmax(predictions[1000])
      y_test[1000]

      # Plot a test image, labeled blue if the prediction is correct and red if not
      def plot_image(i, predictions_array, true_label, img):
          true_label, img = true_label[i], img[i]
          plt.grid(False)
          plt.xticks([])
          plt.yticks([])
          plt.imshow(img, cmap=plt.cm.binary)
          predicted_label = np.argmax(predictions_array)
          color = 'blue' if predicted_label == true_label else 'red'
          plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                     100 * np.max(predictions_array), class_names[true_label]), color=color)

      # Plot the predicted probability for each of the 10 classes
      def plot_value_array(i, predictions_array, true_label):
          true_label = true_label[i]
          plt.grid(False)
          plt.xticks(range(10))
          plt.yticks([])
          thisplot = plt.bar(range(10), predictions_array, color="#777777")
          plt.ylim([0, 1])
          predicted_label = np.argmax(predictions_array)
          thisplot[predicted_label].set_color('red')
          thisplot[true_label].set_color('blue')

      i = 1000
      plt.figure(figsize=(6, 3))
      plt.subplot(1, 2, 1)
      plot_image(i, predictions[i], y_test, x_test)
      plt.subplot(1, 2, 2)
      plot_value_array(i, predictions[i], y_test)
      plt.show()

      # Plot the first X test images, their predicted labels, and the true labels.
      # Color correct predictions in blue and incorrect predictions in red.
      num_rows = 5
      num_cols = 3
      num_images = num_rows * num_cols
      plt.figure(figsize=(2 * 2 * num_cols, 2 * num_rows))
      for i in range(num_images):
          plt.subplot(num_rows, 2 * num_cols, 2 * i + 1)
          plot_image(i, predictions[i], y_test, x_test)
          plt.subplot(num_rows, 2 * num_cols, 2 * i + 2)
          plot_value_array(i, predictions[i], y_test)
      plt.tight_layout()
      plt.show()

    2. If we simply did this with training data, the model has already seen this data, so we could run into the issue of the model simply memorizing the classification of the training data and not actually being able to classify

      Could you have identified the term for this phenomenon that is common to neural networks?

    1. The Hudgins house is a 3 bedroom house that costs about $100K, but the model predicts that it would cost $215K, more than double the price.

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    2. The two answers are not the same, but they are very close. This is because the model trains on the data by guessing and then using the loss function’s output with the optimizer to produce smaller and smaller losses. The prediction converges to the same value (of 22 in this case), so the difference is very minimal, but the output is always slightly different.

      Good

    3. According to Maroney, the difference between traditional programming and machine learning is that traditional programming involves inputting rules and data for the computer to produce the answers as the output, but machine learning takes data and answers as the input and the computer tries to determine the rules based on the data and answers.

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. The data is split into two sets to reduce overfitting and to better verify its accuracy. The model is only trained off of the training data. Its accuracy on new data can be tested using the test data, as it has never seen the test data. The test data stands in for what we would be trying to predict, but with answers so you can verify it.

      good

    1. Moon is the worst deal, with my model predicting that you’d pay $109,710 more than you should. Hudgins was the best deal, with my model predicting that you’d pay $125,186 less than it should cost.

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    2. The answers are different because the neural network is retrained and fit on the data. If I had set a random seed the answers might have been the same

      Yes! (I got a chuckle from the fact that you identified how to reproduce the same answer twice using a seed.) You could have also described the probabilistic nature of the NN, which serves to produce two results that are almost the same, yet still slightly different. Isn't this also a remarkable fact?
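      For anyone who wants to try it, here is a minimal sketch of pinning the seeds before building and fitting the single-neuron model from the exercise (exact reproducibility can still depend on hardware and TensorFlow version):

      ```python
      import numpy as np
      import tensorflow as tf

      # Fix the random seeds so weight initialization and shuffling repeat exactly on each run
      np.random.seed(42)
      tf.random.set_seed(42)

      xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], dtype=float)   # bedrooms
      ys = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5], dtype=float)   # price in hundreds of thousands

      model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
      model.compile(optimizer='sgd', loss='mean_squared_error')
      model.fit(xs, ys, epochs=500, verbose=0)

      print(model.predict(np.array([7.0])))  # rerunning the whole script now reproduces this value
      ```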

    3. In traditional programming you use rules you write and data to get answers. Machine learning takes data and answers to make a set of rules for predicting on future data.

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

    1. Based off of the model Maroney’s exercise outlines, only the house on Holly Point Rd (file name hudgins) is a good deal (3 bedrooms is predicted to mean the house is around 200k, while this house is selling at near 100k). The worst deal is the house on Church St (file name church), which sells at 399k, about 150k above its modeled price.

      Good. Could you have provided a simple chart of the continuum of prices in order to support your finding?

    2. The episode introduces machine learning as answers and data generating rules/relationships instead of a programmer figuring out rules to compute answers. This means that activity that is hard to find a set of rules for as a programmer can be modeled by a network of “neurons” and then can generate predictions based on that network. He demonstrates this idea by setting up a single neuron machine learner which is used to generate the function of a line from a list of values derived from said function. Having only a single neuron presents a very simple situation where we see only one rule that presumably will lead to a lot of rules being generated from a large set of neurons interconnected.

      Excellent!

    1. This graph displays the accuracy of the model in predicting the connotation of a movie review compared to the number of epochs the model cycled through. The positive trend is typical as the optimizer is adjusting the cost function with the returns from the loss function. There seems to be a plateau at 4 epochs, meaning after 4 epochs we are gaining less returns in run time versus accuracy and may be overfitting the model.

      Excellent work. I don't know why, but they usually present the results from the accuracy function first and the loss function second. Maybe it relates to assessing overfitting. Just an observation.

    1. The loss function gives penalties to the model predictions that are incorrect. These penalties are fed into the optimizer that adjusts the weights for each neuron based upon the size of the penalty from the loss function.

      Could you elaborate on how this is measured?
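      As a hint of how that penalty is measured here: sparse categorical cross-entropy is just the negative log of the probability the model assigned to the true class (made-up probabilities for a single example):

      ```python
      import numpy as np

      # Hypothetical softmax output for one image, and its true label
      probs = np.array([0.05, 0.02, 0.70, 0.03, 0.04, 0.05, 0.03, 0.03, 0.03, 0.02])
      true_label = 2

      loss = -np.log(probs[true_label])  # sparse categorical cross-entropy for this one example
      print(loss)  # ~0.36 -- a confident, correct prediction earns only a small penalty
      ```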

    2. He splits the data into training and testing because if we only trained the model on one set of data, it will get overfit. It allows for the verification of the model’s accuracy in identifying images it has not seen before in training.

      good

    1. When subtracting the predicted cost from the posted cost, house 2 yields the largest negative value, meaning it is the largest difference between the estimated price and what you would actually pay. This is the best deal because the buying offer is the lowest compared to the predicted price. On the contrary, house 5, when the predicted price is subtracted from the buying price, yields the largest positive value. This means the asking price is over the amount that is predicted based upon other offers and therefore is the worst deal.

      Good

    2. The best deal is house 2 ($97,000 for 3 bedrooms) and the worst deal is house 5 ($250,000 for 2 bedrooms). This is explained when comparing the prices offered to the prices predicted by the 1 layer 1 neuron neural network that are posted below: 2 beds - 169.26517; 3 beds - 234.5404; 4 beds - 299.81564; 5 beds - 365.09088

      OK, but could you add a table to more clearly communicate your results? Perhaps order from best "deal" to worst.

    3. Traditional programming had the inputs of the data and rules which gave an answer. Machine learning takes the data and answers as an input and outputs the rules needed.

      Good. Could you elaborate on why this is significant? What aspect of data science has made this new design not only possible but also viable in its useful application towards addressing a multitude of research questions across a spectrum of disciplines?

  18. May 2020
  19. sarenaoberoi.github.io
    1. I’ve noticed that a large gap seen in the literature, is that scientists are failing to recognize how a number of different covariates work in tandem with one another to spread the disease.

      exceptional, we should focus more on variable interactions -- Poisson, Cox & Gibbs point process models might help us

  20. Apr 2020
    1. To accurately distribute resources and further understand how rates are changing, various geospatial data methods investigating the impact natural disasters have on nutrition rates were analyzed and compared, including: DHS data, spatial video, geographically weighted regression (GWR), and ordinary least squares regression models (OLS).

      Geographically weighted regression (GWR) and ordinary least squares (OLS) regression models that used DHS and spatial video data sources were investigated in order to assess the effectiveness of post-natural-disaster resource distribution in terms of nutrition.

    2. In 2010, a 7.0 magnitude earthquake struck Haiti and devastated millions. As an immediate effect of this natural disaster, child malnutrition became a relevant developmental issue in Haiti. The disaster substantially impacted sectoral conditions such as drinking water, sanitation, energy/fuel, food supply, healthcare, and clearing of debris by the disaster. Hence, these conditions note to have a disruption on the direct development of Haiti and stunted the process. When considering solutions, humanitarian aid narrowed in on improving these sectoral and household conditions to lessen the rate.

      I think this section could be distilled down to two sentences. Could you include a quantitative measure that conveys the tangible significance of the harms?

    3. Nutrition is a quintessential sustainable development goal. Child malnutrition can be a cause of long-term effects such as inadequate dietary intake and diseases, as well as short term effects like natural disasters and political turmoil.

      Try to synthesize these two sentences. I like your introductory statement - an excellent hook. Could you likewise continue to capture attention by integrating the following statement?

    1. Where Russia differs is localized regions within the country that experience different natures of hardship, thus a universal or average solution does not work.

      good

    1. Excellent work. Central focus clearly defined. Some thoughts.

      (1) Are you considering some of the machine learning approaches presented on WorldPop as a basis for understanding the informal sector in Cameroon? Random forest and hierarchical Bayesian models present promise in this area. Gravity-type models using CDR data could also be useful to describe behavior and movement (a small illustrative sketch follows below).

      (2) You have identified a number of quantitative models from what appears to be traditional economics or development economics. Have any of these recently been extended to incorporate data science methods (machine learning)?

      Keep going!
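      If it helps, the basic gravity form is simple to prototype (illustrative populations, distance, and parameters only):

      ```python
      # Gravity-type flow estimate between regions i and j:
      # flow proportional to the product of their populations, divided by distance^beta
      def gravity_flow(pop_i, pop_j, distance_km, k=1e-6, beta=2.0):
          return k * pop_i * pop_j / distance_km ** beta

      # e.g., a large city and a smaller town 250 km apart -- made-up figures
      print(gravity_flow(2_500_000, 400_000, 250))
      ```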

    2. Columbia’s annual manufacturing survey which provides information such as sales, wages, employment, capital, input prices and output for companies with at least ten employees. They also used the Registry of Violence which provides direct information about internally displaced people, such as their original municipality, the date they left, new municipality, and socioeconomic status

      good

    3. household statistics to create counterfactual analysis and identify which attributes were effective in eliminating economic inequality

      interesting - I am detecting an emerging theme

    4. counterfactual samples to make predictions on what factors such as tax rates, government revenue, and job creation had on the profit of these businesses

      great

    5. distance to the nearest roads and the average road density of the cities the businesses reside in

      agreed - real estate prices are often largely set by the number of average daily trips for the road a piece of property directly adjoins

    6. geospatial analysis of the General Enterprise Census and National Survey on Employment and the Informal Sector to map out the areas of informal and formal businesses and performed cross section testing to identify what types of areas were most profitable and in what industry

      interesting

    7. formal and informal businesses, with one being taxed and regulated by the Cameroon government while the latter is in a gray area which is not taxed nor regulated by the government

      good

    8. rural-urban migration where those who work in informal sector in the rural side move to the urban side either temporarily or permanently in the search of a job that much higher pay than their current one (Todaro)

      wow - good

    9. Cameroon’s large informal economy exploits the current rampant economic inequality that exists in Cameroon, causing unfair competition with formal businesses and discourages economic growth in rural regions, which is ultimately a result of the significant lack of resources in the rural regions such as access to high-quality infrastructure, job opportunities, and education.

      good

  21. miaaao.github.io
    1. Excellent work. Good job defining your thematic focus, urbanization throughout India. A few thoughts.

      (1) You've done a great job of capturing the three major themes as related to urbanization. First, you have begun to identify some of the sources that have emerged in order to describe urban populations and their demographic composition. Gridded populations and other methods such as those found on WorldPop will be inherently useful in your critical analysis of the literature. Have you considered such machine learning approaches as random forest and also hierarchical Bayesian models? You also have begun to touch on urbanization as a complex system, and I wonder if you will include such ideas as fractals and power laws in your review. Finally, the latter source also focuses on high-resolution description of urban areas - buildings, and possibly their use classification. I will be very interested to know more about the methods you select.

      (2) Have you identified a gap in the literature? Can you formulate your research question? What type of puzzle do you think your investigation into urbanization throughout India presents?

      Just keep going!

    2. Altogether, spider charts precisely display the characteristics of the twelve cities. Therefore, the analysis answers the author’s question on India’s urban growth types.

      Very good, but a bit dated. I know Taubenbock from when I was at the TU Berlin; he did some work with one of my colleagues. They used LiDAR to capture the 3D signature of building envelopes in order to classify building type/land use at a building-by-building resolution. The HRSL is similar to this. Taubenbock used to be the head of a research group at the DLR in Germany (kind of like the German NASA). I wonder what type of work they are doing currently?

    3. urban dynamics

      I'm wondering about these different models - are these agent-based models? Is there a method that uses a mathematical formula to forecast growth? The author's work seems interesting.