31 Matching Annotations
  1. Mar 2024
  2. Dec 2023
    1. It's important to note that array numbering in Julia starts with 1, not with 0 as in most other languages.

      “It makes sense from a mathematics point of view” © Julia creators. Another surprise for coders
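
A minimal sketch of what the off-by-one difference means in practice, written in Python (0-based) rather than Julia. The helper `one_based_get` is hypothetical and only imitates 1-based access for illustration.

```python
# Python sequences are 0-based; Julia arrays are 1-based by default.
values = ["a", "b", "c"]


def one_based_get(seq, i):
    """Return the i-th element counting from 1 (hypothetical helper)."""
    if not 1 <= i <= len(seq):
        raise IndexError(f"1-based index {i} out of range 1..{len(seq)}")
    return seq[i - 1]


first_zero_based = values[0]                    # Python's first element
first_one_based = one_based_get(values, 1)      # same element, Julia-style position
last_one_based = one_based_get(values, len(values))
```

With 1-based positions the last element sits at index `len(values)`, while in 0-based Python it sits at `len(values) - 1`.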

  3. Nov 2023
    1. For example, for sentiment classification, the tokens awesome, brilliant, great will have a higher probability given the positive class than the negative. Similarly, the tokens awful, boring, bad will have a higher probability given the negative class than the positive.

      Certainly not the reason why the Naïve Bayes is called so, but this is indeed amazingly naïve.

      This is why it is not suitable for sentiment analysis, except perhaps for baby-talk sentiment analysis.
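
A rough sketch of the per-class token probabilities described in the quoted passage, estimated from counts with add-one (Laplace) smoothing. The toy corpus, the function name `token_likelihoods`, and the smoothing parameter `alpha` are assumptions made up for illustration, not taken from the annotated text.

```python
from collections import Counter

# Toy labelled corpus (invented for this sketch).
docs = [
    ("awesome brilliant great", "pos"),
    ("great awesome plot", "pos"),
    ("awful boring bad", "neg"),
    ("bad boring plot", "neg"),
]


def token_likelihoods(docs, alpha=1.0):
    """Estimate P(token | class) from token counts with additive smoothing."""
    counts = {}
    vocab = set()
    for text, label in docs:
        tokens = text.split()
        counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)

    probs = {}
    for label, counter in counts.items():
        # Denominator: tokens seen in this class plus alpha per vocabulary entry.
        total = sum(counter.values()) + alpha * len(vocab)
        probs[label] = {tok: (counter[tok] + alpha) / total for tok in vocab}
    return probs


probs = token_likelihoods(docs)
```

Because every token is scored independently of its neighbours, a phrase such as "not great" still contributes the positive evidence of "great", which is the naivety the note above complains about.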

  4. Sep 2023
    1. The hypothesis is that your brain searched for other words that can be used in the same contexts, found some (e.g., wine), and made a conclusion that tezgüino has meaning similar to those other words. This is the distributional hypothesis:

      Our brain can also notice the spelling, so it thinks of tequila rather than wine.
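
The distributional hypothesis can be sketched by comparing the contexts in which words occur. The sentences below, the window size, and the helper names `context_vector` and `cosine` are all assumptions invented for illustration; this version looks only at neighbouring words and ignores spelling entirely.

```python
from collections import Counter
from math import sqrt

# Invented sentences: "tezgüino" and "wine" appear in identical contexts.
sentences = [
    "a bottle of tezgüino is on the table",
    "a bottle of wine is on the table",
    "everyone likes tezgüino",
    "everyone likes wine",
    "the car is on the road",
]


def context_vector(word, sentences, window=2):
    """Count the words found within `window` positions of each occurrence of `word`."""
    vec = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            vec.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


tez = context_vector("tezgüino", sentences)
sim_wine = cosine(tez, context_vector("wine", sentences))
sim_car = cosine(tez, context_vector("car", sentences))
```

In this toy corpus `sim_wine` comes out higher than `sim_car`, because "tezgüino" and "wine" share every context word while "car" shares only a few.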

    1. Functions should take the data structures that users have as opposed to the data structure that developers want. For example, a model function’s only interface should not be constrained to matrices. Frequently, users will have non-numeric predictors such as factors.

      Sacred truth

    1. In the context of modeling, it is also important to avoid highly technical jargon, such as Greek letters or obscure terms.

      Why do they hate Greek letters so much in America? Greek letters are a concise way of naming values that would otherwise need long symbol strings. In addition, Greek argument and value names connect the code to the theory, leaving the user better prepared to understand the corresponding papers if she has to.

  5. Apr 2023
    1. Medicine: doctors can use the bot to get answers to medical questions and to improve the quality of diagnosis and treatment.

      No! Only for brainstorming, if at all.

  6. Nov 2022
    1. Note that parsnip constrains the outcome column of a classification model to be encoded as a factor; using binary numeric values will result in an error.

      This is right!

    2. Some of the original argument names can be fairly jargon-y. For example, to specify the amount of regularization to use in a glmnet model, the Greek letter lambda is used. While this mathematical notation is commonly used in the statistics literature, it is not obvious to many people what lambda represents (especially those who consume the model results).

      Hands off the Greek letters! :)) They imply minimal math background necessary to understand the models.

    3. For other types of models, the interfaces may be even more disparate. For a person trying to do data analysis, these differences require the memorization of each package’s syntax and can be very frustrating.

      Holy truth

    4. The parsnip package, one of the R packages that are part of the tidymodels metapackage, provides a fluent and standardized interface for a variety of different models. In this chapter, we give some motivation for why a common interface is beneficial for understanding and building models in practice and show how to use the parsnip package.

      A unified interface for different modelling functions from different packages is a great idea, and it is the main reason I am learning tidymodels at all.

    1. As an example, M. Kuhn and Johnson (2020) use data to model the daily ridership of Chicago’s public train system using predictors such as the date, the previous ridership results, the weather, and other factors. Table 1.1 shows an approximation of these authors’ hypothetical inner monologue when analyzing these data and eventually selecting a model with sufficient performance.

      A great example

    1. Chapter 4, “Subsetting”, now distinguishes between [ and [[ by their intention: [ extracts many values and [[ extracts a single value (previously they were characterised by whether they “simplified” or “preserved”).

      I think the difference is that [ extracts a subset and [[ extracts an element.

  7. Sep 2022
    1. Without calculus and the ideas of functions and their derivatives, Smith was not able to think about prices in a modern way where price is shaped by demand and supply. Instead, for Smith, each item has a “natural price”: a fixed quantity that depends on the amount of labor used to produce the item. Nowadays, we understand that productivity changes as new methods of production and new inventions are introduced.

      So, the amount of labor used to produce the item changes following advances in technology, so does its "natural price", while market price fluctuates around the latter influenced by demand and supply. What's wrong?

    1. “people” / “passengers” / “customers” / “patients” / “cases” / “passenger deaths”: these are different ways to refer to people. We will consider such quantities to have dimension P, for population.

      And also media audience

    2. There are other dimensions: volume, force, pressure, energy, torque, velocity, acceleration, and such. These are called compound dimensions because we represent them as combinations of the fundamental dimensions, L, T, and M. The notation for these combinations involves multiplication and division.

      Interesting to think about dimension in media analysis
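
One way to picture compound dimensions is as exponents of the fundamental dimensions L, T, and M, so that multiplication adds exponents and division subtracts them. The `Dim` class below is a hypothetical sketch for illustration, not something taken from the annotated book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents of the fundamental dimensions length (L), time (T), mass (M)."""
    L: int = 0
    T: int = 0
    M: int = 0

    def __mul__(self, other):
        # Multiplying quantities adds the exponents of their dimensions.
        return Dim(self.L + other.L, self.T + other.T, self.M + other.M)

    def __truediv__(self, other):
        # Dividing quantities subtracts the exponents.
        return Dim(self.L - other.L, self.T - other.T, self.M - other.M)


LENGTH, TIME, MASS = Dim(L=1), Dim(T=1), Dim(M=1)

volume = LENGTH * LENGTH * LENGTH        # L^3
velocity = LENGTH / TIME                 # L T^-1
acceleration = velocity / TIME           # L T^-2
force = MASS * acceleration              # M L T^-2
pressure = force / (LENGTH * LENGTH)     # M L^-1 T^-2
energy = force * LENGTH                  # M L^2 T^-2
```

A further field, such as P for population, could be added to `Dim` in the same way if audience-like quantities are to be tracked.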

    1. Another clue is whether “zero” means “nothing.” Daily temperatures in the winter are often near “zero” on the Fahrenheit or Celsius scales, but that in no way means there is a complete absence of heat. Those scales are arbitrary. Another way to think about this clue is whether negative values are meaningful. If so, thinking in terms of orders of magnitude is not likely to be useful.

      So it is better to think about media audience in terms of orders of magnitude.
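
A small sketch of the orders-of-magnitude view for a quantity where zero really means "nothing" and negative values are meaningless, such as audience size. The audience figures and the helper `order_of_magnitude` are invented for illustration.

```python
from math import floor, log10

# Made-up audience sizes; zero would mean no audience at all.
audiences = {
    "local blog": 850,
    "regional paper": 42_000,
    "national TV": 3_100_000,
}


def order_of_magnitude(x):
    """Return the power of ten at or below x; only defined for positive values."""
    if x <= 0:
        raise ValueError("order of magnitude needs a strictly positive value")
    return floor(log10(x))


orders = {name: order_of_magnitude(n) for name, n in audiences.items()}
```

A winter temperature of -5 on the Celsius scale would raise the `ValueError` here, which matches the quoted clue: where negative values are meaningful, orders of magnitude are unlikely to help.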