433 Matching Annotations
  1. Oct 2020
    1. Unlike slicing strings and lists in Python, you cannot use negative values for the start, stop, or step arguments.

      itertools.islice() makes it possible to slice a set or a dict.
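
      A minimal sketch of the note above, using a made-up `prices` dict and an arbitrary set as sample data (both are assumptions for illustration). It slices a dict by position and shows that a negative argument is rejected:

```python
from itertools import islice

# Hypothetical sample data, not from the annotated article.
prices = {"apple": 3, "banana": 1, "cherry": 7, "date": 5}

# Take the items at positions 1 and 2; dicts iterate in insertion order.
middle = dict(islice(prices.items(), 1, 3))
print(middle)

# A set has no defined order, so this yields *some* two elements.
two_tags = set(islice({"a", "b", "c"}, 2))
print(len(two_tags))

# Negative arguments are rejected, unlike ordinary sequence slicing.
try:
    islice(prices, -1)
    negative_allowed = True
except ValueError as exc:
    negative_allowed = False
    print("ValueError:", exc)
```

      Because islice() only walks the iterator, it works on anything iterable, but it cannot count from the end the way `seq[-2:]` does.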

    2. line.strip()


    3. # A recursive generator that generates Tree leaves in in-order.


    4. Any function containing a yield keyword is a generator function;

      Any function whose body contains the yield keyword is a generator function.

    5. Generators


    6. if expression is creating a tuple, it must be surrounded with parentheses.


    7. list comprehensions

      List comprehensions

      Generator expressions

    8. An object is called iterable if you can get an iterator for it.

      In other words, if calling iter() on an object does not raise TypeError, the object is iterable.

    9. An iterator is an object representing a stream of data;


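    1. The key parameter specifies a function to be called on each list element before comparisons are made.

      Unlike in C++, the key parameter specifies a function that is called on each list element before comparison.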
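
      A short sketch of the key parameter, with invented sample lists. The key function takes one element and returns the value to compare, whereas a C++ comparator for std::sort takes two elements:

```python
# Hypothetical sample data for illustration.
words = ["banana", "Cherry", "apple"]

# key is called once per element; the results are what get compared.
by_lower = sorted(words, key=str.lower)
print(by_lower)

pairs = [("b", 2), ("a", 3), ("c", 1)]
pairs.sort(key=lambda pair: pair[1])  # sort in place by the second field
print(pairs)
```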

    1. Let's check which type of model gives the best performance: sns.boxplot(data=models_df, x='score', y='model')

      After comparing the pipelined ML models, we can easily display a comparison boxplot.

      Well-performing ML models: 1) XGBoost 2) LightGBM 3) CatBoost.

    2. With this data, it looks like there is practically no major difference (we are not fighting here over a 0.01 percentage point improvement in model accuracy). So maybe training time matters? sns.boxplot(data=models_df, x='time_elapsed', y='model')

      Training time of some popular ML models. After considering the performance, it's worth using XGBoost and LightGBM.

    3. Now, in nested loops, we can check every combination by swapping the classifiers and transformers (the whole loop takes a while to run):

      An example of when creating pipelines with scikit-learn makes sense: it is convenient when comparing multiple models in a loop.

    1. hyperscript is more concise because it's just a function call and doesn't require a closing tag. Using it will greatly simplify your tooling chain.

      I suppose this is also an argument that Python tries to make? That other languages have this con:

      • cons: closing tags make it more verbose / increase duplication and that Python is simpler / more concise because it uses indentation instead of closing delimiters like end or } ?
    1. Used by convention to avoid naming conflicts with Python keywords.

      var_ (trailing underscore) is mainly used to avoid conflicts with keywords.

    2. meant as a hint to the programmer only.

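      A single leading underscore, as in _var, has no real effect on the interpreter. It is a common convention among programmers to mark a name as intended for internal use.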

    1. The second is a “throwaway” variable that we don’t need just yet, denoted with an underscore.


    1. If <key> is not found, it returns None

      .get(key) returns the corresponding value if the key exists; otherwise it returns None.

    2. The len() function returns the number of key-value pairs in a dictionary

      The len function returns the number of key-value pairs in the dictionary.

    3. the in and not in operators


    4. key values are simple strings


    5. Restrictions on Dictionary Values

      Python places no restrictions on dict values; they can be of any type (including mutable types and user-defined objects).

    6. it is not quite correct to say an object must be immutable to be used as a dictionary key.

      "An object must be immutable to be used as a dict key."

      This statement is not quite accurate. Strictly speaking, an object must be hashable to be used as a dict key.
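
      A small sketch of the hashable-versus-immutable distinction. The `Node` class is a hypothetical example: its instances are mutable yet hashable, while a tuple that contains a list is immutable as a container yet not hashable:

```python
class Node:
    """Hypothetical plain class: instances are mutable but hashable,
    because the default hash is based on object identity."""

    def __init__(self, label):
        self.label = label


node = Node("a")
visited = {node: True}
node.label = "renamed"  # mutating the object does not change its hash
print(visited[node])

# A tuple cannot be modified, but it is only hashable if all its items are.
tuple_with_list = (1, [2, 3])
try:
    {tuple_with_list: "x"}
    tuple_key_ok = True
except TypeError as exc:
    tuple_key_ok = False
    print("TypeError:", exc)
```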

    7. Restrictions on Dictionary Keys

      Restrictions on dict keys:

      1. Keys cannot be duplicated
      2. Keys must be immutable (lists and dicts are not allowed)
    8. a dictionary key must be of a type that is immutable.

      dict keys must be immutable.

    9. the values contained in the dictionary don’t need to be the same type.

      The keys and values of a dict do not need to be of the same type.

    10. Python does guarantee that the order of items in a dictionary is preserved.

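      Although accessing dictionary elements in Python does not depend on order, Python preserves the order in which the elements were defined (a feature introduced in Python 3.7).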

    11. they have nothing to do with the order of the items in the dictionary.

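      Python dictionaries can also be accessed with numbers (used as keys), but unlike with lists, the numeric value has nothing to do with the order of the elements.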

    12. If you refer to a key that is not in the dictionary, Python raises an exception:

      See this article: https://realpython.com/python-defaultdict. A defaultdict can handle the missing-key case.

    13. Dictionary elements are not accessed by numerical index

      Note: dictionary elements in Python cannot be accessed by numerical index.

    14. A list of tuples works well for this:


      dd = dict([('a',1),('b',2)])
    15. Dictionaries and lists share the following characteristics:

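      What dictionaries and lists have in common in Python:

      1. Both are mutable.
      2. Both are dynamic: they can grow and shrink.
      3. Both can be nested. A list can contain another list, and a dictionary can contain another dictionary. A dictionary can also contain a list.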
    16. Dictionaries differ from lists primarily in how elements are accessed:

      How dictionaries differ from lists in Python:

      1. List elements are accessed by index.
      2. Dictionary elements are accessed by key.
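    1. This article mainly covers how to handle missing keys in Python dictionaries.

      It introduces a new data type, defaultdict, and explains what happens when a missing key is accessed or modified.

      The main point is that defaultdict overrides .__missing__(), so that this method is called automatically when a missing key is accessed or modified through the subscription operation, which avoids the KeyError a plain dict would raise.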

    2. the instance behaves like a standard dictionary.

      If a defaultdict is initialized without arguments, it behaves like a standard dict.

    3. The first argument to the Python defaultdict type must be a callable that takes no arguments and returns a value.


    4. Using the Python defaultdict Type for Handling Missing Keys

      Using Python's defaultdict to handle missing keys.

      Why I wanted to learn defaultdict: I saw a video creator use it in an implementation of DFS.


    5. not to call it using the parentheses at initialization time.

      Note: when assigning .default_factory, be sure not to include the parentheses.

    6. the dictionary assigns it the default value that results from calling list().

      When we access a missing key, defaultdict automatically assigns to that key the value returned by calling default_factory.
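
      A brief sketch of that behaviour, building an adjacency list from a made-up edge list (the data is an assumption for illustration). The factory is passed as `list`, without parentheses:

```python
from collections import defaultdict

# Hypothetical edge list for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "d")]

graph = defaultdict(list)  # pass the callable itself: list, not list()
for src, dst in edges:
    graph[src].append(dst)  # a missing key gets list() as its value first

print(dict(graph))

# Merely reading a missing key also creates it.
_ = graph["z"]
print("z" in graph)
```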

    7. if you call .setdefault() on an existing key, then the call won’t have any effect on the dictionary.

      Calling setdefault on an existing key does not modify the existing value.


    8. four available ways to handle missing keys


      1. .setdefault()
      2. .get()
      3. key in dict
      4. try and except
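
      The four approaches listed above can be sketched side by side. The `inventory` dict and its keys are invented for illustration; note that only .setdefault() inserts the missing key:

```python
# Hypothetical sample data.
inventory = {"apple": 3}

# 1. .setdefault(): returns the value, inserting the default if missing.
pears = inventory.setdefault("pear", 0)

# 2. .get(): returns a fallback without touching the dict.
kiwis = inventory.get("kiwi", 0)

# 3. key in dict: check before subscripting.
plums = inventory["plum"] if "plum" in inventory else 0

# 4. try / except: subscript and handle the KeyError.
try:
    figs = inventory["fig"]
except KeyError:
    figs = 0

print(inventory)
```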
    9. if you try to access or modify a missing key, then defaultdict will automatically create the key and generate a default value for it.

      If you try to access or modify a missing key, defaultdict automatically creates the key and generates a default value for it.

    10. Decide when and why to use a Python defaultdict rather than a standard dict



    11. augmented assignment operator

      Augmented assignment operators (in Chinese: 增强赋值运算符)

    12. Sets are collections of unique objects,

      A set in Python is a collection of unique objects with no duplicates, similar to set and unordered_set in C++.

    13. for large datasets, it can also be a lot faster and more efficient.

      For large datasets, defaultdict can be faster and more efficient than the equivalent code written with a plain dict.

    14. Diving Deeper Into defaultdict

      A deeper look at defaultdict

    15. When and why to use a Python defaultdict rather than a regular dict

      When and why to use Python's defaultdict instead of a regular dict

    1. Note that __missing__() is not called for any operations besides __getitem__().

      __missing__() is not called by any operation other than __getitem__().
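
      A minimal sketch of that rule, using a hypothetical dict subclass named `ZeroDict`. Subscripting triggers `__missing__()`, while `.get()` and `in` do not:

```python
class ZeroDict(dict):
    """Hypothetical dict subclass that treats missing keys as 0."""

    def __missing__(self, key):
        # Called only by dict.__getitem__ when the key is absent.
        self[key] = 0
        return 0


counts = ZeroDict()
counts["x"] += 1           # __getitem__ -> __missing__ -> 0, then 0 + 1 is stored

via_get = counts.get("y")  # .get() bypasses __missing__ and returns None
has_y = "y" in counts      # membership tests bypass it too

print(counts, via_get, has_y)
```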

    1. use code to parameterize calls:

      You can write Python code to parametrize calls:

      python -c "
      from mymodule import set_dragon_feeding_schedule, Creatures, Date
      set_dragon_feeding_schedule(
          feeding_times=['10:00', '14:00', '18:00'],
          dishes={Creatures.Tiger: 2, Creatures.Human: 1},
      )
      "

      instead of:

      python -m mymodule \
          set_dragon_feeding_schedule \
          --feeding-times ['10:00','14:00','18:00'] # hopefully this way it gets recognized \
          # how will you define parsing a dict with enum to integer mapping? 
          --dishes=Creatures.Tiger:2 \
          --dishes=Creatures.Human:1 \
          --start-day=1020-03-21 # BTW bash allows no comments in multiline calls
    2. That’s it. Types are parsed, checked and converted. Defaults and description are picked from function itself. Even provides bash completions you can install. You wrote no code for that!

      Good example of writing CLI interfaces in Python with typer:

      import typer
      from pathlib import Path
      app = typer.Typer()
      @app.command()
      def find_dragon(name: str, path: Path, min_age_years: int = 200):
          ...  # implementation goes here
      @app.command()
      def feed_dragon(dragon_name: str, n_humans: int = 3):
          ...  # implementation goes here
      if __name__ == "__main__":
          app()

      later we can call it that way:

      python example.py find_dragon 'Drake' --path /on/my/planet
    1. Merge (|) and update (|=) operators have been added to the built-in dict class. Those complement the existing dict.update and {**d1, **d2} methods of merging dictionaries.

      From Python 3.9 it's much more convenient to:

      • merge dictionaries with the | (pipe) operator, e.g. x | y
      • update them with |=
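
      A quick sketch of both operators, assuming Python 3.9 or newer; the `defaults` and `overrides` dicts are invented sample data. When a key appears on both sides, the right-hand value wins:

```python
# Hypothetical configuration dicts for illustration (requires Python 3.9+).
defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 9000, "debug": True}

merged = defaults | overrides   # new dict; neither operand is modified
print(merged)

defaults |= {"timeout": 30}     # in-place update, like defaults.update(...)
print(defaults)
```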
    1. Use Streamlit if you want to get going as quickly as possible and don’t have strong opinions or many custom requirements. Use Dash if you need something more flexible and mature, and you don’t mind spending the extra engineering time.

      Streamlit vs Dash

    2. Here’s a table showing the tradeoffs:

      Comparison of dashboard tech stack as of 10/2020:

  2. Sep 2020
    1. The Census FTP page contains the microdata and dictionaries identifying each variable name, location, value range, and whether it applies to a restricted sample. To follow this example, download the April 2017 compressed data file that matches your operating system and unpack it in the same location as the python code. Next download the January 2017 data dictionary text file and save it in the same location.

      This is important

    1. 3.5 PEP 478 security 2015-09-13 2020-09-13 Larry Hastings

      All Python versions less than 3.6 are now EOL

    1. This command will give you the top 25 stocks that had the highest anomaly score in the last 14 bars of 60 minute candles.

      Surpriver: find high-moving stocks before they move, using anomaly detection and machine learning. Surpriver uses machine learning to look at volume and price action and infer unusual patterns that can result in big moves in stocks.

    1. Like a string, a list is a sequence of values. In a string, the values are characters; in a list, they can be any type. The values in a list are called elements or sometimes items. The syntax for accessing the elements of a list is the same as for accessing the characters of a string: the bracket operator. The expression inside the brackets specifies the index. Remember that the indices start at 0:


          cheeses[0]
          'Cheddar'

      Unlike strings, lists are mutable. When the bracket operator appears on the left side of an assignment, it identifies the element of the list that will be assigned.

          numbers = [42, 123]
          numbers[1] = 5
          numbers
          [42, 5]

      The most common way to traverse the elements of a list is with a for loop. The syntax is the same as for strings:

          for cheese in cheeses:
              print(cheese)

      This works well if you only need to read the elements of the list. But if you want to write or update the elements, you need the indices. A common way to do that is to combine the built-in functions range and len:

          for i in range(len(numbers)):
              numbers[i] = numbers[i] * 2

      This loop traverses the list and updates each element. len returns the number of elements in the list. range returns a list of indices from 0 to n-1, where n is the length of the list. Each time through the loop i gets the index of the next element. The assignment statement in the body uses i to read the old value of the element and to assign the new value.

      The + operator concatenates lists:

          a = [1, 2, 3]
          b = [4, 5, 6]
          c = a + b
          c
          [1, 2, 3, 4, 5, 6]

      The * operator repeats a list a given number of times:

          [0] * 4
          [0, 0, 0, 0]
          [1, 2, 3] * 3
          [1, 2, 3, 1, 2, 3, 1, 2, 3]

      The first example repeats [0] four times. The second example repeats the list [1, 2, 3] three times.

      Python provides methods that operate on lists. For example, append adds a new element to the end of a list:

          t = ['a', 'b', 'c']
          t.append('d')
          t
          ['a', 'b', 'c', 'd']

      extend takes a list as an argument and appends all of the elements:

          t1 = ['a', 'b', 'c']
          t2 = ['d', 'e']
          t1.extend(t2)
          t1
          ['a', 'b', 'c', 'd', 'e']

      This example leaves t2 unmodified. sort arranges the elements of the list from low to high:

          t = ['d', 'c', 'e', 'b', 'a']
          t.sort()
          t
          ['a', 'b', 'c', 'd', 'e']

      Most list methods are void; they modify the list and return None. If you accidentally write t = t.sort(), you will be disappointed with the result.

    2. A string is a sequence, which means it is an ordered collection of other values.

      • You can access the characters one at a time with the bracket operator:

            fruit = 'banana'
            letter = fruit[1]

        The second statement selects character number 1 from fruit and assigns it to letter. The expression in brackets is called an index.

        A lot of computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal.

        A segment of a string is called a slice. Selecting a slice is similar to selecting a character:

            s = 'Monty Python'
            s[0:5]
            'Monty'
            s[6:12]
            'Python'

        It is tempting to use the [] operator on the left side of an assignment, with the intention of changing a character in a string. For example:

            greeting = 'Hello, world!'
            greeting[0] = 'J'
            TypeError: 'str' object does not support item assignment

        The “object” in this case is the string and the “item” is the character you tried to assign. For now, an object is the same thing as a value, but we will refine that definition later (Section 10.10). The reason for the error is that strings are immutable,

  3. Aug 2020
    1. Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there.

      In the end, you get the right Python snippet (based on scikit-learn) as if by magic.

    2. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

      TPOT automates the following:

      • feature selection
      • feature preprocessing
      • feature construction
      • model selection
      • parameter optimisation
    1. from fastapi import FastAPI
       app = FastAPI()
       @app.get("/")
       async def root():
           return {"message": "Hello World"}

      Basic GET endpoint with FastAPI

    1. barChart = pygal.Bar(height=400)
       [barChart.add(x[0], x[1]) for x in mean_per_state.items()]
       display(HTML(base_html.format(rendered_chart=barChart.render(is_unicode=True))))

      How to finally display the HTML in Jupyter notebooks

    2. from IPython.display import display, HTML
       base_html = """
       <!DOCTYPE html>
       <html>
         <head>
           <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
           <script type="text/javascript" src="https://kozea.github.io/pygal.js/2.0.x/pygal-tooltips.min.js"></script>
         </head>
         <body>
           <figure>
             {rendered_chart}
           </figure>
         </body>
       </html>
       """

      How to display HTML in Jupyter notebooks

    1. def main(name: str = typer.Argument("World", hidden=True)):

      Hide a CLI argument from the help text

    2. def main(name: str = typer.Argument(..., help="The name of the user to greet")):
           typer.echo(f"Hello {name}")

      Add a help text for a CLI argument. You can use the help parameter to add a help text for a CLI argument. This example has no default value, so the argument is required.

    1. import random
       import typer
       def get_name():
           return random.choice(["Deadpool", "Rick", "Morty", "Hiro"])
       def main(name: str = typer.Argument(get_name)):
           typer.echo(f"Hello {name}")
       if __name__ == "__main__":
           typer.run(main)

      Setting a dynamic default value. And we can even make the default value be dynamically generated by passing a function as the first function argument:

    1. def main(name: Optional[str] = typer.Argument(None)):

      How to define an optional parameter in function definition in typer

  4. Jul 2020
    1. Libra is a machine learning API designed for non-technical users. This means that it assumes that you have no background in ML whatsoever.

      With Libra you can write your ML code much faster:

      For example, that's how it compares to Keras.

    1. Return sends a specified value back to its caller whereas Yield can produce a sequence of values. We should use yield when we want to iterate over a sequence, but don’t want to store the entire sequence in memory. Yield are used in Python generators. A generator function is defined like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function.

      Simple explanation of a difference between return and yield in Python
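
      A small sketch contrasting the two, with hypothetical function names `squares_list` and `squares_gen`. The first builds the whole sequence in memory and returns it; the second yields one value at a time and resumes where it left off:

```python
def squares_list(n):
    """Return: builds the entire list before handing anything back."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yield: produces values lazily, one per request."""
    for i in range(n):
        yield i * i


eager = squares_list(4)

gen = squares_gen(4)
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # resumes after that yield and drains the remainder

print(eager, first, rest)
```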

    1. Why don't you allow a range without end, like (1..)? There are two advantages. First, we can write ary[1..] instead of ary[1..-1]. The -1 is one of the most I dislike in Ruby. It is very magical, ugly, redundant, and disappointing. I envy Python's ary[1:]. I know that ary.drop(1) is slightly fast, but too long for such a common operation, IMO. Second, we can write (1..).each {|n| ... }.
    1. from difflib import SequenceMatcher squeeze=SequenceMatcher( None, A, B ) print "A - B = [%s]"%( reduce( lambda p,q: p+q, map( lambda t: squeeze.a[t[1]:t[2]], filter(lambda x:x[0]!='equal', squeeze.get_opcodes() ) ) ) )
    2. The mixed functional, OO and procedural nature of python has always been, IMO, one of its strengths.
    3. However, this is not working in Python 3, as print has changed from a command to a function, and reduce, filter and map have been declared unpythonic.
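
      One possible Python 3 rewrite of the snippet quoted above, offered as a sketch rather than the original author's version. The helper name `removed_from_a` and the sample strings are assumptions; a generator expression with `str.join` stands in for `reduce`, `map` and `filter`:

```python
from difflib import SequenceMatcher


def removed_from_a(a, b):
    """Join the parts of `a` that fall in non-'equal' opcodes (hypothetical helper)."""
    matcher = SequenceMatcher(None, a, b)
    return "".join(
        a[i1:i2]
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# Sample inputs chosen for illustration.
result = removed_from_a("abcdef", "abef")
print(f"A - B = [{result}]")
```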
  5. Jun 2020
    1. # scenario-1: delete in session: SA might set category_id of all children Products to None
       c1 = session.query(Category).get(1)
       session.delete(c1)
       session.commit()
       # scenario-2: delete without loading an object into the session: SA will perform no additional logic
       session.query(Category).filter(Category.id == 2).delete()
       session.commit()

      Weird sqlalchemy behavior but totally an accurate statement. When using postgres and not specifying how to handle deletes, if you delete via object instead of filter.delete, sqlalchemy will set all children with foreignkey id's to None.

    1. in Python - setting up basic logger is very simple

      Apart from printing the result, it is better to debug with logging.

      Sample logger:

      import logging
      logging.basicConfig(
          format= '[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s',
          datefmt='%H:%M:%S',
      )
      logging.error("Some serious error occurred.")
      logging.warning('Function you are using is deprecated.')

      the sample result:

      [12:52:35] {<stdin>:1} ERROR - Some serious error occurred.
      [12:52:35] {<stdin>:1} WARNING - Function you are using is deprecated.

      to find its location, type:

    1. Integration of Naming Conventions into Operations, Autogenerate

      Importance of naming conventions for sqlalchemy when running alembic db migrations.

    1. it allows each new key to be given a default value based on the type of dictionary being created

      Difference between defaultdict and dict

    2. Memoization can be achieved through Python decorators

      Example of memoization in Python:

      import timeit
      def memoize_fib(func):
          cache = {}
          def inner(arg):
              if arg not in cache:            
                  cache[arg] = func(arg)
              return cache[arg]
          return inner
      def fib(num):
          if num == 0:
              return 0
          elif num == 1:
              return 1
          else:
              return fib(num-1) + fib(num-2)
      fib = memoize_fib(fib)
      print(timeit.timeit('fib(30)', globals=globals(), number=1))
      print(timeit.timeit('fib(30)', globals=globals(), number=1))
      print(timeit.timeit('fib(30)', globals=globals(), number=1))
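
      As an alternative to a hand-written cache, the standard library's functools.lru_cache decorator provides the same memoization. A minimal sketch:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache keyed by the call arguments
def fib(num):
    if num < 2:
        return num
    return fib(num - 1) + fib(num - 2)


value = fib(30)
info = fib.cache_info()  # hits/misses counters maintained by lru_cache
print(value, info)
```

      Each distinct argument is computed once, so the recursion runs in linear rather than exponential time.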


    3. A deep copy refers to cloning an object. When we use the = operator, we are not cloning the object; instead, we reference our variable to the same object (a.k.a. shallow copy).

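      Difference between assignment with =, which only binds another name to the same object, and a deep copy. Note that plain assignment is not actually a shallow copy; copy.copy() creates one.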
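
      A short sketch separating the three cases with the standard copy module; the nested list is sample data. Assignment creates an alias, copy.copy() duplicates only the outer container, and copy.deepcopy() clones everything:

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # same object under a second name
shallow = copy.copy(original)     # new outer list, shared inner lists
deep = copy.deepcopy(original)    # fully independent clone

original[0].append(99)            # mutate a shared inner list
original.append([5])              # mutate the outer list

print(alias is original)
print(shallow)
print(deep)
```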

    4. Python 2 is entrenched in the software landscape to the point that co-dependency between several softwares makes it almost impossible to make the shift.

      Shifting from Python 2 to 3 isn't always straightforward

    5. The @property decorator allows a function to be accessed like an attribute.

      @property decorator
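
      A minimal sketch using a hypothetical `Circle` class. `area` is computed by a method but read like an attribute, and because it has no setter, assigning to it fails:

```python
import math


class Circle:
    """Hypothetical example class."""

    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value

    @property
    def area(self):
        # Read-only: computed on every access, no parentheses at the call site.
        return math.pi * self._radius ** 2


circle = Circle(2)
circle.radius = 3          # goes through the setter
print(circle.area)

try:
    circle.area = 10       # no setter defined
    area_assignable = True
except AttributeError:
    area_assignable = False
```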

    6. var = true_val if condition else false_val

      Example of a ternary operator (one-line version of if-else):

      to_check = 6
      msg = "Even" if to_check%2 == 0 else "Odd"

      Usual if-else:

      msg = ""
      if(to_check%2 == 0):
        msg = "Even"
      else:
        msg = "Odd"
    7. This method is automatically called to allocate memory when a new object/ instance of a class is created.

      The __init__ method in Python (which essentially all classes have). Strictly speaking, __new__ creates the instance and __init__ then initializes it.

    8. decorator is a design pattern in Python that allows a user to add new functionality to an existing object without modifying its structure
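
      A small sketch of the pattern with a hypothetical `count_calls` decorator. It adds call counting to `greet` without changing the function's own body:

```python
import functools


def count_calls(func):
    """Hypothetical decorator: counts how often the wrapped function runs."""

    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def greet(name):
    return f"Hello {name}"


first = greet("Ada")
second = greet("Bob")
print(first, second, greet.calls)
```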


    9. The philosophy behind Flask is that it gives only the components you need to build an app so that you have the flexibility and control. In other words, it’s un-opinionated. Some features it offers are a built-in dev server, RESTful request dispatching, HTTP request handling, and much more.

      Flask isn't as full of features as Django (which makes it lighter), but it still offers:

      • built-in dev server
      • Restful request dispatching
      • HTTP request handling
      • much more...
    10. The only difference is that range returns a Python list object and xrange returns an xrange object. This means that xrange doesn’t actually generate a static list at run-time like range does.

      Difference between range() and xrange(). This applies to Python 2 only; in Python 3, xrange is gone and range is itself lazy.

    11. NumPy is not just more efficient; it is also more convenient. You get a lot of vector and matrix operations for free, which sometimes allow one to avoid unnecessary work. And they are also efficiently implemented.

      Main advantage of NumPy arrays over (nested) Python lists

    12. process of retrieving original Python objects from the stored string representation


    13. Pickle module accepts any Python object and converts it into a string representation and dumps it into a file by using dump function
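
      A minimal round-trip sketch with the pickle module. An in-memory io.BytesIO buffer stands in for a file opened in binary mode, and the `record` dict is invented sample data. In Python 3 the serialized form is bytes rather than text:

```python
import io
import pickle

# Hypothetical object to serialize.
record = {"name": "Drake", "ages": [200, 350], "position": (1, 2)}

buffer = io.BytesIO()         # stand-in for open("data.pkl", "wb")
pickle.dump(record, buffer)   # pickling: object -> byte stream

buffer.seek(0)
restored = pickle.load(buffer)  # unpickling: byte stream -> new object

print(restored == record, restored is record)
```

      Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.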


    14. The beauty of lambda functions lies in the fact that they return function objects. This makes them helpful when used with functions like map or filter which require function objects as arguments.

      When to use lambdas

    15. In Python, the term monkey patch only refers to dynamic modifications of a class or module at run-time.

      Monkey patching in Python
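
      A tiny sketch of the idea with a hypothetical `Greeter` class. Replacing the method on the class at run time changes the behaviour of instances that already exist:

```python
class Greeter:
    """Hypothetical class to be patched."""

    def greet(self):
        return "hello"


def loud_greet(self):
    return "HELLO!"


greeter = Greeter()
before = greeter.greet()

Greeter.greet = loud_greet   # the monkey patch: swap the method at run time
after = greeter.greet()      # the existing instance picks up the new method

print(before, after)
```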

    16. Python has a multi-threading package but if you want to multi-thread to speed your code up, then it’s usually not a good idea to use it. Python has a construct called the Global Interpreter Lock (GIL). The GIL makes sure that only one of your ‘threads’ can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL onto the next thread.

      Multi-threading is available in Python, but it's often not a good idea for speeding up CPU-bound code

    1. content Gray Hat Python


      Teach Your Kids to Code: A Parent-Friendly Guide to Python Programming MSRP: $23.95 Already had

      Invent Your Own Computer Games with Python, 4th Edition MSRP: $23.95 Already had

      Black Hat Python: Python Programming for Hackers and Pentesters MSRP: $27.95 https://www.amazon.com/Black-Hat-Python-Programming-Pentesters-ebook/dp/B00QL616DW/

      Gray Hat Python: Python Programming for Hackers and Reverse Engineers MSRP: $31.95 https://www.amazon.com/Gray-Hat-Python-Programming-Engineers-ebook/dp/B007V2DNEK/


      Mission Python: Code a Space Adventure Game! MSRP: $23.95

      Python for Kids: A Playful Introduction to Programming MSRP: $27.95

      Cracking Codes with Python: An Introduction to Building and Breaking Ciphers MSRP: $23.95

      Python Playground: Geeky Projects for the Curious Programmer MSRP: $23.95

      Math Adventures with Python: An Illustrated Guide to Exploring Math with Code MSRP: $23.95


      Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming MSRP: $31.95

      Automate the Boring Stuff with Python, 2nd Edition: Practical Programming for Total Beginners MSRP: $31.95

      Python Flash Cards PDF ONLY MSRP: $27.95

      Serious Python: Black-Belt Advice on Deployment, Scalability, Testing, and More MSRP: $27.95

      Impractical Python Projects: Playful Programming Activities to Make You Smarter

  6. May 2020
    1. Hot Reloading refers to the ability to automatically update a running web application when changes are made to the application’s code.

      Hot Reloading is what provides a great experience with updating your Dash code inside the Jupyter Notebooks

    2. JupyterDash supports three approaches to displaying a Dash application during interactive development.

      3 display modes of Dash using Jupyter Notebooks:

      1. app.run_server(mode='external')
      2. app.run_server(mode='inline')
      3. app.run_server(mode='jupyterlab')
    3. # Run app and display result inline in the notebook
       app.run_server(mode='inline')

      Moreover, you can display your Dash result inside a Jupyter Notebook (via IPython.display.IFrame) with the line quoted above.


    4. If running the server blocks the main thread, then it’s not possible to execute additional code cells without manually interrupting the execution of the kernel. JupyterDash resolves this problem by executing the Flask development server in a background thread. This leaves the main execution thread available for additional calculations. When a request is made to serve a new version of the application on the same port, the currently running application is automatically shut down first. This makes it possible to quickly update a running application by simply re-executing the notebook cells that define it.

      How Dash can run inside Jupyter Notebooks

    5. You can also try it out, right in your browser, with binder.

      Dash can be tried out inside a Jupyter Notebook right in your browser using binder.

    6. Then, copy any Dash example into a Jupyter notebook cell and replace the dash.Dash class with the jupyter_dash.JupyterDash class.

      To use Dash in Jupyter Notebooks, you have to import:

      from jupyter_dash import JupyterDash

      instead of:

      import dash

      Therefore, all the imports could look like that for a typical Dash app inside a Jupyter Notebook:

      import plotly.express as px
      from jupyter_dash import JupyterDash
      import dash_core_components as dcc
      import dash_html_components as html
      from dash.dependencies import Input, Output
    1. there is a particularly unconventional mechanism by which these coroutines actually get run. Their result is an attribute of the exception object that gets thrown when their .send() method is called.

      A generator signals its termination with an exception (StopIteration). This was already a feature of generators.

      The "trick" to make coroutines work, is that this exception is used as a real return value when the coroutine terminates (with a return statement).

      The return value is encapsulated in the StopIteration exception that is raised, and the event loop handles it.

      In depth explanation: http://www.dabeaz.com/coroutines/
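
      The mechanism can be seen with a plain generator, without asyncio. In this sketch the `averager` generator is a hypothetical example: it receives values through .send(), and its return value arrives as the `value` attribute of the StopIteration raised on termination:

```python
def averager():
    """Hypothetical generator-based coroutine: averages the values sent to it."""
    total, count = 0, 0
    while True:
        value = yield          # suspended here until .send() resumes it
        if value is None:      # sentinel: stop collecting
            break
        total += value
        count += 1
    return total / count       # travels inside StopIteration


gen = averager()
next(gen)                      # prime: run up to the first yield
gen.send(10)
gen.send(20)

try:
    gen.send(None)
except StopIteration as stop:
    result = stop.value        # the generator's return value

print(result)
```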

    2. If Python encounters an await f() expression in the scope of g(), this is how await tells the event loop, “Suspend execution of g() until whatever I’m waiting on—the result of f()—is returned. In the meantime, go let something else run.”

      The event loop in Python orchestrates the whole "simulated concurrency" among coroutines.

      Deep down, Python has a select module that works closely with the OS to wait for data on sockets. This is how the orchestration actually works at the lowest layer. https://docs.python.org/3/library/select.html#module-select

    3. A coroutine is a specialized version of a Python generator function

      in fact, async in python was built on top of the generators (that existed in python since long before).

      A generator is a function that can be suspended --yielding a value-- and then resumed.

      A key functionality of the generators in python is that when they are resumed they can receive a value back from the code that stopped/resumed them. This translates into the syntax new_value = await coroutine()

    4. Asynchronous routines are able to “pause” while waiting on their ultimate result and let other routines run in the meantime. Asynchronous code, through the mechanism above, facilitates concurrent execution. To put it differently, asynchronous code gives the look and feel of concurrency.

      Async routines cooperate with one another by signalling when they can be paused. This is why they are called coroutines.

      The communication between coroutines happens thanks to the event loop
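
      A small asyncio sketch of that cooperation. The coroutine name `fetch`, the delays and the `log` list are assumptions for illustration. Each `await` is a point where a coroutine pauses and lets the event loop run the other one:

```python
import asyncio


async def fetch(name, delay, log):
    """Hypothetical task: records when it starts and finishes."""
    log.append(f"{name} start")
    await asyncio.sleep(delay)   # pause here; the event loop runs other tasks
    log.append(f"{name} done")
    return name.upper()


async def main():
    log = []
    # Both coroutines are scheduled together; results keep the argument order.
    results = await asyncio.gather(
        fetch("slow", 0.05, log),
        fetch("fast", 0.01, log),
    )
    return results, log


results, log = asyncio.run(main())
print(results)
print(log)
```

      Both tasks start before either finishes, which is the "look and feel of concurrency" on a single thread.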

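    1. Variable assignment in Python does not require a type declaration. Each variable is created in memory and includes information such as its identity, name, and data. Every variable must be assigned a value before it is used; the variable is only created once it has been assigned. The equals sign (=) is used to assign values to variables. To the left of the = operator is a variable name, and to the right is the value stored in the variable.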


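    1. Compound statements such as if, while, def, and class begin their first line with a keyword and end it with a colon (:); the one or more lines of code that follow form a suite. The header line together with the suite that follows it is called a clause.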


    2. Python statements generally end with a newline. However, we can use a backslash (\) to split a single statement across multiple lines, as shown below:


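    3. In Python, identifiers consist of letters, digits, and underscores. All identifiers may include English letters, digits, and underscores (_), but they cannot begin with a digit. Identifiers in Python are case-sensitive. Identifiers that begin with an underscore have special meaning. A single leading underscore, as in _foo, marks a class attribute that should not be accessed directly but through the interface the class provides, and it is not imported by from xxx import *. A double leading underscore, as in __foo, marks a private class member. A name with double leading and trailing underscores, as in __foo__, is reserved for Python's special methods, such as __init__(), the class constructor.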


    1. You can create estimation plots here at estimationstats.com, or with the DABEST packages which are available in R, Python, and Matlab.

      You can create estimation plots with:

    1. The next variable is "Recency", that is, how long ago the customer made a purchase in the store.

      To calculate RFM we need the recency value. First, we take the date of the most recent transaction as "today", and then compute how long ago each transaction of a given client took place:

      df['Recency'] = (today - df.InvoiceDate)/np.timedelta64(1,'D')

      Calculating frequency and aggregating data of each client may be done with the groupby method:

      abt = df.groupby(['CustomerID']).agg({'Recency':'min', 'MonetaryValue':'sum', 'InvoiceNo':'count'})

      lastly, we can update the column names and display RFM data:

      abt = df.groupby(['CustomerID']).agg({'Recency':'min', 'MonetaryValue':'sum', 'InvoiceNo':'count'})
      abt.rename(columns = {'InvoiceNo':'Frequency'}, inplace = True)
      abt = abt[['Recency', 'Frequency', 'MonetaryValue']]
    2. I remove the missing values in the "CustomerID" variable.

      Deleting rows where value is null:

      df = df[~df.CustomerID.isnull()]

      Assigning different data types to columns:

      df['CustomerID'] = df.CustomerID.astype(int)

      Deleting irrelevant columns:

      df.drop(['Description', 'StockCode', 'Country'], axis = 1, inplace = True)
    3. I check for missing data.

      Checking the % of missing data:

      print(str(round(df.isnull().any(axis=1).sum()/df.shape[0]*100,2))+'% obserwacji zawiera braki w danych.')

      Sample output:

      24.93% obserwacji zawiera braki w danych.
    4. I build a simple data frame with basic information about the dataset.

      Building a simple dataframe with a summary of our columns (data types, sum and % of nulls):

      summary = pd.DataFrame(df.dtypes, columns=['Dtype'])
      summary['Nulls'] = pd.DataFrame(df.isnull().any())
      summary['Sum_of_nulls'] = pd.DataFrame(df.isnull().sum())
      summary['Per_of_nulls'] = round((df.apply(pd.isnull).mean()*100),2)
      summary.Dtype = summary.Dtype.astype(str)

      the output:

                            Dtype  Nulls  Sum_of_nulls  Per_of_nulls
      InvoiceNo            object  False             0         0.000
      StockCode            object  False             0         0.000
      Description          object   True          1454         0.270
      Quantity              int64  False             0         0.000
      InvoiceDate  datetime64[ns]  False             0         0.000
      UnitPrice           float64  False             0         0.000
      CustomerID          float64   True        135080        24.930
      Country              object  False             0         0.000
    1. Programming languages These will probably expose my ignorance pretty nicely.

      When to use different programming languages (advice from an Amazon employee):

      • Java - enterprise applications
      • C# - Microsoft's spin on Java (useful in the Microsoft ecosystem)
      • Ruby - when speed is more important than legibility or debugging
      • Python - same as Ruby but also for ML/AI (don't forget to use type hinting to make life a little saner)
      • Go/Rust - fresh web service where latency and performance were more important than community/library support
      • Haskell/Erlang - for very elegant/mathematical functional approach without a lot of business logic
      • Clojure - in situations where you love Lisp (?)
      • Kotlin/Scala - languages compiling to JVM bytecode (preferable over Clojure). Kotlin works with Java and has great IntelliJ support
      • C - specific classes of applications (operating systems, language design, low-level programming and hardware)
      • C++ - robotics, video games and high frequency trading where the performance gains from no garbage collection make it preferable to Java
      • PHP/Hack - testing server changes without rebuilding. PHP is banned at Amazon due to security reasons, but its successor, Hack, runs a lot of Facebook and Slack's backends
  7. Apr 2020
    1. In Python, when trying to do a dubious operation, you get an error pretty soon. In JavaScript… an undefined can fly through a few layers of abstraction, causing an error in a seemingly unrelated piece of code.

      JavaScript's permissive handling of undefined and of implicit type conversion can hide an error for a long time. For example,

      function add(a,b) { return + (a + b) }
      add('2', 2)

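      will return a number without any error, but not the expected one: '2' + 2 concatenates to the string '22', which the unary + then converts to the number 22 instead of 4.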
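
      For contrast, here is a rough Python sketch of the same call. The add function below only mirrors the JavaScript one and is not taken from the quoted article; float() stands in for the unary plus.

```python
def add(a, b):
    # Mirrors the JavaScript example: add the operands, then coerce to a number.
    return float(a + b)


try:
    add("2", 2)
except TypeError as error:
    # Python refuses to add a str and an int, so the bug surfaces right here.
    outcome = f"TypeError: {error}"
else:
    outcome = "no error"

print(outcome)
```

      The mistake is reported at the line that mixes the types, instead of travelling on as a wrong value.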

    1. How to use the PPS in your own (Python) project

      Using PPS with Python

      • Install ppscore: pip install ppscore
      • Calculate the PPS for a given pandas dataframe:
        import ppscore as pps
        pps.score(df, "feature_column", "target_column")
      • Calculate the whole PPS matrix:
    1. Sometimes, the best way to learn is to mimic others. Here are some great examples of projects that use documentation well:

      Examples of projects that use documentation well

      (check the list below)

    2. Class method docstrings should contain the following: A brief description of what the method is and what it’s used for Any arguments (both required and optional) that are passed including keyword arguments Label any arguments that are considered optional or have a default value Any side effects that occur when executing the method Any exceptions that are raised Any restrictions on when the method can be called

      Class method docstrings should contain:

      • brief description
      • arguments
      • label on default/optional arguments
      • side effects description
      • raised exceptions
      • restrictions on when the method can be called

      (check example below)
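
      A minimal sketch of a method docstring covering the points above. The Account class, its attributes and the section labels are hypothetical and only illustrate the layout; they are not from the quoted article.

```python
class Account:
    """A hypothetical bank account used only to illustrate the docstring layout."""

    def __init__(self, balance=0):
        self.balance = balance
        self.is_open = True

    def withdraw(self, amount, allow_overdraft=False):
        """Withdraw money from the account and return the new balance.

        Arguments:
            amount: The sum to withdraw (required).
            allow_overdraft: Optional, defaults to False. When True the
                balance may go below zero.

        Side effects:
            Decreases self.balance by amount.

        Raises:
            RuntimeError: If the account is closed.
            ValueError: If amount is not positive, or if it exceeds the
                balance while allow_overdraft is False.

        Restrictions:
            Call only while the account is open (self.is_open is True).
        """
        if not self.is_open:
            raise RuntimeError("account is closed")
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance and not allow_overdraft:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


account = Account(balance=100)
print(account.withdraw(30))
```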

    3. From examining the type hinting, you can immediately tell that the function expects the input name to be of a type str, or string. You can also tell that the expected output of the function will be of a type str, or string, as well.

      Type hinting, introduced in Python 3.5, extends Jeff Atwood's rules for commenting by letting the code document itself, as in this example:

      def hello_name(name: str) -> str:
          return f"Hello {name}"
      • the user knows that the function expects an input of type str
      • the output is of type str as well
    4. Docstrings can be further broken up into three major categories: Class Docstrings: Class and class methods Package and Module Docstrings: Package, modules, and functions Script Docstrings: Script and functions

      3 main categories of docstrings

    5. Docstring conventions are described within PEP 257. Their purpose is to provide your users with a brief overview of the object.

      Docstring conventions

    6. All multi-lined docstrings have the following parts: A one-line summary line A blank line proceeding the summary Any further elaboration for the docstring Another blank line

      Multi-line docstring example:

      """This is the summary line

      This is the further elaboration of the docstring. Within this section,
      you can elaborate further on details as appropriate for the situation.
      Notice that the summary and the elaboration is separated by a blank new
      line.
      """

      # Notice the blank line above. Code should continue on this line.
    7. say_hello.__doc__ = "A simple function that says hello... Richie style"

      Example of using __doc__:

      Code (version 1):

      def say_hello(name):
          print(f"Hello {name}, is it me you're looking for?")
      say_hello.__doc__ = "A simple function that says hello... Richie style"

      Code (alternative version):

      def say_hello(name):
          """A simple function that says hello... Richie style"""
          print(f"Hello {name}, is it me you're looking for?")

      >>> help(say_hello)
      Help on function say_hello in module __main__:

      say_hello(name)
          A simple function that says hello... Richie style
    8. class constructor parameters should be documented within the __init__ class method docstring
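
      A short sketch of what this can look like. The Animal class and its parameters are hypothetical, chosen only for illustration.

```python
class Animal:
    """A hypothetical class that stores basic facts about an animal."""

    def __init__(self, name, sound="...", num_legs=4):
        """Create an animal.

        Parameters:
            name: The name of the animal (required).
            sound: The sound the animal makes. Optional, defaults to "...".
            num_legs: The number of legs. Optional, defaults to 4.
        """
        self.name = name
        self.sound = sound
        self.num_legs = num_legs


# The constructor documentation lives on __init__, not on the class docstring.
print(Animal.__init__.__doc__)
```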


    9. Scripts are considered to be single file executables run from the console. Docstrings for scripts are placed at the top of the file and should be documented well enough for users to be able to have a sufficient understanding of how to use the script.

      Docstrings in scripts

    10. Documenting your code, especially large projects, can be daunting. Thankfully there are some tools out and references to get you started

      You can always facilitate documentation with tools.

      (check the table below)

    11. Along with these tools, there are some additional tutorials, videos, and articles that can be useful when you are documenting your project

      Recommended videos to start documenting

      (check the list below)

    12. If you use argparse, then you can omit parameter-specific documentation, assuming it’s correctly been documented within the help parameter of the argparser.parser.add_argument function. It is recommended to use the __doc__ for the description parameter within argparse.ArgumentParser’s constructor.
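
      A minimal sketch of that recommendation, assuming a hypothetical greeting script. The build_parser helper and the argument names are made up for illustration; the argument list is passed explicitly so the sketch does not depend on the real command line.

```python
"""Print a greeting for the given name (hypothetical example script)."""
import argparse


def build_parser(description):
    # The script docstring doubles as the --help description.
    parser = argparse.ArgumentParser(description=description)
    # Each argument documents itself through its help parameter.
    parser.add_argument("name", help="name of the person to greet")
    parser.add_argument("--shout", action="store_true", help="print in upper case")
    return parser


parser = build_parser(__doc__)
args = parser.parse_args(["World", "--shout"])
greeting = f"Hello {args.name}"
print(greeting.upper() if args.shout else greeting)
```

      In a real script the call would be parser.parse_args() with no list, so that the values come from the console.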


    13. There are specific docstrings formats that can be used to help docstring parsers and users have a familiar and known format.

      Different docstring formats:
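
      Two widely used formats are the Google style and the reStructuredText style. As a rough sketch, here is the same hypothetical function documented both ways (the function names are made up for illustration):

```python
def scale_google(value, factor=2):
    """Multiply a value by a factor (Google-style docstring).

    Args:
        value (float): The number to scale.
        factor (float): Optional multiplier, defaults to 2.

    Returns:
        float: The scaled value.
    """
    return value * factor


def scale_rest(value, factor=2):
    """Multiply a value by a factor (reStructuredText-style docstring).

    :param value: The number to scale.
    :type value: float
    :param factor: Optional multiplier, defaults to 2.
    :type factor: float
    :returns: The scaled value.
    :rtype: float
    """
    return value * factor


print(scale_google(3), scale_rest(3, factor=4))
```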

    14. Daniele Procida gave a wonderful PyCon 2017 talk and subsequent blog post about documenting Python projects. He mentions that all projects should have the following four major sections to help you focus your work:

      Public and Open Source Python projects should have the docs folder, and inside of it:

      • Tutorials
      • How-To Guides
      • References
      • Explanations

      (check the table below for a summary)

    15. Since everything in Python is an object, you can examine the directory of the object using the dir() command

      The dir() function lists the attributes of a Python object, for example dir(str).

      Among the names returned by dir(str) is the __doc__ attribute, which holds the object's docstring.
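
      A quick way to try this out, as a small sketch using only built-ins:

```python
names = dir(str)

# __doc__ is one of the attributes that dir() reports.
print("__doc__" in names)

# The attribute itself holds the docstring; show only its first line.
print(str.__doc__.splitlines()[0])

# A few of the public method names that dir() lists for str.
print([name for name in names if not name.startswith("_")][:5])
```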

    16. Documenting your Python code is all centered on docstrings. These are built-in strings that, when configured correctly, can help your users and yourself with your project’s documentation.

      Docstrings - built-in strings that help with documentation

    17. Along with docstrings, Python also has the built-in function help() that prints out the objects docstring to the console.

      The help() function.

      Typing help(str) prints the documentation of the str class, including its docstring, to the console.

    18. The general layout of the project and its documentation should be as follows:
      ├── project/  # Project source code
      ├── docs/
      ├── README
      ├── examples.py

      (private, shared or open sourced)

    19. In all cases, the docstrings should use the triple-double quote (""") string format.

      Think only about """ when using docstrings

    1. Since we have much faster CPUs now, numerical calculations are done in Python which is much slower than Fortran. So numerical calculations basically take the same amount of time as they did 20 years ago.

      Python vs Fortran ;)

    1. Python unit testing libraries, like unittest, can be used within a notebook, but standard CI/CD tooling has trouble dealing with notebooks for the same reasons that notebook diffs are hard to read.

      The unittest library can run inside a notebook, but standard CI/CD tooling handles notebooks poorly.

    1. def handle_exception(self, job, *exc_info):

      To unit test an exception handler:

      import sys

      worker = Worker(..., exception_handlers=[handle_exception])
      try:
          raise Exception()
      except Exception:
          # Capture (type, value, traceback) of the exception just raised
          exc_info = sys.exc_info()
      worker.handle_exception(job, *exc_info)
    1. Python contributed examples. Mic VAD Streaming: This example demonstrates getting audio from microphone, running Voice-Activity-Detection and then outputting text. Full source code available on https://github.com/mozilla/DeepSpeech-examples. VAD Transcriber: This example demonstrates VAD-based transcription with both console and graphical interface. Full source code available on https://github.com/mozilla/DeepSpeech-examples.
    1. Python API Usage example. Examples are from native_client/python/client.cc.

      Creating a model instance and loading model:

      ds = Model(args.model)

      Performing inference:

      if args.extended:
          print(metadata_to_string(ds.sttWithMetadata(audio, 1).transcripts[0]))
      elif args.json:
          print(metadata_json_output(ds.sttWithMetadata(audio, 3)))
      else:
          print(ds.stt(audio))
    1. failure_ttl

      How long to keep a failed job.

    2. result_ttl=600

      How long to keep a successful job.

    3. job.meta['handled_by'] = socket.gethostname() job.save_meta()

      You can attach metadata to a job, for example to keep track of how many times it has been retried.

    1. w = Worker([q], exception_handlers=[foo_handler, bar_handler])

      Exception handlers are attached to the worker.

    2. def my_handler(job, exc_type, exc_value, traceback): # do custom things here

      Write an exception handler that requeues a failed job.
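
      A hedged sketch of such a handler. MAX_RETRIES, the "retries" meta key and the FakeJob stand-in are assumptions made for illustration; against a real worker the handler would receive an rq job, and job.requeue() is assumed to be available in the installed rq version.

```python
MAX_RETRIES = 3  # hypothetical limit, not an rq setting


def retry_handler(job, exc_type, exc_value, traceback):
    """Requeue a failed job until MAX_RETRIES is reached."""
    retries = job.meta.get("retries", 0)
    if retries >= MAX_RETRIES:
        # Give up: returning True lets the next handler in the chain run.
        return True
    job.meta["retries"] = retries + 1
    job.save_meta()
    job.requeue()
    # Returning False stops the remaining handlers from running.
    return False


class FakeJob:
    """Stand-in for an rq job so the sketch runs without Redis."""

    def __init__(self):
        self.meta = {}
        self.requeued = 0

    def save_meta(self):
        pass

    def requeue(self):
        self.requeued += 1


job = FakeJob()
for _ in range(5):
    retry_handler(job, ValueError, ValueError("boom"), None)
print(job.meta["retries"], job.requeued)
```

      The handler would then be attached to the worker through the exception_handlers list.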

    1. How speech recognition works, What packages are available on PyPI; and How to install and use the SpeechRecognition package—a full-featured and easy-to-use Python speech recognition library.
    2. The Ultimate Guide To Speech Recognition With Python
    1. How to setup and use Stanford CoreNLP Server with Python. Khalid Alnajjar, August 20, 2017, Natural Language Processing (NLP). Stanford CoreNLP is a great Natural Language Processing (NLP) tool for analysing text. Given a paragraph, CoreNLP splits it into sentences then analyses it to return the base forms of words in the sentences, their dependencies, parts of speech, named entities and many more. Stanford CoreNLP not only supports English but also other 5 languages: Arabic, Chinese, French, German and Spanish. Stanford CoreNLP is implemented in Java. In some cases (e.g. your main code-base is written in different language or you simply do not feel like coding in Java), you can setup a Stanford CoreNLP Server and, then, access it through an API. In this post, I will show how to setup a Stanford CoreNLP Server locally and access it using python.
    1. For example, the test_decrement functions given earlier are failing because the assertion itself is faulty.

      Debugging tests themselves

      1. Set a breakpoint on the first line of the failing function (e.g. test_decrement)
      2. Click the "Debug Test" option above the function
      3. Open Debug Console and type: inc_dec.decrement(3) to see what is the actual output when we use x=3
      4. Stop the debugger and correct the tests
      5. Save the test file and run the tests again to look for a positive result
    2. Support for running tests in parallel with pytest is available through the pytest-xdist package.

      pytest-xdist provides support for parallel testing.

      1. To enable it on Windows:

      py -3 -m pip install pytest-xdist

      2. Create a file pytest.ini in your project directory and specify in it the number of CPUs to be used (e.g. 4).
      3. Run your tests
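
      The pytest.ini from the step above might look like the sketch below, assuming 4 CPUs. The -n option comes from pytest-xdist and sets the number of worker processes.

```ini
[pytest]
addopts=-n4
```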
    3. With pytest, failed tests also appear in the Problems panel, where you can double-click on an issue to navigate directly to the test

      pytest displays failed tests also in PROBLEMS

    4. VS Code also shows test results in the Python Test Log output panel (use the View > Output menu command to show the Output panel, then select Python Test Log).

      Another way to view the test results:

      View > Output > Python Test Log

    5. For Python, test discovery also activates the Test Explorer with an icon on the VS Code activity bar. The Test Explorer helps you visualize, navigate, and run tests

      Test Explorer is activated after discovering tests in Python:

    6. Once VS Code recognizes tests, it provides several ways to run those tests

      After discovering tests, we can run them, for example, using CodeLens:

    7. You can trigger test discovery at any time using the Python: Discover Tests command.

      When python.testing.autoTestDiscoverOnSaveEnabled is set to true, tests are also rediscovered whenever a test file is saved.

      If discovery succeeds, the status bar shows Run Tests instead:

    8. Sometimes tests placed in subfolders aren't discovered because such test files cannot be imported. To make them importable, create an empty file named __init__.py in that folder.

      Tip for when tests in subfolders aren't discovered: create an empty __init__.py file in that folder.

    9. Testing in Python is disabled by default. To enable testing, use the Python: Configure Tests command on the Command Palette.

      Start testing in VS Code by using Python: Configure Tests (it automatically chooses one testing framework and disables the rest).

      Otherwise, you can configure tests manually by setting exactly one of the following to true:

      • python.testing.unittestEnabled
      • python.testing.pytestEnabled
      • python.testing.nosetestsEnabled
    10. python.testing.pytestArgs: Looks for any Python (.py) file whose name begins with "test_" or ends with "_test", located anywhere within the current folder and all subfolders.

      Default behaviour of test discovery by pytest framework

    11. python.testing.unittestArgs: Looks for any Python (.py) file with "test" in the name in the top-level project folder.

      Default behaviour of test discovery by unittest framework

    12. Create a file named test_unittest.py that contains a test class with two test methods

      Sample test file using unittest framework. inc_dec is the file that's being tested:

      import inc_dec    # The code to test
      import unittest   # The test framework

      class Test_TestIncrementDecrement(unittest.TestCase):
          def test_increment(self):
              self.assertEqual(inc_dec.increment(3), 4)  # checks that the result is 4 when x = 3

          def test_decrement(self):
              self.assertEqual(inc_dec.decrement(3), 4)  # faulty assertion (expects 4), so this test fails

      if __name__ == '__main__':
          unittest.main()
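
      A self-contained sketch of the same test with the assertion corrected. The two functions are an assumed version of what inc_dec contains, inlined here so the example runs on its own.

```python
import unittest


# Assumed contents of inc_dec.py, inlined so the sketch is self-contained.
def increment(x):
    return x + 1


def decrement(x):
    return x - 1


class TestIncrementDecrement(unittest.TestCase):
    def test_increment(self):
        self.assertEqual(increment(3), 4)

    def test_decrement(self):
        # Corrected expectation: 3 - 1 is 2, not 4.
        self.assertEqual(decrement(3), 2)


# Run the tests programmatically; unittest.main() would exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIncrementDecrement)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```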
    13. Nose2, the successor to Nose, is just unittest with plugins

      Nose2 testing

    14. Python tests are Python classes that reside in separate files from the code being tested.
    15. For general background on unit testing, see Unit Testing on Wikipedia. For a variety of useful unit test examples, see https://github.com/gwtw/py-sorting
    16. each test is very simple: invoke the function with an argument and assert the expected return value.

      e.g. test of an exact number entry:

          def test_validator_valid_string():
              # The exact assertion call depends on the framework as well
              assert validate_account_number_format("1234567890") is True
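
      To make the idea concrete, here is a sketch with a made-up validation rule: the validator below treats exactly 10 ASCII digits as valid. That rule and the two extra test cases are assumptions for illustration, not part of the quoted documentation.

```python
def validate_account_number_format(account_number):
    """Hypothetical rule: valid means a string of exactly 10 ASCII digits."""
    return (
        isinstance(account_number, str)
        and len(account_number) == 10
        and account_number.isascii()
        and account_number.isdigit()
    )


def test_validator_valid_string():
    assert validate_account_number_format("1234567890") is True


def test_validator_too_short():
    assert validate_account_number_format("12345") is False


def test_validator_non_digits():
    assert validate_account_number_format("12345abcde") is False


# pytest would collect these functions automatically; call them directly here.
for test in (test_validator_valid_string, test_validator_too_short, test_validator_non_digits):
    test()
print("all validator tests passed")
```

      Each test invokes the function with one argument and asserts one expected return value, which keeps failures easy to trace.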