222 Matching Annotations
  1. Last 7 days
    1. cachew lets you cache function calls into an sqlite database on your disk in a matter of single decorator (similar to functools.lru_cache). The difference from functools.lru_cache is that cached data is persisted between program runs, so next time you call your function, it will only be a matter of reading from the cache.

      cachew tool isolates the complexity of database access patterns in a Python library

    2. supporting this field is extremely easy If you keep raw data, it's just a matter of adding a getter method to the Article class.

      Way of supporting a new field in JSON is much easier than in a relational database:

      @property
      def highlights(self) -> Sequence[Highlight]:
          default = [] # defensive to handle older export formats that had no annotations
          jsons = self.json.get('annotations', default)
          return list(map(Highlight, jsons))
      
    3. query language doesn't necessarily mean a database. E.g. see pandas which is capable of what SQL is capable of, and even more convenient than SQL for our data exploration purposes.

      Query language, not always = database. For example, see pandas

  2. Feb 2020
    1. By the way, what's up with Iterator everywhere?

      I really don't understand Iterator. How can it be used to eliminate temporary lists?

    1. # Import the Matcher from spacy.matcher import Matcher

      ¿Cómo importar el Matcher de Spacy?

    2. In spaCy, attributes that return strings usually end with an underscore – attributes without the underscore return an ID.

      Spacy: atributos con y sin , como por ejemplo pos_ y id

  3. Jan 2020
    1. The nonlocal keyword is used to work with variables inside nested functions, where the variable should not belong to the inner function. Use the keyword nonlocal to declare that the variable is not local.

      nonlocal is used to declare variables inside the nested functions.

      Example (if nonlocal wouldn't be used, the output would be "John", not "hello"):

      def myfunc1():
        x = "John"
        def myfunc2():
          nonlocal x
          x = "hello"
        myfunc2()
        return x
      
      print(myfunc1())
      
    1. We only need to use global keyword in a function if we want to do assignments / change them. global is not needed for printing and accessing.

      global inside a function is:

      • required if we want to assign/change variable
      • not required if we just want to print/access a variable.

      Example code:

      # This function modifies global variable 's' 
      def f(): 
          global s 
          print s 
          s = "Look for Geeksforgeeks Python Section"
          print s  
      
      # Global Scope 
      s = "Python is great!" 
      f() 
      print s
      
    1. Since we have much faster CPUs now, numerical calculations are done in Python which is much slower than Fortran. So numerical calculations basically take the same amount of time as they did 20 years ago.

      Python vs Fortran ;)

    1. Sometimes, the best way to learn is to mimic others. Here are some great examples of projects that use documentation well:

      Examples of projects that use documentation well

      (chech the list below)

    2. Along with these tools, there are some additional tutorials, videos, and articles that can be useful when you are documenting your project

      Recommended videos to start documenting

      (check the list below)

    3. Documenting your code, especially large projects, can be daunting. Thankfully there are some tools out and references to get you started

      You can always facilitate documentation with tools.

      (check the table below)

    4. Daniele Procida gave a wonderful PyCon 2017 talk and subsequent blog post about documenting Python projects. He mentions that all projects should have the following four major sections to help you focus your work:

      Public and Open Source Python projects should have the docs folder, and inside of it:

      • Tutorials
      • How-To Guides
      • References
      • Explanations

      (check the table below for a summary)

    5. The general layout of the project and its documentation should be as follows:
      project_root/
      │
      ├── project/  # Project source code
      ├── docs/
      ├── README
      ├── HOW_TO_CONTRIBUTE
      ├── CODE_OF_CONDUCT
      ├── examples.py
      

      (private, shared or open sourced)

    6. There are specific docstrings formats that can be used to help docstring parsers and users have a familiar and known format.

      Different docstring formats:

    7. If you use argparse, then you can omit parameter-specific documentation, assuming it’s correctly been documented within the help parameter of the argparser.parser.add_argument function. It is recommended to use the __doc__ for the description parameter within argparse.ArgumentParser’s constructor.

      argparse

    8. Scripts are considered to be single file executables run from the console. Docstrings for scripts are placed at the top of the file and should be documented well enough for users to be able to have a sufficient understanding of how to use the script.

      Docstrings in scripts

    9. class constructor parameters should be documented within the __init__ class method docstring

      init

    10. Class method docstrings should contain the following: A brief description of what the method is and what it’s used for Any arguments (both required and optional) that are passed including keyword arguments Label any arguments that are considered optional or have a default value Any side effects that occur when executing the method Any exceptions that are raised Any restrictions on when the method can be called

      Class method should contain:

      • brief description
      • arguments
      • label on default/optional arguments
      • side effects description
      • raised exceptions
      • restrictions on when the method can be called

      (check example below)

    11. Docstrings can be further broken up into three major categories: Class Docstrings: Class and class methods Package and Module Docstrings: Package, modules, and functions Script Docstrings: Script and functions

      3 main categories of docstrings

    12. All multi-lined docstrings have the following parts: A one-line summary line A blank line proceeding the summary Any further elaboration for the docstring Another blank line

      Multi-line docstring example:

      """This is the summary line
      
      This is the further elaboration of the docstring. Within this section,
      you can elaborate further on details as appropriate for the situation.
      Notice that the summary and the elaboration is separated by a blank new
      line.
      
      # Notice the blank line above. Code should continue on this line.
      
    13. In all cases, the docstrings should use the triple-double quote (""") string format.

      Think only about """ when using docstrings

    14. Docstring conventions are described within PEP 257. Their purpose is to provide your users with a brief overview of the object.

      Docstring conventions

    15. say_hello.__doc__ = "A simple function that says hello... Richie style"

      Example of using __doc:

      Code (version 1):

      def say_hello(name):
          print(f"Hello {name}, is it me you're looking for?")
      
      say_hello.__doc__ = "A simple function that says hello... Richie style"
      

      Code (alternative version):

      def say_hello(name):
          """A simple function that says hello... Richie style"""
          print(f"Hello {name}, is it me you're looking for?")
      

      Input:

      >>> help(say_hello)
      

      Returns:

      Help on function say_hello in module __main__:
      
      say_hello(name)
          A simple function that says hello... Richie style
      
    16. Since everything in Python is an object, you can examine the directory of the object using the dir() command

      dir() function examines directory of Python objects. For example dir(str).

      Inside dir(str) you can find interesting property __doc__

    17. Along with docstrings, Python also has the built-in function help() that prints out the objects docstring to the console.

      help() function.

      After typing help(str) it will return all the info about str object

    18. Documenting your Python code is all centered on docstrings. These are built-in strings that, when configured correctly, can help your users and yourself with your project’s documentation.

      Docstrings - built-in strings that help with documentation

    19. From examining the type hinting, you can immediately tell that the function expects the input name to be of a type str, or string. You can also tell that the expected output of the function will be of a type str, or string, as well.

      Type hinting introduced in Python 3.5 extends 4 rules of Jeff Atwood and comments the code itself, such as this example:

      def hello_name(name: str) -> str:
          return(f"Hello {name}")
      
      • user knows that the code expects input of type str
      • the same about output
    1. Dutch programmer Guido van Rossum designed Python in 1991, naming it after the British television comedy Monty Python's Flying Circus because he was reading the show's scripts at the time.

      Origins of Python name

  4. Dec 2019
    1. Werkzeug provides a development server: a simple web server that you can run with a single command and almost no configuration. When you do flask run (or werkzeug.serving.run_simple()), this development server is what you are getting
    1. Superseded by (replacement project): https://github.com/QTodoTxt/QTodoTxt2/


      Before I realized there was a replacement project:

      Blah. Looks good but unmaintained. Too many competing forks.

      Other interesting fork: https://github.com/kmicc/QTodoTxt/tree/dev

      This branch is 17 commits ahead, 201 commits behind QTodoTxt:dev. https://github.com/QTodoTxt/QTodoTxt/compare/master...kmicc:dev https://github.com/kmicc/QTodoTxt/network

    1. I want to get the selected number of bins from the slider and pass that number into a python method and do some calculation/manipulation (return: “You have selected 30bins and I came from a Python Function”) inside of it then return some value back to my R Shiny dashboard and view that result in a text field.

      Using Python scripts inside R Shiny (in 6 steps):

      1. In ui.R create textOutput: textOutput("textOutput") (after plotoutput()).
      2. In server.R create handler: output$textOutput <- renderText({ }].
      3. Create python_ref.py and insert this code:
      4. Import reticulate library: library(reticulate).
      5. source_python() function will make Python available in R:
      6. Make sure you've these files in your directory:
      • app.R
      • python_ref.py and that you've imported the reticulate package to R Environment and sourced the script inside your R code.

      Hit run.

    1. x = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]] arr = [i for j in x for i in j] print(arr) >>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

      Flattening a multi-dimensional matrix into a 1-D array using list comprehension:

      x = [[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]
      
      arr = [i for j in x for i in j]
      print(arr)
      >>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
      
    2. # Creating a 5x5 matrix arr = [[i for i in range(5)] for j in range(5)] arr >>> [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

      Nested for loop using list comprehension to come up with 5x5 matrix:

      arr = [[i for i in range(5)] for j in range(5)]
      arr
      >>> [[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]]
      
    3. arr = [i for i in range(10) if i % 2 == 0] print(arr) >>> [0, 2, 4, 6, 8] arr = ["Even" if i % 2 == 0 else "Odd" for i in range(10)] print(arr) >>> ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']

      2 examples of conditional statements in list comprehension:

      arr = [i for i in range(10) if i % 2 == 0]
      print(arr)
      >>> [0, 2, 4, 6, 8]
      

      and:

      arr = ["Even" if i % 2 == 0 else "Odd" for i in range(10)]
      print(arr)
      >>> ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']
      
    4. x = [2,45,21,45] y = {i:v for i,v in enumerate(x)} print(y) >>> {0: 2, 1: 45, 2: 21, 3: 45}

      List comprehension in Python to create a simple dictionary:

      x = [2,45,21,45]
      y = {i:v for i,v in enumerate(x)}
      print(y)
      >>> {0: 2, 1: 45, 2: 21, 3: 45}
      
    1. The reason that Julia is fast (ten to 30 times faster than Python) is because it is compiled and not interpreted

      Julia seems to be even faster than Scala when comparing to the speed of Python

    2. Scala is ten times faster than Python

      Interesting estimation

    1. Vaex is an open-source DataFrame library which enables the visualisation, exploration, analysis and even machine learning on tabular datasets that are as large as your hard-drive. To do this, Vaex employs concepts such as memory mapping, efficient out-of-core algorithms and lazy evaluations.

      Vaex - library to manage as large datasets as your HDD, thanks to:

      • memory mapping
      • efficient out-of-core algorithms
      • lazy evaluations.

      All wrapped in a Pandas-like API

    1. HTTPretty

      Use this to mock the request object so that it will be easier to mock the request context manager.

    1. vectors here have length 7 instead of 6 because of the extra 0 element at the beginning. This is an inconsequential detail - Keras reserves index 0 and never assigns it to any word.

      Keras always reserves 0 and never assigns any word to it; therefore, even when we have 6 words, we end up with the length of 7:

      [0. 1. 1. 1. 0. 0. 0.]

    2. Here’s my preferred way of doing it, which uses Keras’s Tokenizer class

      Keras's Tokenizer Class - Victor's preferred way of implementing BOW in Python

  5. Nov 2019
    1. import operator s = sorted(s, key = operator.itemgetter(1, 2))

      Sort by multiple indexes.

    2. to reverse to only one attribute, you can sort twice: first by the secondary s = sorted(s, key = operator.itemgetter(2)) then by the primary s = sorted(s, key = operator.itemgetter(1), reverse=True)

      This makes sense, particularly from specification of sort order in Pandas, if I recall correctly you usually specify the primary last which makes me think on the backend implementation it does something similar to this where it sorts from the lowest priority to the highest priority.

  6. Oct 2019
    1. Python code should be elegant, concise, and readable. It should be beautiful. The ultimate resource on how to write beautiful Python code is Style Guide for Python Code or PEP 8

      Write beautiful Python code with

      • Style Guide for Python Code
      • PEP 8

      PEP 8 provides the style guide for Python code, and PEP 20 represents the principles of Python language

    2. what if an exception occurs while processing your file? Then my_file.close() is never executed. You can handle this with exception-handling syntax or with context managers

      **`with a`` block to handle exceptions:

      >>> with open('filename.csv', 'w') as my_file:
      ...     # do something with `my_file`
      

      Using the with block means that the special methods .enter() and .exit() are called, even in the cases of exceptions

    3. open a file and process it

      Open a file:

      >>> my_file = open('filename.csv', 'w')
      >>> # do something with `my_file`
      

      Close the file to properly manage memory:

      >>> my_file = open('filename.csv', 'w')
      >>> # do something with `my_file and`
      >>> my_file.close()
      
    4. The author of the class probably begins the names with the underscore(s) to tell you, “don’t use it”

      __ <--- don't touch!

    5. Python doesn’t have real private class members. However, there’s a convention that says that you shouldn’t access or modify the members beginning with the underscore (_) outside their instances. They are not guaranteed to preserve the existing behavior.

      Avoiding accessing protected class members Consider the code:

      >>> class C:
      ...     def __init__(self, *args):
      ...         self.x, self._y, self.__z = args
      ... 
      >>> c = C(1, 2, 4)
      

      The instances of class C have three data members: .x, .y, and ._Cz (because z has two underscores, unlike y). If a member’s name begins with a double underscore (dunder), it becomes mangled, that is modified. That’s why you have ._Cz instead of ._z.

      Now, it's quite OK to access/modify .x directly:

      >>> c.x  # OK
      1
      

      You can also access ._y, from outside its instance, but it's considered a bad practice:

      >>> c._y  # Possible, but a bad practice!
      2
      

      You can’t access .z because it’s mangled, but you can access or modify ._Cz:

      >>> c.__z # Error!
      Traceback (most recent call last):
      File "", line 1, in 
      AttributeError: 'C' object has no attribute '__z'
      >>> c._C__z # Possible, but even worse!
      4
      >>>
      
    6. it’s a good practice to use properties when you can and C++-like getters and setters when you have to

      Use

      • properties <--- when you can
      • getters and setters <--- when you have to
    7. it’s often more elegant to define and use properties, especially in simple cases

      Defining some properties (considered to be more Pythonic):

      >>> class C:
      ...     @property
      ...     def x(self):
      ...         return self.__x
      ...     @x.setter
      ...     def x(self, value):
      ...         self.__x = value
      

      Result:

      >>> c = C()
      >>> c.x = 2
      >>> c.x
      2
      
    8. Python allows defining getter and setter methods similarly as C++ and Java

      Getters and Setters in Python:

      >>> class C:
      ...     def get_x(self):
      ...         return self.__x
      ...     def set_x(self, value):
      ...         self.__x = value
      

      Get and set the state of the object:

      >>> c = C()
      >>> c.set_x(2)
      >>> c.get_x()
      2
      
    9. You can keep away from that with some additional logic. One of the ways is this:

      You can keep away from the problem:

      >>> def f(value, seq=[]):
      ...     seq.append(value)
      ...     return seq
      

      with some additional logic:

      >>> def f(value, seq=None):
      ...     if seq is None:
      ...         seq = []
      ...     seq.append(value)
      ...     return seq
      

      Shorter version:

      >>> def f(value, seq=None):
      ...     if not seq:
      ...         seq = []
      ...     seq.append(value)
      ...     return seq
      

      The result:

      >>> f(value=2)
      [2]
      >>> f(value=4)
      [4]
      >>> f(value=8)
      [8]
      >>> f(value=16)
      [16]
      
    10. Python has a very flexible system of providing arguments to functions and methods. Optional arguments are a part of this offer. But be careful: you usually don’t want to use mutable optional arguments

      Avoiding multiple optional arguments:

      >>> def f(value, seq=[]):
      ...     seq.append(value)
      ...     return seq
      

      If you don't provide seq, f() appends a value to an empty list and returns something like [value]:

      >>> f(value=2)
      [2]
      

      Don't be fooled. This option isn't fine...

      >>> f(value=4)
      [2, 4]
      >>> f(value=8)
      [2, 4, 8]
      >>> f(value=16)
      [2, 4, 8, 16]
      
    11. The Pythonic way is to exploit the fact that zero is interpreted as False in a Boolean context, while all other numbers are considered as True

      Comparing to zero - Pythonic way:

      >>> bool(0)
      False
      >>> bool(-1), bool(1), bool(20), bool(28.4)
      (True, True, True, True)
      

      Using if item instead of if item != 0:

      >>> for item in x:
      ...     if item:
      ...         print(item)
      ... 
      1
      2
      3
      4
      

      You can also use if not item instead of if item == 0

    12. When you have numeric data, and you need to check if the numbers are equal to zero, you can but don’t have to use the comparison operators == and !=

      Comparing to zero:

      >>> x = (1, 2, 0, 3, 0, 4)
      >>> for item in x:
      ...     if item != 0:
      ...         print(item)
      ... 
      1
      2
      3
      4
      
    13. Iterating over a dictionary yields its keys

      Iterating over a dictionary:

      >>> z = {'a': 0, 'b': 1}
      >>> for k in z:
      ... print(k, z[k])
      ... 
      a 0
      b 1
      

      Applying method .items():

      >>> for k, v in z.items():
      ...     print(k, v)
      ... 
      a 0
      b 1
      

      You can also use the methods .keys() and .values()

    14. what if you want to iterate over two or more sequences? Of course, you can use the range again

      Iterating over two or more sequences:

      >>> y = 'abcde'
      >>> for i in range(len(x)):
      ...     print(x[i], y[i])
      ... 
      1 a
      2 b
      4 c
      8 d
      16 e
      

      Better solution by applying zip:

      >>> for item in zip(x, y):
      ...     print(item)
      ... 
      (1, 'a')
      (2, 'b')
      (4, 'c')
      (8, 'd')
      (16, 'e')
      

      Combining it with unpacking:

      >>> for x_item, y_item in zip(x, y):
      ...     print(x_item, y_item)
      ... 
      1 a
      2 b
      4 c
      8 d
      16 e
      
    15. Sometimes you need both the items from a sequence and the corresponding indices

      Reversing with indices:

      >>> for i in range(len(x)):
      ...     print(i, x[i])
      ... 
      0 1
      1 2
      2 4
      3 8
      4 16
      

      Better way by using enumerate:

      >>> for i, item in enumerate(x):
      ...     print(i, item)
      ... 
      0 1
      1 2
      2 4
      3 8
      4 16
      
    16. But what if you want to iterate in the reversed order? Of course, the range is an option again

      Iterating over a reversed order:

      >>> for i in range(len(x)-1, -1, -1):
      ...     print(x[i])
      ... 
      16
      8
      4
      2
      1
      

      More elegant way:

      >>> for item in x[::-1]:
      ...     print(item)
      ... 
      16
      8
      4
      2
      1
      

      Pythonic way of reversing an order:

      >>> for item in reversed(x):
      ...     print(item)
      ... 
      16
      8
      4
      2
      1
      
    17. In almost all cases, you can use the range to get an iterator that yields integers

      Iterating over Sequences and Mappings

      >>> x = [1, 2, 4, 8, 16]
      >>> for i in range(len(x)):
      ...     print(x[i])
      ... 
      1
      2
      4
      8
      16
      

      better way of iterating over a sequence:

      >>> for item in x:
      ...     print(item)
      ... 
      1
      2
      4
      8
      16
      
    18. It’s possible to check whether a variable refers to it with the comparison operators == and !=

      Checking against None with == and !=

      >>> x, y = 2, None
      >>> x == None
      False
      >>> y == None
      True
      >>> x != None
      True
      >>> y != None
      False
      

      More Pythonic way by using is and is not:

      >>> x is None
      False
      >>> y is None
      True
      >>> x is not None
      True
      >>> y is not None
      False
      
    19. None is a special and unique object in Python. It has a similar purpose, like null in C-like languages

      None (Python) ==similar== Null (C)

    20. Python also supports chained assignments. So, if you want to assign the same value to multiple variables, you can do it in a straightforward way

      Chained assignments <--- assign the same value to multiple variables:

      >>> x = 2
      >>> y = 2
      >>> z = 2
      

      More elegant way:

      >>> x, y, z = 2, 2, 2
      

      Chained assignments:

      >>> x = y = z = 2
      >>> x, y, z
      (2, 2, 2)
      
    21. Python allows you to chain the comparison operations. So, you don’t have to use and to check if two or more comparisons are True

      Chaining <--- checking if two or more operations are True

      >>> x = 4
      >>> x >= 2 and x <= 8
      True
      

      More compact (mathematical) form:

      >>> 2 <= x <= 8
      True
      >>> 2 <= x <= 3
      False
      
    22. Unpacking can be used for the assignment to multiple variables in more complex cases

      Unpacking <--- assign even more variables

      >>> x = (1, 2, 4, 8, 16)
      >>> a = x[0]
      >>> b = x[1]
      >>> c = x[2]
      >>> d = x[3]
      >>> e = x[4]
      >>> a, b, c, d, e
      (1, 2, 4, 8, 16)
      

      more readable approach:

      >>> a, b, c, d, e = x
      >>> a, b, c, d, e
      (1, 2, 4, 8, 16)
      

      even cooler (* collects values not assigned to others):

      >>> a, *y, e = x
      >>> a, e, y
      (1, 16, [2, 4, 8])
      
    23. the most concise and elegant variables swap

      Unpacking <--- swap values

      >>> a, b = b, a
      >>> a
      'my-string'
      >>> b
      2
      
    24. You can use unpacking to assign values to your variables

      Unpacking <--- assign values

      >>> a, b = 2, 'my-string'
      >>> a
      2
      >>> b
      'my-string'
      
    25. following the rules called The Zen of Python or PEP 20

      The Zen of Python or PEP 20 <--- rules followed by Python

    1. One of my favorite languages that I have had the pleasure to use is Python. Python supports named parameters without any trickery.... Since I started using Python (some time ago) everything became easier. I believe that every language should support named parameters, but that just isn't the case.
    1. this article merely scratches the surface. If you want to dig deeper into concurrency in Python, there is an excellent talk titled Thinking about Concurrency by Raymond Hettinger on the subject. Make sure to also check out the slides whilst you’re at it

      Learn more about concurrency:

    2. Introducing multiprocessing now is a cinch; I just replace ThreadPoolExecutor with ProcessPoolExecutor in the previous listing

      Replacing multithreading with multiprocessing:

      replace ThreadPoolExecutor with ProcessPoolExecutor

    3. As the name suggests, multiprocessing spawns processes, while multithreading spawns threads. In Python, one process can run several threads. Each process has its proper Python interpreter and its proper GIL. As a result, starting a process is a heftier and more time-consuming undertaking than starting a thread.

      Reason for multiprocessing being slower than multithreading:

      Multiprocessing is more time-consuming to use because of its architecture

    4. First of all, write a script that carries out the task in a sequential fashion. Secondly, transform the script so that it carries out the task using the map function. Lastly, replace map with a neat function from the concurrent.futures module

      Concurrent Python programs in 3 steps:

      1. Write a script that carries out the task in a sequential fashion.
      2. Transform the script so that it carries out the task using the map function.
      3. Replace map with a neat function from the concurrent.futures module.
    5. Python standard library makes it fairly easy to create threads and processes

      Fortunately, there is a workaround for concurrent programming in Python

    6. Python is a poor choice for concurrent programming. A principal reason for this is the ‘Global Interpreter Lock’ or GIL. The GIL ensures that only one thread accesses Python objects at a time, effectively preventing Python from being able to distribute threads onto several CPUs by default

      Python isn't the best choice for concurrent programming

    1. Indicate number of NA values placed in non-numeric columns.

      This is only true when using the Python parsing engine.

      Filled 3 NA values in column name
      

      If using the C parsing engine you get something like the following output:

      Tokenization took: 0.01 ms
      Type conversion took: 0.70 ms
      Parser memory cleanup took: 0.01 ms
      
    1. It’s just that it often makes sense to write code in the order JOIN / WHERE / GROUP BY / HAVING. (I’ll often put a WHERE first to improve performance though, and I think most database engines will also do a WHERE first in practice)

      Pandas usually writes code in this syntax:

      1. JOIN
      2. WHERE
      3. GROUP BY
      4. HAVING

      Example:

      1. df = thing1.join(thing2) # like a JOIN
      2. df = df[df.created_at > 1000] # like a WHERE
      3. df = df.groupby('something', num_yes = ('yes', 'sum')) # like a GROUP BY
      4. df = df[df.num_yes > 2] # like a HAVING, filtering on the result of a GROUP BY
      5. df = df[['num_yes', 'something1', 'something']] # pick the columns I want to display, like a SELECT
      6. df.sort_values('sometthing', ascending=True)[:30] # ORDER BY and LIMIT
      7. df[:30]
  7. Sep 2019
    1. Which works fine, but make you use lambda. You could rather use the operator module:import operator mylist = list(zip(range(40, 240), range(-100, 100))) sorted(mylist, key=operator.itemgetter(1))
    1. py27-django{18,19,110,111,111tip},

      An example of how to test multiple python versions against multiple library versions

    1. the __ methods allow us to interact with core concepts of the python language. You can see them also as a mechanism of implementing behaviours, interface methods.

      __ methods

    2. Dunder or magic method, are methods that start and end with double _ like __init__ or __str__. This kind of methods are the mechanism we use to interact directly with python's data model

      Dunder methods

    3. use pyenv. With it, you will be able to have any version you want at your disposal, very easy.

      pyenv allows you to easily switch between Python versions

    1. Another common component to profile is the memory usage. The purpose is to find memory leaks and optimize the memory usage in your Python programs

      Memory usage can be tracked with pympler or objgraph libraries

    2. The profile module gives similar results with similar commands. Typically, you switch to profile if cProfile isn’t available

      cProfile > profile. Use profile only when cProfile isn't available

    3. Line profiling, as the name suggests, means to profile your Python code line by line

      Line profiling

    4. If a method has an acceptable speed but is so frequently called that it becomes a huge time sink, you would want to know this from your profiler

      We also want to measure the frequency of method calls. cProfile can highlight the number of function calls and how many of those are native calls

    5. method profiling tool like cProfile (which is available in the Python language), the timing metrics for methods can show you statistics, such as the number of calls (shown as ncalls), total time spent in the function (tottime), time per call (tottime/ncalls and shown as percall), cumulative time spent in a function (cumtime), and cumulative time per call (quotient of cumtime over the number of primitive calls and shown as percall after cumtime)

      cProfile is one of the Python tools to measure method execution time. Specifically:

      • number of calls (shown as ncalls)
      • total time spent in the function (tottime)
      • time per call (tottime/ncalls and shown as percall)
      • cumulative time spent in a function (cumtime)
      • cumulative time per call (quotient of cumtime over the number of primitive calls and shown as percall after cumtime)
    6. what parts of the software do we profile (measure its performance metrics)

      Most profiled parts of the software:

      • Method or function (most common)
      • Lines (similar to method profiling, but doing it line by line)
      • Memory (memory usage)
    7. its purpose is to dump Python tracebacks explicitly on a fault, after a timeout, or on a user signal

      Faulthandler in contrast to tracing tracks specific events and has slightly better documentation

    8. The purpose of trace module is to “monitor which statements and functions are executed as a program runs to produce coverage and call-graph information

      Purpose of trace module

    9. trace and faulthandler modules cover basic tracing

      Basic Python libraries for tracing

    10. tracing is a special use case of logging in order to record information about a program’s execution

      Tracing (more for software devs) is very similar to event logging (more for system administrators)

  8. Aug 2019
    1. intern

      sys.intern in Python3

    2. ascii letters, digits or underscores

      String composed of these characters are interned. Aka the following will not be interned:

      f = 'f o'

      but the following will be interned:

      f = 'f_o'

    3. sequences generated through peephole optimization are discarded if their length is superior to 20

      So values generated via peephole optimization above 20 are not 'pre-computed' but left as is:

      'a' * 21 # remains in the byte code where as
      'a' * 20 # gets converted to 'aaaaaaaaaaaaaaaaaaaa'
      
    4. string subclasses cannot be interned

      Meaning

      class NewString(str):
         pass
      assert NewString('f') is 'f'
      

      Will fail. Where as:

      f = 'f'
      assert f is 'f'
      

      Will pass.

      This is presents a case for not building custom string types in Python as it breaks the string cache and can result in a performance hit.

    5. strings can be compared by a O(1) pointer comparison instead of a O(n) byte-per-byte comparison

      This is a huge advantage. Not only does it save memory by not duplicating simple and common string values, but the comparison method has an early exit that compares the pointers instead of the values. aka in sudo-code form:

      // compare pointers
      if self._value is value:
         return True
      
      // compare values
      for i, v in enumerate(self._value):
         if v != value[i]:
            return False
      return True
      
  9. Jul 2019
    1. On 10 April, astrophysicists announced that they had captured the first ever image of a black hole. This was exhilarating news, but none of the giddy headlines mentioned that the image would have been impossible without open-source software. The image was created using Matplotlib, a Python library for graphing data, as well as other components of the open-source Python ecosystem. Just five days later, the US National Science Foundation (NSF) rejected a grant proposal to support that ecosystem, saying that the software lacked sufficient impact.

      This story raises the question "From the funding point of view, what is reproducible research?".

  10. Jun 2019
    1. get_current_request()

      This can be used to get the current request object if the request object is not already available in the current context. It really shouldn't be used in production code but in a debug/test scenario it can be quite handy.

    1. Corporate training Introduction to Python Advanced Python Python for non-programmers Data science and machine learning in Python Python for system administrators Python Practice Workshop Regular expressions Introduction to Git Online training Weekly Python Exercise NumPy Intro Python: Fundamentals Object-oriented Python Comprehending Comprehensions Understanding and mastering Git Python Workout Practice Makes Regexp Free e-mail courses Regular expressions crash course Boolean indexing in NumPy and Pandas Variable scoping in Python Working with files in Python Teach programming better Blog About Reuven Home → Blog →Python →Why do Python lists let you += a tuple, when you can’t + a tuple? 1 Why do Python lists let you += a tuple, when you can’t + a tuple? Let’s say you have a list in Python: >>> mylist = [10, 20, 30] You want to add something to that list. The most standard way to do this is with the “append” method, which adds its argument to the end of the list: >>> mylist.append(40) >>> print(mylist) [10, 20, 30, 40] But what if you want to add multiple items to a list? If you’re new to Python, then you might think that you can and should use a “for” loop. For example: >>> mylist = [10, 20, 30] >>> new_items = [40, 50, 60] >>> for one_item in new_items: mylist.append(one_item) >>> print(mylist) [10, 20, 30, 40, 50, 60] Great, right? But it turns out that there is a smarter and faster way to do this. You can use the += operator. This operator, which invokes the “iadd” (“inplace add”) method on the object to its left, effectively does what we did above, but in much less code: >>> mylist = [10, 20, 30] >>> new_items = [40, 50, 60] >>> mylist += new_items >>> print(mylist) [10, 20, 30, 40, 50, 60] It’s not a huge surprise that += can do this. After all, we normally expect += to add and assign to the variable on its left; it works with numbers and strings, as well as other types. And we know that we can use the + operator on lists, too: >>> [1, 2, 3] + [4, 5, 6] [1, 2, 3, 4, 5, 6] Can we join a list and a tuple? Let’s check: >>> mylist = [10, 20, 30] >>> t = (40, 50, 60) >>> mylist + t Traceback (most recent call last): File "", line 1, in TypeError: can only concatenate list (not "tuple") to list In other words: No. Trying to add a list and a tuple, even if we’re not affecting either, results in the above error. Which is why it’s so surprising to many of my students that the following does work: >>> mylist = [10, 20, 30] >>> t = (40, 50, 60) >>> mylist += t >>> mylist [10, 20, 30, 40, 50, 60] That’s right: Adding a list to a tuple with + doesn’t work. But if we use +=, it does. What gives? It’s common, when teaching Python, to say that x += 5 is basically a rewrite of x = x + 5 And in the majority of cases, that’s actually true. But it’s not always true. Consider: When you say “x + y” in Python, the “+” operator is translated into a method call. Behind the scenes, no matter what “x” and “y” are, the expression is translated into: x.__add__(y) The “__add__” magic method is what’s invoked on an object when it is added to another object. The object on the right-hand side of the “+” is passed as an argument to the method, while the object on the left-hand side is the recipient of the method call. That’s why, if you want your own objects to handle the “+” operator, you need to define the “__add__” method in your class definition. Do that, and things work just fine. And thus, when we say “x = x + 5”, this is turned into x = x.__add__(5) Meaning: First invoke the method, and then assign it back to the variable “x”. In this case, “x” isn’t changing; rather, the variable is now referencing a new object. Now consider the “+=” operator: It’s translated by Python into “__iadd__”, short for “inplace add.” Notice the slightly different syntax that we use here: x += y is translated into x.__iadd__(y) Did you see the difference between __add__ and __iadd__? The latter executes the assignment all by itself, internally. You don’t have to capture its output and assign it back to x. It turns out that the implementation of list.__iadd__ takes the second (right-hand side) argument and adds it, one element at a time, to the list. It does this internally, so that you don’t need to execute any assignment after. The second argument to “+=” must be iterable; if you say mylist += 5 you will get an error, saying that integers are not iterable. But if you put a string, list, tuple, or any other iterable type on the right-hand side, “+=” will execute a “for” loop on that object, adding each of its elements, one at a time, to the list. In other words: When you use + on a list, then the right-hand object must be a list. But when you use +=, then any iterable type is acceptable: >>> mylist = [10, 20, 30] >>> mylist += [40, 50] # list >>> mylist [10, 20, 30, 40, 50] >>> mylist += (60, 70) # tuple >>> mylist [10, 20, 30, 40, 50, 60, 70] >>> mylist += 'abc' # string >>> mylist [10, 20, 30, 40, 50, 60, 70, 'a', 'b', 'c'] >>> mylist += {'x':1, 'y':2, 'z':3} # dict! >>> mylist [10, 20, 30, 40, 50, 60, 70, 'a', 'b', 'c', 'x', 'y', 'z'] Does this work with other types? Not really. For example: >>> t = (10, 20, 30) >>> t += [40, 50] Traceback (most recent call last): File "", line 1, in TypeError: can only concatenate tuple (not "list") to tuple What happened here? Let’s check the definition of tuple.__iadd__ to find out: >>> help(tuple.__iadd__) Traceback (most recent call last): File "", line 1, in AttributeError: type object 'tuple' has no attribute '__iadd__' Wait a second: There is no “__iadd__” method for tuples? If so, then how can “+=” work at all? Because Python tries to be smart in such cases: If the object implements “__iadd__”, then the “+=” operator invokes it. But if the object lacks an “__iadd__” implementation, then Python does what we all guess it normally does — namely, invoke “__add__”, and then assign the results back to the variable. For example: >>> class Foo(object): def __init__(self, x): self.x = x def __add__(self, other): print("In __add__") return Foo(self.x + other.x) >>> f1 = Foo(10) >>> f2 = Foo(20) >>> f1 += f2 In __add__ >>> vars(f1) {'x': 30} In other words, Python notices that our Foo class lacks an implementation of “__iadd__”, and substitutes “__add__” for it, assigning its result (a new instance of Foo) to the original variable.
  11. Apr 2019
    1. #!/usr/bin/env python3 import configparser import os import sys from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.keys import Keys from selenium.common.exceptions import TimeoutException from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.common.by import By config = configparser.ConfigParser() config.read("config.ini") chrome_options = Options() chrome_options.add_argument("--headless") driver = webdriver.Chrome(executable_path=os.path.abspath("/usr/local/bin/chromedriver"), chrome_options=chrome_options) driver.get("https://fastmail.fm") timeout = 120 try: element_present = EC.presence_of_element_located((By.NAME, 'username')) WebDriverWait(driver, timeout).until(element_present) # Send login information user = driver.find_element_by_name("username") passwd = driver.find_element_by_name("password") user.send_keys(config["default"]["user"]) passwd.send_keys(config["default"]["pass"]) driver.find_element_by_class_name("v-Button").click() print("Logged in") # wait for login to complete element_present = EC.presence_of_element_located((By.CLASS_NAME, 'v-MainNavToolbar')) WebDriverWait(driver, timeout).until(element_present) # click settings menu to make elements visible driver.find_element_by_class_name("v-MainNavToolbar").click() # And follow to settings page driver.find_element_by_link_text("Settings").click() # Wait for settings page to render, oh Javascript element_present = EC.presence_of_element_located((By.LINK_TEXT, 'Rules')) WebDriverWait(driver, timeout).until(element_present) # Click on Rules link driver.find_element_by_link_text("Rules").click() # Click on edit custom sieve code element_present = EC.presence_of_element_located((By.LINK_TEXT, 'Edit custom sieve code')) WebDriverWait(driver, timeout).until(element_present) driver.find_element_by_link_text("Edit custom sieve code").click() print("Editing") # This is super unstable, I hate that we have to go by webid element_present = EC.presence_of_element_located((By.CLASS_NAME, 'v-EditSieve-rules')) WebDriverWait(driver, timeout).until(element_present) print("Find form") elements = driver.find_elements_by_css_selector("textarea.v-Text-input") element = elements[-1] # Find the submit button elements = driver.find_elements_by_css_selector("button") for e in elements: if "Save" in e.text: submit = e print("Found form") # And replace the contents element.clear() with open("rules.txt") as f: element.send_keys(f.read()) # This is the Save button print("Submitted!") submit.click() except TimeoutException as e: print(e) print("Timed out waiting for page to load") sys.exit(0) print("Done!")
  12. Mar 2019
  13. Feb 2019
  14. Jan 2019
    1. MIT “MATLAB is the language used by virtually every team in the world that designs gravitational wave detectors… I look forward to exploring the data from each new detection in MATLAB.” Matthew Evans, Assistant Professor of Physics
    1. KNIME includes Python in various processing nodes for data processing, model learning and prediction, and the generation of visualizations.
  15. Dec 2018
  16. Nov 2018
  17. Oct 2018
    1. All you can do is send a message (AYCDISAM) = Actors model - there is no direct manipulation of objects, only communication with (or invocation of) them. The presence of fields in Java violates this.

      from what I understand in Java... there are some variables on classes (class instances) that are only acessible through methods and for those the "only send message" paradigm holds but there are also fields which are like attributes in python which you can change directly

  18. Aug 2018
  19. Jul 2018
  20. Jun 2018
    1. With over 6 million users, the open source Anaconda Distribution is the fastest and easiest way to do Python and R data science and machine learning on Linux, Windows and Mac OS X. It's the industry standard for developing, testing and training on a single machine

      With over 6 million users, the open source Anaconda Distribution is the fastest and easiest way to do Python and R data science and machine learning on Linux, Windows and Mac OS X. It's the industry standard for developing, testing and training on a single machine

    1. Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages

  21. May 2018
    1. 如果不在同一个package中,例如我们希望在module_21.py中调用module_11.py中的FuncA
      from module_11包名.module_11 import funcA
      
    1.   这两个函数都接收一个分割字符串作为参数,将目标字符串分割为两个部分,返回一个三元元组(head,sep,tail),包含分割符。细微区别在于前者从目标字符串的末尾也就是右边开始搜索分割符。
      >>> "django.core.app".partition('.')
      ('django', '.', 'core.app')
      >>> "django.core.app".rpartition('.')
      ('django.core', '.', 'app')
      
    1. argarse.ArgumentParser.parse_known_args()解析

      有时脚本可能只解析一些命令行参数,将剩下的参数传递给另一个脚本或程序。 在这种情况下,parse_known_args() 方法可能很有用。 它的工作方式与parse_args() 非常相似,只是在出现额外参数时不会产生错误。 相反,它会重新生成包含已填充名称空间和剩余参数字符串列表的两个项目元组。

      import argparse  
      parser = argparse.ArgumentParser()  
      parser.add_argument(  
          '--flag_int',  
          type=float,  
          default=0.01,  
          help='flag_int.'  
      )  
      FLAGS, unparsed = parser.parse_known_args()  
      print(FLAGS)  
      print(unparsed)
      
      # python test.py --flag_int 1 --flag_bool True --flag_string 'haha, I get it' --flag_float 0.2
      
      Namespace(flag_int=1.0)
      ['--flag_bool', 'True', '--flag_string', 'haha, I get it', '--flag_float', '0.2']
      
    1. one level is chosen as the “reference”, and its mean behaviour is represented by the intercept. Each column of the resulting matrix represents the difference between the mean of one level and this reference level
  22. Apr 2018
    1. Iterator Protocol

      The protocol specifies methods to be implemented to make our objects iterable.

      该协议指定了要实现我们的对象迭代的方法。

      "Iterable" simply means able to be looped over or otherwise treated as a sequence or collection.

      “可迭代”仅仅意味着能够循环或像对待序列或集合一样的方式处理。

    2. Classes allow us to create a custom type of object -- that is, an object with its own behaviors and its own ways of storing data.

      Introduction of Classes 类允许我们创建一个自定义类型的对象 - 也就是说,一个对象具有自己的行为和自己的存储数据的方式。

    3. Packages provide for accessing variables within multiple files.

      从package多文件访问变量的注意事项

    4. A package is a directory of files that work together as a Python application or library module.

      Introduction python packages

      packages是一个文件目录,可以作为Python应用程序或库模块一起工作

    5. Recursion features three elements

      递归调用三要素:

      1. recursive call: 递归调用,这是函数对自己的调用;
      2. function process: 函数过程本身;
      3. base condition: 一个基本条件,这是递归链最终返回的点。
    6. *args and **kwargs to capture all arguments

      *args 和 **kwargs 可以代表所有参数

      为了允许装饰函数接受参数,我们必须接受它们并将它们传递给装饰函数。

    7. object: a data value of a particular type variable: a name bound to an object

      什么是object -- 是一个具有类型的数据 什么是变量 -- 是一个绑定对象的名字

    8. All programs named test_something.py that have functions named test_something()

      pytest Basics

    9. unit test: testing individual units (function/method) integration test: testing multiple units together regression test: testing to see if changes have introduced errors end-to-end test: testing the entire program
      • 单元测试 -- 测试单个单元,方法
      • 集成测试 -- 集中测试多个单元
      • 回归测试 -- 测试修改有没有引入新的错误
      • 端到端测试 -- 测试整个程序
    10. pandas is a Python module used for manipulation and analysis of tabular data.

      Introduction to Pandas

      Pandas is used to manipulation and analysis of tabular data

    11. pandas official documentation

      pandas reference and tutorials include:full docs, 10minutes to pandas, blog tutorials

    1. JupyterHub, a multi-user Hub, spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.
    1. What is music21?

      ¿Qué es Music21?

      Music21 es un conjunto de herramientas para ayudar a los estudiantes de música y personas que escuchan la música de manera activa a resolver preguntas cómo: ¿Cuántas veces hace Bach eso? ¿cuándo fue la primera vez que determinada banda uso cierto orden de acordes?.

    2. How simple is music21 to use?

      ¿Cómo usar Music21?

      Después de instalar python y music21 usando pip u otro método, escribir en un archivo de texto: "from music21 import*" y luego puedes trabajar con todas las funciones que ofrece la biblioteca music21

      Para mostrar una melodía en notación musical se escribe: converter.parse("tinynotation: 3/4 c4 d8 f g16 a g f#").show()

    1. At every turn, IPython chose the way that was more inclusive, to the point where it’s no longer called “IPython”: The project rebranded itself as “Jupyter” in 2014 to recognize the fact that it was no longer just for Python.

      Such an interesting progression!

  23. Nov 2017
    1. Compilers translate an entire program's source code before the program can run. Interpreters translate one line at a time but it doesn't mean that the compilation stage doesn't happen.

    1. At some point in the following decade json-head.appspot.com stopped working. Today I’m bringing it back, mainly as an excuse to try out the combination of Python 3.5 async, the Sanic microframework and Zeit’s brilliant Now deployment platform.

      Oh neat, I had no idea now supported Python. Another option available!

  24. Oct 2017
    1. c = np.full((2,2), 7) # Create a constant array print(c)

      To avoid the warning, use

      np.full((2,2), 7.) or use explicitly for float np.full((2,2),7, dtype=float) or for int np.full((2,2),7, dtype=int)