Hypothesis

23 Matching Annotations

Dec 2023
testdriven.io

Speeding Up Python with Concurrency, Parallelism, and asyncio

3
1. GadjiMurad 17 Dec 2023
  
  in Public
  
  What's the difference between concurrency and parallelism?
  
  A concurrent process performs multiple tasks at the same time, whether or not each task is getting its full attention; a parallel process is physically performing multiple tasks all at the same time.
  
  concurrency parallelism comparing
2. GadjiMurad 17 Dec 2023
  
  in Public
  
  What is parallelism?
  
  Parallelism is very much related to concurrency. In fact, parallelism is a subset of concurrency: whereas a concurrent process performs multiple tasks at the same time, whether or not each task is getting its full attention, a parallel process is physically performing multiple tasks all at the same time.
  
  parallelism definition
3. GadjiMurad 16 Dec 2023
  
  in Public
  
  There are many reasons your applications can be slow. Sometimes this is due to poor algorithmic design or the wrong choice of data structure. Sometimes, however, it's due to forces outside of our control, such as hardware constraints or the quirks of networking.
  
  That's where concurrency and parallelism fit in. They allow your programs to do multiple things at once, either at the same time or by wasting the least possible time waiting on busy tasks.
  
  concurrency parallelism
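To make the "least possible time waiting" idea concrete, here is a minimal asyncio sketch (not from the testdriven.io article). Three simulated I/O waits run concurrently on a single thread, so the total time is close to one wait rather than the sum. The `fake_io` coroutine and the 0.1 s delay are hypothetical stand-ins for real network or disk calls; this shows concurrency without parallelism.

```python
import asyncio
import time


async def fake_io(delay: float) -> float:
    # Hypothetical stand-in for a network or disk call: while this
    # coroutine sleeps, the event loop is free to run the others.
    await asyncio.sleep(delay)
    return delay


async def run_concurrently(n: int, delay: float) -> float:
    start = time.perf_counter()
    # gather() schedules all n waits on the same event loop at once.
    await asyncio.gather(*(fake_io(delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(run_concurrently(3, 0.1))
    # Roughly 0.1 s rather than 0.3 s: the waits overlap on one thread.
    print(f"3 waits took {elapsed:.2f}s")
```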

Tags

definition

comparing

parallelism

concurrency

Annotators

GadjiMurad

URL

testdriven.io/blog/concurrency-parallelism-asyncio/

pythonspeed.com

Python’s multiprocessing performance problem

1
1. GadjiMurad 15 Dec 2023
  
  in Public
  
  Running the code in a subprocess is much slower than running a thread, not because the computation is slower, but because of the overhead of copying and (de)serializing the data. So how do you avoid this overhead?
  
  Reducing the performance hit of copying data between processes:
  
  Option #1: Just use threads
  
  Processes have this copying overhead; threads do not. And while it’s true that generic Python code won’t parallelize well when using multiple threads, that’s not necessarily true for your Python code. For example, NumPy releases the GIL for many of its operations, which means you can use multiple CPU cores even with threads.
  
```
# numpy_gil.py
import numpy as np
from time import time
from multiprocessing.pool import ThreadPool

# 1024**3 float64 values: about 8 GiB of RAM
arr = np.ones((1024, 1024, 1024))

# Baseline: ten sums, one after another
start = time()
for i in range(10):
    arr.sum()
print("Sequential:", time() - start)

expected = arr.sum()

# The same ten sums spread over 4 threads; NumPy releases the GIL
start = time()
with ThreadPool(4) as pool:
    result = pool.map(np.sum, [arr] * 10)
    assert result == [expected] * 10
print("4 threads:", time() - start)
```

When run, we see that NumPy uses multiple cores just fine when using threads, at least for this operation:

$ python numpy_gil.py
Sequential: 4.253053188323975
4 threads: 1.3854241371154785

Pandas is built on NumPy, so many numeric operations will likely release the GIL as well. However, anything involving strings, or Python objects in general, will not. So another approach is to use a library like Polars, which is designed from the ground up for parallelism: it has an internal thread pool, so you don’t have to think about it at all.

Option #2: Live with it

If you’re stuck with using processes, you might just decide to live with the overhead of pickling. In particular, if you minimize how much data gets passed back and forth between processes, and the computation in each process is significant enough, the cost of copying and serializing data might not significantly impact your program’s runtime. Spending a few seconds on pickling doesn’t really matter if your subsequent computation takes 10 minutes.
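Whether you can live with it depends on how large the pickling cost is relative to the computation. One rough way to check, sketched below under the assumption that your payload is picklable and fits in memory, is to time a `pickle.dumps`/`pickle.loads` round trip on a representative payload. The `pickle_cost` helper is hypothetical, and it measures serialization only, not the extra copy between processes.

```python
import pickle
import time
from typing import Any, Tuple


def pickle_cost(payload: Any) -> Tuple[int, float]:
    """Return (pickled size in bytes, seconds for one dumps + loads round trip)."""
    start = time.perf_counter()
    data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    pickle.loads(data)
    return len(data), time.perf_counter() - start


if __name__ == "__main__":
    # A stand-in payload: one million integers
    size, seconds = pickle_cost(list(range(1_000_000)))
    print(f"{size / 1e6:.1f} MB, {seconds:.3f} s per round trip")
```

If that round trip is a small fraction of the per-task compute time, living with the overhead is probably fine.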

Option #3: Write the data to disk

Instead of passing data directly, you can write the data to disk, and then pass the path to this file:
* to the subprocess (as an argument)
* to the parent process (as the return value of the function running in the worker process).

The recipient process can then parse the file.

```
import pandas as pd
import multiprocessing as mp
from pathlib import Path
from tempfile import mkdtemp
from time import time


def noop(df: pd.DataFrame):
    # real code would process the dataframe here
    pass


def noop_from_path(path: Path):
    df = pd.read_parquet(path, engine="fastparquet")
    # real code would process the dataframe here
    pass


def main():
    df = pd.DataFrame({"column": list(range(10_000_000))})

    with mp.get_context("spawn").Pool(1) as pool:
        # Pass the DataFrame to the worker process
        # directly, via pickling:
        start = time()
        pool.apply(noop, (df,))
        print("Pickling-based:", time() - start)

        # Write the DataFrame to a file, pass the path to
        # the file to the worker process:
        start = time()
        path = Path(mkdtemp()) / "temp.parquet"
        df.to_parquet(
            path,
            engine="fastparquet",
            # Run faster by skipping compression:
            compression="uncompressed",
        )
        pool.apply(noop_from_path, (path,))
        print("Parquet-based:", time() - start)


if __name__ == "__main__":
    main()
```

Option #4: multiprocessing.shared_memory
  
  Because processes sometimes do want to share memory, operating systems typically provide facilities for explicitly creating shared memory between processes. Python wraps these facilities in the multiprocessing.shared_memory module.
  
  However, unlike threads, where the same memory address space allows trivially sharing Python objects, in this case you’re mostly limited to sharing arrays. And as we’ve seen, NumPy releases the GIL for expensive operations, so you can just use threads, which is much simpler. Still, in case you ever need it, it’s worth knowing this module exists.
  
  Note: The module also includes ShareableList, which is a bit like a Python list but limited to int, float, bool, small str and bytes, and None. But this doesn’t help you cheaply share an arbitrary Python object.
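A minimal sketch of the standard-library `multiprocessing.shared_memory` API (Python 3.8+): one handle creates a named block, a second handle attaches to it by name (as a worker process would), and a write through one is visible through the other without any pickling. Both handles live in one process here only to keep the sketch self-contained; in real code you would pass `shm.name` to the worker. The function name `roundtrip_through_shared_memory` is hypothetical.

```python
from multiprocessing import shared_memory


def roundtrip_through_shared_memory(message: bytes) -> bytes:
    # Create a new named block large enough for the (non-empty) message.
    shm = shared_memory.SharedMemory(create=True, size=len(message))
    try:
        # Attach a second handle by name, as a worker process would.
        worker_view = shared_memory.SharedMemory(name=shm.name)
        try:
            worker_view.buf[: len(message)] = message
        finally:
            worker_view.close()  # detach this handle; the block stays alive
        # The bytes written through the other handle are visible here.
        return bytes(shm.buf[: len(message)])
    finally:
        shm.close()   # detach the creator's handle
        shm.unlink()  # free the block; only the creator should do this
```

Only raw bytes cross the boundary, which is why this mostly helps with array-like data rather than arbitrary Python objects.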
  
  A bad option for Linux: the "fork" context
  
  You may have noticed we did multiprocessing.get_context("spawn").Pool() to create a process pool. This is because Python has multiple implementations of multiprocessing on some OSes. "spawn" is the only option on Windows, the only non-broken option on macOS, and available on Linux. When using "spawn", a completely new process is created, so you always have to copy data across.
  
  On Linux, the default is "fork": the new child process has a complete copy of the memory of the parent process at the time of the child process’ creation. This means any objects in the parent (arrays, giant dicts, whatever) that were created before the child process was created, and were stored somewhere helpful like a module, are accessible to the child. Which means you don’t need to pickle/unpickle to access them.
  
  Sounds useful, right? There’s only one problem: the "fork" context is super-broken, which is why it will stop being the default in Python 3.14.
  
  Consider the following program:
  
```
import threading
import sys
from multiprocessing import Process


def thread1():
    for i in range(1000):
        print("hello", file=sys.stderr)


threading.Thread(target=thread1).start()


def foo():
    pass


Process(target=foo).start()
```

On my computer, this program consistently deadlocks: it freezes and never exits. Any time you have threads in the parent process, the "fork" context can result in deadlocks, or even corrupted memory, in the child process.

You might think that you’re fine because you don’t start any threads. But many Python libraries start a thread pool on import, for example NumPy. If you’re using NumPy, Pandas, or any other library that depends on NumPy, you are running a threaded program, and therefore at risk of deadlocks, segfaults, or data corruption when using the "fork" multiprocessing context. For more details see this article on why multiprocessing’s default is broken on Linux.

You’re just shooting yourself in the foot if you take this approach.

subprocess comparing threading thread pools multiprocessing python parallelism
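To stay off the "fork" context regardless of platform defaults, one option is to request a start method explicitly instead of relying on the default. The sketch below prefers "forkserver" (POSIX only and not discussed in the article; it forks workers from a separate single-threaded server process) and falls back to "spawn", which the article notes is available on Windows, macOS, and Linux. The `safe_context` name is hypothetical.

```python
import multiprocessing as mp
from multiprocessing.context import BaseContext


def safe_context() -> BaseContext:
    # Start methods this platform supports, e.g. "fork", "spawn", "forkserver" on Linux
    available = mp.get_all_start_methods()
    for method in ("forkserver", "spawn"):
        if method in available:
            # Deliberately never "fork": a forked child copies the parent's memory,
            # including locks held by other threads, but none of those threads,
            # so such a lock can stay locked forever (the deadlock above).
            return mp.get_context(method)
    raise RuntimeError("no fork-free start method available")
```

Usage would be `with safe_context().Pool(4) as pool: ...`, placed inside an `if __name__ == "__main__":` guard, because both of these start methods re-import the main module in each child.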

Tags

python

parallelism

comparing

thread pools

subprocess

multiprocessing

threading

Annotators

GadjiMurad

URL

pythonspeed.com/articles/faster-multiprocessing-pickle/

horaceguy.pages.dev

Horace Guy's blog | Concurrency in Python with FastAPI

1
1. GadjiMurad 14 Dec 2023
  
  in Public
  
  Gunicorn and multiprocessing
  
  Gunicorn forks a base process into n worker processes, and each worker is managed by Uvicorn (with the asynchronous uvloop). This means:
  
  Each worker is concurrent
  
  The worker pool implements parallelism
  
  This way, we can have the best of both worlds: concurrency (each worker's asynchronous event loop) and parallelism (multiple worker processes).
  
  concurrency parallelism python gunicorn
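The setup described above is usually expressed in a Gunicorn configuration file. The sketch below assumes Gunicorn and Uvicorn are installed; `workers` and `worker_class` are real Gunicorn settings, while the `(2 x cores) + 1` worker count is only the rule of thumb suggested in Gunicorn's own docs, not something from the post. Recent Uvicorn releases deprecate the `uvicorn.workers` module in favour of the separate `uvicorn-worker` package, so check which class path your version expects. `app:app` in the comment is a hypothetical module and variable name.

```python
# gunicorn.conf.py  (a sketch; run with: gunicorn -c gunicorn.conf.py app:app)
import multiprocessing

# Parallelism: Gunicorn forks this many worker processes,
# so requests can be handled on several CPU cores at once.
workers = multiprocessing.cpu_count() * 2 + 1

# Concurrency: each worker runs Uvicorn's asyncio event loop,
# interleaving many in-flight requests inside a single process.
worker_class = "uvicorn.workers.UvicornWorker"
```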

Tags

python

gunicorn

parallelism

concurrency

Annotators

GadjiMurad

URL

horaceguy.pages.dev/posts/python-concurrency/

www.bitecode.dev

Asyncio, twisted, tornado, gevent walk into a bar...

2
1. GadjiMurad 11 Dec 2023
  
  in Public
  
  The typical analogy is this:
  
  concurrency is having two lines of customers ordering from one cashier;
  
  parallelism is having two lines of customers ordering from two cashiers.
  
  concurrency parallelism
2. GadjiMurad 11 Dec 2023
  
  in Public
  
  parallelism
  
  "doing lots of things at once" (as Rob Pike said)
  
  parallelism
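The one-cashier case can be modelled with asyncio: two coroutines (the "lines") share a single thread (the "cashier"), and each `await asyncio.sleep(0)` hands the cashier to the other line. This is a hypothetical illustration, not code from the post; the two-cashier case would correspond to separate worker processes, for example via `concurrent.futures.ProcessPoolExecutor`.

```python
import asyncio
from typing import List


async def serve_line(name: str, customers: List[int], log: List[str]) -> None:
    for customer in customers:
        log.append(f"{name}{customer}")
        # Yield to the event loop so the other line gets served next.
        await asyncio.sleep(0)


async def one_cashier() -> List[str]:
    log: List[str] = []
    # Both lines make progress, but only one order is taken at any instant.
    await asyncio.gather(
        serve_line("A", [1, 2, 3], log),
        serve_line("B", [1, 2, 3], log),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(one_cashier()))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```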

Tags

parallelism

concurrency

Annotators

GadjiMurad

URL

bitecode.dev/p/asyncio-twisted-tornado-gevent-walk

www.bitecode.dev

The easy way to concurrency and parallelism with Python stdlib

1
1. GadjiMurad 11 Dec 2023
  
  in Public
  
  You can distribute work to a bunch of process workers or thread workers with a few lines of code:
  
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# do_something_blocking is a placeholder for your own blocking function
with ThreadPoolExecutor(max_workers=5) as executor:
    executor.submit(do_something_blocking)
```

python concurrency parallelism thread_pools threading
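A self-contained variant of the snippet above that also uses the imported `as_completed`, which yields futures in the order they finish rather than the order they were submitted. `fetch` and `run_all` are hypothetical names; `fetch` just sleeps, standing in for real blocking I/O. Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` gives process workers with the same interface, though the function and its arguments must then be picklable.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List


def fetch(n: int) -> int:
    # Hypothetical blocking call: sleeping releases the GIL, as real I/O would.
    time.sleep(0.05)
    return n * n


def run_all(items: List[int]) -> List[int]:
    results = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        # submit() returns a Future immediately; the work runs on pool threads.
        futures = [executor.submit(fetch, n) for n in items]
        # as_completed() yields each Future as soon as it finishes.
        for future in as_completed(futures):
            results.append(future.result())  # re-raises any exception from fetch
    # Completion order varies between runs, so sort for a stable result.
    return sorted(results)


if __name__ == "__main__":
    print(run_all([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```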

Tags

python

parallelism

thread_pools

threading

concurrency

Annotators

GadjiMurad

URL

bitecode.dev/p/the-easy-way-to-concurrency-and-parallelism

Aug 2023
brilliant.org

Order and Search | Brilliant

1
1. joobisb 06 Aug 2023
  
  in Public
  
  embarrassingly parallel
  
  independent work, without depending on others
  
  parallelism ordering brilliant.org
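In code, "embarrassingly parallel" means each item is processed by a function of that item alone, so the work can be handed to any executor with `map` and no coordination between tasks. A minimal sketch; `collatz_steps` is an arbitrary, hypothetical per-item task (it assumes n >= 1). For CPU-bound work in CPython you would pass a `ProcessPoolExecutor` (inside an `if __name__ == "__main__":` guard) to get real parallelism; a thread pool is used below only to keep the example portable.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List


def collatz_steps(n: int) -> int:
    # Independent task: the result depends only on n (assumed >= 1).
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def run_independent(executor: Executor, items: List[int]) -> List[int]:
    # No shared state and no ordering constraints between tasks, so map() can
    # spread them over any number of workers; results come back in input order.
    return list(executor.map(collatz_steps, items))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(run_independent(pool, [1, 6, 7, 27]))
```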

Tags

ordering

parallelism

brilliant.org

Annotators

joobisb

URL

brilliant.org/courses/computer-science-essentials/computational-problem-solving-2/order-and-search/

Jun 2023
docdrop.org

Jazz Theory

1
1. joosthuizenjazz 25 Jun 2023
  
  in Public
  
  Chapter 26 provides an analysis of Lennie Tristano’s “Line Up.” Based on this analysis, specific elements of his style of improvisation are codified
  
  no note: break up tristano style into atomic jazz style elements
  
  line-up melody-compound/polyphonic melodic-interpolation motivic-parallelism neighbour-figures playing-outside rhythmic-displacement sidesteppings tristano-style jse-mel-tristano-style-? phrase-displacement metric-displacement

Tags

tristano-style

motivic-parallelism

phrase-displacement

neighbour-figures

melody-compound/polyphonic

playing-outside

line-up

sidesteppings

melodic-interpolation

rhythmic-displacement

metric-displacement

jse-mel-tristano-style-?

Annotators

joosthuizenjazz

URL

docdrop.org/download_annotation_doc/Terefenko---2014---JazzTheory-1ed-2--x7zx4.pdf

Aug 2022
medium.com

Synchronous vs Asynchronous vs Concurrent vs Parallel

1
1. TylerRick 31 Aug 2022
  
  in Public
  
  This seemed like a good disambiguation of the terms at first glance, but actually isn't my favorite.
  
  I found https://medium.com/@itIsMadhavan/concurrency-vs-parallelism-a-brief-review-b337c8dac350 more useful.
  
  distinction difference concurrency synchronous asynchronous parallelism parallel (simultaneous)

Tags

distinction

asynchronous

parallelism

synchronous

parallel (simultaneous)

difference

concurrency

Annotators

TylerRick

URL

medium.com/plain-and-simple/synchronous-vs-asynchronous-vs-concurrent-vs-parallel-4342bfb8b9f2

medium.com

Concurrency vs. Parallelism: A brief view

1
1. TylerRick 31 Aug 2022
  
  in Public
  
  Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.
  
  distinction concurrency parallelism parallel (simultaneous) good point

Tags

distinction

parallel (simultaneous)

parallelism

good point

concurrency

Annotators

TylerRick

URL

medium.com/@itIsMadhavan/concurrency-vs-parallelism-a-brief-review-b337c8dac350

Apr 2022
www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

1
1. sherlockliao 30 Apr 2022
  
  in Public
  
  are evaluated simultaneously, a phenomenon referred to as instruction-level parallelism.
  
  What is instruction-level parallelism?
  
  instruction-level parallelism

Tags

instruction-level parallelism

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf

Jan 2022
www.newyorker.com

Review: Steven Spielberg’s “West Side Story” Remake Is Worse Than the Original

1
1. gabbyraysbrook7 13 Jan 2022
  
  in Public
  
  He leaves no loose ends, no ambiguities, no extravagances, no extremes.
  
  Brody creates parallelism here by starting each item with "no" and ending each noun with an "s." This creates a list of ideas.
  
  parallelism

Tags

parallelism

Annotators

gabbyraysbrook7

URL

newyorker.com/culture/the-front-row/review-steven-spielbergs-west-side-story-remake-is-worse-than-the-original

Jan 2021
www.theatlantic.com

Weekend Poem: All Watched Over by Machines of Loving Grace

1
1. zachmuhlbauer 22 Jan 2021
  
  in Public
  
  cybernetic ecology
  
  First comes the cybernetic meadow, then the cybernetic forest, and ultimately the cybernetic ecology. Brautigan profiles this expansion in the third line of each stanza. I wonder where else we might find correspondences of this sort.
  
  correspondence parallelism

Tags

parallelism

correspondence

Annotators

zachmuhlbauer

URL

theatlantic.com/technology/archive/2011/09/weekend-poem-all-watched-over-by-machines-of-loving-grace/245251/

Nov 2020
icla2020b.jonreeve.com

The Garden Party and Other Stories

1
1. paolacruz 02 Nov 2020
  
  in Public
  
  it had been dark, silent, beautiful very often—oh yes—but mournful somehow
  
  This hints at what life must have been like for Leila, living in the country. "Dark" and "silent" allude to her being alone, given that she is an only child, which also suggests that her life must have been quite dull and lonely. Yet she also says that her nights were "beautiful very often," which shows that she has been living a good life. Perhaps she lives in a lovely house and her family owns a nice, vast farm, given that she comes from the same class as the Sheridans.
  
  parallelism social class

Tags

social class

parallelism

Annotators

paolacruz

URL

icla2020b.jonreeve.com/texts/garden-party.html

Jul 2019
nzetc.victoria.ac.nz

A Little Boy's Dream | NZETC

1
1. SteneeStarshine 08 Jul 2019
  
  in Public
  
  In an awful pirate fight
  
  a dream with a fight, a struggle (parallels the boy's struggle with loneliness)
  
  #parallelism

Tags

#parallelism

Annotators

SteneeStarshine

URL

nzetc.victoria.ac.nz/tm/scholarly/tei-ManPoem-t1-body-d4-d20.html

Jan 2017
nfs.sparknotes.com

No Fear Shakespeare: Julius Caesar: Act 2, Scene 2, Page 4

1
1. tuckleong 11 Jan 2017
  
  in Public
  
  Signifies that from you great Rome shall suck Reviving blood, and that great men shall press For tinctures, stains, relics, and cognizance.
  
  How effective is the use of parallelism?
  
  For more rhetorical devices, consult the Handbook.
  
  Rhetorics Parallelism

Tags

Parallelism

Rhetorics

Annotators

tuckleong

URL

nfs.sparknotes.com/juliuscaesar/page_84.html

Sep 2016
www.nytimes.com

How Mexico and the U.S. Can Fix Migration

3
1. sdowell2019 12 Sep 2016
  
  in Public
  
  cooperating
  
  This word is used in almost every paragraph of the piece to further the point of working WITH Mexico instead of AGAINST it.
  
  parallelism
2. sdowell2019 12 Sep 2016
  
  in Public
  
  innovations
  
  oops he did it again
  
  parallelism
3. sdowell2019 12 Sep 2016
  
  in Public
  
  innovation
  
  This word has been repeated several times in order to play on the average American's idealism of "American Innovation"
  
  parallelism

Tags

parallelism

Annotators

sdowell2019

URL

nytimes.com/2016/09/13/opinion/how-mexico-and-the-us-can-fix-migration.html

millercenter.org

Remarks to the People of Cuba (March 22, 2016)—Miller Center

2
1. bbrowder13 12 Sep 2016
  
  in Public
  
  We've begun initiatives to cooperate on health and agriculture, education and law enforcement. We've reached agreements to restore direct flights and mail service. We've expanded commercial ties, and increased the capacity of Americans to travel and do business in Cuba.
  
  Obama uses parallelism here to convey that Cubans and Americans are united. He does so through the use of the word "we've" to show that strides toward cooperation have already been made.
  
  Parallelism
2. kaitlyn.simmons 12 Sep 2016
  
  in Public
  
  Cuba has a one-party system; the United States is a multi-party democracy. Cuba has a socialist economic model; the United States is an open market. Cuba has emphasized the role and rights of the state; the United States is founded upon the
  
  President Obama is using parallelism to display the differences and compare America and Cuba. He is doing this to develop a relationship and promote trust between the two countries.
  
  comparison parallelism

Tags

comparison

parallelism

Parallelism

Annotators

bbrowder13

kaitlyn.simmons

URL

millercenter.org/president/obama/speeches/remarks-to-the-people-of-cuba
