81 Matching Annotations
  1. Dec 2023
    1. Running the code in a subprocess is much slower than running a thread, not because the computation is slower, but because of the overhead of copying and (de)serializing the data. So how do you avoid this overhead?

      Reducing the performance hit of copying data between processes:

      Option #1: Just use threads

      Processes have overhead, threads do not. And while it’s true that generic Python code won’t parallelize well when using multiple threads, that’s not necessarily true for your Python code. For example, NumPy releases the GIL for many of its operations, which means you can use multiple CPU cores even with threads.

      ```
      # numpy_gil.py
      import numpy as np
      from time import time
      from multiprocessing.pool import ThreadPool

      arr = np.ones((1024, 1024, 1024))

      start = time()
      for i in range(10):
          arr.sum()
      print("Sequential:", time() - start)

      expected = arr.sum()

      start = time()
      with ThreadPool(4) as pool:
          result = pool.map(np.sum, [arr] * 10)
          assert result == [expected] * 10
      print("4 threads:", time() - start)
      ```

      When run, we see that NumPy uses multiple cores just fine when using threads, at least for this operation:

      $ python numpy_gil.py
      Sequential: 4.253053188323975
      4 threads: 1.3854241371154785

      Pandas is built on NumPy, so many numeric operations will likely release the GIL as well. However, anything involving strings, or Python objects in general, will not. So another approach is to use a library like Polars, which is designed from the ground up for parallelism, to the point where you don’t have to think about it at all: it has an internal thread pool.

      Option #2: Live with it

      If you’re stuck with using processes, you might just decide to live with the overhead of pickling. In particular, if you minimize how much data gets passed back and forth between processes, and the computation in each process is significant enough, the cost of copying and serializing data might not significantly impact your program’s runtime. Spending a few seconds on pickling doesn’t really matter if your subsequent computation takes 10 minutes.
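To judge whether the pickling overhead matters for a given workload, one rough approach is to time a pickle round trip on a representative payload and compare it with the expected compute time. The sketch below uses only the standard library. `measure_pickle_cost` and the sample payload are hypothetical names chosen for illustration, and a round trip inside one process only approximates what multiprocessing does, since it ignores the cost of moving the bytes between processes.

```python
import pickle
from time import perf_counter


def measure_pickle_cost(payload):
    """Return (seconds, size_in_bytes, restored) for one pickle round trip.

    Hypothetical helper: approximates the serialization work that
    multiprocessing does when an argument is sent to a worker process.
    It does not include the cost of moving the bytes through a pipe.
    """
    start = perf_counter()
    data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(data)
    elapsed = perf_counter() - start
    return elapsed, len(data), restored


if __name__ == "__main__":
    # Stand-in payload; substitute the object your workers actually receive.
    payload = list(range(1_000_000))
    elapsed, size, _ = measure_pickle_cost(payload)
    print(f"Round trip: {elapsed:.3f}s for {size / 1e6:.1f} MB")
```

If the measured time is a small fraction of the per-task computation, living with the overhead is probably reasonable.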

      Option #3: Write the data to disk

      Instead of passing data directly, you can write the data to disk, and then pass the path to this file:

      * to the subprocess (as an argument)
      * to the parent process (as the return value of the function running in the worker process).

      The recipient process can then parse the file.

      ```
      import pandas as pd
      import multiprocessing as mp
      from pathlib import Path
      from tempfile import mkdtemp
      from time import time

      def noop(df: pd.DataFrame):
          # real code would process the dataframe here
          pass

      def noop_from_path(path: Path):
          df = pd.read_parquet(path, engine="fastparquet")
          # real code would process the dataframe here
          pass

      def main():
          df = pd.DataFrame({"column": list(range(10_000_000))})

          with mp.get_context("spawn").Pool(1) as pool:
              # Pass the DataFrame to the worker process
              # directly, via pickling:
              start = time()
              pool.apply(noop, (df,))
              print("Pickling-based:", time() - start)

              # Write the DataFrame to a file, pass the path to
              # the file to the worker process:
              start = time()
              path = Path(mkdtemp()) / "temp.parquet"
              df.to_parquet(
                  path,
                  engine="fastparquet",
                  # Run faster by skipping compression:
                  compression="uncompressed",
              )
              pool.apply(noop_from_path, (path,))
              print("Parquet-based:", time() - start)

      if __name__ == "__main__":
          main()
      ```

      Option #4: multiprocessing.shared_memory

      Because processes sometimes do want to share memory, operating systems typically provide facilities for explicitly creating shared memory between processes. Python wraps these facilities in the multiprocessing.shared_memory module.

      However, unlike threads, where the same memory address space allows trivially sharing Python objects, in this case you’re mostly limited to sharing arrays. And as we’ve seen, NumPy releases the GIL for expensive operations, which means you can just use threads, which is much simpler. Still, in case you ever need it, it’s worth knowing this module exists.

      Note: The module also includes ShareableList, which is a bit like a Python list but limited to int, float, bool, small str and bytes, and None. But this doesn’t help you cheaply share an arbitrary Python object.
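As a rough illustration of the module, the sketch below stores a few 64-bit floats in a shared memory block and then reads them back through a second handle attached by name, which is what a worker process would do. The helper names `create_shared_doubles` and `sum_shared_doubles` are hypothetical, and the attach step is called in the same process purely to keep the example short; in real use the name and count would be passed to a worker.

```python
import struct
from multiprocessing import shared_memory


def create_shared_doubles(values):
    """Create a shared memory block holding `values` as 64-bit floats."""
    fmt = f"{len(values)}d"
    shm = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
    struct.pack_into(fmt, shm.buf, 0, *values)
    return shm


def sum_shared_doubles(name, count):
    """Attach to an existing block by name and sum its contents.

    This is what a worker process would run: only the short name string
    and the element count need to be pickled, not the data itself.
    """
    shm = shared_memory.SharedMemory(name=name)
    try:
        return sum(struct.unpack_from(f"{count}d", shm.buf, 0))
    finally:
        shm.close()  # detach this handle; the block itself stays alive


if __name__ == "__main__":
    values = [1.0, 2.5, 4.0]
    shm = create_shared_doubles(values)
    try:
        # Called in-process here to keep the sketch short; a worker
        # process could make the same call with the same two arguments.
        print(sum_shared_doubles(shm.name, len(values)))
    finally:
        shm.close()
        shm.unlink()  # free the block once no process needs it
```

Note the manual lifecycle: every handle must be closed, and the block must be unlinked exactly once, which is part of why threads are usually the simpler choice.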

      A bad option for Linux: the "fork" context

      You may have noticed we did multiprocessing.get_context("spawn").Pool() to create a process pool. This is because Python has multiple implementations of multiprocessing on some OSes. "spawn" is the only option on Windows, the only non-broken option on macOS, and available on Linux. When using "spawn", a completely new process is created, so you always have to copy data across.
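A minimal sketch of choosing the start method explicitly, using only standard-library calls: `get_spawn_context` is a hypothetical helper name, and the list of available methods depends on the operating system the code runs on.

```python
import multiprocessing as mp


def get_spawn_context():
    """Return a multiprocessing context that always uses "spawn".

    Using a context object avoids changing the global default start
    method, so other code in the same program is unaffected.
    """
    return mp.get_context("spawn")


if __name__ == "__main__":
    print("Available start methods:", mp.get_all_start_methods())
    ctx = get_spawn_context()
    print("Using:", ctx.get_start_method())
    # ctx.Pool(...) and ctx.Process(...) now create "spawn"-based workers.
```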

      On Linux, the default is "fork": the new child process has a complete copy of the memory of the parent process at the time of the child process’ creation. This means any objects in the parent (arrays, giant dicts, whatever) that were created before the child process was created, and were stored somewhere helpful like a module, are accessible to the child. Which means you don’t need to pickle/unpickle to access them.

      Sounds useful, right? There’s only one problem: the "fork" context is super-broken, which is why it will stop being the default in Python 3.14.

      Consider the following program:

      ```
      import threading
      import sys
      from multiprocessing import Process

      def thread1():
          for i in range(1000):
              print("hello", file=sys.stderr)

      threading.Thread(target=thread1).start()

      def foo():
          pass

      Process(target=foo).start()
      ```

      On my computer, this program consistently deadlocks: it freezes and never exits. Any time you have threads in the parent process, the "fork" context can cause deadlocks, or even corrupted memory, in the child process.

      You might think that you’re fine because you don’t start any threads. But many Python libraries start a thread pool on import, for example NumPy. If you’re using NumPy, Pandas, or any other library that depends on NumPy, you are running a threaded program, and therefore at risk of deadlocks, segfaults, or data corruption when using the "fork" multiprocessing context. For more details see this article on why multiprocessing’s default is broken on Linux.

      You’re just shooting yourself in the foot if you take this approach.

    1. Both are bad if you need to cancel tasks, collaborate a lot between tasks, deal precisely with the task lifecycle, need a huge number of workers, or want to squeeze out every last bit of performance. You won’t get anywhere near Rust-level speed.
    2. Thread pools are good for:
      • Tasks (network, file, etc.) that need fewer than 10_000 I/O interactions per second. The number is higher than you would expect, because threads are surprisingly cheap nowadays, and you can spawn a lot of them without bloating memory too much. The limit is more the price of context switching. This is not a scientific number; it's a general direction that you should challenge by measuring your own particular case.
      • When you need to share data between the tasks.
      • When you are not CPU bound.
      • When you are OK to execute tasks a bit slower to ensure you are not blocking any of them (e.g. user UI and a long calculation).
      • When you are CPU bound, but the CPU calculations are delegated to a C extension that releases the GIL, such as numpy. Free parallelism on the cheap, yeah!

      E.G: a web scraper, a GUI to zip files, a development server, sending emails without blocking web page rendering, etc.
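The I/O-bound case in the list above can be sketched with the standard library's `ThreadPoolExecutor`. Everything here is an illustrative assumption: `fake_fetch` stands in for a real network call by sleeping, `fetch_all` is a hypothetical helper, and the URLs are placeholders that are never actually requested.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Stand-in for a network call: sleeps instead of doing real I/O."""
    time.sleep(0.1)  # the GIL is released while sleeping, as with real I/O
    return f"response from {url}"


def fetch_all(urls, max_workers=8):
    """Run fake_fetch on every URL using a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, urls))


if __name__ == "__main__":
    # Placeholder URLs; nothing is actually requested.
    urls = [f"https://example.com/page/{i}" for i in range(8)]
    start = time.perf_counter()
    results = fetch_all(urls)
    print(len(results), "responses in", round(time.perf_counter() - start, 2), "s")
```

Because the waits overlap, the total time should be closer to one call's latency than to the sum of all of them, though the exact figure is worth measuring for your own workload.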

  2. Nov 2023
    1. It does provide an answer. The issue is that the Google form validates that the user has input a valid looking URL. So he needs to input an arbitrary, but valid URL, and then add that to /etc/hosts so his browser will resolve it to the address of his devserver. The question and answer are both fine as is and don't require any critique or clarification.

      The critical comment this was replying to has apparently been deleted.

  3. Sep 2023
    1. I agree with this statement so much. We should absolutely be failing hard rather than forcing people to debug thread safety issues at runtime. I can't think of anything more infuriating than debugging an issue that happens "sometimes".
  4. Aug 2023
    1. we humans depend on the natural world but what we depend on is healthy ecosystems that are made up of a complex mix of plants and animal species each one has a role to play and you know I see it as like a beautiful living tapestry and as an animal or plant species disappears from that ecosystem it's like pulling out a thread and if enough threads are pulled then the tapestry will hang in tatters and the ecosystem will disappear
      • for: extinction, climate departure, Jane Goodall, quote, tapestry, thread
      • quote
        • we humans depend on the natural world
        • but what we depend on is healthy ecosystems that are made up of a complex mix of plants and animal species
        • each one has a role to play and I see it as like a beautiful living tapestry and as an animal or plant species disappears from that ecosystem it's like pulling out a thread
        • and if enough threads are pulled then the tapestry will hang in tatters and the ecosystem will disappear
      • author
        • Jane Goodall
  5. Jun 2023
    1. This thread is locked.

      Yet another example of why it's dumb for Microsoft to lock Community threads. This is in the Bing search results as the top article for my issue with 1,911 views. Since 2011 though, there have been new developments! The new Media Player app in Windows 10 natively supports Zune playlist files! Since the thread is locked, I can't put this news in a place where others following my same search path will find it.

      Guess that's why it makes sense to use Hypothes.is 🤷‍♂️

  6. Dec 2022
    1. This thread is archived. New comments cannot be posted and votes cannot be cast.

      This is so stupid. I have a relevant question/comment to add to the thread, but someone has decided that no more value can come of this thread - yet it's in search results, so new people are seeing it all the time.

      If people don't want notifications on an old thread, they should mute notifications on it - not declare it dead because they bore easily.

      One could start a new thread talking about it, but that just daisy chains the topic across multiple threads.

      Reddit is dumb for having this "feature" and it originated to censor people, which is abhorrent.

  7. Sep 2022
    1. The discussion here can get very fast-paced. I am trying to periodically pause it to allow new folks, or people who don't have quite as much time, to catch up. Please feel free to comment requesting such a pause if you would like to contribute but are having trouble following it all.

      Why is it necessary to pause? Can't a new person post their question/comment even if it's in reply to comment #10 and the latest comment happens to be comment #56? There's no rule against replying to or discussing something that is not the very latest thing to be posted in a discussion!

      Possibly due to lack of a threaded discussion feature in GitHub? I think so.

      Threads would allow replies to "quick person" A to go under their comment, without flooding the top level with comments, thus allowing "new person" B to post a new comment, which in so doing creates a new thread, which can have its own discussion.

  8. Aug 2022
    1. ReconfigBehSci [@SciBeh]. (2021, December 20). This thread is sobering and informative with respect to what overloading health services means in terms of individual experience...worth popping into google translate fir non-German speakers [Tweet]. Twitter. https://twitter.com/SciBeh/status/1472983739890348045

  9. Jul 2022
    1. Instead, we focus on the relationship between the human mind and the mechanics underlying all social systems. The search for the locus where the distribution of governing powers can be shifted has brought us thus to the human mind itself. Only by affirming the human as different from the social persona it enacts can we see the golden thread along which the human takeover can and must happen. This golden thread runs in the usually unperceived gaps between thoughts, communications and decisions that are preconditioned, preprogramed, prethought [5, 43, 44]. It brings to the light of consciousness the thinking, speaking and acting that are present and living. ‘What I propose, therefore, is very simple’—Hannah Arendt [45] wrote—‘it is nothing more than to think what we are doing.’ To think, to voice, to enact each time anew, is the vehicle of the human takeover. To secure the continuity of this golden thread, of this very flow into the governance of society—is our existential challenge.

      !- definition : golden thread
      * Hannah Arendt writes: "It is nothing more than to think what we are doing".
      * To think, voice and enact each time anew is the vehicle of the human takeover, securing the continuity of the golden thread used to govern society.
      * The golden thread runs in the usually unperceived gaps between thoughts, communications and decisions that are preconditioned, preprogramed and prethought.

  10. Jun 2022
  11. May 2022
  12. Apr 2022
  13. Dec 2021
  14. Nov 2021
    1. LJS 418, f. 3r, the remnants of a sewing repair with thread remaining

      In parchment manuscripts one will often see small pinprick holes in the parchment, which indicate that a hole in the animal skin was repaired during processing. Usually, after curing and before use, the thread from the repair is removed, leaving only the small holes.

      Rarely, but occasionally, the thread will still remain in the final manuscript. An example of this is LJS 418, f 3r where one can see the thread left in the page.

  15. Oct 2021
    1. For clear writing, answer these questions:
      1. What Am I Really Trying To Say
      2. Why Should People Care
      3. What Is The Most Important Point
      4. What Is The Easiest Way To Understand The Most Important Point
      5. How Do I Want The Reader To Feel
      6. What Should The Reader Do Next
    1. With NFTs, blockchains make the jump from finance into creative applications. Regulators would do well to recognize that blockchains are the next generation of the Internet, and applying financial regulations to NFTs is a category error.

      What transformation do #NFT actually bring?

      Thanks to #NFT, the #blockchain moves from the strictly financial sphere into the artistic and creative one.

      Those who impose rules on these areas should therefore bear in mind that the #blockchain really is the future of the internet, and that trying to apply overly strict rules to it is foolish.

    2. “Actual-value NFTs” can draw upon legal and code-based contracts - a song token can provide a royalty stream, a ticket token can provide access, a metaverse token can grant land titles, an item token can have in-game powers, an ISA token can provide a cut of creator earnings.

      Precisely because of this inseparable link with the concept of ownership, an #NFT token for an asset that turns out to be profitable in some way gives access to a share of that profit. For example, the token of a song gives access to the royalties of that song.

    3. For NFTs representing digital art and collectibles, the creator cannot enforce scarcity - it’s up to a surrounding community to imbue the authorized NFT with scarcity and prestige within the context of that community.

      What is the relationship between an #NFT and the #community that revolves around it?

      It is the same bond that underlies the social contract (#contrattosociale).

      The #community around #NFT make them valuable because the communities themselves believe in them; once the #community dies, that specific #NFT loses its value too.

      In the case of cryptocurrencies (#cryptomonete), this value comes from the entities that create them and decide to maintain them.

    4. By assigning a unique token to a thing, its ownership (not the thing itself!) becomes programmable, verifiable, divisible, durable, universally addressable, composable, digitally secured, and easy to transfer.

      What is a fundamental premise of the #NFT?

      It is the fact that the real control is exercised over the ownership of the resource, not over the resource itself. Once the token is associated with it, ownership becomes programmable, verifiable, divisible, durable, etc.

    5. An NFT is a unique, on-chain token representing ownership of an off-chain asset. The token is backed by a social contract from its creator and a surrounding community.

      What is an #NFT?

      It is a token on the #blockchain that represents ownership of a resource external to the blockchain itself.

      This token is verified by a contract between its creator and the community.

  16. Jun 2021
    1. Apologies for digging up a closed thread, but it already contains some monorepo examples so feels like the best place to do it.
    1. Process-based parallelisation is simpler than thread-based due to, well, the GIL on MRI rubies and the lack of 100% thread safety within the other gems. (I'm fairly certain, for example, that there are threading bugs lurking within the mocks code.)
  17. May 2021
    1. Lewis Goodall on Twitter: “Here we go. He’s not messing about: ‘The truth is, senior ministers, senior officials, senior advisors like me fell disastrously short of the standards that the public has the right to expect in a crisis like this. When the public needed us most the government failed.’ https://t.co/lV7QqIpTDY” / Twitter. (n.d.). Retrieved May 27, 2021, from https://twitter.com/lewis_goodall/status/1397471561205092352

    1. Ashish K. Jha, MD, MPH. (2020, December 1). There is something funny happening with COVID hospitalizations Proportion of COVID pts getting hospitalized falling A lot Just recently My theory? As hospitals fill up, bar for admission rising A patient who might have been admitted 4 weeks ago may get sent home now Thread [Tweet]. @ashishkjha. https://twitter.com/ashishkjha/status/1333636841271078912

  18. Apr 2021
  19. Mar 2021
    1. Kenneth Fordyce. (2020, November 3). @devisridhar @georgeeaton Yet another article packed full of wise words: E.g., ‘in some ways, the people pushing for “herd immunity” are forcing us into these lockdown-release cycles because you end up in a reactive position by underestimating the spread of the virus and the hospitalisation rate’ [Tweet]. @FordyceKenneth. https://twitter.com/FordyceKenneth/status/1323544552112852992

  20. Feb 2021
    1. Dr Ellie Murray. (2020, December 3). Some really great resources & discussions in this thread. The tl;dr is (1) yes, the vaccine trials included some people who had previously had COVID; and (2) in general vaccination plans will include those who have previously been sick/infected. Thanks all! [Tweet]. @EpiEllie. https://twitter.com/EpiEllie/status/1334545373126389766

    1. Andrew💙Croxford. (2020, December 3). NEW THREAD: possible development of anti-Syncytin responses after immunization with the SARS-CoV-2 spike protein-coding mRNA vaccines, based on a ‘homologous’ region shared between these proteins. [Tweet]. @andrew_croxford. https://twitter.com/andrew_croxford/status/1334593606196187136

  21. Nov 2020
  22. Oct 2020
    1. Discussion is not necessary, but could be useful for critiquing a pattern. This would be useful for people who are proposing a pattern to the community or for people who want to gather feedback on an experiment.
  23. Sep 2020
  24. Aug 2020
  25. Jul 2020
  26. Jun 2020
    1. I know you acknowledged your response was late and you're just trying to help but please don't resurrect very old threads.

      This is better than creating a duplicate new thread.

      There is no better place to respond to an existing topic than in the existing thread for that topic.

    1. While statistical skills are required to test hypotheses, analysts are your best bet for coming up with those hypotheses in the first place. For instance, they might say something like “It’s only a correlation, but I suspect it could be driven by …” and then explain why they think that. This takes strong intuition about what might be going on beyond the data, and the communication skills to convey the options to the decision-maker, who typically calls the shots on which hypotheses (of many) are important enough to warrant a statistician’s effort. As analysts mature, they’ll begin to get the hang of judging what’s important in addition to what’s interesting, allowing decision-makers to step away from the middleman role.

      A more formal and detailed version of the above. The difference between being important and being interesting should be noted too. Maybe search for a thread.

  27. May 2020
  28. Apr 2020
  29. Mar 2020
  30. Dec 2019
  31. Aug 2019
    1. For every question you might have, please create a discussion thread, not a single comment. This makes it a bit easier to see and reply to the question, instead of questions and answers getting mixed together: