At the same time, that document didn’t change anything about Python’s runtime behavior, nor did it prevent other uses of annotations.
what a smart move!
I switched from VSCode to Zed
[tool.pyright] section in pyproject.toml effectively force Basedpyright’s recommended mode.
typeCheckingMode in Zed’s settings.json did not help; the fix was explicitly setting typeCheckingMode = "standard" inside each project’s [tool.pyright] config.
"disablePullDiagnostics": true in Zed’s Basedpyright initialization options.
ty language server, found it good, but stayed with Basedpyright to match CI’s Pyright.
settings.json showcasing autosave, disabled inline blame, VSCode keymap, fonts, light theme, and customized Basedpyright LSP options.
How uv got so fast
Optimizations that don’t need Rust
Some of uv’s speed comes from Rust. But not as much as you’d think. Several key optimizations could be implemented in pip today:
- HTTP range requests for metadata. Wheel files are zip archives, and zip archives put their file listing at the end. uv tries PEP 658 metadata first, falls back to HTTP range requests for the zip central directory, then full wheel download, then building from source. Each step is slower and riskier. The design makes the fast path cover 99% of cases. This is HTTP protocol work, not Rust.
- Parallel downloads. pip downloads packages one at a time. uv downloads many at once. This is concurrency, not language magic.
- Global cache with hardlinks. pip copies packages into each virtual environment. uv keeps one copy globally and uses hardlinks (or copy-on-write on filesystems that support it). Installing the same package into ten venvs takes the same disk space as one. This is filesystem ops, not language-dependent.
- Python-free resolution. pip needs Python running to do anything, and invokes build backends as subprocesses to get metadata from legacy packages. uv parses TOML and wheel metadata natively, only spawning Python when it hits a setup.py-only package that has no other option.
- PubGrub resolver. uv uses the PubGrub algorithm, originally from Dart’s pub package manager. pip uses a backtracking resolver. PubGrub is faster at finding solutions and better at explaining failures. It’s an algorithm choice, not a language choice.
Many of uv's optimisations come from improvements that do not depend on Rust, such as HTTP range requests for package metadata, parallel downloads, and better local caching.
- No .egg support. Eggs were the pre-wheel binary format. pip still handles them; uv doesn’t even try. The format has been obsolete for over a decade.
- No pip.conf. uv ignores pip’s configuration files entirely. No parsing, no environment variable lookups, no inheritance from system-wide and per-user locations.
- No bytecode compilation by default. pip compiles .py files to .pyc during installation. uv skips this step, shaving time off every install. You can opt in if you want it.
- Virtual environments required. pip lets you install into system Python by default. uv inverts this, refusing to touch system Python without explicit flags. This removes a whole category of permission checks and safety code.
- Stricter spec enforcement. pip accepts malformed packages that technically violate packaging specs. uv rejects them. Less tolerance means less fallback logic.
- Ignoring requires-python upper bounds. When a package says it requires python<4.0, uv ignores the upper bound and only checks the lower. This reduces resolver backtracking dramatically since upper bounds are almost always wrong. Packages declare python<4.0 because they haven’t tested on Python 4, not because they’ll actually break. The constraint is defensive, not predictive.
- First-index wins by default. When multiple package indexes are configured, pip checks all of them. uv picks from the first index that has the package, stopping there. This prevents dependency confusion attacks and avoids extra network requests.
Each of these is a code path pip has to execute and uv doesn’t.
uv does not support .egg files or pip.conf, ignores upper bounds in requires-python, and does not compile .py files to .pyc bytecode by default. Removing these code paths lets the tool run faster.
PEP 658 went live on PyPI in May 2023. uv launched in February 2024. The timing isn’t coincidental. uv could be fast because the ecosystem finally had the infrastructure to support it. A tool like uv couldn’t have shipped in 2020. The standards weren’t there yet.
Until the packaging standards for pyproject.toml and wheel metadata were in place (PEP 658 went live on PyPI in May 2023), a tool like uv would not have been possible.
The relevant PEP standards are 517, 518, 621 and 658
SQLite integrates seamlessly with Python. Here’s a simple script to interact with our database:
```python
import sqlite3
import pandas as pd

# Connect to the database
conn = sqlite3.connect('analytics.db')

# Create a cursor
cur = conn.cursor()

# Execute a query
cur.execute("""
    SELECT pv.page_url, pv.view_count, COUNT(uv.id) as unique_visitors
    FROM page_views pv
    LEFT JOIN user_visits uv ON pv.id = uv.page_id
    GROUP BY pv.id
    ORDER BY pv.view_count DESC
""")

# Fetch all rows
rows = cur.fetchall()

# Create a pandas DataFrame
df = pd.DataFrame(rows, columns=['Page URL', 'View Count', 'Unique Visitors'])
print(df)

# Close the connection
conn.close()
```
SQLite integrates with Python. What about PHP?
Python is not a great language for data science. Part 1: The experience
The simple software I wrote really made it a feasible one-person project though, and motivates me to go through the whole process again next year.
This is the key bit for me. The reproducibility of the process itself will make the next time a lot more streamlined. I really like this article!
This won't trigger an error in Python, though. You have to use a type checker like mypy, pyright, or ty to get the warning. But it's a lightweight alternative, and even without tooling, it tells other programmers what parts of the program do what.
oh neat, so basically the NewType thing is a way to be more explicit about what a function is expecting, without needing to spend loads of time defining a dataclass and so on - it's a bit like annotating an existing type like a string, in a way that subsequent functions can look for
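A minimal sketch of the NewType idea described above. The names UserId, OrderId, and load_user are hypothetical, chosen only for illustration; the point is that the distinction exists for type checkers and costs nothing at runtime.

```python
from typing import NewType

# Hypothetical identifier types; both are plain ints at runtime.
UserId = NewType("UserId", int)
OrderId = NewType("OrderId", int)


def load_user(user_id: UserId) -> str:
    # The annotation documents intent; Python itself does not enforce it.
    return f"user-{user_id}"


uid = UserId(42)
oid = OrderId(42)

print(load_user(uid))    # accepted by a type checker
print(load_user(oid))    # still runs, but a type checker should flag OrderId here
print(type(uid) is int)  # NewType adds no runtime wrapper
```

Running a checker such as mypy or pyright over this file is what surfaces the mismatch on the second call.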
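👎 No updates since December 2024; the author has probably abandoned the project.
Features: - Audio transcription uses the Whisper model (implemented with the Python package openai-whisper, which requires PyTorch) - Translation only supports the Azure Translator API
Deployment: based on Python and ffmpeg
Capabilities: - Speech transcription supports local engines (WhisperCpp/FasterWhisper) and online ones (B interface/J interface??) - Subtitle translation supports traditional engines and LLMs - Traditional engines: DeepL/Microsoft/Google - LLM: Ollama, DeepSeek, SiliconFlow, and [OpenAI-compatible APIs] (an LLM API relay service is provided alongside)
Installation and deployment - Windows has a one-click installer - macOS users have to set it up themselves with Python, and the author says this has not been verified 👎. In addition, the local Whisper feature does not yet support macOS.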
Go
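Go itself has no explicit "iterator type", unlike Python with iter() and next(), or Java with its Iterator interface.
In Go, iteration is done with the range keyword. It is a built-in language construct with iteration logic behind it, but that logic is "hidden" from the programmer: you neither need nor are able to access an "iterator object" directly.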
for
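Are iter() and next() part of Python code?
Yes, iter() and next() are Python built-in functions and part of the standard language.
What would the code look like if you fetched items one by one with iter() and next() instead of using for?
You can skip the for loop entirely and use them by hand to walk through an iterable.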
```python
my_list = ["apple", "banana", "orange"]

it = iter(my_list)

print(next(it))  # Output: apple
print(next(it))  # Output: banana
print(next(it))  # Output: orange
```
You can also use a while loop to imitate what for does:
```
it = iter(my_list)
while True:
    try:
        item = next(it)
        print(item)
    except StopIteration:
        break
```
Conversion
You can use iter() to turn an iterable into an iterator:
```python
my_list = [1, 2, 3]
it = iter(my_list)  # it is now an iterator
print(next(it))  # Output: 1
print(next(it))  # Output: 2
```
Another example:
```python
my_list = [1, 2, 3]  # this is an iterable
my_iter = iter(my_list)  # now we have turned it into an iterator

print(next(my_iter))  # Output: 1
print(next(my_iter))  # Output: 2
print(next(my_iter))  # Output: 3
```
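Tips:
- my_list is an iterable; you can loop over it with for.
- my_iter is an iterator; you can take elements out one step at a time with next().
🧠 An analogy that may help:
* An iterable is like a book 📖 whose pages you can read through.
* An iterator is like a bookmark 📎: it remembers which page you have reached and advances one page at a time until the book is finished.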
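To make the bookmark analogy concrete, here is a small sketch of a hand-written iterator. The Countdown class is hypothetical; it only illustrates that an iterator keeps its own position and raises StopIteration when exhausted.

```python
class Countdown:
    """Hypothetical iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start: int) -> None:
        self.current = start  # the "bookmark": where we are right now

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration  # nothing left to read
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first = next(countdown)       # advances the bookmark by one
rest = list(countdown)        # consumes whatever is left
again = list(countdown)       # already exhausted, so nothing comes out
print(first, rest, again)
```

Because the position lives inside the iterator, a second pass over the same object yields nothing; a fresh Countdown is needed to start over.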
for value in iterable: print(value)
Welcome to Python Online Compiler!
Key features:
• Run Python code directly in your browser
• Install packages with pip
• Share your code with others
• Multiple tabs for different files
• Light and dark themes
Say goodbye to Python setup hassles! Code directly in your browser with zero installation needed. Write your code, hit RUN, and see results instantly. Our clean, powerful editor gives you syntax highlighting, multiple tabs, and easy package installation with pip.
Need to share your work? One click generates a link anyone can use to view and run your code. Switch between light and dark themes, customize your layout, and code Python anywhere, anytime. Try our random code examples to get started quickly.
Python Online - the simplest way to write, run, and share Python code!
If we want to model complex objects, we need some familiarity with the pre-existing objects
To model objects and, more generally, to write code and get into the world of Object-Oriented Programming, you need some grasp of the form, structure, and particular features of programming languages. Here is a resource that gives some background on writing object-oriented code in Python
all / any: all checks whether every element of the given list is true when treated as a condition. any checks whether at least one element of the given list is true when treated as a condition.
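A short sketch of all and any, using a made-up list of numbers. The behaviour on empty input is included because it is easy to get wrong.

```python
numbers = [2, 4, 6, 7]  # sample data, chosen only for illustration

# all(): every element must satisfy the condition.
all_even = all(n % 2 == 0 for n in numbers)

# any(): at least one element must satisfy the condition.
any_odd = any(n % 2 == 1 for n in numbers)

# Edge cases: all() of nothing is True, any() of nothing is False.
empty_all = all([])
empty_any = any([])

print(all_even, any_odd, empty_all, empty_any)
```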
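Note that exponentiation can also be computed with the ** operator covered in section 1.3, but the built-in function pow is faster. Likewise, taking the remainder afterwards, as in pow(a,b) % mod, gives the same result, but when pow(a,b) becomes very large, pow(a,b,mod) is faster. In competitive programming, using the pow function is often essential, so it is worth remembering.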
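A sketch of the two ways to compute a modular power. The values of base, exponent, and mod are arbitrary examples; the three-argument form reduces modulo mod at every step instead of building the full power first.

```python
base, exponent, mod = 3, 200, 1_000_000_007  # arbitrary example values

# Builds the full (very large) integer first, then reduces it.
after_the_fact = (base ** exponent) % mod

# Reduces as it goes, so intermediate values stay below mod squared.
modular = pow(base, exponent, mod)

print(after_the_fact == modular)
print(pow(2, 10, 1000))  # 1024 % 1000
```

With exponents in the millions or more, the first form has to materialise an enormous integer, which is where the difference in speed comes from.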
Virtual environment
class collections.Counter([iterable-or-mapping])
This Counter class is one of my favorite new-ish features of Python. It's very handy and saves me some coding. It could be helpful for producing data for a histogram or other graphs.
It seems to have been introduced in Python 3.1, which was a few years ago, but I've been using Python for 23+ years, so it's "new" to me.
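As a quick illustration of why Counter saves code when preparing histogram-style data, here is a sketch with a made-up sentence as input.

```python
from collections import Counter

# Sample text, invented for this example.
words = "the quick brown fox jumps over the lazy dog the end".split()

counts = Counter(words)          # tally each distinct word
top = counts.most_common(1)      # most frequent word with its count
missing = counts["cat"]          # absent keys count as zero, no KeyError

counts.update(["fox"])           # add more observations later
print(top, missing, counts["fox"])
```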
Deploying Machine Learning Models with Flask and AWS Lambda: A Complete Guide
In essence, this article is about:
1) Training a sample model and uploading it to an S3 bucket:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import joblib

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

joblib.dump(model, 'model.pkl')
```
```json
{
    "dev": {
        "app_function": "app.app",
        "exclude": [
            "boto3",
            "dateutil",
            "botocore",
            "s3transfer",
            "concurrent"
        ],
        "profile_name": null,
        "project_name": "flask-test-app",
        "runtime": "python3.10",
        "s3_bucket": "zappa-31096o41b"
    },
    "production": {
        "app_function": "app.app",
        "exclude": [
            "boto3",
            "dateutil",
            "botocore",
            "s3transfer",
            "concurrent"
        ],
        "profile_name": null,
        "project_name": "flask-test-app",
        "runtime": "python3.10",
        "s3_bucket": "zappa-31096o41b"
    }
}
```
```python
import boto3
import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

s3 = boto3.client('s3')

# Download the trained model from S3 once, at import time
s3.download_file('your-s3-bucket-name', 'model.pkl', '/tmp/model.pkl')
model = joblib.load('/tmp/model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # Get the data from the POST request
    data = request.get_json(force=True)

    # Convert the data into a numpy array
    input_data = np.array(data['input']).reshape(1, -1)

    # Make a prediction using the model
    prediction = model.predict(input_data)

    # Return the prediction as a JSON response
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(debug=True)
```
```bash
zappa deploy production
```
and later eventually updating it:
```bash
zappa update production
```
https://xyz123.execute-api.us-east-1.amazonaws.com/production
which we can query:
curl -X POST -H "Content-Type: application/json" -d '{"input": [5.1, 3.5, 1.4, 0.2]}' https://xyz123.execute-api.us-east-1.amazonaws.com/production/predict
I’m writing this on October 15th, 2024. Last week I would’ve said you probably shouldn’t be using uv’s Python in production, because you wouldn’t be getting security updates to OpenSSL. This week, I would tentatively say that it’s fine. This makes me a little uncomfortable, because there may well be other issues I haven’t thought of, and uv is still very new.
You may use uv in production, but there may still be some undiscovered quirks.
The uv-provided Python executable is slower than the one shipped by Ubuntu 24.04 LTS, but it’s faster than the “official” Docker image.
The ability to install Python with uv adds interesting possibilities for production packaging. For example, you can use an Ubuntu 24.04 base Docker image, download uv, and rely on uv to trivially install any Python version. Which is to say, you won’t be limited to the versions Ubuntu packages for you.
Unlike most Python packaging tools, uv doesn’t require Python to be installed to use it.
About uv Python packaging tool
In comparison, Perl/Python/Javascript, which also have the latter property, have other false-like values (0 and empty string), which make || differ from a null-coalescing operator in many more cases (numbers and strings being two of the most frequently used data types). This is what led Perl and Javascript to add a separate operator (// and ??), and Python to propose one in PEP 505, which remains deferred, while Ruby hasn't.
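A sketch of the pitfall in Python terms, where `or` plays the role of ||. The two helper functions and the default port 8080 are hypothetical; they only show how a legitimate falsy value such as 0 is swallowed by `or` but preserved by an explicit None check.

```python
def port_with_or(configured):
    # Falls back whenever `configured` is falsy: None, 0, "", [], ...
    return configured or 8080


def port_with_none_check(configured):
    # Falls back only when the value is actually missing.
    return configured if configured is not None else 8080


print(port_with_or(None), port_with_none_check(None))  # both fall back
print(port_with_or(0), port_with_none_check(0))        # only `or` discards the 0
```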
austin
Say goodbye to the headaches of setting up Python locally. No more installations or configurations, you can execute Python code right in your web browser. Just input your code, hit RUN, and watch the magic happen! Compile, run, and share Python code online with our powerful integrated Python development environment (IDE). Want to show off your work? Use the SHARE option to make your code accessible to anyone, anywhere.
One of the best Python compilers if you're a new dev and don't want to install Python locally.
CPython
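CPython is currently the most popular Python runtime. It already has 60k stars on GitHub, which shows how many people care about Python's internals and are curious how such a concise, elegant language was built.
What is CPython?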
Note that the Python documentation refers to these as special methods and notes the synonym "magic method" but very rarely uses the term "dunder method". However, "dunder method" is a fairly common Python colloquialism, as noted in my unofficial Python glossary.
special methods = magic methods = dunder methods
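For readers who have not met the term, a sketch of a class that defines a few special (dunder) methods. The Vector class is a hypothetical example; Python calls these methods implicitly when the corresponding operator or built-in is used.

```python
class Vector:
    """Hypothetical 2D vector used to show special methods in action."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        # Used by repr() and the interactive prompt.
        return f"Vector({self.x}, {self.y})"

    def __eq__(self, other: object) -> bool:
        # Used by the == operator.
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other: "Vector") -> "Vector":
        # Used by the + operator.
        return Vector(self.x + other.x, self.y + other.y)


total = Vector(1, 2) + Vector(3, 4)
print(repr(total))
```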
python -m webbrowser https://pym.dev/p
Opening URL using Python's webbrowser module
What are the tools that come to mind when someone says “debug”? Let me guess: a memory leak detector (e.g. Valgrind); a profiler (e.g. GNU gprof); a function that stops your program and gives you a REPL (e.g. Python’s breakpoint and Ruby’s byebug); something that we call a “debugger” (like GDB, or something similar embedded in IDEs); or even our old friend, the print function. So, in this text I’ll try to convince you to add Git to your debug toolbelt.
6 different debugging tools
The result? Our runtime image just got 6x smaller! Six times! From > 1.1 GB to 170 MB.
See (above this annotation) the most optimized & CI friendly Python Docker build with Poetry (until this issue gets resolved)
Perl vs Python: What are the Key Differences?
Check this blog to find the key difference between Perl vs Python
Python List append() Method
This web page offers a brief overview of the append() method for lists in Python.
Tuples can include different data types.
This is an interesting aspect. Tuples are not limited to a single data type.
We’ve (painstakingly) manually reviewed 310 live MLOps positions, advertised across various platforms in Q4 this year
They went through 310 role descriptions and, even though role descriptions may vary significantly, they found 3 core skills that a large percentage of MLOps roles required:
📦 Docker and Kubernetes 🐍 Python 🌥 Cloud
Emacs Application Framework
very interesting
setuptools is the most popular (at 50k packages), Poetry is second at 41k, Hatchling is third at 8.1k. Other tools to cross 500 users include Flit (4.4k), PDM (1.3k), Maturin (1.3k, build backend for Rust-based packages).
Popularity of Python package managers in 2024
Python Development in Spacemacs
Installing qutebrowser with virtualenv
jalajthanaki / NLPython
연오의 파이썬
연오
ambiguous src
Why is using a "src" folder to contain module files ambiguous? In this case "sample" seems more ambiguous because it could be referring to a folder containing sample data. Also, how do you know the module is not called docs?
My Python Emacs Workflow
Measure Execution Time With time.thread_time()
The time.thread_time() reports the time that the current thread has been executing.
The time begins or is zero when the current thread is first created.
Return the value (in fractional seconds) of the sum of the system and user CPU time of the current thread.
It is an equivalent value to the time.process_time(), except calculated at the scope of the current thread, not the current process.
This value is calculated as the sum of the system time and the user time.
thread time = user time + system time
The reported time does not include sleep time.
This means if the thread is blocked by a call to time.sleep() or perhaps is suspended by the operating system, then this time is not included in the reported time. This is called a “thread-wide” or “thread-specific” time.
Measure Execution Time With time.process_time()
The time.process_time() function reports the CPU time that the current process has spent executing.
The time begins or is zero when the current process is first created.
Calculated as the sum of the system time and the user time:
process time = user time + system time
System time is the time the CPU spends executing system calls in the kernel (e.g. the operating system) on behalf of the program.
User time is time spent by the CPU executing calls in the program (e.g. your code).
When a program loops through an array, it is accumulating user CPU time. Conversely, when a program executes a system call such as exec or fork, it is accumulating system CPU time.
The reported time does not include sleep time.
This means if the process is blocked by a call to time.sleep() or perhaps is suspended by the operating system, then this time is not included in the reported time. This is called a “process-wide” time.
As such, it only reports the time that the current process was executed since it was created by the operating system.
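A small sketch of the point that process time excludes sleep. The sleep length and loop size are arbitrary; the wall-clock figure from time.perf_counter() includes the sleep, while time.process_time() counts only CPU work.

```python
import time

start_wall = time.perf_counter()
start_cpu = time.process_time()

time.sleep(0.2)                              # blocked: no CPU time accrues
total = sum(i * i for i in range(200_000))   # CPU-bound work: this is counted

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu

print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

The exact numbers depend on the machine, but the CPU figure should come out well below the wall-clock figure because the sleep is not included.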
Measure Execution Time With time.monotonic()
The time.monotonic() function returns time stamps from a clock that cannot go backwards, as its name suggests.
In mathematics, a monotonic function is one whose output only ever increases (or only ever decreases).
This means that the result from the time.monotonic() function will never be before the result from a prior call.
Return the value (in fractional seconds) of a monotonic clock, i.e. a clock that cannot go backwards.
It is a high-resolution time stamp, although it is not relative to the epoch like time.time(). Instead, like time.perf_counter(), it uses a timer separate from the system clock.
The time.monotonic() has a lower resolution than the time.perf_counter() function.
This means that values from the time.monotonic() function can be compared to each other, relatively, but not to the system clock.
Like the time.perf_counter() function, time.monotonic() function is “system-wide”, meaning that it is not affected by changes to the system clock, such as updates or clock adjustments due to time synchronization.
Like the time.perf_counter() function, the time.monotonic() function was introduced in Python version 3.3 with the intent of addressing the limitations of the time.time() function tied to the system clock, such as use in short-duration benchmarking.
Monotonic clock (cannot go backward), not affected by system clock updates.
Measure Execution Time With time.perf_counter()
The time.perf_counter() function reports the value of a performance counter on the system.
It does not report the time since epoch like time.time().
Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide.
The returned value in seconds with fractional components (e.g. milliseconds and nanoseconds), provides a high-resolution timestamp.
Calculating the difference between two timestamps from the time.perf_counter() allows high-resolution execution time benchmarking, e.g. in the millisecond and nanosecond range.
The timestamp from the time.perf_counter() function is consistent, meaning that two durations can be compared relative to each other in a meaningful way.
The time.perf_counter() function was introduced in Python version 3.3 with the intended use for short-duration benchmarking.
The perf_counter() function was specifically designed to overcome the limitations of other time functions to ensure that the result is consistent across platforms and monotonic (always increasing).
For accuracy, the timeit module uses the time.perf_counter() internally.
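A sketch of both approaches: timing by hand with time.perf_counter(), and letting timeit repeat the measurement. The work function is a stand-in for whatever code is being benchmarked.

```python
import time
import timeit


def work() -> int:
    # Placeholder workload, chosen only for illustration.
    return sum(i * i for i in range(100_000))


# Manual timing: difference between two perf_counter() readings.
start = time.perf_counter()
result = work()
elapsed = time.perf_counter() - start

# timeit runs the callable several times; taking the minimum reduces noise.
best = min(timeit.repeat(work, number=5, repeat=3))

print(f"single run: {elapsed:.6f}s, best of 3 batches of 5: {best:.6f}s")
```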
Measure Execution Time With time.time()
The time.time() function reports the number of seconds since the epoch (epoch is January 1st 1970, which is used on Unix systems and beyond as an arbitrary fixed time in the past) as a floating point number.
The result is a floating point value, potentially offering fractions of a second (e.g. milliseconds), if the platform supports it.
The time.time() function is not perfect.
It is possible for a subsequent call to time.time() to return a value lower than the previous one, for example if the system clock is set back between the two calls.
Note: even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls.
there are automatic ways to measure execution time, such as via the timeit module.
There are 5 ways to measure execution time manually in Python using the time module, they are:
- time.time()
- time.perf_counter()
- time.monotonic()
- time.process_time()
- time.thread_time()
Note, each function returns a time in seconds and has an equivalent function that returns the time in nanoseconds, e.g. time.time_ns(), time.perf_counter_ns(), time.monotonic_ns(), time.process_time_ns() and time.thread_time_ns().
Recall that there are 1,000 nanoseconds in one microsecond, 1,000 microseconds in 1 millisecond, and 1,000 milliseconds in one second. This highlights that the nanosecond versions of the function are for measuring very short time scales indeed.
There are common errors experienced by beginners when getting started with asyncio in Python.
They are:
And this is where the asynchronicity comes in: The "results" list does not actually contain the results from running our functions. Instead, it contains "futures" which are similar to the JavaScript idea of "promises." In order to allow our program to continue running, we get back these futures that represent a placeholder for a value. If we try to print the future, depending on whether it's finished running or not, we'll either get back a state of "pending" or "finished." Once it's finished we can get the return value (assuming there is one) using var.result().
The difference between asyncio.sleep() and time.sleep() is that asyncio.sleep() is non-blocking.
The calls don't actually get made until we schedule them with await asyncio.gather(*tasks). This runs all of the tasks in our list and waits for them to finish before continuing with the rest of our program.
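A minimal sketch of the pattern described above. The fetch coroutine and its delays are invented; asyncio.sleep stands in for real non-blocking I/O. asyncio.gather returns results in the order the awaitables were passed, not the order they finish.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep yields control so other coroutines can run meanwhile.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # Creating the coroutine objects does not run them yet.
    tasks = [fetch("a", 0.02), fetch("b", 0.01)]
    # gather schedules them all and waits until every one has finished.
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
print(results)
```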
Programming with asyncio pretty much enforces using some sort of "main" function.
This is because you need to use the "async" keyword in order to use the "await" syntax, and the "await" syntax is the only way to actually run other async functions.
async for (not used here) iterates over an asynchronous stream.
async with allows awaiting async responses and file operations.
What is a thread?
A thread is a way of allowing your computer to break up a single process/program into many lightweight pieces that execute in parallel. Somewhat confusingly, Python's standard implementation of threading limits threads to only being able to execute one at a time due to something called the Global Interpreter Lock (GIL). The GIL is necessary because CPython's (Python's default implementation) memory management is not thread-safe. Because of this limitation, threading in Python is concurrent, but not parallel. To get around this, Python has a separate multiprocessing module not limited by the GIL that spins up separate processes, enabling parallel execution of your code. Using the multiprocessing module is nearly identical to using the threading module.
Asynchronous nature of threading: as one function waits, another one begins, and so on.
when we join threads with thread.join(), all we're doing is ensuring the thread has finished before continuing on with our code.
Creating a thread is not the same as starting a thread, however. To start your thread, use {the name of your thread}.start(). Starting a thread means "starting its execution."
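A sketch of the create, start, and join steps. The worker function and the shared results list are hypothetical; a lock guards the list so the example stays correct regardless of scheduling.

```python
import threading

results = []
lock = threading.Lock()


def worker(n: int) -> None:
    # Each thread appends its square; the lock serialises access to the list.
    with lock:
        results.append(n * n)


# Creating Thread objects does not run anything yet.
threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]

for t in threads:
    t.start()   # execution begins here

for t in threads:
    t.join()    # block until this thread has finished

print(sorted(results))
```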
Running the code in a subprocess is much slower than running a thread, not because the computation is slower, but because of the overhead of copying and (de)serializing the data. So how do you avoid this overhead?
Reducing the performance hit of copying data between processes:
Option #1: Just use threads
Processes have overhead, threads do not. And while it’s true that generic Python code won’t parallelize well when using multiple threads, that’s not necessarily true for your Python code. For example, NumPy releases the GIL for many of its operations, which means you can use multiple CPU cores even with threads.
```
# numpy_gil.py
import numpy as np
from time import time
from multiprocessing.pool import ThreadPool

arr = np.ones((1024, 1024, 1024))

start = time()
for i in range(10):
    arr.sum()
print("Sequential:", time() - start)

expected = arr.sum()

start = time()
with ThreadPool(4) as pool:
    result = pool.map(np.sum, [arr] * 10)
    assert result == [expected] * 10
print("4 threads:", time() - start)
```
When run, we see that NumPy uses multiple cores just fine when using threads, at least for this operation:
$ python numpy_gil.py
Sequential: 4.253053188323975
4 threads: 1.3854241371154785
Pandas is built on NumPy, so many numeric operations will likely release the GIL as well. However, anything involving strings, or Python objects in general, will not. So another approach is to use a library like Polars which is designed from the ground-up for parallelism, to the point where you don’t have to think about it at all, it has an internal thread pool.
Option #2: Live with it
If you’re stuck with using processes, you might just decide to live with the overhead of pickling. In particular, if you minimize how much data gets passed back and forth between processes, and the computation in each process is significant enough, the cost of copying and serializing data might not significantly impact your program’s runtime. Spending a few seconds on pickling doesn’t really matter if your subsequent computation takes 10 minutes.
Option #3: Write the data to disk
Instead of passing data directly, you can write the data to disk, and then pass the path to this file: * to the subprocess (as an argument) * to parent process (as the return value of the function running in the worker process).
The recipient process can then parse the file.
```
import pandas as pd
import multiprocessing as mp
from pathlib import Path
from tempfile import mkdtemp
from time import time

def noop(df: pd.DataFrame):
    # real code would process the dataframe here
    pass

def noop_from_path(path: Path):
    df = pd.read_parquet(path, engine="fastparquet")
    # real code would process the dataframe here
    pass

def main():
    df = pd.DataFrame({"column": list(range(10_000_000))})

    with mp.get_context("spawn").Pool(1) as pool:
        # Pass the DataFrame to the worker process
        # directly, via pickling:
        start = time()
        pool.apply(noop, (df,))
        print("Pickling-based:", time() - start)

        # Write the DataFrame to a file, pass the path to
        # the file to the worker process:
        start = time()
        path = Path(mkdtemp()) / "temp.parquet"
        df.to_parquet(
            path,
            engine="fastparquet",
            # Run faster by skipping compression:
            compression="uncompressed",
        )
        pool.apply(noop_from_path, (path,))
        print("Parquet-based:", time() - start)

if __name__ == "__main__":
    main()
```
Option #4: multiprocessing.shared_memory
Because processes sometimes do want to share memory, operating systems typically provide facilities for explicitly creating shared memory between processes. Python wraps these facilities in the multiprocessing.shared_memory module.
However, unlike threads, where the same memory address space allows trivially sharing Python objects, in this case you’re mostly limited to sharing arrays. And as we’ve seen, NumPy releases the GIL for expensive operations, which means you can just use threads, which is much simpler. Still, in case you ever need it, it’s worth knowing this module exists.
Note: The module also includes ShareableList, which is a bit like a Python list but limited to int, float, bool, small str and bytes, and None. But this doesn’t help you cheaply share an arbitrary Python object.
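A minimal sketch of the module's basic workflow, kept inside one process for brevity. In real use the second handle would be opened in a different process, which only needs the block's name; the 16-byte size and the payload are arbitrary.

```python
from multiprocessing import shared_memory

# Create a named block of shared memory.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b"hello"  # write raw bytes into the block

    # Attach a second handle by name, as another process would.
    other = shared_memory.SharedMemory(name=shm.name)
    received = bytes(other.buf[:5])  # copy out before closing the handle
    other.close()
finally:
    shm.close()
    shm.unlink()  # free the block once nobody needs it

print(received)
```

Only raw bytes travel this way, which matches the point above: it suits arrays and buffers, not arbitrary Python objects.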
A bad option for Linux: the "fork" context
You may have noticed we did multiprocessing.get_context("spawn").Pool() to create a process pool. This is because Python has multiple implementations of multiprocessing on some OSes. "spawn" is the only option on Windows, the only non-broken option on macOS, and available on Linux. When using "spawn", a completely new process is created, so you always have to copy data across.
On Linux, the default is "fork": the new child process has a complete copy of the memory of the parent process at the time of the child process’ creation. This means any objects in the parent (arrays, giant dicts, whatever) that were created before the child process was created, and were stored somewhere helpful like a module, are accessible to the child. Which means you don’t need to pickle/unpickle to access them.
Sounds useful, right? There’s only one problem: the "fork" context is super-broken, which is why it will stop being the default in Python 3.14.
Consider the following program:
```
import threading
import sys
from multiprocessing import Process

def thread1():
    for i in range(1000):
        print("hello", file=sys.stderr)

threading.Thread(target=thread1).start()

def foo():
    pass

Process(target=foo).start()
```
On my computer, this program consistently deadlocks: it freezes and never exits. Any time you have threads in the parent process, the "fork" context can cause potential deadlocks, or even corrupted memory, in the child process.
You might think that you’re fine because you don’t start any threads. But many Python libraries start a thread pool on import, for example NumPy. If you’re using NumPy, Pandas, or any other library that depends on NumPy, you are running a threaded program, and therefore at risk of deadlocks, segfaults, or data corruption when using the "fork" multiprocessing context. For more details see this article on why multiprocessing’s default is broken on Linux.
You’re just shooting yourself in the foot if you take this approach.
When you’re writing Python, though, you want to share Python objects between processes.
To enable this, when you pass Python objects between processes using Python’s multiprocessing library:
pickle.
This serialization and deserialization process involves computation, which can potentially be slow.
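A sketch of the round trip an object goes through when it crosses a process boundary, done by hand with the pickle module. The payload dictionary is invented; the receiver ends up with an equal copy, not the same object.

```python
import pickle

# Example object, made up for illustration.
payload = {"ids": list(range(5)), "label": "batch-1"}

blob = pickle.dumps(payload)    # sender side: object -> bytes
restored = pickle.loads(blob)   # receiver side: bytes -> new object

print(len(blob), restored == payload, restored is payload)
```

Both steps do work proportional to the size of the data, which is the overhead this section is about.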
Threads vs. processes
Multiple threads let you run code in parallel, potentially on multiple CPUs. On Python, however, the global interpreter lock makes this parallelism harder to achieve.
Multiple processes also let you run code in parallel—so what’s the difference between threads and processes?
All the threads inside a single process share the same memory address space. If thread 1 in a process stores some memory at address 0x7f0cd1a88810, thread 2 can access the same memory at the same address. That means passing objects between threads is cheap: you just need to get the pointer to the memory address from one thread to the other. A memory address is 8 bytes: this is not a lot of data to move around.
In contrast, processes do not share the same memory space. There are some shared memory facilities provided by the operating system, typically, and we’ll get to that later. But by default, no memory is shared. That means you can’t just share the address of your data across processes: you have to copy the data.
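A minimal sketch of the thread side of this comparison: the worker thread receives nothing but a reference and mutates the very same list the main thread created. The names are illustrative only.

```python
import threading

shared = []  # lives in the single address space of this process

def worker():
    # The thread sees the very same list object, not a copy,
    # so it can record the object's identity inside the object itself.
    shared.append(id(shared))

t = threading.Thread(target=worker)
t.start()
t.join()

same_object = shared[0] == id(shared)
print(same_object)
```

With a separate process, the same list would have to be copied (by default via pickle) before the worker could see it.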
Technique #2: Sampling
How do you load only a subset of the rows?
When you load your data, you can specify a skiprows function that will randomly decide whether to load that row or not:
```
>>> import pandas as pd
>>> from random import random
>>> def sample(row_number):
...     if row_number == 0:
...         # Never drop the row with column names:
...         return False
...     # random() returns uniform numbers between 0 and 1:
...     return random() > 0.001
...
>>> sampled = pd.read_csv("/tmp/voting.csv", skiprows=sample)
>>> len(sampled)
973
```
lossy compression: drop some of your data in a way that doesn’t impact your final results too much.
If parts of your data don’t impact your analysis, no need to waste memory keeping extraneous details around.
except StopAsyncIteration if is_async else StopIteration:
Interesting: using ternary operator in except clause
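This works because except accepts any expression that evaluates to an exception class (or a tuple of classes). A small sketch with a hypothetical lookup helper, unrelated to the quoted code:

```python
def lookup(container, key, is_mapping):
    """Return container[key], or None if the key/index is missing."""
    try:
        return container[key]
    # The conditional expression picks which exception type to catch.
    except KeyError if is_mapping else IndexError:
        return None

print(lookup({"a": 1}, "b", is_mapping=True))   # missing dict key is swallowed
print(lookup([10, 20], 5, is_mapping=False))    # out-of-range index is swallowed
```

If the flag does not match the container type, the other exception is not caught and propagates as usual.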
In sync code, you might use a thread pool and imap_unordered():
```
import multiprocessing.dummy

pool = multiprocessing.dummy.Pool(2)

for result in pool.imap_unordered(do_stuff, things_to_do):
    print(result)
```
Here, concurrency is limited by the fixed number of threads.
Gunicorn and multiprocessing
Gunicorn forks a base process into n worker processes, and each worker is managed by Uvicorn (with the asynchronous uvloop). Which means:
This way, we can have the best of both worlds: concurrency (multithreading) and parallelism (multiprocessing).
There is another way to declare a route with FastAPI
Using the asyncio:
```
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/asyncwait")
async def asyncwait():
    duration = 0.05
    await asyncio.sleep(duration)
    return {"duration": duration}
```
Use Python asyncio.as_completed
There will be moments when you don't have to wait for every single task to finish before you start handling results.
We do this by using asyncio.as_completed, which returns an iterator that yields awaitables in the order in which they complete.
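A minimal sketch of that pattern, with a made-up fetch coroutine that just sleeps; results are handled in completion order rather than submission order.

```python
import asyncio

async def fetch(name, delay):
    # Stand-in for real I/O such as an HTTP request.
    await asyncio.sleep(delay)
    return name

async def collect_in_completion_order():
    coros = [fetch("slow", 0.25), fetch("fast", 0.05), fetch("medium", 0.15)]
    finished = []
    # Each iteration hands back an awaitable for the next task to finish.
    for next_done in asyncio.as_completed(coros):
        finished.append(await next_done)
    return finished

order = asyncio.run(collect_in_completion_order())
print(order)
```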
When to use Python Async
Async only makes sense if you're doing IO.
There's ZERO benefit in using async for stuff like this that is CPU-bound:
```
import asyncio

async def sum_two_numbers_async(n1: int, n2: int) -> int:
    return n1 + n2

async def main():
    await sum_two_numbers_async(2, 2)
    await sum_two_numbers_async(4, 4)

asyncio.run(main())
```
Your code might even get slower by doing that due to the Event Loop.
That's because Python async only optimizes IDLE time!
If you want 2 or more functions to run concurrently, you need asyncio.create_task.
Creating a task triggers the async operation, and it needs to be awaited at some point.
For example:
task = create_task(my_async_function('arg1'))
result = await task
As we're creating many tasks, we need asyncio.gather which awaits all tasks to be done.
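Putting the two together, a small sketch (the coroutine and its delay are invented for illustration): three tasks are created up front, so their waits overlap, and asyncio.gather collects the results in submission order.

```python
import asyncio
import time

async def wait_and_return(label, delay):
    # Stand-in for an I/O-bound operation.
    await asyncio.sleep(delay)
    return label

async def main():
    start = time.perf_counter()
    # create_task schedules each coroutine on the event loop right away.
    tasks = [asyncio.create_task(wait_and_return(f"job-{i}", 0.2)) for i in range(3)]
    # gather awaits all of them and keeps the original order.
    results = await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The total time is close to one delay rather than three, because only the idle waiting overlaps; nothing runs in parallel.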
they think async is parallel which is not true
Fast API
Fast API is a high-level web framework like flask, but that happens to be async, unlike flask. With the added benefit of using type hints and pydantic to generate schemas.
It's not a building block like twisted, gevent, trio or asyncio. In fact, it's built on top of asyncio. It's in the same group as flask, bottle, django, pyramid, etc. Although it's a micro-framework, so it's focused on routing, data validation and API delivery.
The code isn't that different from your typical asyncio script:
```
import re
import time

import httpx
import trio

urls = [
    "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    "https://www.bitecode.dev/p/hype-cycles",
    "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
    "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
    "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
    "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
    "https://www.bitecode.dev/p/the-weirdest-python-keyword",
]

title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

user_agent = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
)


async def fetch_url(url):
    start_time = time.time()

    async with httpx.AsyncClient() as client:
        headers = {"User-Agent": user_agent}
        response = await client.get(url, headers=headers)
        match = title_pattern.search(response.text)
        title = match.group(1) if match else "Unknown"
        print(f"URL: {url}\nTitle: {title}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken for {url}: {elapsed_time:.4f} seconds\n")


async def main():
    global_start_time = time.time()

    # That's the biggest API difference
    async with trio.open_nursery() as nursery:
        for url in urls:
            nursery.start_soon(fetch_url, url)

    global_end_time = time.time()
    global_elapsed_time = global_end_time - global_start_time
    print(f"Total time taken for all URLs: {global_elapsed_time:.4f} seconds")


if __name__ == "__main__":
    trio.run(main)
```
Because it doesn't create nor schedule coroutines immediately (notice the nursery.start_soon(fetch_url, url) is not nursery.start_soon(fetch_url(url))), it will also consume less memory. But the most important part is the nursery:
    # That's the biggest API difference
    async with trio.open_nursery() as nursery:
        for url in urls:
            nursery.start_soon(fetch_url, url)
The with block scopes all the tasks, meaning everything that is started inside that context manager is guaranteed to be finished (or terminated) when it exits. First, the API is better than expecting the user to wait manually like with asyncio.gather: you cannot start concurrent coroutines without a clear scope in trio, it doesn't rely on the coder's discipline. But under the hood, the design is also different. The whole bunch of coroutines you group and start can be canceled easily, because trio always knows where things begin and end.
As soon as things get complicated, code with a curio-like design becomes radically simpler than code with an asyncio-like design.
trio
For many years, the very talented dev and speaker David Beazley has been showing unease with asyncio's design, and made more and more experiments and public talks about what an alternative could look like. It culminated with the excellent Die Threads presentation, live coding the sum of the experience of all those ideas, that eventually would become the curio library. Watch it. It’s so good.
Trio is not compatible with asyncio, nor gevent or twisted by default. This means it's also its little own async island.
But in exchange for that, it provides a very different internal take on how to deal with this kind of concurrency, where every coroutine is tied to an explicit scope, everything can be awaited easily, or canceled.
Because of the way gevent works, you can take a blocking script, and with very few modifications, make it async. Let's take the original stdlib one, and convert it to gevent:
```
import re
import time

import gevent
from gevent import monkey

monkey.patch_all()  # THIS MUST BE DONE BEFORE IMPORTING URLLIB

from urllib.request import Request, urlopen

urls = [
    "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    "https://www.bitecode.dev/p/hype-cycles",
    "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
    "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
    "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
    "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
    "https://www.bitecode.dev/p/the-weirdest-python-keyword",
]

title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

user_agent = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
)


def fetch_url(url):
    start_time = time.time()

    headers = {"User-Agent": user_agent}

    with urlopen(Request(url, headers=headers)) as response:
        html_content = response.read().decode("utf-8")
        match = title_pattern.search(html_content)
        title = match.group(1) if match else "Unknown"
        print(f"URL: {url}\nTitle: {title}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken: {elapsed_time:.4f} seconds\n")


def main():
    global_start_time = time.time()

    # Here is where we convert synchronous calls into async ones
    greenlets = [gevent.spawn(fetch_url, url) for url in urls]
    gevent.joinall(greenlets)

    global_end_time = time.time()
    global_elapsed_time = global_end_time - global_start_time
    print(f"Total time taken: {global_elapsed_time:.4f} seconds")


main()
```
No async, no await. No special lib except for gevent. In fact it would work with the requests lib just as well. Very few modifications are needed, for a net perf gain.
The only danger is if you call gevent.monkey.patch_all() too late. You get a cryptic error that crashes your program.
Tips
lambda or functools.partial.
max_workers=1 is a very nice way to get a poor man’s task queue.
Both are bad if you need to cancel tasks, collaborate a lot between tasks, deal precisely with the task lifecycle, need a huge number of workers, or want to milk out every single bit of perf. You won’t get anywhere near Rust-level speed.
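The single-worker trick from the tips above can be sketched like this (handle and the job values are invented for the example): with one worker, submitted jobs run one at a time, in submission order, off the main thread.

```python
from concurrent.futures import ThreadPoolExecutor

processed = []

def handle(job):
    # Record the order in which jobs are actually executed.
    processed.append(job)
    return job * 2

# One worker means jobs are executed sequentially, first in, first out.
with ThreadPoolExecutor(max_workers=1) as queue:
    futures = [queue.submit(handle, job) for job in range(5)]

results = [f.result() for f in futures]
print(processed, results)
```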
Process pools are good for:
Thread pools are good for:
E.G: a web scraper, a GUI to zip files, a development server, sending emails without blocking web page rendering, etc.
What would a version with multiprocessing look like?
Pretty much the same, but, we use ProcessPoolExecutor instead.
```python
from concurrent.futures import ProcessPoolExecutor, as_completed

...

with ProcessPoolExecutor(max_workers=5) as executor:
    ...
```
Note that here the number of workers maps to the number of CPU cores I want to dedicate to the program. Processes are way more expensive than threads, as each starts a new Python instance.
Python standard library comes with a beautiful abstraction for them I see too few people use: the pool executors.
ThreadPoolExecutor.
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def main():
    with ThreadPoolExecutor(max_workers=len(URLs)) as executor:
        tasks = {}
        for url in URLs:
            future = executor.submit(fetch_url, url)
            tasks[future] = url

        for future in as_completed(tasks):
            title = future.result()
            url = tasks[future]
            print(f"URL: {url}\nTitle: {title}")
```
You can distribute work to a bunch of process workers or thread workers with a few lines of code:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=5) as executor:
    executor.submit(do_something_blocking)
```
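For a version that runs as-is, here is a small sketch where a made-up slow_square function stands in for the blocking work; it follows the same submit / as_completed pattern as the URL example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def slow_square(n):
    # Stand-in for blocking work such as a network call.
    time.sleep(0.05)
    return n * n

numbers = [1, 2, 3, 4]
squares = {}

with ThreadPoolExecutor(max_workers=len(numbers)) as executor:
    # Map each future back to the input that produced it.
    tasks = {executor.submit(slow_square, n): n for n in numbers}
    # Results arrive as each future finishes, in no guaranteed order.
    for future in as_completed(tasks):
        squares[tasks[future]] = future.result()

print(squares)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor keeps the same shape, though on platforms that spawn processes the work function then has to live in an importable module and the pool should be started under an if __name__ == "__main__" guard.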
When you run your Python program using [CPython], the code is parsed and converted to an internal bytecode format, which is then executed inside the VM. From the user’s perspective, this is clearly an interpreter—they run their program from source. But if you look under CPython’s scaly skin, you’ll see that there is definitely some compiling going on. The answer is that it is both. CPython is an interpreter, and it has a compiler.
You can actually compile all of your Python code beforehand using the compileall module on the command line:
$ python3 -m compileall .
This will place the compiled bytecode of all Python files in the current directory in __pycache__/ and show you any compiler errors.
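The same compile step is exposed at runtime through the built-in compile(). A minimal sketch (the source string is made up) that compiles first and only then executes the resulting code object:

```python
source = "answer = 6 * 7"

# Step 1: compile the source text into a code object holding bytecode.
code_obj = compile(source, "<example>", "exec")
print(type(code_obj).__name__)

# Step 2: hand the code object to the interpreter loop.
namespace = {}
exec(code_obj, namespace)
print(namespace["answer"])
```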
Python is both a compiled and interpreted language
The CPython interpreter really is an interpreter. But it also is a compiler. Python must go through a few stages before ever running the first line of code:
Older versions of Python added an additional stage:
Let’s compare this to the stages of compiling a C program:
next stage is parsing (also known as syntactic analysis) and the parser reports the first error in the source code. Parsing the whole file happens before running the first line of code which means that Python does not even see the error on line 1 and reports the syntax error on line 2.
I haven’t done a deep dive into the source code of the CPython interpreter to verify this, but I think the reason that this is the first error detected is because one of the first steps that Python 3.12 does is scanning (also known as lexical analysis). The scanner converts the ENTIRE file into a series of tokens before continuing to the next stage. A missing quotation mark at the end of a string literal is an error that is detected by the scanner—the scanner wants to turn the ENTIRE string into one big token, but it can’t do that until it finds the closing quotation mark. The scanner runs first, before anything else in Python 3.12, hence why this is the first error message.
Python reports only one error message at a time—so the game is which error message will be reported first?
Here is the buggy program:
    1 / 0
    print() = None
    if False
    ñ = "hello
Each line of code generates a different error message:
The question is… which will be reported first?
Spoilers: the specific version of Python matters (more than I thought it would) so keep that in mind if you see different results.
The first error message detected is on the last line of source code. What this tells us is that Python must read the entire source code file before running the first line of code. If you have a definition in your head of an “interpreted language” that includes “interpreted languages run the code one line at a time”, then I want you to cross that out!
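One way to check this without a separate file, as a sketch: compile a two-line program whose first line is valid and whose second line has a syntax error, and confirm that the first line never runs. The source string here is invented for the demonstration.

```python
# Line 1 is valid and would record that it ran; line 2 is missing its colon.
source = 'executed.append("line 1 ran")\nif False\n'

executed = []
error_line = None
try:
    # The whole source must compile before any of it can execute.
    code = compile(source, "<buggy>", "exec")
    exec(code, {"executed": executed})
except SyntaxError as err:
    error_line = err.lineno

print(error_line, executed)
```

The error is reported for line 2 and the list stays empty, which shows line 1 was never executed.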
The tokenizer takes your source code and chunks it into “tokens”. Tokens are just small pieces of source code that you can identify in isolation. As examples, there will be tokens for numbers, mathematical operators, variable names, and keywords (like if or for). The parser will take that linear sequence of tokens and essentially reshape them into a tree structure (that's what the T in AST stands for: Tree). This tree is what gives meaning to your tokens, providing a nice structure that is easier to reason about and work on. As soon as we have that tree structure, our compiler can go over the tree and figure out what bytecode instructions represent the code in the tree. For example, if part of the tree represents a function, we may need a bytecode for the return statement of that function. Finally, the interpreter takes those bytecode instructions and executes them, producing the results of our original program.
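CPython's own versions of these stages can be poked at from the standard library. The following is only a sketch of the pipeline using tokenize, ast, compile, and exec on a made-up one-line program; it is not the toy implementation the article builds.

```python
import ast
import io
import tokenize

source = "result = 3 + 5\n"

# 1. Tokenizer: source text -> linear sequence of tokens.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()  # drop NEWLINE / ENDMARKER tokens
]

# 2. Parser: tokens -> tree (ast.parse tokenizes internally).
tree = ast.parse(source)
assignment = tree.body[0]

# 3. Compiler: tree -> bytecode, wrapped in a code object.
code = compile(tree, "<demo>", "exec")

# 4. Interpreter: execute the bytecode.
namespace = {}
exec(code, namespace)

print(tokens)
print(type(assignment).__name__, type(assignment.value).__name__)
print(namespace["result"])
```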
Recap
In this article you started implementing your own version of Python. To do so, you needed to create four main components:
A tokenizer:
* accepts strings as input (supposedly, source code);
* chunks the input into atomic pieces called tokens;
* produces tokens regardless of their sequence making sense or not.

A parser:
* accepts tokens as input;
* consumes the tokens one at a time, while making sure they come in an order that makes sense;
* produces a tree that represents the syntax of the original code.

A compiler:
* accepts a tree as input;
* traverses the tree to produce bytecode operations.

An interpreter:
* accepts bytecode as input;
* traverses the bytecode and performs the operation that each one represents;
* uses a stack to help with the computations.
Each bytecode is defined by two things: the type of bytecode operation we're dealing with (e.g., pushing things on the stack or doing an operation); and the data associated with that bytecode operation, which not all bytecode operations need.
The interpreter accepts a list of bytecode operations and its method interpret will go through the list of bytecodes, interpreting one at a time.
```
from .compiler import Bytecode, BytecodeType

# Stack is assumed to be defined earlier in this module.


class Interpreter:
    def __init__(self, bytecode: list[Bytecode]) -> None:
        self.stack = Stack()
        self.bytecode = bytecode
        self.ptr: int = 0

    def interpret(self) -> None:
        for bc in self.bytecode:
            # Interpret this bytecode operator.
            if bc.type == BytecodeType.PUSH:
                self.stack.push(bc.value)
            elif bc.type == BytecodeType.BINOP:
                right = self.stack.pop()
                left = self.stack.pop()
                if bc.value == "+":
                    result = left + right
                elif bc.value == "-":
                    result = left - right
                else:
                    raise RuntimeError(f"Unknown operator {bc.value}.")
                self.stack.push(result)

        print("Done!")
        print(self.stack)
```
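Since the snippet above depends on the article's own compiler module, here is a self-contained sketch of the same idea. The Bytecode dataclass, the BytecodeType enum, and the plain list used as a stack are stand-ins assumed for this example, not the article's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any

class BytecodeType(Enum):
    PUSH = auto()
    BINOP = auto()

@dataclass
class Bytecode:
    type: BytecodeType   # which kind of operation this is
    value: Any           # the data attached to the operation

def run(bytecode):
    """Interpret bytecodes one at a time, using a list as the stack."""
    stack = []
    for bc in bytecode:
        if bc.type == BytecodeType.PUSH:
            stack.append(bc.value)
        elif bc.type == BytecodeType.BINOP:
            # Operands come off in reverse order of how they were pushed.
            right = stack.pop()
            left = stack.pop()
            if bc.value == "+":
                stack.append(left + right)
            elif bc.value == "-":
                stack.append(left - right)
            else:
                raise RuntimeError(f"Unknown operator {bc.value}.")
    return stack

# Bytecode for the expression 3 + 5 - 2.
program = [
    Bytecode(BytecodeType.PUSH, 3),
    Bytecode(BytecodeType.PUSH, 5),
    Bytecode(BytecodeType.BINOP, "+"),
    Bytecode(BytecodeType.PUSH, 2),
    Bytecode(BytecodeType.BINOP, "-"),
]
final_stack = run(program)
print(final_stack)
```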
The interpreter is the part of the program that is responsible for taking bytecode operations as input and using those to actually run the source code you started off with.
To write our compiler, we'll just create a class with a method compile. The method compile will mimic the method parse in its structure. However, the method parse produces tree nodes and the method compile will produce bytecode operations.
The compiler is the part of our program that will take a tree (an AST, to be more precise) and it will produce a sequence of instructions that are simple and easy to follow.
Instead of interpreting the tree directly, we'll use a compiler to create an intermediate layer.
After we have our sequence of operations (bytecodes), we will “interpret” it. To interpret the bytecode means that we go over the bytecode, sequence by sequence, and at each point we perform the simple operation that the bytecode tells us to perform.
Bytecodes are just simple, atomic instructions that do one thing, and one thing only.
Abstract syntax tree
It's an abstract syntax tree because it is a tree representation that doesn't care about the original syntax we used to write the operation. It only cares about the operations we are going to perform.
The parser is the part of our program that accepts a stream of tokens and makes sure they make sense.
The tokenizer
The tokenizer is the part of your program that accepts the source code and produces a linear sequence of tokens – bits of source code that you identify as being relevant.
The four parts of our program
Once an interpreter is running (remembering what I said that it is preferable to leave them running) you can share data using a channel. The channels module is also part of PEP554 and available using a secret-import:
```
import _xxsubinterpreters as interpreters
import _xxinterpchannels as channels

interp_id = interpreters.create(site=site)
channel_id = channels.create()

interpreters.run_string(
    interp_id,
    """
import _xxinterpchannels as channels
channels.send('hello!')
""",
    shared={
        "channel_id": channel_id
    }
)

print(channels.recv(channel_id))
```
To share data, you can use the shared argument and provide a dictionary with shareable (int, float, bool, bytes, str, None, tuple) values:
```
import _xxsubinterpreters as interpreters

interp_id = interpreters.create(site=site)

interpreters.run_string(
    interp_id,
    "print(message)",
    shared={
        "message": "hello world!"
    }
)

interpreters.run_string(
    interp_id,
    """
for message in messages:
    print(message)
""",
    shared={
        "messages": ("hello world!", "this", "is", "me")
    }
)

interpreters.destroy(interp_id)
```
To start an interpreter that sticks around, you can use interpreters.create() which returns the interpreter ID. This ID can be used for subsequent .run_string calls:
```
import _xxsubinterpreters as interpreters

interp_id = interpreters.create(site=site)

interpreters.run_string(interp_id, "print('hello world')")
interpreters.run_string(interp_id, "print('hello universe')")

interpreters.destroy(interp_id)
```
Starting a sub interpreter is a blocking operation, so most of the time you want to start one inside a thread.
```
from threading import Thread
import _xxsubinterpreters as interpreters

t = Thread(target=interpreters.run, args=("print('hello world')",))
t.start()
```
You can create, run and stop a sub interpreter with the .run() function which takes a string or a simple function
```
import _xxsubinterpreters as interpreters

interpreters.run('''
print("Hello World")
''')
```
Inter-Worker communication
Whether using sub interpreters or multiprocessing you cannot simply send existing Python objects to worker processes.
Multiprocessing uses pickle by default. When you start a process or use a process pool, you can use pipes, queues and shared memory as mechanisms for sending data between the workers and the main process. These mechanisms revolve around pickling. Pickle is the builtin serialization library for Python that can convert most Python objects into a byte string and back into a Python object.
Pickle is very flexible. You can serialize a lot of different types of Python objects (but not all) and Python objects can even define a method for how they can be serialized. It also handles nested objects and properties. However, with that flexibility comes a performance hit. Pickle is slow. So if you have a worker model that relies upon continuous inter-worker communication of complex pickled data you’ll likely see a bottleneck.
Sub interpreters can accept pickled data. They also have a second mechanism called shared data. Shared data is a high-speed shared memory space that interpreters can write to and share data with other interpreters. It supports only immutable types: int, float, bool, bytes, str, None, and tuple.
To share data with an interpreter, you can either set it as initialization data or you can send it through a channel.
The next point when using a parallel execution model like multiprocessing or sub interpreters is how you share data.
Once you get over the hurdle of starting one, this quickly becomes the most important point. You have two questions to answer:
Half of the time taken to start an interpreter is taken up running “site import”. This is a special module called site.py that lives within the Python installation. Interpreters have their own caches, their own builtins, they are effectively mini-Python processes. Starting a thread or a coroutine is so fast because it doesn’t have to do any of that work (it shares that state with the owning interpreter), but it’s bound by the lock and isn’t parallel.
Both multiprocessing processes and interpreters have their own import state. This is drastically different to threads and coroutines. When you await an async function, you don’t need to worry about whether that coroutine has imported the required modules. The same applies for threads.
For example, you can import something in your module and reference it from inside the thread function:
```python
import threading
from super.duper.module import cool_function

def worker(info):
    # This already exists in the interpreter state
    cool_function()

info = {'a': 1}
thread = threading.Thread(target=worker, args=(info, ))
```
Another important point is that multiprocessing is often used in a model where the processes are long-running and handed lots of tasks instead of being spawned and destroyed for a single workload. One great example is Gunicorn, the popular Python web server. Gunicorn will spawn “workers” using multiprocessing and those workers will live for the lifetime of the main process. The time to start a process or a sub interpreter then becomes irrelevant (at 89 ms or 1 second) when the web worker can be running for weeks, months or years. The ideal way to use these parallel workers for small tasks (like handle a single web request) is to keep them running and use a main process to coordinate and distribute the workload
What is the difference between threading, multiprocessing, and sub interpreters?
The Python standard library has a few options for concurrent programming, depending on some factors:
Here are the models:
Rosetta is now Generally Available for all users on macOS 13 or later. It provides faster emulation of Intel-based images on Apple Silicon. To use Rosetta, see Settings. Rosetta is enabled by default on macOS 14.1 and later.
Tested it on my side, and poetry install of one Python project took 44 seconds instead of 2 minutes 53 seconds, so it's nearly a 4x speed increase!
Method 1: numpy.any() to check if the NumPy array is empty in Python

numpy.any() method is used to test whether any array element along a given axis evaluates to True.

Syntax: numpy.any(a, axis = None, out = None, keepdims = <no value>)

Parameters:
array: Input array whose elements need to be checked.
axis: Axis along which array elements are evaluated.
out: Output array having the same dimensions as Input array
keepdims: If this is set to True, the axes which are reduced are left in the result.

Return Value: A new Boolean array (depending on the ‘out’ parameter)

    import numpy as np
    arr = np.array([])
    flag = not np.any(arr)
    if flag:
        print('Array is empty')
    else:
        print('Array is not empty')

Output: Array is empty

In this example, we have used numpy.any() method to check whether the array is empty or not. As the array is empty, the value of the flag variable becomes True, and so the output ‘Array is empty’ is displayed. The limitation to this function is that it does not work if the array contains the value 0 in it.
This is WRONG.
numpy.any() checks if there is at least one non-zero element in an array.
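A short sketch of why the quoted advice misfires, and of a check that does test emptiness: comparing the array's size attribute to zero. The two arrays are made up for the comparison.

```python
import numpy as np

empty = np.array([])
zeros = np.array([0, 0, 0])

# "not np.any(...)" cannot tell an empty array from an all-zero one.
looks_empty_via_any = (not np.any(empty), not np.any(zeros))

# size counts elements, so it distinguishes the two cases.
is_empty_via_size = (empty.size == 0, zeros.size == 0)

print(looks_empty_via_any)
print(is_empty_via_size)
```

The np.any-based test reports both arrays as "empty", while the size-based test flags only the array that has no elements.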
1. Inserting a new chicken into a ring at some specified location in it, usually first or last.
2. Removing a chicken from a ring.
3. Putting all the chickens of one ring, in order, into another at some specified location in it, usually first or last.
4. Performing some auxiliary operation on each member of a ring in either forward or reverse order.
In simpler terms, Sutherland's thesis is discussing the basic operations of a data structure known as a ring. A ring is a type of list where the elements are connected in a circular manner. The operations he mentions are:
These operations are implemented using macro instructions in the compiler language. A macro instruction is a directive to the compiler which specifies how an input pattern should be mapped to an output pattern.
The thesis also discusses the generation of new elements. Subroutines are used to set up new elements in free spaces in the storage structure. When parts of the drawing are deleted, the registers representing them become free and are placed in a 'FREES' ring. New components are set up at the end of the storage area, while free blocks are allowed to accumulate. A process called 'garbage collection' periodically compacts the storage structure by removing the free blocks and relocating the information above them.
In Python, the ring data structure can be implemented using a doubly linked list. Here's a simple example:
```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
        self.prev = None


class Ring:
    def __init__(self):
        self.head = None

    def append(self, data):
        if not self.head:
            # First node: it is its own neighbour in both directions.
            self.head = Node(data)
            self.head.next = self.head
            self.head.prev = self.head
        else:
            # Link the new node in just before the head, i.e. at the end.
            new_node = Node(data)
            new_node.prev = self.head.prev
            new_node.next = self.head
            self.head.prev.next = new_node
            self.head.prev = new_node

    def display(self):
        if not self.head:
            return  # nothing to print for an empty ring
        temp = self.head
        while True:
            print(temp.data, end=" ")
            temp = temp.next
            if temp == self.head:
                break
```
In this example, the Node class represents an element in the ring, and the Ring class represents the ring itself. The append method is used to add a new element to the ring, and the display method is used to print all elements in the ring.
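The remaining ring operations from the thesis excerpt (inserting first or last, removing a member, and walking the ring in either direction) could look roughly like the sketch below. This is a modern Python illustration with invented method names, not a rendering of Sutherland's macros, and it leaves out splicing one ring into another.

```python
class Node:
    def __init__(self, data):
        self.data = data
        # A lone node points at itself in both directions.
        self.next = self
        self.prev = self

class Ring:
    def __init__(self):
        self.head = None

    def insert_last(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
        else:
            tail = self.head.prev
            tail.next = node
            node.prev = tail
            node.next = self.head
            self.head.prev = node
        return node

    def insert_first(self, data):
        # In a circle, "last" sits right before "first",
        # so inserting last and moving the head is enough.
        node = self.insert_last(data)
        self.head = node
        return node

    def remove(self, node):
        if node.next is node:
            self.head = None          # it was the only member
        else:
            node.prev.next = node.next
            node.next.prev = node.prev
            if self.head is node:
                self.head = node.next
        node.next = node.prev = node  # detach the removed node

    def forward(self):
        node = self.head
        while node is not None:
            yield node.data
            node = node.next
            if node is self.head:
                break

    def backward(self):
        if self.head is None:
            return
        node = tail = self.head.prev
        while True:
            yield node.data
            node = node.prev
            if node is tail:
                break

ring = Ring()
a = ring.insert_last("a")
b = ring.insert_last("b")
c = ring.insert_last("c")
ring.insert_first("z")
before = (list(ring.forward()), list(ring.backward()))
ring.remove(b)
after = list(ring.forward())
print(before, after)
```

Because every node knows both neighbours, removal needs no search through the ring, which is the property the ring structure is built around.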
RING STRUCTURE
Ivan writes about a concept called "Ring Structure". This is a way of organizing and linking different elements or components in a system.
In simpler terms, imagine you have a bunch of different objects (like points and lines in a drawing program like Sketchpad). You want to keep track of how these objects are related to each other. For example, you might want to know all the lines that end at a particular point.
To do this, Ivan uses a "ring structure". Each object has a "string of pointers" - basically a list of references to other objects. This list is circular - the last item in the list points back to the first item. This makes it easy to move forwards and backwards through the list.
Each object has two "registers" or slots for keeping track of these relationships. One slot is for the object itself, and the other is for the list of related objects.
Ivan uses the terms "hen" and "chicken" to describe these slots. The "hen" is the object itself, and the "chicken" is the list of related objects.
Here's a simple Python code example to illustrate this concept:
```python
class Point:
    def __init__(self):
        self.hen = self
        self.chickens = []


class Line:
    def __init__(self, point1, point2):
        self.hen = self
        self.chickens = [point1, point2]
        point1.chickens.append(self)
        point2.chickens.append(self)
```
In this example, a Point object has a hen that refers to itself and a list of chickens that will contain any Line objects that end at this point. When a Line is created, it adds itself to the chickens list of its end points.
The "ring structure" is a way to organize and link different elements in a system, making it easier to find and update related elements.
MNEMONICS AND CONVENTIONS
Mnemonics for Registers: Instead of remembering numerical indices, we use human-readable keys.
    point = {'TYPE': 'Point', 'PVAL_X': 5, 'PVAL_Y': 10}
Flexibility: If we want to change the internal structure, we can easily do so by changing the keys.
```python
point = {'TYPE': 'Point', 'X_COORD': 5, 'PVAL_Y': 10}
```
Conventions:
    line = {'TYPE': 'Line', 'START': 'point1', 'END': 'point2', 'LENGTH': 7.2}
Pointers and Topology: We can use pointers (references) to other elements to establish relationships.
    point1 = {'TYPE': 'Point', 'PVAL_X': 1, 'PVAL_Y': 1}
    point2 = {'TYPE': 'Point', 'PVAL_X': 4, 'PVAL_Y': 5}
    line = {'TYPE': 'Line', 'START': point1, 'END': point2}
Relocation: If we move point1 to a new variable, we update the pointer in line.
    new_point1 = point1  # Relocating point1 to new_point1
    line['START'] = new_point1  # Updating the pointer
Segregation of Data: Numerical data is at the end, so if we need to move elements, the numerical data remains untouched.
Configuring PyCharm: Open PyCharm with ‘Pytest Web Framework’ Press Ctrl+Alt+S > Project Click ‘Project Interpreter’ Select Python 3.6 Click ‘OK’ Go to write over 100500 automated tests!!!
This section provides a step-by-step guide on setting up PyCharm for automated testing using the 'Pytest Web Framework'.
cat requirements.txt | grep -E '^[^# ]' | cut -d= -f1 | xargs -n 1 poetry add
Use poetry init to create a sample pyproject.toml, and then trigger this line to export requirements.txt into a pyproject.toml
As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range of elements along an axis. It can be helpful to read the expression arr2d[:2] as "select the first two rows of arr2d."
Slices follow a logic similar to indexing in NumPy arrays. array[:2] selects a range of elements along a single axis, but array[:2, 1:] does it along two axes.
You might want to suppress only ValueError, since a TypeError (the input was not a string or numeric value) might indicate a legitimate bug in your program. To do that, write the exception type after except:

    def attempt_float(x):
        try:
            return float(x)
        except ValueError:
            return x
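To see the effect of catching only ValueError, here is a quick check with made-up inputs: a non-numeric string is returned unchanged, while a tuple still raises TypeError.

```python
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        # Only "looks like text, but not a number" is suppressed.
        return x

converted = attempt_float("1.2345")
unchanged = attempt_float("something")

try:
    attempt_float((1, 2))
    type_error_raised = False
except TypeError:
    # float() rejects tuples with TypeError, which is not caught above.
    type_error_raised = True

print(converted, unchanged, type_error_raised)
```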
Since generators produce output one element at a time versus an entire list all at once, it can help your program use less memory.
It is not until you request elements from the generator that it begins executing its code:
A generator is a function-like iterator object.
A generator is a convenient way, similar to writing a normal function, to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used. To create a generator, use the yield keyword instead of return in a function:
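A minimal sketch of such a generator (the squares function and the log list are invented for this example). The log makes the lazy behavior visible: calling the function runs none of its body until the first element is requested.

```python
log = []

def squares(n):
    log.append("started")      # runs only once iteration begins
    for i in range(1, n + 1):
        yield i ** 2           # pause here and hand back one value

gen = squares(3)
log_before_iterating = list(log)   # body has not run yet

first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # resumes until the generator is exhausted

print(log_before_iterating, first, rest, log)
```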
In this case, return_value would be a 3-tuple with the three returned variables. A potentially attractive alternative to returning multiple values like before might be to return a dictionary instead:
Returning multiple values in Python is expressed as a tuple by default and each value is correspondingly assigned. Optionally, you can return a dictionary if specified.
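Both styles side by side, as a small sketch with placeholder values:

```python
def as_tuple():
    a, b, c = 5, 6, 7
    return a, b, c          # packed into a single tuple

def as_dict():
    a, b, c = 5, 6, 7
    return {"a": a, "b": b, "c": c}

return_value = as_tuple()   # one 3-tuple
x, y, z = as_tuple()        # or unpacked into three names
named = as_dict()           # values looked up by key instead of position

print(return_value, (x, y, z), named["b"])
```

The tuple form is shorter, while the dictionary form spares callers from remembering the order of the returned values.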
The for parts of the list comprehension are arranged according to the order of nesting, and any filter condition is put at the end as before.
Nested list comprehensions follow the same logic as nested for loops. The difference is that in list comprehensions the filtered variable is mentioned twice.
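The correspondence is easiest to see with the two forms next to each other. In this sketch (the word lists are made up), both versions collect the palindromes from a list of lists:

```python
groups = [["anna", "bob", "hannah"], ["otto", "eve", "carl"]]

# Nested for loops: outer loop first, inner loop second, filter last.
looped = []
for names in groups:
    for name in names:
        if name == name[::-1]:
            looped.append(name)

# Same order in the comprehension: the for parts follow the nesting,
# the filter goes at the end, and "name" appears both as the output
# expression and as the loop variable.
comprehended = [
    name
    for names in groups
    for name in names
    if name == name[::-1]
]

print(comprehended)
```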
python -m calendar
So surprised that you can output a calendar view using Python
python -m site, which outputs useful information about your installation
python -m site <--- see useful information about your Python installation
```python
from flask import Flask, request
from collections import defaultdict
import re
import random

GREEN = "🟩"
YELLOW = "🟨"
WHITE = "⬜"


def get_answers():
    with open("allowed_answers.txt") as f:
        answers = set(l for l in f.read().splitlines() if l)
    return answers


def get_guesses():
    guesses = get_answers()
    with open("allowed_guesses.txt") as f:
        for l in f.read().splitlines():
            if l:
                guesses.add(l)
    return guesses


app = Flask(__name__, static_folder="static")
app.answers = get_answers()
app.guesses = get_guesses()
word = random.choice(list(app.answers))
print(f"The word is {word}")


def with_header(content):
    return f"""
<html>
<head>
<link rel="search" type="application/opensearchdescription+xml" title="searchGame" href="http://searchgame:5000/static/opensearch.xml" />
</head>
<body>
{content}
</body></html>"""


@app.route("/")
def home():
    return with_header("Right click on the address bar to install the search engine.")


@app.route("/search")
def search():
    return with_header(f"Content: {request.args.get('q')}")


def to_result(guess, answer):
    chars = [WHITE] * 5
    count = defaultdict(int)
    # First pass: mark exact matches and count the unmatched answer letters
    for idx, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            chars[idx] = GREEN
        else:
            count[a] += 1

    # Second pass: mark letters that are present but in the wrong position
    for idx, g in enumerate(guess):
        if g in count and count[g] > 0 and chars[idx] == WHITE:
            chars[idx] = YELLOW
            count[g] -= 1

    return "".join(chars)


def maybe_error(guess):
    if len(guess) < 5:
        return "less than 5 characters"
    if len(guess) > 5:
        return "greater than 5 characters"
    if guess not in app.guesses:
        return "not in wordlist"
    return None


@app.route("/game")
def game():
    # Default to an empty string so a missing "q" parameter does not crash re.split
    query = request.args.get("q", "")
    guesses = [x for x in re.split("[. ]", query) if x]
    response = []
    if not guesses:
        response.append("Enter 5-letter guesses separated by spaces")
    else:
        most_recent = guesses[-1]
        # Don't show "too short" error for most recent guess
        if len(most_recent) < 5:
            guesses = guesses[:-1]
        if not guesses:
            response.append("Enter a wordle guess")
    for guess in guesses[::-1]:
        error = maybe_error(guess)
        if error is None:
            result = to_result(guess, word)
            s = f"{guess} | {result}"
            if result == GREEN * 5:
                s = f"{s} | CORRECT!"
            response.append(s)
        else:
            response.append(f"{guess} | ERROR: {error}")

    return [query, response]
```
o con
Now he wants to go and connect it with Axios
e
Now he has moved on to Vue
requests
It actually determines which protocol, which IP or domain name, and which port you can answer requests from. It is a very good idea to specify that only the IP and port you set in the front end may be used.
Flask-CORS
Now he has just moved on to Flask-origin. How interesting.
pyth
First he creates a virtual environment
For a new project, I’d just immediately start with Ruff; for existing projects, I would strongly recommend trying it as soon as you start getting annoyed about how long linting is taking in CI (or even worse, on your computer).
Recommendation for when to use Ruff over PyLint or Flake8
Python essentially doesn't have private methods, let alone protected ones, and it doesn't turn out to be that big a deal in practice.
```python
import re

# InvalidUserId is an exception class assumed to be defined elsewhere in the codebase.


def split_user(userid):
    """
    Return the user and domain parts from the given user id as a dict.

    For example if userid is u'acct:seanh@hypothes.is' then return
    {'username': u'seanh', 'domain': u'hypothes.is'}

    :raises InvalidUserId: if the given userid isn't a valid userid

    """
    match = re.match(r"^acct:([^@]+)@(.*)$", userid)
    if match:
        return {"username": match.groups()[0], "domain": match.groups()[1]}
    raise InvalidUserId(userid)
```
```html
<body>
  <div>
    {% for chat in chats %}
      <div>{{ chat.contents }}</div>
      <button id="{{ chat.id }}" onClick="getId(id)">
        print this chat_id out
      </button>
    {% endfor %}
  </div>
  ...
  <script>
    function getId(id) {
      console.log(id)
    }
  </script>
</body>
```
Examples of frontends include: pip, build, poetry, hatch, pdm, flit. Examples of backends include: setuptools (>=61), poetry-core, hatchling, pdm-backend, flit-core.
Examples of Python build frontends and backends
pyproject.toml-based builds are the future, and they promote better practices for reliable package builds and installs. You should prefer to use them!
setup.py is considered a "legacy" functionality these days
Did you say setuptools? Yes! You may be familiar with setuptools as the thing that used your setup.py files to build packages. Setuptools now also fully supports pyproject.toml-based builds since version 61.0.0. You can do everything in pyproject.toml and no longer need setup.py or setup.cfg.
setuptools can now utilize pyproject.toml
Cropping pages.
PyPDF4 is compared with 6 other Python libraries for manipulating, creating and annotating PDF files
With this dataclass, I have an explicit description of what the function returns.
Dataclasses make it much clearer what a function returns than tuples or dictionaries do
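A minimal sketch of the idea. The `MinMax` class and the `min_max` function are hypothetical names chosen for illustration, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class MinMax:
    # The fields document exactly what the function hands back.
    minimum: float
    maximum: float


def min_max(values) -> MinMax:
    return MinMax(minimum=min(values), maximum=max(values))


result = min_max([3.0, 1.5, 4.2])
print(result.minimum, result.maximum)
```

Compared with a bare tuple, the caller reads `result.minimum` instead of remembering that index 0 is the minimum.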
Host machine: docker run -it -p 8888:8888 image:version
Inside the container: jupyter notebook --ip 0.0.0.0 --no-browser --allow-root
On the host machine, open this URL: localhost:8888/tree
3 ways of running jupyter notebook in a container
Python built-in function next()
we are all beginners
How can I add, subtract, and compare binary numbers in Python without converting to decimal?
I think the requirements of this were not spelled out well. After reading this over a couple of times, I think the problem should be…
"Add, subtract, and compare binary numbers in Python as strings, without converting them to decimal."
I'll take on that problem sometime when I get free time!
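One possible sketch of the restated problem, working digit by digit on the strings. This is only an illustration under assumptions: inputs contain just "0" and "1", the numbers are unsigned, and subtraction requires the first operand to be at least as large as the second. All function names are made up for this example.

```python
def compare_binary(a: str, b: str) -> int:
    """Return -1, 0 or 1 depending on whether a < b, a == b or a > b."""
    a = a.lstrip("0") or "0"
    b = b.lstrip("0") or "0"
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    # Equal length: string order matches numeric order.
    return (a > b) - (a < b)


def add_binary(a: str, b: str) -> str:
    """Schoolbook addition from the rightmost digit, carrying as needed."""
    digits, carry = [], 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        bit_a = 1 if i >= 0 and a[i] == "1" else 0
        bit_b = 1 if j >= 0 and b[j] == "1" else 0
        total = bit_a + bit_b + carry
        digits.append("1" if total % 2 else "0")
        carry = total // 2
        i, j = i - 1, j - 1
    return "".join(reversed(digits)).lstrip("0") or "0"


def subtract_binary(a: str, b: str) -> str:
    """Compute a - b with borrowing; assumes a >= b."""
    if compare_binary(a, b) < 0:
        raise ValueError("this sketch only handles a >= b")
    digits, borrow = [], 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0:
        bit_a = 1 if a[i] == "1" else 0
        bit_b = 1 if j >= 0 and b[j] == "1" else 0
        diff = bit_a - bit_b - borrow
        borrow = 1 if diff < 0 else 0
        digits.append(str(diff + 2 * borrow))
        i, j = i - 1, j - 1
    return "".join(reversed(digits)).lstrip("0") or "0"
```

For example, `add_binary("1011", "110")` works through the carries and returns `"10001"`, never calling `int()` on the whole string.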
'handlers': { 'console': { 'level': 'INFO', 'class': 'logging.StreamHandler', 'stream': sys.stdout, 'formatter': 'verbose' }, },
It's as simple as adding "sys.stdout" to the "stream" attribute.
16 August 2011
This is a pretty old article, from 2011. Note that it refers to Python 2.6 and 2.7. Today, Python is up to version 3.12.
NICE tables comparing build frameworks, front-ends & backends & others, discovered from this SO.
Python入门 类class 基础篇
class starter
Experienced talk on python project configurations
If you install a package with pip’s --user option, all its files will be installed in the .local directory of the current user’s home directory.
One of the recommendations for Python multi-stage Docker builds. Thanks to pip install --user, the packages won't be spread across 3 different paths.
Honestly, all the activation scripts do are:
See the 4 steps below to understand what activating an environment in Python really does
Using pex in combination with S3 for storing the pex files, we built a system where the fast path avoids the overhead of building and launching Docker images. Our system works like this: when you commit code to GitHub, the GitHub action either does a full build or a fast build depending on if your dependencies have changed since the previous deploy. We keep track of the set of dependencies specified in setup.py and requirements.txt. For a full build, we build your project dependencies into a deps.pex file and your code into a source.pex file. Both are uploaded to Dagster cloud. For a fast build we only build and upload the source.pex file. In Dagster Cloud, we may reuse an existing container or provision a new container as the code server. We download the deps.pex and source.pex files onto this code server and use them to run your code in an isolated environment.
Fast vs full deployments
Well, in short, with iterators, the flow of information is one-way only. When you have an iterator, all you can really do is call the __next__ method to get the very next value to be yielded. In contrast, the flow of information with generators is bidirectional: you can send information back into the generator via the send method.
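A small sketch of that two-way flow. The `running_total` generator is a made-up example, not taken from the source.

```python
def running_total():
    total = 0
    while True:
        # yield hands `total` out; send() delivers a value back in here.
        value = yield total
        if value is not None:
            total += value


gen = running_total()
first = next(gen)        # prime the generator: runs to the first yield
after_5 = gen.send(5)    # resumes with value=5, then yields the new total
after_10 = gen.send(10)  # resumes with value=10, then yields again
```

The generator must be advanced to its first `yield` (here with `next`) before `send` can pass a non-None value in.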
yield stuff)
send method)

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-id>",
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token",
    }

    # Optionally, you can add <directory-name> to the source URI of your mount point.
    dbutils.fs.mount(
        source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
        mount_point="/mnt/<mount-name>",
        extra_configs=configs,
    )
So why aren't more people using Nim? I don't know! It's the closest thing to a perfect language that I've used by far.
Nim sounds like the most ideal language when compared to Python, Rust, Julia, C#, Swift, C
depending on how smart the framework is, you might find yourself installing Conda packages over and over again on every run. This is inefficient, even when using a faster installer like Mamba.
there’s the bootstrapping problem: depending on the framework you’re using, you might need to install Conda and the framework driver before you can get anything going. A Docker image would come prepackaged with both, in addition to your code and its dependencies. So even if your framework supports Conda directly, you might want to use Docker anyway.
The only thing that will depend on the host operating system is glibc, pretty much everything else will be packaged by Conda. So a pinned environment.yml or conda-lock.yml file is a reasonable alternative to a Docker image as far as having consistent dependencies.
Conda can be a sufficient alternative to Docker
To summarize, for the use case of Python development environments, here’s how you might choose alternatives to Docker:
(see table below)
Conda packages everything but the standard C library, from C libraries to the Python interpreter to command-line tools to compilers.
When you call 'foo' in Ruby, what you're actually doing is sending a message to its owner: "please call your method 'foo'". You just can't get a direct hold on functions in Ruby in the way you can in Python; they're slippery and elusive. You can only see them as though shadows on a cave wall; you can only reference them through strings/symbols that happen to be their name. Try and think of every method call 'object.foo(args)' you do in Ruby as the equivalent of this in Python: 'object.__getattribute__('foo')(args)'.
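On the Python side of that comparison, the lookup and the call are separate steps, as this sketch shows. The `Greeter` class is a hypothetical example.

```python
class Greeter:
    def foo(self, name):
        return f"hello, {name}"


obj = Greeter()

direct = obj.foo("world")                          # ordinary method call
via_lookup = obj.__getattribute__("foo")("world")  # explicit lookup by name, then call
via_getattr = getattr(obj, "foo")("world")         # the usual spelling of the same lookup

bound = obj.foo  # a bound method is an object you can hold and pass around
later = bound("again")
```

All three call styles reach the same method; `bound` shows the "direct hold" on a function that the quote says Ruby lacks.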
    import math

    def document(f):
        def wrap(x):
            print("I am going to square", x)
            f(x)
        return wrap

    @document
    def square(x):
        print(math.pow(x, 2))

    square(5)
Post involved with finding a good jupyter notebook visualization lib for DAGs.
What object-oriented means
What does object-oriented mean? Objects are models of things that can do certain things and have certain things done to them. Formally, an object is a collection of data and associated behaviors.
The difference between object-oriented design and object-oriented programming
What do design and programming mean in OOP?