2,548 Matching Annotations
  1. Mar 2023
    1. This could be because the size can be misleading: there is the on-disk size, the push/pull payload size, and the sum of the ungzipped tars. The size of the ungzipped tars is often used to represent the size of the image in Docker, but the actual size on disk depends on the graph driver. From the registry perspective, the sum of the gzipped layers matters most because it represents what the registry is storing and what needs to be transferred.

      Docker image size on a local drive will be different
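
      To make the ungzipped-tar versus gzipped-layer distinction concrete, here is a minimal sketch using only the Python standard library. It packs some made-up file contents into an in-memory tar (standing in for one image layer) and gzips it. The file name and contents are hypothetical, and real layer sizes depend entirely on what a layer contains.

      ```python
      import gzip
      import io
      import tarfile

      def layer_sizes(files: dict) -> tuple:
          """Pack {name: bytes} into an uncompressed tar, then gzip it.

          Returns (ungzipped_tar_bytes, gzipped_bytes).
          """
          buf = io.BytesIO()
          with tarfile.open(fileobj=buf, mode="w") as tar:
              for name, data in files.items():
                  info = tarfile.TarInfo(name=name)
                  info.size = len(data)
                  tar.addfile(info, io.BytesIO(data))
          raw = buf.getvalue()
          return len(raw), len(gzip.compress(raw))

      # Hypothetical layer contents: repetitive text compresses very well
      uncompressed, compressed = layer_sizes({"app/config.txt": b"key=value\n" * 10_000})
      print(f"ungzipped tar: {uncompressed} bytes, gzipped (registry/transfer): {compressed} bytes")
      ```

      The gap between the two numbers is the kind of difference the quote describes; the on-disk size after extraction is a third number again.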

    1. Ultimately, after researching how we can overcome some inconveniences in Kubeflow, we decided to continue using it. Even though the UI could use some improvements in terms of clarity, we didn’t want to give up the advantages of configured CI/CD and containerization, which allowed us to use different environments. Also, for our projects, it is convenient to develop each ML pipeline in separate Git repositories.

      Kubeflow sounds like the most feature-rich solution; its main cons are the UI and the setup process

    2. The Airflow environment must have all the libraries that are imported in all DAGs. Without containerization, all Airflow pipelines are launched within the same environment. This makes it hard to use exotic libraries or conflicting module versions across different projects.

      Main con of Airflow

    3. Prefect is a comparatively new but promising orchestration tool that appeared in 2018. The tool positions itself as a replacement for Airflow, featuring greater flexibility and simplicity. It is an open-source project; however, there is a paid cloud version to track workflows.

      Prefect

    4. An orchestration tool usually doesn’t do the hard work of translating and processing data itself, but tells other systems and frameworks what to do and monitors the status of the execution.

      Responsibility of the orchestration tool
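
      A toy sketch of that split of responsibilities, assuming Python 3.9+ for graphlib: the hypothetical orchestrate() function only decides the order in which tasks run and records each one's status, while the actual work lives in the task callables (in a real tool these would call out to other systems). Task names and the failure are made up for illustration.

      ```python
      from graphlib import TopologicalSorter
      from typing import Callable, Dict, Set

      def orchestrate(tasks: Dict[str, Callable[[], None]],
                      deps: Dict[str, Set[str]]) -> Dict[str, str]:
          """Run each task after its dependencies and return a status per task."""
          graph = {name: deps.get(name, set()) for name in tasks}
          status = {name: "pending" for name in tasks}
          for name in TopologicalSorter(graph).static_order():
              if name not in tasks:
                  continue  # a dependency that names no known task is never run
              # Skip a task if any upstream task did not succeed
              if any(status.get(dep) != "success" for dep in graph[name]):
                  status[name] = "skipped"
                  continue
              try:
                  tasks[name]()  # delegate the actual work
                  status[name] = "success"
              except Exception:
                  status[name] = "failed"
          return status

      def broken_transform() -> None:
          raise RuntimeError("transform step crashed")

      result = orchestrate(
          tasks={"extract": lambda: None, "transform": broken_transform, "load": lambda: None},
          deps={"transform": {"extract"}, "load": {"transform"}},
      )
      print(result)
      ```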

    1. Well, in short, with iterators, the flow of information is one-way only. When you have an iterator, all you can really do is call the __next__ method to get the very next value to be yielded. In contrast, the flow of information with generators is bidirectional: you can send information back into the generator via the send method.
      • Iterator ← one-way communication (can only yield stuff)
      • Generator ← bidirectional communication (can also accept information via the send method)
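
      A minimal sketch of that two-way flow: the hypothetical running_average() generator yields the current average and receives each new number through send(). It has to be primed with next() first, so that it is paused at a yield before send() is called.

      ```python
      from typing import Generator

      def running_average() -> Generator[float, float, None]:
          total = 0.0
          count = 0
          average = 0.0
          while True:
              value = yield average  # receives whatever the caller passes to send()
              total += value
              count += 1
              average = total / count

      avg = running_average()
      next(avg)            # prime: run up to the first yield
      print(avg.send(10))  # 10.0
      print(avg.send(20))  # 15.0
      ```
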
    1. "For this campaign, we surveyed 930 Americans to explore their retirement plans. Among them, 16% were retired, 22% were still working, and 62% were retirees who had returned to work." So, 149 of those surveyed were retired. Of those 149, 25 (1 in 6) are considering returning to work. 13 of those want remote positions.
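
      Checking the arithmetic in the quote, rounding to whole people:

      ```latex
      0.16 \times 930 = 148.8 \approx 149 \text{ retired}, \qquad \frac{149}{6} \approx 24.8 \approx 25 \text{ considering a return to work}
      ```
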
    1. Instagram was founded in 2010. The iPhone 4, the first iPhone with a front-facing camera, was released then too. In 2012 Facebook bought Instagram, and that’s the year that its user base exploded. By 2015, it was becoming normal for 12-year-old girls to spend hours each day taking selfies.

      Main cause of global depression

    1. ServingRuntime - Templates for Pods that can serve one or more particular model formats. There are three "built in" runtimes that cover the out-of-the-box model types; custom runtimes can be defined by creating additional ones.

      ServingRuntime

    1. Each model’s resource overhead is 1 CPU and 1 GB memory. Deploying many models using the current approach will quickly use up a cluster's computing resources. With multi-model serving, these models can be loaded in one InferenceService, and each model's average overhead is then 0.1 CPU and 0.1 GB memory.

      If I am not mistaken, the multi-model approach reduces the per-model overhead by 90% in this case
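
      A back-of-the-envelope check of that 90% figure, using the per-model numbers from the quote (1 CPU / 1 GB standalone versus 0.1 CPU / 0.1 GB in a multi-model InferenceService). The model count of 100 is an arbitrary example.

      ```python
      def total_overhead(num_models: int, cpu_per_model: float, mem_gb_per_model: float) -> tuple:
          """Total (CPU, memory in GB) reserved for num_models models."""
          return num_models * cpu_per_model, num_models * mem_gb_per_model

      num_models = 100  # hypothetical fleet size
      single_cpu, single_mem = total_overhead(num_models, 1.0, 1.0)
      multi_cpu, multi_mem = total_overhead(num_models, 0.1, 0.1)
      reduction = 1 - multi_cpu / single_cpu
      print(f"{single_cpu:.0f} CPU / {single_mem:.0f} GB vs "
            f"{multi_cpu:.0f} CPU / {multi_mem:.0f} GB ({reduction:.0%} less)")
      ```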

    2. While you get the benefit of better inference accuracy and data privacy by building models for each use case, it is more challenging to deploy thousands to hundreds of thousands of models on a Kubernetes cluster.

      With more separation comes the problem of distribution

    1. there’s the bootstrapping problem: depending on the framework you’re using, you might need to install Conda and the framework driver before you can get anything going. A Docker image would come prepackaged with both, in addition to your code and its dependencies. So even if your framework supports Conda directly, you might want to use Docker anyway.
    2. The only thing that will depend on the host operating system is glibc, pretty much everything else will be packaged by Conda. So a pinned environment.yml or conda-lock.yml file is a reasonable alternative to a Docker image as far as having consistent dependencies.

      Conda can be a sufficient alternative to Docker

    1. In this example, we’ve defined an API endpoint called /predict_image that accepts a file upload using FastAPI's UploadFile type. When a client sends an image file to this endpoint, the file is read and its contents are passed to a preprocessing function that prepares the image for input into the model. Once the image has been preprocessed, the model can make a prediction on it, and the result can be returned to the client as a JSON response.

      The example preceding this passage in the article shows how to upload an image to an API endpoint with FastAPI.

      The example that follows it is a bit more complex.

    2. For example, if you are using TensorFlow, you might save your model as a .h5 file using the Keras API. If you are using PyTorch, you might save your model as a .pt file using the torch.save() function. By saving your model as a file, you can easily load it into a deployment environment (such as FastAPI) and use it to make predictions on new images.
  2. Feb 2023
    1. The set -x command is used to turn on debugging in a shell script and can also be used to test bash aliases. When set -x is used, the command and its arguments are printed to the standard error stream before the command is executed. This can be useful for testing aliases because it lets you see exactly what command is running and with what arguments.

      set -x

    2. A much more elegant approach, however, is to add them to an ~/.aliases-like file and then source this file in your respective profile file with source ~/.aliases

      A more elegant way to keep aliases

    1. The way to get new ideas is to notice anomalies: what seems strange, or missing, or broken? You can see anomalies in everyday life (much of standup comedy is based on this), but the best place to look for them is at the frontiers of knowledge. Knowledge grows fractally. From a distance its edges look smooth, but when you learn enough to get close to one, you'll notice it's full of gaps. These gaps will seem obvious; it will seem inexplicable that no one has tried x or wondered about y. In the best case, exploring such gaps yields whole new fractal buds.

      Way to get new ideas

    1. A huge percentage of the data that gets processed is less than 24 hours old. By the time data gets to be a week old, it is probably 20 times less likely to be queried than from the most recent day. After a month, data mostly just sits there.
    2. Customer data sizes followed a power-law distribution. The largest customer had double the storage of the next largest customer, the next largest customer had half of that, etc.
    3. the vast majority of customers had less than a terabyte of data in total data storage. There were, of course, customers with huge amounts of data, but most organizations, even some fairly large enterprises, had moderate data sizes.
    4. The most surprising thing that I learned was that most of the people using “Big Query” don’t really have Big Data. Even the ones who do tend to use workloads that only use a small fraction of their dataset sizes.
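
      Taking the quote's "each customer has half the storage of the next larger one" description literally, a quick sketch shows why the largest customer ends up holding roughly half of all stored data. The customer count and the 1 TB starting size are made-up inputs.

      ```python
      def storage_shares(num_customers: int, largest_tb: float = 1.0) -> list:
          """Share of total storage per customer when each holds half of the previous one."""
          sizes = [largest_tb / 2 ** rank for rank in range(num_customers)]
          total = sum(sizes)
          return [size / total for size in sizes]

      shares = storage_shares(20)
      print(f"largest customer: {shares[0]:.1%}, top three: {sum(shares[:3]):.1%}")
      ```
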
  3. Jan 2023
    1. To someone who grew up in an old-money affluent family, a Lamborghini might be a symbol of gaudy egotism; to those who grew up with nothing, the car might serve as the ultimate symbol that you’ve made it.
    1. I heard of a peculiar technique used by a company hiring a senior executive. In the final round of interviews, the candidates are taken to lunch, and during the meal something goes wrong—of course, this is all staged as a kind of test. Maybe someone walks up to the table and creates a scene, or perhaps the food delivered to the table is completely wrong. The purpose is to see how the candidate handles the situation. You fail the test if you over-react (for example, causing a scene yourself) or under-react (e.g., just letting things get out of control with no response).

      Test of handling unexpected problems

    2. Great listeners possess extraordinary skills of awareness and comprehension. They can assess situations with tremendous accuracy, and act in ways that maximize group effectiveness.
    1. We are so conditioned to think that our lives revolve around great moments. But great moments often catch us unawares. When that woman hugged me and said that I had brought her a moment of joy, it was possible to believe that I had been placed on earth for the sole purpose of providing her with that last ride.

      😭

    1. These six are split across three operational meetings (each manager with their direct team, technical spec review, incident review), two developmental meetings (staff engineers, engineering managers) and finally a monthly engineering Q&A to learn what the organization is really thinking about.

      Six core organizational meetings

    2. My weekly team meetings always include my direct reports, and usually include our key partner in the Recruiting, People, and Finance teams. (I’ve also experimented with including engineers

      Ideas for who to include in a weekly team meeting

    1. Python: JSON documents double as valid Python literals with minimal adaptation, and Python supports trailing commas and comments. It has variables and functions, powerful string interpolation, and json.dump built in. A self-contained Python file that prints JSON to stdout goes a long way!

      If you want to template YAML, do it in Nix or Python instead
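
      A minimal sketch of that approach: a self-contained Python file that builds a configuration with a variable, a helper function, f-string interpolation, comments, and trailing commas, then prints it as JSON. The service names, ports, and keys are hypothetical.

      ```python
      import json
      import sys

      REPLICAS = 3  # a plain variable, reused below (comments are allowed here, unlike in JSON)

      def service(name: str, port: int) -> dict:
          """Build one service entry; a function replaces copy-pasted YAML blocks."""
          return {
              "name": name,
              "port": port,
              "replicas": REPLICAS,
              "url": f"http://{name}:{port}",  # string interpolation
          }

      config = {
          "services": [
              service("api", 8080),
              service("worker", 9090),  # trailing commas are fine in Python
          ],
      }

      if __name__ == "__main__":
          json.dump(config, sys.stdout, indent=2)
          print()  # end the output with a newline
      ```

      Running the file and redirecting stdout to a .json file gives plain JSON that any tool can consume.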

    2. always quote all strings. (Indeed, you can tell that somebody is an experienced yaml engineer when they defensively quote all the strings.) We can choose to always use true and false rather than yes and no, and generally stay away from the arcane features.

      Advice for YAML users

    3. I think the main reason that yaml is so prevalent despite its pitfalls, is that for a long time it was the only viable configuration format.

      YAML has strange issues but is popular because it was one of the first options

    1. Especially when compared to a plastic tongue scraper, a stainless steel one is better at balancing Ayurvedic imbalances. Unlike those made with gold, silver, brass, or copper, stainless steel tongue scrapers resist corrosion and are suitable for all mind-body types, or doshas, according to Dr. Deepak Chopra.  

      Stainless steel tongue scrapers seem to be the best

    1. tl;dw (best DevOps tools in 2023)

      1. Low-budget cloud computing: Civo (similar to Scaleway)
      2. Infrastructure and Service Management: Crossplane
      3. App Management (manifests): cdk8s (yes, not Kustomize or Helm)
      4. App Management (Kubernetes operators): a tie between Knative and Crossplane
      5. App Management (managed services): Google Cloud Run
      6. Dev Envs: Okteto (yep, not Gitpod)
      7. CI/CD: GitHub Actions (as it's the simplest to use)
      8. GitOps (CD): Argo CD (wins over Flux due to its adoption rate)
      9. Policy Management: Kyverno (simpler to use than the industry's most powerful tool, OPA/Gatekeeper)
      10. Observability: OpenTelemetry (app instrumentation), VictoriaMetrics (metrics; yes, not Prometheus), Grafana Loki (logs), Grafana Tempo (tracing), Grafana (dashboards), Robusta (alerting), Komodor (troubleshooting)
    1. Manolis Kellis: Origin of Life, Humans, Ideas, Suffering, and Happiness | Lex Fridman Podcast #123

      My summary:

      Biology:

      * Life = energy + self-preservation
      * Neanderthals could be the reason why wolves/dogs live closely with humans. Maybe in past generations, dogs had no choice but to live with humans because they were scared of our power?
      * People evolved from the deep ocean (we're made of about 70% water). We're essentially carrying the sea with us now
      * Dolphins, as mammals, went back into the water
      * RNA invented proteins. Later, RNA and proteins created DNA
      * Life is any kind of self-reinforcement, such as the self-reinforcement of RNA molecules, which leads to the evolution process
      * Europa (a moon of Jupiter) may already host some non-DNA life. Life could exist in its under-ice ocean, perhaps in an environment similar to Earth's deep-ocean hydrothermal vents. It would be fascinating to get to know it

      Life:

      * Don't focus on goals; instead, have a path, to avoid the "rat race" feeling
      * Almost every Hollywood movie has a happy ending. This prepares us really poorly for the bad times in life. We need to read/watch more stories with bad endings
      * Life is about accomplishing things, not about being happy all the time
      * As a parent, don't ask your kid if they're happy but what they're struggling to achieve
      * Most likely, we live on the best planet during the best time as the most beautiful mammals
      * If you understand yourself, you won't seek self-assurance in what other people think of you
      * It's hard to get to know your true self if you always live in the same location/environment and have the same friends, who would like to keep a stable image of you

    1. Face To Face with MIKE SHINODA

      (16:14)

      Mike Shinoda discussing the idea from Stephen King's book "On Writing":

      "You should write every day, even if it's torturous, even if you hate it, you sit down and say: "this is not what I wanna be doing, I'm not in the mood, I've got too many things going on". But you should do it anyway, cause it keeps your creative muscles strong, and eventually, even on the bad day, you can come up with something that's remarkable and surprising that you can use later."

    1. Naive Recursive Approach, O(2^n):

      ```python
      def fib(n):
          if n == 1 or n == 2:  # base case
              return 1
          return fib(n - 1) + fib(n - 2)
      ```

      Dynamic Programming Solution, O(n):

      ```python
      def fib(n):  # assuming n >= 1
          seq = [1] * n  # seq[0] = seq[1] = 1; later entries are overwritten
          for i in range(2, n):
              seq[i] = seq[i - 1] + seq[i - 2]
          return seq[n - 1]
      ```

    1. The tech industry layoffs are basically an instance of social contagion, in which companies imitate what others are doing. If you look for reasons for why companies do layoffs, the reason is that everybody else is doing it.

      The main reason for tech layoffs

    1. In general, the environment variable is too heavy-handed and should be avoided, since it will impact all images you build or run. Given the speed impact, you don’t for example want to run your postgres image with emulation, to no benefit.

      Which options to avoid

    2. The other option is to run x86_64 Docker images on your ARM64 Mac machine, using emulation. Docker is packaged with software that will translate or emulate x86_64 machine code into ARM64 machine code on the fly; it’s slow, but the code will run.

      Another possible solution for M1 users (see the snippets in the article)

    3. If you have a compiler installed in your Docker image and any required native libraries and development headers, you can compile a native package from the source code. Basically, you add a RUN apt-get update && apt-get install -y gcc and iterate until the package compiles successfully.

      Second possible solution for M1 users

    4. In either case, pure Python will Just Work, because it’s interpreted at runtime: there’s no CPU-specific machine code, it’s just text that the Python interpreter knows how to run. The problems start when we start using compiled Python extensions. These are machine code, and therefore you need a version that is specific to your particular CPU instruction set.

      M1 Python issues
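
      To check which side of that line a given module falls on, a small sketch can compare the file a module was loaded from against the interpreter's native-extension suffixes (importlib.machinery.EXTENSION_SUFFIXES, e.g. .so or .pyd). The helper name is hypothetical, and treating modules without a __file__ (built-ins compiled into the interpreter) as native is a simplifying assumption.

      ```python
      import importlib.machinery
      import json
      import platform
      import sys
      from types import ModuleType

      def is_native_module(module: ModuleType) -> bool:
          """True if the module is CPU-specific machine code rather than interpreted .py source."""
          path = getattr(module, "__file__", None)
          if path is None:
              # Assumption: modules without a file are built into the interpreter binary
              return True
          return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))

      print("CPU architecture:", platform.machine())  # e.g. arm64 on Apple Silicon Macs
      print("json is native:", is_native_module(json))
      print("sys is native:", is_native_module(sys))
      ```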

    5. In order to meet its build-once-run-everywhere promise, Docker typically runs on Linux. Since macOS is not Linux, on macOS this is done by running a virtual machine in the background, and then the Docker images run inside the virtual machine. So whether you’re on a Mac, Linux, or Windows, typically you’ll be running Linux Docker images.
    1. Of all the people I know, I can only think of a couple of people who I believe really don’t want to be 50% of the conversation. Topic introduction should also be split pretty evenly.

      Conversations should be split about 50/50; only a couple of people genuinely prefer to talk less

    2. “Nice” “Cool” “Ok” If you get a lot of single word answers, you are not keeping the other person interested. They are trying to prevent you from saying more, either because you habitually talk too much or because the topic isn’t interesting to them.

      Avoid single word answers

    3. I have a friend who has somewhat extreme political views, but he will always say things like, “I believe X, but I bet you believe Y and you always have interesting takes, so I’d love to hear your thoughts”. It’s a great way to disagree in a positive and constructive way, and I always enjoy conversations with him.
    4. It seems that these days everyone is so focused on being right and believes that changing their mind shows weakness, but in reality it is the opposite. Only confident people are able to change positions without affecting their self-image.
    5. Factual questions are good, but questions that deepen the conversation are even better. “What were you thinking when she said that?” “Was that as hard as it sounds?” “How did you learn how to do that?” “What made you decide to go that route?”

      Try to ask questions that deepen the conversation

    6. An ideal conversation is a mix of listening, asking questions, and sharing in a way that allows the other person to politely guide the conversation. You must ask questions so that the other person knows that you are interested in them and what they are saying. The worst conversations are those where both parties are waiting their turn to talk, saying as much as they can before getting interrupted, and then being forced to listen to the other.

      Worst and best conversation types

    7. I could tell a story about building my pinball machine. A crafty technical friend might be very interested in hearing every step, but my mom might not be interested in more than about 10 seconds of detail on it.

      Pay attention to how you share information, and with whom

    8. I asked why she thought that was happening, and she said, “Covid”. The lack of socialization, especially at such a key time in life, had made this incoming class the first one that lacked basic manners and social skills.

      Covid has heavily impacted our social/communication skills

    9. This sort of behavior puts a huge burden on the listener because it makes them responsible for your emotional state. Their options are to deny you the emotional state you want, or to give it to you by lying. This is exhausting and will cause people to limit the amount of social time they spend with you.

      Pay attention to what you're sharing with different people, as not everyone will feel comfortable receiving the same information

    1. I hope to demonstrate how easy model deployment can be using Posit’s open source tools for MLOps. This includes {pins}, {vetiver}, and the {tidymodels} bundle of packages along with the {tidyverse}.

      Consider the following packages while doing MLOps in R:

      - pins
      - vetiver
      - tidymodels
      - tidyverse

    1. Under current law, a business activity is not subject to mandatory registration if its monthly revenue does not exceed 50% of the minimum wage. The minimum wage is currently PLN 3,010, so in 2022 this monthly limit equals PLN 1,505.
    1. Rotating the architect role among different team members carries the benefit of increasing the overall team’s working architectural knowledge. Team members gain a better understanding of all the roles involved in delivery, leading to empathy between team roles, improved intra-team interaction, and a better overall product by diversifying the viewpoints applied to each role.

      On rotating roles

    1. I posted my interview tracker spreadsheet on Twitter under the guise of “being transparent.” The very next day, I was on a phone screen with a recruiter when they said “Yeah, I looked at the interview spreadsheet you posted on Twitter and just based on that I can tell you’re not going to be a good fit here. I just took this call as a favor to <redacted> since they referred you.”
    2. If you had asked me right after I got laid off how long it would take me to get back to work, I would have said three months – including two months of vacation. It took me a year.

      2-3 months may turn into 1 year

    1. I stayed up each night until the problem was released (11pm my time), but I didn’t try to code up the solution right away. Instead, I read the problem description before bed and then thought about how to solve it while falling asleep. I usually woke up every morning with a full sketch of the solution in my head, or something close to it.

      Sleep tactic for solving programming challenges

  4. drew.shoes
    1. Wildcards tend to be labelled as "Swiss army knife", "generalist", or "jack of all trades". Each term fails to describe the full range of value that a Wildcard brings to the table. Wildcards fit best into the chaotic nature of early-stage startups.

      Wildcard people are good at many things but not masters of any single one

  5. Dec 2022
    1. If you know the TBW for an SSD and the total amount of data that has been written to it during its lifetime, it’s simple to estimate the percentage of its lifetime (as TBW) that has been used up, hence the amount remaining.
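
      The estimate is a single division. A minimal sketch, with a made-up 600 TBW rating and 150 TB written (real figures come from the drive's datasheet and its SMART data); capping "used" at 100% is a design choice, since drives can keep working past their rated TBW.

      ```python
      def ssd_endurance(tbw_rating_tb: float, written_tb: float) -> tuple:
          """Return (% of rated write endurance used, % remaining), capped at 100% used."""
          used = min(written_tb / tbw_rating_tb, 1.0) * 100
          return used, 100 - used

      used, remaining = ssd_endurance(tbw_rating_tb=600, written_tb=150)
      print(f"{used:.0f}% used, {remaining:.0f}% remaining")  # 25% used, 75% remaining
      ```
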
    1. “If something is not definitely ‘yes’, it is definitely ‘no’”. This is the leading idea of Greg McKeown’s great book Essentialism: The Disciplined Pursuit of Less
    2. Monastic and bimodal modes are mostly reserved for professions that can manage work without intensive communication with people, like writers, scientists, and researchers. Journalist mode fits best for people who are experienced with deep work and can easily switch into that state. From my experience, the best mode to start deep work with is the rhythmic mode.

      Advice on the four deep work modes

    1. At the end of the day, Copilot is supposed to be a tool to help developers write code faster, while ChatGPT is a general-purpose chatbot. ChatGPT can still streamline the development process, but GitHub Copilot wins hands down when the task is coding-focused!

      GitHub Copilot is better at generating code than ChatGPT

    1. Summary: In people with normal vision (or corrected-to-normal vision), visual performance tends to be better with light mode, whereas some people with cataract and related disorders may perform better with dark mode. On the flip side, long-term reading in light mode may be associated with myopia.

      Dark mode vs light mode

    1. According to an analysis from the Wall Street Journal, the top 1% of Twitch streamers made over 50% of all money paid out by the platform in 2021. Furthermore, just 5% of users had made over $1,000 in the same year. Only 0.06% had made over the U.S. median household income of $67,521. In a survey of 5,000 community members composed of smaller Twitch streamers, Stream Scheme found that 76% were not able to reach Twitch’s $100 minimum payout threshold. Most others were making between $25 and $130 per month on the platform.
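    1. Your changes preserved through git stash are saved in your project’s .git directory; usually, the path is .git/refs/stash

      Location where git stash keeps its reference (.git/refs/stash points to the latest stash; the stashed changes themselves are stored as commit objects)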

    1. Think about your closest friends, and how these friendships happened. What needs are you fulfilling in each other’s lives? Are you happy with this state of affairs, or is something missing? What could be better?

      Exercise for analyzing close friendships

    1. The main reason I soured on data science is that the work felt like it didn’t matter, in multiple senses of the words “didn’t matter”:

      The main reasons why Data Science work feels pointless

    1. if you need to solve a complicated, ill-defined problem, it will almost always help to write about it. Which in turn means that someone who's not good at writing will almost always be at a disadvantage in solving such problems.
    1. The trigger: a GC cycle runs when allocations of container objects (classes, dicts, tuples, lists, etc.) exceed deallocations by 700 or more.

      Trigger for GC runs in Python
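
      The standard gc module exposes this trigger. A short sketch that reads the current thresholds and counters and temporarily raises the first-generation threshold. 700 is the classic CPython default, but newer releases may use a different value, so the code prints it rather than assuming it. The tune_gen0() helper is hypothetical.

      ```python
      import gc

      # (threshold0, threshold1, threshold2); threshold0 is the allocation trigger
      print("thresholds:", gc.get_threshold())
      print("current counts:", gc.get_count())

      def tune_gen0(new_threshold0: int) -> tuple:
          """Set a new first-generation threshold, keep the others, and return the old settings."""
          old = gc.get_threshold()
          gc.set_threshold(new_threshold0, *old[1:])
          return old

      old = tune_gen0(5_000)  # collect less often: fewer pauses, more memory held
      gc.set_threshold(*old)  # restore the previous settings
      ```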

    1. try not to buy .io domains. bcuz they're icky. 🤮

      Advice around .io domains

    1. For sufficiently simple cases, just running a few commands sequentially, with no subshells, conditional logic, or loops, set -euo pipefail is sufficient (and make sure you use shellcheck -o all).

      Advice for when you can use shell scripts

    1. First of all, in Python there are no such things as "files", and I noticed this is the main source of confusion for beginners. If you're inside a directory that contains an __init__.py, it's a directory composed of modules, not files.

      On "files" in Python
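
      A small sketch of that idea: once a directory with an __init__.py is on sys.path, its .py files are imported by dotted module name, not by file path. The package and module names (demo_pkg, util) and the helper are hypothetical, created in a temporary directory just for the demonstration.

      ```python
      import importlib
      import sys
      import tempfile
      from pathlib import Path

      def import_from_new_package(root: Path, package: str, module: str):
          """Create root/package/__init__.py and root/package/module.py, then import by dotted name."""
          pkg_dir = root / package
          pkg_dir.mkdir(parents=True, exist_ok=True)
          (pkg_dir / "__init__.py").write_text("")  # marks the directory as a package
          (pkg_dir / f"{module}.py").write_text("GREETING = 'hello from a module'\n")
          importlib.invalidate_caches()  # let the import system notice the new files
          sys.path.insert(0, str(root))
          try:
              return importlib.import_module(f"{package}.{module}")
          finally:
              sys.path.remove(str(root))

      with tempfile.TemporaryDirectory() as tmp:
          mod = import_from_new_package(Path(tmp), "demo_pkg", "util")
          print(mod.__name__, mod.GREETING)
      ```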

  6. Nov 2022
    1. In summary, terminal input is weird.  It’s weird largely because there’s a character-only pipe sitting between the terminal and shell.  And going one step back, the weirdness comes from the whole historical division of terminal vs. shell, which is based on emulating a physical hardware setup that hasn’t been built since the 80s. 

      Summary of why the terminal input is so weird.

    1. In MLflow 2.0, MLflow Recipes is now a core platform component with several new features, including support for classification models, improved data profiling and hyperparameter tuning capabilities.

      MLflow Recipes in MLflow 2.0

    1. we might need a dictionary subclass, and then we need to access a key that does not exist in that dictionary

      Example of applying __missing__ dunder method:

      ```python
      class DictSubclass(dict):
          def __missing__(self, key):
              print("Hello, world!")

      my_dict = DictSubclass()
      my_dict["this key isn't available"]  # prints: Hello, world!
      ```
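
      A slightly more practical sketch: __missing__ can store and return a default value, which gives behaviour similar to collections.defaultdict(int). The class name is hypothetical. Note that __missing__ is only consulted by d[key]; d.get(key) still returns None for absent keys.

      ```python
      class CountingDict(dict):
          """Missing keys start at 0 and are stored, similar to defaultdict(int)."""

          def __missing__(self, key):
              self[key] = 0
              return 0

      counts = CountingDict()
      for word in "the cat and the hat".split():
          counts[word] += 1  # the first lookup of each word goes through __missing__

      print(counts)  # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
      print(counts.get("dog"))  # None: get() does not call __missing__
      ```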

    2. The table also includes links to the documentation of the dunder method under the emoji 🔗. When available, relevant Pydon'ts are linked under the emoji 🗒️.

      The table in the article lists Python dunder methods

  7. Oct 2022
    1. An MLOps engineer today is either an ML engineer (building ML-specific software) or a DevOps engineer. Nothing special here. Should we call a DevOps engineer who primarily operates ML-fueled software delivery an MLOps engineer? I mean, if you really want, we can, but I don’t think we need a new role here. It is just a DevOps eng.

      Who really is MLOps Engineer ;)

    1. After decades of experience, he knew and understood that the most meaningful conceptual progress he made on problems was always away from his computer: on a run, in the shower, lying in bed at night. That’s where the insight came. And yet, even after all these years, he still felt a strange obligation to be at his computer because that’s too often our mental image of “working”.
    2. Work at MIT found that brainstorming—where a bunch of people put their heads together to try to come up with innovative solutions—generally “reduced creativity due to the tendency to incrementally modify known successful designs rather than explore radically different and potentially superior ones.”

      The "bad" side of brainstorming

    1. It's like paying a quarter of your house's value for earthquake insurance when you don't live anywhere near a fault line.

      What paying for cloud in some scenarios really is

    2. The cloud excels at two ends of the spectrum, where only one end was ever relevant for us. The first end is when your application is so simple and low traffic that you really do save on complexity by starting with fully managed services.

      1st great use of cloud services