79 Matching Annotations
  1. Jul 2025
    1. In today’s fast-moving, AI-powered era, autonomous agents are playing a bigger role than ever. They are helping businesses run smoother and making decisions affecting millions of lives every day. While these systems are designed to make our lives easier and unlock new opportunities, we can’t get carried away—we need to implement proper AI Agent Evaluation frameworks and best practices to ensure these systems actually work as intended and follow ethical AI principles.

      Explore the key metrics, tools, and frameworks used for AI agent evaluation. Learn how to assess performance, reliability, and efficiency of AI agents in real-world scenarios.
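
      The annotated article gives no formulas here. This is only a minimal sketch of how performance, reliability and efficiency could be summarized from logged agent runs. The `AgentRun` record and its fields are assumptions chosen for illustration. So are the specific metrics: success rate, consistency across repeated attempts on the same task, mean latency, and tokens per success.

```python
from collections import defaultdict
from dataclasses import dataclass
import math


@dataclass
class AgentRun:
    """Hypothetical log record: one attempt by an agent at one task."""
    task_id: str
    succeeded: bool
    latency_s: float
    tokens: int


def evaluate(runs: list[AgentRun]) -> dict[str, float]:
    """Summarize logged runs into simple performance, reliability and efficiency metrics."""
    if not runs:
        raise ValueError("need at least one run")
    by_task: dict[str, list[AgentRun]] = defaultdict(list)
    for run in runs:
        by_task[run.task_id].append(run)

    successes = sum(1 for r in runs if r.succeeded)
    # Reliability: share of tasks the agent solved on every repeated attempt.
    always_solved = sum(1 for rs in by_task.values() if all(r.succeeded for r in rs))
    return {
        "success_rate": successes / len(runs),                          # performance
        "consistency": always_solved / len(by_task),                    # reliability
        "mean_latency_s": sum(r.latency_s for r in runs) / len(runs),   # efficiency
        # Efficiency: total tokens spent per successful attempt.
        "tokens_per_success": sum(r.tokens for r in runs) / successes if successes else math.inf,
    }
```

      Running several attempts per task makes the gap between `success_rate` and `consistency` visible. An agent that solves a task only some of the time counts toward the first metric but not the second.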

  2. Jun 2025
  3. Jan 2025
    1. most financial models described in the literature come from the corporate world, putting the need to maximize revenue in tension with higher education’s mission-based goals that are harder to quantify

      Interesting: perhaps the overall impacts of education are hard to measure against the brass tacks of commercial endeavors. There are likely studies in economics that could help weigh education's less monetary outcomes against commercial activities; the following paragraph suggests some avenues for exploration.

  4. Nov 2024
  5. Jun 2024
  6. May 2024
  7. Oct 2023
  8. Sep 2023
    1. The progress and impact of the project will be measured and monitored through the collection of quantitative indicators. The different systems of the project partners as well as ORCID Inc. and ROR will be queried. If possible, indicators for all 10 PID use cases should be measured. These include, for example, the following indicators:
      • Number of registered DataCite DOIs by scientific institutions in Germany.
      • Number of registered DataCite DOIs that have a link to further resources via a related-identifier relationship.
      • Number of ROR implementations at scientific institutions in Germany.
      • Number of GND records that have an ORCID iD or a ROR ID.
      • Etc.

      PID Use Cases
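
      Once the records have been harvested from the partner systems, several of the indicators above reduce to simple counts. This sketch assumes the records are already in memory. The record classes and field names (such as `institution_country`) are hypothetical and are not the DataCite, ROR or GND schemas. No real API is called.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DoiRecord:
    """Hypothetical, simplified view of one registered DataCite DOI."""
    doi: str
    institution_country: str  # assumed field, e.g. "DE"
    related_identifiers: list[str] = field(default_factory=list)


@dataclass
class GndRecord:
    """Hypothetical, simplified view of one GND authority record."""
    gnd_id: str
    orcid: str | None = None
    ror: str | None = None


def pid_indicators(dois: list[DoiRecord], gnd_records: list[GndRecord]) -> dict[str, int]:
    german = [d for d in dois if d.institution_country == "DE"]
    return {
        # Number of registered DataCite DOIs by institutions in Germany
        "datacite_dois_de": len(german),
        # ... of which link to further resources via a related identifier
        "datacite_dois_de_with_related_id": sum(1 for d in german if d.related_identifiers),
        # Number of GND records that carry an ORCID iD or a ROR ID
        "gnd_with_orcid_or_ror": sum(1 for g in gnd_records if g.orcid or g.ror),
    }
```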

  9. May 2023
    1. They are efficiency and effectiveness. For memory systems, an effective system is one that gets the right answer every time no matter how long it takes you. And the efficient system is one that uses the least amount of resources like time, associations, dependent systems, etc. but it may not be that good at providing the correct answers.

      Efficiency and effectiveness measures for specific mnemonic systems may vary from person to person, so each person should evaluate them against their own practice. There may not be a single universally "right" or "correct" practice, but there could be one for each individual, based on their own choices or preferences.

  10. Mar 2023
  11. Jan 2023
  12. Nov 2022
    1. The “Goals, Signals, Metrics” framework is the most important thing I learned from Ciera, while at Google. This is a framework for defining metrics:
      • Define the actual goal you’re trying to accomplish. Opt for a descriptive definition, for example, say “here is a description of what we want to actually accomplish with our work.” Not the vaguer, “I want to measure X.”
      • Define signals. How would you know if you accomplished that goal, if you had infinite knowledge of everything? These are called “signals.” For example, a signal is “how happy are people with my product?” You cannot directly know how happy people are with your product, unless you’re magically all-knowing. However, if having happy customers is one of your goals, then this is the right signal for that.
      • Figure out your metrics. These are proxies for that signal. Metrics are always proxies; there are no perfect metrics.

      Goals, Signals, Metrics framework for defining metrics
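
      The three-level structure can be written down as a small data model that also points out signals no metric covers yet. This is a sketch, and the class names are hypothetical. The example goal, signals and the `satisfaction_score` metric are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    how_measured: str  # a proxy for the signal, never a perfect measure


@dataclass
class Signal:
    question: str  # what you would know if you had perfect knowledge
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    description: str  # descriptive: what we actually want to accomplish
    signals: list[Signal] = field(default_factory=list)

    def unmeasured_signals(self) -> list[Signal]:
        """Signals with no proxy metric yet, i.e. gaps in the measurement plan."""
        return [s for s in self.signals if not s.metrics]


if __name__ == "__main__":
    goal = Goal(
        "People get real value from our product",
        [
            Signal("How happy are people with my product?",
                   [Metric("satisfaction_score", "mean 1-5 rating from an in-app survey")]),
            Signal("Do people keep coming back?"),
        ],
    )
    for s in goal.unmeasured_signals():
        print("no metric yet for:", s.question)
```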

    1. my takeaways

      • leading indicators (Work Item Age) are relevant for unfinished work, while lagging indicators (Cycle Time, Throughput) are relevant for finished items

      • Kanban metrics are of use in the Scrum Events

      • in Sprint Planning the key metric is Throughput, complemented by Work Item Age when planning leftover work from previous Sprints.
      • in the Daily Scrum, Developers concern themselves with WIP and Work Item Age.
      • the Sprint Review revolves around Throughput, complemented by WIP and Cycle Time.
      • in the Sprint Retrospective we focus on Cycle Time, Throughput & WIP, while also looking at Work Item Age.

      Work in Progress

      = the number of work items started but not finished - start and finish are defined by the Scrum Team's Definition of Workflow - an explicit policy that serves as a constraint to help shape the flow of work - historically visualized through the Cumulative Flow Diagram

      Cycle Time

      = time elapsed between when a work item starts and when it finishes - start is when the work item is pulled into the workflow - CT is a lagging indicator visualized in a Cycle Time Scatterplot, from which we can read trends and distributions and spot anomalies - helps us arrive at a Service Level Expectation (= the amount of time within which we expect a work item to finish)

      Throughput

      = number of work items finished per unit of time - an exact count of items, regardless of their size - usually measured at the finish line of the workflow - visualized either in a separate run chart or as the slope of the curves on a Cumulative Flow Diagram - can be read from the Cycle Time Scatterplot as well → it is not velocity!

      Work Item Age

      = time elapsed between the moment the work item was pulled into the workflow (= start) and the current time - compared against Cycle Time, it shows which items are on track and which are late - Work Item Age is the best metric to look at if you want to determine when an item that has already started is going to finish
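
      The four definitions above can be computed from start and finish dates alone. In this sketch, several choices are assumptions and are not prescribed by the notes: the `WorkItem` record, day granularity, the inclusive day count (an item started and finished on the same day takes 1 day), and an SLE taken as the 85th-percentile nearest-rank cycle time.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkItem:
    name: str
    started: date                 # pulled into the workflow
    finished: date | None = None  # None while still in progress


def cycle_time(item: WorkItem) -> int:
    # Days, counted inclusively: started and finished on the same day = 1 (assumed convention).
    return (item.finished - item.started).days + 1


def work_item_age(item: WorkItem, today: date) -> int:
    # Same inclusive convention, measured up to today for an unfinished item.
    return (today - item.started).days + 1


def wip(items: list[WorkItem], on: date) -> int:
    # Started on or before `on` and not yet finished by then.
    return sum(1 for i in items if i.started <= on and (i.finished is None or i.finished > on))


def throughput(items: list[WorkItem], start: date, end: date) -> int:
    # Plain count of items finished in [start, end], regardless of size (not velocity).
    return sum(1 for i in items if i.finished is not None and start <= i.finished <= end)


def service_level_expectation(items: list[WorkItem], percentile: float = 85) -> int:
    # Nearest-rank percentile of historical cycle times.
    times = sorted(cycle_time(i) for i in items if i.finished is not None)
    if not times:
        raise ValueError("no finished items")
    rank = max(1, math.ceil(percentile / 100 * len(times)))
    return times[rank - 1]


def aging_beyond_sle(items: list[WorkItem], today: date, percentile: float = 85) -> list[str]:
    # Unfinished items whose age already exceeds the SLE: likely to finish late.
    sle = service_level_expectation(items, percentile)
    return [i.name for i in items if i.finished is None and work_item_age(i, today) > sle]
```

      `aging_beyond_sle` pairs Work Item Age with Cycle Time, as in the definition above. It flags started items that are already older than most finished items took to complete.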

  13. Aug 2022
  14. Jul 2022
  15. Apr 2022
    1. K-Anonymity, L-Diversity, and T-Closeness: In this section, I will introduce three techniques that can be used to reduce the probability that certain attacks can be performed. The simplest of these methods is k-anonymity, followed by l-diversity, and then followed by t-closeness. Other methods have been proposed to form a sort of alphabet soup, but these are the three most commonly utilized. With each of these, the analysis that must be performed on the dataset becomes increasingly complex and undeniably has implications on the statistical validity of the dataset.

      privacy metrics
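
      Each of the three properties can be measured on a released table once the quasi-identifiers and the sensitive attribute are chosen. This sketch assumes rows are dicts and uses the "distinct" variant of l-diversity. For t-closeness it uses Earth Mover's Distance with equal distance between categorical values, a common choice for categorical attributes. The example rows are invented.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same values on every quasi-identifier."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return list(classes.values())


def k_anonymity(rows, quasi_identifiers):
    """Largest k for which the table is k-anonymous: size of the smallest class."""
    return min(len(c) for c in equivalence_classes(rows, quasi_identifiers))


def l_diversity(rows, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(len({r[sensitive] for r in c}) for c in equivalence_classes(rows, quasi_identifiers))


def t_closeness(rows, quasi_identifiers, sensitive):
    """Largest distance between a class's sensitive-value distribution and the whole table's.

    With equal ground distance between categories, EMD reduces to the
    total variation distance 0.5 * sum |p_i - q_i|.
    """
    overall = Counter(r[sensitive] for r in rows)
    n = len(rows)
    worst = 0.0
    for c in equivalence_classes(rows, quasi_identifiers):
        local = Counter(r[sensitive] for r in c)
        dist = 0.5 * sum(abs(local[v] / len(c) - overall[v] / n) for v in overall)
        worst = max(worst, dist)
    return worst


if __name__ == "__main__":
    rows = [
        {"zip": "130**", "age": "<30", "disease": "flu"},
        {"zip": "130**", "age": "<30", "disease": "cancer"},
        {"zip": "148**", "age": ">=40", "disease": "flu"},
        {"zip": "148**", "age": ">=40", "disease": "flu"},
    ]
    qi = ("zip", "age")
    print(k_anonymity(rows, qi), l_diversity(rows, qi, "disease"), t_closeness(rows, qi, "disease"))
```

      The example table is 2-anonymous but only 1-diverse. Everyone in the second group has the flu, which is exactly the homogeneity problem l-diversity is meant to catch.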

  16. Mar 2022
  17. Nov 2021
    1. Kevin Anderson, if you can talk more about this issue: both you and George, Asad Rehman and so many other climate activists are talking about this issue of wealth. You say per capita is a flawed metric, as most polluting industries have been moved to developing nations, so it's not reflective of the rich nations' emissions. Take all of this on. Yeah, I mean, that's a really key issue. If I focus in here on the UK, obviously the place I know much better: what we've done in the UK is close down a lot of our industry, and then we import the manufactured goods from elsewhere in the world, and then we turn around to those parts of the world and blame them for the emissions in manufacturing the goods that we are enjoying, and that's everything from our electronic goods to parts for our cars to our clothes. The UK has effectively moved to a bar and banking culture and offshored virtually everything else. So when we look at our total amount of emissions, we have to take account of the carbon footprint of our lifestyles, and that does include the emissions associated with the things that we import and export. If you take that into account, you tend to find that most wealthy countries have a much larger carbon footprint than when you just look at the energy they use within their boundaries. And I think it's really key, again, when we think about these issues of equity, that we take into account what's often referred to as a consumption-based accounting method, because it is unfair to be penalizing poor parts of the world for making things to help us have a better quality of life over here. When we do that, the challenges get even more striking in terms of what we have to do, and it also brings out even further the issues of equity, the disparity between the richer parts of the world and the poorer parts of the world. But I also think, on the equity point, it's really worth bringing out that it's not as if everyone in the UK is even. There isn't just one public in the UK; there are multiple publics. Those of us who are the wealthy ones in our own country are responsible for the lion's share of emissions within the UK, and that will be true for the US, for Germany, for Japan, Australia and so on. Within all of our countries there are large swathes of the country who are average and below-average consumers, and for them the response to climate change is very different from that of those of us who are responsible for the lion's share of emissions in our own countries. So I think we have to differentiate not just between countries but even within our countries. My concern there is: who are the people that frame the climate debate? They're the climate scientists and the academics, the entrepreneurs, the business leaders, the journalists, the barristers: they're all people in the very high-emitting category. So we frame the debate, and we never, ever frame it with equity at its core. Regardless of our moral position, the maths tells us that if we are to deliver on the commitments, then equity has to be a key part of our responses. But we never talk about that, because we are in that high-emitting group.

      Kevin points out why a CONSUMPTION-BASED METRIC is more accurate than a territorial PER CAPITA metric: the territorial per-capita figure does not include the carbon emissions embodied in the imported manufactured goods that consumers purchase. It implies that the manufacturer is responsible rather than the consumer, which is an inaccurate moral indication.

      We have also noticed that wealthy and poor people exist in ALL countries of the world, which calls for the more nuanced terminology we employ, based on a Country-Wealth Sector classification matrix, as described here:

      https://medium.com/@gien_SRG/more-nuanced-terminology-for-post-colonialist-inequality-af2f1609635c

      Using this terminology, Monbiot and Anderson are referring to the North-North and South-North classes, the elites of the world, as having the highest personal carbon footprints, whilst the North-South and South-South classes are the victims.
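
      Anderson contrasts two accounting methods, and the difference between them is a simple adjustment for trade. In this sketch the function names and all figures are invented for illustration. They are not data for the UK or any real country.

```python
def consumption_based(territorial_mt: float, imports_mt: float, exports_mt: float) -> float:
    """Consumption-based emissions (Mt CO2): territorial emissions, minus emissions
    embodied in exports, plus emissions embodied in imports."""
    return territorial_mt - exports_mt + imports_mt


def per_capita_t(total_mt: float, population_millions: float) -> float:
    """Tonnes per person: dividing Mt by millions of people cancels the 10^6 factors."""
    return total_mt / population_millions


if __name__ == "__main__":
    # Invented figures for a net-importing country; not real national data.
    territorial, imports, exports, pop = 400.0, 250.0, 100.0, 50.0
    print(per_capita_t(territorial, pop))                                       # 8.0
    print(per_capita_t(consumption_based(territorial, imports, exports), pop))  # 11.0
```

      For a country that has offshored manufacturing, emissions embodied in imports exceed those in exports. The consumption-based per-capita figure is therefore higher than the territorial one.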

  18. Oct 2021
    1. Note also: this incentive is in fact far more hard-headed than any metric of hedonic economism—such as GDP, which is measuring the amount of desire satisfied by the productive sector. At best GDP is a revenue metric. A prudent manager will manage an enterprise to maximize capital and profit, not revenue.

      Also agreed: GDP measures how much, not how well.

  19. Aug 2021
  20. Mar 2021
    1. The urgent argument for turning any company into a software company is the growing availability of data, both inside and outside the enterprise. Specifically, the implications of so-called “big data”—the aggregation and analysis of massive data sets, especially mobile

      Every company is described by a set of data: financial and other operational metrics, alongside message exchanges and paper documents. Whatever else we find that contributes to the simulacrum of an economic narrative will undeniably be constrained by the constitutive forces of its source data.

  21. Feb 2021
  22. Oct 2020
  23. Sep 2020
  24. Jun 2020
  25. May 2020
  26. Apr 2020
  27. Nov 2019
  28. Sep 2019
  29. May 2019
    1. Amanda Matos - Metrics & DevOps - Why should you measure in order to conquer?

      Operations and monitoring are a specific topic among the LPI requirements for the DevOps certification. Amanda will explain how to implement metrics with open-source tools. Stay tuned!

      705.1 IT Operations and Monitoring (weight: 4)

  30. Dec 2018
    1. For many years, academia has relied on citation count as the main way in which we measure impact or importance of research. As a result, citation count is one of the primary metrics used when evaluating researchers. Citation counts also form the basis for other metrics, most notably Clarivate’s Impact Factor as well as the h-index, which respectively evaluate journal quality/prestige and researcher renown.

      The metrics the Academy uses to measure "impact" are regressive.
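
      Both derived metrics mentioned above are simple functions of citation counts, so they inherit the limitations of counting. In this sketch the function names are illustrative. The impact-factor function encodes only the usual two-year ratio, not Clarivate's rules for what counts as a citable item.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def journal_impact_factor(citations_in_year: int, citable_items_prev_two_years: int) -> float:
    """Two-year impact factor for year Y: citations received in Y by items the
    journal published in Y-1 and Y-2, divided by the number of citable items
    it published in Y-1 and Y-2."""
    return citations_in_year / citable_items_prev_two_years
```

      For example, citation counts of [10, 8, 5, 4, 3] give an h-index of 4. Four papers have at least four citations each, but there are not five papers with at least five.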

  31. Nov 2018
    1. We used the following keywords: ‘ontology metrics’, ‘ontology evaluation framework’, and ‘ontology evaluation’. From the results, we selected only papers including primarily structural metrics.

      A similar study on metrics and evaluation of classifications, taxonomies, thesauri and other knowledge organization systems would be interesting!

  32. Nov 2017
    1. We calibrate the model for 6 countries at various stages of economic development: 3 low-income countries (Uganda in 2005, Kenya in 2006, and Mozambique in 2006), and 3 emerging market economies (Malaysia in 2007, Philippines in 2008 and Egypt in 2007).

      Data & Calibration

    2. In the model, agents are heterogeneous: distinguished from each other by wealth and talent. Individuals choose in each period whether to become an entrepreneur or to supply labor for a wage. Workers supply labor to entrepreneurs and are paid the equilibrium wage. Entrepreneurs have access to a technology that uses capital and labor for production. In equilibrium, only talented individuals with a certain level of wealth choose to become entrepreneurs.

      In this model, a heterogeneous population is 'created' and differentiated by talent and wealth. Only people with enough of both become entrepreneurs; everyone else remains a wage earner.
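
      The occupational choice in the quoted model can be illustrated with a single agent's decision. This is a much-simplified sketch, not the paper's model. Labor input is dropped, and the wage and interest rate are fixed rather than determined in equilibrium. The production function z·k^α, the collateral constraint k ≤ λ·wealth, and all parameter values are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    wealth: float
    talent: float


def entrepreneurial_profit(agent: Agent, alpha: float = 0.5, r: float = 0.1, lam: float = 2.0) -> float:
    """Profit z*k**alpha - r*k with capital capped by a collateral constraint k <= lam * wealth."""
    # Unconstrained optimum solves alpha * z * k**(alpha - 1) = r.
    k_star = (alpha * agent.talent / r) ** (1 / (1 - alpha))
    # Profit is concave in k, so the best feasible k is the smaller of k_star and the cap.
    k = min(k_star, lam * agent.wealth)
    return agent.talent * k ** alpha - r * k


def occupation(agent: Agent, wage: float = 1.0, **params: float) -> str:
    """Become an entrepreneur only if profit beats the (here fixed) wage."""
    return "entrepreneur" if entrepreneurial_profit(agent, **params) > wage else "worker"
```

      Wealth matters here only through the borrowing cap. A talented but poor agent cannot reach `k_star`, so their profit falls below the wage. An untalented agent earns little profit however much capital they have. Only agents with enough of both choose to be entrepreneurs.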

    1. The Solow–Swan model augmented with human capital predicts that the income levels of poor countries will tend to catch up with or converge towards the income levels of rich countries if the poor countries have similar savings rates for both physical capital and human capital as a share of output, a process known as conditional convergence.

      Income convergence between poor and rich countries happens conditional on their having similar savings rates for physical and human capital. Otherwise, it might not happen.
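
      Conditional convergence shows up when you iterate a discrete-time version of the human-capital-augmented Solow model (the formulation of Mankiw, Romer and Weil), in units per effective worker. In this sketch the parameter values, the discrete-time approximation, and the function names are assumptions. Two economies with the same savings rates but very different starting capital end at the same output level. An economy with lower savings rates converges to a lower level.

```python
def simulate(k0: float, h0: float, s_k: float, s_h: float, alpha: float = 0.3,
             beta: float = 0.3, n: float = 0.01, g: float = 0.02, delta: float = 0.05,
             periods: int = 2000) -> float:
    """Iterate physical (k) and human (h) capital per effective worker; return final output."""
    k, h = k0, h0
    d = n + g + delta  # effective depreciation: population growth + technology growth + depreciation
    for _ in range(periods):
        y = k ** alpha * h ** beta
        k, h = k + s_k * y - d * k, h + s_h * y - d * h
    return k ** alpha * h ** beta


def steady_state_output(s_k: float, s_h: float, alpha: float = 0.3, beta: float = 0.3,
                        n: float = 0.01, g: float = 0.02, delta: float = 0.05) -> float:
    """Closed-form steady state, where s_k*y = d*k and s_h*y = d*h."""
    d = n + g + delta
    k = (s_k ** (1 - beta) * s_h ** beta / d) ** (1 / (1 - alpha - beta))
    h = (s_k ** alpha * s_h ** (1 - alpha) / d) ** (1 / (1 - alpha - beta))
    return k ** alpha * h ** beta
```

      The steady state depends only on the savings rates and growth parameters, not on the starting point. That is why convergence is "conditional" on the two economies sharing those rates.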

    1. Maybe not an ELI5, but: Moments are expectations of things. E(X) is often called the "first moment," E(X^2) is the "second moment," etc. They can also be more complicated, like E(exp(5x+y)), or whatever. In econometrics, you're trying to figure out something about the underlying distribution of your y's and your x's (and the errors). Often you don't know the shape of the distribution, but you know some moments of the distribution. This is useful because you can't use maximum likelihood estimation unless you make assumptions on the entire distribution. With ordinary least squares, you assume that E(xe) = 0, that is, the errors are uncorrelated with the regressors. You can write e = y - xβ to get a moment condition E(x(y - xβ)) = 0. If you do GMM with this moment condition, you get the regular OLS estimator. If you have endogeneity of some kind, you don't know that E(xe) = 0, but you might have some instruments z such that E(ze) = 0. This gives you the moment condition E(z(y - xβ)) = 0. GMM is nice because it makes relatively weak assumptions compared to other ways of estimating parameters. I hope that helps!

      GMM - I don't get it, at all. What are moments? How are they used? Why are they used? Thanks all!
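
      The answer's two moment conditions can be made concrete for a model with an intercept and one regressor. Setting the sample analogue of E(w(y - a - bx)) = 0 to zero for w = 1 and w = x gives the OLS normal equations. Replacing x by an instrument z gives the simple IV estimator. This is a pure-Python sketch. The simulated data-generating process (true slope 2, with a regressor correlated with the error) and all names are assumptions for illustration. With exactly as many moment conditions as parameters, no GMM weighting matrix is needed.

```python
import random


def gmm_linear(y: list[float], x: list[float], z: list[float]) -> tuple[float, float]:
    """Solve the sample moments sum(w*(y - a - b*x)) = 0 for w in {1, z}.

    With z = x this reproduces OLS; with an instrument z it is the simple IV estimator.
    """
    n = len(y)
    sx, sz, sy = sum(x), sum(z), sum(y)
    szx = sum(zi * xi for zi, xi in zip(z, x))
    szy = sum(zi * yi for zi, yi in zip(z, y))
    # Two linear equations in (a, b):
    #   n*a  + sx*b  = sy
    #   sz*a + szx*b = szy
    det = n * szx - sx * sz
    b = (n * szy - sz * sy) / det
    a = (sy - sx * b) / n
    return a, b


def simulate_endogenous(n: int = 100_000, seed: int = 0):
    """Hypothetical data: y = 1 + 2x + e, where x and e share the shock u."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        zi, ui, vi = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        xi = zi + ui          # regressor depends on the instrument and on u ...
        ei = ui + 0.5 * vi    # ... and so does the error, so E(xe) != 0
        x.append(xi)
        z.append(zi)
        y.append(1.0 + 2.0 * xi + ei)
    return y, x, z
```

      On this simulated data, the OLS slope (z = x) drifts toward about 2.5, because cov(x, e)/var(x) = 1/2 by construction. The IV slope (using z) stays near the true value of 2, because E(ze) = 0 holds.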

  33. Sep 2017
  34. Jul 2017
  35. Feb 2017
    1. (Among the recommendations: Greater emphasis on visuals, greater variety of formats and voices. They also announced that the Times would be introducing an alternative metric to pageviews that would “measure an article’s value to attracting and retaining subscribers.”)

      How would they measure this exactly?

  36. Jan 2016
  37. Jul 2015
    1. I think it is possibly too early to tell.

      As Steven Hill of HEFCE recently suggested, it might be far better for UK institutions to reject these tables: "What if all UK institutions made a stand against global rankings, and stopped using them for promotional purposes? The reputation of the UK’s higher education sector would stand firm, and a really strong signal would be sent to the rest of the world. Not drifting, but steering purposely through the metric tide." http://blog.hefce.ac.uk/2015/07/08/the-metrics-dilemma/

  38. Oct 2014
  39. Sep 2013