Data can be manipulated
data can be corrupted in the DB
theft of or unauthorized access
malicious human accesses
cannot disagree
the good old political correctness stuff
make our ideas more palatable
thus a hedging phrase
professional interactions
"Can I trust you with the task?"
Ritual
like: ack'ing one's existence; closer to hand-shaking than to conversing
with someone we see repeatedly
main diff. w/ lv. 1 -- you know each other
never actually left the company
Tax??
picture Steve Ballmer then
LOL
materialization
A fancier word for "caching".
a DAG
More specifically, "multiple inputs".
A DAG itself doesn't necessitate buffering; it's the need for re-using certain pieces of data that motivated the introduction of buffers in the previous paragraph.
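To make the materialization point concrete (toy node names, my own sketch, not from the article): caching a node's output lets a DAG node that feeds multiple consumers be computed once instead of once per consumer.

```rust
use std::collections::HashMap;

// A toy DAG: each node is computed from its inputs; `memo` materializes
// (caches) results so a node shared by several consumers runs only once.
fn eval(node: &str, memo: &mut HashMap<String, i64>, calls: &mut u32) -> i64 {
    if let Some(&v) = memo.get(node) {
        return v; // reuse the materialized value
    }
    *calls += 1;
    let v = match node {
        "source" => 2,
        "double" => eval("source", memo, calls) * 2,
        "square" => eval("source", memo, calls) * eval("source", memo, calls),
        // "sink" has multiple inputs -- the case where buffering pays off
        "sink" => eval("double", memo, calls) + eval("square", memo, calls),
        _ => panic!("unknown node"),
    };
    memo.insert(node.to_string(), v);
    v
}

fn main() {
    let mut memo = HashMap::new();
    let mut calls = 0;
    let result = eval("sink", &mut memo, &mut calls);
    // "source" is evaluated once even though three nodes read it
    println!("{result} {calls}"); // prints "8 4"
}
```

Without the memo, "source" would be recomputed on every path that reaches it; the cache is exactly the buffer the paragraph motivates.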
ChatGPT
Up to this point, it is talking about InstructGPT.
mix in a small fraction of the original data
i.e., the "academic dataset" that "people care about". This is a trade-off.
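A sketch of what "mix in a small fraction of the original data" could look like at the data-loader level (the ratio, dataset names, and RNG are all my own invention, not from the paper): each batch slot draws from the original pretraining set with a small probability, otherwise from the fine-tuning set.

```rust
// Illustrative only: a batch sampler that mixes a small fraction of the
// original pretraining data into fine-tuning batches, to limit regressions
// on the original distribution (the trade-off in the note above).
fn build_batch<'a>(
    finetune: &'a [&'a str],
    pretrain: &'a [&'a str],
    mix_fraction: f64, // e.g. 0.1 => roughly 10% original data
    batch_size: usize,
    seed: &mut u64,
) -> Vec<&'a str> {
    (0..batch_size)
        .map(|i| {
            // tiny deterministic LCG, a stand-in for a real RNG
            *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
            let r = (*seed >> 33) as f64 / (1u64 << 31) as f64; // in [0, 1)
            if r < mix_fraction {
                pretrain[i % pretrain.len()] // original data
            } else {
                finetune[i % finetune.len()] // fine-tuning data
            }
        })
        .collect()
}

fn main() {
    let mut seed = 42;
    let batch = build_batch(&["ft_a", "ft_b"], &["pt_a"], 0.1, 8, &mut seed);
    println!("{batch:?}");
}
```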
fine-tuning language models with humans in the loop
An application of RLHF.
1.3B InstructGPT model over outputs from a 175B GPT-3 model
1.3B can outperform 175B on one particular task, which is instruction-following. Don't mistake this for "1.3B is magically always better than 175B": the 1.3B model can do this thanks to fine-tuning.
The resulting InstructGPT models are much better at following instructions than GPT-3. They also make up facts less often, and show small decreases in toxic output generation.
Really, that's all thanks to the human feedback provided in RLHF. Since it adds information to the model, that feedback should be considered an auxiliary dataset.
our labelers provide demonstrations of the desired model behavior, and rank several outputs from our models.
That's RLHF.
visual cues
Performance issue solved with UI improvement.
the normal reward function
i.e., the in-game score.
The AI gradually builds a model of the goal of the task by finding the reward function that best explains the human’s judgments.
so there's no need for humans to write the reward function itself.
ask for human feedback on trajectory pairs where it’s most uncertain about
Emphasis on "most uncertain", to maximize the productivity of human attention.
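Putting the two highlights together in a toy (all features, numbers, and trajectories invented, and the real systems use neural rewards, not a linear one): fit a reward function to pairwise human judgments with a Bradley-Terry-style logistic update, then pick the next query as the pair whose predicted preference is closest to 0.5, i.e. where the model is most uncertain.

```rust
fn sigmoid(x: f64) -> f64 { 1.0 / (1.0 + (-x).exp()) }

// Toy linear reward over 2 hand-picked trajectory features.
fn reward(theta: &[f64; 2], features: &[f64; 2]) -> f64 {
    theta[0] * features[0] + theta[1] * features[1]
}

fn main() {
    // Synthetic trajectories, each summarized by 2 features.
    let trajs: [[f64; 2]; 4] = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]];
    // Human judgments: (index preferred, index rejected).
    let prefs = [(0usize, 1usize), (2, 1), (2, 3), (0, 3)];

    let mut theta = [0.0f64, 0.0];
    for _ in 0..2000 {
        for &(win, lose) in &prefs {
            // P(win > lose) under the current reward model (Bradley-Terry)
            let p = sigmoid(reward(&theta, &trajs[win]) - reward(&theta, &trajs[lose]));
            // Gradient ascent on the log-likelihood of the human's choice.
            for k in 0..2 {
                theta[k] += 0.1 * (1.0 - p) * (trajs[win][k] - trajs[lose][k]);
            }
        }
    }

    // The fitted reward now explains every human judgment...
    for &(win, lose) in &prefs {
        assert!(reward(&theta, &trajs[win]) > reward(&theta, &trajs[lose]));
    }
    // ...and the pair whose predicted preference is closest to 0.5 is the
    // one where asking the human next is most valuable ("most uncertain").
    let mut best = (0usize, 1usize, f64::INFINITY);
    for i in 0..trajs.len() {
        for j in (i + 1)..trajs.len() {
            let p = sigmoid(reward(&theta, &trajs[i]) - reward(&theta, &trajs[j]));
            let u = (p - 0.5).abs();
            if u < best.2 { best = (i, j, u); }
        }
    }
    println!("query humans about pair ({}, {})", best.0, best.1);
}
```

This is the "find the reward function that best explains the human's judgments" loop in miniature, with the uncertainty-driven query selection bolted on at the end.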
Some people just want to watch the world burn: the prevalence, psychology and politics of the ‘Need for Chaos’
Takeaways:
- People can be put into 4 categories: Low Chaos, Medium Chaos, Rebuild, and High Chaos.
- In the US, UK, CA, and AU, around 20% of people want some chaos, but a considerable fraction of this 20% want to rebuild society non-violently.
- The right wing correlates with High Chaos, but they only share the view against immigration.
a separate queue
Isn't this 2-queue system less efficient and more complex?
pay to join
Now you have a market.
requires everyone to be equally thirsty; otherwise we’d still get bad outcomes when less-thirsty newcomers displace their thirstier counterparts
This seems to be the key answer to my confusion.
so either system serves exactly the same number of drinkers.
Yes, but don't you end up with angry clients rather than simply pitiful ones?
offsetting disadvantage that a lot of people never get to drink. But that disadvantage is illusory
Why?
Well, actually they’d leave the line and try to re-enter as newcomers, but let’s suppose for the moment that we can effectively prohibit that behavior
Assume the total number of customers is limited. With a traditional queue, you end up serving the same number of people; the difference is only in who waits and for how long.
Is it a good deal to trade hatred towards your business for merely apparently shorter queues?
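The "same number of drinkers" claim is really just counting: if the fountain can serve S drinks and more than S people want one, both the first-come queue and the pay-to-join scheme serve exactly S people; they differ only in *who* those people are. A toy check (all numbers invented):

```rust
fn main() {
    let capacity: usize = 3; // drinks the fountain can serve
    let thirst = [5, 1, 4, 2, 8]; // how much each customer values a drink

    // Traditional queue: the first `capacity` arrivals drink, thirst ignored.
    let fifo_served: Vec<usize> = (0..thirst.len()).take(capacity).collect();

    // Pay-to-join: the `capacity` highest bidders drink.
    let mut by_bid: Vec<usize> = (0..thirst.len()).collect();
    by_bid.sort_by_key(|&i| std::cmp::Reverse(thirst[i]));
    let market_served: Vec<usize> = by_bid.into_iter().take(capacity).collect();

    // Same number served either way; only the identities differ.
    assert_eq!(fifo_served.len(), market_served.len());
    println!("fifo: {:?}, market: {:?}", fifo_served, market_served);
}
```

The open question in the notes above (angry clients, hatred towards the business) is about *which* S people are served and how the excluded ones feel, which this count deliberately ignores.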
any Verifier program that succeeds in extracting information must also be able to extract information from a protocol run where rewinding is used and no information is available in the first place
I still don't understand this part.
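Trying to convince myself with a toy Schnorr-style sigma protocol (tiny numbers, my own sketch, not necessarily the article's protocol): rewinding lets a simulator learn the challenge before committing, so it can pick the challenge and response first and solve for the commitment. The resulting transcript verifies without the secret ever being used, so whatever a Verifier "extracts" from real transcripts it could equally extract from these secret-free ones.

```rust
// Toy Schnorr-style transcript in the multiplicative group mod p = 23,
// generator g = 5. The secret x never appears in the simulator.
const P: u64 = 23;
const G: u64 = 5;

fn modpow(mut b: u64, mut e: u64, m: u64) -> u64 {
    let mut r = 1;
    b %= m;
    while e > 0 {
        if e & 1 == 1 { r = r * b % m; }
        b = b * b % m;
        e >>= 1;
    }
    r
}

// Fermat inverse (valid because P is prime)
fn modinv(a: u64) -> u64 { modpow(a, P - 2, P) }

// Verifier's check for a transcript (t, c, z) against public key y = g^x:
// g^z == t * y^c  (mod p)
fn verifies(y: u64, t: u64, c: u64, z: u64) -> bool {
    modpow(G, z, P) == t * modpow(y, c, P) % P
}

// Simulator: choose c and z FIRST (as if we rewound the Verifier after
// learning its challenge), then solve for the commitment t = g^z * y^(-c).
fn simulate(y: u64, c: u64, z: u64) -> (u64, u64, u64) {
    let t = modpow(G, z, P) * modinv(modpow(y, c, P)) % P;
    (t, c, z)
}

fn main() {
    let x = 7; // the secret -- used only to derive the public key
    let y = modpow(G, x, P);
    // The simulator never sees x, yet its transcript passes verification:
    let (t, c, z) = simulate(y, 3, 10);
    assert!(verifies(y, t, c, z));
    println!("simulated transcript ({t}, {c}, {z}) verifies");
}
```

The commitment-before-challenge ordering is the only thing that makes the real protocol convincing; rewinding breaks that ordering for the simulator, which is why its information-free transcripts look identical to real ones.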
it's
its
20
35
whereas in Rust the data and the operations live independently
Reminds me of golang.
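My own minimal example of the separation (not from the book): the `struct` holds only data, and the operations live in a separate `impl` block, much like Go methods being declared apart from the struct they attach to.

```rust
// Data lives on its own...
struct Circle {
    radius: f64,
}

// ...and the operations live in a separate impl block, which can even
// sit in another module or file.
impl Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

fn main() {
    let c = Circle { radius: 2.0 };
    println!("{}", c.area());
}
```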
fn five() -> i32 { 5 }
Most helpful function. Definitely love it.