Glossary

  • Coroutine: like a subroutine, but it can suspend in the middle and later resume from the point where it “yields”. When it yields, it yields control to some other cooperating function. Coroutines are popular as a lightweight alternative to multithreading since, with cooperative scheduling, there is no need for synchronization (see the sketch after this list).
  • Credit assignment: can refer to many things; see the interview of Yoshua Bengio by Lex Fridman. In RL, it refers to determining how past actions influence future outcomes, i.e., which actions deserve credit for a reward.
  • Embedding: represent an object with a vector. Essentially it is the same as one-hot encoding followed by multiplication with an embedding matrix (see the sketch after this list).
  • Goal preservation: a sufficiently sophisticated agent will try to prevent its goal from being modified.
    • Terminal vs instrumental goal: a terminal goal is what is given; an instrumental goal is one an intelligent agent derives as a means to achieve the terminal goal.
  • Hume’s Guillotine: descriptive (“is”) statements and prescriptive (“ought”) statements are logically separate; no coherent chain of reasoning leads from one to the other.
  • Pascal’s wager: it is better to believe in God, since if one doesn’t and God exists, one will be tortured for an infinite afterlife.
    • Problem: the payoff is very large but its probability is very small as well, so the expected payoff can be anything.
    • Counter-argument: there could be another God who requires belief in him alone; if I pick the wrong one, I will also be penalized.
  • Privacy: a dataset is private if no combination of operations (mechanisms) on it will disclose a sensitive characteristic of any single record in the dataset.
    • Differentially private: adding or deleting a single record has essentially no effect on the output distribution of the mechanism (see the Laplace-mechanism sketch after this list).
  • Reasoning:
  • Representation:
    • One-hot encoding: each class occupies one dimension.
    • Bag-of-words (BoW): ignore the locations of words; just represent a document by the histogram of its words (see the sketch after this list).
  • Reward hacking: an agent exploits the gap between the reward function as intended and the reward function as actually written.
  • Reward modeling: rather than manually defining a reward function, learn a reward function through training (e.g., from human feedback).
  • Side effect: failing to specify in the reward function everything that we care about. The agent may assume that everything not mentioned in the reward function has zero value, which may lead to large negative side effects.
  • Transformer:
    • Attention: softmax(Q K^T / sqrt(d_k)) V (see the sketch after this list)
      • Q: query
      • K: key
      • V: value
    • Masked attention: softmax((Q K^T + M) / sqrt(d_k)) V
      • M: mask, with −∞ at the positions of future words so they receive zero attention weight
    • Layer normalization: normalize the output to have zero mean and unit variance (see the sketch after this list)
    • Positional encoding: sinusoidal codes added to the input to include position information (see the sketch after this list)
    • GPT, GPT-2: build a language model instead of a translator. As in an autoencoder, the encoder and “decoder” would be trained on the same data, so the encoder can essentially be removed. GPT-2 was tested on different tasks without fine-tuning.
    • BERT: introduces the bidirectional idea; tries to predict a masked target word from the words both before and after it.
    • XLNet:
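
A minimal coroutine sketch in Python for the Coroutine entry above. The running-average example is illustrative, not taken from any particular library: the generator suspends at each yield and resumes, with its local state intact, when the caller sends the next value.

```python
def running_average():
    """Coroutine that yields the running average of the values sent to it."""
    total, count, average = 0.0, 0, None
    while True:
        value = yield average      # suspend here; resume when .send() is called
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                  # prime the coroutine: run up to the first yield
print(avg.send(10))        # 10.0
print(avg.send(20))        # 15.0
print(avg.send(30))        # 20.0
```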
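
A NumPy sketch of the equivalence claimed in the Embedding entry: multiplying a one-hot vector by an embedding matrix is the same as looking up one row of that matrix. The vocabulary size and dimensions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 5, 3
E = rng.normal(size=(vocab_size, embed_dim))   # embedding matrix

word_id = 2
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

via_matmul = one_hot @ E    # one-hot encoding followed by matrix product ...
via_lookup = E[word_id]     # ... is just a row lookup in the embedding matrix

assert np.allclose(via_matmul, via_lookup)
```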
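
A tiny illustration of the Bag-of-words entry: word order is discarded and the document becomes a histogram of its words. The sentences are made up.

```python
from collections import Counter

doc = "the cat sat on the mat"
bow = Counter(doc.split())
print(bow)   # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

# Any permutation of the words gives the same histogram:
assert bow == Counter("mat the on sat cat the".split())
```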
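
For the Privacy and Differentially private entries, a sketch of one standard mechanism (the Laplace mechanism), assuming a simple count query; the records and the choice ε = 0.5 are illustrative. Since adding or deleting one record changes a count by at most 1, Laplace noise with scale 1/ε keeps the output distributions on neighboring datasets within a factor of e^ε of each other.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """epsilon-DP count query via the Laplace mechanism.

    A count query has sensitivity 1 (adding/removing one record changes
    the true count by at most 1), so Laplace noise with scale 1/epsilon
    suffices for epsilon-differential privacy.
    """
    true_count = sum(1 for record in data if predicate(record))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
ages = [23, 35, 41, 52, 29, 67]               # illustrative records
noisy = laplace_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng)
print(noisy)   # the true answer is 3; only the noisy answer is released
```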
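
A NumPy sketch of the (masked) scaled dot-product attention formulas in the Transformer entry, with Q, K, V as (sequence length × d_k) matrices so the score matrix is Q K^T; all sizes are illustrative. Since the mask entries are −∞, adding M before or after the 1/sqrt(d_k) scaling gives the same attention weights.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    `mask` holds -inf at positions that must get zero attention weight."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        scores = scores + mask
    return softmax(scores) @ V

seq_len, d_k = 4, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))

# Causal mask: -inf above the diagonal, so position i cannot attend
# to "future" positions j > i.
M = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
out = attention(Q, K, V, mask=M)
print(out.shape)   # (4, 8)
```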
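
A sketch of the Layer normalization entry: each row is shifted and scaled to zero mean and unit variance. In a real Transformer, gamma and beta are learned per-feature vectors; here they are scalars for brevity.

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each row of x to zero mean and unit variance,
    then apply a scale (gamma) and shift (beta)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(4, 8))
y = layer_norm(x)
print(y.mean(axis=-1), y.var(axis=-1))   # ~0 and ~1 per row
```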
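
A sketch of the sinusoidal Positional encoding entry, following the sin/cos formulation from the original Transformer paper (“Attention Is All You Need”); max_len and d_model are illustrative.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # (50, 16); added element-wise to the input embeddings
```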

 

Meaningful research directions

  • Interpretability
  • Reasoning


