WIP: "The Plan" #13
I think the main thing that could use work is filling in the background knowledge. Ideally this would be written with a technical AI safety researcher in mind who is not yet familiar with algebraic geometry, theoretical physics, or SLT. Though maybe this is asking too much, and familiarity with SLT should be taken as a prerequisite.
@@ -13,7 +13,7 @@ We describe an approach to scalable mechanistic interpretability of neural netwo
Maybe this is a good spot to summarize the key claim(s) of SLT?
Idk, call it "Watanabe's hypothesis: knowledge to be gained corresponds to singularities in level sets of the loss landscape" (but then less cryptic)
Yes agreed
Let me backpedal on this slightly. I think assuming some familiarity with SLT is fine and that a distilled version of this document can come later.
The wording "during a distribution of training runs" is a bit weird. If I understand right, we're running a bunch of training runs in parallel and using these to estimate the relevant order parameters?
Alexander and I both find "Concepts as components" to be uninformative. I do like the alliteration thing though.
Yes, this is badly written. By “distribution of training runs” I don’t mean “train GPT-3 from beginning to end 100 times”. I mean something more like this: at various checkpoints you would hand that checkpoint off to a separate process that runs N local SGD instances near that point in order to estimate local properties (= the spectroscope, or whatever it should be called).
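To make the checkpoint-and-probe idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy loss, the names `local_sgd_chain` and `probe_checkpoint`, the Gaussian noise standing in for minibatch noise, and the two summary statistics. The comment above does not say which local properties the spectroscope would estimate, so the statistics here are placeholders. The sketch also probes checkpoints one after another rather than in a separate process.

```python
import random

def loss(w):
    # Hypothetical toy loss whose minimum set {w[0] * w[1] = 0} is singular.
    return (w[0] * w[1]) ** 2

def grad(w):
    # Analytic gradient of the toy loss above.
    p = w[0] * w[1]
    return [2.0 * p * w[1], 2.0 * p * w[0]]

def local_sgd_chain(checkpoint, rng, steps=200, lr=0.01, noise=0.05, jitter=0.01):
    # Start from a small perturbation of the checkpoint, then take noisy
    # gradient steps (Gaussian noise is an assumed stand-in for minibatch noise).
    w = [x + rng.gauss(0.0, jitter) for x in checkpoint]
    for _ in range(steps):
        g = grad(w)
        w = [x - lr * (gi + rng.gauss(0.0, noise)) for x, gi in zip(w, g)]
    return w

def probe_checkpoint(checkpoint, n_chains=20, seed=0):
    # Run N independent local chains near one checkpoint and summarise them.
    rng = random.Random(seed)
    finals = [local_sgd_chain(checkpoint, rng) for _ in range(n_chains)]
    mean_loss = sum(loss(w) for w in finals) / n_chains
    mean_displacement = sum(
        sum((a - b) ** 2 for a, b in zip(w, checkpoint)) ** 0.5 for w in finals
    ) / n_chains
    return {"mean_loss": mean_loss, "mean_displacement": mean_displacement}

# Hypothetical checkpoints saved along a single training run.
checkpoints = [[1.0, 1.0], [0.5, 0.2], [0.3, 0.0]]
reports = [probe_checkpoint(c, seed=i) for i, c in enumerate(checkpoints)]
for c, r in zip(checkpoints, reports):
    print(c, r)
```

The point of the structure is that only one training run is needed: each checkpoint gets its own batch of short local chains, and the per-checkpoint reports can then be compared along the run.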
… On 20 Feb 2023, at 7:58 am, jqhoogland ***@***.***> wrote:
We envision running such devices during a distribution of training runs of a neural network, and using them to examine the phases and phase transitions encountered. This gives us some idea of the nature of the singularities encountered and the universality class.
Yep agreed. It’s also not really accurate, since what I actually mean are “indecomposable objects of some category” and those are not necessarily the irreducible components, although they’re related.
So to be clear, the idea is something like this: associated to each singular level set is some category whose indecomposable objects are perhaps the right things to think of as “atomic”. By analogy, representations of the symmetry groups relevant to physics are associated to fundamental particles, or rather to various configurations of fundamental particles. I think the suggestions “subatomic physics of singularities” or “quarks” are reasonable. But maybe there is a good latinate word for it?
ChatGPT says: “One possible word for the subatomic structure of concepts or the fundamental building blocks of language is ‘sememe’. The term ‘sememe’ was coined by linguist Michel Bréal in the late 19th century, and it refers to the smallest unit of meaning in a language. Similar to how a phoneme is the smallest unit of sound in a language, a sememe is the smallest unit of meaning. It represents the basic elements that are combined to create words, phrases, and sentences, and helps to explain how meaning is conveyed through language.”
Yes I agree, I’m not sure how to explain this. The shortest path into this idea that comes to mind for me is via physics, not math: going from “Schrödinger equation” to “symmetry group” to “irreducible representation”, with e.g. Wigner’s theorem, gets you to understanding why some discrete representation-theoretic object might be relevant to dynamics.
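For readers coming from machine learning, that chain of ideas can be compressed into one standard textbook statement. This is a sketch under assumptions that are not in the comment above: the symmetry group $G$ is finite or compact and acts by unitary operators $U(g)$ that commute with the Hamiltonian $H$ (Wigner's theorem also allows antiunitary operators, which are ignored here).

$$
[H, U(g)] = 0 \quad \forall g \in G
\quad\Longrightarrow\quad
U(g)\,\mathcal{H}_E \subseteq \mathcal{H}_E ,
\qquad
\mathcal{H}_E \;\cong\; \bigoplus_{\rho} \rho^{\oplus m_\rho(E)}
$$

Here $\mathcal{H}_E$ is the eigenspace of $H$ with energy $E$, $\rho$ ranges over the irreducible representations of $G$, and $m_\rho(E)$ is a multiplicity. The first step follows because $H\psi = E\psi$ gives $H\,U(g)\psi = U(g)\,H\psi = E\,U(g)\psi$. Since the time evolution $e^{-iHt/\hbar}$ also commutes with every $U(g)$, it preserves this decomposition, so the discrete label $\rho$ is conserved by the dynamics.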
… On 20 Feb 2023, at 8:07 am, afdago ***@***.***> wrote:
There is an underlying philosophy of 'concepts as particles / approximately discrete things' that is surprising and nontrivial. Something to unpack here for outsiders, especially people coming from machine learning.
Ooooh! very interested in atomic units of meaning as you know
Sememes huh
Sememetic Type theory huhh let me think if I like that
Opening up a PR for easier collaboration.