WIP: "The Plan" #13

jqhoogland · 2023-02-19T18:44:30Z

Opening up a PR for easier collaboration.

jqhoogland · 2023-02-19T18:53:43Z

I think the main thing that could use work is filling in the background knowledge. I think it would be ideal if this were written with a technical AI safety researcher in mind who is not yet familiar with algebraic geometry/theoretical physics/SLT. Though maybe this is asking too much, and familiarity with SLT should be taken as a prerequisite.

jqhoogland · 2023-02-19T18:55:37Z

slt/align.md

@@ -13,7 +13,7 @@ We describe an approach to scalable mechanistic interpretability of neural netwo



Maybe this is a good spot to summarize the key claim(s) of SLT?

Idk, call it "Watanabe's hypothesis: knowledge to be gained corresponds to singularities in level sets of the loss landscape" (but phrased less cryptically)

slt/align.md

jqhoogland · 2023-02-19T19:02:49Z

> I think the main thing that could use work is filling in the background knowledge. I think it would be ideal if this were written with a technical AI safety researcher in mind who is not yet familiar with algebraic geometry/theoretical physics/SLT. Though maybe this is asking too much, and familiarity with SLT should be taken as a prerequisite.

Let me backpedal on this slightly. I think assuming some familiarity with SLT is fine and that a distilled version of this document can come later.

jqhoogland · 2023-02-19T20:58:01Z

> We envision running such devices during a distribution of training runs of a neural network, and using them to examine the phases and phase transitions encountered. This gives us some idea of the nature of the singularities encountered and the universality class.

The wording "during a distribution of training runs" is a bit weird. If I understand right, we're running a bunch of training runs in parallel and using these to estimate the relevant order parameters?

jqhoogland · 2023-02-19T21:00:31Z

Alexander and I both find "Concepts as components" to be uninformative. I do like the alliteration thing though.

Subatomic physics of singularities? Concepts as the quarks of singularities?

afdago · 2023-02-19T21:07:28Z

There is an underlying philosophy of "concepts as particles / approximately discrete things" that is surprising and nontrivial. This is something to unpack for outsiders, especially people coming from machine learning.

dmurfet · 2023-02-19T21:46:24Z

Yes, this is badly written. By "distribution of training runs" I don't really mean "train GPT-3 from beginning to end 100 times". It's more like this: at various checkpoints you would pass that checkpoint off to a separate process that runs N local SGD instances near that point, in order to try to estimate local properties (= the spectroscope, or whatever it should be called).

dmurfet · 2023-02-19T21:50:10Z

Yep, agreed. It's also not really accurate, since what I actually mean is "indecomposable objects of some category", and those are not necessarily the irreducible components, although they're related. So, to be clear, the idea is something like this: associated to each singular level set is some category whose indecomposable objects are perhaps the right objects to think of as "atomic". In physics, representations of the relevant symmetry groups are associated to fundamental particles, or rather to various configurations of fundamental particles.

I think the suggestions "subatomic physics of singularities" or "quarks" are reasonable. But maybe there is a good Latinate word for it? ChatGPT says: "One possible word for the subatomic structure of concepts or the fundamental building blocks of language is 'sememe'. The term 'sememe' was coined by linguist Michel Bréal in the late 19th century, and it refers to the smallest unit of meaning in a language. Similar to how a phoneme is the smallest unit of sound in a language, a sememe is the smallest unit of meaning. It represents the basic elements that are combined to create words, phrases, and sentences, and helps to explain how meaning is conveyed through language."

dmurfet · 2023-02-19T21:51:43Z

Yes, I agree; I'm not sure how to explain this. The shortest path into this idea that comes to mind for me is via physics rather than math. Going from "Schrödinger equation" to "symmetry group" to "irreducible representation" (with e.g. Wigner's theorem) gets you to an understanding of why some discrete representation-theoretic object might be relevant to dynamics.

afdago · 2023-02-20T01:04:59Z

Ooooh! Very interested in atomic units of meaning, as you know. Sememes, huh. Sememetic Type Theory, huhh. Let me think about whether I like that.

ADD numbers to track long list of conditions

f299103

jqhoogland requested a review from dmurfet February 19, 2023 18:44

jqhoogland marked this pull request as draft February 19, 2023 18:44

ADD some background on alignment & relation of interpretability to the field

25d0a65

FIX punctuation

88e9882

dmurfet approved these changes Feb 19, 2023
