WIP: "The Plan" #13

Draft
wants to merge 3 commits into
base: main

jqhoogland
Collaborator

Opening up a PR for easier collaboration.

@jqhoogland jqhoogland marked this pull request as draft February 19, 2023 18:44
@jqhoogland
Collaborator Author

I think the main thing that could use work is filling in the background knowledge. I think it would be ideal if this were written with a technical AI safety researcher in mind who is not yet familiar with algebraic geometry/theoretical physics/SLT. Though maybe this is asking too much, and familiarity with SLT should be taken as a prerequisite.

@@ -13,7 +13,7 @@ We describe an approach to scalable mechanistic interpretability of neural netwo

Collaborator Author

Maybe this is a good spot to summarize the key claim(s) of SLT?

Idk, call it "Watanabe's hypothesis: knowledge to be gained corresponds to singularities in level sets of the loss landscape" (but then less cryptic)

Member

Yes agreed

@jqhoogland
Collaborator Author

jqhoogland commented Feb 19, 2023

> I think the main thing that could use work is filling in the background knowledge. I think it would be ideal if this were written with a technical AI safety researcher in mind who is not yet familiar with algebraic geometry/theoretical physics/SLT. Though maybe this is asking too much, and familiarity with SLT should be taken as a prerequisite.

Let me backpedal on this slightly. I think assuming some familiarity with SLT is fine and that a distilled version of this document can come later.

@jqhoogland
Collaborator Author

> We envision running such devices during a distribution of training runs of a neural network, and using them to examine the phases and phase transitions encountered. This gives us some idea of the nature of the singularities encountered and the universality class.

The wording "during a distribution of training runs" is a bit weird. If I understand right, we're running a bunch of training runs in parallel and using these to estimate the relevant order parameters?

@jqhoogland
Collaborator Author

jqhoogland commented Feb 19, 2023

Alexander and I both find "Concepts as components" to be uninformative. I do like the alliteration thing though.

Subatomic physics of singularities? Concepts as the quarks of singularities?

@afdago
Collaborator

afdago commented Feb 19, 2023

There is an underlying philosophy of 'concepts as particles / approximately discrete things' that is surprising and nontrivial. Something to unpack here for outsiders, especially people coming from machine learning.

@dmurfet
Member

dmurfet commented Feb 19, 2023 via email

@dmurfet
Member

dmurfet commented Feb 19, 2023 via email

@dmurfet
Member

dmurfet commented Feb 19, 2023 via email

@afdago
Collaborator

afdago commented Feb 20, 2023 via email

3 participants