making money using end-to-end reinforcement learning with self-replicating agents #3752
Comments
Diversity is essential to survival: multiple independent code bases.
fMRI scans show that humans have social norms and perform reinforcement learning.
fMRI and public goods. ToDo: biology-based models of trust (the paper above gives real-world measurements of trust in the human brain).
Hidden Technical Debt in Machine Learning Systems: as Google put it, the hardest part of AI is not the AI.
@MateiAnton A key first step for this hugely ambitious project is to get something more efficient going. As also mentioned in #4659, the current code is not deployed; there are only lab experiments:
The first step is to integrate the stable API provided by the creator of SporeStack into Cloudomate. Then we have stable building blocks for self-replication and for making everything more sophisticated/"intelligent"; a rough sketch of such a building block follows below. A quick search turned up some prior work, though nothing we can re-use, I believe. Fancy stuff, also far too complex to re-use: a Stock Trading Bot using Deep Reinforcement Learning (Deep Q-learning), Keras and TensorFlow. Old 2008 scientific paper: Autonomous Forex Trading Agents.
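As a concrete illustration of that building block, here is a minimal, hedged sketch of what the VPS-buying step could look like. The `VPSProvider` class and its methods are hypothetical placeholders, not the actual SporeStack or Cloudomate interfaces:

```python
from typing import Optional


class VPSProvider:
    """Hypothetical provider client (e.g. something wrapped by Cloudomate)."""

    def quote_monthly_price_btc(self) -> float:
        raise NotImplementedError  # query the provider's current price list

    def launch_server(self, payment_btc: float, cloud_init: str) -> str:
        raise NotImplementedError  # pay, boot a server, return its address


def replicate(provider: VPSProvider, wallet_btc: float,
              install_script: str) -> Optional[str]:
    """Buy a new VPS and install a copy of the agent, if the wallet allows it."""
    price = provider.quote_monthly_price_btc()
    if wallet_btc < price:
        return None  # not enough funds this cycle; keep earning
    return provider.launch_server(payment_btc=price, cloud_init=install_script)
```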
Reading during the X-mas break, I found related work: an "autonomous bidding agent".
@MateiAnton busy with regular classes this 3rd quarter. Please understand, re-use, and extend ongoing/prior work: https://github.com/Tribler/distributed-ai-kernel (Python or Kotlin)
It seems like this student work was either completed or dropped several years ago. I'll close this issue now.
Scientific goal: create a collaborative live research ecosystem for reinforcement learning
It is impossible to publish in leading AI venues without industry-level resources: thousands of cores on Google, Facebook, or DeepMind clusters and huge, valuable datasets. Scientists are starved of the chance to contribute their knowledge because they lack this access, and without it they cannot compete.
The "publish or perish" model encourages scientists to cut as many corners as they can in order to produce as many publications as they can. This directly conflicts with the realities of AI, it's hard and requires a lot of work to provide an advancement on the state-of-the-art.
Unpublished code and sensitivity to training conditions have made it difficult for AI researchers to reproduce many key results; AI has become a form of "alchemy". This initiative will create the first fully open, re-usable environment in which ideas compete for success and can be re-used.
More specifically, we need open, re-usable AI with embodiment and self-replication; see this Science Magazine publication. Bio-inspiration is key.
Engineering goal: making money in our micro-economy with end-to-end reinforcement learning, using our framework of self-replicating agents, VPS/VPN buying, and the decentral market
This type of robot will sense the world around it and act upon it. Without intelligent actions it will fail to reproduce and will die off. The "motor commands" of the old-AI robot world are here replaced with robo trading. The whole ecosystem is fully self-organising and has no point of control, central server, or single point of failure.
Robo trading is based on crypto tokens. Since the launch of our first primitive ledger in 2007 we have been working on an accounting system for Bittorrent, something we now call a token for Bittorrent. Our live deployment and self-replication framework now make the next step possible: self-replicating AI based on deep reinforcement learning. One full-time PhD student is responsible for realizing our token economy: #3337 (see the pages of detail there).
From #2925:
The basic idea is to create a micro-economy within the Tribler platform for earning, spending, and trading bandwidth tokens. This brings together various research topics, including a blockchain-powered decentralized market, anonymous downloading, and hidden seeding. Trustworthy behavior and participation should be rewarded while cheating should be punished. A basic policy should prevent users from selfishly consuming bandwidth without giving anything back. This directly addresses the tragedy-of-the-commons phenomenon.
Our initial release should provide basic primitives to earn, trade and spend tokens. Our work could be extended with more sophisticated techniques like TrustChain record mixing, multiple identities, a robust reputation mechanism for tunnel selection, global consensus and verifiable public proofs (proof-of-bandwidth/proof-of-relay).
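As a minimal sketch of the basic earn/spend policy described above, the snippet below tracks bandwidth given and taken per peer and refuses further consumption once a peer's debt grows too large. The class, method, and threshold names are illustrative assumptions, not the actual Tribler/TrustChain code:

```python
class TokenLedger:
    """Tracks bandwidth given and taken by a single peer (illustrative only)."""

    def __init__(self) -> None:
        self.given_mb = 0.0   # bandwidth contributed (seeding, relaying)
        self.taken_mb = 0.0   # bandwidth consumed (downloading)

    def record_contribution(self, mb: float) -> None:
        self.given_mb += mb

    def record_consumption(self, mb: float) -> None:
        self.taken_mb += mb

    def balance_mb(self) -> float:
        return self.given_mb - self.taken_mb


MIN_BALANCE_MB = -512.0  # assumed bootstrap allowance so new peers can start


def may_consume(ledger: TokenLedger, requested_mb: float) -> bool:
    """Basic anti-freeriding policy: deny service once consumption outruns
    contribution beyond the small bootstrap allowance."""
    return ledger.balance_mb() - requested_mb >= MIN_BALANCE_MB
```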
Agent specifications
The agent will earn TrustChain records by seeding in Bittorrent and relaying Tor-like traffic. It will sell these records for Bitcoin or Ethereum on our decentral market. Using these coins and our Plebnet framework it will buy VPN and VPS infrastructure and essentially replicate. A simplified sketch of one such cycle follows below. [Diagram: detailed architecture of this ecosystem]
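To make that earn, sell, replicate loop concrete, here is a simplified, hypothetical sketch of one agent cycle. `MarketClient` and its method are placeholders, not the real decentral market or Plebnet APIs, and the pricing is purely illustrative:

```python
from typing import Tuple


class MarketClient:
    """Placeholder for the decentral market interface (illustrative only)."""

    def sell_bandwidth_records(self, record_count: int) -> float:
        """Place an ask for earned TrustChain records; return BTC received."""
        raise NotImplementedError


def agent_cycle(market: MarketClient, wallet_btc: float,
                earned_records: int, vps_price_btc: float) -> Tuple[float, bool]:
    """One cycle: sell this period's records, then decide whether to replicate."""
    wallet_btc += market.sell_bandwidth_records(earned_records)
    can_replicate = wallet_btc >= vps_price_btc
    if can_replicate:
        wallet_btc -= vps_price_btc  # funds handed to the VPS-buying step
    return wallet_btc, can_replicate
```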
The role of AI
The challenge is to put AI at the core of this work. All decisions about money, tokens, replication, and hoarding credits for survival will be taken by autonomous intelligence. By applying end-to-end reinforcement learning we will use a single goal which will drive the behavior of agents: survival.
Generating income is only a means to an end; the primary objective is survival. Every month the agent needs sufficient Bitcoin or Ethereum to replace its server, or it will "die". Various parameters will be implemented to influence the behavior and strategies of an agent; a minimal sketch of such a survival-driven learner follows below.
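As a hedged illustration of this survival-driven objective, the sketch below uses tabular Q-learning with a reward that reflects only whether the agent stays alive each period. The environment interface, action set, and hyperparameters are assumptions for illustration, not a specification of the eventual agent:

```python
import random
from collections import defaultdict

# Assumed, coarse-grained action set for illustration.
ACTIONS = ["seed", "relay", "sell_records", "buy_vps"]


def survival_reward(alive: bool) -> float:
    # The single end-to-end goal: +1 for surviving the period, -1 for dying
    # (balance below the monthly replacement cost).
    return 1.0 if alive else -1.0


def train(env, episodes: int = 1000, alpha: float = 0.1,
          gamma: float = 0.95, epsilon: float = 0.1):
    """Tabular Q-learning over a hypothetical environment whose step() returns
    (next_state, alive, done)."""
    q = defaultdict(float)  # (state, action) -> estimated value
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration over the assumed action set.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, alive, done = env.step(action)
            reward = survival_reward(alive)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q
```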
A lot of loose parts of this vision are already in place; the integration step and meaningful intelligence are still lacking. Plebnet is operational.