diff --git a/docs-2.0/1.introduction/0-0-graph.md b/docs-2.0/1.introduction/0-0-graph.md new file mode 100644 index 00000000000..647e27ccb31 --- /dev/null +++ b/docs-2.0/1.introduction/0-0-graph.md @@ -0,0 +1,210 @@ +# An introduction to graphs + +People from tech giants (such as Amazon and Facebook) to small research teams are devoting significant resources to exploring the potential of graph databases to solve data relationship problems. What exactly is a graph database? What can it do? Where does it fit in the database landscape? To answer these questions, we first need to understand graphs. + +Graphs are one of the main areas of research in computer science. Graphs can efficiently solve many of the problems that exist today. This topic starts with graphs to explain the advantages of graph databases and their great potential in modern application development, and then describes the differences between distributed graph databases and several other types of databases. + +## What are graphs? + +Graphs are everywhere. When hearing the word graph, many people think of bar charts or line charts, because we sometimes call those graphs too. Such charts show the relationships between two or more data series. The simplest example is the following picture, which shows the number of Nebula Graph GitHub repository stars over time. + +![image](https://user-images.githubusercontent.com/42762957/91426247-d3861000-e88e-11ea-8e17-e3d7d7069bd1.png "This is not the graph talked about in this book") + +This type of diagram is often called a line chart. As you can see, the number of stars rises over time. A line chart can show how data changes over time (depending on the scale settings). Line charts are only one example; there are various kinds of charts, such as pie charts, bar charts, etc. + +Another kind of "graph" comes up in daily conversation about image recognition or photo retouching. That kind of diagram is simply a picture, photo, or image.
+ +![image](https://docs-cdn.nebula-graph.com.cn/books/images/image.png "This is not the graph talked about in this book") + +The diagram we discuss in this topic is a different concept: the graph in graph theory. + +In graph theory, a branch of mathematics, graphs are used to represent the relationships between entities. A graph consists of several small dots (called vertices or nodes) and lines or curves (called edges) that connect these dots. The term graph was proposed by Sylvester in 1878. + +The following picture is what this topic calls a graph. + +![Image](https://docs-cdn.nebula-graph.com.cn/books/images/undirectedgraph.png) + +Simply put, graph theory is the study of graphs. Graph theory began in the early 18th century with the problem of the Seven Bridges of Königsberg. Königsberg was then a Prussian city (now part of Russia, renamed Kaliningrad). The river Pregel crossed Königsberg; it not only divided the city into two parts but also formed two small islands in the middle of the river. This divided the city into four areas, which were connected by seven bridges. There was a puzzle associated with Königsberg at the time: how to walk through all four areas of the city while crossing each bridge only once. A simplified view of the seven bridges is shown below. Try to find the answer to this puzzle if you are interested [^171]. + +![image](https://user-images.githubusercontent.com/42762957/91536940-1526c180-e948-11ea-8fe8-90f40ce28171.png) + +[^171]: Source of the picture: https://medium.freecodecamp.org/i-dont-understand-graph-theory-1c96572a1401. + +To solve this problem, the great mathematician Euler abstracted the four regions of the city into dots and the seven bridges connecting them into edges between the dots, and proved that the puzzle was unsolvable. The simplified abstract diagram is as follows [^063].
+ +![image](https://user-images.githubusercontent.com/42762957/91538126-e578b900-e949-11ea-980c-5704254e8063.png) + +[^063]: Source of the picture: https://medium.freecodecamp.org/i-dont-understand-graph-theory-1c96572a1401 + +The four dots in the picture represent the four regions of Königsberg, and the lines between the dots represent the seven bridges connecting the four regions. It is easy to see that an area connected by an even number of bridges can be passed through easily, because different routes can be chosen to come and go. An area connected by an odd number of bridges can only serve as a starting or ending point, because the same route can be taken only once. The number of edges associated with a node is called the node degree. It can be shown that the Königsberg puzzle is solvable only if at most two nodes have odd degrees and all the other nodes have even degrees, i.e., at most two regions may have an odd number of bridges while the remaining regions have an even number of bridges. However, as the picture above shows, every region of Königsberg has an odd number of bridges, so the puzzle is unsolvable. + +## Property graphs + +From a mathematical point of view, graph theory studies the relationships between modeled objects. However, it is common to extend this basic graph model. The extended model is called the **property graph model**. A property graph usually consists of the following components. + +- Nodes, which represent objects or entities. In this topic, nodes are called vertices. +- Relationships between nodes. In this topic, relationships are called edges. Edges can be directed or undirected to indicate a relationship between two entities. +- Properties, which can be attached to both vertices and edges. + +In real life, there are many examples of property graphs. + +For example, Qichacha or BOSS Zhipin use graphs to model business equity relationships.
A vertex is usually a natural person or a business, and an edge is the equity relationship between a person and a business. The properties on vertices can be the name, age, ID number, etc. of the natural person. The properties on edges can be the investment amount, investment time, and positions such as director or supervisor. + +![image](https://docs-cdn.nebula-graph.com.cn/books/images/enterprise-relations.png) + +A vertex can also be a listed company and an edge the correlation between listed companies. The vertex properties can be a stock code, abbreviation, market capitalization, sector, etc. The edge property can be the time-series correlation coefficient of the stock prices [^T01]. + +[^T01]: https://nebula-graph.com.cn/posts/stock-interrelation-analysis-jgrapht-nebula-graph/ + +The graph can also model character relationships in a TV series like Game of Thrones [^s-01]. Vertices are the characters. Edges are the interactions between the characters. Vertex properties are the characters' names, ages, camps, etc., and edge properties are the number of interactions between two characters. + +![image](https://docs-cdn.nebula-graph.com.cn/books/images/game-of-thrones-01.png) + +[^s-01]: https://nebula-graph.com.cn/posts/game-of-thrones-relationship-networkx-gephi-nebula-graph/ + +Graphs are also used for governance within IT systems. For example, a company like WeBank has a very large data warehouse and corresponding data warehouse management tools. These tools record the ETL relationships between the Hive tables in the data warehouse, implemented through jobs [^ware]. Such ETL relationships can be very easily presented and managed in the form of graphs, and the root cause can be easily traced when problems arise.
+ +![image](https://docs-cdn.nebula-graph.com.cn/books/images/dataware2.png) + +[^ware]: https://nebula-graph.com.cn/posts/practicing-nebula-graph-webank/ + +Graphs can also be used to record the invocation relationships between the intricate microservices within a large IT system [^tice], which operations teams use for service governance. Each vertex represents a microservice and an edge represents the invocation relationship between two microservices. This allows Ops to easily find call links with availability below a threshold (say, 99.99%), or to discover microservice nodes where a failure would have a particularly large impact. + +Graphs can also be used to improve the efficiency of code development. A graph can store the function call relationships in a codebase [^tice] to improve the efficiency of reviewing and testing the code. In such a graph, each vertex is a function or variable, and each edge is a call relationship between functions or variables. When there is a new code commit, one can more easily see other interfaces that may be affected, which helps testers better assess potential go-live risks. + +[^tice]: https://nebula-graph.com.cn/posts/meituan-graph-database-platform-practice/ + +In addition, we can discover more scenarios by adding temporal information, as opposed to a static property graph that does not change. + +For example, inside a network of interbank account fund flows [^1440w], a vertex is an account, and an edge is a transfer record between accounts. Edge properties record the time, amount, etc. of the transfer.
Companies can use graph technology to easily explore the graph and discover obvious misappropriation of funds, paying off one loan with another, loan fraud gangs, and other phenomena. + +![image](https://docs-cdn.nebula-graph.com.cn/books/images/bank-transfer.jpg) + +[^1440w]: https://zhuanlan.zhihu.com/p/90635957 + +The same approach can be used to explore and discover the flow of cryptocurrencies. + +![image](https://docs-cdn.nebula-graph.com.cn/books/images/block-chain.png) + +In a network of accounts and devices [^360], vertices can be accounts, mobile devices, and Wi-Fi networks; edges are the login relationships between accounts and mobile devices, and the access relationships between mobile devices and Wi-Fi networks. + +![image](https://docs-cdn.nebula-graph.com.cn/books/images/360-user-1.png) + +Such graph data records the characteristics of organized online fraud operations. Big companies such as 360 DigiTech [^360], Kuaishou [^kuaishou], WeChat [^weixin], Zhihu [^zhihu], and Ctrip Finance have all identified over a million crime groups through graph technology. + +![image](https://docs-cdn.nebula-graph.com.cn/books/images/360-user-2.png) + +[^360]: https://nebula-graph.com.cn/posts/graph-database-data-connections-insight/ + +[^kuaishou]: https://nebula-graph.com.cn/posts/kuaishou-security-intelligence-platform-with-nebula-graph/ + +[^weixin]: https://nebula-graph.com.cn/posts/nebula-graph-for-social-networking/ + +[^zhihu]: https://mp.weixin.qq.com/s/K2QinpR5Rplw1teHpHtf4w + +In addition to the dimension of time, you can find more scenarios for property graphs by adding geographic location information. + +For example, in tracing the source of Coronavirus Disease (COVID-19) [^CoV02], vertices are people and edges are the contacts between people. Vertex properties are a person's ID card information and onset time, and edge properties are the time and geographical location of the close contact between people, etc.
This helps health prevention departments quickly identify high-risk people and their behavioral trajectories. + +![image](https://www-cdn.nebula-graph.com.cn/nebula-blog/nCoV02.png) + +[^CoV02]: https://nebula-graph.com.cn/posts/detect-corona-virus-spreading-with-graph-database/ + +The combination of geographic location and graphs is also used in some O2O scenarios, such as real-time food recommendation based on POIs (Points of Interest) [^mt], which enables local life service platforms like Meituan to recommend more suitable businesses in real time when consumers open the app. + +[^mt]: https://nebula-graph.com.cn/posts/meituan-graph-database-platform-practice/ + +Graphs are also used for knowledge inference. Huawei, Vivo, OPPO, WeChat, Meituan, and other companies use graphs to represent their underlying knowledge relationships. + +## Why do we use graph databases? + +Although relational databases and semi-structured databases such as XML/JSON can be used to describe a graph-structured data model, a graph database not only describes the graph structure and stores the data itself but also focuses on handling the relationships between the data. Specifically, graph databases have several advantages: + +- Graphs are a more visual and intuitive way of representing knowledge to human brains. This allows us to focus on the business problem itself rather than on how to describe the problem in a particular database structure (e.g., a table structure). + +- It is easier to show the characteristics of the data in graphs, such as transfer paths and nearby communities. To analyze the relationships and the importance of characters in Game of Thrones, data displayed in tables is not as intuitive as in graphs.
+ + ![image](https://www-cdn.nebula-graph.com.cn/nebula-blog/game-of-thrones-01.png) + + Especially when some central vertices are deleted: + + ![image](https://www-cdn.nebula-graph.com.cn/nebula-blog/tv-game-thrones.png) + + Adding an edge can completely change the entire topology. + + ![image](https://www-cdn.nebula-graph.com.cn/nebula-blog/tv-game-thrones-02.png) + + We can intuitively sense the impact of minor changes in graphs rather than in tables. + +- Graph query languages are designed based on graph structures. The following is a query example from LDBC. The requirement: query the posts posted by a person and the corresponding replies (the replies themselves may also be replied to multiple times). Both the posting time and the reply time must meet certain conditions, and the results are sorted by the number of replies. + + ![image](https://docs-cdn.nebula-graph.com.cn/books/images/efficientquery.png) + + The query statement written in PostgreSQL: + + ```SQL + --PostgreSQL + WITH RECURSIVE post_all(psa_threadid + , psa_thread_creatorid, psa_messageid + , psa_creationdate, psa_messagetype + ) AS ( + SELECT m_messageid AS psa_threadid + , m_creatorid AS psa_thread_creatorid + , m_messageid AS psa_messageid + , m_creationdate, 'Post' + FROM message + WHERE 1=1 AND m_c_replyof IS NULL -- post, not comment + AND m_creationdate BETWEEN :startDate AND :endDate + UNION ALL + SELECT psa.psa_threadid AS psa_threadid + , psa.psa_thread_creatorid AS psa_thread_creatorid + , m_messageid, m_creationdate, 'Comment' + FROM message p, post_all psa + WHERE 1=1 AND p.m_c_replyof = psa.psa_messageid + AND m_creationdate BETWEEN :startDate AND :endDate + ) + SELECT p.p_personid AS "person.id" + , p.p_firstname AS "person.firstName" + , p.p_lastname AS "person.lastName" + , count(DISTINCT psa.psa_threadid) AS threadCount + , count(DISTINCT psa.psa_messageid) AS messageCount + FROM person p LEFT JOIN post_all psa ON ( + 1=1 AND p.p_personid = 
psa.psa_thread_creatorid + AND psa_creationdate BETWEEN :startDate AND :endDate + ) + GROUP BY p.p_personid, p.p_firstname, p.p_lastname + ORDER BY messageCount DESC, p.p_personid + LIMIT 100; + ``` + + The same query written in Cypher, a language designed especially for graphs: + + ```Cypher + //Cypher + MATCH (person:Person)<-[:HAS_CREATOR]-(post:Post)<-[:REPLY_OF*0..]-(reply:Message) + WHERE post.creationDate >= $startDate AND post.creationDate <= $endDate + AND reply.creationDate >= $startDate AND reply.creationDate <= $endDate + RETURN + person.id, person.firstName, person.lastName, count(DISTINCT post) AS threadCount, + count(DISTINCT reply) AS messageCount + ORDER BY + messageCount DESC, person.id ASC + LIMIT 100 + ``` + +- Graph traversal (corresponding to Join in SQL) is much more efficient, because the storage and query engines are designed specifically for the structure of the graph. + +- Graph databases have a wide range of application scenarios. Examples include data integration (knowledge graphs), personalized recommendation, fraud and threat detection, risk analysis and compliance, identity verification, IT infrastructure management, supply chain and logistics, social network research, etc. + +- According to the literature [^Ubiquity], the fields that use graph technology are (from most to least): information technology (IT), research in academia, finance, laboratories in industry, government, healthcare, defense, pharmaceuticals, retail and e-commerce, transportation, telecommunications, and insurance. + +[^Ubiquity]: https://arxiv.org/abs/1709.03188 + +- In 2019, according to a Gartner survey (among some 500 respondents), 27% of customers were using graph databases and another 20% planned to use them. + +## RDF + +This topic does not discuss the RDF data model due to space limitations. 
\ No newline at end of file diff --git a/docs-2.0/1.introduction/0-1-graph-database.md b/docs-2.0/1.introduction/0-1-graph-database.md new file mode 100644 index 00000000000..ded80a02148 --- /dev/null +++ b/docs-2.0/1.introduction/0-1-graph-database.md @@ -0,0 +1,241 @@ +# Market overview of graph databases + +Now that we have discussed what a graph is, let's move on to graph databases, which are built on graph theory and the property graph model. + +Different graph databases may differ slightly in terminology, but in the end they all talk about vertices, edges, and properties. More advanced features, such as labels, indexes, constraints, TTL, long-running tasks, stored procedures, and UDFs, vary significantly from one graph database to another. + +Graph databases use graphs to store data, and graphs are among the most flexible, high-performance data structures. A graph database is a storage engine specifically designed to store and retrieve large amounts of information, which efficiently stores data as vertices and edges and allows high-performance retrieval and querying of these vertex-edge structures. Properties can also be added to these vertices and edges. + +## Third-party services market predictions + +### DB-Engines ranking + +According to DB-Engines.com, the world's leading database ranking site, graph databases have been the fastest growing database category since 2013 [^dbe]. + +The site tracks the popularity of each category using several metrics, including search-engine trends (such as Google), technical topics discussed on major IT forums and social networking sites, and job-posting changes on job boards. 371 database products are included in the site, divided into 12 categories. Of these 12 categories, graph databases are growing much faster than any other.
+ +![Image](https://docs-cdn.nebula-graph.com.cn/books/images/db-rankings.png) + +[^dbe]: https://db-engines.com/en/ranking_categories + +### Gartner’s predictions + +Gartner, one of the world's top think tanks, identified graph databases as a major business intelligence and analytics technology trend as early as 2013 [^Gartner1]. At that time, big data was as hot as ever, and data scientist was a sought-after position. + +![Image](https://docs-cdn.nebula-graph.com.cn/books/images/gartner.jpg) + +[^Gartner1]: https://www.yellowfinbi.com/blog/2014/06/yfcommunitynews-big-data-analytics-the-need-for-pragmatism-tangible-benefits-and-real-world-case-165305 + +More recently, graph databases and related graph technologies were ranked among the Top 10 Data and Analytics Trends for 2021 [^Gartner2]. + +![Image](https://images-cdn.newscred.com/Zz01NWM5ZDE3YzcxM2UxMWViODBhMDE5NTExMjNjOTZmZQ==) + +!!! quote "Trend 8: Graph Relates Everything" + + Graphs form the foundation of many modern data and analytics capabilities to find relationships between people, places, things, events, and locations across diverse data assets. D&A leaders rely on graphs to quickly answer complex business questions which require contextual awareness and an understanding of the nature of connections and strengths across multiple entities. + + Gartner predicts that by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision-making across the organization. + +[^Gartner2]: https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021/ + +It can be noted that Gartner's predictions match the DB-Engines ranking well: typically there is a period of rapid, bubble-like growth, then a plateau, followed by a new bubble period driven by the emergence of new technologies, then another plateau, and so on in a spiral.
+ +### Market size of graph databases + +According to statistics and forecasts from Verified Market Research [^ver], Facts & Factors [^fnf], MarketsandMarkets [^mam], and Gartner [^gar], the global graph database market is expected to grow from about USD 0.8 billion in 2019 to USD 3-4 billion by 2026, at a Compound Annual Growth Rate (CAGR) of about 25%, which corresponds to about 5%-10% of the global database market. + +![Image](https://www.verifiedmarketresearch.com/wp-content/uploads/2020/10/Graph-Database-Market-Size.jpg) + +[^ver]: https://www.verifiedmarketresearch.com/product/graph-database-market/ + +[^fnf]: https://www.globenewswire.com/news-release/2021/01/28/2165742/0/en/Global-Graph-Database-Market-Size-Share-to-Exceed-USD-4-500-Million-By-2026-Facts-Factors.html + +[^mam]: https://www.marketsandmarkets.com/Market-Reports/graph-database-market-126230231.html + +[^gar]: https://www.gartner.com/en/newsroom/press-releases/2019-07-01-gartner-says-the-future-of-the-database-market-is-the + +## Market participants + +### Neo4j, the pioneer of (first generation) graph databases + +Although graph-like data models and products (e.g., CODASYL [^DDIA]) had appeared as early as the 1970s, and graph query languages such as G/G+ were proposed in the 1980s [^Glang], it is Neo4j, the main pioneer in this market, that really made the concept of graph databases popular. Even the two key terms, (labeled) property graphs and graph databases, were first introduced and practiced by Neo4j. + +[^DDIA]: https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321 + +[^Glang]: I. F. Cruz, A. O. Mendelzon, and P. T. Wood. A Graphical Query Language Supporting Recursion. In Proceedings of the Association for Computing Machinery Special Interest Group on Management of Data, pages 323–330. ACM Press, May 1987. + +!!! 
info "Source" + + This section on the history of Neo4j and the graph query language it created, Cypher, is largely excerpted from the ISO WG3 paper *An overview of the recent history of Graph Query Languages* [^Tobias2018] and from [^Glang]. To take into account the latest two years of development, the content has been abridged and updated by the authors of this book. + +!!! note "About GQL (Graph Query Language) and the development of an International Standard" + + Readers familiar with databases are probably aware of the Structured Query Language, SQL. By using SQL, people access databases in a way that is close to natural language. Before SQL was widely adopted and standardized, the market for relational databases was very fragmented. Each vendor's product had a completely different access method. Developers of the database product itself, developers of the tools surrounding the database product, and end users of the database all had to learn each product separately. When the SQL-89 standard was released in 1989, the entire relational database market quickly converged on SQL-89, which greatly reduced the learning costs for all of these groups. + + GQL (Graph Query Language) assumes a role similar to SQL in the field of graph databases: users interact with graphs through GQL. Unlike SQL, however, GQL does not yet have an international standard. The two mainstream graph languages are Neo4j's Cypher and Apache TinkerPop's Gremlin. The former is a declarative query language (DQL): it tells the system "what to do", regardless of "how to do it". The latter is an imperative query language (IQL): it explicitly specifies the system's actions. + + The GQL International Standard is in the process of being developed. + +[^Tobias2018]: "An overview of the recent history of Graph Query Languages". Authors: Tobias Lindaaker, U.S. 
National Expert. Date: 2018-05-14 + +#### Overview of the recent history of graph databases + +- In 2000, the idea of modeling data as a network came to the founders of Neo4j. +- In 2001, Neo4j developed the earliest core part of the code. +- In 2007, Neo4j started operating as a company. +- In 2009, Neo4j borrowed XPath as a graph query language. Gremlin [^Gremlin] is also similar to XPath. +- In 2010, Marko Rodriguez, a Neo4j employee, used the term Property Graph to describe the data model of Neo4j and TinkerPop (Gremlin). +- In 2011, the first public version, Neo4j 1.4, was released, along with the first version of Cypher. +- In 2012, Neo4j 1.8 added the ability to write to the graph with Cypher. Neo4j 2.0 added labels and indexes. Cypher became a declarative graph query language. +- In 2015, Cypher was opened up by Neo4j through the openCypher project. +- In 2017, the ISO WG3 organization discussed how to use SQL to query property graph data. +- In 2018, starting from the Neo4j 3.5 GA release, the core of the Neo4j Enterprise Edition was no longer open source. +- In 2019, ISO officially established two projects, ISO/IEC JTC 1 N 14279 and ISO/IEC JTC 1/SC 32 N 3228, to develop an international standard for graph database languages. +- In 2021, the $325 million Series F funding round for Neo4j marked the largest investment round in database history. + +[^Gremlin]: Gremlin is a graph language developed based on [Apache TinkerPop](https://tinkerpop.apache.org/). + +#### The early history of Neo4j + +The property graph data model was first conceived in 2000. The founders of Neo4j were developing a media management system whose schema was often changed. To adapt to such changes, Peter Neubauer, one of the founders, wanted to model the system as a conceptually interconnected network. A group of graduate students at the Indian Institute of Technology Bombay implemented the earliest prototypes. 
Emil Eifrém, the Neo4j co-founder, and these students spent a week extending Peter's idea into a more abstract model: vertices were connected by relationships, and key-value pairs were used as properties of vertices and relationships. They developed a Java API to interact with this data model and implemented an abstraction layer on top of a relational database. + +Although this network model greatly improved productivity, its performance was poor. So Johan Svensson, another Neo4j co-founder, put a lot of effort into implementing a native data management system, which became Neo4j. For the first few years, Neo4j was successful as an in-house product. In 2007, the intellectual property of Neo4j was transferred to an independent database company. + +In the first public release of Neo4j (Neo4j 1.4, 2011), the data model consisted of vertices and typed edges, and vertices and edges could have properties. The early versions of Neo4j did not have indexes; applications had to construct their own search structure starting from a root vertex. Because this was very unwieldy for applications, Neo4j 2.0 (December 2013) introduced a new concept, labels on vertices. Based on labels, Neo4j can index some predefined vertex properties. + +"Vertices", "relationships", "properties", "a relationship has exactly one type", and "a vertex can have zero or more labels": together, these concepts form the data model definition of Neo4j property graphs. With the later addition of indexing, Cypher became the main way of interacting with Neo4j, because application developers only need to focus on the data itself, not on the hand-built search structures mentioned above. + +#### The creation of Gremlin + +Gremlin is a graph query language based on Apache TinkerPop, close in style to a sequence of function (procedure) calls. Initially, Neo4j was queried through its Java API: applications could embed the query engine as a library and then use the API to query the graph. 
+ +The early Neo4j employees Tobias Lindaaker, Ivarsson, Peter Neubauer, and Marko Rodriguez used XPath for graph queries, with Groovy providing loop structures, branching, and computation. This was the original prototype of Gremlin, whose first version was released in November 2009. + +Later, Marko found many problems with using two different parsers (XPath and Groovy) at the same time and changed Gremlin into a Domain Specific Language (DSL) based on Groovy. + +#### The creation of Cypher + +Gremlin, like Neo4j's Java API, was originally intended as a procedural way of expressing how to query databases. It uses shorter syntax to query, and can remotely access databases over the network. But the procedural nature of Gremlin requires users to know the best way to query for results, which is still burdensome for application developers. Over the last 30 years, the declarative language SQL has been a great success: SQL separates the declarative description of what data to get from how the engine gets it. So the Neo4j engineers wanted to develop a declarative graph query language. + +In 2010, Andrés Taylor joined Neo4j as an engineer. Inspired by SQL, he started a project to develop a new graph query language, which was released with Neo4j 1.4 in 2011. That language is the ancestor of most graph query languages in use today: Cypher. + +Cypher's syntax is based on using ASCII art to describe graph patterns. This approach originally came from comments in the source code describing graph patterns. An example can be seen as follows. + +![Image](https://www-cdn.nebula-graph.com.cn/nebula-blog/the-origin-of-cypher.png) + +Simply put, ASCII art uses printable characters to describe vertices and edges. Cypher syntax uses `()` for vertices and `-[]->` for edges. `(query)-[modeled as]->(drawing)` represents a simple graph relationship (also called a graph pattern): the starting vertex `query`, the destination vertex `drawing`, and the edge `modeled as`. 
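+ +As an illustrative sketch (the `Person` and `Movie` labels and the `ACTED_IN` edge type here are hypothetical examples, not taken from the history above), a complete Cypher query built from such ASCII-art patterns might look like this: + + ```Cypher + // (p)-[:ACTED_IN]->(m) reads left to right as "p acted in m" + MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) + RETURN m.title + ORDER BY m.title + ``` + +Each `()` in the pattern is matched against vertices and each `-[]->` against directed edges, so the query reads almost like the drawing itself.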
+ +The first version of Cypher implemented graph reading, but users had to specify the vertices from which to start querying. Only from these vertices could graph pattern matching be performed. + +In a later version, Neo4j 1.8, released in October 2012, Cypher added the ability to modify graphs. However, queries still needed to specify which vertices to start from. + +In December 2013, Neo4j 2.0 introduced the concept of labels, which essentially act as indexes. This allows the query engine to use an index to select the vertices matched by a pattern, without requiring the user to specify a starting vertex. + +With the popularity of Neo4j, Cypher gained a wide community of developers and is widely used in a variety of industries. It is still the most popular graph query language. + +In September 2015, Neo4j established the openCypher Implementers Group (oCIG) to open source Cypher as openCypher, to govern and advance the evolution of the language itself through open source. + +#### Subsequent events + +Cypher has inspired a series of graph query languages, including: + +- In 2015, Oracle released PGQL, the graph language used by its graph engine PGX. +- In 2016, the Linked Data Benchmark Council (LDBC), an industry-renowned benchmarking organization for graph performance, released G-CORE. +- In 2018, RedisGraph, a Redis-based graph library, adopted Cypher as its graph language. +- In 2019, the international standards organization ISO started two projects to initiate the process of developing an international standard for graph languages, based on existing industry achievements such as openCypher, PGQL, GSQL [^GSQL], and G-CORE. +- In 2019, Nebula Graph released the Nebula Graph Query Language (nGQL), based on openCypher. + +[^GSQL]: https://docs.tigergraph.com/dev/gsql-ref 
+
+![Image](https://docs-cdn.nebula-graph.com.cn/books/images/langhis.jpg "The history of graph languages")
+
+### Distributed graph databases
+
+From 2005 to 2010, with the release of Google's cloud computing "troika" of papers (GFS, MapReduce, and Bigtable), various distributed architectures became increasingly popular, including Hadoop and Cassandra, both of which were open-sourced. Several implications are as follows:
+
+1. The technical and cost advantages of distributed systems over single machines (e.g. Neo4j) or small machines became more obvious as the volume of data and computation kept increasing. Distributed systems allow applications to access thousands of machines as if they were a local system, without the need for much modification at the code level.
+
+2. The open-source approach allows more people, including code developers, data scientists, and product managers, to learn about emerging technologies and give feedback to the community in a more cost-effective way.
+
+Strictly speaking, Neo4j also offers several distributed capabilities, but they are quite different from what the industry usually means by a distributed system.
+
+- Neo4j 3.x requires that the full data set must be stored on a single machine. Although it supports full replication and high availability across multiple machines, the data cannot be sliced into different subgraphs.
+
+  ![](https://docs-cdn.nebula-graph.com.cn/books/images/causal.png)
+
+- Neo4j 4.x stores parts of the data on different machines (subgraphs), and then the application layer assembles the data in a certain way (called Fabric)[^fosdem20] and distributes the reads and writes to each machine. This approach requires a lot of involvement and work from the application-layer code, for example, deciding which machines the different subgraphs should be placed on, and how to assemble the partial results obtained from each machine into the final result.
+
+  ![](https://dist.neo4j.com/wp-content/uploads/20200131191103/Neo4j-Fabric-LDBC-sharding-scheme.jpg)
+
+  [^fosdem20]: https://neo4j.com/fosdem20/
+
+  The style of its syntax is as follows:
+
+  ```Cypher
+  USE graphA
+  MATCH (movie:Movie)
+  RETURN movie.title AS title
+  UNION
+  USE graphB
+  MATCH (movie:Movie)
+  RETURN movie.title AS title
+  ```
+
+  ![](https://docs-cdn.nebula-graph.com.cn/books/images/fabric.png)
+
+#### The second generation (distributed) graph database: Titan and its successor JanusGraph
+
+In 2011, Aurelius was founded to develop an open-source distributed graph database called Titan [^titan]. By Titan's first official release in 2015, its backend could support many major distributed storage systems (e.g. Cassandra, HBase, Elasticsearch, BerkeleyDB) and reuse many of the conveniences of the Hadoop ecosystem, with Gremlin as a unified query language on the frontend. This made it easy for programmers to use and develop Titan and to participate in the community. Large-scale graphs could be sharded and stored on HBase or Cassandra (which were relatively mature distributed storage solutions at the time), and the Gremlin language was relatively full-featured, though slightly verbose. The whole solution was competitive at that time (2011-2015).
+
+[^titan]: https://github.com/thinkaurelius/titan
+
+The following picture shows the growth of Titan and Neo4j stars on GitHub.com from 2012 to 2015.
+
+![Image](https://docs-cdn.nebula-graph.com.cn/books/images/titan-2015-neo4j.png)
+
+After Aurelius (Titan) was acquired by DataStax in 2015, Titan was gradually transformed into a closed-source commercial product (DataStax Enterprise Graph).
+
+After the acquisition of Aurelius (Titan), there was still strong demand for an open-source distributed graph database, and there were not many mature and active products on the market. In the era of big data, data was still being generated in a steady stream, far faster than Moore's Law.
The Linux Foundation, along with some technology giants (Expero, Google, GRAKN.AI, Hortonworks, IBM, and Amazon), forked the original Titan project and restarted it as a new project, JanusGraph[^Janus]. Most of the community work, including development, testing, release, and promotion, has gradually shifted to the new JanusGraph.
+
+[^Janus]: https://github.com/JanusGraph/janusgraph
+
+The following graph shows the evolution of daily code commits (pull requests) for the two projects, from which we can see:
+
+1. Although Aurelius (Titan) still showed some activity in its open-source code after its acquisition in 2015, the growth rate slowed down significantly. This reflects the strength of the community.
+
+2. After the new project was started in January 2017, its community quickly became active, surpassing in just one year the number of pull requests that Titan had accumulated over the previous 5 years. At the same time, development of the open-source Titan came to a halt.
+
+  ![Image](https://docs-cdn.nebula-graph.com.cn/books/images/titan-janus-dev.png)
+
+#### Famous products of the same period: OrientDB, TigerGraph, ArangoDB, and DGraph
+
+In addition to JanusGraph, managed by the Linux Foundation, more vendors have joined the overall market. Some distributed graph databases developed by commercial companies use different data models and access methods.
+
+The following table only lists the main differences.
+
+| Vendors | Creation time | Core product | Open source protocol | Data model | Query language |
+| ----- | ----- | ----- | ----- | ----- | ----- |
+| OrientDB LTD (acquired by SAP in 2017) | 2011 | OrientDB | Open source | Document + KV + Graph | OrientDB SQL (SQL-based with extended graph abilities) |
+| GraphSQL (later renamed TigerGraph) | 2012 | TigerGraph | Commercial version | Graph (analysis) | GraphSQL (similar to SQL) |
+| ArangoDB GmbH | 2014 | ArangoDB | Apache License 2.0 | Document + KV + Graph | AQL (simultaneous operation of documents, KVs, and graphs) |
+| DGraph Labs | 2016 | DGraph | Apache Public License 2.0 + Dgraph Community License | Originally RDF, later changed to GraphQL | GraphQL+- |
+
+#### Traditional giants: Microsoft, Amazon, and Oracle
+
+In addition to vendors focused on graph products, traditional giants have also entered the graph database field.
+
+Microsoft Azure Cosmos DB[^cosmos] is a multi-model cloud database service on the Microsoft cloud that provides SQL, document, graph, key-value, and other capabilities.
+Amazon AWS Neptune[^neptune] is a graph database cloud service provided by AWS that supports two data models: property graphs and RDF.
+Oracle Graph[^Oracle] is the relational database giant Oracle's product in the field of graph technology and graph databases.
+
+[^cosmos]: https://azure.microsoft.com/en-us/free/cosmos-db/
+
+[^neptune]: https://aws.amazon.com/cn/neptune/
+
+[^Oracle]: https://www.oracle.com/database/graph/
+
+#### Nebula Graph, a new generation of open-source distributed graph databases
+
+In the following topics, we will formally introduce Nebula Graph, a new generation of open-source distributed graph databases.
diff --git a/docs-2.0/1.introduction/0-2.relates.md b/docs-2.0/1.introduction/0-2.relates.md
new file mode 100644
index 00000000000..71863893c88
--- /dev/null
+++ b/docs-2.0/1.introduction/0-2.relates.md
@@ -0,0 +1,253 @@
+# Related technologies
+
+This topic introduces databases and graph-related technologies that are closely related to distributed graph databases.
+
+## Databases
+
+### Relational databases
+
+A relational database is a database that uses a relational model to organize data. The relational model is a two-dimensional table model, and a relational database consists of two-dimensional tables and the relationships between them. When it comes to relational databases, most people think of MySQL, one of the most popular database management systems. It supports database operations using the most common structured query language (SQL) and stores data in the form of tables, rows, and columns. This approach to storing data is derived from the relational data model proposed by Edgar Frank Codd in 1970.
+
+In a relational database, a table can be created for each type of data to be stored. For example, the player table is used to store all player information, and the team table is used to store team information. Each row of data in a SQL table must contain a primary key, which is a unique identifier for that row. Generally, the primary key is a self-incrementing ID field. Relational databases have served the computer industry very well since their inception and will continue to do so for a long time to come.
+
+If you have used Excel, WPS, or other similar applications, you have a rough idea of how relational databases work. First, you set up the columns, then you add rows of data under the corresponding columns. You can average or otherwise aggregate the data in a column, much as you would with aggregate functions in a relational database such as MySQL.
Pivot tables in Excel are the equivalent of querying data in MySQL using aggregate functions and CASE statements. An Excel file can have multiple sheets, and a single sheet is equivalent to a single table in MySQL. An Excel file is similar to a MySQL database.
+
+#### Relationships in relational databases
+
+Unlike graph databases, relational databases (or SQL-type databases) also store edges as entities, in dedicated edge tables. Two tables are created, player and team, and then player_team is created as an edge table. Edge tables usually relate two other tables; for example, here the edge table player_team relates the player table and the team table, and is produced by joining them.
+
+![image](https://user-images.githubusercontent.com/42762957/91702816-dc872200-ebab-11ea-8b36-577c29a3fe7a.png)
+
+This way of storing edges is not a big problem when associating small data sets, but problems arise when there are too many relationships in a relational database. Specifically, when you want to query just one player's teammates, you have to join all the data in the tables and then filter out all the data you don't need, which puts a huge strain on the relational database once your dataset reaches a certain size. If you want to associate multiple different tables, the system may not be able to respond before the join explodes.
+
+#### Origins of relational databases
+
+As mentioned above, the relational data model was first proposed by Edgar Frank Codd, an IBM engineer, in 1970. Codd wrote several papers on database management systems that addressed the potential of the relational data model. The relational data model does not rely on linked lists of data (as mesh or hierarchical models do) but relies more on data sets. Using the mathematical method of tuple calculus, he argued that these data sets could perform the same tasks as a navigational database.
The only requirement was that the relational data model needed a suitable query language to guarantee the consistency requirements of the database. This became the inspiration for declarative query languages such as the Structured Query Language (SQL). IBM's System R was one of the first implementations of such a system. But Software Development Laboratories, a small company founded by ex-IBM people and one illustrious Mr. Larry Ellison, beat IBM to the market with the product that would become known as Oracle.
+
+Since the relational database was a trendy term at the time, many database vendors preferred to use it in their product names, even though their products were not actually relational. To prevent this and reduce the misuse of the relational data model, Codd introduced the famous Codd's 12 Rules. All relational data systems must follow Codd's 12 Rules.
+
+### NoSQL databases
+
+Graph databases are not the only alternative that can overcome the shortcomings of relational databases. There are many non-relational database products on the market that can be called NoSQL. The term NoSQL was first introduced in the late 1990s and can be interpreted as "not SQL" or "not only SQL". For the sake of understanding, NoSQL can be interpreted as "non-relational database" here. Unlike relational databases, the data storage and retrieval mechanisms provided by NoSQL databases are not modeled on table relationships. NoSQL databases can be divided into four categories:
+
+- Key-value Data Store
+- Columnar Store
+- Document Store
+- Graph Store
+
+The following describes the four types of NoSQL databases.
+
+#### Key-value Data Store
+
+Key-value databases store data as unique key-value pairs. Unlike relational databases, key-value stores do not have tables and columns. A key-value database itself is like a large table with many columns (i.e., keys). In a key-value database, data is stored and queried by key, usually implemented with a hash table.
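The access pattern of a key-value store can be sketched in a few lines of Python, with a plain dictionary standing in for the hash table (a toy illustration, not the API of any particular product):

```python
# A toy key-value store: values are stored and fetched only by key.
class KeyValueStore:
    def __init__(self):
        self._data = {}  # backed by a hash table (a Python dict)

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Lookups go through the key only; there is no query on values.
        return self._data.get(key, default)


store = KeyValueStore()
store.put("player:100", "Tim Duncan")
print(store.get("player:100"))  # prints: Tim Duncan
```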
This is much simpler than a traditional SQL database, and for some web applications, it is sufficient.
+
+The advantage of the key-value model for IT systems is that it is simple and easy to deploy. In most cases, this type of storage works well for unrelated data. If you are just storing data without querying it, there is no problem with this storage method. However, if you need to query or update only some of the values, the key-value model becomes inefficient. Common key-value databases include Redis, Voldemort, and Oracle BDB.
+
+#### Columnar Store
+
+A columnar store has many similarities to a key-value store, because a columnar store still uses keys for storage and retrieval. The difference is that in a columnar store database, the column is the smallest storage unit, and each column consists of a key, a value, and a timestamp used for version control and conflict resolution. Timestamps are particularly useful when scaling in a distributed manner, as they can be used to locate expired data when the database is updated. Because of its good scalability, the columnar store is suitable for very large data sets. Common columnar storage databases include HBase, Cassandra, HadoopDB, etc.
+
+#### Document Store
+
+A document store is a key-value-based database, but with enhanced functionality. Data is still stored as keys, but the values in a document store are structured documents, not just strings or single values. Because of this increased information structure, document stores are able to perform more optimized queries and make data retrieval easier. Therefore, document stores are particularly well suited for storing, indexing, and managing document-oriented data or similar semi-structured data.
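For example, a player document in such a store might look like the following JSON (the field names here are hypothetical, chosen only for illustration):

```json
{
  "_id": "player100",
  "name": "Tim Duncan",
  "age": 42,
  "teams": [
    { "name": "Spurs", "start_year": 1997, "end_year": 2016 }
  ]
}
```

Because the value carries internal structure, the database can index and query individual fields such as `name` or `teams.name`, which a plain key-value store cannot do.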
+
+Technically speaking, as a semi-structured unit of information, a document in a document store can be any available document format, including XML, JSON, YAML, etc., depending on the design of the database vendor. For example, JSON is a common choice. While JSON is not the best choice for structured data, JSON-type data can be used in both front-end and back-end applications. Common document storage databases include MongoDB, CouchDB, Terrastore, etc.
+
+#### Graph Store
+
+The last class of NoSQL databases is graph databases. Nebula Graph is also a graph database. Although graph databases are NoSQL databases too, they are fundamentally different from the NoSQL databases mentioned above. Graph databases store data in the form of vertices, edges, and properties. Their advantages include high flexibility, support for complex graph algorithms, and the ability to build complex relationship graphs. We will discuss graph databases in detail in subsequent topics. For now, you just need to know that a graph database is a NoSQL type of database. Common graph databases include Nebula Graph, Neo4j, OrientDB, etc.
+
+## Graph-related technologies
+
+Take a look at a panoramic view of graph technology in 2020 [^lan].
+
+[^lan]: https://graphaware.com/graphaware/2020/02/17/graph-technology-landscape-2020.html
+
+![Image](https://raw.githubusercontent.com/GraphCoding/graph-technology-landscape/master/GraphTechnologyLandscape.jpg)
+
+There are many technologies related to graphs, which can be broadly classified into these categories:
+
+- Infrastructure: including graph databases, graph computing (processing) engines, graph deep learning, cloud services, etc.
+
+- Applications: including visualization, knowledge graphs, anti-fraud, cyber security, social networks, etc.
+
+- Development tools: including graph query languages, modeling tools, development frameworks, and libraries.
+
+- E-books [^info] and conferences, etc.
+
+[^info]: Electronic copies are available for learning purposes by contacting [Author](mailto:min.wu@vesoft.com).
+
+### Graph language
+
+In the previous topic, we introduced the history of graph languages. In this section, we classify the functions of graph languages.
+
+- Nearest neighbor query (NNS): Query the neighboring edges, neighbors, or K-hop neighbors of a vertex.
+
+- Subgraph matching: Find one or all subgraphs that satisfy a given graph pattern. This problem is closely related to subgraph isomorphism: determining whether two seemingly different graphs are actually identical[^subiso], as shown below.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/samegraph.png)
+
+[^subiso]: https://en.wikipedia.org/wiki/Graph_isomorphism
+
+- Reachability (connectivity) problems: The most common reachability problem is the shortest path problem. Such problems are usually described in terms of Regular Path Queries: a path formed by a series of connected vertices that needs to satisfy some regular expression.
+
+- Analytic problems: These involve aggregation operators, such as Average, Count, Max, and Vertex Degree, and measures such as the distance between every pair of vertices or the degree of interaction between a vertex and other vertices.
+
+### Graph databases and graph processing systems
+
+A graph system usually includes a complex data pipeline [^biggraph]. From the data source (the left side of the picture below) to the processing output (the right side), multiple data processing steps and systems are used, such as the ETL module, the graph OLTP module, the OLAP module, BI, and knowledge graphs.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/graphpipe.png)
+
+[^biggraph]: The Future is Big Graphs! A Community View on Graph Processing Systems. https://arxiv.org/abs/2012.06171
+
+Graph databases and graph processing systems have different origins and specialties (and weaknesses).
+
+- (Online) Graph databases are designed for persistent storage management of graphs and efficient subgraph operations. Hard disks and networks are the target operating devices, and physical/logical data mapping, data integrity, and (fault) consistency are the main goals. Each request typically involves only a small part of the full graph and can usually be served by a single server. Request latency is usually in milliseconds or seconds, and request concurrency is typically in the thousands or hundreds of thousands. The early Neo4j was one of the origins of the graph database space.
+
+- (Offline) Graph processing systems are for high-volume, parallel, iterative processing and analysis of the full graph. Memory and networks are the target operating devices. Each request involves all graph vertices and requires all servers to participate in its completion. The latency of a single request ranges from minutes to hours (or days), and request concurrency is in the single digits. Google's Pregel [^Pregel] represents the typical origin of graph processing systems. Its vertex-centric programming abstraction and BSP operational model constitute a programming paradigm with a more graph-friendly API abstraction than the earlier Hadoop MapReduce.
+
+[^Pregel]: G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the International Conference on Management of data (SIGMOD), pages 135–146, New York, NY, USA, 2010. ACM
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/databaseandprocess.png)
+[^iga]
+
+[^iga]: https://neo4j.com/graphacademy/training-iga-40/02-iga-40-overview-of-graph-algorithms/
+
+### Graph sharding methods
+
+For large-scale graph data, it is difficult to store it all in the memory of a single server; sometimes even the graph structure alone does not fit.
Increasing the capacity of a single server usually causes its cost to rise exponentially.
+
+As the volume of data increases, for example to 100 billion records, it already exceeds the capacity of any commercially available server on the market.
+
+Another option is to shard the data and place each shard on a different server to increase reliability and performance. For NoSQL systems, such as key-value or document systems, the sharding method is intuitive and natural: each record or data unit can usually be placed on a different server based on its key or docID.
+
+However, sharding a data structure like a graph is usually less intuitive, because a graph is usually "fully connected" in the sense that each vertex can reach any other vertex within about six hops.
+
+And it has been theoretically proven that the graph sharding problem is NP-hard.
+
+When the entire graph is distributed across multiple servers, cross-server network access latency is 10 times higher than the hardware (memory) access time inside a single server. Therefore, in some depth-first traversal scenarios, a large number of cross-network accesses occur, resulting in extremely high overall latency.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/lessrpc.png)[^gpml]
+
+[^gpml]: https://livebook.manning.com/book/graph-powered-machine-learning/welcome/v-8/
+
+Usually, graphs have a clear power-law distribution: a small number of vertices have much denser neighboring edges than the average vertex. While processing these vertices within one server can reduce cross-network access, it also means that these servers will be far more heavily loaded than average.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/Power_Law_Distribution.png)
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/singleserver.png)
+
+The common graph sharding methods are as follows:
+
+- Application-level sharding: The application layer knows and controls which shard each vertex and edge should be located on, generally based on the type of the vertices and edges. A set of vertices of one type is placed on one shard, and a set of vertices of another type is placed on another shard. Of course, for high reliability, each shard itself can also be replicated. When the application uses the data, it fetches the desired vertices and edges from each shard, and then, on the application side (or on some proxy server), assembles the fetched data into the final result. This is typically represented by Neo4j 4.x Fabric.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/neo4j4x.png)
+
+- Using a distributed cache layer: Add a memory cache layer on top of the hard disk, cache important portions of the shards and data, and preheat that cache.
+
+- Adding read-only replicas or views: Add read-only replicas or create views for some of the graph shards, and serve the heavier read load from these replica servers.
+
+- Performing fine-grained graph sharding: Form many small partitions of vertices and edges instead of one large shard, and then place the more strongly correlated partitions on the same server as much as possible[^arr].
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/smartgraph.png)
+
+[^arr]: https://www.arangodb.com/learn/graphs/using-smartgraphs-arangodb/
+
+A mixture of these approaches is also used in specific engineering practices.
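The placement idea underlying such sharding schemes can be sketched in Python as follows. This is a toy hash-based strategy; the shard count and the choice of CRC32 are assumptions made only for illustration, not how any particular database implements placement:

```python
import zlib

NUM_SHARDS = 4  # an arbitrary number of servers for illustration

def shard_of_vertex(vertex_id: str) -> int:
    # CRC32 is deterministic across processes, unlike Python's built-in
    # hash(), so the same vertex always lands on the same shard.
    return zlib.crc32(vertex_id.encode("utf-8")) % NUM_SHARDS

def shard_of_edge(src_id: str, dst_id: str) -> int:
    # Storing an edge on the shard of its source vertex keeps the
    # traversal of outgoing edges from that vertex on a single server.
    return shard_of_vertex(src_id)

# Every vertex maps to exactly one of the NUM_SHARDS servers.
placement = {v: shard_of_vertex(v) for v in ("player100", "player101", "team204")}
```

Hash placement balances storage well but ignores graph topology; fine-grained partitioning tries to go further by co-locating strongly correlated partitions.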
Usually, offline graph processing systems perform some degree of graph preprocessing through an ETL process to improve locality, while online graph database systems usually choose a periodic data rebalancing process to improve data locality.
+
+### Technical challenges
+
+The literature [^Ubiquity] provides a thorough survey of graph technologies and their challenges. The top graph technology challenges are listed below.
+
+[^Ubiquity]: https://arxiv.org/abs/1709.03188
+
+- Scalability: Loading and updating big graphs, performing graph computation and graph traversal, and handling triggers and supernodes.
+- Visualization: Customizable layouts, rendering and displaying big graphs, and displaying dynamic updates.
+- Query language and programming API: Language expressiveness, standards compatibility, compatibility with existing systems, design of subqueries, and associative queries across multiple graphs.
+- Faster graph algorithms.
+- Ease of use (configuration and usage).
+- Performance metrics and testing.
+- General graph technology software (e.g., handling offline, online, and streaming computations).
+- ETL.
+- Debugging and testing.
+
+### Open-source graph tools on single machines
+
+There is a common misconception about graph databases that any data access involving graph structures needs to be stored in a graph database.
+
+When the amount of data is not large, a single machine's memory is enough to store the data. You can use some single-machine open-source tools to store tens of millions of vertices and edges.
+
+- JGraphT[^JGraphT]: A well-known open-source Java graph theory library, which implements a considerable number of efficient graph algorithms.
+
+[^JGraphT]: https://jgrapht.org/
+
+- igraph[^igraph]: A lightweight and powerful library, supporting R, Python, and C++.
+
+[^igraph]: https://igraph.org/
+
+- NetworkX[^NetworkX]: The first choice for data scientists doing graph theory analysis.
+
+[^NetworkX]: https://networkx.org/
+
+- Cytoscape[^Cytoscape]: A powerful visual open-source graph analysis tool.
+
+[^Cytoscape]: https://cytoscape.org/
+
+- Gephi[^Gephi]: A powerful visual open-source graph analysis tool.
+
+[^Gephi]: https://gephi.org/
+
+- arrows.app[^Arrow]: A simple diagramming tool for visually generating Cypher statements.
+
+[^Arrow]: https://arrows.app/
+
+### Industry databases and benchmarks
+
+#### LDBC
+
+LDBC[^LDBC] (Linked Data Benchmark Council) is a non-profit organization composed of hardware and software giants such as Oracle and Intel and mainstream graph database vendors such as Neo4j and TigerGraph. It develops benchmark guides and publishes test results for graph systems, and it is highly influential in the industry.
+
+[^LDBC]: https://github.com/ldbc/ldbc_snb_docs
+
+SNB (Social Network Benchmark) is one of the benchmarks developed by the Linked Data Benchmark Council (LDBC) for graph databases and is divided into two scenarios: interactive query (Interactive) and business intelligence (BI). Its role is similar to that of TPC-C, TPC-H, and other tests for SQL databases: it helps users compare the functions, performance, and capacity of various graph database products.
+
+An SNB dataset simulates the relationships between people and posts in a social network, taking into account the distribution properties of the social network, the activity of people, and other social information.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/ldbc.png)
+
+The standard data size ranges from 0.1 GB (scale factor 0.1) to 1000 GB (sf 1000). Larger data sets of 10 TB and 100 TB can also be generated. The numbers of vertices and edges are as shown below.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/ldbcsf.png)
+
+## Trends
+
+### Graph technologies of different origins and goals are learning from and integrating with each other
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/convergenceofcapability.png)
+
+### The trends in cloud computing place higher demands on scalability
+
+According to Gartner's projections, cloud services have been growing rapidly in both scale and penetration [^cl]. A large amount of commercial software is gradually moving from the completely local and private model of 10 years ago to a cloud-service-based business model.
+One of the major advantages of cloud services is that they offer near-infinite scalability. This requires that software built on cloud infrastructure be able to scale up and down quickly and elastically.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/cloudtrends.png)
+
+[^cl]: https://cloudcomputing-news.net/news/2019/apr/15/public-cloud-soaring-to-331b-by-2022-according-to-gartner/
+
+### Trends in hardware: SSDs will be the mainstream persistent storage device
+
+Hardware determines software architecture. From the 1960s, when Moore's Law was formulated, to the 2000s, when multi-core processors were introduced, hardware trends and speeds have profoundly shaped software architecture. Database systems are mostly designed around "hard disk + memory", high-performance computing systems are mostly designed around "memory + CPU", and distributed systems are designed completely differently for 1 GbE, 10 GbE, and RDMA networks.
+
+Graph traversal is characterized by random access. Early graph database systems adopted a large-memory + HDD architecture. By designing appropriate in-memory data structures (B+ trees, hash tables), random access could be served in memory to optimize graph topology traversal, and the random accesses were then converted into sequential reads and writes suitable for HDDs.
The entire software architecture (including the storage and compute layers) had to be based on and built around such IO processes. With the decline in SSD prices [^ssdhdd], SSDs are replacing HDDs as the dominant device. SSDs differ from HDDs in being friendly to random access, having deep IO queues, and offering fast access, whereas HDDs favor highly sequential access, suffer high random-access latency, and are easily damaged mechanically. Redesigning all existing software architectures around SSDs is a heavy historical and technical burden.
+
+![](https://docs-cdn.nebula-graph.com.cn/books/images/ssdhdd.png)
+
+[^ssdhdd]: https://blocksandfiles.com/2021/01/25/wikibon-ssds-vs-hard-drives-wrights-law/
diff --git a/docs-2.0/1.introduction/1.what-is-nebula-graph.md b/docs-2.0/1.introduction/1.what-is-nebula-graph.md
index a62a953a4ee..1c66f951b99 100644
--- a/docs-2.0/1.introduction/1.what-is-nebula-graph.md
+++ b/docs-2.0/1.introduction/1.what-is-nebula-graph.md
@@ -2,7 +2,7 @@
Nebula Graph is an open-source, distributed, easily scalable, and native graph database. It is capable of hosting graphs with hundreds of billions of vertices and trillions of edges, and serving queries with millisecond-latency.
-![Nebula Graph birdview](nebula-birdview.png) +![Nebula Graph birdview](nebula-graph-birdview-3.0.png) ## What is a graph database diff --git a/docs-2.0/1.introduction/3.nebula-graph-architecture/3.graph-service.md b/docs-2.0/1.introduction/3.nebula-graph-architecture/3.graph-service.md index 20cdf632548..8badff0fbbc 100644 --- a/docs-2.0/1.introduction/3.nebula-graph-architecture/3.graph-service.md +++ b/docs-2.0/1.introduction/3.nebula-graph-architecture/3.graph-service.md @@ -84,10 +84,6 @@ In the `nebula-graphd.conf` file, when `enable_optimizer` is set to be `true`, P As shown in the preceding figure, when the `Filter` node is explored, it is found that its children node is `GetNeighbors`, which matches successfully with the pre-defined rules, so a transformation is initiated to integrate the `Filter` node into the `GetNeighbors` node, the `Filter` node is removed, and then the process continues to the next rule. Therefore, when the `GetNeighbor` operator calls interfaces of the Storage layer to get the neighboring edges of a vertex during the execution stage, the Storage layer will directly filter out the unqualified edges internally. Such optimization greatly reduces the amount of data transfer, which is commonly known as filter pushdown. -!!! Note - - Nebula Graph {{ nebula.release }} will not run optimization by default. - ## Executor The Executor module consists of Scheduler and Executor. The Scheduler generates the corresponding execution operators against the execution plan, starting from the leaf nodes and ending at the root node. The structure is as follows. 
diff --git a/docs-2.0/1.introduction/3.nebula-graph-architecture/4.storage-service.md b/docs-2.0/1.introduction/3.nebula-graph-architecture/4.storage-service.md index ff86e34b3ce..4e5eae2a626 100644 --- a/docs-2.0/1.introduction/3.nebula-graph-architecture/4.storage-service.md +++ b/docs-2.0/1.introduction/3.nebula-graph-architecture/4.storage-service.md @@ -70,37 +70,37 @@ Therefore, Nebula Graph develops its own KVStore with RocksDB as the local stora - One Nebula Graph KVStore cluster supports multiple graph spaces, and each graph space has its own partition number and replica copies. Different graph spaces are isolated physically from each other in the same cluster. - + |`SerializedValue`|The serialized value of the key. It stores the property information of the edge.| ### Property descriptions @@ -114,12 +114,11 @@ Since in an ultra-large-scale relational network, vertices can be as many as ten ![data partitioning](https://www-cdn.nebula-graph.com.cn/nebula-blog/DataModel02.png) - -## About executions - -### About dangling edges +### "How to resolve the error `SemanticError: Missing yield clause.`?" -A dangling edge is an edge that only connects to a single vertex and only one part of the edge connects to the vertex. +Starting with Nebula Graph 3.0.0, the statements `LOOKUP`, `GO`, and `FETCH` must output results with the `YIELD` clause. For more information, see [YIELD](../3.ngql-guide/8.clauses-and-options/yield.md). -Dangling edges may appear in Nebula Graph {{ nebula.release }} as the design. And there is no `MERGE` statements of openCypher. The guarantee for dangling edges depends entirely on the application level. For more information, see [INSERT VERTEX](../3.ngql-guide/12.vertex-statements/1.insert-vertex.md), [DELETE VERTEX](../3.ngql-guide/12.vertex-statements/4.delete-vertex.md), [INSERT EDGE](../3.ngql-guide/13.edge-statements/1.insert-edge.md), [DELETE EDGE](../3.ngql-guide/13.edge-statements/4.delete-edge.md). 
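The `YIELD` requirement described in the FAQ above can be sketched as follows; this is a hedged example using the basketballplayer dataset referenced elsewhere in these docs (the vertex ID `"player100"`, the `player` tag, and the `follow` edge type come from that dataset):

```ngql
# Since Nebula Graph 3.0.0, GO, LOOKUP, and FETCH must end with an explicit YIELD.
GO FROM "player100" OVER follow YIELD dst(edge) AS destination;

# FETCH likewise needs YIELD to name its output columns.
FETCH PROP ON player "player100" YIELD properties(vertex).name;
```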
+### "How to resolve the error `Zone not enough!`?"
+
+From Nebula Graph version 3.0.0, the Storage services added in the configuration files **CANNOT** be read or written directly. The configuration files only register the Storage services into the Meta services. You must run the `ADD HOSTS` command to read and write data on Storage servers. For more information, see [Manage Storage hosts](../4.deployment-and-installation/manage-storage-host.md).
+### "How to resolve the error `To get the property of the vertex in 'v.age', should use the format 'var.tag.prop'`?"
+
+From Nebula Graph version 3.0.0, patterns support matching multiple tags at the same time, so you need to specify a tag name when querying properties. The original statement `RETURN variable_name.property_name` is changed to `RETURN variable_name.<tag_name>.property_name`.

### "How to resolve `[ERROR (-1005)]: Used memory hits the high watermark(0.800000) of total system memory.`?"
@@ -68,19 +69,104 @@ The reason for this error is usually that the storaged process returns too many

It is a known issue. Just retry 1 to N times, where N is the partition number. The reason is that the meta client needs some heartbeats to update or errors to trigger the new leader information.

-### "How to resolve the error `SemanticError: Missing yield clause.`?"
+### "How to resolve `[ERROR (-1005)]: Schema not exist: xxx`?"

-Starting with Nebula Graph 3.0.0, the statements `LOOKUP`, `GO`, and `FETCH` must output results with the `YIELD` clause. For more information, see [YIELD](../3.ngql-guide/8.clauses-and-options/yield.md).
+If the system returns `Schema not exist` when querying, make sure that:

-

+- Whether the name of the tag or the edge type is a keyword. If it is a keyword, enclose them with backquotes (\`). For more information, see [Keywords](../3.ngql-guide/1.nGQL-overview/keywords-and-reserved-words.md).

-### "How to resolve the error `To get the property of the vertex in 'v.age', should use the format 'var.tag.prop'`?"
+### Unable to download SNAPSHOT packages when compiling Exchange, Connectors, or Algorithm

-From Nebula Graph version 3.0.0, patterns support matching multiple tags at the same time, so you need to specify a tag name when querying properties. The original statement `RETURN variable_name.property_name` is changed to `RETURN variable_name.<tag_name>.property_name`.
+Problem description: The system reports `Could not find artifact com.vesoft:client:jar:xxx-SNAPSHOT` when compiling.
+
+Cause: There is no local Maven repository for storing or downloading SNAPSHOT packages. The default central repository in Maven only stores official releases, not development versions (SNAPSHOTs).
+
+Solution: Add the following configuration in the `profiles` scope of Maven's `settings.xml` file:
+
+```xml
+  <profile>
+     <activation>
+        <activeByDefault>true</activeByDefault>
+     </activation>
+     <repositories>
+        <repository>
+            <id>snapshots</id>
+            <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
+            <snapshots>
+               <enabled>true</enabled>
+            </snapshots>
+        </repository>
+     </repositories>
+  </profile>
+```
+
+### "How to resolve `[ERROR (-7)]: SyntaxError: syntax error near`?"
+
+In most cases, a query statement requires a `YIELD` or a `RETURN`. Check your query statement to see if `YIELD` or `RETURN` is provided.
+
+### "How to resolve the error `can’t solve the start vids from the sentence`?"
+
+The graphd process requires `start vids` to begin a graph traversal. The `start vids` can be specified by the user. For example:
+
+```ngql
+> GO FROM ${vids} ...
+> MATCH (src) WHERE id(src) == ${vids}
+# The "start vids" are explicitly given by ${vids}.
+```
+
+It can also be found from a property index. For example:
+
+```ngql
+# CREATE TAG INDEX IF NOT EXISTS i_player ON player(name(20));
+# REBUILD TAG INDEX i_player;
+
+> LOOKUP ON player WHERE player.name == "abc" | ... YIELD ...
+> MATCH (src) WHERE src.name == "abc" ...
+# The "start vids" are found from the property index "name".
+```
+
+Otherwise, an error like `can’t solve the start vids from the sentence` will be returned.
+
+### "How to resolve the error `Wrong vertex id type: 1001`?"
+
+Check whether the VID is `INT64` or `FIXED_STRING(N)` set by `create space`. For more information, see [create space](../3.ngql-guide/9.space-statements/1.create-space.md).
+
+### "How to resolve the error `The VID must be a 64-bit integer or a string fitting space vertex id length limit.`?"
+
+Check whether the length of the VID exceeds the limitation. For more information, see [create space](../3.ngql-guide/9.space-statements/1.create-space.md).
+
+### "How to resolve the error `edge conflict` or `vertex conflict`?"
+
+Nebula Graph may return such errors when the Storage service receives multiple requests to insert or update the same vertex or edge within milliseconds. Try the failed requests again later.
+
+### "How to resolve the error `RPC failure in MetaClient: Connection refused`?"
+
+The reason for this error is usually that the metad service status is unusual, or the network of the machine where the metad and graphd services are located is disconnected. Possible solutions are as follows:
+
+- Check the metad service status on the server where the metad is located. If the service status is unusual, restart the metad service.
+
+- Use `telnet meta-ip:port` to check the network status from the server that returns an error.
+
+- Check the port information in the configuration file. If the port is different from the one used when connecting, use the port in the configuration file or modify the configuration.
+
+### "How to resolve the error `StorageClientBase.inl:214] Request to "x.x.x.x":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out` in `nebula-graph.INFO`?"
+
+The reason for this error may be that the amount of data to be queried is too large, and the storaged process has timed out. Possible solutions are as follows:
+
+- When importing data, set [Compaction](../8.service-tuning/compaction.md) manually to make reads faster.
+
+- Extend the RPC connection timeout of the Graph service and the Storage service.
Modify the value of `--storage_client_timeout_ms` in the `nebula-graphd.conf` file. This configuration is measured in milliseconds (ms). The default value is 60000ms.
+
+
+### "How to resolve the error `MetaClient.cpp:65] Heartbeat failed, status:Wrong cluster!` in `nebula-storaged.INFO`, or `HBProcessor.cpp:54] Reject wrong cluster host "x.x.x.x":9771!` in `nebula-metad.INFO`?"
+
+The reason for this error may be that the user has modified the IP or the port information of the metad process, or the storage service has joined other clusters before. Possible solutions are as follows:
+
+Delete the `cluster.id` file in the installation directory where the storage machine is deployed (the default installation directory is `/usr/local/nebula`), and restart the storaged service.
+
+## About design and functions

### "How is the `time spent` value at the end of each return message calculated?"
@@ -100,6 +186,20 @@ Got 1 rows (time spent 1235/1934 us)

- The second number `1934` shows the time spent from the client's perspective, that is, the time it takes for the client from sending a request, receiving a response, and displaying the result on the screen.

+### Why does the port number of the `nebula-storaged` process keep showing red after connecting Nebula Graph?
+
+This is because the `nebula-storaged` process waits for `nebula-metad` to add the current Storage service during the startup process. The Storage service starts working only after it receives the ready signal. Starting from Nebula Graph 3.0.0, the Meta service cannot directly read or write data in the Storage service that you add in the configuration file. The configuration file only registers the Storage service to the Meta service. You must run the `ADD HOSTS` command to enable the Meta to read and write data in the Storage service. For more information, see [Manage Storage hosts](../4.deployment-and-installation/manage-storage-host.md).
+
+### Why is there no line separating each row in the returned result of Nebula Graph 2.6.0?
+
+This is caused by the release of Nebula Console 2.6.0, not a change in the Nebula Graph core, and it does not affect the content of the returned data itself.
+
+### About dangling edges
+
+A dangling edge is an edge that only connects to a single vertex and only one part of the edge connects to the vertex.
+
+Dangling edges may appear in Nebula Graph {{ nebula.release }} by design, and there is no openCypher `MERGE` statement. Guaranteeing that edges do not dangle therefore depends entirely on the application level. For more information, see [INSERT VERTEX](../3.ngql-guide/12.vertex-statements/1.insert-vertex.md), [DELETE VERTEX](../3.ngql-guide/12.vertex-statements/4.delete-vertex.md), [INSERT EDGE](../3.ngql-guide/13.edge-statements/1.insert-edge.md), [DELETE EDGE](../3.ngql-guide/13.edge-statements/4.delete-edge.md).
+
### "Can I set `replica_factor` as an even number in `CREATE SPACE` statements, e.g., `replica_factor = 2`?"

NO.
@@ -142,10 +242,6 @@ Therefore, using `GO` and `MATCH` to execute the same semantic query may cause d

For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Path_(graph_theory)#Walk,_trail,_path).

-### "How to resolve `[ERROR (-7)]: SyntaxError: syntax error near`?"
-
-In most cases, a query statement requires a `YIELD` or a `RETURN`. Check your query statement to see if `YIELD` or `RETURN` is provided.
-
### "How to count the vertices/edges number of each tag/edge type?"

See [show-stats](../3.ngql-guide/7.general-query-statements/6.show/14.show-stats.md).
@@ -178,65 +274,6 @@ You can use [Nebula Algorithm](../nebula-algorithm.md).

Or get vertices by each tag, and then group them by yourself.

-### "How to resolve the error `can’t solve the start vids from the sentence`?"
-
-The graphd process requires `start vids` to begin a graph traversal. The `start vids` can be specified by the user. For example:
-
-```ngql
-> GO FROM ${vids} ...
-> MATCH (src) WHERE id(src) == ${vids}
-# The "start vids" are explicitly given by ${vids}.
-``` - -It can also be found from a property index. For example: - -```ngql -# CREATE TAG INDEX IF NOT EXISTS i_player ON player(name(20)); -# REBUILD TAG INDEX i_player; - -> LOOKUP ON player WHERE player.name == "abc" | ... YIELD ... -> MATCH (src) WHERE src.name == "abc" ... -# The "start vids" are found from the property index "name". -``` - -Otherwise, an error like `can’t solve the start vids from the sentence` will be returned. - -### "How to resolve the error `Wrong vertex id type: 1001`?" - -Check whether the VID is `INT64` or `FIXED_STRING(N)` set by `create space`. For more information, see [create space](../3.ngql-guide/9.space-statements/1.create-space.md). - -### "How to resolve the error `The VID must be a 64-bit integer or a string fitting space vertex id length limit.`?" - -Check whether the length of the VID exceeds the limitation. For more information, see [create space](../3.ngql-guide/9.space-statements/1.create-space.md). - -### "How to resolve the error `edge conflict` or `vertex conflict`?" - -Nebula Graph may return such errors when the Storage service receives multiple requests to insert or update the same vertex or edge within milliseconds. Try the failed requests again later. - -### "How to resolve the error `RPC failure in MetaClient: Connection refused`?" - -The reason for this error is usually that the metad service status is unusual, or the network of the machine where the metad and graphd services are located is disconnected. Possible solutions are as follows: - -- Check the metad service status on the server where the metad is located. If the service status is unusual, restart the metad service. - -- Use `telnet meta-ip:port` to check the network status under the server that returns an error. - -- Check the port information in the configuration file. If the port is different from the one used when connecting, use the port in the configuration file or modify the configuration. 
- -### "How to resolve the error `StorageClientBase.inl:214] Request to "x.x.x.x":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out` in `nebula-graph.INFO`?" - -The reason for this error may be that the amount of data to be queried is too large, and the storaged process has timed out. Possible solutions are as follows: - -- When importing data, set [Compaction](../8.service-tuning/compaction.md) manually to make read faster. - -- Extend the RPC connection timeout of the Graph service and the Storage service. Modify the value of `--storage_client_timeout_ms` in the `nebula-storaged.conf` file. This configuration is measured in milliseconds (ms). The default value is 60000ms. - - -### "How to resolve the error `MetaClient.cpp:65] Heartbeat failed, status:Wrong cluster!` in `nebula-storaged.INFO`, or `HBProcessor.cpp:54] Reject wrong cluster host "x.x.x.x":9771!` in `nebula-metad.INFO`? - -The reason for this error may be that the user has modified the IP or the port information of the metad process, or the storage service has joined other clusters before. Possible solutions are as follows: - -Delete the `cluster.id` file in the installation directory where the storage machine is deployed (the default installation directory is `/usr/local/nebula`), and restart the storaged service. ### Can non-English characters be used as identifiers, such as the names of graph spaces, tags, edge types, properties, and indexes? @@ -257,39 +294,6 @@ There is no such command. You can use [Nebula Algorithm](../nebula-algorithm.md). -### "How to resolve `[ERROR (-1005)]: Schema not exist: xxx`?" - -If the system returns `Schema not exist` when querying, make sure that: - -- Whether there is a tag or an edge type in the Schema. - -- Whether the name of the tag or the edge type is a keyword. If it is a keyword, enclose them with backquotes (\`). For more information, see [Keywords](../3.ngql-guide/1.nGQL-overview/keywords-and-reserved-words.md). 
-
-### Unable to download SNAPSHOT packages when compiling Exchange, Connectors, or Algorithm
-
-Problem description: The system reports `Could not find artifact com.vesoft:client:jar:xxx-SNAPSHOT` when compiling.
-
-Cause: There is no local Maven repository for storing or downloading SNAPSHOT packages. The default central repository in Maven only stores official releases, not development versions (SNAPSHOTs).
-
-Solution: Add the following configuration in the `profiles` scope of Maven's `setting.xml` file:
-
-```xml
-  <profile>
-     <activation>
-        <activeByDefault>true</activeByDefault>
-     </activation>
-     <repositories>
-        <repository>
-            <id>snapshots</id>
-            <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
-            <snapshots>
-               <enabled>true</enabled>
-            </snapshots>
-        </repository>
-     </repositories>
-  </profile>
-```
-
## About operation and maintenance

### "The log files are too large. How to recycle the logs?"

diff --git a/docs-2.0/20.appendix/6.eco-tool-version.md b/docs-2.0/20.appendix/6.eco-tool-version.md
index 49259d2bca3..0fa840dfa7d 100644
--- a/docs-2.0/20.appendix/6.eco-tool-version.md
+++ b/docs-2.0/20.appendix/6.eco-tool-version.md
@@ -1,6 +1,6 @@
# Ecosystem tools overview

-![Nebula Graph birdview](../1.introduction/nebula-birdview.png)
+![Nebula Graph birdview](../1.introduction/nebula-graph-birdview-3.0.png)

!!! compatibility

@@ -48,9 +48,9 @@ Nebula Explorer (Explorer for short) is a graph exploration visualization tool t

Nebula Exchange (Exchange for short) is an Apache Spark&trade; application for batch migration of data in a cluster to Nebula Graph in a distributed environment. It can support the migration of batch data and streaming data in a variety of different formats. For details, see [What is Nebula Exchange](../nebula-exchange/about-exchange/ex-ug-what-is-exchange.md).

-## Nebula Operator
+

## Nebula Importer

@@ -81,7 +81,6 @@ Nebula Console is the native CLI client of Nebula Graph. For how to use it, see

Docker Compose can quickly deploy Nebula Graph clusters.
For how to use it, please refer to [Docker Compose Deployment Nebula Graph](../4.deployment-and-installation/2.compile-and-install-nebula-graph/3.deploy-nebula-graph-with-docker-compose.md).

-
## Backup & Restore

[Backup&Restore](https://github.com/vesoft-inc/nebula-br) (BR for short) is a command line interface (CLI) tool that can help back up the graph space data of Nebula Graph, or restore it through a backup file data.
diff --git a/docs-2.0/20.appendix/history.md b/docs-2.0/20.appendix/history.md
new file mode 100644
index 00000000000..0ff5028b9b3
--- /dev/null
+++ b/docs-2.0/20.appendix/history.md
@@ -0,0 +1,42 @@
+# History timeline for Nebula Graph
+
+1. 2018.9: [dutor](https://github.com/dutor) wrote and submitted the first line of Nebula Graph database code.
+
+ ![image](https://docs-cdn.nebula-graph.com.cn/books/images/dutor.png)
+
+2. 2019.5: Nebula Graph v0.1.0-alpha was released as open-source.
+
+ ![image](https://docs-cdn.nebula-graph.com.cn/books/images/alpha-bj.png)
+ ![image](https://docs-cdn.nebula-graph.com.cn/books/images/alpha-hz.jpg)
+
+ Nebula Graph v1.0.0-beta, v1.0.0-rc1, v1.0.0-rc2, v1.0.0-rc3, and v1.0.0-rc4 were released one after another within a year thereafter.
+
+ ![image](https://docs-cdn.nebula-graph.com.cn/books/images/v010.png)
+
+3. 2019.7: Nebula Graph's debut at HBaseCon[^HBaseCon]. @[dangleptr](https://github.com/dangleptr)
+
+ ![image](https://www-cdn.nebula-graph.com.cn/nebula-blog/HBase01.png)
+
+ [^HBaseCon]: Nebula Graph v1.x supports both RocksDB and HBase as its storage engines. Nebula Graph v2.x removes HBase support.
+
+4. 2020.3: Development of Nebula Graph v2.0 started in the final stage of v1.0 development.
+
+5. 2020.6: The first major version of Nebula Graph v1.0.0 GA was released.
+
+ ![image](https://docs-cdn.nebula-graph.com.cn/books/images/v100GA.png)
+
+6. 2021.3: The second major version of Nebula Graph v2.0 GA was released.
+
+ ![image](https://docs-cdn.nebula-graph.com.cn/books/images/v200.png)
+
+7.
2021.8: Nebula Graph v2.5.0 was released. + + ![image](https://docs-cdn.nebula-graph.com.cn/books/images/2.5.0.png) + +8. 2021.10: Nebula Graph v2.6.0 was released. + + For more information about release notes, see [Releases](https://github.com/vesoft-inc/nebula/releases). + +9. 2022.2: Nebula Graph v3.0.0 was released. + + For more information about release notes, see [Releases](https://github.com/vesoft-inc/nebula/releases). diff --git a/docs-2.0/3.ngql-guide/1.nGQL-overview/1.overview.md b/docs-2.0/3.ngql-guide/1.nGQL-overview/1.overview.md index 43552b110e2..e59eb5bd515 100644 --- a/docs-2.0/3.ngql-guide/1.nGQL-overview/1.overview.md +++ b/docs-2.0/3.ngql-guide/1.nGQL-overview/1.overview.md @@ -6,7 +6,7 @@ This topic gives an introduction to the query language of Nebula Graph, nGQL. nGQL is a declarative graph query language for Nebula Graph. It allows expressive and efficient [graph patterns](3.graph-patterns.md). nGQL is designed for both developers and operations professionals. nGQL is an SQL-like query language, so it's easy to learn. -nGQL is a project in progress. New features and optimizations are done steadily. There can be differences between syntax and implementation. Submit an [issue](https://github.com/vesoft-inc/nebula-graph/issues) to inform the Nebula Graph team if you find a new issue of this type. Nebula Graph 2.0 or later releases will support [openCypher 9](https://www.opencypher.org/resources). +nGQL is a project in progress. New features and optimizations are done steadily. There can be differences between syntax and implementation. Submit an [issue](https://github.com/vesoft-inc/nebula-graph/issues) to inform the Nebula Graph team if you find a new issue of this type. Nebula Graph 3.0 or later releases will support [openCypher 9](https://www.opencypher.org/resources). ## What can nGQL do @@ -23,6 +23,10 @@ nGQL is a project in progress. New features and optimizations are done steadily. 
Users can download the example data [Basketballplayer](https://docs.nebula-graph.io/2.0/basketballplayer-2.X.ngql) in Nebula Graph. After downloading the example data, you can import it to Nebula Graph by using the `-f` option in [Nebula Graph Console](../../2.quick-start/3.connect-to-nebula-graph.md).

+!!! note
+
+    Ensure that you have executed the `ADD HOSTS` command to add the Storage service to your Nebula Graph cluster before importing the example data. For more information, see [Manage Storage hosts](../../4.deployment-and-installation/manage-storage-host.md).
+
## Placeholder identifiers and values

Refer to the following standards in nGQL:
diff --git a/docs-2.0/3.ngql-guide/1.nGQL-overview/keywords-and-reserved-words.md b/docs-2.0/3.ngql-guide/1.nGQL-overview/keywords-and-reserved-words.md
index cdfc332e7d5..4c62c67c737 100644
--- a/docs-2.0/3.ngql-guide/1.nGQL-overview/keywords-and-reserved-words.md
+++ b/docs-2.0/3.ngql-guide/1.nGQL-overview/keywords-and-reserved-words.md
@@ -1,8 +1,12 @@
# Keywords

-Keywords have significance in nGQL. It can be classified into reserved keywords and non-reserved keywords.
+Keywords have significance in nGQL. They can be classified into reserved keywords and non-reserved keywords. It is not recommended to use keywords in schemas.

-Non-reserved keywords are permitted as identifiers without quoting. To use special characters or reserved keywords as identifiers, quote them with backticks such as `AND`.
+If you must use keywords in schemas:
+
+- Non-reserved keywords are permitted as identifiers without quoting.
+
+- To use special characters or reserved keywords as identifiers, quote them with backticks such as `AND`.

!!!
Note diff --git a/docs-2.0/3.ngql-guide/10.tag-statements/3.alter-tag.md b/docs-2.0/3.ngql-guide/10.tag-statements/3.alter-tag.md index f9b634a9601..d15b5dc4012 100644 --- a/docs-2.0/3.ngql-guide/10.tag-statements/3.alter-tag.md +++ b/docs-2.0/3.ngql-guide/10.tag-statements/3.alter-tag.md @@ -35,7 +35,7 @@ ttl_definition: nebula> CREATE TAG IF NOT EXISTS t1 (p1 string, p2 int); nebula> ALTER TAG t1 ADD (p3 int, p4 string); nebula> ALTER TAG t1 TTL_DURATION = 2, TTL_COL = "p2"; -nebula> ALTER TAG t1 COMMENT='test1'; +nebula> ALTER TAG t1 COMMENT = 'test1'; nebula> ALTER TAG t1 ADD (p5 double NOT NULL DEFAULT 0.4 COMMENT 'p5') COMMENT='test2'; ``` diff --git a/docs-2.0/3.ngql-guide/14.native-index-statements/1.create-native-index.md b/docs-2.0/3.ngql-guide/14.native-index-statements/1.create-native-index.md index c2d2ce14e44..a283e9336d3 100644 --- a/docs-2.0/3.ngql-guide/14.native-index-statements/1.create-native-index.md +++ b/docs-2.0/3.ngql-guide/14.native-index-statements/1.create-native-index.md @@ -36,7 +36,7 @@ Although the same results can be obtained by using alternative indexes for queri Indexes cannot make queries faster. It can only locate a vertex or an edge according to properties or count the number of vertices or edges. - Long indexes decrease the scan performance of the Storage Service and use more memory. We suggest that you set the indexing length the same as that of the longest string to be indexed. The longest index length is 255 bytes. Strings longer than 255 bytes will be truncated. + Long indexes decrease the scan performance of the Storage Service and use more memory. We suggest that you set the indexing length the same as that of the longest string to be indexed. The longest index length is 256 bytes. 
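As a hedged illustration of the index-length advice above (the `player` tag and its string `name` property come from the example dataset these docs use; the 20-byte length and index name are illustrative):

```ngql
# Index only a 20-byte prefix of the string property instead of the full value.
CREATE TAG INDEX IF NOT EXISTS i_player_name ON player(name(20));
REBUILD TAG INDEX i_player_name;
```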
If you must use indexes, we suggest that you:

@@ -63,7 +63,7 @@ If you must use indexes, we suggest that you:

## Syntax

```ngql
-CREATE {TAG | EDGE} INDEX [IF NOT EXISTS] <index_name> ON {<tag_name> | <edge_name>} ([<prop_name_list>]) [COMMENT = '<comment>'];
+CREATE {TAG | EDGE} INDEX [IF NOT EXISTS] <index_name> ON {<tag_name> | <edge_name>} ([<prop_name_list>]) [COMMENT '<comment>'];
```

|Parameter|Description|
diff --git a/docs-2.0/3.ngql-guide/15.full-text-index-statements/1.search-with-text-based-index.md b/docs-2.0/3.ngql-guide/15.full-text-index-statements/1.search-with-text-based-index.md
index f391ae5e308..8269278eb28 100644
--- a/docs-2.0/3.ngql-guide/15.full-text-index-statements/1.search-with-text-based-index.md
+++ b/docs-2.0/3.ngql-guide/15.full-text-index-statements/1.search-with-text-based-index.md
@@ -77,7 +77,7 @@ LOOKUP ON {<tag> | <edge_type>} WHERE <expression> [YIELD <return_list>];

nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));

// This example signs in the text service.
-nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200);
+nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200, HTTP);

// This example switches the graph space.
nebula> USE basketballplayer;
diff --git a/docs-2.0/3.ngql-guide/7.general-query-statements/2.match.md b/docs-2.0/3.ngql-guide/7.general-query-statements/2.match.md
index 0a8926c4873..67b56d7c8d8 100644
--- a/docs-2.0/3.ngql-guide/7.general-query-statements/2.match.md
+++ b/docs-2.0/3.ngql-guide/7.general-query-statements/2.match.md
@@ -28,6 +28,10 @@ MATCH <pattern> [<clause_1>] RETURN <output> [<clause_2>];

Starting from Nebula Graph version 3.0.0, in order to distinguish the properties of different tags, you need to specify a tag name when querying properties. The original statement `RETURN variable_name.property_name` is changed to `RETURN variable_name.<tag_name>.property_name`.

+!!! note
+
+    Currently the `match` statement cannot find dangling edges.
+
- The `MATCH` statement retrieves data according to the `RETURN` clause.

- The path type of the `MATCH` statement is `trail`. That is, only vertices can be repeatedly visited in the graph traversal.
Edges cannot be repeatedly visited. For details, see [path](../../1.introduction/2.1.path.md).
diff --git a/docs-2.0/3.ngql-guide/7.general-query-statements/5.lookup.md b/docs-2.0/3.ngql-guide/7.general-query-statements/5.lookup.md
index dcdb41a3c2b..da915109bc8 100644
--- a/docs-2.0/3.ngql-guide/7.general-query-statements/5.lookup.md
+++ b/docs-2.0/3.ngql-guide/7.general-query-statements/5.lookup.md
@@ -56,7 +56,7 @@ The `WHERE` clause in a `LOOKUP` statement does not support the following operat

- `$-` and `$^`.
- In relational expressions, operators are not supported to have field names on both sides, such as `tagName.prop1> tagName.prop2`.
- Nested AliasProp expressions in operation expressions and function expressions are not supported.
-- The `XOR` and `NOT` operations are not supported.
+- The `XOR` operation is not supported.

## Retrieve vertices
diff --git a/docs-2.0/4.deployment-and-installation/2.compile-and-install-nebula-graph/3.deploy-nebula-graph-with-docker-compose.md b/docs-2.0/4.deployment-and-installation/2.compile-and-install-nebula-graph/3.deploy-nebula-graph-with-docker-compose.md
index 4c5dfb9b22f..7856f605928 100644
--- a/docs-2.0/4.deployment-and-installation/2.compile-and-install-nebula-graph/3.deploy-nebula-graph-with-docker-compose.md
+++ b/docs-2.0/4.deployment-and-installation/2.compile-and-install-nebula-graph/3.deploy-nebula-graph-with-docker-compose.md
@@ -94,9 +94,11 @@ Using Docker Compose can quickly deploy Nebula Graph services based on the prepa

By default, the authentication is off, you can only log in with an existing username (the default is `root`) and any password. To turn it on, see [Enable authentication](../../7.data-security/1.authentication/1.authentication.md).

- 3. Run the `SHOW HOSTS` statement to check the status of the `nebula-storaged` processes.
+ 3. Run the following commands to make the `nebula-storaged` processes available.
```bash
+ nebula> ADD HOSTS "storaged0":9779,"storaged1":9779,"storaged2":9779;
+
nebula> SHOW HOSTS;
+-------------+------+----------+--------------+----------------------+------------------------+---------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-------------+------+----------+--------------+----------------------+------------------------+---------+
+-------------+------+----------+--------------+----------------------+------------------------+---------+
```

+ !!! Note
+
+ Starting from Nebula Graph version 3.0.0, a `nebula-storaged` service that has been registered with metad cannot be used until you run `ADD HOSTS` manually.
+
5. Run `exit` twice to switch back to your terminal (shell). You can run Step 4 to log in to Nebula Graph again.

## Check the Nebula Graph service status and ports
diff --git a/docs-2.0/4.deployment-and-installation/3.upgrade-nebula-graph/upgrade-nebula-graph-to-latest.md b/docs-2.0/4.deployment-and-installation/3.upgrade-nebula-graph/upgrade-nebula-graph-to-latest.md
index 122312bba28..38e8c28eb3e 100644
--- a/docs-2.0/4.deployment-and-installation/3.upgrade-nebula-graph/upgrade-nebula-graph-to-latest.md
+++ b/docs-2.0/4.deployment-and-installation/3.upgrade-nebula-graph/upgrade-nebula-graph-to-latest.md
@@ -76,6 +76,9 @@ To upgrade Nebula Graph from historical versions to {{nebula.release}}:

- Locate the data files based on the value of the `data_path` parameters in the Storage and Meta configurations, and backup the data files. The default paths are `nebula/data/storage` and `nebula/data/meta`.

+ !!! danger
+ The old data will not be automatically backed up during the upgrade. You must manually back up the data to avoid data loss.
+
- Backup the configuration files.

- Collect the statistics of all graph spaces before the upgrade. After the upgrade, you can collect again and compare the results to make sure that no data is lost.
To collect the statistics: @@ -126,8 +129,8 @@ To upgrade Nebula Graph from historical versions to {{nebula.release}}: 5. Use the new db_upgrader file in the `bin` directory to upgrade the format of old data. - !!! caution - This step backs up the Storage data. But to prevent backup failures and data loss, before executing this step, make sure that you have followed the **Preparations before the upgrade** section and backed up the Meta data and Storage data. + !!! danger + This step DOES NOT back up the Storage data. To avoid data loss, before executing this step, make sure that you have followed the **Preparations before the upgrade** section and backed up the Meta data and Storage data. Command syntax: @@ -140,7 +143,7 @@ To upgrade Nebula Graph from historical versions to {{nebula.release}}: ``` - `old_storage_data_path` indicates the path of the Storage data. It is defined by the `data_path` parameter in the Storage configuration files. - - `data_backup_path` indicates a custom path for data backup. + - `data_backup_path` indicates a custom path for data backup. **This option does not work for the current version and the old data will not be backed up to any path.** - `meta_server_ip` and `port` indicate the IP address and port number of a Meta server. - `2:3` indicates that the upgrade is from version 2.x to 3.x. 
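The statistics comparison suggested in the upgrade preparations above can be sketched with standard nGQL job statements; a hedged example (run once per graph space, before and after the upgrade, and compare the two outputs):

```ngql
# Collect statistics for the current graph space, then inspect them.
USE basketballplayer;
SUBMIT JOB STATS;
SHOW STATS;
```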
diff --git a/docs-2.0/4.deployment-and-installation/6.deploy-text-based-index/2.deploy-es.md b/docs-2.0/4.deployment-and-installation/6.deploy-text-based-index/2.deploy-es.md index 5b6225c3dd3..c499f832dc5 100644 --- a/docs-2.0/4.deployment-and-installation/6.deploy-text-based-index/2.deploy-es.md +++ b/docs-2.0/4.deployment-and-installation/6.deploy-text-based-index/2.deploy-es.md @@ -77,13 +77,13 @@ When the Elasticsearch cluster is deployed, use the `SIGN IN` statement to sign ### Syntax ```ngql -SIGN IN TEXT SERVICE [(<elastic_ip:port> [,"<username>", "<password>"]), (<elastic_ip:port>), ...]; +SIGN IN TEXT SERVICE (<elastic_ip:port>, {HTTP | HTTPS} [,"<username>", "<password>"]) [, (<elastic_ip:port>, ...)]; ``` ### Example ```ngql -nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200); +nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200, HTTP); ``` !!! Note diff --git a/docs-2.0/4.deployment-and-installation/manage-storage-host.md b/docs-2.0/4.deployment-and-installation/manage-storage-host.md index 7413638f816..daffafe4f9a 100644 --- a/docs-2.0/4.deployment-and-installation/manage-storage-host.md +++ b/docs-2.0/4.deployment-and-installation/manage-storage-host.md @@ -10,6 +10,10 @@ Add the Storage hosts to a Nebula Graph cluster. ADD HOSTS <ip>:<port> [,<ip>:<port> ...]; +!!! note + + To make sure the follow-up operations work as expected, wait for two heartbeat cycles, i.e., 20 seconds, and then run `SHOW HOSTS` to check whether the host is online. + ## Drop Storage hosts Delete the Storage hosts from the cluster. diff --git a/docs-2.0/6.monitor-and-metrics/1.query-performance-metrics.md b/docs-2.0/6.monitor-and-metrics/1.query-performance-metrics.md index e2bb3ef9629..58d29e71350 100644 --- a/docs-2.0/6.monitor-and-metrics/1.query-performance-metrics.md +++ b/docs-2.0/6.monitor-and-metrics/1.query-performance-metrics.md @@ -8,7 +8,7 @@ Each metric of Nebula Graph consists of three fields: name, type, and time range |Field|Example|Description| |-|-|-| -|Metric name|`num_queries`|Indicates the function of the metric.| +|Metric name|`num_queries`|Indicates the function of the metric.
For the detailed description of metrics, see [Metrics](../nebula-dashboard/6.monitor-parameter.md).| |Metric type|`sum`|Indicates how the metrics are collected. Supported types are SUM, COUNT, AVG, RATE, and the P-th sample quantiles such as P75, P95, P99, and P99.9.| |Time range|`600`|The time range in seconds for the metric collection. Supported values are 5, 60, 600, and 3600, representing the last 5 seconds, 1 minute, 10 minutes, and 1 hour.| diff --git a/docs-2.0/7.data-security/1.authentication/3.role-list.md b/docs-2.0/7.data-security/1.authentication/3.role-list.md index fd6e40413d3..2c6910ab678 100644 --- a/docs-2.0/7.data-security/1.authentication/3.role-list.md +++ b/docs-2.0/7.data-security/1.authentication/3.role-list.md @@ -64,10 +64,10 @@ The privileges of roles and the nGQL statements that each role can use are liste |Read data|Y|Y|Y|Y|Y|`GO`, `SET`, `PIPE`, `MATCH`, `ASSIGNMENT`, `LOOKUP`, `YIELD`, `ORDER BY`, `FETCH VERTICES`, `Find`, `FETCH EDGES`, `FIND PATH`, `LIMIT`, `GROUP BY`, `RETURN`| |Write data|Y|Y|Y|Y||`INSERT VERTEX`, `UPDATE VERTEX`, `INSERT EDGE`, `UPDATE EDGE`, `DELETE VERTEX`, `DELETE EDGES`, `DELETE TAG`| |Show operations|Y|Y|Y|Y|Y|`SHOW`, `CHANGE PASSWORD`| - |Job|Y|Y|Y|Y||`SUBMIT JOB COMPACT`、`SUBMIT JOB FLUSH`、`SUBMIT JOB STATS`、`STOP JOB`、`RECOVER JOB`| - |Write space|Y|||||`CREATE SPACE`, `DROP SPACE`, `CREATE SNAPSHOT`, `DROP SNAPSHOT`, `BALANCE`, `ADMIN`, `CONFIG`, `INGEST`, `DOWNLOAD`, `BUILD TAG INDEX`, `BUILD EDGE INDEX`| + |Job|Y|Y|Y|Y||`SUBMIT JOB COMPACT`, `SUBMIT JOB FLUSH`, `SUBMIT JOB STATS`, `STOP JOB`, `RECOVER JOB`, `BUILD TAG INDEX`, `BUILD EDGE INDEX`| + |Write space|Y|||||`CREATE SPACE`, `DROP SPACE`, `CREATE SNAPSHOT`, `DROP SNAPSHOT`, `BALANCE`, `ADMIN`, `CONFIG`, `INGEST`, `DOWNLOAD`| !!! 
caution diff --git a/docs-2.0/nebula-analytics.md b/docs-2.0/nebula-analytics.md index 6fd8034e069..da128f9d929 100644 --- a/docs-2.0/nebula-analytics.md +++ b/docs-2.0/nebula-analytics.md @@ -21,7 +21,7 @@ The version correspondence between Nebula Analytics and Nebula Graph is as follo |Nebula Analytics|Nebula Graph| |:---|:---| -|{{analytics.release}}|{{nebula.release}}| +|{{plato.release}}|{{nebula.release}}| ## Graph algorithms @@ -58,7 +58,7 @@ The preparations for compiling Nebula Analytics are similar to compiling Nebula 1. Clone the `analytics` repository. ```bash - $ git clone -b {{analytics.branch}} https://github.com/vesoft-inc/nebula-analytics.git + $ git clone -b {{plato.branch}} https://github.com/vesoft-inc/nebula-analytics.git ``` 2. Access the `nebula-analytics` directory. diff --git a/docs-2.0/nebula-br/1.what-is-br.md b/docs-2.0/nebula-br/1.what-is-br.md index f5c66d932df..dba5acf18e7 100644 --- a/docs-2.0/nebula-br/1.what-is-br.md +++ b/docs-2.0/nebula-br/1.what-is-br.md @@ -18,6 +18,7 @@ The BR has the following features. It supports: - Supports Nebula Graph v{{ nebula.release }} only. - Supports full backup, but not incremental backup. - Currently, Nebula Listener and full-text indexes do not support backup. +- Auto deployment and restoration are supported only when a single metad process is configured in the local configuration file. - If you back up data to the local disk, the backup files will be saved in the local path of each server. You can also mount the NFS on your host to restore the backup data to a different host. - The backup graph space can be restored to the original cluster only. Cross-cluster restoration is not supported. - During the backup process, both DDL and DML statements in the specified graph spaces are blocked. We recommend that you do the operation within the low peak period of the business, for example, from 2:00 AM to 5:00 AM.
diff --git a/docs-2.0/nebula-dashboard-ent/8.faq.md b/docs-2.0/nebula-dashboard-ent/8.faq.md index 66fd857be9b..903942e84a5 100644 --- a/docs-2.0/nebula-dashboard-ent/8.faq.md +++ b/docs-2.0/nebula-dashboard-ent/8.faq.md @@ -22,9 +22,9 @@ The status of a cluster is as follows: Managing clusters requires the SSH information of the corresponding node. Therefore, you need to have at least an SSH account and the corresponding password with executable permissions before performing operations on Dashboard. -## "What is scaling?" + ## "Why cannot operate on the Metad service?" diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-clickhouse.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-clickhouse.md index 3cf2d0e0d0f..4d9dc609bd6 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-clickhouse.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-clickhouse.md @@ -97,7 +97,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` cores: 1 maxResultSize: 1G } - cores { + cores: { max: 16 } } @@ -116,11 +116,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` pswd: nebula # Fill in the name of the graph space you want to write data to in the Nebula Graph. 
space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-csv.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-csv.md index 3f76a31dbe7..63e7c795ebd 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-csv.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-csv.md @@ -115,7 +115,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` memory:1G } - cores { + cores: { max: 16 } } @@ -136,11 +136,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` # Fill in the name of the graph space you want to write data to in the Nebula Graph. space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-hbase.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-hbase.md index 2eecd7b36f8..c04df5f53c4 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-hbase.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-hbase.md @@ -134,7 +134,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` cores: 1 maxResultSize: 1G } - cores { + cores: { max: 16 } } @@ -154,11 +154,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` pswd: nebula # Fill in the name of the graph space you want to write data to in the Nebula Graph. 
space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-hive.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-hive.md index 3c8d994aefa..ac3e72c83ef 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-hive.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-hive.md @@ -167,7 +167,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` cores: 1 maxResultSize: 1G } - cores { + cores: { max: 16 } } @@ -195,11 +195,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` pswd: nebula # Fill in the name of the graph space you want to write data to in the Nebula Graph. space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-json.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-json.md index b4f9a52a31f..2a079b14d6a 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-json.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-json.md @@ -143,7 +143,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` memory:1G } - cores { + cores: { max: 16 } } @@ -164,11 +164,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` # Fill in the name of the graph space you want to write data to in the Nebula Graph. 
space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-kafka.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-kafka.md index 558aceadc97..cf8c44bd35d 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-kafka.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-kafka.md @@ -93,7 +93,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` cores: 1 maxResultSize: 1G } - cores { + cores: { max: 16 } } @@ -113,11 +113,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` pswd: nebula # Fill in the name of the graph space you want to write data to in the Nebula Graph. space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-maxcompute.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-maxcompute.md index 87b451d965b..8de6b4a80a3 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-maxcompute.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-maxcompute.md @@ -97,7 +97,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` cores: 1 maxResultSize: 1G } - cores { + cores: { max: 16 } } @@ -116,11 +116,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` pswd: nebula # Fill in the name of the graph space you want to write data to in the Nebula Graph. 
space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-mysql.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-mysql.md index 10abbe54b79..0c0f167e72f 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-mysql.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-mysql.md @@ -137,7 +137,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` cores: 1 maxResultSize: 1G } - cores { + cores: { max: 16 } } @@ -156,11 +156,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` pswd: nebula # Fill in the name of the graph space you want to write data to in the Nebula Graph. space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-neo4j.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-neo4j.md index 1749d9ccd00..59c9b12d58b 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-neo4j.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-neo4j.md @@ -126,7 +126,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` memory:1G } - cores:{ + cores: { max: 16 } } @@ -142,12 +142,12 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` pswd: nebula space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-orc.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-orc.md index 14647665f42..3f35b4849b2 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-orc.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-orc.md @@ -111,7 +111,7 @@ After 
Exchange is compiled, copy the conf file `target/classes/application.conf` memory:1G } - cores { + cores: { max: 16 } } @@ -132,11 +132,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` # Fill in the name of the graph space you want to write data to in the Nebula Graph. space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md index ac5f0dd928c..b0972a91f00 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-parquet.md @@ -111,7 +111,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` memory:1G } - cores { + cores: { max: 16 } } @@ -132,11 +132,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` # Fill in the name of the graph space you want to write data to in the Nebula Graph. space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-pulsar.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-pulsar.md index 987e0042317..5332f6192df 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-pulsar.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-pulsar.md @@ -89,7 +89,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` cores: 1 maxResultSize: 1G } - cores { + cores: { max: 16 } } @@ -111,11 +111,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` # Fill in the name of the graph space you want to write data to in the Nebula Graph. 
space: basketballplayer - connection { + connection: { timeout: 3000 retry: 3 } - execution { + execution: { retry: 3 } error: { diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-sst.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-sst.md index b605c2f0a51..88ee2155d06 100644 --- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-sst.md +++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-sst.md @@ -210,7 +210,7 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf` } # The connection parameters of clients - connection { + connection: { # The timeout duration of socket connection and execution. Unit: milliseconds. timeout: 30000 } diff --git a/docs-2.0/nebula-explorer/canvas-operations/visualization-mode.md b/docs-2.0/nebula-explorer/canvas-operations/visualization-mode.md index be106008642..e7c1a4c1a78 100644 --- a/docs-2.0/nebula-explorer/canvas-operations/visualization-mode.md +++ b/docs-2.0/nebula-explorer/canvas-operations/visualization-mode.md @@ -18,7 +18,7 @@ Exploration of the data on a canvas is possible in 2D mode. | ---------- | ------------------------------------------------------------ | | Weight Degree | Automatically resizes vertices according to the number of outgoing and incoming edges of all the vertices on the canvas. | | Reset Degree | Resets the vertices on the canvas to their original size. | -| Detection | Outlier: Detects the vertices that connect no edges on a canvas.
Hanging Edges: Detects the vertices with only one undirected edge on a canvas.
Loop Detection: Detects the paths that connect a vertex to itself. | +| Detection | Outlier: Detects the vertices that connect no edges on a canvas.
Dangling Edges: Detects edges whose associated vertices have a degree of one on the canvas (the associated vertices are included).
Loop Detection: Detects the paths that connect a vertex to itself. | For more information about the operations available in 2D mode, see [Canvas](canvas-overview.md). diff --git a/docs-2.0/nebula-studio/about-studio/st-ug-release-note.md b/docs-2.0/nebula-studio/about-studio/st-ug-release-note.md index 9aa916a09e9..2ef37f51019 100644 --- a/docs-2.0/nebula-studio/about-studio/st-ug-release-note.md +++ b/docs-2.0/nebula-studio/about-studio/st-ug-release-note.md @@ -1,6 +1,11 @@ # Change Log -## v3.2.1 (2020.02.17) +## v3.2.2 (2022.03.08) + +- Fix: + - Fix the verification problem of Chinese and special characters. + +## v3.2.1 (2022.02.17) - Fix: - Remove the node environment check before rpm installation. diff --git a/docs-2.0/reuse/source_connect-to-nebula-graph.md b/docs-2.0/reuse/source_connect-to-nebula-graph.md index 906d337aaaa..f11afef62cf 100644 --- a/docs-2.0/reuse/source_connect-to-nebula-graph.md +++ b/docs-2.0/reuse/source_connect-to-nebula-graph.md @@ -30,12 +30,8 @@ If you do not have a Nebula Graph database yet, we recommend that you try the cl We recommend that you select the **latest** release. - ![Select a Nebula Graph version and click **Assets**](../reuse/console-1.png "Click Assets to show the available Nebula Graph binary files") - 2. In the **Assets** area, find the correct binary file for the machine where you want to run Nebula Console and download the file to the machine. - ![Click to download the package according to your hardware architecture](../reuse/assets-1.png "Click the package name to download it") - 3. (Optional) Rename the binary file to `nebula-console` for convenience. !!! 
note diff --git a/docs-2.0/reuse/source_install-nebula-graph-by-rpm-or-deb.md b/docs-2.0/reuse/source_install-nebula-graph-by-rpm-or-deb.md index f3528b956fa..9ee8e045f13 100644 --- a/docs-2.0/reuse/source_install-nebula-graph-by-rpm-or-deb.md +++ b/docs-2.0/reuse/source_install-nebula-graph-by-rpm-or-deb.md @@ -135,7 +135,7 @@ Prepare the right [resources](https://docs.nebula-graph.io/{{nebula.release}}/4. * Use the following syntax to install with a DEB package. ```bash - $ sudo dpkg -i --instdir==<installation_path> <package_name> + $ sudo dpkg -i --instdir=<installation_path> <package_name> ``` For example, to install a DEB package in the default path for the {{nebula.release}} version. diff --git a/docs-2.0/reuse/source_manage-service.md b/docs-2.0/reuse/source_manage-service.md index d1be4919da7..a98c9e8609d 100644 --- a/docs-2.0/reuse/source_manage-service.md +++ b/docs-2.0/reuse/source_manage-service.md @@ -124,9 +124,9 @@ $ sudo /usr/local/nebula/scripts/nebula.service status all * Nebula Graph is running normally if the following information is returned. ```bash - [INFO] nebula-metad(de03025): Running as 26601, Listening on 9559 - [INFO] nebula-graphd(de03025): Running as 26644, Listening on 9669 - [INFO] nebula-storaged(de03025): Running as 26709, Listening on 9779 + [INFO] nebula-metad(02b2091): Running as 26601, Listening on 9559 + [INFO] nebula-graphd(02b2091): Running as 26644, Listening on 9669 + [INFO] nebula-storaged(02b2091): Running as 26709, Listening on 9779 ``` * If the returned result is similar to the following one, there is a problem. You may also go to the [Nebula Graph community](https://discuss.nebula-graph.io/) for help.
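The status output shown above lends itself to a scripted health check. A minimal sketch: the `STATUS_OUTPUT` value is hard-coded sample output for illustration; in practice you would capture it from `sudo /usr/local/nebula/scripts/nebula.service status all`.

```shell
# Hedged sketch: count services whose status line does not say "Running".
# STATUS_OUTPUT is a hard-coded sample; capture the real output with:
#   STATUS_OUTPUT=$(sudo /usr/local/nebula/scripts/nebula.service status all)
STATUS_OUTPUT='[INFO] nebula-metad(02b2091): Running as 26601, Listening on 9559
[INFO] nebula-graphd(02b2091): Running as 26644, Listening on 9669
[INFO] nebula-storaged(02b2091): Running as 26709, Listening on 9779'

# grep -c -v prints the number of lines that do not contain "Running";
# it exits non-zero when that count is 0, hence the || true.
NOT_RUNNING=$(printf '%s\n' "$STATUS_OUTPUT" | grep -c -v 'Running' || true)
if [ "$NOT_RUNNING" -eq 0 ]; then
    echo "all services running"
else
    echo "$NOT_RUNNING service(s) not running" >&2
fi
```

A check like this can be dropped into a cron job or monitoring agent; a non-zero count is the signal to inspect the corresponding service logs.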
diff --git a/mkdocs.yml b/mkdocs.yml index fd8b7f0aa88..dbedb670375 100755 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -42,23 +42,23 @@ extra_css: - css/version-select.css - stylesheets/extra.css extra: + analytics: + provider: google + property: UA-60523578-15 nebula: release: master nightly: nightly master: master base20: 2.0 base200: 2.0.0 - branch: master + branch: v3.0.0 version: method: mike social: - icon: 'fontawesome/brands/github' link: 'https://github.com/vesoft-inc/nebula-docs' studio: - base111b: 1.1.1-beta - base220: 2.2.1 - base300: 3.0.0 - release: 3.2.1 + release: 3.2.2 explorer: base100: 1.0.0 release: 2.2.0 @@ -75,7 +75,7 @@ extra: algorithm: release: 3.0.0 branch: v3.0.0 - analytics: + plato: release: 1.0.0 branch: v1.0.0 sparkconnector: @@ -126,6 +126,9 @@ extra: nav: - About: README.md - Introduction: + - Introduction to graphs: 1.introduction/0-0-graph.md + - Graph databases: 1.introduction/0-1-graph-database.md + - Related technologies: 1.introduction/0-2.relates.md - What is Nebula Graph: 1.introduction/1.what-is-nebula-graph.md - Data model: 1.introduction/2.data-model.md - Path: 1.introduction/2.1.path.md @@ -517,6 +520,7 @@ nav: - Ecosystem tools: 20.appendix/6.eco-tool-version.md - Write tools: 20.appendix/write-tools.md - How to contribute: 15.contribution/how-to-contribute.md + - History timeline: 20.appendix/history.md - 中文手册: https://docs.nebula-graph.com.cn/ - PDF: ./pdf/NebulaGraph-EN.pdf @@ -580,10 +584,6 @@ plugins: #show_anchors: true #verbose: true -google_analytics: - - UA-60523578-16 - - auto - extra_javascript: - js/version-select.js - js/jquery.js