From 4d5c1366696556227cdf36876a67158dbba22ff7 Mon Sep 17 00:00:00 2001 From: Venu Vardhan Reddy Tekula Date: Wed, 19 Oct 2022 19:34:23 -0400 Subject: [PATCH] Revert a few changes and completed the work Signed-off-by: Venu Vardhan Reddy Tekula --- CONTRIBUTING.md | 36 ++-- basics/dockerhub.md | 10 +- basics/install.md | 6 +- basics/quick.md | 4 +- cases-chaoss/intro.md | 4 +- docs/getting-started/dev-setup.md | 8 +- docs/getting-started/setup.md | 24 ++- docs/getting-started/troubleshooting.md | 37 ++-- gelk/kidash.md | 8 +- gelk/meetup.md | 4 +- gelk/simple.md | 12 +- gelk/sortinghat.md | 8 +- graal/cocom.md | 4 +- manuscripts/first.md | 2 +- python/es-dsl.md | 4 +- python/es.md | 174 +++++++++--------- sirmordred/container.md | 4 +- sirmordred/micro-mordred.md | 2 +- sortinghat/basic.md | 28 +-- sortinghat/data.md | 18 +- .../csv-from-jenkins-enriched-index.md | 4 +- tools-and-tips/html5-app-latest-activity.md | 4 +- tools-and-tips/perceval.md | 2 +- 23 files changed, 213 insertions(+), 194 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 14042d5f..16bdac05 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -18,10 +18,10 @@ or suggest something. Any feedback is appreciated! If you are willing to setup the tutorial locally ```bash -$ git clone https://github.com/chaoss/grimoirelab-tutorial -$ cd grimoirelab-tutorial -$ bundle -$ bundle exec jekyll serve +git clone https://github.com/chaoss/grimoirelab-tutorial +cd grimoirelab-tutorial +bundle +bundle exec jekyll serve ``` **Note:** Make sure you have git and ruby (version 2.7.x) installed. @@ -43,17 +43,17 @@ which is a fork (copy) of the GrimoireLab Tutorial. 2. Clone the forked git repository, and create in a local branch for your contribution. 
-```
-$ git clone https://github.com/username/grimoirelab-tutorial/
-$ cd grimoirelab-tutorial/
-$ git checkout -b new-branch-name
+```bash
+git clone https://github.com/username/grimoirelab-tutorial/
+cd grimoirelab-tutorial/
+git checkout -b new-branch-name
```

3. In this repository, set up a remote for the upstream (original grimoirelab-tutorial) git repository.

-```
-$ git remote add upstream https://github.com/chaoss/grimoirelab-tutorial/
+```bash
+git remote add upstream https://github.com/chaoss/grimoirelab-tutorial/
```

4. Now you can change the documentation and then commit it. Unless the
@@ -61,19 +61,19 @@ contribution really needs it, use a single commit, and comment in detail in the corresponding commit message what it is intended to do. If it fixes some bug, reference it (with the text "_Fixes #23_", for example, for issue number 23).

-```
-$ git add -A
-$ git commit -s
+```bash
+git add -A
+git commit -s
```

5. Once your contribution is ready, rebase your local branch with `upstream/master`, so that it merges cleanly with that branch, and push your local branch to a remote branch in your GitHub repository.

-```
-$ git fetch upstream
-$ git rebase upstream/master
-$ git push origin new-branch-name
+```bash
+git fetch upstream
+git rebase upstream/master
+git push origin new-branch-name
```

6. In the GitHub interface, produce a pull request from your branch (you will
@@ -97,7 +97,7 @@ For ensuring it, a bot checks all incoming commits.
For users of the git command line interface, a sign-off is accomplished with the `-s` as part of the commit command: -``` +```bash git commit -s -m 'This is a commit message' ``` diff --git a/basics/dockerhub.md b/basics/dockerhub.md index 24101a3b..476eabfd 100644 --- a/basics/dockerhub.md +++ b/basics/dockerhub.md @@ -8,13 +8,13 @@ To try `grimoirelab/full`, just type: ```bash docker run -p 127.0.0.1:5601:5601 \ - -v $(pwd)/credentials.cfg:/override.cfg \ - -t grimoirelab/full + -v $(pwd)/credentials.cfg:/override.cfg \ + -t grimoirelab/full ``` `credentials.cfg` should have a GitHub API token, in `mordred.cfg` format: -``` +```cfg [github] api-token = XXX ``` @@ -35,8 +35,8 @@ If you're running the container on Windows through Docker Quickstart Terminal an ```bash docker run -p x.x.x.x:5601:5601 \ - -v $(pwd)/credentials.cfg:/override.cfg \ - -t grimoirelab/full + -v $(pwd)/credentials.cfg:/override.cfg \ + -t grimoirelab/full ``` but replace the x'ed out IP address with the IP address of your VM that you got from `ifconfig`. If all goes well, once you see the docker command line print out "Elasticsearch Aliased: Created!", you should be able to go to 127.0.0.1:5601 on your host machine web browser and be able to access the GrimoireLab dashboard. diff --git a/basics/install.md b/basics/install.md index 1dbe01bf..a80d6a55 100644 --- a/basics/install.md +++ b/basics/install.md @@ -26,13 +26,13 @@ as detailed in the First, let's create our new environment. I like my Python virtual environments under the `venvs` subdirectory in my home directory, and in this case I will call it `gl` \(see how original I am!\): ```bash -python3 -m venv ~/venvs/gl +$ python3 -m venv ~/venvs/gl ``` Once the virtual environment is created, you can activate it: ```bash -source ~/venvs/gl/bin/activate +$ source ~/venvs/gl/bin/activate (gl) $ ``` @@ -215,7 +215,7 @@ sudo apt-get install build-essential Usually, you know you need this when you have a problem installing `dulwich`. 
For example, you check the output of `pip install` and you find: -``` +```bash dulwich/_objects.c:21:10: fatal error: Python.h: No such file or Directory ``` diff --git a/basics/quick.md b/basics/quick.md index e0cc1b44..a9179198 100644 --- a/basics/quick.md +++ b/basics/quick.md @@ -29,13 +29,13 @@ Please check the [section on installing non-Python packages](install.html#non-python-pkgs) if you have any trouble. ```bash -(gl) % pip3 install grimoirelab +(gl) $ pip3 install grimoirelab ``` If everything went well, you can just check the version that you installed: ```bash -(gl) % grimoirelab -v +(gl) $ grimoirelab -v ``` And that's it. You can now skip the rest of this chapter diff --git a/cases-chaoss/intro.md b/cases-chaoss/intro.md index 5fd131af..20050ea9 100644 --- a/cases-chaoss/intro.md +++ b/cases-chaoss/intro.md @@ -15,8 +15,8 @@ The process will include the installation of the GrimoireLab tools needed, and w Let's start by installing GrimoireLab components: ```bash -python3 -m venv gl -source gl/bin/activate +$ python3 -m venv gl +$ source gl/bin/activate (gl) $ pip install grimoire-elk grimoire-kidash ``` diff --git a/docs/getting-started/dev-setup.md b/docs/getting-started/dev-setup.md index 1a3ff819..8bc12cca 100644 --- a/docs/getting-started/dev-setup.md +++ b/docs/getting-started/dev-setup.md @@ -161,10 +161,10 @@ while `upstream` points to the original CHAOSS repo. An example is provided below. 
```bash git remote -v -# origin https://github.com/valeriocos/perceval (fetch) -# origin https://github.com/valeriocos/perceval (push) -# upstream https://github.com/chaoss/grimoirelab-perceval (fetch) -# upstream https://github.com/chaoss/grimoirelab-perceval (push) +origin https://github.com/valeriocos/perceval (fetch) +origin https://github.com/valeriocos/perceval (push) +upstream https://github.com/chaoss/grimoirelab-perceval (fetch) +upstream https://github.com/chaoss/grimoirelab-perceval (push) ``` In order to add a remote to a Git repository, you can use the following command: diff --git a/docs/getting-started/setup.md b/docs/getting-started/setup.md index 8e5fbc2f..21742b73 100644 --- a/docs/getting-started/setup.md +++ b/docs/getting-started/setup.md @@ -26,26 +26,30 @@ through the following means. ```bash git --version -# git version 2.32.0 - +git version 2.32.0 +``` +```bash docker --version -# Docker version 20.10.7, build f0df35096d - +Docker version 20.10.7, build f0df35096d +``` +```bash docker-compose --version -# docker-compose version 1.28.5, build c4eb3a1f +docker-compose version 1.28.5, build c4eb3a1f ``` ### Hardware ```bash cat /proc/cpuinfo | grep processor | wc -l #View number of processors -# 4 - +4 +``` +```bash grep MemTotal /proc/meminfo #View amount of RAM available -# MemTotal: 8029848 kB - +MemTotal: 8029848 kB +``` +```bash sudo sysctl -w vm.max_map_count=262144 #Set virtual memory -# vm.max_map_count = 262144 +vm.max_map_count = 262144 ``` The reason for allocating `262144` for memory is the check that ElasticSearch diff --git a/docs/getting-started/troubleshooting.md b/docs/getting-started/troubleshooting.md index 35647d96..6329cd6f 100644 --- a/docs/getting-started/troubleshooting.md +++ b/docs/getting-started/troubleshooting.md @@ -36,7 +36,7 @@ parent: Getting Started It may also happen that the port, 5601, is already allocated to some other container. 
So running docker-compose will lead to the following error -```console +``` WARNING: Host is already in use by another container ``` @@ -45,13 +45,14 @@ that container. ```bash docker container ls # View all running containers -# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -# 01f0767adb47 grimoirelab/hatstall:latest "/bin/sh -c ${DEPLOY…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->80/tcp, :::8000->80/tcp docker-compose_hatstall_1 -# 9587614c7c4e bitergia/mordred:latest "/bin/sh -c ${DEPLOY…" 2 minutes ago Up 2 minutes (unhealthy) docker-compose_mordred_1 -# c3f3f118bead bitergia/kibiter:community-v6.8.6-3 "/docker_entrypoint.…" 2 minutes ago Up 2 minutes 0.0.0.0:5601->5601/tcp, :::5601->5601/tcp docker-compose_kibiter_1 -# d3c691acaf7b mariadb:10.0 "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 3306/tcp docker-compose_mariadb_1 -# f5f406146ee9 docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.6 "/usr/local/bin/dock…" 2 minutes ago Up 2 minutes 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp docker-compose_elasticsearch_1 - +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +01f0767adb47 grimoirelab/hatstall:latest "/bin/sh -c ${DEPLOY…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->80/tcp, :::8000->80/tcp docker-compose_hatstall_1 +9587614c7c4e bitergia/mordred:latest "/bin/sh -c ${DEPLOY…" 2 minutes ago Up 2 minutes (unhealthy) docker-compose_mordred_1 +c3f3f118bead bitergia/kibiter:community-v6.8.6-3 "/docker_entrypoint.…" 2 minutes ago Up 2 minutes 0.0.0.0:5601->5601/tcp, :::5601->5601/tcp docker-compose_kibiter_1 +d3c691acaf7b mariadb:10.0 "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 3306/tcp docker-compose_mariadb_1 +f5f406146ee9 docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.6 "/usr/local/bin/dock…" 2 minutes ago Up 2 minutes 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp docker-compose_elasticsearch_1 +``` +```bash docker rm -f c3f3f118bead #c3f3f118bead is the container that is using port 5601. 
``` @@ -76,7 +77,7 @@ localhost:9200` messages. Diagnosis Check for the following log in the output of `docker-compose up` -```console +```bash elasticsearch_1 | ERROR: [1] bootstrap checks failed elasticsearch_1 | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144] ``` @@ -103,13 +104,13 @@ Indication Cannot open `localhost:9200` in browser, shows `Secure connection Failed` ```bash curl -XGET localhost:9200 -k -# curl: (52) Empty reply from server +curl: (52) Empty reply from server ``` Diagnosis Check for the following log in the output of `docker-compose up` -```console +```bash elasticsearch_1 | [2020-03-12T13:05:34,959][WARN ][c.f.s.h.SearchGuardHttpServerTransport] [Xrb6LcS] Someone (/172.18.0.1:59838) speaks http plaintext instead of ssl, will close the channel ``` @@ -146,7 +147,7 @@ Can't create indices in Kibana. Nothing happens after clicking create index. Diagnosis Check for the following log in the output of `docker-compose up` -```console +```bash elasticsearch_1 |[INFO ][c.f.s.c.PrivilegesEvaluator] No index-level perm match for User [name=readall, roles=[readall], requestedTenant=null] [IndexType [index=.kibana, type=doc]] [Action [[indices:data/write/index]]] [RolesChecked [sg_own_index, sg_readall]] elasticsearch_1 | [c.f.s.c.PrivilegesEvaluator] No permissions for {sg_own_index=[IndexType [index=.kibana, type=doc]], sg_readall=[IndexType [index=.kibana, type=doc]]} kibiter_1 | {"type":"response","@timestamp":CURRENT_TIME,"tags":[],"pid":1,"method":"post","statusCode":403,"req":{"url":"/api/saved_objects/index-pattern?overwrite=false","method":"post","headers":{"host":"localhost:5601","user-agent":YOUR_USER_AGENT,"accept":"application/json, text/plain, /","accept-language":"en-US,en;q=0.5","accept-encoding":"gzip, 
deflate","referer":"http://localhost:5601/app/kibana","content-type":"application/json;charset=utf-8","kbn-version":"6.1.4-1","content-length":"59","connection":"keep-alive"},"remoteAddress":YOUR_IP,"userAgent":YOUR_IP,"referer":"http://localhost:5601/app/kibana"},"res":{"statusCode":403,"responseTime":25,"contentLength":9},"message":"POST /api/saved_objects/index-pattern?overwrite=false 403 25ms - 9.0B"} @@ -167,11 +168,11 @@ Indication and Diagnosis Check for the following error after executing [Micro Mordred](https://github.com/chaoss/grimoirelab-sirmordred/tree/master/sirmordred/utils/micro.py) using the below command (assuming `git` is the backend) -```console +```bash micro.py --raw --enrich --panels --cfg ./setup.cfg --backends git ``` -```console +```bash [git] Problem executing study enrich_areas_of_code:git, RequestError(400, 'search_phase_execution_exception', 'No mapping found for [metadata__timestamp] in order to sort on') ``` @@ -218,13 +219,13 @@ Indication Cannot open `localhost:9200` in browser, shows `Secure connection Failed` ```bash curl -XGET localhost:9200 -k -# curl: (7) Failed to connect to localhost port 9200: Connection refused +curl: (7) Failed to connect to localhost port 9200: Connection refused ``` Diagnosis Check for the following log in the output of `docker-compose up` -```console +```bash elasticsearch_1 | ERROR: [1] bootstrap checks failed elasticsearch_1 | [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536] ``` @@ -295,7 +296,7 @@ Indication Diagnosis -```console +```bash Retrying (Retry(total=10,connected=21,read=0,redirect=5,status=None)) after connection broken by 'SSLError(SSLError{1,'[SSL: WRONG_VERSION_NUMBER] wrong version number {_ssl.c:852}'},)': / ``` @@ -317,7 +318,7 @@ url = http://localhost:9200 Diagnosis -```console +```bash : [Errno 2]No such file or directory : 'cloc': 'cloc' ``` diff --git a/gelk/kidash.md b/gelk/kidash.md index 70cfe7e4..927ae3b0 100644 --- 
a/gelk/kidash.md +++ b/gelk/kidash.md @@ -14,13 +14,13 @@ You can save a dashboard, with all its components, to a file, either for backup ```bash -(grimoireelk) kidash -e http://localhost:9200 --dashboard "Git" --export /tmp/dashboard-git.json +kidash -e http://localhost:9200 --dashboard "Git" --export /tmp/dashboard-git.json ``` You can learn the name of the dashboard by looking at its top left corner, or by noting the name you use when opening it in Kibana. If the name includes spaces, use "-" instead. For example, for a dashboard named "Git History", use the line: ```bash -(grimoireelk) kidash -e http://localhost:9200 --dashboard "Git-History" \ +kidash -e http://localhost:9200 --dashboard "Git-History" \ --export /tmp/dashboard-git.json ``` @@ -42,7 +42,7 @@ We already restored a dashboard in the We can restore from any file created with kidash. Assuming we have that file as `/tmp/dashboard-git.json`, we need to know the link to the ElasticSearch REST interface (same as for backing up). The format is, for example, as follows: ```bash -(grimoireelk) $ kidash --elastic_url http://localhost:9200 \ +kidash --elastic_url http://localhost:9200 \ --import /tmp/dashboard-git.json ``` @@ -53,5 +53,5 @@ This will restore all elements in the file, overwriting, if needed, elements wit Kidash has some more options. 
For a complete listing, use the `--help` argument: ```bash -(grimoireelk) $ kidash --help +kidash --help ``` diff --git a/gelk/meetup.md b/gelk/meetup.md index ab6ff984..92039df4 100644 --- a/gelk/meetup.md +++ b/gelk/meetup.md @@ -27,8 +27,8 @@ Note: If your site redirects on page load, you may not see the code in the final For each of the group names, you only need to run the following command, assuming the group name is `group_name` and the Meetup API key is `meetup_key`: ```bash -(gl) $ p2o.py --enrich --index meetup_raw --index-enrich meetup \ --e http://localhost:9200 --no_inc --debug meetup group_name -t meetup_key --tag group_name +p2o.py --enrich --index meetup_raw --index-enrich meetup \ + -e http://localhost:9200 --no_inc --debug meetup group_name -t meetup_key --tag group_name ``` If the group has a sizable activity, the command will be retrieving data for a while, and uploading it to ElasticSearch, producing: diff --git a/gelk/simple.md b/gelk/simple.md index aa0015fc..666ed25a 100644 --- a/gelk/simple.md +++ b/gelk/simple.md @@ -16,11 +16,13 @@ Let's run `p2o.py` to create the indexes in ElasticSearch. We will create both t As an example, we produce indexes for two git repositories: those of Perceval and GrimoireELK. We will use `git_raw` as the name for the raw index, and `git` for the enriched one. We will store indexes in our ElasticSearch instance listening at `http://localhost:9200`. Each of the following commands will retrieve and enrich data for one of the git repositories: ```bash -(gl) $ p2o.py --enrich --index git_raw --index-enrich git \ +p2o.py --enrich --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ git https://github.com/grimoirelab/perceval.git ... -(gl) $ p2o.py --enrich --index git_raw --index-enrich git \ +``` +```bash +p2o.py --enrich --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ git https://github.com/grimoirelab/GrimoireELK.git ... 
@@ -44,7 +46,7 @@ Download it to your `/tmp` directory (Note: Please use 'Save Link as' option for downloading), and run the command: ```bash -(grimoireelk) $ kidash --elastic_url http://localhost:9200 \ +kidash --elastic_url http://localhost:9200 \ --import /tmp/git-dashboard.json ``` @@ -68,7 +70,7 @@ Using the Kibiter/Kibana interface it is simple to modify the dashboard, its vis `p2o.py` can be used to produce indexes for many other data sources. For example for GitHub issues and pull requests, the magic line is like this \(of course, substitute XXX for your GitHub token\): ```bash -(grimoireelk) $ p2o.py --enrich --index github_raw --index-enrich github \ +p2o.py --enrich --index github_raw --index-enrich github \ -e http://localhost:9200 --no_inc --debug \ github grimoirelab perceval \ -t XXX --sleep-for-rate @@ -79,7 +81,7 @@ In this case, you can use the Download it to your `/tmp` directory (Note: Please use 'Save Link as' option for downloading), and run the command: ```bash -(grimoireelk) $ kidash --elastic_url http://localhost:9200 \ +kidash --elastic_url http://localhost:9200 \ --import /tmp/github-dashboard.json ``` diff --git a/gelk/sortinghat.md b/gelk/sortinghat.md index 389331b4..a26fafa3 100644 --- a/gelk/sortinghat.md +++ b/gelk/sortinghat.md @@ -19,7 +19,7 @@ you need to initialize a database for it. Usually, each dashboard will have its own SortingHat database, although several dashboards can share the same. Initializing the database means creating the SQL schema for it, initializing its tables, and not much more. But you don't need to know about the details: SortingHat will take care of that for you. 
Just run `sortinghat init` with the appropriate options: ```bash -(gl) $ sortinghat -u user -p XXX init shdb +sortinghat -u user -p XXX init shdb ``` In this case, `user` is a user of the MySQL instance with permissions to create a new MySQL schema (database), `XXX` is the password for that user, and `shdb` is the name of the database to be created. @@ -47,7 +47,7 @@ For creating the indexes, we run `p2o.py` the same way we have done before, but For example, for producing the index for the git repository for Perceval, run: ```bash -(gl) $ p2o.py --enrich --index git_raw --index-enrich git \ +p2o.py --enrich --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ --db-host localhost --db-sortinghat shdb --db-user user --db-password XXX \ git https://github.com/grimoirelab/perceval.git @@ -71,14 +71,14 @@ This will show all the identities found in the Perceval git repository. Let's produce now a Kibana dashboard for our enriched index (`git` in our ElasticSearch instance). I will start by installing `kidash`, to upload a JSON description of the dashboard, its visualizations, and everything needed: ```bash -(sh) $ pip install kidash +pip install kidash ``` Then, I use the JSON description of a dashboard for Git that includes visualizations for some fields generated from the SortingHat database: [git-sortinghat.json](dashboards/git-sortinghat.json). ```bash -(sh) $ kidash --elastic_url http://localhost:9200 \ +kidash --elastic_url http://localhost:9200 \ --import /tmp/git-sortinghat.json ``` diff --git a/graal/cocom.md b/graal/cocom.md index 6b92bf70..b548d45e 100644 --- a/graal/cocom.md +++ b/graal/cocom.md @@ -12,7 +12,7 @@ Once you've successfully installed Graal, you can get started real quick with the command line interface as easy as - ```sh -(graal) $ graal cocom --help +graal cocom --help ``` **Note:** You can invoke other available backends in a similar way. 
@@ -22,7 +22,7 @@ Once you've successfully installed Graal, you can get started real quick with th - Let's start our analysis with the host repository itself. As you can see the positional parameter is added with the repository url and `git-path` flag is used to define the path where the git repository will be cloned. ```sh -(graal) $ graal cocom https://github.com/chaoss/grimoirelab-graal --git-path /tmp/graal-cocom +graal cocom https://github.com/chaoss/grimoirelab-graal --git-path /tmp/graal-cocom [2019-03-27 21:32:03,719] - Starting the quest for the Graal. [2019-03-27 21:32:11,663] - Git worktree /tmp/worktrees/graal-cocom created! [2019-03-27 21:32:11,663] - Fetching commits: 'https://github.com/chaoss/grimoirelab-graal' git repository from 1970-01-01 00:00:00+00:00 to 2100-01-01 00:00:00+00:00; all branches diff --git a/manuscripts/first.md b/manuscripts/first.md index c6a56954..d21e315b 100644 --- a/manuscripts/first.md +++ b/manuscripts/first.md @@ -11,7 +11,7 @@ Reporting with GrimoireLab Manuscripts is easy. You need to have enriched Elasti For example, to produce a report about Git data in the standard GrimoireLab enriched index in my local ElasticSearch (accessible in the standard [http://localhost:9200](http://localhost:9200) location), you only need to run: ```bash -(gl) $ manuscripts -d /tmp/reports -u http://localhost:9200 \ +manuscripts -d /tmp/reports -u http://localhost:9200 \ -n GrimoireLab --data-sources git ``` diff --git a/python/es-dsl.md b/python/es-dsl.md index 9c9cef3c..79f9af48 100644 --- a/python/es-dsl.md +++ b/python/es-dsl.md @@ -5,7 +5,7 @@ The `elasticsearch` Python module may seem good enough to query ElasticSearch vi To install it, just use pip: ```bash -(perceval) $ pip install elasticsearch_dsl +pip install elasticsearch_dsl ``` It needs the `elasticsearch` Python module to work, but you'll have it already installed, or will be pulled in via dependencies, so don't worry about it. 
@@ -66,4 +66,4 @@ for commit in response: print(commit.hash, commit.author_date, commit.author) ``` -Now, instead of `scan()`, we use `execute()` which allows for slicing (note the line where we slice `request`), and preserves order. \ No newline at end of file +Now, instead of `scan()`, we use `execute()` which allows for slicing (note the line where we slice `request`), and preserves order. diff --git a/python/es.md b/python/es.md index 7fc2a0a6..b2e325b7 100644 --- a/python/es.md +++ b/python/es.md @@ -13,7 +13,7 @@ Instead of that, we will move one abstraction layer up, and will use the [elasti So, let's start with the basics of using the `elasticsearch` module. To begin with, we will add the module to our virtual environment, using pip: ```bash -(perceval) $ pip install elasticsearch +pip install elasticsearch ``` Now we can write some Python code to test it @@ -48,7 +48,7 @@ This little script assumes that we're running a local instance of ElasticSearch, When running it, you'll see the objects with the hashes being printed in the screen, right before they are uploaded to ElasticSearch: ```bash -(perceval) $ python perceval_elasticsearch_1.py +python perceval_elasticsearch_1.py {'hash': 'dc78c254e464ff334892e0448a23e4cfbfc637a3'} {'hash': '57bc204822832a6c23ac7883e5392f4da6f4ca37'} {'hash': '2355d18310d8e15c8e5d44f688d757df33b0e4be'} @@ -59,32 +59,32 @@ Once you run the script, the `commits` index is created in ElasticSearch. 
You ca ```bash curl -XGET http://localhost:9200/commits?pretty -# { -# "commits" : { -# "aliases" : { }, -# "mappings" : { -# "summary" : { -# "properties" : { -# "hash" : { -# "type" : "string" -# } -# } -# } -# }, -# "settings" : { -# "index" : { -# "creation_date" : "1476470820231", -# "number_of_shards" : "5", -# "number_of_replicas" : "1", -# "uuid" : "7DSlRG8ZSTuE1pMboG07yg", -# "version" : { -# "created" : "2020099" -# } -# } -# }, -# "warmers" : { } -# } -# } +{ + "commits" : { + "aliases" : { }, + "mappings" : { + "summary" : { + "properties" : { + "hash" : { + "type" : "string" + } + } + } + }, + "settings" : { + "index" : { + "creation_date" : "1476470820231", + "number_of_shards" : "5", + "number_of_replicas" : "1", + "uuid" : "7DSlRG8ZSTuE1pMboG07yg", + "version" : { + "created" : "2020099" + } + } + }, + "warmers" : { } + } +} ``` ## Deleting is important as well @@ -93,7 +93,7 @@ If you want to delete the index (for example, to run the script once again) you ```bash curl -XDELETE http://localhost:9200/commits -# {"acknowledged":true} +{"acknowledged":true} ``` If you don't do this, before running the previous script once again, you'll see an exception such as: @@ -176,33 +176,33 @@ After running it (deleting any previous `commits` index if needed), we have a ne ```bash curl -XGET "http://localhost:9200/commits/_search/?size=1&pretty" -# { -# "took" : 2, -# "timed_out" : false, -# "_shards" : { -# "total" : 5, -# "successful" : 5, -# "failed" : 0 -# }, -# "hits" : { -# "total" : 407, -# "max_score" : 1.0, -# "hits" : [ { -# "_index" : "commits", -# "_type" : "summary", -# "_id" : "AVfPp9Po5xUyv5saVPKU", -# "_score" : 1.0, -# "_source" : { -# "hash" : "d1253dd9876bb76e938a861acaceaae95241b46d", -# "commit" : "Santiago Dueñas ", -# "author" : "Santiago Dueñas ", -# "author_date" : "Wed Nov 18 10:59:52 2015 +0100", -# "files_no" : 3, -# "commit_date" : "Wed Nov 18 14:41:21 2015 +0100" -# } -# } ] -# } -# } +{ + "took" : 2, + "timed_out" : false, + 
"_shards" : { + "total" : 5, + "successful" : 5, + "failed" : 0 + }, + "hits" : { + "total" : 407, + "max_score" : 1.0, + "hits" : [ { + "_index" : "commits", + "_type" : "summary", + "_id" : "AVfPp9Po5xUyv5saVPKU", + "_score" : 1.0, + "_source" : { + "hash" : "d1253dd9876bb76e938a861acaceaae95241b46d", + "commit" : "Santiago Dueñas ", + "author" : "Santiago Dueñas ", + "author_date" : "Wed Nov 18 10:59:52 2015 +0100", + "files_no" : 3, + "commit_date" : "Wed Nov 18 14:41:21 2015 +0100" + } + } ] + } +} ``` Since we specified in the query we only wanted one document (`size=1`), we get a list of `hits` with a single document. But we can see also how there are a total of 407 documents (field `total` within field `hits`). For each document, we can see the information we have stored, which are the contents of `_source`. @@ -212,7 +212,7 @@ Since we specified in the query we only wanted one document (`size=1`), we get a Every index in ElasticSearch has a 'mapping'. Mappings specify how the index is, for example in terms of data types. If we don't specify a mapping before uploading data to an index, ElasticSearch will infere the mapping from the data. 
Therefore, even when we created no mapping for it, we can have a look at the mapping for the recently created index: ```bash -(perceval) $ curl -XGET "http://localhost:9200/commits/_mapping?pretty" +curl -XGET "http://localhost:9200/commits/_mapping?pretty" { "commits" : { "mappings" : { @@ -263,36 +263,36 @@ Instead of using the character strings that we get from Perceval as values for t ```bash curl -XGET "http://localhost:9200/commits/_mapping?pretty" -# { -# "commits" : { -# "mappings" : { -# "summary" : { -# "properties" : { -# "author" : { -# "type" : "string" -# }, -# "author_date" : { -# "type" : "date", -# "format" : "strict_date_optional_time||epoch_millis" -# }, -# "commit" : { -# "type" : "string" -# }, -# "commit_date" : { -# "type" : "date", -# "format" : "strict_date_optional_time||epoch_millis" -# }, -# "files_no" : { -# "type" : "long" -# }, -# "hash" : { -# "type" : "string" -# } -# } -# } -# } -# } -# } +{ + "commits" : { + "mappings" : { + "summary" : { + "properties" : { + "author" : { + "type" : "string" + }, + "author_date" : { + "type" : "date", + "format" : "strict_date_optional_time||epoch_millis" + }, + "commit" : { + "type" : "string" + }, + "commit_date" : { + "type" : "date", + "format" : "strict_date_optional_time||epoch_millis" + }, + "files_no" : { + "type" : "long" + }, + "hash" : { + "type" : "string" + } + } + } + } + } +} ``` So, now we have a more complete index for commits, and each of the fields in it have reasonable types in the ElasticSearch mapping. 
diff --git a/sirmordred/container.md b/sirmordred/container.md
index 605c1585..8e30cb89 100644
--- a/sirmordred/container.md
+++ b/sirmordred/container.md
@@ -25,7 +25,7 @@ docker run -p 127.0.0.1:5601:5601 \
`credentials.cfg` should have a GitHub API token (see [Personal GitHub API tokens](https://github.com/blog/1509-personal-api-tokens)), in a `mordred.cfg` format:

-```
+```cfg
[github]
api-token = XXX
```
@@ -100,7 +100,7 @@ This will make the container launch all services, but not running `sirmordred`: For running the `grimoirelab/installed` docker image, first set up the supporting systems in your host, as detailed in the [Supporting systems](../basics/supporting.md) section. Finally, compose a SirMordred configuration file with credentials that references the supporting systems. For example:

-```
+```cfg
[es_collection]
url = http://localhost:9200
user =
diff --git a/sirmordred/micro-mordred.md b/sirmordred/micro-mordred.md
index 37aae4df..1570d520 100644
--- a/sirmordred/micro-mordred.md
+++ b/sirmordred/micro-mordred.md
@@ -16,7 +16,7 @@ 1. We'll use the following docker-compose configuration to instantiate the required components, i.e., ElasticSearch, Kibiter and MariaDB. Note that we can omit the `mariadb` section in case MySQL/MariaDB is already installed on our system. We'll name the following configuration as `docker-config.yml`.
-``` +```yml elasticsearch: restart: on-failure:5 image: bitergia/elasticsearch:6.1.0-secured diff --git a/sortinghat/basic.md b/sortinghat/basic.md index e0dae5bd..a8507e4c 100644 --- a/sortinghat/basic.md +++ b/sortinghat/basic.md @@ -28,9 +28,9 @@ It is obvious that there are some repo identities in it that correspond to the s For example, let's merge repo identity `4fcec5a` (dpose, dpose@sega.bitergia.net) with `5b358fc` (dpose, dpose@bitergia.com), which I know correspond to the same person: ```bash - (gl) $ sortinghat -u user -p XXX -d shdb merge \ - 4fcec5a968246d8342e4acfceb9174531c8545c1 5b358fc11019cf2c03ea4c162009e89715e590dd - Unique identity 4fcec5a968246d8342e4acfceb9174531c8545c1 merged on 5b358fc11019cf2c03ea4c162009e89715e590dd +sortinghat -u user -p XXX -d shdb merge \ + 4fcec5a968246d8342e4acfceb9174531c8545c1 5b358fc11019cf2c03ea4c162009e89715e590dd +Unique identity 4fcec5a968246d8342e4acfceb9174531c8545c1 merged on 5b358fc11019cf2c03ea4c162009e89715e590dd ``` Notice that we had to use the complete hashes (in the table above, and in the listing in the previous section, we shortened them just for readability). 
What we have done is to merge `4fcec5a` on `5b358fc`, and the result is: @@ -47,13 +47,17 @@ The query looked for all rows in the `identities` table whose `uuid` field start We can follow this procedure for other identities that correspond to the same person: (Quan Zhou, quan@bitergia.com) and (quan, zhquan7@gmail.com); (Alberto Martín, alberto.martin@bitergia.com) and (Alberto Martín, albertinisg@users.noreply.github.com); and (Alvaro del Castillo, acs@thelma.cloud) and (Alvaro del Castillo, acs@bitergia.com): ```bash -(gl) $ sortinghat -u user -p XXX -d shdb merge \ +sortinghat -u user -p XXX -d shdb merge \ 0cac4ef12631d5b0ef2fa27ef09729b45d7a68c1 11cc0348b60711cdee515286e394c961388230ab Unique identity 0cac4ef12631d5b0ef2fa27ef09729b45d7a68c1 merged on 11cc0348b60711cdee515286e394c961388230ab -(gl) $ sortinghat -u user -p XXX -d shdb merge \ +``` +```bash +sortinghat -u user -p XXX -d shdb merge \ 35c0421704928bcbe3a0d9a4de1d79f9590ccaa9 37a8187909592a7b78559399105f6b5404af9e4e Unique identity 35c0421704928bcbe3a0d9a4de1d79f9590ccaa9 merged on 37a8187909592a7b78559399105f6b5404af9e4e -(gl) $ sortinghat -u user -p XXX -d shdb merge \ +``` +```bash +sortinghat -u user -p XXX -d shdb merge \ 7ad0031fa2db40a5149f54dfc2ec2a355e9443cd 9aed245d9df109f8d00ca0e656121c3bdde46a2a Unique identity 7ad0031fa2db40a5149f54dfc2ec2a355e9443cd merged on 9aed245d9df109f8d00ca0e656121c3bdde46a2a ``` @@ -63,7 +67,7 @@ Unique identity 7ad0031fa2db40a5149f54dfc2ec2a355e9443cd merged on 9aed245d9df10 Now, we can check how SortingHat is storing information about these merged identities, but instead of querying directly the database, we can just use `sortinghat`: ```bash -(gl) $ sortinghat -u user -p XXX -d shdb show \ +sortinghat -u user -p XXX -d shdb show \ 11cc0348b60711cdee515286e394c961388230ab unique identity 11cc0348b60711cdee515286e394c961388230ab @@ -87,7 +91,7 @@ We merged the repo identity (Quan Zhou, quan@bitergia.com) on the unique identit Unfortunately, we cannot redo the 
merge with the most convenient order:

```bash
-(gl) $ sortinghat -u user -p XXX -d shdb merge \
+sortinghat -u user -p XXX -d shdb merge \
 11cc0348b60711cdee515286e394c961388230ab 0cac4ef12631d5b0ef2fa27ef09729b45d7a68c1
Error: 0cac4ef12631d5b0ef2fa27ef09729b45d7a68c1 not found in the registry
```
@@ -101,7 +105,7 @@ Later on we will revisit this case, since there are stuff that can be done: brea
We can just modify the profile for the unique identity, thus changing the profile for a person:

```bash
-(gl) $ sortinghat -u user -p XXX -d shdb profile \
+sortinghat -u user -p XXX -d shdb profile \
 --name "Quan Zhou" --email "quan@bitergia.com" \
 11cc0348b60711cdee515286e394c961388230ab
unique identity 11cc0348b60711cdee515286e394c961388230ab
@@ -122,7 +126,7 @@ When we interact with SortingHat, it only changes the contents of the database i
To make changes appear in the dashboard, we need to create new enriched indexes (re-enrich the indexes). We can do that by removing raw and enriched indexes from ElasticSearch, and then running the same `p2o.py` commands shown to produce new raw and enriched indexes. But in our case, this is clearly overkill: we don't need to retrieve new raw indexes from the repositories, since they are fine. We only need to produce new enriched indexes. For that, we can run `p2o.py` as follows:

```
-(gl) $ p2o.py --only-enrich --index git_raw --index-enrich git \
+p2o.py --only-enrich --index git_raw --index-enrich git \
 -e http://localhost:9200 --no_inc --debug \
 --db-host localhost --db-sortinghat shdb --db-user user --db-password XXX \
 git https://github.com/grimoirelab/GrimoireELK.git
@@ -137,7 +141,7 @@ In this case, the command will create a new `git` index (by modifying the curren
The above method, even though it will work, is still overkill. We really don't need to modify the whole enriched indexes, updating all the fields in their items.
We just need to update the fields related to identities, which are the only ones that we need to change. For that, we have a specific option to `p2o.py`: ``` -(gl) $ p2o.py --only-enrich --refresh-identities --index git_raw --index-enrich git \ +p2o.py --only-enrich --refresh-identities --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ --db-host localhost --db-sortinghat shdb --db-user user --db-password XXX \ git https://github.com/grimoirelab/GrimoireELK.git @@ -152,7 +156,7 @@ In most cases, when the SortingHat database is modified, only a handful of ident In this case, the command to run is: ``` -(gl) $ p2o.py --only-enrich --refresh-identities --index git_raw --index-enrich git \ +p2o.py --only-enrich --refresh-identities --index git_raw --index-enrich git \ --author_uuid 11cc0348b60711cdee515286e394c961388230ab \ 0cac4ef12631d5b0ef2fa27ef09729b45d7a68c1 \ -e http://localhost:9200 --no_inc --debug \ diff --git a/sortinghat/data.md b/sortinghat/data.md index cb550bdc..b64d6d8b 100644 --- a/sortinghat/data.md +++ b/sortinghat/data.md @@ -16,23 +16,31 @@ In this chapter we will learn how to use SortingHat in combination to other Grim We will start by adding some more repositories to the index, to have some more complete data. Then we will use it to explore the capabilities of SortingHat for merging identities, for adding affiliations and for adapting profiles. 
```bash -(gl) $ p2o.py --enrich --index git_raw --index-enrich git \ +p2o.py --enrich --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ --db-host localhost --db-sortinghat shdb --db-user user --db-password XXX \ git https://github.com/grimoirelab/GrimoireELK.git -(gl) $ p2o.py --enrich --index git_raw --index-enrich git \ +``` +```bash +p2o.py --enrich --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ --db-host localhost --db-sortinghat shdb --db-user user --db-password XXX \ git https://github.com/grimoirelab/panels.git -(gl) $ p2o.py --enrich --index git_raw --index-enrich git \ +``` +```bash +p2o.py --enrich --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ --db-host localhost --db-sortinghat shdb --db-user user --db-password XXX \ git https://github.com/grimoirelab/mordred.git -(gl) $ p2o.py --enrich --index git_raw --index-enrich git \ +``` +```bash +p2o.py --enrich --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ --db-host localhost --db-sortinghat shdb --db-user user --db-password XXX \ git https://github.com/grimoirelab/arthur.git -(gl) $ p2o.py --enrich --index git_raw --index-enrich git \ +``` +```bash +p2o.py --enrich --index git_raw --index-enrich git \ -e http://localhost:9200 --no_inc --debug \ --db-host localhost --db-sortinghat shdb --db-user user --db-password XXX \ git https://github.com/grimoirelab/training.git diff --git a/tools-and-tips/csv-from-jenkins-enriched-index.md b/tools-and-tips/csv-from-jenkins-enriched-index.md index 12d1b71c..5dd5c4f2 100644 --- a/tools-and-tips/csv-from-jenkins-enriched-index.md +++ b/tools-and-tips/csv-from-jenkins-enriched-index.md @@ -7,8 +7,8 @@ To illustrate how to get data from an enriched index (produced using `grimoire_e To use it, we can create a new virtual environment for Python, and install the needed modules (including the script) in it. 
```bash -pyvenv ~/venv -source ~/venv/bin/activate +$ pyvenv ~/venv +$ source ~/venv/bin/activate (venv) $ pip install elasticsearch (venv) $ pip install elasticsearch-dsl (venv) $ wget https://raw.githubusercontent.com/jgbarah/GrimoireLab-training/master/tools-and-tips/scripts/enriched_elasticsearch_jenkins.py diff --git a/tools-and-tips/html5-app-latest-activity.md b/tools-and-tips/html5-app-latest-activity.md index 16b6c70c..b556bb8a 100644 --- a/tools-and-tips/html5-app-latest-activity.md +++ b/tools-and-tips/html5-app-latest-activity.md @@ -15,7 +15,7 @@ For deploying the HTML5 app, just copy `index.html`, `events.js`, and `events.cs ```bash python3 -m http.server -# Serving HTTP on 0.0.0.0 port 8000 ... +Serving HTTP on 0.0.0.0 port 8000 ... ``` Now, let's produce a JSON file with the events that the app will show. For that, we will install [`elastic_last.py`](https://github.com/jgbarah/GrimoireLab-training/blob/master/tools-and-tips/scripts/elastic_last.py) in a Python3 virtual environment with all the needed dependencies (in this case, it is enough to install, via `pip`, the `elasticsearch-dsl` module, and run it: @@ -30,7 +30,7 @@ If we're using a `git` index in an ElasticSearch instance accessible at `https:/ ```bash python3 elastic_last.py --no_verify_certs --loop 10 --total 10 \ -https://user:XXX@grimoirelab.biterg.io/data/git + https://user:XXX@grimoirelab.biterg.io/data/git ``` In both cases `--loop 10` will cause the script to retrieve the index every 10 seconds, and produce a file `events.json` with the latest 10 events in the index (commits in this case), because of the option `--total 10`. If you want, instead of just one url, you can include as many as you may want, one after the other, to retrieve data from several indexes every 10 seconds. The option `--no_verify_certs` is needed only if your Python installation has trouble checking the validity of the SSL certificates (needed because the url is using HTTPS). 
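To make the "latest N events" step more concrete, here is a minimal sketch (not the actual `elastic_last.py` code) of what the script does with the retrieved hits. The `latest_events` helper and the sample items are invented for illustration; the `grimoire_creation_date` field is the date field carried by GrimoireELK git enriched items:

```python
import json

def latest_events(hits, total=10):
    """Return the `total` most recent events, newest first.

    Each hit is assumed to be a dict with a sortable date string,
    here `grimoire_creation_date` as in git enriched indexes.
    """
    ordered = sorted(hits, key=lambda h: h["grimoire_creation_date"], reverse=True)
    return ordered[:total]

if __name__ == "__main__":
    # Hypothetical items, standing in for hits retrieved from ElasticSearch
    sample = [
        {"hash": "a1", "grimoire_creation_date": "2017-03-01T10:00:00"},
        {"hash": "b2", "grimoire_creation_date": "2017-03-03T09:30:00"},
        {"hash": "c3", "grimoire_creation_date": "2017-03-02T17:45:00"},
    ]
    # Dump the newest two events for the HTML5 app to read
    with open("events.json", "w") as f:
        json.dump(latest_events(sample, total=2), f, indent=2)
```

Running the sketch with `--total 2` semantics writes an `events.json` holding the two newest commits, newest first, which is the shape the app expects.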
diff --git a/tools-and-tips/perceval.md b/tools-and-tips/perceval.md index 2e08f799..ba31a035 100644 --- a/tools-and-tips/perceval.md +++ b/tools-and-tips/perceval.md @@ -8,7 +8,7 @@ This section shows some scripts using Perceval. ```bash python perceval_git_counter.py https://github.com/grimoirelab/perceval.git /tmp/ppp -# Number of commmits: 579. +Number of commmits: 579. ``` You can get a help banner, including options, by running