From d65830076a7002798603404bedcd6962a4cb5813 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E4=BA=91=E9=82=AA?= Date: Sun, 3 Feb 2019 11:01:36 +0800 Subject: [PATCH] [FLINK-11527] Support multiple languages for the framework of flink-web --- Gemfile | 4 +- _config.yml | 5 +- _data/i18n.yml | 26 ++ _includes/navbar.html | 53 ++-- community.zh.md | 472 +++++++++++++++++++++++++++++++++ contribute-code.zh.md | 319 ++++++++++++++++++++++ contribute-documentation.zh.md | 58 ++++ downloads.zh.md | 220 +++++++++++++++ ecosystem.zh.md | 102 +++++++ faq.zh.md | 90 +++++++ flink-applications.zh.md | 202 ++++++++++++++ flink-architecture.zh.md | 100 +++++++ flink-operations.zh.md | 72 +++++ gettinghelp.zh.md | 133 ++++++++++ how-to-contribute.zh.md | 149 +++++++++++ improve-website.zh.md | 106 ++++++++ index.zh.md | 356 +++++++++++++++++++++++++ poweredby.zh.md | 109 ++++++++ reviewing-prs.zh.md | 120 +++++++++ usecases.zh.md | 105 ++++++++ 20 files changed, 2783 insertions(+), 18 deletions(-) create mode 100644 _data/i18n.yml create mode 100755 community.zh.md create mode 100755 contribute-code.zh.md create mode 100755 contribute-documentation.zh.md create mode 100644 downloads.zh.md create mode 100644 ecosystem.zh.md create mode 100755 faq.zh.md create mode 100644 flink-applications.zh.md create mode 100644 flink-architecture.zh.md create mode 100644 flink-operations.zh.md create mode 100644 gettinghelp.zh.md create mode 100644 how-to-contribute.zh.md create mode 100755 improve-website.zh.md create mode 100755 index.zh.md create mode 100755 poweredby.zh.md create mode 100644 reviewing-prs.zh.md create mode 100644 usecases.zh.md diff --git a/Gemfile b/Gemfile index 931f8ce5a2..c05d9008ed 100755 --- a/Gemfile +++ b/Gemfile @@ -21,9 +21,11 @@ source 'https://rubygems.org' ruby '>=1.9.0' # Dependencies required to build the Flink docs -gem 'jekyll', '2.5.3' +gem 'jekyll', '3.0.5' gem 'kramdown', '1.10.0' gem 'pygments.rb', '0.6.3' gem 'therubyracer', '0.12.2' # explicitly require yajl-ruby (dependency of jekyll) in a version that works with Ruby 2.4 gem 'yajl-ruby', '1.2.2' +gem 'jekyll-multiple-languages', '2.0.3' +gem 'jekyll-paginate', '1.1.0' \ No newline at end of file diff --git a/_config.yml b/_config.yml index aeff24e954..09ec1e81af 100644 --- a/_config.yml +++ b/_config.yml @@ -273,6 +273,9 @@ defaults: markdown: KramdownPygments highlighter: pygments +# The all languages used +languages: ['en', 'zh'] + kramdown: toc_levels: 1..3 input: GFM @@ -286,7 +289,7 @@ redcarpet: exclude: [docker, flink-src-repo] -gems: [jekyll-paginate] +gems: ['jekyll-multiple-languages', 'jekyll-paginate'] host: 0.0.0.0 diff --git a/_data/i18n.yml b/_data/i18n.yml new file mode 100644 index 0000000000..06353b2c28 --- /dev/null +++ b/_data/i18n.yml @@ -0,0 +1,26 @@ +en: + what_is_flink: What is Apache Flink? + use_case: Use Cases + powered_by: Powered By + faq: FAQ + downloads: Downloads + tutorials: Tutorials + documentation: Documentation + getting_help: Getting Help + flink_blog: Flink Blog + community_project: Community & Project Info + how_to_contribute: How to Contribute + +zh: + what_is_flink: Apache Flink 是什么? 
+ use_case: 应用场景 + powered_by: Flink 用户 + faq: 常见问题 + downloads: 下载 + tutorials: 教程 + documentation: 文档 + getting_help: 获取帮助 + flink_blog: Flink 博客 + community_project: 社区 & 项目信息 + how_to_contribute: 如何参与贡献 + diff --git a/_includes/navbar.html b/_includes/navbar.html index 3123650dfa..3101837508 100755 --- a/_includes/navbar.html +++ b/_includes/navbar.html @@ -1,4 +1,10 @@ - +{% if page.is_default_language %} + {% capture baseurl_i18n %}{{ site.baseurl }}{% endcapture %} +{% else %} + {% capture baseurl_i18n %}{{ site.baseurl }}/{{ page.language }}{% endcapture %} +{% endif %} + + + \ No newline at end of file diff --git a/community.zh.md b/community.zh.md new file mode 100755 index 0000000000..9821834aa1 --- /dev/null +++ b/community.zh.md @@ -0,0 +1,472 @@ +--- +title: "社区 & 项目信息" +--- + +
+ +{% toc %} + + +There are many ways to get help from the Apache Flink community. The [mailing lists](#mailing-lists) are the primary place where all Flink committers are present. For user support and questions use the *user mailing list*. Some committers are also monitoring [Stack Overflow](http://stackoverflow.com/questions/tagged/apache-flink). Please remember to tag your questions with the *[apache-flink](http://stackoverflow.com/questions/tagged/apache-flink)* tag. Bugs and feature requests can either be discussed on the *dev mailing list* or on [Jira]({{ site.jira }}). Those interested in contributing to Flink should check out the [contribution guide](how-to-contribute.html). + +If you send us an email with a code snippet, make sure that: + +1. you do not link to files in external services as such files can change, get deleted or the link might break and thus make an archived email thread useless +2. you paste text instead of screenshots of text +3. you keep formatting when pasting code in order to keep the code readable +4. there are enough import statements to avoid ambiguities + + +## Mailing Lists + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Subscribe | Digest | Unsubscribe | Post | Archive |
| ---- | --------- | ------ | ----------- | ---- | ------- |
| **news**@flink.apache.org<br>*News and announcements from the Flink community* | Subscribe | Subscribe | Unsubscribe | *Read only list* | Archives |
| **community**@flink.apache.org<br>*Broader community discussions related to meetups, conferences, blog posts and job offers* | Subscribe | Subscribe | Unsubscribe | Post | Archives |
| **user**@flink.apache.org<br>*User support and questions mailing list* | Subscribe | Subscribe | Unsubscribe | Post | Archives, Nabble Archive |
| **user-zh**@flink.apache.org<br>*Chinese user support and questions mailing list* | Subscribe | Subscribe | Unsubscribe | Post | Archives |
| **dev**@flink.apache.org<br>*Development related discussions* | Subscribe | Subscribe | Unsubscribe | Post | Archives, Nabble Archive |
| **issues**@flink.apache.org<br>*Mirror of all Jira activity* | Subscribe | Subscribe | Unsubscribe | *Read only list* | Archives |
| **commits**@flink.apache.org<br>*All commits to our repositories* | Subscribe | Subscribe | Unsubscribe | *Read only list* | Archives |
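
The Subscribe/Unsubscribe actions work by sending an email to the list's management address. Those addresses follow the standard ASF convention (stated here as an assumption; the confirmation reply you receive will guide you through the rest): `<list>-subscribe@flink.apache.org`, `<list>-unsubscribe@flink.apache.org`, and so on. For example:

```
# Subscribe to the user list by sending an empty mail to its -subscribe address
mail -s "subscribe" user-subscribe@flink.apache.org < /dev/null
```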
+ +Please make sure you are subscribed to the mailing list you are posting to! If you are not subscribed to the mailing list, your message will either be rejected (dev@ list) or you won't receive the response (user@ list). + +## Stack Overflow + +Committers are watching [Stack Overflow](http://stackoverflow.com/questions/tagged/apache-flink) for the [apache-flink](http://stackoverflow.com/questions/tagged/apache-flink) tag. + +Make sure to tag your questions there accordingly to get answers from the Flink community. + +## Issue Tracker + +We use Jira to track all code related issues: [{{ site.jira }}]({{ site.jira }}). + +All issue activity is also mirrored to the issues mailing list. + +## Meetups + +There are plenty of meetups on [meetup.com](http://www.meetup.com/topics/apache-flink/) featuring Flink. + +## Source Code + +### Main source repositories + +- **ASF writable**: [https://gitbox.apache.org/repos/asf/flink.git](https://gitbox.apache.org/repos/asf/flink.git) +- **GitHub mirror**: [https://github.com/apache/flink.git](https://github.com/apache/flink.git) + +### Flink-shaded repositories (shaded dependency libraries) + +- **ASF writable**: [https://gitbox.apache.org/repos/asf/flink-shaded.git](https://gitbox.apache.org/repos/asf/flink-shaded.git) +- **GitHub mirror**: [https://github.com/apache/flink-shaded.git](https://github.com/apache/flink-shaded.git) + +### Flink Website repositories + +- **ASF writable**: [https://gitbox.apache.org/repos/asf/flink-web.git](https://gitbox.apache.org/repos/asf/flink-web.git) +- **GitHub mirror**: [https://github.com/apache/flink-web.git](https://github.com/apache/flink-web.git) + +### Bahir Flink repositories (additional connectors) + +- **ASF writable**: [https://git-wip-us.apache.org/repos/asf/bahir-flink.git](https://git-wip-us.apache.org/repos/asf/bahir-flink.git) +- **GitHub mirror**: [https://github.com/apache/bahir-flink.git](https://github.com/apache/bahir-flink.git) + + +## Training + +[dataArtisans](http://data-artisans.com) currently maintains free Apache Flink training. Their [training website](http://training.data-artisans.com/) has slides and exercises with solutions. The slides are also available on [SlideShare](http://www.slideshare.net/dataArtisans/presentations). + +## Project Wiki + +The Apache Flink project wiki contains a range of relevant resources for Flink users. However, some content on the wiki might be out-of-date. When in doubt, please refer to the Flink documentation. + +## Flink Forward + +Flink Forward 2015 (October 12-13, 2015) was the first conference to bring together the Apache Flink developer and user community. You can find [slides and videos](http://2015.flink-forward.org/?post_type=session) of all talks on the Flink Forward 2015 page. + +The second edition of Flink Forward took place on September 12-14, 2016. All [slides and videos](http://2016.flink-forward.org/program/sessions/) are available on the Flink Forward 2016 page. + +In 2017, Flink Forward came to San Francisco to welcome the Apache Flink community to one day of training and one day of conference. Find all videos on our [YouTube Channel](https://www.youtube.com/playlist?list=PLDX4T_cnKjD2UC6wJr_wRbIvtlMtkc-n2) and all slides on [SlideShare](https://www.slideshare.net/FlinkForward). 
+ +# People + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Role | Apache ID |
| ---- | ---- | --------- |
| Márton Balassi | PMC, Committer | mbalassi |
| Paris Carbone | Committer | senorcarbone |
| Ufuk Celebi | PMC, Committer | uce |
| Shuyi Chen | Committer | shuyichen |
| Xingcan Cui | Committer | xccui |
| Stephan Ewen | PMC, Committer, VP | sewen |
| Gyula Fóra | PMC, Committer | gyfora |
| Alan Gates | PMC, Committer | gates |
| Greg Hogan | PMC, Committer | greg |
| Fabian Hueske | PMC, Committer | fhueske |
| Vasia Kalavri | PMC, Committer | vasia |
| Kostas Kloudas | Committer | kkloudas |
| Aljoscha Krettek | PMC, Committer | aljoscha |
| Nico Kruber | Committer | nkruber |
| ChengXiang Li | Committer | chengxiang |
| Andra Lungu | Committer | andra |
| Robert Metzger | PMC, Committer | rmetzger |
| Maximilian Michels | PMC, Committer | mxm |
| Chiwan Park | Committer | chiwanpark |
| Stefan Richter | Committer | srichter |
| Till Rohrmann | PMC, Committer | trohrmann |
| Henry Saputra | PMC, Committer | hsaputra |
| Matthias J. Sax | Committer | mjsax |
| Sebastian Schelter | PMC, Committer | ssc |
| Chesnay Schepler | PMC, Committer | chesnay |
| Xiaogang Shi | Committer | shixg |
| Jincheng Sun | Committer | jincheng |
| Tzu-Li (Gordon) Tai | PMC, Committer | tzulitai |
| Kostas Tzoumas | PMC, Committer | ktzoumas |
| Theodore Vasiloudis | Committer | tvas |
| Timo Walther | PMC, Committer | twalthr |
| Shaoxuan Wang | Committer | shaoxuan |
| Daniel Warneke | PMC, Committer | warneke |
| Jark Wu | Committer | jark |
| Dawid Wysakowicz | Committer | dwysakowicz |
| Gary Yao | Committer | gary |
| Kurt Young | Committer | kurt |
+ +You can reach committers directly at `@apache.org`. A list of all contributors can be found [here]({{ site.FLINK_CONTRIBUTORS_URL }}). + +## Former mentors + +The following people were very kind to mentor the project while in incubation. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Role | Apache ID |
| ---- | ---- | --------- |
| Ashutosh Chauhan | Former PPMC, Mentor | hashutosh |
| Ted Dunning | Former PPMC, Mentor | tdunning |
| Alan Gates | Former PPMC, Mentor | gates |
| Owen O'Malley | Former PPMC, Mentor | omalley |
| Sean Owen | Former PPMC, Mentor | srowen |
| Henry Saputra | Former PPMC, Mentor | hsaputra |
+ +# Materials / Apache Flink Logos + +The [materials page]({{ site.baseurl }}/material.html) offers assets such as the Apache Flink logo in different image formats, or the Flink color scheme. + + diff --git a/contribute-code.zh.md b/contribute-code.zh.md new file mode 100755 index 0000000000..3a1ac2cb3c --- /dev/null +++ b/contribute-code.zh.md @@ -0,0 +1,319 @@ +--- +title: "贡献代码" +--- + +Apache Flink is maintained, improved, and extended by code contributions of volunteers. The Apache Flink community encourages anybody to contribute source code. In order to ensure a pleasant contribution experience for contributors and reviewers and to preserve the high quality of the code base, we follow a contribution process that is explained in this document. + +This document contains everything you need to know about contributing code to Apache Flink. It describes the process of preparing, testing, and submitting a contribution, explains coding guidelines and code style of Flink's code base, and gives instructions to setup a development environment. + +**IMPORTANT**: Please read this document carefully before starting to work on a code contribution. It is important to follow the process and guidelines explained below. Otherwise, your pull request might not be accepted or might require substantial rework. In particular, before opening a pull request that implements a **new feature**, you need to open a Jira ticket and reach consensus with the community on whether this feature is needed. + + + +{% toc %} + +## Code Contribution Process + +### Before you start coding… + +…please make sure there is a Jira issue that corresponds to your contribution. This is a *general rule* that the Flink community follows for all code contributions, including bug fixes, improvements, or new features, with an exception for *trivial* hot fixes. If you would like to fix a bug that you found or if you would like to add a new feature or improvement to Flink, please follow the [File a bug report]({{ site.baseurl }}/how-to-contribute.html#file-a-bug-report) or [Propose an improvement or a new feature]({{ site.baseurl }}/how-to-contribute.html#propose-an-improvement-or-a-new-feature) guidelines to open an issue in [Flink's Jira](http://issues.apache.org/jira/browse/FLINK) before starting with the implementation. + +If the description of a Jira issue indicates that its resolution will touch sensitive parts of the code base, be sufficiently complex, or add significant amounts of new code, the Flink community might request a design document. (Most contributions should not require a design document.) The purpose of this document is to ensure that the overall approach to address the issue is sensible and agreed upon by the community. Jira issues that require a design document are tagged with the **`requires-design-doc`** label. The label can be attached by any community member who feels that a design document is necessary. A good description helps to decide whether a Jira issue requires a design document or not. The design document must be added or attached to or linked from the Jira issue and cover the following aspects: + +- Overview of the general approach. +- List of API changes (changed interfaces, new and deprecated configuration parameters, changed behavior, …). +- Main components and classes to be touched. +- Known limitations of the proposed approach. + +A design document can be added by anybody, including the reporter of the issue or the person working on it. 
+ +Contributions for Jira issues that require a design document will not be added to Flink's code base before a design document has been accepted by the community with [lazy consensus](http://www.apache.org/foundation/glossary.html#LazyConsensus). Please check if a design document is required before starting to code. + + +### While coding… + +…please respect the following rules: + +- Take any discussion or requirement that is recorded in the Jira issue into account. +- Follow the design document (if a design document is required) as close as possible. Please update the design document and seek consensus, if your implementation deviates too much from the solution proposed by the design document. Minor variations are OK but should be pointed out when the contribution is submitted. +- Closely follow the [coding guidelines]( {{site.base}}/contribute-code.html#coding-guidelines) and the [code style]({{ site.base }}/contribute-code.html#code-style). +- Do not mix unrelated issues into one contribution. + +**Please feel free to ask questions at any time.** Either send a mail to the [dev mailing list]( {{ site.base }}/community.html#mailing-lists ) or comment on the Jira issue. + +The following instructions will help you to [setup a development environment]( {{ site.base }}/contribute-code.html#setup-a-development-environment). + + + + +### Verifying the compliance of your code + +It is very important to verify the compliance of changes before submitting your contribution. This includes: + +- Making sure the code builds. +- Verifying that all existing and new tests pass. +- Checking that the code style is not violated. +- Making sure no unrelated or unnecessary reformatting changes are included. + +You can build the code, run the tests, and check (parts of) the code style by calling: + +``` +mvn clean verify +``` + +Please note that some tests in Flink's code base are flaky and can fail by chance. The Flink community is working hard on improving these tests but sometimes this is not possible, e.g., when tests include external dependencies. We maintain all tests that are known to be flaky in Jira and attach the **`test-stability`** label. Please check (and extend) this list of [known flaky tests](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20priority%20DESC) if you encounter a test failure that seems to be unrelated to your changes. + +Please note that we run additional build profiles for different combinations of Java, Scala, and Hadoop versions to validate your contribution. We encourage every contributor to use a *continuous integration* service that will automatically test the code in your repository whenever you push a change. The [Best practices]( {{site.base}}/contribute-code.html#best-practices ) guide shows how to integrate [Travis](https://travis-ci.org/) with your GitHub repository. + +In addition to the automated tests, please check the diff of your changes and remove all unrelated changes such as unnecessary reformatting. + + + + +### Preparing and submitting your contribution + +To make the changes easily mergeable, please rebase them to the latest version of the main repository's master branch. Please also respect the [commit message guidelines]( {{ site.base }}/contribute-code.html#coding-guidelines ), clean up your commit history, and squash your commits to an appropriate set. 
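For example, a typical rebase-and-squash flow could look like the following sketch, assuming a remote named `upstream` that points to the main Flink repository (the remote name is only an illustration):

```
# Rebase your branch onto the latest master
git fetch upstream
git rebase upstream/master

# Interactively squash follow-up commits into a clean set
git rebase -i upstream/master
```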
Please verify your contribution one more time after rebasing and commit squashing as described above. + +The Flink project accepts code contributions through the [GitHub Mirror](https://github.com/apache/flink), in the form of [Pull Requests](https://help.github.com/articles/using-pull-requests). Pull requests are a simple way to offer a patch, by providing a pointer to a code branch that contains the change. + +To open a pull request, push your contribution back into your fork of the Flink repository. + +``` +git push origin myBranch +``` + +Go to the website of your repository fork (`https://github.com//flink`) and use the *"Create Pull Request"* button to start creating a pull request. Make sure that the base fork is `apache/flink master` and the head fork selects the branch with your changes. Give the pull request a meaningful description and send it. + +It is also possible to attach a patch to a [Jira]({{site.FLINK_ISSUES_URL}}) issue. + +----- + +## Coding guidelines + +### Pull requests and commit message +{:.no_toc} + +- **Single change per PR**. Please do not combine various unrelated changes in a single pull request. Rather, open multiple individual pull requests where each PR refers to a Jira issue. This ensures that pull requests are *topic related*, can be merged more easily, and typically result in topic-specific merge conflicts only. + +- **No WIP pull requests**. We consider pull requests as requests to merge the referenced code *as is* into the current *stable* master branch. Therefore, a pull request should not be "work in progress". Open a pull request if you are confident that it can be merged into the current master branch without problems. If you rather want comments on your code, post a link to your working branch. + +- **Commit message**. A pull request must relate to a Jira issue; create an issue if none exists for the change you want to make. The latest commit message should reference that issue. An example commit message would be *[FLINK-633] Fix NullPointerException for empty UDF parameters*. That way, the pull request automatically gives a description of what it does, for example, what bug does it fix in what way. + +- **Append review commits**. When you get comments on the pull request asking for changes, append commits for these changes. *Do not rebase and squash them.* It allows people to review the cleanup work independently. Otherwise reviewers have to go through the entire set of diffs again. + +- **No merge commits**. Please do not open pull requests containing merge commits. Use `git pull --rebase origin master` if you want to update your changes to the latest master prior to opening a pull request. + +### Exceptions and error messages +{:.no_toc} + +- **Exception swallowing**. Do not swallow exceptions and print the stacktrace. Instead check how exceptions are handled by similar classes. + +- **Meaningful error messages**. Give meaningful exception messages. Try to imagine why an exception could be thrown (what a user did wrong) and give a message that will help a user to resolve the problem. + +### Tests +{:.no_toc} + +- **Tests need to pass**. Any pull request where the tests do not pass or which does not compile will not undergo any further review. We recommend to connect your personal GitHub accounts with [Travis CI](http://travis-ci.org/) (like the Flink GitHub repository). Travis will run tests for all tested environments whenever you push something into *your* GitHub repository. 
Please note the previous [comment about flaky tests]( {{site.base}}/contribute-code.html#verifying-the-compliance-of-your-code). + +- **Tests for new features are required**. All new features need to be backed by tests, *strictly*. It is very easy that a later merge accidentally throws out a feature or breaks it. This will not be caught if the feature is not guarded by tests. Anything not covered by a test is considered cosmetic. + +- **Use appropriate test mechanisms**. Please use unit tests to test isolated functionality, such as methods. Unit tests should execute in subseconds and should be preferred whenever possible. The names of unit test classes have to end in `*Test`. Use integration tests to implement long-running tests. Flink offers test utilities for end-to-end tests that start a Flink instance and run a job. These tests are pretty heavy and can significantly increase build time. Hence, they should be added with care. The names of end-to-end test classes have to end in `*ITCase`. + +### Documentation +{:.no_toc} + +- **Documentation Updates**. Many changes in the system will also affect the documentation (both Javadocs and the user documentation in the `docs/` directory). Pull requests and patches are required to update the documentation accordingly; otherwise the change can not be accepted to the source code. See the [Contribute documentation]({{site.base}}/contribute-documentation.html) guide for how to update the documentation. + +- **Javadocs for public methods**. All public methods and classes need to have Javadocs. Please write meaningful docs. Good docs are concise and informative. Please do also update Javadocs if you change the signature or behavior of a documented method. + +### Code formatting +{:.no_toc} + +- **No reformattings**. Please keep reformatting of source files to a minimum. Diffs become unreadable if you (or your IDE automatically) remove or replace whitespaces, reformat code, or comments. Also, other patches that affect the same files become un-mergeable. Please configure your IDE such that code is not automatically reformatted. Pull requests with excessive or unnecessary code reformatting might be rejected. + + + +----- + +## Code style + +### License +- **Apache license headers.** Make sure you have Apache License headers in your files. The RAT plugin is checking for that when you build the code. + +### Imports +- **Empty line before and after package declaration.** +- **No unused imports.** +- **No redundant imports.** +- **No wildcard imports.** They can cause problems when adding to the code and in some cases even during refactoring. +- **Import order.** Imports must be ordered alphabetically, grouped into the following blocks, with each block separated by an empty line: + - <imports from org.apache.flink.*> + - <imports from org.apache.flink.shaded.*> + - <imports from other libraries> + - <imports from javax.*> + - <imports from java.*> + - <imports from scala.*> + - <static imports> + +### Naming +- **Package names must start with a letter, and must not contain upper-case letters or special characters.** +- **Non-private static final fields must be upper-case, with words being separated by underscores.** (`MY_STATIC_VARIABLE`) +- **Non-static fields/methods must be in lower camel case.** (`myNonStaticField`) + +### Whitespace +- **Tabs vs. spaces.** We are using tabs for indentation, not spaces. We are not religious there; it just happened to be that we started with tabs, and it is important to not mix them (merge/diff conflicts). 
+- **No trailing whitespace.** +- **Spaces around operators/keywords.** Operators (`+`, `=`, `>`, …) and keywords (`if`, `for`, `catch`, …) must have a space before and after them, provided they are not at the start or end of the line. + +### Braces +- **Left curly braces (`{`) must not be placed on a new line.** +- **Right curly braces (`}`) must always be placed at the beginning of the line.** +- **Blocks.** All statements after `if`, `for`, `while`, `do`, … must always be encapsulated in a block with curly braces (even if the block contains one statement). + + ```java +for (…) { + … +} +``` + + If you are wondering why, recall the famous [*goto bug*](https://www.imperialviolet.org/2014/02/22/applebug.html) in Apple's SSL library. + +### Javadocs +- **All public/protected methods and classes must have a Javadoc.** +- **The first sentence of the Javadoc must end with a period.** +- **Paragraphs must be separated with a new line, and started with <p>.** + +### Modifiers +- **No redundant modifiers.** For example, public modifiers in interface methods. +- **Follow JLS3 modifier order.** Modifiers must be ordered in the following order: public, protected, private, abstract, static, final, transient, volatile, synchronized, native, strictfp. + +### Files +- **All files must end with `\n`.** +- **File length must not exceed 3000 lines.** + +### Misc +- **Arrays must be defined Java-style.** For example, `public String[] array`. +- **Use Flink Preconditions.** To increase homogeneity, consistently use the `org.apache.flink.Preconditions` methods `checkNotNull` and `checkArgument` rather than Apache Commons Validate or Google Guava. +- **No raw generic types.** Do not use raw generic types, unless strictly necessary (sometime necessary for signature matches, arrays). +- **Suppress warnings.** Add annotations to suppress warnings, if they cannot be avoided (such as "unchecked", or "serial"). +- **Comments.** Add comments to your code. What is it doing? Add Javadocs or inherit them by not adding any comments to the methods. Do not automatically generate comments, and avoid unnecessary comments like: + + ```java +i++; // increment by one +``` + +----- + +## Best practices + +- Travis: Flink is pre-configured for [Travis CI](http://docs.travis-ci.com/), which can be easily enabled for your personal repository fork (it uses GitHub for authentication, so you do not need an additional account). Simply add the *Travis CI* hook to your repository (*Settings --> Integrations & services --> Add service*) and enable tests for the `flink` repository on [Travis](https://travis-ci.org/profile). + +----- + +## Setup a development environment + +### Requirements for developing and building Flink + +* Unix-like environment (We use Linux, Mac OS X, and Cygwin) +* git +* Maven (at least version 3.0.4) +* Java 7 or 8 + +### Clone the repository +{:.no_toc} + +Apache Flink's source code is stored in a [git](http://git-scm.com/) repository which is mirrored to [GitHub](https://github.com/apache/flink). The common way to exchange code on GitHub is to fork the repository into your personal GitHub account. For that, you need to have a [GitHub](https://github.com) account or create one for free. Forking a repository means that GitHub creates a copy of the forked repository for you. This is done by clicking on the *Fork* button on the upper right of the [repository website](https://github.com/apache/flink). 
Once you have a fork of Flink's repository in your personal account, you can clone that repository to your local machine.

```
git clone https://github.com//flink.git
```

The code is downloaded into a directory called `flink`.


### Proxy Settings

If you are behind a firewall you may need to provide proxy settings to Maven and your IDE.

For example, the WikipediaEditsSourceTest communicates over IRC and needs a [SOCKS proxy server](http://docs.oracle.com/javase/8/docs/technotes/guides/net/proxies.html) to pass.

### Setup an IDE and import the source code
{:.no_toc}

The Flink committers use IntelliJ IDEA and Eclipse IDE to develop the Flink code base.

Minimal requirements for an IDE are:

- Support for Java and Scala (also mixed projects)
- Support for Maven with Java and Scala

#### IntelliJ IDEA

The IntelliJ IDE supports Maven out of the box and offers a plugin for Scala development.

- IntelliJ download: [https://www.jetbrains.com/idea/](https://www.jetbrains.com/idea/)
- IntelliJ Scala Plugin: [http://plugins.jetbrains.com/plugin/?id=1347](http://plugins.jetbrains.com/plugin/?id=1347)

Check out our [Setting up IntelliJ]({{site.docs-stable}}/flinkDev/ide_setup.html#intellij-idea) guide for details.

#### Eclipse Scala IDE

For Eclipse users, we recommend using Scala IDE 3.0.3, based on Eclipse Kepler. While this is a slightly older version,
we found it to be the version that works most robustly for a complex project like Flink.

Further details and a guide to newer Scala IDE versions can be found in the
[How to setup Eclipse]({{site.docs-stable}}/flinkDev/ide_setup.html#eclipse) docs.

**Note:** Before following this setup, make sure to run the build from the command line once
(`mvn clean install -DskipTests`; see below).

1. Download the Scala IDE (preferred) or install the plugin to Eclipse Kepler. See
   [How to setup Eclipse]({{site.docs-stable}}/flinkDev/ide_setup.html#eclipse) for download links and instructions.
2. Add the "macroparadise" compiler plugin to the Scala compiler.
   Open "Window" -> "Preferences" -> "Scala" -> "Compiler" -> "Advanced" and put into the "Xplugin" field the path to
   the *macroparadise* jar file (typically "/home/*-your-user-*/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar").
   Note: If you do not have the jar file, you probably did not run the command line build.
3. Import the Flink Maven projects ("File" -> "Import" -> "Maven" -> "Existing Maven Projects")
4. During the import, Eclipse will ask to automatically install additional Maven build helper plugins.
5. Close the "flink-java8" project. Since Eclipse Kepler does not support Java 8, you cannot develop this project.

#### Import the source code

Apache Flink uses Apache Maven as its build tool. Most IDEs are capable of importing Maven projects.


### Build the code
{:.no_toc}

To build Flink from source code, open a terminal, navigate to the root directory of the Flink source code, and call:

```
mvn clean package
```

This will build Flink and run all tests. Flink is now installed in `build-target`.

To build Flink without executing the tests you can call:

```
mvn -DskipTests clean package
```


-----


## How to use Git as a committer

Only the infrastructure team of the ASF has administrative access to the GitHub mirror. Therefore, committers have to push changes to the git repository at the ASF.
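As a sketch, a committer could add the writable ASF repository (listed below) as an additional remote and push there; the remote name `asf` is only an example:

```
git remote add asf https://gitbox.apache.org/repos/asf/flink.git
git push asf master
```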
+ +### Main source repositories +{:.no_toc} + +**ASF writable**: https://gitbox.apache.org/repos/asf/flink.git + +**ASF read-only**: https://github.com/apache/flink.git + +Note: Flink does not build with Oracle JDK 6. It runs with Oracle JDK 6. + +If you want to build for Hadoop 1, activate the build profile via `mvn clean package -DskipTests -Dhadoop.profile=1`. + diff --git a/contribute-documentation.zh.md b/contribute-documentation.zh.md new file mode 100755 index 0000000000..f071d9e99f --- /dev/null +++ b/contribute-documentation.zh.md @@ -0,0 +1,58 @@ +--- +title: "贡献文档" +--- + +Good documentation is crucial for any kind of software. This is especially true for sophisticated software systems such as distributed data processing engines like Apache Flink. The Apache Flink community aims to provide concise, precise, and complete documentation and welcomes any contribution to improve Apache Flink's documentation. + +{% toc %} + +## Obtain the documentation sources + +Apache Flink's documentation is maintained in the same [git](http://git-scm.com/) repository as the code base. This is done to ensure that code and documentation can be easily kept in sync. + +The easiest way to contribute documentation is to fork [Flink's mirrored repository on GitHub](https://github.com/apache/flink) into your own GitHub account by clicking on the fork button at the top right. If you have no GitHub account, you can create one for free. + +Next, clone your fork to your local machine. + +``` +git clone https://github.com//flink.git +``` + +The documentation is located in the `docs/` subdirectory of the Flink code base. + +## Before you start working on the documentation ... + +... please make sure there exists a [Jira](https://issues.apache.org/jira/browse/FLINK) issue that corresponds to your contribution. We require all documentation changes to refer to a Jira issue, except for trivial fixes such as typos. + +## Update or extend the documentation + +The Flink documentation is written in [Markdown](http://daringfireball.net/projects/markdown/). Markdown is a lightweight markup language which can be translated to HTML. + +In order to update or extend the documentation you have to modify the Markdown (`.md`) files. Please verify your changes by starting the build script in preview mode. + +``` +cd docs +./build_docs.sh -p +``` + +The script compiles the Markdown files into static HTML pages and starts a local webserver. Open your browser at `http://localhost:4000` to view the compiled documentation including your changes. The served documentation is automatically re-compiled and updated when you modify and save Markdown files and refresh your browser. + +Please feel free to ask any questions you have on the developer mailing list. + +## Submit your contribution + +The Flink project accepts documentation contributions through the [GitHub Mirror](https://github.com/apache/flink) as [Pull Requests](https://help.github.com/articles/using-pull-requests). Pull requests are a simple way of offering a patch by providing a pointer to a code branch that contains the changes. + +To prepare and submit a pull request follow these steps. + +1. Commit your changes to your local git repository. The commit message should point to the corresponding Jira issue by starting with `[FLINK-XXXX]`. + +2. Push your committed contribution to your fork of the Flink repository at GitHub. + + ``` + git push origin myBranch + ``` + +3. 
Go to the website of your repository fork (`https://github.com//flink`) and use the "Create Pull Request" button to start creating a pull request. Make sure that the base fork is `apache/flink master` and the head fork selects the branch with your changes. Give the pull request a meaningful description and submit it. + +It is also possible to attach a patch to a [Jira]({{site.FLINK_ISSUES_URL}}) issue. diff --git a/downloads.zh.md b/downloads.zh.md new file mode 100644 index 0000000000..2a4fcad750 --- /dev/null +++ b/downloads.zh.md @@ -0,0 +1,220 @@ +--- +title: "下载" +--- + +
+ + + +{% toc %} + +## Latest stable release (v{{ site.FLINK_VERSION_STABLE }}) + +Apache Flink® {{ site.FLINK_VERSION_STABLE }} is our latest stable release. + +An Apache Hadoop installation is [not required](faq.html#how-does-flink-relate-to-the-hadoop-stack) to use Apache Flink. +For users that use Flink without any Hadoop components, we recommend the release without bundled Hadoop libraries. + +If you plan to use Apache Flink together with Apache Hadoop (run Flink on YARN, connect to HDFS, +connect to HBase, or use some Hadoop-based file system connector) then select the download that +bundles the matching Hadoop version, or use the Hadoop free version and +[export your HADOOP_CLASSPATH](https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/hadoop.html). + +### Binaries + + + + + + + + + {% for binary_release in site.stable_releases %} + + + {% if binary_release.scala_211 %} + + {% else %} + + {% endif %} + + {% if binary_release.scala_212 %} + + {% else %} + + {% endif %} + + {% endfor %} + +
Scala 2.11 Scala 2.12
{{ binary_release.name }}Download (asc, sha512)Not supported.Download (asc, sha512)Not supported.
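Once downloaded, a binary release can be unpacked and started as a local cluster, for example (the archive name is illustrative; pick the file that matches your Scala version and Hadoop choice):

```
tar xzf flink-{{ site.FLINK_VERSION_STABLE }}-bin-scala_2.11.tgz
cd flink-{{ site.FLINK_VERSION_STABLE }}
./bin/start-cluster.sh
```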
+
+### Source
+
+Review the source code or build Flink on your own, using one of these packages:
+ +{% for source_release in site.source_releases %} + +{% endfor %} + +### Optional components + +{% assign categories = site.optional_components | group_by: 'category' | sort: 'name' %} +{% for category in categories %} + + +
+{% assign components = category.items | sort: 'name' %}
+{% for component in components %}
+  {% if component.scala_dependent %}
+  {% else %}
+  {% endif %}
+  {% for version in component.versions %}
+    {% if component.scala_dependent %}
+      {% if version.scala_211 %}
+      {% else %}
+      {% endif %}
+      {% if version.scala_212 %}
+      {% else %}
+      {% endif %}
+    {% else %}
+    {% endif %}
+  {% endfor %}
{{ component.name }}Scala 2.11Scala 2.12
{{ version.version }}Download (asc, sha1)Not supported.Download (asc, sha1)Not supported.{{ version.version }}Download (asc, sha1)
+ +{% endfor %} +
+{% endfor %} + +## Release Notes + +Please have a look at the [Release Notes for Flink {{ site.FLINK_VERSION_STABLE_SHORT }}]({{ site.DOCS_BASE_URL }}flink-docs-release-{{ site.FLINK_VERSION_STABLE_SHORT }}/release-notes/flink-{{ site.FLINK_VERSION_STABLE_SHORT }}.html) if you plan to upgrade your Flink setup from a previous version. + +## Verifying Hashes and Signatures + +Along with our releases, we also provide sha512 hashes in `*.sha512` files and cryptographic signatures in `*.asc` files. The Apache Software Foundation has an extensive [tutorial to verify hashes and signatures](http://www.apache.org/info/verification.html) which you can follow by using any of these release-signing [KEYS](https://www.apache.org/dist/flink/KEYS). + +## Maven Dependencies + +You can add the following dependencies to your `pom.xml` to include Apache Flink in your project. These dependencies include a local execution environment and thus support local testing. + +- **Scala API**: To use the Scala API, replace the `flink-java` artifact id with `flink-scala_2.11` and `flink-streaming-java_2.11` with `flink-streaming-scala_2.11`. + +```xml + + org.apache.flink + flink-java + {{ site.FLINK_VERSION_STABLE }} + + + org.apache.flink + flink-streaming-java_2.11 + {{ site.FLINK_VERSION_STABLE }} + + + org.apache.flink + flink-clients_2.11 + {{ site.FLINK_VERSION_STABLE }} + +``` + +## Update Policy for old releases + +As of March 2017, the Flink community [decided](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Time-based-releases-in-Flink-tp15386p15394.html) to support the current and previous minor release with bugfixes. If 1.2.x is the current release, 1.1.y is the previous minor supported release. Both versions will receive bugfixes for critical issues. + +Note that the community is always open to discussing bugfix releases for even older versions. Please get in touch with the developers for that on the dev@flink.apache.org mailing list. + + +## All stable releases + +All Flink releases are available via [https://archive.apache.org/dist/flink/](https://archive.apache.org/dist/flink/) including checksums and cryptographic signatures. 
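For example, fetching an archived release together with its checksum and verifying it could look like the following sketch (version and file names are illustrative):

```
curl -O https://archive.apache.org/dist/flink/flink-1.7.1/flink-1.7.1-src.tgz
curl -O https://archive.apache.org/dist/flink/flink-1.7.1/flink-1.7.1-src.tgz.sha512
sha512sum -c flink-1.7.1-src.tgz.sha512
```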
At the time of writing, this includes the following versions: + +### Flink + +- Flink 1.7.1 - 2018-12-21 ([Source](https://archive.apache.org/dist/flink/flink-1.7.1/flink-1.7.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.7.1/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.7/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.7/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.7/api/scala/index.html)) +- Flink 1.7.0 - 2018-11-30 ([Source](https://archive.apache.org/dist/flink/flink-1.7.0/flink-1.7.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.7.0/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.7/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.7/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.7/api/scala/index.html)) +- Flink 1.6.3 - 2018-12-22 ([Source](https://archive.apache.org/dist/flink/flink-1.6.3/flink-1.6.3-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.6.3/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/api/scala/index.html)) +- Flink 1.6.2 - 2018-10-29 ([Source](https://archive.apache.org/dist/flink/flink-1.6.2/flink-1.6.2-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.6.2/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/api/scala/index.html)) +- Flink 1.6.1 - 2018-09-19 ([Source](https://archive.apache.org/dist/flink/flink-1.6.1/flink-1.6.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.6.1/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/api/scala/index.html)) +- Flink 1.6.0 - 2018-08-08 ([Source](https://archive.apache.org/dist/flink/flink-1.6.0/flink-1.6.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.6.0/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.6/api/scala/index.html)) +- Flink 1.5.6 - 2018-12-21 ([Source](https://archive.apache.org/dist/flink/flink-1.5.6/flink-1.5.6-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.5.6/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/scala/index.html)) +- Flink 1.5.5 - 2018-10-29 ([Source](https://archive.apache.org/dist/flink/flink-1.5.5/flink-1.5.5-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.5.5/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/scala/index.html)) +- Flink 1.5.4 - 2018-09-19 ([Source](https://archive.apache.org/dist/flink/flink-1.5.4/flink-1.5.4-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.5.4/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/scala/index.html)) +- Flink 1.5.3 - 2018-08-21 
([Source](https://archive.apache.org/dist/flink/flink-1.5.3/flink-1.5.3-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.5.3/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/scala/index.html)) +- Flink 1.5.2 - 2018-07-31 ([Source](https://archive.apache.org/dist/flink/flink-1.5.2/flink-1.5.2-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.5.2/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/scala/index.html)) +- Flink 1.5.1 - 2018-07-12 ([Source](https://archive.apache.org/dist/flink/flink-1.5.1/flink-1.5.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.5.1/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/scala/index.html)) +- Flink 1.5.0 - 2018-05-25 ([Source](https://archive.apache.org/dist/flink/flink-1.5.0/flink-1.5.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.5.0/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.5/api/scala/index.html)) +- Flink 1.4.2 - 2018-03-08 ([Source](https://archive.apache.org/dist/flink/flink-1.4.2/flink-1.4.2-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.4.2/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/api/scala/index.html)) +- Flink 1.4.1 - 2018-02-15 ([Source](https://archive.apache.org/dist/flink/flink-1.4.1/flink-1.4.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.4.1/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/api/scala/index.html)) +- Flink 1.4.0 - 2017-11-29 ([Source](https://archive.apache.org/dist/flink/flink-1.4.0/flink-1.4.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.4.0/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.4/api/scala/index.html)) +- Flink 1.3.3 - 2018-03-15 ([Source](https://archive.apache.org/dist/flink/flink-1.3.3/flink-1.3.3-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.3.3/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/api/scala/index.html)) +- Flink 1.3.2 - 2017-08-05 ([Source](https://archive.apache.org/dist/flink/flink-1.3.2/flink-1.3.2-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.3.2/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/api/scala/index.html)) +- Flink 1.3.1 - 2017-06-23 ([Source](https://archive.apache.org/dist/flink/flink-1.3.1/flink-1.3.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.3.1/), 
[Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/api/scala/index.html)) +- Flink 1.3.0 - 2017-06-01 ([Source](https://archive.apache.org/dist/flink/flink-1.3.0/flink-1.3.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.3.0/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.3/api/scala/index.html)) +- Flink 1.2.1 - 2017-04-26 ([Source](https://archive.apache.org/dist/flink/flink-1.2.1/flink-1.2.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.2.1/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.2/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.2/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.2/api/scala/index.html)) +- Flink 1.2.0 - 2017-02-06 ([Source](https://archive.apache.org/dist/flink/flink-1.2.0/flink-1.2.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.2.0/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.2/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.2/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.2/api/scala/index.html)) +- Flink 1.1.5 - 2017-03-22 ([Source](https://archive.apache.org/dist/flink/flink-1.1.5/flink-1.1.5-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.1.5/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/api/scala/index.html)) +- Flink 1.1.4 - 2016-12-21 ([Source](https://archive.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.1.4/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/api/scala/index.html)) +- Flink 1.1.3 - 2016-10-13 ([Source](https://archive.apache.org/dist/flink/flink-1.1.3/flink-1.1.3-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.1.3/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.1/api/scala/index.html)) +- Flink 1.1.2 - 2016-09-05 ([Source](https://archive.apache.org/dist/flink/flink-1.1.2/flink-1.1.2-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.1.2/)) +- Flink 1.1.1 - 2016-08-11 ([Source](https://archive.apache.org/dist/flink/flink-1.1.1/flink-1.1.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.1.1/)) +- Flink 1.1.0 - 2016-08-08 ([Source](https://archive.apache.org/dist/flink/flink-1.1.0/flink-1.1.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.1.0/)) +- Flink 1.0.3 - 2016-05-12 ([Source](https://archive.apache.org/dist/flink/flink-1.0.3/flink-1.0.3-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.0.3/), [Docs]({{site.DOCS_BASE_URL}}flink-docs-release-1.0/), [Javadocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.0/api/java), [ScalaDocs]({{site.DOCS_BASE_URL}}flink-docs-release-1.0/api/scala/index.html)) +- Flink 1.0.2 - 2016-04-23 ([Source](https://archive.apache.org/dist/flink/flink-1.0.2/flink-1.0.2-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.0.2/)) +- Flink 1.0.1 - 
2016-04-06 ([Source](https://archive.apache.org/dist/flink/flink-1.0.1/flink-1.0.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.0.1/)) +- Flink 1.0.0 - 2016-03-08 ([Source](https://archive.apache.org/dist/flink/flink-1.0.0/flink-1.0.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-1.0.0/)) +- Flink 0.10.2 - 2016-02-11 ([Source](https://archive.apache.org/dist/flink/flink-0.10.2/flink-0.10.2-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-0.10.2/)) +- Flink 0.10.1 - 2015-11-27 ([Source](https://archive.apache.org/dist/flink/flink-0.10.1/flink-0.10.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-0.10.1/)) +- Flink 0.10.0 - 2015-11-16 ([Source](https://archive.apache.org/dist/flink/flink-0.10.0/flink-0.10.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-0.10.0/)) +- Flink 0.9.1 - 2015-09-01 ([Source](https://archive.apache.org/dist/flink/flink-0.9.1/flink-0.9.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-0.9.1/)) +- Flink 0.9.0 - 2015-06-24 ([Source](https://archive.apache.org/dist/flink/flink-0.9.0/flink-0.9.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-0.9.0/)) +- Flink 0.9.0-milestone-1 - 2015-04-13 ([Source](https://archive.apache.org/dist/flink/flink-0.9.0-milestone-1/flink-0.9.0-milestone-1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-0.9.0-milestone-1/)) +- Flink 0.8.1 - 2015-02-20 ([Source](https://archive.apache.org/dist/flink/flink-0.8.1/flink-0.8.1-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-0.8.1/)) +- Flink 0.8.0 - 2015-01-22 ([Source](https://archive.apache.org/dist/flink/flink-0.8.0/flink-0.8.0-src.tgz), [Binaries](https://archive.apache.org/dist/flink/flink-0.8.0/)) +- Flink 0.7.0-incubating - 2014-11-04 ([Source](https://archive.apache.org/dist/incubator/flink/flink-0.7.0-incubating/flink-0.7.0-incubating-src.tgz), [Binaries](https://archive.apache.org/dist/incubator/flink/flink-0.7.0-incubating/)) +- Flink 0.6.1-incubating - 2014-09-26 ([Source](https://archive.apache.org/dist/incubator/flink/flink-0.6.1-incubating/flink-0.6.1-incubating-src.tgz), [Binaries](https://archive.apache.org/dist/incubator/flink/flink-0.6.1-incubating/)) +- Flink 0.6-incubating - 2014-08-26 ([Source](https://archive.apache.org/dist/incubator/flink/flink-0.6-incubating-src.tgz), [Binaries](https://archive.apache.org/dist/incubator/flink/)) + +### Flink-shaded +- Flink-shaded 5.0 - 2018-10-15 ([Source](https://archive.apache.org/dist/flink/flink-shaded-5.0/flink-shaded-5.0-src.tgz)) +- Flink-shaded 4.0 - 2018-06-06 ([Source](https://archive.apache.org/dist/flink/flink-shaded-4.0/flink-shaded-4.0-src.tgz)) +- Flink-shaded 3.0 - 2018-02-28 ([Source](https://archive.apache.org/dist/flink/flink-shaded-3.0/flink-shaded-3.0-src.tgz)) +- Flink-shaded 2.0 - 2017-10-30 ([Source](https://archive.apache.org/dist/flink/flink-shaded-2.0/flink-shaded-2.0-src.tgz)) +- Flink-shaded 1.0 - 2017-07-27 ([Source](https://archive.apache.org/dist/flink/flink-shaded-1.0/flink-shaded-1.0-src.tgz)) diff --git a/ecosystem.zh.md b/ecosystem.zh.md new file mode 100644 index 0000000000..6b75ef2e2b --- /dev/null +++ b/ecosystem.zh.md @@ -0,0 +1,102 @@ +--- +title: "生态系统" +--- +
+Apache Flink supports a broad ecosystem and works seamlessly with
+many other data processing projects and frameworks.
+
+{% toc %}
+
+## Connectors
+
+Connectors provide code for interfacing with various third-party systems.
+
+Currently these systems are supported:
+ + + +To run an application using one of these connectors, additional third party +components are usually required to be installed and launched, e.g., the servers +for the message queues. Further instructions for these can be found in the +corresponding subsections. + + +## Third-Party Projects + +This is a list of third party packages (i.e., libraries, system extensions, or examples) built on Flink. +The Flink community collects links to these packages but does not maintain them. +Thus, they do not belong to the Apache Flink project, and the community cannot give any support for them. +**Is your project missing?** +Please let us know on the [user/dev mailing list](#mailing-lists). + +**Apache Zeppelin** + +[Apache Zeppelin](https://zeppelin.incubator.apache.org/) is a web-based notebook that enables interactive data analytics and can be used with +[Flink as an execution engine](https://zeppelin.incubator.apache.org/docs/interpreter/flink.html) (next to other engines). +See also Jim Dowling's [Flink Forward talk](http://www.slideshare.net/FlinkForward/jim-dowling-interactive-flink-analytics-with-hopsworks-and-zeppelin) about Zeppelin on Flink. + +**Apache Mahout** + +[Apache Mahout](https://mahout.apache.org/) is a machine learning library that will feature Flink as an execution engine soon. +Check out Sebastian Schelter's [Flink Forward talk](http://www.slideshare.net/FlinkForward/sebastian-schelter-distributed-machine-learing-with-the-samsara-dsl) about Mahout-Samsara DSL. + +**Cascading** + +[Cascading](http://www.cascading.org/cascading-flink/) enables a user to build complex workflows easily on Flink and other execution engines. +[Cascading on Flink](https://github.com/dataArtisans/cascading-flink) is built by [dataArtisans](http://data-artisans.com/) and [Driven, Inc](http://www.driven.io/). +See Fabian Hueske's [Flink Forward talk](http://www.slideshare.net/FlinkForward/fabian-hueske-training-cascading-on-flink) for more details. + +**Apache Beam** + +[Apache Beam](https://beam.apache.org/) is an open-source, unified programming model that you can use to create a data processing pipeline. Flink is one of the back-ends supported by the Beam programming model. + +**GRADOOP** + +[GRADOOP](http://dbs.uni-leipzig.de/en/research/projects/gradoop) enables scalable graph analytics on top of Flink and is developed at Leipzig University. Check out Martin Junghanns’ [Flink Forward talk](http://www.slideshare.net/FlinkForward/martin-junghans-gradoop-scalable-graph-analytics-with-apache-flink). + +**BigPetStore** + +[BigPetStore](https://github.com/apache/bigtop/tree/master/bigtop-bigpetstore) is a benchmarking suite including a data generator and will be available for Flink soon. +See Suneel Marthi's [Flink Forward talk](http://www.slideshare.net/FlinkForward/suneel-marthi-bigpetstore-flink-a-comprehensive-blueprint-for-apache-flink?ref=http://flink-forward.org/?session=tbd-3) as preview. + +**FastR** + +[FastR](https://github.com/oracle/fastr) is an implemenation of the R language in Java. [FastR Flink](https://bitbucket.org/allr/fastr-flink/src/3535a9b7c7f208508d6afbcdaf1de7d04fa2bf79/README_FASTR_FLINK.md?at=default&fileviewer=file-view-default) executes R workloads on top of Flink. + +**Apache SAMOA** + +[Apache SAMOA (incubating)](https://samoa.incubator.apache.org/) is a streaming ML library featuring Flink as an execution engine soon. 
Albert Bifet introduced SAMOA on Flink at his [Flink Forward talk](http://www.slideshare.net/FlinkForward/albert-bifet-apache-samoa-mining-big-data-streams-with-apache-flink?ref=http://flink-forward.org/?session=apache-samoa-mining-big-data-streams-with-apache-flink). + +**Alluxio** + +[Alluxio](http://www.alluxio.org/) is an open-source memory-speed virtual distributed storage that enables applications to efficiently share data and access data across different storage systems in a [unified namespace](http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html). Here is an example of [using Flink to access data through Alluxio](http://www.alluxio.org/docs/master/en/Running-Flink-on-Alluxio.html). + +**Python Examples on Flink** + +A [collection of examples](https://github.com/wdm0006/flink-python-examples) using Apache Flink's Python API. + +**WordCount Example in Clojure** + +A small [WordCount example](https://github.com/mjsax/flink-external/tree/master/flink-clojure) on how to write a Flink program in Clojure. + +**Anomaly Detection and Prediction in Flink** + +[flink-htm](https://github.com/nupic-community/flink-htm) is a library for anomaly detection and prediction in Apache Flink. The algorithms are based on Hierarchical Temporal Memory (HTM) as implemented by the Numenta Platform for Intelligent Computing (NuPIC). + +**Apache Ignite** + +[Apache Ignite](https://ignite.apache.org) is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time. See [Flink sink streaming connector](https://github.com/apache/ignite/tree/master/modules/flink) to inject data into Ignite cache. + +**Tink temporal graph library** + +[Tink](https://github.com/otherwise777/Temporal_Graph_library) is a temporal graph library built on top of Flink. It allows for temporal graph analytics like different interpretations of the shortest temporal path algorithm and metrics like temporal betweenness and temporal closeness. This library was the result of the [Thesis](http://www.win.tue.nl/~gfletche/ligtenberg2017.pdf) of Wouter Ligtenberg. diff --git a/faq.zh.md b/faq.zh.md new file mode 100755 index 0000000000..f93e5c8690 --- /dev/null +++ b/faq.zh.md @@ -0,0 +1,90 @@ +--- +title: "常见问题" +--- + + +
+
+The following questions are frequently asked with regard to the Flink project **in general**.
+
+If you have further questions, make sure to consult the [documentation]({{site.docs-stable}}) or [ask the community]({{ site.baseurl }}/gettinghelp.html).
+
+{% toc %}
+
+
+# General
+
+## Is Apache Flink only for (near) real-time processing use cases?
+
+Flink is a very general system for data processing and data-driven applications with *data streams* as
+the core building block. These data streams can be streams of real-time data, or stored streams of historic data.
+For example, in Flink's view a file is a stored stream of bytes. Because of that, Flink
+supports both real-time data processing and applications, as well as batch processing applications.
+
+Streams can be *unbounded* (have no end, events continuously keep coming) or be *bounded* (streams have a beginning
+and an end). For example, a Twitter feed or a stream of events from a message queue are generally unbounded streams,
+whereas a stream of bytes from a file is a bounded stream.
+
+## If everything is a stream, why are there a DataStream and a DataSet API in Flink?
+
+Bounded streams are often more efficient to process than unbounded streams. Processing unbounded streams of events
+in (near) real-time requires the system to be able to immediately act on events and to produce intermediate
+results (often with low latency). Processing bounded streams usually does not require producing low-latency results,
+because the data is already somewhat old anyway (in relative terms). That allows Flink to process the data in a simpler and more efficient way.
+
+The *DataStream* API captures the continuous processing of unbounded and bounded streams, with a model that supports
+low latency results and flexible reaction to events and time (including event time).
+
+The *DataSet* API has techniques that often speed up the processing of bounded data streams. In the future, the community
+plans to combine these optimizations with the techniques in the DataStream API.
+
+## How does Flink relate to the Hadoop Stack?
+
+Flink is independent of [Apache Hadoop](https://hadoop.apache.org/) and runs without any Hadoop dependencies.
+
+However, Flink integrates very well with many Hadoop components, for example, *HDFS*, *YARN*, or *HBase*.
+When running together with these components, Flink can use HDFS to read data, or write results and checkpoints/snapshots.
+Flink can be easily deployed via YARN and integrates with the YARN and HDFS Kerberos security modules.
+
+## What other stacks does Flink run in?
+
+Users run Flink on [Kubernetes](https://kubernetes.io), [Mesos](https://mesos.apache.org/),
+[Docker](https://www.docker.com/), or even as standalone services.
+
+## What are the prerequisites to use Flink?
+
+ - You need *Java 8* to run Flink jobs/applications.
+ - The Scala API (optional) depends on Scala 2.11.
+ - Highly-available setups with no single point of failure require [Apache ZooKeeper](https://zookeeper.apache.org/).
+ - For highly-available stream processing setups that can recover from failures, Flink requires some form of distributed storage for checkpoints (HDFS / S3 / NFS / SAN / GFS / Kosmos / Ceph / ...).
+
+## What scale does Flink support?
+
+Users are running Flink jobs both in very small setups (fewer than 5 nodes) and on 1000s of nodes and with TBs of state.
+
+## Is Flink limited to in-memory data sets?
+
+For the DataStream API, Flink supports larger-than-memory state by configuring the RocksDB state backend.
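+
+As a minimal sketch (assuming the `flink-statebackend-rocksdb` dependency is
+available and an illustrative HDFS checkpoint path), the backend can be set
+on the execution environment:
+
+{% highlight java %}
+import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+public class RocksDBBackendExample {
+
+  public static void main(String[] args) throws Exception {
+    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+    // keep working state in RocksDB (spilling to local disk) and write
+    // incremental checkpoints to the given durable storage path
+    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
+  }
+}
+{% endhighlight %}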
+ +For the DataSet API, all operations (except delta-iterations) can scale beyond main memory. + +# Common Error Messages + +Common error messages are listed on the [Getting Help]({{ site.baseurl }}/gettinghelp.html#got-an-error-message) page. diff --git a/flink-applications.zh.md b/flink-applications.zh.md new file mode 100644 index 0000000000..405483b311 --- /dev/null +++ b/flink-applications.zh.md @@ -0,0 +1,202 @@ +--- +title: "Apache Flink 是什么?" +--- + +
+
+
+

+ Architecture &nbsp;&nbsp; Applications &nbsp;&nbsp; Operations

+
+
+
+
+Apache Flink is a framework for stateful computations over unbounded and bounded data streams. Flink provides multiple APIs at different levels of abstraction and offers dedicated libraries for common use cases.
+
+Here, we present Flink's easy-to-use and expressive APIs and libraries.
+
+## Building Blocks for Streaming Applications
+
+The types of applications that can be built with and executed by a stream processing framework are defined by how well the framework controls *streams*, *state*, and *time*. In the following, we describe these building blocks for stream processing applications and explain Flink's approaches to handle them.
+
+### Streams
+
+Obviously, streams are a fundamental aspect of stream processing. However, streams can have different characteristics that affect how a stream can and should be processed. Flink is a versatile processing framework that can handle any kind of stream.
+
+* **Bounded** and **unbounded** streams: Streams can be unbounded or bounded, i.e., fixed-sized data sets. Flink has sophisticated features to process unbounded streams, but also dedicated operators to efficiently process bounded streams.
+* **Real-time** and **recorded** streams: All data are generated as streams. There are two ways to process the data: processing it in real time as it is generated, or persisting the stream to a storage system, e.g., a file system or object store, and processing it later. Flink applications can process recorded or real-time streams.
+
+### State
+
+Every non-trivial streaming application is stateful, i.e., only applications that apply transformations on individual events do not require state. Any application that runs basic business logic needs to remember events or intermediate results to access them at a later point in time, for example when the next event is received or after a specific time duration.
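+
+As a flavor of what this looks like in code, below is a minimal sketch of
+keyed state (the state primitives and backends are described next): a
+per-key event counter that is remembered across events. The class and state
+names are illustrative, and the function is meant to be applied to a
+`KeyedStream`, e.g., `events.keyBy(...).flatMap(new CountPerKey())`.
+
+{% highlight java %}
+import org.apache.flink.api.common.functions.RichFlatMapFunction;
+import org.apache.flink.api.common.state.ValueState;
+import org.apache.flink.api.common.state.ValueStateDescriptor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+
+// counts the events seen per key, remembering the count in keyed state
+public class CountPerKey extends RichFlatMapFunction<String, Long> {
+
+  private transient ValueState<Long> count;
+
+  @Override
+  public void open(Configuration conf) {
+    count = getRuntimeContext()
+        .getState(new ValueStateDescriptor<>("count", Long.class));
+  }
+
+  @Override
+  public void flatMap(String event, Collector<Long> out) throws Exception {
+    Long current = count.value();  // null on the first event for a key
+    long updated = (current == null) ? 1L : current + 1L;
+    count.update(updated);
+    out.collect(updated);
+  }
+}
+{% endhighlight %}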
+ +
+
+Application state is a first-class citizen in Flink. You can see that by looking at all the features that Flink provides in the context of state handling.
+
+* **Multiple State Primitives**: Flink provides state primitives for different data structures, such as atomic values, lists, or maps. Developers can choose the state primitive that is most efficient based on the access pattern of the function.
+* **Pluggable State Backends**: Application state is managed in and checkpointed by a pluggable state backend. Flink features different state backends that store state in memory or in [RocksDB](https://rocksdb.org/), an efficient embedded on-disk data store. Custom state backends can be plugged in as well.
+* **Exactly-once state consistency**: Flink's checkpointing and recovery algorithms guarantee the consistency of application state in case of a failure. Hence, failures are transparently handled and do not affect the correctness of an application.
+* **Very Large State**: Flink is able to maintain application state of several terabytes in size due to its asynchronous and incremental checkpoint algorithm.
+* **Scalable Applications**: Flink supports scaling of stateful applications by redistributing the state to more or fewer workers.
+
+### Time
+
+Time is another important ingredient of streaming applications. Most event streams have inherent time semantics because each event is produced at a specific point in time. Moreover, many common stream computations are based on time, such as window aggregations, sessionization, pattern detection, and time-based joins. An important aspect of stream processing is how an application measures time, i.e., the difference between event-time and processing-time.
+
+Flink provides a rich set of time-related features.
+
+* **Event-time Mode**: Applications that process streams with event-time semantics compute results based on timestamps of the events. Thereby, event-time processing allows for accurate and consistent results regardless of whether recorded or real-time events are processed.
+* **Watermark Support**: Flink employs watermarks to reason about time in event-time applications. Watermarks are also a flexible mechanism to trade off the latency and completeness of results.
+* **Late Data Handling**: When processing streams in event-time mode with watermarks, it can happen that a computation has been completed before all associated events have arrived. Such events are called late events. Flink features multiple options to handle late events, such as rerouting them via side outputs and updating previously completed results.
+* **Processing-time Mode**: In addition to its event-time mode, Flink also supports processing-time semantics, which performs computations as triggered by the wall-clock time of the processing machine. The processing-time mode can be suitable for certain applications with strict low-latency requirements that can tolerate approximate results.
+
+## Layered APIs
+
+Flink provides three layered APIs. Each API offers a different trade-off between conciseness and expressiveness and targets different use cases.
+ +
+
+We briefly present each API, discuss its applications, and show a code example.
+
+### The ProcessFunctions
+
+[ProcessFunctions](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html) are the most expressive function interfaces that Flink offers. Flink provides ProcessFunctions to process individual events from one or two input streams or events that were grouped in a window. ProcessFunctions provide fine-grained control over time and state. A ProcessFunction can arbitrarily modify its state and register timers that will trigger a callback function in the future. Hence, ProcessFunctions can implement complex per-event business logic as required for many [stateful event-driven applications]({{ site.baseurl }}/usecases.html#eventDrivenApps).
+
+The following example shows a `KeyedProcessFunction` that operates on a `KeyedStream` and matches `START` and `END` events. When a `START` event is received, the function remembers its timestamp in state and registers a timer four hours into the future. If an `END` event is received before the timer fires, the function computes the duration between the `END` and `START` events, clears the state, and returns the value. Otherwise, the timer just fires and clears the state.
+
+{% highlight java %}
+/**
+ * Matches keyed START and END events and computes the difference between
+ * both elements' timestamps. The first String field is the key attribute,
+ * the second String attribute marks START and END events.
+ */
+public static class StartEndDuration
+    extends KeyedProcessFunction<String, Tuple2<String, String>, Tuple2<String, Long>> {
+
+  private ValueState<Long> startTime;
+
+  @Override
+  public void open(Configuration conf) {
+    // obtain state handle
+    startTime = getRuntimeContext()
+      .getState(new ValueStateDescriptor<>("startTime", Long.class));
+  }
+
+  /** Called for each processed event. */
+  @Override
+  public void processElement(
+      Tuple2<String, String> in,
+      Context ctx,
+      Collector<Tuple2<String, Long>> out) throws Exception {
+
+    switch (in.f1) {
+      case "START":
+        // set the start time if we receive a start event.
+        startTime.update(ctx.timestamp());
+        // register a timer four hours from the start event.
+        ctx.timerService()
+          .registerEventTimeTimer(ctx.timestamp() + 4 * 60 * 60 * 1000);
+        break;
+      case "END":
+        // emit the duration between start and end event
+        Long sTime = startTime.value();
+        if (sTime != null) {
+          out.collect(Tuple2.of(in.f0, ctx.timestamp() - sTime));
+          // clear the state
+          startTime.clear();
+        }
+        break;
+      default:
+        // do nothing
+    }
+  }
+
+  /** Called when a timer fires. */
+  @Override
+  public void onTimer(
+      long timestamp,
+      OnTimerContext ctx,
+      Collector<Tuple2<String, Long>> out) {
+
+    // Timeout interval exceeded. Cleaning up the state.
+    startTime.clear();
+  }
+}
+{% endhighlight %}
+
+The example illustrates the expressive power of the `KeyedProcessFunction` but also highlights that it is a rather verbose interface.
+
+### The DataStream API
+
+The [DataStream API](https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html) provides primitives for many common stream processing operations, such as windowing, record-at-a-time transformations, and enriching events by querying an external data store. The DataStream API is available for Java and Scala and is based on functions, such as `map()`, `reduce()`, and `aggregate()`. Functions can be defined by extending interfaces or as Java or Scala lambda functions.
+
+The following example shows how to sessionize a clickstream and count the number of clicks per session.
+
+{% highlight java %}
+// a stream of website clicks
+DataStream<Click> clicks = ...
+
+DataStream<Tuple2<String, Long>> result = clicks
+  // project clicks to userId and add a 1 for counting
+  .map(
+    // define function by implementing the MapFunction interface.
+    new MapFunction<Click, Tuple2<String, Long>>() {
+      @Override
+      public Tuple2<String, Long> map(Click click) {
+        return Tuple2.of(click.userId, 1L);
+      }
+    })
+  // key by userId (field 0)
+  .keyBy(0)
+  // define session window with 30 minute gap
+  .window(EventTimeSessionWindows.withGap(Time.minutes(30L)))
+  // count clicks per session. Define function as lambda function.
+  .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));
+{% endhighlight %}
+
+### SQL & Table API
+
+Flink features two relational APIs, the [Table API and SQL](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/index.html). Both APIs are unified APIs for batch and stream processing, i.e., queries are executed with the same semantics on unbounded, real-time streams or bounded, recorded streams and produce the same results. The Table API and SQL leverage [Apache Calcite](https://calcite.apache.org) for parsing, validation, and query optimization. They can be seamlessly integrated with the DataStream and DataSet APIs and support user-defined scalar, aggregate, and table-valued functions.
+
+Flink's relational APIs are designed to ease the definition of [data analytics]({{ site.baseurl }}/usecases.html#analytics), [data pipelining, and ETL applications]({{ site.baseurl }}/usecases.html#pipelines).
+
+The following example shows the SQL query to sessionize a clickstream and count the number of clicks per session. This is the same use case as in the example of the DataStream API.
+
+~~~sql
+SELECT userId, COUNT(*)
+FROM clicks
+GROUP BY SESSION(clicktime, INTERVAL '30' MINUTE), userId
+~~~
+
+## Libraries
+
+Flink features several libraries for common data processing use cases. The libraries are typically embedded in an API and not fully self-contained. Hence, they can benefit from all features of the API and be integrated with other libraries.
+
+* **[Complex Event Processing (CEP)](https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/cep.html)**: Pattern detection is a very common use case for event stream processing. Flink's CEP library provides an API to specify patterns of events (think of regular expressions or state machines). The CEP library is integrated with Flink's DataStream API, such that patterns are evaluated on DataStreams. Applications for the CEP library include network intrusion detection, business process monitoring, and fraud detection (see the sketch after this list).
+
+* **[DataSet API](https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/index.html)**: The DataSet API is Flink's core API for batch processing applications. The primitives of the DataSet API include *map*, *reduce*, *(outer) join*, *co-group*, and *iterate*. All operations are backed by algorithms and data structures that operate on serialized data in memory and spill to disk if the data size exceeds the memory budget. The data processing algorithms of Flink's DataSet API are inspired by traditional database operators, such as hybrid hash-join or external merge-sort.
+
+* **[Gelly](https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/gelly/index.html)**: Gelly is a library for scalable graph processing and analysis. Gelly is implemented on top of and integrated with the DataSet API. Hence, it benefits from its scalable and robust operators.
Gelly features [built-in algorithms](https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/gelly/library_methods.html), such as label propagation, triangle enumeration, and PageRank, but also provides a [Graph API](https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/gelly/graph_api.html) that eases the implementation of custom graph algorithms.
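+
+As referenced in the CEP item above, here is a minimal sketch of the pattern
+API: matching a `START` event followed by an `END` event within four hours.
+The `Event` POJO with its `type` field and the `events` stream are
+hypothetical placeholders.
+
+{% highlight java %}
+// an input stream of events, keyed as appropriate
+DataStream<Event> events = ...
+
+// matches a START event followed by an END event within four hours
+Pattern<Event, ?> startEnd = Pattern.<Event>begin("start")
+  .where(new SimpleCondition<Event>() {
+    @Override
+    public boolean filter(Event e) {
+      return "START".equals(e.type);
+    }
+  })
+  .followedBy("end")
+  .where(new SimpleCondition<Event>() {
+    @Override
+    public boolean filter(Event e) {
+      return "END".equals(e.type);
+    }
+  })
+  .within(Time.hours(4));
+
+PatternStream<Event> matches = CEP.pattern(events, startEnd);
+{% endhighlight %}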
+
+
+

+ Architecture &nbsp;&nbsp; Applications &nbsp;&nbsp; Operations

+
+
+
diff --git a/flink-architecture.zh.md b/flink-architecture.zh.md new file mode 100644 index 0000000000..2e6c73e691 --- /dev/null +++ b/flink-architecture.zh.md @@ -0,0 +1,100 @@ +--- +title: "Apache Flink 是什么?" +--- + +
+
+
+

+ Architecture &nbsp;&nbsp; Applications &nbsp;&nbsp; Operations

+
+
+
+ +Apache Flink is a framework and distributed processing engine for stateful computations over *unbounded and bounded* data streams. Flink has been designed to run in *all common cluster environments*, perform computations at *in-memory speed* and at *any scale*. + +Here, we explain important aspects of Flink's architecture. + + + +## Process Unbounded and Bounded Data + +Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, or user interactions on a website or mobile application, all of these data are generated as a stream. + +Data can be processed as *unbounded* or *bounded* streams. + +1. **Unbounded streams** have a start but no defined end. They do not terminate and provide data as it is generated. Unbounded streams must be continuously processed, i.e., events must be promptly handled after they have been ingested. It is not possible to wait for all input data to arrive because the input is unbounded and will not be complete at any point in time. Processing unbounded data often requires that events are ingested in a specific order, such as the order in which events occurred, to be able to reason about result completeness. + +2. **Bounded streams** have a defined start and end. Bounded streams can be processed by ingesting all data before performing any computations. Ordered ingestion is not required to process bounded streams because a bounded data set can always be sorted. Processing of bounded streams is also known as batch processing. + +
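+
+To make the distinction concrete, here is a hedged sketch: the same
+`DataStream` abstraction can be fed by a bounded source (a file) or an
+unbounded one (a socket). The path and port are illustrative.
+
+{% highlight java %}
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// a bounded stream: the file has a defined start and end
+DataStream<String> bounded = env.readTextFile("hdfs:///data/input.txt");
+
+// an unbounded stream: events keep arriving as long as the socket is open
+DataStream<String> unbounded = env.socketTextStream("localhost", 9999);
+{% endhighlight %}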
+ +
+
+**Apache Flink excels at processing unbounded and bounded data sets.** Precise control of time and state enables Flink's runtime to run any kind of application on unbounded streams. Bounded streams are internally processed by algorithms and data structures that are specifically designed for fixed-sized data sets, yielding excellent performance.
+
+Convince yourself by exploring the [use cases]({{ site.baseurl }}/usecases.html) that have been built on top of Flink.
+
+## Deploy Applications Anywhere
+
+Apache Flink is a distributed system and requires compute resources in order to execute applications. Flink integrates with all common cluster resource managers such as [Hadoop YARN](https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html), [Apache Mesos](https://mesos.apache.org), and [Kubernetes](https://kubernetes.io/), but can also be set up to run as a stand-alone cluster.
+
+Flink is designed to work well with each of the previously listed resource managers. This is achieved by resource-manager-specific deployment modes that allow Flink to interact with each resource manager in its idiomatic way.
+
+When deploying a Flink application, Flink automatically identifies the required resources based on the application's configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls. This eases the integration of Flink in many environments.
+
+## Run Applications at any Scale
+
+Flink is designed to run stateful streaming applications at any scale. Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster. Therefore, an application can leverage virtually unlimited amounts of CPUs, main memory, disk, and network IO. Moreover, Flink easily maintains very large application state. Its asynchronous and incremental checkpointing algorithm ensures minimal impact on processing latencies while guaranteeing exactly-once state consistency.
+
+[Users reported impressive scalability numbers]({{ site.baseurl }}/poweredby.html) for Flink applications running in their production environments, such as
+
+* applications processing **multiple trillions of events per day**,
+* applications maintaining **multiple terabytes of state**, and
+* applications **running on thousands of cores**.
+
+## Leverage In-Memory Performance
+
+Stateful Flink applications are optimized for local state access. Task state is always maintained in memory or, if the state size exceeds the available memory, in access-efficient on-disk data structures. Hence, tasks perform all computations by accessing local, often in-memory, state, yielding very low processing latencies. Flink guarantees exactly-once state consistency in case of failures by periodically and asynchronously checkpointing the local state to durable storage.
+ +
+ +
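+
+A minimal sketch of turning on the periodic, asynchronous checkpoints
+described above (the interval is an illustrative value):
+
+{% highlight java %}
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// checkpoint the application state every 10 seconds with exactly-once guarantees
+env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);
+{% endhighlight %}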
+
+
+

+ Architecture &nbsp;&nbsp; Applications &nbsp;&nbsp; Operations

+
+
+
diff --git a/flink-operations.zh.md b/flink-operations.zh.md new file mode 100644 index 0000000000..75860a76de --- /dev/null +++ b/flink-operations.zh.md @@ -0,0 +1,72 @@ +--- +title: "Apache Flink 是什么?" +--- + +
+
+
+

+ Architecture &nbsp;&nbsp; Applications &nbsp;&nbsp; Operations

+
+
+
+
+Apache Flink is a framework for stateful computations over unbounded and bounded data streams. Since many streaming applications are designed to run continuously with minimal downtime, a stream processor must provide excellent failure recovery as well as tooling to monitor and maintain applications while they are running.
+
+Apache Flink puts a strong focus on the operational aspects of stream processing. Here, we explain Flink's failure recovery mechanism and present its features to manage and supervise running applications.
+
+## Run Your Applications Non-Stop 24/7
+
+Machine and process failures are ubiquitous in distributed systems. A distributed stream processor like Flink must recover from failures in order to be able to run streaming applications 24/7. Obviously, this means not only restarting an application after a failure but also ensuring that its internal state remains consistent, such that the application can continue processing as if the failure had never happened.
+
+Flink provides several features to ensure that applications keep running and remain consistent:
+
+* **Consistent Checkpoints**: Flink's recovery mechanism is based on consistent checkpoints of an application's state. In case of a failure, the application is restarted and its state is loaded from the latest checkpoint. In combination with resettable stream sources, this feature can guarantee *exactly-once state consistency*.
+* **Efficient Checkpoints**: Checkpointing the state of an application can be quite expensive if the application maintains terabytes of state. Flink can perform asynchronous and incremental checkpoints in order to keep the impact of checkpoints on the application's latency SLAs very small.
+* **End-to-End Exactly-Once**: Flink features transactional sinks for specific storage systems that guarantee that data is only written out exactly once, even in case of failures.
+* **Integration with Cluster Managers**: Flink is tightly integrated with cluster managers, such as [Hadoop YARN](https://hadoop.apache.org), [Mesos](https://mesos.apache.org), or [Kubernetes](https://kubernetes.io). When a process fails, a new process is automatically started to take over its work.
+* **High-Availability Setup**: Flink features a high-availability mode that eliminates all single points of failure. The HA mode is based on [Apache ZooKeeper](https://zookeeper.apache.org), a battle-proven service for reliable distributed coordination.
+
+## Update, Migrate, Suspend, and Resume Your Applications
+
+Streaming applications that power business-critical services need to be maintained. Bugs need to be fixed, and improvements or new features need to be implemented. However, updating a stateful streaming application is not trivial. Often one cannot simply stop the application and restart a fixed or improved version, because one cannot afford to lose the state of the application.
+
+Flink's *Savepoints* are a unique and powerful feature that solves the issue of updating stateful applications and many other related challenges. A savepoint is a consistent snapshot of an application's state and therefore very similar to a checkpoint. However, in contrast to checkpoints, savepoints need to be manually triggered and are not automatically removed when an application is stopped. A savepoint can be used to start a state-compatible application and initialize its state. Savepoints enable the following features:
+
+* **Application Evolution**: Savepoints can be used to evolve applications.
A fixed or improved version of an application can be restarted from a savepoint that was taken from a previous version of the application. It is also possible to start the application from an earlier point in time (given such a savepoint exists) to repair incorrect results produced by the flawed version.
+* **Cluster Migration**: Using savepoints, applications can be migrated (or cloned) to different clusters.
+* **Flink Version Updates**: An application can be migrated to run on a new Flink version using a savepoint.
+* **Application Scaling**: Savepoints can be used to increase or decrease the parallelism of an application.
+* **A/B Tests and What-If Scenarios**: The performance or quality of two (or more) different versions of an application can be compared by starting all versions from the same savepoint.
+* **Pause and Resume**: An application can be paused by taking a savepoint and stopping it. At any later point in time, the application can be resumed from the savepoint.
+* **Archiving**: Savepoints can be archived to be able to reset the state of an application to an earlier point in time.
+
+## Monitor and Control Your Applications
+
+Just like any other service, continuously running streaming applications need to be supervised and integrated into the operations infrastructure, i.e., the monitoring and logging services, of an organization. Monitoring helps to anticipate problems and react ahead of time. Logging enables root-cause analysis to investigate failures. Finally, easily accessible interfaces to control running applications are an important feature.
+
+Flink integrates nicely with many common logging and monitoring services and provides a REST API to control applications and query information.
+
+* **Web UI**: Flink features a web UI to inspect, monitor, and debug running applications. It can also be used to submit applications for execution or to cancel them.
+* **Logging**: Flink implements the popular slf4j logging interface and integrates with the logging frameworks [log4j](https://logging.apache.org/log4j/2.x/) and [logback](https://logback.qos.ch/).
+* **Metrics**: Flink features a sophisticated metrics system to collect and report system and user-defined metrics (see the sketch after this list). Metrics can be exported to several reporters, including [JMX](https://en.wikipedia.org/wiki/Java_Management_Extensions), Ganglia, [Graphite](https://graphiteapp.org/), [Prometheus](https://prometheus.io/), [StatsD](https://github.com/etsy/statsd), [Datadog](https://www.datadoghq.com/), and [Slf4j](https://www.slf4j.org/).
+* **REST API**: Flink exposes a REST API to submit a new application, take a savepoint of a running application, or cancel an application. The REST API also exposes metadata and collected metrics of running or completed applications.
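+
+As a sketch of the user-defined metrics mentioned in the list above (the
+class and metric names are illustrative):
+
+{% highlight java %}
+import org.apache.flink.api.common.functions.RichMapFunction;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.metrics.Counter;
+
+// a user-defined metric: counts how many records this operator has seen
+public class CountingMapper extends RichMapFunction<String, String> {
+
+  private transient Counter eventsSeen;
+
+  @Override
+  public void open(Configuration conf) {
+    eventsSeen = getRuntimeContext()
+        .getMetricGroup()
+        .counter("eventsSeen");
+  }
+
+  @Override
+  public String map(String value) {
+    eventsSeen.inc();  // reported through the configured metric reporters
+    return value;
+  }
+}
+{% endhighlight %}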
+
+
+

+ Architecture &nbsp;&nbsp; Applications &nbsp;&nbsp; Operations

+
+
+
diff --git a/gettinghelp.zh.md b/gettinghelp.zh.md new file mode 100644 index 0000000000..d24fabde8c --- /dev/null +++ b/gettinghelp.zh.md @@ -0,0 +1,133 @@ +--- +title: "获取帮助" +--- + +
+
+{% toc %}
+
+## Having a Question?
+
+The Apache Flink community answers many user questions every day. You can search for answers and advice in the archives or reach out to the community for help and guidance.
+
+### User Mailing List
+
+Many Flink users, contributors, and committers are subscribed to Flink's user mailing list. The user mailing list is a very good place to ask for help.
+
+Before posting to the mailing list, you can search the mailing list archives for email threads that discuss issues related to yours on the following websites.
+
+- [Apache Pony Mail Archive](https://lists.apache.org/list.html?user@flink.apache.org)
+- [Nabble Archive](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/)
+
+If you'd like to post to the mailing list, you need to
+
+1. subscribe to the mailing list by sending an email to `user-subscribe@flink.apache.org`,
+2. confirm the subscription by replying to the confirmation email, and
+3. send your email to `user@flink.apache.org`.
+
+Please note that you won't receive a response to your mail if you are not subscribed.
+
+### Stack Overflow
+
+Many members of the Flink community are active on [Stack Overflow](https://stackoverflow.com). You can search for questions and answers or post your questions using the [\[apache-flink\]](https://stackoverflow.com/questions/tagged/apache-flink) tag.
+
+## Found a Bug?
+
+If you observe an unexpected behavior that might be caused by a bug, you can search for reported bugs or file a bug report in [Flink's JIRA](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK).
+
+If you are unsure whether the unexpected behavior happened due to a bug or not, please post a question to the [user mailing list](#user-mailing-list).
+
+## Got an Error Message?
+
+Identifying the cause for an error message can be challenging. In the following, we list the most common error messages and explain how to handle them.
+
+### I have a NotSerializableException.
+
+Flink uses Java serialization to distribute copies of the application logic (the functions and operations you implement,
+as well as the program configuration, etc.) to the parallel worker processes.
+Because of that, all functions that you pass to the API must be serializable, as defined by
+[java.io.Serializable](http://docs.oracle.com/javase/8/docs/api/java/io/Serializable.html).
+
+If your function is an anonymous inner class, consider the following:
+ - make the function a standalone class, or a static inner class
+ - use a Java 8 lambda function.
+
+If your function is already a static class, check the fields that you assign when you create
+an instance of the class. One of the fields most likely holds a non-serializable type.
+ - In Java, use a `RichFunction` and initialize the problematic fields in the `open()` method.
+ - In Scala, you can often simply use “lazy val” to defer initialization until the distributed execution happens. This may come at a minor performance cost. You can naturally also use a `RichFunction` in Scala.
+
+### Using the Scala API, I get an error about implicit values and evidence parameters.
+
+This error means that the implicit value for the type information could not be provided.
+Make sure that you have an `import org.apache.flink.streaming.api.scala._` (DataStream API) or an
+`import org.apache.flink.api.scala._` (DataSet API) statement in your code.
+
+If you are using Flink operations inside functions or classes that take
+generic parameters, then a TypeInformation must be available for that parameter.
+This can be achieved by using a context bound: + +~~~scala +def myFunction[T: TypeInformation](input: DataSet[T]): DataSet[Seq[T]] = { + input.reduceGroup( i => i.toSeq ) +} +~~~ + +See [Type Extraction and Serialization]({{ site.docs-snapshot }}/dev/types_serialization.html) for +an in-depth discussion of how Flink handles types. + +### I see a ClassCastException: X cannot be cast to X. + +When you see an exception in the style `com.foo.X` cannot be cast to `com.foo.X` (or cannot be assigned to `com.foo.X`), it means that +multiple versions of the class `com.foo.X` have been loaded by different class loaders, and types of that class are attempted to be assigned to each other. + +The reason for that can be: + + - Class duplication through `child-first` classloading. That is an intended mechanism to allow users to use different versions of the same + dependencies that Flink uses. However, if different copies of these classes move between Flink's core and the user application code, such an exception + can occur. To verify that this is the reason, try setting `classloader.resolve-order: parent-first` in the configuration. + If that makes the error disappear, please write to the mailing list to check if that may be a bug. + + - Caching of classes from different execution attempts, for example by utilities like Guava’s Interners, or Avro's Schema cache. + Try to not use interners, or reduce the scope of the interner/cache to make sure a new cache is created whenever a new task + execution is started. + +### I have an AbstractMethodError or NoSuchFieldError. + +Such errors typically indicate a mix-up in some dependency version. That means a different version of a dependency (a library) +is loaded during the execution compared to the version that code was compiled against. + +From Flink 1.4.0 on, dependencies in your application JAR file may have different versions compared to dependencies used +by Flink's core, or other dependencies in the classpath (for example from Hadoop). That requires `child-first` classloading +to be activated, which is the default. + +If you see these problems in Flink 1.4+, one of the following may be true: + - You have a dependency version conflict within your application code. Make sure all your dependency versions are consistent. + - You are conflicting with a library that Flink cannot support via `child-first` classloading. Currently these are the + Scala standard library classes, as well as Flink's own classes, logging APIs, and any Hadoop core classes. + + +### My DataStream application produces no output, even though events are going in. + +If your DataStream application uses *Event Time*, check that your watermarks get updated. If no watermarks are produced, +event time windows might never trigger, and the application would produce no results. + +You can check in Flink's web UI (watermarks section) whether watermarks are making progress. + +### I see an exception reporting "Insufficient number of network buffers". + +If you run Flink with a very high parallelism, you may need to increase the number of network buffers. + +By default, Flink takes 10% of the JVM heap size for network buffers, with a minimum of 64MB and a maximum of 1GB. +You can adjust all these values via `taskmanager.network.memory.fraction`, `taskmanager.network.memory.min`, and +`taskmanager.network.memory.max`. + +Please refer to the [Configuration Reference]({{ site.docs-snapshot }}/ops/config.html#configuring-the-network-buffers) for details. 
+ +### My job fails with various exceptions from the HDFS/Hadoop code. What can I do? + +The most common cause for that is that the Hadoop version in Flink's classpath is different than the +Hadoop version of the cluster you want to connect to (HDFS / YARN). + +The easiest way to fix that is to pick a Hadoop-free Flink version and simply export the Hadoop path and +classpath from the cluster. diff --git a/how-to-contribute.zh.md b/how-to-contribute.zh.md new file mode 100644 index 0000000000..95abf04692 --- /dev/null +++ b/how-to-contribute.zh.md @@ -0,0 +1,149 @@ +--- +title: "如何参与贡献" +--- + +
+
+Apache Flink is developed by an open and friendly community. Everybody is cordially welcome to join the community and contribute to Apache Flink. There are several ways to interact with the community and to contribute to Flink, including asking questions, filing bug reports, proposing new features, joining discussions on the mailing lists, contributing code or documentation, improving the website, or testing release candidates.
+
+{% toc %}
+
+## Ask questions!
+
+The Apache Flink community is eager to help and to answer your questions. We have a [user mailing list]({{ site.baseurl }}/community.html#mailing-lists) and watch Stack Overflow on the [[apache-flink]](http://stackoverflow.com/questions/tagged/apache-flink) tag.
+
+-----
+
+## File a bug report
+
+Please let us know if you experienced a problem with Flink and file a bug report. Open [Flink's Jira](http://issues.apache.org/jira/browse/FLINK), log in if necessary, and click on the red **Create** button at the top. Please give detailed information about the problem you encountered and, if possible, add a description that helps to reproduce the problem. Thank you very much.
+
+-----
+
+## Propose an improvement or a new feature
+
+Our community is constantly looking for feedback to improve Apache Flink. If you have an idea for how to improve Flink or have a new feature in mind that would be beneficial for Flink users, please open an issue in [Flink's Jira](http://issues.apache.org/jira/browse/FLINK). The improvement or new feature should be described in appropriate detail and include the scope and its requirements if possible. Detailed information is important for a few reasons:
+
+- It ensures your requirements are met when the improvement or feature is implemented.
+- It helps to estimate the effort and to design a solution that addresses your needs.
+- It allows for constructive discussions around the issue.
+
+Detailed information is also required if you plan to contribute the improvement or feature you proposed yourself. Please read the [Contribute code]({{ site.base }}/contribute-code.html) guide in this case as well.
+
+We recommend first reaching consensus with the community on whether a new feature is required and how to implement it, before starting with the implementation. Some features might be out of scope of the project, and it's best to discover this early.
+
+For very big features that change Flink in a fundamental way, we have another process in place:
+[Flink Improvement Proposals](https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals). If you are interested, you can propose a new feature there or follow the
+discussions on existing proposals.
+
+-----
+
+## Help others and join the discussions
+
+Most communication in the Apache Flink community happens on two mailing lists:
+
+- The user mailing list `user@flink.apache.org` is the place where users of Apache Flink ask questions and seek help or advice. Joining the user list and helping other users is a very good way to contribute to Flink's community. Furthermore, there is the [[apache-flink]](http://stackoverflow.com/questions/tagged/apache-flink) tag on Stack Overflow if you'd like to help Flink users (and harvest some points) there.
+- The development mailing list `dev@flink.apache.org` is the place where Flink developers exchange ideas and discuss new features, upcoming releases, and the development process in general. If you are interested in contributing code to Flink, you should join this mailing list.
+ +You are very welcome to [subscribe to both mailing lists]({{ site.baseurl }}/community.html#mailing-lists). + +----- + +## Review a code contribution + +The Apache Flink project receives many code contributions as [Github pull requests](https://github.com/apache/flink/pulls). A great way to contribute to the Flink community is to help review pull requests. + +**Please read the [Review Guide]({{ site.baseurl }}/reviewing-prs.html) if you want to help review pull requests.** + +----- + +## Test a release candidate + +Apache Flink is continuously improved by its active community. Every few weeks, we release a new version of Apache Flink with bug fixes, improvements, and new features. The process of releasing a new version consists of the following steps: + +1. Building a new release candidate and starting a vote (usually for 72 hours). +2. Testing the release candidate and voting (`+1` if no issues were found, `-1` if the release candidate has issues). +3. Going back to step 1 if the release candidate had issues. Otherwise we publish the release. + +Our wiki contains a page that summarizes the [test procedure for a release](https://cwiki.apache.org/confluence/display/FLINK/Releasing). Release testing is a big effort if done by a small group of people but can be easily scaled out to more people. The Flink community encourages everybody to participate in the testing of a release candidate. By testing a release candidate, you can ensure that the next Flink release is working properly for your setup and help to improve the quality of releases. + +----- + +## Contribute code + +Apache Flink is maintained, improved, and extended by code contributions of volunteers. The Apache Flink community encourages anybody to contribute source code. In order to ensure a pleasant contribution experience for contributors and reviewers and to preserve the high quality of the code base, we follow a contribution process that is explained in our [Contribute code]( {{ site.base }}/contribute-code.html) guide. The guide also includes instructions on how to setup a development environment, our coding guidelines and code style, and explains how to submit a code contribution. + +**Please read the [Contribute code]( {{ site.base }}/contribute-code.html) guide before you start to work on a code contribution.** + +Please do also read the [Submit a Contributor License Agreement]({{ site.baseurl }}/how-to-contribute.html#submit-a-contributor-license-agreement) Section. + +### Looking for an issue to work on? +{:.no_toc} + +We maintain a list of all known bugs, proposed improvements, and suggested features in [Flink's Jira](https://issues.apache.org/jira/browse/FLINK/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel). Issues that we believe are good tasks for new contributors are tagged with a special "starter" tag. Those tasks are supposed to be rather easy to solve and will help you to become familiar with the project and the contribution process. + +Please have a look at the list of [starter issues](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20starter%20ORDER%20BY%20priority%20DESC), if you are looking for an issue to work on. You can of course also choose [any other issue](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) to work on. Feel free to ask questions about issues that you would be interested in working on. 
+ +----- + +## Contribute documentation + +Good documentation is crucial for any kind of software. This is especially true for sophisticated software systems such as distributed data processing engines like Apache Flink. The Apache Flink community aims to provide concise, precise, and complete documentation and welcomes any contribution to improve Apache Flink's documentation. + +- Please report missing, incorrect, or outdated documentation as a [Jira issue](http://issues.apache.org/jira/browse/FLINK). +- Flink's documentation is written in Markdown and located in the `docs` folder in [Flink's source code repository]({{ site.baseurl }}/community.html#main-source-repositories). See the [Contribute documentation]({{ site.base }}/contribute-documentation.html) guidelines for detailed instructions for how to update and improve the documentation and to contribute your changes. + +----- + +## Improve the website + +The [Apache Flink website](http://flink.apache.org) presents Apache Flink and its community. It serves several purposes including: + +- Informing visitors about Apache Flink and its features. +- Encouraging visitors to download and use Flink. +- Encouraging visitors to engage with the community. + +We welcome any contribution to improve our website. + +- Please open a [Jira issue](http://issues.apache.org/jira/browse/FLINK) if you think our website could be improved. +- Please follow the [Improve the website]({{ site.baseurl }}/improve-website.html) guidelines if you would like to update and improve the website. + +----- + +## More ways to contribute… + +There are many more ways to contribute to the Flink community. For example you can: + +- Give a talk about Flink and tell others how you use it. +- Organize a local Meetup or user group. +- Talk to people about Flink. +- … + +----- + +## Submit a Contributor License Agreement + +Please submit a contributor license agreement to the Apache Software Foundation (ASF) if you would like to contribute to Apache Flink. The following quote from [http://www.apache.org/licenses](http://www.apache.org/licenses/#clas) gives more information about the ICLA and CCLA and why they are necessary. + +> The ASF desires that all contributors of ideas, code, or documentation to the Apache projects complete, sign, and submit (via postal mail, fax or email) an [Individual Contributor License Agreement](http://www.apache.org/licenses/icla.txt) (CLA) [ [PDF form](http://www.apache.org/licenses/icla.pdf) ]. The purpose of this agreement is to clearly define the terms under which intellectual property has been contributed to the ASF and thereby allow us to defend the project should there be a legal dispute regarding the software at some future time. A signed CLA is required to be on file before an individual is given commit rights to an ASF project. +> +> For a corporation that has assigned employees to work on an Apache project, a [Corporate CLA](http://www.apache.org/licenses/cla-corporate.txt) (CCLA) is available for contributing intellectual property via the corporation, that may have been assigned as part of an employment agreement. Note that a Corporate CLA does not remove the need for every developer to sign their own CLA as an individual, to cover any of their contributions which are not owned by the corporation signing the CCLA. +> +> ... 
+ +----- + +## How to become a committer + +Committers are community members that have write access to the project's repositories, i.e., they can modify the code, documentation, and website by themselves and also accept other contributions. + +There is no strict protocol for becoming a committer. Candidates for new committers are typically people that are active contributors and community members. + +Being an active community member means participating on mailing list discussions, helping to answer questions, verifying release candidates, being respectful towards others, and following the meritocratic principles of community management. Since the "Apache Way" has a strong focus on the project community, this part is *very* important. + +Of course, contributing code and documentation to the project is important as well. A good way to start is contributing improvements, new features, or bug fixes. You need to show that you take responsibility for the code that you contribute, add tests and documentation, and help maintaining it. + +Candidates for new committers are suggested by current committers or PMC members, and voted upon by the PMC. + +If you would like to become a committer, you should engage with the community and start contributing to Apache Flink in any of the above ways. You might also want to talk to other committers and ask for their advice and guidance. diff --git a/improve-website.zh.md b/improve-website.zh.md new file mode 100755 index 0000000000..d8be9e60e8 --- /dev/null +++ b/improve-website.zh.md @@ -0,0 +1,106 @@ +--- +title: "改进网站" +--- + +The [Apache Flink website](http://flink.apache.org) presents Apache Flink and its community. It serves several purposes including: + +- Informing visitors about Apache Flink and its features. +- Encouraging visitors to download and use Flink. +- Encouraging visitors to engage with the community. + +We welcome any contribution to improve our website. This document contains all information that is necessary to improve Flink's website. + +{% toc %} + +## Obtain the website sources + +The website of Apache Flink is hosted in a dedicated [git](http://git-scm.com/) repository which is mirrored to GitHub at [https://github.com/apache/flink-web](https://github.com/apache/flink-web). + +The easiest way to contribute website updates is to fork [the mirrored website repository on GitHub](https://github.com/apache/flink-web) into your own GitHub account by clicking on the fork button at the top right. If you have no GitHub account, you can create one for free. + +Next, clone your fork to your local machine. + +``` +git clone https://github.com//flink-web.git +``` + +The `flink-web` directory contains the cloned repository. The website resides in the `asf-site` branch of the repository. Run the following commands to enter the directory and switch to the `asf-site` branch. + +``` +cd flink-web +git checkout asf-site +``` + +## Directory structure and files + +Flink's website is written in [Markdown](http://daringfireball.net/projects/markdown/). Markdown is a lightweight markup language which can be translated to HTML. We use [Jekyll](http://jekyllrb.com/) to generate static HTML files from Markdown. + +The files and directories in the website git repository have the following roles: + +- All files ending with `.md` are Markdown files. These files are translated into static HTML files. +- Regular directories (not starting with an underscore (`_`)) contain also `.md` files. 
The directory structure is reflected in the generated HTML files and the published website.
+- The `_posts` directory contains blog posts. Each blog post is written as one Markdown file. To contribute a post, add a new file there.
+- The `_includes/` directory contains includable files such as the navigation bar or the footer.
+- The `docs/` directory contains copies of the documentation of Flink for different releases. There is a directory inside `docs/` for each stable release and the latest SNAPSHOT version. The build script takes care of the maintenance of this directory.
+- The `content/` directory contains the HTML files generated by Jekyll. It is important to place the files in this directory since the Apache infrastructure that hosts the Flink website pulls the HTML content from this directory. (For committers: When pushing changes to the website git, also push the updates in the `content/` directory!)
+
+## Update or extend the documentation
+
+You can update and extend the website by modifying or adding Markdown files or any other resources such as CSS files. To verify your changes, start the build script in preview mode.
+
+```
+./build.sh -p
+```
+
+The script compiles the Markdown files into HTML and starts a local webserver. Open your browser at `http://localhost:4000` to view the website including your changes. The served website is automatically re-compiled and updated when you modify and save any file and refresh your browser.
+
+Alternatively, you can build the website using Docker (without augmenting your host environment):
+
+```
+docker run --rm --volume="$PWD:/srv/flink-web" --expose=4000 -p 4000:4000 -it ruby:2.5 bash -c 'cd /srv/flink-web && ./build.sh -p'
+```
+
+Please feel free to ask any questions you have on the developer mailing list.
+
+## Submit your contribution
+
+The Flink project accepts website contributions through the [GitHub Mirror](https://github.com/apache/flink-web) as [Pull Requests](https://help.github.com/articles/using-pull-requests). Pull requests are a simple way of offering a patch by providing a pointer to a code branch that contains the changes.
+
+To prepare and submit a pull request, follow these steps.
+
+1. Commit your changes to your local git repository. **Please make sure that your commit does not include generated files (any files in the `content/` directory).** Unless your contribution is a major rework of the website, please squash it into a single commit.
+
+2. Push the commit to a dedicated branch of your fork of the Flink repository at GitHub.
+
+   ```
+   git push origin myBranch
+   ```
+
+3. Go to the website of your repository fork (`https://github.com/<your-user-name>/flink-web`) and use the "Create Pull Request" button to start creating a pull request. Make sure that the base fork is `apache/flink-web asf-site` and the head fork selects the branch with your changes. Give the pull request a meaningful description and submit it.
+
+## Committer section
+
+**This section is only relevant for committers.**
+
+### ASF website git repositories
+{:.no_toc}
+
+**ASF writable**: https://gitbox.apache.org/repos/asf/flink-web.git
+
+Details on how to set the credentials for the ASF git repository are [linked here](https://gitbox.apache.org/).
+
+### Merging a pull request
+{:.no_toc}
+
+Contributions are expected to be done on the source files only (no modifications to the compiled files in the `content/` directory).
Before pushing a website change, please run the build script:
+
+```
+./build.sh
+```
+
+Then add the changes to the `content/` directory as an additional commit and push the changes to the ASF base repository.
+
+### Updating the documentation directory
+{:.no_toc}
+
+The build script also takes care of maintaining the `docs/` directory. Set the `-u` flag to update the documentation. This includes fetching the Flink git repository and copying different versions of the documentation.
diff --git a/index.zh.md b/index.zh.md
new file mode 100755
index 0000000000..3b963dd0ad
--- /dev/null
+++ b/index.zh.md
@@ -0,0 +1,356 @@
+---
+title: "数据流上的有状态计算"
+layout: base
+---
+ +
+

+ **Apache Flink® - 数据流上的有状态计算** +

+
+ +
+
+
+ +
+ + + +
+
+ +
+ + + + +
+
+
+
+ 所有流式场景 +
+
+
    +
  • 事件驱动应用
  • +
  • 流批分析
  • +
  • 数据管道 & ETL
  • +
+ 了解更多 +
+
+
+
+
+
+ 正确性保证 +
+
+
    +
  • Exactly-once 状态一致性
  • +
  • 事件时间处理
  • +
  • 成熟的迟到数据处理
  • +
+ 了解更多 +
+
+
+
+
+
+ 分层 API +
+
+
    +
  • SQL on Stream & Batch Data
  • +
  • DataStream API & DataSet API
  • +
  • ProcessFunction (Time & State)
  • +
+ 了解更多 +
+
+
+
+
+
+
+
+ 聚焦运维 +
+
+
    +
  • 灵活部署
  • +
  • 高可用
  • +
  • 保存点
  • +
+ 了解更多 +
+
+
+
+
+
+ 大规模计算 +
+
+
    +
  • 水平扩展架构
  • +
  • 支持超大状态
  • +
  • 增量检查点机制
  • +
+ 了解更多 +
+
+
+
+
+
+ 性能卓越 +
+
+
    +
  • 低延迟
  • +
  • 高吞吐
  • +
  • 内存计算
  • +
+ 了解更多 +
+
+
+
+ + + +
+
+
+

Apache Flink 用户

+ +
+ +
+ + + + +
+ +
+ + + +
+ +
+
+
+ + + +
+ +
+ {% for post in site.posts limit:5 %} +
{{ post.title }}
+
{{ post.excerpt }}
+ {% endfor %} +
+ +
+ + + + + + diff --git a/poweredby.zh.md b/poweredby.zh.md new file mode 100755 index 0000000000..5243f353ab --- /dev/null +++ b/poweredby.zh.md @@ -0,0 +1,109 @@ +--- +title: "Flink 用户" +--- + + + + + +
+
+Apache Flink powers business-critical applications in many companies and enterprises around the globe. On this page, we present a few notable Flink users that run interesting use cases in production and link to resources that discuss their applications in more detail.
+
+More Flink users are listed in the Powered by Flink directory in the project wiki. Please note that the list is *not* comprehensive. We only add users that explicitly ask to be listed.
+
+If you would like to be included on this page, please reach out to the [Flink user mailing list]({{ site.baseurl }}/community.html#mailing-lists) and let us know.
+
+ Alibaba
+ Alibaba, the world's largest retailer, uses a fork of Flink called Blink to optimize search rankings in real time.

Read more about Flink's role at Alibaba +
+
+ BetterCloud
+ BetterCloud, a multi-SaaS management platform, uses Flink to surface near real-time intelligence from SaaS application activity.

See BetterCloud at Flink Forward SF 2017 +
+
+ Bouygues
+ Bouygues Telecom is running 30 production applications powered by Flink and is processing 10 billion raw events per day.

See Bouygues Telecom at Flink Forward 2016 +
+
+ Capital One
+ Capital One, a Fortune 500 financial services company, uses Flink for real-time activity monitoring and alerting.

Learn about Capital One's fraud detection use case +
+
+ Comcast
+ Comcast, a global media and technology company, uses Flink for operationalizing machine learning models and near-real-time event stream processing.

Learn about Flink at Comcast +
+
+ Criteo
+ Criteo is the advertising platform for the open internet and uses Flink for real-time revenue monitoring and near-real-time event processing.

Learn about Criteo's Flink use case +
+
+ Drivetribe
+ Drivetribe, a digital community founded by the former hosts of “Top Gear”, uses Flink for metrics and content recommendations.

Read about Flink in the Drivetribe stack +
+
+ eBay
+ eBay's monitoring platform is powered by Flink and evaluates thousands of customizable alert rules on metrics and log streams.

Learn more about Flink at eBay +
+
+ Ericsson
+ Ericsson used Flink to build a real-time anomaly detector with machine learning over large infrastructures.

Read a detailed overview on O'Reilly Ideas +
+
+ Huawei
+ Huawei is a leading global provider of ICT infrastructure and smart devices. Huawei Cloud provides cloud services based on Flink.

Learn about how Flink powers cloud services +
+
+ King
+ King, the creators of Candy Crush Saga, uses Flink to provide data science teams a real-time analytics dashboard.

Read about King's Flink implementation +
+
+ Lyft
+ Lyft uses Flink as the processing engine for its streaming platform, for example to consistently generate features for machine learning.

Read more about Streaming at Lyft +
+
+ MediaMath
+ MediaMath, a programmatic marketing company, uses Flink to power its real-time reporting infrastructure.

See MediaMath at Flink Forward SF 2017 +
+
+ Mux
+ Mux, an analytics company for streaming video providers, uses Flink for real-time anomaly detection and alerting.

Read more about how Mux is using Flink +
+
+ Otto Group
+ Otto Group, the world's second-largest online retailer, uses Flink for business intelligence stream processing.

See Otto at Flink Forward 2016 +
+
+ ResearchGate
+ ResearchGate, a social network for scientists, uses Flink for network analysis and near-duplicate detection.

See ResearchGate at Flink Forward 2016 +
+
+ Telefónica NEXT
+ Telefónica NEXT's TÜV-certified Data Anonymization Platform is powered by Flink.

Read more about Telefónica NEXT +
+
+ Tencent
+ Tencent, one of the largest Internet companies, built an in-house platform with Apache Flink to improve the efficiency of developing and operating real-time applications.

Read more about Tencent's platform. +
+
+ Uber
+ Uber built their internal SQL-based, open-source streaming analytics platform AthenaX on Apache Flink.

Read more on the Uber engineering blog +
+
+ Yelp
+ Yelp utilizes Flink to power its data connectors ecosystem and stream processing infrastructure.

Find out more watching a Flink Forward talk +
+
+ Zalando
+ Zalando, one of the largest e-commerce companies in Europe, uses Flink for real-time process monitoring and ETL.

Read more on the Zalando Tech Blog +
+
+ + diff --git a/reviewing-prs.zh.md b/reviewing-prs.zh.md new file mode 100644 index 0000000000..d252adcf2e --- /dev/null +++ b/reviewing-prs.zh.md @@ -0,0 +1,120 @@ +--- +title: "如何 Review 一个 Pull Request" +--- + +
+
+This guide is for all committers and contributors that want to help with reviewing code contributions. Thank you for your effort - good reviews are one of the most important parts of an open source project. This guide should help the community to perform reviews such that:
+
+* Contributors have a good contribution experience.
+* Our reviews are structured and check all important aspects of a contribution.
+* We make sure to keep a high code quality in Flink.
+* We avoid situations where contributors and reviewers spend a lot of time refining a contribution that gets rejected later.
+
+----
+
+{% toc %}
+
+## Review Checklist
+
+Every review needs to check the following five aspects. We encourage reviewers to check these aspects in order, to avoid spending time on detailed code quality reviews when there is no consensus yet whether a feature or change should actually be added.
+
+### 1. Is the Contribution Well-Described?
+
+Check whether the contribution is sufficiently well-described to support a good review. Trivial changes and fixes do not need a long description. Any pull request that changes functionality or behavior needs to describe the big picture of these changes, so that reviewers know what to look for (and don’t have to dig through the code to hopefully understand what the change does).
+
+Changes that require longer descriptions are ideally based on a prior design discussion in the mailing list or in Jira and can simply link there or copy the description from there.
+
+**A contribution is well-described if the following questions 2, 3, and 4 can be answered without looking at the code.**
+
+-----
+
+### 2. Is There Consensus that the Change or Feature Should Go into Flink?
+
+For bug fixes, this needs to be checked only in case the fix requires bigger changes or might break existing programs and setups.
+
+Ideally, this question can be directly answered from a Jira issue or a dev-list discussion, except in cases of bug fixes and small lightweight additions/extensions. In that case, this question can be immediately marked as resolved. For pull requests that are created without prior consensus, this question needs to be answered as part of the review.
+
+The decision whether the change should go into Flink needs to take the following aspects into consideration:
+
+* Does the contribution alter the behavior of features or components in a way that it may break previous users’ programs and setups? If yes, there needs to be a discussion and agreement that this change is desirable.
+* Does the contribution conceptually fit well into Flink? Is it too much of a special case, such that it makes things more complicated for the common case or bloats the abstractions / APIs?
+* Does the feature fit well into Flink's architecture? Will it scale and keep Flink flexible for the future, or will the feature restrict Flink in the future?
+* Is the feature a significant new addition (rather than an improvement to an existing part)? If yes, will the Flink community commit to maintaining this feature?
+* Does the feature produce added value for Flink users or developers? Or does it introduce the risk of regression without adding relevant user or developer benefit?
+* Could the contribution live in another repository, e.g., [Apache Bahir](https://bahir.apache.org) or another external repository?
+
+All of these questions should be answerable from the description/discussion in Jira and the Pull Request, without looking at the code.
+**A feature, improvement, or bug fix is approved once one committer accepts it and no committer disagrees (lazy consensus).**
+
+In case of diverging opinions, the discussion should be moved to the respective Jira issue or to the dev mailing list and continued until consensus is reached. If the change is proposed by a committer, it is best practice to seek the approval of another committer.
+
+-----
+
+### 3. Does the Contribution Need Attention from Some Specific Committers and Is There Time Commitment from These Committers?
+
+Some changes require attention and approval from specific committers. For example, changes in parts that are either very performance sensitive or have a critical impact on distributed coordination and fault tolerance need input from a committer that is deeply familiar with the component.
+
+As a rule of thumb, special attention is required when the Pull Request description answers one of the questions in the template section “Does this pull request potentially affect one of the following parts” with ‘yes’.
+
+This question can be answered with:
+
+* *Does not need specific attention*
+* *Needs specific attention for X (X can be, for example, checkpointing, jobmanager, etc.)*
+* *Has specific attention for X by @committerA, @contributorB*
+
+**If the pull request needs specific attention, one of the tagged committers/contributors should give the final approval.**
+
+----
+
+### 4. Does the Implementation Follow the Right Overall Approach/Architecture?
+
+Is this the best approach to implement the fix or feature, or are there other approaches that would be easier, more robust, or more maintainable?
+This question should be answerable from the Pull Request description (or the linked Jira) as much as possible.
+
+We recommend checking this before diving into the details of commenting on individual parts of the change.
+
+----
+
+### 5. Is the Overall Code Quality Good, Meeting the Standard we Want to Maintain in Flink?
+
+This is the detailed code review of the actual changes, covering:
+
+* Are the changes doing what is described in the design document or PR description?
+* Does the code follow the right software engineering practices? Is the code correct, robust, maintainable, testable?
+* Are the changes performance-aware, when changing a performance-sensitive part?
+* Are the changes sufficiently covered by tests?
+* Are the tests executing fast, i.e., are heavy-weight integration tests only used when necessary?
+* Does the code format follow Flink’s checkstyle pattern?
+* Does the code avoid introducing additional compiler warnings?
+
+Some code style guidelines can be found in the [Flink Code Style Page]({{ site.baseurl }}/contribute-code.html#code-style).
+
+## Review with the @flinkbot
+
+The Flink community uses a service called [@flinkbot](https://github.com/flinkbot) to help with the review of pull requests.
+
+The bot automatically posts a comment tracking the review progress for each new pull request:
+
+```
+### Review Progress
+
+* [ ] 1. The description looks good.
+* [ ] 2. There is consensus that the contribution should go into to Flink.
+* [ ] 3. [Does not need specific attention | Needs specific attention for X | Has attention for X by Y]
+* [ ] 4. The architectural approach is sound.
+* [ ] 5. Overall code quality is good.
+
+Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) if you have questions about the review process.
+```
+
+Reviewers can instruct the bot to tick off the boxes (in order) to indicate the progress of the review.
+
+For approving the description of the contribution, mention the bot with `@flinkbot approve description`. This works similarly with `consensus`, `architecture` and `quality`.
+
+For approving all aspects, put a new comment with `@flinkbot approve all` into the pull request.
+
+The syntax for requiring attention is `@flinkbot attention @username1 [@username2 ..]`.
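+
+As an example, a review of a hypothetical pull request could advance the checklist with a sequence of comments like the following (the username is made up):
+
+```
+@flinkbot approve description
+@flinkbot approve consensus
+@flinkbot attention @committerA
+@flinkbot approve architecture
+@flinkbot approve quality
+```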
+``` + +Reviewers can instruct the bot to tick off the boxes (in order) to indicate the progress of the review. + +For approving the description of the contribution, mention the bot with `@flinkbot approve description`. This works similarly with `consensus`, `architecture` and `quality`. + +For approving all aspects, put a new comment with `@flinkbot approve all` into the pull request. + +The syntax for requiring attention is `@flinkbot attention @username1 [@username2 ..]`. + + diff --git a/usecases.zh.md b/usecases.zh.md new file mode 100644 index 0000000000..e1daa8a148 --- /dev/null +++ b/usecases.zh.md @@ -0,0 +1,105 @@ +--- +title: "应用场景" +--- + +
+
+Apache Flink is an excellent choice to develop and run many different types of applications due to its extensive feature set. Flink's features include support for stream and batch processing, sophisticated state management, event-time processing semantics, and exactly-once consistency guarantees for state. Moreover, Flink can be deployed on various resource providers such as YARN, Apache Mesos, and Kubernetes, but also as a stand-alone cluster on bare-metal hardware. Configured for high availability, Flink does not have a single point of failure. Flink has been proven to scale to thousands of cores and terabytes of application state, delivers high throughput and low latency, and powers some of the world's most demanding stream processing applications.
+
+Below, we explore the most common types of applications that are powered by Flink and give pointers to real-world examples.
+
+* Event-driven Applications
+* Data Analytics Applications
+* Data Pipeline Applications
+
+## Event-driven Applications
+
+### What are event-driven applications?
+
+An event-driven application is a stateful application that ingests events from one or more event streams and reacts to incoming events by triggering computations, state updates, or external actions.
+
+Event-driven applications are an evolution of the traditional application design with separated compute and data storage tiers. In this architecture, applications read data from and persist data to a remote transactional database.
+
+In contrast, event-driven applications are based on stateful stream processing applications. In this design, data and computation are co-located, which yields local (in-memory or disk) data access. Fault tolerance is achieved by periodically writing checkpoints to remote persistent storage. The figure below depicts the difference between the traditional application architecture and event-driven applications.
+
+ +
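+To make this design concrete, the following is a minimal, illustrative sketch (not part of the original page; the event shape, threshold, and class names are made up) of a stateful event-driven application written against Flink's DataStream API. It keeps the last seen value per key in Flink-managed, checkpointed state and reacts to each incoming event without consulting an external database:
+
+```
+import org.apache.flink.api.common.state.ValueState;
+import org.apache.flink.api.common.state.ValueStateDescriptor;
+import org.apache.flink.api.common.typeinfo.Types;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
+import org.apache.flink.util.Collector;
+
+public class EventDrivenSketch {
+
+    public static void main(String[] args) throws Exception {
+        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+        env.enableCheckpointing(10_000); // periodic, asynchronous checkpoints for fault tolerance
+
+        env.fromElements(Tuple2.of("account-1", 42.0), Tuple2.of("account-1", 9000.0))
+           .keyBy(t -> t.f0)               // state below is kept locally, partitioned by key
+           .process(new SpikeDetector())
+           .print();
+
+        env.execute("Event-driven application sketch");
+    }
+
+    /** Emits an alert when an amount is 100x larger than the previous amount for the same key. */
+    public static class SpikeDetector
+            extends KeyedProcessFunction<String, Tuple2<String, Double>, String> {
+
+        private transient ValueState<Double> lastAmount;
+
+        @Override
+        public void open(Configuration parameters) {
+            lastAmount = getRuntimeContext().getState(
+                new ValueStateDescriptor<>("last-amount", Types.DOUBLE));
+        }
+
+        @Override
+        public void processElement(Tuple2<String, Double> event, Context ctx, Collector<String> out)
+                throws Exception {
+            Double previous = lastAmount.value(); // local state access, no remote database
+            if (previous != null && event.f1 > previous * 100) {
+                out.collect("suspicious activity on key " + event.f0);
+            }
+            lastAmount.update(event.f1);
+        }
+    }
+}
+```
+
+Because the state is managed by Flink and periodically checkpointed, the application gets local data access and fault tolerance without a remote transactional database.
+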
+
+### What are the advantages of event-driven applications?
+
+Instead of querying a remote database, event-driven applications access their data locally, which yields better performance, both in terms of throughput and latency. The periodic checkpoints to remote persistent storage can be written asynchronously and incrementally. Hence, the impact of checkpointing on the regular event processing is very small. However, the event-driven application design provides more benefits than just local data access. In the tiered architecture, it is common that multiple applications share the same database. Hence, any change of the database, such as changing the data layout due to an application update or scaling the service, needs to be coordinated. Since each event-driven application is responsible for its own data, changes to the data representation or scaling the application require less coordination.
+
+### How does Flink support event-driven applications?
+
+The limits of event-driven applications are defined by how well a stream processor can handle time and state. Many of Flink's outstanding features are centered around these concepts. Flink provides a rich set of state primitives that can manage very large data volumes (up to several terabytes) with exactly-once consistency guarantees. Moreover, Flink's support for event-time, highly customizable window logic, and fine-grained control of time as provided by the `ProcessFunction` enable the implementation of advanced business logic. In addition, Flink features a library for Complex Event Processing (CEP) to detect patterns in data streams.
+
+However, Flink's outstanding feature for event-driven applications is savepoints. A savepoint is a consistent state image that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or adapt its scale, or multiple versions of an application can be started for A/B testing.
+
+### What are typical event-driven applications?
+
+* Fraud detection
+* Anomaly detection
+* Rule-based alerting
+* Business process monitoring
+* Web application (social network)
+
+## Data Analytics Applications
+
+### What are data analytics applications?
+
+Analytical jobs extract information and insight from raw data. Traditionally, analytics are performed as batch queries or applications on bounded data sets of recorded events. In order to incorporate the latest data into the result of the analysis, it has to be added to the analyzed data set and the query or application has to be rerun. The results are written to a storage system or emitted as reports.
+
+With a sophisticated stream processing engine, analytics can also be performed in a real-time fashion. Instead of reading finite data sets, streaming queries or applications ingest real-time event streams and continuously produce and update results as events are consumed. The results are either written to an external database or maintained as internal state. Dashboard applications can read the latest results from the external database or directly query the internal state of the application.
+
+Apache Flink supports streaming as well as batch analytical applications, as shown in the figure below.
+ +
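+As an illustration of this unified model, the following sketch (not part of the original page; the table name and fields are made up, and the entry points shown are those of the Flink 1.7-era Table API) defines a continuous SQL query over a stream. The same query could be run unchanged over a bounded, batch table:
+
+```
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.TableEnvironment;
+import org.apache.flink.table.api.java.StreamTableEnvironment;
+import org.apache.flink.types.Row;
+
+public class StreamingAnalyticsSketch {
+    public static void main(String[] args) throws Exception {
+        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
+
+        // A stream of (userName, url) click events; in production this would come from Kafka etc.
+        DataStream<Tuple2<String, String>> clicks = env.fromElements(
+            Tuple2.of("alice", "/cart"), Tuple2.of("bob", "/home"), Tuple2.of("alice", "/pay"));
+        tableEnv.registerDataStream("clicks", clicks, "userName, url");
+
+        // A continuous query: the result is updated as new click events arrive.
+        Table counts = tableEnv.sqlQuery(
+            "SELECT userName, COUNT(url) AS cnt FROM clicks GROUP BY userName");
+
+        // Emit the continuously updating result as a retract stream.
+        tableEnv.toRetractStream(counts, Row.class).print();
+        env.execute("Streaming analytics sketch");
+    }
+}
+```
+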
+
+### What are the advantages of streaming analytics applications?
+
+The advantages of continuous streaming analytics compared to batch analytics are not limited to a much lower latency from event to insight due to the elimination of periodic imports and query executions. In contrast to batch queries, streaming queries do not have to deal with artificial boundaries in the input data, which are caused by periodic imports and the bounded nature of the input.
+
+Another aspect is a simpler application architecture. A batch analytics pipeline consists of several independent components to periodically schedule data ingestion and query execution. Reliably operating such a pipeline is non-trivial because failures of one component affect the following steps of the pipeline. In contrast, a streaming analytics application which runs on a sophisticated stream processor like Flink incorporates all steps from data ingestion to continuous result computation. Therefore, it can rely on the engine's failure recovery mechanism.
+
+### How does Flink support data analytics applications?
+
+Flink provides very good support for continuous streaming as well as batch analytics. Specifically, it features an ANSI-compliant SQL interface with unified semantics for batch and streaming queries. SQL queries compute the same result regardless of whether they are run on a static data set of recorded events or on a real-time event stream. Rich support for user-defined functions ensures that custom code can be executed in SQL queries. If even more custom logic is required, Flink's DataStream API or DataSet API provide more low-level control. Moreover, Flink's Gelly library provides algorithms and building blocks for large-scale and high-performance graph analytics on batch data sets.
+
+### What are typical data analytics applications?
+
+* Quality monitoring of Telco networks
+* Analysis of product updates & experiment evaluation in mobile applications
+* Ad-hoc analysis of live data in consumer technology
+* Large-scale graph analysis
+
+## Data Pipeline Applications
+
+### What are data pipelines?
+
+Extract-transform-load (ETL) is a common approach to convert and move data between storage systems. Often, ETL jobs are periodically triggered to copy data from transactional database systems to an analytical database or a data warehouse.
+
+Data pipelines serve a similar purpose as ETL jobs. They transform and enrich data and can move it from one storage system to another. However, they operate in a continuous streaming mode instead of being periodically triggered. Hence, they are able to read records from sources that continuously produce data and move it with low latency to their destination. For example, a data pipeline might monitor a file system directory for new files and write their data into an event log. Another application might materialize an event stream to a database or incrementally build and refine a search index.
+
+The figure below depicts the difference between periodic ETL jobs and continuous data pipelines.
+ +
+
+### What are the advantages of data pipelines?
+
+The obvious advantage of continuous data pipelines over periodic ETL jobs is the reduced latency of moving data to its destination. Moreover, data pipelines are more versatile and can be employed for more use cases because they are able to continuously consume and emit data.
+
+### How does Flink support data pipelines?
+
+Many common data transformation or enrichment tasks can be addressed by Flink's SQL interface (or Table API) and its support for user-defined functions. Data pipelines with more advanced requirements can be realized by using the DataStream API, which is more generic. Flink provides a rich set of connectors to various storage systems such as Kafka, Kinesis, Elasticsearch, and JDBC database systems. It also features continuous sources for file systems that monitor directories and sinks that write files in a time-bucketed fashion. A small sketch of such a pipeline follows the list of typical applications below.
+
+### What are typical data pipeline applications?
+
+* Real-time search index building in e-commerce
+* Continuous ETL in e-commerce
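+
+As referenced above, the following is a minimal sketch of such a continuous pipeline (illustrative only; the broker address, topic names, and the trivial cleanup logic are made up, and the Kafka connector dependency is assumed) that reads records from one Kafka topic, normalizes them, and continuously writes them to another:
+
+```
+import java.util.Properties;
+
+import org.apache.flink.api.common.serialization.SimpleStringSchema;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
+import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
+
+public class ContinuousEtlSketch {
+    public static void main(String[] args) throws Exception {
+        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+        env.enableCheckpointing(60_000); // fault tolerance for the long-running pipeline
+
+        Properties props = new Properties();
+        props.setProperty("bootstrap.servers", "kafka:9092"); // assumed broker address
+        props.setProperty("group.id", "etl-sketch");
+
+        // Continuously read raw records instead of waiting for a periodic batch import.
+        DataStream<String> raw = env.addSource(
+            new FlinkKafkaConsumer<>("raw-events", new SimpleStringSchema(), props));
+
+        // Normalize the records and write them to the destination topic with low latency.
+        raw.filter(line -> !line.isEmpty())
+           .map(String::trim)
+           .addSink(new FlinkKafkaProducer<>("clean-events", new SimpleStringSchema(), props));
+
+        env.execute("Continuous ETL sketch");
+    }
+}
+```
+
+With checkpointing enabled, the pipeline recovers from failures and keeps moving data with low latency instead of waiting for a periodic batch run.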