Replies: 15 comments
-
Thanks for opening your first issue here! Be sure to follow the issue template! |
Beta Was this translation helpful? Give feedback.
-
SGTM. Let's talk about it in PR/PRs. |
Beta Was this translation helpful? Give feedback.
-
PS. If you want to pull-request, you don't have to create an issue first, but if you want, you can do it. Creating an issue will allow you to collect feedback or share plans with other people. |
Beta Was this translation helpful? Give feedback.
-
That was my thinking. For example, this may be too advanced of an example for the basic tutorial anyway, as it brings in database connections, etc. It might be simpler to just work with files directly to start. |
Beta Was this translation helpful? Give feedback.
-
I think this will be difficult to be discussed as a 'whole' here - if you - as new contributor - find that the examples are confusing for you (a little) but you seem to know how to fix them (you seem to), the best approach is to create small PRs fixing the problems one -by one. Start small and just fix one thing at a time. Then the potential dicussion will be much more focused (we can and we usually do discuss directly in PRs), and everyone will loose much less time on it. |
Beta Was this translation helpful? Give feedback.
-
Feel free to create PRs for individual items here @drobert - I also marked it as "good first issue" for others to contribute. I also converted your bullets into task lists so that you can |
Beta Was this translation helpful? Give feedback.
-
Also I think (side comment) - since this is pretty natural that some code might be a little out-dated or buggy, I think changing the title to "Basic tutorial examples could be improved". reflects a bit better the actual state of it. A bit "less aggressive" and "more empathetic" for all those 1700+ contributors who created the software and docs that you can use for free :D. |
Beta Was this translation helpful? Give feedback.
-
I would like to work on this issue. |
Beta Was this translation helpful? Give feedback.
-
Feel free |
Beta Was this translation helpful? Give feedback.
-
@potiuk I did not intend to make this sound aggressive, but I will admit I don't think "might be a little out-dated or buggy" is accurate either. It is not possible this code or this data set was ever attempted, and it is admittedly a very frustrating experience for a new developer to fail at the given "hello world" example. There is no disrespect intended at all on the Airflow product or the countless hours dedicated engineers have given to its release. But in the context of the getting started tutorial, I felt this title was accurate without being accusatory. (It's not "horrible documentation" or "clearly the doc writers are lazy", but the tutorial is significantly full of bugs at just about every step.) I could see "Basic tutorial examples do not function", but "improvement" does not seem accurate. I was attempting to write this from the context of someone looking for whether there are open issues with the documentation (as I was). |
Beta Was this translation helpful? Give feedback.
-
I thin the importnt thing is to make it better. Which i heartily invite you to help with |
Beta Was this translation helpful? Give feedback.
-
I got an error: So I solve it by creating the path but it is not mentioned in doc. |
Beta Was this translation helpful? Give feedback.
-
Can you please make a PR correcting it? It's as easy as clicking "Suggest a change on this page" and correcting it. You can then become one of ~2000 contributors and give back for the free software you use. |
Beta Was this translation helpful? Give feedback.
-
Converting it to discussion, as there is no clear "criteria" when this issue is fixed. |
Beta Was this translation helpful? Give feedback.
-
Describe the issue with documentation
In response to the contents of https://airflow.apache.org/docs/apache-airflow/stable/tutorial.html#testing as of 2021-10-13
There are several broken things in the tutorial as given, and a few pieces of missing context. (I can help PR this once I go through the contributor guidelines; just recording them for now).
In order:
Table Definitions
In initial setup, there is a name collision in the
employees_temp
: both tables use a constraint namedemployees_pk
. The latter (_temp
) should probably useemployees_temp_pk
.(As a nit-pick, it feels odd to capitalize the table names, requires quoting the names when using
psql
's\d
.)Python Code
Imports
Firstly, it's missing necessary imports. Perhaps they're obvious if you're experienced with airflow, but for those coming into fresh it's not clear what's missing. It appears to be:
Connections
There's no mention of 'connections' (or the concept) in the documentation leading up to this tutorial. The code as written relies on an airflow
Connection
namedLOCAL
being defined.Paths and Duplicated Code
The path where the file is downloaded is hard-coded to
/usr/local/airflow/...
. For those following https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html this would likely need to be/opt/airflow/dags/files
, and require thefiles
subdirectory to be created first.Regardless, it would be nice if this path were defined once at the top of the script or as a parameter or other config option rather than copy-pasted.
Bugs
cur.copy_from
references file handle with variablef
but the handle's name isfile
.copy_from
does not handle a CSV header row. We either need to skip the first line, or usecopy_expert
or similar:Location
It says 'save the file' but doesn't specify whether there's any naming convention or somesuch. I might suggest providing an example name like
etl.py
.Data
The data this script relies on comes from https://raw.githubusercontent.com/apache/airflow/main/docs/apache-airflow/pipeline_example.csv .
It appears this file is a spreadsheet export; the 'Serial Number
column's data is in scientific notation format. This is resulting in some truncation and duplicate keys (e.g.
9.78938E+12` appears 25 times).Secondly, the data at this URL appears to use linefeed (LF,
\n
) for newlines. The postgres 'copy' documentation says this should work fine.The python code itself before attempting the
copy_from
splits the input on newline; effectively removing all newlines from the file. Seems we should remove the 'split' and just write the contents directly, or have itfile.write(row+'\n')
Lastly, there's a blank newline at the end of the written file; this doesn't work for postgres
copy from
How to solve the problem
Just to re-itemize from the above:
Connections
and note that one needs to be set up as appropriate (either one namedLOCAL
or the user should create one and modify the code here to use it). This may be a bigger concern for those following the [docker start](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.htmlfile
instead off
)long
number format rather than scientific notationAnything else
No response
Are you willing to submit PR?
Code of Conduct
Beta Was this translation helpful? Give feedback.
All reactions