Speed up creation of DagRun for large DAGs (5k+ tasks) by 25-140% #20722
This uses the "bulk" operation API of SQLAlchemy to get a big speed up. Due to the `task_instance_mutation_hook` we still need to keep actual TaskInstance objects around.

For PostgreSQL we have enabled the "batch operation helpers"[1], which make it even faster. The default page sizes are chosen somewhat randomly based on the SQLA docs.

To make these options configurable I have added (and used here and in KubeConfig) a new `getjson` option to the AirflowConfigParser class.

*Postgresql is over 40% faster*:

Before:

```
number_of_tis=1 mean=0.004397215199423954 per=0.004397215199423954 times=[0.009390181003254838, 0.002814065999700688, 0.00284132499655243, 0.0036120269942330196, 0.0033284770033787936]
number_of_tis=10 mean=0.008078816600027494 per=0.0008078816600027494 times=[0.011014281000825576, 0.008476420000079088, 0.00741832799394615, 0.006857775995740667, 0.006627278009545989]
number_of_tis=50 mean=0.01927847799670417 per=0.00038556955993408336 times=[0.02556803499464877, 0.01935569499619305, 0.01662322599440813, 0.01840184700267855, 0.01644358699559234]
number_of_tis=100 mean=0.03301511880126782 per=0.00033015118801267817 times=[0.04117956099798903, 0.030890661000739783, 0.03007458901265636, 0.03125198099587578, 0.03167880199907813]
number_of_tis=500 mean=0.15320950179593637 per=0.0003064190035918727 times=[0.20054609200451523, 0.14052859699586406, 0.14509809199080337, 0.1365471329918364, 0.1433275949966628]
number_of_tis=1000 mean=0.2929377429973101 per=0.0002929377429973101 times=[0.3517978919990128, 0.2807794280088274, 0.2806490379880415, 0.27710555399244186, 0.27435680299822707]
number_of_tis=3000 mean=0.9935687056015012 per=0.00033118956853383374 times=[1.2047388390055858, 0.8248025969951414, 0.8685875020019012, 0.9017027500085533, 1.1680118399963249]
number_of_tis=5000 mean=1.5349355740036117 per=0.00030698711480072236 times=[1.8663743910001358, 1.5182018500054255, 1.5446484510030132, 1.3932801040064078, 1.3521730740030762]
number_of_tis=10000 mean=3.7448632712010292 per=0.0003744863271201029 times=[4.135914924001554, 3.4411147559876554, 3.526543836007477, 3.7195197630062466, 3.9012230770022143]
number_of_tis=15000 mean=6.3099766838044165 per=0.00042066511225362775 times=[6.552250057997298, 6.1369703890086384, 6.8749958210100885, 6.067943914007628, 5.917723236998427]
number_of_tis=20000 mean=8.317583500797628 per=0.00041587917503988143 times=[8.720249108009739, 8.0188543760014, 8.328030352990027, 8.398350054994808, 8.122433611992165]
```

After:

```
number_of_tis=1 mean=0.026246879794052803 per=0.026246879794052803 times=[0.031441625993466005, 0.025166517996694893, 0.02518146399233956, 0.024703859991859645, 0.02474093099590391]
number_of_tis=10 mean=0.02652196400158573 per=0.002652196400158573 times=[0.027266821009106934, 0.026017504002084024, 0.02769480799906887, 0.025840838003205135, 0.025789848994463682]
number_of_tis=50 mean=0.032463929001824 per=0.00064927858003648 times=[0.03659850900294259, 0.03128377899702173, 0.03133225999772549, 0.030985830002464354, 0.032119267008965835]
number_of_tis=100 mean=0.03862043260014616 per=0.0003862043260014616 times=[0.04082123900298029, 0.03752484500000719, 0.037281844997778535, 0.03927708099945448, 0.0381971530005103]
number_of_tis=500 mean=0.10123570079740603 per=0.00020247140159481206 times=[0.11780315199575853, 0.09932849500910379, 0.10016329499194399, 0.09410478499194141, 0.09477877699828241]
number_of_tis=1000 mean=0.17536458960094023 per=0.00017536458960094024 times=[0.20034298300743103, 0.17775658299797215, 0.17178491500089876, 0.16488367799320258, 0.16205478900519665]
number_of_tis=3000 mean=0.5013463032053551 per=0.00016711543440178504 times=[0.6868100110004889, 0.46566563300439157, 0.44849480800621677, 0.4379984680126654, 0.46776259600301273]
number_of_tis=5000 mean=0.840471555799013 per=0.0001680943111598026 times=[1.0285392189980485, 0.8854761679976946, 0.7579579270095564, 0.730956947998493, 0.7994275169912726]
number_of_tis=10000 mean=1.975292908004485 per=0.0001975292908004485 times=[1.9648507620004239, 1.8537165410089074, 1.8826112380047562, 1.9254138420074014, 2.2498721570009366]
number_of_tis=15000 mean=3.4746556333935588 per=0.00023164370889290392 times=[4.0400224499899196, 3.1751998239924433, 3.6206128539924975, 3.6852884859981714, 2.8521545529947616]
number_of_tis=20000 mean=4.678154367001843 per=0.00023390771835009216 times=[4.465847548010061, 4.571855771995615, 4.749505186002352, 4.724330568002188, 4.8792327609990025]
```

MySQL is only 10-15% faster (and a lot noisier)

Before:

```
number_of_tis=1 mean=0.006164804595755413 per=0.006164804595755413 times=[0.013516580002033152, 0.00427598599344492, 0.004508020996581763, 0.004067091998877004, 0.004456343987840228]
number_of_tis=10 mean=0.007822793803643435 per=0.0007822793803643434 times=[0.0081135170039488, 0.00719467100861948, 0.009007985994685441, 0.00758794900320936, 0.007209846007754095]
number_of_tis=50 mean=0.020377356800599954 per=0.00040754713601199905 times=[0.02612382399092894, 0.018950315003166907, 0.019109474000288174, 0.018008680999628268, 0.019694490008987486]
number_of_tis=100 mean=0.040682651600218375 per=0.00040682651600218374 times=[0.05449078499805182, 0.037430580996442586, 0.039291110006161034, 0.03625023599306587, 0.035950546007370576]
number_of_tis=500 mean=0.18646696420037187 per=0.00037293392840074375 times=[0.24278165798750706, 0.17090376401029062, 0.1837275660072919, 0.16893767600413412, 0.1659841569926357]
number_of_tis=1000 mean=0.5903461098030676 per=0.0005903461098030675 times=[0.6001852740009781, 0.5642872750031529, 0.686630773008801, 0.5578094649972627, 0.5428177620051429]
number_of_tis=3000 mean=1.9076304554007948 per=0.0006358768184669316 times=[2.042052763994434, 2.1137778090051142, 1.7461599689995637, 1.7260139089921722, 1.9101478260126896]
number_of_tis=5000 mean=2.9185905692051164 per=0.0005837181138410233 times=[2.9221124830073677, 3.2889883980096783, 2.7569778940087417, 2.973596281008213, 2.651277789991582]
number_of_tis=10000 mean=8.880191986600403 per=0.0008880191986600403 times=[7.3548113360011484, 9.13715232499817, 9.568511486999341, 8.80206210000324, 9.538422685000114]
number_of_tis=15000 mean=15.426499317999696 per=0.0010284332878666464 times=[14.944712879005237, 15.38737604500784, 15.409629273999599, 15.852925243991194, 15.53785314799461]
number_of_tis=20000 mean=20.579332908798825 per=0.0010289666454399414 times=[20.362008597003296, 19.878823954990366, 20.73281196100288, 20.837948996995692, 21.085071034001885]
```

After:

```
number_of_tis=1 mean=0.04114753239555284 per=0.04114753239555284 times=[0.05534043599618599, 0.03716265498951543, 0.039479082988691516, 0.03779561800183728, 0.035959870001534]
number_of_tis=10 mean=0.038440523599274454 per=0.003844052359927445 times=[0.03949839199776761, 0.03853203100152314, 0.03801383898826316, 0.03784418400027789, 0.03831417200854048]
number_of_tis=50 mean=0.05345874359773006 per=0.0010691748719546012 times=[0.07045628099876922, 0.04431965999538079, 0.06068256100115832, 0.04566028399858624, 0.04617493199475575]
number_of_tis=100 mean=0.06805712619971019 per=0.0006805712619971019 times=[0.07946423999965191, 0.06054415399557911, 0.06277450300694909, 0.07836744099040516, 0.05913529300596565]
number_of_tis=500 mean=0.17929348759935237 per=0.00035858697519870476 times=[0.2792787920043338, 0.16563376400154084, 0.14093860499269795, 0.1464673139998922, 0.16414896299829707]
number_of_tis=1000 mean=0.3883620931970654 per=0.00038836209319706536 times=[0.47511668599327095, 0.3506359229941154, 0.43458069299231283, 0.33563552900159266, 0.3458416350040352]
number_of_tis=3000 mean=1.3977356655988842 per=0.0004659118885329614 times=[1.575020256001153, 1.3353702509921277, 1.4193720350012882, 1.4037733709992608, 1.2551424150005914]
number_of_tis=5000 mean=2.3742491033975965 per=0.0004748498206795193 times=[2.4926851909986, 2.501419166001142, 2.2862377730052685, 2.4421103859931463, 2.1487930009898264]
number_of_tis=10000 mean=8.138347979800892 per=0.0008138347979800893 times=[6.648954969001352, 8.001181932995678, 8.551437315007206, 9.084980526997242, 8.405185155002982]
number_of_tis=15000 mean=14.065810968197184 per=0.0009377207312131455 times=[13.222158194999793, 14.375066226988565, 14.108006285998272, 14.157014351992984, 14.466809781006305]
number_of_tis=20000 mean=18.36637533060275 per=0.0009183187665301375 times=[17.728908119010157, 18.62269214099797, 18.936747477011522, 17.74613195299753, 18.797396962996572]
```

[1]: https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-batch-mode
Guess who can't do maths. Postgres is 77% quicker, not 40%.
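For anyone checking the arithmetic: the two figures are different readings of the same pair of means. A small sketch, using the PostgreSQL `number_of_tis=20000` means reported in the description (the variable names are only for illustration):

```python
# Means for number_of_tis=20000 on PostgreSQL, copied from the benchmark output.
before_s = 8.317583500797628  # before the change
after_s = 4.678154367001843   # with bulk_save_objects

# Fraction of wall-clock time removed (roughly where the "over 40%" figure comes from).
time_saved = 1 - after_s / before_s

# How much quicker the new code is relative to the old (the "77% quicker" reading).
speedup = before_s / after_s - 1

print(f"time saved: {time_saved:.1%}")
print(f"quicker by: {speedup:.1%}")
```

The first number comes out at about 44% and the second at about 78%, so both statements describe the same measurement.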
And I've found another 30% or so for postgres (testing MySQL) so it's now over twice as fast at creating dag runs.
It gives us an extra X% speed up over bulk_save_objects, but we can't use it when the task_instance_mutation_hook does anything, as that hook needs an actual object.

So _when_ we know that hook won't do anything we switch into insert_mappings mode.

New speeds:

PostgreSQL:

```
number_of_tis=1 mean=0.028053103599813767 per=0.028053103599813767 times=[0.03762496300623752, 0.02637488600157667, 0.025065611000172794, 0.024561002996051684, 0.026639054995030165]
number_of_tis=10 mean=0.02647183560184203 per=0.002647183560184203 times=[0.02698062499985099, 0.026417658998980187, 0.027347976007149555, 0.025797458001761697, 0.025815460001467727]
number_of_tis=50 mean=0.03149963079486042 per=0.0006299926158972085 times=[0.03810671299288515, 0.03055680700344965, 0.029733988994848914, 0.03016914198815357, 0.02893150299496483]
number_of_tis=100 mean=0.033998635396710594 per=0.0003399863539671059 times=[0.0351028829900315, 0.03299884400621522, 0.03358584298985079, 0.03295094799250364, 0.03535465900495183]
number_of_tis=500 mean=0.07903424859978259 per=0.00015806849719956516 times=[0.08279920800123364, 0.08588568199775182, 0.07312070899934042, 0.07360191999759991, 0.07976372400298715]
number_of_tis=1000 mean=0.12571056479937398 per=0.00012571056479937398 times=[0.12573593499837443, 0.12141938100103289, 0.12616568499652203, 0.12907471299695317, 0.12615711000398733]
number_of_tis=3000 mean=0.36025245799683037 per=0.00012008415266561012 times=[0.36071603700111154, 0.3470657339930767, 0.3373015969991684, 0.3337128989951452, 0.42246602299564984]
number_of_tis=5000 mean=0.6916533229988999 per=0.00013833066459977998 times=[0.9647149289958179, 0.6451378140045563, 0.5970188640058041, 0.5849326960014878, 0.6664623119868338]
number_of_tis=10000 mean=2.071472014003666 per=0.00020714720140036663 times=[2.957865878008306, 1.9388906149979448, 1.766649461002089, 1.8647991580073722, 1.8291549580026185]
number_of_tis=15000 mean=2.866650845797267 per=0.00019111005638648446 times=[3.3783503199956613, 2.657773957995232, 2.707275656008278, 2.7875704979960574, 2.802283796991105]
number_of_tis=20000 mean=3.5886989389982773 per=0.00017943494694991387 times=[3.969436354993377, 3.436962780993781, 3.9078941010084236, 3.6387251569976797, 2.9904763009981252]
```

MySQL:

```
number_of_tis=1 mean=0.035956257799989545 per=0.035956257799989545 times=[0.03932315899874084, 0.03545605999534018, 0.03535486999317072, 0.034727805003058165, 0.03491939500963781]
number_of_tis=10 mean=0.036957260797498746 per=0.0036957260797498745 times=[0.040442515004542656, 0.0379129799985094, 0.03494819799379911, 0.03562593398964964, 0.03585667700099293]
number_of_tis=50 mean=0.04745422120031435 per=0.0009490844240062871 times=[0.06965546800347511, 0.04221734800375998, 0.04038520700123627, 0.040363031992455944, 0.04465005100064445]
number_of_tis=100 mean=0.0528092162014218 per=0.000528092162014218 times=[0.06113427500531543, 0.04883724599494599, 0.05276876600692049, 0.047688748003565706, 0.05361704599636141]
number_of_tis=500 mean=0.16223246100416872 per=0.0003244649220083374 times=[0.24469116200634744, 0.1407806619972689, 0.14792052800476085, 0.14703868801007047, 0.13073126500239596]
number_of_tis=1000 mean=0.285728433605982 per=0.00028572843360598197 times=[0.3230128890136257, 0.27035739900020417, 0.3003890450054314, 0.2638379510026425, 0.2710448840080062]
number_of_tis=3000 mean=1.1824120475997915 per=0.0003941373491999305 times=[1.3103130240051541, 1.286688863998279, 1.1455156929878285, 1.1072918410063721, 1.062250816001324]
number_of_tis=5000 mean=1.9416745471942705 per=0.0003883349094388541 times=[2.3746965279860888, 1.9103765429899795, 2.0542518720030785, 1.7706374429981224, 1.598410349994083]
number_of_tis=10000 mean=5.059874459402636 per=0.0005059874459402636 times=[5.431018351999228, 5.262124675995437, 5.174487816999317, 4.423381198008428, 5.008360254010768]
number_of_tis=15000 mean=9.717965700797503 per=0.0006478643800531668 times=[7.884617075993447, 9.466949063993525, 10.005758297003922, 10.105231182998978, 11.127272883997648]
number_of_tis=20000 mean=16.2008618004038 per=0.00081004309002019 times=[14.645835625007749, 16.304637463006657, 16.255490412993822, 16.830263861003914, 16.968081640006858]
```
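The mode switch described in this commit message can be sketched roughly as follows. This is an illustration only, not the code in the PR: `TaskInstanceRow`, `plan_bulk_insert`, and the `is_noop` marker attribute are hypothetical names, and the actual `Session.bulk_save_objects` / `Session.bulk_insert_mappings` calls are replaced by returning the chosen mode and its payload so the sketch runs without a database.

```python
from dataclasses import dataclass


@dataclass
class TaskInstanceRow:
    """Hypothetical stand-in for a TaskInstance ORM object."""
    task_id: str
    run_id: str
    queue: str = "default"


def default_hook(ti):
    """Default mutation hook that changes nothing."""


# Assumption for this sketch: a no-op hook is marked with an attribute.
default_hook.is_noop = True


def plan_bulk_insert(task_ids, run_id, hook):
    """Pick the bulk mode and build the payload for it."""
    if getattr(hook, "is_noop", False):
        # Hook does nothing, so plain dicts are enough; these would go to
        # Session.bulk_insert_mappings(Model, mappings).
        rows = [{"task_id": t, "run_id": run_id, "queue": "default"} for t in task_ids]
        return "insert_mappings", rows

    # Hook may mutate each task instance, so it needs real objects; these
    # would go to Session.bulk_save_objects(objects).
    objects = []
    for t in task_ids:
        ti = TaskInstanceRow(task_id=t, run_id=run_id)
        hook(ti)
        objects.append(ti)
    return "save_objects", objects


def priority_hook(ti):
    """Example of a hook that actually does something."""
    ti.queue = "high"


mode_fast, rows = plan_bulk_insert(["a", "b"], "run_1", default_hook)
mode_slow, objs = plan_bulk_insert(["a", "b"], "run_1", priority_hook)
```

A user-supplied hook without the marker falls back to the object-based path, which is the safe default.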
NIIIIIICE ! Bulk operations on DB rock!
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
Please let me merge this one -- I want to update the commit message to make sense.
nice, we have been using
Maybe this change will give more speed up on mysql in practice than I got in my testing 🤞🏻
    except (NoSectionError, NoOptionError):
        return default

    if len(data) == 0:
@ashb any reason not to check `if data` instead?
@andriisoldatenko -> I also prefer an explicit len check in such cases. First of all, you'd have to write `if not data`. Negation in a condition requires one more mental hoop to understand what it does, and secondly, `len` expresses the intention much more explicitly (`if not data` is also true when data is None).
Shorter does not always mean more readable.
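To make the difference between the two checks concrete, here is a tiny sketch (the helper names are made up for illustration):

```python
def is_empty_by_len(data):
    # Only answers "is this container/string empty?"; rejects None loudly.
    return len(data) == 0


def is_empty_by_truthiness(data):
    # Also true for None, 0, and any other falsy value.
    return not data


# Both agree on an empty string and on a non-empty string.
both_empty = is_empty_by_len("") and is_empty_by_truthiness("")
both_nonempty = not is_empty_by_len("{}") and not is_empty_by_truthiness("{}")

# They differ on None: truthiness silently treats it as empty,
# while len() raises TypeError.
none_is_falsy = is_empty_by_truthiness(None)
try:
    is_empty_by_len(None)
    len_rejects_none = False
except TypeError:
    len_rejects_none = True
```

Which behaviour is preferable depends on whether None is a valid input at that point in the code.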
This uses the "bulk" operation API of SQLAlchemy to get a big speed up. Due to the `task_instance_mutation_hook` we still need to keep actual TaskInstance objects around.
For PostgreSQL we have enabled the "batch operation helpers"[1], which make it even faster. The default page sizes are chosen somewhat randomly based on the SQLA docs.
To make these options configurable I have added (and used here and in KubeConfig) a new `getjson` option to the AirflowConfigParser class.
Postgresql is now 142% faster:
Before:
With bulk_save_objects (4.678154367001843s)
With bulk_insert_mappings
MySQL is, sadly, only 25% faster
Before:
With bulk_save_objects (18.36637533060275s)
With bulk_insert_mappings
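As a rough idea of what a `getjson` option does, here is a minimal sketch built on the standard library `configparser`. It is not the Airflow implementation: `JsonConfigParser`, the `[core]` option names, and the `fallback` parameter name are assumptions made for this example. The three `executemany_*` keys are the psycopg2 batch-mode engine arguments documented by SQLAlchemy 1.3; the page sizes shown are arbitrary.

```python
import json
from configparser import ConfigParser, NoOptionError, NoSectionError


class JsonConfigParser(ConfigParser):
    """Hypothetical stand-in for AirflowConfigParser."""

    def getjson(self, section, key, fallback=None):
        """Return the option parsed as JSON, or `fallback` if unset or empty."""
        try:
            data = self.get(section, key)
        except (NoSectionError, NoOptionError):
            return fallback

        if len(data) == 0:
            return fallback

        try:
            return json.loads(data)
        except json.JSONDecodeError as e:
            raise ValueError(f"Unable to parse [{section}] {key} as JSON") from e


# interpolation=None so that a literal "%" in a JSON value is not interpreted.
conf = JsonConfigParser(interpolation=None)
conf.read_string(
    """
[core]
engine_args = {"executemany_mode": "values", "executemany_values_page_size": 10000, "executemany_batch_page_size": 2000}
empty_args =
"""
)

engine_args = conf.getjson("core", "engine_args", fallback={})
empty_args = conf.getjson("core", "empty_args", fallback={})
missing_args = conf.getjson("core", "no_such_option", fallback={"pool_size": 5})
```

A dict parsed this way could then be passed as keyword arguments to `create_engine`, which is how configurable page sizes would reach the PostgreSQL dialect.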