[Data Liberation] Add XML API, Stream API, WXR URL Rewriter API #1952

Merged
merged 36 commits into from
Oct 28, 2024
1ef710f
Data liberation: Kickoff the project
adamziel Oct 11, 2024
234a8bf
Port the URL rewriters from adamziel/site-transfer-protocol
adamziel Oct 13, 2024
819febd
Port WP_HTML_Processor et al. from WordPress
adamziel Oct 13, 2024
0a6167b
Move WordPress core files
adamziel Oct 13, 2024
826fe75
Outline the next steps
adamziel Oct 13, 2024
0633e6f
Add PHPCS and CBF
adamziel Oct 14, 2024
4406fcf
Update HTML API, fix unit tests
adamziel Oct 15, 2024
0cfd334
Merge branch 'trunk' into data-liberation-bring-in-php-parsers
adamziel Oct 15, 2024
b90a9d6
Bump CI PHP version to 8.1
adamziel Oct 15, 2024
081535b
Adjust the CI setup for PHP
adamziel Oct 15, 2024
aca88fe
Run npm install instead of installing just nx
adamziel Oct 15, 2024
897af50
Use the correct nx project name
adamziel Oct 15, 2024
f7679b0
Remove the network functions and only lint the src directory
adamziel Oct 15, 2024
5b9ec7d
Remove special casing for direct matching pathname prefixes
adamziel Oct 15, 2024
97fed71
Fix linting errors
adamziel Oct 15, 2024
96c1ce4
Move the additional functions to pbpcbf.php
adamziel Oct 15, 2024
e15408a
Replace iterate_urls with url_matches
adamziel Oct 15, 2024
b788eea
Lint PHP
adamziel Oct 15, 2024
b83933c
Thoroughly test WP_URL_In_Text_Processor
adamziel Oct 28, 2024
fb0204c
Enable tests for WP_Block_Markup_Processor
adamziel Oct 28, 2024
b1ea8dc
Enable all PHPUnit tests
adamziel Oct 28, 2024
4335044
Enable URLParserWHATWGComplianceTests
adamziel Oct 28, 2024
91863ca
move $is_relative declaration closer to where it's used
adamziel Oct 28, 2024
d2aeea4
Add a single tricky test case for wp_rewrite_urls()
adamziel Oct 28, 2024
60db1e1
Preserve urlencoded data in the rewritten path
adamziel Oct 28, 2024
2da0386
Unit test urldecoding UTF-8 data
adamziel Oct 28, 2024
54bea02
Lint
adamziel Oct 28, 2024
54c901d
Remove messing with private WP_HTML_Tag_Processor attributes
adamziel Oct 28, 2024
a62532b
Remove the commented out dead code from WP_URL_In_Text_Processor
adamziel Oct 28, 2024
238decd
Uncomment the public suffix list verification
adamziel Oct 28, 2024
37622ab
PHP 8.1 compat
adamziel Oct 28, 2024
e12190f
PHP 8.1 compliance
adamziel Oct 28, 2024
34cac36
[Data Liberation] Add XML API, Stream API, WXR URL Rewriter API
adamziel Oct 28, 2024
5dadb3e
Adjust how append_bytes() work to fix a failing test
adamziel Oct 28, 2024
06c5503
Lint
adamziel Oct 28, 2024
2af08a4
Merge branch 'trunk' into data-liberation-xml-parsers
adamziel Oct 28, 2024
41 changes: 18 additions & 23 deletions CHANGELOG.md
@@ -4,65 +4,60 @@ All notable changes to this project are documented in this file by a CI job
that runs on every NPM release. The file follows the [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
format.

## [v1.0.7] (2024-10-28)
## [v1.0.7] (2024-10-28)




## [v1.0.6] (2024-10-28)
## [v1.0.6] (2024-10-28)

### Website

- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))
- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))

### Contributors

The following contributors merged PRs in this release:

@adamziel @bgrgicak


## [v1.0.5] (2024-10-25)
## [v1.0.5] (2024-10-25)

### Enhancements

- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))
- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))

### Blueprints

- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))
- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))

### Documentation

- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))
- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))

### PHP WebAssembly

- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))
- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))

### Website

- Hide Settings menu after clicking "Restore from .zip. ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))
- Hide Settings menu after clicking "Restore from .zip. ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))

### Bug Fixes

- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))
- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))

### Contributors

The following contributors merged PRs in this release:

@adamziel @ajotka @bgrgicak @bph @brandonpayton @ockham @psrpinto


## [v1.0.4] (2024-10-21)

### Enhancements
41 changes: 18 additions & 23 deletions packages/docs/site/docs/main/changelog.md
@@ -9,65 +9,60 @@ All notable changes to this project are documented in this file by a CI job
that runs on every NPM release. The file follows the [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
format.

## [v1.0.7] (2024-10-28)
## [v1.0.7] (2024-10-28)




## [v1.0.6] (2024-10-28)
## [v1.0.6] (2024-10-28)

### Website

- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))
- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))

### Contributors

The following contributors merged PRs in this release:

@adamziel @bgrgicak


## [v1.0.5] (2024-10-25)
## [v1.0.5] (2024-10-25)

### Enhancements

- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))
- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))

### Blueprints

- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))
- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))

### Documentation

- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))
- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))

### PHP WebAssembly

- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))
- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))

### Website

- Hide Settings menu after clicking "Restore from .zip. ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))
- Hide Settings menu after clicking "Restore from .zip. ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))

### Bug Fixes

- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))
- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))

### Contributors

The following contributors merged PRs in this release:

@adamziel @ajotka @bgrgicak @bph @brandonpayton @ockham @psrpinto


## [v1.0.4] (2024-10-21)

### Enhancements
14 changes: 14 additions & 0 deletions packages/playground/data-liberation/bootstrap.php
@@ -1,5 +1,13 @@
<?php

require_once __DIR__ . '/src/stream-api/WP_Stream_Processor.php';
require_once __DIR__ . '/src/stream-api/WP_Byte_Stream_State.php';
require_once __DIR__ . '/src/stream-api/WP_Byte_Stream.php';
require_once __DIR__ . '/src/stream-api/WP_Processor_Byte_Stream.php';
require_once __DIR__ . '/src/stream-api/WP_File_Byte_Stream.php';
require_once __DIR__ . '/src/stream-api/WP_Stream_Paused_State.php';
require_once __DIR__ . '/src/stream-api/WP_Stream_Chain.php';

require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-token.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-span.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-text-replacement.php";
@@ -20,6 +28,12 @@
require_once __DIR__ . '/src/WP_Block_Markup_Url_Processor.php';
require_once __DIR__ . '/src/WP_URL_In_Text_Processor.php';
require_once __DIR__ . '/src/WP_URL.php';

require_once __DIR__ . '/src/xml-api/WP_XML_Decoder.php';
require_once __DIR__ . '/src/xml-api/WP_XML_Tag_Processor.php';
require_once __DIR__ . '/src/xml-api/WP_XML_Processor.php';
require_once __DIR__ . '/src/WP_WXR_URL_Rewrite_Processor.php';

require_once __DIR__ . '/vendor/autoload.php';


5 changes: 4 additions & 1 deletion packages/playground/data-liberation/phpunit.xml
@@ -2,12 +2,15 @@
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" bootstrap="bootstrap.php" colors="true" xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/10.0/phpunit.xsd" cacheDirectory=".phpunit.cache">
<testsuites>
<testsuite name="Application Test Suite">
<file>tests/WPWXRURLRewriterTests.php</file>
<file>tests/WPRewriteUrlsTests.php</file>
<file>tests/WPURLInTextProcessorTests.php</file>
<file>tests/WPBlockMarkupProcessorTests.php</file>
<file>tests/WPBlockMarkupUrlProcessorTests.php</file>
<file>tests/URLParserWHATWGComplianceTests.php</file>
<file>tests/UrldecodeNTests.php</file>
<file>tests/WPXMLProcessorTests.php</file>
<file>tests/WPXMLTagProcessorTests.php</file>
<file>tests/UrldecodeNTests.php</file>
</testsuite>
</testsuites>
</phpunit>
@@ -233,7 +233,7 @@ public function next_url() {
}

$tld = strtolower( substr( $parsed_url->hostname, $last_dot_position + 1 ) );
if ( empty( self::$public_suffix_list[ $tld ] ) ) {
if ( empty( self::$public_suffix_list[ $tld ] ) && $tld !== 'internal' ) {
// This TLD is not in the public suffix list. It's not a valid domain name.
continue;
}
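The changed condition above can be read as "accept the TLD if it is on the public suffix list, or if it is `internal`". A minimal standalone sketch of that check follows. The function name `is_accepted_tld()`, the two-entry suffix list, and the early return for hostnames without a dot are assumptions made for illustration; they are not taken from `WP_URL_In_Text_Processor`.

```php
<?php
// Hypothetical stand-in for the real public suffix list, keyed by TLD.
$public_suffix_list = array(
	'com' => true,
	'org' => true,
);

// Illustrative helper (not part of the PR): mirrors the TLD check in the hunk above.
function is_accepted_tld( string $hostname, array $public_suffix_list ): bool {
	$last_dot_position = strrpos( $hostname, '.' );
	if ( false === $last_dot_position ) {
		// Assumption: a hostname without a dot has no TLD to validate.
		return false;
	}

	$tld = strtolower( substr( $hostname, $last_dot_position + 1 ) );

	// Accept listed TLDs, plus the special-cased "internal" TLD.
	return ! empty( $public_suffix_list[ $tld ] ) || 'internal' === $tld;
}

var_dump( is_accepted_tld( 'example.com', $public_suffix_list ) );         // bool(true)
var_dump( is_accepted_tld( 'playground.internal', $public_suffix_list ) ); // bool(true)
var_dump( is_accepted_tld( 'example.notatld', $public_suffix_list ) );     // bool(false)
```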
@@ -0,0 +1,47 @@
<?php

class WP_WXR_URL_Rewrite_Processor {


public static function stream( $current_site_url, $new_site_url ) {
return WP_XML_Processor::stream(
function ( $processor ) use ( $current_site_url, $new_site_url ) {
if ( static::is_wxr_content_node( $processor ) ) {
$text = $processor->get_modifiable_text();
$updated_text = wp_rewrite_urls(
array(
'block_markup' => $text,
'current-site-url' => $current_site_url,
'new-site-url' => $new_site_url,
)
);
if ( $updated_text !== $text ) {
$processor->set_modifiable_text( $updated_text );
}
}
}
);
}

private static function is_wxr_content_node( WP_XML_Processor $processor ) {
$breadcrumbs = $processor->get_breadcrumbs();
if (
! in_array( 'excerpt:encoded', $breadcrumbs, true ) &&
! in_array( 'content:encoded', $breadcrumbs, true ) &&
! in_array( 'guid', $breadcrumbs, true ) &&
! in_array( 'link', $breadcrumbs, true ) &&
! in_array( 'wp:attachment_url', $breadcrumbs, true ) &&
! in_array( 'wp:comment_content', $breadcrumbs, true ) &&
! in_array( 'wp:base_site_url', $breadcrumbs, true ) &&
! in_array( 'wp:base_blog_url', $breadcrumbs, true )
// Meta values are not supported yet. We'll need to support
// WordPress core options that may be saved as JSON, serialized PHP, and XML,
// and then provide extension points for plugin authors to support
// their own options.
// !in_array('wp:postmeta', $processor->get_breadcrumbs())
) {
return false;
}
return true;
}
}
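The chain of negated `in_array()` calls in `is_wxr_content_node()` amounts to an allowlist test: a node qualifies when any of its breadcrumbs is one of the eight listed tag names. A minimal sketch of the same test on plain arrays, assuming breadcrumbs are a flat list of tag-name strings. The function name `is_rewritable_wxr_node()` and the sample breadcrumb paths are hypothetical and only serve the illustration.

```php
<?php
// Illustrative helper (not part of the PR): the same allowlist as
// is_wxr_content_node(), expressed with array_intersect().
function is_rewritable_wxr_node( array $breadcrumbs ): bool {
	$rewritable_tags = array(
		'excerpt:encoded',
		'content:encoded',
		'guid',
		'link',
		'wp:attachment_url',
		'wp:comment_content',
		'wp:base_site_url',
		'wp:base_blog_url',
	);

	// True when at least one breadcrumb is an allowlisted tag name.
	return count( array_intersect( $breadcrumbs, $rewritable_tags ) ) > 0;
}

// Hypothetical breadcrumb paths, assumed here for the sake of the example.
var_dump( is_rewritable_wxr_node( array( 'rss', 'channel', 'item', 'content:encoded' ) ) );           // bool(true)
var_dump( is_rewritable_wxr_node( array( 'rss', 'channel', 'item', 'wp:postmeta', 'wp:meta_value' ) ) ); // bool(false)
```

Keeping the tag names in one array would make it easier to extend the list later, for example once meta values are supported.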
@@ -0,0 +1,80 @@
<?php

abstract class WP_Byte_Stream {

protected $state;

public function __construct() {
$this->state = new WP_Byte_Stream_State();
}

public function is_eof(): bool {
return ! $this->state->output_bytes && $this->state->state === WP_Byte_Stream_State::STATE_FINISHED;
}

public function get_file_id() {
return $this->state->file_id;
}

public function skip_file(): void {
$this->state->last_skipped_file = $this->state->file_id;
}

public function is_skipped_file() {
return $this->state->file_id === $this->state->last_skipped_file;
}

public function get_chunk_type() {
if ( $this->get_last_error() ) {
return '#error';
}

if ( $this->is_eof() ) {
return '#eof';
}

return '#bytes';
}

public function append_eof() {
$this->state->input_eof = true;
}

public function append_bytes( string $bytes, $context = null ) {
$this->state->input_bytes .= $bytes;
$this->state->input_context = $context;
}

public function get_bytes() {
return $this->state->output_bytes;
}

public function next_bytes() {
$this->state->reset_output();
if ( $this->is_eof() ) {
return false;
}

// Process any remaining buffered input:
if ( $this->generate_next_chunk() ) {
return ! $this->is_skipped_file();
}

if ( ! $this->state->input_bytes ) {
if ( $this->state->input_eof ) {
$this->state->finish();
}
return false;
}

$produced_bytes = $this->generate_next_chunk();

return $produced_bytes && ! $this->is_skipped_file();
}

abstract protected function generate_next_chunk(): bool;

public function get_last_error(): string|null {
return $this->state->last_error;
}
}
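A minimal sketch of how a concrete stream might plug into `WP_Byte_Stream` and how a caller might drive it. `Uppercase_Byte_Stream` is a hypothetical subclass, not part of this PR. The script assumes it is saved in the package directory next to `bootstrap.php`, so that the `require_once` below loads the classes from this diff.

```php
<?php
// Assumption: this file sits next to bootstrap.php in the data-liberation package.
require_once __DIR__ . '/bootstrap.php';

// Hypothetical subclass for illustration: upper-cases whatever bytes it receives.
class Uppercase_Byte_Stream extends WP_Byte_Stream {
	protected function generate_next_chunk(): bool {
		// Take everything buffered so far; the input buffer is emptied.
		$bytes = $this->state->consume_input_bytes();
		if ( null === $bytes || '' === $bytes ) {
			// Nothing buffered, so no output chunk can be produced.
			return false;
		}
		$this->state->output_bytes = strtoupper( $bytes );
		return true;
	}
}

$stream = new Uppercase_Byte_Stream();
$out    = '';

foreach ( array( 'hello ', 'world' ) as $chunk ) {
	$stream->append_bytes( $chunk );
	// Drain every output chunk the stream can produce from the current input.
	while ( $stream->next_bytes() ) {
		$out .= $stream->get_bytes();
	}
}

// Signal the end of input; the next call moves the stream to its finished state.
$stream->append_eof();
$stream->next_bytes();

var_dump( $out );              // string(11) "HELLO WORLD"
var_dump( $stream->is_eof() ); // bool(true)
```

The caller owns the loop here: it appends input, drains output until `next_bytes()` returns false, and only then appends more input or signals EOF.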
@@ -0,0 +1,40 @@
<?php

/**
* This interface describes standalone streams, but it can also be
* used to describe a stream Processor like WP_XML_Processor.
*
* In this prototype there are no separate pipes, streams, or processors. There
* are only Byte Streams that can be chained together with the WP_Stream_Chain
* class.
*/
class WP_Byte_Stream_State {
const STATE_STREAMING = '#streaming';
const STATE_FINISHED = '#finished';

public $input_eof = false;
public $input_bytes = null;
public $output_bytes = null;
public $state = self::STATE_STREAMING;
public $last_error = null;
public $input_context = null;

public $file_id;
public $last_skipped_file;

public function reset_output() {
$this->output_bytes = null;
$this->file_id = 'default';
$this->last_error = null;
}

public function consume_input_bytes() {
$bytes = $this->input_bytes;
$this->input_bytes = null;
return $bytes;
}

public function finish() {
$this->state = self::STATE_FINISHED;
}
}