Web Application & REST API Integration Plugin

The plugin integrates web application testing functionality with REST API features.

Installation

Copy the line below to the dependencies section of the project build.gradle file:

Example 1. build.gradle

implementation(group: 'org.vividus', name: 'vividus-plugin-web-app-to-rest-api', version: '0.5.7')

If the project was imported into the IDE before the plugin was added, re-generate the IDE configuration files and then refresh the project in the IDE.

Table Transformers

FROM_SITEMAP

The FROM_SITEMAP transformer generates a table based on the website sitemap.

Parameter	Description
`siteMapRelativeUrl`	relative URL of `sitemap.xml`
`ignoreErrors`	ignore sitemap parsing errors (true or false)
`column`	the column name in the generated table

Property Name	Acceptable values	Default	Description
`transformer.from-sitemap.ignore-errors`	`true` `false`	`false`	ignore sitemap parsing errors
`transformer.from-sitemap.filter-redirects`	`true` `false`	`false`	defines whether URLs that redirect to a URL already included in the table are excluded from the table

Required properties

web-application.main-page-url - defines the main application page URL

Example 2. Usage example

Examples:
{transformer=FROM_SITEMAP, siteMapRelativeUrl=/sitemap.xml, ignoreErrors=true, column=page-url}

FROM_HEADLESS_CRAWLING

The FROM_HEADLESS_CRAWLING transformer generates a table based on the results of headless crawling.

Parameter Name	Description
`column`	The column name in the generated table.

Property Name	Acceptable values	Default	Description
General
`transformer.from-headless-crawling.seed-relative-urls`	Comma-separated list of values		List of relative seed URLs. A seed URL is a URL that the crawler fetches to extract new URLs from it and follow them for crawling.
`transformer.from-headless-crawling.exclude-extensions-regex`	Regular expression	`(css|gif|gz|ico|jpeg|jpg|js|mp3|mp4|pdf|png|svg|zip|woff2|woff|ttf|doc|docx)`	The regular expression to match extensions in URLs. The crawler ignores all URLs referring to files with extensions matching the given regular expression. URLs without extensions are always crawled.
`transformer.from-headless-crawling.filter-redirects`	`true` `false`	`false`	Defines whether URLs that redirect to a URL already included in the table are excluded from the table.
`transformer.from-headless-crawling.socket-timeout`	`integer`	`40000`	Socket timeout in milliseconds.
`transformer.from-headless-crawling.connection-timeout`	`integer`	`30000`	Connection timeout in milliseconds.
`transformer.from-headless-crawling.max-download-size`	`integer`	`1048576`	Max allowed size of a page in bytes. Pages larger than this size will not be fetched.
`transformer.from-headless-crawling.max-connections-per-host`	`integer`	`100`	Maximum connections per host.
`transformer.from-headless-crawling.max-total-connections`	`integer`	`100`	Maximum total connections.
`transformer.from-headless-crawling.follow-redirects`	`true` / `false`	`true`	Whether to follow redirects.
`transformer.from-headless-crawling.max-depth-of-crawling`	`integer`	`-1`	Maximum depth of crawling. Set to -1 for unlimited depth.
`transformer.from-headless-crawling.max-pages-to-fetch`	`integer`	`-1`	Maximum number of pages to fetch. Set to -1 for an unlimited number of pages.
`transformer.from-headless-crawling.politeness-delay`	`integer`	`0`	Politeness delay in milliseconds between sending two requests to the same host.
`transformer.from-headless-crawling.max-outgoing-links-to-follow`	`integer`	`5000`	Max number of outgoing links which are processed from a page.
`transformer.from-headless-crawling.respect-no-follow`	`true` `false`	`false`	Whether to honor links with the nofollow flag.
`transformer.from-headless-crawling.respect-no-index`	`true` `false`	`false`	Whether to honor links with the noindex flag.
`transformer.from-headless-crawling.user-agent-string`	`string`	`crawler4j (https://github.com/rzo1/crawler4j/)`	User agent.
`transformer.from-headless-crawling.cookie-policy`	`ignore`, `standard`, `relaxed`	`no default value`	Cookie policy as defined per cookie specification.
`transformer.from-headless-crawling.allow-single-level-domain`	`true` `false`	`false`	Whether to consider single level domains valid (e.g. http://localhost).
`transformer.from-headless-crawling.include-https-pages`	`true` `false`	`true`	Whether to crawl https pages.
Proxy
`transformer.from-headless-crawling.proxy-host`	`URL`	`no default value`	Proxy host.
`transformer.from-headless-crawling.proxy-port`	`integer`	`80`	Proxy port.
`transformer.from-headless-crawling.proxy-username`	`string`	`no default value`	Username to authenticate with proxy.
`transformer.from-headless-crawling.proxy-password`	`string`	`no default value`	Password to authenticate with proxy.
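
To illustrate the effect of transformer.from-headless-crawling.exclude-extensions-regex, here is a minimal Python sketch. It is not the plugin's implementation: it assumes the expression is matched against the whole file extension taken from the URL path, and the is_crawled helper and the example.com URLs are hypothetical.

```python
import re
from posixpath import splitext
from urllib.parse import urlparse

# Default value of exclude-extensions-regex from the table above.
EXCLUDE_EXTENSIONS = re.compile(
    r"(css|gif|gz|ico|jpeg|jpg|js|mp3|mp4|pdf|png|svg|zip|woff2|woff|ttf|doc|docx)"
)


def is_crawled(url: str) -> bool:
    """Return True if the URL would be kept under the assumed matching rule."""
    # Take the extension of the last path segment, without the leading dot.
    extension = splitext(urlparse(url).path)[1].lstrip(".")
    if not extension:
        # URLs without extensions are always crawled.
        return True
    # Assumption: the whole extension must match the expression to be excluded.
    return EXCLUDE_EXTENSIONS.fullmatch(extension) is None


if __name__ == "__main__":
    for url in ("https://example.com/about", "https://example.com/assets/site.css"):
        print(url, is_crawled(url))
```

Under these assumptions a page such as /about or /index.html is kept, while /assets/site.css is skipped.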

Required properties

web-application.main-page-url - defines the main application page URL

Example 3. Usage example

Examples:
{transformer=FROM_HEADLESS_CRAWLING, column=page-url}

Steps

Validate resources

Validates resources on web pages.

Resource validation logic:

1. If a row in pages contains a relative URL, it is resolved against the URL in the web-application.main-page-url property. For example, if the main page URL is https://elderscrolls.bethesda.net/ and the relative URL is /skyrim10, the resulting URL is https://elderscrolls.bethesda.net/skyrim10.
2. Collect elements by the CSS selector from each page.
3. Get either the href or src attribute value from each element. If neither attribute exists, the validation fails.
4. For each received value, execute a HEAD request:
    1. If the status code is 200 OK, the resource validation is considered passed.
    2. If the status code is one of 404 Not Found, 405 Method Not Allowed, 501 Not Implemented, or 503 Service Unavailable, a GET request is executed.
    3. If the GET status code is 200 OK, the resource validation is considered passed; otherwise it is considered failed.
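
The URL resolution and the HEAD-then-GET rule described above can be sketched in Python as follows. This is an illustration only, not the plugin's code: the names resolve_url and is_resource_valid are hypothetical, the HTTP call is passed in as a send(method, url) function so the decision logic stays independent of any HTTP client, and treating any other HEAD status code as a failure is an assumption the list above does not spell out.

```python
from typing import Callable
from urllib.parse import urljoin

# HEAD status codes after which the check is repeated with a GET request.
RETRY_WITH_GET = {404, 405, 501, 503}


def resolve_url(main_page_url: str, page: str) -> str:
    """Resolve a relative page URL against the main application page URL.

    Absolute URLs are returned unchanged.
    """
    return urljoin(main_page_url, page)


def is_resource_valid(url: str, send: Callable[[str, str], int]) -> bool:
    """Apply the HEAD-then-GET rule; send(method, url) returns a status code."""
    head_status = send("HEAD", url)
    if head_status == 200:
        return True
    if head_status in RETRY_WITH_GET:
        # Some servers reject HEAD requests, so retry with GET.
        return send("GET", url) == 200
    # Assumption: any other HEAD status code fails the validation.
    return False


if __name__ == "__main__":
    print(resolve_url("https://elderscrolls.bethesda.net/", "/skyrim10"))

    def fake_send(method: str, url: str) -> int:
        # Hypothetical server that does not allow HEAD but serves GET.
        return 405 if method == "HEAD" else 200

    print(is_resource_valid("https://vividus.org/", fake_send))
```

Injecting send keeps the sketch runnable without network access; a real check would pass a function backed by an HTTP client.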

Then all resources by selector `$cssSelector` are valid on:$pages

$cssSelector - The CSS selector
$pages - The pages to validate resources on

Example 4. Validate resources

Then all resources by selector `a` are valid on:
|pages                        |
|https://vividus.org/         |
|/test-automation-made-awesome|