Ever wondered how to send multiple HTTP requests across multiple processes with Elixir? Let's use Elixir's glorious Tasks to map your desired website.
defmodule Looter do
  def grabber(urls \\ ["http://www.simon-neutert.de", "http://www.trojanischeresel.de", "http://www.trojanischeresel.de/blog"]) do
    urls
    |> Enum.map(fn url -> Task.async(fn -> Looter.digger(url) end) end)
    |> Enum.map(fn task -> Task.await(task, 145_000) end) # 145_000 == timeout in milliseconds
  end

  def digger(url) do
    %HTTPoison.Response{body: body, status_code: status_code} = HTTPoison.get!(url)

    case status_code do
      200 ->
        {_, _, title} = List.first(Floki.find(body, "title"))
        IO.puts(title)
        {:ok, title}

      _ ->
        IO.puts("Error #{status_code}")
        {:error, "Error #{status_code}"}
    end
  end
end
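To try it out from iex (assuming httpoison and floki are listed in your project's deps and started), you can call the module with its defaults or pass your own list of URLs:

```elixir
# Uses the default URL list baked into grabber/1:
Looter.grabber()

# Or pass any list of URLs you like (hypothetical example URL):
Looter.grabber(["https://elixir-lang.org"])
```

Each call returns a list of `{:ok, title}` or `{:error, reason}` tuples, one per URL, in the original order.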
Bonus: Experiment with Enum.chunk_every/2 (the successor to the deprecated Enum.chunk/2), so you can set up pools of workers and limit the number of processes created.
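A minimal sketch of that idea, reusing the Looter module above — the batch size of 25 is an arbitrary assumption, tune it to your target server. Each chunk is fully awaited before the next one starts, so at most 25 Tasks are in flight at once:

```elixir
urls
|> Enum.chunk_every(25)
|> Enum.flat_map(fn batch ->
  batch
  |> Enum.map(fn url -> Task.async(fn -> Looter.digger(url) end) end)
  |> Enum.map(fn task -> Task.await(task, 145_000) end)
end)
```

`Enum.flat_map/2` stitches the per-batch result lists back into one flat list, so the return shape matches the unchunked version.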
The key to what makes this so efficient is:
Enum.map(urls, fn url -> Task.async(fn -> Looter.digger(url) end) end)
|> Enum.map(fn task -> Task.await(task, 145_000) end)
Read more about parallel maps on elixir-recipes, and holyxiaoxin's article is well worth reading, too.
Not fast enough? Don't mind using Ruby? I came up with a solution using EM-Synchrony and Nokogiri that is up to 4 times faster, using only around 25 concurrent connections.