Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems