Expected Behavior:
The SAR model should accept userId and itemId as an integer type as specified in the documentation.
Actual Behavior:
The SAR model only works when userId and itemId are cast to DoubleType. This contradicts the documentation, which states that userId should be within the integer value range.
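Whether casting IDs to DoubleType is safe depends on how large they are. The sketch below checks if an integer ID survives a round trip through a 64-bit float. It assumes "integer value range" means 32-bit signed integers, which the issue does not spell out; the names `INT32_MAX` and `survives_double_round_trip` are illustrative and not part of SynapseML.

```python
# Assumption: "integer value range" refers to 32-bit signed integers.
INT32_MAX = 2**31 - 1


def survives_double_round_trip(value: int) -> bool:
    """Return True if `value` is unchanged after conversion to a 64-bit float.

    Python's float is an IEEE 754 double, the same representation that
    Spark's DoubleType uses, so this mirrors a cast of a long ID to double.
    """
    return int(float(value)) == value


print(survives_double_round_trip(INT32_MAX))  # True
print(survives_double_round_trip(2**53))      # True
print(survives_double_round_trip(2**53 + 1))  # False: rounds to 2**53
```

Every ID in the 32-bit range passes this check, so the DoubleType cast loses nothing for such data; only IDs above 2**53 would risk colliding after the cast.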
Code to reproduce issue
import requests
import zipfile
import io
import pandas as pd
from pyspark.sql.types import DoubleType, LongType
from synapse.ml.recommendation import SAR
url = "http://files.grouplens.org/datasets/movielens/ml-25m.zip"
response = requests.get(url)
with zipfile.ZipFile(io.BytesIO(response.content)) as z:
    with z.open('ml-25m/ratings.csv') as csvfile:
        pdf_ratings = pd.read_csv(csvfile)
pdf_ratings["rating"] = 1.0
spark_df_ratings = spark.createDataFrame(pdf_ratings)
print("Before casting:")
spark_df_ratings.printSchema()
spark_df_ratings = spark_df_ratings.withColumn("userId", spark_df_ratings["userId"].cast(LongType()))
spark_df_ratings = spark_df_ratings.withColumn("movieId", spark_df_ratings["movieId"].cast(LongType()))
spark_df_ratings = spark_df_ratings.withColumn("rating", spark_df_ratings["rating"].cast(DoubleType()))
spark_df_ratings = spark_df_ratings.withColumn("timestamp", spark_df_ratings["timestamp"].cast(LongType()))
print("After casting:")
spark_df_ratings.printSchema()
sar = SAR(
    userCol="userId",
    itemCol="movieId",
    ratingCol="rating",
    timeCol="timestamp",
    implicitPrefs=True,
    activityTimeFormat="epoch"
)
model = sar.fit(spark_df_ratings)
Other info / logs
Py4JJavaError: An error occurred while calling o629.fit.
: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:116)
at org.apache.spark.sql.Row.getDouble(Row.scala:275)
at org.apache.spark.sql.Row.getDouble$(Row.scala:275)
at org.apache.spark.sql.catalyst.expressions.GenericRow.getDouble(rows.scala:28)
at com.microsoft.azure.synapse.ml.recommendation.SAR.calculateUserItemAffinities(SAR.scala:99)
at com.microsoft.azure.synapse.ml.recommendation.SAR.$anonfun$fit$1(SAR.scala:69)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:163)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:160)
at com.microsoft.azure.synapse.ml.recommendation.SAR.logVerb(SAR.scala:36)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logFit(SynapseMLLogging.scala:153)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logFit$(SynapseMLLogging.scala:152)
at com.microsoft.azure.synapse.ml.recommendation.SAR.logFit(SAR.scala:36)
at com.microsoft.azure.synapse.ml.recommendation.SAR.fit(SAR.scala:75)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.lang.Thread.run(Thread.java:750)
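The trace shows `Row.getDouble` being applied to a `Long` value inside `calculateUserItemAffinities`, which matches the observation that fitting only succeeds with DoubleType ID columns. Until that is fixed, a possible workaround is to cast the ID columns before calling `fit`. This is a minimal sketch: the helper `cast_columns_to_double` is hypothetical, not part of SynapseML, and it assumes `df` is a `pyspark.sql.DataFrame` such as `spark_df_ratings` from the reproduction.

```python
def cast_columns_to_double(df, columns):
    """Return `df` with each column named in `columns` cast to double.

    Hypothetical helper, not part of SynapseML. `df` is expected to be a
    pyspark.sql.DataFrame; "double" is the Spark SQL name for DoubleType.
    """
    for name in columns:
        # withColumn with an existing name replaces that column in place.
        df = df.withColumn(name, df[name].cast("double"))
    return df


# Usage with the reproduction above (needs a live Spark session):
# workaround_df = cast_columns_to_double(spark_df_ratings, ["userId", "movieId"])
# model = sar.fit(workaround_df)
```

The type is passed as the string `"double"` so the helper needs no extra imports; passing `DoubleType()` to `cast` would work equally well.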
What component(s) does this bug affect?
area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue
What language(s) does this bug affect?
language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations
Fixes #2274
Update SAR model to accept `userId` and `itemId` as integer types (`LongType`).
* **SAR.scala**
- Update `calculateUserItemAffinities` method to handle `userId` and `itemId` as `LongType`.
- Update `calculateItemItemSimilarity` method to handle `userId` and `itemId` as `LongType`.
* **test_ranking.py**
- Add test case `test_adapter_evaluator_sar_with_long` to verify `userId` and `itemId` as `LongType`.
* **Smart Adaptive Recommendations.md**
- Update documentation to reflect that `userId` and `itemId` can be of `LongType`.
---
For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/microsoft/SynapseML/issues/2274?shareId=XXXX-XXXX-XXXX-XXXX).
SynapseML version
1.0.5