.loc behavior when using 'slice' #1158

Closed
beobest2 opened this issue Dec 29, 2019 · 2 comments · Fixed by #1159
Labels
bug Something isn't working

Comments

Contributor

beobest2 commented Dec 29, 2019

I tested the example shown in the documentation below and found something strange.

https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.Series.loc.html

pandas:
'sidewinder' comes after 'viper' in the index, so it is correctly left out of the result.

>>> pdf = pd.DataFrame([[1, 2], [4, 5], [7, 8]], 
...             index=['cobra', 'viper', 'sidewinder'],  
...             columns=['max_speed', 'shield'])
>>> pdf.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

But Koalas also returns 'sidewinder':

>>> kdf = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...              index=['cobra', 'viper', 'sidewinder'],
...              columns=['max_speed', 'shield'])
>>> kdf.loc['cobra':'viper', 'max_speed']
cobra         1
viper         4
sidewinder    7
Name: max_speed, dtype: int64

>>> kdf.loc['cobra':, 'max_speed']
cobra         1
viper         4
sidewinder    7
Name: max_speed, dtype: int64

>>> kdf.loc['sidewinder':, 'max_speed']
viper         4
sidewinder    7
Name: max_speed, dtype: int64

>>> kdf.loc['viper':, 'max_speed']
viper    4
Name: max_speed, dtype: int64

The index was created in the order ['cobra', 'viper', 'sidewinder'],
but Koalas appears to treat it as ['cobra', 'sidewinder', 'viper'].
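
A quick way to check this hypothesis is to model the slice as a plain alphabetical range filter over the labels. The sketch below is not Koalas code; `alphabetical_slice` is a hypothetical helper written only to see whether string comparison reproduces the outputs above.

```python
# Illustration only: a model of the suspected behavior, not Koalas internals.
labels = ["cobra", "viper", "sidewinder"]  # index labels in insertion order


def alphabetical_slice(labels, start=None, stop=None):
    """Keep labels whose string value lies in [start, stop], preserving row order."""
    return [
        label
        for label in labels
        if (start is None or label >= start) and (stop is None or label <= stop)
    ]


print(alphabetical_slice(labels, "cobra", "viper"))  # ['cobra', 'viper', 'sidewinder']
print(alphabetical_slice(labels, "cobra"))           # ['cobra', 'viper', 'sidewinder']
print(alphabetical_slice(labels, "sidewinder"))      # ['viper', 'sidewinder']
print(alphabetical_slice(labels, "viper"))           # ['viper']
```

All four results match the Koalas outputs above, which is consistent with the labels being compared as strings rather than by position.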

Contributor

itholic commented Dec 30, 2019

Thanks for reporting, let me take a look at this.

@HyukjinKwon HyukjinKwon added the bug Something isn't working label Dec 30, 2019
itholic added a commit to itholic/koalas that referenced this issue Dec 30, 2019
Contributor

itholic commented Dec 30, 2019

Fixed this in #1159.

summary

The existing logic determined the range from start to stop by comparing the index labels alphabetically as strings, so it could not respect the natural (row) order.

We have to determine the range from start to stop using the natural order, like pandas does.

For more detail:

For example, let's assume that we have a kdf like the one below:

>>> kdf
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

and perform loc on it with a slice from 'cobra' to 'viper', like below:

>>> kdf.loc['cobra':'viper', 'max_speed']

We expect that the result will not include 'sidewinder', since it does not lie between 'cobra' and 'viper' in the kdf shown above.
(More fundamentally, because this is how pandas behaves.)

So I fixed it by using our new feature, NATURAL_ORDER_COLUMN_NAME, to keep the natural order.
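
As a rough illustration of the idea (not the actual change in #1159), slicing by natural order can be modeled in plain Python. The `natural_order` field and the `natural_order_slice` helper are hypothetical stand-ins for a per-row sequence number, and the sketch assumes unique labels.

```python
# Illustration only: each row carries a hypothetical sequence number that
# records its original position, standing in for a natural-order column.
rows = [
    {"natural_order": 0, "label": "cobra", "max_speed": 1},
    {"natural_order": 1, "label": "viper", "max_speed": 4},
    {"natural_order": 2, "label": "sidewinder", "max_speed": 7},
]


def natural_order_slice(rows, start=None, stop=None):
    """Return labels from `start` through `stop` (inclusive) by row position.

    Assumes labels are unique; a missing label raises KeyError.
    """
    order_of = {row["label"]: row["natural_order"] for row in rows}
    lower = order_of[start] if start is not None else float("-inf")
    upper = order_of[stop] if stop is not None else float("inf")
    return [row["label"] for row in rows if lower <= row["natural_order"] <= upper]


print(natural_order_slice(rows, "cobra", "viper"))  # ['cobra', 'viper']
print(natural_order_slice(rows, "viper"))           # ['viper', 'sidewinder']
print(natural_order_slice(rows, "sidewinder"))      # ['sidewinder']
```

Because the bounds are looked up as positions instead of being compared as strings, 'sidewinder' is excluded from the 'cobra':'viper' slice, which is the pandas result.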

HyukjinKwon pushed a commit that referenced this issue Jan 8, 2020
Resolve #1158 

```python
>>> kdf
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

>>> kdf.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

>>> kdf.to_pandas().loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64
```